Nir Soffer has uploaded a new change for review.
Change subject: log: Use INFO log level as default
......................................................................
log: Use INFO log level as default
The current logs are much too verbose which cause trouble for users, and
make us look unprofessional. Mature project should not use debug log by
default.
To debug issues that are not clear enough using INFO logs, the relevant
logger level can be modified on a user machine as needed.
Change-Id: I767dcd9bad7b9fbeebb438e9ef13cb0ec3f042ee
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
---
M vdsm/logger.conf.in
1 file changed, 4 insertions(+), 4 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/04/32504/1
diff --git a/vdsm/logger.conf.in b/vdsm/logger.conf.in
index 64b154f..8e963dd 100644
--- a/vdsm/logger.conf.in
+++ b/vdsm/logger.conf.in
@@ -8,18 +8,18 @@
keys=long,simple,none,sysform
[logger_root]
-level=DEBUG
+level=INFO
handlers=syslog,logfile
propagate=0
[logger_vds]
-level=DEBUG
+level=INFO
handlers=syslog,logfile
qualname=vds
propagate=0
[logger_Storage]
-level=DEBUG
+level=INFO
handlers=logfile
qualname=Storage
propagate=0
@@ -31,7 +31,7 @@
propagate=1
[logger_connectivity]
-level=DEBUG
+level=INFO
handlers=connlogfile
qualname=connectivity
propagate=0
--
To view, visit http://gerrit.ovirt.org/32504
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I767dcd9bad7b9fbeebb438e9ef13cb0ec3f042ee
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Nir Soffer <nsoffer(a)redhat.com>
Nir Soffer has uploaded a new change for review.
Change subject: lvm: Exclude faulty devices from lvm long filter
......................................................................
lvm: Exclude faulty devices from lvm long filter
lvm commands use filter to limit access to relevant devices. When the
filter includes a faulty device, lvm commands may block for several
minutes (stuck in D state). We have seen getDevicesList command stuck
for up to 10 minutes because of faulty devices in the long filter.
We used to build the filter from all multipath devices. Now we build the
filter only from devices which have at least one active paths.
# multipath -ll
360060160f4a0300038ed7058b5e9e311 dm-0 DGC ,VRAID
size=15G features='0' hwhandler='1 emc' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| `- 4:0:3:0 sdd 8:48 failed faulty running
`-+- policy='service-time 0' prio=0 status=enabled
`- 4:0:2:0 sdb 8:16 failed faulty running
360060160f4a030003268ab211002e411 dm-1 DGC ,VRAID
size=30G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='service-time 0' prio=4 status=active
| `- 4:0:3:1 sde 8:64 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
`- 4:0:2:1 sdc 8:32 active ready running
Previously, both devices were included in the filter, now only
360060160f4a030003268ab211002e411 will be included in lvm filter.
A faulty device which became active again will be included in lvm filter
after the next refresh (every 5 minutes), or after trying edit or create
a new storage domain.
lvm uses also short filter, including devices used by the certain vg or
lv. It is possible that we also have to exclude such devices from the
short filter. This will be handled later if needed.
Change-Id: I6d7a973bcefa95813fdc289847760c0955aca30c
Bug-Url: https://bugzilla.redhat.com/880738
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
---
M vdsm/storage/lvm.py
M vdsm/storage/multipath.py
2 files changed, 13 insertions(+), 1 deletion(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/75/31875/1
diff --git a/vdsm/storage/lvm.py b/vdsm/storage/lvm.py
index 86edf55..9cfad01 100644
--- a/vdsm/storage/lvm.py
+++ b/vdsm/storage/lvm.py
@@ -244,7 +244,7 @@
if not self._filterStale:
return self._extraCfg
- self._extraCfg = _buildConfig(multipath.getMPDevNamesIter())
+ self._extraCfg = _buildConfig(multipath.getActiveMPDevNamesIter())
_updateLvmConf(self._extraCfg)
self._filterStale = False
diff --git a/vdsm/storage/multipath.py b/vdsm/storage/multipath.py
index ba98866..2b30995 100644
--- a/vdsm/storage/multipath.py
+++ b/vdsm/storage/multipath.py
@@ -382,6 +382,18 @@
yield os.path.join(devicemapper.DMPATH_PREFIX, name)
+def getActiveMPDevNamesIter():
+ status = devicemapper.getPathsStatus()
+ for dmId, guid in getMPDevsIter():
+ active = [slave for slave in devicemapper.getSlaves(dmId)
+ if status.get(slave) == "active"]
+ if not active:
+ log.warning("Skipping device %s - no active slave", guid)
+ continue
+ log.debug("Found device %s %s", guid, active)
+ yield os.path.join(devicemapper.DMPATH_PREFIX, guid)
+
+
def getMPDevsIter():
"""
Collect the list of all the multipath block devices.
--
To view, visit http://gerrit.ovirt.org/31875
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6d7a973bcefa95813fdc289847760c0955aca30c
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Nir Soffer <nsoffer(a)redhat.com>
Nir Soffer has uploaded a new change for review.
Change subject: lvm: Decrease number of retries before failing
......................................................................
lvm: Decrease number of retries before failing
Previously we configured lvm to stop accessing a device during an lvm
opoeration after 3 errors. This can cause lvm to block 3 times when
trying to access inaccessible device. With current iscsi settings, each
block can be 120 seconds, total 360 seconds. We have seen lvm block for
couple of minutes in such cases.
The retries seems uneeded when working with multiple paths, as multipath
already retry all available paths after SCSI errors on one path. However
when working with single path, multipath should fail after one try.
According to lvm developer this may decrease the time lvm is blocked
when devices are not accesible.
(Not tested yet)
Change-Id: I5d11abaaff45ce86e88c6589264e162318ac1f1d
Relates-To: https://bugzilla.redhat.com/880738
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
---
M vdsm/storage/lvm.py
1 file changed, 1 insertion(+), 1 deletion(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/56/32356/1
diff --git a/vdsm/storage/lvm.py b/vdsm/storage/lvm.py
index 86edf55..f760f7b 100644
--- a/vdsm/storage/lvm.py
+++ b/vdsm/storage/lvm.py
@@ -106,7 +106,7 @@
preferred_names = ["^/dev/mapper/"]
ignore_suspended_devices=1
write_cache_state=0
-disable_after_error_count=3
+disable_after_error_count=1
obtain_device_list_from_udev=0
%s
}
--
To view, visit http://gerrit.ovirt.org/32356
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5d11abaaff45ce86e88c6589264e162318ac1f1d
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Nir Soffer <nsoffer(a)redhat.com>
Nir Soffer has uploaded a new change for review.
Change subject: clusterlock: Remove uneeded workaround
......................................................................
clusterlock: Remove uneeded workaround
Sanlock version XXXX had a off-by-one bug when calling get_hosts with a
host id, returning info for the next host. This bug is fixed in version
XXX. Now we can use the hostId parameter, making the call more efficient
and simpligying clusterlock code.
Change-Id: Ide75e749fbc2916540c2b526b78fedc247b5c6f9
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
---
M vdsm.spec.in
M vdsm/storage/clusterlock.py
2 files changed, 5 insertions(+), 16 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/62/31162/1
diff --git a/vdsm.spec.in b/vdsm.spec.in
index 5ba6fc6..fc39eb9 100644
--- a/vdsm.spec.in
+++ b/vdsm.spec.in
@@ -167,7 +167,7 @@
Requires: iscsi-initiator-utils >= 6.2.0.873-21
%endif
-Requires: sanlock >= 2.8, sanlock-python
+Requires: sanlock >= XXX, sanlock-python
%if 0%{?rhel}
Requires: python-ethtool >= 0.6-3
diff --git a/vdsm/storage/clusterlock.py b/vdsm/storage/clusterlock.py
index 24a5d81..17cdd53 100644
--- a/vdsm/storage/clusterlock.py
+++ b/vdsm/storage/clusterlock.py
@@ -265,26 +265,15 @@
return False
def getHostStatus(self, hostId):
- # Note: get_hosts has off-by-one bug when asking for particular host
- # id, so get all hosts info and filter.
- # See https://bugzilla.redhat.com/1111210
try:
- hosts = sanlock.get_hosts(self._sdUUID)
+ hosts = sanlock.get_hosts(self._sdUUID, hostId)
except sanlock.SanlockException as e:
self.log.debug("Unable to get host %d status in lockspace %s: %s",
hostId, self._sdUUID, e)
return HOST_STATUS_UNAVAILABLE
-
- for info in hosts:
- if info['host_id'] == hostId:
- status = info['flags']
- return self.STATUS_NAME[status]
-
- # get_hosts with host_id=0 returns only hosts with timestamp != 0,
- # which means that no host is using this host id now. If there a
- # timestamp, sanlock will return HOST_UNKNOWN and then HOST_LIVE or
- # HOST_FAIL.
- return HOST_STATUS_FREE
+ else:
+ status = hosts[0]['flags']
+ return self.STATUS_NAME[status]
# The hostId parameter is maintained here only for compatibility with
# ClusterLock. We could consider to remove it in the future but keeping it
--
To view, visit http://gerrit.ovirt.org/31162
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ide75e749fbc2916540c2b526b78fedc247b5c6f9
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Nir Soffer <nsoffer(a)redhat.com>