If the LNST controller processes a pool containing a machine whose hostname cannot be resolved by DNS, the controller fails with an unhandled exception. The exception is raised on shutdown of the socket used to connect to the machine: since the connection never succeeded, the attempt to shut it down is an error. Unfortunately, this error is not detected by checking SO_ERROR with getsockopt() after calling select() on the socket.
The fix is to mark the machine unavailable immediately unless the error code raised by the connect() call is EINPROGRESS, which means the connection proceeded to the establishing phase (the hostname was successfully resolved).
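The failure mode described in the commit message above can be reproduced in isolation. The sketch below is not LNST code: probe_unconnected_socket is a hypothetical helper, written for Python 3 and assuming Linux socket behaviour. It creates a TCP socket that never connects (the state left behind when name resolution fails inside connect()), reads SO_ERROR, and then attempts the shutdown.

```python
import errno
import socket


def probe_unconnected_socket():
    """Return (SO_ERROR value, errno raised by shutdown) for a never-connected socket.

    Hypothetical helper for illustration only; it is not part of LNST.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # No connect attempt ever reached the kernel, so there is no
        # pending error for SO_ERROR to report.
        so_error = s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        try:
            s.shutdown(socket.SHUT_RDWR)
            shutdown_errno = 0
        except OSError as exc:
            # Shutting down a socket that was never connected fails.
            shutdown_errno = exc.errno
        return so_error, shutdown_errno
    finally:
        s.close()


if __name__ == "__main__":
    so_error, shutdown_errno = probe_unconnected_socket()
    print("SO_ERROR:", so_error)
    print("shutdown errno:", errno.errorcode.get(shutdown_errno, shutdown_errno))
```

Under these assumptions SO_ERROR gives no hint that anything went wrong, while the shutdown call raises, which is why the connect() error itself has to be inspected.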
Fixes: 5498ba2 ("SlavePool: properly close sockets after connection check")
Fixes issue #188

Signed-off-by: Jan Tluka <jtluka@redhat.com>
---
 lnst/Controller/SlavePool.py | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/lnst/Controller/SlavePool.py b/lnst/Controller/SlavePool.py
index 5f94b07..5069f50 100644
--- a/lnst/Controller/SlavePool.py
+++ b/lnst/Controller/SlavePool.py
@@ -15,6 +15,7 @@ rpazdera@redhat.com (Radek Pazdera)
 import logging
 import os
+import errno
 import re
 import socket
 import select
@@ -104,8 +105,21 @@ class SlavePool:
                 s.settimeout(0)
                 try:
                     s.connect((hostname, port))
-                except:
-                    pass
+                except socket.error as msg:
+                    # if the error is other than EINPROGRESS, e.g. the stack
+                    # could not resolve name, the machine should become unavailable
+                    try:
+                        en = msg.errno
+                    except AttributeError:
+                        en = 0
+
+                    if en != errno.EINPROGRESS:
+                        pool[m_id]["available"] = False
+                        s.close()
+                        logging.debug("Bypassing machine '%s' (%s)" %
+                                      (m_id, msg))
+                        continue
+
                 check_sockets[s] = m_id
 
             while len(check_sockets) > 0:
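As a rough standalone illustration of the check this patch adds, the sketch below separates "connect is still in progress" from every other connect() error. It is written for Python 3, where socket.error is an alias of OSError, and the names connect_in_progress and start_connection_check are hypothetical; neither exists in LNST.

```python
import errno
import logging
import socket


def connect_in_progress(exc):
    """Return True only if a non-blocking connect() reported EINPROGRESS."""
    # getattr() with a default of 0 mirrors the patch's AttributeError fallback.
    return getattr(exc, "errno", 0) == errno.EINPROGRESS


def start_connection_check(hostname, port):
    """Start a non-blocking connect.

    Returns the socket to poll later, or None if the machine should be
    treated as unavailable right away. Hypothetical helper, not LNST API.
    """
    s = socket.socket()
    s.settimeout(0)  # timeout 0 puts the socket into non-blocking mode
    try:
        s.connect((hostname, port))
    except OSError as exc:
        if not connect_in_progress(exc):
            # For example socket.gaierror when the hostname cannot be resolved.
            s.close()
            logging.debug("Bypassing machine %s:%s (%s)", hostname, port, exc)
            return None
    return s
```

A caller would put only the non-None sockets into its select() set, so a socket that never started connecting is closed immediately and never reaches the later shutdown call.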
On Mon, Jul 31, 2017 at 03:51:10PM +0200, Jan Tluka wrote:
pushed, thanks.
-Ondrej
lnst-developers@lists.fedorahosted.org