Hello all, hope all is well. We're seeing an odd issue on a host. Periodically sssd reports that it can't ping the domain (well, the service named the same as the domain) and then shuts it down and restarts it. Users can't authenticate or log in until the service restarts, so this effectively restricts access to the host to filtered users like root.
The Windows DCs are available the whole time, and we see nothing untoward in a pcap during that time. Also, since we've been having these issues the host has not been used for prod duty, so it is lightly loaded during these sssd disconnects. Really see heavy traffic to the DCs during the issue. What does it mean for a service ping to time out in sssd speak? Is that a service on the D-Bus? Snippets from journalctl, the sssd logs and sssd.conf are all posted below.
thanks in advance, any and all help would be appreciated
host is ubuntu xenial
from journalctl......from the event today
Jan 18 12:51:55 X sssd[41083]: Killing service [foo], not responding to pings!
Jan 18 12:52:08 X sshd[104273]: fatal: Access denied for user srv_ti by PAM account configuration [preauth]
Jan 18 12:52:52 X sshd[104298]: Connection closed by 99.99.99.99 port 60245 [preauth]
Jan 18 12:52:55 X sssd[41083]: [foo][41084] is not responding to SIGTERM. Sending SIGKILL.
Jan 18 12:52:55 X sssd[be[104300]: Starting up
sssd_log today at debug 9 set in sssd.conf
(Wed Jan 18 12:51:05 2017) [sssd] [ping_check] (0x2000): Service foo replied to ping
(Wed Jan 18 12:51:05 2017) [sssd] [sbus_remove_timeout] (0x2000): 0xe15880
(Wed Jan 18 12:51:05 2017) [sssd] [sbus_dispatch] (0x4000): dbus conn: 0xe109f0
(Wed Jan 18 12:51:05 2017) [sssd] [sbus_dispatch] (0x4000): Dispatching.
(Wed Jan 18 12:51:05 2017) [sssd] [ping_check] (0x2000): Service nss replied to ping
(Wed Jan 18 12:51:05 2017) [sssd] [sbus_remove_timeout] (0x2000): 0xe14540
(Wed Jan 18 12:51:05 2017) [sssd] [sbus_dispatch] (0x4000): dbus conn: 0xe11ac0
(Wed Jan 18 12:51:05 2017) [sssd] [sbus_dispatch] (0x4000): Dispatching.
(Wed Jan 18 12:51:05 2017) [sssd] [ping_check] (0x2000): Service pam replied to ping
(Wed Jan 18 12:51:15 2017) [sssd] [service_send_ping] (0x2000): Pinging foo
(Wed Jan 18 12:51:15 2017) [sssd] [sbus_add_timeout] (0x2000): 0xe14540
(Wed Jan 18 12:51:15 2017) [sssd] [service_send_ping] (0x2000): Pinging nss
(Wed Jan 18 12:51:15 2017) [sssd] [sbus_add_timeout] (0x2000): 0xe15880
(Wed Jan 18 12:51:15 2017) [sssd] [service_send_ping] (0x2000): Pinging pam
(Wed Jan 18 12:51:15 2017) [sssd] [sbus_add_timeout] (0x2000): 0xe0d600
(Wed Jan 18 12:51:15 2017) [sssd] [sbus_remove_timeout] (0x2000): 0xe15880
(Wed Jan 18 12:51:15 2017) [sssd] [sbus_dispatch] (0x4000): dbus conn: 0xe109f0
(Wed Jan 18 12:51:15 2017) [sssd] [sbus_dispatch] (0x4000): Dispatching.
(Wed Jan 18 12:51:15 2017) [sssd] [ping_check] (0x2000): Service nss replied to ping
(Wed Jan 18 12:51:15 2017) [sssd] [sbus_remove_timeout] (0x2000): 0xe0d600
(Wed Jan 18 12:51:15 2017) [sssd] [sbus_dispatch] (0x4000): dbus conn: 0xe11ac0
(Wed Jan 18 12:51:15 2017) [sssd] [sbus_dispatch] (0x4000): Dispatching.
(Wed Jan 18 12:51:15 2017) [sssd] [ping_check] (0x2000): Service pam replied to ping
(Wed Jan 18 12:51:25 2017) [sssd] [service_send_ping] (0x2000): Pinging foo
(Wed Jan 18 12:51:25 2017) [sssd] [sbus_add_timeout] (0x2000): 0xe0d600
(Wed Jan 18 12:51:25 2017) [sssd] [service_send_ping] (0x2000): Pinging nss
(Wed Jan 18 12:51:25 2017) [sssd] [sbus_add_timeout] (0x2000): 0xe15880
(Wed Jan 18 12:51:25 2017) [sssd] [service_send_ping] (0x2000): Pinging pam
(Wed Jan 18 12:51:25 2017) [sssd] [sbus_add_timeout] (0x2000): 0xe09430
(Wed Jan 18 12:51:25 2017) [sssd] [sbus_remove_timeout] (0x2000): 0xe14540
(Wed Jan 18 12:51:25 2017) [sssd] [sbus_dispatch] (0x4000): dbus conn: 0xe0c370
(Wed Jan 18 12:51:25 2017) [sssd] [sbus_dispatch] (0x4000): Dispatching.
(Wed Jan 18 12:51:25 2017) [sssd] [ping_check] (0x0020): A service PING timed out on [foo]. Attempt [0]
(Wed Jan 18 12:51:25 2017) [sssd] [sbus_remove_timeout] (0x2000): 0xe15880
(Wed Jan 18 12:51:25 2017) [sssd] [sbus_dispatch] (0x4000): dbus conn: 0xe109f0
(Wed Jan 18 12:51:25 2017) [sssd] [sbus_dispatch] (0x4000): Dispatching.
(Wed Jan 18 12:51:25 2017) [sssd] [ping_check] (0x2000): Service nss replied to ping
(Wed Jan 18 12:51:25 2017) [sssd] [sbus_remove_timeout] (0x2000): 0xe09430
(Wed Jan 18 12:51:25 2017) [sssd] [sbus_dispatch] (0x4000): dbus conn: 0xe11ac0
(Wed Jan 18 12:51:25 2017) [sssd] [sbus_dispatch] (0x4000): Dispatching.
(Wed Jan 18 12:51:25 2017) [sssd] [ping_check] (0x2000): Service pam replied to ping
(Wed Jan 18 12:51:35 2017) [sssd] [service_send_ping] (0x2000): Pinging foo
(Wed Jan 18 12:51:35 2017) [sssd] [sbus_add_timeout] (0x2000): 0xe09430
(Wed Jan 18 12:51:35 2017) [sssd] [service_send_ping] (0x2000): Pinging nss
(Wed Jan 18 12:51:35 2017) [sssd] [sbus_add_timeout] (0x2000): 0xe15880
(Wed Jan 18 12:51:35 2017) [sssd] [service_send_ping] (0x2000): Pinging pam
(Wed Jan 18 12:51:35 2017) [sssd] [sbus_add_timeout] (0x2000): 0xe14540
(Wed Jan 18 12:51:35 2017) [sssd] [sbus_remove_timeout] (0x2000): 0xe15880
(Wed Jan 18 12:51:35 2017) [sssd] [sbus_dispatch] (0x4000): dbus conn: 0xe109f0
(Wed Jan 18 12:51:35 2017) [sssd] [sbus_dispatch] (0x4000): Dispatching.
(Wed Jan 18 12:51:35 2017) [sssd] [ping_check] (0x2000): Service nss replied to ping
(Wed Jan 18 12:51:35 2017) [sssd] [sbus_remove_timeout] (0x2000): 0xe14540
(Wed Jan 18 12:51:35 2017) [sssd] [sbus_dispatch] (0x4000): dbus conn: 0xe11ac0
(Wed Jan 18 12:51:35 2017) [sssd] [sbus_dispatch] (0x4000): Dispatching.
(Wed Jan 18 12:51:35 2017) [sssd] [ping_check] (0x2000): Service pam replied to ping
(Wed Jan 18 12:51:35 2017) [sssd] [sbus_remove_timeout] (0x2000): 0xe0d600
(Wed Jan 18 12:51:35 2017) [sssd] [sbus_dispatch] (0x4000): dbus conn: 0xe0c370
(Wed Jan 18 12:51:35 2017) [sssd] [sbus_dispatch] (0x4000): Dispatching.
(Wed Jan 18 12:51:35 2017) [sssd] [ping_check] (0x0020): A service PING timed out on [foo]. Attempt [1]
(Wed Jan 18 12:51:45 2017) [sssd] [service_send_ping] (0x2000): Pinging foo
(Wed Jan 18 12:51:45 2017) [sssd] [sbus_add_timeout] (0x2000): 0xe0d600
(Wed Jan 18 12:51:45 2017) [sssd] [service_send_ping] (0x2000): Pinging nss
(Wed Jan 18 12:51:45 2017) [sssd] [sbus_add_timeout] (0x2000): 0xe14540
(Wed Jan 18 12:51:45 2017) [sssd] [service_send_ping] (0x2000): Pinging pam
(Wed Jan 18 12:51:45 2017) [sssd] [sbus_add_timeout] (0x2000): 0xe15880
(Wed Jan 18 12:51:45 2017) [sssd] [sbus_remove_timeout] (0x2000): 0xe09430
(Wed Jan 18 12:51:45 2017) [sssd] [sbus_dispatch] (0x4000): dbus conn: 0xe0c370
(Wed Jan 18 12:51:45 2017) [sssd] [sbus_dispatch] (0x4000): Dispatching.
(Wed Jan 18 12:51:45 2017) [sssd] [ping_check] (0x0020): A service PING timed out on [foo]. Attempt [2]
(Wed Jan 18 12:51:45 2017) [sssd] [sbus_remove_timeout] (0x2000): 0xe14540
(Wed Jan 18 12:51:45 2017) [sssd] [sbus_dispatch] (0x4000): dbus conn: 0xe109f0
(Wed Jan 18 12:51:45 2017) [sssd] [sbus_dispatch] (0x4000): Dispatching.
This also happened this past Monday evening:
(Mon Jan 16 19:22:30 2017) [sssd] [ping_check] (0x0020): A service PING timed out on [foo]. Attempt [0]
(Mon Jan 16 19:22:40 2017) [sssd] [ping_check] (0x0020): A service PING timed out on [foo]. Attempt [1]
(Mon Jan 16 19:22:50 2017) [sssd] [ping_check] (0x0020): A service PING timed out on [foo]. Attempt [2]
(Mon Jan 16 19:23:00 2017) [sssd] [tasks_check_handler] (0x0020): Killing service [foo], not responding to pings!
(Mon Jan 16 19:23:00 2017) [sssd] [ping_check] (0x0020): A service PING timed out on [foo]. Attempt [3]
(Mon Jan 16 19:23:10 2017) [sssd] [ping_check] (0x0020): A service PING timed out on [foo]. Attempt [4]
(Mon Jan 16 19:24:00 2017) [sssd] [mt_svc_sigkill] (0x0010): [foo][2084] is not responding to SIGTERM. Sending SIGKILL.
(Mon Jan 16 19:24:00 2017) [sssd] [mt_svc_exit_handler] (0x0040): Child [foo] terminated with signal [9]
(Mon Jan 16 19:24:00 2017) [sssd] [mt_svc_restart] (0x0400): Scheduling service foo for restart 1
(Mon Jan 16 19:24:00 2017) [sssd] [get_ping_config] (0x0100): Time between service pings for [foo]: [10]
(Mon Jan 16 19:24:00 2017) [sssd] [get_ping_config] (0x0100): Time between SIGTERM and SIGKILL for [foo]: [60]
(Mon Jan 16 19:24:00 2017) [sssd] [start_service] (0x0100): Queueing service foo for startup
(Mon Jan 16 19:24:00 2017) [sssd] [sbus_server_init_new_connection] (0x0200): Entering.
(Mon Jan 16 19:24:00 2017) [sssd] [sbus_server_init_new_connection] (0x0200): Adding connection 0x1588b70.
(Mon Jan 16 19:24:00 2017) [sssd] [sbus_init_connection] (0x0400): Adding connection 0x1588b70
(Mon Jan 16 19:24:00 2017) [sssd] [sbus_server_init_new_connection] (0x0200): Got a connection
(Mon Jan 16 19:24:00 2017) [sssd] [monitor_service_init] (0x0400): Initializing D-BUS Service
(Mon Jan 16 19:24:00 2017) [sssd] [sbus_opath_hash_add_iface] (0x0400): Registering interface org.freedesktop.sssd.monitor with path /org/freedesktop/sssd/monitor
(Mon Jan 16 19:24:00 2017) [sssd] [sbus_conn_register_path] (0x0400): Registering object path /org/freedesktop/sssd/monitor with D-Bus connection
(Mon Jan 16 19:24:00 2017) [sssd] [sbus_opath_hash_add_iface] (0x0400): Registering interface org.freedesktop.DBus.Properties with path /org/freedesktop/sssd/monitor
(Mon Jan 16 19:24:00 2017) [sssd] [sbus_opath_hash_add_iface] (0x0400): Registering interface org.freedesktop.DBus.Introspectable with path /org/freedesktop/sssd/monitor
(Mon Jan 16 19:24:00 2017) [sssd] [client_registration] (0x0100): Received ID registration: (%BE_foo,1)
(Mon Jan 16 19:24:00 2017) [sssd] [mark_service_as_started] (0x0200): Marking foo as started.
(Mon Jan 16 19:24:00 2017) [sssd] [mark_service_as_started] (0x0080): Invalid parent pid: 1963
sssd.conf

[sssd]
config_file_version = 2
debug_level = 9
reconnection_retries = 3
sbus_timeout = 30
services = nss, pam
domains = foo

[nss]
filter_groups = root,
filter_users = root,
reconnection_retries = 3

[pam]
reconnection_retries = 3

[domain/foo]
enumerate = False
id_provider = ad
chpass_provider = ad
auth_provider = ad
min_id = 1000
ad_hostname = X.us.foo.com
ad_domain = us.foo.com
dyndns_update = false
ldap_id_mapping = false
ldap_user_home_directory = unixHomeDirectory
ldap_user_object_class = user
ldap_group_object_class = top
ldap_group_nesting_level = 5
ldap_group_name = sAMAccountName
ldap_group_search_base = ou=accounts,dc=us,dc=foo,dc=com?subtree?&(objectClass=top)(!(objectClass=computer))(gidnumber=*)(|(groupType<=0)(&(objectClass=user)(objectCategory=person)(uidNumber=*)))
access_provider = simple
simple_allow_users = appadmin,srv_ti,
simple_allow_groups = SG-MCServices,SG-MTO-SE-Dev,
On (18/01/17 14:34), jsl6uy js16uy wrote:
sssd_log today at debug 9 set in sssd.conf
I would recommend increasing debug_level in the domain section as well (7 might be enough for this purpose) and then checking what is going on at the problematic time in sssd_domain.log
LS
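One way to follow that suggestion is to take the timestamps of the ping timeouts from the monitor log and then read only the domain-log lines around them. The Python sketch below assumes every log line uses the "(timestamp) [process] [function] (level): message" layout seen in the excerpts earlier in the thread, with C-locale day and month names. The function names and the sample window are made up for illustration and are not part of sssd.

```python
import re
from datetime import datetime, timedelta

# Assumed layout, taken from the excerpts in this thread:
# (Wed Jan 18 12:51:25 2017) [sssd] [ping_check] (0x0020): message
LINE_RE = re.compile(
    r"^\((?P<ts>[^)]+)\) \[(?P<proc>.+?)\] \[(?P<func>\w+)\] "
    r"\((?P<level>0x[0-9a-fA-F]+)\): (?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, function, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    try:
        ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None
    return ts, m.group("func"), m.group("msg")


def ping_timeouts(monitor_lines, service):
    """Timestamps at which the monitor logged a ping timeout for `service`."""
    needle = "A service PING timed out on [%s]" % service
    found = []
    for line in monitor_lines:
        parsed = parse_line(line)
        if parsed and needle in parsed[2]:
            found.append(parsed[0])
    return found


def lines_in_window(lines, start, end):
    """Yield the log lines whose timestamp falls within [start, end]."""
    for line in lines:
        parsed = parse_line(line)
        if parsed and start <= parsed[0] <= end:
            yield line.rstrip("\n")


# Sample monitor-log lines copied from the Monday event in this thread.
monitor_sample = [
    "(Mon Jan 16 19:22:30 2017) [sssd] [ping_check] (0x0020): A service PING timed out on [foo]. Attempt [0]",
    "(Mon Jan 16 19:22:40 2017) [sssd] [ping_check] (0x0020): A service PING timed out on [foo]. Attempt [1]",
    "(Mon Jan 16 19:23:00 2017) [sssd] [tasks_check_handler] (0x0020): Killing service [foo], not responding to pings!",
]

for ts in ping_timeouts(monitor_sample, "foo"):
    print("ping timeout at", ts)
```

With the domain log read into a list of lines, passing it to lines_in_window with a start a minute or so before the first timeout would narrow the output to what the backend was doing when it stopped answering.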
roger roger thanks very much for the suggestion will do
_______________________________________________
sssd-users mailing list -- sssd-users@lists.fedorahosted.org
To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org
Unfortunately we haven't seen any new issues since turning debug under the domain section to 9. So when sssd shuts down and restarts due to missing "pings" to the service named after the domain, should we be blocked on access? We understand that for security reasons it's often better to fail closed, like a firewall breaking but locking/closing all ports, but what if a user is relying on this on a laptop, remotely and/or on wifi? I would expect the last local copy of the sss database/cache to be used to let them log in. When sssd shuts down during this process, it's not as if it removes the cache under /var/lib/sss/db. Should we ever see a point where sssd won't let you log in due to the domain service timeout?
as always, thanks for the continued help/education in these matters
On (23/01/17 14:41), jsl6uy js16uy wrote:
Unfortunately we haven't seen any new issues since turning debug under the domain section to 9.
You can try decreasing the level (e.g. to 5).
So when sssd shuts down and restarts due to missing "pings" to the service named after the domain, should we be blocked on access?
pam_sss is very simple; all the logic is in the daemon. If the daemon is busy with some task, it cannot process authentication. That's the reason why it would be good to see the log files and find out what was going on there.
We understand that for security reasons it's often better to fail closed, like a firewall breaking but locking/closing all ports, but what if a user is relying on this on a laptop, remotely and/or on wifi? I would expect the last local copy of the sss database/cache to be used to let them log in. When sssd shuts down during this process, it's not as if it removes the cache under /var/lib/sss/db.
That should work with "cache_credentials = True" in the domain section, which is the standard use case for laptops.
Should we ever see a point where sssd won't let you log in due to the domain service timeout?
From a security point of view, it's better to block access if you are not in a known state (online, offline). Otherwise users might be able to authenticate even though access should be denied (for example, access was disabled in AD but sssd_be is unresponsive/busy for some reason and cannot detect it).
This is a reason why it would be good to find out why sssd_be was unresponsive.
LS
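The two settings discussed in this reply (debug_level in the domain section and cache_credentials) can be checked on a host with a short script. This is only a sketch using Python's configparser; the sample text is a trimmed, hypothetical config modeled on the one posted earlier in the thread, the function names are invented for illustration, and treating a missing cache_credentials as off is an assumption about the default.

```python
import configparser


def load_sssd_conf(text):
    """Parse sssd.conf-style text; interpolation is off so '%' stays literal."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    return parser


def domain_report(parser):
    """Map each [domain/...] section to its debug_level and cache_credentials."""
    report = {}
    for section in parser.sections():
        if not section.startswith("domain/"):
            continue
        report[section] = {
            # None means no debug_level is set in this domain section.
            "debug_level": parser.get(section, "debug_level", fallback=None),
            # Assumption: an absent option is treated as disabled.
            "cache_credentials": parser.getboolean(
                section, "cache_credentials", fallback=False
            ),
        }
    return report


# Hypothetical, trimmed config modeled on the one in this thread.
SAMPLE_CONF = """
[sssd]
config_file_version = 2
services = nss, pam
domains = foo

[domain/foo]
id_provider = ad
access_provider = simple
debug_level = 5
cache_credentials = True
"""

for name, settings in domain_report(load_sssd_conf(SAMPLE_CONF)).items():
    print(name, settings)
```

Reading the real file instead of the sample only needs its contents passed to load_sssd_conf; sssd.conf is normally readable by root only, so the script would have to run with matching privileges.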
Thanks very much for the help and insights. Unfortunately we may have to return the machine to the customers soon. I will try to get more info before then. We may leave domain logging at 5 when we return it to them; I will have to bounce that off my colleagues. We may also try the "cache_credentials = True" directive. If we do have to turn it over, I will report back.
Thanks again for the explanations.
Best regards