On Tue, Apr 15, 2014 at 03:35:01PM +0300, "Thomas B. Rücker" wrote:
On 03/04/14 22:43, Dmitri Pal wrote:
On 04/02/2014 05:02 AM, "Thomas B. Rücker" wrote:
Hi,
We're using SSSD in combination with Active Directory and have received complaints from users about a corner case in our setup.
Our AD servers are only reachable from within our corporate network; connection attempts from the outside are dropped by firewalls. This leads to the following scenario:
- user takes machine (e.g. laptop) outside the corporate network
- user tries to authenticate (or in some cases also tries to "ls" which
causes uid/gid lookup)
- sssd will try to reach the configured servers for up to 30s
- sssd goes (back) into offline mode and uses cached credentials and
authenticates the user
This will however NOT happen if sssd gets told by the IP stack that a connection to the target IP is not possible (e.g. "ip route add blackhole 192.0.2.23/32" or one of the routers along the way generates an ICMP unreachable). In such cases sssd will go immediately into offline mode and use cached credentials.
I'm aware that this is overall sensible behaviour, but what I would hope to fine-tune is how sssd stays in offline mode. Currently it seems like it will leave offline mode when it tries to reconnect (hardcoded 30s?). That leads to a flip-flop scenario where it seems to be 30s offline and 30s "online/connecting", and users have a fairly high chance of hitting a period during which their authentication will seemingly stall.
So my question is: Is there a better way to deal with this in the sssd context? If not, we'll probably have to implement separate connection checking and inject and remove blackhole routes accordingly. Not the nicest of workarounds in my book.
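For illustration, a minimal sketch of what such separate connection checking might look like, in Python. The names probe and blackhole_command are hypothetical, port 389 is an assumption (plain LDAP), and the sketch only builds the "ip route" command instead of running it. It tells apart the two cases described above: a connection attempt that is silently dropped (it times out) and one that fails immediately.

```python
import socket


def probe(ip, port=389, timeout=3.0):
    """Classify how a TCP connection attempt to ip:port ends.

    Returns "reachable" if the connection is established, "dropped" if
    nothing answers before the timeout (e.g. a firewall silently discards
    the packets), and "rejected" for any failure that is reported right
    away (connection refused, ICMP unreachable, a local blackhole route).
    """
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return "reachable"
    except socket.timeout:
        # Must be caught before OSError, since it is a subclass of it.
        return "dropped"
    except OSError:
        return "rejected"


def blackhole_command(ip, present):
    """Build (but do not run) the command that adds or removes the route."""
    action = "add" if present else "del"
    return ["ip", "route", action, "blackhole", f"{ip}/32"]


if __name__ == "__main__":
    server = "192.0.2.23"  # documentation address, as in the example above
    if probe(server) == "dropped":
        print(" ".join(blackhole_command(server, True)))
```

One caveat with this approach: once the blackhole route is installed, a probe to the same IP fails immediately as well, so it can no longer tell whether the server has become reachable again. A real checker would need some other signal (for example a probe to a different host that only exists inside the corporate network) to decide when to remove the route.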
Thanks, cheers
Thomas
PS: We're using sssd on many distributions, but our main distro at the moment is Ubuntu 12.04 with sssd 1.8.6, and we'll be rolling out 14.04 in addition, which has sssd 1.11.3.
In addition to other comments I want to say that I experienced similar behavior periodically with my laptop until I moved to 1.9.x. Please try the latest version. The problem might already be addressed.
To this I'd like to reply specifically: I've paid closer attention to the behaviour of 1.11.3 on 14.04, and it's outright HORRIBLE compared to 1.8.6 on 12.04. It locks hard for a full 30s on many more operations. cd into /etc or /home? 30s penalty on that shell. cd into a different directory a few minutes later? 30s penalty again. Unlock screen? sudo? ... 30s. It _seems_ (I haven't checked) as if the timer only starts running at that exact instant, unlike what I observed on 12.04, where it was constantly retrying, so the wait would be _up to_ 30s.
All goes away by routing the LDAP IPs to a black hole, just like on 12.04/1.8.6.
If you want to reproduce this behaviour, you could try adding the LDAP server IPs to your hosts file while outside your network. AFAIU this will only show, though, if no firewall/router along the way replies with "ICMP host unreachable". Or, when inside your network, just add iptables DROP rules for all your LDAP server destinations. Whichever way you prefer.
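As a rough sketch of the second variant, the DROP rules could be generated with a few lines of Python. The helper name drop_rules is hypothetical, the IPs are documentation placeholders rather than real servers, and the script only prints the commands; they would still need to be run as root.

```python
import shlex


def drop_rules(server_ips, remove=False):
    """Build iptables commands that silently drop outgoing packets to each IP.

    With remove=True the matching delete commands are built instead, so the
    test setup can be undone afterwards.
    """
    flag = "-D" if remove else "-A"
    return [["iptables", flag, "OUTPUT", "-d", ip, "-j", "DROP"]
            for ip in server_ips]


if __name__ == "__main__":
    ldap_servers = ["192.0.2.23", "192.0.2.24"]  # placeholder addresses
    for cmd in drop_rules(ldap_servers):
        print(shlex.join(cmd))
```

DROP rather than REJECT matters here: REJECT would report an error back to the sender, which by the observations above should make sssd go offline immediately instead of stalling.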
While I'm not very active on this list, I'm trying to investigate this problem internally to get a better idea of how to mitigate it. Sadly I have more urgent things going on, else I'd have come back with debug logs and more well thought out theories already.
Cheers
Thomas
The logs would be really welcome.