On 1 May 2016 at 17:04, Jakub Hrozek jhrozek@redhat.com wrote:
On 30 Apr 2016, at 10:28, Patrick Coleman patrick.coleman@meraki.com wrote:
On 29 Apr 2016 9:10 pm, "Lukas Slebodnik" lslebodn@redhat.com wrote:
Do you mean IO-related load or CPU-related load?
Lots of both, but we're typically IO bound more of the time.
If there is an issue with CPU, then you can mount the sssd cache on tmpfs to avoid such issues. (There are plans to improve this in 1.14.)
Cool, I'll give that a go.
Alternatively, increase the 'timeout' option in sssd's sections..
I appreciate the advice, thank you. I've put /var/lib/sss onto a tmpfs filesystem on a couple of loaded machines and seen what I believe to be improvements - it's a little too early to say, but I'll report back once I have a wider deployment.
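For anyone wanting to try the same thing, this is roughly what the change looks like - a sketch only, assuming systemd; the size and mode are illustrative rather than anything we've carefully tuned, and sssd should be stopped first so it can rebuild its cache on the new filesystem:

    # stop sssd, mount a tmpfs over the cache directory, start sssd again
    systemctl stop sssd
    mount -t tmpfs -o size=512M,mode=0700 tmpfs /var/lib/sss
    systemctl start sssd

    # or, to make it persistent, an /etc/fstab entry along these lines:
    tmpfs  /var/lib/sss  tmpfs  size=512M,mode=0700  0  0

The obvious caveat is that the cache is lost on every reboot, so sssd has to repopulate it from the server.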
I did want to feed back a little of our research into this issue. If we strace the sssd_be subprocess on a loaded machine, we see it sitting in msync() and fdatasync() for periods of up to 7.3 seconds in one test. This is perhaps expected, given the machine is under heavy IO load, but sssd makes a *lot* of these calls.
In a 7m 49.985s test (this is as long as the sssd_be process lasts before it is killed by the parent for not replying to ping) on a machine with moderate disk load and no new interactive logins, sssd made 232 *sync calls. The median syscall takes only 67ms, but the maximum is more than seven seconds - in the eight minute test sssd spent 1m 00.044s in *sync system calls.
My (naive) analysis here is that the backend process is spending 13% of its time unavailable to service account queries, because it's doing cache maintenance. This seems to rather defeat the point of having a cache... are my assumptions correct here? I'm happy to send the strace log (and any other data) to interested parties off-list, just let me know.
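For anyone who wants to reproduce the numbers above, something along these lines is enough (a sketch only - the pgrep match and the output path are just illustrative):

    # time each sync-family call made by the backend, following forks
    strace -f -T -e trace=msync,fdatasync,fsync -p "$(pgrep -f sssd_be)" -o /tmp/sssd_be.strace

    # or let strace produce the count/total-time summary itself
    strace -c -f -e trace=msync,fdatasync,fsync -p "$(pgrep -f sssd_be)"

The per-call times come from the -T annotations in the first form; the 13% figure is just the total time spent in *sync calls divided by the length of the trace.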
In an attempt to improve this behaviour, in addition to a tmpfs for /var/lib/sss I've also just added the following to the nss and pam stanzas in the config:
memcache_timeout = 1800
entry_cache_timeout = 1800
...the idea being they will respond from their own cache without contacting the backend, which may be busy per the above. Is this reasonable?
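For completeness, the stanzas now look roughly like this (trimmed to just the two options above; the rest of the config is unchanged):

    [nss]
    memcache_timeout = 1800
    entry_cache_timeout = 1800

    [pam]
    memcache_timeout = 1800
    entry_cache_timeout = 1800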
Cheers,
Patrick
On Tue, May 03, 2016 at 03:52:03PM +0100, Patrick Coleman wrote:
On 1 May 2016 at 17:04, Jakub Hrozek jhrozek@redhat.com wrote:
On 30 Apr 2016, at 10:28, Patrick Coleman patrick.coleman@meraki.com wrote:
On 29 Apr 2016 9:10 pm, "Lukas Slebodnik" lslebodn@redhat.com wrote:
Do you mean IO-related load or CPU-related load?
Lots of both, but we're typically IO bound more of the time.
If there is an issue with CPU, then you can mount the sssd cache on tmpfs to avoid such issues. (There are plans to improve this in 1.14.)
Cool, I'll give that a go.
Alternatively, increase the 'timeout' option in sssd's sections..
I appreciate the advice, thank you. I've put /var/lib/sss onto a tmpfs filesystem on a couple of loaded machines and seen what I believe to be improvements - it's a little too early to say, but I'll report back once I have a wider deployment.
I did want to feed back a little of our research into this issue. If we strace the sssd_be subprocess on a loaded machine, we see it sitting in msync() and fdatasync() for periods of up to 7.3 seconds in one test. This is perhaps expected, given the machine is under heavy IO load, but sssd makes a *lot* of these calls.
Yes, every cache update does 4 of these. This is a known issue I'm working on right now: https://fedorahosted.org/sssd/ticket/2602
In particular: https://fedorahosted.org/sssd/wiki/DesignDocs/OneFourteenPerformanceImprovem...
In a 7m 49.985s test (this is as long as the sssd_be process lasts before it is killed by the parent for not replying to ping) on a machine with moderate disk load and no new interactive logins, sssd made 232 *sync calls. The median syscall takes only 67ms, but the maximum is more than seven seconds - in the eight minute test sssd spent 1m 00.044s in *sync system calls.
My (naive) analysis here is that the backend process is spending 13% of its time unavailable to service account queries, because it's doing cache maintenance.
Very nice analysis.
Just a detail: it's not cache maintenance, but cache updates. The thing we are doing wrong at the moment is that we do a full write of the whole object even if nothing has changed.
This seems to rather defeat the point of having a cache... are my assumptions correct here? I'm happy to send the strace log (and any other data) to interested parties off-list, just let me know.
In an attempt to improve this behaviour, in addition to a tmpfs for /var/lib/sss I've also just added the following to the nss and pam stanzas in the config:
memcache_timeout = 1800
entry_cache_timeout = 1800
...the idea being they will respond from their own cache without contacting the backend, which may be busy per the above. Is this reasonable?
Yes, it is, in the sense that cache writes would be performed less frequently. But your value of entry_cache_timeout is too low; the default is 5400. By the way, an authentication request ignores the cache validity, since during authentication we always try to get fresh group membership.
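As a sketch only (the domain name below is made up, and entry_cache_timeout is normally a per-domain option), something like this keeps entries valid for at least the default period:

    [domain/example.com]
    entry_cache_timeout = 5400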
When #2602 is implemented, the cost of cache updates should (mostly) go away.
On Tue, May 03, 2016 at 05:30:23PM +0200, Jakub Hrozek wrote:
On Tue, May 03, 2016 at 03:52:03PM +0100, Patrick Coleman wrote:
On 1 May 2016 at 17:04, Jakub Hrozek jhrozek@redhat.com wrote:
On 30 Apr 2016, at 10:28, Patrick Coleman patrick.coleman@meraki.com wrote:
On 29 Apr 2016 9:10 pm, "Lukas Slebodnik" lslebodn@redhat.com wrote:
Do you mean IO-related load or CPU-related load?
Lots of both, but we're typically IO bound more of the time.
If there is an issue with CPU, then you can mount the sssd cache on tmpfs to avoid such issues. (There are plans to improve this in 1.14.)
Cool, I'll give that a go.
Alternatively, increase the 'timeout' option in sssd's sections..
I appreciate the advice, thank you. I've put /var/lib/sss onto a tmpfs filesystem on a couple of loaded machines and seen what I believe to be improvements - it's a little too early to say, but I'll report back once I have a wider deployment.
I did want to feed back a little of our research into this issue. If we strace the sssd_be subprocess on a loaded machine, we see it sitting in msync() and fdatasync() for periods of up to 7.3 seconds in one test. This is perhaps expected, given the machine is under heavy IO load, but sssd makes a *lot* of these calls.
Yes, every cache update does 4 of these. This is a known issue I'm working on right now: https://fedorahosted.org/sssd/ticket/2602
In particular: https://fedorahosted.org/sssd/wiki/DesignDocs/OneFourteenPerformanceImprovem...
By the way, some comparison from my WIP branch. Without the patches, updating a user who is a member of several hundred large groups with 'id' takes the following:
Total run time of id was: 19415 ms
Number of zero-level cache transactions: 283
--> Time spent in level-0 sysdb transactions: 7694 ms
--> Time spent writing to LDB: 2958 ms
Number of LDAP searches: 562
Time spent waiting for LDAP: 4548 ms
With the patches to avoid cache writes:
Total run time of id was: 9482 ms
Number of zero-level cache transactions: 283
--> Time spent in level-0 sysdb transactions: 1074 ms
--> Time spent writing to LDB: 38 ms
Number of LDAP searches: 562
Time spent waiting for LDAP: 4792 ms
So I think this already shows a nice improvement, although there is still quite a bit to work on..
On 3 May 2016 at 17:04, Jakub Hrozek jhrozek@redhat.com wrote:
On Tue, May 03, 2016 at 05:30:23PM +0200, Jakub Hrozek wrote:
Yes, every cache update does 4 of these. This is a known issue I'm working on right now: https://fedorahosted.org/sssd/ticket/2602
In particular: https://fedorahosted.org/sssd/wiki/DesignDocs/OneFourteenPerformanceImprovem...
By the way, some comparison from my WIP branch. Without the patches, updating a user who is a member of several hundred large groups with 'id' takes the following:
Total run time of id was: 19415 ms
Number of zero-level cache transactions: 283
--> Time spent in level-0 sysdb transactions: 7694 ms
--> Time spent writing to LDB: 2958 ms
Number of LDAP searches: 562
Time spent waiting for LDAP: 4548 ms
With the patches to avoid cache writes:
Total run time of id was: 9482 ms
Number of zero-level cache transactions: 283
--> Time spent in level-0 sysdb transactions: 1074 ms
--> Time spent writing to LDB: 38 ms
Number of LDAP searches: 562
Time spent waiting for LDAP: 4792 ms
That's good to know, and I'm happy to test any patches - just get in touch directly.
I wanted to report back that after a few days of running with a tmpfs at /var/lib/sss/db the issue with child processes timing out seems to have been mostly resolved. Our most loaded machines will report occasional timeouts but this doesn't seem to impact service:
(Wed May 4 03:32:38 2016) [sssd] [mark_service_as_started] (0x0400): SSSD is initialized, terminating parent process
(Wed May 4 03:32:43 2016) [sssd] [services_startup_timeout] (0x0400): Handling timeout
(Wed May 4 17:15:48 2016) [sssd] [ping_check] (0x0020): A service PING timed out on [pam]. Attempt [0]
(Thu May 5 04:57:28 2016) [sssd] [ping_check] (0x0020): A service PING timed out on [nss]. Attempt [0]
(Thu May 5 11:17:18 2016) [sssd] [ping_check] (0x0020): A service PING timed out on [nss]. Attempt [0]
(Thu May 5 15:44:18 2016) [sssd] [ping_check] (0x0020): A service PING timed out on [nss]. Attempt [0]
(Fri May 6 03:57:48 2016) [sssd] [ping_check] (0x0020): A service PING timed out on [nss]. Attempt [0]
(Fri May 6 05:17:28 2016) [sssd] [ping_check] (0x0020): A service PING timed out on [nss]. Attempt [0]
As far as the actual auth failures go, it's hard to quantify whether we're now seeing _zero_ problems, because they tend not to get reported, but our metrics are showing substantially fewer failures.
Really appreciate your assistance here, and if you need any further debugging information just let me know.
Cheers,
Patrick