On Tue, Jan 26, 2016 at 3:03 PM, Jakub Hrozek <jhrozek@redhat.com> wrote:
On Tue, Jan 26, 2016 at 02:19:42PM -0500, James Ralston wrote:
Here's the problem: unless the user/group objects already happen to be in sssd's cache, enumerating the passwd/group entries in this way is very slow: 3-5 entries per second, at best. For a larger AD domain, the program can take 10-15 minutes to perform this iterative enumeration, which is much longer than we'd prefer.
Can anyone think of a way to make this iterative enumeration go faster?
Did you try mounting the cache to tmpfs to get rid of the cache writes?
[...]
That's… a very clever idea.
From testing using tmpfs to back /var/lib/sss/db, the speed of lookups increases by about an order of magnitude: about 44 lookups per second, instead of 4-5 lookups per second. We have around 5,000 AD objects, so the ~100 second wait would be tolerable.
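For reference, the tmpfs setup is roughly the following (a sketch only; the size is arbitrary, and remember the cache starts out empty after every reboot):

    # stop sssd so it isn't writing to the cache while we swap it out
    systemctl stop sssd

    # mount a tmpfs over the cache directory; the old on-disk files
    # are hidden (not deleted) underneath the mount
    mount -t tmpfs -o size=512m,mode=0700 tmpfs /var/lib/sss/db

    # sssd recreates its ldb files in the now RAM-backed directory
    systemctl start sssd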
A related question: is there any possibility of adding an option to the ad backend to disable the filtering of distribution groups (group type flag 0x8)?
It's a long story, but what we are trying to do here is to take regular snapshots of our AD users and groups, and sssd's getpwnam()/getgrnam() mapping is the perfect way to do it. I think I understand why distribution groups are filtered by default (they're not security-enabled in AD, and can't be used in Windows ACLs), but in this one particular case, we really do want to be able to enumerate every single group.
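To make the access pattern concrete, the enumeration amounts to a loop like this (a simplified sketch, not our actual program; the server and base DN are placeholders, and authentication options are omitted):

    # pull the account names out of AD, then resolve each one through
    # sssd, which is what actually populates the cache
    ldapsearch -LLL -H ldap://dc.example.com -b 'dc=example,dc=com' \
        '(objectClass=user)' sAMAccountName |
    awk '/^sAMAccountName:/ { print $2 }' |
    while read -r name; do
        getent passwd "$name"    # resolves via NSS -> sssd
    done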
On Tue, Jan 26, 2016 at 05:50:06PM -0500, James Ralston wrote:
[...]

I'm glad it helped. FWIW, we're considering adding a nosync option to the cache as well at some point, which should have the same performance effect as using tmpfs, except the cache would be persistent. (On the other hand, if sssd were killed mid-transaction, the cache might end up corrupt, which is why we always sync by default.)

A related question: is there any possibility of adding an option to the ad backend to disable the filtering of distribution groups (group type flag 0x8)?
It's a long story, but what we are trying to do here is to take regular snapshots of our AD users and groups, and sssd's getpwnam()/getgrnam() mapping is the perfect way to do it. I think I understand why distribution groups are filtered by default (they're not security-enabled in AD, and can't be used in Windows ACLs), but in this one particular case, we really do want to be able to enumerate every single group.
Can you try setting ldap_group_type = nosuchattr?

That should trick sssd into not seeing the group type at all, which I guess would avoid the filtering (not tested).
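That is, something like this in the AD domain section of sssd.conf (the domain name is just an example; again, untested):

    [domain/example.com]
    id_provider = ad
    # point the group-type attribute at something that doesn't exist, so
    # the groupType-based filtering of distribution groups never matches
    ldap_group_type = nosuchattr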
On Wed, 27 Jan 2016, Jakub Hrozek wrote:
I'm glad it helped. FWIW, we're considering adding a nosync option to the cache as well at some point [...]
Sounds like a great option to add. If you can't sanity-check the cache, perhaps just deleting it whenever you don't know that it was cleanly written would work?
jh
On Wed, Jan 27, 2016 at 09:43:21AM +0000, John Hodrien wrote:
[...]

If you can't sanity-check the cache, perhaps just deleting it whenever you don't know that it was cleanly written would work?
Yes, that might be one idea: write some 'canary' on shutdown and start fresh if the canary is not there.
Coming up in 1.14..
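Conceptually something like this shell sketch (an illustration of the idea only, not what the implementation would look like; the canary file name is made up):

    # at clean shutdown: leave the canary behind
    touch /var/lib/sss/db/.clean_shutdown

    # at startup: a missing canary means we may have died mid-write
    if [ ! -e /var/lib/sss/db/.clean_shutdown ]; then
        rm -f /var/lib/sss/db/cache_*.ldb   # discard the possibly-corrupt cache
    fi
    rm -f /var/lib/sss/db/.clean_shutdown   # remove it while sssd runs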
On 01/27/2016 05:27 AM, Jakub Hrozek wrote:
[...]

Yes, that might be one idea: write some 'canary' on shutdown and start fresh if the canary is not there.
Coming up in 1.14..
The problem is that when the cache is deleted, it's not just the remote data that is lost. The cache also holds the cached credentials, so it would break in the following common scenario:
Road warrior is out at a customer site and forgets their power cable. By unfortunate chance, the battery runs out while the cache is being written to (for any of a thousand reasons). Once the machine is plugged back in and powered on, SSSD starts up and sees that the cache canary is missing, so it deletes the cache and starts anew. Now the user cannot log in because their cached credentials are no longer there (and since they're sitting in a hotel somewhere far from a direct hookup to their company network, they can't get back in).
This... would not be a good thing.
Now, I can certainly see an argument for having such a nosync (or deferred sync) option for machines that are expected to always be connected to the identity network (and as such are using SSSD mostly for performance and surviving the occasional outage hiccup). But I'd say that such an option, if added, should be VERY carefully documented to explain all of the things that could go wrong.
As an aside, there are plenty of other things that can go wrong when the cache is deleted, including manual overrides from the sss_override command as well as ID ranges if any of them had hash collisions or were using the autorid compat mode.
On Wed, 27 Jan 2016, Stephen Gallagher wrote:
Now, I can certainly see an argument for having such a nosync (or deferred sync) option for machines that are expected to always be connected to the identity network [...]
I don't disagree with what you've said, and it's exactly this situation I'd be interested in. If they're a road warrior, I'd be much less likely to enable nosync.
As an aside, there are plenty of other things that can go wrong when the cache is deleted [...]
Sure, but none of this ends up being worse than using tmpfs, which we currently resort to in order to get acceptable performance. nosync with canary sounds like it can only be better in my situation.
jh
On 01/27/2016 09:21 AM, John Hodrien wrote:
[...]
Sure, but none of this ends up being worse than using tmpfs, which we currently resort to in order to get acceptable performance. nosync with canary sounds like it can only be better in my situation.
Actually, there is one slight difference: tmpfs won't persist across a reboot, but with the nosync-and-canary, it's possible that the cache could be destroyed during an SSSD package upgrade (for example).
Let's say that we introduced a bug and the canary doesn't get written in all cases (maybe we have a crash-on-shutdown bug somewhere). If you do a `yum|dnf update sssd`, this will restart SSSD as part of the process, to ensure that you are running the latest bits. If we crash during the shutdown, this restart might delete the cache unexpectedly.
On Wed, Jan 27, 2016 at 09:17:09AM -0500, Stephen Gallagher wrote:
[...]

The problem is that when the cache is deleted, it's not just the remote data that is lost. The cache also holds the cached credentials. [...]
Yes, but the majority of users who require this speed-up require it for use on the IPA server itself, for example.
[...]

But I'd say that such an option, if added, should be VERY carefully documented to explain all of the things that could go wrong.
Sure.
By the way, the other thing we've been talking about is to only write the entry when it actually changes. Most of the time, when we refresh an entry from the server, nothing changes. The idea would be to write only the dataExpireTimestamp and the other stamps to a separate ldb file that would be in nosync mode, and to do a full synchronous write of the data only when something actually changes. That way, if we lose the nosync database, we only lose timestamps.
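In shell terms, every refresh would make roughly this decision (purely a conceptual sketch with made-up file names, DN, and LDIF files; not how sysdb is really written):

    CACHE=/var/lib/sss/db/cache_example.com.ldb        # full data, synchronous
    STAMPS=/var/lib/sss/db/timestamps_example.com.ldb  # nosync, expendable

    if cmp -s fetched.ldif cached.ldif; then
        # entry unchanged: only bump the expiry stamp in the nosync db;
        # 5400s is the default entry_cache_timeout
        printf '%s\n' \
            'dn: name=someuser,cn=users,cn=example.com,cn=sysdb' \
            'changetype: modify' \
            'replace: dataExpireTimestamp' \
            "dataExpireTimestamp: $(( $(date +%s) + 5400 ))" |
        ldbmodify -H "$STAMPS"
    else
        # entry actually changed: full synchronous write of the data
        ldbmodify -H "$CACHE" < changes.ldif
    fi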
On Wed, Jan 27, 2016 at 10:24 AM, Jakub Hrozek <jhrozek@redhat.com> wrote:
By the way, the other thing we've been talking about is to only write the entry when it actually changes. [...] That way, if we lose the nosync database, we only lose timestamps.
FWIW, I think this is the best solution: it would greatly accelerate the vast majority of lookups (all lookups except the initial one, or when an entry changes), but it would sidestep the cache coherency complexity that would result if syncs were disabled.
In contrast, adding an option to adjust the sync behavior would get sssd into the business of ensuring cache consistency, which is very difficult to get right. (Filesystem and database designers spend a lot of time addressing these issues.)