Issues with SSSD cache on version 1.13.4


Beale (US), Gareth

21 Sep 2018

12:53 p.m.

We are running SUSE 12 SP3, which uses SSSD 1.13.4; I believe that is an LTM (long-term maintenance) version.

Due to the large number of users and groups in our LDAP directory, and the limitations of some legacy Unix systems, we have some large groups that have been broken into "sub-groups" with the same GID but an incremental suffix. I don't believe this is an uncommon solution, and it has worked fine for many years. There are efforts underway to patch some older systems such that they can handle very large groups so that we can collapse these sub-groups, but it is a slow process and there are a lot of servers.

Recently we upgraded some Linux systems to SUSE 12 SP3, and this has made us transition to using SSSD instead of configuring LDAP in /etc/ldap/conf. In the last few weeks we have encountered an issue related to these groups with the same GID. Most of the time, everything works as before; for instance, "getent group" commands using either the GID or the (sub-group) name return results. However, at times those commands return an empty list and the following error appears in the system log:

sssd[nss]: More groups have the same GID [nnnn] in directory server. SSSD will not work correctly.

(Group ID elided in this email per company policy.)

Using sss_cache to expire the entire cache, the group cache, or a specific group has no effect. I understand that this expires the entries rather than removing them, but subsequent getent calls do not overwrite what was there, and the error persists. Stopping SSSD, removing the cache DB and restarting was effective, but this is not a viable solution in production. Since the problem eventually clears itself (only to come back later), I tried various strategies, one of which was to do a "getent group" on every sub-group; this does clear the problem (until it returns).

Since I discovered this issue on SUSE, others in the company have verified that it also appears in RH 6 and 7. RH 7 is running 1.16.0, so the problem is still present up to that release, though the above error message does not appear in the messages log. Instead there is an error in the sssd_nss.log:

[sssd[nss]] [cache_req_search_cache] (0x0020): CR #1122: Multiple objects were found when only one was expected!

Gareth

Gareth Beale (bemsid: 45600) Enterprise High Performance Computing Service Application Infrastructure Services Global Information Technology Infrastructure Services Need help? http://iticket.web.boeing.com/secure/create.aspx?id=serverhpc / 425-234-0911


Simo Sorce

21 Sep 2018

1:07 p.m.

I am probably guilty of introducing this behavior in the original implementation, and although I believe it is the correct behavior for UIDs, it is probably suboptimal for GIDs. I think we should open an issue to deal with this in a better way if one is not open yet.

Simo.


sssd-users mailing list -- sssd-users@lists.fedorahosted.org To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.o...

-- Simo Sorce Sr. Principal Software Engineer Red Hat, Inc

gfbhwo＠yahoo.com

1:36 p.m.

So how does one determine if there is an issue open or not? And how do they get prioritized?

This issue makes SSSD unusable in our production environment. We may have to look at running without SSSD in the short term, if that is possible on SLES 12.

I'm not quite understanding why this error does not show up all the time. How do two entries with the same GID get into the cache?

For our case, say we have a set of groups abcd..1, abcd..2 etc, all with the same GID. I would expect the first lookup (e.g. abcd..1) to put an entry in the cache. If there is then a lookup by GID, (getent group <GID>) it would return this entry. However a lookup by name (e.g. abcd..2) would have to query LDAP, right? Then what happens, does this new data overwrite the old GID entry in the cache? Or is there some bug whereby sometimes a duplicate entry gets made? Why is there a check for duplicates when a GID is looked up as opposed to when an entry is placed in the cache?

Is there any kind of workaround for this? e.g. is there a way to exclude specific GIDs from being placed in the cache? This would make for inefficient lookups as they would have to hit the LDAP server, but it is a small percentage of the total number of groups.

Gareth

Jakub Hrozek

24 Sep 2018

4:38 a.m.

...

On 21 Sep 2018, at 20:36, gfbhwo@yahoo.com wrote:

For our case, say we have a set of groups abcd..1, abcd..2 etc, all with the same GID. I would expect the first lookup (e.g. abcd..1) to put an entry in the cache. If there is then a lookup by GID, (getent group <GID>) it would return this entry. However a lookup by name (e.g. abcd..2) would have to query LDAP, right? Then what happens, does this new data overwrite the old GID entry in the cache? Or is there some bug whereby sometimes a duplicate entry gets made? Why is there a check for duplicates when a GID is looked up as opposed to when an entry is placed in the cache?

I’m not so sure it would be a good idea to support this, honestly. What do you suggest would then be returned for lookups by GID (getgrgid 1234) if there are multiple entries with GID=1234 in the cache? Just let the first match win? I know this is what nss_ldap does: whatever is returned from LDAP is passed on to NSS. But I’m mostly concerned about consistency. Suppose one machine does getent group abcd..1 and another does getent group abcd..2. Then you get a different result on each machine for a by-GID request.

LDAP also doesn’t guarantee any ordering of results AFAIK (even though in practice I’ve seen that the replies are quite consistent), so you are not even guaranteed to always receive the same answer for the by-GID LDAP search.
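The consistency concern can be illustrated with a toy model. This is not SSSD's cache logic; it only assumes a per-machine cache in which the first group stored for a GID wins by-GID lookups. The class name and the member names are invented for the example.

```python
class FirstMatchCache:
    """Toy per-machine cache: the first group stored for a GID wins by-GID lookups."""

    def __init__(self, directory):
        self.directory = directory  # name -> (gid, members); stands in for LDAP
        self.by_name = {}
        self.by_gid = {}

    def getgrnam(self, name):
        if name not in self.by_name:
            gid, members = self.directory[name]
            self.by_name[name] = (name, gid, members)
            # setdefault keeps whichever entry reached the cache first
            self.by_gid.setdefault(gid, self.by_name[name])
        return self.by_name[name]

    def getgrgid(self, gid):
        return self.by_gid.get(gid)


directory = {
    "abcd..1": (1234, ["alice"]),
    "abcd..2": (1234, ["bob"]),
}

machine_a = FirstMatchCache(directory)
machine_b = FirstMatchCache(directory)
machine_a.getgrnam("abcd..1")
machine_b.getgrnam("abcd..2")

# Same directory, same GID, different answers depending on lookup history.
print(machine_a.getgrgid(1234)[0])  # abcd..1
print(machine_b.getgrgid(1234)[0])  # abcd..2
```

Under this assumption the by-GID answer depends purely on which sub-group each machine happened to look up first.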

btw it’s a good question to ask why the check isn’t done on saving the group. I thought it was, and I see code that checks for ID uniqueness and even a test.

Beale (US), Gareth

9:12 a.m.

...

I’m not so sure it would be a good idea to support this, honestly.

Well that rather depends on what you mean by "this". I was reporting a problem that seemed an inconsistency to me. Either multiple groups with the same GID are supported, or they aren't. The current implementation is inconsistent in its response over time, and it flags an error and then fails - that should not happen in either scenario.

I think one has to be careful in applying rules that haven't been applied in the past. Since using GIDs in this way is a "hack" but has been used in the past, albeit with the expectation that some corners don't work properly, changing the behaviour needs to be carefully thought out. The example case you state does produce differing results with nss_ldap, so should that really change? I suspect it is the way it is because there's no clear "right" way to handle it. For the most part, an /etc/group file or LDAP directory doesn't prevent duplicate GIDs, so should that restriction really be applied by the caching service?

In practical terms, since the first lookup results depend on LDAP, SSSD doesn't even know there are multiple groups with the same GID, so I don't think it even can reject looking up such a group. Arguably, a lookup by GID should return ALL entries with that GID and put them as a single entry in the cache, even if they have different names. This of course is behavior different from nss_ldap, and who knows if you can even ask LDAP for it?

It looks like you have established that some kind of checking is done, but nonetheless SSSD is reporting a duplicate entry. So either there's a path through the code that bypasses the checks, or the error report itself is in error. I have not been able to establish which. The ldbsearch utility reports zero entries for the GID, and indeed zero entries if I try to list "all". Is there another way to query the SSS DB? I think tdbtools can do it, but the utility doesn't have the fine-grained queries that ldbsearch seems to.

Gareth


Michael Ströder

9:40 a.m.

On 9/24/18 4:12 PM, Beale (US), Gareth wrote:

...

...
I’m not so sure it would be a good idea to support this, honestly.

Well that rather depends on what you mean by "this". I was reporting a problem that seemed an inconsistency to me. Either multiple groups with the same GID are supported, or they aren't. The current implementation is inconsistent in its response over time, and it flags an error and then fails - that should not happen in either scenario.

You're absolutely right that the sssd behaviour you've observed is inconsistent.

That's why Jakub Hrozek wrote:

...

btw it’s a good question to ask why the check isn’t done on saving the group. I thought it was, and I see code that checks for ID uniqueness and even a test.

So for me it boils down to: Multiple group entries with same GID are not supported in sssd and should never be added to the cache. Why it happened in your case has to be examined.

...

I think one has to be careful in applying rules that haven't been applied in the past. Since using GIDs in this way is a "hack" but has been used in the past, albeit with the expectation that some corners don't work properly, changing the behaviour needs to be carefully thought out. The example case you state does produce differing results with nss_ldap, so should that really change?

The question here is whether the behaviour of nss_ldap shall be considered as the standard reference. Personally I don't think so.

Looking at RFC 2307:

( nisSchema.1.1 NAME 'gidNumber'
  DESC 'An integer uniquely identifying a group in an administrative domain'
  EQUALITY integerMatch SYNTAX 'INTEGER' SINGLE-VALUE )

The wording in DESC implies that Luke Howard assumed 'gidNumber' to be unique even though he did not enforce this when implementing nss_ldap.
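Whether a given host actually sees non-unique gidNumber values can be checked with a small script. This is only a sketch: it inspects whatever NSS returns through Python's standard grp module, and the helper name find_duplicate_gids is made up for illustration.

```python
import grp
from collections import defaultdict


def find_duplicate_gids(groups):
    """Return {gid: sorted group names} for every GID used by more than one group."""
    names_by_gid = defaultdict(set)
    for entry in groups:
        names_by_gid[entry.gr_gid].add(entry.gr_name)
    return {gid: sorted(names) for gid, names in names_by_gid.items() if len(names) > 1}


if __name__ == "__main__":
    # grp.getgrall() enumerates whatever NSS exposes; with SSSD that may require
    # enumeration to be enabled, so treat the result as a partial view.
    for gid, names in sorted(find_duplicate_gids(grp.getgrall()).items()):
        print(gid, ", ".join(names))
```

Running the same check against an LDAP export would be more complete, since enumeration through NSS may not list every directory group.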

After thinking more about this I'd consider this inconsistency to be a real security risk, maybe not in your particular deployment, but in general. So I'd never trust an NSS implementation like this.

...

I suspect it is the way it is because there's no clear "right" way to handle it.

I think there is a right way.

Ciao, Michael.

Jakub Hrozek

1:01 p.m.

On Mon, Sep 24, 2018 at 04:40:34PM +0200, Michael Ströder wrote:


You're absolutely right that the sssd behaviour you've observed is inconsistent.

Yes, I think it's a bug in SSSD. We should either fail right away or permit the duplicates.

Would either of you care to file a bug? :)

...

That's why Jakub Hrozek wrote:

...
btw it’s a good question to ask why the check isn’t done on saving the group. I thought it was, and I see code that checks for ID uniqueness and even a test.

So for me it boils down to: Multiple group entries with same GID are not supported in sssd and should never be added to the cache. Why it happened in your case has to be examined.

Yes, this is what I meant.

Simo Sorce

9:22 a.m.

On Mon, 2018-09-24 at 11:38 +0200, Jakub Hrozek wrote:


I’m not so sure it would be a good idea to support this, honestly. What do you suggest would then be returned for lookups by GID (getgrgid 1234) if there are multiple entries with GID=1234 in the cache? Just let the first match win? I know this is what nss_ldap does: whatever is returned from LDAP is passed on to NSS. But I’m mostly concerned about consistency. Suppose one machine does getent group abcd..1 and another does getent group abcd..2. Then you get a different result on each machine for a by-GID request.

For groups I would expect us to merge memberships in rfc2307 mode, keeping the alphabetically "smaller" name as the group name for predictability. For RFC 2307bis it may be a little harder because of nested membership, which needs a bit more thought. Maybe allow it only for RFC 2307 trees?
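A minimal sketch of the merging idea, assuming flat rfc2307-style entries given as (name, gid, members) tuples. The function name and the sample users are hypothetical, and this is not an existing SSSD feature.

```python
def merge_same_gid(entries):
    """Merge rfc2307-style groups sharing a GID into one entry per GID.

    entries: iterable of (name, gid, members). The alphabetically smallest
    name is kept and the member lists are unioned (returned sorted).
    """
    merged = {}
    for name, gid, members in entries:
        if gid not in merged:
            merged[gid] = (name, set(members))
        else:
            kept_name, kept_members = merged[gid]
            merged[gid] = (min(kept_name, name), kept_members | set(members))
    return {gid: (name, sorted(members)) for gid, (name, members) in merged.items()}


split = [
    ("abcd..2", 1234, ["carol", "bob"]),
    ("abcd..1", 1234, ["alice", "bob"]),
    ("staff", 50, ["dave"]),
]
print(merge_same_gid(split)[1234])  # ('abcd..1', ['alice', 'bob', 'carol'])
```

Because the kept name and the member union do not depend on the order in which entries arrive, every machine would compute the same result from the same directory contents.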

...

LDAP also doesn’t guarantee any ordering of results AFAIK (even though in practice I’ve seen that the replies are quite consistent), so you are not even guaranteed to always receive the same answer for the by-GID LDAP search.

btw it’s a good question to ask why the check isn’t done on saving the group. I thought it was, and I see code that checks for ID uniqueness and even a test.

In current code, saving would override data as if the group had been renamed, I think?

Simo.


Michael Ströder

9:44 a.m.

On 9/24/18 4:22 PM, Simo Sorce wrote:

...

For groups I would expect us to merge memberships in rfc2307 mode,

If you really want to implement such merging then please disable it by default, so that it must be explicitly enabled after careful consideration.

Ciao, Michael.

Simo Sorce

10:46 a.m.

On Mon, 2018-09-24 at 16:44 +0200, Michael Ströder wrote:


If you really want to implement such merging then please disable it by default. So that it must be explicitly enabled after careful consideration.

Yes it would have to be optional and disabled by default, we do not want to promote bad practices.

What we can do to make the code more predictable (albeit slower) is to always "reverse resolve" by gid (and by name) whenever a search by name (or by gid) is performed, so duplicates are always dealt with consistently: either keep only the first in alphabetic order, or always refuse to accept a group with a duplicate gid (or name).

This check can be optimized on servers that support dereference controls.
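The "reverse resolve" idea might look roughly like this. It is a sketch under stated assumptions: search_by_name and search_by_gid stand in for directory searches, and the first_wins switch and the exception name are invented here, not SSSD options.

```python
class DuplicateGidError(Exception):
    """Raised when the directory holds several groups with the same GID."""


def resolve_group_by_name(name, search_by_name, search_by_gid, first_wins=False):
    """Look a group up by name, then reverse-resolve its GID before accepting it.

    search_by_name(name) -> (name, gid) or None
    search_by_gid(gid)   -> list of (name, gid) for every entry with that GID
    """
    entry = search_by_name(name)
    if entry is None:
        return None
    candidates = sorted(search_by_gid(entry[1]))
    if len(candidates) <= 1:
        return entry
    if first_wins:
        # Deterministic choice: alphabetically first name, whatever was asked for.
        return candidates[0]
    raise DuplicateGidError(f"GID {entry[1]} is used by {[c[0] for c in candidates]}")


directory = [("abcd..1", 1234), ("abcd..2", 1234), ("staff", 50)]


def by_name(name):
    return next((e for e in directory if e[0] == name), None)


def by_gid(gid):
    return [e for e in directory if e[1] == gid]


print(resolve_group_by_name("abcd..2", by_name, by_gid, first_wins=True))  # ('abcd..1', 1234)
```

Either policy gives the same outcome regardless of which sub-group is requested first, at the cost of one extra search per lookup.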

Simo.


Beale (US), Gareth

11:23 a.m.

It's a good discussion, and I don't necessarily disagree with your thoughts on groups with the same GID, despite the nss_ldap implementation. I'd agree that any change in behavior should have an option to revert to previous but that it shouldn't be the default.

However, is the issue with duplicate entries in the cache an artifact of this problem, or is it unrelated? If the latter then it should be a separate bug report/fix that could then be worked on ahead of any kind of agreement on duplicate GIDs.

I suspect that whatever the decision, it's not going to happen in any kind of short time frame. What I will probably have to do is revert to a non-SSSD configuration in SLES 12 to get things working as before until we can get the split groups aligned into single entries.

Gareth


Jakub Hrozek

12:55 p.m.

On Mon, Sep 24, 2018 at 11:46:08AM -0400, Simo Sorce wrote:


What we can do to make the code more predictable (albeit slower) is to always "reverse resolve" by gid (and by name) whenever a search by name (or by gid) is performed, so duplicates are always dealt with consistently: either keep only the first in alphabetic order, or always refuse to accept a group with a duplicate gid (or name).

btw this is what the proxy provider does (why only the proxy provider, I don't know; maybe because there we don't have any other means to detect what kind of an object this is, like the original DN).

Jakub Hrozek

12:59 p.m.

On Mon, Sep 24, 2018 at 10:22:35AM -0400, Simo Sorce wrote:

...

...
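btw it’s a good question to ask why the check isn’t done on saving the group. I thought it was, and I see code that checks for ID uniqueness and even a test.

In current code, saving would override data as if the group had been renamed, I think?

The way the code is currently written is, if there is a duplicate:

- check if the "new" group has the same SID, uniqueID or original DN as the "old" one
- yes, same: this is a rename, allow
- no, different: this is a duplicate, error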

Simo Sorce

1:25 p.m.

On Mon, 2018-09-24 at 19:59 +0200, Jakub Hrozek wrote:


The way the code is currently written is, if there is a duplicate:

- check if the "new" group has the same SID, uniqueID or original DN as the "old" one
- yes, same: this is a rename, allow
- no, different: this is a duplicate, error

Not sure how the original DN would match if you rename the object and that changes the DN too?

Simo.


Jakub Hrozek

25 Sep 2018

1:40 a.m.

...

On 24 Sep 2018, at 20:25, Simo Sorce simo@redhat.com wrote:

On Mon, 2018-09-24 at 19:59 +0200, Jakub Hrozek wrote:


The way the code is currently written is, if there is a duplicate:

- check if the "new" group has the same SID, uniqueID or original DN as the "old" one
- yes, same: this is a rename, allow
- no, different: this is a duplicate, error

Not sure how the original DN would match if you rename the object and that changes the DN too?

Yes, in that case, the rename would throw an error. The originalDN handling was meant to support renames in the case the RDN doesn’t contain the name and therefore doesn’t change.
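The save-time check discussed here can be sketched roughly as follows. This illustrates the described decision flow and is not SSSD's sysdb code; the Group dataclass, its field names and the example DNs are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Group:
    name: str
    gid: int
    sid: Optional[str] = None
    unique_id: Optional[str] = None
    original_dn: Optional[str] = None


def same_object(old: Group, new: Group) -> bool:
    """True if any persistent identifier present on both entries matches."""
    for attr in ("sid", "unique_id", "original_dn"):
        old_val, new_val = getattr(old, attr), getattr(new, attr)
        if old_val is not None and old_val == new_val:
            return True
    return False


def check_save(cached: Optional[Group], new: Group) -> str:
    """Classify saving `new` when `cached` already holds the same GID."""
    if cached is None or cached.name == new.name:
        return "save"
    return "rename" if same_object(cached, new) else "duplicate"


old = Group("abcd..1", 1234, original_dn="cn=abcd..1,ou=groups,dc=example,dc=com")
# A plain-LDAP rename whose RDN carries the name changes the DN as well,
# so nothing matches and the rename is reported as a duplicate.
renamed = Group("abcd-one", 1234, original_dn="cn=abcd-one,ou=groups,dc=example,dc=com")
print(check_save(old, renamed))  # duplicate
```

The last case is the one raised above: without a SID or uniqueID, a rename that also changes the DN cannot be told apart from a genuine duplicate.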

This is honestly something where I don’t know what is the right thing to do. If we detect that a group with some GID already exists, then how do we distinguish between “err, there are duplicates on the LDAP side” and “look, the group was renamed” without any persistent identifier like a SID? At the point the code was changed the last time we also thought about returning an error code from the sysdb save operation and then running an LDAP search by GID before deciding about rename or duplicate. But we also thought that a) a rename was not a very common operation and b) because IPA and especially AD form the bulk of the deployments, we could get away with the SID or uniqueID helper.

Of course, if more people complain about group renames with a “plain LDAP” server, then we should implement the more complex logic, but my take at the time was that SSSD is complex as it is already, so I didn’t think it was worth adding a complex logic for a corner case.

Michael Ströder

1:57 a.m.

On 9/25/18 8:40 AM, Jakub Hrozek wrote:

...

This is honestly something where I don’t know what is the right thing to do. If we detect that a group with some GID already exists, then how do we distinguish between “err, there are duplicates on the LDAP side” and “look, the group was renamed” without any persistent identifier like a SID?

I fully agree this is a can of worms.

...

Of course, if more people complain about group renames with a “plain LDAP” server,

Some LDAP servers, e.g. OpenLDAP and IIRC OpenDJ, also implement 'entryUUID' [RFC 4530].

But still so many things can go wrong:

- entryUUID not visible for sssd
- Other client components using the same groups not prepared for all that
- ...

Ciao, Michael.

Simo Sorce

9:19 a.m.

On Tue, 2018-09-25 at 08:40 +0200, Jakub Hrozek wrote:


This is honestly something where I don’t know what is the right thing to do. If we detect that a group with some GID already exists, then how do we distinguish between “err, there are duplicates on the LDAP side” and “look, the group was renamed” without any persistent identifier like a SID?

We can do an additional search (or always do dereference searches) for the gidNumber, and if only one entry comes back then the old entry is gone and the new one is what we should keep.
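A rough sketch of that additional search, assuming a search_by_gid callable that returns the names the directory currently holds for a gidNumber. The function name and the "inconsistent" fallback are illustrative additions, not something proposed in the thread.

```python
def classify_gid_conflict(cached_name, new_name, gid, search_by_gid):
    """Decide what a GID clash between a cached and a freshly fetched group means.

    search_by_gid(gid) -> list of group names the directory currently holds for gid.
    """
    if cached_name == new_name:
        return "unchanged"
    names = set(search_by_gid(gid))
    if names == {new_name}:
        # Only the new entry exists: the old one is gone, keep the new one.
        return "rename"
    if cached_name in names and new_name in names:
        return "duplicate"
    return "inconsistent"  # directory changed between searches; retry later


print(classify_gid_conflict("abcd..1", "abcd-one", 1234, lambda gid: ["abcd-one"]))  # rename
```

This needs no persistent identifier at all, which is why it would also cover plain LDAP servers, at the price of an extra round trip whenever a clash is detected.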

...

At the point the code was changed the last time we also thought about returning an error code from the sysdb save operation and then running an LDAP search by GID before deciding about rename or duplicate.

Yes.

...

But we also thought that a) a rename was not a very common operation and b) because IPA and especially AD form the bulk of the deployments, we could get away with the SID or uniqueID helper.

Yes, we can have a function that does various optimizations (entryUUID, SID, IPAUniqueID) or use ASQ/dereference controls to always get back a sub-search for gidNumber.

...

Of course, if more people complain about group renames with a “plain LDAP” server, then we should implement the more complex logic, but my take at the time was that SSSD is complex as it is already, so I didn’t think it was worth adding a complex logic for a corner case.

The problem is that when the corner case happens you have incorrect results, so we should handle it more gracefully in any case.

Simo.


Beale (US), Gareth

24 Sep 2018

1:52 p.m.

...

The way the code is currently written is, if there is a duplicate:

- check if the "new" group has the same SID, uniqueID or original DN as the "old" one
- yes, same: this is a rename, allow
- no, different: this is a duplicate, error

I'm not clear on the start of this flow - what is meant by "if there is a duplicate"?

What I see on the affected system is e.g.:

getent group abcd..1
abcd..1 :*:1234:<userlist for abcd..1>
getent group 1234
(returns same entry as for abcd..1)

Oddly, if I then:

getent group abcd..2
abcd..2 :*:1234:<userlist for abcd..2>
getent group 1234
(returns same entry as for abcd..1 - not abcd..2)

However, at some point the cache gets into a state whereby:

getent group 1234
(returns empty result and also the duplicate GID error message in system log)

A subsequent "getent group abcd..N" will also generally return the empty result. However, if I script a getent of every suffixed group, each time followed by a getent of the GID, eventually it "kicks loose" and reverts to the initial state. It doesn't last very long, however. General system activity seems to return it to the "stuck cache" before too long. Since we have multiple split groups, this can be happening simultaneously for multiple groups.
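The scripted probe described above could look something like this, using Python's standard grp module in place of repeated getent calls. The function name is made up, and the group names and GID are the placeholders from the example, not real values.

```python
import grp


def probe_split_group(names, gid, by_name=grp.getgrnam, by_gid=grp.getgrgid):
    """Look up each sub-group by name, then the shared GID, recording what came back.

    Returns a list of (sub_group_name, name_found, name_returned_for_gid); the last
    field is None when the by-GID lookup failed.
    """
    results = []
    for name in names:
        try:
            by_name(name)
            found = True
        except KeyError:
            found = False
        try:
            gid_name = by_gid(gid).gr_name
        except KeyError:
            gid_name = None
        results.append((name, found, gid_name))
    return results


if __name__ == "__main__":
    # Placeholder names and GID taken from the example above; substitute real ones.
    for row in probe_split_group(["abcd..1", "abcd..2"], 1234):
        print(row)
```

Logging the rows over time would show when the by-GID lookup flips from a name to None, which may help correlate the "stuck" state with other system activity.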

Gareth


Sumit Bose

25 Sep 2018

1:31 a.m.

On Mon, Sep 24, 2018 at 06:52:50PM +0000, Beale (US), Gareth wrote:


What I see on the affected system is e.g.:

getent group abcd..1
abcd..1 :*:1234:<userlist for abcd..1>
getent group 1234
(returns same entry as for abcd..1)

Oddly, if I then:

getent group abcd..2
abcd..2 :*:1234:<userlist for abcd..2>
getent group 1234
(returns same entry as for abcd..1 - not abcd..2)

This is most probably returned from the memory cache. If you call

SSS_NSS_USE_MEMCACHE=no getent group 1234

I would expect you to always see the empty result after 'getent group abcd..2' is called, because the request will then go directly to the SSSD nss responder, where the duplicate GID is detected.
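For scripted checks, the same environment variable can be set per call. A minimal sketch, assuming getent is on the PATH; the helper name and the injectable run parameter are inventions for testability.

```python
import os
import subprocess


def getent_group_uncached(key, run=subprocess.run):
    """Run `getent group <key>` with the SSSD client memory cache bypassed.

    Returns getent's stdout with surrounding whitespace removed; an empty
    string means nothing was returned.
    """
    env = dict(os.environ, SSS_NSS_USE_MEMCACHE="no")
    proc = run(["getent", "group", str(key)], env=env,
               capture_output=True, text=True, check=False)
    return proc.stdout.strip()
```

Calling getent_group_uncached(1234) and comparing it with a plain getent run would show whether the memory cache is masking the responder's answer.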

bye, Sumit
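The comparison suggested above could be scripted roughly as follows. It is a sketch that relies only on `getent` and the `SSS_NSS_USE_MEMCACHE=no` setting mentioned in this thread; the function name `compare_group_lookup` and the three result labels are invented for the example.

```shell
#!/bin/sh
# Sketch: run the same group lookup twice, once with the memory cache
# allowed and once forced through the SSSD nss responder, then compare.

# compare_group_lookup KEY   (KEY is a group name or a GID)
# Prints one of these illustrative labels:
#   same           both lookups returned identical output
#   memcache-only  an entry came back only when the memory cache was allowed
#   different      both returned something, but not the same entry
compare_group_lookup() {
    key=$1
    cached=$(getent group "$key") || true
    direct=$(SSS_NSS_USE_MEMCACHE=no getent group "$key") || true
    if [ "$cached" = "$direct" ]; then
        echo same
    elif [ -z "$direct" ]; then
        echo memcache-only
    else
        echo different
    fi
}

# Usage with the placeholder GID from the thread:
#   compare_group_lookup 1234
```

A "memcache-only" result right after looking up a second sub-group would be consistent with the explanation above: the entry seen earlier came from the memory cache, while the responder itself rejects the duplicate GID.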

...

However, at some point the cache gets into a state whereby:

getent group 1234
(returns empty result and also the duplicate GID error message in system log)

A subsequent "getent group abcd..N" will also generally return the empty result. However, if I script a getent of every suffixed group, each time followed by a getent of the GID, eventually it "kicks loose" and reverts to the initial state. It doesn't last very long, however. General system activity seems to return it to the "stuck cache" state before too long. Since we have multiple split groups, this can be happening simultaneously for multiple groups.

Gareth
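The scripted workaround described above might look something like this. It is a sketch of the idea only, not a recommended fix: the function name `refresh_split_group` is hypothetical, and the real sub-group names and GID must be passed in, since they are elided in this thread.

```shell
#!/bin/sh
# Sketch of the workaround: look up each sub-group by name, then the
# shared GID, and stop as soon as the GID lookup returns an entry again.

# refresh_split_group GID NAME...
# Prints the entry returned for GID and returns 0 once it resolves;
# returns 1 if every name was tried and the GID lookup is still empty.
refresh_split_group() {
    gid=$1
    shift
    for name in "$@"; do
        # The by-name lookup is done only for its effect on the cache.
        getent group "$name" >/dev/null 2>&1 || true
        if entry=$(getent group "$gid") && [ -n "$entry" ]; then
            printf '%s\n' "$entry"
            return 0
        fi
    done
    return 1
}

# Usage with the placeholder names from the thread:
#   refresh_split_group 1234 'abcd..1' 'abcd..2'
```

Since the thread reports that the problem returns after a while, a script like this only masks the symptom until the next time the cache gets stuck.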

-----Original Message-----
From: Jakub Hrozek [mailto:jhrozek@redhat.com]
Sent: Monday, September 24, 2018 10:59 AM
To: sssd-users@lists.fedorahosted.org
Subject: [SSSD-users] Re: Issues with SSSD cache on version 1.13.4

On Mon, Sep 24, 2018 at 10:22:35AM -0400, Simo Sorce wrote:

...
...
Btw, it's a good question to ask why the check isn't done when saving the group. I thought it was, and I see code that checks for ID uniqueness and even a test.

In the current code, saving would override the data as if the group was renamed or changed, I think?

The way the code is currently written is, if there is a duplicate:

- check if the "new" group has the same SID, uniqueID or original DN as the "old" one
- yes, same: this is a rename, allow
- no, different: this is a duplicate, error

Michael Ströder

21 Sep

3:30 p.m.

On 9/21/18 7:53 PM, Beale (US), Gareth wrote:

...

Due to the large number of users and groups in our LDAP directory, and the limitations of some legacy Unix systems, we have some large groups that have been broken into “sub-groups” with the same GID but an incremental suffix.

I'd consider this to be broken data.

...

I don’t believe this is an uncommon solution,

Frankly I never saw this. Personally I'd consider this to be rather uncommon.

...

and it has worked fine for many years.

Your systems really handled full group lookups by GID correctly? How?

Ciao, Michael.

gfbhwo＠yahoo.com

3:53 p.m.

On 9/21/18 7:53 PM, Beale (US), Gareth wrote:

...

I'd consider this to be broken data.

You are entitled to your opinion. It is a hack, but it has worked for a long time as a workaround to deficiencies in services like NIS, and legacy Unix systems.

I don't believe this is an uncommon solution,

Frankly I never saw this. Personally I'd consider this to be rather uncommon.

Your mileage may vary.

and it has worked fine for many years.

Your systems really handled full group lookups by GID correctly? How?

Ciao, Michael.

Lookup by GID would likely return the most recently cached group with that GID (though that isn't the case with SSSD strangely). So a manual lookup by GID to find a user might not return the right result, but it doesn't appear to be how things work for most system utilities (groups, id etc.).

I'm really looking for some assistance on this thread. I'm aware that opinions may vary, but the bottom line is that we are seeing errors and lookup failures that didn't happen before SSSD was inserted in front of LDAP. And the failures do not happen consistently.

Also if having duplicate GIDs in the cache is an error, how did they get there in the first place? Clearly things are not working the way they should.

sssd-users@lists.fedorahosted.org

participants (6)

Beale (US), Gareth
gfbhwo＠yahoo.com
Jakub Hrozek
Michael Ströder
Simo Sorce
Sumit Bose