How often does ldap cache clear?

List overview All Threads
Download

newer

older

Sssd group id mismatch

Re: SSSD user mailing list:...

Lachlan Musicman

17 Aug 2017 17 Aug '17

12:36 a.m.

We use FreeIPA/SSSD to authenticate our RStudio Server, which we control via HBAC membership of an AD group.

Our users are having their sessions ended frequently - once a day or more - with the logged message

17 Aug 2017 05:16:21 [rserver] WARNING User <user>@<domain> could not be authenticated because they do not belong to one of the required groups (rstudio); LOGGED FROM: bool rstudio::server::auth::validateUser(const std::string&, const std::string&, unsigned int, bool) /root/rstudio-pro/src/cpp/server/auth/ServerValidateUser.cpp:103

Most likely this is partially because RStudio server is overly aggressive, but I am also noticing that their log is telling the truth:

id <user>@<domain>

is not returning the full membership set of the user - in particular the user group overrides are not being registered. IE, I can see that <user> is in the appropriate AD group, but the IPA group that overrides it isn't being reported.

And hence the user is getting booted.

So, two questions:

1. Why is the group override not working and how can I get it working or change our set up so that it does work

2. If this is because users's are being timed out of the sss db cache (/var/lib/sss/db/cache_<domain>.ldb ), how can I set the cache refresh to a much much longer period?

cheers L.

------ "The antidote to apocalypticism is *apocalyptic civics*. Apocalyptic civics is the insistence that we cannot ignore the truth, nor should we panic about it. It is a shared consciousness that our institutions have failed and our ecosystem is collapsing, yet we are still here — and we are creative agents who can shape our destinies. Apocalyptic civics is the conviction that the only way out is through, and the only way through is together. "

*Greg Bloom* @greggish https://twitter.com/greggish/status/873177525903609857

Attachments:

attachment.html (text/html — 2.5 KB)

Show replies by date

Jakub Hrozek

18 Aug 18 Aug

2:33 a.m.

On Thu, Aug 17, 2017 at 03:36:20PM +1000, Lachlan Musicman wrote:

...

We use FreeIPA/SSSD to authenticate our RStudio Server, which we control via HBAC membership of an AD group.

Our users are having their sessions ended frequently - once a day or more - with the logged message

17 Aug 2017 05:16:21 [rserver] WARNING User <user>@<domain> could not be authenticated because they do not belong to one of the required groups (rstudio); LOGGED FROM: bool rstudio::server::auth::validateUser(const std::string&, const std::string&, unsigned int, bool) /root/rstudio-pro/src/cpp/server/auth/ServerValidateUser.cpp:103

Most likely this is partially because RStudio server is overly aggressive, but I am also noticing that their log is telling the truth:

id <user>@<domain>

is not returning the full membership set of the user - in particular the user group overrides are not being registered. IE, I can see that <user> is in the appropriate AD group, but the IPA group that overrides it isn't being reported.

And hence the user is getting booted.

So, two questions:

Why is the group override not working and how can I get it working or

change our set up so that it does work

Could you please describe how you set up the group membership with the override so that we could set up a similar environment locally?

...

If this is because users's are being timed out of the sss db cache

(/var/lib/sss/db/cache_<domain>.ldb ), how can I set the cache refresh to a much much longer period?

During login, the group membership should always be fetched again from the server, so the cache should effectively be ignored, precisely so that we want the group membership to be very precise during login. The only additional cache might be the sssd cache for the AD domain data, because the identity data of the AD users are fetched from the IPA server.

But unless your group memberships or overrides are changing a lot, this shouldn't be an issue.

Lachlan Musicman

20 Aug 20 Aug

7:24 p.m.

On 18 August 2017 at 17:33, Jakub Hrozek jhrozek@redhat.com wrote:

...

On Thu, Aug 17, 2017 at 03:36:20PM +1000, Lachlan Musicman wrote:

...
We use FreeIPA/SSSD to authenticate our RStudio Server, which we control via HBAC membership of an AD group.

...

...

Why is the group override not working and how can I get it working or

change our set up so that it does work

Could you please describe how you set up the group membership with the override so that we could set up a similar environment locally?

Users that are allowed to use the system belong to an AD group called bioinf_rstudio

In IPA (Centos 7.3, IPA v 4.4.0.14.el7 API: 2.213) there is

- an external group called ad_rstudio, with an external member, bioinf_rstudio@<ad-domain> - a posix group called rstudio that has the external group ad_rstudio as a user group member - it has a single HBAC rule associated, called rstudio_access

- rstudio_access allows users in the (posix) rstudio group access to the single server rstudio@<unix-domain> with the services login, rstudio and sshd.

The rstudio services has nothing else going on - it's just a label.

On the rstudio@<unix-domain> server we have a single file called /etc/pam.d/rstudio which contains

#%PAM-1.0 auth required pam_sss.so

account required pam_sss.so

...

If this is because users's are being timed out of the sss db cache

...
(/var/lib/sss/db/cache_<domain>.ldb ), how can I set the cache refresh

to a

...
much much longer period?

During login, the group membership should always be fetched again from the server, so the cache should effectively be ignored, precisely so that we want the group membership to be very precise during login. The only additional cache might be the sssd cache for the AD domain data, because the identity data of the AD users are fetched from the IPA server.

But unless your group memberships or overrides are changing a lot, this shouldn't be an issue.

Hmmm. Weird. We are still seeing the "AD group not reflected in cache" problem and am not seeing evidence of SSSD updating from the IPA server on request (via login from other machine, via id command).

We have debug_level = 7 in [pam] and [domain/loremipsum], I have now added to [sssd] and [ssh] and will restart.

Is there anything I should be looking out for?

We are using sssd 1.15.3 from COPR for Centos 7.3

cheers L.

*Greg Bloom* @greggish https://twitter.com/greggish/status/873177525903609857

Jakub Hrozek

21 Aug 21 Aug

9:46 a.m.

On Mon, Aug 21, 2017 at 10:24:50AM +1000, Lachlan Musicman wrote:

...

On 18 August 2017 at 17:33, Jakub Hrozek jhrozek@redhat.com wrote:

...
On Thu, Aug 17, 2017 at 03:36:20PM +1000, Lachlan Musicman wrote:

...
We use FreeIPA/SSSD to authenticate our RStudio Server, which we control via HBAC membership of an AD group.

...

...

Why is the group override not working and how can I get it working or

change our set up so that it does work

Could you please describe how you set up the group membership with the override so that we could set up a similar environment locally?

Users that are allowed to use the system belong to an AD group called bioinf_rstudio

In IPA (Centos 7.3, IPA v 4.4.0.14.el7 API: 2.213) there is

an external group called ad_rstudio, with an external member,

bioinf_rstudio@<ad-domain>

a posix group called rstudio that has the external group ad_rstudio as a

user group member

it has a single HBAC rule associated, called rstudio_access

rstudio_access allows users in the (posix) rstudio group access to the

single server rstudio@<unix-domain> with the services login, rstudio and sshd.

The rstudio services has nothing else going on - it's just a label.

On the rstudio@<unix-domain> server we have a single file called /etc/pam.d/rstudio which contains

#%PAM-1.0 auth required pam_sss.so

account required pam_sss.so

...

If this is because users's are being timed out of the sss db cache

...
(/var/lib/sss/db/cache_<domain>.ldb ), how can I set the cache refresh

to a

...
much much longer period?

During login, the group membership should always be fetched again from the server, so the cache should effectively be ignored, precisely so that we want the group membership to be very precise during login. The only additional cache might be the sssd cache for the AD domain data, because the identity data of the AD users are fetched from the IPA server.

But unless your group memberships or overrides are changing a lot, this shouldn't be an issue.

Hmmm. Weird. We are still seeing the "AD group not reflected in cache" problem and am not seeing evidence of SSSD updating from the IPA server on request (via login from other machine, via id command).

We have debug_level = 7 in [pam] and [domain/loremipsum], I have now added to [sssd] and [ssh] and will restart.

Is there anything I should be looking out for?

The only other thing I can think of is the group scope. Please make sure the group is not domain-local, other scopes will do.

Lachlan Musicman

5:33 p.m.

On 22 August 2017 at 00:46, Jakub Hrozek jhrozek@redhat.com wrote:

...

On Mon, Aug 21, 2017 at 10:24:50AM +1000, Lachlan Musicman wrote:

...
On 18 August 2017 at 17:33, Jakub Hrozek jhrozek@redhat.com wrote:

Hmmm. Weird. We are still seeing the "AD group not reflected in cache" problem and am not seeing evidence of SSSD updating from the IPA server

on

...
request (via login from other machine, via id command).

We have debug_level = 7 in [pam] and [domain/loremipsum], I have now

added

...
to [sssd] and [ssh] and will restart.

Is there anything I should be looking out for?

The only other thing I can think of is the group scope. Please make sure the group is not domain-local, other scopes will do.

Hmmm. No, the group is Universal, so that's ruled out. I'm really interested in this one. I wonder if it's because the ipa in CentOS is so ancient?

I added a user override yesterday and it's not noticably changed at all - her account still id's with the AD UID (1750704475) instead of the local UID number (5685), there's no recognition that her account has been added to the appropriate group.

I have tried all of the following without success:

sss_cache -u <username> systemctl stop sssd; sss_cache -u <username>; systemctl start sssd; systemctl stop sssd; sss_cache -E; systemctl start sssd;

I know that I can

systemctl stop sssd; rm -rf /var/lib/sss/db/*; systemctl start sssd;

I also know that this will *work*.

Unfortunately, we are finding with this set up (HBAC, pam.d, RStudio server) that it kills all active sessions, which is hard to manage on a high use node. ("Users get grumpy").

Also, I saw you mention recently that you wish people didn't do it, so I'm taking this opportunity to provide a debug opportunity - how do we make sss_cache work?

cheers L.

*Greg Bloom* @greggish https://twitter.com/greggish/status/873177525903609857

Lachlan Musicman

11:48 p.m.

On 22 August 2017 at 08:33, Lachlan Musicman datakid@gmail.com wrote:

...

On 22 August 2017 at 00:46, Jakub Hrozek jhrozek@redhat.com wrote:

...
On Mon, Aug 21, 2017 at 10:24:50AM +1000, Lachlan Musicman wrote:

...
On 18 August 2017 at 17:33, Jakub Hrozek jhrozek@redhat.com wrote:

Hmmm. Weird. We are still seeing the "AD group not reflected in cache" problem and am not seeing evidence of SSSD updating from the IPA server

on

...
request (via login from other machine, via id command).

Just to follow up on this, I've done a comparative test.

On a machine in which <user> comes up correctly (all groups, properly over-written by ipa's idview) and one in which the same <user> does not. Note that both machines are on the same network, run the same OS and SSSD versions.

When I run the following on both servers:

sss_cache -u <user> 'id <user>’

On the sssd client that returns the correct data, sssd_<domain>.log shows:

.... (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [sss_domain_get_state] (0x1000): Domain sub.domain.com is Active (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [sss_domain_get_state] (0x1000): Domain sub.domain.com is Active (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [sysdb_set_entry_attr] (0x0200): Entry [name=<user>,cn=users,cn=domain.com,cn=sysdb] has set [ts_cache] attrs. (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [dp_req_done] (0x0400): DP Request [Account #5760]: Request handler finished [0]: Success (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [_dp_req_recv] (0x0400): DP Request [Account #5760]: Receiving request data. ....

On the sssd client which does *not* return the correct data, sssd_<domain>.log shows:

.... (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sss_domain_get_state] (0x1000): Domain sub.domain.com is Active (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sss_domain_get_state] (0x1000): Domain sub.domain.com is Active (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_set_cache_entry_attr] (0x0080): ldb_modify failed: [No such object](32)[ldb_wait: No such object (32)] (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_set_cache_entry_attr] (0x0400): No such entry (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_set_entry_attr] (0x0080): Cannot set attrs for name=<user>,cn=users,cn=domain.com,cn=sysdb, 2 [No such file or directory] (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_store_user] (0x0040): Cache update failed: 2 (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_store_user] (0x0400): Error: 2 (No such file or directory) (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [ipa_s2n_save_objects] (0x0040): sysdb_store_user failed [2]: No such file or directory (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [ipa_s2n_get_user_done] (0x0040): ipa_s2n_save_objects failed. (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_done] (0x0400): DP Request [Account #400]: Request handler finished [0]: Success (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [_dp_req_recv] (0x0400): DP Request [Account #400]: Receiving request data. (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_reply_list_success] (0x0400): DP Request [Account #400]: Finished. Success. (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_reply_std] (0x1000): DP Request [Account #400]: Returning [Success]: 0,0,Success (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_table_value_destructor] (0x0400): Removing [0:1:0x0001:1::domain.com:name=<user>] from reply table (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_destructor] (0x0400): DP Request [Account #400]: Request removed. (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_destructor] (0x0400): Number of active DP request: 0 ....

The difference is

[sysdb_set_cache_entry_attr] (0x0080): ldb_modify failed: [No such object](32)[ldb_wait: No such object (32)]

What is that? Why is that failing?

Is it because <user> is already cached against the AD UID instead of the IPA UID so the command isn't finding <user> in the ldb?

Cheers L.

Lukas Slebodnik

22 Aug 22 Aug

3:23 a.m.

On (22/08/17 14:48), Lachlan Musicman wrote:

...

On 22 August 2017 at 08:33, Lachlan Musicman datakid@gmail.com wrote:

...
On 22 August 2017 at 00:46, Jakub Hrozek jhrozek@redhat.com wrote:

...
On Mon, Aug 21, 2017 at 10:24:50AM +1000, Lachlan Musicman wrote:

...
On 18 August 2017 at 17:33, Jakub Hrozek jhrozek@redhat.com wrote:

Hmmm. Weird. We are still seeing the "AD group not reflected in cache" problem and am not seeing evidence of SSSD updating from the IPA server

on

...
request (via login from other machine, via id command).

Just to follow up on this, I've done a comparative test.

On a machine in which <user> comes up correctly (all groups, properly over-written by ipa's idview) and one in which the same <user> does not. Note that both machines are on the same network, run the same OS and SSSD versions.

When I run the following on both servers:

sss_cache -u <user> 'id <user>’

On the sssd client that returns the correct data, sssd_<domain>.log shows:

.... (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [sss_domain_get_state] (0x1000): Domain sub.domain.com is Active (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [sss_domain_get_state] (0x1000): Domain sub.domain.com is Active (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [sysdb_set_entry_attr] (0x0200): Entry [name=<user>,cn=users,cn=domain.com,cn=sysdb] has set [ts_cache] attrs. (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [dp_req_done] (0x0400): DP Request [Account #5760]: Request handler finished [0]: Success (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [_dp_req_recv] (0x0400): DP Request [Account #5760]: Receiving request data. ....

On the sssd client which does *not* return the correct data, sssd_<domain>.log shows:

.... (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sss_domain_get_state] (0x1000): Domain sub.domain.com is Active (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sss_domain_get_state] (0x1000): Domain sub.domain.com is Active (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_set_cache_entry_attr] (0x0080): ldb_modify failed: [No such object](32)[ldb_wait: No such object (32)] (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_set_cache_entry_attr] (0x0400): No such entry (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_set_entry_attr] (0x0080): Cannot set attrs for name=<user>,cn=users,cn=domain.com,cn=sysdb, 2 [No such file or directory] (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_store_user] (0x0040): Cache update failed: 2 (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_store_user] (0x0400): Error: 2 (No such file or directory) (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [ipa_s2n_save_objects] (0x0040): sysdb_store_user failed [2]: No such file or directory (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [ipa_s2n_get_user_done] (0x0040): ipa_s2n_save_objects failed. (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_done] (0x0400): DP Request [Account #400]: Request handler finished [0]: Success (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [_dp_req_recv] (0x0400): DP Request [Account #400]: Receiving request data. (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_reply_list_success] (0x0400): DP Request [Account #400]: Finished. Success. (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_reply_std] (0x1000): DP Request [Account #400]: Returning [Success]: 0,0,Success (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_table_value_destructor] (0x0400): Removing [0:1:0x0001:1::domain.com:name=<user>] from reply table (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_destructor] (0x0400): DP Request [Account #400]: Request removed. (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_destructor] (0x0400): Number of active DP request: 0 ....

The difference is

[sysdb_set_cache_entry_attr] (0x0080): ldb_modify failed: [No such object](32)[ldb_wait: No such object (32)]

It is a bug in processing group hierarchy in sssd.

It would be good if you could provide a minimal reproducer because I expect you cannot dump whole directory server for us :-) :-) :-)

Lukas Slebodnik

5:40 a.m.

On (22/08/17 10:23), Lukas Slebodnik wrote:

...

On (22/08/17 14:48), Lachlan Musicman wrote:

...
On 22 August 2017 at 08:33, Lachlan Musicman datakid@gmail.com wrote:

...
On 22 August 2017 at 00:46, Jakub Hrozek jhrozek@redhat.com wrote:

...
On Mon, Aug 21, 2017 at 10:24:50AM +1000, Lachlan Musicman wrote:

...
On 18 August 2017 at 17:33, Jakub Hrozek jhrozek@redhat.com wrote:

Hmmm. Weird. We are still seeing the "AD group not reflected in cache" problem and am not seeing evidence of SSSD updating from the IPA server

on

...
request (via login from other machine, via id command).

Just to follow up on this, I've done a comparative test.

On a machine in which <user> comes up correctly (all groups, properly over-written by ipa's idview) and one in which the same <user> does not. Note that both machines are on the same network, run the same OS and SSSD versions.

When I run the following on both servers:

sss_cache -u <user> 'id <user>’

On the sssd client that returns the correct data, sssd_<domain>.log shows:

.... (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [sss_domain_get_state] (0x1000): Domain sub.domain.com is Active (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [sss_domain_get_state] (0x1000): Domain sub.domain.com is Active (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [sysdb_set_entry_attr] (0x0200): Entry [name=<user>,cn=users,cn=domain.com,cn=sysdb] has set [ts_cache] attrs. (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [dp_req_done] (0x0400): DP Request [Account #5760]: Request handler finished [0]: Success (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [_dp_req_recv] (0x0400): DP Request [Account #5760]: Receiving request data. ....

On the sssd client which does *not* return the correct data, sssd_<domain>.log shows:

.... (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sss_domain_get_state] (0x1000): Domain sub.domain.com is Active (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sss_domain_get_state] (0x1000): Domain sub.domain.com is Active (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_set_cache_entry_attr] (0x0080): ldb_modify failed: [No such object](32)[ldb_wait: No such object (32)] (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_set_cache_entry_attr] (0x0400): No such entry (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_set_entry_attr] (0x0080): Cannot set attrs for name=<user>,cn=users,cn=domain.com,cn=sysdb, 2 [No such file or directory] (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_store_user] (0x0040): Cache update failed: 2 (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_store_user] (0x0400): Error: 2 (No such file or directory) (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [ipa_s2n_save_objects] (0x0040): sysdb_store_user failed [2]: No such file or directory (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [ipa_s2n_get_user_done] (0x0040): ipa_s2n_save_objects failed. (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_done] (0x0400): DP Request [Account #400]: Request handler finished [0]: Success (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [_dp_req_recv] (0x0400): DP Request [Account #400]: Receiving request data. (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_reply_list_success] (0x0400): DP Request [Account #400]: Finished. Success. (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_reply_std] (0x1000): DP Request [Account #400]: Returning [Success]: 0,0,Success (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_table_value_destructor] (0x0400): Removing [0:1:0x0001:1::domain.com:name=<user>] from reply table (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_destructor] (0x0400): DP Request [Account #400]: Request removed. (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_destructor] (0x0400): Number of active DP request: 0 ....

The difference is

[sysdb_set_cache_entry_attr] (0x0080): ldb_modify failed: [No such object](32)[ldb_wait: No such object (32)]

It is a bug in processing group hierarchy in sssd.

It would be good if you could provide a minimal reproducer because I expect you cannot dump whole directory server for us :-) :-) :-)

Another possible solution would be to enable debugging for ldb functions. So we might see also action and not only result.

curl -O /usr/local/lib64/sss_ldb_debug.so https://lslebodn.fedorapeople.org/sss_ldb_debug/sss_ldb_debug.so echo "LD_PRELOAD=/usr/local/lib64/sss_ldb_debug.so" >> /etc/sysconfig/sssd

* clear sssd cache and old sssd log files; rm -f /var/lib/sssd/db/* /var/log/sssd/* * increase debug_level in domain section * restart sssd * reproduce problem

An provide sanitized log files. Feel free to send them privately if you do not want to send them to mailing list.

Lukas Slebodnik

8:03 a.m.

On (22/08/17 12:40), Lukas Slebodnik wrote:

...

On (22/08/17 10:23), Lukas Slebodnik wrote:

...
On (22/08/17 14:48), Lachlan Musicman wrote:

...
On 22 August 2017 at 08:33, Lachlan Musicman datakid@gmail.com wrote:

...
On 22 August 2017 at 00:46, Jakub Hrozek jhrozek@redhat.com wrote:

...
On Mon, Aug 21, 2017 at 10:24:50AM +1000, Lachlan Musicman wrote:

...
On 18 August 2017 at 17:33, Jakub Hrozek jhrozek@redhat.com wrote:

Hmmm. Weird. We are still seeing the "AD group not reflected in cache" problem and am not seeing evidence of SSSD updating from the IPA server

on

...
request (via login from other machine, via id command).

Just to follow up on this, I've done a comparative test.

On a machine in which <user> comes up correctly (all groups, properly over-written by ipa's idview) and one in which the same <user> does not. Note that both machines are on the same network, run the same OS and SSSD versions.

When I run the following on both servers:

sss_cache -u <user> 'id <user>’

On the sssd client that returns the correct data, sssd_<domain>.log shows:

.... (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [sss_domain_get_state] (0x1000): Domain sub.domain.com is Active (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [sss_domain_get_state] (0x1000): Domain sub.domain.com is Active (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [sysdb_set_entry_attr] (0x0200): Entry [name=<user>,cn=users,cn=domain.com,cn=sysdb] has set [ts_cache] attrs. (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [dp_req_done] (0x0400): DP Request [Account #5760]: Request handler finished [0]: Success (Tue Aug 22 13:13:17 2017) [sssd[be[sub.domain.com]]] [_dp_req_recv] (0x0400): DP Request [Account #5760]: Receiving request data. ....

On the sssd client which does *not* return the correct data, sssd_<domain>.log shows:

.... (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sss_domain_get_state] (0x1000): Domain sub.domain.com is Active (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sss_domain_get_state] (0x1000): Domain sub.domain.com is Active (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_set_cache_entry_attr] (0x0080): ldb_modify failed: [No such object](32)[ldb_wait: No such object (32)] (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_set_cache_entry_attr] (0x0400): No such entry (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_set_entry_attr] (0x0080): Cannot set attrs for name=<user>,cn=users,cn=domain.com,cn=sysdb, 2 [No such file or directory] (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_store_user] (0x0040): Cache update failed: 2 (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [sysdb_store_user] (0x0400): Error: 2 (No such file or directory) (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [ipa_s2n_save_objects] (0x0040): sysdb_store_user failed [2]: No such file or directory (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [ipa_s2n_get_user_done] (0x0040): ipa_s2n_save_objects failed. (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_done] (0x0400): DP Request [Account #400]: Request handler finished [0]: Success (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [_dp_req_recv] (0x0400): DP Request [Account #400]: Receiving request data. (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_reply_list_success] (0x0400): DP Request [Account #400]: Finished. Success. (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_reply_std] (0x1000): DP Request [Account #400]: Returning [Success]: 0,0,Success (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_table_value_destructor] (0x0400): Removing [0:1:0x0001:1::domain.com:name=<user>] from reply table (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_destructor] (0x0400): DP Request [Account #400]: Request removed. (Tue Aug 22 12:54:25 2017) [sssd[be[sub.domain.com]]] [dp_req_destructor] (0x0400): Number of active DP request: 0 ....

The difference is

[sysdb_set_cache_entry_attr] (0x0080): ldb_modify failed: [No such object](32)[ldb_wait: No such object (32)]

It is a bug in processing group hierarchy in sssd.

It would be good if you could provide a minimal reproducer because I expect you cannot dump whole directory server for us :-) :-) :-)

Another possible solution would be to enable debugging for ldb functions. So we might see also action and not only result.

curl -O /usr/local/lib64/sss_ldb_debug.so https://lslebodn.fedorapeople.org/sss_ldb_debug/sss_ldb_debug.so

^^ Sorry for typo. It need to be -o

curl -o /usr/local/lib64/sss_ldb_debug.so https://lslebodn.fedorapeople.org/sss_ldb_debug/sss_ldb_debug.so

...

echo "LD_PRELOAD=/usr/local/lib64/sss_ldb_debug.so" >> /etc/sysconfig/sssd

clear sssd cache and old sssd log files; rm -f /var/lib/sssd/db/* /var/log/sssd/*

increase debug_level in domain section

restart sssd

reproduce problem

An provide sanitized log files. Feel free to send them privately if you do not want to send them to mailing list.

Lachlan Musicman

6:20 p.m.

On 22 August 2017 at 20:40, Lukas Slebodnik lslebodn@redhat.com wrote:

...

...
It is a bug in processing group hierarchy in sssd.

It would be good if you could provide a minimal reproducer because I expect you cannot dump whole directory server for us :-) :-) :-)

Another possible solution would be to enable debugging for ldb functions. So we might see also action and not only result.

curl -O /usr/local/lib64/sss_ldb_debug.so https://lslebodn.fedorapeople. org/sss_ldb_debug/sss_ldb_debug.so echo "LD_PRELOAD=/usr/local/lib64/sss_ldb_debug.so" >> /etc/sysconfig/sssd

clear sssd cache and old sssd log files; rm -f /var/lib/sssd/db/*

/var/log/sssd/*

increase debug_level in domain section

restart sssd

reproduce problem

An provide sanitized log files. Feel free to send them privately if you do not want to send them to mailing list.

The problem with this - in the short term - is that we know clearing the sssd cache will fix the problem, so it reproduction will need to wait until the next person is added that this happens to.

When I look at this problem, the biggest issue for us is that magic combination of: despite all servers being identically set up (via Ansible and Katello), this is the only server that it happens on; it doesn't happen every time we add a user; and while we know clearing the cache works - we could automate that as part of the add new user process - clearing cache also kills working sessions on a busy server. Making it impractical and inconvenient to the other users.

But I will add this debug.so for you and do these things and next time it happens I'll slice the log files up for you?

(Thanks for the help)

Cheers L.

Lukas Slebodnik

23 Aug 23 Aug

2:16 a.m.

On (23/08/17 09:20), Lachlan Musicman wrote:

...

On 22 August 2017 at 20:40, Lukas Slebodnik lslebodn@redhat.com wrote:

...
...
It is a bug in processing group hierarchy in sssd.

It would be good if you could provide a minimal reproducer because I expect you cannot dump whole directory server for us :-) :-) :-)

Another possible solution would be to enable debugging for ldb functions. So we might see also action and not only result.

curl -O /usr/local/lib64/sss_ldb_debug.so https://lslebodn.fedorapeople. org/sss_ldb_debug/sss_ldb_debug.so echo "LD_PRELOAD=/usr/local/lib64/sss_ldb_debug.so" >> /etc/sysconfig/sssd

clear sssd cache and old sssd log files; rm -f /var/lib/sssd/db/*

/var/log/sssd/*

increase debug_level in domain section

restart sssd

reproduce problem

An provide sanitized log files. Feel free to send them privately if you do not want to send them to mailing list.

The problem with this - in the short term - is that we know clearing the sssd cache will fix the problem, so it reproduction will need to wait until the next person is added that this happens to.

I see. I though you can reproduce it more easily; but it seems to depend on the order of requested data from sssd.

...

When I look at this problem, the biggest issue for us is that magic combination of: despite all servers being identically set up (via Ansible and Katello), this is the only server that it happens on; it doesn't happen every time we add a user; and while we know clearing the cache works - we could automate that as part of the add new user process - clearing cache also kills working sessions on a busy server. Making it impractical and inconvenient to the other users.

But I will add this debug.so for you and do these things and next time it happens I'll slice the log files up for you?

Maybe it would be enough for troubleshooting just set LD_PRELOAD + restart sssd.

So if you have another machine where you can reproduce + sssd cache was not purged then you can try to do it.

Lachlan Musicman

22 Aug 22 Aug

5:55 p.m.

On 22 August 2017 at 18:23, Lukas Slebodnik lslebodn@redhat.com wrote:

...

On (22/08/17 14:48), Lachlan Musicman wrote:

...
On 22 August 2017 at 08:33, Lachlan Musicman datakid@gmail.com

wrote:It is a bug in processing group hierarchy in sssd.

It would be good if you could provide a minimal reproducer because I expect you cannot dump whole directory server for us :-) :-) :-)

How would I do that? I don't exactly know what you are asking for? A slice of our ldap? Would you like a copy of sssd_<domain>.ldb from working host and not working host and a copy of respective sssd.confs?

Cheers L.

*Greg Bloom* @greggish https://twitter.com/greggish/status/873177525903609857

2444

Age (days ago)

2450

Last active (days ago)

sssd-users@lists.fedorahosted.org

11 comments

3 participants

tags (0)

participants (3)

Jakub Hrozek
Lachlan Musicman
Lukas Slebodnik