Hi,
I have a weird situation when a user launches a Slurm interactive job on a new compute node with an empty sssd cache. I'm not sure if it's an issue with Slurm or with SSSD/NSS. Below is the output of a Slurm session started by user1 (uid 705601104) on node ip-0A0B0004, which opens an interactive shell on node ip-0A0B0006:
[user1@ip-0A0B0004 ~]$ srun -p lowmem -w lowmem-2 --pty bash
/usr/bin/id: cannot find name for group ID 705600513
/usr/bin/id: cannot find name for user ID 705601104
[I have no name!@ip-0A0B0006 ~]$
From this node I can successfully run `id` for user1 or any other AD user (this populates the cache, and subsequent Slurm sessions for any user then resolve):
[I have no name!@ip-0A0B0006 ~]$ id user1
uid=705601104(user1) gid=705600513 groups=705600513,705601103(group1)
[I have no name!@ip-0A0B0006 ~]$
Here is the relevant session from my sssd_nss.log:
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [cache_req_send] (0x0400): CR #0: New request 'User by ID'
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [cache_req_select_domains] (0x0400): CR #0: Performing a multi-domain search
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [cache_req_search_domains] (0x0400): CR #0: Search will check the cache and check the data provider
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [cache_req_set_domain] (0x0400): CR #0: Using domain [jmorey.net]
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [cache_req_search_send] (0x0400): CR #0: Looking up UID:705601104@jmorey.net
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [cache_req_search_ncache] (0x0400): CR #0: Checking negative cache for [UID:705601104@jmorey.net]
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [cache_req_search_ncache] (0x0400): CR #0: [UID:705601104@jmorey.net] is not present in negative cache
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [cache_req_search_cache] (0x0400): CR #0: Looking up [UID:705601104@jmorey.net] in cache
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [cache_req_search_cache] (0x0400): CR #0: Object [UID:705601104@jmorey.net] was not found in cache
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [cache_req_search_dp] (0x0400): CR #0: Looking up [UID:705601104@jmorey.net] in data provider
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [sss_dp_issue_request] (0x0400): Issuing request for [0x562848473820:1:705601104@jmorey.net]
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [sss_dp_get_account_msg] (0x0400): Creating request for [jmorey.net][0x1][BE_REQ_USER][idnumber=705601104:-]
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [sss_dp_internal_get_send] (0x0400): Entering request [0x562848473820:1:705601104@jmorey.net]
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [cache_req_common_dp_recv] (0x0040): CR #0: Data Provider Error: 3, 0, Success
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [cache_req_common_dp_recv] (0x0400): CR #0: Due to an error we will return cached data
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [cache_req_search_cache] (0x0400): CR #0: Looking up [UID:705601104@jmorey.net] in cache
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [cache_req_search_cache] (0x0400): CR #0: Object [UID:705601104@jmorey.net] was not found in cache
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [cache_req_process_result] (0x0400): CR #0: Finished: Not found
To my inexperienced eye, it looks like it is communicating with the backend but receives some kind of error, so it falls back to the cache, which is empty. The other oddity is that this is a 'User by ID' request, i.e. it tries to resolve the user by UID rather than by name. While this is happening, I can successfully resolve user1 by SSHing to the same node.
The SSH session:
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_send] (0x0400): CR #6: New request 'User by name'
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_process_input] (0x0400): CR #6: Parsing input name [user1]
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [sss_parse_name_for_domains] (0x0200): name 'user1' matched without domain, user is user1
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_set_name] (0x0400): CR #6: Setting name [user1]
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_select_domains] (0x0400): CR #6: Performing a multi-domain search
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_search_domains] (0x0400): CR #6: Search will check the cache and check the data provider
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_set_domain] (0x0400): CR #6: Using domain [jmorey.net]
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_prepare_domain_data] (0x0400): CR #6: Preparing input data for domain [jmorey.net] rules
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_search_send] (0x0400): CR #6: Looking up user1@jmorey.net
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_search_ncache] (0x0400): CR #6: Checking negative cache for [user1@jmorey.net]
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_search_ncache] (0x0400): CR #6: [user1@jmorey.net] is not present in negative cache
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_search_cache] (0x0400): CR #6: Looking up [user1@jmorey.net] in cache
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_search_cache] (0x0400): CR #6: Object [user1@jmorey.net] was not found in cache
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_search_dp] (0x0400): CR #6: Looking up [user1@jmorey.net] in data provider
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [sss_dp_issue_request] (0x0400): Issuing request for [0x564629422820:1:user1@jmorey.net@jmorey.net]
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [sss_dp_get_account_msg] (0x0400): Creating request for [jmorey.net][0x1][BE_REQ_USER][name=user1@jmorey.net:-]
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [sss_dp_internal_get_send] (0x0400): Entering request [0x564629422820:1:user1@jmorey.net@jmorey.net]
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [sss_dp_get_reply] (0x1000): Got reply from Data Provider - DP error code: 0 errno: 0 error message: Success
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_search_cache] (0x0400): CR #6: Looking up [user1@jmorey.net] in cache
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_search_ncache_filter] (0x0400): CR #6: This request type does not support filtering result by negative cache
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_search_done] (0x0400): CR #6: Returning updated object [user1@jmorey.net]
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_create_and_add_result] (0x0400): CR #6: Found 1 entries in domain jmorey.net
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [sss_dp_req_destructor] (0x0400): Deleting request: [0x564629422820:1:user1@jmorey.net@jmorey.net]
(Mon Jul 13 20:30:27 2020) [sssd[nss]] [cache_req_done] (0x0400): CR #6: Finished: Success
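To compare the two sessions side by side, it can help to pull out, for each cache request (`CR #N`), the request type, any data provider error, and the final result. This is a minimal Python sketch that assumes one log entry per line in the format shown above; `summarize_cache_requests` is a hypothetical helper, not something shipped with SSSD.

```python
import re

# Matches the "CR #<n>: <message>" part of an sssd_nss.log entry.
CR_RE = re.compile(r"CR #(\d+): (.*)$")


def summarize_cache_requests(lines):
    """Group sssd_nss.log lines by cache request number (hypothetical helper).

    Returns {cr_number: {"request": ..., "dp_error": ..., "result": ...}}.
    """
    summary = {}
    for line in lines:
        match = CR_RE.search(line)
        if match is None:
            continue  # entries without a CR number (e.g. sss_dp_* lines) are skipped
        cr = int(match.group(1))
        message = match.group(2).strip()
        entry = summary.setdefault(cr, {"request": None, "dp_error": None, "result": None})
        if message.startswith("New request "):
            entry["request"] = message[len("New request "):].strip("'")
        elif message.startswith("Data Provider Error: "):
            entry["dp_error"] = message[len("Data Provider Error: "):]
        elif message.startswith("Finished: "):
            entry["result"] = message[len("Finished: "):]
    return summary
```

Fed the lines of /var/log/sssd/sssd_nss.log, it would report CR #0 as 'User by ID' with data provider error "3, 0, Success" and result "Not found", and CR #6 as 'User by name' with result "Success". CR numbers presumably restart when the nss responder restarts, so logs spanning several restarts may merge unrelated requests under one number.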
Here is my sssd.conf:
[sssd]
config_file_version = 2
services = nss,pam
domains = jmorey.net

[nss]
filter_users = root
filter_groups = root
debug_level = 7

[pam]

[domain/jmorey.net]
id_provider = ldap
auth_provider = ldap
access_provider = ldap
debug_level = 7
dns_discovery_domain = jmorey.net
enumerate = False
cache_credentials = True
case_sensitive = false
ldap_schema = ad

ldap_uri = ldaps://jmorey-novo-ad.jmorey.net
ldap_user_search_base = cn=Users,dc=jmorey,dc=net
ldap_group_search_base = cn=group1,cn=Users,dc=jmorey,dc=net
ldap_referrals = False
ldap_tls_reqcert = never
ldap_use_tokengroups = True
ldap_id_mapping = True
override_homedir = /mnt/exports/shared/home/%u
fallback_homedir = /shared/home/%u
default_shell = /bin/bash
ldap_access_order = filter, expire
ldap_account_expire_policy = ad
ldap_access_filter = (|(memberOf=cn=group1,cn=Users,dc=jmorey,dc=net))
ldap_default_bind_dn = cn=user 1,cn=Users,dc=jmorey,dc=net
ldap_default_authtok_type = password
ldap_default_authtok =
thanks, Jerry
Hi,
On Thu, Jul 16, 2020 at 2:10 PM Jerry Morey themorey@gmail.com wrote:
I have a weird situation when a user launches a Slurm interactive job on a new compute node with an empty sssd cache. I'm not sure if it's an issue with Slurm or with SSSD/NSS. Below is the output of a Slurm session started by user1 (uid 705601104) on node ip-0A0B0004, which opens an interactive shell on node ip-0A0B0006:
I'm sorry, I'm not familiar with Slurm. What does it mean to "start a new compute node"? Does this new node have an isolated sssd process and /var/lib/sss/*?
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [cache_req_common_dp_recv] (0x0040): CR #0: Data Provider Error: 3, 0, Success
(Fri Jul 10 22:47:58 2020) [sssd[nss]] [cache_req_common_dp_recv] (0x0400): CR #0: Due to an error we will return cached data
I would advise consulting "sssd_$domain.log" to find out why the "[BE_REQ_USER][idnumber=705601104:-]" request failed.
But chances are the backend process was still offline (if SSSD had just been started in this fresh virtual environment)...
_______________________________________________
sssd-users mailing list -- sssd-users@lists.fedorahosted.org
To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.o...
Thanks, Alexey.
I'm sorry, I'm not familiar with Slurm. What does it mean to "start a new compute node"? Does this new node have an isolated sssd process and /var/lib/sss/*?
Slurm is a job scheduler that dispatches tasks to other compute nodes for processing. The compute nodes are separate Linux VMs, each with its own sssd process and /var/lib/sss/*.
I would advise consulting "sssd_$domain.log" to find out why the "[BE_REQ_USER][idnumber=705601104:-]" request failed.
But chances are the backend process was still offline (if SSSD had just been started in this fresh virtual environment)...
The sssd process appears to be online, but I see a mapping issue in sssd_jmorey.net.log:
[dp_get_account_info_handler] (0x0200): Got request for [0x1][BE_REQ_USER][idnumber=705601104]
[sssd[be[jmorey.net]]] [dp_attach_req] (0x0400): DP Request [Account #4]: New request. Flags [0x0001].
[sssd[be[jmorey.net]]] [dp_attach_req] (0x0400): Number of active DP request: 1
[sssd[be[jmorey.net]]] [sss_domain_get_state] (0x1000): Domain jmorey.net is Active
[sssd[be[jmorey.net]]] [users_get_send] (0x0080): [705601104] did not match any configured ID mapping domain
[sysdb_search_user_by_uid] (0x0400): No such entry
[sssd[be[jmorey.net]]] [sysdb_delete_user] (0x0400): Error: 2 (No such file or directory)
[sssd[be[jmorey.net]]] [dp_req_done] (0x0400): DP Request [Account #4]: Request handler finished [0]: Success
[sssd[be[jmorey.net]]] [_dp_req_recv] (0x0400): DP Request [Account #4]: Receiving request data.
[sssd[be[jmorey.net]]] [dp_req_reply_list_success] (0x0400): DP Request [Account #4]: Finished. Success.
[sssd[be[jmorey.net]]] [dp_req_reply_std] (0x1000): DP Request [Account #4]: Returning [Internal Error]: 3,0,Success
What is this last line, "Returning [Internal Error]: 3,0,Success"?
On Thu, Jul 16, 2020 at 4:42 PM Jerry Morey themorey@gmail.com wrote:
[dp_get_account_info_handler] (0x0200): Got request for [0x1][BE_REQ_USER][idnumber=705601104]
[sssd[be[jmorey.net]]] [dp_attach_req] (0x0400): DP Request [Account #4]: New request. Flags [0x0001].
[sssd[be[jmorey.net]]] [dp_attach_req] (0x0400): Number of active DP request: 1
[sssd[be[jmorey.net]]] [sss_domain_get_state] (0x1000): Domain jmorey.net is Active
[sssd[be[jmorey.net]]] [users_get_send] (0x0080): [705601104] did not match any configured ID mapping domain
The "domain SID <-> ID range (slice)" map isn't populated yet.
`man sssd-ldap`: "When a user or group entry for a particular domain is encountered for the first time, the SSSD allocates one of the available slices for that domain."
I'm not entirely sure, but it seems that the slice (range) for a domain is only allocated when the first entry SID from this domain is read. Since in this case the first request is "by UID", that hasn't happened yet, so SSSD has no data to convert the UID to a SID.
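The hypothesis above can be illustrated with a toy model of the slice table. This is a simplified Python sketch, not SSSD's implementation: `IdMapModel`, `map_sid` and `map_id` are hypothetical names, the domain SID is a placeholder, and the slice number (3527, which is what UID 705601104 implies under the default ranges) is passed in explicitly, whereas SSSD derives it from a hash of the domain SID. The range parameters are assumed to be the sssd-ldap defaults (`ldap_idmap_range_min` = 200000, `ldap_idmap_range_size` = 200000), which the posted sssd.conf does not override.

```python
from dataclasses import dataclass, field

RANGE_MIN = 200000   # assumed ldap_idmap_range_min (sssd-ldap default)
RANGE_SIZE = 200000  # assumed ldap_idmap_range_size (sssd-ldap default)


@dataclass
class IdMapModel:
    """Toy model of the domain SID <-> ID slice table (hypothetical, not SSSD code)."""
    slices: dict = field(default_factory=dict)  # domain SID -> slice number

    def map_sid(self, object_sid, slice_no):
        """By-name path: the object's SID is read, and the first SID seen for a
        domain registers that domain's slice. slice_no stands in for SSSD's hash."""
        domain_sid, rid = object_sid.rsplit("-", 1)
        self.slices.setdefault(domain_sid, slice_no)
        return RANGE_MIN + self.slices[domain_sid] * RANGE_SIZE + int(rid)

    def map_id(self, unix_id):
        """By-UID path: only answerable if some registered slice covers the ID."""
        for domain_sid, slice_no in self.slices.items():
            start = RANGE_MIN + slice_no * RANGE_SIZE
            if start <= unix_id < start + RANGE_SIZE:
                return f"{domain_sid}-{unix_id - start}"
        return None  # cf. "did not match any configured ID mapping domain"


# Fresh compute node: nothing registered yet, so the by-UID lookup fails.
idmap = IdMapModel()
print(idmap.map_id(705601104))  # None

# A by-name lookup (e.g. via ssh) reads user1's SID (placeholder domain SID).
print(idmap.map_sid("S-1-5-21-1111-2222-3333-1104", slice_no=3527))  # 705601104

# Now the same by-UID lookup can be answered.
print(idmap.map_id(705601104))  # S-1-5-21-1111-2222-3333-1104
```

In this model the by-UID path only works once a by-name lookup has registered the domain's slice, which matches what was seen on the fresh compute node.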
IMO, workarounds could be:
1) trigger a "by name" lookup first (e.g. ssh)
2) use the `ldap_idmap_default_domain_sid` option to "bind" the domain to a fixed slice (0). IIUC, this should pre-populate the ID mapping. But please be careful with it, as this results in *new* UIDs being generated for all objects in this domain (since currently this domain clearly maps to a non-zero slice).
The workarounds worked, but, as you warned, the UIDs changed when the domain was bound to slice 0, which will cause all sorts of permission issues.
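The size of that UID change can be worked out by hand. Assuming the default `ldap_idmap_range_min` and `ldap_idmap_range_size` of 200000 (the posted sssd.conf sets neither), an algorithmically mapped ID splits into a slice number and a RID; binding the domain to slice 0 keeps the RID but moves the ID into the first range. A minimal Python sketch; `split_id` and `id_in_slice` are hypothetical helper names:

```python
RANGE_MIN = 200000   # assumed ldap_idmap_range_min (default, not set in the posted sssd.conf)
RANGE_SIZE = 200000  # assumed ldap_idmap_range_size (default, not set in the posted sssd.conf)


def split_id(unix_id):
    """Return (slice, rid) for an algorithmically mapped UID/GID."""
    return divmod(unix_id - RANGE_MIN, RANGE_SIZE)


def id_in_slice(rid, slice_no):
    """ID that the same RID gets when its domain is assigned slice_no."""
    return RANGE_MIN + slice_no * RANGE_SIZE + rid


for label, current_id in [("user1 UID", 705601104),
                          ("primary GID", 705600513),
                          ("group1 GID", 705601103)]:
    slice_no, rid = split_id(current_id)
    print(f"{label}: slice {slice_no}, RID {rid}, slice-0 ID {id_in_slice(rid, 0)}")
```

For user1 this gives slice 3527 and RID 1104, so the slice-0 UID becomes 201104. The primary GID 705600513 has RID 513, the well-known RID of "Domain Users", which fits it being the users' primary group.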
These are ephemeral Azure VMs that spin up to process a job and then spin down. So I copied a known-good sssd cache file (cache_jmorey.net.ldb) to the ephemeral nodes at boot, and that also worked. I'm not sure what the downside of this approach is, but the upside is that the existing UIDs/GIDs are preserved.
IMO, workarounds could be:
- trigger a "by name" lookup first (e.g. ssh)
- use the `ldap_idmap_default_domain_sid` option to "bind" the domain to a fixed slice (0). IIUC, this should pre-populate the ID mapping. But please be careful with it, as this results in *new* UIDs being generated for all objects in this domain (since currently this domain clearly maps to a non-zero slice)
JFTR: this was meant to be "OR".
Perhaps you could trigger such a lookup by placing `getent -s sss passwd user1` somewhere in a startup script (after sssd has started).
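Such a warm-up step could look like the minimal Python sketch below, assuming it runs from a boot script after sssd has started. It retries the lookup a few times in case the backend is still connecting; the retry count and delay are arbitrary assumptions, and `warm_up_lookup` is a hypothetical helper.

```python
import subprocess
import time


def warm_up_lookup(cmd, attempts=10, delay=2.0):
    """Run a lookup command until it exits 0; return True on success (hypothetical helper)."""
    for attempt in range(attempts):
        try:
            result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        except FileNotFoundError:
            return False  # the command itself is missing; retrying will not help
        if result.returncode == 0:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # give the sssd backend time to come online
    return False
```

For example: `warm_up_lookup(["getent", "-s", "sss", "passwd", "user1"])`. A lookup by name is used deliberately, since the by-UID lookups are the ones that fail until the domain's slice is known.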
Copying the cache file feels error-prone...