Hi,
Where I work I've just spent a fair amount of time tracing down an issue we were having with Linux servers (CentOS 6) which authenticate against the company Active Directory domain.
We found that SSSD 1.12.4-46.el6 clients were failing to work correctly against a particular DC in one of our sites. Looking in the SSSD logs I discovered it was a Kerberos "TGS-REQ" issue, whereby it would do a request and get back "Principal unknown".
I captured the conversation with tcpdump, compared it with a conversation with a working DC, and found that the "Principal unknown" response came back with the Kerberos server listed as:
domaindnszones.example.com
and in the working case was instead the name of the DC, let's say:
site-a-dc01.example.com
Looking further at the DNS records for the affected DC, I found that the DC's IP had 4 PTR records:
site-a-dc01.example.com
forestdnszones.example.com
domaindnszones.example.com
gc._msdcs.example.com
Given we didn't believe the 3 extra PTRs were performing any useful function, we deleted them, and started SSSD again. SSSD now happily connected to the DC, and is functional.
So, is there any reason why these PTRs would have upset SSSD like they appear to have?
I can supply SSSD logs and/or pcap files off-list if helpful...
Cheers,
John
On Mon, Sep 21, 2015 at 03:10:50PM +0100, John Beranek wrote:
So, is there any reason why these PTRs would have upset SSSD like they appear to have?
SSSD tries to detect the environment mostly with DNS SRV requests. In general it issues a DNS query for _ldap._tcp.domain.name. In an AD environment with sites, SSSD first tries to determine the site with the help of a CLDAP request to a DC and then uses a query like _ldap._tcp.sitename._sites.domain.name to get only the DCs for the given site. The names returned by the query are considered valid host names for which Kerberos service tickets can be requested.
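To illustrate the query names described above, here is a minimal Python sketch. It only builds the SRV names and performs no DNS lookups; the function name, the site name "site-a", and the idea of listing the domain-wide name after the site-specific one are assumptions made for illustration, not SSSD's actual code.

```python
def srv_query_names(domain, site=None, service="ldap", proto="tcp"):
    """Build the SRV names to query, site-specific name first.

    Hypothetical helper for illustration; it does not talk to DNS.
    """
    base = f"_{service}._{proto}"
    names = []
    if site:
        # Site-scoped name: returns only the DCs of that AD site.
        names.append(f"{base}.{site}._sites.{domain}")
    # Domain-wide name: returns all DCs of the domain.
    names.append(f"{base}.{domain}")
    return names


if __name__ == "__main__":
    for name in srv_query_names("example.com", site="site-a"):
        print(name)
```

The host names found in the answers to such queries are what would then be used to request Kerberos service tickets.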
forestdnszones.example.com and domaindnszones.example.com are special names in AD which return all DCs in the forest or in the domain, respectively. gc._msdcs.example.com is a special AD SRV record. None of them represents a single DC; each stands for a collection of DCs and hence cannot be used to get a Kerberos ticket. Since they do not relate to a single host, I think they should not have a PTR record assigned.
I can supply SSSD logs and/or pcap files off-list if helpful...
SSSD logs would be nice. I would like to understand why SSSD fails here and does not continue until it finds a name for which a Kerberos ticket can be returned successfully. Feel free to send the logs to me directly.
bye, Sumit
-- John Beranek To generalise is to be an idiot. http://redux.org.uk/ -- William Blake
sssd-users mailing list sssd-users@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/sssd-users
On 21/09/2015 17:06, Sumit Bose wrote:
SSSD tries to detect the environment mostly with DNS SRV requests. In general it issues a DNS query for _ldap._tcp.domain.name. In an AD environment with sites, SSSD first tries to determine the site with the help of a CLDAP request to a DC and then uses a query like _ldap._tcp.sitename._sites.domain.name to get only the DCs for the given site. The names returned by the query are considered valid host names for which Kerberos service tickets can be requested.
Well, in this case I've disabled site selection in order to force SSSD to connect to the troublesome DC, so it doesn't do that...
I don't understand quite why SSSD might be performing a reverse lookup on a selected DC's IP, and then using that in a Kerberos request, if that is in fact what it's doing.
forestdnszones.example.com and domaindnszones.example.com are special names in AD which return all DCs in the forest or in the domain, respectively. gc._msdcs.example.com is a special AD SRV record. None of them represents a single DC; each stands for a collection of DCs and hence cannot be used to get a Kerberos ticket. Since they do not relate to a single host, I think they should not have a PTR record assigned.
It does appear that only one of our DCs had those PTRs defined, and apparently manually, not automatically added by the DC.
I can supply SSSD logs and/or pcap files off-list if helpful...
SSSD logs would be nice. I would like to understand why SSSD fails here and does not continue until it finds a name for which a Kerberos ticket can be returned successfully. Feel free to send the logs to me directly.
I've sent log files and config file to Sumit directly.
Cheers,
John
On Mon, Sep 21, 2015 at 07:35:13PM +0100, John Beranek wrote:
I don't understand quite why SSSD might be performing a reverse lookup on a selected DC's IP, and then using that in a Kerberos request, if that is in fact what it's doing.
btw this might not be sssd, but rather libkrb5 or cyrus-sasl. I haven't seen the logs, but I wonder if rdns=False in krb5.conf would help here?
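As a rough illustration of why a reverse lookup could produce the wrong service principal, the following Python sketch imitates host name canonicalization against a made-up DNS table. Everything here is an assumption for illustration: the IP address 192.0.2.10, the order of the PTR records, and the helper name. It is not the actual libkrb5 or cyrus-sasl logic.

```python
# Hypothetical DNS data modelled on the PTR records described in this thread.
FORWARD = {"site-a-dc01.example.com": "192.0.2.10"}
REVERSE = {
    "192.0.2.10": [
        "domaindnszones.example.com",
        "site-a-dc01.example.com",
        "forestdnszones.example.com",
        "gc._msdcs.example.com",
    ]
}


def canonical_host(hostname, forward, reverse, rdns=True):
    """Return the host name a client might put into the service principal.

    With rdns disabled, the name is used as given. With rdns enabled, the
    name is resolved to an IP and then back to a name via PTR; if several
    PTR records exist, whichever comes back first is used (assumption).
    """
    if not rdns:
        return hostname
    ip = forward.get(hostname)
    ptr_names = reverse.get(ip, [])
    return ptr_names[0] if ptr_names else hostname


if __name__ == "__main__":
    for rdns in (True, False):
        host = canonical_host("site-a-dc01.example.com", FORWARD, REVERSE, rdns)
        print(f"rdns={rdns}: ldap/{host}")
```

With the extra PTR records in place, the rdns-enabled path can end up asking for a ticket for a name that no single DC owns, which would match the "Principal unknown" answer seen in the capture.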
On Mon, Sep 21, 2015 at 09:48:59PM +0200, Jakub Hrozek wrote:
btw this might not be sssd, but rather libkrb5 or cyrus-sasl. I haven't seen the logs, but I wonder if rdns=False in krb5.conf would help here?
Hi John,
Thank you for the log files. I think Jakub is right: the error occurs during ldap_sasl_bind(), and the logs show that both your working and non-working setups use the server names to connect.
In RHEL/CentOS 6 rdns is not set in krb5.conf and the default is 'true', which means that a reverse DNS lookup might happen during the SASL bind (SSSD does not do any reverse lookups on its own). Since the rdns option only covers the Kerberos part, you should also set
SASL_NOCANON on
in /etc/openldap/ldap.conf.
In RHEL-7 and current Fedora versions we changed this and have the setting above already in the default installation.
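To check whether a given client already carries both settings, a small Python sketch along these lines might help. The helper names are hypothetical, the parsing is deliberately simplistic (it ignores include directives and takes the first match), and the sample file contents are made up. The defaults used, rdns true and SASL_NOCANON off, are the ones described in this thread and in the respective man pages.

```python
TRUE_WORDS = {"y", "yes", "true", "t", "1", "on"}


def krb5_rdns_enabled(krb5_conf_text):
    """Return the rdns value from [libdefaults]; True if it is not set."""
    section = None
    for raw in krb5_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "libdefaults" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key.lower() == "rdns":
                # Anything not recognised as true is treated as false here.
                return value.lower() in TRUE_WORDS
    return True


def sasl_nocanon_enabled(ldap_conf_text):
    """Return the SASL_NOCANON value from ldap.conf; False if it is not set."""
    for raw in ldap_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].upper() == "SASL_NOCANON":
            return parts[1].strip().lower() in {"on", "true", "yes"}
    return False


if __name__ == "__main__":
    # Made-up file contents with both recommended settings applied.
    krb5_text = "[libdefaults]\n default_realm = EXAMPLE.COM\n rdns = false\n"
    ldap_text = "# client defaults\nSASL_NOCANON on\n"
    print("reverse lookups in Kerberos:", krb5_rdns_enabled(krb5_text))
    print("SASL canonicalization disabled:", sasl_nocanon_enabled(ldap_text))
```

In practice the two strings would be read from /etc/krb5.conf and /etc/openldap/ldap.conf on the client.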
HTH
bye, Sumit