Odd issue with AD domain authentication (PTR records)

John Beranek

21 Sep 2015

9:10 a.m.

Hi,

Where I work I've just spent a fair amount of time tracing down an issue we were having with Linux servers (CentOS 6) which authenticate against the company Active Directory domain.

We found that SSSD 1.12.4-46.el6 clients were failing to work correctly against a particular DC in one of our sites. Looking in the SSSD logs I discovered it was a Kerberos "TGS-REQ" issue, whereby it would do a request and get back "Principal unknown".

I captured the conversation with tcpdump, compared it with a conversation with a working DC, and found that the "Principal unknown" response came back with the Kerberos server listed as:

domaindnszones.example.com

and in the working case was instead the name of the DC, let's say:

site-a-dc01.example.com

Looking further at the DNS records for the affected DC, I found that the DC's IP had 4 PTR records:

site-a-dc01.example.com
forestdnszones.example.com
domaindnszones.example.com
gc._msdcs.example.com

Given we didn't believe the 3 extra PTRs were performing any useful function, we deleted them, and started SSSD again. SSSD now happily connected to the DC, and is functional.
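A minimal sketch of the check described above, assuming the PTR names for the DC's address have already been collected (for example with `dig -x`). The address 192.0.2.10 and the helper names `reverse_lookup_name` and `extra_ptr_names` are illustrative assumptions, not anything taken from SSSD or from the original setup.

```python
import ipaddress


def reverse_lookup_name(ip):
    """Return the DNS name under which PTR records for `ip` are stored."""
    return ipaddress.ip_address(ip).reverse_pointer


def normalise(name):
    """Lower-case a DNS name and drop any trailing root dot."""
    return name.rstrip(".").lower()


def extra_ptr_names(ptr_names, expected_host):
    """Return the PTR targets that are not the DC's own host name."""
    expected = normalise(expected_host)
    return [name for name in ptr_names if normalise(name) != expected]


if __name__ == "__main__":
    # Hypothetical address for the affected DC (documentation range).
    print(reverse_lookup_name("192.0.2.10"))

    # The four PTR targets reported in the post.
    ptrs = [
        "site-a-dc01.example.com",
        "forestdnszones.example.com",
        "domaindnszones.example.com",
        "gc._msdcs.example.com",
    ]
    for name in extra_ptr_names(ptrs, "site-a-dc01.example.com"):
        print("unexpected PTR target:", name)
```

Any name reported as unexpected is a candidate for the kind of clean-up described above. Whether it is safe to delete depends on the environment.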

So, is there any reason why these PTRs would have upset SSSD like they appear to have?

I can supply SSSD logs and/or pcap files off-list if helpful...

Cheers,

John

-- John Beranek To generalise is to be an idiot. http://redux.org.uk/ -- William Blake


Sumit Bose

21 Sep 2015

11:06 a.m.

On Mon, Sep 21, 2015 at 03:10:50PM +0100, John Beranek wrote:

...

Hi,

Where I work I've just spent a fair amount of time tracing down an issue we were having with Linux servers (CentOS 6) which authenticate against the company Active Directory domain.

We found that SSSD 1.12.4-46.el6 clients were failing to work correctly against a particular DC in one of our sites. Looking in the SSSD logs I discovered it was a Kerberos "TGS-REQ" issue, whereby it would do a request and get back "Principal unknown".

I captured the conversation with tcpdump, compared it with a conversation with a working DC, and found that the "Principal unknown" response came back with the Kerberos server listed as:

domaindnszones.example.com

and in the working case was instead the name of the DC, let's say:

site-a-dc01.example.com

Looking further at the DNS records for the affected DC, I found that the DC's IP had 4 PTR records:

site-a-dc01.example.com
forestdnszones.example.com
domaindnszones.example.com
gc._msdcs.example.com

Given we didn't believe the 3 extra PTRs were performing any useful function, we deleted them, and started SSSD again. SSSD now happily connected to the DC, and is functional.

So, is there any reason why these PTRs would have upset SSSD like they appear to have?

SSSD tries to detect the environment mostly with DNS SRV requests. In general it tries a DNS query for _ldap._tcp.domain.name. In an AD environment where you have sites, SSSD first tries to determine the site with the help of a CLDAP request to a DC and then uses a query like _ldap._tcp.sitename._sites.domain.name to get only the DCs for the given site. The names returned by the query are considered valid host names for which Kerberos service tickets can be requested.

forestdnszones.example.com and domaindnszones.example.com are special names in AD which return all DCs in the forest or in the domain, respectively. gc._msdcs.example.com is a special AD SRV record. None of them represents a single DC; each represents a collection of DCs and hence cannot be used to get a Kerberos ticket. Since they do not relate to a single host, I think they should not have a PTR record assigned.
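As an illustration of the SRV names mentioned above, here is a small sketch that only builds the query strings. It performs no DNS or CLDAP traffic. The function name `srv_query_names`, the site name "site-a", and the ordering (site-specific name first, domain-wide name second) are assumptions made for the example and do not describe SSSD's internals.

```python
def srv_query_names(domain, site=None, service="ldap", protocol="tcp"):
    """Build SRV query names of the form described in the message above.

    With a site, the site-scoped name is listed first, followed by the
    domain-wide name. The ordering is illustrative only.
    """
    domain = domain.strip(".")
    names = []
    if site:
        names.append(f"_{service}._{protocol}.{site}._sites.{domain}")
    names.append(f"_{service}._{protocol}.{domain}")
    return names


if __name__ == "__main__":
    # "site-a" is a hypothetical AD site name.
    for name in srv_query_names("example.com", site="site-a"):
        print(name)
```

The targets of such SRV records are individual DC host names, which is what distinguishes them from the collective names discussed above.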

...

I can supply SSSD logs and/or pcap files off-list if helpful...

SSSD logs would be nice. I would like to understand why SSSD fails here and does not continue until it finds a name for which a Kerberos ticket can be returned successfully. Feel free to send the log to me directly.

bye, Sumit

...

Cheers,

John

-- John Beranek To generalise is to be an idiot. http://redux.org.uk/ -- William Blake

...

sssd-users mailing list sssd-users@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/sssd-users

John Beranek

1:35 p.m.

On 21/09/2015 17:06, Sumit Bose wrote:

...

On Mon, Sep 21, 2015 at 03:10:50PM +0100, John Beranek wrote:

...
Hi,

Where I work I've just spent a fair amount of time tracing down an issue we were having with Linux servers (CentOS 6) which authenticate against the company Active Directory domain.

We found that SSSD 1.12.4-46.el6 clients were failing to work correctly against a particular DC in one of our sites. Looking in the SSSD logs I discovered it was a Kerberos "TGS-REQ" issue, whereby it would do a request and get back "Principal unknown".

[snip]

...

...
Given we didn't believe the 3 extra PTRs were performing any useful function, we deleted them, and started SSSD again. SSSD now happily connected to the DC, and is functional.

So, is there any reason why these PTRs would have upset SSSD like they appear to have?

SSSD tries to detect the environment mostly with DNS SRV requests. In general it tries a DNS query for _ldap._tcp.domain.name. In an AD environment where you have sites, SSSD first tries to determine the site with the help of a CLDAP request to a DC and then uses a query like _ldap._tcp.sitename._sites.domain.name to get only the DCs for the given site. The names returned by the query are considered valid host names for which Kerberos service tickets can be requested.

Well, in this case I've disabled site selection in order to force SSSD to connect to the troublesome DC, so it doesn't do that...

I don't understand quite why SSSD might be performing a reverse lookup on a selected DC's IP, and then using that in a Kerberos request, if that is in fact what it's doing.

...

forestdnszones.example.com and domaindnszones.example.com are special names in AD which return all DCs in the forest or in the domain, respectively. gc._msdcs.example.com is a special AD SRV record. None of them represents a single DC; each represents a collection of DCs and hence cannot be used to get a Kerberos ticket. Since they do not relate to a single host, I think they should not have a PTR record assigned.

It does appear that only one of our DCs had those PTRs defined, and apparently manually, not automatically added by the DC.

...

...
I can supply SSSD logs and/or pcap files off-list if helpful...

SSSD logs would be nice. I would like to understand why SSSD fails here and does not continue until it finds a name for which a Kerberos ticket can be returned successfully. Feel free to send the log to me directly.

I've sent log files and config file to Sumit directly.

Cheers,

John

Jakub Hrozek

2:48 p.m.

On Mon, Sep 21, 2015 at 07:35:13PM +0100, John Beranek wrote:

...

I don't understand quite why SSSD might be performing a reverse lookup on a selected DC's IP, and then using that in a Kerberos request, if that is in fact what it's doing.

btw this might not be sssd, but rather libkrb5 or cyrus-sasl. I haven't seen the logs, but I wonder if rdns=False in krb5.conf would help here?

Sumit Bose

23 Sep 2015

3:49 a.m.

On Mon, Sep 21, 2015 at 09:48:59PM +0200, Jakub Hrozek wrote:

...

On Mon, Sep 21, 2015 at 07:35:13PM +0100, John Beranek wrote:

...
I don't understand quite why SSSD might be performing a reverse lookup on a selected DC's IP, and then using that in a Kerberos request, if that is in fact what it's doing.

btw this might not be sssd, but rather libkrb5 or cyrus-sasl. I haven't seen the logs, but I wonder if rdns=False in krb5.conf would help here?

Hi John,

Thank you for the log files. I think Jakub is right: the error occurs during ldap_sasl_bind(), and the logs show that both your working and non-working setups are using the server names to connect.

In RHEL/CentOS 6, rdns is not set in krb5.conf and the default is 'true', which means that a reverse DNS lookup might happen during the SASL bind (SSSD does not do any reverse lookups on its own). Since this only covers the Kerberos part, you should also set

SASL_NOCANON on

in /etc/openldap/ldap.conf.

In RHEL-7 and current Fedora versions we changed this and have the setting above already in the default installation.
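A rough sketch for checking both settings on a client follows. The sample file contents, the helper names, and the simplified parsing (a line scan, not the real krb5 profile or OpenLDAP parsers) are assumptions for illustration. The sketch also assumes that `rdns` belongs in the `[libdefaults]` section of krb5.conf, and that the listed true/false spellings are the ones of interest.

```python
# Sample contents; on a real system these would be read from
# /etc/krb5.conf and /etc/openldap/ldap.conf.
KRB5_CONF = """\
[libdefaults]
    default_realm = EXAMPLE.COM
    rdns = false
"""

LDAP_CONF = """\
# OpenLDAP client defaults
SASL_NOCANON on
"""

FALSE_WORDS = {"false", "no", "off", "0", "n"}
TRUE_WORDS = {"true", "yes", "on", "1", "y"}


def krb5_rdns_disabled(text):
    """Return True if [libdefaults] sets rdns to a false value."""
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "libdefaults" and "=" in line:
            key, _, value = line.partition("=")
            if key.strip() == "rdns":
                return value.strip().lower() in FALSE_WORDS
    # Unset: per the message above, the RHEL/CentOS 6 default is 'true'.
    return False


def ldap_sasl_nocanon_enabled(text):
    """Return True if the last SASL_NOCANON line has a true value."""
    enabled = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].upper() == "SASL_NOCANON":
            enabled = parts[1].strip().lower() in TRUE_WORDS
    return enabled


if __name__ == "__main__":
    print("rdns disabled:", krb5_rdns_disabled(KRB5_CONF))
    print("SASL_NOCANON on:", ldap_sasl_nocanon_enabled(LDAP_CONF))
```

Both checks should report true on a client configured as suggested above. A false result for either one means name canonicalisation through reverse DNS may still take place during the SASL bind.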

HTH

bye, Sumit
