I've got a test instance of FreeIPA 4.4.4 running on F25 that was installed with --external-ca, and the resulting CSR signed with a validity period of 30 days to test behavior around expirations.
Upon booting that instance today, certmonger decided to preemptively renew every IPA cert -- which is a good thing -- but did so without waiting for renewal of the IPA CA cert first, which is less good. Now that instance has a pile of certs that expire in two weeks, since they were signed with and thus tied to the expiration of the old IPA CA cert.
While I'm guessing certmonger will figure this out and do the right thing within a couple weeks -- and with the expectation that this would only happen once per IPA CA renewal with a "real" deployment -- is this the intended behavior?
Logs are a bit of a mess between this and a potentially-resolved SELinux issue with certmonger, but I'll wedge them all into a proper bug report if desired.
-Rob
On Thu, May 25, 2017 at 01:34:16AM -0400, Rob Foehl via FreeIPA-users wrote:
I've got a test instance of FreeIPA 4.4.4 running on F25 that was installed with --external-ca, and the resulting CSR signed with a validity period of 30 days to test behavior around expirations.
Upon booting that instance today, certmonger decided to preemptively renew every IPA cert -- which is a good thing -- but did so without waiting for renewal of the IPA CA cert first, which is less good. Now that instance has a pile of certs that expire in two weeks, since they were signed with and thus tied to the expiration of the old IPA CA cert.
This is not correct. The CA cert must be valid for the leaf cert to be valid, but the CA cert *can* be renewed without requiring leaf certificates to be reissued. So long as the following conditions are met, everything will be fine:
1. The CA's key (and Subject Key Identifier) do not change
2. The CA's Subject DN does not change
3. The new CA certificate gets distributed to clients.
Cheers, Fraser
On Thu, 25 May 2017, Fraser Tweedale wrote:
This is not correct. The CA cert must be valid for the leaf cert to be valid, but the CA cert *can* be renewed without requiring leaf certificates to be reissued. So long as the following conditions are met, everything will be fine:
- The CA's key (and Subject Key Identifier) do not change
- The CA's Subject DN does not change
- The new CA certificate gets distributed to clients.
Huh? The CA cert's validity wasn't in question -- it was still valid, and was used to issue a slew of new certificates, all of which expire in two weeks, at expiration of the original CA cert. It has since been renewed, but that doesn't change the state of any of the leaf certs issued in the interim. Also not sure what the list of conditions has to do with anything, when it's up to "ipa-cacert-manage renew" to get those right.
-Rob
On Thu, May 25, 2017 at 10:59:11AM -0400, Rob Foehl via FreeIPA-users wrote:
On Thu, 25 May 2017, Fraser Tweedale wrote:
This is not correct. The CA cert must be valid for the leaf cert to be valid, but the CA cert *can* be renewed without requiring leaf certificates to be reissued. So long as the following conditions are met, everything will be fine:
- The CA's key (and Subject Key Identifier) do not change
- The CA's Subject DN does not change
- The new CA certificate gets distributed to clients.
Huh? The CA cert's validity wasn't in question -- it was still valid, and was used to issue a slew of new certificates, all of which expire in two weeks, at expiration of the original CA cert. It has since been renewed, but that doesn't change the state of any of the leaf certs issued in the interim. Also not sure what the list of conditions has to do with anything, when it's up to "ipa-cacert-manage renew" to get those right.
-Rob
What is the validity of the leaf certificates? Is the notAfter time of the leaf certificate pegged to the notAfter time of the CA certificate? If so, this is (IMO) a bug.
Thanks, Fraser
On Fri, 26 May 2017, Fraser Tweedale wrote:
What is the validity of the leaf certificates? Is the notAfter time of the leaf certificate pegged to the notAfter time of the CA certificate? If so, this is (IMO) a bug.
The leaf certs' expiration is pegged to that of the CA cert that was used to issue them -- the old one, in this case -- but that is expected behavior for any CA. It wouldn't be semantically valid otherwise, and there's no guarantee that the CA cert will actually be renewed without changing the key.
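A minimal sketch of the pegging described above, assuming the CA simply truncates a requested validity period at its own notAfter. The function name, the two-year requested lifetime, and the timestamps are hypothetical sample values, and this is not Dogtag's actual code:

```python
from datetime import datetime, timedelta, timezone

def leaf_not_after(issued_at, requested_lifetime, ca_not_after):
    """Clamp a leaf certificate's notAfter to the issuing CA's notAfter."""
    return min(issued_at + requested_lifetime, ca_not_after)

# Sample values only: a CA cert close to expiry, and a leaf issued shortly before.
ca_not_after = datetime(2017, 6, 7, 6, 25, 53, tzinfo=timezone.utc)
issued_at = datetime(2017, 5, 25, 0, 25, 21, tzinfo=timezone.utc)

# The requested two years is cut down to the CA's remaining lifetime.
clamped = leaf_not_after(issued_at, timedelta(days=730), ca_not_after)
print(clamped.isoformat())  # 2017-06-07T06:25:53+00:00
```

Under that assumption, any leaf cert renewed before the CA cert is replaced inherits the old CA's expiration, however long a lifetime was requested.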
The odd behavior here is that certmonger woke up, noticed that every IPA cert including the externally-signed IPA CA needed to be renewed, and immediately caused the CA to renew them all. The IPA CA cert itself yielded a log entry like this:
May 25 00:25:21 ipa.example.com dogtag-ipa-ca-renew-agent-submit[868]: Certificate with subject 'CN=Certificate Authority,O=EXAMPLE.COM' is about to expire, use ipa-cacert-manage to renew it
The other 7 or so IPA-generated certificates (host, RA, OCSP, etc.) were renewed using the existing CA cert, with new validity periods tied to that cert. As mentioned, certmonger would likely figure this out and renew them all again using the since-replaced CA cert within the ~2 week period until they all expire again, but this seems like unexpected behavior when the IPA CA cert is signed by an external CA and can't be auto-renewed.
(Actually, based on the order the renewals were submitted, this seems like it'd be an issue even if the CA cert were automatically renewed -- it wasn't the first one to be submitted, either. Incidentally, the certs which were renewed aren't a complete list -- both the "CN=ipa-ca-agent" and "CN=Object Signing Cert" certs weren't renewed and aren't tracked by certmonger.)
-Rob
Rob Foehl via FreeIPA-users wrote:
On Fri, 26 May 2017, Fraser Tweedale wrote:
What is the validity of the leaf certificates? Is the notAfter time of the leaf certificate pegged to the notAfter time of the CA certificate? If so, this is (IMO) a bug.
The leaf certs' expiration is pegged to that of the CA cert that was used to issue them -- the old one, in this case -- but that is expected behavior for any CA. It wouldn't be semantically valid otherwise, and there's no guarantee that the CA cert will actually be renewed without changing the key.
The odd behavior here is that certmonger woke up, noticed that every IPA cert including the externally-signed IPA CA needed to be renewed, and immediately caused the CA to renew them all. The IPA CA cert itself yielded a log entry like this:
May 25 00:25:21 ipa.example.com dogtag-ipa-ca-renew-agent-submit[868]: Certificate with subject 'CN=Certificate Authority,O=EXAMPLE.COM' is about to expire, use ipa-cacert-manage to renew it
The other 7 or so IPA-generated certificates (host, RA, OCSP, etc.) were renewed using the existing CA cert, with new validity periods tied to that cert. As mentioned, certmonger would likely figure this out and renew them all again using the since-replaced CA cert within the ~2 week period until they all expire again, but this seems like unexpected behavior when the IPA CA cert is signed by an external CA and can't be auto-renewed.
(Actually, based on the order the renewals were submitted, this seems like it'd be an issue even if the CA cert were automatically renewed -- it wasn't the first one to be submitted, either. Incidentally, the certs which were renewed aren't a complete list -- both the "CN=ipa-ca-agent" and "CN=Object Signing Cert" certs weren't renewed and aren't tracked by certmonger.)
certmonger doesn't have the context to know internal vs external. It just knows a cert is expiring within its window so it renews it. IMHO this is completely expected.
I believe that certmonger will renew it again as the final day approaches.
The object signing cert is deprecated and not used (it was used to sign a JAR file to automatically configure Firefox). The ipa-ca-agent cert isn't used either, it is an artifact of the dogtag install.
rob
On Fri, 26 May 2017, Rob Crittenden wrote:
Rob Foehl via FreeIPA-users wrote:
On Fri, 26 May 2017, Fraser Tweedale wrote:
What is the validity of the leaf certificates? Is the notAfter time of the leaf certificate pegged to the notAfter time of the CA certificate? If so, this is (IMO) a bug.
The leaf certs' expiration is pegged to that of the CA cert that was used to issue them -- the old one, in this case -- but that is expected behavior for any CA. It wouldn't be semantically valid otherwise, and there's no guarantee that the CA cert will actually be renewed without changing the key.
The odd behavior here is that certmonger woke up, noticed that every IPA cert including the externally-signed IPA CA needed to be renewed, and immediately caused the CA to renew them all. The IPA CA cert itself yielded a log entry like this:
May 25 00:25:21 ipa.example.com dogtag-ipa-ca-renew-agent-submit[868]: Certificate with subject 'CN=Certificate Authority,O=EXAMPLE.COM' is about to expire, use ipa-cacert-manage to renew it
The other 7 or so IPA-generated certificates (host, RA, OCSP, etc.) were renewed using the existing CA cert, with new validity periods tied to that cert. As mentioned, certmonger would likely figure this out and renew them all again using the since-replaced CA cert within the ~2 week period until they all expire again, but this seems like unexpected behavior when the IPA CA cert is signed by an external CA and can't be auto-renewed.
(Actually, based on the order the renewals were submitted, this seems like it'd be an issue even if the CA cert were automatically renewed -- it wasn't the first one to be submitted, either. Incidentally, the certs which were renewed aren't a complete list -- both the "CN=ipa-ca-agent" and "CN=Object Signing Cert" certs weren't renewed and aren't tracked by certmonger.)
certmonger doesn't have the context to know internal vs external. It just knows a cert is expiring within its window so it renews it. IMHO this is completely expected.
Right, I wouldn't expect it to know the provenance of the CA cert... I am wondering whether it should be able to recognize the dependency between the certs, though -- it should be able to recognize the chain, at least.
I believe that certmonger will renew it again as the final day approaches.
So, out of curiosity, I left the VM running through the original CA expiration date to see what would happen. The results aren't pretty:
- the running httpd kept using the old certificate (and CA chain), which broke https sessions to the UI/API (as might be expected);
- certmonger thinks it's renewed everything, with the new expiration dates lined up with that of the replaced external CA;
- none of the services recognize that they have new certs installed, for example the same httpd issue as seen in another thread:
[Fri Jun 09 01:07:37.413789 2017] [:error] [pid 14616] SSL Library Error: -8162 The certificate issuer's certificate has expired. Check your system date and time
[Fri Jun 09 01:07:37.413828 2017] [:error] [pid 14616] Unable to verify certificate 'Server-Cert'. Add "NSSEnforceValidCerts off" to nss.conf so the server can start until the problem can be resolved.
vs.
# getcert list -d /etc/httpd/alias -n Server-Cert
Number of certificates and requests being tracked: 8.
Request ID '20170508063315':
    status: MONITORING
    stuck: no
    key pair storage: type=NSSDB,location='/etc/httpd/alias',nickname='Server-Cert',token='NSS Certificate DB',pinfile='/etc/httpd/alias/pwdfile.txt'
    certificate: type=NSSDB,location='/etc/httpd/alias',nickname='Server-Cert',token='NSS Certificate DB'
    CA: IPA
    issuer: CN=Certificate Authority,O=EXAMPLE.COM
    subject: CN=ipa1.example.com,O=EXAMPLE.COM
    expires: 2017-06-24 04:32:24 UTC
    dns: ipa1.example.com
    principal name: HTTP/ipa1.example.com@EXAMPLE.COM
    key usage: digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment
    eku: id-kp-serverAuth,id-kp-clientAuth
    pre-save command:
    post-save command: /usr/libexec/ipa/certmonger/restart_httpd
    track: yes
    auto-renew: yes
- neither httpd nor pki-tomcatd will (re)start as a result, the former fails and the latter just spews stack traces into the logs every minute if forcibly started (even if httpd is band-aided first):
Jun 09 01:14:24 ipa1.example.com server[15236]: WARNING: Exception processing realm com.netscape.cms.tomcat.ProxyRealm@43523130 background process
Jun 09 01:14:24 ipa1.example.com server[15236]: javax.ws.rs.ServiceUnavailableException: Subsystem unavailable
Jun 09 01:14:24 ipa1.example.com server[15236]: at com.netscape.cms.tomcat.ProxyRealm.backgroundProcess(ProxyRealm.java:130)
Jun 09 01:14:24 ipa1.example.com server[15236]: at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1154)
Jun 09 01:14:24 ipa1.example.com server[15236]: at org.apache.catalina.core.StandardContext.backgroundProcess(StandardContext.java:5707)
Jun 09 01:14:24 ipa1.example.com server[15236]: at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1377)
Jun 09 01:14:24 ipa1.example.com server[15236]: at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1381)
Jun 09 01:14:24 ipa1.example.com server[15236]: at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1381)
Jun 09 01:14:24 ipa1.example.com server[15236]: at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1349)
Jun 09 01:14:24 ipa1.example.com server[15236]: at java.lang.Thread.run(Thread.java:748)
- certmonger never actually logged anything about replacing these certs a second time, and the first round were signed with the now-expired CA cert instead of the new one;
- the UI is only partially functional, after clicking through the certificate warnings in the browser;
- pki-tomcatd is non-functional even if forcibly started, leading to repeated "IPA Error 4301: CertificateOperationError" in the UI when trying to access anything CA-related;
- possibly other issues I haven't discovered yet.
In short, that didn't go particularly well at all, which in some ways brings me back to the original as-yet-unanswered deployment question:
Is trying to do this with an external CA worth the pain?
If I go that route, at some point I'm going to have to replace the CA cert -- and between this test VM and the "phase 2" mentioned in https://www.freeipa.org/page/V4/Distribution_of_CA_certificates_to_clients and the linked Pagure issue 4322, I have basically zero confidence in any of this...
-Rob
On Fri, 9 Jun 2017, I wrote:
In short, that didn't go particularly well at all, which in some ways brings me back to the original as-yet-unanswered deployment question:
Is trying to do this with an external CA worth the pain?
Three attempts at this question, and zero answers...
Can I at least get a yes or no on whether external CA certificate renewal has ever been tested when that certificate is nearing expiration?
I just duplicated last week's result using an earlier snapshot of the same VM and a renewed CA cert with a 3-day validity. certmonger ignored every other cert that it already renewed once with the original CA; whole system is hosed after the original cert expires. It's probably possible to recover by manually replacing every certificate, but I haven't had time to try that.
-Rob
Rob Foehl wrote:
On Fri, 9 Jun 2017, I wrote:
In short, that didn't go particularly well at all, which in some ways brings me back to the original as-yet-unanswered deployment question:
Is trying to do this with an external CA worth the pain?
Three attempts at this question, and zero answers...
Things slip through the cracks.
Can I at least get a yes or no on whether external CA certificate renewal has ever been tested when that certificate is nearing expiration?
Yes. I tested this with IPA v3.0. Did it break in between? Possible.
As I pointed out certmonger is unaware of the certificate chain and focuses only on the cert not-after date and resubmits the CSR to the CA that issued the certificate originally.
I just duplicated last week's result using an earlier snapshot of the same VM and a renewed CA cert with a 3-day validity. certmonger ignored every other cert that it already renewed once with the original CA; whole system is hosed after the original cert expires. It's probably possible to recover by manually replacing every certificate, but I haven't had time to try that.
certmonger checks at days 28, 7, 3, 2 and 1 before expiration by default for certificate expiration so it should have looked at the certs at least two times, three depending on timing (and really, it's seconds before expiration). Did you let the system sit for 3 days before things died? Was anything logged to syslog? Moving time forward a day at a time is insufficient to test this without restarting certmonger.
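To make the "two times, three depending on timing" arithmetic concrete, here is a small sketch that takes the thresholds quoted above at face value. The function name is hypothetical, and whether certmonger evaluates its thresholds exactly this way is an assumption, not something taken from its source:

```python
from datetime import datetime, timedelta, timezone

# Days before notAfter at which a check is expected, as stated in this thread.
CHECK_DAYS = (28, 7, 3, 2, 1)

def remaining_checks(now, not_after, check_days=CHECK_DAYS):
    """Return the threshold times that are still strictly in the future at `now`."""
    times = [not_after - timedelta(days=d) for d in check_days]
    return [t for t in times if t > now]

not_after = datetime(2017, 6, 24, 4, 32, 24, tzinfo=timezone.utc)

# A cert with exactly 3 days left has already hit the 3-day threshold,
# so only the 2-day and 1-day checks remain.
just_issued = not_after - timedelta(days=3)
print(len(remaining_checks(just_issued, not_after)))  # prints 2

# One second earlier, the 3-day threshold is still ahead as well.
print(len(remaining_checks(just_issued - timedelta(seconds=1), not_after)))  # prints 3
```

This only models when a check would be due; it says nothing about what happens when the daemon is not running at that moment, or when the clock is moved while it is running.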
Even in a worst-case scenario, where all the certs expire, it is a fairly straightforward process to get the services back up by going back in time, renewing the IPA CA then restarting certmonger to renew the service certificates.
Is it perfect? No. A search of the users forum should make that apparent. It has been difficult to reproduce the failures because it's difficult to simulate by moving time around. Several years ago I left VMs running for months to try to simulate failures and it always worked for me.
Note too that there is a difference between certmonger and the renewals. certmonger renews certs but there are helpers that need to fire off to update information within IPA as well and to distribute updated certificates to replicas. These scripts were updated significantly since I wrote them to be much more robust in terms of reliability and logging.
rob
On Thu, 15 Jun 2017, Rob Crittenden wrote:
Rob Foehl wrote:
Can I at least get a yes or no on whether external CA certificate renewal has ever been tested when that certificate is nearing expiration?
Yes. I tested this with IPA v3.0. Did it break in between? Possible.
As I pointed out certmonger is unaware of the certificate chain and focuses only on the cert not-after date and resubmits the CSR to the CA that issued the certificate originally.
Thanks for the reply.
certmonger not knowing about the chain is understandable, as is the resubmission of each tracked cert to the existing CA. Doing this results in a pile of certs that expire relatively quickly, being tied to the old CA, but that's also not surprising -- the surprise is that it only did that once, and has since appeared to ignore them all, even after the CA was renewed manually and the newly-issued-but-short-lived certs tied to the old CA expired.
I just duplicated last week's result using an earlier snapshot of the same VM and a renewed CA cert with a 3-day validity. certmonger ignored every other cert that it already renewed once with the original CA; whole system is hosed after the original cert expires. It's probably possible to recover by manually replacing every certificate, but I haven't had time to try that.
certmonger checks at days 28, 7, 3, 2 and 1 before expiration by default for certificate expiration so it should have looked at the certs at least two times, three depending on timing (and really, it's seconds before expiration). Did you let the system sit for 3 days before things died? Was anything logged to syslog? Moving time forward a day at a time is insufficient to test this without restarting certmonger.
I let the original VM snapshot run for a month straight, renewing the IPA CA by hand after the first round of certmonger-initiated renewals with 14 days til expiration and on the second attempt after expiration. The first attempt used another 30-day cert, the second used a 3-day and was allowed to run straight through. No time jumps while the VM is running, and all snapshots with the VM powered off, so it always booted with an accurate clock.
certmonger never logged anything after the first renewal cycle on either attempt. A 'getcert list' on the long-running VM shows all of the tracked certificates with an expiration date of 2017-06-24, which matches the lifetime of the renewed CA cert, but none of the services attempting to load or use them are happy.
Now that I poke around some more, here's a wrinkle:
# getcert list -d /etc/httpd/alias -n Server-Cert
Number of certificates and requests being tracked: 8.
Request ID '20170508063315':
    status: MONITORING
    stuck: no
    key pair storage: type=NSSDB,location='/etc/httpd/alias',nickname='Server-Cert',token='NSS Certificate DB',pinfile='/etc/httpd/alias/pwdfile.txt'
    certificate: type=NSSDB,location='/etc/httpd/alias',nickname='Server-Cert',token='NSS Certificate DB'
    CA: IPA
    issuer: CN=Certificate Authority,O=EXAMPLE.COM
    subject: CN=ipa1.example.com,O=EXAMPLE.COM
    expires: 2017-06-24 04:32:24 UTC
    dns: ipa1.example.com
    principal name: HTTP/ipa1.example.com@EXAMPLE.COM
    key usage: digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment
    eku: id-kp-serverAuth,id-kp-clientAuth
    pre-save command:
    post-save command: /usr/libexec/ipa/certmonger/restart_httpd
    track: yes
    auto-renew: yes
So certmonger thinks it renewed that, and indeed that certificate is now tied to the new IPA CA lifetime *and* was renewed a week after the original IPA CA renewal on May 24 (even though this was never logged):
# certutil -L -n Server-Cert -d /etc/httpd/alias
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 19 (0x13)
        Signature Algorithm: PKCS #1 SHA-256 With RSA Encryption
        Issuer: "CN=Certificate Authority,O=EXAMPLE.COM"
        Validity:
            Not Before: Wed May 31 14:52:53 2017
            Not After : Sat Jun 24 04:32:24 2017
        Subject: "CN=ipa1.example.com,O=EXAMPLE.COM"
But httpd still refuses to start with that NSSDB, and this appears to be why:
# certutil -L -n Signing-Cert -d /etc/httpd/alias
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 9 (0x9)
        Signature Algorithm: PKCS #1 SHA-256 With RSA Encryption
        Issuer: "CN=Certificate Authority,O=EXAMPLE.COM"
        Validity:
            Not Before: Mon May 08 06:33:16 2017
            Not After : Wed Jun 07 06:25:53 2017
        Subject: "CN=Object Signing Cert,O=EXAMPLE.COM"
Does certmonger know how to replace the entire certificate chain in the respective store(s)?
(The third certificate in there, ipaCert / CN=IPA RA, has the same dates as the Server-Cert above.)
Even in a worst-case scenario, where all the certs expire, it is a fairly straightforward process to get the services back up by going back in time, renewing the IPA CA then restarting certmonger to renew the service certificates.
Is it perfect? No. A search of the users forum should make that apparent. It has been difficult to reproduce the failures because it's difficult to simulate by moving time around. Several years ago I left VMs running for months to try to simulate failures and it always worked for me.
I haven't tried kicking the clock around yet... The second attempt booted from a month-old snapshot and immediately blew itself up; renewing the CA cert and restarting certmonger (really, the whole VM) didn't change anything.
Note too that there is a difference between certmonger and the renewals. certmonger renews certs but there are helpers that need to fire off to update information within IPA as well and to distribute updated certificates to replicas. These scripts were updated significantly since I wrote them to be much more robust in terms of reliability and logging.
Consider uses of "certmonger" above to include these... Another wrinkle, discovered early on, was broken SELinux policy that prevented certmonger from running any of them. That was (apparently) fixed by a later selinux-policy-targeted package release, but I haven't tried the whole process from a bare install since. The second test with the 3-day lifetime on the IPA CA renewal should've been okay here. I can try again with a fresh install and relatively short IPA CA cert lifetimes, say 4 days per renewal if that'll be sufficient to provoke this a bit faster.
I'm still worried about the missing "phase 2" when it comes to distributing a new external CA certificate -- the CA I have expires in 3 years, and it'd be nice to know whether I'm shooting myself in the foot if I try signing the for-real IPA CA with it now.
-Rob
Rob Foehl wrote:
On Thu, 15 Jun 2017, Rob Crittenden wrote:
Rob Foehl wrote:
Can I at least get a yes or no on whether external CA certificate renewal has ever been tested when that certificate is nearing expiration?
Yes. I tested this with IPA v3.0. Did it break in between? Possible.
As I pointed out certmonger is unaware of the certificate chain and focuses only on the cert not-after date and resubmits the CSR to the CA that issued the certificate originally.
Thanks for the reply.
certmonger not knowing about the chain is understandable, as is the resubmission of each tracked cert to the existing CA. Doing this results in a pile of certs that expire relatively quickly, being tied to the old CA, but that's also not surprising -- the surprise is that it only did that once, and has since appeared to ignore them all, even after the CA was renewed manually and the newly-issued-but-short-lived certs tied to the old CA expired.
Ok, I'll need to try to reproduce it. It may take me a while to get around to this so feel free to nag me.
I just duplicated last week's result using an earlier snapshot of the same VM and a renewed CA cert with a 3-day validity. certmonger ignored every other cert that it already renewed once with the original CA; whole system is hosed after the original cert expires. It's probably possible to recover by manually replacing every certificate, but I haven't had time to try that.
certmonger checks at days 28, 7, 3, 2 and 1 before expiration by default for certificate expiration so it should have looked at the certs at least two times, three depending on timing (and really, it's seconds before expiration). Did you let the system sit for 3 days before things died? Was anything logged to syslog? Moving time forward a day at a time is insufficient to test this without restarting certmonger.
I let the original VM snapshot run for a month straight, renewing the IPA CA by hand after the first round of certmonger-initiated renewals with 14 days til expiration and on the second attempt after expiration. The first attempt used another 30-day cert, the second used a 3-day and was allowed to run straight through. No time jumps while the VM is running, and all snapshots with the VM powered off, so it always booted with an accurate clock.
certmonger never logged anything after the first renewal cycle on either attempt. A 'getcert list' on the long-running VM shows all of the tracked certificates with an expiration date of 2017-06-24, which matches the lifetime of the renewed CA cert, but none of the services attempting to load or use them are happy.
It depends on why they aren't happy. Are they not happy due to expired certs or something else?
[snip]
But httpd still refuses to start with that NSSDB, and this appears to be why:
# certutil -L -n Signing-Cert -d /etc/httpd/alias
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 9 (0x9)
        Signature Algorithm: PKCS #1 SHA-256 With RSA Encryption
        Issuer: "CN=Certificate Authority,O=EXAMPLE.COM"
        Validity:
            Not Before: Mon May 08 06:33:16 2017
            Not After : Wed Jun 07 06:25:53 2017
        Subject: "CN=Object Signing Cert,O=EXAMPLE.COM"
mod_nss shouldn't be considering the signing cert so I doubt this is related.
Does certmonger know how to replace the entire certificate chain in the respective store(s)?
(The third certificate in there, ipaCert / CN=IPA RA, has the same dates as the Server-Cert above.)
So it was renewed as well.
certmonger doesn't push out new chains so if that changed in between that would do it. This is another way to test cert validation from the command-line:
# certutil -V -u V -d /etc/httpd/alias -n Server-Cert
If you want to see if updating the CA cert(s) makes any difference.
Even in a worst-case scenario, where all the certs expire, it is a fairly straightforward process to get the services back up by going back in time, renewing the IPA CA then restarting certmonger to renew the service certificates.
Is it perfect? No. A search of the users forum should make that apparent. It has been difficult to reproduce the failures because it's difficult to simulate by moving time around. Several years ago I left VMs running for months to try to simulate failures and it always worked for me.
I haven't tried kicking the clock around yet... The second attempt booted from a month-old snapshot and immediately blew itself up; renewing the CA cert and restarting certmonger (really, the whole VM) didn't change anything.
If the chain changes then yeah, that'd cause problems.
Note too that there is a difference between certmonger and the renewals. certmonger renews certs but there are helpers that need to fire off to update information within IPA as well and to distribute updated certificates to replicas. These scripts were updated significantly since I wrote them to be much more robust in terms of reliability and logging.
Consider uses of "certmonger" above to include these... Another wrinkle, discovered early on, was broken SELinux policy that prevented certmonger from running any of them. That was (apparently) fixed by a later selinux-policy-targeted package release, but I haven't tried the whole process from a bare install since. The second test with the 3-day lifetime on the IPA CA renewal should've been okay here. I can try again with a fresh install and relatively short IPA CA cert lifetimes, say 4 days per renewal if that'll be sufficient to provoke this a bit faster.
I'm still worried about the missing "phase 2" when it comes to distributing a new external CA certificate -- the CA I have expires in 3 years, and it'd be nice to know whether I'm shooting myself in the foot if I try signing the for-real IPA CA with it now.
The really tricky bit is distributing the updated CA chain around. I've been away from IPA for a while but I can give you some bread crumbs. I believe that ipa-cacert-manage can be used to update the stored CA chain in LDAP and then running ipa-certupdate will pull the chain down, it just needs to be run on every master and client.
The biggest gap here is any notification of impending doom: YOUR CA WILL EXPIRE IN 'n' DAYS.
This gap is due to no real way of notifying anyone. We could nag in the UI perhaps, or log, but that wouldn't guarantee that an admin would see it.
E-mail might work but it requires a configured MUA and that can't be assumed, and I seriously doubt a typical IPA admin wants yet another service to configure.
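As a rough illustration of the kind of check that could produce such a warning, the sketch below parses a "Not After" timestamp in the format certutil prints and reports the days remaining. The function name and the 28-day warning window are hypothetical, the timestamp is assumed to be UTC, parsing the day and month names assumes an English/C locale, and nothing here is existing IPA or certmonger functionality:

```python
from datetime import datetime, timezone

def ca_expiry_warning(not_after_text, now, warn_days=28):
    """Return a warning string, or None if the cert is outside the warning window."""
    # Format as printed by `certutil -L`, e.g. "Sat Jun 24 04:32:24 2017".
    not_after = datetime.strptime(not_after_text, "%a %b %d %H:%M:%S %Y")
    not_after = not_after.replace(tzinfo=timezone.utc)  # assumption: value is UTC
    remaining = not_after - now
    if remaining.total_seconds() <= 0:
        return "CA certificate EXPIRED on {:%Y-%m-%d %H:%M:%S} UTC".format(not_after)
    if remaining.days < warn_days:
        return "YOUR CA WILL EXPIRE IN {} DAYS".format(remaining.days)
    return None

now = datetime(2017, 6, 9, 1, 7, 37, tzinfo=timezone.utc)
print(ca_expiry_warning("Sat Jun 24 04:32:24 2017", now))  # YOUR CA WILL EXPIRE IN 15 DAYS
print(ca_expiry_warning("Wed Jun 07 06:25:53 2017", now))
```

How the resulting message would reach an admin (log, UI, or mail) is exactly the open question; the sketch only covers computing it.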
rob
On Mon, 19 Jun 2017, Rob Crittenden wrote:
Rob Foehl wrote:
On Thu, 15 Jun 2017, Rob Crittenden wrote:
Rob Foehl wrote:
Can I at least get a yes or no on whether external CA certificate renewal has ever been tested when that certificate is nearing expiration?
Yes. I tested this with IPA v3.0. Did it break in between? Possible.
As I pointed out certmonger is unaware of the certificate chain and focuses only on the cert not-after date and resubmits the CSR to the CA that issued the certificate originally.
Thanks for the reply.
certmonger not knowing about the chain is understandable, as is the resubmission of each tracked cert to the existing CA. Doing this results in a pile of certs that expire relatively quickly, being tied to the old CA, but that's also not surprising -- the surprise is that it only did that once, and has since appeared to ignore them all, even after the CA was renewed manually and the newly-issued-but-short-lived certs tied to the old CA expired.
Ok, I'll need to try to reproduce it. It may take me a while to get around to this so feel free to nag me.
Consider this that, maybe... I just got around to beating my head against this some more myself. I'm still trying to convince myself that use of an external CA is viable, so I'd resurrected the test VM from May/June and this time actually managed to sort it out. More detail below.
I just duplicated last week's result using an earlier snapshot of the same VM and a renewed CA cert with a 3-day validity. certmonger ignored every other cert that it already renewed once with the original CA; whole system is hosed after the original cert expires. It's probably possible to recover by manually replacing every certificate, but I haven't had time to try that.
certmonger checks at days 28, 7, 3, 2 and 1 before expiration by default for certificate expiration so it should have looked at the certs at least two times, three depending on timing (and really, it's seconds before expiration). Did you let the system sit for 3 days before things died? Was anything logged to syslog? Moving time forward a day at a time is insufficient to test this without restarting certmonger.
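To make the schedule quoted above concrete, here is a minimal shell sketch of the arithmetic only. It assumes GNU date (for `-d`), the function names are made up for illustration, and it says nothing about how certmonger implements its checks internally.

```shell
#!/bin/sh
# Illustrative only: the thresholds come from the message above; the
# function names are hypothetical. Requires GNU date for `-d`.

# days_left NOT_AFTER [NOW]
# Prints the whole days from NOW (default: current time) to NOT_AFTER.
days_left() {
    end=$(date -u -d "$1" +%s) || return 1
    now=$(date -u -d "${2:-now}" +%s) || return 1
    echo $(( (end - now) / 86400 ))
}

# crossed_thresholds DAYS
# Prints each default threshold (28 7 3 2 1) that DAYS is at or below.
crossed_thresholds() {
    out=""
    for t in 28 7 3 2 1; do
        if [ "$1" -le "$t" ]; then
            out="$out $t"
        fi
    done
    echo "${out# }"
}
```

For example, `crossed_thresholds "$(days_left 'Wed Jun 07 06:25:53 2017')"` would show which check points a certificate with that Not After date has already passed on the day it is run.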
I let the original VM snapshot run for a month straight, renewing the IPA CA by hand after the first round of certmonger-initiated renewals with 14 days til expiration and on the second attempt after expiration. The first attempt used another 30-day cert, the second used a 3-day and was allowed to run straight through. No time jumps while the VM is running, and all snapshots with the VM powered off, so it always booted with an accurate clock.
certmonger never logged anything after the first renewal cycle on either attempt. A 'getcert list' on the long-running VM shows all of the tracked certificates with an expiration date of 2017-06-24, which matches the lifetime of the renewed CA cert, but none of the services attempting to load or use them are happy.
It depends on why they aren't happy. Are they not happy due to expired certs or something else?
They weren't happy due to the expired CA certs, and in some cases the leaf certificates hadn't been updated in place due to SELinux denials.
I'm still not sure why certmonger thought it'd replaced certificates when it hadn't, and I don't remember which of the last ~30 snapshots left them in this state, or I'd dig deeper :)
But httpd still refuses to start with that NSSDB, and this appears to be why:
# certutil -L -n Signing-Cert -d /etc/httpd/alias
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 9 (0x9)
        Signature Algorithm: PKCS #1 SHA-256 With RSA Encryption
        Issuer: "CN=Certificate Authority,O=EXAMPLE.COM"
        Validity:
            Not Before: Mon May 08 06:33:16 2017
            Not After : Wed Jun 07 06:25:53 2017
        Subject: "CN=Object Signing Cert,O=EXAMPLE.COM"
mod_nss shouldn't be considering the signing cert so I doubt this is related.
The startup failures may have been related to the WSGI modules trying to connect to other services with expired certs. Either way, that expired CA cert is what gets presented to HTTPS clients, so it's still a problem.
Does certmonger know how to replace the entire certificate chain in the respective store(s)?
(The third certificate in there, ipaCert / CN=IPA RA, has the same dates as the Server-Cert above.)
So it was renewed as well.
certmonger doesn't push out new chains so if that changed in between that would do it. This is another way to test cert validation from the command-line:
# certutil -V -u V -d /etc/httpd/alias -n Server-Cert
If you want to see if updating the CA cert(s) makes any difference.
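The same validation command can be looped over several nicknames. This is only a sketch: the function name is hypothetical, the nicknames are whatever the caller passes in, and it assumes certutil exits non-zero when validation fails.

```shell
#!/bin/sh
# Sketch: run the `certutil -V -u V` check from the message above against
# a list of nicknames. Assumes certutil returns non-zero for an invalid cert.

# verify_nicknames DBDIR NICK...
# Prints "OK <nick>" or "FAIL <nick>" per nickname and returns non-zero
# if any nickname failed validation.
verify_nicknames() {
    db=$1
    shift
    rc=0
    for nick in "$@"; do
        if certutil -V -u V -d "$db" -n "$nick" >/dev/null 2>&1; then
            echo "OK $nick"
        else
            echo "FAIL $nick"
            rc=1
        fi
    done
    return $rc
}
```

Usage would be along the lines of `verify_nicknames /etc/httpd/alias Server-Cert Signing-Cert`, run before and after updating the CA cert(s) to see whether anything changes.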
Even in a worst-case scenario, where all the certs expire, it is a fairly straightforward process to get the services back up by going back in time, renewing the IPA CA then restarting certmonger to renew the service certificates.
Is it perfect? No. A search of the users forum should make that apparent. It has been difficult to reproduce the failures because it's difficult to simulate by moving time around. Several years ago I left VMs running for months to try to simulate failures and it always worked for me.
I haven't tried kicking the clock around yet... The second attempt booted from a month-old snapshot and immediately blew itself up; renewing the CA cert and restarting certmonger (really, the whole VM) didn't change anything.
If the chain changes then yeah, that'd cause problems.
I think I've stumbled onto what happened here, but I don't know how to reliably reproduce it. See below.
Note too that there is a difference between certmonger and the renewals. certmonger renews certs but there are helpers that need to fire off to update information within IPA as well and to distribute updated certificates to replicas. These scripts were updated significantly since I wrote them to be much more robust in terms of reliability and logging.
The really tricky bit is distributing the updated CA chain around. I've been away from IPA for a while but I can give you some bread crumbs. I believe that ipa-cacert-manage can be used to update the stored CA chain in LDAP and then running ipa-certupdate will pull the chain down, it just needs to be run on every master and client.
Bingo. The necessity of running ipa-certupdate in this case isn't really covered anywhere in the documentation, with the best description I could find in https://www.freeipa.org/page/V4/CA_certificate_renewal starting with "there will be a new utility"...
Here's what it took to coerce everything back into working order:
- 'setenforce 0', followed by a shower attempting to wash away the shame
Seriously, the lack of idempotent helper scripts is a huge problem here, and is the underlying cause of most of this pain. certmonger can wind up in a state where it thinks it's replaced certs when it hasn't; various services (including Dogtag and the KDC proxy) can wind up unable to connect to the directory service; et cetera.
See https://bugzilla.redhat.com/show_bug.cgi?id=1475528 for the specific instance still affecting pki-tomcatd.
- Modify /etc/pki/pki-tomcat/ca/CS.cfg and /etc/pki/pki-tomcat/password.conf to use plain LDAP connections, based in part on information found in this post: https://www.redhat.com/archives/freeipa-users/2017-January/msg00216.html
This step was necessary to get pki-tomcatd to start at all, after its client cert had been partially mangled by the earlier renewal attempt.
- Stop certmonger, IPA, and chronyd or ntpd, as appropriate, and roll the clock back to a date when the originally installed certs were valid
- Really stop certmonger, violently, then remove /var/run/ipa/renewal.lock
- Start IPA services via 'ipactl start', wait for everything to come up, then start certmonger and wait for it to settle (which takes a while if it's decided to attempt renewals with the old CA)
- Run 'ipa-cacert-manage renew --external-ca' and sign the resulting CSR with a validity interval that overlaps the original CA cert
- Run 'ipa-cacert-manage renew --external-cert-file=/path/to/ipa-ca.pem --external-cert-file=/path/to/ca.pem' to import the resulting CA chain
- Stop certmonger again, clean up as above if necessary
- Run 'ipa-certupdate', possibly after 'kinit admin' to get a ticket
- Step clock forward to a day or two prior to original leaf certificate expiration, as imposed by the original CA lifetime and within the validity period of the new CA cert
- Start certmonger, wait for it to renew all the leaf certificates, and verify the results with 'getcert list', paying attention to the expiration times across the board
- Assuming this worked: stop all services again, revert the CS.cfg and password.conf changes, and either manually fix the clock and restart everything (including the time service) or just reboot
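The steps above can be outlined as a dry-run script. This is a sketch, not a tested recovery tool: by default it only prints the commands, the dates and file paths are placeholders supplied by the caller, chronyd is picked arbitrarily over ntpd, and `pkill -9` is just one guess at what "violently" means. The CS.cfg and password.conf edits and the CSR signing remain manual and appear only as comments.

```shell
#!/bin/sh
# Dry-run outline of the recovery steps listed above. Every command is
# only printed unless DRY_RUN=0 is set. All arguments are placeholders.

DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# recover_external_ca OLD_VALID_DATE PRE_EXPIRY_DATE IPA_CA_PEM EXT_CA_PEM
recover_external_ca() {
    old_valid_date=$1    # a date when the originally installed certs were valid
    pre_expiry_date=$2   # a day or two before the original leaf certs expire
    ipa_ca_pem=$3        # newly signed IPA CA cert
    ext_ca_pem=$4        # external CA cert

    run setenforce 0
    # Manual: switch CS.cfg and password.conf to plain LDAP if pki-tomcatd won't start.
    run systemctl stop certmonger
    run ipactl stop
    run systemctl stop chronyd            # or ntpd, as appropriate
    run date -s "$old_valid_date"
    run pkill -9 certmonger               # assumption: one way to "really stop" it
    run rm -f /var/run/ipa/renewal.lock
    run ipactl start
    run systemctl start certmonger        # then wait for it to settle
    run ipa-cacert-manage renew --external-ca
    # Manual: sign the CSR so its validity overlaps the original CA cert.
    run ipa-cacert-manage renew "--external-cert-file=$ipa_ca_pem" "--external-cert-file=$ext_ca_pem"
    run systemctl stop certmonger
    run ipa-certupdate                    # may need 'kinit admin' first
    run date -s "$pre_expiry_date"
    run systemctl start certmonger        # then wait for the leaf renewals
    run getcert list                      # check expiration times across the board
    # Manual: stop everything, revert CS.cfg/password.conf, fix the clock or reboot.
}
```

Running `recover_external_ca '2017-05-10' '2017-06-05' /path/to/ipa-ca.pem /path/to/ca.pem` with the default DRY_RUN prints the sequence for review; the dates here are made-up examples.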
Here's the catch: this worked the first time I did it, with a new CA set to expire 30 days after the last one and only stepping the clock forward enough to land in the middle of that one. I repeated the whole process (less the CS.cfg steps) with another externally signed CA cert for another 30 days, and after that pass, certmonger refused to update anything using the new CA, clinging instead to the one from the first attempt and reusing its expiration date on all renewed certs.
Why this happened isn't entirely clear, but one thing I did notice after that attempt is that the newly replaced CA wasn't the first one listed in (at least) the NSSDBs for httpd and pki-tomcatd, instead coming second in the list when examined with certutil. I generated another CSR and CA with different dates and a different offset from the second attempt, and ran through the whole process again; the result was even more bizarre, with all five CAs (the original, first renewal, and three recent) now all appearing in the correct order in the NSSDBs, and certmonger happily renewing the leaf certs, pinned to the new CA expiration date.
I'm not sure what to take away from that, other than that it worked eventually, and I now have a functional IPA instance which I'd thought was a lost cause the last time I looked at it. Happy to share anything anyone wants a look at, including the NSSDBs which now look like this:
# certutil -L -d /etc/pki/pki-tomcat/alias
Certificate Nickname                                         Trust Attributes
                                                             SSL,S/MIME,JAR/XPI

OU=example.com CA,O=example.com,C=US                         CT,C,C
caSigningCert cert-pki-ca                                    CTu,Cu,Cu
caSigningCert cert-pki-ca                                    CTu,Cu,Cu
Server-Cert cert-pki-ca                                      u,u,u
auditSigningCert cert-pki-ca                                 u,u,Pu
ocspSigningCert cert-pki-ca                                  u,u,u
caSigningCert cert-pki-ca                                    CTu,Cu,Cu
caSigningCert cert-pki-ca                                    CTu,Cu,Cu
caSigningCert cert-pki-ca                                    CTu,Cu,Cu
subsystemCert cert-pki-ca                                    u,u,u
(Aside: is there any sane way to clean these up?)
I'll keep this image around for a while, although I don't plan on spending too much more time with it. Been enough "fun" already...
-Rob
Rob Foehl wrote:
They weren't happy due to the expired CA certs, and in some cases the leaf certificates hadn't been updated in place due to SELinux denials.
We recently started seeing this as well, https://bugzilla.redhat.com/show_bug.cgi?id=1481388
This is a frustrating issue: the certificate gets issued, but the places that need to be updated to reflect it aren't, which causes a cascade of failures.
I'm still not sure why certmonger thought it'd replaced certificates when it hadn't, and I don't remember which of the last ~30 snapshots left them in this state, or I'd dig deeper :)
Because certmonger doesn't track the pre or post scripts. From certmonger's perspective (at least in the BZ above) the certificate was successfully renewed, but because of SELinux issues parts of the post-command script blew up, which leaves things in an unhappy state in general.
'setenforce 0', followed by a shower attempting to wash away the shame
Seriously, the lack of idempotent helper scripts is a huge problem here, and is the underlying cause of most of this pain. certmonger can wind up in a state where it thinks it's replaced certs when it hasn't; various services (including Dogtag and the KDC proxy) can wind up unable to connect to the directory service; et cetera.
Yeah, I'm not sure what the best recourse is. Ideally there should be no need to re-run things manually. Practically it may still be required, but the current scripts aren't exactly meant to be run by end-users. The focus for at least the next release is tightening up loose ends exactly like this.
As for the setenforce 0, once the SELinux issues are ironed out this should no longer be needed. The issue wasn't caught during pre-release testing.
Here's the catch: this worked the first time I did it, with a new CA set to expire 30 days after the last one and only stepping the clock forward enough to land in the middle of that one. I repeated the whole process (less the CS.cfg steps) with another externally signed CA cert for another 30 days, and after that pass, certmonger refused to update anything using the new CA, clinging instead to the one from the first attempt and reusing its expiration date on all renewed certs.
Why this happened isn't entirely clear, but one thing I did notice after that attempt is that the newly replaced CA wasn't the first one listed in (at least) the NSSDBs for httpd and pki-tomcatd, instead coming second in the list when examined with certutil. I generated another CSR and CA with different dates and a different offset from the second attempt, and ran through the whole process again; the result was even more bizarre, with all five CAs (the original, first renewal, and three recent) now all appearing in the correct order in the NSSDBs, and certmonger happily renewing the leaf certs, pinned to the new CA expiration date.
(Aside: is there any sane way to clean these up?)
I'll keep this image around for a while, although I don't plan on spending too much more time with it. Been enough "fun" already...
NSS is supposed to pick the "best" cert to use when there is an overlap of subjects (best as in matches the usage, time is valid, etc). I don't know that the order in the output is meaningful.
To clean up (after making one or several backups of the db files), use certutil -L -a to export all the certs. IIRC it will dump them all into a single PEM file. Edit that file to pull out the one you want, use certutil -D to remove the cert(s) from the db, then certutil -A to add back the one you chose from the PEM file.
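As a rough sketch of that cleanup, the sequence might look like the following. It prints the commands unless DRY_RUN=0 is set; the function names, backup location and arguments are illustrative assumptions, and whether a single certutil -D removes every cert sharing a nickname is not something this sketch relies on, so check with certutil -L afterwards.

```shell
#!/bin/sh
# Sketch of the NSSDB cleanup described above. Commands are only printed
# unless DRY_RUN=0. Nickname, trust flags and file names are examples.

DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# export_all DBDIR OUTFILE
# Wrapper so the output redirect stays out of the dry-run echo.
export_all() {
    certutil -L -d "$1" -a > "$2"
}

# cleanup_nickname DBDIR NICKNAME TRUST KEEP_PEM
cleanup_nickname() {
    db=$1
    nick=$2
    trust=$3
    keep=$4    # PEM file holding the one cert to keep, edited by hand

    run cp -a "$db" "$db.bak"                       # back up the db files first
    run export_all "$db" "$db.bak/all-certs.pem"    # dump the certs as PEM
    run certutil -D -d "$db" -n "$nick"             # repeat if copies remain
    run certutil -A -d "$db" -n "$nick" -t "$trust" -a -i "$keep"
}
```

For the database shown earlier, a call might be `cleanup_nickname /etc/pki/pki-tomcat/alias 'caSigningCert cert-pki-ca' 'CTu,Cu,Cu' /path/to/keep.pem`, where keep.pem is the hand-picked certificate.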
Congratulations on most excellent troubleshooting!
rob