Hi Carlos,
I was going through the latest test failures in rawhide and one of them is in tst-dl-iter-static. I tracked the reason for failure to the fix for the bz 737223.
The bz is an obscure crash in GL applications when using the proprietary nvidia driver and someone tracked it down to the assignment of the vdso soname to l->l_name. The fix was to do that assignment only when debug mask is non-zero, which seems to mask the crash.
Through all that, there seemed to have been no root cause analysis for the crash, so we don't even know if the fix only papered over a symptom or if it actually fixed something. I could easily work around the failed test, but it seems to me that the right thing to here would be to drop the patch and see if the crash is still there, which would obviously be by waiting for bug reports. That would give us a chance to fix it correctly or at least justify the current fix to send it upstream. What do you think?
Siddhesh
The question of corrupt DSOs and hardening against them is entirely up to our good judgement. IIRC this patch was to harden against these DSOs. If we think there is a real bug we should file an issue and investigate.
Your call.
Cheers, Carlos.
-----Original Message----- From: Siddhesh Poyarekar [siddhesh@redhat.com] Received: Friday, 04 Jul 2014, 2:16PM To: glibc@lists.fedoraproject.org CC: carlos@redhat.com Subject: [RFC] Revert patch for bz#737223
Hi Carlos,
I was going through the latest test failures in rawhide and one of them is in tst-dl-iter-static. I tracked the reason for failure to the fix for the bz 737223.
The bz is an obscure crash in GL applications when using the proprietary nvidia driver and someone tracked it down to the assignment of the vdso soname to l->l_name. The fix was to do that assignment only when debug mask is non-zero, which seems to mask the crash.
Through all that, there seemed to have been no root cause analysis for the crash, so we don't even know if the fix only papered over a symptom or if it actually fixed something. I could easily work around the failed test, but it seems to me that the right thing to here would be to drop the patch and see if the crash is still there, which would obviously be by waiting for bug reports. That would give us a chance to fix it correctly or at least justify the current fix to send it upstream. What do you think?
Siddhesh
Hi Carlos,
I was going through the latest test failures in rawhide and one of them is in tst-dl-iter-static. I tracked the reason for failure to the fix for the bz 737223.
The bz is an obscure crash in GL applications when using the proprietary nvidia driver and someone tracked it down to the assignment of the vdso soname to l->l_name. The fix was to do that assignment only when debug mask is non-zero, which seems to mask the crash.
Through all that, there seemed to have been no root cause analysis for the crash, so we don't even know if the fix only papered over a symptom or if it actually fixed something. I could easily work around the failed test, but it seems to me that the right thing to here would be to drop the patch and see if the crash is still there, which would obviously be by waiting for bug reports. That would give us a chance to fix it correctly or at least justify the current fix to send it upstream. What do you think?
Siddhesh
On Sat, Jul 05, 2014 at 10:13:08AM -0400, Carlos O'Donell wrote:
The question of corrupt DSOs and hardening against them is entirely up to our good judgement. IIRC this patch was to harden against these DSOs. If we think there is a real bug we should file an issue and investigate.
Hardening against corrupt DSOs is a valid case, but the problem here is that we don't *really* know why the fix works. In fact, I suspect it only papers over the crashes and for any use that enables debug, the crash will resurface. If it doesn't, then again I suspect that the problem may be something else altogether whose symptom got fixed somehow with this patch. Either way, there's nothing to indicate why the fix works and why it is correct.
Your call.
I'll revert this patch. I don't have nvidia hardware to see if the crash still persists but if it does (that is someone reports it), then we can take a look at it afresh and figure out the best possible fix, this time with better documentation of what we're fixing than "Horrible workaround for horribly broken software".
Siddhesh
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 07/07/2014 05:10 AM, Siddhesh Poyarekar wrote:
On Sat, Jul 05, 2014 at 10:13:08AM -0400, Carlos O'Donell wrote:
The question of corrupt DSOs and hardening against them is entirely up to our good judgement. IIRC this patch was to harden against these DSOs. If we think there is a real bug we should file an issue and investigate.
Hardening against corrupt DSOs is a valid case, but the problem here is that we don't *really* know why the fix works. In fact, I suspect it only papers over the crashes and for any use that enables debug, the crash will resurface. If it doesn't, then again I suspect that the problem may be something else altogether whose symptom got fixed somehow with this patch. Either way, there's nothing to indicate why the fix works and why it is correct.
Your call.
I'll revert this patch. I don't have nvidia hardware to see if the crash still persists but if it does (that is someone reports it), then we can take a look at it afresh and figure out the best possible fix, this time with better documentation of what we're fixing than "Horrible workaround for horribly broken software".
Sounds good to me. It seems negligent on our part to have patches in glibc with poor rationale for their inclusion.
Cheers, Carlos.