Well, we know smp_flags lead to nondeterministic builds that lead to
o incomparable build logs
o difficult-to-trace bugs in the build output
o nondeterministic triggering of build system bugs
I had vtk [1] in the queue for a year, and one of the reviewers' demands was to add smp_flags, since "it worked that well". I added this for the sake of getting it reviewed, and once the package was reviewed it wouldn't build on the Fedora builders. And it was quite non-reproducible on my 4-way system, probably due to a different number of make threads, different timings etc. It was also non-reproducible on the Fedora builders - it would fail, but at different build stages, so I started to check what BR changed between invocations and the like.
The benefit of smp_flags is that some very big builds take less time. So if, say, openoffice uses them (I have no idea), it would make no sense to forbid them for these packages. But for most packages, which usually build in under 15 minutes on a serial invocation, the drawbacks outweigh the benefits (like package rebuilds simply breaking for no apparent reason, as vtk did).
But as a *recommendation* we should switch from endorsing it to questioning it and recommending not to use it.
[1] https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199405
On Mon, 2007-07-02 at 22:23 +0200, Axel Thimm wrote:
But as a *recommendation* we should switch from endorsing it to questioning it and recommending not to use it.
I disagree. Well written code almost never has problems with smp flags.
In the uncommon case (aka, corner cases) where a piece of code doesn't build properly or predictably with smp_mflags, it can be removed, but a comment needs to be added.
# This code won't build properly with smp_mflags ...
~spot
On 7/2/07, Tom spot Callaway tcallawa@redhat.com wrote:
In the uncommon case (aka, corner cases) where a piece of code doesn't build properly or predictably with smp_mflags, it can be removed, but a comment needs to be added.
# This code won't build properly with smp_mflags ...
And maybe a bug filed upstream ?
On Mon, Jul 02, 2007 at 03:58:48PM -0500, Tom spot Callaway wrote:
On Mon, 2007-07-02 at 22:23 +0200, Axel Thimm wrote:
But as a *recommendation* we should switch from endorsing it to questioning it and recommending not to use it.
I disagree. Well written code almost never has problems with smp flags.
Well, not "well written", but "trivial". Once you start messing with non-conventional, non-linear builds (just think tex) you start leaving the safe harbour, and you can reverse the above: "complex Makefiles almost never work with smp flags".
So, keeping smp_flags will create pain and detect such issues, which can be considered a good thing. Only I don't think the pain is worth it. I for one will not start hunting down java Makefile issues which only show up on parallel builds.
Make smp_flags an opt-in instead of an opt-out.
On Tue, 2007-07-03 at 00:34 +0200, Axel Thimm wrote:
Well, not "well written", but "trivial". Once you start messing with non-conventional, non-linear builds (just think tex) you start leaving the safe harbour, and you can reverse the above: "complex Makefiles almost never work with smp flags".
Out of 130+ packages I maintain, only 3 of them fail to build with smp_mflags.
That's roughly 2%. IMHO, the guidelines should be for the most common cases.
If smp_mflags doesn't work with your package, no sweat, take it out and document it. But it should be the default, and all packages should try to use it whenever possible.
~spot
On Mon, Jul 02, 2007 at 05:40:43PM -0500, Tom spot Callaway wrote:
On Tue, 2007-07-03 at 00:34 +0200, Axel Thimm wrote:
Well, not "well written", but "trivial". Once you start messing with non-conventional, non-linear builds (just think tex) you start leaving the safe harbour, and you can reverse the above: "complex Makefiles almost never work with smp flags".
Out of 130+ packages I maintain, only 3 of them fail to build with smp_mflags.
Those are the packages you know about. The ugly thing about smp_flags is that bugs may not show up at all, or only on every Nth build.
The submission of vtk was stalled for over a month due to that and I wouldn't count myself as a greenhorn. Imagine Joe Average Packager hitting this on every 40th package (or every 40th first time submitter being killed that way).
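To put a rough number on "only on every Nth build": assuming (and it is only an assumption) that each build fails independently with probability 1/N, the chance that a handful of test builds all pass is (1 - 1/N)^k. The figures below, N = 10 and 5 builds, are purely illustrative and not measured from any package; `chance_of_no_failure` is a made-up helper name.

```python
def chance_of_no_failure(n: int, builds: int) -> float:
    """Probability that `builds` independent builds all succeed
    when, on average, one build in `n` fails."""
    return (1 - 1 / n) ** builds


# Illustrative values only: a bug that bites every 10th build, 5 test builds.
p = chance_of_no_failure(10, 5)
print(f"{p:.2f}")  # chance that none of the 5 builds shows the bug
```

Under these assumptions a submitter or reviewer would more likely than not see five clean builds in a row, which fits the observation that such failures are hard to reproduce on demand.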
That's roughly 2%. IMHO, the guidelines should be for the most common cases.
If the 2% + dark factor are causing too much pain, then the picture changes.
If smp_mflags doesn't work with your package, no sweat, take it out and document it. But it should be the default, and all packages should try to use it whenever possible.
OK, we just disagree on the default policy, this is something we can vote on.
On Tue, 2007-07-03 at 00:47 +0200, Axel Thimm wrote:
If the 2% + dark factor are causing too much pain, then the picture changes.
I'm not sure how much pain they could be causing. If it breaks, take it out and document it.
If code is actually miscompiling due to smp_mflags, you've either found a gcc bug or you've got some really, really broken code.
~spot
On Mon, Jul 02, 2007 at 05:50:17PM -0500, Tom spot Callaway wrote:
On Tue, 2007-07-03 at 00:47 +0200, Axel Thimm wrote:
If the 2% + dark factor are causing too much pain, then the picture changes.
I'm not sure how much pain they could be causing.
Like stalling a package entering Fedora for over a month due to the failures not being reproducible on either the submitter's or the reviewer's systems? Not to mention the lost manpower?
We don't gain much by triggering Makefile bugs. Most upstream projects don't design their Makefiles for -j usage, it just happens to work 98% of the time or less.
On Monday 02 July 2007 19:02:02 Axel Thimm wrote:
Like stalling a package entering Fedora for over a month due to the failures not being reproducible on either the submitter's or the reviewer's systems? Not to mention the lost manpower?
We don't gain much by triggering Makefile bugs. Most upstream projects don't design their Makefiles for -j usage, it just happens to work 98% of the time or less.
This sounds like "tribal knowledge". This has come up enough for me that when I have strange compile issues one of the first things I do is strip down the compile flags and start adding them back in one by one until I reproduce the problem where it happened in the first place. Often times I even seek out the smp flags and drop them. So it would be good to have this "tribal knowledge" documented somewhere for people to follow if they're seeing this type of issue, and then we can cut down that month to something /far/ more reasonable.
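The strip-everything-then-add-back-one-by-one routine described above can be written down as a small loop. This is only a sketch: `first_breaking_flag` and `fake_build` are made-up names, and `fake_build` merely stands in for a real build invocation, which is not shown here.

```python
from typing import Callable, List, Optional, Sequence


def first_breaking_flag(
    flags: Sequence[str],
    build_ok: Callable[[List[str]], bool],
) -> Optional[str]:
    """Re-enable flags one at a time; return the first one that breaks the build.

    Returns None if the build still succeeds with every flag enabled.
    """
    enabled: List[str] = []
    if not build_ok(enabled):
        raise ValueError("build fails even with all flags stripped")
    for flag in flags:
        enabled.append(flag)
        if not build_ok(enabled):
            return flag
    return None


def fake_build(enabled: List[str]) -> bool:
    """Hypothetical stand-in for a real build: fails once parallel make is on."""
    return "-j8" not in enabled


culprit = first_breaking_flag(["-O2", "-g", "-j8"], fake_build)
print(culprit)
```

For failures that only show up now and then, `build_ok` would have to repeat the build several times before reporting success, otherwise the loop can walk straight past the flag that matters.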
On Tue, Jul 03, 2007 at 07:01:01AM -0400, Jesse Keating wrote:
On Monday 02 July 2007 19:02:02 Axel Thimm wrote:
Like stalling a package entering Fedora for over a month due to the failures not being reproducible on either the submitter's or the reviewer's systems? Not to mention the lost manpower?
We don't gain much by triggering Makefile bugs. Most upstream projects don't design their Makefiles for -j usage, it just happens to work 98% of the time or less.
This sounds like "tribal knowledge". This has come up enough for me that when I have strange compile issues one of the first things I do is strip down the compile flags and start adding them back in one by one until I reproduce the problem where it happened in the first place. Often times I even seek out the smp flags and drop them. So it would be good to have this "tribal knowledge" documented somewhere for people to follow if they're seeing this type of issue, and then we can cut down that month to something /far/ more reasonable.
But you understand that the month of trials is about building on the Fedora builders, not on your own server or workstation? It isn't really a sensible turnaround to edit, submit, wait to be queued, wait for the build on all archs, get the response on failure, dig through a web interface, find the logs, download them, compare them to local logs etc. Now add the fact that if someone other than the submitter has a suggestion, it first needs to get piped through the submitter, who needs to <repeat all steps above>.
It's different if the problem is actually reproducible on your own system.
What's the drawback of a general discommendation of smp_flags (not talking about openoffice and friends)? A submitter waiting twice as long in build time? Is that really a big drawback in comparison to having packages break out of the blue? No Makefile project that upstream has not intended/tested for parallel builds is smp_flags safe; the race may just not have been triggered yet (as in the case of vtk).
And if your build succeeds on F7 and starts to mysteriously fail on F8, will the first thought be that the hidden make -jX bug hit you or that something in the build environment is screwed?
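How a race can sit untriggered in a Makefile is easier to see with a toy model. The sketch below is not how GNU make is implemented, and it is not taken from vtk; the target names, costs and the scheduler are all invented for illustration. One target (`main.o`) reads a generated header that it never declares as a prerequisite. Serially, the header happens to get built first; with two jobs, both start at once and the build breaks.

```python
# Toy model of a Makefile with a missing dependency. This is NOT how GNU make
# is implemented; target names and costs are made up for illustration.
RULES = {
    # "declared": prerequisites written in the Makefile
    # "needs":    files the recipe really reads
    # "cost":     how long the recipe runs, in arbitrary ticks
    "gen.h": {"declared": [], "needs": [], "cost": 2},
    "main.o": {"declared": [], "needs": ["gen.h"], "cost": 1},  # gen.h not declared
    "app": {"declared": ["gen.h", "main.o"], "needs": ["gen.h", "main.o"], "cost": 1},
}


def build_order(rules, goal, seen=None):
    """Post-order walk over the *declared* prerequisites of `goal`."""
    seen = [] if seen is None else seen
    for dep in rules[goal]["declared"]:
        build_order(rules, dep, seen)
    if goal not in seen:
        seen.append(goal)
    return seen


def simulate(rules, goal, jobs):
    """Run the toy build with at most `jobs` recipes in flight."""
    pending = build_order(rules, goal)
    running = []  # (finish_time, target) pairs
    done = set()
    clock = 0
    while pending or running:
        # Start every target whose declared prerequisites are finished,
        # as long as a job slot is free.
        for target in list(pending):
            if len(running) >= jobs:
                break
            if all(dep in done for dep in rules[target]["declared"]):
                missing = [n for n in rules[target]["needs"] if n not in done]
                if missing:
                    return f"FAILED: {target} started before {missing[0]} existed"
                pending.remove(target)
                running.append((clock + rules[target]["cost"], target))
        if not running:
            return "FAILED: nothing can run"
        # Advance the clock to the next recipe that finishes.
        running.sort()
        clock, finished = running.pop(0)
        done.add(finished)
    return "ok"


print("jobs=1:", simulate(RULES, "app", jobs=1))
print("jobs=2:", simulate(RULES, "app", jobs=2))
```

In this model the serial build only works by accident of ordering. A real build is less tidy than the model: whether the race fires depends on timing and on the number of jobs, which is why it can stay hidden for a long time.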
On Tuesday 03 July 2007 07:47:26 Axel Thimm wrote:
But you understand that the month of trials is about building on the Fedora builders, not on your own server or workstation? It isn't really a sensible turnaround to edit, submit, wait to be queued, wait for the build on all archs, get the response on failure, dig through a web interface, find the logs, download them, compare them to local logs etc. Now add the fact that if someone other than the submitter has a suggestion, it first needs to get piped through the submitter, who needs to <repeat all steps above>.
Scratch builds are your friend. You can target a single arch, build any srpm you want, etc... No need for the hyperbole above.
It's different if the problem is actually reproducible on your own system.
What's the drawback of a general discommendation of smp_flags (not talking about openoffice and friends)? A submitter waiting twice as long in build time? Is that really a big drawback in comparison to having packages break out of the blue? No Makefile project that upstream has not intended/tested for parallel builds is smp_flags safe; the race may just not have been triggered yet (as in the case of vtk).
And if your build succeeds on F7 and starts to mysteriously fail on F8, will the first thought be that the hidden make -jX bug hit you or that something in the build environment is screwed?
To use somebody else's argument it can be useful to /find/ these bugs. If the software doesn't compile right due to smp often it is an upstream bug that needs to be addressed. Maybe the upstream doesn't have access to smp, and we can get them into Fedora and give them access to our build farm so that they can make better code.
And yes, things are faster. The faster they get through the build system, the faster the next job can start, and so on and so forth.
On Tue, Jul 03, 2007 at 08:25:59AM -0400, Jesse Keating wrote:
On Tuesday 03 July 2007 07:47:26 Axel Thimm wrote:
But you understand that the month of trials is about building on the Fedora builders, not on your own server or workstation? It isn't really a sensible turnaround to edit, submit, wait to be queued, wait for the build on all archs, get the response on failure, dig through a web interface, find the logs, download them, compare them to local logs etc. Now add the fact that if someone other than the submitter has a suggestion, it first needs to get piped through the submitter, who needs to <repeat all steps above>.
Scratch builds are your friend. You can target a single arch, build any srpm you want, etc... No need for the hyperbole above.
So with scratch builds there is no need to commit in CVS, queue the build, wait for the queue to reach the job, and it really allows everyone and his cat to do it, not only the submitter?
Where's that hyperbole again?
It's different if the problem is actually reproducible on your own system.
What's the drawback of a general discommendation of smp_flags (not talking about openoffice and friends)? A submitter waiting twice as long in build time? Is that really a big drawback in comparison to having packages break out of the blue? No Makefile project that upstream has not intended/tested for parallel builds is smp_flags safe; the race may just not have been triggered yet (as in the case of vtk).
And if your build succeeds on F7 and starts to mysteriously fail on F8, will the first thought be that the hidden make -jX bug hit you or that something in the build environment is screwed?
To use somebody else's argument it can be useful to /find/ these bugs. If the software doesn't compile right due to smp often it is an upstream bug that needs to be addressed.
Unless upstream never advertised a parallel make feature (which one really does?) and upstream cares more about fixing real code bugs than spurious Makefile races on an 8-way system.
On Tue, Jul 03, 2007 at 06:07:55PM +0200, Axel Thimm wrote:
Unless upstream never advertised a parallel make feature (which one really does?) and upstream cares more about fixing real code bugs than spurious Makefile races on an 8-way system.
Sometimes these discussions astound me. "Hey, broken Makefiles exist! Let's stop using smp_mflags everywhere." Can common sense not apply here?
If the Makefile is trivially broken, fix it, send patches upstream, move on. If the Makefile is broken in some overwhelmingly complex manner, ignore it, file bug upstream, document the fact in %build, move on.
joe
"JO" == Joe Orton jorton@redhat.com writes:
JO> Sometimes these discussions astound me. "Hey, broken Makefiles
JO> exist! Let's stop using smp_mflags everywhere." Can common sense
JO> not apply here?
Actually it did; the five members of the committee present at the meeting today voted unanimously against this proposal.
- J<
On Tue, Jul 03, 2007 at 05:19:38PM -0500, Jason L Tibbitts III wrote:
"JO" == Joe Orton jorton@redhat.com writes:
JO> Sometimes these discussions astound me. "Hey, broken Makefiles
JO> exist! Let's stop using smp_mflags everywhere." Can common sense
JO> not apply here?
Actually it did; the five members of the committee present at the meeting today voted unanimously against this proposal.
I thought there was no quorum.
And how does that fit in with common sense? ;)
OK, I got it, another one down.
"AT" == Axel Thimm Axel.Thimm@ATrpms.net writes:
AT> I thought there was no quorum.
f13 arrived 44 minutes into the meeting, providing the fifth member and thus a quorum. At that point spot called for a vote. The five members present (tibbs spot f13 abadger1999 scop) all voted against the proposal. (I suppose there were six total votes counting your on-list vote; the minutes will reflect that.)
- J<
On 03 Jul 2007 19:28:33 -0500, Jason L Tibbitts III tibbs@math.uh.edu wrote:
"AT" == Axel Thimm Axel.Thimm@ATrpms.net writes:
AT> I thought there was no quorum.
f13 arrived 44 minutes into the meeting, providing the fifth member and thus a quorum. At that point spot called for a vote. The five members present (tibbs spot f13 abadger1999 scop) all voted against the proposal. (I suppose there were six total votes counting your on-list vote; the minutes will reflect that.)
Ok... I would like to say that is very poor political clarity (I expect such from Congress, not from a group that has higher standards). If for some reason people were against this, came to the meeting, and left thinking that there was no quorum... they have been effectively cut out of the process. Not that this was the case this time... but people should be aware that these events lead to the "secret cabal meetings."
A quorum should always be called/met within a certain amount of time of a meeting being held. Generally, this is 15-20 minutes of the called time. If quorum is not met, it's time to cancel the meeting until the next general listing, or until an emergency meeting can be called/advertised/met.
Sorry for coming out of the blue here... but I would like to make sure that groups are heard even if the vote goes against their views.
"SJS" == Stephen John Smoogen smooge@gmail.com writes:
SJS> Ok... I would like to say that is very poor political clarity (I
SJS> expect such from Congress, not from a group that has higher
SJS> standards).
Erm, we were informed ahead of time that f13 would be late, and that fact, along with the fact that we would await his arrival before voting, was mentioned in the meeting.
- J<
On Tue, 2007-07-03 at 18:40 -0600, Stephen John Smoogen wrote:
A quorum should always be called/met within a certain amount of time of a meeting being held. Generally, this is 15-20 minutes of the called time. If quorum is not met, it's time to cancel the meeting until the next general listing, or until an emergency meeting can be called/advertised/met.
Indeed this is a valid point, but everyone in attendance was informed at the beginning that f13 was arriving late, and everyone opted to stay. (Several members missed the meeting entirely).
~spot