We're announcing a Fedora Activity day coming up very very soon (apologies for the short notice). This activity day is for maintainers, QA, and release engineering folks to meet and discuss ongoing issues with the Fedora Development Cycle and to create a proposal on how to fix many of the issues. Note, this is not an event to decide on a solution, it is an event to decide on a proposal, which will then be shared with the whole community for more input and work.
The timing of this is very short, so that we may have a chance at changing something within the Fedora 12 development cycle. Funding for the event was only confirmed a day or two ago, hence the late notice. For those unable to attend (which most will be) but highly interested in helping with the process, we will be attempting to set up a Fedora Talk conference room to use throughout the event, as well as an IRC channel. We'll try to blog the process as well and gather feedback to be used during the event.
If you will be able to attend in person, please add your name to the wiki page [1] so that we can properly plan the space needed within RHT.
While the wiki page says that the page is still under construction, the dates are solid, the hours during the day are mostly solid, and the location (one of the RHT buildings) is solid.
Please feel free to use the discussion page on the wiki to express your thoughts about the event and what problems you're having with the development cycle. Even thoughts on the initial proposal I drew up at the bottom of the wiki page would be welcome, although I do believe that this event will result in a proposal different from what is currently listed.
Again we apologize for the short notice!
[1]: https://fedoraproject.org/wiki/Fedora_Activity_Day_Fedora_Development_Cycle_...
On the wiki page it says that the budget will be used for certain attendees' "Flights, Hotel, Transportation to/from airport/hotel/office/food". I was just curious how that list of attendees whose bills will be taken care of is/was decided upon?
-Adam
On Fri, 2009-05-29 at 16:43 -0500, Adam Miller wrote:
On the wiki page it says that the budget will be used for certain attendees' "Flights, Hotel, Transportation to/from airport/hotel/office/food". I was just curious how that list of attendees whose bills will be taken care of is/was decided upon?
We invited a couple of members specifically to this event, and those folks will be covered. I organized the event and wanted specific people there, and got the funding for those people. I think that's how FADs are going to work: somebody has an idea, organizes it, and decides (with the help of the Fedora Community Action Team) who is funded.
On Fri, May 29, 2009 at 3:57 PM, Jesse Keating jkeating@redhat.com wrote:
We're announcing a Fedora Activity day coming up very very soon (apologies for the short notice). This activity day is for maintainers, QA, and release engineering folks to meet and discuss ongoing issues with the Fedora Development Cycle and to create a proposal on how to fix many of the issues. Note, this is not an event to decide on a solution, it is an event to decide on a proposal, which will then be shared with the whole community for more input and work.
... snip ...
Jesse,
I just want to wish you guys a great and productive FAD.
I hope it inspires other community members to look for opportunities within the parts of the project they work on to organize their own FADs in the future.
If a group of folks in Docs or Art or Infra or whatever could get something important done faster/better by getting together in the same place, please do it. If you think you have a good idea but are uncertain how to make it happen, feel free to ping your friendly neighborhood ambassador and we'd be happy to help you get the ball rolling.
John
On Fri, May 29, 2009 at 05:48:44PM -0500, inode0 wrote:
On Fri, May 29, 2009 at 3:57 PM, Jesse Keating jkeating@redhat.com wrote:
We're announcing a Fedora Activity day coming up very very soon (apologies for the short notice). This activity day is for maintainers, QA, and release engineering folks to meet and discuss ongoing issues with the Fedora Development Cycle and to create a proposal on how to fix many of the issues. Note, this is not an event to decide on a solution, it is an event to decide on a proposal, which will then be shared with the whole community for more input and work.
... snip ...
Jesse,
I just want to wish you guys a great and productive FAD.
I hope it inspires other community members to look for opportunities within the parts of the project they work on to organize their own FADs in the future.
If a group of folks in Docs or Art or Infra or whatever could get something important done faster/better by getting together in the same place, please do it. If you think you have a good idea but are uncertain how to make it happen, feel free to ping your friendly neighborhood ambassador and we'd be happy to help you get the ball rolling.
For instance, the Docs team is holding a FAD at the SELF conference in Clemson, SC in a couple weeks:
https://fedoraproject.org/wiki/FAD_SELF_2009
On 29.05.2009 22:57, Jesse Keating wrote:
Please feel free to use the discussion page on the wiki to express your thoughts about the event and what problems you're having with the development cycle. Even thoughts on the initial proposal I drew up at the bottom of the wiki page would be welcome, although I do believe that this event will result in a proposal different from what is currently listed.
Find a few random thoughts/suggestions I added to the wiki below. I'm sending them here as well, because the page in the wiki feels to me like a place in a corner where nearly nobody looks -- thus it's unlikely that a real discussion will evolve there.
- should we set a much earlier freeze date for things like anaconda, kernel, isolinux, grub and other crucial pieces to make sure they are in better shape a bit earlier and thus are less likely to be a reason for release slips?
- do we need better communication/a more detailed release schedule? I've seen you writing this in #fedora-kernel recently:
18:52:18 < f13> | so here's the deal
18:52:25 < f13> | I need a kernel today, that fixes those bugs on the blocker list
18:52:34 < f13> | or we decide that those bugs are not worth fixing
18:52:40 < f13> | or we slip the release, again.
18:52:53 < f13> | and by "today" I mean building in the next hour or so
From the discussion after that it looked a whole lot as if some important kernel developers were *not* aware that the kernel for the release was needed on that day, which I find quite alarming... (even if you got a proper kernel after you wrote the text quoted above)
- how about reducing the number of zero-day updates (which is ridiculously high for F11) by setting a different, later freeze date for all packages that are neither on the install DVD nor on a Spin?
- other distributions seem to manage a whole lot more test releases (e.g. alphas, betas, RCs, milestones, ...) per devel cycle; is that something we should aim for as well?
- at least it sometimes feels like "rawhide installation using boot.iso is broken for weeks or months". That's annoying and confusing even for people like me. How about targeting "rawhide must be installable using boot.iso every Friday; crazy new stuff (like a new python release) must get imported on Mondays with the goal of having things in good shape by Friday"
- how about doing something like a "cp -l development devel-snapshot" now and then (once a week) when we know rawhide is mostly working?
- how can we dramatically reduce the time between finishing a (test) release and releasing it? It seems other distributions get a new (test) release out to the users a lot quicker than the three to five days we require, which seems a whole lot for a devel cycle that takes 180 days in total (and we all know how much rawhide can move on in a few days)
- the anaconda storage rewrite was a bit bumpy and created a lot of trouble this devel cycle; what steps will be taken to make things like that go more smoothly in the future?
- I'd be glad if we could stick to our release targets a lot better. Delaying releases looks quite unprofessional. Delaying also creates trouble for those depending on our releases. Take computer magazines (which have hard production deadlines) that might want to ship a new release on a CD or DVD with the next issue -- due to our reputation for missing deadlines it seems to me that we are a lot less attractive than Ubuntu (which afaics is on the shelves here in Germany in new computer magazines just a few days after it has been released)
- why do we have to slip by a whole week most of the time? can't we find ways to slip just a day or two if there really is no way around a delay?
- I like the idea to "keep rawhide (nearly) always moving" a lot
- until a year or a bit more ago we had three test releases; now we have alpha, beta and preview -- were the changes made together with the renaming worth it? (btw, to me it feels a lot like "not much changed" apart from the names)
- I'd be glad if the final release directories (e.g. releases/12/Everything) could be available earlier, even if what is in them is not yet what "12" actually will become
CU knurd
On Mon, 2009-06-01 at 19:49 +0200, Thorsten Leemhuis wrote:
On 29.05.2009 22:57, Jesse Keating wrote:
Please feel free to use the discussion page on the wiki to express your thoughts about the event and what problems you're having with the development cycle. Even thoughts on the initial proposal I drew up at the bottom of the wiki page would be welcome, although I do believe that this event will result in a proposal different from what is currently listed.
Find a few random thoughts/suggestions I added to the wiki below. I'm sending them here as well, because the page in the wiki feels to me like a place in a corner where nearly nobody looks -- thus it's unlikely that a real discussion will evolve there.
- should we set a much earlier freeze date for things like anaconda, kernel, isolinux, grub and other crucial pieces to make sure they are in better shape a bit earlier and thus are less likely to be a reason for release slips?
I've asked some of these groups to do that, but it's like asking any other upstream. A lot of the problem stems from things like anaconda using more and more of other bits in the distro, which change significantly right at the freeze date, breaking anaconda. So while we're waiting for anaconda, it's really because something else changed late. I tried to mitigate this by setting the feature freeze back a week from the beta freeze, so that the nasty changes land a week before we freeze for beta, giving folks like anaconda time to survive. It almost worked, and maybe we need to extend that back even farther.
- do we need better communication/a more detailed release schedule? I've seen you writing this in #fedora-kernel recently:
18:52:18 < f13> | so here's the deal
18:52:25 < f13> | I need a kernel today, that fixes those bugs on the blocker list
18:52:34 < f13> | or we decide that those bugs are not worth fixing
18:52:40 < f13> | or we slip the release, again.
18:52:53 < f13> | and by "today" I mean building in the next hour or so
From the discussion after that it looked a whole lot as if some important kernel developers were *not* aware that the kernel for the release was needed on that day, which I find quite alarming... (even if you got a proper kernel after you wrote the text quoted above)
That's a fair point. We don't document or communicate the "points of no return", as it were, very well. We've also had to discuss how we set those. In the past it was the last point at which we could compose and upload the bits to the master mirror in order to have enough mirrors synced by release day. Now it seems more that we have to make that decision with enough time to spin up (or down) the marketing machine, which looks to require more lead time than the mirrors do.
- how about reducing the number of zero-day updates (which is ridiculously high for F11) by setting a different, later freeze date for all packages that are neither on the install DVD nor on a Spin?
That's a bit hard to determine easily and programmatically. I've all but given up on my quest to reduce the number of updates. It just doesn't seem to be in line with the desires of the package maintainers, whether or not it's in line with the desires of (some of) the project leaders.
- other distributions seem to manage a whole lot more test releases (e.g. alphas, betas, RCs, milestones, ...) per devel cycle; is that something we should aim for as well?
If there were a way to do it without adding stress and work to teams like releng, installer, etc., that would be possible, and also if they were done without freezes. But the overwhelming requests I get are for /fewer/ of these test releases rather than more. If our release cycle were 9 months or even 12~24 months it would make a lot more sense to have more milestones. But with only a 6 month cycle it gets hard to find time to do development in between all the already existing milestones.
- at least it sometimes feels like "rawhide installation using boot.iso is broken for weeks or months". That's annoying and confusing even for people like me. How about targeting "rawhide must be installable using boot.iso every Friday; crazy new stuff (like a new python release) must get imported on Mondays with the goal of having things in good shape by Friday"
That's what we hoped to have with the post-beta snapshots. Hard to say if it's had any real impact, likely due to the short amount of time between beta and the final freeze/preview release. If we attempted to do snapshots leading up to the beta that might make it more noticeable, but then we're back in the realm of more milestones instead of less.
- how about doing something like a "cp -l development devel-snapshot"
now and then (once a week) when we know rawhide is mostly working?
The rawhide trees for at least the past month are kept in the Fedora infrastructure and are even available by a public url. We haven't tagged any of them as "stable" though.
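For what it's worth, a minimal sketch of what the weekly hardlinked snapshot idea could look like; the paths and naming below are hypothetical, not the actual layout of the Fedora master mirror:

    # Hypothetical sketch: snapshot rawhide once a week, only when the
    # tree is known to be installable.
    SRC=/pub/fedora/linux/development
    SNAP=/pub/fedora/linux/devel-snapshot-$(date +%Y%m%d)

    # cp -al copies the directory tree but hardlinks every file, so the
    # snapshot costs almost no extra disk space on the server.
    cp -al "$SRC" "$SNAP"

    # Point a stable name at the latest known-good snapshot so users and
    # mirror scripts have a fixed path to follow.
    ln -sfn "$(basename "$SNAP")" /pub/fedora/linux/devel-snapshot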
- how can we dramatically reduce the time between finishing a (test) release and releasing it? It seems other distributions get a new (test) release out to the users a lot quicker than the three to five days we require, which seems a whole lot for a devel cycle that takes 180 days in total (and we all know how much rawhide can move on in a few days)
If we didn't care to let the mirrors sync up we could "release" it earlier. However what good does that do when nobody can /get/ to the release because none of the mirrors have it, and the ones that do can't help sync to those that don't because users are tying up all the bandwidth?
- the anaconda storage rewrite was a bit bumpy and created a lot of trouble this devel cycle; what steps will be taken to make things like that go more smoothly in the future?
Fewer snapshots for them to worry about, taking time away from the development and bugfixing effort. Automated testing. Better bug management.
- I'd be glad if we could stick to our release targets a lot better. Delaying releases looks quite unprofessional. Delaying also creates trouble for those depending on our releases. Take computer magazines (which have hard production deadlines) that might want to ship a new release on a CD or DVD with the next issue -- due to our reputation for missing deadlines it seems to me that we are a lot less attractive than Ubuntu (which afaics is on the shelves here in Germany in new computer magazines just a few days after it has been released)
What looks more unprofessional? Delaying the release, or hitting our date and releasing with bugs that eat people's data? Or releasing with broken graphics for large swaths of users?
- why do we have to slip by a whole week most of the time? can't we find
ways to slip just a day or two if there really is no way around a delay?
The marketing machine has very strongly requested that we only do releases on Tuesdays.
I like the idea to "keep rawhide (nearly) always moving" a lot
- until a year or a bit more ago we had three test releases; now we have alpha, beta and preview -- were the changes made together with the renaming worth it? (btw, to me it feels a lot like "not much changed" apart from the names)
At first the plan was just Alpha, Beta, and release. But people really wanted a final snapshot after the final freeze to base their tests on, so Preview came to be. Now we're talking about dropping Alpha because of its dubious value at that point in the release cycle, which would reduce us back down to 2 major snapshots during the cycle, Beta and Preview. That's what the F12 schedule has right now.
- I'd be glad if the final release directories (e.g. releases/12/Everything) could be available earlier, even if what is in them is not yet what "12" actually will become
You'll have to enumerate why that is. One reason I've avoided this is added confusion as to when the "release" happens. If we created that directory and put content in there, would we have then released Fedora 12? When does it become "released" and thus trusted?
CU knurd
On Mon, Jun 01, 2009 at 11:14:26AM -0700, Jesse Keating wrote:
On Mon, 2009-06-01 at 19:49 +0200, Thorsten Leemhuis wrote:
[...snip...]
From the discussion after that it looked a whole lot as if some important kernel developers were *not* aware that the kernel for the release was needed on that day, which I find quite alarming... (even if you got a proper kernel after you wrote the text quoted above)
That's a fair point. We don't document or communicate the "points of no return", as it were, very well. We've also had to discuss how we set those. In the past it was the last point at which we could compose and upload the bits to the master mirror in order to have enough mirrors synced by release day. Now it seems more that we have to make that decision with enough time to spin up (or down) the marketing machine, which looks to require more lead time than the mirrors do.
And release day parties, moving translation deadlines for zero-day updates, and so on... There are a lot of moving parts to fit together.
- other distributions seem to manage a whole lot more test releases (e.g. alphas, betas, RCs, milestones, ...) per devel cycle; is that something we should aim for as well?
If there were a way to do it without adding stress and work to teams like releng, installer, etc., that would be possible, and also if they were done without freezes. But the overwhelming requests I get are for /fewer/ of these test releases rather than more. If our release cycle were 9 months or even 12~24 months it would make a lot more sense to have more milestones. But with only a 6 month cycle it gets hard to find time to do development in between all the already existing milestones.
To what extent do these other distributions:
(1) ...offer a set of tools by which anyone can produce something roughly equivalent to a milestone release, as I can do with pungi any day that the installer is working properly?* Honest question, not snark.
(2) ...consume their content from another integrated packaging source upstream?
* I realize that's a question unto itself, which Thorsten asked about later. Or rather, sooner:
- at least it sometimes feels like "rawhide installation using boot.iso is broken for weeks or months". That's annoying and confusing even for people like me. How about targeting "rawhide must be installable using boot.iso every Friday; crazy new stuff (like a new python release) must get imported on Mondays with the goal of having things in good shape by Friday"
That's what we hoped to have with the post-beta snapshots. Hard to say if it's had any real impact, likely due to the short amount of time between beta and the final freeze/preview release. If we attempted to do snapshots leading up to the beta that might make it more noticeable, but then we're back in the realm of more milestones instead of less.
- how about doing something like a "cp -l development
devel-snapshot" now and then (once a week) when we know rawhide is mostly working?
The rawhide trees for at least the past month are kept in the Fedora infrastructure and are even available by a public url. We haven't tagged any of them as "stable" though.
I didn't even know this last part. But I think many people, including me, rel-eng and QA, are interested in having a more consistently testable Rawhide.
- how can we dramatically reduce the time between finishing a (test) release and releasing it? It seems other distributions get a new (test) release out to the users a lot quicker than the three to five days we require, which seems a whole lot for a devel cycle that takes 180 days in total (and we all know how much rawhide can move on in a few days)
If we didn't care to let the mirrors sync up we could "release" it earlier. However what good does that do when nobody can /get/ to the release because none of the mirrors have it, and the ones that do can't help sync to those that don't because users are tying up all the bandwidth?
I'm getting out of my ken here, but could this be done in stages with I2 connected hosts getting the bits early/first and then moving on to others?
[...snip...]
- why do we have to slip by a whole week most of the time? can't we find
ways to slip just a day or two if there really is no way around a delay?
The marketing machine has very strongly requested that we only do releases on Tuesdays.
There is a group of resources inside Red Hat that we're able to call on for enhancing press coverage of our releases. They've given us their expert opinion that Tuesdays are the best day for that. While it's not a concrete requirement to make use of those services, we tend to listen to the experts.
In quite a number of cases, though, a slip necessitates additional testing which itself takes a number of days. Putting in a bug fix and respinning can have its own undesired effects....
- I like the idea to "keep rawhide (nearly) always moving" a lot
It will be interesting to hear the brainstorming around this at the FAD. I think many people would be happy if that moving Rawhide train ended up serving the dual purpose of decreasing late changes in a release cycle, but I'm not sure there's a 1:1 fit there.
On Mon, Jun 01, 2009 at 04:03:04PM -0400, Paul W. Frields wrote:
I'm getting out of my ken here, but could this be done in stages with I2 connected hosts getting the bits early/first and then moving on to others?
We need to move ~130GB to each of ~230 mirrors, in about 4 days.
We already have in place a limited amount of "tiering". The Tier 0/1 mirrors get the bits first, then downstream mirrors pull from them. We have nearly all our "Tier 1" mirrors on I2 (all but the us.kernel.org). Right now it's not mandatory, but no "new" mirrors (those signed up in the last 18 months or so) have been granted ACL permissions to download from the masters.
http://fedoraproject.org/wiki/Infrastructure/Mirroring/Tiering
One of my hopes for the F12 cycle is that we will have increased use of tiering and push mirroring.
On Mon, 2009-06-01 at 17:22 -0500, Matt Domsch wrote:
We need to move ~130GB to each of ~230 mirrors, in about 4 days.
We already have in place a limited amount of "tiering". The Tier 0/1 mirrors get the bits first, then downstream mirrors pull from them. We have nearly all our "Tier 1" mirrors on I2 (all but the us.kernel.org). Right now it's not mandatory, but no "new" mirrors (those signed up in the last 18 months or so) have been granted ACL permissions to download from the masters.
http://fedoraproject.org/wiki/Infrastructure/Mirroring/Tiering
One of my hopes for the F12 cycle is that we will have increased use of tiering and push mirroring.
One problem is that our master mirror is /not/ on I2. So when we put the content on PHX, we have to wait for non-I2 transfer to RDU which is on I2. Then our I2 hosts can get it.
If we had I2 in PHX this would get a lot faster.
On Mon, Jun 01, 2009 at 06:45:07PM -0400, Tom spot Callaway wrote:
On 06/01/2009 06:45 PM, Jesse Keating wrote:
If we had I2 in PHX this would get a lot faster.
We just need to hold some classes and get the PHX datacenter certified as a University. ;)
Not necessarily. I don't see why the Fedora Project couldn't qualify as a Sponsored Participant on Internet2 [1]. In fact, Red Hat is already connected in Raleigh. I'd gladly help pursue this, but I may not be the right person seeing as I'm in Boston, not PHX.
I2 also has a "private lambda" service where you can get your own dedicated 10Gig wavelength across the backbone [2]. It seems they are currently offering no-fee trials of this service to I2 connectors.
Arizona State University is already on I2 via CENIC, and CENIC offers this Dynamic Circuit capability. MCNC in Durham where Red Hat is connected doesn't appear to have DCN though.
[1] http://www.internet2.edu/network/participants/
"Sponsored participants are individual educational institutions (including not-for-profit and for-profit K-20, technical, and trade schools), museums, art galleries, libraries, hospitals, as well as other non-educational, not-for-profit or for-profit organizations that require routine collaboration on instructional, clinical, and/or research projects, services, and content with Primary participants or with other Sponsored Participants. Such organizations typically are either not eligible or not able to become Internet2 members."
[2] http://www.internet2.edu/network/dc/
"To support the development, deployment, and use of innovative hybrid optical networking capabilities, Internet2 is initiating a no-fee trial of the Internet2 DCN."
On 06/03/2009 07:48 PM, Chuck Anderson wrote:
Not necessarily. I don't see why the Fedora Project couldn't qualify as a Sponsored Participant on Internet2 [1]. In fact, Red Hat is already connected in Raleigh.
I think this is because they're technically on NC State University.
~spot
On Thu, 4 Jun 2009, Tom "spot" Callaway wrote:
On 06/03/2009 07:48 PM, Chuck Anderson wrote:
Not necessarily. I don't see why the Fedora Project couldn't qualify as a Sponsored Participant on Internet2 [1]. In fact, Red Hat is already connected in Raleigh.
I think this is because they're technically on NC State University.
and last time I checked it's a 100mbit connection to i2.
-sv
On Thu, Jun 04, 2009 at 10:56:34AM -0400, Seth Vidal wrote:
On Thu, 4 Jun 2009, Tom "spot" Callaway wrote:
On 06/03/2009 07:48 PM, Chuck Anderson wrote:
Not necessarily. I don't see why the Fedora Project couldn't qualify as a Sponsored Participant on Internet2 [1]. In fact, Red Hat is already connected in Raleigh.
I think this is because they're technically on NC State University.
and last time I checked it's a 100mbit connection to i2.
Well, they are listed as "Red Hat" on the Internet2 map. Time to upgrade to Gig :-)
On Mon, Jun 01, 2009 at 15:45:45 -0700, Jesse Keating jkeating@redhat.com wrote:
One problem is that our master mirror is /not/ on I2. So when we put the content on PHX, we have to wait for non-I2 transfer to RDU which is on I2. Then our I2 hosts can get it.
If we had I2 in PHX this would get a lot faster.
What about hand carrying a disk drive over to RDU? Would that be faster than using an internet transfer?
Do you hard link the new release files to the ones identical in rawhide so that rsync doesn't have to transfer them to places mirroring rawhide?
On Mon, 2009-06-01 at 20:56 -0500, Bruno Wolff III wrote:
What about hand carrying a disk drive over to RDU? Would that be faster than using an internet transfer?
Not really.
Do you hard link the new release files to the ones identical in rawhide so that rsync doesn't have to transfer them to places mirroring rawhide?
Yes and no. We do use hardlinks, however the mechanism that gets it from PHX to Raleigh is Netapp snapmirror, which works at the block level. I don't know enough about snapmirror to know if it is helped by hardlinks or not. The individual files aren't the real problem, it's the isos, particularly the live isos.
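As an aside on the mirror-facing rsync half of this (the snapmirror block-level copy from PHX to RDU is a separate problem): when the release tree is hardlinked against rawhide, a mirror that pulls both trees in a single rsync run with -H gets the hardlinks recreated locally and the shared packages transferred only once. A rough sketch, with the host and paths purely illustrative:

    # Illustrative only: sync the parent directory that contains both
    # development/ and releases/ in one rsync invocation.  -H makes rsync
    # detect hardlinks within the transferred file list and recreate them
    # on the mirror instead of sending the same package payload twice.
    rsync -avH --delete \
        rsync://master.example.org/fedora/linux/ \
        /srv/mirror/fedora/linux/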
Jesse Keating wrote:
On Mon, 2009-06-01 at 20:56 -0500, Bruno Wolff III wrote:
Do you hard link the new release files to the ones identical in rawhide so that rsync doesn't have to transfer them to places mirroring rawhide?
Yes and no. We do use hardlinks, however the mechanism that gets it from PHX to Raleigh is Netapp snapmirror, which works at the block level. I don't know enough about snapmirror to know if it is helped by hardlinks or not. The individual files aren't the real problem, it's the isos, particularly the live isos.
A program similar to Jigdo could speed this up. Transfer only the RPM packages (taking advantage of hard links) and information on what packages are in each ISO image, and then recreate the ISO images at the destination. That way each package would only be transferred once, regardless of how many ISO images it occurs in.
Björn Persson
Björn Persson wrote:
A program similar to Jigdo could speed this up. Transfer only the RPM packages (taking advantage of hard links) and information on what packages are in each ISO image, and then recreate the ISO images at the destination. That way each package would only be transferred once, regardless of how many ISO images it occurs in.
Jigdo doesn't work in Fedora unless you want to implement a self-compiling jigdo creator. Why? 'Cause old Fedora updates are not kept on mirrors. A Jigdo file you create today may not work next week. Bad, bad, bad. k?
On Tue, 2009-06-02 at 22:01 +0200, Björn Persson wrote:
Jesse Keating wrote:
On Mon, 2009-06-01 at 20:56 -0500, Bruno Wolff III wrote:
Do you hard link the new release files to the ones identical in rawhide so that rsync doesn't have to transfer them to places mirroring rawhide?
Yes and no. We do use hardlinks, however the mechanism that gets it from PHX to Raleigh is Netapp snapmirror, which works at the block level. I don't know enough about snapmirror to know if it is helped by hardlinks or not. The individual files aren't the real problem, it's the isos, particularly the live isos.
A program similar to Jigdo could speed this up. Transfer only the RPM packages (taking advantage of hard links) and information on what packages are in each ISO image, and then recreate the ISO images at the destination. That way each package would only be transferred once, regardless of how many ISO images it occurs in.
Unfortunately we don't have access/ability to use anything but snapmirror to get content from PHX into RDU where the I2 links are.
On 02.06.2009 00:22, Matt Domsch wrote:
On Mon, Jun 01, 2009 at 04:03:04PM -0400, Paul W. Frields wrote:
I'm getting out of my ken here, but could this be done in stages with I2 connected hosts getting the bits early/first and then moving on to others?
We need to move ~130GB to each of ~230 mirrors, in about 4 days.
I'd like to question these numbers. All the SRPMS (18 GByte) and RPMS (20 GByte per arch) of a new release are in the rawhide trees already -- thus if they are hardlinked properly then they can be transferred in a minute or two. The install CD and DVD images (~ 8 GByte each per arch?) are bigger, but if we get the RC with the final name transferred to the mirrors ahead of time then they can be updated relatively quickly as well, as only a few bits change.
The spins that get compressed are a problem. But we only send Desktop and KDE spins to the mirrors for x86-32 and x86-64 afaics -- so roughly 3 GByte in total.
So 3 GByte + some other random stuff that changed -- install.img, boot.iso, and some other things. Not sure how much that will sum up to, but it can't be that much -- maybe 3 or 5 more GByte.
Let's say it's 10 GByte. That afaics shouldn't take more than a day if all mirrors look out for new stuff at least every 4 hours and have a link that isn't slow by today's standards.
Or what am I missing?
Cu knurd
On Tue, Jun 02, 2009 at 09:30:17PM +0200, Thorsten Leemhuis wrote:
On 02.06.2009 00:22, Matt Domsch wrote:
On Mon, Jun 01, 2009 at 04:03:04PM -0400, Paul W. Frields wrote:
I'm getting out of my ken here, but could this be done in stages with I2 connected hosts getting the bits early/first and then moving on to others?
We need to move ~130GB to each of ~230 mirrors, in about 4 days.
The 130GB is what we posted for F10. Of that, 48GB were ISOs, 93GB was the Everything/ tree which was hardlinked from rawhide, so yes, that would cut down on the transfer quite a bit.
Honestly, I don't recall how the signing aspect went for F10. If the process involved signing, then copying the packages into the Everything tree, then there wouldn't be hardlinks to help.
I'd like to question these numbers. All the SRPMS (18 GByte) and RPMS (20 GByte per arch) of a new release are in the rawhide trees already -- thus if they are hardlinked properly then they can be transferred in a minute or two. The install CD and DVD images (~ 8 GByte each per arch?) are bigger, but if we get the RC with the final name transferred to the mirrors ahead of time then they can be updated relatively quickly as well, as only a few bits change.
RCs with the final name haven't been posted for the mirrors ahead of time, yet. We could consider it for F12 though.
The spins that get compressed are a problem. But we only send Desktop and KDE spins to the mirrors for x86-32 and x86-64 afaics -- so roughly 3 GByte in total.
Right.
So 3 GByte + some other random stuff that changed -- install.img, boot.iso, and some other things. Not sure how much that will sum up to, but it can't be that much -- maybe 3 or 5 more GByte.
Let's say it's 10 GByte. That afaics shouldn't take more than a day if all mirrors look out for new stuff at least every 4 hours and have a link that isn't slow by today's standards.
Or what am I missing?
I think you're right, except that RC ISOs aren't published ahead of time for mirrors to sync, and the RPMs that land in the Everything tree still need signing if they weren't signed, or were signed with a different key. I believe the signing process has changed for F11, so the packages in rawhide are signed with the F11 release key. I think that means at some point, either through a mass rebuild, or through a resigning, rawhide will need to be re-signed with the F12 key.
On 02.06.2009 22:14, Matt Domsch wrote:
On Tue, Jun 02, 2009 at 09:30:17PM +0200, Thorsten Leemhuis wrote:
On 02.06.2009 00:22, Matt Domsch wrote:
On Mon, Jun 01, 2009 at 04:03:04PM -0400, Paul W. Frields wrote:
I'm getting out of my ken here, but could this be done in stages with I2 connected hosts getting the bits early/first and then moving on to others?
We need to move ~130GB to each of ~230 mirrors, in about 4 days.
The 130GB is what we posted for F10. Of that, 48GB were ISOs, 93GB was the Everything/ tree which was hardlinked from rawhide, so yes, that would cut down on the transfer quite a bit.
Honestly, I don't recall how the signing aspect went for F10. If the process involved signing, then copying the packages into the Everything tree, then there wouldn't be hardlinks to help.
They would -- just copy the files somewhere two days ahead of signing (a temporary hidden place not exported to the world, for example). When you sign later and put the files in the proper place, then iirc just the signatures of the files need to be updated, which rsync handles well.
[...]
Or what am I missing?
I think you're right, except that RC ISOs aren't published ahead of time for mirrors to sync,
Which shouldn't be too hard to do, should it?
and the RPMs that land in the Everything tree still need signing if they weren't signed, or were signed with a different key. I believe the signing process has changed for F11, so the packages in rawhide are signed with the F11 release key. I think that means at some point, either through a mass rebuild, or through a resigning, rawhide will need to be re-signed with the F12 key.
Yeah, that's how I understood it as well. But it means it should be easy to hardlink the files into the target place when a test/final release happens, which is something rsync handles quite easily afaics.
CU knurd
On Tue, 2009-06-02 at 21:30 +0200, Thorsten Leemhuis wrote:
but if we get the RC with the final name transferred to the mirrors ahead of time then they can be updated relatively quickly as well, as only a few bits change.
We don't do this as it tends to lead to leaks, and confusion as to whether the release has been done or not.
On 02.06.2009 22:30, Jesse Keating wrote:
On Tue, 2009-06-02 at 21:30 +0200, Thorsten Leemhuis wrote:
but if we get the RC with the final name transferred to the mirrors ahead of time then they can be updated relatively quickly as well, as only a few bits change.
We don't do this as it tends to lead to leaks, and confusion as to whether the release has been done or not.
Then put it in a temporary folder that is rsynced from the mirror masters, but not exported to the world. Later it's just an update to the file and a hardlink to the proper place.
Or simply ignore the "there might be leaks" problem more -- the clientele that is hunting for leaks before something is announced is doing something wrong already in any case. And if the whole process from finishing to release were a whole lot shorter, it would shorten the window in which something can leak.
CU knurd
Matt Domsch wrote:
On Mon, Jun 01, 2009 at 04:03:04PM -0400, Paul W. Frields wrote:
I'm getting out of my ken here, but could this be done in stages with I2 connected hosts getting the bits early/first and then moving on to others?
We need to move ~130GB to each of ~230 mirrors, in about 4 days.
We already have in place a limited amount of "tiering". The Tier 0/1 mirrors get the bits first, then downstream mirrors pull from them. We have nearly all our "Tier 1" mirrors on I2 (all but the us.kernel.org). Right now it's not mandatory, but no "new" mirrors (those signed up in the last 18 months or so) have been granted ACL permissions to download from the masters.
http://fedoraproject.org/wiki/Infrastructure/Mirroring/Tiering
One of my hopes for the F12 cycle is that we will have increased use of tiering and push mirroring.
What about dropping hierarchical mirroring altogether? Why hasn't someone developed a distributed (i.e. bittorrent-like) system for mass mirroring? :-)
On Tue, Jun 2, 2009 at 3:02 PM, Matthew Woehlke mw_triad@users.sourceforge.net wrote:
Matt Domsch wrote:
What about dropping hierarchical mirroring altogether? Why hasn't someone developed a distributed (i.e. bittorrent-like) system for mass mirroring? :-)
Why not? Might be an interesting idea to pursue.
-Adam
Matthew Woehlke wrote:
What about dropping hierarchical mirroring altogether? Why hasn't someone developed a distributed (i.e. bittorrent-like) system for mass mirroring? :-)
Already discussed[1][2] on the fedora-test-list.
[1] https://www.redhat.com/archives/fedora-test-list/2009-June/msg00032.html [2] https://www.redhat.com/archives/fedora-test-list/2009-June/msg00062.html
Michael Cronenworth wrote:
Matthew Woehlke wrote:
What about dropping hierarchical mirroring altogether? Why hasn't someone developed a distributed (i.e. bittorrent-like) system for mass mirroring? :-)
Already discussed[1][2] on the fedora-test-list.
[1] https://www.redhat.com/archives/fedora-test-list/2009-June/msg00032.html [2] https://www.redhat.com/archives/fedora-test-list/2009-June/msg00062.html
It's too bad fedora-test-list doesn't seem to be on gmane (or isn't named obviously; gmane.org is being too slow for me to ask about the mail address).
In an idealized network (all servers have roughly the same speed links to all other servers), BT distribution should get everything to everyone in about 2x as long as to send everything to one server. In the worst case, it should take 2x as long as to send everything over the slowest link in the mesh, which, if we only care about when /all/ mirrors are fully synced (a very reasonable assumption in this type of scenario), is still pretty close to being an unconditional improvement. In practice, the actual result will be somewhere between 2x the time to transfer over the fastest link, and over the slowest link.
In a generalized sense, the time-order to distribute via bittorrent in an idealized network is O(2 * K), where hierarchical systems are, at best I believe O(log(base N) K) for the furthest mirrors, and still O(N0 * K) for the tier-0 mirrors. That's an improvement of an entire order (O(log n) -> O(n)).
The point about there not being tools currently is what would need to be addressed.
Kevin Kofler, Wed, 03 Jun 2009 01:29:35 +0200:
gmane.linux.redhat.fedora.testers
Jesse Keating wrote:
Now it seems more that we have to make that decision within enough time to spin up (or down) the marketing machine, which looks to require more leadtime than the mirrors do.
[snip]
The marketing machine has very strongly requested that we only do releases on Tuesdays.
I think it's a bad idea to delay the releases more than necessary for marketing reasons. IMHO we should release as soon as technically feasible.
You'll have to enumerate why that is. One reason I've avoided this is added confusion as to when the "release" happens. If we created that directory and put content in there, would we have then released Fedora 12? When does it become "released" and thus trusted?
It means people can go from Rawhide to the release without service interruption. The way it is now, once the release is finalized, the mirror list redirect of the release tree to Rawhide gets turned off (because Rawhide moves on to the next release), but the Everything directory gets opened up only a few days later, on the official release day. That means people cannot install packages from the Everything repository (which can also affect updates because they can add dependencies) for a few days.
It also seems natural that, if Rawhide moves on to F13 early, the F12 branch will get put directly into releases/12/Everything so the mirrors won't have to sync everything again once the release becomes official.
It should also be noted that a certain distribution with a 'U' even opens up the repository for the next release right after releasing the previous one; they don't call their development repository "development", "rawhide" or something like that, but name it after the next release.
Kevin Kofler
On Tue, 2009-06-02 at 00:45 +0200, Kevin Kofler wrote:
It means people can go from Rawhide to the release without service interruption. The way it is now, once the release is finalized, the mirror list redirect of the release tree to Rawhide gets turned off (because Rawhide moves on to the next release), but the Everything directory gets opened up only a few days later, on the official release day. That means people cannot install packages from the Everything repository (which can also affect updates because they can add dependencies) for a few days.
It also seems natural that, if Rawhide moves on to F13 early, the F12 branch will get put directly into releases/12/Everything so the mirrors won't have to sync everything again once the release becomes official.
It should also be noted that a certain distribution with a 'U' even opens up the repository for the next release right after releasing the previous one; they don't call their development repository "development", "rawhide" or something like that, but name it after the next release.
This is one of the things we hope to come up with a proposal to fix at our Fedora Activity Day this month.
On Tue, Jun 02, 2009 at 00:45:32 +0200, Kevin Kofler kevin.kofler@chello.at wrote:
I think it's a bad idea to delay the releases more than necessary for marketing reasons. IMHO we should release as soon as technically feasible.
I think the release is really a marketing event anyway. Assuming there wasn't an install or driver issue affecting you, people could have been running F11 since the preview without getting a much different experience than what you'll get with the release. And even then the release is going to be modified by the updates (there are around 900 updates available in stable, testing and pending for F11).
It means people can go from Rawhide to the release without service interruption. The way it is now, once the release is finalized, the mirror list redirect of the release tree to Rawhide gets turned off (because Rawhide moves on to the next release), but the Everything directory gets opened up only a few days later, on the official release day. That means people cannot install packages from the Everything repository (which can also affect updates because they can add dependencies) for a few days.
I agree that using the same pattern for development releases would make transitioning between development and production easier. This helps those of us who don't continuously track rawhide, but often switch to it at some point during the development cycle. It also makes working with mirror scripts easier, as you can base things off the release version and not have to worry about the changes in mirror paths between releases and rawhide that currently exist.
One downside would be people less familiar with Fedora accidentally grabbing stuff from the development trees, not realizing that they were development trees. But that seems like an unlikely path for people new to Fedora to get started.
Another potential downside would be for people always wanting to track rawhide. But this might be handled by having a symlink from rawhide to the highest-numbered release that gets changed every time a new release is branched.
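To make that symlink idea concrete, a tiny hypothetical sketch (directory names are illustrative, not the real mirror layout):

    # Hypothetical: at branch time, flip a "rawhide" symlink to the newly
    # branched, highest-numbered tree so people tracking rawhide never
    # have to change their repo paths.
    cd /pub/fedora/linux
    ln -sfn releases/13 rawhide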
On 01.06.2009 20:14, Jesse Keating wrote:
On Mon, 2009-06-01 at 19:49 +0200, Thorsten Leemhuis wrote:
On 29.05.2009 22:57, Jesse Keating wrote:
Thx for your reply and sorry, I didn't find time to answer earlier. Some more comments:
[...]
- how about reducing the number of zero-day updates (which is ridiculously high for F11) by setting a different, later freeze date for all packages that are neither on the install DVD nor on a Spin?
That's a bit hard to determine easily and programmatically. I've all but given up on my quest to reduce the number of updates.
Just a general comment, not relevant to the "Announcing Fedora Activity Day" topic. I'm glad that you gave up that quest, as "Fedora gets lots of updates" is afaics a reason why a lot of people actually use or contribute to Fedora. But that doesn't matter much. What I really want to say:
I totally agree with this:
It just doesn't seem to be in line with the desires of the package maintainers, whether or not it's in line with the desires of (some of) the project leaders.
It IMHO shows a big and increasingly pressing problem in Fedora: packagers and leadership are not working in the same direction.
Anyway, back to topic:
- other distributions seem to manage a whole lot more test releases (e.g. alphas, betas, RCs, milestones, ...) per devel cycle; is that something we should aim for as well?
If there were a way to do it without adding stress and work to teams like releng, installer, etc. [...]
One comment: *from the outside* it often looks a bit like
- some (not all) people go crazy for weeks or months and ignore some of those bugs that are not immediately pressing, but nevertheless important (e.g. the kind of bugs that tend to land on the target or blocker tracker bugs, or already are there)
- then you send a reminder "there will be a test release next week"
- people suddenly wake up and try to fix those bugs for the test release
- they notice: arghh, serious things are broken, we need more time; can we please slip?
- we slip
Maybe more target dates where people should "get things into shape" might help to reduce the work for the real test/final releases.
- how about doing something like a "cp -l development devel-snapshot"
now and then (once a week) when we know rawhide is mostly working?
The rawhide trees for at least the past month are kept in the Fedora infrastructure and are even available by a public url. We haven't tagged any of them as "stable" though.
They are afaics way too far away/way too slow to reach to do a proper network install in an acceptable period of time (at least from Germany). Is rsync available? I'm not aware of it, so I doubt it.
- how can we dramatically reduce the time between finishing a (test) release and releasing it? It seems other distributions get a new (test) release out to the users a lot quicker than the three to five days we require, which seems a whole lot for a devel cycle that takes 180 days in total (and we all know how much rawhide can move on in a few days)
If we didn't care to let the mirrors sync up we could "release" it earlier. However what good does that do when nobody can /get/ to the release because none of the mirrors have it, and the ones that do can't help sync to those that don't because users are tying up all the bandwidth?
I didn't say to not care. But maybe shorten the time we wait for them and help to get the bits out more quickly; pushing users to use bittorrent more might help a lot as well.
- I'd be glad if we could stick to our release targets a lot better. Delaying releases looks quite unprofessional. Delaying also creates trouble for those depending on our releases. Take computer magazines (which have hard production deadlines) that might want to ship a new release on a CD or DVD with the next issue -- due to our reputation for missing deadlines it seems to me that we are a lot less attractive than Ubuntu (which afaics is on the shelves here in Germany in new computer magazines just a few days after it has been released)
What looks more unprofessional? Delaying the release, or hitting our date and releasing with bugs that eat people's data? Or releasing with broken graphics for large swaths of users?
I think you drifted a bit too far away and into an opposition without need. For example, I nowhere said that we should not slip if there is a strong reason to. But my comment was quite open, so it's partly my fault; let me rephrase:
What is rel-eng doing in the next devel cycle to make sure we slip less than we used to in the past? Or does rel-eng think everything was fine and that missing three and a half target dates (alpha, beta, release and the first slipped release target date) out of four is acceptable?
- why do we have to slip by a whole week most of the time? can't we find
ways to slip just a day or two if there really is no way around a delay?
The marketing machine has very strongly requested that we only do releases on Tuesdays.
Some reasons why Tuesdays are a "must" instead of just "good" would be way more helpful than a simple statement "they said so". So all I can say is: yes, let's target Tuesdays if that is the idea (which I agree with), but if there is a slip then slip only a day, two or three if the problem can be fixed within that timeframe (which for example was not the case for the first release slip for the final, but maybe for the second).
[...]
- I'd be glad if the final release directories (e.g. releases/12/Everything) could be available earlier, even if what is in them is not yet what "12" actually will become
You'll have to enumerate why that is.
Kevin outlined some of the reasons already in a reply: https://www.redhat.com/archives/fedora-devel-list/2009-June/msg00073.html
One reason I've avoided this is added confusion as to when the "release" happens. If we created that directory and put content in there, would we have then released Fedora 12? When does it become "released" and thus trusted?
As Kevin said as well: it seems to work quite well for Ubuntu and actually avoids a lot of confusion. I'd say their scheme even works better than ours. I'd also think most "normal" users don't ever look at the servers.
BTW, one more general thing for the FAD: would it make sense to push rawhide updates more than once a day in case something that bugs lots of people can be fixed easily by a quick update?
CU knurd
On Tue, 2009-06-02 at 21:09 +0200, Thorsten Leemhuis wrote:
Just a general comment, not relevant to the "Announcing Fedora Activity Day" topic. I'm glad that you gave up that quest, as "Fedora gets lots of updates" is afaics a reason why a lot of people actually use or contribute to Fedora. But that doesn't matter much. What I really want to say:
I totally agree with this:
It just doesn't seem to be in line with the desires of the package maintainers, whether or not it's in line with the desires of (some of) the project leaders.
It IMHO shows a big and increasingly pressing problem in Fedora: packagers and leadership are not working in the same direction.
Yes, that is a problem. In other projects, the contributions from folks who don't agree with where the project leaders are trying to take the project are often rejected or ignored. That isn't happening here, which is a good thing I suppose, but it is a point of contention that will have to be addressed.
Anyway, back to topic:
- other distributions seem to manage a whole lot more test releases (e.g. alphas, betas, RCs, milestones, ...) per devel cycle; is that something we should aim for as well?
If there were a way to do it without adding stress and work to teams like releng, installer, etc. [...]
One comment: *from the outside* it often looks a bit like
- some (not all) people go crazy for weeks or months and ignore some of those bugs that are not immediately pressing, but nevertheless important (e.g. the kind of bugs that tend to land on the target or blocker tracker bugs, or already are there)
- then you send a reminder "there will be a test release next week"
- people suddenly wake up and try to fix those bugs for the test release
- they notice: arghh, serious things are broken, we need more time; can we please slip?
- we slip
Maybe more target dates where people should "get things into shape" might help to reduce the work for the real test/final releases.
Yeah, I can see how that would be the case from the outside. From the inside, the more freeze points we have, the more times work in progress has to be stifled and beaten into some sort of working shape, even if it's nothing like the final product goal. That results in a lot of wasted effort and testing, which gets thrown out as development continues and changes everything that was "tested" earlier.
- how about doing something like a "cp -l development devel-snapshot"
now and then (once a week) when we know rawhide is mostly working?
The rawhide trees for at least the past month are kept in the Fedora infrastructure and are even available by a public url. We haven't tagged any of them as "stable" though.
They are afaics way too far away/way too slow to reach to do a proper network install in an acceptable period of time (at least from Germany). Is rsync available? I'm not aware of it, so I doubt it.
No, but the most important part of these is the boot.iso, which has the anaconda and other code that anaconda needs. The repository that you point the boot.iso at to install packages matters a lot less, at least as far as "completing the install" goes.
- how can we dramatically reduce the time between finishing a (test) release and releasing it? It seems other distributions get a new (test) release out to the users a lot quicker than the three to five days we require, which seems a whole lot for a devel cycle that takes 180 days in total (and we all know how much rawhide can move on in a few days)
If we didn't care to let the mirrors sync up we could "release" it earlier. However what good does that do when nobody can /get/ to the release because none of the mirrors have it, and the ones that do can't help sync to those that don't because users are tying up all the bandwidth?
I didn't say to not care. But maybe shorten the time we wait for them and help to get the bits out more quickly; pushing users to use bittorrent more might help a lot as well.
I honestly feel that trying to shorten the length will only lead to mistakes. We're short as it is, nearly too short, with extremely little time to recover from a mistake without slipping.
- I'd be glad if we could stick to our release targets a lot better. Delaying releases looks quite unprofessional. Delaying also creates trouble for those depending on our releases. Take computer magazines (which have hard production deadlines) that might want to ship a new release on a CD or DVD with the next issue -- due to our reputation for missing deadlines it seems to me that we are a lot less attractive than Ubuntu (which afaics is on the shelves here in Germany in new computer magazines just a few days after it has been released)
What looks more unprofessional? Delaying the release, or hitting our date and releasing with bugs that eat people's data? Or releasing with broken graphics for large swaths of users?
I think you drifted way too far off and into opposition without need. I, for example, nowhere said that we should not slip if there is a strong reason to. But my comment was quite open, so it's partly my fault; let me rephrase:
What is rel-eng doing in the next devel cycle to make sure we slip less than we used to in the past? Or does rel-eng think everything was fine, and that missing three and a half target dates (alpha, beta, release, and the first slipped release target date) out of four is acceptable?
I don't think it's acceptable. What we're doing is taking a look at the problems with our development cycle so that we can provide more time for development and bugfixing without getting in the way of maintainers; we're holding this FAD to investigate and brainstorm.
- why do we have to slip by a whole week most of the time? can't we find
ways to slip just a day or two if there really is no way around a delay?
The marketing machine has very strongly requested that we only do releases on Tuesdays.
Some reasons why Tuesdays are a "must" instead of just "good" would be way more helpful than a simple statement that "they said so". So all I can say is: yes, let's target Tuesdays if that is the idea (which I agree with), but if there is a slip, then slip only a day, two, or three if the problem can be fixed within that timeframe (which for example was not the case for the first release slip for the final, but maybe for the second).
It wouldn't have been. Nearly every time we've tried to slip for only a couple of days, we've had to slip for longer. Slipping usually means that we didn't have a GOLD set at the point of no return, which means we have to generate a fix for whatever is broken, re-create the release candidate, and re-validate it through our test matrix. That simply takes more time than we could fit into a 2 or 3 day slip.
[...]
- I'd be glad if the final release directories (e.g.
"release/12/Everything") could be available earlier, even if what is in them is not yet what "12" will actually become
You'll have to enumerate why that is.
Kevin outlined some of the reasons already in a reply: https://www.redhat.com/archives/fedora-devel-list/2009-June/msg00073.html
I don't want to make assumptions. If his reasons are yours, that's fine; we'll take those issues as the problems to solve. What I'm not looking for is solutions looking for a problem, or pre-determined solutions that we'll try to make every problem fit into. I want to look at the problem set as a whole and /then/ work toward a solution. The proposal on the page is mostly a thought exercise to get people thinking about the scope of things we could change. If we come out of this FAD with exactly my proposal, then we'll likely have wasted a lot of time and money.
One reason I've avoided this is added confusion as to when the "release" happens. If we created that directory and put content in there, would we have then released Fedora 12? When does it become "released" and thus trusted?
As Kevin said as well: it seems to work quite well for Ubuntu and actually avoids a lot of confusion. I'd say their scheme even works better than ours. I'd also think most "normal" users don't ever look at the servers.
BTW, one more general thing for the FAD: would it make sense to do rawhide updates more than once a day in case something that bugs lots of people can be fixed easily by a quick update?
Unfortunately, with the addition of delta rpms, rawhide composes are back up to taking 8+ hours. There are places in the delta code paths where we can optimize, but we're dealing with 8500 packages, generating a lot more rpms across 4 arches, determining multilib on these, and then generating / validating deltas for all of them. That means we're working at a very large scale with a comparatively very small hardware budget, so our composes are not going to be quick -- particularly when we're also trying to push tonnes of updates for 3 other releases, which puts more stress on the same pieces of hardware. The scale is enormous, and only getting bigger, which means slower.
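To put rough numbers on that scale, a quick back-of-the-envelope calculation might look like the following; only the 8500 packages and 4 arches come from the thread, and the other factors are assumptions picked purely for illustration:

# Back-of-envelope estimate of per-compose work. The per-source and
# per-rpm factors below are illustrative assumptions, not measured values.
source_packages = 8500
arches = 4
binary_rpms_per_source = 3      # assumed average number of subpackages
delta_targets_per_rpm = 2       # assume deltas against ~2 older builds

binary_rpms = source_packages * binary_rpms_per_source * arches
deltas = binary_rpms * delta_targets_per_rpm

print("binary rpms to compose: ~%d" % binary_rpms)   # ~102000
print("deltas to generate/check: ~%d" % deltas)      # ~204000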
Thorsten Leemhuis wrote:
It IMHO shows a big and more and more pressing problem in Fedora: Packagers and leadership are not working towards the same direction.
The best solution for that is to change the leadership. :-) So don't vote for the same old hats for FESCo and FPB.
One comment: *from the outside* it often looks a bit like
- some (not all) people go crazy for weeks or months and ignore some of
those bugs that do not seem pressing, but nevertheless are (e.g. the kind of bugs that tend to land on the target or blocker tracker bugs, or already are there)
- then you send a reminder "there will be a test release next week"
- people suddenly wake up and try to fix those bugs for the test release
- they notice: arghh, serious things are broken, we need more time; can
we please slip?
- we slip
Maybe more target dates where people should "get things into shape" might help to reduce the work for the real test/final releases.
Actually a better strategy is to just schedule the release a few weeks earlier than when you actually want to release, then the slips will make it hit the real target date very closely. :-)
But I don't see those slips as a big issue in the first place.
Kevin Kofler
On Wed, Jun 03, 2009 at 01:08:15AM +0200, Kevin Kofler wrote:
Thorsten Leemhuis wrote:
It IMHO shows a big and more and more pressing problem in Fedora: Packagers and leadership are not working towards the same direction.
The best solution for that is to change the leadership. :-) So don't vote for the same old hats for FESCo and FPB.
Honestly, that is pretty short-sighted. And Thorsten's statement isn't entirely accurate either. An entirely new FESCo and FPB would still be faced with the same problems we have today.
Let's look at it in a bit more detail.
1) I don't recall ever seeing FESCo or FPB state as a committee that they want fewer packages and updates. If you have a mailing list post or meeting minutes that say that, I would be happy to look at it.
2) The people that _have_ advocated for fewer updates have actual limitations they are facing that would make it desirable. As Jesse said in his reply earlier, it takes 8+ hours to mash rawhide now. It _also_ takes at least 8 hours to do an updates push.
We are facing some real limitations on our turnaround time for things at the moment, and they are only going to get worse as we have newer releases that will get the delta rpms. At the same time, the same people are getting raked over the coals for not getting bits out fast enough.
We are working on this from a rel-eng standpoint, but advocating for a bit of discretion on what should be pushed as an update is not entirely a bad thing. Personally, I would love it if package maintainers slowed down a bit. But it's not an end solution.
So certainly the leadership, defined as FESCo and FPB, is not in conflict with the contributors' apparent direction. As far as I can see, they haven't made a statement either way. If there is a group that was pushing for something that ran contrary, it was Rel-Eng. And given that Jesse and I both just said we're going to basically stop begging people to slow down on updates, I think even that group is trying to figure out a way to make things better. Hell, that's partly what this FAD is all about.
So, please. The rhetoric isn't really needed, it's not productive, and it's just going to stir things up more than necessary.
Maybe more target dates where people should "get things into shape" might help to reduce the work for the real test/final releases.
Actually a better strategy is to just schedule the release a few weeks earlier than when you actually want to release, then the slips will make it hit the real target date very closely. :-)
Except our schedule is public and open. So whatever we say the date is, pretend or not, is the date that people will expect and target.
josh
On Tue, Jun 02, 2009 at 08:15:32PM -0400, Josh Boyer wrote:
We are facing some real limitations on our turn around time for things at the moment and they are only going to get worse as we have newer releases that will get the delta rpms. At the same time, the same people are getting raked over the coals for not getting bits out fast enough.
We are working on this from a rel-eng standpoint, but advocating for a bit of discretion on what should be pushed as an update is not entirely a bad thing. Personally, I would love it if package maintainers slowed down a bit. But it's not an end solution.
So certainly the leadership, defined as FESCo and FPB, is not in conflict with the contributor's apparent direction. As far as I can see, they haven't made a statement either way. If there is a group that was pushing for something that ran contrary, it was Rel-Eng. And given that Jesse and I both just said we're going to basically stop begging people to slow down on updates, I think even that group is trying to figure out a way to make things better. Hell, that's partly what this FAD is all about.
If the FAD identifies some tangibles (hardware, etc.) that would help alleviate some of the time problems, I can tell you that Spot and I will do our best to procure them. From what I've heard others describe up until now, it doesn't seem like there's one clear roadblock in that regard -- just a huge mountain of tasks that our current systems have to chug through for composing, and no matter how you slice it, it takes a lot of time and I/O bandwidth.
On Wed, Jun 03, 2009 at 08:55:48AM -0400, Paul W. Frields wrote:
On Tue, Jun 02, 2009 at 08:15:32PM -0400, Josh Boyer wrote:
We are facing some real limitations on our turn around time for things at the moment and they are only going to get worse as we have newer releases that will get the delta rpms. At the same time, the same people are getting raked over the coals for not getting bits out fast enough.
We are working on this from a rel-eng standpoint, but advocating for a bit of discretion on what should be pushed as an update is not entirely a bad thing. Personally, I would love it if package maintainers slowed down a bit. But it's not an end solution.
So certainly the leadership, defined as FESCo and FPB, is not in conflict with the contributor's apparent direction. As far as I can see, they haven't made a statement either way. If there is a group that was pushing for something that ran contrary, it was Rel-Eng. And given that Jesse and I both just said we're going to basically stop begging people to slow down on updates, I think even that group is trying to figure out a way to make things better. Hell, that's partly what this FAD is all about.
If the FAD identifies some tangibles (hardware, etc.) that would help alleviate some of the time problems, I can tell you that Spot and I will do our best to procure them. From what I've heard others describe up until now, it doesn't seem like there's one clear roadblock in that regard -- just a huge mountain of tasks that our current systems have to chug through for composing, and no matter how you slice it, it takes a lot of time and I/O bandwidth.
Yep. As a simple test, we'd like to do some experiments to see whether running updates pushes and rawhide composes on separate boxen makes things worse, better, or about the same. I don't think we need additional procured hardware for that, just a cloned guest, which I already have a ticket open for.
Oh, and time. Always need time. If you or spot could procure time, let me know ;)
josh
On 06/03/2009 02:38 PM, Tom "spot" Callaway wrote:
On 06/03/2009 09:01 AM, Josh Boyer wrote:
Oh, and time. Always need time. If you or spot could procure time, let me know ;)
Man, if I knew how to do that, I'd be a lot wealthier than I am now. ;)
Extend the day to 36 hours
Gosh feel like a millionaire already
Sleep is overrated anyway. :)
Johann "who get's enough sleep when he's dead" Gudmundsson
On Wed, 2009-06-03 at 08:55 -0400, Paul W. Frields wrote:
If the FAD identifies some tangibles (hardware, etc.) that would help alleviate some of the time problems, I can tell you that Spot and I will do our best to procure them. From what I've heard others describe up until now, it doesn't seem like there's one clear roadblock in that regard -- just a huge mountain of tasks that our current systems have to chug through for composing, and no matter how you slice it, it takes a lot of time and I/O bandwidth.
Well, upgrading all of the buildsystem and storage system to 10GigE, and replacing the 10TB or so filer with something that has fantastically fast disks, would certainly reduce the amount of I/O wait time, but that's going to cost a /lot/ of money. We are still trying to find where our bottlenecks are. We had a pretty good handle on things prior to delta rpms, but now we've gone from ~3 hour null composes (no changed packages) to nearly 9 hour null composes. There is obviously some optimization to be had here.
On Wed, 3 Jun 2009, Jesse Keating wrote:
On Wed, 2009-06-03 at 08:55 -0400, Paul W. Frields wrote:
If the FAD identifies some tangibles (hardware, etc.) that would help alleviate some of the time problems, I can tell you that Spot and I will do our best to procure them. From what I've heard others describe up until now, it doesn't seem like there's one clear roadblock in that regard -- just a huge mountain of tasks that our current systems have to chug through for composing, and no matter how you slice it, it takes a lot of time and I/O bandwidth.
Well, upgrading all of the buildsystem and storage system to 10GigE, and replacing the 10TB or so filer with something that has fantastically fast disks, would certainly reduce the amount of I/O wait time, but that's going to cost a /lot/ of money. We are still trying to find where our bottlenecks are. We had a pretty good handle on things prior to delta rpms, but now we've gone from ~3 hour null composes (no changed packages) to nearly 9 hour null composes. There is obviously some optimization to be had here.
And the optimization there is fairly well known. We need to read in and not change the prestodelta file. It's on my short-ish createrepo list.
-sv
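A rough sketch of the reuse Seth is describing -- on a compose where a package did not change, carry its existing delta entries over instead of regenerating them. The element names and file layout below are assumptions based on how a prestodelta file commonly looks; this is not createrepo's actual code:

# Sketch: reuse old prestodelta entries for packages that did not change.
# XML element names and the drpm layout are assumptions for illustration.
import os
import xml.etree.ElementTree as ET

def reusable_deltas(old_prestodelta_xml, drpm_dir, changed_packages):
    """Return (package, drpm filename) pairs that can be carried over:
    the package did not change and its drpm file is still on disk."""
    kept = []
    root = ET.parse(old_prestodelta_xml).getroot()
    for newpackage in root.findall("newpackage"):
        name = newpackage.get("name")
        if name in changed_packages:
            continue                    # stale: the package got a new build
        for delta in newpackage.findall("delta"):
            filename = delta.findtext("filename")
            if filename and os.path.exists(os.path.join(drpm_dir, filename)):
                kept.append((name, filename))
    return kept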
On Wed, 2009-06-03 at 13:49 -0400, Seth Vidal wrote:
And the optimization there is fairly well known. We need to read in and not change the prestodelta file. It's on my short-ish createrepo list.
Hrm, bill thought it was something on the mash side, where he validates the signature of all the existing deltas to catch if a gpg sig changed without a n-v-r bump.
On Wed, 3 Jun 2009, Jesse Keating wrote:
On Wed, 2009-06-03 at 13:49 -0400, Seth Vidal wrote:
And the optimization there is fairly well known. We need to read in and not change the prestodelta file. It's on my short-ish createrepo list.
Hrm, bill thought it was something on the mash side, where he validates the signature of all the existing deltas to catch if a gpg sig changed without a n-v-r bump.
Ah - well I hadn't heard the mash part - I know that we could speed up the prestodelta xml creation part - I wasn't convinced it was a 6 hour process, though :)
-sv
Jesse Keating (jkeating@redhat.com) said:
On Wed, 2009-06-03 at 13:49 -0400, Seth Vidal wrote:
And the optimization there is fairly well known. We need to read in and not change the prestodelta file. It's on my short-ish createrepo list.
Hrm, bill thought it was something on the mash side, where he validates the signature of all the existing deltas to catch if a gpg sig changed without a n-v-r bump.
I haven't characterized whether that is *definitely* what's causing the pain, but it's a likely source.
It's also a hard one to optimize unless you decree that packages will never change signatures, which doesn't seem practical.
Bill
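One possible way to cheapen that per-compose check (an assumption on my part, not what mash actually does) would be to cache each delta's signature keyed by path, size, and mtime, so only files that actually changed on disk get their headers re-read; this assumes a re-signed rpm ends up with a new size or mtime. read_signature() below is a hypothetical placeholder for whatever extracts the signature from an rpm header:

# Sketch: skip re-reading signatures of delta rpms that have not changed on
# disk. read_signature() is a hypothetical placeholder; this is not mash code.
import os
import pickle

CACHE_FILE = "/var/tmp/drpm-sig-cache.pickle"      # assumed cache location

def load_cache():
    try:
        with open(CACHE_FILE, "rb") as f:
            return pickle.load(f)
    except (IOError, OSError):
        return {}

def check_signatures(drpm_paths, read_signature):
    cache = load_cache()
    reread = []
    for path in drpm_paths:
        st = os.stat(path)
        key = (path, st.st_size, int(st.st_mtime))
        if key in cache:
            continue                    # unchanged file, trust the cached sig
        cache[key] = read_signature(path)
        reread.append(path)
    with open(CACHE_FILE, "wb") as out:
        pickle.dump(cache, out)
    return reread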
On Wed, 3 Jun 2009, Bill Nottingham wrote:
Jesse Keating (jkeating@redhat.com) said:
On Wed, 2009-06-03 at 13:49 -0400, Seth Vidal wrote:
And the optimization there is fairly well known. We need to read in and not change the prestodelta file. It's on my short-ish createrepo list.
Hrm, bill thought it was something on the mash side, where he validates the signature of all the existing deltas to catch if a gpg sig changed without a n-v-r bump.
I haven't characterized that that is *definitely* what's causing pain, but it's a likely source.
It's also a hard one to optimize unless you decree that packages will never change signatures, which doesn't seem practical.
We could always go to detached signatures or auto-pkg signatures and then only manually sign the repomd's.
-sv
Sorry, I was quite busy with other stuff over the past few days and didn't get around to answering this.
On 03.06.2009 02:15, Josh Boyer wrote:
On Wed, Jun 03, 2009 at 01:08:15AM +0200, Kevin Kofler wrote:
Thorsten Leemhuis wrote:
It IMHO shows a big and more and more pressing problem in Fedora: Packagers and leadership are not working towards the same direction.
The best solution for that is to change the leadership. :-) So don't vote for the same old hats for FESCo and FPB.
Honestly, that is pretty short-sighted. And Thorsten's statement isn't entirely accurate either. Entirely new FESCo and FPB would still be faced with the same problems we have today.
Let's look at it in a bit more detail.
- I don't recall ever seeing FESCo or FPB state as a committee that they want
fewer packages and updates. If you have a mailing list post or meeting minutes that say that, I would be happy to look at it.
In short: that, from my point of view, is exactly the leadership problem.
The verbose version: Fedora obviously has a problem here, as some packagers follow a (kind of) Debian-like update scheme while others follow more of a rolling-release scheme. That's bad: users who prefer to get the latest version of the software as regular updates are not satisfied, as some packages stay on old versions; neither are those who prefer "old, but stable", as they sometimes can't avoid updating to new versions for security reasons. (Note that this is the very long story made very short, and without lots of details/special cases where doing either the first or the second is the better thing to do.)
The policy that FESCo worked out a few months ago didn't help much. In fact it's so vague that it's IMHO more confusing than helpful. Not to forget Jesse (as rel-eng lead, in a quite important position) and his "quest to reduce the number of updates" (which he gave up -- see earlier in this thread), which likely made some packagers wonder "am I doing it right?"
A real leader/leading group would have guided people better. Like "This is how we want to do it [...], here are a few examples that will help to understand [...]". Or even better: work out an overall solution that changes Fedora into a distribution that satisfies both user groups mentioned above: those that prefer older, but stable packages, and those that prefer newer packages but avoid rawhide because it's too dangerous.
[...]
Cu knurd
On Mon, Jun 8, 2009 at 6:24 PM, Thorsten Leemhuis fedora@leemhuis.info wrote:
Not to forget Jesse (as rel-eng lead in a quite important position) and his "quest to reduce the number of updates" (which he gave up -- see earlier this thread),
FTR, he actually said "I've all *but* given up on my quest to reduce the number of updates"; emphasis mine.
Jesse Keating said the following on 06/01/2009 11:14 AM Pacific Time:
On Mon, 2009-06-01 at 19:49 +0200, Thorsten Leemhuis wrote:
- why do we have to slip by a whole week most of the time? can't we find
ways to slip just a day or two if there really is no way around a delay?
The marketing machine has very strongly requested that we only do releases on Tuesdays.
Seeking clarification... From my perspective, "marketing machine" has a derogatory tone to it. Is that the intended tone?
If, "yes," why? If "no," never mind :)
John
On Thu, 2009-06-04 at 19:29 -0700, John Poelstra wrote:
Seeking clarification... From my perspective, "marketing machine" has a derogatory tone to it. Is that the intended tone?
If, "yes," why? If "no," never mind :)
There was no derogatory tone intended.
On Mon, 2009-06-01 at 19:49 +0200, Thorsten Leemhuis wrote:
- should we set a way earlier freeze date for things like anaconda,
kernel, isolinux, grub, and other crucial pieces to make sure they are in better shape a bit earlier and thus are less likely to be a reason for release slips?
If you're basing this off the F11 cycle, it's worth noting that kernel and anaconda were not 'reasons for release slips' in this cycle in the sense that late changes were made to them which turned out to be bad ideas. It was simply that there were bugs in them all along which were critical enough to block the release. An earlier freeze date would not have helped at all.
On 01.06.2009 21:50, Adam Williamson wrote:
On Mon, 2009-06-01 at 19:49 +0200, Thorsten Leemhuis wrote:
- should we set a way earlier freeze date for things like anaconda,
kernel, isolinux, grub, and other crucial pieces to make sure they are in better shape a bit earlier and thus are less likely to be a reason for release slips?
If you're basing this off the F11 cycle, it's worth noting that kernel and anaconda were not 'reasons for release slips' in this cycle in the sense that late changes were made to them which turned out to be bad ideas.
I know, but:
It was simply that there were bugs in them all along which were critical enough to block the release. An earlier freeze date would not have helped at all.
It might have helped to find the problem earlier -- I for example got the impression that a lot of people had problems with the storage rewrite and thus aborted their tests with Alpha or Beta.
CU knurd
On Tue, 2009-06-02 at 21:12 +0200, Thorsten Leemhuis wrote:
It might have helped to find the problem earlier -- I for example got the impression that a lot of people had problems with the storage rewrite and thus aborted their tests with Alpha or Beta.
An earlier freeze would have just frozen the work unfinished. The rewrite was a massive undertaking and we knew it was going to take longer than the release cycle to finish. Freezing earlier wouldn't have helped.
Jesse Keating wrote:
An earlier freeze would have just frozen the work unfinished. The rewrite was a massive undertaking and we knew it was going to take longer than the release cycle to finish. Freezing earlier wouldn't have helped.
Then it should have been done in a work branch and targeted for a later release.
Kevin Kofler
On Wed, Jun 03, 2009 at 01:10:14AM +0200, Kevin Kofler wrote:
Jesse Keating wrote:
An earlier freeze would have just frozen the work unfinished. The rewrite was a massive undertaking and we knew it was going to take longer than the release cycle to finish. Freezing earlier wouldn't have helped.
Then it should have been done in a work branch and targeted for a later release.
You have a point, but for something like anaconda you really need it to be installable and tested via installs. Simply branching the package isn't enough, since you need actual composes with the branched code in it.
Not impossible to do, but when we have trouble getting people to test rawhide already I'm not sure diluting our test pool that way is a great answer.
josh
An earlier freeze would have just frozen the work unfinished. The rewrite was a massive undertaking and we knew it was going to take longer than the release cycle to finish. Freezing earlier wouldn't have helped.
Then it should have been done in a work branch and targeted for a later release.
Which, of course, it was. But there's no substitute for real-world testing. We do not have nearly the variety of hardware setups, existing partition layouts, or unusual requirements here that users do. At some point, we really do just have to let the new code loose on everyone and get the broad testing that's supposed to be one of the hallmarks of free software.
- Chris
It might have helped to find the problem earlier -- I for example got the impression that a lot of people had problems with the storage rewrite and thus aborted their tests with Alpha or Beta.
There was no storage rewrite in the Alpha, so this isn't the case there. For the beta, you are correct.
- Chris