A few weeks ago we had a 4-way amd64 web server running RHEL 4 that crashed sporadically -- nothing left in the syslog. up2date didn't find a new kernel, so I just downloaded and installed the latest kernel from kernel.org and the system has been stable ever since. I'm not sure if I could have gone to RH for support because Cornell has a site license, and even if I had a direct line to RH management, it would take me more time to explain the problem than it would take to try a mainstream kernel.
Overall, I'm quite happy with the four-digit revision mainstream linux kernels. We had a crash on our main machine that left a stack trace, did some research on the web, found that this had been fixed in 2.6.11.something, upgraded the kernel, case closed.
People are willing to pay $$ to get an "enterprise" product which is reliable, and supported, but this is another case where the generic product turns out to be more reliable than the branded product, and looking at what's happening with Fedora, I've got a lot of concern that RH's pursuit of innovation will always lead to a kernel long on gee-whiz features and short on reliability. Crashes mean I get calls from the NOC at 4am, and god forbid that my toddler hears the phone ring or me walking down the stairs, because I'll need to entertain him while dealing with the crash and for the rest of the morning. Then a week later I go to netcraft and they say my uptime is seven days and I feel like a jerk because the whole world knows about my problems.
I think there are two reasons for the RHEL 4 instability: (i) the quarterly release cycle means that I have to wait for bug fixes -- and if you're running a non-x86 architecture, it seems like 2.6 is shaking out bugs at a high rate, and (ii) RH is aggressively pushing new features.
I really don't know what's in RHEL 4 (it would take me more time to look at the patches than it would to revert to mainstream) but the activation of 4KSTACKS in Fedora is one of those changes that reduces reliability.
I've been looking, and I've never found out what benefit 4KSTACKS has for end users. The kernel team is sensible, so I'm sure that there are some real benefits, but looking at the problem reports and at the attitudes of some people on this list, I start to wonder if it's just a vindictive attempt to put an end to ndiswrapper. (I'd really love to see an explanation of the benefits of 4KSTACKS.)
The real trouble is that 4KSTACKS problems aren't in kernel modules per se, but really are in the combination of modules that are running. Yeah, maybe they can get reiserfs running under 4KSTACKS, but what if you're running an NFSv4 server with all the whizzy options turned on, and IPv6 with tunneling and it's a reiserfs filesystem and you're using LVM and RAID and a particularly funky SCSI driver, what then?
By adopting 4KSTACKS early, Fedora has helped shake out problems with 4KSTACKS, but when 4KSTACKS becomes the main option in the mainstream kernel, we'll see people dealing with weird problems that happen sporadically on certain setups for years to come. We seem to have one of the worst workloads in the world, and the last thing I need is more crashes.
On Tue, 2005-08-02 at 13:10 -0400, Paul A Houle wrote:
A few weeks ago we had a 4-way amd64 web server running RHEL 4 that
1) This is the Fedora list, not the RHEL list. 2) If you have an amd64, you are running the 64-bit OS, right? The 64-bit OS has 8K stacks.
I really don't know what's in RHEL 4 (it would take me more time to
look at the patches than it would to revert to mainstream) but the activation of 4KSTACKS in Fedora is one of those changes that reduces reliability.
I've been looking, and I've never found out what benefit that
4KSTACKS has for end users.
1) more threads in userspace 2) FAR FAR fewer VM problems
By adopting 4KSTACKS early, Fedora has helped shake out problems
with 4KSTACKS, but when 4KSTACKS becomes the main option in the
it already is a main option, and soon to be the only option.
but given that you don't seem to be running a 4K stacks kernel at all anyway (e.g. a 64-bit OS), I wonder what your flame is really about...
On Tue, Aug 02, 2005 at 01:10:08PM -0400, Paul A Houle wrote:
I've been looking, and I've never found out what benefit that 4KSTACKS has for end users. The kernel team is sensible, so I'm sure
I don't think x86_64 has 4KSTACKS, does it? It should be 8K for 64-bit kernels. I even just looked at the 2.6.12 config file in /boot and can't find any mention of stack sizes.
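For what it's worth, grepping the config file for the option is the quickest way to settle questions like this. Below is a minimal sketch of that check in Python; the sample config text is made up for illustration, and on a real system you would read /boot/config-&lt;version&gt; instead.

```python
# Check whether a kernel config enables 4K stacks (CONFIG_4KSTACKS).
# The config text below is a fabricated sample, not taken from a real kernel.
sample_config = """\
CONFIG_X86=y
# CONFIG_4KSTACKS is not set
CONFIG_HIGHMEM4G=y
"""

def has_4k_stacks(config_text):
    """Return True only if CONFIG_4KSTACKS is explicitly enabled."""
    for line in config_text.splitlines():
        if line.strip() == "CONFIG_4KSTACKS=y":
            return True
    return False

# A commented-out "is not set" line does not count as enabled.
print(has_4k_stacks(sample_config))  # prints False
```

Note that on x86_64 the option simply does not appear in the config at all, which matches the observation above.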
On Tue, Aug 02, 2005 at 01:10:08PM -0400, Paul A Houle wrote:
People are willing to pay $$ to get an "enterprise" product which is reliable, and supported, but this is another case where the generic product turns out to be more reliable than the branded product,
This is entirely subjective. Look at the Bugzilla stats for proof of this. Open kernel bugs:
FC3: 616
FC4: 256
rawhide: 170
RHEL4: 189
RHEL4 bugs get fixed very quickly. It being a supported product, engineers devote time to fixing those problems. Fedora bugs have no guarantee of being fixed in the next update, or even the next release. A lot of Fedora bugs that get closed happen just due to the rebasing to a newer upstream release.
Looking at the numbers month over month, Fedora bugs go up, RHEL bugs go down. This may disappoint a lot of Fedora users, but this is the harsh reality. A lot of the kernel bugs filed against Fedora are also bugs that are relevant upstream. More recently, I've been pushing these bugs to their upstream maintainers in an attempt to try and beat down the volume. (Things were actually even worse a few months back).
Your complaint seems to be that some bug you hit was fixed upstream, but not in RHEL, yet at the same time you mention that you never filed a RHEL bug on this. We'll work on psychic-bug-reporting for RHEL5, but in the meantime, we need to know when things break to fix them. Whilst we watch upstream, and backport some fixes, with upstream committing ~4000 changes per point release, it's not feasible to catch everything. Changes also need to be evaluated in terms of risk before they go into a RHEL release.
and looking at what's happening with Fedora, I've got a lot of concern that RH's pursuit of innovation will always lead to a kernel long on gee-whiz features and short on reliability.
gee-whiz features? Exec-shield and Tux are the only real big-ticket items we carry these days in Fedora, and they're in RHEL too. Fedora kernels are a lot closer to mainline than any of the RHL kernels were.
I think there are two reasons for the RHEL 4 instability: (i) the quarterly release cycle means that I have to wait for bug fixes
We do push out interim updates for really important problems (typically dataloss/corruption/security issues).
-- and if you're running a non-x86 architecture, it seems like 2.6 is shaking out bugs at a high rate,
upstream is also introducing regressions at a high rate. After rebasing the FC3 kernel last week from 2.6.11 to 2.6.12, ~50 bugs got closed, and ~50 new ones got filed. This pattern has been going on for the last few releases. Some releases are worse than others, and we end up with more new issues than closed ones.
and (ii) RH is aggressively pushing new features.
Such as ?
I really don't know what's in RHEL 4 (it would take me more time to look at the patches than it would to revert to mainstream)
The GA release was very close to a Fedora kernel circa FC3. Subsequent updates have diverged.
but the activation of 4KSTACKS in Fedora is one of those changes that reduces reliability.
Funny, our evidence shows the contrary. FC2 was plagued with VM problems that magically went away when we moved to 4K stacks, and separate interrupt stacks. The fact is that on 32-bit x86, you never had all of that 8 KB stack anyway.
I've been looking, and I've never found out what benefit that 4KSTACKS has for end users.
Better reliability under load.
I start to wonder if it's just a vindictive attempt to put an end to ndiswrapper. (I'd really love to see an explanation of the benefits of 4KSTACKS.)
If you're using ndiswrapper, there's your problem right there. Windows drivers expect a 12KB stack. So in certain circumstances, you're out of luck even with 4k stacks disabled.
Dave
Thank you very much indeed Dave, for a knowledgeable, instructive, honest and reassuring reply, which gives me great confidence in the Red Hat/Fedora project.
Sincerely,
-- Jean-Luc Fontaine http://jfontain.free.fr/
Dave Jones wrote:
Your complaint seems to be that some bug you hit was fixed upstream, but not in RHEL, yet at the same time you mention that you never filed a RHEL bug on this. We'll work on psychic-bug-reporting for RHEL5, but in the meantime, we need to know when things break to fix them. Whilst we watch upstream, and backport some fixes, with upstream committing ~4000 changes per point release, it's not feasible to catch everything. Changes also need to be evaluated in terms of risk before they go into a RHEL release.
It would have taken me a great deal of time to have filed a useful bug report. I had no stack trace and I can say little more about it than has already been said, other than the hardware details of the machine. The only cheap way that you could (possibly) resolve the problem for me is to send me a kernel that has more recent patches from upstream and hope that the problem goes away. (Maybe that's what you do -- if you do, then it might ~not~ be a waste of time for me to go through your process.)
On the other hand, my experience is that going straight to upstream solves problems rapidly and lets me go back to work. Yes, I can be an 'altruist' and spend more of my time helping RH fix a product that costs $2000 per server, or I can get the job done.
If my quick method didn't resolve my problem then I'd have the choice of going to LKML with a mainstream kernel (meaning I can have an e-mail message read by someone who knows how to fix a race condition) or submitting a bug report to RH (which starts with a password reset for my redhat.com account, and, if RH is like any other vendor, having to explain my problem several times to people who don't know how to fix race conditions.)
I'll consider going back to an RHEL kernel if the machine in question has problems that I can't fix my way, and then I'll try your process.
We do push out interim updates for really important problems (typically dataloss/corruption/security issues).
That's great, but it doesn't solve my problem.
Anyway, I know that these problems aren't easy, and they are things that don't really fit into one box (reliability of RHEL, Fedora and the Linux kernel) but I've really got a lot of concerns about reliability and there are days that I envy the guys in the other office who work with a SPARC/Solaris stack. I've got concerns that bad things are going to come in the upstream kernel -- for instance, there's a lot of cleanup and simplification of the network stack in 2.6.13 that will be nice a year from now, but they'll probably drop something on the floor, so I'll probably wait until 2.6.13.something to upgrade.
The funny thing is that the world is increasingly believing that Linux is "ready for the enterprise" at a time when I'm questioning my faith.
Paul A Houle wrote:
On the other hand, my experience is that going straight to upstream solves problems rapidly and lets me go back to work. Yes, I can be an 'altruist' and spend more of my time helping RH fix a product that costs $2000 per server, or I can get the job done.
At the risk of sounding trollish, I'd like to voice some of the words that the rest of us are muttering under our breath--
You're complaining about RHEL on the Fedora devel mailing list. And despite the fact that this has already been brought to your attention, your reply does not even hint as to why you think that this behavior is okay, but rather contains even more complaints about a distro that this list does not support.
In Fedora Land (which is where you're posting, by the way) the user is expected to help out. "Altruism" has nothing to do with it--the bugs that users identify will be fixed first because those are the bugs we know about. Dave has been surprisingly gracious and has addressed your concerns to a degree far beyond what your original (flaming) post would merit. Yet the scathing and disparaging nature of your reply indicates a level of disdainful indifference that simply does not belong in this community. Please understand that here we talk about Fedora (not RHEL), and here we behave as a friendly community.
Please do not badger the talent. Dave does a very good job and we all appreciate his help.
On Tue, 2 Aug 2005, Paul A Houle wrote:
A few weeks ago we had a 4-way amd64 web server running RHEL 4 that
crashed sporadically -- nothing left in the syslog. up2date didn't find a new kernel, so I just downloaded and installed the latest kernel from kernel.org and the system has been stable ever since. I'm not sure if I could have gone to RH for support because Cornell has a site license, and even if I had a direct line to RH management, it would take me more time to explain the problem than it would take to try a mainstream kernel.
Overall, I'm quite happy with the four-digit revision mainstream
linux kernels. We had a crash on our main machine that left a stack trace, did some research on the web, found that this had been fixed in 2.6.11.something, upgraded the kernel, case closed.
People are willing to pay $$ to get an "enterprise" product which is
reliable, and supported, but this is another case where the generic product turns out to be more reliable than the branded product, and looking at what's happening with Fedora, I've got a lot of concern that RH's pursuit of innovation will always lead to a kernel long on gee-whiz features and short on reliability. Crashes mean I get calls from the NOC at 4am, and god forbid that my toddler hears the phone ring or me walking down the stairs, because I'll need to entertain him while dealing with the crash and for the rest of the morning. Then a week later I go to netcraft and they say my uptime is seven days and I feel like a jerk because the whole world knows about my problems.
I think there are two reasons for the RHEL 4 instability: (i) the
quarterly release cycle means that I have to wait for bug fixes -- and if you're running a non-x86 architecture, it seems like 2.6 is shaking out bugs at a high rate, and (ii) RH is aggressively pushing new features.
I really don't know what's in RHEL 4 (it would take me more time to
look at the patches than it would to revert to mainstream) but the activation of 4KSTACKS in Fedora is one of those changes that reduces reliability.
I've been looking, and I've never found out what benefit that
4KSTACKS has for end users. The kernel team is sensible, so I'm sure that there are some real benefits, but looking at the problem reports and at the attitudes of some people on this list, I start to wonder if it's just a vindictive attempt to put an end to ndiswrapper. (I'd really love to see an explanation of the benefits of 4KSTACKS.)
The real trouble is that 4KSTACKS problems aren't in kernel modules
per se, but really are in the combination of modules that are running. Yeah, maybe they can get reiserfs running under 4KSTACKS, but what if you're running an NFSv4 server with all the whizzy options turned on, and IPv6 with tunneling and it's a reiserfs filesystem and you're using LVM and RAID and a particularly funky SCSI driver, what then?
By adopting 4KSTACKS early, Fedora has helped shake out problems
with 4KSTACKS, but when 4KSTACKS becomes the main option in the mainstream kernel, we'll see people dealing with weird problems that happen sporadically on certain setups for years to come. We seem to have one of the worst workloads in the world, and the last thing I need is more crashes.
I find it hard to understand why it is so important to remove this option, making the kernel 4K stacks only, as was proposed not so long ago.
I also find it hard to understand why it is such a problem having a larger stack. As you point out, as software evolves it ultimately becomes more complex. If the developer's design needs it and the software is reliable and efficient (aka performs well), then why not?
A quick calculation:
2000 * 4K is about 8M out of, say, 1G at least.
Not a large percentage overhead, I think.
But of course I don't know the reasoning behind the change (as I missed the thread), I just get a little burned by the consequences from time to time, like yourself.
Ian
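Spelling out the back-of-the-envelope figure above (a sketch; the 2000-thread count and the 1 GB of RAM are just the numbers assumed in the mail):

```python
# Cost of going from 4 KB to 8 KB stacks: one extra 4 KB page per thread.
threads = 2000
extra_per_thread = 4 * 1024               # one extra 4 KB page per stack
total_extra = threads * extra_per_thread  # 8,192,000 bytes
print(total_extra / 2**20)                # 7.8125 -> roughly the "8M" above

ram = 2**30                               # 1 GB, the low end assumed above
print(round(100 * total_extra / ram, 2))  # 0.76 -> well under 1% of RAM
```

So as raw overhead the objection holds; the counterargument in the reply below this message is about contiguity, not total size.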
On Fri, Aug 05, 2005 at 09:22:55AM +0800, Ian Kent wrote:
I also find it hard to understand why it is such a problem having a larger stack. As you point out, as software evolves it ultimately becomes more complex. If the developer's design needs it and the software is reliable and efficient (aka performs well), then why not?
A quick calculation:
2000 * 4K is about 8M out of, say, 1G at least.
Not a large percentage overhead, I think.
Now try finding 2000 _contiguous_ pairs of pages after the machine has been up for a while, under load. Memory fragmentation makes this a really nasty problem, and the VM eats its own head after repeatedly scanning every page in the system.
Dave
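A toy model of the point above (not a real allocator, just an illustration): even when half of memory is free, far fewer *aligned pairs* of pages are free, and an aligned pair is what an order-1 allocation for an 8 KB stack needs.

```python
import random

random.seed(42)
PAGES = 1 << 16                  # model 64K single 4 KB page frames
free = [False] * PAGES
for i in random.sample(range(PAGES), PAGES // 2):
    free[i] = True               # free a random half of the pages

free_pages = sum(free)
# Aligned free pairs: what a buddy allocator could hand out as an
# order-1 (two contiguous pages, i.e. 8 KB) block.
free_pairs = sum(1 for i in range(0, PAGES, 2) if free[i] and free[i + 1])

print(free_pages)   # 32768: half of memory is free
print(free_pairs)   # only about a quarter of the pairs are usable (~8K)
```

In this model, half the pages are free but only roughly a quarter of the aligned pairs are, and real long-uptime fragmentation is worse than random because kernel allocations pin pages in place.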
On Thu, 2005-08-04 at 21:43 -0400, Dave Jones wrote:
On Fri, Aug 05, 2005 at 09:22:55AM +0800, Ian Kent wrote:
I also find it hard to understand why it is such a problem having a larger stack. As you point out, as software evolves it ultimately becomes more complex. If the developer's design needs it and the software is reliable and efficient (aka performs well), then why not?
A quick calculation:
2000 * 4K is about 8M out of, say, 1G at least.
Not a large percentage overhead, I think.
Now try finding 2000 _contiguous_ pairs of pages after the machine has been up for a while, under load. Memory fragmentation makes this a really nasty problem, and the VM eats its own head after repeatedly scanning every page in the system.
I thought I heard that there was some work being done in the upstream kernel to have a process "defrag" memory in the background. This would help alleviate this problem on systems with long up-times.
Paul
On Thu, 4 Aug 2005, Paul wrote:
On Thu, 2005-08-04 at 21:43 -0400, Dave Jones wrote:
On Fri, Aug 05, 2005 at 09:22:55AM +0800, Ian Kent wrote:
I also find it hard to understand why it is such a problem having a larger stack. As you point out, as software evolves it ultimately becomes more complex. If the developer's design needs it and the software is reliable and efficient (aka performs well), then why not?
A quick calculation:
2000 * 4K is about 8M out of, say, 1G at least.
Not a large percentage overhead, I think.
Now try finding 2000 _contiguous_ pairs of pages after the machine has been up for a while, under load. Memory fragmentation makes this a really nasty problem, and the VM eats its own head after repeatedly scanning every page in the system.
I thought I heard that there was some work being done in the upstream kernel to have a process "defrag" memory in the background. This would help alleviate this problem on systems with long up-times.
I'm afraid I have to agree with Dave on this. Scanning pagelists really needs to be reduced to a minimum wherever possible.
Ian
On Thu, 2005-08-04 at 22:27 -0500, Paul wrote:
On Thu, 2005-08-04 at 21:43 -0400, Dave Jones wrote:
On Fri, Aug 05, 2005 at 09:22:55AM +0800, Ian Kent wrote:
I also find it hard to understand why it is such a problem having a larger stack. As you point out, as software evolves it ultimately becomes more complex. If the developer's design needs it and the software is reliable and efficient (aka performs well), then why not?
A quick calculation:
2000 * 4K is about 8M out of, say, 1G at least.
Not a large percentage overhead, I think.
Now try finding 2000 _contiguous_ pairs of pages after the machine has been up for a while, under load. Memory fragmentation makes this a really nasty problem, and the VM eats its own head after repeatedly scanning every page in the system.
I thought I heard that there was some work being done in the upstream kernel to have a process "defrag" memory in the background. This would help alleviate this problem on systems with long up-times.
Actually that work is different; it is intended to defrag *userspace* pages, not kernel pages. And the existing VM can already reclaim those by freeing them; the defrag work is there to move pages around instead of actually freeing them. The problem really is more complex than that, and the kernel VM got a lot of robustness back by having 4 KB stacks.
(Now on x86-64 and other 64-bit machines this is FAR less of a problem; actually it's almost exclusively an x86 problem. x86 has a 1 GB lowmem zone where all kernel stacks and other kernel data structures have to go, and the rest of memory goes into a highmem zone. This split is like quadrupling the VM pain; without this split, multi-page stacks are still not pretty, but an order of magnitude less of a problem.)