I've got a system that freezes up solid when I do certain network operations. I can't open a new session and I can't unfreeze it; basically all I can do is power down and reboot.
How do I figure out what is causing the problem? I've checked the system logs, but they are clean.
What else could I look at, and how?
On Tue, 2005-01-25 at 19:09 -0700, Kim Lux wrote:
I've got a system that freezes up solid when I do certain network operations. I can't open a new session and I can't unfreeze it; basically all I can do is power down and reboot.
How do I figure out what is causing the problem? I've checked the system logs, but they are clean.
What else could I look at, and how?
Serial console?
Phil
On Tue, 2005-01-25 at 19:09 -0700, Kim Lux wrote:
How do I figure out what is causing the problem? I've checked the system logs, but they are clean.
With lots of crashes lately but never an oops or panic message to report, I was about to ask the same question. Just to be safe I left memtest86 running today, and found bad RAM :(
I'm running with mem=236M for now to block out the bad parts, but has there been an RFE for the badram kernel patch? (I'm not seeing any in Bugzilla, not even one closed as GOAWAY or BADIDEA or whatever.) We've already got a version of memtest86 that can spit out the badram values... Assuming the labor of maintaining it in the patchset isn't too high, I think it's better to recognize that people are going to use imperfect hardware and give them a way to deal with it than to decide that everyone needs new hardware. (start flamewar now)
http://rick.vanrein.org/linux/badram/
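For anyone wondering how a value like mem=236M might be arrived at, here is a minimal Python sketch. It assumes memtest86 has reported a list of failing physical addresses and that everything below the lowest one is good; the addresses and function names are hypothetical and not taken from the test run described above. Because mem= simply caps how much memory the kernel uses, it also throws away any good RAM above the first bad spot, which is the waste the badram patch is meant to avoid.

```python
MIB = 1024 * 1024

def mem_limit_mib(bad_addresses):
    """Return how many whole MiB lie entirely below the lowest bad address."""
    if not bad_addresses:
        raise ValueError("no bad addresses given")
    return min(bad_addresses) // MIB

def mem_boot_option(bad_addresses):
    """Format the limit as a mem= kernel boot parameter."""
    return "mem=%dM" % mem_limit_mib(bad_addresses)

if __name__ == "__main__":
    # Hypothetical failing addresses, for illustration only.
    example = [0x0EC12345, 0x0ED00010]
    print(mem_boot_option(example))
```

The result goes on the kernel line of the boot loader configuration. Treat it as a starting point and re-run the memory test afterwards, since memory holes can make the real mapping less tidy than this arithmetic suggests.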
If that turns out not to be the (only) problem, what *is* the best way to get debug info from bad crashes, where even alt-sysrq-jitsu does no good? I know about the serial console capability; lately I've also seen stuff about diskdump and netdump... which of these is most likely to survive serious kernel problems long enough to get a useful report that can be bugzilla'ed?
On Wed, 2005-01-26 at 18:48 -0700, Wes Shull wrote:
I'm running with mem=236M for now to block out the bad parts, but has there been an RFE for the badram kernel patch? (not seeing any on
I've never gotten hold of the badram patch code. Would it be so hard to provide it as a separate package from, say, Dag or AT?
I found the problem the other day: I was running ndiswrapper with a kernel using a 4K stack.
It would operate fine most of the time, but if the network load got just right it would silently crash the kernel. What drove it over the edge, i.e. what made it start crashing regularly, was when I started doing some light NAT work with it. Then I could crash it almost at will.
I just happened to find the cause of the crashing when I installed a new kernel and was building ndiswrapper. I happened to notice that there was a warning in the build messages about some drivers not working with a kernel using a 4K stack.
I built a custom kernel with an 8K stack and I've been running crash free for 3 days, NATing and all.
I think if I ran into this sort of thing again I would build a custom kernel with all of the debugging features turned on and maybe run with a serial console to capture the debugger output stream.
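A serial console on the crashing machine is normally enabled with kernel parameters such as `console=ttyS0,115200 console=tty0`, and a second machine on the other end of a null-modem cable has to record whatever arrives. Here is a minimal Python sketch of that recording side, assuming the serial port on the logging machine has already been set to the matching speed (for example with stty) and appears as a device file. The device path /dev/ttyS0 and the log file name are assumptions.

```python
import os
import sys
import time

def capture(stream, log, clock=time.time):
    """Copy lines from a binary stream to log, timestamped and synced per line."""
    count = 0
    for raw in iter(stream.readline, b""):
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        log.write("%.3f %s\n" % (clock(), text))
        log.flush()
        try:
            # Push each line to disk so a hurried Ctrl-C loses nothing.
            os.fsync(log.fileno())
        except (AttributeError, OSError, ValueError):
            pass  # in-memory logs have no file descriptor
        count += 1
    return count

if __name__ == "__main__":
    # Assumed device path; pass another one as the first argument.
    device = sys.argv[1] if len(sys.argv) > 1 else "/dev/ttyS0"
    with open(device, "rb", buffering=0) as port, open("console.log", "a") as log:
        capture(port, log)
```

It runs until the stream ends or it is interrupted. The timestamps make it easier to line up the last kernel messages with whatever network load was running when the box froze.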
On Fri, 2005-01-28 at 13:41 -0300, an anonymous guy wrote:
I found the problem the other day: I was running ndiswrapper with a kernel using a 4K stack.
You recompiled it, or just grabbed one from somewhere?
I rebuilt it. I did so with 2.6.10-1.753. I used all the stock settings except for the kernel stack size. I wrote up the procedure in case I ever had to do it again. Let me know if you need it.
ndiswrapper also complains about spinlock debug being enabled. Did you change that?
I did not see that warning in the build messages, so no, I left it alone. The NVidia driver also complains about kernel settings, specifically the video frame buffer.
Here was the warning that appeared when building ndiswrapper. (I just happened to save it.)
WARNING: Kernel is compiled with 4K stack size option (CONFIG_4KSTACKS); many Windows drivers will not work with this option enabled. Disable CONFIG_4KSTACKS option, recompile and install kernel
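To check for this before building anything, one could look for CONFIG_4KSTACKS in the running kernel's configuration. Here is a minimal Python sketch, assuming the distribution installs the config as /boot/config-<kernel release>. That path is a common convention but an assumption here, and the function name is made up for illustration.

```python
import os

def option_enabled(config_text, option):
    """Return True if option is set to y or m in kernel .config text."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skips blanks and "# CONFIG_FOO is not set" lines
        name, value = line.split("=", 1)
        if name == option:
            return value in ("y", "m")
    return False

if __name__ == "__main__":
    # Assumed location of the running kernel's config file.
    path = "/boot/config-" + os.uname().release
    with open(path) as f:
        if option_enabled(f.read(), "CONFIG_4KSTACKS"):
            print("CONFIG_4KSTACKS is enabled; ndiswrapper warns against this")
        else:
            print("CONFIG_4KSTACKS is not set")
```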
-- Kim Lux, Diesel Research Inc.