Installing the 7.4 i386 RPMs from sapdb/maxdb produced the following dmesg output when doing an rpm -ihv of sapdb-testdb74-7.4.3.30-1.i386.rpm:
general protection fault: 0000
CPU 1
Pid: 16297, comm: kernel Not tainted
RIP: 0010:[<ffffffff801c85b4>]{ia32_copy_siginfo_to_user+244}
RSP: 0000:00000100e55cde40 EFLAGS: 00010212
RAX: 0000000000000000 RBX: 0000000059374710 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000100e55cdf38 RDI: 00000000593747a0
RBP: 0000000000000000 R08: 00000100e61b9f60 R09: 0000000000000000
R10: ffffffff8013db71 R11: 0000000000010283 R12: 00000100e55cdf58
R13: 00000100f7cd86e8 R14: 0000000000000020 R15: 00000100e55cca58
FS: 0000002a9555c6c0(0000) GS:ffffffff80574dc0(005b) knlGS:0000000059374bb0
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000ffffcef8 CR3: 00000000066a8000 CR4: 00000000000006e0
Process kernel (pid: 16297, stackpage=100e55cd000)
Stack: 00000100e55cde40 0000000000000000 5557ed68801c918e 0000000400000000
       59374bb000000000 0000000100000000 0000008000000000 00003fa900000000
       ffffffff00000000 802c4c44ffffffff 00000010ffffffff 0001028300000000
Call Trace: [<ffffffff801101ce>]{do_signal+158}
       [<ffffffff802c4c44>]{bad_get_user+0}
       [<ffffffff8013db71>]{compat_sys_sched_setaffinity+17}
       [<ffffffff8011061b>]{intret_signal+45}
Code: c3 66 66 66 90 66 66 66 90 66 66 90 48 83 ec 18 48 81 e2 ff
RIP [<ffffffff801c85b4>]{ia32_copy_siginfo_to_user+244} RSP <00000100e55cde40>

CPU 1
Pid: 16297, comm: kernel Not tainted
RIP: 0010:[<ffffffff801c85b4>]{ia32_copy_siginfo_to_user+244}
RSP: 0000:00000100e55cde40 EFLAGS: 00010212
RAX: 0000000000000000 RBX: 0000000059374710 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000100e55cdf38 RDI: 00000000593747a0
RBP: 0000000000000000 R08: 00000100e61b9f60 R09: 0000000000000000
R10: ffffffff8013db71 R11: 0000000000010283 R12: 00000100e55cdf58
R13: 00000100f7cd86e8 R14: 0000000000000020 R15: 00000100e55cca58
FS: 0000002a9555c6c0(0000) GS:ffffffff80574dc0(005b) knlGS:0000000059374bb0
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000ffffcef8 CR3: 00000000066a8000 CR4: 00000000000006e0
Process kernel (pid: 16297, stackpage=100e55cd000)
Stack: 00000100e55cde40 0000000000000000 5557ed68801c918e 0000000400000000
       59374bb000000000 0000000100000000 0000008000000000 00003fa900000000
       ffffffff00000000 802c4c44ffffffff 00000010ffffffff 0001028300000000
Call Trace: [<ffffffff801101ce>]{do_signal+158}
       [<ffffffff802c4c44>]{bad_get_user+0}
       [<ffffffff8013db71>]{compat_sys_sched_setaffinity+17}
       [<ffffffff8011061b>]{intret_signal+45}
Code: c3 66 66 66 90 66 66 66 90 66 66 90 48 83 ec 18 48 81 e2 ff
This is on a dual-processor Opteron rig with 4GB of RAM.
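When several of these traces need comparing, it can help to pull out just the faulting symbol. A minimal sketch, assuming the oops layout quoted above (an "RIP: <segment>:[<address>]{symbol+offset}" field); the function name oops_symbols is made up for illustration and other kernel versions format this field differently:

```shell
# Print the symbol+offset from each line on stdin that carries an
# "RIP: seg:[<addr>]{symbol+offset}" field.
# Assumes the 2.4 x86_64 oops layout quoted in this thread.
oops_symbols() {
    sed -n 's/.*RIP: [0-9a-f]*:\[<[0-9a-f]*>]{\([^}]*\)}.*/\1/p'
}
```

Something like "dmesg | oops_symbols | sort | uniq -c" would then give a rough count per faulting symbol.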
On Mon, 2004-02-02 at 17:21, Ken Snider wrote:
> Installing the 7.4 i386 RPMs from sapdb/maxdb produced the following dmesg output when doing an rpm -ihv of sapdb-testdb74-7.4.3.30-1.i386.rpm:
Did you set the variable LD_ASSUME_KERNEL=2.2.5 to disable NPTL threading before starting the x_server process?
For the sapdb 7.4 RPMs you need to patch the /etc/init.d/sapdb74 script: the third line under the start) header should read:
su - -c "export LD_ASSUME_KERNEL=2.2.5; $X_SERVER start" sapdb > /dev/null 2>&1
Besides that it could be an opteron-specific bug you're hitting...
Klaasjan
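One way to confirm whether a running process really was started with the variable is to look at its environment under /proc. A minimal sketch, assuming a Linux /proc filesystem; the helper name proc_has_env is hypothetical, and the PID of the x_server process has to be found separately:

```shell
# Succeed if process $1 was started with variable $2 in its environment.
# /proc/<pid>/environ holds NUL-separated NAME=value pairs (Linux only)
# and is normally readable only by the process owner or root.
proc_has_env() {
    tr '\0' '\n' < "/proc/$1/environ" | grep -q "^$2="
}
```

For example, "proc_has_env <pid> LD_ASSUME_KERNEL && echo set" run as root or as the sapdb user. Note that the file reflects the environment at startup, not later changes made by the process itself.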
Klaasjan Brand wrote:
> Did you set the variable LD_ASSUME_KERNEL=2.2.5 to disable NPTL threading before starting the x_server process?
Not setting it might crash sapdb, but it shouldn't oops the kernel, should it?
> For the sapdb 7.4 RPMs you need to patch the /etc/init.d/sapdb74 script: the third line under the start) header should read:
> su - -c "export LD_ASSUME_KERNEL=2.2.5; $X_SERVER start" sapdb > /dev/null 2>&1
Actually, if I DO use that variable, *nothing* works.
example:
[sapdb@dw1.dw testdb74]$ export LD_ASSUME_KERNEL=2.2.5 ; ./create_demo_db.sh
/bin/sh: error while loading shared libraries: libdl.so.2: cannot open shared object file: No such file or directory
Interestingly, this problem does *NOT* occur if I don't use that variable (though, of course, then the database won't start).
--Ken.
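A possible explanation, offered as an assumption rather than a diagnosis: once the variable is exported, the dynamic loader applies it to every program started from that shell, and if the native 64-bit libraries have no build compatible with a 2.2.5 kernel ABI, even /bin/sh cannot load. A minimal sketch of scoping the variable to a single command instead of the whole session; the helper name run_with_assume_kernel is made up for illustration:

```shell
# Run one command with LD_ASSUME_KERNEL set, leaving the calling shell untouched.
# $1 is the kernel ABI version to assume; the remaining arguments are the command.
run_with_assume_kernel() {
    (
        # The subshell confines the export to this one invocation.
        LD_ASSUME_KERNEL=$1
        export LD_ASSUME_KERNEL
        shift
        "$@"
    )
}
```

For example, 'run_with_assume_kernel 2.2.5 "$X_SERVER" start', with X_SERVER as in the init script. This presumably only helps when the command is itself a binary for which compatible libraries exist; a script interpreted by /bin/sh would likely fail the same way as above.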
On Mon, Feb 02, 2004 at 11:21:21AM -0500, Ken Snider wrote:
> Installing the 7.4 i386 RPMs from sapdb/maxdb produced the following dmesg output when doing an rpm -ihv of sapdb-testdb74-7.4.3.30-1.i386.rpm:
Which kernel version is this? If 2135, does the problem still occur with 2163?
Thanks, Justin
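For anyone answering the same question on their own machine, the build number is embedded in the kernel release string. A small sketch, assuming the "<version>-<n>.<build>.<flavour>" naming seen in this thread's dmesg (2.4.22-1.2163.nptlsmp); the function name kernel_build is hypothetical, and release strings in other formats simply produce no output:

```shell
# Extract the build number from a release string of the form
# "<version>-<n>.<build>.<flavour>", e.g. "2.4.22-1.2163.nptlsmp".
kernel_build() {
    printf '%s\n' "$1" | sed -n 's/^[^-]*-[0-9]*\.\([0-9]*\)\..*$/\1/p'
}

# Report the build number of the running kernel (prints nothing if the
# release string does not follow the assumed pattern).
kernel_build "$(uname -r)"
```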
Justin M. Forbes wrote:
> Which kernel version is this? If 2135, does the problem still occur with 2163?
This is with 2163.
It should be noted that the oopses now recur periodically while the sapdb xserver is running.
Here are the two most recent.
<snip>
<0>general protection fault: 0000
CPU 0
Pid: 19398, comm: slowknl Not tainted
RIP: 0010:[<ffffffff801c85b4>]{ia32_copy_siginfo_to_user+244}
RSP: 0000:00000100e36e7e40 EFLAGS: 00010212
RAX: 0000000000000000 RBX: 0000000059374710 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000100e36e7f38 RDI: 00000000593747a0
RBP: 0000000000000000 R08: 00000100e335ff60 R09: 0000000000000000
R10: ffffffff8013db71 R11: 0000000000010283 R12: 00000100e36e7f58
R13: 00000100f7b28ee8 R14: 0000000000000020 R15: 00000100e36e6a58
FS: 0000002a9555c6a0(0000) GS:ffffffff80574d40(005b) knlGS:0000000059374bb0
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000008da694c CR3: 0000000000101000 CR4: 00000000000006e0
Process slowknl (pid: 19398, stackpage=100e36e7000)
Stack: 00000100e36e7e40 0000000000000000 5557ed68801c918e 0000000400000000
       59374bb000000000 0000000100000000 0000008000000000 00004bc600000000
       ffffffff00000000 802c4c44ffffffff 00000010ffffffff 0001028300000000
Call Trace: [<ffffffff801101ce>]{do_signal+158}
       [<ffffffff802c4c44>]{bad_get_user+0}
       [<ffffffff8013db71>]{compat_sys_sched_setaffinity+17}
       [<ffffffff8011061b>]{intret_signal+45}
Code: c3 66 66 66 90 66 66 66 90 66 66 90 48 83 ec 18 48 81 e2 ff
RIP [<ffffffff801c85b4>]{ia32_copy_siginfo_to_user+244} RSP <00000100e36e7e40>

CPU 0
Pid: 19398, comm: slowknl Not tainted
RIP: 0010:[<ffffffff801c85b4>]{ia32_copy_siginfo_to_user+244}
RSP: 0000:00000100e36e7e40 EFLAGS: 00010212
RAX: 0000000000000000 RBX: 0000000059374710 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000100e36e7f38 RDI: 00000000593747a0
RBP: 0000000000000000 R08: 00000100e335ff60 R09: 0000000000000000
R10: ffffffff8013db71 R11: 0000000000010283 R12: 00000100e36e7f58
R13: 00000100f7b28ee8 R14: 0000000000000020 R15: 00000100e36e6a58
FS: 0000002a9555c6a0(0000) GS:ffffffff80574d40(005b) knlGS:0000000059374bb0
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000008da694c CR3: 0000000000101000 CR4: 00000000000006e0
Process slowknl (pid: 19398, stackpage=100e36e7000)
Stack: 00000100e36e7e40 0000000000000000 5557ed68801c918e 0000000400000000
       59374bb000000000 0000000100000000 0000008000000000 00004bc600000000
       ffffffff00000000 802c4c44ffffffff 00000010ffffffff 0001028300000000
Call Trace: [<ffffffff801101ce>]{do_signal+158}
       [<ffffffff802c4c44>]{bad_get_user+0}
       [<ffffffff8013db71>]{compat_sys_sched_setaffinity+17}
       [<ffffffff8011061b>]{intret_signal+45}
Code: c3 66 66 66 90 66 66 66 90 66 66 90 48 83 ec 18 48 81 e2 ff
</snip>
These go away if sapdb is stopped.
dmesg attached.
<snip>
ok Bootdata ok (command line is ro root=LABEL=/ hdb=ide-scsi) Linux version 2.4.22-1.2163.nptlsmp (bhcompile@thor.perf.redhat.com) (gcc version 3.2.3 20030422 (Red Hat Linux 3.2.3-6)) #1 SMP Fri Jan 16 12:58:20 EST 2004 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009fc00 (usable) BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved) BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 00000000faff0000 (usable) BIOS-e820: 00000000faff0000 - 00000000fafff000 (ACPI data) BIOS-e820: 00000000fafff000 - 00000000fb000000 (ACPI NVS) BIOS-e820: 00000000ff7c0000 - 0000000100000000 (reserved) kernel direct mapping tables upto 10100000000 @ 8000-d000 ACPI: have wakeup address 0x10000002000 Scan SMP from 0000010000000000 for 1024 bytes. Scan SMP from 000001000009fc00 for 1024 bytes. Scan SMP from 00000100000f0000 for 65536 bytes. found SMP MP-table at 00000000000ff780 hm, page 000ff000 reserved twice. hm, page 00100000 reserved twice. hm, page 000f9000 reserved twice. hm, page 000fa000 reserved twice. On node 0 totalpages: 1028080 zone(0): 4096 pages. zone(1): 1023984 pages. zone(2): 0 pages. ACPI: RSDP (v002 ACPIAM ) @ 0x00000000000f4710 ACPI: XSDT (v001 A M I OEMXSDT 0x08000318 MSFT 0x00000097) @ 0x00000000faff0100 ACPI: FADT (v001 A M I OEMFACP 0x08000318 MSFT 0x00000097) @ 0x00000000faff0281 ACPI: MADT (v001 A M I OEMAPIC 0x08000318 MSFT 0x00000097) @ 0x00000000faff0380 ACPI: OEMB (v001 A M I OEMBIOS 0x08000318 MSFT 0x00000097) @ 0x00000000fafff040 ACPI: ASF! 
(v001 AMIASF AMDSTRET 0x00000001 INTL 0x02002026) @ 0x00000000faff3580 ACPI: DSDT (v001 0ABCF 0ABCF007 0x00000007 INTL 0x02002026) @ 0x0000000000000000 ACPI: Parsing Local APIC info in MADT ACPI: Local APIC address 0xfee00000 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) Processor #0 15:5 APIC version 16 ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled) Processor #1 15:5 APIC version 16 ACPI: IOAPIC (id[0x02] address[0xfec00000] global_irq_base[0x0]) IOAPIC[0]: Assigned apic_id 2 IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, IRQ 0-23 ACPI: IOAPIC (id[0x03] address[0xfebfe000] global_irq_base[0x18]) IOAPIC[1]: Assigned apic_id 3 IOAPIC[1]: apic_id 3, version 17, address 0xfebfe000, IRQ 24-27 ACPI: IOAPIC (id[0x04] address[0xfebff000] global_irq_base[0x1c]) IOAPIC[2]: Assigned apic_id 4 IOAPIC[2]: apic_id 4, version 17, address 0xfebff000, IRQ 28-31 ACPI: INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x0] trigger[0x0]) ACPI: INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x0] trigger[0x0]) Using ACPI (MADT) for SMP configuration information Checking aperture... CPU 0: aperture @ 3fe0000000 size 32 MB Aperture from northbridge cpu 0 too small (32 MB) No AGP bridge found Kernel command line: ro root=LABEL=/ hdb=ide-scsi ide_setup: hdb=ide-scsi Initializing CPU#0 time.c: Detected 1.193182 MHz PIT timer. time.c: Detected 1591.558 MHz TSC timer. Console: colour VGA+ 80x25 Calibrating delay loop... 
3171.94 BogoMIPS Memory: 4017576k/4112320k available (1858k kernel code, 94356k reserved, 1603k data, 160k init) Dentry cache hash table entries: 131072 (order: 9, 2097152 bytes) Inode cache hash table entries: 131072 (order: 9, 2097152 bytes) Mount cache hash table entries: 256 (order: 0, 4096 bytes) Buffer cache hash table entries: 262144 (order: 9, 2097152 bytes) Page-cache hash table entries: 262144 (order: 9, 2097152 bytes) CPU: L1 I Cache: 64K (64 bytes/line/2 way), D cache 64K (64 bytes/line/2 way) CPU: L2 Cache: 1024K (64 bytes/line/1 way) Machine Check Reporting enabled for CPU#0 POSIX conformance testing by UNIFIX mtrr: v2.02 (20020716)) CPU: L1 I Cache: 64K (64 bytes/line/2 way), D cache 64K (64 bytes/line/2 way) CPU: L2 Cache: 1024K (64 bytes/line/1 way) CPU0: AMD Opteron(tm) Processor 242 stepping 01 per-CPU timeslice cutoff: 5120.94 usecs. task migration cache decay timeout: 10 msecs. Booting processor 1/1 rip 6000 page 00000100066ac000 Initializing CPU#1 Calibrating delay loop... 3178.49 BogoMIPS CPU: L1 I Cache: 64K (64 bytes/line/2 way), D cache 64K (64 bytes/line/2 way) CPU: L2 Cache: 1024K (64 bytes/line/1 way) Machine Check Reporting enabled for CPU#1 CPU1: AMD Opteron(tm) Processor 242 stepping 01 Total of 2 processors activated (6350.43 BogoMIPS). ENABLING IO-APIC IRQs init IO_APIC IRQs IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23, 3-0, 3-1, 3-2, 3-3, 4-0, 4-1, 4-2, 4-3 not connected. ..TIMER: vector=0x31 pin1=2 pin2=0 number of MP IRQ sources: 16. number of IO-APIC #2 registers: 24. number of IO-APIC #3 registers: 4. number of IO-APIC #4 registers: 4. testing the IO APIC.......................
IO APIC #2...... .... register #00: 02000000 ....... : physical APIC id: 02 .... register #01: 00170011 ....... : max redirection entries: 0017 ....... : PRQ implemented: 0 ....... : IO APIC version: 0011 .... register #02: 02000000 ....... : arbitration: 02 .... IRQ redirection table: NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect: 00 000 00 1 0 0 0 0 0 0 00 01 001 01 0 0 0 0 0 1 1 39 02 001 01 0 0 0 0 0 1 1 31 03 001 01 0 0 0 0 0 1 1 41 04 001 01 0 0 0 0 0 1 1 49 05 001 01 0 0 0 0 0 1 1 51 06 001 01 0 0 0 0 0 1 1 59 07 001 01 0 0 0 0 0 1 1 61 08 001 01 0 0 0 0 0 1 1 69 09 001 01 0 0 0 0 0 1 1 71 0a 001 01 0 0 0 0 0 1 1 79 0b 001 01 0 0 0 0 0 1 1 81 0c 001 01 0 0 0 0 0 1 1 89 0d 001 01 0 0 0 0 0 1 1 91 0e 001 01 0 0 0 0 0 1 1 99 0f 001 01 0 0 0 0 0 1 1 A1 10 000 00 1 0 0 0 0 0 0 00 11 000 00 1 0 0 0 0 0 0 00 12 000 00 1 0 0 0 0 0 0 00 13 000 00 1 0 0 0 0 0 0 00 14 000 00 1 0 0 0 0 0 0 00 15 000 00 1 0 0 0 0 0 0 00 16 000 00 1 0 0 0 0 0 0 00 17 000 00 1 0 0 0 0 0 0 00
IO APIC #3...... .... register #00: 03000000 ....... : physical APIC id: 03 .... register #01: 00030011 ....... : max redirection entries: 0003 ....... : PRQ implemented: 0 ....... : IO APIC version: 0011 .... register #02: 00000000 ....... : arbitration: 00 .... IRQ redirection table: NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect: 00 000 00 1 0 0 0 0 0 0 00 01 000 00 1 0 0 0 0 0 0 00 02 000 00 1 0 0 0 0 0 0 00 03 000 00 1 0 0 0 0 0 0 00
IO APIC #4...... .... register #00: 04000000 ....... : physical APIC id: 04 .... register #01: 00030011 ....... : max redirection entries: 0003 ....... : PRQ implemented: 0 ....... : IO APIC version: 0011 .... register #02: 00000000 ....... : arbitration: 00 .... IRQ redirection table: NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect: 00 000 00 1 0 0 0 0 0 0 00 01 000 00 1 0 0 0 0 0 0 00 02 000 00 1 0 0 0 0 0 0 00 03 000 00 1 0 0 0 0 0 0 00 IRQ to pin mappings: IRQ0 -> 0:2 IRQ1 -> 0:1 IRQ3 -> 0:3 IRQ4 -> 0:4 IRQ5 -> 0:5 IRQ6 -> 0:6 IRQ7 -> 0:7 IRQ8 -> 0:8 IRQ9 -> 0:9 IRQ10 -> 0:10 IRQ11 -> 0:11 IRQ12 -> 0:12 IRQ13 -> 0:13 IRQ14 -> 0:14 IRQ15 -> 0:15 .................................... done. Using local APIC timer interrupts. Detected 12.434 MHz APIC timer. cpu: 0, clocks: 1989447, slice: 663149 CPU0T0:1989440,T1:1326288,D:3,S:663149,C:1989447 cpu: 1, clocks: 1989447, slice: 663149 CPU1T0:1989440,T1:663136,D:6,S:663149,C:1989447 checking TSC synchronization across CPUs: passed. testing NMI watchdog ... OK. time.c: Using PIT/TSC based timekeeping. Starting migration thread for cpu 0 smp_num_cpus: 2. 
Starting migration thread for cpu 1 ACPI: Subsystem revision 20031002 PCI: Using configuration type 1 ACPI: Interpreter enabled ACPI: Using IOAPIC for interrupt routing ACPI: System [ACPI] (supports S0 S1 S4 S5) ACPI: PCI Root Bridge [PCI0] (00:00) PCI: Probing PCI hardware (bus 00) ACPI: PCI Interrupt Routing Table [_SB_.PCI0._PRT] ACPI: PCI Interrupt Routing Table [_SB_.PCI0.PCI1._PRT] ACPI: PCI Interrupt Routing Table [_SB_.PCI0.GOLA._PRT] ACPI: PCI Interrupt Routing Table [_SB_.PCI0.GOLB._PRT] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 *5 6 7 9 10 11 12 14 15) ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 *9 10 11 12 14 15) ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 10 *11 12 14 15) ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 *10 11 12 14 15) PCI: Using configuration type 1 PCI: Probing PCI hardware IOAPIC[0]: Set PCI routing entry (2-16 -> 0xa9 -> IRQ 16) Mode:1 Active:1 00:00:07[A] -> 2-16 -> vector 0xa9 -> IRQ 16 IOAPIC[0]: Set PCI routing entry (2-17 -> 0xb1 -> IRQ 17) Mode:1 Active:1 00:00:07[B] -> 2-17 -> vector 0xb1 -> IRQ 17 IOAPIC[0]: Set PCI routing entry (2-18 -> 0xb9 -> IRQ 18) Mode:1 Active:1 00:00:07[C] -> 2-18 -> vector 0xb9 -> IRQ 18 IOAPIC[0]: Set PCI routing entry (2-19 -> 0xc1 -> IRQ 19) Mode:1 Active:1 00:00:07[D] -> 2-19 -> vector 0xc1 -> IRQ 19 Pin 2-19 already programmed Pin 2-18 already programmed Pin 2-16 already programmed Pin 2-17 already programmed Pin 2-18 already programmed Pin 2-19 already programmed Pin 2-17 already programmed IOAPIC[1]: Set PCI routing entry (3-3 -> 0xc9 -> IRQ 27) Mode:1 Active:1 00:02:08[A] -> 3-3 -> vector 0xc9 -> IRQ 27 IOAPIC[1]: Set PCI routing entry (3-0 -> 0xd1 -> IRQ 24) Mode:1 Active:1 00:02:08[B] -> 3-0 -> vector 0xd1 -> IRQ 24 IOAPIC[1]: Set PCI routing entry (3-1 -> 0xd9 -> IRQ 25) Mode:1 Active:1 00:02:08[C] -> 3-1 -> vector 0xd9 -> IRQ 25 IOAPIC[1]: Set PCI routing entry (3-2 -> 0xe1 -> IRQ 26) Mode:1 Active:1 00:02:08[D] -> 3-2 -> vector 0xe1 -> IRQ 26 Pin 3-2 already programmed Pin 
3-3 already programmed Pin 3-0 already programmed Pin 3-1 already programmed Pin 3-0 already programmed Pin 3-1 already programmed Pin 3-0 already programmed Pin 3-1 already programmed IOAPIC[2]: Set PCI routing entry (4-0 -> 0xe9 -> IRQ 28) Mode:1 Active:1 00:01:03[A] -> 4-0 -> vector 0xe9 -> IRQ 28 IOAPIC[2]: Set PCI routing entry (4-1 -> 0x32 -> IRQ 29) Mode:1 Active:1 00:01:03[B] -> 4-1 -> vector 0x32 -> IRQ 29 IOAPIC[2]: Set PCI routing entry (4-2 -> 0x3a -> IRQ 30) Mode:1 Active:1 00:01:03[C] -> 4-2 -> vector 0x3a -> IRQ 30 IOAPIC[2]: Set PCI routing entry (4-3 -> 0x42 -> IRQ 31) Mode:1 Active:1 00:01:03[D] -> 4-3 -> vector 0x42 -> IRQ 31 Pin 4-1 already programmed Pin 4-2 already programmed Pin 4-3 already programmed Pin 4-0 already programmed Pin 4-1 already programmed Pin 4-2 already programmed Pin 4-3 already programmed Pin 4-0 already programmed Pin 4-2 already programmed Pin 4-3 already programmed Pin 4-0 already programmed Pin 4-1 already programmed Linux agpgart interface v0.99 (c) Jeff Hartmann agpgart: Maximum main memory to use for agp memory: 3852M agpgart: no supported devices found. PCI-DMA: Disabling IOMMU. Linux NET4.0 for Linux 2.4 Based upon Swansea University Computer Society NET3.039 Initializing RT netlink socket Starting kswapd VFS: Disk quotas vdquot_6.5.1 Journalled Block Device driver loaded IA32 emulation $Id: sys_ia32.c,v 1.62 2003/09/22 04:25:53 ak Exp $ initialize_kbd: Keyboard reset failed, no ACK Detected PS/2 Mouse Port. 
pty: 2048 Unix98 ptys configured Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI enabled ttyS0 at 0x03f8 (irq = 4) is a 16550A Real Time Clock Driver v1.10e NET4: Frame Diverter 0.46 RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx AMD8111: IDE controller at PCI slot 00:07.1 AMD8111: chipset revision 3 AMD8111: not 100% native mode: will probe irqs later AMD_IDE: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03) UDMA100 controller on pci00:07.1 ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio, hdb:DMA ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:pio, hdd:pio hdb: LITE-ON LTR-52327S, ATAPI CD/DVD-ROM drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27 md: Autodetecting RAID arrays. md: autorun ... md: ... autorun DONE. Initializing Cryptographic API NET4: Linux TCP/IP 1.0 for NET4.0 IP Protocols: ICMP, UDP, TCP, IGMP IP: routing cache hash table of 16384 buckets, 256Kbytes TCP: Hash tables configured (established 131072 bind 65536) Linux IP multicast router 0.06 plus PIM-SM NET4: Unix domain sockets 1.0/SMP for Linux NET4.0. RAMDISK: Compressed image found at block 0 VFS: Mounted root (ext2 filesystem). SCSI subsystem driver Revision: 1.00 3ware Storage Controller device driver for Linux v1.02.00.036. 
scsi0 : Found a 3ware Storage Controller at 0xac00, IRQ: 26, P-chip: 1.3 scsi0 : 3ware Storage Controller blk: queue 0000010037f10030, no I/O memory limit Vendor: 3ware Model: Logical Disk 0 Rev: 1.0 Type: Direct-Access ANSI SCSI revision: 00 blk: queue 00000100f9fade30, no I/O memory limit Vendor: 3ware Model: Logical Disk 6 Rev: 1.0 Type: Direct-Access ANSI SCSI revision: 00 blk: queue 00000100f9fadc30, no I/O memory limit Attached scsi disk sda at scsi0, channel 0, id 0, lun 0 Attached scsi disk sdb at scsi0, channel 0, id 6, lun 0 SCSI device sda: 1953599360 512-byte hdwr sectors (1000243 MB) Partition check: sda: sda1 SCSI device sdb: 234439600 512-byte hdwr sectors (120033 MB) sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 > libata version 0.81 loaded. sata_promise version 0.87 ata1: SATA max UDMA/133 cmd 0xFFFFFF000001A200 ctl 0xFFFFFF000001A238 bmdma 0x0 irq 17 ata2: SATA max UDMA/133 cmd 0xFFFFFF000001A280 ctl 0xFFFFFF000001A2B8 bmdma 0x0 irq 17 ata1: dev 0 cfg 49:2f00 82:346b 83:7f21 84:4003 85:3469 86:3c01 87:4003 88:207f ata1: dev 0 ATA, max UDMA/133, 72303840 sectors (lba48) ata1: dev 0 configured for UDMA/133 ata2: dev 0 cfg 49:2f00 82:346b 83:7f21 84:4003 85:3469 86:3c01 87:4003 88:207f ata2: dev 0 ATA, max UDMA/133, 72303840 sectors (lba48) ata2: dev 0 configured for UDMA/133 scsi1 : sata_promise scsi2 : sata_promise Vendor: ATA Model: WDC WD360GD-00FN Rev: 0.81 Type: Direct-Access ANSI SCSI revision: 05 Vendor: ATA Model: WDC WD360GD-00FN Rev: 0.81 Type: Direct-Access ANSI SCSI revision: 05 sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 > libata version 0.81 loaded. 
sata_promise version 0.87 ata1: SATA max UDMA/133 cmd 0xFFFFFF000001A200 ctl 0xFFFFFF000001A238 bmdma 0x0 irq 17 ata2: SATA max UDMA/133 cmd 0xFFFFFF000001A280 ctl 0xFFFFFF000001A2B8 bmdma 0x0 irq 17 ata1: dev 0 cfg 49:2f00 82:346b 83:7f21 84:4003 85:3469 86:3c01 87:4003 88:207f ata1: dev 0 ATA, max UDMA/133, 72303840 sectors (lba48) ata1: dev 0 configured for UDMA/133 ata2: dev 0 cfg 49:2f00 82:346b 83:7f21 84:4003 85:3469 86:3c01 87:4003 88:207f ata2: dev 0 ATA, max UDMA/133, 72303840 sectors (lba48) ata2: dev 0 configured for UDMA/133 scsi1 : sata_promise scsi2 : sata_promise Vendor: ATA Model: WDC WD360GD-00FN Rev: 0.81 Type: Direct-Access ANSI SCSI revision: 05 Vendor: ATA Model: WDC WD360GD-00FN Rev: 0.81 Type: Direct-Access ANSI SCSI revision: 05 Attached scsi disk sdc at scsi1, channel 0, id 0, lun 0 Attached scsi disk sdd at scsi2, channel 0, id 0, lun 0 SCSI device sdc: 72303840 512-byte hdwr sectors (37020 MB) sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 sdc6 sdc7 > SCSI device sdd: 72303840 512-byte hdwr sectors (37020 MB) sdd: sdd1 sdd2 sdd3 sdd4 < sdd5 sdd6 sdd7 > kjournald starting. Commit interval 5 seconds EXT3-fs: mounted filesystem with ordered data mode. 
Freeing unused kernel memory: 160k freed ACPI: Power Button (FF) [PWRF] ACPI: Processor [CPU1] (supports C1, 8 throttling states) ACPI: Processor [CPU2] (supports C1) usb.c: registered new driver usbdevfs usb.c: registered new driver hub usb-ohci.c: USB OHCI at membase 0xffffff000001c000, IRQ 19 usb-ohci.c: usb-03:00.0, Advanced Micro Devices [AMD] AMD-8111 USB usb.c: new USB bus registered, assigned bus number 1 hub.c: USB hub found hub.c: 3 ports detected usb-ohci.c: USB OHCI at membase 0xffffff000001e000, IRQ 19 usb-ohci.c: usb-03:00.1, Advanced Micro Devices [AMD] AMD-8111 USB (#2) usb.c: new USB bus registered, assigned bus number 2 hub.c: USB hub found hub.c: 3 ports detected usb.c: registered new driver hiddev usb.c: registered new driver hid hid-core.c: v1.8.1 Andreas Gal, Vojtech Pavlik vojtech@suse.cz hid-core.c: USB HID support drivers mice: PS/2 mouse device common for all mice EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,19), internal journal Adding Swap: 8385920k swap-space (priority -1) hub.c: new USB device 03:00.0-2, assigned address 2 input: USB HID v1.10 Keyboard [HEWLETT PACKARD USB Keyboard ] on usb1:2.0 input,hiddev0: USB HID v1.10 Device [HEWLETT PACKARD USB Keyboard ] on usb1:2.1 kjournald starting. Commit interval 5 seconds EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,1), internal journal EXT3-fs: mounted filesystem with ordered data mode. kjournald starting. Commit interval 5 seconds EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,21), internal journal EXT3-fs: mounted filesystem with ordered data mode. kjournald starting. Commit interval 5 seconds EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,22), internal journal EXT3-fs: mounted filesystem with ordered data mode. kjournald starting. Commit interval 5 seconds EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,23), internal journal EXT3-fs: mounted filesystem with ordered data mode. hdb: attached ide-scsi driver. 
scsi3 : SCSI host adapter emulation for IDE ATAPI devices Vendor: LITE-ON Model: LTR-52327S Rev: QS03 Type: CD-ROM ANSI SCSI revision: 02
</snip>
Justin M. Forbes wrote:
> Would you mind giving it a spin on 2135? I am wondering if this is related to some of the NUMA changes in 2163.
No issues whatsoever on 2135. Hasn't happened again in several hours, so it appears that whatever was causing the issue is *not* present in the 2135 kernel.
Thanks Justin, though I assume I'm likely losing something important in downgrading. :)
If you need specific tests run on my end to further isolate the issue, please let me know.
Ken Snider wrote:
> No issues whatsoever on 2135. Hasn't happened again in several hours, so it appears that whatever was causing the issue is *not* present in the 2135 kernel.
Spoke too soon. Came in this morning to the same two errors. Even crashed the db engine overnight. I'll have to update my bugzilla ticket I see.
Before hardware is considered the culprit, it should be noted that this same hardware ran RH9 and RHEL 3WS (x86) without issue. The errors:
<0>general protection fault: 0000
CPU 0
Pid: 3809, comm: slowknl Not tainted
RIP: 0010:[<ffffffff801c73f4>]{ia32_copy_siginfo_to_user+244}
RSP: 0000:00000100f56ede40 EFLAGS: 00010212
RAX: 0000000000000000 RBX: 0000000059374710 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000100f56edf38 RDI: 00000000593747a0
RBP: 0000000000000000 R08: 00000100f3dcdf60 R09: 0000000000000000
R10: ffffffff8013daf1 R11: 0000000000010283 R12: 00000100f56edf58
R13: 00000100f648a928 R14: 0000000000000020 R15: 00000100f56eca58
FS: 0000002a9555d620(0000) GS:ffffffff80572800(005b) knlGS:0000000059374bb0
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 000000000875a010 CR3: 0000000000101000 CR4: 00000000000006e0
Process slowknl (pid: 3809, stackpage=100f56ed000)
Stack: 00000100f56ede40 0000000000000000 5557ed68801c7fce 0000000400000000
       59374bb000000000 0000000100000000 0000008000000000 00000ee100000000
       ffffffff00000000 802c37c4ffffffff 00000010ffffffff 0001028300000000
Call Trace: [<ffffffff801101ce>]{do_signal+158}
       [<ffffffff802c37c4>]{bad_get_user+0}
       [<ffffffff8013daf1>]{compat_sys_sched_setaffinity+17}
       [<ffffffff8011061b>]{intret_signal+45}
Code: c3 66 66 66 90 66 66 66 90 66 66 90 48 83 ec 18 48 81 e2 ff
RIP [<ffffffff801c73f4>]{ia32_copy_siginfo_to_user+244} RSP <00000100f56ede40>

CPU 0
Pid: 3809, comm: slowknl Not tainted
RIP: 0010:[<ffffffff801c73f4>]{ia32_copy_siginfo_to_user+244}
RSP: 0000:00000100f56ede40 EFLAGS: 00010212
RAX: 0000000000000000 RBX: 0000000059374710 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000100f56edf38 RDI: 00000000593747a0
RBP: 0000000000000000 R08: 00000100f3dcdf60 R09: 0000000000000000
R10: ffffffff8013daf1 R11: 0000000000010283 R12: 00000100f56edf58
R13: 00000100f648a928 R14: 0000000000000020 R15: 00000100f56eca58
FS: 0000002a9555d620(0000) GS:ffffffff80572800(005b) knlGS:0000000059374bb0
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 000000000875a010 CR3: 0000000000101000 CR4: 00000000000006e0
Process slowknl (pid: 3809, stackpage=100f56ed000)
Stack: 00000100f56ede40 0000000000000000 5557ed68801c7fce 0000000400000000
       59374bb000000000 0000000100000000 0000008000000000 00000ee100000000
       ffffffff00000000 802c37c4ffffffff 00000010ffffffff 0001028300000000
Call Trace: [<ffffffff801101ce>]{do_signal+158}
       [<ffffffff802c37c4>]{bad_get_user+0}
       [<ffffffff8013daf1>]{compat_sys_sched_setaffinity+17}
       [<ffffffff8011061b>]{intret_signal+45}
Code: c3 66 66 66 90 66 66 66 90 66 66 90 48 83 ec 18 48 81 e2 ff
Looks like I'll be going back to an x86-based RedHat at this point, at least in the short term, since this server is useless as a database host at the moment :/
More information for you all.
Decided to grab the MaxDB alpha RPMs from MySQL (http://www.mysql.com/downloads/maxdb-7.5.01.html) and see what effect they had on the kernel.
At Justin's request, I'm running the 2135-rev kernel.
The problem is *worse* if I start the db or load the demo_db script. However, setting LD_ASSUME_KERNEL=2.2.5 before the xserver starts, or before we issue the db_start dbmcli command, *eliminates* the issue completely.
So, that would place the blame for the kernel messages squarely on NPTL in some way, as disabling NPTL eliminates the kernel messages.
I still have the problem where setting LD_ASSUME_KERNEL=2.2.5 in my shell leaves all system commands unable to find their libraries, a problem I don't see in FC1_x86 or RHEL.
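To double-check which thread library a given environment actually selects, glibc reports it through "getconf GNU_LIBPTHREAD_VERSION", typically as a string beginning with "NPTL" or "linuxthreads". A minimal sketch that classifies that string; the function name threading_impl is made up for illustration:

```shell
# Classify a GNU_LIBPTHREAD_VERSION string, e.g. "NPTL 0.60" or "linuxthreads-0.10".
threading_impl() {
    case $1 in
        NPTL*)         echo nptl ;;
        linuxthreads*) echo linuxthreads ;;
        *)             echo unknown ;;
    esac
}

# Classify the running system; prints "unknown" if getconf is missing
# or does not know this variable.
threading_impl "$(getconf GNU_LIBPTHREAD_VERSION 2>/dev/null)"
```

Running the getconf call once normally and once with LD_ASSUME_KERNEL=2.2.5 set for that single command should show whether the variable really switches the thread library, provided getconf itself can still load its libraries in that case.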