Dear list,
I'm experiencing random freezes on two desktops, both running up-to-date Fedora 18 on totally different hardware (Core2Duo E5300, Core i7) with different graphics cards, but both Nvidia using nouveau with a dual-monitor setup.
The desktop freezes completely and Num Lock on the keyboard is dead. My gut feeling is that it is triggered by scrolling.
When it happens, I can still ssh into the machine and everything looks perfectly normal: no process at 100% CPU, no zombies, nothing. Nothing in dmesg, Xorg.0.log or messages.
Restarting graphical.target does not do anything; when I kill /usr/bin/X (with -9, though) I get kdm back.
Any thoughts on where to debug further? regards Jens
On 06/17/2013 04:06 PM, Jens Neu wrote:
On 17.06.2013 15:49, Joe Zeff wrote:
Have you looked in ~/.xsession-errors yet?
I have a couple of:
libpng warning: Application built with libpng-1.2.8 but running with 1.5.13
libpng warning: Application built with libpng-1.2.8 but running with 1.5.13
libpng warning: Application built with libpng-1.2.8 but running with 1.5.13
at the end, nothing more interesting.
I did an strace of the (already hanging) /usr/bin/X before killing it with -9:
futex(0x3a6a50d3e0, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn() = -1 EINTR (Interrupted system call)
futex(0x3a6a50d3e0, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn() = -1 EINTR (Interrupted system call)
futex(0x3a6a50d3e0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
+++ killed by SIGKILL +++
Any ideas?
regards Jens
On 19.06.2013 16:14, Roberto Ragusa wrote:
pstack on the process (better if you previously install the debuginfo rpm for the X server)
pstack with debuginfo for XServer as well as nouveau package:
#0 0x0000003a5bc0de4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003a5bc09cc1 in _L_lock_885 () from /lib64/libpthread.so.0
#2 0x0000003a5bc09bda in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000003a6a2a86c0 in ?? () from /lib64/libGL.so.1
#4 0x0000003a6a2acea0 in ?? () from /lib64/libGL.so.1
#5 0x0000003a6a2ad3ca in ?? () from /lib64/libGL.so.1
#6 0x0000003a67401bf7 in ?? () from /lib64/tls/libnvidia-tls.so.310.32
#7 0x00007f430a4ca96f in mtdev_fetch_event () from /lib64/libmtdev.so.1
#8 0x00007f430a4caa8d in mtdev_get () from /lib64/libmtdev.so.1
#9 0x00007f430a6d3e06 in EvdevReadInput () from /usr/lib64/xorg/modules/input/evdev_drv.so
#10 0x0000000000489db7 in xf86SigioReadInput (fd=<optimized out>, closure=0x29ba480) at xf86Events.c:299
#11 0x00000000004b3528 in xf86SIGIO (sig=<optimized out>) at ../shared/sigio.c:110
#12 <signal handler called>
#13 0x0000003a5bc09bd4 in pthread_mutex_lock () from /lib64/libpthread.so.0
#14 0x0000003a6a2a86c0 in ?? () from /lib64/libGL.so.1
#15 0x0000003a6a2acea0 in ?? () from /lib64/libGL.so.1
#16 0x0000003a6a2ad386 in ?? () from /lib64/libGL.so.1
#17 0x0000003a67401bf7 in ?? () from /lib64/tls/libnvidia-tls.so.310.32
#18 0x0000000000478f06 in read (__nbytes=4096, __buf=0x2f19e50, __fd=<optimized out>) at /usr/include/bits/unistd.h:44
#19 _XSERVTransSocketRead (ciptr=0x3648870, buf=0x2f19e50 "\024", size=4096) at /usr/include/X11/Xtrans/Xtranssock.c:2116
#20 0x000000000046f2a6 in ReadRequestFromClient (client=client@entry=0x2f0ccd0) at io.c:332
#21 0x0000000000439666 in Dispatch () at dispatch.c:399
#22 0x00000000004282da in main (argc=9, argv=0x7fff882da638, envp=<optimized out>) at main.c:298
Anyway, I'm close to suspecting temperature problems on the graphics card; however, lm_sensors does not give me a reading for my card (NVIDIA Corporation G94 [Quadro FX 1800] (rev a1)):
$ sensors
nouveau-pci-0100
Adapter: PCI adapter
temp1:        +0.0°C  (high = +95.0°C, hyst = +3.0°C)
                      (crit = +105.0°C, hyst = +5.0°C)
                      (emerg = +135.0°C, hyst = +5.0°C)
On 20.06.2013 14:23, Jens Neu wrote:
On 19.06.2013 16:14, Roberto Ragusa wrote:
pstack on the process (better if you previously install the debuginfo rpm for the X server)
pstack with debuginfo for XServer as well as nouveau package:
…
nouveau.ko dri-devel@lists.freedesktop.org
nouveau_drv.so, nouveau_dri.so, libvdpau_nouveau.so nouveau@lists.freedesktop.org
poma
On 06/20/2013 02:23 PM, Jens Neu wrote:
On 19.06.2013 16:14, Roberto Ragusa wrote:
pstack on the process (better if you previously install the debuginfo rpm for the X server)
pstack with debuginfo for XServer as well as nouveau package:
#0 0x0000003a5bc0de4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003a5bc09cc1 in _L_lock_885 () from /lib64/libpthread.so.0
#2 0x0000003a5bc09bda in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000003a6a2a86c0 in ?? () from /lib64/libGL.so.1
#4 0x0000003a6a2acea0 in ?? () from /lib64/libGL.so.1
#5 0x0000003a6a2ad3ca in ?? () from /lib64/libGL.so.1
#6 0x0000003a67401bf7 in ?? () from /lib64/tls/libnvidia-tls.so.310.32
#7 0x00007f430a4ca96f in mtdev_fetch_event () from /lib64/libmtdev.so.1
#8 0x00007f430a4caa8d in mtdev_get () from /lib64/libmtdev.so.1
#9 0x00007f430a6d3e06 in EvdevReadInput () from /usr/lib64/xorg/modules/input/evdev_drv.so
#10 0x0000000000489db7 in xf86SigioReadInput (fd=<optimized out>, closure=0x29ba480) at xf86Events.c:299
#11 0x00000000004b3528 in xf86SIGIO (sig=<optimized out>) at ../shared/sigio.c:110
#12 <signal handler called>
#13 0x0000003a5bc09bd4 in pthread_mutex_lock () from /lib64/libpthread.so.0
#14 0x0000003a6a2a86c0 in ?? () from /lib64/libGL.so.1
#15 0x0000003a6a2acea0 in ?? () from /lib64/libGL.so.1
#16 0x0000003a6a2ad386 in ?? () from /lib64/libGL.so.1
#17 0x0000003a67401bf7 in ?? () from /lib64/tls/libnvidia-tls.so.310.32
#18 0x0000000000478f06 in read (__nbytes=4096, __buf=0x2f19e50, __fd=<optimized out>) at /usr/include/bits/unistd.h:44
#19 _XSERVTransSocketRead (ciptr=0x3648870, buf=0x2f19e50 "\024", size=4096) at /usr/include/X11/Xtrans/Xtranssock.c:2116
#20 0x000000000046f2a6 in ReadRequestFromClient (client=client@entry=0x2f0ccd0) at io.c:332
#21 0x0000000000439666 in Dispatch () at dispatch.c:399
#22 0x00000000004282da in main (argc=9, argv=0x7fff882da638, envp=<optimized out>) at main.c:298
Hmmm, I do not want to make a false accusation, but as I read it, the code is doing something (frames 22-14): it appears to be reading from a socket (18), which implies taking a lock (13). A signal arrives at that moment (12) and something scary is done in the signal handler (11-0), including some kind of read (10) that implies taking a lock (2), probably deadlocking with 13, since the process remains stuck at 0.
It smells like a bug. Too much dangerous processing in a signal handler.
Anyone with a better reading of this?
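The pattern described in that reading (a handler that needs a lock already held by the code it interrupted) can be illustrated with a minimal Python sketch. This is only an analogy under stated assumptions, not the X server or libGL code: `lock` stands in for whatever mutex libGL takes, `handler_sees_lock_free` is a hypothetical helper, and Python runs its signal handlers between bytecodes rather than at an arbitrary machine instruction. A non-blocking acquire is used so the example reports the conflict instead of hanging.

```python
import signal
import threading
import time

# Non-reentrant lock; a stand-in (assumption) for the mutex taken inside libGL.
lock = threading.Lock()
observed = []


def handler(signum, frame):
    # A blocking acquire() here would never return if the interrupted code
    # holds the lock: that code cannot resume until this handler finishes.
    got_it = lock.acquire(blocking=False)
    observed.append(got_it)
    if got_it:
        lock.release()


signal.signal(signal.SIGALRM, handler)


def handler_sees_lock_free(hold_lock):
    """Deliver SIGALRM to ourselves and report whether the handler got the lock."""
    observed.clear()
    if hold_lock:
        lock.acquire()          # analogous to frame 13: the main code takes the lock
    try:
        signal.raise_signal(signal.SIGALRM)   # analogous to frame 12
        while not observed:     # wait until the Python-level handler has run
            time.sleep(0.001)
    finally:
        if hold_lock:
            lock.release()
    return observed[0]
```

Calling `handler_sees_lock_free(True)` shows the handler failing to get the lock, which with a blocking acquire would be the hang seen in frames 2-0; `handler_sees_lock_free(False)` shows the same handler succeeding when nothing holds the lock.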
On 20.06.2013 15:17, Roberto Ragusa wrote:
#17 0x0000003a67401bf7 in ?? () from /lib64/tls/libnvidia-tls.so.310.32
Wait a moment, this does not look like nouveau to me.
I used the nvidia drivers on this host months ago, from rpmfusion as well as from the nvidia binary installer, but that was a failure, and I switched back to nouveau pretty fast. I removed all the kmod-nvidia packages (from rpmfusion). I think the libnvidia-tls thing is a relic of the nvidia installer. Very strange that it gets called, since:
# lsmod | grep nouveau
nouveau               984080  2
mxm_wmi                12865  1 nouveau
wmi                    18697  2 mxm_wmi,nouveau
i2c_algo_bit           13257  1 nouveau
drm_kms_helper         46343  1 nouveau
ttm                    79750  1 nouveau
drm                   272623  4 ttm,drm_kms_helper,nouveau
i2c_core               34096  6 drm,i2c_i801,drm_kms_helper,i2c_algo_bit,adt7475,nouveau
video                  18991  1 nouveau
is pretty explicit.
However, I just swept out the nvidia binary stuff; hopefully everything is gone now...
regards Jens
On 21.06.2013 10:27, Jens Neu wrote:
On 20.06.2013 15:17, Roberto Ragusa wrote:
#17 0x0000003a67401bf7 in ?? () from /lib64/tls/libnvidia-tls.so.310.32
Wait a moment, this does not look like nouveau to me.
Found it. Root cause: the nvidia-uninstaller "forgot" to remove its version of libGL.so.
regards Jens
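For anyone hitting something similar: lsmod only shows kernel modules, while the leftover here was a userspace library. One way to spot such leftovers is to look at what the running X process has actually mapped. The following is a rough sketch, assuming the usual /proc/<pid>/maps layout (address, perms, offset, dev, inode, optional path); `mapped_libraries` is a hypothetical helper name, not an existing tool.

```python
from typing import List


def mapped_libraries(maps_text: str, needle: str) -> List[str]:
    """Return the sorted, unique mapped file paths that contain `needle`."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) != 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        # Skip pseudo-entries such as [stack] or [heap].
        if path.startswith("/") and needle in path:
            paths.add(path)
    return sorted(paths)
```

With the X server's PID (say 1234, a made-up value), `mapped_libraries(open("/proc/1234/maps").read(), "nvidia")` would list files such as the libnvidia-tls library from the trace. A name match cannot catch a replaced libGL.so itself, since its path does not contain "nvidia"; for that, listing the "libGL" matches and checking each path with `rpm -qf` shows whether the file belongs to any installed package.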