I have wanted to do this for a long time, but only now have I had the time and a desktop beefy enough to try it. Basically I replaced /usr/bin/gnome-session with a shell script:

    #!/bin/sh
    /usr/bin/valgrind --trace-children=yes --log-file=/tmp/valgrind /usr/bin/gnome-session.orig $*
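For reference, the wrapper could also be written along the following lines. This is only a sketch: the paths and valgrind options are the ones quoted in the message, while the variable names and the run_under_valgrind function are invented here so that the command can be exercised without a real session. It forwards arguments with "$@" rather than $*, which keeps arguments containing spaces intact.

```shell
#!/bin/sh
# Sketch of a wrapper to install as /usr/bin/gnome-session once the real
# binary has been moved to /usr/bin/gnome-session.orig.
# The variable and function names are illustrative, not from the original.

VALGRIND=${VALGRIND:-/usr/bin/valgrind}
REAL_SESSION=${REAL_SESSION:-/usr/bin/gnome-session.orig}
# Log name as used in the message. Newer valgrind releases may need an
# explicit %p (for example /tmp/valgrind.%p) to get one log per process.
LOG_FILE=${LOG_FILE:-/tmp/valgrind}

run_under_valgrind() {
    # "$@" forwards each argument unchanged, including ones with spaces.
    "$VALGRIND" --trace-children=yes --log-file="$LOG_FILE" \
        "$REAL_SESSION" "$@"
}

# The installed wrapper would end with:
#   run_under_valgrind "$@"
```

Overriding VALGRIND with echo is a cheap way to print the command line that would be run before putting the wrapper in place.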
I then logged in through gdm and checked what happened from an ssh connection to the box.

The good news:
- logging in went through, but it took a few minutes
- everything looked functional, though extremely slow
- there weren't many logs reported by valgrind

The bad news:
- I had to stop the session shortly after the login fully completed because virtual memory was full (1G of RAM + 500M of swap)
- reports from the logs are a pain to analyze
- one python process (the rhn applet, I suspect) generated a huge log; python-2.3 doesn't seem valgrindable
I then eliminated all the empty /tmp/valgrind.pid* files and was left with reports from only 25 processes. First a word of warning: I used the normal optimized code as shipped in Fedora devel (a fully up-to-date box as of today's version). Some of the optimizations sometimes defeat valgrind, so there may be false positives.
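The cleanup step can be scripted. The sketch below assumes that "empty" means zero-length files and that the logs follow the /tmp/valgrind.pid* naming mentioned above; the function name remove_empty_logs is made up for illustration.

```shell
#!/bin/sh
# remove_empty_logs [DIR]
# Delete zero-length valgrind.pid* files directly under DIR (default /tmp).
# Assumption: an "empty" log is a file of size 0.
remove_empty_logs() {
    dir=${1:-/tmp}
    find "$dir" -maxdepth 1 -type f -name 'valgrind.pid*' -size 0 \
        -exec rm -f {} +
}
```

Running `ls /tmp/valgrind.pid* | wc -l` afterwards gives the number of processes that actually produced a report.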
I have tried to sort all the reports to gather together what was frequently reported because all apps went through the same code path, for example there is an error reported when opening gdk display which is reported like 30 times by various apps. So what I saw most:
- gdk_display_open leading to
      write(buf) contains uninitialised or unaddressable byte
  in __write_nocancel through _X11TransWrite. It is hard to tell without a debugging lib whether the error is a false positive or a lack of initialization in gdk_display_open() or within X. The strange thing is that valgrind reports the block as being alloc'ed with calloc(); the offending address is 128 bytes inside a block of size 16384.
- giop_send_buffer_write in libORBit-2 leading to
      Syscall param writev(vector[...]) contains uninitialised or unaddressable byte(s)
  This time the uninitialized data is 10 bytes inside a block of size 2048 allocated within orbit itself.
- pango read_line raises a strange pthread mutex error:
      pthread_mutex_lock/trylock: mutex has invalid owner
  in pthread_mutex_lock called by pango_read_line from pango_find_map.
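One rough way to see how widely a given code path is reported is to count the log files that mention a symbol. The helper below is a hypothetical sketch, not the method used to produce the sorting above; it assumes the logs are still named /tmp/valgrind.pid* and simply matches the symbol as a fixed string.

```shell
#!/bin/sh
# count_logs_mentioning SYMBOL [DIR]
# Print how many valgrind.pid* files under DIR (default /tmp) contain SYMBOL.
count_logs_mentioning() {
    symbol=$1
    dir=${2:-/tmp}
    # -l lists each matching file once; -F treats SYMBOL as a plain string.
    grep -l -F -- "$symbol" "$dir"/valgrind.pid* 2>/dev/null | wc -l | tr -d ' '
}

# Example use for the three common reports:
#   for sym in gdk_display_open giop_send_buffer_write pango_read_line; do
#       printf '%s: %s\n' "$sym" "$(count_logs_mentioning "$sym")"
#   done
```

This counts processes rather than individual errors, so a process that hits the same path many times is still counted once.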
Apparently the GStreamer code detects that it's running under valgrind and manages to shut it up :-)
Apart from those 3, which are repeated all over the place and make up the bulk of the reports, I have seen errors in:
- /usr/bin/gnome-session: invalid file descriptors, pango_attr_list_get_iterator uninitialized value.
- /usr/bin/pam-panel-icon: 2 invalid file descriptors, seemingly the same as for gnome-session, with value 828 too.
- /usr/lib/libwnck: uninitialized values in _wnck_read_icons
- /usr/libexec/gconfd-2: repeated g_strdup of uninitialized values from gconf_set_daemon_ior, gconf_get_lock, gconf_object_to_string, gconf_quote_string, and an fprintf
- /usr/libexec/bonobo-activation-server: uninitialized values in CORBA_ORB_object_to_string, fprintf, giop_send_buffer_write
- gam_server: I got one too :-)
- metacity: uninitialized values in gdk_window_new, gdk_window_resize, gdk_region_rectangle, gdk_region_subtract, a couple of strange g_int_equal bugs, meta_display_begin_grab_op, meta_display_end_grab_op
- gnome-terminal: terminal_profile_update and _vte_pty_open
The best way to double-check is to do the same trick as I did for gnome-session: move the original somewhere else and replace it with a script that calls valgrind, without recursing into child processes, on a local copy of the program built in debugging mode.
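As a sketch of that procedure, the helper below moves a program aside and drops a wrapper in its place. The function name install_wrapper, the ".orig" suffix for the moved binary, and the debug copy's location are assumptions made for illustration; the valgrind path and log name are the ones used for gnome-session earlier.

```shell
#!/bin/sh
# install_wrapper TARGET DEBUG_COPY
# Move TARGET to TARGET.orig and replace it with a script that runs
# DEBUG_COPY (a locally built debug version) under valgrind.
install_wrapper() {
    target=$1
    debug_copy=$2
    mv "$target" "$target.orig" || return 1
    # Unquoted heredoc: $debug_copy is expanded now, \$@ is left for later.
    cat > "$target" <<EOF
#!/bin/sh
# Only the debug build itself is checked; child processes are not traced.
exec /usr/bin/valgrind --log-file=/tmp/valgrind "$debug_copy" "\$@"
EOF
    chmod +x "$target"
}

# Example (hypothetical paths):
#   install_wrapper /usr/bin/metacity "$HOME/build/metacity/src/metacity"
```

Restoring the original is then a matter of moving TARGET.orig back over TARGET.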
Enclosed is the data, sorted and grouped, for more information.
happy valgrinding,
Daniel
On Tue, 2004-09-21 at 14:05 -0400, Daniel Veillard wrote:
> I wanted to do this for a long time but only now I had the time and a desktop beefy enough to try this. Basically I replaced /usr/bin/gnome-session by a shell script :
>     #!/bin/sh
>     /usr/bin/valgrind --trace-children=yes --log-file=/tmp/valgrind /usr/bin/gnome-session.orig $*
Cool, definitely useful stuff.
> - one python (rhn applet I suspect) generated a huge log, python-2.3 doesn't seem valgrindable.
Yeah, I think the problem is that Python does its own internal memory management.
> - giop_send_buffer_write in libORBit-2 leading to Syscall param writev(vector[...]) contains uninitialised or unaddressable byte(s) this time the uninitialized data is 10 bytes inside a block of size 2048 allocated within orbit itself.
It's my understanding that ORBit often allocates large buffers and writes the whole thing even after only using a small portion; this works in the CORBA protocol. You can compile ORBit with some special configure option to make it initialize the buffers.
> - pango read_line raises a strange pthread mutex error: pthread_mutex_lock/trylock: mutex has invalid owner in pthread_mutex_lock called by pango_read_line from pango_find_map
This one is odd, maybe Owen has an idea.
On Tue, 2004-09-21 at 16:26 -0400, Colin Walters wrote:
> It's my understanding that ORBit often allocates large buffers and writes the whole thing even after only using a small portion, this works in the CORBA protocol. You can compile ORBit with some special configure option to make it initialize the buffers.
> > - pango read_line raises a strange pthread mutex error: pthread_mutex_lock/trylock: mutex has invalid owner in pthread_mutex_lock called by pango_read_line from pango_find_map
> This one is odd, maybe Owen has an idea.
Presumably it's the flockfile()/funlockfile() calls in that function. I can't see anything in this that looks wrong, so it's probably a bad valgrind/libc interaction.
Regards, Owen
On Tue, 2004-09-21 at 14:05 -0400, Daniel Veillard wrote:
> - giop_send_buffer_write in libORBit-2 leading to Syscall param writev(vector[...]) contains uninitialised or unaddressable byte(s) that time the uninitialized data is 10 bytes inside a block of size 2048 allocated within orbit itself.
For "performance reasons" ORBit doesn't initialize padding bytes. To make it do so, build with --enable-purify.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Alexander Larsson, Red Hat, Inc
alexl@redhat.com, alla@lysator.liu.se
He's a time-tossed drug-addicted vagrant looking for a cure to the poison coursing through his veins. She's a tortured thirtysomething mechanic living on borrowed time. They fight crime!
On Tue, 2004-09-21 at 16:47 -0400, Owen Taylor wrote:
> On Tue, 2004-09-21 at 16:26 -0400, Colin Walters wrote:
> > It's my understanding that ORBit often allocates large buffers and writes the whole thing even after only using a small portion, this works in the CORBA protocol. You can compile ORBit with some special configure option to make it initialize the buffers.
> > > - pango read_line raises a strange pthread mutex error: pthread_mutex_lock/trylock: mutex has invalid owner in pthread_mutex_lock called by pango_read_line from pango_find_map
> > This one is odd, maybe Owen has an idea.
> Presumably it's the flockfile() funlockfile() calls in that function. I can't see anything in this that looks wrong, so it's probably a bad valgrind/libc interaction.
I've been looking at this a bit more and it seems that pango should be ok. valgrind seems to be mapping flockfile/funlockfile to pthread_mutex_lock/unlock directly, is this the right thing to do?
pango_read_line() uses flockfile/funlockfile on the stream passed in, but valgrind still complains that a locked mutex is being freed by way of fclose() in read_modules() at modules.c:501. Could this suggest a race between two threads calling these functions?
Cheers Kjartan
desktop@lists.stg.fedoraproject.org