Subject: Re: "parked" processes hell: debug available..
To: George Michaelson <ggm@apnic.net>
From: Andrew Doran <ad@netbsd.org>
List: current-users
Date: 09/21/2007 17:33:30
On Fri, Sep 21, 2007 at 05:10:32PM +0100, Andrew Doran wrote:

> On Fri, Sep 21, 2007 at 10:31:25AM +1000, George Michaelson wrote:
> > This parking thing is still there for me, even with -O0 state on
> > libpthread.
> > 
> > I have been able to get a back-trace on where it's doing the
> > parking. It's coming in (in this instance) via citrus/i18n stuff.
> 
> Thanks, that's very useful.
> 
> > (To think that we have allowed s/w to get this horrendously, insanely
> > complicated: a 62-deep procedure-call stack..)

Indeed. Around 10 hours of stress testing went into the latest rwlock/mutex
changes on a variety of systems, and they passed every test that I came up
with. Still, there is always something more :-)
 
> > Program received signal SIGINT, Interrupt.
> > 0xbaf01987 in _lwp_park () from /usr/lib/libc.so.12
> > (gdb) where
> > #0  0xbaf01987 in _lwp_park () from /usr/lib/libc.so.12
> > #1  0xbb3ed94f in pthread__park () from /usr/lib/libpthread.so.0
> > #2  0xbb3e9662 in pthread_rwlock_tryrdlock () from /usr/lib/libpthread.so.0
> > #3  0xbb3e9808 in pthread_rwlock_wrlock () from /usr/lib/libpthread.so.0
> > #4  0xbaf4c854 in _citrus_mapper_open () from /usr/lib/libc.so.12
> 
> I can reproduce this locally with gnome-terminal. It appears to be a problem
> in libc, with citrus trying to recursively acquire a reader-writer lock. It
> would have silently errored out before. I'm trying to figure out how it's
> managing to recurse like that.

I don't know what citrus is doing, but I have changed the rwlocks to be more
permissive, so that they behave as they did before (and match the standard).

Revision 1.5 of src/lib/libpthread/pthread_rwlock2.c has the fix - with this
change gnome-terminal works for me again.

Thanks,
Andrew