Subject: Re: race condition in pthread write()?
To: Nathan J. Williams <firstname.lastname@example.org>
From: Karl Janmar <email@example.com>
Date: 02/11/2003 23:34:12
On 11 Feb 2003, Nathan J. Williams wrote:
> Karl Janmar <firstname.lastname@example.org> writes:
> > Here is some more data:
> > (gdb) info thread
> > 5 Thread 0 0x485700f7 in poll () from /usr/lib/libc.so.12
> > 4 Thread 21 0x4857008b in select () from /usr/lib/libc.so.12
> > 3 Thread 22 0x4857013f in nanosleep () from /usr/lib/libc.so.12
> > 2 Thread 69 0x4857013f in nanosleep () from /usr/lib/libc.so.12
> > * 1 Thread 70 0x482e5f65 in write () from /usr/lib/libpthread.so.0
> > (gdb) thread ex all
> > 0x48fc0000: thread 70 in kernel
> > 0x48f80000: thread 69 in kernel
> > 0x48f40000: thread 22 in kernel
> > 0x48b40000: thread 21 in kernel
> > 0xbfbc0000: thread 0 in kernel
> OK. All the userland threads are sleeping in the kernel (actually,
> they probably woke up from their original system call, tried to
> deliver an UNBLOCKED upcall, and are sleeping waiting for a stack to
> become available). The fact that there's still a thread consuming CPU
> probably means another bug in libpthread, possibly in the
> resolve_locks dance in pthread_sa.c; more likely a spinlock count is
> going negative somewhere, which will confuse the resolve_locks algorithm.
> This is i386, right (guessing from the address of thread 0)? I'll send
> you some additional debugging tools in a little bit.
> > If I get the point, sa_nstacks is running out because the stacks aren't
> > recycled properly, so somewhere in kern_sa.c it "leaks stacks"?
> That's one possibility... more likely, though, is that the userland
> code is wedged in a state where it doesn't get around to recycling the
> used stacks back to the kernel.
> - Nathan
I have some more info after logging some debug output.
I think sa_getcachelwp() gets hold of an LWP that is not part of the
game. I have 6 LWPs running for a long while, and then it gets an LWP that
hasn't been there before, LWP 8, and then all hell breaks loose: this in
turn gets LWP 9 from sa_getcachelwp(), and so on until the 19th LWP is
pulled out, at which point the stacks run out.
I will have a look at sa_getcachelwp(), but maybe this info will help you
a little. The only thing I can do is poke around in the dark a bit.