Subject: Re: race condition in pthread write()?
To: Nathan J. Williams <nathanw@wasabisystems.com>
From: Karl Janmar <karlj@mdstud.chalmers.se>
List: tech-kern
Date: 02/11/2003 23:50:40
On Tue, 11 Feb 2003, Karl Janmar wrote:

> On 11 Feb 2003, Nathan J. Williams wrote:
>
> > Karl Janmar <karlj@mdstud.chalmers.se> writes:
> >
> > > Here is some more data:
> > > (gdb) info thread
> > >   5 Thread 0  0x485700f7 in poll () from /usr/lib/libc.so.12
> > >   4 Thread 21  0x4857008b in select () from /usr/lib/libc.so.12
> > >   3 Thread 22  0x4857013f in nanosleep () from /usr/lib/libc.so.12
> > >   2 Thread 69  0x4857013f in nanosleep () from /usr/lib/libc.so.12
> > > * 1 Thread 70  0x482e5f65 in write () from /usr/lib/libpthread.so.0
> > > (gdb) thread ex all
> > > 0x48fc0000: thread   70 in kernel
> > > 0x48f80000: thread   69 in kernel
> > > 0x48f40000: thread   22 in kernel
> > > 0x48b40000: thread   21 in kernel
> > > 0xbfbc0000: thread    0 in kernel
> >
> > OK. All the userland threads are sleeping in the kernel (actually,
> > they probably woke up from their original system call, tried to
> > deliver an UNBLOCKED upcall, and are sleeping waiting for a stack to
> > become available). The fact that there's still a thread consuming CPU
> > probably means another bug in libpthread.. possibly in the
> > resolve_locks dance in pthread_sa.c; more likely a spinlock count is
> > going negative somewhere, which will confuse the resolve_locks algorithm.
> >
> > This is i386, right (guessing from the address of thread 0)? I'll send
> > you some additional debugging tools in a little bit.
> >
> > > If I understand correctly, sa_nstacks is running out because the stacks
> > > aren't recycled properly, so somewhere in kern_sa.c it "leaks stacks"?
> >
> > That's one possibility... more likely, though, is that the userland
> > code is wedged in a state where it doesn't get around to recycling the
> > used stacks back to the kernel.
> >
> >         - Nathan
> >
>
>
> OK, after looking in sa_getcachelwp():
>
> What are the /* XXX lock sadata */
> and /* XXX unlock */ comments about?
> Maybe they have something to do with the problem?
> A race condition after all?
>
> Regards, Karl.
>

Sorry for being too fast.
I missed this:
SCHED_ASSERT_LOCKED();
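
As far as I can tell, that assertion is the usual "caller must already
hold the lock" convention: the function does not take the lock itself, it
only checks that somebody above it did, which would explain why the
lock/unlock spots are just XXX comments. A minimal sketch of the idea in
plain C follows. All the names (demo_lock, demo_cache, demo_cache_take,
DEMO_ASSERT_LOCKED) are made up for illustration and are not taken from
kern_sa.c, and the "lock" is a plain flag that only works single-threaded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lock; a bare flag, NOT safe across threads. */
struct demo_lock {
    bool held;                 /* true while some caller owns the lock */
};

/* Hypothetical cache protected by an external lock. */
struct demo_cache {
    struct demo_lock *lock;    /* lock that guards ncached */
    int ncached;               /* number of cached entries */
};

static inline void demo_lock_acquire(struct demo_lock *l)
{
    assert(!l->held);          /* no recursion in this sketch */
    l->held = true;
}

static inline void demo_lock_release(struct demo_lock *l)
{
    assert(l->held);           /* releasing an unheld lock is a bug */
    l->held = false;
}

/* Documents and enforces "the caller holds the lock". */
#define DEMO_ASSERT_LOCKED(l) assert((l)->held)

/*
 * Take one entry from the cache. The caller must hold c->lock;
 * this function neither acquires nor releases it.
 * Returns 1 if an entry was taken, 0 if the cache was empty.
 */
static inline int demo_cache_take(struct demo_cache *c)
{
    DEMO_ASSERT_LOCKED(c->lock);
    if (c->ncached == 0)
        return 0;
    c->ncached--;
    return 1;
}
```

A caller would wrap the call in demo_lock_acquire()/demo_lock_release();
calling demo_cache_take() without the lock trips the assertion instead of
racing silently.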

 - Karl