Subject: Re: race condition in pthread write()?
To: Nathan J. Williams <nathanw@wasabisystems.com>
From: Karl Janmar <karlj@mdstud.chalmers.se>
List: tech-kern
Date: 02/11/2003 18:54:22
On 11 Feb 2003, Nathan J. Williams wrote:
> Karl Janmar <karlj@mdstud.chalmers.se> writes:
>
> > I think there is a race condition in write() in libc or pthread. I don't
> > know exactly where to look, but here is my case:
> > I run xmms with pthread; it uses one thread for writing to esd and some
> > others for encoding the mp3 stream. After a while (it can take up to 1-2
> > hours) it freezes. When I do ps axs, it shows that all threads (10 or so)
> > are stuck in sastacks (kern_sa.c:sa_upcall_userret(), waiting for a stack
> > to run on?) and ONE is waiting (it shows wait in the WCHAN column of ps).
>
> Okay, the problem is that the process isn't recycling enough stacks
> for the kernel. A few other people have seen this; I'm not entirely
> sure what the solution is.
>
> > When I did a backtrace on the running xmms, I found this:
> > #0 0x4856fffb in write () from /usr/lib/libc.so.12
> > #1 0x482e5f65 in write () from /usr/lib/libpthread.so.0
> > #2 0x4863b8ef in get_oplugin_info ()
> > from /usr/pkg/lib/xmms/Output/libesdout.so
> > #3 0x4863bbed in get_oplugin_info ()
> > from /usr/pkg/lib/xmms/Output/libesdout.so
> > #4 0x1c340 in ?? ()
>
> I believe this is an artifact of GDB making a poor choice of an
> initial thread to show you. If you get into this state, can you
> additionally run "info thread" and "thread ex all" in GDB?
>
> - Nathan
>
Here is some more data:
(gdb) info thread
5 Thread 0 0x485700f7 in poll () from /usr/lib/libc.so.12
4 Thread 21 0x4857008b in select () from /usr/lib/libc.so.12
3 Thread 22 0x4857013f in nanosleep () from /usr/lib/libc.so.12
2 Thread 69 0x4857013f in nanosleep () from /usr/lib/libc.so.12
* 1 Thread 70 0x482e5f65 in write () from /usr/lib/libpthread.so.0
(gdb) thread ex all
0x48fc0000: thread 70 in kernel
0x48f80000: thread 69 in kernel
0x48f40000: thread 22 in kernel
0x48b40000: thread 21 in kernel
0xbfbc0000: thread 0 in kernel
Maybe this will help a little more.
If I understand correctly, sa_nstacks is running out because the stacks
aren't recycled properly, so somewhere in kern_sa.c it "leaks stacks"?
- Karl