Subject: Why SAs suck, part N
To: None <firstname.lastname@example.org>
From: Charles M. Hannum <email@example.com>
Date: 01/02/2005 12:56:09
So I've been tracking down a problem with the native Java VM hanging.
Sometimes it manifests itself with the process spinning and taking 100% of
the CPU; sometimes the process just stops. As it happens, this is due to a
fundamental design flaw in the way upcalls are implemented.
If I understand this correctly, the user-level library creates a bunch of
special-purpose threads for handling upcalls. When an upcall happens, a
thread is allocated (by using copyin() to inspect each one until we find an
empty one!) and the current processor state is saved there. The user-level
upcall handler is poked into the PC, and we return.
The upcall stack/thread gets cleaned up later when the LWP is unblocked, in
response to another upcall. This is where the fun begins.
Since any long-sleeping LWP may have a SA_UPCALL_BLOCKED upcall associated
with it, we need an upcall slot for every LWP! But the thread library only
allocates a fixed number (16). When we run out, we're toast. Various bad
things can then happen:
* We can no longer send SA_UPCALL_UNBLOCKED upcalls to unblock sleeping
threads. This is a deadlock.
* We can spin trying to deliver timer signals, because sa_upcall() keeps
returning ENOMEM and we never drop the signal.
* We can actually crash through one code path when trying to kill the horked
process.
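To make the exhaustion concrete, here is a toy model of a fixed 16-slot pool. It is only a sketch: struct upcall_pool, upcall_slot_alloc(), upcall_slot_free() and simulate_blocked_lwps() are names made up for illustration, not the real kernel or libpthread interfaces, and the linear scan merely stands in for the copyin()-per-slot search described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NUPCALL_SLOTS 16	/* the fixed pool size mentioned above */

/* Hypothetical model of the upcall slot pool (not the real structures). */
struct upcall_pool {
	bool in_use[NUPCALL_SLOTS];
};

/*
 * Scan for a free slot, one at a time.
 * Returns the slot index, or -ENOMEM when every slot is taken.
 */
static int
upcall_slot_alloc(struct upcall_pool *p)
{
	assert(p != NULL);
	for (int i = 0; i < NUPCALL_SLOTS; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = true;
			return i;
		}
	}
	return -ENOMEM;
}

/* Release a slot; out-of-range indices are ignored. */
static void
upcall_slot_free(struct upcall_pool *p, int slot)
{
	assert(p != NULL);
	if (slot >= 0 && slot < NUPCALL_SLOTS)
		p->in_use[slot] = false;
}

/*
 * Put nlwps LWPs to sleep, each pinning one slot for its BLOCKED upcall,
 * then try to get one more slot for an UNBLOCKED upcall.
 * Returns 0 if the wakeup could be delivered, -ENOMEM otherwise.
 */
static int
simulate_blocked_lwps(int nlwps)
{
	struct upcall_pool pool = { { false } };

	for (int i = 0; i < nlwps; i++) {
		if (upcall_slot_alloc(&pool) < 0)
			return -ENOMEM;	/* could not even report the block */
	}

	int slot = upcall_slot_alloc(&pool);
	if (slot < 0)
		return -ENOMEM;		/* nobody can be woken: deadlock */
	upcall_slot_free(&pool, slot);
	return 0;
}
```

In this model, simulate_blocked_lwps(15) still succeeds, but simulate_blocked_lwps(16) returns -ENOMEM: once sixteen sleepers each hold a slot, there is nothing left to deliver the wakeup with.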
There are two apparent solutions to this:
1) Allocate an upcall stack for every LWP, and a few extra for things like
SA_UPCALL_SIGEV. Unfortunately, we don't actually know how many LWPs there
are, and more may be added dynamically, so this is problematic. It's also
wasteful. OTOH, having it scale down when there are fewer LWPs would reduce
resource waste in things like nslookup.
2) Completely rework how state is stored. I suggest allocating a save area at
the bottom of each thread stack, where we already store the pthread
structure, and rework the interface so that we only ever save the thread
state once, no matter how many different types of upcalls we're sending to
it. This would eliminate the extra memory use, and would also make the
upcall mechanism faster, by eliminating the silly stack allocation mechanism.
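A rough sketch of what option 2 might look like, under my own assumptions: one save area per thread, state captured at most once, and the different upcall types recorded as pending bits. Every name here (struct thread_save_area, upcall_post(), upcall_drain(), the UPCALL_* bits, the saves counter) is hypothetical and exists only to illustrate the idea; the real layout would sit next to the pthread structure at the bottom of the thread stack and hold real machine context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upcall kinds, one bit each. */
enum {
	UPCALL_BLOCKED   = 1 << 0,
	UPCALL_UNBLOCKED = 1 << 1,
	UPCALL_SIGEV     = 1 << 2
};

/* Stand-in for the saved machine context. */
struct cpu_state {
	unsigned long pc;
	unsigned long sp;
};

/* Hypothetical per-thread save area. */
struct thread_save_area {
	struct cpu_state state;	/* saved once per interruption */
	bool state_valid;	/* true while 'state' holds a live save */
	unsigned pending;	/* bitmask of upcall kinds not yet handled */
	unsigned saves;		/* illustration only: how often we saved */
};

/*
 * Record an upcall for a thread. The state is captured only if no save is
 * outstanding; further upcalls just set another pending bit.
 */
static void
upcall_post(struct thread_save_area *sa, unsigned kind,
    const struct cpu_state *now)
{
	assert(sa != NULL && now != NULL);
	if (!sa->state_valid) {
		sa->state = *now;
		sa->state_valid = true;
		sa->saves++;
	}
	sa->pending |= kind;
}

/*
 * Hand all pending upcall kinds to the handler in one go and mark the save
 * area reusable. Returns the bitmask that was pending.
 */
static unsigned
upcall_drain(struct thread_save_area *sa)
{
	assert(sa != NULL);
	unsigned pending = sa->pending;

	sa->pending = 0;
	sa->state_valid = false;
	return pending;
}
```

Since the save area belongs to the thread, there is no pool to search and none to exhaust; posting a second upcall type to the same thread costs one bit rather than another stack.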
Anyone have further insights on this?