Subject: Why SAs suck, part N
To: None <tech-kern@netbsd.org>
From: Charles M. Hannum <abuse@spamalicious.com>
List: tech-kern
Date: 01/02/2005 12:56:09
So I've been tracking down a problem with the native Java VM hanging.  
Sometimes it manifests itself with the process spinning and taking 100% of 
the CPU; sometimes the process just stops.  As it happens, this is due to a 
fundamental design flaw in the way upcalls are implemented.

If I understand this correctly, the user-level library creates a bunch of 
special-purpose threads for handling upcalls.  When an upcall happens, a 
thread is allocated (by using copyin() to inspect each one until we find an 
empty one!) and the current processor state is saved there.  The user-level 
upcall handler is poked into the PC, and we return.

The upcall stack/thread gets cleaned up later when the LWP is unblocked, in 
response to another upcall.  This is where the fun begins.

Since any long-sleeping LWP may have a SA_UPCALL_BLOCKED upcall associated 
with it, we need an upcall slot for every LWP!  But the thread library only 
allocates a fixed number (16).  When we run out, we're toast.  Various bad 
things happen:

* We can no longer send SA_UPCALL_UNBLOCKED upcalls to unblock sleeping 
threads.  This is a deadlock.

* We can spin trying to deliver timer signals, because sa_upcall() keeps 
returning ENOMEM and we never drop the signal.

* We can actually crash through one code path when trying to kill the horked 
process.
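
To make the exhaustion concrete, here is a rough userland model of a fixed 
pool of upcall slots.  It is only a sketch, not the kernel or libpthread 
code: the names (upcall_slot, slot_alloc, slot_free, block_lwps) are made up 
for illustration, and a plain array scan stands in for the copyin() loop.  
The one thing taken from the description above is the fixed pool of 16 and 
the ENOMEM result once it is used up.

```c
/* Toy model of a fixed upcall-slot pool.  All names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NUPCALL_SLOTS 16        /* fixed pool size, as described above */

struct upcall_slot {
    bool in_use;                /* held until the blocked LWP is unblocked */
    int  lwp_id;                /* hypothetical owner, for illustration */
};

struct upcall_slot pool[NUPCALL_SLOTS];

/* Linear scan for a free slot.  Returns 0 on success, ENOMEM if none. */
int slot_alloc(int lwp_id, struct upcall_slot **out)
{
    for (size_t i = 0; i < NUPCALL_SLOTS; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = true;
            pool[i].lwp_id = lwp_id;
            *out = &pool[i];
            return 0;
        }
    }
    return ENOMEM;              /* every slot is pinned by a sleeping LWP */
}

/* Release a slot; in the real system this needs another upcall to happen. */
void slot_free(struct upcall_slot *s)
{
    assert(s->in_use);
    s->in_use = false;
}

/* Block n LWPs and count how many BLOCKED upcalls could not get a slot. */
int block_lwps(int n)
{
    int failed = 0;
    struct upcall_slot *s;

    for (int i = 0; i < n; i++)
        if (slot_alloc(i, &s) != 0)
            failed++;
    return failed;
}
```

In this model, once 16 LWPs are asleep every further slot_alloc() fails, 
including the one needed to deliver the upcall that would free a slot.  That 
circular dependency is the deadlock in the first bullet.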

There are two apparent solutions to this:

1) Allocate an upcall stack for every LWP, and a few extra for things like 
SA_UPCALL_SIGEV.  Unfortunately, we don't actually know how many LWPs there 
are, and more may be added dynamically, so this is problematic.  It's also 
wasteful.  OTOH, having it scale down when there are fewer LWPs would reduce 
resource waste in things like nslookup.

2) Completely rework how state is stored.  I suggest allocating a save area at 
the bottom of each thread stack, where we already store the pthread 
structure, and reworking the interface so that we only ever save the thread 
state once, no matter how many different types of upcalls we're sending to 
it.  This would eliminate the extra memory use, and would also make the 
upcall mechanism faster, by eliminating the silly stack allocation mechanism.
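
One possible shape for option 2, as a minimal userland sketch.  Everything 
here is an assumption made for illustration: the struct layout, the UP_* 
flags (stand-ins, not the real SA_UPCALL_* values), and the post_upcall() and 
take_upcalls() helpers.  The idea it tries to show is just that each thread 
owns one save area, the state is captured once, and additional upcall types 
only set bits in a pending mask.

```c
/* Sketch of a per-thread save area.  All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum upcall_type {              /* stand-ins for the real upcall types */
    UP_BLOCKED   = 1,
    UP_UNBLOCKED = 2,
    UP_SIGEV     = 4
};

struct cpu_state {              /* placeholder for the machine context */
    uintptr_t pc;
    uintptr_t sp;
};

struct thread_save_area {       /* would live next to the pthread structure */
    struct cpu_state state;     /* saved at most once per interruption */
    bool     state_valid;
    unsigned pending;           /* bitmask of upcall types to deliver */
    unsigned saves;             /* counter, only here to show the behaviour */
};

/* Queue an upcall; the thread state is copied only on the first one. */
void post_upcall(struct thread_save_area *sa, enum upcall_type t,
                 const struct cpu_state *cur)
{
    if (!sa->state_valid) {
        sa->state = *cur;
        sa->state_valid = true;
        sa->saves++;
    }
    sa->pending |= (unsigned)t;
}

/* Hand all pending upcalls and the saved state to the handler, then reset. */
unsigned take_upcalls(struct thread_save_area *sa, struct cpu_state *out)
{
    unsigned p = sa->pending;

    assert(p == 0 || sa->state_valid);
    if (p != 0)
        *out = sa->state;
    sa->pending = 0;
    sa->state_valid = false;
    return p;
}
```

Because the save area belongs to the thread, there is no pool to scan and 
nothing to run out of: the number of save areas always equals the number of 
threads.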

Anyone have further insights on this?