Subject: SA, libpthread and page faults
To: None <tech-kern@NetBSD.org>
From: Stephan Uphoff <ups@stups.com>
List: tech-kern
Date: 08/13/2003 14:30:25
Hi,

I am looking at what needs to be done to replace the current kernel 
part of Scheduler Activations (SA).

Unfortunately I see some general problems with the current
SA interface and page faults that I would like to resolve 
first.

While these problems are in libpthread, I believe that they cannot
be solved without some help from the kernel.

Here are some problems:
 
1) No upcall stack is left for the UNBLOCKED upcall of the
   pthread__runqueue_lock holder.

   Example:
   A thread holding the spin-lock pthread__runqueue_lock blocks due
   to a page fault.
   When the thread blocks, sa_switch sends an SA_BLOCKED upcall.
   If the SA_BLOCKED upcall used the last upcall stack, the system
   will probably deadlock because:
	- Upcalls will block trying to acquire pthread__runqueue_lock
	  and will not return upcall stacks to the kernel.

	- The thread holding the pthread__runqueue_lock spin-lock cannot
	  resume because its UNBLOCKED upcall needs an upcall stack.
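
   To make the sequence concrete, here is a toy model of it in C. It is
   only a sketch: struct sa_model and all function names are made up for
   illustration and are not part of the kernel or libpthread, and the
   model assumes that an upcall returns its stack only after it has
   taken pthread__runqueue_lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in NetBSD. */
struct sa_model {
    int  free_stacks;      /* upcall stacks the kernel can still use */
    bool runqueue_locked;  /* pthread__runqueue_lock is held */
};

/* Kernel side: an upcall can only be delivered on a free stack. */
static bool deliver_upcall(struct sa_model *m)
{
    if (m->free_stacks == 0)
        return false;
    m->free_stacks--;
    return true;
}

/* User side: an upcall returns its stack only once it gets the lock. */
static bool finish_upcall(struct sa_model *m)
{
    if (m->runqueue_locked)
        return false;       /* still spinning; stack stays in use */
    m->free_stacks++;
    return true;
}

/* The lock holder page-faults. Returns true if the model deadlocks. */
static bool fault_while_holding_lock(int stacks)
{
    struct sa_model m = { .free_stacks = stacks, .runqueue_locked = true };

    /* The kernel sends the BLOCKED upcall for the lock holder. */
    if (!deliver_upcall(&m))
        return true;

    /* The BLOCKED upcall spins on the lock and keeps its stack. */
    bool returned = finish_upcall(&m);
    assert(!returned);
    (void)returned;

    /* The fault is resolved; the UNBLOCKED upcall needs a stack too. */
    if (!deliver_upcall(&m))
        return true;        /* the lock holder can never resume */

    /* The holder resumes and drops the lock; both upcalls finish. */
    m.runqueue_locked = false;
    finish_upcall(&m);
    finish_upcall(&m);
    assert(m.free_stacks == stacks);
    return false;
}
```

   In this model fault_while_holding_lock(1) deadlocks, while
   fault_while_holding_lock(2) completes and returns both stacks.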

2) Multiple page faults and the associated BLOCKED / UNBLOCKED upcalls.

   pthread__resolve_locks() runs through the interrupt queue trying
   to finish interrupted upcalls and spin-lock holding threads.

   Unfortunately BLOCKED upcalls break the interrupt chain.
   UNBLOCKED upcalls will later mend the interrupt chain.
   However multiple BLOCKED/UNBLOCKED upcalls can lead to a
   reordered interrupt chain.

   This can cause pthread__resolve_locks() to run a thread that needs to
   acquire a spin-lock that is being held by another thread on the chain.
   (Deadlock)
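
   The ordering problem can be shown with a very simplified model of the
   chain. This is a sketch under assumptions: struct chain_entry and
   chain_deadlocks() are hypothetical names, each entry holds and wants
   at most one lock, and the resolver is assumed to run entries strictly
   front to back.

```c
#include <stdbool.h>
#include <stddef.h>

#define NO_LOCK (-1)

/* Hypothetical, simplified view of one interrupted thread. */
struct chain_entry {
    int holds;  /* spin-lock held when interrupted, or NO_LOCK */
    int wants;  /* spin-lock it will try to take next, or NO_LOCK */
};

/*
 * Walk the chain front to back, running each entry in turn.
 * Returns true if some entry would spin on a lock that is still
 * held by an entry that has not been run yet.
 */
static bool chain_deadlocks(const struct chain_entry *chain, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (chain[i].wants == NO_LOCK)
            continue;
        for (size_t j = i + 1; j < n; j++) {
            if (chain[j].holds == chain[i].wants)
                return true;    /* entry i spins forever */
        }
    }
    return false;
}
```

   With the lock holder ahead of the waiter the walk succeeds; with the
   two entries swapped, as a reordered chain could present them, the
   waiter is run first and never gets the lock.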

3) An UNBLOCKED upcall can overtake a BLOCKED upcall.

    When a BLOCKED upcall (C1) for a thread (A) itself blocks due to a
    page fault, the UNBLOCKED upcall (C2) for thread (A) can resume
    thread (A) before an UNBLOCKED upcall (C3) resumes the BLOCKED
    upcall (C1).

    The BLOCKED upcall (C1) will then try to change the state of
    thread (A), which might not even exist anymore.
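
    A small replay of the events shows why the order matters. Again
    this is only a sketch: the enum names and replay() are invented for
    illustration, and the model assumes thread (A) exits right after it
    is resumed.

```c
#include <stddef.h>

/* Hypothetical states and events; not the real libpthread types. */
enum thr_state { THR_RUNNING, THR_BLOCKED, THR_GONE };
enum event { EV_BLOCKED_UPCALL, EV_UNBLOCKED_UPCALL, EV_THREAD_EXIT };

/*
 * Replay events against thread A in the order user space processes
 * them. Returns the index of the first event that touches a thread
 * that no longer exists, or -1 if the sequence is consistent.
 */
static int replay(const enum event *ev, size_t n)
{
    enum thr_state a = THR_RUNNING;

    for (size_t i = 0; i < n; i++) {
        if (a == THR_GONE)
            return (int)i;      /* stale reference to thread A */
        switch (ev[i]) {
        case EV_BLOCKED_UPCALL:   a = THR_BLOCKED; break;
        case EV_UNBLOCKED_UPCALL: a = THR_RUNNING; break;
        case EV_THREAD_EXIT:      a = THR_GONE;    break;
        }
    }
    return -1;
}
```

    The sequence BLOCKED, UNBLOCKED, exit replays cleanly. If C1 is
    delayed, so that user space sees UNBLOCKED, exit, BLOCKED, the last
    event operates on a thread that is already gone.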

Right now the only idea I have is to sleep normally (no upcalls) on
page faults.

Perhaps by borrowing the PC trick from restartable atomic sequences we
can limit disabling upcalls to the case where the faulting thread is
executing inside libpthread?
(This may require some ugly linking tricks to combine object files?)
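
Roughly, the check I have in mind would look like the sketch below. All
names here (struct pc_range, fault_policy() and the enum) are
hypothetical, and the sketch assumes the relevant libpthread code
occupies one contiguous address range, which is where the linking
tricks would come in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: address range registered by libpthread. */
struct pc_range {
    uintptr_t start;    /* first address inside the range */
    uintptr_t end;      /* first address past the range */
};

enum fault_action {
    FAULT_SEND_UPCALL,  /* normal SA behaviour: BLOCKED upcall */
    FAULT_SLEEP         /* plain sleep, no upcall */
};

static bool pc_in_range(const struct pc_range *r, uintptr_t pc)
{
    return pc >= r->start && pc < r->end;
}

/*
 * Decide how to handle a page fault based on the faulting PC:
 * suppress upcalls only while the thread runs libpthread code.
 */
static enum fault_action
fault_policy(const struct pc_range *libpthread_text, uintptr_t fault_pc)
{
    return pc_in_range(libpthread_text, fault_pc)
        ? FAULT_SLEEP : FAULT_SEND_UPCALL;
}
```

A fault at a PC inside the registered range would sleep without an
upcall; a fault anywhere else would get the usual BLOCKED upcall.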

Any ideas on how to solve these problems would be appreciated.

Thanks
	Stephan