Subject: Re: SA, libpthread and page faults
To: Bill Studenmund <firstname.lastname@example.org>
From: Christian Limpach <email@example.com>
Date: 08/24/2003 21:32:19
Quoting Bill Studenmund <firstname.lastname@example.org>:
> > How about checking if the stack that the lwp which page faulted ran
> > on is an upcall stack?
> I think the problem with this is that there are times when the page
> fault happens in code that is not on an upcall stack yet has locks
> held such that an upcall will block.
I have extended my page fault detection code to detect a second page fault
at the same pc/sp. I have also changed my code which detects page faults
on upcalls because the code I posted earlier didn't work right when a
second upcall was generated before the first one got to the point where it
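
The repeated-fault check can be pictured roughly as follows. This is only a
user-space sketch, not the code from the patch: struct fault_record,
fault_is_repeat() and the field names are hypothetical, and the real check
would live in the kernel's page fault path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-lwp record of the most recent page fault. */
struct fault_record {
	bool      valid;	/* false until the first fault is recorded */
	uintptr_t pc;		/* program counter at the last fault */
	uintptr_t sp;		/* stack pointer at the last fault */
};

/*
 * Record a fault at (pc, sp) and report whether it repeats the previous
 * one, i.e. the lwp faulted again at the same pc/sp.
 */
static bool
fault_is_repeat(struct fault_record *fr, uintptr_t pc, uintptr_t sp)
{
	assert(fr != NULL);

	bool repeat = fr->valid && fr->pc == pc && fr->sp == sp;

	/* Remember this fault for the next comparison. */
	fr->valid = true;
	fr->pc = pc;
	fr->sp = sp;
	return repeat;
}
```

The first fault at a given pc/sp is treated as new; only an immediately
following fault at the same pc/sp is reported as a repeat.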
Next, libpthread can't handle threads which block with locks held. The
UNBLOCKED upcall will trigger asserts which prevent threads which hold
locks from being put on the runqueue. I deal with this by making the
BLOCKED upcall end immediately and reschedule the interrupted thread.
There's no need for an UNBLOCKED upcall for such a thread since it
continues as if it had never been interrupted. There's not much else libpthread
can do about this since the blocked thread is holding locks.
I have implemented rescheduling of the interrupted thread through a new
syscall (sa_unblockyield). It removes the L_SA_BLOCKING flag from the lwp
to unblock, puts that lwp back on the vp, and ends the lwp on which the
upcall was running, since there's nothing more for it to do. The syscall
doesn't return. It also recycles the upcall's stack.
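
The steps the syscall takes in the simple case can be modelled like this.
It is a user-space sketch with made-up types (struct lwp_model, struct
vp_model, unblockyield_model()) and a stand-in value for L_SA_BLOCKING; it
is not the syscall from the patch and only mirrors the order of operations
described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define L_SA_BLOCKING 0x1	/* stand-in value for this sketch only */

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct lwp_model {
	int  lid;		/* lwp id */
	int  flags;		/* L_SA_BLOCKING while the lwp is blocked */
	bool exited;		/* set once the lwp has been ended */
};

struct vp_model {
	struct lwp_model *running;	/* lwp currently on the vp */
	int free_stacks;		/* upcall stacks available for reuse */
};

/* Model of the simple case: the BLOCKED upcall was not interrupted. */
static void
unblockyield_model(struct vp_model *vp, struct lwp_model *upcall_lwp,
    struct lwp_model *blocked)
{
	assert(vp->running == upcall_lwp);
	assert(blocked->flags & L_SA_BLOCKING);

	blocked->flags &= ~L_SA_BLOCKING;	/* unblock the lwp */
	vp->running = blocked;			/* put it back on the vp */
	vp->free_stacks++;			/* recycle the upcall's stack */
	upcall_lwp->exited = true;		/* end the upcall's lwp */
	/* The real syscall does not return to its caller at this point. */
}
```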
Rescheduling the interrupted thread is complicated by 2 things:
- The UNBLOCKED upcall which should be prevented can happen while the
BLOCKED upcall is still running. I deal with this by delaying UNBLOCKED
upcalls until the BLOCKED upcalls have finished. A BLOCKED upcall has
finished when the stack it was running on has been returned to the kernel.
- The BLOCKED upcall can be interrupted by some other upcall. It will be
continued by the interrupting upcall's resolve_locks dance. In this case
the sa_unblockyield syscall has to end the original blocked lwp and
return. The BLOCKED upcall can then be recycled after it puts the
interrupted thread into its pt_switchto which the interrupting upcall will
resume once the BLOCKED upcall switches back to the interrupting upcall.
The double page fault prevention ensures that the interrupted thread will
continue when the interrupting upcall switches to the interrupted thread.
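
The delaying of UNBLOCKED upcalls from the first point might look roughly
like the following. This is a sketch under my own assumptions: struct
upcall_gate and the gate_*() functions are hypothetical names, the pending
list has an arbitrary fixed size, and it does not model dropping the
UNBLOCKED upcall for a thread that was rescheduled through sa_unblockyield.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8	/* arbitrary bound for the sketch */

struct upcall_gate {
	int blocked_in_flight;		/* BLOCKED upcalls whose stack is
					 * not yet back in the kernel */
	int pending[MAX_PENDING];	/* lwp ids with a delayed UNBLOCKED */
	int npending;
};

/* A BLOCKED upcall has been started on some stack. */
static void
gate_blocked_start(struct upcall_gate *g)
{
	g->blocked_in_flight++;
}

/*
 * Ask to deliver an UNBLOCKED upcall for lid.  Returns true if it may be
 * delivered now, false if it was queued behind a running BLOCKED upcall.
 */
static bool
gate_request_unblocked(struct upcall_gate *g, int lid)
{
	if (g->blocked_in_flight == 0)
		return true;
	assert(g->npending < MAX_PENDING);
	g->pending[g->npending++] = lid;
	return false;
}

/*
 * The stack of a BLOCKED upcall came back to the kernel.  Once no BLOCKED
 * upcall is left, copy the delayed lwp ids to out and return their count.
 */
static int
gate_stack_returned(struct upcall_gate *g, int *out)
{
	assert(g->blocked_in_flight > 0);
	if (--g->blocked_in_flight > 0)
		return 0;

	int n = g->npending;
	for (int i = 0; i < n; i++)
		out[i] = g->pending[i];
	g->npending = 0;
	return n;
}
```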
The syscall detects if the BLOCKED upcall was interrupted by comparing the
lwp id of the lwp where the syscall is running (l->l_lid) with the id of
the lwp where the upcall was running (sas->sa_id, passed to the syscall
My test case is running xmms and a program which grabs all available
memory every 10 seconds. With the patch this runs for between 10 and 30
minutes before xmms eventually crashes with signal 11. I haven't yet
found the cause for this. Without the patch xmms crashes after 10-40
The patch only includes support for i386:
- arch/i386/include/mcontext.h: define _UC_MACHINE_PC to get the pc from a
- arch/i386/i386/trap.c: flag page faulting lwps with L_SA_PAGEFAULT
Other archs won't compile because _UC_MACHINE_PC is not defined.
- detecting that a page fault was on an upcall: The code assumes that the
initial sys_sa_stacks will include all the upcall stacks ever used and that
the stacks are in a contiguous space. This is the case with libpthread.
- unused lwps are lwp_exit'ed: they should probably be put in the cache.
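
Under the contiguous-stacks assumption from the first point, the "was this
fault on an upcall stack" test reduces to a single range check. The sketch
below uses hypothetical names (struct stack_region, sp_on_upcall_stack())
and treats the region as the half-open range [base, base + size); it is not
taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the contiguous upcall stack region. */
struct stack_region {
	uintptr_t base;		/* lowest address of the first stack */
	size_t    size;		/* total bytes covered by all stacks */
};

/* Return true if sp lies inside the region [base, base + size). */
static bool
sp_on_upcall_stack(const struct stack_region *r, uintptr_t sp)
{
	assert(r != NULL);

	/* Comparing the offset avoids overflow in base + size. */
	return sp >= r->base && sp - r->base < r->size;
}
```

If stacks were ever added later or allocated non-contiguously, a single
range like this would no longer be enough and each stack would have to be
checked separately.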
The patch is available at:
You'll need to run "make includes" in sys/sys and include, and then build
and install a new libc and libpthread.
Comments are appreciated.
Christian Limpach <email@example.com>