Subject: Re: Question about sa_upcall_userret() and sa_makeupcalls()
To: Bill Stouder-Studenmund <wrstuden@netbsd.org>
From: Bill Stouder-Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 05/18/2007 11:42:28

On Fri, May 18, 2007 at 02:52:31PM +1000, Daniel Carosone wrote:
> On Thu, May 17, 2007 at 09:23:12PM -0700, Bill Stouder-Studenmund wrote:
> > What I'm not sure about is what actually causes the upcalls to get
> > processed. I see where we set the stack pointer and pc in the frame on
> > return. So I readily see how ONE of these upcall stacks will get run. But
> > what causes any others we generate to also run?
>
> My murky understanding:
>
>  Each of the upcall stacks is in a different libpthread thread, and
>  the libpthread user-level scheduler is responsible for noticing these
>  threads need to run and process an upcall.

Ok! That makes sense. Also, there's a bit of cleverness in the code. When
we block, we deliver all pending upcalls. We walk the list from front to
back. We also add the upcall for blocking to the end of the list, so it
will be the last one written to userland. That one is the one that
actually gets run when the upcall-delivery thread goes to userland. Thus
it gets delivered first.
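
Roughly the shape of it, in made-up code (none of these names are the real
kern_sa.c identifiers, and deliver_one() is just a stand-in for writing an
upcall frame out to userland):

#include <sys/queue.h>

struct sa_upcall_ent {
	SIMPLEQ_ENTRY(sa_upcall_ent) sau_next;
	int sau_type;			/* BLOCKED, UNBLOCKED, ... */
};

static SIMPLEQ_HEAD(, sa_upcall_ent) pending_upcalls =
    SIMPLEQ_HEAD_INITIALIZER(pending_upcalls);

static void
deliver_one(struct sa_upcall_ent *sau)
{
	/* stand-in for copying one upcall frame out to userland */
	(void)sau;
}

static void
deliver_on_block(struct sa_upcall_ent *blocked)
{
	struct sa_upcall_ent *sau;

	/* The upcall announcing the blockage goes on the *end*... */
	SIMPLEQ_INSERT_TAIL(&pending_upcalls, blocked, sau_next);

	/*
	 * ...and the list is written out front to back, so the BLOCKED
	 * upcall is the last frame written.  Its stack and pc are what
	 * land in the trapframe on return, so it's the first upcall
	 * userland actually runs.
	 */
	SIMPLEQ_FOREACH(sau, &pending_upcalls, sau_next)
		deliver_one(sau);
}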

>  If the upcall in question tells libpthread that the lwp has been
>  blocked, the upcall stack doesn't get recycled to process more
>  upcalls until the lwp later unblocks.

I don't think that's fully correct.

There are two different stacks involved. There's the stack of the thread
that was running, made a system call, and got blocked, and then there's
the stack for the upcall telling libpthread about the blockage.

The blocked thread's stack does stay blocked until another upcall tells
libpthread that it can run. However the upcall stack is only active while
that upcall's being delivered. In the case of a blocked upcall, we do very
little work. Look at pthread_sa.c:pthread__upcall(). We gather some data
from the call parameters, we find what was interrupted, we play with stack
generations, cancel the BLOCKED thread if needed, check to see if we got
interrupted and need to exit to another upcall (that's what self->pt_next
is about AFAIK), and then go into the libpthread scheduling code.

After that, the upcall stack is not needed and can be re-donated to the
kernel.

So the window of difficulty is much smaller.
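
To make the "very little work" concrete, here's roughly the blocked case as
I read it. This is an outline from memory and every name in it is my own
shorthand, so go look at pthread_sa.c for the real thing:

/* All types and names below are invented for illustration only. */
struct fake_pthread {
	struct fake_pthread *pt_next;	/* non-NULL if this upcall thread
					 * was itself preempted by a
					 * later upcall */
	int pt_cancel;			/* cancellation pending? */
	int pt_stackgen;		/* stack generation counter */
};

static void
upcall_blocked_outline(struct fake_pthread *self, struct fake_pthread *blocked)
{
	/*
	 * 1. Work out from the upcall arguments which thread is now
	 *    BLOCKED (the real code digs this out of the sa_t array).
	 */

	/* 2. Play with stack generations so stale state isn't trusted. */
	blocked->pt_stackgen++;

	/* 3. Handle cancellation of the BLOCKED thread if needed. */
	if (blocked->pt_cancel) {
		/* cancellation handling elided */
	}

	/* 4. If a later upcall interrupted us, chain to it instead. */
	if (self->pt_next != NULL)
		return;		/* the real code switches to self->pt_next */

	/*
	 * 5. Fall into the libpthread scheduler.  From here on this
	 *    upcall stack is idle and can be re-donated to the kernel;
	 *    only the blocked thread's own stack stays tied up.
	 */
}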

>  That later unblocking requires another upcall to be made. Although we
>  send unblocked events in a batch, we may have run out of upcall frames
>  to send that batch to, before something unblocked.
>
> This leads to the problems described in
>  http://mail-index.netbsd.org/tech-kern/2005/01/02/0001.html

I agree that running out of stacks is VERY VERY bad. As in we may just
want to kill the app. :-)

We let libpthread offer the kernel 16 upcall stacks times the max
concurrency level. I think 16 upcalls per CPU is a lot.

One thing I hadn't realized is that each upcall stack we offer to the
kernel is the stack of a thread in libpthread. So we have up to 16 threads
just for receiving upcalls (I hope we both allocate the max and also make
sure we scale it w/ concurrency).
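
Back-of-the-envelope, the upcall thread/stack pool then works out like this
(the constant and names are mine, from memory, not the real pthread_int.h
ones):

/* Illustration only; check pthread_sa.c for the real sizing logic. */
#define UPCALL_STACKS_PER_VP	16	/* stacks offered per unit of
					 * SA concurrency (per CPU) */

static int
upcall_stack_count(int sa_concurrency)
{
	/* each of these stacks has a dedicated libpthread thread
	 * behind it, sitting there just to receive upcalls */
	return UPCALL_STACKS_PER_VP * sa_concurrency;
}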

My understanding is that when we deliver upcalls, we fill a stack with an
upcall (or all the unblocking ones for an unblocking upcall). The act of
doing this makes it such that libpthread knows it should run said thread.
We then cause one of them to be run. When this thread finishes whatever
it's doing, it will switch to another thread. My understanding is that it
will prefer to run one of the other upcall-receiving threads. So upcalls
should get processed quickly.

The problem I can see is that bad things can happen if we are in a
critical section of code and trigger the recovery code (the self->pt_next
stuff).

I can see a potential problem in that we can try to deliver upcalls (fill
upcall stacks up) even when libpthread isn't in a position to deal with
them.

I'd love a reproducible test case that runs into this blocking, so we can
figure out what's really wrong.

One thing I wonder about is whether we want to add rate limiting to upcall
generation. For instance, if we are running low on stacks, maybe we should
stop delivering unblocked upcalls. If libpthread is having trouble getting
us stacks back, it's probably busy doing something in upcall-handling
code. So let's let it finish what it's doing. Note: I am naively assuming
that libpthread will be proactive about getting us stacks back.
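
Very roughly, I'm picturing something like this in the delivery path (every
identifier here is invented for the sake of the example; it's not how
sa_makeupcalls() is actually structured):

#define UPCALL_BLOCKED		1	/* invented constants */
#define UPCALL_UNBLOCKED	2

#define STACK_LOW_WATER		2	/* arbitrary threshold */

static int
should_deliver_now(int upcall_type, int free_upcall_stacks)
{
	/*
	 * BLOCKED (and similarly urgent) upcalls always go out, or
	 * libpthread can't make scheduling progress at all.
	 */
	if (upcall_type != UPCALL_UNBLOCKED)
		return 1;

	/*
	 * When we're nearly out of stacks, hold UNBLOCKED upcalls back
	 * and let libpthread finish what it's doing and return stacks,
	 * rather than burning the last few stacks on notifications it
	 * has no time to process anyway.
	 */
	return free_upcall_stacks > STACK_LOW_WATER;
}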

My understanding is that we will deliver upcalls whenever we return to
userland. Thus an interrupt can trigger upcall delivery when we go back.
While we batch unblocked upcalls as much as possible, it could be that
hardware hands us a stream of interrupts that each unblock a thread or
two. Further, it could be that they are spaced just so that they come
right when we've started processing an upcall, like in the first few
instructions of pthread__upcall(). We could then use up all our stacks
delivering UNBLOCKED upcalls, not giving libpthread time to actually
process the info.

As above, I wish I had a good test case to repro this so I can be sure
that's what's wrong. I'd rather we not fix non-problems. :-)

> I can't recall what was done to resolve this issue, if
> anything, between when the message was written and when SA was
> removed (or, more significantly, when -4 was rebranched).

I'm not sure about the deadlock described in that post. I'll add it to the
look-at list.

> Two suggestions were made in the reference email.  A third possibility
> might be to rework the userlevel scheduler, such that the stack that
> receives the SA_UPCALL_BLOCKED upcall can be recycled immediately,
> rather than holding on to it until the LWP unblocks.  I have no idea
> how the libpthread scheduler would be affected by this, or why it
> behaves as it does in the first place, though.

I think that we already have the ability to recycle upcall stacks quickly,
so I think the problem is something else.

Another thing I'll add to the look-at list is to see how we decide when we
want to hand stacks back to the kernel, and more importantly whether we
consider how many we have to deliver or how many the kernel has waiting.
I'm not sure. It could be that we still want to wait; we just want to
finish processing the pending upcalls we have first.
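
The sort of check I have in mind would look something like this (the
bookkeeping is invented for illustration; the actual hand-back would
presumably still go through sa_stacks(2)):

/* Illustration only, not the real pthread_sa.c bookkeeping. */
#define STACKS_TO_KEEP	2	/* arbitrary reserve for upcalls we
				 * are still in the middle of handling */

static int
stacks_to_return(int free_stacks, int upcalls_still_pending)
{
	/*
	 * Finish processing the upcalls we already have before giving
	 * anything back...
	 */
	if (upcalls_still_pending > 0)
		return 0;

	/* ...then return everything above a small reserve. */
	if (free_stacks <= STACKS_TO_KEEP)
		return 0;
	return free_stacks - STACKS_TO_KEEP;
}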

Take care,

Bill
