Re: revivesa status 2008/07/09

To: Mindaugas Rasiukevicius <rmind%NetBSD.org@localhost>
Subject: Re: revivesa status 2008/07/09
From: Bill Stouder-Studenmund <wrstuden%netbsd.org@localhost>
Date: Thu, 24 Jul 2008 21:37:45 -0700

On Thu, Jul 24, 2008 at 10:57:16PM +0100, Mindaugas Rasiukevicius wrote:
> Bill Stouder-Studenmund <wrstuden%netbsd.org@localhost> wrote:
> > > Imagine a case of 1000 (new) pthreads which block - that would mean:
> > > 1000 * (LWP creation + SA context switch) operations. Plus, LWPs for
> > > VPs...
> > 
> > That would be the exact same LWP usage as a 1:1 threading model would 
> > give. The SA process spends the time creating the LWPs between blocking 
> > events while the 1:1 process created all of the same LWPs at initial 
> > thread creation time.
> 
> Not exactly. To create LWP when blocking (that is, switching the context) SA
> invents a lot of complexity, and hacks (eg. locking against order). Also,
> inventing the limits on such flow is harder.

It's not that complicated. Yes, we have to grab an additional lock in
sa_switch(). However "locking against order" is a bit of a stretch as I've
not seen a locking hierarchy of how you lock multiple sleepq locks at
once. (*) So if we can't lock the vp (and thus the lwp that we need to
wake up), we do the unlock/relock dance. We then make sure we haven't been
woken up in the process. The lock we're grabbing is one we should rarely
find locked.

(*) I've seen the locking ordering in kern_lwp.c, but my reading of it was
it speaks of one lwp lock then other locks, like runqueue locks, etc. The
specific issue here is our lwp is on a sleepq, so our lwp lock is a sleepq
lock. The sleeper thread we want to wake is also on another sleepq, so its
lwp lock is a sleepq lock. And I saw no global hierarchy of them. If there
were one, I'd have made it so that the lock we need to take is something
we could safely take w/o deadlock. Assuming I understood it correctly. :-)
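
For anyone who wants to see the shape of the unlock/relock dance, here is
a minimal userland sketch using pthread mutexes. It is not the sa_switch()
code: struct sleeper, its awake flag, and lock_other() are names made up
for illustration, and the mutexes only stand in for sleepq locks. The idea
is that we never block on the second lock while holding the first, and
that we re-check for a wakeup every time we retake our own lock.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-in for an LWP sitting on a sleep queue. */
struct sleeper {
	pthread_mutex_t lock;	/* plays the role of the sleepq lock */
	bool awake;		/* set by whoever wakes this sleeper */
};

/*
 * Called with self->lock held. Tries to take other->lock too.
 * Returns true with both locks held, or false with only self->lock
 * held if self was woken while its lock was dropped.
 */
static bool
lock_other(struct sleeper *self, struct sleeper *other)
{
	assert(self != other);

	for (;;) {
		/* Common case: the lock we want is rarely contended. */
		if (pthread_mutex_trylock(&other->lock) == 0)
			return true;

		/* Contended: drop our lock, let the holder run, retake it. */
		pthread_mutex_unlock(&self->lock);
		sched_yield();
		pthread_mutex_lock(&self->lock);

		/* Someone may have woken us while we were unlocked. */
		if (self->awake)
			return false;
	}
}
```

A caller that gets false back would abandon the switch, since it is no
longer asleep. Spinning with sched_yield() is just the simplest way to
show the pattern in userland; it is an assumption of this sketch, not
something the kernel code is claimed to do.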

Also, some aspects of "hackishness" in the code stem from the fact that
the rest of the kernel hasn't been changed much to better support the SA
code. I know there is one place where I had to copy code rather than use
existing routines because each of them was almost-but-not-fully what was
needed. The main thing is that the SA code wants to manage a pool of
threads, so it needs to be able to take other threads (not curlwp) and add
them to and remove them from sleepqueues. Or SA needs to do something in 
the middle of un-sleeping something on its sleep queues. So I have a 
"first half" and "second half" routine.
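
To make the "first half" / "second half" split concrete, here is a rough
sketch. The types and function names (struct thread, struct sleepq,
unsleep_first_half(), unsleep_second_half()) are hypothetical and far
simpler than the kernel's; the point is only that taking a thread off a
queue and making it runnable become two separate steps, so the caller can
do its own work in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's sleepq or LWP types. */
struct thread {
	struct thread *next;
	bool runnable;
};

struct sleepq {
	struct thread *head;
};

/* First half: unlink t from the queue, but do not make it runnable yet. */
static bool
unsleep_first_half(struct sleepq *q, struct thread *t)
{
	for (struct thread **pp = &q->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			t->next = NULL;
			return true;
		}
	}
	return false;	/* t was not on this queue */
}

/* Second half: finish the wakeup once the caller's extra work is done. */
static void
unsleep_second_half(struct thread *t)
{
	assert(t->next == NULL);	/* must already be off its queue */
	t->runnable = true;
}
```

A caller would run the first half, do whatever bookkeeping it needs on
the now-unqueued thread, and then run the second half.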

About limiting, there is code in the "start up the new LWP to report 
blocking" routine that handles failure to set the new thread up (this 
means getting a stack for the upcall we're delivering about blocking, 
allocating a new upcall data structure, and allocating a new LWP). If that 
happens, we put our upcall data structure back on the vp, mark the lwp 
that was the blessed one and blocked as the blessed lwp again, and put 
ourselves into the lwp cache. Simply put, we undo the upcall triggering 
and turn the blocking into a no-upcall block. So if the routine to make a 
new lwp fails, we will enter this. It will be slow and expensive, but it 
should work.
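
The undo path described above looks roughly like the following sketch.
Everything here is a simplified assumption for illustration: struct vp,
struct upcall, struct lwp, the single-slot "cached" field, the alloc
callback, and fail_alloc() are invented names, not the real SA data
structures or routines. It only shows the order of the rollback: give the
upcall data back to the vp, re-bless the LWP that blocked, and park
ourselves in the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the SA bookkeeping. */
struct lwp { int id; };
struct upcall { void *stack; };
struct vp {
	struct upcall *spare_upcall;	/* upcall data parked on the vp */
	struct lwp *blessed;		/* LWP currently blessed for the vp */
	struct lwp *cached;		/* one-slot stand-in for the LWP cache */
};

typedef void *(*alloc_fn)(size_t);

/* Helper for exercising the failure path: an allocator that always fails. */
static void *
fail_alloc(size_t n)
{
	(void)n;
	return NULL;
}

/*
 * Runs as "self", the LWP started to report that "blocked" went to sleep.
 * Returns true if the upcall can go ahead, or false if setup failed and
 * the block has been turned back into a plain no-upcall block.
 */
static bool
start_blocked_upcall(struct vp *vp, struct lwp *self, struct lwp *blocked,
    alloc_fn alloc)
{
	assert(vp->spare_upcall != NULL);

	/* Claim the parked upcall and become the blessed LWP. */
	struct upcall *up = vp->spare_upcall;
	vp->spare_upcall = NULL;
	vp->blessed = self;

	/* Three things can fail: upcall stack, new upcall data, new LWP. */
	void *stack = alloc(4096);
	struct upcall *next_up = stack ? alloc(sizeof(*next_up)) : NULL;
	struct lwp *next_lwp = next_up ? alloc(sizeof(*next_lwp)) : NULL;

	if (next_lwp == NULL) {
		/* Undo: release what we got, restore the vp, park ourselves. */
		free(next_up);
		free(stack);
		vp->spare_upcall = up;
		vp->blessed = blocked;
		vp->cached = self;
		return false;
	}

	/* Success: attach the stack and restock the vp for the next block. */
	up->stack = stack;
	next_up->stack = NULL;
	next_lwp->id = 0;
	vp->spare_upcall = next_up;
	vp->cached = next_lwp;
	return true;
}
```

Calling it with fail_alloc should leave the vp exactly as it was before
the attempt, apart from self now sitting in the cache slot; calling it
with malloc takes the normal path.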

> > One other thing to consider is how long different context switches take. 
> > The two important ones are intra-process-same-space switches (inter-LWP in 
> > the kernel and inter-thread in SA userland) and user-kernel switches. When 
> > I was starting the Wasabi iSCSI target, I asked around before we used (SA) 
> > pthreads to implement this. I asked a number of NetBSD threading folks 
> > about this.
> > 
> > The answer I was given was that user-kernel switches are NOTABLY more 
> > expensive. Like 10x. Their numbers, not mine. So while SA is adding extra 
> > steps, they are steps that aren't the most expensive thing around.
> 
> But well.. what Andrew said - let's rather spend time optimising the context
> switch on such architectures like ARM - that would give overall benefit.

Do you really think we can get a 10x reduction in the amount of time it
takes to get in and out of the kernel? I'd love it if we could, but my
understanding is that the numbers I was given were things that people had
spent a fair amount of time working on before I asked the question, so
there is not much time left to squeeze out. The hardware is going to need a
certain amount of time, and there's only so much that can be done about
it.

Put another way, people don't complain about system calls on ARM because 
all the OSs are stupid, they complain because of how the hardware handles 
it.

Also, we're a volunteer project. As such, it's not clear to me that doing
one thing really means not doing another thing. Since I'm personally much
more interested in SA than in improving ARM context switching (and to be
blunt, I feel I have no skill at the latter), me doing this isn't holding
us back.

> > What I don't understand, though, is why we're discussing this issue like 
> > this. I don't see what the NetBSD kernel loses by having both 1:1 AND SA 
> > threading support. While the SA code is a fresh port, it is a fresh port 
> > of the NetBSD 4 code. So it actually is something we're familiar with as a 
> > project. People on this list have shown that SA does better on some work 
> > loads, and other people have shown (quite spectacularly) that 1:1 performs 
> > stunningly.
> 
> Bringing SA back invents more than 3000 lines of very complicated code. Why?
> 
> - To support specific backwards compatibility which we never actually
>   supported (see what Andrew and Jason wrote).

That's two voices, three if we add you. What did everyone else say? Almost 
all the other comments I've heard have supported this. We used to make 
exactly this promise, and we never clearly decided not to. It would be 
fine to ship if we had no alternative, but we do have an alternative.

Note, Jason's saying we should shift what our promise is about
compatibility. That seems to me to clearly indicate what our promise used
to be. I think he's right that static libpthread is something that should 
be touched with care (and generally avoided), and compatibility at the 
.so level may be a better thing to do overall. But that's not what the 
promise has been.

Also, please pay attention to the comments about the practicalities of 
this. Keeping a promise at the .so level is one thing if we support an 
update model like say Mac OS. In it, you update the whole OS, then carry 
forward using your existing apps linking against new libraries & living 
under the new kernel.

That's not what we've done in the past, and most importantly, it's not
what our users are used to doing. In the past, the new kernel would
support the old libraries. This makes a difference in chroot environments
and in kernel-only upgrades.

Finally, we've tested the code. Yes, it's a port to our new kernel, but 
it's a port of code we've been shipping for three previous releases. We as 
a project have a feel for what it does and doesn't do well.

> - To support theoretical performance for some workload, where seems nobody in
>   this mailing-list can provide a prove-of-concept test application, or even
>   a reasonable SA benchmark. And no - "I saw a benchmark" or 5 years old
>   graph about NPTL, unfortunately, does not say anything...

Listen to the people who are saying this. A good number of them are people
that I have come to pay attention to on this list. If they bother to say 
something, it usually is worth listening to.

> Looks ironical. Especially when people arguing more from belief, instead of
> saying: "Hey, here is the example of real-world application which works with
> SA much better - let's try it!"

Conversely you can't _prove_ that there aren't real-world cases where SA 
would do better. :-)

> But again, the main thing which makes me upset is adding thousands of lines
> to improve few percent of theoretical cases. This breaks one of the main
> software engineering principles. I thought it is not the way NetBSD goes...

But it is. We've supported extreme amounts of backwards compatibility. We 
have options to turn on system call compatibility going back to NetBSD 
0.9. I've never used NetBSD pre-1.2 (I think. It was late 1995).

Also, I think it'd be kinda exciting for us to be one of the few OSs able 
to support both threadings. :-)

I also wish you were not so upset by this. Your help has been invaluable, 
and your comments have led to a MUCH better SA than we would have had 
otherwise. :-) To be blunt, without your comments, KERN_SA would not be 
something we could be talking about integrating. Thank you for this 
assistance.

Take care,

Bill

