Re: revivesa status 2008/07/09

To: Andrew Doran <andrew%hairylemon.org@localhost>
Subject: Re: revivesa status 2008/07/09
From: Bill Stouder-Studenmund <wrstuden%netbsd.org@localhost>
Date: Fri, 11 Jul 2008 11:13:31 -0700

On Fri, Jul 11, 2008 at 12:47:45PM +0000, Andrew Doran wrote:
> On Thu, Jul 10, 2008 at 11:39:12AM -0700, Bill Stouder-Studenmund wrote:
> > 
> > That's a separate discussion. The goal of the revivesa branch is to 
> > implement support for the Scheduler Activations system calls used by the 
> > SA libpthread in NetBSD 4.x.
> 
> It has to be said, I have not seen a convincing explanation as to why this
> is desirable or in the best interests of NetBSD as a product.
> 
> SA threading in NetBSD has serious problems and drawbacks. For example:
> 
> - it works only on a handful of architectures, e.g. x86.

I have heard conflicting reports about this. We have architecture-specific 
code for all architectures (unless we added a CPU arch since 4.0). The 
only ones that I know are questionable are the sparc ones, because no one 
has gotten around to fully fixing register window saving. Chuq made 
it better a few years ago, but I think there may still be issues. 
Specifically, I think register windows are saved to a process-wide 
area, not a per-thread (or per-vp) one.

I've heard people say it only works on x86, but no one has come up with
test cases showing how it fails. As time permits, I've been responsive to SA
bugs once a test case shows up, but I need a way to poke and prod at the
problem first. Also, I may need help from others. I'm really good at the C
level, and I knew 68k assembler once, but I'm at a loss at the asm level
of other CPUs.

Also, any tests that failed need to be re-run, as the first bug fixed
in fixsa (allocating memory in ltsleep()) was REALLY REALLY bad: it
could lead to lost interrupts and could really, really mess things
up.

> - in most tests its performance is demonstrably inferior to 1:1.

In most tests we've seen, yes. The problem I see with the tests we've
done so far is that none of them were run against this SA. You've done
so much work on the kernel that we also need to take that into account; a 
fair comparison needs to run the two systems on the same kernel.

Also, the SA libpthread has not been optimized for speed the way you've
worked on the 1:1 libpthread. A number of the changes can probably be
transferred, but that'd be a post-5.0 project. So yes, it will most likely
not do as well. And the 4.0 libpthread never was heavily tested for 
concurrency. I've tested it some, but I would not be surprised if there 
are still bugs.

However, I at least have not been pitching SA as the default libpthread, 
so I'm not sure why its performance matters much to this discussion.

Finally, you've tested the kind of apps, like databases, that are tuned for
device-limited performance. To be honest, you (ad) have kicked butt in
this area. Thank you!! The problem I see with this comparison, however, is
that to make such an app work well, you need to reduce or remove contention.
Yet thread contention is exactly the case that SA is optimized for. So by
definition, you've moved to a case that SA isn't optimized for.

I think the place where SA could do better than 1:1 is limited, but it
exists. I think it's an app that was written with the attitude of using
threads to solve all problems. And/or an app that has a lot of different
libraries pulled into it. If you're writing a library, an easy way to do
the things you need to do is to create your own thread or two, and have
them fix it all. Pull a number of these libraries into one app, and you
can have buckets of threads. Do this on a small number of CPUs (yes, that
is less and less likely as time goes by, but it happens), and you can end
up in SA's realm. Work patterns might not put you there (i.e. only a
thread or two stays hot), but they might.

> - it's completely unreliable, even opening the machine to DOS attacks.

How is it "completely" unreliable? At Wasabi, our iSCSI target used a
libpthread that was identical to the shipping libpthread, and it was very 
solid.

Yes, both of us have been party to a bug about what happens when we run
out of stacks, and that can serve as a DOS. That issue is, however, 
resolvable. My instinct is that we should just kill the program at that 
point. Another (possibly parallel) option is to require that a number of
stacks have been given to the kernel before increasing concurrency, say 8
stacks per unit of concurrency. If you don't give the kernel 8 stacks, no
SA. If you don't give it 16, no concurrency of 2, etc.
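The stack-count rule and the "just kill it" policy suggested above are simple enough to express as a check. What follows is only a minimal sketch of that policy, not kernel code: struct sa_proc, its fields, SA_STACKS_PER_CONCURRENCY, sa_may_set_concurrency() and sa_take_upcall_stack() are hypothetical names invented for illustration and are not part of NetBSD.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy constant: "say 8 stacks per concurrency". */
#define SA_STACKS_PER_CONCURRENCY 8u

/* Hypothetical per-process SA bookkeeping; NOT the real NetBSD structure. */
struct sa_proc {
	unsigned nstacks_given;	/* upcall stacks the process has registered */
	unsigned nstacks_free;	/* registered stacks not currently in use */
};

/* Highest concurrency the registered stacks allow (0 means no SA at all). */
static unsigned
sa_max_concurrency(const struct sa_proc *sp)
{
	return sp->nstacks_given / SA_STACKS_PER_CONCURRENCY;
}

/* May the process run with `want` virtual processors? */
static bool
sa_may_set_concurrency(const struct sa_proc *sp, unsigned want)
{
	return want >= 1 && want <= sa_max_concurrency(sp);
}

enum sa_upcall_action { SA_DELIVER, SA_KILL };

/*
 * Called when an upcall needs a stack. Rather than trying to keep the
 * app going, this sketch follows the "just kill it" suggestion.
 */
static enum sa_upcall_action
sa_take_upcall_stack(struct sa_proc *sp)
{
	if (sp->nstacks_free == 0)
		return SA_KILL;
	sp->nstacks_free--;
	return SA_DELIVER;
}
```

With 16 stacks registered, sa_may_set_concurrency() allows a concurrency of 2 but not 3; with only 7 stacks it refuses even 1, i.e. no SA. Once every registered stack is in use, the next upcall gets SA_KILL instead of an attempt to limp along.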

You're right that what we do in the face of resource exhaustion is something 
that we can do better. The code in the past has tried to keep the app 
going, but maybe we should just kill it.

If we want to add some sort of resolution to this issue as a requirement 
for revivesa, that can be done.

> - it has architectural, code quality and code maintenance issues.

It definitely had them. I've tried to make it a lot better in that respect. To 
be perfectly honest, it was hideous before. There are more words in this 
email than there were comments in the code. I should go through and add 
more comments, and I'd like to add a description of thread life-cycles, so 
there's even more. If there's something you'd like described more, please 
let me know.

I know you have thoughts on how signal handling should be different, and
I'm still missing part of your reasoning there. Other than that, given that we have chosen SA
by virtue of using it, I'm not sure what architectural issues remain.

> - it completely lacks any kind of real-time support.

So? We have perfectly excellent real-time support in the 1:1 threading. 

Adding real-time support would require one of the two closed-source 
real-time SA libpthread modifications that are out there. We don't have 
them, however.

While I would love the SA libpthread to be 100% feature compatible with 
the 1:1 libpthread, I have not had the time to do that. That's why 
revivesa was very specifically targeted at adding system call support.

> In its current form SA threading is a regressive proposition. Even if all
> the remaining issues are addressed, what benefits would it bring over and
> above 1:1 threading?

I think that's the flaw in your argument. I'm not (nor am I aware anyone 
else is) arguing that SA will be "over and above" 1:1 threading. You're 
right, such an argument would fail.

Yes, such arguments were made before when we went to 1:1 threading. But 
1:1 threading has won the day as the default, and there is no changing 
that.

Some of the things SA _will_ bring us are:

1) We've never just flat-out pulled the plug on shipping system calls the 
way we did with SA. Given where we were and where we wanted to go at the 
time, I think pulling the plug was the right thing to do. However, we are
now in a position to reconnect SA. So I think it would be very fitting for 
us to do so.

We've really prided ourselves on backwards compat, so I think this is a
strong reason for adding SA. We've kept crazy amounts of backwards compat
in the past, and been proud of it, like the way the mount command from
NetBSD 0.9 could still work (we talked about that as a strong point of our
OS). To be honest, once I get SA as a kernel option that can be turned
off, I think this point alone is reason enough to add it into -current.

2) We're a research OS, in addition to everything else. I've spoken with 
another developer who's interested in experimenting with upcalls as a way 
for a driver to make calls into a support daemon. Is this the only way to 
do it? No. Is this the way this developer's interested in doing it? Yes.

(Reason 0, which was me learning about our new kernel through this 
work, has been achieved and doesn't need SA to be merged to complete.)

Take care,

Bill
