Subject: Re: libpthread
To: Todd Vierling <tv@pobox.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 06/25/2003 10:07:03

On Wed, 25 Jun 2003, Todd Vierling wrote:

> On Wed, 25 Jun 2003, Jason Thorpe wrote:
>
> : I think it's important to remember that Solaris's M:N implementation
> : was particularly poor; it did indeed suffer from performance problems,
> : and a host of other issues.
>
> "Sounds familiar."
>
> : They were, however, artifacts of their particular implementation of
> : M:N, not properties inherent to the M:N strategy.
>
> So what?  *NetBSD*'s pthreads project has been how long coming with
> completely unstable (forget even thinking about performance yet!) results?

Todd, please: 1) calm down, and 2) get your facts correct. Our current
pthreads implementation is not "completely unstable." I'm not saying it's
perfect, but it's also not "completely unstable."

The biggest problem with our pthreads project is that it's really only had
one person working on it, Nathan. Every other project has had multiple
people. I'm not saying everyone needs to be super kernel experts, but if
more folks were trying to figure stuff out, it'd help. Figure out what's
within reach, and try it. :-)

> You mentioned earlier in this thread that 1:1 implementation would be "more
> difficult."  I have quite a hard time believing that in the context of real
> LWP attachment (as opposed to clone(2)-based, for instance), since it means
> near zero need for userspace context jumping.  If the kernel is in charge of
> the context switches, the stability difference against the current approach
> *should* be quite noticeable.
>
> Now, for the purposes of having *any* kernel-assisted threading at all
> (since we're still stumbling while everyone else has jumped the hurdle and
> kept on going), the performance difference for 1:1, while known to exist,
> should not be alarmingly huge.  All previous attempts to benchmark in favor
> of either approach have been documentably slanted in favor of one of the two
> directions to show any appreciable performance gain.
>
> I'm quite the bit surprised that we didn't go the 1 thread to 1 LWP route
> before adding M:N as an additional feature.  Going 1:1 first would not have
> meant "doing it wrong" or "doing it the non-NetBSD way"; it would simply
> have meant "doing it in stages."  So why not insert a 1:1 code path as
> default now, to gain stability, and resurrect M:N in NetBSD 2.1?
>
> Compromise here, folks....

And what about the tone of this note says compromise? One-sided compromise
isn't compromise, now is it? :-)

You're assuming that at this point a 1:1 model would be easier to
implement than fixing the bugs we have now. I disagree. But let's chart
things out to see where we are. Issues:

	B	Scheduler is not LWP-aware
	?	SMP + SA/LWP uncleanliness
	?	Lennart's error case
	B	Buggy applications
	B	Thread stack sizing

	Things a 1:1 lib would need that we don't have yet:

		gdb support
		libpthread
		kernel support for mutexes and condvars

The 'B' items above are ones that would apply to both a 1:1 lib and our
current lib. The '?' items are issues for the M:N lib, but may or may not
be issues for a 1:1 lib; since we don't have one, we can't tell.

So judging from the list, a 1:1 lib won't be the "quick fix" everyone
thinks it is.

Take care,

Bill