
Subject: the path from nathanw_sa -> newlock2
To: Current Users <current-users@NetBSD.org>
From: Aaron J. Grier <agrier@poofygoof.com>
List: current-users
Date: 02/12/2007 00:00:21

On Sun, Feb 11, 2007 at 11:03:47PM +0000, Matthias Scheler wrote:
> I don't know all the details. What I can remember is:
> 1.) Our M:N implementation never worked properly on multiple CPUs.
> 2.) It didn't work very well on certain platforms, e.g. NetBSD-sparc
>     and NetBSD-sparc64.

both of which could've been addressed, no?

> Is it really surprising that we got there? Solaris and Linux both
> moved from M:N to 1:1 because M:N didn't work very well.

what surprises me is that instead of fixing a known list of issues
discovered over the last couple of years, the choice was made to trade
everything out with a new codebase with new undiscovered bugs.

what surprises me is the seeming paucity of discussion regarding such a
significant change.  I'm fine with not being a part of it, but the place
I expect to see discussion (tech-kern) doesn't seem to have any.  I'm
simply trying to figure out how we got from M:N to 1:1.

dropping sendmail from base was a similar "surprise bomb" but at least I
could trace the discussion leading up to it, and since both alternatives
were in-tree for years, postfix was hardly an unknown quantity when the
changeover was made.

> One of the problems with M:N in Linux is e.g. that a userland thread
> can be running on a kernel thread which was previously used to lock a
> mutex by another userland thread which is currently not running. And
> if the active thread now tries to lock the same mutex the application
> will die because of a recursive mutex entry.

it's my understanding that this is heavily implementation-dependent.
the problem has been demonstrably solved in other OSes by allowing
recursive mutexes.

-- 
  Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com
              "silly brewer, saaz are for pils!"  --  virt