Subject: Re: what was wrong with m:n, was Re: newlock2 breaks arm
To: Bucky Katz <bucky@picovex.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 02/18/2007 12:05:49

On Sun, Feb 18, 2007 at 12:01:53AM -0800, Bucky Katz wrote:
> Bill Studenmund <wrstuden@netbsd.org> writes:
>
> > On Sat, Feb 17, 2007 at 09:55:12PM -0800, Bucky Katz wrote:
> >>
> >> Let me rephrase the question: If the only thing that changes was
> >> the implementation of primitives, why does m:n have to come out? On
> >> the other hand, if more changed, why do you think that just
> >> implementing the primitives is enough to fix an arch?
> >
> > I think that implementing the primitives will fix an arch for
> > compilation, thus the (implied) use of "just".
>
> Sorry, it's late in this timezone and I'm still confused but as I
> understand it you're saying that implementing the primitives will
> cause the arch to start compiling again, but that other changes are
> needed for 'compiles' to turn into 'works'?

Well, I guess the question is what is the goal?

Getting 1:1 threading on ARM working should happen once things are
compiling. So for 1:1, I think 'compiles' == 'works'.

Getting m:n threading working will take a lot of work. It will take much
more than getting locks to work.

> > Well, the point of new-lock is to move the kernel away from big-lock.
> > So threading code that needs big-lock will not do.
>
> Changing the primitives doesn't get you that. You have to do a lot of

No, it doesn't. I did not mean to imply that changing primitives does.

However, 1:1 threading is working on a number of architectures, as
indicated in the initial newlock2 email. So the situation is that we have a
(new) form of threading that needs new primitives. Getting the
primitives working on an arch may well be all that's needed to get the new
threading working on said arch.

> surgery on subsystems to fix locking.  In fact, the _last_ thing I'd
> do as part of trying to make my locking more fine grained is to start
> by doing new locking primitives.

Well, what if you were setting out on fine-graining, and the only lock
primitives you had sucked? I.e., you did not feel they were sufficient for
the transition? That's the position we felt we were in, and it is why
Jason Thorpe started the newlock branch way back when.

All we had was ltsleep() and lockmgr(). ltsleep() is, well, usable as a
base. But we want, AFAIK, to have different semantics. So having wrappers
that map our desired semantics onto ltsleep() semantics, which are then
mapped onto MD code, is cumbersome. lockmgr() does a LOT. It's generally
considered too fat.

So to make fine-graining work, we need different locking primitives.

I agree that changing primitives should not be done randomly. But, for
what we wanted to do in the end, I think it was needed.

> > I wish we'd known that you might be in a position to help fix SA.
> > Things would probably have been handled differently.
>
> As I understand it, things are still being handled poorly. I've heard
> through the rumor mill that private technical discussions about how to
> fix locking in ARM are being held and no one is bothering to include
> us. (This is not unexpected. I would have been _very_ surprised if the
> problem of having technical discussions with the wrong audience could
> be fixed that quickly. And, of course, the rumor mill could be wrong.)

I'm sorry, but it takes two to communicate. I agree we, the NetBSD
developer community, could do more to include others in discussions. But I
also have to question if you have been approaching this in a way that
really makes us want to communicate with you. Your pain and frustration
have been very clear. But I haven't seen much of a "Let's get this fixed"
attitude in the messages so far. What would you do, were you in our shoes?

> > As to the long-term options for SA, I'd love it if someone would fix
> > it.  The system calls are still assigned, and the calls can be
> > re-added once there's a good implementation. I'm not 100% sure how
> > we'd live w/ two libpthreads, but I'm sure we can figure it out.
>
> Personally, I'd burn down SA. There are better ways to get to M:N
> threading.
>
> > So if you can get SA working, please do!
>=20
> At this point, I don't even have time to fix the problems that
> dropping m:n is causing us.

Take care,

Bill
