Subject: Re: MP?
To: Havard Eidnes <he@netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: port-alpha
Date: 01/19/2004 16:01:06
[ On Monday, January 19, 2004 at 17:58:20 (+0100), Havard Eidnes wrote: ]
> Subject: Re: MP?
>
> I've just done a successful "-j 2" of -current on my cs20 running
> NetBSD-current code from Dec 8 2003, and yes, this is also with
> two setiathome clients running at the same time.

That is very good news!

Any chance you could do something similar with the sources or objects on
NFS, just to test that case too?  Or even run postmark over NFS?

> Well, it's not really a show-stopper for 1.6.2 on alpha either,
> since what's in 1.6.2 isn't a regression (i.e. "no worse") in
> functionality compared to what was in 1.6.1 or 1.6.

Can anyone confirm that's true?  Dave McGuire has mentioned that, by his
reckoning, the problem has been with us for at least a year, but I don't
know which codebase that would correspond to.  He also seemed to hint
that some earlier release was stable.

I agree it probably wasn't 1.6, though: Stephen Jones posted here back
in the fall of 2002 about suspiciously similar problems on a fresh 1.6
install:

    http://mail-index.netbsd.org/port-alpha/2002/11/10/0000.html

so if 1.6.2 has similar problems they're probably not a regression,
however serious they are.  :-(

I think he does say, though, that 1.5Z was stable, but I'm not sure.

> If someone is up to the task of digging out the relevant MD
> changes from -current to stabilize MP operation for alpha on
> netbsd-1-6, I'm all for it,

Me too!  And I can certainly do the testing, but I don't know where to
start looking for the relevant changes.

>   but my guess is that it's going to be
> difficult, as the changes may depend on other things (possibly
> MI) which may again depend on others etc. etc.  It may be that
> the number of associated changes would be so large that it would
> not make sense from a general stability point of view to apply
> them all to the netbsd-1-6 branch.

It may not be all that bad.  I would have thought MP on alpha was the
most mature MP code in NetBSD.  If I'm right to assume that some earlier
version was stable, then my guess is that some fairly "recent" change
was made somewhere in the kernel without considering that some platforms
already have MP support, and that change caused the deadlock we're
seeing.

Perhaps one way to track this down would be to find out roughly when the
fix was made to -current.  If anyone was tracking -current across the
period when it went from hanging to running stably, and can remember
about when that was, then at least we could get some diffs to peer at.

-- 
						Greg A. Woods

+1 416 218-0098                  VE3TCP            RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>