Subject: Re: the 'hanging' mount...an update
To: Jason Thorpe <thorpej@nas.nasa.gov>
From: Matthew Jacob <mjacob@feral.com>
List: port-alpha
Date: 04/07/1999 01:22:52

On Wed, 7 Apr 1999, Jason Thorpe wrote:

> On Tue, 6 Apr 1999 18:44:13 -0700 (PWT) 
>  Matthew Jacob <mjacob@feral.com> wrote:
> 
>  > FWIW, the same hang occurs on an Alpha 4100.
>  > 
>  > Since I haven't heard from most of the people who may have made a
>  > delta that caused this, I guess that I'll have to look further at the
>  > problem (or convert these systems to Linux or FreeBSD).
> 
> "What problem?"  You haven't so much as even sent a bug report about it.
> In fact, the other message I read just minutes ago was the first report
> of any problem on an Alpha 4100 or 8200 (for which _you_ wrote the
> systype specific support code, I might add) I have seen on this list.

I guess you've missed most of the email on this? 

It's the 'Hanging mount' problem. It's not an 8200 or 4100 specific issue-
although I'd like to point out that I've done little to touch either the
8200 or 4100 specific code in some months and am somewhat saddened to see
that at least the 8200 somehow is now broken despite its ready availability
for sanity testing to a number of people involved in the alpha port.
C'est la vie.

I didn't send-pr things like this because:

	a) it's right at the fork- a lot of people might be paying
	attention to the current state of affairs, and PR's are to be
	reported against *releases* (I would think).
	b) the pr's I file tend to not be responded to (remember the
	ncr bug? I've filed a couple of other alpha related ones that
	seemed to languish too). Maybe I don't file good enough reports-
	but the lack of response doesn't encourage me to file more..
	c) I'm assuming that I'll have to look at it, since the authors of
	the diffs I published between working and non-working kernels
	declined to respond- or have been busy doing other things-
	whatever...

This is the same problem that's been chatted about in port-alpha for a
week- some people see it, some don't. You apparently don't have the
problem with the exact same alphastation I do. It's not a 4100 specific
problem- I was trying a couple of different machines before settling down
to really try and track down the problem. I installed the last port-alpha
snapshot on an 8200 (mother) so that I could try a quite recent user space
and avoid all the pollution problems of a constantly updated system- only
to have it get this weird vector trap when I typed something at the console
(so something got broken for the 8200- surprise surprise, I guess I'll have
to figure out what got broken there)- then tried a -current kernel on an
Alpha 4100 (brunner) and voila, it hangs in /etc/rc too..

So, to resummarize: what *I've* seen about this is that it isn't a problem
in mount, per se- it's a problem where subshells in /etc/rc execute-
commenting out the 'critical filesystems' mount line just causes a hang
further down. I've found this problem on:

	433MHz	Alpha PC164
	500MHz	Alpha 4100
	500MHz	AlphaStation 600

and it is isolated to kernel changes made sometime around 3/26 through
3/27. I published a list of diffs- and got the dates somewhat wrong for
the diffs, but the diffs are still valid. Mostly a wad of UVM and NEW_PMAP
changes after factoring out things that likely couldn't have made a
difference.


> If you could provide some actual details, perhaps someone could look into
> it.  "Pukes all over its shoes" isn't exactly descriptive.

It's the same thing that's been mentioned in human chat (not ICB) for some
time- mostly for the 4100, a separate problem from the hanging mount- when
you drop back to the PROM to reboot, the PROM blows up- you and I and Dr.
Bill have speculated that this is likely because of the VM scroggling that
we do. It's possible that the 8200 breakage is related. Something probably
ought to be done- maybe like Linux does, where re-entry to the PROM restores
complete PROM (and main iobus) state.

In general the 4100's and the 8200's have been having spots of trouble
with the new VM code for some time- I thought I filed PR's about it (I'm
pretty sure) and mentioned it- you know, things like saying "Hey, Jason-
mother just while sitting there halts with a "kernel stack not valid"
halt...whaddya think?"- so if it comes as a big surprise to you that there
might be problems- well, sorry, fella- I thought you knew...

Anyway- my mail to port-alpha about this was mostly an update- I wasn't
asking for help at this point- I was just mentioning it in case anyone is
still vaguely interested. I've provided plenty of details in other email
about this issue- I said I was holding off on looking into it because I
wanted the authors of the kernel changes which *may* be involved to check
it out 'coz they're definitely more on top of that area of the code than I
am. If there isn't a response, I'll go look at it and fix it, I guess,
because none of the machines available to me are usable under NetBSD with
this problem.

-matt