Subject: Re: 'hanging' mount problem 'fixed'? Forklift Upgrade....
To: Jason Thorpe <thorpej@nas.nasa.gov>
From: Matthew Jacob <mjacob@feral.com>
List: port-alpha
Date: 04/07/1999 21:02:35
> 
>  > My kernels were always really current too. It's the user space bits that
>  > have hit the problem.
> 
> Did you preserve those user space bits, or pinpoint EXACTLY what they were
> doing differently that caused them to lose?

See other mail.

> 
>  > (you done with brunner then?)
> 
> No, actually.  I'm sick as a dog (NASTY cold, unfortunate timing), and
> caught the early express train back to San Francisco, where I am currently
> parked on the sofa with a hot cup of tea and a warm blanket, listening to
> KQED.  After I get a bite to eat, I'm gonna huff down to Cala and get a
> bottle of Nyquil.  I am taking a sick day tomorrow :-)

Sorry to hear you're unwell. I was certainly this way a week or so ago.



> 
>  > > ...well, that's ... unclear, considering that you never pinpointed exactly
>  > > what the problem was.
>  > > 
>  > 
>  > True- I *did* pinpoint the source changes that caused it- and I *did*
>  > narrow it down to subshells within /etc/rc. If it had been *just* me, I
>  > would just write it off with an apology, but because at least one other
>  > person had the issue, I'd just be a bit uneasy about COMPAT_13.
> 
> No you didn't... Because you never commented out the part of pmap_enter()
> that makes the rest of those changes do anything and see if the problem
> still exists, i.e. if that was actually the problem.  You didn't pinpoint
> the system call or fault or whatever that caused the hang to occur.  Note
> that the other person that observes this can get it to happen without
> /etc/rc subshells.  In other words, all you did was find "evidence" to
> support your theory, but you didn't actually provide anything to make
> me believe that your theory is in any way correct (especially considering
> that I wrote a very large portion of the Alpha pmap module, and know
> precisely what the change you're pointing at does and how it works).

About a week ago, in email, I asked for someone who *does* know this
area to look at it and comment. I *did* say I didn't know this area of the
code well. I really didn't think through how to test it, and wasn't going
to try to decipher it and get up to speed on all of this because I have
other stuff to work on. I was upset because I didn't get anything but a
sarcastic comment from you, starting today. I was sitting down trying to
read the code when you dropped by and said you wanted to look at the
problem (whereupon I relinquished one of the machines right away).

> 
> The point I'm trying to make here is that you're getting upset at "us"
> for not dealing with the problem, when we have no real information to
> go on because we can't reproduce it, and you haven't provided any actual
> info other than "I think this is what caused it".. and now we'll never
> get that info because you added variables (updated your userland) and
> now can't get the problem to occur.  That's just nonsense.

Like my other email said, I have two other systems that have the problem.