Subject: Re: Data modified on freelist...
To: Andrew Gallatin <gallatin@cs.duke.edu>
From: Chris G. Demetriou <cgd@cs.cmu.edu>
List: port-alpha
Date: 03/17/1997 13:18:12
>  > > On the same 500, I've also been seeing a few 'unexpected machine
>  > > check' panics.  This 500 has some, well.. suspect dimms & Digital UNIX
>  > > will occasionally kick out "Machine Check error corrected by
>  > > processor" messages when ECC kicks in & corrects a single-bit error.
>  > > A brief scan through arch/alpha/alpha/interrupt.c makes me think that
>  > > NetBSD might panic on the same sort of interrupt, is that true?
>  > 
>  > Wow, yes.  NetBSD just bites the dust when it takes an unexpected
>  > machine check, though for some it should just continue.
>  > 
>  > I could easily hack up a kernel that does the right thing, and send it
>  > to you to test if you'd like.
> 
> That'd be great!  I'll be happy to test it.

OK, I'll try to get this done sometime later today, but it may take a
couple of days.


> A few more things I should mention:
> 
> When running the snapshot kernel, I've gotten a few
> 	panic: pmap_enter_ptpage: PT page not entered 
> 
> panics.   The machine locks up solid after these (though it does claim
> to be syncing its disks, it needs to have its halt button pressed).  

I've never seen this, but given how much the pmap code sucks, I could
believe it.  Is this happening under any kind of significant load?
Also, how much RAM do you have?


> When running a kernel built from sources supped this morning (ignore
> the date mentioned below; I forgot to reset the date after rebooting
> from DU), I cannot get it to go multi-user.  /bin/sh dies part-way
> through /etc/rc:
> 
> pid 3 (sh): unaligned access: va=0x12010c43c pc=0x120000e90 ra=0x120000e90 op=ldq
> pid 3 (sh): unaligned access: va=0x12010ad2c pc=0x120000e98 ra=0x120000e98 op=ldq
> Mar 16 22:55:34 init: /bin/sh on /etc/rc terminated abnormally, going to single user mode
> 
> The first thing I was going to try was re-building sh from -current
> sources, but it doesn't make sense to me that it would work fine w/a
> week old kernel & die w/a brand-new one.

Right.  As noted in the 21164-based systems web pages:

  - There are some minor 21164-related fixes that are not currently
    in -current. This makes it impossible to (re)build kernels from
    -current source.  For more information about these fixes,
    contact Chris Demetriou <cgd@netbsd.org>.


I'm working to get those changes in the source tree, but there are
some ... issues that have to be resolved before they can go in.

Send me private mail if you'd like a copy of them.


cgd