Subject: Re: -current very unstable on Ultra10 (fwd)
To: None <port-sparc64@netbsd.org>
From: Arto Huusko <arto.huusko@utu.fi>
List: port-sparc64
Date: 10/30/2004 12:34:20

On Sat, 30 Oct 2004, Gert Doering wrote:

> Typical problems manifest like this (on the serial console):
>
>
> pmap_page_protect: pseg empty!
> kdb breakpoint at 12816a4
> Stopped in pid 24367.1 (sh) at  netbsd:cpu_Debugger+0x4:        nop
> db> cont
>
> pmap_page_protect: pseg empty!
> kdb breakpoint at 12816a4
> Stopped in pid 15862.1 (sh) at  netbsd:cpu_Debugger+0x4:        nop
> db> cont
> hme0: status=30001<GOTFRAME,RXTOHOST,NORXD>
> pmap_page_protect: pseg empty!
> kdb breakpoint at 12816a4
> Stopped in pid 15862.1 (sh) at  netbsd:cpu_Debugger+0x4:        nop
> db> cont
> pmap_page_protect: pseg empty!
> kdb breakpoint at 12816a4
> Stopped in pid 15862.1 (sh) at  netbsd:cpu_Debugger+0x4:        nop
> db> cont

I saw things like this on an Ultra5 as well. I'd say that the machine
is completely unusable. (I've been meaning to try 2.0, but haven't
got around to it yet.)

Beyond these crashes, the system seemed very unstable in general:
userland programs kept crashing, but predictably enough to rule out
faulty memory. For example, sh always crashes in the same place during
the perl 5.8 configure step, although the core gives a slightly
different trace each time. (Oh, and I know there was a patch to sh
which introduced crashes, but I have the fixed version.) Another
interesting feature of the perl-58 build is that it loops: once it has
built, it starts to configure all over again, and so on.
This could just be a pkgsrc problem, though...

> I'm a bit surprised at this.  I have a number of other ultras (2x U5, 1x
> U10) running NetBSD 2.0 (not -current), and all of them have survived
> at least one "build.sh -x" without crashing.
>
> OTOH, the machine in question has an original Sun 4 G hard disk, which
> seems to be the slowest piece of hardware ever built.  The U5s have
> recent IDE disks (non-Sun, larger & faster).  Maybe this is triggering
> something in the IDE subsystem?

Hmm. That's an interesting idea. My other problems with -current
on the U5 included the fact that the built-in hard drive (probably not
a Sun original; I have an 8G Seagate Medalist, IIRC) would not work
reliably. I was first baffled by the huge number of random userland
program crashes, until I tried this:

cd /usr/bin; for f in *; do cmp "$f" "/orig/build/dir/usr/bin/$f"; done

where /orig/build/dir is on NFS. The result was that most of the
installed binaries differed from the originals I had built. And when
I rebooted the machine and ran the comparison again, different binaries
were corrupted, and in different places. So the disk was producing
completely random read errors, silently.
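
For anyone who wants to repeat this kind of check, here is a minimal
sketch of the comparison as a small sh function. The function name
list_mismatches and the idea of saving one list per boot are my own
illustration, not part of the original one-liner; it only assumes a
POSIX sh and cmp.

```shell
#!/bin/sh
# list_mismatches INSTALLED_DIR REFERENCE_DIR
# (hypothetical helper name, for illustration only)
# Prints the name of every regular file in INSTALLED_DIR whose contents
# differ from the file of the same name in REFERENCE_DIR. A file that is
# missing from REFERENCE_DIR is also reported, because cmp fails on it.
list_mismatches() {
    installed=$1
    reference=$2
    for path in "$installed"/*; do
        # Skip directories and anything else that is not a regular file.
        [ -f "$path" ] || continue
        name=${path##*/}
        # cmp -s prints nothing and only sets the exit status.
        if ! cmp -s "$path" "$reference/$name"; then
            printf '%s\n' "$name"
        fi
    done
}
```

Running something like "list_mismatches /usr/bin /orig/build/dir/usr/bin"
once per boot, saving each list to a file, and then diffing the lists
would show whether the same files are affected every time. Lists that
change from boot to boot suggest read errors rather than files that
were written out wrong in the first place.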

However, a different, newer hard drive (which I put on the same
IDE channel, though) did not exhibit these problems. The kernel
still stayed unstable, no matter which disk I had attached.

I'm wondering whether the built-in IDE disk is even faulty.
If it were that faulty, shouldn't the IDE subsystem be reporting
CRC errors etc. instead of just silently (and randomly) returning
wrong data?