Subject: Re: esp failures on a 1+
To: Scott Bartram <scottb@orionsoft.com>
From: Erik E. Fair <fair@clock.org>
List: port-sparc
Date: 05/22/1997 13:31:51
At 6:20 -0700 5/22/97, Scott Bartram wrote:

>What conclusion do you draw from the dmesg output?

It would help a lot if you could look up those addresses in your kernel (at
least the "pc" values), and give the names of the routines on the stack.
You probably won't find those exact addresses in the nm(1) output from
/netbsd; the trick is to figure out the closest symbol whose value is less
than or equal to the "pc" value. Is there an obvious tool for this? Off the
top of my head, I'd

	nm /netbsd | sort > /tmp/symbols

and then walk through it for each value. For example, your final pc value
falls somewhere in sonewconn1() in my kernel. Your kernel will of course be
different, which is why you have to produce the data yourself; if you were
running a precompiled GENERIC 1.2 or 1.2.1, anyone could do this for you.
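
The "closest symbol whose value is less than or equal to the pc" lookup can
be sketched in a few lines of Python. This is only an illustration, assuming
nm(1) prints three whitespace-separated fields per defined symbol (hex
address, type letter, name). The helper names parse_nm() and lookup() are
made up here, and the addresses in SAMPLE are invented, not taken from any
real kernel.

```python
import bisect

def parse_nm(text):
    """Turn nm-style output into a list of (address, name), sorted by address."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # e.g. undefined symbols, which carry no address
        addr, _kind, name = fields
        try:
            symbols.append((int(addr, 16), name))
        except ValueError:
            continue  # first field was not a hex address
    symbols.sort()
    return symbols

def lookup(symbols, pc):
    """Return (name, offset) for the closest symbol at or below pc, else None."""
    addrs = [addr for addr, _name in symbols]
    i = bisect.bisect_right(addrs, pc) - 1  # last entry with addr <= pc
    if i < 0:
        return None
    addr, name = symbols[i]
    return name, pc - addr

# Invented sample data; real input would be the output of "nm /netbsd".
SAMPLE = """\
f8004000 T _start
f8051a20 T _sonewconn1
f8051c90 T _soisconnected
         U _undefined_example
"""

if __name__ == "__main__":
    syms = parse_nm(SAMPLE)
    hit = lookup(syms, 0xf8051b04)  # hypothetical pc value
    if hit is not None:
        print("%s+0x%x" % hit)
```

To try it on a real kernel, the contents of /tmp/symbols could be read and
passed to parse_nm() in place of SAMPLE, with each "pc" value from the panic
passed to lookup() in turn.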

As for the panic itself, I asked this list in January/February about "data
fault" panics on my Sun 4/330 (running SunOS 4.1.4). They started showing
up about every three to five days, after several years of reliable
operation. I didn't get much response here (I was hoping for a hardware
wizard more knowledgeable than I am to say, "Ah! It's foo!").

There was one panic where SunOS didn't simply say "data fault" but said
"memory parity error in CHIP foo" (two parity errors, same chip - first one
killed a process, and the second one killed the system). I finally
concluded that my hardware was simply getting old and trying to tell me
that it was gonna die, so I replaced it (and that's another, in progress,
story! Can you say, "panic: pv_unlink0"?).

I gave my 4/330 to a friend who expressed interest in it (even in its
questionable state), and so I loaded up NetBSD-current as of the end of
April on an old disk for him. He's not seen this problem so far, but he
hasn't run the system much or put real load on it yet. I also discovered
while replacing it that it has a burned out keyboard serial port (annoying;
it means he can only run the machine from a serial console).

	Erik Fair <fair@clock.org>