port-pmax: Re: details of panic with binary-1.1-alpha2

Subject: Re: details of panic with binary-1.1-alpha2
To: None <gord@enci.ucalgary.ca, port-pmax@NetBSD.ORG>
From: Arne H. Juul <arnej@pvv.unit.no>
List: port-pmax
Date: 11/21/1995 01:39:09

 > I've downloaded Arne's binary-1.1-alpha2 kernels, and am including
 > stack traces from the panic I get:
 > 
 > I did "boot 5/rz1/rz1netbsd -a".  This panic is repeatable, with the
 > same addresses listed by the stack trace.

Good (or rather, deterministically bad :-)

 > stacktrace_subr+4c (arguments omitted)
 > stacktrace+18
 > trap+928
 > MachKernGenException+78
 > 800ae148+630 

This is inside ffs_alloccgblk. I *think* it's on or near line 998,
after the call to ffs_clusteracct; at least that's what I get from
disassembling the kernel.  I could probably build a kernel with
debugging symbols if you want to try that.

 > 800adc88+150
ffs_alloccg

 > 800ad6f8+40
ffs_hashalloc

 > 800ac63c+350
ffs_realloccg

 > 800b0570+368
ffs_balloc

 > 800b4a08+2e0
ffs_write

 > 80065040+140
vn_write

 > 80046df4+1bc
sys_writev

 > trap+6ac
(probably the system call trap routine).
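
The matching of trace addresses to function names above can be sketched
as a nearest-symbol lookup. This is only an illustration: the table below
holds just the start addresses and names that appear in this trace, whereas
a real table would come from the kernel's symbol listing (for example the
output of nm -n). The helper names parse_frame and resolve are made up for
this sketch.

```python
import bisect

# Function start addresses as printed in the trace, paired with the names
# identified in this message. Assumption: a complete table would be built
# from the kernel's own symbol listing; this one is deliberately partial.
SYMBOLS = sorted([
    (0x80046df4, "sys_writev"),
    (0x80065040, "vn_write"),
    (0x800ac63c, "ffs_realloccg"),
    (0x800ad6f8, "ffs_hashalloc"),
    (0x800adc88, "ffs_alloccg"),
    (0x800ae148, "ffs_alloccgblk"),
    (0x800b0570, "ffs_balloc"),
    (0x800b4a08, "ffs_write"),
])
STARTS = [addr for addr, _ in SYMBOLS]


def parse_frame(frame):
    """Turn a 'base+offset' frame (both parts hex) into an absolute address."""
    base, _, offset = frame.strip().partition("+")
    return int(base, 16) + int(offset or "0", 16)


def resolve(frame):
    """Return (name, offset) for the closest symbol at or below the frame's
    address, or None if the address lies below every known symbol."""
    pc = parse_frame(frame)
    i = bisect.bisect_right(STARTS, pc) - 1
    if i < 0:
        return None
    addr, name = SYMBOLS[i]
    return name, pc - addr


if __name__ == "__main__":
    trace = ["800ae148+630", "800adc88+150", "800ad6f8+40", "800ac63c+350",
             "800b0570+368", "800b4a08+2e0", "80065040+140", "80046df4+1bc"]
    for frame in trace:
        hit = resolve(frame)
        label = "%s+0x%x" % hit if hit else "(unknown)"
        print("%-14s 0x%08x  %s" % (frame, parse_frame(frame), label))
```

With a partial table like this one, an address in a function that is not
listed gets attributed to the nearest listed function below it, so the
result is only as trustworthy as the symbol table is complete.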


I don't really know what is going on, but here are some
speculations:
a) Maybe there's a problem with clustering.  Jonathan once recommended
doing:
	sysctl -w debug.doclusterread=0
	sysctl -w debug.doclusterwrite=0
You could always try these to see if they help; it can't hurt.
b) Maybe the asc (SCSI) driver is sometimes returning bad data, so
the filesystem routines do something wildly wrong.
c) Maybe the filesystem is trashed in such a way that fsck doesn't
notice/repair it.
d) A bug somewhere else entirely is trashing filesystem data structures.

  - Arne H. J.