Subject: Interesting panics on NetBSD/sparc 2.0 MP kernel.
To: None <port-sparc@netbsd.org>
From: Eric Schnoebelen <eric@cirr.com>
List: port-sparc
Date: 08/12/2004 11:01:57
Greetings all.

I'm trying to get my UUCP and mailing list server upgraded to
2.0_BETA (so I can make use of a pair of Ross 125s in the box).
I'm currently running GENERIC.MP on a single-processor system (to
shake out bugs and the like).

I'm getting interesting panics each night, when the nightly report
runs.  In particular, when find starts walking the filesystem, I
get a huge number of the following messages (on the order of 160
lines):

	sd0(esp0:0:1:0): unable to allocate ecb
	sd0(esp0:0:1:0): unable to allocate scsipi_xfer
	sd0: not queued, error 12   

followed by about 1100 lines of the following:

	sd0(esp0:0:1:0): adapter resource shortage
	sd0(esp0:0:1:0): unable to allocate ecb

and eventually panicking with the following:

	dev = 0x700, block = 40500, fs = /
	panic: blkfree: freeing free frag
	syncing disks... panic: cpu0: stuck on lock@f0f2d960
	Frame pointer is at 0xf030ce00
	Call traceback:
	  pc = 0xf0282640  args = (0x1, 0x5, 0x0, 0x0, 0xf030cf20, 0x1, 0xf030ce68) fp = 0xf030ce68
	  pc = 0xf01a89b0  args = (0x104, 0x0, 0x126e242e, 0x35eb, 0xffff, 0x155830, 0xf030ced8) fp = 0xf030ced8
	  pc = 0xf000b050  args = (0xf000b058, 0x0, 0xf0f2d960, 0x1e8000e1, 0xf0369000, 0x104, 0xf030cf40) fp = 0xf030cf40
	  pc = 0xf01cac64  args = (0xf0f2d960, 0xff, 0xffffffff, 0xa1a3b, 0xda, 0x2fef, 0xf030cfa0) fp = 0xf030cfa0
	  pc = 0xf0262acc  args = (0xf0f2d958, 0x27b4fd, 0x1000000, 0xf026bb30, 0xf85, 0x21009, 0xf030d008) fp = 0xf030d008

	dumping to dev 7,1 offset 2050090
	dump Async registers (mid 8): afsr=0<AFA=0>; afva=0x00
	cpu0: NMI: system interrupts: 100c0000<VME=0,SBUS=0,SC,T,M>
	memory error:
		EFSR: 10002<DW=0,SYNDROME=0,ME>   
		MBus transaction: fc64d30<VAH=0,TYPE=3,SIZE=5,C,VA=19,S,MID=0>
		address: 0x0f028e000
		module location: ?
	Type  'go' to resume
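
For anyone reading the numbers in the log above, here is a minimal sketch
of how the "dev = 0x700" and "error 12" values can be decoded.  The bit
layout in split_dev() is an assumption taken from the major()/minor()
macros in NetBSD's <sys/types.h>, and the function name itself is made
up for illustration.  The errno lookup reflects whatever host the script
runs on, so it is only a hint, not a statement about the kernel.

```python
import errno


def split_dev(dev):
    """Split a dev_t into (major, minor).

    Assumption: NetBSD's layout, with the major number in bits 8-19 and
    the minor number in the remaining bits.
    """
    major = (dev & 0x000FFF00) >> 8
    minor = ((dev & 0xFFF00000) >> 12) | (dev & 0x000000FF)
    return major, minor


if __name__ == "__main__":
    major, minor = split_dev(0x700)
    print("dev 0x700 -> major %d, minor %d" % (major, minor))
    # errno.errorcode maps numeric codes to symbolic names on the host
    # platform; 12 is ENOMEM on the BSDs.
    print("error 12 -> %s" % errno.errorcode.get(12, "unknown"))
```

Under that assumption 0x700 is major 7, minor 0, which lines up with the
"dumping to dev 7,1" message referring to another partition on the same
disk, and error 12 (ENOMEM) fits the "unable to allocate" messages.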

Now, I'm having a bit of a problem believing it's a memory error,
as the system was running NetBSD 1.6ZG (GENERIC) without a hiccup
for several weeks.

As I said, this machine is my mailing list and UUCP server.  It's
got mimedefang configured and running, and mimedefang makes use of
clamd and spamassassin (installed from pkgsrc).  And the mailing
list manager is a hacked version of majordomo, so it's running perl
quite a bit.

I'm going to back down to the GENERIC kernel, but I want to help
get GENERIC.MP fixed too.  What else would be useful, and how
should I go about getting it?  My knowledge of OBP is limited.

I've placed the console log for the system from Monday night at
ftp://ftp.cirr.com/pub/NetBSD/crash/ihnp4.log-20040809.  I'm
sure more will show up shortly.. :-(

	Thanks, 
		Eric

--
Eric Schnoebelen		eric@cirr.com		 http://www.cirr.com
        There is this special biologist word we use for 'stable'.
			It is 'dead'. -- Jack Cohen