Subject: -current on Ultra 5+ - now it's major siop0 lossage
To: None <port-sparc@NetBSD.ORG>
From: Greg Earle <earle@isolar.DynDNS.ORG>
List: port-sparc
Date: 01/25/2001 05:23:05
I'm not having much luck here, kids ...

First I tried the January 14th snapshot kernel.  I still get a rash of

hme0: invalid packet size 2048; dropping

errors as described previously, when using the Ethernet.  And I started
seeing these:

Jan 25 05:05:21 netbsd4me /netbsd: DMA IRQ: bus fault dma fifo empty, DSP=0xc0069fec DSA=0xc006df00: last msg_in=0x0 status=0xff
Jan 25 05:05:21 netbsd4me /netbsd: siop0: scsi bus reset
Jan 25 05:05:21 netbsd4me /netbsd: cmd 0x1a3ed00 (target 0:0) in reset list
Jan 25 05:05:21 netbsd4me /netbsd: cmd 0x1a3e888 (target 0:0) in reset list
Jan 25 05:05:21 netbsd4me /netbsd: cmd 0x1a3ed00 (status 2) about to be processed
Jan 25 05:05:21 netbsd4me /netbsd: cmd 0x1a3e888 (status 2) about to be processed
Jan 25 05:05:21 netbsd4me /netbsd: siop0: target 0 now synchronous at 20.0Mhz, offset 16

every once in a while, on disk transfers over the SCSI bus.  (Actually, I
checked my messages logs, and I've gotten these ever since using the vanilla
1.5 GENERIC kernel, but back then it only happened once or twice per
reboot cycle.)
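
For anyone who wants to check their own messages logs for the same thing,
here is a rough sketch of the kind of count I mean.  It assumes the syslog
line format shown above; the function name is hypothetical, and the second
reset line in the sample is made up purely for illustration.

```python
from collections import Counter

def count_resets(lines, needle="siop0: scsi bus reset"):
    """Count lines containing `needle`, keyed by the leading 'Mon DD' stamp."""
    per_day = Counter()
    for line in lines:
        if needle in line:
            # syslog lines begin with e.g. "Jan 25 05:05:21"; keep month and day
            parts = line.split()
            per_day[" ".join(parts[:2])] += 1
    return per_day

# Sample input: first two lines are from the log above, the third is invented.
sample = [
    "Jan 25 05:05:21 netbsd4me /netbsd: siop0: scsi bus reset",
    "Jan 25 05:05:21 netbsd4me /netbsd: cmd 0x1a3ed00 (target 0:0) in reset list",
    "Jan 25 05:07:02 netbsd4me /netbsd: siop0: scsi bus reset",
]
counts = count_resets(sample)
```

To run it against a real log, pass an open file instead of the sample list,
e.g. open("/var/log/messages"), assuming that is where your syslog writes
kernel messages.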

While booted in this mode (1/14 snapshot kernel, 1.5 userland) I decided to
download the -current src and build a completely new -current kernel, with
_HME_NDESC bumped up from 32 to 128.  (Unlike Simon Gerraty, I didn't need
a new "ld" to successfully build this new kernel - why not?)

I booted with this new kernel, and the good news is that the previous errors
with the Ethernet register displays ("<GOTFRAME,...>") have gone away.

However, the rest of the news is bad - the "invalid packet size 2048; dropping"
errors are still there, and just as bad as ever; and what's even worse, the
"DMA IRQ: bus fault dma fifo empty"/"siop0: scsi bus reset" errors shown
above have gone from occasional/annoying in the January 14th snapshot kernel
to a veritable torrent in the new one - at every sync or disk access (or so
it seems), I get a rash of these errors.  The machine is basically unusable
now, as a result.  (I had also tried earlier to build a 1.5 custom kernel
with the _HME_NDESC changes, but that wedges up solid during the boot cycle.)

I did what you asked, Eduardo; I upgraded (well, my kernel anyway) to -current.

What now?

	- Greg

P.S. If there's a better place to discuss/debug this (even off-list privately),
     please let me know.