Subject: Re: netbsd-1-6 branch vs. recent esp(4) fixes....
To: NetBSD/sparc Discussion List <port-sparc@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: port-sparc
Date: 10/24/2002 12:55:09
[ On Thursday, October 24, 2002 at 13:13:45 (+0200), Martin Husemann wrote: ]
> Subject: Re: netbsd-1-6 branch vs. recent esp(4) fixes....
>
> On Wed, Oct 23, 2002 at 06:58:41PM -0400, Greg A. Woods wrote:
> 
> > Has there been enough experience yet to know if the more recent fixes to
> > esp(4) (i.e. sys/dev/ic/ncr53c9x.c et al) are well enough tested to be
> > pulled up to the netbsd-1-6 branch yet?
> 
> As a data point:
> 
> I've been using my U2 with tagged queuing enabled (and one single not
> yet committed patch from Andrey) for about a week now, beating on it
> pretty hard. This setup always lost when tagged queueing was enabled before,
> but now it did survive.

Yes, I'm fairly happy with the tagged queuing in the esp(4) driver too
on a sparc-20 clone with a pair of drives which both support tagged
queuing.

It's just that I have this less than perfectly reliable ST32430N (with
version 0510 firmware) in an external box which I use for the local
/var/obj and /var/packages-obj and it keeps taking itself offline when
under really heavy load (untarring big archives with big files, linking
really big programs such as the kernel, installing really big packages
such as mozilla, etc.):

sd1(esp0:0:1:0): esp0: timed out [ecb 0xf093e388 (flags 0x1, dleft 800, stat 0)], <state 1, nexus 0x0, phase(l 10, c 100, p 3), resid 800, msg(q 0,o 0) >

And of course, without the bus reset fixes, the driver can never coerce
it back online, even though it re-probes just fine from OFW.
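
For anyone who wants to count these timeouts in a saved console or dmesg
log, here is a minimal sketch that splits a message like the one above
into its labelled fields.  The function name parse_esp_timeout and the
dictionary keys are made up for illustration; the keys simply mirror the
labels printed in the message.  Values are kept as strings because I'm
not asserting here which numeric base the driver prints each one in.

```python
import re

# Hypothetical helper, not part of any NetBSD tool: it only pattern-matches
# the text of the timeout message quoted in this post.
_VAL = r"[0-9a-fA-Fx]+"

TIMEOUT_RE = re.compile(
    r"(?P<device>\w+)\((?P<adapter>\w+):(?P<channel>\d+):"
    r"(?P<target>\d+):(?P<lun>\d+)\):\s+"
    r"\w+: timed out \[ecb (?P<ecb>" + _VAL + r") "
    r"\(flags (?P<flags>" + _VAL + r"), dleft (?P<dleft>" + _VAL + r"), "
    r"stat (?P<stat>" + _VAL + r")\)\], "
    r"<state (?P<state>" + _VAL + r"), nexus (?P<nexus>" + _VAL + r"), "
    r"phase\(l (?P<phase_l>" + _VAL + r"), c (?P<phase_c>" + _VAL + r"), "
    r"p (?P<phase_p>" + _VAL + r")\), "
    r"resid (?P<resid>" + _VAL + r"), "
    r"msg\(q (?P<msg_q>" + _VAL + r"),\s*o (?P<msg_o>" + _VAL + r")\)\s*>"
)


def parse_esp_timeout(line):
    """Return a dict of the labelled fields, or None if the line does not match."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    # All values are left as strings; no numeric base is assumed.
    return match.groupdict()


if __name__ == "__main__":
    sample = (
        "sd1(esp0:0:1:0): esp0: timed out [ecb 0xf093e388 "
        "(flags 0x1, dleft 800, stat 0)], <state 1, nexus 0x0, "
        "phase(l 10, c 100, p 3), resid 800, msg(q 0,o 0) >"
    )
    fields = parse_esp_timeout(sample)
    print(fields["device"], fields["target"], fields["dleft"])
```

Feeding a whole log through this and grouping on the device and target
fields would show whether it's only the one drive that ever times out.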

Unfortunately, when a drive goes offline like this while a process holds
files open on both it and the other drive, you can't do anything at all
with either drive (e.g. unmount filesystems), so you just have to drop
to OFW, reboot, and hope for the best.  That's why I'm hoping the bus
reset fixes will let me bring the drive back online relatively cleanly.

(I'm also hoping the bus reset fixes will allow me to hot-swap an SCA
drive in an SS5/SS10/SS20 so that I can build a more reliable server
using RAIDframe on the root drives.  Currently a hot-swap has the same
dire consequences for the driver as having a drive go offline.)

(Maybe the real problem is that the ST32430N does have a tagged queuing
bug, but if so then it's not a complete failure -- just a rare one.... ;-)

-- 
								Greg A. Woods

+1 416 218-0098;            <g.a.woods@ieee.org>;           <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>