port-sparc: Re: Returned mail: see transcript for details

Subject: Re: Returned mail: see transcript for details
To: None <port-sparc@netbsd.org>
From: NetBSD list <netbsd@mrynet.com>
List: port-sparc
Date: 06/24/2001 20:58:26
Jim Bernard wrote:
>   OK, I did this, and the resulting kernel has now been running for 4 days
> without hanging, so I guess tagged queuing was the culprit in some way.
> 
>   I'm still seeing a few scsi parity errors during the execution of the
> /etc/daily script, though.  (But there were none when I ran the script by
> hand just after first booting the kernel.)

I have been running with the TQING line commented out as well.  All is
fine here with that modification on my Sparc 5 and Sparc 10.

Andrey Petrov contacted me and asked me to log the debugging output
with the ncr controller set to debugging mode, but the disk hang happens
immediately after logging in on the console, so I'm not able to provide
any SCSI debug logging from here.  He indicates that his Ultra has no
problems with tagged queuing, however.

Perhaps it is only an issue on the slower machines.

Additionally, a patch sent to the list by john heasley made no
difference for me either.

I have, however, submitted a report through send-pr.

Are there any Sparc 5 or 10 (or 20) users out there actually running
the latest kernel code with no on-board SCSI problems?

-scott

> 
> On Wed, Jun 20, 2001 at 11:47:22AM +0100, David Brownlee wrote:
> > 	Could you try commenting out
> > 		xm.xm_mode |= PERIPH_CAP_TQING
> > 	in (I think)
> > 		/sys/dev/ic/ncr53c9x.c
> > 
> > -- 
> > 		David/absolute		-- www.netbsd.org: No hype required --
> > 
> > 
> > On Tue, 19 Jun 2001, Jim Bernard wrote:
> > 
> > >   Same here, also on a sparc 20.  With recent kernels I occasionally see
> > > scsi parity errors on one disk (can't seem to find a real hardware fault,
> > > though), and eventually the system just hangs.  Some things continue to
> > > work for a while after others stop working (logins via ssh seem to be one
> > > of the first to go, while sendmail continues to work much longer).  I also
> > > found that if I happened to be logged in while it was in this state,
> > > it would eventually just not execute commands, and after a bit more time
> > > even window focus changes would stop working.  The last working kernel
> > > I have is 1.5U from mid April.  The first one on which I observed the
> > > failure was 1.5V from May 29.  Unfortunately, that's all the info I
> > > have on the problem so far.
> > >
> > >   BTW: I noticed that the most recent working kernel shows tagged queuing
> > > rejected on all the disks, e.g.:
> > >
> > > sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST34555N, 0930> SCSI2 0/direct fixed
> > > sd0(esp0:0:0): max sync rate 10.00MB/s
> > > esp0: tagged queuing rejected: target 0
> > >
> > > whereas the problematic kernel I built June 16 shows it enabled:
> > >
> > > sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST34555N, 0930> SCSI2 0/direct fixed
> > > sd0: 4340 MB, 6300 cyl, 8 head, 176 sec, 512 bytes/sect x 8888924 sectors
> > > sd0: sync (100.0ns offset 15), 8-bit (10.000MB/s) transfers, tagged queueing
> > >
> > > I don't know whether this is related to the problem.
> > >
> > > --Jim
> > >