Subject: Re: ahc freezes
To: None <current-users@netbsd.org>
From: Patrick Welche <prlw1@newn.cam.ac.uk>
List: current-users
Date: 12/21/2000 12:21:34
Conjecture: lfs and ffs+ubc are "faster" than plain ffs, so with ffs + small
i/o buffers the <QUANTUM, XP34550S, LYK8> (Atlas II) can cope with tagged
queueing, whereas under the heavier load from lfs or ffs+ubc it cannot. The
box then freezes, probably because the swap partition is on that disk.

Only fact: switching off tagged queueing stopped the freezes.

Does the conjecture make sense? Is it a fixable problem with the ahc driver,
or a problem with the disk? (AFAIK LYK8 is the latest firmware.) Can one
enable tagged queueing for one device and disable it for another on the
same scsibus?
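
To make the last question concrete: per-device control of this kind is commonly done with a quirk table keyed on the INQUIRY vendor and product strings. The following is only a minimal sketch of that idea. Every name in it (QUIRK_NOTAG, struct quirk_entry, lookup_quirks) is hypothetical and is not taken from the NetBSD scsipi or ahc sources.

```c
#include <string.h>

/* Hypothetical flag: do not use tagged queueing on this device. */
#define QUIRK_NOTAG 0x01

/* Hypothetical quirk entry, keyed on INQUIRY vendor/product strings. */
struct quirk_entry {
    const char *vendor;
    const char *product;
    int flags;
};

/* Only the Atlas II from the report is listed; every other device,
 * such as the IBM DNES-318350 on the same bus, keeps the default. */
static const struct quirk_entry quirk_table[] = {
    { "QUANTUM", "XP34550S", QUIRK_NOTAG },
};

/* Return the quirk flags for a device, or 0 if it has no entry. */
int lookup_quirks(const char *vendor, const char *product)
{
    size_t n = sizeof(quirk_table) / sizeof(quirk_table[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(quirk_table[i].vendor, vendor) == 0 &&
            strcmp(quirk_table[i].product, product) == 0)
            return quirk_table[i].flags;
    }
    return 0;
}
```

With such a table, a driver would check lookup_quirks() for each target at attach time and skip tagged queueing only where QUIRK_NOTAG is set.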

Cheers,

Patrick

On Tue, Dec 19, 2000 at 05:29:13PM +0000, Patrick Welche wrote:
> I am having strange freezes which are hard to pin down. A 1.5L/i386 kernel of
> Wed Nov 29 16:52:19 GMT 2000 seems fine, though I may just not have used the
> box enough to exhibit the freeze. With kernels of yesterday and this
> morning, the box locks up solid with no error messages.
> 
> The computer has:
> 
> ahc0: interrupting at irq 14
> ex0: interrupting at irq 14
> 
> ahc0: aic7860 ahc0: dmamem for shared data at busaddr 8000 virt ca6f7000 nseg 1 size 768
> ahc0: dmamem for hardware SCB structures at busaddr 9000 virt ca6f8000 nseg 1 size 16320
> ahc0: dmamem for sense buffers at busaddr d000 virt ca6fc000 nseg 1 size 8160
> ahc0: dmamem for SG space at busaddr f000 virt ca6fe000 nseg 1 size 4096
> Single Channel A, SCSI Id=7, 3/255 SCBs
> ahc0: hardware scb 64 bytes; kernel scb 40 bytes; ahc_dma 8 bytes
> DISCENABLE == 0xffff00ff
> ULTRAENB == 0x85
> scsibus0 at ahc0 channel 0: 8 targets, 8 luns per target
> ahc0: target 0 synchronous at 20.0MHz, offset = 0xf
> ahc0: target 0 using tagged queuing
> sd0 at scsibus0 target 0 lun 0: <IBM, DNES-318350, SA30> SCSI3 0/direct fixed
> ahc0: target 2 synchronous at 20.0MHz, offset = 0xf
> ahc0: target 2 using tagged queuing
> sd1 at scsibus0 target 2 lun 0: <QUANTUM, XP34550S, LYK8> SCSI2 0/direct fixed
> 
> Once, just before it happened, many "sd1(ahc0:2:0): queue full" messages
> appeared. Building a kernel with AHC_DEBUG greatly increased the time between
> freezes, possibly because of having to write all the
> 
> sd0(ahc0:0:0): Handled Residual of 0 bytes
> sd1(ahc0:2:0): Handled Residual of 0 bytes
> 
> messages to /var/log/messages on sd0. But finally it did hang, after the
> following appeared:
> 
> sd0(ahc0:0:0): SCB 1d - timed out in Data-in phase, SEQADDR == 0x110
> SCSIRATE == 0xf
> scb:0xc074a488 tag 1d control:0x6a tcl:0x0 cmdlen:10 cmdpointer:0x9760
>         datlen:4096 data:0x19b5000 segs:0x2 segp:0xff70
>         sg_addr:19b5000 sg_len:4096
>         cdb:28 0 2 9 9b d0 0 0 10 0 0 0
> sd0(ahc0:0:0): BDR message in message buffer
> ): queue full
> sd1(ahc0:2:0): Handled Residual of 8192 bytes
> sd1(ahc0:2:0): queue full
> sd1(ahc0:2:0): Handled Residual of 5120 bytes
> sd1(ahc0:2:0): queue full
> sd1(ahc0:2:0): Handled Residual of 2048 bytes
> sd1(ahc0:2:0): queue full
> sd1(ahc0:2:0): Handled Residual of 2048 bytes
> sd1(ahc0:2:0): queue full
> sd1(ahc0:2:0): Handled Residual of 8192 bytes
> sd1(ahc0:2:0): queue full
> 
> etc. - no recovery.
> 
> Any ideas?
> 
> Cheers,
> 
> Patrick
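
For anyone reading the quoted timeout report: the cdb line starts with opcode 0x28, which is READ(10). In that command, bytes 2-5 hold the big-endian logical block address and bytes 7-8 hold the transfer length in blocks. The sketch below decodes those fields. The struct and function names are made up for illustration and do not come from the driver.

```c
#include <stdint.h>

/* Hypothetical holder for the decoded READ(10) fields. */
struct read10 {
    uint32_t lba;      /* logical block address, CDB bytes 2..5 */
    uint16_t nblocks;  /* transfer length in blocks, CDB bytes 7..8 */
};

/* Decode a READ(10) CDB. Returns 0 on success, -1 if the opcode
 * is not 0x28. */
int decode_read10(const uint8_t cdb[10], struct read10 *out)
{
    if (cdb[0] != 0x28)
        return -1;

    out->lba = ((uint32_t)cdb[2] << 24) |
               ((uint32_t)cdb[3] << 16) |
               ((uint32_t)cdb[4] << 8)  |
                (uint32_t)cdb[5];
    out->nblocks = (uint16_t)(((unsigned)cdb[7] << 8) | cdb[8]);
    return 0;
}
```

Applied to the bytes 28 0 2 9 9b d0 0 0 10 0 from the report, this gives LBA 0x02099bd0 and a transfer length of 16 blocks, which would be 8192 bytes if the disk uses 512-byte sectors (an assumption, as the report does not state the sector size).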