Subject: Re: ahc & mpt scsi timeouts
To: Tracy Di Marco White <netbsd@gendalia.org>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-kern
Date: 05/27/2006 09:30:08
On Fri, May 26, 2006 at 11:36:43PM -0500, Tracy Di Marco White wrote:
> 
> I have a machine with 4 tape drives attached, each on their own scsi
> chain, to do backups with.  I regularly get these timeouts, hanging
> the process accessing a drive, and requiring me to restart the machine,
> and causing problems with backups.  The tape drives are attached via
> ahc(4) cards.  It also has two spool disks, attached via mpt(4).
> 
> I am running a not exactly new current at this point.  It is a
> multiprocessor machine that I am running UP in hopes that it would
> be more stable.  The only modification I have to the kernel is
> that I doubled ST_IO_TIME in src/sys/dev/scsipi/stvar.h from
> 3 minutes to 6 minutes.
> 
> Is there something I can do to make these stop happening, and
> allow backups to work more consistently?
> 
> The mpt timeouts only prevent me from booting, and if I reboot
> it, possibly a few times, it'll eventually come up.  They look
> like:
> 
>   probe(mpt0:0:0:0): command timeout
>   mpt0: timeout on request index = 0xfe, seq = 0x00000068
>   mpt0: Status 0x80000000, Mask 0x00000001, Doorbell 0x24000000
>   mpt0: request state: On Chip
>   probe(mpt0:0:1:0): command timeout
> 
> and are repeated over & over until I drop to the debugger
> to reboot, or it finally drops to single user mode, unable
> to mount the spool disks and I reboot it.  These don't happen
> in any consistent fashion.
> 
> As for the adaptec timeouts, I don't see them in any consistent
> fashion either.  I believe they're usually on commands where a
> tape is being mounted for write.
> 
> ahc4:SCB 0xe - timed out

Do you always see the timeouts on mpt0 and ahc4, or can they occur on any
adapter?
I notice that mpt0 and ahc4 share an interrupt with the PERC.

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--