Subject: Re: savecore kills scsi bus?
To: ww <ww@styx.org>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: port-sparc64
Date: 12/17/2001 12:15:48
On Mon, Dec 17, 2001 at 01:29:22AM -0500, ww wrote:
> this is odd...
> 
> the machine in question is an ultra5 with an ide disk and a
> scsi disk on a symbios logic 53c860 that i picked up at a
> local hardware store. the disk layout is as follows:
> 
> Filesystem  1K-blocks     Used     Avail Capacity  Mounted on
> /dev/wd0a      127175    29937     90879    24%    /
> /dev/wd0d     1014221    55353    908156     5%    /var
> /dev/wd0g     5988356   234833   5454105     4%    /usr
> mfs:109          7911        1      7514     0%    /tmp
> /dev/sd0a     4061755  1259337   2599330    32%    /src
> /dev/sd0b     4061755       10   3858657     0%    /home
> /dev/sd0g     1015318   157670    806882    16%    /sparc32
> 
> i did something that caused a kernel panic (an apparently
> repeatable problem relating to sparc32 emulation and 
> chroot). when savecore runs upon a reboot, the ide controller
> gets confused:
> 
> pciide0:0:0: lost interrupt
>         type: ata tc_bcount: 65536 tc_skip: 0
> pciide0:0:0: bus-master DMA error: status=0x22
> pciide0:0:0: bus-master DMA error: missing interrupt, status=0x22
> 
> and a disk operation fails recoverably:
> 
> wd0: transfer error, downgrading to PIO mode 4
> wd0(pciide0:0:0): using PIO mode 4
> wd0b: DMA error reading fsbn 1834496 of 1834496-1834623 (wd0 bn 2097584; cn 2080 tn 14 sn 62), retrying
> wd0: soft error (corrected)

At this point the IDE driver stops using DMA and falls back to PIO.

> 
> a little later in the boot sequence, when an attempt is made to mount /dev/sd0a,
> the scsi controller resets and does not recover:
> 
> DMA IRQ: Illegal instruction dma fifo empty, DSP=0x6cca8 DSA=0xffffffff: siop0: current DSA invalid
> siop0: scsi bus reset
> sd0(siop0:0:6:0): command with tag id 1 reset
> 
> i have only seen this behaviour when savecore runs, never in any
> other circumstance.

It looks like DMA stops working completely, for both IDE and SCSI.
The IDE driver recovers by falling back to PIO, but siop has no such fallback.
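
To make the difference concrete, here is a toy model in C. It is only a
sketch, not the actual pciide/wd or siop driver code: the names toy_ctrl and
toy_transfer are made up for illustration, and the has_pio flag simply
encodes my assumption that the IDE path can move data with the CPU while
siop depends entirely on bus-master DMA.

```c
#include <stdbool.h>

/* Toy model only: hypothetical names, not NetBSD driver code. */
enum xfer_result { XFER_OK_DMA, XFER_OK_PIO, XFER_FAILED };

struct toy_ctrl {
    bool dma_works; /* is host bus-master DMA functional right now? */
    bool has_pio;   /* can the CPU move the data itself (assumed: IDE yes, siop no)? */
    bool use_dma;   /* transfer mode the driver is currently using */
};

/* Try one transfer; downgrade to PIO when DMA fails and PIO exists. */
enum xfer_result toy_transfer(struct toy_ctrl *c)
{
    if (c->use_dma && c->dma_works)
        return XFER_OK_DMA;
    if (!c->has_pio)
        return XFER_FAILED;   /* nothing to fall back to */
    c->use_dma = false;       /* like "downgrading to PIO mode 4" */
    return XFER_OK_PIO;
}
```

With dma_works set to false, a controller with has_pio set completes the
transfer in PIO mode and stays there, while one without it just fails, which
matches what the two drivers report above.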

Maybe something in the DMA engine doesn't get reset properly?

--
Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
--