Subject: raidframe and pciide lost interrupts
To: None <current-users@netbsd.org>
From: Simon Burge <simonb@wasabisystems.com>
List: current-users
Date: 11/22/2000 04:05:37
Hi,

One half of my raidframe mirror across wd0 and wd1 (a pair of IBM 46GB
disks) on my Alpha PC164 running 1.5_BETA2 just died with:

Nov 22 02:44:32 thoreau /netbsd: pciide0:0:0: lost interrupt
Nov 22 03:10:30 thoreau /netbsd:        type: ata tc_bcount: 8192 tc_skip: 0
Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: bus-master DMA error: missing interrupt, status=0x21
Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: device timeout, c_bcount=8192, c_skip0
Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
				(wd0 bn 5275776; cn 5233 tn 14 sn 30), retrying
Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: not ready, st=0x80, err=0x00
Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
				(wd0 bn 5275776; cn 5233 tn 14 sn 30), retrying
Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: not ready, st=0x80, err=0x00
Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
				(wd0 bn 5275776; cn 5233 tn 14 sn 30), retrying
Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: not ready, st=0x80, err=0x00
Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
				(wd0 bn 5275776; cn 5233 tn 14 sn 30), retrying
Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: not ready, st=0x80, err=0x00
Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
				(wd0 bn 5275776; cn 5233 tn 14 sn 30), retrying
Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: not ready, st=0x80, err=0x00
Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
				(wd0 bn 5275776; cn 5233 tn 14 sn 30)
Nov 22 03:10:30 thoreau /netbsd: raid0: IO Error.  Marking /dev/wd0a as failed.
Nov 22 03:10:30 thoreau /netbsd: raid0: node (Rmir) returned fail, rolling backward
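
For reading the "not ready, st=0x80, err=0x00" lines above, here is a minimal sketch that names the bits set in an ATA status byte. It assumes that "st=" is the drive's ATA status register; the helper name decode_ata_status and the table ATA_STATUS_BITS are made up for this illustration and are not part of any NetBSD tool.

```python
# ATA status register bits, highest bit first.
ATA_STATUS_BITS = {
    0x80: "BSY",   # drive busy
    0x40: "DRDY",  # drive ready
    0x20: "DF",    # device fault (write fault)
    0x10: "DSC",   # seek complete
    0x08: "DRQ",   # data request
    0x04: "CORR",  # corrected data
    0x02: "IDX",   # index
    0x01: "ERR",   # error, details in the error register
}


def decode_ata_status(value):
    """Return the names of the bits set in an ATA status byte (hypothetical helper)."""
    return [name for mask, name in ATA_STATUS_BITS.items() if value & mask]


# Assumption: st=0x80 in the log is the ATA status register.
print(decode_ata_status(0x80))  # ['BSY']
```

Under that assumption, st=0x80 means only BSY is set: the drive stayed busy and never reported ready after each reset attempt.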

This continued for about 10 minutes with lots of pciide and wd0 errors
interspersed with the following raidframe errors:

Nov 22 03:10:30 thoreau /netbsd: raid0: IO Error.  Marking /dev/wd0a as failed.
Nov 22 03:10:30 thoreau /netbsd: raid0: node (Rmir) returned fail, rolling backward
Nov 22 03:10:30 thoreau /netbsd: raid0: DAG failure: r addr 0x508040 (5275712) nblk 0x10 (16) buf 0xfffffe000307c000
Nov 22 03:11:22 thoreau /netbsd: raid0: node (Wpd) returned fail, rolling forward
Nov 22 03:12:14 thoreau /netbsd: raid0: node (Wpd) returned fail, rolling forward
Nov 22 03:12:14 thoreau /netbsd: raid0: node (Wpd) returned fail, rolling forward
Nov 22 03:15:41 thoreau /netbsd: raid0: node (Wpd) returned fail, rolling forward
Nov 22 03:20:01 thoreau /netbsd: raid0: node (Wpd) returned fail, rolling forward

and raidframe now seems to be ignoring wd0 altogether.
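
As a cross-check that the wd0 and raid0 messages refer to the same region, the block numbers in the logs can be compared with a little arithmetic. This is only a sketch: the geometry (16 heads, 63 sectors per track) and the 64-sector reserved area at the start of each RAIDframe component are assumptions, not values taken from the logs, and chs_to_block is a made-up helper.

```python
# Assumed disk geometry; not stated anywhere in the logs.
HEADS = 16
SECTORS_PER_TRACK = 63
# Assumed size of the area RAIDframe reserves at the start of a component.
PROTECTED_SECTORS = 64


def chs_to_block(cn, tn, sn):
    """Convert cylinder/track/sector to a block number (sector counted from 0)."""
    return (cn * HEADS + tn) * SECTORS_PER_TRACK + sn


# "wd0 bn 5275776; cn 5233 tn 14 sn 30" from the wd0a error lines.
wd_block = chs_to_block(5233, 14, 30)
# "r addr 0x508040 (5275712)" from the raid0 DAG failure line.
raid_addr = 0x508040

print(wd_block)              # 5275776
print(raid_addr)             # 5275712
print(wd_block - raid_addr)  # 64
```

With those assumptions the CHS values reproduce block 5275776, and the failed raid0 read starts exactly 64 sectors lower, which would match the assumed reserved area on the component.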

So, a couple of questions:

 1) Shouldn't raidframe have stopped accessing wd0 after the first
    "Marking /dev/wd0a as failed"?

 2) Is the disk hosed?  Sleep time now - I'll reboot in the morning and
    see what happens.

Simon.
--
Simon Burge                            <simonb@wasabisystems.com>
NetBSD Sales, Support and Service:  http://www.wasabisystems.com/