port-alpha: RAIDFrame problem

Subject: RAIDFrame problem
To: NetBSD/alpha <port-alpha@netbsd.org>
From: Matt Dainty <matt@bodgit-n-scarper.com>
List: port-alpha
Date: 07/04/2003 21:32:19

Hi,

I recently converted my PC164LX, running NetBSD 1.6, to a RAIDFrame
RAID 1 setup using a pair of Seagate ST39102LC disks on an Adaptec
2940UW.

Once I got my head around the setup, it has run great in terms of
performance and usability, but I've had two crashes of the same nature,
and I was wondering if anyone might shed some light.

I didn't collect the data from the first crash as I thought it might be
just a freak occurrence, but the second time I copied the relevant bits
from dmesg after getting dropped to the db> prompt, and I'm fairly sure
they're the same type of errors as the first time:

...
tlp0: transmit underrun; new threshold: 160/1024 bytes
sd1(ahc0:0:5:0): SCB 6a - timed out while idle, SEQADDR == 0xa
SCSIRATE == 0x0
sd1(ahc0:0:5:0): Queuing a BDR SCB
sd1(ahc0:0:5:0): Bus Device Reset Message Sent
sd1(ahc0:0:5:0): no longer in timeout, status = 0
sd1: async, 8-bit transfers, tagged queueing
ahc0: Bus Device Reset on A:5. 10 SCBs aborted
sd1: sync (50.0ns offset 8), 16-bit (40.000MB/s) transfers, tagged
queueing
sd1(ahc0:0:5:0): SCB 6a - timed out while idle, SEQADDR == 0xa
SCSIRATE == 0x0
sd1(ahc0:0:5:0): Queuing a BDR SCB
sd1(ahc0:0:5:0): Bus Device Reset Message Sent
sd1(ahc0:0:5:0): no longer in timeout, status = 2
sd1: async, 8-bit transfers, tagged queueing
ahc0: Bus Device Reset on A:5. 10 SCBs aborted
sd1: sync (50.0ns offset 8), 16-bit (40.000MB/s) transfers, tagged
queueing
sd1(ahc0:0:5:0): SCB 6a - timed out while idle, SEQADDR == 0xa
SCSIRATE == 0x0
sd1(ahc0:0:5:0): Queuing a BDR SCB
sd1(ahc0:0:5:0): Bus Device Reset Message Sent
sd1(ahc0:0:5:0): no longer in timeout, status = 2
sd1: async, 8-bit transfers, tagged queueing
ahc0: Bus Device Reset on A:5. 10 SCBs aborted
sd1: sync (50.0ns offset 8), 16-bit (40.000MB/s) transfers, tagged
queueing
sd1(ahc0:0:5:0): SCB 6a - timed out while idle, SEQADDR == 0xc
SCSIRATE == 0x0
sd1(ahc0:0:5:0): Queuing a BDR SCB
sd1(ahc0:0:5:0): Bus Device Reset Message Sent
sd1(ahc0:0:5:0): no longer in timeout, status = 2
sd1: async, 8-bit transfers, tagged queueing
ahc0: Bus Device Reset on A:5. 10 SCBs aborted
sd1: sync (50.0ns offset 8), 16-bit (40.000MB/s) transfers, tagged
queueing
sd1(ahc0:0:5:0): SCB 6a - timed out while idle, SEQADDR == 0xd
SCSIRATE == 0x0
sd1(ahc0:0:5:0): Queuing a BDR SCB
sd1(ahc0:0:5:0): Bus Device Reset Message Sent
sd1(ahc0:0:5:0): no longer in timeout, status = 2
sd1: async, 8-bit transfers, tagged queueing
ahc0: Bus Device Reset on A:5. 10 SCBs aborted
raid0: IO Error.  Marking /dev/sd1d as failed.
raid0: node (Wsd) returned fail, rolling forward
raid0: node (Wsd) returned fail, rolling forward
raid0: node (Wsd) returned fail, rolling forward
raid0: node (Wsd) returned fail, rolling forward
raid2: IO Error.  Marking /dev/sd1f as failed.
raid2: node (Wsd) returned fail, rolling forward
raid2: node (Wsd) returned fail, rolling forward
raid2: node (Wsd) returned fail, rolling forward
raid2: node (Wsd) returned fail, rolling forward
raid2: node (Wsd) returned fail, rolling forward
raid2: node (Wsd) returned fail, rolling forward
sd0(ahc0:0:6:0): SCB 65 - timed out while idle, SEQADDR == 0xa
SCSIRATE == 0x0
sd0(ahc0:0:6:0): Queuing a BDR SCB
sd0(ahc0:0:6:0): Bus Device Reset Message Sent
sd0(ahc0:0:6:0): no longer in timeout, status = 0
sd0: async, 8-bit transfers, tagged queueing
ahc0: Bus Device Reset on A:6. 9 SCBs aborted
sd0: sync (50.0ns offset 8), 16-bit (40.000MB/s) transfers, tagged
queueing
sd0(ahc0:0:6:0): SCB 63 - timed out while idle, SEQADDR == 0xc
SCSIRATE == 0x0
sd0(ahc0:0:6:0): Queuing a BDR SCB
sd0(ahc0:0:6:0): Bus Device Reset Message Sent
sd0(ahc0:0:6:0): no longer in timeout, status = 2
sd0: async, 8-bit transfers, tagged queueing
ahc0: Bus Device Reset on A:6. 9 SCBs aborted
sd0: sync (50.0ns offset 8), 16-bit (40.000MB/s) transfers, tagged
queueing
sd0(ahc0:0:6:0): SCB 63 - timed out while idle, SEQADDR == 0xc
SCSIRATE == 0x0
sd0(ahc0:0:6:0): Queuing a BDR SCB
sd0(ahc0:0:6:0): Bus Device Reset Message Sent
sd0(ahc0:0:6:0): no longer in timeout, status = 2
sd0: async, 8-bit transfers, tagged queueing
ahc0: Bus Device Reset on A:6. 9 SCBs aborted
sd0: sync (50.0ns offset 8), 16-bit (40.000MB/s) transfers, tagged
queueing
sd0(ahc0:0:6:0): SCB 63 - timed out while idle, SEQADDR == 0xc
SCSIRATE == 0x0
sd0(ahc0:0:6:0): Queuing a BDR SCB
sd0(ahc0:0:6:0): Bus Device Reset Message Sent
sd0(ahc0:0:6:0): no longer in timeout, status = 2
sd0: async, 8-bit transfers, tagged queueing
ahc0: Bus Device Reset on A:6. 9 SCBs aborted
raid0: IO Error.  Marking /dev/sd0d as failed.
raid0: node (Wpd) returned fail, rolling forward
raid0: node (Wpd) returned fail, rolling forward
raid0: node (Wpd) returned fail, rolling forward
raid0: node (Wpd) returned fail, rolling forward
raid2: IO Error.  Marking /dev/sd0f as failed.
raid2: node (Wpd) returned fail, rolling forward
raid2: node (Wpd) returned fail, rolling forward
raid2: node (Wpd) returned fail, rolling forward
Multiple disks failed in a single group!  Aborting I/O operation.
Multiple disks failed in a single group!  Aborting I/O operation.
Multiple disks failed in a single group!  Aborting I/O operation.
Multiple disks failed in a single group!  Aborting I/O operation.
Multiple disks failed in a single group!  Aborting I/O operation.
Multiple disks failed in a single group!  Aborting I/O operation.
Multiple disks failed in a single group!  Aborting I/O operation.
Multiple disks failed in a single group!  Aborting I/O operation.
Multiple disks failed in a single group!  Aborting I/O operation.
Multiple disks failed in a single group!  Aborting I/O operation.
Multiple disks failed in a single group!  Aborting I/O operation.
Multiple disks failed in a single group!  Aborting I/O operation.
Multiple disks failed in a single group!  Aborting I/O operation.
Multiple disks failed in a single group!  Aborting I/O operation.
Multiple disks failed in a single group!  Aborting I/O operation.
Multiple disks failed in a single group!  Aborting I/O operation.
Multiple disks failed in a single group!  Aborting I/O operation.
Multiple disks failed in a single group!  Aborting I/O operation.
[Failed to create a DAG]
panic: raidframe error at line 458 file
/usr/src/sys/arch/alpha/compile/GELF/../../../../dev/raidframe/rf_states.c
db>
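
For anyone reading a log like the one above, a small script can tally
the SCSI timeouts per disk and list which components each RAID set
marked as failed, which makes the order of events easier to follow.
This is only a sketch: the regular expressions are guesses based on the
message formats visible in this excerpt, and the names `summarise`,
`TIMEOUT_RE`, `FAILED_RE` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Patterns inferred from the excerpt above (an assumption); other
# kernels or driver versions may word these messages differently.
TIMEOUT_RE = re.compile(r"^(sd\d+)\(ahc\d+:\d+:\d+:\d+\): SCB \w+ - timed out")
FAILED_RE = re.compile(r"^(raid\d+): IO Error\.\s+Marking (\S+) as failed\.")


def summarise(log_text):
    """Return (timeouts per disk, failed components per RAID set, in log order)."""
    timeouts = Counter()
    failed = {}
    for line in log_text.splitlines():
        line = line.strip()
        m = TIMEOUT_RE.match(line)
        if m:
            timeouts[m.group(1)] += 1
            continue
        m = FAILED_RE.match(line)
        if m:
            failed.setdefault(m.group(1), []).append(m.group(2))
    return timeouts, failed


# A few lines taken from the log above, used as a built-in sample.
SAMPLE = """\
sd1(ahc0:0:5:0): SCB 6a - timed out while idle, SEQADDR == 0xa
raid0: IO Error.  Marking /dev/sd1d as failed.
sd0(ahc0:0:6:0): SCB 65 - timed out while idle, SEQADDR == 0xa
raid0: IO Error.  Marking /dev/sd0d as failed.
"""

if __name__ == "__main__":
    timeouts, failed = summarise(SAMPLE)
    for disk, count in sorted(timeouts.items()):
        print(f"{disk}: {count} SCB timeout(s)")
    for raid, components in sorted(failed.items()):
        print(f"{raid}: failed components {components}")
```

To use it on a real capture, pass the saved dmesg text to `summarise()`
in place of `SAMPLE`. In the full log above, sd1 shows five timeouts
and sd0 shows four before both halves of raid0 and raid2 are marked
failed.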

In both cases, I've powered the machine back up, the parity has been
reconstructed across the two disks and the machine has worked again for
4 or 5 days before crashing again.

I've taken both disks and connected them to an Adaptec 29160 in my PC,
and used the BIOS utility on the card to perform a full verification.
Neither disk showed a problem, so I'm not sure there's a hardware
issue, at least with the disks.

I looked through the changelog for 1.6 -> 1.6.1 and couldn't see
anything obvious pertaining to either RAIDFrame or the Adaptec ahc
driver, but is this a known issue at all?

My disk setup is pretty standard: I have raid0 for /, raid1 for swap
and raid2 for /usr, /var, etc. It came from a HOWTO I read somewhere via
Google, but I gather it's typical.

Matt
-- 
"Phased plasma rifle in a forty-watt range?"
"Hey, just what you see, pal."
