Subject: RE: twe status queries?
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Gordon Waidhofer <gww@traakan.com>
List: tech-kern
Date: 12/02/2005 11:00:48
You could be getting burned by a slow drive.
A drive that only succeeds after internal
retries never reports a hard error, so the
controller goes on treating it as healthy.
You can try swapping drives, one position at
a time, between the two RAID sets and see
whether the problem moves with a particular
drive.

A better way, though time-consuming, is to
verify that each drive performs within some
tolerance of the others. RAID integrators
usually qualify individual drives this way
before putting them into RAID sets.
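
If it helps, here is a rough sketch of the
kind of check I mean (untested, and it assumes
each drive has been exported as its own unit
for the test; the device name is only an
example):

/* drivecheck.c - time a sequential read from a
 * raw device and report MB/s.  Run it against
 * each drive in turn and compare the numbers. */
#include <sys/time.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFSIZE	(1024 * 1024)	/* 1 MB per read */
#define NREADS	256		/* 256 MB total */

int
main(int argc, char **argv)
{
	struct timeval t0, t1;
	double secs;
	char *buf;
	int fd, i;

	if (argc != 2) {
		fprintf(stderr, "usage: %s /dev/rld2d\n", argv[0]);
		exit(1);
	}
	if ((fd = open(argv[1], O_RDONLY)) < 0) {
		perror(argv[1]);
		exit(1);
	}
	if ((buf = malloc(BUFSIZE)) == NULL) {
		perror("malloc");
		exit(1);
	}
	gettimeofday(&t0, NULL);
	for (i = 0; i < NREADS; i++) {
		if (read(fd, buf, BUFSIZE) != BUFSIZE) {
			fprintf(stderr, "short read at %d MB\n", i);
			break;
		}
	}
	gettimeofday(&t1, NULL);
	secs = (t1.tv_sec - t0.tv_sec) +
	    (t1.tv_usec - t0.tv_usec) / 1e6;
	printf("%s: %.1f MB/s\n", argv[1], i / secs);
	free(buf);
	close(fd);
	return 0;
}

A drive that comes in well below the pack
average - more than 10-15% off, say - is the
one to look at first.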

	-gww

> -----Original Message-----
> From: tech-kern-owner@NetBSD.org [mailto:tech-kern-owner@NetBSD.org] On
> Behalf Of der Mouse
> Sent: Friday, December 02, 2005 10:37 AM
> To: tech-kern@netbsd.org
> Subject: twe status queries?
> 
> 
> I'm working with a machine with a twe in it, running 2.0.  It's got 12
> drives, all identical as far as I can tell:
> 
> twe0 at pci3 dev 1 function 0: 3ware Escalade
> twe0: interrupting at irq 10
> twe0: 12 ports, Firmware FE7S 1.05.00.065, BIOS BE7X 1.08.00.048
> twe0: Monitor ME7X 1.01.00.038, PCB Rev5    , Achip 3.20    , Pchip 1.30-66
> twe0: port 0: ST3300831AS                              286168 MB
> twe0: port 1: ST3300831AS                              286168 MB
> twe0: port 2: ST3300831AS                              286168 MB
> twe0: port 3: ST3300831AS                              286168 MB
> twe0: port 4: ST3300831AS                              286168 MB
> twe0: port 5: ST3300831AS                              286168 MB
> twe0: port 6: ST3300831AS                              286168 MB
> twe0: port 7: ST3300831AS                              286168 MB
> twe0: port 8: ST3300831AS                              286168 MB
> twe0: port 9: ST3300831AS                              286168 MB
> twe0: port 10: ST3300831AS                              286168 MB
> twe0: port 11: ST3300831AS                              286168 MB
> 
> Because the twe doesn't support more than 2T in a single RAID, I've
> got the drives split into two RAID5s of six drives each, one from
> drives 0-5 and the other from drives 6-11:
> 
> ld0 at twe0 unit 0: 64K stripe RAID5, status: Normal
> ld0: 1397 GB, 182405 cyl, 255 head, 63 sec, 512 bytes/sect x 2930351360 sectors
> ld1 at twe0 unit 6: 64K stripe RAID5, status: Normal
> ld1: 1397 GB, 182405 cyl, 255 head, 63 sec, 512 bytes/sect x 2930351360 sectors
> 
> (the above quote is from the most recent reboot).
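> 
> (The sizes at least are self-consistent: RAID5 over six drives gives
> the capacity of five, and 5 x 286168 MB = 1430840 MB, i.e. about
> 1397 GB, which is what ld0 and ld1 report.)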
> 
> However, I'm seeing a tremendous performance difference between ld0
> and ld1.  I have a disk exerciser program; when I run it on ld0, it
> runs at a particular speed; run the same way on ld1, it runs at about
> 15% of that speed.  Yes, 3/20 of the speed - a factor of nearly
> seven.  At first I thought ld0 might have priority over ld1, since
> the two runs were going simultaneously, but when the ld0 run
> finished, the ld1 run didn't speed up at all.
> 
> The only thing I can think of that would explain this is a failed
> drive in ld1.  But the front panel lights give no indication, and
> dmesg contains nothing at all from either twe or ld since boot.  Is
> there any way to check the status of the arrays without bringing the
> machine down to the BIOS?  If not, any idea how hard it would be to
> hack such a thing into the driver?  Doing RAID seems semi-useless if
> you don't hear about drive failures in time to replace them before a
> second drive fails and you lose data.
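> 
> I'm imagining something along these lines from userland (pure
> sketch - the control device, the ioctl, the structure, and the
> status values are all invented here; the real work would be
> defining such an interface inside the twe driver):
> 
> /* hypothetical - the ioctl below would live in something
>  * like a tweio.h shipped with the driver */
> #include <sys/ioctl.h>
> #include <fcntl.h>
> #include <stdio.h>
> 
> struct twe_unit_status {
> 	int	tus_unit;	/* array unit number */
> 	int	tus_status;	/* 0 normal, 1 degraded, 2 rebuilding */
> };
> #define TWEIO_UNIT_STATUS	_IOWR('T', 0, struct twe_unit_status)
> 
> int
> main(void)
> {
> 	struct twe_unit_status tus;
> 	int fd, u;
> 
> 	if ((fd = open("/dev/twe0", O_RDWR)) < 0) {
> 		perror("/dev/twe0");	/* control node - also invented */
> 		return 1;
> 	}
> 	for (u = 0; u < 16; u++) {
> 		tus.tus_unit = u;
> 		if (ioctl(fd, TWEIO_UNIT_STATUS, &tus) < 0)
> 			continue;	/* no such unit */
> 		printf("unit %d: status %d\n", u, tus.tus_status);
> 	}
> 	return 0;
> }
> 
> Even having the driver poll the controller for events once in a
> while and log anything abnormal would beat the current silence.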
> 
> /~\ The ASCII				der Mouse
> \ / Ribbon Campaign
>  X  Against HTML	       mouse@rodents.montreal.qc.ca
> / \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B