Subject: disk failure/hot swap
To: None <port-alpha@netbsd.org>
From: Pete Vickers <pete.vickers@uk.adtranz.com>
List: port-alpha
Date: 03/09/2001 17:23:25
Hi All,

Quick query. I'm not sure whether this is o/s [NetBSD 1.4.2] or h/w [DEC Alpha
5000 with NCR SCSI card] related...

Basically, I started getting the first error below, so I assumed a disk was
dying. Therefore I [feeling brave & not having down-time available]:

- umounted it
- physically pulled the disk
- inserted a new disk
- since it's the same as the other disks, I tried:
     - 'disklabel sd4 > image.bin'
     - then, 'disklabel -R sd5 image.bin'
- at this point I was expecting to then do a newfs etc. & mount it...

But instead I got the second error below. Throughout all this, all the other
disks on the same SCSI bus & backplane continue to be fine. Am I asking too much
of the h/w here without using a RAID controller? [I have one, but there is no
NetBSD driver for it.] Or am I just being stupid?

Any thoughts or comments are welcome.



first error:
---------------------------------------------------------------------------------------------------
Mar  1 15:45:58 hal /netbsd: sd5(ncr2:2:0):  Check Condition on CDB: 0x2a 00 004e c0 c0 00 00 10 00
Mar  1 15:45:58 hal /netbsd:     SENSE KEY:  Recovered Error
Mar  1 15:45:58 hal /netbsd:    INFO FIELD:  5161167
Mar  1 15:45:58 hal /netbsd:  COMMAND INFO:  133365977 (0x7f300d9)
Mar  1 15:45:58 hal /netbsd:      ASC/ASCQ:  ASC 0x03 ASCQ 0xa8
Mar  1 15:45:58 hal /netbsd:      FRU CODE:  0x55
Mar  1 15:45:58 hal /netbsd:          SKSV:  Actual Retry Count: 11
Mar  1 15:45:58 hal /netbsd:
Mar  1 15:45:59 hal /netbsd: sd5(ncr2:2:0):  Check Condition on CDB: 0x2a 00 004e f9 a0 00 00 0c 00
Mar  1 15:45:59 hal /netbsd:     SENSE KEY:  Recovered Error
Mar  1 15:45:59 hal /netbsd:    INFO FIELD:  5175722
Mar  1 15:45:59 hal /netbsd:  COMMAND INFO:  133758994 (0x7f90012)
Mar  1 15:45:59 hal /netbsd:      ASC/ASCQ:  ASC 0x03 ASCQ 0xa8
Mar  1 15:45:59 hal /netbsd:      FRU CODE:  0x55
Mar  1 15:45:59 hal /netbsd:          SKSV:  Actual Retry Count: 14
Mar  1 15:45:59 hal /netbsd:
Mar  1 15:46:05 hal /netbsd: sd5(ncr2:2:0):  Check Condition on CDB: 0x2a 00 004e f9 a0 00 00 0c 00
Mar  1 15:46:05 hal /netbsd:     SENSE KEY:  Recovered Error
Mar  1 15:46:05 hal /netbsd:    INFO FIELD:  5175720
Mar  1 15:46:05 hal /netbsd:  COMMAND INFO:  133758992 (0x7f90010)
Mar  1 15:46:05 hal /netbsd:      ASC/ASCQ:  ASC 0x03 ASCQ 0xa8
Mar  1 15:46:05 hal /netbsd:      FRU CODE:  0x55
Mar  1 15:46:05 hal /netbsd:          SKSV:  Actual Retry Count: 21
---------------------------------------------------------------------------------------------------
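
A note on reading the log above: opcode 0x2a is the SCSI WRITE(10) command, whose
10-byte CDB carries a 4-byte big-endian LBA in bytes 2-5 and a 2-byte transfer
length in bytes 7-8. The Python sketch below is not part of the original report,
and the names decode_write10 and RECORDS are made up for illustration. It decodes
the three CDBs and checks whether each INFO FIELD value falls inside the block
range being written, assuming that INFO FIELD is the LBA the drive associates
with the recovered error. The hex tokens are concatenated before parsing so that
the fused "004e" token is read as the two bytes 00 4e; that reading is also an
assumption.

```python
def decode_write10(cdb_text):
    """Decode a 10-byte SCSI CDB printed as hex tokens into (opcode, lba, length)."""
    tokens = []
    for tok in cdb_text.split():
        # Drop a leading "0x" so every token is plain hex digits.
        tokens.append(tok[2:] if tok.lower().startswith("0x") else tok)
    # Concatenate first, so a fused token such as "004e" still yields two bytes.
    cdb = bytes.fromhex("".join(tokens))
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB, got %d bytes" % len(cdb))
    opcode = cdb[0]
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    length = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length in blocks
    return opcode, lba, length


# (CDB text, INFO FIELD) pairs copied from the three log records above.
RECORDS = [
    ("0x2a 00 004e c0 c0 00 00 10 00", 5161167),
    ("0x2a 00 004e f9 a0 00 00 0c 00", 5175722),
    ("0x2a 00 004e f9 a0 00 00 0c 00", 5175720),
]

if __name__ == "__main__":
    for cdb_text, info in RECORDS:
        opcode, lba, length = decode_write10(cdb_text)
        inside = lba <= info < lba + length
        print("opcode=0x%02x lba=%d len=%d info=%d inside_write=%s"
              % (opcode, lba, length, info, inside))
```

Under those assumptions, all three INFO FIELD values land inside their write
ranges (for example, 5161167 is the last block of the 16-block write starting
at LBA 5161152), which would be consistent with the drive reporting the block
it had to retry.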


error on disklabel command:
---------------------------------------------------------------------------------------------------
sd5(ncr2:2:0): COMMAND FAILED (9 80) @0xfffffe0000267c00.
sd5(ncr2:2:0): extraneous data discarded.
sd5(ncr2:2:0): COMMAND FAILED (9 80) @0xfffffe0000267c00.
sd5(ncr2:2:0): extraneous data discarded.
sd5(ncr2:2:0): COMMAND FAILED (9 80) @0xfffffe0000267c00.
sd5(ncr2:2:0): extraneous data discarded.
sd5(ncr2:2:0): COMMAND FAILED (9 80) @0xfffffe0000267c00.
sd5(ncr2:2:0): extraneous data discarded.
sd5(ncr2:2:0): COMMAND FAILED (9 80) @0xfffffe0000267c00.
sd5(ncr2:2:0): extraneous data discarded.
sd5(ncr2:2:0): COMMAND FAILED (9 80) @0xfffffe0000267c00.
---------------------------------------------------------------------------------------------------



from dmesg
---------------------------------------------------------------------------------------------------
...
DIGITAL Server 5000 Model 5305 6533A 5/533 4MB, 531MHz
....
ncr0 at pci0 dev 1 function 0: ncr 53c810 fast10 scsi
ncr0: interrupting at kn300 irq 36
ncr0: minsync=25, maxsync=206, maxoffs=8, 16 dwords burst, normal dma fifo
ncr0: single-ended, open drain IRQ driver
ncr0: restart (scsi reset).
...
ncr1 at pci2 dev 0 function 0: ncr 53c875 fast20 wide scsi
ncr1: interrupting at kn300 irq 12
ncr1: minsync=25, maxsync=254, maxoffs=16, 128 dwords burst, large dma fifo
ncr1: single-ended, open drain IRQ driver, using on-chip SRAM
ncr1: restart (scsi reset).
...
ncr2 at pci2 dev 1 function 0: ncr 53c875 fast20 wide scsi
ncr2: interrupting at kn300 irq 13
ncr2: minsync=25, maxsync=254, maxoffs=16, 128 dwords burst, large dma fifo
ncr2: single-ended, open drain IRQ driver, using on-chip SRAM
ncr2: restart (scsi reset).
...
sd5 at scsibus2 targ 2 lun 0: <DEC, RZ2DC-PA (C) DEC, 5520> SCSI2 0/direct fixed
sd5(ncr2:2:0): WIDE SCSI (16 bit) enabled
sd5(ncr2:2:0): 20.0 MB/s (100 ns, offset 16)
sd5: 8678MB, 5273 cyl, 20 head, 168 sec, 512 bytes/sect x 17773524 sectors
...
---------------------------------------------------------------------------------------------------
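
As a cross-check of the sd5 figures in the dmesg excerpt above, here is a small
Python sketch (not from the original report; the helper names are invented for
illustration). It assumes the reported "MB" is the sector count times the sector
size, divided by 2^20 and rounded down, and that the quoted transfer rate is one
transfer per synchronous period across the full bus width.

```python
SECTOR_BYTES = 512
TOTAL_SECTORS = 17773524
CYLINDERS, HEADS, SECTORS_PER_TRACK = 5273, 20, 168


def capacity_mib(sectors, sector_bytes=SECTOR_BYTES):
    """Capacity in units of 2**20 bytes, rounded down."""
    return sectors * sector_bytes // (1024 * 1024)


def sync_rate_mb_per_s(period_ns, bus_width_bits):
    """Peak synchronous rate: one transfer per period, bus_width_bits wide."""
    transfers_per_s = 1e9 / period_ns
    return transfers_per_s * (bus_width_bits // 8) / 1e6


def geometry_sectors(cyl, heads, spt):
    """Sector count implied by the nominal cylinder/head/sector geometry."""
    return cyl * heads * spt


if __name__ == "__main__":
    print("capacity:", capacity_mib(TOTAL_SECTORS), "MB")
    print("sync rate:", sync_rate_mb_per_s(100, 16), "MB/s")
    chs = geometry_sectors(CYLINDERS, HEADS, SECTORS_PER_TRACK)
    print("C*H*S sectors:", chs, "of", TOTAL_SECTORS, "reported")
```

The capacity and rate come out at the 8678MB and 20.0 MB/s that dmesg prints.
The cylinder/head/sector product (17717280) is a little below the reported
sector count, so the geometry is presumably nominal. A label copied from sd4
would need to agree with these sector figures on the replacement disk.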

regards,

Pete