NetBSD-Users archive


RAIDFrame issues ...



I've got a NetBSD 4.0 laptop with a single internal drive, wd0.  I've set
the disk up as one half of a deliberately broken RAIDFrame mirror, with the
intention of being able to perform full disk backups to an external disk of
the same type by attaching that disk and resyncing the mirror.

Initially, raidctl reports /dev/wd0a as optimal and component1 as 
failed, which is the usual situation for this laptop.  I attached the
external disk and added /dev/sd0a as a spare and initiated the resync.
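
For reference, the sequence described above would look roughly like the
following.  This is a sketch based on raidctl(8), not a paste of the actual
session: the function name resync_to_spare and the RAIDCTL override variable
are made up for illustration, and component1, /dev/sd0a and raid0 are taken
from the status output further down.

```shell
#!/bin/sh
# Sketch: add a hot spare and rebuild the failed component onto it.
# RAIDCTL can be overridden (e.g. RAIDCTL=echo) to print the commands
# instead of running them; this variable is an illustration only.
RAIDCTL=${RAIDCTL:-raidctl}

resync_to_spare() {
    raid=$1
    failed=$2
    spare=$3
    # register the external disk as a hot spare
    $RAIDCTL -a "$spare" "$raid" || return 1
    # fail the named component (a no-op if it has already failed) and
    # start reconstructing onto an available spare
    $RAIDCTL -F "$failed" "$raid" || return 1
    # report reconstruction progress
    $RAIDCTL -S "$raid"
}

# usage (as root): resync_to_spare raid0 component1 /dev/sd0a
```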

All of that went fine.  I then tested this by syncing the filesystems and
physically yanking the cable to the external disk ... laptop kept on
humming away nicely, no panic, all good[1] ...

After this test I thought I'd see what happened if I re-attached the 
disk to the host and tried to get the mirror to resync ... I can't
seem to get it to do it ...

snipped session log:

|wd0 at atabus0 drive 0: <WDC WD2500BEVE-00WZT0>
|wd0: drive supports 16-sector PIO transfers, LBA48 addressing
|wd0: 232 GB, 484521 cyl, 16 head, 63 sec, 512 bytes/sect x 488397168 sectors
|wd0: 32-bit data port
|wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
|wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
:
:
|raid0: RAID Level 1
|raid0: Components: /dev/wd0a component1[**FAILED**]
|raid0: Total Sectors: 488396928 (238475 MB)

I then attached the external disk:

|umass0 at uhub3 port 3 configuration 1 interface 0
|umass0: ITE TECH. INC. USB TO IDE, rev 2.00/2.00, addr 2
|umass0: using SCSI over Bulk-Only
|scsibus0 at umass0: 2 targets, 1 lun per target
|sd0 at scsibus0 target 0 lun 0: <, WDC WD2500BEVE-0, 01.0> disk fixed
|sd0: fabricating a geometry
|sd0: 232 GB, 238475 cyl, 64 head, 32 sec, 512 bytes/sect x 488397168 sectors
:
:
|sd0: fabricating a geometry
|Warning: truncating spare disk /dev/sd0a to 488396928 blocks (from 488397041)
|RECON: initiating reconstruction on col 1 -> spare at col 2
|wd0a: LBA48 bug reading fsbn 268435392 of 268435392-268435519 (wd0 bn 268435455; cn 266305 tn 0 sn 15), retrying
|wd0: soft error (corrected)

... and began a reconstruction:

|raid0: Reconstruction of disk at col 1 completed
|raid0: Recon time was 11804.145201 seconds, accumulated XOR time was 0 us (0.000000)
|raid0:  (start time 1205892682 sec 526134 usec, end time 1205904486 sec 671335 usec)
|raid0: Total head-sep stall count was 0
|raid0: 7629308 recon event waits, 7 recon delays
|raid0: 4143270104 max exec ticks

... then pulled the cable out:

|umass0: at uhub3 port 3 (addr 2) disconnected
|sd0 detached
|scsibus0 detached
|umass0 detached
|raid0: IO Error.  Marking /dev/sd0a as failed.
|maja[~/projects/rrrchive] 8v>: raidctl -s raid0
|Components:
|           /dev/wd0a: optimal
|          component1: spared
|Spares:
|           /dev/sd0a: failed
|Component label for /dev/wd0a:
:
:
|component1 status is: spared.  Skipping label.
|/dev/sd0a status is: failed.  Skipping label.
|Parity status: clean
|Reconstruction is 100% complete.
|Parity Re-write is 100% complete.
|Copyback is 100% complete.

... at this point everything is looking OK, so I reattached the drive ... 

|umass0 at uhub3 port 3 configuration 1 interface 0
|umass0: ITE TECH. INC. USB TO IDE, rev 2.00/2.00, addr 2
|umass0: using SCSI over Bulk-Only
|scsibus0 at umass0: 2 targets, 1 lun per target
|sd0 at scsibus0 target 0 lun 0: <, WDC WD2500BEVE-0, 01.0> disk fixed
|sd0: fabricating a geometry
|sd0: 232 GB, 238475 cyl, 64 head, 32 sec, 512 bytes/sect x 488397168 sectors
|sd0: fabricating a geometry

... then tried several things to get the re-attached drive synced again:

|maja[~/projects/rrrchive] 10v>: sudo raidctl -F /dev/sd0a raid0
|Password:
|raidctl: ioctl (RAIDFRAME_FAIL_DISK) failed: Invalid argument
|maja[~/projects/rrrchive] 12v>: raidctl -s raid0
|Components:
|           /dev/wd0a: optimal
|          component1: spared
|Spares:
|           /dev/sd0a: failed
|component1 status is: spared.  Skipping label.
:
:
|/dev/sd0a status is: failed.  Skipping label.
|Parity status: clean
|Reconstruction is 100% complete.
|Parity Re-write is 100% complete.
|Copyback is 100% complete.

(checked the disk label, to make sure I hadn't accidentally fragged it):

|maja[~/projects/rrrchive] 14v>: disklabel sd0
|# /dev/rsd0d:
:
:
|16 partitions:
|#        size    offset     fstype [fsize bsize cpg/sgs]
| a: 488397105        63       RAID                     # (Cyl.      0*- 484520)
| c: 488397105        63     unused      0     0        # (Cyl.      0*- 484520)
| d: 488397168         0     unused      0     0        # (Cyl.      0 - 484520)
| e:   4194304       127     4.2BSD      0     0     0  # (Cyl.      0*-   4161*)
| f:   1048576   4194431       swap                     # (Cyl.   4161*-   5201*)
| g:   4194304   5243007     4.2BSD      0     0     0  # (Cyl.   5201*-   9362*)
| h:   4194304   9437311     4.2BSD      0     0     0  # (Cyl.   9362*-  13523*)
| i: 474765440  13631615     4.2BSD      0     0     0  # (Cyl.  13523*- 484520*)
|disklabel: partitions a and e overlap
|disklabel: partitions a and f overlap
|disklabel: partitions a and g overlap
|disklabel: partitions a and h overlap
|disklabel: partitions a and i overlap

(this is good - allows me to get to the filesystems if raid is broken)
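
The overlapping partitions line up with the RAID contents by simple offset
arithmetic.  A sketch of that arithmetic, under two assumptions that are not
shown in the output above: that RAIDframe reserves 64 sectors at the start of
each component (RF_PROTECTED_SECTORS in the NetBSD sources, as far as I can
tell), and that the first partition inside raid0 starts at offset 0.

```shell
#!/bin/sh
a_offset=63          # offset of sd0a, from the disklabel above
a_size=488397105     # size of sd0a, from the disklabel above
protected=64         # ASSUMED: sectors RAIDframe reserves per component
raid_part_offset=0   # ASSUMED: offset of the first partition inside raid0

# where that partition's data starts on the raw disk
raw_offset=$((a_offset + protected + raid_part_offset))
# sectors of sd0a left for data once the reserved area is taken off
usable=$((a_size - protected))

echo "raw offset: $raw_offset"
echo "usable sectors: $usable"
```

This gives 127, the offset of partition e, and 488397041, the "from" figure
in the "truncating spare disk" warning earlier, so the assumed 64 sectors are
at least consistent with the logs.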

... and tried again to initiate a reconstruction:

|maja[~/projects/rrrchive] 15v>: sudo raidctl -R /dev/sd0a raid0
|raidctl: ioctl (RAIDFRAME_REBUILD_IN_PLACE) failed: Invalid argument
|maja[~/projects/rrrchive] 17v>: sudo raidctl -r component1 raid0
|maja[~/projects/rrrchive] 18v>: raidctl -s raid0
|Components:
|           /dev/wd0a: optimal
|          component1: spared
|Spares:
|           /dev/sd0a: failed
|Component label for /dev/wd0a:
:
:
|component1 status is: spared.  Skipping label.
|/dev/sd0a status is: failed.  Skipping label.
|Parity status: clean
|Reconstruction is 100% complete.
|Parity Re-write is 100% complete.
|Copyback is 100% complete.
|maja[~/projects/rrrchive] 23v>: sudo raidctl -R /dev/sd0a raid0
|raidctl: ioctl (RAIDFRAME_REBUILD_IN_PLACE) failed: Invalid argument

Finally I decided to try adding the reconnected drive as a spare again, to
see if it would do the Right Thing:

|maja[~/projects/rrrchive] 24v>: sudo raidctl -a /dev/sd0a raid0
|maja[~/projects/rrrchive] 25v>: raidctl -s raid0
|Components:
|           /dev/wd0a: optimal
|          component1: spared
|Spares:
|           /dev/sd0a: failed
|           /dev/sd0a: spare
|Component label for /dev/wd0a:
:
:
|component1 status is: spared.  Skipping label.
|/dev/sd0a status is: failed.  Skipping label.
|/dev/sd0a status is: spare.  Skipping label.
|Parity status: clean
|Reconstruction is 100% complete.
|Parity Re-write is 100% complete.
|Copyback is 100% complete.

... ?  the device is listed twice, once as failed, once as spare?
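
A quick way to spot this condition from a script, as a sketch only: the
function name dup_spares is made up, and it assumes the `raidctl -s` layout
shown above (a "Spares:" line followed by indented "device: status" lines,
without the leading "|" used for quoting here).

```shell
#!/bin/sh
# List devices that appear more than once under "Spares:" in
# `raidctl -s` output read from stdin, with the number of occurrences.
dup_spares() {
    awk '
        /^Spares:/             { in_spares = 1; next }
        in_spares && /^[^ \t]/ { in_spares = 0 }   # unindented line ends the section
        in_spares && NF >= 2   { sub(/:$/, "", $1); count[$1]++ }
        END { for (d in count) if (count[d] > 1) print d, count[d] }
    '
}

# usage: raidctl -s raid0 | dup_spares
```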

|maja[~/projects/rrrchive] 26v>: sudo raidctl -R /dev/sd0a raid0
|raidctl: ioctl (RAIDFRAME_REBUILD_IN_PLACE) failed: Invalid argument
|maja[~/projects/rrrchive] 27v>: sudo raidctl -r /dev/sd0a raid0
|maja[~/projects/rrrchive] 28v>: raidctl -s raid0
|Components:
|           /dev/wd0a: optimal
|          component1: spared
|Spares:
|           /dev/sd0a: failed
|           /dev/sd0a: spare
|Component label for /dev/wd0a:
:
:
|component1 status is: spared.  Skipping label.
|/dev/sd0a status is: failed.  Skipping label.
|/dev/sd0a status is: spare.  Skipping label.
|Parity status: clean
|Reconstruction is 100% complete.
|Parity Re-write is 100% complete.
|Copyback is 100% complete.

... so I'm stumped - I would guess that if I removed the disk, rebooted
the host and added it again it might work, but I'd prefer to see if I
can get it working without the reboot.

Was there something I should have done that I didn't?  Any thoughts?
I'm surprised that none of the remove device commands did anything ...

Regards,
Malcolm

[1] there may be a better way to break the mirror administratively, but
I figured I'd not yet explored this behaviour.  Besides, even if I broke
the mirror administratively, the filesystems would still need to be
fscked if/when they were mounted again.

-- 
Malcolm Herbert                                This brain intentionally
mjch%mjch.net@localhost                                      left blank

