Subject: raid: failed device used after reboot
To: None <email@example.com>
From: Manuel Bouyer <firstname.lastname@example.org>
Date: 05/22/2000 19:16:17
I'm playing with an array of disks and raidframe, experimenting with various
failure types. Here's what I've just run into:
I have a raid1 spread across drives in different enclosures, in such a way that
I can power down one enclosure without losing the raid.
I got into trouble with the following scenario:
- start writing to the filesystem: dd if=/dev/zero of=file bs=64k
- power down one of the enclosures. raidframe marks the corresponding devices
as failed and continues running. dd doesn't stop.
- power the enclosure back on.
When the machine reboots, raidframe finds all disks with status 'optimal' and
parity 'dirty', so it starts rewriting parity. Unfortunately some of the failed
disks were masters, so data are read from them instead of from the slaves, which
have the accurate data. This left me with an unclean filesystem,
which had to be fixed with a 'fsck -y' (there was an unallocated inode in a
In this scenario raidframe should record somewhere which disks have failed, and
not reuse them after the reboot. Maybe something based on the mod ref counter
would work (or did I miss what the mod ref counter is for?).
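To make the counter idea concrete, here is a rough sketch in Python. It is not
raidframe code: the Component class, the mod_counter field and the three
functions are all made up for illustration. It assumes that a counter is stored
with each component, and that it is bumped only on the disks still reachable
when a failure is recorded. Whether the real mod ref counter behaves like this
is exactly what I'm unsure about.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    mod_counter: int = 0   # hypothetical per-disk counter, stored with the disk
    data: str = ""         # stand-in for the on-disk contents
    failed: bool = False


def write(components, data):
    """Write to every component that is still reachable."""
    for c in components:
        if not c.failed:
            c.data = data


def mark_failed(components, victim):
    """Mark one component failed and bump the counter on the survivors only."""
    victim.failed = True
    for c in components:
        if not c.failed:
            c.mod_counter += 1


def configure_after_reboot(components):
    """Trust only components carrying the highest counter; the rest are stale."""
    newest = max(c.mod_counter for c in components)
    for c in components:
        c.failed = c.mod_counter < newest
    return [c for c in components if not c.failed]


master = Component("master")
slave = Component("slave")
mirror = [master, slave]

write(mirror, "old")
mark_failed(mirror, master)   # the enclosure holding the master loses power
write(mirror, "new")          # dd keeps running; only the slave is updated

good = configure_after_reboot(mirror)
print([c.name for c in good])  # prints ['slave']
```

With this scheme the master comes back with a lower counter than the slave, so
it stays marked as failed after the reboot and would have to be reconstructed
from the slave rather than being used as the source for the parity rewrite.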
Manuel Bouyer, LIP6, Universite Paris VI. Manuel.Bouyer@lip6.fr