Subject: Re: raid: failed device used after reboot
To: Manuel Bouyer <email@example.com>
From: Greg Oster <firstname.lastname@example.org>
Date: 05/22/2000 18:44:36
Manuel Bouyer writes:
> I'm playing with an array of disks and raidframe, experimenting with various
> failure types. Here's what I've just run into:
> I have a raid1 spread across drives in different enclosures, in such a way that
> I can power down one enclosure without losing the raid.
> I got into trouble with the following scenario:
> - start writing to the filesystem: dd if=/dev/zero of=file bs=64k
> - power down one of the enclosures. raidframe marks the corresponding devices
> as failed and continues running. dd doesn't stop.
> - power back the enclosure.
> - reboot.
> When the machine reboots, raidframe finds all disks with status 'optimal' and
> parity 'dirty', so it starts rewriting parity. Unfortunately some of the failed
> disks were masters, so data is read from them instead of from the slaves, which
> have the accurate data. This left me with an unclean filesystem,
> which had to be fixed with a 'fsck -y' (there was an unallocated inode in a
What vintage of NetBSD are you running (1.4.2, -current, what date?)?
> In this scenario raidframe should record somewhere which disks have failed, and
> not reuse them. Maybe something based on the mod ref counter would work (or
> did I miss what the mod ref counter is for?).
The modification counters are supposed to handle this, but there were some
problems with this a while back... If you're running a recent -current,
you should see the master marked as 'failed' even after it comes back up.
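
As a rough illustration of the idea only (this is not RAIDframe's actual
code; the struct, field, and function names below are hypothetical, and
counter wraparound is ignored): each component carries a counter that is
bumped when the array's state changes, so a component that was offline
during a change comes back with a lower value than its peers and can be
treated as failed rather than optimal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified per-component label (not the real on-disk layout). */
struct comp_label {
    const char  *name;         /* illustrative component name */
    unsigned int mod_counter;  /* bumped on each array state change */
};

/* Return the highest modification counter seen across all components. */
static unsigned int
newest_mod_counter(const struct comp_label *labels, size_t n)
{
    unsigned int newest;
    size_t i;

    assert(labels != NULL && n > 0);
    newest = labels[0].mod_counter;
    for (i = 1; i < n; i++) {
        if (labels[i].mod_counter > newest)
            newest = labels[i].mod_counter;
    }
    return newest;
}

/*
 * A component whose counter lags the newest one missed at least one
 * state change (for example, it was powered off), so its data cannot
 * be trusted. Returns 1 if stale, 0 otherwise.
 */
static int
is_stale(const struct comp_label *label, unsigned int newest)
{
    return label->mod_counter < newest;
}
```

With a master at counter 5 and a slave at counter 9, the master would be
flagged stale under this sketch, and reads would have to come from the slave.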
Can you send me the relevant chunk of 'dmesg' or /var/log/messages for when it
boots, and/or does device autodetection? There could be a bug in there, but I
don't have enough info yet...