Re: kern/49244: raid(4): reconstruct-on-spare-in-progress not a separate state
The following reply was made to PR kern/49244; it has been noted by GNATS.
From: Greg Oster <oster%cs.usask.ca@localhost>
Subject: Re: kern/49244: raid(4): reconstruct-on-spare-in-progress not a separate state
Date: Tue, 11 Nov 2014 18:04:52 -0600
On Mon, 29 Sep 2014 21:10:03 +0000 (UTC)
Thomas Klausner <wiz%NetBSD.org@localhost> wrote:
> >Number: 49244
> >Category: kern
> >Synopsis: raid(4): reconstruct-on-spare-in-progress not a separate state
> >Confidential: no
> >Severity: serious
> >Priority: high
> >Responsible: kern-bug-people
> >State: open
> >Class: sw-bug
> >Submitter-Id: net
> >Arrival-Date: Mon Sep 29 21:10:00 +0000 2014
> >Originator: Thomas Klausner
> >Release: NetBSD 5.1.2
> Curiosity is the very basis of education and if you tell me that
> curiosity killed the cat, I say only that the cat died nobly.
> - Arnold Edinborough
> Architecture: x86_64
> Machine: amd64
> After replacing a hard disk in a 3-disk RAID 5, I added the
> replacement disk as spare and started reconstructing on it (in
> single-user mode).
> After having started that, I rebooted, because I thought it should
> work (even if it might start from 0% again) and I could use the
> machine in multi-user mode with a degraded raid.
> When I rebooted, I noticed that the raid thought itself to be perfect;
> i.e. even though reconstruction had been at 1%, it now said that
> everything was fine for all 3 disks.
> I immediately failed the new disk and started reconstruction to it.
> The whole procedure cost me one inode on the RAID, which fsck could
> fix (I was lucky).
> Greg Oster has analyzed:
> > Basically, when reconstruction starts to a spare disk, the disk
> > is marked as rf_used_spare -- and that is what we key off of
> > when updating the component labels on shutdown, etc. So what
> > happens when you reboot in the middle of a reconstruction is that
> > the disk being rebuilt *to* gets updated with a component label as
> > though the reconstruction is finished!!! I think all that's needed
> > is another intermediate state (say rf_ds_rebuild_in_progress)
> > that basically marks the spare disk as being in use, but not done
> > with reconstruction until rf_used_spare is
> > hit....
> See above.
> Greg has agreed to fix this. Thanks, Greg :)
I have a fix for this that I hope to commit very soon, and request
pullups. I'm just working on double-checking the changes to make sure
I haven't introduced any unintended consequences.
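
For illustration only, the intermediate state described in the quoted
analysis might be sketched roughly as below. This is not the committed
fix and not RAIDframe code: every identifier (the sketch_ds_* states and
the sketch_* functions) is a hypothetical name made up for this example.
The only idea taken from the analysis is that a spare being rebuilt sits
in its own state, so that a label update at shutdown does not treat it
as finished.

```c
#include <stdbool.h>

/*
 * Hypothetical disk states, for illustration only. These are NOT the
 * actual RAIDframe definitions.
 */
enum sketch_disk_state {
	SKETCH_DS_OPTIMAL,		/* normal, fully populated component */
	SKETCH_DS_FAILED,		/* component has failed */
	SKETCH_DS_SPARE,		/* idle spare, not yet in use */
	SKETCH_DS_REBUILD_IN_PROGRESS,	/* proposed: spare in use, rebuild unfinished */
	SKETCH_DS_USED_SPARE		/* spare in use, rebuild complete */
};

/* Move an idle spare into the intermediate state when a rebuild starts. */
static enum sketch_disk_state
sketch_start_rebuild(enum sketch_disk_state s)
{
	return (s == SKETCH_DS_SPARE) ? SKETCH_DS_REBUILD_IN_PROGRESS : s;
}

/* Only a disk that was actually rebuilding may become a used spare. */
static enum sketch_disk_state
sketch_finish_rebuild(enum sketch_disk_state s)
{
	return (s == SKETCH_DS_REBUILD_IN_PROGRESS) ? SKETCH_DS_USED_SPARE : s;
}

/*
 * Decide whether a component label written at shutdown may describe
 * the disk as complete. A disk still rebuilding must not qualify.
 */
static bool
sketch_label_may_claim_complete(enum sketch_disk_state s)
{
	switch (s) {
	case SKETCH_DS_OPTIMAL:
	case SKETCH_DS_USED_SPARE:
		return true;
	case SKETCH_DS_FAILED:
	case SKETCH_DS_SPARE:
	case SKETCH_DS_REBUILD_IN_PROGRESS:
		return false;
	}
	return false;
}
```

With a check like this, a reboot at 1% of reconstruction would find the
spare still in the in-progress state, so its label would not be written
as though the rebuild had finished.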