NetBSD-Bugs archive


Re: kern/49244: raid(4): reconstruct-on-spare-in-progress not a separate state



The following reply was made to PR kern/49244; it has been noted by GNATS.

From: Greg Oster <oster%cs.usask.ca@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: kern/49244: raid(4): reconstruct-on-spare-in-progress not a
 separate state
Date: Tue, 11 Nov 2014 18:04:52 -0600

 On Mon, 29 Sep 2014 21:10:03 +0000 (UTC)
 Thomas Klausner <wiz%NetBSD.org@localhost> wrote:
 
 > >Number:         49244
 > >Category:       kern
 > >Synopsis:       raid(4): reconstruct-on-spare-in-progress not a separate state
 > >Confidential:   no
 > >Severity:       serious
 > >Priority:       high
 > >Responsible:    kern-bug-people
 > >State:          open
 > >Class:          sw-bug
 > >Submitter-Id:   net
 > >Arrival-Date:   Mon Sep 29 21:10:00 +0000 2014
 > >Originator:     Thomas Klausner
 > >Release:        NetBSD 5.1.2
 > >Organization:
 > Curiosity is the very basis of education and if you tell me that 
 > curiosity killed the cat, I say only that the cat died nobly.
 > - Arnold Edinborough
 > >Environment:
 > 	
 > 	
 > Architecture: x86_64
 > Machine: amd64
 > >Description:
 > After replacing a hard disk in a 3-disk RAID 5, I added the
 > replacement disk as spare and started reconstructing on it (in
 > single-user mode).
 > 
 > After having started that, I rebooted, because I thought it should
 > work (even if it might start from 0% again) and I could use the
 > machine in multi-user mode with a degraded raid.
 > 
 > When I rebooted, I noticed that the raid thought itself to be perfect;
 > i.e. even though reconstruction had been at 1%, it now said that
 > everything was fine for all 3 disks.
 > 
 > I immediately failed the new disk and started reconstruction to it
 > again.
 > 
 > The whole procedure cost me one inode on the RAID, which fsck could
 > fix (I was lucky).
 > 
 > Greg Oster has analyzed:
 > > Basically, when reconstruction starts to a spare disk, the disk
 > > is marked as rf_used_spare -- and that is what we key off of
 > > when updating the component labels on shutdown, etc.  So what
 > > happens when you reboot in the middle of a reconstruction is that
 > > the disk being rebuilt *to* gets updated with a component label as
 > > though the reconstruction is finished!!!  I think all that's needed
 > > is another intermediate state (say rf_ds_rebuild_in_progress)
 > > that basically marks the spare disk as being in use, but not done
 > > with reconstruction until rf_used_spare is
 > > hit....
 > 
 > 
 > >How-To-Repeat:
 > See above.
 > >Fix:
 > Greg has agreed to fix this. Thanks, Greg :)
 > 
 > >Unformatted:
 
 I have a fix for this that I hope to commit very soon, and request
 pullups.  I'm just working on double-checking the changes to make sure
 I haven't introduced any unintended consequences.
 
 Later...
 
 Greg Oster
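The analysis quoted above comes down to a small state machine. The buggy path marks the spare as "used" when reconstruction *starts*, so a shutdown in mid-rebuild writes a component label claiming the spare holds valid data. The proposed fix adds an intermediate state that is only promoted to "used" once reconstruction finishes. The C sketch below is a simplified, hypothetical model, not RAIDframe code: the DS_* names, struct disk, and all functions are invented for illustration. DS_REBUILD_IN_PROGRESS stands in for the proposed rf_ds_rebuild_in_progress, and DS_USED_SPARE stands in for rf_used_spare.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-disk states (not the real RAIDframe enum). */
enum disk_status {
	DS_OPTIMAL,
	DS_FAILED,
	DS_SPARE,               /* hot spare, not yet in use */
	DS_REBUILD_IN_PROGRESS, /* proposed: spare is being rebuilt to */
	DS_USED_SPARE           /* reconstruction onto the spare is complete */
};

struct disk {
	enum disk_status status;
};

/* Buggy ordering described in the PR: the spare is marked as used as soon
 * as reconstruction starts. */
void start_reconstruction_buggy(struct disk *spare)
{
	spare->status = DS_USED_SPARE;
}

/* Fixed ordering: the spare is in use, but its contents are not yet valid. */
void start_reconstruction(struct disk *spare)
{
	assert(spare->status == DS_SPARE);
	spare->status = DS_REBUILD_IN_PROGRESS;
}

/* Only after every stripe has been rebuilt does the spare become "used". */
void finish_reconstruction(struct disk *spare)
{
	assert(spare->status == DS_REBUILD_IN_PROGRESS);
	spare->status = DS_USED_SPARE;
}

/* At shutdown, decide whether the spare's component label may be written
 * as though it holds valid data. Keying off DS_USED_SPARE is only safe if
 * that state is never entered before reconstruction completes. */
bool label_marks_spare_valid(const struct disk *spare)
{
	return spare->status == DS_USED_SPARE;
}
```

With the buggy ordering, label_marks_spare_valid() returns true immediately after the rebuild starts, which matches the "everything fine after reboot at 1%" symptom in the report. With the intermediate state, it returns true only after finish_reconstruction().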
 

