NetBSD-Bugs archive

kern/49244: raid(4): reconstruct-on-spare-in-progress not a separate state

>Number:         49244
>Category:       kern
>Synopsis:       raid(4): reconstruct-on-spare-in-progress not a separate state
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Sep 29 21:10:00 +0000 2014
>Originator:     Thomas Klausner
>Release:        NetBSD 5.1.2
>Organization:
Curiosity is the very basis of education and if you tell me that
curiosity killed the cat, I say only that the cat died nobly.
- Arnold Edinborough
>Environment:
Architecture: x86_64
Machine: amd64
>Description:
After replacing a hard disk in a 3-disk RAID 5, I added the replacement disk
as a spare and started reconstructing onto it (in single-user mode).

After having started that, I rebooted, because I thought it should
work (even if it might start from 0% again) and I could use the
machine in multi-user mode with a degraded raid.

When I rebooted, I noticed that the raid thought itself to be perfect;
i.e. even though reconstruction had been at 1%, it now said that
everything was fine for all 3 disks.

I immediately failed the new disk and started reconstruction to it again.

The whole procedure cost me one inode on the RAID, which fsck could
fix (I was lucky).

Greg Oster has analyzed:
> Basically, when reconstruction starts to a spare disk, the disk is
> marked as rf_used_spare -- and that is what we key off of when
> updating the component labels on shutdown, etc.  So what happens
> when you reboot in the middle of a reconstruction is that the disk
> being rebuilt *to* gets updated with a component label as though
> the reconstruction is finished!!!  I think all that's needed is
> another intermediate state (say rf_ds_rebuild_in_progress) that
> basically marks the spare disk as being in use, but not done with
> reconstruction until rf_used_spare is hit....

>How-To-Repeat:
See above.
>Fix:
Greg has agreed to fix this. Thanks, Greg :)

