tech-kern archive


Re: disklabel on RAID disappeared



On Fri, 2 Mar 2012 20:04:56 +0100
Edgar Fuß <Edgar.Fuss%bn2.maus.net@localhost> wrote:

> Help, there's something weird going on on our fileserver!
> I'm on vacation and had a colleague do this over the phone.
> Please CC me in replies because I don't have access to my regular
> mail.
> 
> raid0 is level 1, sd0a sd1a
> raid1 is level 5, sd2a .. sd9a
> sd0/1 are scsibus0, targets 0/1
> sd2..9 are scsibus1, targets 0..8
> 
> The machine panicked.
> After reboot, parity rewrite on raid0 succeeded and failed on raid1
> because of a read error on sd2a.
> He did scsictl stop sd2, scsictl detach scsibus1 0 0, replaced sd2,
> scsictl scan scsibus1 0 0. Something strange must have happened and
> sd2 was async.
> He nevertheless started the reconstruction (raidctl -R sd2a raid1),
> but raidctl -S estimated 24 hours.
> I asked him to stop the reconstruction, but neither failing sd2 nor
> detaching scsibus1 0 0 stopped it. Shortly after, the machine panicked
> again. It came up with raid0 and raid1 configured correctly, but fsck
> raid1a failed. We now have no disklabel on raid1 (disklabel -r says
> something about not being able to read it and disklabel without -r
> shows the fabricated one). Since fsck raid1a said something like
> "incorrect fs size", I assume the superblock of raid1a is still
> there, only the disklabel is broken.
> 
> Any hints? He is currently running the reconstruction and we'll see
> whether the disklabel returns or what happens if we re-write it from
> the backup we have in /var.

Panic messages and RAID-related info from /var/log/messages would help
here.  The NetBSD version would help too.

> Is there a sane way to stop an on-going reconstruction?

Hmm... no.

> May trying to stop it have corrupted the raid1 contents?

No... I expect the disklabel for raid1 would have been physically
living on sd2a, but it should have been recoverable from the data and
parity on the remaining disks.  You don't say what arch you're on, but
if you use 'dd if=/dev/rraid1a' to go hunting, do you find something
that looks like a disklabel, or is it just garbage? (I'm guessing the
latter...)
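
One rough way to do that hunting, as a sketch only: save the first
part of the device to a file with dd and scan it for the BSD disklabel
magic number.  The sketch assumes the usual struct disklabel layout,
where the magic value 0x82564557 appears twice, 132 bytes apart
(d_magic and d_magic2).  Since the arch is unknown, it tries both byte
orders.  The function name and the dump file name are made up for
illustration.

```python
import struct

DISKMAGIC = 0x82564557   # BSD disklabel magic number
MAGIC2_OFFSET = 132      # assumed offset of d_magic2 from d_magic


def find_disklabel_candidates(data):
    """Return sorted (offset, byteorder) pairs where both magics match."""
    hits = []
    for order, fmt in (("little", "<I"), ("big", ">I")):
        magic = struct.pack(fmt, DISKMAGIC)
        pos = data.find(magic)
        while pos != -1:
            second = data[pos + MAGIC2_OFFSET:pos + MAGIC2_OFFSET + 4]
            # A lone magic is probably noise; require the second copy too.
            if second == magic:
                hits.append((pos, order))
            pos = data.find(magic, pos + 1)
    return sorted(hits)


def scan_file(path, limit=1024 * 1024):
    """Scan the first `limit` bytes of a dump file (hypothetical helper)."""
    with open(path, "rb") as f:
        return find_disklabel_candidates(f.read(limit))
```

For example, after dumping the start of /dev/rraid1a to a file called
raid1a.img (a hypothetical name), scan_file("raid1a.img") lists the
byte offsets that look like a label.  An empty list would suggest the
label area really is garbage, at least within the scanned range.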

Later...

Greg Oster

