tech-kern archive

disklabel on RAID disappeared

Help, there's something weird going on on our fileserver!
I'm on vacation and had a colleague do this over the phone.
Please CC me in replies because I don't have access to my regular mail.

raid0 is level 1, sd0a sd1a
raid1 is level 5, sd2a .. sd9a
sd0/1 are scsibus0, targets 0/1
sd2..9 are scsibus1, targets 0..8

The machine panicked.
After reboot, the parity rewrite succeeded on raid0 but failed on raid1
because of a read error on sd2a.
He did scsictl stop sd2, scsictl detach scsibus1 0 0, replaced sd2, then
scsictl scan scsibus1 0 0. Something strange must have happened, because
the new sd2 came up async.
He nevertheless started the reconstruction (raidctl -R sd2a raid1), but
raidctl -S estimated 24 hours.
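
For reference, the sequence above corresponds roughly to the sketch below.
It only prints the commands rather than running them. The argument order
(device first, then command) is my reading of scsictl(8) and raidctl(8),
so treat it as an assumption; the component name passed to raidctl -R
should be whatever raidctl -s reports for that component.

```shell
#!/bin/sh
# Dry-run sketch: prints the replacement steps instead of executing them.
# Device names come from the description above; the argument order is an
# assumption based on scsictl(8) and raidctl(8).

replacement_steps() {
    disk=sd2        # failed disk
    bus=scsibus1    # bus the disk is attached to
    target=0
    lun=0
    raid=raid1

    echo "scsictl $disk stop"                 # spin the failed disk down
    echo "scsictl $bus detach $target $lun"   # detach it before pulling it
    echo "scsictl $bus scan $target $lun"     # probe for the replacement
    echo "raidctl -R ${disk}a $raid"          # reconstruct in place
    echo "raidctl -S $raid"                   # report reconstruction status
}

replacement_steps
```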
I asked him to stop the reconstruction, but neither failing sd2 nor
detaching scsibus1 0 0 stopped it. Shortly after, the machine panicked again.
It came up with raid0 and raid1 configured correctly, but fsck raid1a failed.
We now have no disklabel on raid1 (disklabel -r says something about not
being able to read it and disklabel without -r shows the fabricated one).
Since fsck raid1a said something like "incorrect fs size", I assume the
superblock of raid1a is still there and only the disklabel is broken.

Any hints? He is currently running the reconstruction and we'll see
whether the disklabel returns or what happens if we re-write it from
the backup we have in /var.
Is there a sane way to stop an on-going reconstruction?
Could trying to stop it have corrupted the contents of raid1?
