Subject: Re: practical RAIDframe questions
To: Ben Collver <collver@peak.org>
From: Stephen Borrill <netbsd@precedence.co.uk>
List: netbsd-users
Date: 01/27/2006 15:56:23
On Thu, 26 Jan 2006, Ben Collver wrote:
> On Fri, Jan 27, 2006 at 12:28:03PM +1100, Simon Burge wrote:
>> I don't do this for performance, but for safety. It means that the
>> other filesystems are still fully mirrored until I can swap in a new
>> disk. I figure there's no point in degrading all filesystems on a
>> disk if you only have a single disk error.
>
> That makes sense. Have you experienced any disk errors yet? How did it
> go?
I've had numerous disk errors (see my various other postings for more
detail). In fact, virtually every disk I've ever installed in a RAID 1 array
has failed in some way. The problem is that a bad sector leaves you with a
failed component (after 5 retries). This component is then not used again
for reading or writing; the latter being the method which will swap out a
spare sector for the bad one. A second bad sector on the other
component leaves you with a machine which just panics. Luckily with 2.1 and
later, the last component won't get failed and so this problem is fixed.
I spent a lot of time trying things like:
dd if=/dev/zero of=/dev/rwd0e seek=123456
(where the seek value is the one from dmesg), but this didn't help.
The same sectors remained bad and gave I/O errors. Then I read about the
dkctl command which is crucial to solving the problem. The kernel caches
the bad sectors and so when you try to write zeros to force a sector swap it
doesn't get as far as the disk. "dkctl wd0 badsector flush" wipes out the
cached copy and then the dd works fine. Since that time, I've managed to
fix up a number of disks without having to swap them out.
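
For anyone wanting to try the same thing, here is a rough sketch of the two
steps in the order that worked: flush the kernel's bad-sector cache, then
write zeros over the sector. The function name and its arguments (disk,
partition letter, sector number) are made up for illustration, and the
bs=512 and count=1 options are my own assumption to keep the write to a
single sector. By default it only prints the commands; set RUN=1 to execute
them, bearing in mind that the dd destroys whatever was in that sector.

```shell
#!/bin/sh
# Sketch only: print (or, with RUN=1, execute) the flush-then-rewrite steps.
# fix_bad_sector and its arguments are hypothetical names for illustration.
fix_bad_sector() {
    disk=$1      # e.g. wd0
    part=$2      # partition letter, e.g. e
    sector=$3    # sector number as reported in dmesg

    # Step 1: drop the kernel's cached list of bad sectors for this disk,
    # so that the following write actually reaches the drive.
    flush_cmd="dkctl ${disk} badsector flush"

    # Step 2: overwrite the sector with zeros so the drive can remap it.
    # bs=512 and count=1 are assumptions added to limit the write to one sector.
    write_cmd="dd if=/dev/zero of=/dev/r${disk}${part} bs=512 seek=${sector} count=1"

    if [ "${RUN:-0}" = "1" ]; then
        $flush_cmd && $write_cmd
    else
        echo "$flush_cmd"
        echo "$write_cmd"
    fi
}
```

Calling "fix_bad_sector wd0 e 123456" without RUN=1 just prints the two
commands, which makes it easy to check the device name and sector number
before running anything for real.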
--
Stephen