Subject: Re: practical RAIDframe questions
To: Ben Collver <collver@peak.org>
From: Stephen Borrill <netbsd@precedence.co.uk>
List: netbsd-users
Date: 01/27/2006 15:56:23
On Thu, 26 Jan 2006, Ben Collver wrote:
> On Fri, Jan 27, 2006 at 12:28:03PM +1100, Simon Burge wrote:
>> I don't do this for performance, but for safety.  It means that the
>> other filesystems are still fully mirrored until I can swap in a new
>> disk.  I figure there's no point in degrading all filesystems on a
>> disk if you only have a single disk error.
>
> That makes sense.  Have you experienced any disk errors yet?  How did it
> go?

I've had numerous disk errors (see my various other postings for more
detail). In fact, virtually every disk I've ever installed in a RAID 1
array has failed in some way. The problem is that a bad sector leaves you
with a failed component (after 5 retries). This component is then not used
again for reading or writing, even though a write is exactly what would
make the drive swap in a spare sector for the bad one. A second bad sector
on the other component leaves you with a machine which just panics.
Luckily with 2.1 and later, the last component won't get failed and so
this problem is fixed.

I spent a lot of time trying things like:
dd if=/dev/zero of=/dev/rwd0e bs=512 seek=123456 count=1

(where the seek value is the one from dmesg), but this didn't help.
The same sectors remained bad and gave I/O errors. Then I read about the
dkctl command, which is crucial to solving the problem. The kernel caches
the list of bad sectors, so when you try to write zeros to force a sector
swap, the write doesn't get as far as the disk. "dkctl wd0 badsector flush"
wipes out the cached list and then the dd works fine. Since that time,
I've managed to fix up a number of disks without having to swap them out.
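
For anyone wanting to repeat this, here is a rough sketch of the sequence
as a script. It is not a tested recipe: wd0, partition e and sector 123456
are placeholders, the run and fix_sector helpers are names I made up for
illustration, and bs=512 count=1 is my assumption so that only one sector
is overwritten. It also assumes the sector offset is relative to the
partition being written. By default it only prints the commands; set RUN=1
to execute them, which overwrites the sector.

```shell
#!/bin/sh
# Sketch: flush the kernel's cached bad-sector list, then rewrite one
# sector so the drive gets a chance to remap it.
# DISK, PART and SECTOR are placeholders; take SECTOR from dmesg.
DISK=${DISK:-wd0}
PART=${PART:-e}
SECTOR=${SECTOR:-123456}

run() {
    # Print the command; execute it only when RUN=1 is set.
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

fix_sector() {
    # Show which sectors the kernel currently considers bad.
    run dkctl "$DISK" badsector list
    # Forget them, so the write below actually reaches the disk.
    run dkctl "$DISK" badsector flush
    # Overwrite a single 512-byte sector with zeros (destructive).
    run dd if=/dev/zero of="/dev/r${DISK}${PART}" bs=512 seek="$SECTOR" count=1
}

fix_sector
```

The order matters: the flush has to come before the dd, otherwise the
write is rejected from the cached list and never reaches the drive.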

-- 
Stephen