Subject: Re: parity check with root on raid
To: None <netbsd-help@netbsd.org>
From: =?ISO-8859-1?Q?Ari_Sovij=E4rvi?= <apz-list@2304.org>
List: netbsd-help
Date: 04/21/2005 17:18:38

> on a current i386 system I have almost all filesystems (incl. /) on a
> RAIDframe (level 1). The RAID set is automatically configured.
> I don't understand why the rc.d raidframeparity check is done so late:

I've been wondering the same thing, since Linux, for example, starts the rebuild 
right after the autoconfigured arrays have been detected and assembled.

> Shouldn't parity be checked (and possibly be rewritten) before filesystems
> are checked and mounted?

AFAIK the parity check is transparent, so once you initiate it, you can go on 
checking the disk and making modifications to it. I took a look at your patch, 
and I'd leave the ") &" line as it was.

Running FSCK on a RAIDframe array before the parity check seems to actually 
corrupt the array. Here's what happened to me:

The machine crashed under heavy load (updating pkgsrc & compiling something). 
When it came up again, FSCK found lots of problems, which it fixed 
(which raises another question: shouldn't soft dependencies prevent this?). After 
FSCK, the RAIDframe parity was rewritten. After I logged in, I immediately rebooted 
the machine and, surprise, FSCK found a whole new set of problems.

I thought about this for a while and came up with a theory: the contents of 
the two disks were different at the time of FSCK. Maybe some data got written to 
one disk before the crash, so that the actual filesystem metadata didn't match 
any more. So when FSCK was checking the array, it fixed (or left unfixed) something 
that wasn't the same on both disks. And when RAIDframe rebuilt the array, 
neither of the disks had a perfectly checked and fixed file system.
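
To make the theory concrete, here is a toy Python model of a two-disk mirror 
after a crash. It is only a sketch under assumptions: that a read may be served 
from either component, and that the parity rewrite on a RAID 1 set copies the 
first component over the second. The Mirror class, the fsck function and the 
"ok"/"bad" block values are all made up for illustration; none of this is 
RAIDframe code.

```python
# Toy model of a two-way mirror (RAID 1) after an unclean shutdown.
# Every name here is hypothetical; none of this is RAIDframe code.

class Mirror:
    def __init__(self, disk0, disk1):
        self.disks = [list(disk0), list(disk1)]

    def read(self, block, component):
        # A real mirror chooses the component itself; the caller picks it
        # here so the example stays deterministic.
        return self.disks[component][block]

    def write(self, block, value):
        # Writes always go to both components.
        for disk in self.disks:
            disk[block] = value

    def rewrite_parity(self):
        # Assumed behaviour: component 0 is copied over component 1.
        self.disks[1] = list(self.disks[0])

    def in_sync(self):
        return self.disks[0] == self.disks[1]


def fsck(mirror, read_from):
    """Repair every block that reads as "bad"; return the number of repairs."""
    repairs = 0
    for block in range(len(mirror.disks[0])):
        if mirror.read(block, read_from) == "bad":
            mirror.write(block, "ok")
            repairs += 1
    return repairs


def crashed_mirror():
    # Block 1 reached only component 1 before the crash, so component 0
    # still holds a stale ("bad") copy. Block 2 is damaged on both.
    return Mirror(["ok", "bad", "bad"], ["ok", "ok", "bad"])


# Order 1: fsck first, reading the good copy of block 1 from component 1.
m = crashed_mirror()
first = fsck(m, read_from=1)    # repairs block 2 only
m.rewrite_parity()              # stale block 1 overwrites the good copy
second = fsck(m, read_from=1)   # next boot: a new problem shows up

# Order 2: parity rewrite first, then fsck.
n = crashed_mirror()
n.rewrite_parity()
third = fsck(n, read_from=1)    # sees and repairs blocks 1 and 2
fourth = fsck(n, read_from=1)   # next boot: clean

print(first, second, third, fourth)  # prints "1 1 2 0"
```

In this model, the first ordering gives a second FSCK run that still finds 
something to fix, while rewriting parity first leaves the second run clean.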

I let RAIDframe finish rebuilding and rebooted again. This time FSCK came up 
clean. I found some corrupted files in lost+found, all from /usr/pkgsrc. 
So no precious data was lost, and the situation was recovered by re-fetching 
pkgsrc, but this left me thinking that maybe the parity rebuild should be 
initiated before FSCK.

-- 
Ari Sovijärvi
http://apz.fi/