Improving RAIDframe Parity Handling: The Diff

To: tech-kern%netbsd.org@localhost
Subject: Improving RAIDframe Parity Handling: The Diff
From: Jed Davis <jld%panix.com@localhost>
Date: Tue, 20 Oct 2009 16:53:59 -0400

My Google Summer of Code project this past summer was to keep raid(4)
from having to check every single bit of parity on a mirror or
RAID-[45] set after an unclean shutdown.

The parity check is needed because, at the time of the unclean
shutdown, a write operation might have been in progress and committed
to some disks but not others, leaving the parity inconsistent.  The
solution, then, is to keep better track of which parts of the RAID set
might have been in the middle of a write.

The approach I took, which is also used by other RAID implementations,
was to divide the set into a certain number of regions and keep, on
disk, a dirty bit per region.  The bit is set before any write to its
region, but cleared only after some amount of time has passed with no
intervening writes to that region; this keeps the I/O overhead low,
because a region under steady writes simply stays dirty instead of
having its bit rewritten over and over.

(See also my mentor's summary at
http://blog.netbsd.org/tnf/entry/summer_of_code_results_improving )

I have prepared a patch against HEAD as of a few days ago, available at
http://www.NetBSD.org/~jld/gsoc09-1017.diff ; the changes to the
raidctl(8) man page should explain things reasonably well, and if not
then that can be fixed.  I have done some testing, in particular to
determine reasonable default parameters, but given how much correctness
matters for anything storage-related, it needs more testing and ideally
more eyeballs.

Note in particular that, with the patch, a parity map will be used by
default with any non-RAID-0 set.  As for compatibility with
non-parity-map kernels, I spent a fair amount of time on this and it
should Do The Right Thing: the parity map is kept in addition to the
existing global dirty bit, which is maintained as before, and if the
previous kernel to touch the RAID set was not parity-map enabled, a
parity-map kernel will detect this and disregard the parity map.

As mentioned above, I have done some benchmarking, mainly by untarring
pkgsrc onto async FFS (wapbl is about the same[*]) on a RAID-1: many
small writes with mediocre locality, not dominated by small-stripe
parity-RAID overhead.  I found that, with the default settings I
arrived at, the I/O overhead was too small to be measured reliably.  As
for the reduction in parity considered dirty: if I read my notes
correctly, one such test on a 136GB mirror had at most 1.2GB dirty at
any given time, and another test, of 10 parallel pkgsrc untarrings,
went up to 9.8GB.

I was initially going to do another round of benchmarking and get some
harder numbers before posting this, but then school started eating my
life, and it's been delayed quite enough, I think.

Comments and questions (and testing) welcome.


[*] In particular, the region(s) holding the journal should be written
often enough that they are never marked clean as long as there's any
I/O, and so incur no ongoing overhead.
-- 
(let ((C call-with-current-continuation)) (apply (lambda (x y) (x y)) (map
((lambda (r) ((C C) (lambda (s) (r (lambda l (apply (s s) l))))))  (lambda
(f) (lambda (l) (if (null? l) C (lambda (k) (display (car l)) ((f (cdr l))
(C k)))))))    '((#\J #\d #\D #\v #\s) (#\e #\space #\a #\i #\newline)))))
