Subject: Re: RAID, ccd, and vinum.
To: Richard Rauch <rkr@olib.org>
From: Greg Oster <oster@cs.usask.ca>
List: netbsd-help
Date: 12/21/2004 08:58:10
Richard Rauch writes:
> On Mon, Dec 20, 2004 at 09:02:27PM -0600, Greg Oster wrote:
> > Richard Rauch writes:
[snip] 
> Well, the NFS server has less than that, so 300MB forces the server
> to move bits on and off disk.
> 
> The client (where I do most editing) has 512MB.  Even that isn't
> always enough, though.  When it isn't, it needs to talk to the server.
> And that happens at ethernet (100Mbit/sec) speeds, so...if the disk can
> do sustained rates of 10MB or so per second on large files (bonnie++
> indicates rather more than that), I'm going to wind up waiting on the
> network anyway.
> 
> Short of upgrading to gigabit, or trying to juggle 2+ NICs for one
> NFS mount (can that be done?) on both ends, that's where it mostly
> ends.

I'm not sure... would be interesting to know the answer tho.
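(If link aggregation is an option on your NetBSD versions -- agr(4) is the
relevant pseudo-device -- the setup on each end would look roughly like the
following.  Interface names and the address are made up, the switch has to
cooperate (LACP), and I haven't tried this, so treat it as a sketch only:

  ifconfig agr0 create
  ifconfig agr0 agrport fxp0      # add first physical port
  ifconfig agr0 agrport fxp1      # add second physical port
  ifconfig agr0 inet 192.168.1.10 netmask 255.255.255.0 up

Even then, a single NFS mount may get hashed onto just one link, so it's not
guaranteed to help one client talking to one server.)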

[snip]
> > > Here is one result, with ccd:
> > > 
> > > ccd0 (both wd1, wd2; softdep; interleave 32; 1 cable)
> > [snip] 
> > > Here's a run with a somewhat larger stripe size:
> > > 
> > > ccd0 (both wd1, wd2; softdep; interleave 1024; 1 cable)
> > [snip]
> > 
> > 1 cable means two (both?) drives on the same cable?
> 
> Yes.  It was an annotation that I tacked on while I was doing the tests.
> It's a little terse.  In that case, I was curious how one cable
> would differ from two if the disks were being used in tandem.  I had
> run a number of single-cable tests with ccd.  I don't think that I
> bothered comparing RAID with just 1 cable.
> 
> 
>  [...]
> > For further giggles, try the benchmarks using just a single disk... 
> 
> Did that one, too.  Here's a sample:
> 
> wd1a, softdep mount
[snip]
> Some things are significantly lower.  Mostly it's about the same.

Ya...  one would hope that RAID 0/ccd would be quite a bit faster 
than just a single spindle.
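(For reference, the ccd configs being compared above boil down to something
like this -- component names are examples, and the interleave is the number
that was being varied between runs:

  # /etc/ccd.conf
  # ccd   ileave  flags   component devices
  ccd0    32      none    /dev/wd1e /dev/wd2e

then "ccdconfig -C" to configure it, and disklabel/newfs the ccd0 device as
usual.)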

> (I also have a comparison-only bonnie++ run on the 4GB IBM
> drive that that machine uses for /.  (^&)
> 
> 
> 
>  [...]
> > > Well, yes.  Let me put it another way:
> > > 
> > > What is some of the overhead that makes RAID 0 perform significantly
> > > slower (to my estimation) at seeking in these tests?
> > 
> > Even for RAID 0, RAIDframe constructs a directed, acyclic graph to 
> > describe the IO operation.  It then traverses this graph from single 
> > source to single sink, "firing" nodes along the way.  And while this 
> > provides a very general way of describing disk IO, all that graph 
> > creation, traversing, and teardown does take some time.
> 
> Interesting...  I'm surprised that when seeks are running on the order
> of 100 to 250 per second, there is that much work for an 800MHz Athlon.
> 
> 
> 
> > For 2 disks in a RAID 0 config, try a stripe width of 64.  If the 
> > filesystem is going to have large files on it, a block/frag setting 
> > of 65536/8192 might yield quite good performance.
> 
> Thanks.  I'll give it a spin.  (^&
> 
> I had actually tried a stripe size of 64.  In fact, reviewing, it seems
> that one of those runs performed fairly close to ccd for seeks (a little
> over 220 seeks/sec) and otherwise was about as good as I was going to get:
> 
> raid 0 (ffs; softdeps; 2 cables; 64 stripe)      
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> cyclopes       300M 35343  89 39401  36 15060  16 27032  93 78939  54 223.5   2
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  1176  97 +++++ +++ 14202  99  1171  95   959  99  3120  93
> cyclopes,300M,35343,89,39401,36,15060,16,27032,93,78939,54,223.5,2,16,1176,97,+++++,+++,14202,99,1171,95,959,99,3120,93

Ya.. I suspect it'll take a bunch of work to get faster than this... 
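To spell out what I was suggesting earlier (stripe width 64, 65536/8192
block/frag), the config would look something like this -- component names and
the serial number are just examples:

  # /etc/raid0.conf
  START array
  # numRow numCol numSpare
  1 2 0
  START disks
  /dev/wd1e
  /dev/wd2e
  START layout
  # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
  64 1 1 0
  START queue
  fifo 100

  raidctl -C /etc/raid0.conf raid0      # force initial configuration
  raidctl -I 2004122101 raid0           # write component labels
  disklabel raid0                       # set up partitions (see below re: offsets)
  newfs -b 65536 -f 8192 /dev/rraid0a

which should be close to what you already ran, modulo the newfs parameters.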

> I didn't fool with the disklabel much.  Maybe I should have.  I did try
> telling newfs to use different block-sizes.  Here's a modification of the
> above, with newfs blocksize of 64K:
> 
> raid 0 (same as above, but newfs block-size of 64K)
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> cyclopes       300M 36782  89 38826  35 17878  20 27493  94 84635  56 136.1   2
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  1215  98 +++++ +++ 12901  99  1197  98   959  99  3274  94
> cyclopes,300M,36782,89,38826,35,17878,20,27493,94,84635,56,136.1,2,16,1215,98,+++++,+++,12901,99,1197,98,959,99,3274,94

Where was the start of the newfs'ed partition?  I.e. did it start at 
block 0 or block 63 or ??? in the disklabel?  If it didn't start at a 
multiple of the stripe width (64), then move it to 0 or a multiple of 
the stripe width.
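Concretely, something like:

  disklabel raid0

and look at the "offset" column for the partition you newfs'ed.  The numbers
below are invented, but the shape is:

  #        size    offset     fstype [fsize bsize cpg/sgs]
   a:  39102336         0     4.2BSD   8192 65536     0

An offset of 0 (or 64, 128, ...) means filesystem blocks start on stripe
boundaries; if it's something like 63, edit the label (disklabel -e raid0) so
the partition starts on a multiple of 64 sectors.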

> Then, again, here's the current config over NFS (the end-result that really
> affects me):
> 
> raid 0 (raid stripe of 64; softdep; normal newfs; NFS-mounted)
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> socrates       600M 11206   8 11199   1  3513   1 11355  16 11370   1 116.0   0
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16   480   0  5581   6  3550   4   484   0  5880   5  1484   2
> socrates,600M,11206,8,11199,1,3513,1,11355,16,11370,1,116.0,0,16,480,0,5581,6,3550,4,484,0,5880,5,1484,2

If you're going to be limited by the network, "make it as fast as you can 
without too much effort", and then don't worry about it :)
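(For the record, 100 Mbit/s is 100/8 = 12.5 MB/s before protocol overhead, and
with TCP/NFS overhead ~11 MB/s is about the practical ceiling -- so the ~11.2
MB/s sequential numbers in the NFS run above look like a saturated wire, not a
slow array.)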

Later...

Greg Oster