Subject: Re: RAID, ccd, and vinum.
To: Richard Rauch <rkr@olib.org>
From: Greg Oster <oster@cs.usask.ca>
List: netbsd-help
Date: 12/21/2004 08:58:10
Richard Rauch writes:
> On Mon, Dec 20, 2004 at 09:02:27PM -0600, Greg Oster wrote:
> > Richard Rauch writes:
[snip]
> Well, the NFS server has less than that, so 300MB forces the server
> to move bits on and off disk.
>
> The client (where I do most editing) has 512MB. Even that isn't
> always enough, though. When it isn't, it needs to talk to the server.
> And that happens at ethernet (100Mbit/sec) speeds, so...if the disk can
> do sustained rates of 10MB or so per second on large files (bonnie++
> indicates rather more than that), I'm going to wind up waiting on the
> network anyway.
>
> Short of upgrading to gigabit, or trying to juggle 2+ NICs for one
> NFS mount (can that be done?) on both ends, that's where it mostly
> ends.
I'm not sure... it would be interesting to know the answer, though.
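(Rough arithmetic, for what it's worth: 100Mbit/s is about 12.5MB/s raw,
and once NFS/TCP/IP overhead is taken out you typically land somewhere
around 11MB/s, which is about what your NFS numbers further down show.
So a single 100Mbit link is the ceiling long before the disks are.)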
[snip]
> > > Here is one result, with ccd:
> > >
> > > ccd0 (both wd1, wd2; softdep; interleave 32; 1 cable)
> > [snip]
> > > Here's a run with a somewhat larger stripe size:
> > >
> > > ccd0 (both wd1, wd2; softdep; interleave 1024; 1 cable)
> > [snip]
> >
> > 1 cable means two (both?) drives on the same cable?
>
> Yes. It was annotation that I tacked on while I was doing the tests.
> It's a little terse. In that case, I was curious how one cable
> would differ from two if the disks were being used in tandem. I had
> run a number of single-cable tests with ccd. I don't think that I
> bothered comparing RAID with just 1 cable.
>
>
> [...]
> > For further giggles, try the benchmarks using just a single disk...
>
> Did that one, too. Here's a sample:
>
> wd1a, softdep mount
[snip]
> Some things are significantly lower. Mostly it's about the same.
Ya... one would hope that RAID 0/ccd would be quite a bit faster
than just a single spindle.
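For anyone wanting to reproduce the single-spindle baseline, it's just
bonnie++ against a filesystem on one disk.  A rough sketch (device and
mount point names here are assumptions, adjust to your setup):

   newfs /dev/rwd1a                  # filesystem on the bare second disk
   mount -o softdep /dev/wd1a /mnt   # softdep, same as the other runs
   bonnie++ -d /mnt -s 300 -u root   # -s 300 to match the 300MB runs above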
> (I also have a comparison-only bonnie++ run on the 4GB IBM
> drive that that machine uses for /. (^&)
>
>
>
> [...]
> > > Well, yes. Let me put it another way:
> > >
> > > What is some of the overhead that makes RAID 0 perform significantly
> > > slower (to my estimation) at seeking in these tests?
> >
> > Even for RAID 0, RAIDframe constructs a directed, acyclic graph to
> > describe the IO operation. It then traverses this graph from single
> > source to single sink, "firing" nodes along the way. And while this
> > provides a very general way of describing disk IO, all that graph
> > creation, traversing, and teardown does take some time.
>
> Interesting... I'm surprised that when seeks are running on the order
> of 100 to 250 per second, there is that much work for an 800MHz Athlon.
>
>
>
> > For 2 disks in a RAID 0 config, try a stripe width of 64. If the
> > filesystem is going to have large files on it, a block/frag setting
> > of 65536/8192 might yield quite good performance.
>
> Thanks. I'll give it a spin. (^&
>
> I had actually tried a stripe size of 64. In fact, reviewing, it seems
> that one of those runs performed fairly close to ccd for seeks (a little over
> 220 seeks/sec) and otherwise was about as good as I was going to get:
>
> raid 0 (ffs; softdeps; 2 cables; 64 stripe)
> Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> cyclopes      300M  35343  89 39401  36 15060  16 27032  93 78939  54 223.5   2
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  1176  97 +++++ +++ 14202  99  1171  95   959  99  3120  93
> cyclopes,300M,35343,89,39401,36,15060,16,27032,93,78939,54,223.5,2,16,1176,97,+++++,+++,14202,99,1171,95,959,99,3120,93
Ya.. I suspect it'll take a bunch of work to get faster than this...
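In case a concrete example helps, here's roughly what I had in mind.  This
is only a sketch, with component names assumed, so adjust to your setup:

   # /etc/raid0.conf : two-component RAID 0, 64-sector stripe units
   START array
   # numRow numCol numSpare
   1 2 0

   START disks
   /dev/wd1a
   /dev/wd2a

   START layout
   # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
   64 1 1 0

   START queue
   fifo 100

   # configure, set a serial number, and newfs with the block/frag pair
   # mentioned above
   raidctl -C /etc/raid0.conf raid0
   raidctl -I 2004122101 raid0
   newfs -b 65536 -f 8192 /dev/rraid0a

Whether 65536/8192 actually wins over the newfs defaults depends on how
big the files on that filesystem are going to be.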
> I didn't fool with the disklabel much. Maybe I should have. I did try
> telling newfs to use different block-sizes. Here's a modification of the
> above, with newfs blocksize of 64K:
>
> raid 0 (same as above, but newfs block-size of 64K)
> Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> cyclopes      300M  36782  89 38826  35 17878  20 27493  94 84635  56 136.1   2
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  1215  98 +++++ +++ 12901  99  1197  98   959  99  3274  94
> cyclopes,300M,36782,89,38826,35,17878,20,27493,94,84635,56,136.1,2,16,1215,98,+++++,+++,12901,99,1197,98,959,99,3274,94
Where was the start of the newfs'ed partition? I.e. did it start at
block 0 or block 63 or ??? in the disklabel? If it didn't start at a
multiple of the stripe width (64), then move it to 0 or a multiple of
the stripe width.
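Something like

   disklabel raid0

will show the partition offsets; the offset of the partition you newfs'ed
is the number to check, and disklabel -e raid0 lets you edit it.  (The
numbers here are just to illustrate the alignment I mean: an offset of 63
means nearly every 64-sector stripe unit of filesystem data straddles two
stripe units on the components, while an offset of 0, 64, 128, etc. keeps
them lined up.)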
> Then, again, here's the current config over NFS (the end-result that really
> affects me):
>
> raid 0 (raid stripe of 64; softdep; normal newfs; NFS-mounted)
> Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> socrates      600M  11206   8 11199   1  3513   1 11355  16 11370   1 116.0   0
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16   480   0  5581   6  3550   4   484   0  5880   5  1484   2
> socrates,600M,11206,8,11199,1,3513,1,11355,16,11370,1,116.0,0,16,480,0,5581,6,3550,4,484,0,5880,5,1484,2
If you're going to be limited by the network, "make it as fast as you can
without too much effort" and then don't worry about it :)
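The cheapest thing to poke at on the NFS side is probably the mount
options on the client.  As a sketch (the server name and export path are
placeholders, and the sizes are just a reasonable starting point):

   mount_nfs -T -r 32768 -w 32768 server:/export/work /mnt

i.e. a TCP mount with larger read/write sizes.  Whether that buys much
on a 100Mbit link is another question, of course.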
Later...
Greg Oster