Subject: Re: raidframe consumes cpu like a terminally addicted
To: Robert Elz <kre@munnari.OZ.AU>
From: Greg Oster <oster@cs.usask.ca>
List: netbsd-users
Date: 04/30/2001 09:31:49
Robert Elz writes:
> Date: Mon, 30 Apr 2001 00:46:02 +0200
> From: Matthias Buelow <mkb@mukappabeta.de>
> Message-ID: <20010430004602.A1934@altair.mayn.de>
>
> Greg, I'm not sure if you read netbsd-users,
Ya.. I try to :)
> so I have added you
> explicitly, I think you are the one person who knows enough about
> raidframe to answer this quickly...
>
> | and that's the label of the raid5 set on top of them:
> |
> | # /dev/rraid1d:
>
> | bytes/sector: 512
> | sectors/track: 256
> | tracks/cylinder: 1
> | sectors/cylinder: 256
> | cylinders: 139970
>
> Yes, that's certainly not a good layout. If you don't have the kernel
> patch mentioned on the list, then allocating new blocks is (sometimes)
> likely to be very slow (CPU intensive).
This is the problem that Herb mentioned...
> You'll also most likely find that you're wasting more filesys space in
> overheads than you would really like - I'll bet that when you newfs'd
> that filesys it was printing "alternate superblock numbers" until you
> thought they would never stop... That's because this layout will
> cause way too many cylinder groups (with all their headers, etc) with
> way too few blocks in them.
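(Just to put rough numbers on that, purely as an illustration: 256
sectors/cylinder is only 128K per "cylinder", so with the old newfs default
of 16 cylinders per group (if that's what was used) each cylinder group only
covers about 2MB. A filesystem of 139970 * 256 sectors (around 17GB) then
works out to something like 8000+ cylinder groups, where a saner geometry
would give a few hundred at most. The exact numbers depend on the newfs
parameters, of course.)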
>
> But I doubt this is the cause of your ls -l problems.
>
> | During the test (the ls -l on a directory with ~1000 entries) I
> | watched disk I/O with systat iostat and there was almost nothing
> | (most of it was in the buffer cache anyway), so insufficient bandwidth
> | shouldn't be the problem in that case, imho.
>
> I wonder if perhaps raid5 is requiring that the parity be recomputed
> (checked) every time the blocks are accessed.
No, it doesn't.
> Most likely the 1000
> entries will be overflowing the vnode cache, meaning each stat() will
> require the inode to be fetched from the buffer cache (and of course,
> certainly the first time they're referenced). 1000 inodes from one directory
> isn't much of a buffer cache load though, so quite likely all the inode
> blocks are in memory - hence little or no actual I/O.
>
> But if doing a read from the buffer cache requires the raid code to
> validate the raid5 parity each time, then there's likely to be a lot of
> CPU overhead involved there (1000 times 3 times 2K word reads (ie, 6M RAM
> accesses), plus the computations).
and if it was validating the parity each time, yes, it would kill things...
> However, I am speculating, Greg will know if anything like that might
> possibly be causing high CPU usage while doing an "ls -l" on a raid5
> directory containing lots of files (which doesn't happen doing a similar
> access on a non-raid filesys).
A couple of other things:
Matthias Buelow writes (in a previous message):
> START layout
> # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
> 32 1 1 5
That's only 16K per component in this case, which is probably too small
for the best performance... 32K per component may perform better.
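For illustration only (a sketch, not something I've benchmarked on your
hardware): with 512-byte sectors, bumping sectPerSU to 64 gives 32K per
component per stripe unit, i.e. a layout section like:

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
64 1 1 5

Note that changing the stripe unit size generally means re-creating the RAID
set (and re-newfs'ing it), so it's not a change to make casually.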
Also: do a 'time ls -l > /dev/null' on the offending directory, and a
'time ls -l > /dev/null' on the "single disk of the same type". I'm
interested in seeing what those say...
merlin# cat * > /dev/null
merlin# time ls -l > /dev/null
0.4u 0.4s 0:01.13 76.9% 0+0k 53+4io 41pf+0w
merlin# time ls -l > /dev/null
0.4u 0.4s 0:00.94 93.6% 0+0k 0+4io 0pf+0w
merlin# time ls -l > /dev/null
0.4u 0.4s 0:00.94 94.6% 0+0k 0+0io 0pf+0w
merlin# ls -1 | wc
21613 21613 231666
merlin# du -sk .
252408 .
merlin#
(RAID 5 over 3 IDE disks, stripe width of 64, and a disklabel that looks
(in part) like this:
bytes/sector: 512
sectors/track: 128
tracks/cylinder: 32
sectors/cylinder: 4096
cylinders: 43976
i386 box running 1.5.1_BETA as of Apr. 7)
Later...
Greg Oster