Subject: Re: Why did my strip array slow down?
To: Byron Servies <bservies@pacang.com>
From: Greg Oster <oster@cs.usask.ca>
List: netbsd-users
Date: 11/08/2002 10:08:04
Byron Servies writes:
> Hi there!
> 
> I am having a serious performance problem with a raidframe
> device.  I was certain I had seen something like this on the list
> before, but failed to find appropriate threads while searching the
> netbsd site.
> 
> The problem, I believe, is with the raidframe stripe device
> (configuration and disklabel information below).  I had hoped
> that the raidframe geometry problem was in effect, but it does
> not appear to be the case unless I have mis-read the
> PR.  Initially, transfer to the array was fine, but at some
> point after it passed 40% full, throughput dropped
> dramatically (some dumpfs and df info below).

If you run a benchmark (e.g. pkgsrc/benchmarks/bonnie) on the RAID set, 
what sort of performance do you see? 
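
A quick recipe (assuming pkgsrc is in /usr/pkgsrc; /mnt/raid0 here
is just a stand-in for wherever the RAID set is mounted, and the -s
size should be larger than RAM so the buffer cache doesn't skew the
numbers):

  cd /usr/pkgsrc/benchmarks/bonnie
  make install
  bonnie -d /mnt/raid0 -s 512

Running the same thing against a filesystem on a plain wd device
gives a useful baseline to compare against.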

> When I open a new ftp connection to my netbsd server and send
> a file to the raidframe device, transfer begins and then 
> immediately stalls for 10-20 seconds before continuing at a
> reduced but relatively normal rate.  

If you monitor the progress of the benchmark (e.g. with 'systat iostat'),
do you see those pauses as well?
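
You can run something like

  systat iostat 1

in another terminal while the transfer is going (the exact display
name and refresh argument may differ slightly between releases).
Watch whether wd1 and wd2 both show activity, and whether the
per-disk transfer numbers drop to zero during the stalls.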

> Using a non-raidframe device, transfer is very fast, as expected.

Hmm.

> My NIC is in full duplex mode, and while I occasionally receive
> CRC errors from the tlp driver, I do not see them during ftp
> transfers.

Hmm... do the NIC and the IDE controller share the same IRQ?
Are the drives on different IDE channels?
Are the drives reporting any sort of read/write errors?
Any sort of power-management on the drives?
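
A few quick checks (wd1/wd2 here match your config; atactl(8)
should be in a 1.5.2 base, though checkpower may depend on the
version):

  dmesg | grep -i irq       # does tlp0 share an interrupt with
                            # the IDE controller (wdc/pciide)?
  dmesg | grep ^wd          # which channel/drive each disk is on,
                            # plus any logged error messages
  atactl wd1 checkpower     # power management state of each drive
  atactl wd2 checkpower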

> I am running kernel 1.6
> beta 2 on a 1.5.2 base; I was going to upgrade to 1.6 this
> weekend, but I needed to complete this other operation
> (a backup of another machine) to the raid device first.
> 
> If anyone has any pointers on what might be wrong or where
> I should be looking, I would appreciate it.
> 
> Byron
> 
> -- raid0.conf
> 
> START array
> # numRow numCol numSpare
> 1 2 0
> 
> START disks
> /dev/wd1h
> /dev/wd2h
> 
> START layout
> # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
> 63 1 1 0

Is this really 63?  63 might help scatter directory bits better, but 
I'm not sure it would be better than 64 for general performance.
(63 sectors is 31.5K, so for a 32K write, for example, you'll be 
putting (at most) 31.5K on one disk, and then have to do a separate 
IO for the remaining 0.5K to the other disk...  And since filesystem 
blocks won't line up with a 31.5K stripe unit, even an 8K write could 
end up being split over both disks, which could be slightly slower.
I'm not sure that this is the cause of the performance problems, but 
it probably isn't helping anything :( )
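
If you do end up rebuilding the set, the layout section with a
64-sector stripe unit would just be (note that changing sectPerSU
means reconfiguring the RAID set and re-running newfs, so anything
on it would need to be saved off first):

  START layout
  # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
  64 1 1 0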

> START queue
> fifo 100
[snip]

Later...

Greg Oster