tech-kern archive

Re: raw/block device disc throughput



On Thu, 24 May 2012, Edgar Fuß wrote:

> It seems that I have to update my understanding of raw and block devices
> for discs.
> 
> Using a (non-recent) 6.0_BETA INSTALL kernel and an ST9146853SS 15k SAS disc
> behind an LSI SAS 1068E (i.e. mpt(4)), I did a
>       dd if=/dev/zero od=/dev/[r]sd0b bs=nn, count=xxx.

What's "od="?

> For the raw device, the throughput dramatically increased with the block size:
>       Block size              16k     64k     256k    1M
>       Throughput (MByte/s)    4       15      49      112
> For the block device, throughput was around 81MByte/s independent of block 
> size.
> 
> This surprised me in two ways:
> 1. I would have expected the raw device to outperform the block devices
>    with not too small block sizes.
> 2. I would have expected increasing the block size above MAXPHYS not
>    to improve the performance.
> 
> So obviously, my understanding is wrong.

Not awfully surprising given your setup.  Keep in mind mpt uses a rather 
inefficient communication protocol and does tagged queuing.  The former 
means the overhead for each command is high, but the latter means 
it can keep lots of commands in the air at the same time. 
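
As a rough illustration of the per-command overhead, the raw-device figures
quoted above can be turned into writes per second and time per write.  This
is only a sketch: it assumes 1 MByte = 2**20 bytes, that dd issues one write
at a time, and that each dd block counts as a single write; the helper name
per_write_stats is made up for this example.

```python
# Raw-device figures quoted above: dd block size in KiB -> throughput in MByte/s.
raw_throughput = {16: 4, 64: 15, 256: 49, 1024: 112}


def per_write_stats(block_kib, mbyte_per_s):
    """Return (writes per second, milliseconds per write) for one table entry."""
    bytes_per_s = mbyte_per_s * 1024 * 1024  # assumption: MByte means 2**20 bytes
    writes_per_s = bytes_per_s / (block_kib * 1024)
    return writes_per_s, 1000.0 / writes_per_s


for block_kib, mbps in raw_throughput.items():
    rate, ms = per_write_stats(block_kib, mbps)
    print(f"{block_kib:5d}k  {rate:6.0f} writes/s  {ms:5.2f} ms/write")
```

Under those assumptions the time per write grows far more slowly than the
block size (a 64-fold larger block takes only a little over twice as long),
which is what a large fixed cost per write would look like.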

> I then built a RAID 1 with SectorsPerSU=128 (i.e. a 64k stripe size) on two
> of these discs, and, after the parity initialisation was complete, wrote
> to [r]raid0b.
> On the raw device, throughput ranged from 4MByte/s to 97MByte/s depending on 
> bs.
> On the block device, it was always 3MByte/s. Furthermore, dd's WCHAN was
> "vnode" for the whole run. Why is that so and why is throughput so low?

Now you're just complicating things 8^).

Let's see, RAID 1 is striping.  That means all operations are broken at 
64K boundaries so they can be sent to different disks.  And split 
operations need to wait for all the devices to complete before the master 
operation can be completed.  I expect you would probably get some rather 
unusual non-linear behavior in this sort of setup.  
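
To make the splitting at 64K boundaries described above concrete, here is a
minimal sketch.  It assumes 512-byte sectors, so SectorsPerSU=128 gives a
64 KiB unit; split_at_stripe_units is a hypothetical helper for illustration
and not the RAID driver's actual code.

```python
STRIPE_UNIT = 128 * 512  # SectorsPerSU=128, assuming 512-byte sectors: 64 KiB


def split_at_stripe_units(offset, length, unit=STRIPE_UNIT):
    """Split a byte range into (offset, length) pieces that never cross a unit boundary."""
    pieces = []
    end = offset + length
    while offset < end:
        # Next multiple of `unit` strictly above the current offset.
        boundary = (offset // unit + 1) * unit
        piece_end = min(boundary, end)
        pieces.append((offset, piece_end - offset))
        offset = piece_end
    return pieces


# An aligned 256 KiB write becomes four 64 KiB pieces.
print(split_at_stripe_units(0, 256 * 1024))
# An unaligned 64 KiB write straddles a boundary and becomes two pieces.
print(split_at_stripe_units(16 * 1024, 64 * 1024))
```

The original request can only be reported complete once every piece has
finished, so the slowest piece sets the completion time.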

Eduardo

