tech-kern archive


Re: raw/block device disc throughput



On Thu, May 24, 2012 at 06:26:45PM +0200, Edgar Fuß wrote:
> It seems that I have to update my understanding of raw and block devices
> for discs.
> 
> Using a (non-recent) 6.0_BETA INSTALL kernel and an ST9146853SS 15k SAS disc
> behind an LSI SAS 1068E (i.e. mpt(4)), I did a
>       dd if=/dev/zero of=/dev/[r]sd0b bs=nn count=xxx
> For the raw device, the throughput increased dramatically with the block size:
>       Block size              16k     64k     256k    1M
>       Throughput (MByte/s)    4       15      49      112
> For the block device, throughput was around 81 MByte/s independent of block
> size.
> 
> This surprised me in two ways:
> 1. I would have expected the raw device to outperform the block device
>    once the block size is not too small.

The block device will cause readahead at the OS layer.  Since you are
accessing the disk sequentially, this will have a significant effect --
evidently greater than the overhead caused by memory allocation in the
cache layer under the block device.

I suspect that if you double-buffered at the client application layer
this effect might disappear, since the drive itself will already readahead,
and if we can present it with enough requests at once, it should return
the results simultaneously.  Plain dd on the raw device will not do that
since it waits for every read to complete before issuing another, thus
increasing latency and reducing the number of transactions the drive can
effectively overlap.
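
To make "present it with enough requests at once" concrete, below is a rough,
untested sketch of a double-buffered writer using POSIX AIO.  It is only an
illustration: the device path, request size, and the choice of AIO are my
assumptions, and how much real overlap you get on a raw device depends on how
the platform implements AIO.

/*
 * Sketch: keep NREQ writes in flight so the drive always has the next
 * request queued while the previous one completes.  Plain dd, by
 * contrast, issues one write and waits for it before starting the next.
 */
#include <sys/types.h>
#include <aio.h>
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUFSZ   (256 * 1024)    /* bytes per request */
#define NREQ    2               /* requests kept in flight */

int
main(int argc, char *argv[])
{
        struct aiocb cb[NREQ];
        char *buf[NREQ];
        off_t off = 0;
        int fd, i, n;

        if (argc != 3)
                errx(1, "usage: %s device count", argv[0]);
        if ((fd = open(argv[1], O_WRONLY)) == -1)
                err(1, "open %s", argv[1]);
        n = atoi(argv[2]);

        memset(cb, 0, sizeof(cb));
        for (i = 0; i < NREQ; i++) {
                if ((buf[i] = calloc(1, BUFSZ)) == NULL)
                        err(1, "calloc");
                cb[i].aio_fildes = fd;
                cb[i].aio_buf = buf[i];
                cb[i].aio_nbytes = BUFSZ;
        }

        for (i = 0; i < n; i++) {
                struct aiocb *c = &cb[i % NREQ];
                const struct aiocb *list[1] = { c };

                /* Reap the previous request in this slot, if any. */
                if (i >= NREQ) {
                        while (aio_error(c) == EINPROGRESS)
                                aio_suspend(list, 1, NULL);
                        if (aio_return(c) != BUFSZ)
                                errx(1, "write failed or was short");
                }
                c->aio_offset = off;
                off += BUFSZ;
                if (aio_write(c) == -1)
                        err(1, "aio_write");
        }

        /* Drain whatever is still outstanding. */
        for (i = 0; i < NREQ && i < n; i++) {
                const struct aiocb *list[1] = { &cb[i] };

                while (aio_error(&cb[i]) == EINPROGRESS)
                        aio_suspend(list, 1, NULL);
                if (aio_return(&cb[i]) != BUFSZ)
                        errx(1, "write failed or was short");
        }
        close(fd);
        return 0;
}

Run as, say, "./aiowrite /dev/rsd0b 4096" (name and arguments hypothetical):
that pushes 1 GB at the disk with a second 256k write always queued behind the
active one, which is roughly what "buffer" achieves with two processes and
shared buffers.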

> 2. I would have expected increasing the block size above MAXPHYS not
>    to improve performance.

The increase is again tied to reduced latency in the synchronous dd read-write
loop.  The kernel breaks the large request down into many MAXPHYS-sized ones
and dispatches each in turn.  I can't remember whether it does so
asynchronously or waits for each request to complete before issuing the next;
if the former, it's effectively double-buffering for you.
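
As a rough illustration of that splitting (userland pseudo-code, not the
actual physio(9) path, and assuming the common 64k MAXPHYS):

/*
 * Rough userland illustration of carving one large request into
 * MAXPHYS-sized pieces; in the kernel, physio(9) does the real work.
 */
#include <sys/types.h>
#include <stdio.h>

#define MAXPHYS (64 * 1024)     /* typical value; port-dependent */

int
main(void)
{
        size_t resid = 1024 * 1024;     /* a 1 MB write from userland */
        off_t off = 0;

        while (resid > 0) {
                size_t chunk = resid > MAXPHYS ? MAXPHYS : resid;

                printf("transfer at offset %lld, length %zu\n",
                    (long long)off, chunk);
                off += chunk;
                resid -= chunk;
        }
        return 0;
}

Under that assumption a 1M bs from dd becomes sixteen back-to-back 64k
transfers, so the syscall and dd loop latency is paid once per sixteen
transfers instead of once per transfer, which is consistent with throughput
still climbing above MAXPHYS.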

> 
> I then built a RAID 1 with SectorsPerSU=128 (i.e. a 64k stripe unit) on two
> of these discs and, after the parity initialisation was complete, wrote
> to [r]raid0b.
> On the raw device, throughput ranged from 4 MByte/s to 97 MByte/s depending
> on bs.

What does read performance look like?  I would be particularly interested to
know what it looks like if you use a tool like "buffer" or "ddd" that
double-buffers the I/O for you.  Since a RAID 1 read can be serviced from
either mirror, it should be roughly twice the single-disk rate; if it isn't,
something in RAIDframe is wrong (or, at least, suboptimal).

> On the block device, it was always 3 MByte/s. Furthermore, dd's WCHAN was
> "vnode" for the whole run. Why is that so and why is throughput so low?

I would guess locking or, somehow, extra buffering.  It may be waiting on
the block device's vnode lock?

-- 
Thor Lancelot Simon                                          
tls@panix.com
  "The liberties...lose much of their value whenever those who have greater
   private means are permitted to use their advantages to control the course
   of public debate."                                   -John Rawls

