Subject: Re: loaning for read() of regular files
To: Chuck Silvers <chuq@chuq.com>
From: Stephan Uphoff <ups@tree.com>
List: tech-kern
Date: 02/15/2005 17:00:09
Hi Chuck,

looks great!

You may want to call pmap_remove for failed pmap_enter calls
(or remove PMAP_CANFAIL?) to prevent stale page table entries.

This doesn't look like an issue on i386 unless you try to read() from a
COW-mapped device buffer (if that is even possible ;-).
However, it can theoretically be an issue on SMP amd64 machines,
and maybe on other architectures.

Hmm... for SMP it could be beneficial to call pmap_remove once for the
whole range before the pmap_enter calls, to cut down on the number of
IPIs. (At least for multi-page reads on i386.)

Stephan


On Tue, 2005-02-15 at 12:36, Chuck Silvers wrote:
> hi folks,
> 
> I've been fiddling with some changes that implement loaning pages for read()
> of regular files for quite a while, and I figure it's time to share them.
> there's a diff attached which adds this generic support and changes FFS
> to use it.
> 
> the new interface is:
> 
>     int uvm_map_loanobj(struct vm_map *, struct uvm_object *, struct uio *);
> 
> this reads from (or potentially writes to) the given object into (or out of)
> the given map, using the addresses, lengths and offset given by the uio.
> it returns 0 for success and an errno if we can't do it (for example if
> the request isn't aligned properly).  this call can partially succeed, in
> which case the uio is updated to indicate how far we got.  the expectation
> is that the caller will complete the transfer in the normal fashion if this
> operation does not transfer everything that was requested.  (currently I'm
> ignoring the error return; maybe this should just be void.)
> 
> the benchmark I used to evaluate this was sequential read()s of a large
> file that would fit completely in memory.  I used two different request
> sizes: 4KB (1 page on a pc, the minimum where loaning is possible) and 1MB
> (an arbitrary "large" size).  I ran each test case twice in a row, to see
> the effects for both cold and warm caches.  the results were:
> 
> 
> the current code:
> 
> # time pt -r -s 4096 -c 655360 /usr/file
> 2684354560 bytes transferred in 69.628 secs (38552690 bytes/sec)
> 0.226u 9.922s 1:09.63 14.5%     0+0k 43+0io 0pf+0w
> # time pt -r -s 4096 -c 655360 /usr/file
> 2684354560 bytes transferred in 8.011 secs (335078017 bytes/sec)
> 0.210u 7.801s 0:08.01 100.0%    0+0k 0+0io 0pf+0w
> 
> # time pt -r -s 1048576 -c 2560 /usr/file
> 2684354560 bytes transferred in 69.640 secs (38546036 bytes/sec)
> 0.000u 17.310s 1:09.64 24.8%    0+0k 41+0io 0pf+0w
> # time pt -r -s 1048576 -c 2560 /usr/file
> 2684354560 bytes transferred in 13.311 secs (201652336 bytes/sec)
> 0.000u 13.315s 0:13.31 100.0%   0+0k 0+0io 0pf+0w
> 
> 
> the new code:
> 
> # time pt -r -s 4096 -c 655360 /usr/file
> 2684354560 bytes transferred in 69.661 secs (38534439 bytes/sec)
> 0.213u 3.316s 1:09.67 5.0%      0+0k 43+0io 0pf+0w
> # time pt -r -s 4096 -c 655360 /usr/file
> 2684354560 bytes transferred in 2.979 secs (900835767 bytes/sec)
> 0.170u 2.811s 0:02.98 100.0%    0+0k 0+0io 0pf+0w
> 
> # time pt -r -s 1048576 -c 2560 /usr/file
> 2684354560 bytes transferred in 69.710 secs (38506963 bytes/sec)
> 0.000u 1.504s 1:09.71 2.1%      0+0k 41+0io 0pf+0w
> # time pt -r -s 1048576 -c 2560 /usr/file
> 2684354560 bytes transferred in 0.974 secs (2755281003 bytes/sec)
> 0.000u 0.979s 0:00.97 100.0%    0+0k 0+0io 0pf+0w
> 
> 
> 
> comments?   if everyone is happy with this then I'll look into
> loaning for write() also.
> 
> -Chuck