Subject: Re: yamt-readahead branch
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 11/15/2005 19:02:53
hi,
thanks for taking this on, I'm glad someone got around to it.
I have some comments about the design:
- I don't think that having the read ahead state in struct file
is the way to go, for a couple reasons:
- it doesn't handle multi-threaded programs, nor groups of processes
that inherit file descriptors from a common parent.
- it doesn't handle doing read ahead from page-faults.
I think it would be much better to keep the read ahead state in the
vnode (or rather the genfs_node) and detect multiple read patterns
heuristically. in the common case, there's only one sequential pattern
so detecting that is really easy.
- as for the read ahead policy, we should have a sliding-window kind of
scheme, such that we can keep multiple disk I/Os pending in the disk driver
all the time (or at least until the application takes a breather).
ie. we shouldn't wait for the application to copy out all the data that
we've read ahead before starting more I/Os. (actually on second look,
I think you already do this, but it's hard to tell since you didn't
describe your algorithm at all.)
- there needs to be some feedback for scaling back the amount of read ahead
that we do in case memory gets low. otherwise we'll have cases where we
do a bunch of read ahead, but the memory is reclaimed for a different
purpose before the application catches up.
- I see you have some XXX comments about tuning the amount of data to
read ahead based on physical memory, which is true, but we should also
tune based on the I/O throughput of the underlying device. we want to
be able to keep any device 100% busy, ideally without the user needing
to configure this manually. but we'll need some way to allow manual
per-device tuning as well.
and some comments on the implementation:
- if this is going to be part of UVM, then we should use the MADV_* constants
rather than the POSIX_FADVISE_* ones. or the other way around, but we
we should use the same ones for both read() and reading from mappings.
- please use a DPRINTF kind of macro instead of having a bunch of
"if defined(READAHEAD_DEBUG)". the former is much easier on the eyes.
- please add some comments to the code describing how it's supposed to work.
-Chuck
On Tue, Nov 15, 2005 at 04:32:30PM +0900, YAMAMOTO Takashi wrote:
> hi,
>
> i made some read-ahead related changes on the branch.
> unless any serious problem is found, i'll merge it to trunk shortly.
> so please don't hesitate to review the changes.
>
> summary:
> - switch from contextless-readahead.
> - isolate read-ahead code so that you can implement
> and evalutate other algorithms more easily.
> - implement posix_fadvise.
>
> todo:
> - test posix_fadvise.
> - posix_fadvise manpage. can take from TOG one?
> - remove old genfs read-ahead code.
> - update vop_read_args comments for each filesystems.
>
> a quick benchmark (bonnie++ -n0) is attached.
> *1 and *2 are the same kernel. i did the same test twice.
> i'm wondering why "Rewrite" value has been changed.
>
> YAMAMOTO Takashi