Subject: Re: yamt-readahead branch
To: Chuck Silvers <chuq@chuq.com>
From: Jonathan Stone <jonathan@Pescadero.dsg.stanford.edu>
List: tech-kern
Date: 11/15/2005 19:29:51
In message <20051116030253.GA9382@spathi.chuq.com>,
Chuck Silvers writes,
[...]
> - I see you have some XXX comments about tuning the amount of data to
> read ahead based on physical memory, which is true, but we should also
> tune based on the I/O throughput of the underlying device. we want to
> be able to keep any device 100% busy, ideally without the user needing
> to configure this manually. but we'll need some way to allow manual
> per-device tuning as well.
No, in point of fact, for NFS, I/O throughput rate is not enough: one
really needs to issue sufficient readahead to fill the bandwidth-delay product.
Heck, I have *severe* difficulties filling even gigabit Ethernet.
When affordable PCI-e 10GbE NICs meet affordable PCI-e [*] eight-way
or 12-way SATA-I RAIDs, expect a large fuss to be made if we still
have the status quo ante, viz., wholly inadequate readahead for NFS.
[*] Note well absence of any claims about "affordable 10GbE switches".
>and some comments on the implementation:
>
> - if this is going to be part of UVM, then we should use the MADV_* constants
> rather than the POSIX_FADVISE_* ones. or the other way around, but we
> we should use the same ones for both read() and reading from mappings.
Absolutely.
> - please use a DPRINTF kind of macro instead of having a bunch of
> "if defined(READAHEAD_DEBUG)". the former is much easier on the eyes.
>
> - please add some comments to the code describing how it's supposed to work.
And maybe *where* (i.e., in which scenarios) it's expected to work,
or not expected to work?