Subject: Re: yamt-readahead branch
To: Chuck Silvers <chuq@chuq.com>
From: Jonathan Stone <jonathan@Pescadero.dsg.stanford.edu>
List: tech-kern
Date: 11/15/2005 19:29:51
In message <20051116030253.GA9382@spathi.chuq.com>,
Chuck Silvers writes,
[...]
> - I see you have some XXX comments about tuning the amount of data to
> read ahead based on physical memory, which is true, but we should also
> tune based on the I/O throughput of the underlying device. we want to
> be able to keep any device 100% busy, ideally without the user needing
> to configure this manually. but we'll need some way to allow manual
> per-device tuning as well.
No, in point of fact, for NFS, I/O throughput rate is not enough: one
really needs to issue sufficient readahead to fill the bandwidth-delay product.
Heck, I have *severe* difficulties filling even gigabit Ethernet.
When affordable PCI-e 10GbE NICs meet affordable PCI-e [*] eight-way
or 12-way SATA-I RAIDs, expect a large fuss to be made if we still
have the status quo ante, viz., wholly inadequate readahead for NFS.
[*] Note well absence of any claims about "affordable 10GbE switches".
>and some comments on the implementation:
>
> - if this is going to be part of UVM, then we should use the MADV_* constants
> rather than the POSIX_FADVISE_* ones. or the other way around, but we
> we should use the same ones for both read() and reading from mappings.
Absolutely.
> - please use a DPRINTF kind of macro instead of having a bunch of
> "if defined(READAHEAD_DEBUG)". the former is much easier on the eyes.
>
> - please add some comments to the code describing how it's supposed to work.
And maybe *where* (i.e., in which scenarios) it's expected to work,
or not expected to work?