tech-userlevel archive


Re: Kernel VS application file caching



On Jan 21, 2010, at 4:16 PM, Sad Clouds wrote:

> On Thursday 21 January 2010 14:52:04 Steven Bellovin wrote:
>> On Jan 21, 2010, at 9:25 AM, Sad Clouds wrote:
>>> As far as I know Unix kernels will transparently cache files into any
>>> available memory to speed up future I/O on those files.
>>> 
>>> For applications like Internet servers, which serve many static files
>>> from disk, is there any point in implementing file caching at application
>>> level? It seems like you would end up with 2 copies of the same data -
>>> one copy cached by kernel, another copy cached by application.
>> 
>> To avoid kernel-to-userland copies?
>> 
>> What is your real performance limit?  CPU?  RAM?  I/O bandwidth?  Network
>> bandwidth?
> 
> Well the idea is to keep frequently accessed data in RAM. 

Let me repeat my question slightly differently: why do you think that will help 
performance?

Let me give a real-world example.  Between a NetBSD laptop and a NetBSD 
desktop, connected via gigE, I can run 'ttcp -s' at (if I recall correctly) 
700 Mbps.  I can upload data to my office at ~2.5 Mbps; I can download at about 
13 Mbps.  In other words, when I go in or out of my house, the network is by 
far the limiting performance factor.  For anything but a floppy drive, it 
doesn't really matter how fast my disk is or how much caching happens; I can't 
ship data faster than the network.
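To make the arithmetic concrete, here is a rough sketch (my assumptions, not 
measurements): a transfer runs no faster than its slowest stage, so the time 
to send a file is its size in bits divided by the smallest rate along the 
path.  The link rates are the ones quoted above; the 100 MB file size and the 
800 Mbps disk rate are hypothetical numbers chosen only for illustration.

```python
# Rough bottleneck arithmetic. Rates are in megabits per second (1 Mbps = 10**6 bit/s).
# FILE_SIZE and DISK_MBPS are hypothetical; the link rates come from the message.

def transfer_seconds(size_bytes: int, rate_mbps: float) -> float:
    """Seconds needed to move size_bytes at a sustained rate of rate_mbps."""
    return size_bytes * 8 / (rate_mbps * 10**6)

def effective_rate(*stage_rates_mbps: float) -> float:
    """A disk -> memory -> network pipeline runs no faster than its slowest stage."""
    return min(stage_rates_mbps)

FILE_SIZE = 100 * 10**6   # hypothetical 100 MB file
DISK_MBPS = 800.0         # hypothetical disk throughput
LINKS = [("upload", 2.5), ("download", 13.0), ("gigE LAN", 700.0)]

if __name__ == "__main__":
    # Compare the hypothetical disk with one ten times faster.
    for disk in (DISK_MBPS, 10 * DISK_MBPS):
        for label, link in LINKS:
            rate = effective_rate(disk, link)
            print(f"disk {disk:6.0f} Mbps, {label:8s}: "
                  f"{transfer_seconds(FILE_SIZE, rate):6.1f} s")
```

With these numbers the upload takes 320 seconds whether the disk runs at 800 
or 8000 Mbps; only on the gigE LAN does the disk even come close to mattering.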

On the other hand, I have a machine in a colo with several hundred Mbps links 
to the outside.  On that machine, file system performance might matter.

In the abstract, you're quite correct that caching strategies matter.  The 
kernel caches file data precisely because such applications benefit from it.  But for specific workloads 
-- like sending files to the Internet -- the bottleneck might be something 
else, like your link or the round-trip time to your clients.

It's always very sound advice to build, measure, optimize.  Study after study 
has shown that programmer guesses about what needs optimizing are almost always 
wrong.
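As a starting point for the "measure" step, a minimal timing harness might look 
like the sketch below.  The function name time_reads and the 8 MiB test file 
are hypothetical; a real measurement should use the server's own files and 
request pattern, and should time whole requests, not just disk reads.  Note that 
since the test file was just written, every read here is likely to be served 
from the kernel's page cache, which is itself worth knowing before adding a 
second cache in the application.

```python
import os
import tempfile
import time

def time_reads(path, repeats=5, chunk=1 << 16):
    """Return wall-clock seconds for each full sequential read of `path`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(chunk):  # read until EOF in 64 KiB chunks
                pass
        timings.append(time.perf_counter() - start)
    return timings

if __name__ == "__main__":
    # Hypothetical 8 MiB test file filled with random bytes.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 * 2**20))
        path = tmp.name
    try:
        for i, t in enumerate(time_reads(path), 1):
            print(f"read {i}: {t * 1000:.2f} ms")
    finally:
        os.remove(path)
```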

My advice is to build your application, but modularize it in such a way that 
you can easily plug in an application-level cache if measurements show that 
that's the problem.
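One way to get that modularity, sketched here under my own assumptions (all 
names are hypothetical): the request-handling code depends only on a small 
FileSource interface.  The default DirectSource reads the file each time and 
leaves caching to the kernel's page cache; LRUCachedSource is an optional 
application-level LRU cache, bounded by a byte budget, that can be wrapped 
around it later without touching the rest of the server.

```python
from collections import OrderedDict
from typing import Protocol

class FileSource(Protocol):
    def get(self, path: str) -> bytes: ...

class DirectSource:
    """Read the file on every request; the kernel's page cache does the caching."""
    def get(self, path: str) -> bytes:
        with open(path, "rb") as f:
            return f.read()

class LRUCachedSource:
    """Hypothetical application-level cache holding up to max_bytes of file data.

    It never notices when a file changes on disk; invalidation is left out here.
    """
    def __init__(self, backend: FileSource, max_bytes: int) -> None:
        self.backend = backend
        self.max_bytes = max_bytes
        self.used = 0
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        data = self.entries.get(path)
        if data is not None:
            self.entries.move_to_end(path)  # mark as most recently used
            self.hits += 1
            return data
        self.misses += 1
        data = self.backend.get(path)
        if len(data) <= self.max_bytes:  # never cache a file larger than the budget
            self.entries[path] = data
            self.used += len(data)
            while self.used > self.max_bytes:
                _, old = self.entries.popitem(last=False)  # evict least recently used
                self.used -= len(old)
        return data

def serve(source: FileSource, path: str) -> int:
    """Stand-in for request handling: it sees only the FileSource interface."""
    return len(source.get(path))
```

Switching from serve(DirectSource(), path) to 
serve(LRUCachedSource(DirectSource(), 64 * 2**20), path) is a one-line change, 
so the cache can be added only if measurements justify it.  Unlike the kernel's 
cache, this one keeps a second copy of the data and must handle stale files 
itself.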


                --Steve Bellovin, http://www.cs.columbia.edu/~smb