Subject: Re: ARC algorithm implementation
To: Kentaro A. Kurahone <kurahone@sigusr1.org>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: tech-kern
Date: 06/06/2005 15:09:04
On Mon, Jun 06, 2005 at 06:39:55PM +0000, Kentaro A. Kurahone wrote:
> On Mon, Jun 06, 2005 at 03:35:58PM +0200, Xavier Guérin wrote:
> > Hi all,
> > 
> > As part of the Google Summer of Code program, I was looking for a  
> > subject related to OSes (and particularly kernels).
> > 
> > I have looked through your different projects for the NetBSD  
> > kernel, and I would be really interested in the ARC algorithm  
> > implementation.
> 
> Personally, I've been more partial to CAR than ARC[0]; both are by the
> same developers.  Both are patent-encumbered, and I do not think it
> would be good to incorporate either into the NetBSD tree.  The
> PostgreSQL people have removed ARC from their database over this, and
> I think that stance is correct[1].
> 
> That being said, there are plenty of algorithms that perform better
> than LRU and that would be good alternatives to ARC/CAR[2].

I didn't realize Postgres had had to back out ARC for patent reasons
when I proposed the project.  Yuck.

The problem is that we have many separate LRUs in the kernel, each sized
either by specific magic or by some knob exposed with sysctl.  It is far
from clear how to size these individual object caches, each with its own
LRU, for globally optimal replacement.  The interesting thing about ARC,
or a generalized ARC-like algorithm, for this application, is that it is
in fact designed to size LRUs against one another.
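
To make that concrete, here is a rough sketch of the adaptation step
from the Megiddo/Modha paper -- all names are invented for
illustration, and none of this is code from our tree or a complete
implementation.  T1 holds objects seen once (recency) and T2 objects
seen at least twice (frequency); B1 and B2 are "ghost" lists that
remember the headers of recently evicted objects.  The target size of
T1, p, moves toward whichever ghost list is taking hits, so the two
LRUs continuously resize against each other:

	#include <sys/types.h>

	struct arc_state {
		size_t c;		/* total cache size, in objects */
		size_t p;		/* adaptive target size of T1 */
		size_t nb1, nb2;	/* current lengths of B1 and B2 */
	};

	/*
	 * Hit in ghost list B1: recency is paying off, so grow T1's
	 * target.  Only called on a B1 hit, so nb1 >= 1 and the
	 * division is safe.
	 */
	static void
	arc_adapt_b1(struct arc_state *a)
	{
		size_t d = (a->nb1 >= a->nb2) ? 1 : a->nb2 / a->nb1;

		a->p = (a->p + d > a->c) ? a->c : a->p + d;
	}

	/*
	 * Hit in ghost list B2: frequency is paying off, so shrink
	 * T1's target (which grows T2's).  Only called on a B2 hit,
	 * so nb2 >= 1.
	 */
	static void
	arc_adapt_b2(struct arc_state *a)
	{
		size_t d = (a->nb2 >= a->nb1) ? 1 : a->nb1 / a->nb2;

		a->p = (a->p > d) ? a->p - d : 0;
	}

Replacement then evicts from T1 when it is over p and from T2
otherwise; that eviction step is exactly where the lists get sized
against one another.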

Even just tackling the buffer cache vs. page cache issue -- the most
obvious place for improvement of this kind -- presents a few initial
problems to solve.  Here are two:

1) 	We don't currently track objects' frequency of use (whether
	pages in the page cache, pool items in pools, etc.), which is
	a prerequisite for many algorithms better than LRU.

2)	The cost of evicting an object from the cache, considered
	globally, isn't constant.  You can't evict one page of an
	8-page filesystem buffer, though you _can_ evict one page
	of file data from the page cache.  How do you take this into
	account when choosing an object to replace in the cache?  (One
	possible approach is sketched just after this list.)
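
For what it's worth, one conceivable shape for an answer folds both
problems into a single victim-selection rule, loosely in the spirit of
GreedyDual-Size: track a per-object use counter (#1) and pick the
victim with the fewest uses per page freed (#2), so an 8-page buffer
has to be roughly eight times as cold as a single page before it
loses.  Everything below is a hypothetical sketch, not anything that
exists in the tree:

	#include <stddef.h>
	#include <stdint.h>

	struct cobj {
		struct cobj	*next;
		unsigned int	 uses;		/* bumped on each reference (#1) */
		unsigned int	 npages;	/* pages freed if evicted (#2); >= 1 */
	};

	/*
	 * Pick the object with the lowest uses/npages ratio.  The
	 * comparison is cross-multiplied (a.uses * b.npages <
	 * b.uses * a.npages) to avoid floating point, which we can't
	 * use in the kernel anyway.
	 */
	static struct cobj *
	pick_victim(struct cobj *head)
	{
		struct cobj *o, *victim = NULL;

		for (o = head; o != NULL; o = o->next) {
			if (victim == NULL ||
			    (uint64_t)o->uses * victim->npages <
			    (uint64_t)victim->uses * o->npages)
				victim = o;
		}
		return victim;
	}

A real version would want aging on the use counters and something
smarter than a linear scan, but it shows one way the eviction cost can
enter the decision directly.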

We currently "address" #2 in a rather dreadful way: the metadata cache
code is told how many pages it must free -- hopefully substantially
more than one buffer of minimal size -- and it just frees buffers,
adding up the total, until it reaches the target.  That's some of the
ugliest code in the buffer cache implementation, and it does not do a
very good job of maximizing the global hit rate, either.
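
For the curious, that loop has roughly the following shape.  The names
are invented and this is a caricature, not the actual bufcache code;
the point is that nothing in it asks whether the buffers surrendered
were the globally best objects to give up:

	#include <stddef.h>

	struct buf;			/* opaque; stands in for the real one */
	extern struct buf *lru_oldest_buffer(void);	/* hypothetical helpers */
	extern size_t buf_pages(const struct buf *);
	extern void buf_evict(struct buf *);

	static size_t
	metadata_free_pages(size_t target)
	{
		size_t freed = 0;
		struct buf *bp;

		/* Walk the LRU, freeing whole buffers until the page
		   total reaches the target we were handed. */
		while (freed < target &&
		    (bp = lru_oldest_buffer()) != NULL) {
			freed += buf_pages(bp);	/* whole buffer or nothing */
			buf_evict(bp);
		}
		return freed;	/* may overshoot; never frees part of a buffer */
	}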

-- 
 Thor Lancelot Simon	                                      tls@rek.tjls.com

"The inconsistency is startling, though admittedly, if consistency is to be
 abandoned or transcended, there is no problem."		- Noam Chomsky