Subject: Re: bugs/features and using UVM for directory buffers
To: None <tech-kern@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 01/10/2006 22:02:24
On Tue, Jan 10, 2006 at 02:43:49PM +0100, Reinoud Zandijk wrote:
> Dear folks,
> 
> On Tue, Jan 10, 2006 at 12:53:12AM +0100, Reinoud Zandijk wrote:
> > for a long time but then it gets ugly. For unknown reasons buf's containing 
> > directory information are then constantly recycled and buffers just touched 
> > a transaction before seem to be lost again and directories have to be 
> > reread and reread and reread over and over again.
> 
> now that was a silly bug of mine. I brelse()'d directory buf's with B_AGE.
> Whoever thought of naming B_AGE the flag that would mean pushing the buffer
> on the freelist.... *sigh* No wonder that when buffers got tight they got
> recycled first. FFS of course did push stuff on the LRU list and the
> freelist gets eaten first :-/

actually, the buffer cache system has two free lists, one called "LRU"
and the other called "AGE".  (both names being pretty poor, I'd say.)
there's a third list too, the "LOCKED" list, which contains buffers
that are also "free" but will not be reused or flushed by the buffer
cache system.  this is somewhat similar to "wired" pages in the VM system
(though wired pages can still be written to backing store).
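
to make the three lists concrete, here is a minimal userland sketch.
it is a toy model and not the kernel's buffer cache code: the toy_*
functions, the TB_* flags and the structs are hypothetical names made
up for illustration, and the "take from AGE before LRU" reclaim order
is an assumption based on the behaviour described in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: three queues named after the lists discussed above. */
enum bq { BQ_LOCKED, BQ_LRU, BQ_AGE, BQ_COUNT };

#define TB_AGE    0x1  /* hypothetical stand-in for B_AGE */
#define TB_LOCKED 0x2  /* hypothetical stand-in for a "locked" buffer flag */

struct toybuf {
	int id;
	struct toybuf *next;
};

struct toycache {
	struct toybuf *head[BQ_COUNT];
	struct toybuf *tail[BQ_COUNT];
};

/* Append a buffer to the tail of one queue. */
static void
toy_enqueue(struct toycache *c, enum bq q, struct toybuf *bp)
{
	assert(c != NULL && bp != NULL);
	bp->next = NULL;
	if (c->tail[q] != NULL)
		c->tail[q]->next = bp;
	else
		c->head[q] = bp;
	c->tail[q] = bp;
}

/* Release a buffer: the flags decide which list it lands on. */
void
toy_brelse(struct toycache *c, struct toybuf *bp, int flags)
{
	enum bq q;

	if (flags & TB_LOCKED)
		q = BQ_LOCKED;
	else if (flags & TB_AGE)
		q = BQ_AGE;
	else
		q = BQ_LRU;
	toy_enqueue(c, q, bp);
}

/*
 * Pick a buffer to recycle: AGE is drained first, then LRU.
 * LOCKED buffers are never handed out.
 */
struct toybuf *
toy_getnewbuf(struct toycache *c)
{
	static const enum bq order[] = { BQ_AGE, BQ_LRU };
	size_t i;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		enum bq q = order[i];
		struct toybuf *bp = c->head[q];

		if (bp == NULL)
			continue;
		c->head[q] = bp->next;
		if (c->head[q] == NULL)
			c->tail[q] = NULL;
		bp->next = NULL;
		return bp;
	}
	return NULL;
}
```

in this model a buffer released with TB_AGE is recycled before any
buffer sitting on the LRU list, no matter how recently it was used,
which is the symptom described above.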


> Revisiting my UDF directory buffer reading code and figuring out how to 
> make it smarter by reading ahead directory contents and such i wondered if 
> UVM would do a better job in handling the buffers. A small patch to genfs 
> did the trick and i have to say it works like a charm complete with 
> directory read-ahead etc.
> 
> Since it doesn't change existing behaviour and was only added as a
> `sanity' check for existing usage, i'd like to commit the genfs patch
> preferably after Takashi posts his genfs patch. It's only a replacement of
> (vp->v_type == VREG) by (vp->v_type == VREG || vp->v_type == VDIR).

this is fine in itself, but there are additional complications with
using pagecache pages for metadata.  when accessing these pages you'll
need to either:

 (a) keep the pages busy while accessing them (eg. use the UBC_FAULTBUSY
     flag to ubc_alloc())
or
 (b) use only page-fault-tolerant mechanisms like kcopy() or uiomove()
     to access the pages via their kernel mappings.

the first choice would be dangerous if the vnodes could be mapped
(ie. it might cause a deadlock depending on what else you were doing),
but since directories cannot be mmap()d it should be ok to use it for those.

-Chuck