Subject: Re: swapfs and uvm.
To: Simon Burge <simonb@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 03/15/2000 22:50:37
On Wed, Mar 15, 2000 at 10:10:14PM +1100, Simon Burge wrote:
> "R. C. Dowdeswell" wrote:
> 
> > On 953096532 seconds since the Beginning of the UNIX epoch
> > Chuck Silvers wrote:
> > >
> > >the FFS disk layout is actually pretty decent for use in a swapfs, except that
> > >FFS has no notion of being able to grow the underlying device (a swap-backed
> > >virtual device in this case) when it runs out of space.  one approach would
> > >be to enhance FFS this way (and also to call some device hook when it
> > >frees a block), and then you'd get swapfs for a very small amount of
> > >additional work.  also, this would pave the way for on-line growing of
> > >disk-backed FFS filesystems.
> 
> Two comments here:
> 
>  * Unless we do something like mmaping the aobj to userland (and ending
>    up with a horribly complicated mount procedure), we'd also need a
>    full-blown mkfs in the kernel too.  Granted my original idea would
>    also require building a filesystem in the kernel too, but I was
>    thinking of something a little simpler than FFS.  If we just go for
>    something like the current mfs but implemented on an aobj we will
>    also hit a lot of pages as all the cylinder groups are set up.  If we
>    do have an in-kernel mkfs, this could be worked around by keeping
>    track of which c/g's have been formatted and which haven't.

yeah, I was thinking that if you did it by making a growable FFS, you could
start with just 1 c/g and add more as you wanted more space.  but...


>  * I was planning on creating an aobj of a set size and not worrying
>    about growing/shrinking it.  Adding grow/shrink abilities to FFS was
>    a little beyond what I was thinking :-)

ok, I wasn't sure how much pie-in-the-sky thinking you wanted to do.  :-)


> > I think that what one really wants here is something along the
> > lines of what Solaris does, which IIRC is that files in the tmpfs
> > live in the buffer cache (avoiding the bcopy into the buffer cache
> > that our mfs requires, which duplicates data and thereby wastes
> > memory) and are paged out to disk when so required.
> 
> As I understand it now, there's no way to say "don't put this block
> in the buffer cache" - the current filesystem model revolves around
> I/O to buffers only, right?  Is this something that will be more
> easily addressed with UBC?

well, there are a couple ways to go about this:

1.  cache file data in your swapfs vnode buffers and the aobj pages.
    this has the problems that you've no doubt been considering,
    mainly that the data has to pass through the buffer cache before
    it can get to the aobj.

2.  cache file data only in aobj pages.
    you can do this without UBC, but it means that every time you
    access a swapfs vnode, you'll need to retranslate the swapfs vnode
    request into the namespace of the aobj in order to find the data.
    the benefit from not double-caching the data will no doubt outweigh
    the penalty of storing the data at the lower layer.

3.  cache file data in aobj pages and swapfs vnode pages.
    this is what you'd want UBC for, but the first phase of UBC won't
    even do everything you'd want, since this is basically stackable
    vm objects, which have all the same issues as stackable filesystems.


so I guess if you'd actually like to get something working, I'd recommend
option 2.  it should be easy to refit this to option 3 once the appropriate
framework is in place someday.

-Chuck