Subject: UBC info
To: None <tech-kern@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 06/15/1999 08:59:08

here's some info on UBC, how it works, what changes are needed for a
filesystem to support it, etc.

in the existing system, regular file data is stored in two places:
in the buffer cache for read() and write() to use, and in the page cache
for mmap() to use.  read() and write() totally ignore the page cache.
when resolving page faults for mmap()d regions, the page cache is
checked first, and if the data is not found there, it is copied
from the buffer cache to the page cache.
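
as a rough illustration of the fault path just described, here is a toy
user-space model in C.  it is only a sketch, not kernel code: the names
old_file, old_write() and old_fault() and the sizes are made up for the
example, and each "cache" is just an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define OLD_NPAGES   4
#define OLD_PAGESIZE 16

/* hypothetical toy model: one file, two independent caches */
struct old_file {
	char bufcache[OLD_NPAGES][OLD_PAGESIZE];   /* used by read()/write() */
	char pagecache[OLD_NPAGES][OLD_PAGESIZE];  /* used by mmap() */
	bool page_valid[OLD_NPAGES];               /* page cache copy present? */
};

/* write() path: touches only the buffer cache, never the page cache */
static void
old_write(struct old_file *f, int pg, const char *data)
{
	assert(pg >= 0 && pg < OLD_NPAGES);
	strncpy(f->bufcache[pg], data, OLD_PAGESIZE - 1);
	f->bufcache[pg][OLD_PAGESIZE - 1] = '\0';
}

/* mmap() fault path: check the page cache first, else copy from the buffer cache */
static const char *
old_fault(struct old_file *f, int pg)
{
	assert(pg >= 0 && pg < OLD_NPAGES);
	if (!f->page_valid[pg]) {
		memcpy(f->pagecache[pg], f->bufcache[pg], OLD_PAGESIZE);
		f->page_valid[pg] = true;
	}
	return f->pagecache[pg];
}
```

the point of the model is just that the same file data ends up stored twice,
once per cache.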

with UBC, regular file data lives only in the page cache, and is read
and written directly in the page cache without going through the buffer
cache.  read() and write() use a new kind of non-wired kernel mapping
to access the page cache pages when copying the data to or from the user's
buffer.  (eventually this copy can be optimized away using UVM's loaning feature.)

to achieve this end, the vnode pager "get" and "put" handling is completely
redone to call some new filesystem interfaces to do i/o to the disk.
these new interfaces are VOP_GETPAGES() and VOP_PUTPAGES().
VOP_GETPAGES() just passes the UVM pager "get" operation down to the
filesystem to read the requested page and do whatever readahead is desired.
VOP_PUTPAGES() is used to write dirty pages to disk during sync(), fsync(),
and msync() operations, and also by the pagedaemon when memory becomes scarce.
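
to make the division of labor concrete, here is a small user-space sketch
in C of a pager that hands "get" and "put" down to per-filesystem hooks.
everything in it is an assumption made for illustration: struct toy_vops,
toyfs_getpages(), toyfs_putpages() and the one-page readahead policy are
hypothetical and are not the real VOP_GETPAGES()/VOP_PUTPAGES() signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NPAGES   4
#define TOY_PAGESIZE 16

struct toy_vnode;

/* hypothetical per-filesystem hooks, standing in for the new VOPs */
struct toy_vops {
	int (*getpages)(struct toy_vnode *, int pg);
	int (*putpages)(struct toy_vnode *, int pg);
};

struct toy_vnode {
	const struct toy_vops *ops;
	char disk[TOY_NPAGES][TOY_PAGESIZE];   /* pretend backing store */
	char pages[TOY_NPAGES][TOY_PAGESIZE];  /* pretend page cache */
	bool valid[TOY_NPAGES];
	bool dirty[TOY_NPAGES];
};

/* toy filesystem "get": read the requested page plus one page of readahead */
static int
toyfs_getpages(struct toy_vnode *vp, int pg)
{
	for (int i = pg; i < pg + 2 && i < TOY_NPAGES; i++) {
		if (!vp->valid[i]) {
			memcpy(vp->pages[i], vp->disk[i], TOY_PAGESIZE);
			vp->valid[i] = true;
		}
	}
	return 0;
}

/* toy filesystem "put": write one page back to disk if it is dirty */
static int
toyfs_putpages(struct toy_vnode *vp, int pg)
{
	if (vp->valid[pg] && vp->dirty[pg]) {
		memcpy(vp->disk[pg], vp->pages[pg], TOY_PAGESIZE);
		vp->dirty[pg] = false;
	}
	return 0;
}

static const struct toy_vops toyfs_vops = { toyfs_getpages, toyfs_putpages };

/* pager "get": no i/o logic here, just pass the request to the filesystem */
static int
toy_pager_get(struct toy_vnode *vp, int pg)
{
	assert(pg >= 0 && pg < TOY_NPAGES);
	return vp->ops->getpages(vp, pg);
}

/* sync-style flush: offer every page to the filesystem's "put" hook */
static int
toy_sync(struct toy_vnode *vp)
{
	for (int pg = 0; pg < TOY_NPAGES; pg++) {
		int error = vp->ops->putpages(vp, pg);
		if (error)
			return error;
	}
	return 0;
}
```

in this sketch the pager knows nothing about the disk layout; how much to
read ahead and how to write a page back are decided entirely by the
filesystem's hooks.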

in addition, the VOP_READ() and VOP_WRITE() interfaces of a filesystem
need to be changed to use ubc_alloc() and ubc_release() instead of the
current getblk() and brelse() to get access to file data in memory.
ubc_alloc() is like getblk() in that it returns a virtual address
where you can access the file data, but it's different in several ways,
most importantly that the UBC mapping isn't wired and so accessing it
may generate page faults, and those page faults can even fail to be
resolved successfully in the case of a device i/o error.  so UBC mappings
can only be accessed via fault-safe methods like uiomove().
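
the shape of a read loop built this way can be sketched in user-space C.
this is a model under stated assumptions, not the real interface:
win_alloc(), win_release() and win_uiomove() are hypothetical stand-ins
for ubc_alloc(), ubc_release() and uiomove(), their argument lists are
invented for the example, and the bad_off field simply simulates a
device i/o error hit while touching the mapping.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define WIN_SIZE 8   /* bytes covered by one mapping window (made up) */

/* hypothetical stand-in for a file whose data lives in the page cache */
struct win_file {
	const char *data;
	size_t      size;
	size_t      bad_off;   /* offset where "i/o" fails; (size_t)-1 = never */
};

/* stand-in for ubc_alloc(): return a window at off, trimming *lenp to fit */
static const char *
win_alloc(const struct win_file *f, size_t off, size_t *lenp)
{
	size_t winleft = WIN_SIZE - (off % WIN_SIZE);

	assert(off < f->size);
	if (*lenp > winleft)
		*lenp = winleft;
	return f->data + off;
}

/* stand-in for ubc_release(): nothing to undo in this model */
static void
win_release(const char *win)
{
	(void)win;
}

/* stand-in for uiomove(): a copy that can fail, as a faulting access could */
static int
win_uiomove(const struct win_file *f, size_t off, const char *win,
    size_t len, char *dst)
{
	if (f->bad_off >= off && f->bad_off < off + len)
		return EIO;
	memcpy(dst, win, len);
	return 0;
}

/* read loop: map a window, copy out with the fault-safe call, release it */
static int
win_read(const struct win_file *f, size_t off, char *dst, size_t resid,
    size_t *donep)
{
	int error = 0;

	*donep = 0;
	while (resid > 0 && off < f->size) {
		size_t len = resid < f->size - off ? resid : f->size - off;
		const char *win = win_alloc(f, off, &len);

		error = win_uiomove(f, off, win, len, dst + *donep);
		win_release(win);   /* always release, even on error */
		if (error)
			break;
		off += len;
		resid -= len;
		*donep += len;
	}
	return error;
}
```

note that the loop never dereferences the window pointer itself; the only
access goes through the copy routine, which is the one place that is
allowed to report a failure.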

currently I'm only looking to have regular files use the page cache.
directories, symlinks and other metadata will still use the buffer cache
like they do now.  eventually I'd like to have everything using the page cache
and eliminate the buffer cache as it exists today, but that can wait
until after the basics are working.

so that's the short explanation.  I'd recommend looking at the changes
I've made to NFS and FFS for details on how the VOPs should work now,
and I'm happy to answer questions if it's still clear as mud.
(and before anyone says it, yes, the NFS getpages and putpages are
currently a complete hack.  would anyone like to work on them?)

-Chuck