tech-kern archive


Re: don't use wapbl (-o log) on / (/dev)



aha, found this thread again...

On Mon, Feb 16, 2009 at 09:24:11AM +0900, YAMAMOTO Takashi wrote:
 > > On Sun, Feb 15, 2009 at 01:41:14PM -0500, Steven M. Bellovin wrote:
 > >  > And where does the wapbl come in?  The file system where /dev exists?
 > >  > Strange...
 > > 
 > > Yes. This keeps recurring with different pairs of fs types; the
 > > ultimate cause is some kind of structural problem that doesn't isolate
 > > filesystems from one another well enough.
 > > 
 > > I'm inclined to think a big part of it is that the buffer cache tries
 > > (to support LFS) to be virtually indexed and never physically indexed;
 > > that is, all buffers belong to vnodes and file offsets rather than
 > > devices and device offsets. The problem is that this doesn't work for
 > > blocks that don't belong to files, and so they get attached to the
 > > vnode of the device the fs is mounted on, so the "file" is the device
 > > and the offset is the device offset. Trouble is, that vnode belongs to
 > > the root fs, and so we end up calling the root fs's vnode ops or vfs
 > > ops to work on the buffers. I think. (I haven't traced this all
 > > through yet and I may be missing something.)
 > > 
 > > While in theory we could special-case all handling of buffers
 > > belonging to block devices, that requires a lot of caution in a lot of
 > > places and is not going to be maintainable as the system evolves...
 > > even assuming we can find all the relevant places, which given that
 > > the problem keeps reappearing doesn't seem to have been the case in
 > > practice.
 > > 
 > > I think the buffer cache needs to be restructured so it can be either
 > > virtually or physically indexed. This is going to be a big hassle.
 > 
 > actually filesystems can use any kind of numbers as buffer cache index
 > for their own vnodes.

Well, sure, but it's an off_t. I guess a filesystem could use physical
offsets if it was careful, but AIUI the lookup key for buffers is the
passed vnode and the offset, which means that using physical offsets
allows multiple keys to name the same disk block. That in turn means
that one can have multiple buffers for the same disk block, which
isn't likely to work out too well. (In theory, one should only ever
have one, and old ones should be dropped/BC_INVAL'd before a block is
reallocated to another vnode; but many possible race conditions lie
therein.)
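
To make the aliasing concrete, here is a minimal toy model in C. It is
not the NetBSD buffer cache: every name in it (toy_cache, toy_getblk,
and so on) is made up for illustration, and it only assumes what is
said above, namely that the lookup key is the pair (vnode, offset).

```c
/* Toy model only. Hypothetical names; not NetBSD's buffer cache. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAXBUF 16

struct toy_vnode { int id; };

struct toy_buf {
	const struct toy_vnode *vp;	/* key part 1: the passed vnode */
	int64_t off;			/* key part 2: offset (off_t-like) */
};

struct toy_cache {
	struct toy_buf bufs[TOY_MAXBUF];
	size_t nbufs;
};

/* Look up by (vp, off); create a fresh buffer on a miss. */
static struct toy_buf *
toy_getblk(struct toy_cache *c, const struct toy_vnode *vp, int64_t off)
{
	for (size_t i = 0; i < c->nbufs; i++)
		if (c->bufs[i].vp == vp && c->bufs[i].off == off)
			return &c->bufs[i];
	assert(c->nbufs < TOY_MAXBUF);
	struct toy_buf *bp = &c->bufs[c->nbufs++];
	bp->vp = vp;
	bp->off = off;
	return bp;
}

/*
 * If "off" is a physical offset, count how many cached buffers name
 * that one disk location, regardless of vnode.
 */
static size_t
toy_count_physoff(const struct toy_cache *c, int64_t physoff)
{
	size_t n = 0;
	for (size_t i = 0; i < c->nbufs; i++)
		if (c->bufs[i].off == physoff)
			n++;
	return n;
}
```

Calling toy_getblk() with two different vnodes and the same physical
offset yields two distinct buffers for one disk block, which is the
situation that has to be prevented by invalidating the old buffer
before the block changes owner.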

 > I don't understand how you think it's related to the bug, though.
 > the "structural problem" you mentioned exists as long as a filesystem
 > is backed by a VBLK vnode and the filesystem uses the VBLK vnode's
 > buffer cache, doesn't it?

Yes.

The issue is that the VBLK vnode belongs to another FS, and the
recurring problem we have is that the other FS's methods end up being
used in connection with those buffers.
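
A toy sketch of that failure mode, in C. The structures and names here
(sk_mount, sk_vnode, sk_buf, sk_fs_handling) are hypothetical and much
simpler than the real ones; the only assumption is that generic code
decides which FS to call by following the buffer's vnode to its owner.

```c
/* Toy model only. Hypothetical names; not NetBSD's real structures. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sk_mount {
	const char *fsname;		/* e.g. "rootfs", "datafs" */
};

struct sk_vnode {
	const struct sk_mount *owner;	/* FS this vnode belongs to */
};

struct sk_buf {
	const struct sk_vnode *vp;	/* vnode the buffer hangs on */
};

/* Generic code picks the FS to call from the buffer's vnode. */
static const char *
sk_fs_handling(const struct sk_buf *bp)
{
	assert(bp != NULL && bp->vp != NULL && bp->vp->owner != NULL);
	return bp->vp->owner->fsname;
}
```

With a device node that lives on the root FS and a second FS mounted
from it, a buffer hung on a regular file of the second FS resolves to
the second FS, but a metadata buffer hung on the device vnode resolves
to the root FS, even though the data in it belongs to the second FS.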

As I said above/earlier, we could handle these buffers carefully and
avoid problems that way, but the bug rate so far suggests that this
isn't a good idea. Instead, we ought to keep these buffers somewhere
else so they aren't in reach from the wrong places.

I think this means that some buffers (the ones that don't belong to
any particular on-FS vnode) ought to be hung on the struct mount. This
will fix the structural problem at the cost of complicating the
interface some... but I think the resulting interface will be clearer.
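
One possible shape for this, as a rough C sketch. Nothing here is an
existing or proposed NetBSD interface; pm_mount, pm_buf, pm_attach and
pm_lookup are invented names, and the list is deliberately naive. The
point is only that non-file buffers are indexed by device offset and
reached through the mount, not through the device vnode.

```c
/* Toy model only. Hypothetical names; not an existing NetBSD API. */
#include <assert.h>
#include <stddef.h>

struct pm_buf;

struct pm_mount {
	const char *fsname;
	struct pm_buf *metabufs;	/* list of buffers not owned by any file */
};

struct pm_buf {
	struct pm_mount *mp;		/* owning mount */
	long long devoff;		/* device offset, used as the index */
	struct pm_buf *next;
};

/* Hang a non-file buffer on the mount that owns its contents. */
static void
pm_attach(struct pm_mount *mp, struct pm_buf *bp, long long devoff)
{
	assert(mp != NULL && bp != NULL);
	bp->mp = mp;
	bp->devoff = devoff;
	bp->next = mp->metabufs;
	mp->metabufs = bp;
}

/* Find a non-file buffer by device offset, within one mount only. */
static struct pm_buf *
pm_lookup(const struct pm_mount *mp, long long devoff)
{
	for (struct pm_buf *bp = mp->metabufs; bp != NULL; bp = bp->next)
		if (bp->devoff == devoff)
			return bp;
	return NULL;
}
```

In this arrangement a lookup through some other mount (the root FS,
say) cannot find the buffer at all, so that FS's ops have no path to
it by accident.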

There is some discussion of this in PR 41189. ad thinks devfs will
solve the problem, but he hasn't explained why and I'm very skeptical.
Having the block device vnodes belong to devfs will just cause devfs
ops to be run on them accidentally instead of wapbl ops, and more
radical restructurings are pretty much ruled out by the requirement
that devfs continue to use traditional block/character device inodes.

What I'm not at all clear on is how this relates to genfs_putpages.

-- 
David A. Holland
dholland%netbsd.org@localhost

