tech-kern: Re: Filesystem locking and cache question.

Subject: Re: Filesystem locking and cache question.
To: Sung-Won Chung <swchung7@hotmail.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 01/17/2003 16:11:28
On Fri, 17 Jan 2003, Sung-Won Chung wrote:

> I am thinking of a process that can relocate file system blocks to
> other locations, as a kind of disk defragmentation program that
> produces contiguous file allocation.
>
> Now I am thinking about how to manage locking and caches for this.
>
>
> 1. Locking
>
> During the relocation of a disk block, other processes should not
> access that block. I think a simple solution is to use a lock on
> the vnode associated with the block under relocation, since we can
> find the inode corresponding to that block, though doing so requires
> traversing the file system. This lock is different from vn_lock,
> since it should protect entire vnode operations such as
> VOP_RENAME().

I don't understand what is wrong with using a vnode lock. Just lock the
vnode while you're moving the blocks.
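
As a rough illustration of that suggestion, here is a userland toy, not
kernel code. Everything named toy_* is made up for the example, and a
pthread mutex stands in for the vnode lock. The relocation copies the
data, updates the block pointer, and only then drops the lock, so a
reader that takes the same lock never sees a half-moved block.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define TOY_BLKSIZE 16   /* toy block size, not a real FFS value */
#define TOY_NDISK   8    /* number of blocks on the toy "disk" */
#define TOY_NDADDR  4    /* direct block pointers per toy inode */

/* Hypothetical stand-in for a vnode and its inode; not struct vnode. */
struct toy_vnode {
    pthread_mutex_t lock;            /* plays the role of the vnode lock */
    int             db[TOY_NDADDR];  /* direct block pointers, like di_db */
};

static char toy_disk[TOY_NDISK][TOY_BLKSIZE];

/* Move logical block lbn of vp to physical block newblk under the lock. */
static void
toy_relocate(struct toy_vnode *vp, int lbn, int newblk)
{
    assert(lbn >= 0 && lbn < TOY_NDADDR);
    assert(newblk >= 0 && newblk < TOY_NDISK);

    pthread_mutex_lock(&vp->lock);
    int oldblk = vp->db[lbn];
    if (oldblk != newblk) {
        memcpy(toy_disk[newblk], toy_disk[oldblk], TOY_BLKSIZE);
        vp->db[lbn] = newblk;                     /* switch the pointer */
        memset(toy_disk[oldblk], 0, TOY_BLKSIZE); /* old block is now free */
    }
    pthread_mutex_unlock(&vp->lock);
}

/* Readers take the same lock, so they see either the old or new location. */
static void
toy_read(struct toy_vnode *vp, int lbn, char *out)
{
    assert(lbn >= 0 && lbn < TOY_NDADDR);

    pthread_mutex_lock(&vp->lock);
    memcpy(out, toy_disk[vp->db[lbn]], TOY_BLKSIZE);
    pthread_mutex_unlock(&vp->lock);
}
```

In the kernel the lock would of course be the real vnode lock and the
copy would go through the buffer cache; the sketch only shows the
ordering of copy, pointer update, and unlock.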

There are a number of tricks with UBC that you could play too.

> There is a shortcoming in locking the vnode during relocation:
> if the granularity of the lock is a vnode operation, a simple lock
> serializes all vnode operations, which may degrade performance.
> Therefore, locking must occur only when a vnode involved in a
> vnode operation is under relocation, found by using a hash or so.
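
One possible reading of the "hash or so" idea, as a single-threaded
userland sketch: keep a small table of counters indexed by a hash of the
inode number, and send an operation down the slow (locking) path only
when its bucket is non-zero. The names reloc_*, the table size, and the
hash function are all assumptions made for the example. Two inodes can
share a bucket, so the check may say "wait" when it need not, but it
never says "go" for an inode that is being relocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RELOC_BUCKETS 64   /* must stay 2^6 to match the shift below */

/* Number of in-progress relocations whose inode hashes to each bucket. */
static unsigned reloc_busy[RELOC_BUCKETS];

/* Multiplicative hash; the top 6 bits of the product pick the bucket. */
static unsigned
reloc_hash(uint32_t ino)
{
    return (uint32_t)(ino * UINT32_C(2654435761)) >> 26;
}

/* Mark inode ino as having a relocation in progress. */
static void
reloc_begin(uint32_t ino)
{
    reloc_busy[reloc_hash(ino)]++;
}

/* Mark one relocation on inode ino as finished. */
static void
reloc_end(uint32_t ino)
{
    unsigned h = reloc_hash(ino);

    assert(reloc_busy[h] > 0);   /* every end must pair with a begin */
    reloc_busy[h]--;
}

/* True if a vnode operation on ino has to take the slow path. */
static bool
reloc_must_wait(uint32_t ino)
{
    return reloc_busy[reloc_hash(ino)] != 0;
}
```

A real version would need the counters protected by a lock or updated
atomically, and the waiting side would sleep rather than poll.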
>
>
>
>
> 2. Cache
>
> FFS uses inode, vnode, and buffer caches. After a block is relocated,
> we should update the caches related to the block just moved, before
> releasing the lock that has prevented entry of vnode operations
> related to the block under relocation.
>
> A simple solution is, instead of updating, to 1) flush the buffer
> cache entries related to the moved block, and 2) flush the inode
> cache entries related to the moved block, since they hold the old
> location of the block.

What do you mean by "instead of update"?

> If we avoid cache flushing, the work to be done for the inode is
> simple: we just update the block pointers (di_db or di_ib). However,
> avoiding the flush of the buffer cache is rather complicated. 1) If
> the buffer contains a general data block, we can avoid the flush just
> by changing the physical block number in the buffer (b_blkno). 2) If
> the buffer contains directory entries, we can avoid the flush by
> changing the inode number field in the relevant directory entry.
> 3) If the buffer contains the inode itself, we change its file system
> block pointers. 4) If the buffer contains a block of indirect file
> system block pointers, we update those pointers that refer to the
> moved block to reflect its new location.

Why do we want to not synchronize the disk and the buffer cache?
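
For reference, cases 1, 3 and 4 in the quoted list come down to
rewriting one block number wherever it happens to be cached. A toy
userland sketch of that, assuming a made-up struct toy_buf in which only
the field name b_blkno is borrowed from the real struct buf; b_dirty and
both functions are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached buffer header; not the kernel's struct buf. */
struct toy_buf {
    int32_t b_blkno;   /* physical block this buffer caches */
    int     b_dirty;   /* toy "needs write-back" flag */
};

/* Case 1: a cached data block keeps its contents and changes address. */
static size_t
toy_retarget_bufs(struct toy_buf *bufs, size_t n,
    int32_t oldblk, int32_t newblk)
{
    size_t hits = 0;

    assert(bufs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (bufs[i].b_blkno == oldblk) {
            bufs[i].b_blkno = newblk;
            hits++;
        }
    }
    return hits;
}

/*
 * Cases 3 and 4: rewrite matching pointers in a di_db/di_ib style array
 * or in an indirect block. holder is the buffer that caches the array;
 * it is marked dirty so the new pointers eventually reach the disk.
 */
static size_t
toy_fix_pointers(int32_t *ptrs, size_t n,
    int32_t oldblk, int32_t newblk, struct toy_buf *holder)
{
    size_t hits = 0;

    assert(ptrs != NULL && holder != NULL);
    for (size_t i = 0; i < n; i++) {
        if (ptrs[i] == oldblk) {
            ptrs[i] = newblk;
            hits++;
        }
    }
    if (hits > 0)
        holder->b_dirty = 1;
    return hits;
}
```

Note that the pointer rewrite still dirties a buffer, so the disk and
the cache have to be brought back in sync at some point either way.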

> The difficulty in implementing this idea is that the current buffer
> cache doesn't know what kind of data it holds. But adding flags that
> tell what a buffer holds may compromise the file system independence
> of the buffer cache.

Look at LFS. It routinely moves data blocks around, and so it will show
you how to do this.

Take care,

Bill