Subject: Re: Filesystem locking and cache question.
To: None <wrstuden@netbsd.org>
From: Sung-Won Chung <swchung7@hotmail.com>
List: tech-kern
Date: 01/18/2003 13:50:05
>From: Bill Studenmund <wrstuden@netbsd.org>
>
> > 1. Locking
> >
> > During a relocation of a disk block, other processes should not
> > access that block. I think a simple solution is using a lock for
> > a vnode related with a block under relocation, since we can find
> > the inode corresponding to a block under relocation, though it
> > should traverse file system. This lock is different from vn_lock,
> > since it should protect the whole vnode operations such as
> > VOP_RENAME().
>
>I don't understand what is wrong with using a vnode lock? Just lock the
>node while you're moving the blocks.

     For FFS, I thought that if vn_lock() is used, there is a chance that
     VOP_RENAME() may notice that the inode of the source directory changed
     during its internal re-locking. Because I'm just a beginner in file
     system internals, I didn't know that VOP_RENAME() avoids this situation
     by setting the IN_RENAME flag, which I can also check.
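
     As a rough sketch of that check (a user-space toy, not kernel code):
     the struct, the TOY_IN_RENAME name and its value, and
     toy_can_relocate() are all made up for illustration. The real flag
     and in-core inode are defined in the UFS headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the rename-in-progress flag (value assumed). */
#define TOY_IN_RENAME 0x0010u

/* Hypothetical in-core inode; only the flag word matters for this sketch. */
struct toy_inode {
    unsigned int i_flag;    /* in-core state flags */
};

/* Return true if the relocator may move this inode's blocks right now. */
static bool
toy_can_relocate(const struct toy_inode *ip)
{
    assert(ip != NULL);
    /* A rename in progress means the inode must be left alone. */
    return (ip->i_flag & TOY_IN_RENAME) == 0;
}
```

     The relocator would call toy_can_relocate() while holding the vnode
     lock and skip (or retry later) any inode for which it returns false.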

     I think there are some race conditions that cannot be avoided by
     vn_lock(). Vnode operations that call ffs_makeinode(), such as
     VOP_MKNOD(), VOP_MKDIR() and VOP_CREATE(), return a locked vnode for
     the created file or directory. If an inode is relocated after it is
     allocated by ffs_nodealloc() but before it is registered by VFS_VGET(),
     we can't lock the vnode corresponding to the inode under relocation,
     since no vnode has been registered for it yet. The relocation program
     then moves the inode without any vnode lock, and its content is lost.

     Another possible race is in VOP_LOOKUP(): an inode may be moved
     between reading the directory entry and calling VFS_VGET().
     When this race is possible, VOP_LOOKUP() sets the PDIRUNLOCK flag
     to inform the caller before returning an error. However, the
     current vfs_lookup() implementation doesn't seem to deal with it.

>There are a number of tricks with UBC that you could play too.

     The only UBC interface I know is ubc_alloc()/ubc_release().
     I have no idea how to do locking with this interface.
     Could you give me a few more hints?

> > 2. Cache
> >
> > FFS uses inode, vnode, and buffer cache. After a block is relocated,
> > we should update the caches related to the block just moved, before
> > releasing the lock that has prevented entry of vnode operations
> > related to the block under relocation.
> >
> > Simple solution is, instead of update, 1) flush buffer cache related
> > with the moved block, and 2) flush inode cache related with the moved
> > block, since they have old location of the block.
>
>What do you mean, "instead of update?"

     I'm sorry if I confused you. I am not good at English.
     By "update" I meant correcting the contents of the buffer cache or
     inode cache that still hold the previous location of a block that
     has moved to a new location.
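
     For the inode side, a minimal user-space sketch of such an in-place
     update could look like this. It is not kernel code: toy_dinode,
     TOY_NDADDR and toy_update_direct() are hypothetical names, and only
     the direct pointers (di_db) are handled, not the indirect ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NDADDR 12    /* number of direct block pointers (assumed) */

/* Hypothetical cut-down on-disk inode: direct block pointers only. */
struct toy_dinode {
    int32_t di_db[TOY_NDADDR];
};

/*
 * Rewrite any direct pointer that still names old_blk so that it names
 * new_blk instead. Returns the number of pointers changed.
 */
static int
toy_update_direct(struct toy_dinode *dp, int32_t old_blk, int32_t new_blk)
{
    int changed = 0;

    assert(dp != NULL);
    assert(old_blk != 0);    /* 0 marks a hole here, never a real block */

    for (size_t i = 0; i < TOY_NDADDR; i++) {
        if (dp->di_db[i] == old_blk) {
            dp->di_db[i] = new_blk;
            changed++;
        }
    }
    return changed;
}
```

     A return value of 0 means the cached inode did not reference the
     moved block through a direct pointer, so an indirect block would
     have to be checked instead.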

> > If we avoid cache flushing, the work to be done for inode is simple.
> > We just update block pointers (di_db or di_ib). However, avoiding the
> > flush of a buffer cache is rather complicated. 1) if the buffer cache
> > contains general data block, we can avoid flush only by changing
> > physical block number in the buffer (b_blkno). 2) if the buffer cache
> > contains directory entry, we can avoid flush by changing inode number
> > field in the relevant directory entry. 3) if the buffer cache contains
> > inode itself, we change file system block pointers, 4) if the buffer
> > cache contains a block containing indirect file system block pointers,
> > we update some of those pointers to reflect new location of moved
> > block.
>
>Why do we want to not synchronize the disk and the buffer cache?


     I thought that if we synchronize by flushing the stale cache entries,
     the frequently used part of the cache might soon have to be reloaded.
     I admit that I was too greedy about not losing the cache.
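
     The trade-off for a plain data buffer (case 1 in my list) can be
     sketched like this. Again it is only a user-space toy: toy_buf and
     both functions are hypothetical, and the real buffer cache has far
     more state than a block number and a valid bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down buffer header. */
struct toy_buf {
    int64_t b_blkno;    /* physical block number the buffer maps to */
    bool    b_valid;    /* contents usable without reading the disk */
};

/* Flush style: drop every valid buffer mapping old_blk. Returns the count. */
static int
toy_invalidate(struct toy_buf *cache, size_t n, int64_t old_blk)
{
    int dropped = 0;

    assert(cache != NULL);
    for (size_t i = 0; i < n; i++) {
        if (cache[i].b_valid && cache[i].b_blkno == old_blk) {
            cache[i].b_valid = false;    /* next access must re-read */
            dropped++;
        }
    }
    return dropped;
}

/* Update style: keep the data and point the buffer at the new location. */
static int
toy_retarget(struct toy_buf *cache, size_t n, int64_t old_blk, int64_t new_blk)
{
    int moved = 0;

    assert(cache != NULL);
    for (size_t i = 0; i < n; i++) {
        if (cache[i].b_blkno == old_blk) {
            cache[i].b_blkno = new_blk;  /* b_valid is left untouched */
            moved++;
        }
    }
    return moved;
}
```

     toy_invalidate() is the simple route and costs a later disk read;
     toy_retarget() keeps the cached data but only works because a data
     block's contents do not depend on where the block lives.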

> > The difficulty in implementing this idea is that the current buffer
> > cache doesn't know what kind of data it holds. But adding flags that
> > tell what the buffer holds may degrade the file system independence
> > of the buffer cache.
>
>Look at LFS. It routinely moves data blocks around, and so it will show
>you how to do this.

      Thank you very much for your consideration and suggestions.
      I'll study the LFS code to see how it solves my problems.

      Best Regards,

      - Sungwon
