Subject: Re: Filesystem locking and cache question.
To: Sung-Won Chung <swchung7@hotmail.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 01/20/2003 08:57:16
On Sat, 18 Jan 2003, Sung-Won Chung wrote:

>
> > From: Bill Studenmund <wrstuden@netbsd.org>
> >
> >I don't understand what is wrong with using a vnode lock? Just lock the
> >node while you're moving the blocks.
>
>      For FFS, if vn_lock is used, I thought there was a chance that VOP_RENAME
>      might notice the inode of the source directory changing during internal
>      re-locking. Because I'm just a beginner in file system internals,
>      I didn't know that VOP_RENAME avoids this situation by
>      setting the IN_RENAME flag, and that I can also check it.

The other option is to just not update the access times when you move
blocks around. The access times show when things outside the file system
(userland and some parts of the kernel) interacted with files; they don't
have to show when the file system did internal maintenance.
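Both points (hold the node lock across the move, and leave the access time alone) can be sketched in a small user-space model. Everything here is hypothetical and only stands in for the real pieces: `struct toy_node`, `relocate_block()` and `read_block()` are invented names, a pthread mutex plays the role of the vnode lock, and a static array plays the disk. This is not the FFS or vn_lock() code path.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <time.h>

#define BLKSIZE 16
#define NDISK   8
#define NPTRS   4

/* Toy "disk": NDISK blocks of BLKSIZE bytes (hypothetical model). */
static char disk[NDISK][BLKSIZE];

/* Toy in-core node: a lock, a block table, and an access time. */
struct toy_node {
    pthread_mutex_t lock;   /* stands in for the vnode lock */
    int blocks[NPTRS];      /* disk block number of each logical block */
    time_t atime;           /* last access from outside the file system */
};

/*
 * Move logical block lbn of np to disk block newblk.
 * The node lock is held for the whole copy-and-remap, so no other
 * locked reader can see the block table halfway through.  atime is
 * deliberately not touched: this is internal maintenance.
 */
void relocate_block(struct toy_node *np, int lbn, int newblk)
{
    assert(lbn >= 0 && lbn < NPTRS);
    assert(newblk >= 0 && newblk < NDISK);

    pthread_mutex_lock(&np->lock);
    int oldblk = np->blocks[lbn];
    memcpy(disk[newblk], disk[oldblk], BLKSIZE);  /* copy the contents */
    np->blocks[lbn] = newblk;                     /* repoint the table */
    pthread_mutex_unlock(&np->lock);
}

/* A user-visible read: takes the same lock and does update atime. */
void read_block(struct toy_node *np, int lbn, char *out)
{
    assert(lbn >= 0 && lbn < NPTRS);
    pthread_mutex_lock(&np->lock);
    memcpy(out, disk[np->blocks[lbn]], BLKSIZE);
    np->atime = time(NULL);
    pthread_mutex_unlock(&np->lock);
}
```

After `relocate_block()`, the data can be read at its new location and `atime` still holds its old value. Only `read_block()`, the stand-in for an outside access, changes it.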

>      I think there are some race conditions that cannot be avoided
>      by vn_lock(). Vnode operations that call ffs_makeinode(),
>      such as VOP_MKNOD/MKDIR/CREATE(), return a locked vnode for the
>      created file or directory. If an inode is relocated after it is
>      allocated by ffs_nodealloc() and before it is registered by VFS_VGET(),
>      we can't lock the vnode corresponding to the inode under relocation.
>      The relocation program then moves the inode without the vnode lock,
>      and its contents are lost.

Wait, are you moving inodes around, or data blocks within an inode?
I ask because the objects returned by MKNOD/MKDIR/CREATE should not have
any data blocks yet, so there are no data blocks to move. So this doesn't
matter.

>      Another possible race is in VOP_LOOKUP(): an inode may be moved
>      between reading the directory entry and calling VFS_VGET().
>      When this race condition is possible, VOP_LOOKUP() sets the PDIRUNLOCK
>      flag to inform the caller before returning an error. However, the
>      current vfs_lookup() implementation doesn't seem to deal with it.

The paragraph above looks like you're moving inodes, not the blocks in
them. You don't want to do that with any of the ufs file systems.

Just moving the data blocks around shouldn't cause the error you mention
above.
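A toy model of the distinction may help. It rests on stated assumptions: `struct toy_inode`, `struct toy_dirent`, `move_data_block()`, `move_inode()` and `lookup()` are hypothetical and unrelated to the real UFS layouts. Moving a data block only rewrites a pointer inside the inode, so directory entries and lookups are unaffected. Moving the inode changes its number, and that is what opens a window between reading a directory entry and fetching the inode.

```c
#include <assert.h>
#include <string.h>

#define NINODES 8
#define NPTRS   4

/* Hypothetical toy structures, not the UFS on-disk formats. */
struct toy_inode  { int in_use; int blocks[NPTRS]; };
struct toy_dirent { char name[16]; int ino; };

static struct toy_inode itable[NINODES];

/* Moving a data block only rewrites a pointer inside the inode. */
void move_data_block(int ino, int lbn, int newblk)
{
    assert(ino >= 0 && ino < NINODES && lbn >= 0 && lbn < NPTRS);
    itable[ino].blocks[lbn] = newblk;
}

/*
 * Moving the inode itself changes its number.  Anything still holding
 * the old number (a directory entry, or a lookup already in progress)
 * now refers to a free slot.  Only the one entry passed in is fixed up.
 */
void move_inode(int oldino, int newino, struct toy_dirent *de)
{
    itable[newino] = itable[oldino];
    memset(&itable[oldino], 0, sizeof itable[oldino]);
    if (de->ino == oldino)
        de->ino = newino;
}

/* Lookup resolves name -> inode number -> inode. */
struct toy_inode *lookup(const struct toy_dirent *de, const char *name)
{
    if (strcmp(de->name, name) != 0 || !itable[de->ino].in_use)
        return NULL;
    return &itable[de->ino];
}
```

A lookup that has already read the directory entry while `move_inode()` runs is left holding a stale number. `move_data_block()` cannot cause that.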

> >There are a number of tricks with UBC that you could play too.
>
>      The only UBC interfaces I know are ubc_alloc/ubc_release.
>      I have no idea how to do locking with this interface.
>      Could you give me a little more of a hint?

All of the file data is now cached in UBC, so if you're moving blocks
around, UBC is where you'll need to do it.

> > > 2. Cache
> > >
> > > FFS uses the inode, vnode, and buffer caches. After a block is
> > > relocated, we should update the caches related to the block just
> > > moved before releasing the lock that prevented vnode operations on
> > > the block under relocation from entering.
> > >
> > > A simple solution is, instead of updating, to 1) flush the buffer
> > > cache entries related to the moved block, and 2) flush the inode
> > > cache entries related to the moved block, since they hold the old
> > > location of the block.
> >
> >What do you mean, "instead of update?"
>
>      I'm sorry if I confused you. I am not good at English.
>      By "update" I meant correcting the contents of the buffer cache or
>      inode cache that held the previous location of a block which
>      had moved to a new location.

I thought that's what you meant. You don't want to do that. We need to
keep the on-disk metadata (file block tables, volume free block tables)
synchronized, so moving a data block will mean writing metadata. You MUST
flush the cache.
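Here is a minimal sketch of why moving one data block means writing metadata. It assumes a toy volume, and every name in it is hypothetical: `struct toy_vol`, `struct toy_file`, `relocate()` and `flush_metadata()`. The file's block table and the volume's free-block map both change, and both get flushed. The sketch ignores crash-ordering questions entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLK  16
#define BLKSZ 8
#define NPTRS 4

/* Hypothetical toy volume: data blocks plus a free-block map. */
struct toy_vol {
    char data[NBLK][BLKSZ];
    bool used[NBLK];        /* volume free-block table */
};

/* Hypothetical file: its block table and a dirty flag for its metadata. */
struct toy_file {
    int blocks[NPTRS];
    bool meta_dirty;
};

/* Counts metadata writes so that the flush is observable. */
static int meta_writes;

static void flush_metadata(struct toy_file *f)
{
    if (f->meta_dirty) {
        meta_writes++;      /* stand-in for writing block table + free map */
        f->meta_dirty = false;
    }
}

/*
 * Relocate logical block lbn: claim a free block, copy the data,
 * repoint the file's block table, release the old block, then flush
 * the changed metadata.  Returns the new block number, or -1 if full.
 */
int relocate(struct toy_vol *v, struct toy_file *f, int lbn)
{
    assert(lbn >= 0 && lbn < NPTRS);
    int oldblk = f->blocks[lbn], newblk = -1;

    for (int b = 0; b < NBLK; b++)
        if (!v->used[b]) { newblk = b; break; }
    if (newblk < 0)
        return -1;

    v->used[newblk] = true;                           /* free map */
    memcpy(v->data[newblk], v->data[oldblk], BLKSZ);  /* the data */
    f->blocks[lbn] = newblk;                          /* block table */
    v->used[oldblk] = false;                          /* free map */
    f->meta_dirty = true;

    flush_metadata(f);
    return newblk;
}
```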

> >Why do we want to not synchronize the disk and the buffer cache?
>
>
>      I thought that if we synchronized by flushing the invalid cache
>      entries, the frequently used part of the cache might need to be
>      reloaded again soon. I admit that I was too greedy about not losing
>      cached data.

You can also synchronize by just writing the contents. You don't have to
remove the entries from the cache.
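The difference between writing back and also discarding can be shown in a few lines. This is a minimal sketch assuming a toy cache. `struct toy_buf`, `sync_cache()` and `flush_and_invalidate()` are hypothetical and are not the real buffer cache interfaces. After `sync_cache()` the disk is up to date and the entry stays valid, so a later access does not need to reload it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBUF  4
#define NDISK 64
#define BLKSZ 8

static char disk[NDISK][BLKSZ];   /* toy backing store */

/* Hypothetical cache entry, loosely modelled on a buffer header. */
struct toy_buf {
    int blkno;
    bool valid;
    bool dirty;
    char data[BLKSZ];
};

static struct toy_buf cache[NBUF];

/* Write dirty buffers to disk but keep them valid in the cache. */
void sync_cache(void)
{
    for (int i = 0; i < NBUF; i++) {
        struct toy_buf *bp = &cache[i];
        if (bp->valid && bp->dirty) {
            assert(bp->blkno >= 0 && bp->blkno < NDISK);
            memcpy(disk[bp->blkno], bp->data, BLKSZ);
            bp->dirty = false;   /* clean, but still cached */
        }
    }
}

/* The alternative: write back, then drop every entry. */
void flush_and_invalidate(void)
{
    sync_cache();
    for (int i = 0; i < NBUF; i++)
        cache[i].valid = false;
}
```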

> > > The difficulty in implementing this idea is that the current buffer
> > > cache doesn't know what kind of data it holds. But adding flags that
> > > tell what a buffer holds may degrade the file system independence of
> > > the buffer cache.
> >
> >Look at LFS. It routinely moves data blocks around, and so it will show
> >you how to do this.
>
>       Thank you very much for your consideration and suggestions.
>       I'll study the LFS code to see how it solves my problems.

Take care,

Bill