tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: buffer cache & ufs changes (preliminary ffsv2 extattr support)


> Hello,
> I'm working on porting the FreeBSD FFSv2 extended attributes support.
> What we have right now only works for ffsv1 (it's a restriction in our
> sources but it could be extended to ffsv2), and uses a file hierarchy
> to store attributes. This has several issues, one being that it doesn't
> integrate with WAPBL and is very slow (glusterfs shows this very well).
> FFSv2 has native extended attributes support, in the form of 2 direct
> blocks reserved for this purpose in the on-disk inode. This was commented out
> in our kernel when FFSv2 support was imported. It should be possible to
> integrate this with WAPBL and handle it as other metadata, so it should
> be fast. fsck will also be able to check it.
> I don't think I'll be able to have this ready for netbsd-6, but I now know
> this requires 2 changes that will require a kernel version bump, so theses
> changes needs to go in before netbsd-6 is branched so that full
> extended attributes support can be pulled up later.
> The fisrt change is to the buffer cache. Right now the buffer cache is
> indexed by the couple <vnode pointer, block number>, block number being
> a block offset in the file being pointed to by vnode pointer. 
> But we'll have 2 kinds of blocks: data blocks (what we have now) and
> extended attributes blocks, so block number is not enough to identify
> blocks from a vnode. FreeBSD use negative block numbers for extattrs,
> but I find it unclean, I'm not sure it won't cause problems with our
> buffer cache (at last block -1 is used already for vtruncbuf()), and negative
> blocks numbers are already used in ufs for indirect blocks.
> I see 2 ways to fix this:
> 1) Add a new bflag, B_ALTDATA. When the buffer refers to a extended attr
>   block (and not a data block), this flag is set. This flag can also be
>   passed to bread() and breadn() (not the same namespace, but the same
>   B_ prefix and so the same name, this part of the buffer cache API could also
>   be improved). When looking up a buffer in the cache we also check for
>   this flag. For consumers to be able to specify we're looking up a
>   B_ALTDATA buffer, incore(), getblk() and vtruncbuf() gains a new flag
>   argument. To avoid touching a all buffer cache users, I choose to
>   introduce incore2(), getblk2() and vtruncbuf2() with the extra argument,
>   and the origical functions just call the *2 version with flag set to 0.
>   This is implemented in buffer.diff, and has been tested to not
>   introduce new problems with existing code.
> 2) instead of using a new flag, add a new 'int type' member to struct buf,
>    which is opaque to the buffer cache itself (the meaning of type > 0 is
>    fs-dependant) but is checked when looking up a buffer.
>    Type 0 would still mean regular vnode data, so that existing users
>    won't have to be changed, other values could be used by filesystems
>    for their internal data usage (for example, ufs could use 1 for
>    first indirect blocks, 2, for second indirect blocks, 3 for thrird
>    indirect blocks, and some other values for extended attributes. Another
>    filesystem with e.g. blocks to store ACLs could also use its own
>    type to have its ACL blocks entered in the bufcache).
>    In addition to new incore2(), getblk2() and vtruncbuf2() functions
>    with a type argument, we'd also need a bread2() and breadn2() with
>    a type argument.
>    I've not implemented this yet.

have you considered to separate the entity being cached from vnode?
iirc, irix called it "buffer cache target" or such.
your vtruncbuf2 function seems to imply needs to have separate
v_dirtyblkhd/v_cleanblkhd for each types.
besides that, it's more natural representation for NTFS/NFSv4 style
attribute forks.


> Althrough I've done 1 as a POC, I prefer solution 2 (the patch is mostly the
> same, with bflag remplaced by b_type). What do other think ?
> The second change needed outside of sys/ufs/ffs/ is:
> - new members in struct inode. This is strait from FreeBSD, and this
>   affects modules so require a kernel version bump
> - ufs_inode.c:ufs_inactive() truncate extended data to 0 as well when
>   freeing the inode. This require changes to ffs and lfs to ignore
>   IO_EXT (for now).
> - ufs_vnops.c:ufs_strategy(): when requesting extended attribute data
>   (B_ALTDATA actually), get extdata bn instead of regular bn via VOP_BMAP().
> This is in ufs.diff attached.
> The complete diff of actual code (where extended attributes are not working
> yet, there's locking issues, as well as more code needed for reverse endian
> handling, WAPBL and fsck) is also included, so that you can see how
> all of this goes in together.
> I'd really like to be able to get this in netbsd-6 later, which would
> means (given my own schedule) I have to commit the kern and ufs/ufs parts
> before next friday if we want to avoid kernel API changes in the branch.
> -- 
> Manuel Bouyer <>
>      NetBSD: 26 ans d'experience feront toujours la difference
> --

Home | Main Index | Thread Index | Old Index