Subject: swapfs filesystem design (and mount/umount question)
To: None <tech-kern@netbsd.org>
From: Simon Burge <simonb@netbsd.org>
List: tech-kern
Date: 03/20/2000 00:18:03
Folks,

Here's some rough notes on how I think a swapfs filesystem should be
implemented from a layout POV.

 A- The filesystem size will be limited to 2^32 512-byte "blocks", as
    this is how big the size parameter is (on 32-bit machines anyway),
    and I've kept the size parameter in terms of 512-byte blocks since
    that matches what mfs uses.  Page offsets internally are u_int32_t's
    so there's an absolute maximum filesystem size of 2^32 * PAGE_SIZE.
    I don't see the maximum filesystem size as a real limitation...
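
    To put numbers on the two limits in A, here is a small userland
    sketch.  The function and macro names are made up for illustration,
    and a PAGE_SIZE of 4096 is only an assumption for the example call.

```c
#include <stdint.h>

#define SWAPFS_DEV_BSIZE	512u	/* unit of the size parameter, as with mfs */

/* Largest size a 32-bit count of 512-byte blocks can describe. */
static uint64_t
swapfs_max_bytes_by_size_param(void)
{
	return ((uint64_t)1 << 32) * SWAPFS_DEV_BSIZE;
}

/* Largest size reachable with 32-bit page offsets. */
static uint64_t
swapfs_max_bytes_by_page_offset(uint32_t page_size)
{
	return ((uint64_t)1 << 32) * page_size;
}
```

    With 4096-byte pages the size parameter caps the filesystem at
    2^41 bytes (2 TB), while the page offsets alone would allow 2^44
    bytes (16 TB), so the size parameter is the tighter limit.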

 B- The filesystem is contained in one aobj.  This is split up into four
    parts:

    1) a bitmap for each page that is used
    2) a bitmap for each inode that is used
    3) a page map for the pages that contain inodes (see C below)
    4) the filesystem inodes and data
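
    As a rough idea of how the four parts might be sized, here is a
    userland sketch.  It is not the real code: it assumes 4096-byte
    pages, 128-byte inodes, one bit per page and per possible inode in
    the bitmaps, one 32-bit page map entry per page, and it sizes
    everything for the worst case where every page holds inodes.  All
    names are hypothetical.

```c
#include <stdint.h>

#define SWAPFS_PAGE_SIZE	4096u	/* assumed PAGE_SIZE */
#define SWAPFS_INODE_SIZE	128u	/* assumed size of one inode */
#define SWAPFS_MAPENT_SIZE	4u	/* assumed: one 32-bit page map entry */

struct swapfs_layout {
	uint32_t page_bitmap_pages;	/* part 1 */
	uint32_t inode_bitmap_pages;	/* part 2 */
	uint32_t page_map_pages;	/* part 3 */
	uint32_t data_pages;		/* part 4 */
};

/* Divide x by y, rounding up. */
static uint32_t
swapfs_howmany(uint64_t x, uint64_t y)
{
	return (uint32_t)((x + y - 1) / y);
}

/* Split an aobj of total_pages pages into the four parts. */
static struct swapfs_layout
swapfs_layout_for(uint32_t total_pages)
{
	struct swapfs_layout l;
	uint64_t bits_per_page = 8u * SWAPFS_PAGE_SIZE;
	uint64_t max_inodes = (uint64_t)total_pages *
	    (SWAPFS_PAGE_SIZE / SWAPFS_INODE_SIZE);
	uint64_t overhead;

	l.page_bitmap_pages = swapfs_howmany(total_pages, bits_per_page);
	l.inode_bitmap_pages = swapfs_howmany(max_inodes, bits_per_page);
	l.page_map_pages = swapfs_howmany(
	    (uint64_t)total_pages * SWAPFS_MAPENT_SIZE, SWAPFS_PAGE_SIZE);

	overhead = (uint64_t)l.page_bitmap_pages + l.inode_bitmap_pages +
	    l.page_map_pages;
	/* A tiny aobj may have no room left for data at all. */
	l.data_pages = overhead < total_pages ?
	    (uint32_t)(total_pages - overhead) : 0;
	return l;
}
```

    Under those assumptions a 16384-page (64 MB) aobj would spend
    1 + 16 + 16 pages on the first three parts and leave 16351 pages
    for inodes and data.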

 C- Inode pages are allocated dynamically.  Initially I was thinking of
    a separate page map that contained indexes to the inodes' pages.
    I've also been thinking of keeping all the inodes in a normal (but
    hidden from the user-visible namespace) file.  The former would
    probably be easier to implement and possibly faster (not having
    to use the filesystem to locate inode pages).  The latter has the
    advantage that there's less space reserved that may never be used if
    there aren't that many inodes allocated.  It would appear that the
    potential wastage associated with the former would only be about 1%
    of the total filesystem size.  I'm tending towards the former at the
    moment.
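
    A sketch of what a lookup through a separate page map (the former
    option) could look like.  This is only an illustration: the names,
    the "not allocated" marker, and the 4096-byte page and 128-byte
    inode sizes are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define SWAPFS_PAGE_SIZE	4096u	/* assumed PAGE_SIZE */
#define SWAPFS_INODE_SIZE	128u	/* assumed size of one inode */
#define SWAPFS_INOPP	(SWAPFS_PAGE_SIZE / SWAPFS_INODE_SIZE)
#define SWAPFS_NOPAGE	UINT32_MAX	/* hypothetical: slot has no page yet */

struct swapfs_inode_loc {
	uint32_t page;		/* page index within the aobj */
	uint32_t offset;	/* byte offset of the inode in that page */
};

/*
 * Translate an inode number into a page and offset using the page map.
 * Returns 0 on success, or -1 if the inode number is out of range or
 * its inode page has not been allocated yet.
 */
static int
swapfs_inode_locate(const uint32_t *page_map, size_t map_len, uint32_t ino,
    struct swapfs_inode_loc *loc)
{
	uint32_t slot = ino / SWAPFS_INOPP;

	if (slot >= map_len || page_map[slot] == SWAPFS_NOPAGE)
		return -1;
	loc->page = page_map[slot];
	loc->offset = (ino % SWAPFS_INOPP) * SWAPFS_INODE_SIZE;
	return 0;
}
```

    The lookup is one array index plus a divide and a modulo, which is
    where the "possibly faster" above comes from: no file block
    mapping has to be walked to find an inode page.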

 D- Inodes look quite similar to an ffs dinode but have an extra field or
    two from the in-memory ffs inode (si_dev and si_number) since the
    filesystem inode and the ``in-memory'' inode are the same thing.
    Here's the proposed swapfs inode layout:

    #define NDADDR  14
    #define NIADDR  3

    struct swapfs_inode {
	u_int16_t	si_mode;	/*   0: IFMT, permissions */
	int16_t		si_nlink;	/*   2: file link count */
	u_int32_t	si_flags;	/*   4: status flags (chflags) */
	u_int32_t	si_uid;		/*   8: file owner */
	u_int32_t	si_gid;		/*  12: file group */
	u_int64_t	si_size;	/*  16: file byte count */
	int32_t		si_atime;	/*  24: last access time */
	int32_t		si_atimensec;	/*  28: last access time */
	int32_t		si_mtime;	/*  32: last modified time */
	int32_t		si_mtimensec;	/*  36: last modified time */
	int32_t		si_ctime;	/*  40: last inode change time */
	int32_t		si_ctimensec;	/*  44: last inode change time */
	u_int32_t	si_blocks;	/*  48: blocks actually held */
	u_int32_t	si_dev;		/*  52: device inode is on */
	u_int32_t	si_number;	/*  56: inode number */
	u_int32_t	si_db[NDADDR];	/*  60: direct blocks */
	u_int32_t	si_ib[NIADDR];	/* 116: indirect blocks */
    };

    NDADDR can be trimmed down if any more fields are needed.  There are
    no on-disk filesystems to keep backwards compatibility with :-)
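
    Since the offsets in the comments are easy to get wrong by hand,
    here is a userland sketch that lets the compiler check them.  It
    uses the <stdint.h> type names in place of u_int32_t and friends so
    it builds outside the kernel, and it assumes a C11 compiler for
    _Static_assert.  With NDADDR at 14 the structure comes out at 128
    bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define NDADDR	14
#define NIADDR	3

struct swapfs_inode {
	uint16_t	si_mode;	/*   0: IFMT, permissions */
	int16_t		si_nlink;	/*   2: file link count */
	uint32_t	si_flags;	/*   4: status flags (chflags) */
	uint32_t	si_uid;		/*   8: file owner */
	uint32_t	si_gid;		/*  12: file group */
	uint64_t	si_size;	/*  16: file byte count */
	int32_t		si_atime;	/*  24: last access time */
	int32_t		si_atimensec;	/*  28: last access time */
	int32_t		si_mtime;	/*  32: last modified time */
	int32_t		si_mtimensec;	/*  36: last modified time */
	int32_t		si_ctime;	/*  40: last inode change time */
	int32_t		si_ctimensec;	/*  44: last inode change time */
	uint32_t	si_blocks;	/*  48: blocks actually held */
	uint32_t	si_dev;		/*  52: device inode is on */
	uint32_t	si_number;	/*  56: inode number */
	uint32_t	si_db[NDADDR];	/*  60: direct blocks */
	uint32_t	si_ib[NIADDR];	/* 116: indirect blocks */
};

/* Fail the build if a field moves away from its documented offset. */
_Static_assert(offsetof(struct swapfs_inode, si_size) == 16, "si_size");
_Static_assert(offsetof(struct swapfs_inode, si_dev) == 52, "si_dev");
_Static_assert(offsetof(struct swapfs_inode, si_number) == 56, "si_number");
_Static_assert(offsetof(struct swapfs_inode, si_db) == 60, "si_db");
_Static_assert(offsetof(struct swapfs_inode, si_ib) == 116, "si_ib");
_Static_assert(sizeof(struct swapfs_inode) == 128, "inode size");
```

    At 128 bytes, a whole number of inodes fits in a page for any
    power-of-two page size, so trimming NDADDR to add fields would
    keep that property as long as the total stays at 128.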

 E- Directories in the initial version will be much the same as FFS
    directories.  I plan to investigate a hashed or btree scheme later
    on, but I'll KISS early on.

 F- No fragments on day one either - I'd just like to get the framework
    going.  I suspect that they will be the first improvement.

Can anyone see any glaring errors, blunders or omissions in the above so
far?



I've just now got a basic empty filesystem framework working (i.e., no vops
supported except for mount, umount and statfs, and mount doesn't do
anything like creating a filesystem).  At the moment I can't unmount the
filesystem:

	wincen:vfs/miscfs/swapfs 1# modload obj.i386/swapfs.o 
	Module loaded as ID 0
	wincen:vfs/miscfs/swapfs 2# mount_swapfs -s 131072 swapfs /mnt
	wincen:vfs/miscfs/swapfs 3# df -i
	Filesystem  1K-blocks  Used  Avail Capacity iused   ifree  %iused  Mounted on
	...
	swapfs          65404     0  65404     0%       0  523232     0%   /mnt
	wincen:vfs/miscfs/swapfs 4# mount | grep swap
	swapfs on /mnt type swapfs (local)
	wincen:vfs/miscfs/swapfs 5# umount /mnt
	umount: /mnt: not currently mounted

I've copied bits of kernfs_mount() in making my swapfs_mount() - the
important bits seem to be:

	error = getnewvnode(VT_SWAPFS, ...);

	MALLOC(swfsp, struct swapfs_mount *, sizeof(struct swapfs_mount),
	    M_MISCFSMNT, M_WAITOK);
	rvp->v_type = VBLK;
	rvp->v_flag |= VROOT;
	rvp->v_data = swfsp;

	swfsp->sm_root = rvp;

	mp->mnt_flag |= MNT_LOCAL;
	mp->mnt_data = (qaddr_t)swfsp;
	vfs_getnewfsid(mp, MOUNT_SWAPFS);

Any suggestions on where to start looking for this one?


Simon.