Subject: Re: swapfs filesystem design (and mount/umount question)
To: Simon Burge <simonb@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 03/19/2000 18:52:53
On Mon, Mar 20, 2000 at 12:18:03AM +1100, Simon Burge wrote:
> Folks,
> 
> Here's some rough notes on how I think a swapfs filesystem should be
> implemented from a layout POV.
> 
>  A- The filesystem size will be limited to 2^32 512-byte "blocks", as
>     this is how big the size parameter is (on 32 bit machines anyway),
>     and I've kept the size parameter in terms of 512-byte blocks since
>     that matches what mfs uses.  Page offsets internally are u_int32_t's
>     so there's an absolute maximum filesystem size of 2^32 * PAGE_SIZE.
>     I don't see the maximum filesystem size as a real limitation...
> 
>  B- The filesystem is contained in one aobj.  This is split up into four
>     parts:
> 
>     1) a bitmap for each page that is used
>     2) a bitmap for each inode that is used
>     3) a page map for the pages that contain inodes (see C below).
>     4) the filesystem inodes and data

you might consider making each of these types of data a separate aobj.
if you put the inodes in an aobj (ie. one aobj contains all the inodes),
then you don't need the "page map" (if I understand what that's supposed
to be), and you don't need the "si_number" in the swapfs_inode.
I don't think you need the "si_dev" in each inode either, since that
should be the same for all inodes in the filesystem.
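
a minimal sketch of the idea, assuming all inodes sit back to back in
their own aobj and that pages are 4 KiB: the inode number is then just
an index, so the page and the offset within the page fall out of simple
arithmetic and nothing needs to be stored in the inode itself. the names
swapfs_inode_trimmed, SWAPFS_PAGE_SIZE, swapfs_ino_to_page and
swapfs_ino_to_pgoff are made up for illustration, and the userland
<stdint.h> types stand in for the kernel's u_int32_t and friends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SWAPFS_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */
#define NDADDR 14
#define NIADDR 3

/* Hypothetical inode with si_dev and si_number dropped. */
struct swapfs_inode_trimmed {
	uint16_t si_mode;		/* IFMT, permissions */
	int16_t  si_nlink;		/* file link count */
	uint32_t si_flags;		/* status flags (chflags) */
	uint32_t si_uid;		/* file owner */
	uint32_t si_gid;		/* file group */
	uint64_t si_size;		/* file byte count */
	int32_t  si_atime;
	int32_t  si_atimensec;
	int32_t  si_mtime;
	int32_t  si_mtimensec;
	int32_t  si_ctime;
	int32_t  si_ctimensec;
	uint32_t si_blocks;		/* blocks actually held */
	uint32_t si_db[NDADDR];		/* direct blocks */
	uint32_t si_ib[NIADDR];		/* indirect blocks */
};

/* Whole inodes per page; inodes never straddle a page boundary. */
#define SWAPFS_INODES_PER_PAGE \
	((uint32_t)(SWAPFS_PAGE_SIZE / sizeof(struct swapfs_inode_trimmed)))

/* Page index, within the inode aobj, that holds inode number ino. */
static uint32_t
swapfs_ino_to_page(uint32_t ino)
{
	return ino / SWAPFS_INODES_PER_PAGE;
}

/* Byte offset of inode number ino inside its page. */
static uint32_t
swapfs_ino_to_pgoff(uint32_t ino)
{
	return (ino % SWAPFS_INODES_PER_PAGE) *
	    (uint32_t)sizeof(struct swapfs_inode_trimmed);
}
```

with this layout the reverse mapping (page and offset back to inode
number) is equally cheap, which is why a stored si_number is redundant.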

-Chuck


>  C- Inode pages are allocated dynamically.  Initially I was thinking of
>     a separate page map that contained indexes to the inodes' pages.
>     I've also been thinking of keeping all the inodes in a normal (but
>     hidden from the user-visible namespace) file.  The former would
>     probably be easier to implement and possibly faster (not having
>     to use the filesystem to locate inode pages).  The latter has the
>     advantage that there's less space reserved that may never be used if
>     there aren't that many inodes allocated.  It would appear that the
>     potential wastage associated with the former would only be about 1%
>     of the total filesystem size.  I'm tending towards the former at the
>     moment.
> 
>  D- Inodes look quite similar to a ffs dinode but have an extra field or
>     two from the in-memory ffs inode (si_dev and si_number) since the
>     filesystem inode and the ``in-memory'' inode are the same thing.
>     Here's the proposed swapfs inode layout:
> 
>     #define NDADDR  14
>     #define NIADDR  3
> 
>     struct swapfs_inode {
> 	u_int16_t	si_mode;	/*   0: IFMT, permissions */
> 	int16_t		si_nlink;	/*   2: file link count */
> 	u_int32_t	si_flags;	/*   4: status flags (chflags) */
> 	u_int32_t	si_uid;		/*   8: file owner */
> 	u_int32_t	si_gid;		/*  12: file group */
> 	u_int64_t	si_size;	/*  16: file byte count */
> 	int32_t		si_atime;	/*  24: last access time */
> 	int32_t		si_atimensec;	/*  28: last access time */
> 	int32_t		si_mtime;	/*  32: last modified time */
> 	int32_t		si_mtimensec;	/*  36: last modified time */
> 	int32_t		si_ctime;	/*  40: last inode change time */
> 	int32_t		si_ctimensec;	/*  44: last inode change time */
> 	u_int32_t	si_blocks;	/*  48: blocks actually held */
> 	u_int32_t	si_dev;		/*  52: device inode is on */
> 	u_int32_t	si_number;	/*  56: inode number */
> 	u_int32_t	si_db[NDADDR];	/*  60: direct blocks */
> 	u_int32_t	si_ib[NIADDR];	/* 116: indirect blocks */
>     };
> 
>     NDADDR can be trimmed down if any more fields are needed.  There's
>     no on-disk filesystems to keep backwards compatibility with :-)
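
the byte offsets in the layout comments can be checked mechanically
rather than by hand. the following is a sketch only: it restates the
proposed struct with <stdint.h> types in place of the kernel's
u_int32_t family (an assumption made so it compiles in userland) and
lets the compiler verify the offsets that the field sizes add up to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NDADDR 14
#define NIADDR 3

struct swapfs_inode {
	uint16_t si_mode;
	int16_t  si_nlink;
	uint32_t si_flags;
	uint32_t si_uid;
	uint32_t si_gid;
	uint64_t si_size;
	int32_t  si_atime;
	int32_t  si_atimensec;
	int32_t  si_mtime;
	int32_t  si_mtimensec;
	int32_t  si_ctime;
	int32_t  si_ctimensec;
	uint32_t si_blocks;
	uint32_t si_dev;
	uint32_t si_number;
	uint32_t si_db[NDADDR];
	uint32_t si_ib[NIADDR];
};

/* Compile-time checks: the build fails if any offset is off. */
static_assert(offsetof(struct swapfs_inode, si_size) == 16, "si_size");
static_assert(offsetof(struct swapfs_inode, si_blocks) == 48, "si_blocks");
static_assert(offsetof(struct swapfs_inode, si_dev) == 52, "si_dev");
static_assert(offsetof(struct swapfs_inode, si_number) == 56, "si_number");
static_assert(offsetof(struct swapfs_inode, si_db) == 60, "si_db");
static_assert(offsetof(struct swapfs_inode, si_ib) == 116, "si_ib");
static_assert(sizeof(struct swapfs_inode) == 128, "inode size");
```

at 128 bytes the inode is the same size as an ffs dinode and divides
evenly into any power-of-two page size.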
> 
>  E- Directories in the initial version will be much the same as FFS
>     directories.  I plan to investigate a hashed or btree scheme later
>     on, but I'll KISS early on.
> 
>  F- No fragments on day one either - I'd just like to get the framework
>     going.  I suspect that they will be the first improvement.
> 
> Can anyone see any glaring errors, blunders or omissions in the above so
> far?
> 
> 
> 
> I've just now got a basic empty filesystem framework working (ie, no vops
> supported except for mount, umount and statfs, and mount doesn't do
> anything like creating a filesystem).  At the moment I can't unmount the
> filesystem:
> 
> 	wincen:vfs/miscfs/swapfs 1# modload obj.i386/swapfs.o 
> 	Module loaded as ID 0
> 	wincen:vfs/miscfs/swapfs 2# mount_swapfs -s 131072 swapfs /mnt
> 	wincen:vfs/miscfs/swapfs 3# df -i
> 	Filesystem  1K-blocks  Used  Avail Capacity iused   ifree  %iused  Mounted on
> 	...
> 	swapfs          65404     0  65404     0%       0  523232     0%   /mnt
> 	wincen:vfs/miscfs/swapfs 4# mount | grep swap
> 	swapfs on /mnt type swapfs (local)
> 	wincen:vfs/miscfs/swapfs 5# umount /mnt
> 	umount: /mnt: not currently mounted
> 
> I've copied bits of kernfs_mount() in making my swapfs_mount() - the
> important bits seem to be:
> 
> 	error = getnewvnode(VT_SWAPFS, ...);
> 
> 	MALLOC(swfsp, struct swapfs_mount *, sizeof(struct swapfs_mount),
> 	    M_MISCFSMNT, M_WAITOK);
> 	rvp->v_type = VBLK;
> 	rvp->v_flag |= VROOT;
> 	rvp->v_data = swfsp;
> 
> 	swfsp->sm_root = rvp;
> 
> 	mp->mnt_flag |= MNT_LOCAL;
> 	mp->mnt_data = (qaddr_t)swfsp;
> 	vfs_getnewfsid(mp, MOUNT_SWAPFS);
> 
> Any suggestions on where to start looking for this one?
> 
> 
> Simon.