Subject: Re: swapfs filesystem design (and mount/umount question)
To: Simon Burge <simonb@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 03/19/2000 18:52:53
On Mon, Mar 20, 2000 at 12:18:03AM +1100, Simon Burge wrote:
> Folks,
>
> Here's some rough notes on how I think a swapfs filesystem should be
> implemented from a layout POV.
>
> A- The filesystem size will be limited to 2^32 512-byte "blocks", as
> this is how big the size parameter is (on 32-bit machines anyway),
> and I've kept the size parameter in terms of 512-byte blocks since
> that matches what mfs uses. Page offsets internally are u_int32_t's
> so there's an absolute maximum filesystem size of 2^32 * PAGE_SIZE.
> I don't see the maximum filesystem size as a real limitation...
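To put numbers on the two limits in A, here is a small arithmetic sketch. The function names are made up for illustration, and the page size is passed in as a parameter because the notes don't fix one (4096 is only an assumption for i386-style machines).

```c
#include <stdint.h>

/* Unit of the mount size parameter, as stated in the notes above. */
#define SWAPFS_SIZE_UNIT 512u

/* Hypothetical helper: bytes describable by 2^32 512-byte blocks. */
static uint64_t
swapfs_max_bytes_from_size_param(void)
{
	return ((uint64_t)1 << 32) * SWAPFS_SIZE_UNIT;
}

/* Hypothetical helper: bytes addressable by 32-bit page offsets. */
static uint64_t
swapfs_max_bytes_from_page_offsets(uint32_t page_size)
{
	return ((uint64_t)1 << 32) * page_size;
}
```

With 4096-byte pages the size parameter (2^41 bytes, 2 TB) is the tighter of the two limits; the page-offset limit works out to 2^44 bytes (16 TB).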
>
> B- The filesystem is contained in one aobj. This is split up into four
> parts:
>
> 1) a bitmap for each page that is used
> 2) a bitmap for each inode that is used
> 3) a page map for the pages that contain inodes (see C below).
> 4) the filesystem inodes and data
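For the two "in use" bitmaps in B, a first-fit allocator is probably the simplest thing that works. This is only a sketch: bitmap_alloc() and bitmap_free() are hypothetical names, the bitmap is treated as a plain byte array, and how it is mapped in from the aobj is left out entirely.

```c
#include <stdint.h>

/*
 * Find the first clear bit among the first nbits bits of map, set it,
 * and return its index. Returns -1 if every bit is already set.
 */
static int64_t
bitmap_alloc(uint8_t *map, uint32_t nbits)
{
	uint32_t i;

	for (i = 0; i < nbits; i++) {
		if ((map[i / 8] & (1u << (i % 8))) == 0) {
			map[i / 8] |= (uint8_t)(1u << (i % 8));
			return (int64_t)i;
		}
	}
	return -1;
}

/* Mark a previously allocated bit as free again. */
static void
bitmap_free(uint8_t *map, uint32_t bit)
{
	map[bit / 8] &= (uint8_t)~(1u << (bit % 8));
}
```

The same pair of routines would serve both the page bitmap and the inode bitmap; a linear scan is slow for large maps, but it keeps the first version simple.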
you might consider making each of these types of data a separate aobj.
if you put the inodes in an aobj (ie. one aobj contains all the inodes),
then you don't need the "page map" (if I understand what that's supposed
to be), and you don't need the "si_number" in the swapfs_inode.
I don't think you need the "si_dev" in each inode either, since that
should be the same for all inodes in the filesystem.
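as a rough sketch of why the page map goes away: if one aobj holds nothing but inodes, the inode number alone gives the location by plain arithmetic. the names below are made up for illustration, and the inode size and page size are passed in because neither is settled yet.

```c
#include <stdint.h>

/* Hypothetical result type: where an inode lives inside the inode aobj. */
struct swapfs_inode_loc {
	uint32_t page;		/* page index within the inode aobj */
	uint32_t offset;	/* byte offset of the inode within that page */
};

/*
 * Map an inode number to its page and offset, assuming inodes are
 * packed back to back and never straddle a page boundary.
 */
static struct swapfs_inode_loc
swapfs_inode_locate(uint32_t ino, uint32_t inode_size, uint32_t page_size)
{
	uint32_t per_page = page_size / inode_size;
	struct swapfs_inode_loc loc;

	loc.page = ino / per_page;
	loc.offset = (ino % per_page) * inode_size;
	return loc;
}
```

for example, with 128-byte inodes and 4096-byte pages there are 32 inodes per page, so inode 33 would land in page 1 at offset 128.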
-Chuck
> C- Inode pages are allocated dynamically. Initially I was thinking of
> a separate page map that contained indexes to the inodes' pages.
> I've also been thinking of keeping all the inodes in a normal (but
> hidden from the user-visible namespace) file. The former would
> probably be easier to implement and possibly faster (not having
> to use the filesystem to locate inode pages). The latter has the
> advantage that there's less space reserved that may never be used if
> there aren't that many inodes allocated. It would appear that the
> potential wastage associated with the former would only be about 1%
> of the total filesystem size. I'm tending towards the former at the
> moment.
>
> D- Inodes look quite similar to an ffs dinode but have an extra field or
> two from the in-memory ffs inode (si_dev and si_number) since the
> filesystem inode and the ``in-memory'' inode are the same thing.
> Here's the proposed swapfs inode layout:
>
> #define NDADDR 14
> #define NIADDR 3
>
> struct swapfs_inode {
> u_int16_t si_mode; /* 0: IFMT, permissions */
> int16_t si_nlink; /* 2: file link count */
> u_int32_t si_flags; /* 4: status flags (chflags) */
> u_int32_t si_uid; /* 8: file owner */
> u_int32_t si_gid; /* 12: file group */
> u_int64_t si_size; /* 16: file byte count */
> int32_t si_atime; /* 24: last access time */
> int32_t si_atimensec; /* 28: last access time */
> int32_t si_mtime; /* 32: last modified time */
> int32_t si_mtimensec; /* 36: last modified time */
> int32_t si_ctime; /* 40: last inode change time */
> int32_t si_ctimensec; /* 44: last inode change time */
> u_int32_t si_blocks; /* 48: blocks actually held */
> u_int32_t si_dev; /* 52: device inode is on */
> u_int32_t si_number; /* 56: inode number */
> u_int32_t si_db[NDADDR]; /* 60: direct blocks */
> u_int32_t si_ib[NIADDR]; /* 116: indirect blocks */
> };
>
> NDADDR can be trimmed down if any more fields are needed. There are
> no on-disk filesystems to keep backwards compatibility with :-)
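The offset comments in a layout like this are easy to get out of step with the fields, so it may be worth having the compiler check them. The sketch below restates the proposed struct with <stdint.h> types standing in for the u_int*_t ones (an assumption made so it builds outside the kernel) and uses C11 _Static_assert; the offsets checked are the ones that follow from the field sizes.

```c
#include <stddef.h>
#include <stdint.h>

#define NDADDR	14
#define NIADDR	3

struct swapfs_inode {
	uint16_t si_mode;		/*   0: IFMT, permissions */
	int16_t  si_nlink;		/*   2: file link count */
	uint32_t si_flags;		/*   4: status flags (chflags) */
	uint32_t si_uid;		/*   8: file owner */
	uint32_t si_gid;		/*  12: file group */
	uint64_t si_size;		/*  16: file byte count */
	int32_t  si_atime;		/*  24: last access time */
	int32_t  si_atimensec;		/*  28: last access time */
	int32_t  si_mtime;		/*  32: last modified time */
	int32_t  si_mtimensec;		/*  36: last modified time */
	int32_t  si_ctime;		/*  40: last inode change time */
	int32_t  si_ctimensec;		/*  44: last inode change time */
	uint32_t si_blocks;		/*  48: blocks actually held */
	uint32_t si_dev;		/*  52: device inode is on */
	uint32_t si_number;		/*  56: inode number */
	uint32_t si_db[NDADDR];		/*  60: direct blocks */
	uint32_t si_ib[NIADDR];		/* 116: indirect blocks */
};

/* Fail the build if the layout drifts from the documented offsets. */
_Static_assert(offsetof(struct swapfs_inode, si_dev) == 52, "si_dev offset");
_Static_assert(offsetof(struct swapfs_inode, si_number) == 56, "si_number offset");
_Static_assert(offsetof(struct swapfs_inode, si_db) == 60, "si_db offset");
_Static_assert(offsetof(struct swapfs_inode, si_ib) == 116, "si_ib offset");
_Static_assert(sizeof(struct swapfs_inode) == 128, "inode should be 128 bytes");
```

At 128 bytes, 32 inodes would fit in a 4096-byte page, assuming that page size.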
>
> E- Directories in the initial version will be much the same as FFS
> directories. I plan to investigate a hashed or btree scheme later
> on, but I'll KISS early on.
>
> F- No fragments on day one either - I'd just like to get the framework
> going. I suspect that they will be the first improvement.
>
> Can anyone see any glaring errors, blunders or omissions in the above so
> far?
>
>
>
> I've just now got a basic empty filesystem framework working (i.e., no vops
> supported except for mount, umount and statfs, and mount doesn't do
> anything like creating a filesystem). At the moment I can't unmount the
> filesystem:
>
> wincen:vfs/miscfs/swapfs 1# modload obj.i386/swapfs.o
> Module loaded as ID 0
> wincen:vfs/miscfs/swapfs 2# mount_swapfs -s 131072 swapfs /mnt
> wincen:vfs/miscfs/swapfs 3# df -i
> Filesystem 1K-blocks Used Avail Capacity iused ifree %iused Mounted on
> ...
> swapfs 65404 0 65404 0% 0 523232 0% /mnt
> wincen:vfs/miscfs/swapfs 4# mount | grep swap
> swapfs on /mnt type swapfs (local)
> wincen:vfs/miscfs/swapfs 5# umount /mnt
> umount: /mnt: not currently mounted
>
> I've copied bits of kernfs_mount() in making my swapfs_mount() - the
> important bits seem to be:
>
> error = getnewvnode(VT_SWAPFS, ...);
>
> MALLOC(swfsp, struct swapfs_mount *, sizeof(struct swapfs_mount),
> M_MISCFSMNT, M_WAITOK);
> rvp->v_type = VBLK;
> rvp->v_flag |= VROOT;
> rvp->v_data = swfsp;
>
> swfsp->sm_root = rvp;
>
> mp->mnt_flag |= MNT_LOCAL;
> mp->mnt_data = (qaddr_t)swfsp;
> vfs_getnewfsid(mp, MOUNT_SWAPFS);
>
> Any suggestions on where to start looking for this one?
>
>
> Simon.