Subject: swapfs filesystem design (and mount/umount question)
To: None <tech-kern@netbsd.org>
From: Simon Burge <simonb@netbsd.org>
List: tech-kern
Date: 03/20/2000 00:18:03
Folks,
Here's some rough notes on how I think a swapfs filesystem should be
implemented from a layout POV.
A- The filesystem size will be limited to 2^32 512-byte "blocks", as
this is how big the size parameter is (on 32-bit machines anyway),
and I've kept the size parameter in terms of 512-byte blocks since
that matches what mfs uses. Page offsets internally are u_int32_t's
so there's an absolute maximum filesystem size of 2^32 * PAGE_SIZE.
I don't see the maximum filesystem size as a real limitation...
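To put numbers on the two limits in point A, here is a minimal sketch of the arithmetic. The function names and the 4096-byte page size are assumptions for illustration only (PAGE_SIZE is machine dependent), and none of this is swapfs code.

```c
#include <assert.h>
#include <stdint.h>

#define SWAPFS_BLOCK_SIZE 512u	/* unit of the mount size parameter */

/* Limit imposed by a 32-bit count of 512-byte blocks (about 2 TB). */
uint64_t
max_size_from_block_count(void)
{
	return ((uint64_t)1 << 32) * SWAPFS_BLOCK_SIZE;
}

/* Limit imposed by 32-bit page offsets: 2^32 * page_size. */
uint64_t
max_size_from_page_offset(uint32_t page_size)
{
	return ((uint64_t)1 << 32) * page_size;
}

/* Convert a size given in 512-byte blocks to whole pages, rounding down. */
uint32_t
blocks_to_pages(uint32_t nblocks, uint32_t page_size)
{
	assert(page_size != 0);
	return (uint32_t)(((uint64_t)nblocks * SWAPFS_BLOCK_SIZE) / page_size);
}
```

With an assumed 4096-byte page, a size parameter of 131072 blocks (64 MB) works out to 16384 pages, and the block-count limit is the smaller of the two.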
B- The filesystem is contained in one aobj. This is split up into four
parts:
1) a bitmap for each page that is used
2) a bitmap for each inode that is used
3) a page map for the pages that contain inodes (see C below).
4) the filesystem inodes and data
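The two bitmaps in parts 1 and 2 could be handled by something like the sketch below: one bit per page (or per inode), with allocation done by a linear scan for the first clear bit. The names bm_test, bm_set, bm_clear and bm_alloc are hypothetical, and the linear scan is just the simplest choice, not necessarily what swapfs would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BM_BITS_PER_WORD 32u

/* Return non-zero if bit n is set (page or inode n is in use). */
int
bm_test(const uint32_t *bm, uint32_t n)
{
	return (bm[n / BM_BITS_PER_WORD] >> (n % BM_BITS_PER_WORD)) & 1u;
}

/* Mark bit n as in use. */
void
bm_set(uint32_t *bm, uint32_t n)
{
	bm[n / BM_BITS_PER_WORD] |= (uint32_t)1 << (n % BM_BITS_PER_WORD);
}

/* Mark bit n as free. */
void
bm_clear(uint32_t *bm, uint32_t n)
{
	bm[n / BM_BITS_PER_WORD] &= ~((uint32_t)1 << (n % BM_BITS_PER_WORD));
}

/*
 * Find the first free bit among the first nbits, mark it used and
 * return its index.  Returns -1 if every bit is already set.
 */
int64_t
bm_alloc(uint32_t *bm, uint32_t nbits)
{
	uint32_t i;

	assert(bm != NULL);
	for (i = 0; i < nbits; i++) {
		if (!bm_test(bm, i)) {
			bm_set(bm, i);
			return (int64_t)i;
		}
	}
	return -1;
}
```

The caller is expected to supply an array of at least (nbits + 31) / 32 words, zeroed when the filesystem is created.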
C- Inode pages are allocated dynamically. Initially I was thinking of
a separate page map that contained indexes to the inodes' pages.
I've also been thinking of keeping all the inodes in a normal (but
hidden from the user-visible namespace) file. The former would
probably be easier to implement and possibly faster (not having
to use the filesystem to locate inode pages). The latter has the
advantage that there's less space reserved that may never be used if
there aren't that many inodes allocated. It would appear that the
potential wastage associated with the former would only be about 1%
of the total filesystem size. I'm tending towards the former at the
moment.
D- Inodes look quite similar to an ffs dinode but have an extra field or
two from the in-memory ffs inode (si_dev and si_number) since the
filesystem inode and the ``in-memory'' inode are the same thing.
Here's the proposed swapfs inode layout:
#define NDADDR 14
#define NIADDR 3
struct swapfs_inode {
u_int16_t si_mode; /* 0: IFMT, permissions */
int16_t si_nlink; /* 2: file link count */
u_int32_t si_flags; /* 4: status flags (chflags) */
u_int32_t si_uid; /* 8: file owner */
u_int32_t si_gid; /* 12: file group */
u_int64_t si_size; /* 16: file byte count */
int32_t si_atime; /* 24: last access time */
int32_t si_atimensec; /* 28: last access time */
int32_t si_mtime; /* 32: last modified time */
int32_t si_mtimensec; /* 36: last modified time */
int32_t si_ctime; /* 40: last inode change time */
int32_t si_ctimensec; /* 44: last inode change time */
u_int32_t si_blocks; /* 48: blocks actually held */
u_int32_t si_dev; /* 52: device inode is on */
u_int32_t si_number; /* 56: inode number */
u_int32_t si_db[NDADDR]; /* 60: direct blocks */
u_int32_t si_ib[NIADDR]; /* 116: indirect blocks */
};
NDADDR can be trimmed down if any more fields are needed. There are
no on-disk filesystems to keep backwards compatibility with :-)
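The byte offsets in the comments can be checked mechanically with offsetof. The sketch below restates the proposed struct with the <stdint.h> type names so it compiles outside the kernel; the accessor functions are hypothetical helpers added only so the values can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NDADDR 14
#define NIADDR 3

struct swapfs_inode {
	uint16_t si_mode;		/* IFMT, permissions */
	int16_t	 si_nlink;		/* file link count */
	uint32_t si_flags;		/* status flags (chflags) */
	uint32_t si_uid;		/* file owner */
	uint32_t si_gid;		/* file group */
	uint64_t si_size;		/* file byte count */
	int32_t	 si_atime;		/* last access time */
	int32_t	 si_atimensec;
	int32_t	 si_mtime;		/* last modified time */
	int32_t	 si_mtimensec;
	int32_t	 si_ctime;		/* last inode change time */
	int32_t	 si_ctimensec;
	uint32_t si_blocks;		/* blocks actually held */
	uint32_t si_dev;		/* device inode is on */
	uint32_t si_number;		/* inode number */
	uint32_t si_db[NDADDR];		/* direct blocks */
	uint32_t si_ib[NIADDR];		/* indirect blocks */
};

/* Hypothetical helpers that expose the layout for inspection. */
size_t swapfs_inode_size(void) { return sizeof(struct swapfs_inode); }
size_t swapfs_off_dev(void) { return offsetof(struct swapfs_inode, si_dev); }
size_t swapfs_off_number(void) { return offsetof(struct swapfs_inode, si_number); }
size_t swapfs_off_db(void) { return offsetof(struct swapfs_inode, si_db); }
size_t swapfs_off_ib(void) { return offsetof(struct swapfs_inode, si_ib); }

/* How many inodes fit in one page of the given size. */
size_t
swapfs_inodes_per_page(size_t page_size)
{
	assert(page_size >= sizeof(struct swapfs_inode));
	return page_size / sizeof(struct swapfs_inode);
}
```

On common ABIs, where the compiler inserts no padding here, this gives si_dev at 52, si_number at 56, si_db at 60, si_ib at 116 and a total size of 128 bytes, so an assumed 4096-byte page would hold 32 inodes.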
E- Directories in the initial version will be much the same as FFS
directories. I plan to investigate a hashed or btree scheme later
on, but I'll KISS early on.
F- No fragments on day one either - I'd just like to get the framework
going. I suspect that they will be the first improvement.
Can anyone see any glaring errors, blunders or omissions in the above so
far?
I've just now got a basic empty filesystem framework working (ie, no vops
supported except for mount, umount and statfs, and mount doesn't do
anything like creating a filesystem). At the moment I can't unmount the
filesystem:
wincen:vfs/miscfs/swapfs 1# modload obj.i386/swapfs.o
Module loaded as ID 0
wincen:vfs/miscfs/swapfs 2# mount_swapfs -s 131072 swapfs /mnt
wincen:vfs/miscfs/swapfs 3# df -i
Filesystem 1K-blocks Used Avail Capacity iused ifree %iused Mounted on
...
swapfs 65404 0 65404 0% 0 523232 0% /mnt
wincen:vfs/miscfs/swapfs 4# mount | grep swap
swapfs on /mnt type swapfs (local)
wincen:vfs/miscfs/swapfs 5# umount /mnt
umount: /mnt: not currently mounted
I've copied bits of kernfs_mount() in making my swapfs_mount() - the
important bits seem to be:
error = getnewvnode(VT_SWAPFS, ...);
MALLOC(swfsp, struct swapfs_mount *, sizeof(struct swapfs_mount),
M_MISCFSMNT, M_WAITOK);
rvp->v_type = VBLK;
rvp->v_flag |= VROOT;
rvp->v_data = swfsp;
swfsp->sm_root = rvp;
mp->mnt_flag |= MNT_LOCAL;
mp->mnt_data = (qaddr_t)swfsp;
vfs_getnewfsid(mp, MOUNT_SWAPFS);
Any suggestions on where to start looking for this one?
Simon.