Subject: Large inodes for ffs
To: None <tech-kern@netbsd.org>
From: Bill Studenmund <wrstuden@nas.nasa.gov>
List: tech-kern
Date: 03/23/1999 10:50:13
Jason, Albeaus Bayucan, and I have been working on adding large inode
support to ffs. We've had it working here for a few months, and would like
to get it into 1.4.

Jason did most of the ffs kernel work, Albeaus the userland, and I got
vnextops going.


The main idea is to grow the ffs inode to 256 bytes so that we can add new
features to the fs, such as access control lists (ACL's) and storage for
application-specific info in the inode (to let layered filesystems store
their info there). We also pick up enough room to handle being
Y2038-safe.

We do this by giving large inode ffs's a different magic number from
current ffs's. Everything which depends on the size of the on-disk inode
tests/remembers which inode type we have, and reacts accordingly.

We have also added a new system call, vnextops, which performs "extended
operations" on a vnode. This system call gives a general interface for
testing, reading, and changing the information saved in the extra space.
These "extended ops" are like ioctl's, except they manipulate the file
system rather than a device. They can be used to update ACL's (not yet
implemented) or to perform fs-specific operations. The vnextops() command
space is broken into two halves of 32k each. One half is for general
operations (like testing for the presence of application-specific opaque
data or updating ACL's), and the other half is for fs-specific use.
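
To make the split concrete, here is a minimal sketch of one way such a
command space could be encoded. It assumes "32k" means 32768 command
numbers per half within a 16-bit command word, with the top bit selecting
the fs-specific half. The VNX_* names and the bit assignment are
hypothetical and are not taken from the actual patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: 16-bit command, top bit selects the half. */
#define VNX_FS_SPECIFIC 0x8000u /* assumed: set for fs-specific commands */
#define VNX_CMD_MASK    0x7fffu /* 32768 command numbers per half */

/* Build a command number in the general half. */
static inline uint16_t vnx_general(uint16_t n)
{
    assert(n <= VNX_CMD_MASK);
    return n;
}

/* Build a command number in the fs-specific half. */
static inline uint16_t vnx_fs(uint16_t n)
{
    assert(n <= VNX_CMD_MASK);
    return (uint16_t)(VNX_FS_SPECIFIC | n);
}

/* Nonzero if the command belongs to the fs-specific half. */
static inline int vnx_is_fs_specific(uint16_t cmd)
{
    return (cmd & VNX_FS_SPECIFIC) != 0;
}

/* Index of the command within its half. */
static inline uint16_t vnx_index(uint16_t cmd)
{
    return (uint16_t)(cmd & VNX_CMD_MASK);
}
```

With an encoding like this, generic code can dispatch on a single bit
before handing fs-specific commands down to the file system.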


At present, we have:

Defined a large inode structure which contains: the traditional inode, 96
bytes of application-specific (layer fs, etc) data, a flags byte to
indicate what optional data is present, and 28 u_int32_t's. One of the
bits in the flags byte indicates the presence of application-specific data.
The other bits will indicate the presence of ACL's, etc., when defined.

Modified ffs to support two sizes of inodes. All tests to determine the
size of the inode check the inode type and return the correct value (128
or 256). We also handle endian-independence correctly.
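
As an illustration only, the size check described above might look
roughly like the following. FS_MAGIC is the traditional ffs magic
number; FS_LARGE_MAGIC is a made-up placeholder value, and the function
names are hypothetical rather than the ones used in the real change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FS_MAGIC       0x011954u /* traditional ffs magic number */
#define FS_LARGE_MAGIC 0x011955u /* hypothetical placeholder value */

/* On-disk inode size for a superblock magic, or 0 if unrecognized. */
static size_t ondisk_inode_size(uint32_t magic)
{
    switch (magic) {
    case FS_MAGIC:
        return 128;
    case FS_LARGE_MAGIC:
        return 256;
    default:
        return 0;
    }
}

/* Number of on-disk inodes that fit in one block of bsize bytes. */
static size_t inodes_per_block(uint32_t magic, size_t bsize)
{
    size_t isz = ondisk_inode_size(magic);

    /* Sanity check: a block should hold a whole number of inodes. */
    assert(isz == 0 || bsize % isz == 0);
    return isz ? bsize / isz : 0;
}
```

Routing every size-dependent calculation through one helper like this
keeps the 128-versus-256 decision in a single place.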

We added a vnextops() syscall, designed to manipulate this extra
information. At present the only defined extended ops are to test, get,
set, and clear application-specific opaque data.
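
The four operations can be sketched against a simplified in-memory
model. This is not the vnextops() interface itself: struct li_ext, the
li_* helpers, and the LI_HAS_APPDATA bit value are all hypothetical, and
only the 96-byte size and the presence-flag idea come from the
description above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define APP_DATA_LEN   96    /* size of the opaque area, per the post */
#define LI_HAS_APPDATA 0x01u /* hypothetical bit in the flags byte */

/* Simplified model of the extra inode space (not the on-disk layout). */
struct li_ext {
    uint8_t flags;
    uint8_t appdata[APP_DATA_LEN];
};

/* test: nonzero if application-specific data is present. */
static int li_test(const struct li_ext *e)
{
    return (e->flags & LI_HAS_APPDATA) != 0;
}

/* get: copy the opaque data out; returns -1 if none is present. */
static int li_get(const struct li_ext *e, uint8_t out[APP_DATA_LEN])
{
    if (!li_test(e))
        return -1;
    memcpy(out, e->appdata, APP_DATA_LEN);
    return 0;
}

/* set: store up to APP_DATA_LEN bytes, zero-padded, and mark present. */
static void li_set(struct li_ext *e, const void *src, size_t len)
{
    assert(len <= APP_DATA_LEN);
    memset(e->appdata, 0, APP_DATA_LEN);
    memcpy(e->appdata, src, len);
    e->flags |= LI_HAS_APPDATA;
}

/* clear: wipe the opaque data and drop the presence bit. */
static void li_clear(struct li_ext *e)
{
    memset(e->appdata, 0, APP_DATA_LEN);
    e->flags &= (uint8_t)~LI_HAS_APPDATA;
}
```

A layered file system would call something like li_set() once when it
first stamps a file and li_test() on later lookups to see whether its
data is already there.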

We have updated newfs, fsck, dump, restore, fsirand, and quotacheck to
deal with the different inode structure.

We have been running this changed code on both i386 and Alpha machines,
and it seems fine.

Thoughts?

Take care,

Bill