tech-kern: Re: `Large Inodes'

Subject: Re: `Large Inodes'
To: Wolfgang Solfrank <ws@tools.de>
From: Bill Studenmund <wrstuden@nas.nasa.gov>
List: tech-kern
Date: 03/26/1999 11:34:02
On Fri, 26 Mar 1999, Wolfgang Solfrank wrote:

> > 2) In a separate stream off of the file, a la MacOS resource forks or
> > NTFS's ability to store resource forks. My concern with our application
> > using this is that the cleanest way I've thought of to implement it in ufs is
> > to make one file's inode refer to another inode, which would hold that
> > fork's data. On the current systems we have, we already have to crank up
> > the number of inodes in the fs just to hold what we have. We have fs's
> > with about 10 million inodes on them. If we have to have two inodes per
> > file, it's even worse.
> 
> In what sense is that worse than having to double the space per inode?
> I don't get it.

In that you then have to worry (well, you don't HAVE to, but should :-)
about this second inode not being near your first one. By doubling the
inode, this extra storage is close by. Also, our data wasn't going to be
the only client for this extra space (there was going to be space outside
the 96-byte ufs-opaque data which could cover ACLs, etc., when
implemented) - thus the doubling might be wanted for other reasons too.
So if we forced in a tie to another inode and also wanted ACLs, we'd need
two fat inodes.
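
To make the "doubled inode" layout concrete, here is a minimal sketch. It
assumes the standard on-disk inode is 128 bytes and that the 128 extra
bytes split into the 96-byte opaque area plus 32 bytes for other clients
such as ACLs. The struct and field names are made up for illustration;
they are not anything in ufs today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STD_DINODE_SIZE  128  /* assumed size of the existing on-disk inode */
#define FAT_OPAQUE_SIZE   96  /* ufs-opaque data, e.g. for an overlay fs */
#define FAT_SPARE_SIZE    32  /* remaining extra space, e.g. for ACLs */

/* Hypothetical doubled inode: the extra storage sits right behind the
 * ordinary inode, so reading one brings in the other. */
struct fat_dinode {
	uint8_t fd_std[STD_DINODE_SIZE];    /* ordinary inode contents */
	uint8_t fd_opaque[FAT_OPAQUE_SIZE]; /* not interpreted by ufs */
	uint8_t fd_spare[FAT_SPARE_SIZE];   /* space for other clients */
};

/* The whole thing should be exactly twice the assumed standard inode. */
_Static_assert(sizeof(struct fat_dinode) == 2 * STD_DINODE_SIZE,
    "fat inode must be double the standard inode size");
```

With this layout the opaque data starts at byte 128 of the same on-disk
record, so there is no second inode to go looking for.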

> > (we could build the other fork reading into ffs, but then I don't think
> > it'd still be ffs :-)
> 
> No, last I looked, ffs didn't have forks :-).

:-) There seems to be some interest in adding them, but that's another
discussion (and in that discussion I'd vote for tying inodes together).

> > 3) In the front of the file, say as a 256 byte header. This solution has
> > three problems. First, for the migration problem, we want to totally zap
> > files off of disk. If we store our header in the file, we'll have to keep
> > at least one frag around. That seems wasteful. Second, we have to keep
> > accesses from reading/writing that area (not hard, but an extra step).
> > Third, restoring would be tricky. dump reads the disk info, so would see
> > our data. But restore writes, so it couldn't store it.
> 
> Depends on whether you backup/restore the underlying ffs using dump/restore
> (in which case you would of course get the header on tape and be able to
> restore it to disk) or using dump/restore on your overlay filesystem
> (using unmodified dump/restore of one filesystem to backup another
> filesystem seems quite silly to me).

Don't forget that there's an asymmetry in dump & restore. dump wants access
to the unmounted device - it reads the raw, on-disk image. restore
accesses the live filesystem. Since our overlay layer doesn't have a
filestore, you can't dump it. Without hacking restore, a restore would
need to take the overlay fs off-line. Not a happy thought. :-)

Also, the dumps of the fs aren't dumps of the file, they are dumps of the
file plus junk. With the other solutions, you could take the dump to
another system, restore, and have all the data that was on disk at the
time of the dump, ready to read. You wouldn't have the ability to read in
migrated files, but you'd have what was on disk in a readily readable
format.
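
For reference, the "extra step" that option 3 needs in the overlay is
roughly the offset translation below. This is only a sketch under the
256-byte header assumption from the proposal; the macro and function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DMFS_HDR_SIZE 256  /* assumed header size at the front of the file */

/* Map an offset as seen by the user to an offset in the underlying file,
 * skipping over the header. */
static inline int64_t
user_to_lower(int64_t uoff)
{
	assert(uoff >= 0);
	return uoff + DMFS_HDR_SIZE;
}

/* Map the underlying file's size to the size the user should see.
 * A file shorter than the header is reported as empty. */
static inline int64_t
lower_to_user_size(int64_t lsize)
{
	return lsize > DMFS_HDR_SIZE ? lsize - DMFS_HDR_SIZE : 0;
}
```

Anything that reads the underlying file directly, as dump does, skips this
translation and so sees the header as part of the file's data.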

> > 4) In another file system. This idea, while possibly quite sensible for
> > other forms of overlay VFS, gives me great pause here. First off, it is
> > even more overhead space, and also it means TWO things have to be backed
> > up at once. That's an added hassle for the machine room, and strikes me as
> > an added point of failure.
> 
> Why would this be "more overhead space"?  For starters, you can put a file
> on the same disk, sized at just 96 bytes or whatever you need (I didn't follow
> the discussion too closely, so I'm not sure whether you are using the 96
> byte part, or the 32 byte part of the 128 extra bytes per inode) times the
> number of inodes on the disk, and index with the inode number into that
> file to get at the relevant information. Would probably use less space
> than doubling the inode size, and solve the backup problem, too (assuming
> you backup the underlying ffs filesystem), and has the added advantage
> that it could possibly be used on some other filesystems than ffs, too
> (depending however on the way they compute their inode number equivalent).

Bill Sommerfield suggested this, and it's very interesting. My one concern
is that we'd have to hack restore (which possibly isn't a big deal),
because a restore to a live overlay system would need to attach this
metadata to the new files, not the old ones.

That is to say such a storage system is keyed to inode number, which is
a universal attribute of our VFS system (check out the vfs_vget call which
takes a mount point and an ino_t, and returns a vnode pointer). (So any fs
on NetBSD would work fine with it.) But the restored file doesn't
necessarily have the same inode number as the dumped file. So dmfsrestore
would have to read this special file, and for each file, look up what info
it had, and then store that info in the new file.
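
A rough sketch of what that would involve, assuming fixed 96-byte records
indexed directly by inode number. The record size, the names, and the idea
that the dumped side file and the live one are both open as plain file
descriptors are all assumptions for illustration, not existing code.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define DMFS_META_SIZE 96  /* assumed per-inode record size */

/* Byte offset of the record for inode number `ino` in the side file. */
static off_t
meta_offset(uint64_t ino)
{
	return (off_t)ino * DMFS_META_SIZE;
}

/*
 * Copy the record a file had under its dumped inode number into the slot
 * for the inode number it was given on restore.
 * Returns 0 on success, -1 on a failed or short read or write.
 */
static int
meta_remap(int old_fd, uint64_t old_ino, int new_fd, uint64_t new_ino)
{
	unsigned char rec[DMFS_META_SIZE];
	ssize_t n;

	n = pread(old_fd, rec, sizeof(rec), meta_offset(old_ino));
	if (n != (ssize_t)sizeof(rec))
		return -1;
	n = pwrite(new_fd, rec, sizeof(rec), meta_offset(new_ino));
	if (n != (ssize_t)sizeof(rec))
		return -1;
	return 0;
}
```

The lookup itself is trivial; the work in a dmfsrestore would be tracking
the old-to-new inode number pairing for each file as it is restored.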

Take care,

Bill