Subject: Re: LFS v2 layout
To: Jesse Off <joff@gci-net.com>
From: Konrad Schroder <perseant@hhhh.org>
List: tech-kern
Date: 12/07/2000 23:07:17
On Thu, 7 Dec 2000, Jesse Off wrote:

> True, though we already address most every other kind of block on LFS
> with 512-byte disk blocks.  Assuming 32-bit addresses for 512-byte disk
> blocks, the theoretical maximum filesystem size would be 2047 GB.  

Hm, 2T is not an outlandish size for a disk anymore, sigh.  Of course, we
already can't create >=2T filesystems with 512-byte-sectored devices
because the disk drivers are addressed in 32-bit quantities; but it would
be nice if we didn't have to change the fs format to support them.

I've thought about having yet another parameter, the disk address unit
size, which might or might not be the fragment size: in v1 filesystems
the dausize would be 512, distinct from the default fsize of 1024.
(Since the two have been different in v1, the dausize would have to be a
separate parameter internally, for the compatibility code, even if it
doesn't get exposed through the superblock.)  But as you pointed out,
there isn't really any point in having fragments larger than the
dausize...so yes, converting to fragment-sized units looks like a good
idea.

The reason I originally suggested parametrizable inode blocks was that,
unlike in other filesystems, how full LFS inode blocks are depends on
how many files are generally written at a time, which is
application-dependent.  If you've got an application that writes to a
single file all the time, you're going to have 1 or 2 inodes in every
inode block, which wastes more space than it needs to; if you have many
parallel writes you'll be fine with larger blocks.  Making inode blocks
as small as possible looks good, but then you're taking up space in the
segment summary whenever you have to write more than one inode block:
significantly less wasteful of space, but if you run out of room in the
summary, it means writing another partial segment, which means losing
half a rotation's worth of time.

(I don't think we can just have a count of the inode blocks, though that
would be the best solution, because the number of inode blocks isn't known
before we start gathering blocks.  The problem is that the loop in
lfs_writevnodes doesn't know how much space each inode will take, so it
doesn't know how many inodes it can write until it's already gathered
their blocks.  It could make two passes, but then it'd need to either
gather twice for every block, or mark blocks "half-gathered" in case more
were being written while it was working.  It might work if the inode
blocks were placed after...I need to think about this some more.)

> don't think users would appreciate knowing they might have to rebuild
> their LFS's for a 3rd revision of LFS sometime in the future. :-)  
> The "Get it Right The First Time" NetBSD slogan, right?

True ... but we will have to go to v3 when there is a ufs inode format
that has 64-bit time quantities on it, so there's time yet.  :^)

						Konrad Schroder
						perseant@hhhh.org