Subject: Re: File system > 1 Terabyte
To: David Laight <david@l8s.co.uk>
From: Greg A. Woods <woods@planix.com>
List: current-users
Date: 10/02/2004 19:12:59
[ On Saturday, October 2, 2004 at 10:25:39 (+0100), David Laight wrote: ]
> Subject: Re: File system > 1 Terabyte
>
> > I'll probably need a few more inodes, esp. for a mail partition....  :-)
> ...
> > BTW, what was that trick to avoiding so much "wastage" again?  Even
> > fewer cylinder groups (and more inodes/g)?
> 
> Almost all the 'wastage' is space for inodes.

Yeah, I guess I knew that....  :-)

> You can halve it by using FFSv1 (not FFSv2).

Those numbers were from an FFSv1 example.  :-(

And as I said I will probably need still more inodes, not fewer....
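
As a rough sketch of why more inodes means more "wastage": the estimate
below assumes on-disk inode sizes of 128 bytes for FFSv1 and 256 bytes
for FFSv2, and a purely illustrative density of one inode per 8 KiB of
filesystem space.  The helper name is hypothetical, and a real newfs
rounds the inode count per cylinder group, so treat the figures as
approximate.

```python
# Back-of-the-envelope estimate of the space reserved for inodes.
FFSV1_INODE_BYTES = 128  # assumed on-disk inode size, FFSv1
FFSV2_INODE_BYTES = 256  # assumed on-disk inode size, FFSv2


def inode_overhead_bytes(fs_bytes, bytes_per_inode, inode_bytes):
    """Bytes consumed by the inode table (hypothetical helper).

    bytes_per_inode is the density: one inode is allocated for every
    bytes_per_inode bytes of filesystem space.
    """
    n_inodes = fs_bytes // bytes_per_inode
    return n_inodes * inode_bytes


TIB = 2**40
for name, size in (("FFSv1", FFSV1_INODE_BYTES), ("FFSv2", FFSV2_INODE_BYTES)):
    used = inode_overhead_bytes(TIB, 8192, size)
    print(f"{name}: {used / 2**30:.0f} GiB of inodes per TiB")
```

Halving bytes_per_inode (i.e. asking for twice as many inodes) doubles
the overhead, which is the trade-off for a mail partition.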


> Also I can't help feeling that the very large number of 'cylinder groups'
> isn't actually a very good idea...

That could well be true, though perhaps you're thinking of even more
reasons than I've come up with.

For example, since the "disk" I'm dealing with is a well cached hardware
RAID system there's not as much need to spread the inode table across
the "spindle".  The whole idea of cylinder groups seems as though it
could only ever apply well to individual spindles with really well known
geometries and seek/rotational properties.  Indeed, papers I've read
about techniques for dynamically rediscovering cylinder boundaries on
disks with variable-sized tracks, and for making smart use of that
discovered geometry, would seem to agree, though only tangentially.

I suspect that an FFS filesystem which "experiences" mostly files of
similar size ranges over its lifetime won't suffer (i.e. degrade in
performance due to fragmentation issues) too much if the number of
cylinder groups is greatly reduced.  Perhaps it depends on exactly how
many inodes are typically active on the filesystem.  However, the
clustering done to reduce I/Os may throw my rather naive assumptions for
a loop.


> FFSv1 limits you to 2^32 (maybe 2^31) fragments - ie >2^32 sectors.

Then are all the signed/unsigned problems just in /sbin/disklabel (and
maybe in whatever other code deals directly with reading, writing and
interpreting disk labels), or is it also throughout FFSv1?

Jason said "negative block numbers" were used internally, but did he
really mean "negative sector numbers"?  I.e. did he mean logical
filesystem block numbers, or physical disk block/sector numbers?  The
latter would explain the 1.0TB limitation I'm seeing, but the former
would allow the 2.0TB limit he claimed provided the signed int problems
in /sbin/disklabel were fixed.
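
The arithmetic behind those two limits can be sketched as follows,
assuming 512-byte sectors and 32-bit counters.  The function name and
the 2 KiB fragment size in the last line are illustrative assumptions,
not anything taken from the FFS or disklabel sources.

```python
# Largest size addressable when a unit count is held in a fixed-width
# integer; a signed integer loses one bit to the sign.
SECTOR_BYTES = 512  # assumed sector size


def max_addressable_bytes(bits, signed, unit_bytes=SECTOR_BYTES):
    """Return the byte limit for a `bits`-wide count of `unit_bytes` units."""
    usable_bits = bits - 1 if signed else bits
    return (2**usable_bits) * unit_bytes


TIB = 2**40
print(max_addressable_bytes(32, signed=True) / TIB)    # signed sector numbers
print(max_addressable_bytes(32, signed=False) / TIB)   # unsigned sector numbers
print(max_addressable_bytes(32, signed=True, unit_bytes=2048) / TIB)  # signed 2 KiB fragments
```

Under these assumptions a signed 32-bit sector number tops out at
1 TiB and an unsigned one at 2 TiB, while a limit counted in filesystem
fragments rather than sectors scales with the fragment size.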

-- 
						Greg A. Woods
						Planix, Inc.

<woods@planix.com>     +1 416 489-5852 x122     http://www.planix.com/