Subject: Re: cluster waste?
To: NetBSD Users <netbsd-users@NetBSD.org>
From: Steven M. Bellovin <smb@research.att.com>
List: netbsd-users
Date: 03/18/2004 09:51:23
In message <20040318142443.GE11739@cs.uni-bonn.de>, Ignatios Souvatzis writes:
>
>--PGNNI9BzQDUtgA2J
>Content-Type: text/plain; charset=us-ascii
>Content-Disposition: inline
>
>Hi,
>
>On Thu, Mar 18, 2004 at 03:12:07PM +0100, Ignatios Souvatzis wrote:
>
>> Different problem, I think. As far as I recall, the "find block" algorithm
>> uses a hash function, and thus gets ineffective if the table is too full.
>
>Found the reference. From /usr/src/share/doc/smm/05.fastfs:
>
>++++++++++++++++++++++++++++++++++++++++
> If a requested block is not available, the local allo-
>cator uses a four level allocation strategy:
>
>1) Use the next available block rotationally closest to
> the requested block on the same cylinder. It is
> assumed here that head switching time is zero. On disk
> controllers where this is not the case, it may be pos-
> sible to incorporate the time required to switch
> between disk platters when constructing the rotational
> layout tables. This, however, has not yet been tried.
>
>2) If there are no blocks available on the same cylinder,
> use a block within the same cylinder group.
>
>
>3) If that cylinder group is entirely full, quadratically
> hash the cylinder group number to choose another cylin-
> der group to look for a free block.
>
>4) Finally if the hash fails, apply an exhaustive search
> to all cylinder groups.
>
> Quadratic hash is used because of its speed in finding
>unused slots in nearly full hash tables [Knuth75]. File
>systems that are parameterized to maintain at least 10% free
>space rarely use this strategy. File systems that are run
>without maintaining any free space typically have so few
>free blocks that almost any allocation is random; the most
>important characteristic of the strategy used under such
>conditions is that the strategy be fast.
This is why I'm skeptical. The first two, and part of the third, are
geared towards finding a "close" block. The rest of the third is, I
believe, optimized towards not reading too many per-cylinder group
free-lists, since that implies disk I/O. But with today's memory sizes
and buffer cache algorithms, I suspect that little or no I/O is
necessary. It might be an interesting experiment to build a file
system consisting of a single cylinder group and measure its
performance. (No, I'm not volunteering; I have neither the machines
nor the disks nor the tools.....)
--Steve Bellovin, http://www.research.att.com/~smb