netbsd-users: Re: cluster waste?

Subject: Re: cluster waste?
To: NetBSD Users <netbsd-users@NetBSD.org>
From: Steven M. Bellovin <smb@research.att.com>
List: netbsd-users
Date: 03/18/2004 09:51:23
In message <20040318142443.GE11739@cs.uni-bonn.de>, Ignatios Souvatzis writes:
>
>--PGNNI9BzQDUtgA2J
>Content-Type: text/plain; charset=us-ascii
>Content-Disposition: inline
>
>Hi,
>
>On Thu, Mar 18, 2004 at 03:12:07PM +0100, Ignatios Souvatzis wrote:
>
>> Different problem, I think. As far as I recall, the "find block" algorithm
>> uses a hash function, and thus gets ineffective if the table is too full.
>
>Found the reference. From /usr/src/share/doc/smm/05.fastfs:
>
>++++++++++++++++++++++++++++++++++++++++
>     If  a requested block is not available, the local allo-
>cator uses a four level allocation strategy:
>
>1)   Use the next available block  rotationally  closest  to
>     the  requested  block  on  the  same  cylinder.   It is
>     assumed here that head switching time is zero.  On disk
>     controllers  where this is not the case, it may be pos-
>     sible  to  incorporate  the  time  required  to  switch
>     between  disk platters when constructing the rotational
>     layout tables.  This, however, has not yet been  tried.
>
>2)   If  there are no blocks available on the same cylinder,
>     use a block within the same cylinder group.
>
>
>3)   If that cylinder group is entirely full,  quadratically
>     hash the cylinder group number to choose another cylin-
>     der group to look for a free block.
>
>4)   Finally if the hash fails, apply an  exhaustive  search
>     to all cylinder groups.
>
>  Quadratic  hash is used because of its speed in finding
>unused slots in nearly full  hash  tables  [Knuth75].   File
>systems that are parameterized to maintain at least 10% free
>space rarely use this strategy.  File systems that  are  run
>without  maintaining  any  free  space typically have so few
>free blocks that almost any allocation is random;  the  most
>important  characteristic  of  the  strategy used under such
>conditions is that the strategy be fast.

This is why I'm skeptical.  The first two, and part of the third, are 
geared towards finding a "close" block.  The rest of the third is, I 
believe, optimized towards not reading too many per-cylinder group 
free-lists, since that implies disk I/O.  But with today's memory sizes 
and buffer cache algorithms, I suspect that little or no I/O is 
necessary.  It might be an interesting experiment to build a file 
system consisting of a single cylinder group and measure its 
performance.  (No, I'm not volunteering; I have neither the machines 
nor the disks nor the tools.....)


		--Steve Bellovin, http://www.research.att.com/~smb