current-users: Re: SGI will freely license its XFS

Subject: Re: SGI will freely license its XFS
To: Bill Studenmund <wrstuden@nas.nasa.gov>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: current-users
Date: 05/24/1999 17:38:57

> Someone else in another discussion of LFS described the point I think Oleg
> was referring to. That a highly fragmented file will read very slowly on
> an LFS (as the data are in multiple segments). That doesn't mean reading
> is slow, just a highly-fragmented file will read slow.
> 
> Likewise FFS will WRITE slowly to such a file. So it's just a difference
> in where the penalty is paid. Also, with LFS, if the cleaner runs, you
> should get back a quickly-read file.

That would be true if we had a file-coalescing cleaner, but we don't.  You
need to understand that LFS doesn't provide -- at least not when data
is first written -- "physical" locality of reference: data which is in the
same file or same directory is not necessarily gathered close together on
the disk.  Instead, it provides temporal locality of reference: data which
was written close together in time is close together on the disk; it will
often be *read* close together in time, so this is an effective optimization.
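
To make the distinction concrete, here is a toy sketch in Python.  It is
not NetBSD code: the "disk" is just a list, and the names lfs_layout,
ffs_like_layout and seek_span are invented for illustration.  It ignores
overwrites, segments and the cleaner entirely; it only shows which reads
end up contiguous under each placement policy.

```python
from collections import defaultdict

def lfs_layout(writes):
    """Log-structured placement: blocks land on disk in arrival order."""
    return list(writes)

def ffs_like_layout(writes):
    """Crude stand-in for FFS-style allocation: group blocks by file."""
    by_file = defaultdict(list)
    for name, blk in writes:
        by_file[name].append((name, blk))
    return [entry for name in sorted(by_file) for entry in by_file[name]]

def seek_span(layout, wanted):
    """Number of disk positions spanned by the wanted blocks."""
    positions = [i for i, entry in enumerate(layout) if entry in wanted]
    return max(positions) - min(positions) + 1

# Three files written in two interleaved bursts.
writes = [("a", 0), ("b", 0), ("c", 0), ("a", 1), ("b", 1), ("c", 1)]
file_a = {("a", 0), ("a", 1)}                 # read one whole file
first_burst = {("a", 0), ("b", 0), ("c", 0)}  # read what was written together

for name, layout in (("lfs", lfs_layout(writes)),
                     ("ffs-like", ffs_like_layout(writes))):
    print(name, seek_span(layout, file_a), seek_span(layout, first_burst))
```

In this model the log layout keeps the first burst contiguous but spreads
file "a" across the log, while the grouped layout does the reverse.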

There are workloads which are pathological for each type of filesystem.  I
can make a NetApp fall over pretty easily with a workload which always
reads data together which was not written together (e.g. a huge POP
mailserver) and I can make FFS do the same by reading data scattered
across the filesystem with little regard to filesystem structure, for
example with a gigantic parallel 'make'.  Caching is the only solution to
either problem.

One problem with our LFS is that the cleaner moves data around (as it has 
to), thereby destroying temporal locality of reference.  After data's been 
moved enough by cleaning, it's no longer anywhere near data which was
written at the same time.  The original Sprite solution to this was a
cleaner which implemented an FFS-like allocation policy, coalescing data
which is in the same file.  The paper I posted a reference to recently
describes a better method, in which the cleaner actually tries to group
data together by *observed* access patterns.  This is the best of both
worlds.
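
A rough sketch of the two cleaner policies, again as a toy in Python.
Neither function is the Sprite cleaner or the algorithm from the paper:
clean_by_file and clean_by_access are hypothetical names, and "order by
first observed read" is only an assumed stand-in for whatever grouping
heuristic a real access-pattern-aware cleaner would use.

```python
def clean_by_file(live_blocks):
    """Rewrite live blocks grouped by file, then by block number
    (the FFS-like, file-coalescing policy)."""
    return sorted(live_blocks)

def clean_by_access(live_blocks, trace):
    """Rewrite live blocks in the order they were first observed being read.
    Blocks that never appear in the trace keep their relative order and
    are placed at the end (sorted() is stable)."""
    first_seen = {}
    for pos, blk in enumerate(trace):
        first_seen.setdefault(blk, pos)
    never = len(trace)
    return sorted(live_blocks, key=lambda b: first_seen.get(b, never))

# Live blocks found in the segments being cleaned, in on-disk order.
live = [("mbox", 1), ("index", 0), ("mbox", 0), ("log", 0)]
# Observed reads: the index and the tail of the mailbox are read together.
trace = [("index", 0), ("mbox", 1), ("index", 0)]

print(clean_by_file(live))
print(clean_by_access(live, trace))
```

The file-coalescing policy puts both "mbox" blocks next to each other; the
access-ordered policy instead puts ("index", 0) next to ("mbox", 1),
because that is how they were actually read.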