tech-kern: RE: namei caching of newly created files?

Subject: RE: namei caching of newly created files?
To: Havard Eidnes <tech-perform@NetBSD.org>
From: Gordon Waidhofer <gww@traakan.com>
List: tech-kern
Date: 01/19/2005 16:47:15
You're comparing apples to oranges here.

If you mount ext3 with -o sync,data=journal, you'll find FFS faster.
FFS mounted with -o sync slows down a bit, but not as much as ext3.
It's because ext3 is much more aggressive about write-back than FFS.
ReiserFS really takes the cake about holding things in memory. It'll
blow the doors off ext3 on small loads, but crumbles under heavy loads.
XFS has better all around performance, though I have made it hang.
Don't bother with JFS.

Now, it ain't fair to knock ext3's aggressive write-back without
first defining the finish line. NFS stable storage rules are
considerably more rigid than all these file systems' default mount modes.
Indeed, I'm not even sure FFS mounted -o sync meets the SPECsfs
NFS stable storage rules. XFS probably does. ReiserFS 3 does not and
can't be coerced into anything close. ext3 *might*, but its performance
in stable storage mode is almost unusable. NFS stable storage
rules are the minimum for enterprise-class storage service.

IMO, once NFS rules are conceded, you might as well go hog wild
with write-back caching. Why not? Loss is loss.

FWIW, I don't use the namei cache at all with my file system,
which uses hashed directories, and it outperforms FFS on large
directories.
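To illustrate why hashed directories pay off on large directories, here is a minimal toy model in Python. It is an assumption-laden sketch, not the FFS on-disk format and not gww's file system: the class names, the bucket count, and the use of Python's built-in `hash()` are all made up for illustration. It simply counts how many directory entries each lookup has to examine.

```python
from collections import defaultdict


class LinearDir:
    """Toy linear directory: entries kept in creation order and scanned
    front to back on every lookup."""

    def __init__(self):
        self.entries = []  # list of (name, inode) pairs

    def add(self, name, ino):
        self.entries.append((name, ino))

    def lookup(self, name):
        """Return (inode or None, number of entries examined)."""
        for probes, (n, ino) in enumerate(self.entries, start=1):
            if n == name:
                return ino, probes
        return None, len(self.entries)


class HashedDir:
    """Toy hashed directory: entries bucketed by a hash of the name, so a
    lookup only walks one bucket's chain."""

    def __init__(self, nbuckets=1024):  # bucket count is an arbitrary assumption
        self.nbuckets = nbuckets
        self.buckets = defaultdict(list)

    def add(self, name, ino):
        self.buckets[hash(name) % self.nbuckets].append((name, ino))

    def lookup(self, name):
        """Return (inode or None, number of entries examined)."""
        chain = self.buckets.get(hash(name) % self.nbuckets, [])
        for probes, (n, ino) in enumerate(chain, start=1):
            if n == name:
                return ino, probes
        return None, len(chain)


def total_probes(d, names):
    """Sum of entries examined over a series of lookups."""
    return sum(d.lookup(n)[1] for n in names)


names = [f"file{i}" for i in range(20000)]  # same file count as the postmark run
lin, hsh = LinearDir(), HashedDir()
for ino, name in enumerate(names, start=1):
    lin.add(name, ino)
    hsh.add(name, ino)

sample = names[::100]  # look up every 100th name to keep the linear case quick
print("linear:", total_probes(lin, sample))  # linear: 1990200
print("hashed:", total_probes(hsh, sample))
```

The linear cost grows with the position of the name in the directory, so over a whole 20000-entry directory it grows quadratically, while the hashed cost stays near the average chain length (about 20 here).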

Regards,
	-gww

> -----Original Message-----
> From: tech-kern-owner@NetBSD.org [mailto:tech-kern-owner@NetBSD.org]On
> Behalf Of Havard Eidnes
> Sent: Wednesday, January 19, 2005 1:54 PM
> To: tech-perform@NetBSD.org
> Cc: tech-kern@NetBSD.org
> Subject: namei caching of newly created files?
>
>
> Hi,
>
> I've been doing some simple testing with postmark on various
> hardware configurations.  The postmark test is always run with "set
> number 20000" and "set transactions 50000".  This will create 20000
> files in a single directory, and perform various transactions (read/
> append and create/delete) on those files before removing them all.
>
> My latest test was with a large FFSv2 file system using soft
> dependencies, and on identical hardware running recentish Linux, ext3
> with (supposedly) btree directory data structure, we are being
> trounced -- where Linux gets 3000 transactions/s and 15-20 MB/s
> read/write, we get at most around 800 t/s and 2-3 MB/s read/write.
>
> The system time on NetBSD in this case turns out to be really high;
> some of the time around 99%.  Running kernel profiling reveals that
> most of the CPU time is spent in ufs_lookup (subset of gprof output
> follows below).  What is also evident from watching "systat vm" output
> is that the namei cache gets a quite low hit rate during this test --
> 30-40% at the most.
>
> By inspecting the value of "numcache" (from vfs_cache.c) using gdb
> during the test, it becomes clear that newly created directory entries
> are not being put on the namei() cache, because the value only starts
> increasing after the "Creating files..." phase is done.
>
> It would probably help in this particular test if the namei() cache
> was primed with newly created directory entries.  That probably means
> that the namei() neighborhood needs to grow a new interface, and the
> "create" caller(s) need to be instructed to use it.  Intuitively it
> would probably be beneficial in the general case as well -- when you
> create a file you usually want to do other operations on it as well.
>
> The problem that ufs_lookup() uses lots of CPU time in the kernel
> (when it is invoked) is however probably more of an architectural
> issue related to the on-disk directory data structure (simple linear),
> and this is probably not easily fixed without introducing another
> on-disk directory data structure.  The suggested fix would hopefully
> reduce the number of times it's invoked, though.
>
> Comments?  Anyone I can goad into making a patch along these lines?
>
> Regards,
>
> - Håvard
>
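The create-time priming proposed in the quoted message can be sketched with a toy Python model. Everything here is hypothetical: `ToyNameCache`, `ToyDir`, `namei`, `create`, and the `prime` flag are illustrative names, not NetBSD's vfs_cache.c interface, and creation skips the real lookup-before-create step. The value `cached_after_create` stands in for the "numcache" counter observed with gdb.

```python
from collections import OrderedDict


class ToyNameCache:
    """Hypothetical LRU name cache keyed by (directory id, component name)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = self.misses = 0

    def lookup(self, dirid, name):
        key = (dirid, name)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        return None

    def enter(self, dirid, name, ino):
        key = (dirid, name)
        self.entries[key] = ino
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


class ToyDir:
    """Directory whose slow linear scan stands in for ufs_lookup()."""

    def __init__(self, dirid):
        self.dirid = dirid
        self.entries = []
        self.scans = 0  # how many times the slow path ran

    def scan(self, name):
        self.scans += 1
        for n, ino in self.entries:
            if n == name:
                return ino
        return None


def namei(cache, d, name):
    """Try the cache first; on a miss, scan the directory and cache the result."""
    ino = cache.lookup(d.dirid, name)
    if ino is None:
        ino = d.scan(name)
        if ino is not None:
            cache.enter(d.dirid, name, ino)
    return ino


def create(cache, d, name, ino, prime):
    """Add a directory entry; with prime=True also enter it in the cache
    (the proposed new behaviour)."""
    d.entries.append((name, ino))
    if prime:
        cache.enter(d.dirid, name, ino)


def run(n, prime, capacity=100_000):
    """Create n files, then look each one up once.
    Returns (entries cached after the create phase, cache hits, slow scans)."""
    cache, d = ToyNameCache(capacity), ToyDir(dirid=1)
    for i in range(n):
        create(cache, d, f"file{i}", i + 1, prime)
    cached_after_create = len(cache.entries)  # analogue of numcache
    for i in range(n):
        namei(cache, d, f"file{i}")
    return cached_after_create, cache.hits, d.scans


print(run(2000, prime=False))  # (0, 0, 2000)
print(run(2000, prime=True))   # (2000, 2000, 0)
```

In this toy, priming only helps if the cache can hold the working set: with `capacity=100`, the sequential lookups evict the primed entries before they are reused, and every lookup still falls through to the slow scan.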