Subject: namei caching of newly created files?
To: None <tech-perform@NetBSD.org>
From: Havard Eidnes <he@uninett.no>
List: tech-kern
Date: 01/19/2005 22:53:49

Hi,

I've been doing some simple testing using postmark using various
hardware configurations.  The postmark test is always run with "set
number 20000" and "set transactions 50000".  This will create 20000
files in a single directory, and perform various transactions (read/
append and create/delete) on those files before removing them all.

My latest test was with a large FFSv2 file system using soft
dependencies.  On identical hardware running a recentish Linux with
ext3 and (supposedly) its btree directory data structure, we are
being trounced: where Linux gets 3000 transactions/s and 15-20 MB/s
read/write, we get at most around 800 t/s and 2-3 MB/s read/write.

The system time on NetBSD in this case turns out to be really high,
at times around 99%.  Running kernel profiling reveals that
most of the CPU time is spent in ufs_lookup (subset of gprof output
follows below).  What is also evident from watching "systat vm" output
is that the namei cache gets a quite low hit rate during this test --
30-40% at the most.

By inspecting the value of "numcache" (from vfs_cache.c) using gdb
during the test, it becomes clear that newly created directory entries
are not being put in the namei() cache: the value only starts
increasing after the "Creating files..." phase is done.

It would probably help in this particular test if the namei() cache
was primed with newly created directory entries.  That probably means
that the namei() neighborhood needs to grow a new interface, and the
"create" caller(s) need to be instructed to use it.  Intuitively it
would probably be beneficial in the general case as well -- when you
create a file you usually want to do other operations on it as well.

The problem that ufs_lookup() uses lots of CPU time in the kernel
(when it is invoked) is however probably more of an architectural
issue related to the on-disk directory data structure (simple linear),
and this is probably not easily fixed without introducing another
on-disk directory data structure.  The suggested fix would hopefully
reduce the number of times it's invoked, though.

Comments?  Anyone I can goad into making a patch along these lines?

Regards,

- Håvard

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 61.89     57.40    57.40   185418     0.31     0.31  ufs_lookup
 30.88     86.04    28.64                             mpidle
  0.75     86.74     0.70    45093     0.02     0.02  newdirrem
  0.52     87.22     0.48    53074     0.01     0.01  scanc
...
-----------------------------------------------
                0.00    0.00       2/185307      elf32_load_file [289]
                0.00    0.00       2/185307      sys_readlink [290]
                0.00    0.00       3/185307      check_exec [275]
                0.00    0.01      42/185307      sys___stat13 [206]
                0.00   14.18   45093/185307      sys_unlink [10]
                0.00   14.18   45093/185307      sys___lstat13 [11]
                0.00   29.90   95072/185307      vn_open [8]
[3]     62.9    0.00   58.29  185307         namei [3]
                0.03   58.21  185312/185312      lookup [4]
                0.02    0.00  185307/186773      pool_cache_get_paddr [176]
                0.02    0.00  140212/186774      pool_cache_put_paddr [179]
                0.01    0.00  185305/185348      copyinstr [226]
                0.00    0.00       5/3502301     vput <cycle 1> [148]
                0.00    0.00       5/3502301     vrele <cycle 1> [218]
                0.00    0.00       5/5           VOP_READLINK [431]
                0.00    0.00       2/2           copystr [888]
-----------------------------------------------
...
               57.40    0.72  185418/185418      VOP_LOOKUP [5]
[6]     62.7   57.40    0.72  185418         ufs_lookup [6]
                0.02    0.29  462794/552351      VOP_BLKATOFF [43]
                0.10    0.05  185418/185418      cache_lookup [67]
                0.00    0.14   90187/166258      ffs_vget [50]
                0.02    0.03  275604/325586      VOP_ACCESS [112]
                0.00    0.03   45096/45096       cache_enter [152]
                0.03    0.00  462794/1006051     brelse [108]
                0.00    0.02  135911/135917      __udivdi3 [192]
                0.00    0.00   32315/723173      VOP_UNLOCK [96]
                0.00    0.00       2/18184       Xspllower [69]
...

