Subject: namei caching of newly created files?
To: None <tech-perform@NetBSD.org>
From: Havard Eidnes <he@uninett.no>
List: tech-kern
Date: 01/19/2005 22:53:49
Hi,
I've been doing some simple testing with postmark on various
hardware configurations. The postmark test is always run with "set
number 20000" and "set transactions 50000". This creates 20000 files
in a single directory, performs various transactions (read/append and
create/delete) on those files, and finally removes them all.
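
For reference, the benchmark is run interactively along these lines
(everything else left at the defaults, with "set location" pointing
at the file system under test):

    pm>set number 20000
    pm>set transactions 50000
    pm>run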
My latest test was on a large FFSv2 file system with soft
dependencies. On identical hardware running a recentish Linux, with
ext3 and (supposedly) a btree directory data structure, we are being
trounced -- where Linux gets 3000 transactions/s and 15-20 MB/s
read/write, we get at most around 800 t/s and 2-3 MB/s read/write.
The system time on NetBSD in this case turns out to be really high,
at times around 99%. Running kernel profiling reveals that most of
the CPU time is spent in ufs_lookup (a subset of the gprof output
follows below). What is also evident from watching "systat vm"
output is that the namei cache gets quite a low hit rate during this
test -- 30-40% at the most.
By inspecting the value of "numcache" (from vfs_cache.c) with gdb
during the test, it becomes clear that newly created directory
entries are not being put in the namei() cache: the value only
starts increasing after the "Creating files..." phase is done.
It would probably help in this particular test if the namei() cache
were primed with newly created directory entries. That probably
means that the namei() neighborhood needs to grow a new interface,
and the "create" caller(s) need to be instructed to use it.
Intuitively it would probably be beneficial in the general case as
well -- when you create a file you usually want to do other
operations on it soon afterwards.
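
To make that a bit more concrete, here is a rough sketch of what I am
thinking of, assuming the existing cache_enter() from vfs_cache.c can
simply be called at the end of the UFS create path; the placement
(end of ufs_makeinode()) and the surrounding code are only meant as
illustration, and whether this should instead go through a new
namei()-level interface is exactly the question above:

	/* ... new inode allocated and directory entry written ... */

	/*
	 * Sketch: enter the freshly created name into the namei
	 * cache, the same way ufs_lookup() does after a successful
	 * lookup, so that the next reference to the file is a cache
	 * hit instead of another linear scan of the directory.
	 * Possibly this wants to be conditional on MAKEENTRY, or
	 * live in a common helper rather than in each file system.
	 */
	cache_enter(dvp, tvp, cnp);

	*vpp = tvp;
	return (0);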
The fact that ufs_lookup() uses a lot of CPU time when it is invoked
is, however, probably more of an architectural issue related to the
on-disk directory data structure (a simple linear list), and is
probably not easily fixed without introducing another on-disk
directory format. The suggested fix would hopefully reduce the
number of times ufs_lookup() has to do a full directory scan, though.
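
For a rough sense of scale: with 20000 entries in the directory, an
uncached lookup scans on the order of 10000 entries on average, so
the ~185000 ufs_lookup() calls in the profile below -- even after
discounting the 30-40% that are satisfied by the name cache -- amount
to something like 10^9 directory entry comparisons, which fits well
with ufs_lookup() accounting for ~62% of the CPU time.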
Comments? Anyone I can goad into making a patch along these lines?
Regards,
- Håvard
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
61.89     57.40     57.40   185418     0.31     0.31  ufs_lookup
30.88     86.04     28.64                             mpidle
 0.75     86.74      0.70    45093     0.02     0.02  newdirrem
 0.52     87.22      0.48    53074     0.01     0.01  scanc
...
-----------------------------------------------
                0.00    0.00       2/185307     elf32_load_file [289]
                0.00    0.00       2/185307     sys_readlink [290]
                0.00    0.00       3/185307     check_exec [275]
                0.00    0.01      42/185307     sys___stat13 [206]
                0.00   14.18   45093/185307     sys_unlink [10]
                0.00   14.18   45093/185307     sys___lstat13 [11]
                0.00   29.90   95072/185307     vn_open [8]
[3]     62.9    0.00   58.29  185307            namei [3]
                0.03   58.21  185312/185312     lookup [4]
                0.02    0.00  185307/186773     pool_cache_get_paddr [176]
                0.02    0.00  140212/186774     pool_cache_put_paddr [179]
                0.01    0.00  185305/185348     copyinstr [226]
                0.00    0.00       5/3502301    vput <cycle 1> [148]
                0.00    0.00       5/3502301    vrele <cycle 1> [218]
                0.00    0.00       5/5          VOP_READLINK [431]
                0.00    0.00       2/2          copystr [888]
-----------------------------------------------
...
               57.40    0.72  185418/185418     VOP_LOOKUP [5]
[6]     62.7   57.40    0.72  185418            ufs_lookup [6]
                0.02    0.29  462794/552351     VOP_BLKATOFF [43]
                0.10    0.05  185418/185418     cache_lookup [67]
                0.00    0.14   90187/166258     ffs_vget [50]
                0.02    0.03  275604/325586     VOP_ACCESS [112]
                0.00    0.03   45096/45096      cache_enter [152]
                0.03    0.00  462794/1006051    brelse [108]
                0.00    0.02  135911/135917     __udivdi3 [192]
                0.00    0.00   32315/723173     VOP_UNLOCK [96]
                0.00    0.00       2/18184      Xspllower [69]
...