NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/39564: wapbl performance issues with disk cache flushing


I have some additional data points and comments related to this
issue.  I'm testing by doing parallel builds on a HP ProLiant
460c G1 blade, which is equipped with 16GB memory, dual quad-
core Xeon processors at 2.83GHz, and with an on-board ciss(4)
controller with a single file system (root, /usr, /var etc.).

I'm testing by doing " -j 16" on this system.  The
different variants and the result of "time" from csh were as

First a few attempts without simon's patch to enable one to turn
off cache flushing via sysctl:

1) Plain " -j 16":

   6515.491u 2108.338s 2:19:19.68 103.1%   0+0k 3+688241io 32279pf+0w

   (yes, that's almost 2 hours and 20 minutes(!))

2) As above, but with a "time sh -c 'while true; do sync; sleep 1; done'
   loop running in parallel with the build:

   5777.841u 2284.086s 21:59.48 610.9%     0+0k 627+44423io 32323pf+0w

   while the sync loop got

   0.878u 62.666s 24:11.92 4.3%  0+0k 0+48589io 0pf+0w

3) Just for comparison, a "plain" without -j specified:

   5312.852u 970.429s 2:52:58.37 60.5%     0+0k 103+190425io 32351pf+0w

Some observations / comments:

 o Even when taking the i/o's triggered by the sync loop in 2)
   into account, the amount of IO under 1) is massively larger
   than in 2), with a factor of about 7.4(!)

 o Running "top" and "systat vm" in two windows while the build
   in 1) is running shows that in periods of low cpu utilization,
   lots of processes appear "stuck" in "tstile/<n>" state.

 o The "tstile/<n>" status is kind of non-descript, it basically just
   says "I'm waiting on some unspecified kernel-internal resource".
   Pinpointing exactly which resource there is contention for cannot
   easily be done by using these normal system state monitor tools.
   Some other developers hinted that it would be better if the
   resource was described by the wait channel string.  It appears that
   to do this, the locking primitives which end up calling
   turnstile_block() would need to pass down some informative string
   which better describes the resource locked instead of using a
   constant "tstile" string as is done in the turnstile_block()
   function when calling sleepq_enqueue().

So, in the next round I installed simon's patch and turned off wapbl
disk cache flushing, and repeated the tests 1) and 2) above.  Contrary
to what one might beleive, these actually revealed a difference(!):

4) A standard " -j 16"

   5987.961u 2149.343s 28:30.74 475.6%     0+0k 11678+562510io 56375pf+0w

5) as in 4), but with the sync shell loop:

   5746.054u 2302.403s 21:08.90 634.2%     0+0k 1+46619io 32331pf+0w

   and for the sync loop itself:

   0.774u 52.742s 21:21.80 4.1%  0+0k 0+49910io 0pf+0w

Observations / comments:

 o While the difference in wall-clock time is much less than between
   1) and 2) above, the difference in the amount of IO is still very
   significant, with a factor around 5.8(!)

 o Observing the "top" display in times of ebbs in the CPU
   utilization, I still see many processes in "tstile/<n>" state.

I've repeated this experiment on a non-wapbl file system on the same
system, but there the difference was in the "sync loop" disfavour:

5799.240u 2232.222s 24:57.93 536.1%     0+0k 494+710112io 32612pf+0w
5881.837u 2435.640s 27:14.19 508.9%     0+0k 0+720501io 32646pf+0w

and the number of IOs is comparable.

So...  It seems to me that something or other is causing the wapbl
file systems to experience massive lock contention with parallel
activity on the file system.  Also, this seems to indicate that the
patch to make it possible to turn off disk cache flushing with wapbl
via sysctl may just paper over an underlying issue.  However, due to
the non-descript nature of the "tstile" status, it's difficult to give
any further hints as to which resource is actually contested.

Best regards,

- Håvard

Home | Main Index | Thread Index | Old Index