Subject: Some kernel profiling on a 5000/133 with -current
To: port-pmax <port-pmax@netbsd.org>
From: Ken Wellsch <kwellsch@tampabay.rr.com>
List: port-pmax
Date: 08/24/2001 09:41:43
With the recent discussion about disk performance, I thought I would
run another kernel profile and see what may be happening.

I'm running a -current kernel on a 5000/133 (128Mb) with a Jun 1/01
userland.  The profiling kernel is custom (e.g. MIPS1 only) via the
usual "comment out" stuff using a GENERIC kernel as a base.

The summary is that a remarkable amount of time is spent in the
assembler memcpy/memmove function.  The profile I did back at the
end of January (with 1.5R) also shows this as a top consumer.

Does UBC and/or scsipi require a lot of buffer bouncing?
Hopefully the profiling itself is not the culprit here.

I suppose I should try and split memcpy & memmove for better profiling,
and also instrument them so they provide proper call-graph data etc. I'm
also curious about src/dst alignment distribution as well as the copy
size distribution... but I'm not knowledgeable enough to instrument that
so that I can tug the results out of a running kernel.

Here are the details from the profiling.

The first profile output is that produced after building GENERIC
overnight (with about 4 hours of idle time after it completed):

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 25.01    573.68   573.68    70264  8164.69  8164.69  mips_idle
  8.58    770.42   196.74                             _mcount
  7.00    930.91   160.49                             memmove
  6.59   1082.07   151.15                             __mcount
  4.75   1190.98   108.91 13355734     8.15     8.15  lockmgr
  4.69   1298.54   107.56   577608   186.22   186.22  mips_pagezero
  4.02   1390.83    92.29                             _splset_noprof
  2.78   1454.60    63.77  2185474    29.18    38.93  uvn_attach
  2.37   1508.97    54.37  1277593    42.56   277.91  uvm_fault
  1.82   1550.64    41.67  1347762    30.92    44.64  pmap_enter
  1.23   1578.89    28.25   533670    52.94   403.19  lookup
  1.15   1605.17    26.28  1880346    13.98   226.94  syscall_plain
  1.08   1629.96    24.79                             _splraise_noprof
  1.03   1653.56    23.60  1492532    15.81    90.38  ufs_lookup
  1.00   1676.54    22.98     1950 11782.85 50933.92  ffs_sync
	...
  0.00   2293.80     0.00        1     0.00    47.84  vn_fcntl

I then ran "bonnie -s 800" and that took over an hour, producing:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 11.43    433.44   433.44                             memmove
  9.99    812.10   378.66                             _mcount
  8.40   1130.36   318.27   105108  3027.99  3027.99  mips_idle
  8.22   1442.12   311.76                             __mcount
  4.73   1621.33   179.21                             _splset_noprof
  4.56   1794.27   172.94 18343012     9.43     9.43  lockmgr
  2.91   1904.57   110.30  2310538    47.74    64.96  uvn_attach
  2.85   2012.43   107.85   579127   186.23   186.23  mips_pagezero
  2.24   2097.50    85.07  4048512    21.01    28.16  pmap_enter
  2.11   2177.43    79.93  2003441    39.90   214.02  uvm_fault
  1.89   2249.16    71.73 38764271     1.85     4.49  st_scsibus_cmprss
  1.54   2307.53    58.37  1401437    41.65    41.65  mips1_FlushDCache
  1.48   2363.55    56.02  4751875    11.79    11.79  memset
  1.42   2417.37    53.82                             _splraise_noprof
  1.31   2467.15    49.78  1734271    28.70    48.90  genfs_getpages
  1.14   2510.38    43.23 13678369     3.16     3.16  pmap_is_modified
  1.13   2553.33    42.95  1129678    38.02   213.59  ncr53c9x_intr
  1.08   2594.36    41.02  2645605    15.51   382.60  syscall_plain
  1.05   2633.97    39.62  1839952    21.53    67.28  uvn_findpages
	...
  0.00   3790.57     0.00        1     0.00    73.84  vn_fcntl

I then ran "bonnie -s 800" again, and this is my final result:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 12.20    702.29   702.29                             memmove
  9.77   1264.69   562.40                             _mcount
  8.73   1767.05   502.36   140244  3582.07  3582.07  mips_idle
  8.21   2239.42   472.37                             __mcount
  5.10   2532.87   293.45                             _splset_noprof
  4.13   2770.59   237.73 23342045    10.18    10.19  lockmgr
  2.79   2931.02   160.43  2436080    65.85    90.39  uvn_attach
  2.25   3060.25   129.23  6750366    19.14    25.29  pmap_enter
  2.10   3180.97   120.73 63173285     1.91     4.92  st_scsibus_cmprss
  1.91   3290.66   109.69  2662591    41.20    41.20  mips1_FlushDCache
  1.88   3398.83   108.17   580935   186.20   186.20  mips_pagezero
  1.83   3503.85   105.02  2730621    38.46   205.57  uvm_fault
  1.80   3607.41   103.57  7750359    13.36    13.36  memset
  1.57   3697.68    90.26  2767455    32.62    57.45  genfs_getpages
  1.44   3780.31    82.63                             _splraise_noprof
  1.20   3849.46    69.16  1988694    34.77   210.07  ncr53c9x_intr
  1.16   3916.46    66.99 21491923     3.12     3.12  pmap_is_modified
  1.02   3975.18    58.72  2977164    19.72    56.78  uvn_findpages
	...
  0.00   5754.21     0.00        1     0.00    83.83  vn_fcntl