Subject: how many kern.maxvnodes is too many/not enough?
To: None <tech-kern@netbsd.org>
From: Stephen Jones <smj@cirr.com>
List: tech-net
Date: 01/15/2004 23:23:38
I've hired a relatively expensive NetBSD developer to help troubleshoot 
a consistent vnlock deadlock (or an extremely long lock, or a pile 
of locks that never get unlocked) that we've been seeing for months on 
our heavily used NFS clients.

On my own, I'm trying to learn and understand more about vnode locking 
and why an NFS client would run into this problem.  I've been able to 
verify this problem with another high volume site, but it doesn't seem 
to be an issue for others.  Originally I assumed this was vnode 
deadlocking, but that is probably wrong.  What is probably happening is 
that lots of vnodes are being used and eventually a pile of them causes 
a temporary deadlock.  In some cases, a client can recover on its own 
if traffic slows down, but in most cases waiting is not an option and 
the client must be rebooted to regain access.

I've tried setting kern.maxvnodes to various sizes, and I was wondering 
what value others would consider reasonable and how that value should 
be determined.  So far I've tried values between 16k and 64k.

In the typical vnode lock we see every day on at least one or two of 
our clients (it plays no favourites), a ps from the debugger might look 
like this:

  PID   PPID   PGRP    UID  S    FLAGS     COMMAND  WAIT
13062   4982   2320  62035  3   0x4086       sleep  nanosle
13059  13046  13059  18375  3   0x4006        mail  vnlock
13055  12888  12879     92  3   0x4084       sleep  nanosle
13046  13008  13046  18375  3   0x4086        tcsh  pause
13008    590  13008      0  3    0x184        sshd  select
12888  12883  12879     92  3   0x4084         ksh  pause
12883  12879  12879     92  3   0x4084         ksh  pause
12879  12875  12879      0  3   0x4084          sh  wait
12875    637    637      0  3     0x84        cron  piperd
12819      1  12819  35449  3   0x4004         ksh  vnlock
12716  12715    631  32767  3   0x4004      finger  vnlock
12715    631    631  32767  3   0x4084     fingerd  piperd
12633  12632    631  32767  3   0x4004      finger  vnlock
12632    631    631  32767  3   0x4084     fingerd  piperd
11332    596    596  32767  3    0x184       httpd  netio
11327    596    596  32767  3    0x184       httpd  netcon
11319    596    596  32767  3    0x184       httpd  netcon
11138   9509  11138   8904  3   0x4006         ksh  vnlock
11016      1  11000  39284  3   0x4006         ksh  vnlock
10918      1  10900  47900  3   0x4006         ksh  vnlock
10767      1  10751  46262  3   0x4006         ksh  vnlock
 9513    596    596  32767  3    0x184       httpd  netio
 9509    590   9509      0  3    0x184        sshd  select
 7351      1   7335  39284  3   0x4006         ksh  vnlock
 5773      1   5757  39284  3   0x4006         ksh  vnlock
 5379      1   5361  39284  3   0x4006         ksh  vnlock
 4992   4991   2320  62035  3   0x4006          sh  vnlock
 4991   2322   2320  62035  3   0x4086     getchar  wait
 4982   2322   2320  62035  3     0x86         ksh  pause
 4931      1   4911   6326  3   0x4006         ksh  vnlock
 3221      1   3221  59077  3   0x4004         ksh  vnlock
 3140      1   3140  59077  3   0x4006         ksh  vnlock
 2936      1   2917   6326  3   0x4006         ksh  vnlock
 2659      1   2659  59077  3   0x4006         ksh  vnlock
 2254      1   2238  40035  3   0x4006         ksh  vnlock
29854      1  29854  34321  3   0x4006         ksh  vnlock
29681      1  29681  35449  3   0x4006         ksh  vnlock
28351      1  28351  35449  3   0x4006         ksh  vnlock
28122      1  28122  35449  3   0x4006         ksh  vnlock
25044  24821  24821  53559  3   0x4006         ksh  vnlock
24821  24714  24821  53559  3   0x400f        mutt  genput
24714      1  24714  53559  3   0x4006       ksh93  vnlock
22242    596    596  32767  3    0x184       httpd  netcon
12560      1  12560  35449  3   0x4006        bash  vnlock
11733  11621  11733  52788  3   0x4086        bash  ttyin
11621    590  11621      0  3    0x184        sshd  select
28957  20518  28957   3428  3   0x5086       emacs  select
20518  20490  20518   3428  3   0x4086        bash  wait
20490    590  20490      0  3    0x184        sshd  select
18102      1  18102   8308  3   0x4006        pine  vnlock
 2635      1   2635  54984  3   0x4006        bash  vnlock
26638    596    596  32767  3    0x184       httpd  netcon
26637    596    596  32767  3    0x184       httpd  netio
26636    596    596  32767  3    0x184       httpd  netio
26635    596    596  32767  3    0x184       httpd  netio
26634    596    596  32767  3    0x184       httpd  netcon
26633    596    596  32767  3    0x184       httpd  netcon
26632    596    596  32767  3    0x184       httpd  netcon
26631    596    596  32767  3    0x184       httpd  netcon
26630    596    596  32767  3    0x184       httpd  netio
26629    596    596  32767  3    0x184       httpd  netcon
26628    596    596  32767  3    0x184       httpd  netcon
26627    596    596  32767  3    0x184       httpd  netcon
26626    596    596  32767  3    0x184       httpd  netio
26625    596    596  32767  3    0x184       httpd  netcon
26624    596    596  32767  3    0x184       httpd  netcon
26623    596    596  32767  3    0x184       httpd  netcon
26622    596    596  32767  3    0x184       httpd  netio
26621    596    596  32767  3    0x184       httpd  netio
26620    596    596  32767  3    0x184       httpd  netio
26619    596    596  32767  3    0x184       httpd  netcon
26618    596    596  32767  3    0x184       httpd  netcon
26617    596    596  32767  3    0x184       httpd  netcon
26616    596    596  32767  3    0x184       httpd  netcon
26615    596    596  32767  3    0x184       httpd  netcon
26614    596    596  32767  3    0x184       httpd  netcon
26613    596    596  32767  3    0x184       httpd  netcon
26612    596    596  32767  3    0x184       httpd  netcon
26611    596    596  32767  3    0x184       httpd  netcon
26610    596    596  32767  3    0x184       httpd  netio
26609    596    596  32767  3    0x184       httpd  netcon
25603  25549  25603  49211  3   0x5006       emacs  vnlock
25593      1  25592  49211  3     0x86       twait  nanosle
25549  25489  25549  49211  3   0x4086         zsh  pause
25489    590  25489      0  3    0x184        sshd  select
11343      1  11343  50983  3   0x400f        mutt  vnlock
19243  12348  19243  35058  3   0x400f        mutt  vnlock
14238  14234  14238  64910  3   0x4086         ksh  ttyin
14234    590  14234      0  3    0x184        sshd  select
 2322   2321   2320  62035  3   0x4086         ksh  piperd
 2321   2320   2320  62035  3   0x4086          sh  wait
 2320   2146   2320  62035  3   0x4186         com  wait
 2146   2134   2146  62035  3   0x4086        tcsh  pause
 2134    590   2134      0  3    0x184        sshd  select
12348  12326  12348  35058  3   0x4086         ksh  pause
12326    590  12326      0  3    0x184        sshd  select
  641      1    641    100  3   0x4106       login  vnlock
  637      1    637      0  3     0x84        cron  nanosle
  631      1    631      0  3     0x84       inetd  select
  613      1    613      0  3     0x84       timed  select
  596      1    596      0  3     0x84       httpd  select
  592      1    592      0  3    0x184        sshd  select
  590      1    590      0  3    0x184        sshd  select
  152      1    152      0  3     0x84   rpc.lockd  select
  136      1    136      0  3     0x84         xfs  select
  111      1    111      0  3     0x84      ypbind  select
  106      1    106      0  3     0x84     rpcbind  select
   93      1     93      0  3     0x84     syslogd  select
   59      0      0      0  3  0x20284       nfsio  nfsidl
   58      0      0      0  3  0x20284       nfsio  nfsidl
   57      0      0      0  3  0x20284       nfsio  nfsidl
   56      0      0      0  3  0x20284       nfsio  nfsidl
   55      0      0      0  3  0x20284       nfsio  nfsidl
   54      0      0      0  3  0x20284       nfsio  nfsidl
   53      0      0      0  3  0x20284       nfsio  nfsidl
   52      0      0      0  3  0x20284       nfsio  nfsidl
   51      0      0      0  3  0x20284       nfsio  nfsidl
   50      0      0      0  3  0x20284       nfsio  nfsidl
   49      0      0      0  3  0x20284       nfsio  nfsidl
   48      0      0      0  3  0x20284       nfsio  nfsidl
   47      0      0      0  3  0x20284       nfsio  nfsidl
   46      0      0      0  3  0x20284       nfsio  nfsidl
   45      0      0      0  3  0x20284       nfsio  nfsidl
   44      0      0      0  3  0x20284       nfsio  nfsidl
   43      0      0      0  3  0x20284       nfsio  nfsidl
   42      0      0      0  3  0x20284       nfsio  nfsidl
   41      0      0      0  3  0x20284       nfsio  nfsidl
   40      0      0      0  3  0x20284       nfsio  nfsidl
    5      0      0      0  3  0x20204    aiodoned  aiodone
    4      0      0      0  3  0x20204     ioflush  syncer
    3      0      0      0  3  0x20204      reaper  reaper
    2      0      0      0  3  0x20204  pagedaemon  pgdaemo
    1      0      1      0  3   0x4084        init  wait
    0     -1      0      0  3  0x20204     swapper  schedul
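
A listing like the one above is easier to compare between hangs if the 
WAIT column is tallied per wait channel.  The following is only a rough 
sketch in Python, not part of any NetBSD tool: the function name 
count_wait_channels and the SAMPLE text (five entries copied from the 
listing) are illustrative, and it assumes every entry has exactly eight 
whitespace-separated fields (PID PPID PGRP UID S FLAGS COMMAND WAIT), 
which also lets it cope with entries whose WAIT field wrapped onto the 
next line.

```python
from collections import Counter

FIELDS_PER_ENTRY = 8  # PID PPID PGRP UID S FLAGS COMMAND WAIT


def count_wait_channels(listing: str) -> Counter:
    """Tally the WAIT field of a debugger ps listing.

    Assumption: every entry has exactly eight whitespace-separated
    fields, so the text can be split into tokens and read in groups of
    eight regardless of where the lines were wrapped.
    """
    tokens = listing.split()
    # Drop the header row if it is present.
    if tokens[:1] == ["PID"]:
        tokens = tokens[FIELDS_PER_ENTRY:]
    counts = Counter()
    # Step through complete entries only; a trailing partial entry is ignored.
    for i in range(0, len(tokens) - FIELDS_PER_ENTRY + 1, FIELDS_PER_ENTRY):
        wait_channel = tokens[i + FIELDS_PER_ENTRY - 1]
        counts[wait_channel] += 1
    return counts


# Five entries copied from the listing, for illustration only.
SAMPLE = """\
  PID   PPID   PGRP    UID  S    FLAGS     COMMAND  WAIT
13062   4982   2320  62035  3   0x4086       sleep  nanosle
13059  13046  13059  18375  3   0x4006        mail  vnlock
13008    590  13008      0  3    0x184        sshd  select
12819      1  12819  35449  3   0x4004         ksh  vnlock
11332    596    596  32767  3    0x184       httpd  netio
"""

if __name__ == "__main__":
    for channel, n in count_wait_channels(SAMPLE).most_common():
        print(f"{channel:<8} {n}")
```

Feeding it the full listing instead of SAMPLE gives the number of 
processes stuck in vnlock relative to everything else.  Because it 
relies on the eight-field assumption, an entry with an empty WAIT field 
or a command name containing spaces would throw the grouping off.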