Subject: Re: problems with nmbcluster (?)
To: Manuel Bouyer <>
From: Stephen Jones <>
List: tech-net
Date: 01/10/2007 17:35:09
> On Mon, Jan 08, 2007 at 07:01:40AM +0100, wrote:
>> I have already tested with NMBCLUSTERS=4096. I think the system runs
>> some days longer until the stall occurs. Now I will test with 8192,
>> but I think it will not solve the problem.

Manuel -

Why is there so much black magic to this?  Is this something that could
be handled more gracefully, with kernel warnings before the system
actually hangs?  Could the limit be set to increase (or decrease)
dynamically?
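
To make the "warn before hanging" idea concrete, here is a minimal sketch
of the check I have in mind.  It assumes the in-use cluster count and the
limit have already been read from somewhere (netstat -m, for instance);
the function name and the 80% threshold are made up for illustration and
are not anything NetBSD provides.

```python
def cluster_pressure(in_use: int, limit: int, warn_ratio: float = 0.8) -> str:
    """Classify mbuf cluster usage as 'ok', 'warn', or 'exhausted'.

    Hypothetical helper: in_use and limit are supplied by the caller,
    and warn_ratio is an arbitrary threshold chosen for this sketch.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if in_use >= limit:
        # The pool is at its limit; further allocations will fail or stall.
        return "exhausted"
    if in_use >= warn_ratio * limit:
        # Still working, but close enough to the limit to deserve a warning.
        return "warn"
    return "ok"


if __name__ == "__main__":
    # Example values only, not measurements from a real system.
    print(cluster_pressure(7000, 8192))
```

Something along these lines, run periodically, would at least leave a log
entry before the silent hang.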

Nearly all the NetBSD crashes I experience are related to this, or so I
am told, and over the years I've never gotten it figured out.  I've cited
this as a 'vnlock deadlock' issue, but that's just a symptom.  The real
issue is resource starvation .. spectre or the real ghost?

One of the big problems is that you might not even get a clue before a
system hangs.  For me, I see about 18-24 days of uptime before the
inevitable silent hang.  No warning, no panic .. just a hang on the NFS
server, which causes all of the clients to cascade into vnlock deadlocks.

Just a few days ago I had a fortunate clue.  I awoke to my phone beeping
at me about a problem, and when I got to the console I was able to break
into the debugger and kill init to get the NFS server to drop to single
user mode.  I was being patient, hoping that it would eventually recover
and give me a shell so I could bring it back up, when:

mclpool limit reached: increase NMBCLUSTERS

spewed down the screen 50 or so times.  Finally, a real clue and
confirmation!  So what's the history of this?

I tried 8192, 16k, 24k, 32k, 64k .. now I'm at 92k, and yet I still need
to increase NMBCLUSTERS.  To quote Nintendo, how high can you go?  What's
the logic behind NMBCLUSTERS?  I realise that this is a single value that
can affect other parameters, isn't that correct?  So is it a phantom, or
should I really be ever increasing NMBCLUSTERS?  What happens if I tell
it to go to 256k?  Is that too high?
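
For a sense of scale, here is a rough back-of-the-envelope sketch of how
much memory the cluster pool could grow to at each setting.  It assumes a
cluster size (MCLBYTES) of 2048 bytes, which is my understanding of the
usual default but may differ per port, and it reads "k" as 1024; the
helper name is just for illustration.

```python
# Assumption: 2048-byte clusters. Check MCLBYTES for your port before
# trusting these numbers.
MCLBYTES = 2048


def cluster_pool_mib(nmbclusters: int, mclbytes: int = MCLBYTES) -> float:
    """Upper bound on cluster pool memory, in MiB, for a given limit."""
    return nmbclusters * mclbytes / (1024 * 1024)


if __name__ == "__main__":
    # Settings mentioned in this thread, with "k" taken to mean 1024.
    for n in (8192, 32 * 1024, 92 * 1024, 256 * 1024):
        print(f"NMBCLUSTERS={n:>6}: up to {cluster_pool_mib(n):.0f} MiB")
```

Under those assumptions 8192 clusters is 16 MiB and 256k would be 512 MiB,
so whether 256k is "too high" presumably depends on how much memory the
kernel can spare.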

Did you ask 6bone to send the output of pstat -T?  Will that help out?