Subject: Re: problems with nmbcluster (?)
To: Manuel Bouyer <firstname.lastname@example.org>
From: Stephen Jones <email@example.com>
Date: 01/10/2007 17:35:09
> On Mon, Jan 08, 2007 at 07:01:40AM +0100, firstname.lastname@example.org wrote:
>> I have already tested with NMBCLUSTERS=4096. I think the system runs
>> some days longer until the stall occurs. Now I will test with 8192,
>> but I think it will not solve the problem.
Why is there such black magic to this? Could this be handled more
gracefully, with kernel warnings before the system actually hangs?
Could it be set to increase (or decrease) dynamically?
Nearly all the NetBSD crashes I experience are related to this, or so
I am told, and over the years I've never figured it out. I've cited
this as a 'vnlock deadlock' issue, but that's just a symptom. The real
issue is resource starvation... but is NMBCLUSTERS a spectre or the
real ghost?
One of the big problems is that you might not get any clue before a
system hangs. For me, I see about 18-24 days of uptime before the
inevitable silent hang. No warning, no panic... just a hang on the NFS
server, which causes all of the clients to cascade into vnlock
deadlocks.
Just a few days ago I got a fortunate clue. I awoke to my phone
beeping at me about a problem, and when I got to the console I was
able to break into the debugger and kill init to get the NFS server to
drop to single-user mode. I was hoping it would eventually recover and
give me a shell so I could bring it back up, when:
mclpool limit reached: increase NMBCLUSTERS
spewed down the screen 50 or so times. Finally, a real clue and
confirmation! So, what's the history? I tried 8192, 16k, 24k, 32k,
64k... now I'm at 92k, and yet I still need to increase NMBCLUSTERS.
To quote Nintendo: how high can you go? What's the logic behind
NMBCLUSTERS? I realise that this single value can affect other
parameters; is that correct? So is it a phantom, or should I really
keep increasing NMBCLUSTERS? What happens if I set it to 256k? Is that
too high?
Did you ask 6bone to send the output of pstat -T? Will that