Re: Severe netbsd-6 NFS server-side performance issues

Friday's server maintenance window resulted in more information, but not
more clarity.

At 9:30 +0200 on 31.05.2012, Hauke Fath wrote:
>>Or is nfsd really trashing the system?
>>It could be an nfsd regression too.
>The load certainly goes away when I turn off nfsd.  ;)
>Maybe I should do that, then run bonnie to check whether local disk
>bandwidth has changed.

That's what I started with: I switched nfsd off, then ran bonnie++ on the
RAID, with results similar to those I got in single-user mode. My take: as
long as nfsd stays out of the way, the machine is fine (but see below).

Next, I restricted NFS access to one (NetBSD) client machine, mounted a
share there, and ran bonnie++ on it. 'systat vmstat' on the server showed
> 30 MBytes/sec. I tried from an Ubuntu 10 client: same result, more or
less.

In the end, I re-enabled NFS access on the server, set up a 4-way bonnie++
run on an Ubuntu client, and left for the weekend.(*)

Checking back on Saturday, I found most of the processes on the server in
state D (uninterruptible disk wait). The console showed a lone

login: amr0: bad status (not active; 0x040)

and "systat vmstat" showed the disk 100% busy while moving only 100 KBytes/sec.

I captured a 'ps axl', but on the serial console the output is truncated
after the 80th column, even when you redirect it to a file. Since most of the
daemons' command lines start with /usr, the interesting part was cut off.
grep(1) didn't work either.

A reboot got stuck, so I broke into the debugger and got
The "reboot 0x04" wedged the machine solidly.

A coworker reset the server on Monday morning, but found he had to actually
power-cycle it to un-wedge the MegaRAID controller. The machine is now back
to its usual state: 10 MBytes/sec throughput, 100% disk busy, load average
around 10.

What next? At this point, I am seriously contemplating giving FreeBSD a
try. Their amr(4) driver appears to have received a lot of updates that never
made it into the NetBSD counterpart, and they ship NFSv4 support, too ...


(*) I had disabled my "reboot the machine if nfsd is found in D for more than
60 seconds" script by then.

