Current-Users archive


Re: Severe netbsd-6 NFS server-side performance issues



Friday's server maintenance window resulted in more information, but not
more clarity.

At 9:30 +0200 on 31.05.2012, Hauke Fath wrote:
>>Or is nfsd really trashing the system?
>>It could be an nfsd regression too.
>
>The load certainly goes away when I turn off nfsd.  ;)
>
>Maybe I should do that, then run bonnie to check whether local disk
>bandwidth has changed.

That's what I started with: Switched nfsd off, then ran bonnie++ on the
RAID with results like those I got in single-user. My take: As long as nfsd
stays out of the way, the machine is fine (but see below).

Next, I restricted NFS access to one (NetBSD) client machine, mounted a
share there, and ran bonnie++ on it. 'systat vmstat' on the server showed
more than 30 MBytes/sec. I tried from an Ubuntu 10 client - same result,
more or less.

In the end, I re-enabled NFS access on the server, set up a 4-way bonnie++
run on an Ubuntu client, and left for the weekend.(*)

Checking back on Saturday, I found most of the processes on the server in
state D (disk wait). The console had a lone

login: amr0: bad status (not active; 0x040)

and "systat vmstat" showed the disk 100% busy at 100 KBytes/sec.

I got a 'ps axl', but the serial console truncates output after the 80th
column, even when you re-direct the output to a file, and most of the
daemons start with /usr. grep(1) didn't work.

A reboot got stuck, so I broke into the debugger and got
<http://la.causeuse.org/hauke/NetBSD/netbsd-6-nfsd/ddb-venediger-nfsd.out.gz>.
The "reboot 0x04" wedged the machine solidly.

A coworker reset the server on Monday morning, but found he had to actually
power it down to un-wedge the MegaRAID. Now, the machine is back to its
usual 10 MBytes/sec / 100% I/O / load 10 state.

What next? At this point, I am seriously contemplating giving FreeBSD a
try. Their amr(4) appears to have seen a lot of updates that went past the
NetBSD counterpart, and they ship NFSv4 support, too ...

        hauke


(*) I had disabled my "reboot the machine if nfsd is found in D for more
than 60 sec" script at this point.
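
For readers who want something similar, here is a rough sketch of what
such a watchdog could look like. It is not the script I actually run; the
names nfsd_in_disk_wait, StuckTracker and watch are made up for this
example, as is the 5-second poll interval. It assumes that
'ps -axo state,comm' lists the nfsd processes under the command name
"nfsd" and that a state string starting with "D" means disk wait.

```python
import subprocess
import time


def nfsd_in_disk_wait(ps_output):
    """Return True if any nfsd line in `ps -axo state,comm` output is in state D."""
    for line in ps_output.splitlines()[1:]:  # first line is the column header
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        state, comm = parts
        # Assumption: the process shows up as plain "nfsd"; the state may
        # carry extra flag characters after the leading "D".
        if comm.strip() == "nfsd" and state.startswith("D"):
            return True
    return False


class StuckTracker:
    """Remember since when nfsd has been continuously seen in state D."""

    def __init__(self, limit_sec=60.0):
        self.limit_sec = limit_sec
        self.stuck_since = None

    def update(self, stuck, now):
        """Feed one observation; return True once the limit is exceeded."""
        if not stuck:
            self.stuck_since = None  # nfsd recovered, start over
            return False
        if self.stuck_since is None:
            self.stuck_since = now
        return now - self.stuck_since > self.limit_sec


def watch(poll_sec=5.0, limit_sec=60.0):
    """Poll ps until nfsd has been stuck for too long, then reboot."""
    tracker = StuckTracker(limit_sec)
    while True:
        out = subprocess.run(
            ["ps", "-axo", "state,comm"],
            capture_output=True, text=True, check=True,
        ).stdout
        if tracker.update(nfsd_in_disk_wait(out), time.monotonic()):
            # Hypothetical reaction; needs root.
            subprocess.run(["shutdown", "-r", "now"], check=False)
            return
        time.sleep(poll_sec)
```

Calling watch() as root starts the loop. The timing logic lives in
StuckTracker so it can be tried out without touching ps or shutdown. Note
that on a machine wedged as badly as described above, ps itself may hang,
so a sketch like this only helps in the milder cases.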


-- 
     The ASCII Ribbon Campaign                    Hauke Fath
()     No HTML/RTF in email            Institut für Nachrichtentechnik
/\     No Word docs in email                     TU Darmstadt
     Respect for open standards              Ruf +49-6151-16-3281

