Current-Users archive
Re: Severe netbsd-6 NFS server-side performance issues
Friday's server maintenance window resulted in more information, but not
more clarity.
At 9:30 +0200 on 31.05.2012, Hauke Fath wrote:
>>Or is nfsd really trashing the system?
>>It could be an nfsd regression too.
>
>The load certainly goes away when I turn off nfsd. ;)
>
>Maybe I should do that, then run bonnie to check whether local disk
>bandwidth has changed.
That's what I started with: Switched nfsd off, then ran bonnie++ on the
RAID with results like those I got in single-user. My take: As long as nfsd
stays out of the way, the machine is fine (but see below).
Next, I restricted nfs access to one (NetBSD) client machine, mounted a
share there, and ran bonnie++ on it. 'systat vmstat' on the server gave
me > 30 MBytes/sec. I tried from an Ubuntu 10 client - same result, more or
less.
In the end, I re-enabled nfs access on the server, set up a 4-way bonnie++
run on an Ubuntu client, and left for the weekend.(*)
Checking back on Saturday, I found most of the processes on the server in
state D (disk wait). The console had a lone
login: amr0: bad status (not active; 0x040)
and "systat vmstat" gave 100% disk bandwidth at 100 KBytes/sec.
I got a 'ps axl', but the serial console truncates output after the 80th
column, even when you re-direct the output to a file, and most of the
daemons start with /usr. grep(1) didn't work.
A reboot got stuck, I broke into the debugger and got
<http://la.causeuse.org/hauke/NetBSD/netbsd-6-nfsd/ddb-venediger-nfsd.out.gz>.
The "reboot 0x04" wedged the machine solidly.
A coworker reset the server on Monday morning, but found he had to actually
power it down to un-wedge the MegaRAID. Now, the machine is back to its
usual 10 MBytes/sec / 100% I/O / load 10 state.
What next? At this point, I am seriously contemplating giving FreeBSD a
try. Their amr(4) appears to have seen a lot of updates that went past the
NetBSD counterpart, and they ship nfs4 support, too ...
hauke
(*) I had disabled my "reboot the machine if nfsd found in D for more than
60 sec" script at this point.
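
The watchdog script itself is not part of this message. As a rough sketch of
the idea only, the check could look like the following, assuming process
state is sampled with something like `ps -ax -o pid,state,comm` and that the
server processes show up under the command name "nfsd". The names
stuck_nfsd_pids and StuckTracker are hypothetical, and the reboot action is
deliberately left out.

```python
def stuck_nfsd_pids(ps_output):
    """Return the PIDs of nfsd processes whose state starts with 'D'.

    Assumes text shaped like the output of `ps -ax -o pid,state,comm`:
    one header line, then one "PID STATE COMMAND" line per process.
    """
    pids = set()
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        pid, state, comm = fields
        if comm.strip() == "nfsd" and state.startswith("D"):
            pids.add(int(pid))
    return pids


class StuckTracker:
    """Remember when each PID was first seen in state D."""

    def __init__(self, limit_seconds=60):
        self.limit = limit_seconds
        self.first_seen = {}  # pid -> timestamp of first stuck sighting

    def update(self, stuck_pids, now):
        """Record the current stuck PIDs; return those stuck for >= limit."""
        # Forget PIDs that have left state D since the last sample.
        self.first_seen = {
            pid: t for pid, t in self.first_seen.items() if pid in stuck_pids
        }
        for pid in stuck_pids:
            self.first_seen.setdefault(pid, now)
        return {
            pid for pid, t in self.first_seen.items() if now - t >= self.limit
        }
```

A caller would sample ps every few seconds, feed the result to update()
together with the current time, and act once the returned set is non-empty.
A PID that leaves state D between two samples starts over from zero.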
--
The ASCII Ribbon Campaign Hauke Fath
() No HTML/RTF in email Institut für Nachrichtentechnik
/\ No Word docs in email TU Darmstadt
Respect for open standards Ruf +49-6151-16-3281