Subject: Bad NFS performance
To: None <tech-perform@netbsd.org>
From: Dominik Westner <westner@absurd.dnsalias.org>
List: tech-perform
Date: 05/04/2001 20:40:03
Hi all,

I am having problems with the NFS performance between a NetBSD 1.5
(server) and a MacOSX 10.0.2 (client) machine.

While I get about 6.8 MB/s between two MacOSX machines for both read
and write, I only get below 1 MB/s write and about 6.7 MB/s read
performance between NetBSD and MacOSX.

Here are some details: (boo / NetBSD server, icebox / MacOSX client)

Local Write / Read on an external RAID on NetBSD server:

boo# dd bs=1024k count=128 if=/dev/zero of=tmpfile
128+0 records in
128+0 records out
134217728 bytes transferred in 11 secs (12201611 bytes/sec)
boo# dd bs=1024k count=128 if=/dev/zero of=tmpfile
128+0 records in
128+0 records out
134217728 bytes transferred in 9 secs (14913080 bytes/sec)
boo# dd bs=1024k count=128 of=/dev/null if=tmpfile
128+0 records in
128+0 records out
134217728 bytes transferred in 7 secs (19173961 bytes/sec)
boo# dd bs=1024k count=128 of=/dev/null if=tmpfile
128+0 records in
128+0 records out
134217728 bytes transferred in 7 secs (19173961 bytes/sec)

NFS Write / Read between NetBSD server and MacOSX client:

[icebox:~/Temp] westner% dd bs=1024k count=32 if=/dev/zero of=tmpfile
32+0 records in
32+0 records out
33554432 bytes transferred in 34 secs (986895 bytes/sec)
[icebox:~/Temp] westner% dd bs=1024k count=32 if=/dev/zero of=tmpfile
32+0 records in
32+0 records out
33554432 bytes transferred in 31 secs (1082401 bytes/sec)
[icebox:~/Temp] westner% dd bs=1024k count=32 of=/dev/null if=tmpfile
32+0 records in
32+0 records out
33554432 bytes transferred in 5 secs (6710886 bytes/sec)
[icebox:~/Temp] westner% dd bs=1024k count=32 of=/dev/null if=tmpfile
32+0 records in
32+0 records out
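
To put the dd figures above side by side, here is a minimal Python
sketch that converts them to MB/s. The byte counts and seconds are
copied from the transcripts; the names RUNS, mb_per_sec and summarize
are just illustrative. dd reports whole seconds, so the rates are
approximate, and the last NFS read run is left out because its timing
line is missing above.

```python
# (label, bytes transferred, seconds) as reported by dd in the runs above.
# The second NFS read is omitted: its "bytes transferred" line is missing.
RUNS = [
    ("local write 1", 134217728, 11),
    ("local write 2", 134217728, 9),
    ("local read", 134217728, 7),
    ("NFS write 1", 33554432, 34),
    ("NFS write 2", 33554432, 31),
    ("NFS read", 33554432, 5),
]


def mb_per_sec(nbytes, seconds):
    """Return throughput in decimal megabytes per second (1 MB = 10**6 bytes)."""
    return nbytes / seconds / 1e6


def summarize(runs):
    """Map each run label to its throughput, rounded to two decimals."""
    return {label: round(mb_per_sec(n, s), 2) for label, n, s in runs}


if __name__ == "__main__":
    for label, rate in summarize(RUNS).items():
        print(f"{label:14s} {rate:6.2f} MB/s")
```

With these inputs the NFS writes come out at about 1 MB/s, while the
NFS read is roughly six to seven times faster.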

On NetBSD I have built a custom kernel, with
options         NMBCLUSTERS=1024
maxusers        64              # estimated number of users
I have also tried
#options        BUFCACHE=20
but that made things even worse, so I disabled it again.

nfsd consumes a lot of CPU cycles (>90%), which is somewhat surprising,
while systat vm gives me below 5% for the disk.

And finally, here is my dmesg output:

boo# dmesg
NetBSD 1.5S (BOO) #13: Fri May  4 20:14:57 CEST 2001
     root@boo.absurd.dnsalias.org:/usr/src/sys/arch/i386/compile/BOO
cpu0: Intel Pentium II (Klamath) (686-class), 233.88 MHz
cpu0: I-cache 16K 32b/line 4-way, D-cache 16K 32b/line 2/4-way
cpu0: L2 cache 512K 32b/line 4-way
cpu0: features 80f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR>
cpu0: features 80f9ff<PGE,MCA,CMOV,MMX>
total memory = 255 MB
avail memory = 234 MB
using 3297 buffers containing 13188 KB of memory
BIOS32 rev. 0 found at 0xf04e0
mainbus0 (root)
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled
pchb0 at pci0 dev 0 function 0
pchb0: Intel 82443LX PCI AGP Controller (PAC) (rev. 0x03)
ppb0 at pci0 dev 1 function 0: Intel 82443LX AGP Interface (PAC) (rev. 0x03)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
vga0 at pci1 dev 0 function 0: Matrox MGA Millennium II 2164WA-B AG (rev. 0x00)
wsdisplay0 at vga0: console (80x25, vt100 emulation)
pcib0 at pci0 dev 4 function 0
pcib0: Intel 82371AB PCI-to-ISA Bridge (PIIX4) (rev. 0x01)
pciide0 at pci0 dev 4 function 1: Intel 82371AB IDE controller (PIIX4) (rev. 0x01)
pciide0: device disabled (at device)
uhci0 at pci0 dev 4 function 2: Intel 82371AB USB Host Controller (PIIX4) (rev. 0x01)
uhci0: interrupting at irq 12
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
Intel 82371AB Power Management Controller (PIIX4) (miscellaneous bridge, revision 0x01) at pci0 dev 4 function 3 not configured
ahc0 at pci0 dev 6 function 0
ahc0: interrupting at irq 12
ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
scsibus0 at ahc0 channel 0: 16 targets, 8 luns per target
ahc1 at pci0 dev 9 function 0
OptionMode = 3
ahc1: interrupting at irq 12
ahc1: aic7892 Wide Channel A, SCSI Id=7, 16/255 SCBs
scsibus1 at ahc1 channel 0: 16 targets, 8 luns per target
ex0 at pci0 dev 12 function 0: 3Com 3c905B-TX 10/100 Ethernet (rev. 0x30)
ex0: interrupting at irq 11
ex0: MAC address 00:10:5a:39:4c:f0
exphy0 at ex0 phy 24: 3Com internal media interface
exphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isa0 at pcib0
com2 at isa0 port 0x3e8-0x3ef irq 5: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
lpt1 at isa0 port 0x278-0x27b: polled
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
isapnp0: no ISA Plug 'n Play devices found
biomask f7dd netmask ffdd ttymask ffdf
scsibus0: waiting 2 seconds for devices to settle...
ahc0: target 0 synchronous at 20.0MHz, offset = 0xf
ahc0: target 0 using tagged queuing
sd0 at scsibus0 target 0 lun 0: <IBM, DCAS-34330, S65A> SCSI2 0/direct fixed
sd0: 4134 MB, 8205 cyl, 6 head, 171 sec, 512 bytes/sect x 8467200 sectors
ahc0: target 1 synchronous at 10.0MHz, offset = 0xf
ahc0: target 1 using tagged queuing
sd1 at scsibus0 target 1 lun 0: <SEAGATE, ST12550N, 0014> SCSI2 0/direct fixed
sd1: 2040 MB, 2708 cyl, 19 head, 81 sec, 512 bytes/sect x 4178874 sectors
ahc0: target 5 synchronous at 5.0MHz, offset = 0xf
cd0 at scsibus0 target 5 lun 0: <PLEXTOR, CD-ROM PX-32TS, 1.02> SCSI2 5/cdrom removable
ahc0: target 6 using asynchronous transfers
cd1 at scsibus0 target 6 lun 0: <PHILIPS, CDD2600, 1.07> SCSI2 5/cdrom removable
scsibus1: waiting 2 seconds for devices to settle...
ahc1: target 3 using 16bit transfers
ahc1: target 3 synchronous at 40.0MHz, offset = 0x1f
ahc1: target 3 using tagged queuing
sd2 at scsibus1 target 3 lun 0: <IFT, 3102, 0223> SCSI2 0/direct fixed
sd2: 35003 MB, 35003 cyl, 64 head, 32 sec, 512 bytes/sect x 71687040 sectors
ahc1: target 4 using 16bit transfers
ahc1: target 4 synchronous at 40.0MHz, offset = 0x1f
ahc1: target 4 using tagged queuing
ahc1: target 6 using 16bit transfers
ahc1: target 6 synchronous at 10.0MHz, offset = 0xf
st0 at scsibus1 target 6 lun 0: <QUANTUM, DLT8000, 010F> SCSI2 1/sequential removable
st0: density code 65, variable blocks, write-enabled
sd0: no disk label
boot device: sd1
root on sd1a dumps on sd1b
root file system type: ffs

Any ideas on how to make this combo rock are very much appreciated. I
just can't believe that this is all I can get out of NetBSD.

Thanks

Dominik