Subject: *horrible* performance?
To: None <port-alpha@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: port-alpha
Date: 06/26/2000 13:40:49

The machine: an AlphaPC 164LX/533 running 1.4T (full dmesg below).

The problem: absolutely *abysmal* performance at a certain task.

This alpha is a backup server.  Recently I've been playing with NDMP,
since we've got a new machine and have found that the most
suitable-seeming backup method for it is probably NDMP.  I have an NDMP
client program written, which sends the backup to its stdout.

When I run this program with its output redirected to /dev/null, I see
some eight-plus megabytes per second throughput, which is somewhere in
the "acceptable to good" range.

When I run this program with its output directed to a file on a local
disk, I see a max of about 60 *kilo*bytes per second throughput.  This
is positively abysmal; the SS1+ on my desk can sustain five times that
over a regular 10Mbit ether.  I've tried this with the file on wd0
(which is where we'd ultimately like it) and sd0; the speed difference
is so small as to be lost in the noise.

I've tried mounting the filesystem async (when using wd0), and that
doesn't help at all.

I've tried using a file on a ramdisk (mount_mfs), and that performs
approximately as well as redirecting it to /dev/null - until the
ramdisk fills up.

When I run the program with the output piped to a netcat to send the
bits back out the network interface to be discarded on another machine,
I see about four-plus megabytes per second - roughly half the /dev/null
figure, which is not too surprising given that it's contending for the
same underlying network pipe.

The code does comparatively small writes to its output, on the order of
a few K per write.  I've got the code gathering statistics, and they
imply that the cause is *not* pipeline back-pressure; rather, writing
to a file somehow seems to slow down the incoming data stream itself.

Interestingly, I tried making the code buffer writes, so it saves data
up in RAM and writes it in chunks of 20 megs or so, and that brings
performance up into the 4-5 megabytes/sec range.  Any idea what could
be causing such horrible performance in the small-writes case?  Note
that the problem happens even when the filesystem is mounted async, but
does not happen when writing to an mfs.
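
For concreteness, the buffering workaround looks roughly like the
sketch below: small writes are accumulated in RAM and pushed out with
one large write() when the buffer fills.  This is an illustration, not
the actual client code; the wbuf_* names are hypothetical, and the
capacity is whatever the caller picks (20 megs or so in my test).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical accumulator; names are illustrative only. */
struct wbuf {
    int fd;         /* destination descriptor  */
    char *data;     /* buffer storage          */
    size_t cap;     /* buffer capacity, bytes  */
    size_t len;     /* bytes currently queued  */
};

static int wbuf_init(struct wbuf *b, int fd, size_t cap)
{
    assert(cap > 0);
    b->data = malloc(cap);
    if (b->data == NULL)
        return -1;
    b->fd = fd;
    b->cap = cap;
    b->len = 0;
    return 0;
}

/* Write out everything queued, handling short writes and EINTR. */
static int wbuf_flush(struct wbuf *b)
{
    size_t off = 0;

    while (off < b->len) {
        ssize_t n = write(b->fd, b->data + off, b->len - off);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        off += (size_t)n;
    }
    b->len = 0;
    return 0;
}

/* Queue n bytes; flush only when the buffer becomes full. */
static int wbuf_put(struct wbuf *b, const void *p, size_t n)
{
    const char *src = p;

    while (n > 0) {
        size_t room = b->cap - b->len;
        size_t take = n < room ? n : room;

        memcpy(b->data + b->len, src, take);
        b->len += take;
        src += take;
        n -= take;
        if (b->len == b->cap && wbuf_flush(b) < 0)
            return -1;
    }
    return 0;
}

static void wbuf_free(struct wbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->cap = b->len = 0;
}
```

The caller does wbuf_init(&b, 1, 20 * 1024 * 1024) once, replaces each
small write() with wbuf_put(), and calls wbuf_flush() before exiting
so the tail of the data is not lost.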

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Here's the full dmesg.  The network interface in use is de0; ex0 is not
connected and hasn't been touched since boot.  Both de0 and the switch
it's connected to are specifically configured for 100/full, and in any
case all the above tests are run without touching the network
configuration in any way.

[ preserving 320984 bytes of netbsd ELF symbol table ]
Copyright (c) 1996, 1997, 1998, 1999, 2000
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 1.4T (OMEGA) #0: Thu Jun 22 11:19:01 EDT 2000
    mouse@Omega.McRCIM.McGill.EDU:/usr/src/sys/arch/alpha/compile/OMEGA
Digital AlphaPC 164LX 533 MHz
8192 byte page size, 1 processor.
total memory = 128 MB
(1904 KB reserved for PROM, 126 MB used by NetBSD)
avail memory = 113 MB
using 820 buffers containing 6560 KB of memory
mainbus0 (root)
cpu0 at mainbus0: ID 0 (primary), 21164A-2 (pass 2)
cia0 at mainbus0: DECchip 2117x Core Logic Chipset (Pyxis), pass 1
cia0: extended capabilities: 1<BWEN>
cia0: using BWX for PCI config access
pci0 at cia0 bus 0
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
de0 at pci0 dev 5 function 0
de0: interrupting at eb164 irq 2
de0: DEC DE500-AA 21140A [10-100Mb/s] pass 2.0
de0: address 00:00:f8:06:29:15
de0: enabling 100baseTX port
ex0 at pci0 dev 6 function 0: 3Com 3c905B-TX 10/100 Ethernet (rev. 0x30)
ex0: interrupting at eb164 irq 0
ex0: MAC address 00:50:04:9b:4a:56
exphy0 at ex0 phy 24: 3Com internal media interface
exphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
sio0 at pci0 dev 8 function 0: Intel 82378ZB System I/O (SIO) (rev. 0x43)
ncr0 at pci0 dev 9 function 0: ncr 53c810 fast10 scsi
ncr0: interrupting at eb164 irq 3
ncr0: minsync=25, maxsync=206, maxoffs=8, 16 dwords burst, normal dma fifo
ncr0: single-ended, open drain IRQ driver
ncr0: restart (scsi reset).
scsibus0 at ncr0: 8 targets, 8 luns per target
pciide0 at pci0 dev 11 function 0: CMD Technology PCI0646
pciide0: bus-master DMA support present
pciide0: primary channel wired to compatibility mode
wd0 at pciide0 channel 0 drive 0: <Maxtor 92048D8>
wd0: drive supports 16-sector pio transfers, lba addressing
wd0: 19531 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 40000464 sectors
wd0: 32-bits data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2
wd0(pciide0:0:0): using PIO mode 4, DMA mode 2 (using DMA data transfers)
pciide0: secondary channel wired to compatibility mode
atapibus0 at pciide0 channel 1
cd0 at atapibus0 drive 1: <FX820S, , g01> type 5 cdrom removable
cd0: 32-bits data port
cd0: drive supports PIO mode 3, DMA mode 1
cd0(pciide0:1:1): using PIO mode 3, DMA mode 1 (using DMA data transfers)
isa0 at sio0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
lpt0 at isa0 port 0x3bc-0x3bf irq 7
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
isabeep0 at pcppi0
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
mcclock0 at isa0 port 0x70-0x71: mc146818 or compatible
scsibus0: waiting 2 seconds for devices to settle...
sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST39140N, 1498> SCSI2 0/direct fixed
sd0(ncr0:0:0): 10.0 MB/s (100 ns, offset 8)
sd0: 8683 MB, 9006 cyl, 8 head, 246 sec, 512 bytes/sect x 17783240 sectors
st0 at scsibus0 targ 2 lun 0: <EXABYTE, EXB-850085QANXRC, 06X0> SCSI2 1/sequential removable
st0: st0(ncr0:2:0): 4.0 MB/s (250 ns, offset 8)
drive empty
root on sd0a dumps on sd0b
root file system type: ffs