Subject: some performance diff with getc()/putc() between FreeBSD and NetBSD?
To: NetBSD-current Discussion List <current-users@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: port-i386
Date: 05/05/2003 19:18:36
Over the past week I've been running various benchmarks and other tests
on a system destined to become a new file server, and I've noticed what
seems to be about 9% better per-char input (using getc()) throughput
on FreeBSD.

The machine I'm testing on has a Xeon Pentium-IV CPU running at 2.8 GHz:

	NetBSD 1.6R (GENERIC) #1: Mon May  5 15:32:17 EDT 2003
	        woods@dhcp138:/scratch/GENERIC
	total memory = 1023 MB
	avail memory = 943 MB
	using 6144 buffers containing 52508 KB of memory
	BIOS32 rev. 0 found at 0xf14e0
	mainbus0 (root)
	cpu0 at mainbus0: (uniprocessor)
	cpu0: Intel Pentium 4 (686-class), 2790.97 MHz, id 0xf27
	cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
	cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
	cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
	cpu0: I-cache 12K uOp cache 8-way, D-cache 8 KB 64b/line 4-way
	cpu0: L2 cache 512 KB 64b/line 8-way
	cpu0: ITLB 4K/4M: 64 entries
	cpu0: DTLB 4K/4M: 64 entries
	cpu0: 16 page colors

At first I had it running FreeBSD-4.8, but I couldn't get VINUM to work
quite right on it so we ended up installing NetBSD-current.  (-current
because the system has a pair of wm ethernets and a dual-channel mpt
controller)

As a result I've had a first-hand look at some of the differences and
similarities between the two systems on such a machine.

One of the baseline tests I've been running is Bonnie, such as this pair
of runs done on the ATA drive in the system:

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
FBSD-ATA 3000 45998 33.5 44012 10.1 22631  7.9 42423 34.4 45041  6.2 152.1  0.4
NBSD-ATA 3000 41865 34.7 40128 13.2 10464  2.8 40413 34.4 40832  6.0  93.9  0.5

So far so good w.r.t. STDIO (though the rewrite speed is astoundingly
low on NetBSD for unknown reasons).

However, as soon as I move on to significantly faster devices, the peak
per-char input rate on NetBSD is noticeably (~9%) lower than what I see
on FreeBSD:

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
Free-VN  3000 97956 69.6 97411 27.2 25448  8.2 97323 81.2 103413 16.5 579.4  2.2
Net-RF   3000 78534 66.6 78926 26.5 13895  5.9 87662 81.2 119517 26.2 276.9  2.9

These particular numbers look a little suspicious on the output side,
given the CPU wasn't maxed out.....

However, on the input side the difference between FreeBSD and NetBSD is
much clearer, since it's obvious the data could come off the filesystem
faster, as it does for block input.  Note the CPU was going full out in
both cases too -- the last 19% was going to VINUM and RAIDframe.
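
As a rough cross-check on the Free-VN and Net-RF input columns above,
the gap can be worked out directly from the table.  The sketch below is
only illustrative arithmetic; the function names are made up for this
example and are not part of Bonnie.  For this particular pair of runs
the per-char input gap comes out a little under 10%, in the same
ballpark as the ~9% quoted, and the block-input column moves far more
data per percent of CPU than the per-char column does.

```c
#include <stdio.h>

/* Percentage by which `slower` falls short of `faster`,
 * measured relative to `faster`.  (Hypothetical helper.) */
double percent_lower(double faster, double slower)
{
    return 100.0 * (faster - slower) / faster;
}

/* Rough efficiency figure: K/sec delivered per 1% of CPU used.
 * (Hypothetical helper.) */
double kb_per_cpu_percent(double kb_per_sec, double cpu_percent)
{
    return kb_per_sec / cpu_percent;
}

/* Print the comparison using the numbers from the table above. */
void print_input_comparison(void)
{
    /* Per-char input: Free-VN 97323 K/sec, Net-RF 87662 K/sec */
    printf("per-char input gap: %.1f%%\n",
           percent_lower(97323.0, 87662.0));

    /* Net-RF: per-char input at 81.2% CPU vs. block input at 26.2% CPU */
    printf("Net-RF per-char K/sec per CPU%%: %.0f\n",
           kb_per_cpu_percent(87662.0, 81.2));
    printf("Net-RF block    K/sec per CPU%%: %.0f\n",
           kb_per_cpu_percent(119517.0, 26.2));
}
```

Calling print_input_comparison() from a small main() is enough to
reproduce the figures.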

(Note to Greg Oster if he sees this:  the original numbers I showed you
were much lower for per-char throughput because I had initially forgotten
the -O2 when compiling bonnie!  Luckily though, I suppose, the optimizer
makes no difference to block-I/O....  :-)


I haven't looked closely to see if there are any differences in the
getc()/putc() implementations, but at first glance I don't see anything
obvious.  I also haven't disassembled the resulting code to see what it
looks like.
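
One way to take the filesystem and the RAID layer out of the picture
would be a tiny getc() loop timed on its own.  The following is a
minimal sketch along those lines, not Bonnie's actual code: the function
names are hypothetical, and it uses the standard clock() call, so it
reports throughput per second of process CPU time rather than
wall-clock time.

```c
#include <stdio.h>
#include <time.h>

/* Read the stream one character at a time with getc(), the way a
 * per-char input test does, and return the number of bytes seen. */
long count_with_getc(FILE *fp)
{
    long n = 0;
    int c;              /* int, not char, so EOF stays distinguishable */

    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}

/* Time one getc() pass over `path`.  Returns K per second of CPU time,
 * or -1.0 if the file can't be opened or is too small to time. */
double getc_kb_per_cpu_sec(const char *path)
{
    FILE *fp = fopen(path, "r");
    clock_t start, end;
    long n;
    double secs;

    if (fp == NULL)
        return -1.0;

    start = clock();
    n = count_with_getc(fp);
    end = clock();
    fclose(fp);

    secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return -1.0;
    return (n / 1024.0) / secs;
}
```

Built with -O2 on both systems and run against the same file, one small
enough to already be sitting in the buffer cache, any difference that
remains should be down to libc and the compiler rather than the disks.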

FreeBSD-4.8 also uses GCC 2.95.3 (though there may be differences of
course, even for i386).

I didn't use any special compiler options to build bonnie (just -O2 on
both systems), and the libcs were the stock ones from the binaries I
installed from (which for NetBSD was the 20030425 snapshot from releng).


Just to close with some good news for NetBSD:  it seems NetBSD has
better overall throughput for many kinds of filesystem jobs, including
decent "postmark" simulated loads, while running on the very same bare
disks on the very same machine.  This particular set of Postmark
parameters shows it well enough:

	set transactions 5000
	set number 5000
	set size 100 150000
	set read 1024
	set write 1024
	set bias read 7
	set subdirectories 100

	5000/5000@0.1K-146K       FB-ATA NB-ATA FB-SCSI NB-SCSI
	Total Time to run (secs)      51     36      89      63
	Transactions per second      192    312     125     113
	Data read (Kbytes/sec)      5260   7450    3190    4260
	Data written (Kbytes/sec)  11690  16560    7100    9460

Take these numbers with at least a small grain of salt.  I'm not sure I
had exactly figured out how to properly control "newfs" to make the
filesystem layouts identical at the time these numbers were collected --
it may be that FreeBSD had some room for improvement, though I'm
guessing a lot of the speed seen on NetBSD comes from the unified buffer
cache.  Note also this is FreeBSD-4.8 vs. NetBSD-current.

It's even more dramatic for job loads that fit almost entirely into
the buffer cache!

	set bias read 5
	set bias create 5
	set size 10240 20480
	set transactions 100000
	set subdirectories 10
	set number 10000

	10000/100000@10K-20K      FB-ATA NB-ATA FB-SCSI NB-RAID-10
	Total Time to run (secs)     303     45     271     101
	Transactions per second      349   2325     404    1020
	Data read (Kbytes/sec)      2640  17810    2960    7930
	Data written (Kbytes/sec)   3190  21500    3570    9580

All these numbers, except for the total time to run, are of course just
plain silly from the point of view of disk device testing, but for
smaller jobs like this they show just how important the performance of
the buffer cache can be.  I'm pretty sure that back before the unified
buffer cache came along these numbers would have made NetBSD look very
bad.

-- 
								Greg A. Woods

+1 416 218-0098;            <g.a.woods@ieee.org>;           <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>