Subject: some performance diff with getc()/putc() between FreeBSD and NetBSD?
To: NetBSD-current Discussion List <current-users@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: port-i386
Date: 05/05/2003 19:18:36
Over the past week I've been running various benchmarks and other tests
on a system destined to become a new file server, and I've noticed what
seems to be about 9% better per-char input throughput (using getc())
on FreeBSD.
The machine I'm testing on has a Xeon Pentium-IV CPU running at 2.8 GHz:
NetBSD 1.6R (GENERIC) #1: Mon May 5 15:32:17 EDT 2003
woods@dhcp138:/scratch/GENERIC
total memory = 1023 MB
avail memory = 943 MB
using 6144 buffers containing 52508 KB of memory
BIOS32 rev. 0 found at 0xf14e0
mainbus0 (root)
cpu0 at mainbus0: (uniprocessor)
cpu0: Intel Pentium 4 (686-class), 2790.97 MHz, id 0xf27
cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: I-cache 12K uOp cache 8-way, D-cache 8 KB 64b/line 4-way
cpu0: L2 cache 512 KB 64b/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: 16 page colors
At first I had it running FreeBSD-4.8, but I couldn't get VINUM to work
quite right on it, so we ended up installing NetBSD-current (-current
because the system has a pair of wm ethernets and a dual-channel mpt
controller).
As a result I've had a first-hand look at some of the differences and
similarities between the two systems on such a machine.
One of the baseline tests I've been running is Bonnie, such as this pair
of runs done on the ATA drive in the system:
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
FBSD-ATA 3000 45998 33.5 44012 10.1 22631  7.9 42423 34.4 45041  6.2 152.1  0.4
NBSD-ATA 3000 41865 34.7 40128 13.2 10464  2.8 40413 34.4 40832  6.0  93.9  0.5
So far so good w.r.t. STDIO (though the rewrite speed is astoundingly
low on NetBSD for unknown reasons).
However, as soon as I move on to significantly faster devices, the peak
per-char input rate I see on NetBSD is noticeably (~9%) lower than what
I see on FreeBSD:
              -------Sequential Output-------- ---Sequential Input--  --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char-  --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  K/sec %CPU  /sec %CPU
Free-VN  3000 97956 69.6 97411 27.2 25448  8.2 97323 81.2 103413 16.5 579.4  2.2
Net-RF   3000 78534 66.6 78926 26.5 13895  5.9 87662 81.2 119517 26.2 276.9  2.9
These particular numbers look a little suspicious on the output side,
given the CPU wasn't maxed out.....
However on the input side the differences between FreeBSD and NetBSD are
much clearer to see since it's obvious the data could come off the
filesystem faster, as it does for block-input. Note the CPUs were going
full out too -- the last 19% was going to VINUM and RAIDframe.
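For reference, the relative differences in the Free-VN and Net-RF rows
above can be worked out with a few lines of C. This is only a sketch of
the arithmetic; the helper names pct_lower() and report() are made up
for illustration, and the inputs are the K/sec figures copied from that
one pair of runs.

```c
#include <stdio.h>

/*
 * Percentage by which `other` falls short of `base`.
 * The result is negative when `other` is actually higher.
 */
static double
pct_lower(double base, double other)
{
	return (base - other) / base * 100.0;
}

/* Print the NetBSD shortfall for one Bonnie column (values in K/sec). */
static void
report(const char *column, double freebsd, double netbsd)
{
	printf("%-16s %6.1f%% lower on NetBSD\n", column,
	    pct_lower(freebsd, netbsd));
}
```

Calling report("per-char input", 97323, 87662) gives a shortfall of just
under 10% for this particular pair of runs, which is in the same
neighbourhood as the roughly 9% quoted above. The same call with the
block-input figures (103413 and 119517) gives a negative number, since
NetBSD is the faster of the two there.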
(Note to Greg Oster if he sees this: the original numbers I showed you
were much lower for per-char throughput because I had initially forgotten
the -O2 when compiling bonnie! Luckily though, I suppose, the optimizer
makes no difference to block-I/O.... :-)
I haven't looked closely to see if there are any differences in the
getc()/putc() implementations, but on first glance I don't see anything
obvious. I also haven't disassembled the resulting code to see what it
looks like.
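One way to narrow this down would be to time a bare getc() loop outside
of Bonnie. The following is a minimal sketch of such a loop, not Bonnie's
actual code; count_chars() and getc_rate() are hypothetical names, and
wall-clock timing with gettimeofday() is an assumption about how one
might measure it.

```c
#include <stdio.h>
#include <sys/time.h>

/* Read every byte of fp with getc(); return the number of bytes read. */
static long long
count_chars(FILE *fp)
{
	long long n = 0;
	int c;

	while ((c = getc(fp)) != EOF)
		n++;
	return n;
}

/*
 * Time count_chars() over the named file using wall-clock time.
 * Returns the rate in K/sec, 0 if the run was too short to time,
 * or -1 if the file could not be opened.
 */
static double
getc_rate(const char *path)
{
	struct timeval t0, t1;
	double elapsed;
	long long n;
	FILE *fp;

	if ((fp = fopen(path, "r")) == NULL)
		return -1.0;
	gettimeofday(&t0, NULL);
	n = count_chars(fp);
	gettimeofday(&t1, NULL);
	fclose(fp);

	elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
	return elapsed > 0 ? n / 1024.0 / elapsed : 0.0;
}
```

Built with the same -O2 on both systems and pointed at the same large
file, getc_rate() should show whether the gap follows the stdio code or
the storage underneath it. The file needs to be well beyond the size of
RAM (the Bonnie runs above used 3000 MB on a 1 GB machine), otherwise the
result mostly measures the buffer cache.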
FreeBSD-4.8 also uses GCC 2.95.3 (though there may be differences of
course, even for i386).
I didn't use any special compiler options to build bonnie (just -O2 on
both systems), and the libc's were stock from the binaries I installed
with (which for NetBSD was the 20030425 snapshot from releng).
Just to close with some good news for NetBSD: it seems NetBSD has
better overall throughput for many kinds of filesystem jobs, including
decent "postmark" simulated loads, while running on the very same bare
disks on the very same machine. This particular set of Postmark
parameters shows it well enough:
set transactions 5000
set number 5000
set size 100 150000
set read 1024
set write 1024
set bias read 7
set subdirectories 100
5000/5000@0.1K-146K          FB-ATA  NB-ATA  FB-SCSI  NB-SCSI
Total Time to run (secs)         51      36       89       63
Transactions per second         192     312      125      113
Data read (Kbytes/sec)         5260    7450     3190     4260
Data written (Kbytes/sec)     11690   16560     7100     9460
Take these numbers with at least a small grain of salt. I'm not sure I
had exactly figured out how to properly control "newfs" to make the
filesystem layouts identical at the time these numbers were collected --
it may be that FreeBSD had some room for improvement, though I'm
guessing a lot of the speed seen on NetBSD comes from the unified buffer
cache. Note also this is FreeBSD-4.8 vs. NetBSD-current.
It's even more dramatic for job loads that fit almost entirely into
the buffer cache!
set bias read 5
set bias create 5
set size 10240 20480
set transactions 100000
set subdirectories 10
set number 10000
10000/100000@10K-20K         FB-ATA  NB-ATA  FB-SCSI  NB-RAID-10
Total Time to run (secs)        303      45      271         101
Transactions per second         349    2325      404        1020
Data read (Kbytes/sec)         2640   17810     2960        7930
Data written (Kbytes/sec)      3190   21500     3570        9580
All these numbers, except for the total time to run, are of course just
plain silly from the point of view of disk device testing, but for
smaller jobs like this they show just how important the performance of
the buffer cache can be. I'm pretty sure that back before the unified
buffer cache came along these numbers would have made NetBSD look very
bad.
--
Greg A. Woods
+1 416 218-0098; <g.a.woods@ieee.org>; <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>