Subject: Bonnie Numbers for a SE/30
To: None <port-mac68k@netbsd.org>
From: Joshua Coombs <jcoombs@gwi.net>
List: port-mac68k
Date: 11/07/2003 22:46:30
A while back I mentioned putting a silly HD in an otherwise stock SE/30,
and a request was made for bonnie numbers, so here you go. : )

Obligatory dmesg output:

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003
The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.

NetBSD 1.6.1_STABLE (68030) #0: Mon Oct 20 19:26:22 EDT 2003
    root@68030.x386.net:/usr/src/sys/arch/mac68k/compile/68030
Apple Macintosh SE/30  (68030)
cpu: delay factor 166
total memory = 128 MB
avail memory = 115 MB
using 1664 buffers containing 6656 KB of memory
mrg: 'Mac II class ROMs' ROM glue, tracing off, debug off, silent traps
mainbus0 (root)
obio0 at mainbus0
adb0 at obio0
asc0 at obio0: Apple Sound Chip
iwm0 at obio0: Apple GCR floppy disk controller
fd0 at iwm0 drive 0: (drive empty)
sbc0 at obio0: options=1<PDMA>
scsibus0 at sbc0: 8 targets, 8 luns per target
zsc0 at obio0 chip type 0
zsc0 channel 0: d_speed   9600 DCD clk 0 CTS clk 0
zstty0 at zsc0 channel 0
zsc0 channel 1: d_speed   9600 DCD clk 0 CTS clk 0
zstty1 at zsc0 channel 1
nubus0 at mainbus0
ae0 at nubus0 slot a: MacNIC II/E, 64KB memory
ae0: Ethernet address 00:00:94:31:9f:80
macvid0 at nubus0 slot e: Macintosh SE/30 Internal Video
macvid0: 512 x 342, monochrome
macfb0 at macvid0
wsdisplay0 at macfb0 (kbdmux ignored): console (std, vt100 emulation)
fpu0 at mainbus0 (mc68882)
ae0: NIC memory corrupt - invalid packet length 65280
adb0 (direct, II series): 0 targets
aed0 at adb0 addr 0: ADB Event device
scsibus0: waiting 2 seconds for devices to settle...
sd0 at scsibus0 target 0 lun 0: <QUANTUM, ATLAS 10K 36SCA, UCP0> SCSI3 0/direct fixed
sd0: 35037 MB, 10042 cyl, 24 head, 297 sec, 512 bytes/sect x 71755944 sectors
sd0: async, 8-bit transfers
boot device: sd0
root on sd0a dumps on sd0b
root file system type: ffs
IP Filter: v3.4.29 initialized.  Default = pass all, Logging = disabled

68030# uname -a
NetBSD 68030.x386.net 1.6.1_STABLE NetBSD 1.6.1_STABLE (68030) #0: Mon Oct 20 19:26:22 EDT 2003 root@68030.x386.net:/usr/src/sys/arch/mac68k/compile/68030 mac68k

68030# df
Filesystem  1K-blocks     Used     Avail Capacity  Mounted on
/dev/sd0a      496239    24145    447282     5%    /
/dev/sd0g    29710930  1287392  26937990     4%    /usr

So, as shown above, I have a 10k rpm Quantum 36GB HD installed and
running off the standard SCSI controller of the SE/30.  I've also pumped
this box up to 128MB of RAM.

For a completely invalid comparison, I'm including numbers from my
hopped-up 386: a Cyrix 486DLC 40 CPU, an Adaptec 1540CF SCSI controller,
and a 9GB 5.25" full height Seagate Elite HD (4500rpm?).

Now for some benchmarking... first a little dd muscle flexing...
68030# dd if=/dev/zero of=bigfile bs=1m count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 179.047 secs (585642 bytes/sec)

cyrix-dlc# dd if=/dev/zero of=bigfile bs=1m count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 79.461030 secs (1319610 bytes/sec)

68030# dd if=/dev/zero of=bigfile bs=512k count=200
200+0 records in
200+0 records out
104857600 bytes transferred in 159.507 secs (657385 bytes/sec)

cyrix-dlc# dd if=/dev/zero of=bigfile bs=512k count=200
200+0 records in
200+0 records out
104857600 bytes transferred in 64.567853 secs (1623991 bytes/sec)

68030# dd if=/dev/zero of=bigfile bs=256k count=400
400+0 records in
400+0 records out
104857600 bytes transferred in 167.490 secs (626052 bytes/sec)

cyrix-dlc# dd if=/dev/zero of=bigfile bs=256k count=400
400+0 records in
400+0 records out
104857600 bytes transferred in 64.977015 secs (1613765 bytes/sec)

68030# dd if=/dev/zero of=bigfile bs=128k count=800
800+0 records in
800+0 records out
104857600 bytes transferred in 208.870 secs (502023 bytes/sec)

cyrix-dlc# dd if=/dev/zero of=bigfile bs=128k count=800
800+0 records in
800+0 records out
104857600 bytes transferred in 64.439781 secs (1627218 bytes/sec)

68030# dd if=/dev/zero of=bigfile bs=64k count=1600
1600+0 records in
1600+0 records out
104857600 bytes transferred in 186.402 secs (562534 bytes/sec)

cyrix-dlc# dd if=/dev/zero of=bigfile bs=64k count=1600
1600+0 records in
1600+0 records out
104857600 bytes transferred in 64.733248 secs (1619841 bytes/sec)

68030# dd if=/dev/zero of=bigfile bs=32k count=3200
3200+0 records in
3200+0 records out
104857600 bytes transferred in 183.981 secs (569937 bytes/sec)

cyrix-dlc# dd if=/dev/zero of=bigfile bs=32k count=3200
3200+0 records in
3200+0 records out
104857600 bytes transferred in 63.885948 secs (1641325 bytes/sec)

68030# dd if=/dev/zero of=bigfile bs=16k count=6400
6400+0 records in
6400+0 records out
104857600 bytes transferred in 194.446 secs (539263 bytes/sec)

cyrix-dlc# dd if=/dev/zero of=bigfile bs=16k count=6400
6400+0 records in
6400+0 records out
104857600 bytes transferred in 67.650031 secs (1550001 bytes/sec)

68030# dd if=/dev/zero of=bigfile bs=4k count=25600
25600+0 records in
25600+0 records out
104857600 bytes transferred in 279.773 secs (374795 bytes/sec)

cyrix-dlc# dd if=/dev/zero of=bigfile bs=4k count=25600
25600+0 records in
25600+0 records out
104857600 bytes transferred in 84.386210 secs (1242592 bytes/sec)
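
As a sanity check on the runs above, here is a minimal Python sketch that
recomputes the bytes/sec figure from the byte count and elapsed time, and
works out how many times longer the 68030 took than the cyrix-dlc at each
block size.  The elapsed times are copied from the dd output above; the
names (ELAPSED, throughput, slowdown) are made up for illustration, and
truncating to whole bytes/sec is only an assumption about how dd rounds.

```python
# Elapsed seconds for writing 100 MB, copied from the dd runs above.
# Keys are dd block sizes; values are (68030, cyrix-dlc).
TOTAL_BYTES = 104857600

ELAPSED = {
    "1m":   (179.047, 79.461030),
    "512k": (159.507, 64.567853),
    "256k": (167.490, 64.977015),
    "128k": (208.870, 64.439781),
    "64k":  (186.402, 64.733248),
    "32k":  (183.981, 63.885948),
    "16k":  (194.446, 67.650031),
    "4k":   (279.773, 84.386210),
}

def throughput(seconds, total_bytes=TOTAL_BYTES):
    """Bytes per second, truncated to a whole number (assumed dd behaviour)."""
    return int(total_bytes / seconds)

def slowdown(block_size):
    """How many times longer the 68030 took than the cyrix-dlc."""
    m68k, cyrix = ELAPSED[block_size]
    return m68k / cyrix

if __name__ == "__main__":
    for size, (m68k, cyrix) in ELAPSED.items():
        print(f"{size:>5}  {throughput(m68k):>8}  {throughput(cyrix):>8}  "
              f"{slowdown(size):.2f}x")
```

At bs=1m the 68030 takes a bit over twice as long as the cyrix-dlc; at
bs=4k the gap grows to more than three times.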

ASCII Charting

68030     = *
cyrix-dlc = #

 Time to write 100MB using DD

     280 |              **
     260 |              **
     240 |              **
Time 220 |              **
 in  200 |      **    ****
Secs 180 |**    **********
     160 |****************
     140 |****************
     120 |****************
     100 |****************
      80 |##************##
      60 |################
      40 |################
      20 |################
       0 +----------------
          1 5 2 1 6 3 1 4
          0 1 5 2 4 2 6
          2 2 6 8
          4

          Segment Size in k

I'm assuming that part of the performance hit on the 68030 comes from
being CPU bound.  Near as I can tell the onboard SCSI is accessed via
polling and cannot do DMA, so the CPU is doing all the heavy lifting.
The 386 gets to use a SCSI interface that does busmastering and full DMA.
Other niceties like tagged queuing, etc. are also unobtainable with the
NCR chip as far as I know.  Once I get a chance to stuff my 68030
powercache in, I'll rerun to see how that helps.

To show theoretical max perf, here are the dd results for the same tests,
using /dev/null for output.

Block   Throughput (bytes/sec) 
Size   68030          cyrix-dlc

1024k  2695845b/s     9324512b/s
 512k  2719970b/s     9135125b/s
 256k  2643979b/s     8981335b/s
 128k  2651123b/s     9745324b/s
  64k  2565386b/s     8725273b/s
  32k  2388555b/s     7957978b/s
  16k  2111425b/s     6864039b/s
   4k  1238353b/s     4166912b/s

So, using those numbers, let's see what we get for bandwidth realization
as a percentage.  (Yup, I'm now officially off in space with this one.)

Block  % of available bandwidth achieved
Size   68030          cyrix-dlc

1024k  21.72%         14.15%
 512k  24.16%         17.77%
 256k  23.67%         17.96%
 128k  18.93%         16.69%
  64k  21.92%         18.56%
  32k  23.86%         20.62%
  16k  25.54%         22.58%
   4k  30.26%         29.82%
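
For anyone who wants to redo the arithmetic, the percentages above are
just the to-disk rate divided by the /dev/null rate.  A minimal Python
sketch follows; the rates are copied from this post, the names (RATES,
realization) are made up for illustration, and truncating to two decimals
(rather than rounding) is an assumption that matches the rows spot-checked
in the test.

```python
# Throughput in bytes/sec, copied from the dd runs in this post.
# Each entry: block size -> (rate writing to a file, rate writing to /dev/null).
RATES = {
    "68030": {
        "1024k": (585642, 2695845),
        "512k":  (657385, 2719970),
        "256k":  (626052, 2643979),
        "128k":  (502023, 2651123),
        "64k":   (562534, 2565386),
        "32k":   (569937, 2388555),
        "16k":   (539263, 2111425),
        "4k":    (374795, 1238353),
    },
    "cyrix-dlc": {
        "1024k": (1319610, 9324512),
        "512k":  (1623991, 9135125),
        "256k":  (1613765, 8981335),
        "128k":  (1627218, 9745324),
        "64k":   (1619841, 8725273),
        "32k":   (1641325, 7957978),
        "16k":   (1550001, 6864039),
        "4k":    (1242592, 4166912),
    },
}

def realization(disk_bps, null_bps):
    """Disk rate as a percentage of the /dev/null rate, truncated to 2 decimals."""
    # Integer arithmetic avoids floating-point surprises before truncation.
    return (disk_bps * 10000 // null_bps) / 100

if __name__ == "__main__":
    for size in RATES["68030"]:
        m68k = realization(*RATES["68030"][size])
        cyrix = realization(*RATES["cyrix-dlc"][size])
        print(f"{size:>6}  {m68k:5.2f}%  {cyrix:5.2f}%")
```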


68030     = *
cyrix-dlc = #

     Bandwidth Realization 

     30% |              *#
     28% |              ##
     26% |            **##
     24% |  ****    ****##
     22% |******  ****####
     20% |******  **######
     18% |**####**########
     16% |**##############
     14% |################
     12% |################
     10% |################
      8% |################
      6% |################
      4% |################
      2% |################
       0 +----------------
          1 5 2 1 6 3 1 4
          0 1 5 2 4 2 6
          2 2 6 8
          4

          Segment Size in k

If I were foolish enough to try and correlate these results as if they
were factual and relevant, I'd say this supports my theory that the
16MHz 68030 is my limiting factor.  The 68030 is getting closer to the
theoretical peak number than the faster cyrix-dlc, so the only way a
software speedup is going to occur is if something drastically improves
raw memory I/O in NetBSD.  At least, that's how I'd interpret this, if it
were valid.

Now, for Bonnie++ fun...

Note: the versions of bonnie++ are different enough on the two machines
to completely invalidate this data for comparison purposes, even ignoring
all the other inconsistencies.


68030#bonnie++ -u 0:0 -s 10
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
68030.x386.net  10M   140  96   736  83   646  94   123  96  1522  98 173.2  91
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16     6  97   265  98    72  72     6  95     8  98    17  91
68030.x386.net,10M,140,96,736,83,646,94,123,96,1522,98,173.2,91,16,6,97,265,98,72,72,6,95,8,98,17,91

cyrix-dlc#bonnie++ -u 0:0 -s 10
Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
cyrix-dlc.x386. 10M     1  98  1333  69  1381  70     2  99  5791  98 404.0 404
Latency              6758ms    1114ms   53933us    2961ms    4750us    1317ms
Version 1.93c       ------Sequential Create------ --------Random Create--------
cyrix-dlc.x386.net  -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   249  92   774  98   349  60   252  95   769  98   455  70
Latency              2197ms   44705us    9798ms    2169ms   22475us    3927ms
1.93c,1.93c,cyrix-dlc.x386.net,1,1068140104,10M,,1,98,1333,69,1381,70,2,99,5791,98,404.0,404,16,,,,,249,92,774,98,349,60,252,95,769,98,455,70,6758ms,1114ms,53933us,2961ms,4750us,1317ms,2197ms,44705us,9798ms,2169ms,22475us,3927ms
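
If you want to pull the 68030 numbers into a script, the comma-separated
line from the 1.03 run carries the same values, in the same order, as the
table above it.  Here is a rough Python sketch that labels them.  The
field names are made up for illustration and simply follow the table's
column order; the 1.93c line has a different layout with more fields, so
this sketch deliberately rejects it rather than guess.

```python
# Labels follow the column order of the bonnie++ 1.03 table in this post.
# The names themselves are hypothetical, chosen for this sketch only.
FIELDS_103 = [
    "machine", "size",
    "out_chr_kps", "out_chr_cpu", "out_block_kps", "out_block_cpu",
    "rewrite_kps", "rewrite_cpu",
    "in_chr_kps", "in_chr_cpu", "in_block_kps", "in_block_cpu",
    "seeks_ps", "seeks_cpu",
    "files",
    "seq_create_ps", "seq_create_cpu", "seq_read_ps", "seq_read_cpu",
    "seq_delete_ps", "seq_delete_cpu",
    "rand_create_ps", "rand_create_cpu", "rand_read_ps", "rand_read_cpu",
    "rand_delete_ps", "rand_delete_cpu",
]

# The CSV line printed by the 68030 run above.
LINE_103 = ("68030.x386.net,10M,140,96,736,83,646,94,123,96,1522,98,173.2,91,"
            "16,6,97,265,98,72,72,6,95,8,98,17,91")

def parse_103(line):
    """Split a 1.03-style CSV line into a dict keyed by FIELDS_103."""
    values = line.strip().split(",")
    if len(values) != len(FIELDS_103):
        raise ValueError(f"expected {len(FIELDS_103)} fields, got {len(values)}")
    row = dict(zip(FIELDS_103, values))
    # Everything after machine name and file size is numeric.
    for key in FIELDS_103[2:]:
        row[key] = float(row[key])
    return row

if __name__ == "__main__":
    row = parse_103(LINE_103)
    print(row["machine"], row["out_block_kps"], row["in_block_kps"])
```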

Full disclosure requires I point out the cyrix was running FreeBSD
4.8p13, apache+php, qmail and friends, and being accessed via SSH, so it
was not exactly 'idle' at the time of testing.  The 68030 was also
accessed via SSH, and running MySQL for a php-nuke clone hosted on the
cyrix-dlc.

So, the net result of this is that a slower machine with a faster hard
drive looks poor in a cooked test with no scientific basis.  Anyone have
any tweaking suggestions to help tune up the 68030 software-wise?

This concludes today's lesson on How NOT to Run a Comparison Bakeoff.

Joshua Coombs
http://www.outofspec.com
http://www.x386.net