Subject: BYTE Un*x benchmark results, was: Disapointing performances...
To: Vincent BARAT <Vincent.Barat@alcatel.fr>
From: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
List: port-mac68k
Date: 06/14/1998 21:20:44
At 8:54 +0200 on 12.06.1998, Vincent BARAT wrote:
>Bruce Anderson wrote:
>>
>> Your Centris 650 has a 16MHz I/O bus and no L2 cache.
>> Compare that with a 486 motherboard I have lying about:
>>  64-256K L2 and an I/O bus speed of 16 to 50MHz.

Not quite. The NuBus, if that is what you are referring to, is a 32-bit-wide
synchronous bus clocked at 10 MHz. An extension named "NuBus90", which
doubles the clock rate and is present in the Quadra-class machines, is
available to expansion cards but not used by the motherboard itself.
But this is not relevant for the benchmark in question, as all the
components that are needed (CPU, memory, SCSI interface) are on-board and
running more or less at CPU speed.

I wouldn't expect too much from a second-level cache, either: the Q700 I
run came with a Radius 128k second-level cache, and according to Speedometer 4
tests the gain is in the 0..10% range - not impressive.

>Don't you think it could be interesting to know bytemark results
>of a few of our machines ?
>
>You could download it and view Linux PC results at:
>
>http://www.silkroad.com/bass/linux/

Here we go... The machine in question is a Quadra 700 clocked at 33MHz
(true 68040/33), 128K second-level cache, 68MB RAM, 2MB VRAM, 4GB IBM DCAS
disk. Rather top of the range, as m68k Macintoshes go.


1) X11R6 running, clients: XEmacs 20.4, two xloads, some xterms, an oclock,
no loads otherwise; Benchmark compiled with -O2 -m68040 on 1.3E egcs 1.0.2.

  BYTE UNIX Benchmarks (Version 3.11)
  System -- NetBSD q700.hf.org 1.3B NetBSD 1.3B (FG54) #4: Sun Jan 25
22:02:13 CET 1998 hauke@q700:/usr/src/sys/arch/mac68k/compile/FG54 mac68k
  Start Benchmark Run: Fri Jun 12 20:54:40 CEST 1998
   1 interactive users.
Dhrystone 2 without register variables    32395.2 lps   (10 secs, 6 samples)
Dhrystone 2 using register variables      32436.3 lps   (10 secs, 6 samples)
Arithmetic Test (type = arithoh)          58998.0 lps   (10 secs, 6 samples)
Arithmetic Test (type = register)          4004.5 lps   (10 secs, 6 samples)
Arithmetic Test (type = short)             4509.5 lps   (10 secs, 6 samples)
Arithmetic Test (type = int)               4009.3 lps   (10 secs, 6 samples)
Arithmetic Test (type = long)              3963.0 lps   (10 secs, 6 samples)
Arithmetic Test (type = float)             3365.6 lps   (10 secs, 6 samples)
Arithmetic Test (type = double)            3390.8 lps   (10 secs, 6 samples)
System Call Overhead Test                 14008.8 lps   (10 secs, 6 samples)
Pipe Throughput Test                       2772.6 lps   (10 secs, 6 samples)
Pipe-based Context Switching Test          1524.3 lps   (10 secs, 6 samples)
Process Creation Test                        39.1 lps   (10 secs, 6 samples)
Execl Throughput Test                         4.0 lps   (10 secs, 6 samples)
File Read  (10 seconds)                   19653.0 KBps  (10 secs, 6 samples)
File Write (10 seconds)                     800.0 KBps  (10 secs, 6 samples)
File Copy  (10 seconds)                     915.0 KBps  (10 secs, 6 samples)
File Read  (30 seconds)                   21803.0 KBps  (30 secs, 6 samples)
File Write (30 seconds)                     600.0 KBps  (30 secs, 6 samples)
File Copy  (30 seconds)                     980.0 KBps  (30 secs, 6 samples)
C Compiler Test                              12.9 lpm   (60 secs, 3 samples)
Shell scripts (1 concurrent)                 19.9 lpm   (60 secs, 3 samples)
Shell scripts (2 concurrent)                 10.0 lpm   (60 secs, 3 samples)
Shell scripts (4 concurrent)                  5.0 lpm   (60 secs, 3 samples)
Shell scripts (8 concurrent)                  2.0 lpm   (60 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places           no measured results
Recursion Test--Tower of Hanoi              447.7 lps   (10 secs, 6 samples)


                     INDEX VALUES
TEST                                        BASELINE     RESULT      INDEX

Arithmetic Test (type = double)               2541.7     3390.8        1.3
Dhrystone 2 without register variables       22366.3    32395.2        1.4
Execl Throughput Test                           16.5        4.0        0.2
File Copy  (30 seconds)                        179.0      980.0        5.5
Pipe-based Context Switching Test             1318.5     1524.3        1.2
Shell scripts (8 concurrent)                     4.0        2.0        0.5
                                                                 =========
     SUM of  6 items                                                  10.2
     AVERAGE                                                           1.7
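
For anyone wanting to check the INDEX VALUES arithmetic: each index appears
to be the result divided by the baseline, with SUM and AVERAGE taken over the
unrounded ratios (summing the rounded INDEX column of run 1 would give 10.1,
not 10.2). That reading is an inference from the table, not something taken
from the benchmark source. A minimal Python sketch under that assumption;
the names RUN1 and index_summary are made up for illustration:

```python
# Rows of the INDEX VALUES table for run 1: (test, baseline, result).
RUN1 = [
    ("Arithmetic Test (type = double)", 2541.7, 3390.8),
    ("Dhrystone 2 without register variables", 22366.3, 32395.2),
    ("Execl Throughput Test", 16.5, 4.0),
    ("File Copy  (30 seconds)", 179.0, 980.0),
    ("Pipe-based Context Switching Test", 1318.5, 1524.3),
    ("Shell scripts (8 concurrent)", 4.0, 2.0),
]


def index_summary(rows):
    """Return (per-test indices, sum, average), all unrounded.

    Assumption: index = result / baseline, as the table suggests.
    """
    indices = [result / baseline for _, baseline, result in rows]
    total = sum(indices)
    return indices, total, total / len(indices)


if __name__ == "__main__":
    indices, total, average = index_summary(RUN1)
    for (name, _, _), idx in zip(RUN1, indices):
        print(f"{name:<42} {idx:5.1f}")
    print(f"SUM {total:.1f}  AVERAGE {average:.1f}")
```

Swapping in the rows of the other two runs should reproduce their SUM and
AVERAGE lines the same way.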



2) X11R6 running, clients: an xload, an xterm, an oclock, no loads
otherwise. Benchmark compiled with "-O3 -m68040 -static -fforce-mem
-fforce-addr -fomit-frame-pointer -finline-functions -funroll-loops" on 1.3E
egcs 1.0.2.

  BYTE UNIX Benchmarks (Version 3.11)
  System -- NetBSD q700.hf.org 1.3B NetBSD 1.3B (FG54) #4: Sun Jan 25
22:02:13 CET 1998 hauke@q700:/usr/src/sys/arch/mac68k/compile/FG54 mac68k
  Start Benchmark Run: Fri Jun 12 23:24:53 CEST 1998
   1 interactive users.
Dhrystone 2 without register variables    35107.8 lps   (10 secs, 6 samples)
Dhrystone 2 using register variables      34656.1 lps   (10 secs, 6 samples)
Arithmetic Test (type = arithoh)         879078.5 lps   (10 secs, 6 samples)
Arithmetic Test (type = register)          4096.5 lps   (10 secs, 6 samples)
Arithmetic Test (type = short)             5053.7 lps   (10 secs, 6 samples)
Arithmetic Test (type = int)               4095.6 lps   (10 secs, 6 samples)
Arithmetic Test (type = long)              4095.5 lps   (10 secs, 6 samples)
Arithmetic Test (type = float)             4072.7 lps   (10 secs, 6 samples)
Arithmetic Test (type = double)            4167.3 lps   (10 secs, 6 samples)
System Call Overhead Test                 15131.3 lps   (10 secs, 6 samples)
Pipe Throughput Test                       2818.8 lps   (10 secs, 6 samples)
Pipe-based Context Switching Test          1522.1 lps   (10 secs, 6 samples)
Process Creation Test                        71.3 lps   (10 secs, 6 samples)
Execl Throughput Test                        66.6 lps   (9 secs, 6 samples)
File Read  (10 seconds)                   19163.0 KBps  (10 secs, 6 samples)
File Write (10 seconds)                     800.0 KBps  (10 secs, 6 samples)
File Copy  (10 seconds)                     921.0 KBps  (10 secs, 6 samples)
File Read  (30 seconds)                   20964.0 KBps  (30 secs, 6 samples)
File Write (30 seconds)                     544.0 KBps  (30 secs, 6 samples)
File Copy  (30 seconds)                     981.0 KBps  (30 secs, 6 samples)
C Compiler Test                              13.3 lpm   (60 secs, 3 samples)
Shell scripts (1 concurrent)                 20.0 lpm   (60 secs, 3 samples)
Shell scripts (2 concurrent)                 10.0 lpm   (60 secs, 3 samples)
Shell scripts (4 concurrent)                  5.0 lpm   (60 secs, 3 samples)
Shell scripts (8 concurrent)                  2.0 lpm   (60 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places           no measured results
Recursion Test--Tower of Hanoi              508.0 lps   (10 secs, 6 samples)


                     INDEX VALUES
TEST                                        BASELINE     RESULT      INDEX

Arithmetic Test (type = double)               2541.7     4167.3        1.6
Dhrystone 2 without register variables       22366.3    35107.8        1.6
Execl Throughput Test                           16.5       66.6        4.0
File Copy  (30 seconds)                        179.0      981.0        5.5
Pipe-based Context Switching Test             1318.5     1522.1        1.2
Shell scripts (8 concurrent)                     4.0        2.0        0.5
                                                                 =========
     SUM of  6 items                                                  14.4
     AVERAGE                                                           2.4



3) Running single-user; Benchmark compiled with "-O3 -m68040 -static
-fforce-mem -fforce-addr -fomit-frame-pointer -finline-functions
-funroll-loops" on 1.3E egcs 1.0.2.

  BYTE UNIX Benchmarks (Version 3.11)
  System -- NetBSD q700.hf.org 1.3B NetBSD 1.3B (FG54) #4: Sun Jan 25
22:02:13 CET 1998 hauke@q700:/usr/src/sys/arch/mac68k/compile/FG54 mac68k
  Start Benchmark Run: Sat Jun 13 12:39:36 CEST 1998
   0 interactive users.
Dhrystone 2 without register variables    35383.0 lps   (10 secs, 6 samples)
Dhrystone 2 using register variables      35380.3 lps   (10 secs, 6 samples)
Arithmetic Test (type = arithoh)         886080.1 lps   (10 secs, 6 samples)
Arithmetic Test (type = register)          4134.5 lps   (10 secs, 6 samples)
Arithmetic Test (type = short)             5100.1 lps   (10 secs, 6 samples)
Arithmetic Test (type = int)               4135.4 lps   (10 secs, 6 samples)
Arithmetic Test (type = long)              4133.8 lps   (10 secs, 6 samples)
Arithmetic Test (type = float)             4143.1 lps   (10 secs, 6 samples)
Arithmetic Test (type = double)            4192.9 lps   (10 secs, 6 samples)
System Call Overhead Test                 15378.0 lps   (10 secs, 6 samples)
Pipe Throughput Test                       2893.2 lps   (10 secs, 6 samples)
Pipe-based Context Switching Test          1606.8 lps   (10 secs, 6 samples)
Process Creation Test                        75.0 lps   (10 secs, 6 samples)
Execl Throughput Test                        81.6 lps   (9 secs, 6 samples)
File Read  (10 seconds)                   19460.0 KBps  (10 secs, 6 samples)
File Write (10 seconds)                     800.0 KBps  (10 secs, 6 samples)
File Copy  (10 seconds)                     928.0 KBps  (10 secs, 6 samples)
File Read  (30 seconds)                   20745.0 KBps  (30 secs, 6 samples)
File Write (30 seconds)                     600.0 KBps  (30 secs, 6 samples)
File Copy  (30 seconds)                     995.0 KBps  (30 secs, 6 samples)
C Compiler Test                              13.6 lpm   (60 secs, 3 samples)
Shell scripts (1 concurrent)                 21.0 lpm   (60 secs, 3 samples)
Shell scripts (2 concurrent)                 11.0 lpm   (60 secs, 3 samples)
Shell scripts (4 concurrent)                  5.0 lpm   (60 secs, 3 samples)
Shell scripts (8 concurrent)                  2.0 lpm   (60 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places           no measured results
Recursion Test--Tower of Hanoi              516.9 lps   (10 secs, 6 samples)


                     INDEX VALUES
TEST                                        BASELINE     RESULT      INDEX

Arithmetic Test (type = double)               2541.7     4192.9        1.6
Dhrystone 2 without register variables       22366.3    35383.0        1.6
Execl Throughput Test                           16.5       81.6        4.9
File Copy  (30 seconds)                        179.0      995.0        5.6
Pipe-based Context Switching Test             1318.5     1606.8        1.2
Shell scripts (8 concurrent)                     4.0        2.0        0.5
                                                                 =========
     SUM of  6 items                                                  15.5
     AVERAGE                                                           2.6


Some annotations:

o  Results are qualitative, at best. Running single-user, the box lost 10
min (that's "ten minutes") against the wall clock during the benchmark. I
did not see the issue discussed on the web page; even systems with
higher-priority clock interrupts than ours run at splhigh() sometimes.

o  Some tests ("Execl throughput") were extremely sensitive to optimization
levels. I wouldn't expect to see any effect in real-life applications.

o  This is a 1.3B kernel. -current kernels with _vfork14() and Chuck
Cranor's VM system may well do better.

o  Looking at the list, the box comes out just behind 486DX2/66
machines. Disk I/O is rather weak in spite of the fast disk -- we don't
have bus-master DMA. Integer arithmetic is fine, mostly.
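
The sensitivity to build options mentioned in the annotations can be put into
numbers by dividing run 2 results by run 1 results. Note that the two runs
differ in more than the -O level (run 2 adds -static and further flags, and
has fewer X clients running), so the ratios cannot be pinned on a single
factor. A rough Python sketch; RUNS, speedups, most_sensitive and the 2x
threshold are arbitrary choices for illustration:

```python
# Selected raw results: test -> (run 1 value, run 2 value), copied from the
# two result listings. Units are lps, except File Copy, which is KBps.
RUNS = {
    "Dhrystone 2 without register variables": (32395.2, 35107.8),
    "Arithmetic Test (type = arithoh)": (58998.0, 879078.5),
    "Arithmetic Test (type = double)": (3390.8, 4167.3),
    "System Call Overhead Test": (14008.8, 15131.3),
    "Process Creation Test": (39.1, 71.3),
    "Execl Throughput Test": (4.0, 66.6),
    "File Copy  (30 seconds)": (980.0, 981.0),
}


def speedups(runs):
    """Ratio of run 2 to run 1 for every test."""
    return {name: after / before for name, (before, after) in runs.items()}


def most_sensitive(runs, threshold=2.0):
    """Tests whose result changed by more than a factor of `threshold`."""
    return [
        name
        for name, ratio in speedups(runs).items()
        if ratio > threshold or ratio < 1.0 / threshold
    ]


if __name__ == "__main__":
    for name, ratio in speedups(RUNS).items():
        print(f"{name:<42} x{ratio:6.2f}")
    print("changed by more than 2x:", most_sensitive(RUNS))
```

With the figures above, only the arithoh and Execl tests cross the 2x
threshold; everything else in this selection stays below it.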

	hauke


--
"It's never straight up and down"     (DEVO)