Subject: Re: shell script performance improvement
To: Toru Nishimura <nisimura@itc.aist-nara.ac.jp>
From: Simon Burge <simonb@netbsd.org>
List: port-mips
Date: 03/26/2000 19:42:43

Toru Nishimura wrote:

> Hello, guys.
> 
> Any qualified NetBSD/mips folks are asked to dig out the reason why
> shell scripts run slowly on NetBSD/mips.  My first guess is the cost
> of fork/exec operation is handicapped severely.
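
A rough way to check that guess on any machine is to time a loop that forks, has the child exit at once, and waits for it. The sketch below is only an illustration of that idea, not lmbench's own code; the function name `time_fork` and the default iteration count are assumptions made up for the example, and the Python interpreter adds overhead of its own, so the number is only useful for comparing systems against each other.

```python
import os
import time


def time_fork(iterations=100):
    """Return mean wall-clock microseconds per fork + child exit + wait.

    Hypothetical helper for illustration; not part of lmbench.
    """
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately without running cleanup handlers.
            os._exit(0)
        # Parent: reap the child so the loop measures the full cycle.
        os.waitpid(pid, 0)
    elapsed = time.perf_counter() - start
    return elapsed * 1e6 / iterations


if __name__ == "__main__":
    print(f"fork+exit+wait: {time_fork():.0f} us per iteration")
```

Running the same script under both operating systems on the same hardware gives a like-for-like comparison of process creation cost.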

Here's part of some old lmbench results I had lying around comparing a
5000/240 running Ultrix 4.5 vs. NetBSD 1.4.1, as well as a 5000/260 with
Ultrix 4.5 (the row lmbench reports at 117 MHz; the 39 MHz rows are the
5000/240).

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host                 OS  Mhz null null      open selct sig  sig  fork exec sh  
                             call  I/O stat clos       inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
mips-dec-    ULTRIX 4.5   39  7.9       177  226 0.78K 18.2   88 3.5K  11K  24K
mips-dec-    ULTRIX 4.5  117  3.7  28.   80   99 0.39K 13.8   41 5.6K  14K  30K
pmax-netb  NetBSD 1.4.1   39  8.2  39.  248  322 0.68K 23.6   58 116K 232K 466K

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
mips-dec-    ULTRIX 4.5   13    186    923   392   1029     440    1062
mips-dec-    ULTRIX 4.5   46    356    963   251   1493     296    1738
pmax-netb  NetBSD 1.4.1   46    271    837   426    940     443     964

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
mips-dec-    ULTRIX 4.5    13   228  292   975         635      23006
mips-dec-    ULTRIX 4.5    46   101  146   404         302       1678
pmax-netb  NetBSD 1.4.1    46   295  280   628         644       3262

File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host                 OS   0K File      10K File      Mmap    Prot    Page       
                        Create Delete Create Delete  Latency Fault   Fault 
--------- ------------- ------ ------ ------ ------  ------- -----   ----- 
mips-dec-    ULTRIX 4.5    254     58    751    131                      
mips-dec-    ULTRIX 4.5    189     61   1265    128                       
pmax-netb  NetBSD 1.4.1   2272   1204   4347   2941    49203    14    2.8K

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
mips-dec-    ULTRIX 4.5    8   11                        23     23   33    86
mips-dec-    ULTRIX 4.5   16   10           1            10      9   23    18
pmax-netb  NetBSD 1.4.1    8    8    5      9     33     24     23   33    86

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host                 OS   Mhz  L1 $   L2 $    Main mem    Guesses
--------- -------------   ---  ----   ----    --------    -------
mips-dec-    ULTRIX 4.5    39    50   1472        1456    No L2 cache?
mips-dec-    ULTRIX 4.5   117    23    281        1269
pmax-netb  NetBSD 1.4.1    39    50   1260        1456    No L2 cache?

Look at the first group - on the same 5000/240, a fork takes 116,000 us
under NetBSD/pmax and 3,500 us under Ultrix, about 33 times longer!!
Exec is similar (232,000 vs. 11,000 us, about 21 times longer).  I'll
try to get some benchmarks on a 5000/2{4,6}0 running -current unless
anyone beats me to it.
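
The slowdown factors implied by the first table can be worked out directly. The sketch below copies the fork, exec and sh figures from the two 39 MHz rows; lmbench prints them rounded (3.5K, 11K and so on), so the ratios are approximate. The names `ultrix`, `netbsd` and `slowdown` are made up for this example.

```python
# Times in microseconds, copied from the two 39 MHz rows of the
# "Processor, Processes" table (values as rounded by lmbench).
ultrix = {"fork": 3500, "exec": 11000, "sh": 24000}
netbsd = {"fork": 116000, "exec": 232000, "sh": 466000}


def slowdown(slow, fast):
    """Return how many times longer each operation takes in `slow`."""
    return {name: slow[name] / fast[name] for name in fast}


ratios = slowdown(netbsd, ultrix)
for name, ratio in ratios.items():
    print(f"{name:4s} NetBSD/Ultrix = {ratio:.1f}x")
```

That works out to roughly 33x for fork, 21x for exec and 19x for sh, so the penalty is largest for the bare fork.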

Don't take too much notice of the file create/delete times; the Ultrix
boxes had PrestoServe (NVRAM write caching), so those numbers aren't
comparable.

Simon.