Subject: Re: ARM9 cache routines updated
To: None <port-arm@netbsd.org>
From: Hiroyuki Bessho <bsh@grotto.jp>
List: port-arm
Date: 02/09/2004 18:46:17
Richard Earnshaw <rearnsha@arm.com> writes:

>
> If anyone has access to the various Samsung ARM920-based boards I'd be 
> interested to hear how this affects performance.
>

 Here are lmbench results from an SMDK2410.


                 L M B E N C H  1 . 9   S U M M A R Y
                 ------------------------------------
		 (Alpha software, do not distribute)

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host                 OS  Mhz null null      open selct sig  sig  fork exec sh  
                             call  I/O stat clos       inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
2410-a     NetBSD 1.6ZI  200  3.1  19.   74  1650 0.51K 10.8   26 7.1K  31K  77K
2410-a     NetBSD 1.6ZI  200  3.1  19.   75  1654 0.52K 10.9   26 7.1K  31K  77K
2410-b     NetBSD 1.6ZI  200  1.8  10.   41  1566 0.29K  6.4   16 6.2K  27K  67K
2410-b     NetBSD 1.6ZI  200  1.8  10.   41  1595 0.29K  6.4   16 6.2K  27K  67K
2410-c     NetBSD 1.6ZI  200  3.1  19.   74  1527 0.51K 10.7   26 7.2K  29K  72K
2410-c     NetBSD 1.6ZI  200  3.1  19.   73  1599 0.50K 10.6   26 7.2K  29K  72K
2410-d     NetBSD 1.6ZI  200  1.5  9.5   36  1431 0.26K  5.3   13 5.8K  24K  59K
2410-d     NetBSD 1.6ZI  200  1.5  9.5   36  1478 0.26K  5.4   13 5.8K  24K  59K

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
2410-a     NetBSD 1.6ZI  267    631          627            634        
2410-a     NetBSD 1.6ZI  267    626          624            630        
2410-b     NetBSD 1.6ZI  265    635          631            654        
2410-b     NetBSD 1.6ZI  268    633          631            659        
2410-c     NetBSD 1.6ZI  320    687          680            695        
2410-c     NetBSD 1.6ZI  320    685          679            691        
2410-d     NetBSD 1.6ZI  287    663          656            657        
2410-d     NetBSD 1.6ZI  289    661          656            669        

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
2410-a     NetBSD 1.6ZI   267   586  794                             
2410-a     NetBSD 1.6ZI   267   586  798                             
2410-b     NetBSD 1.6ZI   265   564  770                             
2410-b     NetBSD 1.6ZI   268   563  771                             
2410-c     NetBSD 1.6ZI   320   689  881                             
2410-c     NetBSD 1.6ZI   320   693  881                             
2410-d     NetBSD 1.6ZI   287   596  787                             
2410-d     NetBSD 1.6ZI   289   596  787                             

File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host                 OS   0K File      10K File      Mmap    Prot    Page	
                        Create Delete Create Delete  Latency Fault   Fault 
--------- ------------- ------ ------ ------ ------  ------- -----   ----- 
2410-a     NetBSD 1.6ZI    465    337   3333    321     2027          9.9K
2410-a     NetBSD 1.6ZI    458    312   3333    326     2091          9.9K
2410-b     NetBSD 1.6ZI    442    292   3225    302     1568          9.7K
2410-b     NetBSD 1.6ZI    442    302   3225    314     1582          9.7K
2410-c     NetBSD 1.6ZI    442    290   3225    310     2215          9.9K
2410-c     NetBSD 1.6ZI    436    301   3225    313     2105          9.8K
2410-d     NetBSD 1.6ZI    411    273   3125    286     1515          9.5K
2410-d     NetBSD 1.6ZI    409    277   3125    294     1500          9.5K

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
2410-a     NetBSD 1.6ZI   28    4   -1     14     45     41     40   45   189
2410-a     NetBSD 1.6ZI   27    4   -1     14     45     41     40   45   189
2410-b     NetBSD 1.6ZI   28    4   -1     15     54     42     42   54   192
2410-b     NetBSD 1.6ZI   28    4   -1     15     54     42     42   54   192
2410-c     NetBSD 1.6ZI   24    4   -1     14     45     41     40   45   189
2410-c     NetBSD 1.6ZI   24    4   -1     14     45     41     40   45   189
2410-d     NetBSD 1.6ZI   27    4   -1     16     54     42     42   54   193
2410-d     NetBSD 1.6ZI   27    4   -1     16     54     42     42   54   193

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host                 OS   Mhz  L1 $   L2 $    Main mem    Guesses
--------- -------------   ---  ----   ----    --------    -------
2410-a     NetBSD 1.6ZI   200    20    483         495    No L2 cache?
2410-a     NetBSD 1.6ZI   200    20    484         495    No L2 cache?
2410-b     NetBSD 1.6ZI   200    10    483         495    No L2 cache?
2410-b     NetBSD 1.6ZI   200    10    483         495    No L2 cache?
2410-c     NetBSD 1.6ZI   200    20    483         495    No L2 cache?
2410-c     NetBSD 1.6ZI   200    20    483         495    No L2 cache?
2410-d     NetBSD 1.6ZI   200    10    483         495    No L2 cache?
2410-d     NetBSD 1.6ZI   200    10    483         495    No L2 cache?


  The kernels were built from -current source as of 2004-Feb-04, with
the following changes:

  2410-a: backed out both the write-back d-cache changes and the
          clocking-mode bits fix in arm9_setup().
          (using sys/arm/include/cpufunc.h:1.29, sys/arm/arm/cpufunc.c:1.65, 
          sys/arm/arm/cpufunc_asm_arm9.S:1.2)

  2410-b: with the clocking-mode bits fix in arm9_setup(), but without
          the write-back d-cache changes.

  2410-c: with the write-back d-cache changes, but without the
          clocking-mode bits fix.


  2410-d: with both the write-back d-cache changes and the clocking-mode
          bits fix.


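  The results show that the clocking-mode bits fix improves nearly every
test (context-switch times are essentially unchanged), while the
write-back d-cache changes lower performance on some tests, notably
context switching and pipe/AF_UNIX latency.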
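
  For anyone who wants to compare the four kernels numerically, here is a
minimal sketch that computes the percent change of each kernel against
2410-a. The numbers are the first run of each kernel for four latency
rows copied from the tables above; the choice of rows and the names
`results`, `pct_change` and `compare` are only illustrative, not part of
lmbench.

```python
# Latencies in microseconds (smaller is better), first run of each kernel,
# copied from the lmbench tables in this message. The selection of rows is
# an illustrative assumption, not a complete summary.
results = {
    "null call":     {"a": 3.1,  "b": 1.8,  "c": 3.1,  "d": 1.5},
    "2p/0K ctxsw":   {"a": 267,  "b": 265,  "c": 320,  "d": 287},
    "pipe latency":  {"a": 586,  "b": 564,  "c": 689,  "d": 596},
    "mmap latency":  {"a": 2027, "b": 1568, "c": 2215, "d": 1515},
}


def pct_change(base, new):
    """Percent change relative to base; negative means lower latency."""
    return (new - base) / base * 100.0


def compare(results, baseline="a"):
    """Return {test: {kernel: percent change vs. baseline}}."""
    table = {}
    for test, by_kernel in results.items():
        base = by_kernel[baseline]
        table[test] = {
            kernel: round(pct_change(base, value), 1)
            for kernel, value in by_kernel.items()
            if kernel != baseline
        }
    return table


if __name__ == "__main__":
    for test, row in compare(results).items():
        cells = "  ".join(f"2410-{k}: {v:+.1f}%" for k, v in sorted(row.items()))
        print(f"{test:14s} {cells}")
```

  With these inputs the null call latency of 2410-b comes out roughly 42%
lower than 2410-a, while the 2p/0K context switch of 2410-c comes out
higher than 2410-a.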

--
bsh.