Subject: Re: Issue with large memory systems, and PPC overhead
To: Chuck Silvers <chuq@chuq.com>
From: Matt Thomas <matt@3am-software.com>
List: tech-kern
Date: 11/08/2002 13:09:37
At 12:54 AM 11/8/2002, Chuck Silvers wrote:
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ns/call  ns/call  name
>  12.04      4.20     4.20   190077 22096.31 31792.03  pmap_remove
>   6.48      6.46     2.26   120044 18826.43 18826.43  vcopypage
>   6.42      8.70     2.24   160293 13974.41 13974.41  __syncicache
>   5.48     10.61     1.91  2011224   949.67  1088.58  pmap_pvo_enter
>   5.19     12.42     1.81 11607659   155.93   155.93  splx

>and on the 604:
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  us/call  us/call  name
>  15.49      1.78     1.78    13044   136.46   136.46  pmap_copy_page
>   9.23      2.84     1.06    19081    55.55    63.34  pmap_remove
>   7.05      3.65     0.81    16708    48.48    48.48  __syncicache
>   4.70      4.19     0.54   200028     2.70     4.21  pmap_pvo_enter
>   3.57      4.60     0.41   618995     0.66     0.66  splx

It's interesting to see that the Altivec version of pmap_copy_page
(vcopypage) is so much faster than the non-Altivec version.
Given that pmap_remove is 32us/call on the G4/400 and 63us/call on the
604ev/180 (roughly scaling with CPU speed), the difference between
19us for vcopypage and 136us for pmap_copy_page is amazing.  Scaled the
same way, pmap_copy_page would take about 68us on the G4, so vcopypage
is over 3 times as fast as you'd expect pmap_copy_page to be there.
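
For anyone who wants to check the arithmetic, here is a rough sketch in
Python.  It assumes the total per-call time of pmap_remove is a fair proxy
for how the two CPUs scale, which is only a guess on my part; the constant
and function names are made up for illustration, and the numbers are copied
from the two profiles quoted above.

```python
# Per-call times taken from the quoted gprof output.
# The G4 profile reports ns/call, so convert to us/call first.
G4_PMAP_REMOVE_US = 31792.03 / 1000.0    # total ns/call -> us/call
G4_VCOPYPAGE_US = 18826.43 / 1000.0      # total ns/call -> us/call
P604_PMAP_REMOVE_US = 63.34              # total us/call
P604_COPY_PAGE_US = 136.46               # total us/call


def expected_g4_copy_us():
    """Scale the 604's pmap_copy_page time by the pmap_remove ratio.

    Assumption: pmap_remove slows down or speeds up between the two
    machines by the same factor that a non-Altivec page copy would.
    """
    scale = G4_PMAP_REMOVE_US / P604_PMAP_REMOVE_US
    return P604_COPY_PAGE_US * scale


def speedup_over_expected():
    """How much faster vcopypage is than the scaled estimate."""
    return expected_g4_copy_us() / G4_VCOPYPAGE_US


if __name__ == "__main__":
    print("expected non-Altivec copy on G4: %.1f us" % expected_g4_copy_us())
    print("vcopypage speedup over that:     %.2fx" % speedup_over_expected())
```

The estimate comes out a bit over 68us, against a measured 19us for
vcopypage, which is where the "over 3 times" figure comes from.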


-- 
Matt Thomas               Internet:   matt@3am-software.com
3am Software Foundry      WWW URL:    http://www.3am-software.com/bio/matt/
Cupertino, CA             Disclaimer: I avow all knowledge of this message