Subject: RE: problems regarding libc
To: Pai-Hsiang Hsiao <shawn@eecs.harvard.edu>
From: TAKEMURA, Shin <takemura@netbsd.org>
List: port-hpcmips
Date: 12/24/1999 12:10:19
-----Original Message-----
From: Pai-Hsiang Hsiao <shawn@eecs.harvard.edu>
To: TAKEMURA, Shin <takemura@netbsd.org>
Cc: port-hpcmips@netbsd.org <port-hpcmips@netbsd.org>
Date: Friday, December 24, 1999 8:23 AM
Subject: RE: problems regarding libc
>> I wrote a simple program which calls bzero(1MB) 100 times and
>> ran it under the time command on my MC-R500. The time command
>> says it takes 6.5 sec, so I get roughly 16MB/s. Is that enough?
>>
>> (The MC-R500 has a VR4111 clocked at 78MHz or 100MHz.)
>
>I don't know the theoretical bandwidth of either machine. Could
>you please try the C version of bcopy included in libc
>(libsa/string/) and see what the difference is?

Sorry, I couldn't find any differences.

>It might be better for you to test different block copy sizes,
>to avoid biased cache effects.


I tried that and got the following:

bzero speed on MC-R700:
   1KB:   69.3MB/s
   4KB:   87.9MB/s
   8KB:   88.6MB/s
  16KB:   16.2MB/s
  32KB:   16.3MB/s
  64KB:   16.3MB/s
 128KB:   16.3MB/s
 256KB:   16.3MB/s
 512KB:   16.2MB/s
1024KB:   16.2MB/s

bzero speed on my Celeron 300A PC:
   1KB: 1075.6MB/s
   4KB: 1910.4MB/s
   8KB: 2169.5MB/s
  16KB: 2285.7MB/s
  32KB:  853.3MB/s
  64KB:  656.4MB/s
 128KB:  301.9MB/s
 256KB:  255.5MB/s
 512KB:  215.5MB/s
1024KB:  210.5MB/s
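
A measurement like the tables above could be sketched roughly as follows. This is not the original test program: the names bzero_mb_per_s and bzero_sweep, the use of clock() for timing, and the 64MB total per block size are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>  /* bzero() */
#include <time.h>     /* clock(), CLOCKS_PER_SEC */

/*
 * Hypothetical helper (not the original test program): zero a buffer of
 * `size` bytes `reps` times and return the throughput in MB/s, where
 * 1MB = 1024 * 1024 bytes.  Returns -1.0 if the buffer cannot be
 * allocated or the elapsed CPU time is too small to measure.
 */
double bzero_mb_per_s(size_t size, unsigned long reps)
{
    /* Call through a volatile pointer so the compiler cannot drop
     * or merge the repeated calls. */
    void (*volatile zero)(void *, size_t) = bzero;
    char *buf;
    clock_t start, end;
    double secs;
    unsigned long i;

    assert(size > 0 && reps > 0);
    buf = malloc(size);
    if (buf == NULL)
        return -1.0;

    zero(buf, size);              /* touch the pages before timing */
    start = clock();
    for (i = 0; i < reps; i++)
        zero(buf, size);
    end = clock();
    free(buf);

    secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return -1.0;
    return ((double)size * (double)reps) / (1024.0 * 1024.0) / secs;
}

/*
 * Print one line per block size for powers of two from 1KB to 1024KB.
 * Zeroing 64MB in total at each size is an arbitrary choice.
 */
void bzero_sweep(void)
{
    const size_t total = 64UL * 1024 * 1024;
    size_t kb;

    for (kb = 1; kb <= 1024; kb *= 2)
        printf("%4luKB: %7.1fMB/s\n", (unsigned long)kb,
               bzero_mb_per_s(kb * 1024,
                              (unsigned long)(total / (kb * 1024))));
}
```

Calling bzero_sweep() from main() prints one row per block size. A sharp drop from one size to the next, like the one between 8KB and 16KB in the first table, suggests that the buffer no longer fits in the cache.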

>The difference here is huge (4MB/s vs. 160MB/s), though the benchmark
>program itself (hbench-OS, a patched version of lmbench) might have
>problems.
>
>My conjecture is that 16MB/s is still far from enough. If it's a 78MHz
>machine with a 32-bit memory bus, the memory (write/copy) bandwidth should
>not be as low as 16MB/s. That's why I believe there are some problems in
>the library's bzero/bcopy. FYI, another Pentium III 350MHz under test has
>more than 1000MB/s memory bandwidth.


Are you talking about first-level (L1) cache access speed? A Pentium III
350MHz system has memory clocked at 100MHz, so the maximum memory
bandwidth is 32bit x 100MHz = 400MB/s. And my Celeron 300A PC's
effective read/write speed is about 200MB/s.

My R500's CPU clock is 78MHz and I think that its bus width is 16 bits.
Generally speaking, a MIPS machine's memory bus clock is less than
half of the CPU clock. Thus the maximum speed is 78MHz/2 x 16bit = 78MB/s
and the effective speed may be 20-40MB/s. I think a speed of 16MB/s is so-so.
(My argument may be somewhat wrong because I'm not a hardware
engineer and I don't know much about memory systems.)
But 4MB/s is too slow.
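
The peak-bandwidth arithmetic above can be written as a small C helper. This is only a sketch: the name peak_mb_per_s is made up here, and it assumes one bus-wide transfer per bus clock with 1MB = 10^6 bytes, ignoring refresh, wait states and burst behaviour.

```c
#include <assert.h>

/*
 * Hypothetical helper: theoretical peak bandwidth in MB/s for a bus of
 * `bus_bits` bits clocked at `bus_mhz` MHz, assuming exactly one
 * bus-wide transfer per clock (1MB = 10^6 bytes here).
 */
double peak_mb_per_s(unsigned bus_bits, double bus_mhz)
{
    assert(bus_bits % 8 == 0);        /* whole bytes only */
    return (bus_bits / 8.0) * bus_mhz;
}
```

With these assumptions, peak_mb_per_s(32, 100.0) gives the 400MB/s figure and peak_mb_per_s(16, 78.0 / 2) gives the 78MB/s figure.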

The kernel sometimes makes a page uncacheable to avoid virtual alias
problems. When that happens, memory access speed gets dramatically worse.
I wonder whether that is the cause of your problem.

Takemura