Subject: Memory RD/WR and copy (aligned problems)
To: None <port-hpcmips@netbsd.org>
From: Pai-Hsiang Hsiao <shawn@eecs.harvard.edu>
List: port-hpcmips
Date: 01/10/2000 18:04:46
I got more numbers today.

First, by replacing bcopy in libc with the C version (the kernel uses
the C version of bcopy, not the assembly version) and using my own
buggy bcopy bandwidth measurement program, I get about a 3x
performance improvement. The program may not do the timing or setup
right, so take these numbers only as a rough indication.

  bcopy   assembly       C
   8KB      16MB/s  53MB/s
  16KB       7MB/s  22MB/s
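
For anyone who wants to reproduce this kind of number, here is a
minimal sketch of such a measurement. It is not the program used for
the table above: c_bcopy, copy_bandwidth_mb and the byte-at-a-time loop
are assumptions made for illustration, the copy does not handle
overlapping buffers, and 1MB is taken as 1024*1024 bytes.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

/* Keeps the compiler from discarding the copies as dead code. */
static volatile unsigned char sink;

/* Plain C copy with bcopy's (src, dst, len) argument order.
 * Hypothetical stand-in for "the C version"; no overlap handling. */
static void c_bcopy(const void *src, void *dst, size_t len)
{
    const unsigned char *s = src;
    unsigned char *d = dst;

    while (len-- > 0)
        *d++ = *s++;
}

/* Copy `len` bytes `iters` times and return the bandwidth in MB/s.
 * Returns -1.0 if allocation fails, 0.0 if the elapsed time was too
 * short for gettimeofday() to resolve. */
static double copy_bandwidth_mb(size_t len, int iters)
{
    unsigned char *src = malloc(len);
    unsigned char *dst = malloc(len);
    struct timeval t0, t1;
    double secs, mbytes;
    int i;

    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    /* Touch both buffers first so page faults are not timed. */
    memset(src, 0xa5, len);
    memset(dst, 0, len);

    gettimeofday(&t0, NULL);
    for (i = 0; i < iters; i++) {
        c_bcopy(src, dst, len);
        sink = dst[len - 1];
    }
    gettimeofday(&t1, NULL);

    secs = (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_usec - t0.tv_usec) / 1e6;
    mbytes = (double)len * (double)iters / (1024.0 * 1024.0);

    free(src);
    free(dst);
    return secs > 0.0 ? mbytes / secs : 0.0;
}
```

Calling copy_bandwidth_mb(8192, 1000) and copy_bandwidth_mb(16384, 1000)
would correspond to the 8KB and 16KB rows; the iteration count is
arbitrary and should be large enough that the timer resolution does not
dominate.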

Then I used the new libc with hbench-OS, which still yields similar
numbers (3.5MB/s) for aligned bcopy, but 11MB/s for unaligned bcopy.
Weird, right?

So I dug into the program and tried different unalignment offsets to
see what's going on. (In the program, "aligned" means the start of the
buffer is aligned to an 8K boundary; "unaligned" means it is not.)

  bcopy 2K
  unalignment offset        C
                   0   3.3MB/s
                   1   0.9MB/s
                   2   0.9MB/s
                   4   4.4MB/s
                   8   6.2MB/s
                  16  11.2MB/s
                 128  11.2MB/s

The default offset for "unaligned" is 128.
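
To make the meaning of "unalignment offset" concrete, here is a minimal
sketch of one way to get a buffer that starts a given number of bytes
past an 8K boundary. This is an illustration, not the hbench-OS code:
offset_buffer and ALIGN_BOUNDARY are hypothetical names, and whether
the benchmark offsets the source, the destination, or both is not
something this sketch claims to know.

```c
#include <stdint.h>
#include <stdlib.h>

#define ALIGN_BOUNDARY 8192u   /* the 8K boundary described above */

/* Return a pointer that lies exactly `offset` bytes past an 8K
 * boundary and has at least `len` usable bytes behind it.
 * `*raw` receives the pointer that must later be passed to free().
 * Returns NULL if allocation fails. */
static unsigned char *offset_buffer(size_t len, size_t offset, void **raw)
{
    uintptr_t base;

    /* Over-allocate so that rounding up and adding the offset still
     * leave `len` bytes inside the allocation. */
    *raw = malloc(len + offset + ALIGN_BOUNDARY);
    if (*raw == NULL)
        return NULL;

    /* Round the raw address up to the next 8K boundary. */
    base = ((uintptr_t)*raw + ALIGN_BOUNDARY - 1)
         & ~(uintptr_t)(ALIGN_BOUNDARY - 1);

    return (unsigned char *)(base + offset);
}
```

With offset 0 this gives the "aligned" case; with 1, 2, 4, 8, 16 or 128
it gives the rows of the table, so the same copy loop can be timed at
each offset.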

This might explain the strange results I saw last time.

I don't know why the alignment makes so much difference. I suspect it
is caused by some TLB or cache code, but I don't know whether it is
the virtual address aliasing problem mentioned by Shin last time.

// Shawn