tech-kern archive


Cache coloring effects on NetBSD/Alpha



Hello,

I have an Alpha Station DS15 and an Alpha Station DS25.

The DS15 and DS25 have the same CPU and chipset.
(See: https://en.wikipedia.org/wiki/AlphaServer#Titan_Family)

Alpha Station DS15 specs
CPU:EV68CB
CPU MHz: 1000
B-cache:2MB

Alpha Station DS25 specs
CPU:EV68CB
CPU MHz: 1000
B-cache:8MB

The difference is the B-cache size!

The EV68CB (21264C) processor has split primary instruction and data caches.
The instruction cache has a capacity of 64 KB and is two-way
set-associative (VIPT).
The data cache has a capacity of 64 KB and is two-way set-associative (VIPT).
The external L2 cache (B-cache) is direct-mapped (PIPT).

I ran the Himeno benchmark on the DS15 and the DS25.
(See: http://accc.riken.jp/2444.htm)
(Code: https://gist.githubusercontent.com/nullnilaki/d70f0ab4ff3bae2bbecb/raw/10522fe75ae34fa4fcc2d9864d5fc058af9a8968/himenoBMTxpa.c)

The results are:

DS15 on Tru64 UNIX:
MFLOPS measured : 86.027073 cpu : 59.370118
-------------------------------------------------------------
ds15> /usr/users/naruaki/gcc/local/gcc4/bin/gcc -v
Using built-in specs.
Target: alpha-dec-osf5.1b
Configured with: ../gcc-4.2.3/configure --prefix=/usr/local/gcc4
--enable-languages=c,c++ --enable-threads=posix --disable-nls
--host=alpha-dec-osf5.1b --without-gnu-ld --with-ld=/usr/ccs/bin/ld
--without-gnu-as --with-as=/usr/bin/as --disable-libssp
Thread model: posix
gcc version 4.2.3

ds15> /usr/users/naruaki/gcc/local/gcc4/bin/gcc -mcpu=ev67 -O3 himenoBMTxpa.c

# ./a.out
For example:
Grid-size= XS (32x32x64)
S (64x64x128)
M (128x128x256)
L (256x256x512)
XL (512x512x1024)
Grid-size = M
mimax = 128 mjmax = 128 mkmax = 256
imax = 127 jmax = 127 kmax =255
Start rehearsal measurement process.
Measure the performance in 3 times.
MFLOPS: 86.651842 time(s): 4.653320 1.733593e-03
Now, start the actual measurement process.
The loop will be excuted in 38 times
This will take about one minute.
Wait for a while
Loop executed for 38 times
Gosa : 1.531465e-03
MFLOPS measured : 86.027073 cpu : 59.370118
Score based on Pentium III 600MHz using Fortran 77: 1.049111
-------------------------------------------------------------

DS25 on Tru64 UNIX:
MFLOPS measured : 88.777215 cpu : 59.044922
-------------------------------------------------------------
ds25> /usr/users/naruaki/gcc/local/gcc4/bin/gcc -v
Using built-in specs.
Target: alpha-dec-osf5.1b
Configured with: ../gcc-4.2.3/configure --prefix=/usr/local/gcc4
--enable-languages=c,c++ --enable-threads=posix --disable-nls
--host=alpha-dec-osf5.1b --without-gnu-ld --with-ld=/usr/ccs/bin/ld
--without-gnu-as --with-as=/usr/bin/as --disable-libssp
Thread model: posix
gcc version 4.2.3

ds25> /usr/users/naruaki/gcc/local/gcc4/bin/gcc -mcpu=ev67 -O3 himenoBMTxpa.c

# ./a.out
For example:
Grid-size= XS (32x32x64)
S (64x64x128)
M (128x128x256)
L (256x256x512)
XL (512x512x1024)
Grid-size = M
mimax = 128 mjmax = 128 mkmax = 256
imax = 127 jmax = 127 kmax =255
Start rehearsal measurement process.
Measure the performance in 3 times.
MFLOPS: 89.448864 time(s): 4.507813 1.733593e-03
Now, start the actual measurement process.
The loop will be excuted in 39 times
This will take about one minute.
Wait for a while
Loop executed for 39 times
Gosa : 1.525480e-03
MFLOPS measured : 88.777215 cpu : 59.044922
Score based on Pentium III 600MHz using Fortran 77: 1.082649
-------------------------------------------------------------

I was satisfied, so I ran the benchmark on NetBSD/alpha current.

DS15 on NetBSD/alpha current:
MFLOPS measured : 41.800371 cpu : 57.877776
-------------------------------------------------------------
# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/lto-wrapper
Target: alpha--netbsd
Configured with:
/usr/src/tools/gcc/../../external/gpl3/gcc/dist/configure
--target=alpha--netbsd --enable-long-long
--enable-threads --with-bugurl=http://www.NetBSD.org/Misc/send-pr.html
--with-pkgversion='NetBSD nb2 20150115'
--with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-threads
--enable-libstdcxx-time=rt --enable-lto
--with-mpc-lib=/var/obj/mknative/alpha/usr/src/external/lgpl3/mpc/lib/libmpc
--with-mpfr-lib=/var/obj/mknative/alpha/usr/src/external/lgpl3/mpfr/lib/libmpfr
--with-gmp-lib=/var/obj/mknative/alpha/usr/src/external/lgpl3/gmp/lib/libgmp
--with-mpc-include=/usr/src/external/lgpl3/mpc/dist/src
--with-mpfr-include=/usr/src/external/lgpl3/mpfr/dist/src
--with-gmp-include=/usr/src/external/lgpl3/gmp/lib/libgmp/arch/alpha
--enable-tls --disable-multilib --disable-symvers
--disable-libstdcxx-pch --build=x86_64-unknown-netbsd6.0. --host=alpha--netbsd
--with-sysroot=/var/obj/mknative/alpha/usr/src/destdir.alpha
Thread model: posix
gcc version 4.8.5 (nb2 20150115)

# gcc -mcpu=ev67 -O3 himenoBMTxpa.c

# ./a.out
For example:
Grid-size= XS (32x32x64)
S (64x64x128)
M (128x128x256)
L (256x256x512)
XL (512x512x1024)
Grid-size = M
mimax = 128 mjmax = 128 mkmax = 256
imax = 127 jmax = 127 kmax =255
Start rehearsal measurement process.
Measure the performance in 3 times.
MFLOPS: 41.801047 time(s): 9.646140 1.733593e-03
Now, start the actual measurement process.
The loop will be excuted in 18 times
This will take about one minute.
Wait for a while
Loop executed for 18 times
Gosa : 1.599236e-03
MFLOPS measured : 41.800371 cpu : 57.877776
Score based on Pentium III 600MHz using Fortran 77: 0.509761
-------------------------------------------------------------

DS25 on NetBSD/alpha current:
MFLOPS measured : 43.540721 cpu : 58.651274
-------------------------------------------------------------
# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/lto-wrapper
Target: alpha--netbsd
Configured with:
/usr/src/tools/gcc/../../external/gpl3/gcc/dist/configure
--target=alpha--netbsd --enable-long-long
--enable-threads --with-bugurl=http://www.NetBSD.org/Misc/send-pr.html
--with-pkgversion='NetBSD nb2 20150115'
--with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-threads
--enable-libstdcxx-time=rt --enable-lto
--with-mpc-lib=/var/obj/mknative/alpha/usr/src/external/lgpl3/mpc/lib/libmpc
--with-mpfr-lib=/var/obj/mknative/alpha/usr/src/external/lgpl3/mpfr/lib/libmpfr
--with-gmp-lib=/var/obj/mknative/alpha/usr/src/external/lgpl3/gmp/lib/libgmp
--with-mpc-include=/usr/src/external/lgpl3/mpc/dist/src
--with-mpfr-include=/usr/src/external/lgpl3/mpfr/dist/src
--with-gmp-include=/usr/src/external/lgpl3/gmp/lib/libgmp/arch/alpha
--enable-tls --disable-multilib --disable-symvers
--disable-libstdcxx-pch --build=x86_64-unknown-netbsd6.0. --host=alpha--netbsd
--with-sysroot=/var/obj/mknative/alpha/usr/src/destdir.alpha
Thread model: posix
gcc version 4.8.5 (nb2 20150115)

# gcc -mcpu=ev67 -O3 himenoBMTxpa.c

# ./a.out
For example:
Grid-size= XS (32x32x64)
S (64x64x128)
M (128x128x256)
L (256x256x512)
XL (512x512x1024)
Grid-size = M
mimax = 128 mjmax = 128 mkmax = 256
imax = 127 jmax = 127 kmax =255
Start rehearsal measurement process.
Measure the performance in 3 times.
MFLOPS: 43.546577 time(s): 9.259482 1.733593e-03
Now, start the actual measurement process.
The loop will be excuted in 19 times
This will take about one minute.
Wait for a while
Loop executed for 19 times
Gosa : 1.597596e-03
MFLOPS measured : 43.540721 cpu : 58.651274
Score based on Pentium III 600MHz using Fortran 77: 0.530984
-------------------------------------------------------------

I am disappointed in that result.
I think the cause of the performance drop is the lack of page coloring.
On NetBSD/alpha, the L2 cache size is not set for dec6600 (uvmexp.ncolors == 1).
So I set the correct L2 cache size in the dec_6600_init function.
(See: http://nxr.netbsd.org/source/xref/src/sys/arch/alpha/alpha/dec_6600.c#103)
(Reference: http://nxr.netbsd.org/source/xref/src/sys/arch/alpha/alpha/dec_eb164.c#94)

-------------------------------------------------------------
#include <uvm/uvm_extern.h>

void
dec_6600_init(void)
{

    platform.family = "6600";
...
    /* DS15: 2 MB B-cache = 256 colors */
    uvmexp.ncolors = atop(2 * 1024 * 1024);

    /* or, on the DS25: 8 MB B-cache = 1024 colors */
    /* uvmexp.ncolors = atop(8 * 1024 * 1024); */
...
    /* enable Cchip and Pchip error interrupts */
    STQP(TS_C_DIM0) = 0xe000000000000000;
    STQP(TS_C_DIM1) = 0xe000000000000000;
}
-------------------------------------------------------------

I recompiled the kernel.
I had high hopes!

The results are:

DS15 on NetBSD/alpha with page coloring:
MFLOPS measured : 11.496620    cpu : 58.454680
-------------------------------------------------------------
# ./a.out
For example:
 Grid-size= XS (32x32x64)
        S  (64x64x128)
        M  (128x128x256)
        L  (256x256x512)
        XL (512x512x1024)

Grid-size = M

mimax = 128 mjmax = 128 mkmax = 256
imax = 127 jmax = 127 kmax =255
Start rehearsal measurement process.
Measure the performance in 3 times.

MFLOPS: 11.496454 time(s): 35.073314 1.733593e-03

Now, start the actual measurement process.
The loop will be excuted in 5 times
This will take about one minute.
Wait for a while

Loop executed for 5 times
Gosa : 1.684019e-03
MFLOPS measured : 11.496620    cpu : 58.454680
Score based on Pentium III 600MHz using Fortran 77: 0.140203
-------------------------------------------------------------

DS25 on NetBSD/alpha with page coloring:
MFLOPS measured : 11.775470    cpu : 57.070440
-------------------------------------------------------------
# ./a.out
For example:
 Grid-size= XS (32x32x64)
        S  (64x64x128)
        M  (128x128x256)
        L  (256x256x512)
        XL (512x512x1024)

Grid-size = M

mimax = 128 mjmax = 128 mkmax = 256
imax = 127 jmax = 127 kmax =255
Start rehearsal measurement process.
Measure the performance in 3 times.

MFLOPS: 11.773811 time(s): 34.247088 1.733593e-03

Now, start the actual measurement process.
The loop will be excuted in 5 times
This will take about one minute.
Wait for a while

Loop executed for 5 times
Gosa : 1.684019e-03
MFLOPS measured : 11.775470    cpu : 57.070440
Score based on Pentium III 600MHz using Fortran 77: 0.143603
-------------------------------------------------------------

I am very, very disappointed.

Please let me know the cause.

I am throwing "UNIX Systems for Modern Architectures" by Schimmel in
the trashcan!!!

--
Naruaki.Etomi
nullnilaki%gmail.com@localhost

