Port-arm archive


Re: Lenovo ThinkSystem HR330A



On Tue, 24 Sep 2019 09:56:31 -0600 (MDT)
Swift Griggs <swiftgriggs%gmail.com@localhost> wrote:

> On Tue, 24 Sep 2019, Sad Clouds wrote:
> > I have raspberry pi 3 where NetBSD has timer issues/bugs which make
> > it unusable, not sure if this is related to this particular system
> > or if this is a generic NetBSD issue with aarch64.
> 
> I've had similar results from my RPi hardware. ARM offers a lot more
> than it ever has hardware-wise, and this server shows a lot of
> muscle, but I wonder if anyone has turned loose something like
> byte-bench on it a few dozen times in a row.
> 
> I'll have to go scrape Geekbench's site to see if anyone has
> benchmark'd that system. The clock rates are surprisingly high to my
> untrained-to-ARM eye.

CPU clock rate can be misleading. Quite often, the faster the clock
rate, the more time the CPU spends waiting for memory access, etc.
You also need to take into account which types of instructions you
need to be fast, e.g. if you do a lot of scientific processing, then
fast floating point is going to be important to you.
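
To put rough numbers on the memory-wait point, here is a toy model in
Python. It assumes memory latency is fixed in nanoseconds while core
work scales with the clock. Every parameter value (cycles per
operation, misses per operation, 80 ns latency) is hypothetical and
chosen only for illustration; none of them were measured on the
systems discussed here.

```python
# Toy model: fixed memory latency in nanoseconds, core work in cycles.
# All default parameter values are hypothetical, for illustration only.

def stall_fraction(freq_ghz, core_cycles_per_op=1.0,
                   misses_per_op=0.01, mem_latency_ns=80.0):
    """Fraction of time per operation spent waiting for memory."""
    compute_ns = core_cycles_per_op / freq_ghz  # shrinks as the clock rises
    stall_ns = misses_per_op * mem_latency_ns   # independent of the clock
    return stall_ns / (compute_ns + stall_ns)

if __name__ == "__main__":
    for ghz in (1.4, 2.4):
        share = stall_fraction(ghz)
        print(f"{ghz:.1f} GHz: {share:.0%} of time waiting on memory")
```

With these made-up inputs the share of time spent waiting grows from
roughly half at 1.4 GHz to about two thirds at 2.4 GHz, because only
the compute part gets shorter as the clock goes up.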

I've been developing various performance tools in order to compare
different hardware platforms. Below is an example where a 1.4 GHz RPi3
ARM CPU seems to outperform a 2.4 GHz x86_64 CPU on integer division.

Intel Xeon E5620 @ 2.4 GHz:
$ .obj/sv_cpu -loop=1000000 -threads=1 -ta=seq
Per-thread metrics:
  T1      Add (Mops): int32=1590.69, int64=1594.50, flt=1594.80, dbl=2194.19, ldbl=708.42
          Sub (Mops): int32=2392.06, int64=2392.37, flt=2390.53, dbl=2393.83, ldbl=708.73
          -----------
          Mul (Mops): int32=2378.29, int64=2389.63, flt=2388.18, dbl=2392.04, ldbl=708.84
          Div (Mops): int32=217.44, int64=87.30, flt=341.73, dbl=341.71, ldbl=341.68
          -----------
          And (Mops): int32=2390.50, int64=2392.15
          Or  (Mops): int32=2392.38, int64=2393.83
          XOr (Mops): int32=2390.60, int64=2392.20
          RoL (Mops): int32=3189.19, int64=3191.77
          RoR (Mops): int32=3186.07, int64=3188.87
          -----------
          H64 (MiBs): 8117.98

ARM Cortex A53 @ 1.4 GHz:
$ .obj/sv_cpu -loop=1000000 -threads=1 -ta=seq
Per-thread metrics:
  T1      Add (Mops): int32=1093.34, int64=1092.83, flt=994.20, dbl=994.67, ldbl=17.13
          Sub (Mops): int32=1119.89, int64=1119.40, flt=1017.84, dbl=1017.83, ldbl=12.78
          -----------
          Mul (Mops): int32=932.97, int64=559.90, flt=1017.85, dbl=1017.84, ldbl=10.81
          Div (Mops): int32=430.66, int64=430.67, flt=139.95, dbl=73.66, ldbl=8.78
          -----------
          And (Mops): int32=1119.89, int64=1119.57
          Or  (Mops): int32=1119.63, int64=1119.89
          XOr (Mops): int32=1119.62, int64=1119.61
          RoL (Mops): int32=2239.59, int64=2238.45
          RoR (Mops): int32=2239.65, int64=2239.59
          -----------
          H64 (MiBs): 1900.89


Most metrics are given in Mops (millions of operations per second).
For integer division, the ARM Cortex A53 has significantly higher
throughput:

Intel Xeon: Div (Mops): int32=217.44, int64=87.30
ARM Cortex: Div (Mops): int32=430.66, int64=430.67

OK, the compiler versions were different, but it's still quite
impressive that the AArch64 ARM CPU is potentially about 5 times
faster on 64-bit integer division while running at a 1 GHz lower
clock rate.
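
For anyone who wants to check the arithmetic, here is a small Python
sketch that parses the two Div lines from the runs above and derives
the throughput ratio plus an approximate cycles-per-operation figure.
The helper names (parse_mops, cycles_per_op) are made up for this
example and are not part of sv_cpu. The cycles estimate assumes both
CPUs ran at their nominal clock rates (no turbo or throttling), which
was not verified.

```python
import re

# Div lines copied from the two sv_cpu runs in this post.
XEON_DIV = "Div (Mops): int32=217.44, int64=87.30, flt=341.73, dbl=341.71, ldbl=341.68"
A53_DIV = "Div (Mops): int32=430.66, int64=430.67, flt=139.95, dbl=73.66, ldbl=8.78"

def parse_mops(line):
    """Turn the 'name=value' pairs of one metrics line into a dict of floats."""
    return {name: float(value)
            for name, value in re.findall(r"(\w+)=([\d.]+)", line)}

def cycles_per_op(clock_ghz, mops):
    """Approximate cycles per operation, assuming the nominal clock rate."""
    return clock_ghz * 1000.0 / mops  # MHz divided by Mops

if __name__ == "__main__":
    xeon, a53 = parse_mops(XEON_DIV), parse_mops(A53_DIV)
    for t in ("int32", "int64"):
        print(f"{t}: A53/Xeon throughput = {a53[t] / xeon[t]:.2f}x, "
              f"cycles/op Xeon = {cycles_per_op(2.4, xeon[t]):.1f}, "
              f"A53 = {cycles_per_op(1.4, a53[t]):.1f}")
```

This gives a ratio of about 2x for int32 and about 4.9x for int64.
Normalised by clock rate, the gap is wider still: on these figures
the Xeon needs on the order of 27 cycles per 64-bit division, the
A53 a little over 3.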

