Port-vax archive

Re: Some more patches for GCC on NetBSD/VAX coming soon...



Great, thanks for running that! I'll try to finish up the test cases, but this is very interesting info already. It looks like D_float and G_float are similar in speed (about 62 cycles each for the 3-operand form), but the 2-operand version is faster than the 3-operand version, especially for D_float (about 29 versus 62 cycles), while on NVAX it's exactly the opposite. You also don't have to pay a huge penalty for "emul", but every other instruction is much slower, including 4 clock cycles for "nop". So much for the idea of aligning jumps to the nearest longword with nops as a general performance improvement.
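For reading the quoted output below, the "# cycles" and "ips" figures can be reproduced from the elapsed times, assuming they are simply the elapsed time divided by an iteration count and scaled by the clock rate. This is a minimal sketch under that assumption, not the code from cyclecount.c: the function names are hypothetical, and the 720,000,000 iteration count used for "nop" is inferred from the reported numbers rather than stated anywhere in the thread.

```c
/* Sketch only: reproduces the arithmetic behind the "# cycles" and "ips"
 * columns, assuming cycles = (elapsed usec / iterations) * clock in MHz.
 * Function names are hypothetical, not taken from cyclecount.c. */

/* Average clock cycles per instruction.
 * elapsed_usec: total elapsed time in microseconds
 * iterations:   number of times the instruction was executed (assumed)
 * clock_mhz:    CPU clock in MHz (cycles per microsecond) */
double cycles_per_instruction(double elapsed_usec, double iterations,
                              double clock_mhz)
{
    return elapsed_usec / iterations * clock_mhz;
}

/* Average instructions per second over the same run. */
double instructions_per_second(double elapsed_usec, double iterations)
{
    return iterations / (elapsed_usec / 1e6);
}
```

With the "nop" line (41270288 usec at 72 MHz) and the inferred 720,000,000 iterations, this gives about 4.127 cycles and about 17.4 million instructions per second, matching the report. The other lines appear to use smaller iteration counts, for example 360,000,000 for the 32-bit multiply, again inferred from the figures.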

Perhaps "-falign-jumps=4" would be okay, since it only aligns branch targets that can be reached solely by jumping, so the padding nops are never executed. That is different from "-falign-loops", which pads with nops ahead of the loop entry point, where they can be executed on the way into the loop. Finding the optimal alignment for data as well as code is one thing I'm trying to figure out. It may be that no VAX really cares too much about memory alignment, but I suspect the newer ones pay more of a relative penalty when data is misaligned.
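One way to probe the data-alignment question would be to time the same longword reads at different byte offsets into a buffer. The following is a minimal sketch of that idea, not part of the existing test program: the function names are hypothetical, and it assumes the compiler turns the 4-byte memcpy into a single longword move, which should be confirmed in the generated assembly before trusting any timings.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Sum `count` 32-bit values starting `offset` bytes into buf.
 * The caller must supply at least offset + 4 * count bytes.
 * memcpy keeps the access well-defined at any alignment; whether it
 * becomes one unaligned longword load is up to the compiler (assumption). */
uint32_t sum_longwords(const unsigned char *buf, size_t offset, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t v;
        memcpy(&v, buf + offset + 4 * i, sizeof v);
        sum += v;
    }
    return sum;
}

/* Run `passes` sweeps at the given offset and return the CPU time used,
 * in seconds. The accumulated sum is stored through `sink` so the
 * compiler cannot discard the loads. */
double time_sweeps(const unsigned char *buf, size_t offset, size_t count,
                   unsigned passes, uint32_t *sink)
{
    uint32_t acc = 0;
    clock_t start = clock();
    for (unsigned i = 0; i < passes; i++)
        acc += sum_longwords(buf, offset, count);
    *sink = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing time_sweeps() at offset 0 against offsets 1 through 3, over a buffer small enough to stay in cache, would give a rough per-access misalignment penalty for a given machine.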

At least I know the test program works okay for GCC tuning purposes. Thanks again!


> On Apr 1, 2016, at 13:32, Johnny Billquist <bqt%update.uu.se@localhost> wrote:
> 
> Results from a VAX 8650 then.
> Program compiled with cc -O2 -o cyclecount cyclecount.c
> 
> Krille:local/bqt# ./cyclecount
> loop overhead is 0.290213 usec
> elapsed time for nop: 41270288 usec
> # cycles at 72 MHz: 4.127029 (17445965 ips)
> elapsed time for 32-bit int multiply (2 op): 165203547 usec
> # cycles at 72 MHz: 33.040709 (2179130 ips)
> elapsed time for 32-bit int multiply (3 op, 1 reg): 185888344 usec
> # cycles at 72 MHz: 37.177669 (1936646 ips)
> elapsed time for 32-bit int multiply (3 op, 3 reg): 144865833 usec
> # cycles at 72 MHz: 28.973167 (2485058 ips)
> elapsed time for 16-bit int multiply (2 op): 33086337 usec
> # cycles at 72 MHz: 33.086337 (2176125 ips)
> elapsed time for 16-bit int multiply (3 op, 3 reg): 28985526 usec
> # cycles at 72 MHz: 28.985526 (2483998 ips)
> elapsed time for 8-bit int multiply (2 op): 99279460 usec
> # cycles at 72 MHz: 33.093153 (2175677 ips)
> elapsed time for 8-bit int multiply (3 op, 3 reg): 86891245 usec
> # cycles at 72 MHz: 28.963748 (2485866 ips)
> elapsed time for F_floating multiply (2 op): 67304807 usec
> # cycles at 72 MHz: 13.460961 (5348801 ips)
> elapsed time for F_floating multiply (3 op, 3 reg): 103582829 usec
> # cycles at 72 MHz: 20.716566 (3475480 ips)
> elapsed time for D_floating multiply (2 op): 144650844 usec
> # cycles at 72 MHz: 28.930169 (2488751 ips)
> elapsed time for D_floating multiply (3 op, 3 reg): 310407964 usec
> # cycles at 72 MHz: 62.081593 (1159764 ips)
> elapsed time for G_floating multiply (2 op): 268805457 usec
> # cycles at 72 MHz: 53.761091 (1339259 ips)
> elapsed time for G_floating multiply (3 op, 3 reg): 310712278 usec
> # cycles at 72 MHz: 62.142456 (1158628 ips)
> elapsed time for 32-bit int multiply-add (64-bit result): 37261313 usec
> # cycles at 72 MHz: 37.261313 (1932299 ips)
> 
> 
> Oh, and for the record:
> ?MCP-I-CPSRUN, CPU is still running
> >>>sho clock
>    FREQUENCY 72 Mhz, full rate, locked
>    SYS_CLOCK running
>    CPU_CLOCK running
> 
>    X1 = 40 Mhz
>    X2 = 50 Mhz
>    X3 = 68 Mhz
>    X4 = 72 Mhz (Normal)
>    X5 = 74 Mhz (High)
>    X6 = 76 Mhz
> 
> 	Johnny
> 


