Port-vax archive
Re: Bountysource campaign for gcc-vax
On Sun, 25 Oct 2020, Maciej W. Rozycki wrote:
> > > > === gcc Summary ===
> > > >
> > > > # of expected passes 110142
> > > > # of unexpected failures 1821
> > > > # of unexpected successes 7
> > > > # of expected failures 566
> > > > # of unresolved testcases 30
> > > > # of unsupported tests 3131
> > > > /scratch/vol0/vax-netbsd/obj/gcc/gcc/xgcc version 11.0.0 20200704 (experimental) (GCC)
FAOD, throughout this report I'll refer to test results with GCC as it
stands as "base results" and to test results with all my changes applied
as "final results"; by no means do I consider the effort complete at this
stage, but I think I'm pretty close.
Full testing takes between 42 and 68 hours, depending on whether and how
quickly I kill a runaway `chi2_q_uniform_random.exe' process from the
libstdc++ testsuite, which is also the testsuite that takes the most time.
The following components are not supported at this time for the reasons
stated and therefore could not have been verified:
1. libada -- not ported to VAX/NetBSD, machine/OS bindings are not
present.
2. libgfortran -- oddly enough for Fortran, a piece of it requires IEEE 754
floating-point arithmetic, which the VAX does not implement (possibly a
porting problem too).
3. libgo -- not ported to VAX/NetBSD, machine/OS bindings are not present.
Originally libgomp was broken and didn't build due to an ICE, but I have
now figured out that it gets fixed by Matt's old fix for GCC PR
target/58901, which wasn't upstreamed. Therefore for libgomp I have only
run quick final testing and I have no reference base results at this
point. I'll include Matt's fix in the upcoming submission as I think it's
important.
I now have full final test results as follows:
1. === gnat Summary ===
# of expected passes 1829
# of unexpected failures 161
# of unexpected successes 23
# of unresolved testcases 941
# of unsupported tests 29
-- no change compared to base results.
2. === gcc Summary ===
# of expected passes 112963
# of unexpected failures 1886
# of unexpected successes 7
# of expected failures 665
# of unresolved testcases 28
# of unsupported tests 3161
-- there have been the following progressions:
-FAIL: gcc.dg/pr83623.c (internal compiler error)
-FAIL: gcc.dg/pr83623.c (test for excess errors)
-FAIL: gcc.dg/torture/vshuf-v16qi.c -O2 (internal compiler error)
-FAIL: gcc.dg/torture/vshuf-v16qi.c -O2 (test for excess errors)
-FAIL: gcc.dg/torture/vshuf-v4hi.c -O2 (internal compiler error)
-FAIL: gcc.dg/torture/vshuf-v4hi.c -O2 (test for excess errors)
-FAIL: gcc.dg/torture/vshuf-v8hi.c -O2 (internal compiler error)
-FAIL: gcc.dg/torture/vshuf-v8hi.c -O2 (test for excess errors)
-FAIL: gcc.dg/torture/vshuf-v8qi.c -O2 (internal compiler error)
-FAIL: gcc.dg/torture/vshuf-v8qi.c -O2 (test for excess errors)
-FAIL: gcc.target/vax/bswapdi-1.c (test for excess errors)
-- and the following regression (discussed previously):
+FAIL: gcc.dg/lto/pr55660 c_lto_pr55660_0.o-c_lto_pr55660_1.o link, -O2 -flto -flto-partition=none -fuse-linker-plugin -fno-fat-lto-objects
The results are overall slightly worse than those obtained with the
20200704 snapshot, but I'm not concerned about it, as my changes are
neutral with respect to this degradation and cleaning the backend up is
outside the scope of the project.
3. === g++ Summary ===
# of expected passes 179033
# of unexpected failures 1278
# of expected failures 754
# of unresolved testcases 11
# of unsupported tests 8946
-- no change compared to base results.
4. === gdc Summary ===
# of expected passes 3580
# of unexpected failures 23
# of unsupported tests 1035
-- no change compared to base results.
5. === gfortran Summary ===
# of expected passes 18823
# of unexpected failures 16643
# of expected failures 177
# of unresolved testcases 14746
# of untested testcases 1752
# of unsupported tests 671
-- results look poor, but that is due to the lack of libgfortran (for
reasons stated above) leading to link errors:
gfortran: fatal error: cannot read spec file 'libgfortran.spec': No such file or directory
(arguably a test framework bug, as in the absence of libgfortran it should
downgrade all run or link tests to compilation tests). There have been a
couple of regressions too, included in the figures above, as follows:
+FAIL: gfortran.dg/class_61.f90 -O (test for excess errors)
+FAIL: gfortran.dg/graphite/PR67518.f90 -O (internal compiler error)
+FAIL: gfortran.dg/graphite/PR67518.f90 -O (test for excess errors)
of which the first turned out to be a red herring or a build environment
oddity, which disappeared with a clean rebuild, and the latter two (which
are really a single one) were caused by another mismatch between
predicates and constraints (discussed below), which I have now fixed, but
have not yet rerun full testing with.
6. === go Summary ===
# of expected passes 1770
# of unexpected failures 778
# of unresolved testcases 2
# of untested testcases 689
# of unsupported tests 3
-- again, poor results are due to libgo missing:
.../usr/bin/vax-netbsdelf-ld: cannot find -lgobegin
.../usr/bin/vax-netbsdelf-ld: cannot find -lgo
collect2: error: ld returned 1 exit status
and the same observations apply as with libgfortran. Also no change
compared to base results.
7. === objc Summary ===
# of expected passes 2688
# of unexpected failures 2
# of unsupported tests 68
-- no change compared to base results.
8. === obj-c++ Summary ===
# of expected passes 1449
# of unexpected failures 1
# of expected failures 2
# of unsupported tests 77
-- no change compared to base results (and almost clean results, wow!).
9. === libstdc++ Summary ===
# of expected passes 12247
# of unexpected failures 598
# of expected failures 96
# of unresolved testcases 103
# of unsupported tests 796
-- there have been the following progressions:
-FAIL: 27_io/basic_stringbuf/overflow/char/1.cc execution test
-FAIL: experimental/filesystem/iterators/directory_iterator.cc execution test
and the following regression:
+FAIL: 25_algorithms/set_symmetric_difference/constrained.cc (test for excess errors)
which is however the same issue as PR67518.f90 above.
10. === libffi Summary ===
# of expected passes 1819
# of unexpected failures 87
# of unsupported tests 30
-- no change compared to base results.
11. === libatomic Summary ===
# of expected passes 44
# of unsupported tests 5
-- clean results!
12. === libgomp Summary ===
# of expected passes 2646
# of unexpected failures 40
# of expected failures 4
# of unsupported tests 379
-- as noted above no base results.
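The -FAIL/+FAIL lines in the summaries above are diff notation: a `-' line
is a failure present in the base results only (a progression) and a `+'
line one present in the final results only (a regression). A minimal C
sketch of that classification follows, assuming the FAIL lines of each run
have already been extracted into arrays of strings; `in_list' and
`diff_fails' are hypothetical names for illustration, and in practice a
plain `diff' of the sorted FAIL lines does the same job.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return nonzero if string S appears among the N entries of LIST.  */
static int in_list(const char *const *list, size_t n, const char *s)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp(list[i], s) == 0)
      return 1;
  return 0;
}

/* Print "-" for FAILs found only in BASE (progressions) and "+" for
   FAILs found only in FINAL (regressions).  Stores the number of
   progressions in *PROGRESSIONS and returns the number of regressions.  */
static size_t diff_fails(const char *const *base, size_t nbase,
                         const char *const *final, size_t nfinal,
                         size_t *progressions)
{
  size_t regressions = 0;

  *progressions = 0;
  for (size_t i = 0; i < nbase; i++)
    if (!in_list(final, nfinal, base[i])) {
      printf("-%s\n", base[i]);
      ++*progressions;
    }
  for (size_t i = 0; i < nfinal; i++)
    if (!in_list(base, nbase, final[i])) {
      printf("+%s\n", final[i]);
      regressions++;
    }
  return regressions;
}
```

A quadratic lookup is fine for a sketch; for summaries with thousands of
FAILs sorting both lists first and merging them would scale better.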
> I have updated the changes significantly, including some required
> modifications to generic GCC code needed for scenarios not previously
> considered. I have also instrumented the test harness to run `size' on
> all executables run on the remote machine and record the output in the
> test log. That has turned out to collect 17253 samples.
>
> With that in place I ran a simple (well, maybe not so much) shell command
> to diff the `size' results and report any test cases whose text size has
> not decreased with my change in place. This resulted in these test cases
> only having grown with the old and the new text sizes respectively shown:
>
> 2317 2327 ./pr42833.exe
> 2367 2383 ./pr42833.exe
> 2367 2383 ./pr42833.exe
> 2367 2383 ./pr42833.exe
> 2323 2333 ./pr42833.exe
> 2367 2383 ./pr42833.exe
> 2109 2117 ./pr46316.exe
> 2073 2075 ./pr46316.exe
> 2073 2075 ./pr46316.exe
> 2073 2075 ./pr46316.exe
> 2023 2025 ./pr46316.exe
> 2073 2075 ./pr46316.exe
> 5929 5983 ./strlenopt-68.exe
> 2621 2623 ./interchange-0.exe
> 2419 2421 ./interchange-15.exe
> 2663 2665 ./interchange-5.exe
> 2419 2421 ./uns-interchange-15.exe
>
> (many test cases are run consecutively at different optimisation levels
> and unfortunately many of them do not care to make the file names of the
> binaries produced unique, consequently overwriting ones previously built).
>
> I'll yet see if there is anything significantly wrong with them, but
> otherwise I consider my code ready for final verification and I do not
> consider these size regressions showstoppers for change submission. After
> all the code size with 17253 - 17 = 17236 samples has decreased, and any
> corner-case pessimisations can be sorted out, where feasible, anytime.
So the size regressions turned out to be due to a bug in one of the
peepholes I have added for some common compare-branch sequences that the
comparison elimination pass was unable to fold, because the structure of
the insns involved did not fit the design expectations of the pass, and
then to a pessimisation with consecutive branches like:
107d9: d1 55 8f ff cmpl r5,$0xffffffff
107dd: ff ff ff
107e0: 19 68 blss 1084a <foo+0x7a>
107e2: 13 5d beql 10841 <foo+0x71>
which the new code expanded unnecessarily to:
107d9: d1 55 8f ff cmpl r5,$0xffffffff
107dd: ff ff ff
107e0: 19 76 blss 10858 <foo+0x88>
107e2: d1 55 8f ff cmpl r5,$0xffffffff
107e6: ff ff ff
107e9: 13 64 beql 1084f <foo+0x7f>
Both are now fixed, the second one after I rethought the handling of the
set of CC modes representing the various subsets of the condition codes
that different VAX instructions set, which I am particularly happy about.
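Why the second `cmpl' above is redundant can be seen in a small model of
the VAX condition codes: CMPL sets N (signed less), Z (equal), V (cleared)
and C (unsigned less) in one go, and conditional branches only read them.
The sketch below is a simplified C model, not backend code; the helper
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* VAX condition codes as set by CMPL.  */
struct vax_cc { bool n, z, v, c; };

/* CMPL src1,src2: N = src1 < src2 (signed), Z = src1 == src2, V = 0,
   C = src1 < src2 (unsigned).  Neither operand is modified.  */
static struct vax_cc cmpl(int32_t src1, int32_t src2)
{
  struct vax_cc cc;

  cc.n = src1 < src2;
  cc.z = src1 == src2;
  cc.v = false;
  cc.c = (uint32_t)src1 < (uint32_t)src2;
  return cc;
}

/* Conditional branches only read the condition codes.  */
static bool blss_taken(struct vax_cc cc) { return cc.n; }
static bool beql_taken(struct vax_cc cc) { return cc.z; }

/* "cmpl r5,$-1; blss L1; beql L2" with one compare.
   Returns 1 if L1 is taken, 2 if L2 is taken, 0 on fall-through.  */
static int branch_pair_one_cmp(int32_t r5)
{
  struct vax_cc cc = cmpl(r5, -1);

  if (blss_taken(cc))
    return 1;
  if (beql_taken(cc))
    return 2;
  return 0;
}

/* The same sequence with the extra compare the buggy code emitted.  */
static int branch_pair_two_cmps(int32_t r5)
{
  if (blss_taken(cmpl(r5, -1)))
    return 1;
  if (beql_taken(cmpl(r5, -1)))
    return 2;
  return 0;
}
```

Both variants take the same branch for every value of r5, so the second
compare only costs code size and time.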
Overall size results have got a little worse now, and I need to figure
out what the cause is. For example, I have observed regressions like this
one, from:
10861: 9e ef d9 12 movab 11b40 <p>,r0
10865: 00 00 50
10868: 90 a0 03 a0 movb 0x3(r0),0x2(r0)
1086c: 02
1086d: d1 60 8f 61 cmpl (r0),$0x64646261
10871: 62 64 64
10874: 13 07 beql 1087d <main_test+0x21>
to:
10861: 9e ef e1 12 movab 11b48 <p>,r0
10865: 00 00 50
10868: 90 a0 03 a0 movb 0x3(r0),0x2(r0)
1086c: 02
1086d: d1 ef d5 12 cmpl 11b48 <p>,$0x64646261
10871: 00 00 8f 61
10875: 62 64 64
10878: 13 07 beql 10881 <main_test+0x25>
and I'd like to understand why it has happened. I can't imagine the
displacement addressing mode (or even the absolute addressing mode, not
used with PIC code) being faster than the register deferred addressing
mode with any VAX implementation, though surely it doesn't have to be
slower either. And, all things being equal, the shorter encoding should
win.
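The byte counts in the two listings follow from VAX operand specifier
lengths: register deferred `(r0)' takes a single byte, while the
PC-relative longword displacement used for `p' takes five, as does the
longword immediate. A small C sketch of that arithmetic follows, covering
only the modes relevant here (short literals and other modes are left
out); the enum and function names are hypothetical.

```c
#include <stddef.h>

/* A subset of VAX operand specifier modes.  */
enum vax_mode {
  MODE_REGISTER,          /* Rn */
  MODE_REGISTER_DEFERRED, /* (Rn) */
  MODE_BYTE_DISP,         /* B^d(Rn) */
  MODE_WORD_DISP,         /* W^d(Rn) */
  MODE_LONG_DISP,         /* L^d(Rn), including PC-relative */
  MODE_IMMEDIATE          /* I^#n, encoded as (PC)+ */
};

/* Bytes taken by one operand specifier.  OPSIZE is the operand data
   size in bytes and only matters for immediates.  */
static size_t spec_len(enum vax_mode mode, size_t opsize)
{
  switch (mode) {
  case MODE_REGISTER:
  case MODE_REGISTER_DEFERRED:
    return 1;
  case MODE_BYTE_DISP:
    return 2;
  case MODE_WORD_DISP:
    return 3;
  case MODE_LONG_DISP:
    return 5;
  case MODE_IMMEDIATE:
    return 1 + opsize;
  }
  return 0;
}

/* CMPL: one opcode byte followed by two longword operand specifiers.  */
static size_t cmpl_len(enum vax_mode src1, enum vax_mode src2)
{
  return 1 + spec_len(src1, 4) + spec_len(src2, 4);
}
```

This gives 7 bytes for `cmpl (r0),$0x64646261' and 11 bytes for
`cmpl 11b48 <p>,$0x64646261', matching the address differences in the
listings.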
> Once that's complete I'll do final patch folding into self-contained
> pieces (e.g. to merge code updates with the respective test case additions
> which I keep as separate changes for easier verification in development)
> and post the resulting series upstream. I yet plan to add a number of
> proper test cases though to verify that the compare elimination pass does
> its job, using the template I have made for my own verification already.
I have not started patch folding yet, and the number of patches has since
increased. OTOH I have some test cases already prepared, which also
revealed an odd phenomenon where the comparison elimination pass was
unable to do its job due to a seemingly odd choice of instructions
resulting from code transformations where, e.g., a signed expression like
(x >= 0) used in the source code was transformed to (x > -1) in the
expand pass. This is of course equivalent, but doing a comparison against
-1 in preference to 0 is odd to say the least, because in the VAX
architecture a comparison against 0 is a side effect of many hardware
operations, whereas a comparison against -1 has to be done explicitly.
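The difference can be modelled in a few lines of C, assuming the
simplification that an instruction such as MOVL sets N and Z from its
result (as it does on the VAX): (x >= 0) then maps onto a plain BGEQ
(taken when N is clear), whereas (x > -1) needs an explicit
`cmpl x,$-1' before a BGTR (taken when both N and Z are clear). The
function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* N and Z condition codes as set by MOVL and many arithmetic
   instructions: an implicit comparison of the result against 0.  */
struct vax_nz { bool n, z; };

static struct vax_nz movl_cc(int32_t result)
{
  struct vax_nz cc = { result < 0, result == 0 };
  return cc;
}

/* x >= 0: BGEQ straight after the MOVL, no compare needed.  */
static bool x_ge_0_via_movl(int32_t x)
{
  return !movl_cc(x).n;
}

/* x > -1: an explicit CMPL x,$-1 sets N = (x < -1) and Z = (x == -1),
   then BGTR is taken when neither is set.  */
static bool x_gt_m1_via_cmpl(int32_t x)
{
  bool n = x < -1;
  bool z = x == -1;

  return !(n || z);
}
```

Both functions agree for every 32-bit value; only the second needs an
extra instruction.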
In the end it turned out to be the result of a bug in setting RTX costs
in the VAX backend, which returns 0 for constant zero, meaning
"unspecified" in the internal API. That in turn makes the caller assume a
single "fast instruction", which is `COSTS_N_INSNS (1)' or 4, whereas the
cost for -1 and small positive constants is set to 1, which is obviously
less than 4; hence the reluctance of the compiler to use constant zero.
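A simplified C model of that interaction follows. COSTS_N_INSNS is
defined as in GCC's rtl.h; the enum and both functions are hypothetical
stand-ins for the backend hook and its caller as described above, not the
actual GCC code.

```c
/* As defined in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

enum const_kind { KIND_ZERO, KIND_MINUS_ONE, KIND_SMALL_POSITIVE };

/* Old backend behaviour as described: 0 for constant zero,
   1 for -1 and small positive constants.  */
static int old_backend_cost(enum const_kind k)
{
  return k == KIND_ZERO ? 0 : 1;
}

/* Caller side: a cost of 0 means "unspecified" and is taken as
   one fast instruction.  */
static int effective_cost(int backend_cost)
{
  return backend_cost == 0 ? COSTS_N_INSNS (1) : backend_cost;
}
```

With this model constant zero ends up at 4 while -1 stays at 1, so a
cost-driven choice between (x >= 0) and (x > -1) picks the latter.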
To fix that I have decided to rescale the costs in terms of "fast
instructions" leaving the weights unchanged (i.e. effectively multiplying
all of them by four), except for constant zero, to which I gave the cost
of `COSTS_N_INSNS (1) / 2', giving it the deserved precedence and
yielding better machine code overall.
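The rescaling rule itself can be sketched in C as below, assuming the old
weight of a cost is available as a plain integer; `rescaled_cost' is a
hypothetical helper, not the backend's actual code.

```c
#include <stdbool.h>

/* As defined in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Express an old weight in fast instructions (i.e. multiply it by
   four), except that constant zero gets half a fast instruction.  */
static int rescaled_cost(int old_weight, bool is_const_zero)
{
  if (is_const_zero)
    return COSTS_N_INSNS (1) / 2;
  return COSTS_N_INSNS (old_weight);
}
```

Constant zero now costs 2, strictly less than the 4 of -1 and small
positive constants, while the relative order of all the other costs is
preserved.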
The other issue was the mismatch between predicates and constraints
mentioned above, in patterns like:
(define_insn "mulsidi3"
  [(set (match_operand:DI 0 "nonimmediate_operand" "=g")
        (mult:DI (sign_extend:DI
                  (match_operand:SI 1 "nonimmediate_operand" "nrmT"))
                 (sign_extend:DI
                  (match_operand:SI 2 "nonimmediate_operand" "nrmT"))))]
  ""
  "emul %1,%2,$0,%0")
Here we have operands #1 and #2 whose `nonimmediate_operand' predicates
exclude immediates, but whose constraints include `n', meaning an
immediate is permitted in assembly generation. Of course the hardware
instruction does permit immediates with its input operands, so I have no
idea why that predicate was chosen by whoever wrote this code.
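For reference, EMUL takes two longword multiplicands, a longword addend and
a quadword destination, and produces the 64-bit signed product plus the
sign-extended addend; the pattern always passes $0 as the addend. A C
sketch of the semantics the RTL above describes follows; the function
names are hypothetical.

```c
#include <stdint.h>

/* EMUL mulr,muld,add,prod: prod = mulr * muld + add, computed
   exactly in 64 bits with all inputs sign-extended.  */
static int64_t vax_emul(int32_t mulr, int32_t muld, int32_t add)
{
  return (int64_t)mulr * (int64_t)muld + (int64_t)add;
}

/* (mult:DI (sign_extend:DI op1) (sign_extend:DI op2)), as emitted
   by the pattern with a zero addend.  */
static int64_t mulsidi3(int32_t op1, int32_t op2)
{
  return vax_emul(op1, op2, 0);
}
```

Nothing in these semantics depends on whether op1 or op2 came from a
register, memory or an immediate, which is why the hardware accepts
immediates for the input operands.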
More importantly, the predicate is ineffective, because a register
operand is permitted: while RTL insns are matched by the expand pass,
pseudo registers will be supplied for any immediate operands with the
intent to reload them as appropriate in the reload pass. Then reload will
see the `n' constraint and happily supply the immediate directly anyway,
rather than reloading it into a hardware register (with a preceding
instruction).
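The sequence just described can be illustrated with a toy C model. It is
only a sketch under heavy simplification: the operand kinds, the
`has_equiv' flag and all function names are hypothetical, and GCC's real
expand and reload passes work very differently in detail.

```c
#include <stdbool.h>
#include <string.h>

enum opnd_kind { OP_IMMEDIATE, OP_PSEUDO, OP_HARD_REG, OP_MEM };

struct opnd {
  enum opnd_kind kind;
  long value;      /* the constant, or the constant a pseudo holds */
  bool has_equiv;  /* pseudo known to be equivalent to VALUE */
};

/* Toy version of the nonimmediate_operand predicate.  */
static bool nonimmediate_p(struct opnd op)
{
  return op.kind != OP_IMMEDIATE;
}

/* Expand: an immediate rejected by the predicate is put in a pseudo.  */
static struct opnd expand_operand(struct opnd op)
{
  if (!nonimmediate_p(op)) {
    struct opnd pseudo = { OP_PSEUDO, op.value, true };
    return pseudo;
  }
  return op;
}

/* Reload: a pseudo equivalent to a constant is replaced with that
   constant if the constraint string admits immediates ('n');
   otherwise the pseudo is given a hard register.  */
static struct opnd reload_operand(struct opnd op, const char *constraint)
{
  if (op.kind == OP_PSEUDO && op.has_equiv
      && strchr(constraint, 'n') != NULL) {
    struct opnd imm = { OP_IMMEDIATE, op.value, false };
    return imm;
  }
  if (op.kind == OP_PSEUDO) {
    struct opnd hard = { OP_HARD_REG, op.value, op.has_equiv };
    return hard;
  }
  return op;
}
```

With the "nrmT" constraint the constant comes back as an immediate that
the predicate would have rejected; with a constraint lacking `n' it ends
up in a hard register instead.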
What was a mere internal inconsistency turned into an ICE with my
changes, so I have fixed the predicates in the pattern above (and a few
more).
Consequently I have ended up with two extra patches, both of which need
to be fully regression-tested, so barring any further issues, and with
Matt's change added, I expect to have the ultimate test results in 4-5
days' time.
Maciej