Port-vax archive
Re: Bountysource campaign for gcc-vax
On Wed, 4 Nov 2020, Maciej W. Rozycki wrote:
> 2. === gcc Summary ===
>
> # of expected passes 112963
> # of unexpected failures 1886
> # of unexpected successes 7
> # of expected failures 665
> # of unresolved testcases 28
> # of unsupported tests 3161
Final pre-upstream-submission report. Current results:
=== gcc Summary ===
# of expected passes 119308
# of unexpected failures 1853
# of unexpected successes 7
# of expected failures 665
# of unresolved testcases 27
# of unsupported tests 3442
-- and these progressions:
-FAIL: gcc.c-torture/compile/pr46883.c -Os (internal compiler error)
-FAIL: gcc.c-torture/compile/pr46883.c -Os (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-2.c -O2 (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-2.c -O2 (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-2.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-2.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-2.c -O3 -g (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-2.c -O3 -g (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-2.c -Os (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-2.c -Os (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-3.c -O2 (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-3.c -O2 (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-3.c -O3 -g (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-3.c -O3 -g (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-3.c -Os (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-3.c -Os (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
-FAIL: gcc.c-torture/execute/20120808-1.c -O2 (internal compiler error)
-FAIL: gcc.c-torture/execute/20120808-1.c -O2 (test for excess errors)
-FAIL: gcc.c-torture/execute/20120808-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error)
-FAIL: gcc.c-torture/execute/20120808-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
-FAIL: gcc.c-torture/execute/20120808-1.c -O3 -g (internal compiler error)
-FAIL: gcc.c-torture/execute/20120808-1.c -O3 -g (test for excess errors)
-FAIL: gcc.c-torture/execute/20120808-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)
-FAIL: gcc.c-torture/execute/20120808-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
-FAIL: gcc.c-torture/execute/20120808-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error)
-FAIL: gcc.c-torture/execute/20120808-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)
-FAIL: gcc.dg/20050629-1.c (internal compiler error)
-FAIL: gcc.dg/20050629-1.c (test for excess errors)
-FAIL: c-c++-common/torture/pr53505.c -Os (internal compiler error)
-FAIL: c-c++-common/torture/pr53505.c -Os (test for excess errors)
> 5. === gfortran Summary ===
>
> # of expected passes 18823
> # of unexpected failures 16643
> # of expected failures 177
> # of unresolved testcases 14746
> # of untested testcases 1752
> # of unsupported tests 671
Likewise:
=== gfortran Summary ===
# of expected passes 18825
# of unexpected failures 16638
# of expected failures 177
# of unresolved testcases 14746
# of untested testcases 1752
# of unsupported tests 671
-- and this single progression:
-FAIL: gfortran.dg/coarray_stopped_images_1.f08 -Os (internal compiler error)
> 9. === libstdc++ Summary ===
>
> # of expected passes 12247
> # of unexpected failures 598
> # of expected failures 96
> # of unresolved testcases 103
> # of unsupported tests 796
This continues to suffer from intermittent failures, many of which would,
it seems, go away with just a timeout increase (and then further test
cases would no longer time out as the system load went down). This
however would significantly increase the run time, especially due to the
cases that do hang. So I am not going to fiddle with the timeout at this
time (which BTW is set on a per-testsuite basis, so there is no need to
interfere with other parts of testing), but I may look into it later,
especially if I feel so inclined as to actually figure out what is causing
the worst offenders to hang.
No changes in test results otherwise.
> > I'll yet see if there is anything significantly wrong with them, but
> > otherwise I consider my code ready for final verification and I do not
> > consider these size regressions showstoppers for change submission. After
> > all the code size with 17253 - 17 = 17236 samples has decreased, and any
> > corner-case pessimisations can be sorted out, where feasible, anytime.
I have updated size results now. I have only analysed the C subset, as
going through all the test suites would take too much time. Overall there
have now been 17592 and 17586 samples collected, for the MODE_CC change
only and the whole patchset respectively (missing samples are of course
due to failed compilation in the less patched version). The size changes
are as follows:
                   MODE_CC change              whole patchset
              samples  average  median    samples  average  median
----------------------------------------------------------------------
regressions      1814   0.578%  0.198%        427   1.701%  0.523%
unchanged       15119   0.000%  0.000%         83   0.000%  0.000%
progressions      659   0.587%  0.194%      17076   0.660%  1.076%
total           17592                       17586
So we have lost a little from the MODE_CC change itself and gained a lot
from the preparatory changes.
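As a quick cross-check of the table, the following sketch (my own
illustration, not part of the measurement scripts; the names `counts`,
`totals` and `shares` are made up for the purpose) verifies that the
per-category sample counts add up to the quoted totals and prints each
category's share of the samples:

```python
# Sample counts copied from the size table above.
counts = {
    "MODE_CC change": {"regressions": 1814, "unchanged": 15119, "progressions": 659},
    "whole patchset": {"regressions": 427, "unchanged": 83, "progressions": 17076},
}
totals = {"MODE_CC change": 17592, "whole patchset": 17586}


def shares(column):
    """Return each category's share of the column's samples, in percent."""
    total = sum(column.values())
    return {name: 100.0 * n / total for name, n in column.items()}


for label, column in counts.items():
    # The three categories must account for every sample collected.
    assert sum(column.values()) == totals[label]
    summary = ", ".join(f"{name} {pct:.2f}%" for name, pct in shares(column).items())
    print(f"{label}: {summary}")
```

Note that the shares count samples only; they say nothing about how much
each sample has grown or shrunk, which is what the average and median
columns are for.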
There have been a few surprising outliers which I mean to look into:
  old    new  change  %change  filename
----------------------------------------------------
 2426   2950    +524  +21.599  20111208-1.exe
 2251   3055    +804  +35.717  990404-1.exe
 2931   4213   +1282  +43.739  pr57521.exe
 3043   5579   +2536  +83.339  20000422-1.exe
(these are whole-patchset figures) which are reflected in the average vs
median figures above. There have been a couple of unusual reductions too:
  old    new  change  %change  filename
----------------------------------------------------
 6289   4781   -1508  -23.978  vector-compare-1.exe
 6285   4781   -1504  -23.930  vector-compare-1.exe
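For reference, the change and %change columns are presumably just the
size difference and that difference relative to the old size. A minimal
sketch under that assumption (the helper `size_change` and the list
`samples` are names I made up; the sizes are copied from the two tables
above) reproduces the figures:

```python
# (old size, new size, filename) copied from the outlier tables above.
samples = [
    (2426, 2950, "20111208-1.exe"),
    (2251, 3055, "990404-1.exe"),
    (2931, 4213, "pr57521.exe"),
    (3043, 5579, "20000422-1.exe"),
    (6289, 4781, "vector-compare-1.exe"),
    (6285, 4781, "vector-compare-1.exe"),
]


def size_change(old, new):
    """Return (absolute change, change in percent of the old size)."""
    delta = new - old
    return delta, 100.0 * delta / old


rows = [(old, new) + size_change(old, new) + (name,) for old, new, name in samples]
for old, new, delta, pct, name in rows:
    print(f"{old:5d} {new:5d} {delta:+6d} {pct:+8.3f}  {name}")
```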
> Overall size results have got a little worse now, and I need to figure
> out what the cause is. For example I have observed regressions like one
> from:
>
> 10861: 9e ef d9 12 movab 11b40 <p>,r0
> 10865: 00 00 50
> 10868: 90 a0 03 a0 movb 0x3(r0),0x2(r0)
> 1086c: 02
> 1086d: d1 60 8f 61 cmpl (r0),$0x64646261
> 10871: 62 64 64
> 10874: 13 07 beql 1087d <main_test+0x21>
>
> to:
>
> 10861: 9e ef e1 12 movab 11b48 <p>,r0
> 10865: 00 00 50
> 10868: 90 a0 03 a0 movb 0x3(r0),0x2(r0)
> 1086c: 02
> 1086d: d1 ef d5 12 cmpl 11b48 <p>,$0x64646261
> 10871: 00 00 8f 61
> 10875: 62 64 64
> 10878: 13 07 beql 10881 <main_test+0x25>
>
> and I'd like to understand why it's happened. I can't imagine the
> displacement addressing mode (or even the absolute addressing mode, not
> used with PIC code) to be faster with any VAX implementation than the
> register deferred addressing mode, though surely it doesn't have to be
> slower. And all things being equal the shorter encoding should win.
Well, this happens due to the constant propagation passes (3 of them)
eagerly replacing pseudo registers with direct symbol references where
possible. This is surely intentional and does make sense for load/store
architectures, which usually have a fixed instruction size and where a
(base+offset) symbol reference is equal in cost to a (register) pointer
dereference both size- and performance-wise. In fact the same instruction
is typically used with (offset) set to 0 and (register) being (base) from
the former calculation, although some more arcane, especially compressed
instruction sets may have a different (shorter) encoding available for
references with (reduced or) no offset.
It's not clear to me though why the constant propagation does not happen
with CC0 even though the passes do run regardless, and I now have run out
of time for making any further investigation here lest I miss the upstream
submission deadline. It will be more important to figure out what to do
with it, if anything, with MODE_CC. For performance optimisation we want
to prioritise execution performance over size reduction (though a larger
I-cache footprint does negatively affect performance of course), and for
size optimisation we want the reverse.
There are further code quality regressions due to earlier compilation
stages trying to push expression evaluation earlier where possible so as
to make data dependencies further apart from each other. This works well
for computations and architectures that do not involve condition codes set
as a side effect of calculations. However for integer negation that makes
RTL code produced equivalent to assembly like:
	movb *8(%ap),%r0
	mnegb %r0,%r1
	tstb %r0
	jeql .L2
which the comparison elimination pass cannot really do anything about,
because the comparison is made on the source rather than the target
operand of the negation (we could add a peephole for this, but it seems a
futile effort, as one would have to enumerate all the possible such
cases), even though this is really equivalent to:
	movb *8(%ap),%r0
	mnegb %r0,%r1
	jeql .L2
or, if R0 is dead at the conclusion of the branch, even:
	mnegb *8(%ap),%r1
	jeql .L2
Since the compiler insists on doing the comparison on the source of the
negation it obviously has to load it into a temporary so as to avoid
accessing the original memory location twice, hence the sequence of three
instructions rather than just a single one. A similar phenomenon can be
observed with the XOR operation and in other cases.
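The equivalence relied on above is that the Z condition code left by
MNEGB matches what TSTB would produce on the source operand. A small
model makes that easy to check exhaustively. This is only a sketch: it
models just the Z bit for byte operands, assuming Z is set exactly when
the (wrapped) result is zero, and the function names are made up for
illustration:

```python
def to_signed8(value):
    """Wrap an integer to a signed 8-bit two's-complement value."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def z_after_tstb(src):
    """Z as set by a test of the source operand (assumed: Z = src == 0)."""
    return to_signed8(src) == 0


def z_after_mnegb(src):
    """Z as set by negating the source (assumed: Z = result == 0)."""
    return to_signed8(-to_signed8(src)) == 0


# Compare the two for every possible byte value, including -128, whose
# negation overflows and wraps back to -128.
mismatches = [s for s in range(-128, 128) if z_after_tstb(s) != z_after_mnegb(s)]
print("mismatches:", mismatches)
```

The same does not hold for the N bit, of course, so this only justifies
dropping the TSTB ahead of an equality branch such as JEQL.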
Finally, due to how the setting of the condition codes is expressed in RTL
in the MODE_CC model, comparisons on results of operations with side
effects (pre-decrement, post-increment, volatile) cannot be eliminated as
it stands, because that would require a duplication of the side effect in
the RTL transcript, which does not correspond to what hardware does. This
results in an inevitable code quality regression from CC0, and probably
qualifies as a functional bug.
Maybe this can be avoided somehow, like by stripping the side effects by
hand in comparison elimination from the extra leg in the RTL stream where
two operations are performed in parallel by a single hardware instruction,
or by defining and using RTL syntax where duplicate operand references
imply any associated side effects only once across multiple operations
made in parallel. The latter approach could be useful in other situations
too where multiple operations are made in parallel, like with our EDIV
instruction, which we currently make no use of in GCC, and which actually
makes as many as three calculations in parallel, but obviously evaluates
any side effects only once for each of its operands.
Something to look into later on then.
> I have not started patch folding yet, and the number has since increased.
Patch folding is now complete and the series totals 31 patches. I have
4 additional fixes and clean-ups too, some of which are obviously correct
so they will go in without a review, and 2 compilation fixes for old
versions of NetBSD like 1.6, which are otherwise harmless, so why not.
I have also completed adding new test cases, which now total 199 files
and make vax.exp score:
=== gcc Summary ===
# of expected passes 5890
# of unsupported tests 261
(as code quality tests they cannot be run at optimisation levels that do
not enable the relevant optimisations, hence the unsupported results).
These have already been included in the test results quoted earlier on.
Some test cases revealed various issues and missed optimisations, so in
the end I made further code updates, and had to fix a preexisting issue
with constant classification that made my change introduce regressions.
That in turn fixed a couple of preexisting failures as well (reflected in
the result updates earlier on), but forced me to rerun regression testing,
hence extra time consumed.
Finally I noticed that some quadword (DImode in GCC-speak) operations
were not handled in a way that would let comparison elimination make use
of their setting of the condition codes and made a small update to make
this happen, again causing some code quality regressions against CC0.
Unfortunately we don't have a CMPQ hardware instruction so the choice of
comparisons is limited by whatever instructions are available, such as
MOVQ, or ASHQ, or double-precision synthesised addition/subtraction. I
have factored them in with an update to the original MODE_CC change,
however that in turn caused the middle end to resort to the `__cmpdi2'
libcall in many cases, not at all more efficient than inlined open-coded
equivalents using longword operations (e.g. to check for negativity you
can just execute TSTL over the upper longword rather than making a full
comparison).
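To illustrate the negativity example: the sign of a quadword is fully
determined by its upper longword, which is why a TSTL on that half is
enough. The sketch below models this on plain integers; it is only an
illustration of the arithmetic, not of what the middle end emits, and
the helper names are made up:

```python
import random


def split_quadword(value):
    """Split a signed 64-bit integer into (low, high) unsigned longwords."""
    bits = value & 0xFFFFFFFFFFFFFFFF
    return bits & 0xFFFFFFFF, bits >> 32


def is_negative_via_upper_longword(value):
    """Decide the sign from bit 31 of the upper longword alone."""
    _, high = split_quadword(value)
    return bool(high & 0x80000000)


# Boundary values plus some random ones (fixed seed for repeatability).
rng = random.Random(0)
values = [0, 1, -1, 2**31, -2**31, 2**32, -2**32, 2**63 - 1, -2**63]
values += [rng.randrange(-2**63, 2**63) for _ in range(1000)]
assert all(is_negative_via_upper_longword(v) == (v < 0) for v in values)
```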
So I decided to take the update back, however I think there is potential
in it if it gets developed some more, e.g. by adding insn splitters where
the operation requested cannot be expressed with readily available DImode
RTL patterns. We can get back to it once MODE_CC support has been merged.
Since a couple of weeks have passed again I need to rebase the patchset
once more and schedule final regression testing, but that will run in
parallel to my upstream submission. The upstream sources are a moving
target and I could never submit anything if I were to wait four days each
time while the repository keeps changing upstream.
Also, running some benchmarks would be worth doing, and I'll think about
what I can do once the patches have been submitted and final regression
testing completed.
Maciej