Port-vax archive

Re: Bountysource campaign for gcc-vax



On Wed, 4 Nov 2020, Maciej W. Rozycki wrote:

> 2. 		=== gcc Summary ===
> 
> # of expected passes		112963
> # of unexpected failures	1886
> # of unexpected successes	7
> # of expected failures		665
> # of unresolved testcases	28
> # of unsupported tests		3161

 Final pre-upstream-submission report.  Current results:

		=== gcc Summary ===

# of expected passes		119308
# of unexpected failures	1853
# of unexpected successes	7
# of expected failures		665
# of unresolved testcases	27
# of unsupported tests		3442

-- and these progressions:

-FAIL: gcc.c-torture/compile/pr46883.c   -Os  (internal compiler error)
-FAIL: gcc.c-torture/compile/pr46883.c   -Os  (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-2.c   -O2  (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-2.c   -O2  (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-2.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-2.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-2.c   -O3 -g  (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-2.c   -O3 -g  (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-2.c   -Os  (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-2.c   -Os  (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-3.c   -O2  (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-3.c   -O2  (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-3.c   -O3 -g  (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-3.c   -O3 -g  (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-3.c   -Os  (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-3.c   -Os  (test for excess errors)
-FAIL: gcc.c-torture/execute/20040709-3.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error)
-FAIL: gcc.c-torture/execute/20040709-3.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
-FAIL: gcc.c-torture/execute/20120808-1.c   -O2  (internal compiler error)
-FAIL: gcc.c-torture/execute/20120808-1.c   -O2  (test for excess errors)
-FAIL: gcc.c-torture/execute/20120808-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
-FAIL: gcc.c-torture/execute/20120808-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
-FAIL: gcc.c-torture/execute/20120808-1.c   -O3 -g  (internal compiler error)
-FAIL: gcc.c-torture/execute/20120808-1.c   -O3 -g  (test for excess errors)
-FAIL: gcc.c-torture/execute/20120808-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error)
-FAIL: gcc.c-torture/execute/20120808-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
-FAIL: gcc.c-torture/execute/20120808-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error)
-FAIL: gcc.c-torture/execute/20120808-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
-FAIL: gcc.dg/20050629-1.c (internal compiler error)
-FAIL: gcc.dg/20050629-1.c (test for excess errors)
-FAIL: c-c++-common/torture/pr53505.c   -Os  (internal compiler error)
-FAIL: c-c++-common/torture/pr53505.c   -Os  (test for excess errors)

> 5.		=== gfortran Summary ===
> 
> # of expected passes		18823
> # of unexpected failures	16643
> # of expected failures		177
> # of unresolved testcases	14746
> # of untested testcases		1752
> # of unsupported tests		671

 Likewise:

		=== gfortran Summary ===

# of expected passes		18825
# of unexpected failures	16638
# of expected failures		177
# of unresolved testcases	14746
# of untested testcases		1752
# of unsupported tests		671

-- and this single progression:

-FAIL: gfortran.dg/coarray_stopped_images_1.f08   -Os  (internal compiler error)

> 9.		=== libstdc++ Summary ===
> 
> # of expected passes		12247
> # of unexpected failures	598
> # of expected failures		96
> # of unresolved testcases	103
> # of unsupported tests		796

 This continues to suffer from intermittent failures, many of which, it 
would seem, would go away with just a timeout increase (and further test 
cases would then no longer time out either, as the system load went down).  
This however would significantly increase the run time, especially due to 
the cases that do hang.  So I am not going to fiddle with the timeout at 
this time (which BTW is set on a per-testsuite basis, so there is no need 
to interfere with other parts of testing), but I may look into it later, 
especially if I feel inclined to actually figure out what is causing the 
worst offenders to hang.

 No changes in test results otherwise.

> >  I'll yet see if there is anything significantly wrong with them, but 
> > otherwise I consider my code ready for final verification and I do not 
> > consider these size regressions showstoppers for change submission.  After 
> > all the code size with 17253 - 17 = 17236 samples has decreased, and any 
> > corner-case pessimisations can be sorted out, where feasible, anytime.

 I have updated size results now.  I have only analysed the C subset, as 
going through all the test suites would take too much time.  Overall there 
have now been 17592 and 17586 samples collected, for the MODE_CC change 
only and the whole patchset respectively (missing samples are of course 
due to failed compilation in the less patched version).  The size changes 
are as follows:

                     MODE_CC change             whole patchset
                samples average  median     samples average  median
--------------------------------------------------------------------
regressions       1814   0.578%  0.198%        427   1.701%  0.523%
unchanged        15119   0.000%  0.000%         83   0.000%  0.000%
progressions       659   0.587%  0.194%      17076   0.660%  1.076%
total            17592                       17586

So we have lost a little from the MODE_CC change itself and gained a lot 
from the preparatory changes.

 There have been a few surprising outliers which I mean to look into:

old	new	change	%change	filename
----------------------------------------------------
2426	2950	+524	+21.599	20111208-1.exe
2251	3055	+804	+35.717	990404-1.exe
2931	4213	+1282	+43.739	pr57521.exe
3043	5579	+2536	+83.339	20000422-1.exe

(these are whole-patchset figures) which are reflected in the average vs 
median figures above.  There have been a couple of unusual reductions too:

old	new	change	%change	filename
----------------------------------------------------
6289	4781	-1508	-23.978	vector-compare-1.exe
6285	4781	-1504	-23.930	vector-compare-1.exe
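
 The %change column in the two tables above can be reproduced with a few 
lines of Python.  This is only a minimal sketch, assuming the figure is 
the size difference taken relative to the old size; `percent_change' is a 
hypothetical helper name and not part of whatever scripts produced the 
report.

```python
# Hypothetical helper; assumes %change = (new - old) / old * 100.
def percent_change(old, new):
    """Return the relative size change of one sample, in percent."""
    return (new - old) / old * 100.0


# (old, new, filename) triples copied from the two tables above.
samples = [
    (2426, 2950, "20111208-1.exe"),
    (2251, 3055, "990404-1.exe"),
    (2931, 4213, "pr57521.exe"),
    (3043, 5579, "20000422-1.exe"),
    (6289, 4781, "vector-compare-1.exe"),
    (6285, 4781, "vector-compare-1.exe"),
]

# Print rows in the same old/new/change/%change/filename layout.
for old, new, name in samples:
    print(f"{old}\t{new}\t{new - old:+d}\t{percent_change(old, new):+.3f}\t{name}")
```

 Feeding per-sample figures like these to `statistics.mean' and 
`statistics.median' would then give the kind of average and median 
summary shown earlier, provided the full sample set is available.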

>  Overall size results have got a little worse now, and I need to figure 
> out what the cause is.  For example I have observed regressions like one 
> from:
> 
>     10861:	9e ef d9 12 	movab 11b40 <p>,r0
>     10865:	00 00 50 
>     10868:	90 a0 03 a0 	movb 0x3(r0),0x2(r0)
>     1086c:	02 
>     1086d:	d1 60 8f 61 	cmpl (r0),$0x64646261
>     10871:	62 64 64 
>     10874:	13 07 		beql 1087d <main_test+0x21>
> 
> to:
> 
>     10861:	9e ef e1 12 	movab 11b48 <p>,r0
>     10865:	00 00 50
>     10868:	90 a0 03 a0 	movb 0x3(r0),0x2(r0)
>     1086c:	02 
>     1086d:	d1 ef d5 12 	cmpl 11b48 <p>,$0x64646261
>     10871:	00 00 8f 61 
>     10875:	62 64 64 
>     10878:	13 07 		beql 10881 <main_test+0x25>
> 
> and I'd like to understand why it's happened.  I can't imagine the 
> displacement addressing mode (or even the absolute addressing mode, not 
> used with PIC code) to be faster with any VAX implementation than the 
> register deferred addressing mode, though surely it doesn't have to be 
> slower.  And all things being equal the shorter encoding should win.

 Well, this happens due to the constant propagation passes (3 of them) 
eagerly replacing pseudo registers with direct symbol references where 
possible.  This is surely intentional and does make sense for load/store 
architectures, which usually have a fixed instruction size and where a 
(base+offset) symbol reference is equal in cost to a (register) pointer 
dereference both size- and performance-wise.  In fact the same instruction 
is typically used with (offset) set to 0 and (register) being (base) from 
the former calculation, although some more arcane, especially compressed, 
instruction sets may have a different (shorter) encoding available for 
references with (reduced or) no offset.
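
 The size cost on VAX can be read straight off the byte dumps in the two 
listings quoted above.  The sketch below merely counts the bytes and 
resolves the PC-relative operand; the helper names are made up for 
illustration, and the 0xef specifier is taken to mean longword 
displacement off PC, as the disassembly suggests.

```python
# Byte dumps copied from the two listings quoted above, one per instruction.
before = {
    "movab p,r0":            "9e ef d9 12 00 00 50",
    "movb 0x3(r0),0x2(r0)":  "90 a0 03 a0 02",
    "cmpl (r0),$0x64646261": "d1 60 8f 61 62 64 64",
    "beql":                  "13 07",
}
after = {
    "movab p,r0":            "9e ef e1 12 00 00 50",
    "movb 0x3(r0),0x2(r0)":  "90 a0 03 a0 02",
    "cmpl p,$0x64646261":    "d1 ef d5 12 00 00 8f 61 62 64 64",
    "beql":                  "13 07",
}


def total_size(listing):
    """Sum the encoded length in bytes of every instruction in a listing."""
    return sum(len(bytes.fromhex(dump)) for dump in listing.values())


def pc_relative_target(insn_address, dump):
    """Resolve a leading operand encoded as longword displacement off PC."""
    raw = bytes.fromhex(dump)
    assert raw[1] == 0xEF, "first operand is not longword displacement off PC"
    displacement = int.from_bytes(raw[2:6], "little", signed=True)
    # Opcode (1) + specifier (1) + displacement (4): PC is just past these.
    return insn_address + 6 + displacement


print(total_size(before), total_size(after))
print(hex(pc_relative_target(0x1086D, after["cmpl p,$0x64646261"])))
```

 The register deferred form of CMPL takes 7 bytes here and the symbol 
reference takes 11, which accounts for the 4-byte shift of the branch 
from 10874 to 10878 between the two listings.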

 It's not clear to me though why the constant propagation does not happen 
with CC0 even though the passes do run regardless, and I now have run out 
of time for making any further investigation here lest I miss the upstream 
submission deadline.  It will be more important to figure out what to do 
with it, if anything, with MODE_CC.  For performance optimisation we want 
to prioritise execution performance over size reduction (though a larger 
I-cache footprint does negatively affect performance of course), and for 
size optimisation we want the reverse.

 There are further code quality regressions due to earlier compilation 
stages trying to push expression evaluation earlier where possible so as 
to make data dependencies further apart from each other.  This works well 
for computations and architectures that do not involve condition codes set 
as a side effect of calculations.  However for integer negation that makes 
RTL code produced equivalent to assembly like:

	movb *8(%ap),%r0
	mnegb %r0,%r1
	tstb %r0
	jeql .L2

which the comparison elimination pass cannot really do anything about, 
because the comparison is made on the source rather than the target 
operand of the negation (we could add a peephole for this, but that seems 
a futile effort, as one would have to enumerate all such possible cases), 
even though this is really equivalent to:

	movb *8(%ap),%r0
	mnegb %r0,%r1
	jeql .L2

or, if R0 is dead at the conclusion of the branch, even:

	mnegb *8(%ap),%r1
	jeql .L2

Since the compiler insists on doing the comparison on the source of the 
negation it obviously has to load it into a temporary so as to avoid 
accessing the original memory location twice, hence the sequence of three 
instructions rather than just a single one.  A similar phenomenon can be 
observed with the XOR operation and in other cases.
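
 The equivalence relied upon for the JEQL case can be checked 
exhaustively for byte operands.  This is a sketch only, modelling bytes 
as 8-bit two's complement patterns and assuming that MNEGB sets Z from 
its result, as the shorter sequences above imply; the function names are 
hypothetical and only the Z condition code is covered.

```python
def to_byte(value):
    """Wrap an integer to an 8-bit two's complement bit pattern (0..255)."""
    return value & 0xFF


def z_after_tstb(src):
    """Z as produced by testing the source byte against zero."""
    return to_byte(src) == 0


def z_after_mnegb(src):
    """Z as produced by negating the source byte (reflects the result)."""
    return to_byte(-src) == 0


# Collect every byte pattern for which the two Z settings would disagree.
mismatches = [b for b in range(256) if z_after_tstb(b) != z_after_mnegb(b)]
print(mismatches)
```

 An empty list means the explicit TSTB on the source is redundant for an 
equality branch.  The same cannot be said of branches that depend on the 
sign, since negation generally flips it.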

 Finally, due to how the setting of the condition codes is expressed in 
RTL in the MODE_CC model, comparisons on results of operations with side 
effects (pre-decrement, post-increment, volatile) cannot be eliminated as 
it stands, because that would require a duplication of the side effect in 
the RTL transcript, which does not correspond to what hardware does.  This 
results in an inevitable code quality regression from CC0, and probably 
qualifies as a functional bug.

 Maybe this can be avoided somehow, like by stripping the side effects by 
hand in comparison elimination from the extra leg in the RTL stream where 
two operations are performed in parallel by a single hardware instruction, 
or by defining and using RTL syntax where duplicate operand references 
imply any associated side effects only once across multiple operations 
made in parallel.  The latter approach could be useful in other situations 
too where multiple operations are made in parallel, like with our EDIV 
instruction, which we currently make no use of in GCC and which actually 
makes as many as three calculations in parallel, but obviously evaluates 
any side effects only once for each of its operands.

 Something to look into later on then.

>  I have not started patch folding yet, and the number has since increased.

 Patch folding is now complete and the series totals 31 patches.  I have 
an additional 4 fixes and clean-ups too, some of which are obviously 
correct so they will go in without a review, and 2 compilation fixes for 
old versions of NetBSD like 1.6, which are otherwise harmless, so why not.

 I have also completed adding new test cases, which now total 199 files 
and make vax.exp score:

		=== gcc Summary ===

# of expected passes		5890
# of unsupported tests		261

(as code quality tests they cannot be run at optimisation levels that do 
not enable the relevant optimisations, hence the unsupported results).  
These have already been included in the test results quoted earlier on.

 Some test cases revealed various issues and missed optimisations, so in 
the end I made further code updates, and had to fix a preexisting issue 
with constant classification that made my change introduce regressions.  
That in turn fixed a couple of preexisting failures as well (reflected in 
the result updates earlier on), but forced me to rerun regression testing, 
hence extra time consumed.

 Finally I noticed that some quadword (DImode in GCC-speak) operations 
were not handled in a way that would let comparison elimination make use 
of their setting of the condition codes and made a small update to make 
this happen, again causing some code quality regressions against CC0.

 Unfortunately we don't have a CMPQ hardware instruction, so the choice of 
comparisons is limited by whatever instructions are available, such as 
MOVQ, or ASHQ, or double-precision synthesised addition/subtraction.  I 
have factored them in with an update to the original MODE_CC change, 
however that in turn caused the middle end to resort to the `__cmpdi2' 
libcall in many cases, which is not at all more efficient than inlined 
open-coded equivalents using longword operations (e.g. to check for 
negativity you can just execute TSTL over the upper longword rather than 
making a full comparison).
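
 The upper-longword shortcut for the sign check can be illustrated as 
follows.  This is a sketch under the assumption that a DImode value is a 
64-bit two's complement quantity held as two 32-bit longwords; the helper 
names are hypothetical and the code models the arithmetic only, not GCC 
internals.

```python
def split_quadword(value):
    """Split a signed 64-bit integer into (low, high) unsigned longwords."""
    pattern = value & 0xFFFFFFFFFFFFFFFF
    return pattern & 0xFFFFFFFF, pattern >> 32


def is_negative_via_high_longword(value):
    """Decide the sign from bit 31 of the upper longword alone."""
    _, high = split_quadword(value)
    return bool(high & 0x80000000)


# Compare against the full 64-bit comparison for a few boundary values.
for v in (-2**63, -2**32, -1, 0, 1, 2**32, 2**63 - 1):
    assert is_negative_via_high_longword(v) == (v < 0)
```

 The lower longword never needs to be examined for this particular test, 
which is why a single longword operation suffices where a libcall would 
make a full three-way comparison.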

 So I decided to take the update back, however I think there is potential 
in it if it gets developed some more, e.g. by adding insn splitters where 
the operation requested cannot be expressed with readily available DImode 
RTL patterns.  We can get back to it once MODE_CC support has been merged.

 Since a couple of weeks have passed again I need to rebase the patchset 
once more and schedule final regression testing, but that will run in 
parallel to my upstream submission.  The upstream sources are a moving 
target and I could never submit anything if I were to wait four days each 
time while the repository keeps changing upstream.

 Also running some benchmarking would be worth doing, and I'll think about 
what I can do about it once the patches have been submitted and final 
regression testing completed.

  Maciej

