Port-vax archive


Re: Bountysource campaign for gcc-vax



On Sun, 25 Oct 2020, Maciej W. Rozycki wrote:

> > > > 		=== gcc Summary ===
> > > > 
> > > > # of expected passes		110142
> > > > # of unexpected failures	1821
> > > > # of unexpected successes	7
> > > > # of expected failures		566
> > > > # of unresolved testcases	30
> > > > # of unsupported tests		3131
> > > > /scratch/vol0/vax-netbsd/obj/gcc/gcc/xgcc  version 11.0.0 20200704 (experimental) (GCC)

 FAOD, throughout this report I'll refer to test results with GCC as it 
stands as "base results" and to test results with all my changes applied 
as "final results"; by no means do I consider the effort complete at 
this stage, but I think I'm pretty close.

 Full testing takes between 42 and 68 hours, depending on whether and how 
quickly I kill a runaway `chi2_q_uniform_random.exe' process from the 
libstdc++ testsuite, which is also the testsuite that takes the most time.

 The following components are not supported at this time for the reasons 
stated and therefore could not have been verified:

1. libada -- not ported to VAX/NetBSD, machine/OS bindings are not 
   present.

2. libgfortran -- oddly enough for Fortran, a piece of it requires IEEE 
   754 floating-point arithmetic (possibly a porting problem too).

3. libgo -- not ported to VAX/NetBSD, machine/OS bindings are not present.

 Originally libgomp was broken and didn't build due to an ICE, but I 
have now figured out that it is fixed by Matt's old fix for GCC PR 
target/58901, which was never upstreamed.  Consequently I have only run 
quick final testing and have no reference base results at this point.  
I'll include Matt's fix in the upcoming submission as I think it's 
important.

 I now have full final test results as follows:

1.		=== gnat Summary ===

# of expected passes		1829
# of unexpected failures	161
# of unexpected successes	23
# of unresolved testcases	941
# of unsupported tests		29

-- no change compared to base results.

2. 		=== gcc Summary ===

# of expected passes		112963
# of unexpected failures	1886
# of unexpected successes	7
# of expected failures		665
# of unresolved testcases	28
# of unsupported tests		3161

-- there have been the following progressions:

-FAIL: gcc.dg/pr83623.c (internal compiler error)
-FAIL: gcc.dg/pr83623.c (test for excess errors)
-FAIL: gcc.dg/torture/vshuf-v16qi.c   -O2  (internal compiler error)
-FAIL: gcc.dg/torture/vshuf-v16qi.c   -O2  (test for excess errors)
-FAIL: gcc.dg/torture/vshuf-v4hi.c   -O2  (internal compiler error)
-FAIL: gcc.dg/torture/vshuf-v4hi.c   -O2  (test for excess errors)
-FAIL: gcc.dg/torture/vshuf-v8hi.c   -O2  (internal compiler error)
-FAIL: gcc.dg/torture/vshuf-v8hi.c   -O2  (test for excess errors)
-FAIL: gcc.dg/torture/vshuf-v8qi.c   -O2  (internal compiler error)
-FAIL: gcc.dg/torture/vshuf-v8qi.c   -O2  (test for excess errors)
-FAIL: gcc.target/vax/bswapdi-1.c (test for excess errors)

-- and the following regression (discussed previously):

+FAIL: gcc.dg/lto/pr55660 c_lto_pr55660_0.o-c_lto_pr55660_1.o link, -O2 -flto -flto-partition=none -fuse-linker-plugin -fno-fat-lto-objects

The results are overall slightly worse than those obtained with the 
20200704 snapshot, but I'm not concerned, as my changes are neutral with 
respect to that regression and cleaning up the backend is outside the 
scope of this project.

3.		=== g++ Summary ===

# of expected passes		179033
# of unexpected failures	1278
# of expected failures		754
# of unresolved testcases	11
# of unsupported tests		8946

-- no change compared to base results.

4.		=== gdc Summary ===

# of expected passes		3580
# of unexpected failures	23
# of unsupported tests		1035

-- no change compared to base results.

5.		=== gfortran Summary ===

# of expected passes		18823
# of unexpected failures	16643
# of expected failures		177
# of unresolved testcases	14746
# of untested testcases		1752
# of unsupported tests		671

-- results look poor, but that is due to the lack of libgfortran (for 
reasons stated above) leading to link errors:

gfortran: fatal error: cannot read spec file 'libgfortran.spec': No such file or directory

(arguably a test framework bug, as in the absence of libgfortran it should 
downgrade all run or link tests to compilation tests).  There have been a 
couple of regressions too, included in the figures above, as follows:

+FAIL: gfortran.dg/class_61.f90   -O  (test for excess errors)
+FAIL: gfortran.dg/graphite/PR67518.f90   -O  (internal compiler error)
+FAIL: gfortran.dg/graphite/PR67518.f90   -O  (test for excess errors)

of which the first turned out to be a red herring or a build environment 
oddity and disappeared with a clean rebuild, while the latter two (which 
are really a single issue) were caused by another mismatch between 
predicates and constraints (discussed below), which I have now fixed, 
but have not yet rerun full testing with.

6.		=== go Summary ===

# of expected passes		1770
# of unexpected failures	778
# of unresolved testcases	2
# of untested testcases		689
# of unsupported tests		3

-- again, poor results are due to libgo missing:

.../usr/bin/vax-netbsdelf-ld: cannot find -lgobegin
.../usr/bin/vax-netbsdelf-ld: cannot find -lgo
collect2: error: ld returned 1 exit status

and the same observations apply as with libgfortran.  Also no change 
compared to base results.

7.		=== objc Summary ===

# of expected passes		2688
# of unexpected failures	2
# of unsupported tests		68

-- no change compared to base results.

8.		=== obj-c++ Summary ===

# of expected passes		1449
# of unexpected failures	1
# of expected failures		2
# of unsupported tests		77

-- no change compared to base results (and almost clean results, wow!).

9.		=== libstdc++ Summary ===

# of expected passes		12247
# of unexpected failures	598
# of expected failures		96
# of unresolved testcases	103
# of unsupported tests		796

-- there have been the following progressions:

-FAIL: 27_io/basic_stringbuf/overflow/char/1.cc execution test
-FAIL: experimental/filesystem/iterators/directory_iterator.cc execution test

and the following regression:

+FAIL: 25_algorithms/set_symmetric_difference/constrained.cc (test for excess errors)

which is however the same issue as PR67518.f90 above.

10.		=== libffi Summary ===

# of expected passes		1819
# of unexpected failures	87
# of unsupported tests		30

-- no change compared to base results.

11.		=== libatomic Summary ===

# of expected passes		44
# of unsupported tests		5

-- clean results!

12.		=== libgomp Summary ===

# of expected passes		2646
# of unexpected failures	40
# of expected failures		4
# of unsupported tests		379

-- as noted above no base results.

>  I have updated the changes significantly, including some required 
> modifications to generic GCC code needed for scenarios not previously 
> considered.  I have also instrumented the test harness to run `size' on 
> all executables run on the remote machine and record the output in the 
> test log.  That has turned out to collect 17253 samples.
> 
>  With that in place I ran a simple (well, maybe not so much) shell command 
> to diff the `size' results and report any test cases whose text size has 
> not decreased with my change in place.  This resulted in these test cases 
> only having grown with the old and the new text sizes respectively shown:
> 
> 2317 2327 ./pr42833.exe
> 2367 2383 ./pr42833.exe
> 2367 2383 ./pr42833.exe
> 2367 2383 ./pr42833.exe
> 2323 2333 ./pr42833.exe
> 2367 2383 ./pr42833.exe
> 2109 2117 ./pr46316.exe
> 2073 2075 ./pr46316.exe
> 2073 2075 ./pr46316.exe
> 2073 2075 ./pr46316.exe
> 2023 2025 ./pr46316.exe
> 2073 2075 ./pr46316.exe
> 5929 5983 ./strlenopt-68.exe
> 2621 2623 ./interchange-0.exe
> 2419 2421 ./interchange-15.exe
> 2663 2665 ./interchange-5.exe
> 2419 2421 ./uns-interchange-15.exe
> 
> (many test cases are run consecutively at different optimisation levels 
> and unfortunately many of them do not care to make the file names of the 
> binaries produced unique, consequently overwriting ones previously built).
> 
>  I'll yet see if there is anything significantly wrong with them, but 
> otherwise I consider my code ready for final verification and I do not 
> consider these size regressions showstoppers for change submission.  After 
> all the code size with 17253 - 17 = 17236 samples has decreased, and any 
> corner-case pessimisations can be sorted out, where feasible, anytime.

 So the size regressions turned out to be due to two things: a bug in 
one of the peepholes I added for some common compare-branch sequences 
that the comparison elimination pass was unable to fold (the structure 
of the insns involved does not fit the design expectations of the pass), 
and a pessimisation with consecutive branches like:

   107d9:	d1 55 8f ff 	cmpl r5,$0xffffffff
   107dd:	ff ff ff 
   107e0:	19 68 		blss 1084a <foo+0x7a>
   107e2:	13 5d 		beql 10841 <foo+0x71>

which got expanded unnecessarily with new code:

   107d9:       d1 55 8f ff     cmpl r5,$0xffffffff
   107dd:       ff ff ff
   107e0:       19 76           blss 10858 <foo+0x88>
   107e2:       d1 55 8f ff     cmpl r5,$0xffffffff
   107e6:       ff ff ff
   107e9:       13 64           beql 1084f <foo+0x7f>

Both are now fixed, the second one after a rethink of how to handle the 
set of CC modes representing the various subsets of the condition codes 
that different VAX instructions set, which I am now particularly happy 
about.

 Overall size results have got a little worse now, and I need to figure 
out what the cause is.  For example I have observed regressions like one 
from:

    10861:	9e ef d9 12 	movab 11b40 <p>,r0
    10865:	00 00 50 
    10868:	90 a0 03 a0 	movb 0x3(r0),0x2(r0)
    1086c:	02 
    1086d:	d1 60 8f 61 	cmpl (r0),$0x64646261
    10871:	62 64 64 
    10874:	13 07 		beql 1087d <main_test+0x21>

to:

    10861:	9e ef e1 12 	movab 11b48 <p>,r0
    10865:	00 00 50
    10868:	90 a0 03 a0 	movb 0x3(r0),0x2(r0)
    1086c:	02 
    1086d:	d1 ef d5 12 	cmpl 11b48 <p>,$0x64646261
    10871:	00 00 8f 61 
    10875:	62 64 64 
    10878:	13 07 		beql 10881 <main_test+0x25>

and I'd like to understand why that has happened.  I can't imagine the 
displacement addressing mode (or even the absolute addressing mode, not 
used with PIC code) being faster on any VAX implementation than the 
register deferred addressing mode, though surely it doesn't have to be 
slower.  And all other things being equal, the shorter encoding should 
win.

>  Once that's complete I'll do final patch folding into self-contained 
> pieces (e.g. to merge code updates with the respective test case additions 
> which I keep as separate changes for easier verification in development) 
> and post the resulting series upstream.  I yet plan to add a number of 
> proper test cases though to verify that the compare elimination pass does 
> its job, using the template I have made for my own verification already.

 I have not started patch folding yet, and the number of patches has 
since increased.

 OTOH I have some test cases already prepared, and they revealed an odd 
phenomenon where the comparison elimination pass was unable to do its 
job owing to a seemingly odd choice of instructions resulting from code 
transformations: e.g. a signed expression like (x >= 0) used in the 
source code was transformed to (x > -1) in the expand pass.  This is of 
course equivalent, but preferring a comparison against -1 over one 
against 0 is odd to say the least, because in the VAX architecture a 
comparison against 0 comes as a side effect of many hardware operations, 
whereas a comparison against -1 has to be done explicitly.

 In the end it turned out to be the result of a bug in the RTX cost 
settings of the VAX backend, which return 0 for constant zero; in the 
internal API a cost of 0 means "unspecified", which in turn makes the 
caller assume a single "fast instruction", i.e. `COSTS_N_INSNS (1)' or 
4.  The cost for -1 and small positive constants, however, is set to 1, 
which is obviously less than 4, hence the compiler's reluctance to use 
constant zero.

 To fix that I have decided to rescale the costs in terms of "fast 
instructions", leaving the relative weights unchanged (i.e. effectively 
multiplying them all by four), except for constant zero, which I gave a 
cost of `COSTS_N_INSNS (1) / 2', giving it the precedence it deserves 
and yielding better machine code overall.

 The other issue was a mismatch between predicates and constraints 
mentioned above in patterns like:

(define_insn "mulsidi3"
  [(set (match_operand:DI 0 "nonimmediate_operand" "=g")
	(mult:DI (sign_extend:DI
		  (match_operand:SI 1 "nonimmediate_operand" "nrmT"))
		 (sign_extend:DI
		  (match_operand:SI 2 "nonimmediate_operand" "nrmT"))))]
  ""
  "emul %1,%2,$0,%0")

Here operands #1 and #2 have `nonimmediate_operand' predicates, which 
exclude immediates, yet their constraints include `n', meaning an 
immediate is permitted in assembly generation.  The hardware instruction 
does of course permit immediates for its input operands, so I have no 
idea why whoever wrote this code chose that predicate.

 More importantly the predicate is ineffective, because a register 
operand is permitted, and while RTL insns are matched in the expand pass 
pseudo registers will be supplied for any immediate operands, with the 
intent to reload them as appropriate in the reload pass.  Reload will 
then see the `n' constraint and happily supply the immediate directly 
anyway rather than reloading it into a hardware register (with a 
preceding instruction).

 What was a mere internal inconsistency turned into an ICE with my 
changes, so I have fixed the predicates in the pattern above (and in a 
few more).
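 For the record, one way to restore consistency in the pattern quoted 
above is to widen the predicates to `general_operand', which does accept 
immediates and therefore agrees with the `nrmT' constraints; this is my 
reading of the fix, not necessarily the exact patch:

```
(define_insn "mulsidi3"
  [(set (match_operand:DI 0 "nonimmediate_operand" "=g")
	(mult:DI (sign_extend:DI
		  (match_operand:SI 1 "general_operand" "nrmT"))
		 (sign_extend:DI
		  (match_operand:SI 2 "general_operand" "nrmT"))))]
  ""
  "emul %1,%2,$0,%0")
```

The alternative of dropping `n' from the constraints instead would 
forbid immediates outright, losing a capability the hardware `emul' 
instruction actually has.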

 Consequently I have ended up with two extra patches, both of which need 
to be fully regression-tested, so barring any further issues, and adding 
Matt's change, I expect to have the ultimate test results in 4-5 days' 
time.

  Maciej

