Subject: egcs-1.1 m68k codegen bug
To: None <tech-toolchain@netbsd.org>
From: Ignatios Souvatzis <ignatios@theory.cs.uni-bonn.de>
List: tech-toolchain
Date: 08/26/1998 11:40:21
It came to my attention that my report about the m68k code generator bug 
did not make it to the list due to the 40 kB message limit. This message
contains the essence of the report, without the full generated assembler 
output for vfprintf.c.

- vfprintf uses an unsigned 64-bit integer to hold the value to convert.
- The octal output switch case looks roughly like this:

	u_quad value;
	char *s;
	...
	value = ...
	s = end_of_buffer;
	...

	do {
		*--s = (value & 7) + '0';
		value >>= 3;
	} while (value);
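
For readers who want to experiment, the fragment above can be turned into a
small self-contained function. This is only a sketch: the name fmt_octal and
the use of uint64_t in place of the quad type are my assumptions, not the
actual vfprintf.c source.

```c
#include <stdint.h>

/*
 * Convert value to octal digits, filling the buffer backwards from `end`
 * (one past the last usable byte). Returns a pointer to the first digit.
 * A 64-bit value needs at most 22 digits, so the caller must provide
 * at least 22 bytes before `end`.
 */
static char *
fmt_octal(uint64_t value, char *end)
{
	char *s = end;

	do {
		*--s = (char)((value & 7) + '0');	/* low 3 bits -> one digit */
		value >>= 3;				/* move to the next digit */
	} while (value);				/* tests all 64 bits */

	return s;
}
```

Called with a 24-byte buffer whose last byte is set to '\0', the returned
pointer can be used directly as a C string.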


The -O1 output of egcs-1.1 looks like this:

	L1:
		movb fp@(-461),d0
		andb #7,d0
		addb #48,d0
		movb d0,a0@-
	
		movl fp@(-468),d3
		movl fp@(-464),d4
		lsrl d3
		roxrl d4
		lsrl d3
		roxrl d4
		lsrl d3
		roxrl d4
		movl d3,fp@(-468)
		movl d4,fp@(-464)
		movl d3,d4		<-- wrong!!!
		orl d4,d4		<-- wrong!!!
		bne L1

The correct loop end would have been:
		movl d4,fp@(-464)
		orl d3,d4
		bne L1
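
The effect of the wrong loop test can be modelled in C by keeping the value
in two 32-bit halves, the way the generated code does (hi stands for d3, lo
for d4). This is a sketch of the behaviour, not compiler output; the name
fmt_octal_halves and the buggy_test flag are made up for illustration.

```c
#include <stdint.h>

/*
 * Octal conversion with the 64-bit value split into two 32-bit halves.
 * hi plays the role of d3, lo the role of d4.
 * If buggy_test is nonzero, the loop end mimics "movl d3,d4; orl d4,d4",
 * which tests only the upper half; otherwise it mimics "orl d3,d4".
 */
static char *
fmt_octal_halves(uint32_t hi, uint32_t lo, char *end, int buggy_test)
{
	char *s = end;
	uint32_t test;
	int i;

	do {
		*--s = (char)((lo & 7) + '0');
		for (i = 0; i < 3; i++) {
			/* lsrl d3; roxrl d4: shift the pair right by one bit */
			lo = (lo >> 1) | (hi << 31);
			hi >>= 1;
		}
		test = buggy_test ? hi : (hi | lo);
	} while (test);

	return s;
}
```

With the buggy test, a value such as 0777 whose upper 32 bits are zero
yields the single digit "7" instead of "777", because the loop ends as soon
as the upper half is zero.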

The -O0 output instead uses this slightly slower loop end:
		negl d3
		negxl d4
		bne L1
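
Why this pair works as a 64-bit zero test can be modelled in C as follows.
The sketch assumes the condition code behaviour as I read it in the M68000
Programmer's Reference Manual: neg sets Z from its result and sets X when
the result is nonzero, while negx only ever clears Z. The function name
neg_negx_nonzero is made up for illustration.

```c
#include <stdint.h>

/*
 * Model of "negl a; negxl b; bne": returns nonzero if the branch would be
 * taken, i.e. if the 64-bit value formed by the two halves is not zero.
 */
static int
neg_negx_nonzero(uint32_t a, uint32_t b)
{
	uint32_t r;
	int x, z;

	r = (uint32_t)0 - a;			/* negl */
	z = (r == 0);				/* Z set iff result is zero */
	x = (r != 0);				/* X set iff result is nonzero */

	r = (uint32_t)0 - b - (uint32_t)x;	/* negxl */
	if (r != 0)
		z = 0;				/* Z cleared if nonzero, else unchanged */

	return !z;				/* bne branches when Z is clear */
}
```

Because negx never sets Z, a nonzero first half keeps the branch taken even
when the second subtraction happens to produce zero.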

For comparison, any beginning assembler programmer, and egcs-1.0, would
code the whole loop like this:

		movl fp@(-468),d3
		movl fp@(-464),d4
	L1:
		movb d4,d0
		andb #7,d0
		addb #48,d0
		movb d0,a0@-
		lsrl d3
		roxrl d4
		lsrl d3
		roxrl d4
		lsrl d3
		roxrl d4
		movl d3,d0
		orl d4,d0
		bne L1

Summary: egcs-1.1 not only generates much slower code for this example
(one extra byte fetch, two extra longword fetches, and two extra longword
stores per loop iteration), but the generated code is also wrong: the loop
test looks only at d3, the upper 32 bits of the value.

Regards,
	Ignatios Souvatzis