port-arm32: Re: StrongARM K bug

Subject: Re: StrongARM K bug
To: None <richard.earnshaw@arm.com>
From: Nicholas Clark <nick@flirble.org>
List: port-arm32
Date: 03/29/1999 17:50:25
In the last mail Richard Earnshaw said:

> nick@flirble.org said:
> > Is this of direct use to BSD?
> 
> I wrote a utility some time ago (which was posted on this list), which is 
> generally a better work-around than rogering the compiler.  My utility 
> identified just those problem instructions that lay on a page boundary and 
> moved them to a safe location in the binary image.  While it was 
> theoretically possible for this program's heuristics to make a mistake, 
> I've never known it to (however, I no-longer have a Rev-K, so I don't know 
> how much other people are using it).
> 
> A full "fix" to the compiler would require that you never allow the PC to 
> be loaded directly from memory (except via B or BL) -- this will make some 
> return instruction sequences really slow, and will also impact pointers to 
> functions and table-jumps.

I think I wasn't clear:

I added a flag to optionally enable a cc1 workaround for the K bug.
The workaround itself is to emit .align 3 before any LDM that loads the PC.
As I understand it, the only time the compiler generates an LDM loading
the PC is for a function return.  If this is true then the workaround
catches all problem instances.
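
For illustration only (this is not the actual cc1 patch), the same effect can
be sketched as a filter over the compiler's assembler output.  It assumes GNU
as syntax for ARM, where .align 3 means 8-byte alignment, 4 KB pages, and that
the problem only arises when the LDM is the last word of a page.  The names
LDM_PC, pad_ldm_pc and is_last_word_of_page are made up for this sketch.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KB pages

# Hypothetical matcher: an LDM mnemonic (any condition/addressing suffix)
# whose register list names pc or r15.  '@' starts a comment in ARM gas,
# so the match is not allowed to run into a trailing comment.
LDM_PC = re.compile(r"^\s*ldm\w*\s+[^@]*\{[^}]*\b(pc|r15)\b[^}]*\}",
                    re.IGNORECASE)


def pad_ldm_pc(asm_lines):
    """Return a copy of asm_lines with '.align 3' before each LDM loading PC."""
    out = []
    for line in asm_lines:
        if LDM_PC.match(line):
            out.append("\t.align\t3")
        out.append(line)
    return out


def is_last_word_of_page(addr):
    """True if a 4-byte instruction at addr occupies the final word of a page."""
    return addr % PAGE_SIZE == PAGE_SIZE - 4


# The last word of a page sits at offset 0xFFC, which is 4 modulo 8, so an
# instruction placed on an 8-byte boundary can never land there.
assert not any(is_last_word_of_page(a) for a in range(0, 4 * PAGE_SIZE, 8))
```

The filter only looks at one instruction per line and would miss an LDM that
shares a line with a label, so treat it as a sketch of the idea rather than a
drop-in tool.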

 
> Summary: I still don't think fixing the compiler is the correct way to 
> address this bug.
> 
> Richard (gcc/arm maintainer).

I agree that bodging the compiler is not a permanent solution to the problem,
and I wasn't advocating that the compiler flag be enabled by default, or that
distributed binaries be built with the K workaround.  The discussion about
working round the K bug arose because I'd hit specific problems compiling/
running things (bootstrapping egcs, glibc, and perl5.005_03, ld-linux.so) that
appeared to be due to the K bug.

I'd been sent and compiled your program, but found that it wouldn't solve my
problems because

1: It's for a.out and I'm using ELF, and I don't know enough about the ELF
   format to convert your program to patch ELF binaries
2: I think some of my SEGVs due to the K bug were in shared libraries.
   Certainly the perl ones only hit with dynamic loading of the IO extension.
3: The egcs problems were with programs compiled during the bootstrapping
   process (gcc/genrecog), hence even if I'd created an ELF version I'd've
   had to modify the makefile to process binaries before they were used.

I'm not sure what the best solution is short of replacing the buggy CPU
(which may not be possible without replacing the whole RiscPC with something
else).  If the source is available I'm happier modifying the assembler or
object files rather than binary patching.  Matthew Wilcox suggested to me
that one could consider arranging to optionally compile the kernel to
check whether a page finishes with a problem instruction, and if so lock
that page to the next so that the second is always paged in if the first
is paged in.  This too is messy (?This is too messy?), as you then have to
check the subsequent page, which could make a chain, and you'd need a limit
to stop a denial of service using a pathological binary.  Messy as it may be,
it would allow a K processor to use unmodified binaries as is.
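
To make the page-chaining idea concrete, here is a rough user-space model of
the check (not kernel code, and not anything Matthew wrote).  It assumes 4 KB
pages, a little-endian image, and that the only problem instruction is an LDM
with PC in its register list; the names is_ldm_pc, page_ends_with_ldm_pc,
pages_to_lock and MAX_CHAIN are invented for the sketch.

```python
import struct

PAGE_SIZE = 4096   # assumption: 4 KB pages
MAX_CHAIN = 8      # hypothetical cap on how many pages may be tied together


def is_ldm_pc(insn):
    """True for an ARM block transfer (bits 27-25 = 100) that is a load
    (L, bit 20) with PC (bit 15) in its register list."""
    return (insn & 0x0E108000) == 0x08108000


def page_ends_with_ldm_pc(image, page):
    """Check the last 32-bit word of the given page of a little-endian image."""
    end = (page + 1) * PAGE_SIZE
    if end > len(image):
        return False          # page is missing or only partly present
    (insn,) = struct.unpack_from("<I", image, end - 4)
    return is_ldm_pc(insn)


def pages_to_lock(image, page, max_chain=MAX_CHAIN):
    """Return the run of pages that must be resident whenever `page` is.

    Each page ending in a problem instruction drags in its successor, which
    is then checked in turn.  The chain is capped at max_chain pages so a
    pathological binary cannot pin an unbounded amount of memory.
    """
    locked = [page]
    while page_ends_with_ldm_pc(image, locked[-1]):
        if len(locked) >= max_chain:
            raise ValueError("lock chain exceeds %d pages" % max_chain)
        locked.append(locked[-1] + 1)
    return locked
```

The bit test also matches data words that happen to look like such an LDM, and
encodings with the "never" condition; both only cause an unnecessary lock, so
the check errs on the safe side.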

I'm not sure if I like that idea either.  Because I'm trying to ensure that
I've got source code for everything I'm running, personally I'm quite happy
with a private, modified toolchain.  Hacking the kernel (Linux or BSD) appears
to me (an outsider) like more work than obtaining a replacement CPU.

I'm being bitten by the K bug, enough to be irritated into doing something
about it.  As it seems I'm currently pointing in the wrong direction, could
someone suggest a more profitable avenue I should be following?

Nick