Subject: Re: PIC hacks
To: None <richard.earnshaw@arm.com>
From: Neil A. Carson <neil@causality.com>
List: port-arm32
Date: 12/05/1998 14:47:21
Richard Earnshaw wrote:

> Errm, how can a file format cause/trigger a processor bug?  That's an
> execution problem.  I'm not reading any of the Linux lists, so I've no

Well, of course :-)

In the APCS entry sequence of their ELF-compiled code, they generate
some code that rounds the stack pointer down to a lower base. Apparently
an instruction sometimes gets missed in this sequence or something. I've
attached Russell's report here. Most of these just sound like screwed
cache consistency, but you can judge for yourself of course...

Russell King wrote:

                     * * * * * PRELIMINARY * * * * *

After the weekend's work on the Linux kernel, I am convinced that I
have found a bug in the SA110 revision S processor.  My reasons are
these:

1. Dave Gilbert has been experiencing random crashes with 2.1.126 on
   his EBSA285 with tulip netcard driver, which only occur with the
   data cache on.  Any attempt to add debugging causes the nature of
   the problem to change.

   If I supply Dave a kernel configured for his machine compiled using
   my GCC 2.7.2.2 ELF tools (which I trust), he sees the same problem.
   However, if I compile up an EBSA285 kernel, well - it's been running
   for 28 days without one single problem now.

2. In 2.1.129 recently, I have been getting data aborts in schedule().
   On close investigation, it appears that the code sequence, aligned
   as follows, causes the SA110S to misbehave:

   r4 = c00ff6c0 r5 = c00ffc40 r6 = 00000000
      schedule:
c0018ba4:       mov ip, sp
c0018ba8:       stmfd sp!, {r4, r5, r6, fp, ip, lr, pc}
c0018bac:       sub fp, ip, #4                  ;  sp = c00fdf4c
c0018bb0:       bic r5, sp, #0x1f00
c0018bb4:       bic r5, r5, #0xff

   While this code executes as expected 99% of the time, the SA110S
   appears, under some bizarre circumstance, to miss the instruction
   at c0018bb0.  As a result, r5 ends up containing c00ffc00, NOT
   c00fc000.
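
The two values quoted above can be checked by hand. The following is a
small Python sketch, not part of the original report, that models the two
bic instructions using the register dump and the sp value from the
listing. The helper name bic() is made up for illustration. If the first
bic is skipped, the second one operates on the stale r5 from the register
dump, which gives exactly the bad value reported.

```python
MASK32 = 0xFFFFFFFF

def bic(value, imm):
    """Model ARM BIC: clear the bits of value that are set in imm."""
    return value & ~imm & MASK32

sp = 0xC00FDF4C      # sp after the stmfd, from the listing comment
r5_old = 0xC00FFC40  # r5 on entry, from the register dump

# Normal case: both bic instructions execute.
expected = bic(bic(sp, 0x1F00), 0xFF)

# Suspected failure: the bic at c0018bb0 is skipped, so the bic at
# c0018bb4 clears the low byte of the stale r5 instead.
observed = bic(r5_old, 0xFF)

print(f"expected r5 = {expected:08x}")
print(f"observed r5 = {observed:08x}")
```

The stale-register model reproduces both figures in the report, which is
consistent with a single missed instruction rather than a corrupted
operand.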

   The problem appears to be dependent on the alignment of this routine,
   the number of instructions between the stmfd and the first bic, and
   the registers used.  As a result, I believe that it is an interaction
   between the Icache, and the stmfd instruction.
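
To make the alignment dependence concrete, here is a short Python sketch,
not from the original report, that shows where each instruction of the
listing falls within an instruction cache line. It assumes a 32-byte
Icache line, which is my reading of the SA-110 documentation and should
be checked against the data sheet. The helper name line_position() is
hypothetical.

```python
LINE_BYTES = 32  # assumed Icache line size in bytes

# Addresses and instructions copied from the schedule() listing.
listing = [
    (0xC0018BA4, "mov ip, sp"),
    (0xC0018BA8, "stmfd sp!, {r4, r5, r6, fp, ip, lr, pc}"),
    (0xC0018BAC, "sub fp, ip, #4"),
    (0xC0018BB0, "bic r5, sp, #0x1f00"),
    (0xC0018BB4, "bic r5, r5, #0xff"),
]

def line_position(addr, line_bytes=LINE_BYTES):
    """Return (line base address, byte offset within that line)."""
    return addr & ~(line_bytes - 1), addr % line_bytes

for addr, insn in listing:
    base, offset = line_position(addr)
    print(f"{addr:08x}: line {base:08x} +{offset:2d}  {insn}")
```

Under that assumption the whole sequence sits in one cache line, with the
stmfd at offset 8 and the missed bic at offset 16. Moving the routine or
inserting an instruction changes these offsets, which is the kind of
change the report says makes the problem come and go.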

   There has already been one instance of Icache and stm interaction
   causing problems - the stmia ..., {r8-pc}^ instruction (which stores
   both non-user mode (r8 - r12) and user mode (sp, lr) registers),
   which ARM Linux already avoids.

   I have not noticed this problem until now because I normally use my
   special GCC 2.7.2.2 ELF for ARM, which does not optimise to the same
   level (and inserts an extra instruction between the sub and the bic).

3. A vdir /usr -R > /dev/null on the Netwinder under the above
   mentioned 2.1.129 kernel (but with the assembler modified to prevent
   this bug) appears to cause a lot of segmentation violations.  Turning
   off the data cache (only) cures this problem.

                     * * * * * PRELIMINARY * * * * *