Port-arm archive


Re: New Raspberry Pi image



On Jan 25, 2013, at 3:52 PM, Wim Lewis wrote:

> On 1/25/13 2:32 PM, Matt Thomas wrote:
>> On Jan 25, 2013, at 10:34 AM, Stephan wrote:
>>> That makes things clear. As for performance in general, what known
>>> issues are there yet? One thing I know is hard-float, which apparently
>>> requires changes to libc (also to the CPU support in the kernel?).
>> 
>> Hardfloat requires a change to an entirely new ABI.  It's a very big change.
> 
> 
> My understanding (which may be wrong) is that there are two pretty
> distinct pieces of work needed to fully support hardfloat.
> 
> One is that the kernel needs to save/restore the FPU's state on context
> switches, and deal with initializing it, processing FPU-specific
> exceptions, etc.

That's been there for a while.

> The other is that the hardfloat ARM ABI uses FPU registers to pass
> values around, and so it's (of course) incompatible with the softfloat ABI.

It's even worse than that.  NetBSD uses an older calling standard, and gcc
can't emit FP instructions when using that older calling standard.

So to get hardfloat, NetBSD has to switch to the new calling standard, which
happens to be binary-incompatible with the old one.  Converting to the new ABI
requires a corresponding change in the kernel, so it's not a simple changeover.
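To make the incompatibility concrete: under a softfloat calling convention a
double argument travels as two 32-bit words in core registers (r0/r1 for the
first argument), while a hardfloat callee expects it whole in the VFP register
d0 and never looks at r0/r1. The C sketch below only models that split. The
helper names are hypothetical, and it assumes the low word goes in r0 as on a
little-endian ARM EABI target; NetBSD's older calling standard may order the
two words differently.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the two core registers a softfloat-ABI caller
 * fills for one double argument. Assumes low word in r0 (little-endian
 * EABI word order); older ARM conventions may swap the words. */
struct core_reg_pair {
    uint32_t r0; /* low 32 bits of the IEEE 754 encoding */
    uint32_t r1; /* high 32 bits: sign, exponent, top of the mantissa */
};

/* Split a double the way a softfloat caller would hand it over. */
struct core_reg_pair softfloat_arg(double x)
{
    uint64_t bits;
    struct core_reg_pair p;

    memcpy(&bits, &x, sizeof bits); /* reinterpret bits without aliasing UB */
    p.r0 = (uint32_t)(bits & 0xffffffffu);
    p.r1 = (uint32_t)(bits >> 32);
    return p;
}

/* Rejoin the two words into one 64-bit value, which is what a hardfloat
 * callee would have needed to find in d0 in the first place. */
double pair_to_double(struct core_reg_pair p)
{
    uint64_t bits = ((uint64_t)p.r1 << 32) | p.r0;
    double x;

    memcpy(&x, &bits, sizeof x);
    return x;
}
```

For 1.0 the encoding is 0x3FF0000000000000, so under these assumptions r0
holds 0x00000000 and r1 holds 0x3FF00000. A caller and callee that disagree on
where those bits live simply read the wrong registers, which is why the two
ABIs cannot be mixed.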

> It's possible to do the first without doing the second. Function-call
> performance isn't as good, since you don't get to use the FPU registers,
> but the inner loops of fp-heavy code can be compiled to use hardware
> floating point instructions and get most of the speedup of a full
> hardfloat implementation. I think gcc even has a flag for this situation
> (hardfloat using the softfloat ABI).
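A minimal sketch of that mixed setup, assuming GCC's ARM option
`-mfloat-abi=softfp`, which generates hardware floating-point instructions
while keeping the soft-float calling convention. The function and the exact
command line in the comment are illustrative only.

```c
#include <stddef.h>

/* An FP-heavy inner loop. On an ARM target, building it with something like
 *   gcc -O2 -mfloat-abi=softfp -mfpu=vfp -c dot.c
 * lets the multiply-add in the loop use VFP instructions, while dot()'s
 * double return value is still passed back in core registers as the
 * softfloat ABI requires. On other targets this is just ordinary C. */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

With a = {1, 2, 3} and b = {4, 5, 6}, dot(a, b, 3) returns 32.0. Only the
return value has to be moved from a VFP register back to core registers, so
the loop body gets most of the hardfloat speedup.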

If you really need hardfloat, you can write a library with the softfloat
interface which moves its arguments to the VFP registers, does the FP op, and
moves the result back to ARM registers.  You can then use LD_PRELOAD to load this
shared library and get close to hardware FP performance with no changes to your
program.
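A rough C sketch of such a shim follows. It assumes the program reaches the
GCC soft-float helpers (`__adddf3`, `__muldf3` and friends, the names used by
GCC's soft-float routines) through a shared library, so the dynamic linker can
interpose them, and that the shim and the program agree on the softfloat
calling convention and double word order. The library name `libvfpshim.so` is
hypothetical.

```c
/* Hypothetical softfloat-interposer sketch. Build for ARM with hardware FP
 * but the softfloat calling convention, e.g.
 *   gcc -O2 -fPIC -shared -mfloat-abi=softfp -mfpu=vfp \
 *       -o libvfpshim.so vfpshim.c
 * With that flag each body becomes: move the argument words from core
 * registers into VFP registers, one VFP instruction, move the result back.
 * Do NOT build it with -mfloat-abi=soft: the operators would then call
 * these very functions and recurse forever. */

double __adddf3(double a, double b) { return a + b; }
double __subdf3(double a, double b) { return a - b; }
double __muldf3(double a, double b) { return a * b; }
double __divdf3(double a, double b) { return a / b; }

float __addsf3(float a, float b) { return a + b; }
float __subsf3(float a, float b) { return a - b; }
float __mulsf3(float a, float b) { return a * b; }
float __divsf3(float a, float b) { return a / b; }
```

Something like `LD_PRELOAD=./libvfpshim.so ./myprog` (program name
hypothetical) would then route the program's basic arithmetic through VFP. A
real shim would also need the comparison and conversion helpers (`__eqdf2`,
`__floatsidf`, ...), which are omitted here.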

