Port-vax archive


Re: Moving VAX into 21 century :-)



Morning Johnny,

On 2019-08-27 at 00:40, Johnny Billquist wrote:
A couple of comments without having caught up where the thread is now...

On 2019-08-26 14:08, Anders Magnusson wrote:
Hi all,

I have been looking at some VAX problems lately, and have found that there are two architectural things that would probably help VAX quite a lot.

1) Change calling convention.
     As described in my previous mail, it would solve a very old well-known performance problem.

It is indeed well known that the CALL/RET instructions on the VAX are very heavy. However, it is not that they are heavy for no reason. I can't really see that the JSB/RSB would be much better, unless we actually think that we want to strip some of that functionality away.

Things that CALL does:
Pushes the requested registers on the stack at entry, and automatically restores them again at return.
Saves AP and sets up new AP.
Saves and sets up new FP.

Pushing and popping registers will need to be done by the compiler if not using CALL/RET. That should not have any penalty on speed, but will grow memory needs a little, I would expect. Setting up AP - well, you might have some clever conventions in mind, but I would expect JSB to need an effort similar to CALL. Setting up FP - if we don't want tracebacks, or returning without first cleaning up the stack, to work, then this can save some time. But is this really something we'd like?

I wonder how much gain there really is, if you still want all the bells that CALL gives you? I would expect the end cost to come out about the same, but with more memory required.
In the common case we don't need any of the extra stuff that CALLS does, this is why jsb/rsb can be used instead.
- Pass parameters in registers (we have a bunch of them).  This avoids memory cycles as well which is good.
- No need for AP (Use for TLS?)
- No need for FP (unless we are playing with VLAs which is quite uncommon)
- No need to save PSL or align stack.  Keeping stack aligned is up to the compiler.
- Keep a "red zone" below stack of 8 words or so to simplify for leaf functions.  Amd64 ABI does this as well.

It won't increase the code size noticeably:
- CALLS is (usually) 9 bytes (7 + the word in the function)
- JSB is 6 bytes.
- PUSHR/POPR takes 4 bytes.

So if no regs need saving we save 3 bytes, otherwise we add 5.
Also we save three bytes if we stay entirely inside the red zone.
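
The byte counts above can be checked with a few lines of arithmetic. This is only a sketch: the constants are the sizes quoted above, PUSHR and POPR are taken to be 4 bytes each (which is what the "add 5" figure implies), and the function name is made up for illustration.

```python
# Per-call code size, using the byte counts quoted above.
CALLS_BYTES = 9   # 7 for the CALLS itself + the word in the called function
JSB_BYTES = 6
PUSHR_BYTES = 4   # explicit register save (assumed 4 bytes on its own)
POPR_BYTES = 4    # explicit register restore (assumed 4 bytes on its own)


def jsb_delta(saves_registers: bool) -> int:
    """Size of the JSB sequence minus the CALLS one, in bytes.

    A negative result means the JSB variant is smaller.
    """
    size = JSB_BYTES
    if saves_registers:
        size += PUSHR_BYTES + POPR_BYTES
    return size - CALLS_BYTES


print(jsb_delta(False))  # -3: three bytes saved
print(jsb_delta(True))   # 5: five bytes added
```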



2) Make VAX use IEEE floats :-)
     Today virtually no floating point exists that is not IEEE. The only remnant still around is probably the VAX floats.

This can hardly be a performance problem. So now we're talking about some compatibility or general behavior thing? But FP on VAX does have differences, in the hardware, that we cannot pretend don't exist. So is this about stupid programs making assumptions that we fail, while not actually caring much about real IEEE FP? Or do we really want to be proper IEEE FP, in which case we're going to need to emulate in software, which will have a huge impact on performance?
We're talking about being able to compile and run existing programs without getting a headache.

If we just want to fake it, we might just want to lie, and shortcircuit a couple of obvious differences. And then run with it.

     I have done some checking, and if we accept the difference in rounding (VAX uses a different way than IEEE) then it would be (almost) no overhead in the common cases (overhead comes when dealing with INF, NAN and subnormals).
     - Use F and G floats.  They have the same format as IEEE single and double, and are both available on virtually all VAXen.
     - Make use of the floating point faults that VAXen can generate to emulate the features missing on VAX.
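
To make the "common case" split concrete, here is a rough sketch that sorts IEEE single values into those that would need the slow path (INF, NaN, subnormals) and everything else. It only inspects IEEE bit patterns on the host; the function name is hypothetical, and it says nothing about how a VAX fault handler would actually be written or about range differences between the formats.

```python
import struct


def needs_emulation(x: float) -> bool:
    """True if x, viewed as an IEEE single, is INF, NaN or subnormal."""
    # Reinterpret the single-precision encoding as a 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        return True                      # INF (fraction == 0) or NaN
    if exponent == 0 and fraction != 0:
        return True                      # subnormal
    return False                         # zero or a normal number


for value in (1.0, 0.0, float("inf"), float("nan"), 1e-40):
    print(value, needs_emulation(value))
```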

So essentially just trying to fake it?
I seem to remember there are some differences about non-vanishing values as well, but my brain is fuzzy...
The largest difference will be the rounding, since it works differently on VAX and (default) IEEE, and it's not worth trying to fix that. (OTOH, most HW implementations have problems as well, like the (in-)famous x87 :-)
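
A small illustration of where the two rounding styles part ways, assuming that VAX hardware rounds by adding half an ulp to the magnitude and truncating (ties away from zero) while default IEEE rounds ties to even. Both helper functions are made up for this example and work on a plain integer significand magnitude rather than on real float encodings.

```python
def round_half_even(sig: int, drop: int) -> int:
    """Drop the low `drop` (>= 1) bits of sig, rounding ties to even."""
    kept = sig >> drop
    rest = sig & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rest > half or (rest == half and kept & 1):
        kept += 1
    return kept


def round_half_away(sig: int, drop: int) -> int:
    """Drop the low `drop` (>= 1) bits of sig: add half an ulp, truncate."""
    return (sig + (1 << (drop - 1))) >> drop


# The two only disagree on an exact tie whose kept part is even.
print(round_half_even(0b101, 1), round_half_away(0b101, 1))    # 2 3
print(round_half_even(0b111, 1), round_half_away(0b111, 1))    # 4 4
print(round_half_even(0b1011, 2), round_half_away(0b1011, 2))  # 3 3
```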


     ...in theory also H floats could be used as long double since they match the IEEE 128-bit quad precision :-)
     But since H float is optional it might end up being emulated (maybe not a big problem?)

Comments on this?

Not sure if it's a good idea or not. Not saying for sure it should not be done, but I'd like to better understand what gains we will have, and what we might be sacrificing.
Calling convention:
    + Speed
    - Require update of toolchain

IEEE Floating point:
    + Compatibility with rest of the world. Simple to compile programs. Possible to use otherwise unusable programs.
    - Require update of toolchain. Will not support IEEE rounding (would be too slow then).

Not to mention, how come the number of calls has blown up so much?
Much more code, and much more modular code.

Most of the code that was written some 30 years ago on VAX avoided function calls because they were slow; there are many comments about that in old code.  This is not the case at all anymore.

-- R


