Port-vax archive


Re: Some more patches for GCC on NetBSD/VAX coming soon...



> On Mar 30, 2016, at 01:09, David Brownlee <abs%netbsd.org@localhost> wrote:
> 
> On 30 March 2016 at 07:21, Jake Hamby <jehamby420%me.com@localhost> wrote:
>> I'm looking at a few remaining issues in the recent update in NetBSD-current to GCC 5.3, which overall appears to be an improvement over 4.8.5. I dropped GCC-patches from the CC list because I don't think 98% of the subscribers to that list care about VAX, while I know that 100% of the subscribers to this one do. ;-)
> 
> That sounds like a very safe assumption :-p

Yes, I replied to an engineer from Red Hat on that list to apologize for even bothering the list with something in so rough a state for an architecture that 99% of them aren't interested in. To be honest, if I were working on supporting GCC for my day job (despite my great interest in toolchains, I haven't yet worked on a compiler "professionally"), I would probably lean towards voting to deprecate the VAX backend since apparently the previous maintainer is no longer working on it and it obviously has some bugs that still need to be fixed.

I'm very happy with the progress I've been able to make so far (as I said, I don't have professional expertise in compilers, but I've learned a lot on my own), but there needs to be a lot more work in order to "undeprecate" GCC's VAX backend.


> I think that would be great. The CVAX/NVAX would definitely make sense
> as the default CPU target. If you were interested in getting numbers
> for other VAX variants (this is a "what you're already planning is
> most excellent", not a "hey you should do this too" :), then I'm sure
> there are some list members who would be happy to run tests or provide
> remote access to appropriate machines.

Yes, I emailed a little test utility, to which I still need to add about 30 or 40 more testcases. So far it tests integer multiplies of different sizes, extended multiply, and nop. Most of the tests are one-line asm instructions that expand out to a loop of 100 opcodes in a row, and then I scale the loop count to try to run for 30 seconds (NetBSD only has a 100 Hz system clock on VAX, so gettimeofday() is only accurate to 0.01s). When I have a full set of tests, I'll email it to the list and maybe it could be checked into a miscellaneous directory in the NetBSD tree for future use. I'll try to make the printf() output be a C struct that can be pasted directly into a header or source file for a particular machine, like GMP does for its tuning parameters (which I also want to update, after I've tuned GCC's output to be as fast as it can be for the stuff GMP cares about).
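
The loop-count scaling described above can be sketched roughly as follows. This is not the test utility itself: the function names and the idea of a short trial run are assumptions made for illustration, and Python is used only as a calculator. The 100-opcode loop body, the 30-second target, and the 0.01 s clock tick are taken from the description above.

```python
import math

OPS_PER_ITERATION = 100   # each loop body is 100 copies of the opcode
CLOCK_TICK = 0.01         # seconds; 100 Hz system clock on NetBSD/vax
TARGET_SECONDS = 30.0     # desired length of the measured run


def scale_loop_count(trial_iterations, trial_seconds, target=TARGET_SECONDS):
    """Scale a short (hypothetical) trial run up to about `target` seconds."""
    if trial_seconds <= 0:
        raise ValueError("trial run too short for the clock to measure")
    return math.ceil(trial_iterations * target / trial_seconds)


def seconds_per_op(iterations, elapsed_seconds):
    """Average time for one opcode, ignoring loop overhead."""
    return elapsed_seconds / (iterations * OPS_PER_ITERATION)


def worst_case_relative_error(elapsed_seconds, tick=CLOCK_TICK):
    """Relative timing error caused by one clock tick of uncertainty."""
    return tick / elapsed_seconds
```

For a 30-second run, one tick of uncertainty is roughly 0.03% of the measurement, which is why a long run makes up for the coarse clock.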

For those who don't know about GCC's library dependencies, recent versions link with several GNU math libraries (GMP, MPFR, MPC) to handle arbitrary-precision arithmetic so that it can generate the correct constants from any build host for the desired target. Because I've had to build those dependencies, I became familiar with the testsuite and learned that there are various tuning parameters and optimized asm versions of the underlying math functions that have already been written for VAX but haven't been updated in 10 or 20 years. So that's on my TODO list as well.

The interesting thing will be to see if the ratios between # of cycles for various types of instructions are roughly the same for CVAX/NVAX and for the older "full VAX" models, or if, for example, you could speed up memory copies by using the octaword (128-bit) mov instructions instead of two 64-bit copies, or movc (block move), or something like that. Also, there may be big differences between the penalties for going to memory vs. registers, and for the more complex addressing modes. On SimH, all the instructions seem to take 2-4x as long as nop. Nop takes about 34-40 of the physical CPU's clock cycles to run, so divide the real CPU MHz by that amount and that's the virtual VAX speed for simple instructions.
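
The SimH rule of thumb above amounts to a single division. A minimal sketch follows, where the function names and the 3400 MHz host in the usage note are hypothetical, and only the 34-40 cycles-per-nop figure comes from the measurement above:

```python
def virtual_vax_rate(host_mhz, host_cycles_per_nop):
    """Millions of simple VAX instructions per second under emulation."""
    return host_mhz / host_cycles_per_nop


def virtual_vax_range(host_mhz, fastest=34, slowest=40):
    """(low, high) estimate using the 34-40 host cycles per nop range."""
    return (virtual_vax_rate(host_mhz, slowest),
            virtual_vax_rate(host_mhz, fastest))
```

For example, virtual_vax_range(3400.0) gives (85.0, 100.0), so a hypothetical 3.4 GHz host would run simple instructions at roughly 85 to 100 million per second, and other instructions 2-4x slower than that.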

So far, I know that both "emul" and 16-bit multiply are much slower than one might expect on NVAX. But for 64-bit math, the alternative to "emul" might be a dozen instructions that add up to the same thing. Other CPUs may have different penalties. I especially want to work on optimizing 64-bit math ops because "long long" is used so extensively in more and more code. time_t is now 64 bits on NetBSD, as is the HOST_BITS_PER_WIDE_INT size on GCC, even if the host is 32 bits (it uses "long long"). So the code currently in vax.c that expects WIDE_INT to be 32, and not 64, is definitely broken (I added an "#if 0" around it in my version for now, since it was only ever used for "-Os" anyway).
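
To show why "emul" matters for "long long" math, here is a rough model of what the instruction computes and of how a 64-bit product can be assembled from 32-bit pieces. It is the generic schoolbook decomposition, not the sequence GCC actually emits for VAX; the helper names are made up for this sketch, and Python integers stand in for registers.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def emul(multiplier, multiplicand, addend):
    """Model of EMUL: signed 32x32 multiply plus a sign-extended 32-bit
    addend, giving a 64-bit result (returned as an unsigned bit pattern)."""
    product = to_signed(multiplier, 32) * to_signed(multiplicand, 32)
    return (product + to_signed(addend, 32)) & MASK64


def mul64_from_32bit_parts(a, b):
    """Low 64 bits of a*b built from 32-bit halves (schoolbook method)."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    low = a_lo * b_lo                              # needs a full 64-bit product
    cross = (a_lo * b_hi + a_hi * b_lo) & MASK32   # only the low 32 bits survive
    return (low + (cross << 32)) & MASK64          # a_hi*b_hi falls off the top
```

Only the a_lo * b_lo term needs a widening multiply. Because EMUL is signed, a real instruction sequence also needs correction steps when either low half has its top bit set, which the sketch sidesteps by using unbounded integers.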

Another thing I'm curious about is what the performance penalty for G_float vs. D_float is on different machines. G_float appears to be only slightly slower than D_float on NVAX, and I wonder if there is a bigger difference on the older models. NetBSD uses D_float for its doubles: they have the same exponent range as a float but more digits in the mantissa. I would really have preferred if NetBSD had used G_float by default because they're closer to IEEE-754 doubles: one fewer decimal digit of mantissa than D_float, but much greater exponent range. In fact, you can represent a googol (1.0e100, the number that Google was named after) in an IEEE-754 double, or a G_float, but not a D_float, even though they're all 64 bits. I don't think DEC named it G_float after "googol", but because they already had F_float for single-precision, and G comes after F. The 128-bit H_float FP format that the pre-MicroVAX models support is of fairly limited usefulness, since few platforms even today support "long double" of that precision, and no software that I've ever come across requires long double. x86 has had an 80-bit long double that it uses internally and that compilers support, and some RISC chips do support 128-bit FP, but it's probably fairly slow even on the systems that support it (we can test this).
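
To make the range and precision comparison concrete, here is a small sketch. The field widths are the standard VAX layouts (F and D: 8-bit exponent, excess 128; G: 11-bit exponent, excess 1024; significand widths include the hidden bit). The table and function names are made up for this illustration, and Python is used only as a calculator.

```python
import math
from fractions import Fraction

# format: (exponent bits, significand bits including hidden bit, exponent bias)
VAX_FORMATS = {
    "F_float": (8, 24, 128),
    "D_float": (8, 56, 128),
    "G_float": (11, 53, 1024),
}


def largest(fmt):
    """Largest finite magnitude: fraction just under 1.0 times 2**max_exp."""
    exp_bits, sig_bits, bias = VAX_FORMATS[fmt]
    max_exp = (1 << exp_bits) - 1 - bias
    return (1 - Fraction(1, 1 << sig_bits)) * Fraction(2) ** max_exp


def decimal_digits(fmt):
    """Approximate decimal digits of precision carried by the significand."""
    return VAX_FORMATS[fmt][1] * math.log10(2)


def in_range(fmt, value):
    """True if the magnitude fits without overflow (says nothing about rounding)."""
    return abs(Fraction(value)) <= largest(fmt)
```

With these definitions, in_range("D_float", 10**100) is false and in_range("G_float", 10**100) is true, while decimal_digits() comes out just under 17 for D_float and just under 16 for G_float.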

The reason I'd like to see better support for compiling as G_float is that VAX floating point also does not support denormalized numbers, which means that there's an abrupt gap between +/-2.9e-39 and 0 (there's a hidden 1 bit, even when the mantissa is all 0). With IEEE math, there's special support for this case (see https://en.wikipedia.org/wiki/Denormal_number), which VAX doesn't support at all in any mode. You can compile as G_float now with "-mg", but all of the libraries that you link with will crash or return bad results for any "double" values that you pass or receive, because the bits are interpreted differently.

The reason that a switch from D to G float would be worthwhile isn't to support very large numbers, because most programs don't ever deal with quantities as big as 1e100, and if they did, they'd already be crashing on VAX with FP overflow signals. What concerns me are programs that perform subtraction or other math with numbers very close to zero, because they'll be rounding to 0 much more often than you would expect of a double (with or without denormalized number support, which some CPUs like Alpha don't fully support in hardware, anyway). With G_float, you can go down to +/-5.6e-309 before it rounds down to 0.
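
The two underflow thresholds quoted above follow directly from the exponent bias of each format. A rough sketch, assuming underflow is modeled as a plain flush to zero (trap behavior and rounding of the fraction are ignored) and using names invented for this example:

```python
from fractions import Fraction

# Exponent bias for each VAX format (excess-128 and excess-1024).
EXPONENT_BIAS = {"D_float": 128, "G_float": 1024}


def smallest_positive(fmt):
    """Smallest nonzero magnitude: fraction 0.5 (hidden bit only) at exponent 1."""
    return Fraction(1, 2) * Fraction(2) ** (1 - EXPONENT_BIAS[fmt])


def store(fmt, value):
    """Model a result being stored: anything below the threshold becomes 0."""
    value = Fraction(value)
    return value if abs(value) >= smallest_positive(fmt) else Fraction(0)
```

smallest_positive() works out to 2**-128 (about 2.9e-39) for D_float and 2**-1024 (about 5.6e-309) for G_float. A difference of 1e-40 between two nearby values therefore vanishes when stored as D_float but survives as G_float.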

There's a performance penalty to G_float, and a code size penalty (2-byte vs. 1-byte opcodes in some cases), but a much bigger conversion penalty of having to recompile all libraries for both FP formats, like VMS did (Alpha has hardware support for both IEEE and VAX FP formats). If you want to live dangerously and are willing to recompile everything at once, you could probably do "-mg" right now and see what happens. I'll probably do that with a SimH image, but it's probably too impractical for NetBSD/vax to ever switch over the default FP format.

At the very least, it would be nice to add a build mode so that selected libraries (-lc, -lm, -lgcc, -lstdc++, etc.) would be built for G_float as well as D_float and installed in a separate /usr/lib/g-float directory that GCC knew about, so that you could compile programs and other libs with "-mg" and they would be able to coexist with other programs using D_float. But I'm curious if any VAXen pay any serious penalties to use G_float vs. D_float.


> As an aside, I would say the biggest issue for NetBSD/vax (after a
> working compiler :) is the footprint of recent gcc versions (and
> recent NetBSD itself) is more than any but the fastest (and most
> memory rich) VAXen can handle. That's an unfortunate artefact of
> keeping the gcc version up to date and adding features to NetBSD that
> are expected on modern platforms (amd64 etc), but it would definitely
> be nice if someone were to eke out any performance improvements
> possible for the code that gcc generated :)

Yes, absolutely! That's a very good point. I was very concerned that GCC would become unusable on the older platforms in upgrading from 4.8.5 to 5.3, but it seems to be a bit faster than 4.8, at least on the build host. (GCC on the VAX itself is much slower for the moment because large pieces are being compiled with "-O0" to avoid GCC crashing on some 64-bit conversion stuff.) The big decrease in speed was going to 4.8 from earlier versions that could be compiled with the C compiler instead of C++.

As for memory usage, I'm fortunate to have been able to max out (thanks to eBay) my VS4000/90 with the full 128MB of ECC RAM, which is actually sufficient for build purposes. In addition, I bought an ACARD AEC-7730A SCSI-to-SATA bridge (works great, but costs close to $300) to connect a 120 GB SATA SSD to the VS4000 through the bridge. I had to flash to the latest VS4000/90 firmware (which I managed to find) in order for the boot ROM to show the correct size for drives bigger than about 1GB, and it turns out that VAX/VMS is incompatible with the SCSI-to-SATA bridge because it sends obsolete SCSI commands that probably don't work with the size of SSD that is connected. But NetBSD boots and runs perfectly from an SSD.

From my previous experiments, I think that 128MB of RAM is just enough to build most C/C++ programs, including NetBSD itself, with GCC because it seems to be pretty frugal about memory. For SimH, I've been using the MicroVAX 3900 emulator with 128MB RAM (in the sources, you can see they had to patch the boot loader code to support twice the max RAM that the real machine supported). And as I said, I think, although I'm not certain, that GCC 5.3 at the very least isn't using any more RAM than the earlier versions compiled as C++, and probably not much more than the earlier versions compiled as C. They're not using exceptions or anything like that for GCC itself, and they disable RTTI, so there isn't any more inherent overhead than C for the subset of C++ that GCC is using.

Sadly, I can't say the same for LLVM. In my other hobbyist experiments with FreeBSD on PowerPC, I've been trying to get their system version of Clang to work for PowerPC, but there are lots of issues related to nobody caring about LLVM on PowerPC, apparently. Even when it does work, it seems to require about 1GB RAM to rebuild itself without swapping excessively. I think we can safely rule out any possibility of any hypothetical LLVM/Clang backend for VAX *ever* being able to self-host. They're using the same basic flavor of C++ (no exceptions, no RTTI), but written with very different assumptions about available memory.


> One thing you may find is that shared libraries have a significant
> performance penalty over static linking on VAX (assuming you are
> running on a system with enough memory to offset the smaller working
> set - which is not usually easy on a VAX :/)

Actually, this is a very interesting point, and one that has been annoying me constantly while working on this. NetBSD and ELF on VAX builds everything as -fpic, always. In fact, it's one of the few platforms where neither "-fno-pic" nor "-fomit-frame-pointer" has any effect. It's not like Mac OS X, which builds everything as PIC by default but lets you disable it. And you can't disable the frame pointer either, because CALLS/RET uses both %fp and %sp. As for shared libraries, the vax.md backend does some really weird stuff to resolve references to external symbols in them. I don't know if it's any weirder than what other GCC backends have to do for ELF, but there's some definite overhead to patching up references to code and data in shared libraries, even though everything is also being compiled as -fPIC, and this can't be disabled (on ELF, at least).

On the plus side, NetBSD's Makefiles are smart enough to only build the .o files for the libraries once, instead of building separate ".so" versions with "-fPIC" that would end up being identical and take twice as much build time. Speaking of which, here's a small patch to add "-g" when building .o files for libraries when MKDEBUG is enabled and MKPICLIB is "no". Currently, MKDEBUG doesn't build the .o files with debugging, so you get useless stub .debug files for /usr/libdata/debug in the debug.tgz. With this patch, the /usr/lib/*.a files become much larger because they also have debug info (but you're compiling with MKDEBUG defined, so that's probably okay), but you should gain the ability to trace into shared libraries (I think this works!) because there'll be something useful in the /usr/libdata/debug/usr/lib/*.debug files. It also eliminates the need to use MKDEBUGLIB as well, because the /usr/lib/*.a files will have the debug info for static linking, as already mentioned.

Index: share/mk/bsd.lib.mk
===================================================================
RCS file: /cvsroot/src/share/mk/bsd.lib.mk,v
retrieving revision 1.367
diff -u -r1.367 bsd.lib.mk
--- share/mk/bsd.lib.mk	12 Mar 2016 23:08:58 -0000	1.367
+++ share/mk/bsd.lib.mk	1 Apr 2016 20:14:36 -0000
@@ -162,7 +162,7 @@
 # We only add -g to the shared library objects
 # because we don't currently split .a archives.
 CSHLIBFLAGS+=	-g
-.if ${LIBISPRIVATE} == "yes"
+.if ${LIBISPRIVATE} == "yes" || ${MKPICLIB} == "no"
 CFLAGS+=	-g
 .endif
 .endif


This email is long enough so I'll end here and write more when I have more patches to share.

Best regards,
Jake

