Port-mips archive
Re: __atomic_test_and_set() and mips o32 - help wanted
On Tue, 18 Nov 2025, Jason Thorpe wrote:
> > Well, you need to trap into the kernel anyway to emulate the operation
> > required, so you might as well make it benefit hardware such as MIPS III,
> > which (as David has also observed) is very much legacy now too, having
> > reached 30+ years of age. And with the alternative syscall approach all
> > the parties lose, as even hardware that has atomics support available has
> > to trap.
>
> There are actually other ways to do this. This is a situation where an
> ifunc would be beneficial, or some other run-time fix-up to the correct
> implementation. A restartable atomic sequence is incredibly cheap.
Possibly, but that requires indirection, which would still penalise
machines from MIPS II onwards (or MIPS III in reality, as true MIPS II
hardware is as scarce as hen's teeth, having suffered from issues with its
ECL technology and having been largely unreliable in the first place).
Then you need to maintain this extra code, which means regression-testing
it on a regular basis and addressing issues as they arise, possibly from
changes made elsewhere.
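To make the cost of indirection concrete, here is a minimal C sketch of run-time dispatch: an implementation is picked once at start-up and every call then goes through a function pointer. It is an illustration under stated assumptions, not NetBSD code: `cpu_has_llsc()`, `tas_native()`, `tas_fallback()`, `tas_init()` and `tas()` are hypothetical names, the probe is hard-wired, and the registration of the fallback as a restartable atomic sequence with the kernel is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU probe: a real port might ask the kernel or decode the
 * processor ID.  Hard-wired to "true" here purely for illustration. */
static bool cpu_has_llsc(void)
{
    return true;
}

/* Native path: GCC expands this builtin inline (typically an LL/SC
 * sequence on MIPS II and later). */
static bool tas_native(bool *p)
{
    return __atomic_test_and_set(p, __ATOMIC_ACQUIRE);
}

/* Fallback path: a plain load/store pair.  This is only atomic (on a
 * uniprocessor) if the kernel restarts it when the thread is preempted
 * mid-sequence, i.e. a restartable atomic sequence; registering it with
 * the kernel is deliberately omitted here. */
static bool tas_fallback(bool *p)
{
    bool old = *p;
    *p = true;
    return old;
}

/* The indirection: every call goes through this pointer, set up once. */
static bool (*tas_impl)(bool *) = tas_fallback;

static void tas_init(void)
{
    tas_impl = cpu_has_llsc() ? tas_native : tas_fallback;
}

/* Returns the previous value of *p and leaves *p set. */
static bool tas(bool *p)
{
    return tas_impl(p);
}
```

Even on hardware with LL/SC, each `tas()` call pays for an indirect call that an inlined builtin would avoid, which is the penalty referred to above.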
Conversely, the compiler is happy to inline LL/SC sequences, as they map
directly to RTL, and at worst you need to maintain the emulation. It seems
like a good compromise.
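For comparison, a minimal sketch of the inlined approach: a test-and-set spinlock built directly on GCC's `__atomic_test_and_set` and `__atomic_clear` builtins, with no run-time dispatch at all. The type and function names (`tas_lock_t`, `spin_lock()` and so on) are hypothetical; on MIPS I, which lacks LL/SC, the instructions GCC emits here would have to be emulated by the kernel as discussed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock type: a single byte-sized flag. */
typedef struct {
    bool locked;
} tas_lock_t;

/* Spin until we are the one that flips the flag from clear to set. */
static void spin_lock(tas_lock_t *l)
{
    while (__atomic_test_and_set(&l->locked, __ATOMIC_ACQUIRE))
        ; /* busy-wait until the current holder clears the flag */
}

/* Single attempt; returns true if the lock was acquired. */
static bool spin_trylock(tas_lock_t *l)
{
    return !__atomic_test_and_set(&l->locked, __ATOMIC_ACQUIRE);
}

/* __atomic_clear is the documented counterpart of __atomic_test_and_set. */
static void spin_unlock(tas_lock_t *l)
{
    __atomic_clear(&l->locked, __ATOMIC_RELEASE);
}
```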
> > I have also saved the VAX GCC backend from being dropped and now non-BWX
> > (pre-EV56) Alpha support is at risk as the old register allocator is being
> > removed right away, along with all the backends that still rely on it, and
> > sadly I've been running out of resources to get that sorted, so you are
> > welcome (as is anyone) to step in and assist.
>
> Not everyone knows about the impending dooms, alas.  It’s a real shame
> that non-BWX support is on the chopping block (aside: how on earth can
> that have such a huge impact on the register allocator???) … there are
That's a historical artefact of the old IRA allocator in GCC: the Alpha
backend wires straight into the guts of the allocator so as to make sure
that all subword memory accesses produced as data is assigned to memory
locations are 4-byte-aligned, and can therefore be expanded into an
instruction sequence that does not require an extra register (one is
available as a temporary, but two are needed for an unaligned location).
The new LRA allocator obviously has no guts available to be wired into
(and no GCC general maintainer would let such abuse in nowadays anyway).
I guess the simplest initial approach would be to reserve a fixed
temporary register, so that an aligned memory access can be produced
without using an extra allocatable register.
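To illustrate why a temporary is needed at all, a minimal C sketch of storing a single byte on a machine that only has aligned 32-bit loads and stores. It shows the general read-modify-write technique only, not the sequence the Alpha backend actually emits (the real code differs in detail and uses dedicated byte insert/mask instructions); `store_byte_rmw()` is a hypothetical name, and little-endian byte numbering is assumed, as on Alpha.

```c
#include <assert.h>
#include <stdint.h>

/* Store one byte using only an aligned 32-bit load and store.
 * Assumes little-endian byte numbering.  NOT atomic: a concurrent store
 * to a neighbouring byte of the same word can be lost. */
static void store_byte_rmw(uint8_t *addr, uint8_t val)
{
    uintptr_t a = (uintptr_t)addr;
    uint32_t *word = (uint32_t *)(a & ~(uintptr_t)3); /* containing word */
    unsigned shift = (unsigned)(a & 3) * 8;           /* byte lane */
    uint32_t mask = (uint32_t)0xff << shift;

    uint32_t tmp = *word;                             /* load whole word */
    tmp = (tmp & ~mask) | ((uint32_t)val << shift);   /* merge new byte */
    *word = tmp;                                      /* store it back */
}
```

Roughly speaking, when the containing word's address and the byte lane are known at compile time, `shift` and `mask` fold into constants and only `tmp` needs a register; when they are not, the aligned address has to be computed at run time as well, which costs a further register.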
Implementing a way for backends to ask the allocator for a word-aligned
memory reference for subword memory accesses might be the right ultimate
approach. Alternatively, the allocator could be extended to permit more
than one temporary, and then the backend would handle the calculation.
Apparently the original IRA allocator was meant to support multiple
temporaries, but that was never implemented.
Cf. <https://gcc.gnu.org/wiki/LRAIsDefault>, which has been there for a
while now, and <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117185> for
the Alpha issue.
> quite a few non-BWX machines still out there (at least at chez moi,
> there are 3 + a parts machine), and for someone wanting to dabble in
> Alpha hardware, the BWX-supporting systems fetch top dollar on the used
> market (compared to merely more-than-I-should-spend for an EV4-class
> system) - people still gaga to run VMS I guess - making the non-BWX
> systems a lot more accessible.
I also have an EV45 box wired up in my lab (and another in a fully working
condition at my other site as well), and I know of a bunch of people who
want to run such machines too. And if you go for the more exotic stuff,
such as TURBOchannel, then non-BWX might be the only option.
> Emulation of BWX would be ***incredibly*** slow. Like, slow enough that
> pre-BWX systems would be unusable, I’m afraid.
There are issues with data races on subword memory accesses on non-BWX
Alpha machines too, which affect some algorithms. I've already written
the GCC-side workaround for them (see the `-msafe-bwa' and `-msafe-partial'
options, available as of GCC 15), as these races are the reason why
support has already been dropped from the Linux kernel: the non-BWX Alpha
hacks hindered all the other ports and didn't work anyway.
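A minimal sketch of how a subword store can be made race-free in general: retry the read-modify-write of the containing aligned word with a compare-and-swap until no other store to that word intervened. This uses GCC's `__atomic_compare_exchange_n`; on Alpha such a loop would map onto an LL/SC (`ldl_l`/`stl_c`) sequence. It illustrates the technique only and is not claimed to be what `-msafe-bwa' actually emits; `store_byte_safe()` is a hypothetical name and little-endian byte numbering is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Race-free byte store via the containing aligned 32-bit word.
 * Assumes little-endian byte numbering. */
static void store_byte_safe(uint8_t *addr, uint8_t val)
{
    uintptr_t a = (uintptr_t)addr;
    uint32_t *word = (uint32_t *)(a & ~(uintptr_t)3);
    unsigned shift = (unsigned)(a & 3) * 8;
    uint32_t mask = (uint32_t)0xff << shift;

    uint32_t old = __atomic_load_n(word, __ATOMIC_RELAXED);
    uint32_t upd;
    do {
        /* Merge our byte into the latest observed word value. */
        upd = (old & ~mask) | ((uint32_t)val << shift);
        /* On failure, `old` is refreshed with the current contents. */
    } while (!__atomic_compare_exchange_n(word, &old, upd, 1,
                                          __ATOMIC_RELAXED,
                                          __ATOMIC_RELAXED));
}
```

Unlike a plain read-modify-write, a concurrent store to a neighbouring byte makes the compare-and-swap fail and the loop retry, so that store is not lost.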
A kernel-side part is required as well, to emulate unaligned LL/SC for
data consistency only with otherwise non-atomic operations (it's really
for the corner case of an unaligned 16-bit memory write spanning an
aligned 4-byte boundary). I made a working prototype for Linux (no
regressions across the GCC testsuite!) back in February this year, on
the promise that it might be the way to bring support back at least for a
selection of systems (e.g. I've been told Jensen support won't ever come
back), but there were some concerns to address and I haven't found the
time to do so since, sigh...
Cf. <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117759> for a proper
description of the problem.
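For a 16-bit write, only an address at byte offset 3 within an aligned longword straddles the 4-byte boundary, which is why it is a corner case: such a write touches two longwords, so a single aligned 4-byte LL/SC pair cannot cover it. A minimal predicate a hypothetical emulation handler might use to spot such accesses (the name `access_straddles()` is made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if an access of `size` bytes at `addr` spans two naturally aligned
 * blocks of `align` bytes.  `align` must be a power of two and
 * size <= align. */
static bool access_straddles(uintptr_t addr, size_t size, size_t align)
{
    return (addr & (align - 1)) + size > align;
}
```

With `size` 2 and `align` 4 this is true only when `addr % 4 == 3`.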
> I’m not in a position to directly contribute to the typing-of-code
> effort to help save the pre-BWX Alpha back-end, but what else can I (and
> others) do to help?
Well, someone does need to write the code. I mean to do it myself, but I
can't stretch beyond my limits as I have other fires to put out too (and
a real life as well), so please feel free to beat me to it. Otherwise
I'll get there eventually, but it might take a couple of years yet.
NB there's also the issue on my plate of a broken/unimplemented VAX
exception unwinder, causing most C++ code not to work correctly. It's not
a trivial one, or I'd have fixed it already: I made an attempt a couple of
years ago, but failed.
Despite the native support in the VAX architecture, I now think it just
might be easier to rely on DWARF anyway, as virtually all the other
platforms do. I'm told the expressions required can be made into a DWARF
program, but it's not something I can do offhand, as my experience with
DWARF has been limited and not very recent (i.e. DWARF-2-ish).
Once that's been sorted, I have a working VAX/NetBSD gdbserver backend to
contribute (which I needed to debug my attempts at the unwinder, as native
GDB obviously doesn't work, requiring the unwinder as well!), but I've
hesitated to submit code known to always crash on termination owing to a
C++ exception going astray, even though GDB regression-test results via
said gdbserver look reasonable despite the crashes.
Yes, the Alpha issue mentioned above has had an impact here as well, as I
spent several weeks on it last year and this year rather than on the VAX
unwinder as I originally intended.
FWIW,
Maciej