Re: __atomic_test_and_set() and mips o32 - help wanted

To: Jason Thorpe <thorpej%me.com@localhost>
Subject: Re: __atomic_test_and_set() and mips o32 - help wanted
From: "Maciej W. Rozycki" <macro%orcam.me.uk@localhost>
Date: Wed, 19 Nov 2025 01:05:58 +0000 (GMT)

On Tue, 18 Nov 2025, Jason Thorpe wrote:

> > Well, you need to trap into the kernel anyway to emulate the operation 
> > required, so you might as well make it benefit hardware such as MIPS III, 
> > which (as David has also observed) is very much legacy now too, having 
> > reached 30+ years of age.  And with the alternative syscall approach all 
> > the parties lose, as even hardware that has atomics support available has 
> > to trap.
> 
> There are actually other ways to do this.  This is a situation where an 
> ifunc would be beneficial, or some other run-time fix-up to the correct 
> implementation.  A restartable atomic sequence is incredibly cheap.

 Possibly, but that requires indirection, which would still penalise 
machines from MIPS II onwards (or MIPS III in reality, as true MIPS II is 
also as scarce as hen's teeth, having been suffering from issues with the 
ECL technology and largely unreliable in the first place).

 Then you need to maintain this extra code, which means regression-testing 
on a regular basis and addressing issues as they arise, possibly from 
changes made elsewhere.

 Conversely the compiler is happy to inline LL/SC sequences as they map 
directly to RTL and at worst you need to maintain emulation.  It seems 
like a good compromise.
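
 For illustration only, here is a minimal sketch of the kind of code at 
stake, using the GCC `__atomic_test_and_set' and `__atomic_clear' 
builtins.  The `lock_flag' variable and the three function names are 
hypothetical.  The assumption is that with a target that has LL/SC the 
compiler can expand the builtins inline, while without it some fallback 
(a library call, a syscall or emulation) is needed.

```c
#include <stdbool.h>

/* Hypothetical one-byte lock flag; the name is illustrative only.  */
bool lock_flag;

/* Try to take the lock once.  __atomic_test_and_set() sets the byte and
   returns its previous state, so a false return means we now own it.  */
bool
lock_try(void)
{
	return !__atomic_test_and_set(&lock_flag, __ATOMIC_ACQUIRE);
}

/* Spin until the lock has been taken.  */
void
lock_acquire(void)
{
	while (!lock_try())
		continue;
}

/* Release the lock by clearing the flag.  */
void
lock_release(void)
{
	__atomic_clear(&lock_flag, __ATOMIC_RELEASE);
}
```

 Whether the builtins end up inline or as a call can be checked by 
compiling with `-S' for the target in question and looking at the 
assembly produced.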

> > I have also saved the VAX GCC backend from being dropped and now non-BWX 
> > (pre-EV56) Alpha support is at risk as the old register allocator is being 
> > removed right away, along with all the backends that still rely on it, and 
> > sadly I've been running out of resources to get that sorted, so you are 
> > welcome (as is anyone) to step in and assist.
> 
> Not everyone knows about the impending dooms, alas.  It’s a real shame 
> that non-BWX support is on the chopping block (aside: how on earth can 
> that have such a huge impact on the register allocator???) … there are 

 That's a historical artefact of the old IRA allocator in GCC: the Alpha 
backend wires straight into the guts of the allocator so as to make sure 
that all subword memory accesses produced as data is assigned to memory 
locations are 4-byte-aligned, and can therefore be expanded into an 
instruction sequence that does not require an extra register (one is 
available as a temporary, but two are needed for an unaligned location).
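
 To show why alignment matters here, below is a rough sketch in portable 
C of a byte access made with word-sized memory operations only, as a 
non-BWX machine has to do.  It is not the instruction sequence GCC emits 
for Alpha, and the function names are made up.  It assumes the containing 
32-bit word is aligned, in which case a single temporary (`w') suffices.

```c
#include <stddef.h>
#include <stdint.h>

/* Store BYTE at byte offset OFF from BASE using 32-bit accesses only.
   BASE is assumed 4-byte aligned.  Byte 0 is the least significant
   byte of a word, i.e. little-endian numbering as on Alpha.  */
void
store_byte_via_word(uint32_t *base, size_t off, uint8_t byte)
{
	uint32_t *wp = base + off / 4;		/* containing aligned word */
	unsigned shift = (unsigned)(off % 4) * 8;
	uint32_t w = *wp;			/* load the whole word */

	w &= ~((uint32_t)0xff << shift);	/* mask the old byte out */
	w |= (uint32_t)byte << shift;		/* insert the new byte */
	*wp = w;				/* store the whole word back */
}

/* Load the byte at byte offset OFF from BASE with a 32-bit access.  */
uint8_t
load_byte_via_word(const uint32_t *base, size_t off)
{
	return (uint8_t)(base[off / 4] >> ((unsigned)(off % 4) * 8));
}
```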

 The new LRA allocator obviously has no guts available to be wired into 
(and no GCC general maintainer would let such abuse in nowadays anyway).  
I guess the simplest initial approach would be to fix a temporary for the 
purpose of making sure an aligned memory access can be produced with no 
extra allocatable register use.

 Implementing a way for backends to request the allocator to produce a 
word-aligned memory reference for subword memory accesses might be the 
right ultimate approach.  Or the allocator could be extended to permit 
more than one temporary and then the backend would handle the calculation.  
Apparently the original IRA allocator was meant to have support for 
multiple temporaries implemented, but that never happened.

 Cf. <https://gcc.gnu.org/wiki/LRAIsDefault>, it's been there for a while 
now; <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117185> for the Alpha 
issue.

> quite a few non-BWX machines still out there (at least at chez moi, 
> there are 3 + a parts machine), and for someone wanting to dabble in 
> Alpha hardware, the BWX-supporting systems fetch top dollar on the used 
> market (compared to merely more-than-I-should-spend for an EV4-class 
> system) - people still gaga to run VMS I guess - making the non-BWX 
> systems a lot more accessible.

 I also have an EV45 box wired in my lab (and another in a fully working 
condition at my other site as well), and I know of a bunch of people who 
want to run such machines too.  And if you go for the more exotic stuff, 
such as TURBOchannel, then non-BWX might be the only option.

> Emulation of BWX would be ***incredibly*** slow.  Like, slow enough that 
> pre-BWX systems would be unusable, I’m afraid.

 There are issues with data races on subword memory accesses on non-BWX 
Alpha machines too, which affect some algorithms.  I've already written 
the GCC-side workaround for them (see the `-msafe-bwa' and `-msafe-partial' 
options, as from version 15), as they've been the reason why support has 
already been dropped from the Linux kernel: the non-BWX Alpha hacks 
hindered all the other ports and didn't work anyway.
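
 The race is easy to picture: a plain load/modify/store of the containing 
word can overwrite a neighbouring byte that another thread or a signal 
handler has updated in between.  The sketch below shows the general idea 
of a safe byte store built from a word-sized compare-and-exchange loop, 
here with the GCC `__atomic_compare_exchange_n' builtin standing in for an 
LL/SC sequence.  It is an illustration under those assumptions, with a 
made-up function name, and not the code the options above generate.

```c
#include <stddef.h>
#include <stdint.h>

/* Store BYTE at byte offset OFF from the 4-byte aligned BASE without
   losing concurrent updates to the other bytes of the same word.  */
void
store_byte_safely(uint32_t *base, size_t off, uint8_t byte)
{
	uint32_t *wp = base + off / 4;
	unsigned shift = (unsigned)(off % 4) * 8;
	uint32_t old = __atomic_load_n(wp, __ATOMIC_RELAXED);
	uint32_t upd;

	do {
		/* Rebuild the word from the latest contents seen.  */
		upd = (old & ~((uint32_t)0xff << shift))
		    | ((uint32_t)byte << shift);
		/* On failure OLD is refreshed with the current contents,
		   so a neighbour's update is picked up on the retry.  */
	} while (!__atomic_compare_exchange_n(wp, &old, upd, 1,
	    __ATOMIC_RELAXED, __ATOMIC_RELAXED));
}
```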

 A kernel side is required as well, to emulate unaligned LL/SC for data 
consistency only with otherwise non-atomic operations (it's for the corner 
case really of an unaligned 16-bit memory write spanning an aligned 4-byte 
boundary).  I made a working prototype for Linux (no regressions through 
the GCC testsuite!) back in Feb this year, based on a promise that it 
might be the way to bring support back at least for a choice of systems 
(e.g. Jensen support won't ever come back I've been told), but there were 
some concerns to address and I've found no time to address them since, 
sigh...

 Cf. <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117759> for a proper 
description of the problem.
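
 As a small aid for the corner case mentioned above, this hypothetical 
helper tells whether an access of a given size crosses an aligned 4-byte 
boundary.  It is plain address arithmetic and not taken from the 
prototype.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if an access of SIZE bytes at ADDR crosses an aligned
   4-byte boundary.  SIZE must be nonzero.  */
bool
crosses_word_boundary(uintptr_t addr, size_t size)
{
	/* Compare the word index of the first and the last byte.  */
	return (addr / 4) != ((addr + size - 1) / 4);
}
```

 For a 16-bit write only an address with the two low bits set, such as 
0x1003, spans two words; the other three offsets stay within one.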

> I’m not in a position to directly contribute to the typing-of-code 
> effort to help save the pre-BWX Alpha back-end, but what else can I (and 
> others) do to help?

 Well, someone does need to write the code.  I mean to myself, but I can't 
stretch beyond limits as I have other fires to put out too (and a real 
life as well), so please feel free to beat me to it.  Otherwise I'll get 
there eventually, but it might take a couple years yet.

 NB there's the issue too on my plate of a broken/unimplemented VAX 
exception unwinder causing most C++ code not to work correctly.  It's not 
a trivial one or I'd have done it already: I made an attempt a couple of 
years ago, but failed.

 Despite native support in the VAX architecture I now think it just might 
be easier to rely on DWARF anyway, just as virtually all the platforms do.  
And I'm told the expressions required can be made into a DWARF program, 
but it's not something I can do offhand as my experience with DWARF has 
been limited and not very recent (i.e. DWARF-2-ish).

 Once that's been sorted I have a working VAX/NetBSD gdbserver backend to 
contribute (which I needed to debug my attempts at the unwinder, as native 
GDB obviously doesn't work as it requires the unwinder as well!), but I've 
hesitated to submit code known to always crash on termination owing to a 
C++ exception going astray, even though GDB regression-test results via 
said gdbserver look reasonable despite the crashes.

 Yes, the Alpha issue mentioned above has had an impact here as well, as I 
spent several weeks on it last year and this year rather than on the VAX 
unwinder as I originally intended.

 FWIW,

  Maciej
