tech-pkg archive


Re: Can't build lang/openjdk{17,21} on NetBSD/aarch64 (Apple M3)



On 7/24/25 18:19, Nick Hudson wrote:
I guess the code generation isn't sync'd correctly and the icache is stale

The Arm ARM has this example.

; Coherency example for data and instruction accesses within the same
; Inner Shareable domain.
; Enter this code with <Wt> containing a new 32-bit instruction,
; to be held in Cacheable space at a location pointed to by Xn.

STR Wt, [Xn]
DC CVAU, Xn    ; Clean data cache by VA to point of unification (PoU)
DSB ISH        ; Ensure visibility of the data cleaned from cache
IC IVAU, Xn    ; Invalidate instruction cache by VA to PoU
DSB ISH        ; Ensure completion of the invalidations
ISB            ; Synchronize the fetched instruction stream
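
The sequence above handles a single instruction. As a rough sketch only, the same steps can be applied to a whole range by walking it one cache line at a time. The helper name sync_icache_range is hypothetical, and the AArch64 path assumes the kernel lets userland read CTR_EL0 and issue DC CVAU / IC IVAU. On other architectures the sketch simply falls back to the compiler builtin.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: make newly written code in [start, start + len)
// visible to instruction fetch.
inline void sync_icache_range(void* start, std::size_t len) {
#if defined(__aarch64__)
    // Assumption: CTR_EL0 is readable from userland on this OS.
    std::uint64_t ctr;
    __asm__ __volatile__("mrs %0, ctr_el0" : "=r"(ctr));
    // DminLine (bits 19:16) and IminLine (bits 3:0) are log2 of the
    // line size in 4-byte words.
    const std::uintptr_t dline = 4u << ((ctr >> 16) & 0xf);
    const std::uintptr_t iline = 4u << (ctr & 0xf);
    const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(start);
    const std::uintptr_t end = begin + len;

    // Clean the data cache to the point of unification, line by line.
    for (std::uintptr_t p = begin & ~(dline - 1); p < end; p += dline)
        __asm__ __volatile__("dc cvau, %0" : : "r"(p) : "memory");
    __asm__ __volatile__("dsb ish" : : : "memory");

    // Invalidate the instruction cache for the same range.
    for (std::uintptr_t p = begin & ~(iline - 1); p < end; p += iline)
        __asm__ __volatile__("ic ivau, %0" : : "r"(p) : "memory");
    __asm__ __volatile__("dsb ish" : : : "memory");
    __asm__ __volatile__("isb" : : : "memory");
#else
    // Elsewhere, defer to the compiler's builtin.
    char* begin = static_cast<char*>(start);
    __builtin___clear_cache(begin, begin + len);
#endif
}
```

The final ISB only synchronizes the core that executes it, so other threads that will run the new code need their own synchronization.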

Thanks for your insight. I thought that was plausible, so I investigated further. It turned out that the only icache-related difference between openjdk8 (which works) and openjdk17 (which doesn't) is that openjdk8 calls __clear_cache() from libgcc directly, while openjdk17 uses __builtin___clear_cache().

hotspot/src/cpu/aarch64/vm/icache_aarch64.hpp from OpenJDK 8 does this:

class ICache : public AbstractICache {
 public:
  static void initialize();
  static void invalidate_word(address addr) {
    __clear_cache((char *)addr, (char *)(addr + 3));
  }
  static void invalidate_range(address start, int nbytes) {
    __clear_cache((char *)start, (char *)(start + nbytes));
  }
};

src/hotspot/os_cpu/bsd_aarch64/icache_bsd_aarch64.hpp from OpenJDK 17 does this:

class ICache : public AbstractICache {
 public:
  static void initialize();
  static void invalidate_word(address addr) {
    __builtin___clear_cache((char *)addr, (char *)(addr + 4));
  }
  static void invalidate_range(address start, int nbytes) {
    __builtin___clear_cache((char *)start, (char *)(start + nbytes));
  }
};
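
For reference, GCC documents __builtin___clear_cache(begin, end) as covering the region from begin inclusive to end exclusive, so a 4-byte AArch64 instruction at addr is covered by (addr, addr + 4). A minimal sketch of that convention, with hypothetical helper names standing in for the ICache methods:

```cpp
#include <cstddef>

// Hypothetical stand-in for ICache::invalidate_range().
// The end pointer is exclusive: it points one past the last byte to flush.
inline void flush_code_range(void* start, std::size_t nbytes) {
    char* begin = static_cast<char*>(start);
    __builtin___clear_cache(begin, begin + nbytes);
}

// Hypothetical stand-in for ICache::invalidate_word().
// One AArch64 instruction is 4 bytes, so the exclusive end is addr + 4.
inline void flush_code_word(void* addr) {
    flush_code_range(addr, 4);
}
```

Since cache maintenance works on whole cache lines, the addr + 3 in the OpenJDK 8 header probably touches the same line for an aligned instruction, which fits the observation that the end address is not what separates the two versions.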

However, running "objdump --disassemble" against both bootkits revealed something very surprising. In OpenJDK 17, the call to invalidate_range() was inlined into its caller AbstractAssembler::flush(), which ends in a tail call to __clear_cache():

void AbstractAssembler::flush() {
  ICache::invalidate_range(addr_at(0), offset());
}

0000000000385738 <_ZN17AbstractAssembler5flushEv>:
  385738:       f9400400        ldr     x0, [x0, #8]
  38573c:       f9400002        ldr     x2, [x0]
  385740:       f9400801        ldr     x1, [x0, #16]
  385744:       aa0203e0        mov     x0, x2
  385748:       cb020021        sub     x1, x1, x2
  38574c:       8b21c041        add     x1, x2, w1, sxtw
  385750:       17fc18a8        b       28b9f0 <__clear_cache@plt>
  385754:       d503201f        nop

But in OpenJDK 8 the function was a no-op (a bare ret), and libjvm.so contained no references to __clear_cache@plt at all!

000000000008e250 <_ZN17AbstractAssembler5flushEv>:
   8e250:       d65f03c0        ret
   8e254:       d503201f        nop

This suggests lang/openjdk8 doesn't do JIT, which turned out to be the case:

% java -version
openjdk version "1.8.0_452-internal"
OpenJDK Runtime Environment (build 1.8.0_452-internal-pkgsrc_1.8.452-b00)
OpenJDK 64-Bit Zero VM (build 25.452-b00, interpreted mode)
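
The last line of that banner is what gives it away. As a small sketch, a build or test script could classify the banner like this. The names VmKind, classify_vm and is_interpreter_only are hypothetical, and the "Server VM" / "Client VM" strings are what HotSpot builds normally print rather than anything taken from this thread.

```cpp
#include <string>

// Hypothetical classification of the last line of `java -version`.
enum class VmKind { Zero, HotSpot, Unknown };

inline VmKind classify_vm(const std::string& banner) {
    const auto npos = std::string::npos;
    if (banner.find("Zero VM") != npos)
        return VmKind::Zero;
    if (banner.find("Server VM") != npos || banner.find("Client VM") != npos)
        return VmKind::HotSpot;
    return VmKind::Unknown;
}

// True when the VM reports that it runs without a JIT compiler.
// HotSpot started with -Xint prints "interpreted mode" as well,
// so this alone does not identify Zero.
inline bool is_interpreter_only(const std::string& banner) {
    return banner.find("interpreted mode") != std::string::npos;
}
```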

While searching the Internet I found something interesting. It appears FreeBSD on Apple Silicon suffers from the same issue, presumably because this CPU enforces W^X at the hardware level no matter what the kernel does: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265284

It seems the OpenJDK folks applied a macOS-specific workaround on aarch64 to avoid mapping pages W+X, but did nothing for other OSes. This means that, until they rewrite the HotSpot VM to stop using W+X mappings everywhere, we can only use the Zero VM on Apple Silicon chips.
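
To illustrate what avoiding W+X means in general, here is a minimal sketch that never holds a page writable and executable at the same time. It is not HotSpot's code and not the macOS workaround. The helper name install_code is hypothetical, and the sketch assumes the OS allows mprotect() to turn a writable page into an executable one, which hardened configurations may refuse.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy `len` bytes of machine code into a fresh
// mapping that is first read/write, then read/execute, never both.
// Returns nullptr on failure; the caller releases it with munmap().
inline void* install_code(const void* code, std::size_t len) {
    // Step 1: writable, not executable.
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return nullptr;

    std::memcpy(p, code, len);

    // Step 2: drop write permission before adding execute permission.
    // Assumption: the OS permits this transition.
    if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(p, len);
        return nullptr;
    }

    // Step 3: make the new code visible to instruction fetch.
    char* begin = static_cast<char*>(p);
    __builtin___clear_cache(begin, begin + len);
    return p;
}
```

A JIT that patches code after installing it would have to flip the page back to read/write first, which is presumably why retrofitting this onto an existing VM is a lot of work.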

Unfortunately, it is not possible to choose the Zero VM at runtime; it is only a configure-time option, --with-jvm-variants=zero. Also, it wouldn't be wise to disable JIT on all aarch64 platforms just for the sake of Apple Silicon. And now I don't know what to do, aside from diving deeply into the codebase and patching the VM to adhere to today's standard of W^X...
