tech-pkg archive
Re: Can't build lang/openjdk{17,21} on NetBSD/aarch64 (Apple M3)
On 7/24/25 18:19, Nick Hudson wrote:
I guess the code generation isn't sync'd correctly and the icache is stale
The Arm ARM has this example.
; Coherency example for data and instruction accesses within the same
; Inner Shareable domain.
; Enter this code with <Wt> containing a new 32-bit instruction,
; to be held in Cacheable space at a location pointed to by Xn.
STR Wt, [Xn]
DC CVAU, Xn ; Clean data cache by VA to point of unification (PoU)
DSB ISH ; Ensure visibility of the data cleaned from cache
IC IVAU, Xn ; Invalidate instruction cache by VA to PoU
DSB ISH ; Ensure completion of the invalidations
ISB ; Synchronize the fetched instruction stream
Thanks for your insight. I thought that was plausible so I investigated
further. Turned out the only difference between openjdk8 (which works)
and openjdk17 (which doesn't) wrt icache was that openjdk8 directly
calls __clear_cache() from libgcc while openjdk17 uses
__builtin___clear_cache().
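For reference, the Arm ARM sequence quoted above extends from a single
word to a whole range roughly as follows. This is only a sketch of what
__clear_cache() is expected to do on aarch64, not libgcc's actual code.
The helper name sync_icache_range is made up, and the aarch64 branch
assumes userland is allowed to read CTR_EL0 and to issue DC CVAU / IC IVAU.
Other architectures simply fall back to the builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: make freshly written code in [start, start + len)
 * visible to instruction fetch. */
void sync_icache_range(void *start, size_t len)
{
    assert(start != NULL || len == 0);
    if (len == 0)
        return;
#if defined(__aarch64__)
    uint64_t ctr;
    /* CTR_EL0 holds log2(words per line): DminLine [19:16], IminLine [3:0]. */
    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
    uintptr_t dline = (uintptr_t)4 << ((ctr >> 16) & 0xf);
    uintptr_t iline = (uintptr_t)4 << (ctr & 0xf);
    uintptr_t begin = (uintptr_t)start;
    uintptr_t end = begin + len;

    /* Clean each data cache line to the point of unification. */
    for (uintptr_t p = begin & ~(dline - 1); p < end; p += dline)
        __asm__ volatile("dc cvau, %0" : : "r"(p) : "memory");
    __asm__ volatile("dsb ish" : : : "memory");

    /* Invalidate each instruction cache line to the point of unification. */
    for (uintptr_t p = begin & ~(iline - 1); p < end; p += iline)
        __asm__ volatile("ic ivau, %0" : : "r"(p) : "memory");
    __asm__ volatile("dsb ish" : : : "memory");
    __asm__ volatile("isb" : : : "memory");
#else
    /* Let the compiler pick whatever the target needs (often nothing). */
    __builtin___clear_cache((char *)start, (char *)start + len);
#endif
}
```

Neither path modifies the buffer contents; only cache state changes.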
hotspot/src/cpu/aarch64/vm/icache_aarch64.hpp from OpenJDK 8 does this:
class ICache : public AbstractICache {
 public:
  static void initialize();
  static void invalidate_word(address addr) {
    __clear_cache((char *)addr, (char *)(addr + 3));
  }
  static void invalidate_range(address start, int nbytes) {
    __clear_cache((char *)start, (char *)(start + nbytes));
  }
};
src/hotspot/os_cpu/bsd_aarch64/icache_bsd_aarch64.hpp from OpenJDK 17
does this:
class ICache : public AbstractICache {
 public:
  static void initialize();
  static void invalidate_word(address addr) {
    __builtin___clear_cache((char *)addr, (char *)(addr + 4));
  }
  static void invalidate_range(address start, int nbytes) {
    __builtin___clear_cache((char *)start, (char *)(start + nbytes));
  }
};
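The two invalidate_word() versions also pass different end pointers
(addr + 3 versus addr + 4). Assuming the end is exclusive, as GCC
documents for the builtin, and that flushing happens per cache line, both
should hit the same single line for a 4-byte-aligned instruction. The
sketch below checks that arithmetic; lines_touched is a made-up helper
and the 64-byte line size in the usage note is an assumption, not a
measured value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of cache lines a per-line flush of the
 * half-open range [begin, end) has to visit. */
size_t lines_touched(uintptr_t begin, uintptr_t end, size_t line_size)
{
    /* Line sizes are powers of two. */
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    if (end <= begin)
        return 0;
    uintptr_t mask  = ~(uintptr_t)(line_size - 1);
    uintptr_t first = begin & mask;      /* line holding the first byte */
    uintptr_t last  = (end - 1) & mask;  /* line holding the last byte  */
    return (size_t)((last - first) / line_size) + 1;
}
```

With 64-byte lines, lines_touched(a, a + 3, 64) and
lines_touched(a, a + 4, 64) both give 1 for any 4-byte-aligned a, so the
differing end pointer alone would not explain the breakage.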
However, running "objdump --disassemble" against both bootkits revealed
something very surprising. In OpenJDK 17, the call to invalidate_range()
was inlined at its call site AbstractAssembler::flush():
void AbstractAssembler::flush() {
  ICache::invalidate_range(addr_at(0), offset());
}
0000000000385738 <_ZN17AbstractAssembler5flushEv>:
385738: f9400400 ldr x0, [x0, #8]
38573c: f9400002 ldr x2, [x0]
385740: f9400801 ldr x1, [x0, #16]
385744: aa0203e0 mov x0, x2
385748: cb020021 sub x1, x1, x2
38574c: 8b21c041 add x1, x2, w1, sxtw
385750: 17fc18a8 b 28b9f0 <__clear_cache@plt>
385754: d503201f nop
But in OpenJDK 8 the function was empty, just a bare ret, and there were
no references to __clear_cache@plt from libjvm.so at all!!
000000000008e250 <_ZN17AbstractAssembler5flushEv>:
8e250: d65f03c0 ret
8e254: d503201f nop
This suggests lang/openjdk8 doesn't do JIT, which turned out to be the case:
% java -version
openjdk version "1.8.0_452-internal"
OpenJDK Runtime Environment (build 1.8.0_452-internal-pkgsrc_1.8.452-b00)
OpenJDK 64-Bit Zero VM (build 25.452-b00, interpreted mode)
While searching the Internet I found something interesting. It appears
FreeBSD on Apple Silicon suffers from the same issue, presumably because
this CPU enforces W^X at the hardware level regardless of what the
kernel does: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265284
It seems the OpenJDK folks applied a macOS-specific workaround for
macOS/aarch64 to avoid mapping pages W+X, but did nothing for
other OSes. This means that until they rewrite the HotSpot VM to stop
doing W+X everywhere, we can only use Zero VM on Apple Silicon chips.
Unfortunately it is not possible to choose Zero VM at runtime. It's
only a configure-time option, --with-jvm-variants=zero. Also, it wouldn't
be wise to disable JIT on all aarch64 platforms just for the sake of
Apple Silicon. And now I don't know what to do, aside from diving deeply
into the codebase and patching the VM to adhere to today's standard of
W^X...
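To make "adhere to W^X" concrete, here is a minimal sketch of the usual
pattern: map the buffer writable, copy the code in, then flip it to
read+execute before syncing the icache, so the page is never W and X at
the same time. The name install_code_wx is made up, this is not how
HotSpot manages its code cache, and whether this alone is enough on
Apple Silicon under NetBSD is untested. The PROT_MPROTECT() hint is only
used where <sys/mman.h> provides it (NetBSD does, for PaX MPROTECT).

```c
#define _DEFAULT_SOURCE 1   /* glibc: expose MAP_ANON under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy len bytes of machine code into a fresh
 * mapping that ends up readable and executable but not writable.
 * Returns the mapping, or NULL on failure. */
void *install_code_wx(const void *code, size_t len)
{
    assert(code != NULL && len > 0);

    int prot = PROT_READ | PROT_WRITE;
#ifdef PROT_MPROTECT
    /* Declare up front that PROT_EXEC may be requested later. */
    prot |= PROT_MPROTECT(PROT_EXEC);
#endif
    void *buf = mmap(NULL, len, prot, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    memcpy(buf, code, len);              /* write phase: RW, no X */

    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);                /* could not switch to RX */
        return NULL;
    }

    /* Execute phase: RX, no W. Sync the icache before anyone jumps here. */
    __builtin___clear_cache((char *)buf, (char *)buf + len);
    return buf;
}
```

A JIT that patches code after installing it would have to flip the
page back to RW for every patch, which is presumably why converting the
whole VM is a lot of work.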