So I've built a kernel natively, both with the tools from the
(cross-built) base system and with the natively built tools
(using the inelegant gcc build workaround that Kalvis kindly sent me).
Turns out that no matter which toolchain I used, the natively built
kernel seems to work at first, but when the system is put under load it
quickly panics due to filesystem inconsistencies.
So I cross-built the same kernel on amd64, disassembled all the object
files generated by both builds, and diffed the results. Only one
difference really stood out, in ffs_truncate():
@@ -1958,9 +1958,9 @@
      1247:	7c 44 c3 d0 	clrd 0xd0(r3)[r4]
      124b:	00
  			lastiblock[level] = -1;
-    124c:	7d 8f ff ff 	movq $0xffffffffffffffff,0xfffffedc(fp)[r4]
-    1250:	ff ff ff ff
-    1254:	ff ff 44 cd
+    124c:	7d 8f ff ff 	movq $0x00000000ffffffff,0xfffffedc(fp)[r4]
+    1250:	ff ff 00 00
+    1254:	00 00 44 cd
      1258:	dc fe
  		blks[i] = DIP(oip, db[i]);
      125a:	d0 a6 14 50 	movl 0x14(r6),r0
The assembly that gcc generated for this spot is the same whether it
was built natively or cross-built:
         clrq 208(%r3)[%r4]
.LM720:
.LM721:
.LM722:
         movq $-1,-292(%fp)[%r4]
.LM723:
         movl 20(%r6),%r0
So it's all the assembler's fault, I guess? Attached is a minimal test
case, in case anyone wants to take a closer look.
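
For what it's worth, here is a minimal C sketch of one way the two
encodings could come about. It assumes, without my having checked the
gas sources, that the immediate passes through a 32-bit value at some
point and is then widened to a quadword either with or without sign
extension. The helper names are made up for illustration and are not
taken from binutils.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only, not taken from gas. */

/* Widen a 32-bit immediate to 64 bits with sign extension. */
static uint64_t widen_signed(int32_t imm)
{
	return (uint64_t)(int64_t)imm;
}

/* Widen the same immediate with zero extension (upper half cleared). */
static uint64_t widen_unsigned(int32_t imm)
{
	return (uint64_t)(uint32_t)imm;
}

/* Store a quadword little-endian, as it sits in the instruction stream. */
static void emit_quad(uint64_t v, uint8_t out[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (8 * i));
}
```

With an immediate of -1, widen_signed() gives 0xffffffffffffffff and
widen_unsigned() gives 0x00000000ffffffff. emit_quad() lays the latter
out as ff ff ff ff 00 00 00 00, which matches the immediate bytes in
the "+" lines of the diff above.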
Hans