NetBSD-Bugs archive

Re: port-evbmips/59236 (Multiple segfaults in erlite3 boot)



On 2025/04/19 4:04, riastradh%NetBSD.org@localhost wrote:
Synopsis: Multiple segfaults in erlite3 boot

State-Changed-From-To: open->feedback
State-Changed-By: riastradh%NetBSD.org@localhost
State-Changed-When: Fri, 18 Apr 2025 19:04:59 +0000
State-Changed-Why:
This is probably the same CN50xx bug that we have been puzzling
over in PR port-mips/59064: jemalloc switch to 5.3 broke userland
<https://gnats.NetBSD.org/59064>.

Can you try the patch at the bottom of this message?

https://mail-index.NetBSD.org/netbsd-bugs/2025/04/14/msg088307.html

Thank you very much for working on this problem!

Unfortunately, even with your patch, erlite3 cannot boot into
multiuser mode with either the n64 or the n32 userland:
https://gist.github.com/rokuyama/7bbe1619e55e8e3aba5bf3b112a23725

On the other hand, a MIPSSIM64 kernel on QEMU boots into multiuser
mode successfully.

For the log above, I enabled the debug printf in trap():
```
diff --git a/sys/arch/mips/mips/trap.c b/sys/arch/mips/mips/trap.c
index 58caf19e2d2..a079dec91dd 100644
--- a/sys/arch/mips/mips/trap.c
+++ b/sys/arch/mips/mips/trap.c
@@ -448,8 +448,8 @@ trap(uint32_t status, uint32_t cause, vaddr_t vaddr, vaddr_t pc,
 		rv = uvm_fault(map, va, ftype);
 		pcb->pcb_onfault = onfault;

-#if defined(VMFAULT_TRACE)
-		if (!KERNLAND_P(va))
+#if defined(VMFAULT_TRACE) || 1
+		if (!KERNLAND_P(va) && rv != 0)
 			printf(
 			    "uvm_fault(%p (pmap %p), %#"PRIxVADDR
 			    " (%"PRIxVADDR"), %d) -> %d at pc %#"PRIxVADDR"\n",
```

You can see that the SEGVs are caused by read accesses to NULL:
```
[ 13.3599689] uvm_fault(0x980000041f9c0c00 (pmap 0x980000041fce44d0), 0 (0), 1) -> 14 at pc 0xfff83b1db4
[1] Segmentation fault (core dumped) /sbin/ifconfig lo0 inet6 >/dev/null 2>&1
...
[ 19.5399661] uvm_fault(0x980000041f20c800 (pmap 0x980000041fce44d0), 0 (0), 1) -> 14 at pc 0xfff8391db4
[1] Segmentation fault (core dumped) awk "/^sendmail[ \t]/{print\$2}" /etc/mailer.conf
```
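
A rough sketch of how to read such a line mechanically. It assumes
the printf arguments are, in order, (map, pmap, va, vaddr, ftype, rv,
pc), that ftype 1 means VM_PROT_READ, and that 14 is EFAULT; the
struct and function names are made up for illustration and are not
from the kernel:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one uvm_fault debug line. */
struct fault_rec {
	unsigned long long map, pmap, va, vaddr, pc;
	int ftype;	/* assumed: 1 == VM_PROT_READ */
	int rv;		/* assumed: 14 == EFAULT */
};

/* Parse one console line; returns 1 on success, 0 otherwise. */
static int
parse_fault(const char *line, struct fault_rec *f)
{
	/* Skip the "[ 13.3599689] " timestamp, if any. */
	const char *p = strstr(line, "uvm_fault(");

	if (p == NULL)
		return 0;
	return sscanf(p,
	    "uvm_fault(%llx (pmap %llx), %llx (%llx), %d) -> %d at pc %llx",
	    &f->map, &f->pmap, &f->va, &f->vaddr,
	    &f->ftype, &f->rv, &f->pc) == 7;
}

/* True if the record looks like a failed read of address 0. */
static int
is_null_read(const struct fault_rec *f)
{
	return f->va == 0 && (f->ftype & 1) != 0 && f->rv == 14;
}
```

Feeding the first line above to parse_fault() and then to
is_null_read() should classify it as a failed read at address 0.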

As you pointed out earlier, SEGVs can be avoided by replacing
`user_reserved_insn` with `user_gen_exception`, i.e.:
https://gist.github.com/rokuyama/c7a50b8e7a62dc25f3f536f1434eea9b

By grepping through the Linux code, I found that they check the TLB
entry for the PC before fetching the instruction:
https://github.com/torvalds/linux/commit/5b10496b6e65#diff-bbe4c1a54ce7bd13e6109d887383993c3b5276a1362f84092e9ef31dc84064d9R390

and our `user_gen_exception` path uses copyin(9), of course.

I know almost nothing about mips, though, and this "double-fault
scenario" may have much more destructive results...

Thanks,
rin

If you open one of the core dumps in gdb (if you are able to do that
from another machine where everything isn't segfaulting all the time,
e.g. if the core dump is written to nfs) and do `x/i $pc' and `bt', I
bet you will find it in malloc_default (via some stack trace through
jemalloc) at this instruction:

00008a58 <malloc_default>:
malloc_default():
/home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2727
     8a58:       27bdff70        addiu   sp,sp,-144
     8a5c:       ffbc0078        sd      gp,120(sp)
     8a60:       3c1c0000        lui     gp,0x0
                         8a60: R_MIPS_GPREL16    malloc_default
                         8a60: R_MIPS_SUB        *ABS*
                         8a60: R_MIPS_HI16       *ABS*
     8a64:       0399e021        addu    gp,gp,t9
     8a68:       279c0000        addiu   gp,gp,0
                         8a68: R_MIPS_GPREL16    malloc_default
                         8a68: R_MIPS_SUB        *ABS*
                         8a68: R_MIPS_LO16       *ABS*
tsd_fetch_impl():
/home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:270
     8a6c:       8f820000        lw      v0,0(gp)
                         8a6c: R_MIPS_TLS_GOTTPREL       je_tsd_tls
     8a70:       7c03e83b        0x7c03e83b
malloc_default():
/home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2727
     8a74:       ffb10040        sd      s1,64(sp)
     8a78:       ffb00038        sd      s0,56(sp)
tsd_fetch_impl():
/home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:270
     8a7c:       00433021        addu    a2,v0,v1
malloc_default():
/home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2727
     8a80:       ffbf0088        sd      ra,136(sp)
     8a84:       ffbe0080        sd      s8,128(sp)
     8a88:       ffb70070        sd      s7,112(sp)
     8a8c:       ffb60068        sd      s6,104(sp)
     8a90:       ffb50060        sd      s5,96(sp)
     8a94:       ffb40058        sd      s4,88(sp)
     8a98:       ffb30050        sd      s3,80(sp)
     8a9c:       ffb20048        sd      s2,72(sp)
tsd_fetch_impl():
/home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:422
  => 8aa0:       90c30258        lbu     v1,600(a2)

And I bet you will find that $v0 holds the address malloc_default+0x18,
i.e., the pc of this instruction:

tsd_fetch_impl():
/home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:270
     8a6c:       8f820000        lw      v0,0(gp)
                         8a6c: R_MIPS_TLS_GOTTPREL       je_tsd_tls
  => 8a70:       7c03e83b        0x7c03e83b

The instruction 0x7c03e83b is sometimes also written

	rdhwr	$3,$29

or

	rdhwr	v1,ulr

but it is architecturally undefined so it traps to the kernel to
emulate, and the kernel is supposed to return the thread's tcb pointer
in v1.
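
To see why 0x7c03e83b reads as rdhwr $3,$29, here is a minimal C
sketch of the field decode. The struct and function names are
invented for illustration; the layout is the usual MIPS R-type split,
with major opcode 0x1f (SPECIAL3) and function 0x3b (RDHWR). The check
looks only at those two fields and ignores the must-be-zero bits:

```c
#include <stdint.h>

/* Fields of a 32-bit MIPS R-type/SPECIAL3 instruction word. */
struct mips_fields {
	unsigned opcode;	/* bits 31..26 */
	unsigned rs;		/* bits 25..21 */
	unsigned rt;		/* bits 20..16: destination GPR for rdhwr */
	unsigned rd;		/* bits 15..11: hardware register for rdhwr */
	unsigned sa;		/* bits 10..6 */
	unsigned funct;		/* bits 5..0 */
};

static struct mips_fields
decode_fields(uint32_t insn)
{
	struct mips_fields d;

	d.opcode = (insn >> 26) & 0x3f;
	d.rs     = (insn >> 21) & 0x1f;
	d.rt     = (insn >> 16) & 0x1f;
	d.rd     = (insn >> 11) & 0x1f;
	d.sa     = (insn >> 6) & 0x1f;
	d.funct  = insn & 0x3f;
	return d;
}

/* Returns 1 if the word has the SPECIAL3/RDHWR opcode and function. */
static int
is_rdhwr(uint32_t insn)
{
	struct mips_fields d = decode_fields(insn);

	return d.opcode == 0x1f && d.funct == 0x3b;
}
```

For 0x7c03e83b this gives rt = 3 (v1) and rd = 29 (the user local
register, ulr), matching the two spellings above; the preceding
lw at 8a6c (0x8f820000) has major opcode 0x23 and is not matched.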

But as a side effect, the emulation clobbers the register v0 with the
address of the excepting instruction, rather than leaving it as the
value it found at -1234(gp) (or whatever; written as 0(gp) above, but
the linker will replace it by some probably-nonzero number; you can use
`objdump --disassemble=malloc_default libc.so' to find it), which is
decidedly not the instruction address malloc_default+0x18 but rather
some tls offset that is reasonable to add to the tcb pointer.

Now, the emulation routine
https://nxr.netbsd.org/xref/src/sys/arch/mips/mips/mipsX_subr.S?r=1.115#1297
is not _supposed_ to clobber v0 -- it goes out of its way to save v0 on
the kernel stack and restore it before returning from the exception:

    1312 	/* Need two working registers */
    1313 	REG_S	AT, CALLFRAME_SIZ+TF_REG_AST(k0)
    1314 	REG_S	v0, CALLFRAME_SIZ+TF_REG_V0(k0)
...
    1349 	REG_L	AT, CALLFRAME_SIZ+TF_REG_AST(k0)# restore reg
    1350 	REG_L	v0, CALLFRAME_SIZ+TF_REG_V0(k0) # restore reg
    1351 	eret

But, in all my trials, it has been consistently corrupted in the same
way.  The best theory we have for why it is corrupted is that cn50xx CPUs --
found in erlite3 (but not er4) -- have some kind of register-writeback
bug (which passes through some register renaming unchanged) provoked by
the particular combination of reading MIPS_COP_0_EXC_PC and eret so
that after the eret, the exception pc gets written back to v0 even
though we just restored v0 from the kernel stack.

So, all that said, here is a summary of the science we did on my
erlite3, together with a patch that seems to address the issue.  Under
the theory that the corrupted register is whichever one we move
MIPS_COP_0_EXC_PC into, the patch will only corrupt the temporary
register k0, which is not accessible to userland and is treated as
garbage on every kernel entry point, so it's safe:

https://mail-index.NetBSD.org/netbsd-bugs/2025/04/14/msg088307.html





