Port-arm archive


Diagnosing hard lockups in NetBSD 8



Hi, all,

NetBSD 8 came out, so I figured I'd update my systems. I have two Jetson TK1 systems installed remotely with serial consoles. They have been running with very few problems for almost two years.

The upgrades went smoothly, but then the busier machine started locking up hard (no network, no response on the serial console). Attempts to get into the debugger failed. Sending a break gives only:

~Stopped in pid 0.32 (system) at netbsd:cpu_Debugger+0x4:

and the console remains completely unresponsive. The systems were last updated towards the end of January.

One time out of half a dozen I did get this:

cprng 222 147 12: failed statistical RNG test
panic: kernel diagnostic assertion "(softints != 0) == ((ci->ci_softints >> opl) != 0)" failed: file "/usr/src/sys/arch/arm/arm32/arm32_machdep.c", line 646
cpu0: Begin traceback...
0x9ce89d5c: netbsd:db_panic+0xc
0x9ce89d74: netbsd:vpanic+0x1b4
0x9ce89d8c: netbsd:__udivmoddi4
0x9ce89dcc: netbsd:dosoftints+0x164
0x9ce89dfc: netbsd:splx+0xc8
0x9ce89e84: netbsd:sosend+0x210
0x9ce89eac: netbsd:soo_write+0x40
0x9ce89f04: netbsd:dofilewrite+0x90
0x9ce89f34: netbsd:sys_write+0x70
0x9ce89fac: netbsd:syscall+0x104
cpu0: End traceback...

dumping to dev 16,1 offset 4194495

(and it locked up after that)

I tried updating tegra124-jetson-tk1.dtb, but that made no difference.

Finally, I tried reverting the busier machine to the kernel compiled on 29-January-2018, and it still locks up hard. That made me think this might be a hardware issue, so I began migrating all work to the other Jetson. However, the second Jetson is now exhibiting the same problem :P

Now I plan to revert one of them to a kernel and userland compiled from 29-January-2018, with the sysctls ddb.onpanic=1 and ddb.commandonenter="bt;reboot 0x4" set, unless anyone has better ideas for avoiding an unresponsive system that needs to be physically power cycled.
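
Concretely, setting those two sysctls at runtime would look roughly like the following. This is only a sketch: it assumes a kernel with DDB compiled in and a root shell, and the values are exactly the ones quoted above. As far as I understand it, the argument to ddb's reboot command is a reboot(2) flag bitmask, with 0x4 being RB_NOSYNC, but treat that reading as an assumption.

```shell
# Sketch only: assumes DDB is compiled into the running kernel; run as root.

# Enter the kernel debugger on panic.
sysctl -w ddb.onpanic=1

# On entering ddb, print a backtrace and then reboot.
# The quotes keep the shell from treating ';' as a command separator.
sysctl -w ddb.commandonenter="bt;reboot 0x4"
```

These settings do not survive a reboot on their own, so they would have to be reapplied at boot to stay in effect.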

Below are all the changes between my kernels and the default TEGRA kernel:

< config                netbsd          root on wd0a type ?
---
> config                netbsd          root on ? type ?
167,181d166
< pseudo-device cgd             # cryptographic disk devices
<
< options       IPFILTER_LOG    # ipmon(8) log support
< options       IPFILTER_LOOKUP # ippool(8) support
< options       IPFILTER_COMPAT # Compat for IP-Filter
< #options      IPFILTER_DEFAULT_BLOCK  # block all packets by default
< pseudo-device ipfilter        # IP filter (firewall) and NAT
<
< # accept filters
< pseudo-device accf_data       # "dataready" accept filter
< pseudo-device accf_http       # "httpready" accept filter
<
< options       QUOTA2          # new, in-filesystem UFS quotas
< options       GATEWAY         # packet forwarding


Thoughts? Suggestions? How would I begin to diagnose this if I can't get into the debugger?

Thanks,
John

