NetBSD-Bugs archive


kern/56850: system locks up with NFS root & swap on mvgbe(4)



>Number:         56850
>Category:       kern
>Synopsis:       system locks up with NFS root & swap on mvgbe(4)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon May 23 02:00:01 +0000 2022
>Originator:     Rin Okuyama
>Release:        9.99.96
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD obsa6 9.99.96 NetBSD 9.99.96 (OBSA6_BE) #23: Mon May 23 00:06:35 JST 2022  rin@latipes:/build/src/sys/arch/evbarm/compile/OBSA6_BE evbarm
>Description:
* Summary

The system eventually locks up with NFS root/swap on mvgbe(4).

This is probably due to software or hardware bugs in mvgbe(4), but
at the same time, I suspect that our NFS client may be fragile against
packet loss or other NIC problems.

* Details

The failure occurs on ARM9E-based Marvell SoCs:

- KUROBOX_PRO: https://dmesgd.nycbug.org/index.cgi?do=view&id=6594
- OPENBLOCKS_A6: https://dmesgd.nycbug.org/index.cgi?do=view&id=6595

in both little- and big-endian modes.

With NFS root/swap on mvgbe(4), the system eventually locks up under
heavy I/O while building some pkgsrc packages. Once the failure occurs,
the system does not respond to anything but input from the serial
console.

At that point, I observe that many processes are sleeping on "nfsrecv":

https://gist.github.com/rokuyama/228f7afe67ffa8fe8024eb10bc2f14a1

Using UDP instead of TCP seems to mitigate the problem significantly,
but not completely: the failure occurs roughly every few hours with
TCP, and roughly once a day with UDP.

On an armv5-based machine of a similar generation, but with wm(4):

- HDL_G: https://dmesgd.nycbug.org/index.cgi?do=view&id=6139

I've never observed a similar failure.

Therefore, there are probably bugs in mvgbe(4), or hardware problems.

At the same time, however, I wonder whether we can improve the NFS or
socket layers in the kernel; even if some packets are unexpectedly
lost, NFS routines should not sleep forever.
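
To illustrate what "should not sleep forever" means, here is a minimal
userland sketch of a receive that gives up after a bounded number of
timeouts. This is not NetBSD kernel code and not how the NFS client is
implemented; recv_bounded() and its timeout_ms/max_tries parameters
are hypothetical names chosen for this example only.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical userland helper (an illustration, not kernel code):
 * wait for a reply on `fd`, but give up after `max_tries` timeouts of
 * `timeout_ms` milliseconds each instead of blocking indefinitely.
 *
 * Returns the number of bytes received, or -1 with errno set
 * (ETIMEDOUT when every attempt timed out).
 */
ssize_t
recv_bounded(int fd, void *buf, size_t len, int timeout_ms, int max_tries)
{
	assert(buf != NULL && max_tries > 0);

	for (int attempt = 0; attempt < max_tries; attempt++) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		int ready = poll(&pfd, 1, timeout_ms);

		if (ready > 0) {
			/* Data, EOF, or a socket error is pending. */
			return recv(fd, buf, len, 0);
		}
		if (ready < 0 && errno != EINTR) {
			/* poll() itself failed. */
			return -1;
		}
		/* Timed out or interrupted: a real client would retransmit here. */
	}

	errno = ETIMEDOUT;
	return -1;
}
```

The point is only that a lost reply turns into an error (or a
retransmission) after a finite wait, so the caller can recover instead
of staying asleep.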
>How-To-Repeat:
Build some pkgsrc packages with NFS root/swap on mvgbe(4).
>Fix:
N/A


