NetBSD-Bugs archive


kern/50432: NFS client crashes or hangs kernel



>Number:         50432
>Category:       kern
>Synopsis:       NFS client crashes or hangs kernel
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Nov 15 21:10:00 +0000 2015
>Originator:     Rhialto
>Release:        NetBSD 7.0
>Organization:
	
>Environment:
	
	
System: NetBSD vargaz.falu.nl 7.0 NetBSD 7.0 (GENERIC.201509250726Z) amd64
Architecture: x86_64
Machine: amd64
>Description:

Following a thread on current-users and tech-kern, I'm filing this PR so
that the issue doesn't get lost. No resolution was reached. I'm
summarizing the useful contents here.

(http://mail-index.netbsd.org/current-users/2015/10/19/msg028233.html)

NFS clients on NetBSD 7.0 (plain GENERIC kernel as distributed) seem to
have a problem with the queue of outstanding requests to the server
(nfs_reqq). As a result, either the kernel crashes because the queue is
traversed while it contains bogus pointers, or processes start to hang
on NFS I/O because their I/O seemingly never completes. (The reply was
no doubt received, but could not be matched to an outstanding request
and was therefore dropped.)

What I'm doing when this happens: I'm mounting a pkg_comp chroot which
is located on NFS, with a local directory mounted on top of it for
temporary files. Plenty of NFS traffic still happens, mostly read-only,
with writes when packages are created and installed. The extra null
mounts that pkg_comp does may make things more fragile.

All mounts:

/dev/wd0a on / type ffs (local)
/dev/wd0f on /var type ffs (log, local)
/dev/wd0e on /usr type ffs (log, local)
/dev/wd0g on /home type ffs (log, local)
/dev/wd0h on /tmp type ffs (log, local)
kernfs on /kern type kernfs (local)
ptyfs on /dev/pts type ptyfs (local)
procfs on /proc type procfs (local)
procfs on /usr/pkg/emul/linux32/proc type procfs (read-only, local)
nfsserver:/mnt/vol1 on /mnt/vol1 type nfs
nfsserver:/mnt/scratch on /mnt/scratch type nfs
tmpfs on /var/shm type tmpfs (local)
/mnt/vol1/rhialto/cvs/src on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/usr/src type null (read-only)
/mnt/vol1/rhialto/cvs/pkgsrc on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/usr/pkgsrc type null (read-only)
/mnt/vol1/distfiles on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/pkg_comp/distfiles type null
/mnt/scratch/scratch/packages.amd64-7.0 on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/pkg_comp/packages type null
/home/rhialto/obj on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/pkg_comp/obj/pkgsrc type null (local)

The problem did not go away when I changed the mount from UDP to TCP.

Here is some debugging I've done on this.

I re-tried with an original GENERIC 7.0 kernel; it happened again, and
now I have a crash dump. Its dmesg ends with this:

nfs server 10.0.0.16:/mnt/scratch: not responding
nfs server 10.0.0.16:/mnt/scratch: is alive again
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff806b94b4 cs 8 rflags 10246 cr2 38 ilevel 2 rsp fffffe80b9fc1f28
curlwp 0xfffffe813fb39860 pid 0.5 lowest kstack 0xfffffe80b9fbf2c0
panic: trap
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
callout_softclock() at netbsd:callout_softclock+0x248
softint_dispatch() at netbsd:softint_dispatch+0x79
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe80b9fc1ff0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
0:
cpu0: End traceback...

dumping to dev 0,1 (offset=199775, size=1023726):


pid 0.5 is this:

PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
0    >   5 7   0       200   fffffe813fb39860          softclk/0

gdb (without debugging symbols) so far thinks this is in nfs_timer():

(gdb) kvm proc 0xfffffe813fb39860
0xffffffff806b9aab in nfs_timer ()
(gdb) bt
#0  0xffffffff806b9aab in nfs_timer ()
#1  0x0000000000000000 in ?? ()

with a rebuilt netbsd.gdb (hopefully the addresses match):

(gdb) target kvm netbsd.5.core
0xffffffff8063d735 in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at ../../../../arch/amd64/amd64/machdep.c:671
671                     dumpsys();
(gdb) bt
#0  0xffffffff8063d735 in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at ../../../../arch/amd64/amd64/machdep.c:671
#1  0xffffffff80865182 in vpanic (fmt=0xffffffff80d123b2 "trap",
fmt@entry=0xffffffff80d123d2 "otection fault",
ap=ap@entry=0xfffffe80b9fc1d10) at ../../../../kern/subr_prf.c:340
#2  0xffffffff8086523d in panic (fmt=fmt@entry=0xffffffff80d123d2
"otection fault") at ../../../../kern/subr_prf.c:256
#3  0xffffffff808a84d6 in trap (frame=0xfffffe80b9fc1e30) at
../../../../arch/amd64/amd64/trap.c:298
#4  0xffffffff80100f46 in alltraps ()
#5  0xffffffff806b94b4 in nfs_sigintr (nmp=0x0, rep=0xfffffe81163730a8,
l=0x0) at ../../../../nfs/nfs_socket.c:871
#6  0xffffffff806b9b0e in nfs_timer (arg=<optimized out>) at
../../../../nfs/nfs_socket.c:752
#7  0xffffffff805e9458 in callout_softclock (v=<optimized out>) at
../../../../kern/kern_timeout.c:736
#8  0xffffffff805df84a in softint_execute (l=<optimized out>,
s=<optimized out>, si=<optimized out>) at
../../../../kern/kern_softint.c:589
#9  softint_dispatch (pinned=<optimized out>, s=2) at
../../../../kern/kern_softint.c:871
#10 0xffffffff8011402f in Xsoftintr ()

(gdb) kvm proc 0xfffffe813fb39860
nfs_timer (arg=<unavailable>) at ../../../../nfs/nfs_socket.c:735
735     {
(gdb) bt
#0  nfs_timer (arg=<unavailable>) at ../../../../nfs/nfs_socket.c:735
#1  0x0000000000000000 in ?? ()


nmp should not be NULL here... let's look at rep, which is where it
comes from, via "nmp = rep->r_nmp;"

(gdb) print *(struct nfsreq *)0xfffffe81163730a8
$1 = {r_chain = {tqe_next = 0xfffffe811edcee40, tqe_prev = 0x1}, r_mreq = 0xffff8000028f9888, r_mrep = 0x0, r_md = 0x0, r_dpos = 0x0, r_nmp = 0x0, r_xid = 0, r_flags = 0, r_retry = 0, r_rexmit = 0, r_procnum = 0, r_rtt = 0, 
  r_lwp = 0x0}

well, r_chain.tqe_prev looks bogus (unless that's a special marker), so
let's look at tqe_next:

(gdb) print *((struct nfsreq *)0xfffffe81163730a8)->r_chain.tqe_next
$3 = {r_chain = {tqe_next = 0x0, tqe_prev = 0x15aa3c85d}, r_mreq = 0xbd83e8af8fe58282, r_mrep = 0x81e39981e3a781e3, r_md = 0xe39d81e38180e38c, r_dpos = 0x8890e5b4a0e5ae81, r_nmp = 0xe57baf81e3ab81e3, r_xid = 2179183259, 
  r_flags = -1565268289, r_retry = 0, r_rexmit = 0, r_procnum = 1520683101, r_rtt = 1, r_lwp = 0x80e39981e3a781e3}

well, even more bogus. Too bad that the next frame has its argument
optimized out...

This problem is very repeatable, usually within a few hours, but a few
times now it happened within half an hour.

It seems to me that somehow the nfs_reqq list gets corrupted. Then
either there is a crash when traversing it in nfs_timer() (occurring in
nfs_sigintr() due to being called with a bogus pointer), or there is a
hang when one of the NFS requests gets lost and never retried.

I tried it with a TCP mount for NFS. Still hangs, this time in a bit
under an hour of uptime.

So the cause is likely not packet loss.


Looking further into this, I examined where nfs_reqq occurs:

nfs/nfs.h:extern TAILQ_HEAD(nfsreqhead, nfsreq) nfs_reqq;

nfs/nfs_clntsocket.c:         TAILQ_FOREACH(rep, &nfs_reqq, r_chain) {
nfs/nfs_clntsocket.c: TAILQ_INSERT_TAIL(&nfs_reqq, rep, r_chain);
nfs/nfs_clntsocket.c: TAILQ_REMOVE(&nfs_reqq, rep, r_chain);

Matches #2 and #3 are protected with

    s = splsoftnet();

but #1 seems not to be protected by anything I can see nearby. Maybe
the intended protection is

    error = nfs_rcvlock(nmp, myrep);

if that makes any sense.
That function definitely uses neither splsoftnet() NOR
mutex_enter(softnet_lock).

nfs/nfs_socket.c:struct nfsreqhead nfs_reqq;
nfs/nfs_socket.c:     TAILQ_FOREACH(rp, &nfs_reqq, r_chain) {
nfs/nfs_socket.c:     TAILQ_FOREACH(rep, &nfs_reqq, r_chain) {

Match #3 is protected with

    mutex_enter(softnet_lock);	/* XXX PR 40491 */

but none of the others are (at least not visibly nearby).

#2 is called from nfs_receive(), which uses nfs_sndlock(); that also
uses neither splsoftnet() NOR mutex_enter(softnet_lock).

nfs/nfs_subs.c:       TAILQ_INIT(&nfs_reqq);

presumably doesn't need any extra protection.

softnet_lock is allocated as

./kern/uipc_socket.c:kmutex_t   *softnet_lock;
./kern/uipc_socket.c:   softnet_lock = mutex_obj_alloc(MUTEX_DEFAULT, IPL_NONE);

IPL_NONE seems inconsistent with splsoftnet().

I have never studied the inner details of kernel locking, but the
diversity of protections on this list doesn't inspire trust at first
sight.

===

And that is how far it has gotten so far.

>How-To-Repeat:
	Use pkg_comp with the chroot directory on an NFS-mounted volume.
	(see above for more details).
>Fix:
	Not known.

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- The Doctor: No, 'eureka' is Greek for
\X/ rhialto/at/xs4all.nl    -- 'this bath is too hot.'

>Unformatted:
 	
 	

