NetBSD-Bugs archive


kern/53624: dom0 freeze on domU exit



>Number:         53624
>Category:       kern
>Synopsis:       dom0 freeze on domU exit
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Sep 22 01:50:00 +0000 2018
>Originator:     Emmanuel Dreyfus
>Release:        NetBSD 8.0
>Organization:
NetBSD
>Environment:
NetBSD xmai 8.0_STABLE NetBSD 8.0_STABLE (XEN3_DOM0_NOAGP) #63: Fri Sep 21 16:37:10 CEST 2018  root@lego:/pkg_comp/NetBSD-8stable-amd64/src/sys/arch/amd64/compile/XEN3_DOM0_NOAGP amd64

>Description:
When shutting down a Xen domU that has two block devices backed by plain files, there is a race condition that can freeze the dom0.

Here are the relevant processes at freeze time:
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
1880     1 3   0        84   ffffa000010a0a20           vnconfig fstcnt
1711     1 3   0         c   ffffa000028d48c0           vnconfig biowait
0      103 3   0       200   ffffa00001c4a300               vnd0 fstchg

fstrans_dump output:

Fstrans locks by lwp:
1711.1   (/) shared 1 cow 0
Fstrans state by mount:
/                state suspended

vnconfig 1711.1 waits for an I/O to complete:
sleepq_block/cv_wait/biowait/convertdisklabel/validate_label/readdisklabel/vndopen/spec_open/VOP_OPEN/vn_open/do_open/do_sys_openat/sys_open/syscall

This I/O should be done by kernel thread vnd0 0.103, which waits for
the filesystem to resume, on cv_wait(&fstrans_state_cv, &fstrans_lock):
sleepq_block/cv_wait/fstrans_start/genfs_do_putpages/VOP_PUTPAGES/vndthread

The process that suspended the filesystem is vnconfig 1880.1, through vrevoke.
It is itself waiting for vnconfig 1711.1 to finish its transaction, on
 cv_wait_sig(&fstrans_count_cv, &fstrans_lock):
sleepq_block/cv_wait_sig/fstrans_setstate/genfs_suspendctl/vfs_suspend/vrevoke_suspend_next.part.1/vrevoke/genfs_revoke/VOP_REVOKE/vdevgone/vnddoclear/vndioctl/VOP_IOCTL/vn_ioctl/sys_ioctl/syscall

The three threads wait for each other in a cycle: we have a deadlock.
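
The cycle can be checked mechanically. The following is a minimal userland sketch, not kernel code: struct waiter, in_wait_cycle() and report_case_deadlocks() are hypothetical names introduced here for illustration. Each blocked thread from the traces above points at the thread it needs, and the chain is followed until it returns to its starting point.

```c
#include <assert.h>
#include <stddef.h>

/* One node per blocked thread; waits_for is the thread it depends on. */
struct waiter {
	const char *name;
	const struct waiter *waits_for;
};

/*
 * Follow the waits_for chain from start.
 * Returns 1 if the chain leads back to start (start is part of a cycle),
 * 0 if the chain ends or max_hops is exhausted first.
 */
static int
in_wait_cycle(const struct waiter *start, size_t max_hops)
{
	const struct waiter *w = start->waits_for;
	size_t i;

	for (i = 0; i < max_hops && w != NULL; i++) {
		if (w == start)
			return 1;
		w = w->waits_for;
	}
	return 0;
}

/* Wait-for relations as described in this report (illustrative model). */
static int
report_case_deadlocks(void)
{
	struct waiter opener = { "vnconfig 1711.1 (biowait)", NULL };
	struct waiter iothread = { "vnd0 0.103 (fstchg)", NULL };
	struct waiter suspender = { "vnconfig 1880.1 (fstcnt)", NULL };

	opener.waits_for = &iothread;     /* needs its I/O completed */
	iothread.waits_for = &suspender;  /* needs the filesystem resumed */
	suspender.waits_for = &opener;    /* needs the transaction to end */

	return in_wait_cycle(&opener, 3);
}
```

In this model report_case_deadlocks() returns 1. Clearing any single edge, for example by letting the suspender give up and resume the filesystem, makes in_wait_cycle() return 0.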
>How-To-Repeat:
Set up a domU with two block devices backed by files on the root filesystem, then create it and shut it down repeatedly until the dom0 freezes:
while true; do xl shutdown -w test ; xl create test ; sleep 60; done

>Fix:
The root of the problem seems to be the unbounded wait on fstrans_count_cv in
fstrans_setstate(). As condvar(9) notes, "Non-interruptable waits have
the potential to deadlock the system". This wait is interruptible,
but most processes in the system end up waiting in fstrans_start() because they attempt a filesystem access. Once sshd and getty are hit, it becomes impossible to log in and kill a process.

Here is a proposal to fix the problem: use cv_timedwait_sig() instead
of cv_wait_sig(). Since the original code already allowed failure when catching a signal, a failure caused by a timeout is handled correctly by the calling functions.

--- sys/kern/vfs_trans.c.orig
+++ sys/kern/vfs_trans.c
@@ -41,8 +41,9 @@
 #endif
 
 #include <sys/param.h>
 #include <sys/systm.h>
+#include <sys/kernel.h>
 #include <sys/atomic.h>
 #include <sys/buf.h>
 #include <sys/kmem.h>
 #include <sys/mount.h>
@@ -531,12 +532,16 @@
 
        /*
         * All threads see the new state now.
         * Wait for transactions invalid at this state to leave.
+        * We cannot wait forever because many processes would
+        * get stuck waiting for fstcnt in fstrans_start(). This
+        * is acute when suspending the root filesystem.
         */
        error = 0;
        while (! state_change_done(mp)) {
-               error = cv_wait_sig(&fstrans_count_cv, &fstrans_lock);
+               error = cv_timedwait_sig(&fstrans_count_cv,
+                                        &fstrans_lock, hz / 4);
                if (error) {
                        new_state = fmi->fmi_state = FSTRANS_NORMAL;
                        break;
                }



-- 
Emmanuel Dreyfus
manu%netbsd.org@localhost



