tech-kern archive


Re: Xen dom0 freeze after domU exits (was Re: Zombie kernel thread)



Following up on my own message.

It took me a while to understand the whole story. Here it is, along
with a proposed fix.

The problem arises when terminating a domU with two block devices
backed by regular files on the root filesystem. In that case, there
is a race between the disposal of the two vnd devices.

Here are the relevant processes at freeze time:
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
1880     1 3   0        84   ffffa000010a0a20           vnconfig fstcnt
1711     1 3   0         c   ffffa000028d48c0           vnconfig biowait
0      103 3   0       200   ffffa00001c4a300               vnd0 fstchg

fstrans_dump output:

Fstrans locks by lwp:
1711.1   (/) shared 1 cow 0
Fstrans state by mount:
/                state suspended

vnconfig 1711.1 waits for an I/O to complete:
sleepq_block/cv_wait/biowait/convertdisklabel/validate_label/readdisklabel/vndopen/spec_open/VOP_OPEN/vn_open/do_open/do_sys_openat/sys_open/syscall

This I/O should be done by kernel thread vnd0 0.103, which waits for
the filesystem to resume in cv_wait(&fstrans_state_cv, &fstrans_lock):
sleepq_block/cv_wait/fstrans_start/genfs_do_putpages/VOP_PUTPAGES/vndthread

The process that suspended the filesystem is vnconfig 1880.1, through
vrevoke. It is itself waiting for vnconfig 1711.1 to finish its
transaction, in cv_wait_sig(&fstrans_count_cv, &fstrans_lock):
sleepq_block/cv_wait_sig/fstrans_setstate/genfs_suspendctl/vfs_suspend/vrevoke_suspend_next.part.1/vrevoke/genfs_revoke/VOP_REVOKE/vdevgone/vnddoclear/vndioctl/VOP_IOCTL/vn_ioctl/sys_ioctl/syscall

The root of the problem seems to be the unbounded wait on
fstrans_count_cv in fstrans_setstate(). As condvar(9) notes,
"Non-interruptable waits have the potential to deadlock the system".
This wait is interruptible, but while the filesystem stays suspended
most processes in the system end up waiting in fstrans_start(), just
like vnd0 0.103 does, and it quickly becomes impossible to kill a
process.
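
The three waits described above form a closed loop, which can be
checked mechanically. What follows is only a toy userland model, not
kernel code: the enum names, the waits_for table and in_wait_cycle()
are hypothetical and exist purely to restate the process table.

```c
#include <assert.h>

/* Toy model of the hang; indices and names are illustrative only. */
enum { VNCONFIG_1711, VND0, VNCONFIG_1880, NLWP };

/* waits_for[i] is the LWP that must make progress before LWP i can run. */
static const int waits_for[NLWP] = {
	[VNCONFIG_1711] = VND0,          /* biowait: the I/O is served by vnd0 */
	[VND0]          = VNCONFIG_1880, /* fstchg: needs the filesystem resumed */
	[VNCONFIG_1880] = VNCONFIG_1711, /* fstcnt: needs the transaction to end */
};

/* Return 1 if following the wait-for edges from start leads back to start. */
static int
in_wait_cycle(int start)
{
	int cur = start;

	assert(start >= 0 && start < NLWP);
	for (int step = 0; step < NLWP; step++) {
		cur = waits_for[cur];
		if (cur == start)
			return 1;
	}
	return 0;
}
```

Calling in_wait_cycle() on any of the three entries returns 1: none of
them can make progress unless one of the waits is abandoned.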

Here is a proposal to fix the problem: use cv_timedwait_sig() instead
of cv_wait_sig(), so that the suspension is abandoned when the
transactions do not drain in time. Opinions?

--- sys/kern/vfs_trans.c.orig
+++ sys/kern/vfs_trans.c
@@ -41,8 +41,9 @@
 #endif
 
 #include <sys/param.h>
 #include <sys/systm.h>
+#include <sys/kernel.h>
 #include <sys/atomic.h>
 #include <sys/buf.h>
 #include <sys/kmem.h>
 #include <sys/mount.h>
@@ -531,12 +532,16 @@
 
        /*
         * All threads see the new state now.
         * Wait for transactions invalid at this state to leave.
+        * We cannot wait forever because many processes would
+        * get stuck waiting for fstchg in fstrans_start(). This
+        * is acute when suspending the root filesystem.
         */
        error = 0;
        while (! state_change_done(mp)) {
-               error = cv_wait_sig(&fstrans_count_cv, &fstrans_lock);
+               error = cv_timedwait_sig(&fstrans_count_cv,
+                                        &fstrans_lock, hz / 4);
                if (error) {
                        new_state = fmi->fmi_state = FSTRANS_NORMAL;
                        break;
                }
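
For readers who want to try the idea outside the kernel, here is a
rough userland analogue using POSIX threads. It is a sketch under
stated assumptions, not the kernel code: struct fs, fs_suspend() and
the state names are hypothetical, and it uses one absolute deadline
where the patch passes a relative timeout of hz / 4 ticks to each
cv_timedwait_sig() call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

enum fs_state { FS_NORMAL, FS_SUSPENDED };	/* hypothetical names */

struct fs {
	pthread_mutex_t lock;
	pthread_cond_t count_cv;	/* signalled when a transaction ends */
	int active;			/* transactions still in progress */
	enum fs_state state;
};

/*
 * Try to suspend: wait at most timeout_ms for the active transactions
 * to drain. On timeout, roll the state back instead of sleeping forever.
 * Returns 0 on success or ETIMEDOUT if the suspension was abandoned.
 */
static int
fs_suspend(struct fs *fs, long timeout_ms)
{
	struct timespec deadline;
	int error = 0;

	assert(fs != NULL && timeout_ms >= 0);

	/* pthread_cond_timedwait() takes an absolute time. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&fs->lock);
	fs->state = FS_SUSPENDED;
	while (fs->active > 0) {
		error = pthread_cond_timedwait(&fs->count_cv, &fs->lock,
		    &deadline);
		if (error) {
			fs->state = FS_NORMAL;	/* give up and roll back */
			break;
		}
	}
	pthread_mutex_unlock(&fs->lock);
	return error;
}
```

With one transaction left open and nobody signalling count_cv,
fs_suspend(&fs, 250) returns ETIMEDOUT and leaves the state at
FS_NORMAL, which mirrors the rollback to FSTRANS_NORMAL in the patch.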



-- 
Emmanuel Dreyfus
manu%netbsd.org@localhost

