tech-kern archive
Re: Xen dom0 freeze after domU exits (was Re: Zombie kernel thread)
Following up on my own message.
It took me a while to understand the whole story. Here it is, with
a proposed fix.
The problem arises when terminating a domU with two block devices
backed by regular files on the root filesystem. In that case, the
disposal of one vnd device races with the disposal of the other.
Here are the relevant processes at freeze time:
 PID  LID S CPU FLAGS       STRUCT LWP * NAME     WAIT
1880    1 3   0    84 ffffa000010a0a20 vnconfig fstcnt
1711    1 3   0     c ffffa000028d48c0 vnconfig biowait
   0  103 3   0   200 ffffa00001c4a300 vnd0     fstchg
fstrans_dump output:
Fstrans locks by lwp:
1711.1 (/) shared 1 cow 0
Fstrans state by mount:
/ state suspended
vnconfig 1711.1 waits for an I/O to complete:
sleepq_block/cv_wait/biowait/convertdisklabel/validate_label/readdisklabel/vndopen/spec_open/VOP_OPEN/vn_open/do_open/do_sys_openat/sys_open/syscall
This I/O should be done by kernel thread vnd0 0.103, which waits for
the filesystem to resume in cv_wait(&fstrans_state_cv, &fstrans_lock):
sleepq_block/cv_wait/fstrans_start/genfs_do_putpages/VOP_PUTPAGES/vndthread
The process that suspended the filesystem is vnconfig 1880.1, through
vrevoke. It is itself waiting for vnconfig 1711.1 to finish its
transaction, in cv_wait_sig(&fstrans_count_cv, &fstrans_lock):
sleepq_block/cv_wait_sig/fstrans_setstate/genfs_suspendctl/vfs_suspend/vrevoke_suspend_next.part.1/vrevoke/genfs_revoke/VOP_REVOKE/vdevgone/vnddoclear/vndioctl/VOP_IOCTL/vn_ioctl/sys_ioctl/syscall
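Put together, the three waits above form a cycle: 1711 needs vnd0,
vnd0 needs 1880, and 1880 needs 1711. Here is a minimal userland
sketch of that wait-for graph, only to make the cycle explicit. It is
not kernel code; the enum names and in_wait_cycle() are made up for
the illustration.

```c
#include <assert.h>

/* Actors taken from the process list above. */
enum actor { VNCONFIG_1711, VND0_THREAD, VNCONFIG_1880, NACTORS };

/* waits_on[a] is the actor that must make progress before a can run. */
static const enum actor waits_on[NACTORS] = {
	[VNCONFIG_1711] = VND0_THREAD,   /* biowait: the I/O is served by vnd0 */
	[VND0_THREAD]   = VNCONFIG_1880, /* fstchg: needs the suspender to resume / */
	[VNCONFIG_1880] = VNCONFIG_1711, /* fstcnt: needs 1711 to end its transaction */
};

/* Follow the wait edges from start; return 1 if they lead back to start. */
static int
in_wait_cycle(enum actor start)
{
	enum actor cur = start;

	assert((int)start >= 0 && start < NACTORS);
	for (int steps = 0; steps < NACTORS; steps++) {
		cur = waits_on[cur];
		if (cur == start)
			return 1;
	}
	return 0;
}
```

Calling in_wait_cycle() on any of the three actors returns 1, so none
of them can make progress unless one of the waits is broken.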
The root of the problem seems to be the unbounded wait on
fstrans_count_cv in fstrans_setstate(). As condvar(9) notes,
"Non-interruptable waits have the potential to deadlock the system".
This wait is interruptible, but most processes in the system end up
waiting in fstrans_start(), just like vnd0 0.103 does, and it quickly
becomes impossible to kill a process.
Here is a proposal to fix the problem: use cv_timedwait_sig() instead
of cv_wait_sig(). Opinions?
--- sys/kern/vfs_trans.c.orig
+++ sys/kern/vfs_trans.c
@@ -41,8 +41,9 @@
 #endif
 
 #include <sys/param.h>
 #include <sys/systm.h>
+#include <sys/kernel.h>
 #include <sys/atomic.h>
 #include <sys/buf.h>
 #include <sys/kmem.h>
 #include <sys/mount.h>
@@ -531,12 +532,16 @@
 
 	/*
 	 * All threads see the new state now.
 	 * Wait for transactions invalid at this state to leave.
+	 * We cannot wait forever because many processes would
+	 * get stuck waiting for fstcnt in fstrans_start(). This
+	 * is acute when suspending the root filesystem.
 	 */
 	error = 0;
 	while (! state_change_done(mp)) {
-		error = cv_wait_sig(&fstrans_count_cv, &fstrans_lock);
+		error = cv_timedwait_sig(&fstrans_count_cv,
+		    &fstrans_lock, hz / 4);
 		if (error) {
 			new_state = fmi->fmi_state = FSTRANS_NORMAL;
 			break;
 		}
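
For anyone who wants to play with the idea outside the kernel, here is
a rough userland analogue using pthread_cond_timedwait(). It is only a
sketch under assumptions: struct fs_model, model_suspend() and the
state names are invented for the example, and the timeout is an
absolute deadline in milliseconds rather than a tick count like
hz / 4 (a quarter of a second). The point is the same as in the patch:
when the wait times out, the suspender gives up and puts the state
back to normal instead of blocking everyone else.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins for the fstrans states. */
enum state { ST_NORMAL, ST_SUSPENDING, ST_SUSPENDED };

struct fs_model {
	pthread_mutex_t lock;
	pthread_cond_t count_cv;  /* signalled when a transaction ends */
	int active;               /* transactions still in flight */
	enum state state;
};

/*
 * Try to suspend; give up after timeout_ms if transactions remain.
 * Returns 0 on success, or the error from pthread_cond_timedwait()
 * (ETIMEDOUT on timeout) after restoring ST_NORMAL.
 */
static int
model_suspend(struct fs_model *fs, long timeout_ms)
{
	struct timespec deadline;
	int error = 0;

	assert(timeout_ms >= 0);

	/* Condition variables use CLOCK_REALTIME by default. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&fs->lock);
	fs->state = ST_SUSPENDING;
	while (fs->active > 0) {
		error = pthread_cond_timedwait(&fs->count_cv, &fs->lock,
		    &deadline);
		if (error) {
			/* Back out so that other threads can proceed. */
			fs->state = ST_NORMAL;
			break;
		}
	}
	if (error == 0)
		fs->state = ST_SUSPENDED;
	pthread_mutex_unlock(&fs->lock);
	return error;
}
```

With active set to 1 and nobody signalling count_cv, model_suspend()
returns ETIMEDOUT and leaves the state at ST_NORMAL. With active set
to 0 it returns 0 right away and the state becomes ST_SUSPENDED.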
--
Emmanuel Dreyfus
manu%netbsd.org@localhost