NetBSD-Bugs archive

kern/53624 (dom0 freeze on domU exit) is still there



The following reply was made to PR kern/53624; it has been noted by GNATS.

From: Manuel.Bouyer%lip6.fr@localhost
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: kern/53624 (dom0 freeze on domU exit) is still there
Date: Wed, 18 Sep 2019 16:54:56 +0200 (MEST)

 >Submitter-Id:	net
 >Originator:	Manuel Bouyer
 >Organization:
 >Confidential:	no 
 >Synopsis:	kern/53624 (dom0 freeze on domU exit) is still there
 >Severity:	serious
 >Priority:	high
 >Category:	kern
 >Class:		sw-bug
 >Release:	NetBSD 8.1_STABLE
 >Environment:
 System: NetBSD xen1.soc.lip6.fr 8.1_STABLE NetBSD 8.1_STABLE (ADMIN_DOM0) #0: Tue Sep 17 15:47:43 MEST 2019 bouyer%armandeche.soc.lip6.fr@localhost:/local/armandeche1/tmp/build/amd64/obj/local/armandeche1/netbsd-8/src/sys/arch/amd64/compile/ADMIN_DOM0 x86_64
 Architecture: x86_64
 Machine: amd64
 >Description:
 	On my testbed, which starts/destroys several domUs per day (sometimes
 	in parallel), I see occasional filesystem hangs with processes
 	waiting on fstchg.
 	The interesting processes are:
 0      105 3   0       200   ffffa0000213e5a0               vnd1 fstchg
 0      104 3   0       200   ffffa00002088160               vnd0 vndbp
 0       97 3   0       200   ffffa0000206a980               vnd3 vndbp
 0       96 3   0       200   ffffa0000105a280               vnd2 fstchg
 0       67 3   0       200   ffffa00000d73640            ioflush fstchg
 6533     1 3   0         0   ffffa00001f77080           vnconfig biowait
 25777    1 3   0        80   ffffa00001e5f480           vnconfig fstcnt
 
 db> tr/a ffffa0000213e5a0
 trace: pid 0 lid 105 at 0xffffa0002cffd4f0
 sleepq_block() at netbsd:sleepq_block+0x99
 cv_wait() at netbsd:cv_wait+0xf0
 fstrans_start() at netbsd:fstrans_start+0x78e
 VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x42
 genfs_getpages() at netbsd:genfs_getpages+0x1344
 VOP_GETPAGES() at netbsd:VOP_GETPAGES+0x4b
 ubc_fault() at netbsd:ubc_fault+0x188
 uvm_fault_internal() at netbsd:uvm_fault_internal+0x6d4
 trap() at netbsd:trap+0x3c1
 --- trap (number 6) ---
 kcopy() at netbsd:kcopy+0x15
 uiomove() at netbsd:uiomove+0xb9  
 ubc_uiomove() at netbsd:ubc_uiomove+0xf7
 ffs_read() at netbsd:ffs_read+0xf7
 VOP_READ() at netbsd:VOP_READ+0x33
 vn_rdwr() at netbsd:vn_rdwr+0x10c
 vndthread() at netbsd:vndthread+0x5b1
 
 db>  tr/a ffffa0000105a280       
 trace: pid 0 lid 96 at 0xffffa0002cf4d9c0
 sleepq_block() at netbsd:sleepq_block+0x99
 cv_wait() at netbsd:cv_wait+0xf0
 fstrans_start() at netbsd:fstrans_start+0x78e
 VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x42
 genfs_do_io() at netbsd:genfs_do_io+0x1b4
 genfs_gop_write() at netbsd:genfs_gop_write+0x52
 genfs_do_putpages() at netbsd:genfs_do_putpages+0xb9c
 VOP_PUTPAGES() at netbsd:VOP_PUTPAGES+0x36
 vndthread() at netbsd:vndthread+0x683
 
 db> tr/a ffffa00000d73640
 trace: pid 0 lid 67 at 0xffffa0002cd48ca0
 sleepq_block() at netbsd:sleepq_block+0x99
 cv_wait() at netbsd:cv_wait+0xf0
 fstrans_start() at netbsd:fstrans_start+0x78e
 VOP_BWRITE() at netbsd:VOP_BWRITE+0x42
 ffs_sbupdate() at netbsd:ffs_sbupdate+0xc3
 ffs_cgupdate() at netbsd:ffs_cgupdate+0x20
 ffs_sync() at netbsd:ffs_sync+0x1e9
 sched_sync() at netbsd:sched_sync+0x93
 
 db> tr/a ffffa00001f77080
 trace: pid 6533 lid 1 at 0xffffa0002cff8910
 sleepq_block() at netbsd:sleepq_block+0x99
 cv_wait() at netbsd:cv_wait+0xf0
 biowait() at netbsd:biowait+0x4f
 scan_iso_vrs_session() at netbsd:scan_iso_vrs_session+0x60
 readdisklabel() at netbsd:readdisklabel+0x304
 vndopen() at netbsd:vndopen+0x305
 spec_open() at netbsd:spec_open+0x385
 VOP_OPEN() at netbsd:VOP_OPEN+0x2f
 vn_open() at netbsd:vn_open+0x1e9
 do_open() at netbsd:do_open+0x112
 do_sys_openat() at netbsd:do_sys_openat+0x68
 sys_open() at netbsd:sys_open+0x24
 syscall() at netbsd:syscall+0x9c
 db> tr/a ffffa00001e5f480
 trace: pid 25777 lid 1 at 0xffffa0002b358860
 sleepq_block() at netbsd:sleepq_block+0x99
 cv_wait_sig() at netbsd:cv_wait_sig+0xf4
 fstrans_setstate() at netbsd:fstrans_setstate+0xaa
 genfs_suspendctl() at netbsd:genfs_suspendctl+0x57
 vfs_suspend() at netbsd:vfs_suspend+0x5b
 vrevoke_suspend_next() at netbsd:vrevoke_suspend_next+0x2a
 vrevoke() at netbsd:vrevoke+0x2b
 genfs_revoke() at netbsd:genfs_revoke+0x13
 VOP_REVOKE() at netbsd:VOP_REVOKE+0x2e
 vdevgone() at netbsd:vdevgone+0x5a
 vnddoclear() at netbsd:vnddoclear+0xc6
 vndioctl() at netbsd:vndioctl+0x3bb
 VOP_IOCTL() at netbsd:VOP_IOCTL+0x37
 vn_ioctl() at netbsd:vn_ioctl+0xa6
 sys_ioctl() at netbsd:sys_ioctl+0x101
 syscall() at netbsd:syscall+0x9c
 
 db> call fstrans_dump
 Fstrans locks by lwp:
 6533.1   (/) shared 1 cow 0
 0.105    (/domains) lazy 3 cow 0
 0.96     (/domains) lazy 2 cow 0
 0.67     (/domains) shared 1 cow 0
 Fstrans state by mount:
 /                state suspending
 
 So it looks like we have a 3-way deadlock between ioflush and the two vnconfig
 threads (whereas kern/53624 involved only two vnconfig threads), but I can't
 see the exact scenario yet. Also note that the files backing the vnd devices
 are in /domains, not in /.
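 
 The pattern suggested by the traces can be sketched roughly as follows.
 This is only a simplified illustration, assuming the older two-argument
 fstrans_start() interface of netbsd-8's fstrans(9) together with
 vfs_suspend(9); io_worker() and teardown() are hypothetical stand-ins
 for the vndthread I/O path and the vnconfig -u path, not the actual
 kernel code:
 
 #include <sys/mount.h>
 #include <sys/fstrans.h>
 
 /*
  * I/O worker: already inside a lazy transaction on the mount holding
  * the backing file, it needs to start a second transaction to issue
  * the I/O (the fstrans_start() under VOP_STRATEGY in the traces above).
  */
 static void
 io_worker(struct mount *backing_mp, struct mount *root_mp)
 {
 	fstrans_start(backing_mp, FSTRANS_LAZY);
 	/*
 	 * If root_mp is already in the SUSPENDING state, this call
 	 * sleeps on "fstchg" until the suspension is lifted.
 	 */
 	fstrans_start(root_mp, FSTRANS_SHARED);
 	/* ... perform the I/O ... */
 	fstrans_done(root_mp);
 	fstrans_done(backing_mp);
 }
 
 /*
  * Teardown path (vnconfig -u -> vdevgone -> vrevoke): suspends the
  * mount and, in fstrans_setstate(), sleeps on "fstcnt" until all
  * existing transactions have drained.  If one of those transaction
  * holders is itself waiting for I/O that the blocked io_worker()
  * above would serve, neither side can make progress.
  */
 static void
 teardown(struct mount *root_mp)
 {
 	if (vfs_suspend(root_mp, 0) == 0) {
 		/* ... revoke the device vnodes ... */
 		vfs_resume(root_mp);
 	}
 }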
 
 WAPBL is configured in the kernel but not in use.
 
 
 >How-To-Repeat:
 	xl create/shutdown several domUs in parallel
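 
 	A driver program of roughly the following shape exercises the same
 	workload (a hypothetical sketch only; the config file paths and
 	domU names are placeholders for the testbed's actual ones):
 
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <sys/wait.h>
 
 int
 main(void)
 {
 	/* One worker per guest, each looping over create/shutdown. */
 	for (int i = 0; i < 4; i++) {
 		if (fork() == 0) {
 			char cmd[128];
 			for (;;) {
 				snprintf(cmd, sizeof(cmd),
 				    "xl create /usr/pkg/etc/xen/domU%d.cfg", i);
 				system(cmd);
 				sleep(60);	/* let the guest boot */
 				snprintf(cmd, sizeof(cmd),
 				    "xl shutdown -w domU%d", i);
 				system(cmd);
 			}
 		}
 	}
 	for (;;)
 		if (wait(NULL) == -1)
 			return 0;
 }
 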
 >Fix:
 	please ...
 

