NetBSD-Bugs archive
kern/53385: vnconfig deadlock on fstchg
>Number: 53385
>Category: kern
>Synopsis: vnconfig deadlock on fstchg
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Jun 19 20:55:01 +0000 2018
>Originator: Manuel Bouyer
>Release: NetBSD 8.0_RC1
>Organization:
>Environment:
System: NetBSD admin2-dom0.lip6.fr 8.0_RC1 NetBSD 8.0_RC1 (ADMIN_DOM0) #1: Mon Jun 11 11:32:45 MEST 2018 bouyer%armandeche.soc.lip6.fr@localhost:/local/armandeche1/tmp/build/amd64/obj/local/armandeche1/netbsd-8/src/sys/arch/amd64/compile/ADMIN_DOM0 amd64
Architecture: x86_64
Machine: amd64
>Description:
This is on a NetBSD/Xen dom0 host. A domU with 2 file-backed disks
was destroyed, and I suspect the scripts ran the 2 vnconfig -u
commands in parallel. This resulted in I/O stalls, with
most processes waiting on fstchg.
Here is the ps output from ddb:
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
8753 1 3 0 0 ffffa00008091660 sshd fstchg
9834 1 3 0 0 ffffa00008091a80 sshd fstchg
9950 1 3 0 0 ffffa00008076640 sshd fstchg
9590 1 3 0 1000000 ffffa00008073200 tcsh fstchg
6430 1 3 0 0 ffffa00008073620 sshd fstchg
8033 1 3 0 0 ffffa00008076a60 sshd fstchg
3594 1 3 0 1000000 ffffa00008076220 tcsh fstchg
5271 1 3 0 0 ffffa00003f21980 xl fstchg
12415 1 3 0 0 ffffa00003d0d680 vnconfig biowait
11058 1 3 0 80 ffffa00003f1d960 sh wait
13112 1 3 0 80 ffffa00003d1cac0 vnconfig fstcnt
5137 1 3 0 80 ffffa00003f2d180 sh wait
9645 2 3 0 80 ffffa00003f0c0c0 xl netio
9645 1 3 0 0 ffffa00003f0c900 xl fstchg
13935 1 3 0 0 ffffa00003f671e0 tcsh wait
13693 1 3 0 80 ffffa00008073a40 ksh pause
12178 1 3 0 80 ffffa00003d1c280 tcsh pause
9584 1 3 0 80 ffffa00003dc2ae0 sshd select
10645 1 3 0 80 ffffa00003f67600 sshd select
10534 1 3 0 80 ffffa00003f289a0 pickup kqueue
8128 1 3 0 80 ffffa00003f10500 tcsh ttyraw
9113 1 3 0 80 ffffa00003ee10a0 sshd select
9735 1 3 0 80 ffffa00003e59460 sshd select
11149 1 3 0 80 ffffa00003f2d9c0 tcsh ttyraw
8069 1 3 0 80 ffffa00002aef6a0 ksh pause
6550 1 3 0 80 ffffa00003cef660 sshd select
12226 3 3 0 1000080 ffffa00003f1d540 qemu-dm netio
12226 2 3 0 1000080 ffffa00003e53020 qemu-dm netio
12226 1 3 0 1000000 ffffa00002f8b140 qemu-dm fstchg
3406 1 3 0 80 ffffa00002adf260 tcsh ttyraw
1451 1 3 0 80 ffffa00003e4c000 ksh pause
7092 1 3 0 80 ffffa000033955a0 tcsh pause
4314 1 3 0 80 ffffa00002aefac0 screen-4.6.2 select
4212 1 3 0 80 ffffa00002adfaa0 getty ttyraw
3519 1 3 0 80 ffffa00002adf680 getty ttyraw
6078 1 3 0 80 ffffa00002819240 getty ttyraw
7444 1 3 0 0 ffffa00002817a60 getty fstchg
6455 1 3 0 0 ffffa00003dcdb00 cron fstchg
3915 1 3 0 80 ffffa00003f21140 inetd kqueue
5033 1 3 0 80 ffffa00003e4c420 qmgr kqueue
4296 1 3 0 80 ffffa00003e5f8a0 master kqueue
2276 1 3 0 80 ffffa00003f21560 smartd nanoslp
1641 1 3 0 80 ffffa00003f335c0 upsmon nanoslp
2908 1 3 0 80 ffffa00003f17520 upsmon pipe_rd
5241 2 3 0 80 ffffa00003395180 xl netio
5241 1 3 0 80 ffffa00003dfcb40 xl select
6252 1 3 0 0 ffffa00003e17740 tcsh wait
5508 2 3 0 80 ffffa00003f1d120 xl netio
5508 1 3 0 80 ffffa00003e59880 xl select
5875 3 3 0 1000080 ffffa00003e47ba0 qemu-dm netio
5875 2 3 0 1000080 ffffa00003f10920 qemu-dm netio
5875 1 3 0 1000000 ffffa00003dcd2c0 qemu-dm fstchg
3821 1 3 0 80 ffffa00003dcd6e0 ksh pause
3651 2 3 0 80 ffffa00003f3aa00 xl netio
3651 1 3 0 80 ffffa00003dc22a0 xl select
3736 3 3 0 1000080 ffffa00003e30340 qemu-dm netio
3736 2 3 0 1000080 ffffa00003de12e0 qemu-dm netio
3736 1 3 0 80 ffffa00003de1700 qemu-dm select
6648 2 3 0 80 ffffa00003f339e0 xl netio
6648 1 3 0 80 ffffa00003dfc300 xl select
5294 3 3 0 1000080 ffffa00003e59040 qemu-dm netio
5294 2 3 0 1000080 ffffa00003f100e0 qemu-dm netio
5294 1 3 0 80 ffffa00003f331a0 qemu-dm select
3797 2 3 0 80 ffffa00003e47360 xl netio
3797 1 3 0 80 ffffa00003e5f480 xl select
6759 1 3 0 80 ffffa00003f3a5e0 tcsh pause
2896 1 3 0 80 ffffa00003f0c4e0 sshd select
5811 2 3 0 80 ffffa00003dc26c0 xl netio
5811 1 3 0 80 ffffa00003de1b20 xl select
1745 1 3 0 80 ffffa00003e17320 sshd select
4614 2 3 0 80 ffffa00003ee14c0 xl netio
4614 1 3 0 80 ffffa00003e30b80 xl select
6813 2 3 0 80 ffffa00003e47780 xl netio
6813 1 3 0 80 ffffa00003ee18e0 xl select
161 2 3 0 80 ffffa00003cef240 xl netio
161 1 3 0 80 ffffa00003d0d260 xl select
2211 2 3 0 80 ffffa00003cdd640 xl netio
2211 1 3 0 80 ffffa00003cbe200 xl select
1717 2 3 0 80 ffffa00003cbea40 xl netio
1717 1 3 0 80 ffffa00003b2b1e0 xl select
1790 2 3 0 80 ffffa00003b175e0 xl netio
1790 1 3 0 80 ffffa00003b171c0 xl select
1646 2 3 0 80 ffffa00002f93580 xenconsoled netio
1646 1 3 0 80 ffffa00002aef280 xenconsoled select
1613 1 3 0 80 ffffa000033e11a0 xenstored select
1627 1 3 0 80 ffffa000033e19e0 sshd select
1622 1 3 0 80 ffffa00002f93160 powerd kqueue
1605 1 3 0 80 ffffa000033959c0 ntpd pause
1564 1 3 0 0 ffffa00002f8b980 ipmon fstchg
1476 1 2 0 0 ffffa00002f108a0 syslogd
1 1 3 0 80 ffffa000026651e0 init wait
0 183 5 0 200 ffffa00003f28580 (zombie)
0 182 3 0 200 ffffa00003f17940 vnd13 fstchg
0 181 3 0 200 ffffa00003e5f060 vnd12 vndbp
0 160 3 0 200 ffffa00003e944a0 bridge_rtage bridge_rtage
0 159 3 0 200 ffffa00003f2d5a0 xbdb13i51712 xbdb13i51712
0 158 3 0 200 ffffa00003e94080 vnd8 fstchg
0 144 3 0 200 ffffa00003e948c0 xbdb11i768 xbdb11i768
0 143 3 0 200 ffffa00003f67a20 xbdb10i51712 xbdb10i51712
0 142 3 0 200 ffffa00002d74b00 xbdb12i1 xbdb12i1
0 140 3 0 200 ffffa00003f3a1c0 vnd11 fstchg
0 139 3 0 200 ffffa00003e53440 vnd10 fstchg
0 138 3 0 200 ffffa00003f28160 vnd9 vndbp
0 136 3 0 200 ffffa00003e4c840 xbdb8i1 xbdb8i1
0 135 3 0 200 ffffa00003e30760 xbdb7i1 xbdb7i1
0 134 3 0 200 ffffa00003e53860 vnd7 fstchg
0 133 3 0 200 ffffa00003f17100 vnd6 fstchg
0 132 3 0 200 ffffa00003b17a00 xbdb6i1 xbdb6i1
0 131 3 0 200 ffffa00003e17b60 xbdb5i1 xbdb5i1
0 130 3 0 200 ffffa00003dfc720 vnd5 fstchg
0 129 3 0 200 ffffa00003b2b600 xbdb4i1 xbdb4i1
0 128 3 0 200 ffffa00003d1c6a0 vnd4 fstchg
0 127 3 0 200 ffffa00003cefa80 xbdb3i1 xbdb3i1
0 126 3 0 200 ffffa00003d0daa0 vnd3 fstchg
0 125 3 0 200 ffffa00003cdda60 xbdb2i1 xbdb2i1
0 124 3 0 200 ffffa00003cdd220 vnd2 fstchg
0 123 3 0 200 ffffa00003b2ba20 xbdb1i1 xbdb1i1
0 122 3 0 200 ffffa00003cbe620 vnd1 fstchg
0 121 3 0 200 ffffa000022a78e0 vnd0 fstchg
0 120 3 0 200 ffffa000033e15c0 xen_balloon xen_balloon
0 119 3 0 200 ffffa00002f8b560 ipmi0 ipmi0
0 118 3 0 200 ffffa00002f939a0 bridge_rtage bridge_rtage
0 117 3 0 200 ffffa00002f84960 bridge_rtage bridge_rtage
0 116 3 0 200 ffffa00002f84540 bridge_rtage bridge_rtage
0 115 3 0 200 ffffa00002f84120 bridge_rtage bridge_rtage
0 114 3 0 200 ffffa00002f7b100 bridge_rtage bridge_rtage
0 113 3 0 200 ffffa00002f7b520 bridge_rtage bridge_rtage
0 112 3 0 200 ffffa00002f7b940 bridge_rtage bridge_rtage
0 111 3 0 200 ffffa00002f720e0 bridge_rtage bridge_rtage
0 110 3 0 200 ffffa00002f72500 bridge_rtage bridge_rtage
0 109 3 0 200 ffffa00002f72920 bridge_rtage bridge_rtage
0 108 3 0 200 ffffa00002f690c0 bridge_rtage bridge_rtage
0 107 3 0 200 ffffa00002f694e0 bridge_rtage bridge_rtage
0 106 3 0 200 ffffa00002f69900 bridge_rtage bridge_rtage
0 105 3 0 200 ffffa00002f610a0 bridge_rtage bridge_rtage
0 104 3 0 200 ffffa00002f614c0 bridge_rtage bridge_rtage
0 103 3 0 200 ffffa00002f618e0 bridge_rtage bridge_rtage
0 102 3 0 200 ffffa00002f18080 bridge_rtage bridge_rtage
0 101 3 0 200 ffffa00002f184a0 bridge_rtage bridge_rtage
0 100 3 0 200 ffffa00002f188c0 bridge_rtage bridge_rtage
0 99 3 0 200 ffffa00002f10060 bridge_rtage bridge_rtage
0 98 3 0 200 ffffa00002f10480 bridge_rtage bridge_rtage
0 97 3 0 200 ffffa00002ec92e0 bridge_rtage bridge_rtage
0 96 3 0 200 ffffa00002f07040 bridge_rtage bridge_rtage
0 95 3 0 200 ffffa00002f07460 bridge_rtage bridge_rtage
0 94 3 0 200 ffffa00002f07880 bridge_rtage bridge_rtage
0 93 3 0 200 ffffa00002efe020 bridge_rtage bridge_rtage
0 92 3 0 200 ffffa00002efe440 bridge_rtage bridge_rtage
0 91 3 0 200 ffffa00002efe860 bridge_rtage bridge_rtage
0 90 3 0 200 ffffa00002ef5000 bridge_rtage bridge_rtage
0 89 3 0 200 ffffa00002ef5420 bridge_rtage bridge_rtage
0 88 3 0 200 ffffa00002ef5840 bridge_rtage bridge_rtage
0 87 3 0 200 ffffa00002eec360 bridge_rtage bridge_rtage
0 86 3 0 200 ffffa00002ee4340 bridge_rtage bridge_rtage
0 85 3 0 200 ffffa00002eecba0 bridge_rtage bridge_rtage
0 84 3 0 200 ffffa00002eec780 bridge_rtage bridge_rtage
0 83 3 0 200 ffffa00002ee4760 bridge_rtage bridge_rtage
0 82 3 0 200 ffffa00002ee4b80 bridge_rtage bridge_rtage
0 81 3 0 200 ffffa00002edb320 bridge_rtage bridge_rtage
0 80 3 0 200 ffffa00002ed1720 bridge_rtage bridge_rtage
0 79 3 0 200 ffffa00002edbb60 bridge_rtage bridge_rtage
0 78 3 0 200 ffffa00002edb740 bridge_rtage bridge_rtage
0 77 3 0 200 ffffa00002d746e0 bridge_rtage bridge_rtage
0 76 3 0 200 ffffa00002ed1b40 bridge_rtage bridge_rtage
0 75 3 0 200 ffffa00002ed1300 bridge_rtage bridge_rtage
0 74 3 0 200 ffffa00002ec9700 bridge_rtage bridge_rtage
0 73 3 0 200 ffffa00002b0c2a0 bridge_rtage bridge_rtage
0 72 3 0 200 ffffa00002d742c0 bridge_rtage bridge_rtage
0 71 3 0 200 ffffa00002ec9b20 bridge_rtage bridge_rtage
0 70 3 0 200 ffffa00002b0c6c0 bridge_rtage bridge_rtage
0 69 3 0 200 ffffa00002b0cae0 bridge_rtage bridge_rtage
0 68 3 0 200 ffffa00002817640 physiod physiod
0 67 3 0 200 ffffa00002819660 aiodoned aiodoned
0 66 3 0 200 ffffa00002819a80 ioflush fstchg
0 65 3 0 200 ffffa00002817220 pgdaemon pgdaemon
0 62 3 0 200 ffffa00002759200 raidio0 raidiow
0 61 3 0 200 ffffa00002637140 raid0 rfnodeq
0 60 3 0 200 ffffa00002759620 atapibus0 sccomp
0 56 3 0 200 ffffa00002637560 usb7 usbevt
0 55 3 0 200 ffffa00002637980 usb6 usbevt
0 54 3 0 200 ffffa00002636120 usb5 usbevt
0 53 3 0 200 ffffa00002636540 usb4 usbevt
0 52 3 0 200 ffffa00002636960 usb3 usbevt
0 51 3 0 200 ffffa0000262e100 usb2 usbevt
0 50 3 0 200 ffffa0000262e520 usb1 usbevt
0 49 3 0 200 ffffa00002759a40 usb0 usbevt
0 48 3 0 200 ffffa00002665600 rt_free rt_free
0 47 3 0 200 ffffa00002665a20 unpgc unpgc
0 46 3 0 200 ffffa0000265f1c0 key_timehandler key_timehandler
0 45 3 0 200 ffffa0000265f5e0 icmp6_wqinput/0 icmp6_wqinput
0 44 3 0 200 ffffa0000265fa00 ip6flow_slowtim ip6flow_slowtim
0 43 3 0 200 ffffa0000265c1a0 nd6_timer nd6_timer
0 42 3 0 200 ffffa0000265c5c0 carp6_wqinput/0 carp6_wqinput
0 41 3 0 200 ffffa0000265c9e0 carp_wqinput/0 carp_wqinput
0 40 3 0 200 ffffa00002651180 icmp_wqinput/0 icmp_wqinput
0 39 3 0 200 ffffa000026515a0 rt_timer rt_timer
0 38 3 0 200 ffffa000026519c0 ipflow_slowtimo ipflow_slowtimo
0 37 3 0 200 ffffa00002639160 vmem_rehash vmem_rehash
0 36 3 0 200 ffffa00002639580 xenbus rdst
0 35 3 0 200 ffffa000026399a0 xenwatch evtsq
0 26 3 0 200 ffffa0000262e940 iic0 iicintr
0 25 3 0 200 ffffa000024c80e0 atabus5 atath
0 24 3 0 200 ffffa000024c8500 atabus4 atath
0 23 3 0 200 ffffa000024c8920 atabus3 atath
0 22 3 0 200 ffffa000024bd0c0 atabus2 atath
0 21 3 0 200 ffffa000024bd4e0 atabus1 atath
0 20 3 0 200 ffffa000024bd900 atabus0 atath
0 19 3 0 200 ffffa000022a70a0 usbtask-dr usbtsk
0 18 3 0 200 ffffa000022a74c0 usbtask-hc usbtsk
0 16 3 0 200 ffffa0000208c080 ipmi ipmipoll
0 15 3 0 200 ffffa0000208c4a0 sysmon smtaskq
0 14 3 0 200 ffffa0000208c8c0 pmfsuspend pmfsuspend
0 13 3 0 200 ffffa00002086060 pmfevent pmfevent
0 12 3 0 200 ffffa00002086480 sopendfree sopendfr
0 11 3 0 200 ffffa000020868a0 nfssilly nfssilly
0 10 3 0 200 ffffa00001cf0040 cachegc cachegc tr
(Unfortunately, the Xen console buffer isn't large enough to hold the complete
ps output.)
Note that one vnconfig is waiting on biowait and the second one on fstcnt. The
other processes are on fstchg, I guess because of one of the vnconfigs.
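To see at a glance how many LWPs sit on each wait channel, the ps output can be tallied. The following is only a rough sketch: it assumes each data row has exactly 8 whitespace-separated fields (PID, LID, S, CPU, FLAGS, address, NAME, WAIT) and silently skips anything else, such as the header, rows without a wait channel, or truncated rows. The function name and the SAMPLE excerpt (a few rows copied from the listing above) are for illustration only.

```python
from collections import Counter


def tally_wait_channels(ps_text):
    """Count LWPs per wait channel in ddb 'ps' output (assumed 8-field rows)."""
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip the header, rows with no wait channel, and malformed rows.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        counts[fields[7]] += 1  # last column is the wait channel
    return counts


# A few rows copied from the listing above, for illustration.
SAMPLE = """\
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
8753 1 3 0 0 ffffa00008091660 sshd fstchg
12415 1 3 0 0 ffffa00003d0d680 vnconfig biowait
13112 1 3 0 80 ffffa00003d1cac0 vnconfig fstcnt
1476 1 2 0 0 ffffa00002f108a0 syslogd
0 182 3 0 200 ffffa00003f17940 vnd13 fstchg
"""

for channel, count in tally_wait_channels(SAMPLE).most_common():
    print(channel, count)
```

Run against the full capture, this would make it easy to spot that fstchg dominates; the two vnconfig rows are the only ones on biowait and fstcnt in the part of the listing that survived.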
traces of the 2 vnconfigs:
db> tr/a ffffa00003d0d680
trace: pid 12415 lid 1 at 0xffffa0004e709940
sleepq_block() at netbsd:sleepq_block+0x99
cv_wait() at netbsd:cv_wait+0xf0
biowait() at netbsd:biowait+0x4f
validate_label() at netbsd:validate_label+0x2a
readdisklabel() at netbsd:readdisklabel+0x1bc
vndopen() at netbsd:vndopen+0x2db
spec_open() at netbsd:spec_open+0x385
VOP_OPEN() at netbsd:VOP_OPEN+0x2f
vn_open() at netbsd:vn_open+0x1e9
do_open() at netbsd:do_open+0x112
do_sys_openat() at netbsd:do_sys_openat+0x68
sys_open() at netbsd:sys_open+0x24
syscall() at netbsd:syscall+0x9c
--- syscall (number 5) ---
758b2943e2ca:
db> tr/a ffffa00003d1cac0
trace: pid 13112 lid 1 at 0xffffa0004e744860
sleepq_block() at netbsd:sleepq_block+0x99
cv_wait_sig() at netbsd:cv_wait_sig+0xf4
fstrans_setstate() at netbsd:fstrans_setstate+0x9f
genfs_suspendctl() at netbsd:genfs_suspendctl+0x57
vfs_suspend() at netbsd:vfs_suspend+0x5b
vrevoke_suspend_next() at netbsd:vrevoke_suspend_next+0x2a
vrevoke() at netbsd:vrevoke+0x2b
genfs_revoke() at netbsd:genfs_revoke+0x13
VOP_REVOKE() at netbsd:VOP_REVOKE+0x2e
vdevgone() at netbsd:vdevgone+0x5a
vnddoclear() at netbsd:vnddoclear+0xb9
vndioctl() at netbsd:vndioctl+0x361
VOP_IOCTL() at netbsd:VOP_IOCTL+0x37
vn_ioctl() at netbsd:vn_ioctl+0xa6
sys_ioctl() at netbsd:sys_ioctl+0x101
syscall() at netbsd:syscall+0x9c
--- syscall (number 54) ---
7d6abdcfedda:
I suspect the first vnconfig is stuck on biowait because the underlying
filesystem is suspended. Many vnd threads are stuck on fstchg:
db> tr/a ffffa00003f17940
trace: pid 0 lid 182 at 0xffffa0004c8fd4f0
sleepq_block() at netbsd:sleepq_block+0x99
cv_wait() at netbsd:cv_wait+0xf0
fstrans_start() at netbsd:fstrans_start+0x78e
VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x42
genfs_getpages() at netbsd:genfs_getpages+0x1344
VOP_GETPAGES() at netbsd:VOP_GETPAGES+0x4b
ubc_fault() at netbsd:ubc_fault+0x188
uvm_fault_internal() at netbsd:uvm_fault_internal+0x6d4
trap() at netbsd:trap+0x3c1
--- trap (number 6) ---
kcopy() at netbsd:kcopy+0x15
uiomove() at netbsd:uiomove+0xb9
ubc_uiomove() at netbsd:ubc_uiomove+0xf7
ffs_read() at netbsd:ffs_read+0xf7
VOP_READ() at netbsd:VOP_READ+0x33
vn_rdwr() at netbsd:vn_rdwr+0x10c
vndthread() at netbsd:vndthread+0x4a7
>How-To-Repeat:
Destroy a domU with 2 file-backed disks? Or run multiple vnconfig -u
concurrently?
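A possible way to try the second variant is sketched below. It is untested against this bug and may or may not trigger it. It assumes the vnd devices are already configured on file-backed images; the device names vnd0 and vnd1 in the comment are placeholders, and the helper name is made up for this sketch. Since a hit would stall I/O on the whole machine, it should only be tried on a scratch host.

```python
import subprocess


def unconfigure_in_parallel(devices, cmd=("vnconfig", "-u")):
    """Start one `cmd <device>` process per device, then wait for all of them.

    All processes are started before any of them is waited for, so the
    unconfigure requests can overlap in the kernel.
    """
    procs = [subprocess.Popen([*cmd, dev]) for dev in devices]
    return [p.wait() for p in procs]


# Example call (placeholder device names; run as root on a test host only):
#   unconfigure_in_parallel(["vnd0", "vnd1"])
```

The `cmd` parameter exists only so the helper can be exercised with a harmless command before pointing it at vnconfig.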
>Fix:
Workaround: make sure Xen doesn't run more than one vnconfig -u at
once. But we need a real fix for this.
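One way the workaround could look is a wrapper that takes an exclusive lock around each vnconfig -u, so that concurrent callers queue up instead of overlapping. This is a sketch only, not what the Xen scripts do today: the lock file path and the function names are assumptions, and it only helps if every caller goes through the same wrapper.

```python
import fcntl
import subprocess
from contextlib import contextmanager

# Hypothetical lock file location; any path writable by the caller works.
LOCK_PATH = "/var/run/vnconfig-u.lock"


@contextmanager
def exclusive_lock(path):
    """Hold an exclusive flock(2) on `path` for the duration of the block."""
    with open(path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def unconfigure_serialized(dev, lock_path=LOCK_PATH, cmd=("vnconfig", "-u")):
    """Run `cmd <dev>` while holding the lock; return the exit status."""
    with exclusive_lock(lock_path):
        return subprocess.run([*cmd, dev], check=False).returncode
```

This only avoids the concurrent case; it does nothing about the underlying kernel problem.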