NetBSD-Bugs archive
kern/57421: VND deadlock on NetBSD 10.0_BETA (Xen Dom0 on Xen 4.15)
>Number: 57421
>Category: kern
>Synopsis: VND deadlock on NetBSD 10.0_BETA (Xen Dom0 on Xen 4.15)
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri May 19 10:40:00 +0000 2023
>Originator: Matthias Petermann
>Release: NetBSD-10.0_BETA (build from sources 2023-05-08)
>Organization:
>Environment:
NetBSD vhost2.lan 10.0_BETA NetBSD 10.0_BETA (XEN3_DOM0) #0: Mon May 8 07:40:40 UTC 2023 root@ws.local:/build/netbsd-10/obj/sys/arch/amd64/compile/XEN3_DOM0 amd64
>Description:
While untarring files to a filesystem located on a VND, the system
suddenly freezes. This PR is the result of a discussion on ports-xen
and in #netbsd on IRC, where mlelstv helped me narrow this down to the
root cause.
This is probably related less to Xen itself than to the fact that a
Dom0 is typically operated with limited RAM.
>How-To-Repeat:
The problem is likely to reproduce in a Dom0 with 512MB RAM. I suspect Xen doesn't contribute much here, so it may also be reproducible on a native system or in another VM with similarly small RAM.
The following partition layout exists on my test machines:
[ ESP | Root FFSv2 | Swap | Data FFSv2 ]
   |        |         |         |
  dk0      dk1       dk2       dk3
The tar files and the vnd backing files are both located on the Data partition.
1) Create the image file as a sparse file:
$ bytes=$(echo "(16*1024*1024*1024)-1"|bc)
$ doas dd if=/dev/zero of=/data/vhd/net.img bs=1 count=1 seek=$bytes
2) Configure and mount the image:
$ doas vndconfig vnd0 /data/vhd/net.img
$ doas newfs -O2 -I /dev/vnd0
$ doas mount -o log /dev/vnd0 /mnt/
3) Extract the base system components:
$ cd /mnt/
$ sets="kern-GENERIC.tar.xz base.tar.xz comp.tar.xz etc.tar.xz man.tar.xz misc.tar.xz modules.tar.xz rescue.tar.xz test.tar.xz text.tar.xz"
$ setsdir=/data/install/NetBSD-10.0_BETA/amd64/binary/sets
$ for set in $sets;do doas tar xvfz $setsdir/$set;done
During the last command (tar xvfz), after it appears to work for a while,
it suddenly freezes. In a second terminal, interaction and inspection
with top are still possible. In this state the CPU is 100% idle, and RAM
does not seem to be an obvious problem (no swap used and some MB still
free). However, interaction is limited, and a "sync" locks up the session
immediately. A reboot is also not possible.
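As a side note on step 1, the sparse-file technique can be sanity-checked at a smaller scale before creating the real image. The following is only a sketch and was not part of the original test: the 16 MiB size and the mktemp scratch file are assumptions chosen for illustration, standing in for the 16 GiB image at /data/vhd/net.img.

```shell
# Scaled-down check of the sparse-file creation used in step 1.
# Assumption: 16 MiB instead of 16 GiB, and a scratch file from mktemp
# instead of /data/vhd/net.img.
size=$((16 * 1024 * 1024))      # desired apparent file size in bytes
img=$(mktemp)                   # hypothetical scratch file

# Writing one byte at offset size-1 extends the file to exactly $size bytes.
dd if=/dev/zero of="$img" bs=1 count=1 seek=$((size - 1)) 2>/dev/null

# Apparent size in bytes (tr strips the padding some wc versions print).
actual=$(wc -c < "$img" | tr -d ' ')
echo "apparent size: $actual bytes"

# Space actually allocated, in KiB; on filesystems that support holes
# this is usually much smaller than the apparent size.
du -k "$img"

rm -f "$img"
```

The same pattern with the 16 GiB value gives an image whose blocks are only allocated as the filesystem on the vnd is written to, which is the write path exercised by the tar extraction.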
I was able to reproduce this problem on two separate systems. On one
of these I was able to capture the backtrace of the vnd0 thread while
the freeze situation was ongoing:
login: ++++[ 136.6327721] fatal breakpoint trap in supervisor mode
[ 136.6327721] trap type 1 code 0 rip 0xffffffff8024196d cs 0xe030 rflags 0x202 cr2 0x7971a5cff000 ilevel 0x6 rsp 0xffffb5804103fbf8
[ 136.6327721] curlwp 0xffffb58000ff8040 pid 0.2 lowest kstack 0xffffb5804103b2c0
Stopped in pid 0.2 (system) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
xencons_tty_input() at netbsd:xencons_tty_input+0xb2
xencons_intr() at netbsd:xencons_intr+0x50
xen_intr_biglock_wrapper() at netbsd:xen_intr_biglock_wrapper+0x1b
evtchn_do_event() at netbsd:evtchn_do_event+0x118
do_hypervisor_callback() at netbsd:do_hypervisor_callback+0x167
Xhandle_hypervisor_callback() at netbsd:Xhandle_hypervisor_callback+0x1a
--- interrupt ---
hypercall_page() at netbsd:hypercall_page+0x3aa
idle_loop() at netbsd:idle_loop+0x146
ds 0
es 7780
fs cb79
gs fbf8
rdi ffffb580012c6080
rsi ffffffff8128fe70 rbuf.0
rbp ffffb5804103fbf8
rbx ffffffff8128fe70 rbuf.0
rdx 2b
rcx 2b
rax 1
r8 349a
r9 0
r10 0
r11 246
r12 ffffb580012c7780
r13 ffffb580012c6080
r14 ffffffff8128fe71 rbuf.0+0x1
r15 ffffb580012c5200
rip ffffffff8024196d breakpoint+0x5
cs e030
rflags 202
rsp ffffb5804103fbf8
ss e02b
netbsd:breakpoint+0x5: leave
----------------------------------------------------------------------
db{0}> bt
breakpoint() at netbsd:breakpoint+0x5
xencons_tty_input() at netbsd:xencons_tty_input+0xb2
xencons_intr() at netbsd:xencons_intr+0x50
xen_intr_biglock_wrapper() at netbsd:xen_intr_biglock_wrapper+0x1b
evtchn_do_event() at netbsd:evtchn_do_event+0x118
do_hypervisor_callback() at netbsd:do_hypervisor_callback+0x167
Xhandle_hypervisor_callback() at netbsd:Xhandle_hypervisor_callback+0x1a
--- interrupt ---
hypercall_page() at netbsd:hypercall_page+0x3aa
idle_loop() at netbsd:idle_loop+0x146
db{0}> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
598 598 3 0 0 ffffb58002d85780 tar vndpc
580 580 3 0 180 ffffb58002d85340 top select
1718 1718 3 0 180 ffffb58002d2eb80 ksh pause
443 443 3 0 180 ffffb580028f7b40 ksh pause
1529 1529 3 0 180 ffffb58002d2e300 tmux kqueue
2168 2168 3 0 180 ffffb58001628540 tmux kqueue
436 436 3 0 180 ffffb58001858a40 ksh pause
431 431 3 0 180 ffffb580016c59c0 sshd poll
433 433 3 0 180 ffffb580016c5140 sshd poll
425 425 3 0 180 ffffb58001ef2a80 ntpd netio
428 428 3 0 180 ffffb580023866c0 getty ttyraw
427 427 3 0 180 ffffb58002386280 getty ttyraw
424 424 3 0 180 ffffb58001628980 getty ttyraw
1881 1881 3 0 180 ffffb580013fc240 getty ttyraw
1249 1249 3 0 180 ffffb580028f7700 cron nanoslp
467 467 3 0 180 ffffb58001ef2200 inetd kqueue
460 460 3 0 180 ffffb58002386b00 qmgr kqueue
459 459 3 0 180 ffffb58001ef2640 pickup kqueue
457 457 3 0 180 ffffb580028f72c0 master kqueue
173 173 3 0 180 ffffb58001f19ac0 sshd poll
1567 1567 3 0 180 ffffb58001f19240 powerd kqueue
348 348 3 0 180 ffffb58001f19680 ntpd pause
193 195 3 0 180 ffffb580017385c0 xenconsoled netio
193 193 3 0 180 ffffb580016c5580 xenconsoled poll
1720 1720 3 0 180 ffffb58001858600 xenstored poll
1081 1081 3 0 180 ffffb58001738a00 syslogd kqueue
1 1 3 0 180 ffffb580015288c0 init wait
0 306 3 0 200 ffffb580012c9180 vnd0 uvnfp1
0 1824 3 0 200 ffffb580018581c0 xen_balloon xen_balloon
0 801 3 0 200 ffffb58001738180 bridge_rtage bridge_rtage
0 156 3 0 200 ffffb580016270c0 physiod physiod
0 124 3 0 200 ffffb58001628100 pooldrain pooldrain
0 123 3 0 240 ffffb58001627940 ioflush tstile
0 122 3 0 200 ffffb58001627500 pgdaemon vndpc
0 119 3 0 200 ffffb58001539900 atapibus0 sccomp
0 116 3 0 200 ffffb580013f2a80 usb3 usbevt
0 115 3 0 200 ffffb580015394c0 usb2 usbevt
0 114 3 0 200 ffffb58001539080 npfgc0 npfgcw
0 113 3 0 200 ffffb58001528480 rt_free rt_free
0 112 3 0 200 ffffb58001528040 unpgc unpgc
0 111 3 0 200 ffffb58001501bc0 key_timehandler key_timehandler
0 110 3 0 200 ffffb58001501780 icmp6_wqinput/0 icmp6_wqinput
0 109 3 0 200 ffffb580013fc680 nd6_timer nd6_timer
0 108 3 0 200 ffffb580013fcac0 carp6_wqinput/0 carp6_wqinput
0 107 3 0 200 ffffb580013fd280 carp_wqinput/0 carp_wqinput
0 106 3 0 200 ffffb580013fd6c0 icmp_wqinput/0 icmp_wqinput
0 105 3 0 200 ffffb580013fdb00 rt_timer rt_timer
0 104 3 0 200 ffffb580014002c0 vmem_rehash vmem_rehash
0 103 3 0 200 ffffb58001501340 usb1 usbevt
0 102 3 0 200 ffffb58001403b80 usb0 usbevt
0 101 3 0 200 ffffb58001403740 xenbus xsio
0 100 3 0 200 ffffb58001403300 xenwatch evtsq
0 99 3 0 200 ffffb58001400b40 acpitz1 acpitz1
0 98 3 0 200 ffffb58001400700 acpitz0 acpitz0
0 24 3 0 200 ffffb580013f2640 entbutler entropy
0 23 3 0 240 ffffb580013f2200 atabus1 atath
0 22 3 0 240 ffffb580013cca40 atabus0 atath
0 21 3 0 200 ffffb580013cc600 wm0Reset wm0Reset
0 20 3 0 200 ffffb580013cc1c0 wm0TxRx/0 wm0TxRx
0 19 3 0 200 ffffb580012c9a00 usbtask-dr usbtsk
0 18 3 0 200 ffffb580012c95c0 usbtask-hc usbtsk
0 16 3 0 200 ffffb580010119c0 sysmon smtaskq
0 15 3 0 200 ffffb58001011580 pmfsuspend pmfsuspend
0 14 3 0 200 ffffb58001011140 pmfevent pmfevent
0 13 3 0 200 ffffb5800100e980 sopendfree sopendfr
0 12 3 0 200 ffffb5800100e540 ifwdog ifwdog
0 11 3 0 200 ffffb5800100e100 iflnkst iflnkst
0 10 3 0 200 ffffb58001003940 nfssilly nfssilly
0 9 3 0 200 ffffb58001003500 vdrain vdrain
0 8 3 0 200 ffffb580010030c0 modunload mod_unld
0 7 3 0 200 ffffb58000ffb900 xcall/0 xcall
0 6 1 0 200 ffffb58000ffb4c0 softser/0
0 5 1 0 200 ffffb58000ffb080 softclk/0
0 4 1 0 200 ffffb58000ff88c0 softbio/0
0 3 1 0 200 ffffb58000ff8480 softnet/0
0 > 2 1 0 201 ffffb58000ff8040 idle/0
0 0 3 0 200 ffffffff81140480 swapper uvm
----------------------------------------------------------------------
db{0}> bt ffffb580012c9180
sleepq_locks() at ffffffff81167300
----------------------------------------------------------------------
db{0}> bt/a ffffb580012c9180
trace: pid 0 lid 306 at 0xffffb5805201e7b0
sleepq_block() at netbsd:sleepq_block+0x13a
mtsleep() at netbsd:mtsleep+0x17f
uvn_findpage() at netbsd:uvn_findpage+0x20a
uvn_findpages() at netbsd:uvn_findpages+0xdd
genfs_getpages() at netbsd:genfs_getpages+0x6a7
VOP_GETPAGES() at netbsd:VOP_GETPAGES+0x52
ufs_balloc_range() at netbsd:ufs_balloc_range+0x114
ffs_write() at netbsd:ffs_write+0x346
VOP_WRITE() at netbsd:VOP_WRITE+0xf3
vn_rdwr() at netbsd:vn_rdwr+0xc9
vndthread() at netbsd:vndthread+0x6d8
db{0}>
>Fix:
mlelstv provided a patch for vnd.c which I am going to test now. I will respond with results as soon as possible.