NetBSD-Bugs archive

kern/57421: VND deadlock on NetBSD 10.0_BETA (Xen Dom0 on Xen 4.15)



>Number:         57421
>Category:       kern
>Synopsis:       VND deadlock on NetBSD 10.0_BETA (Xen Dom0 on Xen 4.15)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri May 19 10:40:00 +0000 2023
>Originator:     Matthias Petermann
>Release:        NetBSD-10.0_BETA (build from sources 2023-05-08)
>Organization:
>Environment:
NetBSD vhost2.lan 10.0_BETA NetBSD 10.0_BETA (XEN3_DOM0) #0: Mon May  8 07:40:40 UTC 2023  root@ws.local:/build/netbsd-10/obj/sys/arch/amd64/compile/XEN3_DOM0 amd64

>Description:
While extracting tar archives onto a filesystem located on a vnd(4)
device, the system suddenly freezes. This PR is the result of a
discussion on ports-xen and in #netbsd on IRC, where mlelstv helped me
narrow this down to the root cause.

This is probably related less to Xen itself than to the fact that a
Dom0 is typically operated with limited RAM.
>How-To-Repeat:
The chances of reproducing this are good in a Dom0 with 512MB RAM. I suspect Xen doesn't contribute much here, so it may also be reproducible on a native system or in another VM with similarly little RAM.

The following partition layout exists on my test machines:

[ ESP | Root FFSv2 | Swap | Data FFSv2 ]
   |        |          |        |
  dk0      dk1        dk2      dk3     

...while the tar files are located on the Data partition. The vnd backing files are also located on the Data partition.

1) Create the image file as a sparse file:

  $ bytes=$(echo "(16*1024*1024*1024)-1"|bc)
  $ doas dd if=/dev/zero of=/data/vhd/net.img bs=1 count=1 seek=$bytes

2) Configure and mount the image:

  $ doas vndconfig vnd0 /data/vhd/net.img
  $ doas newfs -O2 -I /dev/vnd0
  $ doas mount -o log /dev/vnd0 /mnt/

3) Extract the base system components:

  $ cd /mnt/
  $ sets="kern-GENERIC.tar.xz base.tar.xz comp.tar.xz etc.tar.xz man.tar.xz misc.tar.xz modules.tar.xz rescue.tar.xz test.tar.xz text.tar.xz"
  $ setsdir=/data/install/NetBSD-10.0_BETA/amd64/binary/sets
  $ for set in $sets;do doas tar xvfz $setsdir/$set;done 
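
Step 1 above relies on dd writing a single byte at offset
(16*1024*1024*1024)-1 = 17179869183, which extends the file to exactly
17179869184 bytes (16 GiB) without allocating the blocks in between. The
following is a minimal sketch of the same trick scaled down to 16 MiB, so
that the apparent size can be compared with the space actually allocated.
The temporary directory and file name are illustrative assumptions and
are not part of the original reproduction.

```shell
#!/bin/sh
# Demo of the seek trick from step 1, scaled down to 16 MiB.
# The temporary path below is an assumption made for this sketch.
tmp=$(mktemp -d)
img="$tmp/sparse.img"

size=$((16 * 1024 * 1024))   # target apparent size in bytes
last=$((size - 1))           # offset of the single byte dd will write

# Writing one byte at offset size-1 extends the file to exactly $size bytes.
dd if=/dev/zero of="$img" bs=1 count=1 seek="$last" 2>/dev/null

# Logical length of the file (strip the padding some wc versions add).
apparent=$(wc -c < "$img" | tr -d ' ')
# Space actually allocated on disk, in KiB.
allocated_kib=$(du -k "$img" | awk '{ print $1 }')

echo "apparent bytes: $apparent"
echo "allocated KiB:  $allocated_kib"

rm -rf "$tmp"
```

On a filesystem that supports sparse files the allocated size should be
far smaller than the apparent size; the exact number depends on the
filesystem and its block size.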

During the last command (tar xvfz), after appearing to work for a while,
the system suddenly freezes. Interaction in a second terminal, including
inspection with top, is still possible. In this state the CPU is 100%
idle, and RAM does not appear to be an obvious problem (no swap used and
some MB still free). Interaction is limited, however: a "sync" hangs the
session immediately, and a reboot is not possible either.

I was able to reproduce this problem on two separate systems. On one
of these I was able to capture the backtrace of the vnd0 thread while
the freeze situation was ongoing:

login: ++++[ 136.6327721] fatal breakpoint trap in supervisor mode
[ 136.6327721] trap type 1 code 0 rip 0xffffffff8024196d cs 0xe030 rflags 0x202 cr2 0x7971a5cff000 ilevel 0x6 rsp 0xffffb5804103fbf8
[ 136.6327721] curlwp 0xffffb58000ff8040 pid 0.2 lowest kstack 0xffffb5804103b2c0
Stopped in pid 0.2 (system) at  netbsd:breakpoint+0x5:  leave
breakpoint() at netbsd:breakpoint+0x5
xencons_tty_input() at netbsd:xencons_tty_input+0xb2
xencons_intr() at netbsd:xencons_intr+0x50
xen_intr_biglock_wrapper() at netbsd:xen_intr_biglock_wrapper+0x1b
evtchn_do_event() at netbsd:evtchn_do_event+0x118
do_hypervisor_callback() at netbsd:do_hypervisor_callback+0x167
Xhandle_hypervisor_callback() at netbsd:Xhandle_hypervisor_callback+0x1a
--- interrupt ---
hypercall_page() at netbsd:hypercall_page+0x3aa
idle_loop() at netbsd:idle_loop+0x146
ds          0
es          7780
fs          cb79
gs          fbf8
rdi         ffffb580012c6080
rsi         ffffffff8128fe70    rbuf.0
rbp         ffffb5804103fbf8
rbx         ffffffff8128fe70    rbuf.0
rdx         2b
rcx         2b
rax         1
r8          349a
r9          0
r10         0
r11         246
r12         ffffb580012c7780
r13         ffffb580012c6080
r14         ffffffff8128fe71    rbuf.0+0x1
r15         ffffb580012c5200
rip         ffffffff8024196d    breakpoint+0x5
cs          e030
rflags      202
rsp         ffffb5804103fbf8
ss          e02b
netbsd:breakpoint+0x5:  leave
----------------------------------------------------------------------
db{0}> bt
breakpoint() at netbsd:breakpoint+0x5
xencons_tty_input() at netbsd:xencons_tty_input+0xb2
xencons_intr() at netbsd:xencons_intr+0x50
xen_intr_biglock_wrapper() at netbsd:xen_intr_biglock_wrapper+0x1b
evtchn_do_event() at netbsd:evtchn_do_event+0x118
do_hypervisor_callback() at netbsd:do_hypervisor_callback+0x167
Xhandle_hypervisor_callback() at netbsd:Xhandle_hypervisor_callback+0x1a
--- interrupt ---
hypercall_page() at netbsd:hypercall_page+0x3aa
idle_loop() at netbsd:idle_loop+0x146
db{0}> ps
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
598    598 3   0         0   ffffb58002d85780                tar vndpc
580    580 3   0       180   ffffb58002d85340                top select
1718  1718 3   0       180   ffffb58002d2eb80                ksh pause
443    443 3   0       180   ffffb580028f7b40                ksh pause
1529  1529 3   0       180   ffffb58002d2e300               tmux kqueue
2168  2168 3   0       180   ffffb58001628540               tmux kqueue
436    436 3   0       180   ffffb58001858a40                ksh pause
431    431 3   0       180   ffffb580016c59c0               sshd poll
433    433 3   0       180   ffffb580016c5140               sshd poll
425    425 3   0       180   ffffb58001ef2a80               ntpd netio
428    428 3   0       180   ffffb580023866c0              getty ttyraw
427    427 3   0       180   ffffb58002386280              getty ttyraw
424    424 3   0       180   ffffb58001628980              getty ttyraw
1881  1881 3   0       180   ffffb580013fc240              getty ttyraw
1249  1249 3   0       180   ffffb580028f7700               cron nanoslp
467    467 3   0       180   ffffb58001ef2200              inetd kqueue
460    460 3   0       180   ffffb58002386b00               qmgr kqueue
459    459 3   0       180   ffffb58001ef2640             pickup kqueue
457    457 3   0       180   ffffb580028f72c0             master kqueue
173    173 3   0       180   ffffb58001f19ac0               sshd poll
1567  1567 3   0       180   ffffb58001f19240             powerd kqueue
348    348 3   0       180   ffffb58001f19680               ntpd pause
193    195 3   0       180   ffffb580017385c0        xenconsoled netio
193    193 3   0       180   ffffb580016c5580        xenconsoled poll
1720  1720 3   0       180   ffffb58001858600          xenstored poll
1081  1081 3   0       180   ffffb58001738a00            syslogd kqueue
1        1 3   0       180   ffffb580015288c0               init wait
0      306 3   0       200   ffffb580012c9180               vnd0 uvnfp1
0     1824 3   0       200   ffffb580018581c0        xen_balloon xen_balloon
0      801 3   0       200   ffffb58001738180       bridge_rtage bridge_rtage
0      156 3   0       200   ffffb580016270c0            physiod physiod
0      124 3   0       200   ffffb58001628100          pooldrain pooldrain
0      123 3   0       240   ffffb58001627940            ioflush tstile
0      122 3   0       200   ffffb58001627500           pgdaemon vndpc
0      119 3   0       200   ffffb58001539900          atapibus0 sccomp
0      116 3   0       200   ffffb580013f2a80               usb3 usbevt
0      115 3   0       200   ffffb580015394c0               usb2 usbevt
0      114 3   0       200   ffffb58001539080             npfgc0 npfgcw
0      113 3   0       200   ffffb58001528480            rt_free rt_free
0      112 3   0       200   ffffb58001528040              unpgc unpgc
0      111 3   0       200   ffffb58001501bc0    key_timehandler key_timehandler
0      110 3   0       200   ffffb58001501780    icmp6_wqinput/0 icmp6_wqinput
0      109 3   0       200   ffffb580013fc680          nd6_timer nd6_timer
0      108 3   0       200   ffffb580013fcac0    carp6_wqinput/0 carp6_wqinput
0      107 3   0       200   ffffb580013fd280     carp_wqinput/0 carp_wqinput
0      106 3   0       200   ffffb580013fd6c0     icmp_wqinput/0 icmp_wqinput
0      105 3   0       200   ffffb580013fdb00           rt_timer rt_timer
0      104 3   0       200   ffffb580014002c0        vmem_rehash vmem_rehash
0      103 3   0       200   ffffb58001501340               usb1 usbevt
0      102 3   0       200   ffffb58001403b80               usb0 usbevt
0      101 3   0       200   ffffb58001403740             xenbus xsio
0      100 3   0       200   ffffb58001403300           xenwatch evtsq
0       99 3   0       200   ffffb58001400b40            acpitz1 acpitz1
0       98 3   0       200   ffffb58001400700            acpitz0 acpitz0
0       24 3   0       200   ffffb580013f2640          entbutler entropy
0       23 3   0       240   ffffb580013f2200            atabus1 atath
0       22 3   0       240   ffffb580013cca40            atabus0 atath
0       21 3   0       200   ffffb580013cc600           wm0Reset wm0Reset
0       20 3   0       200   ffffb580013cc1c0          wm0TxRx/0 wm0TxRx
0       19 3   0       200   ffffb580012c9a00         usbtask-dr usbtsk
0       18 3   0       200   ffffb580012c95c0         usbtask-hc usbtsk
0       16 3   0       200   ffffb580010119c0             sysmon smtaskq
0       15 3   0       200   ffffb58001011580         pmfsuspend pmfsuspend
0       14 3   0       200   ffffb58001011140           pmfevent pmfevent
0       13 3   0       200   ffffb5800100e980         sopendfree sopendfr
0       12 3   0       200   ffffb5800100e540             ifwdog ifwdog
0       11 3   0       200   ffffb5800100e100            iflnkst iflnkst
0       10 3   0       200   ffffb58001003940           nfssilly nfssilly
0        9 3   0       200   ffffb58001003500             vdrain vdrain
0        8 3   0       200   ffffb580010030c0          modunload mod_unld
0        7 3   0       200   ffffb58000ffb900            xcall/0 xcall
0        6 1   0       200   ffffb58000ffb4c0          softser/0
0        5 1   0       200   ffffb58000ffb080          softclk/0
0        4 1   0       200   ffffb58000ff88c0          softbio/0
0        3 1   0       200   ffffb58000ff8480          softnet/0
0    >   2 1   0       201   ffffb58000ff8040             idle/0
0        0 3   0       200   ffffffff81140480            swapper uvm
----------------------------------------------------------------------
db{0}> bt ffffb580012c9180
sleepq_locks() at ffffffff81167300
----------------------------------------------------------------------
db{0}> bt/a ffffb580012c9180
trace: pid 0 lid 306 at 0xffffb5805201e7b0
sleepq_block() at netbsd:sleepq_block+0x13a
mtsleep() at netbsd:mtsleep+0x17f
uvn_findpage() at netbsd:uvn_findpage+0x20a
uvn_findpages() at netbsd:uvn_findpages+0xdd
genfs_getpages() at netbsd:genfs_getpages+0x6a7
VOP_GETPAGES() at netbsd:VOP_GETPAGES+0x52
ufs_balloc_range() at netbsd:ufs_balloc_range+0x114
ffs_write() at netbsd:ffs_write+0x346
VOP_WRITE() at netbsd:VOP_WRITE+0xf3
vn_rdwr() at netbsd:vn_rdwr+0xc9
vndthread() at netbsd:vndthread+0x6d8
db{0}> 
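
The ps listing above is long, and only a few rows concern the vnd
device: tar and pgdaemon both sleep on "vndpc", the vnd0 kernel thread
sleeps on "uvnfp1" (inside uvn_findpage(), per the backtrace), and
ioflush sleeps on "tstile". The following is a minimal sketch for
pulling such rows out of a saved ddb ps listing. The sample rows are
copied from the output above; the choice of wait channels to filter on
is an assumption about which threads are relevant, not a confirmed
diagnosis.

```shell
#!/bin/sh
# A small subset of rows copied from the ddb "ps" output in this report.
ddb_ps='598    598 3   0         0   ffffb58002d85780                tar vndpc
580    580 3   0       180   ffffb58002d85340                top select
0      306 3   0       200   ffffb580012c9180               vnd0 uvnfp1
0      123 3   0       240   ffffb58001627940            ioflush tstile
0      122 3   0       200   ffffb58001627500           pgdaemon vndpc
0        9 3   0       200   ffffb58001003500             vdrain vdrain'

# Columns: PID LID S CPU FLAGS LWP NAME WAIT.
# Keep rows with all 8 columns whose WAIT channel is one of the
# (assumed) channels of interest, and print NAME, LID and WAIT.
blocked=$(printf '%s\n' "$ddb_ps" |
    awk 'NF == 8 && $8 ~ /^(vndpc|uvnfp1|tstile)$/ { print $7, $2, $8 }')

echo "$blocked"
```

The LID printed in the second column identifies the thread; the
backtrace above was taken with "bt/a" on the LWP address of the vnd0
row.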
>Fix:
mlelstv provided a patch for vnd.c, which I am going to test now. I will report the results as soon as possible.


