NetBSD-Bugs archive
kern/55402: amd64/9.99.68/9.99.68: xen/zfs - kernel: double fault trap, code=0
>Number: 55402
>Category: kern
>Synopsis: amd64/9.99.68/GENERIC: xen/zfs - kernel: double fault trap, code=0
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jun 20 10:55:00 +0000 2020
>Originator: Frank Kardel
>Release: NetBSD 9.99.68
>Organization:
>Environment:
System: NetBSD abstest2 9.99.68 NetBSD 9.99.68 (GENERIC) #2: Sat Jun 20 06:48:01 UTC 2020 kardel%dolomiti.hw.abs.acrys.com@localhost:/src/NetBSD/cur/src/obj.amd64/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
While testing ZFS on a PVH instance of GENERIC,
zpool scrub
runs into a double fault with a visibly long stack.
Subsequent reboots run into a similar stack trace at file system check time.
>How-To-Repeat:
abstest2# zpool create data0 mirror xbd1 xbd2
abstest2# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
data0 824G 96K 824G - 0% 0% 1.00x ONLINE -
abstest2# zfs list
NAME USED AVAIL REFER MOUNTPOINT
data0 72.5K 798G 23K /data0
abstest2# zfs set compression=lz4 data0
abstest2# zfs set dedup=on data0
abstest2# ll /fs/nvme1/data0/
total 24
drwxr-xr-x 3 root wheel 512 Feb 22 2019 BACKUP
drwxr-xr-x 7 root wheel 512 Sep 20 2018 CA
drwxr-xr-x 4 abs abs 512 Jan 4 09:15 abs
drwxr-xr-x 8 root abs 512 Sep 6 2018 poolarranger
drwxr-xr-x 8 root abs 512 Sep 11 2018 poolarranger-test
drwxr-xr-x 3 pgsql wheel 512 Nov 21 2019 postgres
abstest2# zfs create data0/BACKUP
abstest2# zfs set compression=off data0/BACKUP
abstest2# zfs create data0/CA
abstest2# zfs set copies=2 data0/CA
abstest2# zfs create data0/abs
abstest2# zfs create data0/poolarranger
abstest2# zfs create data0/poolarranger-test
abstest2# zfs create data0/postgres
abstest2# zfs list
NAME USED AVAIL REFER MOUNTPOINT
data0 397K 798G 25K /data0
data0/BACKUP 23K 798G 23K /data0/BACKUP
data0/CA 23K 798G 23K /data0/CA
data0/abs 23K 798G 23K /data0/abs
data0/poolarranger 23K 798G 23K /data0/poolarranger
data0/poolarranger-test 23K 798G 23K /data0/poolarranger-test
data0/postgres 23K 798G 23K /data0/postgres
abstest2# zpool scrub data0
abstest2# zpool status data0
pool: data0
state: ONLINE
scan: scrub repaired 0 in 0h0m with 0 errors on Sat Jun 20 07:34:29 2020
config:
NAME STATE READ WRITE CKSUM
data0 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
xbd1 ONLINE 0 0 0
xbd2 ONLINE 0 0 0
errors: No known data errors
abstest2# rsync -av /fs/nvme1/data0/ /data0/
sending incremental file list
./
[...]
sent 619,861,176,009 bytes received 274,561 bytes 79,586,756.19 bytes/sec
total size is 619,708,937,442 speedup is 1.00
abstest2# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
data0 824G 536G 288G - 11% 65% 1.07x ONLINE -
abstest2# zfs list
NAME USED AVAIL REFER MOUNTPOINT
data0 574G 261G 25K /data0
data0/BACKUP 338G 261G 338G /data0/BACKUP
data0/CA 26.5M 261G 26.5M /data0/CA
data0/abs 79.0G 261G 79.0G /data0/abs
data0/poolarranger 134G 261G 134G /data0/poolarranger
data0/poolarranger-test 22.1G 261G 22.1G /data0/poolarranger-test
data0/postgres 39.5K 261G 39.5K /data0/postgres
abstest2# zpool status
pool: data0
state: ONLINE
scan: scrub repaired 0 in 0h0m with 0 errors on Sat Jun 20 07:34:29 2020
config:
NAME STATE READ WRITE CKSUM
data0 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
xbd1 ONLINE 0 0 0
xbd2 ONLINE 0 0 0
errors: No known data errors
abstest2# zpool scrub data0
abstest2# zpool status
pool: data0
state: ONLINE
scan: scrub in progress since Sat Jun 20 09:49:33 2020
352M scanned out of 536G at 117M/s, 1h17m to go
0 repaired, 0.06% done
config:
NAME STATE READ WRITE CKSUM
data0 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
xbd1 ONLINE 0 0 0
xbd2 ONLINE 0 0 0
errors: No known data errors
abstest2#
[ 9089.3930078] fatal double fault in supervisor mode
[ 9089.3930078] trap type 13 code 0 rip 0xffffffff80234157 cs 0x8 rflags 0x10096 cr2 0xffffc8116e792fd8 ilevel 0 rsp 0xffffc8116e792fe8
[ 9089.3930078] curlwp 0xfffff94d42842940 pid 0.2010 lowest kstack 0xffffc8116e7912c0
kernel: double fault trap, code=0
Stopped in pid 0.2010 (system) at netbsd:do_hypervisor_callback+0x1c:
movq %rax,ffffffffffffffb0(%rbp)
do_hypervisor_callback() at netbsd:do_hypervisor_callback+0x1c
Xhandle_hypervisor_callback() at netbsd:Xhandle_hypervisor_callback+0x19
--- interrupt ---
vdev_queue_offset_compare() at zfs:vdev_queue_offset_compare+0x7
vdev_queue_io_to_issue() at zfs:vdev_queue_io_to_issue+0x714
vdev_queue_io() at zfs:vdev_queue_io+0xec
zio_vdev_io_start() at zfs:zio_vdev_io_start+0x151
zio_execute() at zfs:zio_execute+0xe3
zio_nowait() at zfs:zio_nowait+0x5c
vdev_mirror_io_start() at zfs:vdev_mirror_io_start+0x32f
zio_vdev_io_start() at zfs:zio_vdev_io_start+0x192
zio_execute() at zfs:zio_execute+0xe3
zio_nowait() at zfs:zio_nowait+0x5c
vdev_mirror_io_start() at zfs:vdev_mirror_io_start+0x157
zio_vdev_io_start() at zfs:zio_vdev_io_start+0x33f
zio_execute() at zfs:zio_execute+0xe3
zio_nowait() at zfs:zio_nowait+0x5c
zio_ddt_read_start() at zfs:zio_ddt_read_start+0x1a6
zio_execute() at zfs:zio_execute+0xe3
zio_nowait() at zfs:zio_nowait+0x5c
dsl_scan_scrub_cb() at zfs:dsl_scan_scrub_cb+0x4e9
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x2f1
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x46c
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x46c
dsl_scan_visitdnode() at zfs:dsl_scan_visitdnode+0x75
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x61e
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x46c
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x46c
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x46c
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x46c
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x46c
dsl_scan_visitdnode() at zfs:dsl_scan_visitdnode+0x75
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x7cb
dsl_scan_visitds() at zfs:dsl_scan_visitds+0xe3
dsl_scan_visit() at zfs:dsl_scan_visit+0x1ad
dsl_scan_sync() at zfs:dsl_scan_sync+0x276
spa_sync() at zfs:spa_sync+0x41c
txg_sync_thread() at zfs:txg_sync_thread+0x2d8
ds 23
es 23
fs 0
gs 0
rdi ffffc8116e793058
rsi fffff9739f3a9bf8
rbp ffffc8116e793048
rbx fffff94d427729f0
rdx 0
rcx fffff975917e1e78
rax ffffffff81a1d000
r8 260
r9 20000
r10 fffff94372cdd2d8
r11 fffff95c3761e2b0
r12 fffff975917e1c18
r13 248
r14 fffff9739f3a9e40
r15 fffff9739f3a9e40
rip ffffffff80234157 do_hypervisor_callback+0x1c
cs 8
rflags 10096
rsp ffffc8116e792fe8
ss 10
netbsd:do_hypervisor_callback+0x1c: movq %rax,ffffffffffffffb0(%rbp)
db{0}>
------------- REBOOT ------------
[...]
[ 1.0900620] boot device: dk0
[ 1.0900620] root on dk0
[ 1.1000647] root file system type: ffs
[ 1.1000647] kern.module.path=/stand/amd64/9.99.68/modules
Sat Jun 20 10:09:53 UTC 2020
Starting root file system check:
/dev/rdk0: 65945 files, 1167160 used, 3913894 free (17854 frags, 487005 blocks, 0.4% fragmentation)
/dev/rdk0: MARKING FILE SYSTEM CLEAN
[ 10.4500850] WARNING: ZFS on NetBSD is under development
[ 10.4600697] pool redzone disabled for 'zio_buf_4096'
[ 10.4600697] pool redzone disabled for 'zio_data_buf_4096'
[ 10.4600697] pool redzone disabled for 'zio_buf_8192'
[ 10.4600697] pool redzone disabled for 'zio_data_buf_8192'
[ 10.4600697] pool redzone disabled for 'zio_buf_16384'
[ 10.4600697] pool redzone disabled for 'zio_data_buf_16384'
[ 10.4600697] pool redzone disabled for 'zio_buf_32768'
[ 10.4600697] pool redzone disabled for 'zio_data_buf_32768'
[ 10.4600697] pool redzone disabled for 'zio_buf_65536'
[ 10.4600697] pool redzone disabled for 'zio_data_buf_65536'
[ 10.4600697] pool redzone disabled for 'zio_buf_131072'
[ 10.4600697] pool redzone disabled for 'zio_data_buf_131072'
[ 10.4600697] pool redzone disabled for 'zio_buf_262144'
[ 10.4600697] pool redzone disabled for 'zio_data_buf_262144'
[ 10.4600697] pool redzone disabled for 'zio_buf_524288'
[ 10.4600697] pool redzone disabled for 'zio_data_buf_524288'
[ 10.4600697] pool redzone disabled for 'zio_buf_1048576'
[ 10.4600697] pool redzone disabled for 'zio_data_buf_1048576'
[ 10.4600697] pool redzone disabled for 'zio_buf_2097152'
[ 10.4600697] pool redzone disabled for 'zio_data_buf_2097152'
[ 10.4600697] pool redzone disabled for 'zio_buf_4194304'
[ 10.4600697] pool redzone disabled for 'zio_data_buf_4194304'
[ 10.4600697] pool redzone disabled for 'zio_buf_8388608'
[ 10.4600697] pool redzone disabled for 'zio_data_buf_8388608'
[ 10.4600697] pool redzone disabled for 'zio_buf_16777216'
[ 10.4600697] pool redzone disabled for 'zio_data_buf_16777216'
[ 10.7200569] ZFS filesystem version: 5
Starting file system checks:
[ 15.9900617] fatal double fault in supervisor mode
[ 15.9900617] trap type 13 code 0 rip 0xffffffff80c932ae cs 0x8 rflags 0x10297 cr2 0xffffa3916dd12ff8 ilevel 0 rsp 0xffffa3916dd13000
[ 15.9900617] curlwp 0xffffcae5720061c0 pid 0.351 lowest kstack 0xffffa3916dd112c0
kernel: double fault trap, code=0
Stopped in pid 0.351 (system) at netbsd:mutex_vector_enter+0x8: pushq %r13
mutex_vector_enter() at netbsd:mutex_vector_enter+0x8
pool_get() at netbsd:pool_get+0x69
pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x12b
pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x23a
kmem_intr_alloc() at netbsd:kmem_intr_alloc+0x5b
kmem_intr_zalloc() at netbsd:kmem_intr_zalloc+0x11
kmem_zalloc() at netbsd:kmem_zalloc+0x4a
vdev_mirror_io_start() at zfs:vdev_mirror_io_start+0x64
zio_vdev_io_start() at zfs:zio_vdev_io_start+0x192
zio_execute() at zfs:zio_execute+0xe3
zio_nowait() at zfs:zio_nowait+0x5c
vdev_mirror_io_start() at zfs:vdev_mirror_io_start+0x157
zio_vdev_io_start() at zfs:zio_vdev_io_start+0x33f
zio_execute() at zfs:zio_execute+0xe3
zio_nowait() at zfs:zio_nowait+0x5c
arc_read() at zfs:arc_read+0x4ed
dbuf_read() at zfs:dbuf_read+0x1c3
dbuf_hold_impl() at zfs:dbuf_hold_impl+0x332
dbuf_hold_impl() at zfs:dbuf_hold_impl+0x2a4
dbuf_hold() at zfs:dbuf_hold+0x22
dmu_buf_hold_noread_by_dnode() at zfs:dmu_buf_hold_noread_by_dnode+0x39
dmu_buf_hold_by_dnode() at zfs:dmu_buf_hold_by_dnode+0x2a
zap_idx_to_blk() at zfs:zap_idx_to_blk+0x97
zap_deref_leaf() at zfs:zap_deref_leaf+0x6e
fzap_length() at zfs:fzap_length+0x2a
zap_length_uint64() at zfs:zap_length_uint64+0x7b
ddt_zap_lookup() at zfs:ddt_zap_lookup+0x33
ddt_class_contains() at zfs:ddt_class_contains+0x7c
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x2b4
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x46c
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x46c
dsl_scan_visitdnode() at zfs:dsl_scan_visitdnode+0x75
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x61e
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x46c
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x46c
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x46c
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x46c
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x46c
dsl_scan_visitdnode() at zfs:dsl_scan_visitdnode+0x75
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x7cb
dsl_scan_visitds() at zfs:dsl_scan_visitds+0xe3
dsl_scan_visit() at zfs:dsl_scan_visit+0x1ad
dsl_scan_sync() at zfs:dsl_scan_sync+0x276
spa_sync() at zfs:spa_sync+0x41c
txg_sync_thread() at zfs:txg_sync_thread+0x2d8
ds 23
es 23
fs 0
gs 0
rdi ffffcb10dd8a34f0
rsi 1
rbp ffffa3916dd13010
rbx ffffcb10dd8a3440
rdx b
rcx ffffcae5720061c0
rax 601
r8 ffffcb10dd8a3440
r9 1
r10 ffffcae5818bd900
r11 ffffcae5719ff078
r12 1
r13 ffffcb10dd8a34f0
r14 ffffa3916dd13118
r15 ffffcb10dd8a3e80
rip ffffffff80c932ae mutex_vector_enter+0x8
cs 8
rflags 10297
rsp ffffa3916dd13000
ss 10
netbsd:mutex_vector_enter+0x8: pushq %r13
db{0}>
>Fix:
?
Could it be a stack size issue? The stack trace seems to resemble
the pattern of a tree walk.
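
One way to probe the stack-size question using only data already in this
report is sketched below in Python. It is an illustration, not a diagnosis:
it tallies how often each function appears in a ddb backtrace excerpt, and
checks how cr2 relates to rsp in the second trap. The 4 KiB page size and the
helper names (count_frames, page_base) are assumptions made for this sketch;
the addresses and trace lines are copied from the output above.

```python
import re
from collections import Counter

PAGE_SIZE = 4096  # assumption: 4 KiB pages

# Excerpt copied from the first backtrace in this report (not the full trace).
TRACE = """\
dsl_scan_scrub_cb() at zfs:dsl_scan_scrub_cb+0x4e9
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x2f1
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x46c
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x46c
dsl_scan_visitdnode() at zfs:dsl_scan_visitdnode+0x75
dsl_scan_visitbp() at zfs:dsl_scan_visitbp+0x61e
"""

# Matches ddb frame lines of the form "name() at module:name+0xNN".
FRAME_RE = re.compile(r"^(\w+)\(\) at \S+$")


def count_frames(trace):
    """Count how many times each function name appears as a frame."""
    counts = Counter()
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


def page_base(addr):
    """Round an address down to the start of its page."""
    return addr & ~(PAGE_SIZE - 1)


# Values copied from the second trap message (mutex_vector_enter+0x8).
rsp = 0xffffa3916dd13000
cr2 = 0xffffa3916dd12ff8
lowest_kstack = 0xffffa3916dd112c0

# pushq stores 8 bytes immediately below the current stack pointer.
push_target = rsp - 8

counts = count_frames(TRACE)
for name, n in counts.most_common():
    print(f"{n:3d}  {name}")

print("push target equals cr2:", push_target == cr2)
print("page of cr2:", hex(page_base(cr2)))
print("page of rsp:", hex(page_base(rsp)))
print("cr2 - lowest kstack (bytes):", cr2 - lowest_kstack)
```

If the push target equals cr2 and lies in the last 8 bytes of a page, that is
consistent with the stack pointer having just crossed a page boundary when the
fault hit. It does not by itself show that the page below is an unmapped guard
page; that part remains a guess.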
>Unformatted: