Re: dump -X of large LVM based FFSv2 with WAPBL panics

2017-11-15 12:56 GMT+01:00 Matthias Petermann <matthias%petermann-it.de@localhost>:

Hello,

On my system I have observed a kernel panic when dumping an FFSv2 filesystem under certain conditions. I searched the web and found some references to the leading symptom

"ffs_newvnode: ino=113 on /p: gen 55fd2f1f/55fd2f1f has non zero blocks ffffffffffffff00 or size 0"

but all of them were marked as solved back in 2016. So I wanted to share my observations here, in the hope that somebody can give me some pointers on how to narrow the issue down further.

1) Given:

- NetBSD 8.0_BETA (Kernel built from branches/netbsd-8 around 2017-11-06)

NetBSD nuc.local 8.0_BETA NetBSD 8.0_BETA (XEN3_DOM0_XHCI) #0: Mon Nov 6 14:31:17 CET 2017 admin@nuc.local:/s/src/sys/arch/amd64/compile/XEN3_DOM0_XHCI amd64

- A large (392 GB) LVM volume hosting a FFSv2 filesystem with WAPBL enabled
(/dev/mapper/vg0-photo mounted at /p)

- (An external USB 3.0 Drive)

2) What I tried:

- make a dump of the aforementioned filesystem, using snapshots

# dump -X -0auf /mnt/photo.0.dump /p

3) What happens then:

- the system crashes, leaving a core dump with the following indication:

ffs_newvnode: ino=113 on /p: gen 55fd2f1f/55fd2f1f has non zero blocks ffffffffffffff00 or size 0
fatal page fault in supervisor mode
trap type 6 code 0x2 rip 0xffffffff8022c0cc cs 0x8 rflags 0x10246 cr2 0xfffffe82deaddf1d ilevel 0x3 rsp 0xfffffe810e6b1eb8
curlwp 0xfffffe827f736000 pid 0.4 lowest kstack 0xfffffe810e6ae2c0
panic: trap
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x140
snprintf() at netbsd:snprintf
trap() at netbsd:trap+0xc6b
--- trap (number 6) ---
mutex_enter() at netbsd:mutex_enter+0xc
biodone2() at netbsd:biodone2+0x9b
biodone2() at netbsd:biodone2+0x9b
biointr() at netbsd:biointr+0x3a
softint_dispatch() at netbsd:softint_dispatch+0xd3
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e6b1ff0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
0:
cpu0: End traceback...

dumping to dev 0,1 (offset=168119, size=2076255):
dump
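For readers who want to interpret the "code 0x2" field in the trap line above, here is a minimal sketch that decodes it, assuming the standard x86 page-fault error-code bit layout. The helper name decode_pf_code is made up for illustration and is not part of any NetBSD tool.

```python
# Decode an x86 page-fault error code (the "code" field of the trap line).
# Bit meanings follow the standard x86 layout; decode_pf_code is a
# hypothetical helper name used only for this illustration.

PF_BITS = [
    # (mask, meaning if set, meaning if clear or None)
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]

def decode_pf_code(code):
    """Return a list of human-readable properties for a page-fault code."""
    parts = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            parts.append(if_set)
        elif if_clear is not None:
            parts.append(if_clear)
    return parts

print(decode_pf_code(0x2))
```

For 0x2 this yields a write access to a non-present page in supervisor mode, which matches the "fatal page fault in supervisor mode" message; the faulting address itself is the cr2 value.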

- gdb backtrace shows:

(gdb) target kvm netbsd.3.core
0xffffffff80229545 in cpu_reboot ()
(gdb) bt
#0 0xffffffff80229545 in cpu_reboot ()
#1 0xffffffff809a4afc in vpanic ()
#2 0xffffffff809a4bb0 in panic ()
#3 0xffffffff8022b176 in trap ()
#4 0xffffffff8020113e in alltraps ()
#5 0xffffffff8022c0cc in mutex_enter ()
#6 0xffffffff80a029f5 in wapbl_biodone ()
#7 0xffffffff809e2f20 in biodone2 ()
#8 0xffffffff809e2f20 in biodone2 ()
#9 0xffffffff809e303e in biointr ()
#10 0xffffffff8097bc1d in softint_dispatch ()
#11 0xffffffff80223eef in Xsoftintr ()
(gdb)

4) What I tried afterwards:

- make a dump of the aforementioned filesystem, using NO snapshots

# dump -0auf /mnt/photo.0.dump /p

-> works

- unmount the filesystem and force a manual fsck

-> no problems

- dumpfs -s /dev/mapper/vg0-photo

nuc# dumpfs -s /dev/mapper/vg0-photo
file system: /dev/mapper/vg0-photo
format FFSv2
endian little-endian
location 65536 (-b 128)
magic 19540119 time Wed Nov 15 12:26:52 2017
superblock location 65536 id [ 59f8026a 16319237 ]
cylgrp dynamic inodes FFSv2 sblock FFSv2 fslevel 5
nbfree 4461561 ndir 1865 nifree 24770027 nffree 2079
ncg 530 size 100663296 blocks 99102949
bsize 32768 shift 15 mask 0xffff8000
fsize 4096 shift 12 mask 0xfffff000
frag 8 shift 3 fsbtodb 3
bpg 23742 fpg 189936 ipg 46848
minfree 5% optim time maxcontig 2 maxbpg 4096
symlinklen 120 contigsumsize 2
maxfilesize 0x000800800805ffff
nindir 4096 inopb 128
avgfilesize 16384 avgfpdir 64
sblkno 24 cblkno 32 iblkno 40 dblkno 2968
sbsize 4096 cgsize 32768
csaddr 2968 cssize 12288
cgrotor 0 fmod 0 ronly 0 clean 0x01
wapbl version 0x1 location 2 flags 0x0
wapbl loc0 402688128 loc1 131072 loc2 512 loc3 3
flags none
fsmnt /p
volname swuid 0
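As a quick sanity check on the dumpfs numbers above, the following sketch derives the filesystem size and the journal size from them. It assumes that "size" and "blocks" are counted in fragments of "fsize" bytes, and that for an in-filesystem log (location 2) "wapbl loc1" is the log length in blocks and "wapbl loc2" the log block size in bytes. These interpretations are assumptions and should be checked against the FFS and WAPBL headers.

```python
# Values copied from the dumpfs -s output above.
size_frags = 100663296   # "size": assumed total size in fragments
data_frags = 99102949    # "blocks": assumed data area in fragments
fsize = 4096             # "fsize": fragment size in bytes
wapbl_count = 131072     # "wapbl loc1": assumed log length in blocks
wapbl_blksz = 512        # "wapbl loc2": assumed log block size in bytes

GIB = 1024 ** 3
MIB = 1024 ** 2

fs_bytes = size_frags * fsize          # whole filesystem
data_bytes = data_frags * fsize        # data area only
log_bytes = wapbl_count * wapbl_blksz  # WAPBL journal

print("filesystem: %.1f GiB" % (fs_bytes / GIB))
print("data area:  %.1f GiB" % (data_bytes / GIB))
print("journal:    %.1f MiB" % (log_bytes / MIB))
```

Under these assumptions the filesystem spans 384 GiB and the journal is 64 MiB.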

5) Further observations:

- dump -X of other filesystems on the same machine seems to work fine, but
these filesystems are smaller

I'd be glad to help identify the root cause.

Best regards,
Matthias

--
Matthias Petermann <matthias%petermann-it.de@localhost> | www.petermann-it.de
GnuPG: 0x5C3E6D75 | 5930 86EF 7965 2BBA 6572 C3D7 7B1D A3C3 5C3E 6D75