Current-Users archive


Re: dump -X of large LVM based FFSv2 with WAPBL panics



Hi,

Can you check whether a full forced fsck (fsck -f) resolves this?

I saw several such persistent panics while debugging WAPBL. Even after the kernel fixes, I still had persistent panics around ffs_newvnode() caused by on-disk data corruption left over from previous runs, so this is worth trying.
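
The forced check suggested here could look roughly like the sketch below. It assumes the device path and mount point from the quoted report (/dev/mapper/vg0-photo and /p) and that /p has an fstab entry so that a bare "mount /p" works. The run and force_fsck helpers and the DRY_RUN variable are made up for illustration. By default the script only prints the commands.

```sh
#!/bin/sh
# Sketch only: DEV and MNT are taken from the quoted report; adjust as needed.
DEV=${DEV:-/dev/mapper/vg0-photo}
MNT=${MNT:-/p}
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: echo or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Unmount, force a full check even if the filesystem is marked clean, remount.
force_fsck() {
    run umount "$MNT" &&
    run fsck -f "$DEV" &&
    run mount "$MNT"
}

force_fsck
```

Running it with DRY_RUN=0 as root would execute the three commands for real. Whether the raw or the block device node is the better fsck target on a given setup is left open here.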

Some day I plan to add a counter so that boot forces an fsck every X boots even when the filesystem is clean, similar to what Linux does with ext3/4.

Jaromir

2017-11-15 12:56 GMT+01:00 Matthias Petermann <matthias%petermann-it.de@localhost>:
Hello,

on my system I have observed a serious panic when doing FFSv2 dumps under certain conditions. I did some searching of my own and found some references to the main symptom

        "ffs_newvnode: ino=113 on /p: gen 55fd2f1f/55fd2f1f has non zero blocks ffffffffffffff00 or size 0"

but all of them were marked as solved back in 2016. So I wanted to share my observation here, in the hope that somebody can give me some pointers on how to narrow the issue down further.

1) Given:

- NetBSD 8.0_BETA (Kernel built from branches/netbsd-8 around 2017-11-06)

        NetBSD nuc.local 8.0_BETA NetBSD 8.0_BETA (XEN3_DOM0_XHCI) #0: Mon Nov 6 14:31:17 CET 2017 admin@nuc.local:/s/src/sys/arch/amd64/compile/XEN3_DOM0_XHCI amd64

- A large (392 GB) LVM volume hosting a FFSv2 filesystem with WAPBL enabled
  (/dev/mapper/vg0-photo mounted at /p)

- (An external USB 3.0 Drive)

2) What I tried:

- make a dump of the aforementioned filesystem, using snapshots

    # dump -X -0auf /mnt/photo.0.dump /p

3) What happens then:

- the system crashes, leaving a coredump with the following indication:

    ffs_newvnode: ino=113 on /p: gen 55fd2f1f/55fd2f1f has non zero blocks ffffffffffffff00 or size 0
    fatal page fault in supervisor mode
    trap type 6 code 0x2 rip 0xffffffff8022c0cc cs 0x8 rflags 0x10246 cr2 0xfffffe82deaddf1d ilevel 0x3 rsp 0xfffffe810e6b1eb8
    curlwp 0xfffffe827f736000 pid 0.4 lowest kstack 0xfffffe810e6ae2c0
    panic: trap
    cpu0: Begin traceback...
    vpanic() at netbsd:vpanic+0x140
    snprintf() at netbsd:snprintf
    trap() at netbsd:trap+0xc6b
    --- trap (number 6) ---
    mutex_enter() at netbsd:mutex_enter+0xc
    biodone2() at netbsd:biodone2+0x9b
    biodone2() at netbsd:biodone2+0x9b
    biointr() at netbsd:biointr+0x3a
    softint_dispatch() at netbsd:softint_dispatch+0xd3
    DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e6b1ff0
    Xsoftintr() at netbsd:Xsoftintr+0x4f
    --- interrupt ---
    0:
    cpu0: End traceback...

    dumping to dev 0,1 (offset=168119, size=2076255):
    dump

- gdb backtrace shows:

    (gdb) target kvm netbsd.3.core
    0xffffffff80229545 in cpu_reboot ()
    (gdb) bt
    #0  0xffffffff80229545 in cpu_reboot ()
    #1  0xffffffff809a4afc in vpanic ()
    #2  0xffffffff809a4bb0 in panic ()
    #3  0xffffffff8022b176 in trap ()
    #4  0xffffffff8020113e in alltraps ()
    #5  0xffffffff8022c0cc in mutex_enter ()
    #6  0xffffffff80a029f5 in wapbl_biodone ()
    #7  0xffffffff809e2f20 in biodone2 ()
    #8  0xffffffff809e2f20 in biodone2 ()
    #9  0xffffffff809e303e in biointr ()
    #10 0xffffffff8097bc1d in softint_dispatch ()
    #11 0xffffffff80223eef in Xsoftintr ()
    (gdb)

4) What I tried afterwards:

- make a dump of the aforementioned filesystem, using NO snapshots

    # dump -0auf /mnt/photo.0.dump /p

    -> works

- umount the filesystem, enforcing a manual fsck

    -> no problems

- dumpfs -s /dev/mapper/vg0-photo

    nuc# dumpfs -s /dev/mapper/vg0-photo
    file system: /dev/mapper/vg0-photo
    format  FFSv2
    endian  little-endian
    location 65536  (-b 128)
    magic   19540119        time    Wed Nov 15 12:26:52 2017
    superblock location     65536   id      [ 59f8026a 16319237 ]
    cylgrp  dynamic inodes  FFSv2   sblock  FFSv2   fslevel 5
    nbfree  4461561 ndir    1865    nifree  24770027        nffree  2079
    ncg     530     size    100663296       blocks  99102949
    bsize   32768   shift   15      mask    0xffff8000
    fsize   4096    shift   12      mask    0xfffff000
    frag    8       shift   3       fsbtodb 3
    bpg     23742   fpg     189936  ipg     46848
    minfree 5%      optim   time    maxcontig 2     maxbpg  4096
    symlinklen 120  contigsumsize 2
    maxfilesize 0x000800800805ffff
    nindir  4096    inopb   128
    avgfilesize 16384       avgfpdir 64
    sblkno  24      cblkno  32      iblkno  40      dblkno  2968
    sbsize  4096    cgsize  32768
    csaddr  2968    cssize  12288
    cgrotor 0       fmod    0       ronly   0       clean   0x01
    wapbl version 0x1       location 2      flags 0x0
    wapbl loc0 402688128    loc1 131072     loc2 512        loc3 3
    flags   none
    fsmnt   /p
    volname         swuid   0
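
A quick way to turn the dumpfs counters above into human-readable sizes is sketched below. It assumes that "size" is counted in fragments of "fsize" bytes, and that for an in-filesystem journal "wapbl loc1" is the journal length in blocks and "wapbl loc2" the journal block size in bytes. The variable names are made up for illustration.

```sh
#!/bin/sh
# Values copied from the dumpfs output above.
size_frags=100663296   # "size": assumed to be counted in fragments
fsize=4096             # "fsize": fragment size in bytes
log_blocks=131072      # "wapbl loc1": assumed journal length in blocks
log_bsize=512          # "wapbl loc2": assumed journal block size in bytes

# Total filesystem size in bytes and in whole GiB.
fs_bytes=$((size_frags * fsize))
fs_gib=$((fs_bytes / 1024 / 1024 / 1024))

# Journal size in MiB.
log_mib=$((log_blocks * log_bsize / 1024 / 1024))

echo "filesystem: $fs_bytes bytes ($fs_gib GiB)"
echo "journal: $log_mib MiB"
```

Under these assumptions the filesystem comes out at 384 GiB and the journal at 64 MiB.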

5) Further observations:

- dump -X of other FSs on the same machine seems to work fine, but
  these FSs are smaller

I'd be glad to help narrow down the root cause further.

Best regards,
Matthias

--
Matthias Petermann <matthias%petermann-it.de@localhost> | www.petermann-it.de
GnuPG: 0x5C3E6D75 | 5930 86EF 7965 2BBA 6572  C3D7 7B1D A3C3 5C3E 6D75


