Current-Users archive


Re: ZFS disaster on -current



Reverting external/cddl/osnet/dist/uts/common/fs/zfs/vdev_disk.c to
1.16 resolved the panic. I don't know whether there is a link with the
change to src/external/cddl/osnet/dist/uts/common/fs/zfs/zio.c, as the
build was done with that file still reverted to 1.6.
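For reference, the revert-and-rebuild procedure looks roughly like this. This is only a sketch: it assumes a CVS checkout under /usr/src with a working kernel toolchain, and that the module builds from sys/modules/zfs as in a stock tree.

```shell
# Sketch only: assumes a NetBSD-current CVS checkout in /usr/src
# and a toolchain already set up for kernel/module builds.
cd /usr/src/external/cddl/osnet/dist/uts/common/fs/zfs

# Pin vdev_disk.c back to revision 1.16 (-r sets a sticky revision).
cvs update -r 1.16 vdev_disk.c

# Rebuild and reinstall just the zfs kernel module.
cd /usr/src/sys/modules/zfs
make && make install

# Reload the module and retry the operation that panicked.
modunload zfs
modload zfs
zpool import
```

Note that `cvs update -r` leaves a sticky tag on the file; a later plain `cvs update` will not move it forward until the tag is cleared with `cvs update -A`.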

On Wed, 24 Jun 2020 at 13:20, Chavdar Ivanov <ci4ic4%gmail.com@localhost> wrote:
>
> On Wed, 24 Jun 2020 at 14:12, Jaromír Doleček <jaromir.dolecek%gmail.com@localhost> wrote:
> >
> > OK nvm, mlelsv@ claims it's the unrelated change to vdev_disk.c - so
> > perhaps try with that backed off, i.e. rev. 1.16 of:
> >
> > external/cddl/osnet/dist/uts/common/fs/zfs/vdev_disk.c
> >
> > On Wed, 24 Jun 2020 at 14:55, Jaromír Doleček
> > <jaromir.dolecek%gmail.com@localhost> wrote:
> > >
> > > Can you please check if it still panics the same way if you revert
> > > sources to rev. 1.6 file:
> > > src/external/cddl/osnet/dist/uts/common/fs/zfs/zio.c
>
> Backing out the change in this one didn't sort the problem; identical
> panic on 'zpool import'.
>
> I'll try to back out vdev_disk.c now.
>
> > >
> > > and rebuild the zfs module?
> > >
> > > Jaromir
> > >
> > >
> > > On Wed, 24 Jun 2020 at 14:39, Jaromír Doleček
> > > <jaromir.dolecek%gmail.com@localhost> wrote:
> > > >
> > > > What is 'the test'? Just modload zfs?
> > > >
> > > > Jaromir
> > > >
> > > > On Wed, 24 Jun 2020 at 14:05, Chavdar Ivanov <ci4ic4%gmail.com@localhost> wrote:
> > > > >
> > > > > Yes, I do. Should I be looking for something specific?
> > > > >
> > > > > I've uploaded it here, if it is of interest:
> > > > > https://send.firefox.com/download/74761aa43c6c54b3/#PbaGxtDN81Hzk2VUjefozw
> > > > >
> > > > > BTW I repeated the test on a PVH guest of XCP-ng; it panics the same way.
> > > > >
> > > > > Chavdar
> > > > >
> > > > > On Wed, 24 Jun 2020 at 12:07, Jaromír Doleček <jaromir.dolecek%gmail.com@localhost> wrote:
> > > > > >
> > > > > > By chance, do you have the kernel crash dump from the original panic
> > > > > > which happened yesterday? The subsequent ones might be a result of the
> > > > > > first one.
> > > > > >
> > > > > > The messages about redzone only mean that there is no
> > > > > > overflow protection for items in those pools.
> > > > > >
> > > > > > Jaromir
> > > > > >
> > > > > > On Wed, 24 Jun 2020 at 11:34, Chavdar Ivanov <ci4ic4%gmail.com@localhost> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > On
> > > > > > >
> > > > > > > NetBSD ymir 9.99.68 NetBSD 9.99.68 (GENERIC) #1: Tue Jun 23 22:53:46
> > > > > > > BST 2020  sysbuild@ymir:/home/sysbuild/amd64/obj/home/sysbuild/src/sys/arch/amd64/compile/GENERIC
> > > > > > > amd64
> > > > > > >
> > > > > > > I suddenly got a panic with ZFS; it also happened with the previous
> > > > > > > kernel, so it was something in the module. In single user I disabled
> > > > > > > zfs in /etc/rc.conf and was able to complete the boot, but obviously
> > > > > > > without my two pools.
> > > > > > >
> > > > > > > 'modload solaris' didn't show any problem.
> > > > > > >
> > > > > > > I set aside the contents of /etc/zfs and did 'modload zfs', which resulted in:
> > > > > > >
> > > > > > > .....
> > > > > > >
> > > > > > > WARNING: ZFS on NetBSD is under development
> > > > > > > pool redzone disabled for 'zio_buf_4096'
> > > > > > > pool redzone disabled for 'zio_data_buf_4096'
> > > > > > > pool redzone disabled for 'zio_buf_8192'
> > > > > > > pool redzone disabled for 'zio_data_buf_8192'
> > > > > > > pool redzone disabled for 'zio_buf_16384'
> > > > > > > pool redzone disabled for 'zio_data_buf_16384'
> > > > > > > pool redzone disabled for 'zio_buf_32768'
> > > > > > > pool redzone disabled for 'zio_data_buf_32768'
> > > > > > > pool redzone disabled for 'zio_buf_65536'
> > > > > > > pool redzone disabled for 'zio_data_buf_65536'
> > > > > > > pool redzone disabled for 'zio_buf_131072'
> > > > > > > pool redzone disabled for 'zio_data_buf_131072'
> > > > > > > pool redzone disabled for 'zio_buf_262144'
> > > > > > > pool redzone disabled for 'zio_data_buf_262144'
> > > > > > > pool redzone disabled for 'zio_buf_524288'
> > > > > > > pool redzone disabled for 'zio_data_buf_524288'
> > > > > > > pool redzone disabled for 'zio_buf_1048576'
> > > > > > > pool redzone disabled for 'zio_data_buf_1048576'
> > > > > > > pool redzone disabled for 'zio_buf_2097152'
> > > > > > > pool redzone disabled for 'zio_data_buf_2097152'
> > > > > > > pool redzone disabled for 'zio_buf_4194304'
> > > > > > > pool redzone disabled for 'zio_data_buf_4194304'
> > > > > > > pool redzone disabled for 'zio_buf_8388608'
> > > > > > > pool redzone disabled for 'zio_data_buf_8388608'
> > > > > > > pool redzone disabled for 'zio_buf_16777216'
> > > > > > > pool redzone disabled for 'zio_data_buf_16777216'
> > > > > > >
> > > > > > > I have no idea what that means; it is a first for me. ZFS has
> > > > > > > otherwise been very reliable on this hardware so far, inasmuch as I
> > > > > > > keep the mercurial repo on a zfs and build from it from time to time
> > > > > > > (the panic is from the last cvs update from yesterday, though).
> > > > > > >
> > > > > > > Subsequent 'zpool import' repeated the panic (without getting me into
> > > > > > > the debugger, though):
> > > > > > >
> > > > > > >
> > > > > > > ZFS filesystem version: 5
> > > > > > > uvm_fault(0xffffa97e4c3e1610, 0x0, 1) -> e
> > > > > > > fatal page fault in supervisor mode
> > > > > > > trap type 6 code 0 rip 0xffffffff81d49882 cs 0x8 rflags 0x10286 cr2
> > > > > > > 0xa0 ilevel 0 rsp 0xffffde819c16d760
> > > > > > > curlwp 0xffffa97e3a41e140 pid 17394.17394 lowest kstack 0xffffde819c16a2c0
> > > > > > > panic: trap
> > > > > > > cpu0: Begin traceback...
> > > > > > > vpanic() at netbsd:vpanic+0x152
> > > > > > > snprintf() at netbsd:snprintf
> > > > > > > startlwp() at netbsd:startlwp
> > > > > > > alltraps() at netbsd:alltraps+0xc3
> > > > > > > vdev_open() at zfs:vdev_open+0x9e
> > > > > > > vdev_open_children() at zfs:vdev_open_children+0x39
> > > > > > > vdev_root_open() at zfs:vdev_root_open+0x33
> > > > > > > vdev_open() at zfs:vdev_open+0x9e
> > > > > > > spa_load() at zfs:spa_load+0x38e
> > > > > > > spa_tryimport() at zfs:spa_tryimport+0x86
> > > > > > > zfs_ioc_pool_tryimport() at zfs:zfs_ioc_pool_tryimport+0x41
> > > > > > > zfsdev_ioctl() at zfs:zfsdev_ioctl+0x8c1
> > > > > > > nb_zfsdev_ioctl() at zfs:nb_zfsdev_ioctl+0x38
> > > > > > > VOP_IOCTL() at netbsd:VOP_IOCTL+0x44
> > > > > > > vn_ioctl() at netbsd:vn_ioctl+0xa5
> > > > > > > sys_ioctl() at netbsd:sys_ioctl+0x550
> > > > > > > syscall() at netbsd:syscall+0x26e
> > > > > > > --- syscall (number 54) ---
> > > > > > > netbsd:syscall+0x26e:
> > > > > > > cpu0: End traceback...
> > > > > > >
> > > > > > > The above panic did not leave a crash dump.
> > > > > > >
> > > > > > > When I had /etc/zfs populated before, I also got a crash dump (with
> > > > > > > 'reboot 0x104'), as follows:
> > > > > > >
> > > > > > > # crash -M netbsd.18.core -N netbsd.18
> > > > > > > Crash version 9.99.68, image version 9.99.68.
> > > > > > > crash: _kvm_kvatop(0)
> > > > > > > Kernel compiled without options LOCKDEBUG.
> > > > > > > System panicked: reboot forced via kernel debugger
> > > > > > > Backtrace from time of crash is available.
> > > > > > > crash> bt
> > > > > > > _KERNEL_OPT_NARCNET() at 0
> > > > > > > _KERNEL_OPT_NARCNET() at 0
> > > > > > > sys_reboot() at sys_reboot
> > > > > > > db_fncall() at db_fncall
> > > > > > > db_command() at db_command+0x127
> > > > > > > db_command_loop() at db_command_loop+0xa6
> > > > > > > db_trap() at db_trap+0xe6
> > > > > > > kdb_trap() at kdb_trap+0xe1
> > > > > > > trap() at trap+0x2b7
> > > > > > > --- trap (number 6) ---
> > > > > > > vdev_disk_open.part.4() at vdev_disk_open.part.4+0x49a
> > > > > > > vdev_open() at vdev_open+0x9e
> > > > > > > vdev_open_children() at vdev_open_children+0x39
> > > > > > > vdev_root_open() at vdev_root_open+0x33
> > > > > > > vdev_open() at vdev_open+0x9e
> > > > > > > spa_load() at spa_load+0x38e
> > > > > > > spa_load_best() at spa_load_best+0x58
> > > > > > > spa_open_common() at spa_open_common+0xc2
> > > > > > > pool_status_check.part.25() at pool_status_check.part.25+0x1e
> > > > > > > zfsdev_ioctl() at zfsdev_ioctl+0x80e
> > > > > > > nb_zfsdev_ioctl() at nb_zfsdev_ioctl+0x38
> > > > > > > VOP_IOCTL() at VOP_IOCTL+0x44
> > > > > > > vn_ioctl() at vn_ioctl+0xa5
> > > > > > > sys_ioctl() at sys_ioctl+0x550
> > > > > > > syscall() at syscall+0x26e
> > > > > > > --- syscall (number 54) ---
> > > > > > > syscall+0x26e:
> > > > > > > .....
> > > > > > >
> > > > > > > Any idea what is going on? I've restarted a build, but the cvs log
> > > > > > > doesn't show anything relevant as far as I can see.
> > > > > > >
> > > > > > >
> > > > > > > Chavdar
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > ----
> > > > >
> > > > >
> > > > >
>
>
>




