kern/55631: zfs mount causes panic due to assertion "fdm != NULL"



>Number:         55631
>Category:       kern
>Synopsis:       zfs mount causes panic due to assertion "fdm != NULL"
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Aug 31 11:45:00 +0000 2020
>Originator:     Piotr Meyer
>Release:        -current from Aug 26
>Organization:
>Environment:
NetBSD szpak.local 9.99.71 NetBSD 9.99.71 (SZPAK) #0: Wed Aug 26 23:04:17 CEST 2020  aniou@szpak.local:/usr/obj/sys/arch/amd64/compile/SZPAK amd64
>Description:
1. A ZFS mirror was created on a machine with 8GB RAM by the command:

   zpool create pool1 mirror dk11 dk23

2. After the pool was created, the system panicked and restarted.

3. After setting ddb.onpanic=1, I got:

a) a stack trace, available at:
   http://smutek.pl/netbsd/zfs/panic-part1.jpg
   http://smutek.pl/netbsd/zfs/panic-part2.jpg

b) a core dump, which gave me the following backtrace in gdb:

(gdb) bt
#0  0xffffffff80226305 in cpu_reboot (howto=howto@entry=256, bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:713
#1  0xffffffff80c9c37f in kern_reboot (howto=howto@entry=256, bootstr=bootstr@entry=0x0) at /usr/src/sys/kern/kern_reboot.c:73
#2  0xffffffff80a611e4 in db_sync_cmd (addr=<optimized out>, have_addr=<optimized out>, count=<optimized out>, modif=<optimized out>)
    at /usr/src/sys/ddb/db_command.c:1503
#3  0xffffffff80a61a38 in db_command (last_cmdp=last_cmdp@entry=0xffffffff8166f9e0 <db_last_command>) at /usr/src/sys/ddb/db_command.c:940
#4  0xffffffff80a61d9e in db_command_loop () at /usr/src/sys/ddb/db_command.c:599
#5  0xffffffff80a65877 in db_trap (type=type@entry=1, code=code@entry=0) at /usr/src/sys/ddb/db_trap.c:91
#6  0xffffffff802230d5 in kdb_trap (type=type@entry=1, code=code@entry=0, regs=regs@entry=0xffffd500bda9d850)
    at /usr/src/sys/arch/amd64/amd64/db_interface.c:249
#7  0xffffffff802284fc in trap (frame=0xffffd500bda9d850) at /usr/src/sys/arch/amd64/amd64/trap.c:315
#8  0xffffffff802210e3 in alltraps ()
#9  0xffffffff80221a25 in breakpoint ()
#10 0xffffffff80cdb46e in vpanic (fmt=0xffffffff814554e0 "kernel %sassertion \"%s\" failed: file \"%s\", line %d mount %p invalid", 
    ap=ap@entry=0xffffd500bda9d988) at /usr/src/sys/kern/subr_prf.c:288
#11 0xffffffff80e4ea06 in kern_assert (
    fmt=fmt@entry=0xffffffff814554e0 "kernel %sassertion \"%s\" failed: file \"%s\", line %d mount %p invalid")
    at /usr/src/sys/lib/libkern/kern_assert.c:51
#12 0xffffffff80d3f9e1 in fstrans_debug_validate_mount (mp=0xffffd34617177088) at /usr/src/sys/kern/vfs_trans.c:172
#13 fstrans_alloc_lwp_info (mp=0xffffd34617177088) at /usr/src/sys/kern/vfs_trans.c:421
#14 0xffffffff80d40672 in fstrans_get_lwp_info (do_alloc=true, mp=<optimized out>) at /usr/src/sys/kern/vfs_trans.c:464
#15 _fstrans_start (wait=1, lock_type=FSTRANS_SHARED, mp=<optimized out>) at /usr/src/sys/kern/vfs_trans.c:509
#16 _fstrans_start (wait=1, lock_type=FSTRANS_SHARED, mp=<optimized out>) at /usr/src/sys/kern/vfs_trans.c:496
#17 fstrans_start (mp=<optimized out>) at /usr/src/sys/kern/vfs_trans.c:546
#18 0xffffffff80d516db in vop_pre (vp=<optimized out>, vp=<optimized out>, op=<optimized out>, mpsafe=<optimized out>, mp=<optimized out>)
    at /usr/src/sys/kern/vnode_if.c:79
#19 vop_pre (vp=0xffffd34617177000, vp=0xffffd34617177000, op=FST_YES, mpsafe=<synthetic pointer>, mp=<synthetic pointer>)
    at /usr/src/sys/kern/vnode_if.c:55
#20 VOP_LOCK (vp=vp@entry=0xffffd34617177000, flags=flags@entry=513) at /usr/src/sys/kern/vnode_if.c:1317
#21 0xffffffff80d475a9 in vn_lock (vp=vp@entry=0xffffd34617177000, flags=flags@entry=513) at /usr/src/sys/kern/vfs_vnops.c:1058
#22 0xffffffff80e28f6e in secmodel_extensions_system_cb (action=8, cookie=<optimized out>, arg0=<optimized out>, arg3=0x0, arg2=0x0, 
    arg1=0xffffd34617177000, cred=0xffffd34611d09dc0) at /usr/src/sys/secmodel/extensions/secmodel_extensions.c:363
#23 secmodel_extensions_system_cb (cred=0xffffd34611d09dc0, action=<optimized out>, cookie=<optimized out>, arg0=<optimized out>, 
    arg1=0xffffd34617177000, arg2=0x0, arg3=<optimized out>) at /usr/src/sys/secmodel/extensions/secmodel_extensions.c:335
#24 0xffffffff80c6ae3a in kauth_authorize_action_internal (scope=0xffffd347170981c0, cred=0xffffd34611d09dc0, action=8, arg0=0x5, 
    arg1=0xffffd34617177000, arg2=0x0, arg3=arg3@entry=0x0) at /usr/src/sys/kern/kern_auth.c:968
#25 0xffffffff80c6d008 in kauth_authorize_action (scope=<optimized out>, cred=<optimized out>, action=<optimized out>, 
    arg0=<optimized out>, arg1=<optimized out>, arg2=<optimized out>, arg3=0x0) at /usr/src/sys/kern/kern_auth.c:993
#26 0xffffffff80c6d7f1 in kauth_authorize_system (cred=<optimized out>, action=<optimized out>, req=<optimized out>, arg1=<optimized out>, 
    arg2=<optimized out>, arg3=<optimized out>) at /usr/src/sys/kern/kern_auth.c:1025
#27 0xffffffff81c1715d in ?? ()
#28 0xffffd500bda9dd28 in ?? ()
#29 0xffffd347170e1cc0 in ?? ()
#30 0xffffffff80c90413 in mutex_spin_retry (mtx=<optimized out>) at /usr/src/sys/kern/kern_mutex.c:969
#31 0x00007f7fff603660 in ?? ()
#32 0xffffd500bda9dd10 in ?? ()
#33 0xffffffff80cd91bd in pool_cache_get_paddr (pc=0xffffd34617177000, flags=298884544, pap=0xffffd3460baae000)
    at /usr/src/sys/kern/subr_pool.c:2651
#34 0xffffffff80d383c5 in VFS_MOUNT (mp=mp@entry=0xffffd34617177000, 
    a=a@entry=0x7f7fff603660 <error: Cannot access memory at address 0x7f7fff603660>, b=b@entry=0xffffd3460baae000, 
    c=c@entry=0xffffd500bda9df78) at /usr/src/sys/kern/vfs_subr.c:1343
#35 0xffffffff80d351da in mount_domount (l=l@entry=0xffffd34610d502c0, vpp=vpp@entry=0xffffd500bda9df00, vfsops=0xffffffff81abcbc0, 
    path=path@entry=0x7f7fff603660 <error: Cannot access memory at address 0x7f7fff603660>, flags=flags@entry=0, 
    data=data@entry=0xffffd3460baae000, data_len=data_len@entry=0xffffd500bda9df78) at /usr/src/sys/kern/vfs_mount.c:769
#36 0xffffffff80d3b06d in do_sys_mount (l=0xffffd34610d502c0, type=<optimized out>, type_seg=type_seg@entry=UIO_USERSPACE, 
    path=0x7f7fff603660 <error: Cannot access memory at address 0x7f7fff603660>, flags=0, data=0x7f7fff602b30, 
    data_seg=data_seg@entry=UIO_USERSPACE, data_len=<optimized out>, retval=0xffffd500bda9dfb0) at /usr/src/sys/kern/vfs_syscalls.c:552
#37 0xffffffff80d3b694 in sys___mount50 (l=<optimized out>, uap=<optimized out>, retval=<optimized out>)
    at /usr/src/sys/kern/vfs_syscalls.c:470
#38 0xffffffff8053c51e in sy_call (rval=0xffffd500bda9dfb0, uap=0xffffd500bda9e000, l=0xffffd34610d502c0, 
    sy=0xffffffff816826d0 <sysent+9840>) at /usr/src/sys/sys/syscallvar.h:65
#39 sy_invoke (code=410, rval=0xffffd500bda9dfb0, uap=0xffffd500bda9e000, l=0xffffd34610d502c0, sy=0xffffffff816826d0 <sysent+9840>)
    at /usr/src/sys/sys/syscallvar.h:94
#40 syscall (frame=0xffffd500bda9e000) at /usr/src/sys/arch/x86/x86/syscall.c:138
#41 0xffffffff8020b25d in handle_syscall ()

4. According to the logs, the assertion that fires is in /usr/src/sys/kern/vfs_trans.c:

Aug 31 11:38:56 szpak savecore: reboot after panic: [ 594.9010917] panic: kernel diagnostic assertion "fdm != NULL" failed: file "/usr/src/sys/kern/vfs_trans.c", line 172 mount 0xffffd34617177088 invalid

static void
fstrans_debug_validate_mount(struct mount *mp)
{
    struct fstrans_debug_mount *fdm;

    KASSERT(mutex_owned(&fstrans_lock));

    SLIST_FOREACH(fdm, &fstrans_debug_mount_head, fdm_list)
        if (fdm->fdm_mount == mp)
            break;
    KASSERTMSG(fdm != NULL, "mount %p invalid", mp);
}
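
For reference, the list this loop searches is populated when a file system is registered with fstrans. The sketch below shows the shape of that registration side; it is paraphrased, not copied, from vfs_trans.c, so treat the helper name and exact layout as assumptions. The identifiers (fstrans_debug_mount_head, fdm_list, fdm_mount, fstrans_lock) are taken from the assertion code quoted above, and the kernel primitives (kmem_alloc, mutex_enter, SLIST_INSERT_HEAD) are the standard NetBSD ones:

/*
 * Illustrative sketch only: how a mount ends up on
 * fstrans_debug_mount_head under FSTRANS_DEBUG. The helper name
 * and exact code are assumptions; the list, lock and field names
 * come from the assertion code quoted above.
 */
static void
fstrans_debug_mount(struct mount *mp)
{
    struct fstrans_debug_mount *fdm;

    /* Allocate a tracking record for this mount (may sleep). */
    fdm = kmem_alloc(sizeof(*fdm), KM_SLEEP);
    fdm->fdm_mount = mp;

    /* Insert it under the same lock the validation code holds. */
    mutex_enter(&fstrans_lock);
    SLIST_INSERT_HEAD(&fstrans_debug_mount_head, fdm, fdm_list);
    mutex_exit(&fstrans_lock);
}

So the panic means fstrans was handed a struct mount (here 0xffffd34617177088) that was never registered this way, or had already been unregistered, by the time VOP_LOCK() on the ZFS mount path reached fstrans_start() (frames #12-#20 above).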

5. My overall setup is fairly complicated; it consists of:
- two HDDs with multiple GPT partitions
- 30 wedges at different levels (on the physical disks and on raid(4) devices)
- root on a wedge (GPT) on raid(4) assembled from wedges on the physical disks
- two GPT partitions (wedges) used for ZFS
- NFS mounts (the box acts as a client)
- a single null mount:

$ mount
/dev/dk24 on / type ffs (noatime, local)
/dev/dk25 on /var type ffs (noatime, local)
/dev/dk26 on /usr type ffs (noatime, local)
/dev/dk28 on /home type ffs (noatime, local)
/dev/dk29 on /ve type ffs (noatime, local)
kernfs on /kern type kernfs (local)
ptyfs on /dev/pts type ptyfs (local)
procfs on /proc type procfs (local)
tmpfs on /tmp type tmpfs (local)
tmpfs on /var/shm type tmpfs (local)
1.1.1.1:/v/distfiles on /ve/chroot/pkgsrc/mnt/distfiles type nfs
/usr/pkgsrc on /ve/chroot/pkgsrc/usr/pkgsrc type null (local)
1.1.1.1:/v/pkgsrc-wip on /ve/chroot/pkgsrc/usr/pkgsrc/wip type nfs
tmpfs on /ve/chroot/pkgsrc/tmp type tmpfs (local)


6. The most puzzling part: I have a second machine, with two disks and 8GB RAM, on which I *successfully* tested the following scenarios:
a) NetBSD 9_RELEASE, root on a single GPT partition, ZFS mirror created by:

   zpool create pool1 mirror dk6 dk13

b) NetBSD 9_RELEASE, root on raid(4) on two GPT partitions, ZFS mirror as above

c) NetBSD 9-current, root on raid(4) on two GPT partitions, ZFS mirror as above

d) NetBSD 9-current, root on raid(4) on two GPT partitions, an additional null mount between /mnt/a and /mnt/b, ZFS mirror as above

The ZFS storage was even larger than in the previous case (350GB vs. 250GB on the problematic box).

>How-To-Repeat:
I don't know - it only occurs on this one particular machine.
>Fix:


