specfs/spec_vnops.c diagnostic assertion panic

To: current-users%netbsd.org@localhost
Subject: specfs/spec_vnops.c diagnostic assertion panic
From: Robert Elz <kre%munnari.OZ.AU@localhost>
Date: Tue, 09 Aug 2022 22:33:22 +0700

A few times recently, I have seen the following panic (from 9.99.99)

[    36.426616] panic: kernel diagnostic assertion "sd->sd_closing" failed: file "/readonly/release/testing/src/sys/miscfs/specfs/spec_vnops.c", line 1725
[    36.426616] cpu0: Begin traceback...
[    36.426616] vpanic() at netbsd:vpanic+0x183
[    36.426616] kern_assert() at netbsd:kern_assert+0x4b
[    36.426616] spec_close8TB>
[     2.748915] wd0: drive supports 1-sector PIO transfers, sd:VOP_CLOSE+0x42
[    36.426616] vn_close() at netbsd:vn_close+0x35
[    36.426616] dklastclose() at netbsd:dklastclose+0x8a
[    36.426616] spec_close() at netbsd:spec_close+0x1bb
[    36.426616] VOP_CLOSE() at netbsd:VOP_CLOSE+0x42 
[    36.436616] vn_close() at netbsd:vn_close+0x35
[    36.436616] closef() at netbsd:closef+0x60
[    36.436616] fd_close() at netbsd:fd_close+0x138
[    36.436616] sys_close() at netbsd:sys_close+0x22

[    27.099271] cd0(ahcisata0:6:0):  DEFERRED ERROR, ked_closing" failed: file "/readonly/release/testing/src/sys/miscfs/specfs/spec_vnops.c", line 1725
[    27.389271] cpu0: Begin traceback...
[    27.389271] vpanic() at netbsd:vpanic+0x183
[    27.389271] kern_assert() at netbsd:kern_assert+0x4b
[    27.389271] E() at netbsd:VOP_CLOSE+0x42
[    27.389271] vn_close() at netbsd:vn_close+0x35
[    27.389271] dklastclose() at netbsd:dklastclose+0x8a
[    27.389271] spec_close() at netbsd:spec_close+0x1bb
[    27.389271] VOP_CLOSE() at netbsd:VOP_CLOSE+0x42
[    27.389271] vn_close() at netbsd:vn_close+0x35
[    27.399271] closef() at nx138
[    27.399271] sys_close() at netbsd:sys_close+0x22
[    27.399271] syscall() at netbsd:syscall+0xa1
[    27.399271] --- syscall (number 6) ---


panic: kernel diagnostic assertion "sd->sd_closing" failed: file "/readonly/release/testing/src/sys/miscfs/specfs/spec_vnops.c", line 1725
cpu2: Begin traceback...
vpanic() at netbsd:vpanic+0x183
kern_assert() at netbsd:kern_assert+0x4b
spec_close() at netbsd:spec_close+0x386
VOP_CLOSE() at netbsd:VOP_CLOSE+0x42 
vn_close() at netbsd:vn_close+0x35
dklastclose() at netbsd:dklastclose+0x8a
spec_close() at netbsd:spec_close+0x1bb
VOP_CLOSE() at netbsd:VOP_CLOSE+0x42 
 at netbsd:vn_close+0x35
closef() at netbsd:close3876735] sys_close() at netbsd:sys_close+0x22
syscall() at netbsd:syscall+0xa1
--- syscall (number 6) ---


They're all clearly the same thing, despite the tracebacks often being
corrupted (seemingly by other CPUs doing other stuff, though one of them
has somehow gained a message from much earlier in the boot ... note:
message buffer wraparound is not likely, as I have a BIG message buffer).

In each case the panic occurs soon after boot (as shown by the timestamps
in the tracebacks where I kept them - the other is from dmesg.boot).

That is, just as the system is running the early scripts in rc.d
(one of which is devpubd, which is enabled, if that matters).

When this happens, the system reboots, and on that boot everything is
fine (there has never been a case where the subsequent boot has failed
the same way).

This does not happen often: those 3 traces were (in some random order)
from July 19, July 28, and Aug 8 (very early, Aug 7 for most people...).

The relevant assertion is this one:

        /*
         * Wake any spec_open calls waiting for close to finish -- do
         * this before reacquiring the vnode lock, because spec_open
         * holds the vnode lock while waiting, so doing this after
         * reacquiring the lock would deadlock.
         */
        mutex_enter(&device_lock);
        KASSERT(sd->sd_closing);
        sd->sd_closing = false;
        cv_broadcast(&specfs_iocv);
        mutex_exit(&device_lock);
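For readers less familiar with this code, the handshake that comment describes can be modelled in user space with pthreads. This is a hypothetical sketch, not the kernel code: model_specdev, model_close(), model_open() and model_demo() are made-up names, a pthread mutex and condition variable stand in for device_lock and specfs_iocv, and the driver's .d_close is reduced to a comment. It shows the invariant the KASSERT relies on: once the closer has set the closing flag, nothing else may clear it before the closer does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the specfs close/open handshake.
 * lock stands in for device_lock, iocv for specfs_iocv, and closing
 * for sd->sd_closing.  This is NOT the kernel implementation.
 */
struct model_specdev {
	pthread_mutex_t lock;
	pthread_cond_t  iocv;
	bool            closing;
	int             opens;	/* number of opens that completed */
};

static struct model_specdev msd = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, 0
};

/* Last close: mark the device as closing, then clear it and wake openers. */
static void model_close(struct model_specdev *d)
{
	pthread_mutex_lock(&d->lock);
	d->closing = true;
	pthread_mutex_unlock(&d->lock);

	/* The driver's .d_close would run here, without the lock held. */

	pthread_mutex_lock(&d->lock);
	assert(d->closing);	/* the invariant checked by KASSERT(sd->sd_closing) */
	d->closing = false;
	pthread_cond_broadcast(&d->iocv);
	pthread_mutex_unlock(&d->lock);
}

/* Open: wait until no close is in progress, then count the open. */
static void *model_open(void *arg)
{
	struct model_specdev *d = arg;

	pthread_mutex_lock(&d->lock);
	while (d->closing)
		pthread_cond_wait(&d->iocv, &d->lock);
	d->opens++;
	pthread_mutex_unlock(&d->lock);
	return NULL;
}

/* Race one open against one close; returns the number of completed opens. */
static int model_demo(void)
{
	pthread_t t;

	if (pthread_create(&t, NULL, model_open, &msd) != 0)
		return -1;
	model_close(&msd);
	pthread_join(t, NULL);
	return msd.opens;
}
```

Whichever thread wins the race, the open completes exactly once and the flag ends up cleared; the model only has one closer, which is the situation the assertion assumes.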


Note that from the traceback, spec_close() seems to have been recursively
called.
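Assuming, as the traceback suggests, that dklastclose() is closing the parent disk's vnode, the nesting can be pictured with the following hypothetical model (struct node, model_spec_close() and the names dk0/wd0 are illustrative only). Each level of the nesting works on a different device with its own closing flag, so in this simplified model the nesting by itself cannot make the check fail; it could only fail if something else cleared the flag on the same device while that close was in progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: every device node has its own closing flag. */
struct node {
	const char  *name;
	bool         closing;
	struct node *parent;	/* non-NULL for a wedge, e.g. dk0 on wd0 */
};

static char close_trace[64];	/* records the order in which nodes close */

static void model_spec_close(struct node *n)
{
	assert(!n->closing);
	n->closing = true;
	strcat(close_trace, n->name);
	strcat(close_trace, " ");

	/*
	 * Stand-in for the driver close.  For a wedge, the last close also
	 * closes the parent disk, which re-enters model_spec_close() for a
	 * different node (assumed to mirror dklastclose() -> vn_close()).
	 */
	if (n->parent != NULL)
		model_spec_close(n->parent);

	assert(n->closing);	/* same check as KASSERT(sd->sd_closing) */
	n->closing = false;
}
```

Closing dk0 records "dk0 wd0 " and leaves both flags cleared.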

The call from spec_close+0x1bb looks to be a call to cdev_close(), and
from the surrounding code it looks likely to be


        /*
         * If we can cancel all outstanding I/O, then wait for it to
         * drain before we call .d_close.  Drivers that split up
         * .d_cancel and .d_close this way need not have any internal
         * mechanism for waiting in .d_close for I/O to drain.
         */
        if (vp->v_type == VBLK)
                error = bdev_cancel(dev, flags, mode, curlwp);
        else
                error = cdev_cancel(dev, flags, mode, curlwp);
        if (error == 0)
                spec_io_drain(sd);
        else
                KASSERTMSG(error == ENODEV, "cancel dev=0x%lx failed with %d",
                    (unsigned long)dev, error);

        if (vp->v_type == VBLK)
                error = bdev_close(dev, flags, mode, curlwp);
        else
                error = cdev_close(dev, flags, mode, curlwp);

That is the cdev_close() call at line 1710 of spec_vnops.c.
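The cancel-then-drain sequence in that excerpt can be sketched the same way: every I/O is bracketed by an enter/exit pair that maintains an in-flight count, cancel stops new I/O from starting, and drain sleeps until the count reaches zero, so .d_close never runs with I/O outstanding. Again this is a hypothetical user-space model; io_model, io_enter(), io_exit(), io_cancel(), io_drain() and model_close_path() are illustrative names, and a real driver's .d_cancel does driver-specific work rather than setting a flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of "cancel, drain, then close". */
struct io_model {
	pthread_mutex_t lock;
	pthread_cond_t  cv;
	unsigned        inflight;	/* I/O operations currently running */
	bool            cancelled;	/* set by io_cancel(); refuses new I/O */
};

static struct io_model iom = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false
};

/* Start an I/O; fails once the device has been cancelled. */
static int io_enter(struct io_model *m)
{
	int error = 0;

	pthread_mutex_lock(&m->lock);
	if (m->cancelled)
		error = -1;
	else
		m->inflight++;
	pthread_mutex_unlock(&m->lock);
	return error;
}

/* Finish an I/O; wake the drainer when the last one completes. */
static void io_exit(struct io_model *m)
{
	pthread_mutex_lock(&m->lock);
	if (--m->inflight == 0)
		pthread_cond_broadcast(&m->cv);
	pthread_mutex_unlock(&m->lock);
}

static void io_cancel(struct io_model *m)
{
	pthread_mutex_lock(&m->lock);
	m->cancelled = true;
	pthread_mutex_unlock(&m->lock);
}

/* Sleep until no I/O is in flight. */
static void io_drain(struct io_model *m)
{
	pthread_mutex_lock(&m->lock);
	while (m->inflight > 0)
		pthread_cond_wait(&m->cv, &m->lock);
	pthread_mutex_unlock(&m->lock);
}

/* An I/O that completes asynchronously on another thread. */
static void *io_worker(void *arg)
{
	io_exit(arg);
	return NULL;
}

/* Returns the in-flight count seen at the point where .d_close would run. */
static unsigned model_close_path(void)
{
	pthread_t t;
	unsigned left;

	if (io_enter(&iom) != 0)
		return ~0u;
	if (pthread_create(&t, NULL, io_worker, &iom) != 0)
		return ~0u;
	io_cancel(&iom);
	io_drain(&iom);
	pthread_mutex_lock(&iom.lock);
	left = iom.inflight;	/* .d_close would be called here */
	pthread_mutex_unlock(&iom.lock);
	pthread_join(t, NULL);
	return left;
}
```

After model_close_path() returns 0, any further io_enter() is refused, which is the point of cancelling before draining.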

spec_close+0x386 is the kern_assert() call from the KASSERT() above.

Anyone have any idea what this issue might be?   Or just seen it before?

kre

ps: sorry, what's above is all the detail I have - there is no crash dump.
If (or when, unless it can be fixed based only upon the above info) this
happens again, I will see if there is a crash dump to fetch.

Follow-Ups:
- Re: specfs/spec_vnops.c diagnostic assertion panic
  - From: Taylor R Campbell
