Current-Users archive


Re: specfs/spec_vnops.c diagnostic assertion panic



    Date:        Fri, 12 Aug 2022 23:35:26 +0000
    From:        Taylor R Campbell <campbell+netbsd-current-users%mumble.net@localhost>
    Message-ID:  <20220812233531.8C22560A2F%jupiter.mumble.net@localhost>


  | Can you try _reverting_ specfs_blockopen.patch, and _applying_ the
  | attached dkopenclose.patch, and see if you can reproduce any crash?

OK, I put specfs_vnode.c back to 1.212 and applied that patch, and yes,
getting a crash from that is easy - but isn't in any way related to
specfs that I can see.

Further, this one happens early - very early - too early for the kernel
to have attached my USB keyboard, so I cannot interact with ddb at all.

But I do think I managed to see the full backtrace from the "command on
enter" stuff that happens (though it didn't get as far as the register
dump).   The actual panic message was lost, but it is certainly a KASSERT
failing.

I should also note that between the last kernel build and this one, I
had updated my src tree, and I see that there were some autoconf changes
applied in that, so it is possible that it isn't your patch that caused
the problem.   I am going to undo that, and build a new kernel (with even
more updated src tree, not that I see any changes today that are likely to
matter - the audio changes cannot be related) and see what happens.   If
that works, I will apply your patch again, so that is the only change that
is being made (I will leave specfs_vnode.c at 1.212 through all of this, and
temporarily simply ignore any crash that looks like the fsck_ffs/dkctl race
issue for now).

The kernel stack trace (with most details omitted, though I have
a photo which shows it all):

vpanic()
kern_assert()
_bus_dmamem_unmap.constprop.0() at <same>+0x157
nvme_dmamem_free() at <same>+0x2c
nvme_attach() at <same>+0x4e3
nvme_pci_attach()
config_attach_internal()
config_found()
pci_probe_device()
pci_enumerate_bus()
pcirescan()
pciattach()
config_attach_internal()
config_found()
ppbattach()
config_attach_internal()
config_found()
pci_probe_device()
pci_enumerate_bus()
pcirescan()
pciattach()
config_attach_internal()
config_found()
ppbattach()
config_attach_internal()
config_found()
pci_probe_device()
pci_enumerate_bus()
pcirescan()
pciattach()
config_attach_internal()
config_found()
mp_pci_scan()
amd64_mainbus_attach()
config_attach_internal()
config_rootfound()
cpu_configure()
main()

Note that I had to piece that together from the msgbuf stacktrace
that the panic prints, and the bt ddb command that ddb runs when
entered, and because of that (and the repetitive nature of some of
it) it is entirely possible that some of the frames listed above
are duplicates.   There are definitely at least 2 instances of
pciattach()/pcirescan()/pci_enumerate_bus() in the stacktrace (those
are visible in both the dmesg and bt stack traces) - but I am unable
to tell if they are the same frames or not (actually, I cannot be
certain there aren't more instances of that group of stack frames
that don't appear on the screen at all).

The two ends of the trace will be correct however, just not necessarily
all that is in the middle.
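
To illustrate why the middle is ambiguous, here is a minimal sketch (not
anything that was actually run on this machine) of merging two partial
traces that overlap in a repeated group of frames.  The function name
candidate_merges and the shortened fragments are hypothetical, made up
for the illustration; they are not the real msgbuf or bt output.

```python
def candidate_merges(top, bottom):
    """Return every trace obtained by overlapping a suffix of `top`
    with an identical prefix of `bottom` (at least one frame)."""
    merges = []
    for k in range(1, min(len(top), len(bottom)) + 1):
        if top[-k:] == bottom[:k]:
            merges.append(top + bottom[k:])
    return merges


# One pass through a PCI bridge, innermost frame first.
bridge = [
    "pci_probe_device", "pci_enumerate_bus", "pcirescan", "pciattach",
    "config_attach_internal", "config_found",
    "ppbattach", "config_attach_internal", "config_found",
]

# Hypothetical fragments: one shows the panic end, the other the main()
# end, and each shows two copies of the repeated bridge group.
top = ["vpanic", "kern_assert", "nvme_attach"] + bridge * 2
bottom = bridge * 2 + ["mp_pci_scan", "main"]

merges = candidate_merges(top, bottom)
for m in merges:
    # Each merge implies a different number of bridges in the real stack.
    print(len(m), "frames,", m.count("ppbattach"), "bridge levels")
```

Both merges begin with vpanic and end with main, but they disagree about
how many times the bridge group occurs, which is exactly the uncertainty
described above.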

More later after I do some more tests.

kre


