Current-Users archive

Kernel crash trying to use union mount



I just had a really weird crash on a NetBSD/amd64-current system,
running an 8.99.30 kernel from January 2nd.  Here's what happened:

I was going to experiment with a rather large set of changes to the
local copy of the source tree, which I'd want to revert afterwards, so I
created a directory on another file system, and mounted it on top of
/usr/src with mount_union.  I then copied a 10MiB diff into /usr/src/.
That went well - the file was visible in /usr/src/, and I observed that
it was correctly stored in the auxiliary directory, as expected.

Then I tried reading the file from /usr/src/, and the system immediately
crashed, and dumped core, with the panic:

kernel diagnostic assertion "fli->fli_trans_cnt > 0" failed: file "/usr/src/sys/kern/vfs_trans.c", line 451
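
For readers unfamiliar with this check: the assertion appears to guard a
counter of open file system transactions, which must be positive whenever
fstrans_done() runs.  The following is a toy model under that assumption,
not the code in vfs_trans.c; the names model_fli, model_start and
model_done are hypothetical.  It only shows how a "done" without a
matching "start" would trip a check of this shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bookkeeping the assertion inspects. */
struct model_fli {
    int trans_cnt;   /* nesting depth of open transactions */
};

/* Enter a transaction: nesting simply bumps the counter. */
static void model_start(struct model_fli *fli)
{
    assert(fli != NULL);
    fli->trans_cnt++;
}

/*
 * Leave a transaction.  Returns false where a DIAGNOSTIC kernel would
 * panic: the condition mirrors the failed assertion, trans_cnt > 0.
 */
static bool model_done(struct model_fli *fli)
{
    assert(fli != NULL);
    if (!(fli->trans_cnt > 0))
        return false;    /* unbalanced "done": nothing left to close */
    fli->trans_cnt--;
    return true;
}
```

In this model a balanced start/done pair succeeds, and a second
model_done() on the same structure reports the imbalance instead of
letting the counter go negative.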

I had an emacs running, and the crash happened while emacs was
attempting to tab-autocomplete the name of the file for me, so it hadn't
even gotten around to reading the file itself.

The really weird thing was that when it had finished dumping, having
counted down to 1, and printed "successful" on the (serial) console, it
just sat there, completely unresponsive - but still routing packets
(it's my main server, and my gateway to the Internet)!  I let it do this
for a while, and verified that I could connect to TCP ports on it from
inside and outside (as I was still logged on the NetBSD IRC server, I
had someone there check from outside for me), but userland was obviously
not running, so there was no response from the connection.

After a bit, I hit NMI on the front panel, and it dropped nicely into
the kernel debugger.  Backtraces from each of the four CPUs:

cache_lock_cpus()
cache_reclaim()
cache_thread()

sched_pstats()
uvm_scheduler()
sysctl_alloc()

kpause()
sigsuspend1()
sys___sigsuspend14()
syscall()
--- syscall (number 294) ---

x86_stihlt()
acpicpu_cstate_idle_enter()
acpicpu_cstate_idle()
idle_loop()
cpu_hatch()

I rebooted, and found that the file I'd copied into the union mount was
complete and intact in the directory I had union mounted onto /usr/src/,
so it obviously got correctly written.  Running crash(8) on the core
dump shows:

: barsoom# ;crash -M netbsd.73.core -N netbsd.73
Crash version 8.99.30, image version 8.99.30.
System panicked: kernel diagnostic assertion "fli->fli_trans_cnt > 0" failed: file "/usr/src/sys/kern/vfs_trans.c", line 451
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NARCNET() at 0
?() at ffff9a73cceb1690
vpanic() at vpanic+0x178
ch_voltag_convert_in() at ch_voltag_convert_in
fstrans_done() at fstrans_done+0x126
VOP_UNLOCK() at VOP_UNLOCK+0x5b
vput() at vput+0x11
union_lookup1() at union_lookup1+0xfe
union_lookup() at union_lookup+0xa2
VOP_LOOKUP() at VOP_LOOKUP+0x52
lookup_once() at lookup_once+0x1ef
namei_tryemulroot() at namei_tryemulroot+0x45f
namei() at namei+0x29
fd_nameiat.isra.2() at fd_nameiat.isra.2+0x36
do_sys_statat() at do_sys_statat+0x87
sys_fstatat() at sys_fstatat+0x2d
syscall() at syscall+0x173
--- syscall (number 466) ---
7f7ff5c3f61a:

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay

