Current-Users archive


Re: specfs/spec_vnops.c diagnostic assertion panic



Here's a hypothesis about what happened.

- You have a RAID volume, say raid0, with a GPT partitioning it.

- raid0 is configured with an explicit /etc/raid0.conf file, rather
  than with an autoconfigured label.

- You have devpubd=YES in /etc/rc.conf; the relevant settings are
  sketched just below.
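
A minimal sketch of that setup, purely as an assumption about what the
affected machine looks like (names and contents will differ):

    # /etc/rc.conf (the line that matters for this hypothesis)
    devpubd=YES

    # /etc/raid0.conf exists, so /etc/rc.d/raidframe configures raid0
    # at boot instead of the kernel autoconfiguring it from its label.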

On boot, the following sequence of events occurs (a quick way to
confirm the ordering is shown after the list):

1. /etc/rc.d/devpubd launches devpubd, which synchronously enumerates
   devices and invokes hooks for all the devices it finds.  This
   _excludes_ raid0 because it hasn't been configured yet.

2. /etc/rc.d/raidframe configures raid0 from /etc/raid0.conf.

3. /etc/rc.d/fsck starts to run.
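
The ordering above follows from the rc.d dependency graph; one quick
way to confirm it on a given system is to ask rcorder(8) where each
script lands in the boot sequence:

    rcorder /etc/rc.d/* 2>/dev/null | grep -nE '/(devpubd|raidframe|fsck)$'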

At this point, two things happen concurrently:

(a) /etc/rc.d/fsck runs fsck on dkN (some wedge of raid0)
(b) devpubd wakes and runs `dkctl raid0 listwedges' in 02-wedgenames
    (roughly sketched below)
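
The hook is just a shell script that devpubd runs when a device
attaches.  The following is not the real 02-wedgenames, only a rough
sketch of its shape, assuming devpubd passes the event and the device
name as arguments:

    #!/bin/sh
    # Illustrative wedgenames-style devpubd hook (hypothetical).
    event="$1"
    device="$2"

    case "$event" in
    device-attach)
        # Listing the wedges opens the parent block device (raid0 in
        # the scenario above) -- this is the open that races with fsck.
        dkctl "$device" listwedges
        ;;
    esac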

fsck and dkctl race to work on raid0 -- the block device.  Sometimes
this happens without trouble.  Sometimes one will try to open it while
the other is still using it, and the open will fail with EBUSY.  But
sometimes one tries to open while the other has started, but not yet
finished, closing it -- and that's when the crash happens.  With my
last patch, it should just fail with EBUSY in that case too.
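
If you want to see the window for yourself, something along these
lines might exercise it -- the device names (raid0, dk0) are
assumptions, and this deliberately hammers the devices, so only try it
on a scratch system:

    # Open/close raid0 and its wedge from two processes at once.
    hammer_dk()   { while :; do dkctl raid0 listwedges >/dev/null 2>&1; done; }
    hammer_fsck() { while :; do fsck_ffs -n /dev/rdk0 >/dev/null 2>&1; done; }
    hammer_dk & pid1=$!
    hammer_fsck & pid2=$!
    sleep 30
    kill "$pid1" "$pid2"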

Now there's a higher-level issue here -- once we fix the kernel crash,
dkctl as called via devpubd -> 02-wedgenames might lose the race with
fsck and then fail to create the wedge, or fsck might lose the race
and, well, fail to fsck your file system, both of which might be bad.

So we need to find a way to deal with this race even after we fix the
kernel crash.
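
For illustration only: the crudest band-aid on the userland side would
be to have the hook retry when the parent device is busy, rather than
giving up on the first EBUSY.  This is not a real fix for the ordering
problem, and the variable names here are assumptions:

    # Inside a wedgenames-style hook: retry a few times if raid0 is
    # still busy (e.g. fsck has it open), instead of failing once.
    tries=0
    until dkctl "$device" listwedges; do
        tries=$((tries + 1))
        [ "$tries" -ge 5 ] && break
        sleep 1
    done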

