NetBSD-Users archive


Re: ongoing major problems with NetBSD-5 and LOCKDEBUG on multi-core system (mfi(4) related?)



On Tue, Jan 17, 2012 at 08:28:28PM -0800, Greg A. Woods wrote:
> After digging in the mfi(4) code a bit more, and poking through the
> current OpenBSD code (from which NetBSD's mfi(4) was long ago derived),
> I thought I had discovered a possible problem (as well as a few other
> bug fixes not yet imported to NetBSD).  The changes I made to mfi(4) are
> appended below my signature.  I completely removed the kernel_lock and
> reverted to using splbio() around only the same code OpenBSD uses it
> around (w.r.t. the code paths previously protected by the kernel_lock).

I don't think either of these is the right way.

Unfortunately, I don't know enough about our storage drivers these days
to recommend one as a good example for you to look at; but I think it
would be a much better idea to look at another storage driver and see
how the locking works than to copy OpenBSD or throw KERNEL_LOCK()
calls around, which is what the changes in the source tree seem to
have done.

What is being protected by those calls?  The driver, from another kernel
subsystem?  Or the driver, from itself?  If the latter, it would be much
better to think about the driver's actual locking needs and add or adjust
locks accordingly.

If the issue is that the driver needs to do something that could sleep,
and that it is sometimes called from interrupt context, the only correct
solution is to move that operation to a sleepable entity, which means
that the driver probably needs a softint callout, workqueue, or just a
kernel thread to do its deferred operations.  If not, then it's just
locking mistakes, but borrowing code from OpenBSD, where kernel
synchronization is totally different, isn't likely to help.
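
To make the deferral idea concrete, here is a rough userland sketch of
the pattern.  It is an analogy only, not mfi(4) code and not kernel
code: pthreads stand in for the kernel primitives, and every name in it
(work_queue, fake_intr, slow_operation, worker, run_demo) is made up
for illustration.  The "interrupt" side only queues a request under a
short-held lock; the worker thread, which is allowed to sleep, does the
actual work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QLEN 8

/* Hypothetical deferred-work queue; one lock protects all of its state. */
struct work_queue {
	pthread_mutex_t lock;
	pthread_cond_t  cv;        /* signalled on enqueue and on shutdown */
	int    items[QLEN];
	size_t head, count;
	bool   done;               /* no more work will be queued */
	int    processed;          /* items the worker has handled */
};

/* Placeholder for the operation that might sleep (I/O, allocation, ...). */
static void
slow_operation(int item)
{
	(void)item;
}

/* "Interrupt" side: queue the request and return; do no real work here. */
static bool
fake_intr(struct work_queue *q, int item)
{
	bool ok = false;

	pthread_mutex_lock(&q->lock);
	if (q->count < QLEN) {
		q->items[(q->head + q->count) % QLEN] = item;
		q->count++;
		ok = true;
		pthread_cond_signal(&q->cv);
	}
	pthread_mutex_unlock(&q->lock);
	return ok;
}

/* Sleepable side: dequeue under the lock, do the work with it dropped. */
static void *
worker(void *arg)
{
	struct work_queue *q = arg;

	pthread_mutex_lock(&q->lock);
	for (;;) {
		while (q->count == 0 && !q->done)
			pthread_cond_wait(&q->cv, &q->lock);
		if (q->count == 0)
			break;                 /* shut down and drained */
		assert(q->count > 0);
		int item = q->items[q->head];
		q->head = (q->head + 1) % QLEN;
		q->count--;
		pthread_mutex_unlock(&q->lock);
		slow_operation(item);          /* may sleep: lock not held */
		pthread_mutex_lock(&q->lock);
		q->processed++;
	}
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

/* Queue three requests, shut down, and report how many were handled. */
static int
run_demo(void)
{
	struct work_queue q = { .head = 0, .count = 0, .done = false,
	    .processed = 0 };
	pthread_t t;

	pthread_mutex_init(&q.lock, NULL);
	pthread_cond_init(&q.cv, NULL);
	if (pthread_create(&t, NULL, worker, &q) != 0)
		return -1;
	for (int i = 1; i <= 3; i++)
		(void)fake_intr(&q, i);
	pthread_mutex_lock(&q.lock);
	q.done = true;
	pthread_cond_signal(&q.cv);
	pthread_mutex_unlock(&q.lock);
	pthread_join(t, NULL);
	pthread_cond_destroy(&q.cv);
	pthread_mutex_destroy(&q.lock);
	return q.processed;
}
```

The point of the shape is that the lock covers only the queue
bookkeeping, so the side that must not sleep never waits on the slow
operation.  In a real driver the thread and queue would come from the
kernel's own facilities rather than from pthreads.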

Thor

