Subject: Re: port bio(4) and bioctl(8) from openbsd ?
To: Bill Stouder-Studenmund <wrstuden@netbsd.org>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-kern
Date: 04/22/2007 23:01:41
On Sat, Apr 21, 2007 at 11:17:37PM -0700, Bill Stouder-Studenmund wrote:
> > I've an issue with lock usage in bio(4):
> > there is a bio_lock, which is a kmutex_t to protect the internal structures
> > of bio(4). A disk driver (for example mfi(4)) will also have its own
> > lock to protect its structure; and will probably call bio_register()
> > and bio_unregister() with this lock held.
> > The callback from bio(4) to mfi(4) shall be called with bio_lock held, to
> > avoid having the entry removed from under its feets (it's not the
> > case right now, but as nothing calls bio_unregister() for now, it's
> > safe).
> > Now we could have this deadlock (theorically; because mfi doesn't have
> > a mfi_detach):
> > mfi_detach() takes the mfi mutex, and calls bio_unregister() which takes
> >   bio_lock.
> > bioioctl() takes bio_lock, and call backs in mfi(4) which will want to
> >   take the mfi mutex.
> > 
> > I'm not a multithread expert programmer. How is such case handled usually ?
> 
> There are lots of ways to do this. One is reference counting. Another is 
> some sort of in-use flag. The idea is to have a second thing, other than 
> the lock, that indicates that it's in use. So once you do whatever, you 
> can be sure that the structure won't get wiped out from underneath you.
> 
> One way to do things would be to set a flag when the callout gets started
> (before it is enqueued). Then clear the flag when the callout is done
> running and has the mutex locked. If someone (the destroyer) wants to
> delete the device (or whatever it is), if the "callout hasn't finished"
> flag is set (remember to have the mutex locked when looking), set another
> flag that says "tell me when you're done" & sleep waiting for the callout
> to finish. Likewise, when the callout is clearing its "I'm running" flag,
> it does something else to wake up the destroyer.

Note that the issue is not with callouts here, but with callbacks between
2 layers (bio and the underlying driver)

> 
> Reference counting is similar except it needs les corrdination. But it 
> does need you to be able to free memory/do final cleanup whenever you 
> release a reference. That might not be true in an interrupt handler.

After reading this and thinking more about it, I think disk controller
drivers will have similar problems with other layers than bio (think bufq)
at detach time. I'll go with a detach flag in mfi or something like that.

Would anyone object if I commit the code as is, with bio_unregister() just
calling panic() for now, and dealing properly with this later when
mfi(4) gets converted to mutexes ? We need this for mfi(4), as currently
there's no way to report drive failures with NetBSD (not even a line
in dmesg). I want to get this pulled up to netbsd-4 and possibly netbsd-3,
and I'd like to first commit something not too far from the code we'll
have in the branches.

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--