tech-kern: Re: cgd->vnd panic in BETA

Subject: Re: cgd->vnd panic in BETA_2.0
To: Jason Thorpe <thorpej@shagadelic.org>
From: Daniel Carosone <dan@geek.com.au>
List: tech-kern
Date: 09/22/2004 09:32:46

On Sun, Sep 19, 2004 at 12:01:34PM -0700, Jason Thorpe wrote:
>
> On Sep 19, 2004, at 11:47 AM, Roland C. Dowdeswell wrote:
>
> >Basically, when cgdiodone() is called it checks to
> >see if there are any pending transactions that need to be completed
> >and if there are it will start them.  When the cgd(4) is configured
> >on a vnd(4), this jumps into file system code which expects to be
> >run from a process context not an interrupt context.
>
> In general, we probably need to make sure that all disk drivers'
> "strategy" routines are safe for calling from interrupt context.

Yes.  When we thought about the solution to the cgd allocation problem
that is now in dk_subr, this issue of scheduling new requests in
interrupt context from the completion event came up - but we forgot
about vnd :-(

> For vnd, this probably means having a kernel thread waiting around to
> do the actual processing.

That's certainly something that would work.  I've been thinking about
this, and perhaps there's another way, too.  Not sure which is better,
or more elegant.

We added the deferred queuing (and re-issue on completion events) to
dk_subr to allow cgd(4) to fail "softly" in case of allocation
shortages.  There's always at least one buffer allocated, but we might
have to wait for an underlying disk event to complete before it
becomes available again.  Doing this greatly simplified the internals
of cgd(4), and other drivers with allocation problems (raidframe
being one) could benefit in the same way.
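
To make the deferred-queue idea concrete, here is a rough userland
sketch in C.  It is a model only: every name in it (softq, softq_kick
and so on) is hypothetical and none of it is the dk_subr interface.
It assumes a single guaranteed buffer, a small fixed-size queue, and
integer request ids standing in for real buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; these names are not the dk_subr interface. */
#define SOFTQ_LEN 8

struct softq {
	int    pending[SOFTQ_LEN]; /* ids of requests waiting to start */
	size_t head, count;        /* ring-buffer state for pending[] */
	bool   buf_busy;           /* the one guaranteed buffer is in use */
	int    inflight;           /* id of the request holding the buffer */
	int    completed;          /* number of requests finished so far */
};

/* Try to start one request; returning false is the "soft" failure. */
static bool
softq_try_start(struct softq *q, int id)
{
	if (q->buf_busy)
		return false;
	q->buf_busy = true;
	q->inflight = id;
	return true;
}

/* Start pending requests, in order, until one fails softly. */
static void
softq_kick(struct softq *q)
{
	while (q->count > 0) {
		if (!softq_try_start(q, q->pending[q->head]))
			break;
		q->head = (q->head + 1) % SOFTQ_LEN;
		q->count--;
	}
}

/* Strategy-like entry point: queue the request, then try to start it. */
static bool
softq_submit(struct softq *q, int id)
{
	if (q->count == SOFTQ_LEN)
		return false;
	q->pending[(q->head + q->count) % SOFTQ_LEN] = id;
	q->count++;
	softq_kick(q);
	return true;
}

/* Completion event: release the buffer and re-issue deferred work. */
static void
softq_iodone(struct softq *q)
{
	assert(q->buf_busy);
	q->buf_busy = false;
	q->completed++;
	softq_kick(q);
}
```

Note that softq_iodone() calls softq_kick() directly.  In the kernel
the analogous call happens from the completion (interrupt) path, and
that is the step that causes trouble when the lower layer is vnd.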

Perhaps the solution to this case is to invert the problem, and make
vnd(4) also be behind dk_subr.  This would allow vnd(4) to softly fail
(based on its own criteria, which might include vnode locking
protocols as well as not having a suitable context), and have dk_subr
queue the request until a later time.
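
A minimal sketch of what such a soft failure might look like, again
as a userland model in C with hypothetical names (vnd_model,
generic_drain) rather than real kernel interfaces.  The two criteria
are just the ones mentioned above, and the use of EAGAIN as the
"try again later" signal is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state a vnd-like driver might consult before starting I/O. */
struct vnd_model {
	bool in_interrupt; /* caller has no process context */
	bool vnode_locked; /* backing vnode lock is held elsewhere */
	int  issued;       /* requests actually passed to the file system */
};

/*
 * Start callback: returns 0 if the request was issued, or EAGAIN to
 * ask the generic layer to keep it queued and retry later.
 */
static int
vnd_model_start(struct vnd_model *v)
{
	if (v->in_interrupt || v->vnode_locked)
		return EAGAIN;
	v->issued++;
	return 0;
}

/*
 * Generic layer: issue queued requests until the driver declines.
 * Returns the number of requests still queued.
 */
static int
generic_drain(struct vnd_model *v, int queued)
{
	assert(queued >= 0);
	while (queued > 0 && vnd_model_start(v) == 0)
		queued--;
	return queued;
}
```

With in_interrupt or vnode_locked set, generic_drain() leaves every
request on the queue and nothing reaches the file system; once both
are clear, the same call drains the queue.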

This might help with some of the other potentially gnarly vnd cases, too.

The question is, can vnd come up with a suitable completion event, in
a suitable context, to restart the dk queue?  Perhaps a hybrid
solution is needed, whereby that's what the kernel thread does?
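
One possible shape for that hybrid, sketched as a single-threaded
userland model in C.  All names are hypothetical.  The "interrupt"
and "thread" contexts are just separate functions here; a real kernel
thread would sleep until woken instead of being called in a loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in NetBSD. */
struct hybrid {
	int  queued;   /* requests waiting in the generic queue */
	int  inflight; /* requests handed to the backing file */
	int  done;     /* requests completed */
	bool wakeup;   /* set by submit/iodone, consumed by the worker */
};

/* Strategy entry: only queues the request and asks the worker to run. */
static void
hybrid_submit(struct hybrid *h)
{
	h->queued++;
	h->wakeup = true;
}

/*
 * Completion handler: modelled as interrupt context, so it must not
 * start new I/O.  It records the completion and requests a restart.
 */
static void
hybrid_iodone(struct hybrid *h)
{
	assert(h->inflight > 0);
	h->inflight--;
	h->done++;
	h->wakeup = true;
}

/*
 * One pass of the worker: modelled as thread context, where calling
 * into file system code is safe.  Returns the number of requests started.
 */
static int
hybrid_worker_pass(struct hybrid *h)
{
	int started = 0;

	if (!h->wakeup)
		return 0;
	h->wakeup = false;
	while (h->queued > 0) {
		h->queued--;
		h->inflight++;
		started++;
	}
	return started;
}
```

The point of the split is that hybrid_iodone() never starts I/O
itself; the queue is only ever restarted from hybrid_worker_pass().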

--
Dan.



