Subject: Re: xbd backend disconnection
To: Jed Davis <jdev@panix.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: port-xen
Date: 09/19/2005 00:27:40
On Fri, Sep 16, 2005 at 11:14:23PM -0400, Jed Davis wrote:
> 
> While testing live migration, I discovered that the xbd(4) backend
> processes disconnect requests immediately, even if I/O is still
> pending.  In that case, the kernel panics when the response message is
> written to the then-unmapped ring.
> 
> I was also able to reproduce the panic by destroying a domain
> performing I/O; in my case it was by configuring a swap device and
> causing the domain to use it heavily.
> 
> The attached patch adds a reference count on the backend instance,
> much like the one Linux has, and defers deallocating resources and
> responding to the disconnect request until the I/O is done.
> uvm_km_free can't be called from interrupt context, since it takes a
> lock, so I deferred that to the destroy message handling -- and, if
> the device is reconnected instead, reuses the still-allocated page.
> (I don't know if that case would normally be reached.)
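The refcount scheme described in the quoted text can be modelled roughly as below. This is a minimal user-space sketch, not the patch itself: every name (struct backend, backend_unref, io_start, io_done, handle_disconnect, handle_destroy) is hypothetical. One reference belongs to the connection and one to each in-flight request. The DISCONNECT reply and the ring unmap happen only when the last reference is dropped, and freeing the page is left to DESTROY because of the uvm_km_free restriction mentioned above. The reconnect/reuse case is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one backend instance; the names are
   illustrative only and are not the real xbdback structures. */
struct backend {
    int refcnt;              /* 1 for the connection + 1 per in-flight I/O */
    bool ring_mapped;        /* shared ring usable for responses */
    bool page_allocated;     /* ring VA still allocated */
    bool disconnect_pending; /* DISCONNECT seen, reply not sent yet */
    bool disconnect_replied;
    int responses_on_ring;
};

/* Drop one reference; the last one unmaps the ring and answers DISCONNECT.
   This may run from I/O completion (interrupt) context, so it does not free
   the page: that is left to the DESTROY handler. */
static void backend_unref(struct backend *be) {
    assert(be->refcnt > 0);
    if (--be->refcnt == 0) {
        be->ring_mapped = false;
        if (be->disconnect_pending) {
            be->disconnect_pending = false;
            be->disconnect_replied = true; /* deferred DISCONNECT reply */
        }
    }
}

/* A request handed to the disk takes a reference. */
static void io_start(struct backend *be) { be->refcnt++; }

static void io_done(struct backend *be) {
    /* This I/O still holds a reference, so the ring cannot have been
       unmapped; without the refcount, this write is where the panic hits. */
    assert(be->ring_mapped);
    be->responses_on_ring++;
    backend_unref(be);
}

static void handle_disconnect(struct backend *be) {
    be->disconnect_pending = true;
    backend_unref(be); /* drop the connection's own reference */
}

/* Thread context: releasing the ring page is allowed here. */
static void handle_destroy(struct backend *be) {
    assert(be->refcnt == 0);
    be->page_allocated = false;
}
```

With two requests in flight, handle_disconnect() sends no reply. The reply goes out from the second io_done(), after both responses have been written while the ring was still mapped.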

Hi,
I've tried your patch and it doesn't work well: on my test system,
a reboot of a domU causes the xbdback driver to go catatonic. It seems that
it never gets the CMSG_BLKIF_BE_DESTROY message, possibly because its
CMSG_BLKIF_BE_DISCONNECT reply is missed. xend sends a CMSG_BLKIF_BE_CREATE
anyway, and things go wrong from there.
It's possible that not sending the reply message in place causes this;
in fact, all other users of ctrl_if_send_response() reply in place.

Anyway, I think it should be possible to get this working differently:
without deferring the DISCONNECT and DESTROY messages, while still
handling pending I/O appropriately. I'll look at this later (time to go to
bed :)
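Purely as an illustration, here is one possible reading of "reply in place, but still handle pending I/O". It is not necessarily the approach intended here, and all names (struct xbd_inst, io_submit, io_complete, ctrl_disconnect, ctrl_destroy) are hypothetical. Both control messages are answered immediately. Completions that arrive after DISCONNECT drop their response instead of touching the ring. The ring page is freed in the DESTROY handler (thread context), and only the small instance record waits for the last completion. The sketch assumes releasing that record from completion context is acceptable. It also ignores the guest data pages that the pending I/O is still using.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not the real xbdback softc. */
struct xbd_inst {
    int pending_io;          /* requests handed to the disk, not completed */
    bool ring_mapped;        /* shared ring usable for responses */
    bool ring_page_freed;    /* ring VA released (thread context only) */
    bool destroyed;          /* DESTROY received */
    bool inst_freed;         /* instance record released */
    int responses_on_ring;
    int responses_dropped;
    int ctrl_replies;        /* DISCONNECT/DESTROY replies, all in place */
};

/* Release the record once DESTROY has arrived and no I/O is in flight. */
static void inst_maybe_release(struct xbd_inst *xi) {
    if (xi->destroyed && xi->pending_io == 0)
        xi->inst_freed = true;
}

static void io_submit(struct xbd_inst *xi) { xi->pending_io++; }

static void io_complete(struct xbd_inst *xi) {
    if (xi->ring_mapped)
        xi->responses_on_ring++;  /* normal path: write the response */
    else
        xi->responses_dropped++;  /* disconnected: never touch the ring */
    xi->pending_io--;
    inst_maybe_release(xi);
}

static void ctrl_disconnect(struct xbd_inst *xi) {
    xi->ring_mapped = false;      /* stop using the ring immediately */
    xi->ctrl_replies++;           /* reply in place */
}

static void ctrl_destroy(struct xbd_inst *xi) {
    xi->ring_page_freed = true;   /* thread context: freeing VA is allowed */
    xi->destroyed = true;
    xi->ctrl_replies++;           /* reply in place */
    inst_maybe_release(xi);
}
```

In this model xend always sees both replies before it sends CMSG_BLKIF_BE_CREATE again, which is the property the deferred-reply version seems to lose.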

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--