Subject: Re: Corrupt data when reading filesystems under Linux guest
To: Jed Davis <jdev@panix.com>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: port-xen
Date: 06/14/2005 15:26:36
On Tue, Jun 14, 2005 at 02:42:47AM +0000, Jed Davis wrote:
> In article <d8j08f$pt3$1@sea.gmane.org>, Jed Davis <jdev@panix.com> wrote:
> > 
> > And the other changes I'm making don't, so far as I know, sidestep this
> > issue.  I think I'll have to chain the actual IOs together, toss them
> > if a pool_get fails, run them all at the end of the segment loop, and
> adjust the xen_shm callback to match.  Except that the callback can fail
> > to install for want of memory, it looks like.  That's... annoying.
> 
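
Chaining them could be as simple as a SIMPLEQ (a sketch only; the names
and the xbdback_io layout are invented, and the failure path would also
have to unwind the xen_shm mappings):

#include <sys/queue.h>

static struct pool xbdback_io_pool;     /* assumed initialized elsewhere */

struct xbdback_io {
        SIMPLEQ_ENTRY(xbdback_io) xio_entry;
        /* buf, mapping, ... */
};
SIMPLEQ_HEAD(xioq, xbdback_io);

int
xbdback_build_chain(struct xioq *q, int nsegs)
{
        struct xbdback_io *xio;
        int i;

        SIMPLEQ_INIT(q);
        for (i = 0; i < nsegs; i++) {
                xio = pool_get(&xbdback_io_pool, PR_NOWAIT);
                if (xio == NULL) {
                        /* toss the chain: give back what we already got */
                        while ((xio = SIMPLEQ_FIRST(q)) != NULL) {
                                SIMPLEQ_REMOVE_HEAD(q, xio_entry);
                                pool_put(&xbdback_io_pool, xio);
                        }
                        return ENOMEM;
                }
                SIMPLEQ_INSERT_TAIL(q, xio, xio_entry);
        }
        return 0;       /* caller runs the whole chain at end of loop */
}
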
> The resource-allocation problem is even more fun than that.  It looks
> as though, when the pool of xbdback_requests runs out -- and it will;
> just create enough xbd's and throw enough IO at them -- any further
> requests get the equivalent of EIO, whereas they should, I think, just
> be left in the ring.  That on top of the xen_shm_callback thing.

Yes, this is something that can happen. I've not seen it in real usage
yet, though.
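
The fix would be to stop consuming the ring when pool_get() fails and
resume later, instead of failing the request with an error. Roughly
like this (a simplified sketch, not the actual xbdback code: the ring
is reduced to a bare index pair and xbdback_start_io() stands in for
the real segment handling):

static struct pool xbdback_request_pool; /* assumed initialized elsewhere */

struct xbd_ring {
        unsigned int req_prod;  /* advanced by the frontend */
        unsigned int req_cons;  /* advanced by the backend */
};

void
xbdback_consume(struct xbd_ring *ring)
{
        struct xbdback_request *xreq;

        while (ring->req_cons != ring->req_prod) {
                xreq = pool_get(&xbdback_request_pool, PR_NOWAIT);
                if (xreq == NULL) {
                        /*
                         * Pool exhausted: leave the remaining requests
                         * in the ring instead of failing them; we run
                         * again when an xbdback_request is freed.
                         */
                        break;
                }
                /* set up and start the I/O for slot req_cons */
                xbdback_start_io(xreq, ring->req_cons);
                ring->req_cons++;
        }
}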

> Now, the Linux backend handles this by spinning off a kernel thread
> to do the request-juggling; they don't have to care about blocking
> allocation in that context, and indeed don't.  I don't know what the
> NetBSD view on this sort of thing is.

I'd prefer it if we didn't go with a kernel thread for this. A kernel
thread implies context switches, which are not cheap. I think we can just
use a callback-based mechanism (much like xen_shm does) around pool_put().
It would be nice if pool(9) already had support for this :)
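
Lacking that, a wrapper around pool_put() could do it; something along
these lines (made-up names, locking omitted):

/* at most one pending callback, armed when pool_get() fails */
static void     (*xbdback_pool_cb)(void *);
static void     *xbdback_pool_cb_arg;

void
xbdback_pool_set_cb(void (*cb)(void *), void *arg)
{
        /* called at the pool_get(PR_NOWAIT) failure point */
        xbdback_pool_cb = cb;
        xbdback_pool_cb_arg = arg;
}

void
xbdback_pool_put(struct pool *pp, void *item)
{
        void (*cb)(void *);

        pool_put(pp, item);
        if ((cb = xbdback_pool_cb) != NULL) {
                /* an item is free again: restart the ring consumer */
                xbdback_pool_cb = NULL;
                cb(xbdback_pool_cb_arg);
        }
}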

> 
> Furthermore, the Linux backend accepts only a certain number (by
> default, 16 out of a possible 64 pending; though, if they really
> coalesce in the disksort-equivalent as they appear to, why not just do
> one at a time?) of requests at once from a single domain -- i.e., a
> simple round-robin arrangement -- presumably to keep one domain from
> crowding out all the rest.

Yes, I've seen this code. I didn't want to implement it because I think
the backend is the wrong place for this sort of thing. We need some sort
of I/O scheduler, and doing it closer to the disk's buffer queue would
improve the whole system, not only the Xen backend.
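
For reference, the per-domain cap in the Linux backend amounts to
something like this (a sketch of the idea with invented names, not
their code; struct xbd_ring is the index pair from the earlier sketch):

#define XBDBACK_BATCH   16      /* the Linux default */

struct xbdback_instance {
        SLIST_ENTRY(xbdback_instance) next;
        struct xbd_ring ring;
};
static SLIST_HEAD(, xbdback_instance) xbdback_instances =
    SLIST_HEAD_INITIALIZER(xbdback_instances);

/*
 * One round-robin pass over all instances, taking at most
 * XBDBACK_BATCH requests from each ring, so that one busy domain
 * cannot crowd out the others.
 */
void
xbdback_round_robin(void)
{
        struct xbdback_instance *xbdi;
        int n;

        SLIST_FOREACH(xbdi, &xbdback_instances, next) {
                for (n = 0; n < XBDBACK_BATCH; n++) {
                        if (xbdi->ring.req_cons == xbdi->ring.req_prod)
                                break;  /* ring empty */
                        xbdback_dequeue_one(xbdi);
                }
        }
}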

--
Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
     NetBSD: 26 years of experience will always make the difference