Subject: Re: Corrupt data when reading filesystems under Linux guest
To: Jed Davis <jdev@panix.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: port-xen
Date: 06/11/2005 00:52:11
On Fri, Jun 10, 2005 at 01:05:03AM +0000, Jed Davis wrote:
> In article <d88qle$s6r$1@sea.gmane.org>, Jed Davis <jdev@panix.com> wrote:
> > In article <1118281429.532953.2723.nullmailer@yamt.dyndns.org>,
> > YAMAMOTO Takashi  <yamt@mwd.biglobe.ne.jp> wrote:
> > > 
> > > besides that, xen_shm assumes an arbitrary request can be mapped into
> > > a single contiguous virtual address range.  it's wrong.
> > 
> > It looks like it might be possible to fix that by changing xbdback_io
> > to send off multiple xbdback_requests.  And with that done right, it
> > shouldn't be too hard to glue together request segments that can be
> > mapped into contiguous VM.
> > 
> > So I might even be able to fix this myself, if no-one more knowledgeable
> > is working on it.
> 
> And I think I have; the case that failed before is now working,
> although it's very, very slow because I'm not yet making any attempt to
> combine the transfers, and neither is disksort() for reasons Thor just
> explained to me out-of-band as I was writing this, so the disk (which is
> apparently quite old) gets a lot of 512-byte transfers (which would be
> 4k if Linux weren't being bizarre).

Thanks for looking at this! I had seen, reading the Linux xen 2.0.6 kernel
changes, that there would probably be a problem here, but I hadn't had the
time to look at it myself.

> 
> I'm including the patch I have so far (I'd use MIME, but I'm doing this
> through gmane with trn, and am not convinced of my ability to get the
> headers right by hand).  I wasn't quite sure of the best way to have the
> Xen request responded to only when the last I/O finished; I ended up
> malloc(9)ing an int, pointing to it in the xbdback_request, and using
> that as a reference count.  And there's also a big FIXME comment that's
> due to my not knowing exactly how asynchronous things are here.
> 
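
That scheme seems reasonable to me. For readers of the archive, here is a
minimal sketch of it as I understand it (all names below, including
alloc_request(), are my guesses, not necessarily what your patch uses):

	#include <sys/param.h>
	#include <sys/malloc.h>

	struct xbdback_request {
		/* ... mapping, struct buf, etc. ... */
		int *xr_refcnt;	/* shared among all sub-I/Os */
	};

	/* Splitting side: one xbdback_request per mappable chunk. */
	static void
	xbdback_io_split(int nchunks)
	{
		int *refcnt, i;

		refcnt = malloc(sizeof(int), M_DEVBUF, M_WAITOK);
		*refcnt = nchunks;
		for (i = 0; i < nchunks; i++) {
			struct xbdback_request *xr = alloc_request();
			xr->xr_refcnt = refcnt;
			/* map chunk i into kernel VA, start the I/O */
		}
	}

	/* Completion side: only the last sub-I/O answers Xen. */
	static void
	xbdback_iodone(struct xbdback_request *xr)
	{
		if (--(*xr->xr_refcnt) == 0) {
			free(xr->xr_refcnt, M_DEVBUF);
			/* queue the response on the Xen ring */
		}
	}

Note that if completions can interrupt each other, the decrement in
xbdback_iodone() needs a lock or atomic operation; that may be what
your FIXME about asynchrony is about.
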
> The easy next thing: gluing together the scatter/gather segments where
> applicable.
> 
> The less-easy next thing: gluing together consecutive requests where
> applicable; e.g., a 64k transfer broken into 44k and 20k parts.

I'm not sure we want to go that far. The request may have been split for a
valid reason, like a write-ordering requirement from the filesystem
(although the underlying disk will probably break this anyway :().
Or, if we do want to go this route, it would probably be better to do it
at another level, so that other parts of the subsystem benefit from it.
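
Gluing the scatter/gather segments within a single request looks fine to
me, though. Something like this should be enough to detect a mappable run
(sketch only; I'm assuming each segment names a frame plus first/last
512-byte sector within it, as in the xen2 block interface, and the names
are approximate):

	#define SECTS_PER_FRAME	8	/* 512-byte sectors per 4k frame */

	/*
	 * Two consecutive segments can share one contiguous mapping
	 * iff the first runs to the end of its frame and the second
	 * starts at the beginning of its own.
	 */
	static int
	segs_contiguous(int prev_last_sect, int next_first_sect)
	{
		return prev_last_sect == SECTS_PER_FRAME - 1 &&
		    next_first_sect == 0;
	}

The caller would just walk the segments, extending the current
xbdback_request while segs_contiguous() holds and starting a new one
where it doesn't.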

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference