Subject: Re: Corrupt data when reading filesystems under Linux guest
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: port-xen
Date: 06/10/2005 20:39:41
On Sat, Jun 11, 2005 at 12:52:11AM +0200, Manuel Bouyer wrote:
> On Fri, Jun 10, 2005 at 01:05:03AM +0000, Jed Davis wrote:
> > 
> > The less-easy next thing: gluing together consecutive requests where
> > applicable; e.g., a 64k transfer broken into 44k and 20k parts.
> 
> I'm not sure we want to go that far. The request may be split for a
> valid reason, like a write ordering requirement for the filesystem
> (although the underlying disk will probably break this anyway :(
> Or if we want to go this route, it would probably be better to do it at
> another level, so that other subsystem parts benefit from it.

I don't think that's right at all, for several reasons:

1) We cannot do this in an MI way in the only obvious place in the system
   to do it, which is disksort(), because on some architectures the
   mapping operations required to glue the transfers together are far
   too expensive (which is why the changes offered on the mailing lists
   to do exactly that were rejected long ago).  But the Xen
   backend is inherently tied to the current architecture (and a small
   number of related ones, perhaps) and it's reasonable to do the
   mapping operations there.

2) Not doing it *wrecks* performance by doubling the number of IOPS
   needed to handle a client OS doing the perfectly reasonable thing
   and sending us 64K writes on the assumption that, just like a Linux
   domain0, we will merge them.

3) If you only merge forward in the ring, you can't break filesystem
   ordering constraints, but you _will_ fix the problem where 64K from
   the client turns into 44K + 20K.
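
To make (3) concrete, here is a rough sketch of what I mean by merging
forward only. It is not the actual backend code: struct blk_req,
merge_forward(), SECTOR_SIZE and MAX_XFER are names made up for
illustration, and the real ring entries carry page references that would
also have to be mapped, which this ignores. The idea is simply that a
request is only ever glued onto the one immediately before it in the
ring, so nothing is reordered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u
#define MAX_XFER    (64u * 1024u)  /* assumed ceiling for one transfer */

/* Hypothetical, simplified request; NOT the real Xen ring layout. */
struct blk_req {
	uint64_t sector;    /* first sector of the transfer */
	uint32_t nsect;     /* length in sectors */
	int      is_write;  /* nonzero for writes */
};

/*
 * Coalesce requests in place, walking the array in ring order.
 * A request is merged only into its immediate predecessor, and only if
 * it has the same direction, starts exactly where the predecessor ends,
 * and the combined size fits in MAX_XFER.  Requests are never reordered.
 * Returns the number of requests left after merging.
 */
static size_t
merge_forward(struct blk_req *r, size_t n)
{
	size_t out = 0;  /* index of the request currently being grown */

	assert(r != NULL || n == 0);
	if (n == 0)
		return 0;

	for (size_t i = 1; i < n; i++) {
		struct blk_req *cur = &r[out];
		const struct blk_req *next = &r[i];
		uint64_t bytes =
		    ((uint64_t)cur->nsect + next->nsect) * SECTOR_SIZE;

		if (next->is_write == cur->is_write &&
		    next->sector == cur->sector + cur->nsect &&
		    bytes <= MAX_XFER) {
			cur->nsect += next->nsect;  /* glue onto predecessor */
		} else {
			r[++out] = *next;           /* keep as its own request */
		}
	}
	return out + 1;
}
```

With 512-byte sectors, a 44K write at sector 0 (88 sectors) followed by
a 20K write at sector 88 (40 sectors) comes out as a single 128-sector,
64K request, while a non-adjacent request or one in the other direction
is left alone and stays in its original position.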

-- 
 Thor Lancelot Simon	                                      tls@rek.tjls.com

"The inconsistency is startling, though admittedly, if consistency is to be
 abandoned or transcended, there is no problem."		- Noam Chomsky