tech-kern archive

Re: assumption about a device's maxphys



On Wed, Oct 10, 2012 at 04:04:02PM +0200, Manuel Bouyer wrote:
> On Wed, Oct 10, 2012 at 08:22:05AM -0400, Thor Lancelot Simon wrote:
> > On Wed, Oct 10, 2012 at 12:02:21PM +0200, Manuel Bouyer wrote:
> > > 
> > > There's another case, which I think is worse: a raidframe volume, with
> > > underlying disks with different maxphys. If it's a raid-1 you can't
> > > predict which disk a read will come from, so you don't know the maxphys.
> > 
> > Sure you do.  The maxphys computation for RAIDframe is simple and
> > consistent: it is the smallest maxphys of any component disk, multiplied
> > by the number of data (not parity) elements in the set.
> > 
> > It works the same for RAID1 as for any other RAID level: a RAID1 set has
> > 1 data element (so does a RAID0 set) so it is simply the smallest maxphys
> > of any disk in the set.
> 
> But this assumes you know the disks at config time. The problem is that
> you can add or remove drives from a raid volume, which may change the
> maxphys. 

In the short term, I think RAIDframe should simply disallow such
operations.  Teaching RAIDframe to split I/O is not going to be pleasant
and, in practice, adding devices with smaller maxphys to an existing
RAID set while it's running should be very uncommon.

RAIDframe could also, of course, impose the old MAXPHYS default on a
per-component basis, ensuring we cannot ever make the problem any worse
than it was -- only better.  We could allow overriding this on a per-set
basis.
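
A minimal sketch of the computation discussed above: the smallest maxphys
of any component, multiplied by the number of data elements, with an
optional per-component clamp to the old default. This is not RAIDframe
code. The function name, the parameters, and the 64 KiB value used for
LEGACY_MAXPHYS are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for the old MAXPHYS default; the real value is port-specific. */
#define LEGACY_MAXPHYS (64u * 1024u)

/*
 * Hypothetical helper: maxphys of a RAID set is the smallest maxphys of
 * any component, times the number of data (not parity) elements.
 * If clamp_legacy is nonzero, each component is first limited to
 * LEGACY_MAXPHYS, so the set can never exceed what the old limit allowed.
 * Returns 0 for an empty set or a set with no data elements.
 */
static uint64_t
raid_set_maxphys(const uint32_t *comp_maxphys, size_t ncomp,
    unsigned data_cols, int clamp_legacy)
{
	uint32_t min = UINT32_MAX;
	size_t i;

	if (ncomp == 0 || data_cols == 0)
		return 0;
	assert(comp_maxphys != NULL);

	for (i = 0; i < ncomp; i++) {
		uint32_t m = comp_maxphys[i];

		if (clamp_legacy && m > LEGACY_MAXPHYS)
			m = LEGACY_MAXPHYS;
		if (m < min)
			min = m;
	}
	/* Widen before multiplying so large sets cannot overflow 32 bits. */
	return (uint64_t)min * data_cols;
}
```

With one data element, as in the RAID1 case, the result is simply the
smallest component value. The clamp flag corresponds to the per-set
override: leaving it on keeps the old behaviour, turning it off lets the
set use whatever its components report.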

> > That is my thinking: a device driver whose maxphys can change should split
> > (and potentially even combine, though this is a matter of performance not
> > correctness -- xbdback can already do this, though) requests as
> > necessary, since there is no real atomicity guarantee for a request
> > larger than a single sector anyhow.
> 
> This is one solution, but I think it should be centralised. No need to
> replicate this in every driver that needs it.

There are very, very few drivers that would ever need this -- only drivers
that can hide multiple disks beneath a single virtual device.  We could
provide a common utility function, perhaps (I notice since the last time
I looked at implementing this in disksort(), it may have become easier,
because we grew something called nestiobuf, though I am not sure it is
right for this purpose) but I don't think we should force all I/O through
a layer that does this; it is almost never wanted.
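
As a rough illustration of what such a common utility function might look
like, here is a userland sketch that splits one request into
sector-aligned chunks no larger than the current maxphys. It does not use
nestiobuf or any kernel interface. Every name in it (split_request,
chunk_fn, chunk_stats, record_chunk) is hypothetical, and a real version
would operate on struct buf rather than bare offsets and lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback that issues one chunk; nonzero return aborts the split. */
typedef int (*chunk_fn)(uint64_t offset, size_t len, void *arg);

/*
 * Split [offset, offset + len) into chunks of at most maxphys bytes,
 * rounded down to a multiple of sector_size so that every chunk except
 * possibly the last covers whole sectors.
 */
static int
split_request(uint64_t offset, size_t len, size_t maxphys,
    size_t sector_size, chunk_fn issue, void *arg)
{
	size_t chunk_max;

	assert(issue != NULL);
	assert(sector_size > 0 && maxphys >= sector_size);
	chunk_max = maxphys - (maxphys % sector_size);

	while (len > 0) {
		size_t n = len < chunk_max ? len : chunk_max;
		int error = issue(offset, n, arg);

		if (error != 0)
			return error;
		offset += n;
		len -= n;
	}
	return 0;
}

/* Example callback state: counts chunks and tracks the largest one seen. */
struct chunk_stats {
	size_t count;
	size_t largest;
	size_t total;
};

static int
record_chunk(uint64_t offset, size_t len, void *arg)
{
	struct chunk_stats *st = arg;

	(void)offset;
	st->count++;
	st->total += len;
	if (len > st->largest)
		st->largest = len;
	return 0;
}
```

A driver that hides several disks behind one virtual device could call
something like this only when a request exceeds the limit of the
component it is about to be sent to, so ordinary I/O never passes through
the splitting path.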

Thor

