tech-kern archive


Re: assumption about a device's maxphys



On Wed, Oct 10, 2012 at 12:02:21PM +0200, Manuel Bouyer wrote:
> 
> There's another case, which I think is worse: a raidframe volume, with
> underlying disks with different maxphys. If it's a raid-1 you can't predict
> from which disk a read will come, so you don't know the maxphys.

Sure you do.  The maxphys computation for RAIDframe is simple and
consistent: it is the smallest maxphys of any component disk, multiplied
by the number of data (not parity) elements in the set.

It works the same for RAID1 as for any other RAID level: a RAID1 set has
1 data element (so does a RAID0 set) so it is simply the smallest maxphys
of any disk in the set.
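
To make the rule concrete, here is a minimal sketch of that computation
in plain C.  It is not RAIDframe code: the function name, the array of
per-component limits, and the ndata parameter are all hypothetical,
chosen only to illustrate "smallest component maxphys times the number
of data elements".

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from RAIDframe: compute a set's
 * maxphys as the smallest maxphys of any component, multiplied by
 * the number of data (not parity) elements in the set.
 */
size_t
raid_set_maxphys(const size_t *component_maxphys, size_t ncomponents,
    size_t ndata)
{
	size_t i, smallest;

	assert(ncomponents > 0 && ndata > 0);

	/* Find the most restrictive component. */
	smallest = component_maxphys[0];
	for (i = 1; i < ncomponents; i++) {
		if (component_maxphys[i] < smallest)
			smallest = component_maxphys[i];
	}

	/* Scale by the number of data elements. */
	return smallest * ndata;
}
```

With ndata == 1, as in the RAID1 case described above, the result is
just the smallest component maxphys.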

> I wouldn't expect this case (a volume composed of multiple disks with
> different maxphys) to be that common, so I'm not sure we should optimise
> for this. The volume's maxphys would be the lowest of all the devices'
> maxphys.
> 
> A tricky case is when the new maxphys would be smaller. You would need
> to suspend filesystem operations before changing it, but I'm not sure
> all filesystems support this. Maybe support for splitting the request in
> VOP_STRATEGY() or another appropriate place would be better?

That is my thinking: a device driver whose maxphys can change should split
requests as necessary, since there is no real atomicity guarantee for a
request larger than a single sector anyhow.  It could potentially even
combine requests, though that is a matter of performance, not correctness
(xbdback can already do this).
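
As a rough illustration of the splitting, the sketch below carves one
byte range into pieces no larger than the current maxphys.  It is plain
C, not kernel code: split_request(), issue_fn, and count_bytes() are
hypothetical names, and a real driver would work on its buffer
structures and handle completion of the pieces rather than bare
offsets and lengths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback that issues one sub-request. */
typedef void (*issue_fn)(size_t offset, size_t length, void *cookie);

/*
 * Split the range [offset, offset + length) into pieces of at most
 * maxphys bytes and hand each piece to issue().  Returns the number
 * of pieces issued.
 */
size_t
split_request(size_t offset, size_t length, size_t maxphys,
    issue_fn issue, void *cookie)
{
	size_t npieces = 0;

	assert(maxphys > 0);

	while (length > 0) {
		/* The last piece may be shorter than maxphys. */
		size_t chunk = length < maxphys ? length : maxphys;

		issue(offset, chunk, cookie);
		offset += chunk;
		length -= chunk;
		npieces++;
	}
	return npieces;
}

/* Example callback: add up the bytes issued into *cookie. */
void
count_bytes(size_t offset, size_t length, void *cookie)
{
	(void)offset;
	*(size_t *)cookie += length;
}
```

Because each piece is issued independently, the caller gets no more
atomicity than it had before, which is the point made above.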

I really would like to avoid reaching out through multiple layers of
indirect reference (or via several stacked up function pointers,
defeating branch prediction etc.) every time we schedule an I/O or
even consider which pages to push or pull on a single I/O.

Thor

