tech-kern archive

Re: assumption about a device's maxphys

On Wed, Oct 10, 2012 at 08:22:05AM -0400, Thor Lancelot Simon wrote:
> On Wed, Oct 10, 2012 at 12:02:21PM +0200, Manuel Bouyer wrote:
> > 
> > There's another case, which I think is worse: a raidframe volume, with
> > underlying disks with different maxphys. If it's a raid-1 you can't predict
> > which disk a read will come from, so you don't know the maxphys.
> Sure you do.  The maxphys computation for RAIDframe is simple and
> consistent: it is the smallest maxphys of any component disk, multiplied
> by the number of data (not parity) elements in the set.
> It works the same for RAID1 as for any other RAID level: a RAID1 set has
> 1 data element (so does a RAID0 set) so it is simply the smallest maxphys
> of any disk in the set.
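
[The rule quoted above can be sketched in C roughly as follows. This is
only an illustration: raid_maxphys() and its parameters are hypothetical
names chosen for this example, not RAIDframe's actual code, and it assumes
the component list is known and fixed at the time of the call.]

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a NetBSD interface): compute a volume's
 * maxphys as the smallest component maxphys multiplied by the number
 * of data (not parity) elements in the set.
 */
static size_t
raid_maxphys(const size_t *component_maxphys, size_t ncomponents,
    size_t ndata)
{
	size_t smallest;
	size_t i;

	assert(ncomponents > 0 && ndata > 0);

	/* Find the most restrictive component. */
	smallest = component_maxphys[0];
	for (i = 1; i < ncomponents; i++) {
		if (component_maxphys[i] < smallest)
			smallest = component_maxphys[i];
	}

	/* Scale by the number of data elements in the set. */
	return smallest * ndata;
}
```

[With one data element, as described for RAID1, the result is simply the
smallest component value.]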

But this assumes you know the disks at config time. The problem is that
you can add or remove drives from a raid volume, which may change the

> > I wouldn't expect this case (a volume composed of multiple disks with
> > different maxphys) to be that common, so I'm not sure we should optimise
> > for this. The volume's maxphys would be the lowest of all the devices'
> > maxphys.
> > 
> > A tricky case is when the new maxphys would be smaller. You would need
> > to suspend filesystem operations before changing it, but I'm not sure
> > all filesystems support this. Maybe support for splitting the request in
> > VOP_STRATEGY() or another appropriate place would be better?
> That is my thinking: a device driver whose maxphys can change should split
> (and potentially even combine, though this is a matter of performance not
> correctness -- xbdback can already do this, though) requests as
> necessary, since there is no real atomicity guarantee for a request
> larger than a single sector anyhow.

This is one solution, but I think it should be centralised. No need to
replicate this in every driver that needs it.
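
[A minimal sketch of what a single, shared splitting routine might look
like. Everything here is hypothetical: split_request(), the issue_fn
callback type and count_chunk() are names invented for this example and
are not existing NetBSD interfaces. It only shows the splitting
arithmetic; combining requests and real buffer handling are left out.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: issue one transfer of len bytes at offset off. */
typedef int (*issue_fn)(size_t off, size_t len, void *arg);

/*
 * Hypothetical central helper: break a request of len bytes into
 * pieces no larger than maxphys and hand each piece to issue().
 * Stops at the first error and returns it; returns 0 on success.
 */
static int
split_request(size_t off, size_t len, size_t maxphys, issue_fn issue,
    void *arg)
{
	int error;

	assert(maxphys > 0);

	while (len > 0) {
		size_t chunk = len < maxphys ? len : maxphys;

		error = issue(off, chunk, arg);
		if (error != 0)
			return error;
		off += chunk;
		len -= chunk;
	}
	return 0;
}

/* Example callback that only records what it was asked to transfer. */
struct count_state {
	size_t calls;	/* number of pieces issued */
	size_t bytes;	/* total bytes across all pieces */
	size_t largest;	/* largest single piece seen */
};

static int
count_chunk(size_t off, size_t len, void *arg)
{
	struct count_state *st = arg;

	(void)off;
	st->calls++;
	st->bytes += len;
	if (len > st->largest)
		st->largest = len;
	return 0;
}
```

[A driver whose limit can change would pass its current maxphys on each
call, so the caller never needs to know the value in advance.]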

> I really would like to avoid reaching out through multiple layers of
> indirect reference (or via several stacked up function pointers,
> defeating branch prediction etc.) every time we schedule an I/O or
> even consider which pages to push or pull on a single I/O.

I agree.

Manuel Bouyer
     NetBSD: 26 years of experience will always make the difference