Subject: Re: RFC (reassign)buf and carvinf up buffers (was Re: SCSI MMC device abstraction and UDF patch for review)
To: Reinoud Zandijk <>
From: Bill Studenmund <>
List: tech-kern
Date: 12/29/2005 13:45:52

On Thu, Dec 29, 2005 at 09:47:25PM +0100, Reinoud Zandijk wrote:
> On Thu, Dec 29, 2005 at 09:58:51AM -0800, Bill Studenmund wrote:
> > > That implies having a VOP_BMAP() figuring this out. Since UDF can't use a
> > > VOP_BMAP this way (due to write shuffling) it would mean that VOP_BMAP
> > > needs to distinguish between read and write requests and for read-requests
> > > try to figure out how much it can read in one go... quite expensive and
> > > locking trouble prone.
> >
> > This does not imply VOP_BMAP() figuring this out.
> >
> > The file system decides what data goes into what buffers. The file system
> > knows what blocks are where. Thus you don't have to figure all of this out
> > in the middle of your strategy routine, you can figure it out when you
> > make the buffers in the first place.
> >
> > More directly, you SHOULD figure it out before your strategy routine.
> Since UDF uses genfs, genfs decides the number of blocks to request by the
> `runp' variable set by its VOP_BMAP() call to the file system. Since UDF
> bmap is a 1:1 translation it always returns the maximum run length with
>    *runp = MAXPHYS / lb_size;
> to make full use of long extents to read to reduce the number of
> transactions as much as possible. Note that this isn't happening yet but
> that's the idea behind it. If I otoh return 0 or 1 I get lb_size or
> 2*lb_size. So prolly I'll have to subtract 1 from the *runp assignment :)

You should return the number of blocks that are contiguous. If the next
MAXPHYS bytes are all together, return the runp above. If not, return less.

> > No, a VOP_STRATEGY() call does NOT represent a read/write that has nothing
> > to do with disk mapping, it represents a read or write of a buffer. Said
> > buffer represents an extent on disk. One extent. If you have multiple
> > extents in your transfer, you are dealing with multiple buffers.
> True, the read or write of a buffer that is created by genfs. So I always
> have to return a run length of one then, losing all hope of multi-sector
> reads?

Actually look at your metadata layout. You (the fs) know where the blocks
are. You should know how many blocks are contiguous for the passed-in
offset. If you don't have MAXPHYS / lb_size worth of data in a row, return
the amount you have. If you do have (MAXPHYS / lb_size) worth, return
MAXPHYS / lb_size.

The test should be a simple conditional. It shouldn't be that hard. :-)
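
As a rough sketch of that conditional (not UDF's actual code): the extent
record, the bmap_run() helper and the MAXPHYS value below are all made up
for illustration. It also assumes, based on the observation quoted above
that returning 0 yields lb_size, that the run counts the blocks following
the first one, hence the "- 1".

```c
#include <stddef.h>
#include <stdint.h>

#define MAXPHYS (64 * 1024)	/* assumed value; the real one is port-specific */

/*
 * Hypothetical extent record: `len' file blocks starting at file block
 * `lblk' are stored contiguously on disk starting at disk block `pblk'.
 */
struct extent {
	uint64_t lblk;
	uint64_t pblk;
	uint64_t len;
};

/*
 * Hypothetical helper: translate file block `lbn' to a disk block and
 * report how many further blocks follow it contiguously, capped at one
 * MAXPHYS-sized transfer. Returns 0 on success, -1 if lbn is unmapped.
 */
static int
bmap_run(const struct extent *ext, size_t next, uint64_t lbn,
    uint32_t lb_size, uint64_t *bnp, int *runp)
{
	uint64_t max_blocks = MAXPHYS / lb_size;

	for (size_t i = 0; i < next; i++) {
		if (lbn < ext[i].lblk || lbn >= ext[i].lblk + ext[i].len)
			continue;	/* lbn is not inside this extent */

		uint64_t off = lbn - ext[i].lblk;
		uint64_t left = ext[i].len - off;	/* includes lbn itself */

		/* The simple conditional: what we have, or the maximum. */
		uint64_t blocks = left < max_blocks ? left : max_blocks;

		*bnp = ext[i].pblk + off;
		*runp = (int)(blocks - 1);	/* assumption: run excludes lbn */
		return 0;
	}
	return -1;
}
```

The sketch stops at the end of the extent containing lbn; it does not try
to merge extents that happen to be adjacent on disk.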

For sane files, you're always going to find a lot of blocks in a row. So
most of the time you are going to do (MAXPHYS / lb_size) blocks.

If you really have a file that is significantly fragmented, then you have
a severe performance issue. The time it takes to figure all of this out in
VOP_BMAP() will be next to nothing compared to disk access time. So while
you need to handle it, you don't need to worry about it performing well.
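
To illustrate the fragmented case, here is a hedged sketch of cutting one
logical range into per-extent pieces, one piece per contiguous run on
disk. The struct names and split_by_extent() are hypothetical, and the
sketch assumes the extents are sorted by file block and that a hole is
simply an error.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record, sorted by lblk in the caller's array. */
struct extent {
	uint64_t lblk;	/* first file block */
	uint64_t pblk;	/* first disk block */
	uint64_t len;	/* length in blocks */
};

/* One contiguous on-disk piece of a transfer. */
struct piece {
	uint64_t pblk;
	uint64_t nblks;
};

/*
 * Split file blocks [lbn, lbn + count) into contiguous disk pieces.
 * Returns the number of pieces, or -1 on a hole, on running past the
 * last extent, or if `out' is too small.
 */
static int
split_by_extent(const struct extent *ext, size_t next, uint64_t lbn,
    uint64_t count, struct piece *out, size_t maxout)
{
	size_t n = 0;

	for (size_t i = 0; i < next && count > 0; i++) {
		uint64_t end = ext[i].lblk + ext[i].len;

		if (lbn >= end)
			continue;	/* extent lies before the range */
		if (lbn < ext[i].lblk || n == maxout)
			return -1;	/* hole, or no room for a piece */

		uint64_t off = lbn - ext[i].lblk;
		uint64_t take = ext[i].len - off;
		if (take > count)
			take = count;

		out[n].pblk = ext[i].pblk + off;
		out[n].nblks = take;
		n++;

		lbn += take;
		count -= take;
	}
	return count == 0 ? (int)n : -1;
}
```

A badly fragmented file just yields more, shorter pieces; the walk itself
is cheap next to the disk accesses it describes.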

Take care,

