Subject: Re: RFC (reassign)buf and carving up buffers (was Re: SCSI MMC device abstraction and UDF patch for review)
To: Reinoud Zandijk <reinoud@netbsd.org>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 12/29/2005 13:45:52

On Thu, Dec 29, 2005 at 09:47:25PM +0100, Reinoud Zandijk wrote:
> On Thu, Dec 29, 2005 at 09:58:51AM -0800, Bill Studenmund wrote:
> > > That implies having a VOP_BMAP() figuring this out. Since UDF can't use a
> > > VOP_BMAP this way (due to write shuffling) it would mean that VOP_BMAP
> > > needs to distinguish between read and write requests and for read-request
> > > try to figure out how much it can read in one go... quite expensive and
> > > locking trouble prone.
> >
> > This does not imply VOP_BMAP() figuring this out.
> >
> > The file system decides what data goes into what buffers. The file system
> > knows what blocks are where. Thus you don't have to figure all of this out
> > in the middle of your strategy routine, you can figure it out when you
> > make the buffers in the first place.
> >
> > More directly, you SHOULD figure it out before your strategy routine.
>
> Since UDF uses genfs, genfs decides the number of blocks to request by the
> `runp' variable set by its VOP_BMAP() call to the filingsystem. Since UDF's
> bmap is a 1:1 translation it allways returns the maximum runlength with
>
>    *runp = MAXPHYS / lb_size;
>
> to make full use of long extents to read to reduce the number of
> transactions as much as possible. Note that this isn't happening yet but
> thats the idea behind it. If i otoh return 0 or 1 i get lb_size or
> 2*lb_size. So prolly i'll have to substract 1 from the *runp assignment :)

You should return the number of blocks that are contiguous. If the next
MAXPHYS bytes' worth of blocks are all together, return the runp above. If
not, return less.

> > No, a VOP_STRATEGY() call does NOT represent a read/write that has nothing
> > to do with disk mapping, it represents a read or write of a buffer. Said
> > buffer represents an extent on disk. One extent. If you have multiple
> > extents in your transfer, you are dealing with multiple buffers.
>
> true the read or write of a buffer that is created by genfs. So i allways
> have to return a runlength of one then and loosing all hope on multi-sector
> reads?

Actually look at your metadata layout. You (the fs) know where the blocks
are. You should know how many blocks are contiguous for the passed-in
offset. If you don't have MAXPHYS / lb_size worth of data in a row, return
the amount you have. If you do have (MAXPHYS / lb_size) worth, return
MAXPHYS / lb_size.

The test should be a simple conditional. It shouldn't be that hard. :-)
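
To make that concrete, here is a rough sketch of the conditional, not code
from UDF or genfs. The struct extent type and the contig_run() helper are
made up for illustration; a real file system would walk its own allocation
descriptors instead. The sketch also does not merge two extents that happen
to be physically adjacent.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent descriptor; not a NetBSD or UDF structure. */
struct extent {
	uint64_t lblk_start;	/* first logical block covered by the extent */
	uint64_t pblk_start;	/* physical block that lblk_start maps to */
	uint64_t nblks;		/* length of the extent in blocks */
};

/*
 * Return how many blocks are contiguous on disk starting at logical
 * block lblk, capped at max_blks (for example MAXPHYS / lb_size).
 * Returns 0 if lblk is not covered by any extent.
 */
static uint64_t
contig_run(const struct extent *ext, size_t next, uint64_t lblk,
    uint64_t max_blks)
{
	for (size_t i = 0; i < next; i++) {
		const struct extent *e = &ext[i];

		if (lblk >= e->lblk_start && lblk - e->lblk_start < e->nblks) {
			/* Blocks left in this extent, counting lblk itself. */
			uint64_t left = e->nblks - (lblk - e->lblk_start);

			/* The simple conditional: a full run, or what we have. */
			return left < max_blks ? left : max_blks;
		}
	}
	return 0;
}
```

Going by the behaviour described in the quoted text (a *runp of 0 yields one
lb_size read), *runp appears to count blocks beyond the first, so a caller
would presumably store the result minus one when it is nonzero.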

For sane files, you're always going to find a lot of blocks in a row. So
most of the time you are going to do (MAXPHYS / lb_size) blocks.

If you really have a file that is significantly fragmented, then you have
a severe performance issue. The time it takes to figure all of this out in
VOP_BMAP() will be next to nothing compared to disk access time. So while
you need to handle it, you don't need to worry about it performing well.

Take care,

Bill
