Subject: Re: Overlapping bread(9) behaviour
To: None <rumble@ephemeral.org>
From: Bill Stouder-Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 07/03/2007 10:08:45

On Tue, Jul 03, 2007 at 10:18:55AM -0400, Stephen M. Rumble wrote:
> Quoting Bill Stouder-Studenmund <wrstuden@netbsd.org>:
>
> >On Sun, Jul 01, 2007 at 03:01:44PM -0400, Thor Lancelot Simon wrote:
> >
> EFS doesn't do things properly at present, but it doesn't trigger this
> error, and I could easily make it conform.
>
> In any event, I'd either like to fix this so it behaves as expected,
> or update the documentation and add some assertions so people don't
> trip over this in the future.
>
> The following makes things behave as I had expected. Whether this is
> complete or correct is beyond my present knowledge to appropriately
> judge.
>
> --- vfs_bio.c   17 May 2007 14:51:42 -0000      1.172
> +++ vfs_bio.c   3 Jul 2007 14:13:12 -0000
> @@ -1081,6 +1081,8 @@
>         if (ISSET(bp->b_flags, B_LOCKED)) {
>                 KASSERT(bp->b_bufsize >= size);
>         } else {
> +               if (size > bp->b_bcount)
> +                       CLR(bp->b_flags, B_DONE);
>                 allocbuf(bp, size, preserve);
>         }
>         BIO_SETPRIO(bp, BPRIO_DEFAULT);
>
> It's also not terribly efficient, triggering a re-read of any
> overlapping region. Also, I suppose if buffers were to overlap while
> not beginning from the same offset, we'd have duplication and
> terrible, unexpected things could happen.

Yes. One requirement of the buffer cache is that any block on disk is
cached in exactly one place at any one time. Where that is can change
(say, as a file is deleted and the underlying disk space is reused), but
there's only one place at any one time.

> A few related questions: If the buffer cache expects fixed-sized
> buffers, does that mean for some filesystems there could be a 124-byte
> struct buf for each block of cached data? Also, do we not have any
> filesystems with extents where this sort of thing would have cropped
> up before?

You could have 124-byte blocks if you wanted. The problem is that you
can't do atomic i/o on them, since the disk only writes whole sectors
(typically 512 bytes), and we implicitly assume you can atomically write
a buffer block.

Extents don't matter here. Extents are still ranges of fixed-size blocks;
they are just described differently in the file metadata.

Also, we have a limit on the maximum physical transfer size (MAXPHYS).
Right now it's 64k. We want to raise that, but I don't think you want to
be doing much over 256k in general. So you won't want a 1:1 mapping
between extents and buffer entries; you'll want 1:many.

ext2fs uses extents and is in tree.

Take care,

Bill
