Subject: Re: Overlapping bread(9) behaviour
To: None <rumble@ephemeral.org>
From: Bill Stouder-Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 07/03/2007 17:16:48

On Tue, Jul 03, 2007 at 02:12:18PM -0400, Stephen M. Rumble wrote:
> Quoting Bill Stouder-Studenmund <wrstuden@netbsd.org>:
>
> >Yes. One requirement of the buffer cache is that any block on disk is
> >cached in exactly one place at any one time. Where that is can change
> >(say as a file gets deleted and the underlying disk re-used), but there's
> >only one place at any one time.
>
> Okay, but the bread() case of two invocations with the same offset,
> yet different size parameters is still poorly handled. Should an
> assertion be made? I don't understand why bread() should happily fetch
> an in-core buffer of length N - epsilon, allocate a new buffer of
> length N, copy the smaller buffer's contents, and return it claiming
> it's truly of length N. Is this actually useful and used, or does
> everybody simply not do this? If the latter, I'd like to make sure
> that it can't be done.

Folks simply don't do this.
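To make the hazard concrete, here's a toy userland sketch (not the real bread(9)/getblk code, and all names here are made up): a cache keyed by block number alone can't tell that a second lookup asked for a different size, so an assertion at lookup time would catch the mismatch instead of silently allocating-and-copying.

```c
/*
 * Toy model of the overlap hazard discussed above.  A real buffer
 * cache is far more involved; this only shows where a size-mismatch
 * assertion would sit.
 */
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS	16

struct toy_buf {
	long	blkno;		/* block number on disk */
	size_t	size;		/* size this buffer was first read with */
	int	valid;
};

static struct toy_buf cache[CACHE_SLOTS];

/* Look up (or install) the buffer for blkno; insist sizes agree. */
static struct toy_buf *
toy_getblk(long blkno, size_t size)
{
	struct toy_buf *bp = &cache[blkno % CACHE_SLOTS];

	if (bp->valid && bp->blkno == blkno) {
		/*
		 * The check the mail asks about: same block, different
		 * size, panic rather than copy into a bigger buffer.
		 */
		assert(bp->size == size);
		return bp;
	}
	bp->blkno = blkno;
	bp->size = size;
	bp->valid = 1;
	return bp;
}
```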

> >>A few related questions: If the buffer cache expects fixed-sized
> >>buffers, does that mean for some filesystems there could be a 124-byte
> >>struct buf for each block of cached data? Also, do we not have any
> >>filesystems with extents where this sort of thing would have cropped
> >>up before?
> >
> >You could have 124-byte blocks if you wanted. The problem is that you
> >can't do atomic i/o on them, and we implicitly assume you can atomically
> >write a buffer block.
>
> I was referring to the size of struct buf, rather than some queer
> block size.

My apologies.

So the answer is yes, there's a 124-byte (on some architectures)
structure per buffer-cache block.

> Where do the atomic i/o assumptions stem from? Is this a guarantee
> provided by the disk? And if so, how strong is this guarantee? I.e.,
> are devices specifically designed to either write a complete block or
> nothing at all?

It's atomic in terms of the driver. We issue the operation to disk, then
it tells us if it worked or not.

Exactly how strong this promise is depends on the disk.

> >Extents don't matter here. Extents are still ranges of fixed-size blocks,
> >they just are described differently in the file metadata.
> >
> >Also, we have a limit to the maximum physical transfer size. Right now
> >it's 64k. We want to raise that, but I don't think you want to be doing
> >much over 256k in general. So you won't want a 1:1 mapping between extents
> >and buffer entries, you'll want 1:many.
>
> I think I could do a 1:1 mapping for indirect extents, as they were
> artificially limited in length by SGI. The on-disk structures are
> capable of exceeding that 64k limit, however.
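The 1:many split falls out of simple arithmetic; a sketch, assuming the current 64k transfer limit quoted above (TOY_MAXPHYS and the function name are invented for illustration):

```c
/*
 * How many maximum-transfer-sized buffer entries an extent of a
 * given byte length would need, under the 64k limit from above.
 */
#include <assert.h>

#define TOY_MAXPHYS	(64UL * 1024)	/* current 64k transfer limit */

static unsigned long
bufs_per_extent(unsigned long extent_bytes)
{
	/* round up: a partial trailing chunk still needs a buffer */
	return (extent_bytes + TOY_MAXPHYS - 1) / TOY_MAXPHYS;
}
```

An extent no longer than 64k maps to a single buffer (the 1:1 case for SGI's length-limited indirect extents); anything larger forces the 1:many mapping.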

Take care,

Bill
