Subject: Re: Filesystems vs. device sector sizes
To: None <rumble@ephemeral.org>
From: Bill Stouder-Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 07/24/2007 22:41:20

On Tue, Jul 24, 2007 at 09:34:53PM -0400, Stephen M. Rumble wrote:
> Hi all,
>
> I've a quick question regarding how a filesystem should handle
> underlying sector sizes. In my case, EFS uses 512-byte sectors, but
> most cd-roms deal with 2048-byte blocks. For a read-only filesystem
> this should be rather straightforward to handle in the EFS code
> itself, although I think I'd have to be careful about deadlocking on
> getblk if consecutively requested 512-byte blocks lived within the
> same 2048-byte sector. Is this generally the right way to be thinking
> about this (making it the concern of the filesystem itself), or should
> I take some other approach?

You should take another approach. Well, keep it in the file system, but do
NOT split-cache things.

I don't know much about EFS, but on the SGI web site I saw this about XFS:

Physical Disk Sector Sizes Supported

512 bytes through to 32 kilobytes (in powers of 2), with the caveat that
the sector size must be less than or equal to the filesystem blocksize.

So I doubt that the EFS file system on that CD-ROM is going to use any
block sizes smaller than 2K. So you don't need to read 2K sectors and then
split them up.

> Similarly, whatever happened to Koji Imada's work[1] on DEV_BSIZE and
> Bill Studenmund's related changes[2]? Are these not pertinent to what
> I'm looking at?

They are pertinent.

What happened to my work? Zembu Labs and UBC, in that order. I left
NASA/Ames and so was no longer paid to work on this. So I didn't get a
chance to finish it. Then UBC came along, and a number of things which led
me to Koji's third approach lost importance. The cool thing about it was
that it would let the buffer cache cope with non-power-of-two
sectors, like the raw audio sectors on a CD.

The thing though is that with UBC, most of our caching happens in the VM
system. And things get VERY messy if the device sector size and the VM
page size are relatively prime. So the Audio CD idea goes out the window.
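
To make the sector-versus-page problem concrete, here is a minimal sketch
in C. The helper name sector_fits_page is hypothetical and is not a NetBSD
API. It only tests whether sectors tile pages cleanly, meaning one size is
a whole multiple of the other. The 2352-byte raw audio sector size and the
4096-byte page size used in the note after it are assumptions that do not
come from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a NetBSD API): returns true when sectors
 * tile pages cleanly, i.e. one size is a whole multiple of the other,
 * so no sector ever straddles a page boundary at an odd offset.
 */
static bool
sector_fits_page(uint32_t secsize, uint32_t pagesize)
{
	/* Precondition: both sizes must be non-zero. */
	assert(secsize != 0 && pagesize != 0);

	return pagesize % secsize == 0 || secsize % pagesize == 0;
}
```

Under those assumptions, 2048-byte sectors fit a 4096-byte page (two per
page), while 2352-byte sectors do not, so sectors would land at shifting
offsets across page boundaries.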

Chuck Silvers added support for non-512-byte sectors as part of UBC. It
probably needs more testing, but it's there.

We went with a different one of Koji's proposals. I think it was 4.3.2, to
be exact. We still have DEV_BSIZE, but it's just the unit of block
addresses in the buffer system. For disks with sectors larger than 512
bytes, you still have to do i/o that starts on a natural block boundary
and runs an integral number of blocks in the transfer. So all the driver
has to do is a little math and it all works out. The reason we went this
route is that a lot of UBC's operations happen in terms of bytes, so we're
doing this translation anyway and thus it washes out.
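
The "little math" might look roughly like the sketch below. This is an
illustration only, not the actual driver code: xfer_to_sectors is a
hypothetical helper, and it assumes the device sector size is a non-zero
multiple of the 512-byte block-address unit. It maps a transfer given as a
512-byte block number plus a byte count onto device sectors, and rejects
transfers that do not start on a sector boundary or do not cover a whole
number of sectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unit of block addresses in the buffer system, per the text above. */
#ifndef DEV_BSIZE
#define DEV_BSIZE 512
#endif

/*
 * Hypothetical helper (not a NetBSD API): translate a transfer given in
 * DEV_BSIZE block-address units into device sectors.  Returns false if
 * the transfer does not start on a natural sector boundary or is not an
 * integral number of sectors.
 */
static bool
xfer_to_sectors(int64_t blkno, uint32_t nbytes, uint32_t secsize,
    int64_t *sector, uint32_t *nsectors)
{
	/* Assumption: sector size is a non-zero multiple of DEV_BSIZE. */
	assert(secsize != 0 && secsize % DEV_BSIZE == 0);

	int64_t units_per_sector = secsize / DEV_BSIZE;

	if (blkno < 0 || blkno % units_per_sector != 0)
		return false;	/* not on a natural sector boundary */
	if (nbytes == 0 || nbytes % secsize != 0)
		return false;	/* not a whole number of sectors */

	*sector = blkno / units_per_sector;
	*nsectors = nbytes / secsize;
	return true;
}
```

With 2048-byte sectors, block number 8 and a 4096-byte transfer map to
sector 2 for 2 sectors, while a transfer starting at block number 1 is
rejected because it begins in the middle of a sector.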

We probably need more testing of all of this, but for the most part it
works according to chs last time I asked (which was a few releases ago).

Take care,

Bill
