Subject: Re: Filesystems vs. device sector sizes
To: None <rumble@ephemeral.org>
From: Bill Stouder-Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 07/25/2007 22:46:25

On Wed, Jul 25, 2007 at 10:41:06AM -0400, Stephen M. Rumble wrote:
> Quoting Bill Stouder-Studenmund <wrstuden@netbsd.org>:
> 
> > So I doubt that the EFS file system on that CD Rom is not going to use any
> > block sizes smaller than 2K. So you don't need to read 2K sectors then
> > split them up.
> 
> Is the 'not' above unintended? EFS definitely does not have variable
> block sizes. It's absolutely fixed at 512 bytes. Further, SGI always
> shipped 512b addressable drives, so far as I'm aware.

The 'not' was unintended.

If EFS really really needs file system blocks smaller than device blocks,
you need to use something like vnd. I don't envision us ever hacking the
buffer cache to handle sub-device-block entities.

> In this case, is there any option other than splitting sectors? For =20
> writable file systems this would probably complicate doing updates in =20
> a consistent way significantly, but I don't need to worry about that, =20
> especially when the device block size is not EFS-native (i.e. a cd-rom).

Yeah, you'd need to split sectors, and probably need vnd to do it.
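
Roughly, the split is just this (a sketch only, not vnd's actual code;
read_device_sector() is a made-up stand-in for whatever the underlying
driver provides):

    #include <stdint.h>
    #include <string.h>

    #define FS_BSIZE   512     /* EFS block size */
    #define DEV_SSIZE  2048    /* CD-ROM sector size */

    /* Hypothetical hook: read one DEV_SSIZE-byte sector into buf. */
    extern int read_device_sector(uint64_t sector, void *buf);

    /*
     * Read one 512-byte file system block off a 2k-sector device by
     * reading the enclosing device sector and copying out the slice.
     */
    int
    read_fs_block(uint64_t fsblk, void *dst)
    {
        unsigned ratio = DEV_SSIZE / FS_BSIZE;   /* 4 */
        uint8_t sec[DEV_SSIZE];
        int error;

        error = read_device_sector(fsblk / ratio, sec);
        if (error)
            return error;
        memcpy(dst, sec + (fsblk % ratio) * FS_BSIZE, FS_BSIZE);
        return 0;
    }

Reads are the easy half; a 512b write turns into a read-modify-write of
the whole 2k sector, which is where the consistency worry you mention
comes from. Read-only EFS on a CD dodges that entirely.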

> >>Similarly, whatever happened to Koji Imada's work[1] on DEV_BSIZE and
> >>Bill Studenmund's related changes[2]? Are these not pertinent to what
> >>I'm looking at?
> >
> >They are pertinent.
> 
> [snip]
> 
> Thanks for the quick overview. I think I need to read more of Koji's
> posts to better understand what the problem was, and what it intended
> to solve. I had initially assumed it sought to make the buffer cache
> magically provide the desired block size for a consumer, regardless of
> the underlying sector size. Yet I don't immediately see how updates
> could be easily done with consistency in such a scenario.

The main thing the work aimed at was making DEV_BSIZE not be an absolute.
In older unix, EVERYTHING had to use DEV_BSIZE blocks, so hard disks would
need reformatting for other OSs. NeXT, along with a number of other unix
vendors, set DEV_BSIZE to 1k, and you had to reformat disks accordingly.
As an aside, they did this because you got more usable space on the disk
(about 4%, as I remember).

It was not intended to "magically provide the desired block size for a
consumer". It was intended to let a system cope with 512b-sector and
2k-sector disks at the same time, with both using the same file system
code.
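
Concretely, all the rescaling has to do is something like this (again a
sketch; in the tree the byte<->block conversions are spelled with the
btodb()/dbtob() macros):

    #include <stdint.h>

    #define DEV_BSIZE  512   /* the historical unit block numbers use */

    /*
     * Rescale a block number in DEV_BSIZE units to a hardware sector
     * number.  On a 2k-sector disk only every 4th DEV_BSIZE block
     * starts a sector; anything else can't be issued to the device
     * without splitting a sector.
     */
    int
    blkno_to_sector(uint64_t blkno, unsigned secsize, uint64_t *secp)
    {
        unsigned ratio = secsize / DEV_BSIZE;   /* 2048/512 = 4 */

        if (blkno % ratio != 0)
            return -1;           /* unaligned: needs sector splitting */
        *secp = blkno / ratio;
        return 0;
    }

A file system whose own block size is at least the sector size never
hits the unaligned case, which is why 512b-block EFS on a 2k-sector
device is the awkward one.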

My recollection was that sector splitting was relegated to vnd or
something like it.

> >We probably need more testing of all of this, but for the most part it
> >works according to chs last time I asked (which was a few releases ago).
> 
> Shouldn't this all be exercised when we, for example, read a cd9660
> file system?

Kinda, and that part works fine. The deal is that cd9660 was written on
systems with DEV_BSIZE == 512. So the CD drivers and the file system have
all been developed to work together. I _think_ they are consistent with
what we're talking about, but it's not necessarily true. After all, using
cd9660 on a CD drive is using a file system designed for 2k sectors on a
device with 2k sectors. A real test would be to see if you could dd an iso
to a hard disk (512b sectors), point cd9660 at it, and have it all just
work.
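
You can sanity-check the copied image from userland first: ISO 9660 puts
its volume descriptor sequence at 2k-sector 16 (byte offset 32768), with
the magic "CD001" at byte 1 of each descriptor. Something like this
(a userland sketch; point it at the image file or the raw disk):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /*
     * The ISO 9660 volume descriptor sequence starts at 2k-sector 16
     * (the Primary Volume Descriptor is normally first); bytes 1-5 of
     * each descriptor are the magic "CD001".
     */
    int
    main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "image.iso";
        char vd[2048];
        int fd;

        fd = open(path, O_RDONLY);
        if (fd == -1 || pread(fd, vd, sizeof(vd), 16 * 2048) !=
            (ssize_t)sizeof(vd)) {
            perror(path);
            return 1;
        }
        if (memcmp(vd + 1, "CD001", 5) != 0) {
            fprintf(stderr, "%s: no ISO 9660 descriptor at sector 16\n",
                path);
            return 1;
        }
        printf("%s: looks like ISO 9660; try mounting it\n", path);
        close(fd);
        return 0;
    }

If that finds the descriptor, the bits survived the copy, and any
remaining mount failure is in the block-size handling rather than the
data.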

Take care,

Bill
