Subject: Re: Supporting sector size != DEV_BSIZE
To: Darrin B. Jewell <jewell@mit.edu>
From: Trevin Beattie <trevin@xmission.com>
List: tech-kern
Date: 06/24/2002 23:13:57
At 11:43 PM 6/24/2002 -0400, Darrin B. Jewell wrote:
>
>Bill Studenmund <wrstuden@netbsd.org> writes:
>| >   . units based on the ffs superblock (see FFS_DEV_BSIZE below)
>| 
>| Note: those are file system blocks aka frags.
>
>I would like to carefully assert that my definition of FFS_DEV_BSIZE
>is explicitly not the file system fragment size.  Under my definition,

I suspect Bill simply didn't look closely enough at your macro definitions.
The unit I'm sure you're talking about is 2^(fs_fshift - fs_fsbtodb) bytes,
which I'm beginning to see originally meant "disk blocks", a.k.a. sectors,
but somewhere along the line got mixed up with DEV_BSIZE.
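
To make that unit concrete, here is a minimal sketch in C.  The struct
mini_fs and the function ffs_dev_bsize() are names made up for this
illustration; only the field names fs_fshift and fs_fsbtodb come from the
superblock.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two superblock fields involved;
 * the real superblock has many more members. */
struct mini_fs {
    int32_t fs_fshift;   /* log2 of the fragment size in bytes */
    int32_t fs_fsbtodb;  /* shift from fragment number to disk block number */
};

/* Size in bytes of the "disk block" implied by the superblock:
 * 2^(fs_fshift - fs_fsbtodb). */
uint32_t ffs_dev_bsize(const struct mini_fs *fs)
{
    return (uint32_t)1 << (fs->fs_fshift - fs->fs_fsbtodb);
}
```

With the values in the dumpfs output quoted further down (fsize shift 11,
fsbtodb 0) this comes to 2048 bytes.  A hypothetical file system with 1K
fragments (shift 10) and fsbtodb 1 would come to 512 bytes.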

>
>I also _always_ define the kernel constant DEV_BSIZE to be 512 and
>_never_ use a different value for it.  By treating it as a fundamental
>constant that never changes and is never retrieved from persistent media,
>it becomes an independent unit.

Too bad not all implementations have treated it as such.  :-/

>| >   . units based on the disklabel d_secsize
>| >      ( this should always match the hardware device)
>| 
>| Note: the latter isn't necessarily true. If you take a disk image & move
>| it to another system, it may change. Folks wish to continue using the
>| disklabel number.
>
>This is why I mentioned it.  I am not as adamant about this,
>but I was thinking that the in core value for this field
>should always match the hardware sector size.

I tend to agree with you on this point.  For one thing, if the disk label's
sector size were smaller than the media's sector size, then I/O on the
device could fail when the kernel tries to read partial sectors.  On the
other hand, d_secsize should be used in preference to the real sector size
(if different) when interpreting the units of other members of the disk
label, such as p_offset and p_size.
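
As a sketch of what "interpreting the units" means here: the partition's
position in bytes is its label values scaled by the label's own sector
size, whatever the media reports.  The struct mini_part and both helper
functions are hypothetical, pared down to the fields named above.

```c
#include <stdint.h>

/* Hypothetical, pared-down partition entry: just the two fields
 * discussed above, both counted in units of d_secsize. */
struct mini_part {
    uint32_t p_size;    /* partition length, in label sectors */
    uint32_t p_offset;  /* partition start, in label sectors */
};

/* Byte offset of the start of the partition on the device. */
uint64_t part_start_bytes(uint32_t d_secsize, const struct mini_part *p)
{
    return (uint64_t)p->p_offset * d_secsize;
}

/* Length of the partition in bytes. */
uint64_t part_length_bytes(uint32_t d_secsize, const struct mini_part *p)
{
    return (uint64_t)p->p_size * d_secsize;
}
```

The 64-bit result matters: a 32-bit sector count times a 2048-byte sector
size overflows 32 bits quite easily.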

> Currently,
>the device strategy routines use d_secsize to interpret
>bp->b_blkno.   If d_secsize does not match the hardware sector
>size, then the device strategy routines will need to be
>modified to do the appropriate conversion.

AFAIK, NetBSD's device strategy routines all use DEV_BSIZE to interpret
b_blkno.  I think it was done to simplify working with the new buffer cache
and layered file systems.  (Of course, that was all implemented while I
wasn't looking. :)  It sounds like you're looking at things from a
different flavor of BSD.  Which one?
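
If b_blkno really is counted in DEV_BSIZE units, then a driver for a
device with larger sectors has to convert before it talks to the hardware.
A rough sketch of that conversion, not taken from any actual driver;
blkno_to_sector() is a hypothetical helper, and DEV_BSIZE is defined
locally as 512 for the example:

```c
#include <stdint.h>

#define DEV_BSIZE 512  /* assumed fixed at 512 for this sketch */

/* Convert a block number counted in DEV_BSIZE units into a hardware
 * sector number.  Returns 0 on success, or -1 if the request is
 * negative, the sector size is zero, or the request does not start
 * on a hardware sector boundary. */
int blkno_to_sector(int64_t b_blkno, uint32_t secsize, int64_t *sector)
{
    int64_t byte_off;

    if (b_blkno < 0 || secsize == 0)
        return -1;
    byte_off = b_blkno * DEV_BSIZE;
    if (byte_off % secsize != 0)
        return -1;  /* would need a partial-sector transfer */
    *sector = byte_off / secsize;
    return 0;
}
```

On a 2048-byte-sector device, block 8 maps to sector 2, while block 3
falls in the middle of a sector and is rejected.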

>
>| > At the time, I found the following definitions useful:
>| >
>| >   #define FFS_DEV_BSHIFT(fs) ((fs)->fs_fshift-(fs)->fs_fsbtodb)
>| 
>| That should be a constant in the ufs mount structure (the in-kernel
>| thing). We don't need to subtract those constants every time; they aren't
>| going to change.

Well, in working on the LFS code, I figured that with five (!) different
block sizes it would take 20 macros to cover all the conversions, and 10
shift constants.  I did add a few more shift constants to the in-kernel
structure, and I tried simplifying a few existing macros, but then I
discovered that some non-kernel code was using them, and those programs broke.
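
The counts work out if you assume one macro per ordered pair of units (A
to B and B to A are separate macros) and one shift constant per unordered
pair.  That reading is an assumption, and the two function names below are
made up, but the arithmetic matches:

```c
/* Number of conversion macros for n distinct block sizes, assuming one
 * macro per ordered pair (from, to) with from != to. */
unsigned conversion_macros(unsigned n)
{
    return n * (n - 1);
}

/* Number of shift constants, assuming one per unordered pair of sizes
 * (the same shift serves both directions). */
unsigned shift_constants(unsigned n)
{
    return n * (n - 1) / 2;
}
```

For n = 5 that is 5 * 4 = 20 macros and 5 * 4 / 2 = 10 shift constants.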


>As I mentioned, here is the partial
>dumpfs output from a nextstep 3.3 operating system distribution CD:
>
># dumpfs ns33cd.ufs | head -22
>file system: ns33cd.ufs
>endian  big-endian
>magic   11954   time    Sat Nov 12 00:44:21 1994
>id      [ 0 0 ]
>cylgrp  static  inodes  4.2/4.3BSD      fslevel 0       softdep disabled
>nbfree  1406    ndir    3168    nifree  71290   nffree  51
>ncg     45      ncyl    89      size    182272  blocks  176323
>bsize   8192    shift   13      mask    0xffffe000
>fsize   2048    shift   11      mask    0xfffff800
>frag    4       shift   2       fsbtodb 0

This is cool!  It's much the same sort of layout I got from my 640MB
optical disk formatted by NeXTSTEP.  And if it's truly a full ffs partition
on CD, then it proves a theory I had that one could create an ffs file
system with 2K sectors, burn it on a CD, and use it just like a regular disk.

Tell me, did you read this super block from sector 3?  That's where it was
written on my optical disk.  Oh, wait -- I can just read it off my own copy
of the NeXTSTEP 3.3 CD!

Hmmm... very interesting.  There are actually _multiple_ disk labels here,
and except for the first one (on sector 0), they are not aligned on a
sector boundary.  The cd_label_blkno field, which changes for each copy of
the disk label (it's the block # of the label), is given in terms of
512-byte blocks, NOT cd_secsize (# of bytes per sector).

Moving on to the root partition 'a'... well, that's supposed to start on
sector 0, but I don't see anything that resembles a super block... extra
disk labels on sectors 3, 7, and 11... this looks like an aout header on
sector 16... strings for a boot loader on sector 32... another boot loader
on sector 48... Ah, here it is, on sector 84.  That's odd; I wonder where
they came up with that number?

-----------------------
Trevin Beattie          "Do not meddle in the affairs of wizards,
trevin@xmission.com     for you are crunchy and good with ketchup."
      {:->                                     --unknown