Subject: Re: Supporting sector size != DEV_BSIZE
To: Trevin Beattie <trevin@xmission.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 05/31/2002 17:48:23
On Fri, 31 May 2002, Trevin Beattie wrote:

> At 03:04 PM 5/31/2002 -0700, Bill Studenmund wrote:
> >On Fri, 31 May 2002, Trevin Beattie wrote:
> >
> >> Obviously, one or more of the above lines is incorrect and must be changed,
> >> because line 496 is inconsistent with line 481.  But I have no idea what
> >> the correct change should be, because I can't find any formal definition
> >> for the units used by b_blkno.  <sys/buf.h> defines this parameter as the
> >> "underlying physical block number"; but is that a physical sector or a
> >> DEV_BSIZE block?
> >
> >b_blkno is in units of DEV_BSIZE.
>
> Then there are no exceptions?  This should help in auditing the file system
> code.

There shouldn't be.
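
A minimal sketch of that unit convention, for auditing purposes. The
helper names are hypothetical (they are not kernel functions), DEV_BSIZE
is assumed to be 512, and the sector size is assumed to be a whole
multiple of DEV_BSIZE:

```c
#include <assert.h>
#include <stdint.h>

#ifndef DEV_BSIZE
#define DEV_BSIZE 512	/* assumed value for this sketch */
#endif

/*
 * Hypothetical helper: convert a physical sector number into
 * DEV_BSIZE units, i.e. the unit b_blkno is expected to carry.
 */
static int64_t
sector_to_devblk(int64_t sector, uint32_t secsize)
{
	/* Assumption: secsize is a nonzero multiple of DEV_BSIZE. */
	assert(secsize >= DEV_BSIZE && secsize % DEV_BSIZE == 0);
	return sector * (int64_t)(secsize / DEV_BSIZE);
}

/*
 * Hypothetical helper: the inverse conversion. A DEV_BSIZE block
 * number that does not fall on a sector boundary is rejected.
 */
static int64_t
devblk_to_sector(int64_t blkno, uint32_t secsize)
{
	int64_t ratio;

	assert(secsize >= DEV_BSIZE && secsize % DEV_BSIZE == 0);
	ratio = (int64_t)(secsize / DEV_BSIZE);
	assert(blkno % ratio == 0);
	return blkno / ratio;
}
```

On a 2048-byte-sector disk, sector 1 becomes b_blkno 4; on a
512-byte-sector disk the two numberings coincide, which is why code
that mixes them up goes unnoticed there.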

Have you looked at the three DEV_BSIZE PRs by Koji Imada? They are 3790,
3791, and 3792. Most of our thinking about how to fix this is framed in
terms of one of those three approaches. He died before he was able to
implement any of them.

> >You mean the buffer cache? :-)
>
> Not just that, but the block device drivers (e.g. sd.c), ffs, ufs, layered
> file systems, I/O-related system calls, etc.  I read Chuck's Usenix paper
> on the UBC, and it's very good, but it isn't a design spec.  I'm looking
> more for something that encompasses section 9 of the man pages, with
> additional detail on the parameters / structures passed to and from each
> function, and especially (for solving this particular problem) which
> variables have units of physical sectors (which vary between devices),
> logical blocks (which vary by file system), and the DEV_BSIZE constant.

No, we don't have such a set of documentation.

> And maybe some rationale too, so I know *why* b_blkno is neither the
> physical sector nor the file system block number.

It's not file system block number as that's a higher-layer concept than
the buffer cache is.

It *is* effectively the physical sector number, as the current kernel
design really only fits working with disks whose sectors are all
DEV_BSIZE. :-)

For instance, Convex set DEV_BSIZE to 2048, and you had to have specially
formatted drives to work in their systems (which they of course sold you).

We've attempted to add support for different block sizes over the years,
but it's all ad-hoc.

> I've made some patches to i386/disksubr.c and dev/vnd.c (shown at the end)
> which fixed the boundary check problem, and got newfs to work.  Now I'm
> looking into why the mount procedure can't find the superblock, and I have
> a couple of design-related questions:
>
> Where should the disk label be?  <i386/disklabel.h> defines LABELSECTOR as

Not sure. Disklabeling on x86 is one of the worst examples to look at.

> 1, but when I write the disk label on my 2048-Bps disk, the label is
> written at offset 0x200 on sector 0.  In i386/disksubr.c, we have:
> 	215:	dospartoff = dp->mbrp_start;
> and
> 	247:	bp->b_blkno = dospartoff + LABELSECTOR;

I'd guess the first real sector.

> .  In the MBR partition table, all offsets and sizes are in units of
> physical sectors.  So the right side of the equation would seem to be the
> sector number.  But b_blkno on the left side is supposed to be in units of
> DEV_BSIZE, so the assignment is invalid.  I can correct the first part of
> the assignment with:
> 	247:	bp->b_blkno = dospartoff * (lp->d_secsize / DEV_BSIZE)
> , but should LABELSECTOR be scaled as well or not?  In other words, should
> the disk label be a constant number of sectors from the start of a
> partition, or a constant number of bytes from the start?  It can't be both.
>
> Likewise, where should the superblock be?  At a constant
> (partition-relative) sector offset or a constant byte offset?

I think the best way to look at it is that they should be in the same
places as they would be if we recompiled everything with DEV_BSIZE = the
sector size of the disk and ran the commands naively.
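
A sketch of the two candidate placements for the label, to make the
difference concrete. The function names are hypothetical, DEV_BSIZE is
assumed to be 512, and LABELSECTOR is the value quoted above. Reading
"recompiled with DEV_BSIZE = sector size" as "LABELSECTOR counts
physical sectors" is one interpretation, not a settled design:

```c
#include <assert.h>
#include <stdint.h>

#ifndef DEV_BSIZE
#define DEV_BSIZE 512	/* assumed value for this sketch */
#endif
#ifndef LABELSECTOR
#define LABELSECTOR 1	/* as quoted from <i386/disklabel.h> */
#endif

/*
 * Option A: the label sits a constant number of physical sectors
 * into the partition. Both terms are in sectors, so both are scaled
 * to DEV_BSIZE units before being stored in b_blkno.
 */
static int64_t
label_blkno_sector_relative(int64_t dospartoff, uint32_t secsize)
{
	assert(secsize >= DEV_BSIZE && secsize % DEV_BSIZE == 0);
	return (dospartoff + LABELSECTOR) * (int64_t)(secsize / DEV_BSIZE);
}

/*
 * Option B: the label sits a constant number of bytes
 * (LABELSECTOR * DEV_BSIZE) into the partition. Only the partition
 * offset is scaled; LABELSECTOR is already in DEV_BSIZE units.
 */
static int64_t
label_blkno_byte_relative(int64_t dospartoff, uint32_t secsize)
{
	assert(secsize >= DEV_BSIZE && secsize % DEV_BSIZE == 0);
	return dospartoff * (int64_t)(secsize / DEV_BSIZE) + LABELSECTOR;
}
```

For a partition starting at sector 0 of a 2048-byte-sector disk,
option A gives b_blkno 4 (byte offset 0x800, the start of sector 1),
while option B gives b_blkno 1 (byte offset 0x200 inside sector 0),
which matches where the label was observed to land. With 512-byte
sectors the two options give the same block number.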

> Here are my patches so far; feel free to comment:

Will look at them next week.

Take care,

Bill