Subject: Re: Supporting sector size != DEV_BSIZE
To: Trevin Beattie <trevin@xmission.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 06/11/2002 17:55:57
On Mon, 10 Jun 2002, Trevin Beattie wrote:

> At 10:26 AM 6/10/2002 -0700, Bill Studenmund wrote:
> >On Mon, 10 Jun 2002, Trevin Beattie wrote:
> >> Neither of the standard drafts I looked at mentions the macro S_BLKSIZE,
> >> but we have it in <sys/stat.h> defined as 512.  Would there be any
> >> objection to replacing btodb() with an expression using S_BLKSIZE
> >> everywhere that di_blocks is used?
> >
> >In kernel, no. On disk, well, yeah.
>
> I can't see how to separate the kernel's copy of di_blocks from the
> di_blocks stored on disk, unless you add some extra code to the functions
> that read/write inodes.

No, what I meant was that when we're looking at numbers that live only in
the kernel (like block numbers in the buffer cache), DEV_BSIZE is fine.
But when we look at numbers that hit the disk, let's use sector size.

You're right it'd be silly to keep the number in sectors on disk but
convert to DEV_BSIZE in the kernel. :-|

[snip]

> >Also, you would hose anyone who happened to have already set DEV_BSIZE to
> >something else & made file systems. Not sure if *anyone* has, but they
> >might have.
>
> As I've said before, I don't want to do anything that will break existing
> file systems, if I can avoid it.  OTOH, since the existing kernel is broken
> WRT sector size != DEV_BSIZE, I think it's safe to base any changes on the
> premise that DEV_BSIZE == sector size on all working file systems.

I agree. :-)

> I'm sure your suggestion will break my existing trial fs where the sector
> size == 2048 and DEV_BSIZE == 512, but that's why it is a trial fs, and not
> production! ;^)

:-)

> >You could choose to do that, but that would be changing the plan. Is there
> >a reason to change the plan?
>
> Remember, what triggered the suggestion was Robert's view on what the
> *intention* was for di_blocks:
>
> > At 10:34 PM 6/10/2002 +0700, Robert Elz wrote:
> > >The theory is that di_blocks is in very well known constant units,
> > >that are known by everyone.   If the count was in units of fragments
> > >then applications (like du, ls) that extract the information would
> > >need to have a way to know the fragment size before they could use
> > >the information.
> > >
> > >So, di_blocks isn't really supposed to be in DEV_BSIZE units, it is
> > >supposed to be in 512 byte block units.   But it happens that at the
> > >time, DEV_BSIZE==512 was one of the unchangable constants of the
> > >universe (kind of like pi=3.14159... or NULL=0) and the distinction
> > >between things wanting the constant number, and things wanting device
> > >blocksize units was very much blurred (as you're discovering).

Yes, but nowadays those utilities use stat() calls, and we convert to
512-byte-blocks for that. So we can use different units for stat() &
on-disk.

> I was also concerned because there was nothing in <ufs/ufs/dinode.h> to
> indicate either the sector size, the fs fragment size, or the value of
> DEV_BSIZE used by the kernel that created the file system.  Just
> considering the lack of a unit made me think that a constant would be the
> better way to go.
>
> However, after looking into the kernel code that uses di_blocks (which in
> most cases is handled by the i_ffs_blocks macro), it would appear that all
> affected functions also have access to struct fs for the inode's file
> system.  If there are no exceptions, then yes, we can define the units of
> di_blocks to be physical sectors, using the NSPF macro.

Sounds good (though I'm not familiar with NSPF, but if it works, great).

> Then we just have to make sure the stat() system call converts di_blocks
> from sectors to S_BLKSIZE units before storing the number in st_blocks.

Already done.

VOP_GETATTR() returns va_bytes, so the conversion is in the file system.
:-)

Take care,

Bill