Subject: Re: Supporting sector size != DEV_BSIZE
To: Trevin Beattie <trevin@xmission.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 06/10/2002 10:26:24

On Mon, 10 Jun 2002, Trevin Beattie wrote:

> At 10:34 PM 6/10/2002 +0700, Robert Elz wrote:
> >
> >Since no-one else answered this, and since di_blocks was my code
> >originally, a long long time ago, perhaps I can explain what is
> >going on there.

Heh. :-)

> >The theory is that di_blocks is in very well known constant units,
> >that are known by everyone.   If the count was in units of fragments
> >then applications (like du, ls) that extract the information would
> >need to have a way to know the fragment size before they could use
> >the information.
>
> That makes perfect sense, and answers my last question.
>
> >So, di_blocks isn't really supposed to be in DEV_BSIZE units, it is
> >supposed to be in 512 byte block units.   But it happens that at the
> >time, DEV_BSIZE==512 was one of the unchangeable constants of the
> >universe (kind of like pi=3.14159... or NULL=0) and the distinction
> >between things wanting the constant number, and things wanting device
> >blocksize units was very much blurred (as you're discovering).
> >
> >An alternative implementation might have had di_bsize recorded in
> >frags in the filesystem, and then converted to a well known constant
> >unit in stat() (etc) before being made visible to average userland
> >utilities.   That was never considered at the time this was being
> >implemented, there simply was no motivation to look into things that
> >deeply.
> >
> >What's important I guess is that stat() returns a count of 512 byte
> >blocks, however you want to make the filesystem (and filesystem
> >cognisant utilities) behave here.
>
> This got me to thinking about what the POSIX and SUSv2 standards have to
> say about the stat() function, so I poked around some drafts I have.  The
> 1990 POSIX standard, BTW, does not include st_blocks in struct stat; this
> was added in the 200x version.  There is also another new member,
> st_blksize, which is defined as "the preferred I/O block size for this
> object".  But strangely, the definition of st_blocks as the "number of
> blocks allocated for this object" does not define what the size of those
> blocks is, esp. whether the blocks are a constant size or in terms of
> st_blksize.  The definition of the data type blkcnt_t is even more vague:
> "Used for file block counts." :-P

Yes, stat always uses 512-byte blocks.
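
As a minimal sketch of what that means for a userland program: the helper
below turns st_blocks into a byte count by multiplying by 512. The name
allocated_bytes() and the constant STAT_BLOCK_BYTES are made up for this
example, and the 512-byte unit is an assumption that matches the behaviour
described in this thread; as noted above, the standard text itself does not
fix the unit.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Assumption: st_blocks counts 512-byte units, as discussed in this thread. */
#define STAT_BLOCK_BYTES 512LL

/*
 * Hypothetical helper: return the number of bytes allocated to `path`
 * according to stat(), or -1 if stat() fails.
 */
long long
allocated_bytes(const char *path)
{
	struct stat sb;

	assert(path != NULL);
	if (stat(path, &sb) == -1)
		return -1;
	/* The result does not depend on st_blksize or the fragment size. */
	return (long long)sb.st_blocks * STAT_BLOCK_BYTES;
}
```

A caller such as du or ls would not need to know the file system's fragment
size to use this value, which is the point Robert makes above.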

> The only reference to a specific size that I could find was in the
> rationale section for the du(1) program:
>
> "The use of 512-byte units is historical practice and maintains
> compatibility with ls and other utilities in this volume of IEEE Std
> 1003.1-200x. This does not mandate that the file system itself be based on
> 512-byte blocks. The -k option was added as a compromise measure. It was
> agreed by the standard developers that 512 bytes was the best default unit
> because of its complete historical consistency on System V (versus the
> mixed 512/1024-byte usage on BSD systems), and that a -k option to switch
> to 1024-byte units was a good compromise. Users who prefer the 1024-byte
> quantity can easily alias du to du -k without breaking the many historical
> scripts relying on the 512-byte units.
>
> "The -b option was added to an early proposal to provide a resolution to
> the situation where System V and BSD systems give figures for file sizes in
> blocks, which is an implementation-defined concept. (In common usage, the
> block size is 512 bytes for System V and 1024 bytes for BSD systems.)
> However, -b was later deleted, since the default was eventually decided as
> 512-byte units."
>
> Neither of the standard drafts I looked at mentions the macro S_BLKSIZE,
> but we have it in <sys/stat.h> defined as 512.  Would there be any
> objection to replacing btodb() with an expression using S_BLKSIZE
> everywhere that di_blocks is used?

In kernel, no. On disk, well, yeah.

Trevin, what you're trying to do is a wide-sweeping kernel change. When
doing these things, it's usually best to come up with a plan and stick
with it. You're going to run into weird cases, and having a coherent
design will help you through them. I'd suggested one plan, which I thought
you'd agreed to. That is to make the "reference" for a file system be
whatever a kernel & userland built with DEV_BSIZE == that disk's sector
size would do.

Using S_BLKSIZE would not be following that plan, since S_BLKSIZE won't
vary with the disk's sector size.
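
To make the difference concrete, here is a rough sketch. ex_btodb() and
ex_bytes_to_stat_blocks() are hypothetical stand-ins written for this
example, not the kernel's macros; the real btodb() takes no block-size
argument and uses the compiled-in DEV_BSIZE. The 2048-byte sector size in
the usage note is only an illustrative value.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for S_BLKSIZE, which <sys/stat.h> defines as 512. */
#define EX_S_BLKSIZE 512

/*
 * Hypothetical stand-in for btodb(): bytes to device blocks.
 * The device block size is passed in here so that the effect of
 * building with a different DEV_BSIZE can be shown.
 */
int64_t
ex_btodb(int64_t bytes, int64_t dev_bsize)
{
	assert(dev_bsize > 0);
	return bytes / dev_bsize;
}

/* Conversion using the fixed 512-byte unit instead. */
int64_t
ex_bytes_to_stat_blocks(int64_t bytes)
{
	return bytes / EX_S_BLKSIZE;
}
```

With DEV_BSIZE == 512 both conversions give 16 for an 8192-byte file. With
a 2048-byte sector size, ex_btodb(8192, 2048) gives 4 while the
S_BLKSIZE-based count stays at 16, so the two approaches would store
different values in di_blocks.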

Also, you would hose anyone who happened to have already set DEV_BSIZE to
something else & made file systems. Not sure if *anyone* has, but they
might have.

You could choose to do that, but that would be changing the plan. Is there
a reason to change the plan?

Take care,

Bill