Re: In-kernel units for block numbers, etc ...

To: mlelstv%serpens.de@localhost (Michael van Elst)
Subject: Re: In-kernel units for block numbers, etc ...
From: Robert Elz <kre%munnari.OZ.AU@localhost>
Date: Fri, 27 Nov 2015 09:54:14 +0700

This is a reply to a message on the netbsd-users list, which drifted
into a discussion which eventually spawned this thread on tech-kern,
where (post drifting) it is more appropriate...

mlelstv%serpens.de@localhost said:
  | The newfs command queries the sector size, calculates the filesystem
  | parameters and puts them into the superblock. 

Sure, that's what I'd expect.  But if you are faking an underlying device
with 512 byte sectors, when it queries, newfs will be told 512 byte sectors,
and actual sector count * N sectors (where N is the drive sector size / 512).

That's what you were suggesting with the method of making lvm work by
telling it (manually) a larger number of sectors, I believe.   It is just
the wrong solution.

newfs needs to discover the correct sector size (4K or whatever), and the
correct number of sectors (because it calculates total size from multiplying
the two) - if it gets that data it can build a suitable ffs layout.

Otherwise (without manual overriding) it cannot.
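
As a rough sketch of the arithmetic involved (the struct and function
names here are invented for illustration and are not NetBSD interfaces):
a layer faking 512 byte sectors reports the same total size, but the
sector size that newfs would need is lost.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry record; not a real NetBSD structure. */
struct geom {
	uint32_t secsize;	/* bytes per sector */
	uint64_t nsectors;	/* number of sectors */
};

/*
 * What a layer pretending the device has 512 byte sectors would report:
 * 512 byte sectors, and the real sector count * N, where
 * N = real sector size / 512.
 */
static struct geom
fake_512(struct geom real)
{
	struct geom g;

	/* Assumption: the real sector size is a multiple of 512. */
	assert(real.secsize >= 512 && real.secsize % 512 == 0);
	g.secsize = 512;
	g.nsectors = real.nsectors * (real.secsize / 512);
	return g;
}

/* Total size is the product of the two values newfs queries. */
static uint64_t
total_bytes(struct geom g)
{
	return (uint64_t)g.secsize * g.nsectors;
}
```

With a real geometry of 4096 byte sectors, the faked geometry has the
same total_bytes() but a secsize of 512, so anything deriving an ffs
layout from it would be working from the wrong sector size.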

mlelstv%serpens.de@localhost said:
  | FFS and other filesystems don't even try to get that information

We may have a disconnect of terminology here, when I refer to FFS I mean
all of it - which includes the kernel code, but also the userland tools
that support it, including newfs, fsck, dumpfs, tunefs, even dump/restore.

Most of those are irrelevant to the current discussion, but newfs isn't
(as above).

So, when I said "FFS won't (or shouldn't) allow frag sizes smaller than
the sector size", I meant it in that context.  How FFS (the overall system)
chooses to implement that is not all that important.  Here I'd expect
(as you suggest above) that newfs will build a suitable superblock and
cyl groups, and the kernel part will just use that data.   That's all fine.

But it all depends on getting the right numbers from the underlying device,
not fake ones.
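
A minimal sketch of the kind of check meant here, assuming a
hypothetical helper (this is not taken from newfs or the kernel).  The
"multiple of the sector size" test is an extra assumption of the sketch;
the point above is only that the frag size should not be smaller than
the sector size.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical validity check for a fragment size against the sector
 * size reported by the underlying device.
 */
static bool
frag_size_ok(uint32_t fsize, uint32_t secsize)
{
	/* Reject a missing sector size, or a frag smaller than a sector. */
	if (secsize == 0 || fsize < secsize)
		return false;
	/* Assumption of this sketch: frags are whole numbers of sectors. */
	return fsize % secsize == 0;
}
```

If the device is reported as having 512 byte sectors when it really has
4K ones, a check like this would accept a 512 byte frag size that the
real hardware cannot address.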

In another message (one from this tech-kern thread) mlelstv%serpens.de@localhost said:
 | >It has two disadvantages that I can see at the minute.   One is that it
 | >would require "large int" fields (at least 64 bits) everywhere,

 | It would also be conceptually identical to the current scenario where the
 | kernel _already uses byte offsets_ (it just doesn't store the low 9 bits
 | which are always 0, thus preventing the use of sectors smaller than 512
 | bytes).

Personally, I'd like to avoid restrictions like
	"thus preventing the use of sectors smaller than 512 bytes";
there's no reason for it.   Some of the kernel (particularly in some
of the wedge code, if I recall correctly) already goes to some pains to
allow smaller sector sizes (shifting left, or right, as appropriate).
Most of the kernel doesn't however, and just assumes (where it isn't
assuming that everything is DEV_BSIZE) that the sector size must be bigger
(or the same), never smaller.   That's poor.
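
To illustrate the "shifting left, or right, as appropriate" idea, here
is a sketch of converting a block number in 512 byte units into a device
sector number for sector sizes either side of 512.  The function and
macro names are made up for the example (the macro merely mirrors the
usual DEV_BSHIFT value of 9), and this is not the actual wedge code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shift for 512 byte units (2^9 = 512). */
#define SKETCH_DEV_BSHIFT	9

/*
 * Convert a block number counted in 512 byte units into a sector number
 * on a device whose sector size is (1 << secshift) bytes.
 */
static uint64_t
devblk_to_sector(uint64_t blkno, unsigned secshift)
{
	assert(secshift < 64);
	if (secshift >= SKETCH_DEV_BSHIFT)
		/* Sectors of 512 bytes or more: fewer, larger units. */
		return blkno >> (secshift - SKETCH_DEV_BSHIFT);
	/* Sectors smaller than 512 bytes: more, smaller units. */
	return blkno << (SKETCH_DEV_BSHIFT - secshift);
}
```

Code that only ever shifts right is the "must be bigger (or the same)"
assumption described above; the second branch is what is needed to
handle the smaller case.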

Using byte offsets everywhere (outside the filesystem code, I don't
want to alter any of that) would remove that restriction, and as you
say otherwise is conceptually identical to what you want.

It also means fewer translations: one only ever needs to be made when
the code is actually writing a block number to the hardware (etc), and
one would always need to be made there anyway, so it can be done in the
drivers at that point, rather than by having something translate the
values in struct buf and hoping that we know where they will be used next.
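
A sketch of what that single translation might look like, under the
assumption that the driver knows its own sector size and is handed a
byte offset (the function name is hypothetical, not an existing driver
interface):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Translate a byte offset into a hardware sector number at the point
 * where the driver is about to issue the transfer.  Works the same way
 * whether the sector size is larger or smaller than 512 bytes.
 */
static uint64_t
byte_offset_to_sector(uint64_t byteoff, uint32_t secsize)
{
	/* Assumption: requests are aligned to the device sector size. */
	assert(secsize != 0 && byteoff % secsize == 0);
	return byteoff / secsize;
}
```

The same byte offset of 8192 maps to sector 2 on a 4K sector drive,
sector 16 on a 512 byte sector drive, and sector 32 on a 256 byte
sector drive, with no intermediate unit involved.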

kre
