tech-kern archive


Re: blocksizes

On Fri, Jan 22, 2010 at 07:36:15PM +0900, Izumi Tsutsui wrote:
> > disk devices are accessed in units of 'blocks', a block can be
> > any size, however NetBSD makes assumptions in many places that
> > a block is 512 bytes or DEV_BSIZE bytes which makes it impossible
> > to use devices with different block sizes.
> > 
> > IMHO there need to be three different ways to specify block
> > offsets and block counts:
> > 
> > 1. in units of blocks of the physical device
> > 2. in units of blocks of DEV_BSIZE bytes
> > 3. in bytes
> > 
> > and we need to establish what units are used where.
> > 
> > 
> > One possible path to support devices where physical blocks are
> > not of size DEV_BSIZE would be
> > 
> > - device drivers use 1.
> > - wedges use 1.
> > - the dk driver translates between 1. and 2.
> > - the buffer system uses 2. and 3.
> > 
> > Filesystems can be pretty agnostic to this as any multiple of DEV_BSIZE
> > will work (but slower if not properly aligned to the physical block size).
> > 
> > The necessary changes are rather small. In particular, dkwedge_info needs
> > to be extended to keep track of the physical sector size so that the dk
> > driver can do the transformations.
> > 
> > Comments?
> - What's the reason to keep DEV_BSIZE constant? Less necessary changes?

Keeping DEV_BSIZE at 512 bytes avoids lots of changes.

I am thinking of doing block size translations only in the dk driver.
Changing DEV_BSIZE would break lots of things if you use the
parent device (like wd) directly and I am sure that we will have
users of dk and users of the parent devices at the same time.

> - Do you mean wedge and dk(4) are mandatory to use !512bytes/sec disks?

Yes. This simplifies things a lot.

> - Can we use filesystems on raw devices without label on !512bytes/sec disks?

I think so. Wedges can be created without label and it would be simple
to create wedges from other kinds of information.

> - What about pseudo disk devices like vnd(4), cgd(4), and raid(4)?

Good question.

vnd so far is a simulated physical device with 512bytes/sector.
I don't see any problems in making this flexible (so vnd can
be used to simulate real hardware with different block sizes).
But otherwise it isn't affected.

I haven't looked at cgd and raidframe, but I would assume that
they can talk to dk (and if not, this would be a bug that needs
fixing). There might be other dependencies, in particular
for tools similar to newfs or fdisk that do not use buffered
I/O and that need to be aware of the physical block size.

> - What value should be used for d_secsize in struct disklabel?

This should be the physical sector size of the medium (or maybe
some multiple so you could have faked but compatible geometries).

> - UFS file systems have "disk block size" parameters in their superblocks
>   as fs_fsbtodb etc. Which should it be, 1. or 2.?

It should be 2. The filesystem should treat a disk as having the 'logical'
block size DEV_BSIZE.
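
To illustrate option 2 for the superblock parameter, here is a minimal
sketch that derives a shift count from the fragment size and DEV_BSIZE
only, so the physical sector size never enters the calculation. It
assumes fs_fsbtodb is used as a left-shift count; the helper names
fsbtodb_shift and frag_to_devblk are hypothetical and not taken from
the tree.

```c
#include <assert.h>
#include <stdint.h>

#ifndef DEV_BSIZE
#define DEV_BSIZE 512u	/* logical block size assumed in this thread */
#endif

/*
 * Hypothetical helper: shift count that converts fragment numbers to
 * DEV_BSIZE block numbers. fsize must be a power of two >= DEV_BSIZE.
 */
static int
fsbtodb_shift(uint32_t fsize)
{
	int shift = 0;

	assert(fsize >= DEV_BSIZE && (fsize & (fsize - 1)) == 0);
	while (((uint32_t)DEV_BSIZE << shift) < fsize)
		shift++;
	return shift;
}

/* Hypothetical helper: fragment number to DEV_BSIZE block number. */
static uint64_t
frag_to_devblk(uint64_t fragno, int shift)
{
	return fragno << shift;
}
```

With 2048-byte fragments the shift is 2 regardless of whether the
medium has 512, 2048 or 4096 bytes per sector.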

Support for physical block sizes smaller than DEV_BSIZE of course is
not possible. But I consider this a separate, and for the moment
rather academic, problem.

>   Current newfs(8) uses sectorsize (physical block size) to set it, but
>   fs code passes block numbers calculated by fsbtodb to buffercache(9).

Yes, there are several such calculations that mix sector size and

> IMO, such "less necessary changes" will include much more logical hacks
> which will confuse future developers, as current inconsistent implementation.

I don't consider this a hack. It is an abstraction.

Filesystems work on DEV_BSIZE devices.
dk translates this to physical block sizes.
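
A minimal sketch of what such a translation could look like, assuming
the physical sector size is a multiple of DEV_BSIZE and is known to the
translating layer (for example via an extended dkwedge_info). The
function names are hypothetical and this is not the actual dk code.

```c
#include <assert.h>
#include <stdint.h>

#ifndef DEV_BSIZE
#define DEV_BSIZE 512u	/* logical block size assumed in this thread */
#endif

/*
 * Hypothetical helper: convert a block number in DEV_BSIZE units to a
 * block number in physical sectors. Rounds down if blkno is not
 * aligned to a physical sector boundary.
 */
static uint64_t
logical_to_physical(uint64_t blkno, uint32_t secsize)
{
	assert(secsize >= DEV_BSIZE && secsize % DEV_BSIZE == 0);
	return blkno / (secsize / DEV_BSIZE);
}

/*
 * Hypothetical helper: convert a physical sector number back to a
 * block number in DEV_BSIZE units.
 */
static uint64_t
physical_to_logical(uint64_t pblkno, uint32_t secsize)
{
	assert(secsize >= DEV_BSIZE && secsize % DEV_BSIZE == 0);
	return pblkno * (secsize / DEV_BSIZE);
}
```

For a 2048 bytes/sector medium, DEV_BSIZE block 16 maps to physical
sector 4, and for a 512 bytes/sector medium the mapping is the identity.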

If you do have some filesystem code that can handle arbitrary
device block sizes (which we currently do not have) you could
run it on top of a real device again. However, even then
I'd rather keep the dk layer. For one, this is the place that
knows about wedges. And for two, it keeps flexibility.

> I wonder if 512bytes/sector will become legacy or not in future..

Even when 512bytes/sector becomes legacy, it will probably be
replaced by some other fixed size, just because it is easier
to do this.

It will be some multiple of 512 bytes, so the 512bytes/sector
model still holds, but you do have to deal with alignment, which
is a small problem as the normal situation will be that the
file system block size is still as large as the physical block

Even then, there will be code that only knows about 512 bytes/block.
Dealing with this in a central place (like dk) is simple and
avoids complexity.
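
The alignment question can be sketched as a simple check on a request
expressed in DEV_BSIZE units. This assumes the physical sector size is
a multiple of DEV_BSIZE; request_is_aligned is a hypothetical name, not
an existing kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef DEV_BSIZE
#define DEV_BSIZE 512u	/* logical block size assumed in this thread */
#endif

/*
 * Hypothetical helper: true if a transfer starting at DEV_BSIZE block
 * blkno and covering nbytes bytes begins and ends on physical sector
 * boundaries of size secsize.
 */
static bool
request_is_aligned(uint64_t blkno, uint64_t nbytes, uint32_t secsize)
{
	uint64_t off;

	assert(secsize >= DEV_BSIZE && secsize % DEV_BSIZE == 0);
	off = blkno * DEV_BSIZE;	/* byte offset of the request */
	return off % secsize == 0 && nbytes % secsize == 0;
}
```

A 4096-byte transfer at DEV_BSIZE block 8 is aligned on a 4096
bytes/sector medium, while the same transfer at block 1 is not and
would need extra handling.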

                                Michael van Elst
                                "A potential Snark may lurk in every tree."
