Subject: proposal for non-512 bytes/sector block device
To: None <tech-kern@NetBSD.ORG>
From: Koji Imada - je4owb/2 <email@example.com>
Date: 06/25/1997 20:17:15
I have been working on using a 640MB MO (2048 bytes/sector) with ffs.
The current NetBSD implementation assumes 512 bytes/sector media, which
used to be the common case. So there is a constant DEV_BSIZE (which is
512; DEV_BSHIFT is 9), and it is referenced from file system and device
driver code. Macros like btodb() and dbtob() are also used in many
places.
Because of that, it is not trivial to use non-512 bytes/sector media.
But nowadays there are many non-512 bytes/sector media, like CD-ROM,
MO and floppies. I think it is better to remove the DEV_BSIZE dependency.
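To make the dependency concrete, here is a minimal sketch in C of what the conversion macros do. The sk_ names are hypothetical stand-ins written for this note, not the kernel's own definitions; only the values 512 and 9 come from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the text: DEV_BSIZE is 512, DEV_BSHIFT is 9. */
#define SK_DEV_BSHIFT 9
#define SK_DEV_BSIZE  (1 << SK_DEV_BSHIFT)

/* Stand-in for btodb(): bytes -> number of DEV_BSIZE blocks. */
static int64_t sk_btodb(int64_t bytes)
{
    return bytes >> SK_DEV_BSHIFT;
}

/* Stand-in for dbtob(): DEV_BSIZE blocks -> bytes. */
static int64_t sk_dbtob(int64_t blocks)
{
    return blocks << SK_DEV_BSHIFT;
}

/* Hypothetical helper: the sector holding a byte offset, for any sector size. */
static int64_t sk_byte_to_sector(int64_t bytes, int32_t secsize)
{
    assert(secsize > 0);
    return bytes / secsize;
}
```

For byte offset 8192, sk_btodb() gives 16, but on a 2048 bytes/sector MO the sector number is 4. Code that hands the first number to something expecting the second will access the wrong place.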
I will consider the file system, buffer cache, raw device and physical
I will use the following definitions:
1.1. physical sector size
Physical access unit of media. Device drivers will access
media in this unit. For example, 256, 512 for HDD, 512, 2048
for MO, 2048 for CD-ROM.
1.2. disklabel sector size
Sector size recorded in the disklabel. This is usually equal to
the physical sector size, and when there is no disklabel it is
equal too.
Disklabel information (media size, partition offset, partition
size, etc.) should be recorded in this unit.
1.3. block io unit or buffer cache interface unit
This is the unit of the buffer cache interface, that is, of requests
to the *strategy routine (b_blkno of struct buf). In the current
implementation, this is fixed to DEV_BSIZE.
In the case of a raw device, this is the unit used by physio().
1.4. file system sector size
This is the sector size of the underlying media assumed when the
file system was created. This is usually equal to physical sector size of
With regard to these definitions, the current implementation is:
2.1. physical sector size
device driver dependent.
2.2. disklabel sector size
Varies by device. Some device drivers use the physical sector size,
and some use DEV_BSIZE regardless of the physical sector
size. The partition boundary check is ambiguous because of this.
2.3. buffer cache interface or block io unit
Fixed to DEV_BSIZE. So device drivers need to translate block numbers
when the physical sector size is different from DEV_BSIZE.
2.4. file system sector size
Ffs and msdosfs assume that the block io unit is equal to the
physical sector size. Requests to the block device are issued in
file system sector size units. This can't work with non-512
bytes/sector media under the condition of 2.3.
On the other hand, cd9660 translates the block number and issues
its requests in DEV_BSIZE units.
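The mismatch between 2.3 and 2.4 can be sketched as follows. This is only an illustration; sk_db_to_physsec() is a hypothetical name, not an existing driver routine, and it assumes the physical sector size is a multiple of DEV_BSIZE.

```c
#include <assert.h>
#include <stdint.h>

#define SK_DEV_BSIZE 512  /* block io unit in the current implementation */

/*
 * Hypothetical driver-side translation: a block number in DEV_BSIZE
 * units becomes a physical sector number.  Returns -1 when the request
 * does not start on a physical sector boundary.
 * Assumes secsize is a multiple of DEV_BSIZE (512, 1024, 2048, ...).
 */
static int64_t sk_db_to_physsec(int64_t blkno, int32_t secsize)
{
    int32_t ratio;

    assert(secsize >= SK_DEV_BSIZE && secsize % SK_DEV_BSIZE == 0);
    ratio = secsize / SK_DEV_BSIZE;
    if (blkno % ratio != 0)
        return -1;
    return blkno / ratio;
}
```

With 2048 bytes/sector media, a caller that follows 2.3 (as cd9660 does) asks for block 40 to reach physical sector 10. A caller that follows 2.4 (as ffs does) passes 10 itself; a driver that translates from DEV_BSIZE then treats 10 as a DEV_BSIZE block number, which is not even sector aligned.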
The main goal is to make ffs and msdosfs usable with non-512
bytes/sector media. I set the following objectives.
3.1. Make ffs and msdosfs usable with non-512 bytes/sector media.
3.2. Make ffs and msdosfs usable with a file system image made for
other media (different size, physical sector size, etc.). This
may be restricted by the file system structure and the physical
sector size of the media.
3.3. Allow a disklabel sector size different from the physical sector
size. This is convenient when working with a disk image made for
another disk (which may have a different physical sector size).
3.4. Separate the file system layer from the underlying layer, so that
any file system can be used on any block device as long as the
file system and physical sector restrictions permit.
Objective 3.4 is convenient for handling a cd9660 image on hdd, mo, or
vnode disk (to make a cd9660 image for CD-R), or for making an image
for hdd, fd or mo on a vnode disk.
Now consider the design and implementation needed to achieve those
objectives.
4.1. about physical sector size and disklabel sector size
To achieve 3.3, there are two methods:
4.1.1. Support disklabel sector sizes different from the physical
sector size (n * physical sector size) in the device driver.
4.1.2. Expand the functionality of the vnode pseudo disk device
(called vnode disk in the following) to handle block devices, and
emulate a different physical sector size from device's physical sector
Finding the disklabel on the disk is difficult with 4.1.1, and it may be
difficult to implement 4.1.1 in every device driver. On the other hand,
method 4.1.2 is available for all block devices. It may be reasonable
to use mainly 4.1.2, and 4.1.1 may be implemented if device driver can
I modified the vnode disk driver to implement 4.1.2 and it worked. It is
assumed in the following that the physical sector size is equal to the
disklabel sector size.
4.2. about buffer cache interface or block io unit
The focal point is whether to keep the block io unit fixed to DEV_BSIZE.
If it is not changed, it is easy to find the sector that contains the
n'th byte from the start, and the buffer cache interface does not need
to be modified. But every file system must translate its requests to the
block device into DEV_BSIZE units, and can't access anything smaller
than DEV_BSIZE.
Because the block io unit is also the unit of the buffer cache
interface, changing it may affect the implementation of the buffer cache.
I think this change works with NetBSD-current of 97/06/05 (my
kernel is working :-).
The other problem caused by this change is access to the raw device. When
accessing a raw device on NetBSD, physio() calculates the block number to
read using btodb(). This is wrong for a non-DEV_BSIZE block io unit.
To avoid this problem, a new physio() which takes an extra block size
argument is necessary, and all raw device access must call this
function with the appropriate block size (maybe the physical sector
size). Because physio() is called from the *read()/*write() of each device
in NetBSD, modifying the *read()/*write() of non-512 bytes/sector capable
devices to call it with the extra block size argument is sufficient.
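A rough sketch of the block number calculation such a physio() would need is below. The function name, its arguments and the alignment check are my assumptions for illustration; the real interface may look different.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical replacement for the btodb() step inside physio():
 * convert the byte offset of a raw device transfer into a block
 * number in units of blksize instead of DEV_BSIZE.
 * Returns the block number, or -1 when the offset or the length is
 * negative or not a multiple of blksize (an assumption of this sketch).
 */
static int64_t sk_raw_blkno(int64_t offset, int64_t length, int32_t blksize)
{
    assert(blksize > 0);
    if (offset < 0 || length < 0)
        return -1;
    if (offset % blksize != 0 || length % blksize != 0)
        return -1;
    return offset / blksize;
}
```

A *read() routine for a 2048 bytes/sector device would pass 2048 as blksize, so a transfer at byte offset 8192 becomes block 4 rather than the block 16 that btodb() would produce.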
The advantage of changing the block io unit to the natural size of each
device is that the block device implementation becomes natural and
intuitive, and that media can be accessed in physical sector units even
when the sector is smaller than DEV_BSIZE (like 256 bytes/sector).
"Natural" size means "disklabel sector size == physical sector size",
from the consideration in 4.1.
4.3. about file system implementation
There are two typical cases of file system implementation. Ffs assumes
"block io unit == file system sector size", while cd9660 assumes
"file system sector size != block io unit" and requires translation.
For a file system like ffs, whether the block io unit is fixed or not
is a major issue. If the block io unit is fixed, a file system like ffs
can't handle a file system sector size other than DEV_BSIZE. But this
is not a general limitation, because cd9660 can handle such a case.
If the block io unit is the natural size of each device, a file system
like ffs can handle any file system sector size as long as the file
system is structured properly (file system sector size == physical
sector size).
A file system whose file system sector size is different from the
physical sector size may be handled with the vnode disk driver of 4.1.2.
So changing the block io unit to the physical sector size has the lowest
impact on file system code (cd9660 needs more modification than ffs and
On the other hand, if the file system permits "file system sector size !=
(block io unit == physical sector size)", the file system implementation
and the buffer cache interface are decoupled, and the vnode disk
driver of 4.1.2 is not needed to achieve 3.4.
From these investigations, there are several possible file system
designs, depending on the policy of 4.2.
4.3.1. Require "file system sector size == block io unit == physical
sector size". Although the file system sector size is restricted to the
physical sector size, this is avoided with vnode disk driver
4.3.2. "block io unit == DEV_BSIZE" and the file system permits "file
system sector size != DEV_BSIZE". This implementation can't
handle sizes smaller than DEV_BSIZE properly, and block number
translation is necessary in both the file system and the block device
driver. But this achieves all objectives without vnode disk
4.3.3. "block io unit == physical sector size (disklabel sector size)"
and the file system permits "file system sector size != block io
unit". This is the most flexible and powerful, but complicated.
For file system design, the important question is whether to allow
"file system sector size != block io unit". The difference between
4.3.2 and 4.3.3 is just the translation unit.
But in the current implementation of ffs, struct fs is the only place
for file system information, so information that depends on the
implementation rather than on the file system can't be stored anywhere
else. Because of this, we must use the spare area of struct fs for this
purpose. To resolve this properly, we must separate the in-core file
system information from struct fs (the superblock in the file system).
What modifications are necessary for each design?
In any case, the block device drivers and the disklabel related
routines (especially bounds_check_with_label()) must be modified for the
disklabel sector size. This depends on the block io unit of 4.2. It is
assumed in the following that the relation between disklabel sector size
and physical sector size, and the disklabel related routines (reading,
bounds check, etc.), are resolved according to the block io unit.
As for the block io unit, the raw device interfaces must be modified if
it is not fixed to DEV_BSIZE. As described before, physio() (or some
equivalent) must accept the block io unit (or block size), and
*read()/*write() must call it with the current disklabel sector size.
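A simplified sketch of a bounds check done entirely in disklabel sector units follows. This is not the real bounds_check_with_label(); struct sk_partition and both functions are hypothetical, and the only point is that the partition offset, the partition size and the request must all be expressed in the same unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical partition entry; both fields are in disklabel sectors. */
struct sk_partition {
    int64_t p_offset;   /* first sector of the partition */
    int64_t p_size;     /* number of sectors in the partition */
};

/*
 * blkno is relative to the partition, in disklabel sector units;
 * bcount is the request length in bytes.
 * Returns the number of sectors that may be transferred: 0 at or past
 * the end of the partition, a truncated count for a request that
 * crosses the end, or -1 for a malformed request.
 */
static int64_t sk_bounds_check(const struct sk_partition *p, int64_t blkno,
    int64_t bcount, int32_t secsize)
{
    int64_t nsec;

    assert(p != NULL && secsize > 0);
    if (blkno < 0 || bcount < 0 || bcount % secsize != 0)
        return -1;
    nsec = bcount / secsize;
    if (blkno >= p->p_size)
        return 0;
    if (blkno + nsec > p->p_size)
        nsec = p->p_size - blkno;
    return nsec;
}

/* Absolute sector on the media for a partition-relative block number. */
static int64_t sk_abs_sector(const struct sk_partition *p, int64_t blkno)
{
    assert(p != NULL);
    return p->p_offset + blkno;
}
```

If the label were in 2048-byte sectors while blkno arrived in DEV_BSIZE units, the comparison against p_size would be off by a factor of four, which is the ambiguity described in 2.2.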
5.1. to implement 4.3.1
Each block device driver treats the block number passed to *strategy as
being in disklabel sector size units. This seems to be the natural
implementation. Ffs/msdosfs basically don't need modification, as
described in 4.3. But for cd9660, the bmap handling must be modified.
5.2. to implement 4.3.2
In this case, the block io unit is fixed to DEV_BSIZE. The file system
must translate its requests to DEV_BSIZE units, and the block device
driver must re-translate them to the disklabel sector size and process
them. To do this, we must calculate a translation parameter when
mounting the file system and store it as a new field in struct fs (there
is some spare area). And when translating block numbers, fsbtodb() and
dbtofsb() must use this value rather than fs_fsbtodb in struct fs. In
msdosfs, modifying to use macros like fsbtodb() and dbtofsb() when
requesting to block device
5.3. to implement 4.3.3
Although the file system code should be modified as in 5.2, the
translation parameter should be calculated from the disklabel sector
size, which may be obtained with ioctl(DIOCZGINFO?), rather than from
DEV_BSIZE. In cd9660, the bmap code should be modified to translate
block numbers to the disklabel sector size obtained with the ioctl
rather than to DEV_BSIZE.
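The translation parameter of 5.2 and 5.3 can be sketched like this. All names are hypothetical; in particular this shows neither the real struct fs nor the real fsbtodb(). It assumes both sizes are powers of two and that the file system sector size is not smaller than the block io unit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Computed once at mount time: how many bits to shift a block number
 * in file system sector units to get a block number in io_unit units.
 * For 5.2 io_unit is DEV_BSIZE (512); for 5.3 it is the disklabel
 * sector size reported by the driver.
 * Returns -1 if the sizes cannot be related by a left shift.
 */
static int sk_fs_to_io_shift(int32_t fs_secsize, int32_t io_unit)
{
    int shift = 0;

    if (fs_secsize <= 0 || io_unit <= 0)
        return -1;
    if ((fs_secsize & (fs_secsize - 1)) != 0 ||
        (io_unit & (io_unit - 1)) != 0)
        return -1;                      /* not powers of two */
    if (fs_secsize < io_unit)
        return -1;                      /* would need a right shift */
    while ((io_unit << shift) < fs_secsize)
        shift++;
    return shift;
}

/* Applied on every request, using the shift stored at mount time. */
static int64_t sk_fsb_to_iob(int64_t fs_blkno, int shift)
{
    assert(shift >= 0);
    return fs_blkno << shift;
}
```

A file system made with 2048-byte sectors gets shift 2 under 5.2, so file system sector 10 becomes block 40. Under 5.3 on a 2048 bytes/sector device the shift is 0 and the block number passes through unchanged.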
With those modifications applied, 4.3.1 still requires an appropriate
disklabel sector size, while 4.3.2 and 4.3.3 can handle any file system
on any image of a block device.
I think the method of 4.3.3 is the most flexible, though complicated.
5.1, 5.2 and 5.3 have been sent as kern/3790, kern/3791 and kern/3792.
please look
Additionally, after ffs is modified to support non-512 bytes/sector
media, a new problem appears. It is di_blocks in struct dinode. What is
the proper unit of this? In the current implementation, this is
DEV_BSIZE (it is counted with btodb() and dbtob()). But I think the
file system sector size is appropriate. Any comment about this
Koji Imada - je4owb/2