Subject: Re: Custom FIFO filesystem from userspace
To: Matthew Mondor <mm_lists@pulsar-zone.net>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 11/29/2005 08:23:07

On Tue, Nov 29, 2005 at 09:13:12AM -0500, Matthew Mondor wrote:
> On Mon, 28 Nov 2005 09:27:47 -0500
> Matthew Mondor <mm_lists@pulsar-zone.net> wrote:
>
> I resumed doing some research related to custom storage using raw
> devices:
>
> Hmm reading /usr/src/sys/dev/ata/wd.c and physio(9), it appears that
> using the raw device causes no unwanted buffering, if I properly
> understand, so this should be fine.
>
> About the preferred I/O block size to use, it appears that filesystems
> can use various sizes depending on the wanted size of the FS and block
> indexing methods used;  Since I can choose a custom block size, and that
> I saw fdisk(8) report the same amount of sectors per tracks (63) both
> for BIOS and NetBSD geometry, I guess that I could base the system on
> sector sized blocks, or perhaps track sized blocks and use custom I/O
> buffering....  I however have no idea if this geometry is virtual, or if
> it actually corresponds to the hard drive hardware and that I'll gain
> any advantage basing my block size on the sector or track size.

Modern disks don't have a single geometry. 63 sectors/track sounds like
a remnant of the BIOS geometry. I doubt you'll gain much advantage using
the track size. I think you will, however, gain an advantage using large
writes. Note that our i/o system has a limit in i/o transaction size,
which is 64k as I recall.
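
As a rough illustration of the large-write advice, here is a minimal
sketch that splits one big buffer into transfers no larger than the
64k figure recalled above. The names write_chunked and XFER_MAX are
made up for this example, and the cap itself is an assumption to check
against the running kernel rather than a documented constant.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed per-transfer cap, taken from the "64k as I recall" figure. */
#define XFER_MAX ((size_t)64 * 1024)

/*
 * Write len bytes from buf at byte offset off, in pieces no larger
 * than XFER_MAX.  Returns 0 on success, -1 on error.
 * On a raw device, buf, off and len would normally also have to be
 * multiples of the sector size; that is not checked here.
 */
static int
write_chunked(int fd, const void *buf, size_t len, off_t off)
{
	const char *p = buf;

	while (len > 0) {
		size_t n = len < XFER_MAX ? len : XFER_MAX;
		ssize_t w = pwrite(fd, p, n, off);

		if (w < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry this piece */
			return -1;
		}
		if (w == 0) {
			errno = EIO;	/* no progress, give up */
			return -1;
		}
		p += w;
		off += w;
		len -= (size_t)w;
	}
	return 0;
}
```

A caller would accumulate data in a large, sector-aligned buffer and
hand it to write_chunked() in one go, rather than issuing many
sector-sized writes.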

> I have seen some cache-related ioctls in wd.c.  I'm unsure if these
> should be used to retrieve/set/flush (DIOCGCACHE, DIOCSCACHE,
> DIOCCACHESYNC).  I guess that since operations on raw devices are
> unbuffered, the use of fsync(2) will probably be irrelevant to
> ensuring that the data is synchronized to disk, though.  Would it
> however make any sense to set the buffer size related to my block size,
> and to use the flush ioctl after committing transaction data and related
> log entries?

You are partially incorrect. The cache in question for these operations
is the cache in the drive. While using the raw device ensures that the
kernel does no caching, chances are that the disk itself is doing
caching. You'll probably be disappointed with the performance if it
doesn't cache.

> I can't yet determine if it's possible for a block to be only partially
> written to disk in the event of a crash.  The latest transaction logs
> will need to be properly committed to disk after writing the new file(s)
> data out of the buffer, so this is an important aspect I must look into.

I think what will happen is you will either get the unmodified block
(write never made it out of cache), you will get the written block, or
there will be an i/o error with the block.
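
If those three outcomes hold, recovery code could sort a block into
one of them after a crash. The sketch below assumes a hypothetical
on-disk layout in which every 512-byte block starts with a 64-bit
sequence number; the layout and the names BLK_SIZE, blk_state and
classify_block are illustrative only, not anything from wd.c.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLK_SIZE 512	/* assumed sector-sized block */

enum blk_state { BLK_OLD, BLK_NEW, BLK_IOERR };

/*
 * Read the block at byte offset off and decide which outcome applies.
 * Hypothetical layout: the first 8 bytes of each block hold a
 * sequence number, and new_seq is the one the interrupted write
 * would have stored.
 */
static enum blk_state
classify_block(int fd, off_t off, uint64_t new_seq)
{
	unsigned char blk[BLK_SIZE];
	uint64_t seq;

	/* A failed or short read is treated as the i/o error case. */
	if (pread(fd, blk, sizeof(blk), off) != (ssize_t)sizeof(blk))
		return BLK_IOERR;

	memcpy(&seq, blk, sizeof(seq));
	return seq == new_seq ? BLK_NEW : BLK_OLD;
}
```

A log replay pass could then redo the write for BLK_OLD, skip it for
BLK_NEW, and report BLK_IOERR to the operator.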

Take care,

Bill
