Subject: Re: Custom FIFO filesystem from userspace
To: Matthew Mondor <email@example.com>
From: Bill Studenmund <firstname.lastname@example.org>
Date: 11/29/2005 08:23:07
Content-Type: text/plain; charset=us-ascii
On Tue, Nov 29, 2005 at 09:13:12AM -0500, Matthew Mondor wrote:
> On Mon, 28 Nov 2005 09:27:47 -0500
> Matthew Mondor <email@example.com> wrote:
> I resumed doing some research related to custom storage using raw
> Hmm reading /usr/src/sys/dev/ata/wd.c and physio(9), it appears that
> using the raw device causes no unwanted buffering, if I properly
> understand, so this should be fine.
> About the preferred I/O block size to use, it appears that filesystems
> can use various sizes depending on the wanted size of the FS and block
> indexing methods used; since I can choose a custom block size, and since
> I saw fdisk(8) report the same number of sectors per track (63) both
> for BIOS and NetBSD geometry, I guess that I could base the system on
> sector sized blocks, or perhaps track sized blocks and use custom I/O
> buffering.... I however have no idea if this geometry is virtual, or if
> it actually corresponds to the hard drive hardware and that I'll gain
> any advantage basing my block size on the sector or track size.
Modern disks don't have a single geometry. 63 sectors/track sounds like a
remnant of the BIOS geometry. I doubt you'll gain much advantage using the
track size. I think you will, however, gain an advantage using large
writes. Note that our i/o system has a limit in i/o transaction size,
which is 64k as I recall.
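As a rough illustration of the large-write point, here is a minimal C sketch
that splits one big write into transfers no larger than 64k. The names
MAX_XFER, SECTOR_SIZE, xfer_len() and write_chunked() are made up for this
example; the 64k figure is from memory rather than a checked kernel constant,
and the 512-byte sector size is an assumption about the drive.

```c
#define _XOPEN_SOURCE 700	/* for pwrite(2) under strict compiler modes */
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed per-transfer ceiling (64k, as recalled; verify on your system). */
#define MAX_XFER	((size_t)64 * 1024)
/* Assumed sector size; raw devices want sector-multiple offsets and lengths. */
#define SECTOR_SIZE	512

/* Size of the next transfer when `remaining` bytes are still to be written. */
static size_t
xfer_len(size_t remaining)
{
	return remaining < MAX_XFER ? remaining : MAX_XFER;
}

/*
 * Write len bytes from buf to fd at offset off, in pieces of at most
 * MAX_XFER bytes. off and len must be multiples of SECTOR_SIZE.
 * Returns 0 on success, -1 on error with errno set.
 */
int
write_chunked(int fd, const void *buf, size_t len, off_t off)
{
	const char *p = buf;

	if (off % SECTOR_SIZE != 0 || len % SECTOR_SIZE != 0) {
		errno = EINVAL;
		return -1;
	}
	while (len > 0) {
		ssize_t w = pwrite(fd, p, xfer_len(len), off);

		if (w < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry same piece */
			return -1;
		}
		if (w == 0) {
			errno = EIO;	/* no progress; give up */
			return -1;
		}
		p += w;
		len -= (size_t)w;
		off += w;
	}
	return 0;
}
```

The caller would open the raw device itself and pass a buffer whose length is
a multiple of the sector size; larger MAX_XFER values would only help if the
i/o system actually accepts them.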
> I have seen some cache related ioctls in wd.c. I'm unsure if these
> should be used to retrieve/set/flush (DIOCGCACHE, DIOCSCACHE,
> DIOCCACHESYNC). I guess that since operations on raw devices are
> unbuffered, that the use of fsync(2) will probably be irrelevant to
> ensure that the data is synchronized to disk, though. Would it
> however make any sense to set the buffer size related to my block size,
> and to use the flush ioctl after committing transaction data and related
> log entries?
You are partially incorrect. The cache in question for these operations is
the cache in the drive. While using the raw device ensures that the kernel
does no caching, chances are that the disk itself is doing caching. You'll
probably be disappointed with the performance if it doesn't cache.
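To make the drive-cache point concrete, here is a minimal C sketch of asking
the drive to flush its own cache with DIOCCACHESYNC. The function name
flush_drive_cache() is made up for this example, the meaning of the int
argument (non-zero = flush even if the driver believes the cache is clean)
is my understanding and should be checked against the driver source, and
the non-NetBSD branch exists only so the sketch compiles elsewhere.

```c
#include <sys/types.h>
#include <sys/ioctl.h>
#include <errno.h>
#if defined(__NetBSD__)
#include <sys/dkio.h>	/* DIOCCACHESYNC */
#endif

/*
 * Ask the drive behind fd to write its internal cache to the media.
 * Returns 0 on success, -1 on error with errno set.
 */
int
flush_drive_cache(int fd)
{
#ifdef DIOCCACHESYNC
	/* Assumption: non-zero requests a flush regardless of dirty state. */
	int force = 1;

	return ioctl(fd, DIOCCACHESYNC, &force);
#else
	/* Placeholder for systems without this ioctl. */
	(void)fd;
	errno = ENOTSUP;
	return -1;
#endif
}
```

One plausible ordering would be: write the file data, flush, write the log
entry that commits it, flush again. That way the log entry should not reach
the media before the data it describes.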
> I can't yet determine if it's possible for a block to be only partially
> written to disk in the event of a crash. The latest transaction logs
> will need to be properly committed to disk after writing the new file(s)
> data out of the buffer, so this is an important aspect I must look into.
I think what will happen is you will either get the unmodified block
(write never made it out of cache), you will get the written block, or
there will be an i/o error with the block.
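If you want recovery code to tell those three cases apart, one possible
approach is to keep a checksum of the old and new block contents in the log
and compare after a crash. The sketch below is only an illustration: the
names block_sum(), classify_block() and enum block_state are made up, FNV-1a
is just a convenient stand-in for whatever checksum you prefer, and the
BLOCK_UNKNOWN case is a sanity check for the outcome I don't expect to see.

```c
#include <stddef.h>
#include <stdint.h>

enum block_state {
	BLOCK_OLD,	/* write never made it out of the cache */
	BLOCK_NEW,	/* write reached the media */
	BLOCK_IO_ERROR,	/* the read of the block failed */
	BLOCK_UNKNOWN	/* matches neither checksum; not expected */
};

/* 32-bit FNV-1a hash, used here as a simple stand-in checksum. */
static uint32_t
block_sum(const unsigned char *p, size_t n)
{
	uint32_t h = 2166136261u;

	while (n-- > 0) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/*
 * Classify a block read back during recovery. read_failed is non-zero
 * if the read itself returned an error; old_sum and new_sum are the
 * checksums recorded in the log before the write was issued.
 */
enum block_state
classify_block(const unsigned char *blk, size_t len, int read_failed,
    uint32_t old_sum, uint32_t new_sum)
{
	uint32_t sum;

	if (read_failed)
		return BLOCK_IO_ERROR;
	sum = block_sum(blk, len);
	if (sum == new_sum)
		return BLOCK_NEW;
	if (sum == old_sum)
		return BLOCK_OLD;
	return BLOCK_UNKNOWN;
}
```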