Subject: RE: Coalesce disk I/O
To: Alfred Perlstein <bright@mu.org>
From: Gordon Waidhofer <gww@traakan.com>
List: tech-kern
Date: 01/26/2004 19:51:26
> > I/O coalescing by creating page aliases using the MMU is incredibly
> > inefficient on some architectures, either because of the cache flushing
> > it forces or because of the cost of MMU operations.
> 
> Yes, the remapping stuff is wasteful ....

pmap stuff is really expensive, especially on some
architectures and especially on multiprocessors. And,
as you say, in the end it amounts to very expensive
pointer memory.

Forming a "super bp" above strategy() really isn't the
way to go. Here's a little thought experiment.

A filesystem/database/videoserver/other generates a
huge pile of I/O requests (bufs). These get handed
into strategy(). Wow, it isn't a disk but an LVM.
The requests get dispersed across the constituent
disks. Now is a good time to coalesce.

If the first batch of bufs had been coalesced,
all that would happen is the LVM would go through
the expensive step of taking them all apart again.
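
To make that concrete, here is a rough sketch of the splitting
step. It assumes plain RAID-0 style striping, which is only one
way an LVM might lay things out, and every name in it (stripe_map,
phys_ext, lvm_split) is made up for illustration. None of it is
real kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical striped volume: ndisks disks, stripe_blks blocks per stripe. */
struct stripe_map {
	uint32_t ndisks;
	uint32_t stripe_blks;
};

/* One piece of a logical request after mapping to a constituent disk. */
struct phys_ext {
	uint32_t disk;   /* index of the constituent disk */
	uint64_t blkno;  /* block number on that disk */
	uint32_t nblks;  /* length in blocks */
};

/*
 * Split the logical extent [lblk, lblk + nblks) at stripe boundaries.
 * Stores at most `max` pieces in `out` and returns the number of
 * pieces the extent needs, which may be larger than `max`.
 */
size_t
lvm_split(const struct stripe_map *m, uint64_t lblk, uint32_t nblks,
    struct phys_ext *out, size_t max)
{
	size_t n = 0;

	while (nblks > 0) {
		uint64_t stripe = lblk / m->stripe_blks;
		uint32_t off = (uint32_t)(lblk % m->stripe_blks);
		uint32_t len = m->stripe_blks - off;

		if (len > nblks)
			len = nblks;
		if (n < max) {
			out[n].disk = (uint32_t)(stripe % m->ndisks);
			out[n].blkno = (stripe / m->ndisks) * m->stripe_blks + off;
			out[n].nblks = len;
		}
		n++;
		lblk += len;
		nblks -= len;
	}
	return n;
}
```

With 2 disks and 8-block stripes, a 16-block logical request
starting at block 4 comes out as three pieces. The first and
third both land on disk 0 and are adjacent there (blocks 4-7 and
8-11), so below the LVM is where they can be usefully merged.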

I did look at vfs_cluster.c on FreeBSD, but only
briefly. It doesn't look like it's coalescing. It's
doing the "super bp" thing and only for file data.
Don't smack me if I misinterpreted vfs_cluster.c.
Thanks for the pointer.

Consider another thought experiment. sync() flushes
a big pile of inodes. That means lots of inode blocks.
Inode blocks have a habit of being consecutive on disk.
Cool. So those 37 inode blocks could be delivered in
a single, coalesced I/O. Trust me, this is a huge
performance win under heavy load. 1/37 the interrupts.
1/37 the CPU power. 1/37 the access latency. Big win.
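
A minimal sketch of the grouping step, assuming the queue is
already sorted by block number. The io_req and io_run structures
and coalesce_runs() are hypothetical stand-ins, not struct buf or
any existing driver interface. Each run would become one physical
I/O with one scatter/gather segment per original request, so the
buffers stay where they are and nothing gets remapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified request (a stand-in for a buf). */
struct io_req {
	uint64_t blkno;  /* starting block number on the disk */
	uint32_t nblks;  /* length in blocks */
	void    *data;   /* buffer memory; never copied or remapped here */
};

/* One physical I/O: a run of requests that are contiguous on disk. */
struct io_run {
	uint64_t blkno;  /* first block of the run */
	uint32_t nblks;  /* total blocks in the run */
	size_t   first;  /* index of the first request in the run */
	size_t   count;  /* number of requests (scatter/gather segments) */
};

/*
 * Group requests, already sorted by blkno, into contiguous runs.
 * `runs` must have room for `n` entries. Returns the number of runs.
 */
size_t
coalesce_runs(const struct io_req *reqs, size_t n, struct io_run *runs)
{
	size_t nruns = 0;

	for (size_t i = 0; i < n; i++) {
		struct io_run *cur = nruns ? &runs[nruns - 1] : NULL;

		if (cur != NULL && cur->blkno + cur->nblks == reqs[i].blkno) {
			/* Adjacent on disk: extend the current run. */
			cur->nblks += reqs[i].nblks;
			cur->count++;
		} else {
			/* First request, or a gap on disk: start a new run. */
			runs[nruns].blkno = reqs[i].blkno;
			runs[nruns].nblks = reqs[i].nblks;
			runs[nruns].first = i;
			runs[nruns].count = 1;
			nruns++;
		}
	}
	return nruns;
}
```

Feed it 37 single-block requests for consecutive blocks and it
returns one run of 37 blocks, i.e. one I/O and one completion
interrupt instead of 37. A real driver would also cap the run at
the controller's maximum transfer size and segment count, which
this sketch ignores.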

Coalescing I/O isn't going to do anything about softdep
bugs. I just wanted to start a thread since there was
discussion about performance under heavy load.
It's been good. Nothing urgent, though.

Cheers,
	-gww