Subject: Re: mp->mnt_vnodelist change
To: Andrew Reilly <>
From: Bill Studenmund <>
List: tech-kern
Date: 10/19/2006 11:19:37

On Thu, Oct 19, 2006 at 01:33:33PM +0200, Reinoud Zandijk wrote:
> Hi
> On Thu, Oct 19, 2006 at 03:40:20PM +1000, Andrew Reilly wrote:
> > On Thu, 19 Oct 2006 00:01:53 +0200
> > Reinoud Zandijk <> wrote:
> > I've never looked at the code, but I would have thought that
> > there would be some sort of block-sorting/elevator-algorithm step
> > in between "this list of blocks need to go to disk" and "disk
> > drive: seek to xyz; write blocks nnn to nnn+m".  Isn't there?
> >
> > If there is, then this reversal probably doesn't matter much, if
> > all it does is pessimize an in-memory sort.
> Well there is some sort of disc queue sorting going on at the disc
> interface driver level but that's a peep-hole optimisation that only works
> for some sanity when accessed by multiple processes and asynchronous writes
> from one process.

peep-hole optimization? I thought we had an elevator sort in there.
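
For reference, here is a minimal sketch of what I mean by an elevator sort.
It is a generic one-way elevator (C-LOOK) ordering, not the actual queueing
code in our tree; struct io_req, elevator_order() and the head argument are
hypothetical names made up for this illustration.

```c
#include <stdlib.h>

/* Hypothetical request record: only the starting block number matters here. */
struct io_req {
    long blkno;
};

/* qsort comparator: ascending block number. */
static int cmp_blkno(const void *a, const void *b)
{
    long x = ((const struct io_req *)a)->blkno;
    long y = ((const struct io_req *)b)->blkno;
    return (x > y) - (x < y);
}

/*
 * Reorder reqs[0..n-1] in place into one-way elevator order: first every
 * request at or beyond the current head position, ascending, then the
 * requests below the head, also ascending (served on the next sweep).
 */
void elevator_order(struct io_req *reqs, size_t n, long head)
{
    size_t i, split = 0;
    struct io_req *tmp;

    if (n == 0)
        return;

    qsort(reqs, n, sizeof(*reqs), cmp_blkno);

    /* Find the first request that lies at or beyond the head. */
    while (split < n && reqs[split].blkno < head)
        split++;
    if (split == 0 || split == n)
        return;                 /* plain ascending order is already right */

    tmp = malloc(n * sizeof(*tmp));
    if (tmp == NULL)
        return;                 /* fall back to plain ascending order */

    /* Rotate so that the sweep starts at the head position. */
    for (i = 0; i < n; i++)
        tmp[i] = reqs[(split + i) % n];
    for (i = 0; i < n; i++)
        reqs[i] = tmp[i];
    free(tmp);
}
```

The point is that an ordering like this only helps when several requests
are queued at the same time; with one request in the queue there is nothing
to sort.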

> For synchronous writes this peep-hole optimisation is not relevant since
> it will wait for each write to complete before it issues a new one.

This hits on the real problem. We need to be able to issue all of the
writes as async and wait for all of them to complete. As SANs become more
and more accessible, we need to be able to have lots of i/o outstanding.
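
To illustrate the pattern (queue every write first, then wait once for the
whole batch), here is a user-space sketch using POSIX AIO's lio_listio().
It is only an analogy for the behaviour I'd like, not a proposal for the
in-kernel mechanism; write_batch() and MAX_BATCH are hypothetical names
made up for this example.

```c
#include <aio.h>
#include <string.h>
#include <sys/types.h>

#define MAX_BATCH 16    /* arbitrary limit for this sketch */

/*
 * Queue n writes on fd in one call, then block until all of them have
 * completed.  Returns 0 if every write completed in full, -1 otherwise.
 */
int write_batch(int fd, const void *const bufs[], const size_t lens[],
                const off_t offs[], int n)
{
    struct aiocb cbs[MAX_BATCH];
    struct aiocb *list[MAX_BATCH];
    int i, failed = 0;

    if (n <= 0 || n > MAX_BATCH)
        return -1;

    memset(cbs, 0, sizeof(cbs));
    for (i = 0; i < n; i++) {
        cbs[i].aio_fildes = fd;
        cbs[i].aio_buf = (void *)bufs[i];
        cbs[i].aio_nbytes = lens[i];
        cbs[i].aio_offset = offs[i];
        cbs[i].aio_lio_opcode = LIO_WRITE;
        list[i] = &cbs[i];
    }

    /* LIO_WAIT: submit the whole list, return once every request is done. */
    if (lio_listio(LIO_WAIT, list, n, NULL) != 0)
        failed = 1;

    /* Collect the status of each request individually. */
    for (i = 0; i < n; i++) {
        if (aio_error(&cbs[i]) != 0 ||
            aio_return(&cbs[i]) != (ssize_t)lens[i])
            failed = 1;
    }
    return failed ? -1 : 0;
}
```

With all n requests outstanding at once, whatever sits below (driver queue,
controller, SAN) is free to reorder and overlap them, which is exactly what
a loop of synchronous writes prevents.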

I'm not sure how to fix this for now, so go ahead with the list change.
I'm not sure if we have a good methodology for waiting on a set of

Take care,

