Subject: Re: I/O priorities
To: Greywolf <greywolf@starwolf.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-kern
Date: 06/20/2002 21:26:51
On Thu, Jun 20, 2002 at 12:07:27PM -0700, Greywolf wrote:
> As a non-kernel non-engineer (I know just enough to be dangerous), it
> does strike me that ordering writes per partition is asking for trouble;
> should the writes not be ordered per physical device?
> 
> Take this into account:
> 
> 	Another I/O is currently holding up the write queue long
> 	enough to store the following two writes:
> 
> 	write is scheduled for blocks 229-383 of Nd0h (absolute
> 		blocks 32999829-32999983)
> 	write is scheduled for blocks 128-255 of Nd0a (absolute
> 		128-255 plus change where Nd0a[0] != Nd0[0])
> 
> If we order per partition -- at least by default, with no priorities
> given to partitions -- we have to seek to the middle or near the end
> of the disk to scribble something (time may be negligible to
> non-trivial), and then we have to seek BACK to near the beginning of the
> disk (for which time will NOT be non-trivial).

This is exactly what we have now: writes ordered per device. And this is
partly responsible for the behavior we have now (I tried completely
disabling write ordering, handling requests in the order they arrive: this
helps a bit with the problem we're talking about). Here's an example of
what happens:
- a) a bunch of sequential I/O requests for the middle of the disk are
  queued. I/O to the disk is started, and we go do something else while it
  completes.
- b) while waiting for a) we queue a single I/O for the end of the disk.
  With write ordering it's appended to the queue (we're still in the
  middle of the disk).
- c) another bunch of sequential I/O requests for the middle of the disk
  are queued. Because of write ordering, they're inserted between a) and b).

With a large buffer cache, this keeps happening: each time some pages are
freed, the process manages to write a few more megs before the queue has
drained enough to reach the single I/O request. That request stays at the
end of the queue forever.
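
The scenario above can be reproduced with a small simulation. This is only
a sketch under assumptions, not the kernel's disksort code: the block
numbers, the rate of two queued writes per serviced request, and the name
simulate_sorted_queue are all made up for illustration.

```python
import bisect


def simulate_sorted_queue(steps, writer_steps,
                          far_block=900_000, mid_start=500_000):
    """Return the step at which the far request is serviced, or None.

    Hypothetical model: the queue is kept sorted by block number and the
    disk always services the lowest pending block.
    """
    queue = [far_block]      # b) the single request for the end of the disk
    next_mid = mid_start     # next sequential block the writer will queue
    for step in range(steps):
        if step < writer_steps:
            # The writer queues two sequential blocks per step, faster than
            # the disk drains them (it services one per step).
            for _ in range(2):
                bisect.insort(queue, next_mid)
                next_mid += 1
        serviced = queue.pop(0)   # lowest block first: the "ordered" policy
        if serviced == far_block:
            return step
    return None


# The writer never stops: the far request is never reached.
print(simulate_sorted_queue(steps=1000, writer_steps=1000))
# The writer stops after 10 steps: the queue drains and the far request
# is finally serviced.
print(simulate_sorted_queue(steps=1000, writer_steps=10))
```

As long as the writer keeps queueing blocks that sort before the far
request, the far request's wait is bounded only by how long the writer runs.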

Sure, we should still try to keep requests to disk ordered. But there
should be limits to this: once one area of the disk has been serviced enough,
go visit other areas - even if there are still requests pending for this
area, because in some cases there are *always* requests for this area, and so
we will *never* visit other areas.
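
One possible way to put a limit on the ordering, sketched here as an
assumption rather than as anything that exists in the tree: keep the queue
sorted, but after max_run consecutive requests taken in sorted order, service
the oldest pending request instead. The class name BoundedSortedQueue, the
max_run parameter and the workload are hypothetical.

```python
import bisect
from itertools import count


class BoundedSortedQueue:
    """Sorted request queue with a cap on consecutive in-order services."""

    def __init__(self, max_run=8):
        self.max_run = max_run   # hypothetical tunable
        self.pending = []        # sorted list of (block, arrival_seq)
        self.run = 0             # in-order services since the last jump
        self.seq = count()

    def put(self, block):
        bisect.insort(self.pending, (block, next(self.seq)))

    def get(self):
        if not self.pending:
            return None
        if self.run >= self.max_run:
            # This area has been serviced enough: go to the oldest request,
            # wherever it is on the disk.
            idx = min(range(len(self.pending)),
                      key=lambda i: self.pending[i][1])
            self.run = 0
        else:
            idx = 0              # lowest block number, as before
            self.run += 1
        return self.pending.pop(idx)[0]


def steps_until_far_serviced(max_run, steps=1000, far_block=900_000):
    """Same workload as a) b) c): a writer that never stops."""
    q = BoundedSortedQueue(max_run)
    q.put(far_block)
    next_mid = 500_000
    for step in range(steps):
        for _ in range(2):
            q.put(next_mid)
            next_mid += 1
        if q.get() == far_block:
            return step
    return None


print(steps_until_far_serviced(max_run=8))
```

The far request now waits for at most max_run other requests, at the cost
of one extra long seek per run. Choosing max_run is a trade-off between
throughput and latency and would need measuring.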

> Actually, that will happen whether or not we assign priorities to
> different partitions, but I'm leaving it in for train-of-thought.
> 
> Anyway, wouldn't it still be best to order the writes per disk rather
> than per partition?
> 
> An I/O scheduler, would that be something like the process scheduler
> where the more contiguous time a write process has, the lower its
> priority gets on the I/O end?

Could be. We need to think about it more. And read papers; it's a problem
OSes like Solaris have probably solved.

> 
> I think it should apply to hard drives only, personally, as if you do
> something like that with a CD-RW, you run the risk of an underrun.
> But, then, what do I know?

When you write a CD-RW, commands are sent directly by the burning process
to the device. It doesn't go through the I/O queue.

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
--