Subject: Re: I/O priorities
To: Aidan Cully <aidan@kublai.com>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: tech-kern
Date: 06/20/2002 15:36:29
On Thu, Jun 20, 2002 at 08:35:50AM -0400, Aidan Cully wrote:
> [redirected to tech-kern]
> On Thu, Jun 20, 2002 at 01:06:58PM +0200, Manuel Bouyer wrote:
> > On Thu, Jun 20, 2002 at 01:01:35PM +0200, Wojciech Puchar wrote:
> > > i understand. can't we have 2 queues?
> > 
> > This is what I've been saying since the beginning :)
> > Yes, we need to give different priorities to different I/O, and having
> > several queues is one way to do this (it may not be the best; if we want
> > to keep disksort, keeping a single queue may be easier). The problem is to
> > define the priority policy (and possibly to gather some information from
> > the upper layers to help implement this policy).
> 
> I'm starting to understand this a bit better...  Two things strike me
> about this: 1) it's only a problem when the kernel allocates buffers
> for its own internal use, during read()s and write()s, but not mmap().

mmap() can probably trigger the problem as well. Anything that creates a
large buffered write will.
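
For instance (just a userland sketch to make the point, file name and size
are arbitrary), dirtying a large MAP_SHARED mapping and syncing it queues
one big burst of buffered writes, much like a huge write():

/*
 * Sketch only: dirty 256MB through mmap() and queue it for write-back
 * in one go.  The flush then competes on the same device queue as
 * everything else.
 */
#include <sys/types.h>
#include <sys/mman.h>
#include <err.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	const size_t len = 256 * 1024 * 1024;
	int fd;
	char *p;

	if ((fd = open("bigfile", O_RDWR | O_CREAT, 0644)) == -1)
		err(1, "open");
	if (ftruncate(fd, (off_t)len) == -1)
		err(1, "ftruncate");
	if ((p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
	    fd, 0)) == MAP_FAILED)
		err(1, "mmap");
	memset(p, 0xff, len);		/* dirty every page */
	msync(p, len, MS_ASYNC);	/* queue one large buffered write */
	return 0;
}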

> 2) The problem we'd be trying to solve is almost the same as that
> originally faced by Berkeley when they created the scheduler system --
> that is, to still allow a good level of interactivity when some
> processes want to use all the CPU for themselves, though in this case
> it's processes wanting to use all buffers and disk I/O.
> 
> Maybe an I/O scheduler makes sense as a solution to this problem?  Or
> would that be Too Much Effort?  If so, another solution might be to
> restrict the percentage of available buffers a given process can use
> for read/write...  Keep it small enough, and you wouldn't need to wait
> seconds for dd if=/dev/zero to flush all of its buffers.

An I/O scheduler probably does make sense. Other mechanisms can help too.
First, we probably need a per-partition I/O queue instead of a per-device one.
For example, with the current implementation a write barrier is global to a
device, when it only needs to be per-partition (for now I think write barriers
are not used at all, but that doesn't mean we don't need to handle them
properly :)
We could also implement per-partition I/O priorities, which would already be
a win (at least for server use, where we usually have one partition per
purpose). We could look at the algorithms used in QoS network devices, for
example.
Bumping the priority of swap I/O probably makes sense too.
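
To make the per-partition priority idea concrete, here is a rough sketch
(the names and structures are invented for illustration, this is not
existing code): one queue per partition, each with a weight, drained
weighted-round-robin the way a simple QoS scheduler services its classes.

#include <sys/queue.h>
#include <stddef.h>

#define MAXPART	16

struct ioreq {
	TAILQ_ENTRY(ioreq) q_next;
	int	part;			/* partition the request targets */
	/* ... block number, buffer pointer, etc ... */
};

struct partq {
	TAILQ_HEAD(, ioreq) head;	/* pending requests for this partition */
	int	weight;			/* requests per round for this partition */
	int	credit;			/* requests left in the current round */
};

static struct partq partqs[MAXPART];

void
part_init(void)
{
	int i;

	/* Weights would come from configuration; default to equal shares. */
	for (i = 0; i < MAXPART; i++) {
		TAILQ_INIT(&partqs[i].head);
		partqs[i].weight = partqs[i].credit = 1;
	}
}

void
part_enqueue(struct ioreq *req)
{
	TAILQ_INSERT_TAIL(&partqs[req->part].head, req, q_next);
}

/* Pick the next request to issue, giving each partition its share per round. */
struct ioreq *
part_dequeue(void)
{
	struct ioreq *req;
	int i, pass;

	for (pass = 0; pass < 2; pass++) {
		for (i = 0; i < MAXPART; i++) {
			struct partq *pq = &partqs[i];

			if (TAILQ_EMPTY(&pq->head) || pq->credit == 0)
				continue;
			pq->credit--;
			req = TAILQ_FIRST(&pq->head);
			TAILQ_REMOVE(&pq->head, req, q_next);
			return req;
		}
		/* All credits spent: start a new round and retry once. */
		for (i = 0; i < MAXPART; i++)
			partqs[i].credit = partqs[i].weight;
	}
	return NULL;			/* every queue is empty */
}

With something like this, a dd onto one partition can no longer monopolize
the device queue ahead of the partition holding, say, /var.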

I already tried per-partition I/O priorities (with a gross hack) and it
helps a lot. We could start with this; it's probably the easiest thing to
implement. A full I/O scheduler will probably require changes in the
filesystems and UVM.
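
The simplest form of the per-partition priority tweak looks something like
this (again invented names, and strict priority only, so it can starve the
other partitions):

#include <sys/types.h>
#include <sys/queue.h>

#define	NPART		16
#define	PARTOF(dev)	((int)(minor(dev) % NPART))	/* cf. DISKPART() */

struct ioreq {
	TAILQ_ENTRY(ioreq) q_next;
	dev_t	dev;
};

static TAILQ_HEAD(, ioreq) diskq = TAILQ_HEAD_INITIALIZER(diskq);
static int favored_part = 1;		/* e.g. the partition holding /var */

void
disk_enqueue(struct ioreq *req)
{
	/* Requests for the favored partition jump to the head of the queue. */
	if (PARTOF(req->dev) == favored_part)
		TAILQ_INSERT_HEAD(&diskq, req, q_next);
	else
		TAILQ_INSERT_TAIL(&diskq, req, q_next);
}

The starvation problem is one more reason to move to weighted queues or a
real I/O scheduler later.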

--
Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
--