Subject: Re: buffer priority [Re: unified buffers and responsibility]
To: Milos Urbanek <urbanek@openbsd.cz>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: tech-kern
Date: 06/13/2002 16:36:09
On Thu, Jun 13, 2002 at 02:07:20PM +0200, Milos Urbanek wrote:
> On Thu, Jun 13, 2002 at 12:42:57AM +0200, Manuel Bouyer wrote:
> > Hi,
> > I've experimented a bit with this problem of X freezing while a large
> > cp is running and the target disk is the same as the system disk.
> > 
> > One of the reasons for the problem is that some data gets paged out
> > when it shouldn't be (I see activity on the system disk when doing
> > a large cp on another disk, clearly related to the cp).
> > Even setting filemax low (under 10%) doesn't help, and top still reports
> > about 30M allocated to files (of 128M - 70M when kernel and buffer cache
> > are allocated).
> 
> Is that because most RAM pages are already assigned to file buffers and
> cannot be freed until they are written to the disk and freed in biodone()?
> And therefore the pages of running processes are being swapped out?

But this shouldn't happen IMHO. The kernel should wait for the
dirty buffers to be flushed instead of paging out exec or data pages.
Paging them out only increases the I/O load.

> 
> > 
> > The second problem is I/O priority: buffers of a large, batch I/O have
> > the same priority as a one-buffer I/O on which a process is blocked.
> > This also kills interactive performance (and the disksort() routines
> > probably make this even worse).
> > On my system my test partition is the last one in the disklabel, so I
> > replaced disksort with this simple algorithm: the lower the partition
> > number is, the higher the priority of the buffer is.
> > +void
> > +disksort_pri(struct buf_queue *bufq, struct buf *bp)
> > +{
> > +       int part = DISKPART(bp->b_dev);
> > +       struct buf *bq, *nbq;
> > +
> > +       bq = BUFQ_FIRST(bufq);
> > +       if (bq == NULL) {
> > +               BUFQ_INSERT_TAIL(bufq, bp);
> > +               return;
> > +       }
> > +
> > +       while ((nbq = BUFQ_NEXT(bq)) != NULL) {
> > +               if (part < DISKPART(nbq->b_dev))
> > +                       goto insert;
> > +               bq = nbq;
> > +       }
> > +insert:        BUFQ_INSERT_AFTER(bufq, bq, bp);
> > +}
> > 
> > This helps a lot. There is still some slowdown, but the system is now usable
> > when a cp is running (without this, X will freeze completely until the cp
> > completes).
> > 
> > So I think we need something to prioritize I/O at a disk level (not partition
> > level). Even for server use I'm afraid this can cause problems (I'm thinking
> > about my mail server, on which some users have mailboxes of more than 100M).
> 
> Is not the issue that the disk is unable to clean buffers from the queue fast
> enough?

The disk will *never* be able to clean buffers fast enough (think
dd if=/dev/zero of=somefile).

> I do not think that prioritizing would help in the case of one process
> doing a 'cp huge_file somewhere' when there are no other interactive
> processes performing I/O (which I think is actually the situation in which
> I observe this problem myself).

I count swapin/swapout as interactive I/O.

Prioritizing would also solve a problem I'm seeing in a real-world
application, where an app writing a large file keeps a disk busy, leaving
all other processes with small I/O pending on this disk blocked for several
seconds.
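
To make the idea concrete, here is a minimal user-space sketch of
disk-level prioritized insertion. It is not kernel code and not a proposed
patch: struct req, struct req_queue, enum io_prio and req_insert_pri() are
hypothetical names made up for illustration, and a plain linked list stands
in for the BUFQ_* macros. It keeps the same insertion rule as the
disksort_pri() experiment quoted above, but keys on an explicit priority
class instead of the partition number.

```c
#include <stddef.h>

/* Hypothetical priority classes; a lower value is more urgent. */
enum io_prio { IO_PRIO_INTERACTIVE = 0, IO_PRIO_BATCH = 1 };

/* Simplified stand-in for struct buf: only what the sketch needs. */
struct req {
	int prio;		/* one of enum io_prio */
	int id;			/* identifies the request in a demo */
	struct req *next;
};

struct req_queue {
	struct req *head;
};

/*
 * Insert rq behind every queued request of equal or higher urgency and
 * ahead of the first less urgent one.  As in the disksort_pri() experiment,
 * the request at the head of the queue is never displaced (assumption:
 * it may already have been handed to the hardware).
 */
void
req_insert_pri(struct req_queue *q, struct req *rq)
{
	struct req *cur = q->head;

	rq->next = NULL;
	if (cur == NULL) {
		q->head = rq;
		return;
	}
	/* Walk past requests that are at least as urgent as rq. */
	while (cur->next != NULL && cur->next->prio <= rq->prio)
		cur = cur->next;
	rq->next = cur->next;
	cur->next = rq;
}
```

With two batch requests queued first and two interactive ones arriving
later, the interactive requests end up directly behind the head, in arrival
order, and the second batch request moves to the tail. Note that this sketch
does nothing against starvation: a steady stream of interactive requests
would delay batch requests indefinitely, so a real implementation would
probably need some aging or a bound on how often a request can be passed.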

--
Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
--