Subject: Re: 1.4.2 Observations
To: Manuel Bouyer <firstname.lastname@example.org>
From: Thor Lancelot Simon <email@example.com>
Date: 03/28/2000 12:16:41
On Tue, Mar 28, 2000 at 06:02:17PM +0200, Manuel Bouyer wrote:
> On Mon, Mar 27, 2000 at 04:44:01PM -0500, Thor Lancelot Simon wrote:
> > On Mon, Mar 27, 2000 at 10:53:28PM +0200, Manuel Bouyer wrote:
> > Nonetheless, it's been my experience that since shortly before the 1.4
> > release, our IDE subsystem has been prone to misbehave in the face of high
> > levels of I/O in ways which do make the whole system feel rather slow.
> I have the same behavior on a system with IDE disks, and SCSI (aha2940UW).
> > I don't understand quite what's going on, but doing things like rsync or
> > find or ls -lR or dump that hit the IDE disks with huge numbers of requests
> > do, in fact, from the statistics, get huge numbers of xfers/sec and very
> > high bytes/sec throughput, and CPU utilization does not, in fact, seem to
> > be particularly high. On the other hand, in the midst of this type of
> > activity even keyboard input can seem sluggish, and if I do something that
> > generates *new* I/O requests to the IDE disk (e.g. 'ls' while an rsync is
> > running in the background) those requests take a *long* time to complete.
> > The first behaviour suggests that too much time is being spent at high SPL,
> > but from examination of the IDE code that doesn't seem correct.
> This seems to be related to high IRQ load, whether disk I/O is involved or not
> (I've also seen this on a machine with high network load but no disk I/O).
Interestingly, I have a system here with a parallel printer attached that
has 64MB of buffer memory. I have seen over *40,000* IRQs/sec from the
lpt device on this system, while the system feels completely usable. It
can't just be the number of IRQs.
> > Interestingly, using LFS, which makes almost all disk I/O asynchronous,
> > pretty much makes both problems go away.
> I think this is also because the I/Os are larger, so the IRQ load is
This might explain why some SCSI controllers avoid this problem: when you
get an interrupt, it's quite likely you'll find that multiple commands
> > With SCSI disks, they don't
> > seem to appear in the first place. I'd suspect some kind of odd barrier
> > condition with !B_ASYNC buffers, but since we don't do disconnection or
> > multiple command queueing on IDE that doesn't seem likely, either.
> What SCSI controller do you use ?
A variety of them: ahc, bha, and adw. I haven't seen the problem we're
discussing with any of them. I run 'ahc' with tagged queueing turned on,
I don't think it can be just the *number* of IRQs. I think we have to be
spending too much time with too many interrupts blocked in some devices'
interrupt service routines. Otherwise, my system with the fast printer
generating 40,000 IRQs/sec would be useless, and it's fine.
I can't find a change in the period in which I recall this phenomenon
appearing (a month or so pre 1.4) which looks likely to have caused this,