Subject: Re: stray interrupt ipl 0x7
To: khaqq <khaqq@free.fr>
From: David Laight <david@l8s.co.uk>
List: port-sparc
Date: 07/29/2005 18:52:24
On Fri, Jul 29, 2005 at 01:18:03PM +0200, khaqq wrote:
> > 
> > Anyway, maybe hme(4) is missing some interrupts ?
> 
> This happens here quite often under full network load. The CPU seems to
> spend 70-80% of its cycles in "interrupt" according to top (interrupt handler ?).
> Transferring about 1GB through the box makes the error happen about 2 or 3
> times.
> That's on a SS5/110 with 32MB of RAM, never seems to swap, QFE 2.0,
> NetBSD 2.0.
> What would make it "miss" some interrupts ?

You've got it backwards!

What actually happens is that the device requests an interrupt while the
interrupt routine is active servicing a previous interrupt.

The ISR will process the event for the new interrupt, write to the
hardware to clear the IRQ, and then exit.
At this point we start a race between the hardware seeing the write,
clearing the IRQ and the (now inactive) IRQ propagating to the CPU,
and the CPU exiting from the interrupt handler and taking the interrupt.

If/when the CPU wins, it (typically) fails to find an ISR that wants
to service the interrupt and outputs the "stray interrupt" message.
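
As a rough illustration of where the message comes from, here is a toy
dispatcher (not the NetBSD code; every name in it is made up for the
example). It asks each handler registered at a level whether its device
wanted service, and complains if nobody claims the interrupt:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical handler type: returns nonzero if its device needed service. */
typedef int (*isr_fn)(void *arg);

struct isr_entry {
    isr_fn fn;
    void  *arg;
};

/* Toy device: 'pending' stands in for an event waiting in the hardware. */
struct toy_dev {
    int pending;
};

static int toy_isr(void *arg)
{
    struct toy_dev *d = arg;

    if (!d->pending)
        return 0;       /* nothing to do: not our interrupt */
    d->pending = 0;     /* service the event and clear it */
    return 1;
}

/*
 * Call every handler at this level. If none claims the interrupt,
 * count it as stray and print a message. Returns the number of
 * handlers that claimed it.
 */
static int dispatch_ipl(const struct isr_entry *tbl, size_t n,
                        unsigned ipl, unsigned *stray_count)
{
    int claimed = 0;

    assert(tbl != NULL && stray_count != NULL);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].fn(tbl[i].arg))
            claimed++;
    }
    if (claimed == 0) {
        (*stray_count)++;
        printf("stray interrupt ipl 0x%x\n", ipl);
    }
    return claimed;
}
```

If the first pass through the handler already serviced the second event,
a second dispatch at the same level finds nothing pending and is
reported as stray.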

On a sparc system the cpu puts writes into a FIFO (the store buffer)
and will perform all the reads associated with the return from interrupt
before the final write(s) issued inside the ISR reach the device - so the
IRQ line is often still active if cleared at the end of the ISR.
(Posted writes on PCI busses - especially if the actual device is
behind a few PCI-PCI bridges - just make it more likely.)
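
A deliberately crude model of that ordering, in case it helps. It is not
real hardware behaviour, just the two possible outcomes of the race, and
the names are invented for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one level-triggered IRQ line and one buffered write. */
struct toy_line {
    bool irq_asserted;   /* what the CPU sees when it samples the line */
    bool ack_buffered;   /* the "clear IRQ" write is still in the FIFO */
};

/* The ISR's final write: it only enters the store buffer. */
static void isr_write_ack(struct toy_line *l)
{
    l->ack_buffered = true;
}

/* The buffered write finally reaches the device, which drops the line. */
static void drain_store_buffer(struct toy_line *l)
{
    if (l->ack_buffered) {
        l->irq_asserted = false;
        l->ack_buffered = false;
    }
}

/*
 * Run the end of the ISR and the return from interrupt.
 * 'write_lands_first' selects which side wins the race.
 * Returns true if the CPU re-takes an interrupt that nobody will claim.
 */
static bool return_from_isr(struct toy_line *l, bool write_lands_first)
{
    bool retaken;

    assert(l != NULL);
    isr_write_ack(l);
    if (write_lands_first)
        drain_store_buffer(l);
    retaken = l->irq_asserted;   /* CPU samples the line on return */
    drain_store_buffer(l);       /* the write gets there eventually */
    return retaken;
}
```

With `write_lands_first` false the line is still asserted when the CPU
looks at it, which is the stray interrupt case.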

The traditional fix is to perform a read-back of the written address
to flush the write through the store buffer and PCI bridges.
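
Something along these lines, as a sketch only. The register layout and
function name are hypothetical and not taken from the hme(4) driver, and
a real NetBSD driver would go through the bus_space(9) accessors rather
than a raw pointer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device register block; 'intr_ack' is an invented name. */
struct fake_dev_regs {
    volatile uint32_t intr_ack;
};

/*
 * Acknowledge the interrupt, then read the same register back.
 * The read cannot complete until the earlier write to that address
 * has left the store buffer (and any bridges on the way), so the IRQ
 * should be deasserted before the handler returns.
 */
static uint32_t dev_ack_interrupt(struct fake_dev_regs *regs, uint32_t bits)
{
    assert(regs != NULL);
    regs->intr_ack = bits;    /* posted write: may sit in a FIFO */
    return regs->intr_ack;    /* read-back forces the write through */
}
```

The read-back costs a bus round trip on every interrupt, which is the
price of closing the window.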

	David

-- 
David Laight: david@l8s.co.uk