port-i386: i386 isa interrupt latency

Subject: i386 isa interrupt latency
To: None <tech-kern@NetBSD.ORG, port-i386@NetBSD.ORG>
From: Ronald Khoo <ronald@cpm.com.my>
List: port-i386
Date: 07/06/1995 14:13:06

Executive summary:  I get occasional poor interrupt latency on
NetBSD-current/i386 (yesterday's sup).  Please help.

[ Please adjust To: the appropriate mailing list, I'm not sure
  which one this question belongs to. ]

Hi.  I've potentially got a project to do using NetBSD.
The details are unimportant (it's actually a T1 speed PPP interface
using very stupid hardware) but the critical issue is that of
interrupt latency.

Basically, the interrupt latency response I need is similar to
what would be needed to run the 16450 at somewhere between
57600 and 115200 (but of course, with a 1.8 MHz clock chip and 16x prescale,
you can't get anything in between) so in order to check the feasibility
of my proposed project, I was asked to show that in general, it is
possible to do just that on a medium speed PC (486SX-33, 128k cache and
486DX2-66, 256k cache are the machines on my desk for this) with
fairly dumb ethernet cards (NE2000, 16-bit Compex WD clone which
probes as a WD8003E in 8-bit mode (?)).
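
[ The "nothing in between" remark comes straight out of the divisor
  arithmetic.  A minimal sketch, assuming the "1.8M" clock is the usual
  1.8432 MHz PC UART crystal; the helper name is made up for
  illustration and is not from com.c. ]

```c
#include <stdint.h>

/* Assumption: the standard 1.8432 MHz PC UART crystal. */
#define UART_CLOCK_HZ 1843200u

/*
 * Baud rate a 16450 produces for a given divisor latch value:
 * clock / (16 * divisor).  The divisor is an integer >= 1.
 */
uint32_t uart_baud(uint32_t divisor)
{
    return UART_CLOCK_HZ / (16u * divisor);
}
```

[ Divisor 1 gives 115200, divisor 2 gives 57600, divisor 3 gives 38400.
  There is no integer between 1 and 2, so no rate between 57600 and
  115200. ]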

[ This is of course against conventional wisdom, ("you need a UART with
  a FIFO"), but in a controlled, embedded environment, I want to 
  know exactly why not, and if it's possible to fix it. ]

First test is to see what the "normal" interrupt latency is.
Okay, hook up an oscilloscope to 486SX-33, IRQ line, and a
prototype board, and hack com.c to set bits on the prototype board.
Typical IRQ->comintr despatch time is 25 microseconds. Whooo!
I only really need about 150.  com.c at 115200 needs around 100.
Should be a cinch.  NOT.
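
[ Where the rough budget figures come from: with no FIFO, the handler
  has about one character time to read the byte before the next one
  lands on top of it.  A simplified sketch, assuming 8N1 framing
  (10 bits per character on the wire); the function names are
  hypothetical. ]

```c
#include <stdint.h>

/*
 * Time on the wire for one character, in nanoseconds.
 * bits_per_char = 10 assumes 8N1 (start + 8 data + stop).
 */
uint64_t char_time_ns(uint32_t baud, uint32_t bits_per_char)
{
    return (uint64_t)bits_per_char * 1000000000ull / baud;
}

/*
 * Nonzero if a worst-case interrupt latency (in ns) still leaves time
 * to read the byte before the next one has fully arrived.  This is a
 * simplified model of a FIFO-less UART, assuming 8N1 framing.
 */
int latency_ok(uint32_t baud, uint64_t worst_latency_ns)
{
    return worst_latency_ns < char_time_ns(baud, 10);
}
```

[ That works out to roughly 86.8 us per character at 115200, 173.6 us
  at 57600 and 260.4 us at 38400, so a 25 us despatch time fits
  comfortably at all three rates, provided it really is the worst
  case. ]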

Test is to write a user mode program with VMIN=16 VTIME=9 to
echo loop, and run an async bit error rate test from a protocol
analyser.  We get loads of silo overflows when there's net traffic
and a few when there's none.
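
[ For reference, the echo program is roughly the following.  This is a
  sketch of the idea rather than the exact test program: the function
  names are made up, and the flag settings are just an ordinary raw
  8-bit setup. ]

```c
#include <termios.h>
#include <unistd.h>

/*
 * Fill in *t for a raw 8-bit line where read() returns once 16 bytes
 * have arrived, or once 0.9 s passes with no new byte after the first.
 */
int setup_echo_termios(struct termios *t, speed_t speed)
{
    t->c_iflag &= ~(tcflag_t)(IGNBRK | BRKINT | PARMRK | ISTRIP |
                              INLCR | IGNCR | ICRNL | IXON | IXOFF);
    t->c_oflag &= ~(tcflag_t)OPOST;
    t->c_lflag &= ~(tcflag_t)(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    t->c_cflag &= ~(tcflag_t)(CSIZE | PARENB);
    t->c_cflag |= CS8 | CREAD | CLOCAL;
    t->c_cc[VMIN] = 16;
    t->c_cc[VTIME] = 9;          /* units of 0.1 s */
    if (cfsetispeed(t, speed) < 0 || cfsetospeed(t, speed) < 0)
        return -1;
    return 0;
}

/* Echo everything read from fd back to it until EOF or error. */
int echo_loop(int fd)
{
    char buf[256];
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
    }
    return n < 0 ? -1 : 0;
}
```

[ Usage would be: tcgetattr() on the open port, setup_echo_termios()
  with B57600 or B115200 (assuming the system defines those speed
  constants), tcsetattr() with TCSANOW, then echo_loop(). ]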

Okay, so the ethernet interrupts are around 200 microseconds, and
the ethernet can interrupt a com interrupt, extending that short 20 us
interrupt to 220.  Hm.  Not good.  Right so we dive into isa_machdep.c
(the new one with all software interrupts at lower priority than
hardware ones) and 

	imask[IPL_TTY] |= imask[IPL_BIO] | imask[IPL_NET] | imask[IPL_CLOCK];

and add options REORDER_IRQ for good measure.  (yuk, but I'm only testing,
right ?)
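
[ What the mask hack is meant to do, as a toy model.  The level names
  mirror the line above, but the numbering, the IRQ assignments and the
  helper functions are all invented for illustration.  The real
  isa_machdep.c builds its masks from the registered handlers and
  applies its own ordering, none of which is modelled here. ]

```c
#include <stdint.h>

/* Toy model only: level numbering is arbitrary, not the kernel's. */
enum { IPL_BIO, IPL_NET, IPL_TTY, IPL_CLOCK, NIPL };

#define IRQ_BIT(n) (1u << (n))

/*
 * Each level starts out masking only the IRQs registered at that level.
 * Hypothetical wiring: clock on IRQ 0, com on IRQ 4, ethernet on IRQ 9,
 * disk on IRQ 14.
 */
void imask_init(uint32_t imask[NIPL])
{
    imask[IPL_BIO]   = IRQ_BIT(14);
    imask[IPL_NET]   = IRQ_BIT(9);
    imask[IPL_TTY]   = IRQ_BIT(4);
    imask[IPL_CLOCK] = IRQ_BIT(0);
}

/* The hack: while a tty handler runs, hold off everything else too. */
void imask_tty_blocks_all(uint32_t imask[NIPL])
{
    imask[IPL_TTY] |= imask[IPL_BIO] | imask[IPL_NET] | imask[IPL_CLOCK];
}

/* Nonzero if `irq` is held off while running at `level`. */
int irq_blocked(const uint32_t imask[NIPL], int level, int irq)
{
    return (imask[level] & IRQ_BIT(irq)) != 0;
}
```

[ In this model the ethernet IRQ is not blocked at IPL_TTY before the
  change and is blocked after it, which is the effect wanted: a com
  interrupt can no longer be stretched by an ethernet one. ]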

Okay, so the massive silo overflows caused by the ethernet go away, but
there are still occasional silo overflows.  (ping -s1492 -f from a
next door PC to test)

Then we strip the PC right down (yep, even pull out the ethernet
card) and see what happens.

At 38400, no errors.
At 57600, occasional silo overflows.
At 115200, more frequent silo overflows.

WHY ?  I'm looking for 100 us, I've got a typical 25 us latency,
I've blocked and deprioritised all other interrupts and STILL
I get the occasional overrun.  Where's the latency coming from?
Can someone explain who's holding up the CPU ?  Can it be fixed?
Do I have to drop my project and do something else instead ?

As a postscript to this, one of my colleagues saw my tests and
ran the same tests on FreeBSD instead, and it passes the test!
(FreeBSD 2.0.5, 486SX-25, 256k cache, cheap NE2000 clone,
16450 clone on cheap multifunction card)  NOT A SINGLE OVERRUN
with TWO simultaneous ports at 115200, with a standard kernel,
even with ping -s1492 -f hitting it.

Help please ?  I'm way outta my depth.  And I'd *much* rather not
have to switch to FreeBSD.

-- 
/* ronald@cpm.com.my +60 3 241 5232  |  ronald@demon.net +44 181 371 1000 */