current-users: Re: Transmit interrupts, fxp driver for Intel 82557/8/9 Ethernet

Subject: Re: Transmit interrupts, fxp driver for Intel 82557/8/9 Ethernet
To: None <current-users@netbsd.org>
From: Hal Murray <murray@pa.dec.com>
List: current-users
Date: 05/24/2000 01:20:42

Sorry I didn't get back to this sooner.

> Actually, yes, the fxp driver *does* use transmit interrupts.  If you
> take a look at fxp_start(), at the end of that function:

Argh/blush.  I was working with 1.4.2 rather than current.  Sorry 
for any confusion.

Is there a better list to discuss this on?

Is there going to be a 1.4.3?  If so, is this quirk worth chasing/fixing?

Long message warning.  Here goes...

-----

I've been hacking/thrashing.

All my stuff got a lot better when I changed the loop control at 
the top of fxp_start from:
	while (ifp->if_snd.ifq_head != NULL && sc->tx_queued < FXP_NTXCB) {
to:
	while (ifp->if_snd.ifq_head != NULL && sc->tx_queued < (FXP_NTXCB-1)) {

That is, I changed it to use only 127 out of 128 slots.

But I can't figure out why 128 doesn't work. 

Every time I try using all 128 slots, something nasty/obscure happens.  
When I use only 127, everything works as expected.  It fixes both 
the big-TCP-window glitch and the UDP-blast-em case. 
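
To make the effect of the changed bound concrete, here is a minimal sketch that
models only the two counters in the loop condition.  The struct and function
names (model, model_start) are hypothetical stand-ins, not driver code, and the
sketch says nothing about why a completely full ring misbehaves.  It only shows
that the new bound always leaves one control block unused.

```c
#include <assert.h>

#define FXP_NTXCB 128   /* number of transmit control blocks, as in the post */

/* Hypothetical model of the two counters tested by the loop in fxp_start. */
struct model {
    int snd_len;    /* packets still waiting on ifp->if_snd */
    int tx_queued;  /* control blocks currently handed to the hardware */
};

/*
 * Move packets from the send queue to the hardware ring until the send
 * queue is empty or tx_queued reaches `limit`.  Returns the number moved.
 */
static int model_start(struct model *m, int limit)
{
    int moved = 0;

    assert(limit >= 0 && limit <= FXP_NTXCB);
    while (m->snd_len > 0 && m->tx_queued < limit) {
        m->snd_len--;       /* stands in for dequeuing one mbuf */
        m->tx_queued++;     /* stands in for filling one control block */
        moved++;
    }
    return moved;
}
```

Starting from 200 waiting packets and an empty ring, model_start(&m, FXP_NTXCB)
stops at tx_queued == 128, while model_start(&m, FXP_NTXCB - 1) stops at 127
with one slot still free.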

That might explain this observation too: 

> We have a NetBSD 1.4.1/i386 box being used as a generic router on 
> our network and we were running into "fxp0 timeout" problems every 
> few days (but with seemingly random intervals between), but especially 
> during peak traffic periods.  It was enough to make the box unreliable 
> as a router so we switched to ex cards instead. Peak traffic was 
> probably around 4-5Mb/s bidirectional with about 1000pps per direction 
> being forwarded (just a ballpark estimate), standard mix of realworld 
> net traffic. 

I'm hoping that somebody familiar with the hardware/driver can see 
why 128 doesn't work by looking at the code.  I've tried several 
times and can't find anything. 

I've patched the watchdog code to print more info:

	printf("%s: device timeout, TXQ=%d, SND=%d\n",
	    sc->sc_dev.dv_xname, sc->tx_queued, ifp->if_snd.ifq_len);

From the tail of dmesg after running the TCP big window case:

    de0: enabling 100baseTX port
    fxp1: device timeout, TXQ=128, SND=9
    fxp1: device timeout, TXQ=128, SND=6
    fxp1: device timeout, TXQ=128, SND=6
    ...

Humm.  I hadn't noticed this before, but the effective transmit queue 
size is actually 2*FXP_NTXCB.  There can be FXP_NTXCB mbufs on the 
ifp->if_snd queue and another FXP_NTXCB mbufs that have been set up 
on the hardware control blocks.  The packets the hardware knows about 
have been dequeued from ifp->if_snd, so they don't get counted there. 

-----

I've been thinking about what is the right way to handle transmit 
interrupts.

Using no transmit interrupts at all clearly doesn't work right in the 
special case of transmit-only traffic.  That's pretty unlikely in real 
life when anything interesting is going on.  (Humm.  Suppose I have 2 
systems connected by a hub and the other machine gets powered off.)

Interrupt-on-every-packet is easy to understand and simple to code, 
but most of the time, it wastes CPU cycles on transmit interrupt 
overhead. 

The -current driver does an interrupt on the last packet of the chain.  
For short packets, that's the same as interrupt on every packet.  
(For the machines I'm using, the mode shift happens at around 200 
bytes.) 

The FreeBSD driver requests an interrupt when it gets near the end 
of the queue.  (It uses 120 out of 128.)  This seems reasonable.  
The idea of interrupting before the end is to keep the hardware 
running at full speed.  [I can't see an easy way to think about what 
happens when the queue gets full enough to request interrupts and 
there is receive traffic that is cleaning up the front of the transmit 
queue.  I think it generates some interrupts, but the exact number 
will depend upon timing.]  

I've been testing with a scheme that requests an interrupt on the 
last packet if the queue is over 1/2 full.  This was easy to code.  
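
The schemes discussed above can be written down as one decision function.
This is only a sketch of my reading of each scheme: the enum, the function
name want_tx_intr, and the NEAR_FULL_MARK constant are all hypothetical, and
the exact test the FreeBSD driver uses around its 120 mark is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define FXP_NTXCB      128  /* number of transmit control blocks */
#define NEAR_FULL_MARK 120  /* assumed threshold for the FreeBSD-style scheme */

/* Hypothetical names for the schemes discussed in the text. */
enum tx_intr_policy {
    TX_INTR_NONE,               /* never request a transmit interrupt */
    TX_INTR_EVERY_PACKET,       /* request one on every packet */
    TX_INTR_LAST_OF_CHAIN,      /* request one on the last packet queued */
    TX_INTR_NEAR_FULL,          /* request one once the ring is nearly full */
    TX_INTR_LAST_IF_HALF_FULL   /* last packet, only if ring is over 1/2 full */
};

/*
 * Decide whether the packet being queued should request an interrupt.
 * tx_queued counts control blocks in use, including this packet.
 */
static bool want_tx_intr(enum tx_intr_policy p, int tx_queued,
                         bool last_in_chain)
{
    assert(tx_queued >= 1 && tx_queued <= FXP_NTXCB);
    switch (p) {
    case TX_INTR_NONE:
        return false;
    case TX_INTR_EVERY_PACKET:
        return true;
    case TX_INTR_LAST_OF_CHAIN:
        return last_in_chain;
    case TX_INTR_NEAR_FULL:
        return tx_queued >= NEAR_FULL_MARK;
    case TX_INTR_LAST_IF_HALF_FULL:
        return last_in_chain && tx_queued > FXP_NTXCB / 2;
    }
    return false;
}
```

With the half-full scheme, a last packet that brings the ring to 64 entries
requests no interrupt, while one that brings it to 65 does.  A chain of one
short packet under TX_INTR_LAST_OF_CHAIN always requests one, which is why that
scheme behaves like interrupt-on-every-packet for short packets.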

I was going to say that the only case where this won't work right/cleanly 
is when you send a small clump and then nothing else happens before 
the watchdog goes off.  But there is another interesting case. 

Consider the normal UDP blast-em case.  My test code sends as fast 
as it can until it gets an error.  Then it sleeps for a tick.  The 
interrupt on the last packet will happen somewhere between ticks.  
Between the interrupt and the next tick, the transmit queue is empty.

I see this with small packets: 38K packets/second with some unused 
CPU, vs. 50K with the CPU saturated when interrupting on every packet 
or interrupting on the last packet.  This matches what I expect. 
  
For large packets, I'd expect the throughput to be significantly 
less than wire speed.  But it's going at (very) close to wire speed.  

  Ahh.  I see it now.  There are 128 more packets waiting on ifp->if_snd.  
  That's enough to cover the gaps until the timer ticks and my test 
  code gets woken up again. 
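
A rough back-of-the-envelope check of that explanation, as a sketch only.  The
numbers plugged in are assumptions, not measurements from the tests above: a
100 Mbit/s link, full-size frames that occupy 1538 bytes of wire time each
(1500 payload + 14 header + 4 FCS + 8 preamble + 12 inter-frame gap), and a
10 ms tick (HZ=100).  The helper name drain_time_us is made up.

```c
#include <assert.h>

/*
 * Time in microseconds for npkts frames, each occupying frame_bytes of
 * wire time, to drain at link_mbps megabits per second.
 * bits / (Mbit/s) comes out directly in microseconds.
 */
static double drain_time_us(int npkts, int frame_bytes, double link_mbps)
{
    assert(npkts >= 0 && frame_bytes > 0 && link_mbps > 0.0);
    return (double)npkts * (double)frame_bytes * 8.0 / link_mbps;
}
```

Under those assumptions, drain_time_us(128, 1538, 100.0) is about 15749
microseconds, so 128 queued full-size packets keep the wire busy for longer
than one 10 ms tick.  For minimum-size frames (84 bytes of wire time) the same
128 packets drain in about 860 microseconds, far less than a tick, which would
be consistent with the gaps showing up only in the small-packet case.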

-----

> With the driver in NetBSD-current, I am getting 85-90Mb/s w/ the 
> fxp driver between similar machines, tho I'll see if I can reproduce 
> this problem with a > 195K window. 

I consistently get 90 megabits with NetBSD on a full duplex link, 
either point-point or through a switch.

That's when running below the 195K cliff.  The dropoff in throughput 
is pretty drastic.  The critical window size might depend slightly 
on the CPU speed or number of switches/routers in the path.