Subject: high input packet rate can lead to process starvation
To: tech-net@netbsd.org, Ashis Mandal <amandal@entrisphere.com>
From: Tad Hunt <tad@entrisphere.com>
List: tech-net
Date: 09/22/2006 14:00:10
This is a multi-part message in MIME format.
--------------040805030006050801000806
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Folks,

We just found and fixed an issue in the network stack where user 
processes (actually, anything at a lower priority than softnet)
get starved indefinitely as long as the input queue never empties.

For example, the Ethernet driver's RX ISR (which runs at a higher 
priority than softnet) can easily keep ipintrq full of ICMP packets.

As long as there are IP packets in the queue, ipintr() keeps 
processing them and never returns, so nothing at a lower priority 
gets to run.

This is really a problem not just with ipintrq, but with all of the 
software-interrupt protocol input routines.

My solution (this is a hack) is to put a time limit on how long ipintr() 
is allowed to run (700 ms in the attached code).  If it runs longer than 
this without emptying the queue, I set a global "ipdisabled" flag and 
start a callout that fires 300 ms later, clears the flag, and calls 
setsoftnet().  In addition, the Ethernet driver drops all incoming 
packets as long as ipdisabled is set.  (See the attachment for code.)
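
For anyone who wants to play with the idea outside the kernel, here is 
a minimal user-space sketch of the same "drain until the time budget 
runs out, then flag as disabled" loop.  It is only an illustration under 
assumptions: struct pktq, pktq_enqueue(), pktq_dequeue(), 
drain_with_budget(), and fake_clock() are hypothetical names that do not 
exist in the kernel, the clock is passed in as a callback so it can be 
faked, and incrementing a counter stands in for ip_input().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QLEN 64	/* hypothetical queue capacity */

/* Hypothetical ring buffer standing in for ipintrq. */
struct pktq {
	int    pkts[QLEN];
	size_t head;	/* index of the next packet to dequeue */
	size_t count;	/* number of packets currently queued */
};

/* Clock callback returning a timestamp in microseconds. */
typedef uint64_t (*clock_fn)(void *ctx);

struct drain_result {
	size_t processed;	/* packets handled in this run */
	bool   disabled;	/* true if the time budget was exceeded */
};

/* Append a packet; returns false if the queue is full. */
bool
pktq_enqueue(struct pktq *q, int pkt)
{
	if (q->count == QLEN)
		return false;
	q->pkts[(q->head + q->count) % QLEN] = pkt;
	q->count++;
	return true;
}

/* Remove the oldest packet; returns false if the queue is empty. */
bool
pktq_dequeue(struct pktq *q, int *pkt)
{
	if (q->count == 0)
		return false;
	*pkt = q->pkts[q->head];
	q->head = (q->head + 1) % QLEN;
	q->count--;
	return true;
}

/*
 * Drain the queue, checking the elapsed time after every packet.
 * Stops early and reports disabled once budget_us has elapsed.
 */
struct drain_result
drain_with_budget(struct pktq *q, uint64_t budget_us, clock_fn now, void *ctx)
{
	struct drain_result r = { 0, false };
	uint64_t start = now(ctx);
	int pkt;

	while (pktq_dequeue(q, &pkt)) {
		r.processed++;	/* stand-in for ip_input(m) */
		if (now(ctx) - start >= budget_us) {
			r.disabled = true;
			break;
		}
	}
	return r;
}

/* Fake clock for experiments: advances 100 us on every reading. */
uint64_t
fake_clock(void *ctx)
{
	uint64_t *t = ctx;
	uint64_t v = *t;

	*t += 100;
	return v;
}
```

With the fake clock and a 300 us budget, a queue holding 10 packets is 
cut off after 3 packets and the other 7 stay queued.  A caller would 
then have to arrange for a later retry, which is the job the callout 
does in the attached kernel code.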

Does anyone have a better recommendation?

Thanks,
-Tad Hunt

--------------040805030006050801000806
Content-Type: text/plain;
 name="ipintr.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="ipintr.c"

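/* Shared state. These declarations are assumed; the original omits them. */
volatile int ipdisabled;	/* nonzero while IP input is throttled */
int ipdiscount;			/* number of times input was throttled */
struct callout ipintrtimer;	/* must be initialized before first use */

/* Driver RX interrupt handler (pseudocode). */
ethernet_rx_intr()
{
	if (ipdisabled)
		/* drop the packet instead of queueing it */
	else
		/* ... rx and ifp->if_input() the packet as usual */
}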


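/* Callout handler: re-enable IP input and reschedule ipintr(). */
static void
ipwakeup(void *arg)
{
	int s;

	s = splsoftnet();
	ipdisabled = 0;
	setsoftnet();	/* request a softnet run to drain what is still queued */
	splx(s);
}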

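void
ipintr(void)
{
	int s;
	struct mbuf *m;
	struct timeval start, now, elapsed;
	/* Maximum time one ipintr() invocation may spend draining ipintrq. */
	static struct timeval timeout = {.tv_sec = 0, .tv_usec = 700*1000};

	/* Throttled: leave the queue alone until ipwakeup() clears the flag. */
	if (ipdisabled)
		return;

	microtime(&start);

	while (1) {
		s = splimp();
		IF_DEQUEUE(&ipintrq, m);
		splx(s);
		if (m == NULL)
			return;
		ip_input(m);

		/* Check the time budget after every packet. */
		microtime(&now);
		timersub(&now, &start, &elapsed);
		if (timercmp(&elapsed, &timeout, >=)) {
			/*
			 * Budget exhausted: stop IP input for 300 ms so
			 * that lower-priority work gets a chance to run.
			 */
			ipdisabled = 1;
			ipdiscount++;
			callout_reset(&ipintrtimer, MS2TICK(300), ipwakeup, NULL);
			break;
		}
	}
}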

--------------040805030006050801000806--