Subject: Re: Melting down your network [Subject changed]
To: Christos Zoulas <christos@zoulas.com>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-kern
Date: 03/28/2005 19:55:49
In message <20050329030448.2A39D2AC9F@beowulf.gw.com>,
Christos Zoulas writes:

>On Mar 28,  6:54pm, jonathan@dsg.stanford.edu (Jonathan Stone) wrote:


Christos,

Aside from you taking the time to read the papers I referred to
(citations and URLs, or for the final VJ paper even PDFs, on request),
so that we have a common frame of reference for discussion, I am not sure
how to convey the requisite background in an email.  My prose is just
not up to the task; I find I need to draw, and point at, some pictures.

Believe it or not, I am trying to be polite, and to not whack anyone
who doesn't need it; and to whack those who *do* need it no harder
than necessary to get them to cease-and-desist from what (in the
networking community) are widely seen as pernicious practices.

But here goes...



>-- Subject: Re: Melting down your network [Subject changed]


>| Bill,
>| 
>| No, that's incorrect; I suspect you don't understand the issue
>| (or don't see it the way a networking expert will see it).
>| Here is the key point again:
>| 
>| An ill-behaved app can *always* attempt to overflow a queue. The queue
>| under discussion here as a potential victim of overflow attacks is the
>| per-interface if_snd queue.
>| 
>| Thus, the question under discussion is: what should we do under
>| sustained overload (or attempts to sustain overload) of the if_snd
>| queue?  Specifically, when an app using UDP (or other unreliable
>| datagram protocols) uses non-blocking I/O to persistently overflow the
>| if_snd queue?
>| 
>| The most correct answer is: we should drop packets.
>
>You are not dropping packets. You are returning ENOBUFS to the application,
>and you are giving it a chance to retry.

Sorry? Now I'm not understanding you.  If we return ENOBUFS in this
scenario, isn't it because the packet would exceed some interface's
ifp->if_snd.ifq_maxlen, and the packet has therefore just been
dropped? ... hmm, UTSL ...

It sure looks to me like it is. My tree is a week or so old, and it
shows a somewhat untidy mix of IFQ_ENQUEUE() and the pre-ALTQ inlined
versions, which are, as far as I can see, now subsumed by
IFQ_ENQUEUE() even for the non-ALTQ case. But that's a minor hygiene
issue.  Has something changed drastically here in the last week or two?
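
For concreteness, here is the shape of the code path I have in mind.
This is a paraphrase, from memory, of the classic pre-ALTQ enqueue
pattern that if_output routines (ether_output() and friends) have
carried for years; it is not a verbatim quote of the current tree, so
treat the details (spl level, exact macro spellings) as illustrative:

	int s;

	s = splnet();
	if (IF_QFULL(&ifp->if_snd)) {
		/* Queue full: count the drop, free the mbuf, tell the caller. */
		IF_DROP(&ifp->if_snd);
		splx(s);
		m_freem(m);
		return (ENOBUFS);
	}
	IF_ENQUEUE(&ifp->if_snd, m);
	if ((ifp->if_flags & IFF_OACTIVE) == 0)
		(*ifp->if_start)(ifp);
	splx(s);
	return (0);

Note the ordering: the mbuf is freed (i.e. the packet is dropped)
*before* ENOBUFS is returned.  The error code is a report of a drop
that has already happened, not an invitation to hand the same packet
back.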



>Packets are not being dropped in this case... 

Currently, they are.  And that's both an entirely appropriate
response and (with returning ENOBUFS, according to Jason's
reading) a sufficient response.


>It is just a matter
>of having the kernel sleep, and return upon success, or having the
>application spin and retry to send the same packet again. 

But there is no point or benefit in doing _either_ of those.  Just
drop the packets, and return ENOBUFS.  With the exception of insane
applications, there is no real need to tell the application that an
unreliable-delivery packet was locally dropped: any sane app will either

  a) be told about the packet drop by its application-level peers,
     and slow down accordingly, or

  b) not care about the packet drop at all.

If the application continues to send forever, the rate of packets
emitted on the wire will fill the link, but not all of the packets sent
via send/sendto will hit the wire. You can see this easily with ttcp -u.
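
To put the "sane app" behaviour in concrete terms, here is a minimal
sketch of the send loop I have in mind (hypothetical code, plain
sockets API, error handling trimmed).  It treats ENOBUFS from sendto()
exactly like a drop inside the network: note it and move on, rather
than spinning on the same datagram:

	#include <sys/socket.h>
	#include <errno.h>
	#include <stdio.h>

	/* Send n copies of buf/len to "to"; count the local drops. */
	static void
	blast(int s, const struct sockaddr *to, socklen_t tolen,
	    const char *buf, size_t len, int n)
	{
		int i, drops = 0;

		for (i = 0; i < n; i++) {
			if (sendto(s, buf, len, 0, to, tolen) < 0) {
				if (errno == ENOBUFS) {
					drops++;	/* local drop; nothing to resend */
					continue;
				}
				perror("sendto");
				break;
			}
		}
		printf("%d sends attempted, %d dropped locally\n", n, drops);
	}

Any real rate adaptation comes from the peer (case (a) above), not from
retrying the locally-dropped datagram.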



If ttcp -u has ceased to work, then maybe we have a problem.
OK, so let's try it:
# uname -a

	[excerpted] 2.99.9 NetBSD 2.99.9 GENERIC.MP.FAST_IPSEC

That kernel config is precisely GENERIC.MP, included via config(8)
inclusion into my local config file, which does nothing beyond that
inclusion and adding FAST_IPSEC. Here goes:


# ttcp -u -s -b 199608 -l 16385 -n 1024 -t 10.50.2.17
ttcp-t: buflen=16385, nbuf=1024, align=16384/0, port=5001, sockbufsize=199608  udp  -> 10.50.2.17
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: 16778240 bytes in 0.04 real seconds = 461991.77 KB/sec +++
ttcp-t: 1030 I/O calls, msec/call = 0.04, calls/sec = 29041.90
ttcp-t: -1.9user 0.0sys 0:00real 100% 0i+0d 0maxrss 0+5125pf 0+4csw
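
(A quick check of the arithmetic in that output, so the numbers aren't
taken on faith: 1024 buffers * 16385 bytes = 16,778,240 bytes, exactly
the byte count ttcp prints.  Divide by the elapsed time, roughly
0.035-0.04 seconds (the printed "0.04" is rounded), and you get the
reported rate of about 460 MByte/sec.  That is the rate at which the
send calls completed, not a rate at which bits could possibly have
left the box.)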


The reported send rate is 461 MByte/sec, vastly more than a 123
MByte/sec link can carry. Unreliable UDP can send all it wants.  Where's
the problem?

If you wish, you can find a suitably fast pair of machines and use
ttcp -u at both ends to measure the actual received rate. In the
networking community, this exact experiment is standard practice for a
quick estimate of send-side per-packet overhead; it has been for at
least a decade.
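
For the record, the two ends of that experiment look roughly like this
(option spellings as in classic ttcp(1); they vary a little between
ttcp versions, so check your local manual page).  On the receiver:

# ttcp -r -u -s

and on the sender:

# ttcp -t -u -s -l 16385 -n 1024 10.50.2.17

Comparing the receiver's count of bytes and packets actually delivered
against the sender's count of send() calls gives the quick estimate of
drop rate and send-side per-packet overhead I mentioned.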

I would much prefer that we don't break existing practice, especially
when (insane applications aside) there's absolutely no reason to change
the current behaviour.