Subject: Re: Melting down your network [Subject changed]
To: Allen Briggs <briggs@netbsd.org>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-kern
Date: 03/28/2005 20:48:37
In message <20050329032314.GE8782@canolog.ninthwonder.com>,
Allen Briggs writes:


>I believe the discussion started talking about BLOCKING mode, not
>non-blocking mode. 


Well, if you say so, then I fear I may be wrong; but I thought the PR
and the entire ENOBUFS case were about non-blocking mode?


>It seems to me to be more robust in general if
>the kernel sleeps until there's space on the queue--when in blocking
>mode. 


Allen, please believe that I understand where you are coming from and
the reasons you are saying that.  But in the networking world, the
answer is basically: no, it's *not* more robust.

If you want your application to work in real networks, not dedicated
"toy" networks like the NETBLT experiments[1], then your applications
have to be able to recover from drop and from congestion.  So you have
to put in some real congestion control, or -- for inherently fixed-rate
apps, like digitized audio or video constrained to use fixed-rate
codecs -- some rate-adjustment hooks. Or, do what commercially-viable
applications do: make the individual flows so small that nobody
worries about even large aggregations of individual flows.

So: you have to put congestion control in anyway. And that congestion
control can, and should, recover from congestion (full queues) that
occurs *inside* the sending host, as well as inside the network.
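
To make that concrete, here's the sort of thing I mean.  This is a
sketch I just typed up, not code from anyone's application; the names,
the constants, and the crude delay-doubling policy are all invented,
and a real application also needs receiver feedback to see losses that
happen out in the network.  The point is only that a local ENOBUFS
from sendto(2) should be handled like any other congestion signal:
slow down, then probe gently.

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>
#include <unistd.h>

#define MIN_DELAY_US	1000		/* fastest: one datagram per ms */
#define MAX_DELAY_US	500000		/* slowest: two datagrams per second */

static useconds_t delay_us = MIN_DELAY_US;

/* Send one datagram, pacing the flow and backing off on congestion. */
void
send_paced(int s, const struct sockaddr_in *dst, const void *buf, size_t len)
{
	for (;;) {
		if (sendto(s, buf, len, 0,
		    (const struct sockaddr *)dst, sizeof(*dst)) != -1) {
			/* Accepted: probe gently for more bandwidth. */
			if (delay_us > MIN_DELAY_US) {
				delay_us -= delay_us / 16;
				if (delay_us < MIN_DELAY_US)
					delay_us = MIN_DELAY_US;
			}
			usleep(delay_us);
			return;
		}
		if (errno == ENOBUFS) {
			/* Local queue full: treat it as congestion. */
			delay_us *= 2;
			if (delay_us > MAX_DELAY_US)
				delay_us = MAX_DELAY_US;
			usleep(delay_us);
			continue;	/* retry this datagram, more slowly */
		}
		return;			/* anything else: drop and move on */
	}
}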

There's at least one networking textbook -- Tanenbaum, I think, but
I can't swear to it, so please don't quote me -- which makes the
following point: if you have to deal with congestion, then you need to
drop packets.  Under sustained congestion, the best place to drop
packets is at the sending hosts.  There's no point investing network
resources in sending a packet halfway across the network, just to have
the packet dropped due to congestion in any case; the network is
better off if packets are dropped as close to the sending host as
possible.  Dropping at the sending host itself is the limiting case.


[1]: I wonder how many of the readers following this have read the
relevant RFCs? Is it worth my time to try and summarize NETBLT
history, instead of hoping people will follow the reference?


>It makes more sense from an application programming point
>of view and it makes more sense from a resource utilization point
>of view.

That is not the wisdom in the networking community. A robust
application needs full and complete handling of congestion in the
network. If the application has that, then optimizing the local send
queue is not necessary _except as an optimization_ for whatever losses
are happening locally inside the host, over and above network losses.

The textbook response is to point out to Manu (sp?) that his app
doesn't deal with a half-duplex or wifi network, where, due to a wide
variety of causes[2], drops *at the link layer* are relatively common.

[2]. Ethernet link-level collision mechanisms, RF oversubscription,
and even the Ethernet capture effect.  As it happens, investigation of
the frequency of Ethernet drop -- e.g., RF damage to frames, or the
hardware binary-exponential-backoff transmit counter exceeding the
limit of 16 -- was recently suggested in the research community as a
topic worthy of fresh investigation: nobody's really looked at it
since the days of half-duplex 10Mbit Ethernet.

> If someone's trying to fill the pipe, make 'em sleep,
>don't encourage them to spin.

But, here's the rub: trying to fill the pipe when the pipe is
congested is *evil*. Punishing such ill-behaved applications, by
dropping their packets, is entirely appropriate.

>In non-blocking mode, by all means, return EAGAIN/EWOULDBLOCK if
>it would block.

Back again to an earlier paragraph; isn't it ENOBUFS in that case?
Is this confusion between blocking and non-blocking the reason Christos
and I were talking past each other earlier?
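
In any case, an application doesn't really need to know which errno
the kernel picks.  Here's a defensive sketch (mine; the function name
and return convention are invented) that treats ENOBUFS and
EAGAIN/EWOULDBLOCK the same way: the kernel couldn't take the
datagram, so back off.

#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>

/*
 * Returns 0 if the kernel accepted the datagram, 1 if the caller
 * should back off and try again later, -1 on a real error.
 */
int
try_send(int s, const void *buf, size_t len,
    const struct sockaddr *dst, socklen_t dstlen)
{
	if (sendto(s, buf, len, 0, dst, dstlen) != -1)
		return 0;

	switch (errno) {
	case ENOBUFS:		/* interface/output queue full */
	case EAGAIN:		/* socket buffer would block */
#if EWOULDBLOCK != EAGAIN
	case EWOULDBLOCK:
#endif
		return 1;
	default:
		return -1;
	}
}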


>I don't see a good reason to transmit "output queue is full" back
>to the application when you're in blocking mode.

Sure: let it block until the send queue is *completely* drained, then
wake it up. If the queue gets full in the meantime, sleep again.
Seems reasonable to me.
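
Purely as a thought experiment -- none of this exists in the tree, and
every identifier below is hypothetical -- the shape of that proposal
would be roughly a tsleep()/wakeup() pair keyed on the interface send
queue:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>
#include <net/if.h>

/* Hypothetical sender side: sleep while the interface queue is full. */
static int
ifq_enqueue_blocking(struct ifnet *ifp, struct mbuf *m)
{
	int error, s;

	s = splnet();
	while (IF_QFULL(&ifp->if_snd)) {
		error = tsleep(&ifp->if_snd, PSOCK | PCATCH, "ifqfull", 0);
		if (error) {
			splx(s);
			m_freem(m);
			return error;	/* interrupted by a signal */
		}
	}
	IF_ENQUEUE(&ifp->if_snd, m);
	splx(s);
	return 0;
}

/*
 * Hypothetical driver side: once transmit completion drains the queue,
 * wake every blocked sender:
 *
 *	if (IFQ_IS_EMPTY(&ifp->if_snd))
 *		wakeup(&ifp->if_snd);
 */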

If Emmanuel's problem is that he tried to use blocking mode, and
discovered the hard way that UDP basically ignores the socket send
buffer limit (doesn't send-side UDP simply push packets down to the
interface send queue instantly?), and *then* he tried to use
non-blocking mode, then whether or not to ``fix'' blocking mode, as an
optimization, may be worth revisiting. Me, I tend to doubt it, for
exactly the reasons I outlined above for non-blocking mode.
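
For reference, the reason it behaves that way: the classic
4.4BSD-style output path (this is a simplified paraphrase, not the
actual NetBSD source, and the function name is made up) pushes the
datagram straight down to the interface send queue, and a full queue
means an immediate drop plus ENOBUFS.  Nothing is retained in so_snd,
which is why the socket send buffer limit never throttles a UDP
sender.

#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/errno.h>
#include <net/if.h>

/* Simplified tail of a classic if_output routine. */
static int
if_enqueue_sketch(struct ifnet *ifp, struct mbuf *m)
{
	int s = splnet();

	if (IF_QFULL(&ifp->if_snd)) {
		IF_DROP(&ifp->if_snd);	/* bump ifq_drops */
		splx(s);
		m_freem(m);
		return ENOBUFS;		/* surfaces as the sendto(2) error */
	}
	IF_ENQUEUE(&ifp->if_snd, m);
	if ((ifp->if_flags & IFF_OACTIVE) == 0)
		(*ifp->if_start)(ifp);	/* kick the driver */
	splx(s);
	return 0;
}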

Remember, UDP (as said by Bob Braden, one of the designers) is
supposed to be a very lightweight shim on top of raw IP best-effort
semantics.


Right now, for myself, I'm out of time until maybe Wednesday.  And
(as you may have gathered already) I'm likely to conclude that the problem
is simply not worth trying to fix. So it's a mess for anyone who tries
to write a UDP-based app and can't tolerate local drop: so what?



> And I'm not sure
>when I would care *why* it would block in non-blocking mode.

The desideratum in non-blocking mode is not to block, but to drop
packets.  Again: UDP is best-effort, which includes drops not just
inside the network but inside the host.

After thinking about it a third time... the litmus test I personally
would use here is: anytime we'd drop forwarded IP packets during
forwarding, we should also be dropping locally-generated UDP traffic.



>Since you, Jonathan, have a lot of networking experience, have
>read a lot (most?) of the literature, and participated in multiple
>fora, I am very interested in hearing what you have to say and
>understanding how this looks from that perspective.  I understand
>you have a vehement reaction to what Emmanuel proposed, but I'm
>interested in the technical argument, not the vehemence.


Well... thank you for the compliments. Unfortunately for me, it's
exactly my familiarity with the networking literature and with the
research community which leads to the vehemence.  I feel a moral
burden to jump *heavily* on anyone who both shares my association
with NetBSD and who attempts to write a non-rate-adaptive,
non-congestion-responsive application.  I honestly don't bear Emmanuel
any personal ill-will, nor do I wish him to stop using his app on a
private, controlled network.  But I have a vehement objection to the
app being distributed in pkgsrc without a strong disclaimer that, being
non-congestion-responsive and non-rate-adaptive, and designed to send
at the highest rate it can, Emmanuel's app is inherently a DDoS tool.
It should either be labelled as such, or not distributed at all.

Maybe I overdid it. But from where I'm sitting, last I checked the pkg
was still available, and the DESCR warning was (from my perspective,
about which you say such positive things) wholly inadequate.

I mention that because to me, it's far, far more significant than
whatever changes we may or may not make to our UDP code.