Subject: Re: fixing send(2) semantics (kern/29750)
To: Christos Zoulas <christos@zoulas.com>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-kern
Date: 03/26/2005 21:50:25
In message <20050327043122.DDE6A2AC9F@beowulf.gw.com>,
Christos Zoulas writes:

>On Mar 26,  6:59pm, jonathan@dsg.stanford.edu (Jonathan Stone) wrote:
>-- Subject: Re: fixing send(2) semantics (kern/29750)
>
>This is a different scenario. 

No, it isn't, not in any meaningful way. From either a networking
perspective, or the application's, it's fundamentally the same scenario:
the offered traffic is more than the network can handle.  Whether the
network "bottleneck" is between the sending NIC and its switch, or
deeper in the network, or at the receiver, or (as in this case)
between the sending network stack and the outbound device, makes no
fundamental difference.


>The cpu is a lot faster than the nic
>card, and the nic card cannot absorb packets quickly enough to send
>it out to the network. 

Exactly: it is congestion in the network.  Dropping packets in the
if_output queue is exactly analogous to dropping packets at the
next-hop switch or elsewhere along the path.
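
To spell out the analogy: the classic 4.4BSD-derived enqueue step inside
an if_output routine looks roughly like this (a from-memory sketch, not
current NetBSD source; the exact macros and spl level vary, e.g. with
ALTQ):

        /*
         * Sketch only: queue the mbuf on the interface send queue, or
         * drop it on the floor if the queue is already full.
         */
        s = splnet();
        if (IF_QFULL(&ifp->if_snd)) {
                IF_DROP(&ifp->if_snd);  /* counted, just like a switch drop */
                splx(s);
                m_freem(m);
                return (ENOBUFS);       /* surfaces as send(2)'s ENOBUFS */
        }
        IF_ENQUEUE(&ifp->if_snd, m);
        if ((ifp->if_flags & IFF_OACTIVE) == 0)
                (*ifp->if_start)(ifp);
        splx(s);

The queue-full case is a drop, pure and simple; the only difference from
a drop at the next-hop switch is that the local kernel happens to be in a
position to tell the caller about it.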

From the perspective of TCP, Emmanuel's scenario is, *by definition*
congestion, and is appropriately treated by dropping the packets after
they have been buffered in the socket layer.

If Emmanuel is using a protocol that can recover from packet loss, the
protocol will recover.  If he's using a protocol that does not recover
from packet loss, then the packet loss will not be masked.  Depending on
what the application wants, that may be exactly the desired behaviour
--- the textbook example is packetized speech --- or it may cause the
application to fail. The latter case is the fault of whoever chose an
inappropriate protocol.

>It is not congestion in the network fabric,
>but internal congestion. There is also currently no way to rate limit
>send so that it does not return ENOBUFS from the application side,
>and this is clearly broken.

It's not broken at all: it's the expected and desired behaviour.  And I
mean that quite literally.  People have used ttcp over UDP for
*decades* with the expectation that the sender will drop packets as
they fill the send queue.  (The reported send rate on the sender tells
you how fast it can push data down to the send queue; the rate on the
receiver tells you either how fast it can receive packets, or that
throughput is network-limited).
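
To make that concrete, here is a minimal ttcp-flavoured UDP blaster (my
own sketch, not the real ttcp source; the address, port and packet count
are arbitrary).  It pushes datagrams down as fast as the stack will take
them and simply counts ENOBUFS as local drops:

        #include <sys/types.h>
        #include <sys/socket.h>
        #include <netinet/in.h>
        #include <arpa/inet.h>
        #include <errno.h>
        #include <stdio.h>
        #include <string.h>
        #include <time.h>

        int
        main(void)
        {
                struct sockaddr_in sin;
                char buf[1024];
                long sent = 0, dropped = 0, i;
                time_t t0, t1;
                int s;

                if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
                        perror("socket");
                        return 1;
                }
                memset(buf, 0, sizeof(buf));
                memset(&sin, 0, sizeof(sin));
                sin.sin_family = AF_INET;
                sin.sin_port = htons(5001);
                sin.sin_addr.s_addr = inet_addr("127.0.0.1");

                t0 = time(NULL);
                for (i = 0; i < 1000000; i++) {
                        if (sendto(s, buf, sizeof(buf), 0,
                            (struct sockaddr *)&sin, sizeof(sin)) < 0) {
                                if (errno == ENOBUFS) {
                                        dropped++;  /* local drop; keep going */
                                        continue;
                                }
                                perror("sendto");
                                return 1;
                        }
                        sent++;
                }
                t1 = time(NULL);
                printf("%ld datagrams queued, %ld dropped locally, %ld seconds\n",
                    sent, dropped, (long)(t1 - t0));
                return 0;
        }

The "queued" count is exactly what a ttcp-style sender reports as its
send rate; the receiver's count tells you what actually made it across.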

The term "clue-by-four" is not in use in the ede community. There, one
might instead say, ... oh, that if this was an exam, that ansewr would
earn a failing grade. Or something like that.

Once the socket layer has accepted and buffered the data, subject to
sleep/retry notification as per the relevant standards,
the job of the socket layer is *done*.


> I.e. I cannot even select or poll before
>I send, in order to avoid gettting ENOBUFS.

So what? Local mechanisms are *never* going to tell you that the
immediate-upstream switch has dropped the packets.  Local mechanisms
cannot even tell you that the packet made it to the NIC, but the link
dropped the packet.  (I have a lot of sympathy for Emmanuel, as right
now I'm seeing bursts of 50% or greater packet loss over WiFi).
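
If an application genuinely cannot live with the local drop, the usual
remedy is not select/poll but to back off briefly and retry, or to treat
ENOBUFS exactly like any other drop.  A hedged sketch (the retry count
and the 1ms interval are arbitrary):

        #include <sys/types.h>
        #include <sys/socket.h>
        #include <errno.h>
        #include <time.h>

        /*
         * Retry a datagram a few times on ENOBUFS, backing off briefly;
         * after that, give up and let the caller treat it as a drop.
         */
        ssize_t
        send_with_backoff(int s, const void *buf, size_t len, int flags)
        {
                struct timespec ts = { 0, 1000000 };    /* 1ms; arbitrary */
                ssize_t n;
                int tries;

                for (tries = 0; tries < 5; tries++) {
                        n = send(s, buf, len, flags);
                        if (n >= 0 || errno != ENOBUFS)
                                return n;
                        (void)nanosleep(&ts, NULL);     /* crude rate limiting */
                }
                errno = ENOBUFS;
                return -1;
        }

That is rate limiting done by the application, where it belongs; and it
still tells you nothing about what the next-hop switch did with the packet.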

The (almost) universally-accepted wisdom in the networking community
is that there's simply no *point* trying to overengineer the local
network stack to take heroic efforts to avoid packet loss, when the
adjacent switch or router can drop them anytime it chooses. One *cannot*
control the network; therefore one *has* to code applications and
protocols to handle lossy networks.

Period.

Cue reference to the end-to-end argument, which in this specific case
isn't so much a clue-by-four as a whack in the head with a steel I-beam...

Christos, honestly, this is a *dumb* idea. I tell you three times.
What should I do to convince you that it's a dumb idea?