Subject: Re: fixing send(2) semantics (kern/29750)
To: Emmanuel Dreyfus <manu@netbsd.org>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-kern
Date: 03/26/2005 18:47:33
In message <1gu26ju.tyif891kp9p6iM%manu@netbsd.org>, Emmanuel Dreyfus writes:

>Christos Zoulas <christos@zoulas.com> wrote:


>That's the whole point: if non blocking I/O is set, SUS and our man
>pages say that we must block instead of failing.

I think that should be "unless non-blocking I/O is set..."

And, if there is insufficient buffer space *IN THE SOCKET LAYER*
for the kernel to accept the send() request, then sure, we should sleep,
*in the socket layer*.

Whatever happens below the socket layer is protocol-specific; SUS
doesn't have much to say there.   
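
For concreteness, here is a rough userland sketch of my own (not code
from the tree, and the helper name is made up) of what those
socket-layer semantics look like from an application: a blocking
send(2) sleeps until the socket send buffer has room, while with
O_NONBLOCK set it returns -1 with EAGAIN/EWOULDBLOCK and the
application waits for writability itself.  Assume fd is a connected
socket already marked non-blocking:

/*
 * Illustrative sketch only: retry a non-blocking send(2) once the
 * socket send buffer has drained, as reported by poll(2).
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <poll.h>
#include <errno.h>

ssize_t
send_when_ready(int fd, const void *buf, size_t len)
{
	struct pollfd pfd;
	ssize_t n;

	for (;;) {
		n = send(fd, buf, len, 0);
		if (n >= 0)
			return n;
		if (errno != EAGAIN && errno != EWOULDBLOCK)
			return -1;	/* real error */
		/* Socket send buffer full: wait until it drains. */
		pfd.fd = fd;
		pfd.events = POLLOUT;
		(void)poll(&pfd, 1, -1);
	}
}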


The suggestion to push that state down to the if_send interface
suggests, OTOH, a fundamental lack of understanding of the semantics
at the different layers of abstraction.  Networks can, and do, lose
packets.  Losing a packet at the if-output queue is, from the
perspective of the upper layers, no different in any significant way
from other causes of loss: losing the packet to excessive collisions
on a shared medium (half-duplex Ethernet collisions), to RF noise on
WiFi, or to the adjacent switch port tossing the packet because the
switch's buffers for that port are full.

If your application can't cope with overflowing the if-output send
queue, then it cannot cope with any of those other conditions.  In
which case, it's *your* application that is broken, not the BSD
networking code.

To see this most clearly, consider a UDP-based application. UDP
semantics are "best effort", without retransmission. Thus, dropping
UDP packets at the if_output() layer when the if send queue is full is
*entirely* appropriate. Forcing a sleep there would be a bug, pure and
simple.  Conversely, if you want the network stack to handle such
conditions, you should use a protocol like TCP, which will recover
from packet loss by retransmitting the data which (as per SUS) was
accepted and buffered safely at the socket layer.
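
By way of illustration, here is a hypothetical UDP sender (not taken
from any real application; the drop counter and back-off interval are
invented for the example) that copes with ENOBUFS the same way it
would cope with loss anywhere else on the path: count it, perhaps
back off briefly, and move on.

/*
 * Illustrative sketch only: treat ENOBUFS from a full interface
 * output queue as just another lost datagram.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <unistd.h>

static unsigned long datagrams_dropped;	/* assumed per-sender counter */

void
udp_send_datagram(int fd, const void *buf, size_t len)
{
	if (send(fd, buf, len, 0) == -1) {
		if (errno == ENOBUFS) {
			/*
			 * The if-output queue (or mbuf pool) was full.
			 * From our point of view this is just a lost
			 * datagram, exactly like loss on the wire.
			 */
			datagrams_dropped++;
			usleep(1000);	/* optional brief back-off */
		}
		/* other errors elided for brevity */
	}
}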


I think you'd be better off fixing your application, instead of
pursuing a (well-meant, but deeply misguided) attempt to "fix" the
BSD networking code.