Subject: Re: sendto() and ENOBUFS question..
To: None <tech-net@netbsd.org>
From: Justin C. Walker <justin@mac.com>
List: tech-net
Date: 05/15/2002 09:43:03
On Wednesday, May 15, 2002, at 12:41 AM, <sudog@telus.net> wrote:

> On Tue, 14 May 2002, Justin C. Walker wrote:
>
>>> So, the question is, does select() or poll() make a check to ensure
>>> that ENOBUFS won't show up when udp_output() is called? Or should it?
>>
>> Nope; select/poll don't have any way to determine this, since it isn't
>> recorded as "state" anywhere.  UDP is "best effort" delivery, which
>> means that if the packet can't be delivered for any reason, it gets
>> dropped, and very little is done to note that fact.
>
> Then the select check for writeable status on the SOCK_DGRAM is
> meaningless in its current form.

I would not phrase it this way.  I would say that the mechanisms to 
provide what you want are not available when you use UDP.  UDP is 
"unreliable"; the definition says that, and the implementation 
reinforces it.  It doesn't make sense to build a lot of support for 
reliability into a transport that is inherently unreliable.

> In that case, maybe it should either
> fail with an error message (something else than is listed in the
> manpage) or be modified to clearly indicate the path is reliable down
> to the network interface at that particular moment.

Nope.  For one thing, the 'iswriteable' semantics are specific to a 
'connection', and in UDP, you don't have any.

Also, there is no clear definition of "the path to the interface", since 
UDP operates in an "unconnected" mode.  The system lets you 'connect' a 
datagram socket as a convenience (so you don't have to continually 
supply headers if the datagrams always go to the same place), but that 
shouldn't imply to you that there really is a path.
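
As a minimal sketch of what "connecting" a datagram socket amounts to
(udp_connect_to() is a made-up helper name, and IPv4 is assumed): the
call only records a default destination in the socket.  Nothing is
sent, and nothing about the path is verified.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket and fix its default
 * destination with connect().  The kernel only stores the peer
 * address; no packet goes out and no route is checked for health.
 * Returns the descriptor, or -1 on error. */
int udp_connect_to(const char *ipv4, uint16_t port)
{
    struct sockaddr_in peer;
    int fd;

    assert(ipv4 != NULL);
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &peer.sin_addr) != 1)
        return -1;              /* not a dotted-quad IPv4 address */

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After this, send(fd, buf, len, 0) takes the place of sendto() with an
explicit address.  It can still fail with ENOBUFS exactly as before.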

Again, what you are asking for is reliability, and for UDP, that is 
solely at the discretion of the application.

> I guess that would
> mean reserving a small amount of memory for each SOCK_DGRAM driven by
> select()'s wouldn't it.. and a queuing mechanism.. which again would
> be better implemented in userland. Hrm.

I think you've got it!
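
For what it's worth, the simplest userland version of this is a retry
loop around sendto().  The sketch below is only an illustration:
sendto_backoff() is a hypothetical name, and the 1 ms starting delay,
the 256 ms cap, and the doubling policy are arbitrary choices made for
the example.  It treats ENOBUFS as "try again shortly" and hands every
other error straight back to the caller.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() under strict modes */

#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: send one datagram, pausing and retrying while
 * the kernel reports ENOBUFS.  Returns the byte count from sendto(),
 * or -1 with errno set (ENOBUFS if every attempt ran out of buffers). */
ssize_t sendto_backoff(int fd, const void *buf, size_t len,
                       const struct sockaddr *to, socklen_t tolen,
                       int max_tries)
{
    long delay_ns = 1000000L;           /* first pause: 1 ms */
    struct timespec ts;
    int attempt;

    assert(buf != NULL && max_tries > 0);
    for (attempt = 0; attempt < max_tries; attempt++) {
        ssize_t n = sendto(fd, buf, len, 0, to, tolen);
        if (n >= 0)
            return n;                   /* handed to the protocol */
        if (errno != ENOBUFS)
            return -1;                  /* some other failure: give up */

        ts.tv_sec = 0;
        ts.tv_nsec = delay_ns;
        nanosleep(&ts, NULL);           /* give the queues time to drain */
        if (delay_ns < 256000000L)
            delay_ns *= 2;              /* exponential backoff, capped */
    }
    errno = ENOBUFS;                    /* nanosleep may have changed it */
    return -1;
}
```

A success here still only means the datagram left the socket layer.
Whether it arrives is a separate question, which is why this belongs
next to the application's own acknowledgements and timeouts rather
than in the kernel.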

> Buffered UDP would be what I'm looking for..  but isn't there a send
> buffer associated with a SOCK_DGRAM that can be adjusted?

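Not in the way you want.  You can set SO_SNDBUF on a datagram socket, 
but as far as I know it only caps the size of a single datagram; it 
does not queue anything.  As I said, SOCK_DGRAM traffic typically goes 
through the socket layer like grease through a goose: no buffering.  I 
believe that sosend() will just call the protocol send hook when it 
gets data from the caller.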

> Anyway, the fact that it's unreliable doesn't mean the local machine
> shouldn't do something to help it become as reliable as possible. The
> socket-full errors that crop up for receivers are perfectly natural
> and logical;  after all if the receiver can't cope with the data, it's
> normal to start dropping packets -- *on the receiver's end*.

I don't think this is correct at all.  In a sense, trying to bring some 
reliability to UDP on one end of a conversation is similar to 
rearranging the deck chairs on the Titanic (or, better, making sure that 
all passengers take part in daily lifeboat drills when all the lifeboats 
have holes in their bottoms).

It doesn't make a lot of sense (to me, at least) to introduce a lot of 
complexity, and tie up resources, to provide for local reliability when 
the very nature of the protocol is to be unreliable.

>>> If not, how would I, in userland, cleanly wait until I can do another
>>> udp_output safely without ENOBUFS showing up?
>>
>> As someone else mentioned, application-layer flow control is your
>> answer.  Think about the underlying transport, and what it provides:
>> some probability that a datagram you hand to the kernel with "send()"
>> will get to the other end.  There's no guarantee about order or even
>> eventual delivery.
>
> Right--but that's the assumption *once it's out on the wire.* Getting
> it to the wire to begin with shouldn't be an exercise in futility.

See above.

>> If you are using UDP, your application is, in essence, accepting
>> the fact that delivery of data is up to it, not the system.  You
>> choose UDP because the terms are acceptable.  If you want to
>> guarantee delivery, using UDP, you have to resort to timeouts and
>> positive acknowledgements.  There's no other way that I know of.
>
> Timeouts and positive acknowledgements are fine--I can easily build
> that logic into the system. But when select() tells me I can write to
> the socket, I should be able to write to the socket unless someone
> else gets there first and eats the last mbufs. If the write fails,
> then select() is unreliable and we're back around to basically useless
> system call functionality that pretends to run fine on a SOCK_DGRAM
> but really doesn't.

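You are confusing two things: "select() says the socket is writable" 
and "the next send is guaranteed to succeed".  select() only promises 
the former.  If you want reliability, use TCP.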

>> It's up to your app.  "patching" the kernel would "break" UDP.
>
> Making select() work properly would mean that the kernel would have to
> deal somehow with who should get woken up first.

select() does work properly in this case.  It is reporting what the 
underlying transport reports: that no protocol condition will block a 
write attempt.  By analogy, select() on a file descriptor that refers 
to a regular file on a local drive will always say that writing is 
possible.  It won't take into account whether there are enough free 
pages in the system to handle the user's write (AFAIK).
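
The regular-file case is easy to check for yourself.  A sketch, with
made-up helper names (writable_now(), file_writable_now()) and a zero
timeout so that select() only polls:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: ask select() whether fd is writable right now.
 * Returns 1 if it is reported writable, 0 if not, -1 on error. */
int writable_now(int fd)
{
    fd_set wfds;
    struct timeval zero = {0, 0};       /* poll, do not block */
    int rc;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    rc = select(fd + 1, NULL, &wfds, NULL, &zero);
    if (rc < 0)
        return -1;
    return FD_ISSET(fd, &wfds) ? 1 : 0;
}

/* Hypothetical helper: open (or create) a regular file for writing
 * and report what select() says about it.  Returns -1 if the file
 * cannot be opened. */
int file_writable_now(const char *path)
{
    int fd, rc;

    assert(path != NULL);
    fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    rc = writable_now(fd);
    close(fd);
    return rc;
}
```

As far as I know, this returns 1 for any regular file that can be
opened for writing, no matter how short of free pages the system is at
that moment.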

> Well at least now I can better see why it was done the way it was
> done. I think there's a better way. But for people like myself who
> think TCP performance is quite pathetic and would like finer-grained
> control over our network communications, what alternative is there
> except to implement another protocol? And if UDP won't cut it as a
> base to work from, is there something else that might? SOCK_RAW,
> listening on a custom protocol number seems a better method. Ah,
> there's one. Protocol number 68!
>
> For the record:
>
> . Yes, I have the time to fiddle around with this.
> . Yes, I like doing this.
> . No, TCP isn't cutting it because of all the compatibility, the poor
> packet loss recovery, and the crappy in-order error correction it has
> to enforce which is IMHO, less than suited for finite chunks of data
> like files.

I dunno.  It's always seemed to me that TCP does a pretty good job at 
all those things.  The alternative is to do all those things yourself, 
and in essence, reinvent the wheel.  It's worth doing if you have an 
application that requires it (HTTP/1.1 springs to mind), but otherwise, 
I don't see it.

Regards,

Justin

--
Justin C. Walker, Curmudgeon-At-Large  *
Institute for General Semantics        |    Men are from Earth.
                                        |    Women are from Earth.
                                        |       Deal with it.
*--------------------------------------*-------------------------------*