Subject: Re: sendto() and ENOBUFS question..
To: <>
From: David Laight <david@l8s.co.uk>
List: tech-net
Date: 05/15/2002 19:05:10
On Wed, May 15, 2002 at 09:43:03AM -0700, Justin C. Walker wrote:
> 
> On Wednesday, May 15, 2002, at 12:41 AM, <sudog@telus.net> wrote:
> 
> > On Tue, 14 May 2002, Justin C. Walker wrote:
> >
> >>> So, the question is, does select() or poll() make a check to ensure
> >>> that ENOBUFS won't show up when udp_output() is called? Or should it?
> >>
> >> Nope; select/poll don't have any way to determine this, since it isn't
> >> recorded as "state" anywhere.  UDP is "best effort" delivery, which
> >> means that if the packet can't be delivered for any reason, it gets
> >> dropped, and very little is done to note that fact.
> >
> > Then the select check for writeable status on the SOCK_DGRAM is
> > meaningless in its current form.
> 
> I would not phrase it this way.  I would say that the mechanisms to 
> provide what you want are not available when you use UDP.  UDP is 
> "unreliable"; the definition says that, and the implementation 
> reinforces it.  It doesn't make sense to build a lot of support for 
> reliability into a transport that is inherently unreliable.

I would say that UDP should block (or return EAGAIN) before it
starts failing with ENOBUFS...

> Nope.  For one thing, the 'iswriteable' semantics are specific to a 
> 'connection', and in UDP, you don't have any.

There is no reason why they shouldn't apply to the socket itself.
> 
> Also, there is no clear definition of "the path to the interface", since 
> UDP operates in an "unconnected" mode.  The system lets you 'connect' a 
> datagram socket as a convenience (so you don't have to continually 
> supply headers if the datagrams always go to the same place), but that 
> shouldn't imply to you that there really is a path.

There IS a path when you have a datagram in your hand to send.  That is
the one that the user wants to know about.
> 
> Again, what you are asking for is reliability, and for UDP, that is 
> solely at the discretion of the application.

Not reliability, just sanity!
NFS over UDP works (on local LAN segments) because the actual error
rate is very small.  If UDP or the ethernet driver decides to
discard packets just because of flow control, then it all dies a
horrid death (made much worse by increasing the default NFS data
block size from 8k to 32k).
> 
> > I guess that would
> > mean reserving a small amount of memory for each SOCK_DGRAM driven by
> > select()'s wouldn't it.. and a queuing mechanism.. which again would
> > be better implemented in userland. Hrm.
> 
> I think you've got it!

No - just the same as any other select/poll call.  You block waiting
for a wakeup from the lower driver.  The required data block is
(probably) allocated during the poll/select system call.
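
For reference, the userland pattern the original poster describes looks
roughly like this - a minimal sketch (the helper name is just
illustrative), assuming 's' is an open SOCK_DGRAM socket and 'dst' a
filled-in destination; as things stand the sendto() can still fail with
ENOBUFS even after poll() reports the socket writable:

	#include <sys/types.h>
	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <poll.h>
	#include <errno.h>

	/*
	 * Wait until poll() reports the datagram socket writable, then
	 * send one datagram.  Even after POLLOUT, sendto() can still
	 * fail with ENOBUFS, because nothing was reserved for us during
	 * the poll() call.
	 */
	int
	send_one(int s, const void *buf, size_t len,
	    const struct sockaddr_in *dst)
	{
		struct pollfd pfd;

		pfd.fd = s;
		pfd.events = POLLOUT;
		if (poll(&pfd, 1, -1) == -1)
			return -1;		/* poll itself failed */
		if (sendto(s, buf, len, 0,
		    (const struct sockaddr *)dst, sizeof(*dst)) == -1) {
			if (errno == ENOBUFS)
				return 0;	/* dropped locally, no warning */
			return -1;
		}
		return 1;
	}
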
> 
> > Buffered UDP would be what I'm looking for..  but isn't there a send
> > buffer associated with a SOCK_DGRAM that can be adjusted?
> 
> Nope.  As I said, SOCK_DGRAM traffic typically goes through the socket 
> layer like grease through a goose: no buffering.  I believe that 
> sosend() will just call the protocol send hook when it gets data from 
> the caller.

You don't need a buffer to implement flow control.
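
As an aside, SO_SNDBUF *can* be set on a SOCK_DGRAM socket, but if I
remember the sosend() logic right the value mostly just caps the largest
datagram the socket layer will accept - it isn't a queue that absorbs
ENOBUFS.  Roughly:

	#include <sys/types.h>
	#include <sys/socket.h>
	#include <err.h>

	/*
	 * Raise SO_SNDBUF on a UDP socket.  Since UDP output is not
	 * queued in the socket send buffer, this mainly limits the
	 * largest single datagram that can be sent - it does not add
	 * buffering or flow control.
	 */
	static void
	set_udp_sndbuf(int s, int sndbuf)
	{
		if (setsockopt(s, SOL_SOCKET, SO_SNDBUF,
		    &sndbuf, sizeof(sndbuf)) == -1)
			warn("setsockopt(SO_SNDBUF)");
	}
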
> 
> > Anyway, the fact that it's unreliable doesn't mean the local machine
> > shouldn't do something to help it become as reliable as possible. The
> > socket-full errors that crop up for receivers are perfectly natural
> > and logical;  after all if the receiver can't cope with the data, it's
> > normal to start dropping packets -- *on the receiver's end*.
> 
> I don't think this is correct at all.  In a sense, trying to bring some 
> reliability to UDP on one end of a conversation is similar to 
> rearranging the deck chairs on the titanic (or, better, making sure that 
> all passengers take part in daily lifeboat drills when all the lifeboats 
> have holes in their bottoms).

No, both the sending system and the receiving system need to take reasonable
care to ensure that UDP packets aren't dumped on the floor.
> 
> It doesn't make a lot of sense (to me, at least) to introduce a lot of 
> complexity, and tie up resources, to provide for local reliability when 
> the very nature of the protocol is to be unreliable.
> 
> >>> If not, how would I, in userland, cleanly wait until I can do another
> >>> udp_output safely without ENOBUFS showing up?

Fix UDP?
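
In the meantime, about the best a userland program can do is treat
ENOBUFS as a transient condition and back off briefly before retrying.
A crude sketch - the 10ms delay and the retry limit are entirely
arbitrary:

	#include <sys/types.h>
	#include <sys/socket.h>
	#include <errno.h>
	#include <time.h>

	/*
	 * Retry the send after a short sleep whenever the kernel
	 * reports ENOBUFS.  There is no event to wait on, so this is
	 * just polling.
	 */
	ssize_t
	sendto_retry(int s, const void *buf, size_t len,
	    const struct sockaddr *to, socklen_t tolen)
	{
		struct timespec ts = { 0, 10 * 1000 * 1000 };	/* 10ms */
		ssize_t n;
		int tries;

		for (tries = 0; tries < 100; tries++) {
			n = sendto(s, buf, len, 0, to, tolen);
			if (n != -1 || errno != ENOBUFS)
				return n;
			(void)nanosleep(&ts, NULL);
		}
		errno = ENOBUFS;
		return -1;		/* still out of buffers */
	}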

> > Timeouts and positive acknowledgements are fine--I can easily build
> > that logic into the system. But when select() tells me I can write to
> > the socket, I should be able to write to the socket unless someone
> > else gets there first and eats the last mbufs. If the write fails,
> > then select() is unreliable and we're back around to basically useless
> > system call functionality that pretends to run fine on a SOCK_DGRAM
> > but really doesn't.
> 
> You are confusing things.  If you want reliability, use TCP.

If I understand what people are saying, UDP allows a single
(user) application to mount a DoS attack on the system by
tying up all the mbufs, getting them queued for output on a
single interface.  Surely there ought to be some back-pressure
flow control from the ethernet driver to the UDP socket?

> > Making select() work properly would mean that the kernel would have to
> > deal somehow with who should get woken up first.

Wake 'em all up and let 'em charge at the crack in the door like
a herd of hungry hippos.

> select() does work properly in this case.  It's telling you that the 
> underlying transport is telling it that there are no protocol conditions 
> that will block a write attempt.  For an analogous situation, select() 
> on a file descriptor connected to a real file, on a local drive, will 
> always say that writing is possible.  It won't take into account whether 
> there are enough free pages in the system to handle the user's write 
> (AFAIK).

Except that local files don't work that way.....
They probably always block?

	David

-- 
David Laight: david@l8s.co.uk