Re: ping: sendto: No buffer space available
I'd like to clarify my thinking here. We're talking both about
networking, in which we commonly lose data (sometimes by design,
e.g. in network congestion situations at IP routers), and about
the UNIX kernel system call API.
In my opinion, when the kernel is interacting with a userland
application, it should *never* silently drop (lose) data.
The UNIX kernel should endeavor to provide the userland with as
much information about what's going on as possible, so that the
application programmer can decide what to do, as appropriate to
his application. Defaults should be set to match reasonable
expectations, and exceptions allowed for programs that are prepared
to handle a wider range of error conditions themselves.
In most resource limitation situations (whether the resource is
limited a priori, or simply exhausted), only the kernel has the
global view of the resource (if there is a global view to be had).
So the question is: what does it do (or tell the application) in
There are different models for this, depending upon the resource
and the nature of the limitation.
When a filesystem fills up, generally write(2) returns ENOSPC.
Usually, these situations require operator (human) intervention to
clean up, i.e. someone has to start freeing up disk space, based,
one presumes, on some local policy about which files or data are
expendable (or transferable elsewhere). ENOSPC is the kernel
saying, "there is no more data storage space, and I have no
expectation that any will become available 'soon' - deal with it,
Most programs don't bother to check for that error, and have no
code to handle it - they just keep blithely banging away at write(2),
or they fail outright on any error without specifically handling
If the kernel simply blocked userland programs until filesystem
space became available, RAM could potentially fill up with buffers
(inside programs) waiting to be written. Possibly the system could
seize up completely, preventing effective operator response. There
is a presumption in UNIX that there should always be some disk
space available. NetBSD provides newsyslog(8) and a default
/etc/newsyslog.conf for this reason.
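As an illustration of the ENOSPC case above, here is a minimal C sketch
of a write loop that actually checks for errors instead of blithely
banging away at write(2). The helper name write_all is hypothetical
(it is not part of any library), and what the caller does on ENOSPC
is left open, since that depends on the application.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * write_all: hypothetical helper, not part of any library.
 * Writes len bytes to fd, retrying on short writes and on EINTR.
 * Returns 0 on success, or -1 with errno set by write(2)
 * (for example ENOSPC when the filesystem is full).
 */
static int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;		/* ENOSPC, EIO, ...: caller decides */
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

A caller might test for errno == ENOSPC after a -1 return and then
stop, report the condition, or discard expendable data, rather than
retrying in a tight loop.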
By contrast, when a TCP stream is flow controlled, that's temporary,
and the resource can reasonably be expected to be available again
shortly, as a matter of course in normal operation. So, the kernel
will block a write(2) call until flow is allowed again, rather than
return an error.
With UDP (or another datagram protocol, e.g. ICMP), there
are no flow controls as such - IP network routers are designed (and
thus expected) to drop packets when they're congested, and applications
using those protocols are expected to deal with that as appropriate
(some handle it, some ignore it). Inside the UNIX kernel, we limit
the number of packets the kernel will handle in two ways: the maximum
size of the global mbuf pool, and the maximum output packet queue
length for each network interface.
I complained ten years ago that kernel error messages made no
distinction between "no more mbufs" and "network interface output
queue full", mostly out of concern that users would be confused
about which error situation they had actually encountered. This
E-mail thread started (I believe) from precisely this confusion.
In both cases, an application is extremely unlikely to have the
global view of whichever resource (mbufs or output queues) is
temporarily exhausted. What is an application programmer to do in
that case? How much backoff or waiting should the program apply?
Given the dynamic nature of networking, it is likely that either
kind of resource exhaustion (or limitation) is very temporary.
That's why I'm suggesting that the kernel should block by default,
rather than return ENOBUFS in a network output queue limit situation,
and return ENOBUFS only to those applications which have requested
non-blocking behavior (i.e. have explicitly indicated to the kernel
that they're prepared to handle that error condition). Blocking
gets the program to shut up for a time (flow control in the face
of local resource exhaustion, but of a resource that is reasonably
expected to be available again very soon).
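To make the "how much backoff?" problem concrete, here is a rough C
sketch of what an application has to do for itself when sendto(2)
fails with ENOBUFS. The helper name sendto_backoff, the 1 ms starting
delay, the 128 ms cap and the max_tries parameter are all assumptions
made for illustration; they are guesses precisely because the
application cannot see the kernel's mbuf pool or interface queues.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <time.h>

/*
 * sendto_backoff: hypothetical helper, not an existing API.
 * Retries sendto(2) while it fails with ENOBUFS, sleeping between
 * attempts with a doubling delay.  Any other result (success or a
 * different error) is returned to the caller immediately.
 */
static ssize_t
sendto_backoff(int s, const void *buf, size_t len, int flags,
    const struct sockaddr *to, socklen_t tolen, int max_tries)
{
	struct timespec delay = { 0, 1000000 };	/* start at 1 ms (a guess) */
	int i;

	for (i = 0; i < max_tries; i++) {
		ssize_t n = sendto(s, buf, len, flags, to, tolen);

		if (n >= 0 || errno != ENOBUFS)
			return n;		/* success, or an error we do not retry */
		nanosleep(&delay, NULL);	/* hope the queue drains meanwhile */
		if (delay.tv_nsec < 128000000)
			delay.tv_nsec *= 2;	/* exponential backoff, capped */
	}
	errno = ENOBUFS;			/* still exhausted after max_tries */
	return -1;
}
```

Every program that cares would need its own copy of a loop like this,
each with its own guessed delays, which is the argument for having the
kernel block the caller by default instead.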
Ping(8) is a network test and measurement tool. Clearly, it falls
into the "will request non-blocking I/O" class, in that in order
for it to properly report where things are working (or not working),
it has to know what's going on.
I want to make clear that the IP router response is different,
even though UNIX can and does act as an IP router. An IP router
has a tenuous relationship at best to the other systems on its
attached networks. When it hits resource exhaustion, all it can do
is drop IP packets - sending a response in a congestion situation
only makes congestion worse; TCP is designed to deal with detected
packet drops by reducing data flow. That's a very different
relationship than the one between an application program and an OS
kernel within a single system.