tech-net archive


Re: ping: sendto: No buffer space available



I'd like to clarify my thinking here. We're talking both about networking, in which we commonly lose data (sometimes by design, e.g. in network congestion situations at IP routers), and about the UNIX kernel system call API.

In my opinion, when the kernel is interacting with a userland application, it should *never* silently drop (lose) data.

The UNIX kernel should endeavor to provide the userland with as much information about what's going on as possible, so that the application programmer can decide what to do, as appropriate to his application. Defaults should be set to match reasonable expectations, and exceptions allowed for programs that are prepared to handle a wider range of error conditions themselves.

In most resource limitation situations (whether the resource is limited a priori, or simply exhausted), only the kernel has the global view of the resource (if there is a global view to be had). So the question is: what does it do (or tell the application) in that situation?

There are different models for this, depending upon the resource and the nature of the limitation.

When a filesystem fills up, generally write(2) returns ENOSPC. Usually, these situations require operator (human) intervention to clean up, i.e. someone has to start freeing up disk space, based, one presumes, on some local policy about which files or data are expendable (or transferable elsewhere). ENOSPC is the kernel saying, "there is no more data storage space, and I have no expectation that any will become available 'soon' - deal with it, application program."

Most programs don't bother to check for that error, and have no code to handle it - they just keep blithely banging away at write(2), or they fail outright on any error without specifically handling that one.
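
As an illustration, here is a minimal sketch in C of a program that does check. The helper name write_all is hypothetical, and the policy it assumes (retry only on EINTR, hand every other error, including ENOSPC, back to the caller) is one reasonable choice among several, not the only correct one.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the whole buffer, or stop and report why not.
 * Returns 0 on success, -1 on error with errno as set by write(2).
 */
int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			/*
			 * ENOSPC ends up here. Calling write(2) again in a
			 * loop will not free any disk space, so hand the
			 * error back and let the caller stop and report it.
			 */
			return -1;
		}
		p += n;			/* short write: advance and continue */
		len -= (size_t)n;
	}
	return 0;
}
```

A caller would test the return value and, when errno is ENOSPC, stop writing and tell the operator rather than keep banging away.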

If the kernel simply blocked userland programs until filesystem space became available, RAM could potentially fill up with buffers (inside programs) waiting to be written. Possibly the system could seize up completely, preventing effective operator response. There is a presumption in UNIX that there should always be some disk space available. NetBSD provides newsyslog(8) and a default /etc/newsyslog.conf for this reason.

By contrast, when a TCP stream is flow controlled, that's temporary, and the resource can reasonably be expected to be available again shortly, as a matter of course in normal operation. So, the kernel will block a write(2) call until flow is allowed again, rather than return an error.

In UDP (or other datagram protocol situations, e.g. ICMP), there are no flow controls as such - IP network routers are designed (and thus expected) to drop packets when they're congested, and applications using those protocols are expected to deal with that as appropriate (some handle it, some ignore it). Inside the UNIX kernel, we limit the number of packets the kernel will handle, both through the maximum size of the global mbuf pool and through the maximum output packet queue length for each network interface.

I complained ten years ago that the kernel's error messages made no distinction between "no more mbufs" and "network interface output queue full", mostly out of concern that users would be confused about which specific error situation was being reported. This E-mail thread started (I believe) from precisely this confusion.

In both cases, an application is extremely unlikely to have the global view of whichever resource (mbufs or output queues) is temporarily exhausted. What is an application programmer to do in that case? How much backoff or waiting should the program apply?

Given the dynamic nature of networking, it is likely that either resource exhaustion (or limitation) is very temporary in nature. That's why I'm suggesting that the kernel should block by default, rather than return ENOBUFS in a network output queue limit situation, and return ENOBUFS only to those applications which have requested non-blocking behavior (i.e. have explicitly indicated to the kernel that they're prepared to handle that error condition). Blocking gets the program to shut up for a time (flow control in the face of local resource exhaustion, but of a resource that is reasonably expected to be available again very soon).
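
To make the application's side of this concrete, here is a minimal sketch in C of what a program prepared to handle ENOBUFS might do. The helper name send_with_backoff and its max_tries parameter are hypothetical, and the backoff schedule (start at 1 ms, double up to 512 ms) is an arbitrary guess. That guess is the point: without the kernel's view of the mbuf pool or the interface queue, the program has nothing better to go on.

```c
/* Ask for POSIX declarations such as nanosleep(2) under strict compilers. */
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <time.h>

/*
 * Hypothetical helper: send one datagram, backing off and retrying while
 * the kernel reports a temporary shortage (ENOBUFS, or EAGAIN/EWOULDBLOCK
 * on a non-blocking socket).
 * Returns the byte count from sendto(2), or -1 with errno set.
 */
ssize_t
send_with_backoff(int fd, const void *buf, size_t len,
    const struct sockaddr *to, socklen_t tolen, int max_tries)
{
	long delay_ms = 1;		/* arbitrary starting delay */
	int i;

	for (i = 0; i < max_tries; i++) {
		struct timespec ts;
		ssize_t n = sendto(fd, buf, len, 0, to, tolen);

		if (n >= 0)
			return n;
		if (errno == EINTR)
			continue;	/* retry at once; still counts as a try */
		if (errno != ENOBUFS && errno != EAGAIN &&
		    errno != EWOULDBLOCK)
			return -1;	/* not a temporary shortage: give up */

		/* Sleep, then double the delay up to an arbitrary cap. */
		ts.tv_sec = delay_ms / 1000;
		ts.tv_nsec = (delay_ms % 1000) * 1000000L;
		nanosleep(&ts, NULL);
		if (delay_ms < 512)
			delay_ms *= 2;
	}
	errno = ENOBUFS;		/* every try was used up */
	return -1;
}
```

If the kernel blocked by default, as suggested above, only programs that had asked for non-blocking behavior would need a loop like this at all.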

Ping(8) is a network test and measurement tool. Clearly, it falls into the "will request non-blocking I/O" class, in that in order for it to properly report where things are working (or not working), it has to know what's going on.

I want to make clear that the IP router response is different, despite the fact that UNIX can and does act as an IP router. An IP router has a tenuous relationship at best to the other systems on its attached networks. When it hits resource exhaustion, all it can do is drop IP packets - sending a response in a congestion situation only makes congestion worse; TCP is designed to deal with detected packet drops by reducing data flow. That's a very different relationship from the one between an application program and an OS kernel within a single system.

        Erik <fair%netbsd.org@localhost>

