Subject: Re: fixing send(2) semantics (kern/29750)
To: None <tech-kern@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 03/27/2005 02:08:55
> About fixing the app to work around the current behavior of send(2):
> I don't see any workaround for the problem at the application level.

Surely just treating an ENOBUFS return as a successful send that you
just happen to know got dropped is a suitable workaround?  I can't see
it as any worse than the results of leaving the application untouched
but fixing the bug that pushes the error back up to the send()
interface.
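
For what it's worth, the whole workaround is a few lines.  A minimal
sketch (the wrapper and its name are mine, not anything in the tree):

#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>

/*
 * send(2), except that ENOBUFS is treated as a send that succeeded
 * but is known to have been dropped - no different, from the app's
 * point of view, than a drop at the first congested router.
 */
ssize_t
send_lossy(int s, const void *buf, size_t len, int flags)
{
	ssize_t n;

	n = send(s, buf, len, flags);
	if (n == -1 && errno == ENOBUFS)
		return ((ssize_t)len);
	return (n);
}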

> The app needs to send UDP packets as fast as possible.  I want to use
> the full network bandwidth for sending UDP packets.  The only
> congestion that exists here is internal to the machine because the
> CPU can put more entries in the interface queue than the network can
> handle.

And external to the machine, unless you can somehow know that every
device between the endpoints, and the other endpoint, can handle full
wire speed.  Oh, and that there's no contention for any segments that
are functionally bus-topology segments.

While those conditions are no doubt met in some circumstances (say, a
crossed patch cable between two fast machines), I rather have to wonder
what sort of protocol this is that contains so little back-channel that
you have to fire-and-forget packets at full wire speed and yet depend
on back-pressure from the link-layer driver to get the speed right.

> In my situation, a single system call is enough to slow down the app
> so that the interface absorbs the data, but then the app can no
> longer keep the interface queue full, so it cannot use the full
> network bandwidth.

You're saying that one syscall takes enough time for an entire full
interface queue to drain?  How can you ever fill it up with sends then?
For that matter, even if it were pollable, how would that help?  Merely
doing the poll would slow it down too much.
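
To put rough, purely illustrative numbers on it: a 1500-byte frame
takes about 12 usec on the wire at gigabit speed, so even a default
50-entry ifqueue represents some 600 usec of transmit time, while a
syscall costs on the order of a few usec.  The machine would have to
be a couple of orders of magnitude slower than anything I'd expect
before one syscall covered the whole drain time.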

If you really need that kind of extreme limiting-case performance, all
I can suggest is that you hack in some kind of interface (ioctl,
syscall, special device, whatever) that basically acts as a really big
interface queue: you pass it a whole whack of packets and it pumps them
out the interface at wire speed, letting userland write more chunks
through an interface with enough back-pressure to keep things under
control.  But if one user->kernel->user boundary crossing is enough for
the whole interface queue to drain, you _have_ to stay in the kernel
most of the time; even a stream of blocking send()s would at best only
barely keep up.
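
Very roughly, the userland side of such a thing might look like the
sketch below.  Every name in it - the struct, the ioctl, the device
node - is made up to show the shape of the interface, not something
that exists:

#include <sys/ioctl.h>
#include <sys/uio.h>
#include <err.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical batch descriptor: one iovec per packet. */
struct pktbatch {
	struct iovec	*pb_pkts;
	int		 pb_npkts;
};

#define BIGQSEND	_IOW('Q', 0, struct pktbatch)	/* hypothetical */

void
send_batch(struct iovec *pkts, int npkts)
{
	struct pktbatch pb;
	int fd;

	if ((fd = open("/dev/bigq0", O_WRONLY)) == -1)
		err(1, "open /dev/bigq0");
	pb.pb_pkts = pkts;
	pb.pb_npkts = npkts;
	/*
	 * Blocks until the driver can absorb the whole batch, so a
	 * single user->kernel crossing hands over many packets' worth
	 * of data and the back-pressure is per-batch, not per-packet.
	 */
	if (ioctl(fd, BIGQSEND, &pb) == -1)
		err(1, "BIGQSEND");
	close(fd);
}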

That's an extreme enough case that I don't see anything wrong with "the
stock system can't do that" as an answer.  What do you have there,
anyway, a gigabit NIC on a SPARCstation-20 or something?  I can't
remember ever seeing a machine-and-NIC combination where the machine
was slow enough compared to the NIC's wire speed that one syscall could
drain the whole send queue.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B