Subject: Re: fixing send(2) semantics (kern/29750)
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Emmanuel Dreyfus <manu@netbsd.org>
List: tech-kern
Date: 03/27/2005 18:20:03
Subject: Re: fixing send(2) semantics (kern/29750)
To: None <tech-kern@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 03/27/2005 02:08:55 

> > About fixing the app to work around the current behavior of send(2): I
> > don't see any workaround for the problem at the application level.
> 
> Surely just treating an ENOBUFS return as a successful send that you
> just happen to know got dropped is a suitable workaround?  I can't see
> it as any worse than the results of leaving the application untouched
> but fixing the bug that pushes the error back up to the send()
> interface.

Ignoring ENOBUFS is not a nice workaround for my problem. I'll have to
wait for a resend request from the receivers, and I'll lose even more time
than by looping around sendto() until it stops returning ENOBUFS.
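For reference, the loop I mean is roughly this (untested sketch; s, buf,
len, dst and dstlen stand for the socket, payload and destination from my
test program):

#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>

ssize_t
send_blocking(int s, const void *buf, size_t len,
    const struct sockaddr *dst, socklen_t dstlen)
{
	ssize_t n;

	/* Busy-loop as long as the interface queue is full. */
	do {
		n = sendto(s, buf, len, 0, dst, dstlen);
	} while (n == -1 && errno == ENOBUFS);

	return n;
}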

But I think that not returning ENOBUFS makes us compliant with SUS and
with our man pages.

That said, Jonathan has a good point: the broken behavior is what many
people have seen and come to expect. Making the fixed behavior selectable
through setsockopt() may be a good idea.

[No congestion outside link layer]
> And external to the machine, unless you can somehow know that every
> device between the endpoints, and the other endpoint, can handle full
> wire speed.  Oh, and that there's no contention for any segments that
> are functionally bus-topology segments.

Yes, I'm in that situation. On a LAN it's easy to achieve that.
 
> While those conditions are no doubt met in some circumstances (say, a
> crossed patch cable between two fast machines), 

I also have Ethernet switches, but they can handle the load. I don't see
any packet loss on the network.

> You're saying that one syscall takes enough time for an entire full
> interface queue to drain?  How can you ever fill it up with sends then?

One additional system call besides the sendto(). I tried adding this,
and I can't keep up with the network interface when I do it:
struct timeval tv = { 0, 1 };
(void)select(0, NULL, NULL, NULL, &tv);

But there is a possible workaround: I'm now trying to add a select()
every N sendto() calls, with N adjusted according to whether I get ENOBUFS.
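Roughly like this (untested sketch; send_one() is a placeholder for my
real sendto() call, and the initial interval is a guess to be tuned):

#include <sys/time.h>
#include <sys/select.h>
#include <errno.h>

extern int send_one(int);	/* placeholder: does the actual sendto() */

void
send_all(int s, int npackets)
{
	struct timeval tv;
	int interval = 64;	/* packets between two select() calls */
	int i;

	for (i = 0; i < npackets; i++) {
		if (send_one(s) == -1 && errno == ENOBUFS) {
			/* Queue full: throttle twice as often. */
			if (interval > 1)
				interval /= 2;
		} else if (interval < 1024) {
			/* Going well: throttle a bit less often. */
			interval++;
		}

		if ((i + 1) % interval == 0) {
			tv.tv_sec = 0;
			tv.tv_usec = 1;
			(void)select(0, NULL, NULL, NULL, &tv);
		}
	}
}

Halving the interval on ENOBUFS and growing it slowly otherwise is just a
guess at a policy; I'll see how it behaves.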

> For that matter, even if it were pollable, how would that help?  Merely
> doing the poll would slow it down too much.

Not sure (see below).

> That's an extreme enough case that I don't see anything wrong with "the
> stock system can't do that" as an answer.  What do you have there,
> anyway, a gigabit NIC on a SPARCstation-20 or something?  I can't
> remember ever seeing a machine-and-NIC combination where the machine
> was slow enough compared to the NIC's wire speed that one syscall could
> drain the whole send queue.

It's a Celeron with a 100 Mb/s Ethernet interface. It's true that I'm not
100% sure that the system calls last long enough for the queue to get
drained before I insert new packets into it.

I send packets of 1024 bytes, plus 28 bytes of headers, which makes 8416
bits. At 100 Mb/s, it would take 0.084 ms to send a packet. That number
is probably not exact (it ignores link-layer framing), and I'm interested
in a more precise one if someone has a suggestion. Until I get a better
idea, I'll assume that if I add system calls that last longer than
0.084 ms, the queue is consumed faster than I can feed it, and it never
fills.

I don't know how much time a sendto() consumes. How can I evaluate that?
ktrace tells me that it takes up to 0.18 ms for a 1024-byte packet. But
when I use ktrace on the test program in kern/29750, the ENOBUFS errors
disappear, so I suspect ktrace makes the system calls take much longer
than they normally do.
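One way I could measure it without ktrace, I suppose, is to time the call
from within the program itself (a sketch, assuming gettimeofday()'s
resolution is good enough for calls in the 0.1 ms range):

#include <sys/types.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <stdio.h>

/* Time a single sendto() without ktrace in the way. */
void
timed_sendto(int s, const void *buf, size_t len,
    const struct sockaddr *dst, socklen_t dstlen)
{
	struct timeval t0, t1;
	long us;

	(void)gettimeofday(&t0, NULL);
	(void)sendto(s, buf, len, 0, dst, dstlen);
	(void)gettimeofday(&t1, NULL);

	us = (t1.tv_sec - t0.tv_sec) * 1000000L +
	    (t1.tv_usec - t0.tv_usec);
	printf("sendto: %ld us\n", us);
}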

While investigating to give you accurate numbers, I discovered something
interesting: ktrace shows that (void)select(0, NULL, NULL, NULL, &tv)
with tv = { 0, 1 } takes 19 ms. That's huge. Any idea why it takes
so long?

A 19 ms wait clearly explains why the ENOBUFS errors disappear: calling
select() makes me feed the queue about 200 times slower than it is consumed.
Another quick test shows that ktrace reports 0.005 ms for getuid(). So not
all system calls are that slow; it's just select().

So I'd amend my statement: waiting with select() makes the app so slow
that it cannot feed the queue fast enough to use the full bandwidth.

Question: how can I wait with a finer granularity than select()?

About the poll question: it depends on how much a poll() costs.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@netbsd.org