Subject: None
To: Bill Studenmund <wrstuden@netbsd.org>
From: Jonathan Stone <jonathan@Pescadero.dsg.stanford.edu>
List: tech-net
Date: 08/05/2005 18:43:38
In-reply-to: Your message of "Fri, 05 Aug 2005 18:20:04 PDT."
             <20050806012004.GB9074@netbsd.org> 
--------

[... discussion of ACK rate, possible causes, possible remedies... ]

In message <20050806012004.GB9074@netbsd.org> Bill Studenmund writes:
>> Do you have packet traces?  [...]
>I have traces around somewhere which I have stared at with Ethereal.
>
>The pattern I see is a flow of writes to the NetBSD box, perhaps with ACKs
>interspersed (I don't actually remember either way). Then I see a spew of
>ACKs with no other activity. Say about 12 in a row.

So now I ask: are you using a NIC with interrupt hysteresis?  I've
seen Intel pro/1000s slurp in 44-odd frames, an entire 64k of payload,
and only generate an interrupt after the entire 64k burst has come in
and the link has gone quiet.  Your description above is exactly what
I'd expect to see with a NIC with interrupt mitigation that has
interrupted after receiving about 32k bytes of data (22-odd frames).
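
As a rough sanity check on those frame counts, here is a small sketch of the arithmetic. It assumes a 1460-byte MSS (1500-byte Ethernet MTU, 40 bytes of IPv4 and TCP headers, no TCP options) and a receiver that ACKs every other full-sized segment; the function names are made up for illustration.

```python
MSS = 1460  # assumed: 1500-byte MTU minus 40 bytes of IPv4 + TCP headers


def full_frames(burst_bytes, mss=MSS):
    """Number of full-sized segments needed to carry burst_bytes of payload."""
    return burst_bytes // mss


def acks_for_burst(burst_bytes, mss=MSS):
    """ACKs generated if the receiver ACKs every other full-sized segment."""
    return full_frames(burst_bytes, mss) // 2


if __name__ == "__main__":
    for burst in (64 * 1024, 32 * 1024):
        print(burst, full_frames(burst), acks_for_burst(burst))
```

Under those assumptions a 64k burst is 44 full frames and a 32k burst is 22, and a 32k burst handed to TCP all at once would produce about 11 back-to-back ACKs, which is in the same ballpark as the "about 12 in a row" seen in the trace.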

If you can make a trace available, I can take a look in a day or so.


>> a)  ACKing every other in-order segment as the segments are received
>>
>> b) sending non-piggybacked window updates as the application reads
>> data
>
>I expect this is what I'm seeing. Well, the part of what I'm seeing that I
>dislike. I however have not figured out where this happens in the code.
>:-)

>Small pieces. Well, small and large chunks.

Reading in small pieces will trigger more window-updates than reading
in large pieces.
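
To illustrate, here is a toy model of a receiver draining its socket buffer. It is only a sketch: it assumes the receiver sends a standalone window update whenever the window it could advertise has grown by at least two segments or by half the buffer (loosely modeled on the classic BSD rule, not taken from the NetBSD source), and the names and default sizes are invented for the example.

```python
def count_window_updates(buffered, read_size, mss=1460, hiwat=65536):
    """Count standalone window updates sent while an application drains
    `buffered` bytes from the receive buffer in reads of `read_size` bytes."""
    updates = 0
    pending = 0          # bytes freed since the window was last advertised
    remaining = buffered
    while remaining > 0:
        n = min(read_size, remaining)
        remaining -= n
        pending += n
        # Assumed rule: advertise once the window can grow by two segments
        # or by half the receive buffer, whichever comes first.
        if pending >= 2 * mss or 2 * pending >= hiwat:
            updates += 1
            pending = 0
    return updates


if __name__ == "__main__":
    for size in (512, 4096, 65536):
        print(size, count_window_updates(65536, size))
```

In this model, draining 64k in 512-byte reads yields 21 window updates, 4096-byte reads yield 16, and a single 64k read yields one.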


>> >I think we send an ack each time userland reads something, in the hopes of
>> >keeping the window as open as possible. That's kinda good. However it'd be
>> >nice to meter them somehow.
>>
>> I'd guess not; what you describe sounds more like our TCP is
>> sending window updates to its TCP peer, triggered by an application
>> removing (reading) data from the local TCP receive buffer.
>>
>> Given the uncertainties in TCP RTT estimation, and that you want
>> BW*RTT data in flight, metering ACKs sounds to me like it'd make an
>> interesting research project (albeit one of dubious value to the TCP
>> research community).
>
>I think it could be quite useful. However the thing that has me most
>stumped is how to end the delay. What I'd like is something that knew if
>the app was still reading, and kept processing rather than send the ACK
>(delay the ACK as we will have another window update in just a little
>bit).  Well, if the window changed more than a set amount, send a window
>update anyway.
>
>I however have no idea how to reasonably implement knowing "if the app was
>still reading". Thus the idea is still a pipe dream. :-)

Yep, as I said, "research project".


>In my mind, a delay of even a tick (10 ms) is probably too long. The
>environment I'm looking at can transmit the TCP window about 5 times in a
>tick. So that won't work.

And if it's someone like me, trying to fill a 10GbE pipe, a 10ms extra
delay corresponds to an additional window of 12 Mbytes.  No, thank you.
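
The arithmetic behind that figure, as a minimal sketch (the function name is made up; integer math is used to avoid rounding surprises):

```python
def extra_window_bytes(rate_bits_per_sec, delay_ms):
    """Bytes that must be in flight to cover delay_ms of extra latency
    at the given line rate."""
    return rate_bits_per_sec * delay_ms // (8 * 1000)


if __name__ == "__main__":
    # 10 Gbit/s for one 10 ms tick
    print(extra_window_bytes(10_000_000_000, 10))
```

At 10 Gbit/s, one 10 ms tick is 12,500,000 bytes, i.e. roughly the 12 Mbytes quoted above.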


>> >The problem is that to delay sending them means we need some sort of
>> >timeout.
>>
>> Worse than that; I'd expect believers in ACK clocking will complain
>> fiercely if you delay ACKs for in-order received data, until the
>> occurrence of unpredictable (and, to network-level TCP dynamics,
>> largely "irrelevant") events, such as an application reading data.
>
>Uhm, I never suggested delaying receipt ACKs. Or I never meant to.
>
>My thoughts are solely around exclusive window updates. Or I think they
>are. :-)

Aye, there's the rub. It's the data ACKs that you cannot stop from
going out (at least not without breaking ACK clocking).  So
(providing your window is big enough) the best you can do is to delay
window updates until you can piggyback them on an ACK for
previously-unacknowledged data.

But if your window *isn't* big enough, if the window gets close to
full, then any delay added to sending window updates comes straight
out of goodput time.

Personally, on any machine where I can do so, I set TCP buffers to at
least 199608 bytes. Just in case.

>It could be I should just leave TCP alone and do something different with
>my application.

I don't know about YAMAMOTO-san. Can you thwack the systems in
question, and your application, to use huge TCP buffers? Or do the
moral equivalent of stdio reads with a buffer of 256k?
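
For what it's worth, a minimal sketch of both suggestions from the application side, using the standard sockets API. The 256k sizes and the helper names are only illustrative, and the kernel may clamp or round the requested buffer size.

```python
import socket


def make_receiver_socket(rcvbuf_bytes=256 * 1024):
    """Create a TCP socket and request a large receive buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Typically this should be set before connect()/listen() so that it can
    # influence the window scale negotiated at connection setup.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    return sock


def drain(sock, chunk_bytes=256 * 1024):
    """Read until EOF in large chunks; return the number of bytes read."""
    total = 0
    while True:
        data = sock.recv(chunk_bytes)
        if not data:
            return total
        total += len(data)
```

Asking recv() for a large chunk does not guarantee large reads, but it lets each call take everything currently buffered, so fewer reads (and fewer chances to trigger a window update) are needed per burst.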


>> (Wasn't it Jason who opined some 8 years back that perhaps our TCP
>> should send an ACK for *every* received segment? Or am I misremembering?)
>
>No idea.

If I'm not misremembering, it was back around the time Jason fixed the
bug whereby our TCP ACKed every third segment of a unidirectional
transfer. Ancient history.