I have been looking at our changed code, and it's sufficiently hairy
that I'm not sure our change is correct. Here are the issues:

* meta-issue: limited precision

  The real problem is limited precision: RTT measurements have 500 ms
  granularity, and the stored srtt is also too coarse.

* +1 as flag bit

  In tcp_input.c, around line 1687, ts_rtt is set to a difference of
  timestamps PLUS ONE. The +1 is used as a flag bit to denote that the
  calculation is valid. Two places later, ts_rtt is tested for != 0,
  but then ts_rtt itself is used, instead of ts_rtt - 1, and in
  tcp_xmit_timer() there's no explanation of how this extra 500 ms is
  removed. (Sketched below.)

* bad rounding

  In tcp_xmit_timer(), there's no rounding on the 1/8 of the old srtt,
  which seems to prevent srtt from getting low enough. (Also sketched
  below.)

Our code changes (I'm working on extracting the diff from larger
unrelated changes) basically address these two points, and our stack
then runs with more sensible timeout values. I am still unclear on
whether the stored srtt, said to be in <<3 fixed point, is in units of
seconds, of 500 ms ticks, or of something else.
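To pin down the +1 pattern, here is a standalone sketch of my reading
of it; the variable names, the 500 ms tick unit, and the scaffolding
are mine, not the actual tcp_input.c code:

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t tcp_now = 1000;      /* hypothetical clock, 500 ms ticks */

    static uint32_t
    sample_ts_rtt(int ts_present, uint32_t ts_ecr)
    {
            if (!ts_present)
                    return 0;            /* 0 means "no valid sample" */
            return tcp_now - ts_ecr + 1; /* +1 is the "valid" flag bit */
    }

    int
    main(void)
    {
            uint32_t ts_rtt = sample_ts_rtt(1, 997); /* true RTT: 3 ticks */

            if (ts_rtt) {
                    /* what I believe the code feeds to tcp_xmit_timer(): */
                    printf("biased sample:   %u ticks\n", (unsigned)ts_rtt);
                    /* what it should feed, with the flag bit removed: */
                    printf("unbiased sample: %u ticks\n",
                        (unsigned)(ts_rtt - 1));
            }
            return 0;
    }

If that reading is right, every timestamp-based sample is inflated by
one 500 ms tick unless the -1 happens somewhere I haven't found.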
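The rounding point shows up in a similar standalone sketch of the
smoothing update srtt += rtt - srtt/8, with srtt kept in <<3 fixed
point; the names, the 1-tick test RTT, and the rounding fix are mine,
and this is not the actual tcp_xmit_timer() code (which also maintains
rttvar):

    #include <stdio.h>

    #define TCP_RTT_SHIFT 3             /* srtt carries 3 fraction bits */

    /* update as I read it: the srtt/8 term truncates toward zero */
    static int
    smooth_trunc(int srtt, int rtt)
    {
            return srtt + rtt - (srtt >> TCP_RTT_SHIFT);
    }

    /* with rounding: add half the divisor before shifting */
    static int
    smooth_round(int srtt, int rtt)
    {
            return srtt + rtt -
                ((srtt + (1 << (TCP_RTT_SHIFT - 1))) >> TCP_RTT_SHIFT);
    }

    int
    main(void)
    {
            int a = 5 << TCP_RTT_SHIFT; /* start both at srtt = 5 ticks */
            int b = a;

            for (int i = 0; i < 32; i++) {  /* feed a steady 1-tick RTT */
                    a = smooth_trunc(a, 1);
                    b = smooth_round(b, 1);
            }
            printf("truncating: %d  rounded: %d  ideal: %d\n",
                a, b, 1 << TCP_RTT_SHIFT);
            return 0;
    }

This prints "truncating: 15  rounded: 11  ideal: 8": the truncating
form gets stuck almost a full tick above the measured RTT, which is
the "srtt can't get low enough" effect.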