Subject: sbappend() is not scalable
To: None <firstname.lastname@example.org, email@example.com>
From: Mohit Aron <firstname.lastname@example.org>
Date: 10/08/1999 15:51:29
I recently did some experiments with TCP over a high b/w-delay path
and found a scalability problem in sbappend(). The experimental setup
consisted of a 100Mbps network with a round-trip delay of 100ms. In this
setup, FreeBSD's TCP implementation cannot attain more than 65 Mbps on a
300MHz Pentium II, even without slow-start.
I tracked down the problem to sbappend() - the routine that appends user data
into the socket buffers for network transmission. Every time a TCP ACK
acknowledges some data, space is created in the socket buffer that permits
more data to be appended. Unfortunately, the implementation does not maintain
a pointer to the end of the list of mbufs in the socket buffer. Thus each
time any data is added, the whole list of mbufs is traversed to reach the
very end where the data is added. Since the b/w-delay product is large, there
can be about 600 mbufs in the socket buffer waiting to be acknowledged. As a
result, every ACK triggers a traversal of about 600 mbufs, and the TCP sender
runs out of CPU.
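To make the cost concrete, here is a minimal sketch of the two append
strategies. It is not the kernel code: the structures are cut-down stand-ins,
the sb_tail field and all three function names are hypothetical, and it
ignores record boundaries and the fact that the real sbappend() takes a whole
mbuf chain rather than a single mbuf.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (illustration only). */
struct mbuf {
    struct mbuf *m_next;   /* next mbuf in the chain */
    int          m_len;    /* bytes of data in this mbuf */
};

struct sockbuf {
    struct mbuf *sb_mb;    /* head of the mbuf chain */
    struct mbuf *sb_tail;  /* HYPOTHETICAL: last mbuf in the chain */
    size_t       sb_cc;    /* total bytes queued */
};

/*
 * Append by walking the chain, as described above.
 * Returns the number of links followed to reach the end.
 */
static size_t
sb_append_walk(struct sockbuf *sb, struct mbuf *m)
{
    size_t steps = 0;

    m->m_next = NULL;
    if (sb->sb_mb == NULL) {
        sb->sb_mb = m;
    } else {
        struct mbuf *n = sb->sb_mb;
        while (n->m_next != NULL) {   /* O(n) in the queue length */
            n = n->m_next;
            steps++;
        }
        n->m_next = m;
    }
    sb->sb_cc += (size_t)m->m_len;
    return steps;
}

/* Append in constant time using the cached tail pointer. */
static void
sb_append_tail(struct sockbuf *sb, struct mbuf *m)
{
    m->m_next = NULL;
    if (sb->sb_tail == NULL)
        sb->sb_mb = m;                /* queue was empty */
    else
        sb->sb_tail->m_next = m;
    sb->sb_tail = m;
    sb->sb_cc += (size_t)m->m_len;
}

/* Remove the first mbuf (e.g. once ACKed); keeps sb_tail consistent. */
static struct mbuf *
sb_drop_head(struct sockbuf *sb)
{
    struct mbuf *m = sb->sb_mb;

    if (m == NULL)
        return NULL;
    sb->sb_mb = m->m_next;
    if (sb->sb_mb == NULL)
        sb->sb_tail = NULL;           /* queue drained: reset the tail */
    sb->sb_cc -= (size_t)m->m_len;
    m->m_next = NULL;
    return m;
}
```

The catch with a cached tail is that every path that removes or splices mbufs
has to keep it up to date, which is why the sketch includes the drop routine.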
The problem is not limited to high b/w networks - it also shows up on
long-latency paths (satellite links). Thus a server transferring a large file
over a satellite link can spend a lot of CPU time due to the above problem.
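As a rough check of the numbers, the sketch below estimates how many mbufs sit
unacknowledged for a given bandwidth and round-trip time. It assumes each
queued mbuf carries one 2048-byte cluster and that the socket buffer is large
enough to hold a full b/w-delay product; neither is stated above, and the
function names are made up for illustration.

```c
#include <stdint.h>

/* ASSUMPTION: every queued mbuf holds one 2048-byte cluster. */
#define ASSUMED_CLUSTER_BYTES 2048u

/* Bandwidth-delay product in bytes for a link speed and RTT. */
static uint64_t
bdp_bytes(uint64_t bits_per_sec, uint64_t rtt_ms)
{
    return bits_per_sec / 8 * rtt_ms / 1000;
}

/* Approximate number of mbufs needed to hold one bandwidth-delay product. */
static uint64_t
mbufs_in_flight(uint64_t bits_per_sec, uint64_t rtt_ms)
{
    return bdp_bytes(bits_per_sec, rtt_ms) / ASSUMED_CLUSTER_BYTES;
}
```

With 100Mbps and 100ms this gives 1,250,000 bytes, or about 610 mbufs, in line
with the figure of about 600 quoted above. A hypothetical 10Mbps satellite
link with a 550ms round trip would still keep roughly 335 mbufs queued, so the
per-append walk stays expensive even at modest bandwidth.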
I hope the problem will be fixed in a future release.