Subject: Re: Socket buffer accounting and TCP
To: Charles M. Hannum <mycroft@mit.edu>
From: Stefan Grefen <grefen@hprc.tandem.com>
List: tech-kern
Date: 09/04/1998 00:36:03
In message <199809031521.LAA07560@lunacity.ne.mediaone.net>  "Charles M. Hannum" wrote:
> 
> > If the mbufs have no room (or not much) then we're wasting headers
> > and can start compressing to reclaim the header space if the amount of
> > wasted headerspace is a certain percentage of buffered data. 
> >
> > If there is an mbuf with lots of space (e.g. the new one fits in) then we can
> > compress immediately.
> 
> Obviously you're missing the point.  We don't want to do extra copies;
> they're a waste of CPU time.

I don't think so ..

> 
> Consider, for example, a news server or mail server at an ISP, with
> lots of clients connected via PPP (with small MTUs), all doing PMTU
> discovery (which will cause lots of small segments to be received).
> This is a very real case -- or at least will be more so when PMTU
> discovery is more widely deployed.
> 
> In this case, almost all the packets are much smaller than the mbuf
> size, so you're going to end up always copying the data an extra time.
> This really screws your network layer performance.

I think you're between a rock and a hard place then anyway.
Either you use only 10% of your buffer space or you do some extra copies.
Which is preferable depends on the actual system (do we have memory
or CPU cycles to spare ...).

We want to compress packets on sockets with slow applications.
The easiest way to do it is to set a threshold for wasted space (let's say
30%) and apply it once the total amount of data in that buffer
reaches 1.5 times the low-water mark, with a minimum of 2 buffers needed.
Of course all those parameters should be sysctl variables.

This compresses data for processes which have been awakened but for some
reason haven't picked up the data yet.
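
To make the rule above concrete, here is a rough sketch of the check in
plain C. It is not actual kernel code: the struct names, field names and
should_compress() are all made up for illustration, and I'm assuming the
wasted-space percentage is measured against the storage the buffers hold
(one could equally measure it against the buffered data).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tunables; in a kernel these would be sysctl variables. */
struct compress_policy {
    unsigned waste_pct;  /* wasted-space threshold in percent, e.g. 30 */
    unsigned lowat_num;  /* data must reach lowat * num / den, e.g. 3/2 */
    unsigned lowat_den;
    unsigned min_bufs;   /* minimum number of buffers queued, e.g. 2 */
};

/* Hypothetical per-socket-buffer counters. */
struct sockbuf_stats {
    size_t data_bytes;   /* payload bytes currently queued */
    size_t space_bytes;  /* total storage held by the queued buffers */
    size_t nbufs;        /* number of buffers in the queue */
    size_t lowat;        /* low-water mark of the socket buffer */
};

/* Return true if the queue looks worth compressing under the policy. */
static bool
should_compress(const struct compress_policy *p, const struct sockbuf_stats *s)
{
    size_t wasted;

    if (s->nbufs < p->min_bufs)
        return false;
    if (s->space_bytes == 0 || s->data_bytes > s->space_bytes)
        return false;   /* nothing held, or inconsistent counters */

    /* Queued data must have reached (num/den) times the low-water mark. */
    if (s->data_bytes * p->lowat_den < s->lowat * p->lowat_num)
        return false;

    /* Assumption: waste is measured as a share of the storage held. */
    wasted = s->space_bytes - s->data_bytes;
    return wasted * 100 >= (size_t)p->waste_pct * s->space_bytes;
}
```

With the numbers from above (30%, 1.5 times the low-water mark, 2 buffers)
a queue of small segments sitting in mostly empty buffers trips the check
once the reader has fallen behind, while a queue that is already densely
packed is left alone.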

> 
> This *can* be fixed, with largely the same mechanism one might use to
> implement zero-copy socket buffering, but it's not trivial.
> 

This is not trivial and also has some performance issues which are not
obvious. Most UNIX VM systems (honorable exception: AIX) are not designed
with async raw I/O in mind. Socket/filesystem I/O is even worse, as the
application expects to be able to reuse the buffer immediately (which means
the page has to go copy-on-write, and this bites even if the buffer itself
is not touched but something else on the same page is).
This is another can of worms for an industrial-sized opener ...
(been there, done that (UnixWare 2.1, was fairly easy to cheat), but don't
like the T-shirt :-))

Stefan

--
Stefan Grefen                                Tandem Computers Europe Inc.
grefen@hprc.tandem.com                       High Performance Research Center
 --- Hacking's just another word for nothing left to kludge. ---