Subject: Re: Intel i82547 performance problems in wm(4)
To: Bill Studenmund <wrstuden@NetBSD.org>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-net
Date: 07/16/2004 11:57:53
In message <20040716173350.GB22092@netbsd.org>,
Bill Studenmund writes:


>My personal experience with an application that spends a lot of time 
>sending data from disk over a TCP socket is that checksum offload (IP and 
>TCP) do in fact make a big difference. It helps both with saved cycles 
>(CPU can do other things) and with reduced cache usage (as the data don't 
>have to be reloaded into the Dcache to get checksummed). If there is 
>something else the CPU can be doing, then the offload support lets it 
>effectively do two things at once.

Bill, how much is "big"? 10%-15%, or much more than that?

Having spent many years and years measuring this and similar
effects, with tools from ttcp throughput down to staring at PCI
bus-analyzer traces:

On modern machines, the *real* case where you get a significant win
from TCP checksum offload is when computing the TCP checksum is the
only time the CPU actually touches the data.  In that case, moving the
software TCP checksum to outboard hardware means the CPU *never* has
to see or touch the TCP payload; you eliminate the off-chip activity
necessary for the CPU to gain cache-line ownership of the I/O buffers
holding the data.  That's usually a bigger win than the savings from
checksumming data the CPU has already touched recently (e.g., refetching
a cache line from L2 cache to L1/registers).
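To make concrete why a software checksum forces the CPU to touch every
byte of the payload, here is a minimal, portable sketch of the RFC 1071
Internet checksum.  It is an illustration only, not the kernel's
optimized in_cksum() routines; the function name inet_cksum is
hypothetical.  Note that the loop loads every byte of the buffer: that
load traffic (and the cache-line ownership it requires) is exactly what
outboard checksum offload removes from the CPU.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical inet_cksum(): RFC 1071 one's-complement checksum over a
 * buffer, computed in software.  Every byte is read, so every cache line
 * of the buffer must be brought to the CPU.
 */
uint16_t inet_cksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;   /* wide accumulator so carries are not lost */
    size_t i;

    /* Sum the buffer as 16-bit big-endian (network order) words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint64_t)data[i] << 8) | data[i + 1];

    /* An odd trailing byte is padded with a zero low-order byte. */
    if (len & 1)
        sum += (uint64_t)data[len - 1] << 8;

    /* Fold carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    /* The checksum is the one's complement of the folded sum. */
    return (uint16_t)~sum;
}
```

For the example bytes in RFC 1071 (00 01 f2 03 f4 f5 f6 f7) this yields
0x220d; appending those two checksum bytes and re-summing yields 0,
which is how a receiver verifies the data.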

For the latter case (refetching a line the CPU already owns and has in
a lower cache level) the gains from TCP checksum offload are often
quoted at around 10% or so.

OTOH, if checksum-offload means your app now fits into cache (L1 or
other) when it didn't before, that can be a significant win, too...