Subject: Re: tuning IP checksumming code...
To: Jonathan Stone <jonathan@DSG.Stanford.EDU>
From: John F. Woods <jfw@jfwhome.funhouse.com>
List: port-i386
Date: 07/18/1996 08:12:41
> Experimentally, using the 1.2 in_cksum.c or the tuned in_cksum.s seems
> to make no significant performance difference on the P120s here; the
> tuned code may be marginally slower.  Not having seen the
> alternatives, I'm not sure which flavour of x86 in_cksum.s is tuned for

Remember that the processor type and speed are only two factors in the
speed equation, and may not even be the most important factors; there are
also the speed of the memory bus, details of the level 2 cache, and so on.

> NB: for those who care, the latest Linux 2.0.x has a TCP over loopback
> throughput that's about 25% faster than NetBSD on a P/120.  (That's
> with a ~3.5k MTU on the Linux lo0, and 32k on NetBSD.)  Finding 10%
> improvements here and there on NetBSD/i386 seems really quite attractive.

Someone else mentioned that Linux uses a checksum-and-copy routine in its
TCP/IP stack (and it wouldn't surprise me if the loopback interface were
able to skip the second checksum calculation entirely); having added such
a routine to an OSF/1-based kernel, I can attest that if you have to copy
the data at all, computing the checksum during the copy is a serious
performance improvement, even considering the effort you go through to
unchecksum bytes here and there.
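A minimal C sketch of the copy-and-checksum idea follows. It is not the Linux, NetBSD, or OSF/1 code. The function names (fold16, partial_sum, copy_and_sum, sum_remove) are hypothetical. It assumes RFC 1071-style big-endian 16-bit words and buffers under 128 KiB, so the 32-bit accumulator cannot overflow. copy_and_sum reads each source byte once and both stores it and adds it into the running sum. sum_remove shows one reading of "unchecksumming" bytes: it subtracts their contribution with one's-complement arithmetic (a - b computed as a + ~b). It assumes the removed bytes started at an even offset in the summed data.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator to 16 bits with end-around carry. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* One's-complement partial sum (not complemented) of buf, taken as
 * big-endian 16-bit words; an odd trailing byte is padded with a zero
 * low-order byte. Assumes len < 128 KiB so the accumulator cannot overflow. */
static uint16_t partial_sum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    return fold16(sum);
}

/* Hypothetical copy-and-checksum: copy len bytes from src to dst and
 * return the same partial sum partial_sum() would, reading each source
 * byte only once instead of making a second pass over the data. */
static uint16_t copy_and_sum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        uint8_t hi = src[i], lo = src[i + 1];
        dst[i] = hi;
        dst[i + 1] = lo;
        sum += ((uint32_t)hi << 8) | lo;
    }
    if (len & 1) {
        dst[len - 1] = src[len - 1];
        sum += (uint32_t)src[len - 1] << 8;
    }
    return fold16(sum);
}

/* "Unchecksum": remove the contribution of len bytes from a partial sum.
 * Assumes the bytes began at an even offset in the data that was summed;
 * one's-complement subtraction a - b is done as a + ~b. */
static uint16_t sum_remove(uint16_t sum, const uint8_t *buf, size_t len)
{
    uint16_t part = partial_sum(buf, len);
    return fold16((uint32_t)sum + (uint16_t)~part);
}
```

The value stored in a header is the complement of the final partial sum, i.e. (uint16_t)~copy_and_sum(dst, src, len). A real kernel routine would work a machine word at a time and unroll the loop. Bytes removed at an odd offset would also need their partial sum byte-swapped before subtracting, which this sketch omits.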