Subject: Re: 25%+ improvement in in_cksum speed!
To: Steven M. Bellovin <smb@research.att.com>
From: David Laight <david@l8s.co.uk>
List: port-i386
Date: 09/18/2002 10:23:58
> >> I haven't looked at your code in detail, but how does it perform on 
> >> small packets?  (~40-50% of packets are about 40 bytes.)
> >
> >It's not my code, but here's what happens when I use a size of 50
> >bytes:
> >
> >          in_cksum.s sum a26d took     2887 usecs 0.563867 nsec/byte
> >           asm adc 1 sum eab6 took     5033 usecs 0.983008 nsec/byte
> >          asm adc 1a sum eab6 took     3758 usecs 0.733984 nsec/byte
> >Segmentation fault (core dumped)

That is because I was lazy and didn't write the 'end' effects for
all the routines - they only work for blocks that are multiples
of the inner loop size (since that was what I wanted to test).
The first one that has a block size of 64 tries to do 4GB...
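
To illustrate where a figure like 4GB can come from, here is a minimal
sketch in C. It assumes the remaining length is kept in an unsigned
32-bit counter and that the loop subtracts the block size before
checking for the end; the function name is hypothetical and this is
not the code that was benchmarked.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes "left" after a loop has consumed one full
 * block without first checking that a full block was available.
 * Unsigned arithmetic wraps modulo 2^32 when len < block.
 */
uint32_t remaining_after_first_block(uint32_t len, uint32_t block)
{
    return len - block;
}

/*
 * With len = 50 and block = 64 the result is 2^32 - 14 = 4294967282,
 * i.e. just under 4GB still "to do", so the loop runs off the end of
 * the buffer.
 */
```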

> Ignoring the segfault for now, those numbers suggest that we need the 
> current code for small blocks and one of the newer ones for long 
> blocks.

No! You need one of the slightly unrolled loops for small
packets.  Functions '1' and '1a' are particularly crude.
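
As a rough picture of a "slightly unrolled" loop with its end effects
handled, here is a portable C sketch. It is not any of the routines
timed above: the name cksum_sketch, the unroll factor of four 16-bit
words, and the byte-at-a-time loads are all assumptions made for the
example, and it computes the usual ones-complement Internet checksum
rather than reproducing the assembler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: ones-complement checksum, unrolled 4 words deep. */
uint16_t cksum_sketch(const uint8_t *p, size_t len)
{
    uint64_t sum = 0;

    /* Main loop: four big-endian 16-bit words (8 bytes) per pass. */
    while (len >= 8) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        sum += ((uint32_t)p[2] << 8) | p[3];
        sum += ((uint32_t)p[4] << 8) | p[5];
        sum += ((uint32_t)p[6] << 8) | p[7];
        p += 8;
        len -= 8;
    }

    /* End effects: up to three leftover whole words... */
    while (len >= 2) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }

    /* ...and a possible trailing odd byte, padded with zero. */
    if (len == 1)
        sum += (uint32_t)p[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Because the main loop only runs while at least 8 bytes remain, a
50-byte block takes six passes and then one pass of the word loop,
rather than running past the end of the buffer.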

	David

-- 
David Laight: david@l8s.co.uk