Subject: Re: 25%+ improvement in in_cksum speed!
To: Steven M. Bellovin <smb@research.att.com>
From: David Laight <david@l8s.co.uk>
List: port-i386
Date: 09/18/2002 10:23:58
> >> I haven't looked at your code in detail, but how does it perform on
> >> small packets? (~40-50% of packets are about 40 bytes.)
> >
> >It's not my code, but here's what happens when I use a size of 50
> >bytes:
> >
> > in_cksum.s sum a26d took 2887 usecs 0.563867 nsec/byte
> > asm adc 1 sum eab6 took 5033 usecs 0.983008 nsec/byte
> > asm adc 1a sum eab6 took 3758 usecs 0.733984 nsec/byte
> >Segmentation fault (core dumped)
That is because I was lazy and didn't write the 'end' effects for
all the routines - they only work for blocks that are multiples
of the inner loop size (since that was what I wanted to test).
The first one that has a blocksize of 64 tries to do 4GB...
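
One plausible way a 50-byte request turns into roughly 4GB, sketched under assumptions: the test program's actual loop is not shown in this thread, so the helper names below are hypothetical and the unsigned wrap-around is a guess at the mechanism, not a description of the real code. If the remaining length is held in a 32-bit unsigned counter and a 64-byte block is subtracted from it without a check, the counter wraps instead of going negative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: what an unchecked "len -= block" leaves behind
 * when the length is kept in a 32-bit unsigned counter. */
static uint32_t remaining_after_block(uint32_t len, uint32_t block)
{
    return len - block;   /* wraps modulo 2^32 when block > len */
}

/* Hypothetical helper: number of whole blocks that actually fit.
 * A loop driven by this count never runs past the buffer. */
static uint32_t whole_blocks(uint32_t len, uint32_t block)
{
    assert(block != 0);
    return len / block;
}
```

With len = 50 and block = 64, remaining_after_block() returns 2^32 - 14, a little under 4GB, while whole_blocks() returns 0 and leaves all 50 bytes for separate 'end' handling.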
> Ignoring the segfault for now, those numbers suggest that we need the
> current code for small blocks and one of the newer ones for long
> blocks.
No! You need one of the slightly unrolled loops for small
packets. Functions '1' and '1a' are particularly crude.
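
A minimal sketch of what a "slightly unrolled" loop with proper 'end' handling could look like in portable C. This is not one of the routines from the test program (those are assembler and are not shown here); the names be16 and cksum_sum16, and the choice of an 8-byte unroll, are assumptions for illustration only. It returns the folded 16-bit one's-complement sum, not its complement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read one 16-bit word in network byte order. */
static uint32_t be16(const uint8_t *p)
{
    return ((uint32_t)p[0] << 8) | p[1];
}

/* Hypothetical sketch: folded 16-bit one's-complement sum of a buffer. */
static uint16_t cksum_sum16(const uint8_t *p, size_t len)
{
    uint64_t sum = 0;

    assert(p != NULL || len == 0);

    /* Lightly unrolled main loop: four words (8 bytes) per pass,
     * entered only while a whole block remains. */
    while (len >= 8) {
        sum += be16(p) + be16(p + 2) + be16(p + 4) + be16(p + 6);
        p += 8;
        len -= 8;
    }

    /* 'End' effects: leftover whole words... */
    while (len >= 2) {
        sum += be16(p);
        p += 2;
        len -= 2;
    }

    /* ...and a trailing odd byte, padded with a zero low byte. */
    if (len == 1)
        sum += (uint32_t)p[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)sum;
}
```

A 50-byte packet takes six passes of the unrolled loop plus one pass of the word loop, so the setup cost stays small for short packets and no length is ever rounded up to a full block.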
David
--
David Laight: david@l8s.co.uk