tech-net archive

Re: in_cksum (Re: CVS commit: [yamt-lazymbuf] src/sys/arch/amd64/include)

On Sat, Feb 09, 2008 at 02:10:20PM +0100, Joerg Sonnenberger wrote:
> On Sat, Feb 09, 2008 at 09:48:56PM +0900, YAMAMOTO Takashi wrote:
> > btw, according to regress/sys/net/in_cksum, i386 asm version
> > (cpu_in_cksum.S 1.2) seems slower than portable version
> > on my cpu.
> Actually, I'm not surprised by that. Given that the inner loop is using
> adc all the time, the adds themselves can't run in parallel. I haven't
> had time to carefully analyse the code to see if the amd64 version could be ported.

I also suspect it is excessively unrolled - but that only affects
the cold cache time.

Possibly you could run 2 (maybe 3) add sequences in parallel, using 'adc $0'
to break the carry chain - but I don't know if that actually works!
ISTR that an instruction that writes to ALL the condition flags doesn't
have a dependency against the previous copy of the flags.
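
A minimal sketch of the two-chain idea in portable C, not the actual
cpu_in_cksum.S code. The names add1c, sum_single_chain, sum_two_chains and
fold16 are hypothetical, and the input is assumed to be whole 32-bit words.
add1c models an 'add' followed by 'adc $0' (the carry is folded straight
back in), so the two accumulators never depend on each other's carry.
Whether a compiler or CPU actually overlaps the two chains is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement add: 'add' then 'adc $0' (end-around carry). */
static uint32_t add1c(uint32_t a, uint32_t b)
{
    uint32_t s = a + b;
    return s + (s < b);        /* s < b exactly when the add carried out */
}

/* Reference: one dependent chain, every add waits for the previous one. */
static uint32_t sum_single_chain(const uint32_t *w, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = add1c(sum, w[i]);
    return sum;
}

/* Two independent chains over even and odd words, merged at the end. */
static uint32_t sum_two_chains(const uint32_t *w, size_t n)
{
    uint32_t s0 = 0, s1 = 0;
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        s0 = add1c(s0, w[i]);      /* chain 0 */
        s1 = add1c(s1, w[i + 1]);  /* chain 1, no dependency on chain 0 */
    }
    if (i < n)                     /* odd word left over */
        s0 = add1c(s0, w[i]);
    return add1c(s0, s1);
}

/* Fold a 32-bit one's-complement sum down to 16 bits. */
static uint16_t fold16(uint32_t s)
{
    s = (s & 0xffff) + (s >> 16);
    s = (s & 0xffff) + (s >> 16);
    return (uint16_t)s;
}
```

Both functions return the same value because one's-complement addition is
commutative and associative, so the order in which words are accumulated
does not matter.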

Note that an 'adc' chain can't end up with a value of ~0 and carry set
(unless that is the initial condition and you've added ~0).
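
The claim can be checked exhaustively on a scaled-down model. The sketch
below assumes an 8-bit 'adc' (adc8, struct adc_case and
count_all_ones_with_carry are hypothetical names) and tries every
combination of incoming sum, addend and carry, counting the cases that
leave all ones with carry set. The same arithmetic holds at 32 bits: the
largest possible total is 2*(2^n - 1) + 1, which is reached only when
both operands are ~0 and the carry is already set.

```c
#include <stdint.h>

struct adc_case {
    unsigned sum, addend, carry;   /* inputs that produced the bad state */
};

/* Model one 8-bit 'adc': sum = sum + addend + carry, carry = carry out. */
static void adc8(uint8_t *sum, unsigned *carry, uint8_t addend)
{
    unsigned total = (unsigned)*sum + addend + *carry;
    *sum = (uint8_t)total;
    *carry = total >> 8;
}

/*
 * Count the inputs for which adc8 leaves sum == 0xff with carry set,
 * recording the last such input in *last.
 */
static unsigned count_all_ones_with_carry(struct adc_case *last)
{
    unsigned count = 0;

    for (unsigned s = 0; s < 256; s++) {
        for (unsigned a = 0; a < 256; a++) {
            for (unsigned c = 0; c < 2; c++) {
                uint8_t sum = (uint8_t)s;
                unsigned carry = c;

                adc8(&sum, &carry, (uint8_t)a);
                if (sum == 0xff && carry == 1) {
                    last->sum = s;
                    last->addend = a;
                    last->carry = c;
                    count++;
                }
            }
        }
    }
    return count;
}
```

The only hit is sum 0xff, addend 0xff, carry in 1, which is the exception
noted above. A consequence is that a trailing 'adc $0' after a carrying
add cannot itself carry out.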


David Laight:
