Re: cprng_fast implementation benchmarks
On Wed, Apr 23, 2014 at 10:57:59AM +0200, Joerg Sonnenberger wrote:
> On Tue, Apr 22, 2014 at 11:59:38PM -0400, Thor Lancelot Simon wrote:
> > I believe ChaCha8 is suitable for our purpose: we were previously
> > considering ciphers with, at most, 128-bit security, and even 6-round
> > ChaCha has 139-bit strength against the best currently known attack (at
> > present, there is no attack better than brute force on ChaCha8, and the
> > best attack on ChaCha7 is 2^248). ChaCha8 appears to be somewhat faster
> > than the old arc4 implementation.
> Sounds wrong. When I measured Salsa20/8, it was ~3 times faster than
> RC4. Code can be found at
That's a libc implementation -- and were you calling it for 32 bits at a
time, or bulk data?
In the kernel, called for 32 bits at a time, with the percpu data structures
and the spl calls, chacha8 appears to be about 30% faster than arc4. Called
for 256 bytes at a time, with the additional overhead of copying those bytes
out to userspace, it appears to be about 40% faster.
Given that -- supposedly -- these ciphers can generate data at somewhere
between 8 and 12 cycles per byte even when implemented in C, the core
cipher still makes a not insignificant contribution to the total cost here,
but there are fixed overheads (the function calls; the percpu allocation and
spl overhead) that account for much of the total time.
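
To make the fixed-versus-per-byte argument concrete, here is a toy cost
model in C. It is only a sketch: the function names are made up for
illustration, and any fixed-overhead figure fed into it is a hypothetical
placeholder, not a measurement from these benchmarks.

```c
/*
 * Toy cost model: every call pays a fixed overhead (function calls,
 * percpu lookup, spl raise/lower) plus a per-byte cipher cost.
 * The inputs are assumptions supplied by the caller, not measured data.
 */
static double
call_cycles(double fixed_cycles, double cycles_per_byte, unsigned nbytes)
{
	return fixed_cycles + cycles_per_byte * (double)nbytes;
}

/* Fraction of one call's total cost that is fixed overhead. */
static double
overhead_fraction(double fixed_cycles, double cycles_per_byte,
    unsigned nbytes)
{
	return fixed_cycles /
	    call_cycles(fixed_cycles, cycles_per_byte, nbytes);
}
```

With an assumed 100 cycles of fixed overhead and 10 cycles per byte,
overhead_fraction(100.0, 10.0, 4) is about 0.71 for a 32-bit request,
while overhead_fraction(100.0, 10.0, 256) is under 0.04. The 256-byte
benchmark above also pays for the copy out to userspace, which this
model leaves out.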
Do we still have a compile-time way to check if the kernel (or port) is
uniprocessor-only? If so, we should probably #ifdef away the percpu calls
in such kernels, which are probably for slower hardware anyway.
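
Roughly what I have in mind, as a userspace sketch rather than kernel code.
It assumes a macro named MULTIPROCESSOR is defined only for SMP builds
(that is exactly the part I am asking about); rng_state, rng_getref(),
cur_cpu() and MAXCPUS_SKETCH are hypothetical stand-ins, and the "generator"
is a plain counter, not a cipher.

```c
#include <stdint.h>

/* Hypothetical stand-in for the generator state. */
struct rng_state {
	uint32_t counter;
};

#ifdef MULTIPROCESSOR	/* assumed name of the SMP compile-time option */
/* MP build: one state slot per CPU, looked up on every call. */
#define MAXCPUS_SKETCH 4			/* placeholder CPU count */
static struct rng_state rng_states[MAXCPUS_SKETCH];
static unsigned cur_cpu(void) { return 0; }	/* placeholder CPU index */
static struct rng_state *
rng_getref(void)
{
	return &rng_states[cur_cpu()];
}
#else
/* UP build: a single static state, so no per-CPU lookup at all. */
static struct rng_state rng_state0;
static struct rng_state *
rng_getref(void)
{
	return &rng_state0;
}
#endif

/* Callers look the same in both builds; only rng_getref() differs. */
static uint32_t
rng_next(void)
{
	struct rng_state *st = rng_getref();

	return st->counter++;	/* placeholder output, not a keystream */
}
```

The point is only that the lookup collapses to the address of a static
object in the uniprocessor case, so the callers need no #ifdefs of their own.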
Without the data moves to userspace, the 256-byte case would of course
be more indicative of raw cipher performance, but that wasn't the point
of that test; rather, it was meant to determine how well the different
alternatives scale out to additional CPUs.