tech-crypto archive


Re: Patch: rework kernel random number subsystem



On Mon, Nov 21, 2011 at 09:20:36AM +0100, Pawel Jakub Dawidek wrote:
> 
> Could you tell more about performance characteristics of your
> implementation? If I read the code correctly, you also use single mutex
> in cprng_strong() around all the work. The simplest scalability test is
> to run 'dd if=/dev/random of=/dev/null bs=1m count=1024' N times in
> parallel, where N is the number of CPUs.

As you can see from what I checked in, I haven't replaced the
pseudodevice implementation -- yet.  On my test system the performance
of the stream generator is about 50% better than the old direct
extraction from the entropy pool, and that's for small requests; it
can probably get better with some work.

When I replace the existing pseudodevice code, the way it will work is
that there will be one instance of cprng_strong per instance of the
pseudodevice -- which will clone on open.  So the problem you describe
should not exist.  Also, one separately-keyed/"personalized" instance
of the stream generator per client is really how these generators are
intended to be used, so I am more comfortable with it on those grounds
too.
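
A minimal userspace sketch of the one-instance-per-open idea described
above. This is not the NetBSD code: the names toy_gen, toy_gen_open(),
toy_gen_read(), and toy_gen_close() are hypothetical, and the splitmix64
mixer is a non-cryptographic stand-in for a real keyed stream generator.
It only shows the shape: each open gets its own state, keyed from a
shared seed plus a personalization string, with its own lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-client generator; NOT the cprng_strong API. */
struct toy_gen {
    pthread_mutex_t lock;   /* contended only by this client's threads */
    uint64_t state;
};

/* splitmix64 step: a NON-cryptographic placeholder for a real DRBG. */
static uint64_t toy_mix(uint64_t *s)
{
    uint64_t z = (*s += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* "Clone on open": every open builds a separately keyed instance. */
static struct toy_gen *toy_gen_open(uint64_t seed, const char *personalization)
{
    struct toy_gen *g = malloc(sizeof(*g));
    if (g == NULL)
        return NULL;
    pthread_mutex_init(&g->lock, NULL);
    g->state = seed;
    /* Absorb the personalization string one byte at a time. */
    for (const char *p = personalization; *p != '\0'; p++) {
        uint64_t s = g->state ^ (unsigned char)*p;
        g->state = toy_mix(&s);
    }
    return g;
}

/* Fill buf with len bytes drawn from this instance only. */
static void toy_gen_read(struct toy_gen *g, void *buf, size_t len)
{
    unsigned char *out = buf;
    assert(g != NULL);
    pthread_mutex_lock(&g->lock);
    while (len > 0) {
        uint64_t word = toy_mix(&g->state);
        size_t n = len < sizeof(word) ? len : sizeof(word);
        memcpy(out, &word, n);
        out += n;
        len -= n;
    }
    pthread_mutex_unlock(&g->lock);
}

static void toy_gen_close(struct toy_gen *g)
{
    pthread_mutex_destroy(&g->lock);
    free(g);
}
```

Two readers that each call toy_gen_open() never touch the same lock, so
N parallel readers do not serialize on a single mutex.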

At present the kernel itself uses only two instances of cprng_strong.
However, there is no reason why there could not be one per CPU -- and
there should be.
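
A rough sketch of what one generator per CPU could look like, under
stated assumptions: NSLOTS, CACHE_LINE, percpu_init(), and percpu_next()
are hypothetical names, the caller passes its CPU index explicitly, and
the mixer is again a non-cryptographic placeholder. In a kernel the
caller would also have to stay on its CPU while touching its slot; that
part is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

#define NSLOTS     4    /* assumed number of CPUs for the sketch */
#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* One state per CPU, padded so slots never share a cache line. */
struct percpu_gen {
    _Alignas(CACHE_LINE) uint64_t state;
};

static struct percpu_gen gens[NSLOTS];

/* splitmix64 step: NON-cryptographic stand-in for a real generator. */
static uint64_t mix64(uint64_t *s)
{
    uint64_t z = (*s += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Give every slot a distinct starting state derived from one seed. */
static void percpu_init(uint64_t seed)
{
    for (unsigned i = 0; i < NSLOTS; i++) {
        uint64_t s = seed ^ (uint64_t)i;
        gens[i].state = mix64(&s);
    }
}

/* No lock: each slot is meant to be touched only by its own CPU. */
static uint64_t percpu_next(unsigned cpu)
{
    assert(cpu < NSLOTS);
    return mix64(&gens[cpu].state);
}
```

Because no state is shared between slots, drawing from one CPU's
generator neither blocks nor perturbs another's.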

Also on the near-term horizon is a replacement for cprng_fast()
that is much stronger and faster, and that avoids contention by
using per-cpu state.  You may have noticed I had to add a mutex to
the underlying arc4random() implementation -- this is a temporary
measure.  It is not good for performance but is necessary for correctness;
before, nothing at all protected the arc4 state from simultaneous
update, so the stream generator actually implemented was not arc4 but
something potentially unanalyzed and broken.  Since there is a much better
replacement coming we'll live with correct but slightly slower for now
(I benchmarked it and it is not too bad, even contended).
The new cprng_fast()/"arc4random" implementation is not mine so I'll
let its author talk about it more if he wishes.
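
To illustrate why the lock matters, here is a minimal userspace sketch
of plain RC4 with its state guarded by a mutex. It is not the kernel's
arc4random() code: struct rc4_ctx, rc4_init(), and rc4_bytes() are
hypothetical names, and keying and reseeding policy are left out. The
point is only that i, j, and the swaps in s[] form one read-modify-write
sequence; two unsynchronized callers interleaving inside it would
produce a stream that is no longer RC4.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct rc4_ctx {
    pthread_mutex_t lock;   /* serializes every update of s, i, j */
    uint8_t s[256];
    uint8_t i, j;
};

/* Standard RC4 key schedule. */
static void rc4_init(struct rc4_ctx *c, const uint8_t *key, size_t keylen)
{
    uint8_t j = 0;

    assert(keylen > 0);
    pthread_mutex_init(&c->lock, NULL);
    for (int n = 0; n < 256; n++)
        c->s[n] = (uint8_t)n;
    for (int n = 0; n < 256; n++) {
        j = (uint8_t)(j + c->s[n] + key[n % keylen]);
        uint8_t t = c->s[n];
        c->s[n] = c->s[j];
        c->s[j] = t;
    }
    c->i = 0;
    c->j = 0;
}

/* Produce len keystream bytes; the whole loop runs under the lock. */
static void rc4_bytes(struct rc4_ctx *c, uint8_t *out, size_t len)
{
    pthread_mutex_lock(&c->lock);
    for (size_t n = 0; n < len; n++) {
        c->i = (uint8_t)(c->i + 1);
        c->j = (uint8_t)(c->j + c->s[c->i]);
        uint8_t t = c->s[c->i];
        c->s[c->i] = c->s[c->j];
        c->s[c->j] = t;
        out[n] = c->s[(uint8_t)(c->s[c->i] + c->s[c->j])];
    }
    pthread_mutex_unlock(&c->lock);
}
```

A single lock like this is correct but makes every caller wait on every
other, which is the cost that per-cpu state is meant to remove.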

Thor

