Subject: Re: insufficient entropy for rnd
To: Rumi Szabolcs <rumi_ml@rtfm.hu>
From: Daniel Carosone <dan@geek.com.au>
List: tech-crypto
Date: 08/13/2003 10:39:54
On Tue, Aug 12, 2003 at 02:46:34PM +0200, Rumi Szabolcs wrote:
> Here are some outputs from my server:
> 
> # rndctl -l
> Source                 Bits Type      Flags
> sd1                12200023 disk estimate, collect
> sd0                14535605 disk estimate, collect
> pckbd0                 2421 tty  estimate, collect
> fxp0                      0 net  
> wd0                 6184645 disk estimate, collect
> 
> Correct me if I'm wrong but I'd think fxp0 is disabled
> by default because it could theoretically be flooded from
> the network with a pattern that could make the randomness
> somewhat more predictable (not that I'd recognize that
> as a serious security risk...)

It is, and it's not.

It is the reason device-type net is disabled by default, and it's
not a serious risk. Anyone who can predict the arrival time of a
network packet interrupt (and subsequent processing) within the
precision of a CPU cycle counter has enough control over your
machine that randomness is irrelevant.

> # rndctl -s
>          32922694 bits mixed into pool
>                 0 bits currently stored in pool (max 4096)
>          14031307 bits of entropy discarded due to full pool
>          18891387 hard-random bits generated
>         171645157 pseudo-random bits generated
> 
> To me the above numbers say that lots of entropy bits get
> wasted because of the pool being full and the 4096 bits
> in the pool can be drained rather easily by any process
> which eats bigger bursts of random numbers.

That's exactly what these numbers mean - but the numbers come from
the estimator, and don't really reflect anything concrete at all.
It works like this:

  Every sample added has a timestamp (cycle counter on capable
  ports, microtime otherwise), and usually some device-specific
  data (such as maybe the disk sector being accessed).

  Each sample is assumed to add at most one bit of "entropy". In
  practice, there are probably many more "unknown bits" in any
  sample, but how would you measure such a thing?

  Each byte read from the pool reduces the estimate by 8 bits.  In
  practice, you probably get much less information about the internal
  state of the pool than this, but again, this is a safe absolute
  upper bound.
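As a very rough sketch (this is not the actual rnd(4) code; POOLBITS
and all the names here are made up for illustration), the book-keeping
amounts to something like this:

#include <stdint.h>

#define POOLBITS 4096                   /* the "max 4096" shown by rndctl -s */

struct est {
    uint32_t bits;                      /* current entropy estimate */
    uint64_t added;                     /* samples mixed in */
    uint64_t discarded;                 /* "discarded due to full pool" */
};

/* A sample arrives (timestamp plus device data): credit at most 1 bit. */
void
est_sample(struct est *e)
{
    e->added++;
    if (e->bits < POOLBITS)
        e->bits++;
    else
        e->discarded++;
}

/* nbytes handed out from the pool: debit 8 bits per byte, floor at 0. */
void
est_extract(struct est *e, uint32_t nbytes)
{
    uint32_t debit = nbytes * 8;

    e->bits = (debit > e->bits) ? 0 : e->bits - debit;
}

The real accounting is more involved, but these credit and debit rules
are what produce the numbers above.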

There's a line of (non-mathematical) argument that says it would
be very hard or impossible for the "entropy" of the pool to ever
decrease. It may be so, but I wouldn't rely on it.

  All data is XOR'd into the pool (in ways that depend on the
  current pool contents), so no known bits ever "overwrite" unknown
  bits, but unknown bits can change known ones to unknown ones.

  Reads from the pool don't give you pool contents, they come from
  an XOR-folded SHA-1 hash of the entire pool - so every bit in
  the pool has influence. We assume SHA-1 is secure.

  After a read, the unfolded data is stirred back into the pool;
  because of the folding, this contains at least as many unknown
  bits as could have been disclosed in the folded data.
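To make that concrete, here's a toy version of the read path (again,
not the actual kernel code: it uses OpenSSL's SHA1() for the hash, and
the stir step is reduced to a plain XOR, without the dependence on
current pool contents mentioned above):

#include <stddef.h>
#include <stdint.h>
#include <openssl/sha.h>

#define POOLBYTES 512                   /* 4096 bits */

static uint8_t pool[POOLBYTES];

/* Mix data into the pool.  XOR only, so known bits never overwrite
 * unknown ones. */
void
pool_stir(const uint8_t *buf, size_t len)
{
    static size_t pos;
    size_t i;

    for (i = 0; i < len; i++) {
        pool[pos] ^= buf[i];
        pos = (pos + 1) % POOLBYTES;
    }
}

/* Extract up to half a digest: hash the whole pool, fold the digest in
 * half with XOR, hand out the folded half, and stir the unfolded
 * digest back in. */
size_t
pool_extract(uint8_t *out, size_t want)
{
    uint8_t digest[SHA_DIGEST_LENGTH];
    size_t i, n;

    n = (want < SHA_DIGEST_LENGTH / 2) ? want : SHA_DIGEST_LENGTH / 2;
    SHA1(pool, sizeof(pool), digest);
    for (i = 0; i < n; i++)
        out[i] = digest[i] ^ digest[i + SHA_DIGEST_LENGTH / 2];
    pool_stir(digest, sizeof(digest));  /* the unfolded data goes back in */
    return n;
}

Every output bit depends on every pool bit through the hash, and the
caller only ever sees the folded half of the digest.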

The only notable known risk is depending on the random data very
early in the boot process, before there has been a chance for much
data to be stirred in.

One enhancement under consideration is the use of a good PRNG to
generate urandom (or frandom) output, reseeded from the pool every
so often. This would be faster, and would keep heavy consumers of
workload random data from draining the estimator.
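The shape of that would be something like the sketch below (purely
illustrative: the construction, the names and the reseed interval are
all invented here, not the actual proposal), reusing pool_extract()
from the previous sketch:

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <openssl/sha.h>

#define RESEED_BYTES 65536              /* arbitrary reseed interval */

size_t pool_extract(uint8_t *, size_t); /* from the previous sketch */

struct hprng {
    uint8_t  key[SHA_DIGEST_LENGTH];    /* secret state, seeded from the pool */
    uint64_t ctr;                       /* block counter */
    size_t   output;                    /* bytes produced since last reseed */
};

static void
hprng_reseed(struct hprng *p)
{
    uint8_t seed[SHA_DIGEST_LENGTH / 2];

    pool_extract(seed, sizeof(seed));   /* costs the estimator only 80 bits */
    SHA1(seed, sizeof(seed), p->key);
    p->ctr = 0;
    p->output = 0;
}

/* Generate len pseudo-random bytes; only the occasional reseed touches
 * the pool and its estimator. */
void
hprng_bytes(struct hprng *p, uint8_t *buf, size_t len)
{
    uint8_t block[SHA_DIGEST_LENGTH + sizeof(p->ctr)];
    uint8_t digest[SHA_DIGEST_LENGTH];
    size_t n;

    while (len > 0) {
        if (p->output == 0 || p->output >= RESEED_BYTES)
            hprng_reseed(p);            /* first use, or interval reached */
        memcpy(block, p->key, SHA_DIGEST_LENGTH);
        memcpy(block + SHA_DIGEST_LENGTH, &p->ctr, sizeof(p->ctr));
        SHA1(block, sizeof(block), digest);
        n = (len < sizeof(digest)) ? len : sizeof(digest);
        memcpy(buf, digest, n);
        buf += n;
        len -= n;
        p->output += n;
        p->ctr++;
    }
}

A consumer pulling megabytes would then cost the pool a handful of
reseeds rather than eight bits of estimate per byte.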

> > options         RND_POOLWORDS=512
> 
> I made a quick "grep RND_POOLWORDS /usr/src/sys/arch/i386/conf/*"
> and there was nothing (1.6 release syssrc though). The only manpage
> that has got some indication of this is rnd(4):
> 
> I'd say this is far too implicit, so for a normal user like me
> this option is not in the well documented part. Thank you for
> mentioning it.

It's not really supposed to be changed; I think the original
author was nervous about unintended cryptographic consequences of
a larger pool.  I don't know what those might be, but that's entirely
the point (I've not seen any particular justification for the present
value either, for that matter).

I have been considering raising the default, but it does make random
reads slower: every read hashes the entire pool, so a larger pool means
proportionally more data through SHA-1 per read.

I would welcome expert analysis and input.

> > Having a larger pool probably helps fudge the estimator more than
> > it really helps the "quality" of the randomness produced.
> 
> Why that? (sorry I'm no mathematician ;-)

Simply because it allows the estimator to reach a higher value, so
that single large reads don't "drain" as much of it.  I hope the
"quality" (whatever exactly that means) is more than adequate even
with the smaller pool.

--
Dan.