Subject: Re: Fw: Linux RNG paper
To: Steven M. Bellovin <smb@cs.columbia.edu>
From: Travis H. <solinym@gmail.com>
List: tech-security
Date: 03/23/2006 02:06:21
On 3/21/06, Steven M. Bellovin <smb@cs.columbia.edu> wrote:
> I saw the following message on a cryptography mailing list.  It would
> be good if someone did that analysis on the *BSD rng.

I am very eager to participate in refactoring/restructuring the RNG in
free Unices.  Although my professional experience is limited, I make
up for it in enthusiasm.  My preferred OS is NetBSD, or was, until I
experienced booting problems with BusTek adapters on that machine (the
BT958D and BT948 both have the latest firmware/BIOS yet hang during
boot before returning control to the mainboard BIOS).  If I could get
them to boot, I'd install a recent distro, perhaps -current, and
develop there.  So any help would be greatly appreciated (email me
privately).

If I can find time, I'll do an analogous examination of the NetBSD
RNG.  Note that FreeBSD uses Yarrow, and does not have a "real"
(blocking) random device, although it reseeds Yarrow with new entropy
as it comes in.

I'd like to see:

1) more flexibility in the design -- unfettered access to entropy pool
inputs, separate harvesting and estimation code, etc.
2) some statistical tests for common failure modes of the entropy
sources (this may require periodic polling from userland, and would
require the source data as it appears prior to any processing; a
minimal chi-square check along these lines is sketched after this
list).  Might benefit from some X-based tools for visualizing the
distributions, because a picture is worth at least ten statistics (I
don't really like graphics, but this is one case where they convey
substantially more information than ASCII text).
3) potentially use extractors to "extract" the randomness from weakly
random sources.  For more information on extractors, see Zuckerman's
work or google 'e2random' and find Seth Hardy's paper; basically, an
extractor combines "uniformly random" bits with "weakly random" bits
to produce bits that are epsilon-close to uniform.
4) Having /dev/urandom drain the /dev/random pool of bits is uncool.
5) I'd like to see various statistics available via sysctl or some
other mechanism that can pass them to userland.  For example, at what
rate are the entropy sources producing bits?  Etc.
6) I'd like writes to /dev/random to block if the pool is full, so I
can write simple daemons which update the entropy pools, but block
when they're full.
7) I'd like a userland tool for updating the pool and entropy counts,
because I use other languages than C on occasion.
8) I'd like to see the documentation suggest using "head -c X
/dev/{u,}random" to get a certain number of random bytes, because the
partial reads returned by the random devices make it difficult to use
dd to achieve the same thing, and I find that counterintuitive.
Actually, with ibs=1, obs=X, and count=X I think you can do it, but I
wager that's not terribly obvious, as well as being inefficient (a
read loop that copes with short reads is sketched after this list).
9) Faster, faster, until the thrill of speed outweighs the fear of
death!  Seriously, SHA-1 over a large pool is a superslow way to
generate 160 bits of output; I get 1MB/s on my relatively old dual
P3-550 machine.  This is a very time-consuming way to fill (or worse,
wipe) a 200GB partition, for example.
10) Optional whitening, perhaps by XOR with a conventional PRNG
(perhaps a long-period one like the Mersenne Twister).  This may mask
simple bias or dependence among bits, but should remain optional and
probably off by default if we're passing the output through something
like a hash already.  OTOH, for the amount of state it takes to keep
it running, a simple PRNG whitener is quite efficient (see the sketch
after this list).
11) Opportunistic external reseeders.  There are a couple of sites on
the web that provide free random bits, and it'd be cool to securely
update the pool with them (_without_ increasing entropy counts, of
course).  If an attacker can see your WAN traffic, then this buys
nothing, but if he can't, it buys recovery from state compromise.  So
there's little to lose from this, assuming that the attacker cannot
control the output if he controls partial inputs (and with SHA-1 or
the like, he cannot) and that these inputs are mixed in such a way
that he cannot control the pool contents without knowing the old
contents.  Of course if he knows the old contents and your network
traffic, he knows the new state, but he would have known that without
external reseeders anyway, so you're no worse off.
12) Bruce Schneier's description of fortuna in his "Practical
Cryptography" book has a number of improvements over yarrow, and I'd
like to borrow some of them if possible.
12a) His idea of updating the pool hashes at interrupt time may be
too computationally expensive to do in the interrupt handler itself,
but perhaps some buffering can be performed and the hash context
updates delayed or deferred until a more convenient time; otherwise
the accumulator is brilliant.  I think what I'd really like is a hash
with a fast compression/update function but a strong finalization
function, since there's only a need to finalize when someone is
reading, so 99% of the time it needn't be called.
12b) He also suggests that the input to the pools have an unambiguous
interpretation (in his words, that it is "parsable").  He suggests
numbering each source 0..255, so that each input event has the form
<srcid, length(data), data>.  Without an unambiguous interpretation,
two different series of events could produce the same input to the
pools, which is clearly a Bad Thing.  (A sketch of one such encoding
follows this list.)
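
To make item 2 concrete, here is a minimal userland sketch: feed it
raw, pre-processing bytes from one entropy source on stdin and it
computes a chi-square statistic against the uniform byte distribution,
which is enough to catch a stuck or heavily biased source.  The 350
cutoff is a rough rule of thumb for 255 degrees of freedom, and
nothing here corresponds to an existing kernel interface.

#include <stdio.h>

/* Chi-square statistic of a byte sample against the uniform distribution. */
static double
chi_square_bytes(const unsigned char *buf, size_t len)
{
    unsigned long count[256] = { 0 };
    double expected = (double)len / 256.0;
    double chi = 0.0;
    size_t i;

    for (i = 0; i < len; i++)
        count[buf[i]]++;
    for (i = 0; i < 256; i++) {
        double d = (double)count[i] - expected;
        chi += d * d / expected;
    }
    return chi;
}

int
main(void)
{
    unsigned char buf[65536];
    size_t n = fread(buf, 1, sizeof(buf), stdin);
    double chi;

    if (n < 4096) {
        fprintf(stderr, "not enough sample data\n");
        return 1;
    }
    chi = chi_square_bytes(buf, n);
    /* 255 degrees of freedom: the mean is 255, so ~350 is well into the tail. */
    printf("chi-square = %.1f%s\n", chi,
        chi > 350.0 ? " (suspicious: source looks biased or stuck)" : "");
    return 0;
}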
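
For item 8, this is the read loop I think every application ends up
writing anyway because of the short reads; it is plain POSIX, and the
function name is just illustrative.

#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Fill buf with exactly len bytes from the named device; 0 on success. */
static int
read_random(const char *dev, void *buf, size_t len)
{
    size_t off = 0;
    int fd = open(dev, O_RDONLY);

    if (fd == -1)
        return -1;
    while (off < len) {
        ssize_t n = read(fd, (char *)buf + off, len - off);

        if (n > 0) {
            off += (size_t)n;
        } else if (n == -1 && errno == EINTR) {
            continue;           /* interrupted; just retry */
        } else {
            close(fd);
            return -1;
        }
    }
    close(fd);
    return 0;
}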
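
For item 10, a sketch of what I mean by optional whitening.  A tiny
xorshift generator stands in for a long-period PRNG like the Mersenne
Twister; XORing a known PRNG stream into the output can't remove any
entropy, it only masks simple statistical bias, which is why I'd keep
it optional.

#include <stddef.h>
#include <stdint.h>

/* Marsaglia's xorshift64; 'state' must be nonzero. */
static uint64_t
xorshift64(uint64_t *state)
{
    uint64_t x = *state;

    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* XOR each output byte with a PRNG byte, eight bytes per PRNG step. */
static void
whiten(unsigned char *out, size_t len, uint64_t *state)
{
    uint64_t w = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        if (i % 8 == 0)
            w = xorshift64(state);
        out[i] ^= (unsigned char)(w >> ((i % 8) * 8));
    }
}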
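
And for 12b, one possible framing of an input event so that two
distinct event sequences can never collapse to the same byte stream
before hitting the pool.  The one-byte length field (and hence the
255-byte cap) is just my choice for the sketch.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Encode one <srcid, length, data> event into 'out', which must hold at
 * least len + 2 bytes.  Returns bytes written, or 0 if the event is too
 * long to frame.
 */
static size_t
encode_event(uint8_t srcid, const void *data, size_t len,
    uint8_t *out, size_t outlen)
{
    if (len > 255 || outlen < len + 2)
        return 0;
    out[0] = srcid;
    out[1] = (uint8_t)len;      /* one-byte length keeps parsing trivial */
    memcpy(out + 2, data, len);
    return len + 2;
}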

One thing I've wondered about, given the various RNG implementations,
is how application software would know to ask for /dev/urandom versus
/dev/random (remember, they're the same on FreeBSD) and all that.
Would it be worth writing a small library that lets applications
negotiate with the kernel over what it provides, and agree on some
mechanism for getting bits?
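
A trivial sketch of what the bottom layer of such a library might look
like; the function name and the preference order are made up, and a
real version would presumably also probe whatever sysctl-style
interfaces the kernel exposes.

#include <fcntl.h>

/* Return a descriptor for the "best" available randomness device, or -1. */
static int
open_random_source(void)
{
    static const char *candidates[] = {
        "/dev/urandom",         /* same device as /dev/random on FreeBSD */
        "/dev/random",
        NULL
    };
    int i, fd;

    for (i = 0; candidates[i] != NULL; i++) {
        fd = open(candidates[i], O_RDONLY);
        if (fd != -1)
            return fd;
    }
    return -1;
}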

While I'm at it, I'm considering implementing something that provides
cryptographically strong random permutations, for cases in the kernel
where we don't want to repeat under any circumstances, yet we also
wish to be a bit less predictable than a counter.  Of course the
2^n-th output is completely deterministic, since it'd be the only
symbol not seen yet, and we aren't repeating symbols unless absolutely
necessary.
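
For a small domain, a Fisher-Yates shuffle keyed from /dev/urandom
already gives the non-repeating, hard-to-predict sequence I'm
describing; for something like a 2^32 ID space you'd want to encrypt a
counter with a small keyed cipher instead.  This table-based version is
only the simplest correct illustration, not a proposal for the kernel.

#include <stdint.h>
#include <stdio.h>

/* Fill perm[0..n-1] with a uniformly random permutation of 0..n-1. */
static int
random_permutation(uint32_t *perm, uint32_t n)
{
    FILE *fp;
    uint32_t i, j, r, tmp;

    if (n == 0)
        return 0;
    fp = fopen("/dev/urandom", "rb");
    if (fp == NULL)
        return -1;
    for (i = 0; i < n; i++)
        perm[i] = i;
    for (i = n - 1; i > 0; i--) {
        /* Rejection sampling keeps the swap index unbiased. */
        do {
            if (fread(&r, sizeof(r), 1, fp) != 1) {
                fclose(fp);
                return -1;
            }
        } while (r >= UINT32_MAX - (UINT32_MAX % (i + 1)));
        j = r % (i + 1);
        tmp = perm[i];
        perm[i] = perm[j];
        perm[j] = tmp;
    }
    fclose(fp);
    return 0;
}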

I will, of course, run any design by both the cryptography mailing
list for security evaluation and later by the NetBSD team for design
and implementation evaluation (feel free to make security-related
critiques then too, I'm not picky).

Then again, I have a tendency to overestimate what I'll have time to
do, when it's fun stuff.  We shall see.
--
Security Guru for Hire http://www.lightconsulting.com/~travis/ -><-
GPG fingerprint: 9D3F 395A DAC5 5CCC 9066  151D 0A6B 4098 0C55 1484