tech-kern archive
Initial entropy with no HWRNG
On Tue, May 12, 2020 at 10:00:20AM +0300, Andreas Gustafsson wrote:
>
> Adding more sources could mean
> reintroducing some timing based sources after careful analysis, but
> also things like having the installer install an initial random seed
> on the target machine (and if the installer itself lacks entropy,
> asking the poor user to pound on the keyboard until it does).
As Peter Gutmann has noted several times in the past, for most use cases
you don't have to have input you know is tied to physical random processes;
you just have to have input you know is uneconomical for your adversary
to predict or recover.
This is why I fed the first N packets off the network into the RNG; why I
added sampling of kernel printf output (and more importantly, its *timing*);
etc. But the problem with this kind of stuff is that there really are use
cases where an adversary _can_ know these things, so it is very hard to
support an argument that _in the general case_ they should be used to
satisfy some criterion that N unpredictable, unrecoverable (note I do
*not* say "random" here!) bits have been fed into the machinery. The data
I fed in from the VM system are not quite the same, but they are in a
somewhat similar situation.
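To make the "mix it in but don't count it" distinction concrete, here is a
rough sketch (not code from the tree; the source name is made up) of stirring
the arrival time of an early packet into the pool with an entropy credit of
zero, using the rnd(9) interface:

/*
 * Sketch only: mix an early packet's arrival time into the pool
 * without crediting any entropy, since in the general case an
 * adversary may be able to observe or influence it.  Assumes a
 * krndsource_t attached elsewhere with rnd_attach_source().
 */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/rndsource.h>

extern krndsource_t net_rndsource;	/* hypothetical, attached elsewhere */

void
net_mix_arrival_time(void)
{
	struct timespec ts;

	nanouptime(&ts);
	/* Final argument is the entropy credit: 0 bits, mixed but not counted. */
	rnd_add_data(&net_rndsource, &ts, sizeof(ts), 0);
}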
That said, I also added a number of sources which we *do* know are tied to
real random physical processes: the environmental sensors such as temperature,
fan speed, and voltage, where beyond the sampling noise you've got thermal
processes on both micro and macro scales, turbulence, etc; and the "skew"
source type which, in theory, represents skew between multiple oscillators
in the system, one of the hybrid analog-digital RNG designs with a long
pedigree (though as implemented in the example "callout" source, less so).
Finally, there's a source type I *didn't* take advantage of because I was
advised that doing so would cause substantial power consumption: amplifier noise
available by setting muted audio inputs to max gain (we can also use the
sample arrival rate here as a skew source).
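As an example of what feeding one of these environmental sources looks like,
here is a hedged sketch (again, not actual driver code; fan_read_rpm() is a
made-up accessor, and the 1-bit credit per sample is exactly the judgment
call discussed below) using the rnd(9) interface:

/*
 * Sketch: attach a fan-speed sensor as an RND_TYPE_ENV source and
 * feed each reading in with a very conservative credit of 1 bit.
 */
#include <sys/types.h>
#include <sys/rndsource.h>

static krndsource_t fan_rndsource;

extern uint32_t fan_read_rpm(void);	/* hypothetical hardware accessor */

void
fan_rnd_attach(void)
{
	rnd_attach_source(&fan_rndsource, "fan0", RND_TYPE_ENV,
	    RND_FLAG_DEFAULT);
}

void
fan_rnd_sample(void)
{
	uint32_t rpm = fan_read_rpm();

	/* Credit at most 1 bit; the whole value is still mixed in. */
	rnd_add_data(&fan_rndsource, &rpm, sizeof(rpm), 1);
}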
I believe we can and should legitimately record entropy when we add input
of these kinds. But there are three problems with all this.
*Problems are marked out with numbers, thoughts towards solutions or
mitigations with letters.*
1) It's hard to understand how many bits of entropy to assign to a sample from
one of these sources. How much of the change in fan speed is caused by
system load (and is thus highly correlated with CPU temperature), and how
much by turbulence, which we believe is random? How much of the
signal measured from amplifier noise on a muted input is caused by the
bus clock (and clocks derived from it, etc.) and how much is genuine
thermal noise from the amplifier? And so forth.
The delta estimator _was_ good for these things, particularly for things
like fans or thermistors (where the macroscopic, non-random physical
processes _are_ expected to have continuous behavior), because it could
tell you when to very conservatively add 1 bit -- if you believe that at
least 1 bit of each 32-bit value from the input really is attributable to
entropy (there is a small sketch of the idea after point A below). I also
prototyped an lzf-based entropy estimator, but it never really seemed
worth the trouble -- it is, though, consistent with how published analyses
of physical sources often estimate minimum entropy.
A) This is a longwinded way of saying I firmly believe we should count
input from these kinds of sources towards our "full entropy" threshold
but need to think harder about how.
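Here is a minimal sketch of the sort of conservative delta-based credit I
mean (my illustration, not the estimator that was actually in the tree):

/*
 * Sketch: look at the first- and second-order differences of
 * successive 32-bit samples and credit at most 1 bit, and only when
 * neither the value nor its rate of change is simply repeating.
 */
#include <stdint.h>

struct delta_est {
	uint32_t prev;		/* previous sample */
	uint32_t prev_delta;	/* previous first-order delta */
};

/* Returns the number of entropy bits (0 or 1) to credit for `sample'. */
static unsigned
delta_estimate(struct delta_est *st, uint32_t sample)
{
	uint32_t delta = sample - st->prev;		/* 1st order */
	uint32_t delta2 = delta - st->prev_delta;	/* 2nd order */

	st->prev = sample;
	st->prev_delta = delta;

	/* A stuck or strictly periodic source earns nothing. */
	if (delta == 0 || delta2 == 0)
		return 0;
	return 1;
}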
2) Sources of the kind I'm talking about here seldom contribute _much_
entropy - with the old estimator, perhaps 1 bit per change - so if you
need to get 256 bits from them you may be waiting quite some time (the
audio-amp sources might be different, which is another reason why, despite
their issues, they are appealing).
3) Older or smaller systems don't have any of this stuff onboard, so it does
them no good: no fan speed sensors (or no drivers for them), no temp
sensors, no access to power rail voltages, certainly no audio, etc.
B) One thing we *could* do to help out such systems would be to actually run
a service to bootstrap them with entropy ourselves, from the installer,
across the network. Should a user trust such a service? I will argue
"yes". Why?
B1) Because they already got the binaries or the sources from us; we could
simply tamper with those to do the wrong thing instead.
Counterargument: it's impossible to distinguish the output of a
cryptographically-strong stream cipher keyed with something known
to us from real random data, so it's harder to _tell_ if we subverted
you.
Counter-counter-argument: When's the last time you looked? Users
who _do_ really inspect the sources and binaries they get from us
can always not use our entropy server, or run their own.
B2) Because we have already arranged to mix in a whole pile of stuff whose
entropy is hard to estimate but which would be awfully hard for us, the
OS developers, to predict or recover with respect to an arbitrary system
being installed (all the sources we used to count but now don't, plus
the ones we never counted). If you trust the core kernel RNG mixing
machinery, you should have some level of confidence this protects you
against subversion by an entropy server that initially gets the ball
rolling.
B3) Because we can easily arrange for you to mix the input we give you with
an additional secret we don't and can't know, which you may make as strong
as you like: we can prompt you at install time to enter a passphrase, and
use that to encrypt the entropy we serve you, using a strong cipher,
before using it to initially seed the RNG.
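As a sketch of B3 (not installer code; I substitute a simple hash-based mix
for the cipher step, using the sha2(3) functions, which preserves the
property that we alone cannot reconstruct the resulting seed):

/*
 * Sketch: combine bytes served by the entropy server with a
 * locally-entered passphrase, so the server operator cannot know the
 * final seed.  seed = SHA-256(passphrase || served bytes); the result
 * would then go in through the normal seeding path (e.g. the seed
 * file loaded at boot).
 */
#include <sys/types.h>
#include <stdint.h>
#include <string.h>
#include <sha2.h>

#define SEED_LEN SHA256_DIGEST_LENGTH	/* 32 bytes */

static void
mix_seed(const char *passphrase, const uint8_t *served, size_t served_len,
    uint8_t seed[SEED_LEN])
{
	SHA256_CTX ctx;

	SHA256_Init(&ctx);
	SHA256_Update(&ctx, (const uint8_t *)passphrase, strlen(passphrase));
	SHA256_Update(&ctx, served, served_len);
	SHA256_Final(seed, &ctx);
}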
So, those are the problems I see and some potential solutions: figure out how
to better estimate the entropy of the environmental sources we have available,
and count such estimates by default; consider using audio sources by default;
and run an entropy server to seed systems from the installer.
What do others think?
Thor