Re: CVS commit: src

To: nia <nia%NetBSD.org@localhost>
Subject: Re: CVS commit: src
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Date: Sat, 15 Aug 2020 18:49:11 +0000
> Date: Sat, 15 Aug 2020 10:23:02 +0000
> From: nia <nia%NetBSD.org@localhost>
> 
> Obviously, I disagree with core's decision, but let's try to be
> productive about this.
> 
> I'm happy to have getrandom in NetBSD, it's a good thing. But not with
> this behaviour.
> 
> 1) Adopting getrandom for compatibility does not make sense.
> 
>    NetBSD's behaviour for getrandom(x, y, 0) is incompatible with Linux
>    and FreeBSD _at least_ - they will unblock after the kernel receives
>    an arbitrary amount of "random-ish" data. NetBSD will block forever
>    until the sysadmin intervenes (by writing to /dev/random or attaching
>    a forensically analyzed HWRNG, or rebooting with a seed file).

- The behaviour is compatible in the sense that the getrandom calls
  that _can_ lead to blocking are the same: a getrandom call that may
  block on NetBSD may also block on Linux/FreeBSD/&c.; a getrandom
  call that is guaranteed never to block on Linux/FreeBSD/&c. is
  guaranteed never to block on NetBSD.

- The behaviour is compatible in the sense that if getrandom blocks,
  then it unblocks when the operating system has decided there is
  adequate entropy.

- The behaviour is incompatible only in the sense that NetBSD's idea
  of `adequate entropy' is stronger than FreeBSD's or Linux's, so
  blocking is _more likely_ on NetBSD than on FreeBSD or Linux.

The difference manifests in a user-visible way primarily only on
systems where users are actually in danger of not having adequate
entropy -- in other words, on systems where the signal of an alarm
might actually matter.  I would like to put effort toward addressing
that by making it easier to provide adequate entropy rather than by
papering over the alarm.
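
For concreteness, here is a minimal C sketch of how a caller could ask
whether getrandom(p,n,0) would block right now, by probing with
GRND_NONBLOCK.  The helper name would_block is made up for
illustration, and the sketch assumes a <sys/random.h> that declares
getrandom and GRND_NONBLOCK, as on NetBSD and glibc.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/random.h>

/*
 * Hypothetical helper, not part of any system API.
 *
 * Returns 1 if the system currently considers its entropy inadequate,
 * so that getrandom(p, n, 0) would block; 0 if it would not block;
 * -1 on any other error (errno is left as set by getrandom).
 */
int
would_block(void)
{
    unsigned char probe[1];
    ssize_t r;

    /* GRND_NONBLOCK fails with EAGAIN instead of sleeping. */
    r = getrandom(probe, sizeof probe, GRND_NONBLOCK);
    if (r == -1)
        return (errno == EAGAIN) ? 1 : -1;

    /* A successful one-byte request returns exactly one byte. */
    assert(r == (ssize_t)sizeof probe);
    return 0;
}
```

The probe uses the same criterion on every system; only the threshold
at which the kernel stops returning EAGAIN differs between them.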

>    NetBSD's behaviour for GRND_RANDOM is incompatible with FreeBSD,
>    which treats it the same as getrandom(x, y, 0).

Why do you say this is incompatible?  getrandom(...,GRND_RANDOM) just
makes fewer promises than getrandom(...,0) as a portable API -- it
_may_ block more often and it _may_ return short.  In practice, on
NetBSD it only blocks when getrandom(...,0) would block too.

If FreeBSD makes _more_ promises, fine, but the GRND_RANDOM flag is a
silly API that exists only for Linux source compatibility and that very
few reasonable applications use.  So I don't see why it's important to
put any attention on it or make any stronger promises about it than
portable applications can rely on -- that's why, e.g., the man page I
wrote specifically calls it out as silly, not recommended, for Linux
source compatibility only, and with no usage examples.
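
As an illustration of the weaker promises only, not as a
recommendation to use the flag: a portable caller of
getrandom(...,GRND_RANDOM) has to tolerate short returns, roughly as
in the C sketch below.  The function name read_grnd_random is
hypothetical, and the sketch assumes a <sys/random.h> that declares
getrandom and GRND_RANDOM.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/random.h>

/*
 * Hypothetical helper: fill buf with n bytes using GRND_RANDOM,
 * retrying on short returns and on EINTR.  Returns n on success,
 * -1 on error with errno set.
 */
ssize_t
read_grnd_random(void *buf, size_t n)
{
    unsigned char *p = buf;
    size_t got = 0;

    assert(buf != NULL || n == 0);

    while (got < n) {
        ssize_t r = getrandom(p + got, n - got, GRND_RANDOM);

        if (r == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (r == 0) {
            errno = EIO;        /* defensive: no progress was made */
            return -1;
        }
        got += (size_t)r;       /* may be fewer bytes than requested */
    }
    return (ssize_t)got;
}
```

A caller written this way behaves the same whether the system treats
GRND_RANDOM like flags 0 or actually returns short.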

> 2) The main problem raised with getentropy is that Solaris has a buggy
>    implementation that projects such as Python were seeking to avoid
>    (because it blocked a lot, and they preferred something that
>    wouldn't).

The main problem raised with getentropy is that between four different
operating systems (OpenBSD, Linux, FreeBSD, Solaris) there seemed to
be three different behaviours around blocking (block never, block at
boot, block often).  That's not good for a portable API, particularly
one which was originally defined never to block, period.

I'm not saying I disagree with adopting getentropy.  I'm just saying
that _as a portable API_ its semantics is murkier than getrandom's,
despite the additional complexity of flags in getrandom.

Indeed, I made an argument, based on a survey of how entropy pool
initialization and unblocking works across different operating
systems, for adopting getentropy(p,n) == getrandom(p,n,GRND_INSECURE)
as you're suggesting:

https://mail-index.netbsd.org/tech-userlevel/2020/05/09/msg012390.html

But there are reasonable counterarguments too, as gson raised:

https://mail-index.netbsd.org/tech-userlevel/2020/05/10/msg012397.html

So my somewhat elaborate argument isn't strong enough for me to want
to push for it one way or another.  Sure would be nice if every
computer just had a reliable HWRNG!  But alas.
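
To show what the choice amounts to in code, here is a minimal C sketch
of a getentropy-like wrapper over getrandom in which the blocking
policy is a single flags constant.  The names sketch_getentropy and
ENTROPY_FLAGS are hypothetical, the 256-byte limit failing with EIO
follows OpenBSD's getentropy and is an assumption here (other systems
may report a different errno), and this is not how any libc actually
implements getentropy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/random.h>

/*
 * Assumption: default to flags 0, which may block until the system
 * has decided there is adequate entropy.  Building with
 * -DENTROPY_FLAGS=GRND_INSECURE selects the never-blocking variant
 * on systems whose headers define that flag.
 */
#ifndef ENTROPY_FLAGS
#define ENTROPY_FLAGS 0
#endif

/*
 * Hypothetical wrapper with getentropy's calling convention:
 * returns 0 on success, -1 on error with errno set.
 */
int
sketch_getentropy(void *buf, size_t n)
{
    unsigned char *p = buf;

    assert(buf != NULL || n == 0);

    if (n > 256) {              /* assumed limit, as in OpenBSD */
        errno = EIO;
        return -1;
    }
    while (n > 0) {
        ssize_t r = getrandom(p, n, ENTROPY_FLAGS);

        if (r == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        p += r;
        n -= (size_t)r;
    }
    return 0;
}
```

The whole disagreement is confined to the value of ENTROPY_FLAGS; the
rest of the wrapper is the same either way.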

> 4) The original argument that we need the getrandom(x, y, 0) behaviour
>    to please Rust does not make sense, since Rust's randomness library
>    now uses never-blocking APIs on both OpenBSD and NetBSD. Same for
>    OpenSSL.

The Rust API specifically describes getrandom(p,n,0) semantics:

https://docs.rs/rand/0.7.3/rand/rngs/struct.OsRng.html

  `It is possible that when used during early boot the first call to
   OsRng will block until the system's RNG is initialised. It is also
   possible (though highly unlikely) for OsRng to fail on some
   platforms, most likely due to system mis-configuration.

  `After the first successful call, it is highly unlikely that
   failures or significant delays will occur (although performance
   should be expected to be much slower than a user-space PRNG).'

Obviously we can patch OpenSSL in base however we like, but at least
one OpenSSL developer reported being uncomfortable with having
getrandom(p,n,GRND_INSECURE) semantics for getentropy if used
upstream:

https://mail-index.netbsd.org/tech-userlevel/2020/05/02/msg012334.html

  `If you make getentropy the insecure version, I will need to modify
   OpenSSL to switch to getrandom() on NetBSD.'

To be clear, I'm not saying that getrandom(p,n,GRND_INSECURE)
semantics is necessarily _wrong_ for these libraries, absent further
context -- just that getrandom(p,n,0) semantics more clearly meets the
security expectations of cryptography engineers.

> So, I suggest:
> 
> 1) Make the RANDOM case for getrandom an alias for the default behaviour,
>    as FreeBSD also does. It's just a nail for unsuspecting software to
>    step on, and we shouldn't be copying bad ideas from Linux into our
>    own syscalls.

Can you identify an existing application that actually behaves badly
with GRND_RANDOM as currently implemented, but reliably behaves well
on other systems?  I expect most applications just don't bother with
GRND_RANDOM but I haven't surveyed.

> 2) Add a sysctl knob to disable getrandom's blocking behaviour, for
>    systems without a forensically analyzed HWRNG. This provides an
>    obvious way to ensure the system doesn't block after entropy is
>    consolidated after we enter userland. Writing to /dev/random is
>    non-obvious.

What's the benefit of writing

sysctl -w kern.entropy.dontblock=1

vs

dd if=/dev/urandom of=/dev/random bs=32 count=1

in an rc script?  What would make one more obvious than the other, if
they can be written in the same place in documentation?

I think we already have too many knobs and bells and whistles here and
I would like to limit new ones to have really good justification.

>                 On these systems losing the on-disk seed file is a
>    critical error case that will cause blocking, currently.

Yes.  Is the seed file getting lost in practice?  If yes, that's a
real security problem -- blocking is a symptom of the real security
problem which is lacking underlying entropy, and I would like to focus
effort on fixing the problem rather than just the symptom.  So if
you've seen it get lost -- do you know how it might have been lost?

(I realize on netbsd<=8 any crash or unclean shutdown would lose it,
but we fixed that before netbsd-9 was released.)