tech-pkg archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)



Further to this, here's the ktrace output from just before and after the
SIGINT:

  9350   9350 python   0.122852724 RET   mmap 127488830459904/0x73f34e73d000
  9350   9350 python   0.122890259 CALL  getrandom(0x73f34e742610,0x20,0)
  9350   9350 python   6.141515919 RET   getrandom -1 errno 4 Interrupted system call
  9350   9350 python   6.141522183 PSIG  SIGINT caught handler=0x73f34f182b25 mask=(): code=SI_NOINFO

So, how can getramdom(2) hang for so long despite the rest of the system
running on and doing things for over a day on a system that's been up
and running and busy building packages for nearly a week?

Perhaps the answer is that nothing seems to be contributing anything to
the entropy pool.  No matter what device I exercise, none of the numbers
in the following changes:


# rndctl -l
Source                 Bits Type      Flags
/dev/random               0 ???  estimate, collect, v
xbd6                      0 disk estimate, collect, v, t, dt
xbd5                      0 disk estimate, collect, v, t, dt
xbd4                      0 disk estimate, collect, v, t, dt
xbd3                      0 disk estimate, collect, v, t, dt
xennet0                   0 net  estimate, v, t, dt
xbd2                      0 disk estimate, collect, v, t, dt
xbd1                      0 disk estimate, collect, v, t, dt
xbd0                      0 disk estimate, collect, v, t, dt
cpu15                     0 vm   estimate, collect, v, t, dv
cpu14                     0 vm   estimate, collect, v, t, dv
cpu13                     0 vm   estimate, collect, v, t, dv
cpu12                     0 vm   estimate, collect, v, t, dv
cpu11                     0 vm   estimate, collect, v, t, dv
cpu10                     0 vm   estimate, collect, v, t, dv
cpu9                      0 vm   estimate, collect, v, t, dv
cpu8                      0 vm   estimate, collect, v, t, dv
cpu7                      0 vm   estimate, collect, v, t, dv
cpu6                      0 vm   estimate, collect, v, t, dv
cpu5                      0 vm   estimate, collect, v, t, dv
cpu4                      0 vm   estimate, collect, v, t, dv
cpu3                      0 vm   estimate, collect, v, t, dv
cpu2                      0 vm   estimate, collect, v, t, dv
cpu1                      0 vm   estimate, collect, v, t, dv
cpu0                      0 vm   estimate, collect, v, t, dv
hardclock                 0 skew estimate, collect, t
system-power              0 power estimate, collect, v, t, dt
autoconf                  0 ???  estimate, collect, t
seed                    128 ???  estimate, collect, v
# rndctl -s
                0 bits mixed into pool
              128 bits currently stored in pool (max 256)
                0 bits of entropy discarded due to full pool
                0 hard-random bits generated
                0 pseudo-random bits generated
# sysctl kern.entropy
kern.entropy.collection = 1
kern.entropy.depletion = 0
kern.entropy.consolidate = -11774
kern.entropy.gather = -11774
kern.entropy.needed = 128
kern.entropy.pending = 0
kern.entropy.epoch = 8

Even if I set the network devices to collect (rndctl -c -e -t net),
nothing changes.

Again, for the record:

# uname -a
NetBSD b2 9.99.81 NetBSD 9.99.81 (XEN3_DOMU) #1: Tue Mar 23 14:26:58 PDT 2021  woods@b2:/build/woods/b2/current-amd64-amd64-obj/work/woods/m-NetBSD-current/sys/arch/amd64/compile/XEN3_DOMU amd64


Also, is python the only thing that calls getrandom(2) with the flags
parameter set to the recommended value of zero?

Also, is the behaviour of getrandom(2) supposed to be the same as
/dev/random, i.e. w.r.t. the note in the original announcement of the
entropy overhaul, i.e. that it should never block once the system as
achieved full entropy?

- /dev/random no longer blocks repeatedly: it will block after boot
  until the system has full entropy, and the never again.  This means
  applications that issue repeated reads from /dev/random will no longer
  repeatedly hang.

If so then can I assume no device is actually contributing entropy and
that the system never achieved full entropy?

Should Xen domUs be running the commands recommended in the entropy
overhaul announcement to fool the system into thinking it has full
entropy?

	dd if=/dev/urandom of=/dev/random bs=32 count=1
	sysctl -w kern.entropy.consolidate=1

After I do this then I can read from /dev/random without blocking.

Can Xen domUs get entropy from their dom0?  Perhaps via xenstore?

Finally I just noticed that syslogd isn't collecting "entropy" messages
from the kernel.  My /var/log/kern does _not_ contain the following
(from dmesg):

[ 517813.480815] entropy: pid 19875 (python) blocking due to lack of entropy
[ 520426.415882] entropy: pid 19875 (python) blocking due to lack of entropy
[ 520468.885538] entropy: pid 19875 (python) blocking due to lack of entropy
[ 543351.589752] entropy: pid 19875 (python) blocking due to lack of entropy
[ 543351.589752] entropy: pid 19875 (python) blocking due to lack of entropy
[ 563844.834413] entropy: pid 7903 (python) blocking due to lack of entropy
[ 566365.511377] entropy: pid 9001 (python) blocking due to lack of entropy
[ 577473.897830] entropy: pid 9350 (python) blocking due to lack of entropy
[ 579179.381600] entropy: pid 25728 (od) blocking due to lack of entropy
[ 579186.994440] entropy: pid 11107 (cat) blocking due to lack of entropy
[ 579202.264290] entropy: pid 7248 (cat) blocking due to lack of entropy
[ 579669.831978] entropy: ready


--
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>


At Tue, 30 Mar 2021 10:06:19 -0700, "Greg A. Woods" <woods%planix.ca@localhost> wrote:
Subject: python3.7 rebuild stuck in kernel in "entropy" during an "import" statement
>
> So I've been running a pkg-rolling_replace and one of the packages being
> rebuilt is python3.7, and it has got stuck, apparently on an "entropy"
> wait in the kernel, and it's been in this state for over 24hrs as you
> can see.
>
> The only things the process has open appear to be its stdio descriptors,
> two of which are are open on the log file I was directing all output to.
>
> This is on a Xen domU of a machine running:
>
> $ uname -a
> NetBSD xentastic 9.99.81 NetBSD 9.99.81 (XEN3_DOM0) #1: Tue Mar 23 14:39:55 PDT 2021  woods@xentastic:/build/woods/xentastic/current-amd64-amd64-obj/build/src/sys/arch/amd64/compile/XEN3_DOM0 amd64
>
>
> 09:51 [504] $ ps -lwwp 19875
> UID   PID  PPID CPU PRI NI   VSZ   RSS WCHAN   STAT TTY      TIME COMMAND
>   0 19875 11551   0  85  0 55412 11324 entropy I    pts/0 0:00.27 ./python -E -Wi /var/package-obj/root/lang/python37/work/.destdir/usr/pkg/lib/python3.7/compileall.py -d /usr/pkg/lib/python3.7 -f -x bad_coding|badsyntax|site-packages|lib2to3/tests/data /var/package-obj/root/lang/python37/work/.destdir/usr/pkg/lib/python3.7
> 09:51 [505] $ ps -uwwp 19875
> USER   PID %CPU %MEM   VSZ   RSS TTY   STAT STARTED    TIME COMMAND
> root 19875  0.0  0.1 55412 11324 pts/0 I     9:09PM 0:00.27 ./python -E -Wi /var/package-obj/root/lang/python37/work/.destdir/usr/pkg/lib/python3.7/compileall.py -d /usr/pkg/lib/python3.7 -f -x bad_coding|badsyntax|site-packages|lib2to3/tests/data /var/package-obj/root/lang/python37/work/.destdir/usr/pkg/lib/python3.7
> 09:51 [506] $ fstat -p 19875
> USER     CMD          PID   FD  MOUNT         INUM MODE         SZ|DV R/W
> root     python     19875   wd  /build    10645634 drwxr-xr-x    1024 r
> root     python     19875    0  /dev/pts         3 crw-------   pts/0 rw
> root     python     19875    1  /build     3721223 -rw-r--r--  28287492 w
> root     python     19875    2  /build     3721223 -rw-r--r--  28287492 w
> 09:51 [507] $ find /build -inum 3721223
> /build/packages/root/pkg_roll.out
> 09:51 [508] $
>
>
> It was killable -- I sent SIGINT from the tty and it died as expected.
>
>
> Running "make replace" gets it stuck in the same place again, an the
> SIGINT shows the following stack trace:
>
> PYTHONPATH=/var/package-obj/root/lang/python37/work/.destdir/usr/pkg/lib/python3.7  LD_LIBRARY_PATH=/build/package-obj/root/lang/python37/work/Python-3.7.1  ./python -E -Wi /var/package-obj/root/lang/python37/work/.destdir/usr/pkg/lib/python3.7/compileall.py  -d /usr/pkg/lib/python3.7 -f  -x 'bad_coding|badsyntax|site-packages|lib2to3/tests/data'  /var/package-obj/root/lang/python37/work/.destdir/usr/pkg/lib/python3.7
> ^T
> [ 563859.5589422] load: 0.39  cmd: make 15726 [wait] 0.23u 0.07s 0% 9184k
> make: Working in: /build/package-obj/root/lang/python37/work/Python-3.7.1
> make[1]: Working in: /work/woods/m-NetBSD-pkgsrc-current/lang/python37
> make: Working in: /work/woods/m-NetBSD-pkgsrc-current/lang/python37
> ^T
> [ 563866.4606073] load: 0.36  cmd: make 15726 [wait] 0.23u 0.07s 0% 9184k
> make: Working in: /work/woods/m-NetBSD-pkgsrc-current/lang/python37
> make: Working in: /build/package-obj/root/lang/python37/work/Python-3.7.1
> make[1]: Working in: /work/woods/m-NetBSD-pkgsrc-current/lang/python37
> ^?Traceback (most recent call last):
>   File "/var/package-obj/root/lang/python37/work/.destdir/usr/pkg/lib/python3.7/compileall.py", line 20, in <module>
>     from concurrent.futures import ProcessPoolExecutor
>   File "<frozen importlib._bootstrap>", line 1032, in _handle_fromlist
>   File "/build/package-obj/root/lang/python37/work/.destdir/usr/pkg/lib/python3.7/concurrent/futures/__init__.py", line 43, in __getattr__
>     from .process import ProcessPoolExecutor as pe
>   File "/build/package-obj/root/lang/python37/work/.destdir/usr/pkg/lib/python3.7/concurrent/futures/process.py", line 53, in <module>
>     import multiprocessing as mp
>   File "/build/package-obj/root/lang/python37/work/.destdir/usr/pkg/lib/python3.7/multiprocessing/__init__.py", line 16, in <module>
>     from . import context
>   File "/build/package-obj/root/lang/python37/work/.destdir/usr/pkg/lib/python3.7/multiprocessing/context.py", line 5, in <module>
>     from . import process
>   File "/build/package-obj/root/lang/python37/work/.destdir/usr/pkg/lib/python3.7/multiprocessing/process.py", line 363, in <module>
>     _current_process = _MainProcess()
>   File "/build/package-obj/root/lang/python37/work/.destdir/usr/pkg/lib/python3.7/multiprocessing/process.py", line 347, in __init__
>     self._config = {'authkey': AuthenticationString(os.urandom(32)),
> KeyboardInterrupt
> *** Error code 1 (ignored)
> *** Signal 2
> *** Signal 2
>
>
>
> --
> 					Greg A. Woods <gwoods%acm.org@localhost>
>
> Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
> Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>

Attachment: pgp8pQhUFzKlx.pgp
Description: OpenPGP Digital Signature



Home | Main Index | Thread Index | Old Index