Re: pkg/59808 (random python3 process crashes in NetBSD VMs)

To: gnats-bugs%netbsd.org@localhost, port-amd64-maintainer%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, pkgsrc-bugs%netbsd.org@localhost, twaldmann%thinkmo.de@localhost
Subject: Re: pkg/59808 (random python3 process crashes in NetBSD VMs)
From: Thomas Waldmann <tw%waldmann-edv.de@localhost>
Date: Tue, 2 Dec 2025 14:36:35 +0100

  - Is there stdout/stderr stored anywhere?


This is what I have (from a new run in a VirtualBox VM):

netbsd9: =================================== FAILURES ===================================
netbsd9: ________________________ src/borg/testsuite/archiver.py ________________________
netbsd9: [gw9] netbsd9 -- Python 3.11.13 /vagrant/borg/borg/.tox/py/bin/python
netbsd9: worker 'gw9' crashed while running 'src/borg/testsuite/archiver.py::ArchiverTestCase::test_unknown_feature_on_rename'
netbsd9: ================================ tests coverage ================================
netbsd9: ______________ coverage: platform netbsd9, python 3.11.13-final-0 ______________
netbsd9:
netbsd9: ___________________________ coverage: failed workers ___________________________
netbsd9:
netbsd9: The following workers failed to return coverage data, ensure that pytest-cov is installed on these workers.
netbsd9: gw9

  - Did the parent process determine it terminated on a signal, and if
    so, what signal,


I don't know, I am not a developer of pytest(-xdist).

    and did it dump core?


I found exactly 1 core dump:

-rw------- 1 vagrant wheel 55814056 Dec 2 08:40 /tmp/tmp6pedxq5l/python.core
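
For reference, a parent process can generally recover both answers (which signal, and whether core was dumped) from the child's wait status. The following is only a minimal sketch of that general POSIX mechanism, not of how pytest-xdist handles its workers; the helper names `describe_returncode` and `describe_wait_status` are hypothetical.

```python
import os
import signal
import subprocess
import sys


def describe_returncode(returncode: int) -> str:
    """Translate a subprocess return code into a readable string."""
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def describe_wait_status(status: int) -> str:
    """Decode a raw status from os.waitpid(), including the core-dump flag."""
    if os.WIFSIGNALED(status):
        name = signal.Signals(os.WTERMSIG(status)).name
        core = " (core dumped)" if os.WCOREDUMP(status) else ""
        return f"killed by {name}{core}"
    if os.WIFEXITED(status):
        return f"exited with status {os.WEXITSTATUS(status)}"
    return f"unrecognized wait status {status}"


if __name__ == "__main__":
    # Hypothetical child that sends SIGSEGV to itself, standing in for a
    # crashing test worker.
    child = [
        sys.executable,
        "-c",
        "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
    ]
    proc = subprocess.run(child)
    print(describe_returncode(proc.returncode))
```

The `subprocess` return code only carries the signal number; the core-dump flag is available only from the raw status returned by `os.waitpid()`.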

    If you can find a core dump, and it's (say) from a program called
    /usr/pkg/bin/foo, can you get a stack trace out of gdb?

    # gdb /usr/pkg/bin/foo /path/to/foo.core

    (gdb) bt
    (gdb) info registers
    (gdb) frame apply all info locals
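
The three commands above can also be run non-interactively, which may be convenient on CI where nobody is at the prompt. This is a sketch that assumes gdb's standard `-batch` and `-ex` options; the helper name `build_gdb_command` is hypothetical, and the paths in the usage example are the placeholders from the question.

```python
import subprocess


def build_gdb_command(program: str, core: str) -> list[str]:
    """Build a gdb invocation that runs the requested commands and exits."""
    gdb_commands = ["bt", "info registers", "frame apply all info locals"]
    argv = ["gdb", "-batch"]
    for command in gdb_commands:
        # Each -ex option executes one gdb command, in order.
        argv += ["-ex", command]
    return argv + [program, core]


def run_gdb(program: str, core: str) -> str:
    """Run gdb on a core file and return everything it printed."""
    result = subprocess.run(
        build_gdb_command(program, core),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder paths; substitute the real binary and core file.
    print(" ".join(build_gdb_command("/usr/pkg/bin/foo", "/path/to/foo.core")))
```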


$ pwd
/vagrant/borg/borg-env/bin

$ ls -l python
lrwxrwxr-x 1 vagrant wheel 23 Dec 2 08:32 python -> /usr/pkg/bin/python3.11


$ gdb python /tmp/tmp6pedxq5l/python.core

GNU gdb (GDB) 8.3
...
Reading symbols from python...
(No debugging symbols found in python)
[New process 1]
[New process 2]
Core was generated by `python'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007355f6967dea in _lwp_kill () from /usr/lib/libc.so.12
[Current thread is 1 (process 1)]

(gdb) bt
#0  0x00007355f6967dea in _lwp_kill () from /usr/lib/libc.so.12
#1  0x00007355f80525f1 in faulthandler_fatal_error () from /usr/pkg/lib/libpython3.11.so.1.0
#2  0x00007355f68a21a0 in opendir () from /usr/lib/libc.so.12
#3  0x000000010000000b in ?? ()
#4  0x0000000000000000 in ?? ()

(gdb) info registers
rax            0x0                 0
rbx            0x7355f81226f6      126813071353590
rcx            0x7355f6967dea      126813046472170
rdx            0x0                 0
rsi            0xb                 11
rdi            0x1                 1
rbp            0xb                 0xb
rsp            0x7355f3e4a338      0x7355f3e4a338
r8             0x0                 0
r9             0x0                 0
r10            0x7355f6967dca      126813046472138
r11            0x206               518
r12            0x7355f6bd0680      126813048997504
r13            0xc                 12
r14            0x0                 0
r15            0x7355f84162a0      126813074449056
rip            0x7355f6967dea      0x7355f6967dea <_lwp_kill+10>
eflags         0x206               [ PF IF ]
cs             0x47                71
ss             0x3f                63
ds             0x23                35
es             0x23                35
fs             0x0                 0
gs             0x0                 0
fs_base        <unavailable>
gs_base        <unavailable>

(gdb) frame apply all info locals
#0  0x00007355f6967dea in _lwp_kill () from /usr/lib/libc.so.12
No symbol table info available.
#1  0x00007355f80525f1 in faulthandler_fatal_error () from /usr/pkg/lib/libpython3.11.so.1.0
No symbol table info available.
#2  0x00007355f68a21a0 in opendir () from /usr/lib/libc.so.12
No symbol table info available.
#3  0x000000010000000b in ?? ()
No symbol table info available.
#4  0x0000000000000000 in ?? ()
No symbol table info available.
(gdb)
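
Frame #1, faulthandler_fatal_error, appears to come from CPython's faulthandler module, which writes a Python-level traceback before re-raising the fatal signal. That would explain why the trace ends in _lwp_kill rather than at the original faulting instruction. Assuming faulthandler is active in the worker (pytest normally enables it) and its output is simply lost with the worker's stderr, one possible way to keep that traceback is to point faulthandler at a file. The sketch below simulates this with a child that sends SIGSEGV to itself; the log file name and the helper `run_crashing_child` are hypothetical.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Code executed by the child: enable faulthandler with a log file, then
# simulate a crash by sending SIGSEGV to itself.
CHILD_CODE = """
import faulthandler, os, signal, sys
log = open(sys.argv[1], "w")
faulthandler.enable(file=log, all_threads=True)
os.kill(os.getpid(), signal.SIGSEGV)
"""


def run_crashing_child() -> tuple[int, str]:
    """Run the crashing child; return its return code and the logged traceback."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "fault.log")  # hypothetical log location
        proc = subprocess.run([sys.executable, "-c", CHILD_CODE, log_path])
        with open(log_path) as fh:
            return proc.returncode, fh.read()


if __name__ == "__main__":
    returncode, traceback_text = run_crashing_child()
    print("died of SIGSEGV:", returncode == -signal.SIGSEGV)
    print(traceback_text)
```

The file passed to `faulthandler.enable()` has to stay open for as long as the handler is installed, which is why the child keeps it in a module-level variable.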

  - Is there any relevant output in `dmesg'?


Nothing related, just the boot messages and a few unrelated msgs.

  I ran the test suite three times in a VM by loosely following the
  instructions at
  https://github.com/borgbackup/borg/blob/9a0122995c32aa657a2b1cac7a015cec6d1a89ab/.github/workflows/ci.yml#L432-L468
  but so far I haven't seen any crashes.  Takes about an hour to run;
  how often do the crashes occur?

I think I currently see them in most test suite runs on GitHub CI on NetBSD 10.

I also needed only one try just now to get a process to crash in the VirtualBox VM with NetBSD 9.

Sometimes multiple processes crash in a single test suite run.

In the past, I have also seen them frequently on NetBSD 9.

Thanks for your detailed help!

References:
- Re: pkg/59808 (random python3 process crashes in NetBSD VMs)
  - From: Taylor R Campbell via gnats
