pkgsrc-Bugs archive


Re: pkg/59808 (random python3 process crashes in NetBSD VMs)



The following reply was made to PR pkg/59808; it has been noted by GNATS.

From: Thomas Waldmann <tw%waldmann-edv.de@localhost>
To: gnats-bugs%netbsd.org@localhost, port-amd64-maintainer%netbsd.org@localhost,
 gnats-admin%netbsd.org@localhost, pkgsrc-bugs%netbsd.org@localhost, twaldmann%thinkmo.de@localhost
Cc: riastradh%NetBSD.org@localhost
Subject: Re: pkg/59808 (random python3 process crashes in NetBSD VMs)
Date: Tue, 2 Dec 2025 14:36:35 +0100

 >   - Is there stdout/stderr stored anywhere?
 
 This is what I have (from a new run in a VirtualBox VM):
 
      netbsd9: =================================== FAILURES ===================================
      netbsd9: ________________________ src/borg/testsuite/archiver.py ________________________
      netbsd9: [gw9] netbsd9 -- Python 3.11.13 /vagrant/borg/borg/.tox/py/bin/python
      netbsd9: worker 'gw9' crashed while running 'src/borg/testsuite/archiver.py::ArchiverTestCase::test_unknown_feature_on_rename'
      netbsd9: ================================ tests coverage ================================
      netbsd9: ______________ coverage: platform netbsd9, python 3.11.13-final-0 ______________
      netbsd9:
      netbsd9: ___________________________ coverage: failed workers ___________________________
      netbsd9:
      netbsd9: The following workers failed to return coverage data, ensure that pytest-cov is installed on these workers.
      netbsd9: gw9
 
 >   - Did the parent process determine it terminated on a signal, and if
 >     so, what signal,
 
 I don't know; I am not a pytest(-xdist) developer.
 
 > and did it dump core?
 
 I found exactly one core dump:
 
 -rw-------  1 vagrant  wheel  55814056 Dec  2 08:40 /tmp/tmp6pedxq5l/python.core
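The two questions above (did the worker die on a signal, and did it dump core) can both be answered by whichever process waits for the worker. The following is a minimal sketch, not taken from pytest-xdist (whose internals are not covered here), of how a parent can keep the raw wait status and decode it with os.WIFSIGNALED, os.WTERMSIG and os.WCOREDUMP. With the subprocess module, a negative Popen.returncode already means "killed by that signal number", but the core-dump flag is not exposed there, which is why this sketch uses os.posix_spawn plus os.waitpid instead. The function name describe_exit and the demo child are hypothetical.

```python
import os
import signal
import sys


def describe_exit(argv):
    """Spawn argv, wait for it, and report how it ended.

    argv[0] must be an absolute path (os.posix_spawn does not search PATH).
    The raw wait status is kept, so a signal death and the core-dump flag
    can be told apart from a normal exit.
    """
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        return {
            "signal": signal.Signals(sig).name,   # e.g. "SIGSEGV"
            "core_dumped": os.WCOREDUMP(status),  # False if no core was written
        }
    return {"exit_code": os.WEXITSTATUS(status)}


if __name__ == "__main__":
    # Demo child that terminates itself with SIGTERM (no core by default).
    child = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGTERM)"]
    print(describe_exit(child))
```

For a crash like the one above, the expected result would be a "SIGSEGV" entry, with core_dumped set according to whether the kernel actually wrote a core file (which also depends on the core size limit).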
 
 >     If you can find a core dump, and it's (say) from a program called
 >     /usr/pkg/bin/foo, can you get a stack trace out of gdb?
 >   
 >     # gdb /usr/pkg/bin/foo /path/to/foo.core
 >     (gdb) bt
 >     (gdb) info registers
 >     (gdb) frame apply all info locals
 
 $ pwd
 /vagrant/borg/borg-env/bin
 
 $ ls -l python
 lrwxrwxr-x  1 vagrant  wheel  23 Dec  2 08:32 python -> /usr/pkg/bin/python3.11
 
 $ gdb python /tmp/tmp6pedxq5l/python.core
 
 GNU gdb (GDB) 8.3
 ...
 Reading symbols from python...
 (No debugging symbols found in python)
 [New process 1]
 [New process 2]
 Core was generated by `python'.
 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0x00007355f6967dea in _lwp_kill () from /usr/lib/libc.so.12
 [Current thread is 1 (process 1)]
 
 (gdb) bt
 #0  0x00007355f6967dea in _lwp_kill () from /usr/lib/libc.so.12
 #1  0x00007355f80525f1 in faulthandler_fatal_error () from /usr/pkg/lib/libpython3.11.so.1.0
 #2  0x00007355f68a21a0 in opendir () from /usr/lib/libc.so.12
 #3  0x000000010000000b in ?? ()
 #4  0x0000000000000000 in ?? ()
 
 (gdb) info registers
 rax            0x0                 0
 rbx            0x7355f81226f6      126813071353590
 rcx            0x7355f6967dea      126813046472170
 rdx            0x0                 0
 rsi            0xb                 11
 rdi            0x1                 1
 rbp            0xb                 0xb
 rsp            0x7355f3e4a338      0x7355f3e4a338
 r8             0x0                 0
 r9             0x0                 0
 r10            0x7355f6967dca      126813046472138
 r11            0x206               518
 r12            0x7355f6bd0680      126813048997504
 r13            0xc                 12
 r14            0x0                 0
 r15            0x7355f84162a0      126813074449056
 rip            0x7355f6967dea      0x7355f6967dea <_lwp_kill+10>
 eflags         0x206               [ PF IF ]
 cs             0x47                71
 ss             0x3f                63
 ds             0x23                35
 es             0x23                35
 fs             0x0                 0
 gs             0x0                 0
 fs_base        <unavailable>
 gs_base        <unavailable>
 
 (gdb) frame apply all info locals
 #0  0x00007355f6967dea in _lwp_kill () from /usr/lib/libc.so.12
 No symbol table info available.
 #1  0x00007355f80525f1 in faulthandler_fatal_error () from /usr/pkg/lib/libpython3.11.so.1.0
 No symbol table info available.
 #2  0x00007355f68a21a0 in opendir () from /usr/lib/libc.so.12
 No symbol table info available.
 #3  0x000000010000000b in ?? ()
 No symbol table info available.
 #4  0x0000000000000000 in ?? ()
 No symbol table info available.
 (gdb)
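Frame #1 in the backtrace is CPython's faulthandler fatal-error handler, so faulthandler was active in the crashed worker (presumably enabled by pytest's built-in faulthandler plugin). That handler writes the Python-level traceback to stderr by default and then re-raises the signal, which is consistent with frame #0 being _lwp_kill. Since the worker's stderr was not kept, one option is to send faulthandler output to a per-process file. This is a minimal sketch: the function name enable_crash_log and the faulthandler-<pid>.log naming are hypothetical, and where to call it (for example from a session-scoped autouse fixture in conftest.py, which should run after pytest's own faulthandler setup) is an assumption that would need checking.

```python
import faulthandler
import os
import tempfile


def enable_crash_log(directory):
    """Redirect faulthandler output to a per-process file in directory.

    Returns (path, file_object). The file object must stay open for as long
    as faulthandler uses it, so the caller should keep a reference to it.
    """
    path = os.path.join(directory, f"faulthandler-{os.getpid()}.log")
    log = open(path, "w")
    # all_threads=True dumps the Python stack of every thread, not just
    # the one that received the fatal signal.
    faulthandler.enable(file=log, all_threads=True)
    return path, log


if __name__ == "__main__":
    # Demo: dump the current traceback on demand to show where output lands.
    path, log = enable_crash_log(tempfile.mkdtemp())
    faulthandler.dump_traceback(file=log)
    print("faulthandler output is written to", path)
```

Because each pytest-xdist worker is a separate process, naming the file after the PID keeps the workers' logs apart.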
 
 >   - Is there any relevant output in `dmesg'?
 
 Nothing relevant, just the boot messages and a few unrelated messages.
 
 >   I ran the test suite three times in a VM by loosely following the
 >   instructions at
 >   https://github.com/borgbackup/borg/blob/9a0122995c32aa657a2b1cac7a015cec6d1a89ab/.github/workflows/ci.yml#L432-L468
 >   but so far I haven't seen any crashes.  Takes about an hour to run;
 >   how often do the crashes occur?
 
 I think I currently see them in most test suite runs on GitHub CI with
 NetBSD 10.
 
 This time, a single run in the VirtualBox VM with NetBSD 9 was enough to
 get one process to crash.
 
 Sometimes multiple processes crash in a single test suite run.
 
 In the past, I have also seen them frequently on NetBSD 9.
 
 Thanks for your detailed help!
 
 

