pkgsrc-Bugs archive


Re: pkg/59808 (random python3 process crashes in NetBSD VMs)





  - Is there stdout/stderr stored anywhere?

That's what I have (from a new run in a VirtualBox VM):

netbsd9: =================================== FAILURES ===================================
netbsd9: ________________________ src/borg/testsuite/archiver.py ________________________
netbsd9: [gw9] netbsd9 -- Python 3.11.13 /vagrant/borg/borg/.tox/py/bin/python
netbsd9: worker 'gw9' crashed while running 'src/borg/testsuite/archiver.py::ArchiverTestCase::test_unknown_feature_on_rename'
netbsd9: ================================ tests coverage ================================
netbsd9: ______________ coverage: platform netbsd9, python 3.11.13-final-0 ______________
netbsd9:
netbsd9: ___________________________ coverage: failed workers ___________________________
netbsd9:
netbsd9: The following workers failed to return coverage data, ensure that pytest-cov is installed on these workers.
netbsd9: gw9

  - Did the parent process determine it terminated on a signal, and if
    so, what signal,

I don't know, I am not a developer of pytest(-xdist).

    and did it dump core?

I found exactly 1 core dump:

-rw------- 1 vagrant wheel 55814056 Dec 2 08:40 /tmp/tmp6pedxq5l/python.core
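
For reference, both questions (which signal, and whether core was dumped) can be answered by whichever process waits on the crashed child. The following is a minimal sketch, assuming a POSIX system and Python 3.8 or later; `run_and_classify` is a hypothetical helper written for illustration and is not part of pytest or pytest-xdist.

```python
import os
import signal
import sys


def run_and_classify(argv):
    """Run argv as a child process and report how it ended.

    Hypothetical helper for illustration only.
    Returns ("signal", signal_name, core_dumped) or ("exit", status, False).
    """
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        # The child was terminated by a signal; decode which one and
        # whether the kernel reports that a core file was written.
        name = signal.Signals(os.WTERMSIG(status)).name
        return ("signal", name, os.WCOREDUMP(status))
    return ("exit", os.WEXITSTATUS(status), False)


if __name__ == "__main__":
    # Stand-in child that kills itself; a real test worker would go here.
    child = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    print(run_and_classify(child))
```

When a child is started through the `subprocess` module instead, death by signal shows up as a negative `returncode`, but the core-dump flag is not exposed there, which is why the sketch uses `os.waitpid` directly.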

    If you can find a core dump, and it's (say) from a program called
    /usr/pkg/bin/foo, can you get a stack trace out of gdb?
    # gdb /usr/pkg/bin/foo /path/to/foo.core
    (gdb) bt
    (gdb) info registers
    (gdb) frame apply all info locals

$ pwd
/vagrant/borg/borg-env/bin

$ ls -l python
lrwxrwxr-x 1 vagrant wheel 23 Dec 2 08:32 python -> /usr/pkg/bin/python3.11

$ gdb python /tmp/tmp6pedxq5l/python.core

GNU gdb (GDB) 8.3
...
Reading symbols from python...
(No debugging symbols found in python)
[New process 1]
[New process 2]
Core was generated by `python'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007355f6967dea in _lwp_kill () from /usr/lib/libc.so.12
[Current thread is 1 (process 1)]

(gdb) bt
#0  0x00007355f6967dea in _lwp_kill () from /usr/lib/libc.so.12
#1  0x00007355f80525f1 in faulthandler_fatal_error () from /usr/pkg/lib/libpython3.11.so.1.0
#2  0x00007355f68a21a0 in opendir () from /usr/lib/libc.so.12
#3  0x000000010000000b in ?? ()
#4  0x0000000000000000 in ?? ()

(gdb) info registers
rax            0x0                 0
rbx            0x7355f81226f6      126813071353590
rcx            0x7355f6967dea      126813046472170
rdx            0x0                 0
rsi            0xb                 11
rdi            0x1                 1
rbp            0xb                 0xb
rsp            0x7355f3e4a338      0x7355f3e4a338
r8             0x0                 0
r9             0x0                 0
r10            0x7355f6967dca      126813046472138
r11            0x206               518
r12            0x7355f6bd0680      126813048997504
r13            0xc                 12
r14            0x0                 0
r15            0x7355f84162a0      126813074449056
rip            0x7355f6967dea      0x7355f6967dea <_lwp_kill+10>
eflags         0x206               [ PF IF ]
cs             0x47                71
ss             0x3f                63
ds             0x23                35
es             0x23                35
fs             0x0                 0
gs             0x0                 0
fs_base        <unavailable>
gs_base        <unavailable>

(gdb) frame apply all info locals
#0  0x00007355f6967dea in _lwp_kill () from /usr/lib/libc.so.12
No symbol table info available.
#1  0x00007355f80525f1 in faulthandler_fatal_error () from /usr/pkg/lib/libpython3.11.so.1.0
No symbol table info available.
#2  0x00007355f68a21a0 in opendir () from /usr/lib/libc.so.12
No symbol table info available.
#3  0x000000010000000b in ?? ()
No symbol table info available.
#4  0x0000000000000000 in ?? ()
No symbol table info available.
(gdb)
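
Frame #1 is `faulthandler_fatal_error`, so Python's `faulthandler` module was active in the crashed worker and should have written a Python-level traceback before re-raising the signal. That output is not in what I captured. A minimal sketch of how the traceback could be sent to a file instead, assuming a POSIX system; `crash_and_collect`, the `CHILD` script and the self-sent SIGSEGV are hypothetical stand-ins for the real test worker and the real crash:

```python
import subprocess
import sys
import tempfile

# Child script: log fatal-signal tracebacks to the file named in argv[1],
# then deliver SIGSEGV to itself as a stand-in for the real crash.
CHILD = """
import faulthandler, os, resource, signal, sys
resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # demo only: no core file
log = open(sys.argv[1], "w")
faulthandler.enable(file=log, all_threads=True)
os.kill(os.getpid(), signal.SIGSEGV)
"""


def crash_and_collect():
    """Run the crashing child; return (returncode, faulthandler log text)."""
    with tempfile.NamedTemporaryFile("r", suffix=".log") as log:
        proc = subprocess.run([sys.executable, "-c", CHILD, log.name])
        return proc.returncode, log.read()


if __name__ == "__main__":
    returncode, text = crash_and_collect()
    print("child returncode:", returncode)
    print(text)
```

The log names the Python file and line each thread was executing when the signal arrived, which gdb cannot show here because the interpreter was built without debugging symbols.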

  - Is there any relevant output in `dmesg'?

Nothing related, just the boot messages and a few unrelated messages.

  I ran the test suite three times in a VM by loosely following the
  instructions at
  https://github.com/borgbackup/borg/blob/9a0122995c32aa657a2b1cac7a015cec6d1a89ab/.github/workflows/ci.yml#L432-L468
  but so far I haven't seen any crashes.  Takes about an hour to run;
  how often do the crashes occur?

I think I currently see them in most test suite runs on GitHub CI on NetBSD 10.

I also needed only one try just now to get a process to crash in the VirtualBox VM with NetBSD 9.

Sometimes multiple processes crash in a single test suite run.

In the past, I have also seen them frequently on NetBSD 9.

Thanks for your detailed help!


