Re: kern/57580: evbarm/earmv7hf RPI2 scheduler/cpu stall

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,kardel%netbsd.org@localhost
Subject: Re: kern/57580: evbarm/earmv7hf RPI2 scheduler/cpu stall
From: Frank Kardel <kardel%netbsd.org@localhost>
Date: Thu, 17 Aug 2023 05:50:02 +0000 (UTC)

The following reply was made to PR kern/57580; it has been noted by GNATS.

From: Frank Kardel <kardel%netbsd.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: 
Subject: Re: kern/57580: evbarm/earmv7hf RPI2 scheduler/cpu stall
Date: Thu, 17 Aug 2023 07:45:58 +0200

 So I got a partial success with DDB.
 
 Stopped in pid 0.5 (system) at netbsd:cpu_Debugger+0x4:        bx      r14
 db{0}> mach cpu 1
 kdb_trap: switching to cpu1
 [ 85689.2095237] Mutex error: mutex_vector_enter,516: assertion failed: !cpu_intr_p()
 
 [ 85689.2095237] lock address : 913d3940
 [ 85689.2095237] current cpu  :                  1
 [ 85689.2095237] current lwp  : 0x00000000914c3400
 [ 85689.2095237] owner field  : 000000000000000000 wait/spin:                0/0
 
 [ 85689.2095237] panic: lock error: Mutex: mutex_vector_enter,516: assertion failed: !cpu_intr_p(): lock 0x913d3940 cpu 1 lwp 0x914c3400
 [ 85689.2095237] cpu1: Begin traceback...
 [ 85689.2095237] 0x80cb3b44: netbsd:db_panic+0x14
 [ 85689.2095237] 0x80cb3b64: netbsd:vpanic+0x114
 [ 85689.2095237] 0x80cb3b7c: netbsd:panic+0x24
 [ 85689.2095237] 0x80cb3c44: netbsd:lockdebug_abort+0xe8
 [ 85689.2095237] 0x80cb3c5c: netbsd:mutex_abort+0x30
 [ 85689.2095237] 0x80cb3cc4: netbsd:mutex_enter+0x48c
 [ 85689.2095237] 0x80cb3cdc: netbsd:usbd_set_polling+0x34
 [ 85689.2095237] 0x80cb3cfc: netbsd:ukbd_cnpollc+0x5c
 [ 85689.2095237] 0x80cb3d14: netbsd:wsdisplay_pollc+0x60
 [ 85689.2095237] 0x80cb3d2c: netbsd:cnpollc+0x4c
 [ 85689.2095237] 0x80cb3dbc: netbsd:kdb_trap+0x19c
 [ 85689.2095237] 0x80cb3dcc: netbsd:pic_ipi_ddb+0x18
 [ 85689.2095237] 0x80cb3df4: netbsd:bcm2836mp_ipi_handler+0x11c
 [ 85689.2095237] 0x80cb3e44: netbsd:pic_dispatch+0x54
 [ 85689.2095237] 0x80cb3ecc: netbsd:pic_do_pending_ints+0x434
 [ 85689.2095237] 0x80cb3f34: netbsd:irq_idle_entry+0x38
 [ 85689.2095237] 0x80cb3f94: netbsd:idle_loop+0x1b8
 [ 85689.2095237] cpu1: End traceback...
 
 [ 85689.2095237] dump to dev 92,1 not possible
 [ 85689.2095237] rebooting...
 
 Additionally, getting stacks from crash(8) was unlucky: it got stuck.
 
 The currently runnable processes from ps are:
    UID   PID  PPID    CPU PRI NI    VSZ   RSS WCHAN   STAT TTY       TIME COMMAND
      0     0     0      0 222  0      0  8724 -       RKl ?     32:48.28 [system]
   1002   585  3387      0  85  0 164508  3084 -       Rs ?      0:00.14 postgres: logical replication launcher
   1002   588  3387      0  85  0 164508  3268 -       Rs ?      0:02.57 postgres: autovacuum launcher
   1005   594     1      0  85  0 100024 74656 -       R ?     37:17.08 /usr/bin/perl /usr/pkg/fhem/fhem.pl /usr/pkg/fhem/fhem.cfg
      0   736     1      0  85  0  15780  4604 -       Rs ?      0:13.95 /usr/pkg/bin/perl -wT /usr/pkg/sbin/munin-node
      0   969     1      0  85  0  13716  2336 -       Rs ?      0:01.12 /usr/libexec/postfix/master -w
      0  1249     1      0  85  0   6396  1444 -       Rs ?      0:01.19 /usr/sbin/cron
      0  1297  1296      0  85  0   7156  1896 select  Is ?      0:00.13 SCREEN -R (screen)
      0  1454     1      0  85  0   9744  1684 -       Rs ?      1:13.53 /usr/sbin/syslogd -s
      0  1646     1   7048  85  0 112428 10004 -       Rsl ?      2:08.08 /usr/sbin/named
   1002  1739  3387      0  85  0 163884  2564 -       Rs ?      0:01.31 postgres: walwriter
   1001  1882     1  42491  85  0  44568  3800 -       Rsl ?      0:00.63 /usr/pkg/sbin/zebra -P 0 -d
   1008  3288     1 102890  85  0 187872 39556 -       Rsl ?      1:19.25 /usr/pkg/bin/node /usr/pkg/zigbee2mqtt/index.js
   1002  3387     1      0  85  0 163600  3988 -       Rs ?      0:00.57 /usr/pkg/bin/postgres -D /usr/pkg/pgsql/data
      0 11982  8414      0   0  0  11476  3564 -       Rs ?      0:07.43 (perl)
 
 As HEARTBEAT does not seem to be supported in NetBSD-10, I will try to
 run a -current kernel and see how far I get.
 
 On 08/13/23 14:00, Taylor R Campbell wrote:
 >   Can you enter ddb in this state (not crash(8) -- use C-A-ESC in wskbd
 >   or break at serial console or (set and) type the hw.cnmagic sequence),
 >   and do `mach cpu 1', and then `bt'?  Once you get output, you can do
 >   `continue' to return from ddb.
 >   
 >   If not, can you try enabling `options HEARTBEAT' and `options
 >   HEARTBEAT_MAX_PERIOD=15' in our kernel config and see if you get any
 >   diagnostics out of that?
