Re: evbarm hang

To: tech-kern%netbsd.org@localhost
Subject: Re: evbarm hang
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Date: Thu, 18 Apr 2019 17:43:12 +0200

On Thu, Apr 18, 2019 at 04:32:56PM +0200, Manuel Bouyer wrote:
> [...]
> 
> (the common point is the dwc_gmac_intr() call, which ends up in pool_get().
> pool_get() seems to spin on the mutex_enter(&pp->pr_lock); call
> just before startover:.
> Unfortunably I can't get a stack trace for CPU 1:
> db{0}> mach cpu 1
> kdb_trap: switching to cpu1
> Stopped in pid 26110.1 (gcc) at netbsd:_kernel_lock+0xc4:<hang>
> 
> Any idea how to process from here ?

Here's another instance. This one was with LOCKDEBUG, but LOCKDEBUG didn't
trigger.
0x9e06d8cc: netbsd:comintr+0xc   
0x9e06d8e4: netbsd:gic_fdt_intr+0x2c
0x9e06d90c: netbsd:pic_dispatch+0x38
0x9e06d96c: netbsd:armgic_irq_handler+0xf4
0x9e06da2c: netbsd:irq_entry+0x68
0x9e06da84: netbsd:pool_get+0x70
0x9e06dad4: netbsd:pool_cache_get_slow+0x1f4
0x9e06db24: netbsd:pool_cache_get_paddr+0x288                                   0x9e06db44: netbsd:m_clget+0x34
0x9e06dba4: netbsd:dwc_gmac_intr+0x194
0x9e06dbbc: netbsd:gic_fdt_intr+0x2c
0x9e06dbe4: netbsd:pic_dispatch+0x110      
0x9e06dc44: netbsd:armgic_irq_handler+0xf4
0x9e06dcf4: netbsd:irq_entry+0x68        
0x9e06dd2c: netbsd:pmap_extract_coherency+0x16c
0x9e06dd7c: netbsd:pool_cache_get_slow+0x254
0x9e06ddcc: netbsd:pool_cache_get_paddr+0x288
0x9e06ddec: netbsd:m_clget+0x34
0x9e06de84: netbsd:sosend+0x38c                                        
0x9e06deac: netbsd:soo_write+0x3c                                           
0x9e06df04: netbsd:dofilewrite+0x7c
0x9e06df34: netbsd:sys_write+0x5c
0x9e06dfac: netbsd:syscall+0x12c

PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
13250    1 2   1         0           92ba1d20                gcc
25370    1 2   0         0           929d8c80                gcc
16746>   1 7   0         0           91fc1a20                gcc
15636    1 2   0     40000           91b5d1c0            cc1plus
15822>   1 7   1         0           92db9d40                gcc

db{0}> tr/a 92db9d40
trace: pid 15822 lid 1 at 0x99feddbc
0x99feddbc: 91af4828
db{0}> tr/a 91fc1a20
trace: pid 16746 lid 1 at 0x9e06ddbc
0x9e06ddbc: 922c8ec0

Maybe more interesting:
0       21 3   1       204           90bf1640            xcall/1 xcall
0       20 1   1       200           90bf1380          softser/1
0       19 1   1       200           90bf10c0          softclk/1
0       18 1   1       200           90bf0e00          softbio/1
0       17 1   1       200           90bf0b40          softnet/1
0       16 1   1       201           90bf0880             idle/1
db{0}> tr/a 90bf0b40
trace: pid 0 lid 17 at 0x99e83e24
0x99e83e24: netbsd:pic_do_pending_ints+0x648
0x99e83eb4: netbsd:pic_do_pending_ints+0x614
0x99e83ee4: netbsd:splx+0x64
0x99e83f44: netbsd:canintr+0x290
0x99e83fac: netbsd:softint_dispatch+0x22c
0x9e1f9dbc: netbsd:softint_switch+0x58
0x9e1f9de4: netbsd:_splraise+0x3c
--- tf 0x99e83fb0 ---

canintr+0x290 is:
mutex_exit(softnet_lock);

(gdb) l *(splx+0x64)
0x80004270 is in splx (/dsk/l1/misc/bouyer/HEAD/clean/src/sys/arch/arm/pic/pic_splfuncs.c:96).
91                  "splx(%d) to a higher ipl than %d", savedipl, ci->ci_cpl);
92
93              ci->ci_intr_depth++;
94              pic_do_pending_ints(psw, savedipl, NULL);
95              ci->ci_intr_depth--;
96              KASSERTMSG(ci->ci_cpl == savedipl, "cpl %d savedipl %d",
97                  ci->ci_cpl, savedipl);
(gdb) l *(pic_do_pending_ints+0x614)
0x800031f4 is in pic_do_pending_ints (/dsk/l1/misc/bouyer/HEAD/clean/src/sys/arch/arm/pic/pic.c:583).
578             if (newipl == IPL_NONE && (ci->ci_astpending & __BIT(1))) {
579                     pic_set_priority(ci, IPL_SCHED);
580                     kpreempt(0);
581             }
582     #endif
583             if (ci->ci_cpl != newipl)
584                     pic_set_priority(ci, newipl);
585     }
586
587     static void

my guess is that we did call kpreempt() here and switched to another lwp.
It is possible that we did so with a spin lock held ?

pic_do_pending_ints+0x648 is in pic_set_priority() (gdb points to
	ci->ci_cpl = newipl;
but it may be the call to pic_list[0]->pic_ops->pic_set_priority() just
before).

From show all locks /t, this one could be interesting:
Locks held on CPU 1:
Lock 0 (initialized at pool_init)
lock address : 0x0000000090b79074 type     :               spin
initialized  : 0x000000008041582c
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  1
current cpu  :                  0 last held:                  1
current lwp  : 0x0000000091fc1a20 last held: 0x0000000092db9d40
last locked* : 0x0000000080415e78 unlocked : 0x00000000804161f8
[ 1447.2013328] owner field  : 0x0000000000010500 wait/spin:                0/1
trace: pid 15822 lid 1 at 0x99feddbc
0x99feddbc: 91af4828

0x0000000080415e78 is also the mutex_enter(&pp->pr_lock); line in pool_get().

trying to switch to cpu 1 did hang as usual.

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 ans d'experience feront toujours la difference
--

Follow-Ups:
- Re: evbarm hang
  - From: Manuel Bouyer

References:
- evbarm hang
  - From: Manuel Bouyer

Prev by Date: Re: evbarm hang
Next by Date: Re: evbarm hang
Previous by Thread: Re: evbarm hang
Next by Thread: Re: evbarm hang
Indexes:

Home | Main Index | Thread Index | Old Index