tech-kern archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: evbarm hang
On Thu, Apr 18, 2019 at 04:32:56PM +0200, Manuel Bouyer wrote:
> [...]
>
> (the common point is the dwc_gmac_intr() call, which ends up in pool_get().
> pool_get() seems to spin on the mutex_enter(&pp->pr_lock); call
> just before startover:.
> Unfortunably I can't get a stack trace for CPU 1:
> db{0}> mach cpu 1
> kdb_trap: switching to cpu1
> Stopped in pid 26110.1 (gcc) at netbsd:_kernel_lock+0xc4:<hang>
>
> Any idea how to process from here ?
Here's another instance. This one was with LOCKDEBUG, but LOCKDEBUG didn't
trigger.
0x9e06d8cc: netbsd:comintr+0xc
0x9e06d8e4: netbsd:gic_fdt_intr+0x2c
0x9e06d90c: netbsd:pic_dispatch+0x38
0x9e06d96c: netbsd:armgic_irq_handler+0xf4
0x9e06da2c: netbsd:irq_entry+0x68
0x9e06da84: netbsd:pool_get+0x70
0x9e06dad4: netbsd:pool_cache_get_slow+0x1f4
0x9e06db24: netbsd:pool_cache_get_paddr+0x288 0x9e06db44: netbsd:m_clget+0x34
0x9e06dba4: netbsd:dwc_gmac_intr+0x194
0x9e06dbbc: netbsd:gic_fdt_intr+0x2c
0x9e06dbe4: netbsd:pic_dispatch+0x110
0x9e06dc44: netbsd:armgic_irq_handler+0xf4
0x9e06dcf4: netbsd:irq_entry+0x68
0x9e06dd2c: netbsd:pmap_extract_coherency+0x16c
0x9e06dd7c: netbsd:pool_cache_get_slow+0x254
0x9e06ddcc: netbsd:pool_cache_get_paddr+0x288
0x9e06ddec: netbsd:m_clget+0x34
0x9e06de84: netbsd:sosend+0x38c
0x9e06deac: netbsd:soo_write+0x3c
0x9e06df04: netbsd:dofilewrite+0x7c
0x9e06df34: netbsd:sys_write+0x5c
0x9e06dfac: netbsd:syscall+0x12c
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
13250 1 2 1 0 92ba1d20 gcc
25370 1 2 0 0 929d8c80 gcc
16746> 1 7 0 0 91fc1a20 gcc
15636 1 2 0 40000 91b5d1c0 cc1plus
15822> 1 7 1 0 92db9d40 gcc
db{0}> tr/a 92db9d40
trace: pid 15822 lid 1 at 0x99feddbc
0x99feddbc: 91af4828
db{0}> tr/a 91fc1a20
trace: pid 16746 lid 1 at 0x9e06ddbc
0x9e06ddbc: 922c8ec0
Maybe more interesting:
0 21 3 1 204 90bf1640 xcall/1 xcall
0 20 1 1 200 90bf1380 softser/1
0 19 1 1 200 90bf10c0 softclk/1
0 18 1 1 200 90bf0e00 softbio/1
0 17 1 1 200 90bf0b40 softnet/1
0 16 1 1 201 90bf0880 idle/1
db{0}> tr/a 90bf0b40
trace: pid 0 lid 17 at 0x99e83e24
0x99e83e24: netbsd:pic_do_pending_ints+0x648
0x99e83eb4: netbsd:pic_do_pending_ints+0x614
0x99e83ee4: netbsd:splx+0x64
0x99e83f44: netbsd:canintr+0x290
0x99e83fac: netbsd:softint_dispatch+0x22c
0x9e1f9dbc: netbsd:softint_switch+0x58
0x9e1f9de4: netbsd:_splraise+0x3c
--- tf 0x99e83fb0 ---
canintr+0x290 is:
mutex_exit(softnet_lock);
(gdb) l *(splx+0x64)
0x80004270 is in splx (/dsk/l1/misc/bouyer/HEAD/clean/src/sys/arch/arm/pic/pic_splfuncs.c:96).
91 "splx(%d) to a higher ipl than %d", savedipl, ci->ci_cpl);
92
93 ci->ci_intr_depth++;
94 pic_do_pending_ints(psw, savedipl, NULL);
95 ci->ci_intr_depth--;
96 KASSERTMSG(ci->ci_cpl == savedipl, "cpl %d savedipl %d",
97 ci->ci_cpl, savedipl);
(gdb) l *(pic_do_pending_ints+0x614)
0x800031f4 is in pic_do_pending_ints (/dsk/l1/misc/bouyer/HEAD/clean/src/sys/arch/arm/pic/pic.c:583).
578 if (newipl == IPL_NONE && (ci->ci_astpending & __BIT(1))) {
579 pic_set_priority(ci, IPL_SCHED);
580 kpreempt(0);
581 }
582 #endif
583 if (ci->ci_cpl != newipl)
584 pic_set_priority(ci, newipl);
585 }
586
587 static void
my guess is that we did call kpreempt() here and switched to another lwp.
It is possible that we did so with a spin lock held ?
pic_do_pending_ints+0x648 is in pic_set_priority() (gdb points to
ci->ci_cpl = newipl;
but it may be the call to pic_list[0]->pic_ops->pic_set_priority() just
before).
From show all locks /t, this one could be interesting:
Locks held on CPU 1:
Lock 0 (initialized at pool_init)
lock address : 0x0000000090b79074 type : spin
initialized : 0x000000008041582c
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 1
current cpu : 0 last held: 1
current lwp : 0x0000000091fc1a20 last held: 0x0000000092db9d40
last locked* : 0x0000000080415e78 unlocked : 0x00000000804161f8
[ 1447.2013328] owner field : 0x0000000000010500 wait/spin: 0/1
trace: pid 15822 lid 1 at 0x99feddbc
0x99feddbc: 91af4828
0x0000000080415e78 is also the mutex_enter(&pp->pr_lock); line in pool_get().
trying to switch to cpu 1 did hang as usual.
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 ans d'experience feront toujours la difference
--
Home |
Main Index |
Thread Index |
Old Index