NetBSD-Bugs archive
kern/55485: bwfm @ sdmmc hangs, with some diagnosis
>Number: 55485
>Category: kern
>Synopsis: bwfm @ sdmmc hangs, with some diagnosis
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Jul 13 10:15:00 +0000 2020
>Originator: matthew green
>Release: NetBSD 9.99.69
>Organization:
people's front against (bozotic) www (softwar foundation)
>Environment:
System: NetBSD hinotori.eterna23.net 9.99.68 NetBSD 9.99.68 (_hinotori_) #14: Sun Jul 12 22:34:00 PDT 2020 mrg%yesterday-when-i-was-mad.eterna23.net@localhost:/var/obj/evbarm-aarch64/usr/src2/sys/arch/evbarm/compile/_hinotori_ evbarm
Architecture: aarch64
Machine: evbarm
>Description:
we've been tracking hangs in bwfm(4) on the pinebookpro for a
while; most of the time there is no useful information, but this
time i've captured some relevant data.
first, a NET_MPSAFE kernel doesn't hard-hang when the problem
occurs. this is because the paths into the bwfm code that hang
aren't holding kernel_lock at that point, so only the specific
network interface hangs.
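(for reference, that is a kernel built with the standard config option:)

```
options 	NET_MPSAFE
```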
second, in the reproduction i just had, i found that the
hung 'ifconfig -a' command was stuck waiting for the bwfm
adaptive mutex in bwfm_sdio_txctl() with this trace:
fp ffffc00088d07780 mi_switch() at ffffc000004abecc netbsd:mi_switch+0x20c
fp ffffc00088d077e0 sleepq_block() at ffffc000004a88bc netbsd:sleepq_block+0x9c
fp ffffc00088d07820 turnstile_block() at ffffc000004ba0bc netbsd:turnstile_block+0x30c
fp ffffc00088d07890 mutex_enter() at ffffc0000048f58c netbsd:mutex_enter+0x1cc
fp ffffc00088d07910 bwfm_sdio_txctl() at ffffc000000ee9c0 netbsd:bwfm_sdio_txctl+0x60
fp ffffc00088d07940 bwfm_proto_bcdc_query_dcmd() at ffffc000001ea7d4 netbsd:bwfm_proto_bcdc_query_dcmd+0xd4
fp ffffc00088d079a0 bwfm_fwvar_var_get_data() at ffffc000001ec61c netbsd:bwfm_fwvar_var_get_data+0x9c
fp ffffc00088d07a00 bwfm_get_sta_info() at ffffc000001ed5c8 netbsd:bwfm_get_sta_info+0x58
fp ffffc00088d07b60 bwfm_ioctl() at ffffc000001ed808 netbsd:bwfm_ioctl+0x178
fp ffffc00088d07ba0 doifioctl() at ffffc0000057e778 netbsd:doifioctl+0x8f8
fp ffffc00088d07d40 sys_ioctl() at ffffc000004ee770 netbsd:sys_ioctl+0x420
fp ffffc00088d07e20 syscall() at ffffc0000008da2c netbsd:syscall+0x18c
tf ffffc00088d07ed0 el0_trap() at ffffc0000008c524 netbsd:el0_trap
this lock is:
* Lock 0 (initialized at bwfm_sdio_attach)
lock address : 0xffffc00001b2d180 type : sleep/adaptive
initialized : 0xffffc000000ef710
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 1
relevant cpu : 5 last held: 5
relevant lwp : 0xffff0000f687b080 last held: 0xffff0000f687b080
last locked* : 0xffffc000000efb00 unlocked : 0xffffc000000efb98
owner field : 0xffff0000f687b080 wait/spin: 1/0
Turnstile:
=> 0 waiting readers:
=> 1 waiting writers: 0xffff00002af94b00
it is held by the sdmmc task thread for bwfm:
fp ffffc00088008b00 splx() at ffffc000000041b0 netbsd:splx+0x80
fp ffffc00088008b30 callout_halt() at ffffc000004b9444 netbsd:callout_halt+0x54
fp ffffc00088008b60 sleepq_block() at ffffc000004a8940 netbsd:sleepq_block+0x120
fp ffffc00088008ba0 cv_timedwait() at ffffc0000046d71c netbsd:cv_timedwait+0x11c
fp ffffc00088008be0 cv_timedwaitbt() at ffffc0000046da74 netbsd:cv_timedwaitbt+0x74
fp ffffc00088008c20 dwc_mmc_exec_command() at ffffc000001e5ac4 netbsd:dwc_mmc_exec_command+0x4b4
fp ffffc00088008ca0 sdmmc_mmc_command() at ffffc000000e80d0 netbsd:sdmmc_mmc_command+0x40
fp ffffc00088008cd0 sdmmc_io_rw_extended() at ffffc000000e9374 netbsd:sdmmc_io_rw_extended+0xa4
fp ffffc00088008d90 sdmmc_io_read_4() at ffffc000000e9d74 netbsd:sdmmc_io_read_4+0x24
fp ffffc00088008db0 bwfm_sdio_task() at ffffc000000efb24 netbsd:bwfm_sdio_task+0x54
fp ffffc00088008e90 sdmmc_task_thread() at ffffc000000e8a00 netbsd:sdmmc_task_thread+0xb0
this thread also holds this lock (which seems unrelated):
* Lock 1 (initialized at dwc_mmc_init)
lock address : 0xffff0000f68f0250 type : sleep/adaptive
initialized : 0xffffc000001e6108
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 0
relevant cpu : 5 last held: 5
relevant lwp : 0xffff0000f687b080 last held: 0xffff0000f687b080
last locked* : 0xffffc000001e5634 unlocked : 0xffffc000001e5cbc
owner field : 0xffff0000f687b080 wait/spin: 0/0
Turnstile: no active turnstile for this lock.
this task thread is stuck in a loop waiting for a condition:
574 static void
575 dwc_mmc_exec_command(sdmmc_chipset_handle_t sch, struct sdmmc_command *cmd)
...
722 struct bintime timeout = { .sec = 15, .frac = 0 };
723 const struct bintime epsilon = { .sec = 1, .frac = 0 };
724 while (!ISSET(cmd->c_flags, SCF_ITSDONE)) {
725 error = cv_timedwaitbt(&sc->sc_intr_cv,
726 &sc->sc_intr_lock, &timeout, &epsilon);
727 if (error != 0) {
728 cmd->c_error = error;
729 SET(cmd->c_flags, SCF_ITSDONE);
730 mutex_exit(&sc->sc_intr_lock);
731 goto done;
732 }
733 }
this appears to be a loop that should exit after 15 seconds if
the condition fails to come true, but it isn't exiting. while it
fails to exit, both the dwc_mmc_init() mutex and the
bwfm_sdio_attach() mutex remain held, and bwfm hangs.
as i was attempting to look at the in-memory contents of the
timeout and epsilon values on the stack, i lost ddb and had to
power cycle.
>How-To-Repeat:
use bwfm. use a lossy network for faster hangs.
>Fix: