NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/57870: umount hang (ld/sdmmc/rtsx)
Can you please try to reproduce the deadlock with the attached patch?
From 9d5fcd6249b8b67687e8e5e731a507592966a40e Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Date: Mon, 22 Jan 2024 22:24:20 +0000
Subject: [PATCH] ld@sdmmc(4): Hack around deadlock in cache sync on detach.
Yanking a card triggers the sdmmc discovery task, which runs in the
sdmmc task thread, to detach any attached child devices.
Detaching ld@sdmmc triggers a cache flush (via ldbegindetach ->
disk_begindetach -> ld_lastclose -> ld_flush -> ioctl DIOCCACHESYNC),
which is implemented by scheduling a task to do sdmmc_mem_flush_cache
and then waiting for it to complete.
The sdmmc_mem_cache_flush is done by an sdmmc task so it happens
after all previously scheduled I/O operations -- that way the cache
flush doesn't complete until the previously scheduled I/O operations
are complete.
However, when the cache flush task is issued from the discovery task,
this doesn't work, because the cache flush task can't start until the
discovery task has returned -- but the discovery task won't return
until the cache flush task has completed.
To work around this deadlock, which usually happens only when the
device has been yanked anyway so further I/O would be lost anyway,
just do the cache flush synchronously in DIOCCACHESYNC if we're
running in the task thread.
This isn't quite right -- implementation details of the task thread
shouldn't bleed into ld@sdmmc, and running the cache sync _before_
any subsequently scheduled I/O tasks is asking for trouble -- but it
should serve to avoid the deadlock in PR kern/57870 until we can fix
a host of concurrency bugs in sdmmc by fixing the locking scheme and
running discovery in a separate thread from tasks.
XXX pullup-10
---
sys/dev/sdmmc/ld_sdmmc.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/sys/dev/sdmmc/ld_sdmmc.c b/sys/dev/sdmmc/ld_sdmmc.c
index 0c310c3fe21b..9a527ad98db5 100644
--- a/sys/dev/sdmmc/ld_sdmmc.c
+++ b/sys/dev/sdmmc/ld_sdmmc.c
@@ -599,9 +599,24 @@ static int
ld_sdmmc_cachesync(struct ld_softc *ld, bool poll)
{
struct ld_sdmmc_softc *sc = device_private(ld->sc_dv);
+ struct sdmmc_softc *sdmmc = device_private(device_parent(ld->sc_dv));
struct ld_sdmmc_task *task;
int error = -1;
+ /*
+ * If we come here through the sdmmc discovery task, we can't
+ * wait for a new task because the new task can't even begin
+ * until the sdmmc discovery task has completed.
+ *
+ * XXX This is wrong, because there may already be queued I/O
+ * tasks ahead of us. Fixing this properly requires doing
+ * discovery in a separate thread. But this should avoid the
+ * deadlock of PR kern/57870 (https://gnats.NetBSD.org/57870)
+ * until we do split that up.
+ */
+ if (curlwp == sdmmc->sc_tskq_lwp)
+ return sdmmc_mem_flush_cache(sc->sc_sf, poll);
+
mutex_enter(&sc->sc_lock);
/* Acquire a free task, or fail with EBUSY. */
Home |
Main Index |
Thread Index |
Old Index