NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/57870: umount hang (ld/sdmmc/rtsx)



The following reply was made to PR kern/57870; it has been noted by GNATS.

From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Rob Whitlock <rwhitlock22%gmail.com@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost
Subject: Re: kern/57870: umount hang (ld/sdmmc/rtsx)
Date: Mon, 22 Jan 2024 22:31:54 +0000

 This is a multi-part message in MIME format.
 --=_zuuYtksK0C8RJaF9J8AwVVP/KZSlw/L+
 
 Can you please try to reproduce the deadlock with the attached patch?
 
 --=_zuuYtksK0C8RJaF9J8AwVVP/KZSlw/L+
 Content-Type: text/plain; charset="ISO-8859-1"; name="pr57870-ldsdmmcdetachflushdeadlock"
 Content-Transfer-Encoding: quoted-printable
 Content-Disposition: attachment; filename="pr57870-ldsdmmcdetachflushdeadlock.patch"
 
 From 9d5fcd6249b8b67687e8e5e731a507592966a40e Mon Sep 17 00:00:00 2001
 From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
 Date: Mon, 22 Jan 2024 22:24:20 +0000
 Subject: [PATCH] ld@sdmmc(4): Hack around deadlock in cache sync on detach.
 
 Yanking a card triggers the sdmmc discovery task, which runs in the
 sdmmc task thread, to detach any attached child devices.
 
 Detaching ld@sdmmc triggers a cache flush (via ldbegindetach ->
 disk_begindetach -> ld_lastclose -> ld_flush -> ioctl DIOCCACHESYNC),
 which is implemented by scheduling a task to do sdmmc_mem_flush_cache
 and then waiting for it to complete.
 
 The sdmmc_mem_cache_flush is done by an sdmmc task so it happens
 after all previously scheduled I/O operations -- that way the cache
 flush doesn't complete until the previously scheduled I/O operations
 are complete.
 
 However, when the cache flush task is issued from the discovery task,
 this doesn't work, because the cache flush task can't start until the
 discovery task has returned -- but the discovery task won't return
 until the cache flush task has completed.
 
 To work around this deadlock, which usually happens only when the
 device has been yanked anyway so further I/O would be lost anyway,
 just do the cache flush synchronously in DIOCCACHESYNC if we're
 running in the task thread.
 
 This isn't quite right -- implementation details of the task thread
 shouldn't bleed into ld@sdmmc, and running the cache sync _before_
 any subsequently scheduled I/O tasks is asking for trouble -- but it
 should serve to avoid the deadlock in PR kern/57870 until we can fix
 a host of concurrency bugs in sdmmc by fixing the locking scheme and
 running discovery in a separate thread from tasks.
 
 XXX pullup-10
 ---
  sys/dev/sdmmc/ld_sdmmc.c | 15 +++++++++++++++
  1 file changed, 15 insertions(+)
 
 diff --git a/sys/dev/sdmmc/ld_sdmmc.c b/sys/dev/sdmmc/ld_sdmmc.c
 index 0c310c3fe21b..9a527ad98db5 100644
 --- a/sys/dev/sdmmc/ld_sdmmc.c
 +++ b/sys/dev/sdmmc/ld_sdmmc.c
 @@ -599,9 +599,24 @@ static int
  ld_sdmmc_cachesync(struct ld_softc *ld, bool poll)
  {
  	struct ld_sdmmc_softc *sc =3D device_private(ld->sc_dv);
 +	struct sdmmc_softc *sdmmc =3D device_private(device_parent(ld->sc_dv));
  	struct ld_sdmmc_task *task;
  	int error =3D -1;
 =20
 +	/*
 +	 * If we come here through the sdmmc discovery task, we can't
 +	 * wait for a new task because the new task can't even begin
 +	 * until the sdmmc discovery task has completed.
 +	 *
 +	 * XXX This is wrong, because there may already be queued I/O
 +	 * tasks ahead of us.  Fixing this properly requires doing
 +	 * discovery in a separate thread.  But this should avoid the
 +	 * deadlock of PR kern/57870 (https://gnats.NetBSD.org/57870)
 +	 * until we do split that up.
 +	 */
 +	if (curlwp =3D=3D sdmmc->sc_tskq_lwp)
 +		return sdmmc_mem_flush_cache(sc->sc_sf, poll);
 +
  	mutex_enter(&sc->sc_lock);
 =20
  	/* Acquire a free task, or fail with EBUSY.  */
 
 --=_zuuYtksK0C8RJaF9J8AwVVP/KZSlw/L+--
 


Home | Main Index | Thread Index | Old Index