NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/60144: virtio(4) cache coherence issue
The following reply was made to PR kern/60144; it has been noted by GNATS.
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Izumi Tsutsui <tsutsui%ceres.dti.ne.jp@localhost>
Cc: thorpej%me.com@localhost, isaki%pastel-flower.jp@localhost, nick.hudson%gmx.co.uk@localhost,
gnats-bugs%netbsd.org@localhost, kern-bug-people%netbsd.org@localhost,
netbsd-bugs%netbsd.org@localhost, tsutsui%ceres.dti.ne.jp@localhost,
adrian%NetBSD.org@localhost
Subject: Re: kern/60144: virtio(4) cache coherence issue
Date: Mon, 18 May 2026 23:29:58 +0000
This is a multi-part message in MIME format.
--=_iH22Mq8cEGadrlwrn55fhppMX7+AHtfW
OK, here is a candidate patch to address this issue. Most of what
needs review is the commit message, but it would be good to test
whether it empirically addresses the problem too. (Note that alone it
is probably _not_ enough to fix PR 60182, but it is probably needed
_in addition_ to the patch I already posted there.)
Isaki-san: Can you verify that this patch works?
--=_iH22Mq8cEGadrlwrn55fhppMX7+AHtfW
Content-Type: text/plain; charset="ISO-8859-1"; name="pr60144-virtiodmasync"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="pr60144-virtiodmasync.patch"
# HG changeset patch
# User Taylor R Campbell <riastradh%NetBSD.org@localhost>
# Date 1779137898 0
# Mon May 18 20:58:18 2026 +0000
# Branch trunk
# Node ID 0f2733c1e2e8f9e109ce9e1258023da90fe016f5
# Parent 7c298a30dee7608bf965a5082affaaafadcefa9b
# EXP-Topic riastradh-pr60144-virtiodmasync
virtio(4): Add missing BUS_DMASYNC_PREREAD operations.
And one missing BUS_DMASYNC_POSTREAD operation.
With this change, loads from vq->vq_used->flags (which occur
immediately after a transfer is submitted to test whether we need to
kick the host device) is separated by a PREREAD/POSTREAD cycle from
loads from vq->vq_used->idx (which occur when we think a transfer may
have completed, e.g. upon receiving an interrupt, to test whether it
has, in fact, completed).
Additionally, with this change, consecutive loads from
*vq->vq_avail_event are separated by a PREREAD/POSTREAD cycle.
Should fix virtio(4) issues on m68k and other related architectures
like mips and armv<7:
PR kern/60144: virtio(4) cache coherence issue
(And may make what I think is a mere potential performance
improvement on those and other architectures like x86 or aarch64,
because of the change to virtio_postpone_intr.)
Why?
This is tricky, and (despite a remark I made to the contrary in the
PR) I'm still not entirely convinced we have the rules articulated
sufficiently to prove the driver correct with or without the
changes, or to prove a bus_dma implementation correct.
But, based on the current mengarie of bus_dma implementations we
have, the crux of the issue is:
When we load twice from the same cache line affected by a DMA read
operation, if the DMA read operation may complete in the interim
between the two loads, it is necessary to have a PREREAD/POSTREAD
cycle between the loads -- not just POSTREAD.
For example, in a polling loop:
buf =3D alloc_buf()
bus_dmamap_load(map, buf)
bus_dmamap_sync(map, 0, offsetof(buf->ready), PREREAD)
bus_space_write(trigger_dma_at, map->dm_segs[0].ds_addr);
bus_dmamap_sync(map, 0, offsetof(buf->ready), POSTREAD)
while (!buf->ready) {
bus_dmamap_sync(map, 0, offsetof(buf->ready), PREREAD)
bus_dmamap_sync(map, 0, offsetof(buf->ready), POSTREAD)
}
...
The pair of PREREAD/POSTREAD syncs in the body of the loop must
_both_ appear in order for this to work; POSTREAD alone is not
enough.
On some architectures, such as x86, PREREAD/POSTREAD is stronger than
needed.
But on others -- such as m68k, mips, and armv<7 -- POSTREAD alone
isn't enough because it's a noop: these architectures assume that
after PREREAD, the driver doesn't load the memory in question until
the DMA read operation has completed, so flushing cached data in
PREREAD and doing nothing in POSTREAD is enough to ensure that the
driver's load isn't stale.
This scenario isn't really spelled out in the bus_dma(9) man page.
And bus_dma(9) doesn't even have a concept of cache lines. But, even
outside virtio(4), the issue has already bitten us with real
hardware:
1. USB ehci(4) TDs (transfer descriptors), back when they were packed
into shared cache lines. This required BUS_DMA_COHERENT, which is
-- by the letter of bus_dma(9) -- not supposed to affect
correctness.
Packing multiple TDs into a single shared cache line required
BUS_DMA_COHERENT because the host CPU and the device might
otherwise attempt to update the same cache line simultaneously,
leading one party to lose its updates; there was no sequence of
bus_dmamap_syncs that could make it work.
This was addressed by splitting the TDs across cache lines:
https://mail-index.NetBSD.org/source-changes/2024/09/23/msg153483.html
2. Polling loops, such as the following in fxp(4), where the driver
repeatedly loads from the same DMA buffer in memory, separated by
bus_dmamap_sync calls, until the DMA read operation is done and
the CPU loads a different value out of the buffer.
We don't need to define a notion of `the same cache line' in
bus_dma(9) for this case -- it's exactly the same location in
memory every time. The marked PREREAD was added to fix the driver
on sgimips O2.
1858 FXP_CDCONFIGSYNC(sc, BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
1859
1860 /*
1861 * Start the config command/DMA.
1862 */
1863 fxp_scb_wait(sc);
1864 CSR_WRITE_4(sc, FXP_CSR_SCB_GENERAL, sc->sc_cddma + FXP_CDCONFIGOF=
F);
1865 fxp_scb_cmd(sc, FXP_SCB_COMMAND_CU_START);
1866 /* ...and wait for it to complete. */
1867 for (i =3D 1000; i > 0; i--) {
1868 FXP_CDCONFIGSYNC(sc,
1869 BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
1870 status =3D le16toh(cbp->cb_status);
=3D> 1871 FXP_CDCONFIGSYNC(sc, BUS_DMASYNC_PREREAD);
1872 if ((status & FXP_CB_STATUS_C) !=3D 0)
1873 break;
1874 DELAY(1);
1875 }
https://nxr.NetBSD.org/xref/src/sys/dev/ic/i82557.c?r=3D1.162#1858
In the case of virtio(4), we have both of these issues:
1. Once the guest driver has published a transfer by a store to
vq->vq_avail->idx, it must load vq->vq_used->flags to decide
whether to kick the host:
1259 vq->vq_avail->idx =3D virtio_rw16(sc, vq->vq_avail_idx);
...
1275 vq_sync_uring_header(sc, vq, BUS_DMASYNC_POSTREAD);
1276 flags =3D virtio_rw16(sc, vq->vq_used->flags);
1277 if (!(flags & VRING_USED_F_NO_NOTIFY))
1278 sc->sc_ops->kick(sc, vq->vq_index);
Later on, to determine whether the transfer has completed, it must
load vq->vq_used->idx:
https://nxr.NetBSD.org/xref/src/sys/dev/pci/virtio.c?r=3D1.84#1259
600 bool
601 virtio_vq_is_enqueued(struct virtio_softc *sc, struct virtqueue *vq)
602 {
...
609 vq_sync_uring_header(sc, vq, BUS_DMASYNC_POSTREAD);
610 if (vq->vq_used_idx =3D=3D virtio_rw16(sc, vq->vq_used->idx))
611 return 0;
...
613 return 1;
614 }
https://nxr.NetBSD.org/xref/src/sys/dev/pci/virtio.c?r=3D1.84#600
But vq->vq_used->flags and vq->vq_used->idx are 16-bit integers
in a likely-aligned buffer, so they are almost certain to share a
cache line:
177 struct vring_used {
178 uint16_t flags;
179 uint16_t idx;
https://nxr.NetBSD.org/xref/src/sys/dev/pci/virtioreg.h?r=3D1.15#177
Fortunately, the guest driver and the host device are not competing
to update the cache line at the same time, so to fix virtio(4) on
the affected architectures, it is just a matter of issuing a
PREREAD/POSTREAD cycle between the load of vq->vq_used->flags and
vq->vq_used->idx, not just POSTREAD.
2. Virtio interrupt handlers use virtio_vq_is_enqueued to test
whether a transfer on a virtqueue has completed. If the interrupt
line is shared with other devices, a driver might test
virtio_vq_is_enqueued several times in a row from unrelated
interrupts before it returns true, and this boils down to:
*interrupt*
POSTREAD
load vq->vq_used->idx
*interrupt*
POSTREAD
load vq->vq_used->idx
*interrupt*
POSTREAD
load vq->vq_used->idx
...
On some architectures, since POSTREAD does nothing, the first
value of vq->vq_used->idx may remain cached through the subsequent
loads.
Note: this is related to, and probably necessary for, PR kern/60182:
ld@virtio sometimes hangs up. But there is also another issue there
with missing POSTREAD in a loop inside ld(4) -- not just PREREAD. To
be fixed in a separate commit.
diff -r 7c298a30dee7 -r 0f2733c1e2e8 sys/dev/pci/virtio.c
--- a/sys/dev/pci/virtio.c Sun May 17 01:31:04 2026 +0000
+++ b/sys/dev/pci/virtio.c Mon May 18 20:58:18 2026 +0000
@@ -607,8 +607,10 @@ virtio_vq_is_enqueued(struct virtio_soft
}
=20
vq_sync_uring_header(sc, vq, BUS_DMASYNC_POSTREAD);
- if (vq->vq_used_idx =3D=3D virtio_rw16(sc, vq->vq_used->idx))
+ if (vq->vq_used_idx =3D=3D virtio_rw16(sc, vq->vq_used->idx)) {
+ vq_sync_uring_header(sc, vq, BUS_DMASYNC_PREREAD);
return 0;
+ }
vq_sync_uring_payload(sc, vq, BUS_DMASYNC_POSTREAD);
return 1;
}
@@ -629,8 +631,10 @@ virtio_postpone_intr(struct virtio_softc
vq_sync_aring_used(vq->vq_owner, vq, BUS_DMASYNC_PREWRITE);
vq->vq_queued++;
=20
+ vq_sync_uring_header(sc, vq, BUS_DMASYNC_POSTREAD);
nused =3D (uint16_t)
(virtio_rw16(sc, vq->vq_used->idx) - vq->vq_used_idx);
+ vq_sync_uring_header(sc, vq, BUS_DMASYNC_PREREAD);
KASSERT(nused <=3D vq->vq_num);
=20
return nslots < nused;
@@ -717,8 +721,10 @@ virtio_start_vq_intr(struct virtio_softc
paravirt_membar_sync();
=20
vq_sync_uring_header(sc, vq, BUS_DMASYNC_POSTREAD);
- if (vq->vq_used_idx =3D=3D virtio_rw16(sc, vq->vq_used->idx))
+ if (vq->vq_used_idx =3D=3D virtio_rw16(sc, vq->vq_used->idx)) {
+ vq_sync_uring_header(sc, vq, BUS_DMASYNC_PREREAD);
return 0;
+ }
vq_sync_uring_payload(sc, vq, BUS_DMASYNC_POSTREAD);
return 1;
}
@@ -1269,11 +1275,13 @@ notify:
if (sc->sc_active_features & VIRTIO_F_RING_EVENT_IDX) {
vq_sync_uring_avail(sc, vq, BUS_DMASYNC_POSTREAD);
t =3D virtio_rw16(sc, *vq->vq_avail_event) + 1;
+ vq_sync_uring_avail(sc, vq, BUS_DMASYNC_PREREAD);
if ((uint16_t) (n - t) < (uint16_t) (n - o))
sc->sc_ops->kick(sc, vq->vq_index);
} else {
vq_sync_uring_header(sc, vq, BUS_DMASYNC_POSTREAD);
flags =3D virtio_rw16(sc, vq->vq_used->flags);
+ vq_sync_uring_header(sc, vq, BUS_DMASYNC_PREREAD);
if (!(flags & VRING_USED_F_NO_NOTIFY))
sc->sc_ops->kick(sc, vq->vq_index);
}
--=_iH22Mq8cEGadrlwrn55fhppMX7+AHtfW--
Home |
Main Index |
Thread Index |
Old Index