NetBSD-Bugs archive
kern/60144: virtio(4) cache coherence issue
>Number: 60144
>Category: kern
>Synopsis: virtio(4) cache coherence issue
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Mar 30 02:55:00 +0000 2026
>Originator: Tetsuya Isaki
>Release: NetBSD/virt68k 11.0_RC2
>Organization:
>Environment:
NetBSD 11.0_RC2 virt68k
>Description:
The kernel sometimes hangs with the message "Spurious interrupt on CPU ipl 5"
when accessing ld@virtio on virt68k.
This message means that an interrupt occurred but no handler claimed it.
Here, ipl 5 is the interrupt level to which virtio is assigned (and it may
be shared with other devices).
At least in virtio_vq_is_enqueued(), I suspect that cache line
invalidation is insufficient.
sys/dev/pci/virtio.c:
 601 virtio_vq_is_enqueued(struct virtio_softc *sc, struct virtqueue *vq)
 602 {
 603
 604         if (vq->vq_queued) {
 605                 vq->vq_queued = 0;
 606                 vq_sync_aring_all(sc, vq, BUS_DMASYNC_POSTWRITE);
 607         }
 608
 609         vq_sync_uring_header(sc, vq, BUS_DMASYNC_POSTREAD);
 610         if (vq->vq_used_idx == virtio_rw16(sc, vq->vq_used->idx))
 611                 return 0;
 612         vq_sync_uring_payload(sc, vq, BUS_DMASYNC_POSTREAD);
 613         return 1;
 614 }
The virtio device incremented its own index and wrote it to
vq->vq_used->idx. But when the interrupt was lost, the data cache
(un)luckily still held the previous value of vq->vq_used->idx, so the
CPU read the stale value from the cache(!).
The previous value of vq->vq_used->idx is the same as vq->vq_used_idx,
so the function returned 0 (which means the vq is empty), even though
the device had signaled that something was enqueued.
I think that vq_sync_uring_header(sc, vq, BUS_DMASYNC_PREREAD) is
necessary to invalidate the cache line before reading the fresh
vq->vq_used->idx (at line 610)?
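
The stale read described above can be illustrated with a small user-space
toy model. This is only a sketch and not kernel code: struct toy_mem and
all toy_* functions are hypothetical names invented for the illustration,
the "cache" is a single word, and toy_invalidate() merely stands in for
whatever sync operation discards the stale line on real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one 16-bit word of DMA memory and a one-entry data cache. */
struct toy_mem {
	uint16_t ram;     /* what the device writes (plays vq->vq_used->idx) */
	uint16_t cached;  /* the CPU's cached copy of that word */
	bool valid;       /* true while the cached copy is considered valid */
};

/* Device-side DMA write: only RAM changes; the cache is not notified. */
static void toy_dev_write(struct toy_mem *m, uint16_t v) { m->ram = v; }

/* CPU load: a hit returns the cached copy, a miss refills from RAM. */
static uint16_t toy_cpu_read(struct toy_mem *m)
{
	if (!m->valid) {
		m->cached = m->ram;
		m->valid = true;
	}
	return m->cached;
}

/* Assumed stand-in for a sync that invalidates the cache line. */
static void toy_invalidate(struct toy_mem *m) { m->valid = false; }

/* Mirrors the comparison at line 610, optionally invalidating first. */
static int toy_is_enqueued(struct toy_mem *m, uint16_t seen_idx,
    bool invalidate_first)
{
	if (invalidate_first)
		toy_invalidate(m);
	return seen_idx != toy_cpu_read(m);
}

/* Driver reads idx (caching it), device advances idx, driver checks. */
static int toy_scenario(bool invalidate_first)
{
	struct toy_mem m = { .ram = 3, .cached = 0, .valid = false };
	uint16_t seen_idx = toy_cpu_read(&m);  /* line is now cached */

	assert(seen_idx == 3);
	toy_dev_write(&m, 4);                  /* device enqueues one entry */
	return toy_is_enqueued(&m, seen_idx, invalidate_first);
}
```

In this model toy_scenario(false) returns 0 (the queue looks empty because
the cached index is stale), while toy_scenario(true) returns 1. Whether
BUS_DMASYNC_PREREAD is the operation that performs the invalidation on
virt68k is exactly the open question of this report.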
This is the only case that I was able to observe by tracing on an
emulator, but many other places look similar.
The following four observations also support this assumption.
- qemu (68040, without cache impl.) could not reproduce.
- nono (68030, with cache impl.) could reproduce.
- nono (68040, without cache impl. yet) could not reproduce.
- nono (68030, force disable data cache) could not reproduce.
>How-To-Repeat:
Boot NetBSD/virt68k on an emulator that implements a data cache.
Access ld@virtio. The problem is not 100% reproducible.
If something else updates the same cache line, the problem does not
reproduce. When it did reproduce, I typically encountered it within
20 repetitions of the following command pair.
# mount /dev/ld1e /mnt; umount /mnt
:
>Fix:
See above.