NetBSD-Users archive
Re: NetBSD 10.1 VPS became unresponsive
On Thu, Aug 28, 2025 at 11:53:44PM +0200, Christof Meerwald wrote:
> On Mon, Aug 25, 2025 at 05:46:07AM -0000, Michael van Elst wrote:
> > cmeerw%cmeerw.org@localhost (Christof Meerwald) writes:
> >
> > >So it does look like some kind of race condition, also I thought that
> > >should be handled by qemu by first setting the vq->vq_used->flags to 0
> > >and then checking vq->vq_used->idx again before relying on
> > >notifications being sent.
> >
> > Maybe a memory ordering issue then ?
>
> Yes, probably - I have added an mfence and not seen any issue since
> then (more than 2 days now, with my sync loop running in the
> background). BTW, this is on a AMD Ryzen 9 9950X 16-Core Processor
> (with 2 CPUs assigned to the VPS).
I think it's starting to make some sense now. In virtio.c we
essentially have

    vq->vq_avail->idx = virtio_rw16(sc, vq->vq_avail_idx);
    vq_sync_aring_header(sc, vq, BUS_DMASYNC_PREWRITE);
    vq_sync_uring_header(sc, vq, BUS_DMASYNC_POSTREAD);
    flags = virtio_rw16(sc, vq->vq_used->flags);

where the BUS_DMASYNC_PREWRITE is an sfence and the BUS_DMASYNC_POSTREAD
is an lfence, so we have:

    vq->vq_avail->idx = virtio_rw16(sc, vq->vq_avail_idx);
    x86_sfence();
    x86_lfence();
    flags = virtio_rw16(sc, vq->vq_used->flags);

And https://stackoverflow.com/a/50322404 argues that the store and
load can still be reordered here, i.e. an sfence followed by an lfence
is not a store-load barrier, and this appears to be exactly what I am
seeing. But I am only seeing this on an "AMD Ryzen 9 9950X 16-Core
Processor" so far - maybe they only started doing that fairly recently?
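
The ordering requirement above can be sketched in userland C11. This is
only an illustration under stated assumptions, not the virtio.c code:
struct sketch_ring, sketch_publish_and_check() and
SKETCH_USED_F_NO_NOTIFY are hypothetical names, the endianness handling
done by virtio_rw16() is left out, and atomic_thread_fence() stands in
for whatever full barrier the kernel would use. The point is that the
store to the avail index and the load of the used flags need a
store-load barrier between them.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two shared fields; not struct virtqueue. */
struct sketch_ring {
    _Atomic uint16_t avail_idx;  /* written by the driver, read by the device */
    _Atomic uint16_t used_flags; /* written by the device, read by the driver */
};

/* Assumed to mirror the virtio "no notify" used-ring flag (value 1). */
#define SKETCH_USED_F_NO_NOTIFY 1

/*
 * Publish new_idx, then check whether the device asked not to be notified.
 * Returns true if the caller should notify (kick) the device.
 */
static bool
sketch_publish_and_check(struct sketch_ring *r, uint16_t new_idx)
{
    /* Store: make the new avail index available to the device. */
    atomic_store_explicit(&r->avail_idx, new_idx, memory_order_release);

    /*
     * Full fence: orders the store above before the load below.
     * Separate store-only and load-only fences do not give this
     * store-load ordering.
     */
    atomic_thread_fence(memory_order_seq_cst);

    /* Load: read the flags the device may have updated concurrently. */
    uint16_t flags =
        atomic_load_explicit(&r->used_flags, memory_order_acquire);

    return (flags & SKETCH_USED_F_NO_NOTIFY) == 0;
}
```

The device side needs the mirror-image ordering (clear the flag, full
barrier, re-read the avail index); without a full barrier on both sides
each can miss the other's update and nobody sends a notification.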
Christof
--
https://cmeerw.org sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org xmpp:cmeerw at cmeerw.org