NetBSD-Users archive

Re: NetBSD 10.1 VPS became unresponsive



On Thu, Aug 28, 2025 at 11:53:44PM +0200, Christof Meerwald wrote:
> On Mon, Aug 25, 2025 at 05:46:07AM -0000, Michael van Elst wrote:
> > cmeerw%cmeerw.org@localhost (Christof Meerwald) writes:
> > 
> > >So it does look like some kind of race condition, also I thought that
> > >should be handled by qemu by first setting the vq->vq_used->flags to 0
> > >and then checking vq->vq_used->idx again before relying on
> > >notifications being sent.
> > 
> > Maybe a memory ordering issue then ?
> 
> Yes, probably - I have added an mfence and not seen any issue since
> then (more than 2 days now, with my sync loop running in the
> background). BTW, this is on a AMD Ryzen 9 9950X 16-Core Processor
> (with 2 CPUs assigned to the VPS).

I think it's starting to make some sense now. In virtio.c we
essentially have

                vq->vq_avail->idx = virtio_rw16(sc, vq->vq_avail_idx);
                vq_sync_aring_header(sc, vq, BUS_DMASYNC_PREWRITE);

                vq_sync_uring_header(sc, vq, BUS_DMASYNC_POSTREAD);
                flags = virtio_rw16(sc, vq->vq_used->flags);

where BUS_DMASYNC_PREWRITE is an sfence and BUS_DMASYNC_POSTREAD is
an lfence, so we effectively have:

                vq->vq_avail->idx = virtio_rw16(sc, vq->vq_avail_idx);
                x86_sfence();

                x86_lfence();
                flags = virtio_rw16(sc, vq->vq_used->flags);

And https://stackoverflow.com/a/50322404 argues that the store and
load can still be reordered here, because neither sfence nor lfence
orders an earlier store against a later load. This appears to be
exactly what I am seeing. But so far I am only seeing it on an "AMD
Ryzen 9 9950X 16-Core Processor" - maybe they only started doing that
fairly recently?
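
To make the ordering requirement concrete, here is a minimal
user-space sketch of the pattern with a full barrier between the store
and the load. It is not the virtio.c code: struct mock_vq and
publish_and_read_flags() are hypothetical names, and C11
atomic_thread_fence(memory_order_seq_cst) stands in for the mfence
mentioned above (on x86, compilers typically emit mfence or a locked
instruction for it).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared ring fields; not the NetBSD structs. */
struct mock_vq {
	_Atomic uint16_t avail_idx;	/* written by driver, read by device */
	_Atomic uint16_t used_flags;	/* written by device, read by driver */
};

/*
 * Publish a new avail index, then read the device's flags.
 * The seq_cst fence is a full barrier, so the load of used_flags
 * cannot be satisfied before the store to avail_idx is visible.
 * An sfence/lfence pair in the same place would not guarantee that.
 */
static uint16_t
publish_and_read_flags(struct mock_vq *vq, uint16_t new_idx)
{
	assert(vq != NULL);

	atomic_store_explicit(&vq->avail_idx, new_idx, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* StoreLoad barrier */
	return atomic_load_explicit(&vq->used_flags, memory_order_relaxed);
}
```

The device side needs the mirror image (store flags, full barrier,
load idx), otherwise both sides can read stale values and each
conclude that the other will send the notification.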


Christof

-- 
https://cmeerw.org                             sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org
