NetBSD-Bugs archive


Re: kern/60144: virtio(4) cache coherence issue



The following reply was made to PR kern/60144; it has been noted by GNATS.

From: Robert Elz <kre%munnari.OZ.AU@localhost>
To: Jason Thorpe <thorpej%me.com@localhost>
Cc: gnats-bugs%netbsd.org@localhost, kern-bug-people%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/60144: virtio(4) cache coherence issue
Date: Mon, 30 Mar 2026 23:41:28 +0700

     Date:        Sun, 29 Mar 2026 21:39:43 -0700
     From:        Jason Thorpe <thorpej%me.com@localhost>
     Message-ID:  <52E223DA-EF6B-4266-836F-29955CE7FAAC%me.com@localhost>
 
   | This actually brings up an interesting philosophical question:
   | What does DMA coherency mean in the context of virtio?
 
 Well, as you suggest, but didn't explicitly say, caching in a virtual
 device can't logically exist to make anything run faster, so the obvious
 interpretation would be that it is intended to assist in finding caching
 bugs.
 
 If I were implementing something like that (and I certainly am not), I'd
 be modelling all the memory the process generally sees (RAM) as cache,
 ie: 100% cache coverage -- all ram is always in "cache".
 
 Then I'd model real ram as only ever used by operations that explicitly
 don't use the cache - like device DMA, but possibly also interrupt vector
 fetch - anything which the modelled hardware doesn't pass through the cache
 (and for devices with multiple cache levels, that means all of them).
 
 For that I'd only allocate real memory for the non-cache pages which are
 actually used - when one is first allocated, I'd simply copy the "cached"
 memory into it (as if at that instant, for whatever reason, the cache were
 completely consistent).  After that, the only operations which would
 ever move data between the external memory (where it has been allocated) and
 the cache memory would be those which explicitly affect the cache in the
 hardware design: operations which flush a cache line to ram (in real hardware)
 would copy that amount of data from the cache to the external ram, and
 operations which invalidate the cache (without writeback) would simply
 copy the external ram into the cache memory (as if the cache had been
 invalidated, and then something magically referenced the relevant cache
 lines, and copied the ram data into the cache).  But both only when there
 is an actual ram page allocated; when there isn't, the cache ops would
 just be no-ops.
 
 When modelling a multi-cpu system with caching, I'd allocate entirely
 separate memory for each cpu (its cache), but share memory for the external
 ram - so any ram intended to be visible to multiple CPUs needs the cache
 flush logic implemented properly - in this case I'd probably allocate the
 ram backing pages for anything which might be shared between cpus, as well
 as when used for dma etc.
 
 So, I'm not sure that the general concept is quite as absurd as it seems
 at first glance - it might help locate all kinds of caching issues such
 as the one Taylor described.
 
 kre
 
 ps: everyone, when replying to gnats messages, it is a good idea, I
 believe, to delete the gnats-admin@ address from the destination fields.
 Having it there serves no useful purpose as best I understand things.
 
 