kern/60145: vioif(4) panic on NetBSD/virt68k 11.0_RC2

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/60145: vioif(4) panic on NetBSD/virt68k 11.0_RC2
From: "isaki%pastel-flower.jp@localhost via gnats" <gnats-admin%NetBSD.org@localhost>
Date: Mon, 30 Mar 2026 03:00:01 +0000 (UTC)

>Number:         60145
>Category:       kern
>Synopsis:       vioif(4) panic on NetBSD/virt68k 11.0_RC2
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Mar 30 03:00:00 +0000 2026
>Originator:     Tetsuya Isaki
>Release:        NetBSD/virt68k 11.0_RC2
>Organization:
>Environment:
NetBSD 11.0_RC2 virt68k
>Description:
A kernel panic occurred shortly after booting virt68k, without any
user interaction.  Here is the hand-copied panic message.

trap: bad kernel read/write access at 0x28
trap type 8, code = 0x4025105, v = 0x28
kernel program counter = 0x1e2ca4
kernel: MMU fault trap
pid = 0, lid = 3, pc = 001E2CA4, ps = 2500 , sfc = 1, dfc = 1
dreg: 0000005A 00002004 00000001 00458C00 00431C40 00000100 02F9DF28 02F9DF2C
areg: 00000000 00458C00 00468400 004337C0 001E07C6 001BD824 001E2B8A FFEFFFFC
:

Here are the corresponding C and assembly sources (with comments)
around PC=0x1e2ca4.

001e2c24 <vioif_rx_deq_locked>:
 :
  1e2c86:       4878 0002       pea 2 <E1>          ; BUS_DMASYNC_POSTREAD
  1e2c8a:       2200            movel %d0,%d1
  1e2c8c:       e989            lsll #4,%d1
  1e2c8e:       d2ab 0010       addl %a3@(16),%d1
  1e2c92:       2f01            movel %d1,%sp@-     ; map
  1e2c94:       2f00            movel %d0,%sp@-     ; slot
  1e2c96:       2f03            movel %d3,%sp@-     ; vq
  1e2c98:       2f04            movel %d4,%sp@-     ; vsc
  1e2c9a:       4e96            jsr %a6@            ; vioif_net_dequeue_commit
  1e2c9c:       202f 0048       movel %sp@(72),%d0  ; len 
  1e2ca0:       90aa 0014       subl %a2@(20),%d0   ; minus sc->sc_hdr_size
  1e2ca4:       2140 0028       movel %d0,%a0@(40)  ; assign to two vars. <==
  1e2ca8:       2140 0010       movel %d0,%a0@(16)  ; assign to two vars.

sys/dev/pci/if_vioif.c:
 1810 vioif_rx_deq_locked(struct vioif_softc *sc, struct virtio_softc *vsc,
 1811     struct vioif_netqueue *netq, u_int limit, size_t *ndeqp)
 1812 {
 :
 1835         if (virtio_dequeue(vsc, vq, &slot, &len) != 0)
 1836             break;
 1837
 1838         map = &netq->netq_maps[slot];
 1839         KASSERT(map->vnm_mbuf != NULL);
 1840         m = vioif_net_dequeue_commit(vsc, vq, slot,
 1841             map, BUS_DMASYNC_POSTREAD);
 1842         KASSERT(m != NULL);
 1843
 1844         m->m_len = m->m_pkthdr.len = len - sc->sc_hdr_size;

If vioif_net_dequeue_commit() returns NULL at line 1840, the store at
line 1844 causes this panic.  In fact, %a0 (the first areg entry) was 0
according to the panic message, and 0 + 40 matches the faulting
address 0x28.
I think the KASSERT at 1842 means "if virtio_dequeue() returns 0
(success, an entry can be dequeued), vioif_net_dequeue_commit() must
succeed".  If so, I suspect that virtio_dequeue() accidentally returns 0
even though vq was just dequeued and is empty.
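
The address arithmetic behind this reading can be checked with a small
userland sketch.  It is not kernel code: struct toy_mbuf and
toy_effective_address() are hypothetical names, and the padding is an
assumption chosen only so that the two members land on the displacements
seen in the disassembly (16 and 40).  It does not describe the real
struct mbuf layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct mbuf.  Only the two displacements
 * seen in the disassembly (16 and 40) are modelled; the padding is an
 * assumption, not the real layout.
 */
struct toy_mbuf {
	uint8_t	pad0[16];
	int32_t	m_len;		/* stored via %a0@(16) */
	uint8_t	pad1[20];
	int32_t	m_pkthdr_len;	/* stored via %a0@(40) */
};

/* Compile-time check that the toy layout matches the disassembly. */
static_assert(offsetof(struct toy_mbuf, m_len) == 16, "m_len displacement");
static_assert(offsetof(struct toy_mbuf, m_pkthdr_len) == 40,
    "m_pkthdr_len displacement");

/* Address the CPU accesses for base register value `base` plus `disp`. */
static uintptr_t
toy_effective_address(uintptr_t base, size_t disp)
{
	return base + disp;
}
```

With a base of 0 (the value of %a0 in the panic message), the first
store of line 1844 targets 0 + 40 = 0x28, which is the reported fault
address.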

I'm not sure, but how about this?

sys/dev/pci/virtio.c:
 1322 virtio_dequeue(struct virtio_softc *sc, struct virtqueue *vq,
 1323     int *slotp, int *lenp)
 1324 {
 1325     uint16_t slot, usedidx;
 1326

  <== Is vq_sync_uring_all(PREREAD) needed here to invalidate the cache
      line holding vq->vq_used->idx (and the vq->vq_used->ring[].id
      payload that follows it)?

 1327     if (vq->vq_used_idx == virtio_rw16(sc, vq->vq_used->idx))
 1328         return ENOENT;
 1329     mutex_enter(&vq->vq_uring_lock);
 1330     usedidx = vq->vq_used_idx++;
 1331     mutex_exit(&vq->vq_uring_lock);
 1332     usedidx %= vq->vq_num;
 1333     slot = virtio_rw32(sc, vq->vq_used->ring[usedidx].id);
 1334
 1335     if (vq->vq_descx[slot].use_indirect)
 1336         vq_sync_indirect(sc, vq, slot, BUS_DMASYNC_POSTWRITE);
 1337
 1338     if (slotp)
 1339         *slotp = slot;
 1340     if (lenp)
 1341         *lenp = virtio_rw32(sc, vq->vq_used->ring[usedidx].len);
 1342
 1343     return 0;
 1344 }
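
To make the suspected failure mode concrete, the following is a toy
userland model, not NetBSD code.  Every toy_* name is hypothetical, and
the scenario is an assumption rather than a confirmed diagnosis: the CPU
sees a fresh used->idx while its cached copy of ring[].id is still
stale, so the dequeue reports success but returns an old slot number.
In the driver, such a slot would no longer have an mbuf attached.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NUM 4	/* toy ring size; stands in for vq->vq_num */

/* Used ring: one copy in RAM (device side), one in the CPU data cache. */
struct toy_used {
	uint16_t idx;
	uint32_t id[TOY_NUM];
};

struct toy_vq {
	struct toy_used ram;	/* what the device wrote */
	struct toy_used cache;	/* what the CPU reads */
	uint16_t used_idx;	/* driver's consumer index (vq_used_idx) */
};

/* Device completes a buffer: it updates RAM only, never the CPU cache. */
static void
toy_device_complete(struct toy_vq *vq, uint32_t slot)
{
	vq->ram.id[vq->ram.idx % TOY_NUM] = slot;
	vq->ram.idx++;
}

/* Invalidate only the cache line assumed to hold used->idx. */
static void
toy_sync_header(struct toy_vq *vq)
{
	vq->cache.idx = vq->ram.idx;
}

/* Invalidate the whole used ring: idx and the ring[] payload together. */
static void
toy_sync_all(struct toy_vq *vq)
{
	vq->cache = vq->ram;
}

/* Same shape as virtio_dequeue(): 0 and *slotp on success, -1 if empty. */
static int
toy_dequeue(struct toy_vq *vq, uint32_t *slotp)
{
	uint16_t usedidx;

	if (vq->used_idx == vq->cache.idx)
		return -1;
	usedidx = vq->used_idx++ % TOY_NUM;
	assert(usedidx < TOY_NUM);
	*slotp = vq->cache.id[usedidx];
	return 0;
}
```

In this model, calling toy_sync_header() after the device completes
slot 3 makes toy_dequeue() return 0 with the stale slot 0, while
calling toy_sync_all() first yields slot 3.  Whether the real used ring
can end up in this state on virt68k is exactly the open question above.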

>How-To-Repeat:
Boot NetBSD/virt68k on an emulator that implements a data cache.
Bring up vioif(4) and wait.
>Fix:
See above.
