Re: panic: _bus_virt_to_bus for vioif on GCE with GENERIC kernel



On Sun, Jan 31, 2021 at 03:32:24PM +0100, Reinoud Zandijk wrote:
> Dear Paul,
> 
> On Sat, Jan 30, 2021 at 10:32:13PM +1100, Paul Ripke wrote:
> > On Sat, Jan 30, 2021 at 12:37:31AM +0100, Reinoud Zandijk wrote:
> > > On Thu, Jan 28, 2021 at 11:56:30PM +1100, Paul Ripke wrote:
> > > > Just tried running a newly built kernel on a GCE instance, and ran into
> > > > this panic. The previously running kernel is 9.99.73 from back around
> > > > October last year.
> 
> > Confirmed that a kernel built immediately prior to the following commit
> > works, and fails after this commit:
> > https://github.com/NetBSD/src/commit/7bca0bcf21c9b3465a6ee4eef6c01be32c9de1eb
> 
> That's good to know; I found a bug in memory allocation that might explain
> your panic and committed a fix for it. Could you please try out -current and
> see if the problem persists?

Sorry, I appear to see the same behaviour with the patch:

[   1.0297881] virtio1 at pci0 dev 4 function 0
[   1.0297881] virtio1: network device (rev. 0x00)
[   1.0297881] vioif0 at virtio1: features: 0x20030020<EVENT_IDX,CTRL_VQ,STATUS,MAC>
[   1.0297881] vioif0: Ethernet address 42:01:0a:98:00:02
[   1.0297881] panic: _bus_virt_to_bus
[   1.0297881] cpu0: Begin traceback...
[   1.0297881] vpanic() at netbsd:vpanic+0x156
[   1.0297881] snprintf() at netbsd:snprintf
[   1.0297881] _bus_dma_alloc_bouncebuf() at netbsd:_bus_dma_alloc_bouncebuf
[   1.0297881] bus_dmamap_load() at netbsd:bus_dmamap_load+0x9c
[   1.0297881] vioif_dmamap_create_load.constprop.0() at netbsd:vioif_dmamap_create_load.constprop.0+0x7e
[   1.0297881] vioif_attach() at netbsd:vioif_attach+0x1085
[   1.0297881] config_attach_loc() at netbsd:config_attach_loc+0x17e
[   1.0297881] virtio_pci_rescan() at netbsd:virtio_pci_rescan+0x48
[   1.0297881] virtio_pci_attach() at netbsd:virtio_pci_attach+0x23a
[   1.0297881] config_attach_loc() at netbsd:config_attach_loc+0x17e
[   1.0297881] pci_probe_device() at netbsd:pci_probe_device+0x585
[   1.0297881] pci_enumerate_bus() at netbsd:pci_enumerate_bus+0x1b5
[   1.0297881] pcirescan() at netbsd:pcirescan+0x4e
[   1.0297881] pciattach() at netbsd:pciattach+0x186
[   1.0297881] config_attach_loc() at netbsd:config_attach_loc+0x17e
[   1.0297881] mp_pci_scan() at netbsd:mp_pci_scan+0x9e
[   1.0297881] amd64_mainbus_attach() at netbsd:amd64_mainbus_attach+0x236
[   1.0297881] mainbus_attach() at netbsd:mainbus_attach+0x84
[   1.0297881] config_attach_loc() at netbsd:config_attach_loc+0x17e
[   1.0297881] cpu_configure() at netbsd:cpu_configure+0x38
[   1.0297881] main() at netbsd:main+0x32c
[   1.0297881] cpu0: End traceback...
[   1.0297881] fatal breakpoint trap in supervisor mode
[   1.0297881] trap type 1 code 0 rip 0xffffffff80221a35 cs 0x8 rflags 0x202 cr2 0 ilevel 0x8 rsp 0xffffffff81cfa5d0
[   1.0297881] curlwp 0xffffffff81886e40 pid 0.0 lowest kstack 0xffffffff81cf52c0

However, forcing the full-size virtio_net_hdr results in a working kernel!
E.g., the following hack:

diff --git a/sys/dev/pci/if_vioif.c b/sys/dev/pci/if_vioif.c
index 6482f7f60742..8ff187d33a48 100644
--- a/sys/dev/pci/if_vioif.c
+++ b/sys/dev/pci/if_vioif.c
@@ -863,7 +863,8 @@ vioif_attach(device_t parent, device_t self, void *aux)
        aprint_normal_dev(self, "Ethernet address %s\n",
            ether_sprintf(sc->sc_mac));
 
-       if (features & (VIRTIO_NET_F_MRG_RXBUF | VIRTIO_F_VERSION_1)) {
+       // if (features & (VIRTIO_NET_F_MRG_RXBUF | VIRTIO_F_VERSION_1)) {      // XXX stix
+       if (1) {
                sc->sc_hdr_size = sizeof(struct virtio_net_hdr);
        } else {
                sc->sc_hdr_size = offsetof(struct virtio_net_hdr, num_buffers);

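For reference on what that hack changes: the dmesg line above shows neither
MRG_RXBUF nor VERSION_1 among the negotiated features, so the unpatched driver
takes the short-header path, and the two header layouts differ only by the
trailing num_buffers field: 10 vs. 12 bytes. A minimal standalone sketch
(struct layout as in the virtio spec; the fixed-width types and the prints are
mine, not the driver's):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Layout per the virtio spec; num_buffers is only part of the header
 * when MRG_RXBUF or VERSION_1 has been negotiated.
 */
struct virtio_net_hdr {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
	uint16_t num_buffers;
};

int
main(void)
{
	/* legacy header: everything up to, but not including, num_buffers */
	printf("legacy: %zu bytes\n",
	    offsetof(struct virtio_net_hdr, num_buffers));	/* 10 */
	/* v1.0 / mergeable-rxbuf header: the whole struct */
	printf("modern: %zu bytes\n",
	    sizeof(struct virtio_net_hdr));			/* 12 */
	return 0;
}

So the hack simply makes vioif always use the full 12-byte header, regardless
of which features the device negotiated.
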
Does that give any hints?

> > > Could you A) test with virtio v1 PCI devices, i.e. without legacy, and
> > > if that fails too, could you B) test with src/sys/dev/pci/if_vioif.c:832
> > > commented out and see if that makes a difference? That's a new virtio
> > > 1.0 feature that was apparently negotiated; it should work in
> > > transitional devices and should not be accepted in older ones. It could
> > > be that GCE is making a mistake there, but negotiating EVENT_IDX shifts
> > > registers, so it has a big impact if it goes wrong.
> > 
> > A) Erm, how? I read through some of the source and saw mentions of v1.0
> > vs v0.9, but didn't see a way of just disabling legacy support.
> 
> Legacy support has to be disabled in the hypervisor (like GCE), as it needs
> to pass a different PCI product number. In QEMU it's a property of each
> virtio PCI device, but in GCE it might be global.

Ah, I had wondered if that was the case. I haven't seen anything in the GCE
configs to control this; Googling for answers is also made awkward given
the ambiguous "PCI" acronym.
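
(For comparison, for anyone reproducing this outside GCE: in QEMU I believe
legacy vs. modern is controlled by per-device properties, something like

    -device virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off

for a modern-only device -- though I haven't double-checked those property
names against a current QEMU.)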

-- 
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.

