tech-net archive
Re: vioif(4) deadlock with softnet_lock
On Tue, May 31, 2022 at 5:36 AM Taylor R Campbell
<campbell+netbsd-tech-net%mumble.net@localhost> wrote:
>
> Every now and then an aarch64 VM I run in qemu hangs at boot. It
> appears to be deadlocked, but I'm not sure exactly what the deadlock
> is.
>
> The symptom is that mdnsd holds softnet_lock and waits for
> ctrlq->ctrlq_inuse == FREE with cv_wait in vioif_ctrl_acquire -- all
> this is to update the `hardware' multicast filter via:
>
>     vioif_set_rx_filter
>     vioif_rx_filter
>     vioif_ioctl
>     if_mcast_op
>     in_delmulti
>     ip_freemoptions
>     in_pcbdetach
>     udp_detach_wrapper
>     soclose
>     soo_close
>     closef
>     fd_close
>     sys_close
>
> At this point, various softint threads are stuck waiting for
> softnet_lock, so, e.g. timers no longer fire.
>
> This already seems bad -- cv_wait while holding softnet_lock is
> generally forbidden, because cv_wait is forbidden in softint context,
> and acquiring any lock from softint context that another thread might
> hold across cv_wait is tantamount to doing cv_wait in softint context.
>
> (Changing it to cv_timedwait wouldn't help much, because all callouts
> in softclock may get blocked waiting for softnet_lock at which point
> the timeout would never fire.)
>
> But as far as I can tell, this only leads to actual deadlock if the
> hardware isn't delivering the PCI interrupt that leads vioif_ctrl_intr
> to set ctrlq->ctrlq_inuse := DONE and wake vioif_ctrl_acquire.
>
> 1. What could be going wrong here to trigger this deadlock? Could
> something be missing a virtio interrupt?
It looks to me like an interrupt issue rather than a deadlock, because
I can't find any circular dependency that would lead to one.
Have you ever seen the same issue on an amd64 VM?
It may be specific to aarch64.
ozaki-r
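
To make the "no circular dependency" reasoning concrete, here is a
minimal sketch in plain C. It is not NetBSD kernel code. It models who
waits for whom in the reported hang as a tiny wait-for graph and checks
it for cycles. The actor names, the NOBODY marker and the helper
functions are all hypothetical, and the edges are only my reading of
the report: softints wait for mdnsd (which holds softnet_lock), and
mdnsd waits for the control-queue interrupt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actors; illustrative names, not kernel symbols. */
enum actor { MDNSD, SOFTINT, CTRL_INTR, NACTORS };
#define NOBODY (-1)

/*
 * waits_for[a] is the actor that must make progress before a can run,
 * or NOBODY if a does not wait on another software actor.
 * Returns true if following the chain from start leads back to start.
 */
bool
on_wait_cycle(const int waits_for[NACTORS], int start)
{
	int cur = start;

	assert(start >= 0 && start < NACTORS);

	/* A cycle through start returns to it within NACTORS hops. */
	for (int hops = 0; hops < NACTORS; hops++) {
		cur = waits_for[cur];
		if (cur == NOBODY)
			return false;
		if (cur == start)
			return true;
	}
	return false;
}

/* True if any actor sits on a cycle, i.e. a classic deadlock. */
bool
any_wait_cycle(const int waits_for[NACTORS])
{
	for (int a = 0; a < NACTORS; a++)
		if (on_wait_cycle(waits_for, a))
			return true;
	return false;
}

/* The hang as described in the report (an assumed reading of it). */
bool
reported_hang_is_cyclic(void)
{
	const int waits_for[NACTORS] = {
		[MDNSD]     = CTRL_INTR, /* cv_wait for ctrlq_inuse == FREE */
		[SOFTINT]   = MDNSD,     /* blocked on softnet_lock */
		[CTRL_INTR] = NOBODY,    /* driven by the device, not a lock */
	};

	return any_wait_cycle(waits_for);
}
```

In this model reported_hang_is_cyclic() returns false: the chain ends
at the interrupt, so everything behind it hangs only if that interrupt
never arrives.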
>
> 2. Can either (a) vioif, or (b) in_pcbdetach / ip_freemoptions /
> in_delmulti, be made to avoid cv_wait under softnet_lock? Could
> something in that stack safely release softnet_lock, for instance?
> Or is it necessary to take softnet_lock in this path at all? This
> is likely to cause deadlocks in other network drivers, like
> usbnet(9).
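
As a rough user-space illustration of what question 2 is asking, the
following POSIX threads sketch compares holding a lock across a wait
with dropping it first. It is not kernel code and it does not show
that dropping softnet_lock in this path is safe; that remains the open
question. The mutex named softnet only stands in for softnet_lock, and
softint_probe and softint_runs_during_wait are hypothetical names. The
wait for the device is simulated by the window in which the probe
thread runs.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for softnet_lock (assumption: a plain mutex is close enough). */
static pthread_mutex_t softnet = PTHREAD_MUTEX_INITIALIZER;

/* Plays the role of a softint: tries to take the lock without blocking. */
static void *
softint_probe(void *arg)
{
	bool *got = arg;

	if (pthread_mutex_trylock(&softnet) == 0) {
		*got = true;
		pthread_mutex_unlock(&softnet);
	} else {
		*got = false;
	}
	return NULL;
}

/*
 * Simulate the window in which the closing thread waits for the
 * control queue.  Returns true if the softint stand-in could take
 * the lock during that window.
 */
bool
softint_runs_during_wait(bool drop_lock_first)
{
	pthread_t t;
	bool got = false;

	pthread_mutex_lock(&softnet);
	if (drop_lock_first)
		pthread_mutex_unlock(&softnet);

	/* The wait for the device would happen here; the probe runs meanwhile. */
	if (pthread_create(&t, NULL, softint_probe, &got) == 0)
		pthread_join(t, NULL);

	if (drop_lock_first)
		pthread_mutex_lock(&softnet);	/* retake before continuing */
	pthread_mutex_unlock(&softnet);

	return got;
}
```

softint_runs_during_wait(false) returns false because the probe cannot
get the lock, while softint_runs_during_wait(true) returns true. On
some systems this needs to be built with -pthread.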