tech-net archive


Re: vioif(4) deadlock with softnet_lock



On Tue, May 31, 2022 at 5:36 AM Taylor R Campbell
<campbell+netbsd-tech-net%mumble.net@localhost> wrote:
>
> Every now and then an aarch64 VM I run in qemu hangs at boot.  It
> appears to be deadlocked, but I'm not sure exactly what the deadlock
> is.
>
> The symptom is that mdnsd holds softnet_lock and waits for
> ctrlq->ctrlq_inuse == FREE with cv_wait in vioif_ctrl_acquire -- all
> this is to update the `hardware' multicast filter via:
>
>    vioif_set_rx_filter
>    vioif_rx_filter
>    vioif_ioctl
>    if_mcast_op
>    in_delmulti
>    ip_freemoptions
>    in_pcbdetach
>    udp_detach_wrapper
>    soclose
>    soo_close
>    closef
>    fd_close
>    sys_close
>
> At this point, various softint threads are stuck waiting for
> softnet_lock, so, e.g. timers no longer fire.
>
> This already seems bad -- cv_wait while holding softnet_lock is
> generally forbidden, because cv_wait is forbidden in softint context,
> and acquiring any lock from softint context that another thread might
> hold across cv_wait is tantamount to doing cv_wait in softint context.
>
> (Changing it to cv_timedwait wouldn't help much, because all callouts
> in softclock may get blocked waiting for softnet_lock at which point
> the timeout would never fire.)
>
> But as far as I can tell, this only leads to actual deadlock if the
> hardware isn't delivering the PCI interrupt that leads vioif_ctrl_intr
> to set ctrlq->ctrlq_inuse := DONE and wake vioif_ctrl_acquire.
>
> 1. What could be going wrong here to trigger this deadlock?  Could
>    something be missing a virtio interrupt?

It looks to me like an interrupt issue rather than a deadlock, because I
can't find any circular dependency that would lead to one.
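To make the "no circular dependency" argument concrete, here is a toy wait-for graph in C. The three actors and their edges are my own simplification of the scenario quoted above, not NetBSD code, and the names (`struct wait_graph`, `build_reported_scenario`, `has_deadlock_cycle`) are hypothetical. An edge a -> b means "a cannot make progress until b acts", and a true deadlock requires a cycle.

```c
#include <stdbool.h>
#include <string.h>

/* Actors in the reported hang (a simplification, not kernel structures). */
enum actor { MDNSD, SOFTINT, VIRTIO_INTR, NACTORS };

/* waits_for[a][b] is true if actor a cannot proceed until actor b acts. */
struct wait_graph {
    bool waits_for[NACTORS][NACTORS];
};

void build_reported_scenario(struct wait_graph *g)
{
    memset(g, 0, sizeof *g);
    /* mdnsd sleeps in cv_wait (holding softnet_lock) until
     * vioif_ctrl_intr runs and wakes it. */
    g->waits_for[MDNSD][VIRTIO_INTR] = true;
    /* softint threads block on softnet_lock, which mdnsd holds. */
    g->waits_for[SOFTINT][MDNSD] = true;
    /* In the reported scenario nothing makes the interrupt handler
     * wait on either of them, so VIRTIO_INTR has no outgoing edge. */
}

/* Depth-first search; state: 0 = unvisited, 1 = on current path, 2 = done. */
static bool dfs(const struct wait_graph *g, int node, int state[NACTORS])
{
    state[node] = 1;
    for (int next = 0; next < NACTORS; next++) {
        if (!g->waits_for[node][next])
            continue;
        if (state[next] == 1)
            return true;    /* back edge: a cycle of waiters */
        if (state[next] == 0 && dfs(g, next, state))
            return true;
    }
    state[node] = 2;
    return false;
}

/* Returns true if any set of actors waits on each other in a cycle. */
bool has_deadlock_cycle(const struct wait_graph *g)
{
    int state[NACTORS] = {0};
    for (int n = 0; n < NACTORS; n++)
        if (state[n] == 0 && dfs(g, n, state))
            return true;
    return false;
}
```

With only the edges from the report there is no cycle: the chain ends at the interrupt, so a missing interrupt starves the softints without forming a deadlock. Adding a hypothetical edge such as VIRTIO_INTR -> MDNSD (an interrupt path that itself needed softnet_lock) would close the cycle.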

Have you ever seen the same issue on an amd64 VM?
It may be specific to aarch64.

  ozaki-r

>
> 2. Can either (a) vioif, or (b) in_pcbdetach / ip_freemoptions /
>    in_delmulti, be made to avoid cv_wait under softnet_lock?  Could
>    something in that stack safely release softnet_lock, for instance?
>    Or is it necessary to take softnet_lock in this path at all?  This
>    is likely to cause deadlocks in other network drivers, like
>    usbnet(9).
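One shape the "release softnet_lock" option in question 2 could take is to drop the outer lock only across the sleep. The following is a minimal user-space sketch under that assumption: pthread mutexes and condition variables stand in for mutex(9) and condvar(9), `outer` stands in for softnet_lock, and `struct ctrlq_model` with its two functions is hypothetical, not the vioif code.

```c
#include <pthread.h>

/* Hypothetical model of the control queue; states mirror ctrlq_inuse. */
enum ctrlq_state { CTRLQ_FREE, CTRLQ_INUSE, CTRLQ_DONE };

struct ctrlq_model {
    pthread_mutex_t lock;        /* protects inuse */
    pthread_cond_t cv;           /* broadcast when inuse changes */
    enum ctrlq_state inuse;
};

/*
 * Acquire the ctrlq, sleeping without holding `outer` (a stand-in for
 * softnet_lock).  Called and returns with `outer` held, but other threads
 * can take `outer` while we sleep, so callers must revalidate anything
 * `outer` protects after this returns.
 */
void ctrlq_model_acquire(struct ctrlq_model *cq, pthread_mutex_t *outer)
{
    pthread_mutex_unlock(outer);

    pthread_mutex_lock(&cq->lock);
    while (cq->inuse != CTRLQ_FREE)
        pthread_cond_wait(&cq->cv, &cq->lock);
    cq->inuse = CTRLQ_INUSE;
    pthread_mutex_unlock(&cq->lock);

    /* Retake outer only after dropping cq->lock, so the two locks are
     * never held at the same time. */
    pthread_mutex_lock(outer);
}

/* Mark the ctrlq free again and wake any waiters. */
void ctrlq_model_release(struct ctrlq_model *cq)
{
    pthread_mutex_lock(&cq->lock);
    cq->inuse = CTRLQ_FREE;
    pthread_cond_broadcast(&cq->cv);
    pthread_mutex_unlock(&cq->lock);
}
```

The cost is that anything softnet_lock protects may have changed while it was dropped, so the caller (somewhere in the in_pcbdetach / ip_freemoptions / in_delmulti path) would have to revalidate its state afterwards; whether that is safe there is exactly the open question.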

