tech-net archive


vioif(4) deadlock with softnet_lock



Every now and then an aarch64 VM I run in qemu hangs at boot.  It
appears to be deadlocked, but I'm not sure exactly what the deadlock
is.

The symptom is that mdnsd holds softnet_lock while sleeping in cv_wait
in vioif_ctrl_acquire, waiting for ctrlq->ctrlq_inuse == FREE.  It gets
there while updating the `hardware' multicast filter, via this call
stack (innermost frame first):

   vioif_set_rx_filter
   vioif_rx_filter
   vioif_ioctl
   if_mcast_op
   in_delmulti
   ip_freemoptions
   in_pcbdetach
   udp_detach_wrapper
   soclose
   soo_close
   closef
   fd_close
   sys_close
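A user-space model of the acquire/complete/release cycle may make the dependency easier to see.  It uses POSIX threads in place of mutex(9) and condvar(9).  The struct, the INUSE state and the function names are hypothetical; only ctrlq_inuse, FREE and DONE come from the report.  The model assumes that a caller waiting for FREE can proceed only after the completion interrupt marks the in-flight command DONE and its owner releases the slot.

```c
#include <pthread.h>

/* Hypothetical slot states: FREE and DONE are from the report,
 * INUSE is an assumed intermediate state. */
enum ctrlq_state { FREE, INUSE, DONE };

struct ctrlq {
	pthread_mutex_t lock;    /* stands in for the queue's wait lock */
	pthread_cond_t cv;       /* stands in for the condvar used by cv_wait */
	enum ctrlq_state inuse;  /* models ctrlq->ctrlq_inuse */
};

/* Model of vioif_ctrl_acquire: sleep until the slot is FREE, then claim
 * it.  A caller that already holds softnet_lock keeps every softint that
 * needs softnet_lock waiting for as long as this loop sleeps. */
static void ctrlq_acquire(struct ctrlq *q)
{
	pthread_mutex_lock(&q->lock);
	while (q->inuse != FREE)
		pthread_cond_wait(&q->cv, &q->lock);
	q->inuse = INUSE;
	pthread_mutex_unlock(&q->lock);
}

/* Model of vioif_ctrl_intr: mark the command DONE and wake waiters. */
static void ctrlq_intr(struct ctrlq *q)
{
	pthread_mutex_lock(&q->lock);
	q->inuse = DONE;
	pthread_cond_broadcast(&q->cv);
	pthread_mutex_unlock(&q->lock);
}

/* The owner waits here for its command to complete. */
static void ctrlq_wait_done(struct ctrlq *q)
{
	pthread_mutex_lock(&q->lock);
	while (q->inuse != DONE)
		pthread_cond_wait(&q->cv, &q->lock);
	pthread_mutex_unlock(&q->lock);
}

/* Give the slot back so the next ctrlq_acquire can proceed. */
static void ctrlq_release(struct ctrlq *q)
{
	pthread_mutex_lock(&q->lock);
	q->inuse = FREE;
	pthread_cond_broadcast(&q->cv);
	pthread_mutex_unlock(&q->lock);
}
```

Called in order from one thread, ctrlq_acquire, ctrlq_intr, ctrlq_wait_done and ctrlq_release leave the slot FREE again.  If ctrlq_intr never runs, the owner never gets past ctrlq_wait_done, and a second caller in ctrlq_acquire (mdnsd, in this report) sleeps indefinitely with whatever locks it came in with.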

At this point, various softint threads are stuck waiting for
softnet_lock, so, for example, timers (callouts) no longer fire.

This already seems bad: sleeping in cv_wait while holding softnet_lock
is generally forbidden.  cv_wait is forbidden in softint context, and a
softint that acquires a lock which another thread may hold across
cv_wait is, in effect, doing cv_wait itself, since it cannot run until
that thread is woken.
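The argument can be phrased as a wait-for graph: a softint waiting for softnet_lock is really waiting for its holder, and hence for whatever the holder sleeps on.  A deadlock is a chain that loops back on itself, for instance if the holder's wakeup were delivered by a callout that the blocked softint would have to run.  The following hypothetical sketch (struct waiter and deadlocked are invented for illustration) assumes each party blocks on exactly one other:

```c
#include <stddef.h>

/* A party in a hypothetical wait-for graph.  Each party blocks on at
 * most one other, so one successor index per node is enough. */
struct waiter {
	const char *name;
	int waits_for;  /* index of the node this one waits on, or -1 if runnable */
};

/* Follow the chain from `start`.  Returns 0 if it reaches a runnable
 * party.  In a graph of n nodes an acyclic chain ends within n hops, so
 * still being blocked after more than n hops means the chain loops back
 * on itself: a deadlock. */
static int deadlocked(const struct waiter *g, size_t n, int start)
{
	int i = start;

	for (size_t steps = 0; steps <= n; steps++) {
		if (i < 0)
			return 0;
		i = g[i].waits_for;
	}
	return 1;
}
```

For example, with nodes {mdnsd holding softnet_lock, ctrlq completion, softclock softint}, where mdnsd waits on the completion and the softint waits on mdnsd, a completion delivered by a hardware interrupt (waits_for = -1) means the softints merely stall.  Pointing the completion node at the softclock node instead closes the loop, and deadlocked reports it.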

(Changing the cv_wait to cv_timedwait wouldn't help much, because all
callouts in softclock may get blocked waiting for softnet_lock, at which
point the timeout would never fire.)
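Why the timeout never fires can be shown with a toy model in which callouts run one at a time, in queue order, from a single softclock thread.  That is a simplifying assumption: struct callout_model and run_softclock are hypothetical, and the real callout(9) machinery is more involved.  Once a handler ahead in the queue blocks on softnet_lock, a timeout queued behind it never runs:

```c
#include <stdbool.h>
#include <stddef.h>

/* A hypothetical queued callout: whether its handler must take
 * softnet_lock before doing anything, and whether it got to run. */
struct callout_model {
	const char *name;
	bool needs_softnet_lock;
	bool ran;
};

/* Run queued callouts in order, as a single softclock thread would.
 * If softnet_lock is held elsewhere, the first handler that needs it
 * blocks the thread, and nothing queued behind it runs.  Returns the
 * number of callouts that ran. */
static size_t run_softclock(struct callout_model *q, size_t n,
    bool softnet_lock_held)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (q[i].needs_softnet_lock && softnet_lock_held)
			break;  /* the softclock thread is stuck here */
		q[i].ran = true;
	}
	return i;
}
```

With a queue of {a callout that takes softnet_lock, the cv_timedwait timeout}, run_softclock stops at index 0 while softnet_lock is held, and the timeout entry is never marked as run; with the lock free, both entries run.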

But as far as I can tell, this only leads to an actual deadlock if the
hardware fails to deliver the PCI interrupt that makes vioif_ctrl_intr
set ctrlq->ctrlq_inuse := DONE and wake vioif_ctrl_acquire.
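One way to tell a lost interrupt from a device that never completed the command is to bound the wait and, on timeout, check whether the device has in fact posted the completion (a virtio device reports finished requests in the virtqueue's used ring).  The sketch below models this in user space with POSIX threads; struct completion, wait_for_completion, completion_intr and the device_done flag (standing in for a used-ring check) are hypothetical.  It is meant only as a debugging aid: in the kernel, a timed wait under softnet_lock runs into the callout problem described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

enum wait_result { COMPLETED, LOST_INTERRUPT, STILL_PENDING };

struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	bool intr_seen;    /* set by the modelled interrupt handler */
	bool device_done;  /* what polling the used ring would report */
};

/* Modelled interrupt handler: record the completion and wake waiters. */
static void completion_intr(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->intr_seen = true;
	pthread_cond_broadcast(&c->cv);
	pthread_mutex_unlock(&c->lock);
}

/* Wait up to timeout_ms for the interrupt.  On timeout, classify the
 * failure by checking whether the device had finished anyway. */
static enum wait_result wait_for_completion(struct completion *c,
    long timeout_ms)
{
	struct timespec deadline;
	enum wait_result r = COMPLETED;
	int error = 0;

	/* The default condvar clock is CLOCK_REALTIME. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	/* Loop on 0 to absorb spurious wakeups; stop on ETIMEDOUT or error. */
	while (!c->intr_seen && error == 0)
		error = pthread_cond_timedwait(&c->cv, &c->lock, &deadline);
	if (!c->intr_seen)
		r = c->device_done ? LOST_INTERRUPT : STILL_PENDING;
	pthread_mutex_unlock(&c->lock);
	return r;
}
```

A LOST_INTERRUPT result would suggest the problem lies in interrupt delivery rather than in the device's handling of the command.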

1. What could be going wrong here to trigger this deadlock?  Could
   something be missing a virtio interrupt?

2. Can either (a) vioif or (b) in_pcbdetach / ip_freemoptions /
   in_delmulti be made to avoid cv_wait under softnet_lock?  For
   instance, could something in that stack safely release
   softnet_lock?  Or is it even necessary to take softnet_lock in this
   path at all?  This pattern is likely to cause deadlocks in other
   network drivers too, such as usbnet(9).
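If something in that stack were to release softnet_lock across the sleep, it might take roughly the following shape.  This is a hypothetical user-space sketch with POSIX threads, not a proposed patch.  ctrlq_acquire_dropping and its outer parameter (standing in for softnet_lock) are invented.  The approach is only safe if every caller up the stack (in_pcbdetach and friends) can cope with softnet_lock being dropped mid-operation, which is exactly the open question.

```c
#include <pthread.h>

/* Hypothetical slot states and queue, as in a FREE/DONE handshake. */
enum ctrlq_state { FREE, INUSE, DONE };

struct ctrlq {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	enum ctrlq_state inuse;
};

/* Acquire the slot for a caller that holds `outer` (standing in for
 * softnet_lock).  The outer lock is released before sleeping, so
 * softints that need it can run meanwhile.  It is retaken only after the
 * inner lock has been dropped, so the lock order stays outer before
 * inner.  On return, anything `outer` protects may have changed. */
static void ctrlq_acquire_dropping(struct ctrlq *q, pthread_mutex_t *outer)
{
	pthread_mutex_unlock(outer);

	pthread_mutex_lock(&q->lock);
	while (q->inuse != FREE)
		pthread_cond_wait(&q->cv, &q->lock);
	q->inuse = INUSE;
	pthread_mutex_unlock(&q->lock);

	pthread_mutex_lock(outer);
}
```

Reacquiring the outer lock only after dropping the inner one avoids a lock-order reversal; the cost is that the caller must revalidate any state protected by softnet_lock once the function returns.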

