NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

kern/54189: IFNET_LOCK for if_mcast_op can cause a deadlock



>Number:         54189
>Category:       kern
>Synopsis:       IFNET_LOCK for if_mcast_op can cause a deadlock
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri May 10 09:25:00 +0000 2019
>Originator:     Ryota Ozaki
>Release:        current and 8
>Organization:
>Environment:
System: NetBSD rangeley 8.99.38 NetBSD 8.99.38 (RANGELEY) #2: Fri May 10 13:37:40 JST 2019  ozaki-r@rangeley:(hidden) amd64
Architecture: x86_64
Machine: amd64
>Description:
From src/doc/TODO.smpnet:

  To avoid data race on if_flags it should be protected by a lock (currently it's
  IFNET_LOCK).  (snip)
  
  IFF_ALLMULTI can be set/unset via if_mcast_op.  To protect updates of the flag,
  we had added IFNET_LOCK around if_mcast_op.  However that was not a good
  approach because if_mcast_op is typically called in the middle of a call path
  and holding IFNET_LOCK such places is problematic.  Actually a deadlock is
  observed.  Probably we should remove IFNET_LOCK and manage IFF_ALLMULTI
  somewhere other than if_flags, for example ethercom or driver itself (or a
  common driver framework once it appears).  Such a change is feasible because
  IFF_ALLMULTI is only set/unset by a driver and not accessed from any common
  components such as network protocols.

The resource dependency graph of a deadlock I encountered is like this:
  Thread A: IFNET_LOCK => psref_target_destroy => softint
  Thread B: softint => softnet_lock
  Thread C: softnet_lock => IFNET_LOCK (for if_mcast_op) 

Note that the deadlock doesn't occur if NET_MPSAFE is NOT enabled because
psref_target_destroy is called only if NET_MPSAFE is enabled.
>How-To-Repeat:
Run userspace processes below concurrently while receive some packets destinating to the host, then wait several hours:
- a process adding/removing an IP address on an interface, and
- a process configuring/unconfiguring multicast groups on the same interface.
>Fix:
One solution is to remove IFNET_LOCK for if_mcat_op as described above and
protect operations of IFF_ALLMULTI in another way.

The below patch introduces a new field to struct ethercom (ec_flags) and store
the IFF_ALLMULTI flag to it (as ETHER_F_ALLMULTI).  So we can use ETHER_LOCK
to protect operations of the flag instead of IFNET_LOCK.

https://www.netbsd.org/~ozaki-r/allmulti.diff

One known issue of the approach is that it can't be applied to netbsd-8 because
the change breaks ABI IIUC.



Home | Main Index | Thread Index | Old Index