NetBSD-Bugs archive


kern/58575: altq(4) takes adaptive lock while holding spin lock



>Number:         58575
>Category:       kern
>Synopsis:       altq(4) takes adaptive lock while holding spin lock
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Aug 10 19:55:00 +0000 2024
>Originator:     Taylor R Campbell
>Release:        current, 10, ...
>Organization:
The AltqBSD Foundaspinlock
>Environment:
>Description:
altq(4) may try to take the global entropy lock in order to serve cprng_fast32 while holding struct ifnet::if_snd.ifq_lock, which is a spin lock:

#7  0xffffffff80239654 in trap ()
#8  0xffffffff80231fa4 in alltraps ()
#9  0xffffffff802328d5 in breakpoint ()
#10 0xffffffff807760f9 in vpanic ()
#11 0xffffffff807761d8 in panic ()
#12 0xffffffff80769da9 in lockdebug_abort1 ()
#13 0xffffffff8071f52f in mutex_vector_enter ()
#14 0xffffffff80700c0c in entropy_extract ()
#15 0xffffffff807590d6 in cprng_strong_reseed ()
#16 0xffffffff807594ae in cprng_strong ()
#17 0xffffffff8042efbd in cprng_fast_buf_short ()
#18 0xffffffff8042f144 in cprng_fast32 ()
#19 0xffffffff8052fe5e in red_addq ()
#20 0xffffffff8053028e in red_enqueue ()
#21 0xffffffff808116be in if_transmit ()
#22 0xffffffff8081858f in if_transmit_lock ()
#23 0xffffffff80819463 in ether_output ()
#24 0xffffffff8047fcd7 in ip_output ()
#25 0xffffffff8049d3ae in udp_output ()
#26 0xffffffff8049d474 in udp_send ()
#27 0xffffffff8049d5ca in udp_send_wrapper ()
#28 0xffffffff807afee6 in sosend ()
#29 0xffffffff807b7fc9 in do_sys_sendmsg_so ()
#30 0xffffffff807b824d in do_sys_sendmsg ()
#31 0xffffffff807b82df in sys_sendto ()
#32 0xffffffff8039eba8 in syscall ()
#33 0xffffffff8020e6bd in handle_syscall ()
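
For illustration, the lock order rule being violated is roughly the following (a minimal, hypothetical sketch, not the actual altq or entropy code; with LOCKDEBUG the second mutex_enter trips lockdebug_abort1 as in the backtrace above):

#include <sys/mutex.h>

static kmutex_t spin_lock;	/* stands in for if_snd.ifq_lock (spin, IPL_NET) */
static kmutex_t sleep_lock;	/* stands in for the global entropy lock (adaptive) */

static void
sketch_init(void)
{
	/* MUTEX_DEFAULT at IPL_NET yields a spin mutex */
	mutex_init(&spin_lock, MUTEX_DEFAULT, IPL_NET);
	/* MUTEX_DEFAULT at IPL_NONE (or IPL_SOFT*) yields an adaptive mutex */
	mutex_init(&sleep_lock, MUTEX_DEFAULT, IPL_NONE);
}

static void
sketch_violation(void)
{
	mutex_enter(&spin_lock);
	mutex_enter(&sleep_lock);	/* adaptive lock under a spin lock: forbidden */
	mutex_exit(&sleep_lock);
	mutex_exit(&spin_lock);
}
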
>How-To-Repeat:
1. enable altq and LOCKDEBUG in the kernel and configure altq on an interface (example options below)
2. do networking over that interface
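
For example, a kernel built with something like the following options (the RED discipline matches red_enqueue in the backtrace; altq then has to be enabled on an interface, e.g. via altq.conf(5)/altqd(8)):

options 	ALTQ		# manipulate network interfaces' output queues
options 	ALTQ_RED	# random early detection
options 	LOCKDEBUG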
>Fix:
The reason this is new is that cprng_fast32 now seeds itself synchronously from the global entropy pool.  This way, the first sample on each CPU is still as good as the global entropy pool was at the time of the first sample.

Previously, we would initialize it with whatever fumes the global entropy pool started out with at a definite time early in boot, and then, if the global entropy pool was still hungry by the time of the first cprng_fast call, we would (a) return predictable data, and (b) schedule an asynchronous reseed.

Some time during the development of 10, I went through and tried to stamp out all calls to cprng_* from hard interrupt context -- this made a lot of things in kern_entropy.c much simpler, _and_ enabled us to synchronously seed cprng_fast.  However, I missed the calls made under _spin locks_, which are effectively also hard interrupt context (in that they hold up hard interrupts trying to take the same lock on another CPU).

We could bring back the asynchronous seeding of cprng_fast, at the cost of a bit of a security regression -- initial output may be predictable.  But rather than do that, I'd like to revisit whether ifq_lock really needs to be a spin lock.  Here's an alternative approach:

1. New lock struct ifnet::if_work_lock, rather than reusing if_snd.ifq_lock for the purpose.  Use if_work_lock in if_link_state_change, which is called from hard interrupt context.
2. Use IPL_SOFTNET, not IPL_NET, for ifq_lock, which I think should only be reached from softint context at most, never hard interrupt context (see the sketch below).
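
To illustrate (1) and (2) together (a sketch, not a patch; everything except if_work_lock, ifq_lock, and the IPLs is made up for the example):

#include <sys/mutex.h>

struct sketch_ifnet {
	kmutex_t	if_work_lock;	/* (1) stays a spin mutex: used by
					 * if_link_state_change, which runs
					 * from hard interrupt context */
	kmutex_t	ifq_lock;	/* (2) moves to IPL_SOFTNET */
};

static void
sketch_ifnet_init(struct sketch_ifnet *ifp)
{
	/* MUTEX_DEFAULT + IPL_NET => spin mutex, safe from hard interrupt */
	mutex_init(&ifp->if_work_lock, MUTEX_DEFAULT, IPL_NET);

	/*
	 * MUTEX_DEFAULT + IPL_SOFTNET => adaptive mutex, so taking the
	 * (adaptive) global entropy lock under it is legal -- but only
	 * if ifq_lock is never taken from hard interrupt context.
	 */
	mutex_init(&ifp->ifq_lock, MUTEX_DEFAULT, IPL_SOFTNET);
}
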

Caveat: (1) requires a kernel revbump because it changes the size of struct ifnet.  Maybe we can create a global if_link_state_change_lock instead, just for the pullup to 10 (sketched below) -- suboptimal, but do we need link state changes to scale well?
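
Roughly this (sketch only; where the initialization would live is illustrative):

#include <sys/mutex.h>

/* file-scope in if.c rather than a new struct ifnet member */
static kmutex_t if_link_state_change_lock;

static void
sketch_global_lock_init(void)
{
	/* spin mutex at IPL_NET, since if_link_state_change runs from hardintr */
	mutex_init(&if_link_state_change_lock, MUTEX_DEFAULT, IPL_NET);
}
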

Caveat: Some drivers hold a spin lock across IFQ_* operations; for example, aq(4) holds struct aq_txring::txr_mutex (an IPL_NET spin lock) across aq_send_common_locked -> IFQ_POLL, IFQ_DEQUEUE.  So (2) alone isn't enough.  But maybe drivers should defer this logic to softint anyway, and use IPL_SOFTNET, which would resolve this problem; see the sketch below.
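
For illustration, the usual deferral pattern looks roughly like this (hypothetical driver code, not an aq(4) patch):

#include <sys/types.h>
#include <sys/errno.h>
#include <sys/intr.h>
#include <sys/mutex.h>

struct sketch_txring {
	kmutex_t	 txr_mutex;	/* IPL_SOFTNET => adaptive */
	void		*txr_si;	/* softint handle */
};

static void	sketch_start_softint(void *);

static int
sketch_attach_tx(struct sketch_txring *txr)
{
	mutex_init(&txr->txr_mutex, MUTEX_DEFAULT, IPL_SOFTNET);
	txr->txr_si = softint_establish(SOFTINT_NET | SOFTINT_MPSAFE,
	    sketch_start_softint, txr);
	return txr->txr_si == NULL ? ENOMEM : 0;
}

/* hard interrupt handler: just kick the softint, no IFQ_* or txr_mutex here */
static int
sketch_intr(void *arg)
{
	struct sketch_txring *txr = arg;

	softint_schedule(txr->txr_si);
	return 1;
}

static void
sketch_start_softint(void *arg)
{
	struct sketch_txring *txr = arg;

	mutex_enter(&txr->txr_mutex);
	/* IFQ_POLL/IFQ_DEQUEUE and hardware TX submission would go here */
	mutex_exit(&txr->txr_mutex);
}
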


