Re: TCP and NET_MPSAFE

To: tech-net%netbsd.org@localhost
Subject: Re: TCP and NET_MPSAFE
From: Kevin Bowling <kevin.bowling%kev009.com@localhost>
Date: Sun, 19 Apr 2026 17:17:05 -0700

On Fri, Apr 17, 2026 at 11:30 PM Kevin Bowling <kevin.bowling%kev009.com@localhost> wrote:
>
> On Wed, Apr 15, 2026 at 4:06 PM Kevin Bowling <kevin.bowling%kev009.com@localhost> wrote:
> >
> > Looking at https://nxr.netbsd.org/xref/src/doc/TODO.smpnet there is
> > mention of TCP being protected, and I think this is referring to the
> > KERNEL_LOCK around tcp_output() and the like.
> >
> > But a cursory read makes these seem unnecessary, i.e. they could be
> > dropped or switched to KERNEL_LOCK_UNLESS_NET_MPSAFE, because all the
> > callers already lock accordingly.  A quick test doing so doesn't lead
> > to immediate issues with a NET_MPSAFE kernel.  But it's also not a big
> > win because softnet_lock still dominates.
> >
> > Is there any recent history looking into any of this, and is anyone
> > working on it still that may want to compare notes?
>
> Here is an early attempt at an MPSAFE netinet:
> https://people.freebsd.org/~kbowling/netbsd/mpsafe-netinet-solock-v1.patch

This approach is an interesting stepping stone and a good way to get
familiar with the NetBSD stack, but it is insufficient as-is.  The
primary issue is that serializing with a per-socket mutex (adaptive
spin) has a degenerate case on a single TCP stream (iperf3 -R), where
one core is running the user state (copying data to the socket buffer)
and another is running an expensive path like ip_input that also needs
the socket buffer exclusively.  We end up spending an excessive amount
of time in softint, which is the starvation issue I pointed out in my
broader concerns.  I am now working to break the locks up further,
using the FreeBSD model as a guideline, so these two operations do not
spin on each other.
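
To illustrate the general direction only (this is not code from the
patch), here is a small userspace sketch in which each socket buffer
carries its own lock, held just long enough to touch that buffer.  All
names (socket_sketch, sockbuf_sketch, sb_append, sb_drop) are
hypothetical, pthread mutexes stand in for kernel mutexes, and the
split shown is an assumption about what "breaking the locks up" could
look like, not the final design.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define SB_MAX 4096

/* Hypothetical socket buffer that carries its own lock. */
struct sockbuf_sketch {
	pthread_mutex_t sb_lock;
	size_t sb_cc;			/* bytes currently queued */
	char sb_data[SB_MAX];
};

/* Hypothetical socket: send and receive buffers lock independently. */
struct socket_sketch {
	struct sockbuf_sketch so_snd;
	struct sockbuf_sketch so_rcv;
};

void
so_init(struct socket_sketch *so)
{
	pthread_mutex_init(&so->so_snd.sb_lock, NULL);
	pthread_mutex_init(&so->so_rcv.sb_lock, NULL);
	so->so_snd.sb_cc = 0;
	so->so_rcv.sb_cc = 0;
}

/* Append up to len bytes; returns the number of bytes accepted. */
size_t
sb_append(struct sockbuf_sketch *sb, const char *p, size_t len)
{
	size_t space, n;

	pthread_mutex_lock(&sb->sb_lock);
	space = SB_MAX - sb->sb_cc;
	n = len < space ? len : space;
	memcpy(sb->sb_data + sb->sb_cc, p, n);
	sb->sb_cc += n;
	pthread_mutex_unlock(&sb->sb_lock);
	return n;
}

/* Drop up to len bytes from the front; returns the number dropped. */
size_t
sb_drop(struct sockbuf_sketch *sb, size_t len)
{
	size_t n;

	pthread_mutex_lock(&sb->sb_lock);
	n = len < sb->sb_cc ? len : sb->sb_cc;
	memmove(sb->sb_data, sb->sb_data + n, sb->sb_cc - n);
	sb->sb_cc -= n;
	pthread_mutex_unlock(&sb->sb_lock);
	return n;
}
```

The point of the sketch is that a thread holding so_rcv.sb_lock does
not block one that only needs so_snd.sb_lock.  It does not address the
case where both paths need the same buffer, which needs more thought.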

Part of why this is an interesting stepping stone is that, since it is
done with pserialize, which cannot sleep, we have a clear path to
leap-frog epoch(9) and instead mostly mechanically replace it with Jeff
Roberson's smr(9).  SMR complements pserialize in that it has a much
better write path with no broadcast IPI wait, and can approach
pserialize's read performance on TSO architectures or when using the
SMR_LAZY flag.
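
As a rough picture of why the write path is cheaper, here is a toy
userspace model of a sequence-based reclamation scheme, assuming a
fixed set of reader slots and C11 atomics.  It is not the smr(9) API;
every name here (smr_sketch, smr_sketch_enter, and so on) is
hypothetical, and a real implementation uses per-CPU state and
carefully chosen barriers rather than sequentially consistent atomics
everywhere.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NREADERS 4
#define SEQ_IDLE 0	/* slot is not inside a read section */

struct smr_sketch {
	atomic_uint_fast64_t wr_seq;		/* global write sequence */
	atomic_uint_fast64_t rd_seq[NREADERS];	/* sequence each reader saw */
};

void
smr_sketch_init(struct smr_sketch *s)
{
	atomic_init(&s->wr_seq, 1);
	for (int i = 0; i < NREADERS; i++)
		atomic_init(&s->rd_seq[i], SEQ_IDLE);
}

/* Reader: publish the write sequence observed on entry. */
void
smr_sketch_enter(struct smr_sketch *s, int slot)
{
	uint_fast64_t seq = atomic_load(&s->wr_seq);

	/* seq_cst store orders the publication before later loads. */
	atomic_store(&s->rd_seq[slot], seq);
}

/* Reader: mark the slot idle again. */
void
smr_sketch_exit(struct smr_sketch *s, int slot)
{
	atomic_store_explicit(&s->rd_seq[slot], SEQ_IDLE,
	    memory_order_release);
}

/*
 * Writer: call after unlinking an object.  Returns the goal sequence
 * that readers must reach before the object can be freed.  There is
 * no waiting and no cross-CPU signalling here.
 */
uint_fast64_t
smr_sketch_advance(struct smr_sketch *s)
{
	return atomic_fetch_add(&s->wr_seq, 1) + 1;
}

/* Writer: true once every reader is idle or entered at/after goal. */
bool
smr_sketch_poll(struct smr_sketch *s, uint_fast64_t goal)
{
	for (int i = 0; i < NREADERS; i++) {
		uint_fast64_t seq = atomic_load(&s->rd_seq[i]);

		if (seq != SEQ_IDLE && seq < goal)
			return false;
	}
	return true;
}
```

In this model the writer just bumps a counter and checks back later,
so the cost of reclamation moves to a cheap poll instead of a
synchronous wait.  The reader pays for one published store on entry,
which is the part that is cheap on TSO machines.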

More to come, but this is going to get pretty complicated.

> The major changes:
> 1) Per-socket locks.  This is based on work by Robert Watson, Gleb
> Smirnoff, and others in FreeBSD and is fairly critical for breaking up
> stack processing in a meaningful way to allow flow-level parallelism.
> 2) Safe memory reclamation using pserialize(9).  This is structurally
> inspired by work Matt Macy did in FreeBSD for me at Limelight
> Networks.  But NetBSD has some different tradeoffs, which I will get to
> in a bit.
>
> There are some immediate concerns:
> 0) This requires NET_MPSAFE config.  I have not even compiled a
> !NET_MPSAFE config with it and I have not thought through any problems
> associated with that yet.  Consider that path broken with this patch
> until further effort proves otherwise.
> 1) I have only tested this in a LAN environment, on an EdgeRouter-4
> (MIPS64 Octeon).  MIPS has a weak memory model so this is a good thing
> for the lockless programming, but it could break in subtle ways on
> different archs that I have not yet thought about.
> 2) TCP is a forgiving protocol; there is certainly a chance I have not
> detected my own obvious breakage because things work well enough.
> 3) TCP has a lot of corner cases, some of which are difficult to
> exercise and some of which are rarely if ever seen.
> 4) I have done no testing outside of TCP/UDP.  There are some
> protocols which should probably be dropped (DCCP), and SCTP would need
> a lot more scrutiny than I have given it (maybe start with a refresh vs
> FreeBSD).  But GENERIC builds and testing can guide further work.
>
> There are some broader concerns:
> 1) pserialize pays a larger, per-use penalty versus FreeBSD's
> epoch(9) and its use in netinet there.  This is both algorithmic and
> structural.  We imported ConcurrencyKit (https://concurrencykit.org/)
> into FreeBSD, which offers Epoch Based Reclamation (EBR).  Epoch is
> similar from a calling standpoint to pserialize, but has lower latency
> in implementation.  Structurally, FreeBSD enters an epoch for the
> entire pass through the network stack, so reads are basically a couple
> of extra instructions.  That could matter a lot, or it could just be a
> minor performance item.  I have not spent any time looking to see if
> pserialize can be used in a similar way (structural) by adding
> scheduler awareness or improved internally (algorithmically).
> 2) TCP becomes fast enough that I can starve out SOFTINT_CLOCK on a
> single core iperf3 -R on the ER4 system (requires driver improvements
> I will publish later), and it is likely possible on other hardware
> sizes.  The effects of this can be somewhat dire; for instance, TCP
> timers stop working, or if you are running a watchdog, it won't get
> poked in time.  This needs either a rethink of softint priority, or
> moving some of the work out of softint to something the scheduler can
> rotate.
>
> Regards,
> Kevin Bowling
