tech-net archive


Re: TCP and NET_MPSAFE



On Wed, Apr 15, 2026 at 4:06 PM Kevin Bowling <kevin.bowling%kev009.com@localhost> wrote:
>
> Looking at https://nxr.netbsd.org/xref/src/doc/TODO.smpnet there is
> mention of TCP being protected, and I think this is referring to the
> KERNEL_LOCK around tcp_output() and like.
>
> But a cursory read makes these seem unnecessary, i.e. they could be
> dropped or switched to KERNEL_LOCK_UNLESS_NET_MPSAFE, because all the
> callers already lock accordingly.  A quick test doing so doesn't lead
> to immediate issues with a NET_MPSAFE kernel.  But it's also not a big
> win because softnet_lock still dominates.
>
> Is there any recent history looking into any of this, and is anyone
> working on it still that may want to compare notes?

Here is an early attempt at an MPSAFE netinet:
https://people.freebsd.org/~kbowling/netbsd/mpsafe-netinet-solock-v1.patch

The major changes:
1) Per-socket locks.  This is based on work by Robert Watson, Gleb
Smirnoff, and others in FreeBSD, and is fairly critical for breaking
up stack processing in a meaningful way to allow flow-level
parallelism.
2) Safe memory reclamation using pserialize(9).  This is structurally
inspired by work Matt Macy did in FreeBSD for me at Limelight
Networks.  But NetBSD has some different tradeoffs, which I will get
to in a bit.

There are some immediate concerns:
0) This requires a NET_MPSAFE kernel config.  I have not even compiled
a !NET_MPSAFE config with it, and I have not yet thought through the
problems associated with that.  Consider that path broken with this
patch until further effort proves otherwise.
1) I have only tested this in a LAN environment, on an EdgeRouter-4
(MIPS64 Octeon).  MIPS has a weak memory model, which is a good stress
test for the lockless code because missing barriers are more likely to
show up, but it could still break in subtle ways on other
architectures that I have not yet thought about.
2) TCP is a forgiving protocol, so there is certainly a chance that I
have not detected my own obvious breakage because things work well
enough.
3) TCP has a lot of corner cases, some of which are difficult to
exercise and some of which are rarely if ever seen.
4) I have done no testing outside of TCP/UDP.  Some protocols should
probably be dropped (DCCP), and SCTP would need a lot more scrutiny
than I have given it (maybe starting with a refresh against FreeBSD).
That said, a GENERIC kernel builds, and testing can guide further
work.

There are some broader concerns:
1) pserialize pays a larger, per-use penalty than FreeBSD's epoch(9)
as used in its netinet.  The difference is both algorithmic and
structural.  We imported ConcurrencyKit (https://concurrencykit.org/)
into FreeBSD, which offers Epoch Based Reclamation (EBR).  From the
caller's standpoint an epoch is similar to pserialize, but the
implementation has lower latency.  Structurally, FreeBSD enters an
epoch once for the entire pass through the network stack, so each
read costs only a couple of extra instructions.  That could matter a
lot, or it could just be a minor performance item.  I have not spent
any time looking at whether pserialize could be used the same way
(structurally) by adding scheduler awareness, or improved internally
(algorithmically).
2) TCP becomes fast enough that a single-core iperf3 -R run on the ER4
can starve out SOFTINT_CLOCK (this requires driver improvements I will
publish later), and it is likely possible on hardware of other sizes
too.  The effects can be somewhat dire: for instance, TCP timers
(callouts, which run from SOFTINT_CLOCK) stop firing, and a watchdog,
if you run one, won't get poked in time.  This needs either a rethink
of softint priorities or moving some of the work out of softint
context into something the scheduler can rotate.

Regards,
Kevin Bowling

