Re: patch make struct protosw pr_input non-variadic

To: Robert Elz <kre%munnari.OZ.AU@localhost>
Subject: Re: patch make struct protosw pr_input non-variadic
From: Taylor R Campbell <campbell+netbsd-tech-net%mumble.net@localhost>
Date: Sat, 7 Jan 2017 03:52:27 +0000

Date: Mon, 16 May 2016 04:05:36 +0000
From: Taylor R Campbell <campbell+netbsd-tech-net%mumble.net@localhost>

It's true that the reordering of the structure members may conceivably
affect performance by changing which cache line is used. But the work
that Ozaki-san and Nakahara-san are doing to parallelize the network
stack has a much greater impact on performance (measured by Ozaki-san),
and some of their work has been hindered by the lack of the
compiler-assisted refactoring that true prototypes enable.

To revive this thread from last year, since I'd still like to see this
patch committed...

I asked Ozaki-san to measure the performance impact of the patch. It
is, of course, impossible to characterize the full distribution of
performance changes over all network traffic patterns. But at least
in the test setups that Ozaki-san has tried, there seems to be no
major performance impact.

Specifically, in the setup presented at EuroBSDcon[1], Ozaki-san ran
two performance evaluations, each one with and without the patch.
Quoting his explanation to me:

`Anyway I evaluated again three times for each kernel with
(a) simple throughput benchmark and (b) RFC 2544 benchmark.

(a) just measures a throughput with 200 Mbps offered traffic.
(b) tries to find the best performance while no packet drops happens
by changing offered traffic with bisecting. So its results should
be more accurate and stable than (a).

The results are:
(a)
w/ the patch: 152.0, 153.4, 153.7
w/o the patch: 152.7, 151.7, 153.5

(b)
w/ the patch: 162.7, 164.2, 162.7
w/o the patch: 162.9, 162.0, 162.7'

What do we conclude from this? It's reasonable to assume that all
these measurements are roughly normally distributed (there is a hard
lower bound to the possible run time, but I expect there is enough
white noise all the time that a normal approximates it well enough).

We might perform a traditional frequentist hypothesis test: for each
evaluation, apply a test, namely Welch's t-test with false rejection
rate alpha = 5%, to reject the null hypothesis that the two
distributions -- with patch, without patch -- are equal, in favour of
the alternative hypothesis that the distributions are different.
Computed by scipy and rounded by my wetware to two decimal places,

(a) t = .54, p = .62 > .05 => fail to reject null hypothesis
(b) t = 1.2, p = .32 > .05 => fail to reject null hypothesis
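
For anyone who wants to check these numbers without scipy, here is a
minimal sketch of Welch's t-test using only the Python standard
library. It is not the scipy computation referred to above: the
function name welch_t_test and the list names are made up for
illustration, the degrees of freedom come from the usual
Welch-Satterthwaite formula, and the p-value is obtained by
integrating the Student t density numerically with Simpson's rule.

```python
import math
from statistics import mean, variance

def welch_t_test(x, y, steps=200):
    """Return (t, df, two-sided p) for Welch's unequal-variance t-test.

    Hypothetical helper for illustration; `steps` must be even.
    """
    vx = variance(x) / len(x)          # squared standard error of mean(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    # P(|T| <= |t|): substituting u = sqrt(df) * tan(theta) turns the
    # t density into const * cos(theta)**(df - 1) on a bounded interval,
    # which Simpson's rule handles well.
    theta = math.atan(abs(t) / math.sqrt(df))
    h = theta / steps
    s = 1.0 + math.cos(theta) ** (df - 1)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    const = math.gamma((df + 1) / 2) / (math.sqrt(math.pi) * math.gamma(df / 2))
    p = 1 - 2 * const * s * h / 3
    return t, df, p

# Ozaki-san's measurements as quoted above.
a_with, a_without = [152.0, 153.4, 153.7], [152.7, 151.7, 153.5]
b_with, b_without = [162.7, 164.2, 162.7], [162.9, 162.0, 162.7]

for label, w, wo in (("(a)", a_with, a_without), ("(b)", b_with, b_without)):
    t, df, p = welch_t_test(w, wo)
    verdict = "reject" if p < 0.05 else "fail to reject"
    print(f"{label} t = {t:.2f}, df = {df:.2f}, p = {p:.2f} => {verdict}")
```

The t and p values this prints should agree with the figures above to
within rounding.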

Power analysis: For this sample size, there would be an 80% chance of
rejecting the null hypothesis if the means were different by ~3 sigma,
50% if ~2 sigma, &c. This is a fairly weak test -- if we had a sample
size of thirty instead of three, there would be an 80% chance of
rejecting if the means differed by only ~.7 sigma. Nevertheless, this
suggests there is not a major performance impact.
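
The power figures above can be sanity-checked by simulation. The
following is a rough Monte Carlo sketch, not how the numbers above
were derived: it assumes both groups are normal with equal unit
variance, shifts one mean by delta (in units of sigma), and counts how
often Welch's test rejects at alpha = 5%. The names welch_p and
power, the trial count, and the seed are arbitrary choices for
illustration.

```python
import math
import random
from statistics import mean, variance

def welch_p(x, y, steps=200):
    """Two-sided p-value of Welch's t-test (hypothetical helper)."""
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    # Integrate the t density over [-|t|, |t|] with Simpson's rule after
    # the substitution u = sqrt(df) * tan(theta); `steps` must be even.
    theta = math.atan(abs(t) / math.sqrt(df))
    h = theta / steps
    s = 1.0 + math.cos(theta) ** (df - 1)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    const = math.gamma((df + 1) / 2) / (math.sqrt(math.pi) * math.gamma(df / 2))
    return 1 - 2 * const * s * h / 3

def power(delta, n, trials=2000, alpha=0.05, seed=0):
    """Estimate the chance of rejecting when the means differ by delta sigma."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(delta, 1.0) for _ in range(n)]
        if welch_p(x, y) < alpha:
            rejections += 1
    return rejections / trials

# Sample sizes and effect sizes mentioned in the text.
for n, delta in ((3, 3.0), (3, 2.0), (30, 0.7)):
    print(f"n = {n:2d}, delta = {delta} sigma: power ~ {power(delta, n):.2f}")
```

The estimates are noisy at this trial count and depend on the
equal-variance assumption, so treat them as ballpark figures only.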

I didn't preregister this experiment or hypothesis test, but this is
exactly what any undergrad in a statistics class would be inclined to
do, if they didn't reach for a Student's t-test which assumes equal
variance, and which gives essentially the same results anyway.

We might do a Bayesian analysis with priors on the means and standard
deviations of these distributions, and then ask for the posterior
predictive distribution on, say, the squared distances between the
means. That's a little more work than I care to do at the moment.

In contrast, Ozaki-san has measured much bigger performance
differences, ~8 sigma, with psref -- in serial, but of course psref
enables the network stack to scale in parallel. This is why many uses
of psref are still conditional on NET_MPSAFE. However, the world is
scaling in parallel, not in serial. And psref is also only a
provisional tool until we can restructure the hot paths to avoid
sleeping altogether and replace psref by pserialize which does no
linked list operations.

So. OK to commit?

[1] Ryota Ozaki and Kengo Nakahara, `Toward MP-safe Networking in
NetBSD', EuroBSDcon 2016, Stockholm.
https://www.netbsd.org/gallery/presentations/ozaki-r/2016_EuroBSDcon/EuroBSDcon2016-ozaki-nakahara.pdf
