Subject: Re: Faster pipes from FreeBSD
To: None <tech-kern@netbsd.org>
From: Ross Harvey <ross@ghs.com>
List: tech-kern
Date: 04/07/2001 09:25:45
> From: "Jaromir" Dolecek <jdolecek@netbsd.org>
>:::
> Once I ran the tests after booting via 'boot -s' to single user,
> the numbers became more consistent (previously, I tested in single user too,
> but via shutdown from multiuser).  Don't ask me why :-/

It's probably due to a lack of page coloring in mbuf cluster allocation.
Running the system churns up mclpool so you get a stochastic cache aliasing
factor superimposed upon your results. Plays hell with benchmarking, for
sure.
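
To make the aliasing point concrete, here is a minimal sketch of how a
page's cache color falls out of its physical address. The struct, the
function names, and the 256 KB / 4-way / 4 KB-page geometry used in the
note below are assumptions for illustration only; this is not NetBSD code
and not a measured PIII configuration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a physically indexed cache. */
struct cache_geom {
	size_t size;       /* total cache size in bytes */
	size_t ways;       /* associativity */
	size_t page_size;  /* VM page size in bytes */
};

/* Number of distinct page colors: pages per cache way (at least 1). */
static size_t
num_colors(const struct cache_geom *g)
{
	size_t way_bytes = g->size / g->ways;

	return way_bytes > g->page_size ? way_bytes / g->page_size : 1;
}

/* Color of the page containing physical address pa. */
static size_t
page_color(const struct cache_geom *g, uint64_t pa)
{
	return (size_t)((pa / g->page_size) % num_colors(g));
}

/* Two pages of the same color compete for the same cache sets. */
static int
pages_alias(const struct cache_geom *g, uint64_t pa_a, uint64_t pa_b)
{
	return page_color(g, pa_a) == page_color(g, pa_b);
}
```

With the assumed geometry there are 16 colors, so pages at physical
0x0 and 0x10000 land on the same color while 0x11000 does not. An
allocator that ignores color hands out clusters whose colors depend on
allocation history, which is one way run-to-run variation can creep in.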

> On my PIII/600MHz, the pipe bandwidth and latency seem to be the same
> for "old" and "new" code according to lmbench. It's like 34.9us
> latency and 487 MB/sec bandwidth.  Seems likely the code speed is
> not a factor here, more likely memory access speed is.

Hmm. I predicted a long time ago that the sometimes-criticized mbuf
implementation would work better and better as CPUs became faster.
It's funny to see someone occasionally drag out the ancient LBL work showing
huge TCP speedups via mbuf elimination. That work was only valid on
sun2/sun3/vax class machines, and probably also only for certain ethernet
HW. Now it appears that your cpu can shove data thru mbufs at RAM speed,
never mind network speed, and with no speed penalty at all for the mbuf
overhead. Wow.

I bet that on your CPU, eliminating all that macro expansion in the
kernel would cause it to go faster, due to less icache thrashing.

Now, supposedly, the DTYPE_PIPE code reuses the same buffer bits to get
better cache performance. Does that comment just represent wishful
thinking?

> I tried on a 386DX and the values here are more telling - I get consistent
> values, NEW_PIPE is about 5 times faster in bandwidth:
>
> old pipe:	latency: ~1800 us, bandwidth: 0.75 MB/sec
> NEW_PIPE:	latency: ~1148 us, bandwidth: 3.57 MB/sec
> old pipe with unpst_*space bumped to 16KB:
> 		latency: ~1800 us, bandwidth: 0.80 MB/sec
>
> I also tried to run several (5) instances
> of lmbench/bw_pipe, to get some numbers for parallel pipe pushing.
> Again, there was no real difference between "old" and "new" pipes
> on the PIII, even when I used more simultaneous bw_pipe processes (tried
> 7 and 10). The numbers for 386DX are like this:
>
> old pipes(5):	0.15 MB/s
> NEW_PIPE(5):	~0.76 MB/s
>
> It would be really interesting to compare numbers on other archs.

I like the fact that DTYPE_PIPE uses unwired buffer space, and
doesn't compete for nmbclusters (a common bottleneck) -- but it's a bit
discouraging to think it only helps performance on older CPUs. Still,
probably worth doing.
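
For reference, the 386DX ratios can be worked out from the quoted figures
with a few lines of C. The helper names here are made up for illustration;
the inputs are just the numbers from the quoted message.

```c
/* Ratio of two measurements; returns 0.0 if the divisor is not positive. */
static double
ratio(double num, double den)
{
	return den > 0.0 ? num / den : 0.0;
}

/* Bandwidth: higher is better, so speedup is new / old. */
static double
bandwidth_speedup(double new_mb_s, double old_mb_s)
{
	return ratio(new_mb_s, old_mb_s);
}

/* Latency: lower is better, so speedup is old / new. */
static double
latency_speedup(double new_us, double old_us)
{
	return ratio(old_us, new_us);
}
```

Plugging in the quoted values, bandwidth_speedup(3.57, 0.75) is about 4.8,
bandwidth_speedup(0.76, 0.15) for the five-process run is about 5.1, and
latency_speedup(1148, 1800) is only about 1.6. So the "5 times" is a
bandwidth figure; the latency gain is much smaller.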

I would guess that patching those buffer sizes makes no difference on
your fast CPU?

//ross