Subject: Re: Making mbufs deal better with various buffer sizes
To: Darren Reed <avalon@caligula.anu.edu.au>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-net
Date: 05/03/2004 16:39:50
In message <200405020338.i423ca2V015126@caligula.anu.edu.au>,
Darren Reed writes:

Darren,

I'm having a very hard time turning your proposal into concrete
actions to take.

I would like to see the proposed "efficiency" argument broken down
into two separate cases: send and receive. The reason is simple: on
the send side, we can (largely) decide what we do.  In contrast, on
the receive side, we are at the mercy of whatever the NIC can do to
assign frames to different DMA queues based on packet size, which is
typically "not very much". (You might get one queue of Rx DMA
descriptors for normal packets, and one for jumbo. If you're really
lucky, you get, say, a separate queue for packets marked by QoS;
possibly useful for VoIP, but not for what you're suggesting.)
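
To make that receive-side constraint concrete, here is a sketch of
why the driver can't sort incoming frames into size-matched buffers
itself. This is illustrative only and not taken from any real driver;
the names (rxring, NRXDESC, the buffer sizes) are made up.

    /*
     * Illustrative only; not any particular NetBSD driver.
     */
    #include <stddef.h>

    #define NRXDESC      256
    #define STD_BUFSZ   2048    /* one standard cluster per descriptor */
    #define JUMBO_BUFSZ 9216    /* one jumbo buffer per descriptor */

    struct rxring {
            void   *bufs[NRXDESC];  /* buffers pre-posted to the NIC */
            size_t  bufsize;        /* every buffer in a ring is the same size */
    };

    /*
     * The driver fills each ring with same-sized buffers up front;
     * when a frame arrives, the NIC (not the driver) picks the ring,
     * and usually the only size distinction it can make is standard
     * vs. jumbo.  Software never sees the packet early enough to
     * choose a better-fitting buffer.
     */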

I'm ignoring long-defunct PIO NICs. Driver writers (e.g., Jason) can
argue about some of the details, but that's a broad-brush, rough-justice
description of what we should design for.

You also mentioned jumbo-sized (8000 byte) frames. Here, a number of
NetBSD drivers will allocate jumbo frames out of a private jumbo pool
(the bge driver does, for example).  AFAIK you're right that the if_wm
driver does not.  But you're not going to get very far if you're
suggesting we redo the entire mbuf abstraction when we could instead
improve the wm driver.
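
For reference, the usual shape of such a private jumbo pool is the
sketch below. Treat it as the idea rather than the bge code: the
names (jumbo_pool, JUMBO_BUFSZ, jumbo_free, sc) are illustrative, and
the exact MEXTADD()/ext-free signatures have varied between NetBSD
versions.

    /*
     * Rough sketch of attaching a driver-private jumbo buffer to an
     * mbuf as external storage.  Names are illustrative; MEXTADD()
     * and the free-callback details vary between NetBSD versions.
     */
    buf = pool_get(&sc->jumbo_pool, PR_NOWAIT);
    if (buf == NULL)
            return ENOBUFS;

    MGETHDR(m, M_DONTWAIT, MT_DATA);
    if (m == NULL) {
            pool_put(&sc->jumbo_pool, buf);
            return ENOBUFS;
    }

    /* Attach the jumbo buffer instead of grabbing a 2KB cluster. */
    MEXTADD(m, buf, JUMBO_BUFSZ, M_DEVBUF, jumbo_free, sc);
    m->m_len = m->m_pkthdr.len = JUMBO_BUFSZ;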

On the send side:
I think you can blame me for using (at least) 2048-byte clusters on
every port; given the ubiquity of Ethernet, 2048-byte clusters are
clearly better than the 1024-byte clusters several ports used to use,
and larger clusters (4096 bytes) were not well justified.
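
The back-of-the-envelope arithmetic, in round numbers:

    /*
     * Round numbers, untagged Ethernet:
     *   max frame = 14 (header) + 1500 (payload) + 4 (FCS) = 1518 bytes
     *   1024-byte clusters: a full-sized frame spans two clusters
     *                       (two allocations, two descriptors, a chain walk)
     *   2048-byte clusters: one cluster per full-sized frame, with slack
     *                       left over for alignment
     *   4096-byte clusters: still one frame per cluster, so you just
     *                       waste roughly 2KB per full-sized packet
     */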


BTW, the Received Wisdom, amongst experienced high-performance
implementors using the BSD stack, is to tune the small mbuf sizes to
match the traffic.  You want MHLEN to be ``big enough'' that --
after excluding full-sized (Ethernet) frames, and other traffic
which ends up in clusters -- it can hold the majority of the
remaining packets.
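
Concretely, the win comes from the usual small-vs-cluster branch in
the stack. The sketch below is the general shape of that decision,
not a literal copy of m_devget() or the socket code; `len' and `src'
stand for the packet being built.

    /*
     * Simplified sketch of the usual allocation decision; the larger
     * MHLEN is, the more packets take the one-allocation path.
     */
    MGETHDR(m, M_DONTWAIT, MT_DATA);
    if (m == NULL)
            return NULL;
    if (len <= MHLEN) {
            /* Fits in the packet-header mbuf itself: one allocation. */
            m->m_len = len;
    } else {
            /* Doesn't fit: pay for a second allocation (a cluster). */
            MCLGET(m, M_DONTWAIT);
            if ((m->m_flags & M_EXT) == 0) {
                    m_freem(m);
                    return NULL;
            }
            m->m_len = len;         /* assume len <= MCLBYTES here */
    }
    m->m_pkthdr.len = m->m_len;
    memcpy(mtod(m, void *), src, m->m_len);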

If (for example) most of your traffic is NFS, and you tune MHLEN so
that the majority of `small' RPCs fit inside MHLEN, that can make a
noticeable difference for NFS compared with allocating and walking
two smaller mbufs.  Such tuning depends on the expected workload (I'd
expect mostly-unidirectional HTTP flows to have rather different
requirements than NFS does); so again, I'm having difficulty seeing
what action you'd like to be taken here.
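
For concreteness: ``tuning MHLEN'' in practice means adjusting MSIZE
for the port, since MHLEN is derived from it. The traditional BSD
derivation (see sys/mbuf.h; the exact struct members differ between
versions) is roughly:

    #define MLEN    (MSIZE - sizeof(struct m_hdr))   /* data bytes in a plain mbuf */
    #define MHLEN   (MLEN - sizeof(struct pkthdr))   /* data bytes in a pkthdr mbuf */

So with an MSIZE of 256 you get on the order of 200 bytes of MHLEN;
doubling MSIZE roughly doubles what a `small' RPC can carry without
a second mbuf or a cluster.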