Subject: Re: Moving ethfoo in the main tree
To: Jason Thorpe <thorpej@shagadelic.org>
From: Daniel Carosone <dan@geek.com.au>
List: tech-kern
Date: 12/15/2004 19:09:03

On Tue, Dec 14, 2004 at 01:51:31PM -0800, Jason Thorpe wrote:
>
> On Dec 14, 2004, at 1:15 PM, Daniel Carosone wrote:
>
> >Perhaps these requirements are a little different, and merit a
> >different solution specifically in gre itself, but if a generic
> >solution is feasible I'd like to encourage that path.
>
> VERY different!

Hrm.. while this may be so, I don't think the rest of your comments
directly support this.

My suggestion was about plumbing between drivers that implement
various kinds of packet handling, not so much about the handling each
might do.

My thought was that there are several different pseudo-interfaces
with different functionality: tun for userspace I/O, gre (and
similar) for encapsulation, null for simply discarding packets, and
so on.  Each of these might be more useful if it could be seen not
only as an IP (etc) layer-3 interface, but as something that handles
Ethernet-level frames as well.

If some common code to do this could be factored out and made easily
available to these interfaces, that seemed like a good idea.
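
(To make that concrete, here's a rough sketch of the sort of shared
glue I have in mind; the ethpseudo_* names are invented for
illustration, and nothing like this exists in the tree today:)

    /*
     * Sketch only: the specific driver supplies one "transmit a
     * complete Ethernet frame" hook, and the shared code makes the
     * interface look like any other Ethernet, so bridge(4), vlan(4),
     * etc. can stack on it without caring where the frames go.
     * (if_xname setup, if_ioctl, error handling etc. omitted.)
     */
    #include <sys/param.h>
    #include <sys/mbuf.h>
    #include <net/if.h>
    #include <net/if_ether.h>

    typedef int (*ethpseudo_output_t)(struct ifnet *, struct mbuf *);

    struct ethpseudo_softc {
            struct ethercom    sc_ec;       /* contains the ifnet */
            ethpseudo_output_t sc_output;   /* encap, copy, drop... */
    };

    static void
    ethpseudo_start(struct ifnet *ifp)
    {
            struct ethpseudo_softc *sc = ifp->if_softc;
            struct mbuf *m;

            /* Hand each queued frame to the specific driver. */
            for (;;) {
                    IFQ_DEQUEUE(&ifp->if_snd, m);
                    if (m == NULL)
                            break;
                    (*sc->sc_output)(ifp, m);
            }
    }

    void
    ethpseudo_attach(struct ethpseudo_softc *sc, const u_int8_t *enaddr)
    {
            struct ifnet *ifp = &sc->sc_ec.ec_if;

            ifp->if_softc = sc;
            ifp->if_start = ethpseudo_start;
            ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
            if_attach(ifp);
            ether_ifattach(ifp, enaddr);   /* standard Ethernet attach */
    }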

> When you encap inside GRE, you wind up with a lower MTU.

Sure, and it gets lower still when I'm bridge(4)ing vlan(4)-tagged
packets over gre(4) tunnels inside IPsec ESP transport - and then, if
I'm lucky, my ESP compression transform might buy me back some, all,
or even more of that space.  Or whatever else I might be doing at the
time to suit the needs of the day.
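
To put rough numbers on that stacking (purely illustrative; the ESP
figure in particular varies with cipher, padding and auth choices):

    /* Assumed per-layer costs; none of these are from a real config. */
    #define PHYS_MTU        1500    /* physical Ethernet */
    #define IP4_HDRLEN        20    /* outer IPv4 header */
    #define ESP_OVERHEAD      34    /* SPI+seq+IV+pad+ICV; varies */
    #define GRE_HDRLEN         4    /* minimal GRE header */
    #define ETHER_HDRLEN      14    /* bridged inner Ethernet header */
    #define VLAN_TAGLEN        4    /* 802.1Q tag */

    /*
     * 1500 - 20 - 34 - 4 - 14 - 4 = 1424 bytes of inner payload
     * before the outer IP layer has to fragment.
     */
    #define INNER_MTU       (PHYS_MTU - IP4_HDRLEN - ESP_OVERHEAD - \
                             GRE_HDRLEN - ETHER_HDRLEN - VLAN_TAGLEN)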

But so what? That winds up being a property of the encapsulation being
done, and a matter for the specific driver to handle as appropriate
(IP fragmentation, for example); how each driver abuses my packets is
separate from getting the Ethernet frames to that driver in a generic
fashion.

> This is going to seriously break Ethernet networks that are
> bridged to that GRE.

Not at all, though certainly there are limitations and considerations
I need to be aware of, including MTU/fragmentation issues as well as
latency and bandwidth issues.  Those issues apply already to IP and
other packets sent down a GRE; even though an additional Ethernet
header adds more bytes to these, it doesn't create the problem.

To mitigate those limitations, in some of the environments where I've
done or considered this approach, it is actually entirely feasible and
appropriate to use a lower MTU on the physical Ethernet, because I
control all the hosts. In other environments, I might not care about
the limitations or performance impact of fragmentation, or they may
not impact the protocols I care about bridging.
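
(Where I do control the hosts, that's a one-liner per machine; fxp0
and the value are just examples, matching the arithmetic above:)

    ifconfig fxp0 mtu 1424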

Assessing the capabilities and limitations of my tools, and their
applicability to my requirements, is part of my job as a network
designer.  I don't expect perfect tools, but flexible ones are a
great help.

> To bridge Ethernet over a long-haul network like that, it seems to me
> that you need to encapsulate it in a stream protocol, so that you can
> preserve the local MTU semantics.

No way - or at least not if "a stream protocol" means TCP, because
then I can wind up with the well-known TCP-over-TCP problems.
Encapsulating in GRE or UDP, with fragmentation and potential
packet-loss-via-fragment-loss, meets the generic requirements better,
other than performance/link efficiency.

Sure, someone might invent a non-TCP stream protocol which allows
inner and outer packet boundaries to slide independently, and deals
with reordering and packet loss appropriately for my needs on the day,
and provide a new driver to implement it.  Perhaps the
previously-mentioned etherip protocol does this; I haven't looked at
it yet.  Or maybe today I really do want TCP encapsulation, because
bridge(4) grew the capability to define a monitor interface that gets
a copy of all packets, and the traffic being bridged is not actually
in the conversation path but is instead for some sort of remote
sniffer or replication tool that needs reliably-ordered copies of the
packets.

In any case, I'll then be looking for the relevant driver endpoint to
behave like the other pseudo-ethernets as I described, so I can stack
bridge(4) or vlan(4) or other things on or under it like any other
Ethernet, and simply replace my gre(4) tunnel with the new one.
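
In pictures, assuming a hypothetical gre0 that has grown the
Ethernet-frame handling described above (today's gre(4) has not), the
plumbing looks no different from stacking on a real NIC:

    # hypothetical: gre0 as an Ethernet-capable bridge member
    ifconfig gre0 create
    ifconfig gre0 tunnel 192.0.2.1 192.0.2.2 up
    ifconfig bridge0 create
    brconfig bridge0 add fxp0 add gre0 up
    # swapping in some future tunnel driver changes only one line:
    #   brconfig bridge0 add newtun0 up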

Perhaps netgraph is a better way of achieving the generic plumbing I'm
envisaging?

--
Dan.
