tech-net archive

Re: AsiaBSDCon 2014 BoF: Improving bridge(4) or Toward a Unified L2 Framework



On 23 Mar, 2014, at 08:44 , Ryota Ozaki <ozaki-r%netbsd.org@localhost> wrote:
> On Sun, Mar 23, 2014 at 8:27 PM, Gert Doering <gert%greenie.muc.de@localhost> wrote:
>> Hi,
>> 
>> On Sun, Mar 23, 2014 at 07:52:23PM +0900, Ryota Ozaki wrote:
>>> This proposal is a bit radical though, anyway we think
>>> we have to improve bridge(4), say making it L3 capable.
>> 
>> What exactly would a "L3 capable" bridge(4) be?  As opposed to "normal
>> routed interfaces"?
> 
> One direction would be, as you say, to make bridge(4) a normal routed
> interface, as in the other BSD families and Linux. OTOH, we are thinking of
> another direction: having a special interface for a bridge, where that
> interface acts as a routed interface instead of the bridge itself. The design
> has the benefits that it can keep bridge(4) L2 (letting bridges have L3
> features is strange to us), keep the bridge(4) code itself simple, and its
> structure fits the unification of dataplanes that we proposed in the
> presentation.

Yes, please!  I've never understood the current behaviour of bridge(4)
interfaces, but I'm pretty sure the semantic differences between the
definition of an "interface" at the switch level and an "interface" from
the point of view of L3 protocols cause things to break when you try
to treat a single hardware interface as if it were both kinds.  Even the
interface flags don't match up, since hardware Ethernet interfaces which
are "multiaccess" interfaces for L3 are generally "point-to-point" interfaces
for L2 purposes.  I'd guess the problems with all of this will, at a minimum,
show up as broken behaviour for IP multicast, or as a pile of warts in
there to avoid breakage, as multicasting is generally the canary that
tells you something is wrong in the coal mine.

Hardware interfaces should either be L2 switched interfaces (i.e. you make
routing decisions for arriving packets by looking solely at the MAC addresses
and, optionally, VLAN tags), or they should be L3 interfaces, where the
destination MAC address might be used for drop/no-drop filtering but the
routing decision is arrived at by dumping the Ethernet header and looking
at the L3 header instead.  If the hardware interface is configured for
bridging it shouldn't have L3 configuration, and vice versa.  If you want to
add the local host to the bridge for L3 use you instead conjure up a
pseudo-interface which has one side added to the bridge group and treated
pretty much like the hardware interface members of the same bridge group,
while the other side of the pseudo-interface has a MAC address and gets the
L3 configuration.  In particular, no matter which of the bridge's hardware
interfaces a packet arrives on, by the time a packet makes it to the L3 stack
its incoming interface should be identified as the bridge group's
pseudo-interface; the things L2 forwarding considers to be "interfaces" are
not relevant to IP.  I'm happy the proposal seems to want to arrange it this
way as well.
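
To make the arrangement concrete, here is a toy model of it in Python.  It is
only a sketch under stated assumptions, not bridge(4) code or a proposed API:
the names Port, BridgeGroup, receive, "wm0" and "vif0" are all invented for
illustration, and the learning table is about as simple as one can be.  The
point it tries to show is that hardware ports carry no MAC or L3
configuration of their own, and that a frame addressed to the pseudo-interface
is handed upward labelled with the pseudo-interface's name no matter which
hardware port it arrived on.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Port:
    """A bridge group member (hypothetical model, not a kernel structure)."""
    name: str
    mac: Optional[str] = None   # only pseudo-interfaces own a MAC address
    is_pseudo: bool = False     # True: the far side faces L3 (or a raw user)


@dataclass
class BridgeGroup:
    ports: Dict[str, Port] = field(default_factory=dict)
    fdb: Dict[str, str] = field(default_factory=dict)   # MAC -> port name
    static: Set[str] = field(default_factory=set)       # MACs never relearned

    def add(self, port: Port) -> None:
        self.ports[port.name] = port
        if port.is_pseudo:
            # A pseudo-interface's own MAC is a fixed forwarding entry.
            self.fdb[port.mac] = port.name
            self.static.add(port.mac)

    def receive(self, in_port: str, src: str, dst: str) -> List[Tuple[str, str]]:
        """Forward one frame using MAC addresses only.

        Returns (action, port) pairs.  "deliver" means the frame is passed up
        through that pseudo-interface, which is then the receive interface as
        far as L3 is concerned; "transmit" means it leaves on a hardware port.
        """
        if src not in self.static:
            self.fdb[src] = in_port              # learn where the source lives
        if dst in self.fdb:
            targets = [self.fdb[dst]]            # known destination
        else:
            targets = list(self.ports)           # unknown or broadcast: flood
        out = []
        for name in targets:
            if name == in_port:
                continue                         # never send back where it came from
            action = "deliver" if self.ports[name].is_pseudo else "transmit"
            out.append((action, name))
        return out
```

With two hardware ports "wm0" and "wm1" and one pseudo-interface "vif0" in the
group, a frame for vif0's MAC address comes back as a single ("deliver",
"vif0") entry whether it is received on wm0 or on wm1, which is the property
the L3 stack cares about.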

If you do it like this, however, please consider allowing more than one
pseudo-interface to be added to the bridge group when this makes sense.  This
might make it possible to, say, share the single hardware interface in a host
between the host itself and a SIMH VAX by putting the hardware interface and
two pseudo-interfaces into a bridge group, one pseudo-interface for the host's
IP configuration and the other to be opened "raw" by the SIMH DEUNA emulation
to send and receive packets (thus avoiding BPF; when BPF is the solution it
is generally an indication that the problem remains to be solved).  This might
also help if one of the links you want to add to the bridge is an
ethernet-over-PPP or ethernet-over-ATM link and a way is needed to glue this
in, though this problem is less common than it used to be.

Finally, on an incidentally related topic, I really wish the IP multicast
support in the kernel were implemented as a unified part of the IP forwarding
path, sharing the basic forwarding code and route lookup with unicast to the
extent possible (that extent is considerable, actually), rather than the
bag-on-the-side, whole-different-thing approach used now.  The reason for this
is not that I am a fan of IP multicasting (the opposite of that is closer to
the truth), but that I find that asking the question "How does this work with
IP multicast?" and finding a good answer for that usually leads to better
design decisions for the protocol stack as a whole.  It might encourage people
to ask the question more often if the multicast support were an integrated,
unavoidable component of the basic forwarding path rather than split out into
separate files that no one ever looks at.
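
As a rough illustration of what "sharing the route lookup" could mean, here is
a toy forwarding table in Python where unicast and multicast entries live in
the same structure and go through the same lookup and the same forward()
routine.  Everything in it is an assumption made for the sketch: the names
Route, Fib, forward and the "vifN" interfaces are invented, the lookup is a
linear longest-prefix scan, and multicast entries are keyed on the group
address alone, whereas real multicast forwarding state would normally also
involve the source address.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Route:
    prefix: ipaddress.IPv4Network
    out_ifs: Tuple[str, ...]          # one interface for unicast, N for multicast
    expect_in: Optional[str] = None   # incoming-interface check (multicast only)


class Fib:
    """One table and one lookup for both kinds of destination (hypothetical)."""

    def __init__(self) -> None:
        self.routes: List[Route] = []

    def add(self, prefix: str, out_ifs, expect_in: Optional[str] = None) -> None:
        self.routes.append(
            Route(ipaddress.ip_network(prefix), tuple(out_ifs), expect_in))

    def lookup(self, dst: str) -> Optional[Route]:
        # Longest-prefix match by linear scan; fine for a sketch.
        addr = ipaddress.ip_address(dst)
        best = None
        for r in self.routes:
            if addr in r.prefix and (
                    best is None or r.prefix.prefixlen > best.prefix.prefixlen):
                best = r
        return best


def forward(fib: Fib, in_if: str, dst: str) -> List[str]:
    """Return the interfaces a packet goes out on; an empty list means drop."""
    route = fib.lookup(dst)
    if route is None:
        return []                     # no route
    if route.expect_in is not None and route.expect_in != in_if:
        return []                     # arrived on the wrong interface
    return list(route.out_ifs)        # unicast is simply the one-entry case
```

In this model a unicast route is just an entry with one output interface and
no incoming-interface check, so there is no separate multicast code path to
forget about.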

Dennis Ferguson

