Subject: Layer-2 socket proposal
To: None <tech-net@netbsd.org>
From: Christian E. Hopps <chopps@merit.edu>
List: tech-net
Date: 12/27/1999 16:40:06
This proposal is broken into two parts: the user API and the
implementation.  I'd appreciate it if any ensuing discussion kept this
division intact.

LAYER-2 SOCKETS USER API:
-------------------------

Layer-2 sockets are allocated as follows:

        socket(PF_LINK, SOCK_DGRAM, frametype);

Where frametype is selected from an enumeration of supported logical
framing types, for example FRAMETYPE_E2 (i.e., Ethernet 2.0 or
ethertype framing) and FRAMETYPE_LLC (802.2 framing).

No frames are delivered until the socket is bound.  The socket is bound
to a sockaddr_dl, which can restrict delivery of frames to a
specific subset.  The sockaddr_dl is interpreted as follows:

        sdl_index/sdl_nlen+sdl_data
                        -- specifies which interface to receive frames
                        from.  If no interface is specified, every
                        interface is eligible.
        
        sdl_type        -- if no interface is chosen the user
                        can select a specific interface type,
                        e.g., IFT_ETHER.

        sdl_slen+sdl_data
                        -- specifies a selector in the logical frame.
                        e.g. for FRAMETYPE_E2 it would be a 2 octet
                        value specifying the ethertype, and for
                        FRAMETYPE_LLC it would be the destination
                        service access point (i.e., llc_dsap).
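A minimal sketch of filling in a bind address for a FRAMETYPE_E2 socket that wants a single ethertype.  Because sockaddr_dl is BSD-specific, the block defines a local mirror, struct l2_sockaddr_dl, with the same field order as NetBSD's struct sockaddr_dl; that mirror, SKETCH_AF_LINK and l2_make_e2_bind() are hypothetical names, and placing the selector after the name and address octets in sdl_data is an assumption about how sdl_slen would be laid out.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_AF_LINK	18	/* value of AF_LINK on NetBSD */

/*
 * Local mirror of NetBSD's struct sockaddr_dl (net/if_dl.h), so the
 * sketch builds anywhere.  sdl_data holds the interface name
 * (sdl_nlen octets), then the link address (sdl_alen octets), then,
 * by assumption here, the selector (sdl_slen octets).
 */
struct l2_sockaddr_dl {
	uint8_t		sdl_len;	/* total length of sockaddr */
	uint8_t		sdl_family;	/* AF_LINK */
	uint16_t	sdl_index;	/* interface index, 0 if unset */
	uint8_t		sdl_type;	/* interface type (IFT_*), 0 if unset */
	uint8_t		sdl_nlen;	/* interface name length */
	uint8_t		sdl_alen;	/* link-level address length */
	uint8_t		sdl_slen;	/* selector length */
	char		sdl_data[12];	/* name, address, selector */
};

/*
 * Hypothetical helper: fill in a bind address for a FRAMETYPE_E2 socket
 * that only wants frames carrying `ethertype'.  ifindex 0 leaves the
 * interface unspecified, so every interface is eligible.
 */
static void
l2_make_e2_bind(struct l2_sockaddr_dl *sdl, uint16_t ifindex,
    uint16_t ethertype)
{
	char *sel;

	memset(sdl, 0, sizeof(*sdl));
	sdl->sdl_len = (uint8_t)sizeof(*sdl);
	sdl->sdl_family = SKETCH_AF_LINK;
	sdl->sdl_index = ifindex;
	sdl->sdl_slen = 2;
	/* Selector follows the (here empty) name and address. */
	sel = sdl->sdl_data + sdl->sdl_nlen + sdl->sdl_alen;
	sel[0] = (char)(ethertype >> 8);	/* network byte order */
	sel[1] = (char)(ethertype & 0xff);
}
```

For instance, l2_make_e2_bind(&sdl, 0, 0x0806) would describe "ARP frames from every interface"; the resulting address would then be passed to bind() on the proposed PF_LINK socket.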

When sending frames, the user must include a sockaddr_dl that
specifies the interface in sdl_index/sdl_nlen+sdl_data and, if the
interface type requires one, a destination address of sdl_alen octets
at sdl_data+sdl_nlen.

Upon receiving frames, the recvfrom() address will be a sockaddr_dl
with sdl_index set to the index of the interface on which the frame was
received and, if supported by the interface, the link address the frame
was sent from at sdl_data+sdl_nlen.
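The send and receive addressing above can be sketched with two small helpers.  As before, struct l2_sockaddr_dl is a hypothetical local mirror of NetBSD's struct sockaddr_dl, L2_LLADDR() imitates the LLADDR() macro from net/if_dl.h (the link address starts right after the interface name), and l2_make_dest() and l2_source_addr() are illustrative names, not part of the proposal.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_AF_LINK	18	/* value of AF_LINK on NetBSD */

/* Hypothetical mirror of NetBSD's struct sockaddr_dl. */
struct l2_sockaddr_dl {
	uint8_t		sdl_len;
	uint8_t		sdl_family;
	uint16_t	sdl_index;
	uint8_t		sdl_type;
	uint8_t		sdl_nlen;
	uint8_t		sdl_alen;
	uint8_t		sdl_slen;
	char		sdl_data[12];
};

/* Link address follows the interface name, as with LLADDR(). */
#define L2_LLADDR(s)	((s)->sdl_data + (s)->sdl_nlen)

/* Destination for sendto(): interface by index, 6-octet MAC address. */
static void
l2_make_dest(struct l2_sockaddr_dl *sdl, uint16_t ifindex,
    const uint8_t dst[6])
{
	memset(sdl, 0, sizeof(*sdl));
	sdl->sdl_len = (uint8_t)sizeof(*sdl);
	sdl->sdl_family = SKETCH_AF_LINK;
	sdl->sdl_index = ifindex;
	sdl->sdl_alen = 6;
	memcpy(L2_LLADDR(sdl), dst, 6);
}

/*
 * Copy the sender's link address out of a recvfrom() address.
 * Returns the number of octets copied; 0 if the interface supplied
 * none or the sockaddr is malformed.
 */
static size_t
l2_source_addr(const struct l2_sockaddr_dl *from, uint8_t *buf,
    size_t buflen)
{
	size_t n = from->sdl_alen;

	if ((size_t)from->sdl_nlen + n > sizeof(from->sdl_data))
		return 0;
	if (n > buflen)
		n = buflen;
	memcpy(buf, L2_LLADDR(from), n);
	return n;
}
```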

Newly Defined Socket Options:

        L2_EXCLUSIVE, int (bool)
                If true, then once a frame has been passed to the
                socket it is considered delivered and processing stops.
                Otherwise the frame remains eligible for other sockets
                or layer-3 processing.

        L2_PHYHDRINCL, int (bool)
                If true, then on reads the entire frame, including both
                the physical and logical frame headers, is passed to the
                socket, and on writes the user must include both the
                physical and logical parts of the frame header.

                Otherwise, on reads the physical portion of the frame
                header is removed before the frame is passed to the
                socket, and on writes the physical portion is supplied
                by the kernel.

        L2_RECVDSTADDR, int (bool)
                If true a control option is included that contains a
                sockaddr_dl specifying the address the frame was
                delivered to.

        L2_ADD_MEMBERSHIP, sockaddr_dl
        L2_DROP_MEMBERSHIP, sockaddr_dl
                Joins or leaves, respectively, the layer-2 multicast
                group specified in the sockaddr_dl.  The sockaddr_dl must specify
                both the interface (sdl_index or sdl_nlen) and the
                multicast address (sdl_alen).
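The rule for L2_ADD_MEMBERSHIP/L2_DROP_MEMBERSHIP arguments can be expressed as a validity check.  This is a sketch only: struct l2_sockaddr_dl is a hypothetical mirror of NetBSD's struct sockaddr_dl, l2_membership_ok() is an illustrative name, and the multicast test uses the IEEE 802 group bit (low-order bit of the first address octet), which is an assumption that holds for Ethernet-style addresses but not necessarily for other media.

```c
#include <stdint.h>

/* Hypothetical mirror of NetBSD's struct sockaddr_dl. */
struct l2_sockaddr_dl {
	uint8_t		sdl_len;
	uint8_t		sdl_family;
	uint16_t	sdl_index;
	uint8_t		sdl_type;
	uint8_t		sdl_nlen;
	uint8_t		sdl_alen;
	uint8_t		sdl_slen;
	char		sdl_data[12];
};

/*
 * Returns 1 if `sdl' names an interface (by index or by name) and
 * carries an IEEE 802 group address, 0 otherwise.
 */
static int
l2_membership_ok(const struct l2_sockaddr_dl *sdl)
{
	const unsigned char *addr;

	if (sdl->sdl_index == 0 && sdl->sdl_nlen == 0)
		return 0;		/* no interface given */
	if (sdl->sdl_alen == 0)
		return 0;		/* no address given */
	if ((unsigned)sdl->sdl_nlen + sdl->sdl_alen > sizeof(sdl->sdl_data))
		return 0;		/* lengths overrun sdl_data */
	/* Address follows the interface name, as with LLADDR(). */
	addr = (const unsigned char *)sdl->sdl_data + sdl->sdl_nlen;
	return (addr[0] & 0x01) != 0;	/* group (multicast) bit */
}
```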


LAYER-2 SOCKETS KERNEL INTERFACE:
---------------------------------

INPUT:

The main input to the L2 sockets module is through the function
l2_input().

        int
        l2_input(ifp, m, mt, ft)
                struct ifnet *ifp;
                struct mbuf *m;
                u_int mt, ft;

Where `ifp' is the interface the frame was received on, `m' is the mbuf
chain containing the frame, `mt' is the media type the frame belongs to,
and `ft' is the logical frame type.  If the function returns non-zero,
the frame should be considered delivered and processing should end.

To improve performance, each ifnet has a new field, u_int
ifp->if_l2_inuse, which is set to the number of L2 sockets bound to that
interface.  There is also a global array l2_global_inuse[maxframetype],
indexed by frame type, which counts the sockets of that logical frame
type that are not bound to a single interface (and so are eligible to
receive from every interface).

So a media input routine would do the following:

        if (l2_global_inuse[ft] || ifp->if_l2_inuse) {
                if (l2_input(ifp, m, mt, ft))
                        return;         /* consider delivered */
        }

Most media require only a single media type.  The media type determines
the offset and length of the source and destination addresses in the
physical frame and the offset of the logical frame.

Each media type must be registered with l2_register().

        int
        l2_register(dstoff, dstlen, srcoff, srclen, logoff)
                u_int dstoff, dstlen, srcoff, srclen, logoff;

The arguments give the offset from the start of the physical frame of
each item (the destination address, the source address and the logical
frame) and, for the two addresses, their length.

The function returns the media type.
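The registration scheme can be sketched as a small table indexed by media type.  This is a user-space sketch under several assumptions: a flat byte buffer stands in for the mbuf chain, L2_MAXMEDIA and l2_split() are hypothetical, returning -1 from l2_register() when the table is full is an invented error convention, and the Ethernet values (0, 6, 6, 6, 12) assume the FRAMETYPE_E2 logical frame begins at the 2-octet type field.

```c
#include <stddef.h>

#define L2_MAXMEDIA	8	/* hypothetical table size */

/* One registered media type: where things live in the physical frame. */
struct l2_media {
	unsigned	dstoff, dstlen;	/* destination address */
	unsigned	srcoff, srclen;	/* source address */
	unsigned	logoff;		/* start of the logical frame */
};

static struct l2_media l2_media[L2_MAXMEDIA];
static unsigned l2_nmedia;

/* Returns the new media type (a table index), or -1 if the table is full. */
static int
l2_register(unsigned dstoff, unsigned dstlen, unsigned srcoff,
    unsigned srclen, unsigned logoff)
{
	struct l2_media *md;

	if (l2_nmedia == L2_MAXMEDIA)
		return -1;
	md = &l2_media[l2_nmedia];
	md->dstoff = dstoff;
	md->dstlen = dstlen;
	md->srcoff = srcoff;
	md->srclen = srclen;
	md->logoff = logoff;
	return (int)l2_nmedia++;
}

/*
 * Locate the addresses and logical frame of a received frame using its
 * media type `mt'.  Returns 0 on success, -1 for an unknown media type
 * or a frame too short to hold the registered fields.
 */
static int
l2_split(unsigned mt, const unsigned char *frame, size_t len,
    const unsigned char **dst, const unsigned char **src,
    const unsigned char **logical)
{
	const struct l2_media *md;

	if (mt >= l2_nmedia)
		return -1;
	md = &l2_media[mt];
	if (len < (size_t)md->dstoff + md->dstlen ||
	    len < (size_t)md->srcoff + md->srclen || len < md->logoff)
		return -1;
	*dst = frame + md->dstoff;
	*src = frame + md->srcoff;
	*logical = frame + md->logoff;
	return 0;
}
```

A driver would call l2_register() once, at attach or module load time, and keep the returned value to pass as `mt' to l2_input().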

In an earlier version of the code the interface type was used as the
media type; however, the author has been informed that certain media can
have multiple physical formats and thus require more than a single
identifying value.

Further, the only use of the media type is to collapse the five arguments
that would otherwise have to be passed to l2_input().  If people really
dislike the media type approach, we can pass these values directly to
l2_input().

Finally, we could hard-code media types into the L2 code if people like
the argument savings of a media type but don't like the dynamic nature of
l2_register().  The intention of dynamic registration was to allow new
media types to be added to the system at run time.

OUTPUT:

ifp->if_output() is used to queue frames for interfaces.  A sockaddr_dl
is passed as the 'dst' argument, with sdl_family set to AF_LINK if only
the logical frame is present.  The interface must then prepend the
physical portion of the frame using the address found in 'dst'.

If the entire physical frame is present in the mbuf chain, the pseudo
family AF_LINK_COMPLETE is specified in sdl_family, and dst contains no
address data.

If people don't like AF_LINK_COMPLETE, we could instead use the cleaner
approach of setting 'dst' to 0.  Using the fake AF value avoids having
to audit every driver to make sure a null 'dst' is handled.  A similar
case was previously handled the same way, with `pseudo_AF_HDRCMPLT'.

REST:

All the other code is standard socket goo, with particular attention paid
to performance in l2_input().  Most of that performance comes from
queueing sockets in hash tables keyed on interface index and frame type.
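The dispatch path can be sketched in user space to show how the in-use counters and the hash on (interface index, frame type) fit together.  Everything here is an assumption for illustration: struct l2_sock, l2_bind(), l2_input_sketch(), the table sizes and hash function are hypothetical, the array l2_if_inuse[] stands in for the per-interface ifnet counter, index 0 marks a socket bound to every interface, the selector and sdl_type checks are omitted, and "delivery" only bumps a counter instead of copying the mbuf to a socket buffer.

```c
#include <stddef.h>

#define L2_NHASH	16	/* hypothetical bucket count */
#define L2_MAXIF	16	/* hypothetical max interface index + 1 */
#define L2_MAXFT	4	/* hypothetical number of frame types */
#define L2_HASH(ifi, ft)	(((ifi) * 31u + (ft)) % L2_NHASH)

struct l2_sock {
	struct l2_sock	*next;		/* hash bucket chain */
	unsigned	ifindex;	/* 0: bound to every interface */
	unsigned	ft;		/* logical frame type */
	int		exclusive;	/* L2_EXCLUSIVE set */
	unsigned	delivered;	/* frames handed to this socket */
};

static struct l2_sock *l2_hash[L2_NHASH];
static unsigned l2_global_inuse[L2_MAXFT];	/* wildcard sockets per ft */
static unsigned l2_if_inuse[L2_MAXIF];		/* sockets per interface */

static int
l2_bind(struct l2_sock *so, unsigned ifindex, unsigned ft, int exclusive)
{
	unsigned h;

	if (ifindex >= L2_MAXIF || ft >= L2_MAXFT)
		return -1;
	h = L2_HASH(ifindex, ft);
	so->ifindex = ifindex;
	so->ft = ft;
	so->exclusive = exclusive;
	so->delivered = 0;
	so->next = l2_hash[h];		/* insert at head of bucket */
	l2_hash[h] = so;
	if (ifindex == 0)
		l2_global_inuse[ft]++;
	else
		l2_if_inuse[ifindex]++;
	return 0;
}

/* Deliver to sockets bound to (bind_if, ft); 1 if an exclusive one took it. */
static int
l2_deliver_bucket(unsigned bind_if, unsigned ft)
{
	struct l2_sock *so;

	for (so = l2_hash[L2_HASH(bind_if, ft)]; so != NULL; so = so->next) {
		if (so->ifindex != bind_if || so->ft != ft)
			continue;	/* another key sharing this bucket */
		so->delivered++;
		if (so->exclusive)
			return 1;	/* stop: frame consumed */
	}
	return 0;
}

/* Returns non-zero if the frame should be considered delivered. */
static int
l2_input_sketch(unsigned ifindex, unsigned ft)
{
	if (ifindex >= L2_MAXIF || ft >= L2_MAXFT)
		return 0;
	if (l2_if_inuse[ifindex] && l2_deliver_bucket(ifindex, ft))
		return 1;		/* interface-specific sockets first */
	if (l2_global_inuse[ft] && l2_deliver_bucket(0, ft))
		return 1;		/* then wildcard sockets */
	return 0;
}
```

Checking interface-specific sockets before wildcard ones is only one possible ordering; the proposal does not say which should win when an exclusive socket is involved.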

I'm mostly done coding the basics of the above.  I would probably be
ready to commit in less than a week, depending, of course, on the
reaction to the design.

Chris.