tech-kern archive


Re: Enable to send packets on if_loop via bpf



On Tue, Nov 22, 2022 at 8:00 PM Ryota Ozaki <ozaki.ryota%gmail.com@localhost> wrote:
>
> On Tue, Nov 22, 2022 at 12:49 AM Greg Troxel <gdt%lexort.com@localhost> wrote:
> >
> >
> > Ryota Ozaki <ozaki.ryota%gmail.com@localhost> writes:
> >
> > > In the specification, DLT_NULL assumes a protocol family in host
> > > byte order followed by a payload.  Interfaces of DLT_NULL use
> > > bpf_mtap_af to pass an mbuf with the protocol family prepended.  All
> > > interfaces follow the spec and work well.
> > >
> > > OTOH, bpf_write to interfaces of DLT_NULL is a bit of a sad situation.
> > > Data written to an interface of DLT_NULL is treated as raw data
> > > (I don't know why); the data is passed to the interface's output routine
> > > as is with dst (sa_family=AF_UNSPEC).  tun seems to be able
> > > to handle such raw data but the others can't handle it (probably
> > > the data will be dropped, as with if_loop).
> >
> > Summarizing and commenting to make sure I'm not confused
> >
> >   on receive/read, DLT_NULL  prepends AF in host byte order
> >   on transmit/write, it just sends with AF_UNSPEC
> >
> >   This seems broken as it is asymmetric, and is bad because it throws
> >   away information that is hard to reliably recreate.  On the other hand
> >   this is for link-layer formats, and it seems that some interfaces have
> >   an AF that is not really part of what is transmitted, even though
> >   really it is.  For example tun is using an IP proto byte to specify AF
> >   and really this is part of the link protocol.  Except we pretend it
> >   isn't.
>
> I found the following sentence in bpf.4:
>
>      A packet can be sent out on the network by writing to a bpf file
>      descriptor.  The writes are unbuffered, meaning only one packet can be
>      processed per write.  Currently, only writes to Ethernets and SLIP links
>      are supported.
>
> So bpf_write to interfaces of DLT_NULL may be simply unsupported on
> NetBSD...
>
> >
> > > Correcting bpf_write to assume a prepended protocol family will
> > > save some interfaces like gif and gre but won't save others like stf
> > > and wg.  Even worse, the change may break existing users of tun
> > > that want to treat data as is (though I don't know if users exist).
> > >
> > > BTW, prepending a protocol family on tun is a different protocol from
> > > DLT_NULL of bpf.  tun has three protocol modes and doesn't always prepend
> > > a protocol family.  (And also the network byte order is used on tun
> > > as gert says while DLT_NULL assumes the host byte order.)
> >
> > wow.
> >
> > > So my fix will:
> > > - keep DLT_NULL for if_loop so as not to break bpf_mtap_af, and
> > > - leave DLT_NULL handling in bpf_write unchanged except for if_loop,
> > >   so as not to bother existing users.
> > > The patch looks like this:
> > >
> > > @@ -447,6 +448,14 @@ bpf_movein(struct uio *uio, int linktype, uint64_t mtu, struct mbuf **mp,
> > >                 m0->m_len -= hlen;
> > >         }
> > >
> > > +       if (linktype == DLT_NULL && ifp->if_type == IFT_LOOP) {
> > > +               uint32_t af;
> > > +               memcpy(&af, mtod(m0, void *), sizeof(af));
> > > +               sockp->sa_family = af;
> > > +               m0->m_data += sizeof(af);
> > > +               m0->m_len -= sizeof(af);
> > > +       }
> > > +
> > >         *mp = m0;
> > >         return (0);
> >
> > That seems ok to me.
>
> Thanks.
>
> >
> >
> > I think the long-term right fix is to define DLT_AF which has an AF word
> > in host order on receive and transmit always, and to modify interfaces
> > to use it whenever they are AF aware at all.   In this case tun would
> > fill in the AF word from the IP proto field, and you'd get a
> > transformed/regularized AF word when really the "link layer packet" had
> > the IP proto field.  But that's ok as it's just cleanup and reversible.
>
> I think introducing DLT_AF is a bit of a tough task because DLT_* definitions
> are managed by us.
   ^ are NOT managed, I meant to say...
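
For reference, here is a minimal userland sketch of the DLT_NULL framing
discussed above: a 4-byte protocol family in host byte order followed by
the payload.  The helper names dltnull_build and dltnull_split are
hypothetical (they are not part of bpf or NetBSD), and the length checks
are an addition for the sketch.  The split step mirrors what the patch
does on the mbuf: copy out the family word, then skip past it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>		/* AF_INET, AF_INET6 */

/*
 * Hypothetical helper: build a DLT_NULL frame in buf.
 * The family word is stored in host byte order (no htonl()), which is
 * what DLT_NULL assumes; tun's own header uses network byte order.
 * Returns the frame length, or 0 if buf is too small.
 */
static size_t
dltnull_build(uint8_t *buf, size_t buflen, uint32_t af,
    const void *payload, size_t paylen)
{
	if (buflen < sizeof(af) || paylen > buflen - sizeof(af))
		return 0;
	memcpy(buf, &af, sizeof(af));
	memcpy(buf + sizeof(af), payload, paylen);
	return sizeof(af) + paylen;
}

/*
 * Hypothetical helper: split a DLT_NULL frame into family and payload.
 * memcpy is used because the frame may not be 4-byte aligned.
 * Returns 0 on success, -1 if the frame is shorter than the family word.
 */
static int
dltnull_split(const uint8_t *frame, size_t framelen, uint32_t *af,
    const uint8_t **payload, size_t *paylen)
{
	if (framelen < sizeof(*af))
		return -1;
	memcpy(af, frame, sizeof(*af));
	*payload = frame + sizeof(*af);
	*paylen = framelen - sizeof(*af);
	return 0;
}
```

With the patch applied, a frame built this way (e.g. with AF_INET) is
what one would presumably write(2) to a bpf descriptor bound to lo0;
that usage is my assumption and is not tested here.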

  ozaki-r

