tech-kern archive


Re: Enable to send packets on if_loop via bpf



On Tue, Nov 22, 2022 at 8:25 PM Ryota Ozaki <ozaki.ryota%gmail.com@localhost> wrote:
>
> On Tue, Nov 22, 2022 at 8:00 PM Ryota Ozaki <ozaki.ryota%gmail.com@localhost> wrote:
> >
> > On Tue, Nov 22, 2022 at 12:49 AM Greg Troxel <gdt%lexort.com@localhost> wrote:
> > >
> > >
> > > Ryota Ozaki <ozaki.ryota%gmail.com@localhost> writes:
> > >
> > > > In the specification DLT_NULL assumes a protocol family in the host
> > > > byte order followed by a payload.  Interfaces of DLT_NULL use
> > > > bpf_mtap_af to pass an mbuf with a protocol family prepended.  All interfaces
> > > > follow the spec and work well.
> > > >
> > > > OTOH, bpf_write to interfaces of DLT_NULL is a bit of a sad situation.
> > > > Data written to an interface of DLT_NULL is treated as raw data
> > > > (I don't know why); the data is passed to the interface's output routine
> > > > as is with dst (sa_family=AF_UNSPEC).  tun seems to be able
> > > > to handle such raw data but the others can't handle it (probably
> > > > the data will be dropped, as it is by if_loop).
> > >
> > > Summarizing and commenting to make sure I'm not confused
> > >
> > >   on receive/read, DLT_NULL  prepends AF in host byte order
> > >   on transmit/write, it just sends with AF_UNSPEC
> > >
> > >   This seems broken as it is asymmetric, and is bad because it throws
> > >   away information that is hard to reliably recreate.  On the other hand
> > >   this is for link-layer formats, and it seems that some interfaces have
> > >   an AF that is not really part of what is transmitted, even though
> > >   really it is.  For example tun is using an IP proto byte to specify AF
> > >   and really this is part of the link protocol.  Except we pretend it
> > >   isn't.
> >
> > I found the following sentence in bpf.4:
> >
> >      A packet can be sent out on the network by writing to a bpf file
> >      descriptor.  The writes are unbuffered, meaning only one packet can be
> >      processed per write.  Currently, only writes to Ethernets and SLIP links
> >      are supported.
> >
> > So bpf_write to interfaces of DLT_NULL may be simply unsupported on
> > NetBSD...
> >
> > >
> > > > Correcting bpf_write to assume a prepended protocol family will
> > > > save some interfaces like gif and gre but won't save others like stf
> > > > and wg.  Even worse, the change may break existing users of tun
> > > > that want to treat data as is (though I don't know if users exist).
> > > >
> > > > BTW, prepending a protocol family on tun is a different protocol from
> > > > DLT_NULL of bpf.  tun has three protocol modes and doesn't always prepend
> > > > a protocol family.  (And also the network byte order is used on tun
> > > > as gert says while DLT_NULL assumes the host byte order.)
> > >
> > > wow.
> > >
> > > > So my fix will:
> > > > - keep DLT_NULL of if_loop so as not to break bpf_mtap_af, and
> > > > - leave DLT_NULL handling in bpf_write unchanged except for if_loop, so as
> > > >   not to bother existing users.
> > > > The patch looks like this:
> > > >
> > > > @@ -447,6 +448,14 @@ bpf_movein(struct uio *uio, int linktype, uint64_t mtu, struct mbuf **mp,
> > > >                 m0->m_len -= hlen;
> > > >         }
> > > >
> > > > +       if (linktype == DLT_NULL && ifp->if_type == IFT_LOOP) {
> > > > +               uint32_t af;
> > > > +               memcpy(&af, mtod(m0, void *), sizeof(af));
> > > > +               sockp->sa_family = af;
> > > > +               m0->m_data += sizeof(af);
> > > > +               m0->m_len -= sizeof(af);
> > > > +       }
> > > > +
> > > >         *mp = m0;
> > > >         return (0);
> > >
> > > That seems ok to me.
> >
> > Thanks.
> >
> > >
> > >
> > > I think the long-term right fix is to define DLT_AF which has an AF word
> > > in host order on receive and transmit always, and to modify interfaces
> > > to use it whenever they are AF aware at all.   In this case tun would
> > > fill in the AF word from the IP proto field, and you'd get a
> > > transformed/regularized AF word when really the "link layer packet" had
> > > the IP proto field.  But that's ok as it's just cleanup and reversible.
> >
> > I think introducing DLT_AF is a bit of a tough task because DLT_* definitions
> > are managed by us.
>    ^ are NOT managed, I meant to say...
>
>   ozaki-r

https://www.netbsd.org/~ozaki-r/loop-bpf2.patch

Anyway, this is the latest patch.  It has been adjusted to ensure that
input validation is applied (as pointed out by ryo@).

  ozaki-r
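
For illustration only, here is a minimal userland sketch of the DLT_NULL
framing discussed in this thread: a 4-byte protocol family in host byte
order followed by the payload, with a length check before the header is
read.  It is not taken from loop-bpf2.patch.  The helper names
dlt_null_build and dlt_null_split are hypothetical, and restricting the
family to AF_INET/AF_INET6 is an assumption of this sketch, not
necessarily what the patch validates.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/*
 * Hypothetical helper: build a DLT_NULL frame by prepending the
 * protocol family in host byte order (no htonl()) to the payload.
 * Returns the frame length, or 0 if the output buffer is too small.
 */
static size_t
dlt_null_build(uint8_t *out, size_t outlen, uint32_t af,
    const uint8_t *payload, size_t plen)
{
	if (outlen < sizeof(af) || plen > outlen - sizeof(af))
		return 0;
	memcpy(out, &af, sizeof(af));
	memcpy(out + sizeof(af), payload, plen);
	return sizeof(af) + plen;
}

/*
 * Hypothetical helper: split a DLT_NULL frame into its protocol family
 * and payload.  Returns 0 on success, -1 if the frame is shorter than
 * the header or the family is not one this sketch accepts.
 */
static int
dlt_null_split(const uint8_t *buf, size_t len, uint32_t *afp,
    const uint8_t **payloadp, size_t *plenp)
{
	uint32_t af;

	/* Validate the length before touching the header. */
	if (len < sizeof(af))
		return -1;
	/* memcpy avoids an unaligned load; the value is in host order. */
	memcpy(&af, buf, sizeof(af));
	/* Assumption of this sketch: only IPv4 and IPv6 are accepted. */
	if (af != (uint32_t)AF_INET && af != (uint32_t)AF_INET6)
		return -1;
	*afp = af;
	*payloadp = buf + sizeof(af);
	*plenp = len - sizeof(af);
	return 0;
}
```

A writer to a bpf descriptor bound to lo0 would, under the proposed
behaviour, pass a buffer shaped like the output of dlt_null_build to
write(2); dlt_null_split mirrors the step the bpf_movein hunk above
performs on the mbuf, with the length check added.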

