Ryota Ozaki <ozaki.ryota%gmail.com@localhost> writes:

> In the specification DLT_NULL assumes a protocol family in the host
> byte order followed by a payload.  Interfaces of DLT_NULL uses
> bpf_mtap_af to pass a mbuf prepending a protocol family.  All interfaces
> follow the spec and work well.
>
> OTOH, bpf_write to interfaces of DLT_NULL is a bit of a sad situation.
> A writing data to an interface of DLT_NULL is treated as a raw data
> (I don't know why); the data is passed to the interface's output routine
> as is with dst (sa_family=AF_UNSPEC).  tun seems to be able
> to handle such raw data but the others can't handle the data (probably
> the data will be dropped like if_loop).

Summarizing and commenting to make sure I'm not confused:

  on receive/read, DLT_NULL prepends AF in host byte order

  on transmit/write, it just sends with AF_UNSPEC

This seems broken as it is asymmetric, and is bad because it throws away
information that is hard to reliably recreate.  On the other hand this
is for link-layer formats, and it seems that some interfaces have an AF
that is not really part of what is transmitted, even though really it
is.  For example tun is using an IP proto byte to specify AF and really
this is part of the link protocol.  Except we pretend it isn't.

> Correcting bpf_write to assume a prepending protocol family will
> save some interfaces like gif and gre but won't save others like stf
> and wg.  Even worse, the change may break existing users of tun
> that want to treat data as is (though I don't know if users exist).
>
> BTW, prepending a protocol family on tun is a different protocol from
> DLT_NULL of bpf.  tun has three protocol modes and doesn't always prepend
> a protocol family.  (And also the network byte order is used on tun
> as gert says while DLT_NULL assumes the host byte order.)

wow.

> So my fix will:
> - keep DLT_NULL of if_loop to not break bpf_mtap_af, and
> - unchange DLT_NULL handling in bpf_write except for if_loop to bother
>   existing users.
> The patch looks like this:
>
> @@ -447,6 +448,14 @@ bpf_movein(struct uio *uio, int linktype,
> uint64_t mtu, struct mbuf **mp,
>                 m0->m_len -= hlen;
>         }
>
> +       if (linktype == DLT_NULL && ifp->if_type == IFT_LOOP) {
> +               uint32_t af;
> +               memcpy(&af, mtod(m0, void *), sizeof(af));
> +               sockp->sa_family = af;
> +               m0->m_data += sizeof(af);
> +               m0->m_len -= sizeof(af);
> +       }
> +
>         *mp = m0;
>         return (0);

That seems ok to me.

I think the long-term right fix is to define DLT_AF which has an AF word
in host order on receive and transmit always, and to modify interfaces
to use it whenever they are AF aware at all.  In this case tun would
fill in the AF word from the IP proto field, and you'd get a
transformed/regularized AF word when really the "link layer packet" had
the IP proto field.  But that's ok as it's just cleanup and reversible.
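To make the framing concrete, here is a rough userland sketch of the
DLT_NULL layout as described above: a 4-byte AF word in host byte order
followed by the payload.  This is not kernel code and not part of
bpf(4); the helper names null_frame_build and null_frame_strip are made
up for illustration.  The strip side only mirrors the pointer/length
adjustment the patch does in bpf_movein, on a flat buffer instead of an
mbuf.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>	/* AF_INET etc. for callers */

/*
 * Hypothetical helpers, userland only.  They assume the DLT_NULL
 * layout discussed in this thread: AF word (host order) + payload.
 */

/*
 * Write the AF word in host byte order, then the payload.
 * Returns the total frame length, or 0 if buf is too small.
 */
static size_t
null_frame_build(uint8_t *buf, size_t buflen, uint32_t af,
    const void *payload, size_t paylen)
{
	if (buflen < sizeof(af) || paylen > buflen - sizeof(af))
		return 0;
	memcpy(buf, &af, sizeof(af));		/* no htonl: host order */
	memcpy(buf + sizeof(af), payload, paylen);
	return sizeof(af) + paylen;
}

/*
 * Read the AF word and skip past it, as the patch does with
 * m_data/m_len.  Returns a pointer to the payload, or NULL if the
 * frame is too short to hold an AF word.
 */
static const uint8_t *
null_frame_strip(const uint8_t *frame, size_t framelen, uint32_t *afp,
    size_t *paylenp)
{
	if (framelen < sizeof(*afp))
		return NULL;
	memcpy(afp, frame, sizeof(*afp));	/* memcpy avoids alignment traps */
	*paylenp = framelen - sizeof(*afp);
	return frame + sizeof(*afp);
}
```

Note the length check in the strip helper; the sketch assumes a writer
may hand in fewer than 4 bytes, which is a case worth keeping in mind
for the kernel side as well.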