tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: RFC: L2TPv3 interface



Hi riastradh@n.o,

Thank you for your detailed review!


At first, here is updated patches.
   http://netbsd.org/~knakahara/if-l2tp-2/01-accept-ifname-include-digit.patch
   http://netbsd.org/~knakahara/if-l2tp-2/02-if-l2tp.patch

And then, each response is below.

On 2017/01/20 0:38, Taylor R Campbell wrote:
>    Date: Thu, 19 Jan 2017 17:58:17 +0900
>    From: Kengo NAKAHARA <k-nakahara%iij.ad.jp@localhost>
> A few little comments:
> 
>    diff --git a/sys/net/if.c b/sys/net/if.c
>    index 2386af3..ba63266 100644
>    --- a/sys/net/if.c
>    +++ b/sys/net/if.c
>    @@ -1599,7 +1613,7 @@ if_clone_lookup(const char *name, int *unitp)
>            strcpy(ifname, "if_");
>            /* separate interface name from unit */
>            for (dp = ifname + 3, cp = name; cp - name < IFNAMSIZ &&
>    -           *cp && (*cp < '0' || *cp > '9');)
>    +           *cp && !if_is_unit(cp);)
>                    *dp++ = *cp++;
> 
> This changes the generic syntax interface names, perhaps to allow the
> `2' in `l2tp', although since this loop skips over the first three
> octets that doesn't seem to be necessary.  Either way, I don't have a
> problem with this, but it should be done in a separate change.

I see. But sorry, I want to postpone the fix to reduce unnecessary
skip... As a first step, I separate this changes and will commit first.


>    diff --git a/sys/net/if_l2tp.c b/sys/net/if_l2tp.c
>    new file mode 100644
>    index 0000000..dda8bbd
>    --- /dev/null
>    +++ b/sys/net/if_l2tp.c
>    @@ -0,0 +1,1541 @@
>    [...]
>    +/*
>    + * l2tp global variable definitions
>    + */
>    +LIST_HEAD(l2tp_sclist, l2tp_softc);
>    +static struct l2tp_sclist l2tp_softc_list;
>    +kmutex_t l2tp_list_lock;
>    +
>    +#if !defined(L2TP_ID_HASH_SIZE)
>    +#define L2TP_ID_HASH_SIZE 64
>    +#endif
>    +static u_long l2tp_id_hash_mask;
>    +
>    +kmutex_t l2tp_hash_lock;
>    +static struct pslist_head *l2tp_hashed_list = NULL;
> 
> Consider putting related global state into cacheline-aligned structs?

Oh, I forgot it. I should put them into cacheline-aligned structs.
In addition, I remove l2tp_id_hash_mask variable and use L2TP_ID_HASH_SIZE
to avoid holding lock in fast-path (l2tp_lookup_session_ref() =>
id_hash_func()).


> static struct {
>         kmutex_t		lock;
>         struct l2tp_sclist	list;
> } l2tp_softc __cacheline_aligned;
> 
> static struct {
>         kmutex_t		lock;
>         struct pslist_head	*list;
> 	unsigned long		mask;
> } l2tp_hash __cacheline_aligned;
> 
>    +pserialize_t l2tp_psz;
>    +struct psref_class *lv_psref_class __read_mostly;
> 
> __read_mostly for l2tp_psz?

Yes, I add it.

>    +static int
>    +l2tpdetach(void)
>    +{
>    +       int error;
>    +
>    +       if (!LIST_EMPTY(&l2tp_softc_list))
>    +               return EBUSY;
> 
> Need lock here?  Need to first set flag preventing new creation?
> 
> mutex_enter(&l2tp_softc.lock);
> KASSERT(!l2tp_softc.dying);
> l2tp_softc.detaching = true;
> if (!LIST_EMPTY(&l2tp_softc.list)) {
> 	l2tp_softc.detaching = false;
> 	mutex_exit(&l2tp_softc.lock);
> 	return EBUSY;
> }
> mutex_exit(&l2tp_softc.lock);
> 
> Anyone trying to add to l2tp_softc.list must also check
> l2tp_softc.detaching before proceeding.

You are right. Hmm..., it's seems there are same problems in
other module'd interfaces such as pppoe(4), gre(4), and so on.
I think module framework could fix this problem, so I will ask
pgoyette@n.o and christos@n.o if they have any idea later.


>    +static int
>    +l2tp_clone_destroy(struct ifnet *ifp)
>    +{
>    +       struct l2tp_softc *sc = (void *) ifp;
> 
> Use container_of here:
> 
> 	struct l2tp_softc *sc = container_of(ifp, struct l2tp_softc,
> 	    l2tp_ec.ec_if);
> 
> No functional difference, but the compiler type-checks it.

I use container_of here and similar codes.

>    +void
>    +l2tp_input(struct mbuf *m, struct ifnet *ifp)
>    +{
>    +
>    +       KASSERT(ifp != NULL);
>    +
>    +       if (0 == (mtod(m, u_long) & 0x03)) {
>    +               /* copy and align head of payload */
>    +               struct mbuf *m_head;
>    +               int copy_length;
>    +
>    +#define L2TP_COPY_LENGTH               60
>    +#define L2TP_LINK_HDR_ROOM     (MHLEN - L2TP_COPY_LENGTH - 4/*round4(2)*/)
>    +
>    +               if (m->m_pkthdr.len < L2TP_COPY_LENGTH) {
>    +                       copy_length = m->m_pkthdr.len;
>    +               } else {
>    +                       copy_length = L2TP_COPY_LENGTH;
>    +               }
>    +
>    +               if (m->m_len < copy_length) {
>    +                       m = m_pullup(m, copy_length);
>    +                       if (m == NULL)
>    +                               return;
>    +               }
>    +
>    +               MGETHDR(m_head, M_DONTWAIT, MT_HEADER);
>    +               if (m_head == NULL) {
>    +                       m_freem(m);
>    +                       return;
>    +               }
>    +               M_COPY_PKTHDR(m_head, m);
>    +
>    +               m_head->m_data += 2 /* align */ + L2TP_LINK_HDR_ROOM;
>    +               memcpy(m_head->m_data, m->m_data, copy_length);
>    +               m_head->m_len = copy_length;
>    +               m->m_data += copy_length;
>    +               m->m_len -= copy_length;
>    +
>    +               /* construct chain */
>    +               if (m->m_len == 0) {
>    +                       m_head->m_next = m_free(m); /* not m_freem */
>    +               } else {
>    +                       /*
>    +                        * copyed mtag in previous call M_COPY_PKTHDR
>    +                        * but don't delete mtag in case cutt of M_PKTHDR flag
>    +                        */
>    +                       m_tag_delete_chain(m, NULL);
>    +                       m->m_flags &= ~M_PKTHDR;
>    +                       m_head->m_next = m;
>    +               }
>    +
>    +               /* override m */
>    +               m = m_head;
>    +       }
> 
> Someone more familiar with the mbuf API than I should review this mbuf
> juggling show!

I also want someone() to do so...


>    +       case SIOCSIFMTU:
>    +               error = kauth_authorize_generic(kauth_cred_get(),
>    +                   KAUTH_GENERIC_ISSUSER, NULL);
> 
> Why the kauth check here and not in any other drivers?  Is this kauth
> check unnecessary, or does its absence in other drivers indicate a
> bug?  Likewise in a few other places below.

Oh, it must be vestige of old version kernel's manner. I remove it.


>    +               if (error)
>    +                       break;
>    +               switch (cmd) {
>    +#ifdef INET
>    +               case SIOCSIFPHYADDR:
>    +                       src = (struct sockaddr *)
>    +                               &(((struct in_aliasreq *)data)->ifra_addr);
>    +                       dst = (struct sockaddr *)
>    +                               &(((struct in_aliasreq *)data)->ifra_dstaddr);
> 
> Consider using one more local variable instead of multiple levels of
> nesting?
>
> case SIOCSIFPHYADDR: {
> 	struct in_aliasreq *aliasreq = data;
> 	src = (struct sockaddr *)&aliasreq->ifra_data;
> 	dst = (struct sockaddr *)&aliasreq->ifra_dstaddr;
> 	...
> }
> 
> Likewise in a few other places below.

Hmm, I think separating SIOCSIFPHYADDR, SIOCSIFPHYADDR_IN6 and SIOCSLIFPHYADDR
cases makes simpler. I refactor such way.


>    +       error = encap_lock_enter();
>    +       if (error)
>    +               goto error;
>    +
>    +       mutex_enter(&sc->l2tp_lock);
> 
> Document lock order of encap_lock ---> struct l2tp_softc::l2tp_lock?

I missed it. I add locking order at the end if_l2tp.h.


>    +       ovar = sc->l2tp_var;
>    +       osrc = ovar->lv_psrc;
>    +       odst = ovar->lv_pdst;
>    +       memcpy(nvar, ovar, sizeof(*nvar));
> 
> You can just do
> 
> 	*nvar = *ovar;
> 
> here, since they are both guaranteed to be aligned.

I fix it.


>    +static int id_hash_func(uint32_t id)
>    +{
>    +       uint32_t hash;
>    +
>    +       hash = (id >> 16) ^ id;
>    +       hash = (hash >> 4) ^ hash;
>    +
>    +       return hash & l2tp_id_hash_mask;
>    +}
> 
> Is this hash function an essential part of the l2tp protocol, or is it
> just something that will more likely involve all the bits of id when
> masking with l2tp_id_hash_mask?  (Asking so I can know whether it is
> safe to replace by, e.g., siphash later, once I get around to adding
> the siphash code I've been sitting on for about five years now.)

It is not essential of L2TPv3 protocol. So, it is safe to replace
other functions :)
# The session id would be random value set by userland command/daemon.
# And then, kernel just search correct l2tp_softc by the session id.


>    +/*
>    + * l2tp_variant update API.
>    + *
>    + * Assumption:
>    + * reader side dereferences sc->l2tp_var in reader critical section only,
>    + * that is, all of reader sides do not reader the sc->l2tp_var after
>    + * pserialize_perform().
>    + */
>    +static void
>    +l2tp_variant_update(struct l2tp_softc *sc, struct l2tp_variant *nvar)
>    +{
>    +       struct ifnet *ifp = &sc->l2tp_ec.ec_if;
>    +       struct l2tp_variant *ovar = sc->l2tp_var;
>    +
>    +       KASSERT(mutex_owned(&sc->l2tp_lock));
>    +
>    +       membar_producer();
>    +       atomic_swap_ptr(&sc->l2tp_var, nvar);
>    +       pserialize_perform(l2tp_psz);
>    +       psref_target_destroy(&ovar->lv_psref, lv_psref_class);
> 
> No need for atomic_swap_ptr.  Just
> 
>         sc->l2tp_var = nvar;
> 
> is enough.  Nobody else can write to it because we hold the lock.

Between writer and writer, it is correct. However, between writer and
reader, I think atomic_swap_ptr is required to prevent reader's load
before writer's store done. Is this correct?


>    diff --git a/sys/net/if_l2tp.h b/sys/net/if_l2tp.h
>    new file mode 100644
>    index 0000000..1aae23c
>    --- /dev/null
>    +++ b/sys/net/if_l2tp.h
>    @@ -0,0 +1,206 @@
>    [...]
>    +#include <net/if_ether.h>
>    +#include <netinet/in.h>
>    +/* xxx sigh, why route have struct route instead of pointer? */
> 
> Unclear what this comment refers to?

Sorry, garbage comment.
# It is vestige of original file...

>    +
>    +#define SIOCSL2TPSESSION       _IOW('i', 151, struct ifreq)
>    +#define        SIOCDL2TPSESSION _IOW('i', 152, struct ifreq)
>    +#define SIOCSL2TPCOOKIE        _IOW('i', 153, struct ifreq)
>    +#define        SIOCDL2TPCOOKIE _IOW('i', 154, struct ifreq)
>    +#define        SIOCSL2TPSTATE  _IOW('i', 155, struct ifreq)
>    +#define SIOCGL2TP      SIOCGIFGENERIC
> 
> Pick tabs or spaces and be consistent?  (Makes diffs look nicer.
> Usual rule is `#define<TAB>xyz<TAB>'.)

I apply "#define<TAB>xyz<TAB>" rule to if_l2tp.h and in_l2tp.h.

> Say struct l2tp_req, not struct ifreq, if that's what you mean?

I had to miss modification after copy and paste...

>    +struct l2tp_req {
>    +       int state;
>    +       int my_cookie_len;
>    +       int peer_cookie_len;
> 
> Pick a fixed-width unsigned integer type for this unless you actually
> need negative values?

They are not required negative values, I use unsigned type. 


>    +#ifdef _KERNEL
>    +extern struct psref_class *lv_psref_class __read_mostly;
> 
> The __read_mostly attribute matters only for definitions, I believe.

Oh, I remove unnecessary attribute.


>    +struct l2tp_softc {
>    +       struct ethercom l2tp_ec;        /* common area - must be at the top */
>    +                               /* to use ether_input(), we must have this */
>    +       percpu_t *l2tp_ro_percpu;
> 
> Mark this with what the type of the per-CPU object is.  For example,
> 
> 	percpu_t	*l2tp_ro_percpu;	/* struct l2tp_ro */
> 
> (Obviously this is not as good for type checking as percpu<l2tp_ro> in
> C++ or similar, but it's better than nothing for the reader's sake.)

Ok, I add that comment.


>    +static inline bool
>    +l2tp_heldref_variant(struct l2tp_variant *var)
>    +{
>    +
>    +       if (var == NULL)
>    +               return false;
>    +       return psref_held(&var->lv_psref, lv_psref_class);
>    +}
> 
> Both users of this first do KASSERT(var != NULL), so there's no need
> for the conditional `if (var == NULL)' here.

I remove unnecessary condition.


>    +/* Prototypes */
>    +void l2tpattach(int);
>    +void l2tpattach0(struct l2tp_softc *);
>    +void l2tp_input(struct mbuf *, struct ifnet *);
>    +int l2tp_ioctl(struct ifnet *, u_long, void *);
>    +
>    +struct l2tp_variant* l2tp_lookup_session_ref(uint32_t, struct psref *);
> 
> KNF:	struct l2tp_variant *l2tp_lookup_session_ref(uint32_t, struct psref *);

Ah, I missed it...


>    +/*
>    + * Locking notes:
>    + * + l2tp_softc_list is protected by l2tp_list_lock (an adaptive mutex)
>    + *       l2tp_softc_list is list of all l2tp_softcs, and it is used to avoid
>    + *       wrong unload.
> 
> Instead of `wrong unload', maybe `unload while busy' or something?

Yes, I fix my poor English wording. :)


>    + * + l2tp_hashed_list is protected by
>    + *   - l2tp_hash_lock (an adaptive mutex) for writer
>    + *   - pserialize for reader
>    + *       l2tp_hashed_list is hashed list of all l2tp_softcs, and it is used by
>    + *       input processing to find appropriate softc.
>    + * + l2tp_softc->l2tp_var is protected by
>    + *   - l2tp_softc->l2tp_lock (an adaptive mutex) for writer
>    + *   - l2tp_var->lv_psref for reader
>    + *       l2tp_softc->l2tp_var is used for variant values while the l2tp tunnel
>    + *       exists.
> 
> This looks great!  Can you also state any lock order constraints here?
> If the only constraint is that no pair of these locks is ever held
> simultaneously, so be it -- say that too.  It looks like encap_lock
> needs to be mentioned, though.

I add lock order comment. I found I forgot description about
struct l2tp_ro->lr_lock, so I add it, too.


>    diff --git a/sys/netinet/in_l2tp.c b/sys/netinet/in_l2tp.c
>    new file mode 100644
>    index 0000000..9b2ccd6
>    --- /dev/null
>    +++ b/sys/netinet/in_l2tp.c
>    @@ -0,0 +1,417 @@
>    [...]
>    +int
>    +in_l2tp_output(struct l2tp_variant *var, struct mbuf *m)
>    +{
>    [...]
>    +       bzero(&iphdr, sizeof(iphdr));
> 
> Use memset, not bzero.

Ahhhhhh, it is replacement leakage.
# original implementation was made for old version...


>    +               if (var->lv_peer_cookie_len == 4) {
>    +                       cookie_32 = htonl((uint32_t)var->lv_peer_cookie);
>    +                       memcpy(mtod(m, uint32_t *), &cookie_32,
>    +                           sizeof(uint32_t));
> 
> I have the impression that mtod(m, T *) is supposed to be used only
> when m is actually aligned for a T.  Most uses of memcpy(mtod(m, T *),
> ...) use void or uint8_t:
> 
> 	memcpy(mtod(m, void *), &cookie_32, sizeof(uint32_t));
> 
> I would suggest doing that, in case anyone ever makes mtod check
> alignment -- unless you can guarantee alignment, in which case you can
> just do
> 
> 	*mtod(m, uint32_t *) = cookie_32;

I see. I use memcpy(mtod(m, void *), ...).


>    +       error = ip_output(m, NULL, &lro->lr_ro, 0, NULL, NULL);
>    +       mutex_exit(&lro->lr_lock);
>    +       percpu_putref(sc->l2tp_ro_percpu);
> 
> Hope it's safe to call ip_output with this lock held!  Is it easy to
> prove that ip_output can only at worst put the mbuf on a queue, or
> that if it recursively calls in_l2tp_output, the recursion detection
> will prevent locking against myself?

Sorry to say, it cannot. Because if we call ip_output() without
lro->lc_lock, concurrent execution of l2tp output softint and
dad timer softint in the same CPU cause panic. That is,
    + begin dad timer processing (the lwp is softclk/0)
      - in_l2tp_output()
        - rtcache_lookup()
          - hold struct ro->ro_psref
        - call ip_output()
    + hardware interrupt arises (would be ethernet Rx interrupt)
    + call softint handlers by fast softints
    + begin l2tp output processing (the lwp is softnet/0)
      - in_l2tp_output()
        - rtcache_lookup()
          - hold struct ro->ro_psref
        - call ip_output()
    + hardware interrupt arises
    + resume dad timer processing
      - resume ip_output()
      - release struct ro->ro_psref
        - failure assertion in psref_release()'s KASSERTMSG((psref->psref_lwp == curlwp)

At least, locking against myself by calling in_l2tp_output() recursively
is prevent by setting MAX_L2TP_NEST to 0.


>    diff --git a/sys/netinet/in_proto.c b/sys/netinet/in_proto.c
>    index 5534847..e318a7b 100644
>    --- a/sys/netinet/in_proto.c
>    +++ b/sys/netinet/in_proto.c
>    @@ -360,6 +360,16 @@ const struct protosw inetsw[] = {
>            .pr_init = carp_init,
>     },
>     #endif /* NCARP > 0 */
>    +{      .pr_type = SOCK_RAW,
>    +       .pr_domain = &inetdomain,
> 
> Should this be conditional on NL2TP > 0?

I think no. To build and load libl2tp of rump, libinet should not
depend to libl2tp.


Thanks,

-- 
//////////////////////////////////////////////////////////////////////
Internet Initiative Japan Inc.

Device Engineering Section,
IoT Platform Development Department,
Network Division,
Technology Unit

Kengo NAKAHARA <k-nakahara%iij.ad.jp@localhost>



Home | Main Index | Thread Index | Old Index