tech-net archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

[PATCH] TCP SYN cookie implementation for NetBSD



Attached is a patch to NetBSD 5.99.54 which adds optional TCP SYN cookies to the TCP stack. This implementation follows closely that of FreeBSD 9.0. My limited testing has gone well, so I'm hoping to get a wider audience and more developer eyes and feedback at this point. The patch is NOT production ready.

The cookies are optional, controlled by the new sysctl setting net.inet.tcp.syn_cookies. When set to 1, TCP SYN cookies will be sent as replies to all received SYN packets and the SYN cache is not used. The default setting for this sysctl of 0 reverts to the old behavior (though all incoming final ACKs will be checked for valid cookies).

A brief summary of the changes:

* Add sysctl setting to enable cookies.
* Re-factor part of syn_cache_get() into syn_cache_promote() to allow code re-use between SYN cache replies and cookie replies.
* Create and initialize a data structure containing cookie secrets.
* Add functions for (re)generating secrets, creating cookies, and validating cookies. * Modify tcp_input() to check all new ACKs for cookies (and possibly create full connections for them), and to check the sysctl to decide whether to send a cached SYN reply or a cookie reply.

Similar to the FreeBSD implementation, some connection metadata is "lost" when using SYN cookies and thus they are not a perfect replacement for the SYN cache. Most notably, the exact MSS sent by the client is not kept, merely an approximation (subject to the entries in the tcp_sc_msstab table).

The following items are not stored by the stack, but can be inferred to be supported they appear in the final ACK:

* TCP timestamps
* TCP selective ack (unlikely to be detected)
* TCP ECN
* TCP signatures

Currently, SYN cookie generation and validation are done under it's own mutex. It could easily be done under splsoftnet(), but that seemed heavy handed. I don't know enough about where locking is going in the kernel and I'd like some feedback on the best way to handle this.

The following is a rough to-do list for the code:

* Clean up my comments (some are notes to myself, some are features that are currently unimplemented). * A few comments mention things I don't fully understand (how to handle route caching, etc). Those problems should be figured out and the comments updated or removed. * Store (and recover) additional connection metadata in the TCP timestamp field, similar to FreeBSD.
* Finalize locking (see above).
* Detect and enable features found in packets outside of the final ACK (ie, enable selective ACK as soon as we see one from the client).
* Auto-enable SYN cookies when the SYN cache fills up.
* Add a (build time? run time?) setting to disable the SYN cache and free its memory, replying always with TCP SYN cookies (possibly fulfilling this requirement http://wiki.netbsd.org/projects/project/optional_tcp_syncache/ ?).

Bugs:

Currently the temporary SYN cache entry used to promote a final ACK into a full blown connection is dynamically allocated. I wanted to stack allocate this struct but when I do I get a seemingly unrelated panic later on in tcp_output(). My guess is that the temporary SYN cache struct is being somehow used after the connection is promoted, and by allocating it on the stack the data is overwritten after the function returns. Unfortunately I haven't been able to track down what's causing this, and even overwriting the dynamically allocated memory with garbage data before returning doesn't trigger the panic. For now the code allocates the entry dynamically and there are no panics, but there could be a bug lurking somewhere.

Thanks to agc for his assistance.

John
diff --git a/sys/netinet/tcp_input.c b/sys/netinet/tcp_input.c
index 3055490..d637804 100644
--- a/sys/netinet/tcp_input.c
+++ b/sys/netinet/tcp_input.c
@@ -167,9 +167,7 @@ __KERNEL_RCSID(0, "$NetBSD: tcp_input.c,v 1.314 2011/05/25 
23:20:57 gdt Exp $");
 #include <sys/pool.h>
 #include <sys/domain.h>
 #include <sys/kernel.h>
-#ifdef TCP_SIGNATURE
 #include <sys/md5.h>
-#endif
 #include <sys/lwp.h> /* for lwp0 */
 
 #include <net/if.h>
@@ -243,6 +241,7 @@ __KERNEL_RCSID(0, "$NetBSD: tcp_input.c,v 1.314 2011/05/25 
23:20:57 gdt Exp $");
 
 int    tcprexmtthresh = 3;
 int    tcp_log_refused;
+int    tcp_syn_cookies;
 
 int    tcp_do_autorcvbuf = 1;
 int    tcp_autorcvbuf_inc = 16 * 1024;
@@ -254,6 +253,38 @@ static struct timeval tcp_rst_ppslim_last;
 static int tcp_ackdrop_ppslim_count = 0;
 static struct timeval tcp_ackdrop_ppslim_last;
 
+#define SYNCOOKIE_SECRET_SIZE 8 /* dwords */
+#define SYNCOOKIE_LIFETIME 16 /* seconds */
+
+static struct {
+       u_int     oddeven;
+       u_int32_t secbits_odd[SYNCOOKIE_SECRET_SIZE];
+       u_int32_t secbits_even[SYNCOOKIE_SECRET_SIZE];
+       u_int     reseed_time; /* time_uptime, seconds */
+       kmutex_t  updating_lock;
+} syn_cookie_secrets;
+
+/* MSS table inspired by FreeBSD. Used with SYN cookies sent to
+ * peers who do not support timestamps. In these cases, the only
+ * supported MSS values are the ones listed in this table. The
+ * largest MSS that does not exceed the client's requested MSS
+ * is chosen. Data should be gathered to determine the optimal
+ * set of values (ie, which 8 values are most popular while also
+ * adaquately covering narrow networks). As far as I can tell,
+ * the minimum MSS should be 536 (per RFC 791), but other
+ * operating systems support less than that. The minimum fragment
+ * size is 68 bytes (including IP header), and the maximum IP
+ * header size is 60 bytes, so the minimum I think we need to
+ * support is an 8 byte MSS. That's still unreasonably small, but
+ * it's conceivable one could derive that as a possible MSS from a
+ * (twisted) reading of the RFC.
+ *
+ * syn_cookie_check_seq returns the recovered MSS or 0 on failure.
+ * If at some point 0 becomes a valid MSS (is added to this table),
+ * that function will need to be changed.
+ */
+static u_int16_t tcp_sc_msstab[] = { 8, 256, 468, 536, 996, 1452, 1460, 8960 };
+
 #define TCP_PAWS_IDLE  (24U * 24 * 60 * 60 * PR_SLOWHZ)
 
 /* for modulo comparisons of timestamps */
@@ -1678,8 +1709,12 @@ findpcb:
                                         */
                                        goto badsyn;
                                } else if (tiflags & TH_ACK) {
+                                       struct socket *oso = so;
                                        so = syn_cache_get(&src.sa, &dst.sa,
                                                th, toff, tlen, so, m);
+                                       if (so == NULL)
+                                               so = 
syn_cookie_validate(&src.sa, &dst.sa,
+                                                       th, toff, tlen, oso, m, 
optp, optlen);
                                        if (so == NULL) {
                                                /*
                                                 * We don't have a SYN for
@@ -1861,10 +1896,17 @@ findpcb:
                                 * SYN looks ok; create compressed TCP
                                 * state for it.
                                 */
-                               if (so->so_qlen <= so->so_qlimit &&
-                                   syn_cache_add(&src.sa, &dst.sa, th, tlen,
-                                               so, m, optp, optlen, &opti))
-                                       m = NULL;
+                               if (so->so_qlen <= so->so_qlimit) {
+                                       if (tcp_syn_cookies) {
+                                               if (syn_cookie_reply(&src.sa, 
&dst.sa, th, tlen,
+                                                       so, m, optp, optlen, 
&opti))
+                                               m = NULL;
+                                       } else { /* not tcp_syn_cookies */
+                                           if (syn_cache_add(&src.sa, &dst.sa, 
th, tlen,
+                                                       so, m, optp, optlen, 
&opti))
+                                               m = NULL;
+                                       }
+                               }
                        }
                        goto drop;
                }
@@ -3699,6 +3741,11 @@ syn_cache_init(void)
        /* Initialize the hash buckets. */
        for (i = 0; i < tcp_syn_cache_size; i++)
                TAILQ_INIT(&tcp_syn_cache[i].sch_bucket);
+
+       /* Piggy-back on the syn cache initialization and initialize
+        * the syn cookie secrets. */
+       syn_cookie_secrets.reseed_time = 0;
+       mutex_init( &syn_cookie_secrets.updating_lock, MUTEX_DEFAULT, 
IPL_SOFTNET );
 }
 
 void
@@ -3948,14 +3995,8 @@ syn_cache_get(struct sockaddr *src, struct sockaddr *dst,
 {
        struct syn_cache *sc;
        struct syn_cache_head *scp;
-       struct inpcb *inp = NULL;
-#ifdef INET6
-       struct in6pcb *in6p = NULL;
-#endif
-       struct tcpcb *tp = 0;
-       struct mbuf *am;
        int s;
-       struct socket *oso;
+       struct socket *retval;
 
        s = splsoftnet();
        if ((sc = syn_cache_lookup(src, dst, &scp)) == NULL) {
@@ -3979,6 +4020,26 @@ syn_cache_get(struct sockaddr *src, struct sockaddr *dst,
        syn_cache_rm(sc);
        splx(s);
 
+       retval = syn_cache_promote(src, dst, th, hlen, tlen, so, m, sc);
+       s = splsoftnet();
+       syn_cache_put(sc);
+       splx(s);
+       return retval;
+}
+
+struct socket *
+syn_cache_promote(struct sockaddr *src, struct sockaddr *dst,
+    struct tcphdr *th, unsigned int hlen, unsigned int tlen,
+    struct socket *so, struct mbuf *m, struct syn_cache *sc)
+{
+       struct inpcb *inp = NULL;
+#ifdef INET6
+       struct in6pcb *in6p = NULL;
+#endif
+       struct tcpcb *tp = 0;
+       struct mbuf *am;
+       struct socket *oso;
+
        /*
         * Ok, create the full blown connection, and set things up
         * as they would have been set up if we had created the
@@ -4090,6 +4151,9 @@ syn_cache_get(struct sockaddr *src, struct sockaddr *dst,
        /*
         * Give the new socket our cached route reference.
         */
+       /* testing by je -- we don't save the route from the ip_output reply of 
the syn cookie
+          so what do we do here? can we somehow flag this socket to do a 
routing lookup the
+       next time it needs to send data? Or do the routing lookup now? */
        if (inp) {
                rtcache_copy(&inp->inp_route, &sc->sc_route);
                rtcache_free(&sc->sc_route);
@@ -4230,9 +4294,6 @@ syn_cache_get(struct sockaddr *src, struct sockaddr *dst,
        tp->t_dupacks = 0;
 
        TCP_STATINC(TCP_STAT_SC_COMPLETED);
-       s = splsoftnet();
-       syn_cache_put(sc);
-       splx(s);
        return (so);
 
 resetandabort:
@@ -4243,9 +4304,6 @@ abort:
                (void) soabort(so);
                mutex_enter(softnet_lock);
        }
-       s = splsoftnet();
-       syn_cache_put(sc);
-       splx(s);
        TCP_STATINC(TCP_STAT_SC_ABORTED);
        return ((struct socket *)(-1));
 }
@@ -4838,3 +4896,428 @@ syn_cache_respond(struct syn_cache *sc, struct mbuf *m)
        }
        return (error);
 }
+
+/* TCP SYN cookie implementation, based on FreeBSD 9.0's syn cookie
+ * implementation (minus the timestamp stuff):
+ *
+ * Initial sequence number we send:
+ * 31|................................|0
+ * DDDDDDDDDDDDDDDDDDDDDDDDDMMMRRRP
+ * D = MD5 Digest (first dword)
+ * M = MSS index
+ * R = Rotation of secret
+ * P = Odd or Even secret
+ *
+ * The MD5 Digest is computed with over following parameters:
+ * a) randomly rotated secret
+ * b) struct in_conninfo containing the remote/local ip/port (IPv4&IPv6)
+ * c) the received initial sequence number from remote host
+ * d) the rotation offset and odd/even bit
+ *
+ * Note:
+ *
+ * For some reason we get a panic if we allocate sc on the stack rather
+ * than dynamically (even if we zero out and free a dynamic sc later on). I
+ * can't figure out why this is. It would probably be slightly faster to have
+ * sc live on the stack rather than be dynamically allocated, but it doesn't
+ * matter so much.
+ *
+ * TODO:
+ *
+ * Don't bother setting variables syn_cache_reply won't check.
+ *
+ * Figure out what to do with route cache stuff?
+ *
+ * --je
+ */
+int
+syn_cookie_reply(struct sockaddr *src, struct sockaddr *dst, struct tcphdr *th,
+    unsigned int hlen, struct socket *so, struct mbuf *m, u_char *optp,
+    int optlen, struct tcp_opt_info *oi)
+{
+       struct tcpcb tb, *tp;
+       long win;
+       struct syn_cache *sc;
+       struct mbuf *ipopts;
+       struct tcp_opt_info opti;
+       int s;
+
+       tp = sototcpcb(so);
+
+       memset(&opti, 0, sizeof(opti));
+
+       /*
+        * RFC1122 4.2.3.10, p. 104: discard bcast/mcast SYN
+        *
+        * Note this check is performed in tcp_input() very early on.
+        */
+
+       /*
+        * Initialize some local state.
+        */
+       win = sbspace(&so->so_rcv);
+       if (win > TCP_MAXWIN)
+               win = TCP_MAXWIN;
+
+       switch (src->sa_family) {
+#ifdef INET
+       case AF_INET:
+               /*
+                * Remember the IP options, if any.
+                */
+               ipopts = ip_srcroute();
+               break;
+#endif
+       default:
+               ipopts = NULL;
+       }
+
+#ifdef TCP_SIGNATURE
+       if (optp || (tp->t_flags & TF_SIGNATURE))
+#else
+       if (optp)
+#endif
+       {
+               tb.t_flags = tcp_do_rfc1323 ? (TF_REQ_SCALE|TF_REQ_TSTMP) : 0;
+#ifdef TCP_SIGNATURE
+               tb.t_flags |= (tp->t_flags & TF_SIGNATURE);
+#endif
+               tb.t_state = TCPS_LISTEN;
+               if (tcp_dooptions(&tb, optp, optlen, th, m, m->m_pkthdr.len -
+                   sizeof(struct tcphdr) - optlen - hlen, oi) < 0)
+                       return (0);
+       } else
+               tb.t_flags = 0;
+
+       s = splsoftnet();
+       sc = pool_get(&syn_cache_pool, PR_NOWAIT);
+       splx(s);
+       if (sc == NULL) {
+               if (ipopts)
+                       (void) m_free(ipopts);
+               return (0);
+       }
+
+       /*
+        * Fill in the cache, and put the necessary IP and TCP
+        * options into the reply.
+        */
+       memset(sc, 0, sizeof(struct syn_cache));
+       /* We don't touch sc->sc_timer because syn_cache_respond doesn't 
either. */
+       bcopy(src, &sc->sc_src, src->sa_len);
+       bcopy(dst, &sc->sc_dst, dst->sa_len);
+       sc->sc_flags = 0;
+       sc->sc_ipopts = ipopts;
+       sc->sc_irs = th->th_seq;
+
+       sc->sc_iss = syn_cookie_generate_seq(src, dst, oi->maxseg, th->th_seq);
+
+       sc->sc_peermaxseg = oi->maxseg;
+       sc->sc_ourmaxseg = tcp_mss_to_advertise(m->m_flags & M_PKTHDR ?
+                                               m->m_pkthdr.rcvif : NULL,
+                                               sc->sc_src.sa.sa_family);
+       sc->sc_win = win;
+       sc->sc_timebase = tcp_now - 1;  /* see tcp_newtcpcb() */
+       sc->sc_timestamp = tb.ts_recent;
+//     sc->sc_timestamp = (oi->maxseg << 16)
+//                     | ((th->th_flags & (TH_ECE|TH_CWR) ? 1 : 0) << 4)
+//                     | ((tb.t_flags & TF_SACK_PERMIT ? 1 : 0) << 5)
+//                     | ((tb.t_flags & (TF_REQ_TSTMP|TF_RCVD_TSTMP) ? 1 : 0) 
<< 6)
+//                     | (tb.requested_s_scale & 0xF);
+
+       if ((tb.t_flags & (TF_REQ_TSTMP|TF_RCVD_TSTMP)) ==
+           (TF_REQ_TSTMP|TF_RCVD_TSTMP))
+               sc->sc_flags |= SCF_TIMESTAMP;
+
+       if ((tb.t_flags & (TF_RCVD_SCALE|TF_REQ_SCALE)) ==
+           (TF_RCVD_SCALE|TF_REQ_SCALE)) {
+               sc->sc_requested_s_scale = tb.requested_s_scale;
+               sc->sc_request_r_scale = 0;
+               /*
+                * Pick the smallest possible scaling factor that
+                * will still allow us to scale up to sb_max.
+                *
+                * We do this because there are broken firewalls that
+                * will corrupt the window scale option, leading to
+                * the other endpoint believing that our advertised
+                * window is unscaled.  At scale factors larger than
+                * 5 the unscaled window will drop below 1500 bytes,
+                * leading to serious problems when traversing these
+                * broken firewalls.
+                *
+                * With the default sbmax of 256K, a scale factor
+                * of 3 will be chosen by this algorithm.  Those who
+                * choose a larger sbmax should watch out
+                * for the compatiblity problems mentioned above.
+                *
+                * RFC1323: The Window field in a SYN (i.e., a <SYN>
+                * or <SYN,ACK>) segment itself is never scaled.
+                */
+               while (sc->sc_request_r_scale < TCP_MAX_WINSHIFT &&
+                   (TCP_MAXWIN << sc->sc_request_r_scale) < sb_max)
+                       sc->sc_request_r_scale++;
+       } else {
+               sc->sc_requested_s_scale = 15;
+               sc->sc_request_r_scale = 15;
+       }
+       if ((tb.t_flags & TF_SACK_PERMIT) && tcp_do_sack)
+               sc->sc_flags |= SCF_SACK_PERMIT;
+
+       /*
+        * ECN setup packet recieved.
+        */
+       if ((th->th_flags & (TH_ECE|TH_CWR)) && tcp_do_ecn)
+               sc->sc_flags |= SCF_ECN_PERMIT;
+
+#ifdef TCP_SIGNATURE
+       if (tb.t_flags & TF_SIGNATURE)
+               sc->sc_flags |= SCF_SIGNATURE;
+#endif
+       sc->sc_tp = tp;
+
+       if (syn_cache_respond(sc, m) == 0) {
+               uint64_t *tcps = TCP_STAT_GETREF();
+               /* XXX This isn't the right TCP_STAT to increment --je */
+               tcps[TCP_STAT_SNDACKS]++;
+               tcps[TCP_STAT_SNDTOTAL]++;
+               TCP_STAT_PUTREF();
+       } else {
+               TCP_STATINC(TCP_STAT_SC_DROPPED);
+       }
+
+       s = splsoftnet();
+       pool_put(&syn_cache_pool, sc);
+       splx(s);
+       return (1);
+}
+
+struct socket *
+syn_cookie_validate(struct sockaddr *src, struct sockaddr *dst,
+       struct tcphdr *th, unsigned int hlen, unsigned int tlen,
+       struct socket *so, struct mbuf *m, u_char *optp,
+       int optlen)
+{
+       struct syn_cache sc;
+       struct tcpcb tb, *tp;
+       struct mbuf *ipopts;
+       struct tcp_opt_info oi;
+       u_int16_t recovered_mss;
+       long ourwin;
+       struct socket *retval;
+
+       /* The return value will be 0 if we could not recover the
+        * original mss (meaning we couldn't decode the ISN,
+        * meaning the cookie is invalid.
+        */
+       if ((recovered_mss = syn_cookie_check_seq(src, dst, th)) == 0)
+               return NULL;
+
+       tp = sototcpcb(so);
+
+       memset(&oi, 0, sizeof(oi));
+
+       switch (src->sa_family) {
+#ifdef INET
+       case AF_INET:
+               /*
+                * Remember the IP options, if any.
+                */
+               ipopts = ip_srcroute();
+               break;
+#endif
+       default:
+               ipopts = NULL;
+       }
+
+       if (optp)
+       {
+               tb.t_state = TCPS_LISTEN;
+               if (tcp_dooptions(&tb, optp, optlen, th, m, m->m_pkthdr.len -
+                   sizeof(struct tcphdr) - optlen - hlen, &oi) < 0)
+                       return (0);
+       } else
+               tb.t_flags = 0;
+
+       /*
+        * Fill in the cache, and put the necessary IP and TCP
+        * options into the reply.
+        */
+       memset(&sc, 0, sizeof(struct syn_cache));
+       callout_init(&sc.sc_timer, CALLOUT_MPSAFE);
+       bcopy(src, &sc.sc_src, src->sa_len);
+       bcopy(dst, &sc.sc_dst, dst->sa_len);
+       sc.sc_flags = 0;
+       sc.sc_ipopts = ipopts;
+       // I'll be honest, I don't know what this is, but nothing seems to use 
it. --je
+       sc.sc_tp = NULL;
+
+       /* XXX testing --je
+        */
+       rtcache_setdst(&sc.sc_route, src);
+       rtcache_init(&sc.sc_route);
+
+       /* Get the original ISN by subtracting 1 from the final
+        * ACK's sequence number. XXX Will this fail if there is
+        * data in the first SYN packet? --je
+        */
+       sc.sc_irs = th->th_seq - 1;
+       sc.sc_iss = th->th_ack - 1;
+       sc.sc_peermaxseg = recovered_mss;
+
+       /* Assume the mss we sent in the syn+ack is the same as we
+        * generate here. Not necessarily true every time, so we could
+        * store the sent mss in the outgoing timestamp header if this
+        * causes problems. --je
+        */
+       sc.sc_ourmaxseg = tcp_mss_to_advertise(m->m_flags & M_PKTHDR ?
+                                               m->m_pkthdr.rcvif : NULL,
+                                               src->sa_family);
+
+       ourwin = sbspace(&so->so_rcv);
+       if (ourwin > TCP_MAXWIN)
+               ourwin = TCP_MAXWIN;
+       sc.sc_win = ourwin;
+       sc.sc_timebase = tcp_now - 1;   /* see tcp_newtcpcb() */
+       sc.sc_timestamp = tb.ts_recent;
+       if ((tb.t_flags & (TF_REQ_TSTMP|TF_RCVD_TSTMP)) ==
+           (TF_REQ_TSTMP|TF_RCVD_TSTMP))
+               sc.sc_flags |= SCF_TIMESTAMP;
+
+       sc.sc_requested_s_scale = 15;
+       sc.sc_request_r_scale = 15;
+
+#ifdef TCP_SIGNATURE
+       if (tb.t_flags & TF_SIGNATURE)
+               sc.sc_flags |= SCF_SIGNATURE;
+#endif
+
+       /* This is normally done via SYN_CACHE_TIMER_ARM, but
+        * since we never call that (or touch the cache timer
+        * at all, set it here manually. */
+       TCPT_RANGESET(sc.sc_rxtcur,
+               TCPTV_SRTTDFLT * tcp_backoff[sc.sc_rxtshift], TCPTV_MIN,
+               TCPTV_REXMTMAX);   
+
+       retval = syn_cache_promote(src, dst, th, hlen, tlen, so, m, &sc);
+
+       if (sc.sc_ipopts)
+               (void) m_free(sc.sc_ipopts);
+       rtcache_free(&sc.sc_route);
+       callout_destroy(&sc.sc_timer);
+
+       return retval;
+}
+
+inline void
+syn_cookie_regenerate_secrets(void)
+{
+       u_int32_t *secbits;
+       int i;
+
+       mutex_enter(&syn_cookie_secrets.updating_lock);
+       if (syn_cookie_secrets.reseed_time <= time_uptime) {
+               syn_cookie_secrets.oddeven = syn_cookie_secrets.oddeven ? 0 : 1;
+               secbits = syn_cookie_secrets.oddeven ?
+                       syn_cookie_secrets.secbits_odd : 
syn_cookie_secrets.secbits_even;
+               for (i = 0; i < SYNCOOKIE_SECRET_SIZE; i++)
+                       secbits[i] = arc4random();
+               syn_cookie_secrets.reseed_time = time_uptime + 
SYNCOOKIE_LIFETIME;
+       }
+       mutex_exit(&syn_cookie_secrets.updating_lock);
+}
+
+u_int32_t
+syn_cookie_generate_seq(struct sockaddr *src, struct sockaddr *dst,
+       u_int16_t mss, u_int32_t irs)
+{
+       u_int8_t msstab_entry;
+       u_int32_t new_seq;
+       u_int32_t *secbits;
+       MD5_CTX ctx;
+       u_int32_t md5_buffer[MD5_DIGEST_LENGTH / sizeof(u_int32_t)];
+       u_int off;
+
+       syn_cookie_regenerate_secrets();
+       off = arc4random() & 0x7;
+
+       MD5Init(&ctx);
+       for (msstab_entry = 0; msstab_entry < 7; msstab_entry++)
+               if (mss < tcp_sc_msstab[msstab_entry + 1])
+                       break;
+       new_seq = ( off << 1 ) | ( msstab_entry << 4 );
+
+       mutex_enter(&syn_cookie_secrets.updating_lock);
+
+       secbits = syn_cookie_secrets.oddeven ?
+               syn_cookie_secrets.secbits_odd : 
syn_cookie_secrets.secbits_even;
+
+       new_seq |= syn_cookie_secrets.oddeven;
+
+       MD5Update(&ctx, ((char *) secbits) + off,
+               SYNCOOKIE_SECRET_SIZE * sizeof(*secbits) - off);
+       MD5Update(&ctx, (char *) secbits, off);
+
+       mutex_exit(&syn_cookie_secrets.updating_lock);
+
+       MD5Update(&ctx, (char *) src, src->sa_len );
+       MD5Update(&ctx, (char *) dst, dst->sa_len );
+
+       MD5Update(&ctx, (char *) &irs, sizeof( irs ) );
+       MD5Update(&ctx, (char *) &new_seq, sizeof( new_seq ) );
+       MD5Final((char *) &md5_buffer, &ctx);
+       new_seq |= md5_buffer[0] << 7;
+
+       return new_seq;
+}
+
+u_int16_t
+syn_cookie_check_seq(struct sockaddr *src, struct sockaddr *dst,
+       struct tcphdr *th)
+{
+       u_int32_t ack = th->th_ack - 1;
+       u_int32_t seq = th->th_seq - 1;
+       u_int32_t recovered_oddeven = ack & 0x1;
+       MD5_CTX ctx;
+       u_int32_t *secbits;
+       u_int32_t md5_buffer[MD5_DIGEST_LENGTH / sizeof(u_int32_t)];
+       u_int8_t recovered_msstab_entry;
+       u_int off;
+       u_int32_t flags;
+
+       off = (ack >> 1) & 0x7;
+       recovered_msstab_entry = (ack >> 4) & 0x7;
+       flags = ack & 0x7F;
+
+       mutex_enter(&syn_cookie_secrets.updating_lock);
+
+       /* If the secrets haven't been updated in a lifetime then we know we 
haven't
+        * sent out any cookies recently and this ACK is too old. */
+       if (syn_cookie_secrets.reseed_time + SYNCOOKIE_LIFETIME < time_uptime) {
+               mutex_exit(&syn_cookie_secrets.updating_lock);
+               return 0;
+       }
+
+       secbits = recovered_oddeven ?
+               syn_cookie_secrets.secbits_odd : 
syn_cookie_secrets.secbits_even;
+
+       MD5Init(&ctx);
+       MD5Update(&ctx, ((char *) secbits) + off,
+               SYNCOOKIE_SECRET_SIZE * sizeof(*secbits) - off);
+       MD5Update(&ctx, (char *) secbits, off);
+       mutex_exit(&syn_cookie_secrets.updating_lock);
+
+       MD5Update(&ctx, (char *) src, src->sa_len );
+       MD5Update(&ctx, (char *) dst, dst->sa_len );
+
+       MD5Update(&ctx, (char *) &seq, sizeof( seq ) );
+       MD5Update(&ctx, (char *) &flags, sizeof( flags ) );
+       MD5Final((char *) &md5_buffer, &ctx);
+
+
+       if ((ack & (~0x7F)) != (md5_buffer[0] << 7)) {
+               return 0;
+       }
+
+       return tcp_sc_msstab[recovered_msstab_entry];
+}
+
diff --git a/sys/netinet/tcp_usrreq.c b/sys/netinet/tcp_usrreq.c
index b4cfce0..3cb02c3 100644
--- a/sys/netinet/tcp_usrreq.c
+++ b/sys/netinet/tcp_usrreq.c
@@ -1894,6 +1894,12 @@ sysctl_net_inet_tcp_setup2(struct sysctllog **clog, int 
pf, const char *pfname,
                       SYSCTL_DESCR("Ticks before initial tcp connection times 
out"),
                       sysctl_tcp_keep, 0, &tcp_keepinit, 0,
                       CTL_NET, pf, IPPROTO_TCP, CTL_CREATE, CTL_EOL);
+       sysctl_createv(clog, 0, NULL, NULL,
+                      CTLFLAG_PERMANENT|CTLFLAG_READWRITE,
+                      CTLTYPE_INT, "syn_cookies",
+                      SYSCTL_DESCR("Enable sending of TCP SYN cookies"),
+                      sysctl_tcp_keep, 0, &tcp_syn_cookies, 0,
+                      CTL_NET, pf, IPPROTO_TCP, CTL_CREATE, CTL_EOL);
 
        /* TCP socket buffers auto-sizing nodes */
        sysctl_createv(clog, 0, NULL, NULL,
diff --git a/sys/netinet/tcp_var.h b/sys/netinet/tcp_var.h
index 0b55a53..0986150 100644
--- a/sys/netinet/tcp_var.h
+++ b/sys/netinet/tcp_var.h
@@ -744,7 +744,8 @@ struct syn_cache_head {
 #define        TCPCTL_DEBX             32      /* # of tcp debug sockets */
 #define        TCPCTL_DROP             33      /* drop tcp connection */
 #define        TCPCTL_MSL              34      /* Max Segment Life */
-#define        TCPCTL_MAXID            35
+#define        TCPCTL_SYN_COOKIES      35      /* use tcp syn cookies */
+#define        TCPCTL_MAXID            36
 
 #define        TCPCTL_NAMES { \
        { 0, 0 }, \
@@ -782,6 +783,7 @@ struct syn_cache_head {
        { "debx", CTLTYPE_INT }, \
        { "drop", CTLTYPE_STRUCT }, \
        { "msl", CTLTYPE_INT }, \
+       { "syn_cookies", CTLTYPE_INT }, \
 }
 
 #ifdef _KERNEL
@@ -806,6 +808,7 @@ extern      int tcp_syn_bucket_limit;/* max entries per 
hash bucket */
 extern int tcp_log_refused;    /* log refused connections */
 extern int tcp_do_ecn;         /* TCP ECN enabled/disabled? */
 extern int tcp_ecn_maxretries; /* Max ECN setup retries */
+extern int tcp_syn_cookies;    /* use tcp syncookies */
 #if NRND > 0
 extern int tcp_do_rfc1948;     /* ISS by cryptographic hash */
 #endif
@@ -880,6 +883,14 @@ extern int tcp_autosndbuf_max;
        { 1, 0, &tcp_delack_ticks },            \
        { 1, 0, &tcp_init_win_local },          \
        { 1, 0, &tcp_ackdrop_ppslim },          \
+       { 0 },                                  \
+       { 0 },                                  \
+       { 0 },                                  \
+       { 0 },                                  \
+       { 0 },                                  \
+       { 0 },                                  \
+       { 0 },                                  \
+       { 1, 0, &syn_cookies },         \
 }
 
 struct secasvar;
@@ -977,11 +988,23 @@ void       tcp_statadd(u_int, uint64_t);
 int     syn_cache_add(struct sockaddr *, struct sockaddr *,
                struct tcphdr *, unsigned int, struct socket *,
                struct mbuf *, u_char *, int, struct tcp_opt_info *);
+int     syn_cookie_reply(struct sockaddr *, struct sockaddr *,
+               struct tcphdr *, unsigned int, struct socket *,
+               struct mbuf *, u_char *, int, struct tcp_opt_info *);
 void    syn_cache_unreach(const struct sockaddr *, const struct sockaddr *,
           struct tcphdr *);
 struct socket *syn_cache_get(struct sockaddr *, struct sockaddr *,
                struct tcphdr *, unsigned int, unsigned int,
                struct socket *so, struct mbuf *);
+struct socket *syn_cache_promote(struct sockaddr *, struct sockaddr *,
+               struct tcphdr *, unsigned int, unsigned int,
+               struct socket *so, struct mbuf *, struct syn_cache *);
+struct socket *syn_cookie_validate(struct sockaddr *, struct sockaddr *,
+               struct tcphdr *, unsigned int, unsigned int,
+               struct socket *so, struct mbuf *, u_char *, int);
+u_int32_t      syn_cookie_generate_seq(struct sockaddr *, struct sockaddr *, 
u_int16_t, u_int32_t);
+u_int16_t      syn_cookie_check_seq(struct sockaddr *, struct sockaddr *, 
struct tcphdr *);
+void   syn_cookie_regenerate_secrets(void);
 void    syn_cache_init(void);
 void    syn_cache_insert(struct syn_cache *, struct tcpcb *);
 struct syn_cache *syn_cache_lookup(const struct sockaddr *, const struct 
sockaddr *,


Home | Main Index | Thread Index | Old Index