kern/58560: experimental wg(4) handshake retry doesn't add timing jitter

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/58560: experimental wg(4) handshake retry doesn't add timing jitter
From: campbell+netbsd%mumble.net@localhost
Date: Wed, 7 Aug 2024 14:40:01 +0000 (UTC)

>Number:         58560
>Category:       kern
>Synopsis:       experimental wg(4) handshake retry doesn't add timing jitter
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Aug 07 14:40:00 +0000 2024
>Originator:     Taylor R Campbell
>Release:        current, 10
>Organization:
The Jitterbug Wgdation
>Environment:
>Description:
From the technical whitepaper (https://www.wireguard.com/papers/wireguard.pdf):

[W] 6.1 Timers & Stateless UX: Preliminaries
"Whenever a handshake initiation message is sent as the result of an expiring timer, an additional amount of jitter is added to the expiration, in order to prevent two peers from repeatedly initiating handshakes at the same time."

This leaves it entirely unspecified what the distribution of the delay is, so really we do implement this already by simply not having a perfect clock behind callout_schedule.

But maybe the intent is to have much higher variance in the actual timing difference -- say, an exponential distribution averaging about 1/16 sec truncated at 1 sec.
>How-To-Repeat:
code inspection
>Fix:
Add a random delay when scheduling wgp_handshake_timeout_timer.

Here's a candidate implementation.  I'm not committing it just yet because it will require adjusting the tests that currently rely on timeouts precise to the second to exercise various paths in the state machine -- they will need to have more slop added to avoid spuriously failing.

/*
 * wg_jittery_timeout(n)
 *
 *	Return a random timeout of at most n + 1 seconds, and at most
 *	INT_MAX ticks, and, unless n is too large for INT_MAX ticks, at
 *	least n seconds.
 *
 *	There is no guidance on the distribution of jitter, so we'll
 *	approximate an exponential distribution with average 1/16 sec,
 *	truncated to 1 sec, to add to n sec.
 *
 *	[W] 6.1 Timers & Stateless UX: Preliminaries
 *	"Whenever a handshake initiation message is sent as the result
 *	 of an expiring timer, an additional amount of jitter is added
 *	 to the expiration, in order to prevent two peers from
 *	 repeatedly initiating handshakes at the same time."
 */
static int
wg_jittery_timeout(int nsec)
{
	/*
	 * Sample g from a truncated Geometric(1/2) distribution,
	 * truncated to MIN(hz, 32).
	 *
	 * ffs32 is 1-based, with ffs32(0) = 0, so ffs32(0) - 1 = -1 =
	 * UINT_MAX.  For nonzero x, ffs32(x) - 1 lies in {0, 1, 2,
	 * ..., 31}.  Hence g = MIN(ffs32(x) - 1, MIN(hz, 32)) lies in
	 * {0, 1, 2, ..., MIN(hz, 32)}; UINT_MAX is clamped to the
	 * largest point, MIN(hz, 32).
	 *
	 * Since g <= MIN(hz, 32), we have g/MIN(hz, 32) <= 1, so
	 *
	 *	hz*(g/MIN(hz, 32)) <= hz,
	 *
	 * and hence the delay d that we actually compute is bounded
	 * by:
	 *
	 *	d := g*floor(hz/MIN(hz, 32))
	 *	  <= g*(hz/MIN(hz, 32))
	 *	   = hz*(g/MIN(hz, 32))
	 *	  <= hz.
	 */
	const uint32_t x = cprng_fast32();
	const unsigned g = MIN(ffs32(x) - 1, MIN(hz, 32));
	const unsgined d = g*(hz/MIN(hz, 32));

	KASSERTMSG(d <= hz, "x=%"PRIu32" g=%u d=%u hz=%d", x, g, d, hz);

	return MIN(wg_rekey_timeout, (unsigned)(INT_MAX/hz - 1))*hz + d;
}

Prev by Date: Re: bin/58558: syslog.conf(5) man page example does not work.
Next by Date: port-xen/58561: panic: kernel diagnostic assertion "x86_read_psl() == 0" failed: file "/home/netbsd/10/src/sys/arch/x86/x86/pmap.c", line 3581
Previous by Thread: Re: [Suspected Junk Mail] Re: bin/58558: syslog.conf(5) man page example does not work.
Next by Thread: port-xen/58561: panic: kernel diagnostic assertion "x86_read_psl() == 0" failed: file "/home/netbsd/10/src/sys/arch/x86/x86/pmap.c", line 3581
Indexes:

Home | Main Index | Thread Index | Old Index