netbsd-bugs: kern/17723: kernelized PPPoE assumes a ridiculously reliable link (needs config options)

Subject: kern/17723: kernelized PPPoE assumes a ridiculously reliable link (needs config options)
To: None <gnats-bugs@gnats.netbsd.org>
From: None <tv@pobox.com>
List: netbsd-bugs
Date: 07/25/2002 15:38:07

>Number:         17723
>Category:       kern
>Synopsis:       kernelized PPPoE assumes a ridiculously reliable link (needs config options)
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          change-request
>Submitter-Id:   net
>Arrival-Date:   Thu Jul 25 12:39:00 PDT 2002
>Closed-Date:
>Last-Modified:
>Originator:     Todd Vierling
>Release:        NetBSD 1.6_BETA4
>Organization:

>Environment:

>Description:

The following are set in sys/net/if_pppoe.c:

#define PPPOE_DISC_TIMEOUT      (hz*5)  /* base for quick timeout calculation */
#define PPPOE_SLOW_RETRY        (hz*60) /* persistent retry interval */
#define PPPOE_DISC_MAXPADI      4       /* retry PADI four times (quickly) */
#define PPPOE_DISC_MAXPADR      2       /* retry PADR twice */

So PADI packets are sent at least 5 seconds apart, and only 4 attempts are made.
If those all fail, we drop into a 1-minute retry interval, which is annoyingly long
for a "nailed" link.  Additionally, the PADR is retried only twice, at a 5-second
interval.  Lose just two of those packets, and we go back to the PADI cycle.
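
To put rough numbers on that, here is a back-of-the-envelope sketch in C.  It is
not code from if_pppoe.c.  It assumes a fixed 5-second gap after each fast PADI,
as described above; since the comment calls PPPOE_DISC_TIMEOUT a "base", the
driver may scale the timeout per retry, in which case the fast phase lasts
longer.  HZ and both function names are made up for illustration.

```c
/* Illustration only: worst-case PADI timing under the stated assumptions. */
#define HZ                  100       /* assumed tick rate, not taken from the kernel */
#define PPPOE_DISC_TIMEOUT  (HZ*5)    /* values copied from the report */
#define PPPOE_SLOW_RETRY    (HZ*60)
#define PPPOE_DISC_MAXPADI  4

/* Seconds spent in the fast phase if every PADI goes unanswered. */
static int fast_phase_seconds(void)
{
    return PPPOE_DISC_MAXPADI * PPPOE_DISC_TIMEOUT / HZ;
}

/* Seconds from the first PADI until the n-th slow retry is sent (n >= 1). */
static int seconds_until_slow_retry(int n)
{
    return fast_phase_seconds() + n * (PPPOE_SLOW_RETRY / HZ);
}
```

Under these assumptions the fast phase is over after 20 seconds, and every
further lost PADI costs another full minute.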

In the Real World(tm), DSL isn't nearly so nice to data streams.  For me, the retry
values above result in the link taking up to 10 minutes to reestablish after a lost
connection[!].

Plus, one thing that makes my connections crap out at least four times a day is this
from sys/net/if_spppsubr.c:

#define MAXALIVECNT                     3       /* max. alive packets */

On a link that is in the middle of sending a burst of data, LCP packets can be
lost in the noise.  These are retried on 30-second intervals, so if we don't get an
LCP reply back within 30 seconds (**even with data actively flowing**), the
connection is dropped by NetBSD.  Ugh.

>How-To-Repeat:

Use a DSL link that's either flaky or carrying enough traffic to drown out some LCP
packets.  Notice that NetBSD assumes a much higher (and usually unattainable) level
of data reliability than the telco actually delivers.

>Fix:

Ideally, add pppoectl options to set these variables at runtime.  The one in
if_spppsubr.c might need to be a sysctl because of the global nature of that code.
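
One possible shape for the runtime-settable values, as a sketch only.  The
struct, the defaults table and the setter are all hypothetical names; nothing
like this exists in the tree, and the real plumbing through pppoectl or sysctl
is not shown.

```c
#include <errno.h>

/* Hypothetical per-interface tunables replacing the compile-time macros. */
struct pppoe_tunables {
    int disc_timeout_sec;   /* stands in for PPPOE_DISC_TIMEOUT (hz*5) */
    int slow_retry_sec;     /* stands in for PPPOE_SLOW_RETRY (hz*60) */
    int max_padi;           /* stands in for PPPOE_DISC_MAXPADI */
    int max_padr;           /* stands in for PPPOE_DISC_MAXPADR */
};

/* Defaults match the current hard-coded values. */
static const struct pppoe_tunables pppoe_defaults = { 5, 60, 4, 2 };

/*
 * Validate a requested configuration and apply it.
 * Returns 0 on success, or EINVAL (leaving *cur untouched) if any value is < 1.
 */
static int pppoe_set_tunables(struct pppoe_tunables *cur,
                              const struct pppoe_tunables *req)
{
    if (req->disc_timeout_sec < 1 || req->slow_retry_sec < 1 ||
        req->max_padi < 1 || req->max_padr < 1)
        return EINVAL;
    *cur = *req;
    return 0;
}
```

Keeping the defaults identical to today's macros means nobody sees a behavior
change unless they ask for one.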

Also, possibly, reset the count of outstanding LCP Echo-Requests to zero whenever
any data has been received on the link.  The LCP Echo-Requests we sent may not have
been seen, but we know at least that there *is* data coming in, so we shouldn't
simply drop the link for lack of an LCP Echo-Reply.
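
A minimal sketch of that idea, assuming a simplified stand-in for the per-link
state.  struct link_state and both functions are hypothetical and are not the
real struct sppp or the real keepalive code in if_spppsubr.c; only MAXALIVECNT
is taken from the source.

```c
#define MAXALIVECNT 3   /* same value as in if_spppsubr.c */

/* Hypothetical per-link state, for illustration only. */
struct link_state {
    int alive_cnt;  /* Echo-Requests sent since we last saw any inbound traffic */
    int up;         /* nonzero while the link is considered alive */
};

/* Call on ANY inbound packet, data or LCP Echo-Reply alike. */
static void link_saw_traffic(struct link_state *l)
{
    l->alive_cnt = 0;
}

/*
 * Call from the periodic keepalive timer before sending an Echo-Request.
 * Returns 1 if the link stays up and a request should be sent,
 * 0 if MAXALIVECNT requests have already gone without any sign of life.
 */
static int link_keepalive_tick(struct link_state *l)
{
    if (l->alive_cnt >= MAXALIVECNT) {
        l->up = 0;
        return 0;
    }
    l->alive_cnt++;
    return 1;
}
```

With this, a link that is busy moving data never reaches the limit even if
every Echo-Reply is lost, while a truly dead link still gets dropped after
MAXALIVECNT silent intervals.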
>Release-Note:
>Audit-Trail:
>Unformatted: