NetBSD-Bugs archive


Re: kern/38637: pppoe fails to reconnect sometimes



On Mon, May 12, 2008 at 01:55:00PM +0000, bouyer%antioche.eu.org@localhost wrote:
> >Description:
>       Since I upgraded to 4.99.62, pppoe sometimes fails to reconnect.
>       The kernel says:
> May 12 14:22:48 chassiron /netbsd: pppoe0: LCP keepalive timed out, going to restart the connection
>       and then, the only activity on the ethernet link is:
> 15:17:53.128532 PPPoE PADI [Service-Name] [Host-Uniq 0xF0B16000]
> 15:17:53.186377 PPPoE PADO [AC-Name "BSMSO108"] [Host-Uniq 0xF0B16000] [Service-Name] [AC-Cookie 0xF017FC9DEF801E29EFF08EEFC116CACE]
>       once per minute. Here it has been in this state for about 55 minutes.
>       A reboot causes the pppoe device to reconnect immediately. A down/up
>       or destroy/create of the pppoe0 device doesn't help; a reboot is
>       required. This only seems to happen when the link is actively used
>       (e.g. pop3 and imap clients active, a web browser open with
>       self-refreshing pages loaded, some ssh sessions, ...). The link has
>       been up for several days while I was not home ...
>       A normal connect (after a fresh reboot) looks like
> 15:35:03.563740 PPPoE PADI [Service-Name] [Host-Uniq 0xF0B13000]
> 15:35:03.678298 PPPoE PADO [AC-Name "BSMSO108"] [Host-Uniq 0xF0B13000] [Service-Name] [AC-Cookie 0xF017FC9DEF801E29EFF08EEFC116CACE]
> 15:35:03.678667 PPPoE PADR [Service-Name] [AC-Cookie 0xF017FC9DEF801E29EFF08EEFC116CACE] [Host-Uniq 0xF0B13000]
> 15:35:03.838963 PPPoE PADS [ses 0x104a] [Service-Name] [Host-Uniq 0xF0B13000] [AC-Name "BSMSO108"] [AC-Cookie 0xF017FC9DEF801E29EFF08EEFC116CACE]
> 15:35:03.850141 PPPoE  [ses 0x104a] LCP, Conf-Request (0x01), id 1, length 16
> 15:35:03.958371 PPPoE  [ses 0x104a] LCP, Conf-Request (0x01), id 113, length 21
> 15:35:03.958829 PPPoE  [ses 0x104a] LCP, Conf-Ack (0x02), id 113, length 21
> 15:35:03.961091 PPPoE  [ses 0x104a] LCP, Conf-Ack (0x02), id 1, length 16
> 15:35:04.016830 PPPoE  [ses 0x104a] CHAP, Challenge (0x01), id 22, Value ff40f7a275c2a4792ce99b25447d7c2a, Name BSMSO108
> 15:35:04.017743 PPPoE  [ses 0x104a] CHAP, Response (0x02), id 22, Value 748b4c144654d0ff50853491582a6a17, Name bouyer%net1.nerim.nerim@localhost
> 15:35:04.362918 PPPoE  [ses 0x104a] LCP, Conf-Request (0x01), id 1, length 17
> 15:35:04.363401 PPPoE  [ses 0x104a] LCP, Conf-Ack (0x02), id 1, length 17
> 15:35:04.363746 PPPoE  [ses 0x104a] LCP, Conf-Request (0x01), id 2, length 16
> 15:35:04.422696 PPPoE  [ses 0x104a] LCP, Conf-Nack (0x03), id 2, length 10
> 15:35:04.423209 PPPoE  [ses 0x104a] LCP, Conf-Request (0x01), id 3, length 16
> 15:35:04.475556 PPPoE  [ses 0x104a] LCP, Conf-Nack (0x03), id 3, length 10
> 15:35:04.476004 PPPoE  [ses 0x104a] LCP, Conf-Request (0x01), id 4, length 16
> 15:35:04.531192 PPPoE  [ses 0x104a] LCP, Conf-Nack (0x03), id 4, length 10
> 15:35:04.531779 PPPoE  [ses 0x104a] LCP, Conf-Request (0x01), id 5, length 16
> 15:35:04.585438 PPPoE  [ses 0x104a] LCP, Conf-Nack (0x03), id 5, length 10
> 15:35:04.586005 PPPoE  [ses 0x104a] LCP, Conf-Request (0x01), id 6, length 16
> 15:35:04.642439 PPPoE  [ses 0x104a] LCP, Conf-Nack (0x03), id 6, length 10
> 15:35:04.643141 PPPoE  [ses 0x104a] LCP, Conf-Request (0x01), id 7, length 16
> 15:35:04.694636 PPPoE  [ses 0x104a] LCP, Conf-Reject (0x04), id 7, length 10
> 15:35:04.695074 PPPoE  [ses 0x104a] LCP, Conf-Request (0x01), id 8, length 12
> 15:35:04.750559 PPPoE  [ses 0x104a] LCP, Conf-Ack (0x02), id 8, length 12
> 15:35:04.753007 PPPoE  [ses 0x104a] CHAP, Challenge (0x01), id 23, Value c7707a87f9e0df56aa137099f6c6961e, Name lns303-tip-courbevoie
> 15:35:04.753528 PPPoE  [ses 0x104a] CHAP, Response (0x02), id 23, Value b02dbd3ffedc96e7d42166e83c8bc657, Name bouyer%net1.nerim.nerim@localhost
> 15:35:04.822418 PPPoE  [ses 0x104a] CHAP, Success (0x03), id 23, Msg
> 15:35:04.822963 PPPoE  [ses 0x104a] IPCP, Conf-Request (0x01), id 1, length 6
> 
> so it looks like the PADO sent by BSMSO108 is ignored by the kernel in the
> non-working case.
> It should also be noted that the telco forces a disconnect every 24 hours,
> and in this case the kernel has no problems reconnecting. The failure only
> happens after it detects a "LCP keepalive timed out", and only in some cases
> (if I power off the modem, the kernel also detects the LCP timeout, but
> powering the modem back on gives a working connection again after a few
> minutes).
> 
> I didn't have this issue with the previous (4.99.20, June 2007) kernel.
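
For anyone comparing captures like the two quoted above, here is a minimal
sketch (not part of the original report) that reads tcpdump-style lines and
reports, per Host-Uniq value, how far the PPPoE discovery exchange got. The
stage order PADI, PADO, PADR, PADS is the one from RFC 2516. The function
names and the regular expression are illustrative assumptions, and the input
is assumed to have one packet per line (unwrapped).

```python
import re

# Discovery stage packets in the order RFC 2516 defines them.
STAGES = ["PADI", "PADO", "PADR", "PADS"]

# Assumed line shape: "... PPPoE PADx ... [Host-Uniq 0x...]" on a single line.
LINE_RE = re.compile(
    r"PPPoE\s+(PADI|PADO|PADR|PADS)\b.*?\[Host-Uniq\s+(0x[0-9A-Fa-f]+)\]"
)


def discovery_progress(lines):
    """Map each Host-Uniq value to the furthest discovery stage seen for it."""
    furthest = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # session-stage traffic (LCP, CHAP, ...) or unrelated text
        rank = STAGES.index(match.group(1))
        host_uniq = match.group(2)
        if rank > furthest.get(host_uniq, -1):
            furthest[host_uniq] = rank
    return {host_uniq: STAGES[rank] for host_uniq, rank in furthest.items()}


def stalled(lines):
    """Return the Host-Uniq values whose exchange never reached PADS."""
    progress = discovery_progress(lines)
    return sorted(hu for hu, stage in progress.items() if stage != "PADS")
```

Fed the non-working capture, this would list 0xF0B16000 as stalled at PADO,
which matches the observation that no PADR follows the PADO.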


More details on this issue: it just showed up again (with a kernel from
a few days ago) and netstat reported:
ppoediscinq:
        queue length: 159
        maximum queue length: 256
        packets dropped: 0
ppoeinq:
        queue length: 159
        maximum queue length: 256
        packets dropped: 0
(all other queue lengths are 0). So it looks like the pppoe queues stop
being processed, for an unknown reason. This may even be the reason for the
disconnect in the first place.
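
To watch for this without reading the output by eye, a rough sketch that
parses queue statistics in the layout shown above and lists the queues with a
non-zero backlog. It only assumes the text format (a "name:" header followed
by indented "key: value" lines); the function names are made up for
illustration, and the ipintrq entry in the usage example is hypothetical
sample data, not taken from the report.

```python
def parse_queues(text):
    """Parse 'name:' headers followed by indented 'key: value' lines."""
    queues = {}
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if not raw[0].isspace() and line.endswith(":"):
            # Unindented line ending in ':' starts a new queue section.
            current = line[:-1]
            queues[current] = {}
        elif current is not None and ":" in line:
            key, _, value = line.strip().rpartition(":")
            value = value.strip()
            if value.isdigit():
                queues[current][key.strip()] = int(value)
    return queues


def backlogged(queues):
    """Return the names of queues whose current length is above zero."""
    return sorted(
        name for name, stats in queues.items()
        if stats.get("queue length", 0) > 0
    )


if __name__ == "__main__":
    sample = (
        "ipintrq:\n"          # hypothetical entry, added for contrast
        "        queue length: 0\n"
        "        maximum queue length: 256\n"
        "        packets dropped: 0\n"
        "ppoediscinq:\n"
        "        queue length: 159\n"
        "        maximum queue length: 256\n"
        "        packets dropped: 0\n"
    )
    print(backlogged(parse_queues(sample)))
```

A length that stays non-zero across two samples taken some seconds apart
would suggest the queue is no longer being drained, as opposed to a brief
burst of traffic.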

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 ans d'experience feront toujours la difference

