Re: kern/38637: pppoe fails to reconnect sometimes

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,bouyer%antioche.eu.org@localhost
Subject: Re: kern/38637: pppoe fails to reconnect sometimes
From: Andrew Doran <ad%netbsd.org@localhost>
Date: Tue, 5 May 2009 20:55:01 +0000 (UTC)

The following reply was made to PR kern/38637; it has been noted by GNATS.

From: Andrew Doran <ad%netbsd.org@localhost>
To: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, kern-bug-people%NetBSD.org@localhost,
        gnats-admin%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost
Subject: Re: kern/38637: pppoe fails to reconnect sometimes
Date: Tue, 5 May 2009 20:52:45 +0000

 On Tue, May 05, 2009 at 10:20:21PM +0200, Manuel Bouyer wrote:
 
 > On Fri, Apr 03, 2009 at 08:06:00PM +0200, Manuel Bouyer wrote:
 > > On Thu, Oct 02, 2008 at 10:50:05AM +0000, Martin Husemann wrote:
 > > >  Sorry, I don't see how this could happen besides the softint handling
 > > >  breaking completely (something corrupting the call wheel?):
 > > > [...]
 > > 
 > > Update on this: this time (with a 5.0_RC3 kernel), it's not the ppoeinq
 > > that stopped being processed and overflowed, but ipintrq!
 > > So it really is related to soft interrupts not being called anymore...
 > 
 > I think I found the reason for this (or at least a possible reason).
 > In sys/kern/kern_softint.c, sh_flags should be declared volatile.
 > Without it, on ports where splhigh() is inline, the compiler will optimise
 > away the second SOFTINT_PENDING test in softint_schedule(). A disassembly
 > of softint_schedule() with and without the volatile sh_flags confirms this
 > on sparc.
 > Because of this there is a race that could lead to the softhand_t
 > being enqueued twice on si_q, leading to a corrupted queue and
 > some handlers being SOFTINT_PENDING but never called.
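A minimal user-space sketch of the check / raise IPL / re-check pattern described above, with sh_flags made volatile as proposed. Everything prefixed sketch_ is hypothetical, the spl stand-ins do nothing, and this is not the actual kern_softint.c code; it only shows where the second SOFTINT_PENDING test sits and what the volatile qualifier is meant to protect.

```c
#include <stddef.h>

#define SOFTINT_PENDING 0x01   /* flag name taken from the report */

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sketch_softhand {
	struct sketch_softhand *sh_next;  /* link on the pending queue */
	volatile int sh_flags;            /* volatile: every read goes to memory */
};

struct sketch_softcpu {
	struct sketch_softhand *si_q;     /* pending handlers, LIFO for brevity */
};

/*
 * Models an inlined splhigh()/splx() containing no compiler barrier.
 * The compiler can see these bodies, so nothing tells it that sh_flags
 * might change between the two tests below.
 */
static inline int sketch_splhigh(void) { return 0; }
static inline void sketch_splx(int s) { (void)s; }

/* Returns 1 if sh was enqueued, 0 if it was already pending. */
int
sketch_softint_schedule(struct sketch_softcpu *sc, struct sketch_softhand *sh)
{
	int s;

	/* Unlocked fast path: already queued, nothing to do. */
	if ((sh->sh_flags & SOFTINT_PENDING) != 0)
		return 0;

	s = sketch_splhigh();
	/*
	 * Second test with interrupts blocked. An interrupt arriving between
	 * the two tests may already have queued sh. If sh_flags were not
	 * volatile, the compiler could reuse the first load, treat this test
	 * as always true, and insert sh a second time, corrupting si_q.
	 */
	if ((sh->sh_flags & SOFTINT_PENDING) == 0) {
		sh->sh_flags |= SOFTINT_PENDING;
		sh->sh_next = sc->si_q;
		sc->si_q = sh;
		sketch_splx(s);
		return 1;
	}
	sketch_splx(s);
	return 0;
}
```

Single-threaded calls can only show the intended behavior (a second schedule of the same handler returns 0 and leaves the queue alone); whether the second test is actually dropped depends on the compiler and port, as the sparc disassembly showed.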
 
 Nice diagnosis! However, the softint code is correct.
 
 splhigh/splx on sparc64 should be real functions or, if one believes there
 is a performance advantage, inlines with __insn_barrier in the
 correct spots. Consider the potential effects elsewhere. I pointed this
 issue out to Martin or Matthew a couple of years ago and then promptly
 forgot about it.
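The alternative suggested here leaves sh_flags alone and puts the compiler barrier into the inline spl primitives instead. A sketch, assuming a plain variable stands in for the processor interrupt level (on sparc64 the real code would write %pil), and a hypothetical sketch_insn_barrier() that mirrors how NetBSD defines __insn_barrier() for GCC-style compilers: an empty asm statement with a memory clobber. All sketch_ names are illustrative only.

```c
/* Hypothetical software model of the processor interrupt level. */
static int sketch_cur_ipl;
#define SKETCH_IPL_HIGH 15

/* Compiler-only barrier: forbids moving memory accesses across it. */
#define sketch_insn_barrier() __asm__ __volatile__("" ::: "memory")

static inline int
sketch_splhigh(void)
{
	int s = sketch_cur_ipl;

	sketch_cur_ipl = SKETCH_IPL_HIGH;  /* real port: raise the IPL */
	sketch_insn_barrier();   /* critical-section accesses stay after the raise */
	return s;
}

static inline void
sketch_splx(int s)
{
	sketch_insn_barrier();   /* critical-section accesses stay before the drop */
	sketch_cur_ipl = s;                /* real port: restore the IPL */
}

static int sketch_counter;

/* Example caller: the increment cannot leave the protected region. */
void
sketch_bump(void)
{
	int s = sketch_splhigh();

	sketch_counter++;
	sketch_splx(s);
}
```

Placing the barrier after the IPL is raised and before it is lowered means that loads and stores inside any spl-protected region cannot be cached across, or moved out of, that region. This fixes every such caller at once rather than only softint_schedule().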
