tech-kern archive

Re: panic: softint screwup



On Tue, Feb 11, 2020 at 07:26:29AM +0000, Nick Hudson wrote:
> On 04/02/2020 23:17, Andrew Doran wrote:
> > On Tue, Feb 04, 2020 at 07:03:28AM -0400, Jared McNeill wrote:
> >
> >> First time seeing this one.. an arm64 board sitting idle at the login prompt
> >> rebooted itself with this panic. Unfortunately the default ddb.onpanic=0
> >> strikes again and I can't get any more information than this:
> >
> > I added this recently to replace a vague KASSERT.  Thanks for grabbing the
> > output.
> >
> >> [ 364.3342263] curcpu=0, spl=4 curspl=7
> >> [ 364.3342263] onproc=0xffff00237f743080 => l_stat=7 l_flag=20000201 l_cpu=0
> >> [ 364.3342263] curlwp=0xffff00237f71e580 => l_stat=1 l_flag=00000200 l_cpu=0
> >> [ 364.3342263] pinned=0xffff00237f71e100 => l_stat=7 l_flag=00000200 l_cpu=0
> >> [ 364.3342263] panic: softint screwup
> >> [ 364.3342263] cpu0: Begin traceback...
> >> [ 364.3342263] trace fp ffffffc101da7be0
> >> [ 364.3342263] fp ffffffc101da7c00 vpanic() at ffffffc0004ad728 netbsd:vpanic+0x160
> >> [ 364.3342263] fp ffffffc101da7c70 panic() at ffffffc0004ad81c netbsd:panic+0x44
> >> [ 364.3342263] fp ffffffc101da7d40 softint_dispatch() at ffffffc00047bda4 netbsd:softint_dispatch+0x5c4
> >> [ 364.3342263] fp ffffffc101d9fc30 cpu_switchto_softint() at ffffffc000085198 netbsd:cpu_switchto_softint+0x68
> >> [ 364.3342263] fp ffffffc101d9fc80 splx() at ffffffc0000040d4 netbsd:splx+0xbc
> >> [ 364.3342263] fp ffffffc101d9fcb0 callout_softclock() at ffffffc000489e04 netbsd:callout_softclock+0x36c
> >> [ 364.3342263] fp ffffffc101d9fd40 softint_dispatch() at ffffffc00047b8dc netbsd:softint_dispatch+0xfc
> >> [ 364.3342263] fp ffffffc101d3fcc0 cpu_switchto_softint() at ffffffc000085198 netbsd:cpu_switchto_softint+0x68
> >> [ 364.3342263] fp ffffffc101d3fdf8 cpu_idle() at ffffffc000086128 netbsd:cpu_idle+0x58
> >> [ 364.3342263] fp ffffffc101d3fe40 idle_loop() at ffffffc0004546a4 netbsd:idle_loop+0x174
> >
> > Something has cleared the LW_RUNNING flag on softclk/0 between where it is
> > set (unlocked) at line 884 of kern_softint.c and callout_softclock().
> 
> Isn't it the case that softclk/0 is the victim/interrupted LWP for a soft{serial,net,bio}?
> That's certainly how I read the FP values.
> 
> Maybe the callout handler blocked and softclk/0 became a victim as well?
> 
> http://src.illumos.org/source/xref/netbsd-src/sys/kern/kern_synch.c#687
> 
> A soft{serial,net,bio} happens before curlwp is changed away from the blocking softint thread.

I suspect putting the RUNNING flag back into l_pflag will cure it, since
the update of l_flag without the LWP locked is dodgy. I can't think of
something that would clobber the update, but it is breaking the rules, so
to speak.
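
To illustrate the general hazard (not a diagnosis of this panic), here is a
minimal user-space sketch that replays one hypothetical interleaving by hand.
Everything in it is an assumption made for illustration: struct demo_lwp, the
DEMO_* bit values and the function names are invented, with shared_flag
standing in for l_flag and private_flag for l_pflag.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only; not the real kernel definitions. */
#define DEMO_RUNNING 0x20000000u
#define DEMO_OTHER   0x00000001u

/* Hypothetical stand-in for the two flag words of an LWP. */
struct demo_lwp {
	uint32_t shared_flag;	/* like l_flag: other contexts update it under the lock */
	uint32_t private_flag;	/* like l_pflag: only the owning LWP updates it */
};

/*
 * Replay a bad interleaving sequentially: a locked read-modify-write by
 * another context straddles an unlocked set of DEMO_RUNNING.
 */
uint32_t
demo_shared_word_race(struct demo_lwp *l)
{
	uint32_t stale = l->shared_flag;	/* other context: load under the lock */

	l->shared_flag |= DEMO_RUNNING;		/* softint path: unlocked set */
	l->shared_flag = stale | DEMO_OTHER;	/* other context: store, RUNNING is lost */
	return l->shared_flag;
}

/* Same interleaving, but the running bit lives in the private word. */
uint32_t
demo_private_word(struct demo_lwp *l)
{
	uint32_t stale = l->shared_flag;	/* other context: load under the lock */

	l->private_flag |= DEMO_RUNNING;	/* owner-only word, no lock needed */
	l->shared_flag = stale | DEMO_OTHER;	/* cannot touch private_flag */
	return l->private_flag;
}

/* Self-check: the bit is lost in the shared word, kept in the private one. */
int
demo_run(void)
{
	struct demo_lwp a = { 0, 0 };
	struct demo_lwp b = { 0, 0 };

	assert((demo_shared_word_race(&a) & DEMO_RUNNING) == 0);
	assert((demo_private_word(&b) & DEMO_RUNNING) != 0);
	return 0;
}
```

Whether any real code path can produce such an interleaving here is exactly
the open question; the sketch only shows why a bit set without the lock is
safer in a word that nobody else writes.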

I'll do just that on Saturday once back in front of a real computer.

Andrew

