Current-Users archive


Re: Xen MP panics in cpu_switchto()



On Mon, Jan 13, 2020 at 02:49:52PM +0000, Andrew Doran wrote:
> > Now I get a different panic:
> > [   1.0000000] vcpu0 at hypervisor0
> > [   1.0000000] vcpu0: 64 page colors
> > [   1.0000000] vcpu0: Intel(R) Core(TM)2 Duo CPU     E6550  @ 2.33GHz, id 0x6fb
> > [   1.0000000] vcpu0: node 0, package 0, core 1, smt 0
> > [   1.0000000] vcpu1 at hypervisor0
> > [   1.0000000] vcpu1: 2 page colors
> > [   1.0000000] vcpu1: starting
> > [   1.0000000] vcpu1: is started.
> > [   1.0000000] vcpu1: Intel(R) Core(TM)2 Duo CPU     E6550  @ 2.33GHz, id 0x6fb
> > [   1.0000000] vcpu1: node 0, package 0, core 0, smt 0
> > [...]
> > [   1.0000030] UVM: using package allocation scheme, 1 package(s) per bucket
> > [   1.0000030] Xen vcpu1 clock: using event channel 7
> > [   1.8809493] vcpu1: running
> > [   1.8809493] panic: kernel diagnostic assertion "prev != NULL" failed: file "/dsk/l1/misc/bouyer/HEAD/clean/src/sys/kern/kern_lwp.c", line 1021
> > [   1.8809493] cpu1: Begin traceback...
> > [   1.8809493] vpanic(c057f868,d77abf74,d77abf98,c03cc3e5,c057f868,c057f802,c05b0f71,c05b0ce4,3fd,0) at netbsd:vpanic+0x134
> > [   1.8809493] kern_assert(c057f868,c057f802,c05b0f71,c05b0ce4,3fd,0,0,0,c13a6900,c03c60c0) at netbsd:kern_assert+0x23
> > [   1.8809493] lwp_startup(0,c13a6900,8b1000,c0674200,0,c010007a,0,0,0,0) at netbsd:lwp_startup+0x155
> > [   1.8809493] cpu1: End traceback...
> > 
> > If I remove the call to cpu_switchto() in cpu_hatch() it boots, but it seems
> > that all user processes are running on cpu0 only ...
> 
> I looked and the only thing cpu_switchto() is doing there is setting curlwp,
> but that's already set in cpu_start_secondary(), so it's not needed.

It also sets rsp and rbp. I think rbp is not set by anything else, at least
in the Xen case.
The different rbp value would explain why in one case we hit a KASSERT()
in lwp_startup later.
But I don't know what pcb_rbp contains; I couldn't find where the pcb for
idlelwp is initialized.
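
As a rough illustration of the point above, here is a toy model in C of the
three effects discussed (curlwp, rsp, rbp). All names in it (toy_pcb, toy_lwp,
toy_cpu, toy_switchto, toy_set_curlwp_only) are hypothetical and are not the
NetBSD definitions; the real cpu_switchto() is assembly and does more than
this. The sketch only shows that a path which sets curlwp alone leaves the
frame pointer at whatever value it had before.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical types, NOT the NetBSD struct pcb/lwp/cpu_info. */
struct toy_pcb {
	uintptr_t pcb_rsp;	/* saved stack pointer */
	uintptr_t pcb_rbp;	/* saved frame pointer */
};

struct toy_lwp {
	struct toy_pcb pcb;
};

struct toy_cpu {
	struct toy_lwp *curlwp;	/* LWP the CPU believes it is running */
	uintptr_t rsp;		/* simulated stack pointer register */
	uintptr_t rbp;		/* simulated frame pointer register */
};

/*
 * Models the three effects mentioned in the thread: set curlwp and
 * load rsp and rbp from the new LWP's pcb.
 */
static void
toy_switchto(struct toy_cpu *ci, struct toy_lwp *newlwp)
{
	assert(ci != NULL && newlwp != NULL);
	ci->curlwp = newlwp;
	ci->rsp = newlwp->pcb.pcb_rsp;
	ci->rbp = newlwp->pcb.pcb_rbp;
}

/*
 * Models the shortcut where only curlwp is set: rsp and rbp keep
 * whatever values they had before.
 */
static void
toy_set_curlwp_only(struct toy_cpu *ci, struct toy_lwp *newlwp)
{
	assert(ci != NULL && newlwp != NULL);
	ci->curlwp = newlwp;
}
```

With the same idle LWP, the two paths agree on curlwp but can differ on rbp,
which is the difference I suspect matters here.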


> 
> > I can't see what extra work the cpu_switchto() could be doing that would
> > matter, except maybe the %ebp/rbp init. Any idea?
> 
> Right I don't think cpu_switchto() matters there.  The strategy for
> assigning LWPs to CPUs in the scheduler has changed.  If the machine is not
> busy everything is likely to stay on CPU0.  Are you putting much load on it?

I just tried a build.sh -j4.
CPU0 is 100% busy; the others are 100% idle:

load averages:  3.02,  2.14,  1.26;               up 0+00:51:59        16:59:03
61 processes: 5 runnable, 54 sleeping, 2 on CPU
CPU0 states: 39.3% user,  0.0% nice, 60.7% system,  0.0% interrupt,  0.0% idle
CPU1 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU2 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU3 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Memory: 1402M Act, 168K Inact, 16K Wired, 14M Exec, 1352M File, 1932M Free
Swap: 

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
21392 bouyer    33    0    29M 5964K RUN/0      0:00  2.00%  0.10% as
    0 root       0    0     0K   11M CPU/3      0:30  0.00%  0.00% [system]
   81 bouyer    85    0    20M 3596K kqueue/0   0:19  0.00%  0.00% tmux
  226 bouyer    43    0    16M 1900K CPU/0      0:00  0.00%  0.00% top
16883 bouyer    33    0  8992K 2212K RUN/0      0:00  0.00%  0.00% nbmake
21137 bouyer    33    0  7844K 1220K RUN/0      0:00  0.00%  0.00% sed
12098 bouyer    33    0  4288K  164K RUN/0      0:00  0.00%  0.00% sh
22411 bouyer    33    0  4288K  164K RUN/0      0:00  0.00%  0.00% cc
   42 root      85    0    80M 5768K poll/0     0:00  0.00%  0.00% sshd

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference
--

