Current-Users archive
Re: Xen MP panics in cpu_switchto()
On Mon, Jan 13, 2020 at 02:49:52PM +0000, Andrew Doran wrote:
> > Now I get a different panic:
> > [ 1.0000000] vcpu0 at hypervisor0
> > [ 1.0000000] vcpu0: 64 page colors
> > [ 1.0000000] vcpu0: Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz, id 0x6fb
> > [ 1.0000000] vcpu0: node 0, package 0, core 1, smt 0
> > [ 1.0000000] vcpu1 at hypervisor0
> > [ 1.0000000] vcpu1: 2 page colors
> > [ 1.0000000] vcpu1: starting
> > [ 1.0000000] vcpu1: is started.
> > [ 1.0000000] vcpu1: Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz, id 0x6fb
> > [ 1.0000000] vcpu1: node 0, package 0, core 0, smt 0
> > [...]
> > [ 1.0000030] UVM: using package allocation scheme, 1 package(s) per bucket
> > [ 1.0000030] Xen vcpu1 clock: using event channel 7
> > [ 1.8809493] vcpu1: running
> > [ 1.8809493] panic: kernel diagnostic assertion "prev != NULL" failed: file "/dsk/l1/misc/bouyer/HEAD/clean/src/sys/kern/kern_lwp.c", line 1021
> > [ 1.8809493] cpu1: Begin traceback...
> > [ 1.8809493] vpanic(c057f868,d77abf74,d77abf98,c03cc3e5,c057f868,c057f802,c05b0f71,c05b0ce4,3fd,0) at netbsd:vpanic+0x134
> > [ 1.8809493] kern_assert(c057f868,c057f802,c05b0f71,c05b0ce4,3fd,0,0,0,c13a6900,c03c60c0) at netbsd:kern_assert+0x23
> > [ 1.8809493] lwp_startup(0,c13a6900,8b1000,c0674200,0,c010007a,0,0,0,0) at netbsd:lwp_startup+0x155
> > [ 1.8809493] cpu1: End traceback...
> >
> > If I remove the call to cpu_switchto() in cpu_hatch() it boots, but it seems
> > that all user processes are running on cpu0 only ...
>
> I looked and the only thing cpu_switchto() is doing there is setting curlwp,
> but that's already set in cpu_start_secondary(), so it's not needed.
It also sets rsp and rbp. I think rbp is not set by anything else, at least
in the Xen case.
The different rbp value would explain why in one case we hit a KASSERT()
in lwp_startup later.
But I don't know what pcb_rbp contains; I couldn't find where the pcb for
idlelwp is initialized.
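
To make the rbp argument concrete, here is a toy user-space model. It is not NetBSD code: every name in it (toy_pcb, toy_lwp, toy_cpu, toy_start_secondary, toy_switchto, toy_frame_matches) is hypothetical. The only things taken from this thread are the claims that cpu_start_secondary() already sets curlwp and that cpu_switchto() additionally sets rsp and rbp; the sketch assumes those values are loaded from the target's pcb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real NetBSD structures. */
struct toy_pcb { uintptr_t rsp, rbp; };
struct toy_lwp { struct toy_pcb pcb; };
struct toy_cpu {
	struct toy_lwp *curlwp;
	uintptr_t rsp, rbp;	/* modelled register contents */
};

/* Models a secondary-CPU start path that only sets curlwp. */
void
toy_start_secondary(struct toy_cpu *ci, struct toy_lwp *idle)
{
	assert(ci != NULL && idle != NULL);
	ci->curlwp = idle;
}

/*
 * Models a context switch: sets curlwp and (by assumption) loads
 * rsp/rbp from the target's pcb. Returns the previous lwp.
 */
struct toy_lwp *
toy_switchto(struct toy_cpu *ci, struct toy_lwp *next)
{
	assert(ci != NULL && next != NULL);
	struct toy_lwp *prev = ci->curlwp;

	ci->curlwp = next;
	ci->rsp = next->pcb.rsp;
	ci->rbp = next->pcb.rbp;
	return prev;
}

/* True if the modelled registers agree with curlwp's saved pcb. */
bool
toy_frame_matches(const struct toy_cpu *ci)
{
	return ci->curlwp != NULL &&
	    ci->rsp == ci->curlwp->pcb.rsp &&
	    ci->rbp == ci->curlwp->pcb.rbp;
}
```

In this model, calling only toy_start_secondary() leaves curlwp set while rsp/rbp keep whatever they held before, so toy_frame_matches() is false until toy_switchto() runs. Whether the real pcb_rbp for idlelwp holds anything meaningful is exactly the open question above; the model does not answer it.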
>
> > I can't see what extra work the cpu_switchto() could be doing that would
> > matter, except maybe the %ebp/rbp init. Any idea?
>
> Right I don't think cpu_switchto() matters there. The strategy for
> assigning LWPs to CPUs in the scheduler has changed. If the machine is not
> busy everything is likely to stay on CPU0. Are you putting much load on it?
I just tried a build.sh -j4
CPU0 is 100% busy, the others are 100% idle:
load averages: 3.02, 2.14, 1.26; up 0+00:51:59 16:59:03
61 processes: 5 runnable, 54 sleeping, 2 on CPU
CPU0 states: 39.3% user, 0.0% nice, 60.7% system, 0.0% interrupt, 0.0% idle
CPU1 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU2 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU3 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Memory: 1402M Act, 168K Inact, 16K Wired, 14M Exec, 1352M File, 1932M Free
Swap:
  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
21392 bouyer    33    0    29M 5964K RUN/0      0:00  2.00%  0.10% as
    0 root       0    0     0K   11M CPU/3      0:30  0.00%  0.00% [system]
   81 bouyer    85    0    20M 3596K kqueue/0   0:19  0.00%  0.00% tmux
  226 bouyer    43    0    16M 1900K CPU/0      0:00  0.00%  0.00% top
16883 bouyer    33    0  8992K 2212K RUN/0      0:00  0.00%  0.00% nbmake
21137 bouyer    33    0  7844K 1220K RUN/0      0:00  0.00%  0.00% sed
12098 bouyer    33    0  4288K  164K RUN/0      0:00  0.00%  0.00% sh
22411 bouyer    33    0  4288K  164K RUN/0      0:00  0.00%  0.00% cc
   42 root      85    0    80M 5768K poll/0     0:00  0.00%  0.00% sshd
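
For anyone wanting to check the same imbalance from saved top output, the per-CPU lines can be parsed mechanically. This is a minimal sketch, assuming the lines look exactly like the "CPUn states:" lines above; the struct and function names (cpu_states, parse_cpu_states, count_busy_cpus) are made up for illustration and the idle threshold is an arbitrary parameter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One parsed "CPUn states:" line; all fields are percentages. */
struct cpu_states {
	int cpu;
	double user, nice, sys, intr, idle;
};

/* Parse one per-CPU line; returns true only if all six fields matched. */
bool
parse_cpu_states(const char *line, struct cpu_states *out)
{
	int n = sscanf(line,
	    "CPU%d states: %lf%% user, %lf%% nice, %lf%% system, "
	    "%lf%% interrupt, %lf%% idle",
	    &out->cpu, &out->user, &out->nice, &out->sys,
	    &out->intr, &out->idle);

	return n == 6;
}

/* Count CPUs whose idle percentage is below max_idle. */
int
count_busy_cpus(const char *const lines[], size_t nlines, double max_idle)
{
	int busy = 0;

	for (size_t i = 0; i < nlines; i++) {
		struct cpu_states s;

		/* Lines that do not parse are skipped, not counted. */
		if (parse_cpu_states(lines[i], &s) && s.idle < max_idle)
			busy++;
	}
	return busy;
}
```

Fed the four CPU lines above with a threshold of 50, count_busy_cpus() reports a single busy CPU, which matches what top shows: only CPU0 is doing any work.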
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference