NetBSD-Bugs archive


Re: port-alpha/38335 (kernel freeze on alpha MP system)



"Michael L. Hitch" <mhitch%lightning.msu.montana.edu@localhost> writes:
>  > Then on one occasion, the kernel started to repeatedly spew
>  > Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 0xfffffc003f9ee00
>  > messages to the console.  The pj values were identical in all the
>  > messages, but the ci_index value varied (0 or 1).
>  >
>  > Do you still think I should try and increase the IPL level of the
>  > pool_cache entry as specified in your message?
>  
>     Try the higher IPL on the pmap_tlb_shootdown_job_cache.  I'm not real 
>  clear on how that IPL is used, but I'm guessing that might be the IPL used 
>  by any locking used by the pool cache routines, and may be needed to 
>  prevent the IPI interrupt from interrupting a pool cache operation.  [That 
>  might have caused the deadlock you observed above.]  Try IPL_CLOCK first, 
>  and then IPL_HIGH if that still has problems relating to the pool cache.
Results for the IPL_HIGH setting: it still has problems with the
pool_cache_get stuff.  A number of consecutive 'build.sh -j4' runs
resulted in these console messages:
Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 0xfffffc003f9efa00
Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 0xfffffc003f9ee080
Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 0xfffffc003f9ee440

Also, during one of the builds, the system hung completely,
and I had to press the reset button:
----------------------------------------------------------------------
Stopped in pid 0.2 (system) at  netbsd:cpu_Debugger+0x4:        ret     zero,(ra)
db{0}> tr
cpu_Debugger() at netbsd:cpu_Debugger+0x4
comintr() at netbsd:comintr+0x720
alpha_shared_intr_dispatch() at netbsd:alpha_shared_intr_dispatch+0x5c
sio_iointr() at netbsd:sio_iointr+0x38
interrupt() at netbsd:interrupt+0x1c0
XentInt() at netbsd:XentInt+0x1c
--- interrupt (from ipl 0) ---
sched_curcpu_runnable_p() at netbsd:sched_curcpu_runnable_p+0x1c
idle_loop() at netbsd:idle_loop+0x1b8
exception_return() at netbsd:exception_return
--- root of call graph ---
db{0}> mach cpu 1
Using CPU 1
db{0}> tra

CPU 0: fatal kernel trap:

CPU 0    trap entry = 0x2 (memory management fault)
CPU 0    a0         = 0xffffffffffffffd9
CPU 0    a1         = 0x1
CPU 0    a2         = 0x0
CPU 0    pc         = 0xfffffc00003ee944
CPU 0    ra         = 0xfffffc00003e8104
CPU 0    pv         = 0xfffffc00003ee890
CPU 0    curlwp     = 0xfffffc003fe29c00
CPU 0        pid = 0, comm = system

Caught exception in ddb.
db{0}> mach cpu 0
CPU 0 not paused
db{0}> show reg
v0          0x1
t0          0xfffffc0000b44e9c  uvmadvice
t1          0xfffffc0000b44e9c  uvmadvice
t2          0
t3          0x2000
t4          0
t5          0x3
t6          0x4
t7          0x1
s0          0xfffffc0038581698
s1          0
s2          0x2
s3          0xe78
s4          0xfffffe0013bc16e0
s5          0x160042000
s6          0xfffffc0031b69000
a0          0xfffffc0038581698
a1          0xfffffc0014c0f760
a2          0xfffffe0013bc1448
a3          0x160042000
a4          0xfffffe0013bc16e0
a5          0
t8          0x1
t9          0
t10         0xfffffc0000c941e8  uvmexp
t11         0x80
ra          0xfffffc00008e0bd4  uvm_fault_internal+0x124
t12         0xfffffc00008e5f50  uvm_map_lookup_entry
at          0xfffffe0013bbe000
gp          0xfffffc0000c30968  __link_set_prop_linkpools_sym__link__prop_array_pool+0x8008
sp          0x1
pc          0xfffffc00008e0e50  uvm_fault_internal+0x3a0
ps          0
ai          0x80
pv          0xfffffc00008e5f50  uvm_map_lookup_entry
netbsd:uvm_fault_internal+0x3a0:        srl     t3,#0xd,t3
db{0}> tr

CPU 0: fatal kernel trap:

CPU 0    trap entry = 0x2 (memory management fault)
CPU 0    a0         = 0xffffffffffffffd9
CPU 0    a1         = 0x1
CPU 0    a2         = 0x0
CPU 0    pc         = 0xfffffc00003ee944
CPU 0    ra         = 0xfffffc00003e8104
CPU 0    pv         = 0xfffffc00003ee890
CPU 0    curlwp     = 0xfffffc003fe29c00
CPU 0        pid = 0, comm = system

Caught exception in ddb.
db{0}> reboot 4
----------------------------------------------------------------------

Also, during another build, I got a panic:

----------------------------------------------------------------------
panic: fpsave ipi didn't
Stopped in pid 22623.1 (sh) at  netbsd:cpu_Debugger+0x4:        ret     zero,(ra)
db{0}> trace 
cpu_Debugger() at netbsd:cpu_Debugger+0x4
panic() at netbsd:panic+0x268
fpusave_proc() at netbsd:fpusave_proc+0x1b4
cpu_lwp_free() at netbsd:cpu_lwp_free+0x28
exit1() at netbsd:exit1+0x568
sys_exit() at netbsd:sys_exit+0x7c
syscall_plain() at netbsd:syscall_plain+0x160
XentSys() at netbsd:XentSys+0x60
--- syscall (1) ---
--- user mode ---
db{0}> mach cpu 1
CPU 1 not paused
db{0}> show reg
v0          0x6
t0          0x1
t1          0x1
t2          0xfffffc003ff48000
t3          0
t4          0
t5          0xfffffc0000b46a65  __func__.21238+0x91c
t6          0xc5343806
t7          0xfffffffffffffcbe
s0          0xfffffc0000c37920  msgbufenabled
s1          0x104
s2          0xfffffc0000c350e8  db_onpanic
s3          0xfffffc003b546800
s4          0xfffffc0037d7b458
s5          0xfffffc0000c7bff0  initproc
s6          0x12003dab0
a0          0x6
a1          0xfffffd01fc0003f8
a2          0
a3          0x8
a4          0x3
a5          0x8
t8          0x2
t9          0x8
t10         0
t11         0x7
ra          0xfffffc000080c5a8  panic+0x268
t12         0xfffffc00003eb590  cpu_Debugger
at          0xfffffe0013b1a000
gp          0xfffffc0000c30968  __link_set_prop_linkpools_sym__link__prop_array_pool+0x8008
sp          0xfffffe00139cbcc8
pc          0xfffffc00003eb594  cpu_Debugger+0x4
ps          0x6
ai          0x7
pv          0xfffffc00003eb590  cpu_Debugger
netbsd:cpu_Debugger+0x4:        ret     zero,(ra)
db{0}> reboot 4
----------------------------------------------------------------------

Hope this helps.
                                        -jarle

