NetBSD-Bugs archive


Re: port-alpha/38335 (kernel freeze on alpha MP system)



The following reply was made to PR port-alpha/38335; it has been noted by GNATS.

From: Jarle Greipsland <jarle%uninett.no@localhost>
To: gnats-bugs%NetBSD.org@localhost, mhitch%lightning.msu.montana.edu@localhost
Cc: port-alpha-maintainer%netbsd.org@localhost,
        gnats-admin%netbsd.org@localhost,
        netbsd-bugs%netbsd.org@localhost
Subject: Re: port-alpha/38335 (kernel freeze on alpha MP system)
Date: Mon, 26 Oct 2009 15:34:14 +0100 (CET)

 "Michael L. Hitch" <mhitch%lightning.msu.montana.edu@localhost> writes:
 >  > Then on one occasion, the kernel started to repeatedly spew
 >  > Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
 > 0xfffffc003f9ee00
 >  > messages to the console.  The pj value were identical in all the
 >  > messages, but the ci_index value varied (0 or 1).
 >  >
 >  > Do you still think I should try and increase the IPL level of the
 >  > pool_cache entry as specified in your message?
 >  
 >     Try the higher IPL on the pmap_tlb_shootdown_job_cache.  I'm not real 
 >  clear on how that IPL is used, but I'm guessing that might be the IPL used 
 >  by any locking using by the pool cache routines, and may be needed to 
 >  prevent the IPI interrupt from interrupting a pool cache operation.  [That 
 >  might have caused the deadlock you observed above.]  Try IPL_CLOCK first, 
 >  and then IPL_HIGH if that still has problems relating to the pool cache.
 
 Results for the IPL_HIGH setting: It still has problems with the
 pool_cache_get stuff.  A number of consecutive 'build.sh -j4'
 runs resulted in these console messages:
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 0xfffffc003f9efa00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 0xfffffc003f9ee080
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 0xfffffc003f9ee440
 
 Also, during one of the builds, the system hung completely,
 and I had to press the reset button:
 ----------------------------------------------------------------------
 Stopped in pid 0.2 (system) at  netbsd:cpu_Debugger+0x4:       ret     zero,(ra)
 db{0}> tr
 cpu_Debugger() at netbsd:cpu_Debugger+0x4
 comintr() at netbsd:comintr+0x720
 alpha_shared_intr_dispatch() at netbsd:alpha_shared_intr_dispatch+0x5c
 sio_iointr() at netbsd:sio_iointr+0x38
 interrupt() at netbsd:interrupt+0x1c0
 XentInt() at netbsd:XentInt+0x1c
 --- interrupt (from ipl 0) ---
 sched_curcpu_runnable_p() at netbsd:sched_curcpu_runnable_p+0x1c
 idle_loop() at netbsd:idle_loop+0x1b8
 exception_return() at netbsd:exception_return
 --- root of call graph ---
 db{0}> mach cpu 1
 Using CPU 1
 db{0}> tra
 
 CPU 0: fatal kernel trap:
 
 CPU 0    trap entry = 0x2 (memory management fault)
 CPU 0    a0         = 0xffffffffffffffd9
 CPU 0    a1         = 0x1
 CPU 0    a2         = 0x0
 CPU 0    pc         = 0xfffffc00003ee944
 CPU 0    ra         = 0xfffffc00003e8104
 CPU 0    pv         = 0xfffffc00003ee890
 CPU 0    curlwp     = 0xfffffc003fe29c00
 CPU 0        pid = 0, comm = system
 
 Caught exception in ddb.
 db{0}> mach cpu 0
 CPU 0 not paused
 db{0}> show reg
 v0          0x1
 t0          0xfffffc0000b44e9c  uvmadvice
 t1          0xfffffc0000b44e9c  uvmadvice
 t2          0
 t3          0x2000
 t4          0
 t5          0x3
 t6          0x4
 t7          0x1
 s0          0xfffffc0038581698
 s1          0
 s2          0x2
 s3          0xe78
 s4          0xfffffe0013bc16e0
 s5          0x160042000
 s6          0xfffffc0031b69000
 a0          0xfffffc0038581698
 a1          0xfffffc0014c0f760
 a2          0xfffffe0013bc1448
 a3          0x160042000
 a4          0xfffffe0013bc16e0
 a5          0
 t8          0x1
 t9          0
 t10         0xfffffc0000c941e8  uvmexp
 t11         0x80
 ra          0xfffffc00008e0bd4  uvm_fault_internal+0x124
 t12         0xfffffc00008e5f50  uvm_map_lookup_entry
 at          0xfffffe0013bbe000
 gp          0xfffffc0000c30968  __link_set_prop_linkpools_sym__link__prop_array_pool+0x8008
 sp          0x1
 pc          0xfffffc00008e0e50  uvm_fault_internal+0x3a0
 ps          0
 ai          0x80
 pv          0xfffffc00008e5f50  uvm_map_lookup_entry
 netbsd:uvm_fault_internal+0x3a0:        srl     t3,#0xd,t3
 db{0}> tr
 
 CPU 0: fatal kernel trap:
 
 CPU 0    trap entry = 0x2 (memory management fault)
 CPU 0    a0         = 0xffffffffffffffd9
 CPU 0    a1         = 0x1
 CPU 0    a2         = 0x0
 CPU 0    pc         = 0xfffffc00003ee944
 CPU 0    ra         = 0xfffffc00003e8104
 CPU 0    pv         = 0xfffffc00003ee890
 CPU 0    curlwp     = 0xfffffc003fe29c00
 CPU 0        pid = 0, comm = system
 
 Caught exception in ddb.
 db{0}> reboot 4
 ----------------------------------------------------------------------
 
 Also, during another build, I got this panic:
 
 ----------------------------------------------------------------------
 panic: fpsave ipi didn't
 Stopped in pid 22623.1 (sh) at  netbsd:cpu_Debugger+0x4:        ret     zero,(ra)
 db{0}> trace 
 cpu_Debugger() at netbsd:cpu_Debugger+0x4
 panic() at netbsd:panic+0x268
 fpusave_proc() at netbsd:fpusave_proc+0x1b4
 cpu_lwp_free() at netbsd:cpu_lwp_free+0x28
 exit1() at netbsd:exit1+0x568
 sys_exit() at netbsd:sys_exit+0x7c
 syscall_plain() at netbsd:syscall_plain+0x160
 XentSys() at netbsd:XentSys+0x60
 --- syscall (1) ---
 --- user mode ---
 db{0}> mach cpu 1
 CPU 1 not paused
 db{0}> show reg
 v0          0x6
 t0          0x1
 t1          0x1
 t2          0xfffffc003ff48000
 t3          0
 t4          0
 t5          0xfffffc0000b46a65  __func__.21238+0x91c
 t6          0xc5343806
 t7          0xfffffffffffffcbe
 s0          0xfffffc0000c37920  msgbufenabled
 s1          0x104
 s2          0xfffffc0000c350e8  db_onpanic
 s3          0xfffffc003b546800
 s4          0xfffffc0037d7b458
 s5          0xfffffc0000c7bff0  initproc
 s6          0x12003dab0
 a0          0x6
 a1          0xfffffd01fc0003f8
 a2          0
 a3          0x8
 a4          0x3
 a5          0x8
 t8          0x2
 t9          0x8
 t10         0
 t11         0x7
 ra          0xfffffc000080c5a8  panic+0x268
 t12         0xfffffc00003eb590  cpu_Debugger
 at          0xfffffe0013b1a000
 gp          0xfffffc0000c30968  __link_set_prop_linkpools_sym__link__prop_array_pool+0x8008
 sp          0xfffffe00139cbcc8
 pc          0xfffffc00003eb594  cpu_Debugger+0x4
 ps          0x6
 ai          0x7
 pv          0xfffffc00003eb590  cpu_Debugger
 netbsd:cpu_Debugger+0x4:        ret     zero,(ra)
 db{0}> reboot 4
 ----------------------------------------------------------------------
 
 Hope this helps.
                                        -jarle
 

