NetBSD-Bugs archive


Re: port-alpha/38335 (kernel freeze on alpha MP system)



The following reply was made to PR port-alpha/38335; it has been noted by GNATS.

From: Jarle Greipsland <jarle%uninett.no@localhost>
To: mhitch%lightning.msu.montana.edu@localhost
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: port-alpha/38335 (kernel freeze on alpha MP system)
Date: Wed, 28 Oct 2009 16:01:33 +0100 (CET)

 "Michael L. Hitch" <mhitch%lightning.msu.montana.edu@localhost> writes:
 >    OK, here's something else to try.  I was looking through the alpha 
 > hardware reference manual and checking some of the multiprocessor 
 > information.  I noted that it showed the use of memory barriers when 
 > sending/receiving interrupts between processors.  It looks like the atomic 
 > operations that were used in the netbsd-4 branch included the memory 
 > barrier, but the ones used in netbsd-5 and later do not.  This patch 
 > should add back the memory barriers needed for the IPI stuff.
 [ ... ]
 OK, I have applied your patch and removed the old ones, except for
 the one that generates the "Whoa!" warnings.  The kernel I'm running
 is GENERIC.MP, built from -current as of Oct 14th, with the
 following diff:
 ----------------------------------------------------------------------
 Index: arch/alpha/alpha/ipifuncs.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/alpha/alpha/ipifuncs.c,v
 retrieving revision 1.40
 diff -u -r1.40 ipifuncs.c
 --- arch/alpha/alpha/ipifuncs.c        28 Apr 2008 20:23:10 -0000      1.40
 +++ arch/alpha/alpha/ipifuncs.c        28 Oct 2009 14:55:46 -0000
 @@ -130,7 +130,7 @@
                return;
        }
  #endif
 -
 +      alpha_mb();
        pending_ipis = atomic_swap_ulong(&ci->ci_ipis, 0);
  
        /*
 @@ -167,6 +167,7 @@
  #endif
  
        atomic_or_ulong(&cpu_info[cpu_id]->ci_ipis, ipimask);
 +      alpha_mb();
        alpha_pal_wripir(cpu_id);
  }
  
 Index: arch/alpha/alpha/pmap.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/alpha/alpha/pmap.c,v
 retrieving revision 1.243
 diff -u -r1.243 pmap.c
 --- arch/alpha/alpha/pmap.c    4 Oct 2009 17:00:31 -0000       1.243
 +++ arch/alpha/alpha/pmap.c    28 Oct 2009 14:55:46 -0000
 @@ -3699,6 +3699,12 @@
                 * don't really have to do anything else.
                 */
                mutex_spin_enter(&pq->pq_lock);
 +              if (pj && pj == pq->pq_head.tqh_first) {
 +                      printf("Whoa!  pool_cache_get returned an in-use entry! ci_index %d pj %p\n",
 +                          self->ci_index, pj);
 +/**/          /*      panic("Oops"); */
 +                      pj = NULL;
 +              }
                pq->pq_pte |= pte;
                if (pq->pq_tbia) {
                        mutex_spin_exit(&pq->pq_lock);
 ----------------------------------------------------------------------
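 
 For reference, the ordering the ipifuncs.c hunks are meant to enforce
 can be modelled in portable C11 roughly as follows.  This is only a
 user-space sketch under my own assumptions: NCPU, ipi_pending,
 ipi_post() and ipi_collect() are made-up names, the C11 fences stand
 in for alpha_mb(), and the step that actually raises the interrupt
 (alpha_pal_wripir() in the kernel) is left out.

```c
/* Sketch only: a user-space C11 model of the IPI post/collect ordering.
 * NCPU, ipi_pending, ipi_post and ipi_collect are hypothetical names,
 * not NetBSD kernel interfaces. */
#include <assert.h>
#include <stdatomic.h>

#define NCPU 2

/* One pending-IPI bitmask per CPU, standing in for ci_ipis. */
static atomic_ulong ipi_pending[NCPU];

/* Sender side: publish the request bits, then fence before the
 * (omitted) step that would raise the interrupt on the target CPU. */
static void
ipi_post(unsigned cpu, unsigned long mask)
{
	assert(cpu < NCPU);
	atomic_fetch_or_explicit(&ipi_pending[cpu], mask,
	    memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	/* a real kernel would signal the target CPU here */
}

/* Receiver side: fence first, then atomically take and clear all
 * pending bits in one step so no request is lost or seen twice. */
static unsigned long
ipi_collect(unsigned cpu)
{
	assert(cpu < NCPU);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_exchange_explicit(&ipi_pending[cpu], 0UL,
	    memory_order_relaxed);
}
```

 Posting masks 0x1 and 0x4 to the same CPU and then collecting once
 returns the combined mask and leaves the pending word empty.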
 
 And it both "Whoa!"s and panics:
 ----------------------------------------------------------------------
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003f9ef980
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003d8bfc00
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003f9ee400
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003f9ee400
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003f9ee400
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003f9ee400
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003f9ee400
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003f9ee400
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003f9ee400
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003f9ee400
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003f9ee400
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003f9ee400
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003f9ee400
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003f9ee400
 Whoa!  pool_cache_get returned an in-use entry! ci_index 1 pj 
0xfffffc003f9ee400
 Whoa!  pool_cache_get returned an in-use entry! ci_index 0 pj 
0xfffffc003f9ee400
 
 CPU 1: fatal kernel trap:
 
 CPU 1    trap entry = 0x2 (memory management fault)
 CPU 1    a0         = 0x40
 CPU 1    a1         = 0x1
 CPU 1    a2         = 0x0
 CPU 1    pc         = 0xfffffc00007371a8
 CPU 1    ra         = 0xfffffc0000737118
 CPU 1    pv         = 0xfffffc00005f6130
 CPU 1    curlwp     = 0xfffffc003f960800
 CPU 1        pid = 0, comm = system
 
 panic: trap
 Stopped in pid 0.37 (system) at netbsd:cpu_Debugger+0x4:        ret     zero,(ra)
 db{1}> tr
 cpu_Debugger() at netbsd:cpu_Debugger+0x4
 panic() at netbsd:panic+0x268
 trap() at netbsd:trap+0x35c
 XentMM() at netbsd:XentMM+0x20
 --- memory management fault (from ipl 5) ---
 pmap_do_tlb_shootdown() at netbsd:pmap_do_tlb_shootdown+0xe8
 alpha_ipi_process() at netbsd:alpha_ipi_process+0xb8
 interrupt() at netbsd:interrupt+0x88
 XentInt() at netbsd:XentInt+0x1c
 --- interrupt (from ipl 0) ---
 mutex_spin_exit() at netbsd:mutex_spin_exit+0x5c
 pmap_tlb_shootdown() at netbsd:pmap_tlb_shootdown+0x170
 pmap_kremove() at netbsd:pmap_kremove+0xac
 uvm_pagermapout() at netbsd:uvm_pagermapout+0x40
 uvm_aio_aiodone() at netbsd:uvm_aio_aiodone+0xd4
 db{1}> show reg
 v0          0xfffffe0000034800
 t0          0x1
 t1          0x1
 t2          0xfffffc003ff48000
 t3          0
 t4          0
 t5          0xfffffc0000b46a3d  __func__.21238+0x91c
 t6          0
 t7          0
 s0          0xfffffc0000c378e0  msgbufenabled
 s1          0x104
 s2          0xfffffc0000c350a8  db_onpanic
 s3          0xfffffc00009bf5fc  reg_to_frame+0x5c8
 s4          0xfffffe0013923a38
 s5          0x40
 s6          0xfffffc003f960800
 a0          0x5
 a1          0xfffffd01fc0003f8
 a2          0
 a3          0x8
 a4          0x3
 a5          0xfffffe0000000008
 t8          0xfffffe00139237ff
 t9          0x8
 t10         0x3ea0a5
 t11         0x1ff800
 ra          0xfffffc000080c5b8  panic+0x268
 t12         0xfffffc00003eb590  cpu_Debugger
 at          0x12002438c
 gp          0xfffffc0000c30928  __link_set_prop_linkpools_sym__link__prop_array_pool+0x8008
 sp          0xfffffe0013923888
 pc          0xfffffc00003eb594  cpu_Debugger+0x4
 ps          0x6
 ai          0x1ff800
 pv          0xfffffc00003eb590  cpu_Debugger
 netbsd:cpu_Debugger+0x4:        ret     zero,(ra)
 db{1}> mach cpu 0
 Using CPU 0
 db{1}> tr
 
 CPU 1: fatal kernel trap:
 
 CPU 1    trap entry = 0x2 (memory management fault)
 CPU 1    a0         = 0xffffffffffffffd9
 CPU 1    a1         = 0x1
 CPU 1    a2         = 0x0
 CPU 1    pc         = 0xfffffc00003ee944
 CPU 1    ra         = 0xfffffc00003e8104
 CPU 1    pv         = 0xfffffc00003ee890
 CPU 1    curlwp     = 0xfffffc003f960800
 CPU 1        pid = 0, comm = system
 
 Caught exception in ddb.
 db{1}> show reg
 v0          0
 t0          0
 t1          0xfffffc003fe29c00
 t2          0xfffffe0012c8a400
 t3          0
 t4          0
 t5          0xfffffc003fe29c60
 t6          0xfffffc0000c68410  kernel_pmap_store+0x50
 t7          0
 s0          0x1
 s1          0xfffffc003fe29c00
 s2          0xfffffc0000c0de60  cpu_info_primary+0x38
 s3          0xfffffc0000c0de28  cpu_info_primary
 s4          0
 s5          0
 s6          0
 a0          0
 a1          0
 a2          0xfffffe0012c9a000
 a3          0x1
 a4          0xfffffc0000c94308  uvm_fpageqlock
 a5          0
 t8          0x1604db790
 t9          0
 t10         0xffffffff
 t11         0xfffffc003f90a8b8
 ra          0xfffffc00005eac88  idle_loop+0x1b8
 t12         0xfffffc000060e200  kpreempt_enable
 at          0xfffffe0013984000
 gp          0xfffffc0000c30928  __link_set_prop_linkpools_sym__link__prop_array_pool+0x8008
 sp          0x1
 pc          0xfffffc00005eac34  idle_loop+0x164
 ps          0
 ai          0xfffffc003f90a8b8
 pv          0xfffffc000060e200  kpreempt_enable
 netbsd:idle_loop+0x164: ldq     pv,-1d30(gp)
 ----------------------------------------------------------------------
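 
 Note that the "Whoa!" test in the pmap.c hunk only compares pj against
 the first element of the shootdown queue (pq->pq_head.tqh_first); an
 entry sitting further down the queue would not be reported.  A small
 stand-alone model of that comparison, using <sys/queue.h>, might look
 like the sketch below.  struct job, jobq and job_is_queue_head() are
 hypothetical names for illustration, not the kernel's own types.

```c
/* Sketch only: models the head-of-queue test from the debugging diff.
 * struct job, jobq and job_is_queue_head are hypothetical names. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

struct job {
	int id;
	TAILQ_ENTRY(job) link;
};

TAILQ_HEAD(jobq, job);

/* True when pj is non-NULL and is currently the first queued entry,
 * i.e. the condition under which the diff prints its warning. */
static bool
job_is_queue_head(const struct jobq *q, const struct job *pj)
{
	assert(q != NULL);
	return pj != NULL && pj == TAILQ_FIRST(q);
}
```

 With two entries queued, only the first one satisfies the test; the
 second entry and a NULL pointer do not.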
 
 So, no cigar this time.  Anything else I should try?
 
                                        -jarle
 -- 
 Q: What's the difference between programming and bug collecting?
 A: None.
 

