Port-sparc64 archive

Re: Pausing/resuming CPU's in DDB



Hi,

> Increasing the number of retries in sparc64_send_ipi() is unlikely to help 
> the situation since you should only be able to exit that routine one of 
> two ways, if it thinks sending the IPI was successful or through this 
> code:
> 
>         if (panicstr == NULL)
>                 panic("cpu%d: ipi_send: couldn't send ipi to UPAID %u"
>                         " (tried %d times)", cpu_number(), upaid, i);
> 
> Are you getting a panic?  If not, then increasing the loop count won't 
> help.

No panic, but I did see the "RED State Exception".  I have increased the
number of retries in sparc64_send_ipi() to 10000, and the E3500 survived a
10h30 build.sh -j 16 run (base, X, 2 kernels) with all filesystems on NFS.

However, even with retries set to 10000, it still sometimes fails to pause or
to resume all the CPUs, so the loops in mp_pause_cpus() and mp_resume_cpus()
still seem to be necessary.

> Also, instead of always sending the IPI to all the cpus I would recommend 
> updating the cpuset by removing the processors that have halted for the 
> next iteration of your patch.

Next iteration of the patch attached.
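
For anyone who wants to see the retry idea outside the kernel, here is a rough
userland sketch of "retry a bounded number of times, dropping the CPUs that
have already responded".  It is only an illustration, not the kernel code:
cpuset_t (a plain bitmask), send_fn, pause_with_retry() and flaky_send() are
all made-up names, and the "IPI" is just a callback that reports which targets
acknowledged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for sparc64_cpuset_t: one bit per CPU. */
typedef uint64_t cpuset_t;

/*
 * Hypothetical callback: "send the IPI" to every CPU in targets and
 * return the subset that acknowledged it.
 */
typedef cpuset_t (*send_fn)(cpuset_t targets, void *arg);

/*
 * Try up to max_tries times, removing the CPUs that acknowledged from
 * the target set before each retry.  Returns the set of CPUs that never
 * acknowledged (0 means every target responded).
 */
static cpuset_t
pause_with_retry(cpuset_t targets, int max_tries, send_fn send, void *arg)
{
	assert(max_tries > 0);

	while (max_tries-- > 0) {
		if (targets == 0)
			break;			/* nothing left to signal */
		cpuset_t acked = send(targets, arg);
		targets &= ~acked;		/* only retry the stragglers */
	}
	return targets;
}

/*
 * Toy model of an unreliable delivery: each call acknowledges only the
 * lowest-numbered pending CPU and counts how often it was called.
 */
static cpuset_t
flaky_send(cpuset_t targets, void *arg)
{
	int *calls = arg;

	(*calls)++;
	return targets & (~targets + 1);	/* isolate lowest set bit */
}
```

With flaky_send() as the callback, three targets need all three tries, and a
fourth target would be left over and reported as an error by the caller.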

Thanks,

J

PS.  It was suggested that the code could be moved to use kcpuset(9), but I
haven't looked at that yet.

-- 
  My other computer also runs NetBSD    /        Sailing at Newbiggin
        http://www.netbsd.org/        /   http://www.newbigginsailingclub.org/
Index: ipifuncs.c
===================================================================
RCS file: /cvsroot/src/sys/arch/sparc64/sparc64/ipifuncs.c,v
retrieving revision 1.44
diff -u -p -r1.44 ipifuncs.c
--- ipifuncs.c  12 Feb 2012 16:34:10 -0000      1.44
+++ ipifuncs.c  10 Mar 2012 23:43:11 -0000
@@ -222,7 +222,7 @@ sparc64_send_ipi(int upaid, ipifunc_t fu
        intr_func = (uint64_t)(u_long)func;
 
        /* Schedule an interrupt. */
-       for (i = 0; i < 1000; i++) {
+       for (i = 0; i < 10000; i++) {
                int s = intr_disable();
 
                stxa(IDDR_0H, ASI_INTERRUPT_DISPATCH, intr_func);
@@ -325,17 +325,21 @@ mp_halt_cpus(void)
 void
 mp_pause_cpus(void)
 {
+       int i = 3;
        sparc64_cpuset_t cpuset;
 
        CPUSET_ASSIGN(cpuset, cpus_active);
        CPUSET_DEL(cpuset, cpu_number());
+       while (i-- > 0) {
+               if (CPUSET_EMPTY(cpuset))
+                       return;
 
-       if (CPUSET_EMPTY(cpuset))
-               return;
-
-       sparc64_multicast_ipi(cpuset, sparc64_ipi_pause, 0, 0);
-       if (sparc64_ipi_wait(&cpus_paused, cpuset))
-               sparc64_ipi_error("pause", cpus_paused, cpuset);
+               sparc64_multicast_ipi(cpuset, sparc64_ipi_pause, 0, 0);
+               if (!sparc64_ipi_wait(&cpus_paused, cpuset))
+                       return;
+               CPUSET_SUB(cpuset, cpus_paused);
+       }
+       sparc64_ipi_error("pause", cpus_paused, cpuset);
 }
 
 /*
@@ -354,16 +358,20 @@ mp_resume_cpu(int cno)
 void
 mp_resume_cpus(void)
 {
+       int i = 3;
        sparc64_cpuset_t cpuset;
 
-       CPUSET_CLEAR(cpus_resumed);
-       CPUSET_ASSIGN(cpuset, cpus_paused);
-       membar_Sync();
-       CPUSET_CLEAR(cpus_paused);
+       while (i-- > 0) {
+               CPUSET_CLEAR(cpus_resumed);
+               CPUSET_ASSIGN(cpuset, cpus_paused);
+               membar_Sync();
+               CPUSET_CLEAR(cpus_paused);
 
-       /* CPUs awake on cpus_paused clear */
-       if (sparc64_ipi_wait(&cpus_resumed, cpuset))
-               sparc64_ipi_error("resume", cpus_resumed, cpuset);
+               /* CPUs awake on cpus_paused clear */
+               if (!sparc64_ipi_wait(&cpus_resumed, cpuset))
+                       return;
+       }
+       sparc64_ipi_error("resume", cpus_resumed, cpuset);
 }
 
 int

