NetBSD-Bugs archive


Re: port-macppc/54331: macppc MP kernels fail to boot successfully in -current (8.99.49)



The following reply was made to PR port-macppc/54331; it has been noted by GNATS.

From: "David H. Gutteridge" <david%gutteridge.ca@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: 
Subject: Re: port-macppc/54331: macppc MP kernels fail to boot successfully
 in -current (8.99.49)
Date: Mon, 29 Jul 2019 19:38:01 -0400

 On Thu, 2019-07-25 at 17:15 +0000, Michael wrote:
 > The following reply was made to PR port-macppc/54331; it has been noted by GNATS.
 > 
 > From: Michael <macallan%netbsd.org@localhost>
 > To: gnats-bugs%netbsd.org@localhost
 > Cc: 
 > Subject: Re: port-macppc/54331: macppc MP kernels fail to boot successfully
 >  in -current (8.99.49)
 > Date: Thu, 25 Jul 2019 13:12:31 -0400
 > 
 >  I finally had a look at the code and the logs - looks like a race
 >  between cpu_setup() on the newly hatched CPU and delay(200000) in
 >  cpu_spinup(), where we just wait a fixed time and then bail out if the
 >  other CPU isn't ready. This would explain why the 2nd CPU still wakes
 >  up later on but then things go south because we didn't install the IPI
 >  handler.
 >  
 >  The following patch adds a period of polling h->hatch_running for a
 >  while instead of just blindly increasing the delay().
 >  
 >  Index: cpu_subr.c
 >  ===================================================================
 >  RCS file: /cvsroot/src/sys/arch/powerpc/oea/cpu_subr.c,v
 >  retrieving revision 1.99
 >  diff -u -w -r1.99 cpu_subr.c
 >  --- cpu_subr.c  6 Feb 2019 07:32:50 -0000       1.99
 >  +++ cpu_subr.c  25 Jul 2019 17:03:14 -0000
 >  @@ -1381,6 +1385,17 @@
 >          __asm volatile ("dcbst 0,%0"::"r"(&h->hatch_running):"memory");
 >          __asm volatile ("sync; isync");
 >   #endif
 >  +       int hatch_bail = 0;
 >  +       while ((h->hatch_running < 1) && (hatch_bail < 100000)) {
 >  +               delay(1);
 >  +               hatch_bail++;
 >  +#ifdef CACHE_PROTO_MEI
 >  +               __asm volatile ("dcbi 0,%0"::"r"(&h->hatch_running):"memory");
 >  +               __asm volatile ("sync; isync");
 >  +               __asm volatile ("dcbst 0,%0"::"r"(&h->hatch_running):"memory");
 >  +               __asm volatile ("sync; isync");
 >  +#endif
 >  +       }
 >          if (h->hatch_running < 1) {
 >   #ifdef CACHE_PROTO_MEI
 >                  __asm volatile ("dcbi 0,%0"::"r"(&cpu_spinstart_ack):"memory");
 
 Hi Michael,
 
 Your patch fixes the initial panic: the kernel now continues booting
 without dropping into the debugger, and it doesn't "spontaneously"
 reboot at any point. The debug and non-debug MP kernels now behave
 identically: both hang right after USB devices are recognized, with
 no output to indicate why. (My debug kernel includes the DEBUG,
 DIAGNOSTIC, and LOCKDEBUG options.)
 
 Separately, I've now hit an occasion where 8.1_STABLE without debugging
 options also hangs, which I hadn't seen before. It gets farther than
 -current: it mounts file systems and tries /sbin/init, and hangs there.
 I'm going to try some more tests with it and see what happens.
 
 Thanks,
 
 Dave
 
 

