Re: Pausing/resuming CPU's in DDB

To: Martin Husemann <martin%duskware.de@localhost>
Subject: Re: Pausing/resuming CPU's in DDB
From: Eduardo Horvath <eeh%NetBSD.org@localhost>
Date: Thu, 23 Feb 2012 17:09:48 +0000 (UTC)

On Thu, 23 Feb 2012, Martin Husemann wrote:

> On Thu, Feb 23, 2012 at 10:57:47PM +0900, Takeshi Nakayama wrote:
> > How about increase the retry count in sparc64_send_ipi() ?
> > 
> > Ours is now 1000, but FreeBSD is 5000, OpenBSD is 10000.
> 
> Do both, or scale the retry count on cpu speed ...

Looking at the code....

Increasing the number of retries in sparc64_send_ipi() is unlikely to help 
the situation, since you should only be able to exit that routine in one of 
two ways: either it thinks sending the IPI was successful, or it goes 
through this code:

        if (panicstr == NULL)
                panic("cpu%d: ipi_send: couldn't send ipi to UPAID %u"
                        " (tried %d times)", cpu_number(), upaid, i);

Are you getting a panic?  If not, then increasing the loop count won't 
help.
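
As a rough illustration of that control flow (a simplified sketch, not 
the actual sparc64_send_ipi() source; IPI_RETRIES, try_dispatch() and 
send_ipi_sketch() are hypothetical names, and the panic is modelled as a 
-1 return so the sketch can run in userland), a bounded retry loop has 
exactly two exits:

```c
#include <stdbool.h>

#define IPI_RETRIES 1000	/* hypothetical; mirrors the count quoted above */

/*
 * Hypothetical stand-in for the hardware dispatch: reports success from
 * attempt number `ok_at` onwards, or never if `ok_at` is negative.
 */
static bool
try_dispatch(int attempt, int ok_at)
{
	return ok_at >= 0 && attempt >= ok_at;
}

/*
 * Returns the number of attempts used on success, or -1 at the point
 * where the real routine would panic.
 */
static int
send_ipi_sketch(int ok_at)
{
	for (int i = 0; i < IPI_RETRIES; i++) {
		if (try_dispatch(i, ok_at))
			return i + 1;	/* exit 1: dispatch reported success */
	}
	return -1;			/* exit 2: retries exhausted (panic path) */
}
```

In this model a larger IPI_RETRIES only changes the outcome for callers 
that currently reach the second exit, which is the point above: without a 
panic, the loop is not what is failing.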


> If it only fails for ddb enter/exit the loop is fine, IMHO - but as we have
> seen other reports of ipi sending failure during normal operation, we should
> add the instrumentation Eduardo suggested and find out where we are blocked
> out that long (but this is mostly orthogonal to the topic at hand).

Also, for the next iteration of your patch, I would recommend updating the 
cpuset by removing the processors that have halted, instead of always 
sending the IPI to all the cpus.
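
A minimal sketch of that idea, assuming a plain 64-bit mask with one bit 
per CPU rather than the kernel's real cpuset type (cpumask_t, 
cpumask_remove_halted() and ipis_for_round() are hypothetical names used 
only for illustration):

```c
#include <stdint.h>

typedef uint64_t cpumask_t;	/* hypothetical: one bit per CPU, up to 64 */

/* Drop CPUs that have already halted from the set still to be signalled. */
static cpumask_t
cpumask_remove_halted(cpumask_t pending, cpumask_t halted)
{
	return pending & ~halted;
}

/* Count the IPIs one round would send: one per CPU still pending. */
static int
ipis_for_round(cpumask_t pending)
{
	int n = 0;

	/* Clear the lowest set bit each pass until none remain. */
	for (; pending != 0; pending &= pending - 1)
		n++;
	return n;
}
```

With four target CPUs (mask 0xF) of which CPUs 0 and 2 have halted (mask 
0x5), the next round would signal only CPUs 1 and 3, so each retry sends 
fewer IPIs and the loop can stop once the pending mask is empty.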

Eduardo

Follow-Ups:
- Re: Pausing/resuming CPU's in DDB
  - From: Julian Coleman

References:
- Pausing/resuming CPU's in DDB
  - From: Julian Coleman
- Re: Pausing/resuming CPU's in DDB
  - From: Takeshi Nakayama
- Re: Pausing/resuming CPU's in DDB
  - From: Martin Husemann
