Port-sparc64 archive


Re: Pausing/resuming CPU's in DDB



On Thu, 23 Feb 2012, Martin Husemann wrote:

> On Thu, Feb 23, 2012 at 10:57:47PM +0900, Takeshi Nakayama wrote:
> > How about increase the retry count in sparc64_send_ipi() ?
> > 
> > Ours is now 1000, but FreeBSD is 5000, OpenBSD is 10000.
> 
> Do both, or scale the retry count on cpu speed ...

Looking at the code....

Increasing the number of retries in sparc64_send_ipi() is unlikely to help 
the situation, since you should only be able to exit that routine in one of 
two ways: either it thinks sending the IPI was successful, or it goes 
through this code:

        if (panicstr == NULL)
                panic("cpu%d: ipi_send: couldn't send ipi to UPAID %u"
                        " (tried %d times)", cpu_number(), upaid, i);

Are you getting a panic?  If not, then increasing the loop count won't 
help.
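
To illustrate the point, here is a minimal user-space sketch of a bounded 
retry loop with the same two exits. It is not the actual 
sparc64_send_ipi(); try_dispatch(), send_ipi_sketch() and the SEND_* 
values are hypothetical names, and the hardware dispatch is replaced by a 
counter so the behaviour can be checked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one hardware dispatch attempt.
 * It succeeds on attempt number succeed_on (0 means it never succeeds). */
static int succeed_on;
static int attempts;

static bool try_dispatch(void)
{
    attempts++;
    return succeed_on != 0 && attempts >= succeed_on;
}

enum send_result { SEND_OK, SEND_PANIC };

/* Bounded retry loop: the only exits are "believed delivered" or
 * "retries exhausted" (where the kernel code would call panic()). */
static enum send_result send_ipi_sketch(int max_retries)
{
    assert(max_retries > 0);

    for (int i = 0; i < max_retries; i++) {
        if (try_dispatch())
            return SEND_OK;
    }
    return SEND_PANIC;
}
```

If a call returns SEND_OK within the current limit, a larger max_retries 
gives exactly the same result; the limit only matters on the path that 
would otherwise end in the panic.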


> If it only fails for ddb enter/exit the loop is fine, IMHO - but as we have
> seen other reports of ipi sending failure during normal operation, we should
> add the instrumentation Eduardo suggested and find out where we are blocked
> out that long (but this is mostly orthogonal to the topic at hand).

Also, instead of always sending the IPI to all the cpus, I would recommend 
that the next iteration of your patch update the cpuset by removing the 
processors that have already halted.
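
A rough sketch of that idea, as a user-space simulation rather than kernel 
code: keep a set of cpus still pending, send only to those, and drop each 
cpu from the set once it reports halted. Everything here is an assumption 
made for illustration: cpuset_sketch_t is a plain 64-bit mask standing in 
for the kernel cpuset, and sim_send(), sim_halted() and 
pause_cpus_sketch() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NCPU 64

typedef uint64_t cpuset_sketch_t;   /* hypothetical: one bit per cpu */

/* Simulation state: how many IPIs each cpu needs before it halts,
 * and how many it has received so far. */
static int sim_needed[SKETCH_NCPU];
static int sim_received[SKETCH_NCPU];

/* Hypothetical stand-in for sending one pause IPI to a cpu. */
static void sim_send(int cpu)
{
    sim_received[cpu]++;
}

/* Hypothetical stand-in for reading which cpus have reported halted. */
static cpuset_sketch_t sim_halted(void)
{
    cpuset_sketch_t set = 0;

    for (int cpu = 0; cpu < SKETCH_NCPU; cpu++)
        if (sim_needed[cpu] > 0 && sim_received[cpu] >= sim_needed[cpu])
            set |= UINT64_C(1) << cpu;
    return set;
}

/* Send to the pending cpus only, then remove the ones that halted.
 * Returns the number of IPIs sent; *left gets the cpus still running. */
static int pause_cpus_sketch(cpuset_sketch_t targets, int max_rounds,
    cpuset_sketch_t *left)
{
    cpuset_sketch_t pending = targets;
    int sends = 0;

    assert(left != NULL);

    for (int round = 0; round < max_rounds && pending != 0; round++) {
        for (int cpu = 0; cpu < SKETCH_NCPU; cpu++) {
            if (pending & (UINT64_C(1) << cpu)) {
                sim_send(cpu);
                sends++;
            }
        }
        pending &= ~sim_halted();   /* drop cpus that have halted */
    }
    *left = pending;
    return sends;
}
```

In this simulation, if cpus 1 to 3 are targeted and only cpu 2 needs a 
second IPI, the loop sends four IPIs in total instead of the six that 
resending to the whole set would take.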

Eduardo

