Port-amd64 archive


Re: NMI not working as expected on Dell 2850



Tom Ivar Helbekkmo <tih%hamartun.priv.no@localhost> writes:

> In single CPU mode (boot -1), the NMI button will drop the machine into
> the debugger, but only once.  If I "continue" from the debugger, the NMI
> button will no longer work afterwards.

The same thing happens when I use ipmitool on another host to tell the
BMC on this one to pulse an NMI to the CPUs.  It'll work once, and then
nothing happens on subsequent attempts.  Breakpoints and single stepping
work as they should, though, dropping nicely back into the debugger.

> In SMP mode, the NMI button will, again, drop the machine into the
> debugger, but two of the CPUs will do so, the text announcing this
> being printed by both, and intermingled on the serial console.

This only looked wrong: the real problem is that the announcement is
printed by every CPU, instead of only by the one that gets there first
and ends up running the debugger interface.  I moved the printf() into
the code path run only by the winning CPU, and that cleaned it up.
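
For readers who want to see the "only the winner announces" idea in
isolation, here is a minimal userland sketch.  It is not the kernel
code: try_claim_debugger(), nmi_entry() and simulate_nmi() are
hypothetical names, and the compare-and-swap merely stands in for
whatever db_suspend_others() does internally.  The CPUs are simulated
by sequential calls; the atomic claim is what would keep the election
safe if they really arrived concurrently.

```c
#include <stdatomic.h>
#include <stdio.h>

#define NO_OWNER (-1)

/* Which simulated CPU owns the debugger; NO_OWNER if none does. */
static atomic_int debugger_owner = NO_OWNER;

/* How many times the announcement was printed. */
static int announcements;

/*
 * Hypothetical stand-in for the election: returns nonzero only for
 * the first CPU that manages to claim the debugger.
 */
static int
try_claim_debugger(int cpu)
{
	int expected = NO_OWNER;

	return atomic_compare_exchange_strong(&debugger_owner, &expected, cpu);
}

/* What each simulated CPU does when the NMI arrives. */
static void
nmi_entry(int cpu)
{
	if (!try_claim_debugger(cpu)) {
		/* Loser: a real kernel would park here until resumed. */
		return;
	}

	/* Winner only: announce once, then run the debugger. */
	printf("NMI received; going to debugger (cpu %d)\n", cpu);
	announcements++;
}

/* Deliver a simulated NMI to ncpu CPUs; returns the announcement count. */
int
simulate_nmi(int ncpu)
{
	atomic_store(&debugger_owner, NO_OWNER);
	announcements = 0;
	for (int cpu = 0; cpu < ncpu; cpu++)
		nmi_entry(cpu);
	return announcements;
}
```

Calling simulate_nmi(4) from a main() prints the announcement a single
time, however many CPUs take the NMI, because the printf() sits on the
winner's path only.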

> If I "continue" from the debugger, the machine will hang, and a forced
> power cycle is then the only way out.

I still don't understand this bit.  There are obvious errors in the
kdb_trap() function in sys/arch/amd64/amd64/db_interface.c, inherited
from the i386 version, and possibly originally caused by the visual
confusion of the body of an if() block not being indented.  Code is
executed by the other CPUs that should only be run by the one running
the debugger.  However, fixing that didn't solve the hang.  I suspect
that the function calls in question could just as well be deleted.
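
The kind of mistake described above can be illustrated outside the
kernel.  The following is only a sketch of the pattern, assuming the
problem is a statement that ends up after the closing brace of an
unindented else block; it is not a copy of the original
db_interface.c, and debugger_only_cleanup(), trap_buggy(),
trap_fixed() and count_cleanups() are hypothetical names.

```c
#include <stdio.h>

/* Counts how often the debugger-only step was executed. */
static int cleanup_calls;

/* Hypothetical stand-in for work only the debugger CPU should do. */
static void
debugger_only_cleanup(void)
{
	cleanup_calls++;
}

/* Buggy shape: the unindented body hides where the else block ends. */
static void
trap_buggy(int is_winner)
{
	if (!is_winner) {
		/* parked CPU: would wait here to be resumed */
	} else {
	/* body deliberately not indented, to mimic the pitfall */
	printf("running debugger\n");
	}
	debugger_only_cleanup();	/* outside the block: every CPU runs it */
}

/* Fixed shape: indentation matches scope, cleanup is winner-only. */
static void
trap_fixed(int is_winner)
{
	if (!is_winner) {
		/* parked CPU: would wait here to be resumed */
	} else {
		printf("running debugger\n");
		debugger_only_cleanup();
	}
}

/* Run trap() once per simulated CPU; CPU 0 is the winner. */
int
count_cleanups(void (*trap)(int), int ncpu)
{
	cleanup_calls = 0;
	for (int cpu = 0; cpu < ncpu; cpu++)
		trap(cpu == 0);
	return cleanup_calls;
}
```

With four simulated CPUs, count_cleanups(trap_buggy, 4) counts the
cleanup on every CPU, while count_cleanups(trap_fixed, 4) counts it
once, on the winner.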

If someone with NetBSD/amd64 on a multiprocessor machine that isn't a
Dell PowerEdge server would check whether NMI handling works properly
for them, including being able to "continue" and then drop into the
debugger again on a new NMI, I'd much appreciate it.

Oh, and here's my cleaned up version of kdb_trap():

int
kdb_trap(int type, int code, db_regs_t *regs)
{
	int s;
	db_regs_t dbreg;

	switch (type) {
	case T_NMI:	/* NMI */
	case T_BPTFLT:	/* breakpoint */
	case T_TRCTRAP:	/* single_step */
	case -1:	/* keyboard interrupt */
		break;
	default:
		if (!db_onpanic && db_recover == 0)
			return (0);

		kdbprinttrap(type, code);
		if (db_recover != 0) {
			db_error("Faulted in DDB; continuing...\n");
			/*NOTREACHED*/
		}
	}

#ifdef MULTIPROCESSOR
	if (!db_suspend_others()) {
		ddb_suspend(regs);
	} else {
		curcpu()->ci_ddb_regs = &dbreg;
		ddb_regp = &dbreg;
#endif
		if (type == T_NMI)
			printf("NMI received; going to debugger\n");

		ddb_regs = *regs;
		ddb_regs.tf_cs &= 0xffff;
		ddb_regs.tf_ds &= 0xffff;
		ddb_regs.tf_es &= 0xffff;
		ddb_regs.tf_fs &= 0xffff;
		ddb_regs.tf_gs &= 0xffff;
		ddb_regs.tf_ss &= 0xffff;

		s = splhigh();
		db_active++;
		cnpollc(true);
		db_trap(type, code);
		cnpollc(false);
		db_active--;
		splx(s);

		*regs = ddb_regs;

#ifdef MULTIPROCESSOR
		ddb_regp = &dbreg;
		db_resume_others();
	}
#endif
	return (1);
}

-tih
-- 
Popularity is the hallmark of mediocrity.  --Niles Crane, "Frasier"

