NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

port-amd64/58660: ddb(4) uses IPI instead of NMI to save other CPU registers



>Number:         58660
>Category:       port-amd64
>Synopsis:       ddb(4) uses IPI instead of NMI to save other CPU registers
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-amd64-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Sep 02 16:55:00 +0000 2024
>Originator:     Taylor R Campbell
>Release:        current, 10, 9, ...
>Organization:
The NetBSD Fo
>Environment:
>Description:
When the kernel enters ddb on one CPU, it asks all the other CPUs to save a trap frame for diagnostics and suspend themselves.

    231 #ifdef MULTIPROCESSOR
    232 	if (!db_suspend_others()) {
    233 		ddb_suspend(regs);
    234 	} else {
    235 	curcpu()->ci_ddb_regs = &dbreg;
    236 	ddb_regp = &dbreg;
    237 #endif

https://nxr.netbsd.org/xref/src/sys/arch/amd64/amd64/db_interface.c?r=1.42#231

(Side note: That indentation is bad and should be fixed.)

    286 void
    287 ddb_ipi(struct trapframe frame)
    288 {
    289 
    290 	ddb_suspend(&frame);
    291 }
    292 
    293 static void
    294 ddb_suspend(struct trapframe *frame)
    295 {
    296 	volatile struct cpu_info *ci = curcpu();
    297 	db_regs_t regs;
    298 
    299 	regs = *frame;
    300 
    301 	ci->ci_ddb_regs = &regs;
    302 
    303 	atomic_or_32(&ci->ci_flags, CPUF_PAUSE);
    304 
    305 	while (ci->ci_flags & CPUF_PAUSE)
    306 		x86_pause();
    307 	ci->ci_ddb_regs = 0;
    308 	tlbflushg();
    309 }

https://nxr.netbsd.org/xref/src/sys/arch/amd64/amd64/db_interface.c?r=1.42#286

Currently, it does this by sending an IPI (interprocessor interrupt), either via i82489 or via x2apic, to the other CPUs:

    136 static int
    137 db_suspend_others(void)
    138 {
    139 	int cpu_me = cpu_number();
    140 	int win;
    141 
    142 #ifndef XENPV
    143 	if (ddb_vec == 0)
    144 		return 1;
    145 #endif /* XENPV */
    146 
    147 	__cpu_simple_lock(&db_lock);
    148 	if (ddb_cpu == NOCPU)
    149 		ddb_cpu = cpu_me;
    150 	win = (ddb_cpu == cpu_me);
    151 	__cpu_simple_unlock(&db_lock);
    152 	if (win) {
    153 #ifdef XENPV
    154 		xen_broadcast_ipi(XEN_IPI_DDB);
    155 #else
    156 #if NLAPIC > 0
    157 		if (ddb_vec != 0)
    158 			x86_ipi(ddb_vec, LAPIC_DEST_ALLEXCL,
    159 			    LAPIC_DLMODE_FIXED);

https://nxr.netbsd.org/xref/src/sys/arch/amd64/amd64/db_interface.c?r=1.42#136

    124 static int i82489_ipi(int vec, int target, int dl);
    125 static int x2apic_ipi(int vec, int target, int dl);
    126 int (*x86_ipi)(int, int, int) = i82489_ipi;
...
    331 	if (x2apic_mode) {
    332 		x86_ipi = x2apic_ipi;

https://nxr.netbsd.org/xref/src/sys/arch/x86/x86/lapic.c?r=1.90#124

However, ordinary IPIs can be blocked at any time, whereas NMIs are only blocked when one has been delivered before the next IRET instruction.  We should change, or allow, ddb to use NMIs to trigger the other-CPU save/suspend logic.
>How-To-Repeat:
Get a CPU stuck in a loop with hardware interrupts masked, wait for a heartbeat panic, and try to examine another CPU's trap frame, only to find there's nothing there.

(See also PR bin/58010: crash(8) doesn't support `mach cpu N' to examine registers/stack of other CPUs, https://gnats.NetBSD.org/58010)
>Fix:
Yes, please!



Home | Main Index | Thread Index | Old Index