Subject: kern/28541: mi_switch() can deadlock on biglock
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Manuel Bouyer <Manuel.Bouyer@lip6.fr>
List: netbsd-bugs
Date: 12/05/2004 20:00:01
>Number: 28541
>Category: kern
>Synopsis: mi_switch() can deadlock on biglock
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Dec 05 20:00:00 +0000 2004
>Originator: Manuel Bouyer
>Release: NetBSD 2.0_RC5
>Organization:
ASIM/LIP6 http://www-asim.lip6.fr/
>Environment:
System: NetBSD 2.0_RC5 (RAI.MP) #0: Wed Nov 24 17:41:46 CET 2004 bouyer@pop.lip6.fr:/local/pop1/bouyer/netbsd-2-0/src/sys/arch/i386/compile/RAI.MP
Architecture: i386
Machine: i386
>Description:
[initially posted on tech-smp and tech-kern]
This SMP box reliably panics while running an amanda backup, with:
panic: TLB IPI rendezvous failed (mask 1)
I have another SMP box (same hardware) with a similar workload, which
is working fine. The difference between the two is that this one has
two 8-port puc devices for serial consoles (some of which get a lot
of activity) and it is an amanda client.
The stack traces show:
CPU 1 (the one that panicked):
panic
pmap_tlb_shootdown
pmap_kremove
pipe_direct_write
pipe_write
dofilewrite
sys_write
syscall_plain
CPU 0:
acquire
spinlock_acquire_count
mi_switch
ltsleep
sbwait
soreceive
soo_read
dofileread
sys_read
syscall_plain
CPU0 is trying to reacquire kernel_lock while CPU1 holds it and is
trying to send an IPI to CPU0.
However, I don't see how this would prevent CPU0 from receiving the IPI.
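
The interaction seen in the two traces can be modelled with a small toy
simulation. This is only a sketch under a stated assumption, not a diagnosis:
it assumes CPU0 spins on kernel_lock with the IPI masked (for example at a
raised interrupt priority level), which this report does not establish. The
names simulate, cpu0_masks_ipi and max_spins are hypothetical and do not
correspond to anything in the kernel source.

```python
def simulate(cpu0_masks_ipi: bool, max_spins: int = 1000) -> str:
    """Toy model of the two CPUs in the traces above.

    CPU1 holds kernel_lock and waits for CPU0 to acknowledge an IPI.
    CPU0 spins on kernel_lock. Returns "ok" if the rendezvous completes,
    or "rendezvous failed" if CPU1 gives up waiting.
    """
    kernel_lock_owner = "cpu1"   # CPU1 is in the write path and owns the lock
    ipi_pending_on_cpu0 = True   # CPU1 has just sent the IPI to CPU0

    for _ in range(max_spins):
        # CPU0: one iteration of its spin loop on kernel_lock.
        # ASSUMPTION: if the IPI is masked while spinning, the handler
        # never runs and the pending flag is never cleared.
        if ipi_pending_on_cpu0 and not cpu0_masks_ipi:
            ipi_pending_on_cpu0 = False  # handler ran, IPI acknowledged

        # CPU1: check whether CPU0 has acknowledged the IPI.
        if not ipi_pending_on_cpu0:
            kernel_lock_owner = None     # CPU1 finishes and drops the lock
            kernel_lock_owner = "cpu0"   # CPU0's spin now succeeds
            return "ok"

    # CPU1 ran out of patience: this corresponds to the panic on CPU1.
    return "rendezvous failed"


if __name__ == "__main__":
    print("IPI deliverable while spinning:", simulate(cpu0_masks_ipi=False))
    print("IPI masked while spinning:     ", simulate(cpu0_masks_ipi=True))
```

In this model the rendezvous only fails when CPU0 cannot take the IPI while
it spins, so the open question is whether mi_switch() reaches the lock
acquisition in such a state.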
>How-To-Repeat:
Run several mrtg instances, and an amanda client on a dual-CPU box.
>Fix:
unknown.