Subject: port-alpha/25599: Alpha SMP system hangs hard
To: None <gnats-bugs@gnats.netbsd.org>
From: None <mhitch@netbsd.org>
List: netbsd-bugs
Date: 05/16/2004 13:23:10
>Number:         25599
>Category:       port-alpha
>Synopsis:       Alpha SMP system hangs hard
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-alpha-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun May 16 19:24:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator:     Michael L. Hitch
>Release:        NetBSD 1.6.1
>Organization:
Michael L. Hitch                        mhitch@montana.edu
Operations Consulting,  Information Technology Center
Montana State University, Bozeman, MT     USA
>Environment:
	
	
System: NetBSD netbsd2.msu.montana.edu 1.6.1 NetBSD 1.6.1 (CS20.MP) #5: Sat Jun 28 21:49:35 MDT 2003 mhitch@netbsd2.msu.montana.edu:/usr/NetBSD-1.6.1/obj/alphaev56/.alpha/sys/arch/alpha/compile/CS20.MP alpha
Architecture: alpha
Machine: alpha
>Description:

An Alpha system running an SMP kernel will hang hard when doing 'lots'
of I/O.  A halt or reset is required to recover.

I began experiencing this shortly after I got my API CS20 and started
running builds using both CPUs.  It consistently hung during the kernel
linking phase, when disk I/O was heavy.  Removing a patch I had applied
to allow my disk drive to negotiate synch/wide transfers seemed to
alleviate the situation, but I would still see a hang every few weeks or
months.
Several other people have reported similar hangs when doing heavy disk
I/O or NFS activity.

At first, my only recourse was to cycle power on my CS20.  After one
hang, I finally located the halt switch (it sits behind a small hole in
the front panel near one of the fans and is not easily noticeable with
the front bezel in place).  The first time I halted the system, all I
got was the PC from the halt message and the register dump from SRM.
The PC and register contents seemed to indicate that CPU 0 was spinning
on a PMAP_LOCK().  Continuing from SRM resulted in a panic and an
attempt to sync the disks and dump; the system then either hung on the
disk sync, or failed the dump and rebooted.

Once I figured out the kernel had been configured to disable DDB entry
on panic, I was able to get into ddb after the halt and poke around in
the kernel. From the stack traceback and the PC at the halt, it appeared
that CPU 0 was spinning trying to acquire a lock on sched_lock.

Next I started running a kernel with LOCKDEBUG enabled in the hopes it
would say something, but it still hung without any console messages to
provide any clues.  I had also started with a swap partition exactly the
size of memory (yes, I did know better - I don't know why I made that
choice), and finally bit the bullet and repartitioned the disk so I had
some hope of getting a dump. I had a program that would hang the system
easily, but it was very disk I/O intensive and usually left the SCSI
driver in a state that disk syncs and the dump would not work.  I then
just let the system run normally and eventually got a hang that I could
get a dump from after halting the system and continuing.

Now that I finally had a dump file, I could look at the dump with gdb. 
It quickly became apparent I really needed a kernel build with -g, so
yet another iteration....

Now I was able to get several dumps after a hang to analyze.  Each time,
CPU 0 appeared to be spinning trying to acquire sched_lock.  As I
learned more about the information available with LOCKDEBUG, I
determined that CPU 1 held the sched_lock.  I also determined that CPU 1
had acquired the lock from the idle loop.  Unfortunately, I was unable
to determine what CPU 1 was doing at the time of the hangs.  The dump
routine attempted to halt CPU 1, but couldn't do it and couldn't get the
current state of CPU 1.  I then found out yet more information available
with LOCKDEBUG:  the number of locks held by each cpu, and the list of
simple locks held.  From this I was able to determine that CPU 0 held a
lock on the kernel pmap, and CPU 1 held sched_lock.  Looking at how the
idle loop works, it appeared that when a process gets selected to run,
pmap_activate() gets called, and pmap_activate() locks the process's
pmap when mucking with it.

Now, at long last, I was getting somewhere!  Once I remembered that the
kernel 'thread' processes run using the kernel's pmap, this hang began
to make sense.  My theory was that CPU 1 was sitting in the idle loop
waiting for something to do.  In the idle loop, it does an splhigh(),
acquires a lock on sched_lock, and checks for a process ready to run.
If none is ready, it releases sched_lock, does an splx(), and starts
all over again.  Meanwhile, CPU 0 is busy with another process.  It does
something that requires locking the kernel pmap, and takes that lock.
Now CPU 0 gets an I/O interrupt and needs to wake up another process.
In order to do that, it needs to lock sched_lock.  However, at that
point, CPU 1 has found a kernel thread process that is ready to run, has
already acquired the lock on sched_lock, and is trying to acquire the
lock on the kernel pmap - which is held by CPU 0.  Deadlock!!!  CPU 1 is
running at splhigh(), and CPU 0 is running at splsched() - both of which
appear to block any console interrupts on the alpha, so the machine is
non-responsive.
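
The lock ordering in this theory can be replayed in a small stand-alone
model.  This is only a sketch under stated assumptions, not NetBSD code:
the names model_acquire(), model_deadlocked() and replay_hang() are
hypothetical, interrupts and spl levels are not modelled, and the two
"CPUs" are just indices stepped through the interleaving described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model for illustration only; none of this is NetBSD code. */
enum { LOCK_SCHED, LOCK_KPMAP, NLOCKS };
enum { NCPUS = 2, NONE = -1 };

struct model {
	int owner[NLOCKS];	/* CPU holding each lock, or NONE */
	int spinning[NCPUS];	/* lock each CPU is spinning on, or NONE */
};

static void
model_init(struct model *m)
{
	for (int i = 0; i < NLOCKS; i++)
		m->owner[i] = NONE;
	for (int i = 0; i < NCPUS; i++)
		m->spinning[i] = NONE;
}

/* Take the lock if it is free; otherwise record that the CPU spins on it. */
static bool
model_acquire(struct model *m, int cpu, int lock)
{
	assert(cpu >= 0 && cpu < NCPUS && lock >= 0 && lock < NLOCKS);
	if (m->owner[lock] == NONE) {
		m->owner[lock] = cpu;
		m->spinning[cpu] = NONE;
		return true;
	}
	m->spinning[cpu] = lock;
	return false;
}

/* With two CPUs, a deadlock means each spins on a lock the other holds. */
static bool
model_deadlocked(const struct model *m)
{
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		int lock = m->spinning[cpu];
		if (lock == NONE || m->owner[lock] == cpu)
			return false;
	}
	return true;
}

/* Replay the suspected interleaving; returns true if it ends in deadlock. */
static bool
replay_hang(bool skip_kernel_pmap_lock)
{
	struct model m;

	model_init(&m);
	/* CPU 0 locks the kernel pmap while working on its process. */
	model_acquire(&m, 0, LOCK_KPMAP);
	/* CPU 1, in the idle loop, takes sched_lock and finds a kernel thread. */
	model_acquire(&m, 1, LOCK_SCHED);
	/* CPU 0 takes an I/O interrupt and wants sched_lock for the wakeup. */
	model_acquire(&m, 0, LOCK_SCHED);
	/* CPU 1 activates the kernel pmap, locking it unless told to skip. */
	if (!skip_kernel_pmap_lock)
		model_acquire(&m, 1, LOCK_KPMAP);
	return model_deadlocked(&m);
}
```

In this model, replay_hang(false) ends with each CPU spinning on the
lock the other holds, while replay_hang(true), which leaves out the
kernel pmap lock on the activating CPU, does not.  The model says
nothing about whether skipping that lock is safe in the real pmap code.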

In order to test this theory, I applied the following patch.  The patch
just skips the PMAP_LOCK() in pmap_activate() if the pmap is the kernel
pmap.  I don't know enough about the alpha pmap code to know what
problems this could result in, but it couldn't be much worse than a hard
hang.

Index: pmap.c
===================================================================
RCS file: /cvsroot/src/sys/arch/alpha/alpha/pmap.c,v
retrieving revision 1.191.8.1
diff -u -r1.191.8.1 pmap.c
--- pmap.c	2002/11/24 15:38:39	1.191.8.1
+++ pmap.c	2003/08/30 19:04:02
@@ -2246,6 +2246,9 @@
 		printf("pmap_activate(%p)\n", p);
 #endif
 
+#ifdef MULTIPROCESSOR		/* MP deadlock debug */
+if (pmap != pmap_kernel())	/* MP deadlock debug */
+#endif				/* MP deadlock debug */
 	PMAP_LOCK(pmap);
 
 	/*
@@ -2260,6 +2263,9 @@
 
 	PMAP_ACTIVATE(pmap, p, cpu_id);
 
+#ifdef MULTIPROCESSOR		/* MP deadlock debug */
+if (pmap != pmap_kernel())	/* MP deadlock debug */
+#endif				/* MP deadlock debug */
 	PMAP_UNLOCK(pmap);
 }
 
>How-To-Repeat:
Build an alpha MULTIPROCESSOR kernel and run it for a while on a busy system
(lots of disk I/O or NFS activity).  Eventually it will hang hard and require
a halt or reset.

>Fix:

If the pmap locking is not really required in pmap_activate() when it's the
kernel pmap, the previous patch should be sufficient.

Otherwise, I don't know enough about the alpha pmap code to know how to fix it.

>Release-Note:
>Audit-Trail:
>Unformatted: