Subject: Re: SMP stability issues
To: Chris Rendle-Short <jim@tty1.rr.nu>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-smp
Date: 11/12/2006 00:11:39
On Fri, Nov 10, 2006 at 07:27:22PM +1100, Chris Rendle-Short wrote:
> Hi,
> 
> For the last couple of months I've been running NetBSD 3.0.1 and 3.1 (since yesterday) on an Abit VP6 SMP motherboard with two P3 866's. The system is mainly used as a mail, web, and Samba server, along with occasional other odd tasks.
> 
> When I run off the GENERIC kernel, the machine is rock solid stable. However, when I use either GENERIC.MP or my own kernel (which is basically GENERIC.MP with pcmcia and sound support removed), it invariably locks up after a time running. It is a hard lockup, nothing will revive it other than hitting the reset switch.
> 
> The uptime before the lockup has so far varied between about 1 hour and 6 days. There doesn't seem to be any pattern to it, other than the fact that it only happens when running an SMP kernel. I can't find anything in the logs to give any clues.
> 
> I'm pretty sure it's not a hardware fault, as I've tested everything I can think of. Added to that, prior to running NetBSD the box ran Linux (in SMP mode) without any problems (uptime was 193 days when I took it down to install NetBSD). The root filesystem is on RAIDFrame, if it makes any difference.
> 
> Does anyone have any ideas about what could be causing this, or any troubleshooting clues? Needless to say, it's a very irritating problem.

What chipset does this motherboard have? Can you post the dmesg output?

Also, you could try building a kernel with the DIAGNOSTIC, DEBUG and
LOCKDEBUG options. A hard hang like that could be a deadlock in the kernel;
one of these options may help to find out what's going on.
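
As a rough sketch only (the config name DEBUGMP is made up for illustration,
and I'm assuming an i386 source tree under /usr/src with GENERIC.MP as the
base), such a kernel config could look something like this:

    # /usr/src/sys/arch/i386/conf/DEBUGMP  (hypothetical file name)
    include "arch/i386/conf/GENERIC.MP"

    options 	DIAGNOSTIC	# extra kernel consistency checks
    options 	DEBUG		# more expensive debugging checks
    options 	LOCKDEBUG	# lock sanity checks, useful for deadlocks

and would then be built the usual way:

    cd /usr/src/sys/arch/i386/conf
    config DEBUGMP
    cd ../compile/DEBUGMP
    make depend && make

If config complains that one of the options is already defined by the
included file, just drop that line. Expect the resulting kernel to be
noticeably slower than a normal one.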

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference