Subject: Re: Continued problems with SMP on NetBSD/alpha
To: Tonnerre LOMBARD <tonnerre@thebsh.sygroup.ch>
From: Greg A. Woods <woods@weird.com>
List: port-alpha
Date: 12/18/2006 13:21:28

At Mon, 18 Dec 2006 07:56:18 +0100,
Tonnerre LOMBARD wrote:
>
> On Mon, Dec 18, 2006 at 12:13:46AM +0100, Hubert Feyrer wrote:
> > To make any remarks, some hard data is needed: panic message, stack
> > backtrace, maybe other things.
>
> None. That's my problem. However, last time it crashed I got an error

Note that to many of us the word "crash" implies a kernel panic
happened, and thus you would be given a core dump to the dump device, or
dropped into the kernel debugger (DDB), depending on how your kernel was
compiled and how it is configured (sysctl ddb.onpanic).

I suspect what you are seeing is a complete machine hang?  Is that
right?  I.e. everything comes to a grinding halt from all external
appearances and the only way you've been able to get it going again is
to push the reset button and reboot (or the halt button and get to the
SRM prompt)?

If you have DDB in your kernel ("options DDB"), and you have the sysctl
ddb.onpanic and ddb.fromconsole settings both turned on (i.e. equal to
one; ddb.onpanic can be set by default with "options DDB_ONPANIC=1"),
then the question is whether or not you can force the kernel into the
debugger.  On a serial console, send a BREAK signal (as long as your
Alpha doesn't have an RMC lights-out controller that intercepts such
signals); on a graphics console, hit the <Ctrl><Alt><ESC> keys on the
keyboard.  If that works you'd be able to provide the necessary
backtrace ("trace") and perhaps other such information.
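
As a rough sketch of the two sysctl settings mentioned above, assuming
they are to be applied at every boot through /etc/sysctl.conf (the
choice of that file is an assumption about a typical setup, not
something stated in the original report):

```
# /etc/sysctl.conf (assumed location for persistent settings)
# Drop into DDB when the kernel panics.
ddb.onpanic=1
# Allow entering DDB from the console (BREAK or <Ctrl><Alt><ESC>).
ddb.fromconsole=1
```

The same values can be set on a running system with "sysctl -w", e.g.
"sysctl -w ddb.fromconsole=1".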

Since you are having trouble with what you suspect are SMP issues, you
definitely also _need_ to compile your kernel with the LOCKDEBUG option
(and you might want to build all of userland that way as well, so that
all the kernel-grovelling userland tools still work).  This may help
detect some SMP deadlocks, and it will also provide a few extra routines
that can be called from DDB to check the state of some kernel locks.
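
A minimal sketch of the kernel configuration lines this amounts to,
assuming they are added to a copy of an existing Alpha kernel config
file (which file, and what else it contains, is left open here):

```
# Fragment of a kernel configuration file (illustrative only)
options 	MULTIPROCESSOR	# SMP support
options 	LOCKDEBUG	# extra lock sanity checks, as discussed above
options 	DDB		# in-kernel debugger
options 	DDB_ONPANIC=1	# default ddb.onpanic to 1
```

After changing these options the kernel has to be reconfigured and
rebuilt in the usual way before they take effect.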

I'd be tempted to say that LOCKDEBUG should always be turned on with
"options MULTIPROCESSOR" for now, except for the nasty incompatibilities
it causes with some very useful userland tools, which must also be
recompiled and installed when running such a kernel.  In fact I've
simply decided to always build everything with LOCKDEBUG, including my
uniprocessor kernels, just to make life easiest (albeit a little slower).

-- 
						Greg A. Woods

H:+1 416 218-0098 W:+1 416 489-5852 x122 VE3TCP RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>       Secrets of the Weird <woods@weird.com>
