Subject: Re: ffs_alloccg panic
To: Mike Long <mike.long@analog.com>
From: Chris G Demetriou <Chris_G_Demetriou@ux2.sp.cs.cmu.edu>
List: current-users
Date: 08/04/1996 19:33:43

Mike Long said:
> >Date: Thu, 18 Jul 1996 22:30:57 -0700
> >From: "Michael L. VanLoon -- HeadCandy.com" <michaelv@HeadCandy.com>
> 
> >But, this raises a question: do panics in the kernel, with DIAGNOSTIC
> >turned on, still signal a potential bug?  I'm thinking kind of
> >analogous to warnings emitted from a C compiler.  In other words: in
> >a kernel where all known "warnings" were "fixed", should it be
> >expected that this kernel would never panic if DIAGNOSTIC was built
> >in?
> 
> Mostly; unrecoverable errors would still cause a panic.
> 
> Code wrapped by #ifdef DIAGNOSTIC performs sanity checking of
> assumptions made by kernel code.  In your case, disabling DIAGNOSTIC
> may eliminate panics, but at the cost of possibly introducing silent
> filesystem corruption.  I'd prefer a panic, myself.

"unrecoverable errors would still panic" doesn't really make sense.

DIAGNOSTIC should add no panic() calls which do not signal real
problems.

The point of DIAGNOSTIC is that with a bug-free kernel, in a perfect
world, and all of that, #ifdef DIAGNOSTIC panic() calls should never
be hit, because the situations they signal would not occur.
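
To make that concrete, here is a rough userspace sketch of the pattern
(not actual kernel code).  buf_release(), struct buf_hdr, and the
panic() stand-in are all made up for the example; DIAGNOSTIC is
defined by hand here only so the check gets compiled in, whereas in a
real kernel it would come from the kernel config.

```c
#include <stdio.h>
#include <stdlib.h>

/* Normally set by the kernel config; defined here so the check is built. */
#define DIAGNOSTIC 1

/* Userspace stand-in for the kernel's panic(); hypothetical. */
static void
panic(const char *msg)
{
	fprintf(stderr, "panic: %s\n", msg);
	abort();
}

/* Hypothetical reference-counted object. */
struct buf_hdr {
	int refcnt;
};

/*
 * Drop one reference and return the new count.  The code assumes the
 * caller holds a reference, i.e. refcnt > 0 on entry.
 */
static int
buf_release(struct buf_hdr *bp)
{
#ifdef DIAGNOSTIC
	/* In a bug-free kernel this is never true, so it never fires. */
	if (bp->refcnt <= 0)
		panic("buf_release: refcnt underflow");
#endif
	return --bp->refcnt;
}
```

With correct callers the check costs one compare and is never hit.
With a buggy caller and DIAGNOSTIC off, refcnt quietly goes negative
and the damage shows up somewhere else, later.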


It's not that "unrecoverable errors would still panic"; it's that,
with DIAGNOSTIC off, conditions which are so far out there as to cause
a panic immediately would still panic (e.g. the cases where DIAGNOSTIC
is sanity-checking a pointer), but many errors which ARE NOT
RECOVERABLE (but which also should not happen) would not be caught and
may cause data corruption and/or a later crash.


In other words, any panic() in a DIAGNOSTIC block that's committed to
the source tree should indicate a (potentially serious) software bug
(or could be triggered e.g. by memory corruption by hardware, or
something).  If you're seeing panics with DIAGNOSTIC turned on, and
they go away with DIAGNOSTIC turned off, it's quite likely that
something bad is going on anyway, and you're just not noticing.


FWIW, the meaning that I use for the various flags is:

	DIAGNOSTIC:

		Cheap sanity checking of assumptions and for bugs.

	DEBUG:

		Extensive (potentially very expensive) checking of
		assumptions and for bugs.

The notion is that if you can afford your kernel to be a couple of
percent slower, or need the sanity checking, use DIAGNOSTIC.  If
you're debugging something, use DEBUG.
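
A rough sketch of the difference, again in userspace and with made-up
names (struct list, list_push(), and the panic() stand-in are
hypothetical; both flags are defined by hand so that both checks are
compiled in).  The DIAGNOSTIC check is constant-time; the DEBUG check
re-verifies the whole structure on every call.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Normally set by the kernel config; defined here for the sketch. */
#define DIAGNOSTIC 1
#define DEBUG 1

/* Userspace stand-in for the kernel's panic(); hypothetical. */
static void
panic(const char *msg)
{
	fprintf(stderr, "panic: %s\n", msg);
	abort();
}

struct node {
	int key;
	struct node *next;
};

struct list {
	struct node *head;	/* kept sorted, smallest key first */
	size_t count;		/* number of nodes on the list */
};

/* Push n at the head; the caller promises n->key <= head's key. */
static void
list_push(struct list *l, struct node *n)
{
#ifdef DIAGNOSTIC
	/* Cheap: O(1) check of the caller's promise. */
	if (l->head != NULL && n->key > l->head->key)
		panic("list_push: out of order");
#endif
	n->next = l->head;
	l->head = n;
	l->count++;
#ifdef DEBUG
	/* Expensive: O(n) walk verifying ordering and the node count. */
	{
		size_t seen = 0;
		struct node *p;

		for (p = l->head; p != NULL; p = p->next) {
			if (p->next != NULL && p->key > p->next->key)
				panic("list_push: list not sorted");
			seen++;
		}
		if (seen != l->count)
			panic("list_push: count mismatch");
	}
#endif
}
```

The DEBUG walk catches corruption that the O(1) check cannot see
(e.g. a stale count), but it turns every push into a full traversal,
which is the kind of cost that makes DEBUG unsuitable for everyday use.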

DEBUG can be really expensive...  I seem to recall a performance hit
on the order of 20+% on the sparc if you use it (because of extensive
pmap debugging checks), but I could be misremembering.

Some more expensive checks which arguably should be DEBUG are
actually DIAGNOSTIC, because the bugs that they check for are so
common (e.g. the multiple-free testing in kern_malloc.c).
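
For illustration only, here is one simple way a multiple-free check
can work.  This is NOT the kern_malloc.c code, just a userspace sketch
over a tiny fixed pool; pool_alloc(), pool_free(), FREE_MAGIC, and the
counting panic() stand-in are all made up.  The stand-in records the
panic and returns so the sketch can be exercised; the real panic()
does not return.

```c
#include <stddef.h>
#include <stdio.h>

/* Normally set by the kernel config; defined here for the sketch. */
#define DIAGNOSTIC 1

#define NBLOCKS		4
#define FREE_MAGIC	0xdeadbeefU	/* arbitrary "on free list" marker */

struct block {
	unsigned magic;		/* FREE_MAGIC while the block is free */
	struct block *next;	/* free-list link */
	char data[32];
};

static struct block pool[NBLOCKS];
static struct block *freelist;
static int panics;		/* how many times the stand-in fired */

/* Stand-in for panic(): counts instead of halting; hypothetical. */
static void
panic(const char *msg)
{
	fprintf(stderr, "panic: %s\n", msg);
	panics++;
}

/* Put every block in the pool on the free list. */
static void
pool_init(void)
{
	size_t i;

	freelist = NULL;
	for (i = 0; i < NBLOCKS; i++) {
		pool[i].magic = FREE_MAGIC;
		pool[i].next = freelist;
		freelist = &pool[i];
	}
}

/* Take a block off the free list, or return NULL if none are left. */
static struct block *
pool_alloc(void)
{
	struct block *b = freelist;

	if (b == NULL)
		return NULL;
	freelist = b->next;
	b->magic = 0;		/* block is now in use */
	return b;
}

/* Return a block to the free list. */
static void
pool_free(struct block *b)
{
#ifdef DIAGNOSTIC
	if (b->magic == FREE_MAGIC) {
		panic("pool_free: multiple free");
		return;		/* only because the stand-in returns */
	}
#endif
	b->magic = FREE_MAGIC;
	b->next = freelist;
	freelist = b;
}
```

Without the check, freeing the same block twice in a row makes the
block point at itself on the free list, so later allocations keep
handing out the same memory to different callers: exactly the sort of
silent corruption that shows up as a crash somewhere unrelated.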



My thoughts:

	(1) EVERYBODY should use DIAGNOSTIC, with the possible
	    exception of a few people running a 'stable' release on
	    a production system.  In general, unless you ABSOLUTELY
	    cannot afford it, use DIAGNOSTIC.

	(2) people actually developing code should use DEBUG, if
	    that is at all feasible for their architecture.

	(3) people taking performance numbers may or may not want to
	    run with DIAGNOSTIC turned on, but definitely don't want
	    to use DEBUG.

All of the NetBSD/Alpha kernel configs that I use (i.e. all except
NOSY 8-) have both DIAGNOSTIC and DEBUG turned on.



cgd