tech-userlevel: Re: port-xen/29887: sysctl kern.consdev coredumps

Subject: Re: port-xen/29887: sysctl kern.consdev coredumps
To: Nathan J. Williams <nathanw@wasabisystems.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-userlevel
Date: 06/21/2005 12:48:10

On Tue, Jun 21, 2005 at 12:50:55PM -0400, Nathan J. Williams wrote:
> Bill Studenmund <wrstuden@NetBSD.org> writes:
>
> > Uhm, I don't remember saying I don't want core dumps. They can be VERY
> > valuable. However I don't want a core dump to tell me I had a crash in an
> > error-handling log message. A core telling me I had a crash in code
> > handling a case I _should_ be handling (or should be protecting against),
> > that's fine.
>
> How does printf() distinguish when it's being called to print a log
> message and when it's being called, say, to generate normal
> application output? It doesn't. Not coring in the error situation is
> very much a special-case desire.

I disagree. While I agree I spoke of error handling above, I believe it's
perfectly reasonable to not core dump in the common case, so I don't see
this as a special case. :-) More below....

> > I'm still not getting how printf() or puts() not coring is incorrect
> > behavior. The strongest I've heard so far is that the behavior is
> > undefined and up to the implementation. I have not heard a requirement
> > that the implementation has to crash. Yes, I do agree that a program
> > operating in a standardized environment (-std=<foo> or -ansi or such)
> > should not ASSUME that it can do it, but I haven't heard someone quote
> > that we MUST crash.
>
> I don't think anyone is arguing that we are REQUIRED to crash, merely
> that it's a BETTER IDEA than not crashing. The term of art that I
> would apply here is "fail-fast": when the application's state is
> corrupted, it should crash as soon as possible, so as to make the
> crash closest to the corruption and prevent any further corruption of
> data.

I agree with that idea. I actually agree VERY strongly. I believe however
that it is more expedient, both in terms of code and reliability, to let a
printf() succeed even if it's passed a NULL string.

If the program is trying to printf() something, it's trying to say
something. Whether or not it's in unrecoverable-error land, my
experience is that it's easier to figure out what's wrong if you see
"client=(null)" or some such in an output file rather than getting a core
dump. Yes, you can figure it out from the core dump, but the printf says
it quicker.
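
To make the "client=(null)" case concrete, here is a minimal sketch in C.
The names safe_str() and format_client() are made up for this example and
are not from any libc. Since passing NULL for %s is undefined behavior in
standard C, the sketch does the substitution explicitly instead of relying
on what a particular printf() implementation happens to do.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: map a NULL string to a visible placeholder. */
static const char *
safe_str(const char *s)
{
	return s != NULL ? s : "(null)";
}

/*
 * Hypothetical logging helper: format "client=<name>" into buf.
 * Returns whatever snprintf() returns.
 */
static int
format_client(char *buf, size_t len, const char *client)
{
	return snprintf(buf, len, "client=%s", safe_str(client));
}
```

Calling format_client(buf, sizeof(buf), NULL) leaves "client=(null)" in
buf, so the log line still says what the program was trying to say.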

While I agree with the "fail-fast" idea, I feel we get more information
by letting a printf() (or puts()) report "(null)" than from a core dump. I
mean, if a program is blindly using the string, then it will core dump soon
AFTER the printf(). :-) The only case where this printf() behavior would
avert a core dump is if the program doesn't use the NULL string after the
printf(). So we get output (we learn what's going on in the program) and
we have no crash.

Take care,

Bill
