current-users: Re: err(3) and error handling

Subject: Re: err(3) and error handling
To: Ian Fitchet <I.D.Fitchet@fulcrum.co.uk>
From: John F. Woods <jfw@jfwhome.funhouse.com>
List: current-users
Date: 02/07/1995 08:49:50
> >What would you rather it print?
> >[...]
> >machine, or ...  (Or maybe you ought to RTFM?)
>  Whilst the traditional *NIX voodoo obscurity may suit many people I
> don't see why NetBSD can't/shouldn't break the mould and be a bit more
> punter friendly with relatively little overhead.

Yeah, I would have to agree that the kind of reply seen ("don't change the
error messages, it could be worse after all"[*]) is one of the few things that
keep the Unix-Haters people from looking like a bunch of complete loonies.

	Ken Thompson has an automobile which he helped design.  Unlike most
	automobiles, it has neither speedometer, nor gas gage, nor any of the
	numerous idiot lights which plague the modern driver.  Rather, if the
	driver makes any mistake, a giant "?" lights up in the center of the
	dashboard.  "The experienced driver", he says, "will usually know
	what's wrong."

This old joke is funny, of course, because not even Ken Thompson is quite
that minimalist.  (Heck, even (traditional) ed used the error message "?TMP"
if the temporary file overflowed, rather than only "?".)

There are many possible strategies for improving the error messages that avoid
the bizarre excesses of IBM or DEC's VMS[**].  UNOS[***], for example, simply
had a separate, richer set of errno-like return codes (the compatibility
library would "dumb them down" into errno values); common types of errors
(like arguments out of range) still got common values, but genuinely unique
errors (like corrupted filesystems) got their own codes rather than catchall
EHUH values.  I think UNOS had maybe 300 or so error codes defined when I
left CRDS.  Some programs would do their own interpretation of codes, but
the fact that so few error codes were overloaded meant that the default text
was usually an adequate explanation of what went wrong.  As to the huge table
of default text messages, those few of the audience still concerned about
shoehorning programs into PDP-11 address spaces may breathe easier:  the
text was kept in a file, and a function was provided to look up the text
at runtime.  (Come on, how time critical is
	if (wham(x) < 0) {
		errmsg("you blew it:");
		exit(66);
	}
anyway?)  This also made it easier to customize the system for overseas
users (of course, the vast bulk of English strings embedded in programs
overwhelmed that advantage by quite a bit... ;-).
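
A rough sketch of the scheme described above, in C.  Everything named here
(enum xerr, the XE_* codes, xerr_to_errno(), xerr_text(), and the
one-entry-per-line message file format) is invented for illustration; none
of it is taken from UNOS, whose actual codes, function names, and file
layout are not given in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical rich error codes; names and values are made up. */
enum xerr {
	XE_OK = 0,
	XE_ARG_RANGE = 1,	/* common: argument out of range */
	XE_FS_CORRUPT = 2,	/* specific: filesystem is corrupt */
	XE_FS_MISSING = 3	/* specific: no filesystem on the device */
};

/* Compatibility layer: "dumb down" a rich code into a classic errno. */
int
xerr_to_errno(enum xerr code)
{
	switch (code) {
	case XE_OK:
		return 0;
	case XE_ARG_RANGE:
	case XE_FS_CORRUPT:
	case XE_FS_MISSING:
		return EINVAL;	/* three distinct causes collapse into one */
	}
	return EINVAL;
}

/*
 * Look up the text for a code in a message file at runtime, so the table
 * takes up no room in the program itself.  Assumed file format: one
 * "<number> <text>" entry per line, lines shorter than 256 bytes.
 * Returns 0 on success, -1 if the file or the code is missing.
 */
int
xerr_text(const char *path, enum xerr code, char *buf, size_t len)
{
	FILE *fp;
	char line[256];
	int n, off = 0;

	assert(buf != NULL && len > 0);
	if ((fp = fopen(path, "r")) == NULL)
		return -1;
	while (fgets(line, sizeof line, fp) != NULL) {
		/* %n records where the message text starts. */
		if (sscanf(line, "%d %n", &n, &off) == 1 && n == (int)code) {
			line[strcspn(line, "\n")] = '\0';
			snprintf(buf, len, "%s", line + off);
			fclose(fp);
			return 0;
		}
	}
	fclose(fp);
	return -1;
}
```

A caller in the spirit of the errmsg() fragment above might try xerr_text()
first and fall back to strerror(xerr_to_errno(code)) when the message file
is unavailable; the linear scan of the file is slow, but it only runs on
the error path.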

It does seem like it will be difficult to propose a scheme that would be
universally accepted (considering both the number of other schemes in use,
and the number of UNIX vendors with a vested interest in simply shipping the
same old stuff, year after year, doing nothing more than buying tapes of
brand-new stuff, compiling it, and shipping the result if the compiler manages
to produce more executables than error messages), but that isn't necessarily
a reason to throw in the towel.  I must admit that I feel that anyone who
thinks that a corrupted filesystem is best described by "invalid argument"
should seek professional help immediately (from a professional software
engineering professor, that is :-).


[*] Though I must admit that IBM did a really good job in that department.
In AIX, every error message printed by a UNIX utility has been given a
6-digit code (which you can look up in a vast tome) in addition to the
traditional sys_errlist[] message.  In every single case that I personally
had to deal with in my brief exposure to AIX, the printed descriptive text
to be found in that manual simply re-iterated the mindless description of
the general problem associated with the given errno return (i.e.,
"666-394 Invalid Argument" from mount would almost certainly point to text
reading "An invalid argument was given to a system call" rather than "The
filesystem format on the device to be mounted was probably corrupt or
nonexistent.").  Way to go, IBM.

[**] I wish I could remember the exact format of the messages (so I could
do this right), but VMS's default habit was to give not-often-useful messages
THREE TIMES with increasing frenzy:

	VMS-E-FUKUP You fucked up.
	VMS-E-FUKUP You really fucked up, you know.
	VMS-E-FUKUP This program finds it difficult to comprehend the degree
		    to which you have fucked up.  You are directed to check
		    the manual set for a complete discussion of the faults in
		    your personal hygiene, social habits, and ancestry (page
		    10732, room 27, shelf 12, volume 53).

(Yes, I know there was a switch to have VMS fail to print some of these; I
never found the messages adequate, printed or not.)

[***] A UNIX workalike written at Charles River Data Systems.  In its early
days, it was sort of a "thinkalike", with system calls that looked reminiscent
of UNIX, but eventually grew an SVID1 compatibility mode (courtesy of yours
truly).  It began to die out when it became clear that a handful of people
couldn't keep up with AT&T's "System V--Consider whatever crap we hammer out
today Standard" effort (to say nothing of the fact that it lacked many modern
amenities like TCP/IP (remedied very late), virtual memory (never remedied),
and so on).  Its fundamental synchronization primitives were eventcounts (which
will be recognized by OS theorists and Apollo fans).