Subject: Re: PR's about which(1)
To: NetBSD-current Discussion List <current-users@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: current-users
Date: 04/01/2004 21:50:27
[ On Thursday, April 1, 2004 at 22:17:02 (GMT), Christos Zoulas wrote: ]
> Subject: Re: PR's about which(1)
>
> In article <c4i18u$h8l$1@serpens.de>,
> Michael van Elst <mlelstv@serpens.de> wrote:
> > 
> > Isn't it a POSIX requirement that most (but not all) shell builtin
> > commands can be exec()'d ? Even things like 'umask' exist as
> > commands in the PATH.
> 
> I think so.

Hmmm, no, I think not, or at least it's not that general, since there are
two distinct classifications of built-in utilities.  The precise wording
in Issue 6 IEEE Std 1003.1-2001 is:
                                                                                
   However, all of the standard utilities, including the regular
   built-ins in the table, but not the special built-ins described in
   "Special Built-In Utilities", shall be implemented in a manner so
   that they can be accessed via the exec family of functions as defined
   in the System Interfaces volume of IEEE Std 1003.1-2001 and can be
   invoked directly by those standard utilities that require it (env,
   find, nice, nohup, time, xargs).

[[ .... ]]

   The special built-in utilities in this section need not be provided
   in a manner accessible via the exec family of functions defined in
   the System Interfaces volume of IEEE Std 1003.1-2001.

	break, colon (:), dot (.), eval, exec, exit, export, readonly,
	return, set, shift, times, trap, unset

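The practical difference can be probed from C.  The sketch below (the
helper name `exec_status` is hypothetical) fork()s, tries execvp() on a
bare utility name, and returns the child's exit status, using 127 to mean
the exec itself failed, as shells conventionally report it.  On a
conforming system `exec_status("true")` should give 0; whether a special
built-in such as "export" happens to be found in the PATH (usually not,
giving 127) is left to the implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `name` with no arguments via execvp() in a
 * child process.  Returns the child's exit status, 127 if the exec
 * failed (not found or not executable), or -1 on fork()/waitpid()
 * failure or if the child was killed by a signal.
 */
static int
exec_status(const char *name)
{
	pid_t pid;
	int status;

	if ((pid = fork()) == -1)
		return -1;
	if (pid == 0) {
		char *argv[2];

		argv[0] = (char *) name;
		argv[1] = NULL;
		execvp(name, argv);	/* searches PATH like the shell does */
		_exit(127);		/* only reached if the exec failed */
	}
	while (waitpid(pid, &status, 0) == -1)
		if (errno != EINTR)
			return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```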
However, the explanation given for requiring some of those "regular
built-in utilities" is highly questionable.

Assuming "time" is itself a built-in (or that you'd never want to time
the execution of any built-in) the only "regular" built-in utilities I
can think of that can in any way truly benefit from implementation as
separate programs are:  "kill" (for use from "xargs"); and "pwd" (only
because the built-in version can be required to give confusing answers
when the OS supports symlinks).

"true" and "false" have traditionally been available as separate
programs too, but their utility as such is quite a bit more
questionable, assuming they are available as built-ins.  "echo" has been
a separate program too, but of course its only real utility is likewise
when the shell has no built-in "echo" and no way of aliasing one from any
other built-in.  Implementing "printf" as an external program is also
kinda silly, unless of course the shell has no built-in "printf", though
potentially it can be useful from other less sophisticated interpreters.
However, given the definition of popen(3), even this is somewhat
questionable: popen() runs its command via "sh -c", so a program using it
already gets whatever built-in "printf" that shell provides.

There are, of course, ways to implement some of the other built-ins as
external commands with external databases that the shell co-operates
with (e.g. "fc", "alias" and "unalias"), but without a specifically
co-operating shell implementation, and outside the current shell
execution environment, there's really no point.  Even "wait", given the
way it is defined, is either impossible outside the shell execution
environment, or unnecessary (just use waitpid()), and the "informative"
APPLICATION USAGE and RATIONALE sections say as much, though allow for
it to exist as a separate command and have the equivalent result of
"true" when called without any command-line parameters.
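
To make the "wait" point concrete: the shell's "wait" with no operands
reaps the shell's own children, and a C program gets the same effect
directly from waitpid().  The sketch below (`wait_all` is a hypothetical
name) loops until waitpid() fails with ECHILD.  Run as a separate program
it would have no children of its own and would return at once, which is
exactly the "true"-like behaviour the standard allows for.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>	/* fork(), _exit() for callers creating children */

/*
 * Hypothetical helper, a rough C equivalent of the shell's "wait" with
 * no operands: reap every child of the *calling* process and return how
 * many were reaped.  A process with no children gets ECHILD at once.
 */
static int
wait_all(void)
{
	int n = 0;
	int status;

	for (;;) {
		pid_t pid = waitpid(-1, &status, 0);

		if (pid > 0) {
			n++;		/* reaped one child */
			continue;
		}
		if (pid == -1 && errno == EINTR)
			continue;	/* interrupted by a signal: retry */
		break;			/* ECHILD: no children left */
	}
	return n;
}
```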

Most of the rest of the built-ins, just like "which" (or "type" without
the '-p' option, or "read", or especially the jobs related utilities),
have absolutely no possible meaningful purpose outside of the immediate
shell execution environment, and deeper examination of the related
"informative" sections of P1003.1-2001 shows agreement despite their
definition as "regular" and not "special".

If any other program really needs to know if a program is in the PATH, or
where it is in the PATH, implementation using simple string manipulation
and direct low-level system calls (e.g. as a C library function) makes
infinitely more sense than implementation as an external program that
would have to be fork()'ed and execve()'ed.
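
A minimal sketch of such a library-style lookup, assuming only POSIX
getenv(), snprintf(), stat() and access(); `find_in_path` is a
hypothetical name, not an existing libc function.  It walks $PATH the way
execvp() searches it, treating an empty element as the current directory,
and refuses names that already contain a '/'.  Like "which" (and unlike
"type"), it knows nothing about the calling shell's aliases, functions, or
built-ins.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: search $PATH for an executable regular file
 * called `name`.  On success the full pathname is written to `buf` and
 * `buf` is returned; otherwise NULL.  Candidates that do not fit in
 * `buf` are skipped.
 */
static char *
find_in_path(const char *name, char *buf, size_t buflen)
{
	const char *path = getenv("PATH");
	const char *p, *end;
	struct stat st;

	if (name == NULL || *name == '\0' || strchr(name, '/') != NULL)
		return NULL;
	if (path == NULL)
		return NULL;	/* default search path is implementation-defined */

	for (p = path; ; p = end + 1) {
		size_t dirlen;
		int n;

		end = strchr(p, ':');
		if (end == NULL)
			end = p + strlen(p);	/* last element */
		dirlen = (size_t) (end - p);
		if (dirlen == 0)		/* empty element: current directory */
			n = snprintf(buf, buflen, "./%s", name);
		else
			n = snprintf(buf, buflen, "%.*s/%s", (int) dirlen, p, name);
		if (n > 0 && (size_t) n < buflen &&
		    stat(buf, &st) == 0 && S_ISREG(st.st_mode) &&
		    access(buf, X_OK) == 0)
			return buf;
		if (*end == '\0')
			break;
	}
	return NULL;
}
```

A call such as `find_in_path("sh", buf, sizeof buf)` costs a few string
copies and stat()/access() calls, with no fork() or execve() at all.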

Even so it's just as easy to do:

	asprintf(&cmd, "sh -c 'type %s'", thing);
	fp = popen(cmd, "r");	/* popen() returns a FILE *, not an fd */

as it is to do:

	asprintf(&cmd, "which %s", thing);
	fp = popen(cmd, "r");

and of course if we're talking about POSIX, only the former (or rather
its equivalent using malloc() and sprintf()) is even remotely conforming
since there's no "which" program defined by POSIX.
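
For reference, a complete version of the first fragment, using malloc()
and snprintf() in place of asprintf() as suggested above.  `shell_type`
is a hypothetical helper: it copies the first line of output (whose
format POSIX leaves unspecified) and returns the exit status obtained from
pclose().  Since popen() already runs its command via "sh -c", the
explicit "sh -c" is redundant but harmless; the name is then parsed by two
shells, though, so the sketch refuses anything outside the portable
filename characters (plus '/') rather than trying to quote it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/* Portable filename character set, plus '/' for pathnames. */
#define SAFE_CHARS "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._-/"

/*
 * Hypothetical helper: run the shell's "type" built-in on `thing`, copy
 * the first line of its output (newline stripped) into `out`, and return
 * type's exit status, or -1 on error or refused input.
 */
static int
shell_type(const char *thing, char *out, size_t outlen)
{
	char *cmd;
	size_t len;
	FILE *fp;
	int status;

	if (outlen == 0 || thing[0] == '\0' || thing[0] == '-' ||
	    thing[strspn(thing, SAFE_CHARS)] != '\0')
		return -1;
	len = strlen(thing) + sizeof("sh -c 'type '");	/* fixed text + NUL */
	if ((cmd = malloc(len)) == NULL)
		return -1;
	(void) snprintf(cmd, len, "sh -c 'type %s'", thing);
	fp = popen(cmd, "r");
	free(cmd);
	if (fp == NULL)
		return -1;
	if (fgets(out, (int) outlen, fp) != NULL)
		out[strcspn(out, "\n")] = '\0';
	else
		out[0] = '\0';
	while (getc(fp) != EOF)		/* drain any remaining output */
		continue;
	status = pclose(fp);		/* a wait status, as from waitpid() */
	if (status == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

For example, `shell_type("ls", buf, sizeof buf)` would typically return 0
and leave something like "ls is /bin/ls" in `buf`, depending on the shell.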

Interestingly, "type" is not listed as a "regular built-in utility", nor
is it listed as a "special built-in utility", though it is of course a
standard utility.  However in the "informative" APPLICATION USAGE
section the following caveat is given:

     Since "type" must be aware of the contents of the current shell
     execution environment (such as the lists of commands, functions,
     and built-ins processed by "hash"), it is always provided as a
     shell regular built-in.  If it is called in a separate utility
     execution environment, such as one of the following:

	nohup type writer
	find . -type f | xargs type

     it might not produce accurate results.

I'm not sure why they didn't just list it as a "special built-in", nor
why the confusing "provided as a shell regular built-in" language is
used.  Poor editing, I guess.

-- 
						Greg A. Woods

+1 416 218-0098                  VE3TCP            RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>