Subject: Re: multi find with different file outputs
To: None <tech-userlevel@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-userlevel
Date: 10/04/2005 00:09:22
Herewith my thoughts.  They're probably worth what you paid. :-)

> +.It Ic -fprint Ar filename
> +This creates
> +.Ar filename
> +or overwrites the file if it already exists.
> +It writes the pathname of the current file to this file, followed
> +by a newline character.

It would be good to specify, in the manpage, exactly when the file will
be created (and the original, if any, destroyed).  As I read the
patches, this does not occur until (and unless) the first pathname is
printed to that file, which is somewhat counterintuitive (to me, POLA
says that if no pathnames are printed to it, the file will be empty,
rather than untouched).

> +		fprint_file_handle = open(plan->c_data,
> +			O_WRONLY | O_CREAT, DEFFILEMODE); 

Don't you want O_TRUNC as well?

> +	if ((write(plan->flags, entry->fts_path,
> +		strlen(entry->fts_path)) == -1) ||
> +	    (write(plan->flags, "\n", 1) == -1)) {
> +		warn("write");
> +		_exit(1);

Any particular reason for not using writev(), so as to do only one
syscall per, rather than two?  Or for that matter, why not use stdio?

> +	if (filename[0] != '/') {
> +		/* for relative filename, prepend the
> +		   original working directory */
> +		if ((new->c_data = malloc(
> +		     (strlen(starting_directory) + strlen(filename) + 2)
> +                       * sizeof(char))) == NULL) {
> +			warn("malloc");
> +			_exit(1);
> +		}
> +		snprintf(new->c_data, strlen(starting_directory) +
> +			strlen(filename) + 2, "%s/%s", starting_directory,
> +			filename);
> +	}
>
Combined with the way plan->c_data is used above, this assumes that the
original working directory, as of startup time, plus the specified
path, still points to the correct place when the first pathname is
written.  While a certain degree of race condition is inevitable here,
I submit that it is less surprising all around to open the file at the
end of startup, rather than delaying a possibly long time and hoping
everything is still the same then.

> +char *starting_directory;	/* starting directory for -fprint */

> +	if ((starting_directory = getcwd(NULL, 0)) < 0)
> +		err(1, "getcwd");

This means that find suddenly no longer works, at all, if it can't
getcwd().  I would consider this a critical bug.  At the very least, a
failure here should not kill the run unless the cwd is needed
(currently, meaning, if -fprint is used).

All of these problems would be fixed by opening the output file at
startup rather than delaying.  The need to getcwd() (and the associated
error condition), the untouched-if-nothing-printed surprise, and the
race with the path to cwd no longer referring to it, all those go away
if this is done.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B