tech-userlevel archive


Changing basename(3) and dirname(3) for future POSIX compatibility



The next POSIX update (issue 8, or whatever it is called in the wild), that
is, the next "minor corrections" update as distinct from the next major
update, is planning (if I follow correctly, it has already decided) to
tighten the definitions of basename(3) and dirname(3) so that it is
possible to implement them in a thread-safe way, and I think to actually
require that.

Issue 8 will probably appear sometime in 2018 - late 2018 if I had to
guess, but 2019 is certainly not impossible.

That is, they are removing the "or in a static buffer" part of the
allowable results, which is how our versions of those functions are
currently implemented.

The "the input string may be modified" option is being retained.

This is in response to bug report 1064
	http://austingroupbugs.net/view.php?id=1064

Complying with this will need changes to our implementations. I believe
they can be made without too much harm, but I am as yet undecided whether
versioning is needed. The function signatures do not change, but there
might be NetBSD applications that currently assume the input string is
not modified by these functions, and modifying it (for most inputs with
dirname(), and for some inputs with basename()) is far and away the
easiest way to comply.

FreeBSD have already changed their implementation (they had it harder than
we do, as their prototype had "const char *" as the argument type, whereas
ours does not). Others are updating, or have already updated, as well.

The changes proposed, and I believe agreed, are ...

   From the basename() and dirname() articles, remove the following
   two sentences:
	"The basename() function need not be thread-safe."
	"The dirname() function need not be thread-safe."

   In the basename() article, change this sentence:
	"The basename() function may modify the string pointed to by path, 
	and may return a pointer to internal storage. The returned pointer
	might be invalidated or the storage might be overwritten by a
	subsequent call to basename()."
   To:
	"The basename() function may modify the string pointed to by path."

   In the dirname() article, change this sentence:
	"The dirname() function may modify the string pointed to by path,
	and may return a pointer to internal storage. The returned pointer
	might be invalidated or the storage might be overwritten by a
	subsequent call to dirname()."
   To:
	"The dirname() function may modify the string pointed to by path." 

Are there any objections to updating our implementations to match this
change?

It is worth noting that our dirname(3) and basename(3) man pages include...

BUGS
     If the length of the result is longer than PATH_MAX bytes (including the
     terminating nul), the result will be truncated.

     The dirname() function returns a pointer to static storage that may be
     overwritten by subsequent calls to dirname().  This is not strictly a
     bug; it is explicitly allowed by IEEE Std 1003.1-2001 ("POSIX.1").

Both of those would be removed by the change (and the "it is explicitly
allowed" statement would no longer be true).

My current suggestion would be simply to implement them the other obvious
way: the couple of cases that return known fixed strings ("." or "/")
would simply return those, and all other cases would return a pointer into
the input string, with a '\0' written over a '/' when needed (which can
happen with both basename and dirname).
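A minimal sketch of that approach in C, assuming nothing beyond the
standard library. The names inplace_basename() and inplace_dirname() are
hypothetical (chosen so as not to clash with <libgen.h>), and this is not
NetBSD's actual code. A path consisting only of slashes is reduced to "/",
which is one of the results POSIX permits for "//".

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical in-place versions (not NetBSD's actual code).  No static
 * buffer is used: the fixed results "." and "/" are string literals, and
 * every other result points into path, with a '\0' written over a '/'
 * when the result must be cut short.
 */
char *
inplace_basename(char *path)
{
	char *end, *start;

	if (path == NULL || *path == '\0')
		return ".";

	/* Back up from the last character over any trailing slashes. */
	end = path + strlen(path) - 1;
	while (end > path && *end == '/')
		end--;
	if (*end == '/')		/* the path was only slashes */
		return "/";
	if (end[1] == '/')		/* strip trailing slashes: "usr/" -> "usr" */
		end[1] = '\0';

	/* The result starts just after the previous slash, if any. */
	start = end;
	while (start > path && start[-1] != '/')
		start--;
	return start;
}

char *
inplace_dirname(char *path)
{
	char *p;

	if (path == NULL || *path == '\0')
		return ".";

	p = path + strlen(path) - 1;
	while (p > path && *p == '/')	/* skip trailing slashes */
		p--;
	while (p > path && *p != '/')	/* skip the last component */
		p--;
	if (*p != '/')			/* no slash at all: "a" or "a/" */
		return ".";
	while (p > path && *p == '/')	/* skip the separating slashes */
		p--;
	if (*p == '/')			/* only leading slashes remain */
		return "/";
	p[1] = '\0';			/* overwrite the first separating '/' */
	return path;
}
```

Since "." and "/" are returned as string literals, a caller must not
write through the returned pointer in those cases.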

It is also worth noting (though unsafe to rely upon) that calling basename()
and then dirname() on the same input string is safe, and works (though the
other order would not, whereas both orders work now). It has never been
standards compliant to do that (in either order: after calling either
function, the input string must be assumed to have possibly had its
contents trashed).

FYI: other approaches were considered. One was allowing a NULL return (with
errno set) in cases that couldn't be handled. Another was adding _r()
functions, which couldn't be done: FreeBSD had already done that, but they
botched it, and their function design was useless while still posing a
roadblock to doing it properly (even if inventing a new interface were
reasonable). A third was designing an entirely new function to return both
the basename and the dirname (via pointer-to-pointer args) and their
lengths (via pointers to size_t) as non-NUL-terminated "strings", so that
pointers into an unmodified input string could be returned. But that would
be an entirely new invention, which POSIX (or really any standards
organization, regardless of how many times that is violated, especially by
the IETF) is not supposed to do, and here it wisely chose not to. We could
implement such a function, however...
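A function of the kind just described might look roughly like the sketch
below. The name path_split() and its argument order are hypothetical, not
an existing NetBSD, FreeBSD or POSIX interface. Each result is a
(pointer, length) pair that is not NUL-terminated, so it would be printed
with something like printf("%.*s", (int)len, ptr).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical splitter that never modifies its input.  Results are
 * (pointer, length) pairs and are NOT NUL-terminated.  Pointers refer
 * into path, except that a "." result points to a string literal.
 */
void
path_split(const char *path, const char **dir, size_t *dirlen,
    const char **base, size_t *baselen)
{
	const char *end, *p;

	if (path == NULL || *path == '\0') {
		*dir = *base = ".";
		*dirlen = *baselen = 1;
		return;
	}

	/* end is one past the last character; drop trailing slashes. */
	end = path + strlen(path);
	while (end - path > 1 && end[-1] == '/')
		end--;

	if (end - path == 1 && *path == '/') {
		/* The path was only slashes: both results are "/". */
		*dir = *base = path;
		*dirlen = *baselen = 1;
		return;
	}

	/* The last component runs from p up to (not including) end. */
	p = end;
	while (p > path && p[-1] != '/')
		p--;
	*base = p;
	*baselen = (size_t)(end - p);

	if (p == path) {
		/* No slash before the last component, e.g. "a" or "a/". */
		*dir = ".";
		*dirlen = 1;
		return;
	}

	/* Step back over the slash(es) separating dir from base. */
	while (p > path && p[-1] == '/')
		p--;
	*dir = path;
	/* If only leading slashes remain (e.g. "/usr"), dir is "/". */
	*dirlen = (p == path) ? 1 : (size_t)(p - path);
}
```

Because nothing is written to the input, the argument can be
"const char *", and both results come from one call, so the ordering
hazard noted above does not arise.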

Opinions? Other (new) implementation methods (without changing the
function prototype)?

kre


