tech-userlevel archive


Re: Changing basename(3) and dirname(3) for future posix compatibility



    Date:        Tue, 12 Dec 2017 11:04:41 +0100
    From:        Joerg Sonnenberger <joerg%bec.de@localhost>
    Message-ID:  <20171212100441.GA32032%britannica.bec.de@localhost>

  | Effectively requiring modifiable input for basename(3) is highly
  | annoying and breaks legacy code.

Are you talking of just basename() or both basename() and dirname() ?

  | FreeBSD had quite a bit of fun with that.

So I understand, but they had changed the arg (for both) to be const char *,
and had code taking advantage of that.   We only made that mistake
for about 3 weeks, almost a decade ago, so we cannot have any code
passing const strings (like _PATH_XXX) to these functions.

  | I don't really see the point since thread-safety can be obtained
  | much easier by using TLS.

As an implementation technique that would be fine - nothing requires
that the input be modified.   What is (will be) required is that another
call to either of these functions (with different input), by the same
or a different thread, not alter the result of the first call while
that remains valid (which is basically while the input parameter
remains valid.)

  | So who will audit src and pkgsrc for the
  | regressions this is going to introduce?

I have started on src, there are fewer (that I have found anyway, so far)
than you might think, most of the uses are already fine.  And of course,
fixing the places that are (will be) incorrect is generally not difficult.

In theory, pkgsrc shouldn't be a problem, as Linux either already has,
or will have, a similar implementation, FreeBSD does now, and some other
systems have never had anything different (have always modified the input
buffer) - so code that works other places should be OK.

  | Side issue: the reason why basename(3) has to modify the input at all is
  | because it is supposed to drop trailing slashes.

Yes.   And if you know that, and know that that case won't
happen, then you can safely call basename() and know the input won't be
modified - POSIX simply won't guarantee it.   (It never has.   All
that is really changing is that the permission to return the answer
in a static buffer, which can be overwritten by the next call, is being
dropped - and with it the nonsense we currently have of silently truncating
the result if the input is too long, which has never been kosher.)
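To make the two static-buffer problems concrete (the next call overwriting
the previous answer, and silent truncation), here is a minimal sketch.
It is not NetBSD's code: legacy_basename(), legacy_demo() and the
deliberately tiny LEGACY_MAX are hypothetical names invented for the
illustration, and a real implementation would use a much larger buffer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: tiny on purpose, so that truncation is easy to see. */
#define LEGACY_MAX 8

/* Sketch of a static-buffer basename; never touches its input. */
char *legacy_basename(const char *path)
{
	static char result[LEGACY_MAX];	/* shared by every call */
	const char *start;
	size_t len, n;

	if (path == NULL || *path == '\0')
		return strcpy(result, ".");

	/* Ignore trailing slashes, but keep a lone "/". */
	len = strlen(path);
	while (len > 1 && path[len - 1] == '/')
		len--;
	if (len == 1 && path[0] == '/')
		return strcpy(result, "/");

	/* Walk back to the start of the last component. */
	start = path + len;
	while (start > path && start[-1] != '/')
		start--;

	n = (size_t)(path + len - start);
	if (n >= sizeof(result))
		n = sizeof(result) - 1;	/* silent truncation */
	memcpy(result, start, n);
	result[n] = '\0';
	return result;
}

void legacy_demo(void)
{
	char *first = legacy_basename("/usr/lib/");
	assert(strcmp(first, "lib") == 0);

	char *second = legacy_basename("/etc/passwd");
	assert(first == second);		/* same buffer ... */
	assert(strcmp(first, "passwd") == 0);	/* ... so "lib" is gone */

	/* A component longer than the buffer is cut short without error. */
	assert(strcmp(legacy_basename("/tmp/averylongname"), "averylo") == 0);
}
```

Calling legacy_demo() passes all of its assertions, which is exactly the
problem: the first result does not survive the second call.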

Similarly, with the common (modifying) implementation, the sequence

	bn = basename(buf);
	dn = dirname(buf);

works too - but similarly, is not (and never was) guaranteed.
(And will not work if performed in the other order.)
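A sketch of why the order matters, assuming an implementation of the
modifying kind.   inplace_basename(), inplace_dirname() and order_demo()
are hypothetical stand-ins written for this illustration, not any
system's libc; the real functions may behave differently, which is the
point about nothing being guaranteed.

```c
#include <assert.h>
#include <string.h>

static char dot[] = ".";

/* Returns a pointer into path; writes only to drop trailing slashes. */
char *inplace_basename(char *path)
{
	size_t len;
	char *start;

	if (path == NULL || *path == '\0')
		return dot;
	len = strlen(path);
	while (len > 1 && path[len - 1] == '/')
		len--;
	if (path[len] != '\0')
		path[len] = '\0';	/* the only write to the input */
	if (len == 1 && path[0] == '/')
		return path;
	start = path + len;
	while (start > path && start[-1] != '/')
		start--;
	return start;
}

/* Cuts path off after its directory part and returns path itself. */
char *inplace_dirname(char *path)
{
	size_t len;

	if (path == NULL || *path == '\0')
		return dot;
	len = strlen(path);
	while (len > 1 && path[len - 1] == '/')	/* trailing slashes */
		len--;
	while (len > 0 && path[len - 1] != '/')	/* last component */
		len--;
	if (len == 0)
		return dot;			/* no directory part */
	while (len > 1 && path[len - 1] == '/')	/* separating slashes */
		len--;
	path[len] = '\0';
	return path;
}

void order_demo(void)
{
	char buf[] = "/usr/lib";
	char *bn = inplace_basename(buf);	/* no trailing '/': buf untouched */
	char *dn = inplace_dirname(buf);	/* NUL written over the last '/' */
	assert(strcmp(bn, "lib") == 0);		/* bn points past that NUL */
	assert(strcmp(dn, "/usr") == 0);

	char rev[] = "/usr/lib";
	dn = inplace_dirname(rev);		/* rev is now "/usr" */
	bn = inplace_basename(rev);		/* only ever sees "/usr" */
	assert(strcmp(dn, "/usr") == 0);
	assert(strcmp(bn, "usr") == 0);		/* not "lib" */
}
```

In the first order both results stay usable because basename's answer
lies beyond the NUL that dirname writes; in the reverse order the last
component has already been cut off before basename looks at the buffer.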

  | IMO that is the more
  | central design mistake, but it is prevalent in various other parts.

Mistake or not, that isn't going away.

Almost lastly, for legacy implementations - the very first (as in from 7th
edn) implementations modified the input (at least for dirname - I doubt the
trailing / part existed back then.)   NetBSD switched to the static buffer
technique back in 2002 - before then, it also modified the input.

And finally, finally, we can always just tell POSIX to go screw itself,
we have done that in a few other places - that is certainly an option.
But unless we're going to do that, moaning about how bad it is is not
likely to help a lot.

kre


