Current-Users archive

readlink(1) realpath(1) and POSIX



POSIX is planning to add readlink(1) in the next version.   Nothing
special to say about that (it makes no real difference to us: we already
have it, and they will specify only the common options).

But while doing that, they looked at the -f option, and saw that the
coreutils man page says to use realpath(1) instead of readlink -f.

(They never even got as far as detecting that our readlink -f and the
coreutils readlink -f don't act the same.)

So it was asked whether other systems have realpath(1).  We do: kamil@
added it back in Feb 2020, with the comment:

   Port realpath(1) from FreeBSD

   realpath(1) wraps realpath(3) and returns resolved physical path.

   This utility shipped with GNU and FreeBSD is sometimes
   used in scripts in the wild.

It is currently in HEAD only - it will be in 10 when that gets released.

So, POSIX has more or less decided to skip the -f option of readlink,
and to require realpath(1) instead.  (realpath(3) has been in POSIX for
ages, but as an XSI option; realpath(1) won't be an option, it will just
be mandatory (probably).)

However, FreeBSD's realpath(1) (now also ours) and the coreutils realpath(1)
are substantially different beasts - the FreeBSD version is (as kamil said)
just a wrapper around realpath(3) and is quite simple.

coreutils realpath is a monstrous mess.    Fortunately, POSIX is
proposing to standardise almost none of that, just the basic functionality
that replaces readlink -f.

Unfortunately, for POSIX (and us), basic realpath (as in "realpath file")
has the same operational difference between the BSD and GNU
implementations as readlink -f has.   Ours is literally: "call realpath(3);
if it returns something, print that, otherwise it is an error".   Theirs
allows the final component of the expanded and canonicalized path not to
exist.  (Their doc does not say what "not exist" really means in the hard
cases, but from testing their implementation it is clear that if namei()
returns ENOENT for the final component, that is an allowed case; any other
error return is not.)
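For reference, the "ours" behaviour described above amounts to very little
code.  The following is a minimal sketch, not the actual NetBSD source:
realpath_strict is a hypothetical name, and the diagnostic format and the
0/1 return values are assumptions.  It relies on POSIX.1-2008 allowing a
NULL buffer to be passed to realpath(3).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Strict ("-e") behaviour, as described above for the BSD realpath(1):
 * print whatever realpath(3) returns, otherwise report an error.
 * Returns 0 on success and 1 on failure (an assumed convention).
 * realpath_strict is a hypothetical name used only for illustration.
 */
int
realpath_strict(const char *path, FILE *out)
{
	char *p;

	assert(path != NULL && out != NULL);
	/* POSIX.1-2008 permits a NULL buffer; realpath(3) then mallocs one. */
	if ((p = realpath(path, NULL)) == NULL) {
		/* Any failure at all, ENOENT included, is an error here. */
		fprintf(stderr, "realpath: %s: %s\n", path, strerror(errno));
		return 1;
	}
	fprintf(out, "%s\n", p);
	free(p);
	return 0;
}
```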

The people who use this demand that the functionality remain.  (I'm still
unclear on why: if the file is not to be created, who cares what its
canonical path would be; if it is, I would have thought that creating it
first using the known name and canonicalizing later should work ... but
they don't agree.  They say that if we want to know whether it exists, we
can canonicalise first, then test -e.  For a long time I wasn't sure how
that was a rational counter-argument; I'm still not.)

For a while I thought we could just do the following (in C, not exactly
this) if realpath($FILE) fails:
	echo "$(realpath "$(dirname "$FILE")")/$(basename "$FILE")"
(with appropriate tests for when $FILE has no '/' etc.), but that doesn't
work.  It is not just the last component of the $FILE arg that is allowed
not to exist (though that case is part of it): where that component exists
and is a symlink, the last component of its target may not exist either,
or may exist and be yet another symlink, and so on (almost) forever.

The current POSIX proposal is to specify "realpath -e" (a coreutils
option which makes theirs act just like ours) and also to invent a new -E
option, which would make ours work like theirs.   Which of the two is the
default would be unspecified, i.e. all scripts would need to use one of
those options to be portable.   The allowed result when neither option is
given is made even more bizarre to cater for a built-in realpath in mksh,
whose default (and only) behaviour is even wackier (inexplicable in some
cases) than the coreutils version.  But the mksh one takes exactly 1 arg,
the path name, and simply execs realpath from the filesystem if anything
different is passed to it, so "realpath -[Ee] file" will bypass that
implementation and run a real one instead.

I have added -E support to our realpath(1) (that is, to the .c file; I
haven't got around to the man page yet) and of course -e (which is more
or less a no-op).   For now, I have made the default be -E if neither
option is given.  That returns the same result as we currently get in
every case where we do not currently produce an error, and makes our
implementation more compatible with (the small sane part of) the
coreutils implementation.
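A sketch of the option handling, again with hypothetical names
(parse_mode, MODE_STRICT, MODE_LAX): -E is the default when neither option
appears, as chosen above, and if both are given the last one wins, which is
an assumption rather than anything the proposal specifies.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <unistd.h>

/* Hypothetical names for the two behaviours selected by -e and -E. */
enum canon_mode { MODE_STRICT, MODE_LAX };

/*
 * Parse -e and -E.  Neither given: default to -E.  Both given: the last
 * one wins (an assumption, following usual getopt(3) practice).
 * Returns the index of the first operand, or -1 for an unknown option.
 */
int
parse_mode(int argc, char *argv[], enum canon_mode *mode)
{
	int ch;

	assert(mode != NULL);
	*mode = MODE_LAX;	/* default when no -e / -E is given */
	optind = 1;		/* restart the scan, so repeated calls work */
	while ((ch = getopt(argc, argv, "eE")) != -1) {
		switch (ch) {
		case 'e':
			*mode = MODE_STRICT;	/* realpath(3) or fail */
			break;
		case 'E':
			*mode = MODE_LAX;	/* missing final component allowed */
			break;
		default:
			return -1;	/* getopt(3) has already printed a diagnostic */
		}
	}
	return optind;
}
```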

I am not proposing adding any of their myriad other useless options, with
the sole possible exception of -z (which causes their realpath to end each
output path with \0 rather than \n, making it a little safer in the
possible presence of paths containing newline chars when more than one
path arg is given).  The POSIX version (currently) will only specify
realpath working with a single, required file arg.  Our version (the
FreeBSD version) defaults to "." if no file is given, coreutils doesn't do
that, and both versions process as many file args as are given.
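The -z behaviour and the "." default mentioned above fit in a small output
loop.  This sketch uses hypothetical names (emit_paths, zero_sep), resolves
strictly via realpath(3) only, and keeps going after a failed operand while
counting errors (an assumption about how failures should be reported).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Print the canonical form of each operand, terminated by '\n', or by
 * '\0' when zero_sep is set (a hypothetical flag standing in for -z).
 * With no operands, "." is used, as in our (FreeBSD-derived) version.
 * Returns the number of operands that could not be resolved.
 */
int
emit_paths(int nargs, char *args[], int zero_sep, FILE *out)
{
	static char dot[] = ".";
	char *dflt[] = { dot };
	char *p;
	int i, errors = 0;

	assert(out != NULL && nargs >= 0);
	if (nargs == 0) {	/* no operand: act as if "." was given */
		nargs = 1;
		args = dflt;
	}
	for (i = 0; i < nargs; i++) {
		if ((p = realpath(args[i], NULL)) == NULL) {
			fprintf(stderr, "realpath: %s: %s\n", args[i],
			    strerror(errno));
			errors++;
			continue;	/* keep going with the remaining operands */
		}
		fputs(p, out);
		/* NUL cannot occur in a path name, so it is an unambiguous end. */
		putc(zero_sep ? '\0' : '\n', out);
		free(p);
	}
	return errors;
}
```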

The source file size roughly doubles with these changes, which means
about 3 times as much actual code (since about half of the current source
is boilerplate noise).

Any objections to adding this?  (The man page would come with the commit,
of course, as would some ATF tests; I will convert my current test script.)

Any opinions on whether the default (no -e or -E used) should be as
ours is now, or as coreutils is?   (My slight preference is to follow
coreutils here, as it is more compatible.)

POSIX's hope is that if we do this, FreeBSD will take the code back, and
the other BSD variants might follow, and the end result might be (mksh aside)
a reasonably consistent world.

kre