Subject: Re: mkdir with trailing / (patch proposed)
To: Matthew Orgass <darkstar@pgh.net>
From: Greg A. Woods <woods@weird.com>
List: tech-kern
Date: 05/05/2002 22:10:35
[ On Sunday, May 5, 2002 at 16:52:12 (-0400), Matthew Orgass wrote: ]
> Subject: Re: mkdir with trailing / (patch proposed) 
>
>   Well, I just took a look at SUSv3 (a.k.a. IEEE 1003.1-2001, see
> www.opengroup.org/austin).  It seems that they have resolved the ambiguity
> in the specification of trailing slashes by doing something completely
> different.  They specify that trailing slashes are to be interpreted as if
> there was a trailing '/.'.

Hmmmm....  Yes.... I see....  I find the following definition of a
pathname in SuSv3:

    A character string that is used to identify a file.  In the context
    of IEEE Std 1003.1-2001, a pathname consists of, at most, {PATH_MAX}
    bytes, including the terminating null byte.  It has an optional
    beginning slash, followed by zero or more filenames separated by
    slashes.  A pathname may optionally contain one or more trailing
    slashes.  Multiple successive slashes are considered to be the same
    as one slash.

but then in the section on "Pathname Resolution" I find:

    A pathname that contains at least one non-slash character and that
    ends with one or more trailing slashes shall be resolved as if a
    single dot character ( '.' ) were appended to the pathname.

I really didn't believe it until I read it with my own eyes -- and I
still can't believe any sane Unix person could stand for it!  Sigh.
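
To make the rule concrete, here is a minimal sketch of what it asks the
resolver to do with the string before lookup begins.  The function name
susv3_trailing_slash() and its buffer interface are hypothetical, made
up for illustration; no real kernel spells it this way, and this only
models the string rewrite, not the lookup itself.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy `path` into `out`, appending a single '.'
 * if the path contains at least one non-slash character and ends in
 * one or more slashes (the quoted SUSv3 resolution rule).
 * Returns 0 on success, -1 if `out` is too small.
 */
int
susv3_trailing_slash(const char *path, char *out, size_t outlen)
{
	size_t len = strlen(path);
	/* strspn() counts the leading run of '/'; all-slash paths match len */
	int has_nonslash = (strspn(path, "/") != len);
	int trailing = (len > 0 && path[len - 1] == '/');
	int append = (has_nonslash && trailing);
	size_t need = len + (append ? 1 : 0) + 1;	/* +1 for the NUL */

	if (need > outlen)
		return -1;
	memcpy(out, path, len);
	if (append)
		out[len++] = '.';
	out[len] = '\0';
	return 0;
}
```

Under this reading "foo/" is looked up as "foo/." while "/" and "foo"
are left alone.  That is where the trouble with rmdir("foo/") comes
from, since POSIX requires rmdir() to fail when the final component is
dot.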

>    In the rationale they claim that this simply
> resolves the ambiguity and does not break conforming application,

Well, they say a bit more than that, and it's worse than I thought:

    Pathname Resolution                                                         
                                                                                
   It is necessary to differentiate between the definition of pathname
   and the concept of pathname resolution with respect to the handling
   of trailing slashes.  By specifying the behavior here, it is not
   possible to provide an implementation that is conforming but extends
   all interfaces that handle pathnames to also handle strings that are
   not legal pathnames (because they have trailing slashes).
                                                                                
   Pathnames that end with one or more trailing slash characters must
   refer to directory paths.  Previous versions of IEEE Std 1003.1-2001
   were not specific about the distinction between trailing slashes on
   files and directories, and both were permitted.
                                                                                
   Two types of implementation have been prevalent; those that ignored          
   trailing slash characters on all pathnames regardless, and those that        
   permitted them only on existing directories.                                 
                                                                                
   IEEE Std 1003.1-2001 requires that a pathname with a trailing slash
   character be treated as if it had a trailing "/." everywhere.
                                                                                
   Note that this change does not break any conforming applications;
   since there were two different types of implementation, no
   application could have portably depended on either behavior.  This
   change does however require some implementations to be altered to
   remain compliant.  Substantial discussion over a three-year period
   has shown that the benefits to application developers outweighs the
   disadvantages for some vendors.
                                                                                
   On a historical note, some early applications automatically appended
   a '/' to every path.  Rather than fix the applications, the system
   implementation was modified to accept this behavior by ignoring any
   trailing slash.
                                                                                
   Each directory has exactly one parent directory which is represented
   by the name dot-dot in the first directory.  No other directory,
   regardless of linkages established by symbolic links, is considered
   the parent directory by IEEE Std 1003.1-2001.

> however
> it mandates an extra directory lookup which breaks compatibility with
> previous POSIX standards and all previously conforming implementations
> (i.e. rmdir("foo/") no longer works).  The fact that they seem to be
> unaware of this incompatibility and that this was a late revision makes me
> wonder if it will change again soon.

I should certainly hope it's fixed!  And soon too!

What the rationale says is so completely broken it's just not funny.
It completely changes the generic meaning of a slash and makes it more
than just a separator (though of course a sole "/" has always had a
special meaning, and POSIX, SuSv2, et al. have allowed an optional
special meaning for two, and only two, leading slashes for nearly
forever as well).

There have always been applications written for unix that have used
explicit references to the '.' file in a directory to ensure the last
component in a filename is a proper (accessible) directory.  Indeed even
the original Seventh Edition 'mkdir' command uses this trick itself when
it forms the dirname and checks it for writability.  However to
automatically do that every time a trailing slash is encountered will,
as you say, completely break many conforming

Furthermore if you test the implementation of any original UNIX you'll
find that trailing slashes are simply ignored.  Everywhere.  IEEE
1003.1-2001 definitely breaks previously conforming applications even
when they don't wish to refer to a directory!

I don't know what exactly their "historical note" could refer to.  If
anything it must refer to something prior to the Fifth Edition Unix,
since there's no functional difference between the v5 and v6 versions
of /usr/sys/ken/nami.c (only comments were added) and trailing slashes
are clearly ignored, as is documented on page 19-2 of the John Lions
Commentary reprint where line 7535 is described like this:

	Multiple slashes are acceptable!  (i.e. "////a///b/" is the same
	as "/a/b");

(Lions doesn't document the other multiple slash eater on line 7578)
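
For anyone without the Lions book handy, the effect of treating "/"
purely as a separator can be sketched like this.  This is emphatically
not the nami.c code, just a small model of its observable behaviour on
the string, and the name collapse_slashes() is made up for the
illustration:

```c
#include <stddef.h>

/*
 * Hypothetical model of "slashes are only separators": runs of
 * slashes collapse to one and trailing slashes vanish.  A leading
 * slash is kept since it selects the root as the starting point.
 * Returns 0 on success, -1 if `out` is too small.
 */
int
collapse_slashes(const char *path, char *out, size_t outlen)
{
	const char *p = path;
	size_t o = 0;

	if (outlen == 0)
		return -1;
	if (*p == '/') {			/* absolute path: keep one '/' */
		if (o + 1 >= outlen)
			return -1;
		out[o++] = '/';
	}
	for (;;) {
		while (*p == '/')		/* eat any run of separators */
			p++;
		if (*p == '\0')			/* nothing follows: done */
			break;
		if (o > 0 && out[o - 1] != '/') {	/* separate components */
			if (o + 1 >= outlen)
				return -1;
			out[o++] = '/';
		}
		while (*p != '\0' && *p != '/') {	/* copy one component */
			if (o + 1 >= outlen)
				return -1;
			out[o++] = *p++;
		}
	}
	out[o] = '\0';
	return 0;
}
```

Feeding it Lions' own example, "////a///b/", yields "/a/b", and "foo/"
yields plain "foo", with no directory test implied anywhere.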

Ritchie & Thompson's 1974 CACM paper quite clearly describes the "/" as
a separator:

	When  the  name of a file is specified to the system, it may
	be in the form of a path name, which is a sequence of direc-
	tory names separated by slashes, ``/'', and ending in a file
	name.  If the sequence  begins  with  a  slash,  the  search
	begins  in  the  root directory.  The name /alpha/beta/gamma
	causes the system to search the root  for  directory  alpha,
	then  to  search  alpha  for  beta, finally to find gamma in
	beta.  gamma may be an ordinary file, a directory, or a spe-
	cial file.  As a limiting case, the name ``/'' refers to the
	root itself.

I am also really quite stunned by their claim that the new rules are of
any benefit to application developers!  That's a load of rubbish if I
ever heard one.  Application developers who don't know about the
trailing '.' trick are really in need of a clue!
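
For the record, the trick amounts to something like the following
sketch.  The function name is_searchable_dir() and the 1024-byte buffer
are my own choices for illustration, not anything taken from an
existing program:

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper showing the explicit "/." trick: return 1 if
 * `name` is a directory that can be searched, 0 otherwise.  Looking
 * up "name/." fails with ENOTDIR when `name` is not a directory and
 * with EACCES when it cannot be searched.
 */
int
is_searchable_dir(const char *name)
{
	char buf[1024];		/* arbitrary size chosen for this sketch */
	int n = snprintf(buf, sizeof buf, "%s/.", name);

	if (n < 0 || (size_t)n >= sizeof buf)
		return 0;	/* name too long for the sketch's buffer */
	return access(buf, F_OK) == 0;
}
```

The point is that the application asks for the directory check only
when it actually wants one, instead of having it imposed by every
trailing slash.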


>   Given this situation, it may be prudent to wait until after the 1.6
> branch before making any changes to the native behavior

That would be A-OK with me.

> (if there was any
> chance of getting it in before then anyway), especially since the current
> NetBSD behavior is the closest implementation to what is specified that is
> compatible with past standards and thus might wind up being specified as
> the correct behavior.

I don't know that I would really call what NetBSD implements either
close to what is specified or compatible with past standards.  It's kind
of a half-breed, and unfortunately the face it shows for some userland
commands is also completely different than it is for the system API.

>  On the other hand, it appears that every other
> modern OS allows mkdir("foo/") to create a directory.  The compatibility
> code at least should be updated.

At the very least, and this is something that should probably be in the
next possible release (and even back ported to ongoing maintenance
releases).  It's a bug, flat out, and fixing it won't break anything.

>   So, what does NetBSD want to do with trailing slashes in native code?

:-)

> And what (if anything) should be done to bring this situation to the
> attention of the Austin folks?

Personally I'm going to look into filing something with them.
Unfortunately I'm a very long way from Reading, UK... probably further
than all the pages of all the POSIX standards laid top to bottom, even
if printed on A4 paper!  ;-)

-- 
								Greg A. Woods

+1 416 218-0098;  <gwoods@acm.org>;  <g.a.woods@ieee.org>;  <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>