tech-userlevel archive


Re: Encoding non-alphanumeric characters in manpage filenames



   | The point is, man(1) has to find the underlying file.

You want to think of this from the encoding point of view, rather than
decoding.  There should be a canonical encoding, for any given name,
which is what man(1) would be looking for.   Any files that happen to
exist which would decode to the same thing, but aren't canonically encoded
would simply not be found.   As long as you apply the same canonical encoding
routine all the time (when creating files, and when looking for one) there
is no issue at all.   If someone decides to ignore that and create other
files, they still can, they just don't work properly.   Tough.

Exactly.

For the sake of backward compatibility, I suggest also searching for the form where no bytes are escaped. If we don't do that, the NetBSD install scripts that deal with all the Perl manpages have to be edited. (And any pkgsrc packages installing manpages with weird chars have to be special-cased for operating systems whose man(1) program uses the escaping.)
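
To make the fallback concrete, here is a rough sketch of the lookup order I have in mind. Everything in it is an assumption for illustration: `find_manpage` is a made-up name, the `manN/name.N` layout is simplified (no cat pages, no compressed files, no per-architecture subdirectories), and the encoder is passed in as a parameter because its exact form is still undecided.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional


def find_manpage(
    name: str,
    section: str,
    dirs: Iterable[str],
    encode: Callable[[str], str],
) -> Optional[Path]:
    """Look for the canonically encoded file first, then the unescaped one.

    Hypothetical helper: `encode` stands in for whatever canonical
    encoding routine is eventually agreed on.
    """
    candidates = [encode(name)]
    # Backward compatibility: also try the name with no bytes escaped,
    # unless it is identical to the encoded form or cannot be a single
    # path component.
    if name not in candidates and "/" not in name:
        candidates.append(name)

    for directory in dirs:
        for candidate in candidates:
            path = Path(directory) / f"man{section}" / f"{candidate}.{section}"
            if path.is_file():
                return path
    return None
```

Within each directory the canonical name wins over the unescaped one. Whether a canonical file in a later directory should beat an unescaped file in an earlier one is a separate choice; this sketch keeps the directory order in charge.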

The canonical encoding form can be whatever floats your boat, but I'd
suggest most chars represent themselves, except when that doesn't work,
and in that case we encode that char (and of course, as in my mistaken
interpretation of your question, always encode the magic char).

Ideally we'd come up with a convention that could also be used as-is on Windows by Cygwin, Msys, and WSL (Windows Subsystem for Linux). Being able to use the same encoding in web URLs would be even nicer. That leaves quite a restricted set of portable characters. Little more than:

    A-Z a-z 0-9 _ -

You could add to that a few characters like '@', but that may not be worth it since those appear more commonly as metacharacters in various syntaxes than they do in manpage names.
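
As a sketch of what an encoder restricted to that set could look like: the code below assumes, purely for illustration, that `_` is the magic character and that every other byte is written as `_` followed by two uppercase hex digits of its UTF-8 encoding. Neither choice has been agreed on in this thread, and the function names are invented.

```python
import re
import string

ESCAPE = "_"  # assumed magic character; not something this thread has settled
# Characters that represent themselves. The magic character is left out
# because it must always be encoded.
SAFE = frozenset(string.ascii_letters + string.digits + "-")

_ESCAPED = re.compile(re.escape(ESCAPE).encode("ascii") + rb"([0-9A-Fa-f]{2})")


def encode_name(name: str) -> str:
    """Return the canonical encoded form of a manpage name."""
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)
        else:
            out.append(f"{ESCAPE}{byte:02X}")
    return "".join(out)


def decode_name(encoded: str) -> str:
    """Reverse encode_name(); also accepts non-canonical spellings."""
    raw = _ESCAPED.sub(
        lambda m: bytes([int(m.group(1), 16)]), encoded.encode("ascii")
    )
    return raw.decode("utf-8")


def is_canonical(encoded: str) -> bool:
    """True only if `encoded` is exactly what encode_name() would produce."""
    try:
        return encode_name(decode_name(encoded)) == encoded
    except UnicodeError:
        return False
```

With these assumptions, `encode_name("File::Spec")` gives `File_3A_3ASpec`. A file named `File_3a_3aSpec` decodes to the same name but fails `is_canonical()`, so a man(1) that only ever looks for the canonical form would never find it.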
