tech-userlevel archive


Re: Encoding non-alphanumeric characters in manpage filenames



    Date:        Tue, 9 Nov 2021 15:10:44 +0200
    From:        Lassi Kortela <lassi%lassi.io@localhost>
    Message-ID:  <8b3cec93-e8f0-8a1d-675e-0cd8f846caef%lassi.io@localhost>

  | For present purposes, the extension of "foobar.1" is ".1" and the 
  | extension of "foobar.1.gz" is ".1.gz". We can call it "suffix" instead.

It is irrelevant what you call it.   In unix filenames they are just
characters (or, perhaps wrt Mouse's comments, octets).   Meaning is
given to them only by programs that for whatever reason desire to parse
the filenames.   There is nothing special.   Whether, in the example
above, the program interprets .1.gz or just .gz depends on which
program is involved.   (Neither the .1 nor the .gz is really required:
the .1 is implied by the file being inside man1/ and the .gz can be
determined by reading the first few bytes of the file ... these things
sometimes help humans tell what the file is about, but that's often it.)
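
To make that concrete, here is a rough sketch in Python.   It takes the
section from the directory name and detects compression from the first
two bytes of the file (gzip streams start with 0x1f 0x8b).   The helper
names are made up for illustration; this is not how any particular man
implementation does it.

```python
import gzip
import os

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file content starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def section_from_dir(path):
    """Derive the section from a parent directory named manN.

    Hypothetical helper: "/usr/share/man/man1/foobar" gives "1".
    """
    parent = os.path.basename(os.path.dirname(path))
    return parent[3:] if parent.startswith("man") else None


def read_page(path):
    """Read a page, decompressing when the content (not the name) says gzip."""
    opener = gzip.open if is_gzipped(path) else open
    with opener(path, "rb") as f:
        return f.read()
```

With these, a gzipped file stored as man1/foobar, with no suffix at all,
is still read correctly and still reported as a section 1 page.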

  | The suffix is hopefully restricted to [0-9a-z.] in all cases, and hence 
  | doesn't need to be escaped.

Don't bet on that.   That kind of assumption is doomed to eventually fail.
(And that holds even if you include A-Z in your list of acceptable chars.)

  | Escaping "." in the stem part is good practice when the name of a 
  | manpage contains a dot.

Why?

  | The manpage for "java.lang.System" in section 
  | "3java" should become something like "java%2Elang%2ESystem.3java". Then 
  | it's clear what's the suffix and what's not.

Why?

What's wrong with leaving it alone?

The man program has a list of suffixes that it looks for on names (or,
more often, adds to them) - whether the name might have some other
periods in it doesn't matter at all.   If we wanted to be able to put
a man page "foo.3" in section 6, resulting in foo.3.6 as the filename,
that might (just might, it also might not) cause an issue - but
one hopes that we don't actually want to do that.
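
A minimal sketch of that kind of suffix handling, in Python.   The suffix
list and the function names are assumptions for illustration, not taken
from any real man implementation.   The point is only that matching works
from the right-hand end of the name, so periods earlier in the name are
never examined.

```python
KNOWN_SUFFIXES = (".gz", ".bz2", ".Z")  # assumed list, illustration only


def candidates(name, section):
    """File names to try when looking up page `name` in `section`."""
    base = f"{name}.{section}"
    return [base] + [base + suffix for suffix in KNOWN_SUFFIXES]


def split_page(filename, section):
    """Recover the page name by stripping known suffixes from the right.

    Returns None if the file does not belong to `section`.
    """
    for suffix in KNOWN_SUFFIXES:
        if filename.endswith(suffix):
            filename = filename[: -len(suffix)]
            break
    tail = "." + section
    if filename.endswith(tail):
        return filename[: -len(tail)]
    return None
```

Here split_page("java.lang.System.3java.gz", "3java") gives back
"java.lang.System" unchanged, and split_page("foo.3.6", "6") gives
"foo.3".   Whether foo.3.6 is a problem depends on whether the program
already knows the section (say, from the man6/ directory) or has to
guess it from the name.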

  | The objective is to find something portable to most modern OSes:

For what purpose, what are you actually attempting to achieve?
Or is this some academic exercise?

Viewing man pages over the web is irrelevant: if a name needs to be
encoded for the URL, it will be, and then decoded by the server at
the other end.   The encoding scheme used for that is relevant only
as an example of such a scheme.

Actually copying it might be counter-productive: to pass an already
encoded man page name in a URL, the % chars in it would themselves
need to be encoded, so you'd end up with an encoded encoded
name.
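
The double encoding is easy to see with Python's standard
urllib.parse.quote, used here only as one example of a URL encoder.
The file names are the ones from the quoted message.

```python
from urllib.parse import quote, unquote

plain = "java.lang.System.3java"       # name left alone
stored = "java%2Elang%2ESystem.3java"  # name escaped on disk, as proposed

# "." is an unreserved character, so the plain name passes through as-is.
plain_in_url = quote(plain)

# "%" is not, so each "%" in the stored name becomes "%25".
stored_in_url = quote(stored)

print(plain_in_url)   # java.lang.System.3java
print(stored_in_url)  # java%252Elang%252ESystem.3java

# The server's single decode step recovers the stored file name,
# which is still the escaped form, not the page name.
assert unquote(stored_in_url) == stored
```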

kre


