tech-userlevel archive


Re: Encoding non-alphanumeric characters in manpage filenames



    Date:        Tue, 9 Nov 2021 13:20:38 +0200
    From:        Lassi Kortela <lassi%lassi.io@localhost>
    Message-ID:  <b7ff011c-8ba2-642f-4219-2caa433cc018%lassi.io@localhost>

  | "." is used to split the filename extension from the stem part, so 
  | ideally "." would be escaped in the stem.

Why?   (Aside from perhaps a '.' as the leading char).

Unix filenames have no "extension" - that's a concept from other
operating systems, and we should not pay that any regard at all.
A dot in a filename is simply a dot (though a leading one is
given some special status by some utilities).

Some utilities give specific meanings to filenames that end with
specific characters (like RCS likes ",v" endings) - but such
conventions belong to the utilities that implement them (having a
section name at the end of a man page is one like that - though it
is truly redundant: we don't really need the "1" twice in man1/cat.1,
man1/cat would have worked just fine).

But I know how hard it can be to escape from ingrained "knowledge"
gained from use of other systems, and, since what unix does looks
"kind of like that", how easy it is to assume that it is in reality
just like that.


In all of this you need to decide exactly what the objective really is.
The encoding method chosen will vary wildly depending upon what that
is.

If the objective is to be portable to other systems, then some of those
impose a 6+3 naming rule, with a very limited char set (upper case letters,
digits, and a couple of other chars for some) - the only way to encode
anything reasonable for those would be to hash the original filename into
a 24 bit value, then use the name that that 24 bit value expands into.
And hope for no collisions (but 24 bits is 16 million, so you'd need nearly
5 thousand names in the same directory before the probability of a collision
gets above 50%).
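
A rough sketch of that idea, to make the arithmetic concrete. Everything
named here is an assumption for illustration only: the choice of SHA-256
truncated to 24 bits, the expansion into 6 upper case hex digits, and the
function names short_name and collision_probability. The probability uses
the usual birthday approximation, 1 - exp(-n(n-1)/2N), which puts the 50%
point somewhat under 5 thousand names for N = 2^24.

```python
import hashlib
import math


def short_name(original: str) -> str:
    """Map any filename to 6 upper case hex digits via a 24 bit hash.

    Assumption: SHA-256 truncated to its first 3 bytes; any well mixed
    hash would do just as well.
    """
    digest = hashlib.sha256(original.encode("utf-8")).digest()
    value = int.from_bytes(digest[:3], "big")  # keep only 24 bits
    return f"{value:06X}"


def collision_probability(n: int, bits: int = 24) -> float:
    """Birthday approximation: chance that n names include a hash collision."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * 2 ** bits))


if __name__ == "__main__":
    print(short_name("std::vector.3"))
    for n in (1000, 4096, 5000):
        print(n, round(collision_probability(n), 3))
```

Note that the mapping is one way: a separate table (or the page itself)
has to carry the original name, since it cannot be recovered from the hash.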

If you're just going to aim for some other systems, then you'd have
to justify why those, and not others.

If the objective is to make reasonable, easy to manipulate (if perhaps
ugly to look at while encoded) names for unix systems, then there's a
totally different mindset when looking for an encoding method, and you
can easily find something where 99% of all real life man pages encode
into themselves (encoding changes nothing) which is, for this purpose,
ideal.
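
As an illustration only (not a proposal for the actual scheme), here is
one encoding with that property. The safe character set, the use of '%'
as the escape character, the special treatment of a leading '.', and the
names encode and decode are all assumptions made for this sketch.

```python
import string

# Assumed safe set: characters that pass through unchanged.
SAFE = set(string.ascii_letters + string.digits + "._-+")
ESC = "%"  # assumed escape character; deliberately not in SAFE


def encode(name: str) -> str:
    """Escape every byte outside SAFE (and a leading '.') as %XX."""
    out = []
    for i, byte in enumerate(name.encode("utf-8")):
        ch = chr(byte)
        if ch in SAFE and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            out.append(f"{ESC}{byte:02X}")
    return "".join(out)


def decode(encoded: str) -> str:
    """Reverse encode(); assumes the input was produced by encode()."""
    raw = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == ESC:
            raw.append(int(encoded[i + 1:i + 3], 16))
            i += 3
        else:
            raw.append(ord(encoded[i]))
            i += 1
    return raw.decode("utf-8")


if __name__ == "__main__":
    for page in ("cat.1", "std::vector.3", ".hidden.1"):
        print(page, "->", encode(page))
```

Ordinary names like cat.1 come out unchanged, and since the escape
character is itself escaped, decoding is unambiguous.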

What you shouldn't be attempting to do is solve all of the issues, and
generate something that is a panacea.   That way lies madness.

kre


