Subject: Re: Permit loose matching of codeset names in locales
To: SODA Noriyuki <soda@sra.co.jp>
From: Curt Sampson <cjs@cynic.net>
List: tech-userlevel
Date: 09/07/2004 15:03:36

On Tue, 7 Sep 2004, SODA Noriyuki wrote:

> I don't see there is any actual action that the codeset names are
> being brought into the rest of the UNIX world.

Ok. So we agree that the canonical codeset names under Linux and "the
rest of the world" are different (assuming that this proposal goes
through).

> In any way, it may be OK to support new Linux canonical names as
> aliases.

It "may be ok?" Does this mean you think we should support the
Linux canonical names as aliases? Or you still object to it? Or you
don't care? I had the strong impression you objected to any aliases at
all, before.

> > SCO, presumably one of these commercial variants that follows
> > the "standard," says here
> >
> >     http://osr5doc.sco.com:457/cgi-bin/man/man?locale+M#LOCALE_xopen
> >
> > that it follows the _X Logical Font Description Conventions_, which
> > nowhere gives me any clear answer on EUC-JP, but in the registry
> > file (xsrc/xc/registry in our CVS repository) does offer things such
> > as "JISX0208.1990-0" (apparently "GL encoding", presumably encoding
> > the JIS X 0208 character set, which by most interpretations would be
> > ISO-2022-JP, also known by that name in the new Linux standard, but
> > called "ISO2022-JP" on NetBSD.)
>
> You are confused here.
>
> NetBSD is the only operating system which supports ISO2022-JP as a
> locale....[s]o, there is no existing standard before NetBSD.

It seems to me that there is an existing standard, and I wonder where
this name "ISO2022-JP" came from. ISO-2022-JP is indeed a GL encoding of
JIS X 0208, if the definition in _CJKV_ is anything to go on:

	GL	Graphic Left. Usually refers to an encoding
		whose bytes have the eighth bit turned off,
		such as ISO-2022.

Given that the SCO page says that the codeset is named with:

    An identification name for the codeset (according to the X
    Consortium Font Charset (Registry and Encoding) Names convention),
    such as: ``ascii'' for 7-bit ASCII, and ``ISO8859-2'' for the ISO
    8859-2 character set.

And when we look in that registry, GL-encoded JIS X 0208 is already
there, and is called "JISX0208.1990-0", what on earth are NetBSD folks
doing inventing new names for this? Sure, the old name is inaccurate,
but you can argue that to a lesser degree for a lot of the other names,
and you're saying we shouldn't change those, or even accept more
accurate alternatives for user input only.

> Also, if you really look at the SCO document closely, you can find
> they will use codeset names like "ISO8859-2".

Indeed. Because that is exactly what comes from the registry document.

> >     http://docs.sun.com/db/doc/817-2521/6mi67tj48?a=view
> >
> > Apparently, "The ja_JP.PCK locale is based on PC-Kanji code (known as
> > Shift_JIS)...." Oops! NetBSD uses "ja_JP.SJIS," and of course the new
> > Linux standard will be the preferred MIME name, "Shift_JIS".
>
> Sun was the last commercial UNIX which introduced Shift JIS support.

I don't see how this is relevant. Just because Sun is the last, we
should have an explicit policy that we won't accept a locale string that
works fine on Sun?

> You can see this fact by checking /usr/X11R6/lib/X11/locale/locale.*,
> there is "ja_JP.SJIS" in X11R6, but there isn't "ja_JP_JP.PCK".
> And you will see the locale names on NetBSD are what X Consortium
> chose. Our names are nothing like "our own" as you are suspecting.

If you go by that locale.aliases file (which is certainly *not*
the file mentioned in the SCO document above), you will find that
the X Consortium aliased "ja_JP.ISO-2022-JP" to the official name,
"ja_JP.JIS7", and there is no "ja_JP.ISO2022-JP" at all. I could go
on further about inconsistencies between these files and the various
alternatives you mention, but I won't bother.

While I can buy your interpretations of "official" names of codesets,
I can't find anything that makes it significantly stronger than other
possible interpretations.

And I certainly can't see how it benefits a user moving from, say,
Solaris, to be forced to go through pain and suffering because we know
exactly what he means when he asks for "PCK" and simply refuse to let
him use it. Nor can I see how NetBSD benefits when that user gives up
and goes to Linux, which does the odd bit here and there to attempt to
make his life easier.
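
To make concrete what accepting such names "for user input only" could
look like, here is a minimal sketch in Python. It is an illustration
only, not anything in the NetBSD tree: the function names, the
normalization rule (fold case, drop "-" and "_") and the alias table are
all hypothetical, and the only codeset names used are ones mentioned in
this message.

```python
# Hypothetical sketch of loose codeset matching; not NetBSD code.

# Canonical names this message says NetBSD uses.
CANONICAL = ("SJIS", "ISO2022-JP")

# Assumed alias table, keyed by normalized spelling.
ALIASES = {
    "pck": "SJIS",       # Solaris name for Shift JIS
    "shiftjis": "SJIS",  # MIME name "Shift_JIS"
}


def normalize(name):
    """Fold case and drop '-' and '_' so spelling variants compare equal."""
    return "".join(ch for ch in name.lower() if ch not in "-_")


def resolve_codeset(name):
    """Return the canonical codeset for a user-supplied name, or None."""
    key = normalize(name)
    for canon in CANONICAL:
        if normalize(canon) == key:
            return canon
    return ALIASES.get(key)


def resolve_locale(locale):
    """Rewrite the codeset part of 'lang.codeset'; leave the rest alone."""
    lang, sep, codeset = locale.partition(".")
    if not sep:
        return locale
    canon = resolve_codeset(codeset)
    return locale if canon is None else lang + "." + canon
```

With a table like this, "ja_JP.PCK" and "ja_JP.Shift_JIS" would both
resolve to "ja_JP.SJIS", and "ja_JP.ISO-2022-JP" to "ja_JP.ISO2022-JP",
while anything unrecognised is passed through unchanged. The canonical
names themselves are never altered; only what the user is allowed to
type is widened.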

Soda-san, it seems that your goal here is to promote your idea of how
codesets should be named, rather than just live with the reality of how
they are named, at the expense of making NetBSD less user-friendly.

I think I've given ample evidence that the naming is neither as
clear-cut nor as standard as you claim it is. I have registered my
opinion; I will leave it to others to decide how they feel. This is
my last comment on the topic.

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.NetBSD.org
     Make up enjoying your city life...produced by BIC CAMERA