Subject: Re: Permit loose matching of codeset names in locales
To: Curt Sampson <cjs@cynic.net>
From: SODA Noriyuki <soda@sra.co.jp>
List: tech-userlevel
Date: 09/07/2004 14:14:02
>>>>> On Tue, 7 Sep 2004 10:52:48 +0900 (JST),
	Curt Sampson <cjs@cynic.net> said:

>> http://www.openi18n.org/docs/text/LocNameGuide-V10.txt
>> As you see, official codeset names of locales on Linux should only
>> have uppercase letters, digits and hyphens.

> I see. This document says, for example,
> 
>     The following is a list of examples of standard values for the
>     CODESET field...."EUC-JP"...

Yeah, but note that Linux used "eucJP" as the canonical codeset name
until they made this standard, and they still support "eucJP" as
well.

>> Another reason is because the namespace used in esdb.alias is
>> different from the namespace for the canonical UNIX codeset names on
>> NetBSD. For example, you can see that the canonical name for Japanese
>> EUC in the esdb.alias is "EUC-JP", but the canonical UNIX codeset name
>> for Japanese EUC is "eucJP" on NetBSD.

> So the canonical codeset name for Linux, which you claim is now being
> brought into line with the rest of the Unix world, is "EUC-JP", 

I don't see any actual move to bring the codeset names into line with
the rest of the UNIX world. In any case, it may be OK to support the
new Linux canonical names as aliases. But that doesn't mean that
supporting every non-canonical name on Linux is a good idea.
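
As a rough sketch of the alias-only approach (not how Citrus or any
libc actually does it), an exact-match table could map the new Linux
names mentioned in this thread to the names used on NetBSD. The names
codeset_aliases and codeset_canonical are hypothetical, made up for
this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical alias table: new Linux-style codeset name on the left,
 * the name used on NetBSD on the right.  Only pairs mentioned in this
 * thread are listed.
 */
struct codeset_alias {
	const char *alias;
	const char *canonical;
};

static const struct codeset_alias codeset_aliases[] = {
	{ "EUC-JP",      "eucJP" },
	{ "ISO-8859-2",  "ISO8859-2" },
	{ "ISO-2022-JP", "ISO2022-JP" },
	{ "Shift_JIS",   "SJIS" },
};

/*
 * Return the canonical name for an exact, case-sensitive alias match.
 * Any other name is returned unchanged, so arbitrary spellings such
 * as "euc-jp" are deliberately not accepted as aliases.
 */
const char *
codeset_canonical(const char *name)
{
	size_t i;
	size_t n = sizeof(codeset_aliases) / sizeof(codeset_aliases[0]);

	assert(name != NULL);
	for (i = 0; i < n; i++) {
		if (strcmp(name, codeset_aliases[i].alias) == 0)
			return codeset_aliases[i].canonical;
	}
	return name;
}
```

With this, codeset_canonical("EUC-JP") gives "eucJP", while a
non-canonical spelling such as "euc-jp" is passed through untouched
and would simply fail to match any locale.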

IMHO, the Linux change of codeset names is dumb.
It doesn't add any useful feature for users (*1); it only changes
things which already work without problems. That is complacent.

(*1) They are not going to support the full MIME standard.
  e.g. "LANG=ja_JP.ja_JP.Extended_UNIX_Code_Packed_Format_for_Japanese"
  doesn't work even on the most recent Linux.

> yet the canonical codeset name under NetBSD, which you claim is
> already in line with the rest of the Unix world is "eucJP".

Yes, HP-UX, Tru64, Solaris and even Linux support ja_JP.eucJP.

> SCO, presumably one of these commercial variants that follows
> the "standard," says here
> 
>     http://osr5doc.sco.com:457/cgi-bin/man/man?locale+M#LOCALE_xopen
> 
> that it follows the _X Logical Font Description Conventions_, which
> nowhere gives me any clear answer on EUC-JP, but in the registry
> file (xsrc/xc/registry in our CVS repository) does offer things such
> as "JISX0208.1990-0" (apparently "GL encoding", presumably encoding
> the JIS X 0208 character set, which by most interpretations would be
> ISO-2022-JP, also known by that name in the new Linux standard, but
> called "ISO2022-JP" on NetBSD.)

You are confused here.

NetBSD is the only operating system which supports ISO2022-JP as a
locale, although ISO2022-JP is very widely used in Japan (nearly all
mail and NetNews still use ISO2022-JP). So there was no existing
standard before NetBSD.
You may think ja_JP.JIS7 in X11R6 is the existing standard. But JIS7
is different from ISO2022-JP, because JIS7 includes JISX0201.1976-0:GR 
but ISO2022-JP doesn't.
(FWIW, there are no UNIX variants which support ja_JP.JIS7 either.
 Citrus is the only locale implementation which can support stateful
 encodings like ISO2022-JP and JIS7.)

Also, if you really look at the SCO document closely, you can find
that they use codeset names like "ISO8859-2". This value is what we
are using now, and what Linux used to use as its canonical codeset
name. But the new Linux standard says they are going to use
"ISO-8859-2" instead of "ISO8859-2".
I.e., the new convention on SCO is compatible with NetBSD, but not
compatible with the new Linux convention.

I think you now see why Citrus chose "ISO2022-JP" instead of
"ISO-2022-JP": existing locales didn't have "-" after "ISO".

>     http://docs.sun.com/db/doc/817-2521/6mi67tj48?a=view
>
> Apparently, "The ja_JP.PCK locale is based on PC-Kanji code (known as
> Shift_JIS)...." Oops! NetBSD uses "ja_JP.SJIS," and of course the new
> Linux standard will be the preferred MIME name, "Shift_JIS".

Sun was the last commercial UNIX vendor to introduce Shift JIS support.
Every other UNIX variant (HP-UX, Tru64, NEWS-OS, ...) uses
"ja_JP.SJIS", which is the same value as on NetBSD. (NEWS-OS was the
first OS to support "ja_JP.SJIS"; that was more than 10 years ago.)

You can see this by checking /usr/X11R6/lib/X11/locale/locale.*:
there is "ja_JP.SJIS" in X11R6, but there isn't "ja_JP.PCK".
And you will see that the locale names on NetBSD are the ones the
X Consortium chose. Our names are not "our own" as you suspect.
--
soda