Subject: Re: Permit loose matching of codeset names in locales
To: Curt Sampson <cjs@cynic.net>
From: SODA Noriyuki <soda@sra.co.jp>
List: tech-userlevel
Date: 09/07/2004 15:41:14
>>>>> On Tue, 7 Sep 2004 15:03:36 +0900 (JST),
	Curt Sampson <cjs@cynic.net> said:

>> In any way, it may be OK to support new Linux canonical names as
>> aliases.

> It "may be ok?" Does this mean you think we should support the
> Linux canonical names as aliases? Or you still object to it? Or you
> don't care?

I have to say I don't like the new Linux way. ;-)
But because Linux is the most major UNIX variant these days, perhaps
we have to support it anyway ;-)

i.e. I won't object to supporting the new Linux canonical names,
although I won't do it myself.

> I had the strong impression you objected to any aliases at
> all, before.

Then please look at my first reply in this thread; I said it's OK to
add an alias.

>> NetBSD is the only operating system which supports ISO2022-JP as a
>> locale....[s]o, there is no existing standard before NetBSD.

> It seems to me that there is an existing standard, and I wonder where
> this name "ISO2022-JP" came from. ISO-2022-JP is indeed a GL encoding of
> JIS X 0208, if the definition in _CJKV_ is anything to go on:

> 	GL	Graphic Left. Usually refers to an encoding
> 		whose bytes have the eighth bit turned off,
> 		such as ISO-2022.

> Given that the SCO page says that the codeset is named with:

>     An identification name for the codeset (according to the X
>     Consortium Font Charset (Registry and Encoding) Names convention),
>     such as: ``ascii'' for 7-bit ASCII, and ``ISO8859-2'' for the ISO
>     8859-2 character set.

> And when we look in that registry, GL-encoded JIS X 0208 is already
> there, and is called "JISX0208.1990-0", what on earth are NetBSD folks
> doing inventing new names for this? Sure, the old name is inaccurate,
> but you can argue that to a lesser degree for a lot of the other names,
> and you're saying we shouldn't change those, or even accept more
> accurate alternatives for user input only.

You are still confused here.

- ISO2022-JP = JIS X0201:GL + JIS X0208.
  So, using JISX0208 for ISO2022-JP is just wrong.
  Theoretically there may be a ja_JP.JISX0208 locale which only includes
  so-called "zenkaku" characters, so using the name for ISO2022-JP
  prevents that possibility. (Although it's just a theoretical
  possibility and actually no one is going to use it.)

- All X Consortium Font Charsets are "coded character sets", so they
  cannot be used for any "compound character encoding scheme" like
  ISO2022-JP. The two are simply different things.
  Such a name can be used for a "simple character encoding scheme" of
  a single-byte coded character set like "ISO8859-2", though.
  If you don't know the exact meaning of this terminology, please read
  "Unicode Technical Report #17".
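
  As a rough illustration of the distinction, the sketch below encodes
  a mixed string and looks for the escape sequences that switch between
  the component character sets. It uses Python's built-in "iso2022_jp"
  codec purely as a convenient stand-in (an assumption for this example;
  it is not NetBSD's locale code), and the helper and constant names are
  made up for the example.

```python
# Illustrative sketch only: Python's built-in "iso2022_jp" codec is used as
# a stand-in for an ISO-2022-JP implementation; it is not NetBSD's code.
ESC_TO_JISX0208 = b"\x1b$B"  # escape sequence designating JIS X 0208-1983
ESC_TO_ASCII = b"\x1b(B"     # escape sequence designating ASCII


def encode_iso2022jp(text: str) -> bytes:
    """Encode text with the compound ISO-2022-JP encoding scheme."""
    return text.encode("iso2022_jp")


# Latin "a" followed by hiragana "a" (U+3042): two different coded
# character sets inside one byte stream.
encoded = encode_iso2022jp("a\u3042")

# Every byte has the eighth bit off, i.e. the stream is GL-only.
uses_only_gl = all(byte < 0x80 for byte in encoded)

print(encoded)
print(ESC_TO_JISX0208 in encoded, encoded.endswith(ESC_TO_ASCII), uses_only_gl)
```

  The stream carries more than one coded character set, selected by
  escape sequences, which is why a single font charset name such as
  JISX0208 cannot describe it.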

>> Also, if you really look at the SCO document closely, you can find
>> they will use codeset names like "ISO8859-2".

> Indeed. Because that is exactly what comes from the registry document.

Yup.

>> >     http://docs.sun.com/db/doc/817-2521/6mi67tj48?a=view
>> >
>> > Apparently, "The ja_JP.PCK locale is based on PC-Kanji code (known as
>> > Shift_JIS)...." Oops! NetBSD uses "ja_JP.SJIS," and of course the new
>> > Linux standard will be the preferred MIME name, "Shift_JIS".
>> 
>> Sun was the last commercial UNIX which introduced Shift JIS support.

> I don't see how this is relevant. Just because Sun is the last, we
> should have an explicit policy that we won't accept a locale string that
> works fine on Sun?

Well, it's OK with me to add an alias for compatibility with Sun.
But as I have repeatedly said, having more than one name is a step
backward, so it's better not to add it unless users ask for it.

As far as I am aware, most Japanese people are fine with the current names.

>> You can see this fact by checking /usr/X11R6/lib/X11/locale/locale.*,
>> there is "ja_JP.SJIS" in X11R6, but there isn't "ja_JP.PCK".
>> And you will see the locale names on NetBSD are what the X Consortium
>> chose. Our names are nothing like "our own" as you are suspecting.

> If you go by that locale.aliases file (which is certainly *not*
> the file mentioned in the SCO document above), you will find that
> the X Consortium aliased "ja_JP.ISO-2022-JP" to the official name,
> "ja_JP.JIS7", and there is no "ja_JP.ISO2022-JP" at all. I could go
> on further about inconsistencies between these files and the various
> alternatives you mention, but I won't bother.

Hmm? Wasn't that alias added by Linux people recently?
The name isn't consistent with the other names in locale.dir, which
don't have "-" after "ISO".

The reason locale.alias has many inconsistencies is that the
file exists for compatibility with existing implementations.

> And I certainly can't see how it benefits a user moving from, say,
> Solaris, to be forced to go through pain and suffering because we know
> exactly what he means when he asks for "PCK" and simply refuse to let
> him use it.

As I said, I won't object to adding "PCK" as an alias, if users
ask for it.
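
For concreteness, an explicit alias of this kind could be sketched as
below. The table entries and the function name are hypothetical, chosen
only for illustration; they are not the contents of any real
locale.alias file. Note that matching is exact, so each accepted
spelling has to be listed deliberately.

```python
# Hypothetical alias table: entries are illustrative, not the contents of
# NetBSD's or X11's real alias files.
CODESET_ALIASES = {
    "PCK": "SJIS",        # Solaris name for Shift JIS
    "Shift_JIS": "SJIS",  # preferred MIME name
}


def resolve_locale(name: str) -> str:
    """Map an aliased codeset in 'lang_TERRITORY.codeset' to its canonical name."""
    base, sep, codeset = name.partition(".")
    if not sep:
        return name  # no codeset part, nothing to resolve
    # Exact lookup only: unknown or differently-cased names pass through.
    return base + "." + CODESET_ALIASES.get(codeset, codeset)


print(resolve_locale("ja_JP.PCK"))
```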

> Soda-san, it seems that your goal here is to promote your idea of how
> codesets should be named, rather than just live with the reality of how
> they are named, at the expense of making NetBSD less user-friendly.

I don't think allowing case-insensitive names makes NetBSD more
user-friendly. IMHO, it confuses users: people come to believe, wrongly,
that the official Linux codeset name is "koi8r".
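
For reference, the kind of loose matching under discussion might look
like the sketch below. The normalization rule (lowercase, then drop
everything that is not a letter or digit), the list of canonical names,
and the function names are all assumptions made for illustration; this
is not the proposed patch.

```python
# Sketch of loose codeset matching. The rule and the name list are
# assumptions for illustration, not any system's actual behaviour.
def loose_key(codeset: str) -> str:
    """Lowercase the name and drop every non-alphanumeric character."""
    return "".join(ch for ch in codeset.lower() if ch.isalnum())


CANONICAL_NAMES = ["KOI8-R", "ISO8859-2", "SJIS"]  # illustrative list
LOOKUP = {loose_key(name): name for name in CANONICAL_NAMES}


def match_codeset(user_input: str):
    """Return the canonical name matching user_input loosely, or None."""
    return LOOKUP.get(loose_key(user_input))


print(match_codeset("koi8r"))
print(match_codeset("iso-8859-2"))
print(match_codeset("PCK"))
```

With such a rule "koi8r", "KOI8-R" and "koi8_r" are all accepted, so a
user has no way to tell from what works which spelling is the
canonical one.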

Anyway, this is not only my idea, but what all UNIX variants except
Linux are currently doing.
--
soda