Subject: Re: Permit loose matching of codeset names in locales
To: SODA Noriyuki <>
From: Curt Sampson <>
List: tech-userlevel
Date: 09/07/2004 10:52:48
On Tue, 7 Sep 2004, SODA Noriyuki wrote:

> As you see, official codeset names of locales on Linux should only
> have uppercase letters, digits and hyphens.

I see. This document says, for example,

    The following is a list of examples of standard values for the
    CODESET field...."EUC-JP"...

> Another reason is because the namespace used in esdb.alias is
> different from the namespace for the canonical UNIX codeset names on
> NetBSD. For example, you can see that the canonical name for Japanese
> EUC in the esdb.alias is "EUC-JP", but the canonical UNIX codeset name
> for Japanese EUC is "eucJP" on NetBSD.

So the canonical codeset name for Linux, which you claim is now being
brought into line with the rest of the Unix world, is "EUC-JP", yet the
canonical codeset name under NetBSD, which you claim is already in line
with the rest of the Unix world, is "eucJP".

SCO, presumably one of these commercial variants that follows
the "standard," says here

that it follows the _X Logical Font Description Conventions_, which
nowhere gives me any clear answer on EUC-JP, but in the registry
file (xsrc/xc/registry in our CVS repository) does offer things such
as "JISX0208.1990-0" (apparently "GL encoding", presumably encoding
the JIS X 0208 character set, which by most interpretations would be
ISO-2022-JP, also known by that name in the new Linux standard, but
called "ISO2022-JP" on NetBSD.)

But maybe SCO is weird; let's check out Sun:

Apparently, "The ja_JP.PCK locale is based on PC-Kanji code (known as
Shift_JIS)...." Oops! NetBSD uses "ja_JP.SJIS", and of course the new
Linux standard will be the preferred MIME name, "Shift_JIS".
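
To make the comparison concrete, here is a minimal sketch of what "loose
matching" could mean for the names quoted above. It assumes the rule is
simply "fold case and ignore anything that is not a letter or digit"; the
helper names loose_key and loosely_equal are hypothetical, and this is not
NetBSD's code or any proposed patch.

```python
def loose_key(codeset: str) -> str:
    """Fold case and drop every character that is not a letter or digit."""
    return "".join(ch for ch in codeset.lower() if ch.isalnum())


def loosely_equal(a: str, b: str) -> bool:
    """True when two codeset names differ only in case or punctuation."""
    return loose_key(a) == loose_key(b)


# Pairs of spellings taken from the names quoted in this message.
pairs = [
    ("EUC-JP", "eucJP"),
    ("ISO-2022-JP", "ISO2022-JP"),
    ("Shift_JIS", "SJIS"),
    ("Shift_JIS", "PCK"),
]

for a, b in pairs:
    print(f"{a:12} {b:12} {loosely_equal(a, b)}")
```

Under that assumed rule the EUC-JP and ISO-2022-JP spellings compare equal,
but "Shift_JIS", "SJIS" and "PCK" do not; those would still need an explicit
alias table.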

I have to say, I'm not seeing a heck of a lot of consistency here. This
makes me doubt your arguments about compatibility.

Curt Sampson  <>   +81 90 7737 2974
     Make up enjoying your city life...produced by BIC CAMERA