tech-userlevel: Re: Permit loose matching of codeset names in locales

Subject: Re: Permit loose matching of codeset names in locales
To: Curt Sampson <cjs@cynic.net>
From: Noriyuki Soda <soda@sra.co.jp>
List: tech-userlevel
Date: 09/05/2004 13:00:46
>>>>> On Sun, 5 Sep 2004 11:46:48 +0900 (JST),
	Curt Sampson <cjs@cynic.net> said:

>> Are you typing charset names regularly?
>> If so, I think your environment is somewhat unusual.

> Please, these are *character set encodings*, not character sets. JIS X
> 208 is a character set. It is represented in many encodings, such as
> ISO-2022-JP, EUC-JP, Shift_JIS and UTF-8.

If you want to use precise wording, "character set encoding" isn't
good either.
Please use "character encoding scheme" (CES), or "coded character
set" (CCS) if you mean a character set.

I use "charset names" to mean MIME charset names, and "codeset names"
to mean UNIX codeset names in locales. So "charset names" in my text
above was just a typo for "codeset names". ;-)
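
To make the CCS/CES distinction concrete, here is a minimal sketch in
Python. It encodes one character that JIS X 0208 includes (HIRAGANA
LETTER A) with four encoding schemes. The names in the list are
Python's own codec names, used only for this illustration; they are
neither UNIX codeset names nor MIME charset names.

```python
# One character, several character encoding schemes (CES).
text = "\u3042"  # HIRAGANA LETTER A

# Python codec names (an assumption of this sketch, not locale names).
encodings = ["iso2022_jp", "euc_jp", "shift_jis", "utf_8"]

# Encode the same character with each scheme.
encoded = {name: text.encode(name) for name in encodings}

for name, data in encoded.items():
    # The byte sequences differ, but each decodes to the same character.
    print(name, data.hex())
    assert data.decode(name) == text
```

The character is the same in every case; only the byte sequence
produced by each encoding scheme differs.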

> I'm not setting LANG, locale, etc. regularly by hand, no. I have three
> main concerns:
> 
>     1. There is a standard set of encoding names that is widely used.
>     Let's use that, rather than be different for no good reason.

There are several good reasons to keep the current codeset names
instead of switching to MIME charset names:

	- Current codeset names conform to the convention which is
	  widely used in UNIX based operating systems.
	- Current codeset names are case sensitive, which conforms
	  to UNIX convention.
	  If we conformed to MIME charset names, the names would
	  have to be case insensitive, which isn't what people
	  usually expect in a UNIX environment.
	- Some third party applications parse locale names directly;
	  if we changed our codeset names, the change would break
	  such applications.
	- Compatibility with existing installations.
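
As a rough illustration of the third point, here is a minimal sketch
in Python of an application that parses locale names itself. It is
not the code of any actual libc or application: split_locale,
loose_key, strict_match and loose_match are hypothetical names made
up for this example, and the loose rule (drop non-alphanumerics, fold
case) is only one possible rule, assumed here for illustration.

```python
import re


def split_locale(name):
    """Split 'language[_territory][.codeset][@modifier]' into its parts."""
    name, _, modifier = name.partition("@")
    name, _, codeset = name.partition(".")
    language, _, territory = name.partition("_")
    return language, territory, codeset, modifier


def loose_key(codeset):
    """Hypothetical loose key: drop non-alphanumerics and fold case."""
    return re.sub(r"[^0-9A-Za-z]", "", codeset).lower()


def strict_match(a, b):
    """What a program does if it compares codeset names byte for byte."""
    return a == b


def loose_match(a, b):
    """Comparison under the assumed loose rule."""
    return loose_key(a) == loose_key(b)


# Example spellings: a UNIX-style codeset name and a MIME-style one.
unix_style = split_locale("ja_JP.eucJP")[2]
mime_style = split_locale("ja_JP.EUC-JP")[2]

print(strict_match(unix_style, mime_style))  # exact comparison
print(loose_match(unix_style, mime_style))   # loose comparison
```

A program that compares the codeset part exactly, as strict_match
does, stops recognizing the locale as soon as the spelling changes,
even if the system's own locale lookup were made to match loosely.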

>     2. I use one set of configuration files for my accounts across
>     several operating systems. The less I have to do stuff like
>
> 	case `uname` in
> 	    NetBSD) ...
> 	    Linux) ...
> 
>     the happier I am.

Actually, the codeset names currently used in NetBSD are also supported
on Linux, so you don't need such constructs.
I.e., this isn't a reason to change our way.

>     3. If we can be compatible with Linux, particularly, and to a lesser
>     degree with other popular OSes such as Solaris, without too much
>     difficulty, we should be. That way it's easier to sell using NetBSD
>     rather than Linux.

Our current codeset names are fully compatible with both Linux and
commercial UNIX variants.
If we made our codeset names conform to the new Linux standards,
the names would be compatible only with Linux, not with commercial
UNIX variants.
So the current codeset names are clearly better for compatibility.

>> Our codeset names aren't such "our own" things.
>> Our codeset names conform to existing UNIX conventions as far as
>> possible, so our current names are just exactly compatible with most
>> commercial UNIX variants.
>> 
>> Changing this is rather "our own" thing, because it makes ours
>> different from existing UNIX conventions.

> If we're compatible with commercial Unix implementations, that's good.
> This change would not make us incompatible with that.

I don't think so.
There is no actual benefit to the change; it only increases confusion,
IMHO. Fewer variants are better for codeset naming.
--
soda