Subject: Re: iconv question
To: None <wiz@NetBSD.org>
From: Noriyuki Soda <soda@sra.co.jp>
List: tech-userlevel
Date: 01/21/2004 21:14:17
>>>>> On Tue, 20 Jan 2004 10:23:23 +0900 (JST),
	Masao Uebayashi <uebayasi@pultek.co.jp> said:

> On second thought, just using "utf-8" won't work; scmxx is probably
> using convert_to_internal() as a (wrong) replacement of mbstowcs().
> If this is the case, scmxx needs an overhaul, unfortunately.

Yeah.
scmxx definitely depends on non-portable assumptions such as the following:
- It assumes that wchar_t is always Unicode, or at least that wchar_t
  can represent any Unicode character.
  While this is true with glibc, it is not true with the libc of
  either NetBSD or commercial UNIX.
- It assumes that iconv() can convert a multibyte character string to
  a wide character string (or vice versa) directly.
  While this is true with GNU iconv, it is not true with the libc of
  either NetBSD or commercial UNIX. There, wcstombs() or mbstowcs()
  must be used instead of iconv().

There are two ways to deal with this problem:
1. Remove the wrong assumptions, and make it really portable.
Or,
2. Don't fix it properly, just work around it with a patch.

>> We have patches in pkgsrc that look like this:
>> -      t=convert_from_internal(nl_langinfo(CODESET),wide_str,2);
>> +      t=convert_from_internal("char",wide_str,2);
>> since that seems to have worked with the libiconv
>> package, but this doesn't work with our libiconv.
>> The nl_langinfo version doesn't seem to work either.

Supposing we take option 2, I guess the following approach may work:
- Link scmxx with GNU iconv.
- Rewrite "nl_langinfo(CODESET)" in the scmxx source as
  "gnu_iconv_codeset_name()", and provide the following implementation:

#include <langinfo.h>
#include <string.h>

char *
gnu_iconv_codeset_name(void)
{
	char *netbsd_codeset_name = nl_langinfo(CODESET);

	/*
	 * nl_langinfo(CODESET) on NetBSD is almost compatible with
	 * the codeset names for GNU iconv. But there are some
	 * exceptions. We map such codeset names here.
	 */
	if (strcmp(netbsd_codeset_name, "646") == 0) {
		return ("ANSI_X3.4-1968");
	/* XXX - perhaps some other mappings may be needed here */
	} else { /* otherwise NetBSD codeset names are compatible with GNU's */
		return (netbsd_codeset_name);
	}
}

(NOTE: I haven't really tested this code.)
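
If more mappings turn out to be needed, a table-driven variant might be
easier to extend. A minimal sketch follows; map_codeset_name() and
codeset_map are hypothetical names, and the only entry is the "646"
mapping already mentioned. This is untested on NetBSD as well.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: NetBSD codeset name -> GNU iconv codeset name. */
static const struct {
	const char *netbsd_name;
	const char *gnu_name;
} codeset_map[] = {
	{ "646", "ANSI_X3.4-1968" },
	/* XXX - other mappings, if any are needed, would go here */
};

/*
 * Return the GNU iconv name for a NetBSD codeset name, or the name
 * itself when no mapping is listed.
 */
const char *
map_codeset_name(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(codeset_map) / sizeof(codeset_map[0]); i++) {
		if (strcmp(name, codeset_map[i].netbsd_name) == 0)
			return (codeset_map[i].gnu_name);
	}
	return (name);
}
```

The wrapper would then just pass nl_langinfo(CODESET) through
map_codeset_name() and hand the result to the conversion routine.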

P.S.
Our codeset name "646" for the "C" locale is the same as on Solaris.
--
soda