Subject: Re: iconv and conversion from/to local charset and wchar_t
To: Dave Huang <khym@azeotrope.org>
From: Noriyuki Soda <soda@sra.co.jp>
List: tech-userlevel
Date: 01/31/2004 02:25:49
>>>>> On Fri, 30 Jan 2004 09:44:11 -0600, Dave Huang <khym@azeotrope.org> said:

> I don't think nl_langinfo(CODESET) is portable--as far as I can tell,
> there's no standard for either the codeset names returned by
> nl_langinfo, or the acceptable codesets for iconv. I ran into a
> problem with an older version of scmxx (see
> <http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=20017>) where
> NetBSD's nl_langinfo(CODESET) would return "646" for ASCII and
> "ISO8859-1" for ISO 8859-1.  However, GNU iconv was looking for
> "ASCII" or "ISO646-US" for ASCII, and "ISO-8859-1" for ISO 8859-1.

That's because you combined libraries from different vendors.
NetBSD-1.6.x doesn't have a native iconv() library, so your choice is
the only option for the NetBSD-1.6.x branch. In that case, a
codeset-name-conversion function like the one I described in [1] is
needed to combine functions from different vendors.
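
A minimal sketch of such a mapping, assuming NetBSD-1.6's libc for
nl_langinfo() and GNU libiconv for iconv(). The function name
map_codeset_for_gnu_iconv() and its table are hypothetical (this is not
the gnu_iconv_codeset_name() from [1]), and the table lists only the two
name pairs mentioned in your mail; a real one needs many more entries.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: NetBSD-1.6 nl_langinfo(CODESET) names on the
 * left, the names GNU libiconv expects on the right.  Only the two
 * pairs from the mail are listed here. */
static const struct {
    const char *netbsd_name;
    const char *gnu_name;
} codeset_map[] = {
    { "646",       "ASCII" },
    { "ISO8859-1", "ISO-8859-1" },
};

/* Translate a NetBSD-1.6 codeset name into the GNU libiconv spelling.
 * Unknown names are returned unchanged, so iconv_open() can still try
 * them as they are. */
static const char *
map_codeset_for_gnu_iconv(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(codeset_map) / sizeof(codeset_map[0]); i++)
        if (strcmp(name, codeset_map[i].netbsd_name) == 0)
            return codeset_map[i].gnu_name;
    return name;
}
```

Usage would be iconv_open(map_codeset_for_gnu_iconv(nl_langinfo(CODESET)),
"UCS-4"), linking with -liconv since iconv() is not in libc there.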

But NetBSD-current, Linux and the commercial UNIXes have iconv() and
nl_langinfo() in the same library (libc). In this case, you can assume
that the codeset names used by iconv() and nl_langinfo() are consistent.

From the SUSv3 point of view, NetBSD-1.6.x is incomplete, so it is
better to treat it as a special case.

Still, that doesn't mean NetBSD-1.6.x shouldn't be supported by scmxx.
So you are right: what I wrote in [2] isn't enough (I forgot to
mention that it doesn't apply to NetBSD-1.6).

In other words, the following statement of mine isn't quite right:

> iconv_open(nl_langinfo(CODESET), "UCS-4") should work on NetBSD.

It should read as follows:

  iconv_open(nl_langinfo(CODESET), "UCS-4") should work on NetBSD-current.

So, to make scmxx work on both NetBSD-1.6 and NetBSD-current, code
like the following is needed to convert a UCS-4 string to the
locale-dependent encoding (iconv_open() takes the target codeset first):

	iconv_open(current_locale_codeset(), "UCS-4")

The function current_locale_codeset() returns:
	on NetBSD-1.6:
		the result of the gnu_iconv_codeset_name() function
		I described in [1]
	on Linux, NetBSD-current and the commercial UNIXes:
		nl_langinfo(CODESET)
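
In C, that dispatch could be a compile-time switch, sketched below.
USE_SEPARATE_GNU_LIBICONV is a hypothetical configure-time macro meaning
"iconv() comes from GNU libiconv, not from libc" (the NetBSD-1.6 case),
and the prototype given for gnu_iconv_codeset_name() is an assumption;
the real helper from [1] may have a different signature.

```c
#include <iconv.h>
#include <langinfo.h>

#ifdef USE_SEPARATE_GNU_LIBICONV
/* Assumed prototype for the name-mapping helper described in [1]. */
const char *gnu_iconv_codeset_name(const char *netbsd_name);
#endif

/* Codeset name of the current locale, spelled the way the iconv()
 * in use expects it. */
static const char *
current_locale_codeset(void)
{
#ifdef USE_SEPARATE_GNU_LIBICONV
    /* NetBSD-1.6: libc names must be translated for GNU libiconv. */
    return gnu_iconv_codeset_name(nl_langinfo(CODESET));
#else
    /* iconv() and nl_langinfo() share one libc: names already agree. */
    return nl_langinfo(CODESET);
#endif
}

/* Open a UCS-4 -> locale-codeset converter on either kind of system. */
static iconv_t
open_ucs4_to_locale(void)
{
    return iconv_open(current_locale_codeset(), "UCS-4");
}
```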
	
[1] http://mail-index.netbsd.org/tech-userlevel/2004/01/21/0002.html
[2] http://mail-index.netbsd.org/tech-userlevel/2004/01/31/0000.html
--
soda