Subject: Re: iconv and conversion from/to local charset and wchar_t
To: Hendrik Sattler <email@example.com>
From: Noriyuki Soda <firstname.lastname@example.org>
Date: 01/31/2004 00:04:35
>>>>> On Thu, 29 Jan 2004 18:53:03 +0100,
Hendrik Sattler <email@example.com> said:
> I am currently programming with GNU libiconv on Debian GNU/Linux. I was told
> that the NetBSD iconv implementation does some things a bit differently.
> Mainly, I am interested in some details about easily converting
> characters from local input to a UCS-4 encoded string.
> To do this, GNU libiconv suggests (for iconv_open()) using "" (empty string)
> or "char" for conversion from/to charset as defined by the current locale
> (char*), and "wchar_t" for conversion from/to wchar_t*.
> Another method for the encoding of the string inside a char* that was read
> from the system may be to use nl_langinfo(CODESET).
> What is the suggested method for NetBSD's iconv implementation?
> Currently, I do a runtime check like
> and look at the return value _not_ being (iconv_t)-1.
> Will this work with NetBSD's iconv implementation?
I don't think you need the runtime check, because nl_langinfo(CODESET)
should work on all systems, including Linux, NetBSD and the
commercial UNIX variants.
The "" and "char" names are a GNU-specific extension to iconv_open(3),
and this extension isn't really essential, because nl_langinfo(CODESET)
does the same thing in a portable way.
So, iconv_open("UCS-4", nl_langinfo(CODESET)) should work on NetBSD
(iconv_open(3) takes the target encoding first, then the source).
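A minimal sketch of this approach, assuming a POSIX iconv(3) whose
second argument is declared as char ** (some systems declare it as
const char **). The helper name locale_to_ucs4 is hypothetical, and the
byte order of the output is whatever the iconv implementation chooses
for "UCS-4":

```c
#include <iconv.h>
#include <langinfo.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: convert a NUL-terminated string in the current
 * locale's charset to UCS-4.  Returns the number of 4-byte units
 * written to out, or (size_t)-1 on error.
 */
size_t locale_to_ucs4(const char *in, uint32_t *out, size_t out_units)
{
    /* iconv_open() takes the target encoding first, then the source. */
    iconv_t cd = iconv_open("UCS-4", nl_langinfo(CODESET));
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* The cast is needed where iconv() is declared with char **. */
    char *inp = (char *)in;
    size_t inleft = strlen(in);
    char *outp = (char *)out;
    size_t outleft = out_units * sizeof(uint32_t);

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    /* Units written = capacity minus the units still free. */
    return out_units - outleft / sizeof(uint32_t);
}
```

The caller is expected to have run setlocale(LC_CTYPE, "") beforehand,
otherwise nl_langinfo(CODESET) reports the charset of the "C" locale.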
BTW, Solaris 8 and Solaris 9 only support UTF-7/8/16 for the direct
conversion from/to UCS-4.
So, you have to do the following to make your program work
on Solaris 8 and Solaris 9:
1. use iconv_open("UTF-8", nl_langinfo(CODESET)) to convert the
locale-dependent string to UTF-8,
2. use iconv_open("UCS-4", "UTF-8") to convert UTF-8 to
UCS-4 with machine-dependent endianness.
(Of course, step 1 can be omitted if nl_langinfo(CODESET)
is already UTF-8.)
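The two steps could be sketched roughly as follows. The helper names
convert and locale_to_ucs4_via_utf8 are hypothetical, the caller
supplies both the scratch buffer and the output buffer, and the sketch
assumes iconv(3) is declared with char ** for its input pointer:

```c
#include <iconv.h>
#include <langinfo.h>
#include <string.h>

/*
 * Hypothetical helper: run a single conversion over a whole buffer.
 * Returns the number of bytes written, or (size_t)-1 on error.
 */
static size_t convert(const char *to, const char *from,
                      const char *in, size_t inlen,
                      char *out, size_t outlen)
{
    iconv_t cd = iconv_open(to, from);   /* target first, then source */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;
    char *outp = out;
    size_t inleft = inlen, outleft = outlen;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    return rc == (size_t)-1 ? (size_t)-1 : outlen - outleft;
}

/*
 * Locale charset -> UTF-8 -> UCS-4.  tmp is scratch space for the
 * intermediate UTF-8 text.  Returns the number of UCS-4 bytes written
 * to out, or (size_t)-1 on error.
 */
size_t locale_to_ucs4_via_utf8(const char *in, char *tmp, size_t tmplen,
                               char *out, size_t outlen)
{
    const char *codeset = nl_langinfo(CODESET);
    size_t n = strlen(in);

    /* Step 1 is skipped when the locale charset is already UTF-8. */
    if (strcmp(codeset, "UTF-8") != 0) {
        n = convert("UTF-8", codeset, in, n, tmp, tmplen);
        if (n == (size_t)-1)
            return (size_t)-1;
        in = tmp;
    }

    /* Step 2: UTF-8 to UCS-4. */
    return convert("UCS-4", "UTF-8", in, n, out, outlen);
}
```

Going through UTF-8 costs one extra pass over the data, but it relies
only on conversions that Solaris 8 and 9 are stated above to support.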
> Note that I cannot really use wchar_t as I need to make assumptions
> about the encoding and the numeric values of specific characters.
Yeah, the use of wchar_t in scmxx is a problematic point; it's better
to use something like "typedef int unichar_t" to hold a Unicode
character, instead of wchar_t.
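A small sketch of such a type. It uses uint32_t from <stdint.h> rather
than plain int so that the width is guaranteed to be 32 bits; that
choice, the UNICHAR_EURO_SIGN macro and the count_unichar helper are
assumptions made for illustration, not part of scmxx:

```c
#include <stddef.h>
#include <stdint.h>

/* Holds one Unicode code point, independent of the platform's wchar_t. */
typedef uint32_t unichar_t;

/* Code points can be compared by numeric value, e.g. U+20AC EURO SIGN. */
#define UNICHAR_EURO_SIGN ((unichar_t)0x20AC)

/* Hypothetical helper: count how often code point c occurs in s[0..len). */
size_t count_unichar(const unichar_t *s, size_t len, unichar_t c)
{
    size_t hits = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] == c)
            hits++;
    }
    return hits;
}
```

Because the values are plain Unicode code points, such comparisons do
not depend on how a given system encodes wchar_t.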
Thanks for your interest in NetBSD.