Subject: Re: iconv and conversion from/to local charset and wchar_t
To: None <tech-userlevel@netbsd.org>
From: Hendrik Sattler <ubq7@stud.uni-karlsruhe.de>
List: tech-userlevel
Date: 02/12/2004 18:37:13
Hi,

On Friday, 30 January 2004 18:25, Noriyuki Soda wrote:
> So, to make scmxx work on both NetBSD-1.6 and NetBSD-current,
> the following code is needed to convert locale dependent string
> to UCS-4:

> [2] http://mail-index.netbsd.org/tech-userlevel/2004/01/31/0000.html:
> BTW, Solaris 8 and Solaris 9 only support UTF-7/8/16 for the direct
> conversion from/to UCS-4.
> So, you have to use the following way to make your program work
> on Solaris 8 and Solaris 9:
>         1. use iconv(nl_langinfo(CODESET), "UTF-8") to convert
>           locale dependent string to UTF-8,
>           Of course, this can be omitted, if nl_langinfo(CODESET)
>           returns "UTF-8".
>         then
>         2. use iconv("UTF-8", "UCS-4") to convert UTF-8 to
>           UCS-4 with machine-dependent endianness.
>         (1. can be omitted, 
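The two quoted steps could be sketched in C roughly as below. This is a minimal illustration under stated assumptions, not the code in unicode.c: `run_iconv()` and `locale_to_ucs4()` are hypothetical names. The sketch tries the direct conversion first and only goes through UTF-8 when iconv_open() rejects the pair. It takes the UCS-4 encoding name as a parameter because, as the quoted message notes, byte order differs between implementations (glibc, for instance, treats "UCS-4" as big-endian). It also assumes glibc's `char **` prototype for iconv()'s input argument; some implementations declare it as `const char **`.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Run one complete conversion with an already opened descriptor.
   Returns the number of bytes written to `out`, or (size_t)-1 on error.
   glibc declares iconv()'s input argument as char **; some other
   implementations use const char **, so this cast may need adjusting. */
static size_t run_iconv(iconv_t cd, const char *in, size_t inlen,
                        char *out, size_t outlen)
{
    char *inp = (char *)in;
    char *outp = out;
    size_t inleft = inlen, outleft = outlen;

    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        return (size_t)-1;
    /* Write any trailing shift sequence (matters for stateful charsets). */
    if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
        return (size_t)-1;
    return outlen - outleft;
}

/* Hypothetical helper: convert a string in `codeset` (e.g. the value of
   nl_langinfo(CODESET)) to the UCS-4 encoding named `ucs4_name`. */
static size_t locale_to_ucs4(const char *codeset, const char *ucs4_name,
                             const char *in, size_t inlen,
                             char *out, size_t outlen)
{
    size_t n;

    /* Direct conversion, if the implementation offers this pair. */
    iconv_t cd = iconv_open(ucs4_name, codeset);
    if (cd != (iconv_t)-1) {
        n = run_iconv(cd, in, inlen, out, outlen);
        iconv_close(cd);
        return n;
    }

    /* Step 1: locale charset -> UTF-8, skipped if it already is UTF-8. */
    char utf8[1024];                 /* fixed size: enough for a sketch */
    const char *mid = in;
    size_t midlen = inlen;
    if (strcmp(codeset, "UTF-8") != 0) {
        cd = iconv_open("UTF-8", codeset);
        if (cd == (iconv_t)-1)
            return (size_t)-1;
        midlen = run_iconv(cd, in, inlen, utf8, sizeof utf8);
        iconv_close(cd);
        if (midlen == (size_t)-1)
            return (size_t)-1;
        mid = utf8;
    }

    /* Step 2: UTF-8 -> UCS-4. */
    cd = iconv_open(ucs4_name, "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;
    n = run_iconv(cd, mid, midlen, out, outlen);
    iconv_close(cd);
    return n;
}
```

A caller would typically run `setlocale(LC_ALL, "")` first and pass `nl_langinfo(CODESET)` as `codeset`. The fixed 1024-byte intermediate buffer only keeps the sketch short; real code would size it from the input length.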

I have now rewritten the whole thing:
http://cvs.sf.net/viewcvs.py/scmxx/scmxx_C/src/unicode.c?rev=1.8&view=markup

It works just fine on Solaris 8/SPARC without the above workaround
(intermediate conversion to UTF-8).
It also works fine on systems using GNU iconv.

Thomas told me by mail that there are still problems with NetBSD-current, such
as "Error on text conversion to internal charset: Illegal byte sequence"
(EILSEQ) when using a custom escape sequence like "\20ac" (euro sign). It also
prints '?' where the output should be formatted as "\XXXX" (like "\20ac").
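One way to get the "\XXXX" form regardless of what a given iconv() does with unrepresentable characters is to convert one character at a time and fall back to the escape whenever iconv() either fails with EILSEQ or reports a non-identical conversion through a positive return value. The sketch below does that; `ucs4_to_charset_escaped()` is a hypothetical name, and it assumes the implementation accepts "UCS-4BE" as a source encoding and that the target charset is stateless (no shift sequences, unlike e.g. ISO-2022-JP).

```c
#include <errno.h>
#include <iconv.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: convert `count` UCS-4 code points to `tocode`,
   writing a NUL-terminated result to `out`. Characters the target cannot
   represent are written as "\XXXX" (lowercase hex) instead of whatever
   substitute iconv() might produce. Returns the number of bytes written,
   excluding the NUL, or (size_t)-1 on other errors or if `out` is too small.
   Assumes "UCS-4BE" is an accepted name and `tocode` is stateless. */
static size_t ucs4_to_charset_escaped(const char *tocode,
                                      const uint32_t *ucs, size_t count,
                                      char *out, size_t outlen)
{
    iconv_t cd = iconv_open(tocode, "UCS-4BE");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        /* Serialise one code point as four big-endian bytes. */
        unsigned char in[4] = {
            (unsigned char)(ucs[i] >> 24), (unsigned char)(ucs[i] >> 16),
            (unsigned char)(ucs[i] >> 8),  (unsigned char)ucs[i]
        };
        char tmp[16];
        char *inp = (char *)in, *tmpp = tmp;
        size_t inleft = sizeof in, tmpleft = sizeof tmp;

        size_t rc = iconv(cd, &inp, &inleft, &tmpp, &tmpleft);
        size_t n = sizeof tmp - tmpleft;

        if (rc == (size_t)-1 && errno != EILSEQ)
            goto fail;                      /* E2BIG, EINVAL, ... */
        if (rc != 0)                        /* EILSEQ or non-identical conversion */
            n = (size_t)snprintf(tmp, sizeof tmp, "\\%04lx",
                                 (unsigned long)ucs[i]);

        if (used + n + 1 > outlen)
            goto fail;
        memcpy(out + used, tmp, n);
        used += n;
    }
    out[used] = '\0';
    iconv_close(cd);
    return used;

fail:
    iconv_close(cd);
    return (size_t)-1;
}
```

Treating a positive return value the same as EILSEQ is the design point here: it covers an implementation that substitutes '?' but correctly counts the substitution, although it cannot help if iconv() substitutes and returns 0.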

The odd thing is that only NetBSD-current has this problem. Maybe its iconv()
is broken and returns 0 even though a character was not translatable and was
therefore mapped to '?'?
Additionally, does your iconv() try to interpret escape sequences (characters
after a '\')? If so, please don't.
Or maybe I did something wrong, and it only works on the other two
implementations by accident?
Could one of you take a look at it?
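For whoever looks into this, a small probe along the following lines might narrow it down. POSIX specifies that iconv() returns the number of non-identical conversions it performed, so substituting '?' for the euro sign while returning 0 would not match that description, whereas a positive count or EILSEQ would. The second check converts the five bytes of "\20ac" and verifies that the backslash comes through as U+005C, i.e. that iconv() does not interpret it. The function and enum names are hypothetical, and "UCS-4BE" is again assumed to be accepted (per the quoted message, Solaris would need the detour through UTF-8).

```c
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical diagnostic: how does iconv() report U+20AC EURO SIGN when
   converting into `tocode`? For a charset without the euro sign,
   PROBE_RETURNED_ZERO is the suspicious outcome. */
enum probe_result {
    PROBE_RETURNED_ZERO,   /* converted, no non-identical conversion reported */
    PROBE_COUNTED,         /* converted, positive return value */
    PROBE_EILSEQ,          /* rejected with EILSEQ */
    PROBE_ERROR            /* iconv_open() failed or another error */
};

static enum probe_result probe_euro(const char *tocode)
{
    iconv_t cd = iconv_open(tocode, "UCS-4BE");
    if (cd == (iconv_t)-1)
        return PROBE_ERROR;

    unsigned char in[4] = { 0x00, 0x00, 0x20, 0xac };   /* U+20AC, big-endian */
    char out[16];
    char *inp = (char *)in, *outp = out;
    size_t inleft = sizeof in, outleft = sizeof out;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    int err = errno;
    iconv_close(cd);

    if (rc == (size_t)-1)
        return err == EILSEQ ? PROBE_EILSEQ : PROBE_ERROR;
    return rc > 0 ? PROBE_COUNTED : PROBE_RETURNED_ZERO;
}

/* Hypothetical diagnostic: does iconv() pass a backslash through untouched?
   Converts the five bytes "\20ac" from `fromcode` to UCS-4BE and expects five
   code points, the first being U+005C. Returns 1 if so, 0 otherwise. */
static int backslash_passes_through(const char *fromcode)
{
    static const unsigned char expect[20] = {
        0, 0, 0, '\\',  0, 0, 0, '2',  0, 0, 0, '0',
        0, 0, 0, 'a',   0, 0, 0, 'c'
    };
    iconv_t cd = iconv_open("UCS-4BE", fromcode);
    if (cd == (iconv_t)-1)
        return 0;

    char in[] = "\\20ac";
    unsigned char out[32];
    char *inp = in, *outp = (char *)out;
    size_t inleft = strlen(in), outleft = sizeof out;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    return rc != (size_t)-1 && sizeof out - outleft == sizeof expect
           && memcmp(out, expect, sizeof expect) == 0;
}
```

Running `probe_euro("ISO-8859-1")` and `backslash_passes_through(nl_langinfo(CODESET))` on NetBSD-current and on a GNU iconv system and comparing the results would show whether either suspicion holds.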

Thanks

Hendrik