Subject: iconv(3) not working properly with euc-kr
To: None <tech-userlevel@netbsd.org>
From: Bang Jun-Young <junyoung@netbsd.org>
List: tech-userlevel
Date: 09/23/2003 17:55:06
Hi,
I just noticed that iconv(3) doesn't properly convert UTF-8 text to
EUC-KR text. See the code below:
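#include <err.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
	/* "아마도" (U+C544 U+B9C8 U+B3C4) encoded as UTF-8 */
	const char *utf8_str = "\xec\x95\x84\xeb\xa7\x88\xeb\x8f\x84";
	char euckr_str[7], *out;	/* 3 double-byte chars + NUL */
	size_t utf8_strlen, euckr_strlen, ret;
	iconv_t cd;

	utf8_strlen = strlen(utf8_str);
	euckr_strlen = sizeof(euckr_str);

	cd = iconv_open("euc-kr", "utf-8");
	if (cd == (iconv_t)-1)
		err(EXIT_FAILURE, "iconv_open()");

	out = euckr_str;
	/* NetBSD's iconv() takes const char ** for the input; POSIX uses char ** */
	ret = iconv(cd, &utf8_str, &utf8_strlen, &out, &euckr_strlen);
	if (ret == (size_t)-1)
		err(EXIT_FAILURE, "iconv()");

	/* Fixed-position terminator: bytes iconv() did not write stay uninitialized */
	euckr_str[6] = '\0';
	printf("%s\n", euckr_str);
	iconv_close(cd);
	return EXIT_SUCCESS;
}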
This program converts a UTF-8 string to EUC-KR and prints the
result. When I ran it on FreeBSD using GNU iconv, the result was
correct:
$ ./iconvtest | hexdump -C
00000000 be c6 b8 b6 b5 b5 0a |.......|
00000007
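For reference, the nine input bytes are well-formed UTF-8 for
U+C544 U+B9C8 U+B3C4 (아마도), so the '?' output on NetBSD is not
explained by malformed input. A minimal decoder sketch makes the
arithmetic explicit; `utf8_decode` is a hypothetical helper written for
illustration, not part of either iconv implementation, and it only
performs the basic checks (continuation bytes, overlongs, surrogates,
range).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence from s (at most len bytes available).
 * Returns the number of bytes consumed and stores the code point in *cp,
 * or returns 0 for a malformed or truncated sequence.
 */
size_t
utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
	size_t n, i;
	uint32_t c, min;

	assert(s != NULL && cp != NULL);
	if (len == 0)
		return 0;
	if (s[0] < 0x80) {		/* plain ASCII */
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xe0) == 0xc0) {
		n = 2; c = s[0] & 0x1f; min = 0x80;
	} else if ((s[0] & 0xf0) == 0xe0) {
		n = 3; c = s[0] & 0x0f; min = 0x800;
	} else if ((s[0] & 0xf8) == 0xf0) {
		n = 4; c = s[0] & 0x07; min = 0x10000;
	} else
		return 0;		/* stray continuation byte or invalid lead */
	if (len < n)
		return 0;		/* truncated sequence */
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return 0;	/* expected a 10xxxxxx continuation byte */
		c = (c << 6) | (s[i] & 0x3f);
	}
	/* Reject overlong forms, surrogates and values beyond U+10FFFF. */
	if (c < min || c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff))
		return 0;
	*cp = c;
	return n;
}
```

For example, ec 95 84 carries the payload bits 1100 010101 000100,
which is 0xC544.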
OTOH, on NetBSD-current every character was converted to 0x3f ('?'):
sh-2.05b$ ./iconvtest | hexdump -C
00000000 3f 3f 3f bf 01 0a |???...|
00000006
"bf 01" is garbage left in the unused part of the output buffer; the
final 0a is the newline printed by printf(). iconv(1) gives the same
result.
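A variant of the test that avoids printing stale buffer bytes would
terminate the output where iconv() stopped writing instead of at a
fixed index, and would report iconv()'s return value, which POSIX
defines as the number of characters converted in a non-reversible way.
A nonzero count would be consistent with the '?' substitution above,
assuming NetBSD's implementation counts those substitutions that way,
which I have not verified. `convert_str` is a hypothetical helper; it
follows the POSIX `char **` prototype, so on a system whose iconv()
takes `const char **` (as the program above assumes) the `inp` cast
needs adjusting.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <string.h>

/*
 * Convert the NUL-terminated string `in` from `from` to `to`, writing at
 * most outsz - 1 bytes to outbuf and NUL-terminating at the point where
 * iconv() stopped.  Returns (size_t)-1 on error (errno set); otherwise
 * the number of non-reversible conversions reported by iconv().
 */
size_t
convert_str(const char *to, const char *from, const char *in,
    char *outbuf, size_t outsz)
{
	iconv_t cd;
	char *inp = (char *)in;		/* POSIX iconv() takes char ** */
	char *outp = outbuf;
	size_t inleft, outleft, ret;
	int saved;

	assert(in != NULL && outbuf != NULL);
	if (outsz == 0) {
		errno = EINVAL;
		return (size_t)-1;
	}
	inleft = strlen(in);
	outleft = outsz - 1;		/* reserve one byte for the NUL */

	cd = iconv_open(to, from);
	if (cd == (iconv_t)-1)
		return (size_t)-1;
	ret = iconv(cd, &inp, &inleft, &outp, &outleft);
	/* Flush any shift state (a no-op for a stateless encoding like EUC-KR). */
	if (ret != (size_t)-1 &&
	    iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
		ret = (size_t)-1;
	saved = errno;
	*outp = '\0';			/* terminate after the last byte written */
	iconv_close(cd);
	errno = saved;
	return ret;
}
```

Called as convert_str("euc-kr", "utf-8", utf8_str, buf, sizeof(buf)),
only bytes that iconv() actually produced end up in front of the NUL.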
Jun-Young
--
Bang Jun-Young <junyoung@NetBSD.org>