Subject: iconv(3) not working properly with euc-kr
To: None <tech-userlevel@netbsd.org>
From: Bang Jun-Young <junyoung@netbsd.org>
List: tech-userlevel
Date: 09/23/2003 17:55:06
Hi,

I just noticed that iconv(3) doesn't properly convert utf-8 text to
euc-kr text. See the code below:

#include <err.h>
#include <iconv.h>
#include <stdio.h>	/* printf() */
#include <stdlib.h>
#include <string.h>	/* strlen() */

int
main(void)
{
	/* Three Hangul syllables (U+C544 U+B9C8 U+B3C4) encoded as utf-8. */
	const char *utf8_str = "\xec\x95\x84\xeb\xa7\x88\xeb\x8f\x84";
	char euckr_str[7], *out;
	size_t utf8_strlen, euckr_strlen, ret;
	iconv_t cd;

	utf8_strlen = strlen(utf8_str);
	euckr_strlen = sizeof(euckr_str);

	cd = iconv_open("euc-kr", "utf-8");
	if (cd == (iconv_t)-1)
		err(EXIT_FAILURE, "iconv_open()");

	out = euckr_str;
	ret = iconv(cd, &utf8_str, &utf8_strlen, &out, &euckr_strlen);
	if (ret == (size_t)-1)
		err(EXIT_FAILURE, "iconv()");

	/* Terminate at the last byte, so any unused bytes get printed too. */
	euckr_str[6] = '\0';
	printf("%s\n", euckr_str);

	iconv_close(cd);
	return 0;
}

This program converts a utf-8 string to an euc-kr string and prints
it out. When I ran it on FreeBSD using GNU iconv, the result was
correct:

$ ./iconvtest | hexdump -C
00000000  be c6 b8 b6 b5 b5 0a                              |.......|
00000007

OTOH, on NetBSD-current every character was (mis)converted to 0x3f ('?'):

sh-2.05b$ ./iconvtest | hexdump -C
00000000  3f 3f 3f bf 01 0a                                 |???...|
00000006

"bf 01" is garbage left in the unused space of the output buffer; the
trailing "0a" is the newline printed by printf().

The same result was obtained with iconv(1) as well.
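
The stray bytes in the NetBSD output come only from the test program
always terminating at euckr_str[6]. Below is a minimal sketch of
terminating at the position iconv() actually reached instead.
utf8_to_euckr() is a hypothetical helper, not part of the test program,
and the cast assumes an iconv() prototype that takes char ** (as GNU
iconv does); some systems declare const char ** there.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (an illustration, not the original test program):
 * convert a NUL-terminated utf-8 string to euc-kr and NUL-terminate the
 * result right after the last byte iconv() wrote.
 * Returns the number of bytes written, or (size_t)-1 on failure.
 */
static size_t
utf8_to_euckr(const char *src, char *dst, size_t dstsize)
{
	iconv_t cd;
	char *in, *out;
	size_t inleft, outleft, ret;

	assert(src != NULL && dst != NULL);
	if (dstsize == 0)
		return (size_t)-1;

	/* Assumption: iconv() takes char **; drop the cast for const char **. */
	in = (char *)src;
	out = dst;
	inleft = strlen(src);
	outleft = dstsize - 1;	/* reserve one byte for the terminator */

	cd = iconv_open("euc-kr", "utf-8");
	if (cd == (iconv_t)-1)
		return (size_t)-1;

	ret = iconv(cd, &in, &inleft, &out, &outleft);
	iconv_close(cd);
	if (ret == (size_t)-1)
		return (size_t)-1;

	/* out now points just past the converted bytes. */
	*out = '\0';
	return (size_t)(out - dst);
}
```

A caller that prints the buffer after this sees only the converted
bytes. It does not change the '?' substitution itself, which still looks
like a problem in the conversion rather than in the test program.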

Jun-Young

-- 
Bang Jun-Young <junyoung@NetBSD.org>