Subject: POSIX nit for iconv(3)
To: None <tech-userlevel@netbsd.org>
From: Valtteri Vuorikoski <vuori@puuhamaa.magenta.net>
List: tech-userlevel
Date: 07/06/2004 00:17:39
I ran into a slight problem with a program that expects iconv(3) to return
an error when it encounters an invalid input character sequence. Instead of
an error, NetBSD's iconv(3) will replace the sequence with '?'.
Quoth IEEE Std 1003.1-2004:
If a sequence of input bytes does not form a valid character in the
specified codeset, conversion shall stop after the previous
successfully converted character.
The problem can be reproduced by feeding ISO-8859-1 characters to
iconv(1) as follows:
% echo äölk | iconv -f iso-2022-jp -t iso-8859-1
??lk
iconv: warning: invalid characters: 2
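For anyone who wants to check the library call rather than the utility,
here is a rough sketch of a test helper. strict_convert() is a name I
made up for illustration, not an existing interface, and the cast on the
input pointer assumes the POSIX prototype with a plain char ** second
argument. It makes a single iconv(3) call and reports the errno value and
how many input bytes were consumed before conversion stopped.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of any libc.
 * Converts `in` from codeset `from` to codeset `to` with one iconv(3) call.
 * Returns the number of input bytes consumed, or (size_t)-1 if
 * iconv_open(3) fails. *err receives 0 on success, otherwise the errno
 * value set by iconv_open(3) or iconv(3).
 */
size_t
strict_convert(const char *to, const char *from,
    const char *in, size_t inlen, char *out, size_t outlen, int *err)
{
	iconv_t cd;
	char *inp = (char *)in;	/* assumes the POSIX char ** prototype */
	char *outp = out;
	size_t inleft = inlen;
	size_t outleft = outlen;

	assert(err != NULL);
	*err = 0;

	cd = iconv_open(to, from);
	if (cd == (iconv_t)-1) {
		*err = errno;
		return (size_t)-1;
	}

	/* Per POSIX, an invalid input sequence should fail with EILSEQ. */
	if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
		*err = errno;

	iconv_close(cd);
	return inlen - inleft;
}
```

Calling it with to = "ISO-8859-1", from = "ISO-2022-JP" and the four
bytes 0xE4 0xF6 'l' 'k' as input, I would expect an implementation that
follows the text quoted above to set *err to EILSEQ and return 0, since
no character was successfully converted before the invalid byte.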
Solaris, Linux and Mac OS X all stop immediately with an error. Since
ISO-2022-JP is a 7-bit character set, bytes with the high bit set can
never be valid input, so it seems that to be POSIXly correct NetBSD
should also stop immediately with an error instead of pressing on.
While I think the NetBSD behavior has its merits,
compatibility should take priority; perhaps there could be an
alternate interface to "convert no matter what"?
Any comments, or should I send-pr and/or attempt to fix this?
-v