Subject: Re: behaviour of iconv in NetBSD and pkgsrc libiconv
To: None <tech-userlevel@netbsd.org>
From: Valeriy E. Ushakov <uwe@ptc.spbu.ru>
List: tech-userlevel
Date: 04/03/2006 13:40:46
joerg@britannica.bec.de wrote:

> On Sun, Apr 02, 2006 at 05:53:06PM +0200, Klaus Heinz wrote:
>> This is even mentioned in our man page iconv(3):
>> 
>>   "If no conversion exists for a particular character, an
>>    implementation-defined conversion is performed on this character."
>> 
>> NetBSD's iconv() completes the conversion of the whole buffer and maps
>> such characters to a question mark. The return value of iconv() shows
>> how many of those non-reversible conversions happened.
>> In contrast, converters/libiconv stops the conversion at this point,
>> returns an error and gives the application a chance to do something
>> about the unconvertible character [1].
>
> The GNU implementation is clearly broken.

The glibc manual even says that this behavior may change in the future:
http://www.gnu.org/software/libc/manual/html_node/Generic-Conversion-Interface.html

    Since the character sets selected in the iconv_open call can be
    almost arbitrary, there can be situations where the input buffer
    contains valid characters, which have no identical representation
    in the output character set.  The behavior in this situation is
    undefined.  The /current/ behavior of the GNU C library in this
    situation is to return with an error immediately.  This certainly
    is not the most desirable solution; therefore, future versions
    will provide better ones, but they are not yet finished.


> An implementation-defined conversion is *not* an error. EILSEQ is an
> absolutely inappropriate error code, since it does not allow the caller
> to distinguish between invalid input and valid but unconvertible input.

What Joerg said.
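
To make the ambiguity concrete, here is a minimal sketch of a probe
that records what a single iconv() call reports. probe_iconv() and
struct probe_result are hypothetical names made up for this
illustration; only iconv_open(), iconv() and iconv_close() are real.
It assumes short inputs (the output buffer is fixed at 256 bytes).

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper type: the outcome of one iconv() call. */
struct probe_result {
    size_t ret;      /* iconv() return value, (size_t)-1 on error */
    int    err;      /* errno if the call failed, 0 otherwise */
    size_t consumed; /* input bytes consumed before iconv() returned */
};

/*
 * Convert `len` bytes of `in` from encoding `from` to encoding `to`
 * and report what iconv() did.  Intended for short inputs only.
 */
static struct probe_result
probe_iconv(const char *to, const char *from, const char *in, size_t len)
{
    struct probe_result r = { (size_t)-1, 0, 0 };
    char outbuf[256];
    /* POSIX declares inbuf as char **; the input is not modified.
     * Systems that declare const char ** need a different cast. */
    char *inp = (char *)in;
    char *outp = outbuf;
    size_t inleft = len, outleft = sizeof(outbuf);
    iconv_t cd;

    assert(in != NULL);

    cd = iconv_open(to, from);
    if (cd == (iconv_t)-1) {
        r.err = errno;
        return r;
    }

    r.ret = iconv(cd, &inp, &inleft, &outp, &outleft);
    r.err = (r.ret == (size_t)-1) ? errno : 0;
    r.consumed = len - inleft;

    iconv_close(cd);
    return r;
}
```

Probing "a\xffb" (invalid UTF-8) and "a\xc3\xa9" "b" (valid UTF-8, but
with no ASCII equivalent) with a target of "ASCII" shows the issue: an
implementation that stops on unconvertible input reports -1/EILSEQ in
both cases, while one that substitutes a character reports an error
only for the first and a positive count for the second.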


> The correct way to find the first unconvertible character, as
> sad as it might seem, is to perform a binary search.

How would you detect the "unconvertible" condition?
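
One possible answer, sketched under assumptions: treat a prefix as
"lossy" if iconv() either returns a non-zero count of non-reversible
conversions or fails outright, then binary-search for the shortest
lossy prefix.  The names prefix_is_lossy() and first_lossy_offset()
are hypothetical, the input is assumed to be valid UTF-8 so that
prefixes can be cut on character boundaries, and the output buffer
size is a guess that should be generous for common targets.  Note that
this still cannot tell invalid input from unconvertible input when the
implementation fails with EILSEQ for both.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdlib.h>

/* True for UTF-8 continuation bytes (10xxxxxx). */
static int
is_cont(char c)
{
    return ((unsigned char)c & 0xC0) == 0x80;
}

/*
 * Hypothetical helper: 1 if converting the first n bytes of `in` from
 * UTF-8 to `to` is not clean (iconv() failed or reported non-reversible
 * conversions), 0 if it is clean, -1 if the probe could not be run.
 */
static int
prefix_is_lossy(const char *to, const char *in, size_t n)
{
    size_t outsize = 4 * n + 16;  /* assumed large enough for the target */
    size_t inleft = n, outleft = outsize, ret;
    char *outbuf = malloc(outsize);
    char *inp = (char *)in, *outp = outbuf;
    iconv_t cd;

    if (outbuf == NULL)
        return -1;
    cd = iconv_open(to, "UTF-8");
    if (cd == (iconv_t)-1) {
        free(outbuf);
        return -1;
    }
    ret = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    free(outbuf);
    return ret != 0;  /* (size_t)-1 on error, > 0 for lossy conversions */
}

/*
 * Byte offset of the first character that does not convert cleanly,
 * `len` if everything converts, (size_t)-1 if probing failed.
 */
static size_t
first_lossy_offset(const char *to, const char *in, size_t len)
{
    size_t lo = 0, hi = len;
    int r = prefix_is_lossy(to, in, len);

    if (r < 0)
        return (size_t)-1;
    if (r == 0)
        return len;

    /* Invariant: in[0, lo) is clean, in[0, hi) is lossy. */
    for (;;) {
        size_t mid = lo + (hi - lo) / 2;

        /* Back up to a character start at or after lo. */
        while (mid > lo && is_cont(in[mid]))
            mid--;
        if (mid == lo) {
            /* Nothing usable behind the midpoint; scan forward. */
            mid = lo + 1;
            while (mid < hi && is_cont(in[mid]))
                mid++;
        }
        if (mid >= hi)
            break;  /* exactly one character left between lo and hi */

        r = prefix_is_lossy(to, in, mid);
        if (r < 0)
            return (size_t)-1;
        if (r)
            hi = mid;
        else
            lo = mid;
    }
    return lo;
}
```

With an implementation that stops at the offending character, the
search is unnecessary, since a single call already leaves the input
pointer there; it only pays off when the return value is a bare count.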


A quick look at lib/libc/citrus/modules/citrus_iconv_std.c shows that
it seems to have some logic to handle an unconvertible character as an
error.  I wonder if the practical path would be to introduce something
like an __iconv_gnu_bug() function that would turn on glibc-compatible
behaviour, so that glibc-addicted pkgsrc programs can be used without
much vivisection performed on them.

Hmm, GNU libiconv has an iconvctl() extension that we could probably
implement and extend for this purpose.

SY, Uwe
-- 
uwe@ptc.spbu.ru                         |       Zu Grunde kommen
http://snark.ptc.spbu.ru/~uwe/          |       Ist zu Grunde gehen