Subject: Re: iconv_open() / UCS-2-INTERNAL question
To: Christian Biere <christianbiere@gmx.de>
From: Chuck Cranor <chuck@ece.cmu.edu>
List: current-users
Date: 01/30/2007 14:11:53
On Mon, Jan 29, 2007 at 05:02:23AM +0100, Christian Biere wrote:
>Well, if it's not glibc, it must be Mac OS, right?. UCS-2-INTERNAL seems to be
>an Apple thing actually. In any case, they should be trying all known variants
>at runtime instead of hardcoding whatever at compile-time. I don't think
>assuming iconv() supports any kind of UCS-2 is very portable, using UTF-8 and
>converting from/to UCS-2 by hand if required would be better and
>incidently UTF-8 doesn't have any endian issues.


Ah, based on what you've said, I did some source code and Wikipedia 
reading and I think I figured it out!   

String data coming in from the camera (coming in via the PTP2 protocol)
consists of a length byte and then a string in UCS-2 (using the camera's
byte order).   That is why they need UCS-2.   There is a function
ptp_unpack_string() that decodes this.

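The problem is that they convert the UCS-2 from the camera to host byte
order before they use iconv() to change it to the desired locale (see
source code below).  UCS-2-INTERNAL is UCS-2 in the local byte order,
which is why they ask iconv_open() for it, and it is not a name that
every iconv implementation accepts.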

Since they know the byte order of the camera, it seems to me that
they should just preserve that order and let iconv() do the work.
If the camera is little endian, then they should do iconv_open() with
UCS-2LE, otherwise UCS-2BE.   UCS-2-INTERNAL should not be used.
Then they can get rid of their #ifdefs and no longer need to
special-case operating systems.
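
Here is a rough sketch of what I mean.  It is not libgphoto2 code:
camera_ucs2_name() and ptp_string_to_utf8() are names I made up, the
byte order is passed in as a plain flag, and the target is hardcoded
to UTF-8 (rather than the locale's codeset) to keep the example
short.  It also assumes an iconv() that takes "char **" for the input
buffer, which is not true everywhere.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: choose the iconv source encoding from the
 * camera's byte order instead of asking for UCS-2-INTERNAL. */
static const char *
camera_ucs2_name(int camera_is_little_endian)
{
        return camera_is_little_endian ? "UCS-2LE" : "UCS-2BE";
}

/* Convert a PTP string (one length byte, then that many UCS-2 code
 * units in camera byte order) to UTF-8 without byte-swapping first.
 * Returns 0 on success, -1 on any failure. */
static int
ptp_string_to_utf8(unsigned char *data, int camera_is_little_endian,
                   char *out, size_t outsize)
{
        iconv_t cd;
        char *inp = (char *)(data + 1);      /* skip the length byte */
        size_t inleft = (size_t)data[0] * 2; /* UCS-2 is 16 bits wide */
        char *outp = out;
        size_t outleft;
        size_t nconv;

        if (outsize == 0)
                return -1;
        outleft = outsize - 1;               /* keep room for the NUL */

        cd = iconv_open("UTF-8", camera_ucs2_name(camera_is_little_endian));
        if (cd == (iconv_t)-1)
                return -1;
        nconv = iconv(cd, &inp, &inleft, &outp, &outleft);
        iconv_close(cd);
        if (nconv == (size_t)-1)
                return -1;

        *outp = '\0';
        return 0;
}
```

In real code the descriptor would be opened once, as they already do
with params->cd_ucs2_to_locale, rather than on every call.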


chuck


static inline char*
ptp_unpack_string(PTPParams *params, unsigned char* data, uint16_t offset, uint8_t *len)
{
        int i;
        uint8_t loclen;

        /* Cannot exceed 255 (PTP_MAXSTRLEN) since it is a single byte, duh ...  */
        loclen = dtoh8a(&data[offset]);
        /* This len is used to advance the buffer pointer */
        *len = loclen;
        if (loclen) {
                uint16_t string[PTP_MAXSTRLEN+1];
                char *stringp = (char *) string;
                char loclstr[PTP_MAXSTRLEN*3+1]; /* UTF-8 encoding is max 3 bytes per UCS2 char. */
                char *locp = loclstr;
                size_t nconv;
                size_t convlen = loclen * 2; /* UCS-2 is 16 bit wide */
                size_t convmax = PTP_MAXSTRLEN*3;
 
                for (i=0;i<loclen;i++) {
                        string[i]=dtoh16a(&data[offset+i*2+1]);
                }
                /* be paranoid! Add a terminator. :( */
                string[loclen]=0x0000U;
                loclstr[0]='\0';
                /* loclstr=ucs2_to_utf8(string); */
                /* Do the conversion.  */
                nconv = iconv (params->cd_ucs2_to_locale, &stringp, &convlen, &locp, &convmax);
                /* FIXME: handle size errors */
                loclstr[PTP_MAXSTRLEN*3] = '\0';
                if (nconv == (size_t) -1)
                        return NULL;
                return strdup(loclstr);
        }
        return NULL;
}
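
For comparison, the by-hand route Christian suggested (skip iconv()
for the UCS-2 step and produce UTF-8 directly) might look roughly
like the sketch below.  Both function names are invented for
illustration, and the encoder does not reject surrogate values
(0xD800-0xDFFF), which are not valid UCS-2.  It also shows where the
"max 3 bytes per UCS2 char" sizing of loclstr above comes from.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read one 16-bit unit from camera-order bytes.
 * The result does not depend on the host byte order. */
static uint16_t
read_ucs2(const unsigned char *p, int little_endian)
{
        return little_endian ? (uint16_t)(p[0] | (p[1] << 8))
                             : (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical helper: encode one UCS-2 code unit as UTF-8.
 * Writes 1 to 3 bytes into out and returns how many were written. */
static size_t
ucs2_char_to_utf8(uint16_t c, unsigned char *out)
{
        if (c < 0x80) {
                out[0] = (unsigned char)c;
                return 1;
        }
        if (c < 0x800) {
                out[0] = (unsigned char)(0xC0 | (c >> 6));
                out[1] = (unsigned char)(0x80 | (c & 0x3F));
                return 2;
        }
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
}
```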