Subject: Re: CVS commit: src/sys/dev/usb
To: Tom Spindler <firstname.lastname@example.org>
From: Dieter Baron <email@example.com>
Date: 02/27/2007 09:00:45
In article <20070226234530.GA11458@babymeat.com> Tom wrote:
: > > Please note that fs/unicode.h does not handle UTF-16 surrogates
: > > correctly. What's worse, the API does not allow this to be fixed.
: > >
: > > (Unicode defines more characters than fit in a 16 bit int. In
: > > UTF-16, a character with a code above 0xffff is represented as two
: > > surrogate values. In UTF-8, it is encoded as a 5 byte sequence.
: > > Encoding/decoding one 16 bit value at a time does not allow for this
: > > conversion to be done correctly.)
: Huh? You can encode 0x10000-0x10ffff in four UTF-8 bytes.
Oops, you are correct.
: CESU-8, on the other hand, encodes each surrogate pair as six bytes -
: but its usage is discouraged; see http://unicode.org/faq/utf_bom.html#30
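For illustration, here is a minimal C sketch of the point above. It is not taken from fs/unicode.h, and the names utf16_combine, utf8_encode and cesu8_encode_pair are hypothetical. A surrogate pair has to be combined into one code point before encoding, which gives the 4-byte UTF-8 form. If each 16-bit unit is run through the encoder on its own, which is all a one-unit-at-a-time API can do, the result is the 6-byte CESU-8 form instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine a UTF-16 high/low surrogate pair into a single code point. */
static uint32_t
utf16_combine(uint16_t hi, uint16_t lo)
{
	assert(hi >= 0xD800 && hi <= 0xDBFF);	/* high (lead) surrogate */
	assert(lo >= 0xDC00 && lo <= 0xDFFF);	/* low (trail) surrogate */
	return 0x10000 +
	    (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}

/*
 * Write a value <= 0x10FFFF using the UTF-8 bit layout; return the byte
 * count. Real UTF-8 never contains surrogate values, but CESU-8 applies
 * this same layout to each surrogate individually.
 */
static size_t
utf8_encode(uint32_t cp, unsigned char *out)
{
	assert(cp <= 0x10FFFF);
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	/* 0x10000-0x10FFFF: four bytes, as noted above. */
	out[0] = (unsigned char)(0xF0 | (cp >> 18));
	out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
	out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
	out[3] = (unsigned char)(0x80 | (cp & 0x3F));
	return 4;
}

/*
 * What per-16-bit-unit conversion produces: each surrogate is encoded
 * separately as 3 bytes, 6 in total. This is the CESU-8 form.
 */
static size_t
cesu8_encode_pair(uint16_t hi, uint16_t lo, unsigned char *out)
{
	size_t n = utf8_encode(hi, out);
	n += utf8_encode(lo, out + n);
	return n;
}
```

Take U+1F600 as an example. In UTF-16 it is the pair 0xD83D 0xDE00. Combining the pair first gives the UTF-8 bytes F0 9F 98 80. Encoding each unit separately gives ED A0 BD ED B8 80, which is CESU-8 and not valid UTF-8.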