Subject: Re: CVS commit: src/sys/dev/usb
To: Tom Spindler <dogcow@babymeat.com>
From: Dieter Baron <dillo@danbala.tuwien.ac.at>
List: tech-kern
Date: 02/27/2007 09:00:45
In article <20070226234530.GA11458@babymeat.com> Tom wrote:
: > >   Please note that fs/unicode.h does not handle UTF-16 surrogates
: > > correctly.  What's worse, the API does not allow this to be fixed.
: > > 
: > >   (Unicode defines more characters than fit in a 16 bit int.  In
: > > UTF-16, a character with a code above 0xffff is represented as two
: > > surrogate values.  In UTF-8, it is encoded as a 5 byte sequence.
: > > Encoding/decoding one 16 bit value at a time does not allow for this
: > > conversion to be done correctly.)

: Huh? You can encode 0x10000-0x10ffff in four UTF-8 bytes. 

  Oops, you are correct.
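
  For reference, the four-byte case can be sketched in C roughly as
follows.  This is only an illustration under my own assumptions: the
helper names surrogates_to_codepoint() and codepoint_to_utf8_4() are
made up and are not part of fs/unicode.h.  The point it shows is that
both surrogates have to be available at once, which a
one-16-bit-value-at-a-time interface cannot provide.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of fs/unicode.h): combine a UTF-16
 * surrogate pair into a code point in 0x10000-0x10ffff.
 * Returns 0 if the inputs are not a valid high/low surrogate pair.
 */
static uint32_t
surrogates_to_codepoint(uint16_t hi, uint16_t lo)
{
	if (hi < 0xd800 || hi > 0xdbff || lo < 0xdc00 || lo > 0xdfff)
		return 0;
	return 0x10000 +
	    (((uint32_t)(hi - 0xd800) << 10) | (uint32_t)(lo - 0xdc00));
}

/*
 * Hypothetical helper: encode a code point in 0x10000-0x10ffff as
 * four UTF-8 bytes.  Returns the number of bytes written, or 0 if
 * the code point is outside that range.
 */
static size_t
codepoint_to_utf8_4(uint32_t cp, uint8_t out[4])
{
	if (cp < 0x10000 || cp > 0x10ffff)
		return 0;
	out[0] = (uint8_t)(0xf0 | (cp >> 18));
	out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3f));
	out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3f));
	out[3] = (uint8_t)(0x80 | (cp & 0x3f));
	return 4;
}
```

  For example, the pair 0xd83d 0xde00 combines to U+1F600, which
encodes as the four bytes f0 9f 98 80.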

: CESU-8, on the other hand, encodes each surrogate pair as six bytes -
: but its usage is discouraged; see http://unicode.org/faq/utf_bom.html#30

						yours,
						dillo