NetBSD-Bugs archive


PR/58612 CVS commit: src



The following reply was made to PR lib/58612; it has been noted by GNATS.

From: "Taylor R Campbell" <riastradh%netbsd.org@localhost>
To: gnats-bugs%gnats.NetBSD.org@localhost
Cc: 
Subject: PR/58612 CVS commit: src
Date: Mon, 19 Aug 2024 16:22:10 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Aug 19 16:22:10 UTC 2024
 
 Modified Files:
 	src/lib/libc/locale: c32rtomb.c c32rtomb.h
 	src/tests/lib/libc/locale: t_c16rtomb.c t_c8rtomb.c
 
 Log Message:
 c32rtomb(3): Use conversion state to handle shift sequences.
 
 For conversion of Unicode scalar values to coding systems requiring
 shift sequences, such as ISO-2022-JP, _citrus_iconv_convert will
 always produce:
 
 1. a shift sequence from the initial state to some nondefault state,
    like from US-ASCII to JIS X 0208
 2. the encoding of the desired character
 3. a shift sequence restoring the initial state
 
 This is unnecessary if the output is already in the state needed to
 encode the desired character.  For example, this method produces
 seven bytes to encode each YEN SIGN in ISO-2022-JP -- and fourteen,
 to encode two consecutive ones -- even though the shift sequence is
 only three bytes long and once shifted YEN SIGN takes only one byte.
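
 To make the byte counts concrete, here is a minimal toy model in C.
 It is a sketch, not code from this commit: the toy_* names are
 hypothetical, and the escape bytes (ESC ( J to select JIS X 0201
 Roman, ESC ( B to return to US-ASCII, 0x5c for YEN SIGN) are assumed
 from the ISO-2022-JP definition in RFC 1468 rather than taken from
 the log message.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of two ISO-2022-JP character sets; names are illustrative. */
enum toy_charset { TOY_ASCII, TOY_JISX0201_ROMAN };

/* Emit the 3-byte designation for `want` unless it is already selected. */
static size_t
toy_shift(unsigned char *out, enum toy_charset *cur, enum toy_charset want)
{
	if (*cur == want)
		return 0;
	out[0] = 0x1b;				/* ESC */
	out[1] = '(';
	out[2] = (want == TOY_ASCII) ? 'B' : 'J';
	*cur = want;
	return 3;
}

/* Per-character method: every YEN SIGN starts from and returns to ASCII. */
size_t
toy_yen_stateless(unsigned char *out, size_t count)
{
	size_t i, n = 0;

	assert(out != NULL);
	for (i = 0; i < count; i++) {
		enum toy_charset cur = TOY_ASCII;

		n += toy_shift(out + n, &cur, TOY_JISX0201_ROMAN);
		out[n++] = 0x5c;		/* YEN SIGN in JIS X 0201 Roman */
		n += toy_shift(out + n, &cur, TOY_ASCII);
	}
	return n;
}

/* Stateful method: shift only when needed, restore ASCII once at the end. */
size_t
toy_yen_stateful(unsigned char *out, size_t count)
{
	enum toy_charset cur = TOY_ASCII;
	size_t i, n = 0;

	assert(out != NULL);
	for (i = 0; i < count; i++) {
		n += toy_shift(out + n, &cur, TOY_JISX0201_ROMAN);
		out[n++] = 0x5c;
	}
	n += toy_shift(out + n, &cur, TOY_ASCII);
	return n;
}
```

 With a buffer of at least 14 bytes, toy_yen_stateless(buf, 2) writes
 fourteen bytes, while toy_yen_stateful(buf, 2) writes eight: one
 shift in, two one-byte characters, one shift out.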
 
 Instead, convert the Unicode scalar value to a locale-dependent wide
 character and encode that, by composing
 
 - _citrus_iconv_convert
   => gives us a multibyte encoding of the character from the initial
      state (and restoring the initial state afterward)
 - mbrtowc with initial conversion state
   => gives us the single wide character representation
      XXX If combining characters are possible here, this may fail.
 - wcrtomb with caller's conversion state
   => gives us a state-dependent multibyte encoding of the character
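
 The three-step composition above can be sketched with portable
 interfaces.  This is not the committed c32rtomb.c: it substitutes
 POSIX iconv(3) for the internal _citrus_iconv_convert, the function
 name sketch_c32_to_mb is hypothetical, and it assumes the platform's
 iconv accepts the name returned by nl_langinfo(CODESET) as a target
 and "UTF-32BE" as a source.  Some older systems declare the second
 argument of iconv as const char **, which would need a cast.

```c
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: encode one Unicode scalar value in the current
 * locale's multibyte encoding, honouring the caller's shift state *ps.
 * `out` must have room for MB_LEN_MAX bytes.  Returns the number of
 * bytes written, or (size_t)-1 on failure.
 */
size_t
sketch_c32_to_mb(char *out, uint_least32_t c32, mbstate_t *ps)
{
	unsigned char in[4];
	char tmp[64];		/* assumed ample for shift + char + shift */
	char *inp = (char *)in, *outp = tmp;
	size_t inleft = sizeof(in), outleft = sizeof(tmp);
	mbstate_t init;
	wchar_t wc;
	iconv_t cd;
	size_t n;

	assert(out != NULL && ps != NULL);

	/* Serialize as big-endian UTF-32 so no byte-order mark is needed. */
	in[0] = (unsigned char)(c32 >> 24);
	in[1] = (unsigned char)(c32 >> 16);
	in[2] = (unsigned char)(c32 >> 8);
	in[3] = (unsigned char)c32;

	/* Step 1: multibyte encoding from, and back to, the initial state. */
	cd = iconv_open(nl_langinfo(CODESET), "UTF-32BE");
	if (cd == (iconv_t)-1)
		return (size_t)-1;
	if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1 ||
	    iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1) {
		iconv_close(cd);
		return (size_t)-1;
	}
	iconv_close(cd);

	/* Step 2: decode to one wide character with an initial state. */
	memset(&init, 0, sizeof(init));
	n = mbrtowc(&wc, tmp, (size_t)(outp - tmp), &init);
	if (n == (size_t)-1 || n == (size_t)-2)
		return (size_t)-1;

	/* Step 3: re-encode using the caller's conversion state. */
	return wcrtomb(out, wc, ps);
}
```

 Because only step 3 touches *ps, consecutive calls with the same
 state object can skip shift sequences that the output is already in,
 which is the point of the change.  The sketch ignores the combining
 character concern flagged above and opens a new iconv descriptor on
 every call, which a real implementation would likely avoid.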
 
 XXX Is there a cheaper way to convert from Unicode scalar value to
 locale-dependent wide character?  It is not obvious to me from the
 largely undocumented Citrus machinery, but it would obviously be
 better than this somewhat circuitous Rube Goldberg contraption of
 chained multibyte APIs.
 
 PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift
 sequences
 
 
 To generate a diff of this commit:
 cvs rdiff -u -r1.3 -r1.4 src/lib/libc/locale/c32rtomb.c
 cvs rdiff -u -r1.1 -r1.2 src/lib/libc/locale/c32rtomb.h
 cvs rdiff -u -r1.5 -r1.6 src/tests/lib/libc/locale/t_c16rtomb.c
 cvs rdiff -u -r1.6 -r1.7 src/tests/lib/libc/locale/t_c8rtomb.c
 
 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
 

