NetBSD-Bugs archive
PR/58612 CVS commit: src
The following reply was made to PR lib/58612; it has been noted by GNATS.
From: "Taylor R Campbell" <riastradh%netbsd.org@localhost>
To: gnats-bugs%gnats.NetBSD.org@localhost
Cc:
Subject: PR/58612 CVS commit: src
Date: Mon, 19 Aug 2024 16:22:10 +0000
Module Name: src
Committed By: riastradh
Date: Mon Aug 19 16:22:10 UTC 2024
Modified Files:
src/lib/libc/locale: c32rtomb.c c32rtomb.h
src/tests/lib/libc/locale: t_c16rtomb.c t_c8rtomb.c
Log Message:
c32rtomb(3): Use conversion state to handle shift sequences.
For conversion of Unicode scalar values to coding systems requiring
shift sequences, such as ISO-2022-JP, _citrus_iconv_convert will
always produce:
  1. a shift sequence from the initial state to some nondefault state,
     like from US-ASCII to JIS X 0208
  2. the encoding of the desired character
  3. a shift sequence restoring the initial state
This is unnecessary if the output is already in the state needed to
encode the desired character. For example, this method produces
seven bytes to encode each YEN SIGN in ISO-2022-JP -- and fourteen,
to encode two consecutive ones -- even though the shift sequence is
only three bytes long and once shifted YEN SIGN takes only one byte.
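The byte counts above can be reproduced with a toy model. This is only a sketch, not the libc code: the names toy_charset, toy_yen_stateless, and toy_yen_stateful are hypothetical, and it assumes YEN SIGN is the byte 0x5c in JIS X 0201 Roman, selected by ESC ( J, with ESC ( B returning to US-ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of two ISO-2022-JP character sets (hypothetical names). */
enum toy_charset { TOY_ASCII, TOY_JISX0201_ROMAN };

static const char TOY_TO_ROMAN[] = "\033(J";	/* ESC ( J */
static const char TOY_TO_ASCII[] = "\033(B";	/* ESC ( B */

/*
 * Stateless encoding: always shift in, emit the character, and shift
 * back to the initial state.  Returns the number of bytes written.
 */
static size_t
toy_yen_stateless(char *out)
{
	size_t n = 0;

	assert(out != NULL);
	memcpy(out + n, TOY_TO_ROMAN, 3);
	n += 3;
	out[n++] = 0x5c;		/* YEN SIGN in JIS X 0201 Roman */
	memcpy(out + n, TOY_TO_ASCII, 3);
	n += 3;
	return n;
}

/*
 * Stateful encoding: emit the shift sequence only if the output is
 * not already in the required state, and leave the state shifted.
 */
static size_t
toy_yen_stateful(char *out, enum toy_charset *state)
{
	size_t n = 0;

	assert(out != NULL && state != NULL);
	if (*state != TOY_JISX0201_ROMAN) {
		memcpy(out + n, TOY_TO_ROMAN, 3);
		n += 3;
		*state = TOY_JISX0201_ROMAN;
	}
	out[n++] = 0x5c;
	return n;
}
```

In this model, two consecutive YEN SIGNs cost 7 + 7 = 14 bytes with the stateless encoder and 4 + 1 = 5 bytes with the stateful one, plus one final three-byte shift back to US-ASCII when the caller resets the state.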
Instead, convert the Unicode scalar value to a locale-dependent wide
character and encode that, by composing
  - _citrus_iconv_convert
    => gives us a multibyte encoding of the character from the initial
       state (and restoring the initial state afterward)
  - mbrtowc with initial conversion state
    => gives us the single wide character representation
    XXX If combining characters are possible here, this may fail.
  - wcrtomb with caller's conversion state
    => gives us a state-dependent multibyte encoding of the character
XXX Is there a cheaper way to convert from Unicode scalar value to
locale-dependent wide character? It is not obvious to me from the
largely undocumented Citrus machinery, but it would obviously be
better than this somewhat circuitous Rube Goldberg contraption of
chained multibyte APIs.
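The last two steps of the composition can be sketched with standard C alone. This is a minimal illustration, not the committed code: reencode_with_state is a hypothetical name, and the input buffer stands in for whatever _citrus_iconv_convert produced, since that internal interface is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Given a multibyte encoding `in' of one character, valid from the
 * initial conversion state, re-encode it into `out' relative to the
 * caller's conversion state `ps'.  `out' must have room for at least
 * MB_LEN_MAX bytes.  Returns the number of bytes written, or
 * (size_t)-1 on failure.
 */
static size_t
reencode_with_state(char *out, const char *in, size_t inlen, mbstate_t *ps)
{
	mbstate_t initial;
	wchar_t wc;
	size_t len;

	assert(out != NULL && in != NULL && ps != NULL);

	/* Decode one wide character starting from the initial state. */
	memset(&initial, 0, sizeof(initial));
	len = mbrtowc(&wc, in, inlen, &initial);
	if (len == (size_t)-1 || len == (size_t)-2)
		return (size_t)-1;

	/*
	 * Any trailing shift sequence in `in' that restores the initial
	 * state is deliberately left unconsumed: only the wide character
	 * matters.  Encode it relative to the caller's state, so a shift
	 * sequence is emitted only if the state requires one.
	 */
	return wcrtomb(out, wc, ps);
}
```

In a stateless locale such as "C" this reduces to copying the character through. The benefit only appears in a locale with shift states, where wcrtomb can skip a shift sequence that the caller's state already makes redundant.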
PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift
sequences
To generate a diff of this commit:
cvs rdiff -u -r1.3 -r1.4 src/lib/libc/locale/c32rtomb.c
cvs rdiff -u -r1.1 -r1.2 src/lib/libc/locale/c32rtomb.h
cvs rdiff -u -r1.5 -r1.6 src/tests/lib/libc/locale/t_c16rtomb.c
cvs rdiff -u -r1.6 -r1.7 src/tests/lib/libc/locale/t_c8rtomb.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.