NetBSD-Bugs archive


Re: PR/57798 CVS commit: src/usr.bin/mklocale



I've committed a revised fix to -current.

As I wrote in the commit log, the original problem affected not
only wcwidth(3), but also the other attributes stored in _RuneType.

I'll send a pullup request to netbsd-10 tomorrow, if there are no
objections.

On 2023/12/29 6:40, Valery Ushakov wrote:
> Unicode has three different "numeric" values for a character
>
> Unicode Character Database
>   https://unicode.org/reports/tr44/
>
> Numeric_Value is extracted based on the actual numeric value of the
>     data in field 8 of UnicodeData.txt or the values of the
>     kPrimaryNumeric, kAccountingNumeric, or kOtherNumeric tags, for
>     characters listed in the Unihan data files.
>
> Numeric_Type is extracted as follows. If fields 6, 7, and 8 in
>     UnicodeData.txt are all non-empty, then Numeric_Type=Decimal.
>     Otherwise, if fields 7 and 8 are both non-empty, then
>     Numeric_Type=Digit.  Otherwise, if field 8 is non-empty, then
>     Numeric_Type=Numeric.  For characters listed in the Unihan data
>     files, Numeric_Type=Numeric for characters that have
>     kPrimaryNumeric, kAccountingNumeric, or kOtherNumeric tags.  The
>     default value is Numeric_Type=None.
>
> The intention of TODIGIT is likely to be able to eventually provide
>   support for something like LC_TIME's alt_digits or glibc printf(3)
>   extension that provides 'I' modifier for %d and friends - that use
>   locale-specific digits, say u+0f20..u+0f29 for Tibetan/Dzongkha
>   locales.
>
> But I don't really know much about those areas of locales...
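
For illustration only, the Numeric_Type rule quoted above can be
sketched in C roughly as follows. The enum and function names are
hypothetical and are not taken from mklocale(1) or any NetBSD source;
the sketch assumes one UnicodeData.txt record has already been split
into fields, and it ignores the Unihan tags.

```c
#include <stddef.h>

/* Hypothetical names, chosen for this sketch only. */
enum numeric_type { NT_NONE, NT_DECIMAL, NT_DIGIT, NT_NUMERIC };

/* Treat a NULL or empty string as an empty UnicodeData.txt field. */
static int
field_present(const char *f)
{
	return f != NULL && f[0] != '\0';
}

/*
 * Apply the UAX #44 rule to fields 6, 7 and 8 of one UnicodeData.txt
 * record. Characters that get Numeric_Type=Numeric only through the
 * Unihan tags are not covered here.
 */
enum numeric_type
numeric_type_from_fields(const char *f6, const char *f7, const char *f8)
{
	if (field_present(f6) && field_present(f7) && field_present(f8))
		return NT_DECIMAL;
	if (field_present(f7) && field_present(f8))
		return NT_DIGIT;
	if (field_present(f8))
		return NT_NUMERIC;
	return NT_NONE;	/* default: Numeric_Type=None */
}
```

A record with only field 8 set (for example a value such as "1/2")
falls through to NT_NUMERIC.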

Thank you for the info. As far as I can see, most of the characters
in question are categorized as Numeric_Type=Numeric, and it seems
difficult to distinguish these from, e.g., [0-9a-f].

On 2023/12/29 5:50, Brett Lymn wrote:
> Our wide curses relies on wcwidth to determine cursor positioning and
> call widths, so any curses based application attempting to display
> these characters will have a corrupted display.

Yeah, /usr/bin/vi actually gets confused when editing a message that
contains U+5146 ;)

I've roughly checked the output of the -d option of mklocale(1).
Width (and the other attribute fields) seem to be fixed now, as far
as I can see.
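
As a complementary spot check from C, a minimal sketch along these
lines could query wcwidth(3) for a single character. check_width is a
hypothetical helper name, and the sketch assumes that wchar_t values
are Unicode code points in the chosen locale.

```c
#define _XOPEN_SOURCE 700	/* some libcs need this to declare wcwidth(3) */

#include <assert.h>
#include <locale.h>
#include <wchar.h>

/*
 * Return wcwidth(3) for wc under the named LC_CTYPE locale, or -2 if
 * that locale cannot be selected. Note that this changes the
 * process-wide LC_CTYPE setting as a side effect.
 */
int
check_width(const char *locale_name, wchar_t wc)
{
	assert(locale_name != NULL);
	if (setlocale(LC_CTYPE, locale_name) == NULL)
		return -2;
	return wcwidth(wc);
}
```

Calling check_width("en_US.UTF-8", 0x5146) (the locale name is an
assumption) would normally be expected to report two columns for a
CJK ideograph; a larger value is what trips checks like the tmux one
quoted below.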

On 2023/12/28 13:30, Ryo ONODERA wrote:
> See: 'ud->width >2' in utf8_from_data() in src/external/bsd/tmux/dist/utf8.c
>
>   /* Get UTF-8 character from data. */
>   enum utf8_state
>   utf8_from_data(const struct utf8_data *ud, utf8_char *uc)
>   {
>           u_int   index;
>
>           if (ud->width > 2)
>                   fatalx("invalid UTF-8 width: %u", ud->width);

Oops. I'm not quite sure whether this is good programming
practice, but it was actually useful for finding the problem ;)

Thanks,
rin

