Re: PR/57798 CVS commit: src/usr.bin/mklocale

To: lib-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,ryo%tetera.org@localhost
Subject: Re: PR/57798 CVS commit: src/usr.bin/mklocale
From: Rin Okuyama <rokuyama.rk%gmail.com@localhost>
Date: Fri, 5 Jan 2024 03:10:02 +0000 (UTC)

The following reply was made to PR lib/57798; it has been noted by GNATS.

From: Rin Okuyama <rokuyama.rk%gmail.com@localhost>
To: gnats-bugs%netbsd.org@localhost, lib-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
 netbsd-bugs%netbsd.org@localhost, ryo%tetera.org@localhost
Cc: ryo%tetera.org@localhost, Brett Lymn <blymn%internode.on.net@localhost>,
 Valery Ushakov <uwe%stderr.spb.ru@localhost>, Martin Husemann <martin%duskware.de@localhost>
Subject: Re: PR/57798 CVS commit: src/usr.bin/mklocale
Date: Fri, 5 Jan 2024 12:05:53 +0900

 I've committed a revised fix to -current.
 
 As I wrote in the commit log, the original problem affected not
 only wcwidth(3), but also other attributes stored in _RuneType.
 
 I'll send a pullup request to netbsd-10 tomorrow, if there are no
 objections.
 
 On 2023/12/29 6:40, Valery Ushakov wrote:
 >   Unicode has three different "numeric" values for a character
 >   
 >   Unicode Character Database
 >   https://unicode.org/reports/tr44/
 >   
 >     Numeric_Value is extracted based on the actual numeric value of the
 >     data in field 8 of UnicodeData.txt or the values of the
 >     kPrimaryNumeric, kAccountingNumeric, or kOtherNumeric tags, for
 >     characters listed in the Unihan data files.
 >   
 >     Numeric_Type is extracted as follows.  If fields 6, 7, and 8 in
 >     UnicodeData.txt are all non-empty, then Numeric_Type=Decimal.
 >     Otherwise, if fields 7 and 8 are both non-empty, then
 >     Numeric_Type=Digit.  Otherwise, if field 8 is non-empty, then
 >     Numeric_Type=Numeric.  For characters listed in the Unihan data
 >     files, Numeric_Type=Numeric for characters that have
 >     kPrimaryNumeric, kAccountingNumeric, or kOtherNumeric tags.  The
 >     default value is Numeric_Type=None.
 >   
 >   The intention of TODIGIT is likely to be able to eventually provide
 >   support for something like LC_TIME's alt_digits or glibc printf(3)
 >   extension that provides 'I' modifier for %d and friends - that use
 >   locale-specific digits, say u+0f20..u+0f29 for Tibetan/Dzongkha
 >   locales.
 >   
 >   But I don't really know much about those areas of locales...
 
 Thank you for the info. As far as I can see, most of the characters
 in question are categorized as Numeric_Type=Numeric, and it seems
 difficult to distinguish these from, e.g., [0-9a-f].
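 
 For reference, the Numeric_Type rules quoted above could be sketched
 in C roughly as follows. This is only an illustration under stated
 assumptions: the enum and numeric_type() are hypothetical names, not
 part of mklocale(1), the three arguments are assumed to be fields 6,
 7 and 8 of one UnicodeData.txt record (counting from 0), and the
 Unihan tags are not handled at all.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only; not mklocale(1) code. */
enum numeric_type { NT_NONE, NT_NUMERIC, NT_DIGIT, NT_DECIMAL };

/* A field counts as non-empty if it is present and has at least one char. */
static int
nonempty(const char *s)
{
	return s != NULL && s[0] != '\0';
}

/*
 * Derive Numeric_Type from fields 6, 7 and 8 of a UnicodeData.txt
 * record (fields counted from 0). Unihan tags are not handled here.
 */
static enum numeric_type
numeric_type(const char *f6, const char *f7, const char *f8)
{
	if (nonempty(f6) && nonempty(f7) && nonempty(f8))
		return NT_DECIMAL;	/* all three fields set */
	if (nonempty(f7) && nonempty(f8))
		return NT_DIGIT;	/* fields 7 and 8 only */
	if (nonempty(f8))
		return NT_NUMERIC;	/* field 8 only */
	return NT_NONE;			/* default value */
}
```

 For example, U+0030 has all three fields set (Decimal), U+00B2 has
 only fields 7 and 8 (Digit), and U+00BD has only field 8 (Numeric).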
 
 On 2023/12/29 5:50, Brett Lymn wrote:
  > Our wide curses relies on wcwidth to determine cursor positioning and
  > call widths, so any curses based application attempting to display
  > these characters will have a corrupted display.
 
 Yeah, /usr/bin/vi actually gets confused when editing a message that
 contains U+5146 ;)
 
 I've roughly checked the output of mklocale(1) with the -d option.
 The width (and other attribute fields) seem to be fixed now, as far
 as I can see.
 
 On 2023/12/28 13:30, Ryo ONODERA wrote:
  >   See: 'ud->width >2' in utf8_from_data() in
  >   src/external/bsd/tmux/dist/utf8.c
  >
  >   /* Get UTF-8 character from data. */
  >   enum utf8_state
  >   utf8_from_data(const struct utf8_data *ud, utf8_char *uc)
  >   {
  >           u_int   index;
  >
  >           if (ud->width > 2)
  >                   fatalx("invalid UTF-8 width: %u", ud->width);
 
 Oops. I'm not quite sure whether this is good programming
 practice, but it was actually useful for finding the problem ;)
 
 Thanks,
 rin
