NetBSD-Bugs archive


PR/60369 CVS commit: [netbsd-11] src



The following reply was made to PR standards/60369; it has been noted by GNATS.

From: "Martin Husemann" <martin%netbsd.org@localhost>
To: gnats-bugs%gnats.NetBSD.org@localhost
Cc: 
Subject: PR/60369 CVS commit: [netbsd-11] src
Date: Sat, 4 Jul 2026 15:44:51 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sat Jul  4 15:44:51 UTC 2026
 
 Modified Files:
 	src/lib/libc/citrus/modules [netbsd-11]: citrus_utf8.c
 	src/lib/libc/locale [netbsd-11]: c8rtomb.c
 	src/tests/lib/libc/locale [netbsd-11]: t_c8rtomb.c
 
 Log Message:
 Additionally pull up following revision(s) (requested by riastradh in ticket #339):
 
 	tests/lib/libc/locale/t_c8rtomb.c: revision 1.8
 	lib/libc/locale/c8rtomb.c: revision 1.10
 	lib/libc/citrus/modules/citrus_utf8.c: revision 1.20
 	lib/libc/citrus/modules/citrus_utf8.c: revision 1.21
 	lib/libc/citrus/modules/citrus_utf8.c: revision 1.22
 
 Be truly pedantic about UTF-8 encodings
 
 If we're not going to accept "legacy" UTF-8 (5 and 6 byte encodings
 for code points >= 0x00200000, which the standards don't allow, as
 they won't fit in UTF-16) then we certainly should never be able to
 generate them.  Even more, we should be pedantic about rejecting the
 various forms of mis-coded strings for which there is no
 justification, but which have been known to be used in attempts to
 violate security.
 
 This, I believe, now enforces all the current restrictions, e.g.
 it will no longer be possible to encode ASCII in 2 bytes (0xc0 '.')
 and similar; only the shortest legal encoding will be accepted
 (and only that will be generated, but that was always true).
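
One common way to express the shortest-form rule is to compare the number of bytes a value was decoded from against the minimum number of bytes that value needs. The sketch below illustrates that idea only; it is not taken from citrus_utf8.c, and the names utf8_min_len and utf8_is_shortest are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes in the shortest UTF-8 encoding
 * of cp, or 0 if cp is not a Unicode scalar value (a surrogate, or
 * anything above U+10FFFF).
 */
static unsigned
utf8_min_len(uint32_t cp)
{

	if (cp <= 0x7f)
		return 1;
	if (cp <= 0x7ff)
		return 2;
	if (cp >= 0xd800 && cp <= 0xdfff)
		return 0;	/* surrogates are never encodable */
	if (cp <= 0xffff)
		return 3;
	if (cp <= 0x10ffff)
		return 4;
	return 0;		/* would need the "legacy" long forms */
}

/*
 * Hypothetical helper: was a value that was decoded from len bytes
 * encoded in its shortest form?
 */
static int
utf8_is_shortest(uint32_t cp, unsigned len)
{
	unsigned min = utf8_min_len(cp);

	return min != 0 && min == len;
}
```

Under these assumptions, a '.' (0x2e) recovered from a 2-byte sequence fails the check, while the same value read from a single byte passes.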
 
 It is quite possible that this will break things, probably many
 tests, as random garbage will no longer be accepted as valid; things
 must be properly encoded.
 
 mbrtowc(): fix a stupid typo in the previous version.
 
 No idea how I managed to miss this previously.   This update should
 make at least some of the ATF tests (and other stuff) which failed
 after the previous change start working again.
 
 libc: Fix two bugs in UTF-8 decoding and add exhaustive tests.
 
 1. Despite the recent slew of changes, mbrtowc(3) in UTF-8 locales
    would still fail to return EILSEQ at the first byte where it can,
    because it would silently consume the first two bytes encoding a
    surrogate code point and only reject it after consuming the third
    byte.
 2. The byte classification table for the compressed DFA used by
    c8rtomb(3) was wrong for the bytes c0-c1 and f5-ff.  Somehow the
    program I used, over a decade ago, to generate the compressed DFA
    and classification table got those, and only those, wrong, and I
    can't find that program now, so I'll just have to correct this by
    hand.  Class 10 was missing from the table (but class 11 was not)
    and is obviously the right class for the always-invalid c0-c1 and
    f5-ff because all states transition to UTF8_REJECT (=96) on class
    10, and the same is not true for any other class.
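
Both points come down to rejecting input at the earliest byte that rules out every valid continuation. The following is a minimal sketch of a byte-at-a-time checker with that property. It assumes the well-formed byte ranges of the Unicode standard and is not the DFA used by c8rtomb(3); struct u8state, u8_feed and u8_first_bad are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical incremental state: what the next byte must look like. */
struct u8state {
	unsigned need;		/* continuation bytes still expected */
	unsigned char lo, hi;	/* allowed range for the next byte */
};

/* Feed one byte; return 0 if still plausible, -1 if rejected here. */
static int
u8_feed(struct u8state *st, unsigned char b)
{

	if (st->need == 0) {
		st->lo = 0x80;
		st->hi = 0xbf;
		if (b <= 0x7f)
			return 0;
		if (b >= 0xc2 && b <= 0xdf) {
			st->need = 1;
		} else if (b >= 0xe0 && b <= 0xef) {
			st->need = 2;
			if (b == 0xe0)
				st->lo = 0xa0;	/* no overlong forms */
			if (b == 0xed)
				st->hi = 0x9f;	/* no surrogates */
		} else if (b >= 0xf0 && b <= 0xf4) {
			st->need = 3;
			if (b == 0xf0)
				st->lo = 0x90;	/* no overlong forms */
			if (b == 0xf4)
				st->hi = 0x8f;	/* nothing above U+10FFFF */
		} else {
			return -1;	/* 80-bf, c0-c1, f5-ff */
		}
		return 0;
	}
	if (b < st->lo || b > st->hi)
		return -1;
	st->lo = 0x80;		/* later bytes use the plain range */
	st->hi = 0xbf;
	st->need--;
	return 0;
}

/* Index of the first rejected byte in s[0..n), or n if there is none. */
static size_t
u8_first_bad(const unsigned char *s, size_t n)
{
	struct u8state st = { 0, 0x80, 0xbf };
	size_t i;

	for (i = 0; i < n; i++)
		if (u8_feed(&st, s[i]) == -1)
			return i;
	return n;
}
```

With this sketch the surrogate encoding ed a0 80 is rejected at index 1, the second byte, and c0, c1 and f5-ff are rejected at index 0.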
 
 Now mbrtowc(3) in LC_CTYPE=C.UTF-8 is checked against c8rtomb(3)
 systematically to verify they agree on all possible inputs one byte
 at a time.  There are 4501261 distinct such inputs, including the
 valid encodings of all Unicode scalar values, the invalid encodings
 of Unicode surrogate code points, and other invalid encodings.  We
 stop at the first invalid byte, so it is not necessary to examine all
 ~four billion 4-byte strings.  I considered randomly subsampling to
 make this test take less time, but decided that would be too
 confusing.  For debugging purposes, you can run this new test with
 `atf-run -v c8rtomb_all_failures=yes' to show all failures rather
 than just the first one; this produces megabytes of output with some
 of the bugs we've had, so it's off by default.
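
The input count quoted above can be reproduced by walking every byte string one byte at a time and stopping at the first invalid byte. The sketch below does only the counting, assuming the well-formed byte ranges of the Unicode standard. It is not the code in t_c8rtomb.c, and all names in it are hypothetical.

```c
/* Hypothetical counters: complete valid strings, rejected strings. */
static unsigned long n_valid, n_invalid;

/* Total length implied by lead byte b, or 0 if b can never lead. */
static unsigned
lead_len(unsigned b)
{

	if (b <= 0x7f)
		return 1;
	if (b >= 0xc2 && b <= 0xdf)
		return 2;
	if (b >= 0xe0 && b <= 0xef)
		return 3;
	if (b >= 0xf0 && b <= 0xf4)
		return 4;
	return 0;
}

/* Is b acceptable at position pos (>= 1) after lead byte lead? */
static int
cont_ok(unsigned lead, unsigned pos, unsigned b)
{
	unsigned lo = 0x80, hi = 0xbf;

	if (pos == 1) {		/* only the second byte is restricted */
		if (lead == 0xe0)
			lo = 0xa0;
		if (lead == 0xed)
			hi = 0x9f;
		if (lead == 0xf0)
			lo = 0x90;
		if (lead == 0xf4)
			hi = 0x8f;
	}
	return b >= lo && b <= hi;
}

/* Extend a valid prefix of pos bytes; stop at the first invalid byte. */
static void
walk(unsigned lead, unsigned len, unsigned pos)
{
	unsigned b;

	if (pos == len) {
		n_valid++;
		return;
	}
	for (b = 0; b < 256; b++) {
		if (cont_ok(lead, pos, b))
			walk(lead, len, pos + 1);
		else
			n_invalid++;
	}
}

/* Count every distinct input that is examined. */
static unsigned long
count_inputs(void)
{
	unsigned b, len;

	n_valid = n_invalid = 0;
	for (b = 0; b < 256; b++) {
		len = lead_len(b);
		if (len == 0)
			n_invalid++;
		else
			walk(b, len, 1);
	}
	return n_valid + n_invalid;
}
```

Under these assumptions count_inputs() returns 4501261: 1112064 valid encodings of Unicode scalar values plus 3389197 strings that end at their first invalid byte.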
 
 PR standards/60369: mbrtowc, mbrlen have wrong return value for some
 invalid byte sequences
 
 
 To generate a diff of this commit:
 cvs rdiff -u -r1.18.42.1 -r1.18.42.2 \
     src/lib/libc/citrus/modules/citrus_utf8.c
 cvs rdiff -u -r1.9 -r1.9.4.1 src/lib/libc/locale/c8rtomb.c
 cvs rdiff -u -r1.7 -r1.7.4.1 src/tests/lib/libc/locale/t_c8rtomb.c
 
 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
 


