NetBSD-Bugs archive


Re: bin/57544: sed(1) and regex(3) problem with encoding



The following reply was made to PR bin/57544; it has been noted by GNATS.

From: RVP <rvp%SDF.ORG@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: tlaronde%polynum.com@localhost
Subject: Re: bin/57544: sed(1) and regex(3) problem with encoding
Date: Mon, 31 Jul 2023 16:45:00 +0000 (UTC)

 On Mon, 31 Jul 2023, tlaronde%polynum.com@localhost wrote:
 
 > From a cursory look, the difference between setting LC_CTYPE=C (no
 > problem) or LC_CTYPE=fr_FR.ISO8859-15 (just as an example) is perhaps
 > that in the first case extended RE are assumed, while in the latter case
 > legacy is used, hence not following the same path (legacy using
 > p_simp_re() while ERE uses p_ere_exp()).
 >
 
 No, the difference comes from the other half of the same test on line 1030,
 which returns true in one locale and false in the other. In the
 fr_FR.ISO8859-1 locale, may_escape() returns false for `0xE9' because
 it _is_ an alpha char there. In the C/POSIX locale, may_escape() returns true
 because `0xE9' is _not_ an alpha char there.
 
 Incidentally, that isalpha() test in may_escape() should really use iswalpha()
 because `ch' is of type `wint_t', while isalpha() is only defined for values
 representable as an unsigned char (or EOF):
 
 ```
 diff -urN regex.orig/regcomp.c regex/regcomp.c
 --- regex.orig/regcomp.c	2022-12-21 17:44:15.000000000 +0000
 +++ regex/regcomp.c	2023-07-31 16:25:38.458547000 +0000
 @@ -1422,7 +1422,7 @@
 
   	if ((p->pflags & PFLAG_LEGACY_ESC) != 0)
   		return (true);
 -	if (isalpha(ch) || ch == '\'' || ch == '`')
 +	if (iswalpha(ch) || ch == '\'' || ch == '`')
   		return (false);
   	return (true);
   #ifdef NOTYET
 ```
 
 As you said, this code ought to be carefully audited. :)
 
 -RVP
 

