NetBSD-Bugs archive
Re: bin/57544: sed(1) and regex(3) problem with encoding
The following reply was made to PR bin/57544; it has been noted by GNATS.
From: RVP <rvp%SDF.ORG@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: tlaronde%polynum.com@localhost
Subject: Re: bin/57544: sed(1) and regex(3) problem with encoding
Date: Mon, 31 Jul 2023 16:45:00 +0000 (UTC)
On Mon, 31 Jul 2023, tlaronde%polynum.com@localhost wrote:
> From a cursory look, the difference between setting LC_CTYPE=C (no
> problem) or LC_CTYPE=fr_FR.ISO8859-15 (just as an example) is perhaps
> that in the first case extended RE are assumed, while in the latter case
> legacy is used, hence not following the same path (legacy using
> p_simp_re() while ERE uses p_ere_exp()).
>
No, it's the other half of the same test on line 1030 returning true/false.
In the fr_FR.ISO8859-1 locale, may_escape() returns false for `0xE9' because
it _is_ an alpha char. In the C/POSIX locale, may_escape() returns true
as `0xE9' is _not_ an alpha char there.
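To make the locale dependence concrete, here is a minimal sketch of that test, not the actual regcomp.c code. The names may_escape_sketch() and may_escape_in_locale() are made up for illustration, the `legacy_esc' flag stands in for the (p->pflags & PFLAG_LEGACY_ESC) check, and `ch' is narrowed to unsigned char so that isalpha() is well defined:
```c
#include <ctype.h>
#include <locale.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for may_escape(): `legacy_esc' replaces the
 * (p->pflags & PFLAG_LEGACY_ESC) test from the real code.
 */
static bool
may_escape_sketch(bool legacy_esc, unsigned char ch)
{
	if (legacy_esc)
		return (true);
	/* isalpha() consults the current LC_CTYPE locale. */
	if (isalpha(ch) || ch == '\'' || ch == '`')
		return (false);
	return (true);
}

/*
 * Evaluate the sketch for `ch' under the named LC_CTYPE locale.
 * Returns -1 if the locale cannot be set, otherwise 0 or 1.
 */
static int
may_escape_in_locale(const char *locale, unsigned char ch)
{
	if (setlocale(LC_CTYPE, locale) == NULL)
		return (-1);
	return (may_escape_sketch(false, ch) ? 1 : 0);
}
```
Calling may_escape_in_locale("C", 0xE9) gives 1. On a system where an ISO8859-1 locale is installed, the same call with that locale name would be expected to give 0, which is the split described above; if the locale is missing the helper returns -1 instead.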
Incidentally, that isalpha() test in may_escape() should really use iswalpha()
because `ch' is of type `wint_t':
```
diff -urN regex.orig/regcomp.c regex/regcomp.c
--- regex.orig/regcomp.c 2022-12-21 17:44:15.000000000 +0000
+++ regex/regcomp.c 2023-07-31 16:25:38.458547000 +0000
@@ -1422,7 +1422,7 @@
 	if ((p->pflags & PFLAG_LEGACY_ESC) != 0)
 		return (true);
-	if (isalpha(ch) || ch == '\'' || ch == '`')
+	if (iswalpha(ch) || ch == '\'' || ch == '`')
 		return (false);
 	return (true);
 #ifdef NOTYET
```
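For reference, the reason the type matters: isalpha() takes an int whose value must be representable as an unsigned char or be EOF, otherwise the behaviour is undefined, whereas iswalpha() accepts any wint_t. A standalone sketch of the patched test follows. The function name and the `legacy_esc' parameter are hypothetical; the real function takes a `struct parse *':
```c
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical wide-character variant of the test in the diff.
 * `legacy_esc' stands in for (p->pflags & PFLAG_LEGACY_ESC).
 */
static bool
may_escape_wide_sketch(bool legacy_esc, wint_t ch)
{
	if (legacy_esc)
		return (true);
	/* iswalpha() is defined for the full wint_t range. */
	if (iswalpha(ch) || ch == L'\'' || ch == L'`')
		return (false);
	return (true);
}
```
With this shape, may_escape_wide_sketch(false, L'a') is false and may_escape_wide_sketch(false, L'{') is true. What it returns for non-ASCII values still depends on the LC_CTYPE locale and on the C library.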
As you said, this code ought to be carefully audited. :)
-RVP