NetBSD-Bugs archive


bin/57544: sed(1) and regex(3) problem with encoding



>Number:         57544
>Category:       bin
>Synopsis:       sed(1) and regex(3) problem with encoding
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 26 17:05:00 +0000 2023
>Originator:     Thierry LARONDE
>Release:        NetBSD 10.0_BETA
>Organization:
>Environment:
NetBSD cauchy.polynum.local 10.0_BETA NetBSD 10.0_BETA (cauchy) #0: Mon Feb 27 11:28:34 CET 2023  tlaronde@cauchy.polynum.local:/usr/obj/polynum.NODECONF-cauchy.polynum.local_netbsd-9.3-amd64_netbsd-amd64/netbsd/obj/sys/arch/amd64/compile/cauchy amd64

>Description:
$ export LC_CTYPE=fr_FR.ISO8859-15

and then:

$ echo "éé" | sed 's/é/\é/g'
sed: 1: "s/é/\é/g": RE error: trailing backslash (\)

$ export LC_CTYPE=POSIX.ISO8859-15 # incorrect setting but...
$ echo "éé" | sed 's/é/\é/g'
éé

According to a test by Martin HUSEMANN, the problem occurs on
architectures where plain char is signed. (On Apple POWERMAC_G5.MP,
the result is as expected.)

Note: this is a regression from 9.3. It can be masked, though not
solved, by:

-   (void) setlocale(LC_ALL, "");
+   (void) setlocale(LC_ALL, "POSIX");

probably in every text utility using regex(3). 


>How-To-Repeat:
$ export LC_CTYPE=fr_FR.ISO8859-15
$ echo "éé" | sed 's/é/\é/g'
sed: 1: "s/é/\é/g": RE error: trailing backslash (\)

(On architectures where plain char is signed, such as amd64.)
>Fix:
No fix: the underlying problem remains. Workaround:

-   (void) setlocale(LC_ALL, "");
+   (void) setlocale(LC_ALL, "POSIX");


