tech-userlevel archive


sed(1) and LC_CTYPE



If I set LC_CTYPE like this:

$ export LC_CTYPE=fr_FR.ISO8859-15

and then run sed, I get:

$ echo "éé" | sed 's/é/\é/g'
sed: 1: "s/é/\é/g": RE error: trailing backslash (\)

Where does the program manage to find a trailing backslash, i.e. 0134,
when 'é' is 0351?

Since, to my knowledge, we do not support anything via iconv or
the like, shouldn't we simply assume a string of bytes, à la C,
that is:

diff --git a/usr.bin/sed/main.c b/usr.bin/sed/main.c
index d87bce2a5c85..c6b69a83cd57 100644
--- a/usr.bin/sed/main.c
+++ b/usr.bin/sed/main.c
@@ -136,7 +136,7 @@ main(int argc, char *argv[])
 	char *temp_arg;
 
 	setprogname(argv[0]);
-	(void) setlocale(LC_ALL, "");
+	(void) setlocale(LC_ALL, "POSIX");
 
 	fflag = 0;
 	inplace = NULL;

? With such a change, the result is:


$ echo "éé" | ./sed 's/é/\é/g'
éé

and this is what I expected.

What is the rationale for taking the locale from the environment when
all the code in the source expects ASCII to begin with (for commands,
ranges and so on)?

What am I doing wrong?
-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                     http://www.kergis.com/
                    http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

