tech-userlevel archive

Re: sed(1) and LC_CTYPE



On Wed, Jul 26, 2023 at 07:17:17PM +0200, tlaronde%polynum.com@localhost wrote:
> On Wed, Jul 26, 2023 at 06:32:15PM +0200, Martin Husemann wrote:
> > On Wed, Jul 26, 2023 at 12:19:39PM -0400, Mouse wrote:
> > > > $ export LC_CTYPE=fr_FR.ISO8859-15
> > > 
> > > > $ echo "éé" | sed 's/é/\é/g'
> > > > sed: 1: "s/é/\é/g": RE error: trailing backslash (\)
> > > 
> > > I agree that's broken.
> > > 
> > > > Since, to my knowledge, we do not support anything via iconv or
> > > > whatever, shouldn't we assume simply a string of bytes à la C, that
> > > > is:
> > > 
> > > Seems to me there's a deeper problem.  Even if something like iconv
> > > _were_ available, fr_FR.ISO8859-15 is a single-octet character set, so
> > > 
> > > > -	(void) setlocale(LC_ALL, "");
> > > > +	(void) setlocale(LC_ALL, "POSIX");
> > > 
> > > should, it seems to me, make no difference.  Am I misunderstanding?
> > 
> > Indeed - and it only does on architectures where char == signed char:
> 
> Very good catch, indeed.
> 
> And this is a regression vs 9.3. I suspect the main difference is the
> setlocale(3) call; changing it does not solve the deeper problem, it
> only circumvents it.
> 
> PR sent as bin/57544

RVP has spotted the culprit for this particular failure. The rest of the
code still needs a review for similar problems in other uses and in the
interaction with the locales.

The amended diff, with more explanations and caveats, has been put in
bin/57544. The correct behavior was verified by building libc with
this diff and linking sed(1) statically against the amended libc.
-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                     http://www.kergis.com/
                    http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

