NetBSD-Bugs archive


bin/54424: awk: broken character classes in UTF-8 locale: only the first matches



>Number:         54424
>Category:       bin
>Synopsis:       awk: broken character classes in UTF-8 locale: only the first matches
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 31 23:05:00 +0000 2019
>Originator:     Martijn Dekker
>Release:        9.0_BETA
>Organization:
modernish
>Environment:
NetBSD localhost 9.0_BETA NetBSD 9.0_BETA (GENERIC) #0: Tue Jul 30 16:52:10 UTC 2019  mkrepro%mkrepro.NetBSD.org@localhost:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
When a UTF-8 locale is active, /usr/bin/awk only matches the first character class in a bracket expression, even when matching simple ASCII characters.

I've confirmed this on NetBSD 8.1 as well. I've not tested earlier versions.

/usr/bin/awk on OpenBSD, FreeBSD and macOS (as well as nawk variants) does not have this problem, nor does the current upstream version (20190717).

>How-To-Repeat:
$ echo x | LANG=C awk '/[[:digit:][:alpha:]]/'  # ok
x
$ echo x | LANG=en_US.UTF-8 awk '/[[:digit:][:alpha:]]/'  # WRONG
$ echo x | LANG=en_US.UTF-8 awk '/[[:alpha:][:digit:]]/'  # ok
x
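
The three commands above can be folded into a small loop that tries both class orderings under both locales. This is only a sketch: the helper name `probe` is made up for illustration, and it sets LC_ALL rather than LANG on the assumption that a pre-set LC_ALL in the caller's environment should not mask the locale under test.

```shell
#!/bin/sh
# Hypothetical helper (not part of awk or NetBSD): probe LOCALE REGEX INPUT
# Prints "match" if INPUT matches /REGEX/ under LOCALE, else "no match".
probe() {
    # The regex is spliced in as a literal /.../, as in the commands above,
    # so it must not contain a slash or a double quote.
    printf '%s\n' "$3" |
        LC_ALL=$1 awk "/$2/ { found = 1 } END { print (found ? \"match\" : \"no match\") }"
}

# Try both orderings of the two classes in each locale.
for loc in C en_US.UTF-8; do
    for re in '[[:digit:][:alpha:]]' '[[:alpha:][:digit:]]'; do
        printf '%-12s %-24s %s\n' "$loc" "$re" "$(probe "$loc" "$re" x)"
    done
done
```

On an unaffected awk all four rows should say "match"; with the bug described here, the en_US.UTF-8 row with [:digit:] first would be the one reporting "no match". If the en_US.UTF-8 locale is not installed, the result for those rows may simply reflect the C locale.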

>Fix:


