NetBSD-Bugs archive
Re: bin/58619: nawk 2024-08-17 broken and incompatible for non-UTF-8 and non-C locales
The following reply was made to PR bin/58619; it has been noted by GNATS.
From: RVP <rvp%SDF.ORG@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc:
Subject: Re: bin/58619: nawk 2024-08-17 broken and incompatible for non-UTF-8
and non-C locales
Date: Tue, 20 Aug 2024 10:36:34 +0000 (UTC)
On Tue, 20 Aug 2024, rokuyama.rk%gmail.com@localhost wrote:
> (BTW, their documentation is *REALLY* poor.)
>
Ya, the BSD extensions aren't documented in the `bsd-features' branch man-page.
> Try euc.txt, which I converted to EUC-JP from
> http://www.jp.netbsd.org/ja/JP/index.html
>
> ---
> $ ftp https://www.netbsd.org/~rin/euc.txt
> ...
> $ env LC_CTYPE=ja_JP.eucJP \
> awk 'BEGIN{sum = 0} {sum += length($0)} END{print sum}'
> ---
>
> Older versions and 2024-08-17 give 10978 and 10418, respectively.
>> Fix:
> Just for example above:
>
> https://gist.github.com/rokuyama/c7e6d12b6a7bcad0704f706c4f7e9569
>
Well, I guess it's a pain to prepend `LC_ALL=C' in every non-UTF-8 locale, so:
```
diff -urN nawk.orig/dist/main.c nawk/dist/main.c
--- nawk.orig/dist/main.c 2024-08-18 03:11:06.691688756 +0000
+++ nawk/dist/main.c 2024-08-20 10:24:10.089804741 +0000
@@ -32,6 +32,7 @@
 #include <stdio.h>
 #include <ctype.h>
 #include <locale.h>
+#include <langinfo.h>
 #include <stdlib.h>
 #include <string.h>
 #include <signal.h>
@@ -143,6 +144,8 @@
 
 	setlocale(LC_CTYPE, "");
 	setlocale(LC_NUMERIC, "C"); /* for parsing cmdline & prog */
+	if (strcmp(nl_langinfo(CODESET), "UTF-8"))
+		setlocale(LC_ALL, "C");	/* not UTF-8, force "C" */
 	awk_mb_cur_max = MB_CUR_MAX;
 	cmdname = argv[0];
 	if (argc == 1) {
```
-RVP