NetBSD-Bugs archive


Re: bin/58014: wc no longer works with binary files



The following reply was made to PR bin/58014; it has been noted by GNATS.

From: RVP <rvp%SDF.ORG@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: 
Subject: Re: bin/58014: wc no longer works with binary files
Date: Sun, 10 Mar 2024 11:34:31 +0000 (UTC)

 On Sat, 9 Mar 2024, michael.cheponis%gmail.com@localhost wrote:
 
 >> Description:
 > when 'wc' is given input from a binary file, it now gives the error:
 >
 > wc: hello: invalid byte sequence
 >
 > (Assuming 'hello' is a binary file)
 >
 
 
 Using mbrtowc() when `-m' wasn't supplied is pretty inefficient (no matter
 what the locale), but you can at least stop wc from spamming and confusing
 users with something like this:
 
 ```
 --- wc.c.orig	2024-01-14 17:39:19.000000000 +0000
 +++ wc.c	2024-03-10 11:06:10.228327632 +0000
 @@ -73,7 +73,7 @@
   #endif
 
   static wc_count_t	tlinect, twordct, tcharct, tlongest;
 -static bool		doline, doword, dobyte, dochar, dolongest;
 +static bool		doline, doword, dobyte, dochar, dolongest, warned;
   static int 		rval = 0;
 
   static void	cnt(const char *);
 @@ -148,8 +148,11 @@
   	do {
   		r = mbrtowc(wc, p, len, st);
   		if (r == (size_t)-1) {
 -			warnx("%s: invalid byte sequence", file);
 -			rval = 1;
 +			if (!warned && dochar) {
 +				warnx("%s: invalid byte sequence", file);
 +				rval = 1;
 +				warned = true;
 +			}
 
   			/* XXX skip 1 byte */
   			len--;
 @@ -187,6 +190,7 @@
   	int fd, len = 0;
 
   	linect = wordct = charct = longest = 0;
 +	warned = false;
   	if (file != NULL) {
   		if ((fd = open(file, O_RDONLY, 0)) < 0) {
   			warn("%s", file);
 ```
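
For readers who want to try the warn-once idea outside of wc.c, here is a minimal standalone sketch. It is not the actual wc.c code: `count_chars()` and its `bad` out-parameter are hypothetical names made up for illustration, and it assumes the whole input is already in one buffer. It decodes with mbrtowc(), skips one byte on an invalid sequence, and prints the warning at most once per call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, not the actual wc.c code: count the characters in
 * buf[0..len) with mbrtowc(), reporting an invalid sequence at most once.
 * Returns the number of characters decoded; *bad is set if any invalid
 * byte was skipped.
 */
static size_t
count_chars(const char *name, const char *buf, size_t len, bool *bad)
{
	mbstate_t st;
	wchar_t wc;
	size_t n = 0;
	bool warned = false;

	memset(&st, 0, sizeof(st));
	*bad = false;
	while (len > 0) {
		size_t r = mbrtowc(&wc, buf, len, &st);

		if (r == (size_t)-1) {
			if (!warned) {
				fprintf(stderr, "%s: invalid byte sequence\n", name);
				warned = true;
			}
			*bad = true;
			/* Skip one byte and restart from a clean state. */
			memset(&st, 0, sizeof(st));
			buf++;
			len--;
			continue;
		}
		if (r == (size_t)-2)
			break;		/* incomplete sequence at end of buffer */
		if (r == 0)
			r = 1;		/* an embedded NUL still occupies one byte */
		buf += r;
		len -= r;
		n++;
	}
	return n;
}
```

A caller would typically invoke setlocale(LC_CTYPE, "") first so that mbrtowc() follows the user's locale. Which bytes count as invalid depends on that locale and on the C library.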
 
 > wc works as one would expect on arm64.  This error only shows up on amd64
 >
 
 Are the counts in a multi-byte locale actually wrong compared with the C
 locale, or is it just the spew that worried you?
 
 > There is no mention of a "-b" switch, say, for binary files; nor is there any explanation on the "man wc" page explaining this.
 >
 
 There's a `-c' (byte-count) switch; with no options, wc reports lines, words
 and bytes, so the byte count is already part of the default output.
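
To illustrate why byte and line counts need no decoding at all, here is a rough sketch. It is not taken from wc.c: `struct raw_counts` and `count_raw()` are hypothetical names. It only looks at raw bytes, so arbitrary binary input cannot produce an "invalid byte sequence".

```c
#include <stddef.h>

/* Hypothetical counters for illustration; wc.c keeps its own totals. */
struct raw_counts {
	size_t bytes;
	size_t lines;
};

/*
 * Count bytes and newline characters in buf[0..len) without any
 * multibyte decoding, so arbitrary binary input cannot be "invalid".
 */
static struct raw_counts
count_raw(const char *buf, size_t len)
{
	struct raw_counts c = { len, 0 };

	for (size_t i = 0; i < len; i++)
		if (buf[i] == '\n')
			c.lines++;
	return c;
}
```

Word counting is the part that may still want locale-aware whitespace detection, which is presumably why the decoding loop gets used even without `-m'.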
 
 -RVP
 

