NetBSD-Bugs archive
Re: bin/58014: wc no longer works with binary files
Crap. I see what the problem is: I want my "ll" alias to give me commas
in the file length reported.
This requires setting env vars like this:
LANG=en_US.UTF-8
LC_ALL=""
LC_NUMERIC=en_US.UTF-8 locale -k thousands_sep
Specifically, note that LC_ALL must be set to "" (or left unset),
which produces output like this:
-rwxr-xr-x 1 mac users 17,096 Mar 8 23:31 n*
-rw-r--r-- 1 mac users 560 Mar 8 23:31 n.c
-rw-r--r-- 1 mac users 155 Mar 8 23:01 n.c~
Now, if I set LC_ALL=C (to make 'wc' count ok on binary files), then I
get from my "ll" :
-rwxr-xr-x 1 mac users 17096 Mar 8 23:31 n*
-rw-r--r-- 1 mac users 560 Mar 8 23:31 n.c
-rw-r--r-- 1 mac users 155 Mar 8 23:01 n.c~
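The difference between the two listings follows from the POSIX lookup order for locale variables: a non-empty LC_ALL overrides every LC_* category, then the category's own variable applies, then LANG. A minimal sketch of that order is below; effective_locale is a hypothetical helper written for illustration, not an existing command, and the final fallback to "C" is an assumption (POSIX leaves that default implementation-defined).

```shell
# effective_locale CATEGORY
# Print the locale name that applies to CATEGORY (e.g. LC_NUMERIC),
# following the POSIX order: LC_ALL, then CATEGORY itself, then LANG.
# Empty values are treated the same as unset ones.
effective_locale() {
    # Fetch the value of the variable whose name is in $1.
    eval "cat_value=\${$1:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$cat_value" ]; then
        printf '%s\n' "$cat_value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumed fallback; the real default is implementation-defined.
        printf '%s\n' "C"
    fi
}
```

With LC_ALL="" and LC_NUMERIC=en_US.UTF-8, "effective_locale LC_NUMERIC" prints en_US.UTF-8; with LC_ALL=C it prints C, which is why the thousands separator disappears.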
Catch 22 -- I have to use an alias for wc that sets the locale
environment variable only while wc is running:
alias wc="LC_ALL=C wc"
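The same per-command override could also be written as a shell function, which works in scripts where aliases are not expanded. This is only a sketch, assuming a POSIX-style sh; "command" is used so the function calls the real wc rather than itself.

```shell
# Run the real wc with LC_ALL=C for that one invocation only.
# The assignment applies to this command and does not change
# the locale settings of the calling shell.
wc() {
    LC_ALL=C command wc "$@"
}
```

The rest of the session keeps LC_ALL="" and LC_NUMERIC as set, so "ll" still prints the comma-separated sizes.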
Again, I'm not sure this is sufficiently documented. I'd be happy to make
suggested changes to the man page(s).
Thanks again,
Mike
On Sat, Mar 9, 2024 at 12:05 PM Michael Cheponis <michael.cheponis%gmail.com@localhost>
wrote:
> It's indeed the case that on my arm64 test of 'wc' that 'worked' on binary
> files, the environment variable "LC_ALL=C" was set.
>
> I think the man page for wc needs updating, at least, to explain its
> interaction with that environment variable. There *is* a discussion on
> that man page about needing to use the POSIX iswspace() function, but when I
> followed that page, there was no detail about the LC_ALL environment
> variable.
>
> Also, historically, wc was something like this:
>
> #include <stdio.h>
>
> int main(int argc, char *argv[]) {
>     int character, lineCount = 0, wordCount = 0, byteCount = 0, inWord = 0;
>
>     /* Read stdin one byte at a time; no locale handling at all. */
>     while ((character = getchar()) != EOF) {
>         ++byteCount;
>         if (character == '\n')
>             ++lineCount;
>         /* A word ends at a space, newline, or tab. */
>         if (character == ' ' || character == '\n' || character == '\t')
>             inWord = 0;
>         else if (inWord == 0) {
>             inWord = 1;
>             ++wordCount;
>         }
>     }
>
>     printf("%d %d %d\n", lineCount, wordCount, byteCount);
>     return 0;
> }
>
> That is, because unix 'files' are simply strings-of-bytes, it may be
> meaningless to count 'words' and 'lines' -- but yes, characters (file size)
> is useful.
>
> Generally, I use this when I want to know source size, and the program's
> executable is in the source directory as an artifact - I do "wc *"
>
> Anyway, I'm asking for a documentation change.
>
> Thank you,
> Mike
>
> On Sat, Mar 9, 2024 at 1:55 AM Robert Elz <kre%munnari.oz.au@localhost> wrote:
>
>> The following reply was made to PR bin/58014; it has been noted by GNATS