NetBSD-Bugs archive


Re: bin/58014: wc no longer works with binary files



Crap.  I see what the problem is:  I want my "ll" alias to give me commas
in the file length reported.

This requires setting env vars like this:
LANG=en_US.UTF-8
LC_ALL=""
LC_NUMERIC=en_US.UTF-8 locale -k thousands_sep

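Specifically, note that LC_ALL must be set to "" (empty or unset), because
a non-empty LC_ALL overrides LC_NUMERIC and every other LC_* category.

That produces output like this: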
-rwxr-xr-x  1 mac  users  17,096 Mar  8 23:31 n*
-rw-r--r--  1 mac  users     560 Mar  8 23:31 n.c
-rw-r--r--  1 mac  users     155 Mar  8 23:01 n.c~

Now, if I set LC_ALL=C (to make 'wc' count OK on binary files), then my
"ll" gives:
-rwxr-xr-x  1 mac  users   17096 Mar  8 23:31 n*
-rw-r--r--  1 mac  users     560 Mar  8 23:31 n.c
-rw-r--r--  1 mac  users     155 Mar  8 23:01 n.c~
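
The difference between the two listings comes down to the thousands
separator that the active LC_NUMERIC category supplies. Here is a minimal C
sketch of how a program can query it with setlocale() and localeconv().
The helper name thousands_sep_for is made up for illustration and is not
taken from ls or wc:

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): copy the thousands separator
 * of the named locale into buf.  Returns 0 on success, -1 if the locale
 * is not available on this system or buf is too small.
 * Side effect: leaves LC_NUMERIC set to locale_name on success.
 */
int thousands_sep_for(const char *locale_name, char *buf, size_t buflen) {
    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return -1;

    const char *sep = localeconv()->thousands_sep;
    if (strlen(sep) >= buflen)
        return -1;

    strcpy(buf, sep);
    return 0;
}
```

For the "C" locale the separator is the empty string, which matches the
17096 line above. For en_US.UTF-8 it should be "," on systems where that
locale is installed, but I have not shown that here.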

Catch 22 -- I have to use an alias for wc that changes the locale
environment variable only while running wc:

alias wc="LC_ALL=C wc"

Again, I'm not sure this is sufficiently documented.   I'd be happy to make
suggested changes to the man page(s).

Thanks again,
Mike


On Sat, Mar 9, 2024 at 12:05 PM Michael Cheponis <michael.cheponis%gmail.com@localhost>
wrote:

> It's indeed the case that on my arm64 test of 'wc' that 'worked' on binary
> files, the environment variable "LC_ALL=C" was set.
>
> I think the man page for wc needs updating, at least, to explain its
> interaction with that environment variable.   There *is* a discussion on
> that man page about needing to use the POSIX iswspace() function, but when
> I followed that reference, there was no detail about the LC_ALL
> environment variable.
>
> Also, historically, wc was something like this:
>
> #include <stdio.h>
>
> int main(int argc, char *argv[]) {
>     int character, lineCount = 0, wordCount = 0, byteCount = 0, inWord = 0;
>
>     while ((character = getchar()) != EOF) {
>         ++byteCount;
>         if (character == '\n')
>             ++lineCount;
>         if (character == ' ' || character == '\n' || character == '\t')
>             inWord = 0;
>         else if (inWord == 0) {
>             inWord = 1;
>             ++wordCount;
>         }
>     }
>
>     printf("%d %d %d\n", lineCount, wordCount, byteCount);
>     return 0;
> }
>
> That is, because Unix 'files' are simply strings of bytes, it may be
> meaningless to count 'words' and 'lines' -- but yes, the character count
> (file size) is useful.
>
> Generally, I use this when I want to know source size, and the program's
> executable is in the source directory as an artifact - I do "wc *"
>
> Anyway, I'm asking for a documentation change.
>
> Thank you,
> Mike
>
> On Sat, Mar 9, 2024 at 1:55 AM Robert Elz <kre%munnari.oz.au@localhost> wrote:
>
>> The following reply was made to PR bin/58014; it has been noted by GNATS

