I can confirm that in my arm64 test, where 'wc' 'worked' on binary files, the environment variable LC_ALL=C was set.
I think the man page for wc needs updating, at least to explain its interaction with that environment variable. There *is* a discussion on that man page about needing to use the POSIX iswspace() function, but when I followed that page, there was no detail about the LC_ALL environment variable.
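To illustrate the kind of interaction I mean, here is a minimal sketch of locale-dependent word counting built on mbrtowc() and iswspace(). It is only an assumption about the general technique, not the actual wc source, and the function name count_words_mb is made up for this example.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not taken from any wc implementation): counts
 * words in buf[0..len) using the current LC_CTYPE locale. The caller
 * chooses the locale, e.g. with setlocale(LC_ALL, "") so that LC_ALL
 * from the environment is honored. */
size_t count_words_mb(const char *buf, size_t len)
{
    mbstate_t state;
    size_t words = 0, pos = 0;
    int in_word = 0;

    memset(&state, 0, sizeof state);
    while (pos < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, buf + pos, len - pos, &state);
        int is_space;

        if (n == (size_t)-1 || n == (size_t)-2) {
            /* Invalid or truncated sequence: treat one byte as a
             * non-space character and restart decoding after it. */
            memset(&state, 0, sizeof state);
            n = 1;
            is_space = 0;
        } else {
            if (n == 0)
                n = 1;  /* an embedded NUL occupies one byte */
            is_space = iswspace((wint_t)wc) != 0;
        }

        if (is_space)
            in_word = 0;
        else if (!in_word) {
            in_word = 1;
            ++words;
        }
        pos += n;
    }
    return words;
}
```

In this sketch, how a given byte is decoded and whether it counts as whitespace both depend on the locale in effect, which is exactly the detail I could not find documented. How a real wc treats invalid sequences may well differ from the choice made here.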
Also, historically, wc was something like this:
#include <stdio.h>

int main(int argc, char *argv[]) {
    int character, lineCount = 0, wordCount = 0, byteCount = 0, inWord = 0;

    while ((character = getchar()) != EOF) {
        ++byteCount;
        if (character == '\n')
            ++lineCount;
        /* A word is any run of bytes between spaces, newlines, or tabs. */
        if (character == ' ' || character == '\n' || character == '\t')
            inWord = 0;
        else if (inWord == 0) {
            inWord = 1;
            ++wordCount;
        }
    }
    printf("%d %d %d\n", lineCount, wordCount, byteCount);
    return 0;
}
That is, because Unix 'files' are simply strings of bytes, it may be meaningless to count 'words' and 'lines' in them, but the character count (file size) is still useful.
Generally, I use this when I want to know the source size and the program's executable is sitting in the source directory as a build artifact: I just run "wc *".
Anyway, I'm asking for a documentation change.
Thank you,
Mike