Subject: Re: wc: filename: invalid byte sequence
To: None <tls@rek.tjls.com>
From: Roland Dowdeswell <elric@imrryr.org>
List: tech-userlevel
Date: 08/26/2007 17:37:29
On 1188094481 seconds since the Beginning of the UNIX epoch
Thor Lancelot Simon wrote:
>
>On Sat, Aug 25, 2007 at 04:14:20PM -0700, John Nemeth wrote:
>> On Jan 15,  4:13am, Thomas Klausner wrote:
>> } 
>> } On the attached file, wc(1) on 4.99.30/amd64 reports "invalid byte
>> } sequence" quite often.
>> } I don't see why it should do that, the byte sequence is perfectly
>> } valid (for an mp3 file). I guess it's a bug in the wide character
>> } library or its usage by wc. Should I send-pr?
>> 
>>      wc is designed to work with text files, not binary files.
>
>Really?  How fascinating!  When did the designer of wc tell you this?

The man page says that it can count either bytes or characters.  One
presumes that is the difference:

     -c      The number of bytes in each input file is written to the standard
             output.

     -m      The number of characters in each input file is written to the
             standard output.

So, use wc -c.

Presumably, when it is counting words, lines, or characters, it has to
interpret the input bytes according to the current locale.
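
To illustrate the distinction, here is a minimal sketch in Python. It is
not how wc(1) is implemented. The helper names count_bytes and count_chars
are made up for this example, and UTF-8 stands in for "the current locale".
Counting bytes never needs to decode anything, while counting characters
must decode first and so can fail on arbitrary binary data:

```python
def count_bytes(data: bytes) -> int:
    """Roughly what wc -c does: every byte counts, no decoding involved."""
    return len(data)


def count_chars(data: bytes, encoding: str = "utf-8") -> int:
    """Roughly what wc -m does: decode in some encoding, then count.

    The encoding argument is an assumption standing in for the locale.
    Raises UnicodeDecodeError if the bytes are not valid in that encoding.
    """
    return len(data.decode(encoding))


# Text containing one two-byte UTF-8 character.
text = "na\u00efve\n".encode("utf-8")

# Arbitrary binary data, such as the start of an MPEG audio frame.
# 0xff is never a valid byte in UTF-8.
binary = b"\xff\xfb\x90\x00"

print(count_bytes(text), count_chars(text))
print(count_bytes(binary))

try:
    count_chars(binary)
except UnicodeDecodeError as err:
    print("invalid byte sequence:", err.reason)
```

The byte count works on both inputs. The character count works only on
the input that is valid in the chosen encoding, which is the same split
as between -c and -m above.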

--
    Roland Dowdeswell                      http://www.Imrryr.ORG/~elric/