tech-userlevel archive


Re: [PATCH] replace 0xA0 to whitespace in plain text files (part 2)



On Thu, Sep 11, 2008 at 5:10 PM, der Mouse 
<mouse%rodents-montreal.org@localhost> wrote:
>> Replace broken unicode sequence to whitespace in the plain text files.
> It's hardly fair to call these "broken unicode sequences" when there's
> no particular reason to think that those files' contents are supposed
> to be Unicode.  Most likely, I'd say, they're Latin-1 non-break-space
> characters, and calling them "broken unicode" makes about as much sense
> as calling a bicycle a broken car.
wc, for example, complains in a similar way. Actually, that is where I picked up the phrase.
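
To make the difference concrete, here is a minimal Python sketch (not taken from the patch; the sample bytes and the helper name decodes_as_utf8 are made up for illustration). It shows why a UTF-8 aware tool rejects a lone 0xA0 byte, while the same byte is a perfectly valid non-break space when read as Latin-1:

```python
# Hypothetical sample: "a", a lone 0xA0 byte, then "b".
sample = b"a\xa0b"


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Read as Latin-1, every byte maps to one code point; 0xA0 becomes U+00A0.
latin1_text = sample.decode("latin-1")

# The same character written as UTF-8 takes two bytes: 0xC2 0xA0.
utf8_nbsp = "\u00a0".encode("utf-8")
```

In UTF-8 a byte of the form 10xxxxxx, such as 0xA0, is only valid as a continuation of a multibyte sequence, which is presumably why such tools report it when it appears on its own.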

> Not that replacing them with ASCII spaces isn't a good idea.  I just
> don't think the terminology is fair.
The Fedora team requires its members to convert all documentation to UTF-8.
I don't think NetBSD could go the same way.
But most new versions of the well-known tools are becoming UTF-8
(wide char internally) compatible. Thus less, wc, etc. complain
about bytes like these, which look like fragments of a multibyte
Unicode sequence.
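
As a rough sketch of the kind of replacement being discussed (this is not the patch itself; the function name replace_nbsp_bytes and the "only touch files that are not valid UTF-8" rule are my assumptions), one could do something like this in Python:

```python
def replace_nbsp_bytes(data: bytes) -> bytes:
    """Replace lone 0xA0 bytes with ASCII spaces in non-UTF-8 text.

    Assumption: if the data is already well-formed UTF-8, any 0xA0 byte
    is part of a multibyte sequence and must be left alone.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume a single-byte encoding such as Latin-1,
        # where 0xA0 is a non-break space.
        return data.replace(b"\xa0", b" ")
    return data
```

The check matters because a blind byte replacement would corrupt files that are already valid UTF-8, where 0xA0 legitimately appears as the second byte of characters such as U+00A0 (0xC2 0xA0).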

-- 
With Best Regards,
Andy Shevchenko

