Re: [PATCH] replace 0xA0 to whitespace in plain text files (part 2)

To: tech-userlevel%NetBSD.org@localhost
Subject: Re: [PATCH] replace 0xA0 to whitespace in plain text files (part 2)
From: der Mouse <mouse%Rodents-Montreal.ORG@localhost>
Date: Thu, 11 Sep 2008 11:08:44 -0400 (EDT)

>>>> But most new versions of the famous tools are going to be UTF-8
>>>> (wide char internally) compatible.  Thus less, wc, etc. complain
>>>> about symbols of that kind, which look like Unicode sequence
>>>> starters.

I trust there will be a way to shut this off?  I ran into a real
headache on a Linux system which I eventually tracked down to its wc
being UTF-8 by default and exiting (silently!) as soon as it ran into
an invalid UTF-8 sequence.  This broke rather severely when I, coming
from a traditional Unix background, used those tools to manipulate
bytes rather than characters.
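
As a rough illustration of that failure mode (a sketch in Python, not
the code of any real wc; the sample bytes and function names are made
up for the example): a lone 0xA0 octet is a no-break space in Latin-1
but is not a valid UTF-8 sequence, so a strict UTF-8 decoder rejects
the input, while counting octets always works.

```python
# Hypothetical sample: 'a', 0xA0 (no-break space in Latin-1), 'b', newline.
data = b"a\xa0b\n"


def count_bytes(buf: bytes) -> int:
    """Traditional behaviour: every octet counts, whatever its value."""
    return len(buf)


def utf8_is_valid(buf: bytes) -> bool:
    """Report whether buf decodes as strict UTF-8."""
    try:
        buf.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def count_chars_latin1(buf: bytes) -> int:
    """In Latin-1 each octet is one character, so decoding cannot fail here."""
    return len(buf.decode("latin-1"))


print(count_bytes(data))         # octet count
print(utf8_is_valid(data))       # a lone 0xA0 is rejected by strict UTF-8
print(count_chars_latin1(data))  # same as the octet count
```

A tool that decodes first and counts afterwards has nothing to report
once the decode fails, which matches the symptom described above.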

If they complain, that at least will alert people to the problem.  But
if they don't have any easy way to go back to the traditional
behaviour, I'll have to replace them - or, more likely, just not
"upgrade".  I do not want UTF-8; if I want to use Unicode, it seems
much saner to me to use streams of hexdecets rather than encoding
hexdecets into octet streams with a funky variable-length encoding.
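
To make the comparison concrete, here is a small Python sketch. It
assumes that "streams of hexdecets" means fixed 16-bit units, as in
big-endian UTF-16 restricted to the Basic Multilingual Plane; the
sample string is arbitrary. Each character then takes exactly one
16-bit unit, while its UTF-8 form takes one to three octets.

```python
# Arbitrary sample, all within the Basic Multilingual Plane:
# 'a' (U+0061), e-acute (U+00E9), euro sign (U+20AC).
text = "a\u00e9\u20ac"

# Octets used per character by the variable-length UTF-8 encoding.
utf8_lengths = [len(ch.encode("utf-8")) for ch in text]

# 16-bit units used per character by big-endian UTF-16 (no byte-order mark).
utf16_units = [len(ch.encode("utf-16-be")) // 2 for ch in text]

print(utf8_lengths)
print(utf16_units)
```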

>>> I think you should only complain about files that are not valid
>>> latin1.
>> Not that I care so much, but is NetBSD supposed to have its files
>> in Latin1?  Is that supposed to be the source character set, or
>> what?
> I think that simply is the practical reality.

I agree.

I think the default should be Latin-1, except that I also think tools
such as wc should, by default, not complain about invalid Latin-1,
instead sticking with the traditional behaviour of operating on bytes
rather than characters.

This is not to say that it should be impossible - or even difficult -
to make them use UTF-8 (or Latin-1 with errors for invalid octets).
Just that it shouldn't be the default.
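
The policy argued for here could be sketched as follows, in Python for
brevity. The function name and the mode names are hypothetical, and
treating octets 0x80-0x9F as "invalid Latin-1" is an assumption made
for the example, not something the thread specifies. Bytes are the
default; the character-aware modes are opt-in and raise on bad input
instead of failing silently.

```python
def count_units(buf: bytes, mode: str = "bytes") -> int:
    """Count octets by default; count characters only when asked to."""
    if mode == "bytes":
        # Traditional behaviour: no interpretation, so nothing can be invalid.
        return len(buf)
    if mode == "utf-8":
        # Strict decode: raises UnicodeDecodeError on an invalid sequence.
        return len(buf.decode("utf-8"))
    if mode == "latin-1":
        # Assumption for this sketch: 0x80-0x9F count as invalid octets.
        for offset, octet in enumerate(buf):
            if 0x80 <= octet <= 0x9F:
                raise ValueError(
                    f"invalid Latin-1 octet 0x{octet:02X} at offset {offset}"
                )
        return len(buf)
    raise ValueError(f"unknown mode: {mode}")


sample = b"a\xa0b\n"
print(count_units(sample))             # default: plain octet count
print(count_units(sample, "latin-1"))  # 0xA0 is accepted in this mode
```

Passing "utf-8" for the same sample raises an error rather than
returning a partial count, so the caller at least learns that
something is wrong.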

/~\ The ASCII                           der Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
