tech-userlevel archive


Re: [PATCH] replace 0xA0 to whitespace in plain text files (part 2)



On Thu, 11 Sep 2008, der Mouse wrote:
> I do not want UTF-8; if I want to use Unicode, it seems
> much saner to me to use streams of hexdecets rather than encoding
> hexdecets into octet streams with a funky variable-length encoding.

Unicode is a 21-bit character set (or 31-bit in some old versions).
The 16-bit encoding is just as funky and variable-length as the 8-bit
encoding.
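
To make that concrete, here is a rough Python sketch (not part of the
original discussion) that counts how many code units one code point
needs in each encoding.  The helper name code_units and the sample
characters are chosen only for illustration.

```python
# Count the code units needed to encode a single code point.
# UTF-8 uses 8-bit units; UTF-16 uses 16-bit units.
def code_units(ch):
    utf8_units = len(ch.encode("utf-8"))        # one unit per byte
    utf16_units = len(ch.encode("utf-16-be")) // 2  # two bytes per unit
    return utf8_units, utf16_units

# Illustrative samples: ASCII, Latin-1 range, BMP, and beyond the BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]
for ch in samples:
    u8, u16 = code_units(ch)
    print(f"U+{ord(ch):04X}: {u8} UTF-8 unit(s), {u16} UTF-16 unit(s)")
```

Anything above U+FFFF needs two 16-bit units (a surrogate pair), so a
stream of 16-bit units is not fixed-width either.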

> >> Not that I care so much, but are NetBSD supposed to have its files
> >> in Latin1?  Is that supposed to be the source character set, or
> >> what?
> > I think that simply is the practical reality.
> I agree.

I think the default should be either ASCII or UTF-8.  Other encodings
are too ambiguous.  For example, when you see an octet outside the ASCII
range and not part of a valid UTF-8 sequence, do you guess that it's
iso-8859-1, iso-8859-2, iso-8859-whatever, or something else entirely?
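
As a small illustration of that ambiguity, the Python sketch below
decodes the same two octets under three ISO 8859 variants.  The octets
and the helper name is_valid_utf8 are made up for the example; the
codec names are the ones Python's standard library provides.

```python
# Check whether a byte string is well-formed UTF-8.
def is_valid_utf8(data):
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Two octets outside the ASCII range (hypothetical sample data).
raw = b"\xa3\xa4"

# Every one of these 8-bit encodings accepts the octets, so none can
# be ruled out by inspection, yet each yields different text.
candidates = ["iso-8859-1", "iso-8859-2", "iso-8859-15"]
decodings = {name: raw.decode(name) for name in candidates}

print("valid UTF-8:", is_valid_utf8(raw))
for name, text in decodings.items():
    print(name, [f"U+{ord(c):04X}" for c in text])
```

UTF-8 at least rejects the sequence outright, whereas the 8-bit
encodings all accept it and silently disagree about what it means.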

> I think the default should be Latin-1, except that I also think tools
> such as wc should, by default, not complain about invalid Latin-1,
> instead sticking with the traditional behaviour of operating on bytes
> rather than characters.

I was talking about the default encoding used for source code and text
files supplied with the OS.  How tools should behave is a different
question, but I share your concerns.

--apb (Alan Barrett)

