Re: [PATCH] replace 0xA0 to whitespace in plain text files (part 2)

To: tech-userlevel%netbsd.org@localhost
Subject: Re: [PATCH] replace 0xA0 to whitespace in plain text files (part 2)
From: Joachim König <him%online.de@localhost>
Date: Mon, 15 Sep 2008 22:00:45 +0200

On Fri, Sep 12, 2008 Greg A. Woods wrote:

> Perhaps this is what Joachim meant in part and simply mis-spoke by saying
> "Latin-1" and really meant 7-bit US ASCII?

Latin-1 seems to be the most used 8-bit character encoding, or at least
it's the most used 8-bit encoding in Python, but it's not that important
to me whether it's 7-bit ASCII or something else. My main point was that
there should be a default in case we detect a text file where one of the
bytes has the 8th bit set. In the case of 7-bit ASCII it would be an error.
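
To make the idea of a default concrete, here is a minimal Python sketch.
The function name decode_text and its default parameter are made up for
illustration; they are not part of the patch under discussion. It treats
pure 7-bit input as ASCII and falls back to the default encoding only
when some byte has the 8th bit set. Passing default=None gives the strict
7-bit behaviour, where such a byte is an error.

```python
def decode_text(data, default="latin-1"):
    """Decode bytes as 7-bit ASCII, falling back to a default 8-bit encoding.

    `default` is a hypothetical parameter: the encoding assumed when a byte
    with the 8th bit set is found. With default=None the input must be pure
    7-bit ASCII and any high byte raises UnicodeDecodeError.
    """
    try:
        # Succeeds only if every byte is below 0x80.
        return data.decode("ascii")
    except UnicodeDecodeError:
        if default is None:
            # Strict 7-bit mode: a high byte is an error.
            raise
        return data.decode(default)


if __name__ == "__main__":
    # 0xA0 is the no-break space in Latin-1 (U+00A0).
    print(repr(decode_text(b"foo\xa0bar")))
```

Latin-1 is a convenient default here because every byte value maps to a
character, so the fallback itself never fails; whether the result is
what the author of the file intended is a separate question.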

The BOM OTOH doesn't solve the problem everywhere, of course. The
fact that a file is a text file encoded in a certain way is actually
information about the file and should be stored somewhere in the
metadata of the file and not in the byte stream itself. In most
filesystems, this knowledge has to come from somewhere else, and the
application has to correctly specify the mode when opening the
file (e.g. 'rb' or 'r') and do the decoding itself; the mode passed to
fopen makes no difference on Unix, but it does on Windows. The BOM
is only a vehicle to help guess the encoding and byte ordering.
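
As a rough illustration of the BOM being only a guessing aid, the
following Python sketch looks for a BOM at the start of the data and
otherwise falls back to a default. The name sniff_bom and the default of
Latin-1 are assumptions made for this example. The BOM constants come
from Python's standard codecs module.

```python
import codecs

# UTF-32 is checked before UTF-16 because the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data, default="latin-1"):
    """Return (encoding, payload) guessed from a leading BOM.

    `default` is a hypothetical fallback used when no BOM is present;
    in that case the payload is returned unchanged.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            # Strip the BOM so it does not end up in the decoded text.
            return name, data[len(bom):]
    return default, data


if __name__ == "__main__":
    # Read in binary mode ('rb'-style bytes) and decode explicitly.
    raw = codecs.BOM_UTF8 + "caf\u00e9".encode("utf-8")
    encoding, payload = sniff_bom(raw)
    print(encoding, repr(payload.decode(encoding)))
```

A file without a BOM tells this function nothing, which is the point
above: the real answer has to come from metadata or from the
application.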

Joachim

Follow-Ups:
- Re: [PATCH] replace 0xA0 to whitespace in plain text files (part 2)
  - From: Greg A. Woods; Planix, Inc.
