Re: [PATCH] replace 0xA0 to whitespace in plain text files (part 2)

To: NetBSD Userlevel Technical discussion list <tech-userlevel%NetBSD.org@localhost>
Subject: Re: [PATCH] replace 0xA0 to whitespace in plain text files (part 2)
From: "Greg A. Woods; Planix, Inc." <woods%planix.ca@localhost>
Date: Tue, 16 Sep 2008 13:43:43 -0400


On 16-Sep-08, at 1:03 PM, der Mouse wrote:

The ideal solution, given these limitations, is to maintain the
illusion that there is effectively only one true encoding and charset
(for "text" files, at least), just as Unix has always done,


I don't know what Unix you've been using, but I've certainly never had
any such "illusion".


I'm being pedantic by using "Unix".  :-)

So, I'm saying Unix(tm) is/was an ASCII-only system.  Systems
conforming to some POSIX extension/revision can give the illusion
that each given "session" (usually per "user", but potentially per
process) works with one given charset and encoding.

Before I had a more world-aware view of systems I often laughed at the
oddness of things like the "text" and "binary" commands in
communications tools such as FTP.

The impression I've gotten from the Unices I've used is that files
don't _have_ encodings or charsets; those are imposed, if at all, by
the software (and sometimes hardware, eg, serial terminals) that
interprets the octet stream stored in the file.  Indeed, there isn't
even any notion of a text file per se, only tools that treat files - or
more often octet streams - as text (and others that don't, of course).
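
To make the point above concrete, here is a minimal Python sketch (not
from the original thread) showing that the same octets, including the
0xA0 from the subject line, mean different things depending on which
charset the reading software imposes.  The function name and the list
of charsets are chosen purely for illustration.

```python
def interpretations(octets, charsets=("ascii", "iso-8859-1", "cp437", "utf-8")):
    """Decode the same octets under several charsets.

    Returns a dict mapping each charset name to the decoded string, or
    to None when the octets are not valid in that charset.
    """
    result = {}
    for name in charsets:
        try:
            result[name] = octets.decode(name)
        except UnicodeDecodeError:
            # The file itself carries no charset; this reader's choice
            # simply does not fit the octets.
            result[name] = None
    return result


if __name__ == "__main__":
    data = b"foo\xa0bar"  # 0xA0 is a no-break space only in some charsets
    for name, text in interpretations(data).items():
        print(name, repr(text))
```

In ISO-8859-1 the 0xA0 octet is a no-break space, in CP437 it is an
accented letter, and in ASCII or UTF-8 the stream is not valid text at
all; nothing in the file says which reading is "right".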

Indeed, but that's obviously an isolationist view of things, wouldn't
you agree?

The _sanest_ alternatives would be to add content-specifying metadata
to the filesystem, and all the tools necessary to make sure it's
always set and used as correctly as possible;


I dunno; I wouldn't call that sane;

Indeed, that's why I pedantically used "_sanest_" to emphasize that
it's not ideal in any way and that these alternatives are by far quite
secondary to the hopefully better idea promoted by Plan 9.

I'd call it diametrically opposed
to one of Unix's great strengths (that strength being a lack of
distinctions such as text vs binary or STREAM-LF vs fixed-length
records vs ISAM vs etc).

I think the Unix (and now Plan 9) way of looking at text encoding and
charset is the best compromise given the way the tools were designed.

I'm still, after 28 years of pretty much full-on exposure to, and
immersion in, the idea, not quite convinced that it's a good idea to
always be so agnostic about text versus arbitrary and perhaps opaque
binary content of files.  There are of course many advantages, and
perhaps even some of the minor disadvantages (such as forcing users to
be cautious) are actually good in the long run.  I see advantages in
the way Apple's systems have used metadata to enhance functionality
and to provide a "safer" environment for naive users.

Even further from sane I suggested, almost entirely in jest, the idea
of MIME processing in STDIO.  Truly though I think that would be the
only way to allow applications to be completely free from having to
rely on user judgement about file contents.  Files, at least text
files, would always be wrapped in meta information about their content
encoding and charset (and perhaps type too).  Any file that was not
binary data would therefore have MIME headers.  I.e. this would be
like having filesystem metadata to provide the same information, but
it would keep everything in userland, and even more importantly it
would provide applications with a system-supplied set of "invisible"
methods for dealing with all the conversion issues while at the same
time maintaining the illusion for users that they are free to view the
whole world as using their personally chosen encoding and charset.
I.e. STDIO would automatically convert all MIME files into the user's
current locale for use by all applications.  It's sort of the extreme
of what someone earlier proposed as the way to do everything "properly".
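
As a rough illustration of the "MIME in STDIO" idea, here is a minimal
Python sketch of what such an invisible read layer might do.  It is
only a sketch under stated assumptions, not an existing STDIO facility:
the convention that a wrapped text file starts with a Content-Type
header, the MARKER constant, and the function name are all
hypothetical.

```python
import email
import locale

# Hypothetical convention: a "wrapped" text file begins with this header.
MARKER = b"Content-Type:"


def read_in_user_charset(raw, target=None):
    """Return file contents as the user's tools would see them.

    Files without the (assumed) MIME wrapper are treated as opaque
    binary and returned unchanged.  Wrapped files have their headers
    stripped and their body converted from the declared charset to the
    target charset (by default the current locale's preferred encoding).
    """
    if target is None:
        target = locale.getpreferredencoding(False)
    if not raw.startswith(MARKER):
        return raw  # no wrapper: pass the octets through untouched
    msg = email.message_from_bytes(raw)
    source = msg.get_content_charset() or "us-ascii"
    body = msg.get_payload(decode=True)  # undoes any transfer encoding
    # Characters the target charset cannot represent become "?".
    return body.decode(source).encode(target, errors="replace")


if __name__ == "__main__":
    wrapped = (b"Content-Type: text/plain; charset=iso-8859-1\n"
               b"\n"
               b"caf\xe9\xa0au lait\n")
    print(read_in_user_charset(wrapped, "utf-8"))
```

A real implementation would also need the matching write path (adding
headers on output) and some way for tools to opt out and see the raw
octets, which is where most of the difficulty would lie.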


--
                                        Greg A. Woods; Planix, Inc.
                                        <woods%planix.ca@localhost>

