tech-userlevel archive


Re: [PATCH] replace 0xA0 to whitespace in plain text files (part 2)



> I'm being pedantic by using "Unix".  :-)

> So, I'm saying Unix(tm)

I thought the trademark was UNIX, not Unix.

>> The impression I've gotten from the Unices I've used is that files
>> don't _have_ encodings or charsets; those are imposed, if at all, by
>> the software (and sometimes hardware, eg, serial terminals) that
>> interprets the octet stream stored in the file.
> Indeed, but that's obviously an isolationist view of things, wouldn't
> you agree?

No, actually, I don't agree.  What am I missing?  What does it isolate
from what?  It's the only way I can see to allow different users and
processes the freedom to each decide what encoding they want to use.
Indeed, for compatibility with the rest of the world, it pretty much
_must_ be possible to manipulate text in more or less any encoding.
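
A minimal sketch of that point in Python, assuming a hypothetical
four-octet file; the names `interpret` and `is_valid_utf8` are
illustrative only and come from no real tool.  The same octets read as
different text depending on the encoding the reader imposes, and a
blind byte-level replacement of 0xA0 is only harmless under one of
those readings:

```python
# Hypothetical file contents: "a", 0xC2, 0xA0, "b".
raw = bytes([0x61, 0xC2, 0xA0, 0x62])


def interpret(data: bytes, encoding: str) -> str:
    """Impose an encoding on an octet stream; the octets do not change."""
    return data.decode(encoding)


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the octets decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Read as UTF-8, 0xC2 0xA0 is one character (U+00A0, no-break space).
as_utf8 = interpret(raw, "utf-8")

# Read as ISO 8859-1, the same two octets are two characters
# (U+00C2 followed by U+00A0).
as_latin1 = interpret(raw, "latin-1")

# Replacing the octet 0xA0 with a space assumes the ISO 8859-1 reading.
# Under the UTF-8 reading it leaves a dangling 0xC2 lead byte behind.
blind = raw.replace(b"\xa0", b" ")

print(len(as_utf8), len(as_latin1), is_valid_utf8(raw), is_valid_utf8(blind))
```

Nothing in the file itself says which reading is the intended one;
that knowledge has to come from whoever, or whatever, is reading it.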

> Indeed, that's why I pedantically used "_sanest_" to emphasize that
> it's not ideal in any way and that these alternatives are by far
> quite secondary to the hopefully better idea promoted by Plan 9.

What idea is that?  "When we want to know what encoding you're using
we'll tell you"?  I much prefer that my OS not try to dictate to me
what encoding I shall use for my data - text or otherwise, in files or
otherwise.

>> I'd call it diametrically opposed to one of Unix's great strengths
>> (that strength being a lack of distinctions such as text vs binary
>> or STREAM-LF vs fixed-length records vs ISAM vs etc).
> I think the Unix (and now Plan 9) way of looking at text encoding and
> charset is the best compromise given the way the tools were designed.

I thought you said the Plan 9 way was "thou shalt use UTF-8 for all thy
text".  Are you saying that's the UNIX way too?  It certainly wasn't
back on BSD 4.1c, BSD 4.2, SunOS 3.x, SunOS 4.x...indeed, any Unix
variant I've ever used, I think including at least a few that had the
right to use the UNIX trademark name.

> I'm still, after 28 years of pretty much full-on exposure to, and
> immersion in, the idea, not quite convinced that it's a good idea to
> always be so agnostic about text versus arbitrary and perhaps opaque
> binary content of files.

Always?  No, it isn't.  For example, in email it is worse than at least
one of the alternatives; that's why we have MIME.

But for an OS, and its (supposedly) general-purpose tools?  I would
call anything else arrogant and obnoxious, even "broken" in my more
strident and frustrated moods.

/~\ The ASCII                           der Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

