tech-userlevel archive


Re: wide characters and i18n

der Mouse <mouse%Rodents-Montreal.ORG@localhost> writes:

>>> The main reason for it is that UTF-8 wastes half of bandwidth on
>>> wire,
> This is true only if you normally use a non-ASCII set of characters
> that have an 8-bit character set, and you're comparing to such a set.
> (Examples might be 8859-7 and KOI-8.  Not 8859-1, because most 8859-1
> users draw on the ASCII low half heavily.)

Maybe it isn't obvious, but quite a large part of the world writes in
non-Latin scripts and so uses mostly non-Latin characters. ;)

>> - How does UTF-8 waste half of the bandwidth?
> See above.  If you use, say, KOI-8 (Cyrillic, which strikes me as
> likely what Aleksej is using), or 8859-7 (Greek), or 8859-8 (Hebrew)
> and are using mostly the non-ASCII half, then UTF-8 encoding results in
> two octets per character on the wire for most characters, as opposed to
> using KOI-8 (or whatever), which uses one octet per character.
> Sometimes this is important; sometimes it's not.  Aleksej has a good
> point in that FFS (which is probably what most NetBSD systems use) has
> a limit of 255 on directory entry name length - but that's 255 octets,
> not 255 characters.  If you have a tendency to use file names in the
> 100-200 character range, this may well matter to you.  There doubtless
> are plenty of other relatively small limits which look smaller when
> viewed through UTF-8 glasses....

You can keep your own file names as short as you wish... but only until
you have to communicate with the outside world, where file names longer
than 100 and even 200 characters are quite usual. And that is not
something you can change.
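
To put rough numbers on the octet counts discussed above, here is a
minimal sketch in Python. The sample strings and the helper names
(encoded_len, fits) are made up for illustration; the only figures
taken from the thread are the 255-octet name limit and the
one-versus-two octets per character comparison.

```python
# Compare the encoded size of Cyrillic text in KOI8-R and in UTF-8,
# and check it against a 255-octet directory entry name limit.
NAME_MAX = 255  # limit in octets, as quoted for FFS in this thread


def encoded_len(text, encoding):
    """Return the number of octets text occupies in the given encoding."""
    return len(text.encode(encoding))


def fits(text, encoding, limit=NAME_MAX):
    """Return True if the encoded name fits within the octet limit."""
    return encoded_len(text, encoding) <= limit


name = "привет"                       # 6 Cyrillic characters (sample only)
koi8 = encoded_len(name, "koi8_r")    # one octet per character
utf8 = encoded_len(name, "utf-8")     # two octets per Cyrillic character

long_name = "я" * 200                 # hypothetical 200-character name
print(koi8, utf8)
print(fits(long_name, "koi8_r"), fits(long_name, "utf-8"))
```

Under these assumptions a purely Cyrillic name stops fitting in UTF-8
once it passes 127 characters, while in KOI8-R it fits up to 255.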

