Subject: Re: codeset recoding engine
To: None <tech-kern@netbsd.org>
From: Christos Zoulas <christos@zoulas.com>
List: tech-kern
Date: 11/13/1999 00:13:20
In article <199911112248.XAA01806@jdolecek.per4mance.cz>,
Jaromir Dolecek <dolecek@ics.muni.cz> wrote:
>
>const struct codeset_table *get_codeset __P((const char *));
>u_int32_t unicode_convert __P((const struct codeset_table *, unicode_t));
>ssize_t unicode_convert_string __P((const struct codeset_table *,
>                        const unicode_t *, size_t, char *, size_t));
>u_int32_t codeset_getrune __P((const struct codeset_table *, const char *,
>                        char const **));
>

I would prefer that all routines have a common prefix (maybe codeset_)
so they don't pollute the whole namespace.

>get_codeset() returns a pointer to structure used for recoding,
>the pointer is later passed to unicode_convert(), unicode_convert_string()
>and codeset_getrune().

codeset_get()? maybe

>unicode_convert() converts single unicode character to the representation
>of target codeset.

codeset_unicode_getc()? maybe

>unicode_convert_string() recodes string of unicode characters
>into string of target characters. Appropriate encoding is applied,
>so that e.g. eucjp is encoded to euc or common Unicode to utf-8.

codeset_unicode_gets()? maybe

Why does it return ssize_t? Can it fail? Also, the order of the
arguments should be (dst, src) instead of (src, dst), like most other
routines.
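
A rough sketch of what a dst-first variant with the common prefix could
look like. Everything below is an assumption made up for illustration:
the 16-bit unicode_t, the toy table layout, keeping the table as the
first argument, and using -1 as the failure return. It is not the
proposed implementation, only one way ssize_t would make sense if the
routine can fail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>	/* ssize_t */

/* Assumption: unicode_t is a 16-bit code unit. */
typedef uint16_t unicode_t;

/*
 * Hypothetical stand-in for the real table: code points below
 * `limit` map to themselves as single bytes.
 */
struct codeset_table {
	unicode_t limit;
};

/* Toy Latin-1-like table, for illustration only. */
static const struct codeset_table codeset_latin1 = { 0x100 };

/*
 * Recode srclen characters from src into dst.  Destination comes
 * before source, as in memcpy().  Returns the number of bytes
 * written, or -1 if a character has no representation in the
 * target code set or dst is too small.
 */
ssize_t
codeset_unicode_gets(const struct codeset_table *cs, char *dst,
    size_t dstlen, const unicode_t *src, size_t srclen)
{
	size_t i;

	assert(cs != NULL);
	for (i = 0; i < srclen; i++) {
		if (src[i] >= cs->limit)
			return -1;	/* not representable */
		if (i >= dstlen)
			return -1;	/* destination full */
		dst[i] = (char)src[i];
	}
	return (ssize_t)i;
}
```

With a signature like this the caller can tell a short buffer or an
unrepresentable character apart from a successful conversion, which is
the only reason I can see for a signed return type.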

>One remaining issue - I'm not an English speaker, so I don't know
>whether term "codeset" or "charset" should be used. "Charset"
>is used by virtually everyone else, but "codeset" seems to be slightly
>better, as we are working with set of codes (i.e. the numbers
>used to represent characters) and not the actual character sets.
>I'm not attached to using "codeset" though, I'd like to use whatever
>is grammatically & semantically OK.
>Opinions ?

It should be referred to as "code set" in the documentation, I believe.

christos