tech-userlevel archive

Re: Unicode programming



>In theory, it's better to use the existing well-tested wheel rather
>than reinventing it.  But given the current state of such things, I'm
>far from convinced that the existing wheels are well-tested, never mind
>the situation on systems where they don't exist at all.

I understand; it looks like that might be the only portable option
right now.

>>   I'm also wondering what people do about things like finding out how
>>   many columns a particular series of Unicode codepoints occupies;
>
>"columns"?  In terms of pixels, or character cells, or what?  In any
>case, the answer will depend on what's displaying them; since you said
>you're pushing display off to some other program, you can't really tell
>even in principle.  Personally, I'd probably do what I do when I want
>to line things up now: assume a character-cell font, meaning that each
>character occupies one character cell.  But I'm hardly an expert.

Remember that this is a command-line program that is going to be run
inside of an xterm or the equivalent; when I say "columns", I mean
"character cells"; I don't care about pixels.

>wcwidth and wcswidth are actually unimplementable, because they depend
>on information not available even in theory to the application (the
>application responsible for displaying text may not have even been
>chosen, much less started, at wcwidth() time, and may run on a
>completely different machine).  For that matter, when text is displayed
>in a variable-pitch font, "column positions" don't really even exist.

I think wcwidth and wcswidth make the same assumptions that I am
making: a command-line program whose output goes to an xterm or the
equivalent.  But as you said, they aren't really general.
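
To make that assumption concrete, here is a rough sketch of how I would
count character cells with wcwidth.  The helper name estimate_columns is
made up for illustration, and the one-column fallback for characters
that wcwidth rejects is my own choice, not something the standard
prescribes.  The results depend on the libc's width tables and on the
current locale, so this is only as good as the platform underneath it.

```c
#define _XOPEN_SOURCE 700   /* asks for the wcwidth() declaration on many systems */
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper: estimate how many terminal character cells a
 * wide string occupies, assuming a character-cell display such as xterm.
 *
 * wcwidth() returns -1 for characters it considers non-printable or
 * does not know about; here those are counted as one cell (an
 * assumption made for this sketch) instead of failing the whole string.
 */
int estimate_columns(const wchar_t *s)
{
    int total = 0;

    for (; *s != L'\0'; s++) {
        int w = wcwidth(*s);
        total += (w < 0) ? 1 : w;
    }
    return total;
}
```

The caller would normally run setlocale(LC_CTYPE, "") once at startup
and convert its input to wchar_t (for example with mbstowcs) before
calling this; without that, only plain ASCII is likely to be measured
sensibly.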

>I don't really consider myself competent to comment on what's common.
>In your situation, I'd probably just stuff codepoints in (a typedef
>for) unsigned short - you said you're willing to write off anything
>outside the BMP.

Well, I don't want to write them off, per se ... if I get something
outside of the BMP, I still want to pass it along to xterm or the
equivalent.  If I get the number of columns that those codepoints take
up wrong, I won't lose sleep over it.  That's why I was thinking of
maybe UTF-16: it handles the BMP relatively efficiently (one 16-bit
unit per codepoint), but can still represent codepoints outside of it
as surrogate pairs.
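
For reference, this is roughly what the UTF-16 encoding step looks like.
It is only a sketch; the function name utf16_encode and its return
convention are invented here, not taken from any existing library.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode one Unicode codepoint as UTF-16.
 * Writes one or two 16-bit code units to out and returns how many
 * were written, or 0 if cp cannot be encoded (a surrogate value or
 * anything above U+10FFFF).
 */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* reserved for surrogates */

    if (cp < 0x10000) {                 /* BMP: a single code unit */
        out[0] = (uint16_t)cp;
        return 1;
    }

    if (cp > 0x10FFFF)
        return 0;                       /* beyond the Unicode range */

    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

Anything in the BMP costs one unit, and a codepoint such as U+1F600
comes out as the pair 0xD83D 0xDE00, so the non-BMP case stays
representable without making the common case more expensive.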

Thanks for the input!

--Ken

