Re: Unicode programming
>In theory, it's better to use the existing well-tested wheel rather
>than reinventing it. But given the current state of such things, I'm
>far from convinced that the existing wheels are well-tested, never mind
>the situation on systems where they don't exist at all.
I understand; it looks like that might be the only portable option.
>> I'm also wondering what people do about things like finding out how
>> many columns a particular series of Unicode codepoints occupies;
>"columns"? In terms of pixels, or character cells, or what? In any
>case, the answer will depend on what's displaying them; since you said
>you're pushing display off to some other program, you can't really tell
>even in principle. Personally, I'd probably do what I do when I want
>to line things up now: assume a character-cell font, meaning that each
>character occupies one character cell. But I'm hardly an expert.
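A minimal sketch of that one-cell-per-character assumption. It assumes the text is held as UTF-8, which the thread has not settled on, and counts every byte that is not a UTF-8 continuation byte (10xxxxxx). The name naive_cell_count is hypothetical, and the function does no validation.

```c
#include <stddef.h>

/* Hypothetical helper: number of code points in a NUL-terminated UTF-8
 * string, treated as one character cell each (the "character-cell font"
 * assumption). Continuation bytes (10xxxxxx) are skipped. Assumes the
 * input is valid UTF-8; malformed input is not detected. */
size_t naive_cell_count(const char *s)
{
    size_t cells = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)   /* lead byte or ASCII: a new code point */
            cells++;
    }
    return cells;
}
```

With this rule "caf\xc3\xa9" ("café") counts as 4 cells, which matches a terminal, but the CJK ideograph U+4E2D ("\xe4\xb8\xad") also counts as 1 even though terminals usually draw it in 2 cells.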
Remember that this is a command-line program that will run inside an
xterm or the equivalent; when I say "columns", I mean "character
cells". I don't care about pixels.
>wcwidth and wcswidth are actually unimplementable, because they depend
>on information not available even in theory to the application (the
>application responsible for displaying text may not have even been
>chosen, much less started, at wcwidth() time, and may run on a
>completely different machine). For that matter, when text is displayed
>in a variable-pitch font, "column positions" don't really even exist.
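For a terminal that uses character cells, the usual approximation is a per-code-point width table: 0 for combining marks, 2 for East Asian wide characters, 1 for everything else, and -1 for control characters, mirroring the wcwidth() convention. The sketch below is loosely modeled on Markus Kuhn's public-domain wcwidth implementation, but its tables are deliberately abbreviated and illustrative only. A real table needs many more ranges and should match the Unicode version the terminal itself uses. cell_width and cells_for are hypothetical names.

```c
#include <stddef.h>

/* Inclusive range of code points. */
struct cp_range { unsigned long first, last; };

/* ABBREVIATED, illustrative tables only; not a complete width database. */
static const struct cp_range zero_width[] = {
    { 0x0300, 0x036F },   /* Combining Diacritical Marks */
    { 0x200B, 0x200F },   /* zero-width space, joiners, direction marks */
};

static const struct cp_range double_width[] = {
    { 0x1100, 0x115F },   /* Hangul Jamo initial consonants */
    { 0x2E80, 0x303E },   /* CJK radicals .. CJK symbols */
    { 0x3040, 0xA4CF },   /* Hiragana .. Yi */
    { 0xAC00, 0xD7A3 },   /* Hangul syllables */
    { 0xF900, 0xFAFF },   /* CJK compatibility ideographs */
    { 0xFF00, 0xFF60 },   /* fullwidth forms */
    { 0xFFE0, 0xFFE6 },   /* fullwidth signs */
    { 0x20000, 0x2FFFD }, /* CJK extensions (plane 2) */
    { 0x30000, 0x3FFFD }, /* plane 3 */
};

static int in_table(unsigned long cp, const struct cp_range *t, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (cp >= t[i].first && cp <= t[i].last)
            return 1;
    return 0;
}

/* Cells occupied by one code point, or -1 for control characters. */
int cell_width(unsigned long cp)
{
    if (cp == 0)
        return 0;
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
        return -1;                            /* C0/C1 controls, DEL */
    if (in_table(cp, zero_width, sizeof zero_width / sizeof zero_width[0]))
        return 0;
    if (in_table(cp, double_width, sizeof double_width / sizeof double_width[0]))
        return 2;
    return 1;
}

/* Total cells for n code points; -1 if any is non-printable (as wcswidth). */
int cells_for(const unsigned long *cps, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        int w = cell_width(cps[i]);
        if (w < 0)
            return -1;
        total += w;
    }
    return total;
}
```

For example, "e" followed by U+0301 (combining acute) and U+4E2D gives 1 + 0 + 2 = 3 cells. As the quoted text points out, this is only a guess about what the displaying program will do.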
My thinking is that wcwidth and wcswidth make the same assumptions I
am making, i.e. a situation similar to "command-line program inside
an xterm". But as you said, they aren't really general.
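Under that xterm-style assumption, the C library's own tables can be used, with a fallback for characters it cannot classify. A sketch, assuming a POSIX system where wcwidth() is available (it is an XSI extension, hence the feature-test macro before any include). lenient_wcswidth is a hypothetical helper: unlike wcswidth(), it never returns -1 and instead guesses one cell for any character wcwidth() rejects.

```c
#define _XOPEN_SOURCE 700   /* wcwidth() is declared only for XSI */
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: sums wcwidth() over at most n wide characters,
 * stopping early at L'\0'. Where wcwidth() reports -1 (non-printable or
 * unknown in the current locale), one cell is assumed instead of failing. */
int lenient_wcswidth(const wchar_t *s, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n && s[i] != L'\0'; i++) {
        int w = wcwidth(s[i]);
        total += (w < 0) ? 1 : w;   /* guess: one cell */
    }
    return total;
}
```

The results depend on the current locale, so a program would normally call setlocale(LC_ALL, "") once at startup; in the default "C" locale, non-ASCII characters may be reported as non-printable and would then all be guessed as one cell.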
>I don't really consider myself competent to comment on what's common.
>In your situation, I'd probably just stuff codepoints in (a typedef
>for) unsigned short - you said you're willing to write off anything
>outside the BMP.
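A sketch of that suggestion: a typedef for unsigned short (C guarantees it can hold 0 through 65535, so every BMP value fits) plus a checking function that rejects anything outside the BMP and the surrogate range D800 to DFFF, which is reserved for UTF-16 and encodes no characters on its own. Both bmp_char and to_bmp_char are hypothetical names.

```c
/* Hypothetical typedef following the suggestion quoted above. */
typedef unsigned short bmp_char;

/* Stores cp in *out and returns 1 if cp is a BMP character; returns 0
 * (leaving *out untouched) if cp lies outside the BMP or is a surrogate. */
int to_bmp_char(unsigned long cp, bmp_char *out)
{
    if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    *out = (bmp_char)cp;
    return 1;
}
```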
Well, I don't want to write it off, per se... If I get something
outside the BMP, I want to pass it through to xterm or the equivalent,
and if I get the number of columns those codepoints take up wrong, I
won't lose sleep over it. That's why I was thinking of UTF-16: it
handles the BMP relatively efficiently, but can still represent
characters outside it (as surrogate pairs).
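For reference, the UTF-16 rule for code points outside the BMP: subtract 0x10000, leaving a 20-bit value, then put the top 10 bits in a high surrogate (0xD800 + bits) and the bottom 10 bits in a low surrogate (0xDC00 + bits). A minimal sketch, assuming 16-bit units are stored as unsigned short; utf16_encode and utf16_decode are hypothetical names.

```c
#include <stddef.h>

/* Encodes one code point into out[0..1]. Returns the number of 16-bit
 * units written (1 or 2), or 0 for a surrogate or a value > 0x10FFFF. */
size_t utf16_encode(unsigned long cp, unsigned short out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                                    /* not a character */
    if (cp <= 0xFFFF) {
        out[0] = (unsigned short)cp;                 /* BMP: one unit */
        return 1;
    }
    if (cp <= 0x10FFFF) {
        cp -= 0x10000;                               /* 20 bits remain */
        out[0] = (unsigned short)(0xD800 + (cp >> 10));    /* high half */
        out[1] = (unsigned short)(0xDC00 + (cp & 0x3FF));  /* low half */
        return 2;
    }
    return 0;
}

/* Decodes one code point from s (n units available). Sets *used to the
 * units consumed, or to 0 (returning 0) on a lone or reversed surrogate. */
unsigned long utf16_decode(const unsigned short *s, size_t n, size_t *used)
{
    *used = 0;
    if (n == 0)
        return 0;
    if (s[0] < 0xD800 || s[0] > 0xDFFF) {            /* ordinary BMP unit */
        *used = 1;
        return s[0];
    }
    if (s[0] <= 0xDBFF && n >= 2 && s[1] >= 0xDC00 && s[1] <= 0xDFFF) {
        *used = 2;
        return 0x10000 + (((unsigned long)s[0] - 0xD800) << 10)
                       + ((unsigned long)s[1] - 0xDC00);
    }
    return 0;
}
```

For example, U+1F600 encodes as the pair D83D DE00, while any BMP character other than a surrogate stays a single unit. One consequence for column counting: the number of 16-bit units is no longer the number of characters, so widths have to be computed per decoded code point.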
Thanks for the input!