tech-userlevel archive


Re: A draft for a multibyte and multi-codepoint C string interface



Mouse <mouse%Rodents-Montreal.ORG@localhost> wrote:
 |Interpreting octets - or chars - as characters is a human-interface
 |thing, and I think it should stay at the human-interface layer.

Maybe the draft wasn't too clear; I guess the interface should be
split into a tc* and a tg* family, i.e., codepoint-wise and
grapheme-wise.  The former simply represents a replacement for
what the current *w* family does, but without the byte->wide->byte
round-trip, plus some additions to de/normalize UTF-8 input and
to verify input correctness on explicit request, but without
knowing about combining etc. boundaries, whereas the latter will
automatically understand those, too.
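
To illustrate what "codepoint-wise without the byte->wide->byte
round-trip" could mean, here is a minimal sketch that counts
codepoints directly on the UTF-8 bytes.  The names tc_seqlen and
tc_len are hypothetical and not part of the draft, and the sketch
does no validation beyond looking at the lead byte.

```c
#include <stddef.h>

/* Hypothetical helper (not from the draft): byte length of the UTF-8
 * sequence starting at s, judged from the lead byte only. */
size_t tc_seqlen(const unsigned char *s)
{
	if (*s < 0x80)
		return 1;               /* ASCII */
	if ((*s & 0xE0) == 0xC0)
		return 2;
	if ((*s & 0xF0) == 0xE0)
		return 3;
	if ((*s & 0xF8) == 0xF0)
		return 4;
	return 1;                       /* invalid lead byte: step one byte */
}

/* Hypothetical tc*-style strlen: counts codepoints, not bytes, and
 * never converts to wchar_t. */
size_t tc_len(const char *str)
{
	const unsigned char *s = (const unsigned char *)str;
	size_t n = 0;

	while (*s != '\0') {
		size_t l = tc_seqlen(s);

		/* Do not step past the terminator on a truncated sequence. */
		while (l-- > 0 && *s != '\0')
			s++;
		n++;
	}
	return n;
}
```

With this, tc_len("abc") yields 3, and a string consisting of "a"
followed by the two bytes C3 A9 (U+00E9) yields 2, although it is
three bytes long.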

The internal implementation can handle both cases largely
unchanged; it's just a matter of what kind of PEEKBOUND is used
(and where this is not true, it should be changed), provided that
no automatic conversion is applied, and I do not propose any.
Of course this tg* family will not work with the initial
implementation, which doesn't know anything about the data
that it is working on.  It's just a draft in the end.
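
A rough sketch of the "same implementation, different PEEKBOUND"
idea follows.  All names (peekbound_fn, peek_codepoint,
peek_grapheme, count_units) are made up for illustration and are
not the draft's actual interface.  The grapheme variant is a strong
simplification: it only attaches codepoints from the Combining
Diacritical Marks block (U+0300..U+036F) to the preceding base and
is nowhere near full Unicode segmentation.

```c
#include <stddef.h>

/* Hypothetical boundary "peek": s points at a non-NUL byte; the result
 * is the number of bytes the next unit occupies (always >= 1). */
typedef size_t (*peekbound_fn)(const unsigned char *s);

/* Codepoint boundary: lead byte gives the wanted length, continuation
 * bytes are checked so that we never step over the terminator. */
size_t peek_codepoint(const unsigned char *s)
{
	size_t want, got;

	if (*s < 0x80)
		want = 1;
	else if ((*s & 0xE0) == 0xC0)
		want = 2;
	else if ((*s & 0xF0) == 0xE0)
		want = 3;
	else if ((*s & 0xF8) == 0xF0)
		want = 4;
	else
		want = 1;       /* invalid lead byte */
	for (got = 1; got < want && (s[got] & 0xC0) == 0x80; got++)
		;
	return got;
}

/* Simplified check: is the l-byte sequence at s in U+0300..U+036F,
 * i.e. the byte range CC 80 .. CD AF? */
static int is_combining(const unsigned char *s, size_t l)
{
	if (l != 2)
		return 0;
	return s[0] == 0xCC || (s[0] == 0xCD && s[1] <= 0xAF);
}

/* Grapheme-ish boundary: one codepoint plus any following combining
 * marks (assumption: only the block checked above counts). */
size_t peek_grapheme(const unsigned char *s)
{
	size_t n = peek_codepoint(s);

	while (s[n] != '\0') {
		size_t l = peek_codepoint(s + n);

		if (!is_combining(s + n, l))
			break;
		n += l;
	}
	return n;
}

/* The shared loop: only the boundary function differs. */
size_t count_units(const char *str, peekbound_fn peek)
{
	const unsigned char *s = (const unsigned char *)str;
	size_t n = 0;

	while (*s != '\0') {
		s += peek(s);
		n++;
	}
	return n;
}
```

For "e" followed by the bytes CC 81 (U+0301, combining acute),
count_units() with peek_codepoint gives 2, with peek_grapheme it
gives 1; the counting loop itself is identical in both cases.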

 |/~\ The ASCII                           Mouse
 |\ / Ribbon Campaign
 | X  Against HTML              mouse%rodents-montreal.org@localhost
 |/ \ Email!         7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Thanks,

--steffen

