tech-userlevel archive


Re: [PATCH] Support for mbsnrtowcs and wcsnrtomb



SODA Noriyuki wrote:
> But stateful encodings are totally different from stateless encodings
> in some aspect.

Yeah, I know that :-) Obviously, it is bulky and probably inefficient to
implement mbs[n]rtowcs() as a sequence of mbrtowc() calls when each of
those calls has to perform many operations to set up the conversion
state. In fact, this was (probably) the very reason for including
mbsrtowcs() in Amendment 1 to C90 (C94).
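
To make the point concrete, here is a rough sketch of what such a
"sequence of mbrtowc calls" looks like. The name naive_mbsnrtowcs() is
made up for this mail and is not a libc function; the sketch assumes dst
is non-NULL and does not try to match every corner case of the real
routine. Each iteration goes through mbrtowc() and its state handling
again, which is where the cost comes from with a stateful encoding.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical helper (not part of any libc): a naive mbsnrtowcs()
 * built from repeated mbrtowc() calls.  Assumes dst != NULL. */
size_t naive_mbsnrtowcs(wchar_t *dst, const char **src, size_t nms,
                        size_t len, mbstate_t *ps)
{
    const char *s;
    size_t count = 0;

    assert(dst != NULL && src != NULL && *src != NULL && ps != NULL);
    s = *src;

    while (count < len && nms > 0) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s, nms, ps);

        if (r == (size_t)-1) {      /* invalid sequence; errno is EILSEQ */
            *src = s;
            return (size_t)-1;
        }
        if (r == (size_t)-2) {      /* incomplete character: the remaining
                                     * bytes were absorbed into *ps */
            s += nms;
            nms = 0;
            break;
        }
        dst[count] = wc;
        if (r == 0) {               /* terminating null converted: *ps is
                                     * back in the initial state */
            *src = NULL;
            return count;
        }
        s += r;
        nms -= r;
        count++;
    }
    *src = s;                       /* stopped on nms or len, state kept */
    return count;
}
```

Calling it with nms smaller than the string length stops before the
terminating null and leaves *ps untouched by any reset.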


> For example, the behavior of mbsrtowcs() and mbsnrtowcs() in the
> specification (i.e. resetting the state) makes those functions
> useless for parsing a file with a stateful encoding.

I can agree with you about mbsrtowcs() here: if you pass a 0-terminated
string to that function, it is required to end in the initial state,
that is, to reset the state to initial.
Note that the same rule carries over to the iso-2022-jp encoding for
email: you are required to put an (otherwise unnecessary) "ESC $ x Y" at
the start of each line, even if the previous line ended in kanji.

But I do not see why this would apply to mbsnrtowcs().
Moreover, I believe the real point of that function is exactly this: you
can call it with just the content of a line, _without_ the final \0. The
call then translates all the characters in the line but keeps the state
information, since the state is reset to initial _only_ when a
terminating \0 is converted. The function can therefore be called again
later with the next line, which then does not need the introducing
sequence.
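
A minimal sketch of that usage, to show the call pattern I have in mind.
The helper name convert_lines() is invented for this mail. It passes
strlen() as the byte count so that the terminating \0 is never seen by
mbsnrtowcs(), and it shares one mbstate_t across all lines. In a locale
without shift states (such as the default "C" locale) the state carries
nothing interesting, so this only illustrates the calls; the benefit
would show with a stateful encoding, assuming the system provides a
locale for one.

```c
#define _POSIX_C_SOURCE 200809L   /* for mbsnrtowcs() */
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert several lines into dst, sharing a single
 * conversion state so that shift state survives from line to line.
 * Returns the number of wide characters stored, or (size_t)-1 on an
 * invalid sequence.  dst is not null-terminated. */
size_t convert_lines(wchar_t *dst, size_t dstlen,
                     const char *const lines[], size_t nlines)
{
    mbstate_t st;
    size_t total = 0;

    assert(dst != NULL && lines != NULL);
    memset(&st, 0, sizeof st);            /* initial conversion state */

    for (size_t i = 0; i < nlines; i++) {
        const char *src = lines[i];
        /* nms = strlen(): the terminating '\0' is deliberately excluded,
         * so the state is not reset at the end of the line. */
        size_t n = mbsnrtowcs(dst + total, &src, strlen(lines[i]),
                              dstlen - total, &st);

        if (n == (size_t)-1)
            return (size_t)-1;
        total += n;
    }
    return total;
}
```

For instance, calling it with the two lines "ab" and "cd" and a large
enough buffer stores the four wide characters a, b, c, d.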


Antoine

