NetBSD-Users archive


sending/receiving UTF-8 characters from terminal to program



Hi All,

My locale is as follows:

export LANG="hu_HU.UTF-8"
export LC_CTYPE="hu_HU.UTF-8"
export LC_MESSAGES="hu_HU.UTF-8"

The problem is this: in X I can type the special characters of the locale everywhere (although, oddly, in the terminal they only appear after the third key press). But when I echo a string containing such characters (áéíóöőúű) and pipe it to a program, they arrive somehow differently compared to when the same string is passed by calling the same program. The only example I have is foma (https://fomafst.github.io). It has a tool called flookup, which requires special morphological dictionaries that most probably no one here uses (except me). When I pass a string from the command line like this:

echo néz|flookup magyar.fst

it results in:

néz     +?

However, when passing the string as:

echo néz|flookup magyar.fst

I get a successful analysis:

néz    +swConsonant+néz[stem]+CON
néz    +swConsonant+néz[stem]+CON+Nom
néz    néz[stem]+Verb+IndefSg3

When I call the API function behind flookup from a program and pass it the string 'néz', I also get the analysis successfully. I have only a partial explanation for this, and I also wonder why the terminal does not translate the locale-specific UTF-8 bytes back into characters when the program prints them. It is simply inconvenient to always type the strings I want to analyse into a program, then compile and execute it, instead of trying them from the shell.
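
One way to narrow this down might be to compare the raw bytes that reach a program through the pipe with the bytes of a string passed on the command line. The following is only a minimal sketch, assuming Python 3 is installed; the script name dumpbytes.py and the function describe are made up for illustration and have nothing to do with foma. It prints the bytes in hex and reports whether they are valid UTF-8 and, if so, whether the accented letters are precomposed (NFC) or decomposed (NFD). A difference in either respect is merely one possibility worth ruling out, not a confirmed cause.

```python
import os
import sys
import unicodedata


def describe(data: bytes) -> str:
    """Return a hex dump of data plus a note on its encoding and normalization."""
    hexed = " ".join(f"{b:02x}" for b in data)
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return f"{hexed}  (not valid UTF-8, perhaps a legacy 8-bit encoding)"
    if text.isascii():
        form = "plain ASCII"
    elif text == unicodedata.normalize("NFC", text):
        form = "UTF-8, precomposed (NFC)"
    elif text == unicodedata.normalize("NFD", text):
        form = "UTF-8, decomposed (NFD)"
    else:
        form = "UTF-8, mixed normalization"
    return f"{hexed}  ({form})"


if __name__ == "__main__":
    args = sys.argv[1:]
    # A leading "-" asks the script to also dump what arrives on stdin.
    if args and args[0] == "-":
        piped = sys.stdin.buffer.read().rstrip(b"\n")
        print("stdin:", describe(piped))
        args = args[1:]
    # Remaining arguments are dumped as the bytes the OS handed over.
    for arg in args:
        print("argv: ", describe(os.fsencode(arg)))
```

It could be run as, for example, `echo néz | python3 dumpbytes.py - néz`, so that the piped string and the argument are shown side by side. If the two hex dumps differ, the terminal or shell is sending something other than what the dictionary expects.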

Could anyone explain to me what is happening here and how I can handle it?

Thanks,
r0ller


