Re: sending/receiving UTF-8 characters from terminal to program

To: RVP <rvp%SDF.ORG@localhost>
Subject: Re: sending/receiving UTF-8 characters from terminal to program
From: r0ller <r0ller%freemail.hu@localhost>
Date: Fri, 20 Jan 2023 15:09:44 +0100

Well, checking what printf results in, I get:

$ printf 'néz' | hexdump -C
00000000  6e e9 7a                                          |n.z|
00000003
$ printf $'n\uE9z' | hexdump -C
00000000  6e c3 a9 7a                                       |n..z|
00000004

It's definitely different from what you got for 'néz'. What does that mean?

Thanks,
r0ller

On 1/20/23 9:55 AM, RVP wrote:

On Fri, 20 Jan 2023, r0ller wrote:

Thanks for your efforts to reproduce it :) I just don't get why it works for you with the same locales and why it doesn't for me. Are there any other settings that affect encoding besides LC variables and LANG?


Since we seem to have the same flookup binary, check against the
magyar.fst I used:

https://github.com/r0ller/alice/tree/master/hi_android/foma

Next check that the input you're feeding to flookup actually _is_
UTF-8. Both /bin/sh and bash output UTF-8 if given Unicode code-
points in the form `\uNNNN'. So,

$ printf 'néz' | hexdump -C
00000000  6e c3 a9 7a                                       |n..z|
00000004
$ printf $'n\uE9z' | hexdump -C
00000000  6e c3 a9 7a                                       |n..z|
00000004
$

If that works, then check those UTF-8 bytes against whatever the
terminal emulator generated from your keystrokes for the `é'
in `néz'.

-RVP
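
The byte check described above can be sketched roughly as follows. In the
two hexdumps from this thread, the single byte e9 is how `é' is encoded in
ISO-8859-1 (and ISO-8859-2), while c3 a9 is its UTF-8 encoding. The helper
names utf8_bytes, latin1_bytes and is_utf8 are made up for illustration,
and the sketch assumes an iconv that exits non-zero on malformed input
(GNU iconv does; other implementations may behave differently).

```shell
# Octal escapes keep the bytes independent of the editor's or
# terminal's own encoding.
utf8_bytes()   { printf 'n\303\251z'; }   # "néz" as UTF-8:      6e c3 a9 7a
latin1_bytes() { printf 'n\351z'; }       # "néz" as ISO-8859-1: 6e e9 7a

# Hypothetical helper: succeeds only if stdin is well-formed UTF-8.
# Assumes iconv reports an error for invalid input sequences.
is_utf8() { iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; }

utf8_bytes   | is_utf8 && echo "utf8 sample: valid UTF-8"
latin1_bytes | is_utf8 || echo "latin1 sample: not valid UTF-8"

# Re-encoding the single-byte form yields the bytes flookup would
# need if it expects UTF-8 input.
latin1_bytes | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1
```

Piping the output of the terminal (or of the program feeding flookup)
through the same kind of check shows which of the two encodings is
actually being produced.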
