pkgsrc-Users archive
Re: postgresql encoding/locale issues
Thanks, all, for the comments; they are providing some clarity. I am
responding to everyone in one message, at the risk of being confusing.
Jonathan Perkin <jperkin%mnx.io@localhost> writes:
> This has been my understanding for many years - LANG and LC_ALL should
> not be set, but instead use LC_CTYPE to specify the general locale you
> wish to use (en_GB.UTF-8 for me), and then other LC_* as appropriate.
I agree that for pgsql we are talking about choosing the db encoding.
In general, though, setting LC_CTYPE alone does not produce messages in
a different language (which doesn't surprise me); setting LANG does:
$ LC_CTYPE=fr_FR.UTF-8 date
Fri Apr 14 20:17:49 EDT 2023
$ LANG=fr_FR.UTF-8 date
ven. avr. 14 20:18:04 EDT 2023
> In particular you likely do not want LC_COLLATE to default to whatever
> you set LANG to, as the only sane sort order is LC_COLLATE=C.
(I think this was Joerg's point.)
What if my strings are in UTF-8? Wouldn't I want them interpreted as
Unicode and sorted that way, rather than sorting the UTF-8 encoding as
bytes? And if I were using fr_FR.ISO8859-1, I would sort of expect e, è,
and é to sort near each other, despite the first one's codepoint being
far from the others', but I have no idea what is correct. (I was raised
on ASCII and don't use newfangled quotes, so I don't really have this
issue.)
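To make the bytes-versus-locale question concrete, here is a minimal
sketch, assuming a POSIX shell and UTF-8 input. byte_sort is a
hypothetical helper name, not something from the thread. LC_ALL=C makes
sort(1) compare raw bytes, so the accented letters land after z rather
than next to e.

```shell
# byte_sort is a hypothetical helper: sort its arguments by raw byte value.
# LC_ALL=C forces sort(1) to compare bytes, ignoring any locale collation.
byte_sort() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# e is 0x65 and z is 0x7a, while è and é are encoded in UTF-8 as
# 0xc3 0xa8 and 0xc3 0xa9, so both accented letters sort after z.
byte_sort z é e è
```

Whether a locale-aware sort (for example under fr_FR.UTF-8) places è and
é next to e depends on the platform's collation support, so no output is
claimed for that case.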
> From: Edgar Fuß <ef%math.uni-bonn.de@localhost>
> CREATE DATABASE foo WITH OWNER bar TEMPLATE template0 ENCODING 'UTF8' LC_CTYPE 'de_DE.UTF-8';
Thanks - this avoids the issue of pgsql interpreting locale variables,
and that seems like a good use of a big hammer. This seems very much
like what Matthias is doing.
> From: Robert Elz <kre%munnari.OZ.AU@localhost>
| which also
| seems normal (to be using UTF-8, and one's own language).
> [straightening me about env var processing order omitted, but thank you]
> Using ones own language, sure, but setting UTF-8 in LANG might not
> be the best idea, LANG provides the default locale for LC_TIME
> LC_MONETARY LC_... as well as LC_CTYPE the only locale setting
> for which the character encoding method is really relevant.
This is perhaps the issue, and it still feels like a bug. As far as I
can tell, LANG typically includes an encoding, not just a
language/country code pair. (The variable LANGUAGE, perhaps a Linuxism,
seems not to carry an encoding.)
I find that "LANG=es_ES date" prints in English (surely the "C" locale),
while "LANG=fr_FR.UTF-8 date" prints in French.
So it's certainly valid to suggest that I set only LC_CTYPE, because
that's all I want to control, but setting LANG=en_US.UTF-8 seems valid
in general to me, and it has the same language_COUNTRY.encoding form
that I expect people whose desired language isn't English would set.
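For reference, the variable lookup order being discussed can be sketched
as follows. This is a minimal illustration of the POSIX precedence
(LC_ALL first, then the category's own variable, then LANG), not a
description of what pgsql does. effective_locale is a hypothetical
helper name, and the final fallback to "C" is an assumption, since POSIX
leaves the default locale implementation-defined.

```shell
# effective_locale is a hypothetical helper: print the value the
# environment selects for one locale category (e.g. LC_CTYPE).
# Lookup order follows POSIX: LC_ALL, then the category variable, then LANG.
effective_locale() {
    category=$1
    # Indirect lookup of the variable named by $category (illustration
    # only; do not pass untrusted input to eval).
    eval "specific=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$specific" ]; then
        printf '%s\n' "$specific"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumption: treat the implementation-defined default as "C".
        printf 'C\n'
    fi
}

# Example: LC_CTYPE set explicitly, everything else inherited from LANG.
(
    unset LC_ALL LC_TIME
    LANG=en_US.UTF-8
    LC_CTYPE=fr_FR.UTF-8
    effective_locale LC_CTYPE   # the category variable wins over LANG
    effective_locale LC_TIME    # no LC_TIME set, so LANG applies
)
```

This models only the environment variables; what locale(1) reports for a
given category can still differ from platform to platform.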
Regarding COLLATE, setting LANG leads to:
$ LANG=en_US.UTF-8 locale
LANG="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_COLLATE="C"
LC_TIME="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=""
which seems ok.
I wonder then if pgsql expects LANG not to have an encoding, but it
certainly seems like it should give a much better error message.