pkgsrc-Users archive


Re: postgresql encoding/locale issues



    Date:        Fri, 14 Apr 2023 08:39:15 -0400
    From:        Greg Troxel <gdt%lexort.com@localhost>
    Message-ID:  <rmiwn2eocy4.fsf%s1.lexort.com@localhost>

I can't comment on any of the applications you're using, or what
they need - I know nothing about any of them.

I also don't know a lot about locales in general, but I do have
some understanding of how the environment variables are intended
to work:

  | My own shell environment has
  |   LANG=en_US.UTF-8
  | which as I understand should mean LC_* are all that value,

Not quite.   The locale env vars (LANG and the various LC_*) form a
hierarchy: LC_ALL at the top, the other LC_* vars all equal on the next
level (each used for a different purpose), and LANG at the bottom.
LANG essentially sets the default locale when none of the other vars is
set (or, for a specific LC_thing var, when neither it nor LC_ALL is set).
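
As a rough sketch of that lookup order (in Python, purely to illustrate;
the helper name effective_locale is made up here, and falling back to
"C" when nothing at all is set is an assumption about the usual default):

```python
def effective_locale(category, env):
    """Return the locale name that applies to `category` (e.g. "LC_CTYPE")."""
    # Precedence, highest first: LC_ALL, the specific LC_* variable, LANG.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # an empty value is treated the same as unset
            return value
    return "C"  # assumed fallback: the default (C, or POSIX) locale


# The environment from the original report: LC_ALL decides everything.
reported = {
    "LANG": "en_US.UTF-8",
    "LC_CTYPE": "en_US.UTF-8",
    "LC_ALL": "en_US.UTF-8",
}
print(effective_locale("LC_TIME", reported))

# With only LC_CTYPE set, every other category falls through to the default.
only_ctype = {"LC_CTYPE": "en_US.UTF-8"}
print(effective_locale("LC_CTYPE", only_ctype))
print(effective_locale("LC_TIME", only_ctype))
```

Passing the environment as a plain dict (rather than reading os.environ
directly) just makes it easy to try out different combinations.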

  | which also
  | seems normal (to be using UTF-8, and one's own language).

Using one's own language, sure, but setting UTF-8 in LANG might not
be the best idea: LANG provides the default locale for LC_TIME,
LC_MONETARY, LC_... as well as for LC_CTYPE, the only locale setting
for which the character encoding method is really relevant.

  | I have added LC_CTYPE and LC_ALL after searching about this.

That would be overkill.   LC_ALL should almost never be set in the
normal environment; it is intended to be used by specific applications
that (perhaps temporarily) need a specific environment setting, as it
overrides all the others.   Setting LC_ALL means any other LC_* or LANG
settings become irrelevant.

  |   /usr/pkg/pgsql #> env|egrep LANG\|LC
  |   LANG=en_US.UTF-8
  |   LC_CTYPE=en_US.UTF-8
  |   LC_ALL=en_US.UTF-8

Only the last of those is doing anything useful, and if some locale
setting other than LC_CTYPE is being used (which from what you have
said I assume it probably is) attempting to force UTF-8 into it is
likely what is breaking things.

  | If I remove LANG and LC_ALL, but leave LC_CTYPE, postgresql starts and
  | creates a database.

Which suggests that en_US.UTF-8 is a valid LC_CTYPE setting, but
isn't valid for some other locale setting.   For you that setup is
probably fine: for things other than the char encoding, the
default (C, or POSIX) locale will probably work for you, since most
of its settings are what works in the US...

But you could also try
	LANG=en_US
	LC_CTYPE=en_US.UTF-8

to force things to default to US English, and with a UTF-8 encoding
of characters.

  | So:

I can't comment on 2 of your questions, but:

  |   If postgresql cares about LC_CTYPE for encoding, why is it objecting
  |   to other locale variables being set?

I would assume that postgresql also cares about one of the other LC_xxx
settings, for which en_US.UTF-8 is not a valid value (no locale data, for
that category, whichever one it is, with that designator, exists in the
system).
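
One way to look for such a category is to ask the C library, one
category at a time, whether it accepts the name.   A minimal sketch
using Python's standard locale module (the probe helper is hypothetical,
and this only shows what setlocale() accepts on the system; it is not
necessarily the same check postgresql performs):

```python
import locale

CATEGORIES = ["LC_CTYPE", "LC_COLLATE", "LC_MONETARY",
              "LC_NUMERIC", "LC_TIME", "LC_MESSAGES"]


def probe(name):
    """Map each category name to True if setlocale() accepts `name` for it."""
    results = {}
    for cat_name in CATEGORIES:
        cat = getattr(locale, cat_name, None)
        if cat is None:  # e.g. LC_MESSAGES is not defined on every platform
            continue
        saved = locale.setlocale(cat)  # query the current setting
        try:
            locale.setlocale(cat, name)
            results[cat_name] = True
        except locale.Error:
            results[cat_name] = False
        finally:
            locale.setlocale(cat, saved)  # put the original setting back
    return results


for cat_name, ok in probe("en_US.UTF-8").items():
    print(cat_name, "ok" if ok else "NOT available")
```

Any category reported as not available would be a candidate for the one
that is causing the trouble.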

kre


