BUG #16068: Collate of 'Norwegian Bokmål' is problematic
The following bug has been logged on the website:
Bug reference: 16068
Logged by: Robert Ford
Email address: robfordww@gmail.com
PostgreSQL version: 12.0
Operating system: Windows
Description:
Hi,
I want to point out an issue I discovered when installing v12.0 for Windows.
The installer sets the collation in the config file to 'Norwegian Bokmål.1251'
(or something similar, but notice the 'å'). This seems to trigger all kinds
of bugs. For instance, "select * from pg_settings" results in a UTF-8 decode
error. pgAdmin also returns a lot of UTF-8 decode errors. The problem
seemed to go away when I changed the collation to "nb_NO" and ran initdb.
This error should only occur in countries where the installer creates
collation names with non-ASCII characters; Norway is one of them.
PG Bug reporting form <noreply@postgresql.org> writes:
> I want to point to an issue I discovered when installing v12.0 for windows.
Um, Windows-what exactly?
> The installer sets the Collate in the config file to 'Norwegian Bokmål.1251'
> (or something similar, but notice the 'å') This seems to trigger all kind
> of bugs.
That collation name has given us trouble before, cf
https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=db29620d4
https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=aa1d2fc5e
I wonder whether Microsoft changed it again :-(
regards, tom lane
[ please keep the list cc'd ]
Robert Ford <robfordww@gmail.com> writes:
> On Sat, Oct 19, 2019, 22:03 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The installer sets the Collate in the config file to 'Norwegian
>>> Bokmål.1251' (or something similar, but notice the 'å')
>> Um, Windows-what exactly?
>> That collation name has given us trouble before, cf
>> https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=db29620d4
>> https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=aa1d2fc5e
>> I wonder whether Microsoft changed it again :-(
> Windows server 2012
Hm, that's not very new, and it's certainly a version we've tested.
In fact I'd have guessed the above-mentioned patches were tested
against that.
Anyway, my first thought about this is that the mapping installed by
db29620d4 looks like it will recognize 'Norwegian (Bokmål)' but not
'Norwegian Bokmål'. Could you be more precise about exactly what
you're seeing in the config file?
regards, tom lane
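The point about the mapping recognizing one spelling but not the other can be illustrated with a small sketch. This is a hypothetical Python illustration, not PostgreSQL's actual mapping code: the names `LOCALE_MAP`, `FIXED_MAP` and `map_locale_name`, and the exact table entries, are assumptions made for the example. It matches on the ASCII text before and after the 'å' so that the result does not depend on how that character is encoded, and it uses the full spelling reported later in this thread as test input.

```python
# Hypothetical sketch: rewrite troublesome Windows locale names to an
# ASCII-only spelling. Each entry is (start, end, replacement). A name
# matches when it begins with `start` and `end` occurs after it; everything
# from the start of the name through `end` is replaced.
LOCALE_MAP = [
    ("Norwegian (Bokm", "l)_Norway", "Norwegian_Norway"),
]

# Assumed extra entry covering the spelling without parentheses.
FIXED_MAP = LOCALE_MAP + [
    ("Norwegian Bokm", "l_Norway", "Norwegian_Norway"),
]


def map_locale_name(name, table):
    """Return the ASCII-only spelling if a table entry matches, else `name`."""
    for start, end, replacement in table:
        if name.startswith(start):
            pos = name.find(end, len(start))
            if pos != -1:
                # Keep whatever follows the match, e.g. the ".1252" suffix.
                return replacement + name[pos + len(end):]
    return name


# The parenthesized spelling is caught by the first table.
old_style = map_locale_name("Norwegian (Bokmål)_Norway.1252", LOCALE_MAP)

# The spelling without parentheses falls through unchanged...
new_style = map_locale_name("Norwegian Bokmål_Norway.1252", LOCALE_MAP)

# ...until an entry for it is added.
fixed = map_locale_name("Norwegian Bokmål_Norway.1252", FIXED_MAP)
```

With only the first entry, the name without parentheses is returned untouched, which is the gap described above; adding a second entry of the same shape closes it.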
Sorry about this, but the version was "Windows 2016 standard". I let the
installer stay on "Default locale" while installing. This results in a
config file with the following values:
# These settings are initialized by initdb, but they can be changed.
lc_messages = 'Norwegian Bokmål_Norway.1252' # locale for system error message
# strings
lc_monetary = 'Norwegian Bokmål_Norway.1252' # locale for monetary formatting
lc_numeric = 'Norwegian Bokmål_Norway.1252' # locale for number formatting
lc_time = 'Norwegian Bokmål_Norway.1252' # locale for time formatting
Then:
C:\Program Files\PostgreSQL\12\bin>psql postgres postgres
Password for user postgres:
psql (12.0)
WARNING: Console code page (850) differs from Windows code page (1252)
8-bit characters might not work correctly. See psql reference
page "Notes for Windows users" for details.
Type "help" for help.
postgres=# select * from pg_settings;
ERROR: invalid byte sequence for encoding "UTF8": 0xe5 0x6c 0x5f
Robert Ford <robfordww@gmail.com> writes:
> Sorry about this, but the version was "Windows 2016 standard". I let the
> installer stay on "Default locale" while installing. This results in a
> config file with the following values:
> # These settings are initialized by initdb, but they can be changed.
> lc_messages = 'Norwegian Bokmål_Norway.1252' # locale for system error message
Okay, so we need to translate that string to 'Norwegian_Norway' too.
That's an easy fix, but as far as I can tell from the past discussions
about this, the bugs it'll fix are distinct from what you're complaining
about here:
> WARNING: Console code page (850) differs from Windows code page (1252)
> 8-bit characters might not work correctly. See psql reference
> page "Notes for Windows users" for details.
We don't have any support for Windows code page 850. Looking at the
Wikipedia page about it doesn't make me much inclined to add it
either: Wikipedia says that (a) it's largely been obsoleted by 1252,
and (b) there's confusion about what the code page's contents are,
specifically whether it contains a euro sign. So my recommendation
here is just to switch your console code page to 1252.
> postgres=# select * from pg_settings;
> ERROR: invalid byte sequence for encoding "UTF8": 0xe5 0x6c 0x5f
That hex sequence looks suspiciously like "ål_" in CP1252, so this is an
encoding confusion problem. I think it'd go away if you simplified
these postgresql.conf entries to 'Norwegian_Norway.1252' and restarted.
What I don't remember offhand is where the funny locale name spelling
might've propagated besides these entries.
regards, tom lane
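The reading of the hex sequence given above can be checked with a few lines of Python. This is only a sketch for illustration, using Python's built-in "cp1252", "cp850" and "utf-8" codecs; it says nothing about how the server itself handles the string. It shows that the three bytes from the error message decode to "ål_" under code page 1252, that they are not valid UTF-8, and that 'å' has a different byte value in console code page 850, which is what the psql warning is about.

```python
# The three bytes quoted in the error message.
raw = bytes([0xE5, 0x6C, 0x5F])

# Read as Windows code page 1252, they are ordinary text.
as_cp1252 = raw.decode("cp1252")

# Read as UTF-8, 0xE5 announces a three-byte sequence, but 0x6C is not a
# valid continuation byte, so decoding is rejected.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# The same character is stored as a different byte in each code page.
a_ring_cp1252 = "å".encode("cp1252")
a_ring_cp850 = "å".encode("cp850")
```

Here `as_cp1252` is "ål_", a fragment of 'Bokmål_Norway', and `valid_utf8` is False, which is consistent with a CP1252-encoded locale name being passed along as if it were UTF-8.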