Collation versions on Windows (help wanted, apply within)

Started by Thomas Munroabout 6 years ago14 messages
#1Thomas Munro
thomas.munro@gmail.com
2 attachment(s)

Hello hackers,

Here's a draft patch that teaches PostgreSQL how to ask for collation
versions on Windows. It receives a pair of DWORDs, which, it displays
as hex. The values probably have an internal structure that is
displayed in a user-friendly way by software like Active Directory and
SQL Server (I'm pretty sure they both track collation versions and
reindex), but I don't know. It is based on the documentation at:

https://docs.microsoft.com/en-us/windows/win32/win7appqual/nls-sorting-changes

My understanding of our OS and tool chain version strategy on that
platform is limited, but it looks like we only allow ourselves to use
Vista (and later) APIs if the compiler is Visual Studio 2015 (aka
14.0) or later. So I tested that this builds cleanly on AppVeyor
using that compiler (see attached CI patch). The regression tests
failed with Windows error code 87 before I added in the check to skip
"C" and "POSIX", so I know the new code is reached. I don't have an
environment to test it beyond that.

The reason for returning an empty string for "C" and "POSIX" is the
following comment for get_collation_actual_version():

* A particular provider must always either return a non-NULL string or return
* NULL (if it doesn't support versions). It must not return NULL for some
* collcollate and not NULL for others.

I'm not sure why, or if that really makes sense.

Do any Windows hackers want to help get it into shape? Some things to
do: test it, verify that the _WIN32_WINNT >= 0x0600 stuff makes sense
(why do we target such ancient Windows releases anyway?), see if there
is way we could use GetNLSVersion() (no "Ex") to make this work on
older Windows system, check if it makes sense to assume that
collcollate is encoded with CP_ACP ("the system default Windows ANSI
code page", used elsewhere in the PG source tree for a similar
purpose, but this seems likely to go wrong for locale names that have
non-ASCII characters, and indeed we see complaints on the lists
involving the word "Bokmål"), and recommend a better way to display
the collation version as text. I'll add this to the next commitfest
to attract some eyeballs (but note that when cfbot compiles it, it
will be using an older tool chain and Win32 target, so the new code
will be ifdef'd out and regression test success means nothing).

To test that it works, you'd need to look at the contents of
pg_collation to confirm that you see the new version strings, create
an index on a column that explicitly uses a collation that has a
version, update the pg_collation table by hand to have a to a
different value, and then open a new session and to access the index
to check that you get a warning about the version changing. The
warning can be cleared by using ALTER COLLATION ... REFRESH VERSION.

Attachments:

0001-Add-collation-versions-for-Windows.patchapplication/octet-stream; name=0001-Add-collation-versions-for-Windows.patchDownload
From 5174d658dc7ab05964518bcfc82b82e408fb6a17 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 8 Nov 2019 12:01:05 +1300
Subject: [PATCH 1/2] Add collation versions for Windows.

Work in progress.
---
 src/backend/utils/adt/pg_locale.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index fcdbaae37b..36f70279e5 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1532,6 +1532,29 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 #if defined(__GLIBC__)
 		/* Use the glibc version because we don't have anything better. */
 		collversion = pstrdup(gnu_get_libc_version());
+#elif defined(WIN32) && _WIN32_WINNT >= 0x0600
+		/*
+		 * If we are targeting Windows Vista and above, we can ask for a
+		 * name given a collation name (earlier versions required a
+		 * location code that we don't have).
+		 */
+		NLSVERSIONINFOEX version = {sizeof(NLSVERSIONINFOEX)};
+		WCHAR wide_collcollate[LOCALE_NAME_MAX_LENGTH];
+
+		/* These would be invalid arguments, but have no version. */
+		if (pg_strcasecmp("c", collcollate) == 0 ||
+			pg_strcasecmp("posix", collcollate) == 0)
+			return pstrdup("");
+
+		/* For all other names, ask the OS. */
+		MultiByteToWideChar(CP_ACP, 0, collcollate, -1, wide_collcollate,
+							LOCALE_NAME_MAX_LENGTH);
+		if (!GetNLSVersionEx(COMPARE_STRING, wide_collcollate, &version))
+			ereport(ERROR,
+					(errmsg("could not get collation version for locale \"%s\": error code %lu",
+							collcollate,
+							GetLastError())));
+		collversion = psprintf("%x,%x", version.dwNLSVersion, version.dwDefinedVersion);
 #endif
 	}
 
-- 
2.23.0

0002-CI.patchapplication/octet-stream; name=0002-CI.patchDownload
From a8406a25827c9502cd239adbc4c64f8b7638626e Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 8 Nov 2019 12:01:33 +1300
Subject: [PATCH 2/2] CI

---
 appveyor.yml  | 25 +++++++++++++++++++++++++
 buildsetup.pl | 38 ++++++++++++++++++++++++++++++++++++++
 dumpregr.pl   | 20 ++++++++++++++++++++
 3 files changed, 83 insertions(+)
 create mode 100644 appveyor.yml
 create mode 100644 buildsetup.pl
 create mode 100644 dumpregr.pl

diff --git a/appveyor.yml b/appveyor.yml
new file mode 100644
index 0000000000..72f5c2b7e9
--- /dev/null
+++ b/appveyor.yml
@@ -0,0 +1,25 @@
+#  appveyor.yml
+
+image: Visual Studio 2015
+
+install:
+  - appveyor-retry cinst winflexbison
+  - '"C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\SetEnv.cmd" /x64'
+  - '"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" x86_amd64'
+
+before_build:
+  - rename c:\ProgramData\chocolatey\bin\win_flex.exe flex.exe
+  - rename c:\ProgramData\chocolatey\bin\win_bison.exe bison.exe
+  - perl buildsetup.pl
+
+build:
+  project: pgsql.sln
+
+test_script:
+  - cd src\tools\msvc && vcregress check
+
+on_failure:
+  - perl dumpregr.pl
+
+configuration:
+  - Release
diff --git a/buildsetup.pl b/buildsetup.pl
new file mode 100644
index 0000000000..23df2fb1aa
--- /dev/null
+++ b/buildsetup.pl
@@ -0,0 +1,38 @@
+# first part of postgres build.pl, just doesn't run msbuild
+
+use strict;
+
+BEGIN
+{
+
+    chdir("../../..") if (-d "../msvc" && -d "../../../src");
+
+}
+
+use lib "src/tools/msvc";
+
+use Cwd;
+
+use Mkvcbuild;
+
+# buildenv.pl is for specifying the build environment settings
+# it should contain lines like:
+# $ENV{PATH} = "c:/path/to/bison/bin;$ENV{PATH}";
+
+if (-e "src/tools/msvc/buildenv.pl")
+{
+    do "src/tools/msvc/buildenv.pl";
+}
+elsif (-e "./buildenv.pl")
+{
+    do "./buildenv.pl";
+}
+
+# set up the project
+our $config;
+do "config_default.pl";
+do "config.pl" if (-f "src/tools/msvc/config.pl");
+
+# print "PATH: $_\n" foreach (split(';',$ENV{PATH}));
+
+Mkvcbuild::mkvcbuild($config);
diff --git a/dumpregr.pl b/dumpregr.pl
new file mode 100644
index 0000000000..08d276b52d
--- /dev/null
+++ b/dumpregr.pl
@@ -0,0 +1,20 @@
+use strict;
+use warnings FATAL => qw(all);
+
+use File::Find;
+
+my $Target = "regression.diffs";
+
+find(\&dump, "src");
+
+sub dump {
+  if ($_ eq $Target) {
+    my $path = $File::Find::name;
+    print "=== \$path ===\\n";
+    open(my $fh, "<", $_) || die "wtf";
+    while (my $line = <$fh>) {
+      print $line;
+      if ($. > 1000) { last; }
+    }
+  }
+}
-- 
2.23.0

#2Juan José Santamaría Flecha
juanjo.santamaria@gmail.com
In reply to: Thomas Munro (#1)
Re: Collation versions on Windows (help wanted, apply within)

On Fri, Nov 8, 2019 at 12:44 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Do any Windows hackers want to help get it into shape? Some things to
do: test it, verify that the _WIN32_WINNT >= 0x0600 stuff makes sense
(why do we target such ancient Windows releases anyway?)

You have to keep in mind that _WIN32_WINNT also applies to MinGW, so any
build with those tools will use a value of 0x0501 and this code will be
ifdef'd out.

As from where this value comes, my take is that it has not been revised in
a long time [1]/messages/by-id/20090907112633.C851.52131E4D@oss.ntt.co.jp. Windows 7 , Server 2008 and 2008 R2 support will end next
year [2]https://support.microsoft.com/en-us/help/4456235/end-of-support-for-windows-server-2008-and-windows-server-2008-r2 [3]https://support.microsoft.com/en-us/help/4057281/windows-7-support-will-end-on-january-14-2020, maybe you can make a case for updating this value.

see if there is way we could use GetNLSVersion() (no "Ex") to make this
work on
older Windows system

Older systems is just Windows Server 2003, not sure if it is worth any
effort.

check if it makes sense to assume that
collcollate is encoded with CP_ACP ("the system default Windows ANSI
code page", used elsewhere in the PG source tree for a similar
purpose, but this seems likely to go wrong for locale names that have
non-ASCII characters, and indeed we see complaints on the lists
involving the word "Bokmål"), and recommend a better way to display
the collation version as text.

The GetNLSVersionEx() function uses a "Language tag" value, check Language
Code Identifier (LCID) [4]https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-lcid/70feba9f-294e-491e-b6eb-56532684c37f, and these tags are plain ascii.

To test that it works, you'd need to look at the contents of
pg_collation to confirm that you see the new version strings, create
an index on a column that explicitly uses a collation that has a
version, update the pg_collation table by hand to have a to a
different value, and then open a new session and to access the index
to check that you get a warning about the version changing. The
warning can be cleared by using ALTER COLLATION ... REFRESH VERSION.

The code works as expected with this collation:

postgres=# CREATE COLLATION en_US (LC_COLLATE = 'en-US', LC_CTYPE =
'en-US');
CREATE COLLATION
postgres=# select * from pg_collation;
oid | collname | collnamespace | collowner | collprovider |
collisdeterministic | collencoding | collcollate | collctype | collversion
-------+-----------+---------------+-----------+--------------+---------------------+--------------+-------------+-----------+-------------
100 | default | 11 | 10 | d | t
| -1 | | |
950 | C | 11 | 10 | c | t
| -1 | C | C |
951 | POSIX | 11 | 10 | c | t
| -1 | POSIX | POSIX |
12326 | ucs_basic | 11 | 10 | c | t
| 6 | C | C |
16387 | en_us | 2200 | 10 | c | t
| 24 | en-US | en-US | 6020f,6020f
(5 rows)

The error code 87 is an ERROR_INVALID_PARAMETER that is raised when the
collate input does not match a valid tag, I would suggest not returning it
directly.

Regards,

Juan José Santamaría Flecha

[1]: /messages/by-id/20090907112633.C851.52131E4D@oss.ntt.co.jp
/messages/by-id/20090907112633.C851.52131E4D@oss.ntt.co.jp
[2]: https://support.microsoft.com/en-us/help/4456235/end-of-support-for-windows-server-2008-and-windows-server-2008-r2
https://support.microsoft.com/en-us/help/4456235/end-of-support-for-windows-server-2008-and-windows-server-2008-r2
[3]: https://support.microsoft.com/en-us/help/4057281/windows-7-support-will-end-on-january-14-2020
https://support.microsoft.com/en-us/help/4057281/windows-7-support-will-end-on-january-14-2020
[4]: https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-lcid/70feba9f-294e-491e-b6eb-56532684c37f
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-lcid/70feba9f-294e-491e-b6eb-56532684c37f

Regards,

Juan José Santamaría Flecha

#3Juan José Santamaría Flecha
juanjo.santamaria@gmail.com
In reply to: Thomas Munro (#1)
1 attachment(s)
Re: Collation versions on Windows (help wanted, apply within)

On Fri, Nov 8, 2019 at 12:44 AM Thomas Munro <thomas.munro@gmail.com> wrote:

recommend a better way to display the collation version as text.

There is a major and a minor version. The attached patch applies on top the
previous.

Regards,

Juan José Santamaría Flecha

Attachments:

0001-Add-collation-versions-for-Windows-with-major-minor.patchapplication/octet-stream; name=0001-Add-collation-versions-for-Windows-with-major-minor.patchDownload
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 36f7027..e45a215 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1554,7 +1554,9 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 					(errmsg("could not get collation version for locale \"%s\": error code %lu",
 							collcollate,
 							GetLastError())));
-		collversion = psprintf("%x,%x", version.dwNLSVersion, version.dwDefinedVersion);
+		collversion = psprintf("%d.%d,%d.%d",
+							   (version.dwNLSVersion >> 8) & 0x0000FFFF, version.dwNLSVersion & 0x000000FF,
+							   (version.dwDefinedVersion >> 8) & 0x0000FFFF, version.dwDefinedVersion & 0x000000FF);
 #endif
 	}
 
#4Thomas Munro
thomas.munro@gmail.com
In reply to: Juan José Santamaría Flecha (#3)
1 attachment(s)
Re: Collation versions on Windows (help wanted, apply within)

On Sat, Nov 9, 2019 at 10:20 AM Juan José Santamaría Flecha
<juanjo.santamaria@gmail.com> wrote:

Do any Windows hackers want to help get it into shape? Some things to
do: test it, verify that the _WIN32_WINNT >= 0x0600 stuff makes sense
(why do we target such ancient Windows releases anyway?)

You have to keep in mind that _WIN32_WINNT also applies to MinGW, so any build with those tools will use a value of 0x0501 and this code will be ifdef'd out.

As from where this value comes, my take is that it has not been revised in a long time [1]. Windows 7 , Server 2008 and 2008 R2 support will end next year [2] [3], maybe you can make a case for updating this value.

Ah, I see, thanks. I think what I have is OK for now then. If
someone else who is closer to the matter wants to argue that we should
always target Vista+ (for example on MinGW) in order to access this
functionality, I'll let them do that separately.

see if there is way we could use GetNLSVersion() (no "Ex") to make this work on
older Windows system

Older systems is just Windows Server 2003, not sure if it is worth any effort.

Cool. Nothing to do here then.

16387 | en_us | 2200 | 10 | c | t | 24 | en-US | en-US | 6020f,6020f

Thanks for testing!

On Sat, Nov 9, 2019 at 11:04 PM Juan José Santamaría Flecha
<juanjo.santamaria@gmail.com> wrote:

There is a major and a minor version. The attached patch applies on top the previous.

Perfect. I've merged this into the patch.

It's interesting that minor version changes mean no order changed but
new code points were added; that must be useful if your system
prevents you from using code points before you add them, I guess (?).
I don't understand the difference between the NLS and "defined"
versions, but at this stage I don't think we can try to be too fancy
here, think we're just going to have to assume we need both of them
and treat this the same way across all providers: if it moves, reindex
it.

Attachments:

0001-Add-collation-versions-for-Windows-v2.patchapplication/octet-stream; name=0001-Add-collation-versions-for-Windows-v2.patchDownload
From 5a793e9c02bdff86cf6aed09d76b889f23d218d1 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 8 Nov 2019 12:01:05 +1300
Subject: [PATCH 1/2] Add collation versions for Windows.

On Vista and later, use GetNLSVersionEx() to request collation
version information.

Author: Thomas Munro
Reviewed-by: Juan Jose Santamaria Flecha
Discussion: https://postgr.es/m/CA%2BhUKGJvqup3s%2BJowVTcacZADO6dOhfdBmvOPHLS3KXUJu41Jw%40mail.gmail.com
---
 src/backend/utils/adt/pg_locale.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index fcdbaae37b..b8e11b65c5 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1532,6 +1532,33 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 #if defined(__GLIBC__)
 		/* Use the glibc version because we don't have anything better. */
 		collversion = pstrdup(gnu_get_libc_version());
+#elif defined(WIN32) && _WIN32_WINNT >= 0x0600
+		/*
+		 * If we are targeting Windows Vista and above, we can ask for a
+		 * name given a collation name (earlier versions required a
+		 * location code that we don't have).
+		 */
+		NLSVERSIONINFOEX version = {sizeof(NLSVERSIONINFOEX)};
+		WCHAR wide_collcollate[LOCALE_NAME_MAX_LENGTH];
+
+		/* These would be invalid arguments, but have no version. */
+		if (pg_strcasecmp("c", collcollate) == 0 ||
+			pg_strcasecmp("posix", collcollate) == 0)
+			return pstrdup("");
+
+		/* For all other names, ask the OS. */
+		MultiByteToWideChar(CP_ACP, 0, collcollate, -1, wide_collcollate,
+							LOCALE_NAME_MAX_LENGTH);
+		if (!GetNLSVersionEx(COMPARE_STRING, wide_collcollate, &version))
+			ereport(ERROR,
+					(errmsg("could not get collation version for locale \"%s\": error code %lu",
+							collcollate,
+							GetLastError())));
+		collversion = psprintf("%d.%d,%d.%d",
+							   (version.dwNLSVersion >> 8) & 0xFFFF,
+							   version.dwNLSVersion & 0xFF,
+							   (version.dwDefinedVersion >> 8) & 0xFFFF,
+							   version.dwDefinedVersion & 0xFF);
 #endif
 	}
 
-- 
2.23.0

#5Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#1)
Re: Collation versions on Windows (help wanted, apply within)

On Fri, Nov 8, 2019 at 12:44 PM Thomas Munro <thomas.munro@gmail.com> wrote:

The reason for returning an empty string for "C" and "POSIX" is the
following comment for get_collation_actual_version():

* A particular provider must always either return a non-NULL string or return
* NULL (if it doesn't support versions). It must not return NULL for some
* collcollate and not NULL for others.

I'm not sure why, or if that really makes sense.

Peter E, do you have any thoughts on this question?

#6Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Thomas Munro (#5)
Re: Collation versions on Windows (help wanted, apply within)

On 2019-11-26 21:39, Thomas Munro wrote:

On Fri, Nov 8, 2019 at 12:44 PM Thomas Munro <thomas.munro@gmail.com> wrote:

The reason for returning an empty string for "C" and "POSIX" is the
following comment for get_collation_actual_version():

* A particular provider must always either return a non-NULL string or return
* NULL (if it doesn't support versions). It must not return NULL for some
* collcollate and not NULL for others.

I'm not sure why, or if that really makes sense.

Peter E, do you have any thoughts on this question?

Doesn't make sense to me either.

We need to handle the various combinations of null and non-null stored
and actual versions, which we do, but that only applies within a given
collcollate. I don't think we need the property that that comment calls
for.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#6)
Re: Collation versions on Windows (help wanted, apply within)

On Wed, Nov 27, 2019 at 10:38 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 2019-11-26 21:39, Thomas Munro wrote:

On Fri, Nov 8, 2019 at 12:44 PM Thomas Munro <thomas.munro@gmail.com> wrote:

The reason for returning an empty string for "C" and "POSIX" is the
following comment for get_collation_actual_version():

* A particular provider must always either return a non-NULL string or return
* NULL (if it doesn't support versions). It must not return NULL for some
* collcollate and not NULL for others.

I'm not sure why, or if that really makes sense.

Peter E, do you have any thoughts on this question?

Doesn't make sense to me either.

We need to handle the various combinations of null and non-null stored
and actual versions, which we do, but that only applies within a given
collcollate. I don't think we need the property that that comment calls
for.

While wondering about that, I noticed the "C.UTF-8" encoding (here on
a glibc system):

postgres=# \pset null <NULL>
Null display is "<NULL>".
postgres=# select collname, collprovider, collencoding, collcollate,
collctype, collversion from pg_collation ;
collname | collprovider | collencoding | collcollate | collctype |
collversion
------------+--------------+--------------+-------------+------------+-------------
default | d | -1 | | | <NULL>
C | c | -1 | C | C | <NULL>
POSIX | c | -1 | POSIX | POSIX | <NULL>
ucs_basic | c | 6 | C | C | <NULL>
C.UTF-8 | c | 6 | C.UTF-8 | C.UTF-8 | 2.28
en_NZ.utf8 | c | 6 | en_NZ.utf8 | en_NZ.utf8 | 2.28
en_NZ | c | 6 | en_NZ.utf8 | en_NZ.utf8 | 2.28
(7 rows)

I wonder if we should do something to give that no collversion, since
we know that it's really just another way of spelling "binary sort
please", or if we'd be better off not trying to interpret the names
locale -a spits out.

Juan Jose,

Would you mind posting the full output of the above query (with <NULL>
showing) on a Windows system after running initdb with the -v2 patch,
so we can see how the collations look?

#8Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Thomas Munro (#7)
Re: Collation versions on Windows (help wanted, apply within)

On 2019-12-16 01:26, Thomas Munro wrote:

While wondering about that, I noticed the "C.UTF-8" encoding (here on
a glibc system):

postgres=# \pset null <NULL>
Null display is "<NULL>".
postgres=# select collname, collprovider, collencoding, collcollate,
collctype, collversion from pg_collation ;
collname | collprovider | collencoding | collcollate | collctype |
collversion
------------+--------------+--------------+-------------+------------+-------------
default | d | -1 | | | <NULL>
C | c | -1 | C | C | <NULL>
POSIX | c | -1 | POSIX | POSIX | <NULL>
ucs_basic | c | 6 | C | C | <NULL>
C.UTF-8 | c | 6 | C.UTF-8 | C.UTF-8 | 2.28
en_NZ.utf8 | c | 6 | en_NZ.utf8 | en_NZ.utf8 | 2.28
en_NZ | c | 6 | en_NZ.utf8 | en_NZ.utf8 | 2.28
(7 rows)

I wonder if we should do something to give that no collversion, since
we know that it's really just another way of spelling "binary sort
please", or if we'd be better off not trying to interpret the names
locale -a spits out.

I think it's worth handling that separately. If we want to give it an
air of generality and not just hard-code that one locale name, we could
strip off the encoding and compare the rest with the well-known encoding
names.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Juan José Santamaría Flecha
juanjo.santamaria@gmail.com
In reply to: Thomas Munro (#7)
1 attachment(s)
Re: Collation versions on Windows (help wanted, apply within)

On Mon, Dec 16, 2019 at 1:26 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Wed, Nov 27, 2019 at 10:38 AM Peter Eisentraut

Would you mind posting the full output of the above query (with <NULL>
showing) on a Windows system after running initdb with the -v2 patch,
so we can see how the collations look?

Sure, you can find attached the full output with ICU.

This is a resume to illustrate an issue with locale = 'C':

postgres=# CREATE COLLATION c_test (locale = 'C');
CREATE COLLATION
postgres=# select collname, collprovider, collencoding, collcollate,
collctype, collversion from pg_collation ;
collname | collprovider | collencoding | collcollate |
collctype | collversion
------------------------+--------------+--------------+------------------+------------------+-------------
default | d | -1 | |
| <NULL>
C | c | -1 | C |
C | <NULL>
POSIX | c | -1 | POSIX |
POSIX | <NULL>
ucs_basic | c | 6 | C |
C | <NULL>
und-x-icu | i | -1 | und |
und | 153.97
[... resumed ...]
c_test | c | 6 | C |
C |
(757 rows)

Shouldn't it be NULL?

Regards,

Juan José Santamaría Flecha

Attachments:

pg_collation.outapplication/octet-stream; name=pg_collation.outDownload
#10Thomas Munro
thomas.munro@gmail.com
In reply to: Juan José Santamaría Flecha (#9)
Re: Collation versions on Windows (help wanted, apply within)

On Tue, Dec 17, 2019 at 1:40 AM Juan José Santamaría Flecha
<juanjo.santamaria@gmail.com> wrote:

On Mon, Dec 16, 2019 at 1:26 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Wed, Nov 27, 2019 at 10:38 AM Peter Eisentraut

Would you mind posting the full output of the above query (with <NULL>
showing) on a Windows system after running initdb with the -v2 patch,
so we can see how the collations look?

Sure, you can find attached the full output with ICU.

Oh, I didn't realise until now that on Windows, initdb doesn't do
anything like "locale -a" to populate the system locales (which should
perhaps be done with EnumSystemLocalesEx(), as shown in Noah's nearby
problem report). So you have to define them manually, and otherwise
most people probably just use the default.

This is a resume to illustrate an issue with locale = 'C':

postgres=# CREATE COLLATION c_test (locale = 'C');
CREATE COLLATION
postgres=# select collname, collprovider, collencoding, collcollate, collctype, collversion from pg_collation ;
collname | collprovider | collencoding | collcollate | collctype | collversion
------------------------+--------------+--------------+------------------+------------------+-------------
default | d | -1 | | | <NULL>
C | c | -1 | C | C | <NULL>
POSIX | c | -1 | POSIX | POSIX | <NULL>
ucs_basic | c | 6 | C | C | <NULL>
und-x-icu | i | -1 | und | und | 153.97
[... resumed ...]
c_test | c | 6 | C | C |
(757 rows)

Shouldn't it be NULL?

Yeah, it should. I'll post a new patch next week that does that and
also removes the comment about how you can't have a mixture of NULL
and non-NULL, and see if I can identify anything that depends on that.
If you'd like to do that, please go ahead.

#11Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#10)
2 attachment(s)
Re: Collation versions on Windows (help wanted, apply within)

On Wed, Dec 18, 2019 at 11:02 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Tue, Dec 17, 2019 at 1:40 AM Juan José Santamaría Flecha
<juanjo.santamaria@gmail.com> wrote:

This is a resume to illustrate an issue with locale = 'C':

postgres=# CREATE COLLATION c_test (locale = 'C');
CREATE COLLATION
postgres=# select collname, collprovider, collencoding, collcollate, collctype, collversion from pg_collation ;
collname | collprovider | collencoding | collcollate | collctype | collversion
------------------------+--------------+--------------+------------------+------------------+-------------
default | d | -1 | | | <NULL>
C | c | -1 | C | C | <NULL>
POSIX | c | -1 | POSIX | POSIX | <NULL>
ucs_basic | c | 6 | C | C | <NULL>
und-x-icu | i | -1 | und | und | 153.97
[... resumed ...]
c_test | c | 6 | C | C |
(757 rows)

Shouldn't it be NULL?

Done in this new 0002 patch (untested). 0001 removes the comment that
individual collations can't have a NULL version, reports NULL for
Linux/glibc collations like C.UTF-8 by stripping the suffix and
comparing with C and POSIX as suggested by Peter E.

Attachments:

v3-0001-Allow-NULL-version-for-individual-collations.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Allow-NULL-version-for-individual-collations.patchDownload
From 95f35fcbc61354c984193293f70e8ee8a944df91 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 23 Mar 2020 12:48:59 +1300
Subject: [PATCH v3 1/2] Allow NULL version for individual collations.

Remove the documented restriction that collation providers must either
return NULL for all collations or non-NULL for all collations.

Use NULL for glibc collations like "C.UTF-8", which might otherwise lead
future proposed commits to force unnecessary index rebuilds.

Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://postgr.es/m/CA%2BhUKGJvqup3s%2BJowVTcacZADO6dOhfdBmvOPHLS3KXUJu41Jw%40mail.gmail.com
---
 src/backend/utils/adt/pg_locale.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 64fd3ae18a..b42122f9ce 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1505,10 +1505,6 @@ pg_newlocale_from_collation(Oid collid)
 /*
  * Get provider-specific collation version string for the given collation from
  * the operating system/library.
- *
- * A particular provider must always either return a non-NULL string or return
- * NULL (if it doesn't support versions).  It must not return NULL for some
- * collcollate and not NULL for others.
  */
 char *
 get_collation_actual_version(char collprovider, const char *collcollate)
@@ -1540,6 +1536,23 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	if (collprovider == COLLPROVIDER_LIBC)
 	{
 #if defined(__GLIBC__)
+		char	   *copy = pstrdup(collcollate);
+		char	   *copy_suffix = strstr(copy, ".");
+		bool		need_version = true;
+
+		/*
+		 * Check for names like C.UTF-8 by chopping off the encoding suffix on
+		 * our temporary copy, so we can skip the version.
+		 */
+		if (copy_suffix)
+			*copy_suffix = '\0';
+		if (pg_strcasecmp("c", copy) == 0 ||
+			pg_strcasecmp("posix", copy) == 0)
+			need_version = false;
+		pfree(copy);
+		if (!need_version)
+			return NULL;
+
 		/* Use the glibc version because we don't have anything better. */
 		collversion = pstrdup(gnu_get_libc_version());
 #endif
-- 
2.20.1

v3-0002-Add-collation-versions-for-Windows.patchtext/x-patch; charset=UTF-8; name=v3-0002-Add-collation-versions-for-Windows.patchDownload
From 953148ccfcb28856db70fe0cb4a99b9f2a08a0ce Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 8 Nov 2019 12:01:05 +1300
Subject: [PATCH v3 2/2] Add collation versions for Windows.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On Vista and later, use GetNLSVersionEx() to request collation version
information.

Reviewed-by: Juan José Santamaría Flecha <juanjo.santamaria@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJvqup3s%2BJowVTcacZADO6dOhfdBmvOPHLS3KXUJu41Jw%40mail.gmail.com
---
 src/backend/utils/adt/pg_locale.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index b42122f9ce..2562eb5416 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1555,6 +1555,33 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 
 		/* Use the glibc version because we don't have anything better. */
 		collversion = pstrdup(gnu_get_libc_version());
+#elif defined(WIN32) && _WIN32_WINNT >= 0x0600
+		/*
+		 * If we are targeting Windows Vista and above, we can ask for a name
+		 * given a collation name (earlier versions required a location code
+		 * that we don't have).
+		 */
+		NLSVERSIONINFOEX version = {sizeof(NLSVERSIONINFOEX)};
+		WCHAR		wide_collcollate[LOCALE_NAME_MAX_LENGTH];
+
+		/* These would be invalid arguments, but have no version. */
+		if (pg_strcasecmp("c", collcollate) == 0 ||
+			pg_strcasecmp("posix", collcollate) == 0)
+			return NULL;
+
+		/* For all other names, ask the OS. */
+		MultiByteToWideChar(CP_ACP, 0, collcollate, -1, wide_collcollate,
+							LOCALE_NAME_MAX_LENGTH);
+		if (!GetNLSVersionEx(COMPARE_STRING, wide_collcollate, &version))
+			ereport(ERROR,
+					(errmsg("could not get collation version for locale \"%s\": error code %lu",
+							collcollate,
+							GetLastError())));
+		collversion = psprintf("%d.%d,%d.%d",
+							   (version.dwNLSVersion >> 8) & 0xFFFF,
+							   version.dwNLSVersion & 0xFF,
+							   (version.dwDefinedVersion >> 8) & 0xFFFF,
+							   version.dwDefinedVersion & 0xFF);
 #endif
 	}
 
-- 
2.20.1

#12Juan José Santamaría Flecha
juanjo.santamaria@gmail.com
In reply to: Thomas Munro (#11)
Re: Collation versions on Windows (help wanted, apply within)

On Mon, Mar 23, 2020 at 5:59 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Done in this new 0002 patch (untested). 0001 removes the comment that
individual collations can't have a NULL version, reports NULL for
Linux/glibc collations like C.UTF-8 by stripping the suffix and
comparing with C and POSIX as suggested by Peter E.

It applies and passes tests without a problem in Windows, and works as
expected.

Regards

#13Thomas Munro
thomas.munro@gmail.com
In reply to: Juan José Santamaría Flecha (#12)
Re: Collation versions on Windows (help wanted, apply within)

On Tue, Mar 24, 2020 at 7:56 AM Juan José Santamaría Flecha
<juanjo.santamaria@gmail.com> wrote:

On Mon, Mar 23, 2020 at 5:59 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Done in this new 0002 patch (untested). 0001 removes the comment that
individual collations can't have a NULL version, reports NULL for
Linux/glibc collations like C.UTF-8 by stripping the suffix and
comparing with C and POSIX as suggested by Peter E.

It applies and passes tests without a problem in Windows, and works as expected.

Thanks! Pushed.

From the things we learned in this thread, I think there is an open
item for someone to write a patch to call EnumSystemLocalesEx() and
populate the initial set of collations, where we use "locale -a" on
Unix. I'm not sure where the encoding is supposed to come from
though, which is why I didn't try to write a patch myself.

#14Juan José Santamaría Flecha
juanjo.santamaria@gmail.com
In reply to: Thomas Munro (#13)
Re: Collation versions on Windows (help wanted, apply within)

On Wed, Mar 25, 2020 at 4:18 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Thanks! Pushed.

Great!

From the things we learned in this thread, I think there is an open
item for someone to write a patch to call EnumSystemLocalesEx() and
populate the initial set of collations, where we use "locale -a" on
Unix. I'm not sure where the encoding is supposed to come from
though, which is why I didn't try to write a patch myself.

I will take a look at this when the current commitfest is over.

Regards,

Juan José Santamaría Flecha