Inconsistent results with libc sorting on Windows

Started by Daniel Veriteover 2 years ago19 messages
#1Daniel Verite
daniel@manitou-mail.org
1 attachment(s)

Hi,

While trying pg16beta1 libc collations on Windows, I noticed that UTF-8
text sorts sometimes differently across invocations with the same
locales, which is wrong since these collations are deterministic.

The OS is Windows 10 Home, version 10.0.19045 Build 19045,
self-built 16beta1 with VS Community 2022, without ICU, default
configuration in postgresql.conf.

It seems to occur more or less randomly with all libc locales except
C/POSIX, with the probability of getting differences being seemingly
much higher when the data gets larger in number of rows and uses
higher codepoints (like if all character are in [U+0001,U+0400] the
sorts never differ with 40k rows, but they do if there are much more
rows or if the range is [U+0001,U+2000]).

Also, it does not occur at all if parallel scan is disabled.

I've come up with a self-contained script that generates random words
and repeatedly sorts and feed them to md5sum. It takes the number of
rows and the highest Unicode codepoint as arguments, and shows when the
checksums differ across consecutive invocations.

Here's a typical run showing how it goes wrong after the 14th sort:

$ bash repro-coll-windows.sh 40000 16383
NOTICE: relation "random_words" already exists, skipping
CREATE TABLE
TRUNCATE TABLE
CREATE FUNCTION
DROP COLLATION
CREATE COLLATION
INSERT 0 40000
ANALYZE
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
35050d858f4c590788132627e74f62c8 -> e746b626fcc848cbbc67570a7dde03bb
(iter=15)
16
e746b626fcc848cbbc67570a7dde03bb -> 35050d858f4c590788132627e74f62c8
(iter=16)
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
35050d858f4c590788132627e74f62c8 -> 6bf38563d1267339122154bd7d4fbfce
(iter=38)
39
6bf38563d1267339122154bd7d4fbfce -> 35050d858f4c590788132627e74f62c8
(iter=39)
40 41 42 43 44 45 46 47 48 49 50 51
35050d858f4c590788132627e74f62c8 -> 3d2072698054d0bd57beefea0248b7e6
(iter=51)
52
3d2072698054d0bd57beefea0248b7e6 -> 35050d858f4c590788132627e74f62c8
(iter=52)
53 54 55 56 57 58 59 ^C

Would anyone be able to reproduce this? That might be a local problem
although there's nothing special installed AFAICS.
Initially I saw this with a larger dataset that I can't share, and the diffs
between outputs showed that only a few lines out of 2 million lines
were getting displaced across sorts.
It also happens on the same OS with Pg15.3 (EDB build) and the default
libc collation, so I would not immediately suspect new code in Pg16.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Attachments:

repro-coll-windows.shapplication/octet-stream; name=repro-coll-windows.shDownload
#2Joe Conway
mail@joeconway.com
In reply to: Daniel Verite (#1)
Re: Inconsistent results with libc sorting on Windows

On 6/5/23 18:07, Daniel Verite wrote:

While trying pg16beta1 libc collations on Windows, I noticed that UTF-8
text sorts sometimes differently across invocations with the same
locales, which is wrong since these collations are deterministic.

<snip>

Also, it does not occur at all if parallel scan is disabled.

This is a wild shot in the dark, but I wonder if somehow the locale is
being initialized (i.e. setlocale) differently in the parallel workers
than the backend due to some Windows specific behavior differences?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#3Thomas Munro
thomas.munro@gmail.com
In reply to: Daniel Verite (#1)
Re: Inconsistent results with libc sorting on Windows

On Tue, Jun 6, 2023 at 10:08 AM Daniel Verite <daniel@manitou-mail.org> wrote:

Also, it does not occur at all if parallel scan is disabled.

Could this be a clue that it is failing to be transitive?

#4Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#3)
Re: Inconsistent results with libc sorting on Windows

On Tue, Jun 6, 2023 at 1:30 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Tue, Jun 6, 2023 at 10:08 AM Daniel Verite <daniel@manitou-mail.org> wrote:

Also, it does not occur at all if parallel scan is disabled.

Could this be a clue that it is failing to be transitive?

That vaguely rang a bell for me... and then I remembered this thread:

/messages/by-id/20191206063401.GB1629883@rfd.leadboat.com

#5Daniel Verite
daniel@manitou-mail.org
In reply to: Thomas Munro (#4)
Re: Inconsistent results with libc sorting on Windows

Thomas Munro wrote:

Also, it does not occur at all if parallel scan is disabled.

Could this be a clue that it is failing to be transitive?

That vaguely rang a bell for me... and then I remembered this thread:

/messages/by-id/20191206063401.GB1629883@rfd.leadboat.com

Thanks for the pointer, non-transitive comparisons seem a likely cause
indeed.

The parallel scan appears to imply some randomness in the sequence of
comparisons, which makes the problem more visible.
After changing the test to shuffle the rows before each sort,
non-parallel scans also produce outputs that differ, proving that
parallelism is not a root cause.

Running the test with all the libc collations with collencoding in
(-1,6) shows that the only ones not affected are C/POSIX/ucs_basic.
Otherwise the 569 other pre-created libc collations that can be used
with UTF-8 are affected, plus the default collation
(French_France.1252 in my case).

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#6Joe Conway
mail@joeconway.com
In reply to: Daniel Verite (#5)
Re: Inconsistent results with libc sorting on Windows

On 6/7/23 07:58, Daniel Verite wrote:

Thomas Munro wrote:

Also, it does not occur at all if parallel scan is disabled.

Could this be a clue that it is failing to be transitive?

That vaguely rang a bell for me... and then I remembered this thread:

/messages/by-id/20191206063401.GB1629883@rfd.leadboat.com

Thanks for the pointer, non-transitive comparisons seem a likely cause
indeed.

The parallel scan appears to imply some randomness in the sequence of
comparisons, which makes the problem more visible.
After changing the test to shuffle the rows before each sort,
non-parallel scans also produce outputs that differ, proving that
parallelism is not a root cause.

Running the test with all the libc collations with collencoding in
(-1,6) shows that the only ones not affected are C/POSIX/ucs_basic.
Otherwise the 569 other pre-created libc collations that can be used
with UTF-8 are affected, plus the default collation
(French_France.1252 in my case).

Wow, that sounds pretty horrid

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#7Juan José Santamaría Flecha
juanjo.santamaria@gmail.com
In reply to: Joe Conway (#6)
1 attachment(s)
Re: Inconsistent results with libc sorting on Windows

On 6/7/23 07:58, Daniel Verite wrote:

Thomas Munro wrote:

Also, it does not occur at all if parallel scan is disabled.

Could this be a clue that it is failing to be transitive?

That vaguely rang a bell for me... and then I remembered this thread:

/messages/by-id/20191206063401.GB1629883@rfd.leadboat.com

Thanks for the pointer, non-transitive comparisons seem a likely cause
indeed.

Just to make sure we are all seeing the same problem, does the attached
patch fix your test?

Regards,

Juan José Santamaría Flecha

Attachments:

0001-POC-Inconsistent-results-with-libc-sorting-on-Window.patchapplication/octet-stream; name=0001-POC-Inconsistent-results-with-libc-sorting-on-Window.patchDownload
From 0012312bf784bf0e0954feacc55fae9b647458f0 Mon Sep 17 00:00:00 2001
From: Juan Jose Santamaria Flecha <juanjo.santamaria@gmail.com>
Date: Thu, 8 Jun 2023 09:27:44 -0400
Subject: [PATCH] POC Inconsistent results with libc sorting on Windows

---
 src/backend/utils/adt/pg_locale.c | 11 ++++++++++-
 src/include/utils/pg_locale.h     |  3 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 31e3b16..903e0c4 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1534,6 +1534,14 @@ pg_newlocale_from_collation(Oid collid)
 								NULL);
 #else
 				loc = _create_locale(LC_ALL, collcollate);
+				if (GetDatabaseEncoding() == PG_UTF8)
+				{
+					wchar_t		wcollcollate[LOCALE_NAME_MAX_LENGTH];
+
+					MultiByteToWideChar(CP_ACP, 0, collcollate, -1, wcollcollate,
+										LOCALE_NAME_MAX_LENGTH);
+					result.info.lcid = LocaleNameToLCID(wcollcollate, 0);
+				}
 #endif
 				if (!loc)
 					report_newlocale_failure(collcollate);
@@ -1790,7 +1798,8 @@ pg_strncoll_libc_win32_utf8(const char *arg1, size_t len1, const char *arg2,
 	errno = 0;
 #ifdef HAVE_LOCALE_T
 	if (locale)
-		result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
+		result = CompareStringW(locale->info.lcid, NORM_LINGUISTIC_CASING,
+								(LPWSTR) a1p, -1, (LPWSTR) a2p, -1);
 	else
 #endif
 		result = wcscoll((LPWSTR) a1p, (LPWSTR) a2p);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index e2a7243..966067a 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -81,6 +81,9 @@ struct pg_locale_struct
 #ifdef HAVE_LOCALE_T
 		locale_t	lt;
 #endif
+#ifdef WIN32
+		LCID		lcid;
+#endif
 #ifdef USE_ICU
 		struct
 		{
-- 
2.11.0

#8Daniel Verite
daniel@manitou-mail.org
In reply to: Juan José Santamaría Flecha (#7)
Re: Inconsistent results with libc sorting on Windows

Juan José Santamaría Flecha wrote:

Just to make sure we are all seeing the same problem, does the attached
patch fix your test?

The problem of the random changes in sorting disappears for all libc
locales in pg_collation, so this is very promising.

However it persists for the default collation.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#9Juan José Santamaría Flecha
juanjo.santamaria@gmail.com
In reply to: Daniel Verite (#8)
1 attachment(s)
Re: Inconsistent results with libc sorting on Windows

On Fri, Jun 9, 2023 at 11:18 AM Daniel Verite <daniel@manitou-mail.org>
wrote:

Juan José Santamaría Flecha wrote:

Just to make sure we are all seeing the same problem, does the attached
patch fix your test?

The problem of the random changes in sorting disappears for all libc
locales in pg_collation, so this is very promising.

Great to hear, I'll try to push the patch to something reviewable as per
attached.

However it persists for the default collation.

We can apply a similar approach there.

Regards,

Juan José Santamaría Flecha

Attachments:

0001-WIN32-Inconsistent-results-with-libc-utf8-sorting.patchapplication/octet-stream; name=0001-WIN32-Inconsistent-results-with-libc-utf8-sorting.patchDownload
From 219cf386d74b979119bc75a769a937b649b969f0 Mon Sep 17 00:00:00 2001
From: Juan Jose Santamaria Flecha <juanjo.santamaria@gmail.com>
Date: Fri, 9 Jun 2023 16:57:32 -0400
Subject: [PATCH] WIN32 Inconsistent results with libc utf8 sorting

---
 src/backend/utils/adt/pg_locale.c | 142 ++++++++++++++++++++++++++------------
 src/include/utils/pg_locale.h     |   3 +
 2 files changed, 99 insertions(+), 46 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 31e3b16..7500fa5 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1534,6 +1534,20 @@ pg_newlocale_from_collation(Oid collid)
 								NULL);
 #else
 				loc = _create_locale(LC_ALL, collcollate);
+				if (GetDatabaseEncoding() == PG_UTF8)
+				{
+					wchar_t		wcollcollate[LOCALE_NAME_MAX_LENGTH];
+					LCID 		lcid;
+
+					MultiByteToWideChar(CP_ACP, 0, collcollate, -1, wcollcollate,
+										LOCALE_NAME_MAX_LENGTH);
+					lcid = LocaleNameToLCID(wcollcollate, 0);
+					if (lcid == 0)
+						ereport(ERROR,
+								(errmsg("could not convert locale name to LCID: error code %lu",
+										GetLastError())));
+					result.info.lcid = lcid;
+				}
 #endif
 				if (!loc)
 					report_newlocale_failure(collcollate);
@@ -1565,7 +1579,10 @@ pg_newlocale_from_collation(Oid collid)
 #endif
 			}
 
-			result.info.lt = loc;
+#ifdef WIN32
+			if (!result.info.lcid)
+#endif
+				result.info.lt = loc;
 #else							/* not HAVE_LOCALE_T */
 			/* platform that doesn't support locale_t */
 			ereport(ERROR,
@@ -1729,77 +1746,110 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 }
 
 /*
- * pg_strncoll_libc_win32_utf8
+ * pg_strncoll_sort_key
  *
  * Win32 does not have UTF-8. Convert UTF8 arguments to wide characters and
- * invoke wcscoll() or wcscoll_l().
+ * produce a normalized sort key based on the locale. Returns a palloced
+ * string.
  */
 #ifdef WIN32
-static int
-pg_strncoll_libc_win32_utf8(const char *arg1, size_t len1, const char *arg2,
-							size_t len2, pg_locale_t locale)
+static char *
+pg_strncoll_sort_key(const char *arg, size_t len, pg_locale_t locale,
+					 int *sortlen)
 {
-	char		sbuf[TEXTBUFLEN];
-	char	   *buf = sbuf;
-	char	   *a1p,
-			   *a2p;
-	int			a1len = len1 * 2 + 2;
-	int			a2len = len2 * 2 + 2;
-	int			r;
-	int			result;
-
-	Assert(!locale || locale->provider == COLLPROVIDER_LIBC);
-	Assert(GetDatabaseEncoding() == PG_UTF8);
-#ifndef WIN32
-	Assert(false);
-#endif
+	char       *ap;
+	int         alen = len * 2 + 2;
+	int         result;
 
-	if (a1len + a2len > TEXTBUFLEN)
-		buf = palloc(a1len + a2len);
-
-	a1p = buf;
-	a2p = buf + a1len;
+	ap = palloc(alen);
 
 	/* API does not work for zero-length input */
-	if (len1 == 0)
-		r = 0;
+	if (len == 0)
+		result = 0;
 	else
 	{
-		r = MultiByteToWideChar(CP_UTF8, 0, arg1, len1,
-								(LPWSTR) a1p, a1len / 2);
-		if (!r)
+		result = MultiByteToWideChar(CP_UTF8, 0, arg, len,
+									 (LPWSTR) ap, alen / 2);
+		if (!result)
 			ereport(ERROR,
 					(errmsg("could not convert string to UTF-16: error code %lu",
 							GetLastError())));
 	}
-	((LPWSTR) a1p)[r] = 0;
+	((LPWSTR) ap)[result] = 0;
 
-	if (len2 == 0)
-		r = 0;
-	else
+	errno = 0;
+#ifdef HAVE_LOCALE_T
+	if (locale)
 	{
-		r = MultiByteToWideChar(CP_UTF8, 0, arg2, len2,
-								(LPWSTR) a2p, a2len / 2);
-		if (!r)
+		int         mapsize;
+		char       *map;
+
+		mapsize = LCMapStringW(locale->info.lcid, LCMAP_SORTKEY, (LPWSTR) ap, -1, NULL, 0);
+		if (mapsize == 0)
 			ereport(ERROR,
-					(errmsg("could not convert string to UTF-16: error code %lu",
+					(errmsg("could not produce a normalized sort key: error code %lu",
+							GetLastError())));
+
+		map = palloc(mapsize);
+
+		result = LCMapStringW(locale->info.lcid, LCMAP_SORTKEY, (LPWSTR) ap, -1,
+							  (LPWSTR) map, mapsize);
+		if (result == 0)
+			ereport(ERROR,
+					(errmsg("could not produce a normalized sort key: error code %lu",
 							GetLastError())));
+
+		pfree(ap);
+		ap = map;
 	}
-	((LPWSTR) a2p)[r] = 0;
+#endif
+
+	*sortlen = result;
+	return ap;
+}
+#endif							/* WIN32 */
+
+/*
+ * pg_strncoll_libc_win32_utf8
+ *
+ * For Win32 UTF-8 string comparison we will use LCMapStringW() or
+ * CompareStringOrdinal().
+ */
+#ifdef WIN32
+static int
+pg_strncoll_libc_win32_utf8(const char *arg1, size_t len1, const char *arg2,
+							size_t len2, pg_locale_t locale)
+{
+	char	   *a1p,
+			   *a2p;
+	int			a1plen = 0,
+				a2plen = 0;
+	int			result;
+
+	Assert(!locale || locale->provider == COLLPROVIDER_LIBC);
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+#ifndef WIN32
+	Assert(false);
+#endif
+
+	a1p = pg_strncoll_sort_key(arg1, len1, locale, &a1plen);
+	a2p = pg_strncoll_sort_key(arg2, len2, locale, &a2plen);
 
-	errno = 0;
 #ifdef HAVE_LOCALE_T
 	if (locale)
-		result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
+		result = memcmp((LPWSTR) a1p, (LPWSTR) a2p, (a1plen < a2plen) ?
+							a1plen : a2plen);
 	else
 #endif
-		result = wcscoll((LPWSTR) a1p, (LPWSTR) a2p);
-	if (result == 2147483647)	/* _NLSCMPERROR; missing from mingw headers */
-		ereport(ERROR,
+	{
+		result = CompareStringOrdinal((LPWSTR) a1p, -1, (LPWSTR) a2p, -1, FALSE) - 2;
+		if (result == -2)
+			ereport(ERROR,
 				(errmsg("could not compare Unicode strings: %m")));
+	}
 
-	if (buf != sbuf)
-		pfree(buf);
+	pfree(a1p);
+	pfree(a2p);
 
 	return result;
 }
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index e2a7243..d642bbc 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -80,6 +80,9 @@ struct pg_locale_struct
 	{
 #ifdef HAVE_LOCALE_T
 		locale_t	lt;
+#ifdef WIN32
+		LCID		lcid;
+#endif
 #endif
 #ifdef USE_ICU
 		struct
-- 
2.11.0

#10Thomas Munro
thomas.munro@gmail.com
In reply to: Juan José Santamaría Flecha (#9)
Re: Inconsistent results with libc sorting on Windows

Trying to follow along here... you're doing the moral equivalent of
strxfrm(), so sort keys have the transitive property but direct string
comparisons don't? Or is this because LCIDs reach a different
algorithm somehow (or otherwise why do you need to use LCIDs for this,
when there is a non-LCID version of that function, with a warning not
to use the older LCID version[1]https://learn.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-lcmapstringw?)

[1]: https://learn.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-lcmapstringw

In reply to: Thomas Munro (#10)
Re: Inconsistent results with libc sorting on Windows

On Tue, Jun 13, 2023 at 5:59 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Trying to follow along here... you're doing the moral equivalent of
strxfrm(), so sort keys have the transitive property but direct string
comparisons don't? Or is this because LCIDs reach a different
algorithm somehow (or otherwise why do you need to use LCIDs for this,
when there is a non-LCID version of that function, with a warning not
to use the older LCID version[1]?)

I'm reminded of the fact that the abbreviated keys strxfrm() debacle
(back when 9.5 was released) was caused by a bug in strcoll() -- not a
bug in strxfrm() itself. From our point of view the problem was that
strxfrm() failed to be bug compatible with strcoll() due to a buggy
strcoll() optimization.

I believe that strxfrm() is generally less likely to have bugs than
strcoll(). There are far fewer opportunities to dodge unnecessary work
in the case of strxfrm()-like algorithms (offering something like
ICU's pg_strnxfrm_prefix_icu() prefix optimization is the only one).
On the other hand, collation library implementers are likely to
heavily optimize strcoll() for typical use-cases such as sorting and
binary search. Using strxfrm() for everything is discouraged [1]https://unicode-org.github.io/icu/userguide/collation/concepts#sortkeys-vs-comparison -- Peter Geoghegan.

There is an important difference between this issue and the various
glibc collation related bugs that I've come across, though: to the
best of my knowledge there was never a glibc bug that caused strcoll()
to violate transitive consistency -- it always agreed with itself. So
this is a new one on me. Seems like that might make "always use
strxfrm()" (or whatever it's actually called on this platform)
acceptable; strxfrm() can't really violate transitive consistency in
the same way. (I think -- I'm assuming that it'll always produce a
conditioned binary string in a deterministic fashion, since AFAICT
even this buggy strcoll()-like function won't ever give an
inconsistent answer when it compares the same two strings.)

[1]: https://unicode-org.github.io/icu/userguide/collation/concepts#sortkeys-vs-comparison -- Peter Geoghegan
--
Peter Geoghegan

#12Juan José Santamaría Flecha
juanjo.santamaria@gmail.com
In reply to: Peter Geoghegan (#11)
1 attachment(s)
Re: Inconsistent results with libc sorting on Windows

On Wed, Jun 14, 2023 at 4:13 AM Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Jun 13, 2023 at 5:59 PM Thomas Munro <thomas.munro@gmail.com>
wrote:

Trying to follow along here... you're doing the moral equivalent of
strxfrm(), so sort keys have the transitive property but direct string
comparisons don't? Or is this because LCIDs reach a different
algorithm somehow (or otherwise why do you need to use LCIDs for this,
when there is a non-LCID version of that function, with a warning not
to use the older LCID version[1]?)

I'm reminded of the fact that the abbreviated keys strxfrm() debacle
(back when 9.5 was released) was caused by a bug in strcoll() -- not a
bug in strxfrm() itself. From our point of view the problem was that
strxfrm() failed to be bug compatible with strcoll() due to a buggy
strcoll() optimization.

I believe that strxfrm() is generally less likely to have bugs than
strcoll(). There are far fewer opportunities to dodge unnecessary work
in the case of strxfrm()-like algorithms (offering something like
ICU's pg_strnxfrm_prefix_icu() prefix optimization is the only one).
On the other hand, collation library implementers are likely to
heavily optimize strcoll() for typical use-cases such as sorting and
binary search. Using strxfrm() for everything is discouraged [1].

Yes, I think the situation is quite similar to what you describe, with its
WIN32 peculiarities. Take for example the attached program, it'll output:

s1 = s2
s2 = s3
s1 > s3
c1 > c2
c2 > c3
c1 > c3

As you can see the test for CompareStringEx() is broken, but we get a sane
answer with LCMapStringEx().

Regards,

Juan José Santamaría Flecha

Attachments:

compare_transitivity.capplication/octet-stream; name=compare_transitivity.cDownload
#13Thomas Munro
thomas.munro@gmail.com
In reply to: Juan José Santamaría Flecha (#12)
Re: Inconsistent results with libc sorting on Windows

On Wed, Jun 14, 2023 at 10:50 PM Juan José Santamaría Flecha
<juanjo.santamaria@gmail.com> wrote:

Yes, I think the situation is quite similar to what you describe, with its WIN32 peculiarities. Take for example the attached program, it'll output:

s1 = s2
s2 = s3
s1 > s3
c1 > c2
c2 > c3
c1 > c3

As you can see the test for CompareStringEx() is broken, but we get a sane answer with LCMapStringEx().

Given that the documented behaviour is that ".. the sort key produces
the same order as when the source string is used in CompareString or
CompareStringEx"[1]https://learn.microsoft.com/en-us/windows/win32/intl/handling-sorting-in-your-applications, this seems like a reportable bug, unless perhaps
your test program is hiding an error with that default case you have.

[1]: https://learn.microsoft.com/en-us/windows/win32/intl/handling-sorting-in-your-applications

#14Juan José Santamaría Flecha
juanjo.santamaria@gmail.com
In reply to: Thomas Munro (#13)
Re: Inconsistent results with libc sorting on Windows

On Thu, Jun 15, 2023 at 1:57 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Given that the documented behaviour is that ".. the sort key produces
the same order as when the source string is used in CompareString or
CompareStringEx"[1], this seems like a reportable bug, unless perhaps
your test program is hiding an error with that default case you have.

[1]
https://learn.microsoft.com/en-us/windows/win32/intl/handling-sorting-in-your-applications

Ok, let's see where the report goes:

https://developercommunity.visualstudio.com/t/CompareStringEx-non-transitive/10393003?q=comparestringex

Regards,

Juan José Santamaría Flecha

#15Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Juan José Santamaría Flecha (#14)
Re: Inconsistent results with libc sorting on Windows

On 2023-Jun-19, Juan José Santamaría Flecha wrote:

Ok, let's see where the report goes:

https://developercommunity.visualstudio.com/t/CompareStringEx-non-transitive/10393003?q=comparestringex

Hm, so this appears to have been marked as solved by Microsoft. Can you
recheck? Also, what does the resolution mean for Postgres, in practical
terms?

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"If you have nothing to say, maybe you need just the right tool to help you
not say it." (New York Times, about Microsoft PowerPoint)

#16Juan José Santamaría Flecha
juanjo.santamaria@gmail.com
In reply to: Alvaro Herrera (#15)
Re: Inconsistent results with libc sorting on Windows

El lun, 3 jul 2023, 17:42, Alvaro Herrera <alvherre@alvh.no-ip.org>
escribió:

On 2023-Jun-19, Juan José Santamaría Flecha wrote:

Ok, let's see where the report goes:

https://developercommunity.visualstudio.com/t/CompareStringEx-non-transitive/10393003?q=comparestringex

Hm, so this appears to have been marked as solved by Microsoft. Can you
recheck? Also, what does the resolution mean for Postgres, in practical
terms?

It's not really solved, they have just pushed the issue to another team
that's only reachable through the Windows feedback hub. I've already
provided the feedback, but it's only visible through the proprietary app.

I would say there haven't been any real progress on that front so far.

Regards,

Juan José Santamaría Flecha

Show quoted text
#17Noah Misch
noah@leadboat.com
In reply to: Juan José Santamaría Flecha (#12)
Re: Inconsistent results with libc sorting on Windows

On Wed, Jun 14, 2023 at 12:50:28PM +0200, Juan José Santamaría Flecha wrote:

On Wed, Jun 14, 2023 at 4:13 AM Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Jun 13, 2023 at 5:59 PM Thomas Munro <thomas.munro@gmail.com>
wrote:

Trying to follow along here... you're doing the moral equivalent of
strxfrm(), so sort keys have the transitive property but direct string
comparisons don't? Or is this because LCIDs reach a different
algorithm somehow (or otherwise why do you need to use LCIDs for this,
when there is a non-LCID version of that function, with a warning not
to use the older LCID version[1]?)

I'm reminded of the fact that the abbreviated keys strxfrm() debacle
(back when 9.5 was released) was caused by a bug in strcoll() -- not a
bug in strxfrm() itself. From our point of view the problem was that
strxfrm() failed to be bug compatible with strcoll() due to a buggy
strcoll() optimization.

I believe that strxfrm() is generally less likely to have bugs than
strcoll(). There are far fewer opportunities to dodge unnecessary work
in the case of strxfrm()-like algorithms (offering something like
ICU's pg_strnxfrm_prefix_icu() prefix optimization is the only one).
On the other hand, collation library implementers are likely to
heavily optimize strcoll() for typical use-cases such as sorting and
binary search. Using strxfrm() for everything is discouraged [1].

Yes, I think the situation is quite similar to what you describe, with its
WIN32 peculiarities. Take for example the attached program, it'll output:

s1 = s2
s2 = s3
s1 > s3
c1 > c2
c2 > c3
c1 > c3

As you can see the test for CompareStringEx() is broken, but we get a sane
answer with LCMapStringEx().

The LCMapStringEx() solution is elegant. I do see
https://learn.microsoft.com/en-us/windows/win32/intl/handling-sorting-in-your-applications
says, "If an application calls the function to create a sort key for a string
containing an Arabic kashida, the function creates no sort key value." That's
aggravating.

#18Juan José Santamaría Flecha
juanjo.santamaria@gmail.com
In reply to: Noah Misch (#17)
1 attachment(s)
Re: Inconsistent results with libc sorting on Windows

On Fri, Aug 11, 2023 at 7:29 AM Noah Misch <noah@leadboat.com> wrote:

The LCMapStringEx() solution is elegant. I do see

https://learn.microsoft.com/en-us/windows/win32/intl/handling-sorting-in-your-applications
says, "If an application calls the function to create a sort key for a
string
containing an Arabic kashida, the function creates no sort key value."
That's
aggravating.

I think the problem there is that it is just poorly explained.
Take for example the attached program that compares "normal" and "kashida"
'Raħīm' taken from [1]https://en.wikipedia.org/wiki/Kashida, you'll get:

c1 = c2
c2 = c1

meaning that "normal" and "kashida" are the same string. So, probably that
phrase should read something like: "If an application calls the function to
create a sort key for a string containing an Arabic kashida character, the
function will ignore that character and no sort key value will be generated
for it."

[1]: https://en.wikipedia.org/wiki/Kashida

Regards,

Juan José Santamaría Flecha

Attachments:

compare_kashida.ctext/plain; charset=US-ASCII; name=compare_kashida.cDownload
#19Noah Misch
noah@leadboat.com
In reply to: Juan José Santamaría Flecha (#18)
Re: Inconsistent results with libc sorting on Windows

On Fri, Aug 11, 2023 at 11:48:18AM +0200, Juan José Santamaría Flecha wrote:

On Fri, Aug 11, 2023 at 7:29 AM Noah Misch <noah@leadboat.com> wrote:

The LCMapStringEx() solution is elegant. I do see

https://learn.microsoft.com/en-us/windows/win32/intl/handling-sorting-in-your-applications
says, "If an application calls the function to create a sort key for a
string
containing an Arabic kashida, the function creates no sort key value."
That's
aggravating.

I think the problem there is that it is just poorly explained.
Take for example the attached program that compares "normal" and "kashida"
'Raħīm' taken from [1], you'll get:

c1 = c2
c2 = c1

meaning that "normal" and "kashida" are the same string. So, probably that
phrase should read something like: "If an application calls the function to
create a sort key for a string containing an Arabic kashida character, the
function will ignore that character and no sort key value will be generated
for it."

Good. That sounds fine. Thanks for clarifying.