[PROPOSAL] Skip test citext_utf8 on Windows

Started by Oleg Tselebrovskiy almost 2 years ago, 6 messages
#1Oleg Tselebrovskiy
o.tselebrovskiy@postgrespro.ru
1 attachment(s)

Greetings, everyone!

While running "installchecks" on databases with UTF-8 encoding the test
citext_utf8 fails because of Turkish dotted I like this:

  SELECT 'i'::citext = 'İ'::citext AS t;
   t
  ---
- t
+ f
  (1 row)

I tried to replicate the test's results by hand, and with every collation
that I tried (including --locale="Turkish") this test failed.

Also, an interesting result of my testing: if you initialize your DB
with -E utf-8 --locale="Turkish" and then run select LOWER('İ');
the output will be this:
lower
-------
İ
(1 row)

Which I find strange, since lower() uses the collation that was passed
(the default in this case, but still).
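
For comparison, here is a minimal Python sketch of the two lowercase mappings that matter for this character. It only illustrates the Unicode rules and makes no claim about how PostgreSQL or the Windows C runtime implement lower(). The helper turkish_lower is a hypothetical name introduced for this example.

```python
# Illustration only: Python's str.lower() applies Unicode's default,
# locale-independent case mapping. It says nothing about what the
# Windows C runtime does underneath PostgreSQL.

DOTTED_CAPITAL_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # LATIN SMALL LETTER DOTLESS I


def turkish_lower(text: str) -> str:
    """Hypothetical helper: apply the Turkish/Azeri rules first
    (dotted capital I -> i, plain I -> dotless i), then the default mapping."""
    return text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I).lower()


# Default mapping yields "i" followed by U+0307 COMBINING DOT ABOVE.
default_result = DOTTED_CAPITAL_I.lower()
turkish_result = turkish_lower(DOTTED_CAPITAL_I)

print([hex(ord(ch)) for ch in default_result])
print([hex(ord(ch)) for ch in turkish_result])
print(default_result == "i", turkish_result == "i")
```

Under the default mapping the result is not a plain 'i', so any equality check built on lowercasing depends on which mapping the underlying library applies.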

My PostgreSQL version is this:
postgres=# select version();
version
----------------------------------------------------------------------
PostgreSQL 17devel on x86_64-windows, compiled by gcc-13.1.0, 64-bit

The proposed patch for skipping the test is attached.

Oleg Tselebrovskiy, Postgres Pro

Attachments:

skip_citext_utf8.patch (text/x-diff)
diff --git a/contrib/citext/expected/citext_utf8.out b/contrib/citext/expected/citext_utf8.out
index 5d988dcd485..6c4069f9469 100644
--- a/contrib/citext/expected/citext_utf8.out
+++ b/contrib/citext/expected/citext_utf8.out
@@ -10,7 +10,8 @@
 SELECT getdatabaseencoding() <> 'UTF8' OR
        (SELECT (datlocprovider = 'c' AND datctype = 'C') OR datlocprovider = 'i'
         FROM pg_database
-        WHERE datname=current_database())
+        WHERE datname=current_database()) OR
+	   (version() ~ 'windows' OR version() ~ 'Visual C\+\+' OR version() ~ 'mingw32')
        AS skip_test \gset
 \if :skip_test
 \quit
diff --git a/contrib/citext/expected/citext_utf8_1.out b/contrib/citext/expected/citext_utf8_1.out
index 7065a5da190..d4472b1c36a 100644
--- a/contrib/citext/expected/citext_utf8_1.out
+++ b/contrib/citext/expected/citext_utf8_1.out
@@ -10,7 +10,8 @@
 SELECT getdatabaseencoding() <> 'UTF8' OR
        (SELECT (datlocprovider = 'c' AND datctype = 'C') OR datlocprovider = 'i'
         FROM pg_database
-        WHERE datname=current_database())
+        WHERE datname=current_database()) OR
+	   (version() ~ 'windows' OR version() ~ 'Visual C\+\+' OR version() ~ 'mingw32')
        AS skip_test \gset
 \if :skip_test
 \quit
diff --git a/contrib/citext/sql/citext_utf8.sql b/contrib/citext/sql/citext_utf8.sql
index 34b232d64e2..53775cdcd35 100644
--- a/contrib/citext/sql/citext_utf8.sql
+++ b/contrib/citext/sql/citext_utf8.sql
@@ -11,7 +11,8 @@
 SELECT getdatabaseencoding() <> 'UTF8' OR
        (SELECT (datlocprovider = 'c' AND datctype = 'C') OR datlocprovider = 'i'
         FROM pg_database
-        WHERE datname=current_database())
+        WHERE datname=current_database()) OR
+	   (version() ~ 'windows' OR version() ~ 'Visual C\+\+' OR version() ~ 'mingw32')
        AS skip_test \gset
 \if :skip_test
 \quit

#2Michael Paquier
michael@paquier.xyz
In reply to: Oleg Tselebrovskiy (#1)
Re: [PROPOSAL] Skip test citext_utf8 on Windows

On Mon, Mar 11, 2024 at 03:21:11PM +0700, Oleg Tselebrovskiy wrote:

The proposed patch for skipping test is attached

Your attached patch seems to be in binary format.
--
Michael

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Oleg Tselebrovskiy (#1)
Re: [PROPOSAL] Skip test citext_utf8 on Windows

On 2024-03-11 Mo 04:21, Oleg Tselebrovskiy wrote:

Greetings, everyone!

While running "installchecks" on databases with UTF-8 encoding the test
citext_utf8 fails because of Turkish dotted I like this:

 SELECT 'i'::citext = 'İ'::citext AS t;
  t
 ---
- t
+ f
 (1 row)

I tried to replicate the test's results by hand and with any collation
that I tried (including --locale="Turkish") this test failed

Also an interesing result of my tesing. If you initialize you DB
with -E utf-8 --locale="Turkish" and then run select LOWER('İ');
the output will be this:
 lower
-------
 İ
(1 row)

Which I find strange since lower() uses collation that was passed
(default in this case but still)

Wouldn't we be better off finding a Windows fix for this, instead of
sweeping it under the rug?

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#4Thomas Munro
thomas.munro@gmail.com
In reply to: Andrew Dunstan (#3)
Re: [PROPOSAL] Skip test citext_utf8 on Windows

On Tue, Mar 12, 2024 at 2:56 PM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2024-03-11 Mo 04:21, Oleg Tselebrovskiy wrote:

Greetings, everyone!

While running "installchecks" on databases with UTF-8 encoding the test
citext_utf8 fails because of Turkish dotted I like this:

SELECT 'i'::citext = 'İ'::citext AS t;
t
---
- t
+ f
(1 row)

I tried to replicate the test's results by hand and with any collation
that I tried (including --locale="Turkish") this test failed

Also an interesing result of my tesing. If you initialize you DB
with -E utf-8 --locale="Turkish" and then run select LOWER('İ');
the output will be this:
lower
-------
İ
(1 row)

Which I find strange since lower() uses collation that was passed
(default in this case but still)

Wouldn't we be better off finding a Windows fix for this, instead of
sweeping it under the rug?

Given the sorry state of our Windows locale support, I've started
wondering about deleting it and telling users to adopt our nascent
built-in support or ICU[1].

This other thread [2] says the sorting is intransitive so I don't
think it really meets our needs anyway.

[1]: /messages/by-id/CA+hUKGJhV__g_TJ0jVqPbnTuqT++M6KFv2wj+9AV-cABNCXN6Q@mail.gmail.com
[2]: /messages/by-id/1407a2c0-062b-4e4c-b728-438fdff5cb07@manitou-mail.org
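
On the intransitivity point above, here is a small Python sketch of what an intransitive comparison means and why it is a problem for sorting. The toy ordering is made up for illustration and is not taken from any real Windows locale. The names toy_cmp and find_intransitive_triple are hypothetical.

```python
from functools import cmp_to_key
from itertools import permutations

# Deliberately intransitive toy ordering (rock < paper < scissors < rock).
# It stands in for a broken collation; it is not real Windows locale data.
BEATS = {("rock", "paper"), ("paper", "scissors"), ("scissors", "rock")}


def toy_cmp(a: str, b: str) -> int:
    """Return -1, 0 or 1 in the style of strcoll(), using the toy ordering."""
    if a == b:
        return 0
    return -1 if (a, b) in BEATS else 1


def find_intransitive_triple(items, cmp):
    """Return the first (a, b, c) with a < b and b < c but not a < c, else None."""
    for a, b, c in permutations(items, 3):
        if cmp(a, b) < 0 and cmp(b, c) < 0 and not cmp(a, c) < 0:
            return (a, b, c)
    return None


items = ["rock", "paper", "scissors"]
violation = find_intransitive_triple(items, toy_cmp)
print("violation:", violation)

# With such a comparator, different input orders can sort differently.
orders = {tuple(sorted(p, key=cmp_to_key(toy_cmp))) for p in permutations(items)}
print("distinct sorted results:", len(orders))
```

A comparator like this has no single correct sorted order, which is why a sort routine or an ordered index cannot rely on it.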

#5Oleg Tselebrovskiy
o.tselebrovskiy@postgrespro.ru
In reply to: Michael Paquier (#2)
1 attachment(s)
Re: [PROPOSAL] Skip test citext_utf8 on Windows

Michael Paquier wrote on 2024-03-12 06:24:

On Mon, Mar 11, 2024 at 03:21:11PM +0700, Oleg Tselebrovskiy wrote:

The proposed patch for skipping test is attached

Your attached patch seems to be in binary format.
--
Michael

Right, I had it saved in a non-UTF-8 encoding. Kind of ironic.

Here's a fixed version

Attachments:

v2_skip_citext_utf8.patch (text/x-diff)
diff --git a/contrib/citext/expected/citext_utf8.out b/contrib/citext/expected/citext_utf8.out
index 5d988dcd485..6c4069f9469 100644
--- a/contrib/citext/expected/citext_utf8.out
+++ b/contrib/citext/expected/citext_utf8.out
@@ -10,7 +10,8 @@
 SELECT getdatabaseencoding() <> 'UTF8' OR
        (SELECT (datlocprovider = 'c' AND datctype = 'C') OR datlocprovider = 'i'
         FROM pg_database
-        WHERE datname=current_database())
+        WHERE datname=current_database()) OR
+	   (version() ~ 'windows' OR version() ~ 'Visual C\+\+' OR version() ~ 'mingw32')
        AS skip_test \gset
 \if :skip_test
 \quit
diff --git a/contrib/citext/expected/citext_utf8_1.out b/contrib/citext/expected/citext_utf8_1.out
index 7065a5da190..d4472b1c36a 100644
--- a/contrib/citext/expected/citext_utf8_1.out
+++ b/contrib/citext/expected/citext_utf8_1.out
@@ -10,7 +10,8 @@
 SELECT getdatabaseencoding() <> 'UTF8' OR
        (SELECT (datlocprovider = 'c' AND datctype = 'C') OR datlocprovider = 'i'
         FROM pg_database
-        WHERE datname=current_database())
+        WHERE datname=current_database()) OR
+	   (version() ~ 'windows' OR version() ~ 'Visual C\+\+' OR version() ~ 'mingw32')
        AS skip_test \gset
 \if :skip_test
 \quit
diff --git a/contrib/citext/sql/citext_utf8.sql b/contrib/citext/sql/citext_utf8.sql
index 34b232d64e2..53775cdcd35 100644
--- a/contrib/citext/sql/citext_utf8.sql
+++ b/contrib/citext/sql/citext_utf8.sql
@@ -11,7 +11,8 @@
 SELECT getdatabaseencoding() <> 'UTF8' OR
        (SELECT (datlocprovider = 'c' AND datctype = 'C') OR datlocprovider = 'i'
         FROM pg_database
-        WHERE datname=current_database())
+        WHERE datname=current_database()) OR
+	   (version() ~ 'windows' OR version() ~ 'Visual C\+\+' OR version() ~ 'mingw32')
        AS skip_test \gset
 \if :skip_test
 \quit

#6Andrew Dunstan
andrew@dunslane.net
In reply to: Thomas Munro (#4)
Re: [PROPOSAL] Skip test citext_utf8 on Windows

On 2024-03-11 Mo 22:50, Thomas Munro wrote:

On Tue, Mar 12, 2024 at 2:56 PM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2024-03-11 Mo 04:21, Oleg Tselebrovskiy wrote:

Greetings, everyone!

While running "installchecks" on databases with UTF-8 encoding the test
citext_utf8 fails because of Turkish dotted I like this:

SELECT 'i'::citext = 'İ'::citext AS t;
t
---
- t
+ f
(1 row)

I tried to replicate the test's results by hand and with any collation
that I tried (including --locale="Turkish") this test failed

Also an interesing result of my tesing. If you initialize you DB
with -E utf-8 --locale="Turkish" and then run select LOWER('İ');
the output will be this:
lower
-------
İ
(1 row)

Which I find strange since lower() uses collation that was passed
(default in this case but still)

Wouldn't we be better off finding a Windows fix for this, instead of
sweeping it under the rug?

Given the sorry state of our Windows locale support, I've started
wondering about deleting it and telling users to adopt our nascent
built-in support or ICU[1].

This other thread [2] says the sorting is intransitive so I don't
think it really meets our needs anyway.

[1] /messages/by-id/CA+hUKGJhV__g_TJ0jVqPbnTuqT++M6KFv2wj+9AV-cABNCXN6Q@mail.gmail.com
[2] /messages/by-id/1407a2c0-062b-4e4c-b728-438fdff5cb07@manitou-mail.org

Makes more sense than just hacking the tests to avoid running them on
Windows. (I also didn't much like doing it by parsing the version
string, although I know there's at least one precedent for doing that.)
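
To make the concern about version-string parsing concrete, here is a small Python sketch that applies the three patterns from the patch to a few version() strings. Only the first sample comes from this thread; the other two are made up to probe the patterns. The helper looks_like_windows is a hypothetical name.

```python
import re

# The three patterns from the proposed skip condition. PostgreSQL's ~
# operator is a case-sensitive regex match, which re.search mirrors here.
WINDOWS_PATTERNS = [r"windows", r"Visual C\+\+", r"mingw32"]


def looks_like_windows(version_string: str) -> bool:
    """Hypothetical helper: True if any pattern matches the version() text."""
    return any(re.search(p, version_string) for p in WINDOWS_PATTERNS)


samples = {
    # Taken from the first message in this thread.
    "thread": "PostgreSQL 17devel on x86_64-windows, compiled by gcc-13.1.0, 64-bit",
    # Made-up strings, used only to probe the patterns.
    "made-up linux": "PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc, 64-bit",
    "made-up capitalized": "PostgreSQL 17devel on x86_64-Windows, compiled by gcc, 64-bit",
}

for label, text in samples.items():
    print(f"{label}: skip={looks_like_windows(text)}")
```

The capitalized sample is not matched, which shows how much such a check depends on the exact spelling a given build puts into its version string.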

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com