Fix order of checking ICU options in initdb and create database

Started by Marina Polyakovaabout 3 years ago13 messages
#1Marina Polyakova
m.polyakova@postgrespro.ru
1 attachment(s)

Hello!

This is the last proposed patch on this subject [1]/messages/by-id/e94aca035bf0b92fac42d204ad385552@postgrespro.ru moved to a separate
thread for Commitfest..

It looks like that the current order of checking ICU options in initdb
and create database in PG 15+ is not user-friendly. Examples:

1. initdb reports a missing ICU locale although it may already report
that the selected encoding cannot be used:

$ initdb --encoding sql-ascii --locale-provider icu hoge
...
initdb: error: ICU locale must be specified

$ initdb --encoding sql-ascii --locale-provider icu --icu-locale en-US
hoge
...
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (SQL_ASCII) is not supported
with the ICU provider.
initdb: hint: Rerun initdb and either do not specify an encoding
explicitly, or choose a matching combination.

2. initdb/create database report problems with the ICU locale/encoding
although they may already report that ICU is not supported in this
build:

2.1.

$ initdb --locale-provider icu hoge
...
initdb: error: ICU locale must be specified

$ initdb --locale-provider icu --icu-locale en-US hoge
...
initdb: error: ICU is not supported in this build

$ createdb --locale-provider icu hoge
createdb: error: database creation failed: ERROR: ICU locale must be
specified

$ createdb --locale-provider icu --icu-locale en-US hoge
createdb: error: database creation failed: ERROR: ICU is not supported
in this build

2.2.

$ createdb --locale-provider icu --icu-locale en-US --encoding sql-ascii
hoge
createdb: error: database creation failed: ERROR: encoding "SQL_ASCII"
is not supported with ICU provider

$ createdb --locale-provider icu --icu-locale en-US --encoding utf8 hoge
createdb: error: database creation failed: ERROR: ICU is not supported
in this build

The patch
v1-0001-Fix-order-of-checking-ICU-options-in-initdb-and-c.patch fixes
this:

1.

$ initdb --encoding sql-ascii --locale-provider icu hoge
...
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (SQL_ASCII) is not supported
with the ICU provider.
initdb: hint: Rerun initdb and either do not specify an encoding
explicitly, or choose a matching combination.

2.1.

$ initdb --locale-provider icu hoge
...
initdb: error: ICU is not supported in this build

$ createdb --locale-provider icu hoge
createdb: error: database creation failed: ERROR: ICU is not supported
in this build

2.2.

$ createdb --locale-provider icu --icu-locale en-US --encoding sql-ascii
hoge
createdb: error: database creation failed: ERROR: ICU is not supported
in this build

A side effect of the proposed patch in initdb is that if ICU locale is
missed (or ICU is not supported in this build), the provider, locales
and encoding are reported before the error message:

$ initdb --locale-provider icu hoge
The files belonging to this database system will be owned by user
"marina".
This user must also own the server process.

The database cluster will be initialized with this locale configuration:
provider: icu
LC_COLLATE: en_US.UTF-8
LC_CTYPE: en_US.UTF-8
LC_MESSAGES: en_US.UTF-8
LC_MONETARY: ru_RU.UTF-8
LC_NUMERIC: ru_RU.UTF-8
LC_TIME: ru_RU.UTF-8
The default database encoding has been set to "UTF8".
initdb: error: ICU locale must be specified

$ initdb --locale-provider icu hoge
The files belonging to this database system will be owned by user
"marina".
This user must also own the server process.

The database cluster will be initialized with this locale configuration:
provider: icu
LC_COLLATE: en_US.UTF-8
LC_CTYPE: en_US.UTF-8
LC_MESSAGES: en_US.UTF-8
LC_MONETARY: ru_RU.UTF-8
LC_NUMERIC: ru_RU.UTF-8
LC_TIME: ru_RU.UTF-8
The default database encoding has been set to "UTF8".
initdb: error: ICU is not supported in this build

I was thinking about another master-only version of the patch that first
checks everything for provider, locales and encoding but IMO it is worse
[2]: /messages/by-id/79f410460c4fc9534000785adb8bf39a@postgrespro.ru

[1]: /messages/by-id/e94aca035bf0b92fac42d204ad385552@postgrespro.ru
/messages/by-id/e94aca035bf0b92fac42d204ad385552@postgrespro.ru
[2]: /messages/by-id/79f410460c4fc9534000785adb8bf39a@postgrespro.ru
/messages/by-id/79f410460c4fc9534000785adb8bf39a@postgrespro.ru

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

v1-0001-Fix-order-of-checking-ICU-options-in-initdb-and-c.patchtext/x-diff; name=v1-0001-Fix-order-of-checking-ICU-options-in-initdb-and-c.patchDownload
From da761095a7e39463ec6083484c4b1b924cacf387 Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Sat, 29 Oct 2022 14:25:05 +0300
Subject: [PATCH v1] Fix order of checking ICU options in initdb and create
 database

First check if ICU is supported in this build. Then check that the selected
encoding is supported BY ICU (the encoding can be set from lc_ctype which can be
set from an environment variable). Only then check the option for the ICU
locale.
---
 src/backend/commands/dbcommands.c |  6 ++++++
 src/backend/utils/adt/pg_locale.c | 10 ++-------
 src/bin/initdb/initdb.c           | 36 ++++++++++++++-----------------
 src/bin/initdb/t/001_initdb.pl    | 21 ++++++++++++++----
 src/bin/scripts/t/020_createdb.pl | 31 ++++++++++++++++++++++----
 src/include/utils/pg_locale.h     |  2 +-
 6 files changed, 69 insertions(+), 37 deletions(-)

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 16e422138b..4c1e79fca3 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1030,6 +1030,11 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 
 	if (dblocprovider == COLLPROVIDER_ICU)
 	{
+#ifndef USE_ICU
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("ICU is not supported in this build")));
+#else
 		if (!(is_encoding_supported_by_icu(encoding)))
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -1046,6 +1051,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					 errmsg("ICU locale must be specified")));
 
 		check_icu_locale(dbiculocale);
+#endif
 	}
 	else
 	{
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 2b42d9ccd8..743f11e1d1 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1945,15 +1945,12 @@ icu_set_collation_attributes(UCollator *collator, const char *loc)
 	}
 }
 
-#endif							/* USE_ICU */
-
 /*
  * Check if the given locale ID is valid, and ereport(ERROR) if it isn't.
  */
 void
 check_icu_locale(const char *icu_locale)
 {
-#ifdef USE_ICU
 	UCollator  *collator;
 	UErrorCode	status;
 
@@ -1967,13 +1964,10 @@ check_icu_locale(const char *icu_locale)
 	if (U_ICU_VERSION_MAJOR_NUM < 54)
 		icu_set_collation_attributes(collator, icu_locale);
 	ucol_close(collator);
-#else
-	ereport(ERROR,
-			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-			 errmsg("ICU is not supported in this build")));
-#endif
 }
 
+#endif							/* USE_ICU */
+
 /*
  * These functions convert from/to libc's wchar_t, *not* pg_wchar_t.
  * Therefore we keep them here rather than with the mbutils code.
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f61a043055..82e6644f89 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2047,9 +2047,12 @@ check_locale_encoding(const char *locale, int user_enc)
  *
  * this should match the similar check in the backend createdb() function
  */
-static bool
+static void
 check_icu_locale_encoding(int user_enc)
 {
+#ifndef USE_ICU
+	pg_fatal("ICU is not supported in this build");
+#else
 	if (!(is_encoding_supported_by_icu(user_enc)))
 	{
 		pg_log_error("encoding mismatch");
@@ -2058,9 +2061,17 @@ check_icu_locale_encoding(int user_enc)
 		pg_log_error_hint("Rerun %s and either do not specify an encoding explicitly, "
 						  "or choose a matching combination.",
 						  progname);
-		return false;
+		exit(1);
 	}
-	return true;
+
+	if (!icu_locale)
+		pg_fatal("ICU locale must be specified");
+
+	/*
+	 * In supported builds, the ICU locale ID will be checked by the backend
+	 * during post-bootstrap initialization.
+	 */
+#endif
 }
 
 /*
@@ -2113,20 +2124,6 @@ setlocales(void)
 	check_locale_name(LC_CTYPE, lc_messages, &canonname);
 	lc_messages = canonname;
 #endif
-
-	if (locale_provider == COLLPROVIDER_ICU)
-	{
-		if (!icu_locale)
-			pg_fatal("ICU locale must be specified");
-
-		/*
-		 * In supported builds, the ICU locale ID will be checked by the
-		 * backend during post-bootstrap initialization.
-		 */
-#ifndef USE_ICU
-		pg_fatal("ICU is not supported in this build");
-#endif
-	}
 }
 
 /*
@@ -2388,9 +2385,8 @@ setup_locale_encoding(void)
 		!check_locale_encoding(lc_collate, encodingid))
 		exit(1);				/* check_locale_encoding printed the error */
 
-	if (locale_provider == COLLPROVIDER_ICU &&
-		!check_icu_locale_encoding(encodingid))
-		exit(1);
+	if (locale_provider == COLLPROVIDER_ICU)
+		check_icu_locale_encoding(encodingid);
 }
 
 
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 164fc11cbf..884506a1bc 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -123,28 +123,41 @@ if ($ENV{with_icu} eq 'yes')
 		[
 			'initdb',                '--no-sync',
 			'--locale-provider=icu', '--encoding=SQL_ASCII',
-			'--icu-locale=en', "$tempdir/dataX"
+			"$tempdir/dataX"
 		],
 		qr/error: encoding mismatch/,
 		'fails for encoding not supported by ICU');
 }
 else
 {
-	command_fails(
+	command_fails_like(
 		[ 'initdb', '--no-sync', '--locale-provider=icu', "$tempdir/data2" ],
+		qr/error: ICU is not supported in this build/,
 		'locale provider ICU fails since no ICU support');
+
+	command_fails_like(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--encoding=SQL_ASCII',
+			"$tempdir/data2"
+		],
+		qr/error: ICU is not supported in this build/,
+		'locale provider ICU with not supported ICU encoding fails since no ICU support'
+	);
 }
 
-command_fails(
+command_fails_like(
 	[ 'initdb', '--no-sync', '--locale-provider=xyz', "$tempdir/dataX" ],
+	qr/error: unrecognized locale provider: xyz/,
 	'fails for invalid locale provider');
 
-command_fails(
+command_fails_like(
 	[
 		'initdb',                 '--no-sync',
 		'--locale-provider=libc', '--icu-locale=en',
 		"$tempdir/dataX"
 	],
+	qr/error: --icu-locale cannot be specified unless locale provider "icu" is chosen/,
 	'fails for invalid option combination');
 
 done_testing();
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index a74bf3b0d8..45b44fce46 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -30,11 +30,12 @@ if ($ENV{with_icu} eq 'yes')
 	# This fails because template0 uses libc provider and has no ICU
 	# locale set.  It would succeed if template0 used the icu
 	# provider.  XXX Maybe split into multiple tests?
-	$node->command_fails(
+	$node->command_fails_like(
 		[
 			'createdb', '-T', 'template0', '-E', 'UTF8',
 			'--locale-provider=icu', 'foobar4'
 		],
+		qr/ERROR:  ICU locale must be specified/,
 		'create database with ICU fails without ICU locale specified');
 
 	$node->issues_sql_like(
@@ -46,12 +47,13 @@ if ($ENV{with_icu} eq 'yes')
 		qr/statement: CREATE DATABASE foobar5 .* LOCALE_PROVIDER icu ICU_LOCALE 'en'/,
 		'create database with ICU locale specified');
 
-	$node->command_fails(
+	$node->command_fails_like(
 		[
 			'createdb', '-T', 'template0', '-E', 'UTF8',
 			'--locale-provider=icu',
 			'--icu-locale=@colNumeric=lower', 'foobarX'
 		],
+		qr/ERROR:  could not open collator for locale/,
 		'fails for invalid ICU locale');
 
 	$node->command_fails_like(
@@ -78,18 +80,39 @@ if ($ENV{with_icu} eq 'yes')
 }
 else
 {
-	$node->command_fails(
+	$node->command_fails_like(
 		[ 'createdb', '-T', 'template0', '--locale-provider=icu', 'foobar4' ],
+		qr/ERROR:  ICU is not supported in this build/,
 		'create database with ICU fails since no ICU support');
+
+	$node->command_fails_like(
+		[
+			'createdb',             '-T',
+			'template0',            '--locale-provider=icu',
+			'--encoding=SQL_ASCII', 'foobar4'
+		],
+		qr/ERROR:  ICU is not supported in this build/,
+		'create database with ICU and not supported ICU encoding fails since no ICU support'
+	);
 }
 
 $node->command_fails([ 'createdb', 'foobar1' ],
 	'fails if database already exists');
 
-$node->command_fails(
+$node->command_fails_like(
 	[ 'createdb', '-T', 'template0', '--locale-provider=xyz', 'foobarX' ],
+	qr/ERROR:  unrecognized locale provider: xyz/,
 	'fails for invalid locale provider');
 
+$node->command_fails_like(
+	[
+		'createdb',        '-T',
+		'template0',       '--locale-provider=libc',
+		'--icu-locale=en', 'foobarX'
+	],
+	qr/ERROR:  ICU locale cannot be specified unless locale provider is ICU/,
+	'fails for invalid option combination');
+
 # Check use of templates with shared dependencies copied from the template.
 my ($ret, $stdout, $stderr) = $node->psql(
 	'foobar2',
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index a875942123..6e8af39976 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -104,8 +104,8 @@ extern char *get_collation_actual_version(char collprovider, const char *collcol
 #ifdef USE_ICU
 extern int32_t icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes);
 extern int32_t icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar);
-#endif
 extern void check_icu_locale(const char *icu_locale);
+#endif
 
 /* These functions convert from/to libc's wchar_t, *not* pg_wchar_t */
 extern size_t wchar2char(char *to, const wchar_t *from, size_t tolen,
-- 
2.25.1

#2Marina Polyakova
m.polyakova@postgrespro.ru
In reply to: Marina Polyakova (#1)
2 attachment(s)
Re: Fix order of checking ICU options in initdb and create database

On 2022-10-29 14:33, Marina Polyakova wrote:

Hello!

This is the last proposed patch on this subject [1] moved to a
separate thread for Commitfest..

Also added a patch to export with_icu when running src/bin/scripts tests
[1]: https://cirrus-ci.com/task/4825664661487616
The problem can be reproduced as

$ meson setup <source dir> && ninja && meson test --print-errorlogs
--suite setup --suite scripts

[1]: https://cirrus-ci.com/task/4825664661487616

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

v1-0002-Fix-order-of-checking-ICU-options-in-initdb-and-c.patchtext/x-diff; name=v1-0002-Fix-order-of-checking-ICU-options-in-initdb-and-c.patchDownload
From a6de5922fc533c88c7288051d8797d021ae91dae Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Sat, 29 Oct 2022 14:25:05 +0300
Subject: [PATCH v1 2/2] Fix order of checking ICU options in initdb and create
 database

First check if ICU is supported in this build. Then check that the selected
encoding is supported BY ICU (the encoding can be set from lc_ctype which can be
set from an environment variable). Only then check the option for the ICU
locale.
---
 src/backend/commands/dbcommands.c |  6 ++++++
 src/backend/utils/adt/pg_locale.c | 10 ++-------
 src/bin/initdb/initdb.c           | 36 ++++++++++++++-----------------
 src/bin/initdb/t/001_initdb.pl    | 21 ++++++++++++++----
 src/bin/scripts/t/020_createdb.pl | 31 ++++++++++++++++++++++----
 src/include/utils/pg_locale.h     |  2 +-
 6 files changed, 69 insertions(+), 37 deletions(-)

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 16e422138b..4c1e79fca3 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1030,6 +1030,11 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 
 	if (dblocprovider == COLLPROVIDER_ICU)
 	{
+#ifndef USE_ICU
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("ICU is not supported in this build")));
+#else
 		if (!(is_encoding_supported_by_icu(encoding)))
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -1046,6 +1051,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					 errmsg("ICU locale must be specified")));
 
 		check_icu_locale(dbiculocale);
+#endif
 	}
 	else
 	{
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 2b42d9ccd8..743f11e1d1 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1945,15 +1945,12 @@ icu_set_collation_attributes(UCollator *collator, const char *loc)
 	}
 }
 
-#endif							/* USE_ICU */
-
 /*
  * Check if the given locale ID is valid, and ereport(ERROR) if it isn't.
  */
 void
 check_icu_locale(const char *icu_locale)
 {
-#ifdef USE_ICU
 	UCollator  *collator;
 	UErrorCode	status;
 
@@ -1967,13 +1964,10 @@ check_icu_locale(const char *icu_locale)
 	if (U_ICU_VERSION_MAJOR_NUM < 54)
 		icu_set_collation_attributes(collator, icu_locale);
 	ucol_close(collator);
-#else
-	ereport(ERROR,
-			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-			 errmsg("ICU is not supported in this build")));
-#endif
 }
 
+#endif							/* USE_ICU */
+
 /*
  * These functions convert from/to libc's wchar_t, *not* pg_wchar_t.
  * Therefore we keep them here rather than with the mbutils code.
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f61a043055..82e6644f89 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2047,9 +2047,12 @@ check_locale_encoding(const char *locale, int user_enc)
  *
  * this should match the similar check in the backend createdb() function
  */
-static bool
+static void
 check_icu_locale_encoding(int user_enc)
 {
+#ifndef USE_ICU
+	pg_fatal("ICU is not supported in this build");
+#else
 	if (!(is_encoding_supported_by_icu(user_enc)))
 	{
 		pg_log_error("encoding mismatch");
@@ -2058,9 +2061,17 @@ check_icu_locale_encoding(int user_enc)
 		pg_log_error_hint("Rerun %s and either do not specify an encoding explicitly, "
 						  "or choose a matching combination.",
 						  progname);
-		return false;
+		exit(1);
 	}
-	return true;
+
+	if (!icu_locale)
+		pg_fatal("ICU locale must be specified");
+
+	/*
+	 * In supported builds, the ICU locale ID will be checked by the backend
+	 * during post-bootstrap initialization.
+	 */
+#endif
 }
 
 /*
@@ -2113,20 +2124,6 @@ setlocales(void)
 	check_locale_name(LC_CTYPE, lc_messages, &canonname);
 	lc_messages = canonname;
 #endif
-
-	if (locale_provider == COLLPROVIDER_ICU)
-	{
-		if (!icu_locale)
-			pg_fatal("ICU locale must be specified");
-
-		/*
-		 * In supported builds, the ICU locale ID will be checked by the
-		 * backend during post-bootstrap initialization.
-		 */
-#ifndef USE_ICU
-		pg_fatal("ICU is not supported in this build");
-#endif
-	}
 }
 
 /*
@@ -2388,9 +2385,8 @@ setup_locale_encoding(void)
 		!check_locale_encoding(lc_collate, encodingid))
 		exit(1);				/* check_locale_encoding printed the error */
 
-	if (locale_provider == COLLPROVIDER_ICU &&
-		!check_icu_locale_encoding(encodingid))
-		exit(1);
+	if (locale_provider == COLLPROVIDER_ICU)
+		check_icu_locale_encoding(encodingid);
 }
 
 
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 164fc11cbf..884506a1bc 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -123,28 +123,41 @@ if ($ENV{with_icu} eq 'yes')
 		[
 			'initdb',                '--no-sync',
 			'--locale-provider=icu', '--encoding=SQL_ASCII',
-			'--icu-locale=en', "$tempdir/dataX"
+			"$tempdir/dataX"
 		],
 		qr/error: encoding mismatch/,
 		'fails for encoding not supported by ICU');
 }
 else
 {
-	command_fails(
+	command_fails_like(
 		[ 'initdb', '--no-sync', '--locale-provider=icu', "$tempdir/data2" ],
+		qr/error: ICU is not supported in this build/,
 		'locale provider ICU fails since no ICU support');
+
+	command_fails_like(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--encoding=SQL_ASCII',
+			"$tempdir/data2"
+		],
+		qr/error: ICU is not supported in this build/,
+		'locale provider ICU with not supported ICU encoding fails since no ICU support'
+	);
 }
 
-command_fails(
+command_fails_like(
 	[ 'initdb', '--no-sync', '--locale-provider=xyz', "$tempdir/dataX" ],
+	qr/error: unrecognized locale provider: xyz/,
 	'fails for invalid locale provider');
 
-command_fails(
+command_fails_like(
 	[
 		'initdb',                 '--no-sync',
 		'--locale-provider=libc', '--icu-locale=en',
 		"$tempdir/dataX"
 	],
+	qr/error: --icu-locale cannot be specified unless locale provider "icu" is chosen/,
 	'fails for invalid option combination');
 
 done_testing();
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index a74bf3b0d8..45b44fce46 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -30,11 +30,12 @@ if ($ENV{with_icu} eq 'yes')
 	# This fails because template0 uses libc provider and has no ICU
 	# locale set.  It would succeed if template0 used the icu
 	# provider.  XXX Maybe split into multiple tests?
-	$node->command_fails(
+	$node->command_fails_like(
 		[
 			'createdb', '-T', 'template0', '-E', 'UTF8',
 			'--locale-provider=icu', 'foobar4'
 		],
+		qr/ERROR:  ICU locale must be specified/,
 		'create database with ICU fails without ICU locale specified');
 
 	$node->issues_sql_like(
@@ -46,12 +47,13 @@ if ($ENV{with_icu} eq 'yes')
 		qr/statement: CREATE DATABASE foobar5 .* LOCALE_PROVIDER icu ICU_LOCALE 'en'/,
 		'create database with ICU locale specified');
 
-	$node->command_fails(
+	$node->command_fails_like(
 		[
 			'createdb', '-T', 'template0', '-E', 'UTF8',
 			'--locale-provider=icu',
 			'--icu-locale=@colNumeric=lower', 'foobarX'
 		],
+		qr/ERROR:  could not open collator for locale/,
 		'fails for invalid ICU locale');
 
 	$node->command_fails_like(
@@ -78,18 +80,39 @@ if ($ENV{with_icu} eq 'yes')
 }
 else
 {
-	$node->command_fails(
+	$node->command_fails_like(
 		[ 'createdb', '-T', 'template0', '--locale-provider=icu', 'foobar4' ],
+		qr/ERROR:  ICU is not supported in this build/,
 		'create database with ICU fails since no ICU support');
+
+	$node->command_fails_like(
+		[
+			'createdb',             '-T',
+			'template0',            '--locale-provider=icu',
+			'--encoding=SQL_ASCII', 'foobar4'
+		],
+		qr/ERROR:  ICU is not supported in this build/,
+		'create database with ICU and not supported ICU encoding fails since no ICU support'
+	);
 }
 
 $node->command_fails([ 'createdb', 'foobar1' ],
 	'fails if database already exists');
 
-$node->command_fails(
+$node->command_fails_like(
 	[ 'createdb', '-T', 'template0', '--locale-provider=xyz', 'foobarX' ],
+	qr/ERROR:  unrecognized locale provider: xyz/,
 	'fails for invalid locale provider');
 
+$node->command_fails_like(
+	[
+		'createdb',        '-T',
+		'template0',       '--locale-provider=libc',
+		'--icu-locale=en', 'foobarX'
+	],
+	qr/ERROR:  ICU locale cannot be specified unless locale provider is ICU/,
+	'fails for invalid option combination');
+
 # Check use of templates with shared dependencies copied from the template.
 my ($ret, $stdout, $stderr) = $node->psql(
 	'foobar2',
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index a875942123..6e8af39976 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -104,8 +104,8 @@ extern char *get_collation_actual_version(char collprovider, const char *collcol
 #ifdef USE_ICU
 extern int32_t icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes);
 extern int32_t icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar);
-#endif
 extern void check_icu_locale(const char *icu_locale);
+#endif
 
 /* These functions convert from/to libc's wchar_t, *not* pg_wchar_t */
 extern size_t wchar2char(char *to, const wchar_t *from, size_t tolen,
-- 
2.25.1

v1-0001-Export-with_icu-when-running-src-bin-scripts-test.patchtext/x-diff; name=v1-0001-Export-with_icu-when-running-src-bin-scripts-test.patchDownload
From dc768bc3b1df0e387ecc3ef636f6233e22c33e59 Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Sat, 29 Oct 2022 16:03:21 +0300
Subject: [PATCH v1 1/2] Export with_icu when running src/bin/scripts tests
 with meson

---
 src/bin/scripts/meson.build | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index 837562c24e..c9c74d10ac 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -38,6 +38,7 @@ tests += {
   'sd': meson.current_source_dir(),
   'bd': meson.current_build_dir(),
   'tap': {
+    'env': {'with_icu': icu.found() ? 'yes' : 'no'},
     'tests': [
       't/010_clusterdb.pl',
       't/011_clusterdb_all.pl',
-- 
2.25.1

In reply to: Marina Polyakova (#2)
Re: Fix order of checking ICU options in initdb and create database

Hello Marina.

I just reviewed your patch.

It applied cleanly at my current master (commit d6a3dbe14f98d867b2fc3faeb99d2d3c2a48ca67).

Also, it worked as described in email. Since it's a clarification in an
error message, I think the documentation is fine.

I played a bit with "make check", creating a database in my native
language (pt_BR), testing with some data and everything worked as
expected.

--
Jose Arthur Benetasso Villanova

#4Marina Polyakova
m.polyakova@postgrespro.ru
In reply to: Jose Arthur Benetasso Villanova (#3)
Re: Fix order of checking ICU options in initdb and create database

On 2022-11-12 22:43, Jose Arthur Benetasso Villanova wrote:

Hello Marina.

I just reviewed your patch.

It applied cleanly at my current master (commit
d6a3dbe14f98d867b2fc3faeb99d2d3c2a48ca67).

Also, it worked as described in email. Since it's a clarification in
an error message, I think the documentation is fine.

I played a bit with "make check", creating a database in my native
language (pt_BR), testing with some data and everything worked as
expected.

Hello!

Thank you!

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#5Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Marina Polyakova (#2)
Re: Fix order of checking ICU options in initdb and create database

On 29.10.22 15:09, Marina Polyakova wrote:

On 2022-10-29 14:33, Marina Polyakova wrote:

Hello!

This is the last proposed patch on this subject [1] moved to a
separate thread for Commitfest..

Also added a patch to export with_icu when running src/bin/scripts tests
[1].

I have committed the meson change.

#6Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Marina Polyakova (#1)
Re: Fix order of checking ICU options in initdb and create database

On 29.10.22 13:33, Marina Polyakova wrote:

2. initdb/create database report problems with the ICU locale/encoding
although they may already report that ICU is not supported in this build:

2.1.

$ initdb --locale-provider icu hoge
...
initdb: error: ICU locale must be specified

$ initdb --locale-provider icu --icu-locale en-US hoge
...
initdb: error: ICU is not supported in this build

$ createdb --locale-provider icu hoge
createdb: error: database creation failed: ERROR:  ICU locale must be
specified

$ createdb --locale-provider icu --icu-locale en-US hoge
createdb: error: database creation failed: ERROR:  ICU is not supported
in this build

I'm not in favor of changing this. The existing code intentionally
tries to centralize the "ICU is not supported in this build" knowledge
in few places. Your patch tries to make this check early, but in the
process adds more places where ICU support needs to be checked
explicitly. This increases the code size and also creates a future
burden to maintain that level of checking. I think building without ICU
should be considered a marginal configuration at this point, so we don't
need to go out of our way to create a perfect user experience for this
configuration, as long as we check somewhere in the end.

#7Марина Полякова
polyakova.marina69@gmail.com
In reply to: Peter Eisentraut (#5)
Re: Fix order of checking ICU options in initdb and create database

чт, 17 нояб. 2022 г. в 09:54, Peter Eisentraut
<peter.eisentraut@enterprisedb.com>:

On 29.10.22 15:09, Marina Polyakova wrote:

On 2022-10-29 14:33, Marina Polyakova wrote:

Hello!

This is the last proposed patch on this subject [1] moved to a
separate thread for Commitfest..

Also added a patch to export with_icu when running src/bin/scripts tests
[1].

I have committed the meson change.

Thank you!

(Sorry, I'm having problems sending emails from corporate email :( )

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#8Марина Полякова
polyakova.marina69@gmail.com
In reply to: Peter Eisentraut (#6)
Re: Fix order of checking ICU options in initdb and create database

чт, 17 нояб. 2022 г. в 09:58, Peter Eisentraut
<peter.eisentraut@enterprisedb.com>:

On 29.10.22 13:33, Marina Polyakova wrote:

2. initdb/create database report problems with the ICU locale/encoding
although they may already report that ICU is not supported in this build:

2.1.

$ initdb --locale-provider icu hoge
...
initdb: error: ICU locale must be specified

$ initdb --locale-provider icu --icu-locale en-US hoge
...
initdb: error: ICU is not supported in this build

$ createdb --locale-provider icu hoge
createdb: error: database creation failed: ERROR: ICU locale must be
specified

$ createdb --locale-provider icu --icu-locale en-US hoge
createdb: error: database creation failed: ERROR: ICU is not supported
in this build

I'm not in favor of changing this. The existing code intentionally
tries to centralize the "ICU is not supported in this build" knowledge
in few places. Your patch tries to make this check early, but in the
process adds more places where ICU support needs to be checked
explicitly. This increases the code size and also creates a future
burden to maintain that level of checking. I think building without ICU
should be considered a marginal configuration at this point, so we don't
need to go out of our way to create a perfect user experience for this
configuration, as long as we check somewhere in the end.

Maybe this should be written in the documentation [1]https://www.postgresql.org/docs/15/install-procedure.html or --with-icu
should be used by default? As a developer I usually check something
with the simplest configure run to make sure other options do not
affect the checked behaviour. And some other developers in our company
also use simple configure runs, without --with-icu etc.

[1]: https://www.postgresql.org/docs/15/install-procedure.html

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#9Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Марина Полякова (#8)
Re: Fix order of checking ICU options in initdb and create database

On 19.11.22 13:12, Марина Полякова wrote:

I'm not in favor of changing this. The existing code intentionally
tries to centralize the "ICU is not supported in this build" knowledge
in few places. Your patch tries to make this check early, but in the
process adds more places where ICU support needs to be checked
explicitly. This increases the code size and also creates a future
burden to maintain that level of checking. I think building without ICU
should be considered a marginal configuration at this point, so we don't
need to go out of our way to create a perfect user experience for this
configuration, as long as we check somewhere in the end.

Maybe this should be written in the documentation [1] or --with-icu
should be used by default? As a developer I usually check something
with the simplest configure run to make sure other options do not
affect the checked behaviour. And some other developers in our company
also use simple configure runs, without --with-icu etc.

Well, this isn't a hard rule, just my opinion and where I see the world
moving. It's similar to --with-openssl and --with-lz4 etc.

#10Марина Полякова
polyakova.marina69@gmail.com
In reply to: Peter Eisentraut (#9)
3 attachment(s)
Re: Fix order of checking ICU options in initdb and create database

сб, 19 нояб. 2022 г. в 15:51, Peter Eisentraut
<peter.eisentraut@enterprisedb.com>:

On 19.11.22 13:12, Марина Полякова wrote:

I'm not in favor of changing this. The existing code intentionally
tries to centralize the "ICU is not supported in this build" knowledge
in few places. Your patch tries to make this check early, but in the
process adds more places where ICU support needs to be checked
explicitly. This increases the code size and also creates a future
burden to maintain that level of checking. I think building without ICU
should be considered a marginal configuration at this point, so we don't
need to go out of our way to create a perfect user experience for this
configuration, as long as we check somewhere in the end.

Maybe this should be written in the documentation [1] or --with-icu
should be used by default? As a developer I usually check something
with the simplest configure run to make sure other options do not
affect the checked behaviour. And some other developers in our company
also use simple configure runs, without --with-icu etc.

Well, this isn't a hard rule, just my opinion and where I see the world
moving. It's similar to --with-openssl and --with-lz4 etc.

Here is another set of proposed patches:

v2-0001-Fix-encoding-check-in-initdb-when-the-option-icu-.patch
Target: PG 15+
Fix encoding check in initdb when the option --icu-locale is not used:

$ initdb --encoding sql-ascii --locale-provider icu hoge
...
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (SQL_ASCII) is not supported
with the ICU provider.
initdb: hint: Rerun initdb and either do not specify an encoding
explicitly, or choose a matching combination.

As with the previous version of this fix a side effect is that if ICU
locale is missed (or ICU is not supported in this build), the
provider, locales and encoding are reported before the error message:

$ initdb --locale-provider icu hoge
The files belonging to this database system will be owned by user "marina".
This user must also own the server process.

The database cluster will be initialized with this locale configuration:
provider: icu
LC_COLLATE: en_US.UTF-8
LC_CTYPE: en_US.UTF-8
LC_MESSAGES: en_US.UTF-8
LC_MONETARY: ru_RU.UTF-8
LC_NUMERIC: ru_RU.UTF-8
LC_TIME: ru_RU.UTF-8
The default database encoding has been set to "UTF8".
initdb: error: ICU locale must be specified

$ initdb --locale-provider icu --icu-locale en hoge
The files belonging to this database system will be owned by user "marina".
This user must also own the server process.

The database cluster will be initialized with this locale configuration:
provider: icu
ICU locale: en
LC_COLLATE: en_US.UTF-8
LC_CTYPE: en_US.UTF-8
LC_MESSAGES: en_US.UTF-8
LC_MONETARY: ru_RU.UTF-8
LC_NUMERIC: ru_RU.UTF-8
LC_TIME: ru_RU.UTF-8
The default database encoding has been set to "UTF8".
initdb: error: ICU is not supported in this build

v2-0002-doc-building-without-ICU-is-not-recommended.patch
Target: PG 15+
Fix the documentation that --without-icu is a marginal configuration.

v2-0003-Build-with-ICU-by-default.patch
Target: PG 16+
Build with ICU by default as already done for readline and zlib libraries.

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

v2-0001-Fix-encoding-check-in-initdb-when-the-option-icu-.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Fix-encoding-check-in-initdb-when-the-option-icu-.patchDownload
From b4dca9097633c585da232bc640f2517b3c443f09 Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Sat, 19 Nov 2022 17:24:55 +0300
Subject: [PATCH v2 1/3] Fix encoding check in initdb when the option
 --icu-locale is not used

First check that the selected encoding is supported by ICU (the encoding can be
set from lc_ctype which can be set from an environment variable). Only then
check the option for the ICU locale.
---
 src/bin/initdb/initdb.c           | 35 +++++++++++++------------------
 src/bin/initdb/t/001_initdb.pl    | 17 ++++++++++-----
 src/bin/scripts/t/020_createdb.pl | 26 ++++++++++++++++++-----
 3 files changed, 48 insertions(+), 30 deletions(-)

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f61a043055..444c8a505a 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2047,7 +2047,7 @@ check_locale_encoding(const char *locale, int user_enc)
  *
  * this should match the similar check in the backend createdb() function
  */
-static bool
+static void
 check_icu_locale_encoding(int user_enc)
 {
 	if (!(is_encoding_supported_by_icu(user_enc)))
@@ -2058,9 +2058,19 @@ check_icu_locale_encoding(int user_enc)
 		pg_log_error_hint("Rerun %s and either do not specify an encoding explicitly, "
 						  "or choose a matching combination.",
 						  progname);
-		return false;
+		exit(1);
 	}
-	return true;
+
+	if (!icu_locale)
+		pg_fatal("ICU locale must be specified");
+
+	/*
+	 * In supported builds, the ICU locale ID will be checked by the backend
+	 * during post-bootstrap initialization.
+	 */
+#ifndef USE_ICU
+	pg_fatal("ICU is not supported in this build");
+#endif
 }
 
 /*
@@ -2113,20 +2123,6 @@ setlocales(void)
 	check_locale_name(LC_CTYPE, lc_messages, &canonname);
 	lc_messages = canonname;
 #endif
-
-	if (locale_provider == COLLPROVIDER_ICU)
-	{
-		if (!icu_locale)
-			pg_fatal("ICU locale must be specified");
-
-		/*
-		 * In supported builds, the ICU locale ID will be checked by the
-		 * backend during post-bootstrap initialization.
-		 */
-#ifndef USE_ICU
-		pg_fatal("ICU is not supported in this build");
-#endif
-	}
 }
 
 /*
@@ -2388,9 +2384,8 @@ setup_locale_encoding(void)
 		!check_locale_encoding(lc_collate, encodingid))
 		exit(1);				/* check_locale_encoding printed the error */
 
-	if (locale_provider == COLLPROVIDER_ICU &&
-		!check_icu_locale_encoding(encodingid))
-		exit(1);
+	if (locale_provider == COLLPROVIDER_ICU)
+		check_icu_locale_encoding(encodingid);
 }
 
 
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 164fc11cbf..20fe29851c 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -123,28 +123,35 @@ if ($ENV{with_icu} eq 'yes')
 		[
 			'initdb',                '--no-sync',
 			'--locale-provider=icu', '--encoding=SQL_ASCII',
-			'--icu-locale=en', "$tempdir/dataX"
+			"$tempdir/dataX"
 		],
 		qr/error: encoding mismatch/,
 		'fails for encoding not supported by ICU');
 }
 else
 {
-	command_fails(
-		[ 'initdb', '--no-sync', '--locale-provider=icu', "$tempdir/data2" ],
+	command_fails_like(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=en',
+			"$tempdir/data2"
+		],
+		qr/error: ICU is not supported in this build/,
 		'locale provider ICU fails since no ICU support');
 }
 
-command_fails(
+command_fails_like(
 	[ 'initdb', '--no-sync', '--locale-provider=xyz', "$tempdir/dataX" ],
+	qr/error: unrecognized locale provider: xyz/,
 	'fails for invalid locale provider');
 
-command_fails(
+command_fails_like(
 	[
 		'initdb',                 '--no-sync',
 		'--locale-provider=libc', '--icu-locale=en',
 		"$tempdir/dataX"
 	],
+	qr/error: --icu-locale cannot be specified unless locale provider "icu" is chosen/,
 	'fails for invalid option combination');
 
 done_testing();
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index a74bf3b0d8..3030ea7f9d 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -30,11 +30,12 @@ if ($ENV{with_icu} eq 'yes')
 	# This fails because template0 uses libc provider and has no ICU
 	# locale set.  It would succeed if template0 used the icu
 	# provider.  XXX Maybe split into multiple tests?
-	$node->command_fails(
+	$node->command_fails_like(
 		[
 			'createdb', '-T', 'template0', '-E', 'UTF8',
 			'--locale-provider=icu', 'foobar4'
 		],
+		qr/ERROR:  ICU locale must be specified/,
 		'create database with ICU fails without ICU locale specified');
 
 	$node->issues_sql_like(
@@ -46,12 +47,13 @@ if ($ENV{with_icu} eq 'yes')
 		qr/statement: CREATE DATABASE foobar5 .* LOCALE_PROVIDER icu ICU_LOCALE 'en'/,
 		'create database with ICU locale specified');
 
-	$node->command_fails(
+	$node->command_fails_like(
 		[
 			'createdb', '-T', 'template0', '-E', 'UTF8',
 			'--locale-provider=icu',
 			'--icu-locale=@colNumeric=lower', 'foobarX'
 		],
+		qr/ERROR:  could not open collator for locale/,
 		'fails for invalid ICU locale');
 
 	$node->command_fails_like(
@@ -78,18 +80,32 @@ if ($ENV{with_icu} eq 'yes')
 }
 else
 {
-	$node->command_fails(
-		[ 'createdb', '-T', 'template0', '--locale-provider=icu', 'foobar4' ],
+	$node->command_fails_like(
+		[
+			'createdb', '-T', 'template0', '-E', 'UTF8',
+			'--locale-provider=icu', '--icu-locale=en', 'foobar4'
+		],
+		qr/ERROR:  ICU is not supported in this build/,
 		'create database with ICU fails since no ICU support');
 }
 
 $node->command_fails([ 'createdb', 'foobar1' ],
 	'fails if database already exists');
 
-$node->command_fails(
+$node->command_fails_like(
 	[ 'createdb', '-T', 'template0', '--locale-provider=xyz', 'foobarX' ],
+	qr/ERROR:  unrecognized locale provider: xyz/,
 	'fails for invalid locale provider');
 
+$node->command_fails_like(
+	[
+		'createdb',        '-T',
+		'template0',       '--locale-provider=libc',
+		'--icu-locale=en', 'foobarX'
+	],
+	qr/ERROR:  ICU locale cannot be specified unless locale provider is ICU/,
+	'fails for invalid option combination');
+
 # Check use of templates with shared dependencies copied from the template.
 my ($ret, $stdout, $stderr) = $node->psql(
 	'foobar2',
-- 
2.25.1

v2-0003-Build-with-ICU-by-default.patchtext/x-patch; charset=US-ASCII; name=v2-0003-Build-with-ICU-by-default.patchDownload
From fe405bda51528493beda3461255cbe3eac361c6b Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Sat, 19 Nov 2022 18:57:12 +0300
Subject: [PATCH v2 3/3] Build with ICU by default

Add the option --without-icu to turn it off.
---
 configure                      | 35 ++++++++-------
 configure.ac                   | 18 ++++++--
 doc/src/sgml/installation.sgml | 82 ++++++++++++++++------------------
 meson.build                    |  2 +
 4 files changed, 74 insertions(+), 63 deletions(-)

diff --git a/configure b/configure
index 3966368b8d..1b3ecf9d11 100755
--- a/configure
+++ b/configure
@@ -1558,7 +1558,7 @@ Optional Packages:
                           set WAL block size in kB [8]
   --with-CC=CMD           set compiler (deprecated)
   --with-llvm             build with LLVM based JIT support
-  --with-icu              build with ICU support
+  --without-icu           build without ICU support
   --with-tcl              build Tcl modules (PL/Tcl)
   --with-tclconfig=DIR    tclConfig.sh is in DIR
   --with-perl             build Perl modules (PL/Perl)
@@ -8320,7 +8320,9 @@ $as_echo "#define USE_ICU 1" >>confdefs.h
   esac
 
 else
-  with_icu=no
+  with_icu=yes
+
+$as_echo "#define USE_ICU 1" >>confdefs.h
 
 fi
 
@@ -8389,31 +8391,27 @@ fi
 	# Put the nasty error message in config.log where it belongs
 	echo "$ICU_PKG_ERRORS" >&5
 
-	as_fn_error $? "Package requirements (icu-uc icu-i18n) were not met:
-
-$ICU_PKG_ERRORS
-
-Consider adjusting the PKG_CONFIG_PATH environment variable if you
-installed software in a non-standard prefix.
+	as_fn_error $? "ICU library not found
+If you have ICU already installed, see config.log for details on the
+failure.  It is possible pkg-config isn't looking in the proper directory.
 
 Alternatively, you may set the environment variables ICU_CFLAGS
 and ICU_LIBS to avoid the need to call pkg-config.
-See the pkg-config man page for more details." "$LINENO" 5
+See the pkg-config man page for more details.
+
+Use --without-icu to disable ICU support." "$LINENO" 5
 elif test $pkg_failed = untried; then
         { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
 $as_echo "no" >&6; }
-	{ { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
-$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
-as_fn_error $? "The pkg-config script could not be found or is too old.  Make sure it
-is in your PATH or set the PKG_CONFIG environment variable to the full
-path to pkg-config.
+	as_fn_error $? "ICU library not found
+If you have ICU already installed, see config.log for details on the
+failure.  It is possible pkg-config isn't looking in the proper directory.
 
 Alternatively, you may set the environment variables ICU_CFLAGS
 and ICU_LIBS to avoid the need to call pkg-config.
 See the pkg-config man page for more details.
 
-To get pkg-config, see <http://pkg-config.freedesktop.org/>.
-See \`config.log' for more details" "$LINENO" 5; }
+Use --without-icu to disable ICU support." "$LINENO" 5
 else
 	ICU_CFLAGS=$pkg_cv_ICU_CFLAGS
 	ICU_LIBS=$pkg_cv_ICU_LIBS
@@ -16715,7 +16713,10 @@ if test "$with_icu" = yes; then
 if test "x$ac_cv_header_unicode_ucol_h" = xyes; then :
 
 else
-  as_fn_error $? "header file <unicode/ucol.h> is required for ICU" "$LINENO" 5
+  as_fn_error $? "ICU header file <unicode/ucol.h> not found
+If you have ICU already installed, see config.log for details on the
+failure.  It is possible the compiler isn't looking in the proper directory.
+Use --without-icu to disable ICU support." "$LINENO" 5
 fi
 
 
diff --git a/configure.ac b/configure.ac
index f76b7ee31f..bf396f8d83 100644
--- a/configure.ac
+++ b/configure.ac
@@ -831,13 +831,22 @@ AC_SUBST(enable_thread_safety)
 # ICU
 #
 AC_MSG_CHECKING([whether to build with ICU support])
-PGAC_ARG_BOOL(with, icu, no, [build with ICU support],
+PGAC_ARG_BOOL(with, icu, yes, [build without ICU support],
               [AC_DEFINE([USE_ICU], 1, [Define to build with ICU support. (--with-icu)])])
 AC_MSG_RESULT([$with_icu])
 AC_SUBST(with_icu)
 
 if test "$with_icu" = yes; then
-  PKG_CHECK_MODULES(ICU, icu-uc icu-i18n)
+  PKG_CHECK_MODULES(ICU, icu-uc icu-i18n, [],
+                    [AC_MSG_ERROR([ICU library not found
+If you have ICU already installed, see config.log for details on the
+failure.  It is possible pkg-config isn't looking in the proper directory.
+
+Alternatively, you may set the environment variables ICU_CFLAGS
+and ICU_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details.
+
+Use --without-icu to disable ICU support.])])
 fi
 
 #
@@ -1936,7 +1945,10 @@ if test "$with_icu" = yes; then
 
   # Verify we have ICU's header files
   AC_CHECK_HEADER(unicode/ucol.h, [],
-        [AC_MSG_ERROR([header file <unicode/ucol.h> is required for ICU])])
+        [AC_MSG_ERROR([ICU header file <unicode/ucol.h> not found
+If you have ICU already installed, see config.log for details on the
+failure.  It is possible the compiler isn't looking in the proper directory.
+Use --without-icu to disable ICU support.])])
 
   CPPFLAGS=$ac_save_CPPFLAGS
 fi
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index df62573db1..e15f6f8643 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -913,49 +913,6 @@ build-postgresql:
        </listitem>
       </varlistentry>
 
-      <varlistentry>
-       <term><option>--with-icu</option></term>
-       <listitem>
-        <para>
-         Build with support for
-         the <productname>ICU</productname><indexterm><primary>ICU</primary></indexterm>
-         library, enabling use of ICU collation
-         features<phrase condition="standalone-ignore"> (see
-         <xref linkend="collation"/>)</phrase>.
-         This requires the <productname>ICU4C</productname> package
-         to be installed.  The minimum required version
-         of <productname>ICU4C</productname> is currently 4.2.
-        </para>
-
-        <para>
-         By default,
-         <productname>pkg-config</productname><indexterm><primary>pkg-config</primary></indexterm>
-         will be used to find the required compilation options.  This is
-         supported for <productname>ICU4C</productname> version 4.6 and later.
-         For older versions, or if <productname>pkg-config</productname> is
-         not available, the variables <envar>ICU_CFLAGS</envar>
-         and <envar>ICU_LIBS</envar> can be specified
-         to <filename>configure</filename>, like in this example:
-<programlisting>
-./configure ... --with-icu ICU_CFLAGS='-I/some/where/include' ICU_LIBS='-L/some/where/lib -licui18n -licuuc -licudata'
-</programlisting>
-         (If <productname>ICU4C</productname> is in the default search path
-         for the compiler, then you still need to specify nonempty strings in
-         order to avoid use of <productname>pkg-config</productname>, for
-         example, <literal>ICU_CFLAGS=' '</literal>.)
-        </para>
-
-        <note>
-         <para>
-          Building without
-          <productname>ICU</productname><indexterm><primary>ICU</primary></indexterm>
-          library is considered a marginal configuration and is not recommended
-          unless really necessary.
-         </para>
-        </note>
-       </listitem>
-      </varlistentry>
-
       <varlistentry id="configure-with-llvm">
        <term><option>--with-llvm</option></term>
        <listitem>
@@ -1266,6 +1223,45 @@ build-postgresql:
        </listitem>
       </varlistentry>
 
+      <varlistentry>
+       <term><option>--without-icu</option></term>
+       <listitem>
+        <para>
+         Prevents use of the
+         <productname>ICU</productname><indexterm><primary>ICU</primary></indexterm>
+         library. This disables support of ICU collation
+         features<phrase condition="standalone-ignore"> (see
+         <xref linkend="collation"/>)</phrase>.
+        </para>
+        <note>
+         <para>
+          Building with support for
+          the <productname>ICU</productname><indexterm><primary>ICU</primary></indexterm>
+          library requires the <productname>ICU4C</productname> package
+          to be installed.  The minimum required version
+          of <productname>ICU4C</productname> is currently 4.2.
+         </para>
+         <para>
+          By default,
+          <productname>pkg-config</productname><indexterm><primary>pkg-config</primary></indexterm>
+          will be used to find the required compilation options.  This is
+          supported for <productname>ICU4C</productname> version 4.6 and later.
+          For older versions, or if <productname>pkg-config</productname> is
+          not available, the variables <envar>ICU_CFLAGS</envar>
+          and <envar>ICU_LIBS</envar> can be specified
+          to <filename>configure</filename>, like in this example:
+ <programlisting>
+ ./configure ... ICU_CFLAGS='-I/some/where/include' ICU_LIBS='-L/some/where/lib -licui18n -licuuc -licudata'
+ </programlisting>
+          (If <productname>ICU4C</productname> is in the default search path
+          for the compiler, then you still need to specify nonempty strings in
+          order to avoid use of <productname>pkg-config</productname>, for
+          example, <literal>ICU_CFLAGS=' '</literal>.)
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
       <varlistentry>
        <term><option>--disable-spinlocks</option></term>
        <listitem>
diff --git a/meson.build b/meson.build
index 058382046e..5e3cd93a69 100644
--- a/meson.build
+++ b/meson.build
@@ -709,6 +709,8 @@ if not icuopt.disabled()
 
   if icu.found()
     cdata.set('USE_ICU', 1)
+  else
+    warning('did not find icu')
   endif
 
 else
-- 
2.25.1

v2-0002-doc-building-without-ICU-is-not-recommended.patchtext/x-patch; charset=US-ASCII; name=v2-0002-doc-building-without-ICU-is-not-recommended.patchDownload
From 5a994c9d039987c2739ccd788450205b06e1782c Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Sat, 19 Nov 2022 17:39:42 +0300
Subject: [PATCH v2 2/3] doc: building without ICU is not recommended

---
 doc/src/sgml/installation.sgml | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 319c7e6966..df62573db1 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -944,6 +944,15 @@ build-postgresql:
          order to avoid use of <productname>pkg-config</productname>, for
          example, <literal>ICU_CFLAGS=' '</literal>.)
         </para>
+
+        <note>
+         <para>
+          Building without
+          <productname>ICU</productname><indexterm><primary>ICU</primary></indexterm>
+          library is considered a marginal configuration and is not recommended
+          unless really necessary.
+         </para>
+        </note>
        </listitem>
       </varlistentry>
 
-- 
2.25.1

#11Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Марина Полякова (#10)
Re: Fix order of checking ICU options in initdb and create database

On 19.11.22 20:36, Марина Полякова wrote:

Here is another set of proposed patches:

v2-0001-Fix-encoding-check-in-initdb-when-the-option-icu-.patch
Target: PG 15+
Fix encoding check in initdb when the option --icu-locale is not used:

I'm having a hard time figuring out from your examples what you are
trying to change. Which one is the "before" example and which one is
the "after", and which aspect specifically is the issue and what
specifically is being addressed? I tried out the examples in the
current code and didn't find anything obviously wrong in the behavior.

I'm concerned that the initdb.c changes are a bit unprincipled. They
just move code around to achieve some behavior without keeping the
overall structure in mind. For example, check_icu_locale_encoding()
intentionally had the same API as check_locale_encoding(), but now
that's being changed. And setlocales() did all the locale parameter
validity checking, but now part of that is being moved around. I'm
afraid this makes initdb.c even more spaghetti code than it already is.

What about those test changes? I can't tell if they are related.
createdb isn't being changed; is that test change related or separate?

v2-0002-doc-building-without-ICU-is-not-recommended.patch
Target: PG 15+
Fix the documentation that --without-icu is a marginal configuration.

v2-0003-Build-with-ICU-by-default.patch
Target: PG 16+
Build with ICU by default as already done for readline and zlib libraries.

We are not going to make these kinds of changes specifically for ICU.
I'd say, for example, the same applies to --with-openssl and --with-lz4,
and probably others. If this is an issue, then we need a more general
approach than just ICU. This should be a separate thread in any case.

#12Gregory Stark (as CFM)
stark.cfm@gmail.com
In reply to: Peter Eisentraut (#11)
Re: Fix order of checking ICU options in initdb and create database

Is this feedback enough to focus the work on the right things?

I feel like Marina Polyakova pointed out some real confusing behaviour
and perhaps there's a way to solve them by focusing on one at a time
without making large changes in the code.

Perhaps an idea would be to have each module provide two functions, one
which is called early and signals an error if that module's parameters
are provided when it's not compiled in, and a second which verifies
that the parameters are consistent at the point in time where that's
appropriate. (Not entirely unlike what we do for GUCs, though simpler)

If every module did that consistently then it would avoid making the
changes "unprincipled" or "spaghetti" though frankly I find words like
that not very helpful to someone receiving that feedback.

The patch is obviously not ready for commit now but it also seems like
the feedback has not been really sufficient for Marina Polyakova to
make progress either.

--
Gregory Stark
As Commitfest Manager

#13Daniel Gustafsson
daniel@yesql.se
In reply to: Gregory Stark (as CFM) (#12)
Re: Fix order of checking ICU options in initdb and create database

This patch has now been waiting for author since December, with the thread
stalled. I am marking this returned with feedback for now, please feel free to
re-submit the patch to a future CF when there is renewed interest in working on
this.

--
Daniel Gustafsson