pgsql: ICU support
ICU support
Add a column collprovider to pg_collation that determines which library
provides the collation data. The existing choices are default and libc,
and this adds an icu choice, which uses the ICU4C library.
The pg_locale_t type is changed to a union that contains the
provider-specific locale handles. Users of locale information are
changed to look into that struct for the appropriate handle to use.
Also add a collversion column that records the version of the collation
when it is created, and check at run time whether it is still the same.
This detects potentially incompatible library upgrades that can corrupt
indexes and other structures. This is currently only supported by
ICU-provided collations.
initdb initializes the default collation set as before from the `locale
-a` output but also adds all available ICU locales with a "-x-icu"
appended.
Currently, ICU-provided collations can only be explicitly named
collations. The global database locales are still always libc-provided.
ICU support is enabled by configure --with-icu.
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Branch
------
master
Details
-------
http://git.postgresql.org/pg/commitdiff/eccfef81e1f73ee41f1d8bfe4fa4e80576945048
Modified Files
--------------
aclocal.m4 | 1 +
config/pkg.m4 | 275 ++++++
configure | 313 ++++++
configure.in | 35 +
doc/src/sgml/catalogs.sgml | 19 +
doc/src/sgml/charset.sgml | 177 +++-
doc/src/sgml/func.sgml | 17 +
doc/src/sgml/installation.sgml | 15 +
doc/src/sgml/mvcc.sgml | 3 +-
doc/src/sgml/ref/alter_collation.sgml | 55 ++
doc/src/sgml/ref/create_collation.sgml | 37 +-
src/Makefile.global.in | 4 +
src/backend/Makefile | 2 +-
src/backend/catalog/pg_collation.c | 52 +-
src/backend/commands/collationcmds.c | 288 +++++-
src/backend/common.mk | 2 +
src/backend/nodes/copyfuncs.c | 13 +
src/backend/nodes/equalfuncs.c | 11 +
src/backend/parser/gram.y | 18 +-
src/backend/regex/regc_pg_locale.c | 110 ++-
src/backend/tcop/utility.c | 8 +
src/backend/utils/adt/formatting.c | 453 +++++----
src/backend/utils/adt/like.c | 53 +-
src/backend/utils/adt/pg_locale.c | 266 ++++-
src/backend/utils/adt/selfuncs.c | 8 +-
src/backend/utils/adt/varlena.c | 179 +++-
src/backend/utils/mb/encnames.c | 76 ++
src/bin/initdb/initdb.c | 3 +-
src/bin/pg_dump/pg_dump.c | 75 +-
src/bin/pg_dump/t/002_pg_dump.pl | 2 +-
src/bin/psql/describe.c | 7 +-
src/include/catalog/pg_collation.h | 25 +-
src/include/catalog/pg_collation_fn.h | 2 +
src/include/catalog/pg_proc.h | 3 +
src/include/commands/collationcmds.h | 1 +
src/include/mb/pg_wchar.h | 6 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/parsenodes.h | 11 +
src/include/pg_config.h.in | 6 +
src/include/utils/pg_locale.h | 32 +-
src/test/regress/GNUmakefile | 3 +
src/test/regress/expected/collate.icu.out | 1126 ++++++++++++++++++++++
src/test/regress/expected/collate.linux.utf8.out | 80 +-
src/test/regress/sql/collate.icu.sql | 433 +++++++++
src/test/regress/sql/collate.linux.utf8.sql | 32 +
45 files changed, 3929 insertions(+), 409 deletions(-)
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On 3/23/17 15:33, Peter Eisentraut wrote:
ICU support
Seeing some crashes on the build farm, investigating. If someone can
reproduce locally and get a backtrace, let me know.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On 03/23/2017 04:07 PM, Peter Eisentraut wrote:
On 3/23/17 15:33, Peter Eisentraut wrote:
ICU support
Seeing some crashes on the build farm, investigating. If someone can
reproduce locally and get a backtrace, let me know.
The buildfarm actually does that :-)
See for example
<https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2017-03-23%2019%3A37%3A22>
which has stuff like:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 Generic_Text_IC_like (collation=100, pat=0x10cf3f8, str=0x10cf388) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/like.c:197
197 if (pg_database_encoding_max_length() > 1 || locale->provider == COLLPROVIDER_ICU)
#0 Generic_Text_IC_like (collation=100, pat=0x10cf3f8, str=0x10cf388) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/like.c:197
#1 texticlike (fcinfo=<optimized out>) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/like.c:394
#2 0x00000000005db832 in ExecMakeFunctionResultNoSets (fcache=0x106af48, econtext=0x106b828, isNull=0x7fff5884877c "") at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/executor/execQual.c:1866
#3 0x00000000005dfe76 in ExecEvalExprSwitchContext (expression=expression@entry=0x106af48, econtext=<optimized out>, isNull=isNull@entry=0x7fff5884877c "") at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/executor/execQual.c:4228
#4 0x0000000000671d29 in evaluate_expr (expr=<optimized out>, result_type=result_type@entry=16, result_typmod=result_typmod@entry=-1, result_collation=result_collation@entry=0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:4680
#5 0x00000000006732ab in evaluate_function (context=0x7fff58848aa0, func_tuple=0x7fb799a56048, funcvariadic=0 '\000', args=0x10cf8a0, input_collid=100, result_collid=0, result_typmod=-1, result_type=16, funcid=1633) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:4237
#6 simplify_function (funcid=1633, result_type=16, result_typmod=result_typmod@entry=-1, result_collid=0, input_collid=100, args_p=args_p@entry=0x7fff588488f8, funcvariadic=0 '\000', process_args=1 '\001', allow_non_const=1 '\001', context=0x7fff58848aa0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:3877
#7 0x0000000000673b4a in eval_const_expressions_mutator (node=0x10cf410, context=0x7fff58848aa0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:2584
#8 0x0000000000619be7 in expression_tree_mutator (node=node@entry=0x10cf460, mutator=mutator@entry=0x673a10 <eval_const_expressions_mutator>, context=context@entry=0x7fff58848aa0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/nodes/nodeFuncs.c:3017
#9 0x0000000000673a50 in eval_const_expressions_mutator (node=0x10cf460, context=0x7fff58848aa0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:3527
#10 0x0000000000619e4b in expression_tree_mutator (node=node@entry=0x10cf4b0, mutator=mutator@entry=0x673a10 <eval_const_expressions_mutator>, context=context@entry=0x7fff58848aa0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/nodes/nodeFuncs.c:2912
#11 0x0000000000673a50 in eval_const_expressions_mutator (node=0x10cf4b0, context=context@entry=0x7fff58848aa0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:3527
#12 0x000000000067574f in eval_const_expressions (root=root@entry=0x10cf5d0, node=<optimized out>) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:2378
#13 0x0000000000660d95 in preprocess_expression (root=root@entry=0x10cf5d0, expr=<optimized out>, kind=kind@entry=1) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/plan/planner.c:880
#14 0x000000000066519d in subquery_planner (glob=glob@entry=0x10cf198, parse=parse@entry=0x10cec38, parent_root=parent_root@entry=0x0, hasRecursion=hasRecursion@entry=0 '\000', tuple_fraction=tuple_fraction@entry=0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/plan/planner.c:593
#15 0x0000000000666321 in standard_planner (parse=0x10cec38, cursorOptions=256, boundParams=<optimized out>) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/plan/planner.c:306
#16 0x00000000006efeec in pg_plan_query (querytree=querytree@entry=0x10cec38, cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:798
#17 0x00000000006effce in pg_plan_queries (querytrees=<optimized out>, cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:864
#18 0x00000000006f0467 in exec_simple_query (query_string=0x10cdd68 "SELECT 'hawkeye' ILIKE 'h%' AS \"true\";") at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:1029
#19 0x00000000006f17cb in PostgresMain (argc=<optimized out>, argv=argv@entry=0x10784a0, dbname=0x1078298 "regression", username=<optimized out>) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:4071
#20 0x0000000000478ce0 in BackendRun (port=0x1073670) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:4317
#21 BackendStartup (port=0x1073670) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:3989
#22 ServerLoop () at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:1729
#23 0x00000000006907f7 in PostmasterMain (argc=argc@entry=8, argv=argv@entry=0x104c6e0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:1337
#24 0x0000000000479e98 in main (argc=8, argv=0x104c6e0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/main/main.c:228
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
The buildfarm's mostly happy, but "prion" still isn't, which
suggests there's someplace in the new code that references an
already-closed relcache entry.
regards, tom lane
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
I wrote:
The buildfarm's mostly happy, but "prion" still isn't, which
suggests there's someplace in the new code that references an
already-closed relcache entry.
Ah, scratch that, I was overthinking it. The problem is pretty
clear from the error message:
+ ERROR: collation "en-x-icu" for encoding "SQL_ASCII" does not exist
I can reproduce this with vanilla configure options if I'm running
with --with-icu and LANG=C. In short, this regression test does not
work in C locale, and you need to find a way to disable it in that
environment.
regards, tom lane
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On 03/23/2017 11:53 PM, Tom Lane wrote:
I wrote:
The buildfarm's mostly happy, but "prion" still isn't, which
suggests there's someplace in the new code that references an
already-closed relcache entry.Ah, scratch that, I was overthinking it. The problem is pretty
clear from the error message:+ ERROR: collation "en-x-icu" for encoding "SQL_ASCII" does not exist
I can reproduce this with vanilla configure options if I'm running
with --with-icu and LANG=C. In short, this regression test does not
work in C locale, and you need to find a way to disable it in that
environment.
Possibly something like this:
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index a747fac..d3a68cb 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -126,8 +126,12 @@ tablespace-setup:
REGRESS_OPTS = --dlpath=. $(EXTRA_REGRESS_OPTS)
ifeq ($(with_icu),yes)
+ifndef NO_LOCALE
+ifneq ($(LANG),C)
override EXTRA_TESTS := collate.icu $(EXTRA_TESTS)
endif
+endif
+endif
check: all tablespace-setup
$(pg_regress_check) $(REGRESS_OPTS)
--schedule=$(srcdir)/parallel_schedule $(MAXCONNOPT) $(EXTRA_TESTS)
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
On 03/23/2017 11:53 PM, Tom Lane wrote:
+ ERROR: collation "en-x-icu" for encoding "SQL_ASCII" does not exist
I can reproduce this with vanilla configure options if I'm running
with --with-icu and LANG=C. In short, this regression test does not
work in C locale, and you need to find a way to disable it in that
environment.
Possibly something like this:
REGRESS_OPTS = --dlpath=. $(EXTRA_REGRESS_OPTS) ifeq ($(with_icu),yes) +ifndef NO_LOCALE +ifneq ($(LANG),C) override EXTRA_TESTS := collate.icu $(EXTRA_TESTS) endif +endif +endif
That's better than nothing, certainly, but it doesn't catch all the ways
that C locale could be selected (LC_ALL, no valid setting for LANG, etc).
I suppose that what we really want to know is what encoding will be used
in the regression database. I wonder whether the Makefile could find
that out more directly. Although maybe it's time to bite the bullet
and teach pg_regress how to select whether or not to run the test ---
that would have the advantage of (probably) working on Windows, which
no amount of Makefile-hacking will do.
BTW, is there a reason that override isn't spelled more like
override EXTRA_TESTS += collate.icu
? That seems more readable and idiomatic to me.
regards, tom lane
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On 03/24/2017 09:26 AM, Tom Lane wrote:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
On 03/23/2017 11:53 PM, Tom Lane wrote:
+ ERROR: collation "en-x-icu" for encoding "SQL_ASCII" does not exist
I can reproduce this with vanilla configure options if I'm running
with --with-icu and LANG=C. In short, this regression test does not
work in C locale, and you need to find a way to disable it in that
environment.Possibly something like this:
REGRESS_OPTS = --dlpath=. $(EXTRA_REGRESS_OPTS)
ifeq ($(with_icu),yes)
+ifndef NO_LOCALE
+ifneq ($(LANG),C)
override EXTRA_TESTS := collate.icu $(EXTRA_TESTS)
endif
+endif
+endifThat's better than nothing, certainly, but it doesn't catch all the ways
that C locale could be selected (LC_ALL, no valid setting for LANG, etc).I suppose that what we really want to know is what encoding will be used
in the regression database. I wonder whether the Makefile could find
that out more directly. Although maybe it's time to bite the bullet
and teach pg_regress how to select whether or not to run the test ---
that would have the advantage of (probably) working on Windows, which
no amount of Makefile-hacking will do.
All true, although Peter hasn't actually committed a change that would
enable the test in MSVC builds, so that's not a danger right now.
While the change could be made more robust, along with 3 lines of change
in the buildfarm code to make sure LANG is set in appropriate places it
is enough to turn prion green, and actually running the tests when
testing en_US.UTF8.
I agree we need a general mechanism for running tests conditionally.
Let's not wait for that, though :-)
BTW, is there a reason that override isn't spelled more like
override EXTRA_TESTS += collate.icu
? That seems more readable and idiomatic to me.
+1.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On 24 March 2017 at 06:33, Peter Eisentraut <peter_e@gmx.net> wrote:
http://git.postgresql.org/pg/commitdiff/eccfef81e1f73ee41f1d8bfe4fa4e80576945048
[...]
src/include/utils/pg_locale.h | 32 +-
I see this commit changed the definition of pg_locale_t
+struct pg_locale_t
+{
+ char provider;
+ union
+ {
#ifdef HAVE_LOCALE_T
-typedef locale_t pg_locale_t;
-#else
-typedef int pg_locale_t;
+ locale_t lt;
+#endif
+#ifdef USE_ICU
+ struct {
+ const char *locale;
+ UCollator *ucol;
+ } icu;
#endif
+ } info;
+};
but forgot to update varstr_cmp() completely.
result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, mylocale);
should be:
result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, mylocale->info.lt);
Patch attached.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
icu_fix.patchapplication/octet-stream; name=icu_fix.patchDownload+1-1
David Rowley <david.rowley@2ndquadrant.com> writes:
I see this commit changed the definition of pg_locale_t
but forgot to update varstr_cmp() completely.
It's pretty disturbing that we don't seem to have caught that
in testing. Some work on code coverage seems indicated.
regards, tom lane
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On 3/23/17 23:53, Tom Lane wrote:
Ah, scratch that, I was overthinking it. The problem is pretty
clear from the error message:+ ERROR: collation "en-x-icu" for encoding "SQL_ASCII" does not exist
I can reproduce this with vanilla configure options if I'm running
with --with-icu and LANG=C. In short, this regression test does not
work in C locale, and you need to find a way to disable it in that
environment.
Done.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On 3/24/17 20:32, David Rowley wrote:
but forgot to update varstr_cmp() completely.
result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, mylocale);
should be:
result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, mylocale->info.lt);
Patch attached.
Fixed.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On 3/24/17 22:28, Tom Lane wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
I see this commit changed the definition of pg_locale_t
but forgot to update varstr_cmp() completely.It's pretty disturbing that we don't seem to have caught that
in testing.
Or during compilation!?!
Some work on code coverage seems indicated.
That would require a Windows variant of collate.linux.utf8.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers