Reverse collations (initially for making keyset pagination cover more cases)

Started by David Fetterabout 6 years ago8 messages
#1David Fetter
david@fetter.org
1 attachment(s)

Folks,

Please find attached a patch for $Subject.

Motivation:

When people are doing keyset pagination, the simple cases redound to
adding a WHERE that looks like

(a, b, c) > (most_recent_a, most_recent_b, most_recent_c)

which corresponds to an ORDER BY clause that looks like

ORDER BY a, b, c

The fun starts when there are mixes of ASC and DESC in the ORDER BY
clause. Reverse collations make this simpler by inverting the meaning
of > (or similar), which makes the rowtypes still sortable in a new
dictionary order, so the pagination would look like:

(a, b, c) > (most_recent_a, most_recent_b COLLATE "C_backwards", most_recent_c)

with an ORDER BY like:

ORDER BY a, b DESC, c

What say?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachments:

v1-0001-Reverse-collations.patchtext/x-diff; charset=us-asciiDownload
From cd603421da0d8e4fc19401f24f46da8a26d88d3c Mon Sep 17 00:00:00 2001
From: David Fetter <david@fetter.org>
Date: Sat, 16 Nov 2019 09:29:14 -0800
Subject: [PATCH v1] Reverse collations
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.23.0"

This is a multi-part message in MIME format.
--------------2.23.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


Make it possible to define collations which reverse the usual meanings
of <, <=, >, and >= for their corresponding collations.

This in turn makes it easier to do keyset pagination on text with mixes
of ASC and DESC in the ORDER BY.

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 55694c4368..3086936515 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2106,6 +2106,13 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>Is the collation deterministic?</entry>
      </row>
 
+     <row>
+      <entry><structfield>collisreverse</structfield></entry>
+      <entry><type>bool</type></entry>
+      <entry></entry>
+      <entry>Is the collation reversed?</entry>
+     </row>
+
      <row>
       <entry><structfield>collencoding</structfield></entry>
       <entry><type>int4</type></entry>
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index def4dda6e8..cb913871b7 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -24,7 +24,8 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> (
     [ LC_CTYPE = <replaceable>lc_ctype</replaceable>, ]
     [ PROVIDER = <replaceable>provider</replaceable>, ]
     [ DETERMINISTIC = <replaceable>boolean</replaceable>, ]
-    [ VERSION = <replaceable>version</replaceable> ]
+    [ VERSION = <replaceable>version</replaceable>, ]
+    [ REVERSE = <replaceable>boolean</replaceable> ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replaceable>existing_collation</replaceable>
 </synopsis>
@@ -166,6 +167,16 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><literal>REVERSE</literal></term>
+
+     <listitem>
+      <para>
+       Specifies whether the collation should sort in reverse. Defaults to false.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry>
      <term><replaceable>existing_collation</replaceable></term>
 
@@ -225,6 +236,13 @@ CREATE COLLATION german_phonebook (provider = icu, locale = 'de-u-co-phonebk');
 </programlisting>
   </para>
 
+  <para>
+   To create a collation from the <literal>C</literal> locale that sorts backwards:
+<programlisting>
+CREATE COLLATION C_sdrawckab (locale = 'C', reverse = true);
+</programlisting>
+  </para>
+
   <para>
    To create a collation from an existing collation:
 <programlisting>
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index dd99d53547..6198f77f36 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -47,6 +47,7 @@ CollationCreate(const char *collname, Oid collnamespace,
 				Oid collowner,
 				char collprovider,
 				bool collisdeterministic,
+				bool collisreverse,
 				int32 collencoding,
 				const char *collcollate, const char *collctype,
 				const char *collversion,
@@ -162,6 +163,7 @@ CollationCreate(const char *collname, Oid collnamespace,
 	values[Anum_pg_collation_collowner - 1] = ObjectIdGetDatum(collowner);
 	values[Anum_pg_collation_collprovider - 1] = CharGetDatum(collprovider);
 	values[Anum_pg_collation_collisdeterministic - 1] = BoolGetDatum(collisdeterministic);
+	values[Anum_pg_collation_collisreverse - 1] = BoolGetDatum(collisreverse);
 	values[Anum_pg_collation_collencoding - 1] = Int32GetDatum(collencoding);
 	namestrcpy(&name_collate, collcollate);
 	values[Anum_pg_collation_collcollate - 1] = NameGetDatum(&name_collate);
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 919e092483..ab37705b27 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -60,11 +60,13 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	DefElem    *lcctypeEl = NULL;
 	DefElem    *providerEl = NULL;
 	DefElem    *deterministicEl = NULL;
+	DefElem    *reverseEl = NULL;
 	DefElem    *versionEl = NULL;
 	char	   *collcollate = NULL;
 	char	   *collctype = NULL;
 	char	   *collproviderstr = NULL;
 	bool		collisdeterministic = true;
+	bool		collisreverse = false;
 	int			collencoding = 0;
 	char		collprovider = 0;
 	char	   *collversion = NULL;
@@ -95,6 +97,8 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 			defelp = &providerEl;
 		else if (strcmp(defel->defname, "deterministic") == 0)
 			defelp = &deterministicEl;
+		else if (strcmp(defel->defname, "reverse") == 0)
+			defelp = &reverseEl;
 		else if (strcmp(defel->defname, "version") == 0)
 			defelp = &versionEl;
 		else
@@ -130,6 +134,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		collctype = pstrdup(NameStr(((Form_pg_collation) GETSTRUCT(tp))->collctype));
 		collprovider = ((Form_pg_collation) GETSTRUCT(tp))->collprovider;
 		collisdeterministic = ((Form_pg_collation) GETSTRUCT(tp))->collisdeterministic;
+		collisreverse = ((Form_pg_collation) GETSTRUCT(tp))->collisreverse;
 		collencoding = ((Form_pg_collation) GETSTRUCT(tp))->collencoding;
 
 		ReleaseSysCache(tp);
@@ -165,6 +170,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	if (deterministicEl)
 		collisdeterministic = defGetBoolean(deterministicEl);
 
+	if (reverseEl)
+		collisreverse = defGetBoolean(reverseEl);
+
 	if (versionEl)
 		collversion = defGetString(versionEl);
 
@@ -222,6 +230,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 							 GetUserId(),
 							 collprovider,
 							 collisdeterministic,
+							 collisreverse,
 							 collencoding,
 							 collcollate,
 							 collctype,
@@ -605,7 +614,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 			 * about existing ones.
 			 */
 			collid = CollationCreate(localebuf, nspid, GetUserId(),
-									 COLLPROVIDER_LIBC, true, enc,
+									 COLLPROVIDER_LIBC, true, false, enc,
 									 localebuf, localebuf,
 									 get_collation_actual_version(COLLPROVIDER_LIBC, localebuf),
 									 true, true);
@@ -666,7 +675,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 			int			enc = aliases[i].enc;
 
 			collid = CollationCreate(alias, nspid, GetUserId(),
-									 COLLPROVIDER_LIBC, true, enc,
+									 COLLPROVIDER_LIBC, true, false, enc,
 									 locale, locale,
 									 get_collation_actual_version(COLLPROVIDER_LIBC, locale),
 									 true, true);
@@ -728,7 +737,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 
 			collid = CollationCreate(psprintf("%s-x-icu", langtag),
 									 nspid, GetUserId(),
-									 COLLPROVIDER_ICU, true, -1,
+									 COLLPROVIDER_ICU, true, false, -1,
 									 collcollate, collcollate,
 									 get_collation_actual_version(COLLPROVIDER_ICU, collcollate),
 									 true, true);
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index fcdbaae37b..f9e47c8bff 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1155,7 +1155,8 @@ lookup_collation_cache(Oid collation, bool set_flags)
 		collcollate = NameStr(collform->collcollate);
 		collctype = NameStr(collform->collctype);
 
-		cache_entry->collate_is_c = ((strcmp(collcollate, "C") == 0) ||
+		cache_entry->collate_is_c = !collform->collisreverse &&
+									((strcmp(collcollate, "C") == 0) ||
 									 (strcmp(collcollate, "POSIX") == 0));
 		cache_entry->ctype_is_c = ((strcmp(collctype, "C") == 0) ||
 								   (strcmp(collctype, "POSIX") == 0));
@@ -1357,6 +1358,7 @@ pg_newlocale_from_collation(Oid collid)
 		memset(&result, 0, sizeof(result));
 		result.provider = collform->collprovider;
 		result.deterministic = collform->collisdeterministic;
+		result.reverse = collform->collisreverse;
 
 		if (collform->collprovider == COLLPROVIDER_LIBC)
 		{
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 69165eb311..02cbcbd23d 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -1603,6 +1603,9 @@ varstr_cmp(const char *arg1, int len1, const char *arg2, int len2, Oid collid)
 			if (a2p != a2buf)
 				pfree(a2p);
 
+			if (collid != DEFAULT_COLLATION_OID && mylocale->reverse)
+				INVERT_COMPARE_RESULT(result);
+
 			return result;
 		}
 #endif							/* WIN32 */
@@ -1681,6 +1684,9 @@ varstr_cmp(const char *arg1, int len1, const char *arg2, int len2, Oid collid)
 			(!mylocale || mylocale->deterministic))
 			result = strcmp(a1p, a2p);
 
+		if (collid != DEFAULT_COLLATION_OID && mylocale->reverse)
+			INVERT_COMPARE_RESULT(result);
+
 		if (a1p != a1buf)
 			pfree(a1p);
 		if (a2p != a2buf)
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 27602fa49c..61daf205d8 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -954,6 +954,22 @@ get_collation_isdeterministic(Oid colloid)
 	return result;
 }
 
+bool
+get_collation_isreverse(Oid colloid)
+{
+	HeapTuple	tp;
+	Form_pg_collation	colltup;
+	bool		result;
+
+	tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(colloid));
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for collation %u", colloid);
+	colltup = (Form_pg_collation) GETSTRUCT(tp);
+	result = colltup->collisreverse;
+	ReleaseSysCache(tp);
+	return result;
+}
+
 /*				---------- CONSTRAINT CACHE ----------					 */
 
 /*
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 88a261d9bd..33e357c300 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1712,8 +1712,8 @@ setup_collation(FILE *cmdfd)
 	 * in pg_collation.h.  But add it before reading system collations, so
 	 * that it wins if libc defines a locale named ucs_basic.
 	 */
-	PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, collcollate, collctype)"
-				   "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'ucs_basic', 'pg_catalog'::regnamespace, %u, '%c', true, %d, 'C', 'C');\n\n",
+	PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collisreverse, collencoding, collcollate, collctype)"
+				   "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'ucs_basic', 'pg_catalog'::regnamespace, %u, '%c', true, false, %d, 'C', 'C');\n\n",
 				   BOOTSTRAP_SUPERUSERID, COLLPROVIDER_LIBC, PG_UTF8);
 
 	/* Now import all collations we can find in the operating system */
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index bf69adc2f4..32801744b9 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -13466,6 +13466,7 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
 	PGresult   *res;
 	int			i_collprovider;
 	int			i_collisdeterministic;
+	int			i_collisreverse;
 	int			i_collcollate;
 	int			i_collctype;
 	const char *collprovider;
@@ -13501,6 +13502,13 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
 		appendPQExpBufferStr(query,
 							 "true AS collisdeterministic, ");
 
+	if (fout->remoteVersion >= 130000)
+		appendPQExpBufferStr(query,
+							 "collisreverse, ");
+	else
+		appendPQExpBufferStr(query,
+							 "false as collisreverse, ");
+
 	appendPQExpBuffer(query,
 					  "collcollate, "
 					  "collctype "
@@ -13512,6 +13520,7 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
 
 	i_collprovider = PQfnumber(res, "collprovider");
 	i_collisdeterministic = PQfnumber(res, "collisdeterministic");
+	i_collisreverse = PQfnumber(res, "collisreverse");
 	i_collcollate = PQfnumber(res, "collcollate");
 	i_collctype = PQfnumber(res, "collctype");
 
@@ -13540,6 +13549,9 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
 	if (strcmp(PQgetvalue(res, 0, i_collisdeterministic), "f") == 0)
 		appendPQExpBufferStr(q, ", deterministic = false");
 
+	if (strcmp(PQgetvalue(res, 0, i_collisreverse), "t") == 0)
+		appendPQExpBufferStr(q, ", reverse = true");
+
 	if (strcmp(collcollate, collctype) == 0)
 	{
 		appendPQExpBufferStr(q, ", locale = ");
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index b3b9313b36..d87aa96915 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4525,6 +4525,17 @@ listCollations(const char *pattern, bool verbose, bool showSystem)
 						  gettext_noop("yes"),
 						  gettext_noop("Deterministic?"));
 
+	if (pset.sversion >= 130000)
+		appendPQExpBuffer(&buf,
+						  ",\n       CASE WHEN c.collisreverse THEN '%s' ELSE '%s' END AS \"%s\"",
+						  gettext_noop("yes"), gettext_noop("no"),
+						  gettext_noop("Reverse?"));
+	else
+		appendPQExpBuffer(&buf,
+						  ",\n       '%s' AS \"%s\"",
+						  gettext_noop("no"),
+						  gettext_noop("Reverse?"));
+
 	if (verbose)
 		appendPQExpBuffer(&buf,
 						  ",\n       pg_catalog.obj_description(c.oid, 'pg_collation') AS \"%s\"",
diff --git a/src/include/catalog/pg_collation.h b/src/include/catalog/pg_collation.h
index d3366f361d..929d521d05 100644
--- a/src/include/catalog/pg_collation.h
+++ b/src/include/catalog/pg_collation.h
@@ -34,6 +34,7 @@ CATALOG(pg_collation,3456,CollationRelationId)
 	Oid			collowner;		/* owner of collation */
 	char		collprovider;	/* see constants below */
 	bool		collisdeterministic BKI_DEFAULT(t);
+	bool		collisreverse BKI_DEFAULT(f);
 	int32		collencoding;	/* encoding for this collation; -1 = "all" */
 	NameData	collcollate;	/* LC_COLLATE setting */
 	NameData	collctype;		/* LC_CTYPE setting */
@@ -63,6 +64,7 @@ extern Oid	CollationCreate(const char *collname, Oid collnamespace,
 							Oid collowner,
 							char collprovider,
 							bool collisdeterministic,
+							bool collisreverse,
 							int32 collencoding,
 							const char *collcollate, const char *collctype,
 							const char *collversion,
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index c8df5bff9f..90e1fbc33a 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -92,6 +92,7 @@ extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum,
 								  Oid *typid, int32 *typmod, Oid *collid);
 extern char *get_collation_name(Oid colloid);
 extern bool get_collation_isdeterministic(Oid colloid);
+extern bool get_collation_isreverse(Oid colloid);
 extern char *get_constraint_name(Oid conoid);
 extern char *get_language_name(Oid langoid, bool missing_ok);
 extern Oid	get_opclass_family(Oid opclass);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index b4b3aa5843..d571eab4c6 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -83,6 +83,7 @@ struct pg_locale_struct
 {
 	char		provider;
 	bool		deterministic;
+	bool		reverse;
 	union
 	{
 #ifdef HAVE_LOCALE_T
diff --git a/src/test/regress/expected/collate.out b/src/test/regress/expected/collate.out
index 0dee7d783a..2e51dde931 100644
--- a/src/test/regress/expected/collate.out
+++ b/src/test/regress/expected/collate.out
@@ -11,6 +11,7 @@
  */
 CREATE SCHEMA collate_tests;
 SET search_path = collate_tests;
+CREATE COLLATION "C_sdrawckab" (locale = 'C', reverse = true);
 CREATE TABLE collate_test1 (
     a int,
     b text COLLATE "C" NOT NULL
@@ -66,6 +67,14 @@ SELECT * FROM collate_test1 WHERE b COLLATE "C" >= 'abc' COLLATE "C";
  3 | bbc
 (2 rows)
 
+SELECT * FROM collate_test1 WHERE b COLLATE "C_sdrawckab" >= 'abc' COLLATE "C_sdrawckab";
+ a |  b  
+---+-----
+ 1 | abc
+ 2 | Abc
+ 4 | ABD
+(3 rows)
+
 SELECT * FROM collate_test1 WHERE b COLLATE "C" >= 'bbc' COLLATE "POSIX"; -- fail
 ERROR:  collation mismatch between explicit collations "C" and "POSIX"
 LINE 1: ...* FROM collate_test1 WHERE b COLLATE "C" >= 'bbc' COLLATE "P...
@@ -682,8 +691,9 @@ SELECT collation for ((SELECT b FROM collate_test1 LIMIT 1));
 -- must get rid of them.
 --
 DROP SCHEMA collate_tests CASCADE;
-NOTICE:  drop cascades to 17 other objects
-DETAIL:  drop cascades to table collate_test1
+NOTICE:  drop cascades to 18 other objects
+DETAIL:  drop cascades to collation "C_sdrawckab"
+drop cascades to table collate_test1
 drop cascades to table collate_test_like
 drop cascades to table collate_test2
 drop cascades to type testdomain_p
diff --git a/src/test/regress/sql/collate.sql b/src/test/regress/sql/collate.sql
index 89de26a227..1317ad4733 100644
--- a/src/test/regress/sql/collate.sql
+++ b/src/test/regress/sql/collate.sql
@@ -13,6 +13,8 @@
 CREATE SCHEMA collate_tests;
 SET search_path = collate_tests;
 
+CREATE COLLATION "C_sdrawckab" (locale = 'C', reverse = true);
+
 CREATE TABLE collate_test1 (
     a int,
     b text COLLATE "C" NOT NULL
@@ -42,6 +44,7 @@ INSERT INTO collate_test2 SELECT * FROM collate_test1;
 SELECT * FROM collate_test1 WHERE b COLLATE "C" >= 'abc';
 SELECT * FROM collate_test1 WHERE b >= 'abc' COLLATE "C";
 SELECT * FROM collate_test1 WHERE b COLLATE "C" >= 'abc' COLLATE "C";
+SELECT * FROM collate_test1 WHERE b COLLATE "C_sdrawckab" >= 'abc' COLLATE "C_sdrawckab";
 SELECT * FROM collate_test1 WHERE b COLLATE "C" >= 'bbc' COLLATE "POSIX"; -- fail
 
 CREATE DOMAIN testdomain_p AS text COLLATE "POSIX";

--------------2.23.0--


#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Fetter (#1)
Re: Reverse collations (initially for making keyset pagination cover more cases)

David Fetter <david@fetter.org> writes:

Please find attached a patch for $Subject.

I think there's a reason why this hasn't been proposed before.

Back before we had full support of ASC/DESC index sort order, there was
interest in having reverse-sort operator classes, and there are bits and
pieces still in the code that tried to cater to that. But we never got
it to the point where such things would really be pleasant to use.
Now that we have ASC/DESC indexes, there's no value in a reverse-sort
operator class, so the idea's pretty much been consigned to the dustbin.

This looks to me like it's trying to go down that same path at the
collation level, and it seems like just as bad of an idea here.

The fundamental problem with what you propose is that it'd require
a bunch of infrastructure (which you haven't even attempted) to teach
the planner about the relationships between forward- and reverse-sort
collation pairs, so that it could figure out that scanning some index
backwards would satisfy a request for the reverse-sort collation,
or vice versa. Without such infrastructure, the feature is really
just a gotcha, because queries won't get optimized the way users
would expect them to.

And no, I don't think we should accept the feature and then go write
that infrastructure. If we couldn't make it work well at the opclass
level, I don't think things will go better at the collation level.

Lastly, your proposed use-case has some attraction, but this proposal
only supports it if the column you need to be differently sorted is
textual. What if the sort columns are all numerics and timestamps?
Thinking about that, it seems like what we'd want is some sort of
more-general notion of row comparison, to express "bounded below in
an arbitrary ORDER BY ordering". Not quite sure what it ought to
look like.

regards, tom lane

#3Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#2)
Re: Reverse collations (initially for making keyset pagination cover more cases)

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> Lastly, your proposed use-case has some attraction, but this
Tom> proposal only supports it if the column you need to be differently
Tom> sorted is textual. What if the sort columns are all numerics and
Tom> timestamps?

There are already trivial ways to reverse the orders of those, viz.
(-number) and (-extract(epoch from timestampcol)). The lack of any
equivalent method for text is what prompted this idea.

Tom> Thinking about that, it seems like what we'd want is some sort of
Tom> more-general notion of row comparison, to express "bounded below
Tom> in an arbitrary ORDER BY ordering". Not quite sure what it ought
Tom> to look like.

Well, one obvious completely general method is to teach the planner
(somehow) to spot conditions of the form

(a > $1 OR (a = $1 AND b > $2) OR (a = $1 AND b = $2 AND c > $3) ...)

etc. and make them indexable if the sense of the > or < operator at
each step matched an ASC or DESC column in the index.

This would be a substantial win, because this kind of condition is one
often (incorrectly, for current PG) shown as an example of how to do
keyset pagination on multiple columns. But it would require some amount
of new logic in both the planner and, afaik, in the btree AM; I haven't
looked at how much.

--
Andrew (irc:RhodiumToad)

#4David Fetter
david@fetter.org
In reply to: Tom Lane (#2)
Re: Reverse collations (initially for making keyset pagination cover more cases)

On Sun, Nov 17, 2019 at 02:30:35PM -0500, Tom Lane wrote:

David Fetter <david@fetter.org> writes:

Please find attached a patch for $Subject.

I think there's a reason why this hasn't been proposed before.

Back before we had full support of ASC/DESC index sort order, there was
interest in having reverse-sort operator classes, and there are bits and
pieces still in the code that tried to cater to that. But we never got
it to the point where such things would really be pleasant to use.
Now that we have ASC/DESC indexes, there's no value in a reverse-sort
operator class, so the idea's pretty much been consigned to the dustbin.

This looks to me like it's trying to go down that same path at the
collation level, and it seems like just as bad of an idea here.

The fundamental problem with what you propose is that it'd require
a bunch of infrastructure (which you haven't even attempted) to teach
the planner about the relationships between forward- and reverse-sort
collation pairs, so that it could figure out that scanning some index
backwards would satisfy a request for the reverse-sort collation,
or vice versa. Without such infrastructure, the feature is really
just a gotcha, because queries won't get optimized the way users
would expect them to.

And no, I don't think we should accept the feature and then go write
that infrastructure. If we couldn't make it work well at the opclass
level, I don't think things will go better at the collation level.

Lastly, your proposed use-case has some attraction, but this proposal
only supports it if the column you need to be differently sorted is
textual. What if the sort columns are all numerics and timestamps?

Those are pretty straightforward to generate: -column, and
-extract('epoch' FROM column), respectively.

Thinking about that, it seems like what we'd want is some sort of
more-general notion of row comparison, to express "bounded below in
an arbitrary ORDER BY ordering". Not quite sure what it ought to
look like.

I'm not, either, but the one I'm proposing seems like a lot less
redundant code (and hence a lot less room for errors) than what people
generally see proposed for this use case, to wit:

(a, b, c) < ($1, $2 COLLATE "C_backwards", $3)
...
ORDER BY a, b DESC, c

as opposed to the "standard" way to do it

(a > $1) OR
(a = $1 AND b < $2) OR
(a = $1 AND b = $2 AND c > $3)
...
ORDER BY a, b DESC, c

which may not even get optimized correctly.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#5Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: David Fetter (#4)
1 attachment(s)
Re: Reverse collations (initially for making keyset pagination cover more cases)

"David" == David Fetter <david@fetter.org> writes:

First, in testing the patch I found there were indeed some missing
cases: the sortsupport version of the comparator needs to be fixed too.
I attach a draft addition to your patch, you should probably look at
adding test cases that need this to work.

David> (a, b, c) < ($1, $2 COLLATE "C_backwards", $3)
David> ...
David> ORDER BY a, b DESC, c

That would have to be:

WHERE (a, b COLLATE "C_backwards", c) < ($1, $2, $3)
...
ORDER BY a, b COLLATE "C_backwards", c

Adding the below patch to yours, I can get this on the regression test
db (note that this is a -O0 asserts build, timings may be slow relative
to a production build):

create collation "C_rev" ( LOCALE = "C", REVERSE = true );
create index on tenk1 (hundred, (stringu1::text collate "C_rev"), string4);

explain analyze
select hundred, stringu1::text, string4
from tenk1
where (hundred, stringu1::text COLLATE "C_rev", string4)

(10, 'WKAAAA', 'VVVVxx')

order by hundred, (stringu1::text collate "C_rev"), string4
limit 5;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.29..1.28 rows=5 width=132) (actual time=0.029..0.038 rows=5 loops=1)
-> Index Scan using tenk1_hundred_stringu1_string4_idx on tenk1 (cost=0.29..1768.49 rows=8900 width=132) (actual time=0.028..0.036 rows=5 loops=1)
Index Cond: (ROW(hundred, ((stringu1)::text)::text, string4) > ROW(10, 'WKAAAA'::text, 'VVVVxx'::name))
Planning Time: 0.225 ms
Execution Time: 0.072 ms
(5 rows)

and I checked the results, and they look correct now.

--
Andrew (irc:RhodiumToad)

Attachments:

revcoll_x.patchtext/x-patchDownload
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 02cbcbd23d..61ab9720c5 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -84,6 +84,7 @@ typedef struct
 	int			last_returned;	/* Last comparison result (cache) */
 	bool		cache_blob;		/* Does buf2 contain strxfrm() blob, etc? */
 	bool		collate_c;
+	bool		reverse;
 	Oid			typid;			/* Actual datatype (text/bpchar/bytea/name) */
 	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
 	hyperLogLogState full_card; /* Full key cardinality state */
@@ -2090,6 +2091,7 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
 		/* Initialize */
 		sss->last_returned = 0;
 		sss->locale = locale;
+		sss->reverse = (locale != 0) && locale->reverse;
 
 		/*
 		 * To avoid somehow confusing a strxfrm() blob and an original string,
@@ -2401,6 +2403,9 @@ varstrfastcmp_locale(char *a1p, int len1, char *a2p, int len2, SortSupport ssup)
 		(!sss->locale || sss->locale->deterministic))
 		result = strcmp(sss->buf1, sss->buf2);
 
+	if (sss->reverse)
+		INVERT_COMPARE_RESULT(result);
+
 	/* Cache result, perhaps saving an expensive strcoll() call next time */
 	sss->cache_blob = false;
 	sss->last_returned = result;
@@ -2663,6 +2668,13 @@ done:
 	 */
 	res = DatumBigEndianToNative(res);
 
+	/*
+	 * Account for reverse-ordering locales by flipping the bits. Note that
+	 * Datum is an unsigned type (uintptr_t).
+	 */
+	if (sss->reverse)
+		res ^= ~(Datum)0;
+
 	/* Don't leak memory here */
 	if (PointerGetDatum(authoritative) != original)
 		pfree(authoritative);
#6David Fetter
david@fetter.org
In reply to: Andrew Gierth (#5)
1 attachment(s)
Re: Reverse collations (initially for making keyset pagination cover more cases)

On Sun, Nov 17, 2019 at 11:23:23PM +0000, Andrew Gierth wrote:

"David" == David Fetter <david@fetter.org> writes:

First, in testing the patch I found there were indeed some missing
cases: the sortsupport version of the comparator needs to be fixed too.
I attach a draft addition to your patch, you should probably look at
adding test cases that need this to work.

David> (a, b, c) < ($1, $2 COLLATE "C_backwards", $3)
David> ...
David> ORDER BY a, b DESC, c

That would have to be:

WHERE (a, b COLLATE "C_backwards", c) < ($1, $2, $3)
...
ORDER BY a, b COLLATE "C_backwards", c

Adding the below patch to yours, I can get this on the regression test
db (note that this is a -O0 asserts build, timings may be slow relative
to a production build):

create collation "C_rev" ( LOCALE = "C", REVERSE = true );
create index on tenk1 (hundred, (stringu1::text collate "C_rev"), string4);

explain analyze
select hundred, stringu1::text, string4
from tenk1
where (hundred, stringu1::text COLLATE "C_rev", string4)

(10, 'WKAAAA', 'VVVVxx')

order by hundred, (stringu1::text collate "C_rev"), string4
limit 5;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.29..1.28 rows=5 width=132) (actual time=0.029..0.038 rows=5 loops=1)
-> Index Scan using tenk1_hundred_stringu1_string4_idx on tenk1 (cost=0.29..1768.49 rows=8900 width=132) (actual time=0.028..0.036 rows=5 loops=1)
Index Cond: (ROW(hundred, ((stringu1)::text)::text, string4) > ROW(10, 'WKAAAA'::text, 'VVVVxx'::name))
Planning Time: 0.225 ms
Execution Time: 0.072 ms
(5 rows)

and I checked the results, and they look correct now.

Here's that patch with your correction rolled in.

This will need more tests, and possibly more documentation.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachments:

v2-0001-Reverse-collations.patchtext/x-diff; charset=us-asciiDownload
From 2096ea540a23c60b242e9f9c058d25e0d075f20a Mon Sep 17 00:00:00 2001
From: David Fetter <david@fetter.org>
Date: Sat, 16 Nov 2019 09:29:14 -0800
Subject: [PATCH v2] Reverse collations
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.23.0"

This is a multi-part message in MIME format.
--------------2.23.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


Make it possible to define collations which reverse the usual meanings
of <, <=, >, and >= for their corresponding collations.

This in turn makes it easier to do keyset pagination on text with mixes
of ASC and DESC in the ORDER BY.

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 55694c4368..3086936515 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2106,6 +2106,13 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>Is the collation deterministic?</entry>
      </row>
 
+     <row>
+      <entry><structfield>collisreverse</structfield></entry>
+      <entry><type>bool</type></entry>
+      <entry></entry>
+      <entry>Is the collation reversed?</entry>
+     </row>
+
      <row>
       <entry><structfield>collencoding</structfield></entry>
       <entry><type>int4</type></entry>
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index def4dda6e8..cb913871b7 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -24,7 +24,8 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> (
     [ LC_CTYPE = <replaceable>lc_ctype</replaceable>, ]
     [ PROVIDER = <replaceable>provider</replaceable>, ]
     [ DETERMINISTIC = <replaceable>boolean</replaceable>, ]
-    [ VERSION = <replaceable>version</replaceable> ]
+    [ VERSION = <replaceable>version</replaceable>, ]
+    [ REVERSE = <replaceable>boolean</replaceable> ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replaceable>existing_collation</replaceable>
 </synopsis>
@@ -166,6 +167,16 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><literal>REVERSE</literal></term>
+
+     <listitem>
+      <para>
+       Specifies whether the collation should sort in reverse. Defaults to false.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry>
      <term><replaceable>existing_collation</replaceable></term>
 
@@ -225,6 +236,13 @@ CREATE COLLATION german_phonebook (provider = icu, locale = 'de-u-co-phonebk');
 </programlisting>
   </para>
 
+  <para>
+   To create a collation from the <literal>C</literal> locale that sorts backwards:
+<programlisting>
+CREATE COLLATION C_sdrawckab (locale = 'C', reverse = true);
+</programlisting>
+  </para>
+
   <para>
    To create a collation from an existing collation:
 <programlisting>
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index dd99d53547..6198f77f36 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -47,6 +47,7 @@ CollationCreate(const char *collname, Oid collnamespace,
 				Oid collowner,
 				char collprovider,
 				bool collisdeterministic,
+				bool collisreverse,
 				int32 collencoding,
 				const char *collcollate, const char *collctype,
 				const char *collversion,
@@ -162,6 +163,7 @@ CollationCreate(const char *collname, Oid collnamespace,
 	values[Anum_pg_collation_collowner - 1] = ObjectIdGetDatum(collowner);
 	values[Anum_pg_collation_collprovider - 1] = CharGetDatum(collprovider);
 	values[Anum_pg_collation_collisdeterministic - 1] = BoolGetDatum(collisdeterministic);
+	values[Anum_pg_collation_collisreverse - 1] = BoolGetDatum(collisreverse);
 	values[Anum_pg_collation_collencoding - 1] = Int32GetDatum(collencoding);
 	namestrcpy(&name_collate, collcollate);
 	values[Anum_pg_collation_collcollate - 1] = NameGetDatum(&name_collate);
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 919e092483..ab37705b27 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -60,11 +60,13 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	DefElem    *lcctypeEl = NULL;
 	DefElem    *providerEl = NULL;
 	DefElem    *deterministicEl = NULL;
+	DefElem    *reverseEl = NULL;
 	DefElem    *versionEl = NULL;
 	char	   *collcollate = NULL;
 	char	   *collctype = NULL;
 	char	   *collproviderstr = NULL;
 	bool		collisdeterministic = true;
+	bool		collisreverse = false;
 	int			collencoding = 0;
 	char		collprovider = 0;
 	char	   *collversion = NULL;
@@ -95,6 +97,8 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 			defelp = &providerEl;
 		else if (strcmp(defel->defname, "deterministic") == 0)
 			defelp = &deterministicEl;
+		else if (strcmp(defel->defname, "reverse") == 0)
+			defelp = &reverseEl;
 		else if (strcmp(defel->defname, "version") == 0)
 			defelp = &versionEl;
 		else
@@ -130,6 +134,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		collctype = pstrdup(NameStr(((Form_pg_collation) GETSTRUCT(tp))->collctype));
 		collprovider = ((Form_pg_collation) GETSTRUCT(tp))->collprovider;
 		collisdeterministic = ((Form_pg_collation) GETSTRUCT(tp))->collisdeterministic;
+		collisreverse = ((Form_pg_collation) GETSTRUCT(tp))->collisreverse;
 		collencoding = ((Form_pg_collation) GETSTRUCT(tp))->collencoding;
 
 		ReleaseSysCache(tp);
@@ -165,6 +170,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	if (deterministicEl)
 		collisdeterministic = defGetBoolean(deterministicEl);
 
+	if (reverseEl)
+		collisreverse = defGetBoolean(reverseEl);
+
 	if (versionEl)
 		collversion = defGetString(versionEl);
 
@@ -222,6 +230,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 							 GetUserId(),
 							 collprovider,
 							 collisdeterministic,
+							 collisreverse,
 							 collencoding,
 							 collcollate,
 							 collctype,
@@ -605,7 +614,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 			 * about existing ones.
 			 */
 			collid = CollationCreate(localebuf, nspid, GetUserId(),
-									 COLLPROVIDER_LIBC, true, enc,
+									 COLLPROVIDER_LIBC, true, false, enc,
 									 localebuf, localebuf,
 									 get_collation_actual_version(COLLPROVIDER_LIBC, localebuf),
 									 true, true);
@@ -666,7 +675,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 			int			enc = aliases[i].enc;
 
 			collid = CollationCreate(alias, nspid, GetUserId(),
-									 COLLPROVIDER_LIBC, true, enc,
+									 COLLPROVIDER_LIBC, true, false, enc,
 									 locale, locale,
 									 get_collation_actual_version(COLLPROVIDER_LIBC, locale),
 									 true, true);
@@ -728,7 +737,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 
 			collid = CollationCreate(psprintf("%s-x-icu", langtag),
 									 nspid, GetUserId(),
-									 COLLPROVIDER_ICU, true, -1,
+									 COLLPROVIDER_ICU, true, false, -1,
 									 collcollate, collcollate,
 									 get_collation_actual_version(COLLPROVIDER_ICU, collcollate),
 									 true, true);
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index fcdbaae37b..f9e47c8bff 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1155,7 +1155,8 @@ lookup_collation_cache(Oid collation, bool set_flags)
 		collcollate = NameStr(collform->collcollate);
 		collctype = NameStr(collform->collctype);
 
-		cache_entry->collate_is_c = ((strcmp(collcollate, "C") == 0) ||
+		cache_entry->collate_is_c = !collform->collisreverse &&
+									((strcmp(collcollate, "C") == 0) ||
 									 (strcmp(collcollate, "POSIX") == 0));
 		cache_entry->ctype_is_c = ((strcmp(collctype, "C") == 0) ||
 								   (strcmp(collctype, "POSIX") == 0));
@@ -1357,6 +1358,7 @@ pg_newlocale_from_collation(Oid collid)
 		memset(&result, 0, sizeof(result));
 		result.provider = collform->collprovider;
 		result.deterministic = collform->collisdeterministic;
+		result.reverse = collform->collisreverse;
 
 		if (collform->collprovider == COLLPROVIDER_LIBC)
 		{
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 69165eb311..61ab9720c5 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -84,6 +84,7 @@ typedef struct
 	int			last_returned;	/* Last comparison result (cache) */
 	bool		cache_blob;		/* Does buf2 contain strxfrm() blob, etc? */
 	bool		collate_c;
+	bool		reverse;
 	Oid			typid;			/* Actual datatype (text/bpchar/bytea/name) */
 	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
 	hyperLogLogState full_card; /* Full key cardinality state */
@@ -1603,6 +1604,9 @@ varstr_cmp(const char *arg1, int len1, const char *arg2, int len2, Oid collid)
 			if (a2p != a2buf)
 				pfree(a2p);
 
+			if (collid != DEFAULT_COLLATION_OID && mylocale->reverse)
+				INVERT_COMPARE_RESULT(result);
+
 			return result;
 		}
 #endif							/* WIN32 */
@@ -1681,6 +1685,9 @@ varstr_cmp(const char *arg1, int len1, const char *arg2, int len2, Oid collid)
 			(!mylocale || mylocale->deterministic))
 			result = strcmp(a1p, a2p);
 
+		if (collid != DEFAULT_COLLATION_OID && mylocale->reverse)
+			INVERT_COMPARE_RESULT(result);
+
 		if (a1p != a1buf)
 			pfree(a1p);
 		if (a2p != a2buf)
@@ -2084,6 +2091,7 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
 		/* Initialize */
 		sss->last_returned = 0;
 		sss->locale = locale;
+		sss->reverse = (locale != 0) && locale->reverse;
 
 		/*
 		 * To avoid somehow confusing a strxfrm() blob and an original string,
@@ -2395,6 +2403,9 @@ varstrfastcmp_locale(char *a1p, int len1, char *a2p, int len2, SortSupport ssup)
 		(!sss->locale || sss->locale->deterministic))
 		result = strcmp(sss->buf1, sss->buf2);
 
+	if (sss->reverse)
+		INVERT_COMPARE_RESULT(result);
+
 	/* Cache result, perhaps saving an expensive strcoll() call next time */
 	sss->cache_blob = false;
 	sss->last_returned = result;
@@ -2657,6 +2668,13 @@ done:
 	 */
 	res = DatumBigEndianToNative(res);
 
+	/*
+	 * Account for reverse-ordering locales by flipping the bits. Note that
+	 * Datum is an unsigned type (uintptr_t).
+	 */
+	if (sss->reverse)
+		res ^= ~(Datum)0;
+
 	/* Don't leak memory here */
 	if (PointerGetDatum(authoritative) != original)
 		pfree(authoritative);
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 27602fa49c..61daf205d8 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -954,6 +954,22 @@ get_collation_isdeterministic(Oid colloid)
 	return result;
 }
 
+bool
+get_collation_isreverse(Oid colloid)
+{
+	HeapTuple	tp;
+	Form_pg_collation	colltup;
+	bool		result;
+
+	tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(colloid));
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for collation %u", colloid);
+	colltup = (Form_pg_collation) GETSTRUCT(tp);
+	result = colltup->collisreverse;
+	ReleaseSysCache(tp);
+	return result;
+}
+
 /*				---------- CONSTRAINT CACHE ----------					 */
 
 /*
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 88a261d9bd..33e357c300 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1712,8 +1712,8 @@ setup_collation(FILE *cmdfd)
 	 * in pg_collation.h.  But add it before reading system collations, so
 	 * that it wins if libc defines a locale named ucs_basic.
 	 */
-	PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, collcollate, collctype)"
-				   "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'ucs_basic', 'pg_catalog'::regnamespace, %u, '%c', true, %d, 'C', 'C');\n\n",
+	PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collisreverse, collencoding, collcollate, collctype)"
+				   "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'ucs_basic', 'pg_catalog'::regnamespace, %u, '%c', true, false, %d, 'C', 'C');\n\n",
 				   BOOTSTRAP_SUPERUSERID, COLLPROVIDER_LIBC, PG_UTF8);
 
 	/* Now import all collations we can find in the operating system */
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index bf69adc2f4..32801744b9 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -13466,6 +13466,7 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
 	PGresult   *res;
 	int			i_collprovider;
 	int			i_collisdeterministic;
+	int			i_collisreverse;
 	int			i_collcollate;
 	int			i_collctype;
 	const char *collprovider;
@@ -13501,6 +13502,13 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
 		appendPQExpBufferStr(query,
 							 "true AS collisdeterministic, ");
 
+	if (fout->remoteVersion >= 130000)
+		appendPQExpBufferStr(query,
+							 "collisreverse, ");
+	else
+		appendPQExpBufferStr(query,
+							 "false as collisreverse, ");
+
 	appendPQExpBuffer(query,
 					  "collcollate, "
 					  "collctype "
@@ -13512,6 +13520,7 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
 
 	i_collprovider = PQfnumber(res, "collprovider");
 	i_collisdeterministic = PQfnumber(res, "collisdeterministic");
+	i_collisreverse = PQfnumber(res, "collisreverse");
 	i_collcollate = PQfnumber(res, "collcollate");
 	i_collctype = PQfnumber(res, "collctype");
 
@@ -13540,6 +13549,9 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
 	if (strcmp(PQgetvalue(res, 0, i_collisdeterministic), "f") == 0)
 		appendPQExpBufferStr(q, ", deterministic = false");
 
+	if (strcmp(PQgetvalue(res, 0, i_collisreverse), "t") == 0)
+		appendPQExpBufferStr(q, ", reverse = true");
+
 	if (strcmp(collcollate, collctype) == 0)
 	{
 		appendPQExpBufferStr(q, ", locale = ");
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index b3b9313b36..d87aa96915 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4525,6 +4525,17 @@ listCollations(const char *pattern, bool verbose, bool showSystem)
 						  gettext_noop("yes"),
 						  gettext_noop("Deterministic?"));
 
+	if (pset.sversion >= 130000)
+		appendPQExpBuffer(&buf,
+						  ",\n       CASE WHEN c.collisreverse THEN '%s' ELSE '%s' END AS \"%s\"",
+						  gettext_noop("yes"), gettext_noop("no"),
+						  gettext_noop("Reverse?"));
+	else
+		appendPQExpBuffer(&buf,
+						  ",\n       '%s' AS \"%s\"",
+						  gettext_noop("no"),
+						  gettext_noop("Reverse?"));
+
 	if (verbose)
 		appendPQExpBuffer(&buf,
 						  ",\n       pg_catalog.obj_description(c.oid, 'pg_collation') AS \"%s\"",
diff --git a/src/include/catalog/pg_collation.h b/src/include/catalog/pg_collation.h
index d3366f361d..929d521d05 100644
--- a/src/include/catalog/pg_collation.h
+++ b/src/include/catalog/pg_collation.h
@@ -34,6 +34,7 @@ CATALOG(pg_collation,3456,CollationRelationId)
 	Oid			collowner;		/* owner of collation */
 	char		collprovider;	/* see constants below */
 	bool		collisdeterministic BKI_DEFAULT(t);
+	bool		collisreverse BKI_DEFAULT(f);
 	int32		collencoding;	/* encoding for this collation; -1 = "all" */
 	NameData	collcollate;	/* LC_COLLATE setting */
 	NameData	collctype;		/* LC_CTYPE setting */
@@ -63,6 +64,7 @@ extern Oid	CollationCreate(const char *collname, Oid collnamespace,
 							Oid collowner,
 							char collprovider,
 							bool collisdeterministic,
+							bool collisreverse,
 							int32 collencoding,
 							const char *collcollate, const char *collctype,
 							const char *collversion,
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index c8df5bff9f..90e1fbc33a 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -92,6 +92,7 @@ extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum,
 								  Oid *typid, int32 *typmod, Oid *collid);
 extern char *get_collation_name(Oid colloid);
 extern bool get_collation_isdeterministic(Oid colloid);
+extern bool get_collation_isreverse(Oid colloid);
 extern char *get_constraint_name(Oid conoid);
 extern char *get_language_name(Oid langoid, bool missing_ok);
 extern Oid	get_opclass_family(Oid opclass);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index b4b3aa5843..d571eab4c6 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -83,6 +83,7 @@ struct pg_locale_struct
 {
 	char		provider;
 	bool		deterministic;
+	bool		reverse;
 	union
 	{
 #ifdef HAVE_LOCALE_T
diff --git a/src/test/regress/expected/collate.out b/src/test/regress/expected/collate.out
index 0dee7d783a..2e51dde931 100644
--- a/src/test/regress/expected/collate.out
+++ b/src/test/regress/expected/collate.out
@@ -11,6 +11,7 @@
  */
 CREATE SCHEMA collate_tests;
 SET search_path = collate_tests;
+CREATE COLLATION "C_sdrawckab" (locale = 'C', reverse = true);
 CREATE TABLE collate_test1 (
     a int,
     b text COLLATE "C" NOT NULL
@@ -66,6 +67,14 @@ SELECT * FROM collate_test1 WHERE b COLLATE "C" >= 'abc' COLLATE "C";
  3 | bbc
 (2 rows)
 
+SELECT * FROM collate_test1 WHERE b COLLATE "C_sdrawckab" >= 'abc' COLLATE "C_sdrawckab";
+ a |  b  
+---+-----
+ 1 | abc
+ 2 | Abc
+ 4 | ABD
+(3 rows)
+
 SELECT * FROM collate_test1 WHERE b COLLATE "C" >= 'bbc' COLLATE "POSIX"; -- fail
 ERROR:  collation mismatch between explicit collations "C" and "POSIX"
 LINE 1: ...* FROM collate_test1 WHERE b COLLATE "C" >= 'bbc' COLLATE "P...
@@ -682,8 +691,9 @@ SELECT collation for ((SELECT b FROM collate_test1 LIMIT 1));
 -- must get rid of them.
 --
 DROP SCHEMA collate_tests CASCADE;
-NOTICE:  drop cascades to 17 other objects
-DETAIL:  drop cascades to table collate_test1
+NOTICE:  drop cascades to 18 other objects
+DETAIL:  drop cascades to collation "C_sdrawckab"
+drop cascades to table collate_test1
 drop cascades to table collate_test_like
 drop cascades to table collate_test2
 drop cascades to type testdomain_p
diff --git a/src/test/regress/sql/collate.sql b/src/test/regress/sql/collate.sql
index 89de26a227..1317ad4733 100644
--- a/src/test/regress/sql/collate.sql
+++ b/src/test/regress/sql/collate.sql
@@ -13,6 +13,8 @@
 CREATE SCHEMA collate_tests;
 SET search_path = collate_tests;
 
+CREATE COLLATION "C_sdrawckab" (locale = 'C', reverse = true);
+
 CREATE TABLE collate_test1 (
     a int,
     b text COLLATE "C" NOT NULL
@@ -42,6 +44,7 @@ INSERT INTO collate_test2 SELECT * FROM collate_test1;
 SELECT * FROM collate_test1 WHERE b COLLATE "C" >= 'abc';
 SELECT * FROM collate_test1 WHERE b >= 'abc' COLLATE "C";
 SELECT * FROM collate_test1 WHERE b COLLATE "C" >= 'abc' COLLATE "C";
+SELECT * FROM collate_test1 WHERE b COLLATE "C_sdrawckab" >= 'abc' COLLATE "C_sdrawckab";
 SELECT * FROM collate_test1 WHERE b COLLATE "C" >= 'bbc' COLLATE "POSIX"; -- fail
 
 CREATE DOMAIN testdomain_p AS text COLLATE "POSIX";

--------------2.23.0--


#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#3)
Re: Reverse collations (initially for making keyset pagination cover more cases)

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Lastly, your proposed use-case has some attraction, but this
Tom> proposal only supports it if the column you need to be differently
Tom> sorted is textual. What if the sort columns are all numerics and
Tom> timestamps?

There are already trivial ways to reverse the orders of those, viz.
(-number) and (-extract(epoch from timestampcol)). The lack of any
equivalent method for text is what prompted this idea.

Those "trivial ways" have failure cases, eg with INT_MIN. I don't buy
that this argument justifies introducing a kluge into text comparison.

Tom> Thinking about that, it seems like what we'd want is some sort of
Tom> more-general notion of row comparison, to express "bounded below
Tom> in an arbitrary ORDER BY ordering". Not quite sure what it ought
Tom> to look like.

Well, one obvious completely general method is to teach the planner
(somehow) to spot conditions of the form
(a > $1 OR (a = $1 AND b > $2) OR (a = $1 AND b = $2 AND c > $3) ...)
etc. and make them indexable if the sense of the > or < operator at
each step matched an ASC or DESC column in the index.

I think really the only attraction of that is that it could be argued
to be standard --- but I rather doubt that it's common for DBMSes to
recognize such things. It'd certainly be a royal pain in the rear
both to implement and to use, at least for more than about two sort
columns.

Back at
/messages/by-id/10492.1531515255@sss.pgh.pa.us
I proposed that we might consider allowing row comparisons to specify
an explicit list of operators:

One idea for resolving that is to extend the OPERATOR syntax to allow
multiple operator names for row comparisons, along the lines of
ROW(a,b) OPERATOR(pg_catalog.<, public.<) ROW(c,d)

I wonder whether it'd be feasible to solve this problem by doing that
and then allowing the operators to be of different comparison types,
that is "ROW(a,b) OPERATOR(<, >) ROW(c,d)". The semantics would be
that the first not-equal column pair determines the result according
to the relevant operator. But I'm not quite sure what to do if the
rows are in fact equal --- if some of the operators are like "<" and
some are like "<=", what should the result be? Maybe let the last
column's operator decide that?

regards, tom lane

#8Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#7)
Re: Reverse collations (initially for making keyset pagination cover more cases)

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Well, one obvious completely general method is to teach the planner
(somehow) to spot conditions of the form
(a > $1 OR (a = $1 AND b > $2) OR (a = $1 AND b = $2 AND c > $3) ...)
etc. and make them indexable if the sense of the > or < operator at
each step matched an ASC or DESC column in the index.

Tom> I think really the only attraction of that is that it could be
Tom> argued to be standard --- but I rather doubt that it's common for
Tom> DBMSes to recognize such things.

At least MSSQL can recognize that a query with

WHERE (a > @a OR (a = @a AND b > @b)) ORDER BY a,b

can be satisfied with an ordered index scan on an (a,b) index and no
sort, which is good enough for pagination queries. Haven't confirmed
yes/no for any other databases yet.

(As an aside, if you try and do that in PG using UNION ALL in place of
the OR, to try and get a mergeappend of two index scans, it doesn't work
well because of how we discard redundant pathkeys; you end up with Sort
nodes in the plan.)

Tom> It'd certainly be a royal pain in the rear both to implement and
Tom> to use, at least for more than about two sort columns.

For pagination one or two columns seems most likely, but in any event
the query can be generated mechanically if need be.

--
Andrew (irc:RhodiumToad)