Conflict of implicit collations doesn't propagate out of subqueries

Started by Markus Winandover 5 years ago3 messages
#1Markus Winand
markus.winand@winand.at
1 attachment(s)

Hi!

I think I’ve found a bug related to the strength of collations. Attached is a
WIP patch, that address some other issues too.

In this this example, the conflict of implicit collations propagates correctly:

WITH data (c, posix) AS (
values ('a' COLLATE "C", 'b' COLLATE "POSIX")
)
SELECT *
FROM data
ORDER BY ( c || posix ) || posix

ERROR: collation mismatch between implicit collations "C" and "POSIX"
LINE 6: ORDER BY ( c || posix ) || posix;
^
HINT: You can choose the collation by applying the COLLATE clause to one or both expressions.

However, if the conflict happens in a subquery, it doesn’t anymore:

WITH data (c, posix) AS (
values ('a' COLLATE "C", 'b' COLLATE "POSIX")
)
SELECT *
FROM (SELECT *, c || posix AS none FROM data) data
ORDER BY none || posix;

c | posix | none
---+-------+------
a | b | ab
(1 row)

The problem is in parse_collate.c:566: A Var (and some other nodes) without
valid collation gets the strength COLLATE_NONE:

if (OidIsValid(collation))
strength = COLLATE_IMPLICIT;
else
strength = COLLATE_NONE;

However, for a Var it should be COLLATE_CONFLICT, which corresponds to the
standards collation derivation “none”. I guess there could be other similar
cases which I don’t know about.

The patch fixes that, plus necessary consequences (error handling for new
scenarios) as well as some “cosmetic” improvements I found along the way.

Unnecessary to mention that this patch might break existing code.

With the patch, the second example form above fails similarly to the first example:

ERROR: collation mismatch between implicit collation "POSIX" and unknown collation
LINE 6: ORDER BY none || posix;
^
HINT: You can choose the collation by applying the COLLATE clause to one or both expressions.

Other changes in the patch:

** Error Handling

The current parse time error handling of implicit collisions has always both
colliding collations. The patch allows a COLLATE_CONFLICT without knowing which
collations caused the conflict (it’s not stored in the Var node). Thus we may
have know two, one or none of the collations that contributed to the collision.

The new function ereport_implicit_collation_mismatch takes care of that.

** Renaming COLLATE_NONE

I found the name COLLATE_NONE a little bit unfortuante as it can easily be
mistaken for the collation derivation “none” the SQL standard uses. I renamed
it to COLLATE_NA for “not applicable”. The standard doesn’t have a word for
that as non character string types just don’t have collations.

** Removal of location2

The code keeps track of the location (for parser_errposition) of both
collations that collided, even though it reports only one of them. The patch
removes location2.

Due to this, the some errors report the other location as before (previously
the second, now the first). See collate.out in the patch. The patch has TODO
comments for code that would be needed to take the other one.

Is there any policy to report the second location or to strictly keep the
errors where they used to be?

** General cleanup

The patch also removes a little code that I think is not needed (anymore).

** Tests

Besides the errposition, only one existing test breaks. It previously triggered
a runtime error, now it triggers a parse time error:

         SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM collate_test2 ORDER BY 2; -- fail
        -ERROR:  could not determine which collation to use for string comparison
        -HINT:  Use the COLLATE clause to set the collation explicitly.
        +ERROR:  collation mismatch between implicit collations
        +LINE 1: SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM co...
        +                  ^
        +HINT:  Use the COLLATE clause to set the collation explicitly

The patch also adds a new test to trigger the problem in the first place.

The patch is against REL_12_STABLE but applies to master too.

-markus

Attachments:

WIP_collation_strength_conflict_for_Var_v1.patchapplication/octet-stream; name=WIP_collation_strength_conflict_for_Var_v1.patch; x-unix-mode=0644Download
diff --git a/src/backend/parser/parse_collate.c b/src/backend/parser/parse_collate.c
index 4d2e586bd6..d67ccebe2d 100644
--- a/src/backend/parser/parse_collate.c
+++ b/src/backend/parser/parse_collate.c
@@ -50,12 +50,13 @@
 
 /*
  * Collation strength (the SQL standard calls this "derivation").  Order is
- * chosen to allow comparisons to work usefully.  Note: the standard doesn't
- * seem to distinguish between NONE and CONFLICT.
+ * chosen to allow comparisons to work usefully in context of the SQL standard
+ * clause "Result of data type combinations". COLLATE_CONFLICT corresponds
+ * to the standard's derivation "none".
  */
 typedef enum
 {
-	COLLATE_NONE,				/* expression is of a noncollatable datatype */
+	COLLATE_NA,				    /* expression is of a noncollatable datatype */
 	COLLATE_IMPLICIT,			/* collation was derived implicitly */
 	COLLATE_CONFLICT,			/* we had a conflict of implicit collations */
 	COLLATE_EXPLICIT			/* collation was derived explicitly */
@@ -67,9 +68,14 @@ typedef struct
 	Oid			collation;		/* OID of current collation, if any */
 	CollateStrength strength;	/* strength of current collation choice */
 	int			location;		/* location of expr that set collation */
-	/* Remaining fields are only valid when strength == COLLATE_CONFLICT */
+	/*
+	 * If two known implicit collations conflict (strength == COLLATE_CONFLICT),
+	 * the following is set for error reporting.
+	 * Note: Implicit collations can also conflict with COLLATE_CONFLICT. In that
+	 * case collation2 == InvalidOid. Additionally, the collation above
+	 * could also be InvalidOid.
+	 */
 	Oid			collation2;		/* OID of conflicting collation */
-	int			location2;		/* location of expr that set collation2 */
 } assign_collations_context;
 
 static bool assign_query_collations_walker(Node *node, ParseState *pstate);
@@ -79,7 +85,6 @@ static void merge_collation_state(Oid collation,
 								  CollateStrength strength,
 								  int location,
 								  Oid collation2,
-								  int location2,
 								  assign_collations_context *context);
 static void assign_aggregate_collations(Aggref *aggref,
 										assign_collations_context *loccontext);
@@ -181,13 +186,46 @@ assign_expr_collations(ParseState *pstate, Node *expr)
 	/* initialize context for tree walk */
 	context.pstate = pstate;
 	context.collation = InvalidOid;
-	context.strength = COLLATE_NONE;
+	context.strength = COLLATE_NA;
 	context.location = -1;
 
 	/* and away we go */
 	(void) assign_collations_walker(expr, &context);
 }
 
+/*
+ * ereport_implicit_collation_mismatch()
+ *		Report mismatch with best available details
+ *
+ * COLLATE_CONFLICT can happen without knowing the conflicting collations (when
+ * it comes from a Var). Thus, the available details for reporting may vary.
+ */
+static void
+ereport_implicit_collation_mismatch(const assign_collations_context* const context)
+{
+	if (context->collation != InvalidOid && context->collation2 != InvalidOid)
+		ereport(ERROR,
+				(errcode(ERRCODE_COLLATION_MISMATCH),
+				 errmsg("collation mismatch between implicit collations \"%s\" and \"%s\"",
+						get_collation_name(context->collation),
+						get_collation_name(context->collation2)),
+				 errhint("You can choose the collation by applying the COLLATE clause to one or both expressions."),
+				 parser_errposition(context->pstate, context->location)));
+	else if (context->collation != InvalidOid)
+		ereport(ERROR,
+				(errcode(ERRCODE_COLLATION_MISMATCH),
+				 errmsg("collation mismatch between implicit collation \"%s\" and unknown collation",
+						get_collation_name(context->collation)),
+				 errhint("You can choose the collation by applying the COLLATE clause to one or both expressions."),
+				 parser_errposition(context->pstate, context->location)));
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_COLLATION_MISMATCH),
+				 errmsg("collation mismatch between implicit collations"),
+				 errhint("Use the COLLATE clause to set the collation explicitly"),
+				 parser_errposition(context->pstate, context->location)));
+}
+
 /*
  * select_common_collation()
  *		Identify a common collation for a list of expressions.
@@ -212,7 +250,7 @@ select_common_collation(ParseState *pstate, List *exprs, bool none_ok)
 	/* initialize context for tree walk */
 	context.pstate = pstate;
 	context.collation = InvalidOid;
-	context.strength = COLLATE_NONE;
+	context.strength = COLLATE_NA;
 	context.location = -1;
 
 	/* and away we go */
@@ -223,17 +261,11 @@ select_common_collation(ParseState *pstate, List *exprs, bool none_ok)
 	{
 		if (none_ok)
 			return InvalidOid;
-		ereport(ERROR,
-				(errcode(ERRCODE_COLLATION_MISMATCH),
-				 errmsg("collation mismatch between implicit collations \"%s\" and \"%s\"",
-						get_collation_name(context.collation),
-						get_collation_name(context.collation2)),
-				 errhint("You can choose the collation by applying the COLLATE clause to one or both expressions."),
-				 parser_errposition(context.pstate, context.location2)));
+		ereport_implicit_collation_mismatch(&context);
 	}
 
 	/*
-	 * Note: if strength is still COLLATE_NONE, we'll return InvalidOid, but
+	 * Note: if strength is still COLLATE_NA, we'll return InvalidOid, but
 	 * that's okay because it must mean none of the expressions returned
 	 * collatable datatypes.
 	 */
@@ -257,7 +289,7 @@ assign_collations_walker(Node *node, assign_collations_context *context)
 	assign_collations_context loccontext;
 	Oid			collation;
 	CollateStrength strength;
-	int			location;
+	int			location = -1;
 
 	/* Need do nothing for empty subexpressions */
 	if (node == NULL)
@@ -270,11 +302,8 @@ assign_collations_walker(Node *node, assign_collations_context *context)
 	 */
 	loccontext.pstate = context->pstate;
 	loccontext.collation = InvalidOid;
-	loccontext.strength = COLLATE_NONE;
+	loccontext.strength = COLLATE_NA;
 	loccontext.location = -1;
-	/* Set these fields just to suppress uninitialized-value warnings: */
-	loccontext.collation2 = InvalidOid;
-	loccontext.location2 = -1;
 
 	/*
 	 * Recurse if appropriate, then determine the collation for this node.
@@ -331,7 +360,7 @@ assign_collations_walker(Node *node, assign_collations_context *context)
 				{
 					/* Node's result type isn't collatable. */
 					collation = InvalidOid;
-					strength = COLLATE_NONE;
+					strength = COLLATE_NA;
 					location = -1;	/* won't be used */
 				}
 			}
@@ -427,7 +456,7 @@ assign_collations_walker(Node *node, assign_collations_context *context)
 				{
 					/* Node's result type isn't collatable. */
 					collation = InvalidOid;
-					strength = COLLATE_NONE;
+					strength = COLLATE_NA;
 					location = -1;	/* won't be used */
 				}
 
@@ -470,14 +499,7 @@ assign_collations_walker(Node *node, assign_collations_context *context)
 			 */
 			if (strength == COLLATE_CONFLICT &&
 				((TargetEntry *) node)->ressortgroupref != 0)
-				ereport(ERROR,
-						(errcode(ERRCODE_COLLATION_MISMATCH),
-						 errmsg("collation mismatch between implicit collations \"%s\" and \"%s\"",
-								get_collation_name(loccontext.collation),
-								get_collation_name(loccontext.collation2)),
-						 errhint("You can choose the collation by applying the COLLATE clause to one or both expressions."),
-						 parser_errposition(context->pstate,
-											loccontext.location2)));
+				ereport_implicit_collation_mismatch(&loccontext);
 			break;
 		case T_InferenceElem:
 		case T_RangeTblRef:
@@ -565,8 +587,14 @@ assign_collations_walker(Node *node, assign_collations_context *context)
 
 			if (OidIsValid(collation))
 				strength = COLLATE_IMPLICIT;
+			else if (nodeTag(node) == T_Var && type_is_collatable(exprType(node)))
+				/*
+				 * A Var of a collatable type that has no collation has
+				 * COLLATE_CONFLICT as per SQL standard subclause "<identifier chain>".
+				 */
+				strength = COLLATE_CONFLICT;
 			else
-				strength = COLLATE_NONE;
+				strength = COLLATE_NA;
 			location = exprLocation(node);
 			break;
 
@@ -686,7 +714,7 @@ assign_collations_walker(Node *node, assign_collations_context *context)
 				if (OidIsValid(typcollation))
 				{
 					/* Node's result is collatable; what about its input? */
-					if (loccontext.strength > COLLATE_NONE)
+					if (loccontext.strength > COLLATE_NA)
 					{
 						/* Collation state bubbles up from children. */
 						collation = loccontext.collation;
@@ -710,7 +738,7 @@ assign_collations_walker(Node *node, assign_collations_context *context)
 				{
 					/* Node's result type isn't collatable. */
 					collation = InvalidOid;
-					strength = COLLATE_NONE;
+					strength = COLLATE_NA;
 					location = -1;	/* won't be used */
 				}
 
@@ -743,7 +771,6 @@ assign_collations_walker(Node *node, assign_collations_context *context)
 						  strength,
 						  location,
 						  loccontext.collation2,
-						  loccontext.location2,
 						  context);
 
 	return false;
@@ -757,7 +784,6 @@ merge_collation_state(Oid collation,
 					  CollateStrength strength,
 					  int location,
 					  Oid collation2,
-					  int location2,
 					  assign_collations_context *context)
 {
 	/*
@@ -767,15 +793,24 @@ merge_collation_state(Oid collation,
 	 */
 	if (strength > context->strength)
 	{
-		/* Override previous parent state */
-		context->collation = collation;
 		context->strength = strength;
-		context->location = location;
+
+		/*
+		 * The stronger collation can be COLLATE_CONFLICT without known collation
+		 * (collation == InvalidOid). In that case don't overwrite possibly
+		 * useful reporting details.
+		 */
+		if ( ! (strength == COLLATE_CONFLICT && collation == InvalidOid))
+		{
+			context->collation = collation;
+			context->location = location;
+		}
+
 		/* Bubble up error info if applicable */
 		if (strength == COLLATE_CONFLICT)
 		{
 			context->collation2 = collation2;
-			context->location2 = location2;
+			context->location = location;
 		}
 	}
 	else if (strength == context->strength)
@@ -783,7 +818,7 @@ merge_collation_state(Oid collation,
 		/* Merge, or detect error if there's a collation conflict */
 		switch (strength)
 		{
-			case COLLATE_NONE:
+			case COLLATE_NA:
 				/* Nothing + nothing is still nothing */
 				break;
 			case COLLATE_IMPLICIT:
@@ -796,8 +831,10 @@ merge_collation_state(Oid collation,
 					{
 						/* Override previous parent state */
 						context->collation = collation;
-						context->strength = strength;
 						context->location = location;
+
+						/* It's the same-strength branch, no need to copy */
+						Assert(context->strength == strength);
 					}
 					else if (collation != DEFAULT_COLLATION_OID)
 					{
@@ -810,12 +847,23 @@ merge_collation_state(Oid collation,
 						 */
 						context->strength = COLLATE_CONFLICT;
 						context->collation2 = collation;
-						context->location2 = location;
+
+						// TODO: Uncomment to keep error messages at same place
+						// context->location = location;
 					}
 				}
 				break;
 			case COLLATE_CONFLICT:
-				/* We're still conflicted ... */
+				/*
+				 * We're still conflicted. If there is no collation for error
+				 * reporting yet, take this one.
+				 */
+				if (context->collation == InvalidOid)
+				{
+					context->collation = collation;
+					// TODO: Uncomment to keep error messages at same place
+					// context->location = location;
+				}
 				break;
 			case COLLATE_EXPLICIT:
 				if (collation != context->collation)
@@ -836,6 +884,19 @@ merge_collation_state(Oid collation,
 				break;
 		}
 	}
+	else if (context->strength == COLLATE_CONFLICT && context->collation == InvalidOid)
+	{
+		/*
+		 * The node is already the result of a conflict between two unknown
+		 * collations. Let's use the children's collation for reporting.
+		 * Note; The children's strength is COLLATE_IMPLICIT or COLLATE_NA,
+		 * the latter case doesn't do any harm so it is not explicitly checked.
+		 */
+		Assert(strength == COLLATE_IMPLICIT || strength == COLLATE_NA);
+		context->collation = collation;
+		// TODO: Uncomment to keep error messages at same place
+		// context->location = location;
+	}
 }
 
 /*
@@ -965,25 +1026,15 @@ assign_hypothetical_collations(Aggref *aggref,
 		 */
 		paircontext.pstate = loccontext->pstate;
 		paircontext.collation = InvalidOid;
-		paircontext.strength = COLLATE_NONE;
+		paircontext.strength = COLLATE_NA;
 		paircontext.location = -1;
-		/* Set these fields just to suppress uninitialized-value warnings: */
-		paircontext.collation2 = InvalidOid;
-		paircontext.location2 = -1;
 
 		(void) assign_collations_walker(h_arg, &paircontext);
 		(void) assign_collations_walker((Node *) s_tle->expr, &paircontext);
 
 		/* deal with collation conflict */
 		if (paircontext.strength == COLLATE_CONFLICT)
-			ereport(ERROR,
-					(errcode(ERRCODE_COLLATION_MISMATCH),
-					 errmsg("collation mismatch between implicit collations \"%s\" and \"%s\"",
-							get_collation_name(paircontext.collation),
-							get_collation_name(paircontext.collation2)),
-					 errhint("You can choose the collation by applying the COLLATE clause to one or both expressions."),
-					 parser_errposition(paircontext.pstate,
-										paircontext.location2)));
+			ereport_implicit_collation_mismatch(&paircontext);
 
 		/*
 		 * At this point paircontext.collation can be InvalidOid only if the
@@ -1024,7 +1075,6 @@ assign_hypothetical_collations(Aggref *aggref,
 								  paircontext.strength,
 								  paircontext.location,
 								  paircontext.collation2,
-								  paircontext.location2,
 								  loccontext);
 
 		h_cell = lnext(h_cell);
diff --git a/src/test/regress/expected/collate.out b/src/test/regress/expected/collate.out
index 0dee7d783a..78ee8f6efa 100644
--- a/src/test/regress/expected/collate.out
+++ b/src/test/regress/expected/collate.out
@@ -382,7 +382,7 @@ SELECT array_agg(a ORDER BY x COLLATE "C", y COLLATE "POSIX") FROM collate_test1
 SELECT array_agg(a ORDER BY x||y) FROM collate_test10;  -- fail
 ERROR:  collation mismatch between implicit collations "C" and "POSIX"
 LINE 1: SELECT array_agg(a ORDER BY x||y) FROM collate_test10;
-                                       ^
+                                    ^
 HINT:  You can choose the collation by applying the COLLATE clause to one or both expressions.
 SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM collate_test1 ORDER BY 2;
  a |  b  
@@ -422,8 +422,10 @@ SELECT a, b FROM collate_test2 EXCEPT SELECT a, b FROM collate_test2 WHERE a < 2
 (3 rows)
 
 SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM collate_test2 ORDER BY 2; -- fail
-ERROR:  could not determine which collation to use for string comparison
-HINT:  Use the COLLATE clause to set the collation explicitly.
+ERROR:  collation mismatch between implicit collations
+LINE 1: SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM co...
+                  ^
+HINT:  Use the COLLATE clause to set the collation explicitly
 SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM collate_test2; -- ok
  a |  b  
 ---+-----
@@ -440,7 +442,7 @@ SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM collate_test2; -- ok
 SELECT a, b FROM collate_test1 UNION SELECT a, b FROM collate_test2 ORDER BY 2; -- fail
 ERROR:  collation mismatch between implicit collations "C" and "POSIX"
 LINE 1: SELECT a, b FROM collate_test1 UNION SELECT a, b FROM collat...
-                                                       ^
+                  ^
 HINT:  You can choose the collation by applying the COLLATE clause to one or both expressions.
 SELECT a, b COLLATE "C" FROM collate_test1 UNION SELECT a, b FROM collate_test2 ORDER BY 2; -- ok
  a |  b  
@@ -453,13 +455,13 @@ SELECT a, b COLLATE "C" FROM collate_test1 UNION SELECT a, b FROM collate_test2
 
 SELECT a, b FROM collate_test1 INTERSECT SELECT a, b FROM collate_test2 ORDER BY 2; -- fail
 ERROR:  collation mismatch between implicit collations "C" and "POSIX"
-LINE 1: ...ELECT a, b FROM collate_test1 INTERSECT SELECT a, b FROM col...
-                                                             ^
+LINE 1: SELECT a, b FROM collate_test1 INTERSECT SELECT a, b FROM co...
+                  ^
 HINT:  You can choose the collation by applying the COLLATE clause to one or both expressions.
 SELECT a, b FROM collate_test1 EXCEPT SELECT a, b FROM collate_test2 ORDER BY 2; -- fail
 ERROR:  collation mismatch between implicit collations "C" and "POSIX"
 LINE 1: SELECT a, b FROM collate_test1 EXCEPT SELECT a, b FROM colla...
-                                                        ^
+                  ^
 HINT:  You can choose the collation by applying the COLLATE clause to one or both expressions.
 CREATE TABLE test_u AS SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM collate_test2; -- fail
 ERROR:  no collation was derived for column "b" with collatable type text
@@ -478,7 +480,7 @@ select x || y from collate_test10; -- ok, because || is not collation aware
 select x, y from collate_test10 order by x || y; -- not so ok
 ERROR:  collation mismatch between implicit collations "C" and "POSIX"
 LINE 1: select x, y from collate_test10 order by x || y;
-                                                      ^
+                                                 ^
 HINT:  You can choose the collation by applying the COLLATE clause to one or both expressions.
 -- collation mismatch between recursive and non-recursive term
 WITH RECURSIVE foo(x) AS
@@ -513,6 +515,12 @@ SELECT * FROM collate_test10 WHERE (x, y) NOT IN (SELECT y COLLATE "C", x COLLAT
 ---+---+---
 (0 rows)
 
+-- unknown collation from subquery conflicts with implicit collation in outer query
+SELECT * FROM (select x, x||y xy from collate_test10 ) t  order by xy || x;
+ERROR:  collation mismatch between implicit collation "C" and unknown collation
+LINE 1: ...select x, x||y xy from collate_test10 ) t  order by xy || x;
+                                                               ^
+HINT:  You can choose the collation by applying the COLLATE clause to one or both expressions.
 -- casting
 SELECT CAST('42' AS text COLLATE "C");
 ERROR:  syntax error at or near "COLLATE"
diff --git a/src/test/regress/sql/collate.sql b/src/test/regress/sql/collate.sql
index 89de26a227..0c9638a703 100644
--- a/src/test/regress/sql/collate.sql
+++ b/src/test/regress/sql/collate.sql
@@ -169,6 +169,9 @@ SELECT * FROM collate_test10 WHERE (x, y) NOT IN (SELECT y, x FROM collate_test1
 SELECT * FROM collate_test10 WHERE (x COLLATE "POSIX", y COLLATE "C") NOT IN (SELECT y, x FROM collate_test10);
 SELECT * FROM collate_test10 WHERE (x, y) NOT IN (SELECT y COLLATE "C", x COLLATE "POSIX" FROM collate_test10);
 
+-- unknown collation from subquery conflicts with implicit collation in outer query
+SELECT * FROM (select x, x||y xy from collate_test10) t  order by xy || x; -- fail
+
 -- casting
 
 SELECT CAST('42' AS text COLLATE "C");
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Markus Winand (#1)
Re: Conflict of implicit collations doesn't propagate out of subqueries

Markus Winand <markus.winand@winand.at> writes:

However, if the conflict happens in a subquery, it doesn’t anymore:

WITH data (c, posix) AS (
values ('a' COLLATE "C", 'b' COLLATE "POSIX")
)
SELECT *
FROM (SELECT *, c || posix AS none FROM data) data
ORDER BY none || posix;

c | posix | none
---+-------+------
a | b | ab
(1 row)

I'm not exactly convinced this is a bug. Can you cite chapter and verse
in the spec to justify throwing an error?

AIUI, collation conflicts can only occur within a single expression, and
this is not that. Moreover, even if data.none arguably has no collation,
treating it from outside the sub-query as having collation strength "none"
seems to me to be similar to our policy of promoting unknown-type subquery
outputs to type "text" rather than leaving them to cause trouble later.
It's not pedantically correct, but nobody liked the old behavior.

regards, tom lane

#3Markus Winand
markus.winand@winand.at
In reply to: Tom Lane (#2)
Re: Conflict of implicit collations doesn't propagate out of subqueries

On 28.05.2020, at 23:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Markus Winand <markus.winand@winand.at> writes:

However, if the conflict happens in a subquery, it doesn’t anymore:

WITH data (c, posix) AS (
values ('a' COLLATE "C", 'b' COLLATE "POSIX")
)
SELECT *
FROM (SELECT *, c || posix AS none FROM data) data
ORDER BY none || posix;

c | posix | none
---+-------+------
a | b | ab
(1 row)

I'm not exactly convinced this is a bug. Can you cite chapter and verse
in the spec to justify throwing an error?

I think it is 6.6 Syntax Rule 17:

• 17) If the declared type of a <basic identifier chain> BIC is character string, then the collation derivation of the declared type of BIC is

Case:

• a) If the declared type has a declared type collation DTC, then implicit.

• b) Otherwise, none.

That gives derivation “none” to the column.

When this is concatenated, 9.5 ("Result of data type combinations”) SR 3 a ii 3 applies:

• ii) The collation derivation and declared type collation of the result are determined as follows. Case:

• 1) If some data type in DTS has an explicit collation derivation [… doesn’t apply]
• 2) If every data type in DTS has an implicit collation derivation, then [… doesn’t apply beause of “every"]
• 3) Otherwise, the collation derivation is none. [applies]

Also, the standard doesn’t have a forth derivation (strength). It also says that
not having a declared type collation implies the derivation “none”. See 4.2.2:

Every declared type that is a character string type has a collation
derivation, this being either none, implicit, or explicit. The
collation derivation of a declared type with a declared type collation
that is explicitly or implicitly specified by a <data type> is implicit.
If the collation derivation of a declared type that has a declared type
collation is not implicit, then it is explicit. The collation derivation
of an expression of character string type that has no declared type
collation is none.

-markus