Bug Fix: COLLATE with multiple ORDER BYs in aggregates

Started by David Fetterover 12 years ago7 messages
#1David Fetter
david@fetter.org
1 attachment(s)

Folks,

While testing the upcoming FILTER clause for aggregates, Erik Rijkers
uncovered a long-standing bug in $subject, namely that this case
wasn't handled. Please find attached a patch by Andrew Gierth and
myself which fixes this issue and adds a regression test to ensure it
remains fixed.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachments:

fix_collate_for_order_agg_20130423.patchtext/plain; charset=us-asciiDownload
diff --git a/src/backend/parser/parse_collate.c b/src/backend/parser/parse_collate.c
index 7ed50cd..f432360 100644
--- a/src/backend/parser/parse_collate.c
+++ b/src/backend/parser/parse_collate.c
@@ -319,86 +319,6 @@ assign_collations_walker(Node *node, assign_collations_context *context)
 				}
 			}
 			break;
-		case T_CaseExpr:
-			{
-				/*
-				 * CaseExpr is a special case because we do not want to
-				 * recurse into the test expression (if any).  It was already
-				 * marked with collations during transformCaseExpr, and
-				 * furthermore its collation is not relevant to the result of
-				 * the CASE --- only the output expressions are. So we can't
-				 * use expression_tree_walker here.
-				 */
-				CaseExpr   *expr = (CaseExpr *) node;
-				Oid			typcollation;
-				ListCell   *lc;
-
-				foreach(lc, expr->args)
-				{
-					CaseWhen   *when = (CaseWhen *) lfirst(lc);
-
-					Assert(IsA(when, CaseWhen));
-
-					/*
-					 * The condition expressions mustn't affect the CASE's
-					 * result collation either; but since they are known to
-					 * yield boolean, it's safe to recurse directly on them
-					 * --- they won't change loccontext.
-					 */
-					(void) assign_collations_walker((Node *) when->expr,
-													&loccontext);
-					(void) assign_collations_walker((Node *) when->result,
-													&loccontext);
-				}
-				(void) assign_collations_walker((Node *) expr->defresult,
-												&loccontext);
-
-				/*
-				 * Now determine the CASE's output collation.  This is the
-				 * same as the general case below.
-				 */
-				typcollation = get_typcollation(exprType(node));
-				if (OidIsValid(typcollation))
-				{
-					/* Node's result is collatable; what about its input? */
-					if (loccontext.strength > COLLATE_NONE)
-					{
-						/* Collation state bubbles up from children. */
-						collation = loccontext.collation;
-						strength = loccontext.strength;
-						location = loccontext.location;
-					}
-					else
-					{
-						/*
-						 * Collatable output produced without any collatable
-						 * input.  Use the type's collation (which is usually
-						 * DEFAULT_COLLATION_OID, but might be different for a
-						 * domain).
-						 */
-						collation = typcollation;
-						strength = COLLATE_IMPLICIT;
-						location = exprLocation(node);
-					}
-				}
-				else
-				{
-					/* Node's result type isn't collatable. */
-					collation = InvalidOid;
-					strength = COLLATE_NONE;
-					location = -1;		/* won't be used */
-				}
-
-				/*
-				 * Save the state into the expression node.  We know it
-				 * doesn't care about input collation.
-				 */
-				if (strength == COLLATE_CONFLICT)
-					exprSetCollation(node, InvalidOid);
-				else
-					exprSetCollation(node, collation);
-			}
-			break;
 		case T_RowExpr:
 			{
 				/*
@@ -634,9 +554,83 @@ assign_collations_walker(Node *node, assign_collations_context *context)
 				 */
 				Oid			typcollation;
 
-				(void) expression_tree_walker(node,
-											  assign_collations_walker,
-											  (void *) &loccontext);
+				/*
+				 * Some node types use part of the general case with
+				 * somepecial processing.  Do them here rather than
+				 * duplicate code.
+				 */
+				switch (nodeTag(node))
+				{
+					case T_CaseExpr:
+						{
+							/*
+							 * CaseExpr is a special case because we
+							 * do not want to recurse into the test
+							 * expression (if any).  It was already
+							 * marked with collations during
+							 * transformCaseExpr, and furthermore its
+							 * collation is not relevant to the result
+							 * of the CASE --- only the output
+							 * expressions are. So we can't use
+							 * expression_tree_walker here.
+							 */
+							CaseExpr	*expr = (CaseExpr *) node;
+							ListCell	*lc;
+
+							foreach(lc, expr->args)
+							{
+								CaseWhen	*when = (CaseWhen *) lfirst(lc);
+
+								Assert(IsA(when, CaseWhen));
+
+								/*
+								 * The condition expressions mustn't
+								 * affect the CASE's result collation
+								 * either; but since they are known to
+								 * yield boolean, it's safe to recurse
+								 * directly on them.  They won't
+								 * change loccontext.
+								 */
+								(void) assign_collations_walker((Node *) when->expr,
+																		 &loccontext);
+								(void) assign_collations_walker((Node *) when->result,
+																		 &loccontext);
+							}
+							(void) assign_collations_walker((Node *) expr->defresult,
+																	 &loccontext);
+						}
+						break;
+
+					case T_Aggref:
+						{
+							/*
+							 * Aggref is special because expressions
+							 * used only for ordering shouldn't be
+							 * taken to conflict with each other or
+							 * with other args.
+							 */
+
+							Aggref *aggref = (Aggref *) node;
+							ListCell *lc;
+
+							foreach (lc, aggref->args)
+							{
+								TargetEntry	*tle = (TargetEntry *) lfirst(lc);
+
+								if (tle->resjunk)
+									assign_expr_collations(context->pstate, (Node *) tle);
+								else
+									(void) assign_collations_walker((Node *) tle,
+																	&loccontext);
+							}
+						}
+						break;
+
+					default:
+						(void) expression_tree_walker(node,
+													  assign_collations_walker,
+													  (void *) &loccontext);
+				}
 
 				typcollation = get_typcollation(exprType(node));
 				if (OidIsValid(typcollation))
diff --git a/src/test/regress/expected/collate.out b/src/test/regress/expected/collate.out
index 4ab9566..5ad2705 100644
--- a/src/test/regress/expected/collate.out
+++ b/src/test/regress/expected/collate.out
@@ -362,6 +362,13 @@ SELECT array_agg(b ORDER BY b) FROM collate_test2;
  {ABD,Abc,abc,bbc}
 (1 row)
 
+-- Make sure multiple COLLATEs work correctly in aggregates with ORDER BY.
+SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") FROM (VALUES ('foo','bar')) v(a,b);
+ array_agg 
+-----------
+ {foo}
+(1 row)
+
 SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM collate_test1 ORDER BY 2;
  a |  b  
 ---+-----
diff --git a/src/test/regress/sql/collate.sql b/src/test/regress/sql/collate.sql
index 3c960e7..c32d042 100644
--- a/src/test/regress/sql/collate.sql
+++ b/src/test/regress/sql/collate.sql
@@ -128,6 +128,9 @@ SELECT min(b), max(b) FROM collate_test2;
 SELECT array_agg(b ORDER BY b) FROM collate_test1;
 SELECT array_agg(b ORDER BY b) FROM collate_test2;
 
+-- Make sure multiple COLLATEs work correctly in aggregates with ORDER BY.
+SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") FROM (VALUES ('foo','bar')) v(a,b);
+
 SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM collate_test1 ORDER BY 2;
 SELECT a, b FROM collate_test2 UNION SELECT a, b FROM collate_test2 ORDER BY 2;
 SELECT a, b FROM collate_test2 WHERE a < 4 INTERSECT SELECT a, b FROM collate_test2 WHERE a > 1 ORDER BY 2;
#2David Fetter
david@fetter.org
In reply to: David Fetter (#1)
Examples Re: Bug Fix: COLLATE with multiple ORDER BYs in aggregates

On Tue, Apr 23, 2013 at 09:57:27AM -0700, David Fetter wrote:

Folks,

While testing the upcoming FILTER clause for aggregates, Erik Rijkers
uncovered a long-standing bug in $subject, namely that this case
wasn't handled. Please find attached a patch by Andrew Gierth and
myself which fixes this issue and adds a regression test to ensure it
remains fixed.

Please see below results when I run the regression test query on git
master, REL_9_2_STABLE, and REL9_1_STABLE, respectively:

$ psql postgres
psql (9.3devel)
Type "help" for help.

shackle@postgres:5493=# SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") FROM (VALUES ('foo','bar')) v(a,b);
ERROR: collation mismatch between explicit collations "C" and "POSIX"
LINE 1: SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") F...
^

$ psql postgres
psql (9.2.4)
Type "help" for help.

shackle@postgres:5492=# SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") FROM (VALUES ('foo','bar')) v(a,b);
ERROR: collation mismatch between explicit collations "C" and "POSIX"
LINE 1: SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") F...
^

$ psql postgres
psql (9.1.9)
Type "help" for help.

shackle@postgres:5491=# SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") FROM (VALUES ('foo','bar')) v(a,b);
ERROR: collation mismatch between explicit collations "C" and "POSIX"
LINE 1: SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") F...
^
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Fetter (#1)
Re: Bug Fix: COLLATE with multiple ORDER BYs in aggregates

David Fetter <david@fetter.org> writes:

While testing the upcoming FILTER clause for aggregates, Erik Rijkers
uncovered a long-standing bug in $subject, namely that this case
wasn't handled. Please find attached a patch by Andrew Gierth and
myself which fixes this issue and adds a regression test to ensure it
remains fixed.

I don't find this patch to be a good idea.

The argument for it seems to be that

array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX")

should not throw an error, but why not? And what does that have to do
with whacking around the code for CASE?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#3)
Re: Bug Fix: COLLATE with multiple ORDER BYs in aggregates

On 2013-04-25 13:42:32 -0400, Tom Lane wrote:

David Fetter <david@fetter.org> writes:

While testing the upcoming FILTER clause for aggregates, Erik Rijkers
uncovered a long-standing bug in $subject, namely that this case
wasn't handled. Please find attached a patch by Andrew Gierth and
myself which fixes this issue and adds a regression test to ensure it
remains fixed.

I don't find this patch to be a good idea.

The argument for it seems to be that

array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX")

should not throw an error, but why not?

Uh. Why should it? SELECT foo COLLATE "C" FROM ... ORDER BY bar COLLATE
"POSIX" doesn't throw one either?

And what does that have to do with whacking around the code for CASE?

I guess that's to avoid to repeat that already triplicated block of code
once more. The goal seems to make sense to me, although I am not 100%
that thats the nicest solution to get of the repetition.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#4)
Re: Bug Fix: COLLATE with multiple ORDER BYs in aggregates

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-04-25 13:42:32 -0400, Tom Lane wrote:

The argument for it seems to be that
array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX")
should not throw an error, but why not?

Uh. Why should it? SELECT foo COLLATE "C" FROM ... ORDER BY bar COLLATE
"POSIX" doesn't throw one either?

After thinking about it a bit more, this case *should* throw an error:

string_agg(a COLLATE "C", b COLLATE "POSIX")

but these should not:

array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX")

array_agg(a ORDER BY b COLLATE "C", c COLLATE "POSIX")

that is, the ORDER BY expression(s) ought to be considered independently
rather than as part of the agg's argument list.

It looks like the proposed patch gets this right, but the proposed
test cases really fail to illuminate the problem IMO.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6David Fetter
david@fetter.org
In reply to: Tom Lane (#5)
Re: Bug Fix: COLLATE with multiple ORDER BYs in aggregates

On Thu, Apr 25, 2013 at 06:04:10PM -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-04-25 13:42:32 -0400, Tom Lane wrote:

The argument for it seems to be that
array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX")
should not throw an error, but why not?

Uh. Why should it? SELECT foo COLLATE "C" FROM ... ORDER BY bar COLLATE
"POSIX" doesn't throw one either?

After thinking about it a bit more, this case *should* throw an error:

string_agg(a COLLATE "C", b COLLATE "POSIX")

but these should not:

array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX")

array_agg(a ORDER BY b COLLATE "C", c COLLATE "POSIX")

that is, the ORDER BY expression(s) ought to be considered independently
rather than as part of the agg's argument list.

It looks like the proposed patch gets this right, but the proposed
test cases really fail to illuminate the problem IMO.

regards, tom lane

Am I understanding correctly that you want the code left alone and the
test case expanded as above?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Fetter (#1)
Re: Bug Fix: COLLATE with multiple ORDER BYs in aggregates

David Fetter <david@fetter.org> writes:

While testing the upcoming FILTER clause for aggregates, Erik Rijkers
uncovered a long-standing bug in $subject, namely that this case
wasn't handled. Please find attached a patch by Andrew Gierth and
myself which fixes this issue and adds a regression test to ensure it
remains fixed.

Applied with minor editorialization.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers