Invalid query generated by postgres_fdw with UNION ALL and ORDER BY

Started by Michał Kłeczekalmost 2 years ago7 messages
#1Michał Kłeczek
michal@kleczek.org

The following query:

SELECT * FROM (
SELECT 2023 AS year, * FROM remote_table_1
UNION ALL
SELECT 2022 AS year, * FROM remote_table_2
)
ORDER BY year DESC;

yields the following remote query:

SELECT [columns] FROM remote_table_1 ORDER BY 2023 DESC

and subsequently fails remote execution.

Not really sure where the problem is - the planner or postgres_fdw.
I guess it is postgres_fdw not filtering out ordering keys.

This filtering would also be pretty useful in the following scenario (which is also why I went through UNION ALL route and discovered this issue):

I have a table partitioned by year, partitions are remote tables.
On remote servers I have a GIST index that does not support ordering ([1]/messages/by-id/B2AC13F9-6655-4E27-BFD3-068844E5DC91@kleczek.org) so I would like to avoid sending ORDER BY year to remote servers.
Ideally redundant ordering should be filtered out.

[1]: /messages/by-id/B2AC13F9-6655-4E27-BFD3-068844E5DC91@kleczek.org


Kind regards,
Michal

#2David Rowley
dgrowleyml@gmail.com
In reply to: Michał Kłeczek (#1)
2 attachment(s)
Re: Invalid query generated by postgres_fdw with UNION ALL and ORDER BY

On Thu, 7 Mar 2024 at 19:09, Michał Kłeczek <michal@kleczek.org> wrote:

The following query:

SELECT * FROM (
SELECT 2023 AS year, * FROM remote_table_1
UNION ALL
SELECT 2022 AS year, * FROM remote_table_2
)
ORDER BY year DESC;

yields the following remote query:

SELECT [columns] FROM remote_table_1 ORDER BY 2023 DESC

and subsequently fails remote execution.

Not really sure where the problem is - the planner or postgres_fdw.
I guess it is postgres_fdw not filtering out ordering keys.

Interesting. I've attached a self-contained recreator for the casual passerby.

I think the fix should go in appendOrderByClause(). It's at that
point we look for the EquivalenceMember for the relation and can
easily discover if the em_expr is a Const. I think we can safely just
skip doing any ORDER BY <const> stuff and not worry about if the
literal format of the const will appear as a reference to an ordinal
column position in the ORDER BY clause.

Something like the attached patch I think should work.

I wonder if we need a test...

David

Attachments:

postgres_fdw_order_by_const_fix.patchtext/plain; charset=US-ASCII; name=postgres_fdw_order_by_const_fix.patchDownload
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 8fc66fa11c..2f4ed33173 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -3914,11 +3914,11 @@ appendOrderByClause(List *pathkeys, bool has_final_sort,
 	int			nestlevel;
 	const char *delim = " ";
 	StringInfo	buf = context->buf;
+	bool		gotone = false;
 
 	/* Make sure any constants in the exprs are printed portably */
 	nestlevel = set_transmission_modes();
 
-	appendStringInfoString(buf, " ORDER BY");
 	foreach(lcell, pathkeys)
 	{
 		PathKey    *pathkey = lfirst(lcell);
@@ -3949,6 +3949,21 @@ appendOrderByClause(List *pathkeys, bool has_final_sort,
 		if (em == NULL)
 			elog(ERROR, "could not find pathkey item to sort");
 
+		/*
+		 * If the member just has a Const expression then we needn't add it to
+		 * the ORDER BY clause.  This can happen in UNION ALL queries where the
+		 * union child targetlist has a Const.  Adding these would be wasteful,
+		 * but also, for INT columns, putting an integer literal will be seen
+		 * as an ordinal column position rather than a value to sort by, so we
+		 * must skip these.
+		 */
+		if (IsA(em->em_expr, Const))
+			continue;
+
+		if (!gotone)
+			appendStringInfoString(buf, " ORDER BY");
+		gotone = true;
+
 		em_expr = em->em_expr;
 
 		/*
demo_of_postgres_fdw_order_by_const_bug.sqlapplication/octet-stream; name=demo_of_postgres_fdw_order_by_const_bug.sqlDownload
#3Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: David Rowley (#2)
Re: Invalid query generated by postgres_fdw with UNION ALL and ORDER BY

On Thu, Mar 7, 2024 at 4:39 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 7 Mar 2024 at 19:09, Michał Kłeczek <michal@kleczek.org> wrote:

The following query:

SELECT * FROM (
SELECT 2023 AS year, * FROM remote_table_1
UNION ALL
SELECT 2022 AS year, * FROM remote_table_2
)
ORDER BY year DESC;

yields the following remote query:

SELECT [columns] FROM remote_table_1 ORDER BY 2023 DESC

and subsequently fails remote execution.

Not really sure where the problem is - the planner or postgres_fdw.
I guess it is postgres_fdw not filtering out ordering keys.

Interesting. I've attached a self-contained recreator for the casual
passerby.

I think the fix should go in appendOrderByClause(). It's at that
point we look for the EquivalenceMember for the relation and can
easily discover if the em_expr is a Const. I think we can safely just
skip doing any ORDER BY <const> stuff and not worry about if the
literal format of the const will appear as a reference to an ordinal
column position in the ORDER BY clause.

deparseSortGroupClause() calls deparseConst() with showtype = 1.
appendOrderByClause() may want to do something similar for consistency. Or
remove it from deparseSortGroupClause() as well?

Something like the attached patch I think should work.

I wonder if we need a test...

Yes.

--
Best Wishes,
Ashutosh Bapat

#4David Rowley
dgrowleyml@gmail.com
In reply to: Ashutosh Bapat (#3)
1 attachment(s)
Re: Invalid query generated by postgres_fdw with UNION ALL and ORDER BY

On Fri, 8 Mar 2024 at 00:54, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Thu, Mar 7, 2024 at 4:39 PM David Rowley <dgrowleyml@gmail.com> wrote:

I think the fix should go in appendOrderByClause(). It's at that
point we look for the EquivalenceMember for the relation and can
easily discover if the em_expr is a Const. I think we can safely just
skip doing any ORDER BY <const> stuff and not worry about if the
literal format of the const will appear as a reference to an ordinal
column position in the ORDER BY clause.

deparseSortGroupClause() calls deparseConst() with showtype = 1. appendOrderByClause() may want to do something similar for consistency. Or remove it from deparseSortGroupClause() as well?

The fix could also be to use deparseConst() in appendOrderByClause()
and have that handle Const EquivalenceMember instead. I'd rather just
skip them. To me, that seems less risky than ensuring deparseConst()
handles all Const types correctly.

Also, as far as adjusting GROUP BY to eliminate Consts, I don't think
that's material for a bug fix. If we want to consider doing that,
that's for master only.

I wonder if we need a test...

Yes.

I've added two of those in the attached.

I also changed the way the delimiter stuff works as the exiting code
seems to want to avoid having a bool flag to record if we're adding
the first item. The change I'm making means the bool flag is now
required, so we may as well use that flag to deal with the delimiter
append too.

David

Attachments:

postgres_fdw_order_by_const_fix_v2.patchtext/plain; charset=US-ASCII; name=postgres_fdw_order_by_const_fix_v2.patchDownload
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 8fc66fa11c..a30f00179e 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -3912,13 +3912,12 @@ appendOrderByClause(List *pathkeys, bool has_final_sort,
 {
 	ListCell   *lcell;
 	int			nestlevel;
-	const char *delim = " ";
 	StringInfo	buf = context->buf;
+	bool		gotone = false;
 
 	/* Make sure any constants in the exprs are printed portably */
 	nestlevel = set_transmission_modes();
 
-	appendStringInfoString(buf, " ORDER BY");
 	foreach(lcell, pathkeys)
 	{
 		PathKey    *pathkey = lfirst(lcell);
@@ -3951,6 +3950,25 @@ appendOrderByClause(List *pathkeys, bool has_final_sort,
 
 		em_expr = em->em_expr;
 
+		/*
+		 * If the member is a Const expression then we needn't add it to the
+		 * ORDER BY clause.  This can happen in UNION ALL queries where the
+		 * union child targetlist has a Const.  Adding these would be wasteful,
+		 * but also, for INT columns, an integer literal will be seen as an
+		 * ordinal column position rather than a value to sort by, so we must
+		 * skip these.
+		 */
+		if (IsA(em_expr, Const))
+			continue;
+
+		if (!gotone)
+		{
+			appendStringInfoString(buf, " ORDER BY ");
+			gotone = true;
+		}
+		else
+			appendStringInfoString(buf, ", ");
+
 		/*
 		 * Lookup the operator corresponding to the strategy in the opclass.
 		 * The datatype used by the opfamily is not necessarily the same as
@@ -3965,7 +3983,6 @@ appendOrderByClause(List *pathkeys, bool has_final_sort,
 				 pathkey->pk_strategy, em->em_datatype, em->em_datatype,
 				 pathkey->pk_opfamily);
 
-		appendStringInfoString(buf, delim);
 		deparseExpr(em_expr, context);
 
 		/*
@@ -3975,7 +3992,6 @@ appendOrderByClause(List *pathkeys, bool has_final_sort,
 		appendOrderBySuffix(oprid, exprType((Node *) em_expr),
 							pathkey->pk_nulls_first, context);
 
-		delim = ", ";
 	}
 	reset_transmission_modes(nestlevel);
 }
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c355e8f3f7..f3339b4ac8 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -919,6 +919,44 @@ EXPLAIN (VERBOSE, COSTS OFF)
          Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
 (6 rows)
 
+-- Ensure we don't push ORDER BY expressions which are Consts at the UNION
+-- child level to the foreign server.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM (
+    SELECT 1 AS type,c1 FROM ft1
+    UNION ALL
+    SELECT 2 AS type,c1 FROM ft2
+) ORDER BY type,c1;
+                                   QUERY PLAN                                    
+---------------------------------------------------------------------------------
+ Merge Append
+   Sort Key: (1), ft1.c1
+   ->  Foreign Scan on public.ft1
+         Output: 1, ft1.c1
+         Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC NULLS LAST
+   ->  Foreign Scan on public.ft2
+         Output: 2, ft2.c1
+         Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC NULLS LAST
+(8 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM (
+    SELECT 1 AS type,c1 FROM ft1
+    UNION ALL
+    SELECT 2 AS type,c1 FROM ft2
+) ORDER BY type;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Merge Append
+   Sort Key: (1)
+   ->  Foreign Scan on public.ft1
+         Output: 1, ft1.c1
+         Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
+   ->  Foreign Scan on public.ft2
+         Output: 2, ft2.c1
+         Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
+(8 rows)
+
 -- user-defined operator/function
 CREATE FUNCTION postgres_fdw_abs(int) RETURNS int AS $$
 BEGIN
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 812e7646e1..bc8fdb6419 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -362,6 +362,22 @@ EXPLAIN (VERBOSE, COSTS OFF)
 EXPLAIN (VERBOSE, COSTS OFF)
 	SELECT * FROM ft2 ORDER BY ft2.c1, ft2.c3 collate "C";
 
+-- Ensure we don't push ORDER BY expressions which are Consts at the UNION
+-- child level to the foreign server.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM (
+    SELECT 1 AS type,c1 FROM ft1
+    UNION ALL
+    SELECT 2 AS type,c1 FROM ft2
+) ORDER BY type,c1;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM (
+    SELECT 1 AS type,c1 FROM ft1
+    UNION ALL
+    SELECT 2 AS type,c1 FROM ft2
+) ORDER BY type;
+
 -- user-defined operator/function
 CREATE FUNCTION postgres_fdw_abs(int) RETURNS int AS $$
 BEGIN
#5Richard Guo
guofenglinux@gmail.com
In reply to: David Rowley (#4)
Re: Invalid query generated by postgres_fdw with UNION ALL and ORDER BY

On Fri, Mar 8, 2024 at 10:13 AM David Rowley <dgrowleyml@gmail.com> wrote:

The fix could also be to use deparseConst() in appendOrderByClause()
and have that handle Const EquivalenceMember instead. I'd rather just
skip them. To me, that seems less risky than ensuring deparseConst()
handles all Const types correctly.

I've looked at this patch a bit. I once wondered why we don't check
pathkey->pk_eclass->ec_has_const with EC_MUST_BE_REDUNDANT to see if the
pathkey is not needed. Then I realized that a child member would not be
marked as constant even if the child expr is a Const, as explained in
add_child_rel_equivalences().

BTW, I wonder if it is possible that we have a pseudoconstant expression
that is not of type Const. In such cases we would need to check
'bms_is_empty(pull_varnos(em_expr))' instead of 'IsA(em_expr, Const)'.
However, I'm unable to think of such an expression in this context.

The patch looks good to me otherwise.

Thanks
Richard

#6David Rowley
dgrowleyml@gmail.com
In reply to: Richard Guo (#5)
Re: Invalid query generated by postgres_fdw with UNION ALL and ORDER BY

On Fri, 8 Mar 2024 at 23:14, Richard Guo <guofenglinux@gmail.com> wrote:

I've looked at this patch a bit. I once wondered why we don't check
pathkey->pk_eclass->ec_has_const with EC_MUST_BE_REDUNDANT to see if the
pathkey is not needed. Then I realized that a child member would not be
marked as constant even if the child expr is a Const, as explained in
add_child_rel_equivalences().

This situation where the child member is a Const but the parent isn't
is unique to UNION ALL queries. The only other cases where we have
child members are with partitioned and inheritance tables. In those
cases, the parent member just maps to the equivalent child member
replacing parent Vars with the corresponding child Var according to
the column mapping between the parent and child. It might be nice if
partitioning supported mapping to a Const as in many cases that could
save storing the same value in the table every time, but we don't
support that, so this can only happen with UNION ALL queries.

BTW, I wonder if it is possible that we have a pseudoconstant expression
that is not of type Const. In such cases we would need to check
'bms_is_empty(pull_varnos(em_expr))' instead of 'IsA(em_expr, Const)'.
However, I'm unable to think of such an expression in this context.

I can't see how there'd be any problems with a misinterpretation of a
pseudoconstant value as an ordinal column position on the remote
server. Surely it's only actual "Const" node types that we're just
going to call the type's output function which risks it yielding a
string of digits and the remote server thinking that we must mean an
ordinal column position.

The patch looks good to me otherwise.

Thanks

David

#7Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: David Rowley (#4)
Re: Invalid query generated by postgres_fdw with UNION ALL and ORDER BY

On Fri, Mar 8, 2024 at 7:43 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 8 Mar 2024 at 00:54, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Thu, Mar 7, 2024 at 4:39 PM David Rowley <dgrowleyml@gmail.com>

wrote:

I think the fix should go in appendOrderByClause(). It's at that
point we look for the EquivalenceMember for the relation and can
easily discover if the em_expr is a Const. I think we can safely just
skip doing any ORDER BY <const> stuff and not worry about if the
literal format of the const will appear as a reference to an ordinal
column position in the ORDER BY clause.

deparseSortGroupClause() calls deparseConst() with showtype = 1.

appendOrderByClause() may want to do something similar for consistency. Or
remove it from deparseSortGroupClause() as well?

The fix could also be to use deparseConst() in appendOrderByClause()
and have that handle Const EquivalenceMember instead. I'd rather just
skip them. To me, that seems less risky than ensuring deparseConst()
handles all Const types correctly.

Also, as far as adjusting GROUP BY to eliminate Consts, I don't think
that's material for a bug fix. If we want to consider doing that,
that's for master only.

If appendOrderByClause() would have been using deparseConst() since the
beginning this bug would not be there. Instead of maintaining two different
ways of deparsing ORDER BY clause, we could maintain just one. I think we
should unify those. If we should do it in only master be it so. I am fine
to leave back branches with two methods.

I wonder if we need a test...

Yes.

I've added two of those in the attached.

Thanks. They look fine to me.

--
Best Wishes,
Ashutosh Bapat