v12.0: ERROR: could not find pathkey item to sort

Started by Justin Pryzbyover 6 years ago20 messages

pryzby@telsasoft.com

over 6 years ago

I've reduced the failing query as much as possible to this:

-- This is necessary to fail:
SET enable_nestloop=off;

SELECT * FROM
(SELECT start_time, t1.site_id
FROM pgw_kpi_view t1
-- Apparently the where clause is necessary to fail...
WHERE (start_time>='2019-10-10' AND start_time<'2019-10-11')
-- The group by MAY be necessary to fail...
GROUP BY 1,2
) AS data
JOIN sites ON ( sites.site_location='' OR sites.site_office=data.site_id)

The view is actually a join of two relkind=p partitioned tables (which I
will acknowledge probably performs poorly).

(gdb) bt
#0 errfinish (dummy=dummy@entry=0) at elog.c:411
#1 0x000000000087a959 in elog_finish (elevel=elevel@entry=20, fmt=fmt@entry=0x9d93d8 "could not find pathkey item to sort") at elog.c:1365
#2 0x00000000006a587f in prepare_sort_from_pathkeys (lefttree=0x7f7cb84492e8, pathkeys=<optimized out>, relids=0x7f7cb8410700, reqColIdx=reqColIdx@entry=0x0, adjust_tlist_in_place=<optimized out>,
adjust_tlist_in_place@entry=false, p_numsortkeys=p_numsortkeys@entry=0x7ffc4b2e10c4, p_sortColIdx=p_sortColIdx@entry=0x7ffc4b2e10c8, p_sortOperators=p_sortOperators@entry=0x7ffc4b2e10d0,
p_collations=p_collations@entry=0x7ffc4b2e10d8, p_nullsFirst=p_nullsFirst@entry=0x7ffc4b2e10e0) at createplan.c:5893
#3 0x00000000006a5a6a in make_sort_from_pathkeys (lefttree=<optimized out>, pathkeys=<optimized out>, relids=<optimized out>) at createplan.c:6020
#4 0x00000000006a6e30 in create_sort_plan (flags=4, best_path=0x7f7cb8410cc8, root=0x7f7fdc3ac6b0) at createplan.c:1985
#5 create_plan_recurse (root=root@entry=0x7f7fdc3ac6b0, best_path=0x7f7cb8410cc8, flags=flags@entry=4) at createplan.c:459
#6 0x00000000006a6e4e in create_group_plan (best_path=0x7f7cb8410d58, root=0x7f7fdc3ac6b0) at createplan.c:2012
#7 create_plan_recurse (root=root@entry=0x7f7fdc3ac6b0, best_path=best_path@entry=0x7f7cb8410d58, flags=flags@entry=1) at createplan.c:464
#8 0x00000000006a8278 in create_merge_append_plan (flags=4, best_path=0x7f7cb8446cd8, root=0x7f7fdc3ac6b0) at createplan.c:1333
#9 create_plan_recurse (root=root@entry=0x7f7fdc3ac6b0, best_path=0x7f7cb8446cd8, flags=flags@entry=4) at createplan.c:402
#10 0x00000000006a6e4e in create_group_plan (best_path=0x7f7cb84486c8, root=0x7f7fdc3ac6b0) at createplan.c:2012
#11 create_plan_recurse (root=root@entry=0x7f7fdc3ac6b0, best_path=0x7f7cb84486c8, flags=flags@entry=1) at createplan.c:464
#12 0x00000000006a9739 in create_plan (root=0x7f7fdc3ac6b0, best_path=<optimized out>) at createplan.c:325
#13 0x00000000006aa988 in create_subqueryscan_plan (scan_clauses=0x0, tlist=0x7f7cb8450820, best_path=0x7f7cb8448db8, root=0x7f7fdc34b948) at createplan.c:3385
#14 create_scan_plan (root=root@entry=0x7f7fdc34b948, best_path=best_path@entry=0x7f7cb8448db8, flags=<optimized out>, flags@entry=0) at createplan.c:670
#15 0x00000000006a6d31 in create_plan_recurse (root=root@entry=0x7f7fdc34b948, best_path=0x7f7cb8448db8, flags=flags@entry=0) at createplan.c:427
#16 0x00000000006a983a in create_nestloop_plan (best_path=0x7f7cb844fb80, root=0x7f7fdc34b948) at createplan.c:4008
#17 create_join_plan (root=root@entry=0x7f7fdc34b948, best_path=best_path@entry=0x7f7cb844fb80) at createplan.c:1020
#18 0x00000000006a6d75 in create_plan_recurse (root=root@entry=0x7f7fdc34b948, best_path=0x7f7cb844fb80, flags=flags@entry=1) at createplan.c:393
#19 0x00000000006a9739 in create_plan (root=root@entry=0x7f7fdc34b948, best_path=<optimized out>) at createplan.c:325
#20 0x00000000006b5a04 in standard_planner (parse=0x1bd2308, cursorOptions=256, boundParams=0x0) at planner.c:413
#21 0x000000000075fb2e in pg_plan_query (querytree=querytree@entry=0x1bd2308, cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0) at postgres.c:878
#22 0x000000000075fbee in pg_plan_queries (querytrees=<optimized out>, cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0) at postgres.c:968
#23 0x000000000076007a in exec_simple_query (
query_string=0x1ba36e0 "SELECT * FROM\n\t(SELECT start_time, t1.site_id\n\tFROM pgw_kpi_view t1\n\t\n\tWHERE (start_time>='2019-10-10' AND start_time<'2019-10-11')\n\t\n\tGROUP BY 1,2\n\t) AS data\nJOIN sites ON ( sites.site_location='' OR"...) at postgres.c:1143
#24 0x0000000000761212 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x1bd8e70, dbname=0x1bd8d10 "ts", username=<optimized out>) at postgres.c:4236
#25 0x0000000000483d02 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4431
#26 BackendStartup (port=0x1bd5190) at postmaster.c:4122
#27 ServerLoop () at postmaster.c:1704
#28 0x00000000006f0b1f in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1b9e280) at postmaster.c:1377
#29 0x0000000000484c93 in main (argc=3, argv=0x1b9e280) at main.c:228

bt f:

#2 0x00000000006a587f in prepare_sort_from_pathkeys (lefttree=0x7f7cb84492e8, pathkeys=<optimized out>, relids=0x7f7cb8410700, reqColIdx=reqColIdx@entry=0x0, adjust_tlist_in_place=<optimized out>,
adjust_tlist_in_place@entry=false, p_numsortkeys=p_numsortkeys@entry=0x7ffc4b2e10c4, p_sortColIdx=p_sortColIdx@entry=0x7ffc4b2e10c8, p_sortOperators=p_sortOperators@entry=0x7ffc4b2e10d0,
p_collations=p_collations@entry=0x7ffc4b2e10d8, p_nullsFirst=p_nullsFirst@entry=0x7ffc4b2e10e0) at createplan.c:5893
sortexpr = <optimized out>
ec = 0x7f7cb8edbe28
em = <optimized out>
tle = <optimized out>
pathkey = <optimized out>
pk_datatype = <optimized out>
sortop = <optimized out>
j = <optimized out>
tlist = 0x7f7cb8451bb8
i = 0x7f7cb8edc2d8
numsortkeys = 0
sortColIdx = 0x7f7cb8451c58
sortOperators = 0x7f7cb8451c70
collations = 0x7f7cb8451c88
nullsFirst = 0x7f7cb8451ca0
__func__ = "prepare_sort_from_pathkeys"
#3 0x00000000006a5a6a in make_sort_from_pathkeys (lefttree=<optimized out>, pathkeys=<optimized out>, relids=<optimized out>) at createplan.c:6020
numsortkeys = 32636
sortColIdx = 0x7f7cb8447468
sortOperators = 0x7f7cb83fa278
collations = 0x0
nullsFirst = 0x7f7cb8edc2f8

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Justin Pryzby (#1)

Re: v12.0: ERROR: could not find pathkey item to sort

Justin Pryzby <pryzby@telsasoft.com> writes:

I've reduced the failing query as much as possible to this:
-- This is necessary to fail:
SET enable_nestloop=off;

SELECT * FROM
(SELECT start_time, t1.site_id
FROM pgw_kpi_view t1
-- Apparently the where clause is necessary to fail...
WHERE (start_time>='2019-10-10' AND start_time<'2019-10-11')
-- The group by MAY be necessary to fail...
GROUP BY 1,2
) AS data
JOIN sites ON ( sites.site_location='' OR sites.site_office=data.site_id)

The view is actually a join of two relkind=p partitioned tables (which I
will acknowledge probably performs poorly).

Could you provide a self-contained test case please?

regards, tom lane

Justin Pryzby

pryzby@telsasoft.com

over 6 years ago

In reply to: Tom Lane (#2)

Re: v12.0: ERROR: could not find pathkey item to sort

On Fri, Oct 11, 2019 at 10:48:37AM -0400, Tom Lane wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

The view is actually a join of two relkind=p partitioned tables (which I
will acknowledge probably performs poorly).

Could you provide a self-contained test case please?

Working on it. FWIW explain for a v11 customer looks like this:

Nested Loop (cost=10000011818.10..10000076508.23 rows=734500 width=159)
Join Filter: ((s.site_location = ''::text) OR (s.site_office = (COALESCE(huawei_ggsn_201610.ne_name, huawei_ggsn_gw_201610.ne_name))))
-> Group (cost=11818.10..11946.31 rows=2937 width=40)
Group Key: (COALESCE(huawei_ggsn_201610.start_time, huawei_ggsn_gw_201610.start_time)), (COALESCE(huawei_ggsn_201610.ne_name, huawei_ggsn_gw_201610.ne_name))
-> Merge Append (cost=11818.10..11931.59 rows=2944 width=40)
Sort Key: (COALESCE(huawei_ggsn_201610.start_time, huawei_ggsn_gw_201610.start_time)), (COALESCE(huawei_ggsn_201610.ne_name, huawei_ggsn_gw_201610.ne_name))
-> Group (cost=332.48..333.10 rows=83 width=40)
Group Key: (COALESCE(huawei_ggsn_201610.start_time, huawei_ggsn_gw_201610.start_time)), (COALESCE(huawei_ggsn_201610.ne_name, huawei_ggsn_gw_201610.ne_name))
-> Sort (cost=332.48..332.69 rows=83 width=40)
Sort Key: (COALESCE(huawei_ggsn_201610.start_time, huawei_ggsn_gw_201610.start_time)), (COALESCE(huawei_ggsn_201610.ne_name, huawei_ggsn_gw_201610.ne_name))
-> Hash Full Join (cost=46.48..329.84 rows=83 width=40)
Hash Cond: ((huawei_ggsn_201610.ne_name = huawei_ggsn_gw_201610.ne_name) AND (huawei_ggsn_201610.ggsn_function = huawei_ggsn_gw_201610.ggsn_function) AND (huawei_ggsn_201610.start_time = huawei_
ggsn_gw_201610.start_time) AND (huawei_ggsn_201610.interval_seconds = huawei_ggsn_gw_201610.interval_seconds) AND (huawei_ggsn_201610.device_id = huawei_ggsn_gw_201610.device_id) AND (huawei_ggsn_201610.c_134710251 = huawei_ggs
n_gw_201610.c_134710251) AND (huawei_ggsn_201610.c_134710252 = huawei_ggsn_gw_201610.c_134710252) AND (huawei_ggsn_201610.c_134710253 = huawei_ggsn_gw_201610.c_134710253) AND (huawei_ggsn_201610.ne_id = huawei_ggsn_gw_201610.ne
_id) AND (huawei_ggsn_201610.ugw_function = huawei_ggsn_gw_201610.ugw_function))
Filter: ((COALESCE(huawei_ggsn_201610.start_time, huawei_ggsn_gw_201610.start_time) >= '2019-10-01 00:00:00-11'::timestamp with time zone) AND (COALESCE(huawei_ggsn_201610.start_time, huawei_ggs
n_gw_201610.start_time) < '2019-10-02 00:00:00-11'::timestamp with time zone))
-> Seq Scan on huawei_ggsn_201610 (cost=0.00..255.44 rows=744 width=94)
-> Hash (cost=20.44..20.44 rows=744 width=94)
-> Seq Scan on huawei_ggsn_gw_201610 (cost=0.00..20.44 rows=744 width=94)
[...]

I'm suspecting this; is it useful to test with this commit reverted ?

commit 8edd0e79460b414b1d971895312e549e95e12e4f
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon Mar 25 15:42:35 2019 -0400

Suppress Append and MergeAppend plan nodes that have a single child.

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Justin Pryzby (#3)

Re: v12.0: ERROR: could not find pathkey item to sort

Justin Pryzby <pryzby@telsasoft.com> writes:

On Fri, Oct 11, 2019 at 10:48:37AM -0400, Tom Lane wrote:

Could you provide a self-contained test case please?

I'm suspecting this; is it useful to test with this commit reverted ?

I wouldn't bother; we'd still need a test case to find out what the
problem is.

regards, tom lane

Justin Pryzby

pryzby@telsasoft.com

over 6 years ago

In reply to: Tom Lane (#2)

Re: v12.0: ERROR: could not find pathkey item to sort

On Fri, Oct 11, 2019 at 10:48:37AM -0400, Tom Lane wrote:

Could you provide a self-contained test case please?

SET enable_partitionwise_aggregate = 'on';
SET enable_partitionwise_join = 'on';
SET max_parallel_workers_per_gather=0;
-- maybe not important but explain(settings) suggests I should include them for completeness:
SET effective_io_concurrency = '0';
SET work_mem = '512MB';
SET jit = 'off';

CREATE TABLE s(site_id int, site_location text, site_office text);
INSERT INTO s SELECT generate_series(1,99),'','';

CREATE TABLE t(start_time timestamp, site_id text, i int)PARTITION BY RANGE(start_time);
CREATE TABLE t1 PARTITION OF t FOR VALUES FROM ('2019-10-01')TO('2019-10-02');
INSERT INTO t1 SELECT a,b FROM generate_series( '2019-10-01'::timestamp, '2019-10-01 23:45'::timestamp, '15 minutes')a, generate_series(1,99)b, generate_series(1,99)c;
CREATE TABLE t2 PARTITION OF t FOR VALUES FROM ('2019-10-02')TO('2019-10-03');
INSERT INTO t2 SELECT a,b FROM generate_series( '2019-10-02'::timestamp, '2019-10-02 23:45'::timestamp, '15 minutes')a, generate_series(1,99)b, generate_series(1,99)c;

ANALYZE s,t;

explain
SELECT s.* FROM
(SELECT start_time, site_id::int
FROM t t1 FULL JOIN t t2 USING(start_time,site_id)
WHERE (start_time>='2019-10-01' AND start_time<'2019-10-01 01:00')
GROUP BY 1,2) AS data
JOIN s ON (s.site_location='' OR s.site_office::int=data.site_id)

Justin

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Justin Pryzby (#5)

Re: v12.0: ERROR: could not find pathkey item to sort

Justin Pryzby <pryzby@telsasoft.com> writes:

On Fri, Oct 11, 2019 at 10:48:37AM -0400, Tom Lane wrote:

Could you provide a self-contained test case please?

[ test case ]

Yup, fails for me too. Will look shortly.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: Justin Pryzby (#5)

Re: v12.0: ERROR: could not find pathkey item to sort

Justin Pryzby <pryzby@telsasoft.com> writes:

On Fri, Oct 11, 2019 at 10:48:37AM -0400, Tom Lane wrote:

Could you provide a self-contained test case please?

[ test case ]

Oh, this is the same issue Amit described in

/messages/by-id/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com

namely that we're not generating EquivalenceClass members corresponding
to sub-joins of a partitionwise join.

Are you interested in helping to test the patches proposed there?

regards, tom lane

Justin Pryzby

pryzby@telsasoft.com

about 6 years ago

In reply to: Tom Lane (#7)

Re: v12.0: ERROR: could not find pathkey item to sort

On Sun, Oct 13, 2019 at 02:06:02PM -0400, Tom Lane wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

On Fri, Oct 11, 2019 at 10:48:37AM -0400, Tom Lane wrote:

Could you provide a self-contained test case please?

[ test case ]

Oh, this is the same issue Amit described in

/messages/by-id/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com

namely that we're not generating EquivalenceClass members corresponding
to sub-joins of a partitionwise join.

Are you interested in helping to test the patches proposed there?

Sure. Any requests other than testing that our original query works correctly
and maybe endeavoring to read the patch ?

BTW it probably should've been documented as an "Open Item" for v12.

--
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581

Justin Pryzby

pryzby@telsasoft.com

about 6 years ago

In reply to: Justin Pryzby (#8)

Re: v12.0: ERROR: could not find pathkey item to sort

On Sun, Oct 13, 2019 at 01:30:29PM -0500, Justin Pryzby wrote:

BTW it probably should've been documented as an "Open Item" for v12.

https://commitfest.postgresql.org/25/2278/
I realized possibly people were thinking of that as a "feature" and not a
bugfix for backpatch (?)

But, my issue is a query which worked under v11 PWJ but fails under v12
(apparently broken by d25ea01275).

In my mind, if the planner doesn't support that query with PWJ, I think it
should run without PWJ rather than fail.

Justin

#10

Tom Lane

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: Justin Pryzby (#9)

Re: v12.0: ERROR: could not find pathkey item to sort

Justin Pryzby <pryzby@telsasoft.com> writes:

On Sun, Oct 13, 2019 at 01:30:29PM -0500, Justin Pryzby wrote:

BTW it probably should've been documented as an "Open Item" for v12.

https://commitfest.postgresql.org/25/2278/
I realized possibly people were thinking of that as a "feature" and not a
bugfix for backpatch (?)
But, my issue is a query which worked under v11 PWJ but fails under v12
(apparently broken by d25ea01275).

Yeah, this should have been dealt with as an open item, but it
slipped through the cracks. We'll make sure to get it fixed,
one way or another, for 12.1.

In view of the proposed patches being dependent on some other
13-only changes, I wonder if we should fix v12 by reverting
d25ea0127. The potential planner performance loss for large
partition sets could be nasty, but failing to plan at all is worse.

regards, tom lane

#11

Amit Langote

amitlangote09@gmail.com

about 6 years ago

In reply to: Tom Lane (#10)

2 attachment(s)

Re: v12.0: ERROR: could not find pathkey item to sort

Sorry about the late reply.

On Mon, Oct 14, 2019 at 11:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

On Sun, Oct 13, 2019 at 01:30:29PM -0500, Justin Pryzby wrote:

BTW it probably should've been documented as an "Open Item" for v12.

https://commitfest.postgresql.org/25/2278/
I realized possibly people were thinking of that as a "feature" and not a
bugfix for backpatch (?)
But, my issue is a query which worked under v11 PWJ but fails under v12
(apparently broken by d25ea01275).

Yeah, this should have been dealt with as an open item, but it
slipped through the cracks. We'll make sure to get it fixed,
one way or another, for 12.1.

In view of the proposed patches being dependent on some other
13-only changes, I wonder if we should fix v12 by reverting
d25ea0127. The potential planner performance loss for large
partition sets could be nasty, but failing to plan at all is worse.

Actually, the patch I proposed to fix equivalence code can be applied
on its own. The example I gave on that thread needs us to fix
partitionwise code to even work, which is perhaps a 13-only change,
but we have an example here that is broken due to d25ea01275.
Perhaps, considering applying my patch seems better than alternatives
which are either reverting d25ea01275 or avoiding doing partitionwise
join for such queries.

Since we've got 3373c71553 ("Speed up finding EquivalenceClasses for a
given set of rels") in HEAD, need two versions of the patch; please
see attached.

Thanks,
Amit

Attachments:

d25ea01275-fixup-v12.patchapplication/octet-stream; name=d25ea01275-fixup-v12.patchDownload

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b7723481b0..0a95e7221d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1067,7 +1067,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		 * paths that produce those sort orderings).
 		 */
 		if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
-			add_child_rel_equivalences(root, appinfo, rel, childrel);
+			add_child_rel_equivalences(root, &appinfo, 1, rel, childrel);
 		childrel->has_eclass_joins = rel->has_eclass_joins;
 
 		/*
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 688d9b0707..268ade8ccf 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2109,12 +2109,15 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
  * Note that this function won't be called at all unless we have at least some
  * reason to believe that the EC members it generates will be useful.
  *
- * parent_rel and child_rel could be derived from appinfo, but since the
+ * parent_rel and child_rel could be derived from appinfos, but since the
  * caller has already computed them, we might as well just pass them in.
+ * Note that parent_rel and child_rel are either BASEREL and OTHER_MEMBER_REL,
+ * respectively, or JOINREL and OTHER_JOINREL.
  */
 void
 add_child_rel_equivalences(PlannerInfo *root,
-						   AppendRelInfo *appinfo,
+						   AppendRelInfo **appinfos,
+						   int		nappinfos,
 						   RelOptInfo *parent_rel,
 						   RelOptInfo *child_rel)
 {
@@ -2124,6 +2127,7 @@ add_child_rel_equivalences(PlannerInfo *root,
 	{
 		EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);
 		ListCell   *lc2;
+		EquivalenceMember *first_em;
 
 		/*
 		 * If this EC contains a volatile expression, then generating child
@@ -2133,6 +2137,19 @@ add_child_rel_equivalences(PlannerInfo *root,
 		if (cur_ec->ec_has_volatile)
 			continue;
 
+		first_em = (EquivalenceMember *) linitial(cur_ec->ec_members);
+
+		/*
+		 * Only translate ECs whose members expressions could possibly match
+		 * the parent relation.  That is, for "baserel" parent and child
+		 * relations only consider ECs that contain single-relation members,
+		 * whereas, for "joinrel" parent and child relations, only consider ECs
+		 * that contain multi-relation members.  Note that looking at the first
+		 * EC member is enough, because others should look the same.
+		 */
+		if (bms_membership(parent_rel->relids) != bms_membership(first_em->em_relids))
+			continue;
+
 		/*
 		 * No point in searching if child's topmost parent rel is not
 		 * mentioned in eclass.
@@ -2147,34 +2164,31 @@ add_child_rel_equivalences(PlannerInfo *root,
 			if (cur_em->em_is_const)
 				continue;		/* ignore consts here */
 
-			/*
-			 * We consider only original EC members here, not
-			 * already-transformed child members.  Otherwise, if some original
-			 * member expression references more than one appendrel, we'd get
-			 * an O(N^2) explosion of useless derived expressions for
-			 * combinations of children.
-			 */
-			if (cur_em->em_is_child)
-				continue;		/* ignore children here */
-
 			/* Does this member reference child's topmost parent rel? */
-			if (bms_overlap(cur_em->em_relids, child_rel->top_parent_relids))
+			if (bms_is_subset(cur_em->em_relids, child_rel->top_parent_relids))
 			{
 				/* Yes, generate transformed child version */
 				Expr	   *child_expr;
 				Relids		new_relids;
 				Relids		new_nullable_relids;
 
-				if (parent_rel->reloptkind == RELOPT_BASEREL)
+				/*
+				 * If the parent_rel is itself the topmost parent rel, transform
+				 * directly.
+				 */
+				if (parent_rel->reloptkind == RELOPT_BASEREL ||
+					parent_rel->reloptkind == RELOPT_JOINREL)
 				{
 					/* Simple single-level transformation */
 					child_expr = (Expr *)
 						adjust_appendrel_attrs(root,
 											   (Node *) cur_em->em_expr,
-											   1, &appinfo);
+											   nappinfos, appinfos);
 				}
 				else
 				{
+					Assert(parent_rel->reloptkind == RELOPT_OTHER_MEMBER_REL ||
+						   parent_rel->reloptkind == RELOPT_OTHER_JOINREL);
 					/* Must do multi-level transformation */
 					child_expr = (Expr *)
 						adjust_appendrel_attrs_multilevel(root,
@@ -2210,6 +2224,12 @@ add_child_rel_equivalences(PlannerInfo *root,
 				(void) add_eq_member(cur_ec, child_expr,
 									 new_relids, new_nullable_relids,
 									 true, cur_em->em_datatype);
+
+				/*
+				 * There aren't going to be more expressions to translate in
+				 * the same EC.
+				 */
+				break;
 			}
 		}
 	}
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 6054bd2b53..bf74abd188 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -848,6 +848,16 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 														(Node *) parent_joinrel->joininfo,
 														nappinfos,
 														appinfos);
+
+	/*
+	 * If the parent joinrel has pending equivalence classes, so does the
+	 * child.
+	 */
+	if (parent_joinrel->has_eclass_joins ||
+		has_useful_pathkeys(root, parent_joinrel))
+			add_child_rel_equivalences(root, appinfos, nappinfos,
+									   parent_joinrel, joinrel);
+
 	pfree(appinfos);
 
 	/*
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7345137d1d..832cc84bb4 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -150,7 +150,8 @@ extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
 														   ForeignKeyOptInfo *fkinfo,
 														   int colno);
 extern void add_child_rel_equivalences(PlannerInfo *root,
-									   AppendRelInfo *appinfo,
+									   AppendRelInfo **appinfos,
+									   int		nappinfos,
 									   RelOptInfo *parent_rel,
 									   RelOptInfo *child_rel);
 extern List *generate_implied_equalities_for_column(PlannerInfo *root,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85dec..003c8d1c24 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,67 @@ explain (costs off)
    Filter: ((unique1 = unique1) OR (unique2 = unique2))
 (2 rows)
 
+-- Check that child merge join for a FULL OUTER join works correctly
+SET enable_partitionwise_join TO on;
+SET enable_partitionwise_aggregate TO on;
+CREATE TABLE child_joins_ecs_testtab1 (a int);
+INSERT INTO child_joins_ecs_testtab1 SELECT generate_series(1, 100);
+CREATE TABLE child_joins_ecs_testtab2 (a int, b int) PARTITION BY RANGE (a);
+CREATE TABLE child_joins_ecs_testtab2_p1 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (1) TO (10001);
+CREATE TABLE child_joins_ecs_testtab2_p2 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (10001) TO (20001);
+CREATE TABLE child_joins_ecs_testtab2_p3 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (20001) TO (30001);
+INSERT INTO child_joins_ecs_testtab2 SELECT a, a % 100 + 1 FROM generate_series(1, 30000) a;
+ANALYZE child_joins_ecs_testtab1, child_joins_ecs_testtab2;
+-- this forces plan to be a specific shape
+SET work_mem TO '0.1MB';
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (COSTS OFF)
+SELECT child_joins_ecs_testtab1.*
+  FROM (SELECT a, b
+        FROM child_joins_ecs_testtab2 t1 FULL JOIN child_joins_ecs_testtab2 t2 USING(a, b)
+        WHERE a >= 1 AND a < 200000
+        GROUP BY 1, 2) AS data
+  JOIN child_joins_ecs_testtab1 ON (child_joins_ecs_testtab1.a = data.b);
+                                                    QUERY PLAN                                                     
+-------------------------------------------------------------------------------------------------------------------
+ Nested Loop
+   Join Filter: ((COALESCE(t1.b, t2.b)) = child_joins_ecs_testtab1.a)
+   ->  Group
+         Group Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+         ->  Merge Append
+               Sort Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+               ->  Group
+                     Group Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+                     ->  Sort
+                           Sort Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1.a = t2.a) AND (t1.b = t2.b))
+                                 Filter: ((COALESCE(t1.a, t2.a) >= 1) AND (COALESCE(t1.a, t2.a) < 200000))
+                                 ->  Seq Scan on child_joins_ecs_testtab2_p1 t1
+                                 ->  Hash
+                                       ->  Seq Scan on child_joins_ecs_testtab2_p1 t2
+               ->  Group
+                     Group Key: (COALESCE(t1_1.a, t2_1.a)), (COALESCE(t1_1.b, t2_1.b))
+                     ->  Sort
+                           Sort Key: (COALESCE(t1_1.a, t2_1.a)), (COALESCE(t1_1.b, t2_1.b))
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1_1.a = t2_1.a) AND (t1_1.b = t2_1.b))
+                                 Filter: ((COALESCE(t1_1.a, t2_1.a) >= 1) AND (COALESCE(t1_1.a, t2_1.a) < 200000))
+                                 ->  Seq Scan on child_joins_ecs_testtab2_p2 t1_1
+                                 ->  Hash
+                                       ->  Seq Scan on child_joins_ecs_testtab2_p2 t2_1
+               ->  Group
+                     Group Key: (COALESCE(t1_2.a, t2_2.a)), (COALESCE(t1_2.b, t2_2.b))
+                     ->  Sort
+                           Sort Key: (COALESCE(t1_2.a, t2_2.a)), (COALESCE(t1_2.b, t2_2.b))
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1_2.a = t2_2.a) AND (t1_2.b = t2_2.b))
+                                 Filter: ((COALESCE(t1_2.a, t2_2.a) >= 1) AND (COALESCE(t1_2.a, t2_2.a) < 200000))
+                                 ->  Seq Scan on child_joins_ecs_testtab2_p3 t1_2
+                                 ->  Hash
+                                       ->  Seq Scan on child_joins_ecs_testtab2_p3 t2_2
+   ->  Materialize
+         ->  Seq Scan on child_joins_ecs_testtab1
+(38 rows)
+
+DROP TABLE child_joins_ecs_testtab1, child_joins_ecs_testtab2;
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65de39..bd792dfc3c 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,26 @@ explain (costs off)
 -- this could be converted, but isn't at present
 explain (costs off)
   select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+-- Check that child merge join for a FULL OUTER join works correctly
+SET enable_partitionwise_join TO on;
+SET enable_partitionwise_aggregate TO on;
+CREATE TABLE child_joins_ecs_testtab1 (a int);
+INSERT INTO child_joins_ecs_testtab1 SELECT generate_series(1, 100);
+CREATE TABLE child_joins_ecs_testtab2 (a int, b int) PARTITION BY RANGE (a);
+CREATE TABLE child_joins_ecs_testtab2_p1 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (1) TO (10001);
+CREATE TABLE child_joins_ecs_testtab2_p2 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (10001) TO (20001);
+CREATE TABLE child_joins_ecs_testtab2_p3 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (20001) TO (30001);
+INSERT INTO child_joins_ecs_testtab2 SELECT a, a % 100 + 1 FROM generate_series(1, 30000) a;
+ANALYZE child_joins_ecs_testtab1, child_joins_ecs_testtab2;
+-- this forces plan to be a specific shape
+SET work_mem TO '0.1MB';
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (COSTS OFF)
+SELECT child_joins_ecs_testtab1.*
+  FROM (SELECT a, b
+        FROM child_joins_ecs_testtab2 t1 FULL JOIN child_joins_ecs_testtab2 t2 USING(a, b)
+        WHERE a >= 1 AND a < 200000
+        GROUP BY 1, 2) AS data
+  JOIN child_joins_ecs_testtab1 ON (child_joins_ecs_testtab1.a = data.b);
+DROP TABLE child_joins_ecs_testtab1, child_joins_ecs_testtab2;

d25ea01275-fixup-v13.patchapplication/octet-stream; name=d25ea01275-fixup-v13.patchDownload

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index db3a68a51d..70d0be691e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1067,7 +1067,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		 * paths that produce those sort orderings).
 		 */
 		if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
-			add_child_rel_equivalences(root, appinfo, rel, childrel);
+			add_child_rel_equivalences(root, &appinfo, 1, rel, childrel);
 		childrel->has_eclass_joins = rel->has_eclass_joins;
 
 		/*
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index ccc07ba9f0..67756ab671 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1193,6 +1193,20 @@ generate_join_implied_equalities(PlannerInfo *root,
 		result = list_concat(result, sublist);
 	}
 
+	/*
+	 * Remember the EC indexes that were found to refer to this joinrel.  This
+	 * is useful in add_child_rel_equivalences() where we need to add child
+	 * versions of the expressions in each of these ECs.
+	 */
+	if (!bms_is_empty(matching_ecs))
+	{
+		RelOptInfo *joinrel = find_join_rel(root, join_relids);
+
+		if (joinrel)
+			joinrel->eclass_indexes = bms_add_members(joinrel->eclass_indexes,
+													  matching_ecs);
+	}
+
 	return result;
 }
 
@@ -2215,12 +2229,15 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
  * Note that this function won't be called at all unless we have at least some
  * reason to believe that the EC members it generates will be useful.
  *
- * parent_rel and child_rel could be derived from appinfo, but since the
+ * parent_rel and child_rel could be derived from appinfos, but since the
  * caller has already computed them, we might as well just pass them in.
+ * Note that parent_rel and child_rel are either BASEREL and OTHER_MEMBER_REL,
+ * respectively, or JOINREL and OTHER_JOINREL.
  */
 void
 add_child_rel_equivalences(PlannerInfo *root,
-						   AppendRelInfo *appinfo,
+						   AppendRelInfo **appinfos,
+						   int		nappinfos,
 						   RelOptInfo *parent_rel,
 						   RelOptInfo *child_rel)
 {
@@ -2231,13 +2248,14 @@ add_child_rel_equivalences(PlannerInfo *root,
 	 * eclass_indexes to avoid searching all of root->eq_classes.
 	 */
 	Assert(root->ec_merging_done);
-	Assert(IS_SIMPLE_REL(parent_rel));
+	Assert(IS_SIMPLE_REL(parent_rel) || IS_JOIN_REL(parent_rel));
 
 	i = -1;
 	while ((i = bms_next_member(parent_rel->eclass_indexes, i)) >= 0)
 	{
 		EquivalenceClass *cur_ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
 		int			num_members;
+		EquivalenceMember *first_em;
 
 		/*
 		 * If this EC contains a volatile expression, then generating child
@@ -2247,8 +2265,21 @@ add_child_rel_equivalences(PlannerInfo *root,
 		if (cur_ec->ec_has_volatile)
 			continue;
 
+		first_em = (EquivalenceMember *) linitial(cur_ec->ec_members);
+
+		/*
+		 * Only translate ECs whose members expressions could possibly match
+		 * the parent relation.  That is, for "baserel" parent and child
+		 * relations only consider ECs that contain single-relation members,
+		 * whereas, for "joinrel" parent and child relations, only consider ECs
+		 * that contain multi-relation members.  Note that looking at the first
+		 * EC member is enough, because others should look the same.
+		 */
+		if (bms_membership(parent_rel->relids) != bms_membership(first_em->em_relids))
+			continue;
+
 		/* Sanity check eclass_indexes only contain ECs for parent_rel */
-		Assert(bms_is_subset(child_rel->top_parent_relids, cur_ec->ec_relids));
+		Assert(bms_overlap(child_rel->top_parent_relids, cur_ec->ec_relids));
 
 		/*
 		 * We don't use foreach() here because there's no point in scanning
@@ -2263,34 +2294,31 @@ add_child_rel_equivalences(PlannerInfo *root,
 			if (cur_em->em_is_const)
 				continue;		/* ignore consts here */
 
-			/*
-			 * We consider only original EC members here, not
-			 * already-transformed child members.  Otherwise, if some original
-			 * member expression references more than one appendrel, we'd get
-			 * an O(N^2) explosion of useless derived expressions for
-			 * combinations of children.
-			 */
-			if (cur_em->em_is_child)
-				continue;		/* ignore children here */
-
 			/* Does this member reference child's topmost parent rel? */
-			if (bms_overlap(cur_em->em_relids, child_rel->top_parent_relids))
+			if (bms_is_subset(cur_em->em_relids, child_rel->top_parent_relids))
 			{
 				/* Yes, generate transformed child version */
 				Expr	   *child_expr;
 				Relids		new_relids;
 				Relids		new_nullable_relids;
 
-				if (parent_rel->reloptkind == RELOPT_BASEREL)
+				/*
+				 * If the parent_rel is itself the topmost parent rel, transform
+				 * directly.
+				 */
+				if (parent_rel->reloptkind == RELOPT_BASEREL ||
+					parent_rel->reloptkind == RELOPT_JOINREL)
 				{
 					/* Simple single-level transformation */
 					child_expr = (Expr *)
 						adjust_appendrel_attrs(root,
 											   (Node *) cur_em->em_expr,
-											   1, &appinfo);
+											   nappinfos, appinfos);
 				}
 				else
 				{
+					Assert(parent_rel->reloptkind == RELOPT_OTHER_MEMBER_REL ||
+						   parent_rel->reloptkind == RELOPT_OTHER_JOINREL);
 					/* Must do multi-level transformation */
 					child_expr = (Expr *)
 						adjust_appendrel_attrs_multilevel(root,
@@ -2329,6 +2357,12 @@ add_child_rel_equivalences(PlannerInfo *root,
 
 				/* Record this EC index for the child rel */
 				child_rel->eclass_indexes = bms_add_member(child_rel->eclass_indexes, i);
+
+				/*
+				 * There aren't going to be more expressions to translate in
+				 * the same EC.
+				 */
+				break;
 			}
 		}
 	}
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 85415381fb..e659606e9e 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -661,6 +661,9 @@ build_join_rel(PlannerInfo *root,
 	joinrel->nullable_partexprs = NULL;
 	joinrel->partitioned_child_rels = NIL;
 
+	/* Add the joinrel to the PlannerInfo. */
+	add_join_rel(root, joinrel);
+
 	/* Compute information relevant to the foreign relations. */
 	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
 
@@ -734,9 +737,6 @@ build_join_rel(PlannerInfo *root,
 		is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
 		joinrel->consider_parallel = true;
 
-	/* Add the joinrel to the PlannerInfo. */
-	add_join_rel(root, joinrel);
-
 	/*
 	 * Also, if dynamic-programming join search is active, add the new joinrel
 	 * to the appropriate sublist.  Note: you might think the Assert on number
@@ -854,6 +854,16 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 														(Node *) parent_joinrel->joininfo,
 														nappinfos,
 														appinfos);
+
+	/*
+	 * If the parent joinrel has pending equivalence classes, so does the
+	 * child.
+	 */
+	if (parent_joinrel->has_eclass_joins ||
+		has_useful_pathkeys(root, parent_joinrel))
+			add_child_rel_equivalences(root, appinfos, nappinfos,
+									   parent_joinrel, joinrel);
+
 	pfree(appinfos);
 
 	/*
@@ -863,10 +873,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	joinrel->direct_lateral_relids = (Relids) bms_copy(parent_joinrel->direct_lateral_relids);
 	joinrel->lateral_relids = (Relids) bms_copy(parent_joinrel->lateral_relids);
 
-	/*
-	 * If the parent joinrel has pending equivalence classes, so does the
-	 * child.
-	 */
 	joinrel->has_eclass_joins = parent_joinrel->has_eclass_joins;
 
 	/* Is the join between partitions itself partitioned? */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7345137d1d..832cc84bb4 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -150,7 +150,8 @@ extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
 														   ForeignKeyOptInfo *fkinfo,
 														   int colno);
 extern void add_child_rel_equivalences(PlannerInfo *root,
-									   AppendRelInfo *appinfo,
+									   AppendRelInfo **appinfos,
+									   int		nappinfos,
 									   RelOptInfo *parent_rel,
 									   RelOptInfo *child_rel);
 extern List *generate_implied_equalities_for_column(PlannerInfo *root,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85dec..003c8d1c24 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,67 @@ explain (costs off)
    Filter: ((unique1 = unique1) OR (unique2 = unique2))
 (2 rows)
 
+-- Check that child merge join for a FULL OUTER join works correctly
+SET enable_partitionwise_join TO on;
+SET enable_partitionwise_aggregate TO on;
+CREATE TABLE child_joins_ecs_testtab1 (a int);
+INSERT INTO child_joins_ecs_testtab1 SELECT generate_series(1, 100);
+CREATE TABLE child_joins_ecs_testtab2 (a int, b int) PARTITION BY RANGE (a);
+CREATE TABLE child_joins_ecs_testtab2_p1 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (1) TO (10001);
+CREATE TABLE child_joins_ecs_testtab2_p2 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (10001) TO (20001);
+CREATE TABLE child_joins_ecs_testtab2_p3 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (20001) TO (30001);
+INSERT INTO child_joins_ecs_testtab2 SELECT a, a % 100 + 1 FROM generate_series(1, 30000) a;
+ANALYZE child_joins_ecs_testtab1, child_joins_ecs_testtab2;
+-- this forces plan to be a specific shape
+SET work_mem TO '0.1MB';
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (COSTS OFF)
+SELECT child_joins_ecs_testtab1.*
+  FROM (SELECT a, b
+        FROM child_joins_ecs_testtab2 t1 FULL JOIN child_joins_ecs_testtab2 t2 USING(a, b)
+        WHERE a >= 1 AND a < 200000
+        GROUP BY 1, 2) AS data
+  JOIN child_joins_ecs_testtab1 ON (child_joins_ecs_testtab1.a = data.b);
+                                                    QUERY PLAN                                                     
+-------------------------------------------------------------------------------------------------------------------
+ Nested Loop
+   Join Filter: ((COALESCE(t1.b, t2.b)) = child_joins_ecs_testtab1.a)
+   ->  Group
+         Group Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+         ->  Merge Append
+               Sort Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+               ->  Group
+                     Group Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+                     ->  Sort
+                           Sort Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1.a = t2.a) AND (t1.b = t2.b))
+                                 Filter: ((COALESCE(t1.a, t2.a) >= 1) AND (COALESCE(t1.a, t2.a) < 200000))
+                                 ->  Seq Scan on child_joins_ecs_testtab2_p1 t1
+                                 ->  Hash
+                                       ->  Seq Scan on child_joins_ecs_testtab2_p1 t2
+               ->  Group
+                     Group Key: (COALESCE(t1_1.a, t2_1.a)), (COALESCE(t1_1.b, t2_1.b))
+                     ->  Sort
+                           Sort Key: (COALESCE(t1_1.a, t2_1.a)), (COALESCE(t1_1.b, t2_1.b))
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1_1.a = t2_1.a) AND (t1_1.b = t2_1.b))
+                                 Filter: ((COALESCE(t1_1.a, t2_1.a) >= 1) AND (COALESCE(t1_1.a, t2_1.a) < 200000))
+                                 ->  Seq Scan on child_joins_ecs_testtab2_p2 t1_1
+                                 ->  Hash
+                                       ->  Seq Scan on child_joins_ecs_testtab2_p2 t2_1
+               ->  Group
+                     Group Key: (COALESCE(t1_2.a, t2_2.a)), (COALESCE(t1_2.b, t2_2.b))
+                     ->  Sort
+                           Sort Key: (COALESCE(t1_2.a, t2_2.a)), (COALESCE(t1_2.b, t2_2.b))
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1_2.a = t2_2.a) AND (t1_2.b = t2_2.b))
+                                 Filter: ((COALESCE(t1_2.a, t2_2.a) >= 1) AND (COALESCE(t1_2.a, t2_2.a) < 200000))
+                                 ->  Seq Scan on child_joins_ecs_testtab2_p3 t1_2
+                                 ->  Hash
+                                       ->  Seq Scan on child_joins_ecs_testtab2_p3 t2_2
+   ->  Materialize
+         ->  Seq Scan on child_joins_ecs_testtab1
+(38 rows)
+
+DROP TABLE child_joins_ecs_testtab1, child_joins_ecs_testtab2;
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65de39..bd792dfc3c 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,26 @@ explain (costs off)
 -- this could be converted, but isn't at present
 explain (costs off)
   select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+-- Check that child merge join for a FULL OUTER join works correctly
+SET enable_partitionwise_join TO on;
+SET enable_partitionwise_aggregate TO on;
+CREATE TABLE child_joins_ecs_testtab1 (a int);
+INSERT INTO child_joins_ecs_testtab1 SELECT generate_series(1, 100);
+CREATE TABLE child_joins_ecs_testtab2 (a int, b int) PARTITION BY RANGE (a);
+CREATE TABLE child_joins_ecs_testtab2_p1 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (1) TO (10001);
+CREATE TABLE child_joins_ecs_testtab2_p2 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (10001) TO (20001);
+CREATE TABLE child_joins_ecs_testtab2_p3 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (20001) TO (30001);
+INSERT INTO child_joins_ecs_testtab2 SELECT a, a % 100 + 1 FROM generate_series(1, 30000) a;
+ANALYZE child_joins_ecs_testtab1, child_joins_ecs_testtab2;
+-- this forces plan to be a specific shape
+SET work_mem TO '0.1MB';
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (COSTS OFF)
+SELECT child_joins_ecs_testtab1.*
+  FROM (SELECT a, b
+        FROM child_joins_ecs_testtab2 t1 FULL JOIN child_joins_ecs_testtab2 t2 USING(a, b)
+        WHERE a >= 1 AND a < 200000
+        GROUP BY 1, 2) AS data
+  JOIN child_joins_ecs_testtab1 ON (child_joins_ecs_testtab1.a = data.b);
+DROP TABLE child_joins_ecs_testtab1, child_joins_ecs_testtab2;

#12

Tom Lane

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: Amit Langote (#11)

Re: v12.0: ERROR: could not find pathkey item to sort

Amit Langote <amitlangote09@gmail.com> writes:

On Mon, Oct 14, 2019 at 11:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

In view of the proposed patches being dependent on some other
13-only changes, I wonder if we should fix v12 by reverting
d25ea0127. The potential planner performance loss for large
partition sets could be nasty, but failing to plan at all is worse.

Actually, the patch I proposed to fix equivalence code can be applied
on its own. The example I gave on that thread needs us to fix
partitionwise code to even work, which is perhaps a 13-only change,
but we have an example here that is broken due to d25ea01275.
Perhaps, considering applying my patch seems better than alternatives
which are either reverting d25ea01275 or avoiding doing partitionwise
join for such queries.

I looked at this a bit, and I see that the core idea is to generate
the missing EC members by applying add_child_rel_equivalences when
building child join rels. Perhaps we can make that work, but this
patch fails to, because you've ignored the comment pointing out
that the nullable_relids fixup logic only works for baserels:

* And likewise for nullable_relids. Note this code assumes
* parent and child relids are singletons.

We could improve that to work for joinrels, I think, but it would become
very much more complicated (and slower). AFAICS the logic would have
to be "iterate through top_parent_relids, see which ones are in
em_nullable_relids, and for each one that is, find the corresponding
child_relid and substitute that in new_nullable_relids". This is a
bit of a problem because we don't have any really convenient way to
map individual top parent relids to child relids. I wonder if we
should extend AppendRelInfo to include the top parent relid as well as
the immediate parent. (Perhaps, while we're at it, we could make
adjust_appendrel_attrs_multilevel less of an inefficient and
underdocumented mess.)

Also, I'm pretty sure this addition is wrong/broken:

+
+                /*
+                 * There aren't going to be more expressions to translate in
+                 * the same EC.
+                 */
+                break;

What makes you think that an EC can only contain one entry per rel?

More generally, as long as this patch requires changing
add_child_rel_equivalences' API anyway, I wonder if we should
rethink that altogether. Looking at the code now, I realize that
d25ea01275 resulted in horribly bastardizing that function's API,
because the parent_rel and appinfo arguments are only consulted in
some cases, while in other cases we disregard them and rely on
child_rel->top_parent_relids to figure out how to translate stuff.
It would be possible to make the argument list be just (root, child_rel)
and always rely on child_rel->top_parent_relids. At the very least,
if we keep the extra arguments, we should document them as being just
optimizations.

regards, tom lane

#13

Amit Langote

amitlangote09@gmail.com

about 6 years ago

In reply to: Tom Lane (#12)

2 attachment(s)

Re: v12.0: ERROR: could not find pathkey item to sort

Thanks for taking a look and sorry about the delay in replying.

On Fri, Oct 25, 2019 at 1:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

On Mon, Oct 14, 2019 at 11:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

In view of the proposed patches being dependent on some other
13-only changes, I wonder if we should fix v12 by reverting
d25ea0127. The potential planner performance loss for large
partition sets could be nasty, but failing to plan at all is worse.

Actually, the patch I proposed to fix equivalence code can be applied
on its own. The example I gave on that thread needs us to fix
partitionwise code to even work, which is perhaps a 13-only change,
but we have an example here that is broken due to d25ea01275.
Perhaps, considering applying my patch seems better than alternatives
which are either reverting d25ea01275 or avoiding doing partitionwise
join for such queries.

I looked at this a bit, and I see that the core idea is to generate
the missing EC members by applying add_child_rel_equivalences when
building child join rels. Perhaps we can make that work, but this
patch fails to, because you've ignored the comment pointing out
that the nullable_relids fixup logic only works for baserels:

* And likewise for nullable_relids. Note this code assumes
* parent and child relids are singletons.

We could improve that to work for joinrels, I think, but it would become
very much more complicated (and slower). AFAICS the logic would have
to be "iterate through top_parent_relids, see which ones are in
em_nullable_relids, and for each one that is, find the corresponding
child_relid and substitute that in new_nullable_relids". This is a
bit of a problem because we don't have any really convenient way to
map individual top parent relids to child relids.

Actually, there is adjust_child_relids_multilevel() which translates
the top parent relids in em_nullable_relids to child relids.

I have updated the patches that way.

I wonder if we
should extend AppendRelInfo to include the top parent relid as well as
the immediate parent. (Perhaps, while we're at it, we could make
adjust_appendrel_attrs_multilevel less of an inefficient and
underdocumented mess.)

Hmm, I agree we should try to fix that situation somehow. Even better
if we could do away with child expressions in ECs, because they cause
EC related code to show up in profiles when executing complex queries
with thousands of partitions.

Also, I'm pretty sure this addition is wrong/broken:
+
+                /*
+                 * There aren't going to be more expressions to translate in
+                 * the same EC.
+                 */
+                break;
What makes you think that an EC can only contain one entry per rel?

I was wrong about that. Fixed.

More generally, as long as this patch requires changing
add_child_rel_equivalences' API anyway, I wonder if we should
rethink that altogether. Looking at the code now, I realize that
d25ea01275 resulted in horribly bastardizing that function's API,
because the parent_rel and appinfo arguments are only consulted in
some cases, while in other cases we disregard them and rely on
child_rel->top_parent_relids to figure out how to translate stuff.
It would be possible to make the argument list be just (root, child_rel)
and always rely on child_rel->top_parent_relids.

Actually, as of 3373c71553, add_child_rel_equivalences() looks at
parent_rel->eclass_indexes to look up ECs, so maybe we can't take out
parent_rel.

At the very least,
if we keep the extra arguments, we should document them as being just
optimizations.

For common cases that doesn't involve multi-level partitioning, it
really helps to have the appinfos be supplied by the caller because
they're already available. I've added a comment at the top about
that.

Attached updated patches.

Thanks,
Amit

Attachments:

d25ea01275-fixup-HEAD-v2.patchapplication/octet-stream; name=d25ea01275-fixup-HEAD-v2.patchDownload

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index db3a68a51d..70d0be691e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1067,7 +1067,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		 * paths that produce those sort orderings).
 		 */
 		if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
-			add_child_rel_equivalences(root, appinfo, rel, childrel);
+			add_child_rel_equivalences(root, &appinfo, 1, rel, childrel);
 		childrel->has_eclass_joins = rel->has_eclass_joins;
 
 		/*
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index ccc07ba9f0..2b838c2777 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1193,6 +1193,20 @@ generate_join_implied_equalities(PlannerInfo *root,
 		result = list_concat(result, sublist);
 	}
 
+	/*
+	 * Remember the EC indexes that were found to refer to this joinrel.  This
+	 * is useful in add_child_rel_equivalences() where we need to add child
+	 * versions of the expressions in each of these ECs.
+	 */
+	if (!bms_is_empty(matching_ecs))
+	{
+		RelOptInfo *joinrel = find_join_rel(root, join_relids);
+
+		if (joinrel)
+			joinrel->eclass_indexes = bms_add_members(joinrel->eclass_indexes,
+													  matching_ecs);
+	}
+
 	return result;
 }
 
@@ -2209,35 +2223,59 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
 
 /*
  * add_child_rel_equivalences
- *	  Search for EC members that reference the parent_rel, and
+ *	  Search for EC members that reference the root parent of child_rel, and
  *	  add transformed members referencing the child_rel.
  *
  * Note that this function won't be called at all unless we have at least some
  * reason to believe that the EC members it generates will be useful.
  *
- * parent_rel and child_rel could be derived from appinfo, but since the
+ * parent_rel and child_rel could be derived from appinfos, but since the
  * caller has already computed them, we might as well just pass them in.
+ * Note that parent_rel and child_rel are either BASEREL and OTHER_MEMBER_REL
+ * or JOINREL and OTHER_JOINREL, respectively.
+ *
+ * The AppendRelInfos that are passed in are not used at all if child_rel is
+ * not a direct child of parent_rel, because they contain mapping from the
+ * direct parent, whereas we'd like to translate from the root parent's ECs.
+ * Still, having caller pass them in common cases that don't involve multi-
+ * level inheritance is great for performance, because
+ * adjust_appendrel_attrs_multilevel(), etc. have to look up AppendRelInfos
+ * from child relids whose overhead can quickly add up.
  */
 void
 add_child_rel_equivalences(PlannerInfo *root,
-						   AppendRelInfo *appinfo,
+						   AppendRelInfo **appinfos,
+						   int		nappinfos,
 						   RelOptInfo *parent_rel,
 						   RelOptInfo *child_rel)
 {
 	int			i;
+	Relids		top_parent_relids = child_rel->top_parent_relids;
 
 	/*
 	 * EC merging should be complete already, so we can use the parent rel's
 	 * eclass_indexes to avoid searching all of root->eq_classes.
 	 */
 	Assert(root->ec_merging_done);
-	Assert(IS_SIMPLE_REL(parent_rel));
+	Assert(IS_SIMPLE_REL(parent_rel) || IS_JOIN_REL(parent_rel));
 
+	/*
+	 * A given EC may be referenced from both a "baserel" whose expressions
+	 * appear in it and also from the "joinrels" containing that "baserel".
+	 * Whereas this function is always called for baserels, it's called for
+	 * joinrels only if partitionwise join is used.
+	 *
+	 * Furthermore, a given expression in the EC may reference one or more
+	 * baserels.  If the latter, it must be transformed only by a parent
+	 * joinrel which contains all of the referenced baserels.  If, OTOH,
+	 * it references only one baserel, it must be transformed only once,
+	 * that is, with that baserel as the parent.
+	 */
 	i = -1;
 	while ((i = bms_next_member(parent_rel->eclass_indexes, i)) >= 0)
 	{
 		EquivalenceClass *cur_ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
-		int			num_members;
+		ListCell   *lc;
 
 		/*
 		 * If this EC contains a volatile expression, then generating child
@@ -2247,56 +2285,69 @@ add_child_rel_equivalences(PlannerInfo *root,
 		if (cur_ec->ec_has_volatile)
 			continue;
 
-		/* Sanity check eclass_indexes only contain ECs for parent_rel */
-		Assert(bms_is_subset(child_rel->top_parent_relids, cur_ec->ec_relids));
-
 		/*
-		 * We don't use foreach() here because there's no point in scanning
-		 * newly-added child members, so we can stop after the last
-		 * pre-existing EC member.
+		 * Sanity check that eclass_indexes only contain ECs that reference
+		 * parent_rel.  Actually, ECs only ever mention the top parent's
+		 * relids.  For joinrel parents, an EC may not mention all component
+		 * rels.
 		 */
-		num_members = list_length(cur_ec->ec_members);
-		for (int pos = 0; pos < num_members; pos++)
+		Assert(bms_is_subset(top_parent_relids, cur_ec->ec_relids) ||
+			   (IS_JOIN_REL(parent_rel) &&
+				bms_overlap(top_parent_relids, cur_ec->ec_relids)));
+
+		foreach(lc, cur_ec->ec_members)
 		{
-			EquivalenceMember *cur_em = (EquivalenceMember *) list_nth(cur_ec->ec_members, pos);
+			EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc);
+
+			/*
+			 * There's no point in scanning child members, as their children
+			 * are in turn are obtained by applying multi-level translation
+			 * to the top level expressions, so we can stop after the last
+			 * non-child EC member.
+			 */
+			if (cur_em->em_is_child)
+				break;
 
 			if (cur_em->em_is_const)
 				continue;		/* ignore consts here */
 
 			/*
-			 * We consider only original EC members here, not
-			 * already-transformed child members.  Otherwise, if some original
-			 * member expression references more than one appendrel, we'd get
-			 * an O(N^2) explosion of useless derived expressions for
-			 * combinations of children.
+			 * Does this member reference child's topmost parent rel?  Any
+			 * multi-relation expressions should be transformed only when
+			 * needed and possible.
 			 */
-			if (cur_em->em_is_child)
-				continue;		/* ignore children here */
-
-			/* Does this member reference child's topmost parent rel? */
-			if (bms_overlap(cur_em->em_relids, child_rel->top_parent_relids))
+			if (bms_membership(cur_em->em_relids) ==
+				bms_membership(top_parent_relids) &&
+				bms_is_subset(cur_em->em_relids, top_parent_relids))
 			{
 				/* Yes, generate transformed child version */
 				Expr	   *child_expr;
 				Relids		new_relids;
 				Relids		new_nullable_relids;
 
-				if (parent_rel->reloptkind == RELOPT_BASEREL)
+				/*
+				 * If the parent_rel is itself the topmost parent rel, transform
+				 * directly.
+				 */
+				if (parent_rel->reloptkind == RELOPT_BASEREL ||
+					parent_rel->reloptkind == RELOPT_JOINREL)
 				{
 					/* Simple single-level transformation */
 					child_expr = (Expr *)
 						adjust_appendrel_attrs(root,
 											   (Node *) cur_em->em_expr,
-											   1, &appinfo);
+											   nappinfos, appinfos);
 				}
 				else
 				{
+					Assert(parent_rel->reloptkind == RELOPT_OTHER_MEMBER_REL ||
+						   parent_rel->reloptkind == RELOPT_OTHER_JOINREL);
 					/* Must do multi-level transformation */
 					child_expr = (Expr *)
 						adjust_appendrel_attrs_multilevel(root,
 														  (Node *) cur_em->em_expr,
 														  child_rel->relids,
-														  child_rel->top_parent_relids);
+														  top_parent_relids);
 				}
 
 				/*
@@ -2306,22 +2357,20 @@ add_child_rel_equivalences(PlannerInfo *root,
 				 * don't want the child member to be marked as constant.
 				 */
 				new_relids = bms_difference(cur_em->em_relids,
-											child_rel->top_parent_relids);
+											top_parent_relids);
 				new_relids = bms_add_members(new_relids, child_rel->relids);
 
 				/*
-				 * And likewise for nullable_relids.  Note this code assumes
-				 * parent and child relids are singletons.
+				 * For nullable_relids, we must selectively replace parent
+				 * nullable relids to child ones.
 				 */
 				new_nullable_relids = cur_em->em_nullable_relids;
-				if (bms_overlap(new_nullable_relids,
-								child_rel->top_parent_relids))
-				{
-					new_nullable_relids = bms_difference(new_nullable_relids,
-														 child_rel->top_parent_relids);
-					new_nullable_relids = bms_add_members(new_nullable_relids,
-														  child_rel->relids);
-				}
+				if (bms_overlap(new_nullable_relids, top_parent_relids))
+					new_nullable_relids =
+						adjust_child_relids_multilevel(root,
+													   new_nullable_relids,
+													   child_rel->relids,
+													   child_rel->top_parent_relids);
 
 				(void) add_eq_member(cur_ec, child_expr,
 									 new_relids, new_nullable_relids,
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 85415381fb..e659606e9e 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -661,6 +661,9 @@ build_join_rel(PlannerInfo *root,
 	joinrel->nullable_partexprs = NULL;
 	joinrel->partitioned_child_rels = NIL;
 
+	/* Add the joinrel to the PlannerInfo. */
+	add_join_rel(root, joinrel);
+
 	/* Compute information relevant to the foreign relations. */
 	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
 
@@ -734,9 +737,6 @@ build_join_rel(PlannerInfo *root,
 		is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
 		joinrel->consider_parallel = true;
 
-	/* Add the joinrel to the PlannerInfo. */
-	add_join_rel(root, joinrel);
-
 	/*
 	 * Also, if dynamic-programming join search is active, add the new joinrel
 	 * to the appropriate sublist.  Note: you might think the Assert on number
@@ -854,6 +854,16 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 														(Node *) parent_joinrel->joininfo,
 														nappinfos,
 														appinfos);
+
+	/*
+	 * If the parent joinrel has pending equivalence classes, so does the
+	 * child.
+	 */
+	if (parent_joinrel->has_eclass_joins ||
+		has_useful_pathkeys(root, parent_joinrel))
+			add_child_rel_equivalences(root, appinfos, nappinfos,
+									   parent_joinrel, joinrel);
+
 	pfree(appinfos);
 
 	/*
@@ -863,10 +873,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	joinrel->direct_lateral_relids = (Relids) bms_copy(parent_joinrel->direct_lateral_relids);
 	joinrel->lateral_relids = (Relids) bms_copy(parent_joinrel->lateral_relids);
 
-	/*
-	 * If the parent joinrel has pending equivalence classes, so does the
-	 * child.
-	 */
 	joinrel->has_eclass_joins = parent_joinrel->has_eclass_joins;
 
 	/* Is the join between partitions itself partitioned? */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7345137d1d..832cc84bb4 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -150,7 +150,8 @@ extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
 														   ForeignKeyOptInfo *fkinfo,
 														   int colno);
 extern void add_child_rel_equivalences(PlannerInfo *root,
-									   AppendRelInfo *appinfo,
+									   AppendRelInfo **appinfos,
+									   int		nappinfos,
 									   RelOptInfo *parent_rel,
 									   RelOptInfo *child_rel);
 extern List *generate_implied_equalities_for_column(PlannerInfo *root,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85dec..003c8d1c24 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,67 @@ explain (costs off)
    Filter: ((unique1 = unique1) OR (unique2 = unique2))
 (2 rows)
 
+-- Check that child merge join for a FULL OUTER join works correctly
+SET enable_partitionwise_join TO on;
+SET enable_partitionwise_aggregate TO on;
+CREATE TABLE child_joins_ecs_testtab1 (a int);
+INSERT INTO child_joins_ecs_testtab1 SELECT generate_series(1, 100);
+CREATE TABLE child_joins_ecs_testtab2 (a int, b int) PARTITION BY RANGE (a);
+CREATE TABLE child_joins_ecs_testtab2_p1 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (1) TO (10001);
+CREATE TABLE child_joins_ecs_testtab2_p2 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (10001) TO (20001);
+CREATE TABLE child_joins_ecs_testtab2_p3 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (20001) TO (30001);
+INSERT INTO child_joins_ecs_testtab2 SELECT a, a % 100 + 1 FROM generate_series(1, 30000) a;
+ANALYZE child_joins_ecs_testtab1, child_joins_ecs_testtab2;
+-- this forces plan to be a specific shape
+SET work_mem TO '0.1MB';
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (COSTS OFF)
+SELECT child_joins_ecs_testtab1.*
+  FROM (SELECT a, b
+        FROM child_joins_ecs_testtab2 t1 FULL JOIN child_joins_ecs_testtab2 t2 USING(a, b)
+        WHERE a >= 1 AND a < 200000
+        GROUP BY 1, 2) AS data
+  JOIN child_joins_ecs_testtab1 ON (child_joins_ecs_testtab1.a = data.b);
+                                                    QUERY PLAN                                                     
+-------------------------------------------------------------------------------------------------------------------
+ Nested Loop
+   Join Filter: ((COALESCE(t1.b, t2.b)) = child_joins_ecs_testtab1.a)
+   ->  Group
+         Group Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+         ->  Merge Append
+               Sort Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+               ->  Group
+                     Group Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+                     ->  Sort
+                           Sort Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1.a = t2.a) AND (t1.b = t2.b))
+                                 Filter: ((COALESCE(t1.a, t2.a) >= 1) AND (COALESCE(t1.a, t2.a) < 200000))
+                                 ->  Seq Scan on child_joins_ecs_testtab2_p1 t1
+                                 ->  Hash
+                                       ->  Seq Scan on child_joins_ecs_testtab2_p1 t2
+               ->  Group
+                     Group Key: (COALESCE(t1_1.a, t2_1.a)), (COALESCE(t1_1.b, t2_1.b))
+                     ->  Sort
+                           Sort Key: (COALESCE(t1_1.a, t2_1.a)), (COALESCE(t1_1.b, t2_1.b))
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1_1.a = t2_1.a) AND (t1_1.b = t2_1.b))
+                                 Filter: ((COALESCE(t1_1.a, t2_1.a) >= 1) AND (COALESCE(t1_1.a, t2_1.a) < 200000))
+                                 ->  Seq Scan on child_joins_ecs_testtab2_p2 t1_1
+                                 ->  Hash
+                                       ->  Seq Scan on child_joins_ecs_testtab2_p2 t2_1
+               ->  Group
+                     Group Key: (COALESCE(t1_2.a, t2_2.a)), (COALESCE(t1_2.b, t2_2.b))
+                     ->  Sort
+                           Sort Key: (COALESCE(t1_2.a, t2_2.a)), (COALESCE(t1_2.b, t2_2.b))
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1_2.a = t2_2.a) AND (t1_2.b = t2_2.b))
+                                 Filter: ((COALESCE(t1_2.a, t2_2.a) >= 1) AND (COALESCE(t1_2.a, t2_2.a) < 200000))
+                                 ->  Seq Scan on child_joins_ecs_testtab2_p3 t1_2
+                                 ->  Hash
+                                       ->  Seq Scan on child_joins_ecs_testtab2_p3 t2_2
+   ->  Materialize
+         ->  Seq Scan on child_joins_ecs_testtab1
+(38 rows)
+
+DROP TABLE child_joins_ecs_testtab1, child_joins_ecs_testtab2;
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65de39..bd792dfc3c 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,26 @@ explain (costs off)
 -- this could be converted, but isn't at present
 explain (costs off)
   select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+-- Check that child merge join for a FULL OUTER join works correctly
+SET enable_partitionwise_join TO on;
+SET enable_partitionwise_aggregate TO on;
+CREATE TABLE child_joins_ecs_testtab1 (a int);
+INSERT INTO child_joins_ecs_testtab1 SELECT generate_series(1, 100);
+CREATE TABLE child_joins_ecs_testtab2 (a int, b int) PARTITION BY RANGE (a);
+CREATE TABLE child_joins_ecs_testtab2_p1 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (1) TO (10001);
+CREATE TABLE child_joins_ecs_testtab2_p2 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (10001) TO (20001);
+CREATE TABLE child_joins_ecs_testtab2_p3 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (20001) TO (30001);
+INSERT INTO child_joins_ecs_testtab2 SELECT a, a % 100 + 1 FROM generate_series(1, 30000) a;
+ANALYZE child_joins_ecs_testtab1, child_joins_ecs_testtab2;
+-- this forces plan to be a specific shape
+SET work_mem TO '0.1MB';
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (COSTS OFF)
+SELECT child_joins_ecs_testtab1.*
+  FROM (SELECT a, b
+        FROM child_joins_ecs_testtab2 t1 FULL JOIN child_joins_ecs_testtab2 t2 USING(a, b)
+        WHERE a >= 1 AND a < 200000
+        GROUP BY 1, 2) AS data
+  JOIN child_joins_ecs_testtab1 ON (child_joins_ecs_testtab1.a = data.b);
+DROP TABLE child_joins_ecs_testtab1, child_joins_ecs_testtab2;

d25ea01275-fixup-PG12-v2.patchapplication/octet-stream; name=d25ea01275-fixup-PG12-v2.patchDownload

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b7723481b0..0a95e7221d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1067,7 +1067,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		 * paths that produce those sort orderings).
 		 */
 		if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
-			add_child_rel_equivalences(root, appinfo, rel, childrel);
+			add_child_rel_equivalences(root, &appinfo, 1, rel, childrel);
 		childrel->has_eclass_joins = rel->has_eclass_joins;
 
 		/*
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 688d9b0707..06d3771d20 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2103,23 +2103,48 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
 
 /*
  * add_child_rel_equivalences
- *	  Search for EC members that reference the parent_rel, and
+ *	  Search for EC members that reference the root parent of child_rel, and
  *	  add transformed members referencing the child_rel.
  *
  * Note that this function won't be called at all unless we have at least some
  * reason to believe that the EC members it generates will be useful.
  *
- * parent_rel and child_rel could be derived from appinfo, but since the
+ * parent_rel and child_rel could be derived from appinfos, but since the
  * caller has already computed them, we might as well just pass them in.
+ * Note that parent_rel and child_rel are either BASEREL and OTHER_MEMBER_REL
+ * or JOINREL and OTHER_JOINREL, respectively.
+ *
+ * The AppendRelInfos that are passed in are not used at all if child_rel is
+ * not a direct child of parent_rel, because they contain mapping from the
+ * direct parent, whereas we'd like to translate from the root parent's ECs.
+ * Still, having caller pass them in common cases that don't involve multi-
+ * level inheritance is great for performance, because
+ * adjust_appendrel_attrs_multilevel(), etc. have to look up AppendRelInfos
+ * from child relids whose overhead can quickly add up.
  */
 void
 add_child_rel_equivalences(PlannerInfo *root,
-						   AppendRelInfo *appinfo,
+						   AppendRelInfo **appinfos,
+						   int		nappinfos,
 						   RelOptInfo *parent_rel,
 						   RelOptInfo *child_rel)
 {
 	ListCell   *lc1;
+	Relids		top_parent_relids = child_rel->top_parent_relids;
+	Assert(IS_SIMPLE_REL(parent_rel) || IS_JOIN_REL(parent_rel));
 
+	/*
+	 * A given EC may be referenced from both a "baserel" whose expressions
+	 * appear in it and also from the "joinrels" containing that "baserel".
+	 * Whereas this function is always called for baserels, it's called for
+	 * joinrels only if partitionwise join is used.
+	 *
+	 * Furthermore, a given expression in the EC may reference one or more
+	 * baserels.  If the latter, it must be transformed only by a parent
+	 * joinrel which contains all of the referenced baserels.  If, OTOH,
+	 * it references only one baserel, it must be transformed only once,
+	 * that is, with that baserel as the parent.
+	 */
 	foreach(lc1, root->eq_classes)
 	{
 		EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);
@@ -2137,7 +2162,9 @@ add_child_rel_equivalences(PlannerInfo *root,
 		 * No point in searching if child's topmost parent rel is not
 		 * mentioned in eclass.
 		 */
-		if (!bms_is_subset(child_rel->top_parent_relids, cur_ec->ec_relids))
+		if (!(bms_is_subset(child_rel->top_parent_relids, cur_ec->ec_relids) ||
+			  (IS_JOIN_REL(parent_rel) &&
+			   bms_overlap(top_parent_relids, cur_ec->ec_relids))))
 			continue;
 
 		foreach(lc2, cur_ec->ec_members)
@@ -2148,39 +2175,51 @@ add_child_rel_equivalences(PlannerInfo *root,
 				continue;		/* ignore consts here */
 
 			/*
-			 * We consider only original EC members here, not
-			 * already-transformed child members.  Otherwise, if some original
-			 * member expression references more than one appendrel, we'd get
-			 * an O(N^2) explosion of useless derived expressions for
-			 * combinations of children.
+			 * There's no point in scanning child members, as their children
+			 * are in turn are obtained by applying multi-level translation
+			 * to the top level expressions, so we can stop after the last
+			 * non-child EC member.
 			 */
 			if (cur_em->em_is_child)
-				continue;		/* ignore children here */
+				break;
 
-			/* Does this member reference child's topmost parent rel? */
-			if (bms_overlap(cur_em->em_relids, child_rel->top_parent_relids))
+			/*
+			 * Does this member reference child's topmost parent rel?  Any
+			 * multi-relation expressions should be transformed only when
+			 * needed and possible.
+			 */
+			if (bms_membership(cur_em->em_relids) ==
+				bms_membership(top_parent_relids) &&
+				bms_is_subset(cur_em->em_relids, child_rel->top_parent_relids))
 			{
 				/* Yes, generate transformed child version */
 				Expr	   *child_expr;
 				Relids		new_relids;
 				Relids		new_nullable_relids;
 
-				if (parent_rel->reloptkind == RELOPT_BASEREL)
+				/*
+				 * If the parent_rel is itself the topmost parent rel, transform
+				 * directly.
+				 */
+				if (parent_rel->reloptkind == RELOPT_BASEREL ||
+					parent_rel->reloptkind == RELOPT_JOINREL)
 				{
 					/* Simple single-level transformation */
 					child_expr = (Expr *)
 						adjust_appendrel_attrs(root,
 											   (Node *) cur_em->em_expr,
-											   1, &appinfo);
+											   nappinfos, appinfos);
 				}
 				else
 				{
+					Assert(parent_rel->reloptkind == RELOPT_OTHER_MEMBER_REL ||
+						   parent_rel->reloptkind == RELOPT_OTHER_JOINREL);
 					/* Must do multi-level transformation */
 					child_expr = (Expr *)
 						adjust_appendrel_attrs_multilevel(root,
 														  (Node *) cur_em->em_expr,
 														  child_rel->relids,
-														  child_rel->top_parent_relids);
+														  top_parent_relids);
 				}
 
 				/*
@@ -2190,22 +2229,20 @@ add_child_rel_equivalences(PlannerInfo *root,
 				 * don't want the child member to be marked as constant.
 				 */
 				new_relids = bms_difference(cur_em->em_relids,
-											child_rel->top_parent_relids);
+											top_parent_relids);
 				new_relids = bms_add_members(new_relids, child_rel->relids);
 
 				/*
-				 * And likewise for nullable_relids.  Note this code assumes
-				 * parent and child relids are singletons.
+				 * For nullable_relids, we must selectively replace parent
+				 * nullable relids to child ones.
 				 */
 				new_nullable_relids = cur_em->em_nullable_relids;
-				if (bms_overlap(new_nullable_relids,
-								child_rel->top_parent_relids))
-				{
-					new_nullable_relids = bms_difference(new_nullable_relids,
-														 child_rel->top_parent_relids);
-					new_nullable_relids = bms_add_members(new_nullable_relids,
-														  child_rel->relids);
-				}
+				if (bms_overlap(new_nullable_relids, top_parent_relids))
+					new_nullable_relids =
+						adjust_child_relids_multilevel(root,
+													   new_nullable_relids,
+													   child_rel->relids,
+													   top_parent_relids);
 
 				(void) add_eq_member(cur_ec, child_expr,
 									 new_relids, new_nullable_relids,
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 6054bd2b53..bf74abd188 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -848,6 +848,16 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 														(Node *) parent_joinrel->joininfo,
 														nappinfos,
 														appinfos);
+
+	/*
+	 * If the parent joinrel has pending equivalence classes, so does the
+	 * child.
+	 */
+	if (parent_joinrel->has_eclass_joins ||
+		has_useful_pathkeys(root, parent_joinrel))
+			add_child_rel_equivalences(root, appinfos, nappinfos,
+									   parent_joinrel, joinrel);
+
 	pfree(appinfos);
 
 	/*
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7345137d1d..832cc84bb4 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -150,7 +150,8 @@ extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
 														   ForeignKeyOptInfo *fkinfo,
 														   int colno);
 extern void add_child_rel_equivalences(PlannerInfo *root,
-									   AppendRelInfo *appinfo,
+									   AppendRelInfo **appinfos,
+									   int		nappinfos,
 									   RelOptInfo *parent_rel,
 									   RelOptInfo *child_rel);
 extern List *generate_implied_equalities_for_column(PlannerInfo *root,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85dec..003c8d1c24 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,67 @@ explain (costs off)
    Filter: ((unique1 = unique1) OR (unique2 = unique2))
 (2 rows)
 
+-- Check that child merge join for a FULL OUTER join works correctly
+SET enable_partitionwise_join TO on;
+SET enable_partitionwise_aggregate TO on;
+CREATE TABLE child_joins_ecs_testtab1 (a int);
+INSERT INTO child_joins_ecs_testtab1 SELECT generate_series(1, 100);
+CREATE TABLE child_joins_ecs_testtab2 (a int, b int) PARTITION BY RANGE (a);
+CREATE TABLE child_joins_ecs_testtab2_p1 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (1) TO (10001);
+CREATE TABLE child_joins_ecs_testtab2_p2 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (10001) TO (20001);
+CREATE TABLE child_joins_ecs_testtab2_p3 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (20001) TO (30001);
+INSERT INTO child_joins_ecs_testtab2 SELECT a, a % 100 + 1 FROM generate_series(1, 30000) a;
+ANALYZE child_joins_ecs_testtab1, child_joins_ecs_testtab2;
+-- this forces plan to be a specific shape
+SET work_mem TO '0.1MB';
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (COSTS OFF)
+SELECT child_joins_ecs_testtab1.*
+  FROM (SELECT a, b
+        FROM child_joins_ecs_testtab2 t1 FULL JOIN child_joins_ecs_testtab2 t2 USING(a, b)
+        WHERE a >= 1 AND a < 200000
+        GROUP BY 1, 2) AS data
+  JOIN child_joins_ecs_testtab1 ON (child_joins_ecs_testtab1.a = data.b);
+                                                    QUERY PLAN                                                     
+-------------------------------------------------------------------------------------------------------------------
+ Nested Loop
+   Join Filter: ((COALESCE(t1.b, t2.b)) = child_joins_ecs_testtab1.a)
+   ->  Group
+         Group Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+         ->  Merge Append
+               Sort Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+               ->  Group
+                     Group Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+                     ->  Sort
+                           Sort Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1.a = t2.a) AND (t1.b = t2.b))
+                                 Filter: ((COALESCE(t1.a, t2.a) >= 1) AND (COALESCE(t1.a, t2.a) < 200000))
+                                 ->  Seq Scan on child_joins_ecs_testtab2_p1 t1
+                                 ->  Hash
+                                       ->  Seq Scan on child_joins_ecs_testtab2_p1 t2
+               ->  Group
+                     Group Key: (COALESCE(t1_1.a, t2_1.a)), (COALESCE(t1_1.b, t2_1.b))
+                     ->  Sort
+                           Sort Key: (COALESCE(t1_1.a, t2_1.a)), (COALESCE(t1_1.b, t2_1.b))
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1_1.a = t2_1.a) AND (t1_1.b = t2_1.b))
+                                 Filter: ((COALESCE(t1_1.a, t2_1.a) >= 1) AND (COALESCE(t1_1.a, t2_1.a) < 200000))
+                                 ->  Seq Scan on child_joins_ecs_testtab2_p2 t1_1
+                                 ->  Hash
+                                       ->  Seq Scan on child_joins_ecs_testtab2_p2 t2_1
+               ->  Group
+                     Group Key: (COALESCE(t1_2.a, t2_2.a)), (COALESCE(t1_2.b, t2_2.b))
+                     ->  Sort
+                           Sort Key: (COALESCE(t1_2.a, t2_2.a)), (COALESCE(t1_2.b, t2_2.b))
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1_2.a = t2_2.a) AND (t1_2.b = t2_2.b))
+                                 Filter: ((COALESCE(t1_2.a, t2_2.a) >= 1) AND (COALESCE(t1_2.a, t2_2.a) < 200000))
+                                 ->  Seq Scan on child_joins_ecs_testtab2_p3 t1_2
+                                 ->  Hash
+                                       ->  Seq Scan on child_joins_ecs_testtab2_p3 t2_2
+   ->  Materialize
+         ->  Seq Scan on child_joins_ecs_testtab1
+(38 rows)
+
+DROP TABLE child_joins_ecs_testtab1, child_joins_ecs_testtab2;
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65de39..bd792dfc3c 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,26 @@ explain (costs off)
 -- this could be converted, but isn't at present
 explain (costs off)
   select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+-- Check that child merge join for a FULL OUTER join works correctly
+SET enable_partitionwise_join TO on;
+SET enable_partitionwise_aggregate TO on;
+CREATE TABLE child_joins_ecs_testtab1 (a int);
+INSERT INTO child_joins_ecs_testtab1 SELECT generate_series(1, 100);
+CREATE TABLE child_joins_ecs_testtab2 (a int, b int) PARTITION BY RANGE (a);
+CREATE TABLE child_joins_ecs_testtab2_p1 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (1) TO (10001);
+CREATE TABLE child_joins_ecs_testtab2_p2 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (10001) TO (20001);
+CREATE TABLE child_joins_ecs_testtab2_p3 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (20001) TO (30001);
+INSERT INTO child_joins_ecs_testtab2 SELECT a, a % 100 + 1 FROM generate_series(1, 30000) a;
+ANALYZE child_joins_ecs_testtab1, child_joins_ecs_testtab2;
+-- this forces plan to be a specific shape
+SET work_mem TO '0.1MB';
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (COSTS OFF)
+SELECT child_joins_ecs_testtab1.*
+  FROM (SELECT a, b
+        FROM child_joins_ecs_testtab2 t1 FULL JOIN child_joins_ecs_testtab2 t2 USING(a, b)
+        WHERE a >= 1 AND a < 200000
+        GROUP BY 1, 2) AS data
+  JOIN child_joins_ecs_testtab1 ON (child_joins_ecs_testtab1.a = data.b);
+DROP TABLE child_joins_ecs_testtab1, child_joins_ecs_testtab2;

#14

Tom Lane

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: Amit Langote (#13)

Re: v12.0: ERROR: could not find pathkey item to sort

Amit Langote <amitlangote09@gmail.com> writes:

Attached updated patches.

[ looks at that... ] I seriously, seriously dislike what you did
in build_join_rel, ie adding the new joinrel to the global data
structures before it's fully filled in. That's inevitably going
to bite us on the ass someday, and you couldn't even be bothered
with a comment.

Worse, the reason you did that seems to be so that
generate_join_implied_equalities can find and modify the joinrel,
which is an undocumented and not very well thought out side-effect.
There are other call sites for that where the joinrel may or may not
already exist, meaning that it might or might not add more members into
the joinrel's eclass_indexes. At best that's wasted work, and at
worst it costs efficiency, since we could in principle get different
sets of common relids depending on which join input pair we're
considering. They're equally valid sets, but unioning them is
just going to add more relids than we need.

Also, the existing logic around eclass_indexes is that it's only
set for baserels and we know it is valid after we've finished
EC merging. I don't much like modifying add_child_rel_equivalences
to have some different opinions about that for joinrels.

It'd probably be better to just eat the cost of doing
get_common_eclass_indexes over again when it's time to do
add_child_rel_equivalences for a joinrel, since we aren't (I hope)
going to do that more than once per joinrel anyway. This would
probably require refactoring things so that there are separate
entry points to add child equivalences for base rels and join rels.
But that seems cleaner anyway than what you've got here.

David --- much of the complexity here comes from the addition of
the eclass_indexes infrastructure, so do you have any thoughts?

regards, tom lane

#15

David Rowley

david.rowley@2ndquadrant.com

about 6 years ago

In reply to: Tom Lane (#14)

Re: v12.0: ERROR: could not find pathkey item to sort

On Thu, 31 Oct 2019 at 05:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David --- much of the complexity here comes from the addition of
the eclass_indexes infrastructure, so do you have any thoughts?

Hindsight makes me think I should have mentioned in the comment for
eclass_indexes that it's only used for simple rels and remains NULL
for anything else.

All the code in equivclass.c either uses get_common_eclass_indexes()
and get_eclass_indexes_for_relids() which go down to the simple rel
level to obtain their eclass_indexes. When calling
get_eclass_indexes_for_relids() we'll build a union Bitmapset with the
indexes from each simple rel that the join rel is made from. We only
ever directly use the eclass_indexes field when we're certain we're
dealing with a simple rel already. get_eclass_indexes_for_relids()
would do the same job, but using the field directly saves a bit of
needless effort and memory allocations. So, in short, I don't really
see why we need to set eclass_indexes for anything other than simple
rels.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#16

Amit Langote

amitlangote09@gmail.com

about 6 years ago

In reply to: Tom Lane (#14)

1 attachment(s)

Re: v12.0: ERROR: could not find pathkey item to sort

Thanks for checking.

On Thu, Oct 31, 2019 at 1:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also, the existing logic around eclass_indexes is that it's only
set for baserels and we know it is valid after we've finished
EC merging. I don't much like modifying add_child_rel_equivalences
to have some different opinions about that for joinrels.

It'd probably be better to just eat the cost of doing
get_common_eclass_indexes over again when it's time to do
add_child_rel_equivalences for a joinrel, since we aren't (I hope)
going to do that more than once per joinrel anyway.

If you mean once per child joinrel, then yes. Implemented this
approach in the attached updated patch.

ISTM, get_common_eclass_indexes() returns the same value for all the
child joinrels (or really for its outer and inner component rels) as
it does for the parent joinrel. So, it might be better to set
eclass_indexes in parent joinrel once and use the same value for all
its descendant joinrels. Although, we'd need to export
get_common_eclass_indexes() out of equivclass.c to call it from
build_join_rel() such that it doesn't require messing with where the
joinrel is added to the global data structure. Maybe that complicates
eclass_indexes infrastructure though.

This would
probably require refactoring things so that there are separate
entry points to add child equivalences for base rels and join rels.
But that seems cleaner anyway than what you've got here.

Separate entry points sounds better, but only in HEAD? Should we have
separate entry points in PG 12 too?

Attached updated patch only for HEAD.

Thanks,
Amit

Attachments:

d25ea01275-fixup-HEAD-v3.patchapplication/octet-stream; name=d25ea01275-fixup-HEAD-v3.patchDownload

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index ccc07ba9f0..332f05875f 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2209,7 +2209,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
 
 /*
  * add_child_rel_equivalences
- *	  Search for EC members that reference the parent_rel, and
+ *	  Search for EC members that reference the root parent of child_rel, and
  *	  add transformed members referencing the child_rel.
  *
  * Note that this function won't be called at all unless we have at least some
@@ -2217,6 +2217,14 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
  *
  * parent_rel and child_rel could be derived from appinfo, but since the
  * caller has already computed them, we might as well just pass them in.
+ *
+ * The AppendRelInfos that are passed in are not used at all if child_rel is
+ * not a direct child of parent_rel, because they would contain mapping from
+ * the direct parent, whereas we need to translate from the root parent's EC
+ * expressions.  Still, having caller pass them in common cases that don't
+ * involve multi-level inheritance is great for performance, because
+ * adjust_appendrel_attrs_multilevel(), etc. have to look up AppendRelInfos
+ * from child relids whose overhead can quickly add up.
  */
 void
 add_child_rel_equivalences(PlannerInfo *root,
@@ -2334,6 +2342,129 @@ add_child_rel_equivalences(PlannerInfo *root,
 	}
 }
 
+/*
+ * add_child_join_rel_equivalences
+ *	  Like add_child_rel_equivalences(), but for joinrels
+ *
+ * Here we find the ECs relevant to top parent joinrel and add transformed
+ * member expressions that refer to given child joinrel.
+ */
+void
+add_child_join_rel_equivalences(PlannerInfo *root,
+								int nappinfos, AppendRelInfo **appinfos,
+								RelOptInfo *parent_joinrel,
+								RelOptInfo *child_joinrel,
+								RelOptInfo *child_outer_rel,
+								RelOptInfo *child_inner_rel)
+{
+	Relids		top_parent_relids = child_joinrel->top_parent_relids;
+	Bitmapset  *matching_ecs;
+	int			i;
+
+	Assert(IS_JOIN_REL(child_joinrel) && IS_JOIN_REL(parent_joinrel));
+
+	matching_ecs = get_common_eclass_indexes(root,
+											 child_inner_rel->relids,
+											 child_outer_rel->relids);
+	i = -1;
+	while ((i = bms_next_member(matching_ecs, i)) >= 0)
+	{
+		EquivalenceClass *cur_ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+		int			num_members;
+
+		/*
+		 * If this EC contains a volatile expression, then generating child
+		 * EMs would be downright dangerous, so skip it.  We rely on a
+		 * volatile EC having only one EM.
+		 */
+		if (cur_ec->ec_has_volatile)
+			continue;
+
+		/* Sanity check */
+		Assert(bms_overlap(top_parent_relids, cur_ec->ec_relids));
+
+		/*
+		 * We don't use foreach() here because there's no point in scanning
+		 * newly-added child members, so we can stop after the last
+		 * pre-existing EC member.
+		 */
+		num_members = list_length(cur_ec->ec_members);
+		for (int pos = 0; pos < num_members; pos++)
+		{
+			EquivalenceMember *cur_em = (EquivalenceMember *) list_nth(cur_ec->ec_members, pos);
+
+			if (cur_em->em_is_const)
+				continue;		/* ignore consts here */
+
+			/*
+			 * We consider only original EC members here, not
+			 * already-transformed child members.
+			 */
+			if (cur_em->em_is_child)
+				continue;
+
+			/*
+			 * Does this member reference child's topmost parent rel?  Don't
+			 * bother with expressions that reference single base appendrel,
+			 * because they would already have been transformed.
+			 */
+			if (bms_membership(cur_em->em_relids) == BMS_MULTIPLE &&
+				bms_overlap(cur_em->em_relids, top_parent_relids))
+			{
+				/* Yes, generate transformed child version */
+				Expr	   *child_expr;
+				Relids		new_relids;
+				Relids		new_nullable_relids;
+
+				if (parent_joinrel->reloptkind == RELOPT_JOINREL)
+				{
+					/* Simple single-level transformation */
+					child_expr = (Expr *)
+						adjust_appendrel_attrs(root,
+											   (Node *) cur_em->em_expr,
+											   nappinfos, appinfos);
+				}
+				else
+				{
+					Assert(parent_joinrel->reloptkind == RELOPT_OTHER_JOINREL);
+					/* Must do multi-level transformation */
+					child_expr = (Expr *)
+						adjust_appendrel_attrs_multilevel(root,
+														  (Node *) cur_em->em_expr,
+														  child_joinrel->relids,
+														  top_parent_relids);
+				}
+
+				/*
+				 * Transform em_relids to match.  Note we do *not* do
+				 * pull_varnos(child_expr) here, as for example the
+				 * transformation might have substituted a constant, but we
+				 * don't want the child member to be marked as constant.
+				 */
+				new_relids = bms_difference(cur_em->em_relids,
+											child_joinrel->top_parent_relids);
+				new_relids = bms_add_members(new_relids, child_joinrel->relids);
+
+				/*
+				 * For nullable_relids, we must selectively replace parent
+				 * nullable relids to child ones.
+				 */
+				new_nullable_relids = cur_em->em_nullable_relids;
+				if (bms_overlap(new_nullable_relids, top_parent_relids))
+					new_nullable_relids =
+						adjust_child_relids_multilevel(root,
+													   new_nullable_relids,
+													   child_joinrel->relids,
+													   top_parent_relids);
+
+				(void) add_eq_member(cur_ec, child_expr,
+									 new_relids, new_nullable_relids,
+									 true, cur_em->em_datatype);
+			}
+		}
+	}
+}
+
 
 /*
  * generate_implied_equalities_for_column
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 85415381fb..f0e8e5ed3a 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -854,7 +854,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 														(Node *) parent_joinrel->joininfo,
 														nappinfos,
 														appinfos);
-	pfree(appinfos);
 
 	/*
 	 * Lateral relids referred in child join will be same as that referred in
@@ -869,6 +868,14 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	 */
 	joinrel->has_eclass_joins = parent_joinrel->has_eclass_joins;
 
+	if (joinrel->has_eclass_joins || has_useful_pathkeys(root, parent_joinrel))
+		add_child_join_rel_equivalences(root,
+										nappinfos, appinfos,
+										parent_joinrel, joinrel,
+										outer_rel, inner_rel);
+
+	pfree(appinfos);
+
 	/* Is the join between partitions itself partitioned? */
 	build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist,
 								 jointype);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7345137d1d..55c95664e7 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -153,6 +153,12 @@ extern void add_child_rel_equivalences(PlannerInfo *root,
 									   AppendRelInfo *appinfo,
 									   RelOptInfo *parent_rel,
 									   RelOptInfo *child_rel);
+extern void add_child_join_rel_equivalences(PlannerInfo *root,
+								int nappinfos, AppendRelInfo **appinfos,
+								RelOptInfo *parent_rel,
+								RelOptInfo *child_rel,
+								RelOptInfo *child_outer_rel,
+								RelOptInfo *child_inner_rel);
 extern List *generate_implied_equalities_for_column(PlannerInfo *root,
 													RelOptInfo *rel,
 													ec_matches_callback_type callback,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85dec..003c8d1c24 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,67 @@ explain (costs off)
    Filter: ((unique1 = unique1) OR (unique2 = unique2))
 (2 rows)
 
+-- Check that child merge join for a FULL OUTER join works correctly
+SET enable_partitionwise_join TO on;
+SET enable_partitionwise_aggregate TO on;
+CREATE TABLE child_joins_ecs_testtab1 (a int);
+INSERT INTO child_joins_ecs_testtab1 SELECT generate_series(1, 100);
+CREATE TABLE child_joins_ecs_testtab2 (a int, b int) PARTITION BY RANGE (a);
+CREATE TABLE child_joins_ecs_testtab2_p1 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (1) TO (10001);
+CREATE TABLE child_joins_ecs_testtab2_p2 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (10001) TO (20001);
+CREATE TABLE child_joins_ecs_testtab2_p3 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (20001) TO (30001);
+INSERT INTO child_joins_ecs_testtab2 SELECT a, a % 100 + 1 FROM generate_series(1, 30000) a;
+ANALYZE child_joins_ecs_testtab1, child_joins_ecs_testtab2;
+-- this forces plan to be a specific shape
+SET work_mem TO '0.1MB';
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (COSTS OFF)
+SELECT child_joins_ecs_testtab1.*
+  FROM (SELECT a, b
+        FROM child_joins_ecs_testtab2 t1 FULL JOIN child_joins_ecs_testtab2 t2 USING(a, b)
+        WHERE a >= 1 AND a < 200000
+        GROUP BY 1, 2) AS data
+  JOIN child_joins_ecs_testtab1 ON (child_joins_ecs_testtab1.a = data.b);
+                                                    QUERY PLAN                                                     
+-------------------------------------------------------------------------------------------------------------------
+ Nested Loop
+   Join Filter: ((COALESCE(t1.b, t2.b)) = child_joins_ecs_testtab1.a)
+   ->  Group
+         Group Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+         ->  Merge Append
+               Sort Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+               ->  Group
+                     Group Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+                     ->  Sort
+                           Sort Key: (COALESCE(t1.a, t2.a)), (COALESCE(t1.b, t2.b))
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1.a = t2.a) AND (t1.b = t2.b))
+                                 Filter: ((COALESCE(t1.a, t2.a) >= 1) AND (COALESCE(t1.a, t2.a) < 200000))
+                                 ->  Seq Scan on child_joins_ecs_testtab2_p1 t1
+                                 ->  Hash
+                                       ->  Seq Scan on child_joins_ecs_testtab2_p1 t2
+               ->  Group
+                     Group Key: (COALESCE(t1_1.a, t2_1.a)), (COALESCE(t1_1.b, t2_1.b))
+                     ->  Sort
+                           Sort Key: (COALESCE(t1_1.a, t2_1.a)), (COALESCE(t1_1.b, t2_1.b))
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1_1.a = t2_1.a) AND (t1_1.b = t2_1.b))
+                                 Filter: ((COALESCE(t1_1.a, t2_1.a) >= 1) AND (COALESCE(t1_1.a, t2_1.a) < 200000))
+                                 ->  Seq Scan on child_joins_ecs_testtab2_p2 t1_1
+                                 ->  Hash
+                                       ->  Seq Scan on child_joins_ecs_testtab2_p2 t2_1
+               ->  Group
+                     Group Key: (COALESCE(t1_2.a, t2_2.a)), (COALESCE(t1_2.b, t2_2.b))
+                     ->  Sort
+                           Sort Key: (COALESCE(t1_2.a, t2_2.a)), (COALESCE(t1_2.b, t2_2.b))
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1_2.a = t2_2.a) AND (t1_2.b = t2_2.b))
+                                 Filter: ((COALESCE(t1_2.a, t2_2.a) >= 1) AND (COALESCE(t1_2.a, t2_2.a) < 200000))
+                                 ->  Seq Scan on child_joins_ecs_testtab2_p3 t1_2
+                                 ->  Hash
+                                       ->  Seq Scan on child_joins_ecs_testtab2_p3 t2_2
+   ->  Materialize
+         ->  Seq Scan on child_joins_ecs_testtab1
+(38 rows)
+
+DROP TABLE child_joins_ecs_testtab1, child_joins_ecs_testtab2;
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65de39..bd792dfc3c 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,26 @@ explain (costs off)
 -- this could be converted, but isn't at present
 explain (costs off)
   select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+-- Check that child merge join for a FULL OUTER join works correctly
+SET enable_partitionwise_join TO on;
+SET enable_partitionwise_aggregate TO on;
+CREATE TABLE child_joins_ecs_testtab1 (a int);
+INSERT INTO child_joins_ecs_testtab1 SELECT generate_series(1, 100);
+CREATE TABLE child_joins_ecs_testtab2 (a int, b int) PARTITION BY RANGE (a);
+CREATE TABLE child_joins_ecs_testtab2_p1 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (1) TO (10001);
+CREATE TABLE child_joins_ecs_testtab2_p2 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (10001) TO (20001);
+CREATE TABLE child_joins_ecs_testtab2_p3 PARTITION OF child_joins_ecs_testtab2 FOR VALUES FROM (20001) TO (30001);
+INSERT INTO child_joins_ecs_testtab2 SELECT a, a % 100 + 1 FROM generate_series(1, 30000) a;
+ANALYZE child_joins_ecs_testtab1, child_joins_ecs_testtab2;
+-- this forces plan to be a specific shape
+SET work_mem TO '0.1MB';
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (COSTS OFF)
+SELECT child_joins_ecs_testtab1.*
+  FROM (SELECT a, b
+        FROM child_joins_ecs_testtab2 t1 FULL JOIN child_joins_ecs_testtab2 t2 USING(a, b)
+        WHERE a >= 1 AND a < 200000
+        GROUP BY 1, 2) AS data
+  JOIN child_joins_ecs_testtab1 ON (child_joins_ecs_testtab1.a = data.b);
+DROP TABLE child_joins_ecs_testtab1, child_joins_ecs_testtab2;

#17

Tom Lane

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: Amit Langote (#16)

1 attachment(s)

Re: v12.0: ERROR: could not find pathkey item to sort

Amit Langote <amitlangote09@gmail.com> writes:

This would
probably require refactoring things so that there are separate
entry points to add child equivalences for base rels and join rels.
But that seems cleaner anyway than what you've got here.

Separate entry points sounds better, but only in HEAD?

I had actually had in mind that we might have two wrappers around a
common search-and-replace routine, but after studying the code I see that
there's just enough differences to make it probably not worth the trouble
to do it like that. I did spend a bit of time removing cosmetic
differences between the two versions, though, mostly by creating
common local variables.

I think the way you did the matching_ecs computation is actually wrong:
we need to find ECs that reference any rel of the join, not only those
that reference both inputs. If nothing else, the way you have it here
makes the results dependent on which pair of input rels gets considered
first, and that's certainly bad for multiway joins.

Also, I thought we should try to put more conditions on whether we
invoke add_child_join_rel_equivalences at all. In the attached I did

if ((enable_partitionwise_join || enable_partitionwise_aggregate) &&
(joinrel->has_eclass_joins ||
has_useful_pathkeys(root, parent_joinrel)))

but I wonder if we could do more, like checking to see if the parent
joinrel is partitioned. Alternatively, maybe it's unnecessary because
we won't try to build child joinrels unless these conditions are true?

I did not like the test case either. Creating a whole new (and rather
large) test table just for this one case is unreasonably expensive ---
it about doubles the runtime of the equivclass test for me. There's
already a perfectly good test table in partition_join.sql, which seems
like a more natural home for this anyhow. After a bit of finagling
I was able to adapt the test query to fail on that table.

Patch v4 attached. I've not looked at what we need to do to make this
work in v12.

regards, tom lane

Attachments:

d25ea01275-fixup-HEAD-v4.patchtext/x-diff; charset=us-ascii; name=d25ea01275-fixup-HEAD-v4.patchDownload

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index ccc07ba..082776f 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2209,7 +2209,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
 
 /*
  * add_child_rel_equivalences
- *	  Search for EC members that reference the parent_rel, and
+ *	  Search for EC members that reference the root parent of child_rel, and
  *	  add transformed members referencing the child_rel.
  *
  * Note that this function won't be called at all unless we have at least some
@@ -2217,6 +2217,12 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
  *
  * parent_rel and child_rel could be derived from appinfo, but since the
  * caller has already computed them, we might as well just pass them in.
+ *
+ * The passed-in AppendRelInfo is not used when the parent_rel is not a
+ * top-level baserel, since it shows the mapping from the parent_rel but
+ * we need to translate EC expressions that refer to the top-level parent.
+ * Using it is faster than using adjust_appendrel_attrs_multilevel(), though,
+ * so we prefer it when we can.
  */
 void
 add_child_rel_equivalences(PlannerInfo *root,
@@ -2224,6 +2230,8 @@ add_child_rel_equivalences(PlannerInfo *root,
 						   RelOptInfo *parent_rel,
 						   RelOptInfo *child_rel)
 {
+	Relids		top_parent_relids = child_rel->top_parent_relids;
+	Relids		child_relids = child_rel->relids;
 	int			i;
 
 	/*
@@ -2248,7 +2256,7 @@ add_child_rel_equivalences(PlannerInfo *root,
 			continue;
 
 		/* Sanity check eclass_indexes only contain ECs for parent_rel */
-		Assert(bms_is_subset(child_rel->top_parent_relids, cur_ec->ec_relids));
+		Assert(bms_is_subset(top_parent_relids, cur_ec->ec_relids));
 
 		/*
 		 * We don't use foreach() here because there's no point in scanning
@@ -2268,13 +2276,14 @@ add_child_rel_equivalences(PlannerInfo *root,
 			 * already-transformed child members.  Otherwise, if some original
 			 * member expression references more than one appendrel, we'd get
 			 * an O(N^2) explosion of useless derived expressions for
-			 * combinations of children.
+			 * combinations of children.  (But add_child_join_rel_equivalences
+			 * may add targeted combinations for partitionwise-join purposes.)
 			 */
 			if (cur_em->em_is_child)
 				continue;		/* ignore children here */
 
 			/* Does this member reference child's topmost parent rel? */
-			if (bms_overlap(cur_em->em_relids, child_rel->top_parent_relids))
+			if (bms_overlap(cur_em->em_relids, top_parent_relids))
 			{
 				/* Yes, generate transformed child version */
 				Expr	   *child_expr;
@@ -2295,8 +2304,8 @@ add_child_rel_equivalences(PlannerInfo *root,
 					child_expr = (Expr *)
 						adjust_appendrel_attrs_multilevel(root,
 														  (Node *) cur_em->em_expr,
-														  child_rel->relids,
-														  child_rel->top_parent_relids);
+														  child_relids,
+														  top_parent_relids);
 				}
 
 				/*
@@ -2306,21 +2315,20 @@ add_child_rel_equivalences(PlannerInfo *root,
 				 * don't want the child member to be marked as constant.
 				 */
 				new_relids = bms_difference(cur_em->em_relids,
-											child_rel->top_parent_relids);
-				new_relids = bms_add_members(new_relids, child_rel->relids);
+											top_parent_relids);
+				new_relids = bms_add_members(new_relids, child_relids);
 
 				/*
 				 * And likewise for nullable_relids.  Note this code assumes
 				 * parent and child relids are singletons.
 				 */
 				new_nullable_relids = cur_em->em_nullable_relids;
-				if (bms_overlap(new_nullable_relids,
-								child_rel->top_parent_relids))
+				if (bms_overlap(new_nullable_relids, top_parent_relids))
 				{
 					new_nullable_relids = bms_difference(new_nullable_relids,
-														 child_rel->top_parent_relids);
+														 top_parent_relids);
 					new_nullable_relids = bms_add_members(new_nullable_relids,
-														  child_rel->relids);
+														  child_relids);
 				}
 
 				(void) add_eq_member(cur_ec, child_expr,
@@ -2334,6 +2342,133 @@ add_child_rel_equivalences(PlannerInfo *root,
 	}
 }
 
+/*
+ * add_child_join_rel_equivalences
+ *	  Like add_child_rel_equivalences(), but for joinrels
+ *
+ * Here we find the ECs relevant to the top parent joinrel and add transformed
+ * member expressions that refer to this child joinrel.
+ *
+ * Note that this function won't be called at all unless we have at least some
+ * reason to believe that the EC members it generates will be useful.
+ */
+void
+add_child_join_rel_equivalences(PlannerInfo *root,
+								int nappinfos, AppendRelInfo **appinfos,
+								RelOptInfo *parent_joinrel,
+								RelOptInfo *child_joinrel)
+{
+	Relids		top_parent_relids = child_joinrel->top_parent_relids;
+	Relids		child_relids = child_joinrel->relids;
+	Bitmapset  *matching_ecs;
+	int			i;
+
+	Assert(IS_JOIN_REL(child_joinrel) && IS_JOIN_REL(parent_joinrel));
+
+	/* We need consider only ECs that mention the parent joinrel */
+	matching_ecs = get_eclass_indexes_for_relids(root, top_parent_relids);
+
+	i = -1;
+	while ((i = bms_next_member(matching_ecs, i)) >= 0)
+	{
+		EquivalenceClass *cur_ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+		int			num_members;
+
+		/*
+		 * If this EC contains a volatile expression, then generating child
+		 * EMs would be downright dangerous, so skip it.  We rely on a
+		 * volatile EC having only one EM.
+		 */
+		if (cur_ec->ec_has_volatile)
+			continue;
+
+		/* Sanity check on get_eclass_indexes_for_relids result */
+		Assert(bms_overlap(top_parent_relids, cur_ec->ec_relids));
+
+		/*
+		 * We don't use foreach() here because there's no point in scanning
+		 * newly-added child members, so we can stop after the last
+		 * pre-existing EC member.
+		 */
+		num_members = list_length(cur_ec->ec_members);
+		for (int pos = 0; pos < num_members; pos++)
+		{
+			EquivalenceMember *cur_em = (EquivalenceMember *) list_nth(cur_ec->ec_members, pos);
+
+			if (cur_em->em_is_const)
+				continue;		/* ignore consts here */
+
+			/*
+			 * We consider only original EC members here, not
+			 * already-transformed child members.
+			 */
+			if (cur_em->em_is_child)
+				continue;		/* ignore children here */
+
+			/*
+			 * We may ignore expressions that reference a single baserel,
+			 * because add_child_rel_equivalences should have handled them.
+			 */
+			if (bms_membership(cur_em->em_relids) != BMS_MULTIPLE)
+				continue;
+
+			/* Does this member reference child's topmost parent rel? */
+			if (bms_overlap(cur_em->em_relids, top_parent_relids))
+			{
+				/* Yes, generate transformed child version */
+				Expr	   *child_expr;
+				Relids		new_relids;
+				Relids		new_nullable_relids;
+
+				if (parent_joinrel->reloptkind == RELOPT_JOINREL)
+				{
+					/* Simple single-level transformation */
+					child_expr = (Expr *)
+						adjust_appendrel_attrs(root,
+											   (Node *) cur_em->em_expr,
+											   nappinfos, appinfos);
+				}
+				else
+				{
+					/* Must do multi-level transformation */
+					Assert(parent_joinrel->reloptkind == RELOPT_OTHER_JOINREL);
+					child_expr = (Expr *)
+						adjust_appendrel_attrs_multilevel(root,
+														  (Node *) cur_em->em_expr,
+														  child_relids,
+														  top_parent_relids);
+				}
+
+				/*
+				 * Transform em_relids to match.  Note we do *not* do
+				 * pull_varnos(child_expr) here, as for example the
+				 * transformation might have substituted a constant, but we
+				 * don't want the child member to be marked as constant.
+				 */
+				new_relids = bms_difference(cur_em->em_relids,
+											top_parent_relids);
+				new_relids = bms_add_members(new_relids, child_relids);
+
+				/*
+				 * For nullable_relids, we must selectively replace parent
+				 * nullable relids with child ones.
+				 */
+				new_nullable_relids = cur_em->em_nullable_relids;
+				if (bms_overlap(new_nullable_relids, top_parent_relids))
+					new_nullable_relids =
+						adjust_child_relids_multilevel(root,
+													   new_nullable_relids,
+													   child_relids,
+													   top_parent_relids);
+
+				(void) add_eq_member(cur_ec, child_expr,
+									 new_relids, new_nullable_relids,
+									 true, cur_em->em_datatype);
+			}
+		}
+	}
+}
+
 
 /*
  * generate_implied_equalities_for_column
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8541538..1e71e2e 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -843,6 +843,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	/* Compute information relevant to foreign relations. */
 	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
 
+	/* Conpute information needed for mapping Vars to the child rel */
 	appinfos = find_appinfos_by_relids(root, joinrel->relids, &nappinfos);
 
 	/* Set up reltarget struct */
@@ -854,7 +855,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 														(Node *) parent_joinrel->joininfo,
 														nappinfos,
 														appinfos);
-	pfree(appinfos);
 
 	/*
 	 * Lateral relids referred in child join will be same as that referred in
@@ -886,6 +886,22 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	/* Add the relation to the PlannerInfo. */
 	add_join_rel(root, joinrel);
 
+	/*
+	 * If we are using partitionwise join or aggregation, we might need
+	 * EquivalenceClass members corresponding to the child join, so that we
+	 * can represent sort pathkeys for it.  As with children of baserels, we
+	 * shouldn't need this unless there are relevant eclass joins (implying
+	 * that a merge join might be possible) or pathkeys to sort by.
+	 */
+	if ((enable_partitionwise_join || enable_partitionwise_aggregate) &&
+		(joinrel->has_eclass_joins ||
+		 has_useful_pathkeys(root, parent_joinrel)))
+		add_child_join_rel_equivalences(root,
+										nappinfos, appinfos,
+										parent_joinrel, joinrel);
+
+	pfree(appinfos);
+
 	return joinrel;
 }
 
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7345137..c6c3463 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -153,6 +153,11 @@ extern void add_child_rel_equivalences(PlannerInfo *root,
 									   AppendRelInfo *appinfo,
 									   RelOptInfo *parent_rel,
 									   RelOptInfo *child_rel);
+extern void add_child_join_rel_equivalences(PlannerInfo *root,
+											int nappinfos,
+											AppendRelInfo **appinfos,
+											RelOptInfo *parent_rel,
+											RelOptInfo *child_rel);
 extern List *generate_implied_equalities_for_column(PlannerInfo *root,
 													RelOptInfo *rel,
 													ec_matches_callback_type callback,
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index cad8dd5..975bf67 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -459,6 +459,83 @@ SELECT t1.a, ss.t2a, ss.t2c FROM prt1 t1 LEFT JOIN LATERAL
  550 |     | 
 (12 rows)
 
+-- bug with inadequate sort key representation
+SET enable_partitionwise_aggregate TO true;
+SET enable_hashjoin TO false;
+EXPLAIN (COSTS OFF)
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+                                                    QUERY PLAN                                                     
+-------------------------------------------------------------------------------------------------------------------
+ Group
+   Group Key: (COALESCE(prt1_p1.a, p2.a)), (COALESCE(prt1_p1.b, p2.b))
+   ->  Merge Append
+         Sort Key: (COALESCE(prt1_p1.a, p2.a)), (COALESCE(prt1_p1.b, p2.b))
+         ->  Group
+               Group Key: (COALESCE(prt1_p1.a, p2.a)), (COALESCE(prt1_p1.b, p2.b))
+               ->  Sort
+                     Sort Key: (COALESCE(prt1_p1.a, p2.a)), (COALESCE(prt1_p1.b, p2.b))
+                     ->  Merge Full Join
+                           Merge Cond: ((prt1_p1.a = p2.a) AND (prt1_p1.b = p2.b))
+                           Filter: ((COALESCE(prt1_p1.a, p2.a) >= 490) AND (COALESCE(prt1_p1.a, p2.a) <= 510))
+                           ->  Sort
+                                 Sort Key: prt1_p1.a, prt1_p1.b
+                                 ->  Seq Scan on prt1_p1
+                           ->  Sort
+                                 Sort Key: p2.a, p2.b
+                                 ->  Seq Scan on prt2_p1 p2
+         ->  Group
+               Group Key: (COALESCE(prt1_p2.a, p2_1.a)), (COALESCE(prt1_p2.b, p2_1.b))
+               ->  Sort
+                     Sort Key: (COALESCE(prt1_p2.a, p2_1.a)), (COALESCE(prt1_p2.b, p2_1.b))
+                     ->  Merge Full Join
+                           Merge Cond: ((prt1_p2.a = p2_1.a) AND (prt1_p2.b = p2_1.b))
+                           Filter: ((COALESCE(prt1_p2.a, p2_1.a) >= 490) AND (COALESCE(prt1_p2.a, p2_1.a) <= 510))
+                           ->  Sort
+                                 Sort Key: prt1_p2.a, prt1_p2.b
+                                 ->  Seq Scan on prt1_p2
+                           ->  Sort
+                                 Sort Key: p2_1.a, p2_1.b
+                                 ->  Seq Scan on prt2_p2 p2_1
+         ->  Group
+               Group Key: (COALESCE(prt1_p3.a, p2_2.a)), (COALESCE(prt1_p3.b, p2_2.b))
+               ->  Sort
+                     Sort Key: (COALESCE(prt1_p3.a, p2_2.a)), (COALESCE(prt1_p3.b, p2_2.b))
+                     ->  Merge Full Join
+                           Merge Cond: ((prt1_p3.a = p2_2.a) AND (prt1_p3.b = p2_2.b))
+                           Filter: ((COALESCE(prt1_p3.a, p2_2.a) >= 490) AND (COALESCE(prt1_p3.a, p2_2.a) <= 510))
+                           ->  Sort
+                                 Sort Key: prt1_p3.a, prt1_p3.b
+                                 ->  Seq Scan on prt1_p3
+                           ->  Sort
+                                 Sort Key: p2_2.a, p2_2.b
+                                 ->  Seq Scan on prt2_p3 p2_2
+(43 rows)
+
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+  a  | b  
+-----+----
+ 490 | 15
+ 492 | 17
+ 494 | 19
+ 495 | 20
+ 496 | 21
+ 498 | 23
+ 500 |  0
+ 501 |  1
+ 502 |  2
+ 504 |  4
+ 506 |  6
+ 507 |  7
+ 508 |  8
+ 510 | 10
+(14 rows)
+
+RESET enable_partitionwise_aggregate;
+RESET enable_hashjoin;
 --
 -- partitioned by expression
 --
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index fb3ba18..92994b4 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -91,6 +91,21 @@ SELECT t1.a, ss.t2a, ss.t2c FROM prt1 t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t3.a AS t3a, t2.b t2b, t2.c t2c, least(t1.a,t2.a,t3.a) FROM prt1 t2 JOIN prt2 t3 ON (t2.a = t3.b)) ss
 			  ON t1.c = ss.t2c WHERE (t1.b + coalesce(ss.t2b, 0)) = 0 ORDER BY t1.a;
 
+-- bug with inadequate sort key representation
+SET enable_partitionwise_aggregate TO true;
+SET enable_hashjoin TO false;
+
+EXPLAIN (COSTS OFF)
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+
+RESET enable_partitionwise_aggregate;
+RESET enable_hashjoin;
+
 --
 -- partitioned by expression
 --

#18

Amit Langote

amitlangote09@gmail.com

about 6 years ago

In reply to: Tom Lane (#17)

1 attachment(s)

Re: v12.0: ERROR: could not find pathkey item to sort

On Sun, Nov 3, 2019 at 4:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

This would
probably require refactoring things so that there are separate
entry points to add child equivalences for base rels and join rels.
But that seems cleaner anyway than what you've got here.

Separate entry points sounds better, but only in HEAD?

I had actually had in mind that we might have two wrappers around a
common search-and-replace routine, but after studying the code I see that
there's just enough differences to make it probably not worth the trouble
to do it like that. I did spend a bit of time removing cosmetic
differences between the two versions, though, mostly by creating
common local variables.

Agree that having two totally separate routines is better.

I think the way you did the matching_ecs computation is actually wrong:
we need to find ECs that reference any rel of the join, not only those
that reference both inputs. If nothing else, the way you have it here
makes the results dependent on which pair of input rels gets considered
first, and that's certainly bad for multiway joins.

I'm not sure I fully understand the problems, but maybe you're right.

Also, I thought we should try to put more conditions on whether we
invoke add_child_join_rel_equivalences at all. In the attached I did

if ((enable_partitionwise_join || enable_partitionwise_aggregate) &&
(joinrel->has_eclass_joins ||
has_useful_pathkeys(root, parent_joinrel)))

but I wonder if we could do more, like checking to see if the parent
joinrel is partitioned. Alternatively, maybe it's unnecessary because
we won't try to build child joinrels unless these conditions are true?

Actually, I think we can assert in add_child_rel_equivalences() that
enable_partitionwise_join is true. Then checking
enable_partitionwise_aggregate is unnecessary.

I did not like the test case either. Creating a whole new (and rather
large) test table just for this one case is unreasonably expensive ---
it about doubles the runtime of the equivclass test for me. There's
already a perfectly good test table in partition_join.sql, which seems
like a more natural home for this anyhow. After a bit of finagling
I was able to adapt the test query to fail on that table.

That's great. I tried but I can only finagle so much when it comes to
twisting around plan shapes to what I need. :)

Patch v4 attached. I've not looked at what we need to do to make this
work in v12.

Thanks a lot for the revised patch.

Maybe the only difference between HEAD and v12 is that the former has
eclass_indexes infrastructure, whereas the latter doesn't? I have
attached a version of your patch adapted for v12.

Also, looking at this in the patched code:

+ /*
+ * We may ignore expressions that reference a single baserel,
+ * because add_child_rel_equivalences should have handled them.
+ */
+ if (bms_membership(cur_em->em_relids) != BMS_MULTIPLE)
+ continue;

I have been thinking maybe add_child_rel_equivalences() doesn't need
to translate EC members that reference multiple appendrels, because
there top_parent_relids is always a singleton set, whereas em_relids
of such expressions is not? Those half-translated expressions are
useless, only adding to the overhead of scanning ec_members. I'm
thinking that we should apply this diff:

diff --git a/src/backend/optimizer/path/equivclass.c
b/src/backend/optimizer/path/equivclass.c
index e8e9e9a314..d4d80c8101 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2169,7 +2169,7 @@ add_child_rel_equivalences(PlannerInfo *root,
                 continue;       /* ignore children here */

             /* Does this member reference child's topmost parent rel? */
-            if (bms_overlap(cur_em->em_relids, top_parent_relids))
+            if (bms_is_subset(cur_em->em_relids, top_parent_relids))
             {
                 /* Yes, generate transformed child version */
                 Expr       *child_expr;

Thoughts?

Thanks,
Amit

Attachments:

d25ea01275-fixup-PG12-v4.patchapplication/octet-stream; name=d25ea01275-fixup-PG12-v4.patchDownload

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 688d9b0707..e8e9e9a314 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2103,7 +2103,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
 
 /*
  * add_child_rel_equivalences
- *	  Search for EC members that reference the parent_rel, and
+ *	  Search for EC members that reference the root parent of child_rel, and
  *	  add transformed members referencing the child_rel.
  *
  * Note that this function won't be called at all unless we have at least some
@@ -2111,6 +2111,12 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
  *
  * parent_rel and child_rel could be derived from appinfo, but since the
  * caller has already computed them, we might as well just pass them in.
+ *
+ * The passed-in AppendRelInfo is not used when the parent_rel is not a
+ * top-level baserel, since it shows the mapping from the parent_rel but
+ * we need to translate EC expressions that refer to the top-level parent.
+ * Using it is faster than using adjust_appendrel_attrs_multilevel(), though,
+ * so we prefer it when we can.
  */
 void
 add_child_rel_equivalences(PlannerInfo *root,
@@ -2119,6 +2125,10 @@ add_child_rel_equivalences(PlannerInfo *root,
 						   RelOptInfo *child_rel)
 {
 	ListCell   *lc1;
+	Relids		top_parent_relids = child_rel->top_parent_relids;
+	Relids		child_relids = child_rel->relids;
+
+	Assert(IS_SIMPLE_REL(parent_rel));
 
 	foreach(lc1, root->eq_classes)
 	{
@@ -2137,7 +2147,7 @@ add_child_rel_equivalences(PlannerInfo *root,
 		 * No point in searching if child's topmost parent rel is not
 		 * mentioned in eclass.
 		 */
-		if (!bms_is_subset(child_rel->top_parent_relids, cur_ec->ec_relids))
+		if (!bms_is_subset(top_parent_relids, cur_ec->ec_relids))
 			continue;
 
 		foreach(lc2, cur_ec->ec_members)
@@ -2152,13 +2162,14 @@ add_child_rel_equivalences(PlannerInfo *root,
 			 * already-transformed child members.  Otherwise, if some original
 			 * member expression references more than one appendrel, we'd get
 			 * an O(N^2) explosion of useless derived expressions for
-			 * combinations of children.
+			 * combinations of children.  (But add_child_join_rel_equivalences
+			 * may add targeted combinations for partitionwise-join purposes.)
 			 */
 			if (cur_em->em_is_child)
 				continue;		/* ignore children here */
 
 			/* Does this member reference child's topmost parent rel? */
-			if (bms_overlap(cur_em->em_relids, child_rel->top_parent_relids))
+			if (bms_overlap(cur_em->em_relids, top_parent_relids))
 			{
 				/* Yes, generate transformed child version */
 				Expr	   *child_expr;
@@ -2179,8 +2190,8 @@ add_child_rel_equivalences(PlannerInfo *root,
 					child_expr = (Expr *)
 						adjust_appendrel_attrs_multilevel(root,
 														  (Node *) cur_em->em_expr,
-														  child_rel->relids,
-														  child_rel->top_parent_relids);
+														  child_relids,
+														  top_parent_relids);
 				}
 
 				/*
@@ -2190,21 +2201,20 @@ add_child_rel_equivalences(PlannerInfo *root,
 				 * don't want the child member to be marked as constant.
 				 */
 				new_relids = bms_difference(cur_em->em_relids,
-											child_rel->top_parent_relids);
-				new_relids = bms_add_members(new_relids, child_rel->relids);
+											top_parent_relids);
+				new_relids = bms_add_members(new_relids, child_relids);
 
 				/*
 				 * And likewise for nullable_relids.  Note this code assumes
 				 * parent and child relids are singletons.
 				 */
 				new_nullable_relids = cur_em->em_nullable_relids;
-				if (bms_overlap(new_nullable_relids,
-								child_rel->top_parent_relids))
+				if (bms_overlap(new_nullable_relids, top_parent_relids))
 				{
 					new_nullable_relids = bms_difference(new_nullable_relids,
-														 child_rel->top_parent_relids);
+														 top_parent_relids);
 					new_nullable_relids = bms_add_members(new_nullable_relids,
-														  child_rel->relids);
+														  child_relids);
 				}
 
 				(void) add_eq_member(cur_ec, child_expr,
@@ -2215,6 +2225,131 @@ add_child_rel_equivalences(PlannerInfo *root,
 	}
 }
 
+/*
+ * add_child_join_rel_equivalences
+ *	  Like add_child_rel_equivalences(), but for joinrels
+ *
+ * Here we find the ECs relevant to the top parent joinrel and add transformed
+ * member expressions that refer to this child joinrel.
+ *
+ * Note that this function won't be called at all unless we have at least some
+ * reason to believe that the EC members it generates will be useful.
+ */
+void
+add_child_join_rel_equivalences(PlannerInfo *root,
+								int nappinfos, AppendRelInfo **appinfos,
+								RelOptInfo *parent_joinrel,
+								RelOptInfo *child_joinrel)
+{
+	ListCell   *lc1;
+	Relids		top_parent_relids = child_joinrel->top_parent_relids;
+	Relids		child_relids = child_joinrel->relids;
+
+	Assert(IS_JOIN_REL(child_joinrel) && IS_JOIN_REL(parent_joinrel));
+
+	foreach(lc1, root->eq_classes)
+	{
+		EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);
+		ListCell   *lc2;
+
+		/*
+		 * If this EC contains a volatile expression, then generating child
+		 * EMs would be downright dangerous, so skip it.  We rely on a
+		 * volatile EC having only one EM.
+		 */
+		if (cur_ec->ec_has_volatile)
+			continue;
+
+		/*
+		 * No point in searching if child's topmost parent rel is not
+		 * mentioned in eclass.
+		 */
+		if (!bms_overlap(top_parent_relids, cur_ec->ec_relids))
+			continue;
+
+		/*
+		 * We don't use foreach() here because there's no point in scanning
+		 * newly-added child members, so we can stop after the last
+		 * pre-existing EC member.
+		 */
+		foreach(lc2, cur_ec->ec_members)
+		{
+			EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
+
+			if (cur_em->em_is_const)
+				continue;		/* ignore consts here */
+
+			/*
+			 * We consider only original EC members here, not
+			 * already-transformed child members.
+			 */
+			if (cur_em->em_is_child)
+				continue;		/* ignore children here */
+
+			/*
+			 * We may ignore expressions that reference a single baserel,
+			 * because add_child_rel_equivalences should have handled them.
+			 */
+			if (bms_membership(cur_em->em_relids) != BMS_MULTIPLE)
+				continue;
+
+			/* Does this member reference child's topmost parent rel? */
+			if (bms_overlap(cur_em->em_relids, top_parent_relids))
+			{
+				/* Yes, generate transformed child version */
+				Expr	   *child_expr;
+				Relids		new_relids;
+				Relids		new_nullable_relids;
+
+				if (parent_joinrel->reloptkind == RELOPT_JOINREL)
+				{
+					/* Simple single-level transformation */
+					child_expr = (Expr *)
+						adjust_appendrel_attrs(root,
+											   (Node *) cur_em->em_expr,
+											   nappinfos, appinfos);
+				}
+				else
+				{
+					/* Must do multi-level transformation */
+					Assert(parent_joinrel->reloptkind == RELOPT_OTHER_JOINREL);
+					child_expr = (Expr *)
+						adjust_appendrel_attrs_multilevel(root,
+														  (Node *) cur_em->em_expr,
+														  child_relids,
+														  top_parent_relids);
+				}
+
+				/*
+				 * Transform em_relids to match.  Note we do *not* do
+				 * pull_varnos(child_expr) here, as for example the
+				 * transformation might have substituted a constant, but we
+				 * don't want the child member to be marked as constant.
+				 */
+				new_relids = bms_difference(cur_em->em_relids,
+											top_parent_relids);
+				new_relids = bms_add_members(new_relids, child_relids);
+
+				/*
+				 * For nullable_relids, we must selectively replace parent
+				 * nullable relids with child ones.
+				 */
+				new_nullable_relids = cur_em->em_nullable_relids;
+				if (bms_overlap(new_nullable_relids, top_parent_relids))
+					new_nullable_relids =
+						adjust_child_relids_multilevel(root,
+													   new_nullable_relids,
+													   child_relids,
+													   top_parent_relids);
+
+				(void) add_eq_member(cur_ec, child_expr,
+									 new_relids, new_nullable_relids,
+									 true, cur_em->em_datatype);
+			}
+		}
+	}
+}
+
 
 /*
  * generate_implied_equalities_for_column
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 6054bd2b53..a0865b27ba 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -837,6 +837,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	/* Compute information relevant to foreign relations. */
 	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
 
+	/* Conpute information needed for mapping Vars to the child rel */
 	appinfos = find_appinfos_by_relids(root, joinrel->relids, &nappinfos);
 
 	/* Set up reltarget struct */
@@ -848,7 +849,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 														(Node *) parent_joinrel->joininfo,
 														nappinfos,
 														appinfos);
-	pfree(appinfos);
 
 	/*
 	 * Lateral relids referred in child join will be same as that referred in
@@ -883,6 +883,22 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	/* Add the relation to the PlannerInfo. */
 	add_join_rel(root, joinrel);
 
+	/*
+	 * If we are using partitionwise join or aggregation, we might need
+	 * EquivalenceClass members corresponding to the child join, so that we
+	 * can represent sort pathkeys for it.  As with children of baserels, we
+	 * shouldn't need this unless there are relevant eclass joins (implying
+	 * that a merge join might be possible) or pathkeys to sort by.
+	 */
+	if ((enable_partitionwise_join || enable_partitionwise_aggregate) &&
+		(joinrel->has_eclass_joins ||
+		 has_useful_pathkeys(root, parent_joinrel)))
+		add_child_join_rel_equivalences(root,
+										nappinfos, appinfos,
+										parent_joinrel, joinrel);
+
+	pfree(appinfos);
+
 	return joinrel;
 }
 
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7345137d1d..c6c34630c2 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -153,6 +153,11 @@ extern void add_child_rel_equivalences(PlannerInfo *root,
 									   AppendRelInfo *appinfo,
 									   RelOptInfo *parent_rel,
 									   RelOptInfo *child_rel);
+extern void add_child_join_rel_equivalences(PlannerInfo *root,
+											int nappinfos,
+											AppendRelInfo **appinfos,
+											RelOptInfo *parent_rel,
+											RelOptInfo *child_rel);
 extern List *generate_implied_equalities_for_column(PlannerInfo *root,
 													RelOptInfo *rel,
 													ec_matches_callback_type callback,
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index cad8dd591a..975bf6765c 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -459,6 +459,83 @@ SELECT t1.a, ss.t2a, ss.t2c FROM prt1 t1 LEFT JOIN LATERAL
  550 |     | 
 (12 rows)
 
+-- bug with inadequate sort key representation
+SET enable_partitionwise_aggregate TO true;
+SET enable_hashjoin TO false;
+EXPLAIN (COSTS OFF)
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+                                                    QUERY PLAN                                                     
+-------------------------------------------------------------------------------------------------------------------
+ Group
+   Group Key: (COALESCE(prt1_p1.a, p2.a)), (COALESCE(prt1_p1.b, p2.b))
+   ->  Merge Append
+         Sort Key: (COALESCE(prt1_p1.a, p2.a)), (COALESCE(prt1_p1.b, p2.b))
+         ->  Group
+               Group Key: (COALESCE(prt1_p1.a, p2.a)), (COALESCE(prt1_p1.b, p2.b))
+               ->  Sort
+                     Sort Key: (COALESCE(prt1_p1.a, p2.a)), (COALESCE(prt1_p1.b, p2.b))
+                     ->  Merge Full Join
+                           Merge Cond: ((prt1_p1.a = p2.a) AND (prt1_p1.b = p2.b))
+                           Filter: ((COALESCE(prt1_p1.a, p2.a) >= 490) AND (COALESCE(prt1_p1.a, p2.a) <= 510))
+                           ->  Sort
+                                 Sort Key: prt1_p1.a, prt1_p1.b
+                                 ->  Seq Scan on prt1_p1
+                           ->  Sort
+                                 Sort Key: p2.a, p2.b
+                                 ->  Seq Scan on prt2_p1 p2
+         ->  Group
+               Group Key: (COALESCE(prt1_p2.a, p2_1.a)), (COALESCE(prt1_p2.b, p2_1.b))
+               ->  Sort
+                     Sort Key: (COALESCE(prt1_p2.a, p2_1.a)), (COALESCE(prt1_p2.b, p2_1.b))
+                     ->  Merge Full Join
+                           Merge Cond: ((prt1_p2.a = p2_1.a) AND (prt1_p2.b = p2_1.b))
+                           Filter: ((COALESCE(prt1_p2.a, p2_1.a) >= 490) AND (COALESCE(prt1_p2.a, p2_1.a) <= 510))
+                           ->  Sort
+                                 Sort Key: prt1_p2.a, prt1_p2.b
+                                 ->  Seq Scan on prt1_p2
+                           ->  Sort
+                                 Sort Key: p2_1.a, p2_1.b
+                                 ->  Seq Scan on prt2_p2 p2_1
+         ->  Group
+               Group Key: (COALESCE(prt1_p3.a, p2_2.a)), (COALESCE(prt1_p3.b, p2_2.b))
+               ->  Sort
+                     Sort Key: (COALESCE(prt1_p3.a, p2_2.a)), (COALESCE(prt1_p3.b, p2_2.b))
+                     ->  Merge Full Join
+                           Merge Cond: ((prt1_p3.a = p2_2.a) AND (prt1_p3.b = p2_2.b))
+                           Filter: ((COALESCE(prt1_p3.a, p2_2.a) >= 490) AND (COALESCE(prt1_p3.a, p2_2.a) <= 510))
+                           ->  Sort
+                                 Sort Key: prt1_p3.a, prt1_p3.b
+                                 ->  Seq Scan on prt1_p3
+                           ->  Sort
+                                 Sort Key: p2_2.a, p2_2.b
+                                 ->  Seq Scan on prt2_p3 p2_2
+(43 rows)
+
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+  a  | b  
+-----+----
+ 490 | 15
+ 492 | 17
+ 494 | 19
+ 495 | 20
+ 496 | 21
+ 498 | 23
+ 500 |  0
+ 501 |  1
+ 502 |  2
+ 504 |  4
+ 506 |  6
+ 507 |  7
+ 508 |  8
+ 510 | 10
+(14 rows)
+
+RESET enable_partitionwise_aggregate;
+RESET enable_hashjoin;
 --
 -- partitioned by expression
 --
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index fb3ba18a26..92994b479b 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -91,6 +91,21 @@ SELECT t1.a, ss.t2a, ss.t2c FROM prt1 t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t3.a AS t3a, t2.b t2b, t2.c t2c, least(t1.a,t2.a,t3.a) FROM prt1 t2 JOIN prt2 t3 ON (t2.a = t3.b)) ss
 			  ON t1.c = ss.t2c WHERE (t1.b + coalesce(ss.t2b, 0)) = 0 ORDER BY t1.a;
 
+-- bug with inadequate sort key representation
+SET enable_partitionwise_aggregate TO true;
+SET enable_hashjoin TO false;
+
+EXPLAIN (COSTS OFF)
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+
+RESET enable_partitionwise_aggregate;
+RESET enable_hashjoin;
+
 --
 -- partitioned by expression
 --

#19

Tom Lane

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: Amit Langote (#18)

Re: v12.0: ERROR: could not find pathkey item to sort

Amit Langote <amitlangote09@gmail.com> writes:

On Sun, Nov 3, 2019 at 4:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also, I thought we should try to put more conditions on whether we
invoke add_child_join_rel_equivalences at all. In the attached I did
if ((enable_partitionwise_join || enable_partitionwise_aggregate) &&
(joinrel->has_eclass_joins ||
has_useful_pathkeys(root, parent_joinrel)))
but I wonder if we could do more, like checking to see if the parent
joinrel is partitioned. Alternatively, maybe it's unnecessary because
we won't try to build child joinrels unless these conditions are true?

Actually, I think we can assert in add_child_rel_equivalences() that
enable_partitionwise_join is true. Then checking
enable_partitionwise_aggregate is unnecessary.

After tracing the call sites back a bit further, I agree that we won't
be here in the first place unless enable_partitionwise_join is true,
so the extra tests I proposed are unnecessary. I took them out again.

I have been thinking maybe add_child_rel_equivalences() doesn't need
to translate EC members that reference multiple appendrels, because
there top_parent_relids is always a singleton set, whereas em_relids
of such expressions is not?  Those half-translated expressions are
useless, only adding to the overhead of scanning ec_members.  I'm
thinking that we should apply this diff:
-            if (bms_overlap(cur_em->em_relids, top_parent_relids))
+            if (bms_is_subset(cur_em->em_relids, top_parent_relids))

Meh, I'm not really convinced. The case where this would be relevant
is an EC generated from something like "WHERE (a.x + b.y) = c.z"
where "a" is partitioned. It's possible that we'd never have a use
for a sort key corresponding to "a_child.x + b.y", but I think that's
not obvious, and probably short-sighted. Anyway such EC members are
pretty rare in the first place, so we're not going to win much
performance by trying to optimize them.

Anyway, I've pushed the fix for Justin's problem to v12 and HEAD.
The problem with poor planning of multiway joins that you mentioned
in the other thread remains open, but I imagine the patches you
posted there are going to need rebasing over this commit, so I
set that CF entry to Waiting On Author.

regards, tom lane

#20

Amit Langote

amitlangote09@gmail.com

about 6 years ago

In reply to: Tom Lane (#19)

Re: v12.0: ERROR: could not find pathkey item to sort

On Wed, Nov 6, 2019 at 1:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:
I have been thinking maybe add_child_rel_equivalences() doesn't need
to translate EC members that reference multiple appendrels, because
there top_parent_relids is always a singleton set, whereas em_relids
of such expressions is not?  Those half-translated expressions are
useless, only adding to the overhead of scanning ec_members.  I'm
thinking that we should apply this diff:
-            if (bms_overlap(cur_em->em_relids, top_parent_relids))
+            if (bms_is_subset(cur_em->em_relids, top_parent_relids))
Meh, I'm not really convinced. The case where this would be relevant
is an EC generated from something like "WHERE (a.x + b.y) = c.z"
where "a" is partitioned. It's possible that we'd never have a use
for a sort key corresponding to "a_child.x + b.y", but I think that's
not obvious, and probably short-sighted. Anyway such EC members are
pretty rare in the first place, so we're not going to win much
performance by trying to optimize them.

OK.

Anyway, I've pushed the fix for Justin's problem to v12 and HEAD.
The problem with poor planning of multiway joins that you mentioned
in the other thread remains open, but I imagine the patches you
posted there are going to need rebasing over this commit, so I
set that CF entry to Waiting On Author.

Thank you. I will send rebased patches on that thread.

Regards,
Amit