Problem with postponing gathering partial paths for the topmost scan/join rel
Hi all,
In commit 3f90ec85, we postpone gathering partial paths for the topmost
scan/join rel until we know the final targetlist, in order to allow more
accurate costing of parallel paths. This is done by the following code
snippet in standard_join_search:
+			/*
+			 * Except for the topmost scan/join rel, consider gathering
+			 * partial paths. We'll do the same for the topmost scan/join rel
+			 * once we know the final targetlist (see grouping_planner).
+			 */
+			if (lev < levels_needed)
+				generate_gather_paths(root, rel, false);
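For context, the surrounding loop looks roughly like this (a paraphrased
sketch of allpaths.c, not the exact source); note that standard_join_search
is invoked once per joinlist subproblem, so levels_needed only counts the
relations of the current subproblem:

    for (lev = 2; lev <= levels_needed; lev++)
    {
        ListCell   *lc;

        /* Build all join rels of this level from lower-level rels. */
        join_search_one_level(root, lev);

        foreach(lc, root->join_rel_level[lev])
        {
            RelOptInfo *rel = (RelOptInfo *) lfirst(lc);

            /* Gather partial paths, except (intentionally) at the top level. */
            if (lev < levels_needed)
                generate_gather_paths(root, rel, false);

            /* Find and save the cheapest paths for this rel. */
            set_cheapest(rel);
        }
    }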
This change can cause a problem if the joinlist contains sub-joinlist
nodes (as happens with FULL JOINs, or when join_collapse_limit is
exceeded). In that case 'lev == levels_needed' does not necessarily mean
we are at the topmost rel of the final scan/join; it may only be the
topmost scan/join rel of a subproblem, and we would then miss the Gather
paths for that subproblem. It can be illustrated with the query below:
create table foo(i int, j int);
insert into foo select i, i from generate_series(1,50000)i;
analyze foo;
set max_parallel_workers_per_gather to 4;
set parallel_setup_cost to 0;
set parallel_tuple_cost to 0;
set min_parallel_table_scan_size to 0;
# explain (costs off) select * from foo a join foo b on a.i = b.i full join
foo c on b.i = c.i;
                      QUERY PLAN
----------------------------------------------------
 Hash Full Join
   Hash Cond: (b.i = c.i)
   ->  Hash Join
         Hash Cond: (a.i = b.i)
         ->  Gather
               Workers Planned: 4
               ->  Parallel Seq Scan on foo a
         ->  Hash
               ->  Gather
                     Workers Planned: 4
                     ->  Parallel Seq Scan on foo b
   ->  Hash
         ->  Gather
               Workers Planned: 4
               ->  Parallel Seq Scan on foo c
(15 rows)
Please note how the join between a and b is performed: we run a Gather
above each parallel scan and then do the join above the Gathers.

These two base rels are grouped into a subproblem because of the FULL
JOIN, and due to the code change above, we are unable to gather partial
paths for their joinrel.

If we can somehow fix this, we would be able to plan better by running
the join in parallel first and then placing a Gather above the join:
   ->  Gather
         Workers Planned: 4
         ->  Parallel Hash Join
               Hash Cond: (a.i = b.i)
               ->  Parallel Seq Scan on foo a
               ->  Parallel Hash
                     ->  Parallel Seq Scan on foo b
To fix this problem, I'm thinking we can leverage 'root->all_baserels'
to tell whether we are at the topmost scan/join rel of the whole query.
Since all_baserels is the set of all base rels appearing in the query,
only the final scan/join rel has exactly that set of relids. Something
like:
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3041,7 +3041,7 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			 * partial paths. We'll do the same for the topmost scan/join rel
 			 * once we know the final targetlist (see grouping_planner).
 			 */
-			if (lev < levels_needed)
+			if (!bms_equal(rel->relids, root->all_baserels))
 				generate_useful_gather_paths(root, rel, false);
Any thoughts?
Thanks
Richard
On Wed, Jul 28, 2021 at 3:42 PM Richard Guo <guofenglinux@gmail.com> wrote:
To fix this problem, I'm thinking we can leverage 'root->all_baserels'
to tell if we are at the topmost scan/join rel, something like:

--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3041,7 +3041,7 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			 * partial paths. We'll do the same for the topmost scan/join rel
 			 * once we know the final targetlist (see grouping_planner).
 			 */
-			if (lev < levels_needed)
+			if (!bms_equal(rel->relids, root->all_baserels))
 				generate_useful_gather_paths(root, rel, false);

Any thoughts?
Attach a patch to include the fix described upthread. Would appreciate
any comments on this topic.
Thanks
Richard
Attachments:
  v1-0001-Gather-partial-paths-for-subproblem-s-topmost-sca.patch (application/octet-stream)
From a7ed44b2ebecffac297cbbe181c1980d8edb58a9 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Thu, 29 Jul 2021 12:00:44 +0800
Subject: [PATCH v1] Gather partial paths for subproblem's topmost scan/join
rel
---
src/backend/optimizer/path/allpaths.c | 2 +-
src/test/regress/expected/select_parallel.out | 22 +++++++++++++++++++
src/test/regress/sql/select_parallel.sql | 5 +++++
3 files changed, 28 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 671117314a..640e3b4049 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3041,7 +3041,7 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			 * partial paths. We'll do the same for the topmost scan/join rel
 			 * once we know the final targetlist (see grouping_planner).
 			 */
-			if (lev < levels_needed)
+			if (!bms_equal(rel->relids, root->all_baserels))
 				generate_useful_gather_paths(root, rel, false);
 
 			/* Find and save the cheapest paths for this rel */
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 4ea1aa7dfd..c9cc8cdc18 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -1194,4 +1194,26 @@ SELECT 1 FROM tenk1_vw_sec
 Filter: (f1 < tenk1_vw_sec.unique1)
 (9 rows)
 
+-- test gather for subproblem's topmost scan/join rel
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM tenk1 a JOIN tenk1 b ON a.two = b.two
+  FULL JOIN tenk1 c ON b.two = c.two;
+                         QUERY PLAN
+------------------------------------------------------------
+ Aggregate
+   ->  Hash Full Join
+         Hash Cond: (b.two = c.two)
+         ->  Gather
+               Workers Planned: 4
+               ->  Parallel Hash Join
+                     Hash Cond: (a.two = b.two)
+                     ->  Parallel Seq Scan on tenk1 a
+                     ->  Parallel Hash
+                           ->  Parallel Seq Scan on tenk1 b
+         ->  Hash
+               ->  Gather
+                     Workers Planned: 4
+                     ->  Parallel Seq Scan on tenk1 c
+(14 rows)
+
rollback;
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index f924731248..3fcf82cd9d 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -455,4 +455,9 @@ EXPLAIN (COSTS OFF)
SELECT 1 FROM tenk1_vw_sec
WHERE (SELECT sum(f1) FROM int4_tbl WHERE f1 < unique1) < 100;
+-- test gather for subproblem's topmost scan/join rel
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM tenk1 a JOIN tenk1 b ON a.two = b.two
+ FULL JOIN tenk1 c ON b.two = c.two;
+
rollback;
--
2.31.0
Richard Guo <guofenglinux@gmail.com> wrote:
On Wed, Jul 28, 2021 at 3:42 PM Richard Guo <guofenglinux@gmail.com> wrote:
To fix this problem, I'm thinking we can leverage 'root->all_baserels'
to tell if we are at the topmost scan/join rel, something like:

--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3041,7 +3041,7 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			 * partial paths. We'll do the same for the topmost scan/join rel
 			 * once we know the final targetlist (see grouping_planner).
 			 */
-			if (lev < levels_needed)
+			if (!bms_equal(rel->relids, root->all_baserels))
 				generate_useful_gather_paths(root, rel, false);

Any thoughts?
Attach a patch to include the fix described upthread. Would appreciate
any comments on this topic.
I think I understand the idea but I'm not sure about the regression test. I
suspect that in the plan
EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1 a JOIN tenk1 b ON a.two = b.two
  FULL JOIN tenk1 c ON b.two = c.two;
                         QUERY PLAN
------------------------------------------------------------
 Aggregate
   ->  Hash Full Join
         Hash Cond: (b.two = c.two)
         ->  Gather
               Workers Planned: 4
               ->  Parallel Hash Join
                     Hash Cond: (a.two = b.two)
                     ->  Parallel Seq Scan on tenk1 a
                     ->  Parallel Hash
                           ->  Parallel Seq Scan on tenk1 b
         ->  Hash
               ->  Gather
                     Workers Planned: 4
                     ->  Parallel Seq Scan on tenk1 c
the Gather node is located below the "Hash Full Join" node only because
that kind of join currently cannot be executed by parallel workers. If a
parallel "Hash Full Join" gets implemented (I've noticed [1] but not
checked in detail), this plan will probably change and the test will no
longer prove much.
I'd prefer a test that demonstrates that the Gather node at the top of the
"subproblem plan" is useful purely from the *cost* perspective, rather than
due to executor limitation.
[1]: https://commitfest.postgresql.org/38/2903/
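(The restriction I mean is the partial-hashjoin test in joinpath.c's
hash_inner_and_outer(); paraphrased roughly, not the exact source, it looks
like this:)

    /*
     * A partial (parallel) hash join is only considered for join types
     * that cannot produce falsely null-extended rows, so JOIN_FULL and
     * JOIN_RIGHT are excluded here.
     */
    if (joinrel->consider_parallel &&
        save_jointype != JOIN_UNIQUE_OUTER &&
        save_jointype != JOIN_FULL &&
        save_jointype != JOIN_RIGHT &&
        outerrel->partial_pathlist != NIL &&
        bms_is_empty(joinrel->lateral_relids))
    {
        /* ... consider partial hash join paths ... */
    }

As long as JOIN_FULL is excluded there, the Gather has to stay below the
full join no matter what the costs say.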
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
On Thu, Jul 14, 2022 at 10:02 PM Antonin Houska <ah@cybertec.at> wrote:
I'd prefer a test that demonstrates that the Gather node at the top of the
"subproblem plan" is useful purely from the *cost* perspective, rather than
due to executor limitation.
This patch provides an additional path (a Gather atop the subproblem)
which was not available before. But your concern makes sense: we need to
show that this new path can actually win on cost against the other paths.

How about we make the topmost join a Nested Loop? Something like:
set join_collapse_limit to 2;
# explain (costs off) select * from foo a join foo b on a.i = b.i join foo
c on b.i > c.i;
                      QUERY PLAN
----------------------------------------------------
 Nested Loop
   Join Filter: (b.i > c.i)
   ->  Gather
         Workers Planned: 4
         ->  Parallel Hash Join
               Hash Cond: (a.i = b.i)
               ->  Parallel Seq Scan on foo a
               ->  Parallel Hash
                     ->  Parallel Seq Scan on foo b
   ->  Materialize
         ->  Gather
               Workers Planned: 4
               ->  Parallel Seq Scan on foo c
(13 rows)
Without the patch, the Gather-atop-subproblem path is not available, and
we get:
# explain (costs off) select * from foo a join foo b on a.i = b.i join foo
c on b.i > c.i;
                      QUERY PLAN
----------------------------------------------------
 Nested Loop
   Join Filter: (b.i > c.i)
   ->  Hash Join
         Hash Cond: (a.i = b.i)
         ->  Gather
               Workers Planned: 4
               ->  Parallel Seq Scan on foo a
         ->  Hash
               ->  Gather
                     Workers Planned: 4
                     ->  Parallel Seq Scan on foo b
   ->  Materialize
         ->  Gather
               Workers Planned: 4
               ->  Parallel Seq Scan on foo c
(15 rows)
Thanks
Richard
On Fri, Jul 15, 2022 at 4:03 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, Jul 14, 2022 at 10:02 PM Antonin Houska <ah@cybertec.at> wrote:
I'd prefer a test that demonstrates that the Gather node at the top of the
"subproblem plan" is useful purely from the *cost* perspective, rather than
due to executor limitation.

This patch provides an additional path (Gather atop of subproblem) which
was not available before. But your concern makes sense that we need to
show this new path is valuable from competing on cost with other paths.

How about we change to Nested Loop at the topmost? Something like:
Maybe a better example is to use a small table 'c', so that we avoid the
Gather node above the scan of 'c' and it becomes possible to generate the
parallel nestloop path.
set join_collapse_limit to 2;
# explain (costs off) select * from a join b on a.i = b.i join c on b.i >
c.i;
                   QUERY PLAN
------------------------------------------------
 Nested Loop
   Join Filter: (b.i > c.i)
   ->  Seq Scan on c
   ->  Gather
         Workers Planned: 4
         ->  Parallel Hash Join
               Hash Cond: (a.i = b.i)
               ->  Parallel Seq Scan on a
               ->  Parallel Hash
                     ->  Parallel Seq Scan on b
(10 rows)
Thanks
Richard
On Fri, Jul 15, 2022 at 5:00 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Fri, Jul 15, 2022 at 4:03 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Thu, Jul 14, 2022 at 10:02 PM Antonin Houska <ah@cybertec.at> wrote:

I'd prefer a test that demonstrates that the Gather node at the top of the
"subproblem plan" is useful purely from the *cost* perspective, rather than
due to executor limitation.

This patch provides an additional path (Gather atop of subproblem) which
was not available before. But your concern makes sense that we need to
show this new path is valuable from competing on cost with other paths.

How about we change to Nested Loop at the topmost? Something like:

Maybe a better example is that we use a small table 'c' to avoid the
Gather node above scanning 'c', so that the path of parallel nestloop is
possible to be generated.
Update the patch with the new test case.
Thanks
Richard
Attachments:
  v2-0001-Gather-partial-paths-for-subproblem-s-topmost-sca.patch (application/octet-stream)
From 4f10509df2f57760f85794334ee944156d1d79de Mon Sep 17 00:00:00 2001
From: pgsql-guo <richard.guo@openpie.com>
Date: Mon, 18 Jul 2022 07:02:19 +0000
Subject: [PATCH v2] Gather partial paths for subproblem's topmost scan/join
---
src/backend/optimizer/path/allpaths.c | 2 +-
src/test/regress/expected/select_parallel.out | 20 +++++++++++++++++++
src/test/regress/sql/select_parallel.sql | 9 +++++++++
3 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index e9342097e5..d9810c5c56 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3402,7 +3402,7 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			 * partial paths. We'll do the same for the topmost scan/join rel
 			 * once we know the final targetlist (see grouping_planner).
 			 */
-			if (lev < levels_needed)
+			if (!bms_equal(rel->relids, root->all_baserels))
 				generate_useful_gather_paths(root, rel, false);
 
 			/* Find and save the cheapest paths for this rel */
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 4ea1aa7dfd..9afbc3d7df 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -1194,4 +1194,24 @@ SELECT 1 FROM tenk1_vw_sec
 Filter: (f1 < tenk1_vw_sec.unique1)
 (9 rows)
 
+-- test gather for subproblem's topmost scan/join rel
+set join_collapse_limit to 2;
+alter table d_star set (parallel_workers = 0);
+EXPLAIN (COSTS OFF)
+SELECT * FROM tenk1 a JOIN tenk1 b ON a.two = b.two JOIN d_star c ON b.two > c.aa;
+                      QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+   Join Filter: (b.two > c.aa)
+   ->  Seq Scan on d_star c
+   ->  Gather
+         Workers Planned: 4
+         ->  Parallel Hash Join
+               Hash Cond: (a.two = b.two)
+               ->  Parallel Seq Scan on tenk1 a
+               ->  Parallel Hash
+                     ->  Parallel Seq Scan on tenk1 b
+(10 rows)
+
+alter table d_star reset (parallel_workers);
rollback;
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index f924731248..4fb8b0e0e7 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -455,4 +455,13 @@ EXPLAIN (COSTS OFF)
SELECT 1 FROM tenk1_vw_sec
WHERE (SELECT sum(f1) FROM int4_tbl WHERE f1 < unique1) < 100;
+-- test gather for subproblem's topmost scan/join rel
+set join_collapse_limit to 2;
+alter table d_star set (parallel_workers = 0);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM tenk1 a JOIN tenk1 b ON a.two = b.two JOIN d_star c ON b.two > c.aa;
+
+alter table d_star reset (parallel_workers);
+
rollback;
--
2.25.1
Richard Guo <guofenglinux@gmail.com> wrote:
On Fri, Jul 15, 2022 at 5:00 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Fri, Jul 15, 2022 at 4:03 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, Jul 14, 2022 at 10:02 PM Antonin Houska <ah@cybertec.at> wrote:
I'd prefer a test that demonstrates that the Gather node at the top of the
"subproblem plan" is useful purely from the *cost* perspective, rather than
due to executor limitation.

This patch provides an additional path (Gather atop of subproblem) which
was not available before. But your concern makes sense that we need to
show this new path is valuable from competing on cost with other paths.

How about we change to Nested Loop at the topmost? Something like:

Maybe a better example is that we use a small table 'c' to avoid the
Gather node above scanning 'c', so that the path of parallel nestloop is
possible to be generated.

Update the patch with the new test case.
ok, this makes sense to me. Just one minor suggestion: the command
alter table d_star reset (parallel_workers);
is not necessary because it's immediately followed by
rollback;
I'm going to set the CF entry to "Ready for Committer".
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Antonin Houska <ah@cybertec.at> writes:
I'm going to set the CF entry to "Ready for Committer".
I pushed this with some editorialization:
* Grepping found another caller of generate_useful_gather_paths
with the exact same bug, in geqo_eval.c. (A wise man once said
that the most powerful bug-finding heuristic he knew was "where
else did we make this same mistake?")
* I thought it best to make set_rel_pathlist() use an identically-worded
test for the equivalent purpose for baserels. It's not actively broken,
but examining bms_membership seems confusingly complicated, and I doubt
it's faster than bms_equal either. (A paraphrased sketch of both changed
call sites appears below.)
* I omitted the test case because I didn't think it would buy us
anything. It failed to detect the GEQO variant of the bug, and
it would fail to detect the most likely future way of breaking
this, which is that I forget to change these instances of
all_baserels next time I rebase [1].

[1] https://commitfest.postgresql.org/39/3755/
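For reference, the two call sites end up with tests of roughly this shape
(a paraphrased sketch, not the exact committed code):

    /* geqo_eval.c, in merge_clump(), once a new joinrel has been built: */
    if (!bms_equal(joinrel->relids, root->all_baserels))
        generate_useful_gather_paths(root, joinrel, false);

    /* allpaths.c, set_rel_pathlist(), instead of the bms_membership() test: */
    if (rel->reloptkind == RELOPT_BASEREL &&
        !bms_equal(rel->relids, root->all_baserels))
        generate_useful_gather_paths(root, rel, false);

The exact wording may differ, but the point is that all these places now
compare the rel's relids against all_baserels instead of depending on the
shape of the join search.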
regards, tom lane