pgsql: Fix cardinality estimates for parallel joins.
Fix cardinality estimates for parallel joins.
For a partial path, the cardinality estimate needs to reflect the
number of rows we think each worker will see, rather than the total
number of rows; otherwise, costing will go wrong. The previous coding
got this completely wrong for parallel joins.
Unfortunately, this change may destabilize plans for users of 9.6 who
have enabled parallel query, but since 9.6 is still fairly new I'm
hoping expectations won't be too settled yet. Also, this is really a
brown-paper-bag bug, so leaving it unfixed for the entire lifetime of
9.6 seems unwise.
Related reports (whose import I initially failed to recognize) by
Tomas Vondra and Tom Lane.
Discussion: /messages/by-id/CA+TgmoaDxZ5z5Kw_oCQoymNxNoVaTCXzPaODcOuao=CzK8dMZw@mail.gmail.com
Branch
------
REL9_6_STABLE
Details
-------
http://git.postgresql.org/pg/commitdiff/2d443ae1b0121e15265864d2b2143509fa70e8e4
Modified Files
--------------
src/backend/optimizer/path/costsize.c | 74 +++++++++++++++++++++++------------
1 file changed, 48 insertions(+), 26 deletions(-)
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On Sat, Jan 14, 2017 at 12:07 AM, Robert Haas <rhaas@postgresql.org> wrote:
Fix cardinality estimates for parallel joins.
+ /*
+ * In the case of a parallel plan, the row count needs to represent
+ * the number of tuples processed per worker.
+ */
+ path->rows = clamp_row_est(path->rows / parallel_divisor);
}
path->startup_cost = startup_cost;
@@ -2014,6 +1996,10 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
else
path->path.rows = path->path.parent->rows;
+ /* For partial paths, scale row estimate. */
+ if (path->path.parallel_workers > 0)
+ path->path.rows /= get_parallel_divisor(&path->path);
Isn't it better to call clamp_row_est in join costing functions as we
are doing in cost_seqscan()? Is there a reason to keep those
different?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On Mon, Jan 16, 2017 at 7:23 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 14, 2017 at 12:07 AM, Robert Haas <rhaas@postgresql.org> wrote:
Fix cardinality estimates for parallel joins.
+ /* + * In the case of a parallel plan, the row count needs to represent + * the number of tuples processed per worker. + */ + path->rows = clamp_row_est(path->rows / parallel_divisor); }path->startup_cost = startup_cost;
@@ -2014,6 +1996,10 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
else
path->path.rows = path->path.parent->rows;+ /* For partial paths, scale row estimate. */ + if (path->path.parallel_workers > 0) + path->path.rows /= get_parallel_divisor(&path->path);Isn't it better to call clamp_row_est in join costing functions as we
are doing in cost_seqscan()? Is there a reason to keep those
different?
No, those should probably be changed to match.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jan 17, 2017 at 11:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jan 16, 2017 at 7:23 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 14, 2017 at 12:07 AM, Robert Haas <rhaas@postgresql.org> wrote:
Fix cardinality estimates for parallel joins.
+ /* + * In the case of a parallel plan, the row count needs to represent + * the number of tuples processed per worker. + */ + path->rows = clamp_row_est(path->rows / parallel_divisor); }path->startup_cost = startup_cost;
@@ -2014,6 +1996,10 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
else
path->path.rows = path->path.parent->rows;+ /* For partial paths, scale row estimate. */ + if (path->path.parallel_workers > 0) + path->path.rows /= get_parallel_divisor(&path->path);Isn't it better to call clamp_row_est in join costing functions as we
are doing in cost_seqscan()? Is there a reason to keep those
different?No, those should probably be changed to match.
So I guess that'd look something like this?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
clamp-parallel-join-est.patchapplication/octet-stream; name=clamp-parallel-join-est.patchDownload
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 8883586..777ba4c 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2140,7 +2140,12 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
/* For partial paths, scale row estimate. */
if (path->path.parallel_workers > 0)
- path->path.rows /= get_parallel_divisor(&path->path);
+ {
+ double parallel_divisor = get_parallel_divisor(&path->path);
+
+ path->path.rows =
+ clamp_row_est(path->path.rows / parallel_divisor);
+ }
/*
* We could include disable_cost in the preliminary estimate, but that
@@ -2562,7 +2567,12 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
/* For partial paths, scale row estimate. */
if (path->jpath.path.parallel_workers > 0)
- path->jpath.path.rows /= get_parallel_divisor(&path->jpath.path);
+ {
+ double parallel_divisor = get_parallel_divisor(&path->jpath.path);
+
+ path->jpath.path.rows =
+ clamp_row_est(path->jpath.path.rows / parallel_divisor);
+ }
/*
* We could include disable_cost in the preliminary estimate, but that
@@ -2945,7 +2955,12 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
/* For partial paths, scale row estimate. */
if (path->jpath.path.parallel_workers > 0)
- path->jpath.path.rows /= get_parallel_divisor(&path->jpath.path);
+ {
+ double parallel_divisor = get_parallel_divisor(&path->jpath.path);
+
+ path->jpath.path.rows =
+ clamp_row_est(path->jpath.path.rows / parallel_divisor);
+ }
/*
* We could include disable_cost in the preliminary estimate, but that
On Tue, Mar 14, 2017 at 9:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jan 17, 2017 at 11:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jan 16, 2017 at 7:23 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Isn't it better to call clamp_row_est in join costing functions as we
are doing in cost_seqscan()? Is there a reason to keep those
different?No, those should probably be changed to match.
So I guess that'd look something like this?
Yes, the patch looks good to me.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers