estimation for join results cardinality is sometimes more than the product of the downstream nodes'

Started by Alexey Bashtanovover 8 years ago2 messages
#1Alexey Bashtanov
bashtanov@imap.cc
3 attachment(s)

Hello,

Postgres can produce a plan with a nested loop node having rows estimate
much more than the product of underlying nodes' estimates, relying only
on outer relation size:

alexey=# explain
SELECT oid, relname
FROM (
SELECT m.oid, m.relname
FROM pg_class m
UNION ALL
SELECT m.oid, m.relname
FROM pg_class m
) m
WHERE oid IN (VALUES (162456317), (162456310));
QUERY PLAN
----------------------------------------------------------------------------------------------------
Nested Loop (cost=0.31..33.24 rows=*341* width=68)
-> Unique (cost=0.04..0.04 rows=*2* width=4)
-> Sort (cost=0.04..0.04 rows=2 width=4)
Sort Key: (("*VALUES*".column1)::oid)
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2
width=4)
-> Append (cost=0.27..16.58 rows=*2* width=68)
-> Index Scan using pg_class_oid_index on pg_class m
(cost=0.27..8.29 rows=1 width=68)
Index Cond: (oid = ("*VALUES*".column1)::oid)
-> Index Scan using pg_class_oid_index on pg_class m_1
(cost=0.27..8.29 rows=1 width=68)
Index Cond: (oid = ("*VALUES*".column1)::oid)
(10 rows)

Why?
Is there a reason that join cardinality estimates are not limited by the
product of the joined parts cardinalities like in the
join-card-est.patch attached?
An example of a query working faster as a result of this change is in
join-card-est.sql, result is in join-card-est.result

Best Regards,
Alexey

Attachments:

join-card-est.patchtext/x-patch; name=join-card-est.patchDownload
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b35acb7..fa4bebc 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2185,15 +2185,6 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
 	else
 		path->path.rows = path->path.parent->rows;
 
-	/* For partial paths, scale row estimate. */
-	if (path->path.parallel_workers > 0)
-	{
-		double		parallel_divisor = get_parallel_divisor(&path->path);
-
-		path->path.rows =
-			clamp_row_est(path->path.rows / parallel_divisor);
-	}
-
 	/*
 	 * We could include disable_cost in the preliminary estimate, but that
 	 * would amount to optimizing for the case where the join method is
@@ -2321,6 +2312,19 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
 		ntuples = outer_path_rows * inner_path_rows;
 	}
 
+	/* We cannot emit more rows than we process. */
+	if (path->path.rows > ntuples)
+		path->path.rows = clamp_row_est(ntuples);
+
+	/* For partial paths, scale row estimate. */
+	if (path->path.parallel_workers > 0)
+	{
+		double		parallel_divisor = get_parallel_divisor(&path->path);
+
+		path->path.rows =
+			clamp_row_est(path->path.rows / parallel_divisor);
+	}
+
 	/* CPU costs */
 	cost_qual_eval(&restrict_qual_cost, path->joinrestrictinfo, root);
 	startup_cost += restrict_qual_cost.startup;
join-card-est.sqlapplication/sql; name=join-card-est.sqlDownload
join-card-est.resulttext/plain; charset=UTF-8; name=join-card-est.resultDownload
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexey Bashtanov (#1)
Re: estimation for join results cardinality is sometimes more than the product of the downstream nodes'

Alexey Bashtanov <bashtanov@imap.cc> writes:

Is there a reason that join cardinality estimates are not limited by the
product of the joined parts cardinalities like in the
join-card-est.patch attached?

Because that would be giving an unfair advantage to some paths over
others based on nothing except estimation errors. I do not think we'd
get a net benefit in plan quality.

If we could do this earlier and adjust the join relation's overall
cardinality estimate, it might be something to consider.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers