Parameter for planner estimate of recursive queries

Started by Simon Riggs about 4 years ago · 16 messages
#1Simon Riggs
simon.riggs@enterprisedb.com
2 attachment(s)

I've been investigating the poor performance of a WITH RECURSIVE
query, which I've recreated with test data.

The first thing was to re-write the query, which helped improve
performance by about 30%, but the plan was still very bad. With a
small patch I've been able to improve performance by about 100x.

The poor performance is traced to the planner cost estimates for
recursive queries. Specifically, the cost of the recursive arm of the
query is evaluated based upon both of these hardcoded assumptions:

1. The recursion will last for 10 loops
2. The average size of the worktable will be 10x the size of the
initial query (non-recursive term).

Taken together these assumptions lead to a very poor estimate of the
worktable activity (in this case), which leads to the plan changing as
a result.
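
The effect is easy to see even on a toy query (a generic illustration,
not the attached test case):

EXPLAIN WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n + 1 FROM t WHERE n < 100
) SELECT * FROM t;

The WorkTable Scan on t is costed from 10x the one-row non-recursive
term, scaled by the default selectivity of the WHERE clause (so roughly
rows=3 here), no matter how deep the recursion actually runs.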

The factor 10 is a reasonably safe assumption and helps avoid worst
case behavior in bigger graph queries. However, the factor 10 is way
too large for many types of graph query, such as where the path
through the data is tight, and/or the query is written to prune bushy
graphs, e.g. shortest path queries. The factor 10 should not be
hardcoded in the planner, but should be settable, just as
cursor_tuple_fraction is.

I've written a short patch to make the estimate of the avg size of the
worktable configurable:

recursive_worktable_estimate = N (default 10)

Using this parameter with the test query results in a consistently
repeatable ~100x gain in performance, using
recursive_worktable_estimate = 1 for a shortest path query:

Unpatched: 1775ms
Patched: 17.2ms

This is because the estimated size of the worktable is closer to the
truth and so leads naturally to a more sensible plan. EXPLAINs
attached - please look at the estimated rows for the WorkTable Scan.
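
As a sketch of the kind of query involved (the table, columns and ids
here are illustrative, not the actual test data, and a real shortest
path query would also prune already-visited nodes):

SET recursive_worktable_estimate = 1.0;

WITH RECURSIVE search AS (
    SELECT k.person2id AS person, 1 AS depth
    FROM knows k
    WHERE k.person1id = 933
  UNION ALL
    SELECT k.person2id, s.depth + 1
    FROM search s
    JOIN knows k ON k.person1id = s.person
    WHERE s.depth < 6
)
SELECT min(depth) FROM search WHERE person = 4139;

With the lower estimate, the WorkTable Scan is costed near its true
size, which favors an index-driven nested loop over hashing a full scan.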

There are various options for setting the two estimates: just one, or
the other, or both values separately, or both together. Note that I
haven't touched the estimate that recursion will last for 10 loops. I
figured that people would object to two knobs.

Thoughts?

There are 2 other ways to speed up the query. One is to set
enable_seqscan = off, which helps about 20%, but may have other
consequences. Two is to set work_mem = 512MB, so that the poor plan
(hash) works as fast as possible, but that is far from optimal either.
Getting the right plan is 20x faster than either of those
alternatives.
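
For reference, those two workarounds are just session settings:

SET enable_seqscan = off;   -- ~20% gain, but can penalize other queries
SET work_mem = '512MB';     -- lets the poor hash plan run at full speed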

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

recursive_worktable_estimate.v1.patch (application/octet-stream)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 1e4d404f02..9b189de91d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -123,6 +123,7 @@ double		cpu_index_tuple_cost = DEFAULT_CPU_INDEX_TUPLE_COST;
 double		cpu_operator_cost = DEFAULT_CPU_OPERATOR_COST;
 double		parallel_tuple_cost = DEFAULT_PARALLEL_TUPLE_COST;
 double		parallel_setup_cost = DEFAULT_PARALLEL_SETUP_COST;
+double		recursive_worktable_estimate = DEFAULT_RECURSIVE_WORKTABLE_ESTIMATE;
 
 int			effective_cache_size = DEFAULT_EFFECTIVE_CACHE_SIZE;
 
@@ -5659,10 +5660,11 @@ set_cte_size_estimates(PlannerInfo *root, RelOptInfo *rel, double cte_rows)
 	if (rte->self_reference)
 	{
 		/*
-		 * In a self-reference, arbitrarily assume the average worktable size
-		 * is about 10 times the nonrecursive term's size.
+		 * Use recursive_worktable_estimate to get average size of worktable,
+		 * across all iterations. This will vary depending upon how bushy the
+		 * data is, so allow the user to set based upon their data.
 		 */
-		rel->tuples = 10 * cte_rows;
+		rel->tuples = recursive_worktable_estimate * cte_rows;
 	}
 	else
 	{
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index d2ce4a8450..c84643bc0d 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3702,6 +3702,19 @@ static struct config_real ConfigureNamesReal[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"recursive_worktable_estimate", PGC_USERSET, QUERY_TUNING_OTHER,
+			gettext_noop("Sets the planner's estimate of the size of "
+						 "the recursive_worktable as a multiple of the "
+						 "initial non-recursive query size."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&recursive_worktable_estimate,
+		DEFAULT_RECURSIVE_WORKTABLE_ESTIMATE, 0.0, 1000000.0,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"geqo_selection_bias", PGC_USERSET, QUERY_TUNING_GEQO,
 			gettext_noop("GEQO: selective pressure within the population."),
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 2113bc82de..9af7a6eea3 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -29,6 +29,8 @@
 #define DEFAULT_PARALLEL_TUPLE_COST 0.1
 #define DEFAULT_PARALLEL_SETUP_COST  1000.0
 
+#define DEFAULT_RECURSIVE_WORKTABLE_ESTIMATE 10.0
+
 #define DEFAULT_EFFECTIVE_CACHE_SIZE  524288	/* measured in pages */
 
 typedef enum
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 41b49b2662..eccac5039f 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -92,6 +92,7 @@ extern PGDLLIMPORT double cpu_operator_cost;
 extern PGDLLIMPORT double parallel_tuple_cost;
 extern PGDLLIMPORT double parallel_setup_cost;
 extern PGDLLIMPORT int effective_cache_size;
+extern PGDLLIMPORT double recursive_worktable_estimate;
 
 extern double clamp_row_est(double nrows);
 
graph_query_explains.out (application/octet-stream)
#2Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Simon Riggs (#1)
Re: Parameter for planner estimate of recursive queries

On Wed, 27 Oct 2021 at 15:58, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

The poor performance is traced to the planner cost estimates for
recursive queries. Specifically, the cost of the recursive arm of the
query is evaluated based upon both of these hardcoded assumptions:

1. The recursion will last for 10 loops
2. The average size of the worktable will be 10x the size of the
initial query (non-recursive term).

Taken together these assumptions lead to a very poor estimate of the
worktable activity (in this case), which leads to the plan changing as
a result.

The factor 10 is a reasonably safe assumption and helps avoid worst
case behavior in bigger graph queries. However, the factor 10 is way
too large for many types of graph query, such as where the path
through the data is tight, and/or the query is written to prune bushy
graphs, e.g. shortest path queries. The factor 10 should not be
hardcoded in the planner, but should be settable, just as
cursor_tuple_fraction is.

If you think this should be derived without parameters, then we would
want a function that starts at 1 for 1 input row and gets much larger
for larger input. The thinking here is that Graph OLTP is often a
shortest path between two nodes, whereas Graph Analytics traverses
much more of the graph, so the worktable will get much bigger.

So I'm thinking we use

rel->tuples = Max(1, cte_rows * cte_rows / 2);
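
For example, that gives an estimated worktable of 1 row when
cte_rows = 1 and 5000 rows when cte_rows = 100, so the assumed
multiplier itself grows with the size of the non-recursive term.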

--
Simon Riggs http://www.EnterpriseDB.com/

#3Kenaniah Cerny
kenaniah@gmail.com
In reply to: Simon Riggs (#2)
Re: Parameter for planner estimate of recursive queries

The factor 10 should not be hardcoded in the planner, but should be
settable, just as cursor_tuple_fraction is.

I feel considerably out of my depth here, but I like the idea of a working
table size multiplier GUC, given the challenges of predicting the number of
iterations (and any adjustments to cardinality per iteration). An
exponential cost function may lead to increasingly pathological outcomes,
especially when estimates for cte_rows are off.

In the EXPLAINs, it looked like the estimates for knows_pkey were off by an
order of magnitude or so. It's possible the planner would have chosen the
Nested Loop plan if knows_pkey had estimated to rows=87 (as the WindowAgg
would have estimated to roughly the same size as the second plan anyways,
even with rel->tuples = 10 * cte_rows).

I also wonder if there's a better default than cte_rows * 10, but
implementing a new GUC sounds like a reasonable solution to this as well.

--

Kenaniah

On Sat, Jan 22, 2022 at 1:58 PM Simon Riggs <simon.riggs@enterprisedb.com>
wrote:

On Wed, 27 Oct 2021 at 15:58, Simon Riggs <simon.riggs@enterprisedb.com>
wrote:

The poor performance is traced to the planner cost estimates for
recursive queries. Specifically, the cost of the recursive arm of the
query is evaluated based upon both of these hardcoded assumptions:

1. The recursion will last for 10 loops
2. The average size of the worktable will be 10x the size of the
initial query (non-recursive term).

Taken together these assumptions lead to a very poor estimate of the
worktable activity (in this case), which leads to the plan changing as
a result.

The factor 10 is a reasonably safe assumption and helps avoid worst
case behavior in bigger graph queries. However, the factor 10 is way
too large for many types of graph query, such as where the path
through the data is tight, and/or the query is written to prune bushy
graphs, e.g. shortest path queries. The factor 10 should not be
hardcoded in the planner, but should be settable, just as
cursor_tuple_fraction is.

If you think this should be derived without parameters, then we would
want a function that starts at 1 for 1 input row and gets much larger
for larger input. The thinking here is that Graph OLTP is often a
shortest path between two nodes, whereas Graph Analytics traverses
much more of the graph, so the worktable will get much bigger.

So I'm thinking we use

rel->tuples = Max(1, cte_rows * cte_rows / 2);

--
Simon Riggs http://www.EnterpriseDB.com/

#4Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Simon Riggs (#2)
Re: Parameter for planner estimate of recursive queries

On 31.12.21 15:10, Simon Riggs wrote:

The factor 10 is a reasonably safe assumption and helps avoid worst
case behavior in bigger graph queries. However, the factor 10 is way
too large for many types of graph query, such as where the path
through the data is tight, and/or the query is written to prune bushy
graphs, e.g. shortest path queries. The factor 10 should not be
hardcoded in the planner, but should be settable, just as
cursor_tuple_fraction is.

If you think this should be derived without parameters, then we would
want a function that starts at 1 for 1 input row and gets much larger
for larger input. The thinking here is that Graph OLTP is often a
shortest path between two nodes, whereas Graph Analytics and so the
worktable will get much bigger.

On the one hand, this smells like a planner hint. But on the other
hand, it doesn't look like we will come up with a proper graph-aware
selectivity estimation system any time soon, so just having all graph
OLTP queries suck until then because the planner hint is hardcoded
doesn't seem like a better solution. So I think this setting can be ok.
I think the way you have characterized it makes sense, too: for graph
OLAP, you want a larger value; for graph OLTP, you want a smaller value.

#5Hamid Akhtar
hamid.akhtar@gmail.com
In reply to: Peter Eisentraut (#4)
Re: Parameter for planner estimate of recursive queries

On Tue, 25 Jan 2022 at 14:44, Peter Eisentraut <
peter.eisentraut@enterprisedb.com> wrote:

On 31.12.21 15:10, Simon Riggs wrote:

The factor 10 is a reasonably safe assumption and helps avoid worst
case behavior in bigger graph queries. However, the factor 10 is way
too large for many types of graph query, such as where the path
through the data is tight, and/or the query is written to prune bushy
graphs, e.g. shortest path queries. The factor 10 should not be
hardcoded in the planner, but should be settable, just as
cursor_tuple_fraction is.

If you think this should be derived without parameters, then we would
want a function that starts at 1 for 1 input row and gets much larger
for larger input. The thinking here is that Graph OLTP is often a
shortest path between two nodes, whereas Graph Analytics and so the
worktable will get much bigger.

On the one hand, this smells like a planner hint. But on the other
hand, it doesn't look like we will come up with a proper graph-aware
selectivity estimation system any time soon, so just having all graph
OLTP queries suck until then because the planner hint is hardcoded
doesn't seem like a better solution. So I think this setting can be ok.
I think the way you have characterized it makes sense, too: for graph
OLAP, you want a larger value; for graph OLTP, you want a smaller value.

Do you think there is a case to replace the 10x multiplier with
"recursive_worktable_estimate" for total_rows calculation in the
cost_recursive_union function too?

#6Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#4)
Re: Parameter for planner estimate of recursive queries

On Tue, Jan 25, 2022 at 4:44 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On the one hand, this smells like a planner hint. But on the other
hand, it doesn't look like we will come up with a proper graph-aware
selectivity estimation system any time soon, so just having all graph
OLTP queries suck until then because the planner hint is hardcoded
doesn't seem like a better solution. So I think this setting can be ok.

I agree. It's a bit lame, but seems pretty harmless, and I can't see
us realistically doing a lot better with any reasonable amount of
work.

--
Robert Haas
EDB: http://www.enterprisedb.com

#7Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Robert Haas (#6)
Re: Parameter for planner estimate of recursive queries

On Fri, 28 Jan 2022 at 14:07, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jan 25, 2022 at 4:44 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On the one hand, this smells like a planner hint. But on the other
hand, it doesn't look like we will come up with a proper graph-aware
selectivity estimation system any time soon, so just having all graph
OLTP queries suck until then because the planner hint is hardcoded
doesn't seem like a better solution. So I think this setting can be ok.

I agree. It's a bit lame, but seems pretty harmless, and I can't see
us realistically doing a lot better with any reasonable amount of
work.

Shall I set this as Ready For Committer?

--
Simon Riggs http://www.EnterpriseDB.com/

#8Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#7)
Re: Parameter for planner estimate of recursive queries

Hi,

On 2022-03-10 17:42:14 +0000, Simon Riggs wrote:

Shall I set this as Ready For Committer?

Currently this CF entry fails on cfbot: https://cirrus-ci.com/task/4531771134967808?logs=test_world#L1158

[16:27:35.772] # Failed test 'no parameters missing from postgresql.conf.sample'
[16:27:35.772] # at t/003_check_guc.pl line 82.
[16:27:35.772] # got: '1'
[16:27:35.772] # expected: '0'
[16:27:35.772] # Looks like you failed 1 test of 3.
[16:27:35.772] [16:27:35] t/003_check_guc.pl ..............

Marked as waiting on author.

- Andres

#9Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Andres Freund (#8)
1 attachment(s)
Re: Parameter for planner estimate of recursive queries

On Tue, 22 Mar 2022 at 00:04, Andres Freund <andres@anarazel.de> wrote:

On 2022-03-10 17:42:14 +0000, Simon Riggs wrote:

Shall I set this as Ready For Committer?

Currently this CF entry fails on cfbot: https://cirrus-ci.com/task/4531771134967808?logs=test_world#L1158

[16:27:35.772] # Failed test 'no parameters missing from postgresql.conf.sample'
[16:27:35.772] # at t/003_check_guc.pl line 82.
[16:27:35.772] # got: '1'
[16:27:35.772] # expected: '0'
[16:27:35.772] # Looks like you failed 1 test of 3.
[16:27:35.772] [16:27:35] t/003_check_guc.pl ..............

Marked as waiting on author.

Thanks.

I've corrected that by adding a line to postgresql.conf.sample, as
well as adding docs.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

recursive_worktable_estimate.v2.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7a48973b3c..86673b4fba 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5919,6 +5919,25 @@ SELECT * FROM parent WHERE key = 2400;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-recursive-worktable-estimate" xreflabel="recursive_worktable_estimate">
+      <term><varname>recursive_worktable_estimate</varname> (<type>floating point</type>)
+      <indexterm>
+       <primary><varname>recursive_worktable_estimate</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Sets the planner's estimate of the average size of the worktable, if
+        a recursive query is requested, as a multiple of the number of rows in the
+        initial non-recursive term.  The default is 10.0.  Smaller values of this
+        setting bias the planner towards using <quote>fast start</quote> plans
+        for recursive queries, which will retrieve the first few rows quickly while
+        perhaps taking a long time to fetch all rows. A setting of 1.0 is appropriate
+        for shortest path queries; larger values may be better for graph analytics.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
     </sect2>
    </sect1>
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 4d9f3b4bb6..31e8de4bf3 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -123,6 +123,7 @@ double		cpu_index_tuple_cost = DEFAULT_CPU_INDEX_TUPLE_COST;
 double		cpu_operator_cost = DEFAULT_CPU_OPERATOR_COST;
 double		parallel_tuple_cost = DEFAULT_PARALLEL_TUPLE_COST;
 double		parallel_setup_cost = DEFAULT_PARALLEL_SETUP_COST;
+double		recursive_worktable_estimate = DEFAULT_RECURSIVE_WORKTABLE_ESTIMATE;
 
 int			effective_cache_size = DEFAULT_EFFECTIVE_CACHE_SIZE;
 
@@ -5665,10 +5666,11 @@ set_cte_size_estimates(PlannerInfo *root, RelOptInfo *rel, double cte_rows)
 	if (rte->self_reference)
 	{
 		/*
-		 * In a self-reference, arbitrarily assume the average worktable size
-		 * is about 10 times the nonrecursive term's size.
+		 * Use recursive_worktable_estimate to get average size of worktable,
+		 * across all iterations. This will vary depending upon how bushy the
+		 * data is, so allow the user to set based upon their data.
 		 */
-		rel->tuples = 10 * cte_rows;
+		rel->tuples = recursive_worktable_estimate * cte_rows;
 	}
 	else
 	{
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 932aefc777..60dfa66b28 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3740,6 +3740,19 @@ static struct config_real ConfigureNamesReal[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"recursive_worktable_estimate", PGC_USERSET, QUERY_TUNING_OTHER,
+			gettext_noop("Sets the planner's estimate of the size of "
+						 "the recursive_worktable as a multiple of the "
+						 "initial non-recursive query size."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&recursive_worktable_estimate,
+		DEFAULT_RECURSIVE_WORKTABLE_ESTIMATE, 0.0, 1000000.0,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"geqo_selection_bias", PGC_USERSET, QUERY_TUNING_GEQO,
 			gettext_noop("GEQO: selective pressure within the population."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 4cf5b26a36..c7eeb11a17 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -426,6 +426,7 @@
 					# JOIN clauses
 #plan_cache_mode = auto			# auto, force_generic_plan or
 					# force_custom_plan
+#recursive_worktable_estimate = 10.0 # Set to 1.0 for shortest path queries
 
 
 #------------------------------------------------------------------------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 356a51f370..1afaae5e61 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -29,6 +29,8 @@
 #define DEFAULT_PARALLEL_TUPLE_COST 0.1
 #define DEFAULT_PARALLEL_SETUP_COST  1000.0
 
+#define DEFAULT_RECURSIVE_WORKTABLE_ESTIMATE 10.0
+
 #define DEFAULT_EFFECTIVE_CACHE_SIZE  524288	/* measured in pages */
 
 typedef enum
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 6b8ee0c69f..5e60f0bb27 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -92,6 +92,7 @@ extern PGDLLIMPORT double cpu_operator_cost;
 extern PGDLLIMPORT double parallel_tuple_cost;
 extern PGDLLIMPORT double parallel_setup_cost;
 extern PGDLLIMPORT int effective_cache_size;
+extern PGDLLIMPORT double recursive_worktable_estimate;
 
 extern double clamp_row_est(double nrows);
 
#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#6)
Re: Parameter for planner estimate of recursive queries

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jan 25, 2022 at 4:44 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On the one hand, this smells like a planner hint. But on the other
hand, it doesn't look like we will come up with a proper graph-aware
selectivity estimation system any time soon, so just having all graph
OLTP queries suck until then because the planner hint is hardcoded
doesn't seem like a better solution. So I think this setting can be ok.

I agree. It's a bit lame, but seems pretty harmless, and I can't see
us realistically doing a lot better with any reasonable amount of
work.

Yeah, agreed on all counts. The thing that makes it lame is that
there's no reason to expect that the same multiplier is good for
every recursive query done in an installation, or even in a session.

One could imagine dealing with that by adding custom syntax to WITH,
as we have already done once:

WITH RECURSIVE cte1 AS SCALE 1.0 (SELECT ...

But I *really* hesitate to go there, mainly because once we do
something like that we can't ever undo it. I think Simon's
proposal is a reasonable low-effort compromise.

Some nitpicks:

* The new calculation needs clamp_row_est(), since the float
GUC could be fractional or even zero.

* Do we want to prevent the GUC value from being zero? It's not
very sensible, plus I think we might want to reserve that value
to mean "use the built-in calculation", in case we ever do put
in some smarter logic here. But I'm not sure what a reasonable
non-zero lower bound would be.

* The proposed docs claim that a smaller setting works by biasing
the planner towards fast-start plans, but I don't think I believe
that explanation. I'd venture that we want text more along the
lines of "This may help the planner choose the most appropriate
method for joining the work table to the query's other tables".

regards, tom lane

#11Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Tom Lane (#10)
Re: Parameter for planner estimate of recursive queries

On Wed, 23 Mar 2022 at 17:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jan 25, 2022 at 4:44 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On the one hand, this smells like a planner hint. But on the other
hand, it doesn't look like we will come up with a proper graph-aware
selectivity estimation system any time soon, so just having all graph
OLTP queries suck until then because the planner hint is hardcoded
doesn't seem like a better solution. So I think this setting can be ok.

I agree. It's a bit lame, but seems pretty harmless, and I can't see
us realistically doing a lot better with any reasonable amount of
work.

Yeah, agreed on all counts. The thing that makes it lame is that
there's no reason to expect that the same multiplier is good for
every recursive query done in an installation, or even in a session.

One could imagine dealing with that by adding custom syntax to WITH,
as we have already done once:

WITH RECURSIVE cte1 AS SCALE 1.0 (SELECT ...

But I *really* hesitate to go there, mainly because once we do
something like that we can't ever undo it. I think Simon's
proposal is a reasonable low-effort compromise.

Some nitpicks:

* The new calculation needs clamp_row_est(), since the float
GUC could be fractional or even zero.

True, will do.

* Do we want to prevent the GUC value from being zero? It's not
very sensible, plus I think we might want to reserve that value
to mean "use the built-in calculation", in case we ever do put
in some smarter logic here. But I'm not sure what a reasonable
non-zero lower bound would be.

Agreed, makes sense.

* The proposed docs claim that a smaller setting works by biasing
the planner towards fast-start plans, but I don't think I believe
that explanation. I'd venture that we want text more along the
lines of "This may help the planner choose the most appropriate
method for joining the work table to the query's other tables".

OK, will improve.

[New patch version pending]

--
Simon Riggs http://www.EnterpriseDB.com/

#12Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Simon Riggs (#11)
1 attachment(s)
Re: Parameter for planner estimate of recursive queries

On Wed, 23 Mar 2022 at 18:20, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

[New patch version pending]

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

recursive_worktable_estimate.v3.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7a48973b3c..d440429a2c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5919,6 +5919,23 @@ SELECT * FROM parent WHERE key = 2400;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-recursive-worktable-estimate" xreflabel="recursive_worktable_estimate">
+      <term><varname>recursive_worktable_estimate</varname> (<type>floating point</type>)
+      <indexterm>
+       <primary><varname>recursive_worktable_estimate</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Sets the planner's estimate of the average size of the worktable as a multiple of
+        the initial non-recursive term of the query.  The default is 10.0.
+        This helps the planner choose the most appropriate method for joining the worktable
+        to the query's other tables.  A setting of 1.0 is appropriate for shortest path
+        queries; graph analytics queries may benefit from larger values.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
     </sect2>
    </sect1>
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 4d9f3b4bb6..4015efb314 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -123,6 +123,7 @@ double		cpu_index_tuple_cost = DEFAULT_CPU_INDEX_TUPLE_COST;
 double		cpu_operator_cost = DEFAULT_CPU_OPERATOR_COST;
 double		parallel_tuple_cost = DEFAULT_PARALLEL_TUPLE_COST;
 double		parallel_setup_cost = DEFAULT_PARALLEL_SETUP_COST;
+double		recursive_worktable_estimate = DEFAULT_RECURSIVE_WORKTABLE_ESTIMATE;
 
 int			effective_cache_size = DEFAULT_EFFECTIVE_CACHE_SIZE;
 
@@ -5664,11 +5665,20 @@ set_cte_size_estimates(PlannerInfo *root, RelOptInfo *rel, double cte_rows)
 
 	if (rte->self_reference)
 	{
+		double multiplier = recursive_worktable_estimate;
+
+		/*
+		 * Reserve zero as a special case meaning use-the-built-in-calculation.
+		 */
+		if (recursive_worktable_estimate == 0.0)
+			multiplier = DEFAULT_RECURSIVE_WORKTABLE_ESTIMATE;
+
 		/*
-		 * In a self-reference, arbitrarily assume the average worktable size
-		 * is about 10 times the nonrecursive term's size.
+		 * Use recursive_worktable_estimate to get average size of worktable,
+		 * across all iterations. This will vary depending upon how bushy the
+		 * data is, so allow the user to set based upon their data.
 		 */
-		rel->tuples = 10 * cte_rows;
+		rel->tuples = clamp_row_est(multiplier * cte_rows);
 	}
 	else
 	{
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 932aefc777..677fdcc46f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3740,6 +3740,19 @@ static struct config_real ConfigureNamesReal[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"recursive_worktable_estimate", PGC_USERSET, QUERY_TUNING_OTHER,
+			gettext_noop("Sets the planner's estimate of the average size "
+						 "of the worktable as a multiple of the initial "
+						 "non-recursive term of the query."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&recursive_worktable_estimate,
+		DEFAULT_RECURSIVE_WORKTABLE_ESTIMATE, 0.0, 1000000.0,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"geqo_selection_bias", PGC_USERSET, QUERY_TUNING_GEQO,
 			gettext_noop("GEQO: selective pressure within the population."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 4cf5b26a36..c7eeb11a17 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -426,6 +426,7 @@
 					# JOIN clauses
 #plan_cache_mode = auto			# auto, force_generic_plan or
 					# force_custom_plan
+#recursive_worktable_estimate = 10.0 # Set to 1.0 for shortest path queries
 
 
 #------------------------------------------------------------------------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 356a51f370..1afaae5e61 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -29,6 +29,8 @@
 #define DEFAULT_PARALLEL_TUPLE_COST 0.1
 #define DEFAULT_PARALLEL_SETUP_COST  1000.0
 
+#define DEFAULT_RECURSIVE_WORKTABLE_ESTIMATE 10.0
+
 #define DEFAULT_EFFECTIVE_CACHE_SIZE  524288	/* measured in pages */
 
 typedef enum
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 6b8ee0c69f..5e60f0bb27 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -92,6 +92,7 @@ extern PGDLLIMPORT double cpu_operator_cost;
 extern PGDLLIMPORT double parallel_tuple_cost;
 extern PGDLLIMPORT double parallel_setup_cost;
 extern PGDLLIMPORT int effective_cache_size;
+extern PGDLLIMPORT double recursive_worktable_estimate;
 
 extern double clamp_row_est(double nrows);
 
#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#12)
Re: Parameter for planner estimate of recursive queries

Simon Riggs <simon.riggs@enterprisedb.com> writes:

[New patch version pending]

Do you have any objection if I rename the GUC to
recursive_worktable_factor? That seems a bit clearer as to what
it does, and it leaves more room for other knobs in the same area
if we decide we need any.

regards, tom lane

#14Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Tom Lane (#13)
Re: Parameter for planner estimate of recursive queries

On Wed, 23 Mar 2022 at 20:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon.riggs@enterprisedb.com> writes:

[New patch version pending]

Do you have any objection if I rename the GUC to
recursive_worktable_factor? That seems a bit clearer as to what
it does, and it leaves more room for other knobs in the same area
if we decide we need any.

None, I think your proposal is a better name.

--
Simon Riggs http://www.EnterpriseDB.com/

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#14)
Re: Parameter for planner estimate of recursive queries

Simon Riggs <simon.riggs@enterprisedb.com> writes:

On Wed, 23 Mar 2022 at 20:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Do you have any objection if I rename the GUC to
recursive_worktable_factor? That seems a bit clearer as to what
it does, and it leaves more room for other knobs in the same area
if we decide we need any.

None, I think your proposal is a better name.

Pushed that way.

regards, tom lane

#16Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Tom Lane (#15)
Re: Parameter for planner estimate of recursive queries

On Thu, 24 Mar 2022 at 15:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon.riggs@enterprisedb.com> writes:

On Wed, 23 Mar 2022 at 20:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Do you have any objection if I rename the GUC to
recursive_worktable_factor? That seems a bit clearer as to what
it does, and it leaves more room for other knobs in the same area
if we decide we need any.

None, I think your proposal is a better name.

Pushed that way.

Ok, thanks.

--
Simon Riggs http://www.EnterpriseDB.com/