Added prosupport function for estimating numeric generate_series rows

Started by 孤傲小二~阿沐about 1 year ago11 messages
#1孤傲小二~阿沐
tsinghualucky912@foxmail.com
1 attachment(s)

Hello hackers, I saw a recent submission: Teach planner how to estimate rows for timestamp generate_series. I provide a patch for the numeric type here, and a simple test is as follows:

postgres=# explain SELECT * FROM generate_series(-25.0, -1.0, 2.0);
                              QUERY PLAN                              
----------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..0.13 rows=13 width=32)
(1 row)

postgres=# explain SELECT * FROM generate_series(-25.0, -1.0);    
                              QUERY PLAN                              
----------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..0.25 rows=25 width=32)
(1 row)

postgres=#

I really want to know your thoughts, please give me feedback. Thank you.

Attachments:

0001-Added-prosupport-functions-for-estimating-numeric-ge.patchapplication/octet-stream; charset=utf-8; name=0001-Added-prosupport-functions-for-estimating-numeric-ge.patchDownload
From 5d3f393ea345e33001f0fea79e51ff146fc9fdca Mon Sep 17 00:00:00 2001
From: TsinghuaLucky912 <2903807914@qq.com>
Date: Wed, 27 Nov 2024 01:00:42 -0800
Subject: [PATCH] Added prosupport functions for estimating numeric
 generate_series rows

---
 src/backend/utils/adt/numeric.c | 103 ++++++++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat |   9 ++-
 2 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 344d7137f9..5c67e1f443 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -32,6 +32,7 @@
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
+#include "optimizer/optimizer.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
 #include "utils/array.h"
@@ -1828,6 +1829,108 @@ generate_series_step_numeric(PG_FUNCTION_ARGS)
 }
 
 
+/*
+ * Planner support function for generate_series(numeric, numeric [, numeric])
+ */
+Datum
+generate_series_numeric_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestRows))
+	{
+		/* Try to estimate the number of rows returned */
+		SupportRequestRows *req = (SupportRequestRows *) rawreq;
+
+		if (is_funcclause(req->node))	/* be paranoid */
+		{
+			List	   *args = ((FuncExpr *) req->node)->args;
+			Node	   *arg1,
+					   *arg2,
+					   *arg3;
+
+			/* We can use estimated argument values here */
+			arg1 = estimate_expression_value(req->root, linitial(args));
+			arg2 = estimate_expression_value(req->root, lsecond(args));
+			if (list_length(args) >= 3)
+				arg3 = estimate_expression_value(req->root, lthird(args));
+			else
+				arg3 = NULL;
+
+			/*
+			 * If any argument is constant NULL, we can safely assume that
+			 * zero rows are returned.  Otherwise, if they're all non-NULL
+			 * constants, we can calculate the number of rows that will be
+			 * returned.  Use double arithmetic to avoid overflow hazards.
+			 */
+			if ((IsA(arg1, Const) &&
+				 ((Const *) arg1)->constisnull) ||
+				(IsA(arg2, Const) &&
+				 ((Const *) arg2)->constisnull) ||
+				(arg3 != NULL && IsA(arg3, Const) &&
+				 ((Const *) arg3)->constisnull))
+			{
+				req->rows = 0;
+				ret = (Node *) req;
+			}
+			else if (IsA(arg1, Const) &&
+					 IsA(arg2, Const) &&
+					 (arg3 == NULL || IsA(arg3, Const)))
+			{
+				Numeric		start,
+							finish,
+							step;
+				NumericVar	var_start,
+							var_finish,
+							var_diff;
+				NumericVar	nstep = const_one;
+
+				init_var(&var_start);
+				init_var(&var_finish);
+				init_var(&var_diff);
+
+				start = DatumGetNumeric(((Const *) arg1)->constvalue);
+				finish = DatumGetNumeric(((Const *) arg2)->constvalue);
+
+				set_var_from_num(start, &var_start);
+				set_var_from_num(finish, &var_finish);
+
+				if(arg3)
+				{
+					step = DatumGetNumeric(((Const *) arg3)->constvalue);
+					init_var_from_num(step, &nstep);
+				}
+
+				sub_var(&var_finish, &var_start, &var_diff);
+
+				/* This equation works for either sign of step */
+				if (cmp_var(&nstep, &const_zero) != 0)
+				{
+					if(nstep.sign != var_diff.sign)
+					{
+						req->rows = 0;
+						ret = (Node *) req;
+					}
+					else
+					{
+						double ddiff;
+						NumericVar q;
+						init_var(&q);
+						div_var(&var_diff, &nstep, &q, 0, false, false);
+						ddiff = numericvar_to_double_no_overflow(&q);
+
+						req->rows = floor(ddiff + 1);
+						ret = (Node *) req;
+					}
+				}
+			}
+		}
+	}
+
+	PG_RETURN_POINTER(ret);
+}
+
 /*
  * Implements the numeric version of the width_bucket() function
  * defined by SQL2003. See also width_bucket_float8().
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..9575524007 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8464,13 +8464,18 @@
   proname => 'generate_series_int8_support', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'generate_series_int8_support' },
 { oid => '3259', descr => 'non-persistent series generator',
-  proname => 'generate_series', prorows => '1000', proretset => 't',
+  proname => 'generate_series', prorows => '1000',
+  prosupport => 'generate_series_numeric_support', proretset => 't',
   prorettype => 'numeric', proargtypes => 'numeric numeric numeric',
   prosrc => 'generate_series_step_numeric' },
 { oid => '3260', descr => 'non-persistent series generator',
-  proname => 'generate_series', prorows => '1000', proretset => 't',
+  proname => 'generate_series', prorows => '1000',
+  prosupport => 'generate_series_numeric_support', proretset => 't',
   prorettype => 'numeric', proargtypes => 'numeric numeric',
   prosrc => 'generate_series_numeric' },
+{ oid => '8405', descr => 'planner support for generate_series',
+  proname => 'generate_series_numeric_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'generate_series_numeric_support' },
 { oid => '938', descr => 'non-persistent series generator',
   proname => 'generate_series', prorows => '1000',
   prosupport => 'generate_series_timestamp_support', proretset => 't',
-- 
2.31.1

#2Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: 孤傲小二~阿沐 (#1)
Re: Added prosupport function for estimating numeric generate_series rows

On Thu, 28 Nov 2024 at 07:47, 孤傲小二~阿沐 <tsinghualucky912@foxmail.com> wrote:

Hello hackers, I saw a recent submission: Teach planner how to estimate rows for timestamp generate_series. I provide a patch for the numeric type here, and a simple test is as follows:

I really want to know your thoughts, please give me feedback. Thank you.

Good idea.

Some random review comments:

This should test for special inputs, NaN and infinity (it doesn't make
sense to convert those to NumericVars). generate_series() produces an
error for all such inputs, so the support function can just not
produce an estimate for these cases (the same as when the step size is
zero).

NumericVars initialised using init_var() should be freed using
free_var(). That can be avoided for the 3 inputs, by using
init_var_from_num(), rather than set_var_from_num(), which saves
copying digit arrays. It should then be possible to write this using a
single additional allocated NumericVar and one init_var()/free_var()
pair.

There's no need to use floor(), since the div_var() call already
produces a floored integer result.

It could use some regression test cases.

Regards,
Dean

#3孤傲小二~阿沐
tsinghualucky912@foxmail.com
In reply to: Dean Rasheed (#2)
1 attachment(s)
Re: Added prosupport function for estimating numeric generate_series rows

Hello, thank you very much for your attention and guidance. I have modified and improved the problem you mentioned. The patch of version v2 is attached below.&nbsp;

Regarding regression testing, I implemented it with the help of some facilities of generate_series_timestamp_support last time. Everything is normal in Cirrus CI test.

Looking forward to your reply, thank you very much.

原始邮件

发件人: "Dean Rasheed" <dean.a.rasheed@gmail.com&gt;
发件时间: 2024年11月28日 21:56
收件人: "孤傲小二~阿沐" <tsinghualucky912@foxmail.com&gt;
抄送: "pgsql-hackers" <pgsql-hackers@lists.postgresql.org&gt; , "japinli" <japinli@hotmail.com&gt; , "jian.universality" <jian.universality@gmail.com&gt;
主题: Re: Added prosupport function for estimating numeric generate_series rows

On Thu, 28 Nov 2024 at 07:47, 孤傲小二~阿沐 wrote:
&gt;
&gt; Hello hackers, I saw a recent submission: Teach planner how to estimate rows for timestamp generate_series. I provide a patch for the numeric type here, and a simple test is as follows:
&gt;
&gt; I really want to know your thoughts, please give me feedback. Thank you.
&gt;

Good idea.

Some random review comments:

This should test for special inputs, NaN and infinity (it doesn't make
sense to convert those to NumericVars). generate_series() produces an
error for all such inputs, so the support function can just not
produce an estimate for these cases (the same as when the step size is
zero).

NumericVars initialised using init_var() should be freed using
free_var(). That can be avoided for the 3 inputs, by using
init_var_from_num(), rather than set_var_from_num(), which saves
copying digit arrays. It should then be possible to write this using a
single additional allocated NumericVar and one init_var()/free_var()
pair.

There's no need to use floor(), since the div_var() call already
produces a floored integer result.

It could use some regression test cases.

Regards,
Dean

Attachments:

v2_0001-Added-prosupport-function-for-estimating-numeric-gen.patchapplication/octet-stream; charset=utf-8; name=v2_0001-Added-prosupport-function-for-estimating-numeric-gen.patchDownload
From c4ccd5a093cd94dd010fc0bf4d20a93282ea9c83 Mon Sep 17 00:00:00 2001
From: TsinghuaLucky912 <2903807914@qq.com>
Date: Thu, 28 Nov 2024 08:51:39 -0800
Subject: [PATCH] Added prosupport function for estimating numeric
 generate_series rows

---
 src/backend/utils/adt/numeric.c              | 117 +++++++++++++++++++
 src/include/catalog/pg_proc.dat              |   9 +-
 src/test/regress/expected/misc_functions.out |  52 +++++++++
 src/test/regress/sql/misc_functions.sql      |  33 ++++++
 4 files changed, 209 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 344d7137f9..e42b4dfb88 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -32,6 +32,7 @@
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
+#include "optimizer/optimizer.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
 #include "utils/array.h"
@@ -1828,6 +1829,122 @@ generate_series_step_numeric(PG_FUNCTION_ARGS)
 }
 
 
+/*
+ * Planner support function for generate_series(numeric, numeric [, numeric])
+ */
+Datum
+generate_series_numeric_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestRows))
+	{
+		/* Try to estimate the number of rows returned */
+		SupportRequestRows *req = (SupportRequestRows *) rawreq;
+
+		if (is_funcclause(req->node))	/* be paranoid */
+		{
+			List	   *args = ((FuncExpr *) req->node)->args;
+			Node	   *arg1,
+					   *arg2,
+					   *arg3;
+
+			/* We can use estimated argument values here */
+			arg1 = estimate_expression_value(req->root, linitial(args));
+			arg2 = estimate_expression_value(req->root, lsecond(args));
+			if (list_length(args) >= 3)
+				arg3 = estimate_expression_value(req->root, lthird(args));
+			else
+				arg3 = NULL;
+
+			/*
+			 * If any argument is constant NULL, we can safely assume that
+			 * zero rows are returned.  Otherwise, if they're all non-NULL
+			 * constants, we can calculate the number of rows that will be
+			 * returned.  Use double arithmetic to avoid overflow hazards.
+			 */
+			if ((IsA(arg1, Const) &&
+				 ((Const *) arg1)->constisnull) ||
+				(IsA(arg2, Const) &&
+				 ((Const *) arg2)->constisnull) ||
+				(arg3 != NULL && IsA(arg3, Const) &&
+				 ((Const *) arg3)->constisnull))
+			{
+				req->rows = 0;
+				ret = (Node *) req;
+			}
+			else if (IsA(arg1, Const) &&
+					 IsA(arg2, Const) &&
+					 (arg3 == NULL || IsA(arg3, Const)))
+			{
+				Numeric		start,
+							finish,
+							step;
+				NumericVar	var_start,
+							var_finish,
+							var_diff;
+				NumericVar	nstep = const_one;
+
+				init_var(&var_start);
+				init_var(&var_finish);
+				init_var(&var_diff);
+
+				start = DatumGetNumeric(((Const *) arg1)->constvalue);
+				finish = DatumGetNumeric(((Const *) arg2)->constvalue);
+
+				if (NUMERIC_IS_SPECIAL(start) || NUMERIC_IS_SPECIAL(finish))
+				{
+					goto cleanup;
+				}
+
+				init_var_from_num(start, &var_start);
+				init_var_from_num(finish, &var_finish);
+
+				if(arg3)
+				{
+					step = DatumGetNumeric(((Const *) arg3)->constvalue);
+					if (NUMERIC_IS_SPECIAL(step))
+					{
+						goto cleanup;
+					}
+					init_var_from_num(step, &nstep);
+				}
+
+				sub_var(&var_finish, &var_start, &var_diff);
+
+				/* This equation works for either sign of step */
+				if (cmp_var(&nstep, &const_zero) != 0)
+				{
+					if(nstep.sign != var_diff.sign)
+					{
+						req->rows = 0;
+						ret = (Node *) req;
+					}
+					else
+					{
+						NumericVar q;
+						init_var(&q);
+						div_var(&var_diff, &nstep, &q, 0, false, false);
+
+						req->rows = numericvar_to_double_no_overflow(&q) + 1;
+						ret = (Node *) req;
+
+						free_var(&q);
+					}
+				}
+
+cleanup:
+				free_var(&var_start);
+				free_var(&var_finish);
+				free_var(&var_diff);
+			}
+		}
+	}
+
+	PG_RETURN_POINTER(ret);
+}
+
 /*
  * Implements the numeric version of the width_bucket() function
  * defined by SQL2003. See also width_bucket_float8().
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..9575524007 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8464,13 +8464,18 @@
   proname => 'generate_series_int8_support', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'generate_series_int8_support' },
 { oid => '3259', descr => 'non-persistent series generator',
-  proname => 'generate_series', prorows => '1000', proretset => 't',
+  proname => 'generate_series', prorows => '1000',
+  prosupport => 'generate_series_numeric_support', proretset => 't',
   prorettype => 'numeric', proargtypes => 'numeric numeric numeric',
   prosrc => 'generate_series_step_numeric' },
 { oid => '3260', descr => 'non-persistent series generator',
-  proname => 'generate_series', prorows => '1000', proretset => 't',
+  proname => 'generate_series', prorows => '1000',
+  prosupport => 'generate_series_numeric_support', proretset => 't',
   prorettype => 'numeric', proargtypes => 'numeric numeric',
   prosrc => 'generate_series_numeric' },
+{ oid => '8405', descr => 'planner support for generate_series',
+  proname => 'generate_series_numeric_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'generate_series_numeric_support' },
 { oid => '938', descr => 'non-persistent series generator',
   proname => 'generate_series', prorows => '1000',
   prosupport => 'generate_series_timestamp_support', proretset => 't',
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 36b1201f9f..64aced884a 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -712,6 +712,58 @@ false, true, false, true);
 -- the support function.
 SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '0 day') g(s);
 ERROR:  step size cannot equal zero
+--
+-- Test the SupportRequestRows support function for generate_series_numeric()
+--
+-- Ensure the row estimate matches the actual rows
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, 2.0) g(s);$$,
+true, true, false, true);
+                                    explain_mask_costs                                    
+------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=13 width=N) (actual rows=13 loops=1)
+(1 row)
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0) g(s);$$,
+true, true, false, true);
+                                    explain_mask_costs                                    
+------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=25 width=N) (actual rows=25 loops=1)
+(1 row)
+
+-- Ensure the estimates match when step is decreasing
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, -1.0) g(s);$$,
+true, true, false, true);
+                                    explain_mask_costs                                    
+------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=25 width=N) (actual rows=25 loops=1)
+(1 row)
+
+-- Ensure an empty range estimates 1 row
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, 1.0) g(s);$$,
+true, true, false, true);
+                                   explain_mask_costs                                   
+----------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1 width=N) (actual rows=0 loops=1)
+(1 row)
+
+-- Ensure we get the default row estimate for infinity values
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series('-infinity'::NUMERIC, 'infinity'::NUMERIC, 1.0) g(s);$$,
+false, true, false, true);
+                        explain_mask_costs                         
+-------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
+(1 row)
+
+-- Ensure the row estimate behaves correctly when step size is zero.
+-- We expect generate_series_numeric() to throw the error rather than in
+-- the support function.
+SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);
+ERROR:  step size cannot equal zero
 -- Test functions for control data
 SELECT count(*) > 0 AS ok FROM pg_control_checkpoint();
  ok 
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index b7495d70eb..f1d8c0f6ea 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -311,6 +311,39 @@ false, true, false, true);
 -- the support function.
 SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '0 day') g(s);
 
+--
+-- Test the SupportRequestRows support function for generate_series_numeric()
+--
+
+-- Ensure the row estimate matches the actual rows
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, 2.0) g(s);$$,
+true, true, false, true);
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0) g(s);$$,
+true, true, false, true);
+
+-- Ensure the estimates match when step is decreasing
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, -1.0) g(s);$$,
+true, true, false, true);
+
+-- Ensure an empty range estimates 1 row
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, 1.0) g(s);$$,
+true, true, false, true);
+
+-- Ensure we get the default row estimate for infinity values
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series('-infinity'::NUMERIC, 'infinity'::NUMERIC, 1.0) g(s);$$,
+false, true, false, true);
+
+-- Ensure the row estimate behaves correctly when step size is zero.
+-- We expect generate_series_numeric() to throw the error rather than in
+-- the support function.
+SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);
+
 -- Test functions for control data
 SELECT count(*) > 0 AS ok FROM pg_control_checkpoint();
 SELECT count(*) > 0 AS ok FROM pg_control_init();
-- 
2.31.1

#4David Rowley
dgrowleyml@gmail.com
In reply to: 孤傲小二~阿沐 (#3)
Re: Added prosupport function for estimating numeric generate_series rows

On Fri, 29 Nov 2024 at 06:25, 孤傲小二~阿沐 <tsinghualucky912@foxmail.com> wrote:

Hello, thank you very much for your attention and guidance. I have modified and improved the problem you mentioned. The patch of version v2 is attached below.

I've only had a quick look at the patch.

* The following needs a DatumGetFloat8():

+ req->rows = numericvar_to_double_no_overflow(&q) + 1;

* It would also be good to see a comment explaining the following line:

+ if(nstep.sign != var_diff.sign)

Something like: /* When the sign of the step size and the series range
don't match, there are no rows in the series. */

* You should add one test for the generate_series(numeric, numeric).
You've only got tests for the 3 arg version.

Also a few minor things:

* Missing space after "if"

+ if(arg3)

* We always have an empty line after variable declarations, that's missing in:

+ NumericVar q;
+ init_var(&q);

* No need for the braces in:

+ if (NUMERIC_IS_SPECIAL(step))
+ {
+ goto cleanup;
+ }

David

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#4)
Re: Added prosupport function for estimating numeric generate_series rows

David Rowley <dgrowleyml@gmail.com> writes:

Also a few minor things:
* Missing space after "if"
+ if(arg3)
* We always have an empty line after variable declarations, that's missing in:
+ NumericVar q;
+ init_var(&q);

This sort of stuff is best addressed by running the code through
pgindent, rather than fixing it manually. Usually we don't insist
on submitters getting it right; the committer should pgindent it.

regards, tom lane

#6songjinzhou
tsinghualucky912@foxmail.com
In reply to: Tom Lane (#5)
1 attachment(s)
Re: Added prosupport function for estimating numeric generate_series rows

&gt; This sort of stuff is best addressed by running the code through
&gt; pgindent, rather than fixing it manually. Usually we don't insist
&gt; on submitters getting it right; the committer should pgindent it.

Hello, thank you and David Rowley for your comments.&nbsp;

I have used pgindent to adjust the code format and added comments and missing regression test cases. Here is the patch of version v3.&nbsp;

I look forward to your reply. Thank you very much!

Regards, Song Jinzhou

Attachments:

v3_0001-Added-prosupport-function-for-estimating-numeric-gen.patchapplication/octet-stream; charset=utf-8; name=v3_0001-Added-prosupport-function-for-estimating-numeric-gen.patchDownload
From d7c94eb6eb008ee962dc23ebe24f8ae7aa84895b Mon Sep 17 00:00:00 2001
From: TsinghuaLucky912 <2903807914@qq.com>
Date: Thu, 28 Nov 2024 08:51:39 -0800
Subject: [PATCH] Added prosupport function for estimating numeric
 generate_series rows

---
 src/backend/utils/adt/numeric.c              | 118 +++++++++++++++++++
 src/include/catalog/pg_proc.dat              |   9 +-
 src/test/regress/expected/misc_functions.out |  60 ++++++++++
 src/test/regress/sql/misc_functions.sql      |  37 ++++++
 4 files changed, 222 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 344d7137f9..5cec3d6bd1 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -32,6 +32,7 @@
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
+#include "optimizer/optimizer.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
 #include "utils/array.h"
@@ -1828,6 +1829,123 @@ generate_series_step_numeric(PG_FUNCTION_ARGS)
 }
 
 
+/*
+ * Planner support function for generate_series(numeric, numeric [, numeric])
+ */
+Datum
+generate_series_numeric_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestRows))
+	{
+		/* Try to estimate the number of rows returned */
+		SupportRequestRows *req = (SupportRequestRows *) rawreq;
+
+		if (is_funcclause(req->node))	/* be paranoid */
+		{
+			List	   *args = ((FuncExpr *) req->node)->args;
+			Node	   *arg1,
+					   *arg2,
+					   *arg3;
+
+			/* We can use estimated argument values here */
+			arg1 = estimate_expression_value(req->root, linitial(args));
+			arg2 = estimate_expression_value(req->root, lsecond(args));
+			if (list_length(args) >= 3)
+				arg3 = estimate_expression_value(req->root, lthird(args));
+			else
+				arg3 = NULL;
+
+			/*
+			 * If any argument is constant NULL, we can safely assume that
+			 * zero rows are returned.  Otherwise, if they're all non-NULL
+			 * constants, we can calculate the number of rows that will be
+			 * returned.  Use double arithmetic to avoid overflow hazards.
+			 */
+			if ((IsA(arg1, Const) &&
+				 ((Const *) arg1)->constisnull) ||
+				(IsA(arg2, Const) &&
+				 ((Const *) arg2)->constisnull) ||
+				(arg3 != NULL && IsA(arg3, Const) &&
+				 ((Const *) arg3)->constisnull))
+			{
+				req->rows = 0;
+				ret = (Node *) req;
+			}
+			else if (IsA(arg1, Const) &&
+					 IsA(arg2, Const) &&
+					 (arg3 == NULL || IsA(arg3, Const)))
+			{
+				Numeric		start,
+							finish,
+							step;
+				NumericVar	var_start,
+							var_finish,
+							var_diff;
+				NumericVar	nstep = const_one;
+
+				init_var(&var_start);
+				init_var(&var_finish);
+				init_var(&var_diff);
+
+				start = DatumGetNumeric(((Const *) arg1)->constvalue);
+				finish = DatumGetNumeric(((Const *) arg2)->constvalue);
+
+				if (NUMERIC_IS_SPECIAL(start) || NUMERIC_IS_SPECIAL(finish))
+				{
+					goto cleanup;
+				}
+
+				init_var_from_num(start, &var_start);
+				init_var_from_num(finish, &var_finish);
+
+				if (arg3)
+				{
+					step = DatumGetNumeric(((Const *) arg3)->constvalue);
+					if (NUMERIC_IS_SPECIAL(step))
+						goto cleanup;
+
+					init_var_from_num(step, &nstep);
+				}
+
+				sub_var(&var_finish, &var_start, &var_diff);
+
+				/* This equation works for either sign of step */
+				if (cmp_var(&nstep, &const_zero) != 0)
+				{
+					/* When the sign of the step size and the series range don't match, there are no rows in the series. */
+					if (nstep.sign != var_diff.sign)
+					{
+						req->rows = 0;
+						ret = (Node *) req;
+					}
+					else
+					{
+						NumericVar	q;
+
+						init_var(&q);
+						div_var(&var_diff, &nstep, &q, 0, false, false);
+
+						req->rows = numericvar_to_double_no_overflow(&q) + 1;
+						ret = (Node *) req;
+
+						free_var(&q);
+					}
+				}
+
+		cleanup:
+				free_var(&var_start);
+				free_var(&var_finish);
+				free_var(&var_diff);
+			}
+		}
+	}
+
+	PG_RETURN_POINTER(ret);
+}
+
 /*
  * Implements the numeric version of the width_bucket() function
  * defined by SQL2003. See also width_bucket_float8().
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..9575524007 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8464,13 +8464,18 @@
   proname => 'generate_series_int8_support', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'generate_series_int8_support' },
 { oid => '3259', descr => 'non-persistent series generator',
-  proname => 'generate_series', prorows => '1000', proretset => 't',
+  proname => 'generate_series', prorows => '1000',
+  prosupport => 'generate_series_numeric_support', proretset => 't',
   prorettype => 'numeric', proargtypes => 'numeric numeric numeric',
   prosrc => 'generate_series_step_numeric' },
 { oid => '3260', descr => 'non-persistent series generator',
-  proname => 'generate_series', prorows => '1000', proretset => 't',
+  proname => 'generate_series', prorows => '1000',
+  prosupport => 'generate_series_numeric_support', proretset => 't',
   prorettype => 'numeric', proargtypes => 'numeric numeric',
   prosrc => 'generate_series_numeric' },
+{ oid => '8405', descr => 'planner support for generate_series',
+  proname => 'generate_series_numeric_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'generate_series_numeric_support' },
 { oid => '938', descr => 'non-persistent series generator',
   proname => 'generate_series', prorows => '1000',
   prosupport => 'generate_series_timestamp_support', proretset => 't',
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 36b1201f9f..2fb8c771db 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -712,6 +712,66 @@ false, true, false, true);
 -- the support function.
 SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '0 day') g(s);
 ERROR:  step size cannot equal zero
+--
+-- Test the SupportRequestRows support function for generate_series_numeric()
+--
+-- Ensure the row estimate matches the actual rows
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, 2.0) g(s);$$,
+true, true, false, true);
+                                    explain_mask_costs                                    
+------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=13 width=N) (actual rows=13 loops=1)
+(1 row)
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0) g(s);$$,
+true, true, false, true);
+                                    explain_mask_costs                                    
+------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=25 width=N) (actual rows=25 loops=1)
+(1 row)
+
+-- Ensure the estimates match when step is decreasing
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, -1.0) g(s);$$,
+true, true, false, true);
+                                    explain_mask_costs                                    
+------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=25 width=N) (actual rows=25 loops=1)
+(1 row)
+
+-- Ensure an empty range estimates 1 row
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, 1.0) g(s);$$,
+true, true, false, true);
+                                   explain_mask_costs                                   
+----------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1 width=N) (actual rows=0 loops=1)
+(1 row)
+
+-- Ensure we get the default row estimate for infinity values
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series('-infinity'::NUMERIC, 'infinity'::NUMERIC, 1.0) g(s);$$,
+false, true, false, true);
+                        explain_mask_costs                         
+-------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
+(1 row)
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, '-infinity'::NUMERIC) g(s);$$,
+false, true, false, true);
+                        explain_mask_costs                         
+-------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
+(1 row)
+
+-- Ensure the row estimate behaves correctly when step size is zero.
+-- We expect generate_series_numeric() to throw the error rather than in
+-- the support function.
+SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);
+ERROR:  step size cannot equal zero
 -- Test functions for control data
 SELECT count(*) > 0 AS ok FROM pg_control_checkpoint();
  ok 
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index b7495d70eb..3e8171cc09 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -311,6 +311,43 @@ false, true, false, true);
 -- the support function.
 SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '0 day') g(s);
 
+--
+-- Test the SupportRequestRows support function for generate_series_numeric()
+--
+
+-- Ensure the row estimate matches the actual rows
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, 2.0) g(s);$$,
+true, true, false, true);
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0) g(s);$$,
+true, true, false, true);
+
+-- Ensure the estimates match when step is decreasing
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, -1.0) g(s);$$,
+true, true, false, true);
+
+-- Ensure an empty range estimates 1 row
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, 1.0) g(s);$$,
+true, true, false, true);
+
+-- Ensure we get the default row estimate for infinity values
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series('-infinity'::NUMERIC, 'infinity'::NUMERIC, 1.0) g(s);$$,
+false, true, false, true);
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, '-infinity'::NUMERIC) g(s);$$,
+false, true, false, true);
+
+-- Ensure the row estimate behaves correctly when step size is zero.
+-- We expect generate_series_numeric() to throw the error rather than in
+-- the support function.
+SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);
+
 -- Test functions for control data
 SELECT count(*) > 0 AS ok FROM pg_control_checkpoint();
 SELECT count(*) > 0 AS ok FROM pg_control_init();
-- 
2.31.1

#7David Rowley
dgrowleyml@gmail.com
In reply to: songjinzhou (#6)
Re: Added prosupport function for estimating numeric generate_series rows

On Fri, 29 Nov 2024 at 18:50, songjinzhou <tsinghualucky912@foxmail.com> wrote:

Hello, thank you and David Rowley for your comments.

I have used pgindent to adjust the code format and added comments and missing regression test cases. Here is the patch of version v3.

It looks fine to me. The only things I'd adjust are stylistic,
namely; 1) remove two tabs before the goto label, 2) remove redundant
braces around the goto cleanup, 3) rename the variable "q" to
something slightly more meaningful, maybe "res" or "rows".

I'll defer to Dean.

David

#8Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: David Rowley (#7)
Re: Added prosupport function for estimating numeric generate_series rows

On Fri, 29 Nov 2024, 12:01 David Rowley, <dgrowleyml@gmail.com> wrote:

On Fri, 29 Nov 2024 at 18:50, songjinzhou <tsinghualucky912@foxmail.com>
wrote:

Hello, thank you and David Rowley for your comments.

I have used pgindent to adjust the code format and added comments and

missing regression test cases. Here is the patch of version v3.

It looks fine to me. The only things I'd adjust are stylistic,
namely; 1) remove two tabs before the goto label, 2) remove redundant
braces around the goto cleanup, 3) rename the variable "q" to
something slightly more meaningful, maybe "res" or "rows".

I'll defer to Dean.

There are a couple more things that I think need tidying up. I'll post an
update when I get back to my computer.

Regards,
Dean

#9Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Dean Rasheed (#8)
1 attachment(s)
Re: Added prosupport function for estimating numeric generate_series rows

On Fri, 29 Nov 2024 at 13:10, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

There are a couple more things that I think need tidying up. I'll post an update when I get back to my computer.

Here's an update with some cosmetic tidying up, plus a couple of
not-so-cosmetic changes:

The new #include wasn't in the right place alphabetically (the same is
true for the recent timestamp equivalent function).

It's not necessary to call init_var() for a variable that you're going
to initialise with init_var_from_num(), and it's then not necessary to
call free_var() for that variable.

It's not necessary to have separate NumericVars for the difference and
quotient -- the same variable can be reused.

Doing both those things means that there's only one variable to free
after the computation, and it can be kept local to if-step-not-zero
code block, so there's no need for the "goto cleanup" stuff.

It seems worth avoiding div_var() in the 2-argument case, when step = 1.

Regards,
Dean

Attachments:

v4-0001-Add-a-planner-support-function-for-numeric-genera.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Add-a-planner-support-function-for-numeric-genera.patchDownload
From b2de4cdc4f0c2752a734ce9dd2a55d92644ba92e Mon Sep 17 00:00:00 2001
From: Dean Rasheed <dean.a.rasheed@gmail.com>
Date: Fri, 29 Nov 2024 19:49:28 +0000
Subject: [PATCH v4] Add a planner support function for numeric
 generate_series().

This allows the planner to estimate the number of rows returned by
generate_series(numeric, numeric[, numeric]), when the input values
can be estimated at plan time.

Song Jinzhou, reviewed by Dean Rasheed and David Rowley.

Discussion: https://postgr.es/m/tencent_F43E7F4DD50EF5986D1051DE8DE547910206%40qq.com
Discussion: https://postgr.es/m/tencent_1F6D5B9A1545E02FD7D0EE508DFD056DE50A%40qq.com
---
 src/backend/utils/adt/numeric.c              | 121 +++++++++++++++++++
 src/include/catalog/pg_proc.dat              |   9 +-
 src/test/regress/expected/misc_functions.out |  65 ++++++++++
 src/test/regress/sql/misc_functions.sql      |  38 ++++++
 4 files changed, 231 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 344d7137f9..5c33cd3ab0 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -34,6 +34,7 @@
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
+#include "optimizer/optimizer.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/float.h"
@@ -1827,6 +1828,126 @@ generate_series_step_numeric(PG_FUNCTION_ARGS)
 		SRF_RETURN_DONE(funcctx);
 }
 
+/*
+ * Planner support function for generate_series(numeric, numeric [, numeric])
+ */
+Datum
+generate_series_numeric_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestRows))
+	{
+		/* Try to estimate the number of rows returned */
+		SupportRequestRows *req = (SupportRequestRows *) rawreq;
+
+		if (is_funcclause(req->node))	/* be paranoid */
+		{
+			List	   *args = ((FuncExpr *) req->node)->args;
+			Node	   *arg1,
+					   *arg2,
+					   *arg3;
+
+			/* We can use estimated argument values here */
+			arg1 = estimate_expression_value(req->root, linitial(args));
+			arg2 = estimate_expression_value(req->root, lsecond(args));
+			if (list_length(args) >= 3)
+				arg3 = estimate_expression_value(req->root, lthird(args));
+			else
+				arg3 = NULL;
+
+			/*
+			 * If any argument is constant NULL, we can safely assume that
+			 * zero rows are returned.  Otherwise, if they're all non-NULL
+			 * constants, we can calculate the number of rows that will be
+			 * returned.
+			 */
+			if ((IsA(arg1, Const) &&
+				 ((Const *) arg1)->constisnull) ||
+				(IsA(arg2, Const) &&
+				 ((Const *) arg2)->constisnull) ||
+				(arg3 != NULL && IsA(arg3, Const) &&
+				 ((Const *) arg3)->constisnull))
+			{
+				req->rows = 0;
+				ret = (Node *) req;
+			}
+			else if (IsA(arg1, Const) &&
+					 IsA(arg2, Const) &&
+					 (arg3 == NULL || IsA(arg3, Const)))
+			{
+				Numeric		start_num;
+				Numeric		stop_num;
+				NumericVar	step = const_one;
+
+				/*
+				 * If any argument is NaN or infinity, generate_series() will
+				 * error out, so we needn't produce an estimate.
+				 */
+				start_num = DatumGetNumeric(((Const *) arg1)->constvalue);
+				stop_num = DatumGetNumeric(((Const *) arg2)->constvalue);
+
+				if (NUMERIC_IS_SPECIAL(start_num) ||
+					NUMERIC_IS_SPECIAL(stop_num))
+					PG_RETURN_POINTER(NULL);
+
+				if (arg3)
+				{
+					Numeric		step_num;
+
+					step_num = DatumGetNumeric(((Const *) arg3)->constvalue);
+
+					if (NUMERIC_IS_SPECIAL(step_num))
+						PG_RETURN_POINTER(NULL);
+
+					init_var_from_num(step_num, &step);
+				}
+
+				/*
+				 * The number of rows that will be returned is given by
+				 * floor((stop - start) / step) + 1, if the sign of step
+				 * matches the sign of stop - start.  Otherwise, no rows will
+				 * be returned.
+				 */
+				if (cmp_var(&step, &const_zero) != 0)
+				{
+					NumericVar	start;
+					NumericVar	stop;
+					NumericVar	res;
+
+					init_var_from_num(start_num, &start);
+					init_var_from_num(stop_num, &stop);
+
+					init_var(&res);
+					sub_var(&stop, &start, &res);
+
+					if (step.sign != res.sign)
+					{
+						/* no rows will be returned */
+						req->rows = 0;
+						ret = (Node *) req;
+					}
+					else
+					{
+						if (arg3)
+							div_var(&res, &step, &res, 0, false, false);
+						else
+							trunc_var(&res, 0); /* step = 1 */
+
+						req->rows = numericvar_to_double_no_overflow(&res) + 1;
+						ret = (Node *) req;
+					}
+
+					free_var(&res);
+				}
+			}
+		}
+	}
+
+	PG_RETURN_POINTER(ret);
+}
+
 
 /*
  * Implements the numeric version of the width_bucket() function
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..9575524007 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8464,13 +8464,18 @@
   proname => 'generate_series_int8_support', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'generate_series_int8_support' },
 { oid => '3259', descr => 'non-persistent series generator',
-  proname => 'generate_series', prorows => '1000', proretset => 't',
+  proname => 'generate_series', prorows => '1000',
+  prosupport => 'generate_series_numeric_support', proretset => 't',
   prorettype => 'numeric', proargtypes => 'numeric numeric numeric',
   prosrc => 'generate_series_step_numeric' },
 { oid => '3260', descr => 'non-persistent series generator',
-  proname => 'generate_series', prorows => '1000', proretset => 't',
+  proname => 'generate_series', prorows => '1000',
+  prosupport => 'generate_series_numeric_support', proretset => 't',
   prorettype => 'numeric', proargtypes => 'numeric numeric',
   prosrc => 'generate_series_numeric' },
+{ oid => '8405', descr => 'planner support for generate_series',
+  proname => 'generate_series_numeric_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'generate_series_numeric_support' },
 { oid => '938', descr => 'non-persistent series generator',
   proname => 'generate_series', prorows => '1000',
   prosupport => 'generate_series_timestamp_support', proretset => 't',
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 36b1201f9f..8c73d2b621 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -712,6 +712,71 @@ false, true, false, true);
 -- the support function.
 SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '0 day') g(s);
 ERROR:  step size cannot equal zero
+--
+-- Test the SupportRequestRows support function for generate_series_numeric()
+--
+-- Ensure the row estimate matches the actual rows
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0) g(s);$$,
+true, true, false, true);
+                                    explain_mask_costs                                    
+------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=25 width=N) (actual rows=25 loops=1)
+(1 row)
+
+-- As above but with non-default step
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, 2.0) g(s);$$,
+true, true, false, true);
+                                    explain_mask_costs                                    
+------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=13 width=N) (actual rows=13 loops=1)
+(1 row)
+
+-- Ensure the estimates match when step is decreasing
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, -1.0) g(s);$$,
+true, true, false, true);
+                                    explain_mask_costs                                    
+------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=25 width=N) (actual rows=25 loops=1)
+(1 row)
+
+-- Ensure an empty range estimates 1 row
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, 1.0) g(s);$$,
+true, true, false, true);
+                                   explain_mask_costs                                   
+----------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1 width=N) (actual rows=0 loops=1)
+(1 row)
+
+-- Ensure we get the default row estimate for error cases (infinity/NaN values
+-- and zero step size)
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series('-infinity'::NUMERIC, 'infinity'::NUMERIC, 1.0) g(s);$$,
+false, true, false, true);
+                        explain_mask_costs                         
+-------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
+(1 row)
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, 'NaN'::NUMERIC) g(s);$$,
+false, true, false, true);
+                        explain_mask_costs                         
+-------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
+(1 row)
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);$$,
+false, true, false, true);
+                        explain_mask_costs                         
+-------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
+(1 row)
+
 -- Test functions for control data
 SELECT count(*) > 0 AS ok FROM pg_control_checkpoint();
  ok 
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index b7495d70eb..0a0c6e5a6b 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -311,6 +311,44 @@ false, true, false, true);
 -- the support function.
 SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '0 day') g(s);
 
+--
+-- Test the SupportRequestRows support function for generate_series_numeric()
+--
+
+-- Ensure the row estimate matches the actual rows
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0) g(s);$$,
+true, true, false, true);
+
+-- As above but with non-default step
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, 2.0) g(s);$$,
+true, true, false, true);
+
+-- Ensure the estimates match when step is decreasing
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, -1.0) g(s);$$,
+true, true, false, true);
+
+-- Ensure an empty range estimates 1 row
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, 1.0) g(s);$$,
+true, true, false, true);
+
+-- Ensure we get the default row estimate for error cases (infinity/NaN values
+-- and zero step size)
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series('-infinity'::NUMERIC, 'infinity'::NUMERIC, 1.0) g(s);$$,
+false, true, false, true);
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, 'NaN'::NUMERIC) g(s);$$,
+false, true, false, true);
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);$$,
+false, true, false, true);
+
 -- Test functions for control data
 SELECT count(*) > 0 AS ok FROM pg_control_checkpoint();
 SELECT count(*) > 0 AS ok FROM pg_control_init();
-- 
2.43.0

#10tsinghualucky912@foxmail.com
tsinghualucky912@foxmail.com
In reply to: 孤傲小二~阿沐 (#3)
Re: Re: Added prosupport function for estimating numeric generate_series rows

On 2024-11-30, Dean Rasheed wrote:
On Fri, 29 Nov 2024 at 13:10, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

There are a couple more things that I think need tidying up. I'll post an update when I get back to my computer.

Here's an update with some cosmetic tidying up, plus a couple of
not-so-cosmetic changes:

The new #include wasn't in the right place alphabetically (the same is
true for the recent timestamp equivalent function).

It's not necessary to call init_var() for a variable that you're going
to initialise with init_var_from_num(), and it's then not necessary to
call free_var() for that variable.

It's not necessary to have separate NumericVars for the difference and
quotient -- the same variable can be reused.

Doing both those things means that there's only one variable to free
after the computation, and it can be kept local to if-step-not-zero
code block, so there's no need for the "goto cleanup" stuff.

It seems worth avoiding div_var() in the 2-argument case, when step = 1.

Regards,
Dean

Dear Dean Rasheed, I have reviewed the v4 patch and it is very thoughtful and reasonable, with a very clever attention to detail (plus I am very happy that we can get rid of the goto, which I was not a big fan of).

This patch looks very good and I have no complaints about it. Thanks again for your help from beginning to end!

Regards, Song Jinzhou

#11Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: tsinghualucky912@foxmail.com (#10)
Re: Re: Added prosupport function for estimating numeric generate_series rows

On Sat, 30 Nov 2024 at 00:38, tsinghualucky912@foxmail.com
<tsinghualucky912@foxmail.com> wrote:

Dear Dean Rasheed, I have reviewed the v4 patch and it is very thoughtful and reasonable, with a very clever attention to detail (plus I am very happy that we can get rid of the goto, which I was not a big fan of).

This patch looks very good and I have no complaints about it. Thanks again for your help from beginning to end!

Cool. Patch committed.

Regards,
Dean