A little cosmetic to convert_VALUES_to_ANY()

Started by Tender Wang6 months ago8 messages
#1Tender Wang
tndrwang@gmail.com
1 attachment(s)

Hi,

While debugging the pull-up sublink codes, I noticed the
convert_VALUES_to_ANY().
The function is to convert "where colX in VALUES(xxx)" into SAOP. It
firstly scans the values_list to
make sure no volatile function is in this list, then it scans this
values_list again to check that it
only includes Const items.

We can merge the two scans into one. This can reduce the time complexity
from 2O(n) to O(n).

Also, add a brace for better/more consistent style in the attached patch.

--
Thanks,
Tender Wang

Attachments:

0001-A-little-cosmetic-to-convert_VALUES_to_ANY.patchtext/plain; charset=US-ASCII; name=0001-A-little-cosmetic-to-convert_VALUES_to_ANY.patchDownload
From 0eeb7487ba5bdacd260987514cd0a77eb8c88eaa Mon Sep 17 00:00:00 2001
From: Tender Wang <tndrwang@gmail.com>
Date: Sun, 27 Jul 2025 15:26:20 +0800
Subject: [PATCH] A little cosmetic to convert_VALUES_to_ANY().

In original logic, we have to do two passes iteration over the
values_lists. One is to check if have volatile functions, the other
one is to check if have no Const. We can merge above two iterations into
one. Also add a brace for better/more consistent style.
---
 src/backend/optimizer/plan/subselect.c    | 9 ++++++---
 src/backend/optimizer/prep/prepjointree.c | 4 ++--
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index d71ed958e31..7a275f430f2 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1255,11 +1255,10 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values)
 
 	/*
 	 * Also, check that only RTE corresponds to VALUES; the list of values has
-	 * at least two items and no volatile functions.
+	 * at least two items.
 	 */
 	if (rte->rtekind != RTE_VALUES ||
-		list_length(rte->values_lists) < 2 ||
-		contain_volatile_functions((Node *) rte->values_lists))
+		list_length(rte->values_lists) < 2)
 		return NULL;
 
 	foreach(lc, rte->values_lists)
@@ -1267,6 +1266,10 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values)
 		List	   *elem = lfirst(lc);
 		Node	   *value = linitial(elem);
 
+		/* Must have no volatile functions */
+		if (contain_volatile_functions(value))
+			return NULL;
+
 		/*
 		 * Prepare an evaluation of the right side of the operator with
 		 * substitution of the given value.
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 35e8d3c183b..d9908d16d79 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -848,13 +848,13 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node,
 			if ((saop = convert_VALUES_to_ANY(root,
 											  sublink->testexpr,
 											  (Query *) sublink->subselect)) != NULL)
-
+			{
 				/*
 				 * The VALUES sequence was simplified.  Nothing more to do
 				 * here.
 				 */
 				return (Node *) saop;
-
+			}
 			if ((j = convert_ANY_sublink_to_join(root, sublink,
 												 available_rels1)) != NULL)
 			{
-- 
2.34.1

#2Tender Wang
tndrwang@gmail.com
In reply to: Tender Wang (#1)
Re: A little cosmetic to convert_VALUES_to_ANY()

Tender Wang <tndrwang@gmail.com> 于2025年7月27日周日 15:53写道:

Hi,

While debugging the pull-up sublink codes, I noticed the
convert_VALUES_to_ANY().
The function is to convert "where colX in VALUES(xxx)" into SAOP. It
firstly scans the values_list to
make sure no volatile function is in this list, then it scans this
values_list again to check that it
only includes Const items.

We can merge the two scans into one. This can reduce the time complexity
from 2O(n) to O(n).

Also, add a brace for better/more consistent style in the attached patch.

I have added this to the commitfest in [1]https://commitfest.postgresql.org/patch/5937/. I hope someone can help me
review it.

[1]: https://commitfest.postgresql.org/patch/5937/

--
Thanks,
Tender Wang

#3Chao Li
li.evan.chao@gmail.com
In reply to: Tender Wang (#2)
Re: A little cosmetic to convert_VALUES_to_ANY()

I tested this patch locally and it works for me.

However I doubt if it really improve performance. The original code "contain_volatile_functions((Node *) rte->values_lists)" recursively work through rte-values_list, and this patch move contain_volatile_functions() into the for loop, and checks against every item of rte->values_list. As entries of values_list could just be Const, contain_volatile_functions() will then return immediately, so that this changes reduces total calls to contain_volatile_functions(). From this perspective, this patch may improve the performance.

On the other side, let's say a case where a values_list contains 10 items, and the last item is volatile. With the original code, it won't enter the for loop, nothing will be built; with this patch, operations have been performed on the first 9 items, but eventually it returns NULL when hit the last item. So in this case, performance could be worse.

Overall, I am afraid the burden could beat the gain.

The new status of this patch is: Waiting on Author

#4Tender Wang
tndrwang@gmail.com
In reply to: Chao Li (#3)
Re: A little cosmetic to convert_VALUES_to_ANY()

Chao Li <li.evan.chao@gmail.com> 于2025年8月4日周一 11:11写道:

I tested this patch locally and it works for me.

However I doubt if it really improve performance. The original code
"contain_volatile_functions((Node *) rte->values_lists)" recursively work
through rte-values_list, and this patch move contain_volatile_functions()
into the for loop, and checks against every item of rte->values_list. As
entries of values_list could just be Const, contain_volatile_functions()
will then return immediately, so that this changes reduces total calls to
contain_volatile_functions(). From this perspective, this patch may improve
the performance.

On the other side, let's say a case where a values_list contains 10 items,
and the last item is volatile. With the original code, it won't enter the
for loop, nothing will be built; with this patch, operations have been
performed on the first 9 items, but eventually it returns NULL when hit the
last item. So in this case, performance could be worse.

Yes, in this case, the original code scans the values_list, finds the
volatile functions, and returns from convert_VALUES_to_ANY().
My patch will perform unnecessary convert_testexpr
and eval_const_expressions work.

Overall, I am afraid the burden could beat the gain.

It's not easy to evaluate performance gain.
I'm not sure it's worth doing the change. If someone doesn't like the
patch. I will withdraw it.

The new status of this patch is: Waiting on Author

Thanks for your review.

Actually, the purpose of convert_VALUES_to_ANY() is only to convert the
sublink that includes the Values expression to SAOP.
I have a question: Can we move the convert_VALUES_to_ANY() to another
place? For example:preprocess_qual_conditions().
In pull_up_sublinks_jointree_recurse(), what we want to do is to pull up
the sublinks.

--
Thanks,
Tender Wang

#5David G. Johnston
david.g.johnston@gmail.com
In reply to: Tender Wang (#1)
Re: A little cosmetic to convert_VALUES_to_ANY()

On Sun, Jul 27, 2025 at 12:53 AM Tender Wang <tndrwang@gmail.com> wrote:

We can merge the two scans into one. This can reduce the time complexity
from 2O(n) to O(n).

This seems like an unusual usage of big-O notation. Always saw the
meaningful pieces inside the O, and in general I thought O(n) =~ O(xn) for
all x, but especially if x is small.

The point made by Chao Li, that by using a loop guard we avoid performing
expression evaluation, seems like a reasonable and sufficient non-data
supported reason for having the code structured this way. I'd at least
want a more data-driven reason to try and change it in such a simple way
against the read/write boundary that presently exists.

I was pondering whether contain_volatile_functions could/should be taught
to also detect whether every expression it sees resolves to a constant...
contain_volatile_functions((Node *) rte->values_lists, &has_non_constants)

But got a little ways down the call stack and hit the "walker" macro and
decided to stop...

Also, add a brace for better/more consistent style in the attached patch.

Should try and avoid oh-by-the-way fixes like this in behavior patches.

This specific one also doesn't seem warranted. Or, at least, the original
style is indeed de facto acceptable; if-blocks containing a single
executing statement do not get braces but rely on indentation alone. I
don't think that every other if-block in the region being multi-statement
supports an exception.

David J.

#6Tender Wang
tndrwang@gmail.com
In reply to: David G. Johnston (#5)
Re: A little cosmetic to convert_VALUES_to_ANY()

David G. Johnston <david.g.johnston@gmail.com> 于2025年8月4日周一 12:33写道:

On Sun, Jul 27, 2025 at 12:53 AM Tender Wang <tndrwang@gmail.com> wrote:

We can merge the two scans into one. This can reduce the time complexity
from 2O(n) to O(n).

This seems like an unusual usage of big-O notation. Always saw the
meaningful pieces inside the O, and in general I thought O(n) =~ O(xn) for
all x, but especially if x is small.

The point made by Chao Li, that by using a loop guard we avoid performing
expression evaluation, seems like a reasonable and sufficient non-data
supported reason for having the code structured this way. I'd at least
want a more data-driven reason to try and change it in such a simple way
against the read/write boundary that presently exists.

Agree, I will withdraw the patch.

I was pondering whether contain_volatile_functions could/should be taught
to also detect whether every expression it sees resolves to a constant...
contain_volatile_functions((Node *) rte->values_lists, &has_non_constants)

But got a little ways down the call stack and hit the "walker" macro and
decided to stop...

Also, add a brace for better/more consistent style in the attached patch.

Should try and avoid oh-by-the-way fixes like this in behavior patches.

This specific one also doesn't seem warranted. Or, at least, the original
style is indeed de facto acceptable; if-blocks containing a single
executing statement do not get braces but rely on indentation alone. I
don't think that every other if-block in the region being multi-statement
supports an exception.

About this, you can check this commit
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=aadf7db66ef5a8a723eb3362e2c8b460738f1107
.
Add brace for better/more consistent style.

--
Thanks,
Tender Wang

#7David G. Johnston
david.g.johnston@gmail.com
In reply to: Tender Wang (#6)
Re: A little cosmetic to convert_VALUES_to_ANY()

On Sun, Aug 3, 2025 at 9:40 PM Tender Wang <tndrwang@gmail.com> wrote:

Also, add a brace for better/more consistent style in the attached patch.

Should try and avoid oh-by-the-way fixes like this in behavior patches.

This specific one also doesn't seem warranted. Or, at least, the original
style is indeed de facto acceptable; if-blocks containing a single
executing statement do not get braces but rely on indentation alone. I
don't think that every other if-block in the region being multi-statement
supports an exception.

About this, you can check this commit
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=aadf7db66ef5a8a723eb3362e2c8b460738f1107
.
Add brace for better/more consistent style.

OK...I only agree with one of those changes and the overall patch is
basically a code style patch and a single function affected - neither of
which applied here. I'd probably have done the else-if block too; but
there is nothing wrong or definitely better about the initial guard being
braced and I'd argue the brace-less version is subjectively superior. Not
enough to go back and change again but still the original choice was better
IMO.

David J.

#8David Rowley
dgrowleyml@gmail.com
In reply to: Tender Wang (#1)
Re: A little cosmetic to convert_VALUES_to_ANY()

On Sun, 27 Jul 2025 at 19:53, Tender Wang <tndrwang@gmail.com> wrote:

While debugging the pull-up sublink codes, I noticed the convert_VALUES_to_ANY().
The function is to convert "where colX in VALUES(xxx)" into SAOP. It firstly scans the values_list to
make sure no volatile function is in this list, then it scans this values_list again to check that it
only includes Const items.

We can merge the two scans into one. This can reduce the time complexity from 2O(n) to O(n).

I imagine that >95% of the time, probably more, the VALUES list is
going to contain only Consts, and if anything is done here, then it
likely should be to skip the convert_testexpr() /
eval_const_expressions() rigmarole when the input is already Const.

The proposal to move the contain_volatile_functions() doesn't seem
like a good idea to me as technically putting it where you propose
makes it mostly redundant (eval_const_expressions() will never
evaluate a volatile function and return a Const). I suspect it's there
because it's a slightly cheaper pre-check, and if you move it to
inside the loop then you may end up doing a bunch of work in
eval_const_expressions(), such as function inlining, which will just
be thrown away if you discover that the next item in the VALUES list
has a volatile.

Might be worth checking the planning overheads of large VALUES list
and seeing if it's worth putting the call to convert_testexpr() and
eval_const_expressions() inside an if (!IsA(value, Const)).

David