A little cosmetic to convert_VALUES_to_ANY()
Hi,
While debugging the pull-up sublink codes, I noticed the
convert_VALUES_to_ANY().
The function is to convert "where colX in VALUES(xxx)" into SAOP. It
firstly scans the values_list to
make sure no volatile function is in this list, then it scans this
values_list again to check that it
only includes Const items.
We can merge the two scans into one. This can reduce the time complexity
from 2O(n) to O(n).
Also, add a brace for better/more consistent style in the attached patch.
--
Thanks,
Tender Wang
Attachments:
0001-A-little-cosmetic-to-convert_VALUES_to_ANY.patchtext/plain; charset=US-ASCII; name=0001-A-little-cosmetic-to-convert_VALUES_to_ANY.patchDownload
From 0eeb7487ba5bdacd260987514cd0a77eb8c88eaa Mon Sep 17 00:00:00 2001
From: Tender Wang <tndrwang@gmail.com>
Date: Sun, 27 Jul 2025 15:26:20 +0800
Subject: [PATCH] A little cosmetic to convert_VALUES_to_ANY().
In original logic, we have to do two passes iteration over the
values_lists. One is to check if have volatile functions, the other
one is to check if have no Const. We can merge above two iterations into
one. Also add a brace for better/more consistent style.
---
src/backend/optimizer/plan/subselect.c | 9 ++++++---
src/backend/optimizer/prep/prepjointree.c | 4 ++--
2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index d71ed958e31..7a275f430f2 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1255,11 +1255,10 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values)
/*
* Also, check that only RTE corresponds to VALUES; the list of values has
- * at least two items and no volatile functions.
+ * at least two items.
*/
if (rte->rtekind != RTE_VALUES ||
- list_length(rte->values_lists) < 2 ||
- contain_volatile_functions((Node *) rte->values_lists))
+ list_length(rte->values_lists) < 2)
return NULL;
foreach(lc, rte->values_lists)
@@ -1267,6 +1266,10 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values)
List *elem = lfirst(lc);
Node *value = linitial(elem);
+ /* Must have no volatile functions */
+ if (contain_volatile_functions(value))
+ return NULL;
+
/*
* Prepare an evaluation of the right side of the operator with
* substitution of the given value.
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 35e8d3c183b..d9908d16d79 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -848,13 +848,13 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node,
if ((saop = convert_VALUES_to_ANY(root,
sublink->testexpr,
(Query *) sublink->subselect)) != NULL)
-
+ {
/*
* The VALUES sequence was simplified. Nothing more to do
* here.
*/
return (Node *) saop;
-
+ }
if ((j = convert_ANY_sublink_to_join(root, sublink,
available_rels1)) != NULL)
{
--
2.34.1
Tender Wang <tndrwang@gmail.com> 于2025年7月27日周日 15:53写道:
Hi,
While debugging the pull-up sublink codes, I noticed the
convert_VALUES_to_ANY().
The function is to convert "where colX in VALUES(xxx)" into SAOP. It
firstly scans the values_list to
make sure no volatile function is in this list, then it scans this
values_list again to check that it
only includes Const items.We can merge the two scans into one. This can reduce the time complexity
from 2O(n) to O(n).Also, add a brace for better/more consistent style in the attached patch.
I have added this to the commitfest in [1]https://commitfest.postgresql.org/patch/5937/. I hope someone can help me
review it.
[1]: https://commitfest.postgresql.org/patch/5937/
--
Thanks,
Tender Wang
I tested this patch locally and it works for me.
However I doubt if it really improve performance. The original code "contain_volatile_functions((Node *) rte->values_lists)" recursively work through rte-values_list, and this patch move contain_volatile_functions() into the for loop, and checks against every item of rte->values_list. As entries of values_list could just be Const, contain_volatile_functions() will then return immediately, so that this changes reduces total calls to contain_volatile_functions(). From this perspective, this patch may improve the performance.
On the other side, let's say a case where a values_list contains 10 items, and the last item is volatile. With the original code, it won't enter the for loop, nothing will be built; with this patch, operations have been performed on the first 9 items, but eventually it returns NULL when hit the last item. So in this case, performance could be worse.
Overall, I am afraid the burden could beat the gain.
The new status of this patch is: Waiting on Author
Chao Li <li.evan.chao@gmail.com> 于2025年8月4日周一 11:11写道:
I tested this patch locally and it works for me.
However I doubt if it really improve performance. The original code
"contain_volatile_functions((Node *) rte->values_lists)" recursively work
through rte-values_list, and this patch move contain_volatile_functions()
into the for loop, and checks against every item of rte->values_list. As
entries of values_list could just be Const, contain_volatile_functions()
will then return immediately, so that this changes reduces total calls to
contain_volatile_functions(). From this perspective, this patch may improve
the performance.On the other side, let's say a case where a values_list contains 10 items,
and the last item is volatile. With the original code, it won't enter the
for loop, nothing will be built; with this patch, operations have been
performed on the first 9 items, but eventually it returns NULL when hit the
last item. So in this case, performance could be worse.
Yes, in this case, the original code scans the values_list, finds the
volatile functions, and returns from convert_VALUES_to_ANY().
My patch will perform unnecessary convert_testexpr
and eval_const_expressions work.
Overall, I am afraid the burden could beat the gain.
It's not easy to evaluate performance gain.
I'm not sure it's worth doing the change. If someone doesn't like the
patch. I will withdraw it.
The new status of this patch is: Waiting on Author
Thanks for your review.
Actually, the purpose of convert_VALUES_to_ANY() is only to convert the
sublink that includes the Values expression to SAOP.
I have a question: Can we move the convert_VALUES_to_ANY() to another
place? For example:preprocess_qual_conditions().
In pull_up_sublinks_jointree_recurse(), what we want to do is to pull up
the sublinks.
--
Thanks,
Tender Wang
On Sun, Jul 27, 2025 at 12:53 AM Tender Wang <tndrwang@gmail.com> wrote:
We can merge the two scans into one. This can reduce the time complexity
from 2O(n) to O(n).
This seems like an unusual usage of big-O notation. Always saw the
meaningful pieces inside the O, and in general I thought O(n) =~ O(xn) for
all x, but especially if x is small.
The point made by Chao Li, that by using a loop guard we avoid performing
expression evaluation, seems like a reasonable and sufficient non-data
supported reason for having the code structured this way. I'd at least
want a more data-driven reason to try and change it in such a simple way
against the read/write boundary that presently exists.
I was pondering whether contain_volatile_functions could/should be taught
to also detect whether every expression it sees resolves to a constant...
contain_volatile_functions((Node *) rte->values_lists, &has_non_constants)
But got a little ways down the call stack and hit the "walker" macro and
decided to stop...
Also, add a brace for better/more consistent style in the attached patch.
Should try and avoid oh-by-the-way fixes like this in behavior patches.
This specific one also doesn't seem warranted. Or, at least, the original
style is indeed de facto acceptable; if-blocks containing a single
executing statement do not get braces but rely on indentation alone. I
don't think that every other if-block in the region being multi-statement
supports an exception.
David J.
David G. Johnston <david.g.johnston@gmail.com> 于2025年8月4日周一 12:33写道:
On Sun, Jul 27, 2025 at 12:53 AM Tender Wang <tndrwang@gmail.com> wrote:
We can merge the two scans into one. This can reduce the time complexity
from 2O(n) to O(n).This seems like an unusual usage of big-O notation. Always saw the
meaningful pieces inside the O, and in general I thought O(n) =~ O(xn) for
all x, but especially if x is small.The point made by Chao Li, that by using a loop guard we avoid performing
expression evaluation, seems like a reasonable and sufficient non-data
supported reason for having the code structured this way. I'd at least
want a more data-driven reason to try and change it in such a simple way
against the read/write boundary that presently exists.
Agree, I will withdraw the patch.
I was pondering whether contain_volatile_functions could/should be taught
to also detect whether every expression it sees resolves to a constant...
contain_volatile_functions((Node *) rte->values_lists, &has_non_constants)But got a little ways down the call stack and hit the "walker" macro and
decided to stop...Also, add a brace for better/more consistent style in the attached patch.
Should try and avoid oh-by-the-way fixes like this in behavior patches.
This specific one also doesn't seem warranted. Or, at least, the original
style is indeed de facto acceptable; if-blocks containing a single
executing statement do not get braces but rely on indentation alone. I
don't think that every other if-block in the region being multi-statement
supports an exception.
About this, you can check this commit
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=aadf7db66ef5a8a723eb3362e2c8b460738f1107
.
Add brace for better/more consistent style.
--
Thanks,
Tender Wang
On Sun, Aug 3, 2025 at 9:40 PM Tender Wang <tndrwang@gmail.com> wrote:
Also, add a brace for better/more consistent style in the attached patch.
Should try and avoid oh-by-the-way fixes like this in behavior patches.
This specific one also doesn't seem warranted. Or, at least, the original
style is indeed de facto acceptable; if-blocks containing a single
executing statement do not get braces but rely on indentation alone. I
don't think that every other if-block in the region being multi-statement
supports an exception.About this, you can check this commit
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=aadf7db66ef5a8a723eb3362e2c8b460738f1107
.
Add brace for better/more consistent style.
OK...I only agree with one of those changes and the overall patch is
basically a code style patch and a single function affected - neither of
which applied here. I'd probably have done the else-if block too; but
there is nothing wrong or definitely better about the initial guard being
braced and I'd argue the brace-less version is subjectively superior. Not
enough to go back and change again but still the original choice was better
IMO.
David J.
On Sun, 27 Jul 2025 at 19:53, Tender Wang <tndrwang@gmail.com> wrote:
While debugging the pull-up sublink codes, I noticed the convert_VALUES_to_ANY().
The function is to convert "where colX in VALUES(xxx)" into SAOP. It firstly scans the values_list to
make sure no volatile function is in this list, then it scans this values_list again to check that it
only includes Const items.We can merge the two scans into one. This can reduce the time complexity from 2O(n) to O(n).
I imagine that >95% of the time, probably more, the VALUES list is
going to contain only Consts, and if anything is done here, then it
likely should be to skip the convert_testexpr() /
eval_const_expressions() rigmarole when the input is already Const.
The proposal to move the contain_volatile_functions() doesn't seem
like a good idea to me as technically putting it where you propose
makes it mostly redundant (eval_const_expressions() will never
evaluate a volatile function and return a Const). I suspect it's there
because it's a slightly cheaper pre-check, and if you move it to
inside the loop then you may end up doing a bunch of work in
eval_const_expressions(), such as function inlining, which will just
be thrown away if you discover that the next item in the VALUES list
has a volatile.
Might be worth checking the planning overheads of large VALUES list
and seeing if it's worth putting the call to convert_testexpr() and
eval_const_expressions() inside an if (!IsA(value, Const)).
David