BUG #18657: Using JSON_OBJECTAGG with volatile function leads to segfault
The following bug has been logged on the website:
Bug reference: 18657
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 17.0
Operating system: Ubuntu 22.04
Description:
The following query:
SELECT JSON_OBJECT('a': JSON_OBJECTAGG('b': random() RETURNING text) FORMAT
JSON);
triggers a server crash with the following stack trace:
Core was generated by `postgres: law regression [local] SELECT
'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000055b54914ee1b in ExecBuildAggTrans (...) at execExpr.c:3514
3514 expr_setup_walker((Node *)
pertrans->aggref->aggdirectargs,
(gdb) bt
#0 0x000055b54914ee1b in ExecBuildAggTrans (...) at execExpr.c:3514
#1 0x000055b549180ee9 in ExecInitAgg (...) at nodeAgg.c:4017
#2 0x000055b54916eb13 in ExecInitNode (...) at execProcnode.c:341
#3 0x000055b549162d3f in InitPlan (...) at execMain.c:968
#4 0x000055b549162732 in standard_ExecutorStart (...) at execMain.c:263
#5 0x000055b54916245a in ExecutorStart (...) at execMain.c:139
#6 0x000055b549411063 in PortalStart (...) at pquery.c:517
#7 0x000055b54940d172 in exec_simple_query (
query_string=0x55b54b3f4430 "SELECT JSON_OBJECT('a': JSON_OBJECTAGG('b':
random() RETURNING text) FORMAT JSON);") at postgres.c:1239
(gdb) p pertrans->aggref
$1 = (Aggref *) 0x0
First bad commit for this anomaly is b6e1157e7.
It seems that reverting the code at src/backend/executor/execExpr.c case
T_JsonValueExpr: fix the problem
Revert patch:
diff --git a/src/backend/executor/execExpr.c
b/src/backend/executor/execExpr.c
index a5395536a1..4143ff1730 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -2317,8 +2317,21 @@ ExecInitExprRec(Expr *node, ExprState *state,
{
JsonValueExpr *jve = (JsonValueExpr *) node;
- Assert(jve->formatted_expr != NULL);
- ExecInitExprRec(jve->formatted_expr, state, resv, resnull);
+ ExecInitExprRec(jve->raw_expr, state, resv, resnull);
+
+ if (jve->formatted_expr)
+ {
+ Datum *innermost_caseval = state->innermost_caseval;
+ bool *innermost_isnull = state->innermost_casenull;
+
+ state->innermost_caseval = resv;
+ state->innermost_casenull = resnull;
+
+ ExecInitExprRec(jve->formatted_expr, state, resv, resnull);
+
+ state->innermost_caseval = innermost_caseval;
+ state->innermost_casenull = innermost_isnull;
+ }
break;
}
psql -U postgres -h /tmp
psql (16.4 (Ubuntu 16.4-0ubuntu0.24.04.2), server 17.0)
WARNING: psql major version 16, server major version 17.
Some psql features might not work.
Type "help" for help.
postgres=> SELECT JSON_OBJECT('a': JSON_OBJECTAGG('b': random() RETURNING
text) FORMAT JSON);
json_object
--------------------------------------
{"a" : { "b" : 0.9135423351926648 }}
(1 row)
postgres=>
Em ter., 15 de out. de 2024 às 09:16, PG Bug reporting form <
noreply@postgresql.org> escreveu:
Show quoted text
The following bug has been logged on the website:
Bug reference: 18657
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 17.0
Operating system: Ubuntu 22.04
Description:The following query:
SELECT JSON_OBJECT('a': JSON_OBJECTAGG('b': random() RETURNING text) FORMAT
JSON);triggers a server crash with the following stack trace:
Core was generated by `postgres: law regression [local] SELECT'.
Program terminated with signal SIGSEGV, Segmentation fault.#0 0x000055b54914ee1b in ExecBuildAggTrans (...) at execExpr.c:3514
3514 expr_setup_walker((Node *)
pertrans->aggref->aggdirectargs,
(gdb) bt
#0 0x000055b54914ee1b in ExecBuildAggTrans (...) at execExpr.c:3514
#1 0x000055b549180ee9 in ExecInitAgg (...) at nodeAgg.c:4017
#2 0x000055b54916eb13 in ExecInitNode (...) at execProcnode.c:341
#3 0x000055b549162d3f in InitPlan (...) at execMain.c:968
#4 0x000055b549162732 in standard_ExecutorStart (...) at execMain.c:263
#5 0x000055b54916245a in ExecutorStart (...) at execMain.c:139
#6 0x000055b549411063 in PortalStart (...) at pquery.c:517
#7 0x000055b54940d172 in exec_simple_query (
query_string=0x55b54b3f4430 "SELECT JSON_OBJECT('a':
JSON_OBJECTAGG('b':
random() RETURNING text) FORMAT JSON);") at postgres.c:1239(gdb) p pertrans->aggref
$1 = (Aggref *) 0x0First bad commit for this anomaly is b6e1157e7.
Attachments:
patchapplication/octet-stream; name=patchDownload+15-2
On Tue, Oct 15, 2024 at 11:00:01AM +0000, PG Bug reporting form wrote:
First bad commit for this anomaly is b6e1157e7.
Amit, any thoughts?
--
Michael
On Wed, Oct 16, 2024 at 9:19 AM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Oct 15, 2024 at 11:00:01AM +0000, PG Bug reporting form wrote:
First bad commit for this anomaly is b6e1157e7.
Amit, any thoughts?
Will look into it, thanks.
--
Thanks, Amit Langote
Amit Langote <amitlangote09@gmail.com> 于2024年10月16日周三 08:35写道:
On Wed, Oct 16, 2024 at 9:19 AM Michael Paquier <michael@paquier.xyz>
wrote:On Tue, Oct 15, 2024 at 11:00:01AM +0000, PG Bug reporting form wrote:
First bad commit for this anomaly is b6e1157e7.
Amit, any thoughts?
Will look into it, thanks.
Hi,
I debug this issue, and I find that:
after b6e1157e7, we shouldn't walk JsonValueExpr.raw_expr in
expression_tree_walker_impl(), which
is called in preprocess_aggrefs().
I try to above solution, no crashed again.
--
Thanks,
Tender Wang
Tender Wang <tndrwang@gmail.com> 于2024年10月16日周三 16:20写道:
Amit Langote <amitlangote09@gmail.com> 于2024年10月16日周三 08:35写道:
On Wed, Oct 16, 2024 at 9:19 AM Michael Paquier <michael@paquier.xyz>
wrote:On Tue, Oct 15, 2024 at 11:00:01AM +0000, PG Bug reporting form wrote:
First bad commit for this anomaly is b6e1157e7.
Amit, any thoughts?
Will look into it, thanks.
Hi,
I debug this issue, and I find that:
after b6e1157e7, we shouldn't walk JsonValueExpr.raw_expr in
expression_tree_walker_impl(), which
is called in preprocess_aggrefs().I try to above solution, no crashed again.
I attached a patch. I found a typo comment when I worked on this issue.
But I don't include a test case, because random() result is different
every time, maybe using explain statement.
Any thoughts?
--
Thanks,
Tender Wang
Attachments:
0001-Fix-json_objectagg-crashed-when-including-random-fun.patchapplication/octet-stream; name=0001-Fix-json_objectagg-crashed-when-including-random-fun.patchDownload+1-4
On Wed, Oct 16, 2024 at 5:20 PM Tender Wang <tndrwang@gmail.com> wrote:
Amit Langote <amitlangote09@gmail.com> 于2024年10月16日周三 08:35写道:
On Wed, Oct 16, 2024 at 9:19 AM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Oct 15, 2024 at 11:00:01AM +0000, PG Bug reporting form wrote:
First bad commit for this anomaly is b6e1157e7.
Amit, any thoughts?
Will look into it, thanks.
Hi,
I debug this issue, and I find that:
after b6e1157e7, we shouldn't walk JsonValueExpr.raw_expr in expression_tree_walker_impl(), which
is called in preprocess_aggrefs().I try to above solution, no crashed again.
Thanks for the analysis. That is my conclusion as well.
Attached a patch.
--
Thanks, Amit Langote
Attachments:
v1-0001-SQL-JSON-Fix-some-oversights-in-commit-b6e1157e7.patchapplication/x-patch; name=v1-0001-SQL-JSON-Fix-some-oversights-in-commit-b6e1157e7.patchDownload+43-12
Amit Langote <amitlangote09@gmail.com> 于2024年10月16日周三 17:06写道:
On Wed, Oct 16, 2024 at 5:20 PM Tender Wang <tndrwang@gmail.com> wrote:
Amit Langote <amitlangote09@gmail.com> 于2024年10月16日周三 08:35写道:
On Wed, Oct 16, 2024 at 9:19 AM Michael Paquier <michael@paquier.xyz>
wrote:
On Tue, Oct 15, 2024 at 11:00:01AM +0000, PG Bug reporting form wrote:
First bad commit for this anomaly is b6e1157e7.
Amit, any thoughts?
Will look into it, thanks.
Hi,
I debug this issue, and I find that:
after b6e1157e7, we shouldn't walk JsonValueExpr.raw_expr inexpression_tree_walker_impl(), which
is called in preprocess_aggrefs().
I try to above solution, no crashed again.
Thanks for the analysis. That is my conclusion as well.
Attached a patch.
Yeah, yours look more better than mine. And the typo in my attached patch,
is that right?
JSON_OBJECT() should be JSON_OBJECTAGG() near transformJsonObjectAgg()
--
Thanks,
Tender Wang
On Wed, Oct 16, 2024 at 6:11 PM Tender Wang <tndrwang@gmail.com> wrote:
Amit Langote <amitlangote09@gmail.com> 于2024年10月16日周三 17:06写道:
On Wed, Oct 16, 2024 at 5:20 PM Tender Wang <tndrwang@gmail.com> wrote:
Amit Langote <amitlangote09@gmail.com> 于2024年10月16日周三 08:35写道:
On Wed, Oct 16, 2024 at 9:19 AM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Oct 15, 2024 at 11:00:01AM +0000, PG Bug reporting form wrote:
First bad commit for this anomaly is b6e1157e7.
Amit, any thoughts?
Will look into it, thanks.
Hi,
I debug this issue, and I find that:
after b6e1157e7, we shouldn't walk JsonValueExpr.raw_expr in expression_tree_walker_impl(), which
is called in preprocess_aggrefs().I try to above solution, no crashed again.
Thanks for the analysis. That is my conclusion as well.
Attached a patch.
Yeah, yours look more better than mine. And the typo in my attached patch, is that right?
JSON_OBJECT() should be JSON_OBJECTAGG() near transformJsonObjectAgg()
Yeah, the typo needs to be fixed as well. Thanks for the patch.
So, attaching 0002 for that.
--
Thanks, Amit Langote
Attachments:
v2-0001-SQL-JSON-Fix-some-oversights-in-commit-b6e1157e7.patchapplication/octet-stream; name=v2-0001-SQL-JSON-Fix-some-oversights-in-commit-b6e1157e7.patchDownload+43-12
v2-0002-Fix-typo-in-comment-of-transformJsonAggConstructo.patchapplication/octet-stream; name=v2-0002-Fix-typo-in-comment-of-transformJsonAggConstructo.patchDownload+1-2
Amit Langote <amitlangote09@gmail.com> 于2024年10月16日周三 17:14写道:
On Wed, Oct 16, 2024 at 6:11 PM Tender Wang <tndrwang@gmail.com> wrote:
Amit Langote <amitlangote09@gmail.com> 于2024年10月16日周三 17:06写道:
On Wed, Oct 16, 2024 at 5:20 PM Tender Wang <tndrwang@gmail.com> wrote:
Amit Langote <amitlangote09@gmail.com> 于2024年10月16日周三 08:35写道:
On Wed, Oct 16, 2024 at 9:19 AM Michael Paquier <michael@paquier.xyz>
wrote:
On Tue, Oct 15, 2024 at 11:00:01AM +0000, PG Bug reporting form
wrote:
First bad commit for this anomaly is b6e1157e7.
Amit, any thoughts?
Will look into it, thanks.
Hi,
I debug this issue, and I find that:
after b6e1157e7, we shouldn't walk JsonValueExpr.raw_expr inexpression_tree_walker_impl(), which
is called in preprocess_aggrefs().
I try to above solution, no crashed again.
Thanks for the analysis. That is my conclusion as well.
Attached a patch.
Yeah, yours look more better than mine. And the typo in my attached
patch, is that right?
JSON_OBJECT() should be JSON_OBJECTAGG() near transformJsonObjectAgg()
Yeah, the typo needs to be fixed as well. Thanks for the patch.
So, attaching 0002 for that.
LGTM
--
Thanks,
Tender Wang
On Wed, Oct 16, 2024 at 6:19 PM Tender Wang <tndrwang@gmail.com> wrote:
Amit Langote <amitlangote09@gmail.com> 于2024年10月16日周三 17:14写道:
On Wed, Oct 16, 2024 at 6:11 PM Tender Wang <tndrwang@gmail.com> wrote:
Amit Langote <amitlangote09@gmail.com> 于2024年10月16日周三 17:06写道:
On Wed, Oct 16, 2024 at 5:20 PM Tender Wang <tndrwang@gmail.com> wrote:
Amit Langote <amitlangote09@gmail.com> 于2024年10月16日周三 08:35写道:
On Wed, Oct 16, 2024 at 9:19 AM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Oct 15, 2024 at 11:00:01AM +0000, PG Bug reporting form wrote:
First bad commit for this anomaly is b6e1157e7.
Amit, any thoughts?
Will look into it, thanks.
Hi,
I debug this issue, and I find that:
after b6e1157e7, we shouldn't walk JsonValueExpr.raw_expr in expression_tree_walker_impl(), which
is called in preprocess_aggrefs().I try to above solution, no crashed again.
Thanks for the analysis. That is my conclusion as well.
Attached a patch.
Yeah, yours look more better than mine. And the typo in my attached patch, is that right?
JSON_OBJECT() should be JSON_OBJECTAGG() near transformJsonObjectAgg()Yeah, the typo needs to be fixed as well. Thanks for the patch.
So, attaching 0002 for that.
I've pushed 0002 in advance.
I realized I hadn't explained in the commit message why the outputs of
some existing tests changed, so I decided to give it more thought
before pushing 0001. The changes seem harmless at first glance, but I
want to consider them further.
Also, it might be better to leave a comment where the commit is
removing code, as follows:
- if (WALK(jve->raw_expr))
- return true;
+ /* Ignore raw_expr because it's not relevant at runtime. */
--
Thanks, Amit Langote
Amit Langote <amitlangote09@gmail.com> 于2024年10月16日周三 20:00写道:
On Wed, Oct 16, 2024 at 6:19 PM Tender Wang <tndrwang@gmail.com> wrote:
Amit Langote <amitlangote09@gmail.com> 于2024年10月16日周三 17:14写道:
On Wed, Oct 16, 2024 at 6:11 PM Tender Wang <tndrwang@gmail.com> wrote:
Amit Langote <amitlangote09@gmail.com> 于2024年10月16日周三 17:06写道:
On Wed, Oct 16, 2024 at 5:20 PM Tender Wang <tndrwang@gmail.com>
wrote:
Amit Langote <amitlangote09@gmail.com> 于2024年10月16日周三 08:35写道:
On Wed, Oct 16, 2024 at 9:19 AM Michael Paquier <
michael@paquier.xyz> wrote:
On Tue, Oct 15, 2024 at 11:00:01AM +0000, PG Bug reporting form
wrote:
First bad commit for this anomaly is b6e1157e7.
Amit, any thoughts?
Will look into it, thanks.
Hi,
I debug this issue, and I find that:
after b6e1157e7, we shouldn't walk JsonValueExpr.raw_expr inexpression_tree_walker_impl(), which
is called in preprocess_aggrefs().
I try to above solution, no crashed again.
Thanks for the analysis. That is my conclusion as well.
Attached a patch.
Yeah, yours look more better than mine. And the typo in my attached
patch, is that right?
JSON_OBJECT() should be JSON_OBJECTAGG() near
transformJsonObjectAgg()
Yeah, the typo needs to be fixed as well. Thanks for the patch.
So, attaching 0002 for that.
I've pushed 0002 in advance.
I realized I hadn't explained in the commit message why the outputs of
some existing tests changed, so I decided to give it more thought
before pushing 0001. The changes seem harmless at first glance, but I
want to consider them further.
I didn't looked more in details. I guessed it is related with below codes:
- MUTATE(newnode->raw_expr, jve->raw_expr, Expr *);
Because in my patch I didn't remove above code, so I didn't have the plan
diffs.
The output in these plan diffs seemed same with their SQL statements. If we
removed
MUTATE(newnode->raw_expr, jve->raw_expr, Expr *);
the output seemd to make sense.
Anyway, above is my guess. I didn't debug these.
Also, it might be better to leave a comment where the commit is
removing code, as follows:- if (WALK(jve->raw_expr)) - return true; + /* Ignore raw_expr because it's not relevant at runtime. */
+1
--
Thanks,
Tender Wang
Amit Langote <amitlangote09@gmail.com> writes:
Also, it might be better to leave a comment where the commit is
removing code, as follows:
- if (WALK(jve->raw_expr)) - return true; + /* Ignore raw_expr because it's not relevant at runtime. */
Would it be better for parse analysis to explicitly NULL out this
field once it's done looking at it? Carrying unmaintained pieces
of an expression tree around seems both inefficient and prone to
future failures of this same ilk. The further the raw_expr gets
out of step with current reality, the worse the hazards.
regards, tom lane
On Wed, Oct 16, 2024 at 11:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <amitlangote09@gmail.com> writes:
Also, it might be better to leave a comment where the commit is
removing code, as follows:- if (WALK(jve->raw_expr)) - return true; + /* Ignore raw_expr because it's not relevant at runtime. */Would it be better for parse analysis to explicitly NULL out this
field once it's done looking at it? Carrying unmaintained pieces
of an expression tree around seems both inefficient and prone to
future failures of this same ilk. The further the raw_expr gets
out of step with current reality, the worse the hazards.
It seems we can't do that in this case, because get_rule_expr() wants
to read it.
--
Thanks, Amit Langote
Amit Langote <amitlangote09@gmail.com> writes:
On Wed, Oct 16, 2024 at 11:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Would it be better for parse analysis to explicitly NULL out this
field once it's done looking at it? Carrying unmaintained pieces
of an expression tree around seems both inefficient and prone to
future failures of this same ilk. The further the raw_expr gets
out of step with current reality, the worse the hazards.
It seems we can't do that in this case, because get_rule_expr() wants
to read it.
I think you just increased my level of concern. Aren't you saying
that get_rule_expr is likely to deliver utter garbage, because it
will be working from an expression that has not tracked
transformations made to other parts of the tree?
Also, I see that ruleutils is applying get_rule_expr to raw_expr,
which AFAICT means it's *not* raw-parser output, else that wouldn't
work at all. Which means the field is badly named and incorrectly
documented. But in particular it means you cannot just make the
tree-walking routines ignore it --- you *must* update it during
transformations such as subquery pullup, or you will have garbage
that will cause EXPLAIN to crash or at least produce wrong results.
In other words, the proposed patch is dangerously wrong. I suspect at
this point that the true bug might be the opposite: somebody feeling
that they could dispense with updating raw_expr when they shouldn't.
In particular, I think "... is not evaluated by
eval_const_expressions_mutator()" is already a huge red flag.
There are transformations that eval_const_expressions does that
are not optional.
regards, tom lane
On Thu, Oct 17, 2024 at 9:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <amitlangote09@gmail.com> writes:
On Wed, Oct 16, 2024 at 11:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Would it be better for parse analysis to explicitly NULL out this
field once it's done looking at it? Carrying unmaintained pieces
of an expression tree around seems both inefficient and prone to
future failures of this same ilk. The further the raw_expr gets
out of step with current reality, the worse the hazards.It seems we can't do that in this case, because get_rule_expr() wants
to read it.I think you just increased my level of concern. Aren't you saying
that get_rule_expr is likely to deliver utter garbage, because it
will be working from an expression that has not tracked
transformations made to other parts of the tree?Also, I see that ruleutils is applying get_rule_expr to raw_expr,
which AFAICT means it's *not* raw-parser output, else that wouldn't
work at all. Which means the field is badly named and incorrectly
documented. But in particular it means you cannot just make the
tree-walking routines ignore it --- you *must* update it during
transformations such as subquery pullup, or you will have garbage
that will cause EXPLAIN to crash or at least produce wrong results.
One thing we could do is have get_rule_expr() print formatted_expr.
While this would make the output a bit verbose as shown in one of the
diffs I get, it would at least avoid the issues you mentioned with
printing raw_expr.
EXPLAIN (VERBOSE, COSTS OFF) SELECT JSON('123'::bytea FORMAT JSON);
- QUERY PLAN
------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------
Result
- Output: JSON('\x313233'::bytea FORMAT JSON)
+ Output: JSON((convert_from('\x313233'::bytea, 'UTF8'::name))::json
FORMAT JSON)
In other words, the proposed patch is dangerously wrong. I suspect at
this point that the true bug might be the opposite: somebody feeling
that they could dispense with updating raw_expr when they shouldn't.
The issue in this particular case is that preprocess_aggref() adds the
AggTransInfo of the Aggref mentioned in raw_expr because
expression_tree_walker() processes it. However, since raw_expr is
ignored by the executor, the Aggref contained in it is not added to
AggState.aggs. This causes the aggno and aggtransno values of the
Aggrefs that are added to become out of sync, leading to a crash.
I don't see a way to control whether raw_expr is processed by
preprocess_aggref other than ignoring it in expression_tree_walker().
But maybe I misunderstood you.
--
Thanks, Amit Langote
I wrote:
In other words, the proposed patch is dangerously wrong. I suspect at
this point that the true bug might be the opposite: somebody feeling
that they could dispense with updating raw_expr when they shouldn't.
To expand on that: the proximate cause of the bug is that
ExecBuildAggTrans is expecting that expression initialization found
an Aggref for every aggtransno index. In the plan as emitted by
the planner, there are two json_object_agg Aggref nodes, one in
the JsonValueExpr's raw_expr and the other in the formatted_expr.
preprocess_aggrefs() assigned aggno 0 and aggtransno 0 to the
first one, aggno 1 and aggtransno 1 to the second. As already
noted upthread, the core of the problem is that ExecInitExprRec's
T_JsonValueExpr case is not recursing into raw_expr, so that
only the second Aggref gets found and linked into the parent
AggState, thus leaving a hole in the per-transno array.
The proposed patch fixes that indirectly by lobotomizing
expression_tree_walker so that preprocess_aggrefs doesn't
find the raw_expr's Aggref either. However, the side effects
of that will be spectacularly unpleasant, because there are
approximately zero other callers of expression_tree_walker
that will be pleased with it.
In the short term, I suspect the only workable fix is to undo the
optimization of having ExecInitExprRec not recurse into both raw_expr
and formatted_expr. In the longer run, I'd look hard at why the
tree needs to have two copies of that subexpression at all.
At least in this example, it appears that the only difference is
a type coercion expression wrapped around the formatted_expr, and
there are surely better ways to handle that.
BTW, the reason that using a volatile function matters is that
that stops find_compatible_agg from deciding that the two identical
Aggrefs can be given the same aggno and aggtransno, which masks
the bug in another way. That also means that the planner thinks
the two Aggrefs represent distinct computations. At minimum that
messes up our estimates of aggregate evaluation costs, and there
might be more subtle semantic consequences.
I remain concerned about whether it's really okay to skip raw_expr
in eval_const_expressions_mutator, too.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> 于2024年10月17日周四 08:56写道:
Amit Langote <amitlangote09@gmail.com> writes:
On Wed, Oct 16, 2024 at 11:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Would it be better for parse analysis to explicitly NULL out this
field once it's done looking at it? Carrying unmaintained pieces
of an expression tree around seems both inefficient and prone to
future failures of this same ilk. The further the raw_expr gets
out of step with current reality, the worse the hazards.It seems we can't do that in this case, because get_rule_expr() wants
to read it.I think you just increased my level of concern. Aren't you saying
that get_rule_expr is likely to deliver utter garbage, because it
will be working from an expression that has not tracked
transformations made to other parts of the tree?Also, I see that ruleutils is applying get_rule_expr to raw_expr,
which AFAICT means it's *not* raw-parser output, else that wouldn't
work at all. Which means the field is badly named and incorrectly
documented. But in particular it means you cannot just make the
tree-walking routines ignore it --- you *must* update it during
transformations such as subquery pullup, or you will have garbage
that will cause EXPLAIN to crash or at least produce wrong results.
When I first see the "raw_expr", I think it is output after syntax
parsing.
But it is not. The "raw_expr" is transformed in transformJsonValueExpr().
And the formated_expr is different from raw_expr when the targettype is not
same with the expr's type.
I do some codes hack that make the raw_expr store the output of syntax
parsing not
transformed result. As you said above, explain will report error.
In other words, the proposed patch is dangerously wrong. I suspect at
this point that the true bug might be the opposite: somebody feeling
that they could dispense with updating raw_expr when they shouldn't.
In particular, I think "... is not evaluated by
eval_const_expressions_mutator()" is already a huge red flag.
There are transformations that eval_const_expressions does that
are not optional.
Can we make the raw_expr just save the raw_parser output, and formated_expr
store the transformed output, and we only touch the formated_expr when need
to
evaluating expression?
--
Thanks,
Tender Wang
On Thu, Oct 17, 2024 at 1:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
In other words, the proposed patch is dangerously wrong. I suspect at
this point that the true bug might be the opposite: somebody feeling
that they could dispense with updating raw_expr when they shouldn't.To expand on that: the proximate cause of the bug is that
ExecBuildAggTrans is expecting that expression initialization found
an Aggref for every aggtransno index. In the plan as emitted by
the planner, there are two json_object_agg Aggref nodes, one in
the JsonValueExpr's raw_expr and the other in the formatted_expr.
preprocess_aggrefs() assigned aggno 0 and aggtransno 0 to the
first one, aggno 1 and aggtransno 1 to the second. As already
noted upthread, the core of the problem is that ExecInitExprRec's
T_JsonValueExpr case is not recursing into raw_expr, so that
only the second Aggref gets found and linked into the parent
AggState, thus leaving a hole in the per-transno array.The proposed patch fixes that indirectly by lobotomizing
expression_tree_walker so that preprocess_aggrefs doesn't
find the raw_expr's Aggref either. However, the side effects
of that will be spectacularly unpleasant, because there are
approximately zero other callers of expression_tree_walker
that will be pleased with it.In the short term, I suspect the only workable fix is to undo the
optimization of having ExecInitExprRec not recurse into both raw_expr
and formatted_expr.
This or actually I'm tempted to simply revert the whole thing
(b6e1157e7d3) as an ill-considered refactoring, because I am not able
to convince myself that calling ExecAggPlainTransByVal() twice, via
both raw_expr and formatted_expr, is always safe.
In the longer run, I'd look hard at why the
tree needs to have two copies of that subexpression at all.
At least in this example, it appears that the only difference is
a type coercion expression wrapped around the formatted_expr, and
there are surely better ways to handle that.
Before commit b6e1157e, raw_expr and formatted_expr were evaluated
separately, but only raw_expr would contain the actual expression
(such as an Aggref in this case). The result of raw_expr was passed as
an argument to the formatting expression (like CoerceViaIO) using a
CaseValueExpr. That commit removed the CaseValueExpr indirection and
embedded the expression from raw_expr directly into formatted_expr,
making the separate runtime evaluation of raw_expr redundant.
However, this change was ill-advised given this report.
Thoughts on the attached?
--
Thanks, Amit Langote
Attachments:
v3-0001-Revert-Don-t-include-CaseTestExpr-in-JsonValueExp.patchapplication/octet-stream; name=v3-0001-Revert-Don-t-include-CaseTestExpr-in-JsonValueExp.patchDownload+124-24
Amit Langote <amitlangote09@gmail.com> writes:
On Thu, Oct 17, 2024 at 1:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
In the short term, I suspect the only workable fix is to undo the
optimization of having ExecInitExprRec not recurse into both raw_expr
and formatted_expr.
This or actually I'm tempted to simply revert the whole thing
(b6e1157e7d3) as an ill-considered refactoring, because I am not able
to convince myself that calling ExecAggPlainTransByVal() twice, via
both raw_expr and formatted_expr, is always safe.
Not following the concern here? As far as nodeAgg is concerned,
they'd be two independent aggregates.
In any case, whatever we do in master, you can't "simply revert"
b6e1157e7 in released branches. It changed the way JsonValueExpr is
represented in stored rules, and you don't get to undo that midstream.
(The commit should have included a catversion bump, in fact.)
I do think that reverting the ExecInitExprRec and
eval_const_expressions_mutator changes would be good, but we
can't undo the parser changes, at least not in v16/v17.
regards, tom lane