BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

Started by PG Bug reporting formover 3 years ago34 messagesbugs

noreply@postgresql.org

over 3 years ago

The following bug has been logged on the website:

Bug reference: 17800
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 15.2
Operating system: Ubuntu 22.04
Description:

When executing the following queries:
CREATE TABLE t (a INT PRIMARY KEY, b TEXT) PARTITION BY LIST (a);

CREATE TABLE tp1 (b TEXT, a INT PRIMARY KEY);
ALTER TABLE t ATTACH PARTITION tp1 FOR VALUES IN (1);

CREATE TABLE tp2 (a INT PRIMARY KEY, b TEXT);
ALTER TABLE t ATTACH PARTITION tp2 FOR VALUES IN (2);

INSERT INTO t VALUES (1), (2);

INSERT INTO t VALUES (1), (2) ON CONFLICT(a)
DO UPDATE SET (a, b) = (SELECT t.a, t.b || '+');

I get the server crashed with the coredump:
Core was generated by `postgres: law regression [local] INSERT
'.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Section `.reg-xstate/1121242' in core file too small.
#0 0x000056173f49d91d in pg_detoast_datum_packed (datum=0x2) at
fmgr.c:1853
1853 if (VARATT_IS_COMPRESSED(datum) ||
VARATT_IS_EXTERNAL(datum))
(gdb) bt
#0 0x000056173f49d91d in pg_detoast_datum_packed (datum=0x2) at
fmgr.c:1853
#1 0x000056173f44bb3c in textcat (fcinfo=0x561740c0a4d8) at varlena.c:750
#2 0x000056173efebd6e in ExecInterpExpr (state=0x561740c09ff0,
econtext=0x561740c09d18,
isnull=0x7ffc5f595d2f) at execExprInterp.c:752
#3 0x000056173f04a86a in ExecEvalExprSwitchContext (state=0x561740c09ff0,
econtext=0x561740c09d18,
isNull=0x7ffc5f595d2f) at ../../../src/include/executor/executor.h:344
#4 0x000056173f04a8e2 in ExecProject (projInfo=0x561740c09fe8)
at ../../../src/include/executor/executor.h:378
#5 0x000056173f04ab19 in ExecResult (pstate=0x561740c09c08) at
nodeResult.c:136
#6 0x000056173f04e40d in ExecProcNode (node=0x561740c09c08) at
../../../src/include/executor/executor.h:262
#7 0x000056173f0508fe in ExecSetParamPlan (node=0x561740c112c0,
econtext=0x561740c0f8c8)
at nodeSubplan.c:1133
#8 0x000056173efef225 in ExecEvalParamExec (state=0x561740c0fa80,
op=0x561740c0fb48, econtext=0x561740c0f8c8)
at execExprInterp.c:2428
#9 0x000056173efec5e8 in ExecInterpExpr (state=0x561740c0fa80,
econtext=0x561740c0f8c8,
isnull=0x7ffc5f5961cf) at execExprInterp.c:1065
#10 0x000056173efee179 in ExecInterpExprStillValid (state=0x561740c0fa80,
econtext=0x561740c0f8c8,
isNull=0x7ffc5f5961cf) at execExprInterp.c:1838
#11 0x000056173f040b4a in ExecEvalExprSwitchContext (state=0x561740c0fa80,
econtext=0x561740c0f8c8,
isNull=0x7ffc5f5961cf) at ../../../src/include/executor/executor.h:344
#12 0x000056173f040bc2 in ExecProject (projInfo=0x561740c0fa78)
at ../../../src/include/executor/executor.h:378
#13 0x000056173f044def in ExecOnConflictUpdate (context=0x7ffc5f596420,
resultRelInfo=0x561740c2c418,
conflictTid=0x7ffc5f5962e2, excludedSlot=0x561740c0b138, canSetTag=true,
returning=0x7ffc5f5962e8)
at nodeModifyTable.c:2659
#14 0x000056173f0426d8 in ExecInsert (context=0x7ffc5f596420,
resultRelInfo=0x561740c2c418,
slot=0x561740c0b138, canSetTag=true, inserted_tuple=0x0,
insert_destrel=0x0) at nodeModifyTable.c:1059
#15 0x000056173f046dc6 in ExecModifyTable (pstate=0x561740c0a578) at
nodeModifyTable.c:3810
#16 0x000056173f004dd3 in ExecProcNodeFirst (node=0x561740c0a578) at
execProcnode.c:464
#17 0x000056173eff7ff6 in ExecProcNode (node=0x561740c0a578) at
../../../src/include/executor/executor.h:262
#18 0x000056173effae2b in ExecutePlan (estate=0x561740c09938,
planstate=0x561740c0a578,
use_parallel_mode=false, operation=CMD_INSERT, sendTuples=false,
numberTuples=0,
direction=ForwardScanDirection, dest=0x561740c0f310, execute_once=true)
at execMain.c:1633
#19 0x000056173eff86de in standard_ExecutorRun (queryDesc=0x561740c12c78,
direction=ForwardScanDirection,
count=0, execute_once=true) at execMain.c:364
#20 0x000056173eff84e7 in ExecutorRun (queryDesc=0x561740c12c78,
direction=ForwardScanDirection, count=0,
execute_once=true) at execMain.c:308
#21 0x000056173f2aa948 in ProcessQuery (plan=0x561740c20158,
sourceText=0x561740b1a178 "INSERT INTO t VALUES (1), (2) ON
CONFLICT(a)\n DO UPDATE SET (a, b) = (SELECT t.a, t.b || '+');",
params=0x0, queryEnv=0x0, dest=0x561740c0f310, qc=0x7ffc5f596880) at
pquery.c:160
#22 0x000056173f2ac4ec in PortalRunMulti (portal=0x561740b926c8,
isTopLevel=true, setHoldSnapshot=false,
dest=0x561740c0f310, altdest=0x561740c0f310, qc=0x7ffc5f596880) at
pquery.c:1277
#23 0x000056173f2ab9e0 in PortalRun (portal=0x561740b926c8,
count=9223372036854775807, isTopLevel=true,
run_once=true, dest=0x561740c0f310, altdest=0x561740c0f310,
qc=0x7ffc5f596880) at pquery.c:791
#24 0x000056173f2a4743 in exec_simple_query (
query_string=0x561740b1a178 "INSERT INTO t VALUES (1), (2) ON
CONFLICT(a)\n DO UPDATE SET (a, b) = (SELECT t.a, t.b || '+');") at
postgres.c:1237
#25 0x000056173f2a9754 in PostgresMain (dbname=0x561740b52578 "regression",
username=0x561740b17708 "law")
at postgres.c:4565
#26 0x000056173f1cd4bb in BackendRun (port=0x561740b42b10) at
postmaster.c:4461
#27 0x000056173f1ccd47 in BackendStartup (port=0x561740b42b10) at
postmaster.c:4189
#28 0x000056173f1c908c in ServerLoop () at postmaster.c:1779
#29 0x000056173f1c8936 in PostmasterMain (argc=3, argv=0x561740b15640) at
postmaster.c:1463
#30 0x000056173f082cfa in main (argc=3, argv=0x561740b15640) at main.c:200

But when executing:
UPDATE t SET (a, b) = (SELECT t.a, t.b || '+');
I get:
ERROR: attribute 1 of type tp1 has wrong type
DETAIL: Table has type text, but query expects integer.

By comparing two callstacks I can see that in the second case
ExecInterpExprStillValid() is executed after the latest
ExecEvalExprSwitchContext().
The ExecInterpExprStillValid() function contains:
/* skip the check during further executions */
state->evalfunc = (ExprStateEvalFunc) state->evalfunc_private;

If just call evalfunc_private() here, the first case ends with the error as
expected.

Observed on REL_11_STABLE..master.

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: PG Bug reporting form (#1)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

On Sat, Feb 18, 2023 at 08:00:00AM +0000, PG Bug reporting form wrote:

INSERT INTO t VALUES (1), (2) ON CONFLICT(a)
DO UPDATE SET (a, b) = (SELECT t.a, t.b || '+');

But when executing:
UPDATE t SET (a, b) = (SELECT t.a, t.b || '+');
I get:
ERROR: attribute 1 of type tp1 has wrong type
DETAIL: Table has type text, but query expects integer.

Reproduced here.

By comparing two callstacks I can see that in the second case
ExecInterpExprStillValid() is executed after the latest
ExecEvalExprSwitchContext().
The ExecInterpExprStillValid() function contains:
/* skip the check during further executions */
state->evalfunc = (ExprStateEvalFunc) state->evalfunc_private;

If just call evalfunc_private() here, the first case ends with the error as
expected.

Yeah, it would sound logic to me to have consistency with the
ExecEvalExprSwitchContext() checks here, so it seems like the executor
has missed the call for a long time. Would you like to write a patch,
perhaps? Did you bisect the origin of that?
--
Michael

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Michael Paquier (#2)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

Michael Paquier <michael@paquier.xyz> writes:

On Sat, Feb 18, 2023 at 08:00:00AM +0000, PG Bug reporting form wrote:

But when executing:
UPDATE t SET (a, b) = (SELECT t.a, t.b || '+');
I get:
ERROR: attribute 1 of type tp1 has wrong type
DETAIL: Table has type text, but query expects integer.

Reproduced here.

I think there may be two different bugs here. The coredump in the
ON CONFLICT case goes back to v11 for me (and v10 doesn't support
the primary-key-on-partition case at all, so this error may be
aboriginal to the feature). But I only see this "wrong type" failure
in v14 and later. I didn't try bisecting yet.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Tom Lane (#3)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

I wrote:

I think there may be two different bugs here. The coredump in the
ON CONFLICT case goes back to v11 for me (and v10 doesn't support
the primary-key-on-partition case at all, so this error may be
aboriginal to the feature). But I only see this "wrong type" failure
in v14 and later. I didn't try bisecting yet.

Bisecting produced some interesting results: the "wrong type"
failure has existed for a pretty long time on HEAD. The reason
I don't see it on v13 etc is that commits 3f7323cbb et al fixed
it in those branches. That's probably accidental, but I'm too
tired to poke at it more tonight.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Tom Lane (#4)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

I wrote:

Bisecting produced some interesting results: the "wrong type"
failure has existed for a pretty long time on HEAD. The reason
I don't see it on v13 etc is that commits 3f7323cbb et al fixed
it in those branches. That's probably accidental, but I'm too
tired to poke at it more tonight.

Sigh ... no, it's not accidental: this is exactly the same problem
3f7323cbb fixed, but popping up in two new places.

In the first place, the test cases we had that led to 3f7323cbb
were using traditionally-inherited tables, in which an update
of this sort leads to a plan like

regression=# create table ti (a int, b text);
CREATE TABLE
regression=# create table tichild() inherits(ti);
CREATE TABLE
regression=# explain (verbose, costs off) UPDATE ti SET (a, b) = (SELECT ti.a, ti.b || '+');
QUERY PLAN
---------------------------------------------------------------------------
Update on public.ti
Update on public.ti ti_1
Update on public.tichild ti_2
-> Result
Output: $2, $3, (SubPlan 1 (returns $2,$3)), ti.tableoid, ti.ctid
-> Append
-> Seq Scan on public.ti ti_1
Output: ti_1.a, ti_1.b, ti_1.tableoid, ti_1.ctid
-> Seq Scan on public.tichild ti_2
Output: ti_2.a, ti_2.b, ti_2.tableoid, ti_2.ctid
SubPlan 1 (returns $2,$3)
-> Result
Output: ti.a, (ti.b || '+'::text)
(13 rows)

But if the target is a partitioned table, as in Alexander's example:

regression=# explain (verbose, costs off) UPDATE t SET (a, b) = (SELECT t.a, t.b || '+');
QUERY PLAN
-----------------------------------------------------------------------------------
Update on public.t
Update on public.tp1 t_1
Update on public.tp2 t_2
-> Append
-> Seq Scan on public.tp1 t_1
Output: $2, $3, (SubPlan 1 (returns $2,$3)), t_1.tableoid, t_1.ctid
SubPlan 1 (returns $2,$3)
-> Result
Output: t_1.a, (t_1.b || '+'::text)
-> Seq Scan on public.tp2 t_2
Output: $2, $3, (SubPlan 1 (returns $2,$3)), t_2.tableoid, t_2.ctid
(11 rows)

I'm not real sure why this discrepancy exists, or whether it's a good
idea. But at any rate, the reason why I thought 3f7323cbb would only
be needed in the back branches is that I thought we had eliminated all
cases of making multiple clones of an UPDATE's target list when we
nuked inheritance_planner. But here we have multiple clones of a
MULTIEXPR_SUBLINK SubPlan, and they're all sharing the same output
parameters ($2 and $3, here), and so the confusion about which args list
has to be executed during a recomputation comes right back.

If that were the only problem then I'd seriously think about fixing it
by disallowing this sort of push-down of the UPDATE targetlist when
MULTIEXPR_SUBLINKs are present. However, the reason we're seeing a
comparable problem in ON CONFLICT is that ExecInitPartitionInfo *also*
makes clones of an UPDATE targetlist, and it is far too naive to fix
this problem.

Not sure about a good fix. We could imagine porting v13's
SS_make_multiexprs_unique logic forward into the newer branches,
but that depends on a lot of planner infrastructure that won't
be available when ExecInitPartitionInfo runs. I think in the
back branches we might be forced into disallowing MULTIEXPR_SUBLINKs
in ON CONFLICT targetlists, because supporting them properly is
looking like a mess.

At this point I'm seriously regretting the entire MULTIEXPR_SUBLINK
design, and am wondering if we could find a different solution for
that. That couldn't lead to a back-patchable fix, obviously.

Another angle is that having ExecInitPartitionInfo doing this sort
of work is a big misallocation of responsibility. If these tlists
were getting built in the planner it'd be far easier to fix.

regards, tom lane

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Michael Paquier (#2)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

Hi,

On 2023-02-21 07:49:25 +0900, Michael Paquier wrote:

On Sat, Feb 18, 2023 at 08:00:00AM +0000, PG Bug reporting form wrote:

By comparing two callstacks I can see that in the second case
ExecInterpExprStillValid() is executed after the latest
ExecEvalExprSwitchContext().
The ExecInterpExprStillValid() function contains:
/* skip the check during further executions */
state->evalfunc = (ExprStateEvalFunc) state->evalfunc_private;

If just call evalfunc_private() here, the first case ends with the error as
expected.

Yeah, it would sound logic to me to have consistency with the
ExecEvalExprSwitchContext() checks here, so it seems like the executor
has missed the call for a long time. Would you like to write a patch,
perhaps? Did you bisect the origin of that?

What inconsistency / missed call? We use ExecInterpExprStillValid() on the
first execution, but not on later executions. I don't see cases where we omit
calls to ExecInterpExprStillValid().

Afaict this is a problem of a wrongly generated target list, which isn't what
ExecInterpExprStillValid() guards against:
/*
* First time through, check whether attribute matches Var. Might not be
* ok anymore, due to schema changes. We do that by setting up a callback
* that does checking on the first call, which then sets the evalfunc
* callback to the actual method of execution.
*/
state->evalfunc = ExecInterpExprStillValid;

Greetings,

Andres Freund

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Andres Freund (#6)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

Andres Freund <andres@anarazel.de> writes:

Afaict this is a problem of a wrongly generated target list, which isn't what
ExecInterpExprStillValid() guards against:

The targetlists are okay, really. The core problem is that each
targetlist has an instance of the MULTIEXPR_SUBLINK SubPlan with a
differently-mutated "args" list, and it looks to me like we correctly
mutated that for the associated child table. But because all the
instances share the same output Params, the ExecSetParamPlan mechanism
gets confused about which version of the SubPlan it ought to invoke
to recompute the output Params.

It occurs to me that one possible fix is to make MULTIEXPR_SUBLINK
and the associated output Params use a separate ParamExecData array;
instead of the query-wide es_param_exec_vals array, use one that
is local to the specific targetlist's ExprState. I'm not sure how
much violence that does to the current notion of an ExprState ---
do we think that is read-only during execution?

If we did have a local-to-the-expression ParamExecData array, maybe that
could be used to get a cleaner fix for things like the domain VALUES
and case-test-expression hacks.

regards, tom lane

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Tom Lane (#5)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

Hi,

On 2023-02-21 12:27:54 -0500, Tom Lane wrote:

If that were the only problem then I'd seriously think about fixing it
by disallowing this sort of push-down of the UPDATE targetlist when
MULTIEXPR_SUBLINKs are present.

For a moment I was wondering whether we could do the opposite. Force the
subplan references to be below the Append node. But leaving aside that that
doesn't look trivial, it wouldn't do anything for the insert [oc] case.

However, the reason we're seeing a comparable problem in ON CONFLICT is that
ExecInitPartitionInfo *also* makes clones of an UPDATE targetlist, and it is
far too naive to fix this problem.

Not sure about a good fix. We could imagine porting v13's
SS_make_multiexprs_unique logic forward into the newer branches,
but that depends on a lot of planner infrastructure that won't
be available when ExecInitPartitionInfo runs.

And we really can't change the plan shape much at that point, given that we're
deferring ExecInitPartitionInfo to happen as late as possible.

I think in the
back branches we might be forced into disallowing MULTIEXPR_SUBLINKs
in ON CONFLICT targetlists, because supporting them properly is
looking like a mess.

Are you thinking of doing that in general, or only for partitioned tables? I'm
worried that this would break a lot of working queries.

Perhaps we could restrict cases of erroring out to when there's a mismatch in
column order between the partitioned table and the partition? IIUC we'll,
somewhat accidentally, do the right thing if the column order matches?

I continue to maintain that it was a seriously bad idea to support this level
of difference between partitioned table and partitions. It adds ugliness and
inefficiencies all over.

Another angle is that having ExecInitPartitionInfo doing this sort
of work is a big misallocation of responsibility. If these tlists
were getting built in the planner it'd be far easier to fix.

Agreed, that doesn't seem quite right. Generally it feels a bunch of
responsibilities that should be more on the planner level have been pushed
down into execPartition.c - but it's not obvious how to fix that in all cases
without increasing the overhead when using runtime partition pruning.

Greetings,

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Tom Lane (#7)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

Hi,

On 2023-02-21 15:16:11 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Afaict this is a problem of a wrongly generated target list, which isn't what
ExecInterpExprStillValid() guards against:

The targetlists are okay, really. The core problem is that each
targetlist has an instance of the MULTIEXPR_SUBLINK SubPlan with a
differently-mutated "args" list, and it looks to me like we correctly
mutated that for the associated child table. But because all the
instances share the same output Params, the ExecSetParamPlan mechanism
gets confused about which version of the SubPlan it ought to invoke
to recompute the output Params.

It doesn't seem crazy to describe that as a wrong targetlist :). But anyway,
all I meant was that the problem isn't one that ExecInterpExprStillValid() can
handle.

It occurs to me that one possible fix is to make MULTIEXPR_SUBLINK
and the associated output Params use a separate ParamExecData array;
instead of the query-wide es_param_exec_vals array, use one that
is local to the specific targetlist's ExprState. I'm not sure how
much violence that does to the current notion of an ExprState ---
do we think that is read-only during execution?

I don't think you would need to modify ExprState - the information about
params etc comes from the ExprContext, right? So we'd need to build a
different ExprContext for partitions, and use that when evaluating the
expressions.

Oh, I guess you might be referencing ExecInitExprWithParams(), from
6719b238e8f0? I don't like that much, it seems like the wrong level - all
other similar info comes from the ExprContext, why not here as well? Either
way, it looks to me like it'd not be used here, as that's just used for
PARAM_EXTERN, but we're dealing with PARAM_EXEC. Just changing the
->ext_params at runtime wouldn't work, it seems to be resolved when the
expression is built.

We don't currently have infrastructure for setting
econtext->ecxt_param_exec_vals to something else, but that shouldn't be too
hard to add.

If we did have a local-to-the-expression ParamExecData array, maybe that
could be used to get a cleaner fix for things like the domain VALUES
and case-test-expression hacks.

Hm, I'm not quite following along here.

Greetings,

Andres Freund

#10

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Andres Freund (#9)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

Andres Freund <andres@anarazel.de> writes:

On 2023-02-21 15:16:11 -0500, Tom Lane wrote:

It occurs to me that one possible fix is to make MULTIEXPR_SUBLINK
and the associated output Params use a separate ParamExecData array;
instead of the query-wide es_param_exec_vals array, use one that
is local to the specific targetlist's ExprState. I'm not sure how
much violence that does to the current notion of an ExprState ---
do we think that is read-only during execution?

I don't think you would need to modify ExprState - the information about
params etc comes from the ExprContext, right? So we'd need to build a
different ExprContext for partitions, and use that when evaluating the
expressions.
...
We don't currently have infrastructure for setting
econtext->ecxt_param_exec_vals to something else, but that shouldn't be too
hard to add.

No, that won't work, because many usages of PARAM_EXEC Params are
specifically intended to transmit datums from one expression (plan node)
to another. That's why that array was query-global to begin with.
What I'm wondering about is adding a separate array, and likely a separate
ParamKind, that would have a less-than-query-wide scope. We might be able
to get away with having that be plan-node-wide, but making it local to the
specific compiled expression feels safer and easier to reason about.

If we did have a local-to-the-expression ParamExecData array, maybe that
could be used to get a cleaner fix for things like the domain VALUES
and case-test-expression hacks.

Hm, I'm not quite following along here.

I'm just arm-waving at this point, it's not real clear to me either.
But I do remember that we have some ugly hacks centered around the
fact that domain VALUES and CaseTestExpr are implemented with a single
datum slot per EContext. I'd rather convert them into something like
PARAM_EXEC with no sharing.

regards, tom lane

#11

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Tom Lane (#10)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

Hi,

On 2023-02-21 15:55:15 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2023-02-21 15:16:11 -0500, Tom Lane wrote:

It occurs to me that one possible fix is to make MULTIEXPR_SUBLINK
and the associated output Params use a separate ParamExecData array;
instead of the query-wide es_param_exec_vals array, use one that
is local to the specific targetlist's ExprState. I'm not sure how
much violence that does to the current notion of an ExprState ---
do we think that is read-only during execution?

I don't think you would need to modify ExprState - the information about
params etc comes from the ExprContext, right? So we'd need to build a
different ExprContext for partitions, and use that when evaluating the
expressions.
...
We don't currently have infrastructure for setting
econtext->ecxt_param_exec_vals to something else, but that shouldn't be too
hard to add.

No, that won't work, because many usages of PARAM_EXEC Params are
specifically intended to transmit datums from one expression (plan node)
to another. That's why that array was query-global to begin with.
What I'm wondering about is adding a separate array, and likely a separate
ParamKind, that would have a less-than-query-wide scope. We might be able
to get away with having that be plan-node-wide, but making it local to the
specific compiled expression feels safer and easier to reason about.

What I was trying to suggest is that you could have a dedicated ExprContext
that'd point to such a separate array. That'd allow you to choose the the
separate array on a per-expression-evaluation basis (not even per ExprState).
We already have multiple ExprContexts in some nodes, so this wouldn't break
new ground.

Greetings,

Andres Freund

#12

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Andres Freund (#11)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

Andres Freund <andres@anarazel.de> writes:

On 2023-02-21 15:55:15 -0500, Tom Lane wrote:

What I'm wondering about is adding a separate array, and likely a separate
ParamKind, that would have a less-than-query-wide scope. We might be able
to get away with having that be plan-node-wide, but making it local to the
specific compiled expression feels safer and easier to reason about.

What I was trying to suggest is that you could have a dedicated ExprContext
that'd point to such a separate array. That'd allow you to choose the the
separate array on a per-expression-evaluation basis (not even per ExprState).
We already have multiple ExprContexts in some nodes, so this wouldn't break
new ground.

I'd really like to *not* need the surrounding plan node to know about
this. The tlist push-down behavior shown upthread would result in
that requirement propagating to just about every plan node type,
certainly every one that allows projection.

If we're certain that we'll only need this for MULTIEXPR_SUBLINK and
thus only in tlists, we could conceivably put the support into
ExecProject and friends rather than directly in the ExprState
infrastructure. But that feels like a rather strange compromise,
and it'd foreclose using the concept for other short-lifespan
Param-like nodes.

Another idea I'm toying with is that the expression compiler could
allocate some space when it sees a MULTIEXPR_SUBLINK, and then
connect up the multiexec Params to that.

regards, tom lane

#13

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Tom Lane (#12)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

Hi,

On 2023-02-21 17:34:30 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2023-02-21 15:55:15 -0500, Tom Lane wrote:

What I'm wondering about is adding a separate array, and likely a separate
ParamKind, that would have a less-than-query-wide scope. We might be able
to get away with having that be plan-node-wide, but making it local to the
specific compiled expression feels safer and easier to reason about.

What I was trying to suggest is that you could have a dedicated ExprContext
that'd point to such a separate array. That'd allow you to choose the the
separate array on a per-expression-evaluation basis (not even per ExprState).
We already have multiple ExprContexts in some nodes, so this wouldn't break
new ground.

I'd really like to *not* need the surrounding plan node to know about
this. The tlist push-down behavior shown upthread would result in
that requirement propagating to just about every plan node type,
certainly every one that allows projection.

Hm, fair point. I had thought about this in a too isolated way, about
expressions evaluated as part of nodeModifyTable.c, rather than stuff
downstream for it.

I was pondering changing EState->es_param_exec_vals while descending into
partition-specific code, just in nodeModifyTable.c, but that doesn't work
as-is, because all the ExprContexts have ->ecxt_param_exec_vals set to the
EState one at node initialization time.

I don't know why we have a copy of es_param_exec_vals in the ExprContext,
tbh. It probably is a tiny bit faster, due to avoiding one level of
indirection, but I have a hard time believing it matters compared to other
costs.

If we're certain that we'll only need this for MULTIEXPR_SUBLINK and
thus only in tlists, we could conceivably put the support into
ExecProject and friends rather than directly in the ExprState
infrastructure. But that feels like a rather strange compromise,
and it'd foreclose using the concept for other short-lifespan
Param-like nodes.

Perhaps we should deal with this by generating a distinct type of expression
step, that looks up information about the param in a different place? Nothing
forces us to have the expression step look into

prm = &(econtext->ecxt_param_exec_vals[op->d.param.paramid]);

If we e.g. emitted a distinct step type for MULTIEXPR_SUBLINK, we could have
it look up params in llast(exprstate->parent->state->es_multilink) that we
push/pop on in nodeModifyTable.c. I think that should work, but it ain't
pretty.

A related idea is be to perform more of the necessary lookups during
expression compilation. If we figured out the execPlan node during expression
compilation, we'd not run into danger of looking up the wrong plan during
expression evaluation.

We already do look into estate during expression compilation for information
about the es_param_list_info, so this wouldn't be breaking new ground.

Another idea I'm toying with is that the expression compiler could
allocate some space when it sees a MULTIEXPR_SUBLINK, and then
connect up the multiexec Params to that.

Where are you thinking of getting the information for connecting the params
from? I don't think we currently have a good way to figure that out during
evaluation time, right?

Greetings,

Andres Freund

#14

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Andres Freund (#13)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

Andres Freund <andres@anarazel.de> writes:

Perhaps we should deal with this by generating a distinct type of expression
step, that looks up information about the param in a different place? Nothing
forces us to have the expression step look into
prm = &(econtext->ecxt_param_exec_vals[op->d.param.paramid]);

Right, where I was going was to have a distinct EEOP type that finds
the ParamExecData in some other way. The main question is where to keep
that not-so-global ParamExecData.

Another idea I'm toying with is that the expression compiler could
allocate some space when it sees a MULTIEXPR_SUBLINK, and then
connect up the multiexec Params to that.

Where are you thinking of getting the information for connecting the params
from? I don't think we currently have a good way to figure that out during
evaluation time, right?

It would have to look something like the forward-jump fixup logic,
that is keep track of unfinished Param-referencing steps and go back
to fill them in when it finds the driving MULTIEXPR_SUBLINK and allocates
some space to hold the output of that. A lot of details still to be
filled in there, but it doesn't seem very different from stuff we're
already doing.

regards, tom lane

#15

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Tom Lane (#14)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

Hi,

On 2023-02-21 19:00:07 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Perhaps we should deal with this by generating a distinct type of expression
step, that looks up information about the param in a different place? Nothing
forces us to have the expression step look into
prm = &(econtext->ecxt_param_exec_vals[op->d.param.paramid]);

Right, where I was going was to have a distinct EEOP type that finds
the ParamExecData in some other way. The main question is where to keep
that not-so-global ParamExecData.

We currently overwrite prm->execPlan in ExecInitSubPlan(), when creating a
second reference to the subplan. Which is why we then end up using the wrong
SubPlanState in ExecSetParamPlan().

The problem of using the wrong SubPlanState doesn't look too hard to solve: We
could stash the "actual" execPlan in scratch.d.param.something, instead of
looking it up during ExecEvalParamExec().

I think that'd not be quite enough, because due to sharing the same
ParamExecData, we wouldn't know when to recompute the plan.

It also looks like something might not yet quite compute/adjust the types
completely enough in execPartition.c...

Greetings,

Andres Freund

#16

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Andres Freund (#15)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

Hi,

On 2023-02-21 17:17:05 -0800, Andres Freund wrote:

We currently overwrite prm->execPlan in ExecInitSubPlan(), when creating a
second reference to the subplan. Which is why we then end up using the wrong
SubPlanState in ExecSetParamPlan().

The problem of using the wrong SubPlanState doesn't look too hard to solve: We
could stash the "actual" execPlan in scratch.d.param.something, instead of
looking it up during ExecEvalParamExec().

It's not quite that easy, because there can be references to subplans that
aren't subquery specific.

I did come up with a, very hacky at this point, prototype that does seems to
fix the issue.

The main problem here IMO is that
a) there can be many different ParamExecData->execPlans for a single param
b) we only want a single value for a PARAM_EXEC to be valid at a time

One solution to that is to move execPlans out of ParamExecData.

In the attached prototype I added a two dimensional list to EState. The first
level is indexed by the paramid, the second identifies partition specific
plans. For non-partition specific subplans it is 0, for partition specific
ones it is a new integer assigned sequentially within ExecInitPartitionInfo().

When generating a reference to a PARAM_EXEC parameter, the current
partition-specific offset is added to the expression step.

One significant complication is that we can't actually know at the point we
encounter the parameter whether the subplan is partition specific or not, or
at least it wasn't obvious to me how to do so. So ExecEvalParamExec() has to
try both.

Obviously the attached patch is in no way meant to be more than a proof of
concept.

I don't think we can move all of the responsibilities here to the planner - we
don't want to generate a plan that has pre-generated expressions, with
different param ids, for every potentially affected partition. So I do think
we need some runtime way of identifying the correct execPlan.

But I think the planner could make the executors job easier, by providing
conveniently accessible information about what a specific paramid is used
for. If we e.g. knew that a specific Param
a) will require execPlan processing
b) references an expression that is expected to be partition specific

we could figure out during expression compilation whether to make the
expression step reference the "query global" plan, or not. Instead of
deferring that to a runtime check in ExecEvalParamExec().

The attached really is just an exploration of the idea, not something anywhere
near a real fix.

It also doesn't yet address the issue with the wrong subplan chosen when the
planner expands partitions, as that doesn't go through
ExecInitPartitionInfo(). But I'm not sure that should be addressed by the same
mechanism - that seems a bit more the planner's responsibility. This means
we'll continue to fail to evaluate
explain (verbose, analyze) UPDATE t SET (a, b) = (SELECT t.a, t.b || '+')
except that now we'll print
WARNING: 01000: overwriting previous subplan exec plan: 0
LOCATION: ExecInitSubPlan, nodeSubplan.c:884
which probably ought to at least be an assertion failure if not a runtime
ERROR, once we hit that, things won't work reliably.

While hacking on this I found it helpful to have ruleutils/explain provide a
bit more detail:
- have references to subplans print the arguments
- deparse onConflictSet
- print subplans if evaluated in the context of a different node

Not sure if any of that is interesting enough to be worth doing outside of
hacking on code like this.

Greetings,

Andres Freund

#17

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Andres Freund (#16)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

Andres Freund <andres@anarazel.de> writes:

I did come up with a, very hacky at this point, prototype that does seems to
fix the issue.

Hmm ... this doesn't look very much like what I was imagining. Let
me draft a prototype and we can compare.

regards, tom lane

#18

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Tom Lane (#17)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

I wrote:

Hmm ... this doesn't look very much like what I was imagining. Let
me draft a prototype and we can compare.

Here's what I was thinking about. I didn't bother adding regression
test cases yet, but it fixes both of the symptoms Alexander found.

This looks pretty workable to me, and in particular I think it'd be
safe to backpatch (with some fields moved to the end of their structs
to satisfy ABI worries). We could then revert 3f7323cbb et al
in the back branches.

regards, tom lane

#19

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Tom Lane (#18)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

Hi,

On 2023-02-23 13:53:34 -0500, Tom Lane wrote:

I wrote:

Hmm ... this doesn't look very much like what I was imagining. Let
me draft a prototype and we can compare.

Here's what I was thinking about. I didn't bother adding regression
test cases yet, but it fixes both of the symptoms Alexander found.

I'm not sure I really like the design of the params being local to a single
ExprState, or even local to individual steps in the expression. It seems to
buy further into making MULTIEXPR a special case. Particularly because we here
don't actually need multiple values to live at the same time, we just need
multiple execPlan fields.

Doing that amount of additional work in ExecReadyExpr() seems worrisome to me
- looks like it'd trigger in a lot of expressions that won't need any
adjustment. We could easily short-circuit based on last_param not being set
though.

But perhaps we don't actually need the work in ExecReadyExpr()? What if we
moved private_exec_vals + a bitmap when to use it into ExprState? Afaict we
don't have cases where single paramid could be used multiple times within a
single expression?

I think that might also provide a better basis for redesigning CaseTestExpr
etc, they could use that ExprState local param array as well?

Perhaps the planner could at some point provide metadata about the params in a
query, e.g. whether they ought to be used in a expression-local (eventually
perhaps also node-local?) way, or query-wide way. Then we could emit a
dedicated expression step for each of the cases, which we can't easily right
now.

Greetings,

Andres Freund

#20

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Andres Freund (#19)

Re: BUG #17800: ON CONFLICT DO UPDATE fails to detect incompatible fields that leads to a server crash

Andres Freund <andres@anarazel.de> writes:

I'm not sure I really like the design of the params being local to a single
ExprState, or even local to individual steps in the expression. It seems to
buy further into making MULTIEXPR a special case.

Well, it *is* a special case, because it's (ab)using a mechanism
originally meant only for initplans to do something else. Maybe
we should throw out the whole implementation and start over, but
that line of thinking isn't going to lead to something back-patchable.
I'm not very sure what we would do fundamentally differently anyway.

In any case, I don't see what's the problem with Param values being
transmitted locally to an expression. As I hand-waved earlier,
things like CaseTestValue might profitably be treated that way.

Doing that amount of additional work in ExecReadyExpr() seems worrisome to me
- looks like it'd trigger in a lot of expressions that won't need any
adjustment. We could easily short-circuit based on last_param not being set
though.

Yeah, there's room for some optimization there, but I was trying
to minimize the amount of change to the data structures. One idea,
if we don't mind adding another pointer field to ExprState, is to
make a list of just the SubPlans that have private output arrays.
Then, in the vast majority of expressions, that list will be empty
and we needn't do anything much in ExecReadyExpr. I also contemplated
building some intermediate data structure to ease matching of Params
and SubPlans --- however, it's not real clear that that wouldn't cost
more than it saves. We aren't likely to have very many of these
SubPlans in any one query. (Even the specialized-list idea could be
a net loss once you consider the palloc overhead of making a list.
Maybe we should chain the interesting EEOP_SUBPLAN steps together,
similarly to what I did with EEOP_PARAM_EXEC? There's room for a
link field.)

But perhaps we don't actually need the work in ExecReadyExpr()? What if we
moved private_exec_vals + a bitmap when to use it into ExprState?

Maybe, but then you're adding runtime cost to EEOP_PARAM_EXEC execution
(to check the bitmap) to save compile cost. Doesn't sound promising.

You do have one good point here, which is that we don't really need N
private_exec_vals if there are multiple MULTIEXPR subplans --- they
could share one. But I'm not sure how much contortionism would be
involved in exploiting that observation.