SEGFAULT on a concurrent UPDATE of mix of local and foreign partitions
Hi,
Postgres SEGFAULT'ed on the UPDATE of mix of local and foreign partitions.
Initialization - see t.sql
For replaying this segfault just execute in parallel:
UPDATE test SET x = x - 1;
The problem was introduced by commit 1375422.
ExecUpdate has found a concurrently updated tuples and starts subplan
evaluation. This operation creates new EState for EPQState and sets
es_result_relations in NULL value. Next, ExecInitNode(subplan) is
launched and underlying ExecInitForeignScan tries to access to an
element of es_result_relations. This causes SEGFAULT.
I studied this problem shortly. I think, EPQState can use
es_result_relations of a parent EState. Patch in attachment fixes this.
check-world passed clearly.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
0001-Share-es_result_relations-of-parent-EState-in-EPQSta.patchtext/plain; charset=UTF-8; name=0001-Share-es_result_relations-of-parent-EState-in-EPQSta.patch; x-mac-creator=0; x-mac-type=0Download+5-4
backtrace.txttext/plain; charset=UTF-8; name=backtrace.txt; x-mac-creator=0; x-mac-type=0Download
On 06/08/2021 08:34, Andrey Lepikhov wrote:
Hi,
Postgres SEGFAULT'ed on the UPDATE of mix of local and foreign partitions.
Initialization - see t.sql
For replaying this segfault just execute in parallel:
UPDATE test SET x = x - 1;The problem was introduced by commit 1375422.
ExecUpdate has found a concurrently updated tuples and starts subplan
evaluation. This operation creates new EState for EPQState and sets
es_result_relations in NULL value. Next, ExecInitNode(subplan) is
launched and underlying ExecInitForeignScan tries to access to an
element of es_result_relations. This causes SEGFAULT.
Thanks for the report!
I studied this problem shortly. I think, EPQState can use
es_result_relations of a parent EState. Patch in attachment fixes this.
check-world passed clearly.
Hmm, that seems to work, but I have a few questions:
Can we be sure that the es_result_relations entry of the result relation
has been initialized when the EPQState is constructed? The general model
now is that you initialize them lazily, when needed.
Does it ever make sense to re-evaluate a direct Foreign Update as part
of EvalPlanQual? It doesn't seem sane to me, you cannot just run an
UPDATE statement again and expect it to be idempotent. So I think it
would make sense skip this during EvalPlanQual processing altogether.
And no need to call the FDW's Begin/EndDirectModify functions either.
It's not clear to me what guarantee there is that a direct-modify
Foreign Update/Delete node is never re-evaluated as part of
EvalPlanQual. Is it because the a ModifyTable node is always at the top
of the plan tree? Or is it because the FDW never uses direct-modify in
queries where EvalPlanQual might be needed? Or because we use
ROW_MARK_COPY in those cases?
In any case, re-evaluating a direct-modify statement doesn't seem sane,
so I propose the attached patch to add some runtime checks for that and
to avoid the original segfault.
- Heikki
Attachments:
0001-Fix-segfault-during-EvalPlanQual-with-mix-of-local-a.patchtext/x-patch; charset=UTF-8; name=0001-Fix-segfault-during-EvalPlanQual-with-mix-of-local-a.patchDownload+38-6
On 8/10/21 1:29 AM, Heikki Linnakangas wrote:
On 06/08/2021 08:34, Andrey Lepikhov wrote:
Hi,
Postgres SEGFAULT'ed on the UPDATE of mix of local and foreign
partitions.
Initialization - see t.sql
For replaying this segfault just execute in parallel:
UPDATE test SET x = x - 1;The problem was introduced by commit 1375422.
ExecUpdate has found a concurrently updated tuples and starts subplan
evaluation. This operation creates new EState for EPQState and sets
es_result_relations in NULL value. Next, ExecInitNode(subplan) is
launched and underlying ExecInitForeignScan tries to access to an
element of es_result_relations. This causes SEGFAULT.Thanks for the report!
I studied this problem shortly. I think, EPQState can use
es_result_relations of a parent EState. Patch in attachment fixes this.
check-world passed clearly.Hmm, that seems to work, but I have a few questions:
Can we be sure that the es_result_relations entry of the result relation
has been initialized when the EPQState is constructed? The general model
now is that you initialize them lazily, when needed.
As I understand the code correctly - yes, because a ModifyTable node
higher and will initialize this field earlier. But in many cases it is
not needed.
Does it ever make sense to re-evaluate a direct Foreign Update as part
of EvalPlanQual? It doesn't seem sane to me, you cannot just run an
UPDATE statement again and expect it to be idempotent. So I think it
would make sense skip this during EvalPlanQual processing altogether.
And no need to call the FDW's Begin/EndDirectModify functions either.
Agree. It is my first dive into the EPQ code. After studying i think
this makes no sense.
It's not clear to me what guarantee there is that a direct-modify
Foreign Update/Delete node is never re-evaluated as part of
EvalPlanQual. Is it because the a ModifyTable node is always at the top
of the plan tree? Or is it because the FDW never uses direct-modify in
queries where EvalPlanQual might be needed? Or because we use
ROW_MARK_COPY in those cases?
ModifyTable node may be found in an underlying CTE, for example. And,
right now, it's not clear for me too.
In any case, re-evaluating a direct-modify statement doesn't seem sane,
so I propose the attached patch to add some runtime checks for that and
to avoid the original segfaultYour patch looks much better.
--
regards,
Andrey Lepikhov
Postgres Professional
On 10/08/2021 12:18, Andrey V. Lepikhov wrote:
On 8/10/21 1:29 AM, Heikki Linnakangas wrote:
Does it ever make sense to re-evaluate a direct Foreign Update as part
of EvalPlanQual? It doesn't seem sane to me, you cannot just run an
UPDATE statement again and expect it to be idempotent. So I think it
would make sense skip this during EvalPlanQual processing altogether.
And no need to call the FDW's Begin/EndDirectModify functions either.Agree. It is my first dive into the EPQ code. After studying i think
this makes no sense.
Pushed the fix to master and v14. Thanks for the report!
- Heikki