TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

Started by Masahiko Sawada8 months ago42 messagesbugs
Jump to latest
#1Masahiko Sawada
sawada.mshk@gmail.com

Hi all,

Kristian Lejao (colleague, in CC) has found the following assertion
failure in postgres_fdw.c when rechecking the result tuple via
EvalPlanQual():

TRAP: failed Assert("outerPlan != NULL"), File: "postgres_fdw.c",
Line: 2366, PID: 2043518

Here is the reproducible steps that I've simplified from the one
Kristian originally created:

1. setup local node:

create extension postgres_fdw;
create server srv foreign data wrapper postgres_fdw options (host
'localhost', port '5433', dbname 'postgres');
create user mapping for public server srv;
create table a (i int primary key);
create foreign table b (i int) server srv;
create foreign table c (i int) server srv;
insert into a values (1);

2. setup remote node:

create table b (i int);
create table c (i int);
insert into b values (1);
insert into c values (1);

3. attach to the backend process started on the local node (say conn1)
using gdb and set breakpoint at table_tuple_lock().

4. run the following query on conn1 (which stops before locking the
result tuple):

select a.i,
(select 1 from b, c where a.i = b.i and b.i = c.i)
from a
for update;

5. on another session, update the tuple concurrently:

update a set i = i + 1; -- update 1 tuple

6. continue the query on conn1, the server crashes due to the assertion failure.

The plan of the FOR UPDATE query lead to this issue is:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
LockRows (cost=0.00..615886.00 rows=2550 width=14)
Output: a.i, ((SubPlan 1)), a.ctid
-> Seq Scan on public.a (cost=0.00..615860.50 rows=2550 width=14)
Output: a.i, (SubPlan 1), a.ctid
SubPlan 1
-> Foreign Scan (cost=100.00..241.50 rows=225 width=4)
Output: 1
Relations: (public.b) INNER JOIN (public.c)
Remote SQL: SELECT NULL FROM (public.b r1 INNER JOIN
public.c r2 ON (((r2.i = $1::integer)) AND ((r1.i = $1::integer))))
(9 rows)

The point is that in the subquery in the target list we pushed the
inner join to the foreign server. In postgresGetForeignJoinPaths(), we
prepare the join path for EvalPlanQual() check (and used in
postgresRecheckForeignScan()) if the query is DELETE, UPDATE, or FOR
UPDATE/SHARE (as shown below) but we skip it since the subquery itself
is parsed as a normal SELECT query without rowMarks, leaving
fdw_outerpath of the ForeignScan node NULL:

/*
* If there is a possibility that EvalPlanQual will be executed, we need
* to be able to reconstruct the row using scans of the base relations.
* GetExistingLocalJoinPath will find a suitable path for this purpose in
* the path list of the joinrel, if one exists. We must be careful to
* call it before adding any ForeignPath, since the ForeignPath might
* dominate the only suitable local path available. We also do it before
* calling foreign_join_ok(), since that function updates fpinfo and marks
* it as pushable if the join is found to be pushable.
*/
if (root->parse->commandType == CMD_DELETE ||
root->parse->commandType == CMD_UPDATE ||
root->rowMarks)
{
epq_path = GetExistingLocalJoinPath(joinrel);

Therefore, if the tuple is concurrently updated before taking a lock,
we recheck the traversed tuple via EvalPlanQual() but we end up with
the assertion failure since we didn't prepare the join plan for that.

The attached patch includes the draft fix and regression tests (using
injection points).

I don't have enough experience with the planner and FDW code area to
evaluate whether the patch fixes the issue in the right approach.
Feedback is very welcome. I've confirmed this assertion could happen
with the same scenario on all supported branches.

In addition to that, I realized that none of the regression tests
execute postgresRecheckForeignScan()[1]https://coverage.postgresql.org/contrib/postgres_fdw/postgres_fdw.c.gcov.html#2354(). I think we need to add
regression tests to cover that function.

Regards,

[1]: https://coverage.postgresql.org/contrib/postgres_fdw/postgres_fdw.c.gcov.html#2354()

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

0001-Fix-assertion-failure-in-postgresGetForeignJoinPaths.patchapplication/x-patch; name=0001-Fix-assertion-failure-in-postgresGetForeignJoinPaths.patchDownload+114-4
#2Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Masahiko Sawada (#1)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

Sawada-san,

On Wed, Aug 6, 2025 at 3:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Kristian Lejao (colleague, in CC) has found the following assertion
failure in postgres_fdw.c when rechecking the result tuple via
EvalPlanQual():

TRAP: failed Assert("outerPlan != NULL"), File: "postgres_fdw.c",
Line: 2366, PID: 2043518

Here is the reproducible steps that I've simplified from the one
Kristian originally created:

[...]

The point is that in the subquery in the target list we pushed the
inner join to the foreign server. In postgresGetForeignJoinPaths(), we
prepare the join path for EvalPlanQual() check (and used in
postgresRecheckForeignScan()) if the query is DELETE, UPDATE, or FOR
UPDATE/SHARE (as shown below) but we skip it since the subquery itself
is parsed as a normal SELECT query without rowMarks, leaving
fdw_outerpath of the ForeignScan node NULL:

/*
* If there is a possibility that EvalPlanQual will be executed, we need
* to be able to reconstruct the row using scans of the base relations.
* GetExistingLocalJoinPath will find a suitable path for this purpose in
* the path list of the joinrel, if one exists. We must be careful to
* call it before adding any ForeignPath, since the ForeignPath might
* dominate the only suitable local path available. We also do it before
* calling foreign_join_ok(), since that function updates fpinfo and marks
* it as pushable if the join is found to be pushable.
*/
if (root->parse->commandType == CMD_DELETE ||
root->parse->commandType == CMD_UPDATE ||
root->rowMarks)
{
epq_path = GetExistingLocalJoinPath(joinrel);

Therefore, if the tuple is concurrently updated before taking a lock,
we recheck the traversed tuple via EvalPlanQual() but we end up with
the assertion failure since we didn't prepare the join plan for that.

The attached patch includes the draft fix and regression tests (using
injection points).

I don't have enough experience with the planner and FDW code area to
evaluate whether the patch fixes the issue in the right approach.
Feedback is very welcome. I've confirmed this assertion could happen
with the same scenario on all supported branches.

Will review. Thank you for the report and patch!

In addition to that, I realized that none of the regression tests
execute postgresRecheckForeignScan()[1]. I think we need to add
regression tests to cover that function.

Yeah, I think so too.

Best regards,
Etsuro Fujita

#3Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#1)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

On Tue, Aug 05, 2025 at 11:00:54AM -0700, Masahiko Sawada wrote:

The attached patch includes the draft fix and regression tests (using
injection points).

+$psql_session->query_safe(qq[
+    select injection_points_set_local();
+    select injection_points_attach('heapam_lock_tuple-before-lock', 'wait');
+]);

It seems to me that an isolation test would be a better fit here. TAP
tests are usually a good fit for injection points if you need to do
direct node manipulations, like restarts or stops. The whole test
posted only does SQL-ish things, and you could rely on a loopback
server as we do in the SQL tests of postgres_fdw?
--
Michael

#4Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#3)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

On Wed, Aug 6, 2025 at 8:39 PM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Aug 05, 2025 at 11:00:54AM -0700, Masahiko Sawada wrote:

The attached patch includes the draft fix and regression tests (using
injection points).

+$psql_session->query_safe(qq[
+    select injection_points_set_local();
+    select injection_points_attach('heapam_lock_tuple-before-lock', 'wait');
+]);

It seems to me that an isolation test would be a better fit here. TAP
tests are usually a good fit for injection points if you need to do
direct node manipulations, like restarts or stops. The whole test
posted only does SQL-ish things, and you could rely on a loopback
server as we do in the SQL tests of postgres_fdw?

Yes, it's definitely possible to create the test using isolation. I
wasn't sure how to invoke isolation tests only when injection points
are enabled.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#5Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#4)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

On Thu, Aug 07, 2025 at 10:29:41AM -0700, Masahiko Sawada wrote:

Yes, it's definitely possible to create the test using isolation. I
wasn't sure how to invoke isolation tests only when injection points
are enabled.

For meson, I think that I would tweak the isolation test list based on
get_option('injection_points'), and do a similar thing for Makefile.
--
Michael

#6Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#5)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

On Thu, Aug 7, 2025 at 4:39 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Aug 07, 2025 at 10:29:41AM -0700, Masahiko Sawada wrote:

Yes, it's definitely possible to create the test using isolation. I
wasn't sure how to invoke isolation tests only when injection points
are enabled.

For meson, I think that I would tweak the isolation test list based on
get_option('injection_points'), and do a similar thing for Makefile.

Thank you for the advice! I've changed the regression tests to use
isolation tests.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v2-0001-Fix-assertion-failure-in-postgresGetForeignJoinPa.patchapplication/octet-stream; name=v2-0001-Fix-assertion-failure-in-postgresGetForeignJoinPa.patchDownload+114-5
#7Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#6)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

On Fri, Aug 08, 2025 at 12:15:19AM -0700, Masahiko Sawada wrote:

On Thu, Aug 7, 2025 at 4:39 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Aug 07, 2025 at 10:29:41AM -0700, Masahiko Sawada wrote:

Yes, it's definitely possible to create the test using isolation. I
wasn't sure how to invoke isolation tests only when injection points
are enabled.

For meson, I think that I would tweak the isolation test list based on
get_option('injection_points'), and do a similar thing for Makefile.

Thank you for the advice! I've changed the regression tests to use
isolation tests.

That's enough to reproduce the MULL pointer dereference, thanks!

I would suggest to add a description at the top of foreign_recheck,
documenting the purpose of the test.

+# "s0_lock" execute a FOR UPDATE query but it stops before locking the result
s/execute/executes/, with an 's'.

Fujita-san, are you planning to look at the proposal?
--
Michael

#8Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#7)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

On Mon, Aug 18, 2025 at 03:58:45PM +0900, Michael Paquier wrote:

That's enough to reproduce the MULL pointer dereference, thanks!

I would suggest to add a description at the top of foreign_recheck,
documenting the purpose of the test.

+# "s0_lock" execute a FOR UPDATE query but it stops before locking the result
s/execute/executes/, with an 's'.

Another thing that I've noticed has been forgotten: postgres_fdw's
.gitignore needs entries for output_iso/ and tmp_check_iso/. Based on
git-prompt, GIT_PS1_SHOWUNTRACKEDFILES reports these paths after a
test run.
--
Michael

#9Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Michael Paquier (#7)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

Hi,

On Mon, Aug 18, 2025 at 3:58 PM Michael Paquier <michael@paquier.xyz> wrote:

Fujita-san, are you planning to look at the proposal?

I have just started reviewing the patch. I will provide my first
comments by this Friday at the very latest.

Best regards,
Etsuro Fujita

#10Michael Paquier
michael@paquier.xyz
In reply to: Etsuro Fujita (#9)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

On Mon, Aug 18, 2025 at 05:42:58PM +0900, Etsuro Fujita wrote:

I have just started reviewing the patch. I will provide my first
comments by this Friday at the very latest.

Great, thanks a lot!
--
Michael

#11Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#8)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

On Mon, Aug 18, 2025 at 12:02 AM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Aug 18, 2025 at 03:58:45PM +0900, Michael Paquier wrote:

That's enough to reproduce the MULL pointer dereference, thanks!

I would suggest to add a description at the top of foreign_recheck,
documenting the purpose of the test.

+# "s0_lock" execute a FOR UPDATE query but it stops before locking the result
s/execute/executes/, with an 's'.

Another thing that I've noticed has been forgotten: postgres_fdw's
.gitignore needs entries for output_iso/ and tmp_check_iso/. Based on
git-prompt, GIT_PS1_SHOWUNTRACKEDFILES reports these paths after a
test run.

Thank you for reviewing the patch! I'll incorporate these comments in
the next version patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#12Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Etsuro Fujita (#2)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

On Wed, Aug 6, 2025 at 8:30 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Wed, Aug 6, 2025 at 3:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

The attached patch includes the draft fix and regression tests (using
injection points).

I don't have enough experience with the planner and FDW code area to
evaluate whether the patch fixes the issue in the right approach.
Feedback is very welcome. I've confirmed this assertion could happen
with the same scenario on all supported branches.

Will review. Thank you for the report and patch!

First, my apologies for the delay.

I reviewed the postgres_fdw.c part of the fix:

@@ -6313,9 +6314,18 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
     * calling foreign_join_ok(), since that function updates fpinfo and marks
     * it as pushable if the join is found to be pushable.
     */
-   if (root->parse->commandType == CMD_DELETE ||
-       root->parse->commandType == CMD_UPDATE ||
-       root->rowMarks)
+   for (PlannerInfo *proot = root; proot != NULL; proot = proot->parent_root)
+   {
+       if (proot->parse->commandType == CMD_DELETE ||
+           proot->parse->commandType == CMD_UPDATE ||
+           proot->rowMarks)
+       {
+           need_epq = true;
+           break;
+       }
+   }
+
+   if (need_epq)
    {
        epq_path = GetExistingLocalJoinPath(joinrel);
        if (!epq_path)

I think this successfully avoids the assertion failure and produces
the correct result, but sorry, I don't think it's the right way to go.
I think the root cause of this issue is in the EPQ handling of
foreign/custom joins in ExecScanFetch() in execScan.h: it runs the
recheck method function for a given foreign/custom join whenever it is
called for EPQ rechecking, but that is not 100% correct. I think the
correct handling is: run the recheck method function for the join if
it is called for EPQ rechecking and the join is a *descendant* join in
the EPQ plan tree; otherwise run the access method function for the
join even if it is called for EPQ rechecking, like the attached (where
I used the epqParam of the given EPQState to determine whether the
join is a descendant join or not, which localizes the fix pretty
well). For the SELECT FOR UPDATE query shown upthread, when doing an
EPQ recheck, the fix evaluates the sub-select expression in the target
list by doing a remote join, not a local join, so it would work more
efficiently than the fix you proposed.

Sorry, I have not yet reviewed the isolation test part. Will do next.

Best regards,
Etsuro Fujita

Attachments:

Fix-EPQ-handling-of-foreign-joins-in-ExecScanFetch.patchapplication/octet-stream; name=Fix-EPQ-handling-of-foreign-joins-in-ExecScanFetch.patchDownload+15-7
#13Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Etsuro Fujita (#12)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

On Fri, Aug 22, 2025 at 3:59 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Wed, Aug 6, 2025 at 8:30 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Wed, Aug 6, 2025 at 3:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

The attached patch includes the draft fix and regression tests (using
injection points).

I don't have enough experience with the planner and FDW code area to
evaluate whether the patch fixes the issue in the right approach.
Feedback is very welcome. I've confirmed this assertion could happen
with the same scenario on all supported branches.

Will review. Thank you for the report and patch!

First, my apologies for the delay.

I reviewed the postgres_fdw.c part of the fix:

@@ -6313,9 +6314,18 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
* calling foreign_join_ok(), since that function updates fpinfo and marks
* it as pushable if the join is found to be pushable.
*/
-   if (root->parse->commandType == CMD_DELETE ||
-       root->parse->commandType == CMD_UPDATE ||
-       root->rowMarks)
+   for (PlannerInfo *proot = root; proot != NULL; proot = proot->parent_root)
+   {
+       if (proot->parse->commandType == CMD_DELETE ||
+           proot->parse->commandType == CMD_UPDATE ||
+           proot->rowMarks)
+       {
+           need_epq = true;
+           break;
+       }
+   }
+
+   if (need_epq)
{
epq_path = GetExistingLocalJoinPath(joinrel);
if (!epq_path)

I think this successfully avoids the assertion failure and produces
the correct result, but sorry, I don't think it's the right way to go.
I think the root cause of this issue is in the EPQ handling of
foreign/custom joins in ExecScanFetch() in execScan.h: it runs the
recheck method function for a given foreign/custom join whenever it is
called for EPQ rechecking, but that is not 100% correct. I think the
correct handling is: run the recheck method function for the join if
it is called for EPQ rechecking and the join is a *descendant* join in
the EPQ plan tree; otherwise run the access method function for the
join even if it is called for EPQ rechecking, like the attached (where
I used the epqParam of the given EPQState to determine whether the
join is a descendant join or not, which localizes the fix pretty
well). For the SELECT FOR UPDATE query shown upthread, when doing an
EPQ recheck, the fix evaluates the sub-select expression in the target
list by doing a remote join, not a local join, so it would work more
efficiently than the fix you proposed.

Thank you for reviewing the patch! I've confirmed that your patch
fixes the issue too.

If I understand your proposed fix correctly, the reported problem is
fixed by not rechecking the test tuple by ForeignRecheck() (performing
a local join), but instead we simply call ForeignNext() and get the
next tuple. I think while this fix would have to have a regression
test like I've proposed, it's probably a good time to add some
regression tests to cover postgresRecheckForeignScan() too possibly as
a separate patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#14Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Masahiko Sawada (#13)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

On Sat, Aug 23, 2025 at 5:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Aug 22, 2025 at 3:59 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

I reviewed the postgres_fdw.c part of the fix:

@@ -6313,9 +6314,18 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
* calling foreign_join_ok(), since that function updates fpinfo and marks
* it as pushable if the join is found to be pushable.
*/
-   if (root->parse->commandType == CMD_DELETE ||
-       root->parse->commandType == CMD_UPDATE ||
-       root->rowMarks)
+   for (PlannerInfo *proot = root; proot != NULL; proot = proot->parent_root)
+   {
+       if (proot->parse->commandType == CMD_DELETE ||
+           proot->parse->commandType == CMD_UPDATE ||
+           proot->rowMarks)
+       {
+           need_epq = true;
+           break;
+       }
+   }
+
+   if (need_epq)
{
epq_path = GetExistingLocalJoinPath(joinrel);
if (!epq_path)

I think this successfully avoids the assertion failure and produces
the correct result, but sorry, I don't think it's the right way to go.
I think the root cause of this issue is in the EPQ handling of
foreign/custom joins in ExecScanFetch() in execScan.h: it runs the
recheck method function for a given foreign/custom join whenever it is
called for EPQ rechecking, but that is not 100% correct. I think the
correct handling is: run the recheck method function for the join if
it is called for EPQ rechecking and the join is a *descendant* join in
the EPQ plan tree; otherwise run the access method function for the
join even if it is called for EPQ rechecking, like the attached (where
I used the epqParam of the given EPQState to determine whether the
join is a descendant join or not, which localizes the fix pretty
well). For the SELECT FOR UPDATE query shown upthread, when doing an
EPQ recheck, the fix evaluates the sub-select expression in the target
list by doing a remote join, not a local join, so it would work more
efficiently than the fix you proposed.

Thank you for reviewing the patch! I've confirmed that your patch
fixes the issue too.

Thanks for testing!

If I understand your proposed fix correctly, the reported problem is
fixed by not rechecking the test tuple by ForeignRecheck() (performing
a local join), but instead we simply call ForeignNext() and get the
next tuple.

That's right.

I think while this fix would have to have a regression
test like I've proposed, it's probably a good time to add some
regression tests to cover postgresRecheckForeignScan() too possibly as
a separate patch.

Agreed. Actually, we have fixed many EvalPlanQual issues with
postgres_fdw so far, but have always left adding test cases for future
work, so let's do that now!

The test case you showed upthread and added to the patch is useful, so
I think we should add it as well, but my question about it is: is it
really a good idea to use injection points? Why don't you just use
BEGIN/COMMIT statements like this:

-- session 1
begin isolation level read committed;
update a set i = i + 1;

-- session 2
begin isolation level read committed;
select a.i,
(select 1 from b, c where a.i = b.i and b.i = c.i)
from a
for update; -- waits for the transaction in session 1 to complete

-- session 1
commit;

-- session 2
select a.i,
(select 1 from b, c where a.i = b.i and b.i = c.i)
from a
for update; -- produces results after doing an EvalPlanQual recheck
i | ?column?
---+----------
2 |
(1 row)

Again, my apologies for the late response.

Best regards,
Etsuro Fujita

#15Michael Paquier
michael@paquier.xyz
In reply to: Etsuro Fujita (#14)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

On Wed, Sep 17, 2025 at 07:42:36PM +0900, Etsuro Fujita wrote:

The test case you showed upthread and added to the patch is useful, so
I think we should add it as well, but my question about it is: is it
really a good idea to use injection points?

Why don't you just use BEGIN/COMMIT statements like this:

-- session 1
begin isolation level read committed;
update a set i = i + 1;

-- session 2
begin isolation level read committed;
select a.i,
(select 1 from b, c where a.i = b.i and b.i = c.i)
from a
for update; -- waits for the transaction in session 1 to complete

-- session 1
commit;

-- session 2
select a.i,
(select 1 from b, c where a.i = b.i and b.i = c.i)
from a
for update; -- produces results after doing an EvalPlanQual recheck
i | ?column?
---+----------
2 |
(1 row)

Again, my apologies for the late response.

As far as I can see, this causes the SELECT FOR UPDATE of session 2
that's waiting for the commit of session 1 to crash, if we don't have
the fix, of course. Removing the dependency with injection points is
nice if we don't require it, so we can just tweak the isolation test
proposed upthread to use the same schema, but the queries you are
suggesting.

As a whole, +1 for your suggestion.
--
Michael

#16Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#15)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

On Wed, Sep 17, 2025 at 5:25 PM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Sep 17, 2025 at 07:42:36PM +0900, Etsuro Fujita wrote:

The test case you showed upthread and added to the patch is useful, so
I think we should add it as well, but my question about it is: is it
really a good idea to use injection points?

Why don't you just use BEGIN/COMMIT statements like this:

-- session 1
begin isolation level read committed;
update a set i = i + 1;

-- session 2
begin isolation level read committed;
select a.i,
(select 1 from b, c where a.i = b.i and b.i = c.i)
from a
for update; -- waits for the transaction in session 1 to complete

-- session 1
commit;

-- session 2
select a.i,
(select 1 from b, c where a.i = b.i and b.i = c.i)
from a
for update; -- produces results after doing an EvalPlanQual recheck
i | ?column?
---+----------
2 |
(1 row)

Again, my apologies for the late response.

As far as I can see, this causes the SELECT FOR UPDATE of session 2
that's waiting for the commit of session 1 to crash, if we don't have
the fix, of course. Removing the dependency with injection points is
nice if we don't require it, so we can just tweak the isolation test
proposed upthread to use the same schema, but the queries you are
suggesting.

+1

I changed the regression tests and used the fix proposed by
Fujita-san. Please review the attached new version patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v3-0001-Fix-assertion-failure-in-postgresGetForeignJoinPa.patchapplication/octet-stream; name=v3-0001-Fix-assertion-failure-in-postgresGetForeignJoinPa.patchDownload+97-9
#17Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#16)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

On Tue, Sep 23, 2025 at 02:41:45PM -0700, Masahiko Sawada wrote:

I changed the regression tests and used the fix proposed by
Fujita-san. Please review the attached new version patch.

Thanks for the updated patch. Confirmed that I can still blow up the
backend as expected when not using the fix, only the test.
--
Michael

#18Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Masahiko Sawada (#16)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

On Wed, Sep 24, 2025 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I changed the regression tests and used the fix proposed by
Fujita-san. Please review the attached new version patch.

Thanks for updating the patch!

I took a quick glance at the patch. My initial comment is: it only
includes the test case discussed here, but I think it's a good idea to
add more cases in it (that exercise code paths in
postgresRecheckForeignScan), as I mentioned upthread. What do you
think about that? If no objections, I'd like to update it to include
such cases.

Anyway I'll review it in more detail.

Best regards,
Etsuro Fujita

#19Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Etsuro Fujita (#18)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

On Wed, Sep 24, 2025 at 3:28 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Wed, Sep 24, 2025 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I changed the regression tests and used the fix proposed by
Fujita-san. Please review the attached new version patch.

Thanks for updating the patch!

I took a quick glance at the patch. My initial comment is: it only
includes the test case discussed here, but I think it's a good idea to
add more cases in it (that exercise code paths in
postgresRecheckForeignScan), as I mentioned upthread. What do you
think about that? If no objections, I'd like to update it to include
such cases.

+1 to add more test cases for the code paths in postgresRecheckForeignScan.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#20Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#19)
Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

On Wed, Sep 24, 2025 at 09:45:26AM -0700, Masahiko Sawada wrote:

On Wed, Sep 24, 2025 at 3:28 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

I took a quick glance at the patch. My initial comment is: it only
includes the test case discussed here, but I think it's a good idea to
add more cases in it (that exercise code paths in
postgresRecheckForeignScan), as I mentioned upthread. What do you
think about that? If no objections, I'd like to update it to include
such cases.

+1 to add more test cases for the code paths in postgresRecheckForeignScan.

+1.  Would these be included in the same isolation test?  That may be
cleaner, especially as it seems like the EPQ re-evaluations should be
able to happen with the same schema as the one set up by the test.
--
Michael
#21Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Michael Paquier (#20)
#22Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Etsuro Fujita (#18)
#23Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Alvaro Herrera (#22)
#24Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Masahiko Sawada (#16)
#25Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Etsuro Fujita (#24)
#26Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Etsuro Fujita (#25)
#27Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#26)
#28Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Masahiko Sawada (#26)
#29Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Etsuro Fujita (#28)
#30Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Masahiko Sawada (#29)
#31Michael Paquier
michael@paquier.xyz
In reply to: Etsuro Fujita (#30)
#32Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Michael Paquier (#31)
#33Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Etsuro Fujita (#32)
#34Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#33)
#35Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Michael Paquier (#34)
#36Michael Paquier
michael@paquier.xyz
In reply to: Etsuro Fujita (#35)
#37Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Michael Paquier (#36)
#38Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Etsuro Fujita (#37)
#39Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#38)
#40Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Michael Paquier (#39)
#41Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Etsuro Fujita (#40)
#42Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#41)