BUG #17344: Assert failed on queiring async_capable foreign table with inheritance
The following bug has been logged on the website:
Bug reference: 17344
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 14.1
Operating system: Ubuntu 20.04
Description:
When executing the following query:
create extension postgres_fdw;
do $d$
begin
execute $$create server loopback foreign data wrapper postgres_fdw
options (dbname '$$||current_database()||$$',
port '$$||current_setting('port')||$$',
async_capable 'true'
)$$;
end;
$d$;
create user mapping for current_user server loopback;
create table tab1 (a int);
insert into tab1 values(1), (2), (3);
create foreign table ftab1 (b int) server loopback options (table_name
'tab1');
create foreign table ftab2 () inherits (ftab1) server loopback;
select a from tab1 where a in (select a from ftab1);
I get an assertion failed with the stacktrace:
Core was generated by `postgres: law regression [local] SELECT
'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f2a2f9fa859 in __GI_abort () at abort.c:79
#2 0x0000557dece3c5d8 in ExceptionalCondition
(conditionName=conditionName@entry=0x7f2a225a8da7 "areq != pendingAreq",
errorType=errorType@entry=0x7f2a225a76aa "FailedAssertion",
fileName=0x7ffde2262540 "\271\305\343\354}U",
fileName@entry=0x7f2a225a88af "postgres_fdw.c",
lineNumber=lineNumber@entry=6974) at assert.c:69
#3 0x00007f2a225a4e34 in produce_tuple_asynchronously (areq=<optimized
out>, fetch=<optimized out>)
at postgres_fdw.c:6974
#4 0x0000557decb6e152 in ExecAsyncRequest (areq=0x557ded812670) at
execAsync.c:38
#5 0x0000557decb94e4e in ExecAppendAsyncBegin (node=0x557ded7f2040) at
nodeAppend.c:907
#6 ExecAppend (pstate=0x557ded7f2040) at nodeAppend.c:317
#7 0x0000557decbb0183 in ExecProcNode (node=0x557ded7f2040) at
../../../src/include/executor/executor.h:257
#8 ExecResult (pstate=0x557ded7f1e38) at nodeResult.c:115
#9 0x0000557decbb263d in ExecProcNode (node=0x557ded7f1e38) at
../../../src/include/executor/executor.h:257
#10 ExecScanSubPlan (isNull=0x557ded81ba35, econtext=0x557ded81b7e8,
node=0x557ded81c2e0) at nodeSubplan.c:323
#11 ExecSubPlan (node=node@entry=0x557ded81c2e0,
econtext=econtext@entry=0x557ded81b7e8, isNull=0x557ded81ba35)
at nodeSubplan.c:89
#12 0x0000557decb77beb in ExecEvalSubPlan (econtext=0x557ded81b7e8,
op=<optimized out>, state=0x557ded81ba30)
at execExprInterp.c:3932
#13 ExecInterpExpr (state=0x557ded81ba30, econtext=0x557ded81b7e8,
isnull=<optimized out>) at execExprInterp.c:1564
#14 0x0000557decb865d5 in ExecEvalExprSwitchContext (isNull=0x7ffde2262ad7,
econtext=0x557ded81b7e8,
state=0x557ded81ba30) at ../../../src/include/executor/executor.h:339
#15 ExecQual (econtext=0x557ded81b7e8, state=0x557ded81ba30) at
../../../src/include/executor/executor.h:408
#16 ExecScan (node=0x557ded7f3748, accessMtd=0x557decbb0ad0 <SeqNext>,
recheckMtd=0x557decbb0ac0 <SeqRecheck>)
at execScan.c:227
#17 0x0000557decb7cc7a in ExecProcNode (node=0x557ded7f3748) at
../../../src/include/executor/executor.h:257
#18 ExecutePlan (execute_once=<optimized out>, dest=0x557ded80a428,
direction=<optimized out>, numberTuples=0,
sendTuples=<optimized out>, operation=CMD_SELECT,
use_parallel_mode=<optimized out>, planstate=0x557ded7f3748,
estate=0x557ded7f1b90) at execMain.c:1551
#19 standard_ExecutorRun (queryDesc=0x557ded800c20, direction=<optimized
out>, count=0, execute_once=<optimized out>)
at execMain.c:361
#20 0x0000557decd18b9c in PortalRunSelect (portal=0x557ded756860,
forward=<optimized out>, count=0,
dest=<optimized out>) at pquery.c:921
#21 0x0000557decd1a2cb in PortalRun (portal=portal@entry=0x557ded756860,
count=count@entry=9223372036854775807,
isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true,
dest=dest@entry=0x557ded80a428,
altdest=altdest@entry=0x557ded80a428, qc=0x7ffde2262d80) at
pquery.c:765
#22 0x0000557decd1614b in exec_simple_query (
query_string=0x557ded6e7960 "select a from tab1 where a in (select a
from ftab1);") at postgres.c:1214
#23 0x0000557decd17d51 in PostgresMain (argc=argc@entry=1,
argv=argv@entry=0x7ffde22631f0, dbname=<optimized out>,
username=<optimized out>) at postgres.c:4486
#24 0x0000557decc83c8d in BackendRun (port=0x557ded70b3d0,
port=0x557ded70b3d0) at postmaster.c:4530
#25 BackendStartup (port=0x557ded70b3d0) at postmaster.c:4252
#26 ServerLoop () at postmaster.c:1745
#27 0x0000557decc84c31 in PostmasterMain (argc=<optimized out>,
argv=<optimized out>) at postmaster.c:1417
#28 0x0000557dec9b24b2 in main (argc=3, argv=0x557ded6e1990) at main.c:209
With async_capable = 'false' I get expected results.
But when ftab1 defined as
create foreign table ftab1 (a int) server loopback options (table_name
'tab1');
The select fails with:
psql:query.sql:19: ERROR: could not connect to server "loopback"
DETAIL: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL:
sorry, too many clients already
CONTEXT: remote SQL command: SELECT a FROM public.ftab2
remote SQL command: SELECT a FROM public.ftab2
remote SQL command: SELECT a FROM public.ftab2
remote SQL command: SELECT a FROM public.ftab2
...
with or without async_capable.
On Sat, Dec 25, 2021 at 10:00:01AM +0000, PG Bug reporting form wrote:
create extension postgres_fdw;
do $d$
begin
execute $$create server loopback foreign data wrapper postgres_fdw
options (dbname '$$||current_database()||$$',
port '$$||current_setting('port')||$$',
async_capable 'true'
)$$;
end;
$d$;
create user mapping for current_user server loopback;create table tab1 (a int);
insert into tab1 values(1), (2), (3);
create foreign table ftab1 (b int) server loopback options (table_name
'tab1');create foreign table ftab2 () inherits (ftab1) server loopback;
select a from tab1 where a in (select a from ftab1);With async_capable = 'false' I get expected results.
But when ftab1 defined as
create foreign table ftab1 (a int) server loopback options (table_name
'tab1');
The select fails with:
psql:query.sql:19: ERROR: could not connect to server "loopback"
DETAIL: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL:
sorry, too many clients already
CONTEXT: remote SQL command: SELECT a FROM public.ftab2
remote SQL command: SELECT a FROM public.ftab2
remote SQL command: SELECT a FROM public.ftab2
remote SQL command: SELECT a FROM public.ftab2
Apparently in both cases the second foreign scan of append ends up in
recursion:
-> Foreign Scan on public.ftab2 ftab1_2
Remote SQL: SELECT NULL FROM public.ftab2
This leads to a chain of the same cursor declarations:
DECLARE c1 CURSOR FOR SELECT a FROM public.ftab4
With a pattern:
bind -> new backend with the same declare -> execute -> bind -> ...
In the case of async foreign scan it hits limitation when no new
requests be sent while there is a pending one. In the second case it
hits connection limit. Shouldn't this type of foreign paths be avoided
by FDW?
On Sun, Dec 26, 2021 at 8:26 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
On Sat, Dec 25, 2021 at 10:00:01AM +0000, PG Bug reporting form wrote:
create extension postgres_fdw;
do $d$
begin
execute $$create server loopback foreign data wrapper postgres_fdw
options (dbname '$$||current_database()||$$',
port '$$||current_setting('port')||$$',
async_capable 'true'
)$$;
end;
$d$;
create user mapping for current_user server loopback;create table tab1 (a int);
insert into tab1 values(1), (2), (3);
create foreign table ftab1 (b int) server loopback options (table_name
'tab1');create foreign table ftab2 () inherits (ftab1) server loopback;
select a from tab1 where a in (select a from ftab1);With async_capable = 'false' I get expected results.
But when ftab1 defined as
create foreign table ftab1 (a int) server loopback options (table_name
'tab1');
The select fails with:
psql:query.sql:19: ERROR: could not connect to server "loopback"
DETAIL: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL:
sorry, too many clients already
CONTEXT: remote SQL command: SELECT a FROM public.ftab2
remote SQL command: SELECT a FROM public.ftab2
remote SQL command: SELECT a FROM public.ftab2
remote SQL command: SELECT a FROM public.ftab2
Reproduced here.
Apparently in both cases the second foreign scan of append ends up in
recursion:-> Foreign Scan on public.ftab2 ftab1_2
Remote SQL: SELECT NULL FROM public.ftab2This leads to a chain of the same cursor declarations:
DECLARE c1 CURSOR FOR SELECT a FROM public.ftab4
With a pattern:
bind -> new backend with the same declare -> execute -> bind -> ...
In the case of async foreign scan it hits limitation when no new
requests be sent while there is a pending one.
Yeah, but when executing a simple query “select NULL from ftab1”,
which would essentially be the same as the sublink subquery used in
the first case, I don’t get the assertion failure. (The simple query
only throws the connection limit error.) The root cause of the
assertion failure in the first case might be something other than the
limitation. I’ll look into this in more detail.
Thanks, Alexander and Dmitry!
Best regards,
Etsuro Fujita
On Tue, Dec 28, 2021 at 10:14 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
The root cause of the
assertion failure in the first case might be something other than the
limitation. I’ll look into this in more detail.
I think the root cause is that we fail to process a pending async
request (if any) in postgresReScanForeignScan() in the case where we
just reset the next_tuple counter in that function, without
destroying/recreating or rewinding the cursor. (In the case where we
destroy/recreate or rewind the cursor in that function, the pending
async request would be processed by pgfdw_exec_query() called from
that function.) This breaks the assumption about ExecAppend() that
when called for the first time after ReScan, there are no pending
async requests made for async subplans for the Append node, causing
the assertion failure in postgresForeignAsyncRequest() called from
that ExecAppend(). My oversight in commit 27e1f1456. :-(
To fix, I modified postgresReScanForeignScan() so that we always
process a pending async request (if any) before restarting the foreign
scan. Attached is a patch for that. I tested the patch with the
first case, and it addresses the assertion failure.
Best regards,
Etsuro Fujita
Attachments:
fix-postgres-fdw-async-rescan.patchapplication/octet-stream; name=fix-postgres-fdw-async-rescan.patchDownload+11-0
On Fri, Dec 31, 2021 at 04:36:42PM +0900, Etsuro Fujita wrote:
On Tue, Dec 28, 2021 at 10:14 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:The root cause of the
assertion failure in the first case might be something other than the
limitation. I’ll look into this in more detail.I think the root cause is that we fail to process a pending async
request (if any) in postgresReScanForeignScan() in the case where we
just reset the next_tuple counter in that function, without
destroying/recreating or rewinding the cursor. (In the case where we
destroy/recreate or rewind the cursor in that function, the pending
async request would be processed by pgfdw_exec_query() called from
that function.) This breaks the assumption about ExecAppend() that
when called for the first time after ReScan, there are no pending
async requests made for async subplans for the Append node, causing
the assertion failure in postgresForeignAsyncRequest() called from
that ExecAppend(). My oversight in commit 27e1f1456. :-(To fix, I modified postgresReScanForeignScan() so that we always
process a pending async request (if any) before restarting the foreign
scan. Attached is a patch for that. I tested the patch with the
first case, and it addresses the assertion failure.
Yep, makes sense now, thank you. The fix works for me, but I'm curious
about the requestee condition:
fsstate->conn_state->pendingAreq->requestee == (PlanState *) node)
You've mentioned that in those cases where the cursor will be
destroyed/recreated or rewind, pgfdw_exec_query will take care of
processing pending requests, and looks like it will do this without
checking the requestee. Just for me to understand, why is this condition
necessary?
Hello Etsuro-san,
31.12.2021 10:36, Etsuro Fujita wrote:
On Tue, Dec 28, 2021 at 10:14 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
The root cause of the
assertion failure in the first case might be something other than the
limitation. I’ll look into this in more detail.To fix, I modified postgresReScanForeignScan() so that we always
process a pending async request (if any) before restarting the foreign
scan. Attached is a patch for that. I tested the patch with the
first case, and it addresses the assertion failure.
Thanks for the fix! I can confirm that it eliminates the failure and
propose a simple test case for the issue. Maybe you will find it useful.
Best regards,
Alexander
Attachments:
posgres_fdw.sql.patchtext/x-patch; charset=UTF-8; name=posgres_fdw.sql.patchDownload+9-0
On Fri, Dec 31, 2021 at 9:32 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
On Fri, Dec 31, 2021 at 04:36:42PM +0900, Etsuro Fujita wrote:
I think the root cause is that we fail to process a pending async
request (if any) in postgresReScanForeignScan() in the case where we
just reset the next_tuple counter in that function, without
destroying/recreating or rewinding the cursor. (In the case where we
destroy/recreate or rewind the cursor in that function, the pending
async request would be processed by pgfdw_exec_query() called from
that function.) This breaks the assumption about ExecAppend() that
when called for the first time after ReScan, there are no pending
async requests made for async subplans for the Append node, causing
the assertion failure in postgresForeignAsyncRequest() called from
that ExecAppend(). My oversight in commit 27e1f1456. :-(To fix, I modified postgresReScanForeignScan() so that we always
process a pending async request (if any) before restarting the foreign
scan. Attached is a patch for that. I tested the patch with the
first case, and it addresses the assertion failure.Yep, makes sense now, thank you. The fix works for me, but I'm curious
about the requestee condition:fsstate->conn_state->pendingAreq->requestee == (PlanState *) node)
You've mentioned that in those cases where the cursor will be
destroyed/recreated or rewind, pgfdw_exec_query will take care of
processing pending requests, and looks like it will do this without
checking the requestee. Just for me to understand, why is this condition
necessary?
The reason is that the pending request might not need to be processed.
Consider this query:
alter server loopback1 options (add async_capable 'true');
alter server loopback1 options (add use_remote_estimate 'true');
create table a (aa int, constraint a_pkey primary key (aa));
create table loct (aa int, bb text, constraint loct_pkey primary key (aa));
create foreign table b (bb text) inherits (a) server loopback1 options
(table_name 'loct');
insert into a select i from generate_series(1, 10000) i;
insert into b select i, to_char(i, 'FM00000') from generate_series(1, 10000) i;
analyze loct;
analyze a;
explain verbose select * from a t1, a t2 where t1.aa = t2.aa limit 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Limit (cost=0.29..54.64 rows=1 width=8)
Output: t1.aa, t2.aa
-> Nested Loop (cost=0.29..2174178.00 rows=40000 width=8)
Output: t1.aa, t2.aa
-> Append (cost=0.00..700.00 rows=20000 width=4)
-> Seq Scan on public.a t1_1 (cost=0.00..145.00
rows=10000 width=4)
Output: t1_1.aa
-> Async Foreign Scan on public.b t1_2
(cost=100.00..455.00 rows=10000 width=4)
Output: t1_2.aa
Remote SQL: SELECT aa FROM public.loct
-> Append (cost=0.29..108.65 rows=2 width=4)
-> Index Only Scan using a_pkey on public.a t2_1
(cost=0.29..0.31 rows=1 width=4)
Output: t2_1.aa
Index Cond: (t2_1.aa = t1.aa)
-> Async Foreign Scan on public.b t2_2
(cost=100.30..108.33 rows=1 width=4)
Output: t2_2.aa
Remote SQL: SELECT aa FROM public.loct WHERE
(($1::integer = aa))
(17 rows)
Note that when starting the scan for the outer side, we’ll begin an
asynchronous fetch for the foreign table on the outer side, but we
won’t wait for the results of it until after we get all tuples from
the local table on the outer side (see the documentation note about
the async_capable option for postgres_fdw [1]https://www.postgresql.org/docs/14/postgres-fdw.html#id-1.11.7.44.11). So when rescanning
the inner side after getting the first tuple from the local table on
the outer side, the asynchronous fetch begun for the foreign table on
the outer side will still be in progress, and we’ll have it as
pendingAreq when called from postgresReScanForeignScan() for
rescanning the foreign table on the inner side. BUT it would be
useless to process the pending request there, because even without
processing the pending request, we can get a matching tuple for the
outer tuple from the local table on the inner side, as the local table
will be scanned first even in the inner side, and then return the
query result. I think the condition mentioned above is useful to
avoid processing such a useless pending request and return result
tuples more quickly.
Thanks for reviewing!
Best regards,
Etsuro Fujita
[1]: https://www.postgresql.org/docs/14/postgres-fdw.html#id-1.11.7.44.11
Hi Alexander,
On Mon, Jan 3, 2022 at 4:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
31.12.2021 10:36, Etsuro Fujita wrote:
To fix, I modified postgresReScanForeignScan() so that we always
process a pending async request (if any) before restarting the foreign
scan. Attached is a patch for that. I tested the patch with the
first case, and it addresses the assertion failure.
Thanks for the fix! I can confirm that it eliminates the failure and
Good to know. Thanks for testing!
propose a simple test case for the issue. Maybe you will find it useful.
Thanks, but in my environment, the test case doesn’t cause any
failure. How about something like the attached, which is made by
modifying the original test case to avoid the connection-limit-error
issue. In the attached, I also added a comment to a function to match
other places.
Best regards,
Etsuro Fujita
Attachments:
fix-postgres-fdw-async-rescan-2.patchapplication/octet-stream; name=fix-postgres-fdw-async-rescan-2.patchDownload+68-0
07.01.2022 09:15, Etsuro Fujita wrote:
On Mon, Jan 3, 2022 at 4:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
propose a simple test case for the issue. Maybe you will find it useful.
Thanks, but in my environment, the test case doesn’t cause any
failure.
It's interesting. I perform on REL_14_STABLE (61c8da50) the following:
wget
/messages/by-id/attachment/129455/posgres_fdw.sql.patch
-O - | git apply
./configure --enable-debug --enable-cassert >/dev/null && make -j8
/dev/null && make -j8 -C contrib >/dev/null && make check -C
contrib/postgres_fdw/
and get:
test postgres_fdw ... FAILED (test process exited with
exit code 2) 1116 ms
How about something like the attached, which is made by
modifying the original test case to avoid the connection-limit-error
issue. In the attached, I also added a comment to a function to match
other places.
Your modified test fails with unpatched postgres_fdw.c too (as
expected). I would prefer your test as more elaborated.
Best regards,
Alexander
On Sat, Jan 8, 2022 at 1:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
07.01.2022 09:15, Etsuro Fujita wrote:
On Mon, Jan 3, 2022 at 4:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
propose a simple test case for the issue. Maybe you will find it useful.
Thanks, but in my environment, the test case doesn’t cause any
failure.It's interesting. I perform on REL_14_STABLE (61c8da50) the following:
wget
/messages/by-id/attachment/129455/posgres_fdw.sql.patch
-O - | git apply
./configure --enable-debug --enable-cassert >/dev/null && make -j8/dev/null && make -j8 -C contrib >/dev/null && make check -C
contrib/postgres_fdw/
and get:
test postgres_fdw ... FAILED (test process exited with
exit code 2) 1116 ms
Reproduced. Sorry for that. Maybe, I tested this on a
non-assertion-enabled build by mistake. :-(
How about something like the attached, which is made by
modifying the original test case to avoid the connection-limit-error
issue. In the attached, I also added a comment to a function to match
other places.Your modified test fails with unpatched postgres_fdw.c too (as
expected). I would prefer your test as more elaborated.
Ok, but I modified the added test case a bit further so that foreign
tables are well-defined like in your second test case, which I think
is good. Attached is an updated version of the patch.
Thanks!
Best regards,
Etsuro Fujita
Attachments:
fix-postgres-fdw-async-rescan-3.patchapplication/octet-stream; name=fix-postgres-fdw-async-rescan-3.patchDownload+68-0
Hello Etsuro-san,
08.01.2022 11:57, Etsuro Fujita wrote:
Ok, but I modified the added test case a bit further so that foreign
tables are well-defined like in your second test case, which I think
is good. Attached is an updated version of the patch.
Thank you! The updated version looks even better to me.
Best regards,
Alexander
On Sun, Jan 9, 2022 at 4:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
08.01.2022 11:57, Etsuro Fujita wrote:
Ok, but I modified the added test case a bit further so that foreign
tables are well-defined like in your second test case, which I think
is good. Attached is an updated version of the patch.Thank you! The updated version looks even better to me.
Thanks for reviewing! I've committed the patch.
Best regards,
Etsuro Fujita