batch fdw insert bug (Postgres 14)
Hi
I am testing new features in Postgres 14, and I found bug
EXPLAIN ANALYZE VERBOSE for insert to FDW table with batch_size 1000
returns
-------------------------------------------------------------------------------------------------------------------------------
Insert on public.vzdalena_tabulka2 (cost=0.00..175.00 rows=0 width=0)
(actual time=30.269..30.270 rows=0 loops=1)
Remote SQL:
\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F
Batch Size: 1000
-> Function Scan on pg_catalog.generate_series g (cost=0.00..175.00
rows=10000 width=36) (actual time=0.453..1.988 rows=10
Output: g.i, ('AHOJ'::text || (g.i)::text)
Function Call: generate_series(1, 10000)
Planning Time: 0.075 ms
Execution Time: 31.032 ms
(8 rows)
Regards
Pavel
pá 7. 5. 2021 v 11:48 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:
Hi
I am testing new features in Postgres 14, and I found bug
EXPLAIN ANALYZE VERBOSE for insert to FDW table with batch_size 1000
returns-------------------------------------------------------------------------------------------------------------------------------
Insert on public.vzdalena_tabulka2 (cost=0.00..175.00 rows=0 width=0)
(actual time=30.269..30.270 rows=0 loops=1)
Remote SQL:
\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F
Batch Size: 1000
-> Function Scan on pg_catalog.generate_series g (cost=0.00..175.00
rows=10000 width=36) (actual time=0.453..1.988 rows=10
Output: g.i, ('AHOJ'::text || (g.i)::text)
Function Call: generate_series(1, 10000)
Planning Time: 0.075 ms
Execution Time: 31.032 ms
(8 rows)
reproducer
CREATE DATABASE omega;
\c omega
CREATE TABLE tabulka(a int, b varchar);
\c postgres
CREATE EXTENSION postgres_fdw;
CREATE SERVER omega_db
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (dbname 'omega');
CREATE USER MAPPING FOR "pavel.stehule"
SERVER omega_db OPTIONS (user 'pavel.stehule');
CREATE FOREIGN TABLE vzdalena_tabulka(a int, b varchar)
SERVER omega_db
OPTIONS (table_name 'tabulka');
CREATE FOREIGN TABLE vzdalena_tabulka2(a int, b varchar)
SERVER omega_db
OPTIONS (table_name 'tabulka', batch_size '1000');
EXPLAIN ANALYZE VERBOSE INSERT INTO vzdalena_tabulka SELECT i, 'AHOJ' || i
FROM generate_series(1,10000) g(i);
EXPLAIN ANALYZE VERBOSE INSERT INTO vzdalena_tabulka2 SELECT i, 'AHOJ' || i
FROM generate_series(1,10000) g(i);
Pavel
Show quoted text
Regards
Pavel
I am testing new features in Postgres 14, and I found bug
EXPLAIN ANALYZE VERBOSE for insert to FDW table with batch_size 1000 returns
-------------------------------------------------------------------------------------------------------------------------------
Insert on public.vzdalena_tabulka2 (cost=0.00..175.00 rows=0 width=0) (actual time=30.269..30.270 rows=0 loops=1)
Remote SQL: \x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F
Batch Size: 1000
-> Function Scan on pg_catalog.generate_series g (cost=0.00..175.00 rows=10000 width=36) (actual time=0.453..1.988 rows=10
Output: g.i, ('AHOJ'::text || (g.i)::text)
Function Call: generate_series(1, 10000)
Planning Time: 0.075 ms
Execution Time: 31.032 ms
(8 rows)
reproducer
I can reproduce the issue and did some basic analysis on it.
The "Remote SQL" is built from the following code:
----------------
char *sql = strVal(list_nth(fdw_private,
FdwModifyPrivateUpdateSql));
ExplainPropertyText("Remote SQL", sql, es);
---------------
It use the query string stored in list fdw_private.
However, the "fmstate->query" will also point to the string in fdw_private,
by postgresBeginForeignModify --> create_foreign_modify --> "fmstate->query = query;"
And in execute_foreign_modify(), " fmstate->query " will be freed when rebuild the query
string to do the batch insert like the following:
----------------
if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
{
...
/* Build INSERT string with numSlots records in its VALUES clause. */
initStringInfo(&sql);
rebuildInsertSql(&sql, fmstate->orig_query, fmstate->values_end,
fmstate->p_nums, *numSlots - 1)
** pfree(fmstate->query);
fmstate->query = sql.data;
----------------
So, it output the freed pointer as "Remote SQL".
For the fix.
The query string could be rebuilt depending on the numSlots,
which query string should be output ?
should we just output the original query string like the attached patch ?
Or should we output the last one?
Best regards,
houzj
Attachments:
0001-fix-explain-info-about-fdw-batch-insert.patchapplication/octet-stream; name=0001-fix-explain-info-about-fdw-batch-insert.patchDownload
From 1362d0740311a9194a35b418e788b693efd3dc4c Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Fri, 7 May 2021 20:19:19 +0800
Subject: [PATCH] fix explain info about fdw batch insert
---
contrib/postgres_fdw/postgres_fdw.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9a31bbb86b..1a4b9b1c95 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3680,7 +3680,11 @@ create_foreign_modify(EState *estate,
/* Set up remote query information. */
fmstate->query = query;
if (operation == CMD_INSERT)
- fmstate->orig_query = pstrdup(fmstate->query);
+ {
+ fmstate->query = pstrdup(fmstate->query);
+ fmstate->orig_query = query;
+ }
+
fmstate->target_attrs = target_attrs;
fmstate->values_end = values_end;
fmstate->has_returning = has_returning;
--
2.18.4
On 5/7/21 2:46 PM, houzj.fnst@fujitsu.com wrote:
I am testing new features in Postgres 14, and I found bug
EXPLAIN ANALYZE VERBOSE for insert to FDW table with batch_size 1000 returns
-------------------------------------------------------------------------------------------------------------------------------
Insert on public.vzdalena_tabulka2 (cost=0.00..175.00 rows=0 width=0) (actual time=30.269..30.270 rows=0 loops=1)
Remote SQL: \x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F
Batch Size: 1000
-> Function Scan on pg_catalog.generate_series g (cost=0.00..175.00 rows=10000 width=36) (actual time=0.453..1.988 rows=10
Output: g.i, ('AHOJ'::text || (g.i)::text)
Function Call: generate_series(1, 10000)
Planning Time: 0.075 ms
Execution Time: 31.032 ms
(8 rows)
reproducerI can reproduce the issue and did some basic analysis on it.
The "Remote SQL" is built from the following code:
----------------
char *sql = strVal(list_nth(fdw_private,
FdwModifyPrivateUpdateSql));ExplainPropertyText("Remote SQL", sql, es);
---------------It use the query string stored in list fdw_private.
However, the "fmstate->query" will also point to the string in fdw_private,
by postgresBeginForeignModify --> create_foreign_modify --> "fmstate->query = query;"And in execute_foreign_modify(), " fmstate->query " will be freed when rebuild the query
string to do the batch insert like the following:----------------
if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
{
...
/* Build INSERT string with numSlots records in its VALUES clause. */
initStringInfo(&sql);
rebuildInsertSql(&sql, fmstate->orig_query, fmstate->values_end,
fmstate->p_nums, *numSlots - 1)
** pfree(fmstate->query);
fmstate->query = sql.data;
----------------So, it output the freed pointer as "Remote SQL".
For the fix.
The query string could be rebuilt depending on the numSlots,
which query string should be output ?
should we just output the original query string like the attached patch ?
Or should we output the last one?
Yeah. The problem is we build fdw_private list once (which references
the SQL string), and during execution we may pfree() it. But then
EXPLAIN ANALYZE gets the same fdw_private list and tries to use the SQL
string which we pfreed() already.
I think the simplest fix is simply to pstrdup() the query when building
fmstate in create_foreign_modify. We've already been doing that for
orig_query anyway. That seems cleaner than printing the last query we
build (which would be confusing I think).
I've pushed a fix doing that. We only need that for INSERT queries, and
we might even restrict that to cases with batching if needed.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
pá 7. 5. 2021 v 22:43 odesílatel Tomas Vondra <tomas.vondra@enterprisedb.com>
napsal:
On 5/7/21 2:46 PM, houzj.fnst@fujitsu.com wrote:
I am testing new features in Postgres 14, and I found bug
EXPLAIN ANALYZE VERBOSE for insert to FDW table with batch_size 1000returns
-------------------------------------------------------------------------------------------------------------------------------
Insert on public.vzdalena_tabulka2 (cost=0.00..175.00 rows=0
width=0) (actual time=30.269..30.270 rows=0 loops=1)
Remote SQL:
\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F
Batch Size: 1000
-> Function Scan on pg_catalog.generate_series g(cost=0.00..175.00 rows=10000 width=36) (actual time=0.453..1.988 rows=10
Output: g.i, ('AHOJ'::text || (g.i)::text)
Function Call: generate_series(1, 10000)
Planning Time: 0.075 ms
Execution Time: 31.032 ms
(8 rows)
reproducerI can reproduce the issue and did some basic analysis on it.
The "Remote SQL" is built from the following code:
----------------
char *sql = strVal(list_nth(fdw_private,FdwModifyPrivateUpdateSql));
ExplainPropertyText("Remote SQL", sql, es);
---------------It use the query string stored in list fdw_private.
However, the "fmstate->query" will also point to the string infdw_private,
by postgresBeginForeignModify --> create_foreign_modify -->
"fmstate->query = query;"
And in execute_foreign_modify(), " fmstate->query " will be freed when
rebuild the query
string to do the batch insert like the following:
----------------
if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
{
...
/* Build INSERT string with numSlots records in its VALUESclause. */
initStringInfo(&sql);
rebuildInsertSql(&sql, fmstate->orig_query, fmstate->values_end,
fmstate->p_nums, *numSlots - 1)
** pfree(fmstate->query);
fmstate->query = sql.data;
----------------So, it output the freed pointer as "Remote SQL".
For the fix.
The query string could be rebuilt depending on the numSlots,
which query string should be output ?
should we just output the original query string like the attached patch ?
Or should we output the last one?Yeah. The problem is we build fdw_private list once (which references
the SQL string), and during execution we may pfree() it. But then
EXPLAIN ANALYZE gets the same fdw_private list and tries to use the SQL
string which we pfreed() already.I think the simplest fix is simply to pstrdup() the query when building
fmstate in create_foreign_modify. We've already been doing that for
orig_query anyway. That seems cleaner than printing the last query we
build (which would be confusing I think).I've pushed a fix doing that. We only need that for INSERT queries, and
we might even restrict that to cases with batching if needed.
Great
Thank you
Pavel
Show quoted text
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Tomas Vondra <tomas.vondra@enterprisedb.com>
I think the simplest fix is simply to pstrdup() the query when building
fmstate in create_foreign_modify. We've already been doing that for
orig_query anyway. That seems cleaner than printing the last query we
build (which would be confusing I think).I've pushed a fix doing that. We only need that for INSERT queries, and
we might even restrict that to cases with batching if needed.
Thank you for investigating and committing the fix. (I'm relieved that the feature was not reverted.)
Regards
Takayuki Tsunakawa