Fdw batch insert error out when set batch_size > 65535

Started by Zhijie Hou (Fujitsu) · almost 5 years ago · 33 messages · hackers
#1 Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com

Hi,

When reading the code, I noticed a possible issue with fdw batch insert.
When I set batch_size > 65535 and insert more than 65535 rows into a foreign table,
it throws an error:

For example:

------------------
CREATE FOREIGN TABLE vzdalena_tabulka2(a int, b varchar)
  SERVER omega_db
  OPTIONS (table_name 'tabulka', batch_size '65536');

INSERT INTO vzdalena_tabulka2 SELECT i, 'AHOJ' || i FROM generate_series(1,65536) g(i);

ERROR: number of parameters must be between 0 and 65535
CONTEXT: remote SQL command: INSERT INTO public.tabulka(a, b) VALUES ($1, $2), ($3, $4)...
------------------

Actually, I think if (number of columns) * (number of rows) > 65535, we can
get this error. But I think almost no one will set such a large value, so adjusting
the batch_size automatically when doing INSERT seems an acceptable solution.

In postgresGetForeignModifyBatchSize(), if we find that (num of params) * batch_size
is greater than the limit (65535), then we set it to 65535 / (num of params).

Thoughts ?

Best regards,
houzj

Attachments:

0001-limit-the-fdw-batch-size.patch (application/octet-stream, +14/-4)
0002-doc-note.patch (application/octet-stream, +3/-2)
#2 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#1)
Re: Fdw batch insert error out when set batch_size > 65535

On Fri, May 21, 2021 at 8:18 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Hi,

When reading the code, I noticed a possible issue with fdw batch insert.
When I set batch_size > 65535 and insert more than 65535 rows into a foreign table,
it throws an error:

For example:

------------------
CREATE FOREIGN TABLE vzdalena_tabulka2(a int, b varchar)
SERVER omega_db
OPTIONS (table_name 'tabulka', batch_size '65536');

INSERT INTO vzdalena_tabulka2 SELECT i, 'AHOJ' || i FROM generate_series(1,65536) g(i);

ERROR: number of parameters must be between 0 and 65535
CONTEXT: remote SQL command: INSERT INTO public.tabulka(a, b) VALUES ($1, $2), ($3, $4)...
------------------

Actually, I think if (number of columns) * (number of rows) > 65535, we can
get this error. But I think almost no one will set such a large value, so adjusting
the batch_size automatically when doing INSERT seems an acceptable solution.

In postgresGetForeignModifyBatchSize(), if we find that (num of params) * batch_size
is greater than the limit (65535), then we set it to 65535 / (num of params).

Thoughts ?

+1 to limit batch_size for postgres_fdw and it's a good idea to have a
macro for the max params.

Few comments:
1) How about using macro in the error message, something like below?
        appendPQExpBuffer(errorMessage,
                          libpq_gettext("number of parameters must be
between 0 and %d\n"),
                          PQ_MAX_PARAM_NUMBER);
2) I think we can choose not to mention the 65535 because it's hard to
maintain if that's changed in the libpq protocol sometime in the future. How
about
"The final number of rows postgres_fdw inserts in a batch actually
depends on the number of columns and the provided batch_size value.
This is because of the limit the libpq protocol (which postgres_fdw
uses to connect to a remote server) has on the number of query
parameters that can be specified per query. For instance, if the
number of columns * batch_size is more than the limit, then the libpq
emits an error. But postgres_fdw adjusts the batch_size to avoid this
error."
instead of
+       overrides an option specified for the server. Note if the batch size
+       exceed the protocol limit (number of columns * batch_size > 65535),
+       then the actual batch size will be less than the specified batch_size.
3) I think "postgres_fdw should insert in each insert operation"
doesn't hold after this patch. We can reword it to "postgres_fdw
inserts in each insert operation".
       This option specifies the number of rows
<filename>postgres_fdw</filename>
       should insert in each insert operation. It can be specified for a
4) How about naming the macro as PQ_QUERY_PARAM_MAX_LIMIT?
5) We can use macro in code comments as well.
+     * 65535, so set the batch_size to not exceed limit in a batch insert.
6) I think both code and docs patches can be combined into a single patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#3 Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Bharath Rupireddy (#2)
RE: Fdw batch insert error out when set batch_size > 65535

From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Sent: Friday, May 21, 2021 1:42 PM

On Fri, May 21, 2021 at 8:18 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Actually, I think if the (number of columns) * (number of rows) >
65535, then we can get this error. But, I think almost no one will set
such a large value, so I think adjust the batch_size automatically when doing

INSERT seems an acceptable solution.

In the postgresGetForeignModifyBatchSize(), if we found the (num of
param) * batch_size Is greater than the limit(65535), then we set it to 65535 /

(num of param).

Thoughts ?

+1 to limit batch_size for postgres_fdw and it's a good idea to have a
macro for the max params.

Few comments:

Thanks for the comments.

1) How about using macro in the error message, something like below?
appendPQExpBuffer(errorMessage,
libpq_gettext("number of parameters must be
between 0 and %d\n"),
PQ_MAX_PARAM_NUMBER);

Changed.

2) I think we can choose not to mention the 65535 because it's hard to maintain if
that's changed in the libpq protocol sometime in the future. How about "The final
number of rows postgres_fdw inserts in a batch actually depends on the
number of columns and the provided batch_size value.
This is because of the limit the libpq protocol (which postgres_fdw uses to
connect to a remote server) has on the number of query parameters that can
be specified per query. For instance, if the number of columns * batch_size is
more than the limit, then the libpq emits an error. But postgres_fdw adjusts the
batch_size to avoid this error."
instead of
+       overrides an option specified for the server. Note if the batch size
+       exceed the protocol limit (number of columns * batch_size > 65535),
+       then the actual batch size will be less than the specified batch_size.

Thanks, your description looks better. Changed.

3) I think "postgres_fdw should insert in each insert operation"
doesn't hold after this patch. We can reword it to "postgres_fdw inserts in
each insert operation".
This option specifies the number of rows
<filename>postgres_fdw</filename>
should insert in each insert operation. It can be specified for a

Changed.

4) How about naming the macro as PQ_QUERY_PARAM_MAX_LIMIT?

Changed.

5) We can use macro in code comments as well.

Thanks, I reorganized the code comments.

+ * 65535, so set the batch_size to not exceed limit in a batch insert.
6) I think both code and docs patches can be combined into a single patch.

Combined.

Attaching V2 patch. Please consider it for further review.

Best regards,
houzj

Attachments:

v2-0001-limit-the-fdw-batch-size.patch (application/octet-stream, +33/-13)
#4 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#3)
Re: Fdw batch insert error out when set batch_size > 65535

On Fri, May 21, 2021 at 1:19 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Attaching V2 patch. Please consider it for further review.

Thanks for the patch. Some more comments:

1) Can fmstate->p_nums ever be 0 in postgresGetForeignModifyBatchSize?
By any chance, if it can, I think instead of an assert, we can have
something like below, since we are using it in the division.
    if (fmstate->p_nums > 0 &&
       (batch_size * fmstate->p_nums > PQ_QUERY_PARAM_MAX_LIMIT))
    {
        batch_size = PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums;
    }
Also, please remove the { and } for the above if condition, because
for 1 line statements we don't normally use { and }
2) An empty line after the macro definition will be good.
+#define PQ_QUERY_PARAM_MAX_LIMIT 65535
 extern int    PQsendQuery(PGconn *conn, const char *query);
3) I think we use:
<filename>postgres_fdw</filename> not postgres_fdw
<literal>batch_size</literal> not batch_size
the number of columns * <literal>batch_size</literal> not the number
of columns * batch_size
+       overrides an option specified for the server. Note the final number
+       of rows postgres_fdw inserts in a batch actually depends on the
+       number of columns and the provided batch_size value. This is because
+       of the limit the libpq protocol (which postgres_fdw uses to connect
+       to a remote server) has on the number of query parameters that can
+       be specified per query. For instance, if the number of columns
* batch_size
+       is more than the limit, then the libpq emits an error. But postgres_fdw
+       adjusts the batch_size to avoid this error.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#5 Andrew Dunstan
andrew@dunslane.net
In reply to: Bharath Rupireddy (#4)
Re: Fdw batch insert error out when set batch_size > 65535

On 5/21/21 5:03 AM, Bharath Rupireddy wrote:

On Fri, May 21, 2021 at 1:19 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Attaching V2 patch. Please consider it for further review.

Thanks for the patch. Some more comments:

1) Can fmstate->p_nums ever be 0 in postgresGetForeignModifyBatchSize?
By any chance, if it can, I think instead of an assert, we can have
something like below, since we are using it in the division.
if (fmstate->p_nums > 0 &&
(batch_size * fmstate->p_nums > PQ_QUERY_PARAM_MAX_LIMIT))
{
batch_size = PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums;
}
Also, please remove the { and } for the above if condition, because
for 1 line statements we don't normally use { and }

We used to filter that out in pgindent IIRC but we don't any more.
IMNSHO there are cases when it makes the code more readable, especially
if (as here) the condition spans more than one line. I also personally
dislike having one branch of an "if" statement with braces and another
without - it looks far better to my eyes to have all or none with
braces. But I realize that's a matter of taste, and there are plenty of
examples in the code running counter to my taste here.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#6 Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Bharath Rupireddy (#4)
RE: Fdw batch insert error out when set batch_size > 65535

From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Sent: Friday, May 21, 2021 5:03 PM

On Fri, May 21, 2021 at 1:19 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Attaching V2 patch. Please consider it for further review.

Thanks for the patch. Some more comments:

1) Can fmstate->p_nums ever be 0 in postgresGetForeignModifyBatchSize?
By any chance, if it can, I think instead of an assert, we can have something like
below, since we are using it in the division.
if (fmstate->p_nums > 0 &&
(batch_size * fmstate->p_nums > PQ_QUERY_PARAM_MAX_LIMIT))
{
batch_size = PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums;
}
Also, please remove the { and } for the above if condition, because for 1 line
statements we don't normally use { and }
2) An empty line after the macro definition will be good.
+#define PQ_QUERY_PARAM_MAX_LIMIT 65535
extern int    PQsendQuery(PGconn *conn, const char *query);
3) I think we use:
<filename>postgres_fdw</filename> not postgres_fdw
<literal>batch_size</literal> not batch_size the number of columns *
<literal>batch_size</literal> not the number of columns * batch_size
+       overrides an option specified for the server. Note the final number
+       of rows postgres_fdw inserts in a batch actually depends on the
+       number of columns and the provided batch_size value. This is because
+       of the limit the libpq protocol (which postgres_fdw uses to connect
+       to a remote server) has on the number of query parameters that can
+       be specified per query. For instance, if the number of columns
* batch_size
+       is more than the limit, then the libpq emits an error. But postgres_fdw
+       adjusts the batch_size to avoid this error.

Thanks for the comments. I have addressed all comments to the v3 patch.
BTW, is the batch_size issue here an Open Item for PG14?

Best regards,
houzj

Attachments:

v3-0001-limit-the-fdw-batch-size.patch (application/octet-stream, +34/-13)
#7 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#6)
Re: Fdw batch insert error out when set batch_size > 65535

On Tue, May 25, 2021 at 1:08 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Thanks for the comments. I have addressed all comments to the v3 patch.

Thanks! The patch basically looks good to me except that it is missing
a commit message. I think it can be added now.

BTW, Is the batch_size issue here an Open Item of PG14 ?

IMO, the issue you found by setting batch_size to too high a value
is extreme-case testing of the feature added by commit b663a4136,
which introduced the batch_size parameter. So it's a bug to me. I
think you can add it as a bug in the commitfest and let the committers
take the call.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#8 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#7)
Re: Fdw batch insert error out when set batch_size > 65535

On Tue, May 25, 2021 at 2:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Tue, May 25, 2021 at 1:08 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Thanks for the comments. I have addressed all comments to the v3 patch.

Thanks! The patch basically looks good to me except that it is missing
a commit message. I think it can be added now.

With the v3 patch, I observed a failure in the postgres_fdw test cases with
an insert query in prepared statements. The root cause is that in
postgresGetForeignModifyBatchSize, fmstate can be NULL (see the
existing code which handles the fmstate-NULL case). I fixed this and
added a commit message. PSA v4 patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v4-0001-Adjust-batch_size-to-not-exceed-max-param-limit-o.patch (application/octet-stream, +34/-13)
#9 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Bharath Rupireddy (#8)
Re: Fdw batch insert error out when set batch_size > 65535

On 5/26/21 8:57 AM, Bharath Rupireddy wrote:

On Tue, May 25, 2021 at 2:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Tue, May 25, 2021 at 1:08 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Thanks for the comments. I have addressed all comments to the v3 patch.

Thanks! The patch basically looks good to me except that it is missing
a commit message. I think it can be added now.

With v3 patch, I observed failure in postgres_fdw test cases with
insert query in prepared statements. Root cause is that in
postgresGetForeignModifyBatchSize, fmstate can be null (see the
existing code which handles for fmstate null cases). I fixed this, and
added a commit message. PSA v4 patch.

Thanks. In what situation is the fmstate NULL? If it is NULL, the
current code simply skips the line adjusting it. Doesn't that mean we
may not actually fix the bug in that case?

Also, I think it'd be good to keep the existing comment, probably as the first
line of the new comment block.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Tomas Vondra (#9)
Re: Fdw batch insert error out when set batch_size > 65535

On Wed, May 26, 2021 at 6:36 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/26/21 8:57 AM, Bharath Rupireddy wrote:

On Tue, May 25, 2021 at 2:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Tue, May 25, 2021 at 1:08 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Thanks for the comments. I have addressed all comments to the v3 patch.

Thanks! The patch basically looks good to me except that it is missing
a commit message. I think it can be added now.

With v3 patch, I observed failure in postgres_fdw test cases with
insert query in prepared statements. Root cause is that in
postgresGetForeignModifyBatchSize, fmstate can be null (see the
existing code which handles for fmstate null cases). I fixed this, and
added a commit message. PSA v4 patch.

Thanks. In what situation is the fmstate NULL? If it is NULL, the
current code simply skips the line adjusting it. Doesn't that mean we
may not actually fix the bug in that case?

fmstate, i.e. resultRelInfo->ri_FdwState, is NULL in EXPLAIN without
ANALYZE cases; the comment below says so, and we can't hit the bug there
because we don't actually execute the insert statement. The bug occurs on the
remote server when the insert query with that many query parameters
is submitted to the remote server. I'm not sure if there are any other
cases where it can be NULL.
/*
* In EXPLAIN without ANALYZE, ri_fdwstate is NULL, so we have to lookup
* the option directly in server/table options. Otherwise just use the
* value we determined earlier.
*/
if (fmstate)
batch_size = fmstate->batch_size;
else
batch_size = get_batch_size_option(resultRelInfo->ri_RelationDesc);

Also, I think it'd be good to keep the existing comment, probably as the first
line of the new comment block.

Do you mean to say we need to retain "/* Otherwise use the batch size
specified for server/table. */"? If so, PSA v5.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v5-0001-Adjust-batch_size-to-not-exceed-max-param-limit-o.patch (application/octet-stream, +35/-14)
#11 Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Bharath Rupireddy (#10)
RE: Fdw batch insert error out when set batch_size > 65535

From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Sent: Wednesday, May 26, 2021 9:56 PM

On Wed, May 26, 2021 at 6:36 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/26/21 8:57 AM, Bharath Rupireddy wrote:

On Tue, May 25, 2021 at 2:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Tue, May 25, 2021 at 1:08 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Thanks for the comments. I have addressed all comments to the v3

patch.

Thanks! The patch basically looks good to me except that it is
missing a commit message. I think it can be added now.

With v3 patch, I observed failure in postgres_fdw test cases with
insert query in prepared statements. Root cause is that in
postgresGetForeignModifyBatchSize, fmstate can be null (see the
existing code which handles for fmstate null cases). I fixed this,
and added a commit message. PSA v4 patch.

Thanks. In what situation is the fmstate NULL? If it is NULL, the
current code simply skips the line adjusting it. Doesn't that mean we
may not actually fix the bug in that case?

fmstate i.e. resultRelInfo->ri_FdwState is NULL for EXPLAIN without ANALYZE
cases, below comment says it and we can't get the bug because we don't
actually execute the insert statement. The bug occurs on the remote server
when the insert query with those many query parameters is submitted to the
remote server.

Agreed.
ri_FdwState is initialized in postgresBeginForeignInsert or postgresBeginForeignModify,
and I think those functions are always invoked before getting the batch_size.

Only in EXPLAIN mode is ri_FdwState left uninitialized:

/*
* Do nothing in EXPLAIN (no ANALYZE) case. resultRelInfo->ri_FdwState
* stays NULL.
*/
if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
return;

Best regards,
houzj

#12 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Zhijie Hou (Fujitsu) (#11)
Re: Fdw batch insert error out when set batch_size > 65535

Hi,

I took a look at this patch today. I did some minor changes, mostly:

1) change the code limiting batch_size from

if (fmstate->p_nums > 0 &&
(batch_size * fmstate->p_nums > PQ_QUERY_PARAM_MAX_LIMIT))
{
batch_size = PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums;
}

to

if (fmstate && fmstate->p_nums > 0)
batch_size = Min(batch_size,
PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums);

which I think is somewhat clearer / a more common pattern.

2) I've reworded the docs a bit, splitting the single para into two. I
think this makes it clearer.

Attached is a patch doing this. Please check the commit message etc.
Barring objections I'll get it committed in a couple days.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Make-sure-postgres_fdw-batching-does-use-too-many-pa.patch (text/x-patch, +34/-12)
#13 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Tomas Vondra (#12)
Re: Fdw batch insert error out when set batch_size > 65535

On Mon, May 31, 2021 at 1:21 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Hi,

I took a look at this patch today. I did some minor changes, mostly:

1) change the code limiting batch_size from

if (fmstate->p_nums > 0 &&
(batch_size * fmstate->p_nums > PQ_QUERY_PARAM_MAX_LIMIT))
{
batch_size = PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums;
}

to

if (fmstate && fmstate->p_nums > 0)
batch_size = Min(batch_size,
PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums);

which I think is somewhat clearer / a more common pattern.

Agree, that looks pretty good.

2) I've reworded the docs a bit, splitting the single para into two. I
think this makes it clearer.

LGTM, except one thing: the batch_size description says "should
insert in", but the value entered for batch_size is not always
honoured, right? Because this patch may adjust it.

This option specifies the number of rows
<filename>postgres_fdw</filename>
should insert in each insert operation. It can be specified for a

So, I suggest to remove "should" and change it to:

This option specifies the number of rows
<filename>postgres_fdw</filename>
inserts in each insert operation. It can be specified for a

Attached is a patch doing this. Please check the commit message etc.
Barring objections I'll get it committed in a couple days.

One minor comment:
In the commit message, Int16 is used
The FE/BE protocol identifies parameters with an Int16 index, which
limits the maximum number of parameters per query to 65535. With

and in the code comments uint16 is used.
+ * parameters in a batch is limited to 64k (uint16), so make sure we don't

Isn't it uint16 in the commit message too? Also, can we use 64k in the
commit message instead of 65535?

With Regards,
Bharath Rupireddy.

#14 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Bharath Rupireddy (#13)
Re: Fdw batch insert error out when set batch_size > 65535

On 5/31/21 6:01 AM, Bharath Rupireddy wrote:

On Mon, May 31, 2021 at 1:21 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Hi,

I took a look at this patch today. I did some minor changes, mostly:

1) change the code limiting batch_size from

if (fmstate->p_nums > 0 &&
(batch_size * fmstate->p_nums > PQ_QUERY_PARAM_MAX_LIMIT))
{
batch_size = PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums;
}

to

if (fmstate && fmstate->p_nums > 0)
batch_size = Min(batch_size,
PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums);

which I think is somewhat clearer / a more common pattern.

Agree, that looks pretty good.

2) I've reworded the docs a bit, splitting the single para into two. I
think this makes it clearer.

LGTM, except one thing: the batch_size description says "should
insert in", but the value entered for batch_size is not always
honoured, right? Because this patch may adjust it.

This option specifies the number of rows
<filename>postgres_fdw</filename>
should insert in each insert operation. It can be specified for a

So, I suggest to remove "should" and change it to:

This option specifies the number of rows
<filename>postgres_fdw</filename>
inserts in each insert operation. It can be specified for a

I think the "should" indicates exactly that postgres_fdw may adjust the
batch size. Without it, it sounds much more definitive, so I kept it.

Attached is a patch doing this. Please check the commit message etc.
Barring objections I'll get it committed in a couple days.

One minor comment:
In the commit message, Int16 is used
The FE/BE protocol identifies parameters with an Int16 index, which
limits the maximum number of parameters per query to 65535. With

and in the code comments uint16 is used.
+ * parameters in a batch is limited to 64k (uint16), so make sure we don't

Isn't it uint16 in the commit message too? Also, can we use 64k in the
commit message instead of 65535?

No, the "Int16" refers to the FE/BE documentation, where we use Int16.
But in the C code we interpret it as uint16.

I've added a simple regression test to postgres_fdw, testing that batch
sizes > 65535 work fine, and pushed the fix.

I've considered checking the value in postgres_fdw_validator and just
rejecting anything over 65535, but I've decided against that. We'd still
need to adjust depending on the number of columns anyway.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Tomas Vondra (#14)
Re: Fdw batch insert error out when set batch_size > 65535

On Wed, Jun 9, 2021 at 12:04 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

No, the "Int16" refers to the FE/BE documentation, where we use Int16.
But in the C code we interpret it as uint16.

Hm. I see that in protocol.sgml Int16 is being used.

I've added a simple regression test to postgres_fdw, testing that batch
sizes > 65535 work fine, and pushed the fix.

I was earlier thinking of adding one, but stopped because it might
increase the regression test execution time. It looks like that's true
- with and without the test case it takes 17 sec and 4 sec
respectively on my dev system which is 4X slower. I'm not sure if this
is okay.

With Regards,
Bharath Rupireddy.

#16 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bharath Rupireddy (#15)
Re: Fdw batch insert error out when set batch_size > 65535

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:

I've added a simple regression test to postgres_fdw, testing that batch
sizes > 65535 work fine, and pushed the fix.

I was earlier thinking of adding one, but stopped because it might
increase the regression test execution time. It looks like that's true
- with and without the test case it takes 17 sec and 4 sec
respectively on my dev system which is 4X slower. I'm not sure if this
is okay.

The cost, versus the odds of ever detecting a problem, doesn't
seem like a good tradeoff.

regards, tom lane

#17 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#16)
Re: Fdw batch insert error out when set batch_size > 65535

I wrote:

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:

I've added a simple regression test to postgres_fdw, testing that batch
sizes > 65535 work fine, and pushed the fix.

I was earlier thinking of adding one, but stopped because it might
increase the regression test execution time. It looks like that's true
- with and without the test case it takes 17 sec and 4 sec
respectively on my dev system which is 4X slower. I'm not sure if this
is okay.

The cost, versus the odds of ever detecting a problem, doesn't
seem like a good tradeoff.

I took a quick look and noted that on buildfarm member longfin
(to take a random example that's sitting a few feet from me),
the time for contrib-install-check went from 34 seconds before
this patch to 40 seconds after. I find that completely
unacceptable compared to the likely value of this test case.

regards, tom lane

#18 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#17)
Re: Fdw batch insert error out when set batch_size > 65535

On 6/9/21 8:28 AM, Tom Lane wrote:

I wrote:

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:

I've added a simple regression test to postgres_fdw, testing that batch
sizes > 65535 work fine, and pushed the fix.

I was earlier thinking of adding one, but stopped because it might
increase the regression test execution time. It looks like that's true
- with and without the test case it takes 17 sec and 4 sec
respectively on my dev system which is 4X slower. I'm not sure if this
is okay.

The cost, versus the odds of ever detecting a problem, doesn't
seem like a good tradeoff.

I took a quick look and noted that on buildfarm member longfin
(to take a random example that's sitting a few feet from me),
the time for contrib-install-check went from 34 seconds before
this patch to 40 seconds after. I find that completely
unacceptable compared to the likely value of this test case.

Note that the problem here is [1] - we're creating a lot of slots
referencing the same tuple descriptor, which inflates the duration.
There's a fix in the other thread, which eliminates ~99% of the
overhead. I plan to push that fix soon (a day or two).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#18)
Re: Fdw batch insert error out when set batch_size > 65535

On 6/9/21 12:22 PM, Tomas Vondra wrote:

On 6/9/21 8:28 AM, Tom Lane wrote:

I wrote:

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:

I've added a simple regression test to postgres_fdw, testing that
batch
sizes > 65535 work fine, and pushed the fix.

I was earlier thinking of adding one, but stopped because it might
increase the regression test execution time. It looks like that's true
- with and without the test case it takes 17 sec and 4 sec
respectively on my dev system which is 4X slower. I'm not sure if this
is okay.

The cost, versus the odds of ever detecting a problem, doesn't
seem like a good tradeoff.

I took a quick look and noted that on buildfarm member longfin
(to take a random example that's sitting a few feet from me),
the time for contrib-install-check went from 34 seconds before
this patch to 40 seconds after.  I find that completely
unacceptable compared to the likely value of this test case.

Note that the problem here is [1] - we're creating a lot of slots
referencing the same tuple descriptor, which inflates the duration.
There's a fix in the other thread, which eliminates ~99% of the
overhead. I plan to push that fix soon (a day or two).

Forgot to add the link:

[1]: /messages/by-id/ebbbcc7d-4286-8c28-0272-61b4753af761@enterprisedb.com

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#18)
Re: Fdw batch insert error out when set batch_size > 65535

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

Note that the problem here is [1] - we're creating a lot of slots
referencing the same tuple descriptor, which inflates the duration.
There's a fix in the other thread, which eliminates ~99% of the
overhead. I plan to push that fix soon (a day or two).

Oh, okay, as long as there's a plan to bring the time back down.

regards, tom lane

#21 Tomas Vondra (tomas.vondra@2ndquadrant.com), in reply to Tom Lane (#20)
#22 Tomas Vondra (tomas.vondra@2ndquadrant.com), in reply to Tomas Vondra (#21)
#23 Tom Lane (tgl@sss.pgh.pa.us), in reply to Tomas Vondra (#22)
#24 Alvaro Herrera (alvherre@2ndquadrant.com), in reply to Tomas Vondra (#22)
#25 Tomas Vondra (tomas.vondra@2ndquadrant.com), in reply to Alvaro Herrera (#24)
#26 Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com), in reply to Alvaro Herrera (#24)
#27 Tomas Vondra (tomas.vondra@2ndquadrant.com), in reply to Bharath Rupireddy (#26)
#28 Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com), in reply to Tomas Vondra (#27)
#29 Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com), in reply to Bharath Rupireddy (#28)
#30 Andres Freund (andres@anarazel.de), in reply to Tom Lane (#23)
#31 Tom Lane (tgl@sss.pgh.pa.us), in reply to Andres Freund (#30)
#32 Tomas Vondra (tomas.vondra@2ndquadrant.com), in reply to Tom Lane (#31)
#33 Tomas Vondra (tomas.vondra@2ndquadrant.com), in reply to Tomas Vondra (#32)