pg_dump multi VALUES INSERT

Started by Surafel Temesgenover 7 years ago66 messageshackers
Jump to latest
#1Surafel Temesgen
surafel3000@gmail.com

Hello,

According to the documentation –inserts option is mainly useful for making
dumps that can be loaded into non-PostgreSQL databases and to reduce the
amount of rows that might lost during error in reloading but multi values
insert command are equally portable and compact and also faster to reload
than single row statement. I think it deserve an option of its own

The patch attached add additional option for multi values insert statement
with a default values of 100 row per statement so the row lose during error
is at most 100 rather than entire table.

Comments?

Regards

Surafel

Attachments:

multi_values_inserts_dum_v1.patchtext/x-patch; charset=US-ASCII; name=multi_values_inserts_dum_v1.patchDownload+252-12
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Surafel Temesgen (#1)
Re: pg_dump multi VALUES INSERT

Surafel Temesgen <surafel3000@gmail.com> writes:

According to the documentation –inserts option is mainly useful for making
dumps that can be loaded into non-PostgreSQL databases and to reduce the
amount of rows that might lost during error in reloading but multi values
insert command are equally portable and compact and also faster to reload
than single row statement. I think it deserve an option of its own

I don't actually see the point of this. If you want dump/reload speed
you should be using COPY. If that isn't your first priority, it's
unlikely that using an option like this would be a good idea. It makes
the effects of a bad row much harder to predict, and it increases your
odds of OOM problems with wide rows substantially.

I grant that COPY might not be an option if you're trying to transfer
data to a different DBMS, but the other problems seem likely to apply
anywhere. The bad-data hazard, in particular, is probably a much larger
concern than it is for Postgres-to-Postgres transfers.

regards, tom lane

#3Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#2)
Re: pg_dump multi VALUES INSERT

Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Surafel Temesgen <surafel3000@gmail.com> writes:

According to the documentation –inserts option is mainly useful for making
dumps that can be loaded into non-PostgreSQL databases and to reduce the
amount of rows that might lost during error in reloading but multi values
insert command are equally portable and compact and also faster to reload
than single row statement. I think it deserve an option of its own

I don't actually see the point of this. If you want dump/reload speed
you should be using COPY. If that isn't your first priority, it's
unlikely that using an option like this would be a good idea. It makes
the effects of a bad row much harder to predict, and it increases your
odds of OOM problems with wide rows substantially.

I grant that COPY might not be an option if you're trying to transfer
data to a different DBMS, but the other problems seem likely to apply
anywhere. The bad-data hazard, in particular, is probably a much larger
concern than it is for Postgres-to-Postgres transfers.

I can't say that I can really buy off on this argument.

The point of it is that it makes loading into other RDBMS faster. Yes,
it has many of the same issues as our COPY does, but we support it
because it's much faster. The same is true here, just for other
databases, so I'm +1 on the general idea.

I've not looked at the patch itself at all, yet.

Thanks!

Stephen

#4Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Surafel Temesgen (#1)
Re: pg_dump multi VALUES INSERT

The patch attached add additional option for multi values insert statement
with a default values of 100 row per statement so the row lose during error
is at most 100 rather than entire table.

Patch does not seem to apply anymore, could you rebase?

--
Fabien.

#5Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#3)
Re: pg_dump multi VALUES INSERT

On Wed, Oct 17, 2018 at 03:05:28PM -0400, Stephen Frost wrote:

The point of it is that it makes loading into other RDBMS faster. Yes,
it has many of the same issues as our COPY does, but we support it
because it's much faster. The same is true here, just for other
databases, so I'm +1 on the general idea.

Well, the patch author has mentioned that he cares about also being able
to detect errors when processing the dump, which multi inserts make that
easier to check for. However, even if you specify --data-only you still
need to worry about the first SET commands ahead, which requires manual
handling of the dump...

I am honestly not convinced that it is worth complicating pg_dump for
that, as there is no guarantee either that the DDLs generated by pg_dump
will be compatible with what other systems expect. This kind of
compatibility for fetch and reload can also be kind of tricky with
portability issues, so I'd rather let this stuff being handled correctly
by other tools like pgloader or others rather than expecting to get this
stuff half-baked within Postgres core tools.
--
Michael

#6Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#5)
Re: pg_dump multi VALUES INSERT

Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:

On Wed, Oct 17, 2018 at 03:05:28PM -0400, Stephen Frost wrote:

The point of it is that it makes loading into other RDBMS faster. Yes,
it has many of the same issues as our COPY does, but we support it
because it's much faster. The same is true here, just for other
databases, so I'm +1 on the general idea.

Well, the patch author has mentioned that he cares about also being able
to detect errors when processing the dump, which multi inserts make that
easier to check for. However, even if you specify --data-only you still
need to worry about the first SET commands ahead, which requires manual
handling of the dump...

That's hardly a serious complication..

I am honestly not convinced that it is worth complicating pg_dump for
that, as there is no guarantee either that the DDLs generated by pg_dump
will be compatible with what other systems expect. This kind of
compatibility for fetch and reload can also be kind of tricky with
portability issues, so I'd rather let this stuff being handled correctly
by other tools like pgloader or others rather than expecting to get this
stuff half-baked within Postgres core tools.

I can see an argument for not wanting to complicate pg_dump, but we've
explicitly stated that the purpose of --inserts is to facilitate
restoring into other database systems and I don't agree that we should
just punt on that entirely. For better or worse, there's an awful lot
of weight put behind things which are in core and we should take care to
do what we can to make those things better, especially when someone is
proposing a patch to improve the situation.

Sure, the patch might need work or have other issues, but that's an
opportunity for us to provide feedback to the author and encourage them
to improve the patch.

As for the other things that make it difficult to use pg_dump to get a
result out that can be loaded into other database systems- let's try to
improve on that too. Having an option to skip the 'setup' bits, such as
the SET commands, certainly wouldn't be hard.

I certainly don't see us adding code to pg_dump to handle 'fetch and
reload' into some non-PG system, or, really, even into a PG system, and
that certainly isn't something the patch does, so I don't think it's a
particularly interesting argument. Users can handle that as needed
themselves.

In other words, none of the arguments put forth really seem to be a
reason to reject the general idea of this patch, so I'm still +1 on
that. Having just glanced over the patch quickly, I think I would have
done something like '--inserts=100' as the way to enable it instead of
adding a new option though. Not that I feel very strongly about it.

Thanks!

Stephen

#7Surafel Temesgen
surafel3000@gmail.com
In reply to: Fabien COELHO (#4)
Re: pg_dump multi VALUES INSERT

hi,

On Sun, Nov 4, 2018 at 1:18 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Patch does not seem to apply anymore, could you rebase?

The attached patch is a rebased version and work by ‘inserts=100’ as
Stephen suggest

regards
Surafel

Attachments:

multi_values_inserts_dum_v2.patchtext/x-patch; charset=US-ASCII; name=multi_values_inserts_dum_v2.patchDownload+249-10
#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Surafel Temesgen (#7)
Re: pg_dump multi VALUES INSERT

On 2018-Nov-06, Surafel Temesgen wrote:

hi,

On Sun, Nov 4, 2018 at 1:18 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Patch does not seem to apply anymore, could you rebase?

The attached patch is a rebased version and work by ‘inserts=100’ as
Stephen suggest

I thought the suggestion was that the number could be any positive
integer, not hardcoded 100. It shouldn't take much more code to handle
it that way, which makes more sense to me.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Pavel Stehule
pavel.stehule@gmail.com
In reply to: Alvaro Herrera (#8)
Re: pg_dump multi VALUES INSERT

út 6. 11. 2018 v 18:18 odesílatel Alvaro Herrera <alvherre@2ndquadrant.com>
napsal:

On 2018-Nov-06, Surafel Temesgen wrote:

hi,

On Sun, Nov 4, 2018 at 1:18 PM Fabien COELHO <coelho@cri.ensmp.fr>

wrote:

Patch does not seem to apply anymore, could you rebase?

The attached patch is a rebased version and work by ‘inserts=100’ as
Stephen suggest

I thought the suggestion was that the number could be any positive
integer, not hardcoded 100. It shouldn't take much more code to handle
it that way, which makes more sense to me.

+1

100 looks really strange

Pavel

--

Show quoted text

Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10Surafel Temesgen
surafel3000@gmail.com
In reply to: Alvaro Herrera (#8)
Re: pg_dump multi VALUES INSERT

On Tue, Nov 6, 2018 at 8:18 PM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

On 2018-Nov-06, Surafel Temesgen wrote:

hi,

On Sun, Nov 4, 2018 at 1:18 PM Fabien COELHO <coelho@cri.ensmp.fr>

wrote:

Patch does not seem to apply anymore, could you rebase?

The attached patch is a rebased version and work by ‘inserts=100’ as
Stephen suggest

I thought the suggestion was that the number could be any positive
integer, not hardcoded 100.

ok

It shouldn't take much more code to handle
it that way, which makes more sense to me

yes its not much line of code. Attach is a patch that optionally accept the
number of row in a single insert statement and if it is not specified one
row per statement used

regards

Surafel

Attachments:

multi_values_inserts_dum_v3.patchtext/x-patch; charset=US-ASCII; name=multi_values_inserts_dum_v3.patchDownload+219-5
#11Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Surafel Temesgen (#10)
Re: pg_dump multi VALUES INSERT

On Thu, Nov 8, 2018 at 2:03 PM Surafel Temesgen <surafel3000@gmail.com> wrote:

yes its not much line of code. Attach is a patch that optionally accept the number of row in a single insert statement and if it is not specified one row per statement used

Hi,

Unfortunately, patch needs to be rebased, could you please post an updated
version?

#12Surafel Temesgen
surafel3000@gmail.com
In reply to: Dmitry Dolgov (#11)
Re: pg_dump multi VALUES INSERT

On Fri, Nov 30, 2018 at 7:16 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Unfortunately, patch needs to be rebased, could you please post an updated
version?

Thank you for informing, Here is an updated patch against current master
Regards
Surafel

Attachments:

multi_values_inserts_dum_v4.patchtext/x-patch; charset=US-ASCII; name=multi_values_inserts_dum_v4.patchDownload+218-4
#13Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Surafel Temesgen (#12)
Re: pg_dump multi VALUES INSERT

Hello Surafel,

Thank you for informing, Here is an updated patch against current master

Patch applies cleanly, compiles, "make check" is okay, but given that the
feature is not tested...

Feature should be tested somewhere.

ISTM that command-line switches with optional arguments should be avoided:
This feature is seldom used (hmmm... 2 existing instances), because it
interferes with argument processing if such switches are used as the last
one. It is only okay with commands which do not expect arguments. For
backward compatibility, this suggests to add another switch, eg
--insert-multi=100 or whatever, which would possibly default to 100. The
alternative is to break compatibility with adding a mandatory argument,
but I guess it would not be admissible to committers.

Function "atoi" parses "1zzz" as 1, which is debatable, so I'd suggest to
avoid it and use some stricter option and error out on malformed integers.

The --help output does not document the --inserts argument, nor the
documentation.

There is an indendation issue within the while loop.

Given that the implementation is largely a copy-paste of the preceding
function, I'd suggest to simply extend it so that it takes into account
the "multi insert" setting and default to the previous behavior if not
set.

--
Fabien.

#14Surafel Temesgen
surafel3000@gmail.com
In reply to: Fabien COELHO (#13)
Re: pg_dump multi VALUES INSERT

On Tue, Dec 25, 2018 at 2:47 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Thank you for looking into it

Hello Surafel,

Thank you for informing, Here is an updated patch against current master

Patch applies cleanly, compiles, "make check" is okay, but given that the
feature is not tested...

Feature should be tested somewhere.

ISTM that command-line switches with optional arguments should be avoided:
This feature is seldom used (hmmm... 2 existing instances), because it
interferes with argument processing if such switches are used as the last
one. It is only okay with commands which do not expect arguments. For
backward compatibility, this suggests to add another switch, eg
--insert-multi=100 or whatever, which would possibly default to 100. The
alternative is to break compatibility with adding a mandatory argument,
but I guess it would not be admissible to committers.

Function "atoi" parses "1zzz" as 1, which is debatable, so I'd suggest to
avoid it and use some stricter option and error out on malformed integers.

The --help output does not document the --inserts argument, nor the
documentation.

done

There is an indendation issue within the while loop.

Given that the implementation is largely a copy-paste of the preceding
function, I'd suggest to simply extend it so that it takes into account
the "multi insert" setting and default to the previous behavior if not
set.

At first i also try to do it like that but it seems the function will
became long and more complex to me

Regards

Surafel

Attachments:

multi_values_inserts_dum_v5.patchtext/x-patch; charset=US-ASCII; name=multi_values_inserts_dum_v5.patchDownload+238-5
#15Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Surafel Temesgen (#14)
Re: pg_dump multi VALUES INSERT

At first i also try to do it like that but it seems the function will
became long and more complex to me

Probably. But calling it with size 100 should result in the same behavior,
so it is really just an extension of the preceeding one? Or am I missing
something?

--
Fabien.

#16Surafel Temesgen
surafel3000@gmail.com
In reply to: Fabien COELHO (#15)
Re: pg_dump multi VALUES INSERT

On Fri, Dec 28, 2018 at 6:46 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

At first i also try to do it like that but it seems the function will
became long and more complex to me

Probably. But calling it with size 100 should result in the same behavior,
so it is really just an extension of the preceeding one? Or am I missing
something?

Specifying table data using single value insert statement and user
specified values insert statement
have enough deference that demand to be separate function and they are not
the same thing that should implement
with the same function. Regarding code duplication i think the solution is
making those code separate function
and call at appropriate place.

Regards
Surafel

#17David Rowley
dgrowleyml@gmail.com
In reply to: Surafel Temesgen (#14)
Re: pg_dump multi VALUES INSERT

Just looking at the v5 patch, it seems not to handle 0 column tables correctly.

For example:

# create table t();
# insert into t default values;
# insert into t default values;

$ pg_dump --table t --inserts --insert-multi=100 postgres > dump.sql

# \i dump.sql
[...]
INSERT 0 1
psql:dump.sql:35: ERROR: syntax error at or near ")"
LINE 1: );
^

I'm not aware of a valid way to insert multiple 0 column rows in a
single INSERT statement, so not sure how you're going to handle that
case.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#18Surafel Temesgen
surafel3000@gmail.com
In reply to: David Rowley (#17)
Re: pg_dump multi VALUES INSERT

Hi,
Thank you for looking at it
On Mon, Dec 31, 2018 at 12:38 PM David Rowley <david.rowley@2ndquadrant.com>
wrote:

Just looking at the v5 patch, it seems not to handle 0 column tables
correctly.

For example:

# create table t();
# insert into t default values;
# insert into t default values;

$ pg_dump --table t --inserts --insert-multi=100 postgres > dump.sql

# \i dump.sql
[...]
INSERT 0 1
psql:dump.sql:35: ERROR: syntax error at or near ")"
LINE 1: );
^

The attach patch contain a fix for it
Regards
Surafel

Attachments:

multi_values_inserts_dum_v6.patchtext/x-patch; charset=US-ASCII; name=multi_values_inserts_dum_v6.patchDownload+244-5
#19David Rowley
dgrowleyml@gmail.com
In reply to: Surafel Temesgen (#18)
Re: pg_dump multi VALUES INSERT

On Thu, 3 Jan 2019 at 01:50, Surafel Temesgen <surafel3000@gmail.com> wrote:

On Mon, Dec 31, 2018 at 12:38 PM David Rowley <david.rowley@2ndquadrant.com> wrote:

Just looking at the v5 patch, it seems not to handle 0 column tables correctly.

The attach patch contain a fix for it

+ /* if it is zero-column table then we're done */
+ if (nfields == 0)
+ {
+ archputs(insertStmt->data, fout);
+ continue;
+ }

So looks like you're falling back on one INSERT per row for this case.
Given that this function is meant to be doing 'dump_inserts_multiple'
INSERTs per row, I think the comment should give some details of why
we can't do multi-inserts, and explain the reason for it. "we're done"
is just not enough detail.

I've not looked at the rest of the patch.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#20Surafel Temesgen
surafel3000@gmail.com
In reply to: David Rowley (#19)
Re: pg_dump multi VALUES INSERT

On Thu, Jan 3, 2019 at 1:38 AM David Rowley <david.rowley@2ndquadrant.com>
wrote:

On Thu, 3 Jan 2019 at 01:50, Surafel Temesgen <surafel3000@gmail.com>
wrote:

On Mon, Dec 31, 2018 at 12:38 PM David Rowley <

david.rowley@2ndquadrant.com> wrote:

Just looking at the v5 patch, it seems not to handle 0 column tables

correctly.

The attach patch contain a fix for it

+ /* if it is zero-column table then we're done */
+ if (nfields == 0)
+ {
+ archputs(insertStmt->data, fout);
+ continue;
+ }

So looks like you're falling back on one INSERT per row for this case.
Given that this function is meant to be doing 'dump_inserts_multiple'
INSERTs per row, I think the comment should give some details of why
we can't do multi-inserts, and explain the reason for it. "we're done"
is just not enough detail.

right , attach patch add more detail comment

regards
Surafel

Attachments:

multi_values_inserts_dum_v6.patchtext/x-patch; charset=US-ASCII; name=multi_values_inserts_dum_v6.patchDownload+248-5
#21David Rowley
dgrowleyml@gmail.com
In reply to: Surafel Temesgen (#16)
#22Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Rowley (#21)
#23Surafel Temesgen
surafel3000@gmail.com
In reply to: David Rowley (#21)
#24David Rowley
dgrowleyml@gmail.com
In reply to: Surafel Temesgen (#23)
#25David G. Johnston
david.g.johnston@gmail.com
In reply to: Surafel Temesgen (#23)
#26Surafel Temesgen
surafel3000@gmail.com
In reply to: David Rowley (#24)
#27David Rowley
dgrowleyml@gmail.com
In reply to: Surafel Temesgen (#26)
#28Surafel Temesgen
surafel3000@gmail.com
In reply to: David Rowley (#27)
#29David G. Johnston
david.g.johnston@gmail.com
In reply to: Surafel Temesgen (#28)
#30David G. Johnston
david.g.johnston@gmail.com
In reply to: Fabien COELHO (#13)
#31David Rowley
dgrowleyml@gmail.com
In reply to: Surafel Temesgen (#28)
#32Surafel Temesgen
surafel3000@gmail.com
In reply to: David Rowley (#31)
#33Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Surafel Temesgen (#32)
#34Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Surafel Temesgen (#32)
#35David Rowley
dgrowleyml@gmail.com
In reply to: Alvaro Herrera (#33)
#36David Rowley
dgrowleyml@gmail.com
In reply to: Surafel Temesgen (#32)
#37Fabien COELHO
coelho@cri.ensmp.fr
In reply to: David Rowley (#36)
#38David Rowley
dgrowleyml@gmail.com
In reply to: Fabien COELHO (#37)
#39Fabien COELHO
coelho@cri.ensmp.fr
In reply to: David Rowley (#38)
#40David Rowley
dgrowleyml@gmail.com
In reply to: Fabien COELHO (#39)
#41Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Rowley (#35)
#42David Rowley
dgrowleyml@gmail.com
In reply to: Fabien COELHO (#39)
#43David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#42)
#44Fabien COELHO
coelho@cri.ensmp.fr
In reply to: David Rowley (#43)
#45David Rowley
dgrowleyml@gmail.com
In reply to: Fabien COELHO (#44)
#46Fabien COELHO
coelho@cri.ensmp.fr
In reply to: David Rowley (#45)
#47David Rowley
dgrowleyml@gmail.com
In reply to: Fabien COELHO (#46)
#48Surafel Temesgen
surafel3000@gmail.com
In reply to: Fabien COELHO (#46)
#49Michael Paquier
michael@paquier.xyz
In reply to: Surafel Temesgen (#48)
#50Surafel Temesgen
surafel3000@gmail.com
In reply to: Fabien COELHO (#44)
#51David Rowley
dgrowleyml@gmail.com
In reply to: Surafel Temesgen (#50)
#52Surafel Temesgen
surafel3000@gmail.com
In reply to: David Rowley (#51)
#53David Rowley
dgrowleyml@gmail.com
In reply to: Surafel Temesgen (#52)
#54Surafel Temesgen
surafel3000@gmail.com
In reply to: Fabien COELHO (#44)
#55David Rowley
dgrowleyml@gmail.com
In reply to: Surafel Temesgen (#54)
#56David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#55)
#57Fabien COELHO
coelho@cri.ensmp.fr
In reply to: David Rowley (#56)
#58Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#57)
#59Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Fabien COELHO (#57)
#60David Rowley
dgrowleyml@gmail.com
In reply to: Fabien COELHO (#57)
#61Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Rowley (#60)
#62David Rowley
dgrowleyml@gmail.com
In reply to: Alvaro Herrera (#61)
#63Peter Eisentraut
peter_e@gmx.net
In reply to: David Rowley (#62)
#64Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Eisentraut (#63)
#65Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Alvaro Herrera (#64)
#66Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Fabien COELHO (#65)