Merging two database dumps

Started by Alex O'Reealmost 8 years ago8 messagesgeneral
Jump to latest
#1Alex O'Ree
spyhunter99@gmail.com

I have a situation with multiple postgres servers running all with the same
databases and table structure. I need to periodically export the data from
each of there then merge them all into a single server. On occasion, it's
feasible for the same record (primary key) to be stored in two or more
servers

I was using pgdump without the --insert option however I just noticed that
pgrestore will stop inserting into a table when the conflict occurs,
leaving me with an incomplete set.

Question is what are my other options to skip over the conflicting record
when merging?

From the docs, it appears that making dumps with the --insert option may be
the only way to go however performance is an issue. In this case would
dropping all indexes help?

#2Pavan Teja
pavan.postgresdba@gmail.com
In reply to: Alex O'Ree (#1)
Re: Merging two database dumps

Hi Alex,
For storing duplicate rows, dropping primary and unique indexes is the only
way.

One alternative is create a column with timestamp which updates on every
insert/update so that timestamp will be primary. Hope it helps.

Regards,
Pavan

On Wed, Jun 13, 2018, 4:47 PM Alex O'Ree <spyhunter99@gmail.com> wrote:

Show quoted text

I have a situation with multiple postgres servers running all with the
same databases and table structure. I need to periodically export the data
from each of there then merge them all into a single server. On occasion,
it's feasible for the same record (primary key) to be stored in two or more
servers

I was using pgdump without the --insert option however I just noticed that
pgrestore will stop inserting into a table when the conflict occurs,
leaving me with an incomplete set.

Question is what are my other options to skip over the conflicting record
when merging?

From the docs, it appears that making dumps with the --insert option may
be the only way to go however performance is an issue. In this case would
dropping all indexes help?

#3Vijaykumar Jain
vjain@opentable.com
In reply to: Alex O'Ree (#1)
Re: [External] Merging two database dumps

You can try one option, although just a thought in the air 😊
Use postgres FDW ex. https://robots.thoughtbot.com/postgres-foreign-data-wrapper

Create foreign tables in the relevant server schema

And then union/union all 😊 or your custom constraint on the destination table where you dump the rows.

For ex.
You have server1, server2, server3

And you have server4 as your new single server.

You create FDW of server1, server2, server3 on server 4 and then import table into respective server schema.

server1.table1, server2.table1, server3.table1

and then
insert into server4.table1 select * from( select * from server1.table1 union select * from server2.table1 union select * from server3.table1) a;

something 😊

Thanks,
Vijay

From: Alex O'Ree <spyhunter99@gmail.com>
Date: Wednesday, June 13, 2018 at 4:47 PM
To: "pgsql-general@lists.postgresql.org" <pgsql-general@lists.postgresql.org>
Subject: [External] Merging two database dumps

I have a situation with multiple postgres servers running all with the same databases and table structure. I need to periodically export the data from each of there then merge them all into a single server. On occasion, it's feasible for the same record (primary key) to be stored in two or more servers

I was using pgdump without the --insert option however I just noticed that pgrestore will stop inserting into a table when the conflict occurs, leaving me with an incomplete set.

Question is what are my other options to skip over the conflicting record when merging?

From the docs, it appears that making dumps with the --insert option may be the only way to go however performance is an issue. In this case would dropping all indexes help?

#4Andreas Kretschmer
andreas@a-kretschmer.de
In reply to: Alex O'Ree (#1)
Re: Merging two database dumps

Am 13.06.2018 um 13:17 schrieb Alex O'Ree:

I have a situation with multiple postgres servers running all with the
same databases and table structure. I need to periodically export the
data from each of there then merge them all into a single server. On 
occasion, it's feasible for the same record (primary key) to be stored
in two or more servers

what should happen in this case?

I was using pgdump without the --insert option however I just noticed
that pgrestore will stop inserting into a table when the conflict
occurs, leaving me with an incomplete set.

Other solution:

* create the tables on the destination server without the PK or with an
other, new PK (maybe SERIAL)
* use logical replication to replicate the table from all your
source-db's to the destination table, see more here:
https://www.2ndquadrant.com/en/resources/pglogical/

your problem seems as a typical task for logical replication to me. You
needs 9.4 at least.

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com

#5Pavan Teja
pavan.postgresdba@gmail.com
In reply to: Vijaykumar Jain (#3)
Re: [External] Merging two database dumps

Yes Vijay,

It might work, but I'm thinking it will be a performance overhead in case of
complex data.

Regards,
Pavan

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html

#6Alex O'Ree
spyhunter99@gmail.com
In reply to: Andreas Kretschmer (#4)
Re: Merging two database dumps

Desired behavior is to just log the error and continue the import using
pgdump based copy commands

The servers are not on the same network. Sneaker net is the only way

On Wed, Jun 13, 2018, 7:42 AM Andreas Kretschmer <andreas@a-kretschmer.de>
wrote:

Show quoted text

Am 13.06.2018 um 13:17 schrieb Alex O'Ree:

I have a situation with multiple postgres servers running all with the
same databases and table structure. I need to periodically export the
data from each of there then merge them all into a single server. On
occasion, it's feasible for the same record (primary key) to be stored
in two or more servers

what should happen in this case?

I was using pgdump without the --insert option however I just noticed
that pgrestore will stop inserting into a table when the conflict
occurs, leaving me with an incomplete set.

Other solution:

* create the tables on the destination server without the PK or with an
other, new PK (maybe SERIAL)
* use logical replication to replicate the table from all your
source-db's to the destination table, see more here:
https://www.2ndquadrant.com/en/resources/pglogical/

your problem seems as a typical task for logical replication to me. You
needs 9.4 at least.

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com

#7Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Alex O'Ree (#6)
Re: Merging two database dumps

On 06/13/2018 06:21 AM, Alex O'Ree wrote:

Desired behavior is to just log the error and continue the import using
pgdump based copy commands

Each COPY is atomic so if any part of it fails the whole thing fails, so
you will not be able to achieve what you want that way.

The servers are not on the same network.  Sneaker net is the only way

On Wed, Jun 13, 2018, 7:42 AM Andreas Kretschmer
<andreas@a-kretschmer.de <mailto:andreas@a-kretschmer.de>> wrote:

Am 13.06.2018 um 13:17 schrieb Alex O'Ree:

I have a situation with multiple postgres servers running all

with the

same databases and table structure. I need to periodically export

the

data from each of there then merge them all into a single server. On
occasion, it's feasible for the same record (primary key) to be

stored

in two or more servers

what should happen in this case?

I was using pgdump without the --insert option however I just

noticed

that pgrestore will stop inserting into a table when the conflict
occurs, leaving me with an incomplete set.

Other solution:

* create the tables on the destination server without the PK or with an
other, new PK (maybe SERIAL)
* use logical replication to replicate the table from all your
source-db's to the destination table, see more here:
https://www.2ndquadrant.com/en/resources/pglogical/

your problem seems as a typical task for logical replication to me. You
needs 9.4 at least.

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com <http://www.2ndQuadrant.com&gt;

--
Adrian Klaver
adrian.klaver@aklaver.com

#8Alex O'Ree
spyhunter99@gmail.com
In reply to: Adrian Klaver (#7)
Re: Merging two database dumps

Thanks for the clarification

On Wed, Jun 13, 2018 at 9:32 AM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

Show quoted text

On 06/13/2018 06:21 AM, Alex O'Ree wrote:

Desired behavior is to just log the error and continue the import using
pgdump based copy commands

Each COPY is atomic so if any part of it fails the whole thing fails, so
you will not be able to achieve what you want that way.

The servers are not on the same network. Sneaker net is the only way

On Wed, Jun 13, 2018, 7:42 AM Andreas Kretschmer <andreas@a-kretschmer.de
<mailto:andreas@a-kretschmer.de>> wrote:

Am 13.06.2018 um 13:17 schrieb Alex O'Ree:

I have a situation with multiple postgres servers running all

with the

same databases and table structure. I need to periodically export

the

data from each of there then merge them all into a single server.

On

occasion, it's feasible for the same record (primary key) to be

stored

in two or more servers

what should happen in this case?

I was using pgdump without the --insert option however I just

noticed

that pgrestore will stop inserting into a table when the conflict
occurs, leaving me with an incomplete set.

Other solution:

* create the tables on the destination server without the PK or with
an
other, new PK (maybe SERIAL)
* use logical replication to replicate the table from all your
source-db's to the destination table, see more here:
https://www.2ndquadrant.com/en/resources/pglogical/

your problem seems as a typical task for logical replication to me.
You
needs 9.4 at least.

Regards, Andreas

-- 2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com <http://www.2ndQuadrant.com&gt;

--
Adrian Klaver
adrian.klaver@aklaver.com