Move rows from one database to other
I want to move table rows from one database to an central database.
Both run PostgreSQL.
My use case looks like this:
There are N satellite databases in different data centers. N is about 100 at the moment.
There is one central database.
I need a way to reliably move rows from the satellite databases to the central one
Example
The rows of host1 look like this:
host1, 2017-02-21, abc
host1, 2017-02-20, def
host1, 2017-02-19, ghi
The rows of host2 look like this:
host2, 2017-02-21, foo
host2, 2017-02-20, bar
host2, 2017-02-19, blu
After syncing, all lines which were transferred should be deleted on the satellite databases.
The central table should look like this (it has the same schema)
host1, 2017-02-21, abc
host1, 2017-02-20, def
host1, 2017-02-19, ghi
host2, 2017-02-21, foo
host2, 2017-02-20, bar
host2, 2017-02-19, blu
I don't want to code this myself, since there a tons of possible race conditions:
- inserts can happen during syncing.
- Network can break during syncing.
- inserts into the central table can break (e.g. disk full): No loss at the satellite database must happen.
- ...
How to solve this with PostgreSQL?
Regards,
Thomas Güttler
--
Thomas Guettler http://www.thomas-guettler.de/
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 02/21/2017 12:53 AM, Thomas Güttler wrote:
I want to move table rows from one database to an central database.
You actually talking about moving from ~100 databases to the central
database, correct?
Both run PostgreSQL.
Are all the Postgres instances the same version and what is the version
or versions?
My use case looks like this:
There are N satellite databases in different data centers. N is about
100 at the moment.There is one central database.
I need a way to reliably move rows from the satellite databases to the
central one
Two ways I can think of:
https://www.postgresql.org/docs/9.6/static/dblink.html
https://www.postgresql.org/docs/9.6/static/postgres-fdw.html
Example
The rows of host1 look like this:
host1, 2017-02-21, abc
host1, 2017-02-20, def
host1, 2017-02-19, ghiThe rows of host2 look like this:
host2, 2017-02-21, foo
host2, 2017-02-20, bar
host2, 2017-02-19, bluAfter syncing, all lines which were transferred should be deleted on the
satellite databases.The central table should look like this (it has the same schema)
host1, 2017-02-21, abc
host1, 2017-02-20, def
host1, 2017-02-19, ghi
host2, 2017-02-21, foo
host2, 2017-02-20, bar
host2, 2017-02-19, blu
Is there a Primary Key on the satellite tables or some way of
determining unique rows?
Is there any existing overlap between the data in the central database
and the satellite databases?
I don't want to code this myself, since there a tons of possible race
conditions:
How much data are you talking about moving from each database?
How active are the satellite databases?
- inserts can happen during syncing.
Can UPDATEs happen?
- Network can break during syncing.
- inserts into the central table can break (e.g. disk full): No loss at
the satellite database must happen.
- ...How to solve this with PostgreSQL?
Regards,
Thomas Güttler
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
You can try OmniDB: http://www.omnidb.com.br/en_index.aspx
OmniDB has a Convert feature, where you can set a data transfer, even if
the target table exists.
Em ter, 21 de fev de 2017 às 11:18, Adrian Klaver <adrian.klaver@aklaver.com>
escreveu:
On 02/21/2017 12:53 AM, Thomas Güttler wrote:
I want to move table rows from one database to an central database.
You actually talking about moving from ~100 databases to the central
database, correct?Both run PostgreSQL.
Are all the Postgres instances the same version and what is the version
or versions?My use case looks like this:
There are N satellite databases in different data centers. N is about
100 at the moment.There is one central database.
I need a way to reliably move rows from the satellite databases to the
central oneTwo ways I can think of:
https://www.postgresql.org/docs/9.6/static/dblink.html
https://www.postgresql.org/docs/9.6/static/postgres-fdw.html
Example
The rows of host1 look like this:
host1, 2017-02-21, abc
host1, 2017-02-20, def
host1, 2017-02-19, ghiThe rows of host2 look like this:
host2, 2017-02-21, foo
host2, 2017-02-20, bar
host2, 2017-02-19, bluAfter syncing, all lines which were transferred should be deleted on the
satellite databases.The central table should look like this (it has the same schema)
host1, 2017-02-21, abc
host1, 2017-02-20, def
host1, 2017-02-19, ghi
host2, 2017-02-21, foo
host2, 2017-02-20, bar
host2, 2017-02-19, bluIs there a Primary Key on the satellite tables or some way of
determining unique rows?Is there any existing overlap between the data in the central database
and the satellite databases?I don't want to code this myself, since there a tons of possible race
conditions:How much data are you talking about moving from each database?
How active are the satellite databases?
- inserts can happen during syncing.
Can UPDATEs happen?
- Network can break during syncing.
- inserts into the central table can break (e.g. disk full): No loss at
the satellite database must happen.
- ...How to solve this with PostgreSQL?
Regards,
Thomas Güttler--
Adrian Klaver
adrian.klaver@aklaver.com--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
William Ivanski - Microsoft MVP
On Tue, Feb 21, 2017 at 9:27 AM, William Ivanski <william.ivanski@gmail.com>
wrote:
You can try OmniDB: http://www.omnidb.com.br/en_index.aspx
OmniDB has a Convert feature, where you can set a data transfer, even if
the target table exists.Em ter, 21 de fev de 2017 às 11:18, Adrian Klaver <
adrian.klaver@aklaver.com> escreveu:On 02/21/2017 12:53 AM, Thomas Güttler wrote:
I want to move table rows from one database to an central database.
You actually talking about moving from ~100 databases to the central
database, correct?Both run PostgreSQL.
Are all the Postgres instances the same version and what is the version
or versions?My use case looks like this:
There are N satellite databases in different data centers. N is about
100 at the moment.There is one central database.
I need a way to reliably move rows from the satellite databases to the
central oneTwo ways I can think of:
https://www.postgresql.org/docs/9.6/static/dblink.html
https://www.postgresql.org/docs/9.6/static/postgres-fdw.html
Example
The rows of host1 look like this:
host1, 2017-02-21, abc
host1, 2017-02-20, def
host1, 2017-02-19, ghiThe rows of host2 look like this:
host2, 2017-02-21, foo
host2, 2017-02-20, bar
host2, 2017-02-19, bluAfter syncing, all lines which were transferred should be deleted on the
satellite databases.The central table should look like this (it has the same schema)
host1, 2017-02-21, abc
host1, 2017-02-20, def
host1, 2017-02-19, ghi
host2, 2017-02-21, foo
host2, 2017-02-20, bar
host2, 2017-02-19, bluIs there a Primary Key on the satellite tables or some way of
determining unique rows?Is there any existing overlap between the data in the central database
and the satellite databases?I don't want to code this myself, since there a tons of possible race
conditions:How much data are you talking about moving from each database?
How active are the satellite databases?
- inserts can happen during syncing.
Can UPDATEs happen?
- Network can break during syncing.
- inserts into the central table can break (e.g. disk full): No loss at
the satellite database must happen.
- ...How to solve this with PostgreSQL?
Regards,
Thomas Güttler--
Adrian Klaver
adrian.klaver@aklaver.com--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general--
William Ivanski - Microsoft MVP
Depending on how much data you want to move, and if the tables have the
same structure, you might also want to consider using
pg_dump -a
OR
multiple instances of
on satellite
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
TO { 'filename' | PROGRAM 'command' | STDOUT }
[ [ WITH ] ( option [, ...] ) ]
on central
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | PROGRAM 'command' | STDIN }
[ [ WITH ] ( option [, ...] ) ]
A BIG consideration is:
Does the Central DB have the same table structures as all satellite DB's?
Does the Central DB already have records in the tables.
Do all Satellite tables have unique records for each other?
As Adrian stated, it would be very helpful if you provided us with all
O/S and PostgreSQL vesions involved.
--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Il 21/02/2017 15:38, Melvin Davidson ha
scritto:</div>
<blockquote
cite="mid:CANu8FixqdzCZP4h3fPTfWxuhmXCmyfpcV3YtRDre=OoqPE8KkA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex"><span
class="gmail-HOEnZb"><font color="#888888">
<div>
<div dir="ltr">
</div>
</div>
</font></span></blockquote>
</div>
Depending on how much data you want to move, and if the tables
have the same structure, you might also want to consider using<br>
</div>
<div class="gmail_extra">pg_dump -a<br>
<br>
</div>
<div class="gmail_extra">OR<br>
</div>
<div class="gmail_extra">multiple instances of <br>
<pre class="gmail-SYNOPSIS">on satellite
COPY { <tt class="gmail-REPLACEABLE gmail-c2">table_name</tt> [ ( <tt class="gmail-REPLACEABLE gmail-c2">column_name</tt> [, ...] ) ] | ( <tt class="gmail-REPLACEABLE gmail-c2">query</tt> ) }
TO { '<tt class="gmail-REPLACEABLE gmail-c2">filename</tt>' | PROGRAM '<tt class="gmail-REPLACEABLE gmail-c2">command</tt>' | STDOUT }
[ [ WITH ] ( <tt class="gmail-REPLACEABLE gmail-c2">option</tt> [, ...] ) ]
</pre>
<pre class="gmail-SYNOPSIS">on central
COPY <tt class="gmail-REPLACEABLE gmail-c2">table_name</tt> [ ( <tt class="gmail-REPLACEABLE gmail-c2">column_name</tt> [, ...] ) ]
FROM { '<tt class="gmail-REPLACEABLE gmail-c2">filename</tt>' | PROGRAM '<tt class="gmail-REPLACEABLE gmail-c2">command</tt>' | STDIN }
[ [ WITH ] ( <tt class="gmail-REPLACEABLE gmail-c2">option</tt> [, ...] ) ]
A BIG consideration is:
Does the Central DB have the same table structures as all satellite DB's?
Does the Central DB already have records in the tables.
Do all Satellite tables have unique records for each other?
</pre>
<pre class="gmail-SYNOPSIS">As Adrian stated, it would be very helpful if you provided us with all O/S and PostgreSQL vesions involved.
</pre>
</div>
</div>
</blockquote>
<br>
AFAIK pg_dump is version-independent (I use it to upgrade some small
clusters), given that you are using the one from the newest version
and you are dumping from old and restoring from new. <br>
Another thing that I saw is that pg_dump is less I/O stressful (I
can almost safely dump databases from my production server in
traffic hours, even if I never tried doing it in rush hours :-) )
and therefore slower than COPY, but on the other side, using COPY,
the risk is to have all I/O bandwidth eaten by data transfer,
especially if the amount of data to be transferred is considerable.
I tried some times to COPY across different clusters with different
versions, but I always had issues.<br>
<br>
My 2 cents.<br>
<br>
Cheers<br>
Moreno.<br>
<br>
<blockquote
cite="mid:CANu8FixqdzCZP4h3fPTfWxuhmXCmyfpcV3YtRDre=OoqPE8KkA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra"><br>
-- <br>
<div class="gmail_signature">
<div dir="ltr"><font size="4"><b><span
style="font-family:courier new,monospace">Melvin
Davidson</span></b></font><br>
<font style="font-weight:bold" size="3"><span
style="color:rgb(128,0,255)">I reserve the right to
fantasize. Whether or not you </span><br
style="color:rgb(128,0,255)">
<span style="color:rgb(128,0,255)">wish to share my
fantasy is entirely up to you. </span><img
moz-do-not-send="true" style="color: rgb(128, 0,
255);"
src="http://us.i1.yimg.com/us.yimg.com/i/mesg/tsmileys2/01.gif"></font><br>
</div>
</div>
</div>
</div>
</blockquote>
<p><br>
</p>
</body>
</html>
Am 21.02.2017 um 15:12 schrieb Adrian Klaver:
On 02/21/2017 12:53 AM, Thomas Güttler wrote:
I want to move table rows from one database to an central database.
You actually talking about moving from ~100 databases to the central database, correct?
Both run PostgreSQL.
Are all the Postgres instances the same version and what is the version or versions?
Yes, all run postgres, but the version can be different (but not much).
Satellite-DB 9.5 and 9.6 and central 9.6.
My use case looks like this:
There are N satellite databases in different data centers. N is about
100 at the moment.There is one central database.
I need a way to reliably move rows from the satellite databases to the
central oneTwo ways I can think of:
https://www.postgresql.org/docs/9.6/static/dblink.html
https://www.postgresql.org/docs/9.6/static/postgres-fdw.html
Since postgres_fdw is newer, I would focus on it, right?
If I understood it correctly, then there would be N (about 100) tables in the central DB.
What happens if there is a network outage (for about 3 minutes) during accessing a remote table?
Is there a Primary Key on the satellite tables or some way of determining unique rows?
The concrete schema is not specified up to now. But I guess UUID as primary key would be the best fit.
Or am I wrong?
Is there any existing overlap between the data in the central database and the satellite databases?
No, there won't be overlaps. Every satellite system creates its own rows.
How much data are you talking about moving from each database?
How active are the satellite databases?
100k rows per day per satellite. Each row has only few bytes.
Moving of rows should happen every ten minutes.
- inserts can happen during syncing.
Can UPDATEs happen?
No, rows get created and moved and later deleted.
Thank you Adrian for your questions. It helped me to narrow down my problem.
Regards,
Thomas
--
Thomas Guettler http://www.thomas-guettler.de/
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 02/21/2017 12:53 AM, Thomas Güttler wrote:
I want to move table rows from one database to an central database.
Both run PostgreSQL.
How to solve this with PostgreSQL?
Should have added earlier. This is a specific case of the more general
case of ETL(Extract/Transform/Load). There are a host of tools out there
that do this. For instance I use Python and the following is available:
http://petl.readthedocs.io/en/latest/index.html
What tool you choose comes down to what you are comfortable with:
1) Writing your own programs/scripts and in what language?
2) Using a GUI that sets things up for you.
Regards,
Thomas Güttler
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Am 21.02.2017 um 15:27 schrieb William Ivanski:
You can try OmniDB: http://www.omnidb.com.br/en_index.aspx
OmniDB has a Convert feature, where you can set a data transfer, even if the target table exists.
I am unsure if omnidb is the right tool here.
I don't need a GUI. The movement of the rows should happen in background.
Regards,
Thomas Güttler
--
Thomas Guettler http://www.thomas-guettler.de/
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Depending on how much data you want to move, and if the tables have the same structure, you might also want to consider
using
pg_dump -aOR
multiple instances ofon satellite
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
TO { 'filename' | PROGRAM 'command' | STDOUT }
[ [ WITH ] ( option [, ...] ) ]on central
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | PROGRAM 'command' | STDIN }
[ [ WITH ] ( option [, ...] ) ]A BIG consideration is:
Does the Central DB have the same table structures as all satellite DB's?
yes, same schema
Does the Central DB already have records in the tables.
yes, movement of rows should happen every ten minutes.
Do all Satellite tables have unique records for each other?
Yes, UUID primary key.
As Adrian stated, it would be very helpful if you provided us with all O/S and PostgreSQL vesions involved.
Versions are 9.5 and 9.6
I have other concerns: atomar transaction. Movement should happen completely or not all.
I don't think you can do this reliable (atomic transaction) with "copy table_name".
Regards,
Thomas
--
Thomas Guettler http://www.thomas-guettler.de/
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Tue, Feb 21, 2017 at 11:10 AM, Thomas Güttler <
guettliml@thomas-guettler.de> wrote:
Depending on how much data you want to move, and if the tables have the
same structure, you might also want to consider
using
pg_dump -aOR
multiple instances ofon satellite
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
TO { 'filename' | PROGRAM 'command' | STDOUT }
[ [ WITH ] ( option [, ...] ) ]on central
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | PROGRAM 'command' | STDIN }
[ [ WITH ] ( option [, ...] ) ]A BIG consideration is:
Does the Central DB have the same table structures as all satellite DB's?yes, same schema
Does the Central DB already have records in the tables.
yes, movement of rows should happen every ten minutes.
Do all Satellite tables have unique records for each other?
Yes, UUID primary key.
As Adrian stated, it would be very helpful if you provided us with all O/S
and PostgreSQL vesions involved.
Versions are 9.5 and 9.6
I have other concerns: atomar transaction. Movement should happen
completely or not all.I don't think you can do this reliable (atomic transaction) with "copy
table_name".Regards,
Thomas--
Thomas Guettler http://www.thomas-guettler.de/
I have other concerns: atomar transaction. Movement should happen
completely or not all.
I don't think you can do this reliable (atomic transaction) with "copy
table_name".
You can if you wrap it in a transaction:
EG:
BEGIN;
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
TO { 'filename' | PROGRAM 'command' | STDOUT }
[ [ WITH ] ( option [, ...] ) ]
COMMIT;
BEGIN;
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | PROGRAM 'command' | STDIN }
[ [ WITH ] ( option [, ...] ) ]
COMMIT;
--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
On 02/21/2017 07:53 AM, Thomas Güttler wrote:
Am 21.02.2017 um 15:12 schrieb Adrian Klaver:
On 02/21/2017 12:53 AM, Thomas Güttler wrote:
I want to move table rows from one database to an central database.
You actually talking about moving from ~100 databases to the central
database, correct?Both run PostgreSQL.
Are all the Postgres instances the same version and what is the
version or versions?Yes, all run postgres, but the version can be different (but not much).
Satellite-DB 9.5 and 9.6 and central 9.6.My use case looks like this:
There are N satellite databases in different data centers. N is about
100 at the moment.There is one central database.
I need a way to reliably move rows from the satellite databases to the
central oneTwo ways I can think of:
https://www.postgresql.org/docs/9.6/static/dblink.html
https://www.postgresql.org/docs/9.6/static/postgres-fdw.html
Since postgres_fdw is newer, I would focus on it, right?
If I understood it correctly, then there would be N (about 100) tables
in the central DB.What happens if there is a network outage (for about 3 minutes) during
accessing a remote table?
I misunderstood your original intent, I thought this was a one time
process to move data to the central database. Given that it is to be a
continuous process a FDW may not be the answer, one of the reasons being
the above question. You will be denied the data in the remote table
during the outage. Also not sure what you will be doing with the data in
the central database and how often? In any case it will involve reaching
out to all the satellites each time you want to query the latest data.
Looks more like some kind of push mechanism from the satellites to the
central database is in order. Then once the data is on the central
database it is 'captured'. A question that comes to mind is if there is
a problem should the data transfer from one or more satellites lag that
of the others?
Is there a Primary Key on the satellite tables or some way of
determining unique rows?The concrete schema is not specified up to now. But I guess UUID as
primary key would be the best fit.
Or am I wrong?Is there any existing overlap between the data in the central database
and the satellite databases?No, there won't be overlaps. Every satellite system creates its own rows.
How much data are you talking about moving from each database?
How active are the satellite databases?100k rows per day per satellite. Each row has only few bytes.
Moving of rows should happen every ten minutes.
- inserts can happen during syncing.
Can UPDATEs happen?
No, rows get created and moved and later deleted.
Thank you Adrian for your questions. It helped me to narrow down my
problem.Regards,
Thomas
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 02/21/2017 08:06 AM, Thomas Güttler wrote:
Am 21.02.2017 um 15:27 schrieb William Ivanski:
You can try OmniDB: http://www.omnidb.com.br/en_index.aspx
OmniDB has a Convert feature, where you can set a data transfer, even
if the target table exists.I am unsure if omnidb is the right tool here.
I don't need a GUI. The movement of the rows should happen in background.
Given the versions of Postgres you are using there is logical replication:
https://www.postgresql.org/docs/9.5/static/protocol-replication.html
https://2ndquadrant.com/en/resources/pglogical/
"AGGREGATE - Accumulate changes from sharded database servers into a
Data Warehouse"
I have not used this capability yet, so others would have to comment on
its applicability.
Regards,
Thomas Güttler
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
I misunderstood your original intent, I thought this was a one time process to move data to the central database. Given
that it is to be a continuous process a FDW may not be the answer, one of the reasons being the above question. You will
be denied the data in the remote table during the outage. Also not sure what you will be doing with the data in the
central database and how often? In any case it will involve reaching out to all the satellites each time you want to
query the latest data. Looks more like some kind of push mechanism from the satellites to the central database is in
order. Then once the data is on the central database it is 'captured'. A question that comes to mind is if there is a
problem should the data transfer from one or more satellites lag that of the others?
I am sorry. Yes, I forgot to mention that the process should be a continous push.
If not all data is in the central, that's ok.
The central should only see the data which is already in the central place.
Up to now it is not important who triggers this process. It could be a push (from satellite to central) or a pull (from
central to satellite)
If there is a lag ... then there is a lag. That can happen (network outage), this is not very important.
More important is that no single row gets lost or duplicated.
Regards,
Thomas
--
Thomas Guettler http://www.thomas-guettler.de/
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
I have other concerns: atomar transaction. Movement should happen completely or not all.
I don't think you can do this reliable (atomic transaction) with "copy table_name".You can if you wrap it in a transaction:
I want to **move** the data. The data should get deleted on the satellite after transfer.
I don't know how to delete the data which was copied, since inserts can happen during the copy statement.
Regards,
Thomas Güttler
--
Thomas Guettler http://www.thomas-guettler.de/
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 02/22/2017 04:51 AM, Thomas Güttler wrote:
I have other concerns: atomar transaction. Movement should happen
completely or not all.
I don't think you can do this reliable (atomic transaction) with
"copy table_name".You can if you wrap it in a transaction:
I want to **move** the data. The data should get deleted on the
satellite after transfer.
Well the replication suggestion is out.
I don't know how to delete the data which was copied, since inserts can
happen during the copy statement.
However you end up doing this I think you will probably need some sort
of flag on the rows on the satellites. It could be a timestamp field of
when the rows where inserted on the satellite or a boolean
field(copied). First instinct is to use an insert timestamp and a
tracking table that stores the last timestamp used to move rows, where
the timestamp is only written on a successful transfer. To improve the
chances of successful transfer more smaller transfer batches rather then
larger transfers.
Regards,
Thomas Güttler
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Thomas:
On Wed, Feb 22, 2017 at 1:51 PM, Thomas Güttler
<guettliml@thomas-guettler.de> wrote:
I want to **move** the data. The data should get deleted on the satellite
after transfer.
I don't know how to delete the data which was copied, since inserts can
happen during the copy statement.
Depending on the structure / control / kind of ops you have on the
satellite you can try a two-table trick, similar to a common one used
to process files in spool directories.
1st, you move everything to a holding table transactionally in the
satellite ( insert into holding table delete from main table returning
* )
Then you have the holding table to transfer and clear, not touched by
anyone ( as transfer process is the only one touching it )
You can do some tricks. You could inherit holding from main if you
want to be able to see data while it is been transferred. Or play
renaming tricks. It all depends in the concrete app, but you can try
to fit the pattern in it, I've done it several times and its a useful
one.
Francisco Olarte.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Maybe you can do something like
WITH cte AS (
DELETE FROM t1 WHERE cond
RETURNINIG *
)
INSERT into t2
SELECT * FROM cte;
To move rows between tables with combination with fdw_postgres and
foreign table.
Je;
On Thu, Feb 23, 2017 at 10:33:27AM +0100, Francisco Olarte wrote:
Thomas:
On Wed, Feb 22, 2017 at 1:51 PM, Thomas G�ttler
<guettliml@thomas-guettler.de> wrote:I want to **move** the data. The data should get deleted on the satellite
after transfer.
I don't know how to delete the data which was copied, since inserts can
happen during the copy statement.Depending on the structure / control / kind of ops you have on the
satellite you can try a two-table trick, similar to a common one used
to process files in spool directories.1st, you move everything to a holding table transactionally in the
satellite ( insert into holding table delete from main table returning
* )Then you have the holding table to transfer and clear, not touched by
anyone ( as transfer process is the only one touching it )You can do some tricks. You could inherit holding from main if you
want to be able to see data while it is been transferred. Or play
renaming tricks. It all depends in the concrete app, but you can try
to fit the pattern in it, I've done it several times and its a useful
one.Francisco Olarte.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Am 22.02.2017 um 16:00 schrieb Adrian Klaver:
On 02/22/2017 04:51 AM, Thomas Güttler wrote:
I have other concerns: atomar transaction. Movement should happen
completely or not all.
I don't think you can do this reliable (atomic transaction) with
"copy table_name".You can if you wrap it in a transaction:
I want to **move** the data. The data should get deleted on the
satellite after transfer.Well the replication suggestion is out.
I don't know how to delete the data which was copied, since inserts can
happen during the copy statement.However you end up doing this I think you will probably need some sort of flag on the rows on the satellites. It could
be a timestamp field of when the rows where inserted on the satellite or a boolean field(copied). First instinct is to
use an insert timestamp and a tracking table that stores the last timestamp used to move rows, where the timestamp is
only written on a successful transfer. To improve the chances of successful transfer more smaller transfer batches
rather then larger transfers.
I really need a solid solution.
You said "... improve the chances of successful transfer ...". This makes me nervous.
Delays are no problems, but data loss or duplication is.
Regards,
Thomas
--
Thomas Guettler http://www.thomas-guettler.de/
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Am 23.02.2017 um 10:33 schrieb Francisco Olarte:
Thomas:
On Wed, Feb 22, 2017 at 1:51 PM, Thomas Güttler
<guettliml@thomas-guettler.de> wrote:I want to **move** the data. The data should get deleted on the satellite
after transfer.
I don't know how to delete the data which was copied, since inserts can
happen during the copy statement.
Depending on the structure / control / kind of ops you have on the
satellite you can try a two-table trick, similar to a common one used
to process files in spool directories.
[cut]
This sound good. Is there a name for this trick, to find more details?
Regards,
Thomas
--
Thomas Guettler http://www.thomas-guettler.de/
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Am 23.02.2017 um 13:44 schrieb Leknín Řepánek:
Maybe you can do something like
WITH cte AS (
DELETE FROM t1 WHERE cond
RETURNINIG *
)
INSERT into t2
SELECT * FROM cte;To move rows between tables with combination with fdw_postgres and
foreign table.
... this way you don't need the second table ... nice.
--
Thomas Guettler http://www.thomas-guettler.de/
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general