Data Replication
Hi.
I am trying to determine what kind of data replication is currently
available in PostgreSQL. This is for purposes of examining capabilities
of PostgreSQL as compared to other RDBMSs.
I attempted some searches in various areas and came up with a
bewildering array of results but no clear answer.
a) Slony-I provides master/slave data replication (3rd party product
but FOSS)
b) PGCluster for Multi-master data replication, also 3rd party add-on
tool, also FOSS.
c) Postgres-R for multi-master data replication, appears to be a code
fork of PostgreSQL
d) Support for data replication in core PostgreSQL engine coming up, as
announced in this article in June 2008:
http://scale-out-blog.blogspot.com/2008/06/postgresql-gets-religion-abou
t.html
By contrast, when I search a similar question about MySQL, I get a clear
answer that both master/slave and multi-master data replication are
supported directly by the core engine.
Could somebody please help me sort this out? I would like to know
i) What is the current available support for data replication in
PostgreSQL?
ii) Does anybody have experience with these tools they could share?
iii) Is data replication planned for an upcoming release of PostgreSQL,
and if so what are the exact features and when is the release expected?
-Will
- - - - - Cisco - - - - -
This e-mail and any attachments may contain information which is confidential,
proprietary, privileged or otherwise protected by law. The information is solely
intended for the named addressee (or a person responsible for delivering it to
the addressee). If you are not the intended recipient of this message, you are
not authorized to read, print, retain, copy or disseminate this message or any
part of it. If you have received this e-mail in error, please notify the sender
immediately by return e-mail and delete it from your computer.
On Dec 10, 2008, at 2:18 PM, Rutherdale, Will wrote:
Hi.
I am trying to determine what kind of data replication is currently
available in PostgreSQL. This is for purposes of examining
capabilities
of PostgreSQL as compared to other RDBMSs.I attempted some searches in various areas and came up with a
bewildering array of results but no clear answer.a) Slony-I provides master/slave data replication (3rd party product
but FOSS)
b) PGCluster for Multi-master data replication, also 3rd party add-on
tool, also FOSS.
c) Postgres-R for multi-master data replication, appears to be a code
fork of PostgreSQL
d) Support for data replication in core PostgreSQL engine coming
up, as
announced in this article in June 2008:
http://scale-out-blog.blogspot.com/2008/06/postgresql-gets-religion-abou
t.htmlBy contrast, when I search a similar question about MySQL, I get a
clear
answer that both master/slave and multi-master data replication are
supported directly by the core engine.
Kinda.
Could somebody please help me sort this out? I would like to know
i) What is the current available support for data replication in
PostgreSQL?
There's not really such a thing as one true replication solution. There
are a wide range of different requirements and tradeoffs. HA is
different
to load balancing is different to geographical distribution and so on.
http://www.postgresql.org/docs/8.3/static/high-availability.html covers
some of the tradeoffs of different approaches.
Log shipping, skytools and pgpool are the three main methods I can
think of that you didn't mention.
http://wiki.postgresql.org/wiki/Replication%2C_Clustering%2C_and_Connection_Pooling
has
others, and a bunch of references.
ii) Does anybody have experience with these tools they could share?
iii) Is data replication planned for an upcoming release of
PostgreSQL,
and if so what are the exact features and when is the release
expected?
See http://www.postgresql.org/docs/8.3/static/warm-standby.html for
the current in-core support for log-shipping based replication.
Cheers,
Steve
On Wed, 2008-12-10 at 17:18 -0500, Rutherdale, Will wrote:
Hi.
I am trying to determine what kind of data replication is currently
available in PostgreSQL. This is for purposes of examining capabilities
of PostgreSQL as compared to other RDBMSs.I attempted some searches in various areas and came up with a
bewildering array of results but no clear answer.a) Slony-I provides master/slave data replication (3rd party product
but FOSS)
Correct.
b) PGCluster for Multi-master data replication, also 3rd party add-on
tool, also FOSS.
Correct. Unsure of stability.
c) Postgres-R for multi-master data replication, appears to be a code
fork of PostgreSQL
Not stable as far as I know.
d) Support for data replication in core PostgreSQL engine coming up, as
announced in this article in June 2008:
http://scale-out-blog.blogspot.com/2008/06/postgresql-gets-religion-abou
t.html
e) PITR which is asynchronous log shipping. This is available in all
versions >= 8.1 (foss)
f) PostgreSQL Replicator an asynchronous replication system and a fork
of the core postgresql (see
https://projects.commandprompt.com/public/replicator)
g) Londiste, created by Skype. Similar to Slony but easier to manage
By contrast, when I search a similar question about MySQL, I get a clear
answer that both master/slave and multi-master data replication are
supported directly by the core engine.
And broken but yes :)
Could somebody please help me sort this out? I would like to know
i) What is the current available support for data replication in
PostgreSQL?
See above. Let me know if it is unclear.
ii) Does anybody have experience with these tools they could share?
Slony is useful if a bit difficult to manage. It is extremely flexible
however. We have several customers that user it.
Replicator is a previously closed source product. It is easy to use and
the latest version for 8.3 is in beta with 1.9 on the horizon which adds
DDL replication. We have several customers that use it.
Londiste I have played with a bit, it works but I have no production
experience with it.
iii) Is data replication planned for an upcoming release of PostgreSQL,
and if so what are the exact features and when is the release expected?
Planned yes, guaranteed no and it is still log shipping which means read
only slaves are out (as I recall).
Sincerely,
Joshua D. Drake
-Will
- - - - - Cisco - - - - -
This e-mail and any attachments may contain information which is confidential,
proprietary, privileged or otherwise protected by law. The information is solely
intended for the named addressee (or a person responsible for delivering it to
the addressee). If you are not the intended recipient of this message, you are
not authorized to read, print, retain, copy or disseminate this message or any
part of it. If you have received this e-mail in error, please notify the sender
immediately by return e-mail and delete it from your computer.
--
PostgreSQL
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
Thanks very much, Steve.
The main (but not only) type of data replication activity I'm interested
in right now would be the warm standby. Thus it appears from the
documents you showed me that log shipping is one solution currently
available in PostgreSQL. I would want to make this work between
geographically separated machines that have TCP/IP connectivity between
them.
From what I understand from the documents so far, I can set up log
shipping using pgsql without any external packages. There also seems to
be support for periodic backups from the primary to the warm standby.
-Will
BTW Hope you don't mind my quoting style.
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Steve Atkins
Sent: 10 December 2008 17:50
To: PostgreSQL General
Subject: Re: [GENERAL] Data Replication
On Dec 10, 2008, at 2:18 PM, Rutherdale, Will wrote:
Hi.
I am trying to determine what kind of data replication is currently
available in PostgreSQL. This is for purposes of examining
capabilities
of PostgreSQL as compared to other RDBMSs.I attempted some searches in various areas and came up with a
bewildering array of results but no clear answer.a) Slony-I provides master/slave data replication (3rd party product
but FOSS)
b) PGCluster for Multi-master data replication, also 3rd party add-on
tool, also FOSS.
c) Postgres-R for multi-master data replication, appears to be a code
fork of PostgreSQL
d) Support for data replication in core PostgreSQL engine coming
up, as
announced in this article in June 2008:
http://scale-out-blog.blogspot.com/2008/06/postgresql-gets-religion-abou
t.html
By contrast, when I search a similar question about MySQL, I get a
clear
answer that both master/slave and multi-master data replication are
supported directly by the core engine.
Kinda.
Could somebody please help me sort this out? I would like to know
i) What is the current available support for data replication in
PostgreSQL?
There's not really such a thing as one true replication solution. There
are a wide range of different requirements and tradeoffs. HA is
different
to load balancing is different to geographical distribution and so on.
http://www.postgresql.org/docs/8.3/static/high-availability.html covers
some of the tradeoffs of different approaches.
Log shipping, skytools and pgpool are the three main methods I can
think of that you didn't mention.
http://wiki.postgresql.org/wiki/Replication%2C_Clustering%2C_and_Connect
ion_Pooling
has
others, and a bunch of references.
ii) Does anybody have experience with these tools they could share?
iii) Is data replication planned for an upcoming release of
PostgreSQL,
and if so what are the exact features and when is the release
expected?
See http://www.postgresql.org/docs/8.3/static/warm-standby.html for
the current in-core support for log-shipping based replication.
Cheers,
Steve
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
- - - - - Cisco - - - - -
This e-mail and any attachments may contain information which is confidential,
proprietary, privileged or otherwise protected by law. The information is solely
intended for the named addressee (or a person responsible for delivering it to
the addressee). If you are not the intended recipient of this message, you are
not authorized to read, print, retain, copy or disseminate this message or any
part of it. If you have received this e-mail in error, please notify the sender
immediately by return e-mail and delete it from your computer.
On Wed, 2008-12-10 at 18:34 -0500, Rutherdale, Will wrote:
Thanks very much, Steve.
The main (but not only) type of data replication activity I'm interested
in right now would be the warm standby. Thus it appears from the
documents you showed me that log shipping is one solution currently
available in PostgreSQL. I would want to make this work between
geographically separated machines that have TCP/IP connectivity between
them.From what I understand from the documents so far, I can set up log
shipping using pgsql without any external packages. There also seems to
be support for periodic backups from the primary to the warm standby.
You can. There is a BSD open source project here:
http://projects.commandprompt.com/public/pitrtools
That will help you with your warm standby needs quite a bit.
Joshua D. Drake
--
PostgreSQL
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
Thanks, Joshua.
As I mentioned to Steve, warm standby / log shipping seems to be the
main feature I'm looking for.
The PITR solution you mention: is that an improvement over regular log
shipping? Or do I misunderstand where that fits into the system?
-Will
-----Original Message-----
From: Joshua D. Drake [mailto:jd@commandprompt.com]
Sent: 10 December 2008 17:52
To: Rutherdale, Will
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Data Replication
On Wed, 2008-12-10 at 17:18 -0500, Rutherdale, Will wrote:
Hi.
I am trying to determine what kind of data replication is currently
available in PostgreSQL. This is for purposes of examining
capabilities
of PostgreSQL as compared to other RDBMSs.
I attempted some searches in various areas and came up with a
bewildering array of results but no clear answer.a) Slony-I provides master/slave data replication (3rd party product
but FOSS)
Correct.
b) PGCluster for Multi-master data replication, also 3rd party add-on
tool, also FOSS.
Correct. Unsure of stability.
c) Postgres-R for multi-master data replication, appears to be a code
fork of PostgreSQL
Not stable as far as I know.
d) Support for data replication in core PostgreSQL engine coming up,
as
announced in this article in June 2008:
http://scale-out-blog.blogspot.com/2008/06/postgresql-gets-religion-abou
t.html
e) PITR which is asynchronous log shipping. This is available in all
versions >= 8.1 (foss)
f) PostgreSQL Replicator an asynchronous replication system and a fork
of the core postgresql (see
https://projects.commandprompt.com/public/replicator)
g) Londiste, created by Skype. Similar to Slony but easier to manage
By contrast, when I search a similar question about MySQL, I get a
clear
answer that both master/slave and multi-master data replication are
supported directly by the core engine.
And broken but yes :)
Could somebody please help me sort this out? I would like to know
i) What is the current available support for data replication in
PostgreSQL?
See above. Let me know if it is unclear.
ii) Does anybody have experience with these tools they could share?
Slony is useful if a bit difficult to manage. It is extremely flexible
however. We have several customers that user it.
Replicator is a previously closed source product. It is easy to use and
the latest version for 8.3 is in beta with 1.9 on the horizon which adds
DDL replication. We have several customers that use it.
Londiste I have played with a bit, it works but I have no production
experience with it.
iii) Is data replication planned for an upcoming release of
PostgreSQL,
and if so what are the exact features and when is the release
expected?
Planned yes, guaranteed no and it is still log shipping which means read
only slaves are out (as I recall).
Sincerely,
Joshua D. Drake
-Will
- - - - - Cisco
- - - - -
This e-mail and any attachments may contain information which is
confidential,
proprietary, privileged or otherwise protected by law. The information
is solely
intended for the named addressee (or a person responsible for
delivering it to
the addressee). If you are not the intended recipient of this message,
you are
not authorized to read, print, retain, copy or disseminate this
message or any
part of it. If you have received this e-mail in error, please notify
the sender
immediately by return e-mail and delete it from your computer.
--
PostgreSQL
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
On Wed, 2008-12-10 at 18:45 -0500, Rutherdale, Will wrote:
Thanks, Joshua.
As I mentioned to Steve, warm standby / log shipping seems to be the
main feature I'm looking for.The PITR solution you mention: is that an improvement over regular log
shipping? Or do I misunderstand where that fits into the system?
It is a supplement to the system. Warm standby / log shipping in
postgresql by itself is very bare. Basically you have the ability to
archive files and take a base backup but there are no utilities to
actually perform any of those functions.
PITR Tools gives you a nice wrapper around all the various external
tools you need to get standby working. Once configured it even handles
archiver monitoring etc...
If you look here:
https://projects.commandprompt.com/public/pitrtools/browser/trunk/cmd_standby.README
It will show you how cmd_standby uses pg_standby to perform various
functions (including things like failover).
Sincerely,
Joshua D. Drake
--
PostgreSQL
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
We've done warm standby as you indicate, and we've not needed anything
special.
On the primary's postgresql.conf we use:
archive_command = '~/postgresql/bin/copyWAL "%p" "%f"'
Our copyWAL script is just a wrapper for 'scp' since we want to copy the
data encrypted over the network:
#!/bin/bash
WALPATH=$1
WALFILE=$2
WALCOPYLOG=~/postgresql/logs/WALcopy.log
echo $0 - $(date) - Copy WAL received PATH: $WALPATH and FILE: $WALFILE
$WALCOPYLOG
echo $0 - $(date) - $(ls -l $PGDATA/$WALPATH) >> $WALCOPYLOG
scp -B -C -p "$WALPATH"
user@otherhost.com:postgresql/recoveryWALs/"$WALFILE"
RET=$?
if [ $RET -ne 0 ]; then
echo $0 - $(date) - Copy WAL PATH: $WALPATH - failed to copy to backup
system, exit code: $RET >> $WALCOPYLOG
exit RET
fi
# 0 exit status means successfully copied
exit 0;
On the warm standby, our recovery.conf uses pg_standby, which is part of
the contrib code:
restore_command = '~/postgresql/bin/pg_standby -l -d -t
~/postgresql/recoveryWALs/STOP_RECOVERY ~/postgresql/recoveryWALs %f %p
%r 2>> ~/po
stgresql/logs/pg_standby.log'
We have a script that puts the STOP_RECOVERY file in the specified
folder when we want it to go into live mode.
Hope this helps....
David
Rutherdale, Will wrote:
Show quoted text
Thanks, Joshua.
As I mentioned to Steve, warm standby / log shipping seems to be the
main feature I'm looking for.The PITR solution you mention: is that an improvement over regular log
shipping? Or do I misunderstand where that fits into the system?-Will
On Thu, Dec 11, 2008 at 1:05 PM, David Wall <d.wall@computer.org> wrote:
We've done warm standby as you indicate, and we've not needed anything
special.
Thanks for sharing your configuation. I have one additional question thought...
How do you handle the reverting? For example.
Say I have a primary database which is located at my main data center.
I am log shipping to a secondary data center.
My primary database server goes down.
Presume I have some scripts or something which detects this and
switches to the secondary.
What happens when I bring the primary back on line. I now want this to
be primary again and catch up on all the transactions that were sent
to the secondary. I want the secondary to resume it's backup status.
Is there a product which enables this kind of functionality?
On Thu, 2008-12-11 at 15:21 +1300, Tim Uckun wrote:
What happens when I bring the primary back on line. I now want this to
be primary again and catch up on all the transactions that were sent
to the secondary. I want the secondary to resume it's backup status.
You have to run a new base backup and have the slave ship logs to the
master.
Is there a product which enables this kind of functionality?
https://projects.commandprompt.com/public/pitrtools
--
PostgreSQL
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
You have to run a new base backup and have the slave ship logs to the
master.
Mmmm. Does this backup have to be a full backup? What if your database
is very large?
I am hoping to get a setup which is similar to SQL server mirroring.
It uses a witness server to keep track of who got what "logs" (it's
based on transaction logging) and allows you to failover and fail back
without having to do a full backup in between.
Is there a product which enables this kind of functionality?
I'll check these out. Thanks.
On Wed, Dec 10, 2008 at 7:40 PM, Tim Uckun <timuckun@gmail.com> wrote:
You have to run a new base backup and have the slave ship logs to the
master.Mmmm. Does this backup have to be a full backup? What if your database
is very large?
Yes. Your backup is very large.
I am hoping to get a setup which is similar to SQL server mirroring.
It uses a witness server to keep track of who got what "logs" (it's
based on transaction logging) and allows you to failover and fail back
without having to do a full backup in between.
Nothing like that really exists for pgsql in terms of log shipping.
If you want some part of your db backed up / replicated somewhere, use
one of the real time replication. Failover in slony is pretty easy to
do and happens in seconds. But you do have to resubscribe the master
as a slave and copy everything over again after a failover to make the
old master the new master again.
With the slonik help scripts, it's pretty easy to drive, and lets you
do some interesting things, like having different indexes on the
target than on the source, i.e. for reporting queries. But DDL's a
bit of a pain. We take our app down and disconnect all clients before
running ddl changes, and it goes smooth, with a few minutes of
downtime tops.
Log shipping doesn't really lends itself to switching back and forth
between masters and slaves.
On Wed, 2008-12-10 at 20:41 -0700, Scott Marlowe wrote:
On Wed, Dec 10, 2008 at 7:40 PM, Tim Uckun <timuckun@gmail.com> wrote:
Log shipping doesn't really lends itself to switching back and forth
between masters and slaves.
Really? It seems to me that you can make a base backup just as fast as
you can sync from slony (or replicator or whatever).
Joshua D. Drake
--
PostgreSQL
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
On Wed, Dec 10, 2008 at 8:43 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
On Wed, 2008-12-10 at 20:41 -0700, Scott Marlowe wrote:
On Wed, Dec 10, 2008 at 7:40 PM, Tim Uckun <timuckun@gmail.com> wrote:
Log shipping doesn't really lends itself to switching back and forth
between masters and slaves.Really? It seems to me that you can make a base backup just as fast as
you can sync from slony (or replicator or whatever).
sorry if my post wasn't clear. slony's not really any better. I don't
think that there's any replication for pgsql that's easy to do that
in. Is there?
On Wed, 2008-12-10 at 21:39 -0700, Scott Marlowe wrote:
On Wed, Dec 10, 2008 at 8:43 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
On Wed, 2008-12-10 at 20:41 -0700, Scott Marlowe wrote:
On Wed, Dec 10, 2008 at 7:40 PM, Tim Uckun <timuckun@gmail.com> wrote:
Log shipping doesn't really lends itself to switching back and forth
between masters and slaves.Really? It seems to me that you can make a base backup just as fast as
you can sync from slony (or replicator or whatever).sorry if my post wasn't clear. slony's not really any better. I don't
think that there's any replication for pgsql that's easy to do that
in. Is there?
No probably not. I mean they are all pretty easy (especially log
shipping) but it is definitely true they are slow, depending on the size
of the database.
Joshua D. Drake
--
PostgreSQL
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
No probably not. I mean they are all pretty easy (especially log
shipping) but it is definitely true they are slow, depending on the size
of the database.
As an alternative is there a clustering or multi master replication
scheme that would be useful in a WAN? Preferably with a "prefered
server" option so it doesn't attempt to read or write from the remote
database unless the local one is down.
On Wed, Dec 10, 2008 at 08:41:30PM -0700, Scott Marlowe wrote:
one of the real time replication. Failover in slony is pretty easy to
do and happens in seconds. But you do have to resubscribe the master
as a slave and copy everything over again after a failover to make the
old master the new master again.
Note, however, that you can do controlled switchover in Slony that
requires no re-sync. If your hardware-health monitors are any good
(and they ought to be), you can use them to tell you that something is
wrong, and switch before things blow up. It's not perfect, but it has
been useful to me.
--
Andrew Sullivan
ajs@crankycanuck.ca
On Wed, 2008-12-10 at 18:34 -0500, Rutherdale, Will wrote:
Thanks very much, Steve.
The main (but not only) type of data replication activity I'm interested
in right now would be the warm standby. Thus it appears from the
documents you showed me that log shipping is one solution currently
available in PostgreSQL. I would want to make this work between
geographically separated machines that have TCP/IP connectivity between
them.From what I understand from the documents so far, I can set up log
shipping using pgsql without any external packages. There also seems to
be support for periodic backups from the primary to the warm standby.
Yes, everything you need for log shipping has been contributed to the
main project. If you read things elsewhere, please refer closely to the
docs which are the most accurate source of detail.
There is some further info here, but you won't need all of the tools
listed there in typical cases.
http://www.2ndquadrant.com/replication.htm
If you think anything else is required, please let us know and we can
update the docs, tools etc..
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Thu, 2008-12-11 at 17:09 +0000, Simon Riggs wrote:
On Wed, 2008-12-10 at 18:34 -0500, Rutherdale, Will wrote:
Thanks very much, Steve.
Yes, everything you need for log shipping has been contributed to the
main project. If you read things elsewhere, please refer closely to the
docs which are the most accurate source of detail.
I think this statement is misleading. The only thing core contains is
the ability to use a bunch of utilities (with the exception of
pg_standby) that aren't in core to provide log shipping.
Sincerely,
Joshua D. Drake
--
PostgreSQL
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
On Thu, 2008-12-11 at 09:14 -0800, Joshua D. Drake wrote:
On Thu, 2008-12-11 at 17:09 +0000, Simon Riggs wrote:
On Wed, 2008-12-10 at 18:34 -0500, Rutherdale, Will wrote:
Thanks very much, Steve.
Yes, everything you need for log shipping has been contributed to the
main project. If you read things elsewhere, please refer closely to the
docs which are the most accurate source of detail.I think this statement is misleading. The only thing core contains is
the ability to use a bunch of utilities (with the exception of
pg_standby) that aren't in core to provide log shipping.
True, we rely on the existence of rsync, scp etc.. and go to great pains
to provide as much choice as possible.
If you think other things are required you are welcome to contribute
them so they can be verified fault free by the community.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support