PG replication across DataCenters

Started by Kaushal Shriyan over 12 years ago, 36 messages (docs, general)
#1 Kaushal Shriyan
kaushalshriyan@gmail.com

Hi,

I have read on the web that PostgreSQL supports replication across data
centers. Are there any real-life use cases from anyone who has implemented
it? Please also help me understand the caveats I need to take care of if I
implement this setup.

Regards,

Kaushal

#2 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Kaushal Shriyan (#1)
Re: PG replication across DataCenters

Kaushal Shriyan wrote:

I have read on the web that Postgresql DB supports replication across data centers. Any real life
usecase examples if it has been implemented by anyone.

Well, we replicate a 1 TB database between two locations.
It is a fairly active OLTP application, but certainly not
pushing the limits of what PostgreSQL can do in transactions
per second.

But I get the impression that replication is widely accepted
and used by now.

Please also help me understand the caveats i
need to take care if i implement this setup.

Don't use synchronous replication if you have a high transaction
rate and a noticeable network latency between the sites.

Wait for the next bugfix release, since a nasty bug has just
been discovered.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3 Edson Richter
edsonrichter@hotmail.com
In reply to: Kaushal Shriyan (#1)
Re: PG replication across DataCenters

On 22/11/2013 08:43, Kaushal Shriyan wrote:

Hi,

I have read on the web that Postgresql DB supports replication across
data centers. Any real life usecase examples if it has been
implemented by anyone. Please also help me understand the caveats i
need to take care if i implement this setup.

Regards,

Kaushal

We have used asynchronous replication across data centers with 100%
success since 9.1. Currently we use 9.2.
Our setup involves an internet tunnel between the servers, which are about
2,000 km apart.
The only points that need attention are tuning wal_keep_segments
and the timeout, and the initial load (which can be huge,
depending on your data).
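
To make the wal_keep_segments point concrete, here is a rough sizing sketch in Python. It is not the configuration described above: the WAL rate and outage length are made-up numbers, the function name is hypothetical, and it assumes the default 16 MB WAL segment size. The idea is only that the primary has to retain enough segments to cover the longest link outage the standby should survive without a fresh initial load.

```python
import math

WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # assumed default WAL segment size (16 MB)


def wal_keep_segments_needed(wal_bytes_per_sec: float, outage_seconds: float) -> int:
    """Segments the primary must retain so a standby can catch up after an outage."""
    if wal_bytes_per_sec < 0 or outage_seconds < 0:
        raise ValueError("rate and outage duration must be non-negative")
    return math.ceil(wal_bytes_per_sec * outage_seconds / WAL_SEGMENT_BYTES)


# Hypothetical workload: 2 MiB/s of WAL, tolerate a 30-minute tunnel outage.
segments = wal_keep_segments_needed(2 * 1024 * 1024, 30 * 60)
print(segments)                                # 225
print(segments * WAL_SEGMENT_BYTES / 1024**3)  # disk needed in GiB: 3.515625
```

If the standby falls further behind than the retained segments cover, and no WAL archive is available, it has to be re-seeded from a new base backup.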

Regards,

Edson


#4 Torsten Förtsch
torsten.foertsch@gmx.net
In reply to: Laurenz Albe (#2)
Re: PG replication across DataCenters

On 22/11/13 11:57, Albe Laurenz wrote:

Don't use synchronous replication if you have a high transaction
rate and a noticeable network latency between the sites.

Wait for the next bugfix release, since a nasty bug has just
been discovered.

Can you please explain or provide a pointer for more information?

We have recently started to use sync replication over a line with >80ms
latency. It works for small transactions with a relatively low
transaction rate.

Avoid transactions using NOTIFY. Those acquire an exclusive lock during
commit that is released only when the remote host has also done its
commit. So, only one such transaction can be committing at a time.
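
A back-of-the-envelope sketch of why this hurts, in Python. It assumes that each such commit waits for roughly one network round trip and that the 80 ms figure is that round trip; local commit work is ignored, and the function name is made up for illustration.

```python
def max_serialized_commits_per_sec(round_trip_ms: float) -> float:
    """Upper bound on commits/s when each commit waits one round trip
    and only one commit can be in flight at a time."""
    if round_trip_ms <= 0:
        raise ValueError("round_trip_ms must be positive")
    return 1000.0 / round_trip_ms


for rtt in (1, 10, 80):
    rate = max_serialized_commits_per_sec(rtt)
    print(f"{rtt:>3} ms -> at most {rate:.1f} commits/s")
```

Under these assumptions, an 80 ms link caps serialized commits at about 12.5 per second, no matter how fast the servers are.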

Async replication works just fine.

Torsten


#5 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Torsten Förtsch (#4)
Re: PG replication across DataCenters

Torsten Förtsch wrote:

Don't use synchronous replication if you have a high transaction
rate and a noticeable network latency between the sites.

Wait for the next bugfix release, since a nasty bug has just
been discovered.

Can you please explain or provide a pointer for more information?

If you mean the bug I mentioned, see this thread:
/messages/by-id/20131119142001.GA10498@alap2.anarazel.de

Yours,
Laurenz Albe


#6 Kaushal Shriyan
kaushalshriyan@gmail.com
In reply to: Laurenz Albe (#5)
Re: PG replication across DataCenters

On Fri, Nov 22, 2013 at 6:14 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

Torsten Förtsch wrote:

Don't use synchronous replication if you have a high transaction
rate and a noticeable network latency between the sites.

Wait for the next bugfix release, since a nasty bug has just
been discovered.

Can you please explain or provide a pointer for more information?

If you mean the bug I mentioned, see this thread:

/messages/by-id/20131119142001.GA10498@alap2.anarazel.de

Yours,
Laurenz Albe

Hi,

I am not sure I understand the difference between async and sync
replication, and in which scenarios I should use one or the other.
Does it mean that sync replication is best within the same DC, and that
async is better than sync across DCs? Please help me
understand.

Regards,

Kaushal

#7 Michael Paquier
michael@paquier.xyz
In reply to: Kaushal Shriyan (#6)
Re: PG replication across DataCenters

On Fri, Nov 22, 2013 at 10:03 PM, Kaushal Shriyan
<kaushalshriyan@gmail.com> wrote:

I am not sure i understand the difference between async and sync replication
and on what scenarios i should use async or sync replication. Does it mean
if it is within same DC then sync replication is the best and if it is
across DC replication async is better than sync. Please help me understand.

In the case of synchronous replication, master node waits for the
confirmation that a given transaction has committed on slave side
before committing itself. This wait period can cause some delay, hence
it is preferable to use sync replication with nodes that far from each
other.
--
Michael


#8 Michael Paquier
michael@paquier.xyz
In reply to: Laurenz Albe (#5)
Re: PG replication across DataCenters

On Fri, Nov 22, 2013 at 9:44 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

Torsten Förtsch wrote:

Don't use synchronous replication if you have a high transaction
rate and a noticeable network latency between the sites.

Wait for the next bugfix release, since a nasty bug has just
been discovered.

Can you please explain or provide a pointer for more information?

If you mean the bug I mentioned, see this thread:
/messages/by-id/20131119142001.GA10498@alap2.anarazel.de

Bug that has just been fixed btw:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=98f58a30c1beb6ec0870d6520f49fb40d9d0b566
Regards,
--
Michael


#9 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Michael Paquier (#7)
Re: PG replication across DataCenters

Michael Paquier wrote:

On Fri, Nov 22, 2013 at 10:03 PM, Kaushal Shriyan <kaushalshriyan@gmail.com> wrote:

I am not sure i understand the difference between async and sync replication
and on what scenarios i should use async or sync replication. Does it mean
if it is within same DC then sync replication is the best and if it is
across DC replication async is better than sync. Please help me understand.

In the case of synchronous replication, master node waits for the
confirmation that a given transaction has committed on slave side
before committing itself. This wait period can cause some delay, hence
it is preferable to use sync replication with nodes that far from each
other.

I am sure that you wanted to say
"with nodes *not* that far from each other".

Basically, you have to choose between these options:
- Slow down processing, but don't lose a transaction on failover
(this would be synchronous, nodes close to each other)
- Replicate over longer distances, but possibly lose some
transactions on failover (that would be asynchronous).
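
The two options can be put side by side with a small Python sketch. Every number here is hypothetical, as are the class and function names; it assumes a synchronous commit costs about one extra network round trip, and that an asynchronous standby trails the primary by a fixed lag.

```python
from dataclasses import dataclass


@dataclass
class Link:
    round_trip_ms: float       # network round trip between the sites
    replication_lag_ms: float  # how far an async standby trails the primary


def sync_added_commit_latency_ms(link: Link) -> float:
    """Synchronous: each commit waits roughly one round trip for the standby."""
    return link.round_trip_ms


def async_commits_at_risk(link: Link, commits_per_sec: float) -> float:
    """Asynchronous: commits acknowledged locally but not yet on the standby."""
    return commits_per_sec * link.replication_lag_ms / 1000.0


same_dc = Link(round_trip_ms=0.5, replication_lag_ms=5)
cross_dc = Link(round_trip_ms=80, replication_lag_ms=200)
for name, link in (("same DC", same_dc), ("cross DC", cross_dc)):
    print(name, sync_added_commit_latency_ms(link), async_commits_at_risk(link, 500))
```

With these made-up figures, going synchronous across sites adds 80 ms to every commit, while staying asynchronous at 500 commits per second leaves about 100 commits exposed at the moment of a failover.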

Yours,
Laurenz Albe


#10 Michael Paquier
michael@paquier.xyz
In reply to: Laurenz Albe (#9)
Re: PG replication across DataCenters

On Fri, Nov 22, 2013 at 11:46 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

Michael Paquier wrote:

On Fri, Nov 22, 2013 at 10:03 PM, Kaushal Shriyan <kaushalshriyan@gmail.com> wrote:

I am not sure i understand the difference between async and sync replication
and on what scenarios i should use async or sync replication. Does it mean
if it is within same DC then sync replication is the best and if it is
across DC replication async is better than sync. Please help me understand.

In the case of synchronous replication, master node waits for the
confirmation that a given transaction has committed on slave side
before committing itself. This wait period can cause some delay, hence
it is preferable to use sync replication with nodes that far from each
other.

I am sure that you wanted to say
"with nodes *not* that far from each other".

Oops sorry for the typo. Yes I meant of course "not that far".
--
Michael


#11 Thomas Harold
thomas-lists@nybeta.com
In reply to: Laurenz Albe (#2)
Re: PG replication across DataCenters

On 11/22/2013 5:57 AM, Albe Laurenz wrote:

Kaushal Shriyan wrote:

I have read on the web that Postgresql DB supports replication
across data centers. Any real life usecase examples if it has been
implemented by anyone.

Well, we replicate a 1 TB database between two locations. It is a
fairly active OLTP application, but certainly not pushing the limits
of what PostgreSQL can do in transactions per second.

Something that section 25 in the pgsql documentation is not clear about
for hot-standby with WAL log shipping using the built-in streaming:

Can you choose which databases / tables on the master server get
streamed to the hot-standby read-only server at the remote site? If
not, I suspect we'll have to go with either Slony or Bucardo.


#12 Ben
bench@silentmedia.com
In reply to: Thomas Harold (#11)
Re: PG replication across DataCenters

On Dec 9, 2013, at 8:09 AM, Thomas Harold wrote:

On 11/22/2013 5:57 AM, Albe Laurenz wrote:

Kaushal Shriyan wrote:

I have read on the web that Postgresql DB supports replication
across data centers. Any real life usecase examples if it has been
implemented by anyone.

Well, we replicate a 1 TB database between two locations. It is a
fairly active OLTP application, but certainly not pushing the limits
of what PostgreSQL can do in transactions per second.

Something that section 25 in the pgsql documentation is not clear about for hot-standby with WAL log shipping using the built-in streaming:

Can you choose which databases / tables on the master server get streamed to the hot-standby read-only server at the remote site? If not, I suspect we'll have to go with either Slony or Bucardo.

No, with the built-in binary replication, it's all or nothing, and the slaves have to have the exact same schema as the master (no adding or removing indices, for example.)

Out of curiosity what did you find unclear about http://www.postgresql.org/docs/9.3/static/different-replication-solutions.html?

#13 Andreas Kretschmer
akretschmer@spamfence.net
In reply to: Thomas Harold (#11)
Re: PG replication across DataCenters

Thomas Harold <thomas-lists@nybeta.com> wrote:

On 11/22/2013 5:57 AM, Albe Laurenz wrote:

Kaushal Shriyan wrote:

I have read on the web that Postgresql DB supports replication
across data centers. Any real life usecase examples if it has been
implemented by anyone.

Well, we replicate a 1 TB database between two locations. It is a
fairly active OLTP application, but certainly not pushing the limits
of what PostgreSQL can do in transactions per second.

Something that section 25 in the pgsql documentation is not clear about
for hot-standby with WAL log shipping using the built-in streaming:

Can you choose which databases / tables on the master server get
streamed to the hot-standby read-only server at the remote site? If
not, I suspect we'll have to go with either Slony or Bucardo.

The WAL contains transaction information for the whole cluster; you can't
choose particular databases or tables.

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°


#14 Thomas Harold
thomas-lists@nybeta.com
In reply to: Ben (#12)
Re: PG replication across DataCenters (section 25 in the manual)

On 12/9/2013 11:24 AM, Ben Chobot wrote:

Out of curiosity what did you find unclear about
http://www.postgresql.org/docs/9.3/static/different-replication-solutions.html?

Perhaps the "Per-table granularity" line in the matrix (Table 25-1)
might be better written as:

"Synchronization Granularity"

Columns 1-3 and 5 could say "Entire Cluster". Column 4 might say
"Selected tables (Slony)", and I'm not sure off-hand what granularity #6
(Bucardo) is capable of. Column #7 might just say "Varies".

For someone not familiar with what exactly WAL files are, it's not clear
that solution #3 is an all-or-nothing approach at the cluster level.
Now that I've refreshed my memory on how WAL files work (and at what
level in the pgsql cluster), I understand why #3 works the way it does.


#15 Bill Moran
wmoran@potentialtech.com
In reply to: Thomas Harold (#11)
Re: PG replication across DataCenters

On Mon, 09 Dec 2013 11:09:21 -0500 Thomas Harold <thomas-lists@nybeta.com> wrote:

On 11/22/2013 5:57 AM, Albe Laurenz wrote:

Kaushal Shriyan wrote:

I have read on the web that Postgresql DB supports replication
across data centers. Any real life usecase examples if it has been
implemented by anyone.

Well, we replicate a 1 TB database between two locations. It is a
fairly active OLTP application, but certainly not pushing the limits
of what PostgreSQL can do in transactions per second.

Something that section 25 in the pgsql documentation is not clear about
for hot-standby with WAL log shipping using the built-in streaming:

Can you choose which databases / tables on the master server get
streamed to the hot-standby read-only server at the remote site? If
not, I suspect we'll have to go with either Slony or Bucardo.

Go with Slony. Trust me.

People seem to shy away from the complexity of Slony, only to realize
later that they're hurting because the solution they chose instead
doesn't have the features they need. Keep in mind some things that
Slony does that aren't yet available with streaming:
* Cascading replication chains (a really big deal when you want
multiple slaves in the secondary facility and don't want to hog
your bandwidth)
* Quick and easy movement of the master to any of the databases in
the cluster without destroying replication.
* Seeding of new slaves without interrupting existing nodes (assuming
your hardware has a little free capacity)
* Selective replication of tables, potentially in complex arrangements
where some tables are replicated only to A and some only to B
and some to A and B, etc, etc.

I was about to go on and type more, but really, those three things
make a huge difference in day to day operations, when problems occur,
and when the unexpected (but joyful) "we never expected this much
activity" happens.

Streaming replication is great, but unless you're 100% sure you'll
be OK with the restrictions it imposes, I recommend taking the time
to learn how to manage Slony, as the advantages far outweigh the
additional management overhead.

--
Bill Moran <wmoran@potentialtech.com>


#16 Greg Sabino Mullane
greg@turnstep.com
In reply to: Thomas Harold (#14)
Re: PG replication across DataCenters (section 25 in the manual)

Columns 1-3 and 5 could say "Entire Cluster". Column 4 might say
"Selected tables (Slony)", and I'm not sure off-hand what granularity #6
(Bucardo) is capable of. Column #7 might just say "Varies".

Bucardo and Slony are both table-based and trigger-driven.

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201312100859
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8


#17 Wolfgang Keller
feliphil@gmx.net
In reply to: Ben (#12)
postgresql.org inconsistent (Re: PG replication across DataCenters)

http://www.postgresql.org/docs/9.3/static/different-replication-solutions.html?

Synchronous Multimaster Replication

*snip*

PostgreSQL does not offer this type of replication (...)

Now I compare that statement with:

http://wiki.postgresql.org/wiki/Postgres-XC

Project Overview

*snip*

Features of PG-XC include:

*snip*

2. Synchronous multi-master configuration

Seems to me that the editing process of the different parts of
postgresql.org somewhat lacks transactional semantics.

;->

Sincerely,

Wolfgang


#18 Steve Atkins
steve@blighty.com
In reply to: Wolfgang Keller (#17)
Re: postgresql.org inconsistent (Re: PG replication across DataCenters)

On Dec 10, 2013, at 8:47 AM, Wolfgang Keller <feliphil@gmx.net> wrote:

http://www.postgresql.org/docs/9.3/static/different-replication-solutions.html?

Synchronous Multimaster Replication

*snip*

PostgreSQL does not offer this type of replication (...)

Now I compare that statement with:

http://wiki.postgresql.org/wiki/Postgres-XC

Project Overview

*snip*

Features of PG-XC include:

*snip*

2. Synchronous multi-master configuration

Seems to me that the editing process of the different parts of
postgresql.org somewhat lacks transactional semantics.

Postgres-XC isn't PostgreSQL. Entirely different product.

Anyone can add pages to the wiki, and there's lots of information
there about things that aren't PostgreSQL; Postgres-XC is just
one of those.

Cheers,
Steve


#19 John R Pierce
pierce@hogranch.com
In reply to: Wolfgang Keller (#17)
Re: postgresql.org inconsistent (Re: PG replication across DataCenters)

On 12/10/2013 8:47 AM, Wolfgang Keller wrote:

Seems to me that the editing process of the different parts of
postgresql.org somewhat lacks transactional semantics.

Postgres-XC is not PostgreSQL; it's a fork. There are other forks that
offer distributed databases, such as Greenplum.

--
john r pierce 37N 122W
somewhere on the middle of the left coast


#20 Wolfgang Keller
feliphil@gmx.net
In reply to: John R Pierce (#19)
Re: postgresql.org inconsistent (Re: PG replication across DataCenters)

Seems to me that the editing process of the different parts of
postgresql.org somewhat lacks transactional semantics.

Postgres-XC is not PostgreSQL; it's a fork.

As an end-user, why would I care?

Besides still being open source (even under the same license as
PostgreSQL itself...?), it follows the PostgreSQL releases pretty
closely. According to their roadmap, version 1.1 has been merged with
PostgreSQL 9.2 and version 1.2 will be merged with 9.3.

It would at least merit being mentioned in the doc, just like other
"forks" or whatever you may call it, as long as they're open-source.

Sincerely,

Wolfgang


#21 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Wolfgang Keller (#20)

#22 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#21)

#23 Wolfgang Keller
feliphil@gmx.net
In reply to: Tom Lane (#21)

#24 Sameer Kumar
sameer.kumar@ashnik.com
In reply to: Wolfgang Keller (#23)

#25 Michael Paquier
michael@paquier.xyz
In reply to: Wolfgang Keller (#23)

#26 Chris Travers
chris.travers@gmail.com
In reply to: Steve Atkins (#18)

#27 Wolfgang Keller
feliphil@gmx.net
In reply to: Michael Paquier (#25)

#28 Joshua D. Drake
jd@commandprompt.com
In reply to: Wolfgang Keller (#27)

#29 Wolfgang Keller
feliphil@gmx.net
In reply to: Joshua D. Drake (#28)

#30 Christofer C. Bell
christofer.c.bell@gmail.com
In reply to: Wolfgang Keller (#29)

#31 Wolfgang Keller
feliphil@gmx.net
In reply to: Christofer C. Bell (#30)

#32 Sameer Kumar
sameer.kumar@ashnik.com
In reply to: Bill Moran (#15)

#33 Bill Moran
wmoran@potentialtech.com
In reply to: Sameer Kumar (#32)

#34 Sameer Kumar
sameer.kumar@ashnik.com
In reply to: Bill Moran (#33)

#35 Bill Moran
wmoran@potentialtech.com
In reply to: Sameer Kumar (#34)

#36 Sameer Kumar
sameer.kumar@ashnik.com
In reply to: Bill Moran (#35)