Multi Master Replication

Started by itishree suklaover 12 years ago14 messagesgeneral
Jump to latest
#1itishree sukla
itishree.sukla@gmail.com

Hi all,

I need suggestion about setting up multi master replication between two
postgresql server place two different geographical area. As i know using
some third party tool like Bucardo,RubyRep it can be achievable, not sue
which is the good one to use. If any one can provide me some online
documentation links, it will help me as well.

Thanks in advance.

Regards,
Itishree

#2John R Pierce
pierce@hogranch.com
In reply to: itishree sukla (#1)
Re: Multi Master Replication

On 12/18/2013 1:31 AM, itishree sukla wrote:

I need suggestion about setting up multi master replication between
two postgresql server place two different geographical area. As i know
using some third party tool like Bucardo,RubyRep it can be achievable,
not sue which is the good one to use. If any one can provide me some
online documentation links, it will help me as well.

that sort of replication is very problematic. its virtually impossible
to maintain ACID (Atomicity, Consistency, Isolation, Durability) and
maintain any semblance of performance.

question for you, what do you expect to happen if the communications
link between the servers is interrupted, and updates continue to be sent
to both servers?

--
john r pierce 37N 122W
somewhere on the middle of the left coast

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Scott Marlowe
scott.marlowe@gmail.com
In reply to: John R Pierce (#2)
Re: Multi Master Replication

On Wed, Dec 18, 2013 at 1:16 PM, John R Pierce <pierce@hogranch.com> wrote:

On 12/18/2013 1:31 AM, itishree sukla wrote:

I need suggestion about setting up multi master replication between two
postgresql server place two different geographical area. As i know using
some third party tool like Bucardo,RubyRep it can be achievable, not sue
which is the good one to use. If any one can provide me some online
documentation links, it will help me as well.

that sort of replication is very problematic. its virtually impossible to
maintain ACID (Atomicity, Consistency, Isolation, Durability) and maintain
any semblance of performance.

question for you, what do you expect to happen if the communications link
between the servers is interrupted, and updates continue to be sent to both
servers?

When people start talking multi-master replication my first response
is to ask what problem you're trying to solve. Sometimes MM Rep IS the
answer. But quite often it's not the best one for your problem. So to
OP I'd ask what problem they're trying to solve.

--
To understand recursion, one must first understand recursion.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Michael Paquier
michael@paquier.xyz
In reply to: Scott Marlowe (#3)
Re: Multi Master Replication

On Thu, Dec 19, 2013 at 7:12 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

On Wed, Dec 18, 2013 at 1:16 PM, John R Pierce <pierce@hogranch.com> wrote:

that sort of replication is very problematic. its virtually impossible to
maintain ACID (Atomicity, Consistency, Isolation, Durability) and maintain
any semblance of performance.

Yep, there's usually a trade-off between performance and data
consistency. OLTP applications can benefit from MM with a
shared-nothing architecture, more than data warehouse type of things
that need to transfer a lot of data for join operations, or SQL
operations that use non-pushable clauses (for example stable/volatile
functions).

question for you, what do you expect to happen if the communications link
between the servers is interrupted, and updates continue to be sent to both
servers?

Split-brain is another problem, hard to solve. Even harder if you have
several types nodes in your cluster dedicated to provide some piece
building the MM system.

When people start talking multi-master replication my first response
is to ask what problem you're trying to solve. Sometimes MM Rep IS the
answer. But quite often it's not the best one for your problem. So to
OP I'd ask what problem they're trying to solve.

Yes that's actually the right approach, multi-master replication is
often cited as a marketing term for a fantastic technology that can
solve a lot of problems, which could be solved with a couple of
Postgres servers using a single-master, multiple-slave approach, or by
simply design a system that can do data sharding among a set of
Postgres servers to achieve some kind of write scalability.

Regards,
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Scott Marlowe
scott.marlowe@gmail.com
In reply to: Michael Paquier (#4)
Re: Multi Master Replication

On Wed, Dec 18, 2013 at 10:25 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Dec 19, 2013 at 7:12 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

When people start talking multi-master replication my first response
is to ask what problem you're trying to solve. Sometimes MM Rep IS the
answer. But quite often it's not the best one for your problem. So to
OP I'd ask what problem they're trying to solve.

Yes that's actually the right approach, multi-master replication is
often cited as a marketing term for a fantastic technology that can
solve a lot of problems, which could be solved with a couple of
Postgres servers using a single-master, multiple-slave approach, or by
simply design a system that can do data sharding among a set of
Postgres servers to achieve some kind of write scalability.

Sharding with plproxy is pretty easy and can scale hugely.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Michael Paquier
michael@paquier.xyz
In reply to: Scott Marlowe (#5)
Re: Multi Master Replication

On Thu, Dec 19, 2013 at 2:05 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

Sharding with plproxy is pretty easy and can scale hugely.

Yeah indeed, the writable postgres_fdw could also be used as a
solution, if designed carefully.
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Chris Travers
chris.travers@gmail.com
In reply to: itishree sukla (#1)
Re: Multi Master Replication

On Wed, Dec 18, 2013 at 1:31 AM, itishree sukla <itishree.sukla@gmail.com>wrote:

Hi all,

I need suggestion about setting up multi master replication between two
postgresql server place two different geographical area. As i know using
some third party tool like Bucardo,RubyRep it can be achievable, not sue
which is the good one to use. If any one can provide me some online
documentation links, it will help me as well.

First suggestion: don't. Multi-master replication almost never is a win.
I tell customers this and they don't like to hear it but it is true.

Second suggestion: If you absolutely must, go with Bucardo. Rubyrep might
work but it is db-agnostic and this raises additional complexity concerns.

The major reasons why I argue against multi-master replication are:

1. With async replication you have the problem of conflict resolution. In
some (rare) cases this is avoidable, but in most cases it is not. You have
to deal with the fact that two different people in two different locations
may update the same row, and you have to decide how to handle this. Once
you have a conflict, every option you have to resolve the conflict results
in data loss. There are rare cases where this is the right solution.

2. With sync replication, you have coordination problems and therefore it
is never (at least IME) a win compared to master-slave replication since
all writes must occur in the same order in the set, or you need global
sequences, or such. So I would say that something like PGPool for
multi-master replication is just to be avoided generally (however for load
balancing in master/slave it can be a win). You will never get better read
or write throughput, or a more robust system than you will with
master/slave replication in a synchronous environment. Keep it simple.

As others have mentioned your best bet here is pl/proxy. I would go a
little further however and suggest that you can separate storage dbs from
proxy db's and thus create the appearance of multi-master over a
master-slave setup. If you are going to go this route however, I don't
know whether Postgres-XC would be worth looking into.

The key here though is that design effort is important. If you carefully
design your federated storage, then you should be good. However this leads
to lots of problems and you need to think them through.

--
Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more.shtml

#8Wolfgang Keller
feliphil@gmx.net
In reply to: Chris Travers (#7)
Re: Multi Master Replication

2. With sync replication, you have coordination problems and
therefore it is never (at least IME) a win compared to master-slave
replication since all writes must occur in the same order in the set,
or you need global sequences, or such.

*snip*

You will never get better read or write throughput,

Better read throughput is trivial to achieve even with other solutions
than multi-master replication.

And for better write throughput, the developers of Postgres-XC
(supported by NTT, among others) beg to differ:

http://postgresxc.wikia.com/wiki/Postgres-XC_Wiki
http://postgres-xc.sourceforge.net/

As does Bettina Kemme (of Postgres-R fame).

Sincerely,

Wolfgang

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Chris Travers
chris.travers@gmail.com
In reply to: Wolfgang Keller (#8)
Re: Multi Master Replication

On Thu, Dec 19, 2013 at 6:52 AM, Wolfgang Keller <feliphil@gmx.net> wrote:

2. With sync replication, you have coordination problems and
therefore it is never (at least IME) a win compared to master-slave
replication since all writes must occur in the same order in the set,
or you need global sequences, or such.

*snip*

You will never get better read or write throughput,

Better read throughput is trivial to achieve even with other solutions
than multi-master replication.

And for better write throughput, the developers of Postgres-XC
(supported by NTT, among others) beg to differ:

http://postgresxc.wikia.com/wiki/Postgres-XC_Wiki
http://postgres-xc.sourceforge.net/

I am not quite sure what the point is. I am not sure you will get the
same write extensibility if you list every table as replicated instead of
partitioned. What Postgres-XC gives you ideally is a no-storage and
multi-master coordination layer on top of master-slave data nodes. Some
things may need to be replicated multi-master between data nodes but that's
not a win write throughput-wise.

I am btw a reasonable fan of Postgres-XC within its problem domain, but it
is not a synchronous multi-master replication solution as far as write
scaling goes.

My point still holds, which is that synchronous multi-master replication
will never beat master-slave in write throughput. My understanding of
Postgres-XC is that you'd mark tables as replicated (instead of
partitioned) when they are going to be joined against by different nodes
and infrequently updated (and hence the write overhead is less of a problem
than the cross-node join overhead).

Am I way off-base with my understanding here? At any rate it isn't
Postgres-XC (which is something very different than a typical "replication"
setup, and I would describe it more as an advanced sharding solution).

Best Wishes,
Chris Travers

As does Bettina Kemme (of Postgres-R fame).

Sincerely,

Wolfgang

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more.shtml

#10Michael Paquier
michael@paquier.xyz
In reply to: Chris Travers (#9)
Re: Multi Master Replication

On Thu, Dec 19, 2013 at 11:18 PM, Chris Travers <chris.travers@gmail.com> wrote:

On Thu, Dec 19, 2013 at 6:52 AM, Wolfgang Keller <feliphil@gmx.net> wrote:

2. With sync replication, you have coordination problems and
therefore it is never (at least IME) a win compared to master-slave
replication since all writes must occur in the same order in the set,
or you need global sequences, or such.

I am not quite sure what the point is. I am not sure you will get the same
write extensibility if you list every table as replicated instead of
partitioned. What Postgres-XC gives you ideally is a no-storage and
multi-master coordination layer on top of master-slave data nodes. Some
things may need to be replicated multi-master between data nodes but that's
not a win write throughput-wise.

You'd kill the write scalability of the application by marking all the
tables as replicated. The communication between nodes uses SQL
strings, so a DML on a replicated table needs to occur on all the
nodes, and on top of that you need 2PC for a transaction commit if
more than 2 nodes are involved in write operations in this
transaction.

I am btw a reasonable fan of Postgres-XC within its problem domain, but it
is not a synchronous multi-master replication solution as far as write
scaling goes.

OLTP applications that have a schema tunable for replication/partition
to maximize join pushdown might be a good definition of the
application range that could benefit from XC.

My point still holds, which is that synchronous multi-master replication
will never beat master-slave in write throughput. My understanding of
Postgres-XC is that you'd mark tables as replicated (instead of partitioned)
when they are going to be joined against by different nodes and infrequently
updated (and hence the write overhead is less of a problem than the
cross-node join overhead).

Yep, exactly. Those tables are actually master tables and the point is
to maximize the number of join clause push down to minimize the amount
of data exchanged between the nodes because of the shared-nothing
infrastructure. The type of tables that should be marked as
partitioned is the once that keep growing and need to scale of the
type "user" tables. This is actually how DBT-1 has been tuned when
doing scaling testing with it: partition user and adress tables,
replicate stock and item tables.

Regards,
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#11Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#10)
Re: Multi Master Replication

On Fri, Dec 20, 2013 at 8:48 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Dec 19, 2013 at 11:18 PM, Chris Travers <chris.travers@gmail.com> wrote:

On Thu, Dec 19, 2013 at 6:52 AM, Wolfgang Keller <feliphil@gmx.net> wrote:

2. With sync replication, you have coordination problems and
therefore it is never (at least IME) a win compared to master-slave
replication since all writes must occur in the same order in the set,
or you need global sequences, or such.

I am not quite sure what the point is. I am not sure you will get the same
write extensibility if you list every table as replicated instead of
partitioned. What Postgres-XC gives you ideally is a no-storage and
multi-master coordination layer on top of master-slave data nodes. Some
things may need to be replicated multi-master between data nodes but that's
not a win write throughput-wise.

You'd kill the write scalability of the application by marking all the
tables as replicated. The communication between nodes uses SQL
strings, so a DML on a replicated table needs to occur on all the
nodes, and on top of that you need 2PC for a transaction commit if
more than 2 nodes are involved in write operations in this
transaction.

I am btw a reasonable fan of Postgres-XC within its problem domain, but it
is not a synchronous multi-master replication solution as far as write
scaling goes.

OLTP applications that have a schema tunable for replication/partition
to maximize join pushdown might be a good definition of the
application range that could benefit from XC.

My point still holds, which is that synchronous multi-master replication
will never beat master-slave in write throughput. My understanding of
Postgres-XC is that you'd mark tables as replicated (instead of partitioned)
when they are going to be joined against by different nodes and infrequently
updated (and hence the write overhead is less of a problem than the
cross-node join overhead).

Yep, exactly. Those tables are actually master tables and the point is
to maximize the number of join clause push down to minimize the amount
of data exchanged between the nodes because of the shared-nothing
infrastructure. The type of tables that should be marked as
partitioned is the once that keep growing and need to scale of the
type "user" tables. This is actually how DBT-1 has been tuned when
doing scaling testing with it: partition user and adress tables,
replicate stock and item tables.

I actually wrote something stupid here, stock is partitioned and it
makes sense as it faces lot of updates:
http://images.wikia.com/postgresxc/images/6/66/PG-XC_Architecture.pdf (page 23)
Thanks to Chris for pointing that out.
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#12John R Pierce
pierce@hogranch.com
In reply to: itishree sukla (#1)
Re: Multi Master Replication

14 replies so far, and the OP hasn't chimed in with any feedback as to
what their presumed requirements are based on.

*meh*

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#13pax
pax.deckard@gmail.com
In reply to: itishree sukla (#1)
Re: Multi Master Replication

Sometime ago i was looking for something like this and because at this time
XC was a little baby i tried installing bucardo but i gave up when stucked
fighting with perl modules. So, after testing some other solutions i decided
to make my own, just "touching" the trigger part of the pyreplica project
and building a daemon in a binary compiled program (no script) for a better
performance. It's working well by now, with some tables and few nodes (12
nodes deployed across long distances), but i was lucky then because the
database and the systems were still in design and development phases.
There are many things you have to keep in mind, as all people is warning
you. I tell you, it can be done, but you MUST redesign your existing
database preparing for multimaster paradigm where the key is avoid node
level concurrency, and your client database apps to THINK what to do when
nodes are down, but if you are not ready to prepare your database (design)
or you have not access to the source code of your client database apps then
you have problems.
I suggest you read this
http://www.dbspecialists.com/files/presentations/mm_replication.html and
this
http://scale-out-blog.blogspot.com/2012/04/if-you-must-deploy-multi-master.html.
Finally, i can mention another posible solution that i never tried but maybe
can help you, SymmetricDS.
Good luck.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Multi-Master-Replication-tp5783855p5784468.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#14Anand Kumar, Karthik
Karthik.AnandKumar@classmates.com
In reply to: pax (#13)
Re: Multi Master Replication

We use symmetricDS pretty extensively, across oracle and postgres
databases. It has its flaws and its strengths. It shines when there's
multiple database platforms involved, when the volume of transactions is
not too high, and supports multi master. Its optimized for wan topologies,
so its great if you have small amounts of data over longer distances, not
as great when there's several gigabytes needing transfer within the same
datacenter. It also has inbuilt insert vs update conflict resolution.

On the flip side, as the volume of writes increases, it can quickly lead
to significant database bloat and high CPU usage. Its also not as fast as
some other tools (like slony, or ora2pg) for initial loads of tables.

It exposes all of its switches and knobs (and there are a *lot* of
tunables), which make it both very flexible, but also prone to breakage if
you fiddle with it too much. Its in active development and we've found the
community forums to be pretty helpful when we run into undocumented bugs.

Let me know if you need help with it.

Thanks,
Karthik

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general