Replication Using Triggers
Hi,
Is there any reason why PostgreSQL replication solutions are all add-on
3rd party ones? Is there any reason why replication couldn't be
implemented using triggers and a handful of stored procedures? This is
what I have in mind:
Have a plperl function that creates connections to all servers in the
cluster (replication partners), and issues the supplied write query to
them, possibly with a tag of some sort to indicate it is a replicated
query (to prevent circular replication).
Have a post execute trigger that calls the above replication function if
the query was issued directly (as opposed to replicated), and passes it
the query it just executed if it was successful.
If the replication failed on any node, the whole thing gets rolled back.
This would effectively give star topology synchronous replication with
very little effort, and no need for any external code. Am I missing
something obvious that would prevent this from working? It would give
replication capabilities better than MySQL's (which can only handle
ring based multi-master replication) for the sake of about 100 lines of
code. None of the required functionality is new to PostgreSQL,
either.
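For illustration only, the proposed scheme might be sketched like this (Python stands in for plperl here, and every name is invented for the example, not taken from any real implementation):

```python
# Hypothetical sketch of the proposed trigger-driven replication:
# forward each locally-issued write to all peers, tagging it so that
# peers do not re-forward it (preventing circular replication), and
# report failure so the caller can roll the whole transaction back.

REPLICATION_TAG = "/*replicated*/"  # marker prepended to forwarded queries

class Peer:
    """Stand-in for a DBI/$dbh-style connection to one cluster node."""
    def __init__(self, name):
        self.name = name
        self.log = []

    def execute(self, query):
        self.log.append(query)
        return True  # pretend the remote query succeeded

def replicate(peers, query):
    """Forward a write to every replication partner; all-or-nothing."""
    if query.startswith(REPLICATION_TAG):
        return True  # already a replicated query - do not re-forward
    tagged = REPLICATION_TAG + query
    return all(p.execute(tagged) for p in peers)

peers = [Peer("node-b"), Peer("node-c")]
ok = replicate(peers, "INSERT INTO t VALUES (1)")
```

A real version would issue the tagged statement over persistent connections inside the post-execute trigger, rolling back locally if `replicate` reports failure.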
Is there an existing implementation of this? Perhaps a perl program that
creates the required triggers and stored procedures from looking at a
schema?
Thanks.
Gordan
On Jan 18, 2008, at 9:21 AM, gordan@bobich.net wrote:
Hi,
Is there any reason why PostgreSQL replication solutions are all
add-on 3rd party ones?
Because no one solution would be appropriate for everyone. The core
team and contributors feel that their time is better spent on the
database itself rather than developing and maintaining multiple
different replication solutions and dealing with the support
thereof. What has been done is to add some extra hooks in 8.3 for
replication triggers that can help to specialize when/if a given
trigger fires.
Is there any reason why replication couldn't be implemented using
triggers and a handful of stored procedures?
That's usually how it's done. Well, plus some external user-land
application libraries.
This is what I have in mind:
Have a plperl function that creates connections to all servers in
the cluster (replication partners), and issues the supplied write
query to them, possibly with a tag of some sort to indicate it is
a replicated query (to prevent circular replication).

Have a post execute trigger that calls the above replication
function if the query was issued directly (as opposed to
replicated), and passes it the query it just executed if it was
successful.

If the replication failed on any node, the whole thing gets rolled
back.
That sounds pretty brittle. Do you really want all progress in your
databases to stop if there is a network issue to a single server?
This would effectively give star topology synchronous replication
with very little effort, and no need for any external code. Am I
missing something obvious that would prevent this from working? It
would give replication capabilities better than MySQL's (which can
only handle ring based multi-master replication) for the sake of
about 100 lines of code. None of the required functionality
is new to PostgreSQL, either.

Is there an existing implementation of this? Perhaps a perl program
that creates the required triggers and stored procedures from
looking at a schema?
What you've described here would be pretty simple to implement.
However, I think you've greatly underestimated the performance issues
involved. If you need to push data to multiple databases before each
transaction commits I think you'll find that pretty slow. That's why
most of the available third party solutions are asynchronous. The
biggest options are out there are Slony and Londiste (both master-
slave, asynchronous) and Bucardo (asynchronous, but supports both
master-master and master-slave) which, as you would have it, is
written in Perl.
Erik Jones
DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
On Fri, 18 Jan 2008, Erik Jones wrote:
Is there any reason why PostgreSQL replication solutions are all add-on 3rd
party ones?

Because no one solution would be appropriate for everyone. The core team and
contributors feel that their time is better spent on the database itself
rather than developing and maintaining multiple different replication
solutions and dealing with the support thereof. What has been done is to add
some extra hooks in 8.3 for replication triggers that can help to specialize
when/if a given trigger fires.
Hmm, selective trigger firing sounds interesting.
Is there any reason why replication couldn't be implemented using triggers
and a handful of stored procedures?

That's usually how it's done. Well, plus some external user-land application
libraries.
That's just it - I don't think any user-land libraries would actually be
required. One of the supposed big advantages of MySQL is its straightforward
replication support. It's quite painful to see PostgreSQL suffer purely
for the sake of a lack of marketing in this department. :-(
This is what I have in mind:
Have a plperl function that creates connections to all servers in the
cluster (replication partners), and issues the supplied write query to
them, possibly with a tag of some sort to indicate it is a replicated
query (to prevent circular replication).

Have a post execute trigger that calls the above replication function if
the query was issued directly (as opposed to replicated), and passes it the
query it just executed if it was successful.

If the replication failed on any node, the whole thing gets rolled back.
That sounds pretty brittle. Do you really want all progress in your
databases to stop if there is a network issue to a single server?
1) That's what MySQL does (it either ignores errors or stops replication
on encountering an error, which of those two it does is selectable, but
that's about it).
2) If there is a network/server issue, that would be detected because the
$dbh would break. If the $dbh breaks, then plperl can either attempt to
re-connect, or failing that, boot the node from the replication cluster
(could have a node list with active/fenced flag in a separate config
schema). Neither approach would be difficult to implement.
Then it could just not bother reconnecting the fenced node until the user
updates the node status in the said config schema table. Configuration
schema could also be replicated.
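The fencing idea could be sketched roughly like this (all names are illustrative; a real version would keep the node list in the replicated configuration schema rather than in memory):

```python
# Hypothetical fencing logic: each peer has an active flag; on a failed
# send, try one reconnect, and if that also fails, fence the node so it
# is skipped until an operator flips the flag back in the config schema.

nodes = {"node-b": {"active": True}, "node-c": {"active": True}}

def try_send(node, query, link_up):
    """Pretend to push a query over a $dbh; link_up simulates the network."""
    return link_up

def replicate_with_fencing(query, link_state):
    for name, meta in nodes.items():
        if not meta["active"]:
            continue  # fenced node: skip until re-enabled by the operator
        if not try_send(name, query, link_state.get(name, True)):
            # one reconnect attempt, then boot the node from the cluster
            if not try_send(name, query, link_state.get(name, True)):
                meta["active"] = False

# node-c's link is down, so it gets fenced; node-b stays active
replicate_with_fencing("UPDATE t SET x = 1", {"node-c": False})
```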
This would effectively give star topology synchronous replication with very
little effort, and no need for any external code. Am I missing something
obvious that would prevent this from working? It would give replication
capabilities better than MySQL's (which can only handle ring based
multi-master replication) for the sake of about 100 lines of code. None of
the required functionality is new to PostgreSQL, either.

Is there an existing implementation of this? Perhaps a perl program that
creates the required triggers and stored procedures from looking at a
schema?

What you've described here would be pretty simple to implement. However, I
think you've greatly underestimated the performance issues involved. If you
need to push data to multiple databases before each transaction commits I
think you'll find that pretty slow.
Only if transactions are used. I'm basing the requirements on "at least as
good as MySQL", which this would meet without transactions. If
transactions are wanted they could be enabled, otherwise it could just be
fire and forget asynchronous replication a-la MySQL. Having a choice
between transactions and speed is good. :-)
One thing I haven't quite thought of a good way to do with this approach
is the equivalent of the useful (albeit deprecated) LOAD DATA FROM MASTER
command, that gets the server in sync by dropping and re-loading all the
tables from the master(s) (or rather, peers in a multi-master star
replication), and enables it in the replication. It would be neater than
requiring downtime or global write locks. But I guess that could wait
until version 2. :)
That's why most of the available third
party solutions are asynchronous. The biggest options are out there are
Slony and Londiste (both master-slave, asynchronous) and Bucardo
(asynchronous, but supports both master-master and master-slave) which, as
you would have it, is written in Perl.
I looked at all of the above, and they all seemed (to me at least) to
involve unnecessary complication or limitations I saw as unreasonable (or
both). I looked at Bucardo in detail, and I was rather disappointed to see
that it only supports two master nodes at the moment.
Thanks.
Gordan
gordan@bobich.net wrote:
On Fri, 18 Jan 2008, Erik Jones wrote:
This is what I have in mind:
Have a plperl function that creates connections to all servers in the
cluster (replication partners), and issues the supplied write query
to them, possibly with a tag of some sort to indicate it is a
replicated query (to prevent circular replication).

Have a post execute trigger that calls the above replication function
if the query was issued directly (as opposed to replicated), and
passes it the query it just executed if it was successful.
Not sure here if you mean literally the SQL query that was executed - in which
case you have all sorts of problems with sequences and functions returning
different values.
If the replication failed on any node, the whole thing gets rolled back.
This would effectively give star topology synchronous replication
with very little effort, and no need for any external code. Am I
missing something obvious that would prevent this from working? It
would give replication capabilities better than MySQL's (which can
only handle ring based multi-master replication) for the sake of
about 100 lines of code. None of the required functionality
is new to PostgreSQL, either.
But there are plenty of solutions that do a lot better than this. Slony-I is the
most popular. My favourite is a spin on the old db_mirror that used to be part
of the Postgres distribution.
I can't talk about how Slony works, but db_mirror uses a very fast 'C' function
to capture changes in a set of simple replication tables. A replication process
then takes data from those tables and replicates (using actual values not the
SQL statement) to any number of other servers. If one of the servers is down,
the data remains in the replication tables until that node returns (or is removed).
The problem with db_mirror was that the replication process was written in Perl.
This worked fine for simple tests but was ridiculously slow for replicating
tables holding big BYTEA structures. I re-wrote the replication code in 'C' and
it can replicate just about arbitrarily complex transactions in close to real-time.
You seem to be re-inventing the wheel, and the re-invention is not quite as
round as the existing wheel :-)
Is there an existing implementation of this? Perhaps a perl program
that creates the required triggers and stored procedures from looking
at a schema?

What you've described here would be pretty simple to implement.
However, I think you've greatly underestimated the performance issues
involved. If you need to push data to multiple databases before each
transaction commits I think you'll find that pretty slow.

Only if transactions are used. I'm basing the requirements on "at least
as good as MySQL", which this would meet without transactions. If
transactions are wanted they could be enabled, otherwise it could just
be fire and forget asynchronous replication a-la MySQL. Having a choice
between transactions and speed is good. :-)
Synchronous replication tends to imply it works on all servers simultaneously or
not on any. If any server fails a transaction it's rolled back on all servers.
What you're describing sounds asynchronous to me.
One thing I haven't quite thought of a good way to do with this approach
is the equivalent of the useful (albeit deprecated) LOAD DATA FROM
MASTER command, that gets the server in sync by dropping and re-loading
all the tables from the master(s) (or rather, peers in a multi-master
star replication), and enables it in the replication. It would be neater
than requiring downtime or global write locks. But I guess that could
wait until version 2. :)
That's one thing. The other problem that most trigger based replication systems
have problems with is propagating schema changes - because (I think) you can't
attach triggers to schema changes.
Thanks.
Gordan
Pete
--
Peter Wilson : http://www.whitebeam.org
This is what I have in mind:
Have a plperl function that creates connections to all servers in the
cluster (replication partners), and issues the supplied write query to
them, possibly with a tag of some sort to indicate it is a replicated
query (to prevent circular replication).

Have a post execute trigger that calls the above replication function if
the query was issued directly (as opposed to replicated), and passes it
the query it just executed if it was successful.

Not sure here if you mean literally the SQL query that was executed - in
which case you have all sorts of problems with sequences and functions
returning different values.
Indeed, but sequences at least can be worked around. Post-execute, the
sequence number used should already be available, and the sequence offset
and increment can be set so as to ensure they do not clash. That's what
MySQL does (and I must apologize for making the comparison all the time).
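The offset/increment workaround can be shown in a few lines: with N nodes, give node i a sequence starting at i+1 and incrementing by N, so the value sets are disjoint. (In PostgreSQL terms this maps to `CREATE SEQUENCE s INCREMENT BY N START WITH i+1` on each node.) A sketch, with made-up node indices:

```python
# Non-clashing per-node sequences: node i (0-based) in a cluster of
# size N generates i+1, i+1+N, i+1+2N, ... so no two nodes can ever
# produce the same value.

def node_sequence(node_index, cluster_size):
    """Yield the IDs a given node would generate from its sequence."""
    value = node_index + 1
    while True:
        yield value
        value += cluster_size

gen_a = node_sequence(0, 3)  # node A in a 3-node cluster: 1, 4, 7, ...
gen_b = node_sequence(1, 3)  # node B: 2, 5, 8, ...
a_ids = [next(gen_a) for _ in range(3)]
b_ids = [next(gen_b) for _ in range(3)]
```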
If the replication failed on any node, the whole thing gets rolled back.
This would effectively give star topology synchronous replication with
very little effort, and no need for any external code. Am I missing
something obvious that would prevent this from working? It would give
replication capabilities better than MySQL's (which can only handle ring
based multi-master replication) for the sake of about 100 lines of code.
None of the required functionality is new to PostgreSQL, either.

But there are plenty of solutions that do a lot better than this. Slony-I is
the most popular. My favourite is a spin on the old db_mirror that used to be
part of the Postgres distribution.
How would Slony be better? It doesn't seem to support master-master
replication for one.
I can't talk about how Slony works, but db_mirror uses a very fast 'C'
function to capture changes in a set of simple replication tables. A
replication process then takes data from those tables and replicates (using
actual values not the SQL statement) to any number of other servers. If one
of the servers is down, the data remains in the replication tables until that
node returns (or is removed).
Interesting. I was thinking about making an auxiliary feature that just
writes a pending queue log for a server when it cannot establish the $dbh,
and when it manages to connect, it attempts to re-play the log before
issuing new queries.
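That auxiliary queue could look something like this (a toy sketch with invented names, ignoring the multi-master ordering problem discussed next):

```python
# Per-node pending queue: while a peer's connection is down, append its
# writes to a log; on reconnect, replay the log oldest-first before
# sending anything new.

from collections import deque

pending = {"node-b": deque()}

def send(node, query, up):
    if not up:
        pending[node].append(query)  # link down: queue for later replay
        return False
    return True  # pretend the remote execution succeeded

def on_reconnect(node):
    """Drain and replay the pending log in original order."""
    replayed = []
    while pending[node]:
        replayed.append(pending[node].popleft())
    return replayed

send("node-b", "INSERT INTO t VALUES (1)", up=False)
send("node-b", "INSERT INTO t VALUES (2)", up=False)
replayed = on_reconnect("node-b")
```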
The problem with this is that the ordering becomes inconsistent with
multiple masters. That would be a bit too inconsistent for my liking. As
far as I can tell, that is also why MySQL's current replication method is
unsuitable for more than ring-replication. Having said that, ring replication
suffers from similar race conditions; it's more of a hack than a solution.
Now that I think about it, I'm not actually sure that waiting for global
success before the final commit would make UPDATE/DELETE free of race
conditions - they won't fail, but can still yield inconsistencies due to
race conditions. Still, I think it's worth having despite this issue.
The problem with db_mirror was that the replication process was written in
Perl. This worked fine for simple tests but was ridiculously slow for
replicating tables holding big BYTEA structures. I re-wrote the replication
code in 'C' and it can replicate just about arbitrarily complex transactions
in close to real-time.
Yes, I can see how big blobs can be an issue for performance. :-(
You seem to be re-inventing the wheel, and the re-invention is not quite as
round as the existing wheel :-)
Not quite - I think multi-master capability is important.
Is there an existing implementation of this? Perhaps a perl program that
creates the required triggers and stored procedures from looking at a
schema?

What you've described here would be pretty simple to implement. However,
I think you've greatly underestimated the performance issues involved. If
you need to push data to multiple databases before each transaction
commits I think you'll find that pretty slow.

Only if transactions are used. I'm basing the requirements on "at least as
good as MySQL", which this would meet without transactions. If transactions
are wanted they could be enabled, otherwise it could just be fire and
forget asynchronous replication a-la MySQL. Having a choice between
transactions and speed is good. :-)

Synchronous replication tends to imply it works on all servers simultaneously
or not on any. If any server fails a transaction it's rolled back on all
servers. What you're describing sounds asynchronous to me.
or not on any. If any server fails a transaction it's rolled back on all
servers. What you're describing sounds asynchronous to me.
Indeed, I spotted that above. The transactions roll back if they fail, but
this alone does not quite ensure cross-node consistency of the data. Some
kind of special DELETE/UPDATE handling would be required to fix this, but
I don't have a definitive idea on how this could be handled. Will have to
think about it a bit more.
One thing I haven't quite thought of a good way to do with this approach is
the equivalent of the useful (albeit deprecated) LOAD DATA FROM MASTER
command, that gets the server in sync by dropping and re-loading all the
tables from the master(s) (or rather, peers in a multi-master star
replication), and enables it in the replication. It would be neater than
requiring downtime or global write locks. But I guess that could wait until
version 2. :)

That's one thing. The other problem that most trigger based replication
systems have problems with is propagating schema changes - because (I think)
you can't attach triggers to schema changes.
I presume you mean that you cannot attach triggers to schema changes. Yes,
I had thought of that a minute ago. I don't suppose this could be deemed a
feature request for CREATE/ALTER/DROP schema level triggers? ;)
Gordan
On Jan 18, 2008, at 11:37 AM, gordan@bobich.net wrote:
That's one thing. The other problem that most trigger based
replication systems have problems with is propagating schema
changes - because (I think) you can't attach triggers to schema
changes.

I presume you mean that you cannot attach triggers to schema
changes. Yes, I had thought of that a minute ago. I don't suppose
this could be deemed a feature request for CREATE/ALTER/DROP schema
level triggers? ;)
I'm sure he did. Also, there was a little bit of discussion a week
or so ago about ddl & ddl/dml (such as TRUNCATE) triggers. With the
imminent release of 8.3, now's the time to pipe up with feature
requests.
Erik Jones
DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
On Fri, Jan 18, 2008 at 04:09:45PM +0000, gordan@bobich.net wrote:
That's just it - I don't think any user-land libraries would actually be
required. One of the supposed big advantages of MySQL is its straightforward
replication support. It's quite painful to see PostgreSQL suffer purely
for the sake of a lack of marketing in this department. :-(
The "straightforward" replication support in MySQL is seriously broken. We
(by which I really mean "Jan") spent a great deal of time on the design of
Slony (and its add-on nature is a feature, not a bug -- one thing it can do
is cross-version upgrades on PostgreSQL versions that were out before Slony
was finished being designed) to avoid several nasty corner cases that are
sort of waved aside in the MySQL documentation. Designing a replication
system that works well 80% of the time is a waste of effort, because the
times when you really need it are all already in that 20% of cases that you
won't cover with the simple-minded solution.
Specifically,
1) That's what MySQL does (it either ignores errors or stops replication
on encountering an error, which of those two it does is selectable, but
that's about it).
That's got to be _the_ most brain-dead approach to replication I've ever
heard. It chooses the two least good of all possible worlds, and when you
get into your particular version of hell at 0-dark:30, you have to spend
some time first figuring out which hell you happen to be in.
In any case,
fire and forget asynchronous replication a-la MySQL. Having a choice
between transactions and speed is good. :-)
if this is what you believe, then you don't need a database to store your
data anyway. I can make your data system faster by storing all your data on
/dev/null. Writes will be very fast indeed.
A
Andrew Sullivan wrote:
On Fri, Jan 18, 2008 at 04:09:45PM +0000, gordan@bobich.net wrote:
That's just it - I don't think any user-land libraries would actually be
required. One of the supposed big advantages of MySQL is its straightforward
replication support. It's quite painful to see PostgreSQL suffer purely
for the sake of a lack of marketing in this department. :-(

The "straightforward" replication support in MySQL is seriously broken.
I am not arguing that it isn't! :-)
I am merely trying to implement something at least as good (or rather,
no more broken) for PostgreSQL with a minimum of effort.
We
(by which I really mean "Jan") spent a great deal of time on the design of
Slony (and its add-on nature is a feature, not a bug -- one thing it can do
is cross-version upgrades on PostgreSQL versions that were out before Slony
was finished being designed) to avoid several nasty corner cases that are
sort of waved aside in the MySQL documentation. Designing a replication
system that works well 80% of the time is a waste of effort, because the
times when you really need it are all already in that 20% of cases that you
won't cover with the simple-minded solution.

Specifically,

1) That's what MySQL does (it either ignores errors or stops replication
on encountering an error, which of those two it does is selectable, but
that's about it).

That's got to be _the_ most brain-dead approach to replication I've ever
heard. It chooses the two least good of all possible worlds, and when you
get into your particular version of hell at 0-dark:30, you have to spend
some time first figuring out which hell you happen to be in.
I couldn't agree more. But I don't see another multi-master replication
solution on the horizon.
In any case,
fire and forget asynchronous replication a-la MySQL. Having a choice
between transactions and speed is good. :-)

if this is what you believe, then you don't need a database to store your
data anyway. I can make your data system faster by storing all your data on
/dev/null. Writes will be very fast indeed.
Fantastically put. :-)
But in the meantime, until a better multi-master replication solution
becomes available, I think I'll stick with the current plan.
I suppose some kind of a write counter with a rolling write query hash
could be implemented. Replicator function issues locks and compares the
counters/hashes to establish whether a state is consistent on all nodes
before a write query is replicated. It's a kludge and a horrible one at
that, and it will slow down the writes under load, but I think it would
work for ensuring ordering consistency with non-commutative write
operations.
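The counter-plus-hash idea might be sketched like this (a hypothetical scheme for illustration, not taken from any existing tool): each node chains a hash over every write it has applied, and peers compare (counter, hash) pairs to decide whether their states match before replicating the next write.

```python
# Rolling write-hash per node: two nodes that applied the same writes
# in the same order have identical (counter, hash) pairs; any
# divergence in content or ordering shows up as a mismatch.

import hashlib

class NodeState:
    def __init__(self):
        self.counter = 0
        self.rolling = hashlib.sha256(b"").hexdigest()

    def apply(self, query):
        self.counter += 1
        # chain the previous hash with the new write
        self.rolling = hashlib.sha256(
            (self.rolling + query).encode()).hexdigest()

def consistent(a, b):
    return (a.counter, a.rolling) == (b.counter, b.rolling)

a, b = NodeState(), NodeState()
for q in ("INSERT 1", "UPDATE 2"):
    a.apply(q)
    b.apply(q)
in_sync = consistent(a, b)

b.apply("DELETE 3")        # b receives an extra write
diverged = not consistent(a, b)
```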
Gordan
Hello,
On Fri, 18 Jan 2008 17:37:07 +0000 (GMT) gordan@bobich.net wrote:
This is what I have in mind:
Have a plperl function that creates connections to all servers in the
cluster (replication partners), and issues the supplied write query to
them, possibly with a tag of some sort to indicate it is a replicated
query (to prevent circular replication).

Have a post execute trigger that calls the above replication function if
the query was issued directly (as opposed to replicated), and passes it
the query it just executed if it was successful.

Not sure here if you mean literally the SQL query that was executed - in
which case you have all sorts of problems with sequences and functions
returning different values.

Indeed, but sequences at least can be worked around. Post-execute, the
sequence number used should already be available, and the sequence offset
and increment can be set so as to ensure they do not clash. That's what
MySQL does (and I must apologize for making the comparison all the time).
Sequences are only one (small) problem. What about functions returning
different results (volatile) for each call? Just imagine random() or
now(). What about inserts or updates selecting parts of table data? You
can't be sure to get exactly the same results on the slave.
But there are plenty of solutions that do a lot better than this. Slony-I is
the most popular. My favourite is a spin on the old db_mirror that used to be
part of the Postgres distribution.

How would Slony be better? It doesn't seem to support master-master
replication for one.

But it works. At least master-slave.
The problem with this is that the ordering becomes inconsistent with
multiple masters. That would be a bit too inconsistent for my liking. As
far as I can tell, that is also why MySQL's current replication method is
unsuitable for more than ring-replication. Having said that, ring suffers
from similar race conditions, it's more of a hack than a solution.
A lot of books have been written about this topic, many details researched,
and still master-master replication is a very complicated topic. You should
start reading about vector and Lamport clocks. These are the (very)
basics (among others) for your sync problem.
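Since Lamport clocks keep coming up in this problem space, here is the mechanism in miniature: each node increments its clock on a local event and, on receiving a message, jumps to max(local, received) + 1, which yields a partial ordering of events across nodes.

```python
# A minimal Lamport clock: enough to order causally-related events
# across nodes, which is the basic building block behind most
# multi-master conflict-ordering schemes.

class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        return self.local_event()  # the timestamp travels with the message

    def receive(self, msg_time):
        # jump past both our own history and the sender's
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
a.local_event()  # a's clock: 1
t = a.send()     # a's clock: 2; message carries timestamp 2
b.receive(t)     # b's clock: max(0, 2) + 1 = 3
```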
Now that I think about it, I'm not actually sure that waiting for global
success before final commit would make update/delete without race
condition as they won't fail, but can still yield inconsistencies due to
race conditions. Still, I think it's worth having despite this issue.
Sure, you are right. You are now about to find out what other people have
been researching for the last *uhm* 20 years ;-)
You seem to be re-inventing the wheel, and the re-invention is not quite as
round as the existing wheel :-)

Not quite - I think multi-master capability is important.
Yes. But master-master or even multi-master is a very complicated topic.
So let's start with something more easy, like master-slave. This works,
you don't have race conditions and that's a good starting point for
more research.
I presume you mean that you cannot attach triggers to schema changes. Yes,
I had thought of that a minute ago. I don't suppose this could be deemed a
feature request for CREATE/ALTER/DROP schema level triggers? ;)
More triggers, not only for ddl changes, would be nice anyway.
I see fields of application for some of my own projects ;-)
Kind regards
--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
Andreas 'ads' Scherbaum wrote:
Have a plperl function that creates connections to all servers in the
cluster (replication partners), and issues the supplied write query to
them, possibly with a tag of some sort to indicate it is a replicated
query (to prevent circular replication).

Have a post execute trigger that calls the above replication function if
the query was issued directly (as opposed to replicated), and passes it
the query it just executed if it was successful.

Not sure here if you mean literally the SQL query that was executed - in
which case you have all sorts of problems with sequences and functions
returning different values.

Indeed, but sequences at least can be worked around. Post-execute, the
sequence number used should already be available, and the sequence offset
and increment can be set so as to ensure they do not clash. That's what
MySQL does (and I must apologize for making the comparison all the time).

Sequences are only one (small) problem. What about functions returning
different results (volatile) for each call? Just imagine random() or
now().
Yes, that's a problem. The bodge workaround for that is to save the
master's state for such functions and re-pack it from a function into a
literal in a pre-execution trigger, and then replicate the literals.
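That "re-pack into a literal" workaround could look roughly like this (a toy sketch; the function name and the fixed values are invented for the demo, and a real trigger would need a proper SQL parser rather than string substitution):

```python
# Freeze volatile calls before replicating: evaluate now()/random()
# once on the originating node and substitute the results as literals,
# so every peer applies identical values.

import datetime
import random
import re

def freeze_volatiles(query):
    """Replace volatile function calls with locally-computed literals."""
    frozen = query
    if "now()" in frozen:
        # fixed timestamp for the demo; a real trigger would use the
        # statement's actual transaction timestamp
        ts = datetime.datetime(2008, 1, 18, 12, 0, 0)
        frozen = frozen.replace("now()", "'%s'" % ts.isoformat())
    # seeded for the demo so the output is reproducible
    frozen = re.sub(r"random\(\)",
                    lambda m: repr(random.Random(42).random()), frozen)
    return frozen

out = freeze_volatiles("INSERT INTO t VALUES (now(), random())")
```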
What about inserts or updates selecting parts of table data? You
can't be sure to get exactly the same results on the slave.
You can if you have an ordering consistency check mechanism, as I
mentioned in the other mail. Recovery when "something goes wrong" (tm),
however, could get interesting, especially under heavy distributed write
load. If there's a counter and a hash, I guess you could lock
everything, find the one with the biggest counter, and release the lock
on everything else until it catches up, then re-lock, then replicate. It
would add a fair bit of latency, though.
Gordan
On Fri, Jan 18, 2008 at 09:27:08PM +0000, Gordan Bobic wrote:
Andrew Sullivan wrote:
On Fri, Jan 18, 2008 at 04:09:45PM +0000, gordan@bobich.net wrote:
That's just it - I don't think any user-land libraries would
actually be required. One of the supposed big advantages of MySQL is
its straightforward replication support. It's quite painful to
see PostgreSQL suffer purely for the sake of a lack of marketing in
this department. :-(

The "straightforward" replication support in MySQL is seriously
broken.

I am not arguing that it isn't! :-) I am merely trying to implement
something at least as good (or rather, no more broken) for
PostgreSQL with a minimum of effort.
In that case, use one of the existing solutions. They're all way
easier than re-inventing the wheel.
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter wrote:
That's just it - I don't think any user-land libraries would
actually be required. One of the supposed big advantages of MySQL is
its straightforward replication support. It's quite painful to
see PostgreSQL suffer purely for the sake of a lack of marketing in
this department. :-(

The "straightforward" replication support in MySQL is seriously
broken.

I am not arguing that it isn't! :-) I am merely trying to implement
something at least as good (or rather, no more broken) for
PostgreSQL with a minimum of effort.

In that case, use one of the existing solutions. They're all way
easier than re-inventing the wheel.
Existing solutions can't handle multiple masters. MySQL can do it at
least in a ring arrangement.
Gordan
On Sat, 2008-01-19 at 23:46 +0000, Gordan Bobic wrote:
David Fetter wrote:
In that case, use one of the existing solutions. They're all way
easier than re-inventing the wheel.

Existing solutions can't handle multiple masters. MySQL can do it at
least in a ring arrangement.
What about pgcluster? It's supposed to be able to provide synchronous
multi-master replication for postgresql.
Greg
Gregory Youngblood wrote:
On Sat, 2008-01-19 at 23:46 +0000, Gordan Bobic wrote:
David Fetter wrote:
In that case, use one of the existing solutions. They're all way
easier than re-inventing the wheel.

Existing solutions can't handle multiple masters. MySQL can do it at
least in a ring arrangement.

What about pgcluster? It's supposed to be able to provide synchronous
multi-master replication for postgresql.
I looked at that, too, but it wasn't really a "cluster of equal peers"
sort of solution, which is what I am after. Still, thanks for pointing
it out.
Gordan
On Jan 19, 2008 5:46 PM, Gordan Bobic <gordan@bobich.net> wrote:
David Fetter wrote:
That's just it - I don't think any user-land libraries would
actually be required. One of the supposed big advantages of MySQL is
its straightforward replication support. It's quite painful to
see PostgreSQL suffer purely for the sake of a lack of marketing in
this department. :-(

The "straightforward" replication support in MySQL is seriously
broken.

I am not arguing that it isn't! :-) I am merely trying to implement
something at least as good (or rather, no more broken) for
PostgreSQL with a minimum of effort.

In that case, use one of the existing solutions. They're all way
easier than re-inventing the wheel.

Existing solutions can't handle multiple masters. MySQL can do it at
least in a ring arrangement.
Then go use MySQL.
PostgreSQL multi-master replication systems I know of:
pgcluster
http://www.postgresql.org/about/news.752
bucardo
http://bucardo.org/
One of the features of PostgreSQL is that it's easy enough to extend
that you don't have to rely on just what's built in and supported by
core.
The multi-master replication in MySQL is NOT recommended for things
like bank style transactional systems. It was built for telecom
systems, where everything can be restored from a backup if the power
goes out.
On Jan 19, 2008 6:14 PM, Gordan Bobic <gordan@bobich.net> wrote:
Gregory Youngblood wrote:
On Sat, 2008-01-19 at 23:46 +0000, Gordan Bobic wrote:
David Fetter wrote:
In that case, use one of the existing solutions. They're all way
easier than re-inventing the wheel.

Existing solutions can't handle multiple masters. MySQL can do it at
least in a ring arrangement.

What about pgcluster? It's supposed to be able to provide synchronous
multi-master replication for postgresql.

I looked at that, too, but it wasn't really a "cluster of equal peers"
sort of solution, which is what I am after. Still, thanks for pointing
it out.
Oh, and there's this too:
Cybertec sync-multi-master
http://www.postgresql.org/about/news.752
Scott Marlowe wrote:
On Jan 19, 2008 6:14 PM, Gordan Bobic <gordan@bobich.net> wrote:
Gregory Youngblood wrote:
On Sat, 2008-01-19 at 23:46 +0000, Gordan Bobic wrote:
David Fetter wrote:
In that case, use one of the existing solutions. They're all way
easier than re-inventing the wheel.

Existing solutions can't handle multiple masters. MySQL can do it at
least in a ring arrangement.

What about pgcluster? It's supposed to be able to provide synchronous
multi-master replication for postgresql.

I looked at that, too, but it wasn't really a "cluster of equal peers"
sort of solution, which is what I am after. Still, thanks for pointing
it out.

Oh, and there's this too:
Cybertec sync-multi-master
http://www.postgresql.org/about/news.752
The design of that seems suspiciously similar to pgcluster with separate
load balancer and replicator servers.
Gordan
On Sun, 20 Jan 2008 00:34:11 +0000 Gordan Bobic wrote:
Scott Marlowe wrote:
On Jan 19, 2008 6:14 PM, Gordan Bobic <gordan@bobich.net> wrote:
Oh, and there's this too:
Cybertec sync-multi-master
http://www.postgresql.org/about/news.752

The design of that seems suspiciously similar to pgcluster with separate
load balancer and replicator servers.
It doesn't just look like pgcluster.
Bye
--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
On Jan 19, 2008 6:46 PM, Gordan Bobic <gordan@bobich.net> wrote:
Existing solutions can't handle multiple masters. MySQL can do it at
least in a ring arrangement.
mysql multi-master replication looks a lot better on paper than it
really is...one of the reasons mysql is able to seemingly provide it
so easily is that mysql doesn't take transactions and locks very
seriously whereas postgres does. Because spreading the locking system
around on multiple servers complicates everything and adds all kinds
of timing issues, you are unlikely to see any real performance benefit
over a single well built server unless your ratio of reads to writes
is extremely high (especially on a relatively high latency link like
ethernet). Since single master replication will fly in many of those
cases, the use case for multi-master replication is a lot smaller than
it appears on the surface. multi-master is more appealing if you do
not need locks strictly enforced and you can play fast and loose with
your data...many applications do not have this luxury however. Many
people who ask for this feature grossly misunderstand the complexity
of the problem and expect there to be a magical solution with few
trade-offs. mysql's solution to this problem, while quite useful at
solving particular problems, is not a 'solution' in the general sense.
By the way, many big and/or busy databases are bound by the i/o
system, not the cpu.
PostgreSQL chose another path...we like our locks and want our
transactions to give exact and specific answers. This means a few
compromises on some things you take for granted in mysql (select
count(*) from foo; comes to mind), but maybe a better solution for
solving a wide range of applications where every transaction counts.
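On the select count(*) compromise: under MVCC an exact count has to scan the table, but when an estimate is good enough a common workaround is to read the planner's statistics directly (the table name here is only illustrative):

```sql
-- Planner's row estimate instead of an exact, full-table scan
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'foo';
```

The figure is only as fresh as the last VACUUM or ANALYZE on the table.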
PostgreSQL also scales very well to multiple-core systems (much
better than mysql, it's been reported), so you can always scale the
server up...x86 can easily scale up to 16 cores these
days fairly cheaply with decent results. Expect these trends to
continue going forwards...one thing that the PostgreSQL community
might be interested in the future is distributing a single query
across multiple cores...
Anyways, for replication on the single master side you have Slony
(trigger based) which is an effective solution along with some other
supported solutions (mammoth replicator, for example). At some point
in the future (maybe 8.4?) we will have PITR hot standby which will
likely be the 'in core' replication solution although various 3rd
party tools may extend on it and make it do various things.
Personally, unless I have some easily distributed task or have
insanely high transaction loading requirements (popular web site), I
would build twin servers, each capable of servicing the database
completely and use PITR to maintain a warm standby. Obviously, this
approach has limits but it's often a good place to start. If things
are too hot for a single server to handle, maybe it's time to start
dividing up the tasks into logical portions with shared data being
moved around (with db-link, or slony).
merlin
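The warm-standby arrangement described above can be sketched as follows, assuming an 8.2/8.3-era setup with pg_standby from contrib; the hostnames and paths are invented for illustration:

```
# master, postgresql.conf: ship each completed WAL segment to the standby
archive_mode = on                                  # 8.3 and later
archive_command = 'rsync -a %p standby:/wal_archive/%f'

# standby, recovery.conf: replay segments as they arrive, keeping the
# server in continuous recovery until it is promoted
restore_command = 'pg_standby /wal_archive %f %p'
```

When the master fails, the standby is promoted by telling pg_standby to stop waiting (e.g. via its trigger-file option), at which point recovery finishes and the server opens for connections.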