Replication

jd@commandprompt.com

almost 17 years ago

In reply to: Gerry Reno (#1)

Re: Replication

On Mon, 2009-06-22 at 17:53 -0400, Gerry Reno wrote:

I noticed that the user survey on the community page does not list
replication among the choices for development priority. For me,
replication is the most important thing that is critically missing from
postgresql. We need something as good as MySQL Replication. Both
statement-based and row-based replication. And support for
Master-Master and full cyclic replication setups. Postgresql is just a
toy database without this as far as I am concerned.

Funny.

Joshua D. Drake

Regards,
Gerry

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997

Kevin Barnard

kevin.barnard@laser2mail.com

almost 17 years ago

In reply to: Gerry Reno (#1)

Re: Replication

On Jun 22, 2009, at 4:53 PM, Gerry Reno wrote:

I noticed that the user survey on the community page does not list
replication among the choices for development priority. For me,
replication is the most important thing that is critically missing
from postgresql. We need something as good as MySQL Replication.
Both statement-based and row-based replication. And support for
Master-Master and full cyclic replication setups. Postgresql is
just a toy database without this as far as I am concerned.

Regards,
Gerry

Google postgresql replication. There are multiple replication /
clustering options depending on your needs. It's not built in to the
DB nor should it be because everyone has different replication needs.

The idea of separating replication functionality from the core DB
product isn't new. AFAIK IBM has always done this on their big-iron
DB2. Granted, their cheap replication software costs more than
you paid for that server that is running MySQL, and the expensive
replication probably costs more than a cabinet's worth of MySQL
servers. :-)

--
Kevin Barnard
kevin.barnard@laser2mail.com

greno@verizon.net

almost 17 years ago

In reply to: Kevin Barnard (#3)

Re: Replication

Kevin Barnard wrote:

On Jun 22, 2009, at 4:53 PM, Gerry Reno wrote:

I noticed that the user survey on the community page does not list
replication among the choices for development priority. For me,
replication is the most important thing that is critically missing
from postgresql. We need something as good as MySQL Replication. Both
statement-based and row-based replication. And support for
Master-Master and full cyclic replication setups. Postgresql is just
a toy database without this as far as I am concerned.

Regards,
Gerry

Google postgresql replication. There are multiple replication /
clustering options depending on your needs. It's not built in to the DB
nor should it be because everyone has different replication needs.

The idea of separating replication functionality from the core DB
product isn't new. AFAIK IBM has always done this on their big-iron
DB2. Granted, their cheap replication software costs more than
you paid for that server that is running MySQL, and the expensive
replication probably costs more than a cabinet's worth of MySQL servers.
:-)


Have you ever tried any of the postgresql replication offerings? The
only one that is remotely viable is slony and it is so quirky you may as
well forget it. The rest are in some stage of decay/abandonment. There
is no real replication available for postgresql. Postgresql needs to
develop a real replication offering for postgresql. Builtin or a
separate module.

Regards,
Gerry

jd@commandprompt.com

almost 17 years ago

In reply to: Gerry Reno (#4)

Re: Replication

On Mon, 2009-06-22 at 18:28 -0400, Gerry Reno wrote:

Kevin Barnard wrote:

Have you ever tried any of the postgresql replication offerings? The
only one that is remotely viable is slony and it is so quirky you may as
well forget it. The rest are in some stage of decay/abandonment. There
is no real replication available for postgresql. Postgresql needs to
develop a real replication offering for postgresql. Builtin or a
separate module.

Well this certainly isn't true but what do I know.

Joshua D. Drake


greno@verizon.net

almost 17 years ago

In reply to: Joshua D. Drake (#5)

Re: Replication

Joshua D. Drake wrote:

On Mon, 2009-06-22 at 18:28 -0400, Gerry Reno wrote:

Kevin Barnard wrote:

Have you ever tried any of the postgresql replication offerings? The
only one that is remotely viable is slony and it is so quirky you may as
well forget it. The rest are in some stage of decay/abandonment. There
is no real replication available for postgresql. Postgresql needs to
develop a real replication offering for postgresql. Builtin or a
separate module.

Well this certainly isn't true but what do I know.

Joshua D. Drake

It is true. Otherwise show me a viable replication offering for
postgresql that I can put into production and obtain support for it.

Regards,
Gerry

jd@commandprompt.com

almost 17 years ago

In reply to: Gerry Reno (#6)

Re: Replication

On Mon, 2009-06-22 at 18:35 -0400, Gerry Reno wrote:

Joshua D. Drake wrote:

It is true. Otherwise show me a viable replication offering for
postgresql that I can put into production and obtain support for it.

Well, you can get support for Slony (known to be a bit complicated
but stable and flexible). You can also get support for Londiste (which
is used in production by Skype... I think that speaks for itself). You
can get support for log shipping if all you need is simple master->slave
redundancy.

I can name others if you like but since you are clearly not able to
effectively use Google nor actually present production requirements so
people can help you, I doubt it would do much good.

Joshua D. Drake

Regards,
Gerry


greno@verizon.net

almost 17 years ago

In reply to: Joshua D. Drake (#7)

Re: Replication

Joshua D. Drake wrote:

On Mon, 2009-06-22 at 18:35 -0400, Gerry Reno wrote:

Joshua D. Drake wrote:

It is true. Otherwise show me a viable replication offering for
postgresql that I can put into production and obtain support for it.

Well, you can get support for Slony (known to be a bit complicated
but stable and flexible).

I've already tried Slony last year and unless something major has
changed it is not viable. I cannot have replication that just stops for
no known reason.

You can also get support for Londiste (which
is used in production by Skype... I think that speaks for itself).

Londiste is beta. The fact that Skype uses it is because it's part of
Skytools which is their product. They may want to run their own beta
stuff. I don't.

You
can get support for log shipping if all you need is simple master->slave
redundancy.

If all I needed was log shipping I can do that myself with some scripts.

I can name others if you like but since you are clearly not able to
effectively use Google nor actually present production requirements so
people can help you, I doubt it would do much good.

Joshua D. Drake

So name others.

Regards,
Gerry

Greg Smith

gsmith@gregsmith.com

almost 17 years ago

In reply to: Gerry Reno (#1)

Re: Replication

On Mon, 22 Jun 2009, Gerry Reno wrote:

We need something as good as MySQL Replication.

I certainly hope not, I was hoping for a reliable replication solution
instead. Wow is the information you get searching for something like
"mysql replication corruption [replay log|bin log]" scary. I also
appreciate fun bits like how you'll get completely quiet master/slave
mismatches if you should do something crazy like, say, use LIMIT the wrong
way (see http://dev.mysql.com/doc/refman/5.0/en/replication-features.html
for more fun like that).

Anyway, you seem to be unaware that built-in replication for PostgreSQL
already is moving along, with an implementation that's just not quite
production quality yet, and might make it into the next version after 8.4 if
things go well. That's probably why it's not on the survey--everybody
knows that's important and it's already being worked on actively.

P.S. another Google search, this one for "postgresql replication support",
finds the mythical company that sells multiple products and support for
this purpose on hit #2 for me. Or you could use the alternate approach of
looking at the jobs of everyone who's been giving you a hard time in
this thread...

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#10

greno@verizon.net

almost 17 years ago

In reply to: Greg Smith (#9)

Re: Replication

Greg Smith wrote:

On Mon, 22 Jun 2009, Gerry Reno wrote:

We need something as good as MySQL Replication.

I certainly hope not, I was hoping for a reliable replication solution
instead. Wow is the information you get searching for something like
"mysql replication corruption [replay log|bin log]" scary. I also
appreciate fun bits like how you'll get completely quiet master/slave
mismatches if you should do something crazy like, say, use LIMIT the
wrong way (see
http://dev.mysql.com/doc/refman/5.0/en/replication-features.html for
more fun like that).

I didn't mean to imply that MySQL Replication was perfect. But I've been
using it for over three years with very few problems. And yes with
statement-based replication you can get some interesting replication
anomalies if you're not careful. But, that's true of any statement-based
replication with any database.

Anyway, you seem to be unaware that built-in replication for
PostgreSQL already is moving along, with an implementation that's just
not quite production quality yet, and might make it into the next version
after 8.4 if things go well.

No, I'm aware of this basic builtin replication. It was rather
disappointing to see it moved out of the 8.4 release. We need something
more than just basic master-slave replication which is all this simple
builtin replication will provide. We need a real replication solution
that can handle statement-based and row-based replication. Multi-master
replication. Full cyclic replication chain setups. Simple master-slave
just doesn't cut it.

That's probably why it's not on the survey--everybody knows that's
important and it's already being worked on actively.

Ok, I just felt it should still be there. But, I hope development
understands just how important good replication really is.

P.S. another Google search, this one for "postgresql replication
support", finds the mythical company that sells multiple products and
support for this purpose on hit #2 for me. Or you could use the
alternate approach of looking at the jobs of everyone who's been
giving you a hard time in this thread...

I figured as much.


Regards,
Gerry

#11

Tom Lane

tgl@sss.pgh.pa.us

almost 17 years ago

In reply to: Greg Smith (#9)

Re: Replication

Greg Smith <gsmith@gregsmith.com> writes:

On Mon, 22 Jun 2009, Gerry Reno wrote:

We need something as good as MySQL Replication.

I certainly hope not, I was hoping for a reliable replication solution
instead. Wow is the information you get searching for something like
"mysql replication corruption [replay log|bin log]" scary.

My experience, stretching over more than five years now, is that mysql
replication fails its own regression tests a significant percentage of
the time ... in nonreproducible fashion of course, so it's hard to file
bug reports. I'm aware of this because I package the thing for Red Hat,
and I run mysql's regression tests as part of that build, and close to
half the time the build fails in the regression tests, invariably in
the replication-related tests. Never twice the same, mind you; when
I resubmit the job, with the exact same SRPM, it usually works.

This might be some artifact of the Red Hat/Fedora build farm
environment, since my builds on my own workstation seldom fail. But
it's persisted over multiple incarnations of that build farm and quite
a few versions of mysql. I've never been able to pin it down enough
to file a bug report.

I can't say I'd trust mysql replication with any data I cared about.

regards, tom lane

#12

craig@2ndquadrant.com

almost 17 years ago

In reply to: Gerry Reno (#10)

Re: Replication

On Mon, 2009-06-22 at 20:48 -0400, Gerry Reno wrote:

Anyway, you seem to be unaware that built-in replication for
PostgreSQL already is moving along, with an implementation that's just
not quite production quality yet, and might make it into the next version
after 8.4 if things go well.

No, I'm aware of this basic builtin replication. It was rather
disappointing to see it moved out of the 8.4 release. We need something
more than just basic master-slave replication which is all this simple
builtin replication will provide. We need a real replication solution
that can handle statement-based and row-based replication. Multi-master
replication. Full cyclic replication chain setups. Simple master-slave
just doesn't cut it.

Statement-based replication is, frankly, scary.

Personally I'd only be willing to use it if the database would guarantee
to throw an exception when any statement that may produce different
results on master and slave(s) was issued, like the
limit-without-order-by case mentioned in the MySQL replication docs.
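A check like the one described is harder to build than it sounds. The following is a deliberately naive Python sketch of such a pre-replication check; the function name `replication_hazards` and the list of volatile functions are hypothetical, and it only inspects the SQL text:

```python
import re

# Hypothetical, deliberately incomplete list of non-deterministic functions.
VOLATILE_FUNCTIONS = ("random", "nextval", "clock_timestamp")


def replication_hazards(sql: str) -> list[str]:
    """Flag constructs that may give different results on master and slave.

    A crude text scan: it cannot see inside user-defined functions, triggers
    or views, so an empty result is NOT a guarantee of safety.
    """
    text = " ".join(sql.lower().split())  # normalise case and whitespace
    hazards = []
    if re.search(r"\blimit\b", text) and not re.search(r"\border\s+by\b", text):
        hazards.append("LIMIT without ORDER BY")
    for fn in VOLATILE_FUNCTIONS:
        if re.search(rf"\b{fn}\s*\(", text):
            hazards.append(f"calls {fn}()")
    return hazards


print(replication_hazards("DELETE FROM t LIMIT 10"))           # ['LIMIT without ORDER BY']
print(replication_hazards("SELECT * FROM t ORDER BY id LIMIT 10"))  # []
print(replication_hazards("INSERT INTO t VALUES (random())"))  # ['calls random()']
```

It catches the obvious cases, but a user function that calls random() internally slips straight through, and an ORDER BY that appears only in a subquery satisfies it even when the outer LIMIT has none. That is the core difficulty in guaranteeing anything for statement-based replication.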

Even then I don't really understand how it can produce consistent
replicas in the face of, say, two concurrent statements both pulling
values from a sequence. There would need to be some sort of side channel
to allow the master to tell the slave about how it allocated values from
the sequence.

My overall sentiment is "ick".

Re multi-master replication, out of interest: what needs does it satisfy
for you that master-slave doesn't?

- Scaling number of clients / read throughput in read-mostly workloads?

- Client-transparent fault-tolerance?

- ... ?

What limitations of master-slave replication with read-only slaves
present roadblocks for you?

- Client must connect to master for writes, otherwise master or slave,
so must be more aware of connection management

- Client drivers have no way to transparently discover active master,
must be told master hostname/ip

- ... ?

I personally find it difficult to understand how multi-master
replication can add much to throughput on write-heavy workloads. DBs are
often I/O limited after all, and if each master must write all the
others' changes you may not see much of a performance win in write heavy
environments. So: I presume multi-master replication is useful mainly in
read-mostly workloads? Or do you expect throughput gains in write-heavy
workloads too?
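The point about write-heavy workloads can be checked with back-of-the-envelope arithmetic. This Python sketch assumes client traffic is spread evenly across nodes, every write has to be applied on every node, and replaying a replicated write costs as much I/O as performing it; `per_node_load` is a hypothetical helper, and conflict handling and network costs are ignored:

```python
def per_node_load(reads_per_sec: float, writes_per_sec: float, nodes: int) -> tuple[float, float]:
    """Rough per-node (reads, writes) per second under multi-master replication."""
    reads = reads_per_sec / nodes                   # each read is served by one node
    local_writes = writes_per_sec / nodes           # writes clients send to this node
    replicated_writes = local_writes * (nodes - 1)  # every other node's writes, replayed here
    return reads, local_writes + replicated_writes


for n in (1, 2, 4):
    reads, writes = per_node_load(8000, 2000, n)
    print(f"{n} node(s): {reads:.0f} reads/s, {writes:.0f} writes/s per node")
```

Under those assumptions the per-node read load drops as nodes are added, while the per-node write load stays at the full cluster-wide write rate, which is why multi-master mostly pays off for read-mostly workloads.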

If the latter, is it really multiple master replication you want rather
than a non-replica clustered database, where writes to one node don't
get replicated to the other nodes, they just get notified via some sort
of cache coherence protocol?

I guess my point is that personally I think it'd be helpful to know
_why_ you need more than what's on offer. What specific features pose
problems or would benefit you, how, and why. Etc.

That's probably why it's not on the survey--everybody knows that's
important and it's already being worked on actively.

Ok, I just felt it should still be there. But, I hope development
understands just how important good replication really is.

"development" appear to be well aware. They're also generally very
willing to accept help, testing, and users who're willing to trial early
efforts. Hint, hint. Donations of paid developer time to work on a
project you find to be commercially important probably wouldn't go
astray either.

--
Craig Ringer

#13

greno@verizon.net

almost 17 years ago

In reply to: Craig Ringer (#12)

Re: Replication

Craig Ringer wrote:

On Mon, 2009-06-22 at 20:48 -0400, Gerry Reno wrote:

Anyway, you seem to be unaware that built-in replication for
PostgreSQL already is moving along, with an implementation that's just
not quite production quality yet, and might make it into the next version
after 8.4 if things go well.

No, I'm aware of this basic builtin replication. It was rather
disappointing to see it moved out of the 8.4 release. We need something
more than just basic master-slave replication which is all this simple
builtin replication will provide. We need a real replication solution
that can handle statement-based and row-based replication. Multi-master
replication. Full cyclic replication chain setups. Simple master-slave
just doesn't cut it.

Statement-based replication is, frankly, scary.

Personally I'd only be willing to use it if the database would guarantee
to throw an exception when any statement that may produce different
results on master and slave(s) was issued, like the
limit-without-order-by case mentioned in the MySQL replication docs.

I don't know how it could guarantee that. That's really why row-based
is better.

Even then I don't really understand how it can produce consistent
replicas in the face of, say, two concurrent statements both pulling
values from a sequence. There would need to be some sort of side channel
to allow the master to tell the slave about how it allocated values from
the sequence.

Sequences I deal with by setting up an offset and increment for each
replica so that there are no conflicts.
You have to know the entire replication array size prior to setup. I
usually set increment to 10 and then I can offset up to 10 replicas.
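The offset/increment scheme can be sketched in a few lines of Python. Each node hands out offset, offset + increment, offset + 2*increment, and so on (the same idea as MySQL's `auto_increment_increment` and `auto_increment_offset` settings, or a PostgreSQL sequence created with `INCREMENT BY 10 START WITH n`); the helper name `node_ids` is hypothetical:

```python
def node_ids(offset: int, increment: int, count: int) -> list[int]:
    """IDs one node hands out: offset, offset + increment, offset + 2*increment, ..."""
    if not 1 <= offset <= increment:
        # An offset outside 1..increment would overlap another node's range.
        raise ValueError("offset must be between 1 and increment")
    return [offset + k * increment for k in range(count)]


INCREMENT = 10  # chosen up front; caps the setup at 10 nodes

ids_by_node = {n: node_ids(n, INCREMENT, 5) for n in (1, 2, 3)}
for node, ids in ids_by_node.items():
    print(f"node {node}: {ids}")  # node 1: [1, 11, 21, 31, 41], node 2: [2, 12, ...], ...

all_ids = [i for ids in ids_by_node.values() for i in ids]
assert len(all_ids) == len(set(all_ids)), "two nodes generated the same key"
```

The catch, as noted above, is that the increment has to be fixed before setup: with an increment of 10, an eleventh node has no free offset.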

My overall sentiment is "ick".

Re multi-master replication, out of interest: what needs does it satisfy
for you that master-slave doesn't?

- Scaling number of clients / read throughput in read-mostly workloads?

yes

- Client-transparent fault-tolerance?

yes.

- ... ?

What limitations of master-slave replication with read-only slaves
present roadblocks for you?

failure of single master.

- Client must connect to master for writes, otherwise master or slave,
so must be more aware of connection management

- Client drivers have no way to transparently discover active master,
must be told master hostname/ip

- ... ?

I personally find it difficult to understand how multi-master
replication can add much to throughput on write-heavy workloads. DBs are
often I/O limited after all, and if each master must write all the
others' changes you may not see much of a performance win in write heavy
environments. So: I presume multi-master replication is useful mainly in
read-mostly workloads? Or do you expect throughput gains in write-heavy
workloads too?

If the latter, is it really multiple master replication you want rather
than a non-replica clustered database, where writes to one node don't
get replicated to the other nodes, they just get notified via some sort
of cache coherence protocol?

I guess my point is that personally I think it'd be helpful to know
_why_ you need more than what's on offer. What specific features pose
problems or would benefit you, how, and why. Etc.

That's probably why it's not on the survey--everybody knows that's
important and it's already being worked on actively.

Ok, I just felt it should still be there. But, I hope development
understands just how important good replication really is.

"development" appear to be well aware. They're also generally very
willing to accept help, testing, and users who're willing to trial early
efforts. Hint, hint. Donations of paid developer time to work on a
project you find to be commercially important probably wouldn't go
astray either.

Regards,
Gerry

#14

craig@2ndquadrant.com

almost 17 years ago

In reply to: Gerry Reno (#13)

Re: Replication

On Mon, 2009-06-22 at 21:29 -0400, Gerry Reno wrote:

I don't know how it could guarantee that. That's really why row-based
is better.

Yep, especially in the face of things like user PL functions, C
functions, etc.

This page:

http://dev.mysql.com/doc/refman/5.0/en/replication-features-functions.html

is downright alarming, and (implicitly) says quite enough about how
statement-based replication is a really, REALLY bad idea.

Rather than replicating sets of changed rows, though, I suspect that
block-level replication using the WAL is probably more efficient.
Certainly it'll be easier on the slave in terms of the work required to
keep up with the master.
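The difference between shipping statements and shipping their results can be shown with a small Python simulation. It assumes each server has its own independent random-number state, and `run_insert` is a hypothetical stand-in for an `INSERT ... SELECT` that calls `random()`:

```python
import random


def run_insert(rng: random.Random) -> list[tuple[int, float]]:
    # Stands in for: INSERT INTO t SELECT g, random() FROM generate_series(1, 3)
    return [(g, rng.random()) for g in range(1, 4)]


# Each server has its own random state, as separate servers would.
master_rng = random.Random(1)
slave_rng = random.Random(2)

master_rows = run_insert(master_rng)

# Statement-based: ship the SQL text and let the slave evaluate it again.
slave_statement_based = run_insert(slave_rng)

# Row-based: ship the rows the master actually produced and apply them as-is.
slave_row_based = list(master_rows)

print(slave_statement_based == master_rows)  # False: random() was re-evaluated
print(slave_row_based == master_rows)        # True
```

Row-based (or WAL-based) replication never re-evaluates the function on the slave, so it cannot diverge this way.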

I guess block-level replication isn't much good for multi-master,
though, since you'd be spending half your time finding out what the
other masters were doing and what their state was, or telling them about
yours. (I guess that's the case anyway to some extent, though, any time
you have concurrent statements on different masters using the same data
and one or more of them is altering it).

Sequences I deal with by setting up an offset and increment for each
replica so that there are no conflicts.

Ah, so you don't actually care if the replicas are identical - you
expect things like different primary keys on master and replica(s)?

How do your applications cope if they switch from one replica to another
and suddenly primary key identifiers are different?

What limitations of master-slave replication with read-only slaves
present roadblocks for you?

failure of single master.

For that, read-only slave + heartbeat based failover with STONITH (shoot
the other node in the head) by something like IPMI remote-poweroff or a
USB-controlled power switch would be sufficient.

The only part of the requirements for this that PG can't already satisfy
is synchronous replication - the current WAL-based replication doesn't
guarantee that the slave has the changes before the client's commit
returns successfully, so recent changes that the client thinks are
committed might be lost on failover. Synchronous replication is, of
course, what's being worked on right now, partly to address just this
issue and partly to allow for read-only reporting slaves.
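The difference between asynchronous and synchronous commit on failover can be modelled with a toy Python simulation. It is a sketch, not how PostgreSQL's replication is implemented: `Primary`, `Standby` and `ship()` are hypothetical, and "shipping" a change is just copying a list entry:

```python
class Standby:
    def __init__(self) -> None:
        self.received: list[str] = []  # changes safely stored on the standby


class Primary:
    def __init__(self, standby: Standby, synchronous: bool) -> None:
        self.standby = standby
        self.synchronous = synchronous
        self.acknowledged: list[str] = []  # commits the client was told succeeded
        self._not_yet_shipped: list[str] = []

    def commit(self, txn: str) -> None:
        if self.synchronous:
            # Make sure the standby has the change before acknowledging.
            self.standby.received.append(txn)
        else:
            # Acknowledge now; the change is shipped to the standby later.
            self._not_yet_shipped.append(txn)
        self.acknowledged.append(txn)

    def ship(self) -> None:
        self.standby.received.extend(self._not_yet_shipped)
        self._not_yet_shipped.clear()


def lost_on_failover(synchronous: bool) -> list[str]:
    standby = Standby()
    primary = Primary(standby, synchronous)
    primary.commit("t1")
    primary.ship()
    primary.commit("t2")
    # The primary dies (or is STONITHed) here, before the next ship().
    return [t for t in primary.acknowledged if t not in standby.received]


print(lost_on_failover(synchronous=False))  # ['t2']
print(lost_on_failover(synchronous=True))   # []
```

In the asynchronous case the client was told t2 committed, yet the standby that takes over never saw it; the synchronous case pays for its guarantee by waiting on the standby for every commit.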

This technique is well established, very robust, and it's not hard to
implement in a way that ensures that the slave - when it takes over as
master - claims the master's MAC address and IP address so clients
barely notice a change.

With Pg it'd break any existing connections, but any database
application worth a damn must be able to handle re-issuing transactions
due to deadlocks, resource exhaustion, admin statement cancellation, etc.
anyway.
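Handling re-issued transactions usually comes down to a retry wrapper around the whole transaction. A minimal Python sketch follows; `RetryableError` is a hypothetical placeholder for whatever the client driver raises on a dropped connection, deadlock or cancelled statement (for example psycopg2 raises `OperationalError` when the connection is lost), and the attempt count and backoff are arbitrary:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for driver errors worth retrying: a connection
    dropped by failover, a deadlock, an admin cancelling the statement."""


def run_transaction(work: Callable[[], T], attempts: int = 3, backoff: float = 0.1) -> T:
    """Re-issue a whole transaction until it commits or the attempts run out.

    `work` must (re)connect and run the complete transaction itself, so a retry
    never tries to continue a half-finished transaction on a new server.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return work()
        except RetryableError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the last error
            time.sleep(backoff * 2 ** attempt)  # simple exponential backoff
    raise AssertionError("unreachable")


# Usage: a transaction whose first attempt hits a failover.
calls = 0


def transfer() -> str:
    global calls
    calls += 1
    if calls == 1:
        raise RetryableError("server closed the connection unexpectedly")
    return "committed"


print(run_transaction(transfer, backoff=0))  # committed
```

The design choice that matters is that `work` reconnects and re-runs the entire transaction; retrying only the failed statement would be wrong after a failover, since the new master never saw the earlier, uncommitted part of the transaction.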

--
Craig Ringer

#15

scott.marlowe@gmail.com

almost 17 years ago

In reply to: Gerry Reno (#8)

Re: Replication

On Mon, Jun 22, 2009 at 4:51 PM, Gerry Reno<greno@verizon.net> wrote:

Joshua D. Drake wrote:

On Mon, 2009-06-22 at 18:35 -0400, Gerry Reno wrote:

Joshua D. Drake wrote:

It is true. Otherwise show me a viable replication offering for
postgresql that I can put into production and obtain support for it.

Well, you can get support for Slony (known to be a bit complicated
but stable and flexible).

I've already tried Slony last year and unless something major has changed it
is not viable. I cannot have replication that just stops for no known
reason.

I've been running slony since 1.0 came out, and have NEVER had it just
stop replication for no known reason. ever. Your inability to use it
tells me much less about slony than it does about you.

You can also get support for Londiste (which
is used in production by Skype... I think that speaks for itself).

Londiste is beta. The fact that Skype uses it is because it's part of
Skytools which is their product. They may want to run their own beta stuff.
I don't.

So, if they said it was general release, but it sucked, you'd try it,
but since they say it's beta, no way? Wow. Just wow. The amount of
dumb in that sentence is not measurable with modern instrumentation.

#16

greno@verizon.net

almost 17 years ago

In reply to: Craig Ringer (#14)

Re: Replication

Craig Ringer wrote:

On Mon, 2009-06-22 at 21:29 -0400, Gerry Reno wrote:

I don't know how it could guarantee that. That's really why row-based
is better.

Yep, especially in the face of things like user PL functions, C
functions, etc.

This page:

http://dev.mysql.com/doc/refman/5.0/en/replication-features-functions.html

is downright alarming, and (implicitly) says quite enough about how
statement-based replication is a really, REALLY bad idea.

Rather than replicating sets of changed rows, though, I suspect that
block-level replication using the WAL is probably more efficient.
Certainly it'll be easier on the slave in terms of the work required to
keep up with the master.

I guess block-level replication isn't much good for multi-master,
though, since you'd be spending half your time finding out what the
other masters were doing and what their state was, or telling them about
yours. (I guess that's the case anyway to some extent, though, any time
you have concurrent statements on different masters using the same data
and one or more of them is altering it).

Sequences I deal with by setting up an offset and increment for each
replica so that there are no conflicts.

Ah, so you don't actually care if the replicas are identical - you
expect things like different primary keys on master and replica(s)?

How do your applications cope if they switch from one replica to another
and suddenly primary key identifiers are different?

Here is a link that describes the technique:
http://www.onlamp.com/pub/a/onlamp/2006/04/20/advanced-mysql-replication.html?page=1

<snip>

Regards,
Gerry

#17

CR Lender

crlender@gmail.com

almost 17 years ago

In reply to: Scott Marlowe (#15)

Re: Replication

On 23/06/09 03:44, Scott Marlowe wrote:

On Mon, Jun 22, 2009 at 4:51 PM, Gerry Reno<greno@verizon.net> wrote:

Londiste is beta. The fact that Skype uses it is because it's part
of Skytools which is their product. They may want to run their own
beta stuff. I don't.

So, if they said it was general release, but it sucked, you'd try it,
but since they say it's beta, no way? Wow. Just wow. The amount
of dumb in that sentence is not measurable with modern
instrumentation.

To be fair, the "beta" label has been abused a lot in recent years;
and what's more, it has been used as an excuse to refuse support (I'm
looking at Google here). Another point would be that Skype has come
under attack for using what basically amounts to a black box protocol in
their main application - many security-minded people are sceptical of
the company for this reason, and I can't blame them. That said, I do use
pgbouncer, which is also a Skype project (released under the BSD
license). After some casual code review I found it to be of good
quality, and I'm now using it in production environments. I don't think
it's so unreasonable to be questioning projects which are only available
as "betas". There was a time when "beta" meant caveat emptor, this
product is not fully tested, and if it breaks, we'd like to hear about
it, but we won't be surprised. Trusting such a product with database
replication may well work, but it's a risk not everybody's willing to take.

- Conrad

#18

craig@2ndquadrant.com

almost 17 years ago

In reply to: Gerry Reno (#16)

Re: Replication

On Mon, 2009-06-22 at 22:20 -0400, Gerry Reno wrote:

Here is a link that describes the technique:
http://www.onlamp.com/pub/a/onlamp/2006/04/20/advanced-mysql-replication.html?page=1

Ah. You were referring to multiple-master replication, and your
reference to setting non-overlapping sequences referred to avoiding
collisions caused by inserts on two different masters. Yes, using
non-overlapping allocation ranges for sequences is indeed one way to
handle that, but it's not actually related to what I was talking about
anyway.

What I was referring to in the parent post was an issue with
statement-based replication of concurrent statements sharing a sequence.
It's completely unrelated; both statements are running on the SAME
server (master) and replicating to the slave. For example, take two
concurrent statements each of which inserts 10 generated rows into the
dummy table 'x':

CREATE SEQUENCE x_id_seq;
CREATE TABLE x (
    a INTEGER PRIMARY KEY DEFAULT nextval('x_id_seq'),
    b INTEGER NOT NULL
);

CONNECTION (1) TO MASTER            CONNECTION (2) TO MASTER
----------------------------        ----------------------------
Issues INSERT INTO x (a,b)
SELECT nextval('x_id_seq'),1
FROM generate_series(0,9);
                                    Issues INSERT INTO x (a,b)
                                    SELECT nextval('x_id_seq'),2
                                    FROM generate_series(0,9);

nextval() returns 1
nextval() returns 2
nextval() returns 3
                                    nextval() returns 4
nextval() returns 5
                                    nextval() returns 6
nextval() returns 7
nextval() returns 8
                                    nextval() returns 9
                                    nextval() returns 10
                                    nextval() returns 11

... etc

If you issue the same two statements on the slave, the ordering in which
those nextval() calls are interleaved will be different. So, while on
the master according to the example above table 'x' would contain:

a b
(1,1)
(2,1)
(3,1)
(4,2)
(5,1)
(6,2)
(7,1)
(8,1)
(9,2)
(10,2)
(11,2)
...

on the slave it might land up containing something like:

a b
(1,1)
(2,2)
(3,2)
(4,1)
(5,2)
(6,1)
(7,1)
(8,2)
(9,1)
(10,1)
(11,2)
...

so your slave and master contain TOTALLY DIFFERENT DATA. Yet, there's
nothing wrong with the ordering of execution on the master being
non-deterministic, as we still got what we asked for. We have 10 rows
with unique primary keys and b=1, and ten rows with unique primary keys
and b=2. We don't actually care what those primary key values are, since
they're synthetic primary keys, we only care that they're unique. In a
master/slave situation, though, we also care that the SAME primary key
identifies the SAME entity on both master and slave, and that won't
happen with statement-based replication when concurrent statements
interleave in non-deterministic ways.
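The divergence described above can be reproduced with a short Python simulation. The two interleavings below are hypothetical (the master alternates calls, the slave happens to run connection 1 first), and `execute` is an illustrative helper standing in for the two `INSERT ... SELECT nextval('x_id_seq')` statements:

```python
import itertools


def execute(schedule: list[int]) -> list[tuple[int, int]]:
    """Replay two 'INSERT ... SELECT nextval(...), b' statements sharing one sequence.

    schedule[i] is the connection (1 or 2) that makes the i-th nextval() call;
    it stands in for the OS scheduler. Connection 1 inserts b=1, connection 2 b=2.
    """
    seq = itertools.count(1)  # the shared sequence x_id_seq
    return [(next(seq), conn) for conn in schedule]


# Two equally valid interleavings of the same two 10-row statements.
master = execute([1, 2] * 10)
slave = execute([1] * 10 + [2] * 10)

print(master[:4])  # [(1, 1), (2, 2), (3, 1), (4, 2)]
print(slave[:4])   # [(1, 1), (2, 1), (3, 1), (4, 1)]

# Both are correct results: unique keys, ten rows for each value of b ...
for table in (master, slave):
    assert len({a for a, _ in table}) == 20
    assert sorted(b for _, b in table) == [1] * 10 + [2] * 10
# ... yet key 2 identifies a different row on each server.
assert dict(master)[2] != dict(slave)[2]
```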

Of course, it's rather nice in performance terms that such statements
CAN be interleaved without synchronisation or locking. In fact, that's
why PostgreSQL sequences exist.

In this particular case, the server could work around it by logging its
selection of generated values to some sort of side channel (akin to
MySQL's replication binlog) so the slave can use that as its source for
them. That's kind of error prone, though, as it requires every such
function (nextval(), random(), etc.) to have support for replication
manually added, and will result in hopelessly out-of-sync slaves if a
function isn't handled. It also doesn't provide an answer for other
non-deterministic result sets like use of a function in a result set
with LIMIT without ORDER BY.
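The side-channel workaround can be sketched in Python as follows. It assumes the master logs the values each statement obtained from nextval(), keyed by statement, so the slave can replay them regardless of how its own scheduler interleaves the statements; all names here are hypothetical, and only nextval() is handled:

```python
import itertools
from collections import defaultdict, deque


def master_execute(schedule: list[int]):
    """Run on the master and log every generated key, per statement."""
    seq = itertools.count(1)
    side_channel: dict[int, list[int]] = defaultdict(list)  # statement -> its nextval() results
    rows = []
    for stmt in schedule:
        value = next(seq)
        side_channel[stmt].append(value)
        rows.append((value, stmt))
    return rows, side_channel


def slave_execute(schedule: list[int], side_channel: dict[int, list[int]]):
    """Re-run the statements, taking keys from the log instead of a local sequence."""
    pending = {stmt: deque(values) for stmt, values in side_channel.items()}
    return [(pending[stmt].popleft(), stmt) for stmt in schedule]


master_rows, log = master_execute([1, 2] * 10)
# The slave interleaves the statements differently, but still ends up identical.
slave_rows = slave_execute([1] * 10 + [2] * 10, log)
assert sorted(master_rows) == sorted(slave_rows)
```

The log has to be keyed per statement: a single flat list of values replayed in order would just reproduce the interleaving problem. And anything not routed through the log, such as a random() call, would still diverge, which is exactly the fragility described above.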

The problem is that if you do statement-based replication, the order in
which reads from the sequence by each statement are interleaved is
undefined and depends on the OS's I/O and processor scheduling. The
slave will not produce the same ordering, so the same statements
executed on the slave will result in inserted rows having different
generated keys than on the master.

MySQL appears to tackle these problems by
look! a cassowary! Over there!

Anyway, what was I saying? Oh, yes, MySQL appears to ignore these
problems or expect a paranoidly careful admin to avoid them. Some
functions are just broken and don't replicate properly; some statements
will produce wrong results on the slave, etc.

You won't EVER see that sort of thing in PostgreSQL.

So ... it doesn't seem likely that statement-level replication would
ever get far in Pg because of nasty issues like this one.

That was my point re concurrent execution of statements. Nothing to do
with ensuring key uniqueness without inter-node synchronisation in
multi-master environments.

Block-level master/slave synchronous replication, however, is already on
the way. (Also, Slony provides row-level master/slave replication that
seems to work well for a lot of people, though it's widely admitted to
be a bit of a pain to work with and not particularly nice.)

--
Craig Ringer

#19

scott.marlowe@gmail.com

almost 17 years ago

In reply to: CR Lender (#17)

Re: Replication

On Mon, Jun 22, 2009 at 8:50 PM, Conrad Lender <crlender@gmail.com> wrote:

On 23/06/09 03:44, Scott Marlowe wrote:

On Mon, Jun 22, 2009 at 4:51 PM, Gerry Reno <greno@verizon.net> wrote:

Londiste is beta. The fact that Skype uses it is because it's part
of Skytools which is their product. They may want to run their own
beta stuff. I don't.

So, if they said it was general release, but it sucked, you'd try it,
but since they say it's beta, no way? Wow. Just wow. The amount
of dumb in that sentence is not measurable with modern
instrumentation.

To be fair, the "beta" label has been abused a lot in recent years;
and what's more, it has been used as an excuse to refuse support (I'm
looking at Google here). Another point would be that Skype has come
under attack for using what basically amounts to a black box protocol in
their main application - many security-minded people are sceptical of
the company for this reason, and I can't blame them. That said, I do use
pgbouncer, which is also a Skype project (released under the BSD
license). After some casual code review I found it to be of good
quality, and I'm now using it in production environments. I don't think
it's so unreasonable to be questioning projects which are only available
as "betas". There was a time when "beta" meant caveat emptor, this
product is not fully tested, and if it breaks, we'd like to hear about
it, but we won't be surprised. Trusting such a product with database
replication may well work, but it's a risk not everybody's willing to take.

Beta or alpha or final or production, they all mean nothing unless
they are applied to a specific piece of code and its rep. I've seen
plenty of software that was supposedly supported that was never fixed,
or was fixed at a leisurely pace (see MySQL's packaging mistakes and
InnoDB ORDER BY DESC bugs for examples). I've used "alpha" products
in limited, well-tested roles in production that worked and worked
well. OpenSSL, which I trust to do a good job, is at 0.9.something right
now, which screams "not release" to me.

What makes code production-worthy is that YOU have tested it
thoroughly and that YOU guarantee it to work or you'll fix it, as long
as it's used in a way you can test for properly before upgrade /
update deployments. How fast do fixes come out? How well is it
maintained? An actively maintained beta may be a better answer in a
moving landscape because it can keep up. Beta means beta, and what
that means to an individual developer may not be what you expect it to
be. Risk based purely on the naming of the release is non-existent
IF IT'S BEEN TESTED PROPERLY.

#20

scott.marlowe@gmail.com

almost 17 years ago

In reply to: Craig Ringer (#18)

Re: Replication

On Mon, Jun 22, 2009 at 8:59 PM, Craig
Ringer <craig@postnewspapers.com.au> wrote:

So ... it doesn't seem likely that statement-level replication would
ever get far in Pg because of nasty issues like this one.

It's exactly what pg_pool does, and you can choose it if you know what
you're doing. But yes, it's usually a bad fit for replication by
itself.

That was my point re concurrent execution of statements. Nothing to do
with ensuring key uniqueness without inter-node synchronisation in
multi-master environments.

Block-level master/slave synchronous replication, however, is already on
the way. (Also, Slony provides row-level master/slave replication that
seems to work well for a lot of people, though it's widely admitted to
be a bit of a pain to work with and not particularly nice.)

I think it's real easy to work with, once you understand that "it's
boss". I.e. you do things the slony way, or get used to recreating /
resubscribing a lot of times during maintenance windows when you can
run on one db. The misfeature of not being able to drop tables caught me
out. Now we don't drop tables, period. We rename and alter to get
around that. Once I told the developers not to drop tables in order
to change them, things got better. Really it was bad habits learned
from other dbs.

#21

craig@2ndquadrant.com

almost 17 years ago

In reply to: Scott Marlowe (#20)

#22

Arndt Lehmann

arndt.lehmann@gmail.com

almost 17 years ago

In reply to: Gerry Reno (#1)

#23

Mike Christensen

mike@kitchenpc.com

almost 17 years ago

In reply to: Arndt Lehmann (#22)

#24

Arndt Lehmann

arndt.lehmann@gmail.com

almost 17 years ago

In reply to: Gerry Reno (#1)

#25

Arndt Lehmann

arndt.lehmann@gmail.com

almost 17 years ago

In reply to: Gerry Reno (#1)

#26

Devrim GÜNDÜZ

devrim@gunduz.org

almost 17 years ago

In reply to: Gerry Reno (#1)

#27

Devrim GÜNDÜZ

devrim@gunduz.org

almost 17 years ago

In reply to: Gerry Reno (#4)

#28

Grzegorz Jaśkiewicz

gryzman@gmail.com

almost 17 years ago

In reply to: Devrim GÜNDÜZ (#26)

#29

Devrim GÜNDÜZ

devrim@gunduz.org

almost 17 years ago

In reply to: Grzegorz Jaśkiewicz (#28)

#30

Grzegorz Jaśkiewicz

gryzman@gmail.com

almost 17 years ago

In reply to: Devrim GÜNDÜZ (#29)

#31

Jasen Betts

jasen@xnet.co.nz

almost 17 years ago

In reply to: Gerry Reno (#1)

#32

greno@verizon.net

almost 17 years ago

In reply to: Craig Ringer (#18)

#33

Merlin Moncure

mmoncure@gmail.com

almost 17 years ago

In reply to: Devrim GÜNDÜZ (#27)

#34

Ray Stell

stellr@cns.vt.edu

almost 17 years ago

In reply to: Merlin Moncure (#33)

#35

scott.marlowe@gmail.com

almost 17 years ago

In reply to: Craig Ringer (#21)

#36

scott.marlowe@gmail.com

almost 17 years ago

In reply to: Devrim GÜNDÜZ (#27)

#37

Merlin Moncure

mmoncure@gmail.com

almost 17 years ago

In reply to: Ray Stell (#34)

#38

Mike Christensen

mike@kitchenpc.com

almost 17 years ago

In reply to: Arndt Lehmann (#24)

#39

Thomas Kellerer

spam_eater@gmx.net

almost 17 years ago

In reply to: Mike Christensen (#38)

#40

Ray Stell

stellr@cns.vt.edu

almost 17 years ago

In reply to: Merlin Moncure (#37)

#41

Greg Sabino Mullane

greg@turnstep.com

almost 17 years ago

In reply to: Gerry Reno (#6)

#42

Glyn Astill

glynastill@yahoo.co.uk

almost 17 years ago

In reply to: Greg Sabino Mullane (#41)

#43

Emanuel Calvo Franco

postgres.arg@gmail.com

almost 17 years ago

In reply to: Greg Sabino Mullane (#41)

#44

craig@2ndquadrant.com

almost 17 years ago

In reply to: Thomas Kellerer (#39)

#45

Scott Mead

scott.lists@enterprisedb.com

almost 17 years ago

In reply to: Craig Ringer (#44)

#46

Dimitri Fontaine

dimitri@2ndQuadrant.fr

almost 17 years ago

In reply to: Gerry Reno (#8)

#47

Thomas Kellerer

spam_eater@gmx.net

almost 17 years ago

In reply to: Craig Ringer (#44)

#48

Mike Christensen

mike@kitchenpc.com

almost 17 years ago

In reply to: Thomas Kellerer (#47)

#49

jd@commandprompt.com

almost 17 years ago

In reply to: Mike Christensen (#48)

#50

Geoffrey

lists@serioustechnology.com

almost 17 years ago

In reply to: Mike Christensen (#48)

#51

scott.marlowe@gmail.com

almost 17 years ago

In reply to: Mike Christensen (#48)

#52

Greg Smith

gsmith@gregsmith.com

almost 17 years ago

In reply to: Ray Stell (#34)

#53

Jasen Betts

jasen@xnet.co.nz

almost 17 years ago

In reply to: Gerry Reno (#1)

#54

Dimitri Fontaine

dimitri@2ndQuadrant.fr

almost 17 years ago

In reply to: Jasen Betts (#53)

#55