BDR Selective Replication

Started by Willem Buitendyk almost 11 years ago | 9 messages | general
#1Willem Buitendyk
willem@pcfish.ca

It's not clear to me but is selective replication working in BDR? Does
anyone have any examples if so?

Thanks

--
View this message in context: http://postgresql.nabble.com/BDR-Selective-Replication-tp5846864.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Craig Ringer
craig@2ndquadrant.com
In reply to: Willem Buitendyk (#1)
Re: BDR Selective Replication

On 26 April 2015 at 10:05, swaxolez <willem@pcfish.ca> wrote:

It's not clear to me but is selective replication working in BDR? Does
anyone have any examples if so?

Yes, selective replication (using replication sets) is supported in the
current 0.9 stable series.

The documentation on replication sets is very sparse at the moment; the
next iteration will improve that.

http://bdr-project.org/docs/stable/replication-sets.html

There are also some improvements needed to the user interface - in
particular, providing a function interface for changing replication set
memberships for connections so there's no need to manually restart the
apply backends after a change, and providing default replication sets for a
node. Current development priorities mean that these aren't expected in the
next few releases.

Note that selective replication affects *only* replication of rows. DDL is
still replicated on tables that are not members of any active replication
set. Also, changing replication set memberships won't synchronise the added
table's rows from other nodes; it'll just start replicating new changes
from its current state. You generally want to set up replication sets
before starting to add data to tables.

All this applies to 0.9.0 and is, of course, subject to change in future
releases, time and resources permitting.
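
The behaviour described above can be sketched as a toy model. This is only an illustration, not BDR's implementation or API: the `Node` class, its methods, and the table and set names are all hypothetical. It shows three things from the message: row changes are filtered by replication set, DDL is not filtered, and subscribing to a set later does not backfill rows that were skipped earlier.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy downstream node. All names are hypothetical, not BDR's API."""

    subscribed_sets: set[str]
    rows: dict[str, list[dict]] = field(default_factory=dict)
    ddl_log: list[str] = field(default_factory=list)

    def apply_row_change(self, table: str, table_sets: set[str], row: dict) -> bool:
        # A row change is applied only if the table shares a set with this node.
        if not (table_sets & self.subscribed_sets):
            return False
        self.rows.setdefault(table, []).append(row)
        return True

    def apply_ddl(self, statement: str) -> None:
        # DDL is not filtered by replication set in this model.
        self.ddl_log.append(statement)


node = Node(subscribed_sets={"sales"})
node.apply_ddl("CREATE TABLE audit_log (id int)")

# audit_log is only in the "internal" set, so its row change is skipped.
applied_audit = node.apply_row_change("audit_log", {"internal"}, {"id": 1})
applied_orders = node.apply_row_change("orders", {"sales"}, {"id": 7})

# Subscribing later picks up new changes only; row id=1 is never backfilled.
node.subscribed_sets.add("internal")
node.apply_row_change("audit_log", {"internal"}, {"id": 2})
```

After this runs, the node has the `audit_log` DDL and the row with `id` 2, but not the row with `id` 1, which is why setting up replication sets before loading data is the safer order.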

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#3Willem Buitendyk
willem@pcfish.ca
In reply to: Craig Ringer (#2)
Re: BDR Selective Replication

I get the feeling I might want to wait for the next point release before
deploying on anything other than a test platform. In the meantime, I'll play
around and see how it works. These are fantastic additions to a fantastic
database. Thanks for the good work!



#4Craig Ringer
craig@2ndquadrant.com
In reply to: Willem Buitendyk (#3)
Re: BDR Selective Replication

On 26 April 2015 at 23:52, swaxolez <willem@pcfish.ca> wrote:

I get the feeling I might want to wait for the next point release before
deploying on anything other than a test platform. In the meantime, I'll
play around and see how it works.

In the meantime, take a look at the rest of the documentation for the
coming version: http://bdr-project.org/docs/next/ . It's worth thinking
carefully about whether multi-master is right for you and understanding the
trade-offs involved with multi-master in general, and BDR in particular.

BDR's development is driven mostly by customer priorities. Currently we're
focused on improvements to dump and restore, DDL replication, and node
removal, plus some backporting of 9.5 versions of underlying features.

There's no current work planned on things like skipping DDL replication for
tables that are not in a replication set, table sync when replication sets
are changed, etc.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#5Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Craig Ringer (#2)
Re: BDR Selective Replication

On 4/26/15 7:49 AM, Craig Ringer wrote:

There are also some improvements needed to the user interface - in
particular, providing a function interface for changing replication set
memberships for connections so there's no need to manually restart the
apply backends after a change, and providing default replication sets
for a node.

If 'default replication set' is the idea of "here's what tables *should*
be getting replicated regardless of whether that's happening or not",
it'd be great if that was done so it could be split out on its own at
some point. It's a problem that affects all replication systems.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#6Craig Ringer
craig@2ndquadrant.com
In reply to: Jim Nasby (#5)
Re: BDR Selective Replication

On 28 April 2015 at 05:38, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 4/26/15 7:49 AM, Craig Ringer wrote:

There are also some improvements needed to the user interface - in
particular, providing a function interface for changing replication set
memberships for connections so there's no need to manually restart the
apply backends after a change, and providing default replication sets
for a node.

If 'default replication set' is the idea of "here's what tables *should*
be getting replicated regardless of whether that's happening or not", it'd
be great if that was done so it could be split out on its own at some
point. It's a problem that affects all replication systems.

It wasn't, but that's an interesting idea.

You need a way to identify peer nodes in an abstract way before you can
really define sets of which nodes should get which tables. So I think
replication identifiers ( https://commitfest.postgresql.org/4/161/ ) are a
pre-requisite for that though, and one that's proving difficult to get in.

I think any sort of replication sets is likely to have similar problems,
especially the "no in-core user" problem. There's nothing fundamentally
impossible about filtering WAL sent to physical downstreams over streaming
replication to include only replicated tables and the catalogs, though, so
perhaps there could be an in-core user for it.

In BDR we're currently (ab)using security labels to tag tables with their
replication sets, but I'd love to have a proper way to do that. As I recall
the prior approach, of allowing custom relation options, was rejected on
-hackers.

How would you want to go about storing and tracking the information? A new
catalog? The other issue for in-core replication sets would probably be
making it foreign-key aware, so replication of a table transitively
requires replication of its references.
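
To make the foreign-key point concrete, here is a minimal sketch of computing which tables a replication set would have to include once references are followed transitively. It assumes the references are already available as a plain mapping; the function name `with_referenced_tables` and the table names are hypothetical, and nothing here reflects how BDR or PostgreSQL store this information.

```python
def with_referenced_tables(selected, references):
    """Return the selected tables plus every table they reference, transitively.

    `references` maps a table name to the tables its foreign keys point at.
    """
    closed = set()
    pending = list(selected)
    while pending:
        table = pending.pop()
        if table in closed:
            # Already handled; this also keeps cyclic references from looping.
            continue
        closed.add(table)
        pending.extend(references.get(table, ()))
    return closed


# Hypothetical schema: order_items -> orders -> customers, order_items -> products.
fk_references = {
    "order_items": ["orders", "products"],
    "orders": ["customers"],
}
replicated = with_referenced_tables({"order_items"}, fk_references)
```

Asking for `order_items` alone pulls in `orders`, `products`, and `customers` as well.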

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#7Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Craig Ringer (#6)
Re: BDR Selective Replication

On 4/27/15 7:54 PM, Craig Ringer wrote:

If 'default replication set' is the idea of "here's what tables
*should* be getting replicated regardless of whether that's
happening or not", it'd be great if that was done so it could be
split out on its own at some point. It's a problem that affects all
replication systems.

It wasn't, but that's an interesting idea.

You need a way to identify peer nodes in an abstract way before you can
really define sets of which nodes should get which tables. So I think
replication identifiers ( https://commitfest.postgresql.org/4/161/ ) are
a pre-requisite for that though, and one that's proving difficult to get
in.

Perhaps... different replication systems probably use different methods
to identify nodes, so presumably there'd need to be some way to map a generic
identifier into an appropriate identifier for whatever replication
system you're using.

I think any sort of replication sets is likely to have similar problems,
especially the "no in-core user" problem. There's nothing fundamentally
impossible about filtering WAL sent to physical downstreams over
streaming replication to include only replicated tables and the
catalogs, though, so perhaps there could be an in-core user for it.

Oh, I wasn't thinking this needed to be in-core. I think it'd be a lot
easier to develop it as an extension to start with... certainly a lot
less headache ;) If it becomes popular then it'll be a lot easier to get
it added.

In BDR we're currently (ab)using security labels to tag tables with
their replication sets, but I'd love to have a proper way to do that. As
I recall the prior approach, of allowing custom relation options, was
rejected on -hackers.

How would you want to go about storing and tracking the information? A
new catalog? The other issue for in-core replication sets would probably
be making it foreign-key aware, so replication of a table transitively
requires replication of its references.

As you said, we'd need a way to identify replication nodes. We might
also need/want a way to specify topology. I don't think topology would
be too hard (presumably it's either a single 'parent' node, or a list of
peers). What might be more interesting is dealing with different systems'
methods of identifying nodes.

You'd want a way to define different sets and associate them with nodes.
A node could be a provider, subscriber, or both. I think some
replication systems support 'pass through' as well, where the node
passes data downstream but doesn't apply it itself. Or it could be
multi-master and possibly a provider to read-only subscribers.

Finally you'd need to associate tables and sequences with a set. I agree
you'd want to look at FKs. I'd also like to be able to define rules for
a set, like "include everything in this schema, unless the first
character is _".
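
A rule like the one quoted above could be sketched as follows. This is only an illustration of the idea: the function `tables_in_set` and the catalog contents are hypothetical, and a real tool would read table names from the system catalogs rather than from a list.

```python
def tables_in_set(tables, schema):
    """Select tables in `schema` whose name does not start with an underscore.

    `tables` is an iterable of (schema_name, table_name) pairs.
    """
    return sorted(
        f"{table_schema}.{name}"
        for table_schema, name in tables
        if table_schema == schema and not name.startswith("_")
    )


# Hypothetical catalog contents.
catalog = [
    ("app", "orders"),
    ("app", "_orders_staging"),
    ("app", "customers"),
    ("ops", "jobs"),
]
members = tables_in_set(catalog, "app")
```

Because the rule is evaluated against the current catalog, a newly created table in the schema would join the set the next time the rule is applied, with no manual step.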
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#8Craig Ringer
craig@2ndquadrant.com
In reply to: Jim Nasby (#7)
Re: BDR Selective Replication

On 29 April 2015 at 09:14, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 4/27/15 7:54 PM, Craig Ringer wrote:

If 'default replication set' is the idea of "here's what tables
*should* be getting replicated regardless of whether that's
happening or not", it'd be great if that was done so it could be
split out on its own at some point. It's a problem that affects all
replication systems.

It wasn't, but that's an interesting idea.

You need a way to identify peer nodes in an abstract way before you can
really define sets of which nodes should get which tables. So I think
replication identifiers ( https://commitfest.postgresql.org/4/161/ ) are
a pre-requisite for that though, and one that's proving difficult to get
in.

Perhaps... different replication systems probably use different methods to
identify nodes, so presumably there'd need to be some way to map a generic
identifier into an appropriate identifier for whatever replication system
you're using.

Replication identifiers do just that: provide a way to map identifiers from
some external system into a local unique identifier for a peer node, along
with tracking of the replay position from the peer so replay can be
restarted at a consistent point. The replay position is an LSN, so they're
not going to work for any arbitrary system, though.
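
As a rough illustration of that description, the sketch below maps an external peer name to a small local identifier and tracks how far replay from that peer has progressed. It is a toy model, not the proposed PostgreSQL feature: the `PeerRegistry` class and the peer name are hypothetical. The one borrowed detail is PostgreSQL's textual LSN form, two hexadecimal numbers separated by a slash, which represent the high and low 32 bits of a 64-bit position.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN of the form 'X/Y' (two hex numbers) to an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


class PeerRegistry:
    """Toy registry: external peer name -> local id, plus replay progress."""

    def __init__(self) -> None:
        self._local_id_by_name = {}
        self._replayed_lsn = {}

    def local_id(self, external_name: str) -> int:
        # Assign a new local id the first time a peer name is seen.
        if external_name not in self._local_id_by_name:
            new_id = len(self._local_id_by_name) + 1
            self._local_id_by_name[external_name] = new_id
            self._replayed_lsn[new_id] = 0
        return self._local_id_by_name[external_name]

    def record_replay(self, peer_id: int, lsn: str) -> None:
        # Progress never moves backwards, so a stale report is ignored.
        self._replayed_lsn[peer_id] = max(self._replayed_lsn[peer_id], parse_lsn(lsn))

    def restart_position(self, peer_id: int) -> int:
        return self._replayed_lsn[peer_id]


registry = PeerRegistry()
peer = registry.local_id("node_a")  # hypothetical external name
registry.record_replay(peer, "0/16B3748")
registry.record_replay(peer, "0/16B3000")  # older position, ignored
```

The dependence on an LSN is visible in `record_replay`: a system whose progress marker is not a WAL position has nothing to pass in here, which is the limitation noted above.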

How would you want to go about storing and tracking the information? A
new catalog? The other issue for in-core replication sets would probably
be making it foreign-key aware, so replication of a table transitively
requires replication of its references.

As you said, we'd need a way to identify replication nodes. We might also
need/want a way to specify topology.

Topology? Why?

All a node needs to know is "send data from <these tables> to <these
peers>". It's just a set. If a replication system is doing something fancy
it'd be able to manage the replication sets on the nodes.
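
The "it's just a set" view can be sketched with two mappings and a set intersection. The names below (`table_sets`, `peer_sets`, `peers_for_table`, and the table, set, and node names) are hypothetical and only illustrate the idea; they are not BDR's data model.

```python
# Which replication sets each table belongs to (hypothetical names).
table_sets = {
    "orders": {"sales"},
    "customers": {"sales", "crm"},
    "audit_log": {"internal"},
}

# Which replication sets each peer receives (hypothetical names).
peer_sets = {
    "node_b": {"sales"},
    "node_c": {"crm", "internal"},
}


def peers_for_table(table):
    """Return the peers that should receive changes to `table`."""
    sets_of_table = table_sets.get(table, set())
    return sorted(peer for peer, sets in peer_sets.items() if sets & sets_of_table)


customers_peers = peers_for_table("customers")
orders_peers = peers_for_table("orders")
```

Nothing here describes how the nodes are connected; the node only answers "which peers get this table", which is the distinction being drawn between replication sets and topology.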

I don't think topology would be too hard (presumably it's either a single
'parent' node, or a list of peers). What might be more interesting is
dealing with different systems' methods of identifying nodes.

Yeah, topology is hard. Rings, mesh with dangling follower nodes, etc.

I don't think it's really the same thing as replication sets.

You'd want a way to define different sets and associate them with nodes. A
node could be a provider, subscriber, or both. I think some replication
systems support 'pass through' as well, where the node passes data
downstream but doesn't apply it itself. Or it could be multi-master and
possibly a provider to read-only subscribers.

Yeah, you're talking about some kind of abstract modelling of a replication
topology. I'm not sure that's at all necessary to keep track of which
tables should be replicated to which nodes.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#9Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Craig Ringer (#8)
Re: BDR Selective Replication

On 4/29/15 1:38 AM, Craig Ringer wrote:

Perhaps... different replication systems probably use different
methods to identify nodes, so presumably there'd need to be some way to
map a generic identifier into an appropriate identifier for whatever
replication system you're using.

Replication identifiers do just that: provide a way to map identifiers
from some external system into a local unique identifier for a peer
node, along with tracking of the replay position from the peer so replay
can be restarted at a consistent point. The replay position is an LSN,
so they're not going to work for any arbitrary system, though.

Which may not work for something meant to work with different
replication systems...

You'd want a way to define different sets and associate them with
nodes. A node could be a provider, subscriber, or both. I think some
replication systems support 'pass through' as well, where the node
passes data downstream but doesn't apply it itself. Or it could be
multi-master and possibly a provider to read-only subscribers.

Yeah, you're talking about some kind of abstract modelling of a
replication topology. I'm not sure that's at all necessary to keep track
of which tables should be replicated to which nodes.

I'd think that you'd still need to know if a table is a provider or
subscriber regardless of topology; how else will you know how to add it?

As for the topology part, yes, perhaps that's more than the baseline
case. It might be enough of a win to just deal with tables and sets to
not worry about it.

I originally had this idea when dealing with a number of londiste
clusters and wishing I had something better than "Run this SELECT and
paste the output to the command line" to deal with adding newly created
tables. It seemed likely that a more generic system could also be made
to plug into different replication systems fairly easily; there'd just
need to be a separate layer that translated the definition into actual
replication commands. Then the only thing missing would be defining what
sets lived where; that would allow the generic system to define almost
every aspect of a replication environment. Maybe that's too ambitious;
the first step would be to try just tracking which tables are in which
set and see how that goes.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
