Replication on the backend

Started by Gustavo Tonini over 20 years ago | 30 messages | hackers
#1Gustavo Tonini
gustavotonini@gmail.com

What about replication or data distribution inside the backend? Is this a
valid issue?

Thanks,
Gustavo.

#2Chris Browne
cbbrowne@acm.org
In reply to: Gustavo Tonini (#1)
Re: Replication on the backend

gustavotonini@gmail.com (Gustavo Tonini) writes:

What about replication or data distribution inside the backend? Is
this a valid issue?

I'm not sure what your question is...
--
(reverse (concatenate 'string "gro.gultn" "@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/x.html
"Love is like a snowmobile flying over the frozen tundra that suddenly
flips, pinning you underneath. At night, the ice weasels come."
-- Matt Groening

#3Gustavo Tonini
gustavotonini@gmail.com
In reply to: Chris Browne (#2)
Re: Replication on the backend

Replication (master/slave, multi-master, etc.) implemented inside
Postgres... I would like to know what has been done in this area.

Gustavo.

P.S. Sorry for my bad English.


#4Joshua D. Drake
jd@commandprompt.com
In reply to: Gustavo Tonini (#3)
Re: Replication on the backend

Gustavo Tonini wrote:

Replication (master/slave, multi-master, etc.) implemented inside
Postgres... I would like to know what has been done in this area.

http://www.commandprompt.com/ - Master/Slave

Joshua D. Drake


#5Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Gustavo Tonini (#3)
Re: Replication on the backend

Replication (master/slave, multi-master, etc.) implemented inside
Postgres... I would like to know what has been done in this area.

It's not in the backend; check out things like Slony (www.slony.info)
and the various commercial solutions.

Chris

#6Jan Wieck
JanWieck@Yahoo.com
In reply to: Gustavo Tonini (#3)
Re: Replication on the backend

On 12/5/2005 8:18 PM, Gustavo Tonini wrote:

Replication (master/slave, multi-master, etc.) implemented inside
Postgres... I would like to know what has been done in this area.

We do not plan to implement replication inside the backend. Replication
needs are so diverse that pluggable replication support makes a lot more
sense. To me it even makes more sense than keeping transaction support
outside of the database itself and adding it via a pluggable storage add-on.

Jan


--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#7Gustavo Tonini
gustavotonini@gmail.com
In reply to: Jan Wieck (#6)
Re: Replication on the backend

But, wouldn't the performance be better? And wouldn't asynchronous messages
be better processed?

Thanks for the replies,
Gustavo.


#8Markus Wanner
markus@bluegap.ch
In reply to: Gustavo Tonini (#7)
Re: Replication on the backend

On Tue, 2005-12-06 at 10:03 -0200, Gustavo Tonini wrote:

But, wouldn't the performance be better? And wouldn't asynchronous
messages be better processed?

At least for synchronous multi-master replication, the performance
bottleneck is going to be the interconnect between the nodes;
integrating the replication logic into the backend most probably
doesn't affect performance that much.

I'd rather ask Jan what different needs for replication he has
discovered so far, and how he came to the conclusion that it's not
possible to provide a general solution.

My point for integration into the backend is flexibility: obviously the
replication code can influence the database much more from within the
backend than from the outside, for example by running one complex query
on several nodes. I know this is a very advanced feature (currently it's
not even possible to run one query on multiple backends, i.e. processors
of a multi-core system), but I like to plan ahead instead of throwing
away code later. For such advanced features you simply have to dig around
in the backend code one day. Of course you can always add hooks, but IMHO
that only complicates matters.

Is there some discussion going on about such topics somewhere? What's up
with slony-2? The wiki on slony2.org still doesn't provide a lot of
technical information (and obviously got spammed BTW).

Regards

Markus

#9Jan Wieck
JanWieck@Yahoo.com
In reply to: Markus Wanner (#8)
Re: Replication on the backend

On 12/6/2005 8:10 AM, Markus Schiltknecht wrote:

On Tue, 2005-12-06 at 10:03 -0200, Gustavo Tonini wrote:

But, wouldn't the performance be better? And wouldn't asynchronous
messages be better processed?

At least for synchronous multi-master replication, the performance
bottleneck is going to be the interconnect between the nodes;
integrating the replication logic into the backend most probably
doesn't affect performance that much.

That is exactly right. Thus far, processor, memory and disk speeds have
always advanced at a faster pace than network speeds. Thus, the few
percent of performance gain we'd get from moving things into the backend
will be irrelevant tomorrow with 4-core and 16-core CPUs.

I'd rather ask Jan what different needs for replication he has
discovered so far, and how he came to the conclusion that it's not
possible to provide a general solution.

- Asynchronous master to multi-slave. We have a few of those, with
Mammoth Replicator and Slony-I being the top players. Slony-I does
need some cleanup and/or reimplementation after we have a general
pluggable replication API in place.

- Synchronous multimaster. There are certain attempts out there, like
Postgres-R, pgcluster, Slony-II. Some more advanced, some less. But
certainly nothing I would send into the ring against Oracle-Grid.

- Asynchronous multimaster with conflict resolution. I have not seen
any reasonable attempt at this one yet. Plus, it divides again into
two camps. One is the idea of having one central system with thousands
of satellites (salesmen on the road), the other being two or more
central systems doing load balancing (although this competes with
sync-mm).

My point for integration into the backend is flexibility: obviously the
replication code can influence the database much more from within the
backend than from the outside.

We need a general API. It should be possible to define on a per-database
level which shared replication module to load on connect. The init
function of that replication module then installs all the required
callbacks at strategic points (like heap_update(), at_commit() ...) and
the rest is hidden in the module.
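
To make the shape of such an API a little more concrete, here is a minimal sketch in Python rather than backend C. It is only an illustration of the idea described above, not PostgreSQL code: the names `Backend`, `install_callback`, `fire`, `LogShippingModule`, `REPLICATION_MODULES` and `connect` are all hypothetical, and only the hook-point names `heap_update` and `at_commit` come from the message.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# Hook points named after the examples in the message; the set is illustrative.
HOOK_POINTS = ("heap_update", "at_commit")


class Backend:
    """Toy stand-in for one backend connected to a single database."""

    def __init__(self, dbname: str) -> None:
        self.dbname = dbname
        self.hooks: Dict[str, List[Callable[..., None]]] = defaultdict(list)

    def install_callback(self, point: str, fn: Callable[..., None]) -> None:
        if point not in HOOK_POINTS:
            raise ValueError(f"unknown hook point: {point}")
        self.hooks[point].append(fn)

    def fire(self, point: str, *args) -> None:
        # The core calls this at the strategic point and knows nothing
        # about what the loaded module does with the event.
        for fn in self.hooks[point]:
            fn(*args)


class LogShippingModule:
    """Hypothetical module: queues row changes and ships them on commit."""

    def __init__(self) -> None:
        self.pending: List[Tuple[str, dict, dict]] = []
        self.shipped: List[List[Tuple[str, dict, dict]]] = []

    def init(self, backend: Backend) -> None:
        # The init function installs every callback the module needs.
        backend.install_callback("heap_update", self.on_update)
        backend.install_callback("at_commit", self.on_commit)

    def on_update(self, table: str, old_row: dict, new_row: dict) -> None:
        self.pending.append((table, old_row, new_row))

    def on_commit(self) -> None:
        if self.pending:
            self.shipped.append(self.pending)
            self.pending = []


# Per-database setting: which replication module (if any) to load on connect.
REPLICATION_MODULES: Dict[str, Callable[[], LogShippingModule]] = {
    "sales": LogShippingModule,
}


def connect(dbname: str) -> Tuple[Backend, Optional[LogShippingModule]]:
    backend = Backend(dbname)
    factory = REPLICATION_MODULES.get(dbname)
    module = factory() if factory else None
    if module is not None:
        module.init(backend)
    return backend, module
```

In this sketch a database without an entry in `REPLICATION_MODULES` gets a backend with no callbacks installed, so firing a hook point there does nothing; everything replication-specific stays inside the module.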

Is there some discussion going on about such topics somewhere? What's up
with slony-2? The wiki on slony2.org still doesn't provide a lot of
technical information (and obviously got spammed BTW).

Slony-II has been slow lately in the Eastern timezone.

Jan


#10Markus Wanner
markus@bluegap.ch
In reply to: Jan Wieck (#9)
Re: Replication on the backend

Hello Jan,

On Tue, 2005-12-06 at 10:10 -0500, Jan Wieck wrote:

We need a general API. It should be possible to define on a per-database
level which shared replication module to load on connect. The init
function of that replication module then installs all the required
callbacks at strategic points (like heap_update(), at_commit() ...) and
the rest is hidden in the module.

Thank you for your list of replication types. Those still have some
things in common, so your approach of providing hooks for different
modules might make sense. I only fear that I would need way too many
hooks for what I want ;)

Slony-II has been slow lately in the Eastern timezone.

What is that supposed to mean? Who sits in the eastern timezone?

Regards

Markus

#11Chris Browne
cbbrowne@acm.org
In reply to: Gustavo Tonini (#1)
Re: Replication on the backend

gustavotonini@gmail.com (Gustavo Tonini) writes:

But, wouldn't the performance be better? And wouldn't asynchronous
messages be better processed?

Why do you think performance would be materially affected by this?

The MAJOR performance bottleneck is normally the slow network
connection between servers.

When looked at in the perspective of that bottleneck, pretty much
everything else is just noise. (Sometimes pretty loud noise, but
still noise :-).)
--
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/spreadsheets.html
"When the grammar checker identifies an error, it suggests a
correction and can even makes some changes for you."
-- Microsoft Word for Windows 2.0 User's Guide, p.35:

#12Mario Weilguni
mario.weilguni@icomedias.com
In reply to: Chris Browne (#11)
Re: Replication on the backend

IMO this is not true. You can get affordable 10 Gbit network adapters, so you can have plenty of bandwidth in a DB server pool (if the servers are located in the same area). Even 1 Gbit Ethernet helps greatly here, and would make it possible to balance read-intensive (but not write-intensive) applications. We are using the Linux bonding interface with 2 Gbit NICs, and you need quite a few hard disks to reach its 200 MBytes/sec throughput. Latency is not bad either.

Regards,
Mario Weilguni


#13Michael Meskes
meskes@postgresql.org
In reply to: Jan Wieck (#9)
Re: Replication on the backend

Postgres-R, pgcluster, Slony-II. Some more advanced, some less. But
certainly nothing I would send into the ring against Oracle-Grid.

Assuming that you mean Oracle Real Application Clusters (the Grid is more,
right?), I wonder if this technology technically still counts as replication.
AFAIK they do not replicate data but share a common data pool among different
servers. You still have communication overhead, but you write a tuple only
once for all servers involved. That takes away a lot of overhead on a system
that's heavily written to.

Michael
--
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL

#14Rick Gigger
rick@alpinenetworking.com
In reply to: Jan Wieck (#6)
Re: Replication on the backend

Just like MySQL!


#15Rick Gigger
rick@alpinenetworking.com
In reply to: Jan Wieck (#9)
Re: Replication on the backend

- Asynchronous master to multi-slave. We have a few of those, with
Mammoth Replicator and Slony-I being the top players. Slony-I does
need some cleanup and/or reimplementation after we have a general
pluggable replication API in place.

Does this API actually have people working on it, or is it just something
on the TODO list?

#16Gustavo Tonini
gustavotonini@gmail.com
In reply to: Rick Gigger (#15)
Re: Replication on the backend

I don't see anything in the TODO list. I'm very interested in working on
that, if it is possible...

Gustavo.

#17Aly Dharshi
aly.dharshi@telus.net
In reply to: Michael Meskes (#13)
Re: Replication on the backend

I would classify it as a clustered database system (Oracle 10g that is).
Clustered meaning more than one node in the cluster.

ALy.


--
Aly S.P Dharshi
aly.dharshi@telus.net

"A good speech is like a good dress
that's short enough to be interesting
and long enough to cover the subject"

#18Jan Wieck
JanWieck@Yahoo.com
In reply to: Mario Weilguni (#12)
Re: Replication on the backend

On 12/6/2005 11:23 AM, Mario Weilguni wrote:

IMO this is not true. You can get affordable 10 Gbit network adapters, so you can have plenty of bandwidth in a DB server pool (if the servers are located in the same area). Even 1 Gbit Ethernet helps greatly here, and would make it possible to balance read-intensive (but not write-intensive) applications. We are using the Linux bonding interface with 2 Gbit NICs, and you need quite a few hard disks to reach its 200 MBytes/sec throughput. Latency is not bad either.

It's not so much the bandwidth but more the roundtrips that limit your
maximum transaction throughput. Remember, whatever the priority, you
can't increase the speed of light.
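
A back-of-the-envelope sketch of that limit, in Python. It assumes (this is not from the thread) that signals travel through optical fiber at roughly two thirds of the vacuum speed of light, and that each synchronous commit has to wait for a fixed number of round trips before the next dependent transaction can proceed. The function names and the 1000 km example are illustrative only.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # vacuum speed of light in km/s
FIBER_FACTOR = 2 / 3               # assumption: propagation speed in fiber


def min_round_trip_s(distance_km: float) -> float:
    """Lower bound on one round trip; ignores switches, stacks and queuing."""
    return 2 * distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)


def max_serial_commits_per_s(distance_km: float,
                             round_trips_per_commit: int = 1) -> float:
    """Upper bound on commits per second for one chain of dependent,
    strictly serialized transactions; independent concurrent transactions
    can overlap their waits, so aggregate throughput can be higher."""
    return 1.0 / (round_trips_per_commit * min_round_trip_s(distance_km))


if __name__ == "__main__":
    for km in (1, 100, 1000):
        print(km, min_round_trip_s(km), max_serial_commits_per_s(km))
```

Under these assumptions, two nodes 1000 km apart cannot complete a round trip in less than about 10 ms, which caps a single serialized stream at roughly 100 commits per second no matter how much bandwidth the link has.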

Jan


#19Gregory Maxwell
gmaxwell@gmail.com
In reply to: Jan Wieck (#18)
Re: Replication on the backend

On 12/6/05, Jan Wieck <JanWieck@yahoo.com> wrote:

IMO this is not true. You can get affordable 10 Gbit network adapters, so you can have plenty of bandwidth in a DB server pool (if the servers are located in the same area). Even 1 Gbit Ethernet helps greatly here, and would make it possible to balance read-intensive (but not write-intensive) applications. We are using the Linux bonding interface with 2 Gbit NICs, and you need quite a few hard disks to reach its 200 MBytes/sec throughput. Latency is not bad either.

It's not so much the bandwidth but more the roundtrips that limit your
maximum transaction throughput. Remember, whatever the priority, you
can't increase the speed of light.

Eh, why would a light-limited delay be any slower than a disk on FC the
same distance away? :)

In any case, performance of PG on iSCSI is just fine. You can't blame
the network... Doing multimaster replication is hard because the
locking primitives that are fine on a simple multiprocessor system
(with a VERY high-bandwidth, very low-latency interconnect between
processors) just don't work across a network, so you're left finding
other methods and making them work...

But again, multimaster isn't hard because of some inherently
slow property of networks.

#20Markus Wanner
markus@bluegap.ch
In reply to: Jan Wieck (#18)
Re: Replication on the backend

On Tue, 2005-12-06 at 23:19 -0500, Jan Wieck wrote:

It's not so much the bandwidth but more the roundtrips that limit your
maximum transaction throughput.

I completely agree that it's the latency that counts, not the bandwidth.

Does anybody have latency / roundtrip measurements for current hardware?
I'm interested in:
1 Gb Ethernet,
10 Gb Ethernet,
InfiniBand,
probably even p2p usb2 or firewire links?

At least Quadrics claims [1] to have measured only 1.38 microseconds.
Assuming real-world conditions would give you 5 microseconds, on a 3 GHz
processor that's 15,000 CPU cycles, which is IMHO not that much any
more. Or am I wrong (mental arithmetic never was my favourite subject)?
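
The arithmetic can be checked with a few lines of Python. This is just a sketch; the helper name `cycles_during` is made up for the example, and it uses integer nanoseconds and MHz to avoid floating-point rounding.

```python
def cycles_during(latency_ns: int, clock_mhz: int) -> int:
    """CPU cycles that elapse while waiting latency_ns at clock_mhz.

    cycles = seconds * Hz = (ns * 1e-9) * (MHz * 1e6) = ns * MHz / 1000
    """
    return latency_ns * clock_mhz // 1000


if __name__ == "__main__":
    print(cycles_during(5_000, 3_000))  # 5 microseconds at 3 GHz
    print(cycles_during(1_380, 3_000))  # the claimed 1.38 microseconds at 3 GHz
```

So 5 microseconds at 3 GHz is indeed 15,000 cycles, and the claimed 1.38 microseconds would correspond to 4,140 cycles.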

Regards
Markus

[1]: http://www.quadrics.com/quadrics/QuadricsHome.nsf/NewsByDate/98FFE60F799AC95180256FEA002A6D9D

#21J. Andrew Rogers
jrogers@neopolitan.com
In reply to: Markus Wanner (#20)
#22Markus Wanner
markus@bluegap.ch
In reply to: J. Andrew Rogers (#21)
#23J. Andrew Rogers
jrogers@neopolitan.com
In reply to: Gregory Maxwell (#19)
#24Luke Lonergan
llonergan@greenplum.com
In reply to: Mario Weilguni (#12)
#25Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Jan Wieck (#6)
#26Gustavo Tonini
gustavotonini@gmail.com
In reply to: Andrew Sullivan (#25)
#27Andrew Dunstan
andrew@dunslane.net
In reply to: Gustavo Tonini (#26)
#28Jan Wieck
JanWieck@Yahoo.com
In reply to: Gustavo Tonini (#26)
#29Chris Browne
cbbrowne@acm.org
In reply to: Gustavo Tonini (#1)
#30Markus Wanner
markus@bluegap.ch
In reply to: Chris Browne (#29)