Replication of databases (esp. postgres)

Started by Thomas Antepoth · 7 messages · general
#1 Thomas Antepoth
t_antepoth@hamm.netsurf.de

Hello altogether,

I'm mulling over a sensible concept for replicating a database
in a heterogeneous environment while keeping the relations intact.

Given situation:

Database 1 --+
Database 2 --+
Database 3 --+--- MASTER
...          |
Database n --+

Every database replicates itself to MASTER, which renumbers the
relations and replicates the data to the rest of the databases.

All databases must hold identical data. All databases have
tables with relations on unique indices. Each database is
granted insert/update/delete permissions on itself.

So any database may fire a trigger to insert, update or delete
a record on any other database.

The scheme shown has the disadvantage of requiring two passes: one
transfer of data from a slave to the master, and one acknowledgement
with corrected relations from the master back to the sending slave,
plus replication traffic to the other, receiving slaves.

Does anybody know how to implement another replication scheme?

Are there any info pointers on readings about replication of databases?

Are there any "ready to go out of the box" tools for postgres?

Thank you in advance.

t++

#2 Dustin Sallings
dustin@spy.net
In reply to: Thomas Antepoth (#1)
Re: [GENERAL] Replication of databases (esp. postgres)

On Mon, 15 Feb 1999, Thomas Antepoth wrote:

// Every database replicates itself to MASTER which renumbers the
// relations and which replicates the data to the rest of the
// databases.

Just one note. Renumbering isn't all that important. I did one
that had a non-numeric unique ID for identity type columns. I got the
idea from one of our DBAs who's doing the same kinda thing with Sybase.
The main difference between Postgres and Sybase in this context is that
Sybase has a replication agent already.

Anyway, what I did was create a table of hostnames with a flag
that said, ``me.'' Then a numeric sequence, and combined the two to make
an ID unique to the host. You just have to not replicate the sequence.
I'm not really sure what to do about the db_host table. It needs to be
replicated, but you don't want them all to say true. :) Here's the
schema I did:

create sequence db_host_seq;
create sequence unique_id;   -- the sequence unique_id() draws from

create table db_host(
        host_key integer default nextval('db_host_seq'),
        host_name text,
        isme bool
);

create function hostname() returns text as
        'select host_name from db_host where isme = ''t'''
language 'sql';

create function unique_id() returns text as
        'select hostname() || ''.'' || nextval(''unique_id'')::text'
language 'sql';
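The idea behind the schema can be sketched in a few lines of Python (a stand-in for the SQL above, with made-up host names): each host concatenates its own name with a purely local sequence, so identifiers minted on different hosts can never collide even though no host coordinates with any other.

```python
# Simulation of the host-prefixed ID scheme. Host corresponds to the
# isme='t' row in db_host; seq corresponds to the unique_id sequence.
class Host:
    def __init__(self, name):
        self.name = name
        self.seq = 0

    def unique_id(self):
        # Mint "<hostname>.<local sequence>", unique across all hosts.
        self.seq += 1
        return f"{self.name}.{self.seq}"

a, b = Host("alpha"), Host("beta")
ids = [a.unique_id(), a.unique_id(), b.unique_id(), b.unique_id()]
# ids is ['alpha.1', 'alpha.2', 'beta.1', 'beta.2'] -- no collisions,
# because the host name is baked into every key.
assert len(set(ids)) == len(ids)
```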

--
Principal Member Technical Staff, beyond.com The world is watching America,
pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
| Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE
L______________________________________________ and America is watching TV. __

#3 Thomas Antepoth
t_antepoth@hamm.netsurf.de
In reply to: Dustin Sallings (#2)
Re: [GENERAL] Replication of databases (esp. postgres)

Dustin,

thank you for your answer.

On Mon, 15 Feb 1999, dustin sallings wrote:

// {Replication of r/w Databases in a two pass scheme w/ renumbering}
// Just one note. Renumbering isn't all that important. I did one
// that had a non-numeric unique ID for identity type columns.

I thought of a field named db_id. This field, combined with
a counter unique (at least) to the local database, should give a
unique index for each relation.

The rule to follow is: Every record in a relation with the
same db_id and the same rec_id belongs together.

But think of this situation:

DB 1 gets a relation.

table1        table2
record1       record1
              record2

Now this relation gets replicated to DB 2.

After this is done, DB 2 adds a record3 in
table2 to this relation.

Question: Which db_id should DB 2 use?

Its own? In that case record3 wouldn't belong
to this relation.

The id of DB 1? In that case the data may
not be unique to DB 1, as DB 1 might enter
the same values for record3 for a completely
different relation, and a collision is created.

So I dropped the idea of having a db_id field
to identify the records of a relation w/o
renumbering the relations by a master.
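The collision Thomas fears can be shown in a toy sketch (all names hypothetical): DB 2 stamps its new record with DB 1's db_id but draws the sequence number from its own local counter, and DB 1 later mints the very same key for an unrelated record.

```python
# Both databases happen to have used sequence values 1..3 locally.
db1_seq = 3
db2_seq = 3

# DB 2 inserts record3 under DB 1's db_id, using its OWN sequence:
db2_seq += 1
key_minted_on_db2 = ("db1", db2_seq)

# Later, DB 1 inserts a record for a completely different relation:
db1_seq += 1
key_minted_on_db1 = ("db1", db1_seq)

# Same (db_id, rec_id) pair minted twice -> collision on replication.
assert key_minted_on_db2 == key_minted_on_db1
```

The collision only arises because the db_id and the sequence were drawn on different hosts; this is exactly the failure mode Clark's concatenation rule (later in the thread) forbids.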

have a nice day! t++

#4 Michael Davis
michael.davis@prevuenet.com
In reply to: Thomas Antepoth (#3)
RE: [GENERAL] Replication of databases (esp. postgres)

I was thinking about creating a trigger on every table that would write an
insert, update, or delete statement into a log file. Each replicated
database would pro-actively read these and post them.
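The statement-log idea can be sketched as follows (a Python stand-in with hypothetical names; in Postgres this would be a trigger per table appending rows to a log that every replica polls and replays in order):

```python
log = []  # shared, ordered log of change statements

def apply(db, op, key, value=None):
    """Apply one change locally and append it to the log."""
    if op == "delete":
        db.pop(key, None)
    else:  # insert or update
        db[key] = value
    log.append((op, key, value))

def replay(db, pos):
    """A replica catches up by replaying log entries it hasn't seen."""
    for op, key, value in log[pos:]:
        if op == "delete":
            db.pop(key, None)
        else:
            db[key] = value
    return len(log)  # new position for this replica

master, replica, seen = {}, {}, 0
apply(master, "insert", 1, "widget")
apply(master, "update", 1, "gadget")
apply(master, "insert", 2, "sprocket")
apply(master, "delete", 2)
seen = replay(replica, seen)
assert replica == master  # replica has converged
```

Because the log is ordered, replaying it yields the same end state on every replica, which is what makes this safer than timestamp-based comparison.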

-----Original Message-----
From: Thomas Antepoth [SMTP:t_antepoth@hamm.netsurf.de]
Sent: Monday, February 15, 1999 9:16 PM
To: pgsql-general@postgreSQL.org
Subject: Re: [GENERAL] Replication of databases (esp. postgres)

<snip>

#5 Thomas Antepoth
t_antepoth@hamm.netsurf.de
In reply to: Michael Davis (#4)
RE: [GENERAL] Replication of databases (esp. postgres)

Michael,

On Wed, 17 Feb 1999, Michael Davis wrote:

// I was thinking about creating a trigger on every table that would write an
// insert, update, or delete statement into a log file. Each replicated
// database would pro-actively read these and post them.

Right. This is one of the prerequisites for avoiding a timestamp-oriented
replication mechanism, which is known to be vulnerable to the lost
update phenomenon. (e.g. "newnews" on news servers)

But even if you do that, you have to keep the relations between the
tables intact by creating a truly unique record id for each record
in each database.

As Dustin pointed out, he solved it by merging a unique database
id with a sequence into a unique primary key.

This solution is only a partial one, because of inserts
from database 2 into a relation initially
created by database 1.

If the newly inserted record gets the database id of database 2,
it will later not be recognized as belonging to the relation by a
"... where db_id=this_relations_host and recid=this_relations_sequence"
on every host.

If the newly inserted record gets the database id of database 1,
it will surely be recognized, but the sequence number of the
record may collide with a pre-existing record in database 1
when replicated.

Multiple updates to a record of the same relation on different,
non-connected databases are another problem not solved by
this mechanism.

Every solution which I can imagine relies on a master, which
receives temporary data on inserts, creates unique record ids
and re-exports the data to the slaves.

A slave will create in its local database a record id which is
known to be "not part of all databases", e.g. all numbers below
100000 are temporary.

Then it exports this data to the master and deletes the temporary
data once the master has acknowledged the import.

The master creates unique record ids for all temporary data
and re-exports the data to all its slaves.
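The two-pass scheme above can be sketched like this (hypothetical names; the 100000 boundary is the one proposed above): the slave mints temporary ids below the boundary, ships the rows to the master, and the master renumbers them into the permanent id space before re-exporting.

```python
TEMP_LIMIT = 100000  # ids below this are temporary, slave-local

slave_rows = {1: "order a", 2: "order b"}  # minted with temp ids

master_next = 100000  # permanent ids start at the boundary
master_rows = {}
id_map = {}           # temp id -> permanent id

# Pass 1: slave exports; master renumbers into the global id space.
for temp_id, row in sorted(slave_rows.items()):
    assert temp_id < TEMP_LIMIT
    id_map[temp_id] = master_next
    master_rows[master_next] = row
    master_next += 1

# Pass 2: on the master's acknowledgement, the slave drops its
# temporary rows and adopts the renumbered copies. This round trip
# is exactly the two-pass overhead complained about in message #1.
slave_rows = {id_map[t]: row for t, row in slave_rows.items()}
assert slave_rows == master_rows
```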

But what to do, if there are several local masters?

t++

#6 Clark Evans
clark.evans@manhattanproject.com
In reply to: Thomas Antepoth (#5)
Re: [GENERAL] Replication of databases (esp. postgres)

Thomas Antepoth wrote:

// But what to do, if there are several local masters?

I may have interpreted the problem incorrectly, but here is
a similar problem I had...

Say I have 5 plants in a large company. Each of these
plants has a variety of machinery and tools which have
detailed maintenance records.

It doesn't make sense to have one centralized database,
so you have 5 separate ones, each database with its own
copy of the machines deployed at the plant.

Now comes the kicker. About 3-5 times a year there is
a major migration of equipment between the plants. Plant A
no longer needs such-and-such a tool, so the tool moves to
plant B. The problem is, the history for that piece of
equipment is tied to several, sometimes a hundred or
more, different production processes and/or related machinery.
Also, primary key conflicts cause huge problems.

To solve this problem, I switched from a numeric key to
an alphanumeric key. I assigned a "birthplace" to
each machine, i.e. where it was first deployed. Each
birthplace had its alpha key, e.g. plant 'A', 'B', etc.,
and the new object would be entered using the birthplace
key followed by a unique sequence or 'birth number' at that
particular 'birth place'. The result of the concatenation is
a globally unique key that can move with the equipment
from place to place.
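A small sketch of the birthplace-key idea (hypothetical names): the key is minted once, from the birthplace code plus a local sequence, and it never changes when the equipment moves.

```python
class Plant:
    def __init__(self, code):
        self.code = code  # birthplace alpha key, e.g. 'A', 'B'
        self.seq = 0      # local 'birth number' sequence

    def register(self, registry):
        """Mint a birthplace key for a machine first deployed here."""
        self.seq += 1
        key = f"{self.code}{self.seq}"  # e.g. 'A1'
        registry[key] = self.code       # record current location
        return key

registry = {}  # globally unique key -> current plant code
a, b = Plant("A"), Plant("B")
tool = a.register(registry)  # born at plant A -> key 'A1'
b.register(registry)         # plant B's own 'B1' cannot collide

# The tool migrates to plant B: only its location changes,
# never the key, so all history tied to 'A1' stays intact.
registry[tool] = "B"
assert tool == "A1" and registry[tool] == "B"
```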

Each piece of equipment also has its current location.
When you query a piece of equipment in the local
database and this is not its current location, you
are not allowed to update that 'archive' copy of the data.
To make any changes, you must go over a remote database
link to the database which 'owns' the equipment.

This seemed to solve the problem.

Hope this helps.

Clark

#7 Clark Evans
clark.evans@manhattanproject.com
In reply to: Thomas Antepoth (#5)
Re: [GENERAL] Replication of databases (esp. postgres)

Thomas Antepoth wrote:

// Michael,
//
// As Dustin pointed out, he solved it by merging a unique database
// id with a sequence into a unique primary key.

<snip>

// If the newly inserted record gets the database id of database 2,
// it will later not be recognized as belonging to the relation by a
// "... where db_id=this_relations_host and recid=this_relations_sequence"
// on every host.

You concatenate them. They become a 'single value'.

Do not treat them as separate things; you merge them together
to generate the uniqueness. You _never_ separate them, i.e.,
you _never_ change the db_id.

// If the newly inserted record gets the database id of database 1,
// it will surely be recognized, but the sequence number of the
// record may collide with a pre-existing record in database 1
// when replicated.

Nope. This is never a problem. This is *exactly* what
the concatenation gives you.

// Multiple updates to a record of the same relation on different
// non connected databases are another problem not solved by
// this mechanism.

Correct. If the data moves... you need a pointer
telling it where it moved to. When something moves
(a rare event in my case), you go back and update all of
the pointers in the 'local cache' copy of the tuple.

// Every solution which i can imagine relies on a master, which
// receives temporary data on inserts, creates unique record ids
// and re exports the data to the slaves.

Nope. You can only query the local cache. If you want to
update it, you need to 'transfer' the ownership of the object
(tuple) from one database to the other. _Then_ you can
update the local copy.
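The ownership-transfer rule can be sketched like this (hypothetical names): every site may read its local cache, but a tuple may only be updated at the site that currently owns it; to update elsewhere, ownership is transferred first.

```python
class Site:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # key -> (value, owner_site_name)

    def update(self, key, value):
        _, owner = self.cache[key]
        if owner != self.name:
            raise PermissionError("not the owner; transfer first")
        self.cache[key] = (value, self.name)

def transfer(key, src, dst):
    """Move ownership of a tuple from src to dst."""
    value, owner = src.cache[key]
    assert owner == src.name
    src.cache[key] = (value, dst.name)  # src keeps a read-only copy
    dst.cache[key] = (value, dst.name)

a, b = Site("A"), Site("B")
a.cache["A1"] = ("lathe", "A")
b.cache["A1"] = ("lathe", "A")  # replicated read-only copy at B

try:
    b.update("A1", "lathe mk2")  # refused: B is not the owner
except PermissionError:
    pass

transfer("A1", a, b)
b.update("A1", "lathe mk2")      # allowed after the transfer
assert b.cache["A1"] == ("lathe mk2", "B")
```

Since only one site owns a tuple at a time, conflicting concurrent updates cannot arise; the price is that stale reads of the local copy are possible until the next propagation.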

// A slave will create in his local database a record id which is
// known to be "not part of all databases". e.g. all numbers below
// 100000 are temporary.

YIKES! I would not do this.... it won't scale, and besides
that it's messy! *evil grin*

// Then it exports this data to the master and deletes the temporary
// data, when the master acknowledged his import.

Why delete it? If you just edited it, chances are you may
want to query it. Sure, you might get 'old' data, but for
relatively static data this is not a problem. A replication
service can go around propagating updates during off-peak times.

// The master creates unique record ids among all temporary data
// and reexports the data to all his slaves.
//
// But what to do, if there are several local masters?

You have a 'chain of responsibility'; the Design Patterns book
will help with this one.

// t++

Best,

Clark Evans