Switching timeline over streaming replication

amit.kapila16@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#1)

Re: Switching timeline over streaming replication

On Tuesday, September 11, 2012 10:53 PM Heikki Linnakangas wrote:

I've been working on the often-requested feature to handle timeline
changes over streaming replication. At the moment, if you kill the
master and promote a standby server, and you have another standby
server that you'd like to keep following the new master server, you
need a WAL archive in addition to streaming replication to make it
cross the timeline change. Streaming replication will just error out.
Having a WAL archive is usually a good idea in complex replication
scenarios anyway, but it would be good to not require it.

Confirm my understanding of this feature:

This feature is for the case where standby-1, which is going to be promoted to
master, has archive_mode 'on', since only in that case will its timeline change.

If the above is right, then there are other similar scenarios where it can
be used:

Scenario-1 (1 Master, 1 Stand-by)
1. Master (archive_mode=on) goes down.
2. Master again comes up
3. Stand-by tries to follow it

In the above scenario it also gives an error due to a timeline mismatch, but
your patch should fix it.

Some parts of this patch are just refactoring that probably make sense
regardless of the new functionality. For example, I split off the
timeline history file related functions to a new file, timeline.c.
That's not very much code, but it's fairly isolated, and xlog.c is
massive, so I feel that anything that we can move off from xlog.c is a
good thing. I also moved off the two functions RestoreArchivedFile()
and ExecuteRecoveryCommand(), to a separate file. Those are also not
much code, but are fairly isolated. If no-one objects to those changes,
and the general direction this work is going to, I'm going to split off
those refactorings to separate patches and commit them separately.

I also made the timeline history file a bit more detailed: instead of
recording just the WAL segment where the timeline was changed, it now
records the exact XLogRecPtr. That was required for the walsender to
know the switchpoint, without having to parse the XLOG records (it
reads and parses the history file, instead)
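
To illustrate why an exact switchpoint in the history file is enough for the sender, here is a minimal sketch of reading such a file and looking up which timeline a WAL position belongs to. The line layout (parent timeline ID, then a hexadecimal `hi/lo` position, then a free-text reason) is an assumption modelled on what later PostgreSQL releases write; the format used by the patch is not shown in this thread. The names `HistoryEntry`, `parse_history` and `timeline_at` are hypothetical.

```python
from typing import List, NamedTuple


class HistoryEntry(NamedTuple):
    tli: int          # timeline that ended at the switchpoint
    switchpoint: int  # WAL position (64-bit) where the next timeline begins
    reason: str       # free-text note about why the switch happened


def parse_lsn(text: str) -> int:
    """Convert a 'hi/lo' hexadecimal WAL position into one 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_history(content: str) -> List[HistoryEntry]:
    """Parse history file text (assumed layout: '<tli> <hi>/<lo> <reason>')."""
    entries = []
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 2)
        reason = fields[2] if len(fields) > 2 else ""
        entries.append(HistoryEntry(int(fields[0]), parse_lsn(fields[1]), reason))
    return entries


def timeline_at(history: List[HistoryEntry], current_tli: int, lsn: int) -> int:
    """Return the timeline that owns the given WAL position.

    Entries are assumed to be ordered oldest first. A position before an
    entry's switchpoint belongs to that entry's timeline; anything past the
    last switchpoint belongs to the current timeline.
    """
    for entry in history:
        if lsn < entry.switchpoint:
            return entry.tli
    return current_tli
```

With this, a sender that is asked for WAL at some position only needs the parsed list to decide which timeline's segments to read; it never has to scan XLOG records for the switch.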

IMO separating the timeline history file related functions into a new file is
good.
However, I am not sure about splitting RestoreArchivedFile() and
ExecuteRecoveryCommand() into a separate file.
How about splitting off all the archive-related functions:
static void XLogArchiveNotify(const char *xlog);
static void XLogArchiveNotifySeg(XLogSegNo segno);
static bool XLogArchiveCheckDone(const char *xlog);
static bool XLogArchiveIsBusy(const char *xlog);
static void XLogArchiveCleanup(const char *xlog);
..
..

In any case, it would be better if you could split it into multiple patches:
1. The new functionality of "Switching timeline over streaming
replication"
2. The refactoring-related changes.

That would make my testing and review of the new-feature patch a little easier.

With Regards,
Amit Kapila.

amit.kapila16@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#1)

Re: Switching timeline over streaming replication

On Monday, September 24, 2012 9:08 PM md@rpzdesign.com wrote:
What a disaster waiting to happen. Maybe the only replication should be
master-master replication so there is no need to sequence timelines or
anything, all servers are ready masters, no backups or failovers.
If you really do not want a master serving, then it should only be handled in
the routing of traffic to that server and not the replication logic itself.
The only thing that ever came about from failovers was the failure to turn
over. The above is opinion only.

This feature is for users who want to use master-standby configurations.

What do you mean by:
"then it should only be handled in the routing of traffic to that server
and not the replication logic itself."

Do you have an idea other than the proposed implementation, or do you see a
problem with the currently proposed solution?

Import Notes

Reply to msg id not found: 50607E64.70104@rpzdesign.com

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Amit Kapila (#2)

Re: Switching timeline over streaming replication

On 24.09.2012 16:33, Amit Kapila wrote:

On Tuesday, September 11, 2012 10:53 PM Heikki Linnakangas wrote:

I've been working on the often-requested feature to handle timeline
changes over streaming replication. At the moment, if you kill the
master and promote a standby server, and you have another standby
server that you'd like to keep following the new master server, you
need a WAL archive in addition to streaming replication to make it
cross the timeline change. Streaming replication will just error out.
Having a WAL archive is usually a good idea in complex replication
scenarios anyway, but it would be good to not require it.

Confirm my understanding of this feature:

This feature is for case when standby-1 who is going to be promoted to
master has archive mode 'on'.

No. This is for the case where there is no WAL archive.
archive_mode='off' on all servers.

Or to be precise, you can also have a WAL archive, but this patch
doesn't affect that in any way. This is strictly about streaming
replication.

As in that case only its timeline will change.

The timeline changes whenever you promote a standby. It's not related to
whether you have a WAL archive or not.

If above is right, then there can be other similar scenario's where it can
be used:

Scenario-1 (1 Master, 1 Stand-by)
1. Master (archive_mode=on) goes down.
2. Master again comes up
3. Stand-by tries to follow it

Now in above scenario also due to timeline mismatch it gives error, but your
patch should fix it.

If the master simply crashes or is shut down, and then restarted, the
timeline doesn't change. The standby will reconnect / poll the archive,
and sync up just fine, even without this patch.

However I am not sure about splitting for RestoreArchivedFile() and
ExecuteRecoveryCommand() into separate file.
How about splitting for all Archive related functions:
static void XLogArchiveNotify(const char *xlog);
static void XLogArchiveNotifySeg(XLogSegNo segno);
static bool XLogArchiveCheckDone(const char *xlog);
static bool XLogArchiveIsBusy(const char *xlog);
static void XLogArchiveCleanup(const char *xlog);

Hmm, sounds reasonable.

In any case, it will be better if you can split it into multiple patches:
1. Having new functionality of "Switching timeline over streaming
replication"
2. Refactoring related changes.

It can make my testing and review for new feature patch little easier.

Yep, I'll go ahead and split the patch. Thanks!

- Heikki

amit.kapila16@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#4)

Re: Switching timeline over streaming replication

On Tuesday, September 25, 2012 12:39 PM Heikki Linnakangas wrote:

On 24.09.2012 16:33, Amit Kapila wrote:

On Tuesday, September 11, 2012 10:53 PM Heikki Linnakangas wrote:

I've been working on the often-requested feature to handle timeline
changes over streaming replication. At the moment, if you kill the
master and promote a standby server, and you have another standby
server that you'd like to keep following the new master server, you
need a WAL archive in addition to streaming replication to make it
cross the timeline change. Streaming replication will just error out.

Having a WAL archive is usually a good idea in complex replication
scenarios anyway, but it would be good to not require it.

Confirm my understanding of this feature:

This feature is for case when standby-1 who is going to be promoted to
master has archive mode 'on'.

No. This is for the case where there is no WAL archive.
archive_mode='off' on all servers.

Or to be precise, you can also have a WAL archive, but this patch
doesn't affect that in any way. This is strictly about streaming
replication.

As in that case only its timeline will change.

The timeline changes whenever you promote a standby. It's not related to
whether you have a WAL archive or not.

Yes, that is correct. I thought a timeline change happens only when somebody
does PITR.
Can you please tell me why we change the timeline after promotion? The original
timeline concept was for PITR, and I am not able to trace from the code the
reason why it is required on promotion.

If above is right, then there can be other similar scenario's where it can
be used:

Scenario-1 (1 Master, 1 Stand-by)
1. Master (archive_mode=on) goes down.
2. Master again comes up
3. Stand-by tries to follow it

Now in above scenario also due to timeline mismatch it gives error, but your
patch should fix it.

If the master simply crashes or is shut down, and then restarted, the
timeline doesn't change. The standby will reconnect / poll the archive,
and sync up just fine, even without this patch.

What about when the master does PITR when it comes back up?

With Regards,
Amit Kapila.

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Amit Kapila (#5)

Re: Switching timeline over streaming replication

On 25.09.2012 14:10, Amit Kapila wrote:

On Tuesday, September 25, 2012 12:39 PM Heikki Linnakangas wrote:

On 24.09.2012 16:33, Amit Kapila wrote:

On Tuesday, September 11, 2012 10:53 PM Heikki Linnakangas wrote:

I've been working on the often-requested feature to handle timeline
changes over streaming replication. At the moment, if you kill the
master and promote a standby server, and you have another standby
server that you'd like to keep following the new master server, you
need a WAL archive in addition to streaming replication to make it
cross the timeline change. Streaming replication will just error out.

Having a WAL archive is usually a good idea in complex replication
scenarios anyway, but it would be good to not require it.

Confirm my understanding of this feature:

This feature is for case when standby-1 who is going to be promoted to
master has archive mode 'on'.

No. This is for the case where there is no WAL archive.
archive_mode='off' on all servers.

Or to be precise, you can also have a WAL archive, but this patch
doesn't affect that in any way. This is strictly about streaming
replication.

As in that case only its timeline will change.

The timeline changes whenever you promote a standby. It's not related to
whether you have a WAL archive or not.

Yes that is correct. I thought timeline change happens only when somebody
does PITR.
Can you please tell me why we change timeline after promotion, because the
original Timeline concept was for PITR and I am not able to trace from code the
reason why on promotion it is required?

Bumping the timeline helps to avoid confusion if, for example, the
master crashes, and the standby isn't fully in sync with it. In that
situation, there are some WAL records in the master that are not in the
standby, so promoting the standby is effectively the same as doing PITR.
If you promote the standby, and later try to turn the old master into a
standby server that connects to the new master, things will go wrong.
Assigning the new master a new timeline ID helps the system and the
administrator to notice that.

It's not bulletproof, for example you can easily avoid the timeline
change if you just remove recovery.conf and restart the server, but the
timelines help to manage such situations.

If above is right, then there can be other similar scenario's where it can
be used:

Scenario-1 (1 Master, 1 Stand-by)
1. Master (archive_mode=on) goes down.
2. Master again comes up
3. Stand-by tries to follow it

Now in above scenario also due to timeline mismatch it gives error, but your
patch should fix it.

If the master simply crashes or is shut down, and then restarted, the
timeline doesn't change. The standby will reconnect / poll the archive,
and sync up just fine, even without this patch.

How about when Master does PITR when it comes again?

Then the timeline will be bumped and this patch will be helpful.
Assuming the standby is behind the point in time that the master was
recovered to, it will be able to follow the master to the new timeline.
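
The condition described here can be written down as a small check. This is only a sketch under stated assumptions, not the logic in the patch: WAL positions are plain integers, the master's ancestry is a list of (timeline, switchpoint) pairs, and a standby sitting exactly at the switchpoint is treated as able to follow. The function name `can_follow` is hypothetical.

```python
from typing import List, Tuple


def can_follow(standby_tli: int,
               standby_lsn: int,
               master_tli: int,
               master_history: List[Tuple[int, int]]) -> bool:
    """Decide whether a standby can keep following the master.

    master_history lists (timeline, switchpoint) pairs for every timeline
    the master's current timeline descends from, oldest first.
    """
    if standby_tli == master_tli:
        return True  # same timeline, nothing to cross
    for tli, switchpoint in master_history:
        if tli == standby_tli:
            # The standby must not have replayed WAL past the point where
            # the master left this timeline (assumption: "at" is still fine).
            return standby_lsn <= switchpoint
    # The standby is on a timeline the master never passed through.
    return False
```

For example, if the master was recovered to position 0x5000000 on timeline 1 and is now on timeline 2, a standby at 0x4000000 on timeline 1 can follow, while one that already replayed to 0x6000000 on timeline 1 cannot.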

- Heikki

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Heikki Linnakangas (#4)

Re: Switching timeline over streaming replication

On 25.09.2012 10:08, Heikki Linnakangas wrote:

On 24.09.2012 16:33, Amit Kapila wrote:

In any case, it will be better if you can split it into multiple patches:
1. Having new functionality of "Switching timeline over streaming
replication"
2. Refactoring related changes.

It can make my testing and review for new feature patch little easier.

Yep, I'll go ahead and split the patch. Thanks!

Ok, here you go. xlog-c-split-1.patch contains the refactoring of
existing code, with no user-visible changes.
streaming-tli-switch-2.patch applies over xlog-c-split-1.patch, and
contains the new functionality.

- Heikki

amit.kapila16@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#7)

Re: Switching timeline over streaming replication

On Tuesday, September 25, 2012 6:29 PM Heikki Linnakangas wrote:
On 25.09.2012 10:08, Heikki Linnakangas wrote:

On 24.09.2012 16:33, Amit Kapila wrote:

In any case, it will be better if you can split it into multiple patches:

1. Having new functionality of "Switching timeline over streaming
replication"
2. Refactoring related changes.

It can make my testing and review for new feature patch little easier.

Yep, I'll go ahead and split the patch. Thanks!

Ok, here you go. xlog-c-split-1.patch contains the refactoring of
existing code, with no user-visible changes.
streaming-tli-switch-2.patch applies over xlog-c-split-1.patch, and
contains the new functionality.

Thanks, this will make my review easier than before.

With Regards,
Amit Kapila.

md@rpzdesign.com

over 13 years ago

In reply to: Amit Kapila (#3)

Re: Switching timeline over streaming replication

Amit:

At some point, every master-slave replicator gets to the point where they need
to start thinking about master-master replication.

Instead of getting stuck in the weeds to finally realize that master-master is
the ONLY way to go, many developers do not start out planning for
master-master, but they should, out of habit.

You can save yourself a lot of grief just by starting with a master-master
architecture.

But you don't have to USE it; you can just not send WRITE traffic to the
servers that you do not want to WRITE to, but all of them should be WRITE
servers. That way, the only timeline you ever need is your decision to send
WRITE traffic requests to them, but there is nothing that prevents you from
running MASTER-MASTER all the time and skipping the whole slave thing entirely.

At this point, I think synchronous replication is only for immediate local
replication needs, and async is for all the master-master stuff.

cheers,

marco

#10

Daniel Farina

daniel@heroku.com

over 13 years ago

In reply to: md@rpzdesign.com (#9)

Re: Switching timeline over streaming replication

On Tue, Sep 25, 2012 at 11:01 AM, md@rpzdesign.com <md@rpzdesign.com> wrote:

Amit:

At some point, every master - slave replicator gets to the point where they
need
to start thinking about master-master replication.

Even in a master-master system, the ability to cleanly swap leaders
managing a member of the master-master cluster is very useful. This
patch can make writing HA software for Postgres a lot less ridiculous.

Instead of getting stuck in the weeds to finally realize that master-master
is the ONLY way
to go, many developers do not start out planning for master - master, but
they should, out of habit.

You can save yourself a lot of grief just be starting with master-master
architecture.

I've seen more projects get stuck spinning their wheels on the one
Master-Master system to rule them all than succeed and move on. It
doesn't help that master-master does not have a single definition, and
different properties are possible with different logical models, too,
so that pervades its way up to the language layer.

As-is, managing single-master HA Postgres is a huge pain without this
patch. If there is work to be done on master-master, the logical
replication and event trigger work are probably more relevant, and I
know the authors of those projects are keen to make it more feasible
to experiment.

--
fdr

#11

John R Pierce

pierce@hogranch.com

over 13 years ago

In reply to: md@rpzdesign.com (#9)

Re: Switching timeline over streaming replication

On 09/25/12 11:01 AM, md@rpzdesign.com wrote:

At some point, every master - slave replicator gets to the point where
they need
to start thinking about master-master replication.

master-master and transactional integrity are mutually exclusive, except
perhaps in special cases like Oracle RAC, where the masters share a
coherent cache and implement global locks.

--
john r pierce N 37, W 122
santa cruz ca mid-left coast

#12

md@rpzdesign.com

over 13 years ago

In reply to: John R Pierce (#11)

Re: Switching timeline over streaming replication

John:

Who has the money for oracle RAC or funding arrogant bastard Oracle CEO
Ellison to purchase another island?

Postgres needs CHEAP, easy to setup, self healing,
master-master-master-master and it needs it yesterday.

I was able to patch the 9.2.0 code base in 1 day and change my entire
architecture strategy for replication into self healing async
master-master-master and the tiniest bit of sharding code imaginable.

That is why I suggest something to replace OIDs with ROIDs for
replication ID. (CREATE TABLE with ROIDS)
I implement ROIDs as a uniform design pattern for the table structures.

Synchronous replication maybe between 2 local machines if absolutely no local
hardware failure is acceptable, but cheap, scaleable synchronous,
TRANSACTIONAL, master-master-master-master is a real tough slog.

I could implement global locks in the external replication layer if I
choose, but there are much easier ways in routing requests thru the load
balancer and request sharding than trying to manage global locks across the
WAN.

Good luck with your HA patch for Postgres.

Thanks for all of the responses!

You guys are 15 times more active than the MySQL developer group, likely
because they do not have a single db engine that meets all the requirements
like PG.

marco

#13

josh@agliodbs.com

over 13 years ago

In reply to: md@rpzdesign.com (#12)

Re: Switching timeline over streaming replication

I was able to patch the 9.2.0 code base in 1 day and change my entire
architecture strategy for replication
into self healing async master-master-master and the tiniest bit of
sharding code imaginable

Sounds cool. Do you have a fork available on Github? I'll try it out.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#14

josh@agliodbs.com

over 13 years ago

In reply to: Amit Kapila (#5)

Re: Switching timeline over streaming replication

Yes that is correct. I thought timeline change happens only when somebody
does PITR.
Can you please tell me why we change timeline after promotion, because the
original Timeline concept was for PITR and I am not able to trace from code the
reason why on promotion it is required?

The idea behind the timeline switch is to prevent a server from
subscribing to a master which is actually behind it. For example,
consider this sequence:

1. M1->async->S1
2. M1 is at xid 2001 and fails.
3. S1 did not receive transaction 2001 and is at xid 2000
4. S1 is promoted.
5. S1 processed a new, different transaction 2001
6. M1 is repaired and brought back up
7. M1 is subscribed to S1
8. M1 is now corrupt.

That's why we need the timeline switch.
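
The sequence above can be replayed in a toy model. This is a minimal sketch, assuming each server's log is just a list of (timeline, xid, payload) tuples and using the xid as a stand-in for the WAL position, as the steps do; the `Server` class and `run_scenario` function are hypothetical and do not correspond to anything in PostgreSQL.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Server:
    name: str
    timeline: int = 1
    log: List[Tuple[int, int, str]] = field(default_factory=list)

    def commit(self, xid: int, payload: str) -> None:
        """Append a transaction stamped with the server's current timeline."""
        self.log.append((self.timeline, xid, payload))

    def promote(self) -> None:
        """Promotion bumps the timeline ID."""
        self.timeline += 1

    def last_xid(self) -> int:
        return self.log[-1][1]

    def last_position(self) -> Tuple[int, int]:
        """Position that includes the timeline: (timeline, xid)."""
        return self.log[-1][:2]


def run_scenario() -> Tuple[Server, Server]:
    m1, s1 = Server("M1"), Server("S1")
    for xid in range(1998, 2001):
        m1.commit(xid, "shared")
        s1.log.append(m1.log[-1])          # step 1: async replication to S1
    m1.commit(2001, "written on M1 only")  # step 2: M1 fails before shipping 2001
    s1.promote()                           # step 4: S1 becomes the new master
    s1.commit(2001, "written on S1 after promotion")  # step 5
    return m1, s1
```

Comparing `last_xid()` alone, the repaired M1 and the promoted S1 both report 2001 and look in sync, even though their last records differ. Comparing `last_position()` gives (1, 2001) against (2, 2001), so the divergence is visible before M1 is subscribed to S1.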

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#15

md@rpzdesign.com

over 13 years ago

In reply to: Josh Berkus (#13)

Re: Switching timeline over streaming replication

Josh:

The good part is that you are the first person to ask for a copy. I will send
you the hook code that I have, and you can be a good sport and put it on
GitHub. That is great: you can give us both credit for a joint effort. I do
the code, you put it on GitHub.

The not so good part is that the community has a bunch of other trigger work
and other stuff going on, so there was not much interest in non-WAL
replication hook code.

I do not have time to debate implementation nor wait for the release of 9.3
with my needs not met, so I will just keep patching the hook code into
whatever release code base comes along.

The bad news is that I have not implemented the logic of the external
replication daemon.

The other good and bad news is that you are free to receive the messages
from the hook code thru the unix socket and implement replication any way you
want, and the bad news is that you are free to IMPLEMENT replication any way
you want.

I am going to implement master-master-master-master SELF HEALING
replication, but that is just my preference.
It should take about a week to get it operational and another week to see
how it works on my geographically dispersed servers in the cloud.

Send me a note if it is ok to send you a zip file with the source code
files that I touched in the 9.2 code base so you can shove it up on GitHub.

Cheers,

marco

#16

amit.kapila16@gmail.com

over 13 years ago

In reply to: Josh Berkus (#14)

Re: Switching timeline over streaming replication

On Thursday, September 27, 2012 6:30 AM Josh Berkus wrote:

Yes that is correct. I thought timeline change happens only when somebody
does PITR.
Can you please tell me why we change timeline after promotion, because the
original Timeline concept was for PITR and I am not able to trace from code the
reason why on promotion it is required?

The idea behind the timeline switch is to prevent a server from
subscribing to a master which is actually behind it. For example,
consider this sequence:

1. M1->async->S1
2. M1 is at xid 2001 and fails.
3. S1 did not receive transaction 2001 and is at xid 2000
4. S1 is promoted.
5. S1 processed an new, different transaction 2001
6. M1 is repaired and brought back up
7. M1 is subscribed to S1
8. M1 is now corrupt.

That's why we need the timeline switch.

Thanks.
I understood this point, but currently this use case is not documented in the Timelines documentation (Section 24.3.5).

With Regards,
Amit Kapila.

#17

Hannu Krosing

hannu@tm.ee

over 13 years ago

In reply to: md@rpzdesign.com (#12)

Re: Switching timeline over streaming replication

On 09/26/2012 01:02 AM, md@rpzdesign.com wrote:

John:

Who has the money for oracle RAC or funding arrogant bastard Oracle
CEO Ellison to purchase another island?

Postgres needs CHEAP, easy to setup, self healing,
master-master-master-master and it needs it yesterday.

I was able to patch the 9.2.0 code base in 1 day and change my entire
architecture strategy for replication
into self healing async master-master-master and the tiniest bit of
sharding code imaginable

Tell us about the compromises you had to make.

It is an established fact that you can either have it replicate fast and
loose or slow and correct.

In the fast and loose case you have to be ready to do a lot of
mopping-up in case of conflicts.

That is why I suggest something to replace OIDs with ROIDs for
replication ID. (CREATE TABLE with ROIDS)
I implement ROIDs as a uniform design pattern for the table structures.

Synchronous replication maybe between 2 local machines if absolutely
no local
hardware failure is acceptable, but cheap, scaleable synchronous,

Scaleable / synchronous is probably doable, if we are ready to take the
initial performance hit of lock propagation.

TRANSACTIONAL, master-master-master-master is a real tough slog.

#18

Euler Taveira de Oliveira

josh@agliodbs.com

over 13 years ago

In reply to: md@rpzdesign.com (#15)

Re: MD's replication WAS Switching timeline over streaming replication

On 9/26/12 6:17 PM, md@rpzdesign.com wrote:

Josh:

The good part is you are the first person to ask for a copy
and I will send you the hook code that I have and you can be a good sport
and put it on GitHub, that is great, you can give us both credit for a
joint effort, I do the code,
you put it GitHub.

Well, I think it just makes sense for you to put it up somewhere public
so that folks can review it; if not Github, then somewhere else. If
it's useful and well-written, folks will be interested.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#19

euler@timbira.com

over 13 years ago

In reply to: Amit Kapila (#16)

Re: Switching timeline over streaming replication

On 27-09-2012 01:30, Amit Kapila wrote:

I understood this point, but currently in documentation of Timelines, this usecase is not documented (Section 24.3.5).

The timeline documentation was written during the PITR implementation, when SR
did not exist yet. AFAICS it doesn't cite SR but is sufficiently generic (it uses
the term 'WAL records' to explain the feature). Feel free to reword those
paragraphs to mention SR.

--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

#20

amit.kapila16@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#7)

Re: Switching timeline over streaming replication

On Tuesday, September 25, 2012 6:29 PM Heikki Linnakangas wrote:
On 25.09.2012 10:08, Heikki Linnakangas wrote:

On 24.09.2012 16:33, Amit Kapila wrote:

In any case, it will be better if you can split it into multiple patches:

1. Having new functionality of "Switching timeline over streaming replication"
2. Refactoring related changes.

Ok, here you go. xlog-c-split-1.patch contains the refactoring of existing
code, with no user-visible changes.

streaming-tli-switch-2.patch applies over xlog-c-split-1.patch, and
contains the new functionality.

Please find the initial review of the patch below. More review is still pending,
but I thought I should post what is done so far.

Basic stuff:
----------------------
- Patch applies OK
- Compiles cleanly with no warnings
- Regression tests pass.
- Documentation changes are mostly fine.
- Basic replication tests work.

Testing
---------
Start primary server
Start standby server
Start cascade standby server

Stop the primary server

Promote the standby server with ./pg_ctl -D data_repl promote

In postgresql.conf file
archive_mode = off

The following logs were observed on the cascade standby server.

LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1
LOG: walreceiver ended streaming and awaits new instructions
LOG: record with zero length at 0/17E3888
LOG: re-handshaking at position 0/1000000 on tli 1
LOG: fetching timeline history file for timeline 2 from primary server
LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1
LOG: walreceiver ended streaming and awaits new instructions
LOG: re-handshaking at position 0/1000000 on tli 1
LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1

In postgresql.conf file
archive_mode = on

The following logs were observed on the cascade standby server.

LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1
LOG: walreceiver ended streaming and awaits new instructions
sh: /home/amit/installation/bin/data_sub/pg_xlog/archive_status/000000010000000000000002: No such file or directory
LOG: record with zero length at 0/20144B8
sh: /home/amit/installation/bin/data_sub/pg_xlog/archive_status/000000010000000000000002: No such file or directory
LOG: re-handshaking at position 0/2000000 on tli 1
LOG: fetching timeline history file for timeline 2 from primary server
LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1
LOG: walreceiver ended streaming and awaits new instructions
sh: /home/amit/installation/bin/data_sub/pg_xlog/archive_status/000000010000000000000002: No such file or directory
sh: /home/amit/installation/bin/data_sub/pg_xlog/archive_status/000000010000000000000002: No such file or directory
LOG: re-handshaking at position 0/2000000 on tli 1
LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1
LOG: walreceiver ended streaming and awaits new instructions

Verified that files are present in respective directories.
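
A note on the restart positions in both logs: the zero-length record at 0/17E3888 is followed by a re-handshake at 0/1000000, and the one at 0/20144B8 by a re-handshake at 0/2000000. That is consistent with the position being rounded down to the start of its WAL segment, assuming the default 16 MB segment size. A minimal sketch of that arithmetic follows; the function name and the flat 64-bit position are illustrative assumptions, not code from the patch.

```c
#include <stdint.h>

/* Assumed default WAL segment size: 16 MB. */
#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/*
 * Hypothetical helper: round a WAL position down to the start of the
 * segment that contains it.
 */
static uint64_t
segment_start(uint64_t pos)
{
    return pos - (pos % WAL_SEG_SIZE);
}
```

With these assumptions, segment_start(0x17E3888) gives 0x1000000 and segment_start(0x20144B8) gives 0x2000000, matching the positions in the logs.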

Code Review
----------------
1. In function readTimeLineHistory(),
   two mechanisms are used to fetch the timeline from the history file:
   +                sscanf(fline, "%u\t%X/%X", &tli, &switchpoint_hi, &switchpoint_lo);
+
+                /* expect a numeric timeline ID as first field of line */
+                tli = (TimeLineID) strtoul(ptr, &endptr, 0);
   If we use the new mechanism, it will not be able to detect errors the way
the current code does.
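
To illustrate the concern in item 1: an sscanf-based parser can still report malformed lines if the number of converted fields is checked. The sketch below assumes the sscanf call is the new mechanism and that a history line has the form "<tli>\t<hi>/<lo>" as in the format string quoted above. The helper name and return convention are hypothetical, not the patch's code.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse one history line of the form
 * "<tli>\t<hi>/<lo>[\t<reason>]". Returns 1 on success and 0 when the
 * line does not contain all three numeric fields.
 */
static int
parse_history_line(const char *line, uint32_t *tli,
                   uint32_t *switch_hi, uint32_t *switch_lo)
{
    unsigned int t, hi, lo;
    int          nfields;

    /* sscanf returns the number of fields it converted successfully */
    nfields = sscanf(line, "%u\t%X/%X", &t, &hi, &lo);
    if (nfields != 3)
        return 0;               /* syntax error: caller should report it */

    *tli = t;
    *switch_hi = hi;
    *switch_lo = lo;
    return 1;
}
```

A caller would turn a 0 return into the same kind of "syntax error in history file" report that the strtoul/endptr check produces today.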

2. In function readTimeLineHistory(),
+        fd = AllocateFile(path, "r");
+        if (fd == NULL)
+        {
+                if (errno != ENOENT)
+                        ereport(FATAL,
+                                        (errcode_for_file_access(),
+                                         errmsg("could not open file \"%s\": %m", path)));
+                /* Not there, so assume no parents */
+                return list_make1_int((int) targetTLI);
+        }
   return list_make1_int((int) targetTLI); is still used here.

3. Function timelineOfPointInHistory() should return the timeline of the recptr
passed to it.
a. Is it okay to decide which timeline a record belongs to based only on the
xlog record pointer, given that different timelines can have the same xlog
record pointer?
b. From the logic, it seems it will return the timeline previous to the
timeline of the recptr passed.
For example, if timeline 3's switchpoint is equal to the recptr passed,
it will return timeline 2.
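
The boundary question in 3b can be made concrete with a small sketch. It assumes each history entry covers a half-open range [begin, end), so a position equal to a switchpoint belongs to the newer timeline. This is one possible convention for illustration, not necessarily what the patch does; the struct, field names and function name are hypothetical. On 3a, the lookup only makes sense relative to one particular history, which is why the history is passed in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical history entry: one timeline and the WAL range it owns. */
typedef struct
{
    uint32_t tli;    /* timeline ID */
    uint64_t begin;  /* first position on this timeline, inclusive */
    uint64_t end;    /* switchpoint to the next timeline, exclusive;
                      * 0 means the timeline is still open-ended */
} HistoryEntry;

/*
 * Return the timeline that owns 'ptr' within the given history, or 0 if
 * no entry covers it.
 */
static uint32_t
timeline_of_point(const HistoryEntry *history, size_t n, uint64_t ptr)
{
    size_t i;

    for (i = 0; i < n; i++)
    {
        if (history[i].begin <= ptr &&
            (history[i].end == 0 || ptr < history[i].end))
            return history[i].tli;
    }
    return 0;
}
```

Under this convention, a recptr equal to timeline 3's switchpoint maps to timeline 3, and the position just before it maps to timeline 2.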

4. In function writeTimeLineHistory(), the variable endTLI is never used.

5. The header comment of writeTimeLineHistory() could explain the
XLogRecPtr switchpoint argument.

6. @@ -6869,11 +5947,35 @@ StartupXLOG(void)
          */
         if (InArchiveRecovery)
         {
+                char        reason[200];
+
+                /*
+                 * Write comment to history file to explain why and where timeline
+                 * changed. Comment varies according to the recovery target used.
+                 */
+                if (recoveryTarget == RECOVERY_TARGET_XID)
+                        snprintf(reason, sizeof(reason),
+                                         "%s transaction %u",
+                                         recoveryStopAfter ? "after" : "before",
+                                         recoveryStopXid);

The comment above this line says it explains why and where the timeline changed.
However, the reason field only specifies the "where" part.

7. + * Returns the redo pointer of the "previous" checkpoint. 
+GetOldestRestartPoint(XLogRecPtr *oldrecptr, TimeLineID *oldtli) 
+{ 
+        if (InRedo) 
+        { 
+                LWLockAcquire(ControlFileLock, LW_SHARED); 
+                *oldrecptr = ControlFile->checkPointCopy.redo; 
+                *oldtli = ControlFile->checkPointCopy.ThisTimeLineID; 
+                LWLockRelease(ControlFileLock); 
+        }

a. Is it required to take ControlFileLock in this function? Earlier there was
no lock protecting this read when it was called from RestoreArchivedFile, and I
think no one else can modify these values at this point.
However, taking it may make sense for code consistency, i.e. always reading
control file values under a shared lock.

b. As per your comment it should return the "previous" checkpoint; however,
the function returns the values of the latest checkpoint.

8. In function writeTimeLineHistoryFile(), would it not be better to write
directly rather than doing pg_fsync() later, since it is just a one-time write?

9. +XLogRecPtr
+timeLineSwitchPoint(XLogRecPtr startpoint, TimeLineID tli)
..
..
+                         * starting point. This is because the client can legimately
The spelling of "legitimately" needs to be corrected.

10. +XLogRecPtr
+timeLineSwitchPoint(XLogRecPtr startpoint, TimeLineID tli)
..
..
+ if (tli < ThisTimeLineID)
+        {
+                if (!nexttle)
+                        elog(ERROR, "could not find history entry for child of timeline %u", tli); /* shouldn't happen */
+        }

I don't understand the meaning of the above check. I think this situation can
occur when the function is called from StartReplication, because the tli sent
by the standby to the new master will always be less than ThisTimeLineID, and
it can be the first in the list.

Documentation
---------------
1. In the explanation of TIMELINE_HISTORY:
Filename of the timeline history file. This is always of the form
[insert correct example here].
The placeholder needs to be replaced with an actual example.
2. In the protocol.sgml change, I feel it would be better to explain when the
COPYDONE message will be initiated.
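
For reference on Documentation item 1: history files in pg_xlog are named with the timeline ID as eight uppercase hex digits followed by ".history", so timeline 2 gives 00000002.history. Whether the TIMELINE_HISTORY reply reports exactly this name is an assumption here and should be checked against the patch. A minimal sketch of the naming, with a hypothetical helper name:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the history file name for a timeline ID,
 * e.g. timeline 2 -> "00000002.history".
 */
static void
history_file_name(char *buf, size_t buflen, uint32_t tli)
{
    snprintf(buf, buflen, "%08X.history", (unsigned int) tli);
}
```

Calling history_file_name(buf, sizeof(buf), 2) with a buffer of at least 17 bytes fills it with "00000002.history", which could serve as the example in the docs.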

With Regards,
Amit Kapila.

#21

amit.kapila16@gmail.com

over 13 years ago

In reply to: Amit Kapila (#20)

#22

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Amit Kapila (#21)

#23

amit.kapila16@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#22)

#24

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Amit Kapila (#23)

#25

amit.kapila16@gmail.com

over 13 years ago

In reply to: Amit Kapila (#23)

#26

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Amit Kapila (#23)

#27

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Heikki Linnakangas (#26)

#28

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Tom Lane (#27)

#29

masao.fujii@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#24)

#30

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Heikki Linnakangas (#28)

#31

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Simon Riggs (#30)

#32

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Tom Lane (#31)

#33

amit.kapila16@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#26)

#34

amit.kapila16@gmail.com

over 13 years ago

In reply to: Amit Kapila (#25)

#35

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Fujii Masao (#29)

#36

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Fujii Masao (#29)

#37

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Amit Kapila (#34)

#38

amit.kapila16@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#37)

#39

thom@linux.com

over 13 years ago

In reply to: Heikki Linnakangas (#1)

#40

amit.kapila16@gmail.com

over 13 years ago

In reply to: Amit Kapila (#38)

#41

Alvaro Herrera

alvherre@2ndquadrant.com

over 13 years ago

In reply to: Heikki Linnakangas (#37)

#42

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Amit Kapila (#40)

#43

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Amit Kapila (#38)

#44

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Heikki Linnakangas (#42)

#45

amit.kapila16@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#44)

#46

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Heikki Linnakangas (#43)

#47

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Tom Lane (#46)

#48

amit.kapila16@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#44)

#49

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Amit Kapila (#48)

#50

amit.kapila16@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#49)

#51

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Amit Kapila (#50)

#52

amit.kapila16@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#51)

#53

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Heikki Linnakangas (#47)

#54

amit.kapila16@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#53)

#55

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Amit Kapila (#54)

#56

amit.kapila16@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#55)

#57

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Amit Kapila (#56)

#58

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Amit Kapila (#54)

#59

senthilnathan

senthilnathan.t@gmail.com

over 13 years ago

In reply to: Amit Kapila (#20)

#60

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: senthilnathan (#59)

#61

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Amit Kapila (#50)

#62

amit.kapila16@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#61)

#63

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Amit Kapila (#62)

#64

amit.kapila16@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#63)

#65

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Amit Kapila (#64)

#66

amit.kapila16@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#65)

#67

josh@agliodbs.com

over 13 years ago

In reply to: Heikki Linnakangas (#65)

#68

masao.fujii@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#65)

#69

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Josh Berkus (#67)

#70

thom@linux.com

over 13 years ago

In reply to: Heikki Linnakangas (#69)

#71

josh@agliodbs.com

over 13 years ago

In reply to: Thom Brown (#70)

#72

josh@agliodbs.com

over 13 years ago

In reply to: Josh Berkus (#71)

#73

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Josh Berkus (#72)

#74

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Heikki Linnakangas (#73)

#75

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Heikki Linnakangas (#74)

#76

josh@agliodbs.com

over 13 years ago

In reply to: Heikki Linnakangas (#74)

#77

josh@agliodbs.com

over 13 years ago

In reply to: Heikki Linnakangas (#75)

#78

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Thom Brown (#70)

#79

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Heikki Linnakangas (#78)

#80

masao.fujii@gmail.com

over 13 years ago

In reply to: Fujii Masao (#68)

#81

josh@agliodbs.com

over 13 years ago

In reply to: Heikki Linnakangas (#78)

#82

thom@linux.com

over 13 years ago

In reply to: Heikki Linnakangas (#78)

#83

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Thom Brown (#82)

#84

thom@linux.com

over 13 years ago

In reply to: Heikki Linnakangas (#83)

#85

masao.fujii@gmail.com

over 13 years ago

In reply to: Fujii Masao (#80)

#86