clarification about standby promotion

Started by Benoit Lobréauover 9 years ago9 messagesgeneral

benoit.lobreau@gmail.com

over 9 years ago

Hi,

I would like to clarify something about standby promotion. From the
sentence below. I understand that, during the promotion process, postgres
will replay all the available wals (from the archive or pg_xlog).

https://www.postgresql.org/docs/9.5/static/warm-standby.html#STREAMING-REPLICATION

25.2.2. Standby Server Operation
...
Standby mode is exited and the server switches to normal operation when
pg_ctl promote is run or a trigger file is found (trigger_file). Before
failover, any WAL immediately available in the archive or in pg_xlog will
be restored, but no attempt is made to connect to the master.

I have seen several articles like this one (
https://www.enterprisedb.com/switchoverswitchback-postgresql-93) where they
say that pg_last_xlog_receive_location() and pg_last_xlog_replay_location()
should be checked before promotion. I don't understand why since they will
be replayed anyway. Did something changed since 9.3 about this ?

Thanks for you help. please excuse my poor english.

Benoit

Venkata B Nagothi

nag1010@gmail.com

over 9 years ago

In reply to: Benoit Lobréau (#1)

Re: clarification about standby promotion

On Thu, Feb 9, 2017 at 4:53 AM, Benoit Lobréau <benoit.lobreau@gmail.com>
wrote:

Hi,

I would like to clarify something about standby promotion. From the
sentence below. I understand that, during the promotion process, postgres
will replay all the available wals (from the archive or pg_xlog).

Yes, that is correct.

https://www.postgresql.org/docs/9.5/static/warm-standby.
html#STREAMING-REPLICATION

25.2.2. Standby Server Operation
...
Standby mode is exited and the server switches to normal operation when
pg_ctl promote is run or a trigger file is found (trigger_file). Before
failover, any WAL immediately available in the archive or in pg_xlog will
be restored, but no attempt is made to connect to the master.

I have seen several articles like this one (https://www.enterprisedb.com/
switchoverswitchback-postgresql-93) where they say that
pg_last_xlog_receive_location() and pg_last_xlog_replay_location() should
be checked before promotion. I don't understand why since they will be
replayed anyway. Did something changed since 9.3 about this ?

The above link is about improvements related to switch-over/switch-back
process from the version 9.3. What you are asking is about standby
promotion process. When the standby is promoted, as mentioned in the docs,
the standby server attempts to apply the available WAL during the promotion
process and will not attempt to connect to master. Which means, you would
not know if there are any pending WALs yet to be streamed from master or in
other words, standby may not know if the master is a-head. It is important
that you know the standby current position by executing the above mentioned
*xlog* functions. Hope i answered your question.

Regards,

Venkata B N
Database Consultant

Benoit Lobréau

benoit.lobreau@gmail.com

over 9 years ago

In reply to: Venkata B Nagothi (#2)

Re: clarification about standby promotion

hi,

Tank you for the confirmation !

For the second part, I understand your explanation but I fail to see how
checking what we have replayed against what we have received will confirm
we have received everything (unless we are in sync replication).

Have a good day !

Benoit.

Jehan-Guillaume (ioguix) de Rorthais

ioguix@free.fr

over 9 years ago

In reply to: Venkata B Nagothi (#2)

Re: clarification about standby promotion

On Thu, 9 Feb 2017 10:41:15 +1100
Venkata B Nagothi <nag1010@gmail.com> wrote:

On Thu, Feb 9, 2017 at 4:53 AM, Benoit Lobréau <benoit.lobreau@gmail.com>
wrote:

Hi,

I would like to clarify something about standby promotion. From the
sentence below. I understand that, during the promotion process, postgres
will replay all the available wals (from the archive or pg_xlog).

Yes, that is correct.

https://www.postgresql.org/docs/9.5/static/warm-standby.
html#STREAMING-REPLICATION

25.2.2. Standby Server Operation
...
Standby mode is exited and the server switches to normal operation when
pg_ctl promote is run or a trigger file is found (trigger_file). Before
failover, any WAL immediately available in the archive or in pg_xlog will
be restored, but no attempt is made to connect to the master.

I have seen several articles like this one (https://www.enterprisedb.com/
switchoverswitchback-postgresql-93) where they say that
pg_last_xlog_receive_location() and pg_last_xlog_replay_location() should
be checked before promotion. I don't understand why since they will be
replayed anyway. Did something changed since 9.3 about this ?

The above link is about improvements related to switch-over/switch-back
process from the version 9.3. What you are asking is about standby
promotion process. When the standby is promoted, as mentioned in the docs,
the standby server attempts to apply the available WAL during the promotion
process and will not attempt to connect to master.

Sure, but when you are doing a switchover, the standby is supposed to be
connected to the master when you shutdown the master. So based on the doc,
the standby should receive **everything** from the master before the master
actually shutdown.

Which means, you would not know if there are any pending WALs yet to be
streamed from master or in other words, standby may not know if the master is
a-head. It is important that you know the standby current position by
executing the above mentioned *xlog* functions.

Sure, but in the link pointed by Benoit, the check is only comparing what the
**standby** received with what the standby **replayed**. It seems there's no
point to do such check.

What you are describing is to check the very last LSN from the master (its
shutdown checkpoint) with what the slave received. The only way to check this
is to compare LSN from the shut down master to the LSN the slave received.

in PAF[1]https://github.com/dalibo/PAF, this check is:

* shut down the master
* use pg_controldata to find its shutdown checkpoint
* force a checkpoint on the standby (which in theory received everything
from the master by streaming)
* using pg_dumpxlog, check you received the shutdown checkpoint

[1]: https://github.com/dalibo/PAF

--
Jehan-Guillaume de Rorthais
Dalibo

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Rakesh Kumar

rakeshkumar464@outlook.com

over 9 years ago

In reply to: Jehan-Guillaume (ioguix) de Rorthais (#4)

Re: clarification about standby promotion

Sure, but when you are doing a switchover, the standby is supposed to be
connected to the master when you shutdown the master. So based on the doc,
the standby should receive **everything** from the master before the master
actually shutdown.

We use 9.5 and even in that version there is no handshake during role reversal.
In fact PG does not have concept of handshake and role reversal unlike in Db2, oracle
and sqlserver you can switchover from one to other by a single command.

Our DBAs use home grown script for switchover which does the following:

1 - first kill postmaster in the outgoing primary.
2 - promote the standby as the new primary
3 - use timeline to resync former primary (of step 1) with the new primary (step 2).
4- open up connection to the new standby (former primary)

I hope a more elegant way exists as in other RDBMS.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Venkata B Nagothi

nag1010@gmail.com

over 9 years ago

In reply to: Jehan-Guillaume (ioguix) de Rorthais (#4)

Re: clarification about standby promotion

On Fri, Feb 10, 2017 at 2:42 AM, Jehan-Guillaume de Rorthais <ioguix@free.fr

wrote:

On Thu, 9 Feb 2017 10:41:15 +1100
Venkata B Nagothi <nag1010@gmail.com> wrote:

On Thu, Feb 9, 2017 at 4:53 AM, Benoit Lobréau <benoit.lobreau@gmail.com

wrote:

Hi,

I would like to clarify something about standby promotion. From the
sentence below. I understand that, during the promotion process,

postgres

will replay all the available wals (from the archive or pg_xlog).

Yes, that is correct.

https://www.postgresql.org/docs/9.5/static/warm-standby.
html#STREAMING-REPLICATION

25.2.2. Standby Server Operation
...
Standby mode is exited and the server switches to normal operation when
pg_ctl promote is run or a trigger file is found (trigger_file). Before
failover, any WAL immediately available in the archive or in pg_xlog

will

be restored, but no attempt is made to connect to the master.

I have seen several articles like this one (

https://www.enterprisedb.com/

switchoverswitchback-postgresql-93) where they say that
pg_last_xlog_receive_location() and pg_last_xlog_replay_location()

should

be checked before promotion. I don't understand why since they will be
replayed anyway. Did something changed since 9.3 about this ?

The above link is about improvements related to switch-over/switch-back
process from the version 9.3. What you are asking is about standby
promotion process. When the standby is promoted, as mentioned in the

docs,

the standby server attempts to apply the available WAL during the

promotion

process and will not attempt to connect to master.

Sure, but when you are doing a switchover, the standby is supposed to be
connected to the master when you shutdown the master. So based on the doc,
the standby should receive **everything** from the master before the master
actually shutdown.

Yes, Standby would receive everything from the master before master shuts
down.
To perform switch-over / switch-back, It is important that, the standby
receives something called last known position from the master automatically
even after the master shuts down, which does not automatically happen in
9.2 and this happens automatically from 9.3. This improvement is only
applicable in the case when you want to swap roles of master and standby.

Which means, you would not know if there are any pending WALs yet to be

streamed from master or in other words, standby may not know if the

master is

a-head. It is important that you know the standby current position by
executing the above mentioned *xlog* functions.

Sure, but in the link pointed by Benoit, the check is only comparing what
the
**standby** received with what the standby **replayed**. It seems there's
no
point to do such check.

What you are describing is to check the very last LSN from the master (its
shutdown checkpoint) with what the slave received. The only way to check
this
is to compare LSN from the shut down master to the LSN the slave received.

I think, i need to be more clear here. What i meant was - If you wish to
intentionally promote standby, then, yes, before just shutting down the
master check the last *xlog* or checkpoint position and in real-time, this
must be done after the application is shutdown completely and the database
is not encountering any data changes. In application terminology, the data
must be in complete sync between master and standby. But, yes, i agree that
the only way to check the master-standby position sync after the master
shuts down is by comparing pg_controldata output of master and standby.

Regards,

Venkata B N
Database Consultant

Jehan-Guillaume (ioguix) de Rorthais

ioguix@free.fr

over 9 years ago

In reply to: Rakesh Kumar (#5)

Re: clarification about standby promotion

On Thu, 9 Feb 2017 18:27:30 +0000
Rakesh Kumar <rakeshkumar464@outlook.com> wrote:

Sure, but when you are doing a switchover, the standby is supposed to be
connected to the master when you shutdown the master. So based on the doc,
the standby should receive **everything** from the master before the master
actually shutdown.

We use 9.5 and even in that version there is no handshake during role
reversal. In fact PG does not have concept of handshake and role reversal
unlike in Db2, oracle and sqlserver you can switchover from one to other by a
single command.

Our DBAs use home grown script for switchover which does the following:

1 - first kill postmaster in the outgoing primary.

Kill ? You mean "pg_ctl stop -m fast" right ?

2 - promote the standby as the new primary
3 - use timeline to resync former primary (of step 1) with the new primary
(step 2).

Use timeline to resync ? Timeline is an internal mechanism in PostgreSQL, not
a tool, so I don't get this step...You mean using pg_rewind ?

So far, I stick to my procedure (given in another answer) which looks a lot
more safer.

I hope a more elegant way exists as in other RDBMS.

Me too. But it require a lot of work as a master is not able to "demote" as a
standby without a restart. As far as I know, the standby code path is only
accessible during startup.

Note that you could switchover in one command as well using external tools like
PAF [1]https://github.com/dalibo/PAF[2]http://www.dalibo.org/_media/2016-pgconfeu-paf.html.gz. But PAF comes with a lot more features than just switchover and
rely on Pacemaker...

[1]: https://github.com/dalibo/PAF
[2]: http://www.dalibo.org/_media/2016-pgconfeu-paf.html.gz

Cheers,

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Rakesh Kumar

rakeshkumar464@outlook.com

over 9 years ago

In reply to: Jehan-Guillaume (ioguix) de Rorthais (#7)

Re: clarification about standby promotion

Kill ? You mean "pg_ctl stop -m fast" right ?

Yes.

Use timeline to resync ? Timeline is an internal mechanism in PostgreSQL, not
a tool, so I don't get this step...You mean using pg_rewind ?

pg_rewind which uses timeline.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Michael Paquier

michael@paquier.xyz

over 9 years ago

In reply to: Jehan-Guillaume (ioguix) de Rorthais (#7)

Re: clarification about standby promotion

On Fri, Feb 10, 2017 at 7:15 PM, Jehan-Guillaume de Rorthais
<ioguix@free.fr> wrote:

On Thu, 9 Feb 2017 18:27:30 +0000
Rakesh Kumar <rakeshkumar464@outlook.com> wrote:

Sure, but when you are doing a switchover, the standby is supposed to be
connected to the master when you shutdown the master. So based on the doc,
the standby should receive **everything** from the master before the master
actually shutdown.

We use 9.5 and even in that version there is no handshake during role
reversal. In fact PG does not have concept of handshake and role reversal
unlike in Db2, oracle and sqlserver you can switchover from one to other by a
single command.

Our DBAs use home grown script for switchover which does the following:

1 - first kill postmaster in the outgoing primary.

Kill ? You mean "pg_ctl stop -m fast" right ?

If at shutdown the previous primary does not issue a checkpoint, there
is no way to be sure that it will begin replaying WAL from the point
after WAL has forked. In short, if you kill it, then try to connect it
back to the new promoted primary, it may be able to begin replicating
changes. And if at the moment it was killed an unfinished checkpoint
was running, you will much likely corrupt a couple of pages on your
primary.

2 - promote the standby as the new primary
3 - use timeline to resync former primary (of step 1) with the new primary
(step 2).

Use timeline to resync ? Timeline is an internal mechanism in PostgreSQL, not
a tool, so I don't get this step...You mean using pg_rewind ?

So far, I stick to my procedure (given in another answer) which looks a lot
more safer.

Definitely yes, Guillaune is right here. You need to rewind things.
After being sure that the previous primary has been stopped cleanly.
You could as well kill it once if you want to make your server go down
as fast as possible once. But at next startup let it recover
completely, then stop it cleanly, and finally you will be able to
recycle it consistently with pg_rewind.
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general