patch proposal

Started by Venkata B Nagothi over 9 years ago · 40 messages
#1 Venkata B Nagothi
nag1010@gmail.com

Hi,

During the recovery process, it would be nice if PostgreSQL generated an
error and aborted the recovery process (instead of starting up the cluster)
when the intended recovery target point is not reached, and gave the DBA an
option to resume the recovery process from exactly where it stopped.

The issue here is that if the required WALs are not available, or if any
WAL is missing or corrupted at the restore_command location, PostgreSQL
recovers until the end of the last available WAL and starts up the
cluster. If, at a later point in time, the missing WAL is found, it is
not possible to resume the recovery process from where it previously
stopped. The whole backup restoration + recovery process must be
initiated from the beginning, which can be time consuming, especially
when recovering huge clusters (gigabytes to terabytes in size) requiring
thousands of WALs to be copied over for recovery.

Any thoughts or comments?

I can work on this patch based on the comments/feedback.

Regards,
Venkata B N

Fujitsu Australia

#2 David Steele
david@pgmasters.net
In reply to: Venkata B Nagothi (#1)
Re: patch proposal

On 8/15/16 2:33 AM, Venkata B Nagothi wrote:

During the recovery process, it would be nice if PostgreSQL generated an
error and aborted the recovery process (instead of starting up the cluster)
when the intended recovery target point is not reached, and gave the DBA an
option to resume the recovery process from exactly where it stopped.

Thom wrote a patch [1] recently that gives warnings in this case. You
might want to have a look at that first.

The issue here is that if the required WALs are not available, or if any
WAL is missing or corrupted at the restore_command location, PostgreSQL
recovers until the end of the last available WAL and starts up the
cluster.

You can use pause_at_recovery_target/recovery_target_action (depending
on your version) to prevent promotion. That would work for your stated
scenario but not for the scenario where replay starts (or the database
reaches consistency) after the recovery target.
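
For example, on 9.5 and later, something like the following in recovery.conf
keeps the server from promoting once the target is reached (a minimal
sketch; the xid is a placeholder, and on 9.1-9.4 the equivalent setting is
pause_at_recovery_target = 'true'):

recovery_target_xid = '12345'
recovery_target_action = 'pause'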

[1]: /messages/by-id/CAA-aLv4K2-9a+cvK75dkZkYD1etxpaH+9HC0vm9Ebw2up9Co2A@mail.gmail.com

--
-David
david@pgmasters.net


#3 Venkata B Nagothi
nag1010@gmail.com
In reply to: David Steele (#2)
Re: patch proposal

On Tue, Aug 16, 2016 at 2:50 AM, David Steele <david@pgmasters.net> wrote:

On 8/15/16 2:33 AM, Venkata B Nagothi wrote:

During the recovery process, it would be nice if PostgreSQL generated an
error and aborted the recovery process (instead of starting up the cluster)
when the intended recovery target point is not reached, and gave the DBA an
option to resume the recovery process from exactly where it stopped.

Thom wrote a patch [1] recently that gives warnings in this case. You
might want to have a look at that first.

That is good to know. Yes, that patch is about generating more meaningful
output messages for the recovery process, which makes sense.

The issue here is that if the required WALs are not available, or if any
WAL is missing or corrupted at the restore_command location, PostgreSQL
recovers until the end of the last available WAL and starts up the
cluster.

You can use pause_at_recovery_target/recovery_target_action (depending
on your version) to prevent promotion. That would work for your stated
scenario but not for the scenario where replay starts (or the database
reaches consistency) after the recovery target.

The above parameters can be configured to pause, shut down or prevent
promotion only after reaching the recovery target point.
To clarify, I am referring to a scenario where the recovery target point is
not reached at all (I mean, a half-complete or incomplete recovery) and
there are lots of WALs still pending to be replayed - in this situation,
PostgreSQL just completes the archive recovery until the end of the last
available WAL (WAL file "00000001000000000000001E" in my case) and
starts up the cluster, generating an error message (saying
"00000001000000000000001F" not found).

Note: I am testing on PostgreSQL 9.5

LOG: restored log file "00000001000000000000001E" from archive
cp: cannot stat ‘/data/pgrestore9531/00000001000000000000001F’: No such
file or directory
LOG: redo done at 0/1EFFDBB8
LOG: last completed transaction was at log time 2016-08-15
11:04:26.795902+10

I have used the following recovery* parameters in the recovery.conf file
here and have intentionally not supplied all the WAL archives needed for
the recovery process to reach the target xid.

recovery_target_xid = xxxx
recovery_target_inclusive = true
recovery_target_action = pause
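
The restore_command in use here is presumably something along these lines,
given the cp error above (the path is taken from that error, so treat this
as an assumption):

restore_command = 'cp /data/pgrestore9531/%f %p'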

It would be nice if PostgreSQL paused the recovery in case it is not
complete (because of missing or corrupt WAL), shut down the cluster, and
allowed the DBA to restart the replay of the remaining WAL archive files
to continue recovery (from where it stopped previously) until the recovery
target point is reached.

Regards,
Venkata B N

Fujitsu Australia

#4 David Steele
david@pgmasters.net
In reply to: Venkata B Nagothi (#3)
Re: patch proposal

On 8/16/16 1:08 AM, Venkata B Nagothi wrote:

The issue here is that if the required WALs are not available, or if any
WAL is missing or corrupted at the restore_command location, PostgreSQL
recovers until the end of the last available WAL and starts up the
cluster.

You can use pause_at_recovery_target/recovery_target_action (depending
on your version) to prevent promotion. That would work for your stated
scenario but not for the scenario where replay starts (or the database
reaches consistency) after the recovery target.

The above parameters can be configured to pause, shut down or prevent
promotion only after reaching the recovery target point.
To clarify, I am referring to a scenario where the recovery target point is
not reached at all (I mean, a half-complete or incomplete recovery) and
there are lots of WALs still pending to be replayed - in this situation,
PostgreSQL just completes the archive recovery until the end of the last
available WAL (WAL file "00000001000000000000001E" in my case) and
starts up the cluster, generating an error message (saying
"00000001000000000000001F" not found).

Note: I am testing on PostgreSQL 9.5

LOG: restored log file "00000001000000000000001E" from archive
cp: cannot stat ‘/data/pgrestore9531/00000001000000000000001F’: No
such file or directory
LOG: redo done at 0/1EFFDBB8
LOG: last completed transaction was at log time 2016-08-15
11:04:26.795902+10

I have used the following recovery* parameters in the recovery.conf file
here and have intentionally not supplied all the WAL archives needed for
the recovery process to reach the target xid.

recovery_target_xid = xxxx
recovery_target_inclusive = true
recovery_target_action = pause

It would be nice if PostgreSQL paused the recovery in case it is not
complete (because of missing or corrupt WAL), shut down the cluster, and
allowed the DBA to restart the replay of the remaining WAL archive files
to continue recovery (from where it stopped previously) until the recovery
target point is reached.

OK, I see what you mean. I think it would be a good idea to work
through the various scenarios and define what Postgres would do in each
scenario before actually creating a patch.

--
-David
david@pgmasters.net


#5 Stephen Frost
sfrost@snowman.net
In reply to: Venkata B Nagothi (#3)
Re: patch proposal

Greetings,

* Venkata B Nagothi (nag1010@gmail.com) wrote:

The above parameters can be configured to pause, shut down or prevent
promotion only after reaching the recovery target point.
To clarify, I am referring to a scenario where the recovery target point is
not reached at all (I mean, a half-complete or incomplete recovery) and
there are lots of WALs still pending to be replayed - in this situation,

PG doesn't know that there are still WALs to be replayed.

PostgreSQL just completes the archive recovery until the end of the last
available WAL (WAL file "00000001000000000000001E" in my case) and
starts up the cluster, generating an error message (saying
"00000001000000000000001F" not found).

That's not a PG error, that's an error from cp. From PG's perspective,
your restore command has said that all of the WAL has been replayed.

If that's not what you want then change your restore command to return
an exit code > 125, which tells PG that it's unable to restore that WAL
segment.
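
In recovery.conf terms, a restore command that hard-fails on a missing
segment (rather than quietly signalling end-of-archive) could be a
one-liner along these lines (a sketch only; the path matches the example
above, and the exit code just needs to be above 125):

restore_command = 'test -f /data/pgrestore9531/%f && cp /data/pgrestore9531/%f %p || exit 126'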

It would be nice if PostgreSQL paused the recovery in case it is not
complete (because of missing or corrupt WAL), shut down the cluster, and
allowed the DBA to restart the replay of the remaining WAL archive files
to continue recovery (from where it stopped previously) until the recovery
target point is reached.

Reaching the end of WAL isn't an error and I don't believe it makes any
sense to treat it like it is. You can specify any recovery target point
you wish, including ones that don't exist, and that's not an error
either.

I could see supporting an additional "pause" option that means "pause at
the end of WAL if you don't reach the recovery target point". I'd also
be happy with a warning being emitted in the log if the recovery target
point isn't reached before reaching the end of WAL, but I don't think it
makes sense to change the existing behavior.

Thanks!

Stephen

#6 Venkata B Nagothi
nag1010@gmail.com
In reply to: Stephen Frost (#5)
Re: patch proposal

On Wed, Aug 17, 2016 at 12:06 AM, Stephen Frost <sfrost@snowman.net> wrote:

Greetings,

* Venkata B Nagothi (nag1010@gmail.com) wrote:

The above parameters can be configured to pause, shut down or prevent
promotion only after reaching the recovery target point.
To clarify, I am referring to a scenario where the recovery target point is
not reached at all (I mean, a half-complete or incomplete recovery) and
there are lots of WALs still pending to be replayed - in this situation,

PG doesn't know that there are still WALs to be replayed.

True, PG doesn't know that there are still WALs to be replayed. But since I
have given a particular recovery target and PG knows the current replay
position, I would say it would be good if PG warned and paused there,
saying the recovery target point has not been reached.

It would be nice if PostgreSQL paused the recovery in case it is not
complete (because of missing or corrupt WAL), shut down the cluster, and
allowed the DBA to restart the replay of the remaining WAL archive files
to continue recovery (from where it stopped previously) until the recovery
target point is reached.

Agreed. Reaching end-of-WAL is not an error. It sounds more like a
limitation in certain scenarios.

Reaching the end of WAL isn't an error and I don't believe it makes any
sense to treat it like it is. You can specify any recovery target point
you wish, including ones that don't exist, and that's not an error
either.

I could see supporting an additional "pause" option that means "pause at
the end of WAL if you don't reach the recovery target point". I'd also
be happy with a warning being emitted in the log if the recovery target
point isn't reached before reaching the end of WAL, but I don't think it
makes sense to change the existing behavior.

Agreed. An additional option like "pause" would work, as long as there is an
option to ensure the following happens if the recovery target is not reached:

a) PG pauses the recovery at the end of the WAL
b) PG generates a warning in the log file saying that the recovery target
point was not reached (there is a patch being worked on by Thom for this)
c) PG does not open up the database on exiting from the recovery process,
leaving room to resume the replay of WALs

Regards,
Venkata B N

Fujitsu Australia

#7 Stephen Frost
sfrost@snowman.net
In reply to: Venkata B Nagothi (#6)
Re: patch proposal

Venkata,

* Venkata B Nagothi (nag1010@gmail.com) wrote:

Agreed. An additional option like "pause" would work, as long as there is an
option to ensure the following happens if the recovery target is not reached:

a) PG pauses the recovery at the end of the WAL
b) PG generates a warning in the log file saying that the recovery target
point was not reached (there is a patch being worked on by Thom for this)
c) PG does not open up the database on exiting from the recovery process,
leaving room to resume the replay of WALs

One thing to consider is just how different this is from simply bringing
PG up as a warm standby instead, with the warning added to indicate if
the recovery point wasn't reached.

Thanks!

Stephen

#8 Venkata B Nagothi
nag1010@gmail.com
In reply to: Stephen Frost (#7)
Re: patch proposal

On Wed, Aug 17, 2016 at 11:27 PM, Stephen Frost <sfrost@snowman.net> wrote:

Venkata,

* Venkata B Nagothi (nag1010@gmail.com) wrote:

Agreed. An additional option like "pause" would work, as long as there is an
option to ensure the following happens if the recovery target is not reached:

a) PG pauses the recovery at the end of the WAL
b) PG generates a warning in the log file saying that the recovery target
point was not reached (there is a patch being worked on by Thom for this)
c) PG does not open up the database on exiting from the recovery process,
leaving room to resume the replay of WALs

One thing to consider is just how different this is from simply bringing
PG up as a warm standby instead, with the warning added to indicate if
the recovery point wasn't reached.

I am referring to a specific scenario (performing point-in-time recovery)
wherein a DBA attempts to bring up a standalone PG instance by restoring
the backup and performing recovery to a particular recovery target (XID,
time or a named restore point) in the past by replaying all the available
WAL archives, which is not quite possible through a warm-standby setup.

Warm standby is more of a high-availability solution, and I do not think it
addresses this kind of PITR situation.

I will share more details defining PG behaviour across various recovery
scenarios (as asked by David Steele) when using the various recovery*
parameters, and will also explain what difference the proposed patch could
make.

Regards,

Venkata B N
Fujitsu Australia

#9 Michael Paquier
michael.paquier@gmail.com
In reply to: Stephen Frost (#5)
Re: patch proposal

On Tue, Aug 16, 2016 at 11:06 PM, Stephen Frost <sfrost@snowman.net> wrote:

I could see supporting an additional "pause" option that means "pause at
the end of WAL if you don't reach the recovery target point". I'd also
be happy with a warning being emitted in the log if the recovery target
point isn't reached before reaching the end of WAL, but I don't think it
makes sense to change the existing behavior.

Indeed, let's not change the existing behavior. A warning showing up
by default would be useful in itself, even if there are people who, I
think, set overly high recovery targets to be sure to replay as much WAL
as possible. As recovery_target_action has meaning when a
recovery target has been reached, I would guess that we would want a
new option that has the same value mapping as recovery_target_action,
except that it activates when the recovery target is *not* reached.
Hence it would be possible to shutdown, pause or promote at will when
recovery completes, and be able to take a separate action if the
recovery target is indeed reached. The default of this parameter would
be "promote", which is what happens now.
--
Michael


#10 Venkata B Nagothi
nag1010@gmail.com
In reply to: Michael Paquier (#9)
Re: patch proposal

On Thu, Aug 18, 2016 at 3:37 PM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Tue, Aug 16, 2016 at 11:06 PM, Stephen Frost <sfrost@snowman.net>
wrote:

I could see supporting an additional "pause" option that means "pause at
the end of WAL if you don't reach the recovery target point". I'd also
be happy with a warning being emitted in the log if the recovery target
point isn't reached before reaching the end of WAL, but I don't think it
makes sense to change the existing behavior.

Indeed, let's not change the existing behavior. A warning showing up
by default would be useful in itself, even if there are people who, I
think, set overly high recovery targets to be sure to replay as much WAL
as possible. As recovery_target_action has meaning when a
recovery target has been reached, I would guess that we would want a
new option that has the same value mapping as recovery_target_action,
except that it activates when the recovery target is *not* reached.
Hence it would be possible to shutdown, pause or promote at will when
recovery completes, and be able to take a separate action if the
recovery target is indeed reached. The default of this parameter would
be "promote", which is what happens now.

Agreed. I understand the complexities of backward compatibility in
changing the existing behaviour.
I too was more inclined towards introducing a new parameter and, as
suggested, will consider the options pause, shutdown and promote for the
new parameter.
Thanks for the inputs and advice.

Regards,
Venkata B N

Fujitsu Australia

#11 Stephen Frost
sfrost@snowman.net
In reply to: Venkata B Nagothi (#8)
Re: patch proposal

* Venkata B Nagothi (nag1010@gmail.com) wrote:

On Wed, Aug 17, 2016 at 11:27 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Venkata B Nagothi (nag1010@gmail.com) wrote:

Agreed. An additional option like "pause" would work, as long as there is an
option to ensure the following happens if the recovery target is not reached:

a) PG pauses the recovery at the end of the WAL
b) PG generates a warning in the log file saying that the recovery target
point was not reached (there is a patch being worked on by Thom for this)
c) PG does not open up the database on exiting from the recovery process,
leaving room to resume the replay of WALs

One thing to consider is just how different this is from simply bringing
PG up as a warm standby instead, with the warning added to indicate if
the recovery point wasn't reached.

I am referring to a specific scenario (performing point-in-time recovery)
wherein a DBA attempts to bring up a standalone PG instance by restoring
the backup and performing recovery to a particular recovery target (XID,
time or a named restore point) in the past by replaying all the available
WAL archives, which is not quite possible through a warm-standby setup.

Warm standby is more of a high-availability solution, and I do not think it
addresses this kind of PITR situation.

No, a warm standby is one which just plays through the WAL but doesn't
bring the database up or start its own set of WAL, which is what you're
asking about.
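
For reference, a 9.x-era warm standby is essentially just recovery.conf
with standby mode enabled, along these lines (paths illustrative):

standby_mode = 'on'
restore_command = 'cp /mnt/server/archivedir/%f %p'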

Thanks!

Stephen

#12 Venkata B Nagothi
nag1010@gmail.com
In reply to: Stephen Frost (#11)
Re: patch proposal

On Thu, Aug 18, 2016 at 7:20 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Venkata B Nagothi (nag1010@gmail.com) wrote:

On Wed, Aug 17, 2016 at 11:27 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Venkata B Nagothi (nag1010@gmail.com) wrote:

Agreed. An additional option like "pause" would work, as long as there is an
option to ensure the following happens if the recovery target is not reached:

a) PG pauses the recovery at the end of the WAL
b) PG generates a warning in the log file saying that the recovery target
point was not reached (there is a patch being worked on by Thom for this)
c) PG does not open up the database on exiting from the recovery process,
leaving room to resume the replay of WALs

One thing to consider is just how different this is from simply bringing
PG up as a warm standby instead, with the warning added to indicate if
the recovery point wasn't reached.

I am referring to a specific scenario (performing point-in-time recovery)
wherein a DBA attempts to bring up a standalone PG instance by restoring
the backup and performing recovery to a particular recovery target (XID,
time or a named restore point) in the past by replaying all the available
WAL archives, which is not quite possible through a warm-standby setup.

Warm standby is more of a high-availability solution, and I do not think it
addresses this kind of PITR situation.

No, a warm standby is one which just plays through the WAL but doesn't
bring the database up or start its own set of WAL, which is what you're
asking about.

I understand that, in warm standby, PG continuously replays WAL without
bringing up the database, as the database will be in standby mode, which
sounds ideal.

While recovering the database to a particular recovery target point (where
there is no requirement to set up a standby), there is a risk that the
database will get promoted due to missing or corrupt WALs. This means
there is no way to avoid promotion and resume WAL recovery. An option like
"pause" for a new parameter would be ideal to prevent database promotion
at the default recovery target, which is "immediate" or end-of-WAL.

Regards,

Venkata B N
Fujitsu Australia

#13 Venkata B Nagothi
nag1010@gmail.com
In reply to: Michael Paquier (#9)
Re: patch proposal

On Thu, Aug 18, 2016 at 3:37 PM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Tue, Aug 16, 2016 at 11:06 PM, Stephen Frost <sfrost@snowman.net>
wrote:

I could see supporting an additional "pause" option that means "pause at
the end of WAL if you don't reach the recovery target point". I'd also
be happy with a warning being emitted in the log if the recovery target
point isn't reached before reaching the end of WAL, but I don't think it
makes sense to change the existing behavior.

Indeed, let's not change the existing behavior. A warning showing up
by default would be useful in itself, even if there are people who, I
think, set overly high recovery targets to be sure to replay as much WAL
as possible. As recovery_target_action has meaning when a
recovery target has been reached, I would guess that we would want a
new option that has the same value mapping as recovery_target_action,
except that it activates when the recovery target is *not* reached.
Hence it would be possible to shutdown, pause or promote at will when
recovery completes, and be able to take a separate action if the
recovery target is indeed reached. The default of this parameter would
be "promote", which is what happens now.

Yes, a new parameter with the same options as recovery_target_action is the
idea I had in mind as well, and I have the following queries while working
through the patch design -

*Query 1*

What about the existing parameter "recovery_target", which accepts
only one value, "immediate"? That will be similar to the "promote" option
of the to-be-introduced new parameter.
Since this parameter's behaviour will be incorporated into the new
parameter, I think it could be deprecated from the next
PostgreSQL version?

*Query 2*

I am thinking that the new parameter name should be
"recovery_target_incomplete" or "recovery_target_incomplete_action", which
(by name) suggests that the recovery target point is not yet reached, and
which accepts the options "pause", "promote" and "shutdown".

The other alternative name I thought of was
"recovery_target_immediate_action", which (by name) suggests the action to
be taken when the recovery does not reach the actual recovery target
and reaches the immediate consistent point.

Any comments or suggestions?

Regards,
Venkata B N

Fujitsu Australia

#14 Stephen Frost
sfrost@snowman.net
In reply to: Venkata B Nagothi (#13)
Re: patch proposal

* Venkata B Nagothi (nag1010@gmail.com) wrote:

*Query 1*

What about the existing parameter "recovery_target", which accepts
only one value, "immediate"? That will be similar to the "promote" option
of the to-be-introduced new parameter.
Since this parameter's behaviour will be incorporated into the new
parameter, I think it could be deprecated from the next
PostgreSQL version?

I don't think we can really consider that without having a good answer
for what the "new parameter" is, in particular...

*Query 2*

I am thinking that the new parameter name should be
"recovery_target_incomplete" or "recovery_target_incomplete_action", which
(by name) suggests that the recovery target point is not yet reached, and
which accepts the options "pause", "promote" and "shutdown".

The other alternative name I thought of was
"recovery_target_immediate_action", which (by name) suggests the action to
be taken when the recovery does not reach the actual recovery target
and reaches the immediate consistent point.

I don't really care for any of those names. Note that "immediate" and
"the point at which we realize that we didn't find the recovery target"
are *not* necessarily the same. Whatever we do here needs to also cover
the 'recovery_target_name' option, where we're only going to realize
that we didn't find the restore point when we reach the end of the WAL
stream.

I'm not a fan of the "recovery_target" option, particularly as it's only
got one value even though it can mean two things (either "immediate" or
"not set"), but we need a complete solution before we can consider
deprecating it. Further, we could consider making it an alias for
whatever better name we come up with.

Thanks!

Stephen

#15 Venkata B Nagothi
nag1010@gmail.com
In reply to: Stephen Frost (#14)
Re: patch proposal

On Thu, Aug 25, 2016 at 10:59 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Venkata B Nagothi (nag1010@gmail.com) wrote:

*Query 1*

What about the existing parameter "recovery_target", which accepts
only one value, "immediate"? That will be similar to the "promote" option
of the to-be-introduced new parameter.
Since this parameter's behaviour will be incorporated into the new
parameter, I think it could be deprecated from the next
PostgreSQL version?

I don't think we can really consider that without having a good answer
for what the "new parameter" is, in particular...

Sure, I agree.

*Query 2*

I am thinking that the new parameter name should be
"recovery_target_incomplete" or "recovery_target_incomplete_action", which
(by name) suggests that the recovery target point is not yet reached, and
which accepts the options "pause", "promote" and "shutdown".

The other alternative name I thought of was
"recovery_target_immediate_action", which (by name) suggests the action to
be taken when the recovery does not reach the actual recovery target
and reaches the immediate consistent point.

I don't really care for any of those names. Note that "immediate" and
"the point at which we realize that we didn't find the recovery target"
are *not* necessarily the same. Whatever we do here needs to also cover
the 'recovery_target_name' option, where we're only going to realize
that we didn't find the restore point when we reach the end of the WAL
stream.

Yes, all the recovery targets (including recovery_target_name) will be
taken into consideration.
The whole idea of this patch is to ensure PG realises that the recovery
is incomplete if the specified recovery target point is not found by the
end of the WAL stream.

I'm not a fan of the "recovery_target" option, particularly as it's only
got one value even though it can mean two things (either "immediate" or
"not set"), but we need a complete solution before we can consider
deprecating it. Further, we could consider making it an alias for
whatever better name we come up with.

The new parameter will accept the options "pause", "shutdown" and "promote".

*"promote"*

This option will ensure the database starts up once the "immediate"
consistent recovery point is reached, even if that is well before the
specified recovery target point (XID, name or time).
This behaviour will be similar to that of recovery_target = 'immediate' and
can be aliased.

*"pause"*

This option ensures database recovery pauses if the specified
recovery target (XID, name or time) is not reached, so that WAL
replay can be resumed if required.

*"shutdown"*

This option ensures the database shuts down if the specified recovery
target point (XID, name or time) is not reached.

Regards,
Venkata B N

Fujitsu Australia

#16 Stephen Frost
sfrost@snowman.net
In reply to: Venkata B Nagothi (#15)
Re: patch proposal

* Venkata B Nagothi (nag1010@gmail.com) wrote:

On Thu, Aug 25, 2016 at 10:59 PM, Stephen Frost <sfrost@snowman.net> wrote:

I'm not a fan of the "recovery_target" option, particularly as it's only
got one value even though it can mean two things (either "immediate" or
"not set"), but we need a complete solution before we can consider
deprecating it. Further, we could consider making it an alias for
whatever better name we come up with.

The new parameter will accept the options "pause", "shutdown" and "promote".

*"promote"*

This option will ensure the database starts up once the "immediate"
consistent recovery point is reached, even if that is well before the
specified recovery target point (XID, name or time).
This behaviour will be similar to that of recovery_target = 'immediate' and
can be aliased.

I don't believe we're really going at this the right way. Clearly,
there will be cases where we'd like promotion at the end of the WAL
stream (as we currently have) even if the recovery point is not found,
but if the new option's "promote" is the same as "immediate" then we
don't have that.

We need to break this down into all the different possible combinations
and then come up with names for the options to define them. I don't
believe a single option is going to be able to cover all of the cases.

The cases which I'm considering are:

recovery target is immediate (as soon as we have consistency)
recovery target is a set point (name, xid, time, whatever)

action to take if recovery target is found
action to take if recovery target is not found

Generally, "action" is one of "promote", "pause", or "shutdown".
Clearly, not all actions are valid for all recovery target cases- in
particular, "immediate" with "recovery target not found" can not support
the "promote" or "pause" options. Otherwise, we can support:

Recovery Target | Found | Action
----------------|-------|----------
immediate       | Yes   | promote
immediate       | Yes   | pause
immediate       | Yes   | shutdown

immediate       | No    | shutdown

name/xid/time   | Yes   | promote
name/xid/time   | Yes   | pause
name/xid/time   | Yes   | shutdown

name/xid/time   | No    | promote
name/xid/time   | No    | pause
name/xid/time   | No    | shutdown

We could clearly support this with these options:

recovery_target = immediate, other
recovery_action_target_found = promote, pause, shutdown
recovery_action_target_not_found = promote, pause, shutdown
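
Spelled out as a recovery.conf sketch (both action parameter names here are
proposals in this message, not existing options):

recovery_target_name = 'before_upgrade'           # some name/xid/time target
recovery_action_target_found = 'pause'
recovery_action_target_not_found = 'shutdown'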

One question to ask is if we need to support an option for xid and time
related to when we realize that we won't find the recovery target. If
consistency is reached at a time which is later than the recovery target
for time, what then? Do we go through the rest of the WAL and perform
the "not found" action at the end of the WAL stream? If we use that
approach, then at least all of the recovery target types are handled the
same, but I can definitely see cases where an administrator might prefer
an "error" option.

I'd suggest we attempt to support that also.

Thanks!

Stephen

#17 Venkata B Nagothi
nag1010@gmail.com
In reply to: Stephen Frost (#16)
Re: patch proposal

On Fri, Aug 26, 2016 at 12:30 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Venkata B Nagothi (nag1010@gmail.com) wrote:

On Thu, Aug 25, 2016 at 10:59 PM, Stephen Frost <sfrost@snowman.net> wrote:

I'm not a fan of the "recovery_target" option, particularly as it's only
got one value even though it can mean two things (either "immediate" or
"not set"), but we need a complete solution before we can consider
deprecating it. Further, we could consider making it an alias for
whatever better name we come up with.

The new parameter will accept the options "pause", "shutdown" and "promote".

*"promote"*

This option will ensure the database starts up once the "immediate"
consistent recovery point is reached, even if that is well before the
specified recovery target point (XID, name or time).
This behaviour will be similar to that of recovery_target = 'immediate' and
can be aliased.

I don't believe we're really going at this the right way. Clearly,
there will be cases where we'd like promotion at the end of the WAL
stream (as we currently have) even if the recovery point is not found,
but if the new option's "promote" is the same as "immediate" then we
don't have that.

My apologies for the confusion. Correction - I meant that the "promote"
option would promote the database once the consistent point is reached at
the end of the WAL.

We need to break this down into all the different possible combinations
and then come up with names for the options to define them. I don't
believe a single option is going to be able to cover all of the cases.

The cases which I'm considering are:

recovery target is immediate (as soon as we have consistency)
recovery target is a set point (name, xid, time, whatever)

action to take if recovery target is found
action to take if recovery target is not found

Generally, "action" is one of "promote", "pause", or "shutdown".
Clearly, not all actions are valid for all recovery target cases- in
particular, "immediate" with "recovery target not found" can not support
the "promote" or "pause" options. Otherwise, we can support:

I agree. This is a valid point to consider. I might have a few questions
about this at a later stage.

Recovery Target | Found | Action
----------------|-------|----------
immediate       | Yes   | promote
immediate       | Yes   | pause
immediate       | Yes   | shutdown

immediate       | No    | shutdown

name/xid/time   | Yes   | promote
name/xid/time   | Yes   | pause
name/xid/time   | Yes   | shutdown

name/xid/time   | No    | promote
name/xid/time   | No    | pause
name/xid/time   | No    | shutdown

We could clearly support this with these options:

recovery_target = immediate, other

recovery_action_target_found = promote, pause, shutdown

This is currently supported by the existing parameter called
"recovery_target_action".

recovery_action_target_not_found = promote, pause, shutdown

This is exactly what the newly proposed parameter will do.

One question to ask is if we need to support an option for xid and time
related to when we realize that we won't find the recovery target. If
consistency is reached at a time which is later than the recovery target
for time, what then? Do we go through the rest of the WAL and perform
the "not found" action at the end of the WAL stream? If we use that
approach, then at least all of the recovery target types are handled the
same, but I can definitely see cases where an administrator might prefer
an "error" option.

Currently, this situation is handled by the recovery_target_inclusive
parameter. The administrator can choose to stop the recovery at any
consistent point before or after the specified recovery target. Is this
what you were referring to?
I might need a bit of clarification here: the parameter I am proposing
would be effective only if the specified recovery target is not reached,
and may not be effective if the recovery goes beyond the recovery target
point.

Regards,
Venkata B N

Fujitsu Australia

#18 Stephen Frost
sfrost@snowman.net
In reply to: Venkata B Nagothi (#17)
Re: patch proposal

* Venkata B Nagothi (nag1010@gmail.com) wrote:

On Fri, Aug 26, 2016 at 12:30 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Venkata B Nagothi (nag1010@gmail.com) wrote:

This behaviour will be similar to that of recovery_target = 'immediate' and
can be aliased.

I don't believe we're really going at this the right way. Clearly,
there will be cases where we'd like promotion at the end of the WAL
stream (as we currently have) even if the recovery point is not found,
but if the new option's "promote" is the same as "immediate" then we
don't have that.

My apologies for the confusion. Correction - I meant that the "promote"
option would promote the database once the consistent point is reached at
the end of the WAL.

"consistent point" and "end-of-the-WAL" are not the same.

recovery_target = immediate, other

recovery_action_target_found = promote, pause, shutdown

This is currently supported by the existing parameter called
"recovery_target_action".

Right, sure, we could possibly use that, or create an alias, etc.

recovery_action_target_not_found = promote, pause, shutdown

This is exactly what the newly proposed parameter will do.

Then it isn't the same as 'recovery_target = immediate'.

One question to ask is if we need to support an option for xid and time
related to when we realize that we won't find the recovery target. If
consistency is reached at a time which is later than the recovery target
for time, what then? Do we go through the rest of the WAL and perform
the "not found" action at the end of the WAL stream? If we use that
approach, then at least all of the recovery target types are handled the
same, but I can definitely see cases where an administrator might prefer
an "error" option.

Currently, this situation is handled by the recovery_target_inclusive
parameter.

No, that's not the same.

The administrator can choose to stop the recovery at any consistent
point before or after the specified recovery target. Is this what you were
referring to?

Not quite.

I might need a bit of clarification here: the parameter I am proposing
would be effective only if the specified recovery target is not reached,
and may not be effective if the recovery goes beyond the recovery target
point.

Ok, let's consider this scenario:

Backup started at: 4/22D3B8E0, time: 2016-08-26 08:29:08
Backup completed (consistency) at: 4/22D3B8EF, 2016-08-26 08:30:00
recovery_target is not set
recovery_target_time = 2016-08-26 08:29:30
recovery_target_inclusive = false

In such a case, we will necessarily commit transactions which happened
between 8:29:30 and 8:30:00 and only stop (I believe..) once we've
reached consistency. We could possibly detect that case and throw an
error instead. The same could happen with xid.

Working through more scenarios would be useful, I believe. For example,
what if the xid or time specified happened before the backup started?

Basically, what I was trying to get at is that we might be able to
detect that we'll never find a given recovery target point without
actually replaying all of WAL and wondering if we should provide options
to control what to do in such a case.

Thanks!

Stephen

#19 Venkata B Nagothi
nag1010@gmail.com
In reply to: Stephen Frost (#18)
Re: patch proposal

On Fri, Aug 26, 2016 at 10:58 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Venkata B Nagothi (nag1010@gmail.com) wrote:

On Fri, Aug 26, 2016 at 12:30 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Venkata B Nagothi (nag1010@gmail.com) wrote:

This behaviour will be similar to that of recovery_target = 'immediate' and
can be aliased.

I don't believe we're really going at this the right way. Clearly,
there will be cases where we'd like promotion at the end of the WAL
stream (as we currently have) even if the recovery point is not found,
but if the new option's "promote" is the same as "immediate" then we
don't have that.

My apologies for the confusion. Correction - I meant that the "promote"
option would promote the database once the consistent point is reached at
the end of the WAL.

"consistent point" and "end-of-the-WAL" are not the same.

recovery_target = immediate, other

recovery_action_target_found = promote, pause, shutdown

This is currently supported by the existing parameter called
"recovery_target_action".

Right, sure, we could possibly use that, or create an alias, etc.

recovery_action_target_not_found = promote, pause, shutdown

This is exactly what the newly proposed parameter will do.

Then it isn't the same as 'recovery_target = immediate'.

One question to ask is if we need to support an option for xid and time
related to when we realize that we won't find the recovery target. If
consistency is reached at a time which is later than the recovery target
for time, what then? Do we go through the rest of the WAL and perform
the "not found" action at the end of the WAL stream? If we use that
approach, then at least all of the recovery target types are handled the
same, but I can definitely see cases where an administrator might prefer
an "error" option.

Currently, this situation is handled by the recovery_target_inclusive
parameter.

No, that's not the same.

The administrator can choose to stop the recovery at any consistent
point before or after the specified recovery target. Is this what you
were referring to?

Not quite.

I might need a bit of clarification here: the parameter I am proposing
would be effective only if the specified recovery target is not reached,
and may not be effective if the recovery goes beyond the recovery target
point.

Ok, let's consider this scenario:

Backup started at: 4/22D3B8E0, time: 2016-08-26 08:29:08
Backup completed (consistency) at: 4/22D3B8EF, 2016-08-26 08:30:00
recovery_target is not set
recovery_target_time = 2016-08-26 08:29:30
recovery_target_inclusive = false

In such a case, we will necessarily commit transactions which happened
between 8:29:30 and 8:30:00 and only stop (I believe..) once we've
reached consistency. We could possibly detect that case and throw an
error instead. The same could happen with xid.

Yes, currently, PG just recovers until the consistency is reached. It makes
sense to throw an error instead.

Working through more scenarios would be useful, I believe. For example,
what if the xid or time specified happened before the backup started?

Basically, what I was trying to get at is that we might be able to
detect that we'll never find a given recovery target point without
actually replaying all of WAL and wondering if we should provide options
to control what to do in such a case.

Ah, yes, I got the point and I agree. Thanks for the clarification.

Yes, good idea, and it is important to ensure PG errors out and warns the
administrator with an appropriate message indicating that "a much earlier
backup must be used..." if the specified recovery target xid, name or time
is earlier than the backup time.

Regards,

Venkata B N
Fujitsu Australia

#20 Venkata B Nagothi
nag1010@gmail.com
In reply to: Venkata B Nagothi (#19)
1 attachment(s)
Re: patch proposal

Attached is the patch, which introduces the new parameter
"recovery_target_incomplete".

Currently this patch addresses two scenarios -

*Scenario 1 :*

Provides options to the DBA when the recovery target is not reached and
recovery has stopped mid-way due to unavailability of WALs.

When performing recovery to a specific recovery target, the
"recovery_target_incomplete" parameter can be used to either promote, pause
or shutdown when recovery does not reach the specified recovery target and
reaches end-of-WAL.

*Scenario 2 :*

Generates errors and hints when the specified recovery target is prior to
the backup's current position (xid, time and lsn). This behaviour is
integrated with the parameters "recovery_target_time",
"recovery_target_lsn" and "recovery_target_xid".

I would like to know if the approach this patch incorporates sounds OK.

My other comments are inline

On Mon, Aug 29, 2016 at 3:17 PM, Venkata B Nagothi <nag1010@gmail.com>
wrote:

On Fri, Aug 26, 2016 at 10:58 PM, Stephen Frost <sfrost@snowman.net>
wrote:

* Venkata B Nagothi (nag1010@gmail.com) wrote:

On Fri, Aug 26, 2016 at 12:30 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Venkata B Nagothi (nag1010@gmail.com) wrote:

This behaviour will be similar to that of recovery_target = 'immediate' and
can be aliased.

I don't believe we're really going at this the right way. Clearly,
there will be cases where we'd like promotion at the end of the WAL
stream (as we currently have) even if the recovery point is not found,
but if the new option's "promote" is the same as "immediate" then we
don't have that.

My apologies for the confusion. Correction - I meant that the "promote"
option would promote the database once the consistent point is reached at
the end of the WAL.

"consistent point" and "end-of-the-WAL" are not the same.

recovery_target = immediate, other

recovery_action_target_found = promote, pause, shutdown

This is currently supported by the existing parameter called
"recovery_target_action".

Right, sure, we could possibly use that, or create an alias, etc.

recovery_action_target_not_found = promote, pause, shutdown

This is exactly what the newly proposed parameter will do.

Then it isn't the same as 'recovery_target = immediate'.

One question to ask is if we need to support an option for xid and time
related to when we realize that we won't find the recovery target. If
consistency is reached at a time which is later than the recovery target
for time, what then? Do we go through the rest of the WAL and perform
the "not found" action at the end of the WAL stream? If we use that
approach, then at least all of the recovery target types are handled the
same, but I can definitely see cases where an administrator might prefer
an "error" option.

Currently, this situation is handled by the recovery_target_inclusive
parameter.

No, that's not the same.

The administrator can choose to stop the recovery at any consistent
point before or after the specified recovery target. Is this what you
were referring to?

Not quite.

I might need a bit of clarification here: the parameter I am proposing
would be effective only if the specified recovery target is not reached,
and may not be effective if the recovery goes beyond the recovery target
point.

Ok, let's consider this scenario:

Backup started at: 4/22D3B8E0, time: 2016-08-26 08:29:08
Backup completed (consistency) at: 4/22D3B8EF, 2016-08-26 08:30:00
recovery_target is not set
recovery_target_time = 2016-08-26 08:29:30
recovery_target_inclusive = false

In such a case, we will necessarily commit transactions which happened
between 8:29:30 and 8:30:00 and only stop (I believe..) once we've
reached consistency. We could possibly detect that case and throw an
error instead. The same could happen with xid.

Yes, currently, PG just recovers until the consistency is reached. It
makes sense to throw an error instead.

This has not been addressed yet. Currently, the patch only considers
generating an error if the recovery target position is prior to the current
backup's position, and this is irrespective of whether backup_label is
present or not.
I will share my updates on how this can be addressed.

Working through more scenarios would be useful, I believe. For example,
what if the xid or time specified happened before the backup started?

Basically, what I was trying to get at is that we might be able to
detect that we'll never find a given recovery target point without
actually replaying all of WAL and wondering if we should provide options
to control what to do in such a case.

Ah, yes, I got the point and I agree. Thanks for the clarification.

Yes, good idea, and it is important to ensure PG errors out and warns the
administrator with an appropriate message indicating that "a much earlier
backup must be used..." if the specified recovery target xid, name or time
is earlier than the backup time.

This scenario has been addressed in this patch.

I have a question regarding the following comment -

/*
 * Override any inconsistent requests. Not that this is a change of
 * behaviour in 9.5; prior to this we simply ignored a request to pause if
 * hot_standby = off, which was surprising behaviour.
 */
if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
    recoveryTargetActionSet &&
    !EnableHotStandby)
    recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;

Because of this, the "pause" option is ignored and the database is shut
down instead.

The "pause" request is ignored for the recovery_target_action parameter
because of the following lines in the recoveryPausesHere() function - am I
getting that right?

/* Don't pause unless users can connect! */
if (!LocalHotStandbyActive)
return;

Not sure why.
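
For context, the surrounding function in the 9.5/9.6-era sources looks
roughly like this (a reconstruction, so treat details as approximate):

static void
recoveryPausesHere(void)
{
	/* Don't pause unless users can connect! */
	if (!LocalHotStandbyActive)
		return;

	ereport(LOG,
			(errmsg("recovery has paused"),
			 errhint("Execute pg_xlog_replay_resume() to continue.")));

	while (RecoveryIsPaused())
	{
		pg_usleep(1000000L);	/* sleep 1 s between checks */
		HandleStartupProcInterrupts();
	}
}

The early return exists because resuming requires a client session to call
pg_xlog_replay_resume(), which is only possible when hot-standby
connections are allowed.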

The parameter I am introducing faced a similar problem while testing, and
I have introduced a new function called IncompleteRecoveryPause() without
the above condition, which will pause the recovery at end-of-WAL. Hope
that is OK?
If it is, I can change the function name and use the same function for
recovery_target_action = 'pause' as well.

Please note that documentation is still pending from my end.

This is my first patch for the community and I am very new to the code
base. Please let me know if this patch needs any changes.

Regards,

Venkata B N

Database Consultant
Fujitsu Australia

Attachments:

recoveryTargetIncomplete.patch (application/octet-stream)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d6c057a..f5a537c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -252,6 +252,7 @@ static char *archiveCleanupCommand = NULL;
 static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET;
 static bool recoveryTargetInclusive = true;
 static RecoveryTargetAction recoveryTargetAction = RECOVERY_TARGET_ACTION_PAUSE;
+static RecoveryTargetIncomplete recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PROMOTE;
 static TransactionId recoveryTargetXid;
 static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
@@ -789,10 +790,12 @@ static MemoryContext walDebugCxt = NULL;
 #endif
 
 static void readRecoveryCommandFile(void);
+static void recoveryStartsHere(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
 static void recoveryPausesHere(void);
+static void IncompleteRecoveryPause(void);
 static bool recoveryApplyDelay(XLogReaderState *record);
 static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
@@ -4949,7 +4952,6 @@ readRecoveryCommandFile(void)
 			   *tail = NULL;
 	bool		recoveryTargetActionSet = false;
 
-
 	fd = AllocateFile(RECOVERY_COMMAND_FILE, "r");
 	if (fd == NULL)
 	{
@@ -5014,6 +5016,26 @@ readRecoveryCommandFile(void)
 
 			recoveryTargetActionSet = true;
 		}
+		else if (strcmp(item->name, "recovery_target_incomplete") == 0)
+		{
+			if (strcmp(item->value, "pause") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PAUSE;
+			else if (strcmp(item->value, "promote") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PROMOTE;
+			else if (strcmp(item->value, "shutdown") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_SHUTDOWN;
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("invalid value for recovery parameter \"%s\": \"%s\"",
+					   "recovery_target_incomplete",
+					   item->value),
+						 errhint("Valid values are \"pause\", \"promote\", and \"shutdown\".")));
+
+			ereport(DEBUG2,
+					(errmsg_internal("recovery_target_incomplete = '%s'",
+									 item->value)));
+		}
 		else if (strcmp(item->name, "recovery_target_timeline") == 0)
 		{
 			rtliGiven = true;
@@ -5395,6 +5417,60 @@ getRecordTimestamp(XLogReaderState *record, TimestampTz *recordXtime)
 	return false;
 }
 
+/* When performing point-in-time recovery, this function identifies whether
+ * the specified recovery target (recovery_target_time, recovery_target_lsn or recovery_target_xid) is prior to that of the backup.
+ * If the recovery target point is prior to the backup start point, recovery cannot proceed.
+ */
+
+static void
+recoveryStartsHere(void)
+{
+	/*
+	 * Check if the recovery target xid is older than the oldest xid of the
+	 * backup
+	 */
+
+	if (recoveryTarget == RECOVERY_TARGET_XID)
+	{
+		if (TransactionIdPrecedes(recoveryTargetXid,
+								  ControlFile->checkPointCopy.oldestXid))
+		{
+			ereport(FATAL,
+					(errmsg("recovery_target_xid %u is older than the backup's oldest Xid %u", recoveryTargetXid, ControlFile->checkPointCopy.oldestXid),
+					 errhint("This means that the backup being used is much later than the recovery target position. "
+							 "You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+
+	/* Check if the recovery target lsn is prior to the latest checkpoint redo position of the backup */
+
+	if (recoveryTarget == RECOVERY_TARGET_LSN)
+	{
+		if (recoveryTargetLSN < ControlFile->checkPointCopy.redo)
+		{
+			ereport(FATAL,
+					(errmsg("recovery_target_lsn \"%X/%X\" is older than backup start LSN \"%X/%X\"",
+							(uint32) (recoveryTargetLSN >> 32),
+							(uint32) recoveryTargetLSN, (uint32) (ControlFile->checkPointCopy.redo >> 32), (uint32) ControlFile->checkPointCopy.redo),
+					 errhint("This means that the backup being used is much later than the recovery target position. "
+							 "You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+
+	/* Check if the recovery target time is prior to the current timestamp of the backup */
+
+	if (recoveryTarget == RECOVERY_TARGET_TIME)
+	{
+		if (recoveryTargetTime < ControlFile->checkPointCopy.time)
+		{
+			ereport(FATAL,
+					(errmsg("recovery_target_time %s is older than backup start time %s", timestamptz_to_str(recoveryTargetTime), timestamptz_to_str(ControlFile->checkPointCopy.time)),
+					 errhint("This means that the backup being used is much later than the recovery target position. "
+							 "You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+}
+
 /*
  * For point-in-time recovery, this function decides whether we want to
  * stop applying the XLOG before the current record.
@@ -5740,6 +5816,23 @@ SetRecoveryPause(bool recoveryPause)
 	SpinLockRelease(&XLogCtl->info_lck);
 }
 
+static void
+IncompleteRecoveryPause(void)
+{
+	/* Pause recovery at end-of-the-wal when recovery target is not reached */
+	ereport(LOG,
+			(errmsg("recovery has paused"),
+			 errhint("Execute pg_xlog_replay_resume() to continue.")));
+
+	while (RecoveryIsPaused())
+	{
+		pg_usleep(1000000L);	/* 1000 ms */
+		HandleStartupProcInterrupts();
+	}
+}
+
+
+
 /*
  * When recovery_min_apply_delay is set, we wait long enough to make sure
  * certain record types are applied at least that interval behind the master.
@@ -6128,6 +6221,9 @@ StartupXLOG(void)
 					(errmsg("starting archive recovery")));
 	}
 
+	/* Check if archive recovery can start at all */
+	recoveryStartsHere();
+
 	/*
 	 * Take ownership of the wakeup latch if we're going to sleep during
 	 * recovery.
@@ -7040,6 +7136,46 @@ StartupXLOG(void)
 						break;
 				}
 			}
+			else
+			{
+				ereport(LOG,
+						(errmsg("recovery reached end of WAL without reaching the recovery target"),
+				errhint("This could be due to corrupt or missing WAL files.\n"
+						"All the WAL files needed for the recovery must be available to proceed to the recovery target, "
+				"or you might need to choose an earlier recovery target.")));
+
+				/*
+				 * This is the position where we can choose to shutdown, pause
+				 * or promote at the end-of-the-wal if the intended recovery
+				 * target is not reached
+				 */
+
+				switch (recoveryTargetIncomplete)
+				{
+
+					case RECOVERY_TARGET_INCOMPLETE_SHUTDOWN:
+
+						/*
+						 * exit with special return code to request shutdown
+						 * of postmaster.  Log messages issued from
+						 * postmaster.
+						 */
+						ereport(LOG,
+								(errmsg("shutdown at end-of-the-wal")));
+
+						proc_exit(2);
+
+					case RECOVERY_TARGET_INCOMPLETE_PAUSE:
+
+						SetRecoveryPause(true);
+						IncompleteRecoveryPause();
+
+						/* drop into promote */
+
+					case RECOVERY_TARGET_INCOMPLETE_PROMOTE:
+						break;
+				}
+			}
 
 			/* Allow resource managers to do any required cleanup. */
 			for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index ceb0462..799ddc2 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -256,6 +256,18 @@ typedef enum
 } RecoveryTargetAction;
 
 /*
+ * Recovery target incomplete.
+ */
+
+typedef enum
+{
+	RECOVERY_TARGET_INCOMPLETE_PAUSE,
+	RECOVERY_TARGET_INCOMPLETE_PROMOTE,
+	RECOVERY_TARGET_INCOMPLETE_SHUTDOWN
+}	RecoveryTargetIncomplete;
+
+
+/*
  * Method table for resource managers.
  *
  * This struct must be kept in sync with the PG_RMGR definition in
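
For reference, with the patch above applied, a recovery.conf exercising the
new parameter would look something like this (the xid is a placeholder):

recovery_target_xid = '12345'
recovery_target_action = 'pause'         # existing: action when the target is reached
recovery_target_incomplete = 'shutdown'  # new: action when end-of-WAL is hit first
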
#21Venkata B Nagothi
nag1010@gmail.com
In reply to: Venkata B Nagothi (#20)
1 attachment(s)
Re: patch proposal

Attached is the 2nd version of the patch with some enhancements.

Scenario 2:

Generates errors and hints when the specified recovery target is prior to
the backup's current position (xid, time and lsn). This behaviour is
integrated with the parameters "recovery_target_time", "recovery_target_lsn"
and "recovery_target_xid".

In the second version of the patch, recovery_target_xid is compared
with the backup's "latestCompletedXid" instead of its "oldestXid"; if
the specified recovery_target_xid is prior to the "latestCompletedXid" of
the backup, errors/hints are generated directing the DBA to use
an appropriate backup. With this patch, pg_controldata also displays
the "latestCompletedXid" of the cluster.

Backup started at: 4/22D3B8E0, time: 2016-08-26 08:29:08

Backup completed (consistency) at: 4/22D3B8EF, 2016-08-26 08:30:00
recovery_target is not set
recovery_target_time = 2016-08-26 08:29:30
recovery_target_inclusive = false

In such a case, we will necessarily commit transactions which happened
between 8:29:30 and 8:30:00 and only stop (I believe..) once we've
reached consistency. We could possibly detect that case and throw an
error instead. The same could happen with xid.

Yes, currently, PG just recovers until consistency is reached. It
makes sense to throw an error instead.

This has not been addressed yet. Currently, the patch only considers
generating an error if the recovery target position is prior to the current
backup's position, and this is irrespective of whether a backup_label is
present or not.
I will share my updates on how this can be addressed.

This does not seem to be a possibility, as the information in the
backup_label is not enough to handle this scenario in specific cases like
xid or time, and it does not seem possible
to add the backup stop info to the backup_label. I would prefer leaving
this at the original behaviour for the moment, which is: PostgreSQL
generates errors and exits when EndOfLog < minRecoveryPoint during recovery.

Documentation is still pending from my end; I will submit it once the
patch design/approach is agreed upon.

Regards,

Venkata B N
Database Consultant

Fujitsu Australia

Attachments:

recoveryTargetIncomplete.patch-version2
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6cec027..ba417c4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -252,6 +252,7 @@ static char *archiveCleanupCommand = NULL;
 static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET;
 static bool recoveryTargetInclusive = true;
 static RecoveryTargetAction recoveryTargetAction = RECOVERY_TARGET_ACTION_PAUSE;
+static RecoveryTargetIncomplete recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PROMOTE;
 static TransactionId recoveryTargetXid;
 static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
@@ -792,10 +793,12 @@ static MemoryContext walDebugCxt = NULL;
 #endif
 
 static void readRecoveryCommandFile(void);
+static void recoveryStartsHere(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
 static void recoveryPausesHere(void);
+static void IncompleteRecoveryPause(void);
 static bool recoveryApplyDelay(XLogReaderState *record);
 static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
@@ -5018,6 +5021,26 @@ readRecoveryCommandFile(void)
 
 			recoveryTargetActionSet = true;
 		}
+		else if (strcmp(item->name, "recovery_target_incomplete") == 0)
+		{
+			if (strcmp(item->value, "pause") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PAUSE;
+			else if (strcmp(item->value, "promote") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PROMOTE;
+			else if (strcmp(item->value, "shutdown") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_SHUTDOWN;
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("invalid value for recovery parameter \"%s\": \"%s\"",
+					   "recovery_target_incomplete",
+					   item->value),
+						 errhint("Valid values are \"pause\", \"promote\", and \"shutdown\".")));
+
+			ereport(DEBUG2,
+					(errmsg_internal("recovery_target_incomplete = '%s'",
+									 item->value)));
+		}
 		else if (strcmp(item->name, "recovery_target_timeline") == 0)
 		{
 			rtliGiven = true;
@@ -5399,6 +5422,66 @@ getRecordTimestamp(XLogReaderState *record, TimestampTz *recordXtime)
 	return false;
 }
 
+/* When performing point-in-time recovery, this function checks whether
+ * the specified recovery target (recovery_target_time, recovery_target_lsn or recovery_target_xid) is prior to that of the backup,
+ * in which case recovery cannot proceed.
+ */
+
+static void
+recoveryStartsHere(void)
+{
+	/*
+	 * Check if the recovery target xid is older than the latest completed xid of the
+	 * backup
+	 */
+
+	if (recoveryTarget == RECOVERY_TARGET_XID)
+	{
+		if (TransactionIdPrecedes(recoveryTargetXid,
+							 ControlFile->checkPointCopy.latestCompletedXid))
+		{
+			ereport(FATAL,
+					(errmsg("recovery_target_xid %u is older than the latest completed XID %u", recoveryTargetXid, ControlFile->checkPointCopy.latestCompletedXid),
+					 errhint("This means that the backup being used is much later than the recovery target position.\n"
+							 "You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+
+	/*
+	 * Check if the recovery target lsn is prior to the latest checkpoint's redo
+	 * position of the backup
+	 */
+
+	if (recoveryTarget == RECOVERY_TARGET_LSN)
+	{
+		if (recoveryTargetLSN < ControlFile->checkPointCopy.redo)
+		{
+			ereport(FATAL,
+					(errmsg("recovery_target_lsn \"%X/%X\" is older than the backup start LSN \"%X/%X\"",
+							(uint32) (recoveryTargetLSN >> 32),
+							(uint32) recoveryTargetLSN, (uint32) (ControlFile->checkPointCopy.redo >> 32), (uint32) ControlFile->checkPointCopy.redo),
+					 errhint("This means that the backup being used is much later than the recovery target position.\n"
+							 "You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+
+	/*
+	 * Check if the recovery target time is prior to the current timestamp of
+	 * the backup
+	 */
+
+	if (recoveryTarget == RECOVERY_TARGET_TIME)
+	{
+		if (recoveryTargetTime < ControlFile->checkPointCopy.time)
+		{
+			ereport(FATAL,
+					(errmsg("recovery_target_time %s is older than the backup start time %s", timestamptz_to_str(recoveryTargetTime), timestamptz_to_str(ControlFile->checkPointCopy.time)),
+					 errhint("This means that the backup being used is much later than the recovery target position.\n"
+							 "You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+}
+
 /*
  * For point-in-time recovery, this function decides whether we want to
  * stop applying the XLOG before the current record.
@@ -5744,6 +5827,23 @@ SetRecoveryPause(bool recoveryPause)
 	SpinLockRelease(&XLogCtl->info_lck);
 }
 
+static void
+IncompleteRecoveryPause(void)
+{
+	/* Pause recovery at end-of-the-wal when recovery target is not reached */
+	ereport(LOG,
+			(errmsg("recovery has paused"),
+			 errhint("Execute pg_xlog_replay_resume() to continue.")));
+
+	while (RecoveryIsPaused())
+	{
+		pg_usleep(1000000L);	/* 1000 ms */
+		HandleStartupProcInterrupts();
+	}
+}
+
+
+
 /*
  * When recovery_min_apply_delay is set, we wait long enough to make sure
  * certain record types are applied at least that interval behind the master.
@@ -6132,6 +6232,9 @@ StartupXLOG(void)
 					(errmsg("starting archive recovery")));
 	}
 
+	/* Check if archive recovery can start at all */
+	recoveryStartsHere();
+
 	/*
 	 * Take ownership of the wakeup latch if we're going to sleep during
 	 * recovery.
@@ -7044,6 +7147,44 @@ StartupXLOG(void)
 						break;
 				}
 			}
+		        else
+			{
+                              		ereport(LOG,
+							(errmsg("recovery has reached end-of-the-wal and has not reached the recovery target yet"),
+							 errhint("This could be due to corrupt or missing WAL files.\n"
+								 "All the WAL files needed for the recovery must be available to proceed to the recovery target "
+								 "Or you might need to choose an earlier recovery target.")));
+
+                              /*
+                               * This is the position where we can choose to shutdown, pause
+                               * or promote at the end-of-the-wal if the intended recovery
+                               * target is not reached
+                               */
+                              switch (recoveryTargetIncomplete)
+                              {
+
+					case RECOVERY_TARGET_INCOMPLETE_SHUTDOWN:
+
+					/*
+					 * exit with special return code to request shutdown
+					 * of postmaster.  Log messages issued from postmaster.
+                                         */
+					
+					ereport(LOG,
+							(errmsg("shutdown at end-of-the-wal")));
+					proc_exit(2);
+					
+					case RECOVERY_TARGET_INCOMPLETE_PAUSE:
+
+						SetRecoveryPause(true);
+						IncompleteRecoveryPause();
+
+                                              /* drop into promote */
+
+					case RECOVERY_TARGET_INCOMPLETE_PROMOTE:
+						break;
+                              }
+                      }
 
 			/* Allow resource managers to do any required cleanup. */
 			for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
@@ -8450,6 +8591,7 @@ CreateCheckPoint(int flags)
 	checkPoint.nextXid = ShmemVariableCache->nextXid;
 	checkPoint.oldestXid = ShmemVariableCache->oldestXid;
 	checkPoint.oldestXidDB = ShmemVariableCache->oldestXidDB;
+	checkPoint.latestCompletedXid = ShmemVariableCache->latestCompletedXid;
 	LWLockRelease(XidGenLock);
 
 	LWLockAcquire(CommitTsLock, LW_SHARED);
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 20077a6..d6d0852 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -192,114 +192,116 @@ main(int argc, char *argv[])
 	snprintf(sysident_str, sizeof(sysident_str), UINT64_FORMAT,
 			 ControlFile->system_identifier);
 
-	printf(_("pg_control version number:            %u\n"),
+	printf(_("pg_control version number:		%u\n"),
 		   ControlFile->pg_control_version);
-	printf(_("Catalog version number:               %u\n"),
+	printf(_("Catalog version number:                 %u\n"),
 		   ControlFile->catalog_version_no);
-	printf(_("Database system identifier:           %s\n"),
+	printf(_("Database system identifier:             %s\n"),
 		   sysident_str);
-	printf(_("Database cluster state:               %s\n"),
+	printf(_("Database cluster state:                 %s\n"),
 		   dbState(ControlFile->state));
-	printf(_("pg_control last modified:             %s\n"),
+	printf(_("pg_control last modified:               %s\n"),
 		   pgctime_str);
-	printf(_("Latest checkpoint location:           %X/%X\n"),
+	printf(_("Latest checkpoint location:             %X/%X\n"),
 		   (uint32) (ControlFile->checkPoint >> 32),
 		   (uint32) ControlFile->checkPoint);
-	printf(_("Prior checkpoint location:            %X/%X\n"),
+	printf(_("Prior checkpoint location:              %X/%X\n"),
 		   (uint32) (ControlFile->prevCheckPoint >> 32),
 		   (uint32) ControlFile->prevCheckPoint);
-	printf(_("Latest checkpoint's REDO location:    %X/%X\n"),
+	printf(_("Latest checkpoint's REDO location:      %X/%X\n"),
 		   (uint32) (ControlFile->checkPointCopy.redo >> 32),
 		   (uint32) ControlFile->checkPointCopy.redo);
-	printf(_("Latest checkpoint's REDO WAL file:    %s\n"),
+	printf(_("Latest checkpoint's REDO WAL file:      %s\n"),
 		   xlogfilename);
-	printf(_("Latest checkpoint's TimeLineID:       %u\n"),
+	printf(_("Latest checkpoint's TimeLineID:         %u\n"),
 		   ControlFile->checkPointCopy.ThisTimeLineID);
-	printf(_("Latest checkpoint's PrevTimeLineID:   %u\n"),
+	printf(_("Latest checkpoint's PrevTimeLineID:     %u\n"),
 		   ControlFile->checkPointCopy.PrevTimeLineID);
-	printf(_("Latest checkpoint's full_page_writes: %s\n"),
+	printf(_("Latest checkpoint's full_page_writes:   %s\n"),
 		   ControlFile->checkPointCopy.fullPageWrites ? _("on") : _("off"));
-	printf(_("Latest checkpoint's NextXID:          %u:%u\n"),
+	printf(_("Latest checkpoint's latestCompletedXID: %u\n"),
+		   ControlFile->checkPointCopy.latestCompletedXid);
+	printf(_("Latest checkpoint's NextXID:            %u:%u\n"),
 		   ControlFile->checkPointCopy.nextXidEpoch,
 		   ControlFile->checkPointCopy.nextXid);
-	printf(_("Latest checkpoint's NextOID:          %u\n"),
+	printf(_("Latest checkpoint's NextOID:            %u\n"),
 		   ControlFile->checkPointCopy.nextOid);
-	printf(_("Latest checkpoint's NextMultiXactId:  %u\n"),
+	printf(_("Latest checkpoint's NextMultiXactId:    %u\n"),
 		   ControlFile->checkPointCopy.nextMulti);
-	printf(_("Latest checkpoint's NextMultiOffset:  %u\n"),
+	printf(_("Latest checkpoint's NextMultiOffset:    %u\n"),
 		   ControlFile->checkPointCopy.nextMultiOffset);
-	printf(_("Latest checkpoint's oldestXID:        %u\n"),
+	printf(_("Latest checkpoint's oldestXID:          %u\n"),
 		   ControlFile->checkPointCopy.oldestXid);
-	printf(_("Latest checkpoint's oldestXID's DB:   %u\n"),
+	printf(_("Latest checkpoint's oldestXID's DB:     %u\n"),
 		   ControlFile->checkPointCopy.oldestXidDB);
-	printf(_("Latest checkpoint's oldestActiveXID:  %u\n"),
+	printf(_("Latest checkpoint's oldestActiveXID:    %u\n"),
 		   ControlFile->checkPointCopy.oldestActiveXid);
-	printf(_("Latest checkpoint's oldestMultiXid:   %u\n"),
+	printf(_("Latest checkpoint's oldestMultiXid:     %u\n"),
 		   ControlFile->checkPointCopy.oldestMulti);
-	printf(_("Latest checkpoint's oldestMulti's DB: %u\n"),
+	printf(_("Latest checkpoint's oldestMulti's DB:   %u\n"),
 		   ControlFile->checkPointCopy.oldestMultiDB);
-	printf(_("Latest checkpoint's oldestCommitTsXid:%u\n"),
+	printf(_("Latest checkpoint's oldestCommitTsXid:  %u\n"),
 		   ControlFile->checkPointCopy.oldestCommitTsXid);
-	printf(_("Latest checkpoint's newestCommitTsXid:%u\n"),
+	printf(_("Latest checkpoint's newestCommitTsXid:  %u\n"),
 		   ControlFile->checkPointCopy.newestCommitTsXid);
-	printf(_("Time of latest checkpoint:            %s\n"),
+	printf(_("Time of latest checkpoint:              %s\n"),
 		   ckpttime_str);
-	printf(_("Fake LSN counter for unlogged rels:   %X/%X\n"),
+	printf(_("Fake LSN counter for unlogged rels:     %X/%X\n"),
 		   (uint32) (ControlFile->unloggedLSN >> 32),
 		   (uint32) ControlFile->unloggedLSN);
-	printf(_("Minimum recovery ending location:     %X/%X\n"),
+	printf(_("Minimum recovery ending location:       %X/%X\n"),
 		   (uint32) (ControlFile->minRecoveryPoint >> 32),
 		   (uint32) ControlFile->minRecoveryPoint);
-	printf(_("Min recovery ending loc's timeline:   %u\n"),
+	printf(_("Min recovery ending loc's timeline:     %u\n"),
 		   ControlFile->minRecoveryPointTLI);
-	printf(_("Backup start location:                %X/%X\n"),
+	printf(_("Backup start location:                  %X/%X\n"),
 		   (uint32) (ControlFile->backupStartPoint >> 32),
 		   (uint32) ControlFile->backupStartPoint);
-	printf(_("Backup end location:                  %X/%X\n"),
+	printf(_("Backup end location:                    %X/%X\n"),
 		   (uint32) (ControlFile->backupEndPoint >> 32),
 		   (uint32) ControlFile->backupEndPoint);
-	printf(_("End-of-backup record required:        %s\n"),
+	printf(_("End-of-backup record required:          %s\n"),
 		   ControlFile->backupEndRequired ? _("yes") : _("no"));
-	printf(_("wal_level setting:                    %s\n"),
+	printf(_("wal_level setting:                      %s\n"),
 		   wal_level_str(ControlFile->wal_level));
-	printf(_("wal_log_hints setting:                %s\n"),
+	printf(_("wal_log_hints setting:                  %s\n"),
 		   ControlFile->wal_log_hints ? _("on") : _("off"));
-	printf(_("max_connections setting:              %d\n"),
+	printf(_("max_connections setting:                %d\n"),
 		   ControlFile->MaxConnections);
-	printf(_("max_worker_processes setting:         %d\n"),
+	printf(_("max_worker_processes setting:           %d\n"),
 		   ControlFile->max_worker_processes);
-	printf(_("max_prepared_xacts setting:           %d\n"),
+	printf(_("max_prepared_xacts setting:             %d\n"),
 		   ControlFile->max_prepared_xacts);
-	printf(_("max_locks_per_xact setting:           %d\n"),
+	printf(_("max_locks_per_xact setting:             %d\n"),
 		   ControlFile->max_locks_per_xact);
-	printf(_("track_commit_timestamp setting:       %s\n"),
+	printf(_("track_commit_timestamp setting:         %s\n"),
 		   ControlFile->track_commit_timestamp ? _("on") : _("off"));
-	printf(_("Maximum data alignment:               %u\n"),
+	printf(_("Maximum data alignment:                 %u\n"),
 		   ControlFile->maxAlign);
 	/* we don't print floatFormat since can't say much useful about it */
-	printf(_("Database block size:                  %u\n"),
+	printf(_("Database block size:                    %u\n"),
 		   ControlFile->blcksz);
-	printf(_("Blocks per segment of large relation: %u\n"),
+	printf(_("Blocks per segment of large relation:   %u\n"),
 		   ControlFile->relseg_size);
-	printf(_("WAL block size:                       %u\n"),
+	printf(_("WAL block size:                         %u\n"),
 		   ControlFile->xlog_blcksz);
-	printf(_("Bytes per WAL segment:                %u\n"),
+	printf(_("Bytes per WAL segment:                  %u\n"),
 		   ControlFile->xlog_seg_size);
-	printf(_("Maximum length of identifiers:        %u\n"),
+	printf(_("Maximum length of identifiers:          %u\n"),
 		   ControlFile->nameDataLen);
-	printf(_("Maximum columns in an index:          %u\n"),
+	printf(_("Maximum columns in an index:            %u\n"),
 		   ControlFile->indexMaxKeys);
-	printf(_("Maximum size of a TOAST chunk:        %u\n"),
+	printf(_("Maximum size of a TOAST chunk:          %u\n"),
 		   ControlFile->toast_max_chunk_size);
-	printf(_("Size of a large-object chunk:         %u\n"),
+	printf(_("Size of a large-object chunk:           %u\n"),
 		   ControlFile->loblksize);
-	printf(_("Date/time type storage:               %s\n"),
+	printf(_("Date/time type storage:                 %s\n"),
 		   (ControlFile->enableIntTimes ? _("64-bit integers") : _("floating-point numbers")));
-	printf(_("Float4 argument passing:              %s\n"),
+	printf(_("Float4 argument passing:                %s\n"),
 		   (ControlFile->float4ByVal ? _("by value") : _("by reference")));
-	printf(_("Float8 argument passing:              %s\n"),
+	printf(_("Float8 argument passing:                %s\n"),
 		   (ControlFile->float8ByVal ? _("by value") : _("by reference")));
-	printf(_("Data page checksum version:           %u\n"),
+	printf(_("Data page checksum version:             %u\n"),
 		   ControlFile->data_checksum_version);
 	return 0;
 }
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index ceb0462..35eabf2 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -256,6 +256,18 @@ typedef enum
 } RecoveryTargetAction;
 
 /*
+ * Recovery target incomplete.
+ */
+
+typedef enum
+{
+	RECOVERY_TARGET_INCOMPLETE_PAUSE,
+	RECOVERY_TARGET_INCOMPLETE_PROMOTE,
+	RECOVERY_TARGET_INCOMPLETE_SHUTDOWN
+} RecoveryTargetIncomplete;
+
+
+/*
  * Method table for resource managers.
  *
  * This struct must be kept in sync with the PG_RMGR definition in
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 0bc41ab..20da26b 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -21,7 +21,7 @@
 
 
 /* Version identifier for this pg_control format */
-#define PG_CONTROL_VERSION	960
+#define PG_CONTROL_VERSION	100
 
 /*
  * Body of CheckPoint XLOG records.  This is declared here because we keep
@@ -58,6 +58,7 @@ typedef struct CheckPoint
 	 * set to InvalidTransactionId.
 	 */
 	TransactionId oldestActiveXid;
+	TransactionId latestCompletedXid;
 } CheckPoint;
 
 /* XLOG info values for XLOG rmgr */
#22Jaime Casanova
jaime.casanova@2ndquadrant.com
In reply to: Stephen Frost (#5)
Re: patch proposal

On 16 August 2016 at 09:06, Stephen Frost <sfrost@snowman.net> wrote:

Greetings,

* Venkata B Nagothi (nag1010@gmail.com) wrote:

The above said parameters can be configured to pause, shutdown or prevent
promotion only after reaching the recovery target point.
To clarify, I am referring to a scenario where the recovery target point is
not reached at all (I mean, a half-complete or incomplete recovery) and there
are lots of WALs still pending to be replayed - in this situation,

PG doesn't know that there are still WALs to be replayed.

PostgreSQL just completes the archive recovery until the end of the last
available WAL (WAL file "00000001000000000000001E" in my case) and
starts-up the cluster by generating an error message (saying
"00000001000000000000001F" not found).

That's not a PG error, that's an error from cp. From PG's perspective,
your restore command has said that all of the WAL has been replayed.

If that's not what you want then change your restore command to return
an exit code > 125, which tells PG that it's unable to restore that WAL
segment.

Ah! Is this documented somewhere?
If we expect people to use correct exit codes, we should document them, no?
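
(For reference, a minimal sketch of such a wrapper -- purely hypothetical,
with an assumed archive path: exiting with status 1 tells PG the segment is
simply not in the archive, while any exit status above 125 makes the
startup process abort recovery rather than treating it as end-of-archive.)

    #!/usr/bin/perl
    # restore_wal.pl -- hypothetical restore_command wrapper, used as:
    #   restore_command = 'restore_wal.pl %f %p'
    use strict;
    use warnings;
    use File::Copy;

    my ($fname, $dest) = @ARGV;
    my $archive = '/mnt/server/archivedir';    # assumed archive location

    # Archive mount missing entirely: exit > 125 so recovery aborts
    # instead of this being treated as a normal end of archive.
    exit 126 unless -d $archive;

    # Segment genuinely absent: the normal end-of-archive signal.
    exit 1 unless -f "$archive/$fname";

    copy("$archive/$fname", $dest) or exit 126;
    exit 0;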

--
Jaime Casanova www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Jaime Casanova (#22)
Re: patch proposal

Hi Abhijit,

This is a gentle reminder.

You are assigned as a reviewer for the current patch in the 11-2016
commitfest, but you haven't shared your review yet. Please share your
review of the patch; this will help us in the smoother operation of the
commitfest.

Please ignore this if you already shared your review.

Regards,
Hari Babu
Fujitsu Australia

#24Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#23)
Re: patch proposal

On Tue, Nov 22, 2016 at 10:17 PM, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

Hi Abhijit,

This is a gentle reminder.

You are assigned as a reviewer for the current patch in the 11-2016
commitfest, but you haven't shared your review yet. Please share your
review of the patch; this will help us in the smoother operation of the
commitfest.

Please ignore this if you already shared your review.

Moved to next CF with "needs review" status.

Regards,
Hari Babu
Fujitsu Australia

#25David Steele
david@pgmasters.net
In reply to: Venkata B Nagothi (#21)
Re: patch proposal

Hi Venkata,

On 11/8/16 5:47 PM, Venkata B Nagothi wrote:

Attached is the 2nd version of the patch with some enhancements.

Here's my review of the patch.

1) There are a number of whitespace errors when applying the patch:

$ git apply ../other/recoveryTargetIncomplete.patch-version2
../other/recoveryTargetIncomplete.patch-version2:158: indent with spaces.
else
../other/recoveryTargetIncomplete.patch-version2:160: space before tab in indent.
ereport(LOG,
../other/recoveryTargetIncomplete.patch-version2:166: indent with spaces.
/*
../other/recoveryTargetIncomplete.patch-version2:167: indent with spaces.
* This is the position where we can choose to shutdown, pause
../other/recoveryTargetIncomplete.patch-version2:168: indent with spaces.
* or promote at the end-of-the-wal if the intended recovery
warning: squelched 10 whitespace errors
warning: 15 lines add whitespace errors.

The build did succeed after applying the patch.

2) There is no documentation. Serious changes to recovery are going to
need to be documented very carefully. It will also help during review
to determine your intent.

3) There are no tests. I believe that
src/test/recovery/t/006_logical_decoding.pl would be the best place to
insert your tests.

4) I'm not convinced that fatal errors by default are the way to go.
It's a pretty big departure from what we have now.

Also, I don't think it's very intuitive how recovery_target_incomplete
works. For instance, if I set recovery_target_incomplete=promote and
set recovery_target_time, but pick a backup after the specified time,
then there will be an error "recovery_target_time %s is older..." rather
than a promotion.

It seems to me that there needs to be a separate setting to raise a
fatal error.

5) There are really two patches here. One deals with notifying the user
that replay to their recovery target is not possible (implemented here
with fatal errors) and the other allows options when their recovery
target appears to be possible but never arrives.

As far as I can see, at this point, nobody has really signed on to this
approach. I believe you will need a consensus before you can proceed
with a patch this disruptive.

A better approach may be to pull the first part out and raise warnings
instead. This will allow you to write tests and make sure that your
logic is correct without major behavioral changes to Postgres.

I'm marking this as "Waiting on Author", but honestly I think "Returned
with Feedback" might be more appropriate. I'll leave that up to the CFM.

--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26Michael Paquier
michael.paquier@gmail.com
In reply to: David Steele (#25)
Re: patch proposal

On Tue, Jan 24, 2017 at 7:22 AM, David Steele <david@pgmasters.net> wrote:

3) There are no tests. I believe that
src/test/recovery/t/006_logical_decoding.pl would be the best place to
insert your tests.

003_recovery_targets.pl is the test for recovery targets. The routines
it has would be helpful.

I'm marking this as "Waiting on Author", but honestly I think "Returned
with Feedback" might be more appropriate. I'll leave that up to the CFM.

We could wait a couple of days before that, this patch just got a
review. (Thanks!)
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27David Steele
david@pgmasters.net
In reply to: Michael Paquier (#26)
Re: patch proposal

On 1/23/17 10:52 PM, Michael Paquier wrote:

On Tue, Jan 24, 2017 at 7:22 AM, David Steele <david@pgmasters.net> wrote:

3) There are no tests. I believe that
src/test/recovery/t/006_logical_decoding.pl would be the best place to
insert your tests.

003_recovery_targets.pl is the test for recovery targets. The routines
it has would be helpful.

Thanks for the correction, that's what I meant. Auto-complete is not my
friend.

--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28Venkata B Nagothi
nag1010@gmail.com
In reply to: David Steele (#25)
2 attachment(s)
Re: patch proposal

Hi David,

On Tue, Jan 24, 2017 at 9:22 AM, David Steele <david@pgmasters.net> wrote:

Hi Venkata,

On 11/8/16 5:47 PM, Venkata B Nagothi wrote:

Attached is the 2nd version of the patch with some enhancements.

Here's my review of the patch.

Thank you very much for reviewing the patch.

1) There are a number of whitespace errors when applying the patch:

$ git apply ../other/recoveryTargetIncomplete.patch-version2
../other/recoveryTargetIncomplete.patch-version2:158: indent with spaces.
else
../other/recoveryTargetIncomplete.patch-version2:160: space before tab in indent.
ereport(LOG,
../other/recoveryTargetIncomplete.patch-version2:166: indent with spaces.
/*
../other/recoveryTargetIncomplete.patch-version2:167: indent with spaces.
* This is the position where we can choose to shutdown, pause
../other/recoveryTargetIncomplete.patch-version2:168: indent with spaces.
* or promote at the end-of-the-wal if the intended recovery
warning: squelched 10 whitespace errors
warning: 15 lines add whitespace errors.

The build did succeed after applying the patch.

Sorry, my bad. The attached patch should fix those errors.

2) There is no documentation. Serious changes to recovery are going to
need to be documented very carefully. It will also help during review
to determine you intent.

Attached patches include the documentation as well.

3) There are no tests. I believe that
src/test/recovery/t/006_logical_decoding.pl would be the best place to
insert your tests.

I will be adding the tests in src/test/recovery/t/003_recovery_targets.pl.
My tests would look more or less similar to those for recovery_target_action.
I do not see any tests added for the parameter recovery_target_action?
Anyway, I will work on adding tests for recovery_target_incomplete.

4) I'm not convinced that fatal errors by default are the way to go.
It's a pretty big departure from what we have now.

I have changed the code to generate ERRORs instead of FATALs, which is in
the patch recoveryStartPoint.patch.

Also, I don't think it's very intuitive how recovery_target_incomplete
works. For instance, if I set recovery_target_incomplete=promote and
set recovery_target_time, but pick a backup after the specified time,
then there will be an error "recovery_target_time %s is older..." rather
than a promotion.

Yes, that is correct, and that is the expected behaviour. Let me explain:

a) When you use the parameter recovery_target_time and pick a backup
taken after the specified target time, recovery will not proceed
further, and ideally it should not.
    This is not an incomplete-recovery situation either. This is a
situation where the recovery process cannot start at all. With this patch,
"recovery_target_time" (similarly recovery_target_xid and
recovery_target_lsn) first checks whether the specified target time is
later than the backup's timestamp; if so, recovery proceeds, otherwise
it generates an error,
which is what happened in your case. The ideal way to deal with
this situation is to change recovery_target_time to a later time
(one that is later than the backup's time),
use the parameter "recovery_target=immediate" to complete the
recovery and start the database, or choose the correct backup (one with a
timestamp prior to the recovery_target_time).
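
Using the timestamps from the scenario quoted earlier in the thread
(backup consistent at 2016-08-26 08:30:00), a recovery.conf that would
trip this check, together with possible remedies, might look roughly like
this (illustrative values only):

    restore_command = 'cp /mnt/server/archivedir/%f %p'
    recovery_target_time = '2016-08-26 08:29:30'  # earlier than the backup: ERROR

    # Remedies: a later target, recovery to consistency only, or an older backup:
    # recovery_target_time = '2016-08-26 09:00:00'
    # recovery_target = 'immediate'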

b) recovery_target_incomplete has no effect in the above situation, as this
parameter only deals with incomplete recovery: a situation wherein the
recovery process stops midway (for example, due to missing or corrupt
WALs) without reaching the intended recovery target (XID, time or LSN).
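
For completeness, a minimal recovery.conf sketch exercising the new
setting (the XID is illustrative):

    restore_command = 'cp /mnt/server/archivedir/%f %p'
    recovery_target_xid = '559'
    recovery_target_incomplete = 'shutdown'  # or 'pause', or 'promote' (the default)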

c) In a real-world environment, when a DBA is required to perform PITR to a
particular recovery target (say, recovery_target_time),
it is the DBA who knows to what point the database needs to be recovered
and which backup needs to be used. The first thing the DBA would do is
choose an appropriate backup taken prior to the recovery target. This is
the basic first step needed.

It seems to me that there needs to be a separate setting to raise a fatal error.

5) There are really two patches here. One deals with notifying the user
that replay to their recovery target is not possible (implemented here
with fatal errors) and the other allows options when their recovery
target appears to be possible but never arrives.

As far as I can see, at this point, nobody has really signed on to this
approach. I believe you will need a consensus before you can proceed
with a patch this disruptive.

A better approach may be to pull the first part out and raise warnings
instead. This will allow you to write tests and make sure that your
logic is correct without major behavioral changes to Postgres.

This patch is to ensure that recovery starts only if the specified
recovery target is legitimate; if not, recovery errors out without
replaying any WALs (this was the agreed approach).
"Warnings" may not help much. I have split the patch into two different
pieces: one determines whether the recovery can start at all, and the
other deals with the incomplete-recovery situation.

Please let me know if you have any further comments.

Regards,
Venkata B N

Database Consultant

Attachments:

recoveryStartPoint.patch
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 8c24ae2..59ef116 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -211,6 +211,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
         The precise stopping point is also influenced by
         <xref linkend="recovery-target-inclusive">.
        </para>
+       <para>
+        This parameter also checks if the checkpoint time of the backup being used is earlier than the specified recovery_target_time. 
+        In other words, the recovery will not proceed further if the checkpoint time of the backup is later than the specified recovery_target_time.
+       </para>
+
       </listitem>
      </varlistentry>
 
@@ -231,6 +236,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
         The precise stopping point is also influenced by
         <xref linkend="recovery-target-inclusive">.
        </para>
+       <para>
+        This parameter also checks if the latest completed xid of the backup is earlier than the specified recovery target xid.
+        In other words, the recovery will not proceed further if the latest completed xid of the backup is later than the specified target xid.
+       </para>
+
       </listitem>
      </varlistentry>
 
@@ -248,6 +258,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
         parameter is parsed using the system data type
         <link linkend="datatype-pg-lsn"><type>pg_lsn</></link>.
        </para>
+       <para>
+        This parameter also checks if the current checkpoint's redo position of the backup is earlier than the specified recovery target lsn.
+        In other words, the recovery will not proceed further if the latest checkpoint's redo position of the backup is later than the specified target lsn.
+       </para>
+
       </listitem>
      </varlistentry>
      </variablelist>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8379133..3043864 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -826,6 +826,7 @@ static MemoryContext walDebugCxt = NULL;
 #endif
 
 static void readRecoveryCommandFile(void);
+static void recoveryStartsHere(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
@@ -5449,6 +5450,67 @@ getRecordTimestamp(XLogReaderState *record, TimestampTz *recordXtime)
 	return false;
 }
 
+/* When performing point-in-time recovery, this function checks whether
+ * the specified recovery target (recovery_target_time, recovery_target_lsn or recovery_target_xid) is prior to that of the backup,
+ * in which case recovery cannot proceed.
+ */
+
+static void
+recoveryStartsHere(void)
+{
+	/*
+	 * Check if the recovery target xid is older than the latest completed xid
+	 * of the backup
+	 */
+
+	if (recoveryTarget == RECOVERY_TARGET_XID)
+	{
+		if (TransactionIdPrecedes(recoveryTargetXid,
+							 ControlFile->checkPointCopy.latestCompletedXid))
+		{
+			ereport(ERROR,
+					(errmsg("recovery_target_xid %u is older than the latest completed XID %u", recoveryTargetXid, ControlFile->checkPointCopy.latestCompletedXid),
+					 errhint("This means that the backup being used is much later than the recovery target position.\n"
+							 "You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+
+	/*
+	 * Check if the recovery target lsn is prior to the latest checkpoint's
+	 * redo position of the backup
+	 */
+
+	if (recoveryTarget == RECOVERY_TARGET_LSN)
+	{
+		if (recoveryTargetLSN < ControlFile->checkPointCopy.redo)
+		{
+			ereport(ERROR,
+					(errmsg("recovery_target_lsn \"%X/%X\" is older than the backup start LSN \"%X/%X\"",
+							(uint32) (recoveryTargetLSN >> 32),
+							(uint32) recoveryTargetLSN, (uint32) (ControlFile->checkPointCopy.redo >> 32), (uint32) ControlFile->checkPointCopy.redo),
+					 errhint("This means that the backup being used is much later than the recovery target position.\n"
+							 "You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+
+	/*
+	 * Check if the recovery target time is prior to the current timestamp of
+	 * the backup
+	 */
+
+	if (recoveryTarget == RECOVERY_TARGET_TIME)
+	{
+		if (recoveryTargetTime < ControlFile->checkPointCopy.time)
+		{
+			ereport(ERROR,
+					(errmsg("recovery_target_time %s is older than the backup start time %s", timestamptz_to_str(recoveryTargetTime),
+					   timestamptz_to_str(ControlFile->checkPointCopy.time)),
+					 errhint("This means that the backup being used is much later than the recovery target position.\n"
+							 "You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+}
+
 /*
  * For point-in-time recovery, this function decides whether we want to
  * stop applying the XLOG before the current record.
@@ -6182,6 +6244,9 @@ StartupXLOG(void)
 					(errmsg("starting archive recovery")));
 	}
 
+	/* Check if archive recovery can start at all */
+	recoveryStartsHere();
+
 	/*
 	 * Take ownership of the wakeup latch if we're going to sleep during
 	 * recovery.
@@ -8531,6 +8596,7 @@ CreateCheckPoint(int flags)
 	checkPoint.nextXid = ShmemVariableCache->nextXid;
 	checkPoint.oldestXid = ShmemVariableCache->oldestXid;
 	checkPoint.oldestXidDB = ShmemVariableCache->oldestXidDB;
+	checkPoint.latestCompletedXid = ShmemVariableCache->latestCompletedXid;
 	LWLockRelease(XidGenLock);
 
 	LWLockAcquire(CommitTsLock, LW_SHARED);
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 20077a6..f340f1c 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -222,6 +222,8 @@ main(int argc, char *argv[])
 	printf(_("Latest checkpoint's NextXID:          %u:%u\n"),
 		   ControlFile->checkPointCopy.nextXidEpoch,
 		   ControlFile->checkPointCopy.nextXid);
+	printf(_("Latest checkpoint's latestCompletedXID: %u\n"),
+		   ControlFile->checkPointCopy.latestCompletedXid);
 	printf(_("Latest checkpoint's NextOID:          %u\n"),
 		   ControlFile->checkPointCopy.nextOid);
 	printf(_("Latest checkpoint's NextMultiXactId:  %u\n"),
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 23731e9..7ed9484 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -21,7 +21,7 @@
 
 
 /* Version identifier for this pg_control format */
-#define PG_CONTROL_VERSION	960
+#define PG_CONTROL_VERSION	100
 
 /*
  * Body of CheckPoint XLOG records.  This is declared here because we keep
@@ -58,6 +58,7 @@ typedef struct CheckPoint
 	 * set to InvalidTransactionId.
 	 */
 	TransactionId oldestActiveXid;
+	TransactionId latestCompletedXid;
 } CheckPoint;
 
 /* XLOG info values for XLOG rmgr */
recoveryTargetIncomplete.patch-version3
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 8c24ae2..fa580b7 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -349,6 +349,48 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
       </listitem>
      </varlistentry>
 
+     <varlistentry id="recovery-target-incomplete"
+                   xreflabel="recovery_target_incomplete">
+      <term><varname>recovery_target_incomplete</varname> (<type>enum</type>)
+      <indexterm>
+        <primary><varname>recovery_target_incomplete</> recovery parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies what action the server should take if the end of WAL is
+        reached before the recovery target. The default is <literal>promote</>, which means recovery will
+        finish and the server will start to accept connections. <literal>pause</> means the recovery process will be paused.
+        Finally, <literal>shutdown</> will stop the server if the recovery cannot proceed any further.
+       </para>
+       <para>
+        The intended use of the <literal>pause</> setting is to allow queries
+        to be executed against the database to check if this recovery target
+        is the most desirable point for recovery.
+        The paused state can be resumed by
+        using <function>pg_xlog_replay_resume()</> (see
+        <xref linkend="functions-recovery-control-table">), which then
+        causes recovery to end. If this recovery target is not the
+        desired stopping point, then shut down the server, change the
+        recovery target settings to a later target and restart to
+        continue recovery.
+       </para>
+       <para>
+        The <literal>shutdown</> setting is useful to have the instance ready
+        at the exact replay point desired.  The instance will still be able to
+        replay more WAL records (and in fact will have to replay WAL records
+        since the last checkpoint next time it is started).
+       </para>
+       <para>
+        Note that because <filename>recovery.conf</> will not be renamed when
+        <varname>recovery_target_incomplete</> is set to <literal>shutdown</>,
+        any subsequent start will end with immediate shutdown unless the
+        configuration is changed or the <filename>recovery.conf</> file is
+        removed manually.
+       </para>
+      </listitem>
+     </varlistentry>
+
     </variablelist>
    </sect1>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8379133..343f5b6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -252,6 +252,7 @@ static char *archiveCleanupCommand = NULL;
 static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET;
 static bool recoveryTargetInclusive = true;
 static RecoveryTargetAction recoveryTargetAction = RECOVERY_TARGET_ACTION_PAUSE;
+static RecoveryTargetIncomplete recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PROMOTE;
 static TransactionId recoveryTargetXid;
 static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
@@ -830,6 +831,7 @@ static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
 static void recoveryPausesHere(void);
+static void IncompleteRecoveryPause(void);
 static bool recoveryApplyDelay(XLogReaderState *record);
 static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
@@ -5068,6 +5070,26 @@ readRecoveryCommandFile(void)
 
 			recoveryTargetActionSet = true;
 		}
+		else if (strcmp(item->name, "recovery_target_incomplete") == 0)
+		{
+			if (strcmp(item->value, "pause") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PAUSE;
+			else if (strcmp(item->value, "promote") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PROMOTE;
+			else if (strcmp(item->value, "shutdown") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_SHUTDOWN;
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("invalid value for recovery parameter \"%s\": \"%s\"",
+					   "recovery_target_incomplete",
+					   item->value),
+						 errhint("Valid values are \"pause\", \"promote\", and \"shutdown\".")));
+
+			ereport(DEBUG2,
+					(errmsg_internal("recovery_target_incomplete = '%s'",
+									 item->value)));
+		}
 		else if (strcmp(item->name, "recovery_target_timeline") == 0)
 		{
 			rtliGiven = true;
@@ -5794,6 +5816,21 @@ SetRecoveryPause(bool recoveryPause)
 	SpinLockRelease(&XLogCtl->info_lck);
 }
 
+static void
+IncompleteRecoveryPause(void)
+{
+	/* Pause recovery at end-of-the-wal when recovery target is not reached */
+	ereport(LOG,
+			(errmsg("recovery has paused"),
+			 errhint("Execute pg_xlog_replay_resume() to continue.")));
+
+	while (RecoveryIsPaused())
+	{
+		pg_usleep(1000000L);	/* 1000 ms */
+		HandleStartupProcInterrupts();
+	}
+}
+
 /*
  * When recovery_min_apply_delay is set, we wait long enough to make sure
  * certain record types are applied at least that interval behind the master.
@@ -7094,6 +7131,46 @@ StartupXLOG(void)
 						break;
 				}
 			}
+			else
+			{
+				ereport(LOG,
+						(errmsg("recovery has reached end-of-the-wal and has not reached the recovery target yet"),
+				errhint("This could be due to corrupt or missing WAL files.\n"
+						"All the WAL files needed for the recovery must be available to proceed to the recovery target, "
+				"or you might need to choose an earlier recovery target.")));
+
+				/*
+				 * This is the position where we can choose to shutdown, pause
+				 * or promote at the end-of-the-wal if the intended recovery
+				 * target is not reached
+				 */
+				switch (recoveryTargetIncomplete)
+				{
+
+					case RECOVERY_TARGET_INCOMPLETE_SHUTDOWN:
+
+						/*
+						 * exit with special return code to request shutdown
+						 * of postmaster.  Log messages issued from
+						 * postmaster.
+						 */
+
+						ereport(LOG,
+								(errmsg("shutdown at end-of-the-wal")));
+						proc_exit(2);
+
+					case RECOVERY_TARGET_INCOMPLETE_PAUSE:
+
+						SetRecoveryPause(true);
+						IncompleteRecoveryPause();
+
+						/* drop into promote */
+
+					case RECOVERY_TARGET_INCOMPLETE_PROMOTE:
+						break;
+				}
+
+			}
 
 			/* Allow resource managers to do any required cleanup. */
 			for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 8ad4d47..e46f50d 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -256,6 +256,17 @@ typedef enum
 } RecoveryTargetAction;
 
 /*
+ * Recovery target incomplete.
+ */
+
+typedef enum
+{
+	RECOVERY_TARGET_INCOMPLETE_PAUSE,
+	RECOVERY_TARGET_INCOMPLETE_PROMOTE,
+	RECOVERY_TARGET_INCOMPLETE_SHUTDOWN
+}	RecoveryTargetIncomplete;
+
+/*
  * Method table for resource managers.
  *
  * This struct must be kept in sync with the PG_RMGR definition in
#29David Steele
david@pgmasters.net
In reply to: Venkata B Nagothi (#28)
Re: patch proposal

On 1/27/17 3:19 AM, Venkata B Nagothi wrote:

I will be adding the tests in
src/test/recovery/t/003_recovery_targets.pl. My tests would look more or less
similar to those for recovery_target_action. I do not see any tests added
for the parameter recovery_target_action? Anyway, I will work on
adding tests for recovery_target_incomplete.

Do you know when those will be ready?

4) I'm not convinced that fatal errors by default are the way to go.
It's a pretty big departure from what we have now.

I have changed the code to generate ERRORs instead of FATALs. Which is
in the patch recoveryStartPoint.patch

I think at this point in recovery any ERROR at all will probably be
fatal. Once you have some tests in place we'll know for sure.

Either way, the goal would be to keep recovery from proceeding and let
the user adjust their targets. This is a big behavioral change
which will need buy-in from the community.

Also, I don't think it's very intuitive how recovery_target_incomplete
works. For instance, if I set recovery_target_incomplete=promote and
set recovery_target_time, but pick a backup after the specified time,
then there will be an error "recovery_target_time %s is older..." rather
than a promotion.

Yes, that is correct, and that is the expected behaviour. Let me explain:

My point was that this combination of options could lead to confusion on
the part of users. However, I'd rather leave that alone for the time
being and focus on the first patch.

I have split the patch into two different
pieces. One is to determine if the recovery can start at all and other
patch deals with the incomplete recovery situation.

I think the first patch looks promising and I would rather work through
that before considering the second patch. Once there are tests for the
first patch I will complete my review.

--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30Michael Paquier
michael.paquier@gmail.com
In reply to: David Steele (#29)
Re: patch proposal

On Tue, Jan 31, 2017 at 4:49 AM, David Steele <david@pgmasters.net> wrote:

On 1/27/17 3:19 AM, Venkata B Nagothi wrote:

I have split the patch into two different
pieces. One is to determine if the recovery can start at all and other
patch deals with the incomplete recovery situation.

I think the first patch looks promising and I would rather work through
that before considering the second patch. Once there are tests for the
first patch I will complete my review.

Based on that, I am moving the patch to the next CF with "Needs Review".
Venkata, please be careful to update the patch status correctly; it
was still on "Waiting on Author".
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#31Venkata B Nagothi
nag1010@gmail.com
In reply to: David Steele (#29)
Re: patch proposal

On Tue, Jan 31, 2017 at 6:49 AM, David Steele <david@pgmasters.net> wrote:

On 1/27/17 3:19 AM, Venkata B Nagothi wrote:

I will be adding the tests in
src/test/recovery/t/003_recovery_targets.pl. My tests would look more or less
similar to those for recovery_target_action. I do not see any tests added
for the parameter recovery_target_action? Anyway, I will work on
adding tests for recovery_target_incomplete.

Do you know when those will be ready?

Apologies for the delayed response. I am working through adding the
tests for both patches, starting with the tests for the
recovery_target_incomplete parameter.
I have hit an issue while adding the tests, which I have explained below.

4) I'm not convinced that fatal errors by default are the way to go.
It's a pretty big departure from what we have now.

I have changed the code to generate ERRORs instead of FATALs. Which is
in the patch recoveryStartPoint.patch

I think at this point in recovery any ERROR at all will probably be
fatal. Once you have some tests in place we'll know for sure.

Sure. I can add the tests for the first patch.

Either way, the goal would be to keep recovery from proceeding and let
the user adjust their targets. This is a big behavioral change
which will need buy-in from the community.

Sure. Agreed.

Also, I don't think it's very intuitive how recovery_target_incomplete
works. For instance, if I set recovery_target_incomplete=promote and
set recovery_target_time, but pick a backup after the specified time,
then there will be an error "recovery_target_time %s is older..." rather
than a promotion.

Yes, that is correct, and that is the expected behaviour. Let me explain:

My point was that this combination of options could lead to confusion on
the part of users. However, I'd rather leave that alone for the time
being and focus on the first patch.

Sure.

I have split the patch into two different
pieces. One is to determine if the recovery can start at all and other
patch deals with the incomplete recovery situation.

I think the first patch looks promising and I would rather work through
that before considering the second patch. Once there are tests for the
first patch I will complete my review.

Sure. I will submit the first patch along with tests once I can get through
the issue I am facing.

The issue I am facing while writing the tests:

Below is the problem I am facing while adding the tests to
003_recovery_targets.pl:

1. Test case: test incomplete recovery with the parameters
restore_command, recovery_target_xid and recovery_target_incomplete
= 'shutdown' (the options 'pause' and 'promote' work fine).

2. Expected test result: the standby node successfully shuts down if the
recovery process stops midway and does not reach the intended recovery
target due to unavailability of WALs.

3. All good. Everything works as expected. The problem is that the test
exits with exit code 256, bails out, and shows up as "pg_ctl failed".

Is there any way I can ensure the test does not bail out and exits with
code '0' when "pg_ctl start" is issued and the server starts and then
shuts down for any reason?

If you look at the log file, the server actually starts successfully and
then shuts down cleanly because the intended recovery target is not
reached. The messages in the log file are also appropriate.
Since it is a clean start and a clean shutdown (which means the test is
successful), the test should return success, which is not happening.

Below is the log (from a test for the second patch). I believe Perl
reports exit code 256 because pg_ctl responds with an error - "pg_ctl:
could not start server".
I would need my test case to return success in this scenario. Any
suggestions would be great.

[dba@buildhost ~]$ /data/PostgreSQL-Repo/postgresql/tmp_install/disk1/pginstall/pgsql-10-dev-master-xlog-2-wal/bin/pg_ctl -D /data/PostgreSQL-Repo/postgresql/src/test/recovery/tmp_check/data_pitr_3_U_oA/pgdata start
waiting for server to start....2017-02-22 12:06:41.773 AEDT [23858] LOG:
database system was interrupted while in recovery at log time 2017-02-15
16:40:41 AEDT
2017-02-22 12:06:41.773 AEDT [23858] HINT: If this has occurred more than
once some data might be corrupted and you might need to choose an earlier
recovery target.
2017-02-22 12:06:41.773 AEDT [23858] LOG: starting point-in-time recovery
to XID 559
2017-02-22 12:06:41.782 AEDT [23858] LOG: restored log file
"000000010000000000000002" from archive
2017-02-22 12:06:41.784 AEDT [23858] LOG: redo starts at 0/2000028
2017-02-22 12:06:41.794 AEDT [23858] LOG: restored log file
"000000010000000000000003" from archive
2017-02-22 12:06:41.814 AEDT [23858] LOG: restored log file
"000000010000000000000004" from archive
2017-02-22 12:06:41.835 AEDT [23858] LOG: restored log file
"000000010000000000000005" from archive
2017-02-22 12:06:41.858 AEDT [23858] LOG: restored log file
"000000010000000000000006" from archive
2017-02-22 12:06:41.879 AEDT [23858] LOG: restored log file
"000000010000000000000007" from archive
2017-02-22 12:06:41.891 AEDT [23858] LOG: consistent recovery state
reached at 0/8000000
2017-02-22 12:06:41.891 AEDT [23857] LOG: database system is ready to accept read only connections
cp: cannot stat ‘/data/PostgreSQL-Repo/postgresql/src/test/recovery/tmp_check/data_master_cyOw/archives/000000010000000000000008’: No such file or directory
2017-02-22 12:06:41.893 AEDT [23858] LOG: recovery has reached end-of-the-wal and has not reached the recovery target yet
2017-02-22 12:06:41.893 AEDT [23858] HINT: This could be due to corrupt or missing WAL files.
        All the WAL files needed for the recovery must be available to proceed to the recovery target, or you might need to choose an earlier recovery target.
2017-02-22 12:06:41.893 AEDT [23858] LOG: shutdown at end-of-the-wal
2017-02-22 12:06:41.893 AEDT [23857] LOG: startup process (PID 23858)
exited with exit code 2
2017-02-22 12:06:41.893 AEDT [23857] LOG: terminating any other active
server processes
2017-02-22 12:06:41.894 AEDT [23857] LOG: database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.

I will be facing the same problem while adding the tests for the first
patch. In the first patch, recovery will not proceed and the database will
shut down if the specified recovery target is not legitimate.

To summarise, I need the tests to exit with code "0" when pg_ctl
cannot start the server, or when the server shuts down after
"pg_ctl start" is issued, in certain recovery scenarios.

Thanks,

Venkata B N
Database Consultant

#32Venkata B Nagothi
nag1010@gmail.com
In reply to: Michael Paquier (#30)
Re: patch proposal

On Tue, Jan 31, 2017 at 10:41 AM, Michael Paquier <michael.paquier@gmail.com> wrote:

On Tue, Jan 31, 2017 at 4:49 AM, David Steele <david@pgmasters.net> wrote:

On 1/27/17 3:19 AM, Venkata B Nagothi wrote:

I have split the patch into two different
pieces. One is to determine if the recovery can start at all and other
patch deals with the incomplete recovery situation.

I think the first patch looks promising and I would rather work through
that before considering the second patch. Once there are tests for the
first patch I will complete my review.

Based on that, I am moving the patch to the next CF with "Needs Review".
Venkata, please be careful to update the patch status correctly; it
was still on "Waiting on Author".

Apologies. Sure. Will make a note.

Regards,

Venkata B N
Database Consultant

#33Venkata B Nagothi
nag1010@gmail.com
In reply to: David Steele (#29)
2 attachment(s)
Re: patch proposal

Hi David,

On Tue, Jan 31, 2017 at 6:49 AM, David Steele <david@pgmasters.net> wrote:

On 1/27/17 3:19 AM, Venkata B Nagothi wrote:

I will be adding the tests in
src/test/recovery/t/003_recovery_targets.pl. My tests would look more or less
similar to recovery_target_action. I do not see any of the tests added
for the parameter recovery_target_action? Anyway, I will work on
adding tests for recovery_target_incomplete.

Do you know when those will be ready?

Attached are both the patches with tests included.

4) I'm not convinced that fatal errors by default are the way to go.
It's a pretty big departure from what we have now.

I have changed the code to generate ERRORs instead of FATALs; this is
in the patch recoveryStartPoint.patch

I think at this point in recovery any ERROR at all will probably be
fatal. Once you have some tests in place we'll know for sure.

Either way, the goal would be to keep recovery from proceeding and let
the user adjust their targets. This is a big behavioral change which
will need buy-in from the community.
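
With the first patch applied, a mis-specified target is rejected at startup
along these lines (values illustrative; message text as in the attached
patch, and since this happens in the startup process the ERROR is in effect
fatal, as noted above):

    LOG:  starting point-in-time recovery to XID 559
    ERROR:  recovery_target_xid 559 is older than the latest Completed Xid 1234
    HINT:  This means that the backup being used is much later than the recovery target position.
    You might need to use a backup taken prior to the recovery target point.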

Hoping to get some comments / feedback from the community on the second
patch too.
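
For anyone who wants to try the second patch, a recovery.conf along these
lines exercises the new behaviour (parameter name and values as defined in
the attached patch; restore_command path and target XID illustrative):

    restore_command = 'cp /mnt/server/archivedir/%f "%p"'
    recovery_target_xid = '559'

    # Proposed parameter: the action taken when replay reaches end-of-WAL
    # before the target; one of 'promote' (default), 'pause', 'shutdown'.
    recovery_target_incomplete = 'shutdown'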

Also, I don't think it's very intuitive how recovery_target_incomplete
works. For instance, if I set recovery_target_incomplete=promote and
set recovery_target_time, but pick a backup after the specified time,
then there will be an error "recovery_target_time %s is older..."
rather than a promotion.

Yes, that is correct and that is the expected behaviour. Let me explain -

My point was that this combination of options could lead to confusion on
the part of users. However, I'd rather leave that alone for the time
being and focus on the first patch.

I have split the patch into two different
pieces. One is to determine if the recovery can start at all and other
patch deals with the incomplete recovery situation.

I think the first patch looks promising and I would rather work through
that before considering the second patch. Once there are tests for the
first patch I will complete my review.

I have added all the tests for the second patch and all seem to be working
fine.

Regarding the first patch, I have included all the tests except for the
test case on recovery_target_time.
I did write the test, but it is behaving oddly, which I am
working through to resolve.

Regards,

Venkata B N
Database Consultant

Attachments:

recoveryStartPoint.patch-version3 (application/octet-stream)
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index a91864b..73ef387 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -211,6 +211,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
         The precise stopping point is also influenced by
         <xref linkend="recovery-target-inclusive">.
        </para>
+       <para>
+        This parameter also checks if the checkpoint time of the backup being used is earlier than the specified recovery_target_time. 
+        In other words, the recovery will not proceed further if the checkpoint time of the backup is later than the specified recovery_target_time.
+       </para>
+
       </listitem>
      </varlistentry>
 
@@ -231,6 +236,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
         The precise stopping point is also influenced by
         <xref linkend="recovery-target-inclusive">.
        </para>
+       <para>
+        This parameter also checks if the latest completed xid of the backup is earlier than the specified recovery target xid.
+        In other words, the recovery will not proceed further if the latest completed xid of the backup is later than the specified target xid.
+       </para>
+
       </listitem>
      </varlistentry>
 
@@ -248,6 +258,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
         parameter is parsed using the system data type
         <link linkend="datatype-pg-lsn"><type>pg_lsn</></link>.
        </para>
+       <para>
+        This parameter also checks if the current checkpoint's redo position of the backup is earlier than the specified recovery target lsn.
+        In other words, the recovery will not proceed further if the latest checkpoint's redo position of the backup is later than the specified target lsn.
+       </para>
+
       </listitem>
      </varlistentry>
      </variablelist>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5016273..e8411a0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -834,6 +834,7 @@ static MemoryContext walDebugCxt = NULL;
 #endif
 
 static void readRecoveryCommandFile(void);
+static void recoveryStartsHere(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
@@ -5544,6 +5545,67 @@ getRecordTimestamp(XLogReaderState *record, TimestampTz *recordXtime)
 	return false;
 }
 
+/* When performing point-in-time-recovery, this function identifies if
+ * the specified recovery target (recovery_target_time, recovery_target_lsn and recovery_target_xid) is prior to that of the backup.
+ * Which means, recovery cannot proceed if the recovery target point is prior to backup start point.
+ */
+
+static void
+recoveryStartsHere(void)
+{
+	/*
+	 * Check if the recovery target xid is older than the latest completed xid
+	 * of the backup
+	 */
+
+	if (recoveryTarget == RECOVERY_TARGET_XID)
+	{
+		if (TransactionIdPrecedes(recoveryTargetXid,
+							 ControlFile->checkPointCopy.latestCompletedXid))
+		{
+			ereport(ERROR,
+					(errmsg("recovery_target_xid %u is older than the latest Completed Xid %u", recoveryTargetXid, ControlFile->checkPointCopy.latestCompletedXid),
+					 errhint("This means that the backup being used is much later than the recovery target position.\n"
+							 "You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+
+	/*
+	 * Check if the recovery target lsn is prior to the latest checkpoint's
+	 * redo position of the backup
+	 */
+
+	if (recoveryTarget == RECOVERY_TARGET_LSN)
+	{
+		if (recoveryTargetLSN < ControlFile->checkPointCopy.redo)
+		{
+			ereport(ERROR,
+					(errmsg("recovery_target_lsn \"%X/%X\" is older than the backup start LSN \"%X/%X\"",
+							(uint32) (recoveryTargetLSN >> 32),
+							(uint32) recoveryTargetLSN, (uint32) (ControlFile->checkPointCopy.redo >> 32), (uint32) ControlFile->checkPointCopy.redo),
+					 errhint("This means that the backup being used is much later than the recovery target position.\n"
+							 "You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+
+	/*
+	 * Check if the recovery target time is prior to the current timestamp of
+	 * the backup
+	 */
+
+	if (recoveryTarget == RECOVERY_TARGET_TIME)
+	{
+		if (recoveryTargetTime < ControlFile->checkPointCopy.time)
+		{
+			ereport(ERROR,
+					(errmsg("recovery_target_time %s is older than the backup start time %s", timestamptz_to_str(recoveryTargetTime),
+					   timestamptz_to_str(ControlFile->checkPointCopy.time)),
+					 errhint("This means that the backup being used is much later than the recovery target position.\n"
+							 "You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+}
+
 /*
  * For point-in-time recovery, this function decides whether we want to
  * stop applying the XLOG before the current record.
@@ -6277,6 +6339,9 @@ StartupXLOG(void)
 					(errmsg("starting archive recovery")));
 	}
 
+	/* Check if archive recovery can start at all */
+	recoveryStartsHere();
+
 	/*
 	 * Take ownership of the wakeup latch if we're going to sleep during
 	 * recovery.
@@ -8642,6 +8707,7 @@ CreateCheckPoint(int flags)
 	checkPoint.nextXid = ShmemVariableCache->nextXid;
 	checkPoint.oldestXid = ShmemVariableCache->oldestXid;
 	checkPoint.oldestXidDB = ShmemVariableCache->oldestXidDB;
+	checkPoint.latestCompletedXid = ShmemVariableCache->latestCompletedXid;
 	LWLockRelease(XidGenLock);
 
 	LWLockAcquire(CommitTsLock, LW_SHARED);
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index f47171d..5ba170c 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -222,6 +222,8 @@ main(int argc, char *argv[])
 	printf(_("Latest checkpoint's NextXID:          %u:%u\n"),
 		   ControlFile->checkPointCopy.nextXidEpoch,
 		   ControlFile->checkPointCopy.nextXid);
+	printf(_("Latest checkpoint's latestCompletedXID: %u\n"),
+		   ControlFile->checkPointCopy.latestCompletedXid);
 	printf(_("Latest checkpoint's NextOID:          %u\n"),
 		   ControlFile->checkPointCopy.nextOid);
 	printf(_("Latest checkpoint's NextMultiXactId:  %u\n"),
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e4194b9..a8c9101 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -58,6 +58,7 @@ typedef struct CheckPoint
 	 * set to InvalidTransactionId.
 	 */
 	TransactionId oldestActiveXid;
+	TransactionId latestCompletedXid;
 } CheckPoint;
 
 /* XLOG info values for XLOG rmgr */
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 4018f0a..fb0a48b 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -601,6 +601,11 @@ Restoring WAL segments from archives using restore_command can be enabled
 by passing the keyword parameter has_restoring => 1. This is disabled by
 default.
 
+Restoring WAL segments from archives using restore command to perform PITR
+can be enabled by passing the keyword parameter has_restoring_pitr => 1. This
+is disabled by default. By enabling this parameter, the standby database will
+not operate in standby mode.
+
 The backup is copied, leaving the original unmodified. pg_hba.conf is
 unconditionally set to enable replication connections.
 
@@ -618,6 +623,7 @@ sub init_from_backup
 	$params{hba_permit_replication} = 1
 	  unless defined $params{hba_permit_replication};
 	$params{has_restoring} = 0 unless defined $params{has_restoring};
+	$params{has_restoring_pitr} = 0 unless defined $params{has_restoring_pitr};
 
 	print
 "# Initializing node \"$node_name\" from backup \"$backup_name\" of node \"$root_name\"\n";
@@ -641,6 +647,7 @@ port = $port
 	$self->set_replication_conf         if $params{hba_permit_replication};
 	$self->enable_streaming($root_node) if $params{has_streaming};
 	$self->enable_restoring($root_node) if $params{has_restoring};
+	$self->enable_restoring_pitr($root_node) if $params{has_restoring_pitr};
 }
 
 =pod
@@ -673,6 +680,31 @@ sub start
 	$self->_update_pid;
 }
 
+
+=pod
+
+=item $node->start_pitr()
+
+Wrapper for pg_ctl start
+
+Start the node and wait until it is ready to accept connections.
+
+=cut
+
+sub start_pitr
+{
+        my ($self) = @_;
+        my $port   = $self->port;
+        my $pgdata = $self->data_dir;
+        my $name   = $self->name;
+        my $log    = $self->logfile;
+        print("### Starting node \"$name\"\n");
+        my $ret = TestLib::system_log('pg_ctl', '-D', $self->data_dir, '-l',
+               $self->logfile, 'start');
+
+	return $ret;
+}
+
 =pod
 
 =item $node->stop(mode)
@@ -789,7 +821,6 @@ sub enable_restoring
 	  $TestLib::windows_os
 	  ? qq{copy "$path\\\\%f" "%p"}
 	  : qq{cp "$path/%f" "%p"};
-
 	$self->append_conf(
 		'recovery.conf', qq(
 restore_command = '$copy_command'
@@ -797,6 +828,32 @@ standby_mode = on
 ));
 }
 
+# Internal routine to enable archive recovery command on a standby node for PITR
+sub enable_restoring_pitr
+{
+        my ($self, $root_node) = @_;
+        my $path = $root_node->archive_dir;
+        my $name = $self->name;
+
+        print "### Enabling WAL restore for node \"$name\"\n";
+
+        # On Windows, the path specified in the restore command needs to use
+        # double back-slashes to work properly and to be able to detect properly
+        # the file targeted by the copy command, so the directory value used
+        # in this routine, using only one back-slash, need to be properly changed
+        # first. Paths also need to be double-quoted to prevent failures where
+        # the path contains spaces.
+        $path =~ s{\\}{\\\\}g if ($TestLib::windows_os);
+        my $copy_command =
+          $TestLib::windows_os
+          ? qq{copy "$path\\\\%f" "%p"}
+          : qq{cp "$path/%f" "%p"};
+	$self->append_conf(
+		'recovery.conf', qq(
+restore_command = '$copy_command'
+));
+}
+
 # Internal routine to enable archiving
 sub enable_archiving
 {
diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index b7b0caa..9aa7a6c 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 11;
 
 # Create and test a standby from given backup, with a certain
 # recovery target.
@@ -45,6 +45,64 @@ sub test_recovery_standby
 	$node_standby->teardown_node;
 }
 
+# Test recovery start point for various recovery targets
+
+sub test_recovery_pitr
+{
+        my $test_name       = shift;
+        my $node_name       = shift;
+        my $node_master     = shift;
+        my $recovery_params = shift;
+	my $num_rows	    = shift;
+
+        my $node_standby = get_new_node($node_name);
+        $node_standby->init_from_backup($node_master, 'my_pitr',
+                has_restoring_pitr => 1);
+
+        foreach my $param_item (@$recovery_params)
+        {
+                $node_standby->append_conf(
+                        'recovery.conf',
+                                qq($param_item
+));
+        }
+        my $ret = $node_standby->start_pitr;
+
+	my $log = $node_standby->logfile;
+	my $irx = "is older than the latest Completed Xid";
+        my $irl = "is older than the backup start LSN";
+
+        my @rx = `grep -inr "$irx" $log`;
+        my @rl = `grep -inr "$irl" $log`;
+
+        print "matching lines for xid: @rx\n";
+        print "matching lines for lsn : @rl\n";
+
+        if ($ret != 0)
+        {
+                if (@rx)
+                {
+                        print "Recovery could not start as the specified target XID was not legitimate. Test Passed\n";
+                        isnt($ret,0, "Successful $test_name.");
+                        goto TEST_RECOVERY_LSN;
+
+                }
+                elsif(@rl)
+                {
+			print "Recovery could not start as the specified target LSN was not legitimate. Test Passed\n";
+			isnt($ret,0, "Successful $test_name.");
+			exit 0;
+
+                }
+                else
+                {
+                        print "# pg_ctl failed; logfile:\n";
+                        print TestLib::slurp_file($node_standby->logfile);
+                        BAIL_OUT("pg_ctl failed");
+                }
+        }
+}
+
 # Initialize master node
 my $node_master = get_new_node('master');
 $node_master->init(has_archiving => 1, allows_streaming => 1);
@@ -144,3 +202,51 @@ test_recovery_standby('XID + time + name',
 	"recovery_target_lsn = '$recovery_lsn'",);
 test_recovery_standby('XID + time + name + LSN',
 	'standby_9', $node_master, \@recovery_params, "5000", $lsn5);
+
+# Check if the recovery start point is legitimate
+# If the recovery start point is not appropriate, recovery should not start at all
+
+# Generate some data on master to advance the database position
+
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(4001,5000))");
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(5001,6000))");
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(6001,7000))");
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(7001,8000))");
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(8001,9000))");
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(9001,10000))");
+
+print "required data generated.\n";
+
+my $rxid = $node_master->safe_psql('postgres',
+        "SELECT pg_current_wal_location(), txid_current();");
+my ($lsn10, $recovery_lxid) = split /\|/, $rxid;
+
+# Now take a fresh backup
+
+$node_master->backup('my_pitr');
+
+print "Old LSN prior to my_pitr is : $lsn1\n";
+print "Old XID prior to my_pitr is : $recovery_txid\n";
+print "Old TIMESTAMP prior to the pitr is : $recovery_time\n";
+
+# Test the recovery start point for XID. Specify an old XID which falls prior to the backup's latest XID
+@recovery_params = (
+         "recovery_target_xid  = '$recovery_txid'",);
+
+test_recovery_pitr('Recovery Start Point check for XID',
+                 'pitr_1', $node_master, \@recovery_params, "10000");
+
+# Test the checking of the recovery start point for target LSN. Specify an old LSN which falls prior to the backup's latest LSN.
+TEST_RECOVERY_LSN:
+my $recovery_olsn = $lsn1;
+@recovery_params = (
+         "recovery_target_lsn  = '$recovery_olsn'",);
+
+test_recovery_pitr('Recovery Start Point check for LSN',
+                 'pitr_2', $node_master, \@recovery_params, "20000");
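
For reference, with the first patch applied, pg_controldata reports the new
control-file field along these lines (values illustrative):

    $ pg_controldata $PGDATA
    ...
    Latest checkpoint's NextXID:          0:560
    Latest checkpoint's latestCompletedXID: 559
    Latest checkpoint's NextOID:          24576
    ...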
recoveryTargetIncomplete.patch-version4 (application/octet-stream)
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index a91864b..9053bb0 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -349,6 +349,48 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
       </listitem>
      </varlistentry>
 
+     <varlistentry id="recovery-target-incomplete"
+                   xreflabel="recovery_target_incomplete">
+      <term><varname>recovery_target_incomplete</varname> (<type>enum</type>)
+      <indexterm>
+        <primary><varname>recovery_target_incomplete</> recovery parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies what action the server should take once the recovery target is
+        not reached. The default is <literal>promote</>, which means recovery will
+        finish and the server will start to accept connections. <literal>pause</> means the recovery process will be paused.
+        Finally <literal>shutdown</> will stop the server if the recovery cannot proceed any further.
+       </para>
+       <para>
+        The intended use of the <literal>pause</> setting is to allow queries
+        to be executed against the database to check if this recovery target
+        is the most desirable point for recovery.
+        The paused state can be resumed by
+        using <function>pg_xlog_replay_resume()</> (see
+        <xref linkend="functions-recovery-control-table">), which then
+        causes recovery to end. If this recovery target is not the
+        desired stopping point, then shut down the server, change the
+        recovery target settings to a later target and restart to
+        continue recovery.
+       </para>
+       <para>
+        The <literal>shutdown</> setting is useful to have the instance ready
+        at the exact replay point desired.  The instance will still be able to
+        replay more WAL records (and in fact will have to replay WAL records
+        since the last checkpoint next time it is started).
+       </para>
+       <para>
+        Note that because <filename>recovery.conf</> will not be renamed when
+        <varname>recovery_target_incomplete</> is set to <literal>shutdown</>,
+        any subsequent start will end with immediate shutdown unless the
+        configuration is changed or the <filename>recovery.conf</> file is
+        removed manually.
+       </para>
+      </listitem>
+     </varlistentry>
+
     </variablelist>
    </sect1>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5016273..159fe16 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -259,6 +259,7 @@ static char *archiveCleanupCommand = NULL;
 static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET;
 static bool recoveryTargetInclusive = true;
 static RecoveryTargetAction recoveryTargetAction = RECOVERY_TARGET_ACTION_PAUSE;
+static RecoveryTargetIncomplete recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PROMOTE;
 static TransactionId recoveryTargetXid;
 static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
@@ -838,6 +839,7 @@ static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
 static void recoveryPausesHere(void);
+static void IncompleteRecoveryPause(void);
 static bool recoveryApplyDelay(XLogReaderState *record);
 static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
@@ -5163,6 +5165,26 @@ readRecoveryCommandFile(void)
 
 			recoveryTargetActionSet = true;
 		}
+		else if (strcmp(item->name, "recovery_target_incomplete") == 0)
+		{
+			if (strcmp(item->value, "pause") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PAUSE;
+			else if (strcmp(item->value, "promote") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PROMOTE;
+			else if (strcmp(item->value, "shutdown") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_SHUTDOWN;
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("invalid value for recovery parameter \"%s\": \"%s\"",
+					   "recovery_target_incomplete",
+					   item->value),
+						 errhint("Valid values are \"pause\", \"promote\", and \"shutdown\".")));
+
+			ereport(DEBUG2,
+					(errmsg_internal("recovery_target_incomplete = '%s'",
+									 item->value)));
+		}
 		else if (strcmp(item->name, "recovery_target_timeline") == 0)
 		{
 			rtliGiven = true;
@@ -5889,6 +5911,21 @@ SetRecoveryPause(bool recoveryPause)
 	SpinLockRelease(&XLogCtl->info_lck);
 }
 
+static void
+IncompleteRecoveryPause(void)
+{
+	/* Pause recovery at end-of-the-wal when recovery target is not reached */
+	ereport(LOG,
+			(errmsg("recovery has paused"),
+			 errhint("Execute pg_xlog_replay_resume() to continue.")));
+
+	while (RecoveryIsPaused())
+	{
+		pg_usleep(1000000L);	/* 1000 ms */
+		HandleStartupProcInterrupts();
+	}
+}
+
 /*
  * When recovery_min_apply_delay is set, we wait long enough to make sure
  * certain record types are applied at least that interval behind the master.
@@ -7205,6 +7242,46 @@ StartupXLOG(void)
 						break;
 				}
 			}
+			else
+			{
+				ereport(LOG,
+						(errmsg("recovery has reached end-of-the-wal and has not reached the recovery target yet"),
+				errhint("This could be due to corrupt or missing WAL files.\n"
+						"All the WAL files needed for the recovery must be available to proceed to the recovery target "
+				"Or you might need to choose an earlier recovery target.")));
+
+				/*
+				 * This is the position where we can choose to shutdown, pause
+				 * or promote at the end-of-the-wal if the intended recovery
+				 * target is not reached
+				 */
+				switch (recoveryTargetIncomplete)
+				{
+
+					case RECOVERY_TARGET_INCOMPLETE_SHUTDOWN:
+
+						/*
+						 * exit with special return code to request shutdown
+						 * of postmaster.  Log messages issued from
+						 * postmaster.
+						 */
+
+						ereport(LOG,
+								(errmsg("shutdown at end-of-the-wal")));
+						proc_exit(2);
+
+					case RECOVERY_TARGET_INCOMPLETE_PAUSE:
+
+						SetRecoveryPause(true);
+						IncompleteRecoveryPause();
+
+						/* drop into promote */
+
+					case RECOVERY_TARGET_INCOMPLETE_PROMOTE:
+						break;
+				}
+
+			}
 
 			/* Allow resource managers to do any required cleanup. */
 			for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 578bff5..b9d707c 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -256,6 +256,17 @@ typedef enum
 } RecoveryTargetAction;
 
 /*
+ * Recovery target incomplete.
+ */
+
+typedef enum
+{
+	RECOVERY_TARGET_INCOMPLETE_PAUSE,
+	RECOVERY_TARGET_INCOMPLETE_PROMOTE,
+	RECOVERY_TARGET_INCOMPLETE_SHUTDOWN
+}	RecoveryTargetIncomplete;
+
+/*
  * Method table for resource managers.
  *
  * This struct must be kept in sync with the PG_RMGR definition in
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 4018f0a..e8f6969 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -601,6 +601,11 @@ Restoring WAL segments from archives using restore_command can be enabled
 by passing the keyword parameter has_restoring => 1. This is disabled by
 default.
 
+Restoring WAL segments from archives using restore command to perform PITR
+can be enabled by passing the keyword parameter has_restoring_pitr => 1. This
+is disabled by default. By enabling this parameter, the standby database will
+not operate in standby mode.
+
 The backup is copied, leaving the original unmodified. pg_hba.conf is
 unconditionally set to enable replication connections.
 
@@ -618,6 +623,7 @@ sub init_from_backup
 	$params{hba_permit_replication} = 1
 	  unless defined $params{hba_permit_replication};
 	$params{has_restoring} = 0 unless defined $params{has_restoring};
+	$params{has_restoring_pitr} = 0 unless defined $params{has_restoring_pitr};
 
 	print
 "# Initializing node \"$node_name\" from backup \"$backup_name\" of node \"$root_name\"\n";
@@ -641,6 +647,7 @@ port = $port
 	$self->set_replication_conf         if $params{hba_permit_replication};
 	$self->enable_streaming($root_node) if $params{has_streaming};
 	$self->enable_restoring($root_node) if $params{has_restoring};
+	$self->enable_restoring_pitr($root_node) if $params{has_restoring_pitr};
 }
 
 =pod
@@ -673,6 +680,68 @@ sub start
 	$self->_update_pid;
 }
 
+=item $node->pitr_start()
+
+Wrapper for pg_ctl start
+
+Start the node and wait until it is ready to accept connections.
+
+=cut
+
+sub pitr_start
+{
+        my ($self) = @_;
+        my $port   = $self->port;
+        my $pgdata = $self->data_dir;
+        my $name   = $self->name;
+	my $log	   = $self->logfile;
+	print("### Starting node \"$name\"\n");
+	my $ret = TestLib::system_log('pg_ctl', '-D', $self->data_dir, '-l',
+               $self->logfile, 'start');
+
+        if ($ret != 0)
+        {
+                my $s = "shutdown at end-of-the-wal";
+
+                my @rs = `grep "$s" $log`;
+
+                if (@rs)
+                {
+                        print "Database was shutdown at end-of-the-wal. Test Passed\n";
+			isnt($ret,0, "check the PITR node for successful SHUTDOWN at incomplete recovery");
+			exit 0;
+
+                }
+		else
+		{
+			print "# pg_ctl failed; logfile:\n";
+			print TestLib::slurp_file($self->logfile);
+			BAIL_OUT("pg_ctl failed");
+		}
+        }
+
+        $self->_update_pid;
+}
+
+=item $node->status()
+
+Wrapper for pg_ctl status
+
+Check the node status
+
+=cut
+
+sub status
+{
+        my ($self) = @_;
+        my $port   = $self->port;
+        my $pgdata = $self->data_dir;
+        my $name   = $self->name;
+        print("### Checking the node status \"$name\"\n");
+        my $ret = TestLib::system_log('pg_ctl', '-D', $self->data_dir, '-l',
+                $self->logfile, 'status');
+}
+
 =pod
 
 =item $node->stop(mode)
@@ -789,7 +858,6 @@ sub enable_restoring
 	  $TestLib::windows_os
 	  ? qq{copy "$path\\\\%f" "%p"}
 	  : qq{cp "$path/%f" "%p"};
-
 	$self->append_conf(
 		'recovery.conf', qq(
 restore_command = '$copy_command'
@@ -797,6 +865,32 @@ standby_mode = on
 ));
 }
 
+# Internal routine to enable archive recovery command on a standby node for PITR
+sub enable_restoring_pitr
+{
+        my ($self, $root_node) = @_;
+        my $path = $root_node->archive_dir;
+        my $name = $self->name;
+
+        print "### Enabling WAL restore for node \"$name\"\n";
+
+        # On Windows, the path specified in the restore command needs to use
+        # double back-slashes to work properly and to be able to detect properly
+        # the file targeted by the copy command, so the directory value used
+        # in this routine, using only one back-slash, need to be properly changed
+        # first. Paths also need to be double-quoted to prevent failures where
+        # the path contains spaces.
+        $path =~ s{\\}{\\\\}g if ($TestLib::windows_os);
+        my $copy_command =
+          $TestLib::windows_os
+          ? qq{copy "$path\\\\%f" "%p"}
+          : qq{cp "$path/%f" "%p"};
+	$self->append_conf(
+		'recovery.conf', qq(
+restore_command = '$copy_command'
+));
+}
+
 # Internal routine to enable archiving
 sub enable_archiving
 {
diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index b7b0caa..e47fb74 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 12;
 
 # Create and test a standby from given backup, with a certain
 # recovery target.
@@ -27,9 +27,7 @@ sub test_recovery_standby
 			qq($param_item
 ));
 	}
-
 	$node_standby->start;
-
 	# Wait until standby has replayed enough data
 	my $caughtup_query =
 	  "SELECT '$until_lsn'::pg_lsn <= pg_last_wal_replay_location()";
@@ -41,6 +39,44 @@ sub test_recovery_standby
 	  $node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
 	is($result, qq($num_rows), "check standby content for $test_name");
 
+        print "Rows in PITR  : $result\n";
+        print "Rows in master: $num_rows\n";
+
+	# Stop standby node
+	$node_standby->teardown_node;
+}
+
+
+# Create a node from given backup to perform PITR to a certain recovery target.
+
+sub test_recovery_pitr
+{
+        my $test_name       = shift;
+        my $node_name       = shift;
+        my $node_master     = shift;
+        my $recovery_params = shift;
+        my $num_rows        = shift;
+        my $until_lsn       = shift;
+
+	my $node_standby = get_new_node($node_name);
+	$node_standby->init_from_backup($node_master, 'my_pitr',
+		has_restoring_pitr => 1);
+
+	foreach my $param_item (@$recovery_params)
+	{
+		$node_standby->append_conf(
+			'recovery.conf',
+				qq($param_item
+));
+	}
+	$node_standby->pitr_start;
+	# Check the content on pitr
+	my $result =
+		$node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
+	isnt($result, qq($num_rows), "check the PITR node for successful $test_name");
+	print "Rows in PITR  : $result\n";
+	print "Rows in master: $num_rows\n";
+
 	# Stop standby node
 	$node_standby->teardown_node;
 }
@@ -144,3 +180,79 @@ test_recovery_standby('XID + time + name',
 	"recovery_target_lsn = '$recovery_lsn'",);
 test_recovery_standby('XID + time + name + LSN',
 	'standby_9', $node_master, \@recovery_params, "5000", $lsn5);
+
+# Test Incomplete recovery
+# Initialize master node
+
+$node_master = get_new_node('master1');
+$node_master->init(has_archiving => 1, allows_streaming => 1);
+
+# Start it
+$node_master->start;
+
+# Create data before taking the backup, aimed at testing
+
+$node_master->safe_psql('postgres',
+         "CREATE TABLE tab_int AS SELECT generate_series(1,1000) AS a");
+
+	my $initial_lsn = $node_master->safe_psql('postgres', "SELECT pg_current_wal_location();");
+	print "LSN Before backup: $initial_lsn\n";
+	my $initial_xid = $node_master->safe_psql('postgres', "SELECT txid_current();");
+	print "XID before backup: $initial_xid\n";
+	my $initial_rows = $node_master->safe_psql('postgres', "SELECT count(*) from tab_int;");
+	print "Initial rows before backup: $initial_rows\n";
+
+# Take backup from which all operations will be run
+$node_master->backup('my_pitr');
+
+# Generate enough data and more WAL Archives for a recovery target reference.
+
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(1001,10000))");
+# Force archiving of WAL file
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal()");
+
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(10001,20000))");
+# Force archiving of WAL file
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal()");
+
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(20001,30000))");
+
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(30001,40000))");
+
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(40001,60000))");
+
+# current wal position in the master node
+$ret = $node_master->safe_psql('postgres',
+        "SELECT pg_current_wal_location(), txid_current();");
+my ($lsn6, $recovery_xid) = split /\|/, $ret;
+
+# Test the server promotion when the recovery fails to reach the recovery target.
+my $recovery_target_incomplete='promote';
+
+@recovery_params = (
+	"recovery_target_xid = '$recovery_xid'",
+	"recovery_target_incomplete = '$recovery_target_incomplete'");
+test_recovery_pitr('PROMOTION at incomplete recovery','pitr_1',$node_master, \@recovery_params, "60000", $lsn6);
+
+# Check if the server pauses when the recovery stops mid-way without reaching the recovery target.
+
+$recovery_target_incomplete='pause';
+
+@recovery_params = (
+        "recovery_target_xid = '$recovery_xid'",
+        "recovery_target_incomplete = '$recovery_target_incomplete'");
+test_recovery_pitr('PAUSE at incomplete recovery','pitr_2',$node_master, \@recovery_params, "60000", $lsn6);
+
+# Check if the server successfully shuts down when the recovery stops mid-way without reaching the recovery target.
+
+$recovery_target_incomplete='shutdown';
+
+@recovery_params = (
+        "recovery_target_xid = '$recovery_xid'",
+        "recovery_target_incomplete = '$recovery_target_incomplete'");
+test_recovery_pitr('SHUTDOWN at incomplete recovery','pitr_3',$node_master, \@recovery_params, "60000", $lsn6);
diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
index 0d3859d..f90a449 100755
--- a/src/tools/pgindent/pgindent
+++ b/src/tools/pgindent/pgindent
@@ -247,7 +247,7 @@ sub pre_indent
 		# previous line was struct, etc.
 		next
 		  if $srclines[ $lno - 1 ] =~
-			  m!=|^(struct|enum|[ \t]*typedef|extern[ \t]+"C")!;
+		  m!=|^(struct|enum|[ \t]*typedef|extern[ \t]+"C")!;
 
 		$srclines[$lno] = "$l2\nint pgindent_func_no_var_fix;";
 	}
@@ -520,7 +520,7 @@ File::Find::find(
 			  && -f _
 			  && /^.*\.[ch]\z/s
 			  && push(@files, $File::Find::name);
-		  }
+		}
 	},
 	$code_base) if $code_base;
 
diff --git a/src/tools/pgindent/pgperltidy b/src/tools/pgindent/pgperltidy
index 6098e18..395bce3 100755
--- a/src/tools/pgindent/pgperltidy
+++ b/src/tools/pgindent/pgperltidy
@@ -17,7 +17,7 @@ PERLTIDY=${PERLTIDY:-perltidy}
 	cut -d: -f1
 ) |
 sort -u |
-xargs $PERLTIDY --profile=src/tools/pgindent/perltidyrc
+xargs $PERLTIDY --profile=/data/PostgreSQL-Repo/postgresql/src/tools/pgindent/perltidyrc
 
 # perltidyrc specifies --backup-and-modify-in-place, so get rid of .bak files
 find . -type f -name '*.bak' | xargs rm
#34Venkata B Nagothi
nag1010@gmail.com
In reply to: Venkata B Nagothi (#33)
2 attachment(s)
Re: patch proposal

On Wed, Mar 1, 2017 at 1:14 AM, Venkata B Nagothi <nag1010@gmail.com> wrote:

Hi David,

On Tue, Jan 31, 2017 at 6:49 AM, David Steele <david@pgmasters.net> wrote:

On 1/27/17 3:19 AM, Venkata B Nagothi wrote:

I will be adding the tests in
src/test/recovery/t/003_recovery_targets.pl. My tests would look more or less
similar to recovery_target_action. I do not see any of the tests added
for the parameter recovery_target_action? Anyway, I will work on
adding tests for recovery_target_incomplete.

Do you know when those will be ready?

Attached are both the patches with tests included.

4) I'm not convinced that fatal errors by default are the way to go.
It's a pretty big departure from what we have now.

I have changed the code to generate ERRORs instead of FATALs; this is
in the patch recoveryStartPoint.patch

I think at this point in recovery any ERROR at all will probably be
fatal. Once you have some tests in place we'll know for sure.

Either way, the goal would be to keep recovery from proceeding and let
the user adjust their targets. This is a big behavioral change
which will need buy-in from the community.

Hoping to get some comments / feedback from the community on the second
patch too.

Also, I don't think it's very intuitive how recovery_target_incomplete
works. For instance, if I set recovery_target_incomplete=promote and
set recovery_target_time, but pick a backup after the specified time,
then there will be an error "recovery_target_time %s is older..."
rather than a promotion.

Yes, that is correct and that is the expected behaviour. Let me explain -

My point was that this combination of options could lead to confusion on
the part of users. However, I'd rather leave that alone for the time
being and focus on the first patch.

I have split the patch into two different
pieces. One is to determine if the recovery can start at all and other
patch deals with the incomplete recovery situation.

I think the first patch looks promising and I would rather work through
that before considering the second patch. Once there are tests for the
first patch I will complete my review.

I have added all the tests for the second patch and all seem to be working
fine.

Regarding the first patch, I have included all the tests except for the
test case on recovery_target_time.
I did write the test, but it is behaving oddly, which I am
working through to resolve.

This issue has been resolved. All good now. To avoid any further confusion,
I have attached the latest versions (4) of both the patches with all the
tests included.

I have already changed the patch status to "Needs review".

Thank you !

Regards,

Venkata B N
Database Consultant

Attachments:

recoveryStartPoint.patch-version4 (application/octet-stream)
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index a91864b..73ef387 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -211,6 +211,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
         The precise stopping point is also influenced by
         <xref linkend="recovery-target-inclusive">.
        </para>
+       <para>
+        This parameter also checks if the checkpoint time of the backup being used is earlier than the specified recovery_target_time. 
+        In other words, the recovery will not proceed further if the checkpoint time of the backup is later than the specified recovery_target_time.
+       </para>
+
       </listitem>
      </varlistentry>
 
@@ -231,6 +236,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
         The precise stopping point is also influenced by
         <xref linkend="recovery-target-inclusive">.
        </para>
+       <para>
+        This parameter also checks if the latest completed xid of the backup is earlier than the specified recovery target xid.
+        In other words, the recovery will not proceed further if the latest completed xid of the backup is later than the specified target xid.
+       </para>
+
       </listitem>
      </varlistentry>
 
@@ -248,6 +258,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
         parameter is parsed using the system data type
         <link linkend="datatype-pg-lsn"><type>pg_lsn</></link>.
        </para>
+       <para>
+        This parameter also checks if the current checkpoint's redo position of the backup is earlier than the specified recovery target lsn.
+        In other words, the recovery will not proceed further if the latest checkpoint's redo position of the backup is later than the specified target lsn.
+       </para>
+
       </listitem>
      </varlistentry>
      </variablelist>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8973583..f030baa 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -834,6 +834,7 @@ static MemoryContext walDebugCxt = NULL;
 #endif
 
 static void readRecoveryCommandFile(void);
+static void recoveryStartsHere(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
@@ -5544,6 +5545,66 @@ getRecordTimestamp(XLogReaderState *record, TimestampTz *recordXtime)
 	return false;
 }
 
+/* When performing point-in-time-recovery, this function identifies if
+ * the specified recovery target (recovery_target_time, recovery_target_lsn and recovery_target_xid) is prior to that of the backup.
+ * Which means, recovery cannot proceed if the recovery target point is prior to backup start point.
+ */
+
+static void
+recoveryStartsHere(void)
+{
+	/*
+	 * Check if the recovery target xid is older than the latest completed xid
+	 * of the backup
+	 */
+
+	if (recoveryTarget == RECOVERY_TARGET_XID)
+	{
+		if (TransactionIdPrecedes(recoveryTargetXid,
+							 ControlFile->checkPointCopy.latestCompletedXid))
+		{
+			ereport(ERROR,
+					(errmsg("Specified recovery_target_xid is older than the latest Completed Xid which is %u", ControlFile->checkPointCopy.latestCompletedXid),
+					 errhint("This means that the backup being used is much later than the recovery target position.\n"
+							 "You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+
+	/*
+	 * Check if the recovery target lsn is prior to the latest checkpoint's
+	 * redo position of the backup
+	 */
+
+	if (recoveryTarget == RECOVERY_TARGET_LSN)
+	{
+		if (recoveryTargetLSN < ControlFile->checkPointCopy.redo)
+		{
+			ereport(ERROR,
+					(errmsg("Specified recovery_target_lsn is older than the backup start LSN which is \"%X/%X\"",
+								(uint32) (ControlFile->checkPointCopy.redo >> 32), (uint32) ControlFile->checkPointCopy.redo),
+					 errhint("This means that the backup being used is much later than the recovery target position.\n"
+							 "You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+
+	/*
+	 * Check if the recovery target time is prior to the current timestamp of
+	 * the backup
+	 */
+
+	if (recoveryTarget == RECOVERY_TARGET_TIME)
+	{
+		if (recoveryTargetTime < time_t_to_timestamptz(ControlFile->checkPointCopy.time))
+		{
+			ereport(ERROR,
+					(errmsg("Specified recovery_target_time is older than the backup's latest checkpoint time which is %s",
+											pstrdup(str_time(ControlFile->checkPointCopy.time))),
+					errhint("This means that the backup being used is much later than the recovery target position.\n"
+						"You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+}
+
 /*
  * For point-in-time recovery, this function decides whether we want to
  * stop applying the XLOG before the current record.
@@ -6277,6 +6338,9 @@ StartupXLOG(void)
 					(errmsg("starting archive recovery")));
 	}
 
+	/* Check if archive recovery can start at all */
+	recoveryStartsHere();
+
 	/*
 	 * Take ownership of the wakeup latch if we're going to sleep during
 	 * recovery.
@@ -8642,6 +8706,7 @@ CreateCheckPoint(int flags)
 	checkPoint.nextXid = ShmemVariableCache->nextXid;
 	checkPoint.oldestXid = ShmemVariableCache->oldestXid;
 	checkPoint.oldestXidDB = ShmemVariableCache->oldestXidDB;
+	checkPoint.latestCompletedXid = ShmemVariableCache->latestCompletedXid;
 	LWLockRelease(XidGenLock);
 
 	LWLockAcquire(CommitTsLock, LW_SHARED);
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index f47171d..5ba170c 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -222,6 +222,8 @@ main(int argc, char *argv[])
 	printf(_("Latest checkpoint's NextXID:          %u:%u\n"),
 		   ControlFile->checkPointCopy.nextXidEpoch,
 		   ControlFile->checkPointCopy.nextXid);
+	printf(_("Latest checkpoint's latestCompletedXID: %u\n"),
+		   ControlFile->checkPointCopy.latestCompletedXid);
 	printf(_("Latest checkpoint's NextOID:          %u\n"),
 		   ControlFile->checkPointCopy.nextOid);
 	printf(_("Latest checkpoint's NextMultiXactId:  %u\n"),
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e4194b9..a8c9101 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -58,6 +58,7 @@ typedef struct CheckPoint
 	 * set to InvalidTransactionId.
 	 */
 	TransactionId oldestActiveXid;
+	TransactionId latestCompletedXid;
 } CheckPoint;
 
 /* XLOG info values for XLOG rmgr */
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 4018f0a..92c5367 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -601,6 +601,11 @@ Restoring WAL segments from archives using restore_command can be enabled
 by passing the keyword parameter has_restoring => 1. This is disabled by
 default.
 
+Restoring WAL segments from archives using restore command to perform PITR
+can be enabled by passing the keyword parameter has_restoring_pitr => 1. This
+is disabled by default. By enabling this parameter, the standby database will
+not operate in standby mode.
+
 The backup is copied, leaving the original unmodified. pg_hba.conf is
 unconditionally set to enable replication connections.
 
@@ -618,6 +623,7 @@ sub init_from_backup
 	$params{hba_permit_replication} = 1
 	  unless defined $params{hba_permit_replication};
 	$params{has_restoring} = 0 unless defined $params{has_restoring};
+	$params{has_restoring_pitr} = 0 unless defined $params{has_restoring_pitr};
 
 	print
 "# Initializing node \"$node_name\" from backup \"$backup_name\" of node \"$root_name\"\n";
@@ -641,6 +647,7 @@ port = $port
 	$self->set_replication_conf         if $params{hba_permit_replication};
 	$self->enable_streaming($root_node) if $params{has_streaming};
 	$self->enable_restoring($root_node) if $params{has_restoring};
+	$self->enable_restoring_pitr($root_node) if $params{has_restoring_pitr};
 }
 
 =pod
@@ -673,6 +680,31 @@ sub start
 	$self->_update_pid;
 }
 
+
+=pod
+
+=item $node->start_pitr()
+
+Wrapper for pg_ctl start
+
+Start the node and wait until it is ready to accept connections.
+
+=cut
+
+sub start_pitr
+{
+        my ($self) = @_;
+        my $port   = $self->port;
+        my $pgdata = $self->data_dir;
+        my $name   = $self->name;
+        my $log    = $self->logfile;
+        print("### Starting node \"$name\"\n");
+        my $ret = TestLib::system_log('pg_ctl', '-D', $self->data_dir, '-l',
+               $self->logfile, 'start');
+
+	return $ret;
+}
+
 =pod
 
 =item $node->stop(mode)
@@ -797,6 +829,32 @@ standby_mode = on
 ));
 }
 
+# Internal routine to enable archive recovery command on a standby node for PITR
+sub enable_restoring_pitr
+{
+        my ($self, $root_node) = @_;
+        my $path = $root_node->archive_dir;
+        my $name = $self->name;
+
+        print "### Enabling WAL restore for node \"$name\"\n";
+
+        # On Windows, the path specified in the restore command needs to use
+        # double back-slashes to work properly and to be able to detect properly
+        # the file targeted by the copy command, so the directory value used
+        # in this routine, using only one back-slash, need to be properly changed
+        # first. Paths also need to be double-quoted to prevent failures where
+        # the path contains spaces.
+        $path =~ s{\\}{\\\\}g if ($TestLib::windows_os);
+        my $copy_command =
+          $TestLib::windows_os
+          ? qq{copy "$path\\\\%f" "%p"}
+          : qq{cp "$path/%f" "%p"};
+	$self->append_conf(
+		'recovery.conf', qq(
+restore_command = '$copy_command'
+));
+}
+
 # Internal routine to enable archiving
 sub enable_archiving
 {
diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index b7b0caa..ec6b9fe 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 12;
 
 # Create and test a standby from given backup, with a certain
 # recovery target.
@@ -45,6 +45,69 @@ sub test_recovery_standby
 	$node_standby->teardown_node;
 }
 
+# Test recovery start point for various recovery targets
+
+sub test_recovery_pitr
+{
+        my $test_name       = shift;
+        my $node_name       = shift;
+        my $node_master     = shift;
+        my $recovery_params = shift;
+	my $num_rows	    = shift;
+
+        my $node_standby = get_new_node($node_name);
+        $node_standby->init_from_backup($node_master, 'my_pitr',
+                has_restoring_pitr => 1);
+
+        foreach my $param_item (@$recovery_params)
+        {
+                $node_standby->append_conf(
+                        'recovery.conf',
+                                qq($param_item
+));
+        }
+        my $ret = $node_standby->start_pitr;
+
+	my $log = $node_standby->logfile;
+	my $irx = "Specified recovery_target_xid is older than the latest Completed Xid";
+        my $irl = "Specified recovery_target_lsn is older than the backup start LSN";
+	my $irt = "Specified recovery_target_time is older than the backup's latest checkpoint time";
+
+        my @rx = `grep -inr "$irx" $log`;
+        my @rl = `grep -inr "$irl" $log`;
+	my @rt = `grep -inr "$irt" $log`;
+
+        if ($ret != 0)
+        {
+                if (@rx)
+                {
+                        print "Recovery could not start as the specified target XID was not legitimate. Test Passed\n";
+                        isnt($ret,0, "Successful $test_name.");
+                        goto TEST_RECOVERY_LSN;
+
+                }
+                elsif(@rl)
+                {
+			print "Recovery could not start as the specified target LSN was not legitimate. Test Passed\n";
+			isnt($ret,0, "Successful $test_name.");
+			goto TEST_RECOVERY_TIME;
+
+                }
+		elsif(@rt)
+		{
+                        print "Recovery could not start as the specified target TIME was not legitimate. Test Passed\n";
+                        isnt($ret,0, "Successful $test_name.");
+                        exit 0;
+		}
+                else
+                {
+                        print "# pg_ctl failed; logfile:\n";
+                        print TestLib::slurp_file($node_standby->logfile);
+                        BAIL_OUT("pg_ctl failed");
+                }
+        }
+}
+
 # Initialize master node
 my $node_master = get_new_node('master');
 $node_master->init(has_archiving => 1, allows_streaming => 1);
@@ -144,3 +207,57 @@ test_recovery_standby('XID + time + name',
 	"recovery_target_lsn = '$recovery_lsn'",);
 test_recovery_standby('XID + time + name + LSN',
 	'standby_9', $node_master, \@recovery_params, "5000", $lsn5);
+
+# Check if the recovery start point is legitimate
+# If the recovery start point is not appropriate, recovery should not start at all
+
+# Generate some data on master to advance the database position
+
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(4001,5000))");
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(5001,6000))");
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(6001,7000))");
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(7001,8000))");
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(8001,9000))");
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(9001,10000))");
+
+print "required data generated.\n";
+
+my $rxid = $node_master->safe_psql('postgres',
+        "SELECT pg_current_wal_location(), txid_current();");
+my ($lsn10, $recovery_lxid) = split /\|/, $rxid;
+
+# Now take a fresh backup
+
+$node_master->backup('my_pitr');
+
+# Test the recovery start point for XID. Specify an old XID which falls prior to the backup's latest XID
+@recovery_params = (
+         "recovery_target_xid  = '$recovery_txid'",);
+
+test_recovery_pitr('Recovery Start Point check for XID',
+                 'pitr_1', $node_master, \@recovery_params, "10000");
+
+# Test the checking of the recovery start point for target LSN. Specify an old recovery target LSN which falls prior to the backup's latest LSN.
+TEST_RECOVERY_LSN:
+my $recovery_olsn = $lsn1;
+@recovery_params = (
+         "recovery_target_lsn  = '$recovery_olsn'",);
+
+test_recovery_pitr('Recovery Start Point check for LSN',
+                 'pitr_2', $node_master, \@recovery_params, "10000");
+
+# Test the checking of the recovery start point for target TIME. Specify an old recovery target TIME which falls prior to the backup's latest Timestamp.
+TEST_RECOVERY_TIME:
+
+my $recovery_otime = $recovery_time;
+@recovery_params = (
+         "recovery_target_time  = '$recovery_otime'",);
+
+test_recovery_pitr('Recovery Start Point check for TIME',
+                 'pitr_3', $node_master, \@recovery_params, "10000");
recoveryTargetIncomplete.patch-version4 (application/octet-stream)
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index a91864b..9053bb0 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -349,6 +349,48 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
       </listitem>
      </varlistentry>
 
+     <varlistentry id="recovery-target-incomplete"
+                   xreflabel="recovery_target_incomplete">
+      <term><varname>recovery_target_incomplete</varname> (<type>enum</type>)
+      <indexterm>
+        <primary><varname>recovery_target_incomplete</> recovery parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies what action the server should take once the recovery target is
+        not reached. The default is <literal>promote</>, which means recovery will
+        finish and the server will start to accept connections. <literal>pause</> means the recovery process will be paused.
+        Finally <literal>shutdown</> will stop the server if the recovery cannot proceed any further.
+       </para>
+       <para>
+        The intended use of the <literal>pause</> setting is to allow queries
+        to be executed against the database to check if this recovery target
+        is the most desirable point for recovery.
+        The paused state can be resumed by
+        using <function>pg_xlog_replay_resume()</> (see
+        <xref linkend="functions-recovery-control-table">), which then
+        causes recovery to end. If this recovery target is not the
+        desired stopping point, then shut down the server, change the
+        recovery target settings to a later target and restart to
+        continue recovery.
+       </para>
+       <para>
+        The <literal>shutdown</> setting is useful to have the instance ready
+        at the exact replay point desired.  The instance will still be able to
+        replay more WAL records (and in fact will have to replay WAL records
+        since the last checkpoint next time it is started).
+       </para>
+       <para>
+        Note that because <filename>recovery.conf</> will not be renamed when
+        <varname>recovery_target_incomplete</> is set to <literal>shutdown</>,
+        any subsequent start will end with immediate shutdown unless the
+        configuration is changed or the <filename>recovery.conf</> file is
+        removed manually.
+       </para>
+      </listitem>
+     </varlistentry>
+
     </variablelist>
    </sect1>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8973583..7257a49 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -259,6 +259,7 @@ static char *archiveCleanupCommand = NULL;
 static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET;
 static bool recoveryTargetInclusive = true;
 static RecoveryTargetAction recoveryTargetAction = RECOVERY_TARGET_ACTION_PAUSE;
+static RecoveryTargetIncomplete recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PROMOTE;
 static TransactionId recoveryTargetXid;
 static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
@@ -838,6 +839,7 @@ static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
 static void recoveryPausesHere(void);
+static void IncompleteRecoveryPause(void);
 static bool recoveryApplyDelay(XLogReaderState *record);
 static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
@@ -5163,6 +5165,26 @@ readRecoveryCommandFile(void)
 
 			recoveryTargetActionSet = true;
 		}
+		else if (strcmp(item->name, "recovery_target_incomplete") == 0)
+		{
+			if (strcmp(item->value, "pause") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PAUSE;
+			else if (strcmp(item->value, "promote") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PROMOTE;
+			else if (strcmp(item->value, "shutdown") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_SHUTDOWN;
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("invalid value for recovery parameter \"%s\": \"%s\"",
+					   "recovery_target_incomplete",
+					   item->value),
+						 errhint("Valid values are \"pause\", \"promote\", and \"shutdown\".")));
+
+			ereport(DEBUG2,
+					(errmsg_internal("recovery_target_incomplete = '%s'",
+									 item->value)));
+		}
 		else if (strcmp(item->name, "recovery_target_timeline") == 0)
 		{
 			rtliGiven = true;
@@ -5889,6 +5911,21 @@ SetRecoveryPause(bool recoveryPause)
 	SpinLockRelease(&XLogCtl->info_lck);
 }
 
+static void
+IncompleteRecoveryPause(void)
+{
+	/* Pause recovery at end-of-the-wal when recovery target is not reached */
+	ereport(LOG,
+			(errmsg("recovery has paused"),
+			 errhint("Execute pg_xlog_replay_resume() to continue.")));
+
+	while (RecoveryIsPaused())
+	{
+		pg_usleep(1000000L);	/* 1000 ms */
+		HandleStartupProcInterrupts();
+	}
+}
+
 /*
  * When recovery_min_apply_delay is set, we wait long enough to make sure
  * certain record types are applied at least that interval behind the master.
@@ -7205,6 +7242,46 @@ StartupXLOG(void)
 						break;
 				}
 			}
+			else
+			{
+				ereport(LOG,
+						(errmsg("recovery has reached the end of WAL before reaching the recovery target"),
+				errhint("This could be due to corrupt or missing WAL files.\n"
+						"All the WAL files needed to reach the recovery target must be available, "
+				"or you might need to choose an earlier recovery target.")));
+
+				/*
+				 * This is the position where we can choose to shut down, pause
+				 * or promote at the end-of-the-wal if the intended recovery
+				 * target is not reached
+				 */
+				switch (recoveryTargetIncomplete)
+				{
+
+					case RECOVERY_TARGET_INCOMPLETE_SHUTDOWN:
+
+						/*
+						 * exit with special return code to request shutdown
+						 * of postmaster.  Log messages issued from
+						 * postmaster.
+						 */
+
+						ereport(LOG,
+								(errmsg("shutdown at end-of-the-wal")));
+						proc_exit(2);
+
+					case RECOVERY_TARGET_INCOMPLETE_PAUSE:
+
+						SetRecoveryPause(true);
+						IncompleteRecoveryPause();
+
+						/* drop into promote */
+
+					case RECOVERY_TARGET_INCOMPLETE_PROMOTE:
+						break;
+				}
+
+			}
 
 			/* Allow resource managers to do any required cleanup. */
 			for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 578bff5..b9d707c 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -256,6 +256,17 @@ typedef enum
 } RecoveryTargetAction;
 
 /*
+ * Recovery target incomplete.
+ */
+
+typedef enum
+{
+	RECOVERY_TARGET_INCOMPLETE_PAUSE,
+	RECOVERY_TARGET_INCOMPLETE_PROMOTE,
+	RECOVERY_TARGET_INCOMPLETE_SHUTDOWN
+}	RecoveryTargetIncomplete;
+
+/*
  * Method table for resource managers.
  *
  * This struct must be kept in sync with the PG_RMGR definition in
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 4018f0a..119d43d 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -601,6 +601,11 @@ Restoring WAL segments from archives using restore_command can be enabled
 by passing the keyword parameter has_restoring => 1. This is disabled by
 default.
 
+Restoring WAL segments from archives using restore_command to perform PITR
+can be enabled by passing the keyword parameter has_restoring_pitr => 1. This
+is disabled by default. With this parameter enabled, the node will perform
+archive recovery without operating in standby mode.
+
 The backup is copied, leaving the original unmodified. pg_hba.conf is
 unconditionally set to enable replication connections.
 
@@ -618,6 +623,7 @@ sub init_from_backup
 	$params{hba_permit_replication} = 1
 	  unless defined $params{hba_permit_replication};
 	$params{has_restoring} = 0 unless defined $params{has_restoring};
+	$params{has_restoring_pitr} = 0 unless defined $params{has_restoring_pitr};
 
 	print
 "# Initializing node \"$node_name\" from backup \"$backup_name\" of node \"$root_name\"\n";
@@ -641,6 +647,7 @@ port = $port
 	$self->set_replication_conf         if $params{hba_permit_replication};
 	$self->enable_streaming($root_node) if $params{has_streaming};
 	$self->enable_restoring($root_node) if $params{has_restoring};
+	$self->enable_restoring_pitr($root_node) if $params{has_restoring_pitr};
 }
 
 =pod
@@ -673,6 +680,49 @@ sub start
 	$self->_update_pid;
 }
 
+=item $node->start_pitr()
+
+Wrapper for pg_ctl start
+
+Start the node for PITR, checking for an expected shutdown at end-of-the-wal.
+
+=cut
+
+sub start_pitr
+{
+        my ($self) = @_;
+        my $port   = $self->port;
+        my $pgdata = $self->data_dir;
+        my $name   = $self->name;
+	my $log	   = $self->logfile;
+	print("### Starting node \"$name\"\n");
+	my $ret = TestLib::system_log('pg_ctl', '-D', $self->data_dir, '-l',
+               $self->logfile, 'start');
+
+        if ($ret != 0)
+        {
+                my $s = "shutdown at end-of-the-wal";
+
+                my @rs = `grep "$s" $log`;
+
+                if (@rs)
+                {
+                        print "Database was shut down at end-of-the-wal. Test Passed\n";
+			isnt($ret,0, "check the PITR node for successful SHUTDOWN at incomplete recovery");
+			exit 0;
+
+                }
+		else
+		{
+			print "# pg_ctl failed; logfile:\n";
+			print TestLib::slurp_file($self->logfile);
+			BAIL_OUT("pg_ctl failed");
+		}
+        }
+
+        $self->_update_pid;
+}
+
 =pod
 
 =item $node->stop(mode)
@@ -797,6 +847,32 @@ standby_mode = on
 ));
 }
 
+# Internal routine to enable archive recovery command on a standby node for PITR
+sub enable_restoring_pitr
+{
+        my ($self, $root_node) = @_;
+        my $path = $root_node->archive_dir;
+        my $name = $self->name;
+
+        print "### Enabling WAL restore for node \"$name\"\n";
+
+        # On Windows, the path specified in the restore command needs to use
+        # double back-slashes to work properly and to be able to detect properly
+        # the file targeted by the copy command, so the directory value used
+        # in this routine, using only one back-slash, need to be properly changed
+        # first. Paths also need to be double-quoted to prevent failures where
+        # the path contains spaces.
+        $path =~ s{\\}{\\\\}g if ($TestLib::windows_os);
+        my $copy_command =
+          $TestLib::windows_os
+          ? qq{copy "$path\\\\%f" "%p"}
+          : qq{cp "$path/%f" "%p"};
+	$self->append_conf(
+		'recovery.conf', qq(
+restore_command = '$copy_command'
+));
+}
+
 # Internal routine to enable archiving
 sub enable_archiving
 {
diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index b7b0caa..e8e9d14 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 12;
 
 # Create and test a standby from given backup, with a certain
 # recovery target.
@@ -27,9 +27,7 @@ sub test_recovery_standby
 			qq($param_item
 ));
 	}
-
 	$node_standby->start;
-
 	# Wait until standby has replayed enough data
 	my $caughtup_query =
 	  "SELECT '$until_lsn'::pg_lsn <= pg_last_wal_replay_location()";
@@ -41,6 +39,44 @@ sub test_recovery_standby
 	  $node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
 	is($result, qq($num_rows), "check standby content for $test_name");
 
+        print "Rows in PITR  : $result\n";
+        print "Rows in master: $num_rows\n";
+
+	# Stop standby node
+	$node_standby->teardown_node;
+}
+
+
+# Create a node from given backup to perform PITR to a certain recovery target.
+
+sub test_recovery_pitr
+{
+        my $test_name       = shift;
+        my $node_name       = shift;
+        my $node_master     = shift;
+        my $recovery_params = shift;
+        my $num_rows        = shift;
+        my $until_lsn       = shift;
+
+	my $node_standby = get_new_node($node_name);
+	$node_standby->init_from_backup($node_master, 'my_pitr',
+		has_restoring_pitr => 1);
+
+	foreach my $param_item (@$recovery_params)
+	{
+		$node_standby->append_conf(
+			'recovery.conf',
+				qq($param_item
+));
+	}
+	$node_standby->start_pitr;
+	# Check the content on pitr
+	my $result =
+		$node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
+	isnt($result, qq($num_rows), "check the PITR node for successful $test_name");
+	print "Rows in PITR  : $result\n";
+	print "Rows in master: $num_rows\n";
+
 	# Stop standby node
 	$node_standby->teardown_node;
 }
@@ -144,3 +180,72 @@ test_recovery_standby('XID + time + name',
 	"recovery_target_lsn = '$recovery_lsn'",);
 test_recovery_standby('XID + time + name + LSN',
 	'standby_9', $node_master, \@recovery_params, "5000", $lsn5);
+
+# Test Incomplete recovery
+# Initialize master node
+
+$node_master = get_new_node('master1');
+$node_master->init(has_archiving => 1, allows_streaming => 1);
+
+# Start it
+$node_master->start;
+
+# Create data before taking the backup, aimed at testing
+
+$node_master->safe_psql('postgres',
+         "CREATE TABLE tab_int AS SELECT generate_series(1,1000) AS a");
+
+# Take backup from which all operations will be run
+$node_master->backup('my_pitr');
+
+# Generate enough data and more WAL Archives for a recovery target reference.
+
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(1001,10000))");
+# Force archiving of WAL file
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal()");
+
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(10001,20000))");
+# Force archiving of WAL file
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal()");
+
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(20001,30000))");
+
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(30001,40000))");
+
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(40001,60000))");
+
+# current wal position in the master node
+$ret = $node_master->safe_psql('postgres',
+        "SELECT pg_current_wal_location(), txid_current();");
+my ($lsn6, $recovery_xid) = split /\|/, $ret;
+
+# Test the server promotion when the recovery fails to reach the recovery target.
+my $recovery_target_incomplete='promote';
+
+@recovery_params = (
+	"recovery_target_xid = '$recovery_xid'",
+	"recovery_target_incomplete = '$recovery_target_incomplete'");
+test_recovery_pitr('PROMOTION at incomplete recovery','pitr_1',$node_master, \@recovery_params, "60000", $lsn6);
+
+# Check if the server pauses when the recovery stops mid-way without reaching the recovery target.
+
+$recovery_target_incomplete='pause';
+
+@recovery_params = (
+        "recovery_target_xid = '$recovery_xid'",
+        "recovery_target_incomplete = '$recovery_target_incomplete'");
+test_recovery_pitr('PAUSE at incomplete recovery','pitr_2',$node_master, \@recovery_params, "60000", $lsn6);
+
+# Check if the server successfully shuts down when the recovery stops mid-way without reaching the recovery target.
+
+$recovery_target_incomplete='shutdown';
+
+@recovery_params = (
+        "recovery_target_xid = '$recovery_xid'",
+        "recovery_target_incomplete = '$recovery_target_incomplete'");
+test_recovery_pitr('SHUTDOWN at incomplete recovery','pitr_3',$node_master, \@recovery_params, "60000", $lsn6);
#35David Steele
david@pgmasters.net
In reply to: Venkata B Nagothi (#34)
Re: patch proposal

Hi Venkata,

On 2/28/17 11:59 PM, Venkata B Nagothi wrote:

On Wed, Mar 1, 2017 at 1:14 AM, Venkata B Nagothi <nag1010@gmail.com
<mailto:nag1010@gmail.com>> wrote:
On Tue, Jan 31, 2017 at 6:49 AM, David Steele <david@pgmasters.net
<mailto:david@pgmasters.net>> wrote:

Do you know when those will be ready?

Attached are both the patches with tests included.

Thanks for adding the tests. They bring clarity to the patch.

Unfortunately, I don't think the first patch (recoveryStartPoint) will
work as currently implemented. The problem I see is that the new
function recoveryStartsHere() depends on pg_control containing a
checkpoint right at the end of the backup. There's no guarantee that
this is true, even if pg_control is copied last. That means a time,
lsn, or xid that occurs somewhere in the middle of the backup can be
selected without complaint from this code depending on timing.

The tests pass (or rather fail as expected) because they are written
using values before the start of the backup. It's actually the end of
the backup (where the database becomes consistent on recovery) that
defines where PITR can start and this distinction definitely matters for
very long backups. A better test would be to start the backup, get a
time/lsn/xid after writing some data, and then make sure that
time/lsn/xid is flagged as invalid on recovery.

It is also problematic to assume that transaction IDs commit in order.
If checkPoint.latestCompletedXid contains 5 then a recovery to xid 4 may
or may not be successful depending on the commit order. However, it
appears in this case the patch would disallow recovery to 4.
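
To make the commit-ordering problem concrete, here is a tiny standalone
illustration (plain C, not server code; the xids and LSNs are invented,
and the plain "<" comparison stands in for TransactionIdPrecedes()):

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* a commit record as it appears in the WAL stream */
typedef struct
{
	TransactionId xid;
	uint64_t	lsn;		/* position of the commit record */
} CommitRecord;

int
main(void)
{
	/* xid 5 commits before xid 4 -- a perfectly legal commit order */
	CommitRecord wal[] = {
		{5, 100},
		{4, 200},
	};
	TransactionId latestCompletedXid = 5;	/* as of the backup */
	TransactionId target = 4;	/* recovery_target_xid */

	/* the proposed check: reject targets preceding latestCompletedXid */
	if (target < latestCompletedXid)
		printf("check rejects recovery to xid %u\n", target);

	/* ...yet replay could still stop at xid 4's commit record */
	for (size_t i = 0; i < sizeof(wal) / sizeof(wal[0]); i++)
		if (wal[i].xid == target)
			printf("but xid %u commits at lsn %llu, after xid 5\n",
				   target, (unsigned long long) wal[i].lsn);
	return 0;
}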

I think this logic would need to be baked into recoveryStopsAfter()
and/or recoveryStopsBefore() in order to work. It's not clear to me
what that would look like, however, or if the xid check is even practical.

Regards,
--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#36Venkata B Nagothi
nag1010@gmail.com
In reply to: David Steele (#35)
Re: patch proposal

On Tue, Mar 21, 2017 at 8:46 AM, David Steele <david@pgmasters.net> wrote:

Hi Venkata,

On 2/28/17 11:59 PM, Venkata B Nagothi wrote:

On Wed, Mar 1, 2017 at 1:14 AM, Venkata B Nagothi <nag1010@gmail.com
<mailto:nag1010@gmail.com>> wrote:
On Tue, Jan 31, 2017 at 6:49 AM, David Steele <david@pgmasters.net
<mailto:david@pgmasters.net>> wrote:

Do you know when those will be ready?

Attached are both the patches with tests included.

Thanks for adding the tests. They bring clarity to the patch.

Thank you again for reviewing the patch and for your comments!

Unfortunately, I don't think the first patch (recoveryStartPoint) will
work as currently implemented. The problem I see is that the new function
recoveryStartsHere() depends on pg_control containing a checkpoint right at
the end of the backup. There's no guarantee that this is true, even if
pg_control is copied last. That means a time, lsn, or xid that occurs
somewhere in the middle of the backup can be selected without complaint
from this code depending on timing.

Yes, that is true. The latest checkpoint position, xid and timestamp of the
restored backup are shown in pg_controldata, which means that is the
position from which the recovery would start. In turn, WALs start getting
replayed from that position towards the minimum recovery position (which is
the end of the backup, i.e., after applying the WALs generated between the
start and the end of the backup) and all the way through to the recovery
target position. The best start position to check against would be the
position shown in the pg_control file, which is much better than the
current PostgreSQL behaviour.

The tests pass (or rather fail as expected) because they are written using
values before the start of the backup. It's actually the end of
(where the database becomes consistent on recovery) that defines where PITR
can start and this distinction definitely matters for very long backups. A
better test would be to start the backup, get a time/lsn/xid after writing
some data, and then make sure that time/lsn/xid is flagged as invalid on
recovery.

Yes, I agree: a database restored from the backup starts recovery and
becomes consistent once the end of the backup is reached. The point here is
that, when the backup is restored and recovery proceeds further, there is
NO WAY to identify the next consistent position or the end position of the
backup. This patch only implements a check to ensure recovery starts and
proceeds at the right position instead of being promoted prematurely, which
is the current behaviour.

The current behaviour is that if the XID shown by pg_controldata is, say,
1400 and the specified recovery_target_xid is, say, 200, then PostgreSQL
just completes the recovery and promotes at the first consistent recovery
point, which means the DBA has to start the whole restoration and recovery
process again to promote the database at the intended position.
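
For concreteness, a minimal recovery.conf sketch along these lines (the
archive path is illustrative) reproduces the scenario:

restore_command = 'cp /mnt/server/archivedir/%f %p'
recovery_target_xid = '200'	# older than the backup's latest XID

The server replays to the first consistent point and then promotes, even
though xid 200 was never reached.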

It is also problematic to assume that transaction IDs commit in order. If
checkPoint.latestCompletedXid contains 5 then a recovery to xid 4 may or
may not be successful depending on the commit order. However, it appears
in this case the patch would disallow recovery to 4.

If txid 4 is the recovery target xid, then an appropriate backup taken
before txid 4 must be used, or an alternative recovery target such as
recovery_target_time must be used to reach the intended recovery target
xid.

I think this logic would need to be baked into recoveryStopsAfter() and/or
recoveryStopsBefore() in order to work. It's not clear to me what that
would look like, however, or if the xid check is even practical.

The above two functions determine whether the recovery process has to stop
at a particular position, and the patch has nothing to do with them.

To summarise, this patch only determines whether WAL replay should start at
all. In other words, if the specified recovery target falls prior to the
restored database position, then WAL replay should not start, which
prevents the database from getting promoted at an unintended recovery
target.

Any thoughts/comments ?

Regards,

Venkata Balaji N
Database Consultant

#37David Steele
david@pgmasters.net
In reply to: Venkata B Nagothi (#36)
Re: patch proposal

On 3/21/17 8:45 PM, Venkata B Nagothi wrote:

On Tue, Mar 21, 2017 at 8:46 AM, David Steele <david@pgmasters.net

Unfortunately, I don't think the first patch (recoveryStartPoint)
will work as currently implemented. The problem I see is that the
new function recoveryStartsHere() depends on pg_control containing a
checkpoint right at the end of the backup. There's no guarantee
that this is true, even if pg_control is copied last. That means a
time, lsn, or xid that occurs somewhere in the middle of the backup
can be selected without complaint from this code depending on timing.

Yes, that is true. The latest checkpoint position, xid and timestamp of
the restored backup are shown in pg_controldata, which means that is the
position from which the recovery would start.

Backup recovery starts from the checkpoint in the backup_label, not from
the checkpoint in pg_control. The original checkpoint that started the
backup is generally overwritten in pg_control by the end of the backup.

In turn, WALs start getting replayed from that position towards the
minimum recovery position (which is the end of the backup, i.e., after
applying the WALs generated between the start and the end of the backup)
and all the way through to the recovery target position.

minRecoveryPoint is only used when recovering a backup that was made
from a standby. For backups taken on the master, the backup end WAL
record is used.

The best start position to check against would be the position shown in
the pg_control file, which is much better than the current PostgreSQL
behaviour.

I don't agree, for the reasons given previously.

The tests pass (or rather fail as expected) because they are written
using values before the start of the backup. It's actually the end
of the backup (where the database becomes consistent on recovery)
that defines where PITR can start and this distinction definitely
matters for very long backups. A better test would be to start the
backup, get a time/lsn/xid after writing some data, and then make
sure that time/lsn/xid is flagged as invalid on recovery.

The current behaviour is that if the XID shown by pg_controldata is, say,
1400 and the specified recovery_target_xid is, say, 200, then PostgreSQL
just completes the recovery and promotes at the first consistent recovery
point, which means the DBA has to start the whole restoration and
recovery process again to promote the database at the intended position.

I'm not saying that the current behavior is good, only that the proposed
behavior is incorrect as far as I can tell.

It is also problematic to assume that transaction IDs commit in
order. If checkPoint.latestCompletedXid contains 5 then a recovery
to xid 4 may or may not be successful depending on the commit
order. However, it appears in this case the patch would disallow
recovery to 4.

If txid 4 is the recovery target xid, then an appropriate backup taken
before txid 4 must be used, or an alternative recovery target such as
recovery_target_time must be used to reach the intended recovery target
xid.

I'm not sure I follow you here, but I do know that using the last
committed xid says little about which xids precede or follow it.

I think this logic would need to be baked into recoveryStopsAfter()
and/or recoveryStopsBefore() in order to work. It's not clear to me
what that would look like, however, or if the xid check is even
practical.

The above two functions determine whether the recovery process has to
stop at a particular position, and the patch has nothing to do with them.

To summarise, this patch only determines whether WAL replay should start
at all. In other words, if the specified recovery target falls prior to
the restored database position, then WAL replay should not start, which
prevents the database from getting promoted at an unintended recovery
target.

I understand what you are trying to do. My point is that the
information you are working from (whatever checkpoint happens to be in
pg_control when recovery starts) is not sufficient for the task. This
method is inaccurate and would disallow valid recovery scenarios.

I suggest doing a thorough read-through of StartupXLOG(), particularly
the section where the backup_label is loaded.
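
For reference, a heavily simplified sketch of that section (paraphrased
here, not the actual source) behaves like this:

	if (read_backup_label(&checkPointLoc, &backupEndRequired,
						  &backupFromStandby))
	{
		/*
		 * backup_label exists: replay must start from the checkpoint
		 * that began the backup, regardless of what pg_control says
		 * by the time the copy finished.
		 */
		record = ReadCheckpointRecord(xlogreader, checkPointLoc, 0, true);
	}
	else
	{
		/* no backup_label: fall back to the checkpoint in pg_control */
		checkPointLoc = ControlFile->checkPoint;
		record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1, true);
	}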

Thanks,
--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#38Venkata B Nagothi
nag1010@gmail.com
In reply to: David Steele (#37)
2 attachment(s)
Re: patch proposal

Hi David,

On Thu, Mar 23, 2017 at 4:21 AM, David Steele <david@pgmasters.net> wrote:

On 3/21/17 8:45 PM, Venkata B Nagothi wrote:

On Tue, Mar 21, 2017 at 8:46 AM, David Steele <david@pgmasters.net

Unfortunately, I don't think the first patch (recoveryStartPoint)
will work as currently implemented. The problem I see is that the
new function recoveryStartsHere() depends on pg_control containing a
checkpoint right at the end of the backup. There's no guarantee
that this is true, even if pg_control is copied last. That means a
time, lsn, or xid that occurs somewhere in the middle of the backup
can be selected without complaint from this code depending on timing.

Yes, that is true. The latest checkpoint position, xid and timestamp of
the restored backup are shown in pg_controldata, which means that is the
position from which the recovery would start.

Backup recovery starts from the checkpoint in the backup_label, not from
the checkpoint in pg_control. The original checkpoint that started the
backup is generally overwritten in pg_control by the end of the backup.

Yes, I totally agree. My initial intention was to compare the recovery
target position(s) with the contents of the backup_label, but then the
checks would fail if the recovery is performed without the backup_label
file. So I decided to compare the recovery target positions with the
contents of the pg_control file.

In turn, WALs start getting replayed from that position towards the
minimum recovery position (which is the end of the backup, i.e., after
applying the WALs generated between the start and the end of the backup)
and all the way through to the recovery target position.

minRecoveryPoint is only used when recovering a backup that was made from
a standby. For backups taken on the master, the backup end WAL record is
used.

The best start position to check against would be the position shown in
the pg_control file, which is much better than the current PostgreSQL
behaviour.

I don't agree, for the reasons given previously.

As explained above, my intention was to ensure that the recovery
start-position checks are performed successfully irrespective of the
presence of the backup_label file.

I did some analysis before deciding to use pg_controldata's output instead
of backup_label file contents.

Comparing the output of pg_controldata with the contents of the
backup_label:

*Recovery Target LSN*

START WAL LOCATION (which is 0/9C000028) in the backup_label =
pg_controldata's latest checkpoint's REDO location (Latest checkpoint's
REDO location: 0/9C000028)

*Recovery Target TIME*

backup start time in the backup_label (START TIME: 2017-03-26 11:55:46
AEDT) = pg_controldata's latest checkpoint time (Time of latest checkpoint
: Sun 26 Mar 2017 11:55:46 AM AEDT)

*Recovery Target XID*

To begin with, backup_label does not contain any start XID, so the only
option is to depend on pg_controldata's output.
After a few quick tests and close observation, I noticed that the
pg_control file information is copied as-is to the backup location at
pg_start_backup. I performed some quick tests by running a few
transactions between pg_start_backup and pg_stop_backup. So I believe
this is the ideal start point for WAL replay.

Am I missing anything here?
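
For reference, the backup_label written for such a backup contains only
the following fields (START WAL LOCATION and START TIME taken from the
comparison above; the other values are illustrative) -- note that no XID
appears anywhere:

START WAL LOCATION: 0/9C000028 (file 00000001000000000000009C)
CHECKPOINT LOCATION: 0/9C000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2017-03-26 11:55:46 AEDT
LABEL: my_pitr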

The tests pass (or rather fail as expected) because they are written
using values before the start of the backup. It's actually the end
of the backup (where the database becomes consistent on recovery)
that defines where PITR can start and this distinction definitely
matters for very long backups. A better test would be to start the
backup, get a time/lsn/xid after writing some data, and then make
sure that time/lsn/xid is flagged as invalid on recovery.

The current behaviour is that if the XID shown by pg_controldata is,
say, 1400 and the specified recovery_target_xid is, say, 200, then
PostgreSQL just completes the recovery and promotes at the first
consistent recovery point, which means the DBA has to start the whole
restoration and recovery process again to promote the database at the
intended position.

I'm not saying that the current behavior is good, only that the proposed
behavior is incorrect as far as I can tell.

Please consider my explanation above and share your thoughts.

It is also problematic to assume that transaction IDs commit in
order. If checkPoint.latestCompletedXid contains 5 then a recovery
to xid 4 may or may not be successful depending on the commit
order. However, it appears in this case the patch would disallow
recovery to 4.

If txid 4 is the recovery target xid, then an appropriate backup taken
before txid 4 must be used, or an alternative recovery target such as
recovery_target_time must be used to reach the intended recovery target
xid.

I'm not sure I follow you here, but I do know that using the last
committed xid says little about which xids precede or follow it.

Yes, I agree: latestCompletedXid only reports the latest completed XID and
says nothing about the order in which XIDs were committed. It just shows
the latest completed one, and that is the natural and expected behaviour,
isn't it?

Sure, I will try to explain. The point I wanted to make is that it is
difficult to identify the order in which XIDs were committed; the only
option we have is to look at the pg_controldata information, and the
backup_label file does not contain any such information about XIDs.

Generally, DBAs perform recovery to a particular point in time by choosing
a recovery target in the form of an XID, timestamp or LSN, depending on
whatever information they have or whatever target they feel is appropriate
to use. If the XID seems inappropriate to use (perhaps because of the
order in which transactions committed), then an equivalent target such as
a timestamp would be chosen to perform the recovery. I hope that is clear.

Having said that, the only other options I see here are oldestXid,
oldestActiveXid or NextXID, and none of them seems appropriate to use.

I think this logic would need to be baked into recoveryStopsAfter()
and/or recoveryStopsBefore() in order to work. It's not clear to me
what that would look like, however, or if the xid check is even
practical.

The above two functions determine whether the recovery process has to
stop at a particular position, and the patch has nothing to do with them.

To summarise, this patch only determines whether WAL replay should start
at all. In other words, if the specified recovery target falls prior to
the restored database position, then WAL replay should not start, which
prevents the database from getting promoted at an unintended recovery
target.

I understand what you are trying to do. My point is that the information
you are working from (whatever checkpoint happens to be in pg_control when
recovery starts) is not sufficient for the task. This method is inaccurate
and would disallow valid recovery scenarios.

Please consider my explanations above.

I suggest doing a thorough read-through of StartupXLOG(), particularly the
section where the backup_label is loaded.

As explained above, not using the backup_label file contents for the
recovery start point check was intentional. I will be happy to hear any
comments/objections on that.

Also, the previous patch no longer applies to the latest master; attached
is the updated patch rebased onto the latest master. I have also attached
the rebased second patch.

Thank you again for your comments on my patch.

Regards,

Venkata B N
Database Consultant

Attachments:

recoveryStartPoint.patch-version5 (application/octet-stream)
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index a91864b..73ef387 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -211,6 +211,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
         The precise stopping point is also influenced by
         <xref linkend="recovery-target-inclusive">.
        </para>
+       <para>
+        Recovery also checks that the latest checkpoint time of the backup being used is earlier than the specified recovery_target_time.
+        In other words, the recovery will not proceed if the checkpoint time of the backup is later than the specified recovery_target_time.
+       </para>
+
       </listitem>
      </varlistentry>
 
@@ -231,6 +236,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
         The precise stopping point is also influenced by
         <xref linkend="recovery-target-inclusive">.
        </para>
+       <para>
+        Recovery also checks that the latest completed xid of the backup is earlier than the specified recovery target xid.
+        In other words, the recovery will not proceed if the latest completed xid of the backup is later than the specified target xid.
+       </para>
+
       </listitem>
      </varlistentry>
 
@@ -248,6 +258,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
         parameter is parsed using the system data type
         <link linkend="datatype-pg-lsn"><type>pg_lsn</></link>.
        </para>
+       <para>
+        Recovery also checks that the latest checkpoint's redo position of the backup is earlier than the specified recovery target lsn.
+        In other words, the recovery will not proceed if the latest checkpoint's redo position of the backup is later than the specified target lsn.
+       </para>
+
       </listitem>
      </varlistentry>
      </variablelist>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 58790e0..787a61a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -841,6 +841,7 @@ static MemoryContext walDebugCxt = NULL;
 #endif
 
 static void readRecoveryCommandFile(void);
+static void recoveryStartsHere(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
@@ -5601,6 +5602,66 @@ getRecordTimestamp(XLogReaderState *record, TimestampTz *recordXtime)
 	return false;
 }
 
+/*
+ * When performing point-in-time recovery, this function checks whether the
+ * specified recovery target (recovery_target_time, recovery_target_lsn or
+ * recovery_target_xid) falls before the position of the restored backup.
+ * Recovery cannot proceed if the recovery target precedes the backup's
+ * start position.
+ */
+
+static void
+recoveryStartsHere(void)
+{
+	/*
+	 * Check if the recovery target xid is older than the latest completed xid
+	 * of the backup
+	 */
+
+	if (recoveryTarget == RECOVERY_TARGET_XID)
+	{
+		if (TransactionIdPrecedes(recoveryTargetXid,
+							 ControlFile->checkPointCopy.latestCompletedXid))
+		{
+			ereport(ERROR,
+					(errmsg("Specified recovery_target_xid is older than the latest Completed Xid which is %u", ControlFile->checkPointCopy.latestCompletedXid),
+					 errhint("This means that the backup being used is much later than the recovery target position.\n"
+							 "You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+
+	/*
+	 * Check if the recovery target lsn is prior to the latest checkpoint's
+	 * redo position of the backup
+	 */
+
+	if (recoveryTarget == RECOVERY_TARGET_LSN)
+	{
+		if (recoveryTargetLSN < ControlFile->checkPointCopy.redo)
+		{
+			ereport(ERROR,
+					(errmsg("Specified recovery_target_lsn is older than the backup start LSN which is \"%X/%X\"",
+								(uint32) (ControlFile->checkPointCopy.redo >> 32), (uint32) ControlFile->checkPointCopy.redo),
+					 errhint("This means that the backup being used is much later than the recovery target position.\n"
+							 "You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+
+	/*
+	 * Check if the recovery target time is prior to the latest checkpoint
+	 * time of the backup
+	 */
+
+	if (recoveryTarget == RECOVERY_TARGET_TIME)
+	{
+		if (recoveryTargetTime < time_t_to_timestamptz(ControlFile->checkPointCopy.time))
+		{
+			ereport(ERROR,
+					(errmsg("Specified recovery_target_time is older than the backup's latest checkpoint time which is %s",
+											pstrdup(str_time(ControlFile->checkPointCopy.time))),
+					errhint("This means that the backup being used is much later than the recovery target position.\n"
+						"You might need to use a backup taken prior to the recovery target point.")));
+		}
+	}
+}
+
 /*
  * For point-in-time recovery, this function decides whether we want to
  * stop applying the XLOG before the current record.
@@ -6334,6 +6395,9 @@ StartupXLOG(void)
 					(errmsg("starting archive recovery")));
 	}
 
+	/* Check if archive recovery can start at all */
+	recoveryStartsHere();
+
 	/*
 	 * Take ownership of the wakeup latch if we're going to sleep during
 	 * recovery.
@@ -8705,6 +8769,7 @@ CreateCheckPoint(int flags)
 	checkPoint.nextXid = ShmemVariableCache->nextXid;
 	checkPoint.oldestXid = ShmemVariableCache->oldestXid;
 	checkPoint.oldestXidDB = ShmemVariableCache->oldestXidDB;
+	checkPoint.latestCompletedXid = ShmemVariableCache->latestCompletedXid;
 	LWLockRelease(XidGenLock);
 
 	LWLockAcquire(CommitTsLock, LW_SHARED);
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 2ea8931..d70a221 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -228,6 +228,8 @@ main(int argc, char *argv[])
 	printf(_("Latest checkpoint's NextXID:          %u:%u\n"),
 		   ControlFile->checkPointCopy.nextXidEpoch,
 		   ControlFile->checkPointCopy.nextXid);
+	printf(_("Latest checkpoint's latestCompletedXID: %u\n"),
+		   ControlFile->checkPointCopy.latestCompletedXid);
 	printf(_("Latest checkpoint's NextOID:          %u\n"),
 		   ControlFile->checkPointCopy.nextOid);
 	printf(_("Latest checkpoint's NextMultiXactId:  %u\n"),
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 3a25cc8..898cec7 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -60,6 +60,7 @@ typedef struct CheckPoint
 	 * set to InvalidTransactionId.
 	 */
 	TransactionId oldestActiveXid;
+	TransactionId latestCompletedXid;
 } CheckPoint;
 
 /* XLOG info values for XLOG rmgr */
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 5ef007f..a448443 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -589,6 +589,11 @@ Restoring WAL segments from archives using restore_command can be enabled
 by passing the keyword parameter has_restoring => 1. This is disabled by
 default.
 
+Restoring WAL segments from archives using restore_command to perform PITR
+can be enabled by passing the keyword parameter has_restoring_pitr => 1. This
+is disabled by default. With this parameter enabled, the node will perform
+archive recovery without operating in standby mode.
+
 The backup is copied, leaving the original unmodified. pg_hba.conf is
 unconditionally set to enable replication connections.
 
@@ -604,6 +609,7 @@ sub init_from_backup
 
 	$params{has_streaming} = 0 unless defined $params{has_streaming};
 	$params{has_restoring} = 0 unless defined $params{has_restoring};
+	$params{has_restoring_pitr} = 0 unless defined $params{has_restoring_pitr};
 
 	print
 "# Initializing node \"$node_name\" from backup \"$backup_name\" of node \"$root_name\"\n";
@@ -626,6 +632,7 @@ port = $port
 ));
 	$self->enable_streaming($root_node) if $params{has_streaming};
 	$self->enable_restoring($root_node) if $params{has_restoring};
+	$self->enable_restoring_pitr($root_node) if $params{has_restoring_pitr};
 }
 
 =pod
@@ -658,6 +665,31 @@ sub start
 	$self->_update_pid;
 }
 
+
+=pod
+
+=item $node->start_pitr()
+
+Wrapper for pg_ctl start
+
+Start the node for PITR and return the pg_ctl exit status.
+
+=cut
+
+sub start_pitr
+{
+        my ($self) = @_;
+        my $port   = $self->port;
+        my $pgdata = $self->data_dir;
+        my $name   = $self->name;
+        my $log    = $self->logfile;
+        print("### Starting node \"$name\"\n");
+        my $ret = TestLib::system_log('pg_ctl', '-D', $self->data_dir, '-l',
+               $self->logfile, 'start');
+
+	return $ret;
+}
+
 =pod
 
 =item $node->stop(mode)
@@ -782,6 +814,32 @@ standby_mode = on
 ));
 }
 
+# Internal routine to enable archive recovery command on a standby node for PITR
+sub enable_restoring_pitr
+{
+        my ($self, $root_node) = @_;
+        my $path = $root_node->archive_dir;
+        my $name = $self->name;
+
+        print "### Enabling WAL restore for node \"$name\"\n";
+
+        # On Windows, the path specified in the restore command needs to use
+        # double back-slashes to work properly and to be able to detect properly
+        # the file targeted by the copy command, so the directory value used
+        # in this routine, using only one back-slash, need to be properly changed
+        # first. Paths also need to be double-quoted to prevent failures where
+        # the path contains spaces.
+        $path =~ s{\\}{\\\\}g if ($TestLib::windows_os);
+        my $copy_command =
+          $TestLib::windows_os
+          ? qq{copy "$path\\\\%f" "%p"}
+          : qq{cp "$path/%f" "%p"};
+	$self->append_conf(
+		'recovery.conf', qq(
+restore_command = '$copy_command'
+));
+}
+
 # Internal routine to enable archiving
 sub enable_archiving
 {
diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index b7b0caa..ec6b9fe 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 12;
 
 # Create and test a standby from given backup, with a certain
 # recovery target.
@@ -45,6 +45,69 @@ sub test_recovery_standby
 	$node_standby->teardown_node;
 }
 
+# Test recovery start point for various recovery targets
+
+sub test_recovery_pitr
+{
+        my $test_name       = shift;
+        my $node_name       = shift;
+        my $node_master     = shift;
+        my $recovery_params = shift;
+	my $num_rows	    = shift;
+
+        my $node_standby = get_new_node($node_name);
+        $node_standby->init_from_backup($node_master, 'my_pitr',
+                has_restoring_pitr => 1);
+
+        foreach my $param_item (@$recovery_params)
+        {
+                $node_standby->append_conf(
+                        'recovery.conf',
+                                qq($param_item
+));
+        }
+        my $ret = $node_standby->start_pitr;
+
+	my $log = $node_standby->logfile;
+	my $irx = "Specified recovery_target_xid is older than the latest Completed Xid";
+        my $irl = "Specified recovery_target_lsn is older than the backup start LSN";
+	my $irt = "Specified recovery_target_time is older than the backup's latest checkpoint time";
+
+        my @rx = `grep -inr "$irx" $log`;
+        my @rl = `grep -inr "$irl" $log`;
+	my @rt = `grep -inr "$irt" $log`;
+
+        if ($ret != 0)
+        {
+                if (@rx)
+                {
+                        print "Recovery could not start as the specified target XID was not legitimate. Test Passed\n";
+                        isnt($ret,0, "Successful $test_name.");
+                        goto TEST_RECOVERY_LSN;
+
+                }
+                elsif(@rl)
+                {
+			print "Recovery could not start as the specified target LSN was not legitimate. Test Passed\n";
+			isnt($ret,0, "Successful $test_name.");
+			goto TEST_RECOVERY_TIME;
+
+                }
+		elsif(@rt)
+		{
+                        print "Recovery could not start as the specified target TIME was not legitimate. Test Passed\n";
+                        isnt($ret,0, "Successful $test_name.");
+                        exit 0;
+		}
+                else
+                {
+                        print "# pg_ctl failed; logfile:\n";
+                        print TestLib::slurp_file($node_standby->logfile);
+                        BAIL_OUT("pg_ctl failed");
+                }
+        }
+}
+
 # Initialize master node
 my $node_master = get_new_node('master');
 $node_master->init(has_archiving => 1, allows_streaming => 1);
@@ -144,3 +207,57 @@ test_recovery_standby('XID + time + name',
 	"recovery_target_lsn = '$recovery_lsn'",);
 test_recovery_standby('XID + time + name + LSN',
 	'standby_9', $node_master, \@recovery_params, "5000", $lsn5);
+
+# Check if the recovery start point is legitimate
+# If the recovery start point is not appropriate, recovery should not start at all
+
+# Generate some data on master to advance the database position
+
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(4001,5000))");
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(5001,6000))");
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(6001,7000))");
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(7001,8000))");
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(8001,9000))");
+$node_master->safe_psql('postgres',
+        "INSERT INTO tab_int VALUES (generate_series(9001,10000))");
+
+print "required data generated.\n";
+
+my $rxid = $node_master->safe_psql('postgres',
+        "SELECT pg_current_wal_location(), txid_current();");
+my ($lsn10, $recovery_lxid) = split /\|/, $rxid;
+
+# Now take a fresh backup
+
+$node_master->backup('my_pitr');
+
+# Test the recovery start point for XID. Specify an old XID which falls prior to the backup's latest XID
+@recovery_params = (
+         "recovery_target_xid  = '$recovery_txid'",);
+
+test_recovery_pitr('Recovery Start Point check for XID',
+                 'pitr_1', $node_master, \@recovery_params, "10000");
+
+# Test the checking of the recovery start point for target LSN. Specify an old recovery target LSN which falls prior to the backup's latest LSN.
+TEST_RECOVERY_LSN:
+my $recovery_olsn = $lsn1;
+@recovery_params = (
+         "recovery_target_lsn  = '$recovery_olsn'",);
+
+test_recovery_pitr('Recovery Start Point check for LSN',
+                 'pitr_2', $node_master, \@recovery_params, "10000");
+
+# Test the checking of the recovery start point for target TIME. Specify an old recovery target TIME which falls prior to the backup's latest Timestamp.
+TEST_RECOVERY_TIME:
+
+my $recovery_otime = $recovery_time;
+@recovery_params = (
+         "recovery_target_time  = '$recovery_otime'",);
+
+test_recovery_pitr('Recovery Start Point check for TIME',
+                 'pitr_3', $node_master, \@recovery_params, "10000");
recoveryTargetIncomplete.patch-version5 (application/octet-stream)
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index a91864b..9053bb0 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -349,6 +349,48 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
       </listitem>
      </varlistentry>
 
+     <varlistentry id="recovery-target-incomplete"
+                   xreflabel="recovery_target_incomplete">
+      <term><varname>recovery_target_incomplete</varname> (<type>enum</type>)
+      <indexterm>
+        <primary><varname>recovery_target_incomplete</> recovery parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies what action the server should take if the end of WAL is reached
+        before the recovery target. The default is <literal>promote</>, which means
+        recovery will finish and the server will start to accept connections. <literal>pause</> means the recovery process will be paused.
+        Finally, <literal>shutdown</> will stop the server if the recovery cannot proceed any further.
+       </para>
+       <para>
+        The intended use of the <literal>pause</> setting is to allow queries
+        to be executed against the database to check if this recovery target
+        is the most desirable point for recovery.
+        The paused state can be resumed by
+        using <function>pg_xlog_replay_resume()</> (see
+        <xref linkend="functions-recovery-control-table">), which then
+        causes recovery to end. If this recovery target is not the
+        desired stopping point, then shut down the server, change the
+        recovery target settings to a later target and restart to
+        continue recovery.
+       </para>
+       <para>
+        The <literal>shutdown</> setting is useful to have the instance ready
+        at the exact replay point desired.  The instance will still be able to
+        replay more WAL records (and in fact will have to replay WAL records
+        since the last checkpoint next time it is started).
+       </para>
+       <para>
+        Note that because <filename>recovery.conf</> will not be renamed when
+        <varname>recovery_target_incomplete</> is set to <literal>shutdown</>,
+        any subsequent start will end with immediate shutdown unless the
+        configuration is changed or the <filename>recovery.conf</> file is
+        removed manually.
+       </para>
+      </listitem>
+     </varlistentry>
+
     </variablelist>
    </sect1>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 58790e0..4969519 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -260,6 +260,7 @@ static char *archiveCleanupCommand = NULL;
 static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET;
 static bool recoveryTargetInclusive = true;
 static RecoveryTargetAction recoveryTargetAction = RECOVERY_TARGET_ACTION_PAUSE;
+static RecoveryTargetIncomplete recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PROMOTE;
 static TransactionId recoveryTargetXid;
 static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
@@ -845,6 +846,7 @@ static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
 static void recoveryPausesHere(void);
+static void IncompleteRecoveryPause(void);
 static bool recoveryApplyDelay(XLogReaderState *record);
 static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
@@ -5220,6 +5222,26 @@ readRecoveryCommandFile(void)
 
 			recoveryTargetActionSet = true;
 		}
+		else if (strcmp(item->name, "recovery_target_incomplete") == 0)
+		{
+			if (strcmp(item->value, "pause") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PAUSE;
+			else if (strcmp(item->value, "promote") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_PROMOTE;
+			else if (strcmp(item->value, "shutdown") == 0)
+				recoveryTargetIncomplete = RECOVERY_TARGET_INCOMPLETE_SHUTDOWN;
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("invalid value for recovery parameter \"%s\": \"%s\"",
+					   "recovery_target_incomplete",
+					   item->value),
+						 errhint("Valid values are \"pause\", \"promote\", and \"shutdown\".")));
+
+			ereport(DEBUG2,
+					(errmsg_internal("recovery_target_incomplete = '%s'",
+									 item->value)));
+		}
 		else if (strcmp(item->name, "recovery_target_timeline") == 0)
 		{
 			rtliGiven = true;
@@ -5946,6 +5968,21 @@ SetRecoveryPause(bool recoveryPause)
 	SpinLockRelease(&XLogCtl->info_lck);
 }
 
+static void
+IncompleteRecoveryPause(void)
+{
+	/* Pause recovery at end-of-the-wal when recovery target is not reached */
+	ereport(LOG,
+			(errmsg("recovery has paused"),
+			 errhint("Execute pg_xlog_replay_resume() to continue.")));
+
+	while (RecoveryIsPaused())
+	{
+		pg_usleep(1000000L);	/* 1000 ms */
+		HandleStartupProcInterrupts();
+	}
+}
+
 /*
  * When recovery_min_apply_delay is set, we wait long enough to make sure
  * certain record types are applied at least that interval behind the master.
@@ -7263,6 +7300,46 @@ StartupXLOG(void)
 						break;
 				}
 			}
+			else
+			{
+				ereport(LOG,
+						(errmsg("recovery has reached the end of WAL before reaching the recovery target"),
+				errhint("This could be due to corrupt or missing WAL files.\n"
+						"All the WAL files needed to reach the recovery target must be available, "
+				"or you might need to choose an earlier recovery target.")));
+
+				/*
+				 * This is the position where we can choose to shut down, pause
+				 * or promote at the end-of-the-wal if the intended recovery
+				 * target is not reached
+				 */
+				switch (recoveryTargetIncomplete)
+				{
+
+					case RECOVERY_TARGET_INCOMPLETE_SHUTDOWN:
+
+						/*
+						 * exit with special return code to request shutdown
+						 * of postmaster.  Log messages issued from
+						 * postmaster.
+						 */
+
+						ereport(LOG,
+								(errmsg("shutdown at end-of-the-wal")));
+						proc_exit(2);
+
+					case RECOVERY_TARGET_INCOMPLETE_PAUSE:
+
+						SetRecoveryPause(true);
+						IncompleteRecoveryPause();
+
+						/* drop into promote */
+
+					case RECOVERY_TARGET_INCOMPLETE_PROMOTE:
+						break;
+				}
+
+			}
 
 			/* Allow resource managers to do any required cleanup. */
 			for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index c09c0f8..380c3ee 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -256,6 +256,17 @@ typedef enum
 } RecoveryTargetAction;
 
 /*
+ * Recovery target incomplete.
+ */
+
+typedef enum
+{
+	RECOVERY_TARGET_INCOMPLETE_PAUSE,
+	RECOVERY_TARGET_INCOMPLETE_PROMOTE,
+	RECOVERY_TARGET_INCOMPLETE_SHUTDOWN
+}	RecoveryTargetIncomplete;
+
+/*
  * Method table for resource managers.
  *
  * This struct must be kept in sync with the PG_RMGR definition in
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 5ef007f..4e18e78 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -589,6 +589,11 @@ Restoring WAL segments from archives using restore_command can be enabled
 by passing the keyword parameter has_restoring => 1. This is disabled by
 default.
 
+Restoring WAL segments from archives using restore_command to perform PITR
+can be enabled by passing the keyword parameter has_restoring_pitr => 1. This
+is disabled by default. With this parameter enabled, the node will perform
+archive recovery without operating in standby mode.
+
 The backup is copied, leaving the original unmodified. pg_hba.conf is
 unconditionally set to enable replication connections.
 
@@ -604,6 +609,7 @@ sub init_from_backup
 
 	$params{has_streaming} = 0 unless defined $params{has_streaming};
 	$params{has_restoring} = 0 unless defined $params{has_restoring};
+	$params{has_restoring_pitr} = 0 unless defined $params{has_restoring_pitr};
 
 	print
 "# Initializing node \"$node_name\" from backup \"$backup_name\" of node \"$root_name\"\n";
@@ -626,6 +632,7 @@ port = $port
 ));
 	$self->enable_streaming($root_node) if $params{has_streaming};
 	$self->enable_restoring($root_node) if $params{has_restoring};
+	$self->enable_restoring_pitr($root_node) if $params{has_restoring_pitr};
 }
 
 =pod
@@ -658,6 +665,68 @@ sub start
 	$self->_update_pid;
 }
 
+=item $node->pitr_start()
+
+Wrapper for pg_ctl start, for nodes performing PITR.
+
+Start the node; a startup failure is treated as a pass if the log shows an intentional shutdown at end of WAL.
+
+=cut
+
+sub pitr_start
+{
+	my ($self) = @_;
+	my $port   = $self->port;
+	my $pgdata = $self->data_dir;
+	my $name   = $self->name;
+	my $log    = $self->logfile;
+	print("### Starting node \"$name\"\n");
+	my $ret = TestLib::system_log('pg_ctl', '-D', $self->data_dir, '-l',
+		$self->logfile, 'start');
+
+	if ($ret != 0)
+	{
+		my $s = "shutdown at end of WAL";
+
+		my @rs = `grep "$s" "$log"`;
+
+		if (@rs)
+		{
+			print "Database was shut down at end of WAL. Test passed.\n";
+			isnt($ret, 0, "check the PITR node for successful SHUTDOWN at incomplete recovery");
+			exit 0;
+
+		}
+		else
+		{
+			print "# pg_ctl failed; logfile:\n";
+			print TestLib::slurp_file($self->logfile);
+			BAIL_OUT("pg_ctl failed");
+		}
+	}
+
+	$self->_update_pid;
+}
+
+=item $node->status()
+
+Wrapper for pg_ctl status
+
+Check the node status
+
+=cut
+
+sub status
+{
+	my ($self) = @_;
+	my $port   = $self->port;
+	my $pgdata = $self->data_dir;
+	my $name   = $self->name;
+	print("### Checking the node status \"$name\"\n");
+	my $ret = TestLib::system_log('pg_ctl', '-D', $self->data_dir, '-l',
+		$self->logfile, 'status');
+}
+
 =pod
 
 =item $node->stop(mode)
@@ -774,7 +843,6 @@ sub enable_restoring
 	  $TestLib::windows_os
 	  ? qq{copy "$path\\\\%f" "%p"}
 	  : qq{cp "$path/%f" "%p"};
-
 	$self->append_conf(
 		'recovery.conf', qq(
 restore_command = '$copy_command'
@@ -782,6 +850,32 @@ standby_mode = on
 ));
 }
 
+# Internal routine to enable archive recovery for PITR on a node
+sub enable_restoring_pitr
+{
+	my ($self, $root_node) = @_;
+	my $path = $root_node->archive_dir;
+	my $name = $self->name;
+
+	print "### Enabling WAL restore for node \"$name\"\n";
+
+	# On Windows, the path specified in the restore command needs to use
+	# double back-slashes to work properly and to be able to detect properly
+	# the file targeted by the copy command, so the directory value used
+	# in this routine, using only one back-slash, need to be properly changed
+	# first. Paths also need to be double-quoted to prevent failures where
+	# the path contains spaces.
+	$path =~ s{\\}{\\\\}g if ($TestLib::windows_os);
+	my $copy_command =
+	  $TestLib::windows_os
+	  ? qq{copy "$path\\\\%f" "%p"}
+	  : qq{cp "$path/%f" "%p"};
+	$self->append_conf(
+		'recovery.conf', qq(
+restore_command = '$copy_command'
+));
+}
+
 # Internal routine to enable archiving
 sub enable_archiving
 {
diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index b7b0caa..e47fb74 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 12;
 
 # Create and test a standby from given backup, with a certain
 # recovery target.
@@ -27,9 +27,7 @@ sub test_recovery_standby
 			qq($param_item
 ));
 	}
-
 	$node_standby->start;
-
 	# Wait until standby has replayed enough data
 	my $caughtup_query =
 	  "SELECT '$until_lsn'::pg_lsn <= pg_last_wal_replay_location()";
@@ -41,6 +39,44 @@ sub test_recovery_standby
 	  $node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
 	is($result, qq($num_rows), "check standby content for $test_name");
 
+	print "Rows in standby: $result\n";
+	print "Rows in master : $num_rows\n";
+
+	# Stop standby node
+	$node_standby->teardown_node;
+}
+
+
+# Create a node from given backup to perform PITR to a certain recovery target.
+
+sub test_recovery_pitr
+{
+	my $test_name       = shift;
+	my $node_name       = shift;
+	my $node_master     = shift;
+	my $recovery_params = shift;
+	my $num_rows        = shift;
+	my $until_lsn       = shift;
+
+	my $node_standby = get_new_node($node_name);
+	$node_standby->init_from_backup($node_master, 'my_pitr',
+		has_restoring_pitr => 1);
+
+	foreach my $param_item (@$recovery_params)
+	{
+		$node_standby->append_conf(
+			'recovery.conf',
+			qq($param_item
+));
+	}
+	$node_standby->pitr_start;
+	# Check the content on the PITR node
+	my $result =
+		$node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
+	isnt($result, qq($num_rows), "check the PITR node for successful $test_name");
+	print "Rows in PITR  : $result\n";
+	print "Rows in master: $num_rows\n";
+
 	# Stop standby node
 	$node_standby->teardown_node;
 }
@@ -144,3 +180,79 @@ test_recovery_standby('XID + time + name',
 	"recovery_target_lsn = '$recovery_lsn'",);
 test_recovery_standby('XID + time + name + LSN',
 	'standby_9', $node_master, \@recovery_params, "5000", $lsn5);
+
+# Test Incomplete recovery
+# Initialize master node
+
+$node_master = get_new_node('master1');
+$node_master->init(has_archiving => 1, allows_streaming => 1);
+
+# Start it
+$node_master->start;
+
+# Create data before taking the backup, to validate the recovery results against
+
+$node_master->safe_psql('postgres',
+         "CREATE TABLE tab_int AS SELECT generate_series(1,1000) AS a");
+
+my $initial_lsn = $node_master->safe_psql('postgres', "SELECT pg_current_wal_location();");
+print "LSN before backup: $initial_lsn\n";
+my $initial_xid = $node_master->safe_psql('postgres', "SELECT txid_current();");
+print "XID before backup: $initial_xid\n";
+my $initial_rows = $node_master->safe_psql('postgres', "SELECT count(*) FROM tab_int;");
+print "Initial rows before backup: $initial_rows\n";
+
+# Take backup from which all operations will be run
+$node_master->backup('my_pitr');
+
+# Generate more data and WAL archives to provide a recovery target reference.
+
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(1001,10000))");
+# Force archiving of WAL file
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal()");
+
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(10001,20000))");
+# Force archiving of WAL file
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal()");
+
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(20001,30000))");
+
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(30001,40000))");
+
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(40001,60000))");
+
+# Current WAL position in the master node
+$ret = $node_master->safe_psql('postgres',
+	"SELECT pg_current_wal_location(), txid_current();");
+my ($lsn6, $recovery_xid) = split /\|/, $ret;
+
+# Test server promotion when recovery fails to reach the recovery target.
+my $recovery_target_incomplete = 'promote';
+
+@recovery_params = (
+	"recovery_target_xid = '$recovery_xid'",
+	"recovery_target_incomplete = '$recovery_target_incomplete'");
+test_recovery_pitr('PROMOTION at incomplete recovery', 'pitr_1', $node_master, \@recovery_params, "60000", $lsn6);
+
+# Check that the server pauses when recovery stops mid-way without reaching the recovery target.
+
+$recovery_target_incomplete = 'pause';
+
+@recovery_params = (
+	"recovery_target_xid = '$recovery_xid'",
+	"recovery_target_incomplete = '$recovery_target_incomplete'");
+test_recovery_pitr('PAUSE at incomplete recovery', 'pitr_2', $node_master, \@recovery_params, "60000", $lsn6);
+
+# Check that the server shuts down cleanly when recovery stops mid-way without reaching the recovery target.
+
+$recovery_target_incomplete = 'shutdown';
+
+@recovery_params = (
+	"recovery_target_xid = '$recovery_xid'",
+	"recovery_target_incomplete = '$recovery_target_incomplete'");
+test_recovery_pitr('SHUTDOWN at incomplete recovery', 'pitr_3', $node_master, \@recovery_params, "60000", $lsn6);
diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
index 0d3859d..f90a449 100755
--- a/src/tools/pgindent/pgindent
+++ b/src/tools/pgindent/pgindent
@@ -247,7 +247,7 @@ sub pre_indent
 		# previous line was struct, etc.
 		next
 		  if $srclines[ $lno - 1 ] =~
-			  m!=|^(struct|enum|[ \t]*typedef|extern[ \t]+"C")!;
+		  m!=|^(struct|enum|[ \t]*typedef|extern[ \t]+"C")!;
 
 		$srclines[$lno] = "$l2\nint pgindent_func_no_var_fix;";
 	}
@@ -520,7 +520,7 @@ File::Find::find(
 			  && -f _
 			  && /^.*\.[ch]\z/s
 			  && push(@files, $File::Find::name);
-		  }
+		}
 	},
 	$code_base) if $code_base;
 
diff --git a/src/tools/pgindent/pgperltidy b/src/tools/pgindent/pgperltidy
index 6098e18..395bce3 100755
--- a/src/tools/pgindent/pgperltidy
+++ b/src/tools/pgindent/pgperltidy
@@ -17,7 +17,7 @@ PERLTIDY=${PERLTIDY:-perltidy}
 	cut -d: -f1
 ) |
 sort -u |
-xargs $PERLTIDY --profile=src/tools/pgindent/perltidyrc
+xargs $PERLTIDY --profile=/data/PostgreSQL-Repo/postgresql/src/tools/pgindent/perltidyrc
 
 # perltidyrc specifies --backup-and-modify-in-place, so get rid of .bak files
 find . -type f -name '*.bak' | xargs rm
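
For reference, with this patch applied, a minimal recovery.conf for a
PITR run exercising the new parameter might look like this (the target
xid and the archive path are illustrative only):

    restore_command = 'cp /mnt/server/archivedir/%f %p'
    recovery_target_xid = '1234'
    recovery_target_incomplete = 'shutdown'    # or 'pause' / 'promote'
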
#39David Steele
david@pgmasters.net
In reply to: Venkata B Nagothi (#38)
Re: patch proposal

On 3/26/17 7:34 PM, Venkata B Nagothi wrote:

Hi David,

On Thu, Mar 23, 2017 at 4:21 AM, David Steele <david@pgmasters.net> wrote:

On 3/21/17 8:45 PM, Venkata B Nagothi wrote:

On Tue, Mar 21, 2017 at 8:46 AM, David Steele <david@pgmasters.net> wrote:

Unfortunately, I don't think the first patch (recoveryStartPoint) will
work as currently implemented. The problem I see is that the new
function recoveryStartsHere() depends on pg_control containing a
checkpoint right at the end of the backup. There's no guarantee that
this is true, even if pg_control is copied last. That means a time,
lsn, or xid that occurs somewhere in the middle of the backup can be
selected without complaint from this code depending on timing.

Yes, that is true. The latest checkpoint position, xid, and timestamp of
the restored backup are what show up in pg_controldata, which means that
is the position from which recovery would start.

Backup recovery starts from the checkpoint in the backup_label, not
from the checkpoint in pg_control. The original checkpoint that
started the backup is generally overwritten in pg_control by the end
of the backup.

Yes, I totally agree. My initial intention was to compare the recovery
target position(s) with the contents of backup_label, but then the
checks would fail if recovery were performed without the backup_label
file. So I decided to compare the recovery target positions with the
contents of the pg_control file instead.

Which in turn means WAL replay starts from that position, proceeds to
the minimum recovery position (the backup end, i.e. applying the WAL
generated between the backup start and end), and continues all the way
through to the recovery target position.

minRecoveryPoint is only used when recovering a backup that was made
from a standby. For backups taken on the master, the backup end WAL
record is used.

The best start position to check against would be the position shown in
the pg_control file, which is much better than the current PostgreSQL
behaviour.

I don't agree, for the reasons given previously.

As explained above, my intention was to ensure that the recovery start
position checks are performed successfully irrespective of the presence
of the backup_label file.

I did some analysis before deciding to use pg_controldata's output
instead of the backup_label file contents.

Comparing the output of pg_controldata with the contents of
backup_label:

*Recovery Target LSN*

START WAL LOCATION (which is 0/9C000028) in the backup_label =
pg_controldata's latest checkpoint's REDO location (Latest
checkpoint's REDO location: 0/9C000028)

*Recovery Target TIME*

backup start time in the backup_label (START TIME: 2017-03-26
11:55:46 AEDT) = pg_controldata's latest checkpoint time (Time of
latest checkpoint : Sun 26 Mar 2017 11:55:46 AM AEDT)

*Recovery Target XID*

To begin with, backup_label does not contain any start XID, so the only
option is to depend on pg_controldata's output. After a few quick tests
and close observation, I noticed that the pg_control file is copied
as-is to the backup location at pg_start_backup. I performed some quick
tests by running a few transactions between pg_start_backup and
pg_stop_backup, so I believe this is the ideal start point for WAL
replay.
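
For illustration, here is roughly how the two sources can be compared on
a restored backup (the data directory path below is hypothetical):

    pg_controldata /restore/data | grep -E "REDO location|Time of latest checkpoint"
    cat /restore/data/backup_label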

Am I missing anything here?

You are making assumptions about the contents of pg_control vs.
backup_label based on trivial tests. With PG defaults, the backup must
run about five minutes before the values in pg_control and backup_label
will diverge. Even if pg_control and backup_label do match, those are
the wrong values to use, and will get more incorrect the longer the
backup runs.

I believe there is a correct way to go about this, at least for time and
LSN, and I don't think your very approximate solution will pass muster
with a committer.

Since we are nearly at the end of the CF, I have marked this submission
"Returned with Feedback".

--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#40Chapman Flack
chap@anastigmatix.net
In reply to: Stephen Frost (#5)
archive restore command failure status [was Re: patch proposal]

On 16 August 2016 at 09:06, Stephen Frost <sfrost@snowman.net> wrote:

From PG's perspective,
your restore command has said that all of the WAL has been replayed.

If that's not what you want then change your restore command to
return an exit code > 125, which tells PG that it's unable to restore
that WAL segment.

Ah! Is this documented somewhere? If we expect people to use correct
exit codes, we should document them, no?

It seems to be set out clearly in the code comments, but only alluded
to in the "Recovering using a continuous archive backup" doc section.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlogarchive.c;h=7afb73579b009444f9b7df658645ea3f2cf5e5f5;hb=958fe549884928cd3bdf009993e9a05df5fd6cee#l270

It looks like the code was put there to cover the case where certain
unanticipated Bad Things happen (a signal, or one of the Single Unix
Spec reserved codes 126 or 127 when the shell couldn't do what you
asked it to), but not necessarily with an eye to the possibility that
you might /want/ to make recovery abort based on a condition your
restore script detects; that's why the doc only mentions an exception
on termination by a signal or an error by the shell, but doesn't
emphasize how you could do this yourself (by exit 126 or 127, or
sending yourself a signal).

Maybe that is worth a brief mention? It doesn't seem outlandish to me
that a restore script could detect conditions worth stopping recovery
for.
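
For example, a wrapper along these lines (the script name and archive
path here are my own invention, purely illustrative) would turn a
missing segment into a hard stop instead of a normal end of recovery:

    #!/bin/sh
    # restore_command = '/usr/local/bin/restore_wal.sh %f %p'
    f="$1"; p="$2"
    if [ -r "/mnt/archive/$f" ]; then
        exec cp "/mnt/archive/$f" "$p"
    else
        # A plain non-zero exit (< 126) just means "not in the archive"
        # and lets recovery end normally; exiting with a status > 125
        # makes the startup process treat it as a fatal failure and
        # abort recovery.
        exit 126
    fi

Of course, an unconditional exit 126 like this also makes the normal
end-of-archive case fatal, so it only makes sense when every requested
segment is expected to exist.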

-Chap