BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

Started by Nonameover 10 years ago18 messagesbugs
Jump to latest
#1Noname
postgresql.org@gclough.com

The following bug has been logged on the website:

Bug reference: 13770
Logged by: Greg Clough
Email address: postgresql.org@gclough.com
PostgreSQL version: 9.4.5
Operating system: Oracle Linux v6.7
Description:

If you extend the recovery_min_apply_delay on a standby and then restart it,
you cannot connect to it for read-only transactions until the most recent
transaction time >= delay time. This is problematic if you extend from say
1 hour to 24 hours, as you have to wait almost a day before you can connect
to your Standby again.

I can't find anything in the documentation to describe this as intended
behaviour, and my expectation was that when I extended the delay that the
Standby database would come up and be available for read-only transactions,
but would simply delay any new transactions until they were older than
recovery_min_apply_delay.

I'm curious if this was a design decision to stop people getting confused
when they access a database before recovery_min_apply_delay has expired and
get data that's too recent, or if it was an unintentional consequence of the
way it was implemented.

Environment:

Oracle Linux v6.7
PostgreSQL v9.4.5 (Installed via yum)

postgres@gclough[5432] ~$ uname -a
Linux gclough 3.8.13-98.4.1.el6uek.x86_64 #2 SMP Wed Sep 23 18:46:01 PDT
2015 x86_64 x86_64 x86_64 GNU/Linux

postgres@gclough[5432] ~$ psql --version
psql (PostgreSQL) 9.4.5

Test Case:

1. Create a Primary database on port 5432

export PGDATA=/u01/postgres/9.4/5432/data
export PGPORT=5432
initdb

wal_level = hot_standby # minimal, archive, hot_standby, or
logical
max_wal_senders = 1 # max number of walsender processes

2. Replicate it to a Standby database on port 5433

export PGDATA=/u01/postgres/9.4/5433/data
export PGPORT=5433

port = 5433 # (change requires restart)
wal_level = hot_standby # minimal, archive, hot_standby, or
logical
max_wal_senders = 1 # max number of walsender processes
hot_standby = on # "on" allows queries during
recovery

3. On the Standby (Port 5433), introduce a 2min time delay with
"recovery_min_apply_delay" in the recovery.conf:

standby_mode = 'on'
primary_conninfo = 'port=5432 host=localhost user=postgres
application_name=port_5433'
recovery_min_apply_delay = 2min

4. Check the current lag on the Standby (Port 5433):

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:32:32 GMT 2015
receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
0/7000000 | 0/7000000 | | t
(1 row)

5. Create a small test table on the Primary (Port 5432):

CREATE TABLE delay_test (id NUMERIC);
INSERT INTO delay_test VALUES (1);

6. Check the application of WAL on the Standby (Port 5433), and note the
delay:

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:33:06 GMT 2015
receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
0/7015DC8 | 0/7015890 | | t
(1 row)

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:34:00 GMT 2015
receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
0/7015E00 | 0/7015890 | | t
(1 row)

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:35:01 GMT 2015
receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
0/7015E00 | 0/7015E00 | 2015-11-11 15:32:53.950376+00 | t
(1 row)

7. Reset the time delay on the Standby from 2 to 5 minutes

recovery_min_apply_delay = 5min

pg_ctl restart -m fast

8. Try to connect to the Standby, and get a failure:

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:35:26 GMT 2015
psql: FATAL: the database system is starting up

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:36:01 GMT 2015
psql: FATAL: the database system is starting up

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:37:00 GMT 2015
psql: FATAL: the database system is starting up

9. Wait 3 minutes... then it works:

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:38:00 GMT 2015
receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
0/7015ED8 | 0/7015ED8 | 2015-11-11 15:32:53.950376+00 | t
(1 row)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Greg Clough
greg@gclough.com
In reply to: Noname (#1)
Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

Hi,

I didn't see any response to this, and I do believe it's an actual
PostgreSQL bug... or at least a case of undesirable undocumented behaviour.

*SHORT SUMMARY: If you are using a Standby with recovery_min_apply_delay =
1h and you increase it to 24h, then your Standby will be unavailable for
the next 23h.*

Could someone confirm my diagnosis?

Thanks.
Greg Clough.

On 13 November 2015 at 09:02, Greg Clough <greg@gclough.com> wrote:

Show quoted text

I may have glossed over how I created the standby, but just for clarity it
was done with the primary shutdown cleanly:

su - postgres
export PGPORT=5432
pg_ctl stop -m fast
cd /u01/postgres/9.4/5432
tar cvf - ./data | (cd /u01/postgres/9.4/5433 && tar xvf -)

Regads.
Greg.

On 11 November 2015 at 15:50, <postgresql.org@gclough.com> wrote:

The following bug has been logged on the website:

Bug reference: 13770
Logged by: Greg Clough
Email address: postgresql.org@gclough.com
PostgreSQL version: 9.4.5
Operating system: Oracle Linux v6.7
Description:

If you extend the recovery_min_apply_delay on a standby and then restart
it,
you cannot connect to it for read-only transactions until the most recent
transaction time >= delay time. This is problematic if you extend from
say
1 hour to 24 hours, as you have to wait almost a day before you can
connect
to your Standby again.

I can't find anything in the documentation to describe this as intended
behaviour, and my expectation was that when I extended the delay that the
Standby database would come up and be available for read-only
transactions,
but would simply delay any new transactions until they were older than
recovery_min_apply_delay.

I'm curious if this was a design decision to stop people getting confused
when they access a database before recovery_min_apply_delay has expired
and
get data that's too recent, or if it was an unintentional consequence of
the
way it was implemented.

Environment:

Oracle Linux v6.7
PostgreSQL v9.4.5 (Installed via yum)

postgres@gclough[5432] ~$ uname -a
Linux gclough 3.8.13-98.4.1.el6uek.x86_64 #2 SMP Wed Sep 23 18:46:01 PDT
2015 x86_64 x86_64 x86_64 GNU/Linux

postgres@gclough[5432] ~$ psql --version
psql (PostgreSQL) 9.4.5

Test Case:

1. Create a Primary database on port 5432

export PGDATA=/u01/postgres/9.4/5432/data
export PGPORT=5432
initdb

wal_level = hot_standby # minimal, archive, hot_standby,
or
logical
max_wal_senders = 1 # max number of walsender processes

2. Replicate it to a Standby database on port 5433

export PGDATA=/u01/postgres/9.4/5433/data
export PGPORT=5433

port = 5433 # (change requires restart)
wal_level = hot_standby # minimal, archive, hot_standby,
or
logical
max_wal_senders = 1 # max number of walsender processes
hot_standby = on # "on" allows queries during
recovery

3. On the Standby (Port 5433), introduce a 2min time delay with
"recovery_min_apply_delay" in the recovery.conf:

standby_mode = 'on'
primary_conninfo = 'port=5432 host=localhost user=postgres
application_name=port_5433'
recovery_min_apply_delay = 2min

4. Check the current lag on the Standby (Port 5433):

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:32:32 GMT 2015
receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
0/7000000 | 0/7000000 | | t
(1 row)

5. Create a small test table on the Primary (Port 5432):

CREATE TABLE delay_test (id NUMERIC);
INSERT INTO delay_test VALUES (1);

6. Check the application of WAL on the Standby (Port 5433), and note the
delay:

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:33:06 GMT 2015
receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
0/7015DC8 | 0/7015890 | | t
(1 row)

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:34:00 GMT 2015
receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
0/7015E00 | 0/7015890 | | t
(1 row)

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:35:01 GMT 2015
receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
0/7015E00 | 0/7015E00 | 2015-11-11 15:32:53.950376+00 | t
(1 row)

7. Reset the time delay on the Standby from 2 to 5 minutes

recovery_min_apply_delay = 5min

pg_ctl restart -m fast

8. Try to connect to the Standby, and get a failure:

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:35:26 GMT 2015
psql: FATAL: the database system is starting up

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:36:01 GMT 2015
psql: FATAL: the database system is starting up

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:37:00 GMT 2015
psql: FATAL: the database system is starting up

9. Wait 3 minutes... then it works:

postgres@gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:38:00 GMT 2015
receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
0/7015ED8 | 0/7015ED8 | 2015-11-11 15:32:53.950376+00 | t
(1 row)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Michael Paquier
michael@paquier.xyz
In reply to: Greg Clough (#2)
Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

On Mon, Dec 21, 2015 at 7:44 PM, Greg Clough <greg@gclough.com> wrote:

I didn't see any response to this, and I do believe it's an actual
PostgreSQL bug... or at least a case of undesirable undocumented behaviour.

SHORT SUMMARY: If you are using a Standby with recovery_min_apply_delay =
1h and you increase it to 24h, then your Standby will be unavailable for the
next 23h.

Could someone confirm my diagnosis?

When recovery_min_apply_delay is set, replay should wait for the delay
defined in the case of a COMMIT or COMMIT PREPARED record, and this
even if the minimum recovery point ensuring that standby is in a
consistent state on is not reached. In short, if there is a COMMIT
before the standby thinks that it has reached a consistent state to
allow read-only queries, which is what you are looking for, your
application won't be able to connect to the standby, and the feature
behaves correctly.

Depending on the use cases, it may be interesting to have a switch
allowing to not apply the delay should a consistent point not be
reached though...
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Greg Clough
greg@gclough.com
In reply to: Michael Paquier (#3)
Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

Hi Michael,

I figured that if the database was in a consistent state when it was
shutdown with a 1 hour delay, then when it was restarted with a new 24 hour
delay then it would also be consistent and thus available for read-only
work. In my test case, no transactions have occurred but the standby still
wasn't accessible. I think this was compounded, as I think the error
itself a bit confusing:

psql: FATAL: the database system is starting up

I believe the unavailability of the standby for extended periods if the
recovery_min_apply_delay is increased will create some confusion, just as
it confused me initially. I can see two schools of thought when the
parameter is increased:

1. The standby includes transactions that are newer than the
"recovery_min_apply_delay" setting, so it must be prevented from exposing
any data to ensure applications don't see data too soon.

2. The standby is consistent, so it should be available for read-only
queries... but any new WAL should not be applied until the commit time is >
recovery_min_apply_delay.

Obviously I fall into camp #2, where I expected the database to basically
pause application of WAL until it passed the recovery_min_apply_delay...
but still be available for read-only queries.

If the preferred option is #1, then could we introduce a new error message
so that it's a bit more communicative. Maybe something like:

psql: FATAL: the database system has transactions newer than
recover_min_apply_delay. Waiting...

... or something better of your choosing.

Regards.
Greg.

On 26 December 2015 at 13:45, Michael Paquier <michael.paquier@gmail.com>
wrote:

Show quoted text

On Mon, Dec 21, 2015 at 7:44 PM, Greg Clough <greg@gclough.com> wrote:

I didn't see any response to this, and I do believe it's an actual
PostgreSQL bug... or at least a case of undesirable undocumented

behaviour.

SHORT SUMMARY: If you are using a Standby with recovery_min_apply_delay

=

1h and you increase it to 24h, then your Standby will be unavailable for

the

next 23h.

Could someone confirm my diagnosis?

When recovery_min_apply_delay is set, replay should wait for the delay
defined in the case of a COMMIT or COMMIT PREPARED record, and this
even if the minimum recovery point ensuring that standby is in a
consistent state on is not reached. In short, if there is a COMMIT
before the standby thinks that it has reached a consistent state to
allow read-only queries, which is what you are looking for, your
application won't be able to connect to the standby, and the feature
behaves correctly.

Depending on the use cases, it may be interesting to have a switch
allowing to not apply the delay should a consistent point not be
reached though...
--
Michael

#5Michael Paquier
michael@paquier.xyz
In reply to: Greg Clough (#4)
Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

On Wed, Dec 30, 2015 at 10:05 PM, Greg Clough <greg@gclough.com> wrote:

I figured that if the database was in a consistent state when it was
shutdown with a 1 hour delay, then when it was restarted with a new 24 hour
delay then it would also be consistent and thus available for read-only
work. In my test case, no transactions have occurred but the standby still
wasn't accessible.

Are you sure about that? What does pg_xlogdump tells you? It seems
based on the information given upthread that there was at least one
transaction commit between the redo point and the consistent point.
When a standby is restarted, it would redo the replay from the last
same redo point, so that's just logic that the standby keeps being
unavailable for connections.
.

I believe the unavailability of the standby for extended periods if the
recovery_min_apply_delay is increased will create some confusion, just as it
confused me initially. I can see two schools of thought when the parameter
is increased:

1. The standby includes transactions that are newer than the
"recovery_min_apply_delay" setting, so it must be prevented from exposing
any data to ensure applications don't see data too soon.
2. The standby is consistent, so it should be available for read-only
queries... but any new WAL should not be applied until the commit time is >
recovery_min_apply_delay.

If the standby has a transaction commit only after a consistent point
is reached, you will be good to connect anyway.

Obviously I fall into camp #2, where I expected the database to basically
pause application of WAL until it passed the recovery_min_apply_delay... but
still be available for read-only queries.

If the preferred option is #1, then could we introduce a new error message
so that it's a bit more communicative. Maybe something like:

psql: FATAL: the database system has transactions newer than
recover_min_apply_delay. Waiting...

Hm. The current error message depends on the state of the database
reported by the postmaster. It does not seem to me a good idea to
expose a new state at this level regarding recover_min_apply_delay.

In short, it seems to me that the correct way to address your concerns
is actually a documentation addition, the point being to mention that
a hot standby would take a longer time to become available for
read-only connections, aka allowing read-only connections if there are
transaction commits that happened between the redo point and the
consistent point.
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#6Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#3)
Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

On 2015-12-26 22:45:57 +0900, Michael Paquier wrote:

Depending on the use cases, it may be interesting to have a switch
allowing to not apply the delay should a consistent point not be
reached though...

Is there actually any case where it's interesting to delay in that
scenario? I mean that really can only happen if you changed the
configuration to a different delay, or your clock offset
changed. Otherwise we should always reach the consistent point before
the delay plays a role. I'm tempted to simply only check for delay when
consistent.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#6)
Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

Andres Freund <andres@anarazel.de> writes:

On 2015-12-26 22:45:57 +0900, Michael Paquier wrote:

Depending on the use cases, it may be interesting to have a switch
allowing to not apply the delay should a consistent point not be
reached though...

Is there actually any case where it's interesting to delay in that
scenario? I mean that really can only happen if you changed the
configuration to a different delay, or your clock offset
changed. Otherwise we should always reach the consistent point before
the delay plays a role. I'm tempted to simply only check for delay when
consistent.

The argument for having a delay at all is to allow backing up to some
earlier point in the master's history; but a slave that is not yet
consistent cannot provide any rollback/recovery option. The slave is
completely useless for any purpose until it reaches consistency, so
it might as well do that as fast as possible, and then sit on the
next WAL record until the delay is met. +1 for no delay at all when
not consistent.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#8Greg Clough
greg@gclough.com
In reply to: Michael Paquier (#5)
Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

I like the idea of replaying up to a consistent point, and then pausing any
new transactions until recovery_min_apply_delay has passed... but maybe my
original problem is OS specific, as I tried PostgreSQL v9.4.5 on CentOS
64-bit the standby database opens just fine... even when
the recovery_min_apply_delay is changed from 1 hour to 24 hours. I'm
downloading Oracle Linux v6.7 to replicate my previous environment, and
I'll report back with my findings.

On 30 December 2015 at 13:44, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Wed, Dec 30, 2015 at 10:05 PM, Greg Clough <greg@gclough.com> wrote:

I figured that if the database was in a consistent state when it was
shutdown with a 1 hour delay, then when it was restarted with a new 24

hour

delay then it would also be consistent and thus available for read-only
work. In my test case, no transactions have occurred but the standby

still

wasn't accessible.

Are you sure about that? What does pg_xlogdump tells you? It seems
based on the information given upthread that there was at least one
transaction commit between the redo point and the consistent point.
When a standby is restarted, it would redo the replay from the last
same redo point, so that's just logic that the standby keeps being
unavailable for connections.

I didn't run pg_xlogdump... but I did check the timestamps of the most
recently applied transaction, and it does not change before/after the
increase in recovery_min_apply_delay. This was done on a private test
environment, so there is very little chance I accidentally did a
transaction. The xlog pointer seem to change, but the replay timestamp is
identical... so what would cause the xlog to increase, but not actually be
a transaction?

I had always put the difference (7015E00 vs. 7015ED8) down to the
shutdown/startup writing something like a "end of file" into the WAL during
the shutdown... but that's purely a guess, as I don't know the internals of
how it really works:

*BEFORE*
Wed Nov 11 15:35:01 GMT 2015
receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
0/7015E00 | 0/7015E00 | 2015-11-11 15:32:53.950376+00 | t
(1 row)

*AFTER*

Wed Nov 11 15:38:00 GMT 2015
receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
0/7015ED8 | 0/7015ED8 | 2015-11-11 15:32:53.950376+00 | t
(1 row)

.

I believe the unavailability of the standby for extended periods if the
recovery_min_apply_delay is increased will create some confusion, just

as it

confused me initially. I can see two schools of thought when the

parameter

is increased:

1. The standby includes transactions that are newer than the
"recovery_min_apply_delay" setting, so it must be prevented from exposing
any data to ensure applications don't see data too soon.
2. The standby is consistent, so it should be available for read-only
queries... but any new WAL should not be applied until the commit time

is >

recovery_min_apply_delay.

If the standby has a transaction commit only after a consistent point
is reached, you will be good to connect anyway.

Obviously I fall into camp #2, where I expected the database to basically
pause application of WAL until it passed the recovery_min_apply_delay...

but

still be available for read-only queries.

If the preferred option is #1, then could we introduce a new error

message

so that it's a bit more communicative. Maybe something like:

psql: FATAL: the database system has transactions newer than
recover_min_apply_delay. Waiting...

Hm. The current error message depends on the state of the database
reported by the postmaster. It does not seem to me a good idea to
expose a new state at this level regarding recover_min_apply_delay.

In short, it seems to me that the correct way to address your concerns
is actually a documentation addition, the point being to mention that
a hot standby would take a longer time to become available for
read-only connections, aka allowing read-only connections if there are
transaction commits that happened between the redo point and the
consistent point.
--
Michael

OK, that makes sense. It appears that these pages would be affected, but
as per my testing above... it doesn't seem to be an issue on CentOS like it
was on Oracle Linux:

http://www.postgresql.org/docs/9.4/static/standby-settings.html
http://www.postgresql.org/docs/9.5/static/standby-settings.html

I'll diagnose it on Oracle Linux vs. CentOS first, and then report back.

Cheers.
Greg.

#9Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#7)
Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

On Thu, Dec 31, 2015 at 12:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On 2015-12-26 22:45:57 +0900, Michael Paquier wrote:

Depending on the use cases, it may be interesting to have a switch
allowing to not apply the delay should a consistent point not be
reached though...

Is there actually any case where it's interesting to delay in that
scenario? I mean that really can only happen if you changed the
configuration to a different delay, or your clock offset
changed. Otherwise we should always reach the consistent point before
the delay plays a role. I'm tempted to simply only check for delay when
consistent.

The argument for having a delay at all is to allow backing up to some
earlier point in the master's history; but a slave that is not yet
consistent cannot provide any rollback/recovery option. The slave is
completely useless for any purpose until it reaches consistency, so
it might as well do that as fast as possible, and then sit on the
next WAL record until the delay is met. +1 for no delay at all when
not consistent.

OK, I don't mind doing so if you guys think that's more adapted. Based
on reading the code, it seems obvious though that this was made so as
a delay is taken into account even before the node is consistent.
Hence ISTM that it would be good as well to get feedback from people
who actually worked on the feature before deciding anything. Robert,
Simon?
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#10Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#9)
Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

On Thu, Dec 31, 2015 at 8:13 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Dec 31, 2015 at 12:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On 2015-12-26 22:45:57 +0900, Michael Paquier wrote:

Depending on the use cases, it may be interesting to have a switch
allowing to not apply the delay should a consistent point not be
reached though...

Is there actually any case where it's interesting to delay in that
scenario? I mean that really can only happen if you changed the
configuration to a different delay, or your clock offset
changed. Otherwise we should always reach the consistent point before
the delay plays a role. I'm tempted to simply only check for delay when
consistent.

The argument for having a delay at all is to allow backing up to some
earlier point in the master's history; but a slave that is not yet
consistent cannot provide any rollback/recovery option. The slave is
completely useless for any purpose until it reaches consistency, so
it might as well do that as fast as possible, and then sit on the
next WAL record until the delay is met. +1 for no delay at all when
not consistent.

OK, I don't mind doing so if you guys think that's more adapted. Based
on reading the code, it seems obvious though that this was made so as
a delay is taken into account even before the node is consistent.

Changing my mind after more thoughts on the matter, it seems indeed
that it would make more sense to apply delays only once the database
has reached a consistent state to be able to do immediately
transaction-related operations on a standby without having to wait for
it to reach consistency for perhaps a couple of hours. Please see
attached a patch to do that.
--
Michael

Attachments:

20160102_recovery_delay_consistent.patchtext/x-patch; charset=US-ASCII; name=20160102_recovery_delay_consistent.patchDownload+7-2
#11Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#10)
Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

On Sat, Jan 2, 2016 at 9:14 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Changing my mind after more thoughts on the matter, it seems indeed
that it would make more sense to apply delays only once the database
has reached a consistent state to be able to do immediately
transaction-related operations on a standby without having to wait for
it to reach consistency for perhaps a couple of hours. Please see
attached a patch to do that.

CF entry added here:
https://commitfest.postgresql.org/9/478/
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#12Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#10)
Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

On 02/01/2016 13:14, Michael Paquier wrote:

On Thu, Dec 31, 2015 at 8:13 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Dec 31, 2015 at 12:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On 2015-12-26 22:45:57 +0900, Michael Paquier wrote:

Depending on the use cases, it may be interesting to have a switch
allowing to not apply the delay should a consistent point not be
reached though...

Is there actually any case where it's interesting to delay in that
scenario? I mean that really can only happen if you changed the
configuration to a different delay, or your clock offset
changed. Otherwise we should always reach the consistent point before
the delay plays a role. I'm tempted to simply only check for delay when
consistent.

The argument for having a delay at all is to allow backing up to some
earlier point in the master's history; but a slave that is not yet
consistent cannot provide any rollback/recovery option. The slave is
completely useless for any purpose until it reaches consistency, so
it might as well do that as fast as possible, and then sit on the
next WAL record until the delay is met. +1 for no delay at all when
not consistent.

OK, I don't mind doing so if you guys think that's more adapted. Based
on reading the code, it seems obvious though that this was made so as
a delay is taken into account even before the node is consistent.

Changing my mind after more thoughts on the matter, it seems indeed
that it would make more sense to apply delays only once the database
has reached a consistent state to be able to do immediately
transaction-related operations on a standby without having to wait for
it to reach consistency for perhaps a couple of hours. Please see
attached a patch to do that.

I just reviewed the patch. It's pretty straightforward and works as
intended, so I mark it as ready for committer.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#13Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#12)
Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

On Thu, Mar 3, 2016 at 7:55 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:

I just reviewed the patch. It's pretty straightforward and works as
intended, so I mark it as ready for committer.

Thanks for the review!
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#14Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#13)
Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

On Thu, Mar 3, 2016 at 9:26 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Mar 3, 2016 at 7:55 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:

I just reviewed the patch. It's pretty straightforward and works as
intended, so I mark it as ready for committer.

The patch looks good to me.

We should backpatch this? I'm tempted to do that
because it seems an oversight.
It's user-visible change of behavior, though.

Regards,

--
Fujii Masao

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#15Julien Rouhaud
rjuju123@gmail.com
In reply to: Fujii Masao (#14)
Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

On 03/03/2016 15:42, Fujii Masao wrote:

On Thu, Mar 3, 2016 at 9:26 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Mar 3, 2016 at 7:55 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:

I just reviewed the patch. It's pretty straightforward and works as
intended, so I mark it as ready for committer.

The patch looks good to me.

We should backpatch this? I'm tempted to do that
because it seems an oversight.
It's user-visible change of behavior, though.

The user-visible previous situation is to have a standby unavailable for
no good reason, so +1 for backpatching it.

Regards,

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#16Andres Freund
andres@anarazel.de
In reply to: Fujii Masao (#14)
Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

On 2016-03-03 23:42:25 +0900, Fujii Masao wrote:

We should backpatch this? I'm tempted to do that
because it seems an oversight.
It's user-visible change of behavior, though.

I agree we should.

Andres

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#17Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#16)
Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

On Fri, Mar 4, 2016 at 2:35 AM, Andres Freund <andres@anarazel.de> wrote:

On 2016-03-03 23:42:25 +0900, Fujii Masao wrote:

We should backpatch this? I'm tempted to do that
because it seems an oversight.
It's user-visible change of behavior, though.

I agree we should.

We definitely need to backpatch.
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#18Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#17)
Re: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

On Fri, Mar 4, 2016 at 11:57 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Mar 4, 2016 at 2:35 AM, Andres Freund <andres@anarazel.de> wrote:

On 2016-03-03 23:42:25 +0900, Fujii Masao wrote:

We should backpatch this? I'm tempted to do that
because it seems an oversight.
It's user-visible change of behavior, though.

I agree we should.

We definitely need to backpatch.

Yeah, we should backpatch to 9.4 where the parameter was added.
Pushed. Thanks!

Regards,

--
Fujii Masao

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs