Improve handling of parameter differences in physical replication
When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record. The standby then checks whether its own settings are at least
as big as the ones on the primary. If not, the standby shuts down with
a fatal error.
The correspondence of settings between primary and standby is required
because those settings influence certain shared memory sizings that are
required for processing WAL records that the primary might send. For
example, if the primary sends a prepared transaction, the standby must
have had max_prepared_transactions set appropriately or it won't be able
to process those WAL records.
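For illustration, the check amounts to roughly the following sketch (modeled on RecoveryRequiresIntParameter(), called from CheckRequiredParameterValues() in src/backend/access/transam/xlog.c; simplified, and the message wording here is approximate):

static void
RecoveryRequiresIntParameter(const char *param_name, int currValue, int minValue)
{
    if (currValue < minValue)
        ereport(FATAL,          /* the hard shutdown discussed below */
                (errmsg("hot standby is not possible because %s = %d is a lower setting than on the master server (its value was %d)",
                        param_name, currValue, minValue)));
}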
However, fatally shutting down the standby immediately upon receipt of
the parameter change record might be a bit of an overreaction. The
resources related to those settings are not required immediately at that
point, and might never be required if the activity on the primary does
not exhaust all those resources. An extreme example is raising
max_prepared_transactions on the primary but never actually using
prepared transactions.
Where this becomes a serious problem is if you have many standbys and
you do a failover. If the newly promoted standby happens to have a
higher setting for one of the relevant parameters, all the other
standbys that have followed it then shut down immediately and won't be
able to continue until you change all their settings.
If we didn't do the hard shutdown and we just let the standby roll on
with recovery, nothing bad would happen and it would eventually produce an
appropriate error when those resources are required (e.g., "maximum
number of prepared transactions reached").
So I think there are better ways to handle this. It might be reasonable
to provide options. The attached patch doesn't do that, but it would be
pretty easy to add. What the attached patch does is this:
Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but
only issue a warning and set a global flag if there is a problem. Then
when we actually hit the resource issue and the flag was set, we issue
another warning message with relevant information. Additionally, at
that point we pause recovery instead of shutting down, so a hot standby
remains usable. (That could certainly be configurable.)
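In rough outline (the flag and fragments below are illustrative, not the patch's actual identifiers; SetRecoveryPause() is the existing xlog.c function):

static bool insufficientParameterSeen = false;      /* hypothetical flag */

/* On replay of XLOG_PARAMETER_CHANGE: warn and remember instead of FATAL. */
if (currValue < minValue)
{
    ereport(WARNING,
            (errmsg("insufficient setting for parameter %s", param_name)));
    insufficientParameterSeen = true;
}

/* Later, on the resource-exhaustion path during redo: */
if (insufficientParameterSeen)
{
    ereport(WARNING,
            (errmsg("recovery paused because of insufficient parameter settings")));
    SetRecoveryPause(true);     /* hot standby remains usable while paused */
}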
Btw., I think the current setup is slightly buggy. The MaxBackends
value that is used to size shared memory is computed as MaxConnections +
autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but
we don't track autovacuum_max_workers in WAL.
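For reference, the settings carried by the record (matching the PARAMETER_CHANGE redo output quoted later in this thread; field list as in xl_parameter_change in access/xlog_internal.h) versus the sizing formula:

/* Settings tracked in the XLOG_PARAMETER_CHANGE record: */
typedef struct xl_parameter_change
{
    int     MaxConnections;
    int     max_worker_processes;
    int     max_wal_senders;
    int     max_prepared_xacts;
    int     max_locks_per_xact;
    int     wal_level;
    bool    wal_log_hints;
    bool    track_commit_timestamp;
} xl_parameter_change;

/* Shared memory, however, is sized from MaxBackends, which also
 * includes autovacuum_max_workers -- absent from the record above: */
MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
              max_worker_processes + max_wal_senders;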
(This patch was developed together with Simon Riggs.)
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v1-0001-Improve-handling-of-parameter-differences-in-phys.patch (text/plain)
Hello
Thank you for working on this!
Where this becomes a serious problem is if you have many standbys and you do a failover.
+1
Several times my team has wanted to pause recovery instead of panicking after changing settings on the primary. (The same goes for create_tablespace_directories replay errors too...)
Have we documented anywhere (outside the code) that the standby shuts down immediately upon receipt of the parameter change? doc/src/sgml/high-availability.sgml only talks about "refuse to start".
regards, Sergei
On 2020/02/27 17:23, Peter Eisentraut wrote:
When certain parameters are changed on a physical replication primary, this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record. The standby then checks whether its own settings are at least as big as the ones on the primary. If not, the standby shuts down with a fatal error.
The correspondence of settings between primary and standby is required because those settings influence certain shared memory sizings that are required for processing WAL records that the primary might send. For example, if the primary sends a prepared transaction, the standby must have had max_prepared_transactions set appropriately or it won't be able to process those WAL records.
However, fatally shutting down the standby immediately upon receipt of the parameter change record might be a bit of an overreaction. The resources related to those settings are not required immediately at that point, and might never be required if the activity on the primary does not exhaust all those resources. An extreme example is raising max_prepared_transactions on the primary but never actually using prepared transactions.
Where this becomes a serious problem is if you have many standbys and you do a failover. If the newly promoted standby happens to have a higher setting for one of the relevant parameters, all the other standbys that have followed it then shut down immediately and won't be able to continue until you change all their settings.
If we didn't do the hard shutdown and we just let the standby roll on with recovery, nothing bad would happen and it would eventually produce an appropriate error when those resources are required (e.g., "maximum number of prepared transactions reached").
So I think there are better ways to handle this. It might be reasonable to provide options. The attached patch doesn't do that but it would be pretty easy. What the attached patch does is:
Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but only issue a warning and set a global flag if there is a problem. Then when we actually hit the resource issue and the flag was set, we issue another warning message with relevant information. Additionally, at that point we pause recovery instead of shutting down, so a hot standby remains usable. (That could certainly be configurable.)
+1
Btw., I think the current setup is slightly buggy. The MaxBackends value that is used to size shared memory is computed as MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but we don't track autovacuum_max_workers in WAL.
Maybe this is because autovacuum doesn't work during recovery?
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
On 2020-02-27 11:13, Fujii Masao wrote:
Btw., I think the current setup is slightly buggy. The MaxBackends value that is used to size shared memory is computed as MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but we don't track autovacuum_max_workers in WAL.
Maybe this is because autovacuum doesn't work during recovery?
Autovacuum on the primary can use locks or xids, and so it's possible
that the standby, when processing WAL, encounters more of those than it
has locally allocated shared memory to handle.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Feb 27, 2020 at 02:37:24PM +0100, Peter Eisentraut wrote:
On 2020-02-27 11:13, Fujii Masao wrote:
Btw., I think the current setup is slightly buggy. The
MaxBackends value that is used to size shared memory is computed as
MaxConnections + autovacuum_max_workers + 1 + max_worker_processes +
max_wal_senders, but we don't track autovacuum_max_workers in WAL.
Maybe this is because autovacuum doesn't work during recovery?
Autovacuum on the primary can use locks or xids, and so it's possible that
the standby, when processing WAL, encounters more of those than it has locally
allocated shared memory to handle.
Putting aside your patch, because that sounds like a separate issue...
Doesn't this mean that autovacuum_max_workers should be added to the
control file, that we need to record in WAL any updates done to it and
that CheckRequiredParameterValues() is wrong?
--
Michael
On 2020-02-28 08:45, Michael Paquier wrote:
On Thu, Feb 27, 2020 at 02:37:24PM +0100, Peter Eisentraut wrote:
On 2020-02-27 11:13, Fujii Masao wrote:
Btw., I think the current setup is slightly buggy. The
MaxBackends value that is used to size shared memory is computed as
MaxConnections + autovacuum_max_workers + 1 + max_worker_processes +
max_wal_senders, but we don't track autovacuum_max_workers in WAL.
Maybe this is because autovacuum doesn't work during recovery?
Autovacuum on the primary can use locks or xids, and so it's possible that
the standby, when processing WAL, encounters more of those than it has locally
allocated shared memory to handle.
Putting aside your patch, because that sounds like a separate issue...
Doesn't this mean that autovacuum_max_workers should be added to the
control file, that we need to record in WAL any updates done to it and
that CheckRequiredParameterValues() is wrong?
That would be a direct fix, yes.
Perhaps it might be better to track the combined MaxBackends instead,
however.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Feb 28, 2020 at 08:49:08AM +0100, Peter Eisentraut wrote:
Perhaps it might be better to track the combined MaxBackends instead,
however.
Not sure about that. I think that we should keep them separated, as
that's more useful for debugging and more verbose for error reporting.
(Worth noting that max_prepared_xacts is separate because of its dummy
PGPROC entries created by PREPARE TRANSACTION, so it cannot be
included in the set).
--
Michael
On 2020-Feb-27, Peter Eisentraut wrote:
So this patch relaxes this a bit. Upon receipt of
XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
warning and set a global flag if there is a problem. Then when we
actually hit the resource issue and the flag was set, we issue another
warning message with relevant information. Additionally, at that
point we pause recovery, so a hot standby remains usable.
Hmm, so what is the actual end-user behavior? As I read the code, we
first send the WARNING, then pause recovery until the user resumes
replication; at that point we raise the original error. Presumably, at
that point the startup process terminates and is relaunched, and replay
continues normally. Is that it?
I think if the startup process terminates because of the original error,
after it is unpaused, postmaster will get that as a signal to do a
crash-recovery cycle, closing all existing connections. Is that right?
If so, it would be worth improving that (possibly by adding a
sigsetjmp() block) to avoid the disruption.
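For example, something along the lines of the error-recovery blocks in other auxiliary-process main loops (a loose sketch, not tested against the startup process):

sigjmp_buf  local_sigjmp_buf;

if (sigsetjmp(local_sigjmp_buf, 1) != 0)
{
    /* An ereport(ERROR) longjmp'd here: report it, clear the error
     * state, and continue the loop instead of exiting the process. */
    EmitErrorReport();
    FlushErrorState();
}
/* From here on, ereport(ERROR) returns control to the block above. */
PG_exception_stack = &local_sigjmp_buf;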
Thanks,
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, 29 Feb 2020 at 06:39, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2020-Feb-27, Peter Eisentraut wrote:
So this patch relaxes this a bit. Upon receipt of
XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
warning and set a global flag if there is a problem. Then when we
actually hit the resource issue and the flag was set, we issue another
warning message with relevant information. Additionally, at that
point we pause recovery, so a hot standby remains usable.
Hmm, so what is the actual end-user behavior? As I read the code, we
first send the WARNING, then pause recovery until the user resumes
replication; at that point we raise the original error.
I think that after recovery is paused, users would do better to restart the
server rather than resume recovery. I agree with this idea, but I'm
slightly concerned that users might not realize that recovery is
paused until they look at that line in the server log or at
pg_stat_replication, because the standby server is still functional. So
I think we could periodically send a WARNING to inform users that we're
still waiting for a parameter change and a restart.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-02-28 16:33, Alvaro Herrera wrote:
Hmm, so what is the actual end-user behavior? As I read the code, we
first send the WARNING, then pause recovery until the user resumes
replication; at that point we raise the original error. Presumably, at
that point the startup process terminates and is relaunched, and replay
continues normally. Is that it?
No, at that point you get the original, current behavior: the server
instance shuts down with a fatal error.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-03-09 09:11, Masahiko Sawada wrote:
I think that after recovery is paused, users would do better to restart the
server rather than resume recovery. I agree with this idea, but I'm
slightly concerned that users might not realize that recovery is
paused until they look at that line in the server log or at
pg_stat_replication, because the standby server is still functional. So
I think we could periodically send a WARNING to inform users that we're
still waiting for a parameter change and a restart.
I think that would be annoying, unless you create a system for
configuring those periodic warnings.
I imagine in a case like having set max_prepared_transactions but never
actually using prepared transactions, people will just ignore the
warning until they have their next restart, so it could be months of
periodic warnings.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 9 Mar 2020 at 18:45, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 2020-03-09 09:11, Masahiko Sawada wrote:
I think that after recovery is paused, users would do better to restart the
server rather than resume recovery. I agree with this idea, but I'm
slightly concerned that users might not realize that recovery is
paused until they look at that line in the server log or at
pg_stat_replication, because the standby server is still functional. So
I think we could periodically send a WARNING to inform users that we're
still waiting for a parameter change and a restart.
I think that would be annoying, unless you create a system for
configuring those periodic warnings.
I imagine in a case like having set max_prepared_transactions but never
actually using prepared transactions, people will just ignore the
warning until they have their next restart, so it could be months of
periodic warnings.
Well, I meant that we would periodically send warning messages while waiting for
the parameter change, that is, after exhausting resources and pausing
recovery. In this situation the user needs to notice that as soon as
possible.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
At Mon, 9 Mar 2020 21:13:38 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in
On Mon, 9 Mar 2020 at 18:45, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 2020-03-09 09:11, Masahiko Sawada wrote:
I think that after recovery is paused, users would do better to restart the
server rather than resume recovery. I agree with this idea, but I'm
slightly concerned that users might not realize that recovery is
paused until they look at that line in the server log or at
pg_stat_replication, because the standby server is still functional. So
I think we could periodically send a WARNING to inform users that we're
still waiting for a parameter change and a restart.
I think that would be annoying, unless you create a system for
configuring those periodic warnings.
I imagine in a case like having set max_prepared_transactions but never
actually using prepared transactions, people will just ignore the
warning until they have their next restart, so it could be months of
periodic warnings.
Well, I meant that we would periodically send warning messages while waiting for
the parameter change, that is, after exhausting resources and pausing
recovery. In this situation the user needs to notice that as soon as
possible.
If we lose the connection, the standby continues to complain about the lost
connection every 5 seconds. This is a situation of that kind.
By the way, when I reduced max_connections only on the master and then took
exclusive locks until the standby complained about lock exhaustion, I saw a
WARNING referring to max_locks_per_transaction instead of
max_connections.
WARNING: insufficient setting for parameter max_connections
DETAIL: max_connections = 2 is a lower setting than on the master server (where its value was 3).
HINT: Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.
CONTEXT: WAL redo at 0/60000A0 for XLOG/PARAMETER_CHANGE: max_connections=3 max_worker_processes=8 max_wal_senders=2 max_prepared_xacts=0 max_locks_per_xact=10 wal_level=replica wal_log_hints=off track_commit_timestamp=off
WARNING: recovery paused because of insufficient setting of parameter max_locks_per_transaction (currently 10)
DETAIL: The value must be at least as high as on the primary server.
HINT: Recovery cannot continue unless the parameter is changed and the server restarted.
CONTEXT: WAL redo at 0/6004A80 for Standb
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020-03-10 09:57, Kyotaro Horiguchi wrote:
Well, I meant that we would periodically send warning messages while waiting for
the parameter change, that is, after exhausting resources and pausing
recovery. In this situation the user needs to notice that as soon as
possible.
If we lose the connection, the standby continues to complain about the lost
connection every 5 seconds. This is a situation of that kind.
My argument is that it's not really the same. If a standby is
disconnected for more than a few minutes, it's really not going to be a
good standby anymore after a while. In this case, however, having
certain parameter discrepancies is really harmless and you can run with
it for a long time. I'm not strictly opposed to a periodic warning, but
it's unclear to me how we would find a good interval.
By the way, when I reduced max_connections only on the master and then took
exclusive locks until the standby complained about lock exhaustion, I saw a
WARNING referring to max_locks_per_transaction instead of
max_connections.
WARNING: insufficient setting for parameter max_connections
DETAIL: max_connections = 2 is a lower setting than on the master server (where its value was 3).
HINT: Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.
CONTEXT: WAL redo at 0/60000A0 for XLOG/PARAMETER_CHANGE: max_connections=3 max_worker_processes=8 max_wal_senders=2 max_prepared_xacts=0 max_locks_per_xact=10 wal_level=replica wal_log_hints=off track_commit_timestamp=off
WARNING: recovery paused because of insufficient setting of parameter max_locks_per_transaction (currently 10)
DETAIL: The value must be at least as high as on the primary server.
HINT: Recovery cannot continue unless the parameter is changed and the server restarted.
CONTEXT: WAL redo at 0/6004A80 for Standb
This is all a web of half-truths. The lock tables are sized based on
max_locks_per_xact * (MaxBackends + max_prepared_xacts). So if you run
out of lock space, we currently recommend (in the single-server case),
that you raise max_locks_per_xact, but you could also raise
max_prepared_xacts or something else. So this is now the opposite case
where the lock table on the master was bigger because of max_connections.
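To make that concrete with the settings from the test upthread (and assuming the default autovacuum_max_workers = 3, which is not in the WAL record), the lock table sizing works out roughly as:

primary: max_locks_per_xact * (MaxBackends + max_prepared_xacts)
       = 10 * ((3 + 3 + 1 + 8 + 2) + 0) = 170 lock slots
standby: 10 * ((2 + 3 + 1 + 8 + 2) + 0) = 160 lock slots

So lowering only max_connections shrinks the standby's lock table, and it can run out of lock space even though its max_locks_per_transaction matches the primary's.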
We could make the advice less specific and just say, in essence, you
need to make some parameter changes; see earlier for some hints.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
At Tue, 10 Mar 2020 14:47:47 +0100, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote in
On 2020-03-10 09:57, Kyotaro Horiguchi wrote:
Well, I meant that we would periodically send warning messages while waiting for
the parameter change, that is, after exhausting resources and pausing
recovery. In this situation the user needs to notice that as soon as
possible.
If we lose the connection, the standby continues to complain about the lost
connection every 5 seconds. This is a situation of that kind.
My argument is that it's not really the same. If a standby is
disconnected for more than a few minutes, it's really not going to be
a good standby anymore after a while. In this case, however, having
certain parameter discrepancies is really harmless and you can run
with it for a long time. I'm not strictly opposed to a periodic
warning, but it's unclear to me how we would find a good interval.
I meant the behavior after streaming is paused. That situation leads
to loss of WAL or running out of WAL storage on the master. Actually,
5 seconds as an interval would be too frequent, but maybe we need at
least one message per WAL segment?
By the way, when I reduced max_connections only on the master and then took
exclusive locks until the standby complained about lock exhaustion, I saw a
WARNING referring to max_locks_per_transaction instead of
max_connections.
...
WARNING: recovery paused because of insufficient setting of parameter max_locks_per_transaction (currently 10)
DETAIL: The value must be at least as high as on the primary server.
HINT: Recovery cannot continue unless the parameter is changed and the server restarted.
CONTEXT: WAL redo at 0/6004A80 for Standb
This is all a web of half-truths. The lock tables are sized based on
max_locks_per_xact * (MaxBackends + max_prepared_xacts). So if you
run out of lock space, we currently recommend (in the single-server
case), that you raise max_locks_per_xact, but you could also raise
max_prepared_xacts or something else. So this is now the opposite
case where the lock table on the master was bigger because of
max_connections.
Yeah, I know. So I'm not sure whether the checks on individual GUC
variables (other than wal_level) make sense. We might not even need
the WARNING on parameter change.
We could make the advice less specific and just say, in essence, you
need to make some parameter changes; see earlier for some hints.
In that sense the direction mentioned above seems sensible.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Here is an updated patch that incorporates some of the suggestions. In
particular, some of the warning messages have been rephrased to be more
accurate (but also less specific), the warning message at recovery pause
now repeats every minute, and the documentation has been updated.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v2-0001-Improve-handling-of-parameter-differences-in-phys.patch (text/plain)
On Thu, 12 Mar 2020 at 04:34, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
Here is an updated patch that incorporates some of the suggestions. In
particular, some of the warning messages have been rephrased to be more
accurate (but also less specific), the warning message at recovery pause
now repeats every minute, and the documentation has been updated.
Thank you for updating the patch. I have one comment on the latest
version of the patch:
+ do
+ {
+     TimestampTz now = GetCurrentTimestamp();
+
+     if (TimestampDifferenceExceeds(last_warning, now, 60000))
+     {
+         ereport(WARNING,
+                 (errmsg("recovery paused because of insufficient parameter settings"),
+                  errdetail("See earlier in the log about which settings are insufficient."),
+                  errhint("Recovery cannot continue unless the configuration is changed and the server restarted.")));
+         last_warning = now;
+     }
+
+     pg_usleep(1000000L);    /* 1000 ms */
+     HandleStartupProcInterrupts();
+ }
I think we can set wait event WAIT_EVENT_RECOVERY_PAUSE here.
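Concretely, that would be something like the following around the sleep (pgstat_report_wait_start()/pgstat_report_wait_end() are the existing wait-event APIs; exact placement is just a suggestion):

+     pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
+     pg_usleep(1000000L);    /* 1000 ms */
+     pgstat_report_wait_end();
+     HandleStartupProcInterrupts();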
The others look good to me.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello
I think we can set wait event WAIT_EVENT_RECOVERY_PAUSE here.
+1, since we added this in recoveryPausesHere.
PS: do we need to add a prototype for the RecoveryRequiresIntParameter function at the top of xlog.c?
regards, Sergei
On 2020-03-27 20:15, Sergei Kornilov wrote:
I think we can set wait event WAIT_EVENT_RECOVERY_PAUSE here.
+1, since we added this in recoveryPausesHere.
committed with that addition
PS: do we need to add a prototype for the RecoveryRequiredIntParameter function in top of xlog.c?
There is no consistent style, I think, but I usually only add prototypes
for static functions if they are required because of the ordering in the
file.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Here is another stab at this subject.
This is a much simplified variant: When a parameter change arrives in
the WAL with a value higher than the standby's current setting, we log a
warning (instead of an error, as until now) and pause recovery. If you
resume (unpause) recovery, the instance shuts down as before.
This allows you to keep your standbys running for a bit (depending on
lag requirements) and schedule the required restart more deliberately.
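In rough sketch form (hypothetical code; the actual patch may differ in detail, but SetRecoveryPause() and RecoveryIsPaused() are existing xlog.c facilities):

static void
RecoveryRequiresIntParameter(const char *param_name, int currValue, int minValue)
{
    if (currValue < minValue)
    {
        ereport(WARNING,
                (errmsg("insufficient setting for parameter %s", param_name),
                 errdetail("%s = %d is a lower setting than on the primary server (where its value was %d).",
                           param_name, currValue, minValue)));
        SetRecoveryPause(true);         /* hot standby stays usable while paused */
        while (RecoveryIsPaused())
        {
            HandleStartupProcInterrupts();
            pg_usleep(1000000L);
        }
        /* The user resumed recovery: fall back to the old shutdown. */
        ereport(FATAL,
                (errmsg("recovery aborted because of insufficient parameter settings")));
    }
}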
I had previously suggested making this new behavior configurable, but
there didn't seem to be much interest in that, so I have not included
that here.
The documentation changes are mostly carried over from previous patch
versions (but adjusted for the actual behavior of the patch).
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services