tx canceled on standby despite infinite max_standby_streaming_delay

Started by Jay Howardalmost 10 years ago5 messagesgeneral
Jump to latest
#1Jay Howard
jhoward@alumni.utexas.net

I'm seeing long-running transactions (pg_dump) canceled on the standby when
there are a lot of inserts happening on the master. This despite my having
set max_standby_streaming_delay to -1 on the standby.

Why might that happen?

This is pg 9.3.12. When it happens I see:

pg_dump: Dumping the contents of table "TABLE" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: canceling statement due to
conflict with recovery
DETAIL: User query might have needed to see row versions that must be
removed.
pg_dump: The command was: COPY public.TABLE (COLUMNS) TO stdout;

#2Venkata B Nagothi
nag1010@gmail.com
In reply to: Jay Howard (#1)
Re: tx canceled on standby despite infinite max_standby_streaming_delay

On Sat, May 14, 2016 at 12:27 PM, Jay Howard <jhoward@alumni.utexas.net>
wrote:

I'm seeing long-running transactions (pg_dump) canceled on the standby
when there are a lot of inserts happening on the master. This despite my
having set max_standby_streaming_delay to -1 on the standby.

Do you have hot_standby_feedback set to "on" ?

What is the parameter max_standby_archive_delay configured to ? This will
pause WAL archives from being applied when queries are executed on the
standby database.

Why might that happen?

This is pg 9.3.12. When it happens I see:

pg_dump: Dumping the contents of table "TABLE" failed: PQgetResult()
failed.
pg_dump: Error message from server: ERROR: canceling statement due to
conflict with recovery
DETAIL: User query might have needed to see row versions that must be
removed.
pg_dump: The command was: COPY public.TABLE (COLUMNS) TO stdout;

I suspect this is due to the clean up by VACUUM on primary.

Regards,
Venkata B N

Fujitsu Australia

#3Jay Howard
jhoward@alumni.utexas.net
In reply to: Venkata B Nagothi (#2)
Re: tx canceled on standby despite infinite max_standby_streaming_delay

Do you have hot_standby_feedback set to "on" ?

It was off. Will research that. Thank you!

What is the parameter max_standby_archive_delay configured to ? This will

pause WAL archives from being applied when queries are executed on the
standby database.

It's set to the default, which is 30 seconds. For some reason I thought
setting "max_standby_streaming_delay" to -1 would be sufficient.

At a high level, what's the difference between the "archive_delay" and
"streaming_delay"? I will read up on streaming replication in the mean
time.

On Sat, May 14, 2016 at 8:20 PM, Venkata Balaji N <nag1010@gmail.com> wrote:

Show quoted text

On Sat, May 14, 2016 at 12:27 PM, Jay Howard <jhoward@alumni.utexas.net>
wrote:

I'm seeing long-running transactions (pg_dump) canceled on the standby
when there are a lot of inserts happening on the master. This despite my
having set max_standby_streaming_delay to -1 on the standby.

Do you have hot_standby_feedback set to "on" ?

What is the parameter max_standby_archive_delay configured to ? This will
pause WAL archives from being applied when queries are executed on the
standby database.

Why might that happen?

This is pg 9.3.12. When it happens I see:

pg_dump: Dumping the contents of table "TABLE" failed: PQgetResult()
failed.
pg_dump: Error message from server: ERROR: canceling statement due to
conflict with recovery
DETAIL: User query might have needed to see row versions that must be
removed.
pg_dump: The command was: COPY public.TABLE (COLUMNS) TO stdout;

I suspect this is due to the clean up by VACUUM on primary.

Regards,
Venkata B N

Fujitsu Australia

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Jay Howard (#3)
Re: tx canceled on standby despite infinite max_standby_streaming_delay

Its customary to bottom-post (or respond inline) on these lists.

On Sun, May 15, 2016 at 7:01 PM, Jay Howard <jhoward@alumni.utexas.net>
wrote:

Do you have hot_standby_feedback set to "on" ?

It was off. Will research that. Thank you!

What is the parameter max_standby_archive_delay configured to ? This will

pause WAL archives from being applied when queries are executed on the
standby database.

It's set to the default, which is 30 seconds. For some reason I thought
setting "max_standby_streaming_delay" to -1 would be sufficient.

​At minimum I think there is room for improvement in the documentation here
since I spent probably a good 15-20 minutes trying to find an answer
related to either vacuum or WAL accumulation and could not discover
anything that directly permitted your situation to occur.​

At a high level, what's the difference between the "archive_delay" and

"streaming_delay"? I will read up on streaming replication in the mean
time.

http://www.postgresql.org/docs/9.5/static/hot-standby.html​

​"""

When a conflicting query is short, it's typically desirable to allow it to
complete by delaying WAL application for a little bit; but a long delay in
WAL application is usually not desirable. So the cancel mechanism has
parameters, max_standby_archive_delay and max_standby_streaming_delay, that
define the maximum allowed delay in WAL application. Conflicting queries
will be canceled once it has taken longer than the relevant delay setting
to apply any newly-received WAL data. There are two parameters so that
different delay values can be specified for the case of reading WAL data
from an archive (i.e., initial recovery from a base backup or "catching up"
a standby server that has fallen far behind) versus reading WAL data via
streaming replication.
​"""

​David J.

#5Jay Howard
jhoward@alumni.utexas.net
In reply to: David G. Johnston (#4)
Re: tx canceled on standby despite infinite max_standby_streaming_delay

On Sun, May 15, 2016 at 6:15 PM, David G. Johnston <
david.g.johnston@gmail.com> wrote:

Its customary to bottom-post (or respond inline) on these lists.

On Sun, May 15, 2016 at 7:01 PM, Jay Howard <jhoward@alumni.utexas.net>
wrote:

Do you have hot_standby_feedback set to "on" ?

It was off. Will research that. Thank you!

What is the parameter max_standby_archive_delay configured to ? This

will pause WAL archives from being applied when queries are executed on the
standby database.

It's set to the default, which is 30 seconds. For some reason I thought
setting "max_standby_streaming_delay" to -1 would be sufficient.

​At minimum I think there is room for improvement in the documentation
here since I spent probably a good 15-20 minutes trying to find an answer
related to either vacuum or WAL accumulation and could not discover
anything that directly permitted your situation to occur.​

At a high level, what's the difference between the "archive_delay" and

"streaming_delay"? I will read up on streaming replication in the mean
time.

http://www.postgresql.org/docs/9.5/static/hot-standby.html​

​"""

When a conflicting query is short, it's typically desirable to allow it
to complete by delaying WAL application for a little bit; but a long delay
in WAL application is usually not desirable. So the cancel mechanism has
parameters, max_standby_archive_delay and max_standby_streaming_delay, that
define the maximum allowed delay in WAL application. Conflicting queries
will be canceled once it has taken longer than the relevant delay setting
to apply any newly-received WAL data. There are two parameters so that
different delay values can be specified for the case of reading WAL data
from an archive (i.e., initial recovery from a base backup or "catching up"
a standby server that has fallen far behind) versus reading WAL data via
streaming replication.
​"""

​David J.

W.r.t. improving the documentation, what I didn't realize is that WAL files
can be placed in the archive (making "max_standby_archive_delay" relevant)
in situations other than just "initial recovery from a base backup".

In my case, there was some heavy-duty stuff happening on the master that
caused the standby to get sufficiently far behind that WAL files were
"archived". Then, while they were being applied from the archive, my
pg_dump ran on the standby.