Non-pausing table scan on 9.6 replica?

Started by Mark Fletcherabout 7 years ago7 messagesgeneral
Jump to latest
#1Mark Fletcher
markf@corp.groups.io

Hi All,

On a 9.6 streaming replica, we do table scans for stats and other things.
During these scans, the replication is paused (the 'recovering' postgres
process has 'waiting' appended to it). We're not using transactions with
these scans. Is there anything we can do to prevent the pausing?

Thanks,
Mark

#2Andreas Kretschmer
andreas@a-kretschmer.de
In reply to: Mark Fletcher (#1)
Re: Non-pausing table scan on 9.6 replica?

Am 06.03.19 um 01:26 schrieb Mark Fletcher:

Hi All,

On a 9.6 streaming replica, we do table scans for stats and other
things. During these scans, the replication is paused (the
'recovering' postgres process has 'waiting' appended to it). We're not
using transactions with these scans. Is there anything we can do to
prevent the pausing?

have you set ```max_standby_streaming_delay``? The default is 30
seconds, which means that this will be the maximum time allowed for a
replication lag caused by a conflicting query.
You can use ``hot_standby_feedback = on``, but the downside will be more
bloat on the tables.

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com

#3Mark Fletcher
markf@corp.groups.io
In reply to: Andreas Kretschmer (#2)
Re: Non-pausing table scan on 9.6 replica?

Thank you for responding to my email.

On Tue, Mar 5, 2019 at 9:20 PM Andreas Kretschmer <andreas@a-kretschmer.de>
wrote:

have you set ```max_standby_streaming_delay``? The default is 30
seconds, which means that this will be the maximum time allowed for a
replication lag caused by a conflicting query.

Yes, we've bumped that up a lot.

You can use ``hot_standby_feedback = on``, but the downside will be more
bloat on the tables.

I'm not sure I understand. I'm not worried about the query being cancelled

on the replica. max_standby_streaming_delay fixes that for us. I'm worried
about the streaming replication being paused while this table scan is
running. If the table scan takes several minutes, then the replica becomes
several minutes out of sync with the master. I'd prefer that not to happen
and I'm wondering if there's a way to do that.

Thanks,
Mark

#4Andreas Kretschmer
andreas@a-kretschmer.de
In reply to: Mark Fletcher (#3)
Re: Non-pausing table scan on 9.6 replica?

Am 06.03.19 um 06:41 schrieb Mark Fletcher:

Thank you for responding to my email.

On Tue, Mar 5, 2019 at 9:20 PM Andreas Kretschmer
<andreas@a-kretschmer.de <mailto:andreas@a-kretschmer.de>> wrote:

have you set ```max_standby_streaming_delay``? The default is 30
seconds, which means that this will be the maximum time allowed for a
replication lag caused by a conflicting query.

Yes, we've bumped that up a lot.

i tought so.

You can use ``hot_standby_feedback = on``, but the downside will
be more
bloat on the tables.

I'm not sure I understand. I'm not worried about the query being
cancelled on the replica. max_standby_streaming_delay fixes that for
us. I'm worried about the streaming replication being paused while
this table scan is running. If the table scan takes several minutes,
then the replica becomes several minutes out of sync with the master.
I'd prefer that not to happen and I'm wondering if there's a way to do
that.

You can choose. max_standby_streaming_delay with delay in the streaming
(hence the name) or hot_standby_feedback, with the downside of possible
more bloat.

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com

#5Sameer Kumar
sameer.kumar@ashnik.com
In reply to: Mark Fletcher (#3)
Re: Non-pausing table scan on 9.6 replica?

On Wed, Mar 6, 2019 at 1:41 PM Mark Fletcher <markf@corp.groups.io> wrote:

Thank you for responding to my email.

On Tue, Mar 5, 2019 at 9:20 PM Andreas Kretschmer <andreas@a-kretschmer.de>
wrote:

have you set ```max_standby_streaming_delay``? The default is 30
seconds, which means that this will be the maximum time allowed for a
replication lag caused by a conflicting query.

Yes, we've bumped that up a lot.

Did you encounter cases where the queries were canceled on standby because
of streaming replication? Did you face any kind of recovery conflict
because of which you bumped this up?

You can use ``hot_standby_feedback = on``, but the downside will be more

bloat on the tables.

I'm not sure I understand. I'm not worried about the query being

cancelled on the replica. max_standby_streaming_delay fixes that for us.
I'm worried about the streaming replication being paused while this table
scan is running. If the table scan takes several minutes, then the replica
becomes several minutes out of sync with the master. I'd prefer that not to
happen and I'm wondering if there's a way to do that.

Typically all operations on Master will be replicated as soon as possible
and standby will apply them as it receives them. There could be situations
when the operation on master cannot be replicated because it would conflict
with the running query e.g. you fired a select (a long-running report) and
in the meantime someone updated a row. Now because of MVCC, if your SELECT
query was running on master it would still give consistent result. It will
also be the case with standby, standby will also maintain older version and
select would get consistent result.
Then VACUUM gets fired by virtue of acuto-vacuum. If a SELECT was running
on master, the auto-vacuum will factor in the older version of rows which
currently running SELECT queries would require and will not remove them.
But master has no idea about what is happening on standby and will remove
rows which are used/required by a SELECT on standby. When this VACUUM gets
replicated to standby it will realize that there is a conflict in on going
query and replicated VACUUM operation so it will be stalled (by the virtue
of max_standby_streaming_delay). Hence you end up with replication delay. A
much smarter thing would be if Master knew which row versions should not be
removed (because they are in use on standby), that is what
"hot_standby_feedback offers to achieve.

But conflict on standby could also happen because of other operations e.g.
a query is reading from a table on standby and you did a DDL operation on
the master that modifies the table. In that case an exclusive lock will be
acquired on master, but when that gets replicated it will result in
conflict and standby will stall replication.

If your priority is to have replica as close as possible to master then
disable max_standby_streaming_delay

Show quoted text

Thanks,
Mark

#6Mark Fletcher
markf@corp.groups.io
In reply to: Sameer Kumar (#5)
Re: Non-pausing table scan on 9.6 replica?

Andreas, Sameer,

Thank you for replying. I did not understand the purpose of
hot_standby_feedback, and your explanations helped. I turned it on, and the
pausing stopped.

Thanks,
Mark

#7Sameer Kumar
sameer.kumar@ashnik.com
In reply to: Mark Fletcher (#6)
Re: Non-pausing table scan on 9.6 replica?

On Thu, Mar 7, 2019 at 5:16 AM Mark Fletcher <markf@corp.groups.io> wrote:

Andreas, Sameer,

Thank you for replying. I did not understand the purpose of
hot_standby_feedback, and your explanations helped. I turned it on, and the
pausing stopped.

Great!
But do bear in mind that this is also not without its own implications. As
pointed out by Andreas, this could lead to bloats.

Which version of PostgreSQL are you using? Sorry if I have missed that
information.

Show quoted text

Thanks,
Mark