hot_standby_feedback

Started by Torsten Förtsch over 9 years ago (2 messages, pgsql-general)

tfoertsch123@gmail.com

over 9 years ago

Hi,

I am in the process of reviewing our configs for a number of 9.3 databases
and found a replica with hot_standby_feedback=on. I remember when we set it
long ago we were fighting cancelled queries. I also remember that it never
really worked for us. In the end we set up 2 replicas, one suitable for
short queries where we prefer low replication lag, and another one where we
allow for long running queries but sacrifice timeliness
(max_standby_*_delay=-1).
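The `max_standby_*_delay` shorthand covers two settings, `max_standby_archive_delay` and `max_standby_streaming_delay`. The Python sketch below renders the two-replica split as postgresql.conf fragments. The replica names are hypothetical. The `30s` value for the low-lag replica is only an assumption that it kept PostgreSQL's default for both settings. `-1` tells the standby to wait indefinitely for conflicting queries instead of cancelling them.

```python
# Hypothetical replica names; the settings follow the split described above.
REPLICAS = {
    # Short queries, low replication lag: replay waits at most 30 s
    # (PostgreSQL's default, assumed here), then cancels conflicting queries.
    "replica_short": {
        "max_standby_archive_delay": "30s",
        "max_standby_streaming_delay": "30s",
    },
    # Long-running queries: -1 lets replay wait forever for conflicting
    # queries, trading timeliness (replication lag) for fewer cancellations.
    "replica_long": {
        "max_standby_archive_delay": "-1",
        "max_standby_streaming_delay": "-1",
    },
}


def render_conf(settings: dict[str, str]) -> str:
    """Render settings as postgresql.conf lines of the form 'name = value'."""
    return "\n".join(f"{name} = {value}" for name, value in sorted(settings.items()))


if __name__ == "__main__":
    for replica, settings in REPLICAS.items():
        print(f"# {replica}")
        print(render_conf(settings))
```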

I have a hunch why hot_standby_feedback=on didn't work. But I never
verified it. So, here it is. The key is this sentence:

"Feedback messages will not be sent more frequently than once per
wal_receiver_status_interval."

That interval is 10 seconds (the default). So, suppose a transaction on
the replica starts using a row right after a feedback message has been
sent. Then there is a 10-second window in which the master cannot know
that the row is needed on the replica, and can vacuum it away. If the
transaction on the replica then takes longer than max_standby_*_delay,
the only option is to cancel it.

Is that explanation correct?

What is the correct way to use hot_standby_feedback to prevent
cancellations reliably (accepting the resulting bloat)?

Thanks,
Torsten

Andres Freund

andres@anarazel.de

over 9 years ago

In reply to: Torsten Förtsch (#1)

Re: hot_standby_feedback

On 2016-11-28 22:14:55 +0100, Torsten Förtsch wrote:

> Hi,
>
> I am in the process of reviewing our configs for a number of 9.3 databases
> and found a replica with hot_standby_feedback=on. I remember when we set it
> long ago we were fighting cancelled queries. I also remember that it never
> really worked for us. In the end we set up 2 replicas, one suitable for
> short queries where we prefer low replication lag, and another one where we
> allow for long running queries but sacrifice timeliness
> (max_standby_*_delay=-1).

There are a few kinds of conflicts against which hot_standby_feedback
doesn't protect, e.g. exclusive locks on tables that are in use (taken,
for example, by vacuum truncating a table or by an explicit DROP TABLE).

There's a view with some information about the causes of cancellations,
pg_stat_database_conflicts. Did you check that?
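A minimal Python sketch of checking that view through any DB-API 2.0 connection (for example one from psycopg2). Run it against the standby, because conflicts do not occur on the primary and the counters are only populated on standbys. The function names are hypothetical. The column names are those of the view as it exists in 9.x. Of these counters, `confl_snapshot` is the kind of cancellation hot_standby_feedback is meant to prevent. A nonzero `confl_lock`, for example from vacuum truncation or DROP TABLE, points at conflicts it cannot prevent.

```python
from typing import Any

# Cancellation counters in pg_stat_database_conflicts (9.x column names).
CONFLICT_COLUMNS = (
    "confl_tablespace",
    "confl_lock",
    "confl_snapshot",
    "confl_bufferpin",
    "confl_deadlock",
)

QUERY = (
    "SELECT datname, " + ", ".join(CONFLICT_COLUMNS)
    + " FROM pg_stat_database_conflicts ORDER BY datname"
)


def conflict_counts(conn: Any) -> dict[str, dict[str, int]]:
    """Return {datname: {counter: cancelled queries}} via a DB-API 2.0 connection."""
    cur = conn.cursor()
    try:
        cur.execute(QUERY)
        rows = cur.fetchall()
    finally:
        cur.close()
    return {row[0]: dict(zip(CONFLICT_COLUMNS, row[1:])) for row in rows}


def not_covered_by_feedback(counts: dict[str, int]) -> dict[str, int]:
    """Nonzero counters other than confl_snapshot: cancellations that
    hot_standby_feedback is not designed to prevent."""
    return {col: n for col, n in counts.items() if col != "confl_snapshot" and n > 0}
```

With psycopg2, for example, `conflict_counts(psycopg2.connect(dsn))` with a DSN pointing at the standby returns one entry per database. `not_covered_by_feedback` then shows which cancellations would survive turning feedback on.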

> I have a hunch why hot_standby_feedback=on didn't work. But I never
> verified it. So, here it is. The key is this sentence:
>
> "Feedback messages will not be sent more frequently than once per
> wal_receiver_status_interval."
>
> That interval is 10 seconds (the default). So, suppose a transaction on
> the replica starts using a row right after a feedback message has been
> sent. Then there is a 10-second window in which the master cannot know
> that the row is needed on the replica, and can vacuum it away. If the
> transaction on the replica then takes longer than max_standby_*_delay,
> the only option is to cancel it.
>
> Is that explanation correct?

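No. That just means that we don't update the value more frequently. The
value reported is a "horizon", meaning that nothing older than the
reported value can be accessed on the standby. A query that starts right
after a message has been sent can't see anything older either, so it is
already covered by that report.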
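To make the horizon argument concrete, here is a toy model in Python. It is not PostgreSQL's code. The primary commits one xid per second, the standby replays with a fixed lag, and a feedback message carries the standby's oldest still-needed xmin at most once per status interval. The class and function names, the one-xid-per-second clock, the instant delivery of feedback, and the absence of long queries on the primary are all assumptions made for illustration. A new snapshot's xmin comes from replay progress, which never moves backwards. So a query that starts right after a report, as in Torsten's scenario, is never older than that report.

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    """Toy standby: the primary commits one xid per second; we replay with a lag."""
    replay_next_xid: int = 0  # first xid not yet replayed
    snapshots: dict[int, int] = field(default_factory=dict)  # query id -> snapshot xmin

    def start_query(self, qid: int) -> None:
        # A new snapshot's xmin comes from replay progress, which never goes backwards.
        self.snapshots[qid] = self.replay_next_xid

    def end_query(self, qid: int) -> None:
        self.snapshots.pop(qid, None)

    def horizon(self) -> int:
        # What a feedback message carries: nothing older than this is accessed here.
        return min([self.replay_next_xid, *self.snapshots.values()])


def simulate(queries, seconds=60, status_interval=10, replay_lag=3, feedback=True):
    """Return (second, query id) pairs where primary-side cleanup could remove
    a row version that a running standby query may still need.

    queries: list of (start_second, duration_seconds); list index = query id.
    """
    standby = Standby()
    reported = standby.horizon()  # assume one feedback message at t=0
    conflicts = []
    for t in range(seconds):
        # The primary's next xid is t; the standby has replayed up to t - replay_lag.
        standby.replay_next_xid = max(0, t - replay_lag)
        for qid, (start, duration) in enumerate(queries):
            if t == start:
                standby.start_query(qid)
            elif t == start + duration:
                standby.end_query(qid)
        if t % status_interval == 0:
            reported = standby.horizon()  # at most one message per interval
        # Without feedback the primary may clean up rows deleted by any xid < t.
        cleanup_horizon = reported if feedback else t
        # A snapshot with xmin s may need rows deleted by any xid >= s, so
        # cleanup below cleanup_horizon conflicts with it when s < cleanup_horizon.
        for qid, xmin in standby.snapshots.items():
            if xmin < cleanup_horizon:
                conflicts.append((t, qid))
    return conflicts
```

With `queries = [(11, 30), (25, 20), (41, 5)]`, query 0 starts one second after the report sent at t=10. `simulate(queries)` finds no conflicts even though the next report only goes out at t=20. `simulate(queries, feedback=False)` flags query 0 as soon as it starts.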

Greetings,

Andres Freund

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general