Removing max_connection requirement on hot_standby

Started by Chris Winslett over 10 years ago, 3 messages
#1 Chris Winslett
chris@compose.io

I'm orchestrating Postgres to behave as a leader-follower cluster. I've run
into issues when scaling down the connection count for a cluster
(scaling up is fine -- scaling down results in fatal errors). I use an
open source tool I've written to orchestrate the cluster called Governor
(http://github.com/compose/governor). It would work if I weren't running
hot_standby, but alas…I'm running with it.

I found that the code which throws the FATAL is a pre-flight test for
hot_standby written in 2009 (i.e. battle tested):

https://github.com/postgres/postgres/blob/efc16ea520679d713d98a2c7bf1453c4ff7b91ec/src/backend/access/transam/xlog.c#L5312-L5321
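
For context, the check compares the standby's own settings against the
values the primary recorded in pg_control. The following is a standalone
sketch of its shape -- a paraphrase with approximate error text, not the
actual xlog.c source:

#include <stdio.h>
#include <stdlib.h>

/*
 * Model of the pre-flight check (paraphrased; not the actual xlog.c
 * source).  With hot_standby = on, recovery refuses to start when a
 * setting such as max_connections is lower on the standby than the
 * value the primary recorded in pg_control.
 */
static void
check_required_parameter(const char *name, int standby_value, int primary_value)
{
    if (standby_value < primary_value)
    {
        fprintf(stderr,
                "FATAL: hot standby is not possible because %s = %d is a "
                "lower setting than on the master server (its value was %d)\n",
                name, standby_value, primary_value);
        exit(1);
    }
}

int
main(void)
{
    /* e.g. a cluster whose connection count was scaled down from 100 to 50 */
    check_required_parameter("max_connections", 50, 100);
    return 0;
}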

I've tested changing this error from FATAL to WARNING. I've compiled and
tested my scenario, and all appears to be correct:

https://github.com/compose/postgres/commit/2bdf6b36821987aadb401e1b8590ecc5b02126d8
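
In terms of the sketch above, the commit amounts to demoting the report
level so that startup continues -- again a paraphrase, not the commit's
exact text:

/* Demoted variant: report the mismatch but let recovery proceed. */
static void
check_required_parameter(const char *name, int standby_value, int primary_value)
{
    if (standby_value < primary_value)
        fprintf(stderr,
                "WARNING: %s = %d is a lower setting than on the master "
                "server (its value was %d)\n",
                name, standby_value, primary_value);
    /* no exit(1): the mismatch no longer stops startup */
}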

In researching these lines of code, it appears the original FATAL was
put in place to ensure that a hot standby is configured as closely as
possible to the leader.

This change would also allow backups taken using `pg_basebackup` to work
with settings that differ from those of the original host.

Am I missing something with this change?

Cheers,
Chris

#2 Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Chris Winslett (#1)
Re: Removing max_connection requirement on hot_standby

On 09/30/2015 04:31 AM, Chris Winslett wrote:
> I'm orchestrating Postgres to behave as a leader-follower cluster. I've run
> into issues when scaling down the connection count for a cluster
> (scaling up is fine -- scaling down results in fatal errors). I use an
> open source tool I've written to orchestrate the cluster called Governor
> (http://github.com/compose/governor). It would work if I weren't running
> hot_standby, but alas…I'm running with it.
>
> I found that the code which throws the FATAL is a pre-flight test for
> hot_standby written in 2009 (i.e. battle tested):
>
> https://github.com/postgres/postgres/blob/efc16ea520679d713d98a2c7bf1453c4ff7b91ec/src/backend/access/transam/xlog.c#L5312-L5321
>
> I've tested changing this error from FATAL to WARNING. I've compiled and
> tested my scenario, and all appears to be correct:
>
> https://github.com/compose/postgres/commit/2bdf6b36821987aadb401e1b8590ecc5b02126d8

It will work, until you actually have more active transactions in the
master than you have slots available in the standby.

- Heikki

#3 Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#2)
Re: Removing max_connection requirement on hot_standby

On 2015-09-30 19:36:22 +0300, Heikki Linnakangas wrote:

> On 09/30/2015 04:31 AM, Chris Winslett wrote:
>> I'm orchestrating Postgres to behave as a leader-follower cluster. I've run
>> into issues when scaling down the connection count for a cluster
>> (scaling up is fine -- scaling down results in fatal errors). I use an
>> open source tool I've written to orchestrate the cluster called Governor
>> (http://github.com/compose/governor). It would work if I weren't running
>> hot_standby, but alas…I'm running with it.
>>
>> I found that the code which throws the FATAL is a pre-flight test for
>> hot_standby written in 2009 (i.e. battle tested):
>>
>> https://github.com/postgres/postgres/blob/efc16ea520679d713d98a2c7bf1453c4ff7b91ec/src/backend/access/transam/xlog.c#L5312-L5321
>>
>> I've tested changing this error from FATAL to WARNING. I've compiled and
>> tested my scenario, and all appears to be correct:

s/correct/not fail directly/

>> https://github.com/compose/postgres/commit/2bdf6b36821987aadb401e1b8590ecc5b02126d8

The new phrasing certainly isn't accurate, since a too-low setting can
actually prevent replay from continuing.

> It will work, until you actually have more active transactions in the master
> than you have slots available in the standby.

There are a bunch of additional cases where it'll possibly fail. Since the
actual maximum number of locks is determined by both max_locks_per_transaction
and max_connections (and some more, but whatever), you can even run into
trouble with just one transaction on the primary. A case where that
frequently happens is restoring a (partial) database dump, because that can
easily cause a large number of exclusive locks to be acquired -- which will
be WAL logged.
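
The capacity in question is the shared lock table, sized at server start
as roughly max_locks_per_transaction * (max_connections +
max_prepared_transactions). A toy computation with invented values shows
how a single restore transaction can fit on the primary but not on a
smaller standby:

#include <stdio.h>

int
main(void)
{
    int max_locks_per_transaction = 64;

    int primary_capacity = max_locks_per_transaction * (200 + 0);  /* 12800 */
    int standby_capacity = max_locks_per_transaction * (50 + 0);   /*  3200 */

    /* one pg_restore taking an AccessExclusiveLock per table, all of
     * which are WAL-logged and must be re-acquired during replay */
    int locks_needed = 5000;

    printf("fits on primary: %s, fits on standby: %s\n",
           locks_needed <= primary_capacity ? "yes" : "no",
           locks_needed <= standby_capacity ? "yes" : "no");
    return 0;
}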

Greetings,

Andres Freund
