Advice on cluster architecture for two related, but distinct, use cases
Dear all,
(This is a follow-up to a question I asked almost exactly a year ago, https://postgrespro.com/list/thread-id/2670756#726F3765-858C-4AC0-A7B0-5CB6720E4B37@leisi.net - the requirements have changed since then, and the platform has changed from OpenBSD to Linux, which may make some things easier.)
I’m looking for advice on Postgres cluster architecture(s) for two related but distinct use cases. Ideally, the approaches for the two use cases would not differ too widely.
The goal of clustering is low RPO (I guess we need sync clustering) and RTO (ideally almost-instant failover); throughput is not a concern (it’s relatively low transaction volume). Latency (due to the distance between datacenters which provide georedundancy) is a fact we are willing to accept (eg for syncs to happen on all cluster members).
The first use case is in an environment under our own control (and where eg a DBA could intervene). We can theoretically run any number of cluster instances, but assume we would use an even number (split over the two datacenters), or potentially an odd number of nodes (eg with an arbiter).
In the second use case, the environment is not under our control, so we can only assume basic network connectivity from the application to the DB, and between the DBs (the latter potentially through an SSH tunnel if needed). In this use case, we can not assume a person to intervene if a node goes down, and would prefer some automated failover to the other node (this automation would also be welcome for the first use case, eg if something happens while nobody is watching). We can not assume eg a load balancer.
There could be various ways how the environment in the second use case is set up, ranging from „application and database running on the same box“ (well, no clustering for you then…), to dedicated two- or three node cluster.
In both use cases, we have full control over the application and the database itself.
From reading various docs, it seems we would need something like Patroni (/Percona), at least for the first use case. However it seems relatively complex to set up and operate to me.
I would appreciate your experience and input into which approach would best fit the two use cases. We are also willing to engage in paid consulting.
Thanks,
— Matthias
--
Matthias Leisi
Katzenrütistrasse 68, 8153 Rümlang
Mobile +41 79 377 04 43
matthias@leisi.net
Some of those requirements are vague, but yes, Patroni should probably be
the first approach you look at. If the second datacenter is just for
redundancy, then a simple setup would be:
DCA (data center A):
Postgres server 1
Postgres server 2
DCB:
Postgres server 3 (set no_failover: true)
You will also need a DCS system of some sort (e.g. etcd on all three
nodes), as well as a backup system (e.g. pgBackRest). Will also need to
decide how automated you want things to be (for example, cross datacenter
failover in the above would be manually done). It should definitely be able
to handle your RPO/RTO requirements easily enough.
[Patroni] However it seems relatively complex to set up and operate
Setting things up can be a little complex, yes, but once done it just
works, so very little operation resources are needed.
We can not assume eg a load balancer.
Possible via the application: see
https://www.postgresql.org/docs/current/libpq-connect.html (esp.
target_session_attrs)
Cheers,
Greg
CC
On Mon, Nov 11, 2024 at 09:28 Greg Sabino Mullane <htamfids@gmail.com>
wrote:
Show quoted text
Some of those requirements are vague, but yes, Patroni should probably be
the first approach you look at. If the second datacenter is just for
redundancy, then a simple setup would be:DCA (data center A):
Postgres server 1
Postgres server 2DCB:
Postgres server 3 (set no_failover: true)You will also need a DCS system of some sort (e.g. etcd on all three
nodes), as well as a backup system (e.g. pgBackRest). Will also need to
decide how automated you want things to be (for example, cross datacenter
failover in the above would be manually done). It should definitely be able
to handle your RPO/RTO requirements easily enough.[Patroni] However it seems relatively complex to set up and operate
Setting things up can be a little complex, yes, but once done it just
works, so very little operation resources are needed.We can not assume eg a load balancer.
Possible via the application: see
https://www.postgresql.org/docs/current/libpq-connect.html (esp.
target_session_attrs)Cheers,
Greg