PostgreSQL Active-Active Clustering
Hi,
We are evaluating features of the Community edition of PostgreSQL in relation to a proposal for a prospective client and need help with the queries below:-
* Does the Community edition of PostgreSQL provide NATIVE active-active high availability clustering with objectives of scalability, load balancing and high availability without using any extensions or external components or usage of Kubernetes/Dockers.
* In the Community edition of PostgreSQL is it possible to setup a database cluster in load balancing mode and provide vertical and horizontal scalability without repartitioning or changes to the database objects or 3rd party transaction routing mechanisms using NATIVE features only (i.e. without using any extensions or external components or usage of Kubernetes/Dockers).
* In the Community edition of PostgreSQL is it possible to setup a cluster where all the nodes are able to concurrently read-write the underlying database image using NATIVE features (i.e. without using any extensions or external components or usage of Kubernetes/Dockers).
Kindly direct us to the right contributor/user group. Thanks in advance!
Regards,
Subhadeep
Subhadeep Sarkar
Program Director
Government & Public Service - Technology.
O +91 33 4403 4000
M +91 98301 69398
Email: :subhadeepsarkar@kpmg.com<mailto:subhadeepsarkar@kpmg.com>
KPMG Advisory Services Private Limited
Unit No. 604, 6th Floor, Tower 1, Plot No. 5,
Block DP, Godrej Waterside, Sector V,
Salt Lake,
Kolkata - 700 091
**********************************************************************
________________________________________
KPMG (in India) allows reasonable personal use of the e-mail system. Views and opinions expressed in these communications do not necessarily represent those of KPMG (in India).
*******************************************************************************************************
DISCLAIMER
The information in this e-mail is confidential and may be legally privileged. It is intended solely for the addressee. Access to this e-mail by anyone else is unauthorized. If you have received this communication in error, please address with the subject heading "Received in error," send to postmaster1@kpmg.com, then delete the e-mail and destroy any copies of it. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. Any opinions or advice contained in this e-mail are subject to the terms and conditions expressed in the governing KPMG client engagement letter. Opinions, conclusions and other information in this e-mail and any attachments that do not relate to the official business of the firm are neither given nor endorsed by it.
If this email is being sent by a KPMG team member deployed with a client in relation to KPMG’s engagement with such client for supporting the client through deployment of a team as per the requirement of the [Work Order/Letter of Award/Purchase order/Contract] and such team member is working under the client’s supervision and control, then the respective client is responsible for the content of this email and/or attachments to the email regardless of whether such team member is using client/project domain ID or that of KPMG.
If this email is received by Government Authority on a public email IDs (like gmail, yahoo etc.), then it should be noted that this was sent on public email ID since KPMG was not provided with an alternate secure email ID by the recipient. It is highlighted to the recipient that Government guidelines have advised not to use public emails and an alternate secure email id in line with Government IT Security policy should be provided to KPMG for further communications. KPMG will not be responsible for any situations which may arise on account of using public email id by such recipient.
KPMG cannot guarantee that e-mail communications are secure or error-free, as information could be intercepted, corrupted, amended, lost, destroyed, arrive late or incomplete, or contain viruses.
KPMG, an Indian partnership and a member firm of KPMG International Cooperative ("KPMG International"), an English entity that serves as a coordinating entity for a network of independent firms operating under the KPMG name. KPMG International Cooperative (“KPMG International”) provides no services to clients. Each member firm of KPMG International Cooperative (“KPMG International”) is a legally distinct and separate entity and each describes itself as such.
*******************************************************************************************************
Import Notes
Reply to msg id not found: PN0PR01MB9069922B90966633C8183C5AC5A12@PN0PR01MB9069.INDPRD01.PROD.OUTLOOK.COMReference msg id not found: PN0PR01MB9069922B90966633C8183C5AC5A12@PN0PR01MB9069.INDPRD01.PROD.OUTLOOK.COM
On Mon, Jul 15, 2024 at 12:06 PM Sarkar, Subhadeep <subhadeepsarkar@kpmg.com>
wrote:
We are evaluating features of the Community edition of PostgreSQL in
relation to a proposal for a prospective client and need help with the
queries below:-- Does the Community edition of PostgreSQL provide NATIVE
active-active high availability clustering with objectives of scalability,
load balancing and high availability without using any extensions or
external components or usage of Kubernetes/Dockers.
Basically, no. See "logical replication" if you'd like to delve into why I
qualified this one.
- In the Community edition of PostgreSQL is it possible to setup a
database cluster in load balancing mode and provide vertical and horizontal
scalability without repartitioning or changes to the database objects or
3rd party transaction routing mechanisms using NATIVE features only (i.e.
without using any extensions or external components or usage of
Kubernetes/Dockers).
No
- In the Community edition of PostgreSQL is it possible to setup a
cluster where all the nodes are able to concurrently read-write the
underlying database image using NATIVE features (i.e. without using any
extensions or external components or usage of Kubernetes/Dockers).
No
There is only one server process per underlying data directory in
PostgreSQL.
David J.
On Jul 15, 2024, at 12:06, Sarkar, Subhadeep <subhadeepsarkar@kpmg.com> wrote:
• Does the Community edition of PostgreSQL provide NATIVE active-active high availability clustering with objectives of scalability, load balancing and high availability without using any extensions or external components or usage of Kubernetes/Dockers.
• In the Community edition of PostgreSQL is it possible to setup a database cluster in load balancing mode and provide vertical and horizontal scalability without repartitioning or changes to the database objects or 3rd party transaction routing mechanisms using NATIVE features only (i.e. without using any extensions or external components or usage of Kubernetes/Dockers).
• In the Community edition of PostgreSQL is it possible to setup a cluster where all the nodes are able to concurrently read-write the underlying database image using NATIVE features (i.e. without using any extensions or external components or usage of Kubernetes/Dockers).
Short answer: No.
The community version of PostgreSQL, without any extensions beyond what is available in the core distribution, supports none of these. I will offer that your client is not being realistic if these are their requirements, and they expect them to be fulfilled by the core distribution of any open-source database.
There are commercial extensions to PostgreSQL that provide the first, but PostgreSQL does not do so out of the box. It is possible, now, to build this on top of community PostgreSQL with logical replication, but there is notable development work involved, and you cannot just drop an existing database into PostgreSQL and expect this to work. In any event, you will need to make sure the schema is compatible with an active-active model.
For the second, you can explore open-source projects such as Citus, but some attention to the schema and queries will be required.
No product, either commercial or open-source, provides the last one (read-write shared storage), although there are commercial products that provide for a shared-storage model single-writer, multiple-reader model (for example, Amazon Aurora).
On Mon, Jul 15, 2024 at 3:28 PM Christophe Pettus <xof@thebuild.com> wrote:
On Jul 15, 2024, at 12:06, Sarkar, Subhadeep <subhadeepsarkar@kpmg.com>
wrote:
[snip]
• In the Community edition of PostgreSQL is it possible to setup a
cluster where all the nodes are able to concurrently read-write the
underlying database image using NATIVE features (i.e. without using any
extensions or external components or usage of Kubernetes/Dockers).
[snip]
No product, either commercial or open-source, provides the last one
(read-write shared storage), although there are commercial products that
provide for a shared-storage model single-writer, multiple-reader model
(for example, Amazon Aurora).
This "lack of products" puzzles me, because DEC was doing this with VAX
(then Alpha and Itanium) clusters 40 years ago via a Distributed Lock
Manager integrated deep into VMS. Their Rdb and (CODASYL) DBMS products
used those functions extensively.
(In the late 1990s, they sold the DLM code to Oracle, which is where RAC
comes from.)
It was shared-disk, multiple-writer, because the DLM allowed for locking at
the row level. Thus, a half dozen cluster nodes could hold write locks on
different rows on the same data page.
## Ron Johnson (ronljohnsonjr@gmail.com):
This "lack of products" puzzles me, because DEC was doing this with VAX
(then Alpha and Itanium) clusters 40 years ago via a Distributed Lock
Manager integrated deep into VMS. Their Rdb and (CODASYL) DBMS products
Tech and trade-offs have changed over the last 40 years :)
These days you can so many cores in one package, while "more than one
processor" was quite a feat in the 80ies ("A dual processor VAX 11/780",
1982 https://dl.acm.org/doi/10.5555/800048.801738; also the 11/782 and
11/784), and you get so much RAM and storage (even fast storage, if
you keep it local) with that package. Response Latency really jumps
if you have to communicate with anything outside your box.
While latency matters, the number of problems where you absolutely
need that distributed lock manager has not really grown that much,
I think.
Regards,
Christoph
--
Spare Space
On Mon, Jul 15, 2024 at 5:54 PM Christoph Moench-Tegeder <cmt@burggraben.net>
wrote:
## Ron Johnson (ronljohnsonjr@gmail.com):
This "lack of products" puzzles me, because DEC was doing this with VAX
(then Alpha and Itanium) clusters 40 years ago via a Distributed Lock
Manager integrated deep into VMS. Their Rdb and (CODASYL) DBMS productsTech and trade-offs have changed over the last 40 years :)
These days you can so many cores in one package, while "more than one
processor" was quite a feat in the 80ies ("A dual processor VAX 11/780",
1982 https://dl.acm.org/doi/10.5555/800048.801738; also the 11/782 and
11/784), and you get so much RAM and storage (even fast storage, if
you keep it local) with that package. Response Latency really jumps
if you have to communicate with anything outside your box.
While latency matters, the number of problems where you absolutely
need that distributed lock manager has not really grown that much,
I think.
Customers still want High Availability, and VMS Clusters were great for
HA.
Στις 15/7/24 22:55, ο/η Ron Johnson έγραψε:
On Mon, Jul 15, 2024 at 3:28 PM Christophe Pettus <xof@thebuild.com>
wrote:On Jul 15, 2024, at 12:06, Sarkar, Subhadeep
<subhadeepsarkar@kpmg.com> wrote:
[snip]
• In the Community edition of PostgreSQL is it possible to
setup a cluster where all the nodes are able to concurrently
read-write the underlying database image using NATIVE features
(i.e. without using any extensions or external components or usage
of Kubernetes/Dockers).[snip]
No product, either commercial or open-source, provides the last
one (read-write shared storage), although there are commercial
products that provide for a shared-storage model single-writer,
multiple-reader model (for example, Amazon Aurora).This "lack of products" puzzles me, because DEC was doing this with
VAX (then Alpha and Itanium) clusters 40 years ago via a Distributed
Lock Manager integrated deep into VMS. Their Rdb and (CODASYL) DBMS
products used those functions extensively.
IMHO IBM did something similar with their shared DASDi back in 70s+.
There was serialization mechanism enabling concurrent writes to data
sets (meaning files). Not to mention IBM had great VM technology back in
the day.
(In the late 1990s, they sold the DLM code to Oracle, which is where
RAC comes from.)It was shared-disk, multiple-writer, because the DLM allowed for
locking at the row level. Thus, a half dozen cluster nodes could hold
write locks on different rows on the same data page.
--
Achilleas Mantzios
IT DEV - HEAD
IT DEPT
Dynacom Tankers Mgmt (as agents only)
On Jul 15, 2024, at 12:06, Sarkar, Subhadeep <subhadeepsarkar@kpmg.com> wrote:
• Does the Community edition of PostgreSQL provide NATIVE active-active high availability clustering with objectives of scalability, load balancing and high availability without using any extensions or external components or usage of Kubernetes/Dockers.
• In the Community edition of PostgreSQL is it possible to setup a database cluster in load balancing mode and provide vertical and horizontal scalability without repartitioning or changes to the database objects or 3rd party transaction routing mechanisms using NATIVE features only (i.e. without using any extensions or external components or usage of Kubernetes/Dockers).
• In the Community edition of PostgreSQL is it possible to setup a cluster where all the nodes are able to concurrently read-write the underlying database image using NATIVE features (i.e. without using any extensions or external components or usage of Kubernetes/Dockers).
I do have to add that this list of requirements sounds very much like a set supplied by the sales organization of a commercial database vendor (in fact, a particular commercial database vendor) in order to exclude open-source software. Open-source software relies very much on a whole ecosystem, rather than attempting to deliver every imaginable feature in the core distribution. This is especially true of projects like PostgreSQL which are not backed by a single company and do not have an official commercial distribution. Can your client articulate why they need these specific features, and why they must be in the core distribution?