Parallel Apply
Hi,
Background and Motivation
-------------------------------------
In high-throughput systems, where hundreds of sessions generate data
on the publisher, the subscriber's apply process often becomes a
bottleneck due to the single apply worker model. While users can
mitigate this by creating multiple publication-subscription pairs,
this approach has scalability and usability limitations.
Currently, PostgreSQL supports parallel apply only for large streaming
transactions (streaming=parallel). This proposal aims to extend
parallelism to non-streaming transactions, thereby improving
replication performance in workloads dominated by smaller, frequent
transactions.
Design Overview
------------------------
To safely parallelize non-streaming transactions, we must ensure that
transaction dependencies are respected to avoid failures and
deadlocks. Consider the following scenarios to understand it better:
(a) Transaction failures: Say, if we insert a row in the first
transaction and update it in the second transaction on the publisher,
then allowing the subscriber to apply both in parallel can lead to
failure in the update; (b) Deadlocks - allowing transactions that
update the same set of rows in a table in the opposite order in
parallel can lead to deadlocks.
The core idea is that the leader apply worker ensures the following:
a. Identifies dependencies between transactions.
b. Coordinates parallel workers to apply independent transactions concurrently.
c. Ensures correct ordering for dependent transactions.
Dependency Detection
--------------------------------
1. Basic Dependency Tracking: Maintain a hash table keyed by
(RelationId, ReplicaIdentity) with the value as the transaction XID.
Before dispatching a change to a parallel worker, the leader checks
for existing entries: (a) If no match: add the entry and proceed; (b)
If match: instruct the worker to wait until the dependent transaction
completes.
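As a rough sketch (illustrative Python with invented names, not PostgreSQL internals), the leader's check amounts to a lookup-then-record on the shared table:

```python
# Toy model of the leader's basic dependency check. The real structure
# would be a shared hash table keyed by (RelationId, ReplicaIdentity);
# a plain dict stands in here.

dep_table = {}

def check_dependency(relation_id, replica_identity, xid):
    """Return the XID this change must wait for, or None if independent.
    In both cases, record xid as the latest writer of the key."""
    key = (relation_id, replica_identity)
    prior = dep_table.get(key)
    dep_table[key] = xid
    if prior is not None and prior != xid:
        return prior  # leader instructs the worker to wait for this XID
    return None
```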
2. Unique Keys
In addition to RI, track unique keys to detect conflicts. Example:
CREATE TABLE tab1(a INT PRIMARY KEY, b INT UNIQUE);
Transactions on publisher:
Txn1: INSERT (1,1)
Txn2: INSERT (2,2)
Txn3: DELETE (2,2)
Txn4: UPDATE (1,1) → (1,2)
If Txn4 is applied before Txn2 and Txn3, it will fail due to a unique
constraint violation. To prevent this, track both RI and unique keys
in the hash table and compare the keys of both the old and new tuples
against it. Specifically, the old tuple's RI, the new tuple's unique
keys, and the new tuple's RI (the last is required to detect a prior
insertion with the same key) all need to be compared with existing hash
table entries to identify transaction dependencies.
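As an illustrative sketch (plain Python with an invented record() helper, not PostgreSQL code), the comparison can be modeled by recording every key value a change writes and probing the table before dispatch:

```python
# Toy model of dependency detection with unique keys, replaying the
# tab1 example above (column a is the PK/RI, column b is UNIQUE).
# A dict stands in for the shared hash table.

dep = {}  # (table, key_column, key_value) -> last writer XID

def record(xid, table, keys):
    """keys: the (column, value) pairs a change touches -- the old
    tuple's RI plus the new tuple's RI and unique keys. Returns the
    set of XIDs this change depends on (the latest writer per key)."""
    found = {dep[(table, c, v)] for (c, v) in keys if (table, c, v) in dep}
    for (c, v) in keys:
        dep[(table, c, v)] = xid
    return found - {xid}
```

Replaying Txn1-Txn4: Txn4's UPDATE (1,1) -> (1,2) probes a=1 (old and new RI) and b=2 (new unique key), so it is flagged as dependent on the latest writers of those keys, which transitively enforces ordering after Txn2 and Txn3.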
3. Foreign Keys
Consider FK constraints between tables. Example:
TABLE owner(user_id INT PRIMARY KEY);
TABLE car(car_name TEXT, user_id INT REFERENCES owner);
Transactions:
Txn1: INSERT INTO owner VALUES(1)
Txn2: INSERT INTO car VALUES('bz', 1)
Applying Txn2 before Txn1 will fail. To avoid this, check if FK values
in new tuples match any RI or unique key in the hash table. If
matched, treat the transaction as dependent.
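The FK check can be sketched the same way (hypothetical names, a dict standing in for the shared hash table):

```python
# Toy model of the FK rule above: a new tuple is dependent if any of
# its FK values matches an RI/unique key that some in-flight
# transaction has written.

dep = {}  # (table, key_column, key_value) -> last writer XID

def fk_dependencies(fk_refs):
    """fk_refs: (referenced_table, column, value) triples taken from
    the new tuple's foreign key columns."""
    return {dep[r] for r in fk_refs if r in dep}

# Txn1 inserts owner(user_id=1); the leader records its key:
dep[('owner', 'user_id', 1)] = 1
```

With that entry present, Txn2's insert into car with user_id=1 is flagged as dependent on Txn1, while a reference to an unrelated user_id is not.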
4. Triggers and Constraints
For the initial version, exclude tables with user-defined triggers or
constraints from parallel apply due to complexity in dependency
detection. We may need some parallel-apply-safe marking to allow this.
Replication Progress Tracking
-----------------------------------------
Parallel apply introduces out-of-order commit application,
complicating replication progress tracking. To handle restarts and
ensure consistency:
Track Three Key Metrics:
lowest_remote_lsn: Starting point for applying transactions.
highest_remote_lsn: Highest LSN that has been applied.
list_remote_lsn: List of commit LSNs applied between the lowest and highest.
Mechanism:
Store these in ReplicationState: lowest_remote_lsn,
highest_remote_lsn, list_remote_lsn. Flush these to disk during
checkpoints similar to CheckPointReplicationOrigin.
After restart, start from lowest_remote_lsn and, for each transaction,
if its commit LSN is in list_remote_lsn, skip it; otherwise, apply it.
Once the commit LSN exceeds highest_remote_lsn, apply without checking the list.
During apply, the leader maintains list_in_progress_xacts in the
increasing commit order. On commit, update highest_remote_lsn. If
commit LSN matches the first in-progress xact of
list_in_progress_xacts, update lowest_remote_lsn, otherwise, add to
list_remote_lsn. After commit, also remove it from the
list_in_progress_xacts. We need to clean up entries below
lowest_remote_lsn in list_remote_lsn while updating its value.
To illustrate how this mechanism works, consider the following four
transactions:
Transaction ID | Commit LSN
           501 | 1000
           502 | 1100
           503 | 1200
           504 | 1300
Assume:
Transactions 501 and 502 take longer to apply whereas transactions 503
and 504 finish earlier. Parallel apply workers are assigned as
follows:
pa-1 → 501
pa-2 → 502
pa-3 → 503
pa-4 → 504
Initial state: list_in_progress_xacts = [501, 502, 503, 504]
Step 1: Transaction 503 commits first and in RecordTransactionCommit,
it updates highest_remote_lsn to 1200. In apply_handle_commit, since
503 is not the first in list_in_progress_xacts, add 1200 to
list_remote_lsn. Remove 503 from list_in_progress_xacts.
Step 2: Transaction 504 commits: update highest_remote_lsn to 1300.
Add 1300 to list_remote_lsn. Remove 504 from list_in_progress_xacts.
ReplicationState now:
lowest_remote_lsn = 0
list_remote_lsn = [1200, 1300]
highest_remote_lsn = 1300
list_in_progress_xacts = [501, 502]
Step 3: Transaction 501 commits. Since 501 is now the first in
list_in_progress_xacts, update lowest_remote_lsn to 1000. Remove 501
from list_in_progress_xacts. Clean up list_remote_lsn to remove
entries < lowest_remote_lsn (none in this case).
ReplicationState now:
lowest_remote_lsn = 1000
list_remote_lsn = [1200, 1300]
highest_remote_lsn = 1300
list_in_progress_xacts = [502]
Step 4: System crash and restart
Upon restart, start replication from lowest_remote_lsn = 1000. The
first transaction encountered is 502 with commit LSN 1100; since it is
not present in list_remote_lsn, apply it. As transactions 503 and 504's
respective commit LSNs [1200, 1300] are present in list_remote_lsn, we
skip them. Note that each transaction's end_lsn/commit_lsn has to be
compared, which the apply worker receives along with the transaction's
first command, BEGIN. This ensures correctness and avoids duplicate
application of already committed transactions.
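The bookkeeping above can be sketched as a small Python model (a hypothetical ProgressState class, not the actual ReplicationState struct) that replays the four-transaction example and the restart decision:

```python
# Minimal model of the progress-tracking rules: lowest/highest remote
# LSN plus the list of commit LSNs applied in between.

class ProgressState:
    def __init__(self, in_progress):
        self.lowest = 0                  # lowest_remote_lsn
        self.highest = 0                 # highest_remote_lsn
        self.applied = []                # list_remote_lsn
        self.in_progress = in_progress   # [(xid, commit_lsn)] in commit order

    def commit(self, xid, lsn):
        self.highest = max(self.highest, lsn)
        if self.in_progress[0][0] == xid:
            self.lowest = lsn
            # clean up entries now below the new lowest_remote_lsn
            self.applied = [l for l in self.applied if l >= self.lowest]
        else:
            self.applied.append(lsn)
        self.in_progress = [e for e in self.in_progress if e[0] != xid]

    def must_apply(self, commit_lsn):
        """Restart rule: skip only commit LSNs recorded as applied."""
        return commit_lsn > self.highest or commit_lsn not in self.applied
```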
Now, it is possible that some users may want to parallelize
transactions but still maintain commit order, for example because they
do not declare PK/FK constraints on columns and instead maintain
integrity in the application. In such cases we won't be able to detect
transaction dependencies, so it would be better to make out-of-order
commits optional.
Thoughts?
--
With Regards,
Amit Kapila.
Hi!
On Mon, 11 Aug 2025 at 09:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
Currently, PostgreSQL supports parallel apply only for large streaming
transactions (streaming=parallel). This proposal aims to extend
parallelism to non-streaming transactions, thereby improving
replication performance in workloads dominated by smaller, frequent
transactions.
Sure.
Design Overview
------------------------
To safely parallelize non-streaming transactions, we must ensure that
transaction dependencies are respected to avoid failures and
deadlocks.
A built-in subsystem for transaction dependency tracking would be
highly beneficial for physical replication speedup projects like [0].
Thoughts?
Surely we need to give it a try.
[0]: https://github.com/koichi-szk/postgres
--
Best regards,
Kirill Reshke
On Mon, Aug 11, 2025 at 1:39 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
A built-in subsystem for transaction dependency tracking would be
highly beneficial for physical replication speedup projects like [0].
I am not sure if that is directly applicable because this work
proposes to track dependencies based on logical WAL contents. However,
if you can point me to README on the overall design of the work you
are pointing to then I can check it once.
--
With Regards,
Amit Kapila.
On Mon, 11 Aug 2025 at 13:45, Amit Kapila <amit.kapila16@gmail.com> wrote:
I am not sure if that is directly applicable because this work
proposes to track dependencies based on logical WAL contents. However,
if you can point me to README on the overall design of the work you
are pointing to then I can check it once.
The only doc on this that I am aware of is [0]. The project is however
more dead than alive, but I hope this is just a temporary stop of
development, not permanent.
[0]: https://wiki.postgresql.org/wiki/Parallel_Recovery
--
Best regards,
Kirill Reshke
On 11/8/2025 06:45, Amit Kapila wrote:
The core idea is that the leader apply worker ensures the following:
a. Identifies dependencies between transactions. b. Coordinates
parallel workers to apply independent transactions concurrently. c.
Ensures correct ordering for dependent transactions.
Dependency detection may be quite an expensive operation. What about a
'positive' approach - deadlock detection on the replica, and restarting
the apply of a record that should be applied later? Have you thought
about this approach? What are the pros and cons here? Do you envision
common cases where such a deadlock would be frequent?
--
regards, Andrei Lepikhov
On Tue, Aug 12, 2025 at 12:04 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
Dependency detection may be quite an expensive operation. What about a
'positive' approach - deadlock detection on the replica, and restarting
the apply of a record that should be applied later?
It is not only deadlocks: we could also incorrectly apply some
transactions which should otherwise fail. For example, consider the
following case:
Pub: t1(c1 int unique key, c2 int)
Sub: t1(c1 int unique key, c2 int)
On Pub:
TXN-1
insert(1,11)
TXN-2
update (1,11) --> update (2,12)
On Sub:
table contains (1,11) before replication.
Now, if we allow dependent transactions to go in parallel, instead of
getting an ERROR while doing the insert, the update will succeed and
the subsequent insert will also succeed. This will create inconsistency
on the subscriber side.
Similarly consider another set of transactions:
On Pub:
TXN-1
insert(1,11)
TXN-2
Delete (1,11)
On the subscriber, if we allow TXN-2 to be applied before TXN-1, both
transactions will apply successfully but the subscriber will become
inconsistent w.r.t. the publisher.
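The first scenario can be checked with a small toy model (ordinary Python sets standing in for table rows; nothing here is PostgreSQL code):

```python
# Toy model of the c1-unique table: subscriber already contains (1,11).

def apply_insert(table, row):
    if any(r[0] == row[0] for r in table):
        raise ValueError("duplicate key on c1")  # what in-order apply raises
    table.add(row)

def apply_update(table, old, new):
    if old in table:
        table.discard(old)
        table.add(new)

table = {(1, 11)}                        # subscriber state before apply
apply_update(table, (1, 11), (2, 12))    # TXN-2 applied first (out of order)
apply_insert(table, (1, 11))             # TXN-1 now succeeds silently
```

Applied in commit order, TXN-1's insert raises the duplicate-key error; applied out of order, both changes succeed and the subscriber silently diverges from the publisher.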
My colleague had already built a POC based on this idea and we did
check some initial numbers for non-dependent transactions and the
apply speed has improved drastically. We will share the POC patch and
numbers in the next few days.
For a dependent-transactions workload, if we choose to go with the
deadlock detection approach, there will be a lot of retries, which may
not lead to good apply improvements. Also, we may choose to enable this
form of parallel apply optionally due to the reasons mentioned in my
first email, so if there is overhead due to dependency tracking, one
can disable parallel apply for those particular subscriptions.
--
With Regards,
Amit Kapila.
On Mon, Aug 11, 2025 at 3:00 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
The only doc on this that I am aware of is [0]. The project is however
more dead than alive, but I hope this is just a temporary stop of
development, not permanent.
Thanks for sharing the wiki page. After reading, it seems we can't use
the exact dependency tracking mechanism as both the projects have
different dependency requirements. However, it could be an example to
refer to and maybe some parts of the infrastructure could be reused.
--
With Regards,
Amit Kapila.
On 11.08.2025 7:45 AM, Amit Kapila wrote:
Dependency Detection
--------------------------------
1. Basic Dependency Tracking: Maintain a hash table keyed by
(RelationId, ReplicaIdentity) with the value as the transaction XID.
Before dispatching a change to a parallel worker, the leader checks
for existing entries: (a) If no match: add the entry and proceed; (b)
If match: instruct the worker to wait until the dependent transaction
completes.
Hi,
This is something similar to what I had in mind when starting my
experiments with LR apply speed improvements. I think that maintaining
a full (RelationId, ReplicaIdentity) hash may be too expensive - there
can be hundreds of active transactions updating millions of rows.
I thought about something like a Bloom filter, but frankly speaking I
didn't go far in thinking through all the implementation details. Your
proposal is much more concrete.
But I decided first to implement an approach based on prefetching,
which is much simpler, similar to the prefetching currently used for
physical replication, and still provides quite a significant
improvement:
/messages/by-id/84ed36b8-7d06-4945-9a6b-3826b3f999a6@garret.ru
There is one thing which I do not completely understand about your
proposal: do you assume that the LR walsender at the publisher will use
the reorder buffer to "serialize" transactions, or do you assume that
streaming mode will be used (it is now possible to enforce parallel
apply of short transactions using
`debug_logical_replication_streaming`)?
It seems senseless to spend time and memory serializing transactions at
the publisher if we in any case want to apply them in parallel at the
subscriber.
But then there is another problem: at the publisher there can be
hundreds of concurrent active transactions (limited only by
`max_connections`) whose records are intermixed in the WAL.
If we try to apply them concurrently at the subscriber, we need a
corresponding number of parallel apply workers. But usually the number
of such workers is less than 10 (and the default is 2).
So it looks like we need to serialize transactions at the subscriber
side.
Assume that there are 100 concurrent transactions T1..T100, i.e. before
the first COMMIT record there are intermixed records of 100
transactions.
And there are just two parallel apply workers, W1 and W2. The main LR
apply worker will send T1's records to W1, T2's records to W2, and ...
there are no more vacant workers.
It has either to spawn additional ones, which is not always possible
because the total number of background workers is limited,
or to serialize all other transactions in memory or on disk until it
reaches the COMMIT of T1 or T2.
I am afraid that such serialization will eliminate any advantages of
parallel apply.
Certainly if we do the reordering of transactions at the publisher
side, then there is no such problem. The subscriber receives all
records for T1, then all records for T2, ... If there are no more
vacant workers, it can just wait until any of these transactions is
completed. But I am afraid that in this case the reorder buffer at the
publisher will be a bottleneck.
On Mon, Aug 11, 2025 at 10:15:41AM +0530, Amit Kapila wrote:
Currently, PostgreSQL supports parallel apply only for large streaming
transactions (streaming=parallel). This proposal aims to extend
parallelism to non-streaming transactions, thereby improving
replication performance in workloads dominated by smaller, frequent
transactions.
I thought the approach for improving WAL apply speed, for both binary
and logical, was pipelining:
https://en.wikipedia.org/wiki/Instruction_pipelining
rather than trying to do all the steps in parallel.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
On Tue, Aug 12, 2025 at 10:40 PM Bruce Momjian <bruce@momjian.us> wrote:
I thought the approach for improving WAL apply speed, for both binary
and logical, was pipelining:
https://en.wikipedia.org/wiki/Instruction_pipelining
rather than trying to do all the steps in parallel.
It is not clear to me how the speed for a mix of dependent and
independent transactions can be improved using the technique you
shared as we still need to follow the commit order for dependent
transactions. Can you please elaborate more on the high-level idea of
how this technique can be used to improve speed for applying logical
WAL records?
--
With Regards,
Amit Kapila.
On Tue, Aug 12, 2025 at 9:22 PM Константин Книжник <knizhnik@garret.ru> wrote:
Hi,
This is something similar to what I have in mind when starting my experiments with LR apply speed improvements. I think that maintaining a full (RelationId, ReplicaIdentity) hash may be too expensive - there can be hundreds of active transactions updating millions of rows.
I thought about something like a bloom filter. But frankly speaking I didn't go far in thinking about all implementation details. Your proposal is much more concrete.
We can surely investigate a different hash_key if that works for all cases.
There is one thing which I do not completely understand with your
proposal: do you assume that LR walsender at publisher will use reorder
buffer to "serialize" transactions or you assume that streaming mode
will be used (now it is possible to enforce parallel apply of short
transactions using `debug_logical_replication_streaming`)?
The current proposal is based on reorderbuffer serializing
transactions as we are doing now.
Either serialize all other transactions in memory or on disk, until it
reaches COMMIT of T1 or T2. I am afraid that such serialization will
eliminate any advantages of parallel apply.
Right, I also think so, and we will probably end up doing something
similar to what we are doing now in the publisher.
Certainly if we do reordering of transactions at publisher side, then there is no such problem. Subscriber receives all records for T1, then all records for T2, ... If there are no more vacant workers, it can just wait until any of this transactions is completed. But I am afraid that in this case the reorder buffer at the publisher will be a bottleneck.
This is a point to investigate if we observe that. But till now, in our
internal testing, parallel apply gives a good improvement in
pgbench-style workloads.
--
With Regards,
Amit Kapila.
On Monday, August 11, 2025 12:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Background and Motivation
-------------------------------------
In high-throughput systems, where hundreds of sessions generate data
on the publisher, the subscriber's apply process often becomes a
bottleneck due to the single apply worker model. While users can
mitigate this by creating multiple publication-subscription pairs,
this approach has scalability and usability limitations.Currently, PostgreSQL supports parallel apply only for large streaming
transactions (streaming=parallel). This proposal aims to extend
parallelism to non-streaming transactions, thereby improving
replication performance in workloads dominated by smaller, frequent
transactions.Design Overview
------------------------
To safely parallelize non-streaming transactions, we must ensure that
transaction dependencies are respected to avoid failures and
deadlocks. Consider the following scenarios to understand it better:
(a) Transaction failures: Say, if we insert a row in the first
transaction and update it in the second transaction on the publisher,
then allowing the subscriber to apply both in parallel can lead to
failure in the update; (b) Deadlocks - allowing transactions that
update the same set of rows in a table in the opposite order in
parallel can lead to deadlocks.The core idea is that the leader apply worker ensures the following:
a. Identifies dependencies between transactions. b. Coordinates
parallel workers to apply independent transactions concurrently. c.
Ensures correct ordering for dependent transactions.Dependency Detection
--------------------------------
1. Basic Dependency Tracking: Maintain a hash table keyed by
(RelationId, ReplicaIdentity) with the value as the transaction XID.
Before dispatching a change to a parallel worker, the leader checks
for existing entries: (a) If no match: add the entry and proceed; (b)
If match: instruct the worker to wait until the dependent transaction
completes.2. Unique Keys
In addition to RI, track unique keys to detect conflicts. Example:
CREATE TABLE tab1(a INT PRIMARY KEY, b INT UNIQUE);
Transactions on publisher:
Txn1: INSERT (1,1)
Txn2: INSERT (2,2)
Txn3: DELETE (2,2)
Txn4: UPDATE (1,1) → (1,2)If Txn4 is applied before Txn2 and Txn3, it will fail due to a unique
constraint violation. To prevent this, track both RI and unique keys
in the hash table. Compare the keys of both the old and new tuples to
detect dependencies: the old tuple's RI, and the new tuple's unique
keys and RI (the new tuple's RI is required to detect a prior
insertion with the same key), need to be compared against existing
hash table entries to identify transaction dependencies.

3. Foreign Keys
Consider FK constraints between tables. Example:

TABLE owner(user_id INT PRIMARY KEY);
TABLE car(car_name TEXT, user_id INT REFERENCES owner);

Transactions:
Txn1: INSERT INTO owner(1)
Txn2: INSERT INTO car('bz', 1)

Applying Txn2 before Txn1 will fail. To avoid this, check if FK values
in new tuples match any RI or unique key in the hash table. If
matched, treat the transaction as dependent.

4. Triggers and Constraints
For the initial version, exclude tables with user-defined triggers or
constraints from parallel apply due to complexity in dependency
detection. We may need some parallel-apply-safe marking to allow this.

Replication Progress Tracking
-----------------------------------------
Parallel apply introduces out-of-order commit application,
complicating replication progress tracking. To handle restarts and
ensure consistency:

Track Three Key Metrics:
lowest_remote_lsn: Starting point for applying transactions.
highest_remote_lsn: Highest LSN that has been applied.
list_remote_lsn: List of commit LSNs applied between the lowest and highest.

Mechanism:
Store these in ReplicationState: lowest_remote_lsn,
highest_remote_lsn, list_remote_lsn. Flush these to disk during
checkpoints, similar to CheckPointReplicationOrigin.

After Restart: start from lowest_remote_lsn and, for each transaction,
if its commit LSN is in list_remote_lsn, skip it, otherwise, apply it.
Once commit LSN > highest_remote_lsn, apply without checking the list.

During apply, the leader maintains list_in_progress_xacts in the
increasing commit order. On commit, update highest_remote_lsn. If
commit LSN matches the first in-progress xact of
list_in_progress_xacts, update lowest_remote_lsn, otherwise, add to
list_remote_lsn. After commit, also remove it from the
list_in_progress_xacts. We need to clean up entries below
lowest_remote_lsn in list_remote_lsn while updating its value.

To illustrate how this mechanism works, consider the following four
transactions:

Transaction ID    Commit LSN
501               1000
502               1100
503               1200
504               1300

Assume:
Transactions 501 and 502 take longer to apply whereas transactions 503
and 504 finish earlier. Parallel apply workers are assigned as
follows:
pa-1 → 501
pa-2 → 502
pa-3 → 503
pa-4 → 504

Initial state: list_in_progress_xacts = [501, 502, 503, 504]
Step 1: Transaction 503 commits first and in RecordTransactionCommit,
it updates highest_remote_lsn to 1200. In apply_handle_commit, since
503 is not the first in list_in_progress_xacts, add 1200 to
list_remote_lsn. Remove 503 from list_in_progress_xacts.
Step 2: Transaction 504 commits, Update highest_remote_lsn to 1300.
Add 1300 to list_remote_lsn. Remove 504 from list_in_progress_xacts.
ReplicationState now:
lowest_remote_lsn = 0
list_remote_lsn = [1200, 1300]
highest_remote_lsn = 1300
list_in_progress_xacts = [501, 502]

Step 3: Transaction 501 commits. Since 501 is now the first in
list_in_progress_xacts, update lowest_remote_lsn to 1000. Remove 501
from list_in_progress_xacts. Clean up list_remote_lsn to remove
entries < lowest_remote_lsn (none in this case).
ReplicationState now:
lowest_remote_lsn = 1000
list_remote_lsn = [1200, 1300]
highest_remote_lsn = 1300
list_in_progress_xacts = [502]

Step 4: System crash and restart
Upon restart, start replication from lowest_remote_lsn = 1000. The
first transaction encountered is 502 with commit LSN 1100; since it is
not present in list_remote_lsn, apply it. As transactions 503 and
504's respective commit LSNs [1200, 1300] are present in
list_remote_lsn, we skip them. Note that each transaction's
end_lsn/commit_lsn has to be compared, which the apply worker receives
along with the transaction's first command, BEGIN. This ensures
correctness and avoids duplicate application of already committed
transactions.

Now, it is possible that some users may want to parallelize the
transaction but still want to maintain commit order because they don't
explicitly annotate FK, PK for columns but maintain the integrity via
application. So, in such cases, since we won't be able to detect
transaction dependencies, it would be better to make out-of-order
commits optional.

Thoughts?
Here is the initial POC patch for this idea.
The basic implementation is outlined below. Please note that there are several
TODO items remaining, which we are actively working on; these are also detailed
further down.
The leader worker assigns each non-streaming transaction to a parallel apply
worker. Before dispatching changes to a parallel worker, the leader verifies if
the current modification affects the same row (identified by replica identity
key) as another ongoing transaction. If so, the leader sends a list of dependent
transaction IDs to the parallel worker, indicating that the parallel apply
worker must wait for these transactions to commit before proceeding. Parallel
apply workers do not maintain commit order; transactions can be committed at any
time provided there are no dependencies.
Each parallel apply worker records the local end LSN of the transaction it
applies in shared memory. Subsequently, the leader gathers these local end LSNs
and logs them in the local 'lsn_mapping' for verifying whether they have been
flushed to disk (following the logic in get_flush_position()).
If no parallel apply worker is available, the leader will apply the transaction
independently.
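In outline, the dispatch just described amounts to the following (a Python sketch of the control flow only; every name here is illustrative, not taken from the patch):

```python
def dispatch_transaction(xid, changes, free_workers,
                         find_dependencies, send_to_worker, apply_locally):
    # Leader-side dispatch: compute the xids this transaction must wait
    # for, then hand it to a free parallel apply worker, or fall back to
    # applying it in the leader itself when no worker is available.
    deps = find_dependencies(changes)
    if free_workers:
        worker = free_workers.pop()
        send_to_worker(worker, xid, changes, deps)
    else:
        apply_locally(xid, changes, deps)

# Tiny usage example with stub callbacks: one free worker, then none.
log = []
workers = ["pa-1"]
for xid in (501, 502):
    dispatch_transaction(xid, [], workers,
                         find_dependencies=lambda ch: [],
                         send_to_worker=lambda w, x, c, d: log.append((w, x)),
                         apply_locally=lambda x, c, d: log.append(("leader", x)))
assert log == [("pa-1", 501), ("leader", 502)]
```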
For further details, please refer to the following:
The leader maintains a local hash table, using the remote change's replica
identity column values and relid as keys, with remote transaction IDs as values.
Before sending changes to the parallel apply worker, the leader computes a hash
using RI key values and the relid of the current change to search the hash
table. If an existing entry is found, the leader tells the parallel worker
to wait for the remote xid in the hash entry, after which the leader updates the
hash entry with the current xid.
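As a toy model of that hash-table protocol (illustrative Python, not the patch's C code; `check_dependency` is a made-up name):

```python
# (relid, replica-identity key values) -> xid of the latest remote
# transaction that modified that row.
dep_table = {}

def check_dependency(relid, ri_key, xid):
    # Look up the current change's row; if another transaction already
    # touched it, the caller must tell the worker to wait for that xid.
    # Either way, the current xid becomes the entry's new value.
    key = (relid, ri_key)
    wait_for = dep_table.get(key)
    dep_table[key] = xid
    return wait_for if wait_for != xid else None

# Txn 100 touches row (pk=1) of relation 16384: no dependency yet.
assert check_dependency(16384, (1,), 100) is None
# Txn 101 touches the same row: it must wait for txn 100.
assert check_dependency(16384, (1,), 101) == 100
# A different row of the same table: independent.
assert check_dependency(16384, (2,), 102) is None
```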
If the remote relation lacks a replica identity (RI), it indicates that only
INSERT can be replicated for this table. In such cases, the leader skips
dependency checks, allowing the parallel apply worker to proceed with applying
changes without delay. This is because the only potential conflicts
would involve a local unique key or foreign key, handling for which is
yet to be implemented (see TODO - dependency on local unique key,
foreign key).
In cases of TRUNCATE or remote schema changes affecting the entire table, the
leader retrieves all remote xids touching the same table (via sequential scans
of the hash table) and tells the parallel worker to wait for those transactions
to commit.
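For these whole-table cases, a sequential scan over the same kind of hash table could look like this (again only an illustrative sketch, not the patch's code):

```python
# Toy dependency table: (relid, replica-identity key) -> last writer xid.
dep_table = {(16384, (1,)): 100,
             (16384, (2,)): 101,
             (16385, (7,)): 102}

def xids_touching_table(relid):
    # Sequentially scan all entries and collect every remote xid that
    # modified the given relation, e.g. before applying a TRUNCATE.
    return sorted({xid for (rel, _key), xid in dep_table.items()
                   if rel == relid})

assert xids_touching_table(16384) == [100, 101]  # wait for both writers
assert xids_touching_table(16385) == [102]
```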
Hash entries are cleaned up once the transaction corresponding to the remote xid
in the entry has been committed. Clean-up typically occurs when collecting the
flush position of each transaction, but is forced if the hash table exceeds a
set threshold.
If a transaction is relied upon by others, the leader adds its xid to a shared
hash table. The shared hash table entry is cleared by the parallel apply worker
upon completing the transaction. Workers needing to wait for a transaction check
the shared hash table entry; if present, they lock the transaction ID (using
pa_lock_transaction). If absent, it indicates the transaction has been
committed, negating the need to wait.
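That wait protocol can be modelled with a shared set guarded by a condition variable (a sketch only; the patch uses PostgreSQL's lock manager via pa_lock_transaction, not anything like this Python code):

```python
import threading

class InProgressXacts:
    """Toy model of the shared hash table of remote xids that other
    transactions depend on."""
    def __init__(self):
        self._xids = set()
        self._cond = threading.Condition()

    def add(self, xid):
        # Leader: record that this transaction has dependents.
        with self._cond:
            self._xids.add(xid)

    def finish(self, xid):
        # Worker: transaction committed; wake up any waiters.
        with self._cond:
            self._xids.discard(xid)
            self._cond.notify_all()

    def wait_for(self, xid):
        # Worker: block until the xid is gone. If it was never (or is no
        # longer) present, the transaction already committed, so this
        # returns immediately.
        with self._cond:
            self._cond.wait_for(lambda: xid not in self._xids)

shared = InProgressXacts()
shared.add(501)
waiter = threading.Thread(target=shared.wait_for, args=(501,))
waiter.start()
shared.finish(501)          # commit of 501 releases the waiter
waiter.join(timeout=5)
assert not waiter.is_alive()
```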
--
TODO - replication progress tracking for out of order commit.
TODO - dependency on local unique key, foreign key.
TODO - restrict user defined trigger and constraints.
TODO - enable the parallel apply optionally
TODO - potential improvement to use shared hash table for tracking dependencies.
--
The above TODO items are also included in the initial email[1].
[1]: /messages/by-id/CAA4eK1+SEus_6vQay9TF_r4ow+E-Q7LYNLfsD78HaOsLSgppxQ@mail.gmail.com
Best Regards,
Hou zj
Attachments:
v1-0001-Parallel-apply-non-streaming-transactions.patch
On Wed, Aug 13, 2025 at 09:50:27AM +0530, Amit Kapila wrote:
On Tue, Aug 12, 2025 at 10:40 PM Bruce Momjian <bruce@momjian.us> wrote:
Currently, PostgreSQL supports parallel apply only for large streaming
transactions (streaming=parallel). This proposal aims to extend
parallelism to non-streaming transactions, thereby improving
replication performance in workloads dominated by smaller, frequent
transactions.

I thought the approach for improving WAL apply speed, for both binary
and logical, was pipelining:
https://en.wikipedia.org/wiki/Instruction_pipelining
rather than trying to do all the steps in parallel.
It is not clear to me how the speed for a mix of dependent and
independent transactions can be improved using the technique you
shared as we still need to follow the commit order for dependent
transactions. Can you please elaborate more on the high-level idea of
how this technique can be used to improve speed for applying logical
WAL records?
This blog post from February I think has some good ideas for binary
replication pipelining:
https://www.cybertec-postgresql.com/en/end-of-the-road-for-postgresql-streaming-replication/
Surprisingly, what could be considered the actual replay work
seems to be a minority of the total workload. The largest parts
involve reading WAL and decoding page references from it, followed
by looking up those pages in the cache, and pinning them so they
are not evicted while in use. All of this work could be performed
concurrently with the replay loop. For example, a separate
read-ahead process could handle these tasks, ensuring that the
replay process receives a queue of transaction log records with
associated cache references already pinned, ready for application.
The beauty of the approach is that there is no need for dependency
tracking. I have CC'ed the author, Ants Aasma.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
On Wed, Aug 13, 2025 at 8:57 PM Bruce Momjian <bruce@momjian.us> wrote:
This blog post from February I think has some good ideas for binary
replication pipelining:
https://www.cybertec-postgresql.com/en/end-of-the-road-for-postgresql-streaming-replication/
Surprisingly, what could be considered the actual replay work
seems to be a minority of the total workload.
This is the biggest difference between physical and logical WAL apply.
In the case of logical WAL, the actual replay is the majority of the
work. We don't need to read WAL or decode it or find/pin the
appropriate pages to apply. Here, you can consider it almost
equivalent to how the primary receives insert/update/delete operations
from users.
Firstly, the idea shared in the blog is not applicable to logical
replication, and even if we try to somehow map it to logical apply, I
don't see how or why it will be able to match up the speed of applying
with multiple workers in case of logical replication. Also, note that
dependency calculation is not as tricky for logical replication as we
can easily retrieve such information from logical WAL records in most
cases.
--
With Regards,
Amit Kapila.
On Wed, Aug 13, 2025 at 4:17 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
Here is the initial POC patch for this idea.
Thank you Hou-san for the patch.
I did some performance benchmarking for the patch and overall, the
results show substantial performance improvements.
Please find the details as follows:
Source code:
----------------
pgHead (572c0f1b0e) and v1-0001 patch
Setup:
---------
Pub --> Sub
- Two nodes created in pub-sub logical replication setup.
- Both nodes have the same set of pgbench tables created with scale=300.
- The sub node is subscribed to all the changes from the pub node's
pgbench tables.
Workload Run:
--------------------
- Disable the subscription on Sub node
- Run default pgbench(read-write) only on Pub node with #clients=40
and run duration=10 minutes
- Enable the subscription on Sub once pgbench completes and then
measure time taken in replication.
~~~
Test-01: Measure Replication lag
----------------------------------------
Observations:
---------------
- Replication time improved as the number of parallel workers
increased with the patch.
- On pgHead, replicating a 10-minute publisher workload took ~46 minutes.
- With just 2 parallel workers (default), replication time was cut in
half, and with 8 workers it completed in ~13 minutes(3.5x faster).
- With 16 parallel workers, achieved ~3.7x speedup over pgHead.
- With 32 workers, performance gains plateaued slightly, likely
because, with more workers running on the machine, the remaining
parallelizable work is not large enough to show further improvements.
Detailed Result:
-----------------
Case                   Time_taken_in_replication(sec)  rep_time_in_minutes  faster_than_head
1. pgHead              2760.791                        46.01318333          -
2. patched_#worker=2   1463.853                        24.3975              1.88 times
3. patched_#worker=4   1031.376                        17.1896              2.68 times
4. patched_#worker=8   781.007                         13.0168              3.54 times
5. patched_#worker=16  741.108                         12.3518              3.73 times
6. patched_#worker=32  787.203                         13.1201              3.51 times
~~~~
Test-02: Measure number of transactions parallelized
-----------------------------------------------------
- Used a top-up patch to LOG the number of transactions applied by
parallel workers, applied by the leader, and those that were dependent.
- The LOG output e.g. -
```
LOG: parallelized_nxact: 11497254 dependent_nxact: 0 leader_applied_nxact: 600
```
- parallelized_nxact: gives the number of parallelized transactions
- dependent_nxact: gives the dependent transactions
- leader_applied_nxact: gives the transactions applied by leader worker
(the required top-up v1-0002 patch is attached.)
Observations:
----------------
- With 4 to 8 parallel workers, ~80%-98% transactions are parallelized
- As the number of workers increased, the parallelized percentage
increased and reached 99.99% with 32 workers.
Detailed Result:
-----------------
case1: #parallel_workers = 2(default)
#total_pgbench_txns = 24745648
parallelized_nxact = 14439480 (58.35%)
dependent_nxact = 16 (0.00006%)
leader_applied_nxact = 10306153 (41.64%)
case2: #parallel_workers = 4
#total_pgbench_txns = 24776108
parallelized_nxact = 19666593 (79.37%)
dependent_nxact = 212 (0.0008%)
leader_applied_nxact = 5109304 (20.62%)
case3: #parallel_workers = 8
#total_pgbench_txns = 24821333
parallelized_nxact = 24397431 (98.29%)
dependent_nxact = 282 (0.001%)
leader_applied_nxact = 423621 (1.71%)
case4: #parallel_workers = 16
#total_pgbench_txns = 24938255
parallelized_nxact = 24937754 (99.99%)
dependent_nxact = 142 (0.0005%)
leader_applied_nxact = 360 (0.0014%)
case5: #parallel_workers = 32
#total_pgbench_txns = 24769474
parallelized_nxact = 24769135 (99.99%)
dependent_nxact = 312 (0.0013%)
leader_applied_nxact = 28 (0.0001%)
~~~~~
The scripts used for above tests are attached.
Next, I plan to extend the testing to larger workloads by running
pgbench for 20–30 minutes.
We will also benchmark performance across different workload types to
evaluate the improvements once the patch has matured further.
--
Thanks,
Nisha
Attachments:
v1-0002-Add-some-simple-statistics.txt
v1_pa_pub-sub_setup.sh
v1_pa_pub-sub_measure.sh
On 18/08/2025 9:56 AM, Nisha Moond wrote:
On Wed, Aug 13, 2025 at 4:17 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
Here is the initial POC patch for this idea.

I did some performance benchmarking for the patch and overall, the
results show substantial performance improvements.
I also did some benchmarking of the proposed parallel apply patch and
compared it with my prewarming approach.
Parallel apply is significantly more efficient than prefetch (as
expected).
I had two tests (more details here):
/messages/by-id/84ed36b8-7d06-4945-9a6b-3826b3f999a6@garret.ru
One performs random updates and the other inserts rows with random keys.
I stopped the subscriber, applied the workload at the publisher for 100
seconds, and then measured how long it took the subscriber to catch up.
update test (with 8 parallel apply workers):
master: 8:30 min
prefetch: 2:05 min
parallel apply: 1:30 min
insert test (with 8 parallel apply workers):
master: 9:20 min
prefetch: 3:08 min
parallel apply: 1:54 min
On Mon, Aug 18, 2025 at 8:20 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
I also did some benchmarking of the proposed parallel apply patch and
compare it with my prewarming approach.
And parallel apply is significantly more efficient than prefetch (it is
expected).
Thanks to you and Nisha for doing some preliminary performance
testing; the results are really encouraging (more than a 3 to 4 times
improvement in multiple workloads). I hope we keep making progress on
this patch and make it ready for the next release.
--
With Regards,
Amit Kapila.
Hi,
I ran tests to compare the performance of logical synchronous
replication with parallel-apply against physical synchronous
replication.
Highlights
===============
On pgHead (current behavior):
- With synchronous physical replication set to remote_apply, the
Primary’s TPS drops by ~60% (≈2.5x slower than asynchronous).
- With synchronous logical replication set to remote_apply, the
Publisher’s TPS drops drastically by ~94% (≈16x slower than
asynchronous).
With the proposed Parallel-Apply Patch (v1):
- Parallel apply significantly improves logical synchronous
replication performance by 5-6×.
- With 40 parallel workers on the subscriber, the Publisher achieves
30045.82 TPS, which is 5.5× faster than the no-patch case (5435.46
TPS).
- With the patch, the Publisher’s performance is only ~3x slower than
asynchronous, bringing it much closer to the physical replication
case.
Machine details
===============
Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz, 88 cores, 503 GiB RAM
Source code:
===============
- pgHead(e9a31c0cc60) and v1 patch
Test-01: Physical replication:
======================
- To measure the physical synchronous replication performance on pgHead.
Setup & Workload:
-----------------
Primary --> Standby
- Two nodes created in physical (primary-standby) replication setup.
- Default pgbench (read-write) was run on the Primary with scale=300,
#clients=40, run duration=20 minutes.
- The TPS is measured with the synchronous_commit set as "off" vs
"remote_apply" on pgHead.
Results:
---------
synchronous_commit    Primary_TPS    regression
OFF                   90466.57743    -
remote_apply(run1)    35848.6558     -60%
remote_apply(run2)    35306.25479    -61%
- On pgHead, when synchronous_commit is set to "remote_apply" during
physical replication, the Primary experiences a 60–61% reduction in
TPS, which is ~2.5 times slower.
~~~
Test-02: Logical replication:
=====================
- To measure the logical synchronous replication performance on
pgHead and with parallel-apply patch.
Setup & Workload:
-----------------
Publisher --> Subscriber
- Two nodes created in logical (publisher-subscriber) replication setup.
- Default pgbench (read-write) was run on the Pub with scale=300,
#clients=40, run duration=20 minutes.
- The TPS is measured on pgHead and with the parallel-apply v1 patch.
- The number of parallel workers was varied as 2, 4, 8, 16, 32, 40.
case-01: pgHead
-------------------
Results:
synchronous_commit      Primary_TPS    regression
pgHead(OFF)             89138.14626    --
pgHead(remote_apply)    5435.464525    -94%
- By default (pgHead), synchronous logical replication sees a 94%
drop in TPS, which is -
a) 16.4 times slower than the logical async case and,
b) 6.6 times slower than physical sync replication case.
case-02: patched
---------------------
- synchronous_commit = 'remote_apply'
- measured the performance by varying #parallel workers as 2, 4, 8, 16, 32, 40
Results:
#workers Primary_TPS Improvement_with_patch faster_than_no-patch
2 9679.077736 78% 1.78x
4 14329.64073 164% 2.64x
8 21832.04285 302% 4.02x
16 27676.47085 409% 5.09x
32 29718.40090 447% 5.47x
40 30045.82365 453% 5.53x
- The TPS on the publisher improves significantly as the number of
parallel workers increases.
- At 40 workers, the TPS reaches 30045.82, which is about 5.5x higher
than the no-patch case.
- With 40 parallel workers, logical sync replication is only about
1.2x slower than physical sync replication.
~~~
The scripts used for the tests are attached. We'll do tests with
larger data sets later and share results.
--
Thanks,
Nisha
On Mon, Aug 11, 2025 at 10:16 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
Hi,
Background and Motivation
-------------------------------------
In high-throughput systems, where hundreds of sessions generate data
on the publisher, the subscriber's apply process often becomes a
bottleneck due to the single apply worker model. While users can
mitigate this by creating multiple publication-subscription pairs,
this approach has scalability and usability limitations.Currently, PostgreSQL supports parallel apply only for large streaming
transactions (streaming=parallel). This proposal aims to extend
parallelism to non-streaming transactions, thereby improving
replication performance in workloads dominated by smaller, frequent
transactions.Design Overview
------------------------
To safely parallelize non-streaming transactions, we must ensure that
transaction dependencies are respected to avoid failures and
deadlocks. Consider the following scenarios to understand it better:
(a) Transaction failures: Say, if we insert a row in the first
transaction and update it in the second transaction on the publisher,
then allowing the subscriber to apply both in parallel can lead to
failure in the update; (b) Deadlocks - allowing transactions that
update the same set of rows in a table in the opposite order in
parallel can lead to deadlocks.The core idea is that the leader apply worker ensures the following:
a. Identifies dependencies between transactions. b. Coordinates
parallel workers to apply independent transactions concurrently. c.
Ensures correct ordering for dependent transactions.Dependency Detection
--------------------------------
1. Basic Dependency Tracking: Maintain a hash table keyed by
(RelationId, ReplicaIdentity) with the value as the transaction XID.
Before dispatching a change to a parallel worker, the leader checks
for existing entries: (a) If no match: add the entry and proceed; (b)
If match: instruct the worker to wait until the dependent transaction
completes.2. Unique Keys
In addition to RI, track unique keys to detect conflicts. Example:
CREATE TABLE tab1(a INT PRIMARY KEY, b INT UNIQUE);
Transactions on publisher:
Txn1: INSERT (1,1)
Txn2: INSERT (2,2)
Txn3: DELETE (2,2)
Txn4: UPDATE (1,1) → (1,2)

If Txn4 is applied before Txn2 and Txn3, it will fail with a unique
constraint violation on b. To prevent this, track both RI and unique
keys in the hash table: compare the old tuple's RI, and the new
tuple's RI and unique keys, against existing hash table entries to
identify transaction dependencies (the new tuple's RI is needed to
detect a prior insertion with the same key).

3. Foreign Keys
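The RI and unique-key tracking in points (1) and (2) could be sketched
roughly as below. This is illustrative Python only — the actual
implementation would be C in the leader apply worker, and all names
here (DepTracker, check_and_record, and so on) are invented for the
example, not taken from the POC patch:

```python
class DepTracker:
    """Leader-side bookkeeping from points (1) and (2): a hash keyed by
    (relation, key kind, key value), holding the XID of the latest
    in-flight transaction that wrote that key."""

    def __init__(self):
        self.last_writer = {}

    def check_and_record(self, xid, relid, old_key=None, new_ri=None,
                         new_unique=()):
        """Return the set of XIDs this change must wait for (empty if the
        transaction is independent), and record xid as the new writer."""
        keys = []
        if old_key is not None:                 # old tuple: replica identity
            keys.append((relid, "ri", old_key))
        if new_ri is not None:                  # new tuple: RI ...
            keys.append((relid, "ri", new_ri))
        for uk in new_unique:                   # ... and any unique keys
            keys.append((relid, "uniq", uk))

        deps = set()
        for key in keys:
            prev = self.last_writer.get(key)
            if prev is not None and prev != xid:
                deps.add(prev)                  # must apply after prev
            self.last_writer[key] = xid
        return deps

    def transaction_done(self, xid):
        # On commit/abort the leader drops this xid's entries so that
        # later transactions no longer wait on it.
        self.last_writer = {k: v for k, v in self.last_writer.items()
                            if v != xid}
```

Replaying part of the tab1 example: after Txn1 (INSERT (1,1)) and Txn2
(INSERT (2,2)) are dispatched, Txn4's UPDATE (1,1) → (1,2) reports
dependencies on both — its old RI (a=1) matches Txn1's entry and its
new unique key (b=2) matches Txn2's — so the leader would hold it back
until those transactions finish.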
Consider FK constraints between tables. Example:

TABLE owner(user_id INT PRIMARY KEY);
TABLE car(car_name TEXT, user_id INT REFERENCES owner);

Transactions:
Txn1: INSERT INTO owner(1)
Txn2: INSERT INTO car('bz', 1)

Applying Txn2 before Txn1 will fail. To avoid this, check whether FK
values in new tuples match any RI or unique key in the hash table. If
matched, treat the transaction as dependent.

4. Triggers and Constraints
For the initial version, exclude tables with user-defined triggers or
constraints from parallel apply due to complexity in dependency
detection. We may need some parallel-apply-safe marking to allow this.

Replication Progress Tracking
-----------------------------------------
Parallel apply introduces out-of-order commit application,
complicating replication progress tracking. To handle restarts and
ensure consistency, track three key metrics:
lowest_remote_lsn: starting point for applying transactions.
highest_remote_lsn: highest commit LSN that has been applied.
list_remote_lsn: list of commit LSNs applied between the lowest and
highest.

Mechanism:
Store these in ReplicationState: lowest_remote_lsn,
highest_remote_lsn, list_remote_lsn. Flush these to disk during
checkpoints, similar to CheckPointReplicationOrigin.

After a restart, start from lowest_remote_lsn and, for each
transaction, skip it if its commit LSN is in list_remote_lsn;
otherwise, apply it. Once the commit LSN exceeds highest_remote_lsn,
apply without checking the list.

During apply, the leader maintains list_in_progress_xacts in the
increasing commit order. On commit, update highest_remote_lsn. If
commit LSN matches the first in-progress xact of
list_in_progress_xacts, update lowest_remote_lsn, otherwise, add to
list_remote_lsn. After commit, also remove it from the
list_in_progress_xacts. We need to clean up entries below
lowest_remote_lsn in list_remote_lsn while updating its value.

To illustrate how this mechanism works, consider the following four
transactions:

Transaction ID    Commit LSN
501               1000
502               1100
503               1200
504               1300

Assume:
Transactions 501 and 502 take longer to apply whereas transactions 503
and 504 finish earlier. Parallel apply workers are assigned as
follows:
pa-1 → 501
pa-2 → 502
pa-3 → 503
pa-4 → 504

Initial state: list_in_progress_xacts = [501, 502, 503, 504]
Step 1: Transaction 503 commits first and in RecordTransactionCommit,
it updates highest_remote_lsn to 1200. In apply_handle_commit, since
503 is not the first in list_in_progress_xacts, add 1200 to
list_remote_lsn. Remove 503 from list_in_progress_xacts.
Step 2: Transaction 504 commits, Update highest_remote_lsn to 1300.
Add 1300 to list_remote_lsn. Remove 504 from list_in_progress_xacts.
ReplicationState now:
lowest_remote_lsn = 0
list_remote_lsn = [1200, 1300]
highest_remote_lsn = 1300
list_in_progress_xacts = [501, 502]

Step 3: Transaction 501 commits. Since 501 is now the first in
list_in_progress_xacts, update lowest_remote_lsn to 1000. Remove 501
from list_in_progress_xacts. Clean up list_remote_lsn to remove
entries < lowest_remote_lsn (none in this case).
ReplicationState now:
lowest_remote_lsn = 1000
list_remote_lsn = [1200, 1300]
highest_remote_lsn = 1300
list_in_progress_xacts = [502]

Step 4: System crash and restart
Upon restart, start replication from lowest_remote_lsn = 1000. The
first transaction encountered is 502 with commit LSN 1100; since that
LSN is not present in list_remote_lsn, apply it. As the commit LSNs of
transactions 503 and 504 [1200, 1300] are present in list_remote_lsn,
skip them. Note that each transaction's end_lsn/commit_lsn has to be
compared, which the apply worker receives along with the first
transaction command BEGIN. This ensures correctness and avoids
duplicate application of already committed transactions.

Now, it is possible that some users may want to parallelize
transactions but still maintain commit order, because they don't
explicitly declare FKs or PKs for columns and instead maintain
integrity via the application. In such cases we won't be able to detect
transaction dependencies, so it would be better to make out-of-order
commits optional.

Thoughts?
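As a postscript, the four-transaction walkthrough above can be replayed
with a small Python simulation of this bookkeeping. The field names
follow the proposal (lowest_remote_lsn, highest_remote_lsn,
list_remote_lsn, list_in_progress_xacts); the class and its methods are
invented for illustration and are not the POC code:

```python
class ReplProgress:
    """Sketch of the ReplicationState progress fields described above.
    Only the field names come from the design; the methods are invented
    for this illustration."""

    def __init__(self):
        self.lowest_remote_lsn = 0
        self.highest_remote_lsn = 0
        self.list_remote_lsn = []         # commit LSNs applied out of order
        self.list_in_progress_xacts = []  # xids, in increasing commit order

    def begin(self, xid):
        self.list_in_progress_xacts.append(xid)

    def commit(self, xid, commit_lsn):
        self.highest_remote_lsn = max(self.highest_remote_lsn, commit_lsn)
        if self.list_in_progress_xacts[0] == xid:
            # Oldest in-progress xact finished: advance the low watermark
            # and drop list entries below it.
            self.lowest_remote_lsn = commit_lsn
            self.list_remote_lsn = [l for l in self.list_remote_lsn
                                    if l >= commit_lsn]
        else:
            self.list_remote_lsn.append(commit_lsn)
        self.list_in_progress_xacts.remove(xid)

    def should_apply_after_restart(self, commit_lsn):
        # Replay starts from lowest_remote_lsn; above highest_remote_lsn
        # everything is new, in between consult list_remote_lsn.
        if commit_lsn > self.highest_remote_lsn:
            return True
        return commit_lsn not in self.list_remote_lsn
```

Running Steps 1-3 (503 commits at 1200, then 504 at 1300, then 501 at
1000) leaves lowest_remote_lsn = 1000, list_remote_lsn = [1200, 1300]
and highest_remote_lsn = 1300, matching the states shown above; after
a simulated restart, 502 (commit LSN 1100) is applied while 1200 and
1300 are skipped.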
+1 for the idea. We already have parallel apply workers for large
streaming transactions, so I am trying to understand what additional
problems we need to solve here. IIUC, today we apply in parallel only
transactions that were actually running in parallel on the publisher,
and their commits are still applied in serial order. Now we want to
apply small transactions in parallel without the leader worker
controlling the commit apply order, so we need extra handling of
dependencies, and we also need to track which transactions to apply
and which to skip after a restart. Is that right?
I am reading the proposal and POC patch in more detail to get the
fundamentals of the design and will share my thoughts.
--
Regards,
Dilip Kumar
Google
On Fri, Sep 5, 2025 at 2:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Aug 11, 2025 at 10:16 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
+1 for the idea. We already have parallel apply workers for large
streaming transactions, so I am trying to understand what additional
problems we need to solve here. IIUC, today we apply in parallel only
transactions that were actually running in parallel on the publisher,
and their commits are still applied in serial order. Now we want to
apply small transactions in parallel without the leader worker
controlling the commit apply order, so we need extra handling of
dependencies, and we also need to track which transactions to apply
and which to skip after a restart. Is that right?
Right.
I am reading the proposal and POC patch in more detail to get the
fundamentals of the design and will share my thoughts.
Thanks.
--
With Regards,
Amit Kapila.