[PATCH] Preserve replication origin OIDs in pg_upgrade
Hello hackers,
The idea for this patch came up during discussions in the thread [1]/messages/by-id/182311743703924@mail.yandex.ru
on migration of the pg_commit_ts directory as part of pg_upgrade.
There was a problem raised by Sawada-san in that thread which this
patch addresses. [2]/messages/by-id/CAD21AoDG8zQpHHfw7OvaEy7W0ZSyP=_dS-hrcquJ3C_ctMDmMQ@mail.gmail.com
The problem:
The pg_commit_ts directory stores commit-timestamp records for each
transaction, and each record embeds the replication origin ID
(roident) that identifies which subscription wrote that transaction.
When pg_upgrade migrates a subscriber, the pg_commit_ts directory is
copied directly from the old cluster to the new cluster. This means
those embedded roidents must remain valid in the new cluster. When
pg_upgrade migrates a subscriber, CREATE SUBSCRIPTION on the new
cluster calls replorigin_create() which assigns fresh roidents to each
subscription's replication origin. Because subscription OIDs are not
stable across upgrades, the origin names change (e.g. pg_16392 becomes
pg_16403), and consequently the roidents can be assigned differently —
or in the worst case, swapped between subscriptions.
Consider two subscriptions subA and subB with roidents 1 and 2
respectively before upgrade. After upgrade, due to OID reassignment,
subA might get roident 2 and subB might get roident 1. The
commit-timestamp records copied from the old cluster still say roident
1 for rows written by subA, but the new cluster now thinks roident 1
belongs to subB. This causes spurious update_origin_differs conflicts
— the new cluster incorrectly thinks a row was last modified by a
different subscription than it actually was.
This patch attempts to fix this by replicating the roident of the
replication origins of each subscription on migration. This patch also
migrates all replication origins as part of pg_upgrade.
Sequence of Events During Upgrade
1. pg_dumpall dumps all non-subscription replication origins from the
old cluster with their roidents and LSN positions.
2. pg_dump dumps each subscription, but now records the old roident
alongside the subscription info.
3. During restore, pg_dumpall's output recreates non-subscription
origins on the new cluster with their original roidents via
binary_upgrade_create_replication_origin().
4. During per-database restore, CREATE SUBSCRIPTION runs but skips
origin creation.
5. binary_upgrade_set_next_replorigin_oid() creates the origin for
each subscription with the preserved roident.
6. binary_upgrade_replorigin_advance() restores the LSN position for
each subscription.
7. Subscriptions that were running before upgrade are re-enabled.
Please let me know your feedback regarding this patch
[1]: /messages/by-id/182311743703924@mail.yandex.ru
[2]: /messages/by-id/CAD21AoDG8zQpHHfw7OvaEy7W0ZSyP=_dS-hrcquJ3C_ctMDmMQ@mail.gmail.com
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v2-0001-Preserve-replication-origin-OIDs-in-pg_upgrade.patchapplication/octet-stream; name=v2-0001-Preserve-replication-origin-OIDs-in-pg_upgrade.patchDownload+320-10
Dear Ajin,
Sequence of Events During Upgrade
1. pg_dumpall dumps all non-subscription replication origins from the
old cluster with their roidents and LSN positions.
2. pg_dump dumps each subscription, but now records the old roident
alongside the subscription info.
3. During restore, pg_dumpall's output recreates non-subscription
origins on the new cluster with their original roidents via
binary_upgrade_create_replication_origin().
To confirm, why do we have to handle separately for subscription-associated
origins? I'm thinking it's not needed if the subscription's OID is preserved
during the upgrade.
I checked the old thread to preserve it [1]/messages/by-id/CALDaNm2Wj63VcbB0SY2NECHr1mKM1YSaV1ZydrdQVxyox2O2hg@mail.gmail.com, but it could not be accepted because
there are no strong motivations. But I feel this is the good reason to do so now.
How do you feel?
[1]: /messages/by-id/CALDaNm2Wj63VcbB0SY2NECHr1mKM1YSaV1ZydrdQVxyox2O2hg@mail.gmail.com
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Wed, 29 Apr 2026 at 14:11, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Ajin,
Sequence of Events During Upgrade
1. pg_dumpall dumps all non-subscription replication origins from the
old cluster with their roidents and LSN positions.
2. pg_dump dumps each subscription, but now records the old roident
alongside the subscription info.
3. During restore, pg_dumpall's output recreates non-subscription
origins on the new cluster with their original roidents via
binary_upgrade_create_replication_origin().To confirm, why do we have to handle separately for subscription-associated
origins? I'm thinking it's not needed if the subscription's OID is preserved
during the upgrade.
+1 to preserve the subscription OID. This should make preserving
replication origin easier.
I checked the old thread to preserve it [1], but it could not be accepted because
there are no strong motivations. But I feel this is the good reason to do so now.
Here is a rebased version of the patch.
Regards,
Vignesh
Attachments:
v1-0001-Preserve-subscription-OIDs-during-pg_upgrade.patchapplication/octet-stream; name=v1-0001-Preserve-subscription-OIDs-during-pg_upgrade.patchDownload+60-5
On Wed, Apr 29, 2026 at 2:11 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Ajin,
Sequence of Events During Upgrade
1. pg_dumpall dumps all non-subscription replication origins from the
old cluster with their roidents and LSN positions.
2. pg_dump dumps each subscription, but now records the old roident
alongside the subscription info.
3. During restore, pg_dumpall's output recreates non-subscription
origins on the new cluster with their original roidents via
binary_upgrade_create_replication_origin().To confirm, why do we have to handle separately for subscription-associated
origins? I'm thinking it's not needed if the subscription's OID is preserved
during the upgrade.
I’m not sure how preserving the subscription OID would ensure that the
origin ID is also preserved for sub-associated origins. Could you
please elaborate?
As I understand it, roident values are assigned independently during
origin creation. Even if subscription OIDs are preserved, the origin
IDs could still be reassigned differently on the new cluster. For
example, suppose we have two subscriptions, sub1 and sub2, with
roident values 2 and 3, assuming 1 was previously used and dropped.
After upgrade, origin creation may start allocating from 1 again,
resulting in roident values 1 and 2 instead. Since pg_commit_ts stores
the numeric roident, not the origin name, this mismatch could still
lead to incorrect conflict detection. Wouldn’t that result in the same
wrong conflict detection issue we are trying to avoid?
Please let me know if my understanding is wrong.
thanks
Shveta
On Thu, Apr 30, 2026 at 4:52 PM vignesh C <vignesh21@gmail.com> wrote:
On Wed, 29 Apr 2026 at 14:11, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:Dear Ajin,
Sequence of Events During Upgrade
1. pg_dumpall dumps all non-subscription replication origins from the
old cluster with their roidents and LSN positions.
2. pg_dump dumps each subscription, but now records the old roident
alongside the subscription info.
3. During restore, pg_dumpall's output recreates non-subscription
origins on the new cluster with their original roidents via
binary_upgrade_create_replication_origin().To confirm, why do we have to handle separately for subscription-associated
origins? I'm thinking it's not needed if the subscription's OID is preserved
during the upgrade.+1 to preserve the subscription OID. This should make preserving
replication origin easier.I checked the old thread to preserve it [1], but it could not be accepted because
there are no strong motivations. But I feel this is the good reason to do so now.Here is a rebased version of the patch.
Thanks Vignesh for the patch. I have used your patch as 0001 and
created mine on top of that as 0002. Like Kuroda-san said, with your
patch, I no longer need to have special handling of subscription
replication origins when pg_dumpall creates all replication origins on
the new cluster as now the name of origin is also guaranteed to be the
same because the replication origin name is created using the oid of
the subscription which is now the same because of the the changes in
patch 0001.
Here's v3 with the updated changes.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v3-0002-Preserve-replication-origin-OIDs-during-pg_upgrad.patchapplication/octet-stream; name=v3-0002-Preserve-replication-origin-OIDs-during-pg_upgrad.patchDownload+246-22
v3-0001-Preserve-subscription-OIDs-during-pg_upgrade.patchapplication/octet-stream; name=v3-0001-Preserve-subscription-OIDs-during-pg_upgrade.patchDownload+60-5
On Thu, Apr 30, 2026 at 7:37 PM shveta malik <shveta.malik@gmail.com> wrote:
I’m not sure how preserving the subscription OID would ensure that the
origin ID is also preserved for sub-associated origins. Could you
please elaborate?As I understand it, roident values are assigned independently during
origin creation. Even if subscription OIDs are preserved, the origin
IDs could still be reassigned differently on the new cluster. For
example, suppose we have two subscriptions, sub1 and sub2, with
roident values 2 and 3, assuming 1 was previously used and dropped.
After upgrade, origin creation may start allocating from 1 again,
resulting in roident values 1 and 2 instead. Since pg_commit_ts stores
the numeric roident, not the origin name, this mismatch could still
lead to incorrect conflict detection. Wouldn’t that result in the same
wrong conflict detection issue we are trying to avoid?
Please let me know if my understanding is wrong.
In the first patch, the replication origins were duplicated from the
old cluster to the new with matching roidents and ronames. This
couldn't be done for subscription replication origins as subscriptions
weren't preserving OIDs on the new cluster and therefore the
corresponding roname which is derived from the subscription OIDs also
differed. Now with matching roname and roident, all the replication
origins from the old cluster can be copied over to the new cluster in
one shot.
regards,
Ajin Cherian
Fujitsu Australia
On Wed, May 6, 2026 at 11:13 AM Ajin Cherian <itsajin@gmail.com> wrote:
On Thu, Apr 30, 2026 at 7:37 PM shveta malik <shveta.malik@gmail.com> wrote:
I’m not sure how preserving the subscription OID would ensure that the
origin ID is also preserved for sub-associated origins. Could you
please elaborate?As I understand it, roident values are assigned independently during
origin creation. Even if subscription OIDs are preserved, the origin
IDs could still be reassigned differently on the new cluster. For
example, suppose we have two subscriptions, sub1 and sub2, with
roident values 2 and 3, assuming 1 was previously used and dropped.
After upgrade, origin creation may start allocating from 1 again,
resulting in roident values 1 and 2 instead. Since pg_commit_ts stores
the numeric roident, not the origin name, this mismatch could still
lead to incorrect conflict detection. Wouldn’t that result in the same
wrong conflict detection issue we are trying to avoid?
Please let me know if my understanding is wrong.In the first patch, the replication origins were duplicated from the
old cluster to the new with matching roidents and ronames. This
couldn't be done for subscription replication origins as subscriptions
weren't preserving OIDs on the new cluster and therefore the
corresponding roname which is derived from the subscription OIDs also
differed.
Okay. I think you did not post the first patch you are referring to
here. V2 was posted directly. But I see your point.
Now with matching roname and roident, all the replication
origins from the old cluster can be copied over to the new cluster in
one shot.
Okay. Will review the patch.
thanks
Shveta
Dear Ajin,
Thanks for updating the patch. Let me share my two high-level comments.
1.
Can you clarify the policy for backward compatibility? In other words, should we
preserve subscription OIDs and migrate replication origins from PG19- instances?
Similar commits 9a17be1 and 29d0a77 did not allow migrating objects from released
versions.
2.
I found that other objects use global variables like binary_upgrade_next_xxx to
specify the next OID. Can we follow the manner even for replication origins?
Or it's not a good approach because of some reasons?
BTW, cfbot got angry with your patch.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Hello!
+ appendPQExpBuffer(buf,
+ "SELECT pg_catalog.binary_upgrade_create_replication_origin('%u'::pg_catalog.oid,
'%s'::pg_catalog.name",
+ roident, roname);
+
+ if (!PQgetisnull(res, i, i_remotelsn))
+ {
+ appendPQExpBuffer(buf, ", '%s'::pg_catalog.pg_lsn",
+ PQgetvalue(res, i, i_remotelsn));
+ }
+ else
Isn't some string escaping missing from this? It doesn't seem like the
most likely SQL injection target, but could still cause surprises.
Could be even verified by a test case using
pg_replication_origin_create('O''Brien_replica')
- if (old_cluster.nsubs > max_active_replication_origins)
+ if (old_cluster.nrepl_origins > max_active_replication_origins)
pg_fatal("\"max_active_replication_origins\" (%d) must be greater
than or equal to the number of "
"subscriptions (%d) on the old cluster",
- max_active_replication_origins, old_cluster.nsubs);
+ max_active_replication_origins, old_cluster.nrepl_origins);
This error message could be misleading now, it's not the number of
subscriptions but the number of replication origins.
+ rel = table_open(ReplicationOriginRelationId, ExclusiveLock);
Do we need an ExclusiveLock, couldn't this instead use a
RowExclusiveLock? In the unlikely situation that another session
inserts the same record, the unique index should be able catch it?
+#include "access/xlogdefs.h"
This seems to be unnecessary?
- appendPQExpBuffer(query, ", '%s');\n", subinfo->suboriginremotelsn);
+ if (fout->remoteVersion < 190000 && subinfo->suboriginremotelsn)
+ {
subinfo->suboriginremotelsn is already checked by the outer if
On Thu, May 7, 2026 at 5:47 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Ajin,
Thanks for updating the patch. Let me share my two high-level comments.
1.
Can you clarify the policy for backward compatibility? In other words, should we
preserve subscription OIDs and migrate replication origins from PG19- instances?
Similar commits 9a17be1 and 29d0a77 did not allow migrating objects from released
versions.
I thought about this and after discussions with others decided that we
should support migrating replication_origins from PG17 onwards. Prior
to that pg_subscription_rel and remote LSN were not updated as part of
migration. It's important that we support migration of versions prior
to PG_19, otherwise we will end up skipping creating replication
origins as part of CREATE SUBSCRIPTION in the new cluster and
subscriptions will be without replication origins. So, I have updated
the check to migrate replication origins as long as the old cluster is
PG17 or greater.
2.
I found that other objects use global variables like binary_upgrade_next_xxx to
specify the next OID. Can we follow the manner even for replication origins?
Or it's not a good approach because of some reasons?
This naming convention is followed for preserving subscription OIDs
but since for replication origins we are creating each replication
origin with the same roname and roident, I don't think the same
situation applies here.
BTW, cfbot got angry with your patch.
Yes, fixed that.
On Fri, May 8, 2026 at 7:06 AM Zsolt Parragi <zsolt.parragi@percona.com> wrote:
Hello!
+ appendPQExpBuffer(buf, + "SELECT pg_catalog.binary_upgrade_create_replication_origin('%u'::pg_catalog.oid, '%s'::pg_catalog.name", + roident, roname); + + if (!PQgetisnull(res, i, i_remotelsn)) + { + appendPQExpBuffer(buf, ", '%s'::pg_catalog.pg_lsn", + PQgetvalue(res, i, i_remotelsn)); + } + elseIsn't some string escaping missing from this? It doesn't seem like the
most likely SQL injection target, but could still cause surprises.
Could be even verified by a test case using
pg_replication_origin_create('O''Brien_replica')
Fixed.
- if (old_cluster.nsubs > max_active_replication_origins) + if (old_cluster.nrepl_origins > max_active_replication_origins) pg_fatal("\"max_active_replication_origins\" (%d) must be greater than or equal to the number of " "subscriptions (%d) on the old cluster", - max_active_replication_origins, old_cluster.nsubs); + max_active_replication_origins, old_cluster.nrepl_origins);This error message could be misleading now, it's not the number of
subscriptions but the number of replication origins.
Fixed.
+ rel = table_open(ReplicationOriginRelationId, ExclusiveLock);
Do we need an ExclusiveLock, couldn't this instead use a
RowExclusiveLock? In the unlikely situation that another session
inserts the same record, the unique index should be able catch it?
Fixed.
+#include "access/xlogdefs.h"
This seems to be unnecessary?
It is required.
- appendPQExpBuffer(query, ", '%s');\n", subinfo->suboriginremotelsn); + if (fout->remoteVersion < 190000 && subinfo->suboriginremotelsn) + {subinfo->suboriginremotelsn is already checked by the outer if
Fixed.
Attaching version 4 with the fixes.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v4-0001-Preserve-subscription-OIDs-during-pg_upgrade.patchapplication/octet-stream; name=v4-0001-Preserve-subscription-OIDs-during-pg_upgrade.patchDownload+60-5
v4-0002-Preserve-replication-origin-OIDs-during-pg_upgrad.patchapplication/octet-stream; name=v4-0002-Preserve-replication-origin-OIDs-during-pg_upgrad.patchDownload+246-42
A few more minor comments:
binary_upgrade_replorigin_advance seems like dead code in the patch
now, it has no callers since the patch removes its only use.
+ ReplOriginId node;
....
+ node = PG_GETARG_OID(0);
ReplOriginId is uint16, this silently truncates it. Since this is a
generic callable function shouldn't there be at least a check about
it?
Also, seems like the same function
(binary_upgrade_create_replication_origin) locks-unlocks-locks
ReplicationOriginRelationId, that doesn't seem the best approach with
a single-use helper function? (first in
replorigin_create_with_reploriginid, then
binary_upgrade_create_replication_origin reaquires it)
+ if (PQntuples(res) > 0 && archDumpFormat == archNull)
+ fprintf(OPF, "--\n-- Replication Origins \n--\n\n");
The caller also checks the format, it is redundant.
+ /* Get replication origins in current database. */
+ appendPQExpBufferStr(buf,
Isn't pg_replication_origin a shared catalog?
On Tue, May 12, 2026 at 4:32 AM Zsolt Parragi <zsolt.parragi@percona.com> wrote:
A few more minor comments:
binary_upgrade_replorigin_advance seems like dead code in the patch
now, it has no callers since the patch removes its only use.
Removed.
+ ReplOriginId node; .... + node = PG_GETARG_OID(0);ReplOriginId is uint16, this silently truncates it. Since this is a
generic callable function shouldn't there be at least a check about
it?
Added check.
Also, seems like the same function
(binary_upgrade_create_replication_origin) locks-unlocks-locks
ReplicationOriginRelationId, that doesn't seem the best approach with
a single-use helper function? (first in
replorigin_create_with_reploriginid, then
binary_upgrade_create_replication_origin reaquires it)
Rewrote the function and removed replorigin_create_with_reploriginid
to do all the work in binary_upgrade_create_replication_origin()
itself.
+ if (PQntuples(res) > 0 && archDumpFormat == archNull) + fprintf(OPF, "--\n-- Replication Origins \n--\n\n");The caller also checks the format, it is redundant.
Fixed.
+ /* Get replication origins in current database. */ + appendPQExpBufferStr(buf,Isn't pg_replication_origin a shared catalog?
Fixed.
Also changed the definition of
binary_upgrade_create_replication_origin to proparallel => 'u', as
parallel execution was causing a failure on FreeBSD in cfbot.
regards,
Ajin Cherian
Funitsu Australia