Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

Started by Chao Li3 months ago20 messages
Jump to latest
#1Chao Li
li.evan.chao@gmail.com

Hi,
While working with logical replication and partitioned tables, I noticed an
inconsistency between how publications treat partitions and how "ALTER
TABLE ... REPLICA IDENTITY" behaves.

When a publication is created on a partitioned table, e.g.:
```
CREATE PUBLICATION pub FOR TABLE parent;
```

PostgreSQL automatically includes all leaf partitions of the table in the
publication. This matches the user’s expectation that a partitioned table
behaves as a single logical entity.

However, if the user then runs:
```
ALTER TABLE parent REPLICA IDENTITY FULL;
```
only the parent table’s relreplident is updated. None of the leaf
partitions inherit this change, even though the parent itself has no
storage and its replication identity plays no role in logical replication.
Logical decoding always operates on the leaf partitions, and their
replication identities determine whether UPDATE/DELETE can be replicated
safely.

This gap leads to several problems:

* The parent table’s replica identity is effectively irrelevant during
logical replication, since it never stores tuples or produces WAL.
* Users expect that altering the replica identity on the partitioned table
would apply to all partitions that are implicitly included in the
publication.
* As a result, users currently need to run ALTER TABLE ... REPLICA
IDENTITY separately for every leaf partition, which is tedious and
error-prone on large partition hierarchies.
* Misconfiguration usually surfaces only when logical replication starts
failing on UPDATE/DELETE for specific leaf partitions due to mismatched
replica identities.

To address this, the attached patch makes:
```
ALTER TABLE parent REPLICA IDENTITY <type>
```
cascade the new setting to all leaf partitions of the table. Partitioned
tables (RELKIND_PARTITIONED_TABLE) are skipped since they have no storage
and no effective replica identity.

This aligns ALTER TABLE behavior with how publications already expand
partitioned tables, and makes replication identity configuration consistent
with logical replication semantics.

The attached patch is not yet fully ready for detailed review, this is more
of a proof-of-concept. At this stage, I mainly want to see whether people
agree with the idea, or if there are objections to cascading replica
identity changes for partitioned tables before I refine the patch further.

Comments and feedback are welcome.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

v1-0001-tablecmds-Cascade-REPLICA-IDENTITY-changes-to-lea.patchapplication/octet-stream; name=v1-0001-tablecmds-Cascade-REPLICA-IDENTITY-changes-to-lea.patchDownload+19-1
#2Amit Kapila
amit.kapila16@gmail.com
In reply to: Chao Li (#1)
Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

On Thu, Dec 11, 2025 at 2:46 PM Chao Li <li.evan.chao@gmail.com> wrote:

Hi,
While working with logical replication and partitioned tables, I noticed an inconsistency between how publications treat partitions and how "ALTER TABLE ... REPLICA IDENTITY" behaves.

When a publication is created on a partitioned table, e.g.:
```
CREATE PUBLICATION pub FOR TABLE parent;
```

PostgreSQL automatically includes all leaf partitions of the table in the publication. This matches the user’s expectation that a partitioned table behaves as a single logical entity.

However, if the user then runs:
```
ALTER TABLE parent REPLICA IDENTITY FULL;
```
only the parent table’s relreplident is updated. None of the leaf partitions inherit this change, even though the parent itself has no storage and its replication identity plays no role in logical replication. Logical decoding always operates on the leaf partitions, and their replication identities determine whether UPDATE/DELETE can be replicated safely.

This gap leads to several problems:

* The parent table’s replica identity is effectively irrelevant during logical replication, since it never stores tuples or produces WAL.

When we use row filters, if publish_via_partition_root option of
publication is true, the root partitioned table's row filter is used.
I think this would then refer RI of partitioned table for validity of
row filter. Please see docs [1]https://www.postgresql.org/docs/devel/sql-createpublication.html (There can be a case where a
subscription combines multiple publications. If a partitioned table is
published by any subscribed publications which set
publish_via_partition_root = true, changes on this partitioned table
(or on its partitions) will be published using the identity and schema
of this partitioned table rather than that of the individual
partitions. This parameter also affects how row filters and column
lists are chosen for partitions; see below for details.) for more
details.

I have not tested it but you can once try to see how it behaves.

The other point is what if one of the partition already has RI defined
to a different value than what is defined for parent table?

[1]: https://www.postgresql.org/docs/devel/sql-createpublication.html

--
With Regards,
Amit Kapila.

#3Chao Li
li.evan.chao@gmail.com
In reply to: Amit Kapila (#2)
Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

On Dec 11, 2025, at 20:43, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 11, 2025 at 2:46 PM Chao Li <li.evan.chao@gmail.com> wrote:

Hi,
While working with logical replication and partitioned tables, I noticed an inconsistency between how publications treat partitions and how "ALTER TABLE ... REPLICA IDENTITY" behaves.

When a publication is created on a partitioned table, e.g.:
```
CREATE PUBLICATION pub FOR TABLE parent;
```

PostgreSQL automatically includes all leaf partitions of the table in the publication. This matches the user’s expectation that a partitioned table behaves as a single logical entity.

However, if the user then runs:
```
ALTER TABLE parent REPLICA IDENTITY FULL;
```
only the parent table’s relreplident is updated. None of the leaf partitions inherit this change, even though the parent itself has no storage and its replication identity plays no role in logical replication. Logical decoding always operates on the leaf partitions, and their replication identities determine whether UPDATE/DELETE can be replicated safely.

This gap leads to several problems:

* The parent table’s replica identity is effectively irrelevant during logical replication, since it never stores tuples or produces WAL.

When we use row filters, if publish_via_partition_root option of
publication is true, the root partitioned table's row filter is used.
I think this would then refer RI of partitioned table for validity of
row filter. Please see docs [1] (There can be a case where a
subscription combines multiple publications. If a partitioned table is
published by any subscribed publications which set
publish_via_partition_root = true, changes on this partitioned table
(or on its partitions) will be published using the identity and schema
of this partitioned table rather than that of the individual
partitions. This parameter also affects how row filters and column
lists are chosen for partitions; see below for details.) for more
details.

I have not tested it but you can once try to see how it behaves.

The other point is what if one of the partition already has RI defined
to a different value than what is defined for parent table?

[1] - https://www.postgresql.org/docs/devel/sql-createpublication.html

--
With Regards,
Amit Kapila.

Hi Amit,

Thanks for pointing out that my assumption of “RI of parent is not used” is not always true.

I agree that automatic-cascade will introduce a lot of complexities. To ensure the backward-compatibility, how about to extend the ALTER TABLE syntax like:

```
ALTER TABLE <root> REPLICA IDENTITY <type> [CASCADE | FORCE CASCADE]
```

So, that the current syntax will behave the same as usual, and

With CASCADE
============
1. Root's RI updated
2. All children (including middle partitioned tables and leaf tables) RI updated unless 3
3. If any child’s RI is different from the root's RI, fail out, no change happens

With CASCADE FORCE
===================
1. Root's RI updated
2. All children (including middle partitioned tables and leaf tables) RI updated, prints a warning message when a child’s RI is different from root’s RI

"ALTER TABLE parent REPLICA IDENTITY” is a PG specific syntax, so the change won’t break the SQL standard. And “CASCADE” is known keyword that has been used in many SQL commands.

I can see the usefulness of “CASCADE” when a partitioned table has many partitions. A single command will be able to update all partitions’ RI.

What do you think?

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#4Amit Kapila
amit.kapila16@gmail.com
In reply to: Chao Li (#3)
Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

On Fri, Dec 12, 2025 at 9:28 AM Chao Li <li.evan.chao@gmail.com> wrote:

Hi Amit,

Thanks for pointing out that my assumption of “RI of parent is not used” is not always true.

I agree that automatic-cascade will introduce a lot of complexities. To ensure the backward-compatibility, how about to extend the ALTER TABLE syntax like:

```
ALTER TABLE <root> REPLICA IDENTITY <type> [CASCADE | FORCE CASCADE]
```

CASCADE is used for dependent objects, so I don't think using it will
be appropriate in this context. However, the INHERIT (NO INHERIT)
could be used. We already use them for constraints, see ALTER TABLE
... ALTER CONSTRAINT syntax in docs.

So, that the current syntax will behave the same as usual, and

With CASCADE
============
1. Root's RI updated
2. All children (including middle partitioned tables and leaf tables) RI updated unless 3
3. If any child’s RI is different from the root's RI, fail out, no change happens

With CASCADE FORCE
===================
1. Root's RI updated
2. All children (including middle partitioned tables and leaf tables) RI updated, prints a warning message when a child’s RI is different from root’s RI

I think you can try to experiment with CHECK or NOT NULL constraint
behavior for similar cases in case of partition tables.

BTW, did you get this use case in the field or just browsing docs, you
thought it would be useful to have such a feature?

--
With Regards,
Amit Kapila.

#5Chao Li
li.evan.chao@gmail.com
In reply to: Amit Kapila (#4)
Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

Hi Amit,

Thank you very much for the guidance.

On Dec 13, 2025, at 19:10, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Dec 12, 2025 at 9:28 AM Chao Li <li.evan.chao@gmail.com> wrote:

Hi Amit,

Thanks for pointing out that my assumption of “RI of parent is not used” is not always true.

I agree that automatic-cascade will introduce a lot of complexities. To ensure the backward-compatibility, how about to extend the ALTER TABLE syntax like:

```
ALTER TABLE <root> REPLICA IDENTITY <type> [CASCADE | FORCE CASCADE]
```

CASCADE is used for dependent objects, so I don't think using it will
be appropriate in this context. However, the INHERIT (NO INHERIT)
could be used. We already use them for constraints, see ALTER TABLE
... ALTER CONSTRAINT syntax in docs.

So, that the current syntax will behave the same as usual, and

With CASCADE
============
1. Root's RI updated
2. All children (including middle partitioned tables and leaf tables) RI updated unless 3
3. If any child’s RI is different from the root's RI, fail out, no change happens

With CASCADE FORCE
===================
1. Root's RI updated
2. All children (including middle partitioned tables and leaf tables) RI updated, prints a warning message when a child’s RI is different from root’s RI

I think you can try to experiment with CHECK or NOT NULL constraint
behavior for similar cases in case of partition tables.

BTW, did you get this use case in the field or just browsing docs, you
thought it would be useful to have such a feature?

The main problem I am trying to solve is [1]/messages/by-id/CAEoWx2mMorbMwjKbT4YCsjDyL3r9Mp+z0bbK57VZ+OkJTgJQVQ@mail.gmail.com, where you are already in the thread. This is a real pain point reported by our users and field teams. While working on [1]/messages/by-id/CAEoWx2mMorbMwjKbT4YCsjDyL3r9Mp+z0bbK57VZ+OkJTgJQVQ@mail.gmail.com, I noticed this additional issue during my own tests. I then discussed it with our field teams, and they confirmed that such a feature would be very helpful in practice. We have many deployments where a single partitioned table has several thousands of partitions, and having a fast, single command to update the replica identity of all partitions would significantly simplify operations.

I also confirmed one thing with the field teams: across our deployments (my company has 100K+ deployments in China), they have never seen a case where partitions under the same parent/root use different replica identities. In theory, this is allowed, since RI can be set per partition, but in practice I am not sure whether such a use case really exists.

Currently, replica identity is not inheritable for partitions. I verified this behavior: if I create a partitioned table, alter its RI to FULL, and then create a new partition, the new partition still uses DEFAULT. If we keep this behavior, we can easily run into a scenario like this:

* create a partitioned table with 10 partitions
* ALTER TABLE <parent> REPLICA IDENTITY FULL INHERIT -- assume this feature exists
* create 5 new partitions
* ALTER TABLE <parent> REPLICA IDENTITY FULL INHERIT -- conflict occurs

In this case, a conflict occurs because the newly created partitions still have DEFAULT RI, but this is not the user’s intention.

From this perspective, when a new partition is created, it should automatically inherit the parent’s RI. With that behavior, a “FORCE” option would rarely be needed, because having one partition use a different RI from its siblings should be an uncommon case.

Based on this, I imagine the feature roughly like this:

* When a new partition is created, it inherits its parent’s RI
* ALTER TABLE <table_name> REPLICA IDENTITY [ INHERIT | NO INHERIT ] -- error out on conflicts

This leads to a couple of follow-up questions:

* Should RI be switchable between “inheritable” and “non-inheritable”, similar to constraints? IMO, no. RI is much simpler than constraints. For constraints, parent–child relationships exist between tables with potentially different structures, so it is natural that child tables might have different constraints. RI, however, only applies to partitioned tables, where partitions must share the same structure as the parent. In practice, it seems rare for partitions to intentionally use a different RI than the parent.

* Should publish_via_partition_root in publications affect this feature? IMO, no. A table can belong to multiple publications, and different publications may have different publish_via_partition_root settings.

[1]: /messages/by-id/CAEoWx2mMorbMwjKbT4YCsjDyL3r9Mp+z0bbK57VZ+OkJTgJQVQ@mail.gmail.com

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#6Amit Kapila
amit.kapila16@gmail.com
In reply to: Chao Li (#5)
Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

On Mon, Dec 15, 2025 at 12:44 PM Chao Li <li.evan.chao@gmail.com> wrote:

Thank you very much for the guidance.

On Dec 13, 2025, at 19:10, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Dec 12, 2025 at 9:28 AM Chao Li <li.evan.chao@gmail.com> wrote:

Hi Amit,

Thanks for pointing out that my assumption of “RI of parent is not used” is not always true.

I agree that automatic-cascade will introduce a lot of complexities. To ensure the backward-compatibility, how about to extend the ALTER TABLE syntax like:

```
ALTER TABLE <root> REPLICA IDENTITY <type> [CASCADE | FORCE CASCADE]
```

CASCADE is used for dependent objects, so I don't think using it will
be appropriate in this context. However, the INHERIT (NO INHERIT)
could be used. We already use them for constraints, see ALTER TABLE
... ALTER CONSTRAINT syntax in docs.

So, that the current syntax will behave the same as usual, and

With CASCADE
============
1. Root's RI updated
2. All children (including middle partitioned tables and leaf tables) RI updated unless 3
3. If any child’s RI is different from the root's RI, fail out, no change happens

With CASCADE FORCE
===================
1. Root's RI updated
2. All children (including middle partitioned tables and leaf tables) RI updated, prints a warning message when a child’s RI is different from root’s RI

I think you can try to experiment with CHECK or NOT NULL constraint
behavior for similar cases in case of partition tables.

BTW, did you get this use case in the field or just browsing docs, you
thought it would be useful to have such a feature?

The main problem I am trying to solve is [1], where you are already in the thread. This is a real pain point reported by our users and field teams. While working on [1], I noticed this additional issue during my own tests. I then discussed it with our field teams, and they confirmed that such a feature would be very helpful in practice. We have many deployments where a single partitioned table has several thousands of partitions, and having a fast, single command to update the replica identity of all partitions would significantly simplify operations.

I also confirmed one thing with the field teams: across our deployments (my company has 100K+ deployments in China), they have never seen a case where partitions under the same parent/root use different replica identities. In theory, this is allowed, since RI can be set per partition, but in practice I am not sure whether such a use case really exists.

Currently, replica identity is not inheritable for partitions. I verified this behavior: if I create a partitioned table, alter its RI to FULL, and then create a new partition, the new partition still uses DEFAULT. If we keep this behavior, we can easily run into a scenario like this:

* create a partitioned table with 10 partitions
* ALTER TABLE <parent> REPLICA IDENTITY FULL INHERIT -- assume this feature exists
* create 5 new partitions
* ALTER TABLE <parent> REPLICA IDENTITY FULL INHERIT -- conflict occurs

In this case, a conflict occurs because the newly created partitions still have DEFAULT RI, but this is not the user’s intention.

From this perspective, when a new partition is created, it should automatically inherit the parent’s RI. With that behavior, a “FORCE” option would rarely be needed, because having one partition use a different RI from its siblings should be an uncommon case.

Based on this, I imagine the feature roughly like this:

* When a new partition is created, it inherits its parent’s RI
* ALTER TABLE <table_name> REPLICA IDENTITY [ INHERIT | NO INHERIT ] -- error out on conflicts

Sounds reasonable to me.

This leads to a couple of follow-up questions:

* Should RI be switchable between “inheritable” and “non-inheritable”, similar to constraints? IMO, no. RI is much simpler than constraints. For constraints, parent–child relationships exist between tables with potentially different structures, so it is natural that child tables might have different constraints. RI, however, only applies to partitioned tables, where partitions must share the same structure as the parent. In practice, it seems rare for partitions to intentionally use a different RI than the parent.

I see your point but I think we should provide resetting the option
unless it is too complex or not feasible.

* Should publish_via_partition_root in publications affect this feature? IMO, no. A table can belong to multiple publications, and different publications may have different publish_via_partition_root settings.

I also don't think so.

--
With Regards,
Amit Kapila.

#7Euler Taveira
euler@eulerto.com
In reply to: Amit Kapila (#6)
Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

On Tue, Dec 16, 2025, at 7:12 AM, Amit Kapila wrote:

On Mon, Dec 15, 2025 at 12:44 PM Chao Li <li.evan.chao@gmail.com> wrote:

From this perspective, when a new partition is created, it should automatically inherit the parent’s RI. With that behavior, a “FORCE” option would rarely be needed, because having one partition use a different RI from its siblings should be an uncommon case.

Based on this, I imagine the feature roughly like this:

* When a new partition is created, it inherits its parent’s RI
* ALTER TABLE <table_name> REPLICA IDENTITY [ INHERIT | NO INHERIT ] -- error out on conflicts

Sounds reasonable to me.

+1.

This leads to a couple of follow-up questions:

* Should RI be switchable between “inheritable” and “non-inheritable”, similar to constraints? IMO, no. RI is much simpler than constraints. For constraints, parent–child relationships exist between tables with potentially different structures, so it is natural that child tables might have different constraints. RI, however, only applies to partitioned tables, where partitions must share the same structure as the parent. In practice, it seems rare for partitions to intentionally use a different RI than the parent.

I see your point but I think we should provide resetting the option
unless it is too complex or not feasible.

+1.

ALTER TABLE ...
REPLICA IDENTITY { DEFAULT [ INHERIT | NO INHERIT ] |
USING INDEX index_name |
FULL [ INHERIT | NO INHERIT ] |
NOTHING [ INHERIT | NO INHERIT ] }

It doesn't make sense to have the inheritance property for INDEX. Although the
natural default value is NO INHERIT to preserve backward compatibility, I
wondering if we use INHERIT as default. The main advantage is usability as Chao
Li already mentioned. Is there any cases that having a different replica
identity from parent/partitioned table makes sense?

--
Euler Taveira
EDB https://www.enterprisedb.com/

Attachments:

ri.sqlapplication/sql; name=ri.sqlDownload
#8Chao Li
li.evan.chao@gmail.com
In reply to: Euler Taveira (#7)
Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

Hi Amit and Euler,

Thank you both for all your advice. Here comes my first implementation of
INHERIT in the attached v2 patch.

On Wed, Dec 17, 2025 at 8:11 AM Euler Taveira <euler@eulerto.com> wrote:

I wondering if we use INHERIT as default. The main advantage is usability
as Chao
Li already mentioned. Is there any cases that having a different replica
identity from parent/partitioned table makes sense?

We can leave this topic open for discussion. In my current implementation,
NO INHERIT is still the default. But if we decide to switch the default, I
can add a new commit that should include only 1 line code change in gram.y
and a tiny doc update.

0001 - when a new partition is created, use the parent's replication
identity
0002 - add INHERIT | NO INHERIT

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

v2-0002-Add-INHERIT-NO-INHERIT-options-to-ALTER-TABLE-REP.patchapplication/octet-stream; name=v2-0002-Add-INHERIT-NO-INHERIT-options-to-ALTER-TABLE-REP.patchDownload+191-5
v2-0001-Inherit-replica-identity-for-new-and-attached-par.patchapplication/octet-stream; name=v2-0001-Inherit-replica-identity-for-new-and-attached-par.patchDownload+121-3
#9Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Chao Li (#8)
RE: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

On Wednesday, December 17, 2025 3:56 PM Chao Li <li.evan.chao@gmail.com> wrote:

Thank you both for all your advice. Here comes my first implementation of
INHERIT in the attached v2 patch.

On Wed, Dec 17, 2025 at 8:11 AM Euler Taveira <mailto:euler@eulerto.com> wrote:

I wondering if we use INHERIT as default. The main advantage is usability as
Chao Li already mentioned. Is there any cases that having a different
replica identity from parent/partitioned table makes sense?

We can leave this topic open for discussion. In my current implementation, NO
INHERIT is still the default. But if we decide to switch the default, I can add
a new commit that should include only 1 line code change in gram.y and a tiny
doc update.

0001 - when a new partition is created, use the parent's replication identity
0002 - add INHERIT | NO INHERIT

Thanks for updating the patches.

I think there are several design considerations for this proposal:

1) Since the index names can vary across different partitions, what should be the
expected behavior if a new partition cannot identify the same replica identity
key as the root partitioned table?

2) Should we simply use the ONLY keyword to determine whether to propagate the
replica identity to partitions instead of adding [NOT] INHERIT? For example, if
a user specifies ONLY, it changes the identity of the parent table, and any
newly created partitions will adopt this new identity. However, the identities
of existing partitions remain unchanged.

3) There have been previous discussions on similar proposals[1]/messages/by-id/201902041630.gpadougzab7v@alvherre.pgsql[2]/messages/by-id/OSBPR01MB2982A2738F16722899A50082FECB0@OSBPR01MB2982.jpnprd01.prod.outlook.com. It might be
beneficial to review those debates to see whether any old issues or arguments
are pertinent to this proposal.

[1]: /messages/by-id/201902041630.gpadougzab7v@alvherre.pgsql
[2]: /messages/by-id/OSBPR01MB2982A2738F16722899A50082FECB0@OSBPR01MB2982.jpnprd01.prod.outlook.com

Best Regards,
Hou zj

#10Chao Li
li.evan.chao@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#9)
Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

On Dec 17, 2025, at 16:48, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:

On Wednesday, December 17, 2025 3:56 PM Chao Li <li.evan.chao@gmail.com> wrote:

Thank you both for all your advice. Here comes my first implementation of
INHERIT in the attached v2 patch.

On Wed, Dec 17, 2025 at 8:11 AM Euler Taveira <mailto:euler@eulerto.com> wrote:

I wondering if we use INHERIT as default. The main advantage is usability as
Chao Li already mentioned. Is there any cases that having a different
replica identity from parent/partitioned table makes sense?

We can leave this topic open for discussion. In my current implementation, NO
INHERIT is still the default. But if we decide to switch the default, I can add
a new commit that should include only 1 line code change in gram.y and a tiny
doc update.

0001 - when a new partition is created, use the parent's replication identity
0002 - add INHERIT | NO INHERIT

Hi Zhijie,

Thanks for your feedback and linked information. I think this patch is avoiding the hard problem of “index” RI.

I think there are several design considerations for this proposal:

1) Since the index names can vary across different partitions, what should be the
expected behavior if a new partition cannot identify the same replica identity
key as the root partitioned table?

Index RI is skipped in this patch. INHERT only works for DEFAULT, FULL and NONE.

2) Should we simply use the ONLY keyword to determine whether to propagate the
replica identity to partitions instead of adding [NOT] INHERIT? For example, if
a user specifies ONLY, it changes the identity of the parent table, and any
newly created partitions will adopt this new identity. However, the identities
of existing partitions remain unchanged.

The current syntax “ALTER TABLE [ONLY] table_name action”, I think here “ONLY” works in terms of inherited tables but partitions. (I surprisedly find the doc doesn’t explain “ONLY” at all.) The current behavior (without my patch) proves my understanding, “ALTER TABLE table_name REPLICA IDENTITY” only updates the parent table itself.

Given that it is not allowed to inherit a table from either a partitioned table (root/parent) or partition (leaf), reusing the “ONLY" at “ALTER TABLE” level won’t generate a conflict. But the problem is, we will have to revisit all ALTER TABLE actions to see if they can propagate to partitions when the target table is a partitioned table. In that case, scope of this patch is extremely extended.

The current approach adding INHERIT to the action “REPLICA IDENTIFY” has no global impacts. If you look at the patch, the syntax this patch uses is:
```
ALTER TABLE ...
REPLICA IDENTITY { DEFAULT [ INHERIT | NO INHERIT ] |
USING INDEX index_name |
FULL [ INHERIT | NO INHERIT ] |
NOTHING [ INHERIT | NO INHERIT ] }
```
It explicitly avoids the complexity of “USING INDEX”, thus we don’t have to mention any exceptions in docs.

Amit, what do you think?

3) There have been previous discussions on similar proposals[1][2]. It might be
beneficial to review those debates to see whether any old issues or arguments
are pertinent to this proposal.

[1] /messages/by-id/201902041630.gpadougzab7v@alvherre.pgsql
[2] /messages/by-id/OSBPR01MB2982A2738F16722899A50082FECB0@OSBPR01MB2982.jpnprd01.prod.outlook.com

I read through the both threads. I think the key difference between the patch and the previous one is that this patch only adds “INHERIT” to DEFAULT/FULL/NONE, which is narrow scoped. Let’s see how the folks who involved in the previous discussion will feedback to this patch.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#11Chao Li
li.evan.chao@gmail.com
In reply to: Chao Li (#10)
Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

On Dec 18, 2025, at 12:21, Chao Li <li.evan.chao@gmail.com> wrote:

2) Should we simply use the ONLY keyword to determine whether to propagate the
replica identity to partitions instead of adding [NOT] INHERIT? For example, if
a user specifies ONLY, it changes the identity of the parent table, and any
newly created partitions will adopt this new identity. However, the identities
of existing partitions remain unchanged.

The current syntax “ALTER TABLE [ONLY] table_name action”, I think here “ONLY” works in terms of inherited tables but partitions. (I surprisedly find the doc doesn’t explain “ONLY” at all.) The current behavior (without my patch) proves my understanding, “ALTER TABLE table_name REPLICA IDENTITY” only updates the parent table itself.

Given that it is not allowed to inherit a table from either a partitioned table (root/parent) or partition (leaf), reusing the “ONLY" at “ALTER TABLE” level won’t generate a conflict. But the problem is, we will have to revisit all ALTER TABLE actions to see if they can propagate to partitions when the target table is a partitioned table. In that case, scope of this patch is extremely extended.

The current approach adding INHERIT to the action “REPLICA IDENTIFY” has no global impacts. If you look at the patch, the syntax this patch uses is:
```
ALTER TABLE ...
REPLICA IDENTITY { DEFAULT [ INHERIT | NO INHERIT ] |
USING INDEX index_name |
FULL [ INHERIT | NO INHERIT ] |
NOTHING [ INHERIT | NO INHERIT ] }
```
It explicitly avoids the complexity of “USING INDEX”, thus we don’t have to mention any exceptions in docs.

Amit, what do you think?

I read through the doc of “ALTER TABLE” again, and got some findings.

```
name - The name (optionally schema-qualified) of an existing table to alter. If ONLY is specified before the table name, only that table is altered. If ONLY is not specified, the table and all its descendant tables (if any) are altered. Optionally, * can be specified after the table name to explicitly indicate that descendant tables are included.
```

For table name, it says "If ONLY is specified before the table name, only that table is altered. If ONLY is not specified, the table and all its descendant tables (if any) are altered.” Here, I think we can “descendant tables” as both inherited table or partition table, as they are mutually exclusive.

And for “DISABLE/ENABLE [ REPLICA | ALWAYS ] TRIGGER”, the doc says:
```
DISABLE/ENABLE [ REPLICA | ALWAYS ] TRIGGER

When this command is applied to a partitioned table, the states of corresponding clone triggers in the partitions are updated too, unless ONLY is specified.
```

Here, ONLY is used for partition table. From this perspective, “REPLICA IDENTITY” not propagating to children feels like a “bug”.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#12Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Chao Li (#10)
RE: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

On Thursday, December 18, 2025 12:21 PM Chao Li <li.evan.chao@gmail.com> wrote:

On Dec 17, 2025, at 16:48, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>

wrote:

On Wednesday, December 17, 2025 3:56 PM Chao Li

<li.evan.chao@gmail.com> wrote:

Thank you both for all your advice. Here comes my first
implementation of INHERIT in the attached v2 patch.

On Wed, Dec 17, 2025 at 8:11 AM Euler Taveira

<mailto:euler@eulerto.com> wrote:

I wondering if we use INHERIT as default. The main advantage is
usability as Chao Li already mentioned. Is there any cases that
having a different replica identity from parent/partitioned table makes

sense?

We can leave this topic open for discussion. In my current
implementation, NO INHERIT is still the default. But if we decide to
switch the default, I can add a new commit that should include only 1
line code change in gram.y and a tiny doc update.

0001 - when a new partition is created, use the parent's replication
identity
0002 - add INHERIT | NO INHERIT

Hi Zhijie,

Thanks for your feedback and linked information. I think this patch is avoiding
the hard problem of “index” RI.

I think there are several design considerations for this proposal:

1) Since the index names can vary across different partitions, what
should be the expected behavior if a new partition cannot identify the
same replica identity key as the root partitioned table?

Index RI is skipped in this patch. INHERT only works for DEFAULT, FULL and
NONE.

I personally find skipping this part to be inconsistent, particularly given the
existing disparities among ALTER TABLE commands related to partitioned table handling.
Omitting this part introduces further inconsistency within the ALTER TABLE
REPLICA IDENTITY.

That said, I understand that skipping this implementation might be necessary due
to technical challenges. It's important, however, to discuss, analyze and document the
specific difficulties associated with implementing this part, and we should
include this information in the code comments and commit message.

Best Regards,
Hou zj

#13Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#12)
Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

On Fri, Dec 19, 2025 at 11:14 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Thursday, December 18, 2025 12:21 PM Chao Li <li.evan.chao@gmail.com> wrote:

On Dec 17, 2025, at 16:48, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>

wrote:

On Wednesday, December 17, 2025 3:56 PM Chao Li

<li.evan.chao@gmail.com> wrote:

Thank you both for all your advice. Here comes my first
implementation of INHERIT in the attached v2 patch.

On Wed, Dec 17, 2025 at 8:11 AM Euler Taveira

<mailto:euler@eulerto.com> wrote:

I wondering if we use INHERIT as default. The main advantage is
usability as Chao Li already mentioned. Is there any cases that
having a different replica identity from parent/partitioned table makes

sense?

We can leave this topic open for discussion. In my current
implementation, NO INHERIT is still the default. But if we decide to
switch the default, I can add a new commit that should include only 1
line code change in gram.y and a tiny doc update.

0001 - when a new partition is created, use the parent's replication
identity
0002 - add INHERIT | NO INHERIT

Hi Zhijie,

Thanks for your feedback and linked information. I think this patch is avoiding
the hard problem of “index” RI.

I think there are several design considerations for this proposal:

1) Since the index names can vary across different partitions, what
should be the expected behavior if a new partition cannot identify the
same replica identity key as the root partitioned table?

Index RI is skipped in this patch. INHERT only works for DEFAULT, FULL and
NONE.

I personally find skipping this part to be inconsistent, particularly given the
existing disparities among ALTER TABLE commands related to partitioned table handling.
Omitting this part introduces further inconsistency within the ALTER TABLE
REPLICA IDENTITY.

Fair point. I think one should summarize the previous discussions with
key differences and where the previous patch got stuck. Then, it would
be good to get some feedback from the people involved previously. If
there is an agreement that we can do INHERIT stuff for specific parts
then fine, otherwise, I think we need to address the index part as
well.

--
With Regards,
Amit Kapila.

#14Chao Li
li.evan.chao@gmail.com
In reply to: Amit Kapila (#13)
Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

On Dec 22, 2025, at 14:26, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Dec 19, 2025 at 11:14 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Thursday, December 18, 2025 12:21 PM Chao Li <li.evan.chao@gmail.com> wrote:

On Dec 17, 2025, at 16:48, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>

wrote:

On Wednesday, December 17, 2025 3:56 PM Chao Li

<li.evan.chao@gmail.com> wrote:

Thank you both for all your advice. Here comes my first
implementation of INHERIT in the attached v2 patch.

On Wed, Dec 17, 2025 at 8:11 AM Euler Taveira

<mailto:euler@eulerto.com> wrote:

I wondering if we use INHERIT as default. The main advantage is
usability as Chao Li already mentioned. Is there any cases that
having a different replica identity from parent/partitioned table makes

sense?

We can leave this topic open for discussion. In my current
implementation, NO INHERIT is still the default. But if we decide to
switch the default, I can add a new commit that should include only 1
line code change in gram.y and a tiny doc update.

0001 - when a new partition is created, use the parent's replication
identity
0002 - add INHERIT | NO INHERIT

Hi Zhijie,

Thanks for your feedback and linked information. I think this patch is avoiding
the hard problem of “index” RI.

I think there are several design considerations for this proposal:

1) Since the index names can vary across different partitions, what
should be the expected behavior if a new partition cannot identify the
same replica identity key as the root partitioned table?

Index RI is skipped in this patch. INHERT only works for DEFAULT, FULL and
NONE.

I personally find skipping this part to be inconsistent, particularly given the
existing disparities among ALTER TABLE commands related to partitioned table handling.
Omitting this part introduces further inconsistency within the ALTER TABLE
REPLICA IDENTITY.

Fair point. I think one should summarize the previous discussions with
key differences and where the previous patch got stuck. Then, it would
be good to get some feedback from the people involved previously. If
there is an agreement that we can do INHERIT stuff for specific parts
then fine, otherwise, I think we need to address the index part as
well.

[1] /messages/by-id/201902041630.gpadougzab7v@alvherre.pgsql
[2] /messages/by-id/OSBPR01MB2982A2738F16722899A50082FECB0@OSBPR01MB2982.jpnprd01.prod.outlook.com

Sure, let me try to summarize the two discussions.

For [1], key participants included Álvaro Herrera (patch author), Dmitry Dolgov (reviewer/tester), Robert Haas, Simon Riggs, Michael Paquier, and Peter Eisentraut.
----

Brief summary with rough timeline:

* In 2019, Álvaro proposed a patch that essentially did the same thing as this patch. That patch also attempted to handle “index-based” replica identity, which my patch intentionally avoids.
* Dmitry tested the patch and reported issues related to attaching partitions, including index-related errors.
* Robert then pointed out that having REPLICA IDENTITY recurse was inconsistent with the behavior of other ALTER TABLE actions such as TABLESPACE, and emphasized that we should really try to think through *all* actions that might recurse to child partitions.
* Álvaro noted the use of `ALTER TABLE ONLY`, suggesting that WITHOUT ONLY could recurse to children, while ONLY would affect just the parent.
* Simon commented that recursing TABLESPACE changes could be problematic because they imply physical data rewrites.
* Robert listed several ALTER TABLE actions that lacked consistent recurse behavior (identity columns, triggers, CLUSTER, OWNER, TABLESPACE, CHECK constraints).
* This led to broader discussion about whether TABLESPACE/OWNER/etc. should recurse.
* Michael echoed support for having REPLICA IDENTITY recurse, controlled via ONLY.
* Peter pointed out that recursing ADD GENERATED AS IDENTITY / DROP IDENTITY may not be feasible.
* Álvaro wanted to proceed with the patch.
* Robert maintained that defining consistent semantics for all relevant ALTER TABLE actions should come first.

Overall, the blocker was an unresolved semantic disagreement, rather than a concrete technical objection to the patch itself. There appeared to be broad agreement that:
* New partitions should use the parent’s replica identity.
* ONLY could be used to control whether replica identity changes recurse.

However, the discussion about “semantic consistency” significantly broadened the scope, and there was no clear agreement on whether TABLESPACE/OWNER/etc. should recurse, which ultimately stalled the effort.

For [2], key participants included: Takayuki Tsunakawa (patch author), Bharath Rupireddy (reviewer), Álvaro Herrera, Michael Paquier, and Kyotaro Horiguchi.
----

Brief summary with rough timeline:

* In 2020, Takayuki proposed a patch intended to propagate ALTER TABLE SET LOGGED / UNLOGGED to partitions.
* Bharath reviewed the patch and raised a number of questions and edge cases.
* There were initial discussions about the patch mechanics and expected behavior.
* Álvaro then pointed out the strong relation to the earlier discussion in [1].
* The focus of the discussion shifted to the more fundamental question of “what is the parent?”:
* Takayuki viewed ALTER TABLE on a partitioned table as a convenient batch operation on existing partitions, with future partitions remaining independent.
* Álvaro, Michael, and Kyotaro argued that changing a property on the parent should define the default for future partitions as well.
* No clear agreement was reached on this semantic question, and the discussion expanded into broader concerns about consistency across ALTER TABLE actions.
* Takayuki withdrew the patch

Overall, [2] also fell under the umbrella of “semantic consistency”, the main discussion was not about replica identity itself.

Current situation:
----

* Resolving “semantic consistency” across ALTER TABLE actions in a single release appears to be the biggest challenge. However, addressing everything in one patch set does not seem realistic.
* The core question of “what is the parent?” from [1] remains central. That said, the discussion appeared to lean toward the view that future partitions should inherit properties from the parent.
* Different properties behave very differently. For example, propagating REPLICA IDENTITY is a metadata-only change and relatively safe, while propagating TABLESPACE implies physical data movement and is much riskier. As a result, each ALTER TABLE action may deserve its own discussion and patch set.
* The inconsistency is not limited to ALTER TABLE but also exists in CREATE TABLE behavior. For example, a new partition inherits TABLESPACE from the parent, but not REPLICA IDENTITY.
* “ALTER TABLE ONLY table_name” is commonly suggested as the mechanism to control whether changes should recurse to partitions.

How to proceed?
----

If we stop here, these inconsistencies will remain indefinitely, which I believe nobody really wants. With that in mind, I’d like to suggest a two-phase approach.

Phase 1: Document current behavior and set expectations

* Identify all ALTER TABLE actions involved in these inconsistencies.
* Update the ALTER TABLE and CREATE TABLE documentation to clearly describe the current behavior for partitioned tables, and (where appropriate) the intended or ideal behavior.
* Explicitly document the meaning of ONLY for partitioned tables, and note that some actions may behave differently, with details described in each action’s section.

Phase 2: Address actions incrementally

* Work on each ALTER TABLE action individually, recognizing that some may be straightforward while others require more design discussion, coding, and testing.
* With Phase 1 in place to set expectations, it may not be necessary to complete all actions in a single release.
* That said, it would still be desirable to keep the work bounded, for example within one or two major releases, to avoid long-term fragmentation.

I’ve included the participants from the previous discussions on CC, in case they want to comment further.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#15Chao Li
li.evan.chao@gmail.com
In reply to: Chao Li (#14)
Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

On Dec 29, 2025, at 16:24, Chao Li <li.evan.chao@gmail.com> wrote:

On Dec 22, 2025, at 14:26, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Dec 19, 2025 at 11:14 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Thursday, December 18, 2025 12:21 PM Chao Li <li.evan.chao@gmail.com> wrote:

On Dec 17, 2025, at 16:48, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>

wrote:

On Wednesday, December 17, 2025 3:56 PM Chao Li

<li.evan.chao@gmail.com> wrote:

Thank you both for all your advice. Here comes my first
implementation of INHERIT in the attached v2 patch.

On Wed, Dec 17, 2025 at 8:11 AM Euler Taveira

<mailto:euler@eulerto.com> wrote:

I wondering if we use INHERIT as default. The main advantage is
usability as Chao Li already mentioned. Is there any cases that
having a different replica identity from parent/partitioned table makes

sense?

We can leave this topic open for discussion. In my current
implementation, NO INHERIT is still the default. But if we decide to
switch the default, I can add a new commit that should include only 1
line code change in gram.y and a tiny doc update.

0001 - when a new partition is created, use the parent's replication
identity
0002 - add INHERIT | NO INHERIT

Hi Zhijie,

Thanks for your feedback and linked information. I think this patch is avoiding
the hard problem of “index” RI.

I think there are several design considerations for this proposal:

1) Since the index names can vary across different partitions, what
should be the expected behavior if a new partition cannot identify the
same replica identity key as the root partitioned table?

Index RI is skipped in this patch. INHERT only works for DEFAULT, FULL and
NONE.

I personally find skipping this part to be inconsistent, particularly given the
existing disparities among ALTER TABLE commands related to partitioned table handling.
Omitting this part introduces further inconsistency within the ALTER TABLE
REPLICA IDENTITY.

Fair point. I think one should summarize the previous discussions with
key differences and where the previous patch got stuck. Then, it would
be good to get some feedback from the people involved previously. If
there is an agreement that we can do INHERIT stuff for specific parts
then fine, otherwise, I think we need to address the index part as
well.

[1] /messages/by-id/201902041630.gpadougzab7v@alvherre.pgsql
[2] /messages/by-id/OSBPR01MB2982A2738F16722899A50082FECB0@OSBPR01MB2982.jpnprd01.prod.outlook.com

Sure, let me try to summarize the two discussions.

For [1], key participants included Álvaro Herrera (patch author), Dmitry Dolgov (reviewer/tester), Robert Haas, Simon Riggs, Michael Paquier, and Peter Eisentraut.
----

Brief summary with rough timeline:

* In 2019, Álvaro proposed a patch that essentially did the same thing as this patch. That patch also attempted to handle “index-based” replica identity, which my patch intentionally avoids.
* Dmitry tested the patch and reported issues related to attaching partitions, including index-related errors.
* Robert then pointed out that having REPLICA IDENTITY recurse was inconsistent with the behavior of other ALTER TABLE actions such as TABLESPACE, and emphasized that we should really try to think through *all* actions that might recurse to child partitions.
* Álvaro noted the use of `ALTER TABLE ONLY`, suggesting that WITHOUT ONLY could recurse to children, while ONLY would affect just the parent.
* Simon commented that recursing TABLESPACE changes could be problematic because they imply physical data rewrites.
* Robert listed several ALTER TABLE actions that lacked consistent recurse behavior (identity columns, triggers, CLUSTER, OWNER, TABLESPACE, CHECK constraints).
* This led to broader discussion about whether TABLESPACE/OWNER/etc. should recurse.
* Michael echoed support for having REPLICA IDENTITY recurse, controlled via ONLY.
* Peter pointed out that recursing ADD GENERATED AS IDENTITY / DROP IDENTITY may not be feasible.
* Álvaro wanted to proceed with the patch.
* Robert maintained that defining consistent semantics for all relevant ALTER TABLE actions should come first.

Overall, the blocker was an unresolved semantic disagreement, rather than a concrete technical objection to the patch itself. There appeared to be broad agreement that:
* New partitions should use the parent’s replica identity.
* ONLY could be used to control whether replica identity changes recurse.

However, the discussion about “semantic consistency” significantly broadened the scope, and there was no clear agreement on whether TABLESPACE/OWNER/etc. should recurse, which ultimately stalled the effort.

For [2], key participants included: Takayuki Tsunakawa (patch author), Bharath Rupireddy (reviewer), Álvaro Herrera, Michael Paquier, and Kyotaro Horiguchi.
----

Brief summary with rough timeline:

* In 2020, Takayuki proposed a patch intended to propagate ALTER TABLE SET LOGGED / UNLOGGED to partitions.
* Bharath reviewed the patch and raised a number of questions and edge cases.
* There were initial discussions about the patch mechanics and expected behavior.
* Álvaro then pointed out the strong relation to the earlier discussion in [1].
* The focus of the discussion shifted to the more fundamental question of “what is the parent?”:
* Takayuki viewed ALTER TABLE on a partitioned table as a convenient batch operation on existing partitions, with future partitions remaining independent.
* Álvaro, Michael, and Kyotaro argued that changing a property on the parent should define the default for future partitions as well.
* No clear agreement was reached on this semantic question, and the discussion expanded into broader concerns about consistency across ALTER TABLE actions.
* Takayuki withdrew the patch

Overall, [2] also fell under the umbrella of “semantic consistency”, the main discussion was not about replica identity itself.

Current situation:
----

* Resolving “semantic consistency” across ALTER TABLE actions in a single release appears to be the biggest challenge. However, addressing everything in one patch set does not seem realistic.
* The core question of “what is the parent?” from [1] remains central. That said, the discussion appeared to lean toward the view that future partitions should inherit properties from the parent.
* Different properties behave very differently. For example, propagating REPLICA IDENTITY is a metadata-only change and relatively safe, while propagating TABLESPACE implies physical data movement and is much riskier. As a result, each ALTER TABLE action may deserve its own discussion and patch set.
* The inconsistency is not limited to ALTER TABLE but also exists in CREATE TABLE behavior. For example, a new partition inherits TABLESPACE from the parent, but not REPLICA IDENTITY.
* “ALTER TABLE ONLY table_name” is commonly suggested as the mechanism to control whether changes should recurse to partitions.

How to proceed?
----

If we stop here, these inconsistencies will remain indefinitely, which I believe nobody really wants. With that in mind, I’d like to suggest a two-phase approach.

Phase 1: Document current behavior and set expectations

* Identify all ALTER TABLE actions involved in these inconsistencies.
* Update the ALTER TABLE and CREATE TABLE documentation to clearly describe the current behavior for partitioned tables, and (where appropriate) the intended or ideal behavior.
* Explicitly document the meaning of ONLY for partitioned tables, and note that some actions may behave differently, with details described in each action’s section.

Phase 2: Address actions incrementally

* Work on each ALTER TABLE action individually, recognizing that some may be straightforward while others require more design discussion, coding, and testing.
* With Phase 1 in place to set expectations, it may not be necessary to complete all actions in a single release.
* That said, it would still be desirable to keep the work bounded, for example within one or two major releases, to avoid long-term fragmentation.

I’ve included the participants from the previous discussions on CC, in case they want to comment further.

To move this patch forward, I’ve been working on a few related patches in parallel:

[1]/messages/by-id/CAEoWx2=mYhCfsnHaN96Qqwq5b0GVS2YgO3zpVqPPRd_iO52wRw@mail.gmail.com Enhances the “ALTER TABLE” documentation to clarify how subcommands behave on partitioned tables. In that discussion, I summarized the current behaviors of all subcommands into 15 categories.
[2]/messages/by-id/CAEoWx2=SLga-xH09Cq_PAvsHhQHrBK+V0vF821JKgzS=Bm0haA@mail.gmail.com Adds a NOTICE when ONLY is not specified but the action does not recurse to child partitions.
[3]/messages/by-id/CAEoWx2n9E6_zxPbqwMpaPuC1C_p4b3y635SjiDuCPSVm8GBjsA@mail.gmail.com Improves the header comments in tablecmds.c to better explain the meaning of “recurse” and “recursing”.

While working on [1]/messages/by-id/CAEoWx2=mYhCfsnHaN96Qqwq5b0GVS2YgO3zpVqPPRd_iO52wRw@mail.gmail.com, [2]/messages/by-id/CAEoWx2=SLga-xH09Cq_PAvsHhQHrBK+V0vF821JKgzS=Bm0haA@mail.gmail.com and [3]/messages/by-id/CAEoWx2n9E6_zxPbqwMpaPuC1C_p4b3y635SjiDuCPSVm8GBjsA@mail.gmail.com, I gained a deeper understanding of ALTER TABLE behavior. Based on that, v3 is a complete rework compared to v2. The behavior implemented by v3 is:

• Recursion is performed during the preparation phase.
• When ONLY is specified, no recursion occurs; without ONLY, the command recurses to all child partitions.
• All replica identity types are supported, including USING INDEX (which was not supported in v2).
• For non-index replica identities, the existing ATSimpleRecursion() infrastructure is reused; for index-based replica identity, the corresponding index name must be determined for each child partition.

In a previous discussion [4]/messages/by-id/201902041630.gpadougzab7v@alvherre.pgsql, Dmitry Dolgov pointed out a test case that resulted in a DEADLOCK. I ran that test against v3. The test still fails, but I no longer observe a deadlock; instead, the server now crashes during partition attachment. I will investigate this further.

While working on v3, I also noticed another issue that may be a bug. When creating an index on a partitioned table, the index is automatically created on all partitions. However, if a column type change causes an index rebuild, it appears that the rebuilt index may be missing from child partitions. I will verify this behavior and, if confirmed, discuss it in a separate thread.

At this point, I think v3 is in a reviewable state.

[1]: /messages/by-id/CAEoWx2=mYhCfsnHaN96Qqwq5b0GVS2YgO3zpVqPPRd_iO52wRw@mail.gmail.com
[2]: /messages/by-id/CAEoWx2=SLga-xH09Cq_PAvsHhQHrBK+V0vF821JKgzS=Bm0haA@mail.gmail.com
[3]: /messages/by-id/CAEoWx2n9E6_zxPbqwMpaPuC1C_p4b3y635SjiDuCPSVm8GBjsA@mail.gmail.com
[4]: /messages/by-id/201902041630.gpadougzab7v@alvherre.pgsql

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

v3-0001-Inherit-replica-identity-for-new-and-attached-par.patchapplication/octet-stream; name=v3-0001-Inherit-replica-identity-for-new-and-attached-par.patch; x-unix-mode=0644Download+121-3
v3-0002-ALTER-TABLE-recurse-REPLICA-IDENTITY-to-partition.patchapplication/octet-stream; name=v3-0002-ALTER-TABLE-recurse-REPLICA-IDENTITY-to-partition.patch; x-unix-mode=0644Download+184-8
#16Chao Li
li.evan.chao@gmail.com
In reply to: Chao Li (#15)
Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

On Jan 26, 2026, at 10:51, Chao Li <li.evan.chao@gmail.com> wrote:

While working on v3, I also noticed another issue that may be a bug. When creating an index on a partitioned table, the index is automatically created on all partitions. However, if a column type change causes an index rebuild, it appears that the rebuilt index may be missing from child partitions. I will verify this behavior and, if confirmed, discuss it in a separate thread.

I have verified the problem. The real problem is, after index rebuild, indisreplident flag of index on child partition will be lost, which should be a bug. I will start a new thread to give more details of the problem and propose a fix.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#17Michael Paquier
michael@paquier.xyz
In reply to: Chao Li (#16)
Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

On Mon, Jan 26, 2026 at 04:56:36PM +0800, Chao Li wrote:

I have verified the problem. The real problem is, after index
rebuild, indisreplident flag of index on child partition will be
lost, which should be a bug. I will start a new thread to give more
details of the problem and propose a fix.

That sounds like a real problem, yes. Thanks for starting a new
thread about that.
--
Michael

#18Chao Li
li.evan.chao@gmail.com
In reply to: Michael Paquier (#17)
Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

On Jan 27, 2026, at 14:56, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Jan 26, 2026 at 04:56:36PM +0800, Chao Li wrote:

I have verified the problem. The real problem is, after index
rebuild, indisreplident flag of index on child partition will be
lost, which should be a bug. I will start a new thread to give more
details of the problem and propose a fix.

That sounds like a real problem, yes. Thanks for starting a new
thread about that.
--
Michael

This is the new discussion for the bug:
/messages/by-id/DB533C25-C6BA-4C0F-8046-96168E9CDD72@gmail.com

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#19Chao Li
li.evan.chao@gmail.com
In reply to: Chao Li (#15)
Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

On Jan 26, 2026, at 10:51, Chao Li <li.evan.chao@gmail.com> wrote:

In a previous discussion [4], Dmitry Dolgov pointed out a test case that resulted in a DEADLOCK. I ran that test against v3. The test still fails, but I no longer observe a deadlock; instead, the server now crashes during partition attachment. I will investigate this further.

I tried to investigate the server crash yesterday, but I’m no longer able to reproduce it. From the record of the first crash I encountered, the call stack looked
like this:
```
TRAP: failed Assert("entry->data.lockmode == BUFFER_LOCK_UNLOCK"), File: "bufmgr.c", Line: 5908, PID: 47991
0 postgres 0x00000001013d9bb0 ExceptionalCondition + 216
1 postgres 0x0000000101129a80 BufferLockConditional + 88
2 postgres 0x0000000101129a04 ConditionalLockBuffer + 224
3 postgres 0x0000000100b8966c _bt_conditionallockbuf + 28
4 postgres 0x0000000100b88714 _bt_allocbuf + 128
5 postgres 0x0000000100b858d4 _bt_split + 1496
6 postgres 0x0000000100b82cec _bt_insertonpg + 1520
7 postgres 0x0000000100b81220 _bt_doinsert + 608
8 postgres 0x0000000100b9a008 btinsert + 120
9 postgres 0x0000000100b7a224 index_insert + 552
10 postgres 0x0000000100c5dd50 CatalogIndexInsert + 764
11 postgres 0x0000000100c5df60 CatalogTupleUpdate + 100
12 postgres 0x0000000100c7e608 ConstraintSetParentConstraint + 580
13 postgres 0x0000000100de6598 AttachPartitionEnsureIndexes + 1596
14 postgres 0x0000000100de5cac attachPartitionTable + 80
15 postgres 0x0000000100dd864c ATExecAttachPartition + 2520
16 postgres 0x0000000100dcb8e8 ATExecCmd + 4464
17 postgres 0x0000000100dc6054 ATRewriteCatalogs + 408
18 postgres 0x0000000100dbfa18 ATController + 256
19 postgres 0x0000000100dbf84c AlterTable + 96
20 postgres 0x00000001011a3508 ProcessUtilitySlow + 1704
21 postgres 0x00000001011a111c standard_ProcessUtility + 3504
22 postgres 0x00000001011a035c ProcessUtility + 360
23 postgres 0x000000010119fa10 PortalRunUtility + 216
24 postgres 0x000000010119eae0 PortalRunMulti + 688
25 postgres 0x000000010119e018 PortalRun + 788
26 postgres 0x0000000101198dcc exec_simple_query + 1380
27 postgres 0x0000000101197ee8 PostgresMain + 3244
28 postgres 0x000000010118f8d0 BackendInitialize + 0
29 postgres 0x0000000101061f3c postmaster_child_launch + 456
30 postgres 0x00000001010696c8 BackendStartup + 304
31 postgres 0x0000000101067564 ServerLoop + 372
32 postgres 0x0000000101066044 PostmasterMain + 6440
33 postgres 0x0000000100ee40a4 main + 924
34 dyld 0x000000019a36dd54 start + 7184
2026-01-26 09:52:41.240 CST [46845] LOG: client backend (PID 47991) was terminated by signal 6: Abort trap: 6
```

I noticed that the Assert in bufmgr.c was removed earlier today by commit 333f58637.

However, with the server crash no longer occurring, the DEADLOCK issue reappeared. After some investigation, I confirmed that the deadlock is not specific to this patch, I can consistently reproduce it with ATTACH PARTITION on the master branch. That suggests this is a more general problem.

I’ll start a new thread to follow up on the deadlock separately.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#20Chao Li
li.evan.chao@gmail.com
In reply to: Chao Li (#15)
Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions

On Jan 26, 2026, at 10:51, Chao Li <li.evan.chao@gmail.com> wrote:

On Dec 29, 2025, at 16:24, Chao Li <li.evan.chao@gmail.com> wrote:

On Dec 22, 2025, at 14:26, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Dec 19, 2025 at 11:14 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Thursday, December 18, 2025 12:21 PM Chao Li <li.evan.chao@gmail.com> wrote:

On Dec 17, 2025, at 16:48, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>

wrote:

On Wednesday, December 17, 2025 3:56 PM Chao Li

<li.evan.chao@gmail.com> wrote:

Thank you both for all your advice. Here comes my first
implementation of INHERIT in the attached v2 patch.

On Wed, Dec 17, 2025 at 8:11 AM Euler Taveira

<mailto:euler@eulerto.com> wrote:

I wondering if we use INHERIT as default. The main advantage is
usability as Chao Li already mentioned. Is there any cases that
having a different replica identity from parent/partitioned table makes

sense?

We can leave this topic open for discussion. In my current
implementation, NO INHERIT is still the default. But if we decide to
switch the default, I can add a new commit that should include only 1
line code change in gram.y and a tiny doc update.

0001 - when a new partition is created, use the parent's replication
identity
0002 - add INHERIT | NO INHERIT

Hi Zhijie,

Thanks for your feedback and linked information. I think this patch is avoiding
the hard problem of “index” RI.

I think there are several design considerations for this proposal:

1) Since the index names can vary across different partitions, what
should be the expected behavior if a new partition cannot identify the
same replica identity key as the root partitioned table?

Index RI is skipped in this patch. INHERT only works for DEFAULT, FULL and
NONE.

I personally find skipping this part to be inconsistent, particularly given the
existing disparities among ALTER TABLE commands related to partitioned table handling.
Omitting this part introduces further inconsistency within the ALTER TABLE
REPLICA IDENTITY.

Fair point. I think one should summarize the previous discussions with
key differences and where the previous patch got stuck. Then, it would
be good to get some feedback from the people involved previously. If
there is an agreement that we can do INHERIT stuff for specific parts
then fine, otherwise, I think we need to address the index part as
well.

[1] /messages/by-id/201902041630.gpadougzab7v@alvherre.pgsql
[2] /messages/by-id/OSBPR01MB2982A2738F16722899A50082FECB0@OSBPR01MB2982.jpnprd01.prod.outlook.com

Sure, let me try to summarize the two discussions.

For [1], key participants included Álvaro Herrera (patch author), Dmitry Dolgov (reviewer/tester), Robert Haas, Simon Riggs, Michael Paquier, and Peter Eisentraut.
----

Brief summary with rough timeline:

* In 2019, Álvaro proposed a patch that essentially did the same thing as this patch. That patch also attempted to handle “index-based” replica identity, which my patch intentionally avoids.
* Dmitry tested the patch and reported issues related to attaching partitions, including index-related errors.
* Robert then pointed out that having REPLICA IDENTITY recurse was inconsistent with the behavior of other ALTER TABLE actions such as TABLESPACE, and emphasized that we should really try to think through *all* actions that might recurse to child partitions.
* Álvaro noted the use of `ALTER TABLE ONLY`, suggesting that WITHOUT ONLY could recurse to children, while ONLY would affect just the parent.
* Simon commented that recursing TABLESPACE changes could be problematic because they imply physical data rewrites.
* Robert listed several ALTER TABLE actions that lacked consistent recurse behavior (identity columns, triggers, CLUSTER, OWNER, TABLESPACE, CHECK constraints).
* This led to broader discussion about whether TABLESPACE/OWNER/etc. should recurse.
* Michael echoed support for having REPLICA IDENTITY recurse, controlled via ONLY.
* Peter pointed out that recursing ADD GENERATED AS IDENTITY / DROP IDENTITY may not be feasible.
* Álvaro wanted to proceed with the patch.
* Robert maintained that defining consistent semantics for all relevant ALTER TABLE actions should come first.

Overall, the blocker was an unresolved semantic disagreement, rather than a concrete technical objection to the patch itself. There appeared to be broad agreement that:
* New partitions should use the parent’s replica identity.
* ONLY could be used to control whether replica identity changes recurse.

However, the discussion about “semantic consistency” significantly broadened the scope, and there was no clear agreement on whether TABLESPACE/OWNER/etc. should recurse, which ultimately stalled the effort.

For [2], key participants included: Takayuki Tsunakawa (patch author), Bharath Rupireddy (reviewer), Álvaro Herrera, Michael Paquier, and Kyotaro Horiguchi.
----

Brief summary with rough timeline:

* In 2020, Takayuki proposed a patch intended to propagate ALTER TABLE SET LOGGED / UNLOGGED to partitions.
* Bharath reviewed the patch and raised a number of questions and edge cases.
* There were initial discussions about the patch mechanics and expected behavior.
* Álvaro then pointed out the strong relation to the earlier discussion in [1].
* The focus of the discussion shifted to the more fundamental question of “what is the parent?”:
* Takayuki viewed ALTER TABLE on a partitioned table as a convenient batch operation on existing partitions, with future partitions remaining independent.
* Álvaro, Michael, and Kyotaro argued that changing a property on the parent should define the default for future partitions as well.
* No clear agreement was reached on this semantic question, and the discussion expanded into broader concerns about consistency across ALTER TABLE actions.
* Takayuki withdrew the patch

Overall, [2] also fell under the umbrella of “semantic consistency”, the main discussion was not about replica identity itself.

Current situation:
----

* Resolving “semantic consistency” across ALTER TABLE actions in a single release appears to be the biggest challenge. However, addressing everything in one patch set does not seem realistic.
* The core question of “what is the parent?” from [1] remains central. That said, the discussion appeared to lean toward the view that future partitions should inherit properties from the parent.
* Different properties behave very differently. For example, propagating REPLICA IDENTITY is a metadata-only change and relatively safe, while propagating TABLESPACE implies physical data movement and is much riskier. As a result, each ALTER TABLE action may deserve its own discussion and patch set.
* The inconsistency is not limited to ALTER TABLE but also exists in CREATE TABLE behavior. For example, a new partition inherits TABLESPACE from the parent, but not REPLICA IDENTITY.
* “ALTER TABLE ONLY table_name” is commonly suggested as the mechanism to control whether changes should recurse to partitions.

How to proceed?
----

If we stop here, these inconsistencies will remain indefinitely, which I believe nobody really wants. With that in mind, I’d like to suggest a two-phase approach.

Phase 1: Document current behavior and set expectations

* Identify all ALTER TABLE actions involved in these inconsistencies.
* Update the ALTER TABLE and CREATE TABLE documentation to clearly describe the current behavior for partitioned tables, and (where appropriate) the intended or ideal behavior.
* Explicitly document the meaning of ONLY for partitioned tables, and note that some actions may behave differently, with details described in each action’s section.

Phase 2: Address actions incrementally

* Work on each ALTER TABLE action individually, recognizing that some may be straightforward while others require more design discussion, coding, and testing.
* With Phase 1 in place to set expectations, it may not be necessary to complete all actions in a single release.
* That said, it would still be desirable to keep the work bounded, for example within one or two major releases, to avoid long-term fragmentation.

I’ve included the participants from the previous discussions on CC, in case they want to comment further.

To move this patch forward, I’ve been working on a few related patches in parallel:

• [1] Enhances the “ALTER TABLE” documentation to clarify how subcommands behave on partitioned tables. In that discussion, I summarized the current behaviors of all subcommands into 15 categories.
• [2] Adds a NOTICE when ONLY is not specified but the action does not recurse to child partitions.
• [3] Improves the header comments in tablecmds.c to better explain the meaning of “recurse” and “recursing”.

While working on [1], [2] and [3], I gained a deeper understanding of ALTER TABLE behavior. Based on that, v3 is a complete rework compared to v2. The behavior implemented by v3 is:

• Recursion is performed during the preparation phase.
• When ONLY is specified, no recursion occurs; without ONLY, the command recurses to all child partitions.
• All replica identity types are supported, including USING INDEX (which was not supported in v2).
• For non-index replica identities, the existing ATSimpleRecursion() infrastructure is reused; for index-based replica identity, the corresponding index name must be determined for each child partition.

In a previous discussion [4], Dmitry Dolgov pointed out a test case that resulted in a DEADLOCK. I ran that test against v3. The test still fails, but I no longer observe a deadlock; instead, the server now crashes during partition attachment. I will investigate this further.

While working on v3, I also noticed another issue that may be a bug. When creating an index on a partitioned table, the index is automatically created on all partitions. However, if a column type change causes an index rebuild, it appears that the rebuilt index may be missing from child partitions. I will verify this behavior and, if confirmed, discuss it in a separate thread.

At this point, I think v3 is in a reviewable state.

[1] /messages/by-id/CAEoWx2=mYhCfsnHaN96Qqwq5b0GVS2YgO3zpVqPPRd_iO52wRw@mail.gmail.com
[2] /messages/by-id/CAEoWx2=SLga-xH09Cq_PAvsHhQHrBK+V0vF821JKgzS=Bm0haA@mail.gmail.com
[3] /messages/by-id/CAEoWx2n9E6_zxPbqwMpaPuC1C_p4b3y635SjiDuCPSVm8GBjsA@mail.gmail.com
[4] /messages/by-id/201902041630.gpadougzab7v@alvherre.pgsql

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

<v3-0001-Inherit-replica-identity-for-new-and-attached-par.patch><v3-0002-ALTER-TABLE-recurse-REPLICA-IDENTITY-to-partition.patch>

Trying to move forward this patch, as the previous blockers have been addressed:

* ALTER TABLE “inconsistency” is clarified by the other patch, see [1]/messages/by-id/CAEoWx2mUiCYJEBuo5D74gi7pHfNz82b54oDHjZNtrXRFDnBPOg@mail.gmail.com.
* DEADLOCK has been addressed by the other patch, see [2]/messages/by-id/CFACA0EB-7E6F-4FAA-9ACE-1FC2226D7482@gmail.com.
* “index” replica identity is handled.

PFA v4: rebased and deleted an unneeded test case.

[1]: /messages/by-id/CAEoWx2mUiCYJEBuo5D74gi7pHfNz82b54oDHjZNtrXRFDnBPOg@mail.gmail.com
[2]: /messages/by-id/CFACA0EB-7E6F-4FAA-9ACE-1FC2226D7482@gmail.com

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

v4-0001-Inherit-replica-identity-for-new-and-attached-par.patchapplication/octet-stream; name=v4-0001-Inherit-replica-identity-for-new-and-attached-par.patch; x-unix-mode=0644Download+121-3
v4-0002-ALTER-TABLE-recurse-REPLICA-IDENTITY-to-partition.patchapplication/octet-stream; name=v4-0002-ALTER-TABLE-recurse-REPLICA-IDENTITY-to-partition.patch; x-unix-mode=0644Download+184-8