repack: fix a bug to reject deferrable primary key fallback for concurrent mode

Started by Chao Li15 days ago10 messageshackers
Jump to latest
#1Chao Li
li.evan.chao@gmail.com

Hi,

I am continuing to test REPACK, and I found another issue.

In check_concurrent_repack_requirements(), if a table has no replica identity index, the code falls back to using the primary key if one exists. The problem is that a deferrable primary key cannot be used for this purpose. WAL generation does not consider a deferrable primary key to be a replica identity, so concurrent mode may not receive enough old tuple information to replay concurrent changes.

I tested this with the following procedure.

1 - Create a table
```
create table t (id int, v text, primary key (id) deferrable initially deferred);
insert into t values (1, 'a');
```

2 - Attach a debugger to session 1's backend process. I used vscode. Add a breakpoint at the first process_concurrent_changes() call. This blocks the REPACK process and gives session 2 time to issue a DELETE.

3 - In session 1, issue a repack, it will stop at the breakpoint
```
repack (concurrently) t;
```

4 - In session 2
```
delete from t where id=1;
```

5 - Detach session 1 from the debugger, so that repack continues and tries to re-apply the delete from session 2.

6 - repack fails with:
```
evantest=# repack (concurrently) t;
ERROR: incomplete delete info
CONTEXT: slot "repack_96468", output plugin "pgrepack", in the change callback, associated LSN 0/2A5717F0
REPACK decoding worker
```

The error comes from this code in pgrepack.c:
```
case REORDER_BUFFER_CHANGE_DELETE:
{
HeapTuple oldtuple;

oldtuple = change->data.tp.oldtuple;

if (oldtuple == NULL)
elog(ERROR, "incomplete delete info");

repack_store_change(ctx, relation, CHANGE_DELETE, oldtuple);
}
break;
```

The root cause is that repack.c assumes rel->rd_pkindex is usable as an identity index, but logical decoding does not treat a deferrable primary key as replica identity. As a result, the decoding worker may not get the old tuple needed to re-apply the delete.

To fix the problem, I think we should just not fall back to deferrable primary key in the first place. See the attached patch.

With this patch, repack will quickly for the test:
```
evantest=# repack (concurrently) t;
ERROR: cannot process relation "t"
HINT: Relation "t" has a deferrable primary key.
```

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

v1-0001-Reject-deferrable-primary-key-fallback-in-REPACK-.patchapplication/octet-stream; name=v1-0001-Reject-deferrable-primary-key-fallback-in-REPACK-.patch; x-unix-mode=0644Download+14-1
#2Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Chao Li (#1)
RE: repack: fix a bug to reject deferrable primary key fallback for concurrent mode

On Friday, April 17, 2026 11:35 AM Chao Li <li.evan.chao@gmail.com> wrote:

I am continuing to test REPACK, and I found another issue.

In check_concurrent_repack_requirements(), if a table has no replica identity
index, the code falls back to using the primary key if one exists. The problem is
that a deferrable primary key cannot be used for this purpose. WAL
generation does not consider a deferrable primary key to be a replica identity,
so concurrent mode may not receive enough old tuple information to replay
concurrent changes.

I tested this with the following procedure.

...

With this patch, repack will quickly for the test:
```
evantest=# repack (concurrently) t;
ERROR: cannot process relation "t"
HINT: Relation "t" has a deferrable primary key.
```

Good catch!

I think we can use the existing API to identify the index, for example:

diff --git a/src/backend/commands/repack.c b/src/backend/commands/repack.c
index 67364cc60e3..cc30236f493 100644
--- a/src/backend/commands/repack.c
+++ b/src/backend/commands/repack.c
@@ -62,6 +62,7 @@
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
 #include "pgstat.h"
+#include "replication/logicalrelation.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
@@ -924,9 +925,7 @@ check_concurrent_repack_requirements(Relation rel, Oid *ident_idx_p)
 	 * repack work with a FULL replica identity; however that requires more
 	 * work and is not implemented yet.
 	 */
-	ident_idx = RelationGetReplicaIndex(rel);
-	if (!OidIsValid(ident_idx) && OidIsValid(rel->rd_pkindex))
-		ident_idx = rel->rd_pkindex;
+	ident_idx = GetRelationIdentityOrPK();
 	if (!OidIsValid(ident_idx))
 		ereport(ERROR,
 				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

And it would be better to add a test for this.

Best Regards,
Hou zj

#3Chao Li
li.evan.chao@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#2)
Re: repack: fix a bug to reject deferrable primary key fallback for concurrent mode

On Apr 17, 2026, at 11:46, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:

On Friday, April 17, 2026 11:35 AM Chao Li <li.evan.chao@gmail.com> wrote:

I am continuing to test REPACK, and I found another issue.

In check_concurrent_repack_requirements(), if a table has no replica identity
index, the code falls back to using the primary key if one exists. The problem is
that a deferrable primary key cannot be used for this purpose. WAL
generation does not consider a deferrable primary key to be a replica identity,
so concurrent mode may not receive enough old tuple information to replay
concurrent changes.

I tested this with the following procedure.

...

With this patch, repack will quickly for the test:
```
evantest=# repack (concurrently) t;
ERROR: cannot process relation "t"
HINT: Relation "t" has a deferrable primary key.
```

Good catch!

I think we can use the existing API to identify the index, for example:

diff --git a/src/backend/commands/repack.c b/src/backend/commands/repack.c
index 67364cc60e3..cc30236f493 100644
--- a/src/backend/commands/repack.c
+++ b/src/backend/commands/repack.c
@@ -62,6 +62,7 @@
#include "miscadmin.h"
#include "optimizer/optimizer.h"
#include "pgstat.h"
+#include "replication/logicalrelation.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
@@ -924,9 +925,7 @@ check_concurrent_repack_requirements(Relation rel, Oid *ident_idx_p)
* repack work with a FULL replica identity; however that requires more
* work and is not implemented yet.
*/
- ident_idx = RelationGetReplicaIndex(rel);
- if (!OidIsValid(ident_idx) && OidIsValid(rel->rd_pkindex))
- ident_idx = rel->rd_pkindex;
+ ident_idx = GetRelationIdentityOrPK();
if (!OidIsValid(ident_idx))
ereport(ERROR,
errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

Thanks for pointing out GetRelationIdentityOrPK(). Looks like it wraps RelationGetReplicaIndex and excludes deferrable primary key.

Switched to use GetRelationIdentityOrPK() in v2.

And it would be better to add a test for this.

I didn’t add the test because I saw there was only 1 test case of checking catalog table in cluster.sql for repack(concurrently). As you asked, I added tests for all checks of check_concurrent_repack_requirements().

PFA v2.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

v2-0001-Reject-deferrable-primary-key-fallback-in-REPACK-.patchapplication/octet-stream; name=v2-0001-Reject-deferrable-primary-key-fallback-in-REPACK-.patch; x-unix-mode=0644Download+88-9
#4Antonin Houska
ah@cybertec.at
In reply to: Chao Li (#1)
Re: repack: fix a bug to reject deferrable primary key fallback for concurrent mode

Chao Li <li.evan.chao@gmail.com> wrote:

I am continuing to test REPACK, and I found another issue.

In check_concurrent_repack_requirements(), if a table has no replica identity index, the code falls back to using the primary key if one exists. The problem is that a deferrable primary key cannot be used for this purpose. WAL generation does not consider a deferrable primary key to be a replica identity, so concurrent mode may not receive enough old tuple information to replay concurrent changes.

Thanks for finding it, this is certainly a problem.

I'm just thinking if it's worth a separate error message.
RelationGetIndexList() just ignores the deferrable PK

if (replident == REPLICA_IDENTITY_DEFAULT && OidIsValid(pkeyIndex) && !pkdeferrable)
relation->rd_replidindex = pkeyIndex;

and if there's no other suitable index, the result is that there is no
identity index for the table. So the change attached here should be consistent
with this approach.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachments:

fix_identity_check.difftext/x-diffDownload+2-1
#5Chao Li
li.evan.chao@gmail.com
In reply to: Antonin Houska (#4)
Re: repack: fix a bug to reject deferrable primary key fallback for concurrent mode

On Apr 20, 2026, at 22:52, Antonin Houska <ah@cybertec.at> wrote:

Chao Li <li.evan.chao@gmail.com> wrote:

I am continuing to test REPACK, and I found another issue.

In check_concurrent_repack_requirements(), if a table has no replica identity index, the code falls back to using the primary key if one exists. The problem is that a deferrable primary key cannot be used for this purpose. WAL generation does not consider a deferrable primary key to be a replica identity, so concurrent mode may not receive enough old tuple information to replay concurrent changes.

Thanks for finding it, this is certainly a problem.

I'm just thinking if it's worth a separate error message.
RelationGetIndexList() just ignores the deferrable PK

if (replident == REPLICA_IDENTITY_DEFAULT && OidIsValid(pkeyIndex) && !pkdeferrable)
relation->rd_replidindex = pkeyIndex;

and if there's no other suitable index, the result is that there is no
identity index for the table. So the change attached here should be consistent
with this approach.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Hi Antonin,

Thanks for your review. I guess you read the v1 patch. In v2, I have switched to use GetRelationIdentityOrPK() that Zhijie suggested, which has covered RelationGetIndexList() and all checks, so that code is simplified, and there is no longer a separate error message.

As CFBot asked for, rebased to v3, nothing changed from v2. Can you please take a look at v3 again?

BTW, could you please also take a look at [1]/messages/by-id/06F150E0-FDE9-4FC6-9EEF-79A759430471@gmail.com that is a comment-only change for repack.

[1]: /messages/by-id/06F150E0-FDE9-4FC6-9EEF-79A759430471@gmail.com

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

v3-0001-Reject-deferrable-primary-key-fallback-in-REPACK-.patchapplication/octet-stream; name=v3-0001-Reject-deferrable-primary-key-fallback-in-REPACK-.patch; x-unix-mode=0644Download+88-9
#6Antonin Houska
ah@cybertec.at
In reply to: Chao Li (#5)
Re: repack: fix a bug to reject deferrable primary key fallback for concurrent mode

Chao Li <li.evan.chao@gmail.com> wrote:

On Apr 20, 2026, at 22:52, Antonin Houska <ah@cybertec.at> wrote:

I'm just thinking if it's worth a separate error message.
RelationGetIndexList() just ignores the deferrable PK

if (replident == REPLICA_IDENTITY_DEFAULT && OidIsValid(pkeyIndex) && !pkdeferrable)
relation->rd_replidindex = pkeyIndex;

and if there's no other suitable index, the result is that there is no
identity index for the table. So the change attached here should be consistent
with this approach.

Thanks for your review. I guess you read the v1 patch. In v2, I have switched to use GetRelationIdentityOrPK() that Zhijie suggested, which has covered RelationGetIndexList() and all checks, so that code is simplified, and there is no longer a separate error message.

Yes, this looks like the best approach. Sorry for missing v2.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#7Yuchen Li
liyuchen_xyz@163.com
In reply to: Antonin Houska (#6)
Re: repack: fix a bug to reject deferrable primary key fallback for concurrent mode

On 4/21/2026 2:09 PM, Antonin Houska wrote:

Chao Li <li.evan.chao@gmail.com> wrote:

On Apr 20, 2026, at 22:52, Antonin Houska <ah@cybertec.at> wrote:

I'm just thinking if it's worth a separate error message.
RelationGetIndexList() just ignores the deferrable PK

if (replident == REPLICA_IDENTITY_DEFAULT && OidIsValid(pkeyIndex) && !pkdeferrable)
relation->rd_replidindex = pkeyIndex;

and if there's no other suitable index, the result is that there is no
identity index for the table. So the change attached here should be consistent
with this approach.

Thanks for your review. I guess you read the v1 patch. In v2, I have switched to use GetRelationIdentityOrPK() that Zhijie suggested, which has covered RelationGetIndexList() and all checks, so that code is simplified, and there is no longer a separate error message.

Yes, this looks like the best approach. Sorry for missing v2.

The patch looks good to me. Using GetRelationIdentityOrPK() makes the
check match the intended replica identity semantics more closely, and
the added regression coverage looks useful.

Regards,
Yuchen Li

#8Chao Li
li.evan.chao@gmail.com
In reply to: Antonin Houska (#6)
Re: repack: fix a bug to reject deferrable primary key fallback for concurrent mode

On Apr 21, 2026, at 14:09, Antonin Houska <ah@cybertec.at> wrote:

Chao Li <li.evan.chao@gmail.com> wrote:

On Apr 20, 2026, at 22:52, Antonin Houska <ah@cybertec.at> wrote:

I'm just thinking if it's worth a separate error message.
RelationGetIndexList() just ignores the deferrable PK

if (replident == REPLICA_IDENTITY_DEFAULT && OidIsValid(pkeyIndex) && !pkdeferrable)
relation->rd_replidindex = pkeyIndex;

and if there's no other suitable index, the result is that there is no
identity index for the table. So the change attached here should be consistent
with this approach.

Thanks for your review. I guess you read the v1 patch. In v2, I have switched to use GetRelationIdentityOrPK() that Zhijie suggested, which has covered RelationGetIndexList() and all checks, so that code is simplified, and there is no longer a separate error message.

Yes, this looks like the best approach. Sorry for missing v2.

Thanks for your reviewing and confirming.

Rebased to v4 as the CF reported a conflict.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

v4-0001-Reject-deferrable-primary-key-fallback-in-REPACK-.patchapplication/octet-stream; name=v4-0001-Reject-deferrable-primary-key-fallback-in-REPACK-.patch; x-unix-mode=0644Download+97-9
#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Chao Li (#8)
Re: repack: fix a bug to reject deferrable primary key fallback for concurrent mode

On 2026-Apr-26, Chao Li wrote:

Thanks for your reviewing and confirming.

Rebased to v4 as the CF reported a conflict.

Thanks, pushed it now.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"La grandeza es una experiencia transitoria. Nunca es consistente.
Depende en gran parte de la imaginación humana creadora de mitos"
(Irulan)

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#9)
Re: repack: fix a bug to reject deferrable primary key fallback for concurrent mode

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Thanks, pushed it now.

So, no sooner had 4b2aa4b39 removed tests of REPACK CONCURRENTLY
from the core regression tests than 832e220d9 put one back.
BF member thorntail is broken again.

regards, tom lane