BUG #17670: Logical Replication data may be lost on the subscription under certain scenarios

Started by PG Bug reporting formover 3 years ago7 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17670
Logged by: Yunhe Xu
Email address: xyh@nvn.xyz
PostgreSQL version: 14.4
Operating system: rhel 7
Description:

Logical Replication data may be lost on the subscription under certain
scenarios.
The following is review process.

* Logical Replication Information
t1=# select * from pg_publication_tables ;
pubname | schemaname | tablename
------------+------------+----------------
pub1 | public | t_test1
t2=# select srrelid::regclass,* from pg_subscription_rel ;
srrelid | srsubid | srrelid | srsubstate | srsublsn
---------+---------+---------+------------+-----------
t_test1 | 57551 | 41170 | r | 0/696E418
t1=# select application_name,state from pg_stat_replication where
application_name='test2_sub';
application_name | state
------------------+-----------
test2_sub | streaming

* Verify the status is normal
t1=# insert into t_test1 values (1);
INSERT 0 1
t1=# select * from t_test1;
id
----
1
t2=# select * from t_test1;
id
----
1

* Then delete this table on subscription :
t2=# alter table t_test1 rename TO t_test2;
ALTER TABLE

* Now,do DMLs
t1=# insert into t_test1 values (2);
INSERT 0 1
t1=# delete from t_test1 where id=1;
DELETE 1
t1=#
* The log gives some errors
2022-10-28 15:14:07.919 CST,,,2600,,635b813f.a28,2,,2022-10-28 15:14:07
CST,4/12,0,ERROR,55000,"logical replication target relation
""public.t_test1"" does not exist",,,,,,,,,"","logical replication
worker",,0

* OK,Let me create this table
t2=# create table t_test1 (id int PRIMARY KEY);
CREATE TABLE

* At this point, the log is no longer showing errors.But the incremental
data is lost.
t2=# select * from t_test1;
id
----
(0 rows)

t1=# insert into t_test1 values (3);
INSERT 0 1
t1=# insert into t_test1 values (4);
INSERT 0 1

t2=# select * from t_test1;
id
----
(0 rows)
t2=# select * from t_test2 ;
id
----
1
(1 row)

* Still no error.
* Now modify the previous table back

t2=# drop table t_test1 ;
DROP TABLE
t2=# alter table t_test2 rename TO t_test1;
ALTER TABLE

t1=# insert into t_test1 values (5);
INSERT 0 1
t1=# select * from t_test1;
id
----
2
3
4
5
(4 rows)

t2=# select * from t_test1;
id
----
1
5
(2 rows)

* The middle operation-delete id=1 and insert id=3,4-is loss.
*Please confirm if this is a BUG.

Thanks.
Yunhe Xu

#2Japin Li
japinli@hotmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #17670: Logical Replication data may be lost on the subscription under certain scenarios

On Fri, 28 Oct 2022 at 15:48, PG Bug reporting form <noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 17670
Logged by: Yunhe Xu
Email address: xyh@nvn.xyz
PostgreSQL version: 14.4
Operating system: rhel 7
Description:

Logical Replication data may be lost on the subscription under certain
scenarios.
The following is review process.

* Logical Replication Information
t1=# select * from pg_publication_tables ;
pubname | schemaname | tablename
------------+------------+----------------
pub1 | public | t_test1
t2=# select srrelid::regclass,* from pg_subscription_rel ;
srrelid | srsubid | srrelid | srsubstate | srsublsn
---------+---------+---------+------------+-----------
t_test1 | 57551 | 41170 | r | 0/696E418
t1=# select application_name,state from pg_stat_replication where
application_name='test2_sub';
application_name | state
------------------+-----------
test2_sub | streaming

* Verify the status is normal
t1=# insert into t_test1 values (1);
INSERT 0 1
t1=# select * from t_test1;
id
----
1
t2=# select * from t_test1;
id
----
1

* Then delete this table on subscription :
t2=# alter table t_test1 rename TO t_test2;
ALTER TABLE

* Now,do DMLs
t1=# insert into t_test1 values (2);
INSERT 0 1
t1=# delete from t_test1 where id=1;
DELETE 1
t1=#
* The log gives some errors
2022-10-28 15:14:07.919 CST,,,2600,,635b813f.a28,2,,2022-10-28 15:14:07
CST,4/12,0,ERROR,55000,"logical replication target relation
""public.t_test1"" does not exist",,,,,,,,,"","logical replication
worker",,0

* OK,Let me create this table
t2=# create table t_test1 (id int PRIMARY KEY);
CREATE TABLE

* At this point, the log is no longer showing errors.But the incremental
data is lost.
t2=# select * from t_test1;
id
----
(0 rows)

t1=# insert into t_test1 values (3);
INSERT 0 1
t1=# insert into t_test1 values (4);
INSERT 0 1

t2=# select * from t_test1;
id
----
(0 rows)
t2=# select * from t_test2 ;
id
----
1
(1 row)

* Still no error.
* Now modify the previous table back

t2=# drop table t_test1 ;
DROP TABLE
t2=# alter table t_test2 rename TO t_test1;
ALTER TABLE

t1=# insert into t_test1 values (5);
INSERT 0 1
t1=# select * from t_test1;
id
----
2
3
4
5
(4 rows)

t2=# select * from t_test1;
id
----
1
5
(2 rows)

* The middle operation-delete id=1 and insert id=3,4-is loss.
*Please confirm if this is a BUG.

I can reproduce it on HEAD. Here is my analysis:

When we rename the t_test1 to t_test2, the subscriber doesn't have a
table matched publication table name, so the logical replication throw
an error. Then, we create a new t_test1 on subscriber, the logical
replication worker can find the table that matches the published name.
However, the pg_subscription_rel hasn't updated, and when we try to get
the subscription state through GetSubscriptionRelState(), it cannot find
a matched subscription relation mapping, so the WAL cannot apply to the
new table t_test1.

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#3Dilip Kumar
dilipbalaut@gmail.com
In reply to: Japin Li (#2)
Re: BUG #17670: Logical Replication data may be lost on the subscription under certain scenarios

On Fri, Oct 28, 2022 at 8:07 PM Japin Li <japinli@hotmail.com> wrote:

I can reproduce it on HEAD. Here is my analysis:

When we rename the t_test1 to t_test2, the subscriber doesn't have a
table matched publication table name, so the logical replication throw
an error. Then, we create a new t_test1 on subscriber, the logical
replication worker can find the table that matches the published name.
However, the pg_subscription_rel hasn't updated, and when we try to get
the subscription state through GetSubscriptionRelState(), it cannot find
a matched subscription relation mapping, so the WAL cannot apply to the
new table t_test1.

I am just wondering if it is correct behavior to allow renaming the
table used by a subscription, or should there be some dependency?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#4Japin Li
japinli@hotmail.com
In reply to: Dilip Kumar (#3)
Re: BUG #17670: Logical Replication data may be lost on the subscription under certain scenarios

On Sun, 30 Oct 2022 at 14:39, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Fri, Oct 28, 2022 at 8:07 PM Japin Li <japinli@hotmail.com> wrote:

I can reproduce it on HEAD. Here is my analysis:

When we rename the t_test1 to t_test2, the subscriber doesn't have a
table matched publication table name, so the logical replication throw
an error. Then, we create a new t_test1 on subscriber, the logical
replication worker can find the table that matches the published name.
However, the pg_subscription_rel hasn't updated, and when we try to get
the subscription state through GetSubscriptionRelState(), it cannot find
a matched subscription relation mapping, so the WAL cannot apply to the
new table t_test1.

I am just wondering if it is correct behavior to allow renaming the
table used by a subscription, or should there be some dependency?

Maybe we can add a dependency to make the user know what they are doing.
I also want to know when we should add a dependency?

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#5Dilip Kumar
dilipbalaut@gmail.com
In reply to: Japin Li (#4)
Re: BUG #17670: Logical Replication data may be lost on the subscription under certain scenarios

On Sun, Oct 30, 2022 at 7:22 PM Japin Li <japinli@hotmail.com> wrote:

I am just wondering if it is correct behavior to allow renaming the
table used by a subscription, or should there be some dependency?

Maybe we can add a dependency to make the user know what they are doing.
I also want to know when we should add a dependency?

I haven't thought about it in deep, but I think whenever we add a new
entry to pg_subscription_rel, that time maybe we could add a
dependency on the respective subscription entry IMHO. But I am just
wondering is there any reason why we are not already having such
dependency?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#6Amit Kapila
amit.kapila16@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #17670: Logical Replication data may be lost on the subscription under certain scenarios

On Fri, Oct 28, 2022 at 2:51 PM PG Bug reporting form
<noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 17670
Logged by: Yunhe Xu
Email address: xyh@nvn.xyz
PostgreSQL version: 14.4
Operating system: rhel 7
Description:

Logical Replication data may be lost on the subscription under certain
scenarios.
The following is review process.

* Logical Replication Information
t1=# select * from pg_publication_tables ;
pubname | schemaname | tablename
------------+------------+----------------
pub1 | public | t_test1
t2=# select srrelid::regclass,* from pg_subscription_rel ;
srrelid | srsubid | srrelid | srsubstate | srsublsn
---------+---------+---------+------------+-----------
t_test1 | 57551 | 41170 | r | 0/696E418
t1=# select application_name,state from pg_stat_replication where
application_name='test2_sub';
application_name | state
------------------+-----------
test2_sub | streaming

* Verify the status is normal
t1=# insert into t_test1 values (1);
INSERT 0 1
t1=# select * from t_test1;
id
----
1
t2=# select * from t_test1;
id
----
1

* Then delete this table on subscription :
t2=# alter table t_test1 rename TO t_test2;
ALTER TABLE

* Now,do DMLs
t1=# insert into t_test1 values (2);
INSERT 0 1
t1=# delete from t_test1 where id=1;
DELETE 1
t1=#
* The log gives some errors
2022-10-28 15:14:07.919 CST,,,2600,,635b813f.a28,2,,2022-10-28 15:14:07
CST,4/12,0,ERROR,55000,"logical replication target relation
""public.t_test1"" does not exist",,,,,,,,,"","logical replication
worker",,0

* OK,Let me create this table
t2=# create table t_test1 (id int PRIMARY KEY);
CREATE TABLE

* At this point, the log is no longer showing errors.But the incremental
data is lost.

If you do "Alter Subscription test2_sub Refresh Publication;" then the
data should be synced.

--
With Regards,
Amit Kapila.

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#5)
Re: BUG #17670: Logical Replication data may be lost on the subscription under certain scenarios

On Mon, Oct 31, 2022 at 2:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sun, Oct 30, 2022 at 7:22 PM Japin Li <japinli@hotmail.com> wrote:

I am just wondering if it is correct behavior to allow renaming the
table used by a subscription, or should there be some dependency?

Maybe we can add a dependency to make the user know what they are doing.
I also want to know when we should add a dependency?

I haven't thought about it in deep, but I think whenever we add a new
entry to pg_subscription_rel, that time maybe we could add a
dependency on the respective subscription entry IMHO. But I am just
wondering is there any reason why we are not already having such
dependency?

As per our current implementation, users need to be careful with DDL
operations on tables involved in subscriptions to ensure smooth
replication. In the example shared in this thread, say if the user
wants to rename tables on both publisher and subscriber then the case
reported would have worked without hassles, and disallowing Alter
Table would have unnecessarily made that operation difficult.
Similarly, say if the user wants to add a primary key on the table on
both publisher and subscriber, then it would be quite easy with the
current system but adding dependencies could be additional work for
the user and some people can complain about the same.

--
With Regards,
Amit Kapila.