Ignore heap rewrites for materialized views in logical replication

Started by Euler Taveiraover 3 years ago7 messages
#1Euler Taveira
euler@eulerto.com
2 attachment(s)

Hi,

While investigating an internal report, I concluded that it is a bug. The
reproducible test case is simple (check 0002) and it consists of a FOR ALL
TABLES publication and a non-empty materialized view on publisher. After the
setup, if you refresh the MV, you got the following message on the subscriber:

ERROR: logical replication target relation "public.pg_temp_NNNNN" does not exist

That's because the commit 1a499c2520 (that fixes the heap rewrite for tables)
forgot to consider that materialized views can also create transient heaps and
they should also be skipped. The affected version is only 10 because 11
contains a different solution (commit 325f2ec555) that provides a proper fix
for the heap rewrite handling in logical decoding.

0001 is a patch to skip MV too. I attached 0002 to demonstrate the issue but it
doesn't seem appropriate to be included. The test was written to detect the
error and bail out. After this fix, it takes a considerable amount of time to
finish the test because it waits for a message that never arrives. Since nobody
reports this bug in 5 years and considering that version 10 will be EOL in 6
months, I don't think an additional test is crucial here.

--
Euler Taveira
EDB https://www.enterprisedb.com/

Attachments:

v1-0001-Ignore-heap-rewrites-for-materialized-views-in-lo.patchtext/x-patch; name=v1-0001-Ignore-heap-rewrites-for-materialized-views-in-lo.patchDownload
From 652efe45665d91f2f4ae865dba078fcaffdc0a17 Mon Sep 17 00:00:00 2001
From: Euler Taveira <euler.taveira@enterprisedb.com>
Date: Fri, 27 May 2022 11:35:27 -0300
Subject: [PATCH v1 1/2] Ignore heap rewrites for materialized views in logical
 replication

If you have a publication that specifies FOR ALL TABLES clause, a REFRESH
MATERIALIZED VIEW can break your setup with the following message

ERROR:  logical replication target relation "public.pg_temp_NNN" does not exist

Commit 1a499c2520 introduces a heuristic to prevent that transient heaps (due
to table rewrites) are included for a FOR ALL TABLES publication. However, it
forgot to exclude materialized views (heap rewrites occurs when you execute
REFRESH MATERIALIZED VIEW). Since materialized views are not supported in
logical decoding, it is appropriate to filter them too.

As explained in the commit message, a more robust solution was included in
version 11 (commit 325f2ec555), hence only version 10 is affected.
---
 src/backend/replication/pgoutput/pgoutput.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index f96fde3df0..e03ff4a11e 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -540,7 +540,9 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 				if (sscanf(relname, "pg_temp_%u%n", &u, &n) == 1 &&
 					relname[n] == '\0')
 				{
-					if (get_rel_relkind(u) == RELKIND_RELATION)
+					char	relkind = get_rel_relkind(u);
+
+					if (relkind == RELKIND_RELATION || relkind == RELKIND_MATVIEW)
 						break;
 				}
 			}
-- 
2.30.2

v1-0002-test-heap-rewrite-for-materialized-view-in-logica.patchtext/x-patch; name=v1-0002-test-heap-rewrite-for-materialized-view-in-logica.patchDownload
From 7ba28dea6800359ae631c1a132f8f6c6d418548b Mon Sep 17 00:00:00 2001
From: Euler Taveira <euler.taveira@enterprisedb.com>
Date: Fri, 27 May 2022 14:43:10 -0300
Subject: [PATCH v1 2/2] test: heap rewrite for materialized view in logical
 replication

---
 src/test/subscription/t/006_rewrite.pl | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/src/test/subscription/t/006_rewrite.pl b/src/test/subscription/t/006_rewrite.pl
index 5e3211aefa..e5e0fdabe7 100644
--- a/src/test/subscription/t/006_rewrite.pl
+++ b/src/test/subscription/t/006_rewrite.pl
@@ -3,7 +3,8 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Test::More tests => 3;
 
 sub wait_for_caught_up
 {
@@ -26,6 +27,13 @@ my $ddl = "CREATE TABLE test1 (a int, b text);";
 $node_publisher->safe_psql('postgres', $ddl);
 $node_subscriber->safe_psql('postgres', $ddl);
 
+$ddl = "CREATE TABLE test2 (a int, b text);";
+$node_publisher->safe_psql('postgres', $ddl);
+$node_subscriber->safe_psql('postgres', $ddl);
+
+$node_publisher->safe_psql('postgres', q{INSERT INTO test2 (a, b) VALUES (10, 'ten'), (20, 'twenty');});
+$node_publisher->safe_psql('postgres', 'CREATE MATERIALIZED VIEW test3 AS SELECT a, b FROM test2;');
+
 my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
 my $appname           = 'encoding_test';
 
@@ -69,5 +77,21 @@ is($node_subscriber->safe_psql('postgres', q{SELECT a, b, c FROM test1}),
 3|three|33),
    'data replicated to subscriber');
 
+$node_publisher->safe_psql('postgres', 'REFRESH MATERIALIZED VIEW test3;');
+
+my $max_attempts = 2 * $TestLib::timeout_default;
+my $attempts     = 0;
+my $errmsg = qr/logical replication target relation.*does not exist/;
+{
+	while ($attempts < $max_attempts)
+	{
+		last if (slurp_file($node_subscriber->logfile) =~ $errmsg);
+
+		usleep(100_000);
+		$attempts++;
+	}
+}
+is($attempts, $max_attempts, 'heap rewrite for materialized view on subscriber');
+
 $node_subscriber->stop;
 $node_publisher->stop;
-- 
2.30.2

#2Amit Kapila
amit.kapila16@gmail.com
In reply to: Euler Taveira (#1)
Re: Ignore heap rewrites for materialized views in logical replication

On Sat, May 28, 2022 at 2:44 AM Euler Taveira <euler@eulerto.com> wrote:

While investigating an internal report, I concluded that it is a bug. The
reproducible test case is simple (check 0002) and it consists of a FOR ALL
TABLES publication and a non-empty materialized view on publisher. After the
setup, if you refresh the MV, you got the following message on the subscriber:

ERROR: logical replication target relation "public.pg_temp_NNNNN" does not exist

That's because the commit 1a499c2520 (that fixes the heap rewrite for tables)
forgot to consider that materialized views can also create transient heaps and
they should also be skipped. The affected version is only 10 because 11
contains a different solution (commit 325f2ec555) that provides a proper fix
for the heap rewrite handling in logical decoding.

0001 is a patch to skip MV too.

I agree with your analysis and the fix looks correct to me.

I attached 0002 to demonstrate the issue but it
doesn't seem appropriate to be included. The test was written to detect the
error and bail out. After this fix, it takes a considerable amount of time to
finish the test because it waits for a message that never arrives.

Instead of waiting for an error, we can try to insert into a new table
created by the test case after the 'Refresh ..' command and wait for
the change to be replicated by using wait_for_caught_up.

Since nobody
reports this bug in 5 years and considering that version 10 will be EOL in 6
months, I don't think an additional test is crucial here.

Let's try to see if we can simplify the test so that it can be
committed along with a fix. If we are not able to find any reasonable
way then we can think of skipping it.

--
With Regards,
Amit Kapila.

#3Euler Taveira
euler@eulerto.com
In reply to: Amit Kapila (#2)
2 attachment(s)
Re: Ignore heap rewrites for materialized views in logical replication

On Sat, May 28, 2022, at 7:07 AM, Amit Kapila wrote:

I agree with your analysis and the fix looks correct to me.

Thanks for checking.

Instead of waiting for an error, we can try to insert into a new table
created by the test case after the 'Refresh ..' command and wait for
the change to be replicated by using wait_for_caught_up.

That's a good idea. [modifying the test...] I used the same table. Whenever the
new row arrives on the subscriber or it reads that error message, it bails out.

Let's try to see if we can simplify the test so that it can be
committed along with a fix. If we are not able to find any reasonable
way then we can think of skipping it.

The new test is attached.

--
Euler Taveira
EDB https://www.enterprisedb.com/

Attachments:

v2-0001-Ignore-heap-rewrites-for-materialized-views-in-lo.patchtext/x-patch; name=v2-0001-Ignore-heap-rewrites-for-materialized-views-in-lo.patchDownload
From fe13dc44f210b5f14b97bf9a29a242c2211ba700 Mon Sep 17 00:00:00 2001
From: Euler Taveira <euler.taveira@enterprisedb.com>
Date: Fri, 27 May 2022 11:35:27 -0300
Subject: [PATCH v2 1/2] Ignore heap rewrites for materialized views in logical
 replication

If you have a publication that specifies FOR ALL TABLES clause, a REFRESH
MATERIALIZED VIEW can break your setup with the following message

ERROR:  logical replication target relation "public.pg_temp_NNN" does not exist

Commit 1a499c2520 introduces a heuristic to prevent that transient heaps (due
to table rewrites) are included for a FOR ALL TABLES publication. However, it
forgot to exclude materialized views (heap rewrites occurs when you execute
REFRESH MATERIALIZED VIEW). Since materialized views are not supported in
logical decoding, it is appropriate to filter them too.

As explained in the commit message, a more robust solution was included in
version 11 (commit 325f2ec555), hence only version 10 is affected.
---
 src/backend/replication/pgoutput/pgoutput.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index f96fde3df0..e03ff4a11e 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -540,7 +540,9 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 				if (sscanf(relname, "pg_temp_%u%n", &u, &n) == 1 &&
 					relname[n] == '\0')
 				{
-					if (get_rel_relkind(u) == RELKIND_RELATION)
+					char	relkind = get_rel_relkind(u);
+
+					if (relkind == RELKIND_RELATION || relkind == RELKIND_MATVIEW)
 						break;
 				}
 			}
-- 
2.30.2

v2-0002-test-heap-rewrite-for-materialized-view-in-logica.patchtext/x-patch; name=v2-0002-test-heap-rewrite-for-materialized-view-in-logica.patchDownload
From 04b6b6566f4cf4ddc8cf3ddc51420e93c599024d Mon Sep 17 00:00:00 2001
From: Euler Taveira <euler.taveira@enterprisedb.com>
Date: Fri, 27 May 2022 14:43:10 -0300
Subject: [PATCH v2 2/2] test: heap rewrite for materialized view in logical
 replication

---
 src/test/subscription/t/006_rewrite.pl | 35 +++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/src/test/subscription/t/006_rewrite.pl b/src/test/subscription/t/006_rewrite.pl
index 5e3211aefa..27eb6a1cb7 100644
--- a/src/test/subscription/t/006_rewrite.pl
+++ b/src/test/subscription/t/006_rewrite.pl
@@ -3,7 +3,8 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Test::More tests => 3;
 
 sub wait_for_caught_up
 {
@@ -26,6 +27,13 @@ my $ddl = "CREATE TABLE test1 (a int, b text);";
 $node_publisher->safe_psql('postgres', $ddl);
 $node_subscriber->safe_psql('postgres', $ddl);
 
+$ddl = "CREATE TABLE test2 (a int, b text);";
+$node_publisher->safe_psql('postgres', $ddl);
+$node_subscriber->safe_psql('postgres', $ddl);
+
+$node_publisher->safe_psql('postgres', q{INSERT INTO test2 (a, b) VALUES (10, 'ten'), (20, 'twenty');});
+$node_publisher->safe_psql('postgres', 'CREATE MATERIALIZED VIEW test3 AS SELECT a, b FROM test2;');
+
 my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
 my $appname           = 'encoding_test';
 
@@ -69,5 +77,30 @@ is($node_subscriber->safe_psql('postgres', q{SELECT a, b, c FROM test1}),
 3|three|33),
    'data replicated to subscriber');
 
+$node_publisher->safe_psql('postgres', 'REFRESH MATERIALIZED VIEW test3;');
+
+# an additional row to check if the REFRESH worked
+$node_publisher->safe_psql('postgres', q{INSERT INTO test2 (a, b) VALUES (30, 'thirty');});
+
+my $max_attempts = 10 * $TestLib::timeout_default;
+my $attempts     = 0;
+my $ret          = 0;
+my $errmsg = qr/logical replication target relation.*does not exist/;
+while ($attempts < $max_attempts)
+{
+	# Succeed. Bail out.
+	if ($node_subscriber->safe_psql('postgres', 'SELECT COUNT(1) FROM test2') == 3) {
+		$ret = 1;
+		last;
+	}
+
+	# Failed. Bail out.
+	last if (slurp_file($node_subscriber->logfile) =~ $errmsg);
+
+	usleep(100_000);
+	$attempts++;
+}
+is($ret, 1, 'heap rewrite for materialized view on subscriber');
+
 $node_subscriber->stop;
 $node_publisher->stop;
-- 
2.30.2

#4Amit Kapila
amit.kapila16@gmail.com
In reply to: Euler Taveira (#3)
1 attachment(s)
Re: Ignore heap rewrites for materialized views in logical replication

On Tue, May 31, 2022 at 6:27 AM Euler Taveira <euler@eulerto.com> wrote:

On Sat, May 28, 2022, at 7:07 AM, Amit Kapila wrote:

I agree with your analysis and the fix looks correct to me.

Thanks for checking.

Instead of waiting for an error, we can try to insert into a new table
created by the test case after the 'Refresh ..' command and wait for
the change to be replicated by using wait_for_caught_up.

That's a good idea. [modifying the test...] I used the same table. Whenever the
new row arrives on the subscriber or it reads that error message, it bails out.

I think we don't need the retry logical to check error, a simple
wait_for_caught_up should be sufficient as we are doing in other
tests. See attached. I have slightly modified the commit message as
well. Kindly let me know what you think?

--
With Regards,
Amit Kapila.

Attachments:

v3-0001-Ignore-heap-rewrites-for-materialized-views-in-lo.patchapplication/octet-stream; name=v3-0001-Ignore-heap-rewrites-for-materialized-views-in-lo.patchDownload
From ae56052c01cc04d4a8d197a829ada515a2e9f69a Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Tue, 31 May 2022 19:29:27 +0530
Subject: [PATCH v3] Ignore heap rewrites for materialized views in logical
 replication.

If you have a publication that specifies FOR ALL TABLES clause, a REFRESH
MATERIALIZED VIEW can break your setup with the following message

ERROR: logical replication target relation "public.pg_temp_NNN" does not exist

Commit 1a499c2520 introduces a heuristic to skip table writes that look
like they are from a heap rewrite for a FOR ALL TABLES publication.
However, it forgot to exclude materialized views (heap rewrites occur when
you execute REFRESH MATERIALIZED VIEW). Since materialized views are not
supported in logical decoding, it is appropriate to filter them too.

As explained in the commit 1a499c2520, a more robust solution was included
in version 11 (commit 325f2ec555), hence only version 10 is affected.

Reported-by: Euler Taveira
Author: Euler Taveira
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/bc557ebe-92dc-4afa-b6bb-285a9eeaa614@www.fastmail.com
---
 src/backend/replication/pgoutput/pgoutput.c |  4 +++-
 src/test/subscription/t/006_rewrite.pl      | 20 +++++++++++++++++++-
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index f96fde3..e03ff4a 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -540,7 +540,9 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 				if (sscanf(relname, "pg_temp_%u%n", &u, &n) == 1 &&
 					relname[n] == '\0')
 				{
-					if (get_rel_relkind(u) == RELKIND_RELATION)
+					char	relkind = get_rel_relkind(u);
+
+					if (relkind == RELKIND_RELATION || relkind == RELKIND_MATVIEW)
 						break;
 				}
 			}
diff --git a/src/test/subscription/t/006_rewrite.pl b/src/test/subscription/t/006_rewrite.pl
index 5e3211a..20151b2 100644
--- a/src/test/subscription/t/006_rewrite.pl
+++ b/src/test/subscription/t/006_rewrite.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 2;
+use Test::More tests => 3;
 
 sub wait_for_caught_up
 {
@@ -26,6 +26,13 @@ my $ddl = "CREATE TABLE test1 (a int, b text);";
 $node_publisher->safe_psql('postgres', $ddl);
 $node_subscriber->safe_psql('postgres', $ddl);
 
+$ddl = "CREATE TABLE test2 (a int, b text);";
+$node_publisher->safe_psql('postgres', $ddl);
+$node_subscriber->safe_psql('postgres', $ddl);
+
+$node_publisher->safe_psql('postgres', q{INSERT INTO test2 (a, b) VALUES (10, 'ten'), (20, 'twenty');});
+$node_publisher->safe_psql('postgres', 'CREATE MATERIALIZED VIEW test3 AS SELECT a, b FROM test2;');
+
 my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
 my $appname           = 'encoding_test';
 
@@ -69,5 +76,16 @@ is($node_subscriber->safe_psql('postgres', q{SELECT a, b, c FROM test1}),
 3|three|33),
    'data replicated to subscriber');
 
+# Another DDL that causes a heap rewrite
+$node_publisher->safe_psql('postgres', 'REFRESH MATERIALIZED VIEW test3;');
+
+# an additional row to check if the REFRESH worked
+$node_publisher->safe_psql('postgres', q{INSERT INTO test2 (a, b) VALUES (30, 'thirty');});
+
+wait_for_caught_up($node_publisher, $appname);
+
+is($node_subscriber->safe_psql('postgres', q{SELECT COUNT(1) FROM test2}), 3,
+   'data replicated to subscriber after refresh');
+
 $node_subscriber->stop;
 $node_publisher->stop;
-- 
1.8.3.1

#5Euler Taveira
euler@eulerto.com
In reply to: Amit Kapila (#4)
Re: Ignore heap rewrites for materialized views in logical replication

On Tue, May 31, 2022, at 11:13 AM, Amit Kapila wrote:

I think we don't need the retry logical to check error, a simple
wait_for_caught_up should be sufficient as we are doing in other
tests. See attached. I have slightly modified the commit message as
well. Kindly let me know what you think?

Your modification will hang until the test timeout without the patch. That's
why I avoided to use wait_for_caught_up and used a loop for fast exit on success
or failure. I'm fine with a simple test case like you proposed.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#6Amit Kapila
amit.kapila16@gmail.com
In reply to: Euler Taveira (#5)
Re: Ignore heap rewrites for materialized views in logical replication

On Tue, May 31, 2022 at 8:28 PM Euler Taveira <euler@eulerto.com> wrote:

On Tue, May 31, 2022, at 11:13 AM, Amit Kapila wrote:

I think we don't need the retry logical to check error, a simple
wait_for_caught_up should be sufficient as we are doing in other
tests. See attached. I have slightly modified the commit message as
well. Kindly let me know what you think?

Your modification will hang until the test timeout without the patch. That's
why I avoided to use wait_for_caught_up and used a loop for fast exit on success
or failure.

Right, but that is true for other tests as well and we are not
expecting to face this/other errors. I think keeping it simple and
similar to other tests seems enough for this case.

I'm fine with a simple test case like you proposed.

Thanks, I'll push this in a day or two unless I see any other
suggestions/comments. Note to others: this is v10 fix only. As
mentioned by Euler in his initial email, this is not required from v11
onwards due to a different solution for this problem via commit
325f2ec555.

--
With Regards,
Amit Kapila.

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#6)
Re: Ignore heap rewrites for materialized views in logical replication

On Wed, Jun 1, 2022 at 10:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, May 31, 2022 at 8:28 PM Euler Taveira <euler@eulerto.com> wrote:

On Tue, May 31, 2022, at 11:13 AM, Amit Kapila wrote:

I think we don't need the retry logical to check error, a simple
wait_for_caught_up should be sufficient as we are doing in other
tests. See attached. I have slightly modified the commit message as
well. Kindly let me know what you think?

Your modification will hang until the test timeout without the patch. That's
why I avoided to use wait_for_caught_up and used a loop for fast exit on success
or failure.

Right, but that is true for other tests as well and we are not
expecting to face this/other errors. I think keeping it simple and
similar to other tests seems enough for this case.

I'm fine with a simple test case like you proposed.

Thanks, I'll push this in a day or two unless I see any other
suggestions/comments.

Pushed.

--
With Regards,
Amit Kapila.