Introduce wait_for_subscription_sync for TAP tests

Started by Masahiko Sawadaover 3 years ago25 messageshackers
Jump to latest
#1Masahiko Sawada
sawada.mshk@gmail.com

Hi,

In tap tests for logical replication, we have the following code in many places:

$node_publisher->wait_for_catchup('tap_sub');
my $synced_query =
"SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT
IN ('r', 's');";
$node_subscriber->poll_query_until('postgres', $synced_query)
or die "Timed out while waiting for subscriber to synchronize data";

Also, we sometime forgot to check either one, like we fixed in commit
1f50918a6fb02207d151e7cb4aae4c36de9d827c.

I think we can have a new function to wait for all subscriptions to
synchronize data. The attached patch introduce a new function
wait_for_subscription_sync(). With this function, we can replace the
above code with this one function as follows:

$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v1-0001-Introduce-wait_for_subscription_sync-for-TAP-test.patchapplication/octet-stream; name=v1-0001-Introduce-wait_for_subscription_sync-for-TAP-test.patchDownload+111-242
#2Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#1)
Re: Introduce wait_for_subscription_sync for TAP tests

On Tue, Jul 26, 2022 at 7:07 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi,

In tap tests for logical replication, we have the following code in many places:

$node_publisher->wait_for_catchup('tap_sub');
my $synced_query =
"SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT
IN ('r', 's');";
$node_subscriber->poll_query_until('postgres', $synced_query)
or die "Timed out while waiting for subscriber to synchronize data";

Also, we sometime forgot to check either one, like we fixed in commit
1f50918a6fb02207d151e7cb4aae4c36de9d827c.

I think we can have a new function to wait for all subscriptions to
synchronize data. The attached patch introduce a new function
wait_for_subscription_sync(). With this function, we can replace the
above code with this one function as follows:

$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');

+1. This reduces quite some code in various tests and will make it
easier to write future tests.

Few comments/questions:
====================
1.
-$node_publisher->wait_for_catchup('mysub1');
-
-# Also wait for initial table sync to finish.
-my $synced_query =
-  "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT
IN ('r', 's');";
-$node_subscriber->poll_query_until('postgres', $synced_query)
-  or die "Timed out while waiting for subscriber to synchronize data";
-
 # Also wait for initial table sync to finish.
-$node_subscriber->poll_query_until('postgres', $synced_query)
-  or die "Timed out while waiting for subscriber to synchronize data";
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'mysub1');

It seems to me without your patch there is an extra poll in the above
test. If so, we can probably remove that in a separate patch?

2.
+    # wait for the replication to catchup if required.
+    if (defined($publisher))
+    {
+ croak 'subscription name must be specified' unless defined($subname);
+ $publisher->wait_for_catchup($subname, 'replay');
+    }
+
+    # then, wait for all table states to be ready.
+    print "Waiting for all subscriptions in \"$name\" to synchronize data\n";
+    my $query = qq[SELECT count(1) = 0
+     FROM pg_subscription_rel
+     WHERE srsubstate NOT IN ('r', 's');];
+    $self->poll_query_until($dbname, $query)
+ or croak "timed out waiting for subscriber to synchronize data";

In the tests, I noticed that a few places did wait_for_catchup after
the subscription check, and at other places, we did that check before
as you have it here. Ideally, I think wait_for_catchup should be after
confirming the initial sync is over as without initial sync, the
publisher node won't be completely in sync with the subscriber. What
do you think?

3. In the code quoted in the previous point, why did you pass the
second parameter as 'replay' when we have not used that in the tests
otherwise?

--
With Regards,
Amit Kapila.

#3Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#2)
Re: Introduce wait_for_subscription_sync for TAP tests

On Tue, Jul 26, 2022 at 2:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jul 26, 2022 at 7:07 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi,

In tap tests for logical replication, we have the following code in many places:

$node_publisher->wait_for_catchup('tap_sub');
my $synced_query =
"SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT
IN ('r', 's');";
$node_subscriber->poll_query_until('postgres', $synced_query)
or die "Timed out while waiting for subscriber to synchronize data";

Also, we sometime forgot to check either one, like we fixed in commit
1f50918a6fb02207d151e7cb4aae4c36de9d827c.

I think we can have a new function to wait for all subscriptions to
synchronize data. The attached patch introduce a new function
wait_for_subscription_sync(). With this function, we can replace the
above code with this one function as follows:

$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');

+1. This reduces quite some code in various tests and will make it
easier to write future tests.

Few comments/questions:
====================
1.
-$node_publisher->wait_for_catchup('mysub1');
-
-# Also wait for initial table sync to finish.
-my $synced_query =
-  "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT
IN ('r', 's');";
-$node_subscriber->poll_query_until('postgres', $synced_query)
-  or die "Timed out while waiting for subscriber to synchronize data";
-
# Also wait for initial table sync to finish.
-$node_subscriber->poll_query_until('postgres', $synced_query)
-  or die "Timed out while waiting for subscriber to synchronize data";
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'mysub1');

It seems to me without your patch there is an extra poll in the above
test. If so, we can probably remove that in a separate patch?

Agreed.

2.
+    # wait for the replication to catchup if required.
+    if (defined($publisher))
+    {
+ croak 'subscription name must be specified' unless defined($subname);
+ $publisher->wait_for_catchup($subname, 'replay');
+    }
+
+    # then, wait for all table states to be ready.
+    print "Waiting for all subscriptions in \"$name\" to synchronize data\n";
+    my $query = qq[SELECT count(1) = 0
+     FROM pg_subscription_rel
+     WHERE srsubstate NOT IN ('r', 's');];
+    $self->poll_query_until($dbname, $query)
+ or croak "timed out waiting for subscriber to synchronize data";

In the tests, I noticed that a few places did wait_for_catchup after
the subscription check, and at other places, we did that check before
as you have it here. Ideally, I think wait_for_catchup should be after
confirming the initial sync is over as without initial sync, the
publisher node won't be completely in sync with the subscriber.

What do you mean by the last sentence? I thought the order doesn't
matter here. Even if we do wait_for_catchup first then the
subscription check, we can make sure that the apply worker caught up
and table synchronization has been done, no?

3. In the code quoted in the previous point, why did you pass the
second parameter as 'replay' when we have not used that in the tests
otherwise?

It makes sure to use the (current) default value of $mode of
wait_for_catchup(). But probably it's not necessary, I've removed it.

I've attached an updated patch as well as a patch to remove duplicated
waits in 007_ddl.pl.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v2-0001-Remove-duplicated-wait-for-subscription-synchroni.patchapplication/x-patch; name=v2-0001-Remove-duplicated-wait-for-subscription-synchroni.patchDownload+0-5
v2-0002-Introduce-wait_for_subscription_sync-for-TAP-test.patchapplication/x-patch; name=v2-0002-Introduce-wait_for_subscription_sync-for-TAP-test.patchDownload+111-238
#4shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Masahiko Sawada (#3)
RE: Introduce wait_for_subscription_sync for TAP tests

On Tue, Jul 26, 2022 3:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch as well as a patch to remove duplicated
waits in 007_ddl.pl.

Thanks for your patch. Here are some comments.

1.
I think some comments need to be changed in the patch.
For example:
# Also wait for initial table sync to finish
# Wait for initial sync to finish as well

Words like "Also" and "as well" can be removed now, we originally used them
because we wait for catchup and "also" wait for initial sync.

2.
In the following places, we can remove wait_for_catchup() and then call it in
wait_for_subscription_sync().

2.1.
030_origin.pl:
@@ -128,8 +120,7 @@ $node_B->safe_psql(

$node_C->wait_for_catchup($appname_B2);

-$node_B->poll_query_until('postgres', $synced_query)
-  or die "Timed out while waiting for subscriber to synchronize data";
+$node_B->wait_for_subscription_sync;

2.2.
031_column_list.pl:
@@ -385,7 +373,7 @@ $node_subscriber->safe_psql(
ALTER SUBSCRIPTION sub1 SET PUBLICATION pub2, pub3
));

-wait_for_subscription_sync($node_subscriber);
+$node_subscriber->wait_for_subscription_sync;

$node_publisher->wait_for_catchup('sub1');

2.3.
100_bugs.pl:
@@ -281,8 +276,7 @@ $node_subscriber->safe_psql('postgres',
$node_publisher->wait_for_catchup('tap_sub');

 # Also wait for initial table sync to finish
-$node_subscriber->poll_query_until('postgres', $synced_query)
-  or die "Timed out while waiting for subscriber to synchronize data";
+$node_subscriber->wait_for_subscription_sync;

is( $node_subscriber->safe_psql(
'postgres', "SELECT * FROM tab_replidentity_index"),

Regards,
Shi yu

#5Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#3)
Re: Introduce wait_for_subscription_sync for TAP tests

On Tue, Jul 26, 2022 at 1:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jul 26, 2022 at 2:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

2.
+    # wait for the replication to catchup if required.
+    if (defined($publisher))
+    {
+ croak 'subscription name must be specified' unless defined($subname);
+ $publisher->wait_for_catchup($subname, 'replay');
+    }
+
+    # then, wait for all table states to be ready.
+    print "Waiting for all subscriptions in \"$name\" to synchronize data\n";
+    my $query = qq[SELECT count(1) = 0
+     FROM pg_subscription_rel
+     WHERE srsubstate NOT IN ('r', 's');];
+    $self->poll_query_until($dbname, $query)
+ or croak "timed out waiting for subscriber to synchronize data";

In the tests, I noticed that a few places did wait_for_catchup after
the subscription check, and at other places, we did that check before
as you have it here. Ideally, I think wait_for_catchup should be after
confirming the initial sync is over as without initial sync, the
publisher node won't be completely in sync with the subscriber.

What do you mean by the last sentence? I thought the order doesn't
matter here. Even if we do wait_for_catchup first then the
subscription check, we can make sure that the apply worker caught up
and table synchronization has been done, no?

That's right. I thought we should first ensure the subscriber has
finished operations if possible, like in this case, it can ensure
table sync has finished and then we can ensure whether publisher and
subscriber are in sync. That sounds more logical to me.

--
With Regards,
Amit Kapila.

#6Masahiko Sawada
sawada.mshk@gmail.com
In reply to: shiy.fnst@fujitsu.com (#4)
Re: Introduce wait_for_subscription_sync for TAP tests

On Wed, Jul 27, 2022 at 7:08 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

On Tue, Jul 26, 2022 3:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch as well as a patch to remove duplicated
waits in 007_ddl.pl.

Thanks for your patch. Here are some comments.

Thank you for the comments!

1.
I think some comments need to be changed in the patch.
For example:
# Also wait for initial table sync to finish
# Wait for initial sync to finish as well

Words like "Also" and "as well" can be removed now, we originally used them
because we wait for catchup and "also" wait for initial sync.

Agreed.

2.
In the following places, we can remove wait_for_catchup() and then call it in
wait_for_subscription_sync().

2.1.
030_origin.pl:
@@ -128,8 +120,7 @@ $node_B->safe_psql(

$node_C->wait_for_catchup($appname_B2);

-$node_B->poll_query_until('postgres', $synced_query)
-  or die "Timed out while waiting for subscriber to synchronize data";
+$node_B->wait_for_subscription_sync;

2.2.
031_column_list.pl:
@@ -385,7 +373,7 @@ $node_subscriber->safe_psql(
ALTER SUBSCRIPTION sub1 SET PUBLICATION pub2, pub3
));

-wait_for_subscription_sync($node_subscriber);
+$node_subscriber->wait_for_subscription_sync;

$node_publisher->wait_for_catchup('sub1');

2.3.
100_bugs.pl:
@@ -281,8 +276,7 @@ $node_subscriber->safe_psql('postgres',
$node_publisher->wait_for_catchup('tap_sub');

# Also wait for initial table sync to finish
-$node_subscriber->poll_query_until('postgres', $synced_query)
-  or die "Timed out while waiting for subscriber to synchronize data";
+$node_subscriber->wait_for_subscription_sync;

is( $node_subscriber->safe_psql(
'postgres', "SELECT * FROM tab_replidentity_index"),

Agreed.

I've attached updated patches that incorporated the above comments as
well as the comment from Amit.

BTW regarding 0001 patch to remove the duplicated wait, should we
backpatch to v15? I think we can do that as it's an obvious fix and it
seems to be an oversight in 8f2e2bbf145.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v3-0001-Remove-duplicated-wait-for-subscription-synchroni.patchapplication/octet-stream; name=v3-0001-Remove-duplicated-wait-for-subscription-synchroni.patchDownload+0-5
v3-0002-Introduce-wait_for_subscription_sync-for-TAP-test.patchapplication/octet-stream; name=v3-0002-Introduce-wait_for_subscription_sync-for-TAP-test.patchDownload+128-261
#7Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#5)
Re: Introduce wait_for_subscription_sync for TAP tests

On Wed, Jul 27, 2022 at 8:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jul 26, 2022 at 1:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jul 26, 2022 at 2:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

2.
+    # wait for the replication to catchup if required.
+    if (defined($publisher))
+    {
+ croak 'subscription name must be specified' unless defined($subname);
+ $publisher->wait_for_catchup($subname, 'replay');
+    }
+
+    # then, wait for all table states to be ready.
+    print "Waiting for all subscriptions in \"$name\" to synchronize data\n";
+    my $query = qq[SELECT count(1) = 0
+     FROM pg_subscription_rel
+     WHERE srsubstate NOT IN ('r', 's');];
+    $self->poll_query_until($dbname, $query)
+ or croak "timed out waiting for subscriber to synchronize data";

In the tests, I noticed that a few places did wait_for_catchup after
the subscription check, and at other places, we did that check before
as you have it here. Ideally, I think wait_for_catchup should be after
confirming the initial sync is over as without initial sync, the
publisher node won't be completely in sync with the subscriber.

What do you mean by the last sentence? I thought the order doesn't
matter here. Even if we do wait_for_catchup first then the
subscription check, we can make sure that the apply worker caught up
and table synchronization has been done, no?

That's right. I thought we should first ensure the subscriber has
finished operations if possible, like in this case, it can ensure
table sync has finished and then we can ensure whether publisher and
subscriber are in sync. That sounds more logical to me.

Make sense. I've incorporated it in the v3 patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#8Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#6)
Re: Introduce wait_for_subscription_sync for TAP tests

On Thu, Jul 28, 2022 at 6:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jul 27, 2022 at 7:08 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

I've attached updated patches that incorporated the above comments as
well as the comment from Amit.

BTW regarding 0001 patch to remove the duplicated wait, should we
backpatch to v15?

I think it is good to clean this test case even for PG15 even though
there is no major harm in keeping it. I'll push this early next week
by Tuesday unless someone thinks otherwise.

--
With Regards,
Amit Kapila.

#9Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#8)
Re: Introduce wait_for_subscription_sync for TAP tests

On Sat, Jul 30, 2022 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jul 28, 2022 at 6:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jul 27, 2022 at 7:08 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

I've attached updated patches that incorporated the above comments as
well as the comment from Amit.

BTW regarding 0001 patch to remove the duplicated wait, should we
backpatch to v15?

I think it is good to clean this test case even for PG15 even though
there is no major harm in keeping it. I'll push this early next week
by Tuesday unless someone thinks otherwise.

Pushed this one and now I'll look at your other patch.

--
With Regards,
Amit Kapila.

#10Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#9)
Re: Introduce wait_for_subscription_sync for TAP tests

On Wed, Aug 3, 2022 at 1:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Jul 30, 2022 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jul 28, 2022 at 6:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jul 27, 2022 at 7:08 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

I've attached updated patches that incorporated the above comments as
well as the comment from Amit.

BTW regarding 0001 patch to remove the duplicated wait, should we
backpatch to v15?

I think it is good to clean this test case even for PG15 even though
there is no major harm in keeping it. I'll push this early next week
by Tuesday unless someone thinks otherwise.

Pushed this one and now I'll look at your other patch.

Thanks!

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#11Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#9)
Re: Introduce wait_for_subscription_sync for TAP tests

On Wed, Aug 3, 2022 at 10:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Pushed this one and now I'll look at your other patch.

I have pushed the second patch as well after making minor changes in
the comments. Alvaro [1]/messages/by-id/20220803104544.k2luy5hr2ugnhgr2@alvherre.pgsql and Tom [2]/messages/by-id/2966703.1659535343@sss.pgh.pa.us suggest to back-patch this and
they sound reasonable to me. Will you be able to produce back branch
patches?

[1]: /messages/by-id/20220803104544.k2luy5hr2ugnhgr2@alvherre.pgsql
[2]: /messages/by-id/2966703.1659535343@sss.pgh.pa.us

--
With Regards,
Amit Kapila.

#12Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#11)
Re: Introduce wait_for_subscription_sync for TAP tests

On Thu, Aug 4, 2022 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 3, 2022 at 10:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Pushed this one and now I'll look at your other patch.

I have pushed the second patch as well after making minor changes in
the comments. Alvaro [1] and Tom [2] suggest to back-patch this and
they sound reasonable to me. Will you be able to produce back branch
patches?

Yes. I've attached patches for backbranches. The updates are
straightforward on v11 - v15. However, on v10, we don't use
wait_for_catchup() in some logical replication test cases. The commit
bbd3363e128dae refactored the tests to use wait_for_catchup but it's
not backpatched. So in the patch for v10, I didn't change the code
that was changed by the commit. Also, since wait_for_catchup requires
to specify $target_lsn, unlike the one in v11 or later, I changed
wait_for_subscription_sync() accordingly.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

REL13_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchapplication/x-patch; name=REL13_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchDownload+69-85
REL15_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchapplication/x-patch; name=REL15_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchDownload+125-246
REL14_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchapplication/x-patch; name=REL14_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchDownload+85-146
REL11_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchapplication/x-patch; name=REL11_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchDownload+61-63
REL12_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchapplication/x-patch; name=REL12_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchDownload+62-67
REL10_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchapplication/octet-stream; name=REL10_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchDownload+57-49
#13shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Masahiko Sawada (#12)
RE: Introduce wait_for_subscription_sync for TAP tests

On Thu, Aug 4, 2022 2:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Aug 4, 2022 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Wed, Aug 3, 2022 at 10:21 AM Amit Kapila <amit.kapila16@gmail.com>

wrote:

Pushed this one and now I'll look at your other patch.

I have pushed the second patch as well after making minor changes in
the comments. Alvaro [1] and Tom [2] suggest to back-patch this and
they sound reasonable to me. Will you be able to produce back branch
patches?

Yes. I've attached patches for backbranches. The updates are
straightforward on v11 - v15. However, on v10, we don't use
wait_for_catchup() in some logical replication test cases. The commit
bbd3363e128dae refactored the tests to use wait_for_catchup but it's
not backpatched. So in the patch for v10, I didn't change the code
that was changed by the commit. Also, since wait_for_catchup requires
to specify $target_lsn, unlike the one in v11 or later, I changed
wait_for_subscription_sync() accordingly.

Thanks for your patches.

In the patches for pg11 ~ pg14, it looks we need to add a "=pod" before the
current change in PostgresNode.pm. Right?

Regards,
Shi yu

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Masahiko Sawada (#12)
Re: Introduce wait_for_subscription_sync for TAP tests

Masahiko Sawada <sawada.mshk@gmail.com> writes:

Yes. I've attached patches for backbranches.

FWIW, I'd recommend waiting till after next week's wrap before
pushing these. While I'm definitely in favor of doing this,
the odds of introducing a bug are nonzero, so right before a
release deadline doesn't seem like a good time.

regards, tom lane

#15shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: shiy.fnst@fujitsu.com (#13)
RE: Introduce wait_for_subscription_sync for TAP tests

On Thu, Aug 4, 2022 5:49 PM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com> wrote:

On Thu, Aug 4, 2022 2:28 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Thu, Aug 4, 2022 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Wed, Aug 3, 2022 at 10:21 AM Amit Kapila

<amit.kapila16@gmail.com>

wrote:

Pushed this one and now I'll look at your other patch.

I have pushed the second patch as well after making minor changes in
the comments. Alvaro [1] and Tom [2] suggest to back-patch this and
they sound reasonable to me. Will you be able to produce back branch
patches?

Yes. I've attached patches for backbranches. The updates are
straightforward on v11 - v15. However, on v10, we don't use
wait_for_catchup() in some logical replication test cases. The commit
bbd3363e128dae refactored the tests to use wait_for_catchup but it's
not backpatched. So in the patch for v10, I didn't change the code
that was changed by the commit. Also, since wait_for_catchup requires
to specify $target_lsn, unlike the one in v11 or later, I changed
wait_for_subscription_sync() accordingly.

Thanks for your patches.

In the patches for pg11 ~ pg14, it looks we need to add a "=pod" before the
current change in PostgresNode.pm. Right?

By the way, I notice that in 002_types.pl (on master branch), it seems the "as
well" in the following comment should be removed. Is it worth being fixed?

$node_subscriber->safe_psql('postgres',
"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub WITH (slot_name = tap_sub_slot)"
);

# Wait for initial sync to finish as well
$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');

Regards,
Shi yu

#16Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#14)
Re: Introduce wait_for_subscription_sync for TAP tests

On Thu, Aug 4, 2022 at 7:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Masahiko Sawada <sawada.mshk@gmail.com> writes:

Yes. I've attached patches for backbranches.

FWIW, I'd recommend waiting till after next week's wrap before
pushing these. While I'm definitely in favor of doing this,
the odds of introducing a bug are nonzero, so right before a
release deadline doesn't seem like a good time.

Agreed. I was planning to do it only after next week's wrap. Thanks
for your suggestion.

--
With Regards,
Amit Kapila.

#17Masahiko Sawada
sawada.mshk@gmail.com
In reply to: shiy.fnst@fujitsu.com (#15)
Re: Introduce wait_for_subscription_sync for TAP tests

On Fri, Aug 5, 2022 at 10:39 AM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

On Thu, Aug 4, 2022 5:49 PM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com> wrote:

On Thu, Aug 4, 2022 2:28 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Thu, Aug 4, 2022 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Wed, Aug 3, 2022 at 10:21 AM Amit Kapila

<amit.kapila16@gmail.com>

wrote:

Pushed this one and now I'll look at your other patch.

I have pushed the second patch as well after making minor changes in
the comments. Alvaro [1] and Tom [2] suggest to back-patch this and
they sound reasonable to me. Will you be able to produce back branch
patches?

Yes. I've attached patches for backbranches. The updates are
straightforward on v11 - v15. However, on v10, we don't use
wait_for_catchup() in some logical replication test cases. The commit
bbd3363e128dae refactored the tests to use wait_for_catchup but it's
not backpatched. So in the patch for v10, I didn't change the code
that was changed by the commit. Also, since wait_for_catchup requires
to specify $target_lsn, unlike the one in v11 or later, I changed
wait_for_subscription_sync() accordingly.

Thanks for your patches.

In the patches for pg11 ~ pg14, it looks we need to add a "=pod" before the
current change in PostgresNode.pm. Right?

By the way, I notice that in 002_types.pl (on master branch), it seems the "as
well" in the following comment should be removed. Is it worth being fixed?

$node_subscriber->safe_psql('postgres',
"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub WITH (slot_name = tap_sub_slot)"
);

# Wait for initial sync to finish as well
$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');

Thank you for the comments. I've attached updated version patches.
Please review them.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

REL14_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchapplication/octet-stream; name=REL14_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchDownload+88-147
REL12_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchapplication/octet-stream; name=REL12_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchDownload+65-68
REL11_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchapplication/octet-stream; name=REL11_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchDownload+64-64
REL13_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchapplication/octet-stream; name=REL13_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchDownload+72-86
REL15_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchapplication/octet-stream; name=REL15_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchDownload+125-246
REL10_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchapplication/octet-stream; name=REL10_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patchDownload+59-49
#18Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#17)
Re: Introduce wait_for_subscription_sync for TAP tests

On Wed, Aug 10, 2022 at 10:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Aug 5, 2022 at 10:39 AM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

Thank you for the comments. I've attached updated version patches.
Please review them.

Pushed.

--
With Regards,
Amit Kapila.

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Kapila (#18)
Re: Introduce wait_for_subscription_sync for TAP tests

Amit Kapila <amit.kapila16@gmail.com> writes:

Pushed.

Recently a number of buildfarm animals have failed at the same
place in src/test/subscription/t/100_bugs.pl [1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&amp;dt=2022-09-09%2012%3A03%3A46[2]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&amp;dt=2022-09-09%2011%3A16%3A36[3]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&amp;dt=2022-09-09%2010%3A33%3A19[4]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&amp;dt=2022-09-08%2010%3A56%3A59:

# Failed test '2x3000 rows in t'
# at t/100_bugs.pl line 149.
# got: '9000'
# expected: '6000'
# Looks like you failed 1 test of 7.
[09:30:56] t/100_bugs.pl ......................

This was the last commit to touch that test script. I'm thinking
maybe it wasn't adjusted quite correctly? On the other hand, since
I can't find any similar failures before the last 48 hours, maybe
there is some other more-recent commit to blame. Anyway, something
is wrong there.

regards, tom lane

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&amp;dt=2022-09-09%2012%3A03%3A46
[2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&amp;dt=2022-09-09%2011%3A16%3A36
[3]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&amp;dt=2022-09-09%2010%3A33%3A19
[4]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&amp;dt=2022-09-08%2010%3A56%3A59

#20Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Tom Lane (#19)
Re: Introduce wait_for_subscription_sync for TAP tests

On Fri, Sep 9, 2022 at 11:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

Pushed.

Recently a number of buildfarm animals have failed at the same
place in src/test/subscription/t/100_bugs.pl [1][2][3][4]:

# Failed test '2x3000 rows in t'
# at t/100_bugs.pl line 149.
# got: '9000'
# expected: '6000'
# Looks like you failed 1 test of 7.
[09:30:56] t/100_bugs.pl ......................

This was the last commit to touch that test script. I'm thinking
maybe it wasn't adjusted quite correctly? On the other hand, since
I can't find any similar failures before the last 48 hours, maybe
there is some other more-recent commit to blame. Anyway, something
is wrong there.

It seems that this commit is innocent as it changed only how to wait.
Rather, looking at the logs, the tablesync worker errored out at an
interesting point:

022-09-09 09:30:19.630 EDT [631b3feb.840:13]
pg_16400_sync_16392_7141371862484106124 ERROR: could not find record
while sending logically-decoded data: missing contrecord at 0/1D4FFF8
2022-09-09 09:30:19.630 EDT [631b3feb.840:14]
pg_16400_sync_16392_7141371862484106124 STATEMENT: START_REPLICATION
SLOT "pg_16400_sync_16392_7141371862484106124" LOGICAL 0/0
(proto_version '3', origin 'any', publication_names '"testpub"')
ERROR: could not find record while sending logically-decoded data:
missing contrecord at 0/1D4FFF8
2022-09-09 09:30:19.631 EDT [631b3feb.26e8:2] ERROR: error while
shutting down streaming COPY: ERROR: could not find record while
sending logically-decoded data: missing contrecord at 0/1D4FFF8

It's likely that the commit f6c5edb8abcac04eb3eac6da356e59d399b2bcef
is relevant.

Regards,

--
Masahiko Sawada

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Masahiko Sawada (#20)
#22Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Tom Lane (#21)
#23Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#21)
#24Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#23)
#25Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#24)