Add two missing tests in 035_standby_logical_decoding.pl

Started by Bertrand Drouvotabout 3 years ago57 messageshackers
Jump to latest
#1Bertrand Drouvot
bertranddrouvot.pg@gmail.com

hi hackers,

In the logical decoding on standby thread [1]/messages/by-id/6d801661-e21b-7326-be1b-f90d904da66a@gmail.com, Andres proposed 2 new tests (that I did
not find the time to complete before the finish line):

- Test that we can subscribe to the standby (with the publication created on the primary)
- Verify that invalidated logical slots do not lead to retaining WAL

Please find those 2 missing tests in the patch proposal attached.

A few words about them:

1) Regarding the subscription test:

It modifies wait_for_catchup() to take into account the case where the requesting
node is in recovery mode. Indeed, without that change, wait_for_subscription_sync() was
failing with:

"
error running SQL: 'psql:<stdin>:1: ERROR: recovery is in progress
HINT: WAL control functions cannot be executed during recovery.'
while running 'psql -XAtq -d port=61441 host=/tmp/45dt3wqs2p dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT pg_current_wal_lsn()'
"

2) Regarding the WAL file not retained test:

As it's not possible to execute pg_switch_wal() and friends on a standby, this is
done on the primary. Also checking that the WAL file (linked to a restart_lsn of an invalidate
slot) has been removed is done directly at the os/directory level.

The attached patch also removes:

"
-log_min_messages = 'debug2'
-log_error_verbosity = verbose
"

as also discussed in [1]/messages/by-id/6d801661-e21b-7326-be1b-f90d904da66a@gmail.com.

I'm not sure if adding those 2 tests should be considered as an open item. I can add this open item
if we think that makes sense. I'd be happy to do so but it looks like I don't have the privileges
to edit https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

[1]: /messages/by-id/6d801661-e21b-7326-be1b-f90d904da66a@gmail.com

Attachments:

v1-0001-add-missing-tests.patchtext/plain; charset=UTF-8; name=v1-0001-add-missing-tests.patchDownload+167-7
#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bertrand Drouvot (#1)
Re: Add two missing tests in 035_standby_logical_decoding.pl

On 2023-Apr-12, Drouvot, Bertrand wrote:

I'm not sure if adding those 2 tests should be considered as an open
item. I can add this open item if we think that makes sense. I'd be
happy to do so but it looks like I don't have the privileges to edit
https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items

I think adding extra tests for new code can definitely be considered an
open item, since those tests might help to discover issues in said new
code.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#3Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Alvaro Herrera (#2)
Re: Add two missing tests in 035_standby_logical_decoding.pl

Hi,

On 4/17/23 11:55 AM, Alvaro Herrera wrote:

On 2023-Apr-12, Drouvot, Bertrand wrote:

I'm not sure if adding those 2 tests should be considered as an open
item. I can add this open item if we think that makes sense. I'd be
happy to do so but it looks like I don't have the privileges to edit
https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items

I think adding extra tests for new code can definitely be considered an
open item, since those tests might help to discover issues in said new
code.

Thanks for the feedback! Added as an open item.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#4vignesh C
vignesh21@gmail.com
In reply to: Bertrand Drouvot (#1)
Re: Add two missing tests in 035_standby_logical_decoding.pl

On Wed, 12 Apr 2023 at 21:45, Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:

hi hackers,

In the logical decoding on standby thread [1], Andres proposed 2 new tests (that I did
not find the time to complete before the finish line):

- Test that we can subscribe to the standby (with the publication created on the primary)
- Verify that invalidated logical slots do not lead to retaining WAL

Please find those 2 missing tests in the patch proposal attached.

Few comments:
1) Should this change be committed as a separate patch instead of
mixing it with the new test addition patch? I feel it would be better
to split it into 0001 and 0002 patches.
# Name for the physical slot on primary
@@ -235,8 +241,6 @@ $node_primary->append_conf('postgresql.conf', q{
wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
-log_min_messages = 'debug2'
-log_error_verbosity = verbose
});
$node_primary->dump_info;
$node_primary->start;

2) We could add a commitfest entry for this, which will help in
checking cfbot results across platforms.

3) Should the comment say subscription instead of subscriber here?
+# We do not need the subscriber anymore
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+$node_subscriber->stop;

4) we could add a commit message for the patch

Regards,
Vignesh

#5Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: vignesh C (#4)
Re: Add two missing tests in 035_standby_logical_decoding.pl

Hi,

On 4/24/23 6:04 AM, vignesh C wrote:

On Wed, 12 Apr 2023 at 21:45, Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:

hi hackers,

In the logical decoding on standby thread [1], Andres proposed 2 new tests (that I did
not find the time to complete before the finish line):

- Test that we can subscribe to the standby (with the publication created on the primary)
- Verify that invalidated logical slots do not lead to retaining WAL

Please find those 2 missing tests in the patch proposal attached.

Few comments:

Thanks for looking at it!

1) Should this change be committed as a separate patch instead of
mixing it with the new test addition patch? I feel it would be better
to split it into 0001 and 0002 patches.

Agree, done in V2 attached.

2) We could add a commitfest entry for this, which will help in
checking cfbot results across platforms.

Good point, done in [1]https://commitfest.postgresql.org/43/4295/.

3) Should the comment say subscription instead of subscriber here?
+# We do not need the subscriber anymore
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+$node_subscriber->stop;

Comment was due to the node_subscriber being stopped. Changed to
"We do not need the subscription and the subscriber anymore"
in V2.

4) we could add a commit message for the patch

Good point, done.

[1]: https://commitfest.postgresql.org/43/4295/

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v2-0002-Add-two-tests-in-035_standby_logical_decoding.pl.patchtext/plain; charset=UTF-8; name=v2-0002-Add-two-tests-in-035_standby_logical_decoding.pl.patchDownload+167-8
v2-0001-Lower-log-level-in-035_standby_logical_decoding.p.patchtext/plain; charset=UTF-8; name=v2-0001-Lower-log-level-in-035_standby_logical_decoding.p.patchDownload+0-3
#6Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#5)
Re: Add two missing tests in 035_standby_logical_decoding.pl

On Mon, Apr 24, 2023 at 11:24 AM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:

Few comments:
============
1.
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf('postgresql.conf', 'max_replication_slots = 4');

Why do we need slots on the subscriber?

2.
+# Speed up the subscription creation
+$node_primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot()");
+
+# Explicitly shut down psql instance gracefully - to avoid hangs
+# or worse on windows
+$psql_subscriber{subscriber_stdin} .= "\\q\n";
+$psql_subscriber{run}->finish;
+
+# Insert some rows on the primary
+$node_primary->safe_psql('postgres',
+ qq[INSERT INTO tab_rep select generate_series(1,10);]);
+
+$node_primary->wait_for_replay_catchup($node_standby);
+
+# To speed up the wait_for_subscription_sync
+$node_primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot()");
+$node_subscriber->wait_for_subscription_sync($node_standby, 'tap_sub');

It is not clear to me why you need to do pg_log_standby_snapshot() twice.

3. Why do you need $psql_subscriber to be used in a different way
instead of using safe_psql as is used for node_primary?

--
With Regards,
Amit Kapila.

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#1)
Re: Add two missing tests in 035_standby_logical_decoding.pl

On Wed, Apr 12, 2023 at 9:45 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:

The attached patch also removes:

"
-log_min_messages = 'debug2'
-log_error_verbosity = verbose
"

as also discussed in [1].

I agree that we should reduce the log level here. It is discussed in
an email [1]/messages/by-id/523315.1681245505@sss.pgh.pa.us. I'll push this part tomorrow unless Andres or someone
else thinks that we still need this.

[1]: /messages/by-id/523315.1681245505@sss.pgh.pa.us

--
With Regards,
Amit Kapila.

#8Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#6)
Re: Add two missing tests in 035_standby_logical_decoding.pl

On Mon, Apr 24, 2023 at 11:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 24, 2023 at 11:24 AM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:

Few comments:
============

+# We can not test if the WAL file still exists immediately.
+# We need to let some time to the standby to actually "remove" it.
+my $i = 0;
+while (1)
+{
+ last if !-f $standby_walfile;
+ if ($i++ == 10 * $default_timeout)
+ {
+ die
+   "could not determine if WAL file has been retained or not, can't continue";
+ }
+ usleep(100_000);
+}

Is this adhoc wait required because we can't guarantee that the
checkpoint is complete on standby even after using wait_for_catchup?
Is there a guarantee that it can never fail on some slower machines?

BTW, for the second test is it necessary that we first ensure that the
WAL file has not been retained on the primary?

--
With Regards,
Amit Kapila.

#9Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#6)
Re: Add two missing tests in 035_standby_logical_decoding.pl

Hi,

On 4/24/23 8:24 AM, Amit Kapila wrote:

On Mon, Apr 24, 2023 at 11:24 AM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:

Few comments:
============

Thanks for looking at it!

1.
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf('postgresql.conf', 'max_replication_slots = 4');

Why do we need slots on the subscriber?

Good point, it's not needed. I guess it has been missed during my initial patch clean up.

Fixed in V3 attached.

2.
+# Speed up the subscription creation
+$node_primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot()");
+
+# Explicitly shut down psql instance gracefully - to avoid hangs
+# or worse on windows
+$psql_subscriber{subscriber_stdin} .= "\\q\n";
+$psql_subscriber{run}->finish;
+
+# Insert some rows on the primary
+$node_primary->safe_psql('postgres',
+ qq[INSERT INTO tab_rep select generate_series(1,10);]);
+
+$node_primary->wait_for_replay_catchup($node_standby);
+
+# To speed up the wait_for_subscription_sync
+$node_primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot()");
+$node_subscriber->wait_for_subscription_sync($node_standby, 'tap_sub');

It is not clear to me why you need to do pg_log_standby_snapshot() twice.

That's because there is 2 logical slot creations that have the be done on the standby.

The one for the subscription:

"
CREATE_REPLICATION_SLOT "tap_sub" LOGICAL pgoutput (SNAPSHOT 'nothing')
"

And the one for the data sync:

"
CREATE_REPLICATION_SLOT "pg_16389_sync_16384_7225540800768250444" LOGICAL pgoutput (SNAPSHOT 'use')
"

Without the second "pg_log_standby_snapshot()" then wait_for_subscription_sync() would be waiting
some time on the poll for "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');"

Adding a comment in V3 to explain the need for the second pg_log_standby_snapshot().

3. Why do you need $psql_subscriber to be used in a different way
instead of using safe_psql as is used for node_primary?

Because safe_psql() would wait for activity on the primary without being able to launch
pg_log_standby_snapshot() on the primary while waiting. psql_subscriber() allows
to not wait synchronously.

Also adding a comment in V3 to explain why safe_psql() is not being used here.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v3-0002-Add-two-tests-in-035_standby_logical_decoding.pl.patchtext/plain; charset=UTF-8; name=v3-0002-Add-two-tests-in-035_standby_logical_decoding.pl.patchDownload+173-8
v3-0001-Lower-log-level-in-035_standby_logical_decoding.p.patchtext/plain; charset=UTF-8; name=v3-0001-Lower-log-level-in-035_standby_logical_decoding.p.patchDownload+0-3
#10Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#8)
Re: Add two missing tests in 035_standby_logical_decoding.pl

Hi,

On 4/24/23 11:45 AM, Amit Kapila wrote:

On Mon, Apr 24, 2023 at 11:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 24, 2023 at 11:24 AM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:

Few comments:
============

+# We can not test if the WAL file still exists immediately.
+# We need to let some time to the standby to actually "remove" it.
+my $i = 0;
+while (1)
+{
+ last if !-f $standby_walfile;
+ if ($i++ == 10 * $default_timeout)
+ {
+ die
+   "could not determine if WAL file has been retained or not, can't continue";
+ }
+ usleep(100_000);
+}

Is this adhoc wait required because we can't guarantee that the
checkpoint is complete on standby even after using wait_for_catchup?

Yes, the restart point on the standby is not necessary completed even after wait_for_catchup is done.

Is there a guarantee that it can never fail on some slower machines?

We are waiting here at a maximum for 10 * $default_timeout (means 3 minutes) before
we time out. Would you prefer to wait more than 3 minutes at a maximum?

BTW, for the second test is it necessary that we first ensure that the
WAL file has not been retained on the primary?

I was not sure it's worth it too. Idea was more: it's useless to verify it is removed on
the standby if we are not 100% sure it has been removed on the primary first. But yeah, we can get
rid of this test if you prefer.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#11Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#9)
Re: Add two missing tests in 035_standby_logical_decoding.pl

On Mon, Apr 24, 2023 at 3:36 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:

On 4/24/23 8:24 AM, Amit Kapila wrote:

2.
+# Speed up the subscription creation
+$node_primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot()");
+
+# Explicitly shut down psql instance gracefully - to avoid hangs
+# or worse on windows
+$psql_subscriber{subscriber_stdin} .= "\\q\n";
+$psql_subscriber{run}->finish;
+
+# Insert some rows on the primary
+$node_primary->safe_psql('postgres',
+ qq[INSERT INTO tab_rep select generate_series(1,10);]);
+
+$node_primary->wait_for_replay_catchup($node_standby);
+
+# To speed up the wait_for_subscription_sync
+$node_primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot()");
+$node_subscriber->wait_for_subscription_sync($node_standby, 'tap_sub');

It is not clear to me why you need to do pg_log_standby_snapshot() twice.

That's because there is 2 logical slot creations that have the be done on the standby.

The one for the subscription:

"
CREATE_REPLICATION_SLOT "tap_sub" LOGICAL pgoutput (SNAPSHOT 'nothing')
"

And the one for the data sync:

"
CREATE_REPLICATION_SLOT "pg_16389_sync_16384_7225540800768250444" LOGICAL pgoutput (SNAPSHOT 'use')
"

Without the second "pg_log_standby_snapshot()" then wait_for_subscription_sync() would be waiting
some time on the poll for "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');"

Adding a comment in V3 to explain the need for the second pg_log_standby_snapshot().

Won't this still be unpredictable because it is possible that the
tablesync worker may take more time to get launched or create a
replication slot? If that happens after your second
pg_log_standby_snapshot() then wait_for_subscription_sync() will be
hanging. Wouldn't it be better to create a subscription with
(copy_data = false) to make it predictable and then we won't need
pg_log_standby_snapshot() to be performed twice?

If you agree with the above suggestion then you probably need to move
wait_for_subscription_sync() before Insert.

--
With Regards,
Amit Kapila.

#12Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#10)
Re: Add two missing tests in 035_standby_logical_decoding.pl

On Mon, Apr 24, 2023 at 5:38 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:

On 4/24/23 11:45 AM, Amit Kapila wrote:

On Mon, Apr 24, 2023 at 11:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 24, 2023 at 11:24 AM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:

Few comments:
============

+# We can not test if the WAL file still exists immediately.
+# We need to let some time to the standby to actually "remove" it.
+my $i = 0;
+while (1)
+{
+ last if !-f $standby_walfile;
+ if ($i++ == 10 * $default_timeout)
+ {
+ die
+   "could not determine if WAL file has been retained or not, can't continue";
+ }
+ usleep(100_000);
+}

Is this adhoc wait required because we can't guarantee that the
checkpoint is complete on standby even after using wait_for_catchup?

Yes, the restart point on the standby is not necessary completed even after wait_for_catchup is done.

Is there a guarantee that it can never fail on some slower machines?

We are waiting here at a maximum for 10 * $default_timeout (means 3 minutes) before
we time out. Would you prefer to wait more than 3 minutes at a maximum?

No, because I don't know what would be a suitable timeout here. At
this stage, I don't have a good idea on how to implement this test in
a better way. Can we split this into a separate patch as the first
test is a bit straightforward, we can push that one and then
brainstorm on if there is a better way to test this functionality.

--
With Regards,
Amit Kapila.

#13Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#11)
Re: Add two missing tests in 035_standby_logical_decoding.pl

Hi,

On 4/25/23 6:23 AM, Amit Kapila wrote:

On Mon, Apr 24, 2023 at 3:36 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:

Without the second "pg_log_standby_snapshot()" then wait_for_subscription_sync() would be waiting
some time on the poll for "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');"

Adding a comment in V3 to explain the need for the second pg_log_standby_snapshot().

Won't this still be unpredictable because it is possible that the
tablesync worker may take more time to get launched or create a
replication slot? If that happens after your second
pg_log_standby_snapshot() then wait_for_subscription_sync() will be
hanging.

Oh right, that looks like a possible scenario.

Wouldn't it be better to create a subscription with
(copy_data = false) to make it predictable and then we won't need
pg_log_standby_snapshot() to be performed twice?

If you agree with the above suggestion then you probably need to move
wait_for_subscription_sync() before Insert.

I like that idea, thanks! Done in V4 attached.

Not related to the above corner case, but while re-reading the patch I also added:

"
$node_primary->wait_for_replay_catchup($node_standby);
"

between the publication creation on the primary and the subscription to the standby
(to ensure the publication gets replicated before we request for the subscription creation).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v4-0002-Add-retained-WAL-test-in-035_standby_logical_deco.patchtext/plain; charset=UTF-8; name=v4-0002-Add-retained-WAL-test-in-035_standby_logical_deco.patchDownload+76-3
v4-0001-Add-subscribtion-to-the-standby-test-in-035_stand.patchtext/plain; charset=UTF-8; name=v4-0001-Add-subscribtion-to-the-standby-test-in-035_stand.patchDownload+99-4
#14Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#12)
Re: Add two missing tests in 035_standby_logical_decoding.pl

Hi,

On 4/25/23 6:43 AM, Amit Kapila wrote:

On Mon, Apr 24, 2023 at 5:38 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:

We are waiting here at a maximum for 10 * $default_timeout (means 3 minutes) before
we time out. Would you prefer to wait more than 3 minutes at a maximum?

No, because I don't know what would be a suitable timeout here.

Yeah, I understand that. On the other hand, there is other places that
rely on a timeout, for example:

- wait_for_catchup(), wait_for_slot_catchup(),
wait_for_subscription_sync() by making use of poll_query_until.
- wait_for_log() by setting a max_attempts.

Couldn't we have the same concern for those ones? (aka be suitable on
slower machines).

At
this stage, I don't have a good idea on how to implement this test in
a better way. Can we split this into a separate patch as the first
test is a bit straightforward, we can push that one and then
brainstorm on if there is a better way to test this functionality.

I created a dedicated v4-0002-Add-retained-WAL-test-in-035_standby_logical_deco.patch
just shared up-thread.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#15vignesh C
vignesh21@gmail.com
In reply to: Bertrand Drouvot (#13)
Re: Add two missing tests in 035_standby_logical_decoding.pl

On Tue, 25 Apr 2023 at 12:51, Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On 4/25/23 6:23 AM, Amit Kapila wrote:

On Mon, Apr 24, 2023 at 3:36 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:

Without the second "pg_log_standby_snapshot()" then wait_for_subscription_sync() would be waiting
some time on the poll for "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');"

Adding a comment in V3 to explain the need for the second pg_log_standby_snapshot().

Won't this still be unpredictable because it is possible that the
tablesync worker may take more time to get launched or create a
replication slot? If that happens after your second
pg_log_standby_snapshot() then wait_for_subscription_sync() will be
hanging.

Oh right, that looks like a possible scenario.

Wouldn't it be better to create a subscription with
(copy_data = false) to make it predictable and then we won't need
pg_log_standby_snapshot() to be performed twice?

If you agree with the above suggestion then you probably need to move
wait_for_subscription_sync() before Insert.

I like that idea, thanks! Done in V4 attached.

Not related to the above corner case, but while re-reading the patch I also added:

"
$node_primary->wait_for_replay_catchup($node_standby);
"

between the publication creation on the primary and the subscription to the standby
(to ensure the publication gets replicated before we request for the subscription creation).

Thanks for the updated patch.
Few comments:
1) subscriber_stdout and  subscriber_stderr are not required for this
test case, we could remove it, I was able to remove those variables
and run the test successfully:
+$node_subscriber->start;
+
+my %psql_subscriber = (
+       'subscriber_stdin'  => '',
+       'subscriber_stdout' => '',
+       'subscriber_stderr' => '');
+$psql_subscriber{run} = IPC::Run::start(
+       [ 'psql', '-XA', '-f', '-', '-d',
$node_subscriber->connstr('postgres') ],
+       '<',
+       \$psql_subscriber{subscriber_stdin},
+       '>',
+       \$psql_subscriber{subscriber_stdout},
+       '2>',
+       \$psql_subscriber{subscriber_stderr},
+       $psql_timeout);

I ran it like:
my %psql_subscriber = (
'subscriber_stdin' => '');
$psql_subscriber{run} = IPC::Run::start(
[ 'psql', '-XA', '-f', '-', '-d', $node_subscriber->connstr('postgres') ],
'<',
\$psql_subscriber{subscriber_stdin},
$psql_timeout);

2) Also we have changed the default timeout here, why is this change required:
 my $node_cascading_standby =
PostgreSQL::Test::Cluster->new('cascading_standby');
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
 my $default_timeout = $PostgreSQL::Test::Utils::timeout_default;
+my $psql_timeout    = IPC::Run::timer(2 * $default_timeout);

Regards,
Vignesh

#16Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: vignesh C (#15)
Re: Add two missing tests in 035_standby_logical_decoding.pl

Hi,

On 4/26/23 6:06 AM, vignesh C wrote:

On Tue, 25 Apr 2023 at 12:51, Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
Thanks for the updated patch.
Few comments:

Thanks for looking at it!

1) subscriber_stdout and  subscriber_stderr are not required for this
test case, we could remove it, I was able to remove those variables
and run the test successfully:
+$node_subscriber->start;
+
+my %psql_subscriber = (
+       'subscriber_stdin'  => '',
+       'subscriber_stdout' => '',
+       'subscriber_stderr' => '');
+$psql_subscriber{run} = IPC::Run::start(
+       [ 'psql', '-XA', '-f', '-', '-d',
$node_subscriber->connstr('postgres') ],
+       '<',
+       \$psql_subscriber{subscriber_stdin},
+       '>',
+       \$psql_subscriber{subscriber_stdout},
+       '2>',
+       \$psql_subscriber{subscriber_stderr},
+       $psql_timeout);

I ran it like:
my %psql_subscriber = (
'subscriber_stdin' => '');
$psql_subscriber{run} = IPC::Run::start(
[ 'psql', '-XA', '-f', '-', '-d', $node_subscriber->connstr('postgres') ],
'<',
\$psql_subscriber{subscriber_stdin},
$psql_timeout);

Not using the 3 std* is also the case for example in 021_row_visibility.pl and 032_relfilenode_reuse.pl
where the "stderr" is set but does not seem to be used.

I don't think that's a problem to keep them all and I think it's better to have
them re-directed to dedicated places.

2) Also we have changed the default timeout here, why is this change required:
my $node_cascading_standby =
PostgreSQL::Test::Cluster->new('cascading_standby');
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
my $default_timeout = $PostgreSQL::Test::Utils::timeout_default;
+my $psql_timeout    = IPC::Run::timer(2 * $default_timeout);

I think I used 021_row_visibility.pl as an example. But agree there is
others .pl that are using the timeout_default as the psql_timeout and that
the default is enough in our case. So, using the default in V5 attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v5-0001-Add-subscribtion-to-the-standby-test-in-035_stand.patchtext/plain; charset=UTF-8; name=v5-0001-Add-subscribtion-to-the-standby-test-in-035_stand.patchDownload+99-4
#17vignesh C
vignesh21@gmail.com
In reply to: Bertrand Drouvot (#16)
Re: Add two missing tests in 035_standby_logical_decoding.pl

On Wed, 26 Apr 2023 at 13:45, Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On 4/26/23 6:06 AM, vignesh C wrote:

On Tue, 25 Apr 2023 at 12:51, Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
Thanks for the updated patch.
Few comments:

Thanks for looking at it!

1) subscriber_stdout and  subscriber_stderr are not required for this
test case, we could remove it, I was able to remove those variables
and run the test successfully:
+$node_subscriber->start;
+
+my %psql_subscriber = (
+       'subscriber_stdin'  => '',
+       'subscriber_stdout' => '',
+       'subscriber_stderr' => '');
+$psql_subscriber{run} = IPC::Run::start(
+       [ 'psql', '-XA', '-f', '-', '-d',
$node_subscriber->connstr('postgres') ],
+       '<',
+       \$psql_subscriber{subscriber_stdin},
+       '>',
+       \$psql_subscriber{subscriber_stdout},
+       '2>',
+       \$psql_subscriber{subscriber_stderr},
+       $psql_timeout);

I ran it like:
my %psql_subscriber = (
'subscriber_stdin' => '');
$psql_subscriber{run} = IPC::Run::start(
[ 'psql', '-XA', '-f', '-', '-d', $node_subscriber->connstr('postgres') ],
'<',
\$psql_subscriber{subscriber_stdin},
$psql_timeout);

Not using the 3 std* is also the case for example in 021_row_visibility.pl and 032_relfilenode_reuse.pl
where the "stderr" is set but does not seem to be used.

I don't think that's a problem to keep them all and I think it's better to have
them re-directed to dedicated places.

ok, that way it will be consistent across others too.

2) Also we have changed the default timeout here, why is this change required:
my $node_cascading_standby =
PostgreSQL::Test::Cluster->new('cascading_standby');
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
my $default_timeout = $PostgreSQL::Test::Utils::timeout_default;
+my $psql_timeout    = IPC::Run::timer(2 * $default_timeout);

I think I used 021_row_visibility.pl as an example. But agree there is
others .pl that are using the timeout_default as the psql_timeout and that
the default is enough in our case. So, using the default in V5 attached.

Thanks for fixing this.

There was one typo in the commit message, subscribtion should be
subscription, the rest of the changes looks good to me:
Subject: [PATCH v5] Add subscribtion to the standby test in
035_standby_logical_decoding.pl

Adding one test, to verify that subscribtion to the standby is possible.

Regards,
Vignesh

#18shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Bertrand Drouvot (#10)
RE: Add two missing tests in 035_standby_logical_decoding.pl

On Mon, Apr 24, 2023 8:07 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote:

On 4/24/23 11:45 AM, Amit Kapila wrote:

On Mon, Apr 24, 2023 at 11:54 AM Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Mon, Apr 24, 2023 at 11:24 AM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:

Few comments:
============

+# We can not test if the WAL file still exists immediately.
+# We need to let some time to the standby to actually "remove" it.
+my $i = 0;
+while (1)
+{
+ last if !-f $standby_walfile;
+ if ($i++ == 10 * $default_timeout)
+ {
+ die
+   "could not determine if WAL file has been retained or not, can't continue";
+ }
+ usleep(100_000);
+}

Is this adhoc wait required because we can't guarantee that the
checkpoint is complete on standby even after using wait_for_catchup?

Yes, the restart point on the standby is not necessary completed even after
wait_for_catchup is done.

I think that's because when replaying a checkpoint record, the startup process
of standby only saves the information of the checkpoint, and we need to wait for
the checkpointer to perform a restartpoint (see RecoveryRestartPoint), right? If
so, could we force a checkpoint on standby? After this, the standby should have
completed the restartpoint and we don't need to wait.

Besides, would it be better to wait for the cascading standby? If the wal log
file needed for cascading standby is removed on the standby, the subsequent test
will fail. Do we need to consider this scenario? I saw the following error
message after setting recovery_min_apply_delay to 5s on the cascading standby,
and the test failed due to a timeout while waiting for cascading standby.

Log of cascading standby node:
FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000003 has already been removed

Regards,
Shi Yu

#19Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bertrand Drouvot (#16)
Re: Add two missing tests in 035_standby_logical_decoding.pl
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 6f7f4e5de4..819667d42a 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2644,7 +2644,16 @@ sub wait_for_catchup
}
if (!defined($target_lsn))
{
-		$target_lsn = $self->lsn('write');
+		my $isrecovery = $self->safe_psql('postgres', "SELECT pg_is_in_recovery()");
+		chomp($isrecovery);
+		if ($isrecovery eq 't')
+		{
+			$target_lsn = $self->lsn('replay');
+		}
+		else
+		{
+			$target_lsn = $self->lsn('write');
+		}

Please modify the function's documentation to account for this code change.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Porque Kim no hacía nada, pero, eso sí,
con extraordinario éxito" ("Kim", Kipling)

#20Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Alvaro Herrera (#19)
Re: Add two missing tests in 035_standby_logical_decoding.pl

Hi,

On 4/26/23 12:27 PM, Alvaro Herrera wrote:

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 6f7f4e5de4..819667d42a 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2644,7 +2644,16 @@ sub wait_for_catchup
}
if (!defined($target_lsn))
{
-		$target_lsn = $self->lsn('write');
+		my $isrecovery = $self->safe_psql('postgres', "SELECT pg_is_in_recovery()");
+		chomp($isrecovery);
+		if ($isrecovery eq 't')
+		{
+			$target_lsn = $self->lsn('replay');
+		}
+		else
+		{
+			$target_lsn = $self->lsn('write');
+		}

Please modify the function's documentation to account for this code change.

Good point, thanks! Done in V6 attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v6-0001-Add-subscription-to-the-standby-test-in-035_stand.patchtext/plain; charset=UTF-8; name=v6-0001-Add-subscription-to-the-standby-test-in-035_stand.patchDownload+107-6
#21Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: vignesh C (#17)
#22Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: shiy.fnst@fujitsu.com (#18)
#23Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#20)
#24Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#23)
#25Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#24)
#26Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#25)
#27Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#22)
#28Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#27)
#29Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#28)
#30Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#29)
#31Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#30)
#32Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#31)
#33vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#31)
#34Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#33)
#35Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#34)
#36Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#34)
#37Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#36)
#38vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#37)
#39Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: vignesh C (#38)
#40Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#37)
#41Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#40)
#42Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#41)
#43Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#41)
#44Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#43)
#45Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#44)
#46Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#45)
#47Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#46)
#48Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#47)
#49Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#48)
#50Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#49)
#51Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#50)
#52Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#51)
#53Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#52)
#54Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#53)
#55Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#54)
#56Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#55)
#57Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#56)