pgsql: Additional tests for subtransactions in recovery
Additional tests for subtransactions in recovery
Tests for normal and prepared transactions
Author: Nikhil Sontakke, placed in new test file by me
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/0352c15e5ab4f70c3ab2ac1d48d9f38a5dd11786
Modified Files
--------------
src/test/recovery/t/009_twophase.pl | 39 +-----
src/test/recovery/t/012_subtransactions.pl | 197 +++++++++++++++++++++++++++++
2 files changed, 198 insertions(+), 38 deletions(-)
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On Thu, Apr 27, 2017 at 9:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Additional tests for subtransactions in recovery
Tests for normal and prepared transactions
Author: Nikhil Sontakke, placed in new test file by me
+$node_slave->promote;
+$node_slave->poll_query_until('postgres',
+ "SELECT NOT pg_is_in_recovery()")
+ or die "Timed out while waiting for promotion of standby";
This reminds me that we should really switch PostgresNode::promote to
use the wait mode of pg_ctl promote, and remove all those polling
queries...
--
Michael
On 4/27/17 08:41, Michael Paquier wrote:
+$node_slave->promote;
+$node_slave->poll_query_until('postgres',
+ "SELECT NOT pg_is_in_recovery()")
+ or die "Timed out while waiting for promotion of standby";
This reminds me that we should really switch PostgresNode::promote to
use the wait mode of pg_ctl promote, and remove all those polling
queries...
I was going to say: This should all be obsolete already, because pg_ctl
promote waits by default.
However: Failure to complete promotion within the waiting time does not
lead to an error exit, so you will not get a failure if the promotion
does not finish. This is probably a mistake. Looking around pg_ctl, I
found that this was handled seemingly inconsistently in do_start(), but
do_stop() errors when it does not complete.
Possible patches for this attached.
Perhaps we need a separate exit code in pg_ctl to distinguish general
errors from did not finish within timeout?
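The behavior under discussion can be pictured as a bounded wait loop: poll until the operation (start, promotion) finishes, and give up after the timeout. The point of Peter's proposal is the last step, where giving up should produce a nonzero exit status instead of a silent 0. The following is a minimal sketch under stated assumptions, not pg_ctl's code: `WaitResult`, `wait_for_condition`, `exit_status_for`, `FakeProbe` and `fake_probe` are hypothetical names, and the one-second poll interval is an assumption chosen for simplicity.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical outcome of a bounded wait; not pg_ctl's actual enum. */
typedef enum
{
	WAIT_DONE,					/* operation finished within the timeout */
	WAIT_TIMED_OUT				/* gave up; operation may still be running */
} WaitResult;

/*
 * Poll check(arg) once per second for up to wait_seconds seconds.
 * check() returns true once the operation (for example, promotion)
 * has completed.
 */
static WaitResult
wait_for_condition(bool (*check) (void *), void *arg, int wait_seconds)
{
	for (int i = 0; i < wait_seconds; i++)
	{
		if (check(arg))
			return WAIT_DONE;
		sleep(1);
	}
	/* One last look after the final sleep (or immediately if 0 seconds). */
	return check(arg) ? WAIT_DONE : WAIT_TIMED_OUT;
}

/*
 * Map the outcome to a process exit status.  Treating a timeout as a
 * failure (1) rather than success (0) is the change proposed here; a
 * separate timeout-specific exit code is left as an open question.
 */
static int
exit_status_for(WaitResult result)
{
	return (result == WAIT_DONE) ? 0 : 1;
}

/* Hypothetical probe for demonstration: reports success on its Nth call. */
typedef struct
{
	int			calls;
	int			succeed_on;
} FakeProbe;

static bool
fake_probe(void *arg)
{
	FakeProbe  *p = (FakeProbe *) arg;

	return ++p->calls >= p->succeed_on;
}
```

Note that, as the proposed documentation text says, a timeout only means the caller stopped waiting; the server may still finish the operation afterwards.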
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-pg_ctl-Make-failure-to-complete-operation-a-nonzero-.patch
From 366994e04c277c25cb8e476378bb5af8e673ba58 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 1 May 2017 12:10:17 -0400
Subject: [PATCH 1/2] pg_ctl: Make failure to complete operation a nonzero exit
If an operation being waited for does not complete within the timeout,
then exit with a nonzero exit status. This was previously handled
inconsistently.
---
doc/src/sgml/ref/pg_ctl-ref.sgml | 7 +++++++
src/bin/pg_ctl/pg_ctl.c | 8 ++++++--
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index 71e52c4c35..dcf9628de0 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -412,6 +412,13 @@ <title>Options</title>
<command>pg_ctl</command> returns an exit code based on the
success of the startup or shutdown.
</para>
+
+ <para>
+ If the operation does not complete within the timeout (see
+ option <option>-t</option>), then <command>pg_ctl</command> exits with
+ a nonzero exit status. But the operation might continue in the
+ background and eventually succeed. XXX
+ </para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index f34dd28c6e..8c65d89370 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -919,7 +919,9 @@ do_start(void)
break;
case PQPING_REJECT:
print_msg(_(" stopped waiting\n"));
- print_msg(_("server is still starting up\n"));
+ write_stderr(_("%s: server did not start in time\n"),
+ progname);
+ exit(1);
break;
case PQPING_NO_RESPONSE:
print_msg(_(" stopped waiting\n"));
@@ -1247,7 +1249,9 @@ do_promote(void)
else
{
print_msg(_(" stopped waiting\n"));
- print_msg(_("server is still promoting\n"));
+ write_stderr(_("%s: server did not promote in time\n"),
+ progname);
+ exit(1);
}
}
else
--
2.12.2
0002-Remove-unnecessary-pg_is_in_recovery-calls-in-tests.patch
From 25be31719c8011cc1030b15661da31c56774b9e2 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 1 May 2017 12:11:25 -0400
Subject: [PATCH 2/2] Remove unnecessary pg_is_in_recovery calls in tests
Since pg_ctl promote already waits for recovery to end, these calls are
obsolete.
---
src/test/modules/commit_ts/t/003_standby_2.pl | 1 -
src/test/recovery/t/008_fsm_truncation.pl | 3 ---
src/test/recovery/t/009_twophase.pl | 9 ---------
src/test/recovery/t/010_logical_decoding_timelines.pl | 3 ---
src/test/recovery/t/012_subtransactions.pl | 9 ---------
5 files changed, 25 deletions(-)
diff --git a/src/test/modules/commit_ts/t/003_standby_2.pl b/src/test/modules/commit_ts/t/003_standby_2.pl
index d7898b8355..fad28501cf 100644
--- a/src/test/modules/commit_ts/t/003_standby_2.pl
+++ b/src/test/modules/commit_ts/t/003_standby_2.pl
@@ -55,7 +55,6 @@
$master->restart;
system_or_bail('pg_ctl', '-D', $standby->data_dir, 'promote');
-$standby->poll_query_until('postgres', "SELECT pg_is_in_recovery() <> true");
$standby->safe_psql('postgres', "create table t11()");
my $standby_ts = $standby->safe_psql('postgres',
diff --git a/src/test/recovery/t/008_fsm_truncation.pl b/src/test/recovery/t/008_fsm_truncation.pl
index 8aa8a4fe82..c189bcd7f7 100644
--- a/src/test/recovery/t/008_fsm_truncation.pl
+++ b/src/test/recovery/t/008_fsm_truncation.pl
@@ -78,9 +78,6 @@
# Promote the standby
$node_standby->promote;
-$node_standby->poll_query_until('postgres',
- "SELECT NOT pg_is_in_recovery()")
- or die "Timed out while waiting for promotion of standby";
$node_standby->psql('postgres', 'checkpoint');
# Restart to discard in-memory copy of FSM
diff --git a/src/test/recovery/t/009_twophase.pl b/src/test/recovery/t/009_twophase.pl
index 73103252a7..2079ae8d7d 100644
--- a/src/test/recovery/t/009_twophase.pl
+++ b/src/test/recovery/t/009_twophase.pl
@@ -180,9 +180,6 @@
PREPARE TRANSACTION 'xact_009_1';");
$node_master->teardown_node;
$node_slave->promote;
-$node_slave->poll_query_until('postgres',
- "SELECT NOT pg_is_in_recovery()")
- or die "Timed out while waiting for promotion of standby";
$psql_rc = $node_slave->psql('postgres', "COMMIT PREPARED 'xact_009_1'");
is($psql_rc, '0', "Restore of prepared transaction on promoted slave");
@@ -211,9 +208,6 @@
$node_master->stop;
$node_slave->restart;
$node_slave->promote;
-$node_slave->poll_query_until('postgres',
- "SELECT NOT pg_is_in_recovery()")
- or die "Timed out while waiting for promotion of standby";
$node_slave->psql('postgres', "SELECT count(*) FROM pg_prepared_xacts",
stdout => \$psql_out);
@@ -245,9 +239,6 @@
$node_slave->teardown_node;
$node_slave->start;
$node_slave->promote;
-$node_slave->poll_query_until('postgres',
- "SELECT NOT pg_is_in_recovery()")
- or die "Timed out while waiting for promotion of standby";
$node_slave->psql('postgres', "SELECT count(*) FROM pg_prepared_xacts",
stdout => \$psql_out);
diff --git a/src/test/recovery/t/010_logical_decoding_timelines.pl b/src/test/recovery/t/010_logical_decoding_timelines.pl
index cdddb4d13d..7d97456e4a 100644
--- a/src/test/recovery/t/010_logical_decoding_timelines.pl
+++ b/src/test/recovery/t/010_logical_decoding_timelines.pl
@@ -129,9 +129,6 @@
$node_master->stop('immediate');
$node_replica->promote;
-print "waiting for replica to come up\n";
-$node_replica->poll_query_until('postgres',
- "SELECT NOT pg_is_in_recovery();");
$node_replica->safe_psql('postgres',
"INSERT INTO decoding(blah) VALUES ('after failover');");
diff --git a/src/test/recovery/t/012_subtransactions.pl b/src/test/recovery/t/012_subtransactions.pl
index 5e02c28cb8..caafe5348e 100644
--- a/src/test/recovery/t/012_subtransactions.pl
+++ b/src/test/recovery/t/012_subtransactions.pl
@@ -97,9 +97,6 @@
is($psql_out, '8128', "Visible");
$node_master->stop;
$node_slave->promote;
-$node_slave->poll_query_until('postgres',
- "SELECT NOT pg_is_in_recovery()")
- or die "Timed out while waiting for promotion of standby";
$node_slave->psql('postgres', "SELECT coalesce(sum(id),-1) FROM t_012_tbl",
stdout => \$psql_out);
@@ -141,9 +138,6 @@
is($psql_out, '-1', "Not visible");
$node_master->stop;
$node_slave->promote;
-$node_slave->poll_query_until('postgres',
- "SELECT NOT pg_is_in_recovery()")
- or die "Timed out while waiting for promotion of standby";
$node_slave->psql('postgres', "SELECT coalesce(sum(id),-1) FROM t_012_tbl",
stdout => \$psql_out);
@@ -174,9 +168,6 @@
is($psql_out, '-1', "Not visible");
$node_master->stop;
$node_slave->promote;
-$node_slave->poll_query_until('postgres',
- "SELECT NOT pg_is_in_recovery()")
- or die "Timed out while waiting for promotion of standby";
$node_slave->psql('postgres', "SELECT coalesce(sum(id),-1) FROM t_012_tbl",
stdout => \$psql_out);
--
2.12.2
On 5/1/17 12:19, Peter Eisentraut wrote:
On 4/27/17 08:41, Michael Paquier wrote:
+$node_slave->promote;
+$node_slave->poll_query_until('postgres',
+ "SELECT NOT pg_is_in_recovery()")
+ or die "Timed out while waiting for promotion of standby";
This reminds me that we should really switch PostgresNode::promote to
use the wait mode of pg_ctl promote, and remove all those polling
queries...
I was going to say: This should all be obsolete already, because pg_ctl
promote waits by default.
However: Failure to complete promotion within the waiting time does not
lead to an error exit, so you will not get a failure if the promotion
does not finish. This is probably a mistake. Looking around pg_ctl, I
found that this was handled seemingly inconsistently in do_start(), but
do_stop() errors when it does not complete.
Possible patches for this attached.
Perhaps we need a separate exit code in pg_ctl to distinguish general
errors from did not finish within timeout?
I was going to hold this back for PG11, but since we're now doing some
other tweaks in pg_ctl, it might be useful to add this too. Thoughts?
--
Peter Eisentraut http://www.2ndQuadrant.com/
Attachments:
v2-0001-pg_ctl-Make-failure-to-complete-operation-a-nonze.patch
From 67707d541a2d9e088109385c8fa1eced8af83d54 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 1 May 2017 12:10:17 -0400
Subject: [PATCH v2 1/2] pg_ctl: Make failure to complete operation a nonzero
exit
If an operation being waited for does not complete within the timeout,
then exit with a nonzero exit status. This was previously handled
inconsistently.
---
doc/src/sgml/ref/pg_ctl-ref.sgml | 7 +++++++
src/bin/pg_ctl/pg_ctl.c | 8 ++++++--
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index 71e52c4c35..12fa011c4e 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -412,6 +412,13 @@ <title>Options</title>
<command>pg_ctl</command> returns an exit code based on the
success of the startup or shutdown.
</para>
+
+ <para>
+ If the operation does not complete within the timeout (see
+ option <option>-t</option>), then <command>pg_ctl</command> exits with
+ a nonzero exit status. But note that the operation might continue in
+ the background and eventually succeed.
+ </para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 0c65196bda..4e02c4cea1 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -840,7 +840,9 @@ do_start(void)
break;
case POSTMASTER_STILL_STARTING:
print_msg(_(" stopped waiting\n"));
- print_msg(_("server is still starting up\n"));
+ write_stderr(_("%s: server did not start in time\n"),
+ progname);
+ exit(1);
break;
case POSTMASTER_FAILED:
print_msg(_(" stopped waiting\n"));
@@ -1166,7 +1168,9 @@ do_promote(void)
else
{
print_msg(_(" stopped waiting\n"));
- print_msg(_("server is still promoting\n"));
+ write_stderr(_("%s: server did not promote in time\n"),
+ progname);
+ exit(1);
}
}
else
--
2.13.1
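With the v2 patch applied, pg_ctl exits with status 1 when the server does not start or promote within the timeout, so a caller can rely on the exit status alone; that is what makes the polling queries removed by the second patch redundant. A minimal C sketch of the caller side, assuming a POSIX system() and /bin/sh; `command_exit_status` and `run_or_die` are hypothetical helpers, loosely modelled on the TAP tests' system_or_bail, and are not part of PostgreSQL.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Run a shell command and return its exit status, or -1 if no child
 * process could be created or the command did not exit normally
 * (for example, it was killed by a signal).
 */
static int
command_exit_status(const char *cmd)
{
	int			rc = system(cmd);

	if (rc == -1 || !WIFEXITED(rc))
		return -1;
	return WEXITSTATUS(rc);
}

/* Abort the caller unless the command exited with status 0. */
static void
run_or_die(const char *cmd)
{
	int			status = command_exit_status(cmd);

	if (status != 0)
	{
		fprintf(stderr, "command \"%s\" failed with status %d\n", cmd, status);
		exit(1);
	}
}
```

A call such as `run_or_die("pg_ctl -D /path/to/standby promote -w -t 60")` (the data directory path is a placeholder) would then fail loudly on a promotion timeout instead of continuing against a server that is still in recovery.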
v2-0002-Remove-unnecessary-pg_is_in_recovery-calls-in-tes.patch
From b30b7d96161a2e27d80cc96073b44c5266c2b751 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 1 May 2017 12:11:25 -0400
Subject: [PATCH v2 2/2] Remove unnecessary pg_is_in_recovery calls in tests
Since pg_ctl promote already waits for recovery to end, these calls are
obsolete.
---
src/test/modules/commit_ts/t/003_standby_2.pl | 1 -
src/test/recovery/t/008_fsm_truncation.pl | 2 --
src/test/recovery/t/009_twophase.pl | 6 ------
src/test/recovery/t/010_logical_decoding_timelines.pl | 3 ---
src/test/recovery/t/012_subtransactions.pl | 6 ------
5 files changed, 18 deletions(-)
diff --git a/src/test/modules/commit_ts/t/003_standby_2.pl b/src/test/modules/commit_ts/t/003_standby_2.pl
index 2fd561115c..c3000f5b4c 100644
--- a/src/test/modules/commit_ts/t/003_standby_2.pl
+++ b/src/test/modules/commit_ts/t/003_standby_2.pl
@@ -55,7 +55,6 @@
$master->restart;
system_or_bail('pg_ctl', '-D', $standby->data_dir, 'promote');
-$standby->poll_query_until('postgres', "SELECT pg_is_in_recovery() <> true");
$standby->safe_psql('postgres', "create table t11()");
my $standby_ts = $standby->safe_psql('postgres',
diff --git a/src/test/recovery/t/008_fsm_truncation.pl b/src/test/recovery/t/008_fsm_truncation.pl
index 56eecf722c..ddab464a97 100644
--- a/src/test/recovery/t/008_fsm_truncation.pl
+++ b/src/test/recovery/t/008_fsm_truncation.pl
@@ -83,8 +83,6 @@
# Promote the standby
$node_standby->promote;
-$node_standby->poll_query_until('postgres', "SELECT NOT pg_is_in_recovery()")
- or die "Timed out while waiting for promotion of standby";
$node_standby->psql('postgres', 'checkpoint');
# Restart to discard in-memory copy of FSM
diff --git a/src/test/recovery/t/009_twophase.pl b/src/test/recovery/t/009_twophase.pl
index 13b4a04205..376cb09a25 100644
--- a/src/test/recovery/t/009_twophase.pl
+++ b/src/test/recovery/t/009_twophase.pl
@@ -195,8 +195,6 @@
PREPARE TRANSACTION 'xact_009_1';");
$node_master->teardown_node;
$node_slave->promote;
-$node_slave->poll_query_until('postgres', "SELECT NOT pg_is_in_recovery()")
- or die "Timed out while waiting for promotion of standby";
$psql_rc = $node_slave->psql('postgres', "COMMIT PREPARED 'xact_009_1'");
is($psql_rc, '0', "Restore of prepared transaction on promoted slave");
@@ -227,8 +225,6 @@
$node_master->stop;
$node_slave->restart;
$node_slave->promote;
-$node_slave->poll_query_until('postgres', "SELECT NOT pg_is_in_recovery()")
- or die "Timed out while waiting for promotion of standby";
$node_slave->psql(
'postgres',
@@ -264,8 +260,6 @@
$node_slave->teardown_node;
$node_slave->start;
$node_slave->promote;
-$node_slave->poll_query_until('postgres', "SELECT NOT pg_is_in_recovery()")
- or die "Timed out while waiting for promotion of standby";
$node_slave->psql(
'postgres',
diff --git a/src/test/recovery/t/010_logical_decoding_timelines.pl b/src/test/recovery/t/010_logical_decoding_timelines.pl
index 65f6ba2fca..98418128d2 100644
--- a/src/test/recovery/t/010_logical_decoding_timelines.pl
+++ b/src/test/recovery/t/010_logical_decoding_timelines.pl
@@ -135,9 +135,6 @@
$node_master->stop('immediate');
$node_replica->promote;
-print "waiting for replica to come up\n";
-$node_replica->poll_query_until('postgres',
- "SELECT NOT pg_is_in_recovery();");
$node_replica->safe_psql('postgres',
"INSERT INTO decoding(blah) VALUES ('after failover');");
diff --git a/src/test/recovery/t/012_subtransactions.pl b/src/test/recovery/t/012_subtransactions.pl
index 30677e1675..c99733cad7 100644
--- a/src/test/recovery/t/012_subtransactions.pl
+++ b/src/test/recovery/t/012_subtransactions.pl
@@ -109,8 +109,6 @@
is($psql_out, '8128', "Visible");
$node_master->stop;
$node_slave->promote;
-$node_slave->poll_query_until('postgres', "SELECT NOT pg_is_in_recovery()")
- or die "Timed out while waiting for promotion of standby";
$node_slave->psql(
'postgres',
@@ -162,8 +160,6 @@
is($psql_out, '-1', "Not visible");
$node_master->stop;
$node_slave->promote;
-$node_slave->poll_query_until('postgres', "SELECT NOT pg_is_in_recovery()")
- or die "Timed out while waiting for promotion of standby";
$node_slave->psql(
'postgres',
@@ -205,8 +201,6 @@
is($psql_out, '-1', "Not visible");
$node_master->stop;
$node_slave->promote;
-$node_slave->poll_query_until('postgres', "SELECT NOT pg_is_in_recovery()")
- or die "Timed out while waiting for promotion of standby";
$node_slave->psql(
'postgres',
--
2.13.1
On Sat, Jul 1, 2017 at 4:47 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 5/1/17 12:19, Peter Eisentraut wrote:
However: Failure to complete promotion within the waiting time does not
lead to an error exit, so you will not get a failure if the promotion
does not finish. This is probably a mistake. Looking around pg_ctl, I
found that this was handled seemingly inconsistently in do_start(), but
do_stop() errors when it does not complete.
This inconsistency could be treated as a bug, though changing such
an old behavior in back branches would be risky. So +1 for only HEAD
with such a change, and pg_ctl promote -w is new in 10.
Possible patches for this attached.
Perhaps we need a separate exit code in pg_ctl to distinguish general
errors from did not finish within timeout?
I would treat that as a separate item for 11, but that's as far as my
opinion goes. Per this link in pg_ctl.c the error code ought to be 4:
https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
I was going to hold this back for PG11, but since we're now doing some
other tweaks in pg_ctl, it might be useful to add this too. Thoughts?
The use of 0 as exit code for the new promote -w if timeout is reached
looks like an open item to me. Cleaning up the poll queries after
promotion would be nice to see as well.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 7/2/17 20:28, Michael Paquier wrote:
I was going to hold this back for PG11, but since we're now doing some
other tweaks in pg_ctl, it might be useful to add this too. Thoughts?
The use of 0 as exit code for the new promote -w if timeout is reached
looks like an open item to me. Cleaning up the poll queries after
promotion would be nice to see as well.
committed
--
Peter Eisentraut http://www.2ndQuadrant.com/
On Thu, Jul 6, 2017 at 2:41 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 7/2/17 20:28, Michael Paquier wrote:
I was going to hold this back for PG11, but since we're now doing some
other tweaks in pg_ctl, it might be useful to add this too. Thoughts?
The use of 0 as exit code for the new promote -w if timeout is reached
looks like an open item to me. Cleaning up the poll queries after
promotion would be nice to see as well.
committed
Thanks for finishing the cleanup.
--
Michael