pg_ctl promote wait
It would be nice if pg_ctl promote supported the -w (wait) option.
How could pg_ctl determine when the promotion has finished?
We could query pg_is_in_recovery(), but that would require database
access, which we got rid of in pg_ctl.
We could check when recovery.conf is gone or recovery.done appears.
This could work, but I think some people are trying to get rid of these
files, so building a feature on that might be a problem? (I think the
latest news on that was that the files might be different in this
hypothetical future but there would still be a file to prevent
re-recovery on restart.)
Other ideas?
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 18 February 2016 at 02:47, Peter Eisentraut <peter_e@gmx.net> wrote:
It would be nice if pg_ctl promote supported the -w (wait) option.
How could pg_ctl determine when the promotion has finished?
We could query pg_is_in_recovery(), but that would require database
access, which we got rid of in pg_ctl.We could check when recovery.conf is gone or recovery.done appears.
This could work, but I think some people are trying to get rid of these
files, so building a feature on that might be a problem? (I think the
latest news on that was that the files might be different in this
hypothetical future but there would still be a file to prevent
re-recovery on restart.)Other ideas?
We use a trigger file under the covers, so a promotion complete file seems
useful also, but crappy.
Perhaps we should have a Server Status file that shows Standby or Master,
obviously not updated on crash.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2016-02-17 21:47:50 -0500, Peter Eisentraut wrote:
It would be nice if pg_ctl promote supported the -w (wait) option.
How could pg_ctl determine when the promotion has finished?
How about looking into pg_control? ControlFileData->state ought to have
the correct information.
Regards,
Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 18 February 2016 at 08:33, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2016-02-17 21:47:50 -0500, Peter Eisentraut wrote:
It would be nice if pg_ctl promote supported the -w (wait) option.
How could pg_ctl determine when the promotion has finished?
How about looking into pg_control? ControlFileData->state ought to have
the correct information.
+1
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Feb 18, 2016 at 5:45 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 18 February 2016 at 08:33, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2016-02-17 21:47:50 -0500, Peter Eisentraut wrote:
It would be nice if pg_ctl promote supported the -w (wait) option.
+1
How could pg_ctl determine when the promotion has finished?
How about looking into pg_control? ControlFileData->state ought to have
the correct information.+1
One concern is that there can be a "time" after the pg_control's state
is changed to DB_IN_PRODUCTION and before the server is able to
start accepting normal (not read-only) connections. So if users try to
start write transaction just after pg_ctl promote -w ends, they might
get an error because the server is still in recovery, i.e., the startup
process is running.
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2/18/16 3:33 AM, Andres Freund wrote:
Hi,
On 2016-02-17 21:47:50 -0500, Peter Eisentraut wrote:
It would be nice if pg_ctl promote supported the -w (wait) option.
How could pg_ctl determine when the promotion has finished?
How about looking into pg_control? ControlFileData->state ought to have
the correct information.
Is it safe to read pg_control externally without a lock? pg_controldata
will just report a CRC error and proceed, and if you're not sure you can
just run it again. But if a promotion fails every so often because of
concurrent pg_control writes, that would make this feature annoying.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2/19/16 10:06 AM, Fujii Masao wrote:
One concern is that there can be a "time" after the pg_control's state
is changed to DB_IN_PRODUCTION and before the server is able to
start accepting normal (not read-only) connections. So if users try to
start write transaction just after pg_ctl promote -w ends, they might
get an error because the server is still in recovery, i.e., the startup
process is running.
I think that window would be acceptable.
If not, then the way to deal with it would seem to be to create an
entirely new mechanism to communicate with pg_ctl (e.g., a file) and
call that at the very end of StartupXLOG(). I'm not sure that that is
worth it.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-02-19 13:48:52 -0500, Peter Eisentraut wrote:
Is it safe to read pg_control externally without a lock? pg_controldata
will just report a CRC error and proceed, and if you're not sure you can
just run it again. But if a promotion fails every so often because of
concurrent pg_control writes, that would make this feature annoying.
Yes, the OS should give sufficient guarantees here:
If write() is interrupted by a signal before it writes any data, it shall return −1 with errno set to [EINTR].
If write() is interrupted by a signal after it successfully writes some data, it shall return the number of bytes written.
If the value of nbyte is greater than {SSIZE_MAX}, the result is implementation-defined.
After a write() to a regular file has successfully returned:
* Any successful read() from each byte position in the file that was modified by that write shall return the data specified
by the write() for that position until such byte positions are again modified.
We currently assume that all "small" writes succeed in one go (and throw
errors if not). Thus the guarantees by read/write are sufficient.
Regards,
Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-02-20 00:06:09 +0900, Fujii Masao wrote:
One concern is that there can be a "time" after the pg_control's state
is changed to DB_IN_PRODUCTION and before the server is able to
start accepting normal (not read-only) connections. So if users try to
start write transaction just after pg_ctl promote -w ends, they might
get an error because the server is still in recovery, i.e., the startup
process is running.
It might make sense to have one more state type, which is used when
ending recovery, but before full access is allowed.
- Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Peter Eisentraut <peter_e@gmx.net> writes:
Is it safe to read pg_control externally without a lock? pg_controldata
will just report a CRC error and proceed, and if you're not sure you can
just run it again. But if a promotion fails every so often because of
concurrent pg_control writes, that would make this feature annoying.
Just retry the read till you don't get a CRC error.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Peter Eisentraut <peter_e@gmx.net> writes:
On 2/19/16 10:06 AM, Fujii Masao wrote:
One concern is that there can be a "time" after the pg_control's state
is changed to DB_IN_PRODUCTION and before the server is able to
start accepting normal (not read-only) connections. So if users try to
start write transaction just after pg_ctl promote -w ends, they might
get an error because the server is still in recovery, i.e., the startup
process is running.
I think that window would be acceptable.
If not, then the way to deal with it would seem to be to create an
entirely new mechanism to communicate with pg_ctl (e.g., a file) and
call that at the very end of StartupXLOG(). I'm not sure that that is
worth it.
I see no need for an additional mechanism. Just watch pg_control until
you see DB_IN_PRODUCTION state there, then switch over to the same
connection probing that "pg_ctl start -w" uses.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-02-19 15:09:58 -0500, Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
On 2/19/16 10:06 AM, Fujii Masao wrote:
One concern is that there can be a "time" after the pg_control's state
is changed to DB_IN_PRODUCTION and before the server is able to
start accepting normal (not read-only) connections. So if users try to
start write transaction just after pg_ctl promote -w ends, they might
get an error because the server is still in recovery, i.e., the startup
process is running.I think that window would be acceptable.
If not, then the way to deal with it would seem to be to create an
entirely new mechanism to communicate with pg_ctl (e.g., a file) and
call that at the very end of StartupXLOG(). I'm not sure that that is
worth it.I see no need for an additional mechanism. Just watch pg_control until
you see DB_IN_PRODUCTION state there, then switch over to the same
connection probing that "pg_ctl start -w" uses.
That's afaics not sufficient if the standby was using hot standby, as
that'll let -w succeed immediately, no?
Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
On 2016-02-19 15:09:58 -0500, Tom Lane wrote:
I see no need for an additional mechanism. Just watch pg_control until
you see DB_IN_PRODUCTION state there, then switch over to the same
connection probing that "pg_ctl start -w" uses.
That's afaics not sufficient if the standby was using hot standby, as
that'll let -w succeed immediately, no?
Oh, now I get Fujii-san's point. Yes, that wouldn't prove anything
about whether you could do a write transaction immediately.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2/19/16 3:09 PM, Tom Lane wrote:
I see no need for an additional mechanism. Just watch pg_control until
you see DB_IN_PRODUCTION state there, then switch over to the same
connection probing that "pg_ctl start -w" uses.
Here is a patch set around that idea.
The subsequent discussion mentioned that there might still be a window
between end of waiting and when read-write queries would be accepted. I
don't know how big that window would be in practice and would be
interested in some testing and feedback.
Attachments:
0001-pg_ctl-Add-tests-for-promote-action.patchapplication/x-patch; name=0001-pg_ctl-Add-tests-for-promote-action.patchDownload
From cb5d4a63620636d4043d1a85acf7fcfdace73b1d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Sun, 28 Feb 2016 20:21:54 -0500
Subject: [PATCH 1/3] pg_ctl: Add tests for promote action
---
src/bin/pg_ctl/t/003_promote.pl | 62 +++++++++++++++++++++++++++++++++++++++++
src/test/perl/TestLib.pm | 11 ++++++++
2 files changed, 73 insertions(+)
create mode 100644 src/bin/pg_ctl/t/003_promote.pl
diff --git a/src/bin/pg_ctl/t/003_promote.pl b/src/bin/pg_ctl/t/003_promote.pl
new file mode 100644
index 0000000..64d72e0
--- /dev/null
+++ b/src/bin/pg_ctl/t/003_promote.pl
@@ -0,0 +1,62 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 13;
+
+my $tempdir = TestLib::tempdir;
+#my $tempdir_short = TestLib::tempdir_short;
+
+command_fails_like([ 'pg_ctl', 'promote', '-D', "$tempdir/nonexistent" ],
+ qr/directory .* does not exist/,
+ 'pg_ctl promote with nonexistent directory');
+
+my $node_primary = get_new_node('primary');
+$node_primary->init;
+$node_primary->append_conf(
+ "postgresql.conf", qq(
+wal_level = hot_standby
+max_wal_senders = 2
+wal_keep_segments = 20
+hot_standby = on
+)
+ );
+
+command_fails_like([ 'pg_ctl', 'promote', '-D', $node_primary->data_dir ],
+ qr/PID file .* does not exist/,
+ 'pg_ctl promote of not running instance fails');
+
+$node_primary->start;
+
+command_fails_like([ 'pg_ctl', 'promote', '-D', $node_primary->data_dir ],
+ qr/not in standby mode/,
+ 'pg_ctl promote of primary instance fails');
+
+my $node_standby = get_new_node('standby');
+$node_primary->backup('my_backup');
+$node_standby->init_from_backup($node_primary, 'my_backup');
+my $connstr_primary = $node_primary->connstr('postgres');
+
+$node_standby->append_conf(
+ "recovery.conf", qq(
+primary_conninfo='$connstr_primary'
+standby_mode=on
+recovery_target_timeline='latest'
+)
+ );
+
+$node_standby->start;
+
+$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT pg_is_in_recovery()'],
+ qr/^t$/,
+ 'standby is in recovery');
+
+command_ok([ 'pg_ctl', 'promote', '-D', $node_standby->data_dir ],
+ 'pg_ctl promote of standby runs');
+
+sleep 3; # needs a moment to react
+
+$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT pg_is_in_recovery()'],
+ qr/^f$/,
+ 'promoted standby is not in recovery');
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index 3d11cbb..dd275cf 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -34,6 +34,7 @@ our @EXPORT = qw(
program_version_ok
program_options_handling_ok
command_like
+ command_fails_like
$windows_os
);
@@ -262,4 +263,14 @@ sub command_like
like($stdout, $expected_stdout, "$test_name: matches");
}
+sub command_fails_like
+{
+ my ($cmd, $expected_stderr, $test_name) = @_;
+ my ($stdout, $stderr);
+ print("# Running: " . join(" ", @{$cmd}) . "\n");
+ my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
+ ok(!$result, "@$cmd exit code not 0");
+ like($stderr, $expected_stderr, "$test_name: matches");
+}
+
1;
--
2.7.2
0002-pg_ctl-Detect-current-standby-state-from-pg_control.patchapplication/x-patch; name=0002-pg_ctl-Detect-current-standby-state-from-pg_control.patchDownload
From 7c1b6e94f5e3beb9e558d2af7098940d4475fe11 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Sun, 28 Feb 2016 20:21:54 -0500
Subject: [PATCH 2/3] pg_ctl: Detect current standby state from pg_control
pg_ctl used to determine whether a server was in standby mode by looking
for a recovery.conf file. With this change, it instead looks into
pg_control, which is potentially more accurate. There are also
occasional discussions about removing recovery.conf, so this removes one
dependency.
---
src/bin/pg_ctl/pg_ctl.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 57 insertions(+), 3 deletions(-)
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index bae6c22..c38c479 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -19,6 +19,7 @@
#include "postgres_fe.h"
+#include "catalog/pg_control.h"
#include "libpq-fe.h"
#include "pqexpbuffer.h"
@@ -158,6 +159,8 @@ static bool postmaster_is_alive(pid_t pid);
static void unlimit_core_size(void);
#endif
+static DBState get_control_dbstate(void);
+
#ifdef WIN32
static void
@@ -1168,7 +1171,6 @@ do_promote(void)
{
FILE *prmfile;
pgpid_t pid;
- struct stat statbuf;
pid = get_pgpid(false);
@@ -1187,8 +1189,7 @@ do_promote(void)
exit(1);
}
- /* If recovery.conf doesn't exist, the server is not in standby mode */
- if (stat(recovery_file, &statbuf) != 0)
+ if (get_control_dbstate() != DB_IN_ARCHIVE_RECOVERY)
{
write_stderr(_("%s: cannot promote server; "
"server is not in standby mode\n"),
@@ -2115,6 +2116,59 @@ adjust_data_dir(void)
}
+static DBState
+get_control_dbstate(void)
+{
+ char *control_file_path;
+ ControlFileData control_file_data;
+
+ control_file_path = psprintf("%s/global/pg_control", pg_data);
+
+ for (;;)
+ {
+ int fd;
+ pg_crc32c crc;
+
+ if ((fd = open(control_file_path, O_RDONLY | PG_BINARY, 0)) == -1)
+ {
+ fprintf(stderr, _("%s: could not open file \"%s\" for reading: %s\n"),
+ progname, control_file_path, strerror(errno));
+ exit(1);
+ }
+
+ if (read(fd, &control_file_data, sizeof(control_file_data)) != sizeof(control_file_data))
+ {
+ fprintf(stderr, _("%s: could not read file \"%s\": %s\n"),
+ progname, control_file_path, strerror(errno));
+ exit(1);
+ }
+ close(fd);
+
+ INIT_CRC32C(crc);
+ COMP_CRC32C(crc,
+ &control_file_data,
+ offsetof(ControlFileData, crc));
+ FIN_CRC32C(crc);
+
+ if (EQ_CRC32C(crc, control_file_data.crc))
+ break;
+
+ if (wait_seconds > 0)
+ {
+ sleep(1);
+ wait_seconds--;
+ continue;
+ }
+
+ fprintf(stderr, _("%s: control file \"%s\" appears to be corrupt\n"),
+ progname, control_file_path);
+ exit(1);
+ }
+
+ return control_file_data.state;
+}
+
+
int
main(int argc, char **argv)
{
--
2.7.2
0003-pg_ctl-Add-wait-option-to-promote-action.patchapplication/x-patch; name=0003-pg_ctl-Add-wait-option-to-promote-action.patchDownload
From fc1f402ed1c60470322133ab30b0034357163ff5 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Sun, 28 Feb 2016 20:21:54 -0500
Subject: [PATCH 3/3] pg_ctl: Add wait option to promote action
When waiting is selected for the promote action, look into pg_control
until the state changes, then use the PQping-based waiting until the
server is reachable.
---
doc/src/sgml/ref/pg_ctl-ref.sgml | 29 ++++++++++++++++++++------
src/bin/pg_ctl/pg_ctl.c | 45 ++++++++++++++++++++++++++++------------
src/bin/pg_ctl/t/003_promote.pl | 28 ++++++++++++++++++++++++-
3 files changed, 82 insertions(+), 20 deletions(-)
diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index 6ceb781..a00c355 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -91,6 +91,8 @@
<cmdsynopsis>
<command>pg_ctl</command>
<arg choice="plain"><option>promote</option></arg>
+ <arg choice="opt"><option>-w</option></arg>
+ <arg choice="opt"><option>-t</option> <replaceable>seconds</replaceable></arg>
<arg choice="opt"><option>-s</option></arg>
<arg choice="opt"><option>-D</option> <replaceable>datadir</replaceable></arg>
</cmdsynopsis>
@@ -361,8 +363,8 @@ <title>Options</title>
<term><option>--timeout</option></term>
<listitem>
<para>
- The maximum number of seconds to wait when waiting for startup or
- shutdown to complete. Defaults to the value of the
+ The maximum number of seconds to wait when waiting for an operation
+ to complete (see option <option>-w</option>). Defaults to the value of the
<envar>PGCTLTIMEOUT</> environment variable or, if not set, to 60
seconds.
</para>
@@ -383,8 +385,23 @@ <title>Options</title>
<term><option>-w</option></term>
<listitem>
<para>
- Wait for the startup or shutdown to complete.
- Waiting is the default option for shutdowns, but not startups.
+ Wait for an operation to complete. This is supported for the
+ modes <literal>start</literal>, <literal>stop</literal>,
+ <literal>restart</literal>, <literal>promote</literal>,
+ and <literal>register</literal>.
+ </para>
+
+ <para>
+ Waiting is the default option for shutdowns, but not startups,
+ restarts, or promotions. This is mainly for historical reasons; the
+ waiting option is almost always preferable. If waiting is not
+ selected, the requested action is triggered, but there is no feedback
+ about its success. In that case, the server log file or an external
+ monitoring system would have to be used to check the progress and
+ success of the operation.
+ </para>
+
+ <para>
When waiting for startup, <command>pg_ctl</command> repeatedly
attempts to connect to the server.
When waiting for shutdown, <command>pg_ctl</command> waits for
@@ -400,8 +417,8 @@ <title>Options</title>
<term><option>-W</option></term>
<listitem>
<para>
- Do not wait for startup or shutdown to complete. This is the
- default for start and restart modes.
+ Do not wait for an operation to complete. This is the opposite of the
+ option <option>-w</option>.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index c38c479..6c152b1 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -1228,7 +1228,34 @@ do_promote(void)
exit(1);
}
- print_msg(_("server promoting\n"));
+ if (do_wait)
+ {
+ DBState state = DB_STARTUP;
+
+ print_msg(_("waiting for server to promote..."));
+ while (wait_seconds > 0)
+ {
+ state = get_control_dbstate();
+ if (state == DB_IN_PRODUCTION)
+ break;
+
+ print_msg(".");
+ pg_usleep(1000000); /* 1 sec */
+ wait_seconds--;
+ }
+ if (state == DB_IN_PRODUCTION)
+ {
+ print_msg(_(" done\n"));
+ print_msg(_("server promoted\n"));
+ }
+ else
+ {
+ print_msg(_(" stopped waiting\n"));
+ print_msg(_("server is still promoting\n"));
+ }
+ }
+ else
+ print_msg(_("server promoting\n"));
}
@@ -2429,18 +2456,10 @@ main(int argc, char **argv)
if (!wait_set)
{
- switch (ctl_command)
- {
- case RESTART_COMMAND:
- case START_COMMAND:
- do_wait = false;
- break;
- case STOP_COMMAND:
- do_wait = true;
- break;
- default:
- break;
- }
+ if (ctl_command == STOP_COMMAND)
+ do_wait = true;
+ else
+ do_wait = false;
}
if (ctl_command == RELOAD_COMMAND)
diff --git a/src/bin/pg_ctl/t/003_promote.pl b/src/bin/pg_ctl/t/003_promote.pl
index 64d72e0..5c9c31c 100644
--- a/src/bin/pg_ctl/t/003_promote.pl
+++ b/src/bin/pg_ctl/t/003_promote.pl
@@ -3,7 +3,7 @@
use PostgresNode;
use TestLib;
-use Test::More tests => 13;
+use Test::More tests => 20;
my $tempdir = TestLib::tempdir;
#my $tempdir_short = TestLib::tempdir_short;
@@ -60,3 +60,29 @@
$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT pg_is_in_recovery()'],
qr/^f$/,
'promoted standby is not in recovery');
+
+$node_standby = get_new_node('standby2');
+$node_standby->init_from_backup($node_primary, 'my_backup');
+
+$node_standby->append_conf(
+ "recovery.conf", qq(
+primary_conninfo='$connstr_primary'
+standby_mode=on
+recovery_target_timeline='latest'
+)
+ );
+
+$node_standby->start;
+
+$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT pg_is_in_recovery()'],
+ qr/^t$/,
+ 'standby is in recovery');
+
+command_ok([ 'pg_ctl', 'promote', '-w', '-D', $node_standby->data_dir ],
+ 'pg_ctl promote -w of standby runs');
+
+# no wait here
+
+$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT pg_is_in_recovery()'],
+ qr/^f$/,
+ 'promoted standby is not in recovery');
--
2.7.2
On Mon, Feb 29, 2016 at 10:30 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
On 2/19/16 3:09 PM, Tom Lane wrote:
I see no need for an additional mechanism. Just watch pg_control until
you see DB_IN_PRODUCTION state there, then switch over to the same
connection probing that "pg_ctl start -w" uses.Here is a patch set around that idea.
The subsequent discussion mentioned that there might still be a window
between end of waiting and when read-write queries would be accepted. I
don't know how big that window would be in practice and would be
interested in some testing and feedback.
Here is some input for 0001 (useful taken independently):
+$node_primary->append_conf(
+ "postgresql.conf", qq(
+wal_level = hot_standby
+max_wal_senders = 2
+wal_keep_segments = 20
+hot_standby = on
+)
+ );
That's more or less allows_streaming => 1 in $node_primary->init.
+$node_standby->append_conf(
+ "recovery.conf", qq(
+primary_conninfo='$connstr_primary'
+standby_mode=on
+recovery_target_timeline='latest'
+)
+ );
Here you could just use enable_streaming => 1 when calling init_from_backup.
+$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT
pg_is_in_recovery()'],
$node_standby->psql instead of a direct command? The result is
returned directly with the call to the routine.
+$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT
pg_is_in_recovery()'],
+ qr/^f$/,
+ 'promoted standby is not in recovery');
Once again $node_standby->psql?
+sleep 3; # needs a moment to react
+
+$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT
pg_is_in_recovery()'],
+ qr/^f$/,
+ 'promoted standby is not in recovery');
sleep() is something we should try to avoid as much as possible in our
tests. On slow platforms, take hamster, promote is surely going to
take longer than that to be processed by the standby node and put it
out of recovery. I would suggest using
$node_standby->poll_query_until('SELECT pg_is_in_recovery()') to
validate the end of the test.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Feb 29, 2016 at 4:28 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
I would suggest using
$node_standby->poll_query_until('SELECT pg_is_in_recovery()') to
validate the end of the test.
Meh. SELECT NOT pg_is_in_recovery(). This will wait until the query
returns true.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Feb 29, 2016 at 4:29 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Mon, Feb 29, 2016 at 4:28 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:I would suggest using
$node_standby->poll_query_until('SELECT pg_is_in_recovery()') to
validate the end of the test.Meh. SELECT NOT pg_is_in_recovery(). This will wait until the query
returns true.
Here are some comments about 0002
+ if ((fd = open(control_file_path, O_RDONLY | PG_BINARY, 0)) == -1)
+ {
+ fprintf(stderr, _("%s: could not open file \"%s\" for
reading: %s\n"),
+ progname, control_file_path, strerror(errno));
+ exit(1);
+ }
[...]
Most of the logic of get_control_dbstate() is a mimic of the
recently-introduced get_controlfile() in controldata_utils.c of
dc7d70e. I think that we had better use that, and that we had better
emit an error should an incorrect control file be found while running
those pg_ctl commands as the control file present had better have a
correct CRC all the time or something wrong is going on. So this would
lead to this logic for get_control_dbstate():
control_file_data = get_controlfile(pg_data, progname);
res = control_file_data->state;
pfree(control_file_data);
Except that, 0002 is a good thing to have, switching from the presence
of recovery.conf to what is in the control data file is definitely
more robust, a lot of things happen from when recovery.conf is renamed
to recovery.done until WAL is enabled for backends, particularly the
end of recovery checkpoint and the cleanup of the WAL segments of the
previous timeline.
And now for 0003...
+$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT
pg_is_in_recovery()'],
+ qr/^t$/,
+ 'standby is in recovery');
[...]
+$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT
pg_is_in_recovery()'],
+ qr/^f$/,
+ 'promoted standby is not in recovery');
for those two you can use $node_standby->psql_safe to get back a query result.
The subsequent discussion mentioned that there might still be a window
between end of waiting and when read-write queries would be accepted. I
don't know how big that window would be in practice and would be
interested in some testing and feedback.
And so... Based on the previous discussion, there is an interval of
time between the moment the update of the control file is done and the
point where backends are allowed to emit WAL. I am really worrying
about this interval of time actually, as once pg_ctl exits client
applications should be guaranteed to connect to the server but the
current patch would not be failure-proof, and I imagine that
particularly on CPU-constrained environments this is going to become
unstable. Particularly I expect that slow machines are likely going to
fail in the last test of 003_promote.pl as designed (I am away from
home now so I have not been able to test that unfortunately on my own
stuff but that's possible) because pg_is_in_recovery is controlled by
SharedRecoveryInProgress, so it may be possible that
pg_is_in_recovery() returns false while the control file status is
DB_IN_PRODUCTION. The main factor that can contribute to a larger
window is a higher number of 2PC transactions that need to be loaded
back to shared memory after scanning pg_twophase.
If we are going to have a reliable promote wait mode for pg_ctl, I
think that we had better first reduce this window, something that
could be done is to update SharedRecoveryInProgress while holding an
exclusive lock on ControlFileLock, with this flow for example. See for
example the patch attached, we can reduce this window to zero for
backends if some of them refer to ControlFile in shared memory thanks
to ControlFileLock. For clients, there will still be a small window
during which backends could write WAL and the control file status is
ARCHIVE_RECOVERY on disk. If we want to have a reliable promote wait
mode for pg_ctl, I think that we had better do something like the
attached first. Thoughts?
Looking at where is used the shared memory structure of ControlFile,
one thing to worry about is CreateRestartPoint but its code paths are
already using ControlFileLock when looking at the status file, so they
are safe with this logic.
--
Michael
Attachments:
delay-status-update-end-recovery.patchtext/x-patch; charset=US-ASCII; name=delay-status-update-end-recovery.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 00f139a..590385c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7371,12 +7371,6 @@ StartupXLOG(void)
*/
InRecovery = false;
- LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
- ControlFile->state = DB_IN_PRODUCTION;
- ControlFile->time = (pg_time_t) time(NULL);
- UpdateControlFile();
- LWLockRelease(ControlFileLock);
-
/* start the archive_timeout timer running */
XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
@@ -7434,15 +7428,32 @@ StartupXLOG(void)
CompleteCommitTsInitialization();
/*
- * All done. Allow backends to write WAL. (Although the bool flag is
- * probably atomic in itself, we use the info_lck here to ensure that
- * there are no race conditions concerning visibility of other recent
- * updates to shared memory.)
+ * All done with end-of-recovery actions.
+ *
+ * Now, allow backends to write WAL and update the control file status
+ * in consequence. The boolean flag allowing backends to write WAL is
+ * updated while holding ControlFileLock to prevent other backends
+ * to look at an inconsistent state of the control file in shared
+ * memory. There is still a small window during which backends can
+ * write WAL and the control file is still referring to a system not
+ * in DB_IN_PRODUCTION state while looking at the on-disk control
+ * file.
+ *
+ * Also, the boolean flag to allow WAL is probably atomic in itself,
+ * we use the info_lck here to ensure that there are no race conditions
+ * concerning visibility of other recent updates to shared memory.
*/
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+ ControlFile->state = DB_IN_PRODUCTION;
+ ControlFile->time = (pg_time_t) time(NULL);
+
SpinLockAcquire(&XLogCtl->info_lck);
XLogCtl->SharedRecoveryInProgress = false;
SpinLockRelease(&XLogCtl->info_lck);
+ UpdateControlFile();
+ LWLockRelease(ControlFileLock);
+
/*
* If there were cascading standby servers connected to us, nudge any wal
* sender processes to notice that we've been promoted.
Hi Peter,
On 3/9/16 3:08 PM, Michael Paquier wrote:
Here are some comments about 0002
<...>
I think that we had better do something like the attached first.
Thoughts?
It's been a week since Michael reviewed this patch. Could you please
comment on his proposed changes as soon as possible?
Thanks,
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 3/16/16 12:19 PM, David Steele wrote:
Hi Peter,
On 3/9/16 3:08 PM, Michael Paquier wrote:
Here are some comments about 0002
<...>
I think that we had better do something like the attached first.
Thoughts?It's been a week since Michael reviewed this patch. Could you please
comment on his proposed changes as soon as possible?
Bump.
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Mar 23, 2016 at 1:47 AM, David Steele <david@pgmasters.net> wrote:
On 3/16/16 12:19 PM, David Steele wrote:
Hi Peter,
On 3/9/16 3:08 PM, Michael Paquier wrote:
Here are some comments about 0002
<...>
I think that we had better do something like the attached first.
Thoughts?It's been a week since Michael reviewed this patch. Could you please
comment on his proposed changes as soon as possible?Bump.
Feature freeze is getting close by. Once the deadline is reached, I
guess that we had better return the patch.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 04/07/2016 01:22 AM, Michael Paquier wrote:
On Wed, Mar 23, 2016 at 1:47 AM, David Steele <david@pgmasters.net> wrote:
On 3/16/16 12:19 PM, David Steele wrote:
Hi Peter,
On 3/9/16 3:08 PM, Michael Paquier wrote:
Here are some comments about 0002
<...>
I think that we had better do something like the attached first.
Thoughts?It's been a week since Michael reviewed this patch. Could you please
comment on his proposed changes as soon as possible?Bump.
Feature freeze is getting close by. Once the deadline is reached, I
guess that we had better return the patch.
I have closed the patch as returned with feedback. There were a number
of interesting points raised that will need some time for consideration
and testing.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Updated patch set for pg_ctl promote -w, incorporating AFAICT all of the
previous reviews, in particular Michael Paquier's changes to the
StartupXLOG() ordering, and rebasing on top of
src/common/controldata_utils.c.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-Make-command_like-output-more-compact.patchtext/x-patch; name=0001-Make-command_like-output-more-compact.patchDownload
From 371ca61ab87e792864ac2f76f3e0dad7912c247e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Tue, 26 Jul 2016 11:39:43 -0400
Subject: [PATCH 1/5] Make command_like output more compact
Consistently print the test name, not the full command, which can be
quite lenghty and include temporary directory names and other
distracting details.
---
src/test/perl/TestLib.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index 649fd82..c7b3262 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -276,8 +276,8 @@ sub command_like
my ($stdout, $stderr);
print("# Running: " . join(" ", @{$cmd}) . "\n");
my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
- ok($result, "@$cmd exit code 0");
- is($stderr, '', "@$cmd no stderr");
+ ok($result, "$test_name: exit code 0");
+ is($stderr, '', "$test_name: no stderr");
like($stdout, $expected_stdout, "$test_name: matches");
}
--
2.9.2
0002-pg_ctl-Add-tests-for-promote-action.patchtext/x-patch; name=0002-pg_ctl-Add-tests-for-promote-action.patchDownload
From eabd099bf09d9a4815c29c730d568161c87a67ba Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Tue, 26 Jul 2016 10:48:43 -0400
Subject: [PATCH 2/5] pg_ctl: Add tests for promote action
---
src/bin/pg_ctl/t/003_promote.pl | 41 +++++++++++++++++++++++++++++++++++++++++
src/test/perl/TestLib.pm | 11 +++++++++++
2 files changed, 52 insertions(+)
create mode 100644 src/bin/pg_ctl/t/003_promote.pl
diff --git a/src/bin/pg_ctl/t/003_promote.pl b/src/bin/pg_ctl/t/003_promote.pl
new file mode 100644
index 0000000..d7e7b99
--- /dev/null
+++ b/src/bin/pg_ctl/t/003_promote.pl
@@ -0,0 +1,41 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 9;
+
+my $tempdir = TestLib::tempdir;
+
+command_fails_like([ 'pg_ctl', 'promote', '-D', "$tempdir/nonexistent" ],
+ qr/directory .* does not exist/,
+ 'pg_ctl promote with nonexistent directory');
+
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+
+command_fails_like([ 'pg_ctl', 'promote', '-D', $node_primary->data_dir ],
+ qr/PID file .* does not exist/,
+ 'pg_ctl promote of not running instance fails');
+
+$node_primary->start;
+
+command_fails_like([ 'pg_ctl', 'promote', '-D', $node_primary->data_dir ],
+ qr/not in standby mode/,
+ 'pg_ctl promote of primary instance fails');
+
+my $node_standby = get_new_node('standby');
+$node_primary->backup('my_backup');
+$node_standby->init_from_backup($node_primary, 'my_backup', has_streaming => 1);
+$node_standby->start;
+
+is($node_standby->safe_psql('postgres', 'SELECT pg_is_in_recovery()'),
+ 't', 'standby is in recovery');
+
+command_ok([ 'pg_ctl', 'promote', '-D', $node_standby->data_dir ],
+ 'pg_ctl promote of standby runs');
+
+$node_standby->poll_query_until('postgres', 'SELECT NOT pg_is_in_recovery()');
+
+is($node_standby->safe_psql('postgres', 'SELECT pg_is_in_recovery()'),
+ 'f', 'promoted standby is not in recovery');
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index c7b3262..51b533e 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -34,6 +34,7 @@ our @EXPORT = qw(
program_version_ok
program_options_handling_ok
command_like
+ command_fails_like
$windows_os
);
@@ -281,4 +282,14 @@ sub command_like
like($stdout, $expected_stdout, "$test_name: matches");
}
+sub command_fails_like
+{
+ my ($cmd, $expected_stderr, $test_name) = @_;
+ my ($stdout, $stderr);
+ print("# Running: " . join(" ", @{$cmd}) . "\n");
+ my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
+ ok(!$result, "$test_name: exit code not 0");
+ like($stderr, $expected_stderr, "$test_name: matches");
+}
+
1;
--
2.9.2
0003-pg_ctl-Detect-current-standby-state-from-pg_control.patchtext/x-patch; name=0003-pg_ctl-Detect-current-standby-state-from-pg_control.patchDownload
From b5bc405a85c6a84c7916f5cc2c3225438df70315 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Tue, 26 Jul 2016 11:23:43 -0400
Subject: [PATCH 3/5] pg_ctl: Detect current standby state from pg_control
pg_ctl used to determine whether a server was in standby mode by looking
for a recovery.conf file. With this change, it instead looks into
pg_control, which is potentially more accurate. There are also
occasional discussions about removing recovery.conf, so this removes one
dependency.
Add a wait_seconds argument to get_controlfile() so that pg_ctl can wait
a bit for a consistent pg_control file. Otherwise, pg_ctl operations
might fail on rare occasions when the server is updating the file at the
same time.
---
src/backend/utils/misc/pg_controldata.c | 8 ++--
src/bin/pg_controldata/pg_controldata.c | 2 +-
src/bin/pg_ctl/pg_ctl.c | 35 ++++++++------
src/common/controldata_utils.c | 83 ++++++++++++++++++++-------------
src/include/common/controldata_utils.h | 2 +-
5 files changed, 76 insertions(+), 54 deletions(-)
diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c
index 34ee76a..13f0523 100644
--- a/src/backend/utils/misc/pg_controldata.c
+++ b/src/backend/utils/misc/pg_controldata.c
@@ -51,7 +51,7 @@ pg_control_system(PG_FUNCTION_ARGS)
tupdesc = BlessTupleDesc(tupdesc);
/* read the control file */
- ControlFile = get_controlfile(DataDir, NULL);
+ ControlFile = get_controlfile(DataDir, NULL, NULL);
values[0] = Int32GetDatum(ControlFile->pg_control_version);
nulls[0] = false;
@@ -127,7 +127,7 @@ pg_control_checkpoint(PG_FUNCTION_ARGS)
tupdesc = BlessTupleDesc(tupdesc);
/* Read the control file. */
- ControlFile = get_controlfile(DataDir, NULL);
+ ControlFile = get_controlfile(DataDir, NULL, NULL);
/*
* Calculate name of the WAL file containing the latest checkpoint's REDO
@@ -229,7 +229,7 @@ pg_control_recovery(PG_FUNCTION_ARGS)
tupdesc = BlessTupleDesc(tupdesc);
/* read the control file */
- ControlFile = get_controlfile(DataDir, NULL);
+ ControlFile = get_controlfile(DataDir, NULL, NULL);
values[0] = LSNGetDatum(ControlFile->minRecoveryPoint);
nulls[0] = false;
@@ -294,7 +294,7 @@ pg_control_init(PG_FUNCTION_ARGS)
tupdesc = BlessTupleDesc(tupdesc);
/* read the control file */
- ControlFile = get_controlfile(DataDir, NULL);
+ ControlFile = get_controlfile(DataDir, NULL, NULL);
values[0] = Int32GetDatum(ControlFile->maxAlign);
nulls[0] = false;
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 96619a2..c107601 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -155,7 +155,7 @@ main(int argc, char *argv[])
}
/* get a copy of the control file */
- ControlFile = get_controlfile(DataDir, progname);
+ ControlFile = get_controlfile(DataDir, progname, NULL);
/*
* This slightly-chintzy coding will work as long as the control file
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index efc0729..fac3986 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -19,6 +19,8 @@
#include "postgres_fe.h"
+#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
#include "libpq-fe.h"
#include "pqexpbuffer.h"
@@ -96,7 +98,6 @@ static char postopts_file[MAXPGPATH];
static char version_file[MAXPGPATH];
static char pid_file[MAXPGPATH];
static char backup_file[MAXPGPATH];
-static char recovery_file[MAXPGPATH];
static char promote_file[MAXPGPATH];
#ifdef WIN32
@@ -988,15 +989,17 @@ do_stop(void)
/*
* If backup_label exists, an online backup is running. Warn the user
* that smart shutdown will wait for it to finish. However, if
- * recovery.conf is also present, we're recovering from an online
+ * the server is in archive recovery, we're recovering from an online
* backup instead of performing one.
*/
if (shutdown_mode == SMART_MODE &&
- stat(backup_file, &statbuf) == 0 &&
- stat(recovery_file, &statbuf) != 0)
+ stat(backup_file, &statbuf) == 0)
{
- print_msg(_("WARNING: online backup mode is active\n"
- "Shutdown will not complete until pg_stop_backup() is called.\n\n"));
+ ControlFileData *control_file_data = get_controlfile(pg_data, progname, &wait_seconds);
+ if (control_file_data->state != DB_IN_ARCHIVE_RECOVERY)
+ print_msg(_("WARNING: online backup mode is active\n"
+ "Shutdown will not complete until pg_stop_backup() is called.\n\n"));
+ pfree(control_file_data);
}
print_msg(_("waiting for server to shut down..."));
@@ -1076,15 +1079,17 @@ do_restart(void)
/*
* If backup_label exists, an online backup is running. Warn the user
* that smart shutdown will wait for it to finish. However, if
- * recovery.conf is also present, we're recovering from an online
+ * the server is in archive recovery, we're recovering from an online
* backup instead of performing one.
*/
if (shutdown_mode == SMART_MODE &&
- stat(backup_file, &statbuf) == 0 &&
- stat(recovery_file, &statbuf) != 0)
+ stat(backup_file, &statbuf) == 0)
{
- print_msg(_("WARNING: online backup mode is active\n"
- "Shutdown will not complete until pg_stop_backup() is called.\n\n"));
+ ControlFileData *control_file_data = get_controlfile(pg_data, progname, &wait_seconds);
+ if (control_file_data->state != DB_IN_ARCHIVE_RECOVERY)
+ print_msg(_("WARNING: online backup mode is active\n"
+ "Shutdown will not complete until pg_stop_backup() is called.\n\n"));
+ pfree(control_file_data);
}
print_msg(_("waiting for server to shut down..."));
@@ -1168,7 +1173,7 @@ do_promote(void)
{
FILE *prmfile;
pgpid_t pid;
- struct stat statbuf;
+ ControlFileData *control_file_data;
pid = get_pgpid(false);
@@ -1187,14 +1192,15 @@ do_promote(void)
exit(1);
}
- /* If recovery.conf doesn't exist, the server is not in standby mode */
- if (stat(recovery_file, &statbuf) != 0)
+ control_file_data = get_controlfile(pg_data, progname, &wait_seconds);
+ if (control_file_data->state != DB_IN_ARCHIVE_RECOVERY)
{
write_stderr(_("%s: cannot promote server; "
"server is not in standby mode\n"),
progname);
exit(1);
}
+ pfree(control_file_data);
/*
* For 9.3 onwards, "fast" promotion is performed. Promotion with a full
@@ -2401,7 +2407,6 @@ main(int argc, char **argv)
snprintf(version_file, MAXPGPATH, "%s/PG_VERSION", pg_data);
snprintf(pid_file, MAXPGPATH, "%s/postmaster.pid", pg_data);
snprintf(backup_file, MAXPGPATH, "%s/backup_label", pg_data);
- snprintf(recovery_file, MAXPGPATH, "%s/recovery.conf", pg_data);
}
switch (ctl_command)
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index 5592fe7..01002b0 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -35,60 +35,77 @@
* for pfreeing the result.
*/
ControlFileData *
-get_controlfile(char *DataDir, const char *progname)
+get_controlfile(const char *DataDir, const char *progname, int *wait_seconds)
{
ControlFileData *ControlFile;
- int fd;
char ControlFilePath[MAXPGPATH];
- pg_crc32c crc;
ControlFile = palloc(sizeof(ControlFileData));
snprintf(ControlFilePath, MAXPGPATH, "%s/global/pg_control", DataDir);
- if ((fd = open(ControlFilePath, O_RDONLY | PG_BINARY, 0)) == -1)
+ for (;;)
+ {
+ int fd;
+ pg_crc32c crc;
+
+ if ((fd = open(ControlFilePath, O_RDONLY | PG_BINARY, 0)) == -1)
#ifndef FRONTEND
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not open file \"%s\" for reading: %m",
- ControlFilePath)));
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\" for reading: %m",
+ ControlFilePath)));
#else
- {
- fprintf(stderr, _("%s: could not open file \"%s\" for reading: %s\n"),
- progname, ControlFilePath, strerror(errno));
- exit(EXIT_FAILURE);
- }
+ {
+ fprintf(stderr, _("%s: could not open file \"%s\" for reading: %s\n"),
+ progname, ControlFilePath, strerror(errno));
+ exit(EXIT_FAILURE);
+ }
#endif
- if (read(fd, ControlFile, sizeof(ControlFileData)) != sizeof(ControlFileData))
+ if (read(fd, ControlFile, sizeof(ControlFileData)) != sizeof(ControlFileData))
#ifndef FRONTEND
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not read file \"%s\": %m", ControlFilePath)));
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read file \"%s\": %m", ControlFilePath)));
#else
- {
- fprintf(stderr, _("%s: could not read file \"%s\": %s\n"),
- progname, ControlFilePath, strerror(errno));
- exit(EXIT_FAILURE);
- }
+ {
+ fprintf(stderr, _("%s: could not read file \"%s\": %s\n"),
+ progname, ControlFilePath, strerror(errno));
+ exit(EXIT_FAILURE);
+ }
#endif
- close(fd);
+ close(fd);
+
+ /* Check the CRC. */
+ INIT_CRC32C(crc);
+ COMP_CRC32C(crc,
+ (char *) ControlFile,
+ offsetof(ControlFileData, crc));
+ FIN_CRC32C(crc);
- /* Check the CRC. */
- INIT_CRC32C(crc);
- COMP_CRC32C(crc,
- (char *) ControlFile,
- offsetof(ControlFileData, crc));
- FIN_CRC32C(crc);
+ if (EQ_CRC32C(crc, ControlFile->crc))
+ break;
+
+ if (wait_seconds && *wait_seconds > 0)
+ {
+ sleep(1);
+ (*wait_seconds)--;
+ continue;
+ }
- if (!EQ_CRC32C(crc, ControlFile->crc))
#ifndef FRONTEND
elog(ERROR, _("calculated CRC checksum does not match value stored in file"));
#else
- printf(_("WARNING: Calculated CRC checksum does not match value stored in file.\n"
- "Either the file is corrupt, or it has a different layout than this program\n"
- "is expecting. The results below are untrustworthy.\n\n"));
+ if (wait_seconds)
+ fprintf(stderr, _("%s: control file \"%s\" appears to be corrupt\n"),
+ progname, ControlFilePath);
+ else
+ printf(_("WARNING: Calculated CRC checksum does not match value stored in file.\n"
+ "Either the file is corrupt, or it has a different layout than this program\n"
+ "is expecting. The results below are untrustworthy.\n\n"));
#endif
+ }
/* Make sure the control file is valid byte order. */
if (ControlFile->pg_control_version % 65536 == 0 &&
diff --git a/src/include/common/controldata_utils.h b/src/include/common/controldata_utils.h
index a355d22..283ec55 100644
--- a/src/include/common/controldata_utils.h
+++ b/src/include/common/controldata_utils.h
@@ -12,6 +12,6 @@
#include "catalog/pg_control.h"
-extern ControlFileData *get_controlfile(char *DataDir, const char *progname);
+extern ControlFileData *get_controlfile(const char *DataDir, const char *progname, int *wait_seconds);
#endif /* COMMON_CONTROLDATA_UTILS_H */
--
2.9.2
0004-Delay-updating-control-file-to-in-production.patchtext/x-patch; name=0004-Delay-updating-control-file-to-in-production.patchDownload
From af156fbcb2e0e59045f814f4dd3d3494d0b2078d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Tue, 26 Jul 2016 12:08:49 -0400
Subject: [PATCH 4/5] Delay updating control file to "in production"
Move the updating of the control file to "in production" status until
the point where WAL writes are allowed. Before, there could be a
significant gap between the control file update and write transactions
actually being allowed. This makes it more reliable to use the control
status to verify the end of a promotion.
Michael Paquier
---
src/backend/access/transam/xlog.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f13f9c1..1f8d054 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7349,12 +7349,6 @@ StartupXLOG(void)
*/
InRecovery = false;
- LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
- ControlFile->state = DB_IN_PRODUCTION;
- ControlFile->time = (pg_time_t) time(NULL);
- UpdateControlFile();
- LWLockRelease(ControlFileLock);
-
/* start the archive_timeout timer running */
XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
@@ -7412,15 +7406,32 @@ StartupXLOG(void)
CompleteCommitTsInitialization();
/*
- * All done. Allow backends to write WAL. (Although the bool flag is
- * probably atomic in itself, we use the info_lck here to ensure that
- * there are no race conditions concerning visibility of other recent
- * updates to shared memory.)
+ * All done with end-of-recovery actions.
+ *
+ * Now allow backends to write WAL and update the control file status in
+ * consequence. The boolean flag allowing backends to write WAL is
+ * updated while holding ControlFileLock to prevent other backends to look
+ * at an inconsistent state of the control file in shared memory. There
+ * is still a small window during which backends can write WAL and the
+ * control file is still referring to a system not in DB_IN_PRODUCTION
+ * state while looking at the on-disk control file.
+ *
+ * Also, although the boolean flag to allow WAL is probably atomic in
+ * itself, we use the info_lck here to ensure that there are no race
+ * conditions concerning visibility of other recent updates to shared
+ * memory.
*/
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+ ControlFile->state = DB_IN_PRODUCTION;
+ ControlFile->time = (pg_time_t) time(NULL);
+
SpinLockAcquire(&XLogCtl->info_lck);
XLogCtl->SharedRecoveryInProgress = false;
SpinLockRelease(&XLogCtl->info_lck);
+ UpdateControlFile();
+ LWLockRelease(ControlFileLock);
+
/*
* If there were cascading standby servers connected to us, nudge any wal
* sender processes to notice that we've been promoted.
--
2.9.2
0005-pg_ctl-Add-wait-option-to-promote-action.patchtext/x-patch; name=0005-pg_ctl-Add-wait-option-to-promote-action.patchDownload
From 384083f9189dbd691a57f98dfefbf4c836b2d0b6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Tue, 26 Jul 2016 12:44:08 -0400
Subject: [PATCH 5/5] pg_ctl: Add wait option to promote action
When waiting is selected for the promote action, look into pg_control
until the state changes, then use the PQping-based waiting until the
server is reachable.
---
doc/src/sgml/ref/pg_ctl-ref.sgml | 29 ++++++++++++++++++-----
src/bin/pg_ctl/pg_ctl.c | 50 +++++++++++++++++++++++++++++-----------
src/bin/pg_ctl/t/003_promote.pl | 18 ++++++++++++++-
3 files changed, 77 insertions(+), 20 deletions(-)
diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index 6ceb781..a00c355 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -91,6 +91,8 @@
<cmdsynopsis>
<command>pg_ctl</command>
<arg choice="plain"><option>promote</option></arg>
+ <arg choice="opt"><option>-w</option></arg>
+ <arg choice="opt"><option>-t</option> <replaceable>seconds</replaceable></arg>
<arg choice="opt"><option>-s</option></arg>
<arg choice="opt"><option>-D</option> <replaceable>datadir</replaceable></arg>
</cmdsynopsis>
@@ -361,8 +363,8 @@ <title>Options</title>
<term><option>--timeout</option></term>
<listitem>
<para>
- The maximum number of seconds to wait when waiting for startup or
- shutdown to complete. Defaults to the value of the
+ The maximum number of seconds to wait when waiting for an operation
+ to complete (see option <option>-w</option>). Defaults to the value of the
<envar>PGCTLTIMEOUT</> environment variable or, if not set, to 60
seconds.
</para>
@@ -383,8 +385,23 @@ <title>Options</title>
<term><option>-w</option></term>
<listitem>
<para>
- Wait for the startup or shutdown to complete.
- Waiting is the default option for shutdowns, but not startups.
+ Wait for an operation to complete. This is supported for the
+ modes <literal>start</literal>, <literal>stop</literal>,
+ <literal>restart</literal>, <literal>promote</literal>,
+ and <literal>register</literal>.
+ </para>
+
+ <para>
+ Waiting is the default option for shutdowns, but not startups,
+ restarts, or promotions. This is mainly for historical reasons; the
+ waiting option is almost always preferable. If waiting is not
+ selected, the requested action is triggered, but there is no feedback
+ about its success. In that case, the server log file or an external
+ monitoring system would have to be used to check the progress and
+ success of the operation.
+ </para>
+
+ <para>
When waiting for startup, <command>pg_ctl</command> repeatedly
attempts to connect to the server.
When waiting for shutdown, <command>pg_ctl</command> waits for
@@ -400,8 +417,8 @@ <title>Options</title>
<term><option>-W</option></term>
<listitem>
<para>
- Do not wait for startup or shutdown to complete. This is the
- default for start and restart modes.
+ Do not wait for an operation to complete. This is the opposite of the
+ option <option>-w</option>.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index fac3986..87cdcf3 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -1233,7 +1233,39 @@ do_promote(void)
exit(1);
}
- print_msg(_("server promoting\n"));
+ if (do_wait)
+ {
+ DBState state = DB_STARTUP;
+
+ print_msg(_("waiting for server to promote..."));
+ while (wait_seconds > 0)
+ {
+ ControlFileData *control_file_data;
+
+ control_file_data = get_controlfile(pg_data, progname, &wait_seconds);
+ state = control_file_data->state;
+ pfree(control_file_data);
+
+ if (state == DB_IN_PRODUCTION)
+ break;
+
+ print_msg(".");
+ pg_usleep(1000000); /* 1 sec */
+ wait_seconds--;
+ }
+ if (state == DB_IN_PRODUCTION)
+ {
+ print_msg(_(" done\n"));
+ print_msg(_("server promoted\n"));
+ }
+ else
+ {
+ print_msg(_(" stopped waiting\n"));
+ print_msg(_("server is still promoting\n"));
+ }
+ }
+ else
+ print_msg(_("server promoting\n"));
}
@@ -2381,18 +2413,10 @@ main(int argc, char **argv)
if (!wait_set)
{
- switch (ctl_command)
- {
- case RESTART_COMMAND:
- case START_COMMAND:
- do_wait = false;
- break;
- case STOP_COMMAND:
- do_wait = true;
- break;
- default:
- break;
- }
+ if (ctl_command == STOP_COMMAND)
+ do_wait = true;
+ else
+ do_wait = false;
}
if (ctl_command == RELOAD_COMMAND)
diff --git a/src/bin/pg_ctl/t/003_promote.pl b/src/bin/pg_ctl/t/003_promote.pl
index d7e7b99..6443926 100644
--- a/src/bin/pg_ctl/t/003_promote.pl
+++ b/src/bin/pg_ctl/t/003_promote.pl
@@ -3,7 +3,7 @@
use PostgresNode;
use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 12;
my $tempdir = TestLib::tempdir;
@@ -39,3 +39,19 @@
is($node_standby->safe_psql('postgres', 'SELECT pg_is_in_recovery()'),
'f', 'promoted standby is not in recovery');
+
+# same again with wait option
+$node_standby = get_new_node('standby2');
+$node_standby->init_from_backup($node_primary, 'my_backup', has_streaming => 1);
+$node_standby->start;
+
+is($node_standby->safe_psql('postgres', 'SELECT pg_is_in_recovery()'),
+ 't', 'standby is in recovery');
+
+command_ok([ 'pg_ctl', 'promote', '-w', '-D', $node_standby->data_dir ],
+ 'pg_ctl promote -w of standby runs');
+
+# no wait here
+
+is($node_standby->safe_psql('postgres', 'SELECT pg_is_in_recovery()'),
+ 'f', 'promoted standby is not in recovery');
--
2.9.2
On Wed, Aug 3, 2016 at 9:35 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
Updated patch set for pg_ctl promote -w, incorporating AFAICT all of the
previous reviews, in particular Michael Paquier's changes to the
StartupXLOG() ordering, and rebasing on top of
src/common/controldata_utils.c.
Glad to see a new set of patches here:
1) From 0001
- ok($result, "@$cmd exit code 0");
- is($stderr, '', "@$cmd no stderr");
+ ok($result, "$test_name: exit code 0");
+ is($stderr, '', "$test_name: no stderr");
Okay with that, there is anyway a log mentioning the command anyway.
Perhaps you'd want to backpatch that?
2) From 0002:
+command_fails_like([ 'pg_ctl', 'promote', '-D', "$tempdir/nonexistent" ],
Any code using src/port/getopt_long.c (Windows!) is going to fail on
that. The argument that is not preceded by an option name needs to be
put at the end of the command. So for example this command needs to be
changed to that:
[ 'pg_ctl', '-D', "$tempdir/nonexistent", 'promote' ]
+$node_standby->poll_query_until('postgres', 'SELECT NOT pg_is_in_recovery()');
+
+is($node_standby->safe_psql('postgres', 'SELECT pg_is_in_recovery()'),
+ 'f', 'promoted standby is not in recovery');
We could just check that poll_query_until returns 1 here. This would
save running one query.
3) From 0003
In do_stop, this patches makes the wait happen for a maximum of
wait_seconds * 2, once when getting the control file information, and
once when waiting for the server to shut down. This is not a good
idea, and the idea of putting a wait argument in get_controlfile does
not seem a good interface to me. I'd rather see get_controlfile be
extended with a flag saying no_error_on_failure and keep the wait
logic within pg_ctl. The flag would do the following when enabled:
- for frontends, do not issue a WARNING message and return NULL instead.
- for backends, do not issue an ERROR and return NULL instead.
4) Patch 0004, no real comments :) I am glad you picked up this idea.
5) Patch 0005:
Looks good to me. I just noticed that similar to 0002 regarding the
ordering of those arguments:
+command_ok([ 'pg_ctl', 'promote', '-w', '-D', $node_standby->data_dir ],
+ 'pg_ctl promote -w of standby runs');
Thanks,
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 8/5/16 12:14 AM, Michael Paquier wrote:
In do_stop, this patches makes the wait happen for a maximum of
wait_seconds * 2, once when getting the control file information, and
once when waiting for the server to shut down.
That's not how I read it. get_controlfile() will decrease the
wait_seconds argument by how much wait time it has used. The wait for
shutdown will then only use as much seconds as are left.
This is not a good
idea, and the idea of putting a wait argument in get_controlfile does
not seem a good interface to me. I'd rather see get_controlfile be
extended with a flag saying no_error_on_failure and keep the wait
logic within pg_ctl.
I guess we could write a wrapper function in pg_ctl that encapsulated
the wait logic.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Aug 6, 2016 at 10:43 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 8/5/16 12:14 AM, Michael Paquier wrote:
In do_stop, this patches makes the wait happen for a maximum of
wait_seconds * 2, once when getting the control file information, and
once when waiting for the server to shut down.That's not how I read it. get_controlfile() will decrease the
wait_seconds argument by how much wait time it has used. The wait for
shutdown will then only use as much seconds as are left.
Ah, right. The reference to wait_seconds gets decremented.
This is not a good
idea, and the idea of putting a wait argument in get_controlfile does
not seem a good interface to me. I'd rather see get_controlfile be
extended with a flag saying no_error_on_failure and keep the wait
logic within pg_ctl.I guess we could write a wrapper function in pg_ctl that encapsulated
the wait logic.
That's what I would do.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 8/7/16 9:44 PM, Michael Paquier wrote:
This is not a good
idea, and the idea of putting a wait argument in get_controlfile does
not seem a good interface to me. I'd rather see get_controlfile be
extended with a flag saying no_error_on_failure and keep the wait
logic within pg_ctl.I guess we could write a wrapper function in pg_ctl that encapsulated
the wait logic.That's what I would do.
New patches, incorporating your suggestions.
I moved some of the error handling out of get_controlfile() and back
into the callers, because it was getting too weird that that function
knew so much about the callers' intentions. That way we don't actually
have to change the signature.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v2-0001-Make-command_like-output-more-compact.patchtext/x-patch; name=v2-0001-Make-command_like-output-more-compact.patchDownload
From 04a724675769b2412b739c408abc136d17d52794 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Tue, 26 Jul 2016 11:39:43 -0400
Subject: [PATCH v2 1/5] Make command_like output more compact
Consistently print the test name, not the full command, which can be
quite lenghty and include temporary directory names and other
distracting details.
---
src/test/perl/TestLib.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index 649fd82..c7b3262 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -276,8 +276,8 @@ sub command_like
my ($stdout, $stderr);
print("# Running: " . join(" ", @{$cmd}) . "\n");
my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
- ok($result, "@$cmd exit code 0");
- is($stderr, '', "@$cmd no stderr");
+ ok($result, "$test_name: exit code 0");
+ is($stderr, '', "$test_name: no stderr");
like($stdout, $expected_stdout, "$test_name: matches");
}
--
2.9.2
v2-0002-pg_ctl-Add-tests-for-promote-action.patchtext/x-patch; name=v2-0002-pg_ctl-Add-tests-for-promote-action.patchDownload
From b7c04b3f6cbe7a37359509363da0701c84063113 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Tue, 26 Jul 2016 10:48:43 -0400
Subject: [PATCH v2 2/5] pg_ctl: Add tests for promote action
---
src/bin/pg_ctl/t/003_promote.pl | 39 +++++++++++++++++++++++++++++++++++++++
src/test/perl/TestLib.pm | 11 +++++++++++
2 files changed, 50 insertions(+)
create mode 100644 src/bin/pg_ctl/t/003_promote.pl
diff --git a/src/bin/pg_ctl/t/003_promote.pl b/src/bin/pg_ctl/t/003_promote.pl
new file mode 100644
index 0000000..1461234
--- /dev/null
+++ b/src/bin/pg_ctl/t/003_promote.pl
@@ -0,0 +1,39 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 9;
+
+my $tempdir = TestLib::tempdir;
+
+command_fails_like([ 'pg_ctl', '-D', "$tempdir/nonexistent", 'promote' ],
+ qr/directory .* does not exist/,
+ 'pg_ctl promote with nonexistent directory');
+
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+
+command_fails_like([ 'pg_ctl', '-D', $node_primary->data_dir, 'promote' ],
+ qr/PID file .* does not exist/,
+ 'pg_ctl promote of not running instance fails');
+
+$node_primary->start;
+
+command_fails_like([ 'pg_ctl', '-D', $node_primary->data_dir, 'promote' ],
+ qr/not in standby mode/,
+ 'pg_ctl promote of primary instance fails');
+
+my $node_standby = get_new_node('standby');
+$node_primary->backup('my_backup');
+$node_standby->init_from_backup($node_primary, 'my_backup', has_streaming => 1);
+$node_standby->start;
+
+is($node_standby->safe_psql('postgres', 'SELECT pg_is_in_recovery()'),
+ 't', 'standby is in recovery');
+
+command_ok([ 'pg_ctl', '-D', $node_standby->data_dir, 'promote' ],
+ 'pg_ctl promote of standby runs');
+
+ok($node_standby->poll_query_until('postgres', 'SELECT NOT pg_is_in_recovery()'),
+ 'promoted standby is not in recovery');
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index c7b3262..51b533e 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -34,6 +34,7 @@ our @EXPORT = qw(
program_version_ok
program_options_handling_ok
command_like
+ command_fails_like
$windows_os
);
@@ -281,4 +282,14 @@ sub command_like
like($stdout, $expected_stdout, "$test_name: matches");
}
+sub command_fails_like
+{
+ my ($cmd, $expected_stderr, $test_name) = @_;
+ my ($stdout, $stderr);
+ print("# Running: " . join(" ", @{$cmd}) . "\n");
+ my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
+ ok(!$result, "$test_name: exit code not 0");
+ like($stderr, $expected_stderr, "$test_name: matches");
+}
+
1;
--
2.9.2
v2-0003-pg_ctl-Detect-current-standby-state-from-pg_contr.patchtext/x-patch; name=v2-0003-pg_ctl-Detect-current-standby-state-from-pg_contr.patchDownload
From cf77e8419bae21a84a2a30d260a70bac2c19498a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Tue, 26 Jul 2016 11:23:43 -0400
Subject: [PATCH v2 3/5] pg_ctl: Detect current standby state from pg_control
pg_ctl used to determine whether a server was in standby mode by looking
for a recovery.conf file. With this change, it instead looks into
pg_control, which is potentially more accurate. There are also
occasional discussions about removing recovery.conf, so this removes one
dependency.
Add a wait_seconds argument to get_controlfile() so that pg_ctl can wait
a bit for a consistent pg_control file. Otherwise, pg_ctl operations
might fail on rare occasions when the server is updating the file at the
same time.
---
src/backend/utils/misc/pg_controldata.c | 12 +++++++++
src/bin/pg_controldata/pg_controldata.c | 4 +++
src/bin/pg_ctl/pg_ctl.c | 47 ++++++++++++++++++++++++++-------
src/common/controldata_utils.c | 15 +++++------
src/include/common/controldata_utils.h | 2 +-
5 files changed, 62 insertions(+), 18 deletions(-)
diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c
index 34ee76a..4f9de83 100644
--- a/src/backend/utils/misc/pg_controldata.c
+++ b/src/backend/utils/misc/pg_controldata.c
@@ -52,6 +52,9 @@ pg_control_system(PG_FUNCTION_ARGS)
/* read the control file */
ControlFile = get_controlfile(DataDir, NULL);
+ if (!ControlFile)
+ ereport(ERROR,
+ (errmsg("calculated CRC checksum does not match value stored in file")));
values[0] = Int32GetDatum(ControlFile->pg_control_version);
nulls[0] = false;
@@ -128,6 +131,9 @@ pg_control_checkpoint(PG_FUNCTION_ARGS)
/* Read the control file. */
ControlFile = get_controlfile(DataDir, NULL);
+ if (!ControlFile)
+ ereport(ERROR,
+ (errmsg("calculated CRC checksum does not match value stored in file")));
/*
* Calculate name of the WAL file containing the latest checkpoint's REDO
@@ -230,6 +236,9 @@ pg_control_recovery(PG_FUNCTION_ARGS)
/* read the control file */
ControlFile = get_controlfile(DataDir, NULL);
+ if (!ControlFile)
+ ereport(ERROR,
+ (errmsg("calculated CRC checksum does not match value stored in file")));
values[0] = LSNGetDatum(ControlFile->minRecoveryPoint);
nulls[0] = false;
@@ -295,6 +304,9 @@ pg_control_init(PG_FUNCTION_ARGS)
/* read the control file */
ControlFile = get_controlfile(DataDir, NULL);
+ if (!ControlFile)
+ ereport(ERROR,
+ (errmsg("calculated CRC checksum does not match value stored in file")));
values[0] = Int32GetDatum(ControlFile->maxAlign);
nulls[0] = false;
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 96619a2..e92feab 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -156,6 +156,10 @@ main(int argc, char *argv[])
/* get a copy of the control file */
ControlFile = get_controlfile(DataDir, progname);
+ if (!ControlFile)
+ printf(_("WARNING: Calculated CRC checksum does not match value stored in file.\n"
+ "Either the file is corrupt, or it has a different layout than this program\n"
+ "is expecting. The results below are untrustworthy.\n\n"));
/*
* This slightly-chintzy coding will work as long as the control file
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index efc0729..85f6a92 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -19,6 +19,8 @@
#include "postgres_fe.h"
+#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
#include "libpq-fe.h"
#include "pqexpbuffer.h"
@@ -96,7 +98,6 @@ static char postopts_file[MAXPGPATH];
static char version_file[MAXPGPATH];
static char pid_file[MAXPGPATH];
static char backup_file[MAXPGPATH];
-static char recovery_file[MAXPGPATH];
static char promote_file[MAXPGPATH];
#ifdef WIN32
@@ -158,6 +159,8 @@ static bool postmaster_is_alive(pid_t pid);
static void unlimit_core_size(void);
#endif
+static DBState get_control_dbstate(void);
+
#ifdef WIN32
static void
@@ -988,12 +991,12 @@ do_stop(void)
/*
* If backup_label exists, an online backup is running. Warn the user
* that smart shutdown will wait for it to finish. However, if
- * recovery.conf is also present, we're recovering from an online
+ * the server is in archive recovery, we're recovering from an online
* backup instead of performing one.
*/
if (shutdown_mode == SMART_MODE &&
stat(backup_file, &statbuf) == 0 &&
- stat(recovery_file, &statbuf) != 0)
+ get_control_dbstate() != DB_IN_ARCHIVE_RECOVERY)
{
print_msg(_("WARNING: online backup mode is active\n"
"Shutdown will not complete until pg_stop_backup() is called.\n\n"));
@@ -1076,12 +1079,12 @@ do_restart(void)
/*
* If backup_label exists, an online backup is running. Warn the user
* that smart shutdown will wait for it to finish. However, if
- * recovery.conf is also present, we're recovering from an online
+ * the server is in archive recovery, we're recovering from an online
* backup instead of performing one.
*/
if (shutdown_mode == SMART_MODE &&
stat(backup_file, &statbuf) == 0 &&
- stat(recovery_file, &statbuf) != 0)
+ get_control_dbstate() != DB_IN_ARCHIVE_RECOVERY)
{
print_msg(_("WARNING: online backup mode is active\n"
"Shutdown will not complete until pg_stop_backup() is called.\n\n"));
@@ -1168,7 +1171,6 @@ do_promote(void)
{
FILE *prmfile;
pgpid_t pid;
- struct stat statbuf;
pid = get_pgpid(false);
@@ -1187,8 +1189,7 @@ do_promote(void)
exit(1);
}
- /* If recovery.conf doesn't exist, the server is not in standby mode */
- if (stat(recovery_file, &statbuf) != 0)
+ if (get_control_dbstate() != DB_IN_ARCHIVE_RECOVERY)
{
write_stderr(_("%s: cannot promote server; "
"server is not in standby mode\n"),
@@ -2115,6 +2116,35 @@ adjust_data_dir(void)
}
+static DBState
+get_control_dbstate(void)
+{
+ DBState ret;
+
+ for (;;)
+ {
+ ControlFileData *control_file_data = get_controlfile(pg_data, progname);
+
+ if (control_file_data)
+ {
+ ret = control_file_data->state;
+ pfree(control_file_data);
+ return ret;
+ }
+
+ if (wait_seconds > 0)
+ {
+ sleep(1);
+ wait_seconds--;
+ continue;
+ }
+
+ write_stderr(_("%s: control file appears to be corrupt\n"), progname);
+ exit(1);
+ }
+}
+
+
int
main(int argc, char **argv)
{
@@ -2401,7 +2431,6 @@ main(int argc, char **argv)
snprintf(version_file, MAXPGPATH, "%s/PG_VERSION", pg_data);
snprintf(pid_file, MAXPGPATH, "%s/postmaster.pid", pg_data);
snprintf(backup_file, MAXPGPATH, "%s/backup_label", pg_data);
- snprintf(recovery_file, MAXPGPATH, "%s/recovery.conf", pg_data);
}
switch (ctl_command)
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index 5592fe7..f218d25 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -33,9 +33,11 @@
*
* Get controlfile values. The caller is responsible
* for pfreeing the result.
+ *
+ * Returns NULL if the CRC did not match.
*/
ControlFileData *
-get_controlfile(char *DataDir, const char *progname)
+get_controlfile(const char *DataDir, const char *progname)
{
ControlFileData *ControlFile;
int fd;
@@ -82,13 +84,10 @@ get_controlfile(char *DataDir, const char *progname)
FIN_CRC32C(crc);
if (!EQ_CRC32C(crc, ControlFile->crc))
-#ifndef FRONTEND
- elog(ERROR, _("calculated CRC checksum does not match value stored in file"));
-#else
- printf(_("WARNING: Calculated CRC checksum does not match value stored in file.\n"
- "Either the file is corrupt, or it has a different layout than this program\n"
- "is expecting. The results below are untrustworthy.\n\n"));
-#endif
+ {
+ pfree(ControlFile);
+ return NULL;
+ }
/* Make sure the control file is valid byte order. */
if (ControlFile->pg_control_version % 65536 == 0 &&
diff --git a/src/include/common/controldata_utils.h b/src/include/common/controldata_utils.h
index a355d22..f834624 100644
--- a/src/include/common/controldata_utils.h
+++ b/src/include/common/controldata_utils.h
@@ -12,6 +12,6 @@
#include "catalog/pg_control.h"
-extern ControlFileData *get_controlfile(char *DataDir, const char *progname);
+extern ControlFileData *get_controlfile(const char *DataDir, const char *progname);
#endif /* COMMON_CONTROLDATA_UTILS_H */
--
2.9.2
v2-0004-Delay-updating-control-file-to-in-production.patchtext/x-patch; name=v2-0004-Delay-updating-control-file-to-in-production.patchDownload
From 4938adf0f28ecf83ca5b0d6332b83383d916866e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Tue, 26 Jul 2016 12:08:49 -0400
Subject: [PATCH v2 4/5] Delay updating control file to "in production"
Move the updating of the control file to "in production" status until
the point where WAL writes are allowed. Before, there could be a
significant gap between the control file update and write transactions
actually being allowed. This makes it more reliable to use the control
status to verify the end of a promotion.
Michael Paquier
---
src/backend/access/transam/xlog.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f13f9c1..1f8d054 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7349,12 +7349,6 @@ StartupXLOG(void)
*/
InRecovery = false;
- LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
- ControlFile->state = DB_IN_PRODUCTION;
- ControlFile->time = (pg_time_t) time(NULL);
- UpdateControlFile();
- LWLockRelease(ControlFileLock);
-
/* start the archive_timeout timer running */
XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
@@ -7412,15 +7406,32 @@ StartupXLOG(void)
CompleteCommitTsInitialization();
/*
- * All done. Allow backends to write WAL. (Although the bool flag is
- * probably atomic in itself, we use the info_lck here to ensure that
- * there are no race conditions concerning visibility of other recent
- * updates to shared memory.)
+ * All done with end-of-recovery actions.
+ *
+ * Now allow backends to write WAL and update the control file status in
+ * consequence. The boolean flag allowing backends to write WAL is
+ * updated while holding ControlFileLock to prevent other backends to look
+ * at an inconsistent state of the control file in shared memory. There
+ * is still a small window during which backends can write WAL and the
+ * control file is still referring to a system not in DB_IN_PRODUCTION
+ * state while looking at the on-disk control file.
+ *
+ * Also, although the boolean flag to allow WAL is probably atomic in
+ * itself, we use the info_lck here to ensure that there are no race
+ * conditions concerning visibility of other recent updates to shared
+ * memory.
*/
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+ ControlFile->state = DB_IN_PRODUCTION;
+ ControlFile->time = (pg_time_t) time(NULL);
+
SpinLockAcquire(&XLogCtl->info_lck);
XLogCtl->SharedRecoveryInProgress = false;
SpinLockRelease(&XLogCtl->info_lck);
+ UpdateControlFile();
+ LWLockRelease(ControlFileLock);
+
/*
* If there were cascading standby servers connected to us, nudge any wal
* sender processes to notice that we've been promoted.
--
2.9.2
v2-0005-pg_ctl-Add-wait-option-to-promote-action.patchtext/x-patch; name=v2-0005-pg_ctl-Add-wait-option-to-promote-action.patchDownload
From 88d426cfb25f02aa990fe24b71a55fb4c316daff Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Fri, 5 Aug 2016 21:35:19 -0400
Subject: [PATCH v2 5/5] pg_ctl: Add wait option to promote action
When waiting is selected for the promote action, look into pg_control
until the state changes, then use the PQping-based waiting until the
server is reachable.
---
doc/src/sgml/ref/pg_ctl-ref.sgml | 29 ++++++++++++++++++++------
src/bin/pg_ctl/pg_ctl.c | 45 ++++++++++++++++++++++++++++------------
src/bin/pg_ctl/t/003_promote.pl | 18 +++++++++++++++-
3 files changed, 72 insertions(+), 20 deletions(-)
diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index 6ceb781..a00c355 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -91,6 +91,8 @@
<cmdsynopsis>
<command>pg_ctl</command>
<arg choice="plain"><option>promote</option></arg>
+ <arg choice="opt"><option>-w</option></arg>
+ <arg choice="opt"><option>-t</option> <replaceable>seconds</replaceable></arg>
<arg choice="opt"><option>-s</option></arg>
<arg choice="opt"><option>-D</option> <replaceable>datadir</replaceable></arg>
</cmdsynopsis>
@@ -361,8 +363,8 @@ <title>Options</title>
<term><option>--timeout</option></term>
<listitem>
<para>
- The maximum number of seconds to wait when waiting for startup or
- shutdown to complete. Defaults to the value of the
+ The maximum number of seconds to wait when waiting for an operation
+ to complete (see option <option>-w</option>). Defaults to the value of the
<envar>PGCTLTIMEOUT</> environment variable or, if not set, to 60
seconds.
</para>
@@ -383,8 +385,23 @@ <title>Options</title>
<term><option>-w</option></term>
<listitem>
<para>
- Wait for the startup or shutdown to complete.
- Waiting is the default option for shutdowns, but not startups.
+ Wait for an operation to complete. This is supported for the
+ modes <literal>start</literal>, <literal>stop</literal>,
+ <literal>restart</literal>, <literal>promote</literal>,
+ and <literal>register</literal>.
+ </para>
+
+ <para>
+ Waiting is the default option for shutdowns, but not startups,
+ restarts, or promotions. This is mainly for historical reasons; the
+ waiting option is almost always preferable. If waiting is not
+ selected, the requested action is triggered, but there is no feedback
+ about its success. In that case, the server log file or an external
+ monitoring system would have to be used to check the progress and
+ success of the operation.
+ </para>
+
+ <para>
When waiting for startup, <command>pg_ctl</command> repeatedly
attempts to connect to the server.
When waiting for shutdown, <command>pg_ctl</command> waits for
@@ -400,8 +417,8 @@ <title>Options</title>
<term><option>-W</option></term>
<listitem>
<para>
- Do not wait for startup or shutdown to complete. This is the
- default for start and restart modes.
+ Do not wait for an operation to complete. This is the opposite of the
+ option <option>-w</option>.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 85f6a92..82840bb 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -1228,7 +1228,34 @@ do_promote(void)
exit(1);
}
- print_msg(_("server promoting\n"));
+ if (do_wait)
+ {
+ DBState state = DB_STARTUP;
+
+ print_msg(_("waiting for server to promote..."));
+ while (wait_seconds > 0)
+ {
+ state = get_control_dbstate();
+ if (state == DB_IN_PRODUCTION)
+ break;
+
+ print_msg(".");
+ pg_usleep(1000000); /* 1 sec */
+ wait_seconds--;
+ }
+ if (state == DB_IN_PRODUCTION)
+ {
+ print_msg(_(" done\n"));
+ print_msg(_("server promoted\n"));
+ }
+ else
+ {
+ print_msg(_(" stopped waiting\n"));
+ print_msg(_("server is still promoting\n"));
+ }
+ }
+ else
+ print_msg(_("server promoting\n"));
}
@@ -2405,18 +2432,10 @@ main(int argc, char **argv)
if (!wait_set)
{
- switch (ctl_command)
- {
- case RESTART_COMMAND:
- case START_COMMAND:
- do_wait = false;
- break;
- case STOP_COMMAND:
- do_wait = true;
- break;
- default:
- break;
- }
+ if (ctl_command == STOP_COMMAND)
+ do_wait = true;
+ else
+ do_wait = false;
}
if (ctl_command == RELOAD_COMMAND)
diff --git a/src/bin/pg_ctl/t/003_promote.pl b/src/bin/pg_ctl/t/003_promote.pl
index 1461234..0b6090b 100644
--- a/src/bin/pg_ctl/t/003_promote.pl
+++ b/src/bin/pg_ctl/t/003_promote.pl
@@ -3,7 +3,7 @@
use PostgresNode;
use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 12;
my $tempdir = TestLib::tempdir;
@@ -37,3 +37,19 @@
ok($node_standby->poll_query_until('postgres', 'SELECT NOT pg_is_in_recovery()'),
'promoted standby is not in recovery');
+
+# same again with wait option
+$node_standby = get_new_node('standby2');
+$node_standby->init_from_backup($node_primary, 'my_backup', has_streaming => 1);
+$node_standby->start;
+
+is($node_standby->safe_psql('postgres', 'SELECT pg_is_in_recovery()'),
+ 't', 'standby is in recovery');
+
+command_ok([ 'pg_ctl', '-D', $node_standby->data_dir, '-w', 'promote' ],
+ 'pg_ctl -w promote of standby runs');
+
+# no wait here
+
+is($node_standby->safe_psql('postgres', 'SELECT pg_is_in_recovery()'),
+ 'f', 'promoted standby is not in recovery');
--
2.9.2
On Thu, Aug 11, 2016 at 3:24 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 8/7/16 9:44 PM, Michael Paquier wrote:
This is not a good
idea, and the idea of putting a wait argument in get_controlfile does
not seem a good interface to me. I'd rather see get_controlfile be
extended with a flag saying no_error_on_failure and keep the wait
logic within pg_ctl.I guess we could write a wrapper function in pg_ctl that encapsulated
the wait logic.That's what I would do.
New patches, incorporating your suggestions.
Thanks for the new set!
I moved some of the error handling out of get_controlfile() and back
into the callers, because it was getting too weird that that function
knew so much about the callers' intentions. That way we don't actually
have to change the signature.
I have looked at them and the changes are looking fine for me. So I
have switched the patch as ready for committer, aka you.
Just a nit:
+ if (wait_seconds > 0)
+ {
+ sleep(1);
+ wait_seconds--;
+ continue;
+ }
This may be better this pg_usleep() instead of sleep().
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 8/11/16 9:28 AM, Michael Paquier wrote:
I have looked at them and the changes are looking fine for me. So I
have switched the patch as ready for committer, aka you.Just a nit: + if (wait_seconds > 0) + { + sleep(1); + wait_seconds--; + continue; + } This may be better this pg_usleep() instead of sleep().
Committed with that adjustment.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Sep 22, 2016 at 1:25 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 8/11/16 9:28 AM, Michael Paquier wrote:
I have looked at them and the changes are looking fine for me. So I
have switched the patch as ready for committer, aka you.Just a nit: + if (wait_seconds > 0) + { + sleep(1); + wait_seconds--; + continue; + } This may be better this pg_usleep() instead of sleep().Committed with that adjustment.
Commit e7010ce seems to forget to change help message of pg_ctl.
Attached patch.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
fix_pg_ctl_help.patchapplication/octet-stream; name=fix_pg_ctl_help.patchDownload
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 2f0976a..17643cd 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -1943,7 +1943,7 @@ do_help(void)
" [-o \"OPTIONS\"]\n"), progname);
printf(_(" %s reload [-D DATADIR] [-s]\n"), progname);
printf(_(" %s status [-D DATADIR]\n"), progname);
- printf(_(" %s promote [-D DATADIR] [-s]\n"), progname);
+ printf(_(" %s promote [-w] [-t SECS] [-D DATADIR] [-s]\n"), progname);
printf(_(" %s kill SIGNALNAME PID\n"), progname);
#ifdef WIN32
printf(_(" %s register [-N SERVICENAME] [-U USERNAME] [-P PASSWORD] [-D DATADIR]\n"
On Fri, Sep 23, 2016 at 12:16 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Sep 22, 2016 at 1:25 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:On 8/11/16 9:28 AM, Michael Paquier wrote:
Committed with that adjustment.
Thanks...
Commit e7010ce seems to forget to change help message of pg_ctl.
Attached patch.
And right, we missed that bit.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 9/22/16 11:16 AM, Masahiko Sawada wrote:
Commit e7010ce seems to forget to change help message of pg_ctl.
Attached patch.
Fixed, thanks. I also added the -t option and reformatted a bit.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers