pg_ctl promote wait

Started by Peter Eisentrautabout 10 years ago31 messageshackers
Jump to latest
#1Peter Eisentraut
peter_e@gmx.net

It would be nice if pg_ctl promote supported the -w (wait) option.

How could pg_ctl determine when the promotion has finished?

We could query pg_is_in_recovery(), but that would require database
access, which we got rid of in pg_ctl.

We could check when recovery.conf is gone or recovery.done appears.
This could work, but I think some people are trying to get rid of these
files, so building a feature on that might be a problem? (I think the
latest news on that was that the files might be different in this
hypothetical future but there would still be a file to prevent
re-recovery on restart.)

Other ideas?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Simon Riggs
simon@2ndQuadrant.com
In reply to: Peter Eisentraut (#1)
Re: pg_ctl promote wait

On 18 February 2016 at 02:47, Peter Eisentraut <peter_e@gmx.net> wrote:

It would be nice if pg_ctl promote supported the -w (wait) option.

How could pg_ctl determine when the promotion has finished?

We could query pg_is_in_recovery(), but that would require database
access, which we got rid of in pg_ctl.

We could check when recovery.conf is gone or recovery.done appears.
This could work, but I think some people are trying to get rid of these
files, so building a feature on that might be a problem? (I think the
latest news on that was that the files might be different in this
hypothetical future but there would still be a file to prevent
re-recovery on restart.)

Other ideas?

We use a trigger file under the covers, so a promotion complete file seems
useful also, but crappy.

Perhaps we should have a Server Status file that shows Standby or Master,
obviously not updated on crash.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/&gt;
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#1)
Re: pg_ctl promote wait

Hi,

On 2016-02-17 21:47:50 -0500, Peter Eisentraut wrote:

It would be nice if pg_ctl promote supported the -w (wait) option.

How could pg_ctl determine when the promotion has finished?

How about looking into pg_control? ControlFileData->state ought to have
the correct information.

Regards,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Simon Riggs
simon@2ndQuadrant.com
In reply to: Andres Freund (#3)
Re: pg_ctl promote wait

On 18 February 2016 at 08:33, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2016-02-17 21:47:50 -0500, Peter Eisentraut wrote:

It would be nice if pg_ctl promote supported the -w (wait) option.

How could pg_ctl determine when the promotion has finished?

How about looking into pg_control? ControlFileData->state ought to have
the correct information.

+1

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/&gt;
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#4)
Re: pg_ctl promote wait

On Thu, Feb 18, 2016 at 5:45 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 18 February 2016 at 08:33, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2016-02-17 21:47:50 -0500, Peter Eisentraut wrote:

It would be nice if pg_ctl promote supported the -w (wait) option.

+1

How could pg_ctl determine when the promotion has finished?

How about looking into pg_control? ControlFileData->state ought to have
the correct information.

+1

One concern is that there can be a "time" after the pg_control's state
is changed to DB_IN_PRODUCTION and before the server is able to
start accepting normal (not read-only) connections. So if users try to
start write transaction just after pg_ctl promote -w ends, they might
get an error because the server is still in recovery, i.e., the startup
process is running.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Peter Eisentraut
peter_e@gmx.net
In reply to: Andres Freund (#3)
Re: pg_ctl promote wait

On 2/18/16 3:33 AM, Andres Freund wrote:

Hi,

On 2016-02-17 21:47:50 -0500, Peter Eisentraut wrote:

It would be nice if pg_ctl promote supported the -w (wait) option.

How could pg_ctl determine when the promotion has finished?

How about looking into pg_control? ControlFileData->state ought to have
the correct information.

Is it safe to read pg_control externally without a lock? pg_controldata
will just report a CRC error and proceed, and if you're not sure you can
just run it again. But if a promotion fails every so often because of
concurrent pg_control writes, that would make this feature annoying.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Peter Eisentraut
peter_e@gmx.net
In reply to: Fujii Masao (#5)
Re: pg_ctl promote wait

On 2/19/16 10:06 AM, Fujii Masao wrote:

One concern is that there can be a "time" after the pg_control's state
is changed to DB_IN_PRODUCTION and before the server is able to
start accepting normal (not read-only) connections. So if users try to
start write transaction just after pg_ctl promote -w ends, they might
get an error because the server is still in recovery, i.e., the startup
process is running.

I think that window would be acceptable.

If not, then the way to deal with it would seem to be to create an
entirely new mechanism to communicate with pg_ctl (e.g., a file) and
call that at the very end of StartupXLOG(). I'm not sure that that is
worth it.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#6)
Re: pg_ctl promote wait

On 2016-02-19 13:48:52 -0500, Peter Eisentraut wrote:

Is it safe to read pg_control externally without a lock? pg_controldata
will just report a CRC error and proceed, and if you're not sure you can
just run it again. But if a promotion fails every so often because of
concurrent pg_control writes, that would make this feature annoying.

Yes, the OS should give sufficient guarantees here:

If write() is interrupted by a signal before it writes any data, it shall return −1 with errno set to [EINTR].

If write() is interrupted by a signal after it successfully writes some data, it shall return the number of bytes written.

If the value of nbyte is greater than {SSIZE_MAX}, the result is implementation-defined.

After a write() to a regular file has successfully returned:

* Any successful read() from each byte position in the file that was modified by that write shall return the data specified
by the write() for that position until such byte positions are again modified.

We currently assume that all "small" writes succeed in one go (and throw
errors if not). Thus the guarantees by read/write are sufficient.

Regards,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Andres Freund
andres@anarazel.de
In reply to: Fujii Masao (#5)
Re: pg_ctl promote wait

On 2016-02-20 00:06:09 +0900, Fujii Masao wrote:

One concern is that there can be a "time" after the pg_control's state
is changed to DB_IN_PRODUCTION and before the server is able to
start accepting normal (not read-only) connections. So if users try to
start write transaction just after pg_ctl promote -w ends, they might
get an error because the server is still in recovery, i.e., the startup
process is running.

It might make sense to have one more state type, which is used when
ending recovery, but before full access is allowed.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#6)
Re: pg_ctl promote wait

Peter Eisentraut <peter_e@gmx.net> writes:

Is it safe to read pg_control externally without a lock? pg_controldata
will just report a CRC error and proceed, and if you're not sure you can
just run it again. But if a promotion fails every so often because of
concurrent pg_control writes, that would make this feature annoying.

Just retry the read till you don't get a CRC error.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#7)
Re: pg_ctl promote wait

Peter Eisentraut <peter_e@gmx.net> writes:

On 2/19/16 10:06 AM, Fujii Masao wrote:

One concern is that there can be a "time" after the pg_control's state
is changed to DB_IN_PRODUCTION and before the server is able to
start accepting normal (not read-only) connections. So if users try to
start write transaction just after pg_ctl promote -w ends, they might
get an error because the server is still in recovery, i.e., the startup
process is running.

I think that window would be acceptable.

If not, then the way to deal with it would seem to be to create an
entirely new mechanism to communicate with pg_ctl (e.g., a file) and
call that at the very end of StartupXLOG(). I'm not sure that that is
worth it.

I see no need for an additional mechanism. Just watch pg_control until
you see DB_IN_PRODUCTION state there, then switch over to the same
connection probing that "pg_ctl start -w" uses.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#11)
Re: pg_ctl promote wait

On 2016-02-19 15:09:58 -0500, Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

On 2/19/16 10:06 AM, Fujii Masao wrote:

One concern is that there can be a "time" after the pg_control's state
is changed to DB_IN_PRODUCTION and before the server is able to
start accepting normal (not read-only) connections. So if users try to
start write transaction just after pg_ctl promote -w ends, they might
get an error because the server is still in recovery, i.e., the startup
process is running.

I think that window would be acceptable.

If not, then the way to deal with it would seem to be to create an
entirely new mechanism to communicate with pg_ctl (e.g., a file) and
call that at the very end of StartupXLOG(). I'm not sure that that is
worth it.

I see no need for an additional mechanism. Just watch pg_control until
you see DB_IN_PRODUCTION state there, then switch over to the same
connection probing that "pg_ctl start -w" uses.

That's afaics not sufficient if the standby was using hot standby, as
that'll let -w succeed immediately, no?

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#12)
Re: pg_ctl promote wait

Andres Freund <andres@anarazel.de> writes:

On 2016-02-19 15:09:58 -0500, Tom Lane wrote:

I see no need for an additional mechanism. Just watch pg_control until
you see DB_IN_PRODUCTION state there, then switch over to the same
connection probing that "pg_ctl start -w" uses.

That's afaics not sufficient if the standby was using hot standby, as
that'll let -w succeed immediately, no?

Oh, now I get Fujii-san's point. Yes, that wouldn't prove anything
about whether you could do a write transaction immediately.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#11)
Re: pg_ctl promote wait

On 2/19/16 3:09 PM, Tom Lane wrote:

I see no need for an additional mechanism. Just watch pg_control until
you see DB_IN_PRODUCTION state there, then switch over to the same
connection probing that "pg_ctl start -w" uses.

Here is a patch set around that idea.

The subsequent discussion mentioned that there might still be a window
between end of waiting and when read-write queries would be accepted. I
don't know how big that window would be in practice and would be
interested in some testing and feedback.

Attachments:

0001-pg_ctl-Add-tests-for-promote-action.patchapplication/x-patch; name=0001-pg_ctl-Add-tests-for-promote-action.patchDownload+73-1
0002-pg_ctl-Detect-current-standby-state-from-pg_control.patchapplication/x-patch; name=0002-pg_ctl-Detect-current-standby-state-from-pg_control.patchDownload+57-4
0003-pg_ctl-Add-wait-option-to-promote-action.patchapplication/x-patch; name=0003-pg_ctl-Add-wait-option-to-promote-action.patchDownload+82-21
#15Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#14)
Re: pg_ctl promote wait

On Mon, Feb 29, 2016 at 10:30 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

On 2/19/16 3:09 PM, Tom Lane wrote:

I see no need for an additional mechanism. Just watch pg_control until
you see DB_IN_PRODUCTION state there, then switch over to the same
connection probing that "pg_ctl start -w" uses.

Here is a patch set around that idea.

The subsequent discussion mentioned that there might still be a window
between end of waiting and when read-write queries would be accepted. I
don't know how big that window would be in practice and would be
interested in some testing and feedback.

Here is some input for 0001 (useful taken independently):
+$node_primary->append_conf(
+   "postgresql.conf", qq(
+wal_level = hot_standby
+max_wal_senders = 2
+wal_keep_segments = 20
+hot_standby = on
+)
+   );
That's more or less allows_streaming => 1 in $node_primary->init.
+$node_standby->append_conf(
+   "recovery.conf", qq(
+primary_conninfo='$connstr_primary'
+standby_mode=on
+recovery_target_timeline='latest'
+)
+   );
Here you could just use enable_streaming => 1 when calling init_from_backup.

+$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT
pg_is_in_recovery()'],
$node_standby->psql instead of a direct command? The result is
returned directly with the call to the routine.

+$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT
pg_is_in_recovery()'],
+                           qr/^f$/,
+                           'promoted standby is not in recovery');
Once again $node_standby->psql?
+sleep 3;  # needs a moment to react
+
+$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT
pg_is_in_recovery()'],
+                           qr/^f$/,
+                           'promoted standby is not in recovery');
sleep() is something we should try to avoid as much as possible in our
tests. On slow platforms, take hamster, promote is surely going to
take longer than that to be processed by the standby node and put it
out of recovery. I would suggest using
$node_standby->poll_query_until('SELECT pg_is_in_recovery()') to
validate the end of the test.
-- 
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#15)
Re: pg_ctl promote wait

On Mon, Feb 29, 2016 at 4:28 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I would suggest using
$node_standby->poll_query_until('SELECT pg_is_in_recovery()') to
validate the end of the test.

Meh. SELECT NOT pg_is_in_recovery(). This will wait until the query
returns true.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#16)
Re: pg_ctl promote wait

On Mon, Feb 29, 2016 at 4:29 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Mon, Feb 29, 2016 at 4:28 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I would suggest using
$node_standby->poll_query_until('SELECT pg_is_in_recovery()') to
validate the end of the test.

Meh. SELECT NOT pg_is_in_recovery(). This will wait until the query
returns true.

Here are some comments about 0002

+       if ((fd = open(control_file_path, O_RDONLY | PG_BINARY, 0)) == -1)
+       {
+           fprintf(stderr, _("%s: could not open file \"%s\" for
reading: %s\n"),
+                   progname, control_file_path, strerror(errno));
+           exit(1);
+       }
[...]
Most of the logic of get_control_dbstate() is a mimic of the
recently-introduced get_controlfile() in controldata_utils.c of
dc7d70e. I think that we had better use that, and that we had better
emit an error should an incorrect control file be found while running
those pg_ctl commands as the control file present had better have a
correct CRC all the time or something wrong is going on. So this would
lead to this logic for get_control_dbstate():
control_file_data = get_controlfile(pg_data, progname);
res = control_file_data->state;
pfree(control_file_data);

Except that, 0002 is a good thing to have, switching from the presence
of recovery.conf to what is in the control data file is definitely
more robust, a lot of things happen from when recovery.conf is renamed
to recovery.done until WAL is enabled for backends, particularly the
end of recovery checkpoint and the cleanup of the WAL segments of the
previous timeline.

And now for 0003...

+$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT
pg_is_in_recovery()'],
+                           qr/^t$/,
+                           'standby is in recovery');
[...]
+$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT
pg_is_in_recovery()'],
+                           qr/^f$/,
+                           'promoted standby is not in recovery');
for those two you can use $node_standby->psql_safe to get back a query result.

The subsequent discussion mentioned that there might still be a window
between end of waiting and when read-write queries would be accepted. I
don't know how big that window would be in practice and would be
interested in some testing and feedback.

And so... Based on the previous discussion, there is an interval of
time between the moment the update of the control file is done and the
point where backends are allowed to emit WAL. I am really worrying
about this interval of time actually, as once pg_ctl exits client
applications should be guaranteed to connect to the server but the
current patch would not be failure-proof, and I imagine that
particularly on CPU-constrained environments this is going to become
unstable. Particularly I expect that slow machines are likely going to
fail in the last test of 003_promote.pl as designed (I am away from
home now so I have not been able to test that unfortunately on my own
stuff but that's possible) because pg_is_in_recovery is controlled by
SharedRecoveryInProgress, so it may be possible that
pg_is_in_recovery() returns false while the control file status is
DB_IN_PRODUCTION. The main factor that can contribute to a larger
window is a higher number of 2PC transactions that need to be loaded
back to shared memory after scanning pg_twophase.

If we are going to have a reliable promote wait mode for pg_ctl, I
think that we had better first reduce this window, something that
could be done is to update SharedRecoveryInProgress while holding an
exclusive lock on ControlFileLock, with this flow for example. See for
example the patch attached, we can reduce this window to zero for
backends if some of them refer to ControlFile in shared memory thanks
to ControlFileLock. For clients, there will still be a small window
during which backends could write WAL and the control file status is
ARCHIVE_RECOVERY on disk. If we want to have a reliable promote wait
mode for pg_ctl, I think that we had better do something like the
attached first. Thoughts?

Looking at where is used the shared memory structure of ControlFile,
one thing to worry about is CreateRestartPoint but its code paths are
already using ControlFileLock when looking at the status file, so they
are safe with this logic.
--
Michael

Attachments:

delay-status-update-end-recovery.patchtext/x-patch; charset=US-ASCII; name=delay-status-update-end-recovery.patchDownload+21-10
#18David Steele
david@pgmasters.net
In reply to: Michael Paquier (#17)
Re: pg_ctl promote wait

Hi Peter,

On 3/9/16 3:08 PM, Michael Paquier wrote:

Here are some comments about 0002

<...>

I think that we had better do something like the attached first.
Thoughts?

It's been a week since Michael reviewed this patch. Could you please
comment on his proposed changes as soon as possible?

Thanks,
--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19David Steele
david@pgmasters.net
In reply to: David Steele (#18)
Re: pg_ctl promote wait

On 3/16/16 12:19 PM, David Steele wrote:

Hi Peter,

On 3/9/16 3:08 PM, Michael Paquier wrote:

Here are some comments about 0002

<...>

I think that we had better do something like the attached first.
Thoughts?

It's been a week since Michael reviewed this patch. Could you please
comment on his proposed changes as soon as possible?

Bump.

--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Michael Paquier
michael@paquier.xyz
In reply to: David Steele (#19)
Re: pg_ctl promote wait

On Wed, Mar 23, 2016 at 1:47 AM, David Steele <david@pgmasters.net> wrote:

On 3/16/16 12:19 PM, David Steele wrote:

Hi Peter,

On 3/9/16 3:08 PM, Michael Paquier wrote:

Here are some comments about 0002

<...>

I think that we had better do something like the attached first.
Thoughts?

It's been a week since Michael reviewed this patch. Could you please
comment on his proposed changes as soon as possible?

Bump.

Feature freeze is getting close by. Once the deadline is reached, I
guess that we had better return the patch.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Peter Eisentraut
peter_e@gmx.net
In reply to: Michael Paquier (#20)
#22Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#21)
#23Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#22)
#24Peter Eisentraut
peter_e@gmx.net
In reply to: Michael Paquier (#23)
#25Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#24)
#26Peter Eisentraut
peter_e@gmx.net
In reply to: Michael Paquier (#25)
#27Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#26)
#28Peter Eisentraut
peter_e@gmx.net
In reply to: Michael Paquier (#27)
#29Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Eisentraut (#28)
#30Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#29)
#31Peter Eisentraut
peter_e@gmx.net
In reply to: Masahiko Sawada (#29)