pg_receivexlog stops upon server restart
Hi,
I've tried out pg_receivexlog and have noticed that when restarting
the cluster, pg_receivexlog gets cut off... it doesn't keep waiting.
This is surprising as the DBA would have to remember to start
pg_receivexlog up again.
--
Thom
On Friday, April 6, 2012, Thom Brown wrote:
> I've tried out pg_receivexlog and have noticed that when restarting
> the cluster, pg_receivexlog gets cut off... it doesn't keep waiting.
> This is surprising as the DBA would have to remember to start
> pg_receivexlog up again.
This is intentional in the sense that that's how the code was written;
there's no malfunctioning piece of code somewhere.
It would probably make sense to have an auto-reconnect feature, and to have
an option to turn it on/off.
If you haven't already (my wifi here is currently quite useless, which is
why I'm working on my email backlog, so I can't check), please add it to
the open items list.
//Magnus
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On 10 April 2012 21:07, Magnus Hagander <magnus@hagander.net> wrote:
> It would probably make sense to have an auto-reconnect feature, and to have
> an option to turn it on/off.
>
> If you haven't already (my wifi here is currently quite useless, which is
> why I'm working on my email backlog, so I can't check), please add it to the
> open items list.
I think it would also be useful to add a paragraph to the
documentation stating use-cases for this feature, and its advantages.
--
Thom
On Thu, Apr 19, 2012 at 1:00 PM, Thom Brown <thom@linux.com> wrote:
> I think it would also be useful to add a paragraph to the
> documentation stating use-cases for this feature, and its advantages.
Attached is a patch that implements this. Seems reasonable?
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Attachments:
pg_receivexlog_loop.patch
diff --git a/doc/src/sgml/ref/pg_receivexlog.sgml b/doc/src/sgml/ref/pg_receivexlog.sgml
index 8e5fca4..ee31065 100644
--- a/doc/src/sgml/ref/pg_receivexlog.sgml
+++ b/doc/src/sgml/ref/pg_receivexlog.sgml
@@ -58,6 +58,14 @@ PostgreSQL documentation
configured with <xref linkend="guc-max-wal-senders"> set high enough to
leave at least one session available for the stream.
</para>
+
+ <para>
+ If the connection is lost, or if it cannot be initially established,
+ with a non fatal error, <application>pg_receivexlog</application> will
+ retry the connection indefinitely, and reestablish streaming as soon
+ as possible. To avoid this behavior, use the <literal>-n</literal>
+ parameter.
+ </para>
</refsect1>
<refsect1>
@@ -87,6 +95,17 @@ PostgreSQL documentation
<variablelist>
<varlistentry>
+ <term><option>-n</option></term>
+ <term><option>--noloop</option></term>
+ <listitem>
+ <para>
+ Don't loop on connection errors. Instead, exit right away with
+ an error.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-v</option></term>
<term><option>--verbose</option></term>
<listitem>
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0289c4b..b08afbd 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -280,6 +280,9 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
/* Get a second connection */
param->bgconn = GetConnection();
+ if (!param->bgconn)
+ /* Error message already written in GetConnection() */
+ exit(1);
/*
* Always in plain format, so we can write to basedir/pg_xlog. But the
@@ -916,6 +919,9 @@ BaseBackup(void)
* Connect in replication mode to the server
*/
conn = GetConnection();
+ if (!conn)
+ /* Error message already written in GetConnection() */
+ exit(1);
/*
* Run IDENTIFY_SYSTEM so we can get the timeline
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 2134c87..c1fae27 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -33,9 +33,13 @@
#include "getopt_long.h"
+/* Time to sleep between reconnection attempts */
+#define RECONNECT_SLEEP_TIME 5
+
/* Global options */
char *basedir = NULL;
int verbose = 0;
+int noloop = 0;
int standby_message_timeout = 10; /* 10 sec = default */
volatile bool time_to_abort = false;
@@ -55,6 +59,7 @@ usage(void)
printf(_("\nOptions controlling the output:\n"));
printf(_(" -D, --dir=directory receive xlog files into this directory\n"));
printf(_("\nGeneral options:\n"));
+ printf(_(" -n, --noloop do not loop on connection lost\n"));
printf(_(" -v, --verbose output verbose messages\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_(" -V, --version output version information, then exit\n"));
@@ -223,6 +228,9 @@ StreamLog(void)
* Connect in replication mode to the server
*/
conn = GetConnection();
+ if (!conn)
+ /* Error message already written in GetConnection() */
+ return;
/*
* Run IDENTIFY_SYSTEM so we can get the timeline and current xlog
@@ -326,7 +334,7 @@ main(int argc, char **argv)
}
}
- while ((c = getopt_long(argc, argv, "D:h:p:U:s:wWv",
+ while ((c = getopt_long(argc, argv, "D:h:p:U:s:nwWv",
long_options, &option_index)) != -1)
{
switch (c)
@@ -364,6 +372,9 @@ main(int argc, char **argv)
exit(1);
}
break;
+ case 'n':
+ noloop = 1;
+ break;
case 'v':
verbose++;
break;
@@ -406,7 +417,28 @@ main(int argc, char **argv)
pqsignal(SIGINT, sigint_handler);
#endif
- StreamLog();
+ while (true)
+ {
+ StreamLog();
+ if (time_to_abort)
+ /*
+ * We've been Ctrl-C'ed. That's not an error, so exit without
+ * an errorcode.
+ */
+ exit(0);
+ else if (noloop)
+ {
+ fprintf(stderr, _("%s: disconnected.\n"), progname);
+ exit(1);
+ }
+ else
+ {
+ fprintf(stderr, _("%s: disconnected. Waiting %d seconds to try again\n"),
+ progname, RECONNECT_SLEEP_TIME);
+ pg_usleep(RECONNECT_SLEEP_TIME * 1000000);
+ }
+ }
- exit(0);
+ /* Never get here */
+ exit(2);
}
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index cc01537..1416faa 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -65,6 +65,11 @@ xmalloc0(int size)
}
+/*
+ * Connect to the server. Returns a valid PGconn pointer if connected,
+ * or NULL on non-permanent error. On permanent error, the function will
+ * call exit(1) directly.
+ */
PGconn *
GetConnection(void)
{
@@ -151,7 +156,7 @@ GetConnection(void)
{
fprintf(stderr, _("%s: could not connect to server: %s\n"),
progname, PQerrorMessage(tmpconn));
- exit(1);
+ return NULL;
}
/* Connection ok! */
On 24 May 2012 13:05, Magnus Hagander <magnus@hagander.net> wrote:
> Attached is a patch that implements this. Seems reasonable?
s/non fatal/non-fatal/
Yes, this solves the problem for me, except you forgot to translate
noloop in long_options[] . :)
--
Thom
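For readers following along, the point about long_options[] can be illustrated in isolation. This is only a minimal sketch, not the committed fix: it assumes the system <getopt.h> rather than PostgreSQL's own "getopt_long.h", the helper name parse_noloop is hypothetical, and the option table is cut down to two entries for illustration. It shows that the short form comes from the "n" in the option string, while the long form only works once a matching table entry exists.

```c
#include <getopt.h>
#include <stddef.h>

/*
 * Hypothetical helper for illustration only: returns 1 if -n or --noloop
 * was given, 0 if not, and -1 on an unrecognized option.
 */
static int
parse_noloop(int argc, char **argv)
{
    /* Cut-down table; the real one in pg_receivexlog.c has more entries. */
    static const struct option long_options[] = {
        {"verbose", no_argument, NULL, 'v'},
        {"noloop", no_argument, NULL, 'n'},   /* the entry the review asked for */
        {NULL, 0, NULL, 0}
    };
    int c;
    int option_index;
    int noloop = 0;

    optind = 1;   /* restart scanning so the helper can be called again */

    while ((c = getopt_long(argc, argv, "nv",
                            long_options, &option_index)) != -1)
    {
        switch (c)
        {
            case 'n':
                noloop = 1;
                break;
            case 'v':
                break;
            default:
                return -1;
        }
    }
    return noloop;
}
```

Without the "noloop" row, -n would still be accepted through the short-option string, but --noloop would be reported as an unrecognized option.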
On Thu, May 24, 2012 at 2:34 PM, Thom Brown <thom@linux.com> wrote:
> s/non fatal/non-fatal/
>
> Yes, this solves the problem for me, except you forgot to translate
> noloop in long_options[] . :)
Fixed :-)
Did you test it, or just assume it worked? ;)
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On 24 May 2012 13:37, Magnus Hagander <magnus@hagander.net> wrote:
> Fixed :-)
>
> Did you test it, or just assume it worked? ;)
How very dare you. Of course I tested it. It successfully reconnects
on multiple restarts, checks intermittently when I've stopped the
server, showing the connection error message, successfully continues
when I eventually bring the server back up, and doesn't attempt a
reconnect when using -n.
So looks good to me.
--
Thom
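The behaviour tested above can be sketched as a small standalone program fragment. This is an illustration under stated assumptions, not the code from the patch: fake_stream_log and fake_sleep are hypothetical stand-ins for StreamLog() and pg_usleep(), the "server" is simulated by a counter, and stream_with_retry returns the exit status instead of calling exit() so that the logic can be exercised without a running cluster.

```c
#include <stdbool.h>
#include <stdio.h>

#define RECONNECT_SLEEP_TIME 5  /* seconds, same value as in the patch */

/* In the real tool this flag is set from the SIGINT handler. */
static volatile bool time_to_abort = false;

static int attempts = 0;        /* how often the stand-in StreamLog ran */
static int slept_seconds = 0;   /* total time the sketch would have slept */

/*
 * Hypothetical stand-in for StreamLog(): pretend the connection is lost
 * on the first two attempts and the user presses Ctrl-C during the third.
 */
static void
fake_stream_log(void)
{
    attempts++;
    if (attempts >= 3)
        time_to_abort = true;
}

/* Hypothetical stand-in for pg_usleep(): record the wait, don't block. */
static void
fake_sleep(int seconds)
{
    slept_seconds += seconds;
}

/* Returns the exit status the looping main() in the patch would use. */
static int
stream_with_retry(bool noloop)
{
    for (;;)
    {
        fake_stream_log();

        if (time_to_abort)
            return 0;           /* interrupted by the user: not an error */

        if (noloop)
        {
            fprintf(stderr, "disconnected.\n");
            return 1;           /* -n given: give up right away */
        }

        fprintf(stderr, "disconnected. Waiting %d seconds to try again\n",
                RECONNECT_SLEEP_TIME);
        fake_sleep(RECONNECT_SLEEP_TIME);
    }
}
```

With noloop set, the function returns 1 after a single attempt and never sleeps. Without it, the loop keeps retrying at fixed intervals until the abort flag is raised, which matches the reconnect-on-restart and no-reconnect-with -n cases described in the test report.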
On Thursday, May 24, 2012, Thom Brown wrote:
> How very dare you. Of course I tested it. It successfully reconnects
> on multiple restarts, checks intermittently when I've stopped the
> server, showing the connection error message, successfully continues
> when I eventually bring the server back up, and doesn't attempt a
> reconnect when using -n.
>
> So looks good to me.
Thanks - applied!
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/