PostgreSQL doesn't stop properly when the --slot option is specified with pg_receivexlog.
Hi,
"pg_ctl stop" doesn't work properly if the --slot option is specified, because WAL is flushed only when the WAL file is switched.
These processes keep running even after the postmaster has exited: pg_receivexlog, walsender, and logger.
How to reproduce:
1. Start PostgreSQL
2. Create a replication slot
3. Start pg_receivexlog with the --slot option
4. Stop PostgreSQL
Example commands to reproduce:
1. pg_ctl start
2. psql -c "select pg_create_physical_replication_slot('test');"
3. pg_receivexlog --slot='test' -D ./testdir
4. pg_ctl stop
This happens because the --slot option makes pg_receivexlog include a flush location in its feedback messages,
but it flushes only when the WAL file is switched, so walsender waits forever for the WAL to be flushed.
When --slot is not specified, the flush location is set to 'invalid' and walsender uses the write location to check replication, so it can stop properly.
So I thought that if we set the flush location to 'invalid' in this case as well, the problem would be solved.
Does the --slot option really need to set the flush location?
Thoughts?
Regards,
--
Furuya Osamu
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Nov 14, 2014 at 7:22 PM, <furuyao@pm.nttdata.co.jp> wrote:
Hi,
"pg_ctl stop" doesn't work properly if the --slot option is specified, because WAL is flushed only when the WAL file is switched.
These processes keep running even after the postmaster has exited: pg_receivexlog, walsender, and logger.
I could reproduce this problem. At normal shutdown, walsender keeps waiting
for the last WAL record to be replicated and flushed in pg_receivexlog. But
pg_receivexlog issues an fsync only when the WAL file is switched. Thus,
since pg_receivexlog may never flush the last WAL record, walsender may have
to keep waiting indefinitely.
pg_recvlogical handles this problem by calling fsync() when it receives a
request for an immediate reply from the server. That is, at shutdown, walsender
sends the request, pg_receivexlog receives it, flushes the last WAL record,
and sends the flush location back to the server. Since walsender can see that
the last WAL record is successfully flushed in pg_receivexlog, it can
exit cleanly.
One idea for solving the problem is to introduce into pg_receivexlog the same logic as pg_recvlogical has. Thoughts?
Regards,
--
Fujii Masao
On 2014-11-15 03:25:16 +0900, Fujii Masao wrote:
On Fri, Nov 14, 2014 at 7:22 PM, <furuyao@pm.nttdata.co.jp> wrote:
Hi,
"pg_ctl stop" doesn't work properly if the --slot option is specified, because WAL is flushed only when the WAL file is switched.
These processes keep running even after the postmaster has exited: pg_receivexlog, walsender, and logger.
I could reproduce this problem. At normal shutdown, walsender keeps waiting
for the last WAL record to be replicated and flushed in pg_receivexlog. But
pg_receivexlog issues an fsync only when the WAL file is switched. Thus,
since pg_receivexlog may never flush the last WAL record, walsender may have
to keep waiting indefinitely.
Right.
pg_recvlogical handles this problem by calling fsync() when it receives a
request for an immediate reply from the server. That is, at shutdown, walsender
sends the request, pg_receivexlog receives it, flushes the last WAL record,
and sends the flush location back to the server. Since walsender can see that
the last WAL record is successfully flushed in pg_receivexlog, it can
exit cleanly.
One idea for solving the problem is to introduce into pg_receivexlog the same logic as pg_recvlogical has. Thoughts?
Sounds sane to me. Are you looking into doing that?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Nov 15, 2014 at 3:42 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-11-15 03:25:16 +0900, Fujii Masao wrote:
On Fri, Nov 14, 2014 at 7:22 PM, <furuyao@pm.nttdata.co.jp> wrote:
"pg_ctl stop" doesn't work properly if the --slot option is specified, because WAL is flushed only when the WAL file is switched.
These processes keep running even after the postmaster has exited: pg_receivexlog, walsender, and logger.
I could reproduce this problem. At normal shutdown, walsender keeps waiting
for the last WAL record to be replicated and flushed in pg_receivexlog. But
pg_receivexlog issues an fsync only when the WAL file is switched. Thus,
since pg_receivexlog may never flush the last WAL record, walsender may have
to keep waiting indefinitely.
Right.
It is surprising that nobody complained about that before;
pg_receivexlog was released two years ago.
pg_recvlogical handles this problem by calling fsync() when it receives a
request for an immediate reply from the server. That is, at shutdown, walsender
sends the request, pg_receivexlog receives it, flushes the last WAL record,
and sends the flush location back to the server. Since walsender can see that
the last WAL record is successfully flushed in pg_receivexlog, it can
exit cleanly.
One idea for solving the problem is to introduce into pg_receivexlog the same logic as pg_recvlogical has. Thoughts?
Sounds sane to me. Are you looking into doing that?
Yep, it sounds like a good thing to do when the master requests an answer from the
client in the keepalive message. Something like the attached patch
should do the trick.
--
Michael
Attachment: 20141115_pg_receivexlog_flush_ptr.patch (text/x-patch)
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index c6c90fb..b93d52b 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -149,6 +149,34 @@ open_walfile(XLogRecPtr startpoint, uint32 timeline, char *basedir,
}
/*
+ * Flush the current WAL file to disk and update the last WAL flush
+ * positions as well as the last fsync timestamp. On failure, print
+ * a message to stderr and return false, otherwise return true.
+ */
+static bool
+fsync_walfile(XLogRecPtr blockpos, int64 timestamp)
+{
+ /* nothing to do if no WAL file open */
+ if (walfile == -1)
+ return true;
+
+ /* nothing to do if data has already been flushed */
+ if (blockpos <= lastFlushPosition)
+ return true;
+
+ if (fsync(walfile) != 0)
+ {
+ fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
+ progname, current_walfile_name, strerror(errno));
+ return false;
+ }
+
+ lastFlushPosition = blockpos;
+ last_fsync = timestamp;
+ return true;
+}
+
+/*
* Close the current WAL file (if open), and rename it to the correct
* filename if it's complete. On failure, prints an error message to stderr
* and returns false, otherwise returns true.
@@ -787,21 +815,12 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
* If fsync_interval has elapsed since last WAL flush and we've written
* some WAL data, flush them to disk.
*/
- if (lastFlushPosition < blockpos &&
- walfile != -1 &&
- ((fsync_interval > 0 &&
- feTimestampDifferenceExceeds(last_fsync, now, fsync_interval)) ||
- fsync_interval < 0))
+ if ((fsync_interval > 0 &&
+ feTimestampDifferenceExceeds(last_fsync, now, fsync_interval)) ||
+ fsync_interval < 0)
{
- if (fsync(walfile) != 0)
- {
- fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
- progname, current_walfile_name, strerror(errno));
+ if (!fsync_walfile(blockpos, now))
goto error;
- }
-
- lastFlushPosition = blockpos;
- last_fsync = now;
}
/*
@@ -999,7 +1018,6 @@ ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
{
int pos;
bool replyRequested;
- int64 now;
/*
* Parse the keepalive message, enclosed in the CopyData message.
@@ -1021,7 +1039,12 @@ ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
/* If the server requested an immediate reply, send one. */
if (replyRequested && still_sending)
{
- now = feGetCurrentTimestamp();
+ int64 now = feGetCurrentTimestamp();
+
+ /* fsync data, so a fresh flush position is sent */
+ if (!fsync_walfile(blockpos, now))
+ return false;
+
if (!sendFeedback(conn, blockpos, now, false))
return false;
*last_status = now;
On Sat, Nov 15, 2014 at 9:10 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Sat, Nov 15, 2014 at 3:42 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-11-15 03:25:16 +0900, Fujii Masao wrote:
On Fri, Nov 14, 2014 at 7:22 PM, <furuyao@pm.nttdata.co.jp> wrote:
"pg_ctl stop" doesn't work properly if the --slot option is specified, because WAL is flushed only when the WAL file is switched.
These processes keep running even after the postmaster has exited: pg_receivexlog, walsender, and logger.
I could reproduce this problem. At normal shutdown, walsender keeps waiting
for the last WAL record to be replicated and flushed in pg_receivexlog. But
pg_receivexlog issues an fsync only when the WAL file is switched. Thus,
since pg_receivexlog may never flush the last WAL record, walsender may have
to keep waiting indefinitely.
Right.
It is surprising that nobody complained about that before;
pg_receivexlog was released two years ago.
It's not so surprising, because the problem can happen only when
a replication slot is specified, i.e., in version 9.4 or later.
pg_recvlogical handles this problem by calling fsync() when it receives a
request for an immediate reply from the server. That is, at shutdown, walsender
sends the request, pg_receivexlog receives it, flushes the last WAL record,
and sends the flush location back to the server. Since walsender can see that
the last WAL record is successfully flushed in pg_receivexlog, it can
exit cleanly.
One idea for solving the problem is to introduce into pg_receivexlog the same logic as pg_recvlogical has. Thoughts?
Sounds sane to me. Are you looking into doing that?
Yep, it sounds like a good thing to do when the master requests an answer from the
client in the keepalive message. Something like the attached patch
should do the trick.
Isn't it better to do this only when a replication slot is used?
Regards,
--
Fujii Masao
On Mon, Nov 17, 2014 at 10:02 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Sat, Nov 15, 2014 at 9:10 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Yep, it sounds like a good thing to do when the master requests an answer from the
client in the keepalive message. Something like the attached patch
should do the trick.
Isn't it better to do this only when a replication slot is used?
Makes sense. What about a check using reportFlushPosition then?
--
Michael
Attachment: 20141117_pg_receivexlog_flush_ptr.patch (text/x-patch)
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index c6c90fb..b426cfe 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -149,6 +149,34 @@ open_walfile(XLogRecPtr startpoint, uint32 timeline, char *basedir,
}
/*
+ * Flush the current WAL file to disk and update the last WAL flush
+ * positions as well as the last fsync timestamp. On failure, print
+ * a message to stderr and return false, otherwise return true.
+ */
+static bool
+fsync_walfile(XLogRecPtr blockpos, int64 timestamp)
+{
+ /* nothing to do if no WAL file open */
+ if (walfile == -1)
+ return true;
+
+ /* nothing to do if data has already been flushed */
+ if (blockpos <= lastFlushPosition)
+ return true;
+
+ if (fsync(walfile) != 0)
+ {
+ fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
+ progname, current_walfile_name, strerror(errno));
+ return false;
+ }
+
+ lastFlushPosition = blockpos;
+ last_fsync = timestamp;
+ return true;
+}
+
+/*
* Close the current WAL file (if open), and rename it to the correct
* filename if it's complete. On failure, prints an error message to stderr
* and returns false, otherwise returns true.
@@ -787,21 +815,12 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
* If fsync_interval has elapsed since last WAL flush and we've written
* some WAL data, flush them to disk.
*/
- if (lastFlushPosition < blockpos &&
- walfile != -1 &&
- ((fsync_interval > 0 &&
- feTimestampDifferenceExceeds(last_fsync, now, fsync_interval)) ||
- fsync_interval < 0))
+ if ((fsync_interval > 0 &&
+ feTimestampDifferenceExceeds(last_fsync, now, fsync_interval)) ||
+ fsync_interval < 0)
{
- if (fsync(walfile) != 0)
- {
- fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
- progname, current_walfile_name, strerror(errno));
+ if (!fsync_walfile(blockpos, now))
goto error;
- }
-
- lastFlushPosition = blockpos;
- last_fsync = now;
}
/*
@@ -999,7 +1018,6 @@ ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
{
int pos;
bool replyRequested;
- int64 now;
/*
* Parse the keepalive message, enclosed in the CopyData message.
@@ -1021,7 +1039,15 @@ ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
/* If the server requested an immediate reply, send one. */
if (replyRequested && still_sending)
{
- now = feGetCurrentTimestamp();
+ int64 now = feGetCurrentTimestamp();
+
+ /*
+ * Flush data, so a fresh position is sent but do this only if a
+ * flush position needs to be reported.
+ */
+ if (reportFlushPosition && !fsync_walfile(blockpos, now))
+ return false;
+
if (!sendFeedback(conn, blockpos, now, false))
return false;
*last_status = now;
On Mon, Nov 17, 2014 at 12:30 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Mon, Nov 17, 2014 at 10:02 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Sat, Nov 15, 2014 at 9:10 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:Yep, sounds a good thing to do if master requested answer from the
client in the keepalive message. Something like the patch attached
would make the deal.Isn't it better to do this only when replication slot is used?
Makes sense. What about a check using reportFlushPosition then?
Sounds reasonable. Thanks for updating the patch!
But the patch could no longer be applied to master cleanly
because c4f99d2 heavily changed the code that the patch touches...
I rewrote the patch and pushed it to both master and REL9_4_STABLE.
Anyway, thanks!
Regards,
--
Fujii Masao
On Wed, Nov 19, 2014 at 6:16 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Mon, Nov 17, 2014 at 12:30 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Mon, Nov 17, 2014 at 10:02 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Sat, Nov 15, 2014 at 9:10 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Yep, it sounds like a good thing to do when the master requests an answer from the
client in the keepalive message. Something like the attached patch
should do the trick.
Isn't it better to do this only when a replication slot is used?
Makes sense. What about a check using reportFlushPosition then?
Sounds reasonable. Thanks for updating the patch!
But the patch could no longer be applied to master cleanly
because c4f99d2 heavily changed the code that the patch touches...
I rewrote the patch and pushed it to both master and REL9_4_STABLE.
Anyway, thanks!
Is this:
+ if (reportFlushPosition && lastFlushPosition < blockpos &&
+ walfile != 1)
really correct? Shouldn't that walfile test be against -1 (minus one)?
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Wed, Nov 19, 2014 at 5:15 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Wed, Nov 19, 2014 at 6:16 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Mon, Nov 17, 2014 at 12:30 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Mon, Nov 17, 2014 at 10:02 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Sat, Nov 15, 2014 at 9:10 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Yep, it sounds like a good thing to do when the master requests an answer from the
client in the keepalive message. Something like the attached patch
should do the trick.
Isn't it better to do this only when a replication slot is used?
Makes sense. What about a check using reportFlushPosition then?
Sounds reasonable. Thanks for updating the patch!
But the patch could no longer be applied to master cleanly
because c4f99d2 heavily changed the code that the patch touches...
I rewrote the patch and pushed it to both master and REL9_4_STABLE.
Anyway, thanks!
Is this:
+ if (reportFlushPosition && lastFlushPosition < blockpos &&
+     walfile != 1)
really correct? Shouldn't that walfile test be against -1 (minus one)?
Oops... You're right. That's really a mistake... Just fixed and pushed. Thanks!
Regards,
--
Fujii Masao