Primary not sending to synchronous standby
Hi,
I've noticed that if the primary is started and then a base backup is
immediately taken from it and started as as a synchronous standby, it
doesn't replicate and the primary hangs indefinitely when trying to run any
WAL-generating statements. It only recovers when either the primary is
restarted (which has to use a fast shutdown otherwise it also hangs
forever), or the standby is restarted.
Here's a way of reproducing it:
-------------------------------
mkdir -p -m 0700 primary standby1
initdb -N -k -D primary -E 'UTF8'
cat << PRIMARYCONFIG >> primary/postgresql.conf
shared_buffers = 8MB
logging_collector = on
log_line_prefix = '%m - %u - %d'
synchronous_standby_names = 'standby1'
max_connections = 8
wal_level = 'hot_standby'
port = 5530
max_wal_senders = 3
wal_keep_segments = 6
PRIMARYCONFIG
cat << PRIMARYHBA >> primary/pg_hba.conf
local replication rep_user trust
host replication rep_user 127.0.0.1/32 trust
host replication rep_user ::1/128 trust
PRIMARYHBA
pg_ctl start -D primary
psql -p 5530 -h localhost -c 'SET SESSION synchronous_commit TO
'off';CREATE USER rep_user REPLICATION;;' -d postgres
pg_basebackup -x -D standby1 -h localhost -p 5530 -U rep_user
cat << STANDBYCONFIG >> standby1/postgresql.conf
port = 5531
hot_standby = on
STANDBYCONFIG
cat << STANDBYRECOVERY >> standby1/recovery.conf
standby_mode = 'on'
recovery_target_timeline = 'latest'
primary_conninfo = 'host=127.0.0.1 user=rep_user port=5530
application_name=standby1'
STANDBYRECOVERY
pg_ctl -D standby1 start
-------------------------------
Note that if you run the commands one by one, there isn't a problem. If
you run it as a script, the standby doesn't connect to the primary. There
aren't any errors reported by either the standby or the primary. The
primary's wal sender process reports the following:
wal sender process rep_user 127.0.0.1(45243) startup waiting for 0/3000158
Anyone know why this would be happening? And if this could be a problem in
other scenarios?
Thom
Hi,
On 2015-02-23 15:25:57 +0000, Thom Brown wrote:
I've noticed that if the primary is started and then a base backup is
immediately taken from it and started as as a synchronous standby, it
doesn't replicate and the primary hangs indefinitely when trying to run any
WAL-generating statements. It only recovers when either the primary is
restarted (which has to use a fast shutdown otherwise it also hangs
forever), or the standby is restarted.Here's a way of reproducing it:
...
Note that if you run the commands one by one, there isn't a problem. If
you run it as a script, the standby doesn't connect to the primary. There
aren't any errors reported by either the standby or the primary. The
primary's wal sender process reports the following:wal sender process rep_user 127.0.0.1(45243) startup waiting for 0/3000158
Anyone know why this would be happening? And if this could be a problem in
other scenarios?
Given that normally a walsender doesn't wait for syncrep I guess this is
the above backend just did authentication. If you gdb into the
walsender, what's the backtrace?
We previously had discussions about that being rather annoying; I
unfortunately don't remember enough of the thread to reference it
here. If it really is this, I think we should add some more smarts about
only enabling syncrep once a backend is fully up and maybe even remove
it from more scenarios during commits generally (e.g. if no xid was
assigned and we just pruned something).
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2015-02-23 16:38:44 +0100, Andres Freund wrote:
I unfortunately don't remember enough of the thread to reference it
here.
Found the right keywords. The threads below
http://archives.postgresql.org/message-id/369698E947874884A77849D8FE3680C2%40maumau
and
/messages/by-id/5CF4ABBA67674088B3941894E22A0D25@maumau
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 23 February 2015 at 15:38, Andres Freund <andres@2ndquadrant.com> wrote:
Hi,
On 2015-02-23 15:25:57 +0000, Thom Brown wrote:
I've noticed that if the primary is started and then a base backup is
immediately taken from it and started as as a synchronous standby, it
doesn't replicate and the primary hangs indefinitely when trying to runany
WAL-generating statements. It only recovers when either the primary is
restarted (which has to use a fast shutdown otherwise it also hangs
forever), or the standby is restarted.Here's a way of reproducing it:
...
Note that if you run the commands one by one, there isn't a problem. If
you run it as a script, the standby doesn't connect to the primary.There
aren't any errors reported by either the standby or the primary. The
primary's wal sender process reports the following:wal sender process rep_user 127.0.0.1(45243) startup waiting for
0/3000158
Anyone know why this would be happening? And if this could be a problem
in
other scenarios?
Given that normally a walsender doesn't wait for syncrep I guess this is
the above backend just did authentication. If you gdb into the
walsender, what's the backtrace?
#0 0x00007f66d1725940 in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000000000617faa in WaitLatchOrSocket ()
#2 0x000000000064741b in SyncRepWaitForLSN ()
#3 0x00000000004bbf8f in CommitTransaction ()
#4 0x00000000004be135 in CommitTransactionCommand ()
#5 0x0000000000757679 in InitPostgres ()
#6 0x0000000000675032 in PostgresMain ()
#7 0x00000000004617ef in ServerLoop ()
#8 0x0000000000627c9c in PostmasterMain ()
#9 0x000000000046223d in main ()
--
Thom
On 23 February 2015 at 15:42, Andres Freund <andres@2ndquadrant.com> wrote:
On 2015-02-23 16:38:44 +0100, Andres Freund wrote:
I unfortunately don't remember enough of the thread to reference it
here.Found the right keywords. The threads below
http://archives.postgresql.org/message-id/369698E947874884A77849D8FE3680C2%40maumau
and
Yes, this seems to be virtually the same issue reported. The trace looks
the same except for RecordTransactionCommit.
--
Thom
On 2015-02-23 15:48:25 +0000, Thom Brown wrote:
On 23 February 2015 at 15:42, Andres Freund <andres@2ndquadrant.com> wrote:
On 2015-02-23 16:38:44 +0100, Andres Freund wrote:
I unfortunately don't remember enough of the thread to reference it
here.Found the right keywords. The threads below
http://archives.postgresql.org/message-id/369698E947874884A77849D8FE3680C2%40maumau
andYes, this seems to be virtually the same issue reported. The trace looks
the same except for RecordTransactionCommit.
So, I proposed in
/messages/by-id/20140707155113.GB1136@alap3.anarazel.de
that we make sequences assign a xid and only wait for syncrep when a xid
is assigned. The biggest blocker was that somebody would have to do some
code reviewing to find other locations that might need similar
treatment.
I did a, quick, grep for XLogInsert() and I think we're otherwise
fine. There's some debatable cases:
* XLOG_STANDBY_LOCK doesn't force a xid to be assigned. I think it's
harmless though, as we really only need to wait for that to be
replicated if the transaction did something relevant (i.e. catalog
changes). And those will force xid assignment.
* 2pc records don't assign a xid. But twophase.c does it's own waiting,
so that's fine.
* Plain vacuums will not trigger waits. But I think that's good. There's
really no need to wait if all that's been done is some cleanup without
visible consequences.
* Fujii brought up that we might want to wait for XLOG_SWITCH - I don't
really see why.
* XLOG_RESTORE_POINT is a similar candidate - I don't see really valid
arguments for making 2pc wait.
The attached, untested, patch changes things so that we
a) only wait for syncrep if we both wrote WAL and had a xid assigned
b) use an async commit if we just had a xid assigned, without having
written WAL, even if synchronous_commit = off
c) acquire a xid when WAL logging sequence changes (arguable at least
one of the xid assignments is redundant, but it doesn't cost
anything, so ...)
I think it makes sense to change a) and b) that way because there's no
need to wait for WAL flushes/syncrep waits when all that happened is
manipulations of temporary/unlogged tables or HOT pruning. It's slightly
wierd that the on-disk flush and the syncrep wait essentially used two
different mechanisms for deciding when to flush.
Comments? This is obviously just a POC, but I think something like this
does make a great deal of sense.
Thom, does that help?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Reconsider-when-to-flush-WAL-and-wait-for-syncrep-wh.patchtext/x-patch; charset=us-asciiDownload
>From 16828b249b4d2ebfc04678f2570d6c439981e472 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 23 Feb 2015 17:38:40 +0100
Subject: [PATCH] Reconsider when to flush WAL and wait for syncrep while
committing.
---
src/backend/access/transam/xact.c | 30 ++++++++++++++++++------------
src/backend/commands/sequence.c | 23 +++++++++++++++++++++++
2 files changed, 41 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 97000ef..2818a03 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1031,10 +1031,9 @@ RecordTransactionCommit(void)
/*
* If we didn't create XLOG entries, we're done here; otherwise we
- * should flush those entries the same as a commit record. (An
- * example of a possible record that wouldn't cause an XID to be
- * assigned is a sequence advance record due to nextval() --- we want
- * to flush that to disk before reporting commit.)
+ * should trigger flushing those entries the same as a commit record
+ * would. This will primarily happen for HOT pruning and the like; we
+ * want these to be flushed to disk in due time.
*/
if (!wrote_xlog)
goto cleanup;
@@ -1153,11 +1152,13 @@ RecordTransactionCommit(void)
/*
* Check if we want to commit asynchronously. We can allow the XLOG flush
* to happen asynchronously if synchronous_commit=off, or if the current
- * transaction has not performed any WAL-logged operation. The latter
- * case can arise if the current transaction wrote only to temporary
- * and/or unlogged tables. In case of a crash, the loss of such a
- * transaction will be irrelevant since temp tables will be lost anyway,
- * and unlogged tables will be truncated. (Given the foregoing, you might
+ * transaction has not performed any WAL-logged operation or didn't assign
+ * a xid. The transaction can end up not writing any WAL, even if it has
+ * a xid, if it only wrote to temporary and/or unlogged tables. It can
+ * end up having written WAL without and xid if it did HOT pruning. In
+ * case of a crash, the loss of such a transaction will be irrelevant;
+ * temp tables will be lost anyway, unlogged tables will be truncated and
+ * HOT pruning will be done again later. (Given the foregoing, you might
* think that it would be unnecessary to emit the XLOG record at all in
* this case, but we don't currently try to do that. It would certainly
* cause problems at least in Hot Standby mode, where the
@@ -1173,7 +1174,8 @@ RecordTransactionCommit(void)
* if all to-be-deleted tables are temporary though, since they are lost
* anyway if we crash.)
*/
- if ((wrote_xlog && synchronous_commit > SYNCHRONOUS_COMMIT_OFF) ||
+ if ((wrote_xlog && markXidCommitted &&
+ synchronous_commit > SYNCHRONOUS_COMMIT_OFF) ||
forceSyncCommit || nrels > 0)
{
XLogFlush(XactLastRecEnd);
@@ -1222,12 +1224,16 @@ RecordTransactionCommit(void)
latestXid = TransactionIdLatest(xid, nchildren, children);
/*
- * Wait for synchronous replication, if required.
+ * Wait for synchronous replication, if required. With similar reasons to
+ * the decision above about using using a async commit we only want to
+ * wait if this backend assigned a xid and wrote WAL. No need to wait if
+ * a xid was assigned due to temporary/unlogged tables or due to HOT
+ * pruning.
*
* Note that at this stage we have marked clog, but still show as running
* in the procarray and continue to hold locks.
*/
- if (wrote_xlog)
+ if (wrote_xlog && markXidCommitted)
SyncRepWaitForLSN(XactLastRecEnd);
/* Reset XactLastRecEnd until the next transaction writes something */
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 622ccf7..0070c4f 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -17,6 +17,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
+#include "access/xact.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "access/xlogutils.h"
@@ -358,6 +359,10 @@ fill_seq_with_data(Relation rel, HeapTuple tuple)
tuple->t_data->t_infomask |= HEAP_XMAX_INVALID;
ItemPointerSet(&tuple->t_data->t_ctid, 0, FirstOffsetNumber);
+ /* check the comment above nextval_internal()'s equivalent call. */
+ if (RelationNeedsWAL(rel))
+ GetTopTransactionId();
+
START_CRIT_SECTION();
MarkBufferDirty(buf);
@@ -438,6 +443,10 @@ AlterSequence(AlterSeqStmt *stmt)
/* Note that we do not change the currval() state */
elm->cached = elm->last;
+ /* check the comment above nextval_internal()'s equivalent call. */
+ if (RelationNeedsWAL(seqrel))
+ GetTopTransactionId();
+
/* Now okay to update the on-disk tuple */
START_CRIT_SECTION();
@@ -679,6 +688,16 @@ nextval_internal(Oid relid)
last_used_seq = elm;
+ /*
+ * If something needs to be WAL logged, acquire an xid, so this
+ * transaction's commit will trigger a WAL flush and wait for
+ * syncrep. It's sufficient to ensure the toplevel transaction has a xid,
+ * no need to assign xids subxacts, that'll already trigger a appropriate
+ * wait. (Have to do that here, so we're outside the critical section)
+ */
+ if (logit && RelationNeedsWAL(seqrel))
+ GetTopTransactionId();
+
/* ready to change the on-disk (or really, in-buffer) tuple */
START_CRIT_SECTION();
@@ -867,6 +886,10 @@ do_setval(Oid relid, int64 next, bool iscalled)
/* In any case, forget any future cached numbers */
elm->cached = elm->last;
+ /* check the comment above nextval_internal()'s equivalent call. */
+ if (RelationNeedsWAL(seqrel))
+ GetTopTransactionId();
+
/* ready to change the on-disk (or really, in-buffer) tuple */
START_CRIT_SECTION();
--
2.3.0.149.gf3f4077.dirty
On 23 February 2015 at 16:53, Andres Freund <andres@2ndquadrant.com> wrote:
On 2015-02-23 15:48:25 +0000, Thom Brown wrote:
On 23 February 2015 at 15:42, Andres Freund <andres@2ndquadrant.com>
wrote:
On 2015-02-23 16:38:44 +0100, Andres Freund wrote:
I unfortunately don't remember enough of the thread to reference it
here.Found the right keywords. The threads below
http://archives.postgresql.org/message-id/369698E947874884A77849D8FE3680C2%40maumau
and
/messages/by-id/5CF4ABBA67674088B3941894E22A0D25@maumau
Yes, this seems to be virtually the same issue reported. The trace looks
the same except for RecordTransactionCommit.So, I proposed in
/messages/by-id/20140707155113.GB1136@alap3.anarazel.de
that we make sequences assign a xid and only wait for syncrep when a xid
is assigned. The biggest blocker was that somebody would have to do some
code reviewing to find other locations that might need similar
treatment.I did a, quick, grep for XLogInsert() and I think we're otherwise
fine. There's some debatable cases:* XLOG_STANDBY_LOCK doesn't force a xid to be assigned. I think it's
harmless though, as we really only need to wait for that to be
replicated if the transaction did something relevant (i.e. catalog
changes). And those will force xid assignment.
* 2pc records don't assign a xid. But twophase.c does it's own waiting,
so that's fine.
* Plain vacuums will not trigger waits. But I think that's good. There's
really no need to wait if all that's been done is some cleanup without
visible consequences.
* Fujii brought up that we might want to wait for XLOG_SWITCH - I don't
really see why.
* XLOG_RESTORE_POINT is a similar candidate - I don't see really valid
arguments for making 2pc wait.The attached, untested, patch changes things so that we
a) only wait for syncrep if we both wrote WAL and had a xid assigned
b) use an async commit if we just had a xid assigned, without having
written WAL, even if synchronous_commit = off
c) acquire a xid when WAL logging sequence changes (arguable at least
one of the xid assignments is redundant, but it doesn't cost
anything, so ...)I think it makes sense to change a) and b) that way because there's no
need to wait for WAL flushes/syncrep waits when all that happened is
manipulations of temporary/unlogged tables or HOT pruning. It's slightly
wierd that the on-disk flush and the syncrep wait essentially used two
different mechanisms for deciding when to flush.Comments? This is obviously just a POC, but I think something like this
does make a great deal of sense.Thom, does that help?
Yeah, this appears to eliminate the problem, at least in the case I
reported.
Thanks
--
Thom
On 2015-02-23 17:53:59 +0100, Andres Freund wrote:
On 2015-02-23 15:48:25 +0000, Thom Brown wrote:
On 23 February 2015 at 15:42, Andres Freund <andres@2ndquadrant.com> wrote:
On 2015-02-23 16:38:44 +0100, Andres Freund wrote:
I unfortunately don't remember enough of the thread to reference it
here.Found the right keywords. The threads below
http://archives.postgresql.org/message-id/369698E947874884A77849D8FE3680C2%40maumau
andYes, this seems to be virtually the same issue reported. The trace looks
the same except for RecordTransactionCommit.So, I proposed in
/messages/by-id/20140707155113.GB1136@alap3.anarazel.de
that we make sequences assign a xid and only wait for syncrep when a xid
is assigned. The biggest blocker was that somebody would have to do some
code reviewing to find other locations that might need similar
treatment.I did a, quick, grep for XLogInsert() and I think we're otherwise
fine. There's some debatable cases:* XLOG_STANDBY_LOCK doesn't force a xid to be assigned. I think it's
harmless though, as we really only need to wait for that to be
replicated if the transaction did something relevant (i.e. catalog
changes). And those will force xid assignment.
* 2pc records don't assign a xid. But twophase.c does it's own waiting,
so that's fine.
* Plain vacuums will not trigger waits. But I think that's good. There's
really no need to wait if all that's been done is some cleanup without
visible consequences.
* Fujii brought up that we might want to wait for XLOG_SWITCH - I don't
really see why.
* XLOG_RESTORE_POINT is a similar candidate - I don't see really valid
arguments for making 2pc wait.The attached, untested, patch changes things so that we
a) only wait for syncrep if we both wrote WAL and had a xid assigned
b) use an async commit if we just had a xid assigned, without having
written WAL, even if synchronous_commit = off
c) acquire a xid when WAL logging sequence changes (arguable at least
one of the xid assignments is redundant, but it doesn't cost
anything, so ...)I think it makes sense to change a) and b) that way because there's no
need to wait for WAL flushes/syncrep waits when all that happened is
manipulations of temporary/unlogged tables or HOT pruning. It's slightly
wierd that the on-disk flush and the syncrep wait essentially used two
different mechanisms for deciding when to flush.
I think this patch is a clear improvement - and unless somebody protests
I'm planning to push it sometime not too far away.
What I'm wondering about is what we want to do with the back branches:
It's really rather ugly that enabling syncrep can prevent you from
logging in; especially if the connection that's blocked is the syncrep
walsender. It'll resolve itself if you kill the connections often
enough, but still, that's something you don't wan to have to do. The
patch isn't particularly complex and should only make things efficient,
so it'd be defensible to backpatch it. Thoughts?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2015-02-23 17:09:24 +0000, Thom Brown wrote:
On 23 February 2015 at 16:53, Andres Freund <andres@2ndquadrant.com> wrote:
Comments? This is obviously just a POC, but I think something like this
does make a great deal of sense.Thom, does that help?
Yeah, this appears to eliminate the problem, at least in the case I
reported.
I've pushed a somewhat more evolved version of this after more testing.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 26 February 2015 at 13:08, Andres Freund <andres@2ndquadrant.com> wrote:
On 2015-02-23 17:09:24 +0000, Thom Brown wrote:
On 23 February 2015 at 16:53, Andres Freund <andres@2ndquadrant.com>
wrote:
Comments? This is obviously just a POC, but I think something like this
does make a great deal of sense.Thom, does that help?
Yeah, this appears to eliminate the problem, at least in the case I
reported.I've pushed a somewhat more evolved version of this after more testing.
Thanks. I'll give it another round of testing later.
--
Thom