Review of Refactoring code for sync node detection
Original message (sorry, don't have a copy to reply to :( ): /messages/by-id/CAB7nPqThX=WvuGA1J-_CQ293dK3FmUivuYkNvHR0W5xjEc=oFQ@mail.gmail.com
Commitfest URL: https://commitfest.postgresql.org/action/patch_view?id=1579
Summary:
I'd like someone to chime in on the potential performance impact. Other than that, looks good modulo a few grammar fixes.
Patch applies cleanly and passes make check.
Searching for both SyncRepLock and sync_standby_priority I find no other code this patch needs to modify.
SyncRepGetSynchronousNode():
IMHO, the top comment should mention that if there are multiple standbys of the same priority it returns the first one.
This switches from using a single if() with multiple conditions &&'d together to a bunch of if() continue's. I don't know if those will perform the same, and AFAIK this is pretty performance critical.
<grammarZealot>In the comments, "Process" should be "Proceed".</grammarZealot>
The original search logic in SyncRepReleaseWaiters is correctly copied and negated.
pg_stat_get_wal_senders():
The original version only loops through walsnds[] one time; the new code loops twice, both times while holding SyncRepLock(shared). This will block transaction commit if syncrep is enabled. That's not great, but presumably this function isn't going to be called terribly often, so maybe it's OK. (On a side note, perhaps we should add a warning to the documentation for pg_stat_replication warning not to hammer it...)
<grammar>
+ /* Get first the priorities on each standby as long as we hold a lock */
should be
+ /* First get the priority of each standby as long as we hold a lock */
or even just
+ /* First get the priority of each standby */
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thanks for your review! (No worries for creating a new thread, I don't
mind as this is not a huge patch. I think you could have downloaded
the mbox from postgresql.org btw).
On Thu, Oct 30, 2014 at 8:24 AM, Jim Nasby <jim@nasby.net> wrote:
SyncRepGetSynchronousNode():
IMHO, the top comment should mention that if there are multiple standbys of
the same priority it returns the first one.
OK. Corrected.
This switches from using a single if() with multiple conditions &&'d
together to a bunch of if() continue's. I don't know if those will perform
the same, and AFAIK this is pretty performance critical.
Well, we could still use the old notation with a single if(). That's
not much complicated to change.
<grammarZealot>In the comments, "Process" should be
"Proceed".</grammarZealot>
Noted. Thanks for correcting my Frenglish.
pg_stat_get_wal_senders():
The original version only loops through walsnds[] one time; the new code
loops twice, both times while holding SyncRepLock(shared). This will block
transaction commit if syncrep is enabled. That's not great, but presumably
this function isn't going to be called terribly often, so maybe it's OK. (On
a side note, perhaps we should add a warning to the documentation for
pg_stat_replication warning not to hammer it...)
Hm. For code readability I did not really want to do so, but let's do
this: let's add a new argument in SyncRepGetSynchronousNode to
retrieve the priority of each WAL sender and do a single pass through
the array such as there is no performance difference.
Updated patch attached.
Regards,
--
Michael
Attachments:
20141030_syncrep_refactor_v2.patchtext/x-diff; charset=US-ASCII; name=20141030_syncrep_refactor_v2.patchDownload
From 1a02c982de0aeb2bd7dcf0bbf34605017619a417 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@otacoo.com>
Date: Thu, 30 Oct 2014 22:01:10 +0900
Subject: [PATCH] Refactor code to detect synchronous node in WAL sender array
This patch is made to remove code duplication in walsender.c and syncrep.c
in order to detect what is the node with the lowest strictly-positive
priority, facilitating maintenance of this code.
---
src/backend/replication/syncrep.c | 99 +++++++++++++++++++++++++++----------
src/backend/replication/walsender.c | 36 +++-----------
src/include/replication/syncrep.h | 1 +
3 files changed, 80 insertions(+), 56 deletions(-)
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..70c799b 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -5,7 +5,7 @@
* Synchronous replication is new as of PostgreSQL 9.1.
*
* If requested, transaction commits wait until their commit LSN is
- * acknowledged by the sync standby.
+ * acknowledged by the synchronous standby.
*
* This module contains the code for waiting and release of backends.
* All code in this module executes on the primary. The core streaming
@@ -357,6 +357,70 @@ SyncRepInitConfig(void)
}
}
+
+/*
+ * Obtain position of synchronous standby in the array referencing all
+ * the WAL senders, or -1 if no synchronous node can be found. The caller
+ * of this function should take a lock on SyncRepLock. If there are
+ * multiple nodes with the same lowest priority value, the first node
+ * found is selected.
+ * sync_priority is a preallocated array of size max_wal_senders that
+ * can be used to retrieve the priority of each WAL sender. Its
+ * inclusion in this function has the advantage to limit the scan
+ * of the WAL sender array to one pass, limiting the amount of cycles
+ * SyncRepLock is taken.
+ */
+int
+SyncRepGetSynchronousNode(int *node_priority)
+{
+ int sync_node = -1;
+ int priority = 0;
+ int i;
+
+ /* Scan WAL senders and find synchronous node if any */
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ /* Use volatile pointer to prevent code rearrangement */
+ volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+ /* First get the priority of this WAL sender */
+ if (node_priority)
+ node_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
+ 0 : walsnd->sync_standby_priority;
+
+ /*
+ * Proceed immediately to next WAL sender if one of those
+ * conditions is satisfied:
+ * - Not active with PID equal to 0.
+ * - Not streaming.
+ * - Priority conditions not satisfied. If priority is not 0 it
+ * means that one potential synchronous node has already been
+ * found. The node selected needs as well to have the lowest
+ * priority in the list of WAL senders.
+ * - Stream flush position is invalid.
+ */
+ if (walsnd->pid == 0 ||
+ walsnd->state != WALSNDSTATE_STREAMING ||
+ walsnd->sync_standby_priority == 0 ||
+ (priority != 0 &&
+ priority <= walsnd->sync_standby_priority) ||
+ XLogRecPtrIsInvalid(walsnd->flush))
+ continue;
+
+ /*
+ * We have a potential synchronous candidate, choose it as the
+ * synchronous node for the time being before going through the
+ * other nodes listed in the WAL sender array.
+ */
+ sync_node = i;
+
+ /* Update priority to current value of WAL sender */
+ priority = walsnd->sync_standby_priority;
+ }
+
+ return sync_node;
+}
+
/*
* Update the LSNs on each queue based upon our latest state. This
* implements a simple policy of first-valid-standby-releases-waiter.
@@ -369,10 +433,9 @@ SyncRepReleaseWaiters(void)
{
volatile WalSndCtlData *walsndctl = WalSndCtl;
volatile WalSnd *syncWalSnd = NULL;
+ int sync_node;
int numwrite = 0;
int numflush = 0;
- int priority = 0;
- int i;
/*
* If this WALSender is serving a standby that is not on the list of
@@ -388,32 +451,14 @@ SyncRepReleaseWaiters(void)
/*
* We're a potential sync standby. Release waiters if we are the highest
* priority standby. If there are multiple standbys with same priorities
- * then we use the first mentioned standby. If you change this, also
- * change pg_stat_get_wal_senders().
+ * then we use the first mentioned standbys.
*/
LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+ sync_node = SyncRepGetSynchronousNode(NULL);
- for (i = 0; i < max_wal_senders; i++)
- {
- /* use volatile pointer to prevent code rearrangement */
- volatile WalSnd *walsnd = &walsndctl->walsnds[i];
-
- if (walsnd->pid != 0 &&
- walsnd->state == WALSNDSTATE_STREAMING &&
- walsnd->sync_standby_priority > 0 &&
- (priority == 0 ||
- priority > walsnd->sync_standby_priority) &&
- !XLogRecPtrIsInvalid(walsnd->flush))
- {
- priority = walsnd->sync_standby_priority;
- syncWalSnd = walsnd;
- }
- }
-
- /*
- * We should have found ourselves at least.
- */
- Assert(syncWalSnd);
+ /* We should have found ourselves at least */
+ Assert(sync_node >= 0 && sync_node < max_wal_senders);
+ syncWalSnd = &WalSndCtl->walsnds[sync_node];
/*
* If we aren't managing the highest priority standby then just leave.
@@ -444,7 +489,7 @@ SyncRepReleaseWaiters(void)
elog(DEBUG3, "released %d procs up to write %X/%X, %d procs up to flush %X/%X",
numwrite, (uint32) (MyWalSnd->write >> 32), (uint32) MyWalSnd->write,
- numflush, (uint32) (MyWalSnd->flush >> 32), (uint32) MyWalSnd->flush);
+ numflush, (uint32) (MyWalSnd->flush >> 32), (uint32) MyWalSnd->flush);
/*
* If we are managing the highest priority standby, though we weren't
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 385d18b..7974ed9 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2742,8 +2742,7 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
MemoryContext per_query_ctx;
MemoryContext oldcontext;
int *sync_priority;
- int priority = 0;
- int sync_standby = -1;
+ int sync_standby;
int i;
/* check to see if caller supports us returning a tuplestore */
@@ -2774,36 +2773,13 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
/*
* Get the priorities of sync standbys all in one go, to minimise lock
* acquisitions and to allow us to evaluate who is the current sync
- * standby. This code must match the code in SyncRepReleaseWaiters().
+ * standby.
*/
- sync_priority = palloc(sizeof(int) * max_wal_senders);
+ sync_priority = (int *) palloc(sizeof(int) * max_wal_senders);
LWLockAcquire(SyncRepLock, LW_SHARED);
- for (i = 0; i < max_wal_senders; i++)
- {
- /* use volatile pointer to prevent code rearrangement */
- volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
- if (walsnd->pid != 0)
- {
- /*
- * Treat a standby such as a pg_basebackup background process
- * which always returns an invalid flush location, as an
- * asynchronous standby.
- */
- sync_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
- 0 : walsnd->sync_standby_priority;
-
- if (walsnd->state == WALSNDSTATE_STREAMING &&
- walsnd->sync_standby_priority > 0 &&
- (priority == 0 ||
- priority > walsnd->sync_standby_priority) &&
- !XLogRecPtrIsInvalid(walsnd->flush))
- {
- priority = walsnd->sync_standby_priority;
- sync_standby = i;
- }
- }
- }
+ /* Obtain list of synchronous standbys */
+ sync_standby = SyncRepGetSynchronousNode(sync_priority);
LWLockRelease(SyncRepLock);
for (i = 0; i < max_wal_senders; i++)
@@ -2873,6 +2849,8 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
}
+
+ /* Cleanup */
pfree(sync_priority);
/* clean up and return the tuplestore */
diff --git a/src/include/replication/syncrep.h b/src/include/replication/syncrep.h
index 7eeaf3b..62e1652 100644
--- a/src/include/replication/syncrep.h
+++ b/src/include/replication/syncrep.h
@@ -49,6 +49,7 @@ extern void SyncRepUpdateSyncStandbysDefined(void);
/* called by various procs */
extern int SyncRepWakeQueue(bool all, int mode);
+extern int SyncRepGetSynchronousNode(int *sync_priority);
extern bool check_synchronous_standby_names(char **newval, void **extra, GucSource source);
extern void assign_synchronous_commit(int newval, void *extra);
--
2.1.2
On 10/30/14, 8:05 AM, Michael Paquier wrote:
This switches from using a single if() with multiple conditions &&'d
together to a bunch of if() continue's. I don't know if those will perform
the same, and AFAIK this is pretty performance critical.Well, we could still use the old notation with a single if(). That's
not much complicated to change.
I actually prefer the multiple if's; it reads a LOT cleaner. I don't know what the compiler will do with it though.
If we stick with this version I'd argue it makes more sense to just stick the sync_node = and priority = statements into the if block and ditch the continue. </nitpick>
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
If we stick with this version I'd argue it makes more sense to just stick
the sync_node = and priority = statements into the if block and ditch the
continue. </nitpick>
Let's go with the cleaner version then, I'd prefer code that can be read
easily for this code path. Switching back is not much complicated either.
--
Michael
Attachments:
20141031_syncrep_refactor_v3.patchtext/x-diff; charset=US-ASCII; name=20141031_syncrep_refactor_v3.patchDownload
From 00d85f046d187056a85c6d6dc82e9a79674b132d Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@otacoo.com>
Date: Thu, 30 Oct 2014 22:01:10 +0900
Subject: [PATCH] Refactor code to detect synchronous node in WAL sender array
This patch is made to remove code duplication in walsender.c and syncrep.c
in order to detect what is the node with the lowest strictly-positive
priority, facilitating maintenance of this code.
---
src/backend/replication/syncrep.c | 101 ++++++++++++++++++++++++++----------
src/backend/replication/walsender.c | 36 +++----------
src/include/replication/syncrep.h | 1 +
3 files changed, 82 insertions(+), 56 deletions(-)
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..848c783 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -5,7 +5,7 @@
* Synchronous replication is new as of PostgreSQL 9.1.
*
* If requested, transaction commits wait until their commit LSN is
- * acknowledged by the sync standby.
+ * acknowledged by the synchronous standby.
*
* This module contains the code for waiting and release of backends.
* All code in this module executes on the primary. The core streaming
@@ -357,6 +357,72 @@ SyncRepInitConfig(void)
}
}
+
+/*
+ * Obtain position of synchronous standby in the array referencing all
+ * the WAL senders, or -1 if no synchronous node can be found. The caller
+ * of this function should take a lock on SyncRepLock. If there are
+ * multiple nodes with the same lowest priority value, the first node
+ * found is selected.
+ * sync_priority is a preallocated array of size max_wal_senders that
+ * can be used to retrieve the priority of each WAL sender. Its
+ * inclusion in this function has the advantage to limit the scan
+ * of the WAL sender array to one pass, limiting the amount of cycles
+ * SyncRepLock is taken.
+ */
+int
+SyncRepGetSynchronousNode(int *node_priority)
+{
+ int sync_node = -1;
+ int priority = 0;
+ int i;
+
+ /* Scan WAL senders and find synchronous node if any */
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ /* Use volatile pointer to prevent code rearrangement */
+ volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+ /* First get the priority of this WAL sender */
+ if (node_priority)
+ node_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
+ 0 : walsnd->sync_standby_priority;
+
+ /* Proceed to next if not active */
+ if (walsnd->pid == 0)
+ continue;
+
+ /* Proceed to next if not streaming */
+ if (walsnd->state != WALSNDSTATE_STREAMING)
+ continue;
+
+ /* Proceed to next one if asynchronous */
+ if (walsnd->sync_standby_priority == 0)
+ continue;
+
+ /* Proceed to next one if priority conditions not satisfied */
+ if (priority != 0 &&
+ priority <= walsnd->sync_standby_priority)
+ continue;
+
+ /* Proceed to next one if flush position is invalid */
+ if (XLogRecPtrIsInvalid(walsnd->flush))
+ continue;
+
+ /*
+ * We have a potential synchronous candidate, choose it as the
+ * synchronous node for the time being before going through the
+ * other nodes listed in the WAL sender array.
+ */
+ sync_node = i;
+
+ /* Update priority to current value of WAL sender */
+ priority = walsnd->sync_standby_priority;
+ }
+
+ return sync_node;
+}
+
/*
* Update the LSNs on each queue based upon our latest state. This
* implements a simple policy of first-valid-standby-releases-waiter.
@@ -369,10 +435,9 @@ SyncRepReleaseWaiters(void)
{
volatile WalSndCtlData *walsndctl = WalSndCtl;
volatile WalSnd *syncWalSnd = NULL;
+ int sync_node;
int numwrite = 0;
int numflush = 0;
- int priority = 0;
- int i;
/*
* If this WALSender is serving a standby that is not on the list of
@@ -388,32 +453,14 @@ SyncRepReleaseWaiters(void)
/*
* We're a potential sync standby. Release waiters if we are the highest
* priority standby. If there are multiple standbys with same priorities
- * then we use the first mentioned standby. If you change this, also
- * change pg_stat_get_wal_senders().
+ * then we use the first mentioned standbys.
*/
LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+ sync_node = SyncRepGetSynchronousNode(NULL);
- for (i = 0; i < max_wal_senders; i++)
- {
- /* use volatile pointer to prevent code rearrangement */
- volatile WalSnd *walsnd = &walsndctl->walsnds[i];
-
- if (walsnd->pid != 0 &&
- walsnd->state == WALSNDSTATE_STREAMING &&
- walsnd->sync_standby_priority > 0 &&
- (priority == 0 ||
- priority > walsnd->sync_standby_priority) &&
- !XLogRecPtrIsInvalid(walsnd->flush))
- {
- priority = walsnd->sync_standby_priority;
- syncWalSnd = walsnd;
- }
- }
-
- /*
- * We should have found ourselves at least.
- */
- Assert(syncWalSnd);
+ /* We should have found ourselves at least */
+ Assert(sync_node >= 0 && sync_node < max_wal_senders);
+ syncWalSnd = &WalSndCtl->walsnds[sync_node];
/*
* If we aren't managing the highest priority standby then just leave.
@@ -444,7 +491,7 @@ SyncRepReleaseWaiters(void)
elog(DEBUG3, "released %d procs up to write %X/%X, %d procs up to flush %X/%X",
numwrite, (uint32) (MyWalSnd->write >> 32), (uint32) MyWalSnd->write,
- numflush, (uint32) (MyWalSnd->flush >> 32), (uint32) MyWalSnd->flush);
+ numflush, (uint32) (MyWalSnd->flush >> 32), (uint32) MyWalSnd->flush);
/*
* If we are managing the highest priority standby, though we weren't
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 385d18b..7974ed9 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2742,8 +2742,7 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
MemoryContext per_query_ctx;
MemoryContext oldcontext;
int *sync_priority;
- int priority = 0;
- int sync_standby = -1;
+ int sync_standby;
int i;
/* check to see if caller supports us returning a tuplestore */
@@ -2774,36 +2773,13 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
/*
* Get the priorities of sync standbys all in one go, to minimise lock
* acquisitions and to allow us to evaluate who is the current sync
- * standby. This code must match the code in SyncRepReleaseWaiters().
+ * standby.
*/
- sync_priority = palloc(sizeof(int) * max_wal_senders);
+ sync_priority = (int *) palloc(sizeof(int) * max_wal_senders);
LWLockAcquire(SyncRepLock, LW_SHARED);
- for (i = 0; i < max_wal_senders; i++)
- {
- /* use volatile pointer to prevent code rearrangement */
- volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
- if (walsnd->pid != 0)
- {
- /*
- * Treat a standby such as a pg_basebackup background process
- * which always returns an invalid flush location, as an
- * asynchronous standby.
- */
- sync_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
- 0 : walsnd->sync_standby_priority;
-
- if (walsnd->state == WALSNDSTATE_STREAMING &&
- walsnd->sync_standby_priority > 0 &&
- (priority == 0 ||
- priority > walsnd->sync_standby_priority) &&
- !XLogRecPtrIsInvalid(walsnd->flush))
- {
- priority = walsnd->sync_standby_priority;
- sync_standby = i;
- }
- }
- }
+ /* Obtain list of synchronous standbys */
+ sync_standby = SyncRepGetSynchronousNode(sync_priority);
LWLockRelease(SyncRepLock);
for (i = 0; i < max_wal_senders; i++)
@@ -2873,6 +2849,8 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
}
+
+ /* Cleanup */
pfree(sync_priority);
/* clean up and return the tuplestore */
diff --git a/src/include/replication/syncrep.h b/src/include/replication/syncrep.h
index 7eeaf3b..62e1652 100644
--- a/src/include/replication/syncrep.h
+++ b/src/include/replication/syncrep.h
@@ -49,6 +49,7 @@ extern void SyncRepUpdateSyncStandbysDefined(void);
/* called by various procs */
extern int SyncRepWakeQueue(bool all, int mode);
+extern int SyncRepGetSynchronousNode(int *sync_priority);
extern bool check_synchronous_standby_names(char **newval, void **extra, GucSource source);
extern void assign_synchronous_commit(int newval, void *extra);
--
2.1.3
On 31 October 2014 00:27, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
If we stick with this version I'd argue it makes more sense to just stick
the sync_node = and priority = statements into the if block and ditch the
continue. </nitpick>Let's go with the cleaner version then, I'd prefer code that can be read
easily for this code path. Switching back is not much complicated either.
I like where you are headed with this feature set. I will do my best
to comment and review so we get something in 9.5.
Some review comments
* The new function calls them a Node rather than a Standby. That is a
good change, but it needs to be applied consistently, so we no longer
use the word Standby anywhere near the sync rep code. I'd prefer if we
continue to use the abbreviation sync for synchronous, since its less
typing and avoids spelling mistakes in code and comments.
* Rationale...
Robert Haas wrote
Interestingly, the syncrep code has in some of its code paths the idea
that a synchronous node is unique, while other code paths assume that
there can be multiple synchronous nodes. If that's fine I think that
it would be better to just make the code multiple-sync node aware, by
having a single function call in walsender.c and syncrep.c that
returns an integer array of WAL sender positions (WalSndCtl). as that
seems more extensible long-term.
I thought this was the rationale for the patch, but it doesn't do
this. If you disagree with Robert, it would be best to say so now,
since I would guess he's expecting the patch to be doing as he asked.
Or perhaps I misunderstand.
* Multiple sync standbys were originally avoided because of the code
cost and complexity at ReleaseWaiters(). I'm very keen to keep the
code at that point very robust, so simplicity is essential. It would
also be useful to have some kind of micro-benchmark that allows us to
understand the overhead of various configs. Jim mentions he isn't sure
of the performance there. I don't see too much a problem with this
patch, but the later patches will require this, so we may as well do
this soon.
* We probably don't need a comment like /* Cleanup */ inserted above a
pfree. ;-)
I see this should get committed in next few days, even outside the CF.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thanks for the comments!
On Sun, Nov 16, 2014 at 8:32 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 31 October 2014 00:27, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
If we stick with this version I'd argue it makes more sense to just stick
the sync_node = and priority = statements into the if block and ditch the
continue. </nitpick>Let's go with the cleaner version then, I'd prefer code that can be read
easily for this code path. Switching back is not much complicated either.I like where you are headed with this feature set. I will do my best
to comment and review so we get something in 9.5.Some review comments
* The new function calls them a Node rather than a Standby. That is a
good change, but it needs to be applied consistently, so we no longer
use the word Standby anywhere near the sync rep code. I'd prefer if we
continue to use the abbreviation sync for synchronous, since its less
typing and avoids spelling mistakes in code and comments.
Yes I guess this makes sense.
* Rationale...
Robert Haas wroteInterestingly, the syncrep code has in some of its code paths the idea
that a synchronous node is unique, while other code paths assume that
there can be multiple synchronous nodes. If that's fine I think that
it would be better to just make the code multiple-sync node aware, by
having a single function call in walsender.c and syncrep.c that
returns an integer array of WAL sender positions (WalSndCtl). as that
seems more extensible long-term.I thought this was the rationale for the patch, but it doesn't do
this. If you disagree with Robert, it would be best to say so now,
since I would guess he's expecting the patch to be doing as he asked.
Or perhaps I misunderstand.
This comment is originally from me :)
/messages/by-id/CAB7nPqTtmSo0YX1_3_PykpKbdGUi3UeFd0_=2-6VRQo5N_TB+A@mail.gmail.com
Changing with my older opinion, I wrote the patch on this thread with
in mind the idea to keep the code as simple as possible, so for now it
is better to still think that we have a single sync node. Let's work
the multiple node thing once we have a better spec of how to do it,
visibly using a dedicated micro-language within s_s_names.
* Multiple sync standbys were originally avoided because of the code
cost and complexity at ReleaseWaiters(). I'm very keen to keep the
code at that point very robust, so simplicity is essential. It would
also be useful to have some kind of micro-benchmark that allows us to
understand the overhead of various configs. Jim mentions he isn't sure
of the performance there. I don't see too much a problem with this
patch, but the later patches will require this, so we may as well do
this soon.
Yes I have been playing with that with the patch series of the
previous commit fest, with something not that much kludgy.
* We probably don't need a comment like /* Cleanup */ inserted above a
pfree. ;-)
Sure. I'll send an updated patch tomorrow.
Regards,
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Nov 16, 2014 at 9:07 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
I'll send an updated patch tomorrow.
Here are updated versions. I divided the patch into two for clarity,
the first one is the actual refactoring patch, renaming
SyncRepGetSynchronousNode to SyncRepGetSynchronousStandby (+alpha,
like updating synchronous to sync in the comments as you mentioned)
such as the namings have no conflicts.
The second one updates the syncrep code, including WAL sender and WAL
receiver, and its surroundings to use the term "node" instead of
"standby". This brings in the code the idea that a node using
replication APIs is not necessarily a standby, making it more generic.
This can be applied on top of the refactoring patch. If any other
folks (Heikki or Fujii-san?) have comments about this idea feel free.
Regards,
--
Michael
Attachments:
0001-Refactor-code-to-detect-synchronous-node-in-WAL-send.patchtext/x-patch; charset=US-ASCII; name=0001-Refactor-code-to-detect-synchronous-node-in-WAL-send.patchDownload
From 51e9988ff44b7c2b716e3a0da3f1d1c9359a1d79 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@otacoo.com>
Date: Thu, 30 Oct 2014 22:01:10 +0900
Subject: [PATCH 1/2] Refactor code to detect synchronous node in WAL sender
array
This patch is made to remove code duplication in walsender.c and syncrep.c
in order to detect what is the node with the lowest strictly-positive
priority, facilitating maintenance of this code.
---
src/backend/replication/syncrep.c | 97 +++++++++++++++++++++++++++----------
src/backend/replication/walsender.c | 35 +++----------
src/include/replication/syncrep.h | 1 +
3 files changed, 78 insertions(+), 55 deletions(-)
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..f838ad0 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -357,6 +357,70 @@ SyncRepInitConfig(void)
}
}
+
+/*
+ * Obtain position of sync standby in the array referencing all the WAL
+ * senders, or -1 if no sync node can be found. The caller of this function
+ * should take a lock on SyncRepLock. If there are multiple nodes with
+ * the same lowest priority value, the first node found is selected.
+ * sync_priority is a preallocated array of size max_wal_senders that can
+ * be used to retrieve the priority of each WAL sender. Its inclusion in
+ * this function has the advantage to limit the scan of the WAL sender
+ * array to one pass, limiting the amount of cycles SyncRepLock is taken.
+ */
+int
+SyncRepGetSynchronousStandby(int *node_priority)
+{
+ int sync_node = -1;
+ int priority = 0;
+ int i;
+
+ /* Scan WAL senders and find the sync node if any */
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ /* Use volatile pointer to prevent code rearrangement */
+ volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+ /* First get the priority of this WAL sender */
+ if (node_priority)
+ node_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
+ 0 : walsnd->sync_standby_priority;
+
+ /* Proceed to next if not active */
+ if (walsnd->pid == 0)
+ continue;
+
+ /* Proceed to next if not streaming */
+ if (walsnd->state != WALSNDSTATE_STREAMING)
+ continue;
+
+ /* Proceed to next one if asynchronous */
+ if (walsnd->sync_standby_priority == 0)
+ continue;
+
+ /* Proceed to next one if priority conditions not satisfied */
+ if (priority != 0 &&
+ priority <= walsnd->sync_standby_priority)
+ continue;
+
+ /* Proceed to next one if flush position is invalid */
+ if (XLogRecPtrIsInvalid(walsnd->flush))
+ continue;
+
+ /*
+ * We have a potential sync candidate, choose it as the sync
+ * node for the time being before going through the other nodes
+ * listed in the WAL sender array.
+ */
+ sync_node = i;
+
+ /* Update priority to current value of WAL sender */
+ priority = walsnd->sync_standby_priority;
+ }
+
+ return sync_node;
+}
+
/*
* Update the LSNs on each queue based upon our latest state. This
* implements a simple policy of first-valid-standby-releases-waiter.
@@ -369,10 +433,9 @@ SyncRepReleaseWaiters(void)
{
volatile WalSndCtlData *walsndctl = WalSndCtl;
volatile WalSnd *syncWalSnd = NULL;
+ int sync_node;
int numwrite = 0;
int numflush = 0;
- int priority = 0;
- int i;
/*
* If this WALSender is serving a standby that is not on the list of
@@ -388,32 +451,14 @@ SyncRepReleaseWaiters(void)
/*
* We're a potential sync standby. Release waiters if we are the highest
* priority standby. If there are multiple standbys with same priorities
- * then we use the first mentioned standby. If you change this, also
- * change pg_stat_get_wal_senders().
+ * then we use the first mentioned standby.
*/
LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+ sync_node = SyncRepGetSynchronousStandby(NULL);
- for (i = 0; i < max_wal_senders; i++)
- {
- /* use volatile pointer to prevent code rearrangement */
- volatile WalSnd *walsnd = &walsndctl->walsnds[i];
-
- if (walsnd->pid != 0 &&
- walsnd->state == WALSNDSTATE_STREAMING &&
- walsnd->sync_standby_priority > 0 &&
- (priority == 0 ||
- priority > walsnd->sync_standby_priority) &&
- !XLogRecPtrIsInvalid(walsnd->flush))
- {
- priority = walsnd->sync_standby_priority;
- syncWalSnd = walsnd;
- }
- }
-
- /*
- * We should have found ourselves at least.
- */
- Assert(syncWalSnd);
+ /* We should have found ourselves at least */
+ Assert(sync_node >= 0 && sync_node < max_wal_senders);
+ syncWalSnd = &WalSndCtl->walsnds[sync_node];
/*
* If we aren't managing the highest priority standby then just leave.
@@ -444,7 +489,7 @@ SyncRepReleaseWaiters(void)
elog(DEBUG3, "released %d procs up to write %X/%X, %d procs up to flush %X/%X",
numwrite, (uint32) (MyWalSnd->write >> 32), (uint32) MyWalSnd->write,
- numflush, (uint32) (MyWalSnd->flush >> 32), (uint32) MyWalSnd->flush);
+ numflush, (uint32) (MyWalSnd->flush >> 32), (uint32) MyWalSnd->flush);
/*
* If we are managing the highest priority standby, though we weren't
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 385d18b..2d35cc9 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2742,8 +2742,7 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
MemoryContext per_query_ctx;
MemoryContext oldcontext;
int *sync_priority;
- int priority = 0;
- int sync_standby = -1;
+ int sync_standby;
int i;
/* check to see if caller supports us returning a tuplestore */
@@ -2774,36 +2773,13 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
/*
* Get the priorities of sync standbys all in one go, to minimise lock
* acquisitions and to allow us to evaluate who is the current sync
- * standby. This code must match the code in SyncRepReleaseWaiters().
+ * standby.
*/
- sync_priority = palloc(sizeof(int) * max_wal_senders);
+ sync_priority = (int *) palloc(sizeof(int) * max_wal_senders);
LWLockAcquire(SyncRepLock, LW_SHARED);
- for (i = 0; i < max_wal_senders; i++)
- {
- /* use volatile pointer to prevent code rearrangement */
- volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
- if (walsnd->pid != 0)
- {
- /*
- * Treat a standby such as a pg_basebackup background process
- * which always returns an invalid flush location, as an
- * asynchronous standby.
- */
- sync_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
- 0 : walsnd->sync_standby_priority;
-
- if (walsnd->state == WALSNDSTATE_STREAMING &&
- walsnd->sync_standby_priority > 0 &&
- (priority == 0 ||
- priority > walsnd->sync_standby_priority) &&
- !XLogRecPtrIsInvalid(walsnd->flush))
- {
- priority = walsnd->sync_standby_priority;
- sync_standby = i;
- }
- }
- }
+ /* Look for sync standby if any */
+ sync_standby = SyncRepGetSynchronousStandby(sync_priority);
LWLockRelease(SyncRepLock);
for (i = 0; i < max_wal_senders; i++)
@@ -2873,6 +2849,7 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
}
+
pfree(sync_priority);
/* clean up and return the tuplestore */
diff --git a/src/include/replication/syncrep.h b/src/include/replication/syncrep.h
index 7eeaf3b..8361c2f 100644
--- a/src/include/replication/syncrep.h
+++ b/src/include/replication/syncrep.h
@@ -49,6 +49,7 @@ extern void SyncRepUpdateSyncStandbysDefined(void);
/* called by various procs */
extern int SyncRepWakeQueue(bool all, int mode);
+extern int SyncRepGetSynchronousStandby(int *sync_priority);
extern bool check_synchronous_standby_names(char **newval, void **extra, GucSource source);
extern void assign_synchronous_commit(int newval, void *extra);
--
2.1.3
0002-Replace-concept-of-standby-by-node-in-replication-co.patchtext/x-patch; charset=US-ASCII; name=0002-Replace-concept-of-standby-by-node-in-replication-co.patchDownload
From a550be7c87ae52e004cc75290cbb84a3f660accd Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@otacoo.com>
Date: Mon, 17 Nov 2014 14:07:17 +0900
Subject: [PATCH 2/2] Replace concept of standby by node in replication code
path
The code path impacted by this change are synchronous replication code,
and WAL-related libraries: WAL receiver and WAL sender. This makes the
code more generic by adding the idea that nodes are not necessarily
standbys.
---
src/backend/postmaster/checkpointer.c | 2 +-
src/backend/replication/README | 2 +-
src/backend/replication/slot.c | 4 +-
src/backend/replication/syncrep.c | 122 +++++++++++++-------------
src/backend/replication/walreceiver.c | 10 +--
src/backend/replication/walsender.c | 130 ++++++++++++++--------------
src/backend/utils/misc/guc.c | 2 +-
src/include/replication/syncrep.h | 6 +-
src/include/replication/walsender_private.h | 10 +--
9 files changed, 144 insertions(+), 144 deletions(-)
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 6c814ba..3bb9eea 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -1353,7 +1353,7 @@ static void
UpdateSharedMemoryConfig(void)
{
/* update global shmem state for sync rep */
- SyncRepUpdateSyncStandbysDefined();
+ SyncRepUpdateSyncNodesDefined();
/*
* If full_page_writes has been changed by SIGHUP, we update it in shared
diff --git a/src/backend/replication/README b/src/backend/replication/README
index 2f5df49..78f3c9b 100644
--- a/src/backend/replication/README
+++ b/src/backend/replication/README
@@ -65,7 +65,7 @@ Walsender IPC
At shutdown, postmaster handles walsender processes differently from regular
backends. It waits for regular backends to die before writing the
shutdown checkpoint and terminating pgarch and other auxiliary processes, but
-that's not desirable for walsenders, because we want the standby servers to
+that's not desirable for walsenders, because we want the node servers to
receive all the WAL, including the shutdown checkpoint, before the master
is shut down. Therefore postmaster treats walsenders like the pgarch process,
and instructs them to terminate at PM_SHUTDOWN_2 phase, after all regular
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 937b669..ade2296 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -17,8 +17,8 @@
* premature removal of WAL or of old tuple versions in a manner that would
* interfere with replication; they are also useful for monitoring purposes.
* Slots need to be permanent (to allow restarts), crash-safe, and allocatable
- * on standbys (to support cascading setups). The requirement that slots be
- * usable on standbys precludes storing them in the system catalogs.
+ * on nodes (to support cascading setups). The requirement that slots be
+ * usable on nodes precludes storing them in the system catalogs.
*
* Each replication slot gets its own directory inside the $PGDATA/pg_replslot
* directory. Inside that directory the state file will contain the slot's
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index f838ad0..92bba44 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -5,23 +5,23 @@
* Synchronous replication is new as of PostgreSQL 9.1.
*
* If requested, transaction commits wait until their commit LSN is
- * acknowledged by the sync standby.
+ * acknowledged by the sync node.
*
* This module contains the code for waiting and release of backends.
* All code in this module executes on the primary. The core streaming
* replication transport remains within WALreceiver/WALsender modules.
*
* The essence of this design is that it isolates all logic about
- * waiting/releasing onto the primary. The primary defines which standbys
- * it wishes to wait for. The standby is completely unaware of the
+ * waiting/releasing onto the primary. The primary defines which nodes
+ * it wishes to wait for. The node is completely unaware of the
* durability requirements of transactions on the primary, reducing the
- * complexity of the code and streamlining both standby operations and
+ * complexity of the code and streamlining both node operations and
* network bandwidth because there is no requirement to ship
* per-transaction state information.
*
* Replication is either synchronous or not synchronous (async). If it is
* async, we just fastpath out of here. If it is sync, then we wait for
- * the write or flush location on the standby before releasing the waiting
+ * the write or flush location on the node before releasing the waiting
* backend. Further complexity in that interaction is expected in later
* releases.
*
@@ -29,10 +29,10 @@
* single ordered queue of waiting backends, so that we can avoid
* searching the through all waiters each time we receive a reply.
*
- * In 9.1 we support only a single synchronous standby, chosen from a
+ * In 9.1 we support only a single synchronous node, chosen from a
* priority list of synchronous_standby_names. Before it can become the
- * synchronous standby it must have caught up with the primary; that may
- * take some time. Once caught up, the current highest priority standby
+ * synchronous node it must have caught up with the primary; that may
+ * take some time. Once caught up, the current highest priority node
* will release waiters from the queue.
*
* Portions Copyright (c) 2010-2014, PostgreSQL Global Development Group
@@ -58,10 +58,10 @@
#include "utils/ps_status.h"
/* User-settable parameters for sync rep */
-char *SyncRepStandbyNames;
+char *SyncRepNodeNames;
-#define SyncStandbysDefined() \
- (SyncRepStandbyNames != NULL && SyncRepStandbyNames[0] != '\0')
+#define SyncNodesDefined() \
+ (SyncRepNodeNames != NULL && SyncRepNodeNames[0] != '\0')
static bool announce_next_takeover = true;
@@ -70,7 +70,7 @@ static int SyncRepWaitMode = SYNC_REP_NO_WAIT;
static void SyncRepQueueInsert(int mode);
static void SyncRepCancelWait(void);
-static int SyncRepGetStandbyPriority(void);
+static int SyncRepGetNodePriority(void);
#ifdef USE_ASSERT_CHECKING
static bool SyncRepQueueIsOrderedByLSN(int mode);
@@ -100,10 +100,10 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
/*
* Fast exit if user has not requested sync replication, or there are no
- * sync replication standby names defined. Note that those standbys don't
+ * sync replication node names defined. Note that those nodes don't
* need to be connected.
*/
- if (!SyncRepRequested() || !SyncStandbysDefined())
+ if (!SyncRepRequested() || !SyncNodesDefined())
return;
Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
@@ -113,14 +113,14 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
Assert(MyProc->syncRepState == SYNC_REP_NOT_WAITING);
/*
- * We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not
- * set. See SyncRepUpdateSyncStandbysDefined.
+ * We don't wait for sync rep if WalSndCtl->sync_nodes_defined is not
+ * set. See SyncRepUpdateSyncNodesDefined.
*
- * Also check that the standby hasn't already replied. Unlikely race
+ * Also check that the node hasn't already replied. Unlikely race
* condition but we'll be fetching that cache line anyway so it's likely
* to be a low cost check.
*/
- if (!WalSndCtl->sync_standbys_defined ||
+ if (!WalSndCtl->sync_nodes_defined ||
XactCommitLSN <= WalSndCtl->lsn[mode])
{
LWLockRelease(SyncRepLock);
@@ -206,7 +206,7 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
ereport(WARNING,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("canceling the wait for synchronous replication and terminating connection due to administrator command"),
- errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
+ errdetail("The transaction has already committed locally, but might not have been replicated to the node.")));
whereToSendOutput = DestNone;
SyncRepCancelWait();
break;
@@ -223,7 +223,7 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
QueryCancelPending = false;
ereport(WARNING,
(errmsg("canceling wait for synchronous replication due to user request"),
- errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
+ errdetail("The transaction has already committed locally, but might not have been replicated to the node.")));
SyncRepCancelWait();
break;
}
@@ -342,24 +342,24 @@ SyncRepInitConfig(void)
int priority;
/*
- * Determine if we are a potential sync standby and remember the result
- * for handling replies from standby.
+ * Determine if we are a potential sync node and remember the result
+ * for handling replies from node.
*/
- priority = SyncRepGetStandbyPriority();
- if (MyWalSnd->sync_standby_priority != priority)
+ priority = SyncRepGetNodePriority();
+ if (MyWalSnd->sync_node_priority != priority)
{
LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
- MyWalSnd->sync_standby_priority = priority;
+ MyWalSnd->sync_node_priority = priority;
LWLockRelease(SyncRepLock);
ereport(DEBUG1,
- (errmsg("standby \"%s\" now has synchronous standby priority %u",
+ (errmsg("node \"%s\" now has synchronous node priority %u",
application_name, priority)));
}
}
/*
- * Obtain position of sync standby in the array referencing all the WAL
+ * Obtain position of sync node in the array referencing all the WAL
* senders, or -1 if no sync node can be found. The caller of this function
* should take a lock on SyncRepLock. If there are multiple nodes with
* the same lowest priority value, the first node found is selected.
@@ -369,7 +369,7 @@ SyncRepInitConfig(void)
* array to one pass, limiting the amount of cycles SyncRepLock is taken.
*/
int
-SyncRepGetSynchronousStandby(int *node_priority)
+SyncRepGetSynchronousNode(int *node_priority)
{
int sync_node = -1;
int priority = 0;
@@ -384,7 +384,7 @@ SyncRepGetSynchronousStandby(int *node_priority)
/* First get the priority of this WAL sender */
if (node_priority)
node_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
- 0 : walsnd->sync_standby_priority;
+ 0 : walsnd->sync_node_priority;
/* Proceed to next if not active */
if (walsnd->pid == 0)
@@ -395,12 +395,12 @@ SyncRepGetSynchronousStandby(int *node_priority)
continue;
/* Proceed to next one if asynchronous */
- if (walsnd->sync_standby_priority == 0)
+ if (walsnd->sync_node_priority == 0)
continue;
/* Proceed to next one if priority conditions not satisfied */
if (priority != 0 &&
- priority <= walsnd->sync_standby_priority)
+ priority <= walsnd->sync_node_priority)
continue;
/* Proceed to next one if flush position is invalid */
@@ -415,7 +415,7 @@ SyncRepGetSynchronousStandby(int *node_priority)
sync_node = i;
/* Update priority to current value of WAL sender */
- priority = walsnd->sync_standby_priority;
+ priority = walsnd->sync_node_priority;
}
return sync_node;
@@ -423,7 +423,7 @@ SyncRepGetSynchronousStandby(int *node_priority)
/*
* Update the LSNs on each queue based upon our latest state. This
- * implements a simple policy of first-valid-standby-releases-waiter.
+ * implements a simple policy of first-valid-node-releases-waiter.
*
* Other policies are possible, which would change what we do here and what
* perhaps also which information we store as well.
@@ -438,30 +438,30 @@ SyncRepReleaseWaiters(void)
int numflush = 0;
/*
- * If this WALSender is serving a standby that is not on the list of
- * potential standbys then we have nothing to do. If we are still starting
+ * If this WALSender is serving a node that is not on the list of
+ * potential nodes then we have nothing to do. If we are still starting
* up, still running base backup or the current flush position is still
* invalid, then leave quickly also.
*/
- if (MyWalSnd->sync_standby_priority == 0 ||
+ if (MyWalSnd->sync_node_priority == 0 ||
MyWalSnd->state < WALSNDSTATE_STREAMING ||
XLogRecPtrIsInvalid(MyWalSnd->flush))
return;
/*
- * We're a potential sync standby. Release waiters if we are the highest
- * priority standby. If there are multiple standbys with same priorities
- * then we use the first mentioned standby.
+ * We're a potential sync node. Release waiters if we are the highest
+ * priority node. If there are multiple nodes with same priorities
+ * then we use the first mentioned node.
*/
LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
- sync_node = SyncRepGetSynchronousStandby(NULL);
+ sync_node = SyncRepGetSynchronousNode(NULL);
/* We should have found ourselves at least */
Assert(sync_node >= 0 && sync_node < max_wal_senders);
syncWalSnd = &WalSndCtl->walsnds[sync_node];
/*
- * If we aren't managing the highest priority standby then just leave.
+ * If we aren't managing the highest priority node then just leave.
*/
if (syncWalSnd != MyWalSnd)
{
@@ -492,28 +492,28 @@ SyncRepReleaseWaiters(void)
numflush, (uint32) (MyWalSnd->flush >> 32), (uint32) MyWalSnd->flush);
/*
- * If we are managing the highest priority standby, though we weren't
- * prior to this, then announce we are now the sync standby.
+ * If we are managing the highest priority node, though we weren't
+ * prior to this, then announce we are now the sync node.
*/
if (announce_next_takeover)
{
announce_next_takeover = false;
ereport(LOG,
- (errmsg("standby \"%s\" is now the synchronous standby with priority %u",
- application_name, MyWalSnd->sync_standby_priority)));
+ (errmsg("node \"%s\" is now the synchronous node with priority %u",
+ application_name, MyWalSnd->sync_node_priority)));
}
}
/*
- * Check if we are in the list of sync standbys, and if so, determine
+ * Check if we are in the list of sync nodes, and if so, determine
* priority sequence. Return priority if set, or zero to indicate that
- * we are not a potential sync standby.
+ * we are not a potential sync node.
*
- * Compare the parameter SyncRepStandbyNames against the application_name
+ * Compare the parameter SyncRepNodeNames against the application_name
* for this WALSender, or allow any name if we find a wildcard "*".
*/
static int
-SyncRepGetStandbyPriority(void)
+SyncRepGetNodePriority(void)
{
char *rawstring;
List *elemlist;
@@ -529,7 +529,7 @@ SyncRepGetStandbyPriority(void)
return 0;
/* Need a modifiable copy of string */
- rawstring = pstrdup(SyncRepStandbyNames);
+ rawstring = pstrdup(SyncRepNodeNames);
/* Parse string into list of identifiers */
if (!SplitIdentifierString(rawstring, ',', &elemlist))
@@ -543,12 +543,12 @@ SyncRepGetStandbyPriority(void)
foreach(l, elemlist)
{
- char *standby_name = (char *) lfirst(l);
+ char *node_name = (char *) lfirst(l);
priority++;
- if (pg_strcasecmp(standby_name, application_name) == 0 ||
- pg_strcasecmp(standby_name, "*") == 0)
+ if (pg_strcasecmp(node_name, application_name) == 0 ||
+ pg_strcasecmp(node_name, "*") == 0)
{
found = true;
break;
@@ -625,17 +625,17 @@ SyncRepWakeQueue(bool all, int mode)
/*
* The checkpointer calls this as needed to update the shared
- * sync_standbys_defined flag, so that backends don't remain permanently wedged
+ * sync_nodes_defined flag, so that backends don't remain permanently wedged
* if synchronous_standby_names is unset. It's safe to check the current value
* without the lock, because it's only ever updated by one process. But we
* must take the lock to change it.
*/
void
-SyncRepUpdateSyncStandbysDefined(void)
+SyncRepUpdateSyncNodesDefined(void)
{
- bool sync_standbys_defined = SyncStandbysDefined();
+ bool sync_nodes_defined = SyncNodesDefined();
- if (sync_standbys_defined != WalSndCtl->sync_standbys_defined)
+ if (sync_nodes_defined != WalSndCtl->sync_nodes_defined)
{
LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
@@ -644,7 +644,7 @@ SyncRepUpdateSyncStandbysDefined(void)
* for backends to continue to waiting. Since the user no longer
* wants synchronous replication, we'd better wake them up.
*/
- if (!sync_standbys_defined)
+ if (!sync_nodes_defined)
{
int i;
@@ -654,12 +654,12 @@ SyncRepUpdateSyncStandbysDefined(void)
/*
* Only allow people to join the queue when there are synchronous
- * standbys defined. Without this interlock, there's a race
+ * nodes defined. Without this interlock, there's a race
* condition: we might wake up all the current waiters; then, some
* backend that hasn't yet reloaded its config might go to sleep on
* the queue (and never wake up). This prevents that.
*/
- WalSndCtl->sync_standbys_defined = sync_standbys_defined;
+ WalSndCtl->sync_nodes_defined = sync_nodes_defined;
LWLockRelease(SyncRepLock);
}
@@ -726,7 +726,7 @@ check_synchronous_standby_names(char **newval, void **extra, GucSource source)
}
/*
- * Any additional validation of standby names should go here.
+ * Any additional validation of node names should go here.
*
* Don't attempt to set WALSender priority because this is executed by
* postmaster at startup, not WALSender, so the application_name is not
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index c2d4ed3..f24a09b 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -3,7 +3,7 @@
* walreceiver.c
*
* The WAL receiver process (walreceiver) is new as of Postgres 9.0. It
- * is the process in the standby server that takes charge of receiving
+ * is the process in the node server that takes charge of receiving
* XLOG records from a primary server during streaming replication.
*
* When the startup process determines that it's time to start streaming,
@@ -103,8 +103,8 @@ static volatile sig_atomic_t got_SIGTERM = false;
*/
static struct
{
- XLogRecPtr Write; /* last byte + 1 written out in the standby */
- XLogRecPtr Flush; /* last byte + 1 flushed in the standby */
+ XLogRecPtr Write; /* last byte + 1 written out in the node */
+ XLogRecPtr Flush; /* last byte + 1 flushed in the node */
} LogstreamResult;
static StringInfoData reply_message;
@@ -474,7 +474,7 @@ WalReceiverMain(void)
bool requestReply = false;
/*
- * Check if time since last receive from standby has
+ * Check if time since last receive from node has
* reached the configured limit.
*/
if (wal_receiver_timeout > 0)
@@ -1098,7 +1098,7 @@ XLogWalRcvSendReply(bool force, bool requestReply)
* in case they don't have a watch.
*
* If the user disables feedback, send one final message to tell sender
- * to forget about the xmin on this standby.
+ * to forget about the xmin on this node.
*/
static void
XLogWalRcvSendHSFeedback(bool immed)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2d35cc9..ba9a9d7 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -5,7 +5,7 @@
* The WAL sender process (walsender) is new as of Postgres 9.0. It takes
* care of sending XLOG from the primary server to a single recipient.
* (Note that there can be more than one walsender process concurrently.)
- * It is started by the postmaster when the walreceiver of a standby server
+ * It is started by the postmaster when the walreceiver of a node server
* connects to the primary server and requests XLOG streaming replication.
*
* A walsender is similar to a regular backend, ie. there is a one-to-one
@@ -13,7 +13,7 @@
* of processing SQL queries, it understands a small set of special
* replication-mode commands. The START_REPLICATION command begins streaming
* WAL to the client. While streaming, the walsender keeps reading XLOG
- * records from the disk and sends them to the standby server over the
+ * records from the disk and sends them to the node server over the
* COPY protocol, until the either side ends the replication by exiting COPY
* mode (or until the connection is closed).
*
@@ -27,7 +27,7 @@
* If the server is shut down, postmaster sends us SIGUSR2 after all
* regular backends have exited and the shutdown checkpoint has been written.
* This instructs walsender to send any outstanding WAL, including the
- * shutdown checkpoint record, wait for it to be replicated to the standby,
+ * shutdown checkpoint record, wait for it to be replicated to the node,
* and then exit.
*
*
@@ -101,7 +101,7 @@ WalSnd *MyWalSnd = NULL;
/* Global state */
bool am_walsender = false; /* Am I a walsender process? */
bool am_cascading_walsender = false; /* Am I cascading WAL to
- * another standby? */
+ * another node? */
bool am_db_walsender = false; /* Connected to a database? */
/* User-settable parameters for walsender */
@@ -149,7 +149,7 @@ static StringInfoData reply_message;
static StringInfoData tmpbuf;
/*
- * Timestamp of the last receipt of the reply from the standby. Set to 0 if
+ * Timestamp of the last receipt of the reply from the node. Set to 0 if
* wal_sender_timeout doesn't need to be active.
*/
static TimestampTz last_reply_timestamp = 0;
@@ -198,15 +198,15 @@ static void WalSndShutdown(void) __attribute__((noreturn));
static void XLogSendPhysical(void);
static void XLogSendLogical(void);
static void WalSndDone(WalSndSendDataCallback send_data);
-static XLogRecPtr GetStandbyFlushRecPtr(void);
+static XLogRecPtr GetNodeFlushRecPtr(void);
static void IdentifySystem(void);
static void CreateReplicationSlot(CreateReplicationSlotCmd *cmd);
static void DropReplicationSlot(DropReplicationSlotCmd *cmd);
static void StartReplication(StartReplicationCmd *cmd);
static void StartLogicalReplication(StartReplicationCmd *cmd);
-static void ProcessStandbyMessage(void);
-static void ProcessStandbyReplyMessage(void);
-static void ProcessStandbyHSFeedbackMessage(void);
+static void ProcessNodeMessage(void);
+static void ProcessNodeReplyMessage(void);
+static void ProcessNodeHSFeedbackMessage(void);
static void ProcessRepliesIfAny(void);
static void WalSndKeepalive(bool requestReply);
static void WalSndKeepaliveIfNecessary(TimestampTz now);
@@ -279,7 +279,7 @@ WalSndShutdown(void)
{
/*
* Reset whereToSendOutput to prevent ereport from attempting to send any
- * more messages to the standby.
+ * more messages to the node.
*/
if (whereToSendOutput == DestRemote)
whereToSendOutput = DestNone;
@@ -314,7 +314,7 @@ IdentifySystem(void)
if (am_cascading_walsender)
{
/* this also updates ThisTimeLineID */
- logptr = GetStandbyFlushRecPtr();
+ logptr = GetNodeFlushRecPtr();
}
else
logptr = GetInsertRecPtr();
@@ -529,7 +529,7 @@ StartReplication(StartReplicationCmd *cmd)
if (am_cascading_walsender)
{
/* this also updates ThisTimeLineID */
- FlushPtr = GetStandbyFlushRecPtr();
+ FlushPtr = GetNodeFlushRecPtr();
}
else
FlushPtr = GetFlushRecPtr();
@@ -606,7 +606,7 @@ StartReplication(StartReplicationCmd *cmd)
if (!sendTimeLineIsHistoric || cmd->startpoint < sendTimeLineValidUpto)
{
/*
- * When we first start replication the standby will be behind the
+ * When we first start replication the node will be behind the
* primary. For some applications, for example, synchronous
* replication, it is important to have a clear state for this initial
* catchup mode, so we can trigger actions when we change streaming
@@ -1366,7 +1366,7 @@ ProcessRepliesIfAny(void)
/* unexpected error or EOF */
ereport(COMMERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg("unexpected EOF on standby connection")));
+ errmsg("unexpected EOF on node connection")));
proc_exit(0);
}
if (r == 0)
@@ -1385,22 +1385,22 @@ ProcessRepliesIfAny(void)
if (streamingDoneReceiving && firstchar != 'X')
ereport(FATAL,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg("unexpected standby message type \"%c\", after receiving CopyDone",
+ errmsg("unexpected node message type \"%c\", after receiving CopyDone",
firstchar)));
/* Handle the very limited subset of commands expected in this phase */
switch (firstchar)
{
/*
- * 'd' means a standby reply wrapped in a CopyData packet.
+ * 'd' means a node reply wrapped in a CopyData packet.
*/
case 'd':
- ProcessStandbyMessage();
+ ProcessNodeMessage();
received = true;
break;
/*
- * CopyDone means the standby requested to finish streaming.
+ * CopyDone means the node requested to finish streaming.
* Reply with CopyDone, if we had not sent that already.
*/
case 'c':
@@ -1416,7 +1416,7 @@ ProcessRepliesIfAny(void)
{
ereport(COMMERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg("unexpected EOF on standby connection")));
+ errmsg("unexpected EOF on node connection")));
proc_exit(0);
}
@@ -1425,7 +1425,7 @@ ProcessRepliesIfAny(void)
break;
/*
- * 'X' means that the standby is closing down the socket.
+ * 'X' means that the node is closing down the socket.
*/
case 'X':
proc_exit(0);
@@ -1433,7 +1433,7 @@ ProcessRepliesIfAny(void)
default:
ereport(FATAL,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg("invalid standby message type \"%c\"",
+ errmsg("invalid node message type \"%c\"",
firstchar)));
}
}
@@ -1449,10 +1449,10 @@ ProcessRepliesIfAny(void)
}
/*
- * Process a status update message received from standby.
+ * Process a status update message received from node.
*/
static void
-ProcessStandbyMessage(void)
+ProcessNodeMessage(void)
{
char msgtype;
@@ -1465,7 +1465,7 @@ ProcessStandbyMessage(void)
{
ereport(COMMERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg("unexpected EOF on standby connection")));
+ errmsg("unexpected EOF on node connection")));
proc_exit(0);
}
@@ -1477,11 +1477,11 @@ ProcessStandbyMessage(void)
switch (msgtype)
{
case 'r':
- ProcessStandbyReplyMessage();
+ ProcessNodeReplyMessage();
break;
case 'h':
- ProcessStandbyHSFeedbackMessage();
+ ProcessNodeHSFeedbackMessage();
break;
default:
@@ -1527,10 +1527,10 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
}
/*
- * Regular reply from standby advising of WAL positions on standby server.
+ * Regular reply from node advising of WAL positions on node server.
*/
static void
-ProcessStandbyReplyMessage(void)
+ProcessNodeReplyMessage(void)
{
XLogRecPtr writePtr,
flushPtr,
@@ -1550,13 +1550,13 @@ ProcessStandbyReplyMessage(void)
(uint32) (applyPtr >> 32), (uint32) applyPtr,
replyRequested ? " (reply requested)" : "");
- /* Send a reply if the standby requested one. */
+ /* Send a reply if the node requested one. */
if (replyRequested)
WalSndKeepalive(false);
/*
* Update shared state for this WalSender process based on reply data from
- * standby.
+ * node.
*/
{
/* use volatile pointer to prevent code rearrangement */
@@ -1620,7 +1620,7 @@ PhysicalReplicationSlotNewXmin(TransactionId feedbackXmin)
* Hot Standby feedback
*/
static void
-ProcessStandbyHSFeedbackMessage(void)
+ProcessNodeHSFeedbackMessage(void)
{
TransactionId nextXid;
uint32 nextEpoch;
@@ -1652,8 +1652,8 @@ ProcessStandbyHSFeedbackMessage(void)
* Check that the provided xmin/epoch are sane, that is, not in the future
* and not so far back as to be already wrapped around. Ignore if not.
*
- * Epoch of nextXid should be same as standby, or if the counter has
- * wrapped, then one greater than standby.
+ * Epoch of nextXid should be same as node, or if the counter has
+ * wrapped, then one greater than node.
*/
GetNextXidAndEpoch(&nextXid, &nextEpoch);
@@ -1672,10 +1672,10 @@ ProcessStandbyHSFeedbackMessage(void)
return; /* epoch OK, but it's wrapped around */
/*
- * Set the WalSender's xmin equal to the standby's requested xmin, so that
+ * Set the WalSender's xmin equal to the node's requested xmin, so that
* the xmin will be taken into account by GetOldestXmin. This will hold
* back the removal of dead rows and thereby prevent the generation of
- * cleanup conflicts on the standby server.
+ * cleanup conflicts on the node server.
*
* There is a small window for a race condition here: although we just
* checked that feedbackXmin precedes nextXid, the nextXid could have
@@ -1683,7 +1683,7 @@ ProcessStandbyHSFeedbackMessage(void)
* perhaps far enough to make feedbackXmin wrap around. In that case the
* xmin we set here would be "in the future" and have no effect. No point
* in worrying about this since it's too late to save the desired data
- * anyway. Assuming that the standby sends us an increasing sequence of
+ * anyway. Assuming that the node sends us an increasing sequence of
* xmins, this could only happen during the first reply cycle, else our
* own xmin would prevent nextXid from advancing so far.
*
@@ -1771,7 +1771,7 @@ WalSndCheckTimeOut(TimestampTz now)
/*
* Since typically expiration of replication timeout means
* communication problem, we don't send the error message to the
- * standby.
+ * node.
*/
ereport(COMMERROR,
(errmsg("terminating walsender process due to replication timeout")));
@@ -1860,14 +1860,14 @@ WalSndLoop(WalSndSendDataCallback send_data)
* If we're in catchup state, move to streaming. This is an
* important state change for users to know about, since before
* this point data loss might occur if the primary dies and we
- * need to failover to the standby. The state change is also
+ * need to failover to the node. The state change is also
* important for synchronous replication, since commits that
* started to wait at that point might wait for some time.
*/
if (MyWalSnd->state == WALSNDSTATE_CATCHUP)
{
ereport(DEBUG1,
- (errmsg("standby \"%s\" has now caught up with primary",
+ (errmsg("node \"%s\" has now caught up with primary",
application_name)));
WalSndSetState(WALSNDSTATE_STREAMING);
}
@@ -1875,7 +1875,7 @@ WalSndLoop(WalSndSendDataCallback send_data)
/*
* When SIGUSR2 arrives, we send any outstanding logs up to the
* shutdown checkpoint record (i.e., the latest record), wait for
- * them to be replicated to the standby, and exit. This may be a
+ * them to be replicated to the node, and exit. This may be a
* normal termination at shutdown, or a promotion, the walsender
* is not sure which.
*/
@@ -1971,7 +1971,7 @@ InitWalSenderSlot(void)
if (MyWalSnd == NULL)
ereport(FATAL,
(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
- errmsg("number of requested standby connections "
+ errmsg("number of requested node connections "
"exceeds max_wal_senders (currently %d)",
max_wal_senders)));
@@ -2086,7 +2086,7 @@ retry:
if (sendFile < 0)
{
/*
- * If the file is not found, assume it's because the standby
+ * If the file is not found, assume it's because the node
* asked for a too old WAL segment that has already been
* removed or recycled.
*/
@@ -2214,7 +2214,7 @@ XLogSendPhysical(void)
else if (am_cascading_walsender)
{
/*
- * Streaming the latest timeline on a standby.
+ * Streaming the latest timeline on a node.
*
* Attempt to send all WAL that has already been replayed, so that we
* know it's valid. If we're receiving WAL through streaming
@@ -2232,7 +2232,7 @@ XLogSendPhysical(void)
*/
bool becameHistoric = false;
- SendRqstPtr = GetStandbyFlushRecPtr();
+ SendRqstPtr = GetNodeFlushRecPtr();
if (!RecoveryInProgress())
{
@@ -2246,9 +2246,9 @@ XLogSendPhysical(void)
else
{
/*
- * Still a cascading standby. But is the timeline we're sending
+ * Still a cascading node. But is the timeline we're sending
* still the one recovery is recovering from? ThisTimeLineID was
- * updated by the GetStandbyFlushRecPtr() call above.
+ * updated by the GetNodeFlushRecPtr() call above.
*/
if (sendTimeLine != ThisTimeLineID)
becameHistoric = true;
@@ -2298,8 +2298,8 @@ XLogSendPhysical(void)
* from the master, before promoting, but if the WAL streaming is
* terminated at a WAL page boundary, the valid portion of the timeline
* might end in the middle of a WAL record. We might've already sent the
- * first half of that partial WAL record to the cascading standby, so that
- * sentPtr > sendTimeLineValidUpto. That's OK; the cascading standby can't
+ * first half of that partial WAL record to the cascading node, so that
+ * sentPtr > sendTimeLineValidUpto. That's OK; the cascading node can't
* replay the partial WAL record either, so it can still follow our
* timeline switch.
*/
@@ -2488,7 +2488,7 @@ WalSndDone(WalSndSendDataCallback send_data)
/*
* Check a write location to see whether all the WAL have successfully
- * been replicated if this walsender is connecting to a standby such as
+ * been replicated if this walsender is connecting to a node such as
* pg_receivexlog which always returns an invalid flush location.
* Otherwise, check a flush location.
*/
@@ -2497,7 +2497,7 @@ WalSndDone(WalSndSendDataCallback send_data)
if (WalSndCaughtUp && sentPtr == replicatedPtr &&
!pq_is_send_pending())
{
- /* Inform the standby that XLOG streaming is done */
+ /* Inform the node that XLOG streaming is done */
EndCommand("COPY 0", DestRemote);
pq_flush();
@@ -2509,14 +2509,14 @@ WalSndDone(WalSndSendDataCallback send_data)
/*
* Returns the latest point in WAL that has been safely flushed to disk, and
- * can be sent to the standby. This should only be called when in recovery,
- * ie. we're streaming to a cascaded standby.
+ * can be sent to the node. This should only be called when in recovery,
+ * ie. we're streaming to a cascaded node.
*
* As a side-effect, ThisTimeLineID is updated to the TLI of the last
* replayed WAL record.
*/
static XLogRecPtr
-GetStandbyFlushRecPtr(void)
+GetNodeFlushRecPtr(void)
{
XLogRecPtr replayPtr;
TimeLineID replayTLI;
@@ -2598,7 +2598,7 @@ WalSndLastCycleHandler(SIGNAL_ARGS)
* If replication has not yet started, die like with SIGTERM. If
* replication is active, only set a flag and wake up the main loop. It
* will send any outstanding WAL, wait for it to be replicated to the
- * standby, and then exit gracefully.
+ * nodes, and then exit gracefully.
*/
if (!replication_active)
kill(MyProcPid, SIGTERM);
@@ -2730,7 +2730,7 @@ WalSndGetStateString(WalSndState state)
/*
* Returns activity of walsenders, including pids and xlog locations sent to
- * standby servers.
+ * node servers.
*/
Datum
pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
@@ -2742,7 +2742,7 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
MemoryContext per_query_ctx;
MemoryContext oldcontext;
int *sync_priority;
- int sync_standby;
+ int sync_node;
int i;
/* check to see if caller supports us returning a tuplestore */
@@ -2771,15 +2771,15 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
MemoryContextSwitchTo(oldcontext);
/*
- * Get the priorities of sync standbys all in one go, to minimise lock
+ * Get the priorities of sync nodes all in one go, to minimise lock
* acquisitions and to allow us to evaluate who is the current sync
- * standby.
+ * node.
*/
sync_priority = (int *) palloc(sizeof(int) * max_wal_senders);
LWLockAcquire(SyncRepLock, LW_SHARED);
- /* Look for sync standby if any */
- sync_standby = SyncRepGetSynchronousStandby(sync_priority);
+ /* Look for sync node if any */
+ sync_node = SyncRepGetSynchronousNode(sync_priority);
LWLockRelease(SyncRepLock);
for (i = 0; i < max_wal_senders; i++)
@@ -2836,12 +2836,12 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
values[6] = Int32GetDatum(sync_priority[i]);
/*
- * More easily understood version of standby state. This is purely
+ * More easily understood version of node state. This is purely
* informational, not different from priority.
*/
if (sync_priority[i] == 0)
values[7] = CStringGetTextDatum("async");
- else if (i == sync_standby)
+ else if (i == sync_node)
values[7] = CStringGetTextDatum("sync");
else
values[7] = CStringGetTextDatum("potential");
@@ -2859,8 +2859,8 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
}
/*
- * This function is used to send keepalive message to standby.
- * If requestReply is set, sets a flag in the message requesting the standby
+ * This function is used to send keepalive message to node.
+ * If requestReply is set, sets a flag in the message requesting the node
* to send a message back to us, for heartbeat purposes.
*/
static void
@@ -2899,7 +2899,7 @@ WalSndKeepaliveIfNecessary(TimestampTz now)
/*
* If half of wal_sender_timeout has lapsed without receiving any reply
- * from the standby, send a keep-alive message to the standby requesting
+ * from the node, send a keep-alive message to the node requesting
* an immediate reply.
*/
ping_time = TimestampTzPlusMilliseconds(last_reply_timestamp,
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 23cbe90..cce2827 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3209,7 +3209,7 @@ static struct config_string ConfigureNamesString[] =
NULL,
GUC_LIST_INPUT
},
- &SyncRepStandbyNames,
+ &SyncRepNodeNames,
"",
check_synchronous_standby_names, NULL, NULL
},
diff --git a/src/include/replication/syncrep.h b/src/include/replication/syncrep.h
index 8361c2f..b510b04 100644
--- a/src/include/replication/syncrep.h
+++ b/src/include/replication/syncrep.h
@@ -32,7 +32,7 @@
#define SYNC_REP_WAIT_COMPLETE 2
/* user-settable parameters for synchronous replication */
-extern char *SyncRepStandbyNames;
+extern char *SyncRepNodeNames;
/* called by user backend */
extern void SyncRepWaitForLSN(XLogRecPtr XactCommitLSN);
@@ -45,11 +45,11 @@ extern void SyncRepInitConfig(void);
extern void SyncRepReleaseWaiters(void);
/* called by checkpointer */
-extern void SyncRepUpdateSyncStandbysDefined(void);
+extern void SyncRepUpdateSyncNodesDefined(void);
/* called by various procs */
extern int SyncRepWakeQueue(bool all, int mode);
-extern int SyncRepGetSynchronousStandby(int *sync_priority);
+extern int SyncRepGetSynchronousNode(int *sync_priority);
extern bool check_synchronous_standby_names(char **newval, void **extra, GucSource source);
extern void assign_synchronous_commit(int newval, void *extra);
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index dff3354..f7e4900 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -40,7 +40,7 @@ typedef struct WalSnd
/*
* The xlog locations that have been written, flushed, and applied by
- * standby-side. These may be invalid if the standby-side has not offered
+ * node-side. These may be invalid if the node-side has not offered
* values yet.
*/
XLogRecPtr write;
@@ -57,11 +57,11 @@ typedef struct WalSnd
Latch latch;
/*
- * The priority order of the standby managed by this WALSender, as listed
+ * The priority order of the node managed by this WALSender, as listed
* in synchronous_standby_names, or 0 if not-listed. Protected by
* SyncRepLock.
*/
- int sync_standby_priority;
+ int sync_node_priority;
} WalSnd;
extern WalSnd *MyWalSnd;
@@ -82,11 +82,11 @@ typedef struct
XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE];
/*
- * Are any sync standbys defined? Waiting backends can't reload the
+ * Are any sync nodes defined? Waiting backends can't reload the
* config file safely, so checkpointer updates this value as needed.
* Protected by SyncRepLock.
*/
- bool sync_standbys_defined;
+ bool sync_nodes_defined;
WalSnd walsnds[1]; /* VARIABLE LENGTH ARRAY */
} WalSndCtlData;
--
2.1.3
On 16 November 2014 12:07, Michael Paquier <michael.paquier@gmail.com> wrote:
Let's work
the multiple node thing once we have a better spec of how to do it,
visibly using a dedicated micro-language within s_s_names.
Hmm, please make sure that is a new post. That is easily something I
could disagree with, even though I support the need for more
functionality.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Nov 17, 2014 at 10:00 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 16 November 2014 12:07, Michael Paquier <michael.paquier@gmail.com> wrote:
Let's work
the multiple node thing once we have a better spec of how to do it,
visibly using a dedicated micro-language within s_s_names.Hmm, please make sure that is a new post. That is easily something I
could disagree with, even though I support the need for more
functionality.
Sure. I am not really on that yet though :)
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Nov 17, 2014 at 2:20 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Sun, Nov 16, 2014 at 9:07 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:I'll send an updated patch tomorrow.
Here are updated versions. I divided the patch into two for clarity,
the first one is the actual refactoring patch, renaming
SyncRepGetSynchronousNode to SyncRepGetSynchronousStandby (+alpha,
like updating synchronous to sync in the comments as you mentioned)
such as the namings have no conflicts.The second one updates the syncrep code, including WAL sender and WAL
receiver, and its surroundings to use the term "node" instead of
"standby". This brings in the code the idea that a node using
replication APIs is not necessarily a standby, making it more generic.
This can be applied on top of the refactoring patch. If any other
folks (Heikki or Fujii-san?) have comments about this idea feel free.
I've not read the patches yet. But while I was reading the code around
sync node detection, I thought that it might be good idea to treat the
node with priority as special. That is, if we find the active node with
the priority 1, we can immediately go out of the sync-node-detection-loop.
In many cases we can expect that the sync standby has the priority 1.
If yes, treating the priority 1 as special is good way to do, I think. Thought?
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Nov 18, 2014 at 3:03 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Mon, Nov 17, 2014 at 2:20 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:On Sun, Nov 16, 2014 at 9:07 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:I'll send an updated patch tomorrow.
Here are updated versions. I divided the patch into two for clarity,
the first one is the actual refactoring patch, renaming
SyncRepGetSynchronousNode to SyncRepGetSynchronousStandby (+alpha,
like updating synchronous to sync in the comments as you mentioned)
such as the namings have no conflicts.The second one updates the syncrep code, including WAL sender and WAL
receiver, and its surroundings to use the term "node" instead of
"standby". This brings in the code the idea that a node using
replication APIs is not necessarily a standby, making it more generic.
This can be applied on top of the refactoring patch. If any other
folks (Heikki or Fujii-san?) have comments about this idea feel free.I've not read the patches yet. But while I was reading the code around
sync node detection, I thought that it might be good idea to treat the
node with priority as special. That is, if we find the active node with
the priority 1, we can immediately go out of the sync-node-detection-loop.In many cases we can expect that the sync standby has the priority 1.
If yes, treating the priority 1 as special is good way to do, I think. Thought?
That would really save some cycles. Now we still need to fetch the
priority numbers of all nodes for pg_stat_get_wal_senders, so in this
case scanning all the entries in the WAL sender array is necessary.
This optimization is orthogonal to the refactoring patch btw, no?
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 18 November 2014 00:02, Michael Paquier <michael.paquier@gmail.com> wrote:
I've not read the patches yet. But while I was reading the code around
sync node detection, I thought that it might be good idea to treat the
node with priority as special. That is, if we find the active node with
the priority 1, we can immediately go out of the sync-node-detection-loop.In many cases we can expect that the sync standby has the priority 1.
If yes, treating the priority 1 as special is good way to do, I think. Thought?
Agreed
That would really save some cycles. Now we still need to fetch the
priority numbers of all nodes for pg_stat_get_wal_senders, so in this
case scanning all the entries in the WAL sender array is necessary.
This optimization is orthogonal to the refactoring patch btw, no?
To the refactoring patch, possibly. But not to the changes you're
planning to make.
ISTM that we should remember the priority list of nodes and check them
in that order by direct lookups to array elements. That will work for
1 or more nodes and it also works better with large numbers of
walsenders.
Can we just wait on this patch until we have the whole feature?
We quite often break larger patches down into smaller ones, but if
they arrive in lots of small pieces its more difficult to understand
and correct issues that are happening on the larger scale. Churning
the same piece of code multiple times is rather mind numbing.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Nov 18, 2014 at 6:33 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Can we just wait on this patch until we have the whole feature?
Well, this may take some time to even define, and even if goals are clearly
defined this may take even more time to have a prototype to discuss about.
We quite often break larger patches down into smaller ones, but if
they arrive in lots of small pieces its more difficult to understand
and correct issues that are happening on the larger scale. Churning
the same piece of code multiple times is rather mind numbing.
Hm. I think that we are losing ourselves in this thread. The primary goal
is to remove a code duplication between syncrep.c and walsender,c that
exists since 9.1. Would it be possible to focus only on that for now?
That's still the topic of this CF.
--
Michael
On 11/18/2014 11:23 PM, Michael Paquier wrote:
On Tue, Nov 18, 2014 at 6:33 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Can we just wait on this patch until we have the whole feature?
Well, this may take some time to even define, and even if goals are clearly
defined this may take even more time to have a prototype to discuss about.We quite often break larger patches down into smaller ones, but if
they arrive in lots of small pieces its more difficult to understand
and correct issues that are happening on the larger scale. Churning
the same piece of code multiple times is rather mind numbing.Hm. I think that we are losing ourselves in this thread. The primary goal
is to remove a code duplication between syncrep.c and walsender,c that
exists since 9.1. Would it be possible to focus only on that for now?
That's still the topic of this CF.
Some comments on this patch:
* I don't like filling the priorities-array in
SyncRepGetSynchronousStandby(). It might be convenient for the one
caller that needs it, but it seems pretty ad hoc.
In fact, I don't really see the point of having the array at all. You
could just print the priority in the main loop in
pg_stat_get_wal_senders(). Yeah, nominally you need to hold SyncRepLock
while reading the priority, but it seems over-zealous to be so strict
about that in pg_stat_wal_senders(), since it's just an informational
view, and priority changes so rarely that it's highly unlikely to hit a
race condition there. Also note that when you change the list of
synchronous standbys in the config file, and SIGHUP, the walsender
processes will update their priorities in random order. So the idea that
you get a consistent snapshot of the priorities is a bit bogus anyway.
Also note that between filling the priorities array and the main loop in
pg_stat_get_wal_senders(), a new WAL sender might connect and set a
slot's pid. With the current coding, you'll print an uninitialized value
from the array if that happens. So the only thing that holding the
SyncRepLock really protects from is a "torn" read of the value, if
reading an int is not atomic. We could use the spinlock to protect from
that if we care.
* Would be nicer to return a pointer, than index into the wal senders
array. That's what the caller really wants.
I propose the attached (I admit I haven't tested it).
- Heikki
Attachments:
syncrep_refactor_v4.patchtext/x-diff; name=syncrep_refactor_v4.patchDownload
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..da89a3b 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -5,7 +5,7 @@
* Synchronous replication is new as of PostgreSQL 9.1.
*
* If requested, transaction commits wait until their commit LSN is
- * acknowledged by the sync standby.
+ * acknowledged by the synchronous standby.
*
* This module contains the code for waiting and release of backends.
* All code in this module executes on the primary. The core streaming
@@ -358,6 +358,54 @@ SyncRepInitConfig(void)
}
/*
+ * Find the WAL sender servicing the synchronous standby with the lowest
+ * priority value, or NULL if no synchronous standby is connected. If there
+ * are multiple nodes with the same lowest priority value, the first node
+ * found is selected. The caller must hold SyncRepLock.
+ */
+WalSnd *
+SyncRepGetSynchronousStandby(void)
+{
+ WalSnd *result = NULL;
+ int result_priority = 0;
+ int i;
+
+ /* Scan WAL senders and find synchronous node if any */
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ /* Use volatile pointer to prevent code rearrangement */
+ volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+ int this_priority;
+
+ /* Must be active */
+ if (walsnd->pid == 0)
+ continue;
+
+ /* Must be streaming */
+ if (walsnd->state != WALSNDSTATE_STREAMING)
+ continue;
+
+ /* Must be synchronous */
+ this_priority = walsnd->sync_standby_priority;
+ if (this_priority == 0)
+ continue;
+
+ /* Must have a lower priority value than any previous ones */
+ if (result != NULL && result_priority <= this_priority)
+ continue;
+
+ /* Must have a valid flush position */
+ if (XLogRecPtrIsInvalid(walsnd->flush))
+ continue;
+
+ result = (WalSnd *) walsnd;
+ result_priority = this_priority;
+ }
+
+ return result;
+}
+
+/*
* Update the LSNs on each queue based upon our latest state. This
* implements a simple policy of first-valid-standby-releases-waiter.
*
@@ -368,11 +416,9 @@ void
SyncRepReleaseWaiters(void)
{
volatile WalSndCtlData *walsndctl = WalSndCtl;
- volatile WalSnd *syncWalSnd = NULL;
+ WalSnd *syncWalSnd;
int numwrite = 0;
int numflush = 0;
- int priority = 0;
- int i;
/*
* If this WALSender is serving a standby that is not on the list of
@@ -388,32 +434,13 @@ SyncRepReleaseWaiters(void)
/*
* We're a potential sync standby. Release waiters if we are the highest
* priority standby. If there are multiple standbys with same priorities
- * then we use the first mentioned standby. If you change this, also
- * change pg_stat_get_wal_senders().
+ * then we use the first mentioned standbys.
*/
LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+ syncWalSnd = SyncRepGetSynchronousStandby();
- for (i = 0; i < max_wal_senders; i++)
- {
- /* use volatile pointer to prevent code rearrangement */
- volatile WalSnd *walsnd = &walsndctl->walsnds[i];
-
- if (walsnd->pid != 0 &&
- walsnd->state == WALSNDSTATE_STREAMING &&
- walsnd->sync_standby_priority > 0 &&
- (priority == 0 ||
- priority > walsnd->sync_standby_priority) &&
- !XLogRecPtrIsInvalid(walsnd->flush))
- {
- priority = walsnd->sync_standby_priority;
- syncWalSnd = walsnd;
- }
- }
-
- /*
- * We should have found ourselves at least.
- */
- Assert(syncWalSnd);
+ /* We should have found ourselves at least */
+ Assert(syncWalSnd != NULL);
/*
* If we aren't managing the highest priority standby then just leave.
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index addae8f..742c455 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2741,9 +2741,7 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
Tuplestorestate *tupstore;
MemoryContext per_query_ctx;
MemoryContext oldcontext;
- int *sync_priority;
- int priority = 0;
- int sync_standby = -1;
+ WalSnd *sync_standby;
int i;
/* check to see if caller supports us returning a tuplestore */
@@ -2772,38 +2770,10 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
MemoryContextSwitchTo(oldcontext);
/*
- * Get the priorities of sync standbys all in one go, to minimise lock
- * acquisitions and to allow us to evaluate who is the current sync
- * standby. This code must match the code in SyncRepReleaseWaiters().
+ * Get the currently active synchronous standby.
*/
- sync_priority = palloc(sizeof(int) * max_wal_senders);
LWLockAcquire(SyncRepLock, LW_SHARED);
- for (i = 0; i < max_wal_senders; i++)
- {
- /* use volatile pointer to prevent code rearrangement */
- volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
-
- if (walsnd->pid != 0)
- {
- /*
- * Treat a standby such as a pg_basebackup background process
- * which always returns an invalid flush location, as an
- * asynchronous standby.
- */
- sync_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
- 0 : walsnd->sync_standby_priority;
-
- if (walsnd->state == WALSNDSTATE_STREAMING &&
- walsnd->sync_standby_priority > 0 &&
- (priority == 0 ||
- priority > walsnd->sync_standby_priority) &&
- !XLogRecPtrIsInvalid(walsnd->flush))
- {
- priority = walsnd->sync_standby_priority;
- sync_standby = i;
- }
- }
- }
+ sync_standby = SyncRepGetSynchronousStandby();
LWLockRelease(SyncRepLock);
for (i = 0; i < max_wal_senders; i++)
@@ -2814,6 +2784,7 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
XLogRecPtr write;
XLogRecPtr flush;
XLogRecPtr apply;
+ int priority;
WalSndState state;
Datum values[PG_STAT_GET_WAL_SENDERS_COLS];
bool nulls[PG_STAT_GET_WAL_SENDERS_COLS];
@@ -2827,6 +2798,7 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
write = walsnd->write;
flush = walsnd->flush;
apply = walsnd->apply;
+ priority = walsnd->sync_standby_priority;
SpinLockRelease(&walsnd->mutex);
memset(nulls, 0, sizeof(nulls));
@@ -2857,15 +2829,15 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
nulls[5] = true;
values[5] = LSNGetDatum(apply);
- values[6] = Int32GetDatum(sync_priority[i]);
+ values[6] = Int32GetDatum(priority);
/*
* More easily understood version of standby state. This is purely
* informational, not different from priority.
*/
- if (sync_priority[i] == 0)
+ if (priority == 0)
values[7] = CStringGetTextDatum("async");
- else if (i == sync_standby)
+ else if (walsnd == sync_standby)
values[7] = CStringGetTextDatum("sync");
else
values[7] = CStringGetTextDatum("potential");
@@ -2873,7 +2845,6 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
}
- pfree(sync_priority);
/* clean up and return the tuplestore */
tuplestore_donestoring(tupstore);
diff --git a/src/include/replication/syncrep.h b/src/include/replication/syncrep.h
index 7eeaf3b..6f78fee 100644
--- a/src/include/replication/syncrep.h
+++ b/src/include/replication/syncrep.h
@@ -50,6 +50,10 @@ extern void SyncRepUpdateSyncStandbysDefined(void);
/* called by various procs */
extern int SyncRepWakeQueue(bool all, int mode);
+/* forward declaration to avoid pulling in walsender_private.h */
+struct WalSnd;
+extern struct WalSnd *SyncRepGetSynchronousStandby(void);
+
extern bool check_synchronous_standby_names(char **newval, void **extra, GucSource source);
extern void assign_synchronous_commit(int newval, void *extra);
On Thu, Dec 11, 2014 at 10:07 PM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:
* I don't like filling the priorities-array in
SyncRepGetSynchronousStandby(). It might be convenient for the one caller
that needs it, but it seems pretty ad hoc.In fact, I don't really see the point of having the array at all. You
could just print the priority in the main loop in
pg_stat_get_wal_senders(). Yeah, nominally you need to hold SyncRepLock
while reading the priority, but it seems over-zealous to be so strict about
that in pg_stat_wal_senders(), since it's just an informational view, and
priority changes so rarely that it's highly unlikely to hit a race
condition there. Also note that when you change the list of synchronous
standbys in the config file, and SIGHUP, the walsender processes will
update their priorities in random order. So the idea that you get a
consistent snapshot of the priorities is a bit bogus anyway. Also note that
between filling the priorities array and the main loop in
pg_stat_get_wal_senders(), a new WAL sender might connect and set a slot's
pid. With the current coding, you'll print an uninitialized value from the
array if that happens. So the only thing that holding the SyncRepLock
really protects from is a "torn" read of the value, if reading an int is
not atomic. We could use the spinlock to protect from that if we care.
That's fair, and it simplifies the whole process as well as the API able to
fetch the synchronous WAL sender.
* Would be nicer to return a pointer, than index into the wal senders
array. That's what the caller really wants.I propose the attached (I admit I haven't tested it).
Actually if you do it this way I think that it would be worth adding the
small optimization Fujii-san mentioned upthread: if priority is equal to 1,
we leave the loop earlier and return immediately the pointer. All those
things gathered give the patch attached, that I actually tested FWIW with
multiple standbys and multiple entries in s_s_names.
Regards,
--
Michael
Attachments:
20141212_syncrep_node.patchtext/x-diff; charset=US-ASCII; name=20141212_syncrep_node.patchDownload
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..34530a0 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -5,7 +5,7 @@
* Synchronous replication is new as of PostgreSQL 9.1.
*
* If requested, transaction commits wait until their commit LSN is
- * acknowledged by the sync standby.
+ * acknowledged by the synchronous standby.
*
* This module contains the code for waiting and release of backends.
* All code in this module executes on the primary. The core streaming
@@ -358,6 +358,62 @@ SyncRepInitConfig(void)
}
/*
+ * Find the WAL sender servicing the synchronous standby with the lowest
+ * priority value, or NULL if no synchronous standby is connected. If there
+ * are multiple nodes with the same lowest priority value, the first node
+ * found is selected. The caller must hold SyncRepLock.
+ */
+WalSnd *
+SyncRepGetSynchronousStandby(void)
+{
+ WalSnd *result = NULL;
+ int result_priority = 0;
+ int i;
+
+ /* Scan WAL senders and find synchronous node if any */
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ /* Use volatile pointer to prevent code rearrangement */
+ volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+ int this_priority;
+
+ /* Must be active */
+ if (walsnd->pid == 0)
+ continue;
+
+ /* Must be streaming */
+ if (walsnd->state != WALSNDSTATE_STREAMING)
+ continue;
+
+ /* Must be synchronous */
+ this_priority = walsnd->sync_standby_priority;
+ if (this_priority == 0)
+ continue;
+
+ /* Must have a lower priority value than any previous ones */
+ if (result != NULL && result_priority <= this_priority)
+ continue;
+
+ /* Must have a valid flush position */
+ if (XLogRecPtrIsInvalid(walsnd->flush))
+ continue;
+
+ result = (WalSnd *) walsnd;
+ result_priority = this_priority;
+
+ /*
+ * If priority is equal to 1, we are sure that there are no other
+ * WAL senders that could be synchronous with a lower prioroty,
+ * hence leave immediately with this one.
+ */
+ if (this_priority == 1)
+ return result;
+ }
+
+ return result;
+}
+
+/*
* Update the LSNs on each queue based upon our latest state. This
* implements a simple policy of first-valid-standby-releases-waiter.
*
@@ -368,11 +424,9 @@ void
SyncRepReleaseWaiters(void)
{
volatile WalSndCtlData *walsndctl = WalSndCtl;
- volatile WalSnd *syncWalSnd = NULL;
+ WalSnd *syncWalSnd;
int numwrite = 0;
int numflush = 0;
- int priority = 0;
- int i;
/*
* If this WALSender is serving a standby that is not on the list of
@@ -388,32 +442,13 @@ SyncRepReleaseWaiters(void)
/*
* We're a potential sync standby. Release waiters if we are the highest
* priority standby. If there are multiple standbys with same priorities
- * then we use the first mentioned standby. If you change this, also
- * change pg_stat_get_wal_senders().
+ * then we use the first mentioned standbys.
*/
LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+ syncWalSnd = SyncRepGetSynchronousStandby();
- for (i = 0; i < max_wal_senders; i++)
- {
- /* use volatile pointer to prevent code rearrangement */
- volatile WalSnd *walsnd = &walsndctl->walsnds[i];
-
- if (walsnd->pid != 0 &&
- walsnd->state == WALSNDSTATE_STREAMING &&
- walsnd->sync_standby_priority > 0 &&
- (priority == 0 ||
- priority > walsnd->sync_standby_priority) &&
- !XLogRecPtrIsInvalid(walsnd->flush))
- {
- priority = walsnd->sync_standby_priority;
- syncWalSnd = walsnd;
- }
- }
-
- /*
- * We should have found ourselves at least.
- */
- Assert(syncWalSnd);
+ /* We should have found ourselves at least */
+ Assert(syncWalSnd != NULL);
/*
* If we aren't managing the highest priority standby then just leave.
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index addae8f..742c455 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2741,9 +2741,7 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
Tuplestorestate *tupstore;
MemoryContext per_query_ctx;
MemoryContext oldcontext;
- int *sync_priority;
- int priority = 0;
- int sync_standby = -1;
+ WalSnd *sync_standby;
int i;
/* check to see if caller supports us returning a tuplestore */
@@ -2772,38 +2770,10 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
MemoryContextSwitchTo(oldcontext);
/*
- * Get the priorities of sync standbys all in one go, to minimise lock
- * acquisitions and to allow us to evaluate who is the current sync
- * standby. This code must match the code in SyncRepReleaseWaiters().
+ * Get the currently active synchronous standby.
*/
- sync_priority = palloc(sizeof(int) * max_wal_senders);
LWLockAcquire(SyncRepLock, LW_SHARED);
- for (i = 0; i < max_wal_senders; i++)
- {
- /* use volatile pointer to prevent code rearrangement */
- volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
-
- if (walsnd->pid != 0)
- {
- /*
- * Treat a standby such as a pg_basebackup background process
- * which always returns an invalid flush location, as an
- * asynchronous standby.
- */
- sync_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
- 0 : walsnd->sync_standby_priority;
-
- if (walsnd->state == WALSNDSTATE_STREAMING &&
- walsnd->sync_standby_priority > 0 &&
- (priority == 0 ||
- priority > walsnd->sync_standby_priority) &&
- !XLogRecPtrIsInvalid(walsnd->flush))
- {
- priority = walsnd->sync_standby_priority;
- sync_standby = i;
- }
- }
- }
+ sync_standby = SyncRepGetSynchronousStandby();
LWLockRelease(SyncRepLock);
for (i = 0; i < max_wal_senders; i++)
@@ -2814,6 +2784,7 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
XLogRecPtr write;
XLogRecPtr flush;
XLogRecPtr apply;
+ int priority;
WalSndState state;
Datum values[PG_STAT_GET_WAL_SENDERS_COLS];
bool nulls[PG_STAT_GET_WAL_SENDERS_COLS];
@@ -2827,6 +2798,7 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
write = walsnd->write;
flush = walsnd->flush;
apply = walsnd->apply;
+ priority = walsnd->sync_standby_priority;
SpinLockRelease(&walsnd->mutex);
memset(nulls, 0, sizeof(nulls));
@@ -2857,15 +2829,15 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
nulls[5] = true;
values[5] = LSNGetDatum(apply);
- values[6] = Int32GetDatum(sync_priority[i]);
+ values[6] = Int32GetDatum(priority);
/*
* More easily understood version of standby state. This is purely
* informational, not different from priority.
*/
- if (sync_priority[i] == 0)
+ if (priority == 0)
values[7] = CStringGetTextDatum("async");
- else if (i == sync_standby)
+ else if (walsnd == sync_standby)
values[7] = CStringGetTextDatum("sync");
else
values[7] = CStringGetTextDatum("potential");
@@ -2873,7 +2845,6 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
}
- pfree(sync_priority);
/* clean up and return the tuplestore */
tuplestore_donestoring(tupstore);
diff --git a/src/include/replication/syncrep.h b/src/include/replication/syncrep.h
index 7eeaf3b..6f78fee 100644
--- a/src/include/replication/syncrep.h
+++ b/src/include/replication/syncrep.h
@@ -50,6 +50,10 @@ extern void SyncRepUpdateSyncStandbysDefined(void);
/* called by various procs */
extern int SyncRepWakeQueue(bool all, int mode);
+/* forward declaration to avoid pulling in walsender_private.h */
+struct WalSnd;
+extern struct WalSnd *SyncRepGetSynchronousStandby(void);
+
extern bool check_synchronous_standby_names(char **newval, void **extra, GucSource source);
extern void assign_synchronous_commit(int newval, void *extra);
On 12/12/2014 04:29 AM, Michael Paquier wrote:
On Thu, Dec 11, 2014 at 10:07 PM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:I propose the attached (I admit I haven't tested it).
Actually if you do it this way I think that it would be worth adding the
small optimization Fujii-san mentioned upthread: if priority is equal to 1,
we leave the loop earlier and return immediately the pointer. All those
things gathered give the patch attached, that I actually tested FWIW with
multiple standbys and multiple entries in s_s_names.
Ok, committed.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Dec 12, 2014 at 9:38 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
On 12/12/2014 04:29 AM, Michael Paquier wrote:
On Thu, Dec 11, 2014 at 10:07 PM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:I propose the attached (I admit I haven't tested it).
Actually if you do it this way I think that it would be worth adding the
small optimization Fujii-san mentioned upthread: if priority is equal to
1,
we leave the loop earlier and return immediately the pointer. All those
things gathered give the patch attached, that I actually tested FWIW with
multiple standbys and multiple entries in s_s_names.Ok, committed.
Thanks!
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers