pg_stat_replication log positions vs base backups

Started by Magnus Haganderabout 10 years ago15 messages
#1Magnus Hagander
magnus@hagander.net

Are the values for the log locations really relevant for backup
connections? And if they are, we need to document it :) ISTM we are just
more or less leaking random data out there?

I'm talking about the actual state=backup connection - not the connection
if we're using -x with pg_basebackup. Where we have output like:

state | backup
sent_location | 0/0
write_location | 2/76CE0000
flush_location | 2/76CC0000
replay_location | 2/76CBF938

I'm thinking those fields should probably all be NULL for state=backup?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#2Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#1)
1 attachment(s)
Re: pg_stat_replication log positions vs base backups

On Wed, Nov 25, 2015 at 10:19 AM, Magnus Hagander <magnus@hagander.net>
wrote:

Are the values for the log locations really relevant for backup
connections? And if they are, we need to document it :) ISTM we are just
more or less leaking random data out there?

I'm talking about the actual state=backup connection - not the connection
if we're using -x with pg_basebackup. Where we have output like:

state | backup
sent_location | 0/0
write_location | 2/76CE0000
flush_location | 2/76CC0000
replay_location | 2/76CBF938

I'm thinking those fields should probably all be NULL for state=backup?

In particular, it seems that in InitWalSenderSlot, we only initialize the
sent location. Perhaps this is needed?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Attachments:

walsnd_init.patchtext/x-patch; charset=US-ASCII; name=walsnd_init.patchDownload
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 4a4643e..08da433 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1962,6 +1962,9 @@ InitWalSenderSlot(void)
 			 */
 			walsnd->pid = MyProcPid;
 			walsnd->sentPtr = InvalidXLogRecPtr;
+			walsnd->write = InvalidXLogRecPtr;
+			walsnd->flush = InvalidXLogRecPtr;
+			walsnd->apply = InvalidXLogRecPtr;
 			walsnd->state = WALSNDSTATE_STARTUP;
 			walsnd->latch = &MyProc->procLatch;
 			SpinLockRelease(&walsnd->mutex);
#3Michael Paquier
michael.paquier@gmail.com
In reply to: Magnus Hagander (#2)
1 attachment(s)
Re: pg_stat_replication log positions vs base backups

On Wed, Nov 25, 2015 at 7:18 PM, Magnus Hagander <magnus@hagander.net>
wrote:

On Wed, Nov 25, 2015 at 10:19 AM, Magnus Hagander <magnus@hagander.net>
wrote:

Are the values for the log locations really relevant for backup
connections? And if they are, we need to document it :) ISTM we are just
more or less leaking random data out there?

I'm talking about the actual state=backup connection - not the connection
if we're using -x with pg_basebackup. Where we have output like:

state | backup
sent_location | 0/0
write_location | 2/76CE0000
flush_location | 2/76CC0000
replay_location | 2/76CBF938

I'm thinking those fields should probably all be NULL for state=backup?

Hm. You would basically get that when a backup WAL sender is reusing the
sender of another node that is not here anymore.

In particular, it seems that in InitWalSenderSlot, we only initialize the
sent location. Perhaps this is needed?

Yes, that would be nice to start with a clean state. At the same time, I am
noticing that pg_stat_get_wal_senders() is comparing flush, apply and write
directly with 0. I think those should be InvalidXLogRecPtr. Combined with
your patch it gives the attached.
--
Michael

Attachments:

20151125_walsnd_init_v2.patchapplication/x-patch; name=20151125_walsnd_init_v2.patchDownload
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 4a4643e..c135672 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1962,6 +1962,9 @@ InitWalSenderSlot(void)
 			 */
 			walsnd->pid = MyProcPid;
 			walsnd->sentPtr = InvalidXLogRecPtr;
+			walsnd->write = InvalidXLogRecPtr;
+			walsnd->flush = InvalidXLogRecPtr;
+			walsnd->apply = InvalidXLogRecPtr;
 			walsnd->state = WALSNDSTATE_STARTUP;
 			walsnd->latch = &MyProc->procLatch;
 			SpinLockRelease(&walsnd->mutex);
@@ -2821,15 +2824,15 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
 			values[1] = CStringGetTextDatum(WalSndGetStateString(state));
 			values[2] = LSNGetDatum(sentPtr);
 
-			if (write == 0)
+			if (XLogRecPtrIsInvalid(write))
 				nulls[3] = true;
 			values[3] = LSNGetDatum(write);
 
-			if (flush == 0)
+			if (XLogRecPtrIsInvalid(flush))
 				nulls[4] = true;
 			values[4] = LSNGetDatum(flush);
 
-			if (apply == 0)
+			if (XLogRecPtrIsInvalid(apply))
 				nulls[5] = true;
 			values[5] = LSNGetDatum(apply);
 
#4Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#3)
Re: pg_stat_replication log positions vs base backups

On Wed, Nov 25, 2015 at 3:03 PM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Wed, Nov 25, 2015 at 7:18 PM, Magnus Hagander <magnus@hagander.net>
wrote:

On Wed, Nov 25, 2015 at 10:19 AM, Magnus Hagander <magnus@hagander.net>
wrote:

Are the values for the log locations really relevant for backup
connections? And if they are, we need to document it :) ISTM we are just
more or less leaking random data out there?

I'm talking about the actual state=backup connection - not the
connection if we're using -x with pg_basebackup. Where we have output like:

state | backup
sent_location | 0/0
write_location | 2/76CE0000
flush_location | 2/76CC0000
replay_location | 2/76CBF938

I'm thinking those fields should probably all be NULL for state=backup?

Hm. You would basically get that when a backup WAL sender is reusing the
sender of another node that is not here anymore.

Yeah - and that's obviously incorrect.

In particular, it seems that in InitWalSenderSlot, we only initialize the

sent location. Perhaps this is needed?

Yes, that would be nice to start with a clean state. At the same time, I
am noticing that pg_stat_get_wal_senders() is comparing flush, apply and
write directly with 0. I think those should be InvalidXLogRecPtr. Combined
with your patch it gives the attached.

Good point.

Another thought - why do we leave 0/0 in sent location, but turn the other
three fields to NULL if it's invalid? That seems inconsistent. Is there a
reason for that, or should we fix that as well while we're at it?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#5Michael Paquier
michael.paquier@gmail.com
In reply to: Magnus Hagander (#4)
1 attachment(s)
Re: pg_stat_replication log positions vs base backups

On Wed, Nov 25, 2015 at 11:08 PM, Magnus Hagander <magnus@hagander.net>
wrote:

On Wed, Nov 25, 2015 at 3:03 PM, Michael Paquier <
michael.paquier@gmail.com> wrote:

On Wed, Nov 25, 2015 at 7:18 PM, Magnus Hagander <magnus@hagander.net>
wrote:

In particular, it seems that in InitWalSenderSlot, we only initialize the

sent location. Perhaps this is needed?

Yes, that would be nice to start with a clean state. At the same time, I
am noticing that pg_stat_get_wal_senders() is comparing flush, apply and
write directly with 0. I think those should be InvalidXLogRecPtr. Combined
with your patch it gives the attached.

Good point.

Another thought - why do we leave 0/0 in sent location, but turn the other
three fields to NULL if it's invalid? That seems inconsistent. Is there a
reason for that, or should we fix that as well while we're at it?

In some code paths, values are expected to be equal to 0 as XLogRecPtr
values are doing direct comparisons with each other, so you can think that
0 for a XLogRecPtr is slightly different from InvalidXLogRecPtr that could
be expected to be invalid but *not* minimal, imagine for example that we
decide at some point to redefine it as INT64_MAX. See for example
receivedUpto in xlog.c or WalSndWaitForWal in walsender.c. Honestly, I
think that it would be nice to make all the logic consistent under a unique
InvalidXLogRecPtr banner that is defined at 0 once and for all, adding at
the same time a comment in xlogdefs.h mentioning that this should not be
redefined to a higher value because comparison logic of XlogRecPtr's depend
on that. We have actually a similar constraint with TimeLineID = 0 that is
equivalent to infinite. Patch attached following those lines, which should
be backpatched for correctness.

Btw, I found at the same time a portion of NOT_USED code setting up a
XLogRecPtr incorrectly in GetOldestWALSendPointer.

FWIW, the last time I poked at this code, introducing some kind of
MinimalXLogRecPtr logic different to distinguish from InvalidXLogRecPtr I
hit a wall, others did not like that much:
/messages/by-id/CAB7nPqRUaFyTJ024W-HiihW4PwfcadZN9HFMsJQfOZSQtZtDUA@mail.gmail.com

Regards,
--
Michael

Attachments:

20151126_walsnd_init_v3.patchapplication/x-patch; name=20151126_walsnd_init_v3.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f17f834..3988ffd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -191,7 +191,7 @@ HotStandbyState standbyState = STANDBY_DISABLED;
 static XLogRecPtr LastRec;
 
 /* Local copy of WalRcv->receivedUpto */
-static XLogRecPtr receivedUpto = 0;
+static XLogRecPtr receivedUpto = InvalidXLogRecPtr;
 static TimeLineID receiveTLI = 0;
 
 /*
@@ -4666,7 +4666,8 @@ XLOGShmemInit(void)
 	 */
 	allocptr = ((char *) XLogCtl) + sizeof(XLogCtlData);
 	XLogCtl->xlblocks = (XLogRecPtr *) allocptr;
-	memset(XLogCtl->xlblocks, 0, sizeof(XLogRecPtr) * XLOGbuffers);
+	memset(XLogCtl->xlblocks, InvalidXLogRecPtr,
+		   sizeof(XLogRecPtr) * XLOGbuffers);
 	allocptr += sizeof(XLogRecPtr) * XLOGbuffers;
 
 
@@ -11189,7 +11190,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 						curFileTLI = tli;
 						RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
 											 PrimarySlotName);
-						receivedUpto = 0;
+						receivedUpto = InvalidXLogRecPtr;
 					}
 
 					/*
@@ -11463,7 +11464,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 static int
 emode_for_corrupt_record(int emode, XLogRecPtr RecPtr)
 {
-	static XLogRecPtr lastComplaint = 0;
+	static XLogRecPtr lastComplaint = InvalidXLogRecPtr;
 
 	if (readSource == XLOG_FROM_PG_XLOG && emode == LOG)
 	{
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 183a3a5..ccea4ca 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1027,8 +1027,8 @@ XLogWalRcvFlush(bool dying)
 static void
 XLogWalRcvSendReply(bool force, bool requestReply)
 {
-	static XLogRecPtr writePtr = 0;
-	static XLogRecPtr flushPtr = 0;
+	static XLogRecPtr writePtr = InvalidXLogRecPtr;
+	static XLogRecPtr flushPtr = InvalidXLogRecPtr;
 	XLogRecPtr	applyPtr;
 	static TimestampTz sendTime = 0;
 	TimestampTz now;
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 4a4643e..14b83b7 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -140,7 +140,7 @@ static XLogRecPtr sendTimeLineValidUpto = InvalidXLogRecPtr;
  * How far have we sent WAL already? This is also advertised in
  * MyWalSnd->sentPtr.  (Actually, this is the next WAL location to send.)
  */
-static XLogRecPtr sentPtr = 0;
+static XLogRecPtr sentPtr = InvalidXLogRecPtr;
 
 /* Buffers for constructing outgoing messages and processing reply messages. */
 static StringInfoData output_message;
@@ -1962,6 +1962,9 @@ InitWalSenderSlot(void)
 			 */
 			walsnd->pid = MyProcPid;
 			walsnd->sentPtr = InvalidXLogRecPtr;
+			walsnd->write = InvalidXLogRecPtr;
+			walsnd->flush = InvalidXLogRecPtr;
+			walsnd->apply = InvalidXLogRecPtr;
 			walsnd->state = WALSNDSTATE_STARTUP;
 			walsnd->latch = &MyProc->procLatch;
 			SpinLockRelease(&walsnd->mutex);
@@ -2819,17 +2822,20 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
 		else
 		{
 			values[1] = CStringGetTextDatum(WalSndGetStateString(state));
+
+			if (XLogRecPtrIsInvalid(sentPtr))
+				nulls[2] = true;
 			values[2] = LSNGetDatum(sentPtr);
 
-			if (write == 0)
+			if (XLogRecPtrIsInvalid(write))
 				nulls[3] = true;
 			values[3] = LSNGetDatum(write);
 
-			if (flush == 0)
+			if (XLogRecPtrIsInvalid(flush))
 				nulls[4] = true;
 			values[4] = LSNGetDatum(flush);
 
-			if (apply == 0)
+			if (XLogRecPtrIsInvalid(apply))
 				nulls[5] = true;
 			values[5] = LSNGetDatum(apply);
 
@@ -2933,7 +2939,7 @@ WalSndKeepaliveIfNecessary(TimestampTz now)
 XLogRecPtr
 GetOldestWALSendPointer(void)
 {
-	XLogRecPtr	oldest = {0, 0};
+	XLogRecPtr	oldest = InvalidXLogRecPtr;
 	int			i;
 	bool		found = false;
 
diff --git a/src/include/access/xlogdefs.h b/src/include/access/xlogdefs.h
index 18a3e7c..39a660c 100644
--- a/src/include/access/xlogdefs.h
+++ b/src/include/access/xlogdefs.h
@@ -23,7 +23,9 @@ typedef uint64 XLogRecPtr;
 /*
  * Zero is used indicate an invalid pointer. Bootstrap skips the first possible
  * WAL segment, initializing the first WAL page at XLOG_SEG_SIZE, so no XLOG
- * record can begin at zero.
+ * record can begin at zero. As direct comparisons of XLogRecPtr values depend
+ * on this default value being minimal, this should not be redefined to a
+ * higher value.
  */
 #define InvalidXLogRecPtr	0
 #define XLogRecPtrIsInvalid(r)	((r) == InvalidXLogRecPtr)
#6Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#5)
Re: pg_stat_replication log positions vs base backups

On Thu, Nov 26, 2015 at 8:17 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Wed, Nov 25, 2015 at 11:08 PM, Magnus Hagander <magnus@hagander.net>
wrote:

On Wed, Nov 25, 2015 at 3:03 PM, Michael Paquier <
michael.paquier@gmail.com> wrote:

On Wed, Nov 25, 2015 at 7:18 PM, Magnus Hagander <magnus@hagander.net>
wrote:

In particular, it seems that in InitWalSenderSlot, we only initialize the

sent location. Perhaps this is needed?

Yes, that would be nice to start with a clean state. At the same time, I
am noticing that pg_stat_get_wal_senders() is comparing flush, apply and
write directly with 0. I think those should be InvalidXLogRecPtr. Combined
with your patch it gives the attached.

Good point.

Another thought - why do we leave 0/0 in sent location, but turn the
other three fields to NULL if it's invalid? That seems inconsistent. Is
there a reason for that, or should we fix that as well while we're at it?

In some code paths, values are expected to be equal to 0 as XLogRecPtr
values are doing direct comparisons with each other, so you can think that
0 for a XLogRecPtr is slightly different from InvalidXLogRecPtr that could
be expected to be invalid but *not* minimal, imagine for example that we
decide at some point to redefine it as INT64_MAX. See for example
receivedUpto in xlog.c or WalSndWaitForWal in walsender.c. Honestly, I
think that it would be nice to make all the logic consistent under a unique
InvalidXLogRecPtr banner that is defined at 0 once and for all, adding at
the same time a comment in xlogdefs.h mentioning that this should not be
redefined to a higher value because comparison logic of XlogRecPtr's depend
on that. We have actually a similar constraint with TimeLineID = 0 that is
equivalent to infinite. Patch attached following those lines, which should
be backpatched for correctness.

I'm only talking about the actual value in pg_stat_replication here, not
what we are using internally. These are two different things of course -
let's keep them separate for now. In pg_stat_replication, we explicitly
check for InvalidXLogRecPtr and then explicitly set the resulting value to
NULL in the SQL return.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#7Michael Paquier
michael.paquier@gmail.com
In reply to: Magnus Hagander (#6)
Re: pg_stat_replication log positions vs base backups

On Thu, Nov 26, 2015 at 6:45 PM, Magnus Hagander wrote:

I'm only talking about the actual value in pg_stat_replication here, not
what we are using internally. These are two different things of course -
let's keep them separate for now. In pg_stat_replication, we explicitly
check for InvalidXLogRecPtr and then explicitly set the resulting value to
NULL in the SQL return.

No objections from here. I guess you already have a patch?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#7)
Re: pg_stat_replication log positions vs base backups

On Thu, Nov 26, 2015 at 1:03 PM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Thu, Nov 26, 2015 at 6:45 PM, Magnus Hagander wrote:

I'm only talking about the actual value in pg_stat_replication here, not
what we are using internally. These are two different things of course -
let's keep them separate for now. In pg_stat_replication, we explicitly
check for InvalidXLogRecPtr and then explicitly set the resulting value

to

NULL in the SQL return.

No objections from here. I guess you already have a patch?

Well, no, because I haven't figured out which way is the logical one - make
them all return NULL or make them all return 0/0...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#9Michael Paquier
michael.paquier@gmail.com
In reply to: Magnus Hagander (#8)
Re: pg_stat_replication log positions vs base backups

On Thu, Nov 26, 2015 at 10:53 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Nov 26, 2015 at 1:03 PM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Thu, Nov 26, 2015 at 6:45 PM, Magnus Hagander wrote:

I'm only talking about the actual value in pg_stat_replication here, not
what we are using internally. These are two different things of course -
let's keep them separate for now. In pg_stat_replication, we explicitly
check for InvalidXLogRecPtr and then explicitly set the resulting value
to
NULL in the SQL return.

No objections from here. I guess you already have a patch?

Well, no, because I haven't figured out which way is the logical one - make
them all return NULL or make them all return 0/0...

It seems to me that NULL is the logical one. We want to define a value
from the user prospective where things are in an undefined state.
That's my logic flow, other opinions are of course welcome.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#9)
Re: pg_stat_replication log positions vs base backups

On Fri, Nov 27, 2015 at 6:07 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Thu, Nov 26, 2015 at 10:53 PM, Magnus Hagander <magnus@hagander.net>
wrote:

On Thu, Nov 26, 2015 at 1:03 PM, Michael Paquier <

michael.paquier@gmail.com>

wrote:

On Thu, Nov 26, 2015 at 6:45 PM, Magnus Hagander wrote:

I'm only talking about the actual value in pg_stat_replication here,

not

what we are using internally. These are two different things of

course -

let's keep them separate for now. In pg_stat_replication, we

explicitly

check for InvalidXLogRecPtr and then explicitly set the resulting

value

to
NULL in the SQL return.

No objections from here. I guess you already have a patch?

Well, no, because I haven't figured out which way is the logical one -

make

them all return NULL or make them all return 0/0...

It seems to me that NULL is the logical one. We want to define a value
from the user prospective where things are in an undefined state.
That's my logic flow, other opinions are of course welcome.

I've applied these two patches now.

The one that fixes the initialization backpatched to 9.3 which is the
oldest one that has it, and the one that changes the actual 0-vs-NULL
output to 9.5 only as it's a behaviour change.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#11Michael Paquier
michael.paquier@gmail.com
In reply to: Magnus Hagander (#10)
Re: pg_stat_replication log positions vs base backups

On Mon, Dec 14, 2015 at 1:01 AM, Magnus Hagander <magnus@hagander.net> wrote:

I've applied these two patches now.

The one that fixes the initialization backpatched to 9.3 which is the oldest
one that has it, and the one that changes the actual 0-vs-NULL output to 9.5
only as it's a behaviour change.

Thanks!
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Michael Paquier
michael.paquier@gmail.com
In reply to: Michael Paquier (#11)
Re: pg_stat_replication log positions vs base backups

On Mon, Dec 14, 2015 at 8:59 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Mon, Dec 14, 2015 at 1:01 AM, Magnus Hagander <magnus@hagander.net> wrote:

I've applied these two patches now.

The one that fixes the initialization backpatched to 9.3 which is the oldest
one that has it, and the one that changes the actual 0-vs-NULL output to 9.5
only as it's a behaviour change.

Thanks!

Interesting. I got just today a bug report that is actually a symptom
that people should be careful about: it is possible that
pg_stat_replication reports 1/potential for sync_priority/sync_state
in the case of a WAL sender in "backup" state: a base backup just
needs to reuse the shared memory slot of a standby that was previously
connected. Commit 61c7bee of Magnus fixes the issue, just let's be
careful if there are similar reports that do not include this fix.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#12)
Re: pg_stat_replication log positions vs base backups

On Wed, Dec 16, 2015 at 8:34 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Mon, Dec 14, 2015 at 8:59 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Mon, Dec 14, 2015 at 1:01 AM, Magnus Hagander <magnus@hagander.net>

wrote:

I've applied these two patches now.

The one that fixes the initialization backpatched to 9.3 which is the

oldest

one that has it, and the one that changes the actual 0-vs-NULL output

to 9.5

only as it's a behaviour change.

Thanks!

Interesting. I got just today a bug report that is actually a symptom
that people should be careful about: it is possible that
pg_stat_replication reports 1/potential for sync_priority/sync_state
in the case of a WAL sender in "backup" state: a base backup just
needs to reuse the shared memory slot of a standby that was previously
connected. Commit 61c7bee of Magnus fixes the issue, just let's be
careful if there are similar reports that do not include this fix.

Hmm. With the fix, it returns "async", right?

Perhaps it should return either "backup" or NULL, to be even more clear?
And with priority set to NULL?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#14Michael Paquier
michael.paquier@gmail.com
In reply to: Magnus Hagander (#13)
Re: pg_stat_replication log positions vs base backups

On Wed, Dec 16, 2015 at 7:14 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Wed, Dec 16, 2015 at 8:34 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

Interesting. I got just today a bug report that is actually a symptom
that people should be careful about: it is possible that
pg_stat_replication reports 1/potential for sync_priority/sync_state
in the case of a WAL sender in "backup" state: a base backup just
needs to reuse the shared memory slot of a standby that was previously
connected. Commit 61c7bee of Magnus fixes the issue, just let's be
careful if there are similar reports that do not include this fix.

Hmm. With the fix, it returns "async", right?

Yes, it returns async with priority set at 0 after your commit. That's
a bit better than the old behavior, still..

Perhaps it should return either "backup" or NULL, to be even more clear? And
with priority set to NULL?

I'd vote for just NULL for both fields. This async/sync status has no
real sense for a backup. Thoughts?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Michael Paquier
michael.paquier@gmail.com
In reply to: Michael Paquier (#14)
Re: pg_stat_replication log positions vs base backups

On Wed, Dec 16, 2015 at 9:35 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Wed, Dec 16, 2015 at 7:14 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Wed, Dec 16, 2015 at 8:34 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

Interesting. I got just today a bug report that is actually a symptom
that people should be careful about: it is possible that
pg_stat_replication reports 1/potential for sync_priority/sync_state
in the case of a WAL sender in "backup" state: a base backup just
needs to reuse the shared memory slot of a standby that was previously
connected. Commit 61c7bee of Magnus fixes the issue, just let's be
careful if there are similar reports that do not include this fix.

Hmm. With the fix, it returns "async", right?

Yes, it returns async with priority set at 0 after your commit. That's
a bit better than the old behavior, still..

Perhaps it should return either "backup" or NULL, to be even more clear? And
with priority set to NULL?

I'd vote for just NULL for both fields. This async/sync status has no
real sense for a backup. Thoughts?

While looking again at that, considering a WAL sender having an
invalid WAL flush position as always asynchronous like a base backup
one is directly mentioned in walsender.c, per 0c4b468:
+                       /*
+                        * Treat a standby such as a pg_basebackup
background process
+                        * which always returns an invalid flush location, as an
+                        * asynchronous standby.
+                        */
The point is just to be sure that such a node won't be selected as a
sync standby, while it is a surprising behavior, but after this many
years it may not be worth switching to NULL values.

Fujii-san, thoughts? You introduced this behavior after all.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers