Possible bug in logical replication.

Started by Konstantin Knizhnik over 7 years ago; 38 messages
#1Konstantin Knizhnik
k.knizhnik@postgrespro.ru

We got the following assertion failure at our buildfarm of master branch
of Postgres in contrib/test_decoding regression tests:

2018-05-07 19:50:07.241 MSK [5af083bf.54ae:49] DETAIL: Streaming transactions committing after 0/2A00000, reading WAL from 0/29FFF1C.
2018-05-07 19:50:07.241 MSK [5af083bf.54ae:50] STATEMENT: SELECT end_lsn FROM pg_replication_slot_advance('regression_slot1', '0/2A00174')
TRAP: FailedAssertion("!(((RecPtr) % 8192 >= (((uintptr_t) ((sizeof(XLogPageHeaderData))) + ((4) - 1)) & ~((uintptr_t) ((4) - 1)))))", File: "xlogreader.c", Line: 241)

Stack trace is the following:

$ gdb -x ./gdbcmd --batch pgsql.build/tmp_install/home/buildfarm/build-farm/CORE-353-stable-func/inst/bin/postgres pgsql.build/contrib/test_decoding/tmp_check/data/core
[New LWP 21678]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: buildfarm regression [local] SELECT '.
Program terminated with signal SIGABRT, Aborted.
#0 0xf7722c89 in __kernel_vsyscall ()
#0 0xf7722c89 in __kernel_vsyscall ()
#1 0xf6b5ddd0 in __libc_signal_restore_set (set=0xffaf2240) at ../sysdeps/unix/sysv/linux/nptl-signals.h:79
#2 __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:48
#3 0xf6b5f297 in __GI_abort () at abort.c:89
#4 0x56b3931a in ExceptionalCondition (conditionName=0x56bd0c38 "!(((RecPtr) % 8192 >= (((uintptr_t) ((sizeof(XLogPageHeaderData))) + ((4) - 1)) & ~((uintptr_t) ((4) - 1)))))", errorType=0x56b8bf61 "FailedAssertion", fileName=0x56bd0df0 "xlogreader.c", lineNumber=241) at assert.c:54
#5 0x5678c573 in XLogReadRecord (state=0x57628c84, RecPtr=44040192, errormsg=0xffaf2560) at xlogreader.c:241
#6 0x569c3191 in pg_logical_replication_slot_advance (startlsn=<optimized out>, moveto=44040564) at slotfuncs.c:370
#7 0x569c3c8e in pg_replication_slot_advance (fcinfo=0xffaf2708) at slotfuncs.c:487
#8 0x568a69c0 in ExecMakeTableFunctionResult (setexpr=0x57626e30, econtext=0x57626d88, argContext=0x57620b48, expectedDesc=0x57627e44, randomAccess=false) at execSRF.c:231
#9 0x568b41d3 in FunctionNext (node=0x57626cfc) at nodeFunctionscan.c:94
#10 0x568a5ce2 in ExecScanFetch (recheckMtd=0x568b3ec0 <FunctionRecheck>, accessMtd=0x568b3ed0 <FunctionNext>, node=0x57626cfc) at execScan.c:95
#11 ExecScan (node=0x57626cfc, accessMtd=0x568b3ed0 <FunctionNext>, recheckMtd=0x568b3ec0 <FunctionRecheck>) at execScan.c:162
#12 0x568b4243 in ExecFunctionScan (pstate=0x57626cfc) at nodeFunctionscan.c:270
#13 0x5689caba in ExecProcNode (node=0x57626cfc) at ../../../src/include/executor/executor.h:238
#14 ExecutePlan (execute_once=<optimized out>, dest=0x0, direction=NoMovementScanDirection, numberTuples=<optimized out>, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x57626cfc, estate=0x57626bf0) at execMain.c:1731
#15 standard_ExecutorRun (queryDesc=0x5760f248, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:368
#16 0x56a132cd in PortalRunSelect (portal=portal@entry=0x575c4ea8, forward=forward@entry=true, count=0, count@entry=2147483647, dest=0x576232d4) at pquery.c:932
#17 0x56a14b00 in PortalRun (portal=0x575c4ea8, count=2147483647, isTopLevel=true, run_once=true, dest=0x576232d4, altdest=0x576232d4, completionTag=0xffaf2c40 "") at pquery.c:773
#18 0x56a0fbb7 in exec_simple_query (query_string=query_string@entry=0x57579070 "SELECT end_lsn FROM pg_replication_slot_advance('regression_slot1', '0/2A00174') ") at postgres.c:1122
#19 0x56a11a8e in PostgresMain (argc=1, argv=0x575a0b8c, dbname=<optimized out>, username=0x575a09f0 "buildfarm") at postgres.c:4153
#20 0x566cd9cb in BackendRun (port=0x5759a358) at postmaster.c:4361
#21 BackendStartup (port=0x5759a358) at postmaster.c:4033
#22 ServerLoop () at postmaster.c:1706
#23 0x5698a608 in PostmasterMain (argc=<optimized out>, argv=<optimized out>) at postmaster.c:1379
#24 0x566cf642 in main (argc=<optimized out>, argv=<optimized out>) at main.c:228

As you can see, the assertion failure happens because the specified startlsn (0x2A00000) is not considered valid.
This LSN is taken from the slot's confirmed flush LSN in pg_replication_slot_advance:

startlsn = MyReplicationSlot->data.confirmed_flush;

Unfortunately I was not able to reproduce the problem even after repeating this regression test 1000 times, so it seems to be a very hard-to-reproduce, non-deterministic bug.
I wonder whether there is a guarantee that confirmed_flush always points to the start of a record.
Inspection of the xlogreader.c code shows that confirmed_flush is, for example, assigned EndRecPtr in DecodingContextFindStartpoint.
EndRecPtr is updated in XLogReadRecord, and if the record doesn't cross a page boundary, the following formula is used:

state->EndRecPtr = RecPtr + MAXALIGN(total_len);

So if a record ends exactly at a page boundary, it looks like EndRecPtr can point to the page boundary.
This is confirmed by the following comment in the XLogReadRecord function:

/*
* RecPtr is pointing to end+1 of the previous WAL record. If we're
* at a page boundary, no more records can fit on the current page. We
* must skip over the page header, but we can't do that until we've
* read in the page, since the header size is variable.
*/

But that means confirmed_flush can also point to a page boundary, which will trigger this assertion failure in XLogReadRecord:

/*
* Caller supplied a position to start at.
*
* In this case, the passed-in record pointer should already be
* pointing to a valid record starting position.
*/
Assert(XRecOffIsValid(RecPtr));

Maybe this bug is so hard to reproduce because it occurs only when the WAL record preceding the specified start point ends exactly at a page boundary. The probability of that seems quite small...

I am not so familiar with the WAL iteration code and logical decoding, so I will be pleased to receive confirmation or refutation of my analysis of the problem.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#2Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Konstantin Knizhnik (#1)
Re: Possible bug in logical replication.


Looks like this is not the first time this problem was encountered:

/messages/by-id/34d66f63-40a9-4c3e-c9a1-248d1e393d29@enterprisedb.com

I wonder who is now maintaining the logical replication stuff and whether
this bug is going to be fixed?
I think that using restart_lsn instead of confirmed_flush is not the right
approach.
If restart_lsn is not available and confirmed_flush points to a page
boundary, then in any case we should somehow handle it and adjust
startlsn to point to a valid record position (by just adding the page
header size?).

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#3Arseny Sher
a.sher@postgrespro.ru
In reply to: Konstantin Knizhnik (#2)
Re: Possible bug in logical replication.

Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:

I think that using restart_lsn instead of confirmed_flush is not the right approach.
If restart_lsn is not available and confirmed_flush points to a page
boundary, then in any case we should somehow handle it and adjust
startlsn to point to a valid record position (by just adding the page header
size?).

Well, restart_lsn is always available on a live slot: it is initially set
in ReplicationSlotReserveWal during slot creation.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#4Michael Paquier
michael@paquier.xyz
In reply to: Konstantin Knizhnik (#2)
Re: Possible bug in logical replication.

On Thu, May 17, 2018 at 12:31:09PM +0300, Konstantin Knizhnik wrote:

I wonder who is now maintaining logical replication stuff and whether this
bug is going to be fixed?

I haven't looked at your problem in details so I cannot give a sure
conclusion, but when it comes to pg_replication_slot_advance, which is
new as of v11, you need to look at this commit:
commit: 9c7d06d60680c7f00d931233873dee81fdb311c6
author: Simon Riggs <simon@2ndQuadrant.com>
date: Wed, 17 Jan 2018 11:38:34 +0000
Ability to advance replication slots

Ability to advance both physical and logical replication slots using a
new user function pg_replication_slot_advance().

For logical advance that means records are consumed as fast as possible
and changes are not given to output plugin for sending. Makes 2nd phase
(after we reached SNAPBUILD_FULL_SNAPSHOT) of replication slot creation
faster, especially when there are big transactions as the reorder buffer
does not have to deal with data changes and does not have to spill to
disk.

Author: Petr Jelinek
Reviewed-by: Simon Riggs

I am adding an open item.
--
Michael

#5Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Arseny Sher (#3)
1 attachment(s)
Re: Possible bug in logical replication.

At Thu, 17 May 2018 13:54:07 +0300, Arseny Sher <a.sher@postgrespro.ru> wrote in <87in7md034.fsf@ars-thinkpad>

Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:

I think that using restart_lsn instead of confirmed_flush is not right approach.
If restart_lsn is not available and confirmed_flush is pointing to page
boundary, then in any case we should somehow handle this case and adjust
startlsn to point on the valid record position (by jjust adding page header
size?).

Well, restart_lsn is always available on live slot: it is initially set
in ReplicationSlotReserveWal during slot creation.

restart_lsn stays at the beginning of a transaction until the
transaction ends, so just using restart_lsn allows repeated
decoding of a transaction; in short, rewinding occurs. The
function works only on an inactive slot, so the current code works
fine on this point. In addition, restart_lsn can also be on a
page boundary.

We can see the problem easily.

1. Just create a logical replication slot with setting current LSN.

select pg_create_logical_replication_slot('s1', 'pgoutput');

2. Advance the LSN by two or three pages by doing anything.

3. Advance the slot to a page boundary.

e.g. select pg_replication_slot_advance('s1', '0/9624000');

4. Advance the slot further; then it crashes.

So directly setting ctx->reader->EndRecPtr to startlsn fixes the
problem, but I found another problem here.

The function accepts any LSN, even one that is not at the beginning
of a record. We may see errors, crashes, infinite waits, or
any other kind of trouble with such values. The moved-to LSN must
always be at the "end of a record" (that is, at the start of the
next record). The attached patch also fixes this.

The documentation doesn't seem to require a fix.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

fix_pg_logical_replication_slot_advance.patch (text/x-patch; charset=us-ascii)
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index d9e10263bb..d3cb777f9f 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -318,6 +318,11 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 
 /*
  * Helper function for advancing physical replication slot forward.
+ *
+ * This function accepts arbitrary LSN even if the LSN is not at the beginning
+ * of a record. This can lead to any kind of misbehavior but currently the
+ * value is used only to determine up to what wal segment to keep and
+ * successive implicit advancing fixes the state.
  */
 static XLogRecPtr
 pg_physical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
@@ -344,6 +349,7 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
 	LogicalDecodingContext *ctx;
 	ResourceOwner old_resowner = CurrentResourceOwner;
 	XLogRecPtr	retlsn = InvalidXLogRecPtr;
+	XLogRecPtr	upto;
 
 	PG_TRY();
 	{
@@ -354,6 +360,13 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
 									logical_read_local_xlog_page,
 									NULL, NULL, NULL);
 
+		/*
+		 * startlsn can be on page boundary but it is not accepted as explicit
+		 * parameter to XLogReadRecord. Set it in reader context.
+		 */
+		Assert(startlsn != InvalidXLogRecPtr);
+		upto = ctx->reader->EndRecPtr = startlsn;
+	
 		CurrentResourceOwner = ResourceOwnerCreate(CurrentResourceOwner,
 												   "logical decoding");
 
@@ -361,22 +374,18 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
 		InvalidateSystemCaches();
 
 		/* Decode until we run out of records */
-		while ((startlsn != InvalidXLogRecPtr && startlsn < moveto) ||
-			   (ctx->reader->EndRecPtr != InvalidXLogRecPtr && ctx->reader->EndRecPtr < moveto))
+		while (ctx->reader->EndRecPtr <= moveto)
 		{
 			XLogRecord *record;
 			char	   *errm = NULL;
+ 
+			/* ctx->reader->EndRecPtr cannot be go backward here */
+			upto = ctx->reader->EndRecPtr;
 
-			record = XLogReadRecord(ctx->reader, startlsn, &errm);
+			record = XLogReadRecord(ctx->reader, InvalidXLogRecPtr, &errm);
 			if (errm)
 				elog(ERROR, "%s", errm);
 
-			/*
-			 * Now that we've set up the xlog reader state, subsequent calls
-			 * pass InvalidXLogRecPtr to say "continue from last record"
-			 */
-			startlsn = InvalidXLogRecPtr;
-
 			/*
 			 * The {begin_txn,change,commit_txn}_wrapper callbacks above will
 			 * store the description into our tuplestore.
@@ -384,18 +393,14 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
 			if (record != NULL)
 				LogicalDecodingProcessRecord(ctx, ctx->reader);
 
-			/* check limits */
-			if (moveto <= ctx->reader->EndRecPtr)
-				break;
-
 			CHECK_FOR_INTERRUPTS();
 		}
 
 		CurrentResourceOwner = old_resowner;
 
-		if (ctx->reader->EndRecPtr != InvalidXLogRecPtr)
+		if (startlsn != upto)
 		{
-			LogicalConfirmReceivedLocation(moveto);
+			LogicalConfirmReceivedLocation(upto);
 
 			/*
 			 * If only the confirmed_flush_lsn has changed the slot won't get
#6Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kyotaro HORIGUCHI (#5)
1 attachment(s)
Re: Possible bug in logical replication.

On Fri, May 18, 2018 at 2:37 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Thu, 17 May 2018 13:54:07 +0300, Arseny Sher <a.sher@postgrespro.ru> wrote in <87in7md034.fsf@ars-thinkpad>

Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:

I think that using restart_lsn instead of confirmed_flush is not right approach.
If restart_lsn is not available and confirmed_flush is pointing to page
boundary, then in any case we should somehow handle this case and adjust
startlsn to point on the valid record position (by jjust adding page header
size?).

Well, restart_lsn is always available on live slot: it is initially set
in ReplicationSlotReserveWal during slot creation.

restart_lsn stays at the beginning of a transaction until the
transaction ends, so just using restart_lsn allows repeated
decoding of a transaction; in short, rewinding occurs. The
function works only on an inactive slot, so the current code works
fine on this point. In addition, restart_lsn can also be on a
page boundary.

We can see the problem easily.

1. Just create a logical replication slot with setting current LSN.

select pg_create_logical_replication_slot('s1', 'pgoutput');

2. Advance the LSN by two or three pages by doing anything.

3. Advance the slot to a page boundary.

e.g. select pg_replication_slot_advance('s1', '0/9624000');

4. Advance the slot further; then it crashes.

So directly setting ctx->reader->EndRecPtr to startlsn fixes the
problem, but I found another problem here.

I confirmed that the attached patch fixes this problem as well as the
same problem reported on another thread.

I'm not sure it's a good approach to change the state of the xlogreader
directly in the replication slot code, because it also means that we
have to care about this code as well whenever the xlogreader code is
changed. Another possible way might be to make XLogFindNextRecord
available in backend code and move startlsn to the first valid record
with an lsn >= startlsn by using that function. Please find the
attached patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

find_next_valid_record.patch (application/octet-stream)
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index a7953f3..4b3fbff 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -856,13 +856,6 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
 	return true;
 }
 
-#ifdef FRONTEND
-/*
- * Functions that are currently not needed in the backend, but are better
- * implemented inside xlogreader.c because of the internal facilities available
- * here.
- */
-
 /*
  * Find the first record with an lsn >= RecPtr.
  *
@@ -981,9 +974,6 @@ out:
 	return found;
 }
 
-#endif							/* FRONTEND */
-
-
 /* ----------------------------------------
  * Functions for decoding the data and block references in a record.
  * ----------------------------------------
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index d9e1026..4199f6d 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -357,6 +357,9 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
 		CurrentResourceOwner = ResourceOwnerCreate(CurrentResourceOwner,
 												   "logical decoding");
 
+		/* Move to the first valid record with an lsn >= startlsn */
+		startlsn = XLogFindNextRecord(ctx->reader, startlsn);
+
 		/* invalidate non-timetravel entries */
 		InvalidateSystemCaches();
 
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index f307b63..7df5a80 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -211,10 +211,7 @@ extern bool XLogReaderValidatePageHeader(XLogReaderState *state,
 
 /* Invalidate read state */
 extern void XLogReaderInvalReadState(XLogReaderState *state);
-
-#ifdef FRONTEND
 extern XLogRecPtr XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr);
-#endif							/* FRONTEND */
 
 /* Functions for decoding an XLogRecord */
 
#7Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Masahiko Sawada (#6)
Re: Possible bug in logical replication.

Hello.

At Wed, 23 May 2018 15:56:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoA+5Tz0Z8zHOmD=sU5F=cygoEjHs7WvbBDL07fH9ayVaw@mail.gmail.com>

So directly setting ctx->reader->EndRecPtr to startlsn fixes the
problem, but I found another problem here.

I confirmed that the attached patch fixes this problem as well as the
same problem reported on another thread.

I'm not sure it's a good approach to change the state of xlogreader
directly in the replication slot codes because it also means that we
have to care about this code as well when xlogreader code is changed.

XLogReadRecord checks whether state->ReadRecPtr is invalid in
that case and, if so, works the same as in the explicit-LSN case.
That is what suggests this usage. (I found no actual use case,
though.) It seems somewhat uneasy to me as well, though...

Another possible way might be to make XLogFindNextRecord available in
backend code and move startlsn to the first valid record with an lsn
>= startlsn by using that function. Please find the attached patch.

Another reason for the code is the fact that confirmed_lsn stores
EndRecPtr after the last XLogReadRecord call. That is, by
definition, confirmed_lsn must be at the start of a record or on a
page boundary, and we should error out if not. For that reason,
calling XLogFindNextRecord would not be the right thing to do
here. We should just skip the page header if we are on a boundary,
but that is already done in XLogReadRecord.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#8Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro HORIGUCHI (#7)
Re: Possible bug in logical replication.

On Thu, May 24, 2018 at 10:14:01AM +0900, Kyotaro HORIGUCHI wrote:

At Wed, 23 May 2018 15:56:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoA+5Tz0Z8zHOmD=sU5F=cygoEjHs7WvbBDL07fH9ayVaw@mail.gmail.com>

Another possible way might be to make XLogFindNextRecord available in
backend code and move startlsn to the first valid record with an lsn
>= startlsn by using that function. Please find the attached patch.

Another reason for the code is the fact that confirmed_lsn stores
EndRecPtr after the last XLogReadRecord call. That is, by
definition, confirmed_lsn must be at the start of a record or on a
page boundary, and we should error out if not. For that reason,
calling XLogFindNextRecord would not be the right thing to do
here. We should just skip the page header if we are on a boundary,
but that is already done in XLogReadRecord.

Maybe I am being too naive, but wouldn't it be just enough to update the
confirmed flush LSN to ctx->reader->ReadRecPtr?  This way, the slot
advances up to the beginning of the last record where user wants to
advance, and not the beginning of the next record:
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -395,7 +395,7 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
         if (ctx->reader->EndRecPtr != InvalidXLogRecPtr)
         {
-            LogicalConfirmReceivedLocation(moveto);
+            LogicalConfirmReceivedLocation(ctx->reader->ReadRecPtr);

/*
* If only the confirmed_flush_lsn has changed the slot won't get

I agree with the point made above to not touch manually the XLogReader
context out of xlogreader.c.
--
Michael

#9Arseny Sher
a.sher@postgrespro.ru
In reply to: Kyotaro HORIGUCHI (#5)
Re: Possible bug in logical replication.

Hello,

Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:

restart_lsn stays at the beginning of a transaction until the
transaction ends, so just using restart_lsn allows repeated
decoding of a transaction; in short, rewinding occurs. The
function works only on an inactive slot, so the current code works
fine on this point.

Sorry, I do not follow. restart_lsn is advanced whenever there is a
consistent snapshot dumped (in xl_running_xacts) which is old enough to
wholly decode all xacts not yet confirmed by the client. Could you
please elaborate, what's wrong with that?

In addition, restart_lsn can also be on a
page boundary.

Do you have an example of that? restart_lsn is set initially to WAL
insert position at ReplicationSlotReserveWal, and later it always points
to xl_running_xacts record with consistent snapshot dumped.

So directly set ctx->reader->EndRecPtr by startlsn fixes the
problem, but I found another problem here.

There is a minor issue with the patch. Now slot advancement hangs,
polling for new WAL, on my example from [1]; most probably because we
must exit the loop when ctx->reader->EndRecPtr == moveto.

The function accepts any LSN, even one that is not at the beginning
of a record. We may see errors, crashes, infinite waits, or
any other kind of trouble with such values. The moved-to LSN must
always be at the "end of a record" (that is, at the start of the
next record). The attached patch also fixes this.

Indeed, but we have these problems only if we are trying to read WAL
since confirmed_flush.

[1]: /messages/by-id/873720e4hf.fsf@ars-thinkpad

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#10Arseny Sher
a.sher@postgrespro.ru
In reply to: Michael Paquier (#8)
Re: Possible bug in logical replication.

Michael Paquier <michael@paquier.xyz> writes:

Maybe I am being too naive, but wouldn't it be just enough to update the
confirmed flush LSN to ctx->reader->ReadRecPtr? This way, the slot
advances up to the beginning of the last record where user wants to
advance, and not the beginning of the next record:

The same problem would have to be handled in pg_logical_slot_get_changes_guts
and in the apply worker feedback; and there is a convention that all commits
since confirmed_flush must be decoded, so we risk decoding such a boundary
commit twice.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#11Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Arseny Sher (#10)
Re: Possible bug in logical replication.

This thread seems to have died down without any fix being proposed.
Simon, you own this open item.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12Simon Riggs
simon@2ndquadrant.com
In reply to: Alvaro Herrera (#11)
Re: Possible bug in logical replication.

On 6 June 2018 at 17:22, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

This thread seems to have died down without any fix being proposed.
Simon, you own this open item.

Thanks, will look.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13Michael Paquier
michael@paquier.xyz
In reply to: Simon Riggs (#12)
Re: Possible bug in logical replication.

On Thu, Jun 07, 2018 at 08:32:10AM +0100, Simon Riggs wrote:

On 6 June 2018 at 17:22, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

This thread seems to have died down without any fix being proposed.
Simon, you own this open item.

Thanks, will look.

Petr and I have found a couple of issues about the slot advance stuff on
this thread:
/messages/by-id/20180525052805.GA15634@paquier.xyz

The result is f731cfa which, per my tests, takes care of this issue:
first advancing a slot to a WAL page boundary and then advancing it to
the latest available LSN no longer triggers any assertions. It would
be nice to have a double-check, though, so I am leaving this thread on
the list of open items for now.
--
Michael

#14Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#13)
Re: Possible bug in logical replication.

Hello

Can somebody (Arseny, Konstantin, horiguti, Sawada) please confirm that
Michaël's commit fixes the reported bug?

Thanks,

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Alvaro Herrera (#14)
Re: Possible bug in logical replication.

On Fri, Jun 15, 2018 at 5:06 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Hello

Can somebody (Arseny, Konstantin, horiguti, Sawada) please confirm that
Michaël's commit fixes the reported bug?

I haven't reviewed that commit deeply yet, but I have confirmed that the
reported bug is fixed using the following test case, which previously
led to an assertion failure. This test case is the same as in the
previous report [1].

postgres(1:128004)=# select pg_create_logical_replication_slot('s1',
'pgoutput');
2018-06-15 18:04:31.301 JST [128004] LOG: logical decoding found
consistent point at 0/1645388
2018-06-15 18:04:31.301 JST [128004] DETAIL: There are no running transactions.
2018-06-15 18:04:31.301 JST [128004] STATEMENT: select
pg_create_logical_replication_slot('s1', 'pgoutput');
pg_create_logical_replication_slot
------------------------------------
(s1,0/16453C0)
(1 row)

postgres(1:128004)=# select pg_switch_wal();
pg_switch_wal
---------------
0/16453D8
(1 row)

postgres(1:128004)=# select pg_switch_wal();
pg_switch_wal
---------------
0/2000000
(1 row)

postgres(1:128004)=# select pg_replication_slot_advance('s1', '0/2000000');
2018-06-15 18:04:31.338 JST [128004] LOG: starting logical decoding
for slot "s1"
2018-06-15 18:04:31.338 JST [128004] DETAIL: Streaming transactions
committing after 0/16453C0, reading WAL from 0/1645388.
2018-06-15 18:04:31.338 JST [128004] STATEMENT: select
pg_replication_slot_advance('s1', '0/2000000');
2018-06-15 18:04:31.339 JST [128004] LOG: logical decoding found
consistent point at 0/1645388
2018-06-15 18:04:31.339 JST [128004] DETAIL: There are no running transactions.
2018-06-15 18:04:31.339 JST [128004] STATEMENT: select
pg_replication_slot_advance('s1', '0/2000000');
pg_replication_slot_advance
-----------------------------
(s1,0/2000000)
(1 row)

postgres(1:128004)=# create table a (c int);
select pg_replication_slot_advance('s1', pg_current_wal_lsn());
CREATE TABLE
postgres(1:128004)=# select pg_replication_slot_advance('s1',
pg_current_wal_lsn());
2018-06-15 18:04:31.401 JST [128004] LOG: starting logical decoding
for slot "s1"
2018-06-15 18:04:31.401 JST [128004] DETAIL: Streaming transactions
committing after 0/2000000, reading WAL from 0/1645388.
2018-06-15 18:04:31.401 JST [128004] STATEMENT: select
pg_replication_slot_advance('s1', pg_current_wal_lsn());
2018-06-15 18:04:31.402 JST [128004] LOG: logical decoding found
consistent point at 0/1645388
2018-06-15 18:04:31.402 JST [128004] DETAIL: There are no running transactions.
2018-06-15 18:04:31.402 JST [128004] STATEMENT: select
pg_replication_slot_advance('s1', pg_current_wal_lsn());
pg_replication_slot_advance
-----------------------------
(s1,0/2017828)
(1 row)

[1]: /messages/by-id/34d66f63-40a9-4c3e-c9a1-248d1e393d29@enterprisedb.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#16Arseny Sher
a.sher@postgrespro.ru
In reply to: Alvaro Herrera (#14)
1 attachment(s)
Re: Possible bug in logical replication.

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Can somebody (Arseny, Konstantin, horiguti, Sawada) please confirm that
Michaël's commit fixes the reported bug?

I confirm that starting to read WAL from restart_lsn, as implemented in
f731cfa, fixes this issue, as well as the second issue tushar mentioned
at [1]. I think the code can still be improved a bit though --
consider the attached patch:
* pg_logical_replication_slot_advance comment was not very informative
and even a bit misleading: it said that we use confirmed_flush_lsn and
restart_lsn, but didn't explain why.
* Excessive check in its main loop.
* Copy-paste comment fix.

[1]: /messages/by-id/5f85bf41-098e-c4e1-7332-9171fef57a0a@enterprisedb.com

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-Cosmetic-review-for-f731cfa.patchtext/x-diffDownload
From d8ed8ae3eec54b716d7dbb35379d0047a96c6c75 Mon Sep 17 00:00:00 2001
From: Arseny Sher <sher-ars@yandex.ru>
Date: Fri, 15 Jun 2018 18:11:17 +0300
Subject: [PATCH] Cosmetic review for f731cfa.

* pg_logical_replication_slot_advance comment was not very informative and even
  a bit misleading: it said that we use confirmed_flush_lsn and restart_lsn, but
  didn't explain why.
* Excessive check in its main loop.
* Copy-paste comment fix.
---
 src/backend/replication/slotfuncs.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 2806e1076c..597e81f4d9 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -341,24 +341,26 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
 
 /*
  * Helper function for advancing logical replication slot forward.
- * The slot's restart_lsn is used as start point for reading records,
- * while confirmed_lsn is used as base point for the decoding context.
- * The LSN position to move to is checked by doing a per-record scan and
- * logical decoding which makes sure that confirmed_lsn is updated to a
- * LSN which allows the future slot consumer to get consistent logical
- * changes.
+ * While we could just do LogicalConfirmReceivedLocation updating
+ * confirmed_flush_lsn, we'd better digest WAL to advance restart_lsn allowing
+ * to recycle WAL and old catalog tuples. We do it in special fast_forward
+ * mode without actual replay.
  */
 static XLogRecPtr
 pg_logical_replication_slot_advance(XLogRecPtr moveto)
 {
 	LogicalDecodingContext *ctx;
 	ResourceOwner old_resowner = CurrentResourceOwner;
+	/*
+	 * Start reading WAL at restart_lsn, which certainly points to the valid
+	 * record.
+	 */
 	XLogRecPtr	startlsn = MyReplicationSlot->data.restart_lsn;
 	XLogRecPtr	retlsn = MyReplicationSlot->data.confirmed_flush;
 
 	PG_TRY();
 	{
-		/* restart at slot's confirmed_flush */
+		/* start_lsn doesn't matter here, we don't replay xacts at all */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 									NIL,
 									true,
@@ -388,17 +390,10 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			 */
 			startlsn = InvalidXLogRecPtr;
 
-			/*
-			 * The {begin_txn,change,commit_txn}_wrapper callbacks above will
-			 * store the description into our tuplestore.
-			 */
+			/* Changes are not actually produced in fast_forward mode. */
 			if (record != NULL)
 				LogicalDecodingProcessRecord(ctx, ctx->reader);
 
-			/* Stop once the moving point wanted by caller has been reached */
-			if (moveto <= ctx->reader->EndRecPtr)
-				break;
-
 			CHECK_FOR_INTERRUPTS();
 		}
 
-- 
2.11.0

#17Michael Paquier
michael@paquier.xyz
In reply to: Arseny Sher (#16)
Re: Possible bug in logical replication.

On Fri, Jun 15, 2018 at 06:27:56PM +0300, Arseny Sher wrote:

I confirm that starting reading WAL since restart_lsn as implemented in
f731cfa fixes this issue, as well as the second issue tushar mentioned
at [1].

Thanks!

+    /*
+     * Start reading WAL at restart_lsn, which certainly points to the valid
+     * record.
+     */
     XLogRecPtr    startlsn = MyReplicationSlot->data.restart_lsn;
     XLogRecPtr    retlsn = MyReplicationSlot->data.confirmed_flush;

What does this one actually bring?

     PG_TRY();
     {
-        /* restart at slot's confirmed_flush */
+        /* start_lsn doesn't matter here, we don't replay xacts at all */
         ctx = CreateDecodingContext(InvalidXLogRecPtr,
                                     NIL,
                                     true,

Okay for this one.

-    /*
-     * The {begin_txn,change,commit_txn}_wrapper callbacks above will
-     * store the description into our tuplestore.
-     */
+    /* Changes are not actually produced in fast_forward mode. */

This one is a good idea. Now CreateDecodingContext is missing the
description of what fast_forward actually does, aka no changes are
produced. Could you update your patch to reflect that? That would be
useful for future callers of CreateDecodingContext as well.

- /* Stop once the moving point wanted by caller has been reached */
- if (moveto <= ctx->reader->EndRecPtr)
- break;
-
CHECK_FOR_INTERRUPTS();

It seems to me that we still want to have the slot forwarding finish in
this case even if this is interrupted. Petr, isn't that the intention
here?
--
Michael

#18Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#17)
1 attachment(s)
Re: Possible bug in logical replication.

On Mon, Jun 18, 2018 at 09:42:36PM +0900, Michael Paquier wrote:

On Fri, Jun 15, 2018 at 06:27:56PM +0300, Arseny Sher wrote:
It seems to me that we still want to have the slot forwarding finish in
this case even if this is interrupted. Petr, isn't that the intention
here?

I have been chewing a bit more on the proposed patch, finishing with the
attached to close the loop. Thoughts?
--
Michael

Attachments:

slot-advance-comments.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 61588d626f..6068aefd02 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -340,6 +340,9 @@ CreateInitDecodingContext(char *plugin,
  * output_plugin_options
  *		contains options passed to the output plugin.
  *
+ * fast_forward
+ *		bypasses the generation of logical changes.
+ *
  * read_page, prepare_write, do_write, update_progress
  *		callbacks that have to be filled to perform the use-case dependent,
  *		actual work.
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 2806e1076c..7c03bf69d3 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -346,7 +346,8 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
  * The LSN position to move to is checked by doing a per-record scan and
  * logical decoding which makes sure that confirmed_lsn is updated to a
  * LSN which allows the future slot consumer to get consistent logical
- * changes.
+ * changes.  As decoding is done with fast_forward mode, no changes are
+ * actually generated.
  */
 static XLogRecPtr
 pg_logical_replication_slot_advance(XLogRecPtr moveto)
@@ -389,8 +390,8 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			startlsn = InvalidXLogRecPtr;
 
 			/*
-			 * The {begin_txn,change,commit_txn}_wrapper callbacks above will
-			 * store the description into our tuplestore.
+			 * Note that changes are not generated in fast_forward mode, but
+			 * that the slot's data still needs to be updated.
 			 */
 			if (record != NULL)
 				LogicalDecodingProcessRecord(ctx, ctx->reader);
#19Arseny Sher
a.sher@postgrespro.ru
In reply to: Michael Paquier (#18)
1 attachment(s)
Re: Possible bug in logical replication.

Michael Paquier <michael@paquier.xyz> writes:

On Mon, Jun 18, 2018 at 09:42:36PM +0900, Michael Paquier wrote:

On Fri, Jun 15, 2018 at 06:27:56PM +0300, Arseny Sher wrote:
It seems to me that we still want to have the slot forwarding finish in
this case even if this is interrupted. Petr, isn't that the intention
here?

I have been chewing a bit more on the proposed patch, finishing with the
attached to close the loop. Thoughts?

Sorry for being pedantic, but it seems to me worthwhile to mention *why*
we need decoding machinery at all -- like I wrote:

+ * While we could just do LogicalConfirmReceivedLocation updating
+ * confirmed_flush_lsn, we'd better digest WAL to advance restart_lsn
+ * (allowing to recycle WAL) and xmin (allowing to vacuum old catalog tuples).

Also,

* The slot's restart_lsn is used as start point for reading records,

This is clearly seen from the code; I propose to remove it.

* while confirmed_lsn is used as base point for the decoding context.

And as I wrote, this doesn't matter as changes are not produced.

* The LSN position to move to is checked by doing a per-record scan and
* logical decoding which makes sure that confirmed_lsn is updated to a
* LSN which allows the future slot consumer to get consistent logical
- * changes.
+ * changes.  As decoding is done with fast_forward mode, no changes are
+ * actually generated.

confirmed_lsn is always updated to `moveto` unless we run out of WAL
earlier (and unless we try to move the slot backwards, which is obviously
forbidden) -- consistent changes are practically irrelevant
here. Moreover, we can directly set confirmed_lsn and still have
consistent changes afterwards, as the slot's restart_lsn and xmin are not
touched. What we actually do here is try to advance *restart_lsn and
xmin* as far as we can, but only up to the point which ensures that
decoding can assemble a consistent snapshot allowing all COMMITs since
the updated `confirmed_flush_lsn` to be fully decoded. All this happens
in SnapBuildProcessRunningXacts.

It seems to me that we still want to have the slot forwarding finish in
this case even if this is interrupted. Petr, isn't that the intention
here?

Probably, though I am not sure what the point of this is. OK, I have
kept this check in the updated patch (with your comments) and am CC'ing Petr.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

lslot_advance_comments.patchtext/x-diffDownload
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 61588d626f..76bafca41c 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -335,11 +335,14 @@ CreateInitDecodingContext(char *plugin,
  *		The LSN at which to start decoding.  If InvalidXLogRecPtr, restart
  *		from the slot's confirmed_flush; otherwise, start from the specified
  *		location (but move it forwards to confirmed_flush if it's older than
- *		that, see below).
+ *		that, see below). Doesn't matter in fast_forward mode.
  *
  * output_plugin_options
  *		contains options passed to the output plugin.
  *
+ * fast_forward
+ *		bypasses the generation of logical changes.
+ *
  * read_page, prepare_write, do_write, update_progress
  *		callbacks that have to be filled to perform the use-case dependent,
  *		actual work.
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 2806e1076c..0a4985ef8c 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -341,12 +341,10 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
 
 /*
  * Helper function for advancing logical replication slot forward.
- * The slot's restart_lsn is used as start point for reading records,
- * while confirmed_lsn is used as base point for the decoding context.
- * The LSN position to move to is checked by doing a per-record scan and
- * logical decoding which makes sure that confirmed_lsn is updated to a
- * LSN which allows the future slot consumer to get consistent logical
- * changes.
+ * While we could just do LogicalConfirmReceivedLocation updating
+ * confirmed_flush_lsn, we'd better digest WAL to advance restart_lsn
+ * (allowing to recycle WAL) and xmin (allowing to vacuum old catalog tuples).
+ * We do it in special fast_forward mode without actual replay.
  */
 static XLogRecPtr
 pg_logical_replication_slot_advance(XLogRecPtr moveto)
@@ -358,7 +356,6 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 
 	PG_TRY();
 	{
-		/* restart at slot's confirmed_flush */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 									NIL,
 									true,
@@ -388,10 +385,7 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			 */
 			startlsn = InvalidXLogRecPtr;
 
-			/*
-			 * The {begin_txn,change,commit_txn}_wrapper callbacks above will
-			 * store the description into our tuplestore.
-			 */
+			/* Changes are not actually produced in fast_forward mode. */
 			if (record != NULL)
 				LogicalDecodingProcessRecord(ctx, ctx->reader);
 
#20Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Arseny Sher (#19)
Re: Possible bug in logical replication.

On 20/06/18 09:59, Arseny Sher wrote:

Michael Paquier <michael@paquier.xyz> writes:

On Mon, Jun 18, 2018 at 09:42:36PM +0900, Michael Paquier wrote:

On Fri, Jun 15, 2018 at 06:27:56PM +0300, Arseny Sher wrote:
It seems to me that we still want to have the slot forwarding finish in
this case even if this is interrupted. Petr, isn't that the intention
here?

I have been chewing a bit more on the proposed patch, finishing with the
attached to close the loop. Thoughts?

Sorry for being pedantic, but it seems to me worthwhile to mention *why*
we need decoding machinery at all -- like I wrote:

+ * While we could just do LogicalConfirmReceivedLocation updating
+ * confirmed_flush_lsn, we'd better digest WAL to advance restart_lsn
+ * (allowing to recycle WAL) and xmin (allowing to vacuum old catalog tuples).

+1

Also,

* The slot's restart_lsn is used as start point for reading records,

This is clearly seen from the code, I propose to remove this.

Given that a bug fix was just done for this, I guess the extra clarity
there does not hurt.

* The LSN position to move to is checked by doing a per-record scan and
* logical decoding which makes sure that confirmed_lsn is updated to a
* LSN which allows the future slot consumer to get consistent logical
- * changes.
+ * changes.  As decoding is done with fast_forward mode, no changes are
+ * actually generated.

confirmed_lsn is always updated to `moveto` unless we run out of WAL
earlier (and unless we try to move slot backwards, which is obviously
forbidden) -- consistent changes are practically irrelevant
here. Moreover, we can directly set confirmed_lsn and still have
consistent changes further as restart_lsn and xmin of the slot are not
touched. What we actually do here is trying to advance *restart_lsn and
xmin* as far as we can but up to the point which ensures that decoding
can assemble a consistent snapshot allowing to fully decode all COMMITs
since updated `confirmed_flush_lsn`. All this happens in
SnapBuildProcessRunningXacts.

Those are two different things: there is the consistent snapshot for
logical decoding, without which we can't decode, and that's handled by
restart_lsn and xmin. But the consistent stream of data for the consumer
is handled by confirmed_lsn (and the comment says that). You have to
take into account that the next use of the slot can consume data (i.e.,
it may be done via one of the other interfaces and not by a move). So I
think what Michael has here is correct.

It seems to me that we still want to have the slot forwarding finish in
this case even if this is interrupted. Petr, isn't that the intention
here?

Well, it seems wasteful to keep going there if we have already finished
all the requested work, and stopping also gives some consistency with
the coding of get/peek_changes. Not very important for the functionality
either way.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#21Michael Paquier
michael@paquier.xyz
In reply to: Petr Jelinek (#20)
Re: Possible bug in logical replication.

On Thu, Jun 21, 2018 at 12:18:44PM +0200, Petr Jelinek wrote:

On 20/06/18 09:59, Arseny Sher wrote:

Michael Paquier <michael@paquier.xyz> writes:

It seems to me that we still want to have the slot forwarding finish in
this case even if this is interrupted. Petr, isn't that the intention
here?

Well, it seems wasteful to just exit there if we already finished all
the requested work, also gives some consistency with the coding of
get/peek_changes. Not very important for the functionality either way.

I like the concept of consistency.

Could it be possible to get a patch from all the feedback and exchange
gathered here? Petr, I think that it would not hurt if you use the set
of words and comments you think is most adapted as the primary author of
the feature.
--
Michael

#22Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Alvaro Herrera (#14)
Re: Possible bug in logical replication.

Hello.

At Thu, 14 Jun 2018 16:06:43 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in <20180614200643.3my362zmfiwitrni@alvherre.pgsql>

Can somebody (Arseny, Konstantin, horiguti, Sawada) please confirm that
Michaël's commit fixes the reported bug?

pg_advance_replication_slots can advance uninitialized physical
slots and that might not be good. (Logical slots always have
initial invalid values in the two LSN columns.)

About scanning from restart_lsn in the advancing function, I
think I confirmed that the value always comes from
XLogRecordBuffer.origptr, which comes from ReadRecPtr, not
EndRecPtr, and so cannot be on a page boundary.

FWIW, as a result, it also looks fine to me regarding the
issue on this thread.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#23Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro HORIGUCHI (#22)
Re: Possible bug in logical replication.

On Fri, Jun 22, 2018 at 04:33:12PM +0900, Kyotaro HORIGUCHI wrote:

pg_advance_replication_slots can advance uninitialized physical
slots and that might not be good. (Logical slots always have
initial invalid values in thw two lsn columns.)

The current logic is careful to ensure that users willing to move to a
new position cannot move into the future, but the logic visibly wants to
accept past values. Petr, what do you think? KeepLogSeg() won't return
negative values, so some applications may take advantage of that. Or
should advancing simply be disabled for non-initialized slots?
--
Michael

#24Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#21)
1 attachment(s)
Re: Possible bug in logical replication.

On Thu, Jun 21, 2018 at 07:31:20PM +0900, Michael Paquier wrote:

Could it be possible to get a patch from all the feedback and exchange
gathered here? Petr, I think that it would not hurt if you use the set
of words and comments you think is most adapted as the primary author of
the feature.

I have seen no patch, so attached is one to finally close the loop and
this open item, which includes both my suggestions and what Arseny has
mentioned based on the latest emails exchanged. Any objections to that?
--
Michael

Attachments:

slot-advance-comment-v2.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index c2d0e0c723..7e10a027ca 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -340,6 +340,9 @@ CreateInitDecodingContext(char *plugin,
  * output_plugin_options
  *		contains options passed to the output plugin.
  *
+ * fast_forward
+ *		bypasses the generation of logical changes.
+ *
  * read_page, prepare_write, do_write, update_progress
  *		callbacks that have to be filled to perform the use-case dependent,
  *		actual work.
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 2806e1076c..6c2addd5b7 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -343,10 +343,10 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
  * Helper function for advancing logical replication slot forward.
  * The slot's restart_lsn is used as start point for reading records,
  * while confirmed_lsn is used as base point for the decoding context.
- * The LSN position to move to is checked by doing a per-record scan and
- * logical decoding which makes sure that confirmed_lsn is updated to a
- * LSN which allows the future slot consumer to get consistent logical
- * changes.
+ * While we could just use LogicalConfirmReceivedLocation to update
+ * confirmed_flush_lsn, we had better digest WAL to advance restart_lsn
+ * allowing to recycle WAL and old catalog tuples.  As decoding is done
+ * with fast_forward mode, no changes are generated.
  */
 static XLogRecPtr
 pg_logical_replication_slot_advance(XLogRecPtr moveto)
@@ -358,7 +358,10 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 
 	PG_TRY();
 	{
-		/* restart at slot's confirmed_flush */
+		/*
+		 * Note that start_lsn does not matter here, as with fast_forward mode
+		 * no transactions are replayed.
+		 */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 									NIL,
 									true,
@@ -378,6 +381,10 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			XLogRecord *record;
 			char	   *errm = NULL;
 
+			/*
+			 * Start reading WAL at the slot's restart_lsn, which certainly
+			 * points to the valid record.
+			 */
 			record = XLogReadRecord(ctx->reader, startlsn, &errm);
 			if (errm)
 				elog(ERROR, "%s", errm);
@@ -389,8 +396,8 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			startlsn = InvalidXLogRecPtr;
 
 			/*
-			 * The {begin_txn,change,commit_txn}_wrapper callbacks above will
-			 * store the description into our tuplestore.
+			 * Note that changes are not generated in fast_forward mode, and
+			 * that the slot's data is still updated.
 			 */
 			if (record != NULL)
 				LogicalDecodingProcessRecord(ctx, ctx->reader);
#25Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#24)
Re: Possible bug in logical replication.

On 2018-Jul-03, Michael Paquier wrote:

On Thu, Jun 21, 2018 at 07:31:20PM +0900, Michael Paquier wrote:

Could it be possible to get a patch from all the feedback and exchange
gathered here? Petr, I think that it would not hurt if you use the set
of words and comments you think is most adapted as the primary author of
the feature.

I have seen no patch, so attached is one to finally close the loop and
this open item, which includes both my suggestions and what Arseny has
mentioned based on the latest emails exchanged. Any objections to that?

Let me review tomorrow.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#26Arseny Sher
a.sher@postgrespro.ru
In reply to: Michael Paquier (#24)
Re: Possible bug in logical replication.

Michael Paquier <michael@paquier.xyz> writes:

On Thu, Jun 21, 2018 at 07:31:20PM +0900, Michael Paquier wrote:

Could it be possible to get a patch from all the feedback and exchange
gathered here? Petr, I think that it would not hurt if you use the set
of words and comments you think is most adapted as the primary author of
the feature.

I have seen no patch, so attached is one to finally close the loop and
this open item, which includes both my suggestions and what Arseny has
mentioned based on the latest emails exchanged. Any objections to that?

I'm practically happy with this.

* while confirmed_lsn is used as base point for the decoding context.

This line is excessive as now we have comment below saying it doesn't
matter.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#27Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#25)
Re: Possible bug in logical replication.

On Tue, Jul 03, 2018 at 01:17:48AM -0400, Alvaro Herrera wrote:

Let me review tomorrow.

Of course, please feel free.
--
Michael

#28Michael Paquier
michael@paquier.xyz
In reply to: Arseny Sher (#26)
1 attachment(s)
Re: Possible bug in logical replication.

On Tue, Jul 03, 2018 at 09:16:42AM +0300, Arseny Sher wrote:

Michael Paquier <michael@paquier.xyz> writes:

On Thu, Jun 21, 2018 at 07:31:20PM +0900, Michael Paquier wrote:

Could it be possible to get a patch from all the feedback and exchange
gathered here? Petr, I think that it would not hurt if you use the set
of words and comments you think is most adapted as the primary author of
the feature.

I have seen no patch, so attached is one to finally close the loop and
this open item, which includes both my suggestions and what Arseny has
mentioned based on the latest emails exchanged. Any objections to that?

I'm practically happy with this.

* while confirmed_lsn is used as base point for the decoding context.

This line is excessive as now we have comment below saying it doesn't
matter.

Okay, let's do as you suggest then. Do you find the attached adapted?
--
Michael

Attachments:

slot-advance-comment-v3.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index c2d0e0c723..7e10a027ca 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -340,6 +340,9 @@ CreateInitDecodingContext(char *plugin,
  * output_plugin_options
  *		contains options passed to the output plugin.
  *
+ * fast_forward
+ *		bypasses the generation of logical changes.
+ *
  * read_page, prepare_write, do_write, update_progress
  *		callbacks that have to be filled to perform the use-case dependent,
  *		actual work.
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 2806e1076c..5cedb688a6 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -341,12 +341,11 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
 
 /*
  * Helper function for advancing logical replication slot forward.
- * The slot's restart_lsn is used as start point for reading records,
- * while confirmed_lsn is used as base point for the decoding context.
- * The LSN position to move to is checked by doing a per-record scan and
- * logical decoding which makes sure that confirmed_lsn is updated to a
- * LSN which allows the future slot consumer to get consistent logical
- * changes.
+ * The slot's restart_lsn is used as start point for reading records.
+ * While we could just use LogicalConfirmReceivedLocation to update
+ * confirmed_flush_lsn, we had better digest WAL to advance restart_lsn
+ * allowing to recycle WAL and old catalog tuples.  As decoding is done
+ * with fast_forward mode, no changes are generated.
  */
 static XLogRecPtr
 pg_logical_replication_slot_advance(XLogRecPtr moveto)
@@ -358,7 +357,10 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 
 	PG_TRY();
 	{
-		/* restart at slot's confirmed_flush */
+		/*
+		 * Note that start_lsn does not matter here, as with fast_forward mode
+		 * no transactions are replayed.
+		 */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 									NIL,
 									true,
@@ -378,6 +380,10 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			XLogRecord *record;
 			char	   *errm = NULL;
 
+			/*
+			 * Start reading WAL at the slot's restart_lsn, which certainly
+			 * points to the valid record.
+			 */
 			record = XLogReadRecord(ctx->reader, startlsn, &errm);
 			if (errm)
 				elog(ERROR, "%s", errm);
@@ -389,8 +395,8 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			startlsn = InvalidXLogRecPtr;
 
 			/*
-			 * The {begin_txn,change,commit_txn}_wrapper callbacks above will
-			 * store the description into our tuplestore.
+			 * Note that changes are not generated in fast_forward mode, and
+			 * that the slot's data is still updated.
 			 */
 			if (record != NULL)
 				LogicalDecodingProcessRecord(ctx, ctx->reader);
#29Arseny Sher
a.sher@postgrespro.ru
In reply to: Michael Paquier (#28)
Re: Possible bug in logical replication.

Michael Paquier <michael@paquier.xyz> writes:

Okay, let's do as you suggest then. Do you find the attached adapted?

Yes, thanks!

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#30Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#27)
Re: Possible bug in logical replication.

On Wed, Jul 04, 2018 at 10:50:28AM +0900, Michael Paquier wrote:

On Tue, Jul 03, 2018 at 01:17:48AM -0400, Alvaro Herrera wrote:

Let me review tomorrow.

Of course, please feel free.

Alvaro, are you planning to look at that to close the loop? The latest
version is here:
/messages/by-id/20180709070200.GC30202@paquier.xyz
--
Michael

#31Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#30)
1 attachment(s)
Re: Possible bug in logical replication.

On 2018-Jul-12, Michael Paquier wrote:

On Wed, Jul 04, 2018 at 10:50:28AM +0900, Michael Paquier wrote:

On Tue, Jul 03, 2018 at 01:17:48AM -0400, Alvaro Herrera wrote:

Let me review tomorrow.

Of course, please feel free.

Alvaro, are you planning to look at that to close the loop? The latest
version is here:
/messages/by-id/20180709070200.GC30202@paquier.xyz

In the immortal words of Julian Bream: "yeah, I didn't like any of that".

I started thinking that the "while we could do X, we better not because
Y" new wording in the comment was misleading -- the comment is precisely
to convey that we must NOT do X, so why say "we could"? I reworded that
comment a few times until it made sense. Then I noticed the other
comments were either misplaced or slightly misleading, so I moved them
to their proper places, then reworded them thoroughly.

I also moved some assignments from the declaration section to the code
section, so that I could attach proper comments to each, to improve
clarity of *why* we do those things.

I then noticed that we get an XLogRecord from XLogReadRecord, but then
fail to do anything with it, so I changed the code to use a bool
instead, which I think is clearer.

I think the proposed comment before the LogicalDecodingProcessRecord
call failed to convey the important ideas, so I rewrote that one also.

There is no struct member called confirmed_flush_lsn anywhere.

The tense of some words in CreateDecodingContext was wrong.

I also back-patched two minor changes from Tom's 3cb646264e8c.

BTW I think I'm starting to have a vague idea of logical decoding now.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-replslot-advance-comment-updates.patch (text/plain; charset=us-ascii)
From 6abc059aaddcdc0e368a9e2c8dec49a09819cbef Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Wed, 18 Jul 2018 11:32:55 -0400
Subject: [PATCH] replslot advance comment updates

---
 src/backend/replication/logical/logical.c |  5 ++-
 src/backend/replication/slotfuncs.c       | 75 ++++++++++++++++++++-----------
 2 files changed, 52 insertions(+), 28 deletions(-)

diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 61588d626f..c9bbdcda74 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -338,7 +338,10 @@ CreateInitDecodingContext(char *plugin,
  *		that, see below).
  *
  * output_plugin_options
- *		contains options passed to the output plugin.
+ *		options passed to the output plugin.
+ *
+ * fast_forward
+ *		bypass the generation of logical changes.
  *
  * read_page, prepare_write, do_write, update_progress
  *		callbacks that have to be filled to perform the use-case dependent,
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 23af32355b..08d20a9470 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -318,9 +318,10 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 
 /*
  * Helper function for advancing physical replication slot forward.
- * The LSN position to move to is compared simply to the slot's
- * restart_lsn, knowing that any position older than that would be
- * removed by successive checkpoints.
+ *
+ * The LSN position to move to is compared simply to the slot's restart_lsn,
+ * knowing that any position older than that would be removed by successive
+ * checkpoints.
  */
 static XLogRecPtr
 pg_physical_replication_slot_advance(XLogRecPtr moveto)
@@ -341,67 +342,87 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
 
 /*
  * Helper function for advancing logical replication slot forward.
+ *
  * The slot's restart_lsn is used as start point for reading records,
  * while confirmed_lsn is used as base point for the decoding context.
- * The LSN position to move to is checked by doing a per-record scan and
- * logical decoding which makes sure that confirmed_lsn is updated to a
- * LSN which allows the future slot consumer to get consistent logical
- * changes.
+ *
+ * We cannot just do LogicalConfirmReceivedLocation to update confirmed_flush,
+ * because we need to digest WAL to advance restart_lsn, allowing WAL to be
+ * recycled and old catalog tuples to be removed.  As decoding is done in fast_forward
+ * mode, no changes are generated anyway.
  */
 static XLogRecPtr
 pg_logical_replication_slot_advance(XLogRecPtr moveto)
 {
 	LogicalDecodingContext *ctx;
 	ResourceOwner old_resowner = CurrentResourceOwner;
-	XLogRecPtr	startlsn = MyReplicationSlot->data.restart_lsn;
-	XLogRecPtr	retlsn = MyReplicationSlot->data.confirmed_flush;
+	XLogRecPtr	startlsn;
+	XLogRecPtr	retlsn;
 
 	PG_TRY();
 	{
-		/* restart at slot's confirmed_flush */
+		/*
+		 * Create our decoding context in fast_forward mode, passing start_lsn
+		 * as Invalid, so that we start processing from confirmed_flush.
+		 */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 									NIL,
-									true,
+									true,	/* fast_forward */
 									logical_read_local_xlog_page,
 									NULL, NULL, NULL);
 
-		CurrentResourceOwner = ResourceOwnerCreate(CurrentResourceOwner,
-												   "logical decoding");
+		/*
+		 * Start reading at the slot's restart_lsn, which we know to point to
+		 * a valid record.
+		 */
+		startlsn = MyReplicationSlot->data.restart_lsn;
+
+		/* Initialize our return value in case we don't do anything */
+		retlsn = MyReplicationSlot->data.confirmed_flush;
 
 		/* invalidate non-timetravel entries */
 		InvalidateSystemCaches();
 
-		/* Decode until we run out of records */
-		while ((startlsn != InvalidXLogRecPtr && startlsn < moveto) ||
-			   (ctx->reader->EndRecPtr != InvalidXLogRecPtr && ctx->reader->EndRecPtr < moveto))
+		/* Decode at least one record, until we run out of records */
+		while ((!XLogRecPtrIsInvalid(startlsn) &&
+				startlsn < moveto) ||
+			   (!XLogRecPtrIsInvalid(ctx->reader->EndRecPtr) &&
+				ctx->reader->EndRecPtr < moveto))
 		{
-			XLogRecord *record;
 			char	   *errm = NULL;
+			bool		gotrecord;
 
-			record = XLogReadRecord(ctx->reader, startlsn, &errm);
+			/*
+			 * Read records.  No changes are generated in fast_forward mode,
+			 * but snapbuilder/slot statuses are updated properly.
+			 */
+			gotrecord = XLogReadRecord(ctx->reader, startlsn, &errm) != NULL;
 			if (errm)
 				elog(ERROR, "%s", errm);
 
-			/*
-			 * Now that we've set up the xlog reader state, subsequent calls
-			 * pass InvalidXLogRecPtr to say "continue from last record"
-			 */
+			/* Read sequentially from now on */
 			startlsn = InvalidXLogRecPtr;
 
 			/*
-			 * The {begin_txn,change,commit_txn}_wrapper callbacks above will
-			 * store the description into our tuplestore.
+			 * Process the record.  Storage-level changes are ignored in
+			 * fast_forward mode, but other modules (such as snapbuilder)
+			 * might still have critical updates to do.
 			 */
-			if (record != NULL)
+			if (gotrecord)
 				LogicalDecodingProcessRecord(ctx, ctx->reader);
 
-			/* Stop once the moving point wanted by caller has been reached */
+			/* Stop once the requested target has been reached */
 			if (moveto <= ctx->reader->EndRecPtr)
 				break;
 
 			CHECK_FOR_INTERRUPTS();
 		}
 
+		/*
+		 * Logical decoding could have clobbered CurrentResourceOwner during
+		 * transaction management, so restore the executor's value.  (This is
+		 * a kluge, but it's not worth cleaning up right now.)
+		 */
 		CurrentResourceOwner = old_resowner;
 
 		if (ctx->reader->EndRecPtr != InvalidXLogRecPtr)
@@ -409,7 +430,7 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			LogicalConfirmReceivedLocation(moveto);
 
 			/*
-			 * If only the confirmed_flush_lsn has changed the slot won't get
+			 * If only the confirmed_flush LSN has changed the slot won't get
 			 * marked as dirty by the above. Callers on the walsender
 			 * interface are expected to keep track of their own progress and
 			 * don't need it written out. But SQL-interface users cannot
-- 
2.11.0
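(Editorial sketch, not part of the thread.) The loop shape the patch above settles on — seed the reader with restart_lsn, then pass InvalidXLogRecPtr to read sequentially until moveto is reached, and finally confirm the target — can be illustrated in Python. `FakeReader` and the integer LSNs are hypothetical stand-ins for the xlogreader machinery; 0 plays the role of InvalidXLogRecPtr.

```python
INVALID_LSN = 0

class FakeReader:
    """Stand-in for XLogReaderState, yielding records with known end LSNs."""
    def __init__(self, end_lsns):
        self._pending = list(end_lsns)
        self.end_rec_ptr = INVALID_LSN

    def read_record(self, startlsn):
        # Like XLogReadRecord: startlsn matters only for the first call;
        # afterwards the caller passes INVALID_LSN to read sequentially.
        if not self._pending:
            return None, "no record available"      # (record, errm)
        self.end_rec_ptr = self._pending.pop(0)
        return object(), None

def advance_slot(reader, restart_lsn, confirmed_flush, moveto):
    startlsn = restart_lsn      # known to point to a valid record
    retlsn = confirmed_flush    # return value in case we do nothing
    while ((startlsn != INVALID_LSN and startlsn < moveto) or
           (reader.end_rec_ptr != INVALID_LSN and
            reader.end_rec_ptr < moveto)):
        record, errm = reader.read_record(startlsn)
        if errm:
            raise RuntimeError(errm)        # elog(ERROR, "%s", errm)
        startlsn = INVALID_LSN              # read sequentially from now on
        if record is not None:
            pass    # LogicalDecodingProcessRecord: side-effects only here
        if moveto <= reader.end_rec_ptr:
            break                           # requested target reached
    if reader.end_rec_ptr != INVALID_LSN:
        retlsn = moveto     # LogicalConfirmReceivedLocation(moveto)
    return retlsn

# restart_lsn 100, confirmed_flush 150, records ending at 120/160/200:
# advancing to 180 stops at the record crossing that point.
print(advance_slot(FakeReader([120, 160, 200]), 100, 150, 180))  # prints 180
```

Note how the loop may legitimately read records whose end LSN already lies past moveto — the key point of the fix is that reading starts from restart_lsn (a valid record boundary), never from an arbitrary confirmed_flush position in the middle of a record.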

#32Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#31)
Re: Possible bug in logical replication.

On Wed, Jul 18, 2018 at 02:30:53PM -0400, Alvaro Herrera wrote:

In the immortal words of Julian Bream: "yeah, I didn't like any of
that".

One wikipedia lookup later, I still don't know where this quote comes
from, but at least I understand who the man is.

I may be missing something, but I cannot apply your patch on HEAD so I
have not tested it. Anyway, I read through it and the thing does not
look logically wrong.

I also moved some assignments from the declaration section to the code
section, so that I could attach proper comments to each, to improve
clarity of *why* we do those things.

To be pedantic here, you could move the declarations of startlsn,
old_resowner and ctx directly inside the PG_TRY block.

I then noticed that we get an XLogRecord from XLogReadRecord, but then
fail to do anything with it, so I changed the code to use a bool
instead, which I think is clearer.

Matter of taste perhaps, I was fine with just manipulating the record
pointer.

I think the proposed comment before the LogicalDecodingProcessRecord
call failed to convey the important ideas, so I rewrote that one also.

There is no struct member called confirmed_flush_lsn anywhere.

This is referring to the system catalog field in pg_replication_slots.

BTW I think I'm starting to have a vague idea of logical decoding
now.

Nice.

PG_TRY();
{
-		/* restart at slot's confirmed_flush */
+		/*
+		 * Create our decoding context in fast_forward mode, passing start_lsn
+		 * as Invalid, so that we start processing from confirmed_flush.
+		 */

I'd rather mention InvalidXLogRecPtr directly here. Invalid alone makes
no real sense.

+ gotrecord = XLogReadRecord(ctx->reader, startlsn, &errm) != NULL;

I would put parentheses for clarity.
--
Michael

#33Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#32)
Re: Possible bug in logical replication.

On 2018-Jul-19, Michael Paquier wrote:

On Wed, Jul 18, 2018 at 02:30:53PM -0400, Alvaro Herrera wrote:

In the immortal words of Julian Bream: "yeah, I didn't like any of
that".

One wikipedia lookup later, I still don't know where this quote comes
from, but at least I understand who the man is.

https://twitter.com/alvherre/status/1019652397306703873

OK, maybe not that "immortal" after all.

I may be missing something, but I cannot apply your patch on HEAD so I
have not tested it. Anyway, I read through it and the thing does not
look logically wrong.

Sorry, I forgot to mention this -- it applies to 11.

I also moved some assignments from the declaration section to the code
section, so that I could attach proper comments to each, to improve
clarity of *why* we do those things.

To be pedantic here, you could move the declarations of startlsn,
old_resowner and ctx directly inside the PG_TRY block.

Good idea, thanks.

I then noticed that we get an XLogRecord from XLogReadRecord, but then
fail to do anything with it, so I changed the code to use a bool
instead, which I think is clearer.

Matter of taste perhaps, I was fine with just manipulating the record
pointer.

Yeah, it works out to the same thing really. Maybe it's just me being
pedantic and annoyed when I realized that logical decoding does not
operate on the record itself but on the xlogreader struct instead.

TBH what I was actually doing at first was just casting the result of
XLogReadRecord to void and not doing anything with it, until I realized
that the logical decoding call below was important because of
side-effects. I thought since fast_forward mode was used,
XLogReadRecord would not return anything anyway -- but I added an assert
and realized that was not so.

I think the proposed comment before the LogicalDecodingProcessRecord
call failed to convey the important ideas, so I rewrote that one also.

There is no struct member called confirmed_flush_lsn anywhere.

This is referring to the system catalog field in pg_replication_slots.

Yeah, I think that's a bit misleading. (I very frequently do a tag-jump
on identifiers in comments, and it's uncomfortable that in this case it
jumps to the Docbook source rather than to the struct declaration.)

PG_TRY();
{
-		/* restart at slot's confirmed_flush */
+		/*
+		 * Create our decoding context in fast_forward mode, passing start_lsn
+		 * as Invalid, so that we start processing from confirmed_flush.
+		 */

I'd rather mention InvalidXLogRecPtr directly here. Invalid alone makes
no real sense.

OK. I was of two minds about that.

+ gotrecord = XLogReadRecord(ctx->reader, startlsn, &errm) != NULL;

I would put parentheses for clarity.

Or just put it back as a record pointer.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#34Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#33)
Re: Possible bug in logical replication.

On Wed, Jul 18, 2018 at 10:45:27PM -0400, Alvaro Herrera wrote:

On 2018-Jul-19, Michael Paquier wrote:

On Wed, Jul 18, 2018 at 02:30:53PM -0400, Alvaro Herrera wrote:

In the immortal words of Julian Bream: "yeah, I didn't like any of
that".

One wikipedia lookup later, I still don't know where this quote comes
from, but at least I understand who the man is.

https://twitter.com/alvherre/status/1019652397306703873

OK, maybe not that "immortal" after all.

:)

This is referring to the system catalog field in pg_replication_slots.

Yeah, I think that's a bit misleading. (I very frequently do a tag-jump
on identifiers in comments, and it's uncomfortable that in this case it
jumps to the Docbook source rather than to the struct declaration.)

Okay, this makes sense as well and that's easier to grep.

+ gotrecord = XLogReadRecord(ctx->reader, startlsn, &errm) != NULL;

I would put parentheses for clarity.

Or just put it back as a record pointer.

I am fine either way if you want to have the last call. So please feel
free to choose what you prefer here. That's no big deal.
--
Michael

#35Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#34)
Re: Possible bug in logical replication.

On 2018-Jul-19, Michael Paquier wrote:

I am fine either way if you want to have the last call. So please feel
free to choose what you prefer here. That's no big deal.

Okay. You want to push it, or shall I?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#36Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#35)
Re: Possible bug in logical replication.

On Thu, Jul 19, 2018 at 12:38:53AM -0400, Alvaro Herrera wrote:

On 2018-Jul-19, Michael Paquier wrote:

I am fine either way if you want to have the last call. So please feel
free to choose what you prefer here. That's no big deal.

Okay. You want to push it, or shall I?

It seems to me that you are on it more than I. So please go ahead.
Thanks for asking.
--
Michael

#37Erik Rijkers
er@xs4all.nl
In reply to: Michael Paquier (#32)
Re: Possible bug in logical replication.

On 2018-07-19 03:42, Michael Paquier wrote:

On Wed, Jul 18, 2018 at 02:30:53PM -0400, Alvaro Herrera wrote:

In the immortal words of Julian Bream: "yeah, I didn't like any of
that".

One wikipedia lookup later, I still don't know where this quote comes
from, but at least I understand who the man is.

Julian Bream is the 20th century's finest classical guitarist.

The quote is from Bream's 1978 guitar masterclass (at 30:25)

https://www.youtube.com/watch?v=iAXUzIrukt0

#38Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#36)
Re: Possible bug in logical replication.

On 2018-Jul-19, Michael Paquier wrote:

On Thu, Jul 19, 2018 at 12:38:53AM -0400, Alvaro Herrera wrote:

On 2018-Jul-19, Michael Paquier wrote:

I am fine either way if you want to have the last call. So please feel
free to choose what you prefer here. That's no big deal.

Okay. You want to push it, or shall I?

It seems to me that you are on it more than I. So please go ahead.
Thanks for asking.

Done, thanks everyone.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services