make async slave to wait for lsn to be replayed

Started by Ivan Kartyshov, over 9 years ago, 98 messages
#1 Ivan Kartyshov
i.kartyshov@postgrespro.ru
1 attachment(s)

Hi hackers,

A few days ago I finished my work on the WAITLSN utility statement, so
I'd like to share it.

Introduction
============

Our clients who deal with 9.5 and use asynchronous master-slave
replication asked for a wait mechanism on the slave side to
prevent the situation where the slave handles a query that needs data (LSN)
that was received and flushed, but not yet replayed.

Problem description
===================

The implementation:
- Must handle the wait mechanism using pg_sleep() in order not to load the system.
- Must avoid race conditions if different backends want to wait for
  different LSNs.
- Must not take a snapshot of the DB, to avoid trouble with sudden minXID changes.
- Must have an optional timeout parameter in case LSN traffic has stalled.
- Must release on postmaster death or interrupts.

Implementation
==============

To avoid trouble with snapshots, WAITLSN was implemented as a utility
statement, which allows us to circumvent the snapshot-taking mechanism.
We tried different variants, and the most effective way was to use Latches.
To handle interprocess interaction, all Latches are stored in shared
memory, and to cope with race conditions each Latch is protected by a
Spinlock.
The timeout is an optional parameter, set in milliseconds.

What works
==========

Actually, it works well even with significant timeout or wait period
values, but of course there might be things I've overlooked.

How to use it
==========

WAITLSN 'LSN' [, timeout in ms];

-- Wait until LSN 0/303EC60 is replayed, or 10 seconds have passed.
WAITLSN '0/303EC60', 10000;

-- Or the same without a timeout.
WAITLSN '0/303EC60';

Notice: WAITLSN will release on PostmasterDeath or Interruption events
if they come earlier than the LSN or the timeout.

Testing the implementation
======================

The implementation was tested with testgres and unittest python modules.

How to test this implementation:
1. Start the master server.
2. Create table test and insert tuple 1.
3. Set up an asynchronous slave (wal_level = standby on 9.5,
   wal_level = replica on 9.6 or higher).
4. Slave: START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
   SELECT * FROM test;
5. Master: delete the tuple, run VACUUM, and get the new LSN.
6. Slave: WAITLSN 'newLSN', 60000;
   WAITLSN finishes with FALSE ("LSN is not reached").
7. Slave: COMMIT;
   WAITLSN 'newLSN', 60000;
   WAITLSN finishes successfully (without a NOTICE message).

As expected, WAITLSN waits for the LSN and stops on postmaster death,
interrupts, or timeout.

Your feedback is welcome!

---
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

waitlsn_10dev.patch (text/x-patch)
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 77667bd..72c5390 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -172,6 +172,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY update             SYSTEM "update.sgml">
 <!ENTITY vacuum             SYSTEM "vacuum.sgml">
 <!ENTITY values             SYSTEM "values.sgml">
+<!ENTITY waitlsn            SYSTEM "waitlsn.sgml">
 
 <!-- applications and utilities -->
 <!ENTITY clusterdb          SYSTEM "clusterdb.sgml">
diff --git a/doc/src/sgml/ref/waitlsn.sgml b/doc/src/sgml/ref/waitlsn.sgml
new file mode 100644
index 0000000..6a8bdca
--- /dev/null
+++ b/doc/src/sgml/ref/waitlsn.sgml
@@ -0,0 +1,108 @@
+<!--
+doc/src/sgml/ref/waitlsn.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-WAITLSN">
+ <indexterm zone="sql-waitlsn">
+  <primary>WAITLSN</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>WAITLSN</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>WAITLSN</refname>
+  <refpurpose>wait when target <acronym>LSN</> been replayed</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+WAITLSN <replaceable class="PARAMETER">'LSN'</replaceable> [ , <replaceable class="PARAMETER">delay</replaceable> ]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   The <command>WAITLSN</command> wait till target <acronym>LSN</> will
+   be replayed with an optional <quote>delay</> (milliseconds by default
+   infinity) to be wait for LSN to replayed.
+  </para>
+
+  <para>
+   <command>WAITLSN</command> provides a simple
+   interprocess <acronym>LSN</> wait mechanism for a backends on slave
+   in master-slave replication scheme on <productname>PostgreSQL</productname> database.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="PARAMETER">LSN</replaceable></term>
+    <listitem>
+     <para>
+      Target log sequence number to be wait for.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+    <term><replaceable class="PARAMETER">delay</replaceable></term>
+    <listitem>
+     <para>
+      Time in miliseconds to waiting for LSN to be replayed.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+  <para>
+   Delay time to waiting for LSN to be replayed must be integer. For
+   default it is infinity. Waiting can be interupped using Ctl+C, or
+   by Postmaster death.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Configure and execute a waitlsn from
+   <application>psql</application>:
+
+<programlisting>
+WAITLSN '0/3F07A6B1', 10000;
+NOTICE:  LSN is not reached. Try to make bigger delay.
+WAITLSN
+
+WAITLSN '0/3F07A611';
+WAITLSN
+
+WAITLSN '0/3F0FF791', 500000;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to make bigger delay.
+ERROR:  canceling statement due to user request
+</programlisting>
+</para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>WAITLSN</command> statement in the SQL
+   standard.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 8acdff1..3733ad9 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -200,6 +200,7 @@
    &update;
    &vacuum;
    &values;
+   &waitlsn;
 
  </reference>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f13f9c1..609c83e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -39,6 +39,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/bgwriter.h"
@@ -6922,6 +6923,11 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * After update lastReplayedEndRecPtr set Latches in SHMEM array
+				 */
+				WaitLSNSetLatch();
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 6b3742c..091cbe2 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -20,6 +20,6 @@ OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
 	policy.o portalcmds.o prepare.o proclang.o \
 	schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
 	tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
-	variable.o view.o
+	variable.o view.o waitlsn.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 716f1c3..9ad3275 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -139,7 +139,6 @@
 #include "utils/ps_status.h"
 #include "utils/timestamp.h"
 
-
 /*
  * Maximum size of a NOTIFY payload, including terminating NULL.  This
  * must be kept small enough so that a notification message fits on one
diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
new file mode 100644
index 0000000..b441b85
--- /dev/null
+++ b/src/backend/commands/waitlsn.c
@@ -0,0 +1,195 @@
+/*-------------------------------------------------------------------------
+ *
+ * waitlsn.c
+ *	  WaitLSN statment: WAITLSN
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/waitlsn.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/*-------------------------------------------------------------------------
+ * Wait for LSN been replayed on slave as of 9.5:
+ * README
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "fmgr.h"
+#include "utils/pg_lsn.h"
+#include "storage/latch.h"
+#include "miscadmin.h"
+#include "storage/spin.h"
+#include "storage/backendid.h"
+#include "access/xact.h"
+#include "storage/shmem.h"
+#include "storage/ipc.h"
+#include "access/xlog_fn.h"
+#include "utils/timestamp.h"
+#include "storage/pmsignal.h"
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "commands/waitlsn.h"
+
+
+/* Latches Own-DisownLatch and AbortCаllBack */
+static uint32 GetSHMEMSize(void);
+static void WLDisownLatchAbort(XactEvent event, void *arg);
+static void WLOwnLatch(void);
+static void WLDisownLatch(void);
+
+void		_PG_init(void);
+
+/* Shared memory structures */
+typedef struct
+{
+	int					pid;
+	volatile slock_t	slock;
+	Latch				latch;
+} BIDLatch;
+
+typedef struct
+{
+	int			backend_maxid;
+	BIDLatch	l_arr[FLEXIBLE_ARRAY_MEMBER];
+} GlobState;
+
+static volatile GlobState  *state;
+bool						is_latch_owned = false;
+
+static void
+WLOwnLatch(void)
+{
+	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+	OwnLatch(&state->l_arr[MyBackendId].latch);
+	is_latch_owned = true;
+	if (MyBackendId > state->backend_maxid)
+		state->backend_maxid += 1;
+	state->l_arr[MyBackendId].pid = MyProcPid;
+	SpinLockRelease(&state->l_arr[MyBackendId].slock);
+}
+
+static void
+WLDisownLatch(void)
+{
+	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+	DisownLatch(&state->l_arr[MyBackendId].latch);
+	is_latch_owned = false;
+	if (MyBackendId = state->backend_maxid)
+		state->backend_maxid -= 1;
+	state->l_arr[MyBackendId].pid = 0;
+	SpinLockRelease(&state->l_arr[MyBackendId].slock);
+}
+
+/* CallBack function */
+static void
+WLDisownLatchAbort(XactEvent event, void *arg)
+{
+	if (is_latch_owned && (event == XACT_EVENT_PARALLEL_ABORT ||
+						   event == XACT_EVENT_ABORT))
+	{
+		WLDisownLatch();
+	}
+}
+
+/* Module load callback */
+void
+_PG_init(void)
+{
+	if (!IsUnderPostmaster)
+		RegisterXactCallback(WLDisownLatchAbort, NULL);
+}
+
+static uint32
+GetSHMEMSize(void)
+{
+	return offsetof(GlobState, l_arr) + sizeof(BIDLatch) * (MaxConnections+1);
+}
+
+void
+WaitLSNShmemInit(void)
+{
+	bool	found;
+	uint	i;
+
+	state = (GlobState *) ShmemInitStruct("pg_wait_lsn",
+										  GetSHMEMSize(),
+										  &found);
+	if (!found)
+	{
+		state->backend_maxid = 1;
+		for (i = 0; i < (MaxConnections+1); i++)
+		{
+			state->l_arr[i].pid = 0;
+			SpinLockInit(&state->l_arr[i].slock);
+			InitSharedLatch(&state->l_arr[i].latch);
+		}
+	}
+}
+
+void
+WaitLSNSetLatch(void)
+{
+	uint i;
+	for (i = 1; i < (state->backend_maxid+1); i++)
+	{
+		SpinLockAcquire(&state->l_arr[i].slock);
+		if (state->l_arr[i].pid != 0)
+			SetLatch(&state->l_arr[i].latch);
+		SpinLockRelease(&state->l_arr[i].slock);
+	}
+}
+
+void
+WaitLSNUtility(const char *lsn, const int *delay)
+{
+	XLogRecPtr	trg_lsn;
+	XLogRecPtr	cur_lsn;
+	int			latch_events;
+	int			tdelay = delay;
+	TimestampTz	timer = GetCurrentTimestamp();
+	trg_lsn = DatumGetLSN(DirectFunctionCall1(pg_lsn_in, CStringGetDatum(lsn)));
+
+
+	if (delay > 0)
+		latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;
+	else
+		latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;
+
+	WLOwnLatch();
+
+	for (;;)
+	{
+		ResetLatch(&state->l_arr[MyBackendId].latch);
+		cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN had been Replayed */
+		if (trg_lsn <= cur_lsn)
+			break;
+
+		/* If the postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		/* If Delay time is over */
+		if (latch_events & WL_TIMEOUT)
+		{
+			tdelay -= (GetCurrentTimestamp() - timer);
+			if (tdelay <= 0)
+				break;
+			timer = GetCurrentTimestamp();
+		}
+
+		CHECK_FOR_INTERRUPTS();
+		WaitLatch(&state->l_arr[MyBackendId].latch, latch_events, tdelay);
+	}
+
+	WLDisownLatch();
+
+	if (trg_lsn > cur_lsn)
+		elog(NOTICE,"LSN is not reached. Try to make bigger delay.");
+}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index cb5cfc4..5fb43f6 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -267,7 +267,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		DeallocateStmt PrepareStmt ExecuteStmt
 		DropOwnedStmt ReassignOwnedStmt
 		AlterTSConfigurationStmt AlterTSDictionaryStmt
-		CreateMatViewStmt RefreshMatViewStmt CreateAmStmt
+		CreateMatViewStmt RefreshMatViewStmt CreateAmStmt WaitLSNStmt
 
 %type <node>	select_no_parens select_with_parens select_clause
 				simple_select values_clause
@@ -306,7 +306,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <list>	OptSchemaEltList
 
 %type <boolean> TriggerForSpec TriggerForType
-%type <ival>	TriggerActionTime
+%type <ival>	TriggerActionTime WaitDelay
 %type <list>	TriggerEvents TriggerOneEvent
 %type <value>	TriggerFuncArg
 %type <node>	TriggerWhen
@@ -644,7 +644,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAITLSN WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLPARSE
 	XMLPI XMLROOT XMLSERIALIZE
@@ -882,6 +882,7 @@ stmt :
 			| VariableSetStmt
 			| VariableShowStmt
 			| ViewStmt
+			| WaitLSNStmt
 			| /*EMPTY*/
 				{ $$ = NULL; }
 		;
@@ -12852,7 +12853,26 @@ frame_bound:
 				}
 		;
 
+/*****************************************************************************
+ *
+ *		QUERY:
+ *				WAITLSN <LSN> can appear as a query-level command
+ *
+ *
+ *****************************************************************************/
 
+WaitLSNStmt: WAITLSN Sconst WaitDelay
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = $3;
+					$$ = (Node *)n;
+				}
+		;
+WaitDelay:
+			',' Iconst							{ $$ = $2; }
+			| /*EMPTY*/							{ $$ = 0; }
+		;
 /*
  * Supporting nonterminals for expressions.
  */
@@ -13908,6 +13928,7 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAITLSN
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index c04b17f..66001ae 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -254,6 +255,11 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of Latches  in SHMEM for WAITLSN
+	 */
+	WaitLSNShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index ac50c2a..6c2447d 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -54,6 +54,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -902,6 +903,20 @@ standard_ProcessUtility(Node *parsetree,
 				break;
 			}
 
+		case T_WaitLSNStmt:
+			{
+				WaitLSNStmt *stmt = (WaitLSNStmt *) parsetree;
+				if (!RecoveryInProgress())
+				{
+					ereport(ERROR,(errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION), 
+							errmsg("cannot execute %s not during recovery",
+							"WaitLSN")));
+				}
+				else
+					WaitLSNUtility(stmt->lsn, stmt->delay);
+			}
+			break;
+
 		default:
 			/* All other statement types have event trigger support */
 			ProcessUtilitySlow(parsetree, queryString,
@@ -2359,6 +2374,10 @@ CreateCommandTag(Node *parsetree)
 			tag = "NOTIFY";
 			break;
 
+		case T_WaitLSNStmt:
+			tag = "WAITLSN";
+			break;
+
 		case T_ListenStmt:
 			tag = "LISTEN";
 			break;
@@ -2951,6 +2970,10 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
+		case T_WaitLSNStmt:
+			lev = LOGSTMT_ALL;
+			break;
+
 		case T_ListenStmt:
 			lev = LOGSTMT_ALL;
 			break;
diff --git a/src/include/commands/waitlsn.h b/src/include/commands/waitlsn.h
new file mode 100644
index 0000000..12d224e
--- /dev/null
+++ b/src/include/commands/waitlsn.h
@@ -0,0 +1,20 @@
+/*-------------------------------------------------------------------------
+ *
+ * waitlsn.h
+ *	  WaitLSN notification: WAITLSN
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2016, Regents of PostgresPRO
+ *
+ * src/include/commands/waitlsn.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAITLSN_H
+#define WAITLSN_H
+
+extern void WaitLSNUtility(const char *lsn, const int *delay);
+extern void WaitLSNShmemInit(void);
+extern void WaitLSNSetLatch(void);
+
+#endif   /* WAITLSN_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 2f7efa8..8b9fc2b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -463,6 +463,7 @@ typedef enum NodeTag
 	T_DropReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
+	T_WaitLSNStmt,
 
 	/*
 	 * TAGS FOR RANDOM OTHER STUFF
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 1481fff..ee8e0f3 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3101,4 +3101,15 @@ typedef struct AlterTSConfigurationStmt
 	bool		missing_ok;		/* for DROP - skip error if missing? */
 } AlterTSConfigurationStmt;
 
+/* ----------------------
+ *		WaitLSN Statement
+ * ----------------------
+ */
+typedef struct WaitLSNStmt
+{
+	NodeTag		type;
+	char	   *lsn;			/* Taraget LSN to wait for */
+	int		   *delay;			/* Delay to wait for LSN*/
+} WaitLSNStmt;
+
 #endif   /* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 17ffef5..b14193e 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -422,6 +422,7 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("waitlsn", WAITLSN, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)

#2 Craig Ringer
craig@2ndquadrant.com
In reply to: Ivan Kartyshov (#1)
Re: make async slave to wait for lsn to be replayed

On 31 August 2016 at 22:16, Ivan Kartyshov <i.kartyshov@postgrespro.ru> wrote:

> Our clients who deal with 9.5 and use asynchronous master-slave replication
> asked for a wait mechanism on the slave side to prevent the situation
> where the slave handles a query that needs data (LSN) that was received and
> flushed, but not yet replayed.

I like the broad idea - I've wanted something like it for a while. BDR
has pg_xlog_wait_remote_receive() and pg_xlog_wait_remote_apply() for
use in tests for this reason, but they act on the *upstream* side,
waiting until the downstream has acked the data. Not as useful for
ensuring that apps connected to both master and one or more replicas
get a consistent view of data.

How do you get the commit LSN to watch for? Grab
pg_current_xlog_insert_location() just after the commit and figure
that replaying to that point guarantees you get the commit?

Some time ago[1] I raised the idea of reporting commit LSN on the wire
to clients. That didn't go anywhere due to compatibility and security
concerns. I think those were resolvable, but it wasn't enough of a
priority to push hard on at the time. A truly "right" solution has to
wait for a protocol bump, but I think good-enough solutions are
possible now. So you might want to read that thread.

It also mentions hesitations about exposing LSN to clients even more.
I think we're *way* past that now - we have replication origins and
replication slots relying on it, it's exposed in a pg_lsn datatype, a
bunch of views expose it, etc. But it might be reasonable to ask
"should the client instead be expected to wait for the confirmed
commit of a 64-bit epoch-extended xid, like that returned by
txid_current()?". One advantage of using xid is that you can get it
while you're still in the xact, so there's no race between commit and
checking the lsn after commit.

Are you specifically trying to ensure "this commit has replayed on the
replica before we run queries on it" ? Or something else?

(Also, on a side note, Kevin mentioned that it may be possible to use
SSI data to achieve SERIALIZABLE read-only queries on replicas, where
they get the same protection from commit-order related anomalies as
queries on the master. You might want to look more deeply into that
too at some stage, if you're trying to ensure the app can do read only
queries on the master and expect fully consistent results).

[1]: /messages/by-id/53E41EC1.5050603@2ndquadrant.com

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Ivan Kartyshov
i.kartyshov@postgrespro.ru
In reply to: Craig Ringer (#2)
Re: make async slave to wait for lsn to be replayed

On 08/31/2016 05:54 PM, Craig Ringer wrote:

> How do you get the commit LSN to watch for? Grab
> pg_current_xlog_insert_location() just after the commit and figure
> that replaying to that point guarantees you get the commit?

That's the point: it was created to provide a consistent view
of data between master and replica. You almost guessed it: I used
GetXLogReplayRecPtr() right after the LSN was physically replayed on the downstream.

> Some time ago[1] I raised the idea of reporting commit LSN on the wire
> to clients. That didn't go anywhere due to compatibility and security
> concerns. I think those were resolvable, but it wasn't enough of a
> priority to push hard on at the time. A truly "right" solution has to
> wait for a protocol bump, but I think good-enough solutions are
> possible now. So you might want to read that thread.

Thank you for pointing to your thread, it was very informative!
It seems that you have solved the very similar problem.

> It also mentions hesitations about exposing LSN to clients even more.
> I think we're *way* past that now - we have replication origins and
> replication slots relying on it, it's exposed in a pg_lsn datatype, a
> bunch of views expose it, etc. But it might be reasonable to ask
> "should the client instead be expected to wait for the confirmed
> commit of a 64-bit epoch-extended xid, like that returned by
> txid_current()?". One advantage of using xid is that you can get it
> while you're still in the xact, so there's no race between commit and
> checking the lsn after commit.

That sounds reasonable, but I don't think it will give us any
considerable benefits. Still, I'll work out this variant.

> Are you specifically trying to ensure "this commit has replayed on the
> replica before we run queries on it" ? Or something else?

Yes, you are right: I want to ensure data consistency on the downstream
before running queries on it. Our clients would use it as part of a
background worker and maybe directly in apps too.

---
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#4 Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Ivan Kartyshov (#1)
Re: make async slave to wait for lsn to be replayed

On Thu, Sep 1, 2016 at 2:16 AM, Ivan Kartyshov
<i.kartyshov@postgrespro.ru> wrote:

> Hi hackers,
>
> A few days ago I finished my work on the WAITLSN utility statement, so I'd
> like to share it.
> [...]
> Your feedback is welcome!
>
> [waitlsn_10dev.patch]

Hi Ivan,

Thanks for working on this. Here are some general thoughts on the
feature, and an initial review.

+1 for this feature. Explicitly waiting for a given commit to be
applied is one of several approaches to achieve "causal consistency"
for reads on replica nodes, and I think it will be very useful if
combined with a convenient way to get the values to wait for when you
run COMMIT. This could be used either by applications directly, or by
middleware that somehow keeps track of dependencies between
transactions and inserts waits.

I liked the way Heikki Linnakangas imagined this feature[1]:

BEGIN WAIT FOR COMMIT 1234 TO BE VISIBLE;

... or perhaps it could be spelled like this:

BEGIN [isolation stuff] WAIT FOR COMMIT TOKEN <xxx> TIMEOUT <timeout>;

That allows waiting only at the start of a transaction, whereas your
idea of making a utility command would allow a single READ COMMITTED
transaction to wait multiple times for transactions it has heard about
through side channels, which may be useful. Perhaps we could support
the same syntax in a stand alone statement inside a transaction OR as
part of a BEGIN ... statement. Being able to do it as part of BEGIN
means that you can use this feature for single-snapshot transactions,
ie REPEATABLE READ and SERIALIZABLE (of course you can't use
SERIALIZABLE on hot standbys yet but that'll be fixed one day).
Otherwise you'd be waiting for the LSN in the middle of your
transaction but not be able to see the result because you don't take a
new snapshot. Or maybe it's enough to use a standalone WAIT ...
statement inside a REPEATABLE READ or SERIALIZABLE transaction as long
as it's the first statement, and should be an error to do so any time
later?

I think working in terms of LSNs or XIDs explicitly is not a good
idea: encouraging clients to think in terms of anything other than
opaque 'commit tokens' seems like a bad idea because it limits future
changes. For example, there is on-going discussion about introducing
CSNs (commit sequence numbers), and there are some related concepts
lurking in the SSI code; maybe we'd want to use those one day. Do you
think it would make sense to have a concept of a commit token that is
a non-analysable string as far as clients are concerned, so that
clients are not encouraged to do anything at all with them except use
them in a WAIT FOR COMMIT TOKEN <xxx> statement?

INITIAL FEEDBACK ON THE PATCH

I didn't get as far as testing or detailed review because it has some
obvious bugs and compiler warnings which I figured we should talk
about first, and I also have some higher level questions about the
design.

gram.y:12882:15: error: assignment makes pointer from integer without
a cast [-Werror=int-conversion]
n->delay = $3;

It looks like struct WaitLSNStmt accidentally has 'delay' as a pointer
to int. Perhaps you want an int? Maybe it would be useful to include
the units (millisecond, ms) in the name?

waitlsn.c: In function 'WLDisownLatch':
waitlsn.c:82:2: error: suggest parentheses around assignment used as
truth value [-Werror=parentheses]
if (MyBackendId = state->backend_maxid)
^~

Pretty sure you want == here.

waitlsn.c: In function 'WaitLSNUtility':
waitlsn.c:153:17: error: initialization makes integer from pointer
without a cast [-Werror=int-conversion]
int tdelay = delay;
^~~~~

Another place where I think you wanted an int but used a pointer to
int? To fix that warning you need tdelay = *delay, but I think delay
should really not be taken by pointer at all.

@@ -6922,6 +6923,11 @@ StartupXLOG(void)
+ /*
+ * After update lastReplayedEndRecPtr set Latches in SHMEM array
+ */
+ WaitLSNSetLatch();
+

I think you should try to do this only after commit records are
replayed, not after every record. Only commit records can make
transactions visible, and the raison d'être for this feature is to let
users wait for transactions they know about to become visible. You
probably can't do it directly in xact_redo_commit though, because at
that point XLogCtl->lastReplayedEndRecPtr hasn't been updated yet so a
backend that wakes up might not see that it has advanced and go back
to sleep. It is updated in the StartupXLOG loop after the redo
function runs. That is the reason why WalRcvForceReply() is called
from there rather than in xact_redo_commit, to implement remote_apply
for replication. Perhaps you need something similar?

+ tdelay -= (GetCurrentTimestamp() - timer);

You can't do arithmetic with TimestampTz like this. Depending on
configure option --disable-integer-datetimes (which controls macro
HAVE_INT64_TIMESTAMP), it may be a floating point number of seconds
since the epoch, or an integer number of microseconds since the epoch.
It looks like maybe the above code assumes it works in milliseconds,
since you're using that to adjust your delay which is in milliseconds?

I would try to figure out how to express the logic you want with
TimestampTzPlusMilliseconds and TimestampDifference. I'd probably do
something like compute the absolute timeout time with
TimestampTzPlusMilliseconds(GetCurrentTimestamp(), delay) at the start
and then compute the remaining delay each time through the latch wait
loop with TimestampDifference(GetCurrentTimestamp(), timeout, ...),
though that requires converting (seconds, microseconds) to
milliseconds.
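
The deadline approach described above can be sketched in standalone C. This is only an illustration, not code from the patch: `SketchTimestamp` stands in for `TimestampTz` in an integer-datetimes build (microseconds since the epoch), and the `sketch_*` helpers are hypothetical local imitations of what `TimestampTzPlusMilliseconds` and `TimestampDifference` do, so that the arithmetic can be checked outside the server.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-in for TimestampTz, in microseconds since the epoch. */
typedef int64_t SketchTimestamp;

/* Imitates TimestampTzPlusMilliseconds(): add a millisecond delay. */
static SketchTimestamp
sketch_plus_milliseconds(SketchTimestamp ts, int ms)
{
	assert(ms >= 0);
	return ts + (int64_t) ms * 1000;
}

/*
 * Imitates TimestampDifference(): split (stop - start) into whole seconds
 * and leftover microseconds, returning zeroes if stop is not after start.
 */
static void
sketch_difference(SketchTimestamp start, SketchTimestamp stop,
				  long *secs, int *usecs)
{
	int64_t		diff = stop - start;

	if (diff <= 0)
	{
		*secs = 0;
		*usecs = 0;
	}
	else
	{
		*secs = (long) (diff / 1000000);
		*usecs = (int) (diff % 1000000);
	}
}

/*
 * Remaining wait in milliseconds until the deadline.  Rounds up so that a
 * waiter never sees 0 while part of a millisecond is still left.
 */
static long
sketch_remaining_ms(SketchTimestamp now, SketchTimestamp deadline)
{
	long		secs;
	int			usecs;

	sketch_difference(now, deadline, &secs, &usecs);
	return secs * 1000 + (usecs + 999) / 1000;
}
```

In a wait loop the deadline would be computed once before the loop, and the remaining time recomputed on each iteration and passed as the latch timeout; a result of 0 means the timeout has expired.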

+void
+WaitLSNSetLatch(void)
+{
+ uint i;
+ for (i = 1; i < (state->backend_maxid+1); i++)
+ {
+ SpinLockAcquire(&state->l_arr[i].slock);
+ if (state->l_arr[i].pid != 0)
+ SetLatch(&state->l_arr[i].latch);
+ SpinLockRelease(&state->l_arr[i].slock);
+ }
+}

So your approach here is to let regular backends each have their own
slot indexed by backend ID, which seems good because it means that
they don't have to contend for a lock, but it's bad because it means
that the recovery process has to spin through the array looking for
backends to wake up every time it advances, and wake them all up no
matter whether they are interested in the current LSN or not. That
means that they may get woken up many times before they see the value
they're waiting for.

Did you also consider a design where there would be a wait list/queue,
and the recovery process would wake up only those backends that are
waiting for LSNs <= the current replayed location? That would make
the work for the recovery process cheaper (it just has to pop waiters
from one end of a list sorted by the LSN they're waiting for), and let
the waiting backends sleep without so many spurious wake-ups, but it
would create potential for contention between backends that are
calling WAITLSN at the same time because they all need to add
themselves to that queue which would involve some kind of mutex. I
don't know if that would be better or not, but it's probably the first
way that I would have tried to do this. See syncrep.c which does
something similar.
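
To make the wait-queue idea concrete, here is a minimal standalone sketch. All names (`SketchWaiter`, `sketch_enqueue`, `sketch_wake_up_to`) are hypothetical, the `woken` flag merely stands in for `SetLatch()`, and locking is left out entirely; a real implementation would need a mutex around both operations, as syncrep.c does for its queues.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for XLogRecPtr. */
typedef uint64_t SketchLSN;

typedef struct SketchWaiter
{
	SketchLSN	target;			/* LSN this backend is waiting for */
	int			woken;			/* set when released; stands in for SetLatch() */
	struct SketchWaiter *next;
} SketchWaiter;

typedef struct SketchWaitQueue
{
	SketchWaiter *head;			/* waiter with the smallest target LSN */
} SketchWaitQueue;

/* Insert a waiter, keeping the list ordered by target LSN. */
static void
sketch_enqueue(SketchWaitQueue *q, SketchWaiter *w)
{
	SketchWaiter **link;

	assert(q != NULL && w != NULL);
	w->woken = 0;
	link = &q->head;
	while (*link != NULL && (*link)->target <= w->target)
		link = &(*link)->next;
	w->next = *link;
	*link = w;
}

/*
 * Called by the replaying process: pop and wake every waiter whose target
 * is <= the replayed LSN.  Stops at the first waiter that must keep waiting.
 * Returns the number of waiters released.
 */
static int
sketch_wake_up_to(SketchWaitQueue *q, SketchLSN replayed)
{
	int			released = 0;

	assert(q != NULL);
	while (q->head != NULL && q->head->target <= replayed)
	{
		SketchWaiter *w = q->head;

		q->head = w->next;
		w->next = NULL;
		w->woken = 1;
		released++;
	}
	return released;
}
```

The cost for the replaying process is proportional to the number of waiters it actually releases, while insertion is linear in the queue length, which is where the contention trade-off mentioned above comes from.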

+static void
+WLOwnLatch(void)
+{
+ SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+ OwnLatch(&state->l_arr[MyBackendId].latch);
+ is_latch_owned = true;
+ if (MyBackendId > state->backend_maxid)
+ state->backend_maxid += 1;
+ state->l_arr[MyBackendId].pid = MyProcPid;
+ SpinLockRelease(&state->l_arr[MyBackendId].slock);
+}

I'm a bit confused about state->backend_maxid. It looks like you are
using that to limit the range of slots that WaitLSNSetLatch has to
scan. Isn't it supposed to contain the highest MyBackendId that has
ever been seen? You appear to be incrementing backend_maxid by one,
instead of recording the index of the highest slot in use, but then
WaitLSNSetLatch is using it to restrict the range of indexes. Then
there is the question of the synchronisation of access to
backend_maxid. You hold a spinlock in one arbitrary slot, but that
doesn't seem sufficient: another backend may also read it, compute a
new value and then write it, while holding a different spin lock. Or
am I missing something?
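
One possible way to make the "record the highest slot" update safe without a shared lock is a compare-and-swap loop. The sketch below uses C11 <stdatomic.h> purely as a stand-in (inside PostgreSQL one would reach for port/atomics.h or a dedicated lock instead), and `raise_backend_maxid` is a hypothetical helper. It covers only raising the maximum; lowering it when the highest backend leaves is a separate problem.

```c
#include <assert.h>
#include <stdatomic.h>

/* Highest slot index that has ever been claimed (shared between backends). */
static atomic_int backend_maxid = 0;

/*
 * Raise backend_maxid to my_id if my_id is larger.  If another backend
 * changes the value between our read and our write, the CAS fails, "seen"
 * is refreshed with the current value, and the condition is re-checked.
 */
static void
raise_backend_maxid(int my_id)
{
	int			seen = atomic_load(&backend_maxid);

	assert(my_id >= 0);
	while (seen < my_id &&
		   !atomic_compare_exchange_weak(&backend_maxid, &seen, my_id))
	{
		/* retry with the refreshed value in "seen" */
	}
}
```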

+ if (delay > 0)
+ latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;
+ else
+ latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;

Here you are using delay <= 0 as 'wait forever'. I wonder if it would
be useful to have two different special values: one meaning 'wait
forever', and another meaning 'don't wait at all: if it's not applied
yet, then time out immediately'. In any case I'd consider using names
for special wait times and using those for clarity:
WAITLSN_INFINITE_WAIT, WAITLSN_NO_WAIT.
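
A small sketch of how such named values might read in code. The numeric values (-1 and 0) are assumptions chosen for illustration only, the `EV_*` flags are stand-ins for the real WL_* latch event bits, and `wait_events_for` is a hypothetical helper.

```c
#include <assert.h>

/* Assumed encodings; the review only proposes the names, not the values. */
#define WAITLSN_INFINITE_WAIT	(-1)
#define WAITLSN_NO_WAIT			0

/* Stand-ins for WL_LATCH_SET, WL_TIMEOUT and WL_POSTMASTER_DEATH. */
enum
{
	EV_LATCH_SET = 1,
	EV_TIMEOUT = 2,
	EV_POSTMASTER_DEATH = 4
};

/*
 * Map a requested delay to the set of events to wait on.  A result of 0
 * means "do not wait at all, just report whether the LSN was replayed".
 */
static int
wait_events_for(int delay_ms)
{
	assert(delay_ms >= WAITLSN_INFINITE_WAIT);

	if (delay_ms == WAITLSN_NO_WAIT)
		return 0;
	if (delay_ms == WAITLSN_INFINITE_WAIT)
		return EV_LATCH_SET | EV_POSTMASTER_DEATH;
	return EV_LATCH_SET | EV_TIMEOUT | EV_POSTMASTER_DEATH;
}
```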

Later I'll have feedback on the error messages, documentation and
comments but let's talk just about the design and code for now.

I hope this is helpful!

[1]: /messages/by-id/5642FF8F.4080803@iki.fi

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Thomas Munro (#4)
Re: make async slave to wait for lsn to be replayed

On Thu, Sep 15, 2016 at 2:41 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Sep 1, 2016 at 2:16 AM, Ivan Kartyshov
<i.kartyshov@postgrespro.ru> wrote:

Hi hackers,

Few days earlier I've finished my work on WAITLSN statement utility, so I’d
like to share it.
[...]
Your feedback is welcome!

[waitlsn_10dev.patch]

Thanks for working on this. Here are some general thoughts on the
feature, and an initial review.

Hi Ivan

I'm marking the patch Returned with Feedback, since there hasn't been
any response or a new patch. I encourage you to keep working on this
feature, and I'll be happy to review future patches.

--
Thomas Munro
http://www.enterprisedb.com

#6Ivan Kartyshov
i.kartyshov@postgrespro.ru
In reply to: Thomas Munro (#4)
1 attachment(s)
Re: make async slave to wait for lsn to be replayed

Thank you for the reviews and suggested improvements.
I rewrote the patch to make it more stable.

Changes
=======
I've made a few changes:
1) WAITLSN no longer depends on a snapshot.
2) The current replayed LSN is now checked directly rather than in
xact_redo_commit.
3) Added the syntax WAITLSN_INFINITE '0/693FF800' for an infinite wait and
WAITLSN_NO_WAIT '0/693FF800' to check whether the LSN has been replayed,
as you advised.
4) Reduced the number of loops with GUCs (WalRcvForceReply(), which
doesn't exist in 9.5).
5) Optimized the loop that sets latches.
6) Added two GUCs that let us tune the impact on StartupXLOG:
count_waitlsn (denominator, so that not every LSN is checked)
interval_waitlsn (interval in milliseconds for an additional LSN check)
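
The interaction of the two GUCs can be sketched as a standalone throttle that mirrors the condition added to StartupXLOG in the attached patch: wake waiters on every count_waitlsn-th record, or when at least interval_waitlsn milliseconds have passed since the last wake-up. The `WakeThrottle` struct and `should_wake` helper are hypothetical, and time is passed in as a plain millisecond counter rather than read from GetCurrentTimestamp().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
	int			count_waitlsn;			/* wake on every Nth replayed record */
	int			interval_waitlsn_ms;	/* or after this many milliseconds */
	int			counter;				/* records replayed so far */
	int64_t		last_wake_ms;			/* time of the last wake-up */
} WakeThrottle;

/* Called once per replayed record; returns true if latches should be set. */
static bool
should_wake(WakeThrottle *t, int64_t now_ms)
{
	bool		wake;

	assert(t->count_waitlsn > 0);
	wake = (t->counter % t->count_waitlsn == 0) ||
		(now_ms - t->last_wake_ms >= t->interval_waitlsn_ms);
	if (wake)
		t->last_wake_ms = now_ms;
	t->counter++;
	return wake;
}
```

With larger values of both settings the startup process does less work per record, at the price of waiters noticing a replayed LSN later.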

Feedback
========
On 09/15/2016 05:41 AM, Thomas Munro wrote:

You hold a spinlock in one arbitrary slot, but that
doesn't seem sufficient: another backend may also read it, compute a
new value and then write it, while holding a different spin lock. Or
am I missing something?

We acquire an individual spinlock on each member of the array, so you
cannot compute a new value and write it concurrently.

Tested
======
We have tested it on different servers and OSes, in different cases
and workloads. The new version is nearly as fast as vanilla on the primary
and has only a tiny impact on standby performance.

Hardware:
144 Intel Cores with HT
3TB RAM
all data on ramdisk
primary + hotstandby on the same node.

A dataset was created with the "pgbench -i -s 1000" command. For each round
of the test we pause replay on the standby, run 1000000 transactions on the
primary with pgbench, resume replay on the standby and measure how long the
replication gap takes to disappear under different standby workloads. The
workload was "WAITLSN ('Very/FarLSN', 1000ms timeout)" followed by
"select abalance from pgbench_accounts where aid = random_aid;"
For vanilla, the 1000ms timeout was enforced on the pgbench side with the
-R option.
The waitlsn GUC parameters were adapted for a 1000ms timeout on the standby
with a 35000 tps rate on the primary:
interval_waitlsn = 500 (ms)
count_waitlsn = 30000

With 200 clients, the slave catches up with the master as fast as vanilla,
without significant delay.
With 500 clients, the slave catches up with the master 3% slower than vanilla.
With 1000 clients, 12% slower.
With 5000 clients, 3 times slower, because that load is far above our
hardware's capacity.

How to use it
==========
WAITLSN 'LSN' [, timeout in ms];
WAITLSN_INFINITE 'LSN';
WAITLSN_NO_WAIT 'LSN';

# Wait until LSN 0/303EC60 is replayed, or 10 seconds have passed.
WAITLSN '0/303EC60', 10000;

# Or the same without a timeout.
WAITLSN '0/303EC60';
or
WAITLSN_INFINITE '0/693FF800';

# To check whether an LSN has been replayed:
WAITLSN_NO_WAIT '0/693FF800';

Notice: WAITLSN will release on PostmasterDeath or Interruption events
if they come earlier than the target LSN or timeout.
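
For readers unfamiliar with the 'X/Y' notation used in these examples: an LSN is a 64-bit WAL position printed as two hexadecimal halves, high 32 bits first. The sketch below shows the conversion and why a plain integer comparison such as "target <= replayed" works. `parse_lsn` is a hypothetical helper built on sscanf and is looser than the server's own pg_lsn input function (for example, it tolerates leading whitespace and does not check for overflowing halves).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse "hi/lo" (both hexadecimal) into a 64-bit position.  Returns false
 * if the text does not match or has trailing characters.
 */
static bool
parse_lsn(const char *text, uint64_t *lsn)
{
	unsigned int hi;
	unsigned int lo;
	char		extra;

	assert(text != NULL && lsn != NULL);
	if (sscanf(text, "%X/%X%c", &hi, &lo, &extra) != 2)
		return false;
	*lsn = ((uint64_t) hi << 32) | lo;
	return true;
}
```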

Thank you for reading, will be glad to get your feedback.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

waitlsn_10dev_v3.patchtext/x-patch; name=waitlsn_10dev_v3.patchDownload
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 77667bdebd..72c5390695 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -172,6 +172,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY update             SYSTEM "update.sgml">
 <!ENTITY vacuum             SYSTEM "vacuum.sgml">
 <!ENTITY values             SYSTEM "values.sgml">
+<!ENTITY waitlsn            SYSTEM "waitlsn.sgml">
 
 <!-- applications and utilities -->
 <!ENTITY clusterdb          SYSTEM "clusterdb.sgml">
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 8acdff1393..3733ad960b 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -200,6 +200,7 @@
    &update;
    &vacuum;
    &values;
+   &waitlsn;
 
  </reference>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index aa9ee5a0dd..9696b5dbb5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -39,6 +39,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -143,6 +144,9 @@ const struct config_enum_entry sync_method_options[] = {
 	{NULL, 0, false}
 };
 
+/* GUC variable */
+int				count_waitlsn = 10;
+int				interval_waitlsn = 100;
 
 /*
  * Although only "on", "off", and "always" are documented,
@@ -6781,6 +6785,8 @@ StartupXLOG(void)
 		{
 			ErrorContextCallback errcallback;
 			TimestampTz xtime;
+			TimestampTz			time_waitlsn = GetCurrentTimestamp();
+			int					counter_waitlsn = 0;
 
 			InRedo = true;
 
@@ -6998,6 +7004,17 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * After update lastReplayedEndRecPtr set Latches in SHMEM array
+				 */
+				if (counter_waitlsn % count_waitlsn == 0
+					|| TimestampDifferenceExceeds(time_waitlsn,GetCurrentTimestamp(),interval_waitlsn))
+				{
+					WaitLSNSetLatch();
+					time_waitlsn = GetCurrentTimestamp();
+				}
+				counter_waitlsn++;
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 6b3742c0a0..091cbe22a0 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -20,6 +20,6 @@ OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
 	policy.o portalcmds.o prepare.o proclang.o \
 	schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
 	tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
-	variable.o view.o
+	variable.o view.o waitlsn.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 716f1c3318..9ad3275131 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -139,7 +139,6 @@
 #include "utils/ps_status.h"
 #include "utils/timestamp.h"
 
-
 /*
  * Maximum size of a NOTIFY payload, including terminating NULL.  This
  * must be kept small enough so that a notification message fits on one
diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
new file mode 100644
index 0000000000..f75507ee4e
--- /dev/null
+++ b/src/backend/commands/waitlsn.c
@@ -0,0 +1,242 @@
+/*-------------------------------------------------------------------------
+ *
+ * waitlsn.c
+ *	  WaitLSN statement: WAITLSN
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2016, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/waitlsn.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/*-------------------------------------------------------------------------
+ * Wait for LSN been replayed on slave as of 9.5:
+ * README
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "fmgr.h"
+#include "pgstat.h"
+#include "utils/pg_lsn.h"
+#include "storage/latch.h"
+#include "miscadmin.h"
+#include "storage/spin.h"
+#include "storage/backendid.h"
+#include "access/xact.h"
+#include "storage/shmem.h"
+#include "storage/ipc.h"
+#include "access/xlog_fn.h"
+#include "utils/timestamp.h"
+#include "storage/pmsignal.h"
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "commands/waitlsn.h"
+#include "storage/proc.h"
+#include "access/transam.h"
+#include "funcapi.h"
+#include "catalog/pg_type.h"
+#include "utils/builtins.h"
+
+/* Latches Own-DisownLatch and AbortCallBack */
+static uint32 WaitLSNShmemSize(void);
+static void WLDisownLatchAbort(XactEvent event, void *arg);
+static void WLOwnLatch(void);
+static void WLDisownLatch(void);
+
+void		_PG_init(void);
+
+/* Shared memory structures */
+typedef struct
+{
+	int					pid;
+	volatile slock_t	slock;
+	Latch				latch;
+} BIDLatch;
+
+typedef struct
+{
+	char		dummy;
+	int			backend_maxid;
+	BIDLatch	l_arr[FLEXIBLE_ARRAY_MEMBER];
+} GlobState;
+
+static volatile GlobState  *state;
+bool						is_latch_owned = false;
+
+/* Take Latch for current backend at the beginning of WAITLSN */
+static void
+WLOwnLatch(void)
+{
+	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+	OwnLatch(&state->l_arr[MyBackendId].latch);
+	is_latch_owned = true;
+
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->l_arr[MyBackendId].pid = MyProcPid;
+	SpinLockRelease(&state->l_arr[MyBackendId].slock);
+}
+
+/* Release Latch for current backend at the end of WAITLSN */
+static void
+WLDisownLatch(void)
+{
+	int i;
+	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+	DisownLatch(&state->l_arr[MyBackendId].latch);
+	is_latch_owned = false;
+	state->l_arr[MyBackendId].pid = 0;
+
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MaxConnections+1); i >=0; i--)
+			if (state->l_arr[i].pid != 0)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->l_arr[MyBackendId].slock);
+}
+
+/* CallBack function on abort*/
+static void
+WLDisownLatchAbort(XactEvent event, void *arg)
+{
+	if (is_latch_owned && (event == XACT_EVENT_PARALLEL_ABORT ||
+						   event == XACT_EVENT_ABORT))
+	{
+		WLDisownLatch();
+	}
+}
+
+/* Module load callback */
+void
+_PG_init(void)
+{
+	if (!IsUnderPostmaster)
+		RegisterXactCallback(WLDisownLatchAbort, NULL);
+}
+
+/* Get size of shared memory to room GlobState */
+static uint32
+WaitLSNShmemSize(void)
+{
+	return offsetof(GlobState, l_arr) + sizeof(BIDLatch) * (MaxConnections+1);
+}
+
+/* Init array of Latches in shared memory */
+void
+WaitLSNShmemInit(void)
+{
+	bool	found;
+	uint	i;
+
+	state = (GlobState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitLSNShmemSize(),
+										  &found);
+	if (!found)
+	{
+		for (i = 0; i < (MaxConnections+1); i++)
+		{
+			state->l_arr[i].pid = 0;
+			SpinLockInit(&state->l_arr[i].slock);
+			InitSharedLatch(&state->l_arr[i].latch);
+		}
+		state->backend_maxid = 0;
+	}
+}
+
+/* Set all Latches in shared memory because a new LSN has been replayed */
+void
+WaitLSNSetLatch(void)
+{
+	uint i;
+	for (i = 0; i <= state->backend_maxid; i++)
+	{
+		SpinLockAcquire(&state->l_arr[i].slock);
+		if (state->l_arr[i].pid != 0)
+			SetLatch(&state->l_arr[i].latch);
+		SpinLockRelease(&state->l_arr[i].slock);
+	}
+}
+
+/*
+ * On WAITLSN own latch and wait till LSN is replayed, Postmaster death, interruption
+ * or timeout.
+ */
+void
+WaitLSNUtility(const char *lsn, const int delay, DestReceiver *dest)
+{
+	XLogRecPtr		trg_lsn = DatumGetLSN(DirectFunctionCall1(pg_lsn_in, CStringGetDatum(lsn)));
+	XLogRecPtr		cur_lsn;
+	int				latch_events;
+	uint64_t	tdelay = delay;
+	long			secs;
+	int				microsecs;
+	TimestampTz		timer = GetCurrentTimestamp();
+	TupOutputState	*tstate;
+	TupleDesc		tupdesc;
+	char		   *value = "false";
+
+	if (delay > 0)
+		latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;
+	else
+		latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;
+
+	WLOwnLatch();
+
+	for (;;)
+	{
+		cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN had been Replayed */
+		if (trg_lsn <= cur_lsn)
+			break;
+
+		/* If the postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		/* If Delay time is over */
+		if (latch_events & WL_TIMEOUT)
+		{
+			if (TimestampDifferenceExceeds(timer,GetCurrentTimestamp(),delay))
+				break;
+			TimestampDifference(timer,GetCurrentTimestamp(),&secs, &microsecs);
+			tdelay = delay - (secs*1000 + microsecs/1000);
+		}
+
+		MyPgXact->xmin = InvalidTransactionId;
+		WaitLatch(&state->l_arr[MyBackendId].latch, latch_events, tdelay, WAIT_EVENT_CLIENT_READ);
+		ResetLatch(&state->l_arr[MyBackendId].latch);
+
+		/* CHECK_FOR_INTERRUPTS if they comes then disown latch current */
+		if (InterruptPending)
+		{
+			WLDisownLatch();
+			ProcessInterrupts();
+		}
+
+	}
+
+	WLDisownLatch();
+
+	if (trg_lsn > cur_lsn)
+		elog(NOTICE,"LSN is not reached. Try to make bigger delay.");
+	else
+		value = "true";
+
+	/* need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1, false);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+	/* prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc);
+	/* Send it */
+	do_text_output_oneline(tstate, value);
+	end_tup_output(tstate);
+}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 2ed7b5259d..9b54d75ac6 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -270,7 +270,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		DeallocateStmt PrepareStmt ExecuteStmt
 		DropOwnedStmt ReassignOwnedStmt
 		AlterTSConfigurationStmt AlterTSDictionaryStmt
-		CreateMatViewStmt RefreshMatViewStmt CreateAmStmt
+		CreateMatViewStmt RefreshMatViewStmt CreateAmStmt WaitLSNStmt
 
 %type <node>	select_no_parens select_with_parens select_clause
 				simple_select values_clause
@@ -309,7 +309,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <list>	OptSchemaEltList
 
 %type <boolean> TriggerForSpec TriggerForType
-%type <ival>	TriggerActionTime
+%type <ival>	TriggerActionTime WaitDelay
 %type <list>	TriggerEvents TriggerOneEvent
 %type <value>	TriggerFuncArg
 %type <node>	TriggerWhen
@@ -663,7 +663,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAITLSN WAITLSN_INFINITE WAITLSN_NO_WAIT WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLPARSE
 	XMLPI XMLROOT XMLSERIALIZE
@@ -901,6 +902,7 @@ stmt :
 			| VariableSetStmt
 			| VariableShowStmt
 			| ViewStmt
+			| WaitLSNStmt
 			| /*EMPTY*/
 				{ $$ = NULL; }
 		;
@@ -13235,7 +13237,41 @@ frame_bound:
 				}
 		;
 
+/*****************************************************************************
+ *
+ *		QUERY:
+ *				WAITLSN <LSN> can appear as a query-level command
+ *
+ *
+ *****************************************************************************/
 
+WaitLSNStmt:
+			WAITLSN Sconst WaitDelay
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = $3;
+					$$ = (Node *)n;
+				}
+			| WAITLSN_INFINITE Sconst
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = 0;
+					$$ = (Node *)n;
+				}
+			| WAITLSN_NO_WAIT Sconst
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = 1;
+					$$ = (Node *)n;
+				}
+		;
+WaitDelay:
+			',' Iconst							{ $$ = $2; }
+			| /*EMPTY*/							{ $$ = 0; }
+		;
 /*
  * Supporting nonterminals for expressions.
  */
@@ -14266,6 +14302,9 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAITLSN
+			| WAITLSN_INFINITE
+			| WAITLSN_NO_WAIT
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 29febb46c4..a80a003b57 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -268,6 +269,11 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	AsyncShmemInit();
 	BackendRandomShmemInit();
 
+	/*
+	 * Init array of Latches  in SHMEM for WAITLSN
+	 */
+	WaitLSNShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index fd4eff4907..e2746282a6 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -54,6 +54,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -907,6 +908,20 @@ standard_ProcessUtility(Node *parsetree,
 				break;
 			}
 
+		case T_WaitLSNStmt:
+			{
+				WaitLSNStmt *stmt = (WaitLSNStmt *) parsetree;
+				if (!RecoveryInProgress())
+				{
+					ereport(ERROR,(errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION), 
+							errmsg("cannot execute %s not during recovery",
+							"WaitLSN")));
+				}
+				else
+					WaitLSNUtility(stmt->lsn, stmt->delay, dest);
+			}
+			break;
+
 		default:
 			/* All other statement types have event trigger support */
 			ProcessUtilitySlow(pstate, parsetree, queryString,
@@ -2371,6 +2386,10 @@ CreateCommandTag(Node *parsetree)
 			tag = "NOTIFY";
 			break;
 
+		case T_WaitLSNStmt:
+			tag = "WAITLSN";
+			break;
+
 		case T_ListenStmt:
 			tag = "LISTEN";
 			break;
@@ -2963,6 +2982,10 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
+		case T_WaitLSNStmt:
+			lev = LOGSTMT_ALL;
+			break;
+
 		case T_ListenStmt:
 			lev = LOGSTMT_ALL;
 			break;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a02511754e..6021389ee0 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2825,6 +2825,30 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"interval_waitlsn", PGC_SUSET, DEVELOPER_OPTIONS,
+			gettext_noop("Set interval of time (ms) how often LSN will be checked."),
+			gettext_noop("Set interval of time (ms) how often LSN will be checked to "
+						 "make less influence on StartupXLOG() process."),
+			GUC_UNIT_MS
+		},
+		&interval_waitlsn,
+		100, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"count_waitlsn", PGC_SUSET, DEVELOPER_OPTIONS,
+			gettext_noop("How often LSN will be checked."),
+			gettext_noop("Set count of LSNs that will be passed before LSN check to "
+						 "make less influence on StartupXLOG() process."),
+			GUC_NOT_IN_SAMPLE
+		},
+		&count_waitlsn,
+		10, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index c9f332c908..f8cb00b214 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -109,6 +109,9 @@ extern bool log_checkpoints;
 
 extern int	CheckPointSegments;
 
+extern int interval_waitlsn;
+extern int count_waitlsn;
+
 /* Archive modes */
 typedef enum ArchiveMode
 {
diff --git a/src/include/commands/waitlsn.h b/src/include/commands/waitlsn.h
new file mode 100644
index 0000000000..2e3960881e
--- /dev/null
+++ b/src/include/commands/waitlsn.h
@@ -0,0 +1,21 @@
+/*-------------------------------------------------------------------------
+ *
+ * waitlsn.h
+ *	  WaitLSN notification: WAITLSN
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2016, Regents of PostgresPRO
+ *
+ * src/include/commands/waitlsn.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAITLSN_H
+#define WAITLSN_H
+#include "tcop/dest.h"
+
+extern void WaitLSNUtility(const char *lsn, const int delay, DestReceiver *dest);
+extern void WaitLSNShmemInit(void);
+extern void WaitLSNSetLatch(void);
+
+#endif   /* WAITLSN_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index c514d3fc93..ecacf80576 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -469,6 +469,7 @@ typedef enum NodeTag
 	T_DropReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
+	T_WaitLSNStmt,
 
 	/*
 	 * TAGS FOR RANDOM OTHER STUFF
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index fc532fbd43..e8ef4fe67b 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3213,4 +3213,16 @@ typedef struct AlterTSConfigurationStmt
 	bool		missing_ok;		/* for DROP - skip error if missing? */
 } AlterTSConfigurationStmt;
 
+/* ----------------------
+ *		WaitLSN Statement
+ * ----------------------
+ */
+typedef struct WaitLSNStmt
+{
+	NodeTag		type;
+	char	   *lsn;			/* Target LSN to wait for */
+	int			delay;			/* Delay to wait for LSN*/
+	bool		nowait;			/* No wait for LSN just result*/
+} WaitLSNStmt;
+
 #endif   /* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 581ff6eedb..0f41ae906c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -427,6 +427,9 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("waitlsn", WAITLSN, UNRESERVED_KEYWORD)
+PG_KEYWORD("waitlsn_infinite", WAITLSN_INFINITE, UNRESERVED_KEYWORD)
+PG_KEYWORD("waitlsn_no_wait", WAITLSN_NO_WAIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
#7Thom Brown
thom@linux.com
In reply to: Ivan Kartyshov (#6)
Re: make async slave to wait for lsn to be replayed

On 23 January 2017 at 11:56, Ivan Kartyshov <i.kartyshov@postgrespro.ru> wrote:

Thank you for reading, will be glad to get your feedback.

Could you please rebase your patch as it no longer applies cleanly.

Thanks

Thom

#8Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Thom Brown (#7)
Re: make async slave to wait for lsn to be replayed

On Thu, Feb 23, 2017 at 3:08 AM, Thom Brown <thom@linux.com> wrote:

On 23 January 2017 at 11:56, Ivan Kartyshov <i.kartyshov@postgrespro.ru> wrote:

Thank you for reading, will be glad to get your feedback.

Could you please rebase your patch as it no longer applies cleanly.

+1

--
Thomas Munro
http://www.enterprisedb.com

#9David Steele
david@pgmasters.net
In reply to: Thomas Munro (#8)
Re: make async slave to wait for lsn to be replayed

Hi Ivan,

On 2/27/17 3:52 PM, Thomas Munro wrote:

On Thu, Feb 23, 2017 at 3:08 AM, Thom Brown <thom@linux.com> wrote:

On 23 January 2017 at 11:56, Ivan Kartyshov <i.kartyshov@postgrespro.ru> wrote:

Thank you for reading, will be glad to get your feedback.

Could you please rebase your patch as it no longer applies cleanly.

+1

Please provide a rebased patch as soon as possible.

Thanks,
--
-David
david@pgmasters.net

#10Ivan Kartyshov
i.kartyshov@postgrespro.ru
In reply to: David Steele (#9)
1 attachment(s)
Re: make async slave to wait for lsn to be replayed

Rebase done.

Meanwhile I made some more changes.

Changes
=======
1) WAITLSN is now implemented as an extension called "pg_waitlsn"

2) A new hook, "lsn_updated_hook", is called right after xact_redo_commit
(xlog.c)

3) Corresponding functions:
pg_waitlsn('0/693FF800', 10000) - wait up to 10 seconds
pg_waitlsn_infinite('0/693FF800') - wait indefinitely
pg_waitlsn_no_wait('0/693FF800') - check once whether the LSN has been
replayed.

4) Added two GUCs that help tune the impact on StartupXLOG:
count_waitlsn (denominator, so that not every LSN is checked)
int count_waitlsn = 10;

interval_waitlsn (interval in milliseconds for an additional LSN check)
int interval_waitlsn = 100;

5) Optimized the loop that sets latches.
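
The hook in change 2 follows the usual save-and-chain pattern for extension hooks. The standalone sketch below imitates that pattern outside the server: `lsn_updated_hook`, `prev_lsn_updated_hook` and `wl_lsn_updated_hook` are names from the attached patch, while `module_init`, `core_after_replay` and the `wake_calls` counter are hypothetical stand-ins for _PG_init(), the call site in xlog.c, and the latch-setting work.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*lsn_updated_hook_type) (void);

/* Owned by the core side; NULL while no module has installed a hook. */
static lsn_updated_hook_type lsn_updated_hook = NULL;

/* Module side: remember whatever hook was installed before ours. */
static lsn_updated_hook_type prev_lsn_updated_hook = NULL;
static int	wake_calls = 0;

static void
wl_lsn_updated_hook(void)
{
	wake_calls++;				/* the real module would set latches here */
	if (prev_lsn_updated_hook)
		prev_lsn_updated_hook();	/* keep earlier hooks working */
}

/* Stand-in for _PG_init(): save the previous hook, then install ours. */
static void
module_init(void)
{
	prev_lsn_updated_hook = lsn_updated_hook;
	lsn_updated_hook = wl_lsn_updated_hook;
}

/* Stand-in for the call site in the redo loop. */
static void
core_after_replay(void)
{
	if (lsn_updated_hook)
		lsn_updated_hook();
}
```

Because the core only tests a function pointer, servers that do not load the module pay almost nothing for the hook.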

How to use it
==========
Master:
1) Set "wal_level = replica"
Slave:
2) Add shared_preload_libraries = 'pg_waitlsn' and
hot_standby = on (in postgresql.conf)
3) Create extension pg_waitlsn;
4) On the hot standby you can then wait for an LSN (pgsleep); once the LSN
has been replayed on the slave, pg_waitlsn will release

select pg_waitlsn('LSN' [, timeout in ms]);
select pg_waitlsn_infinite('LSN');
select pg_waitlsn_no_wait('LSN');

# Wait until LSN 0/303EC60 is replayed, or 10 seconds have passed.
select pg_waitlsn('0/303EC60', 10000);

# Or the same without a timeout.
select pg_waitlsn('0/303EC60');
select pg_waitlsn_infinite('0/693FF800');

# To check whether an LSN has been replayed:
select pg_waitlsn_no_wait('0/693FF800');

Notice: select pg_waitlsn will release on PostmasterDeath or
Interruption events if they come earlier than the target LSN or timeout.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

pg_waitlsn10_v4.patchtext/x-patch; name=pg_waitlsn10_v4.patchDownload
diff --git a/contrib/pg_waitlsn/Makefile b/contrib/pg_waitlsn/Makefile
new file mode 100644
index 0000000..49a326c
--- /dev/null
+++ b/contrib/pg_waitlsn/Makefile
@@ -0,0 +1,21 @@
+# pg_waitlsn/Makefile
+
+MODULE_big = pg_waitlsn
+OBJS = pg_waitlsn.o
+EXTENSION = pg_waitlsn
+DATA = pg_waitlsn--1.0.sql
+
+
+
+ifdef USE_PGXS
+
+	PG_CONFIG = pg_config
+	PGXS := $(shell $(PG_CONFIG) --pgxs)
+	include $(PGXS)
+else
+
+	subdir = contrib/pg_waitlsn
+	top_builddir = ../..
+	include $(top_builddir)/src/Makefile.global
+	include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_waitlsn/pg_waitlsn--1.0.sql b/contrib/pg_waitlsn/pg_waitlsn--1.0.sql
new file mode 100644
index 0000000..8b251f3
--- /dev/null
+++ b/contrib/pg_waitlsn/pg_waitlsn--1.0.sql
@@ -0,0 +1,19 @@
+/* contrib/pg_waitlsn/pg_waitlsn--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_waitlsn" to wait for the target LSN to be replayed; delay for waiting is in milliseconds (default infinity) \quit
+
+CREATE FUNCTION pg_waitlsn(lsn pg_lsn, delay int default 0)
+RETURNS bool
+AS 'MODULE_PATHNAME', 'pg_waitlsn'
+LANGUAGE C VOLATILE STRICT;
+
+CREATE FUNCTION pg_waitlsn_infinite(lsn pg_lsn)
+RETURNS bool
+AS 'MODULE_PATHNAME', 'pg_waitlsn_infinite'
+LANGUAGE C VOLATILE STRICT;
+
+CREATE FUNCTION pg_waitlsn_no_wait(lsn pg_lsn)
+RETURNS bool
+AS 'MODULE_PATHNAME', 'pg_waitlsn_no_wait'
+LANGUAGE C VOLATILE STRICT;
diff --git a/contrib/pg_waitlsn/pg_waitlsn.c b/contrib/pg_waitlsn/pg_waitlsn.c
new file mode 100644
index 0000000..d210678
--- /dev/null
+++ b/contrib/pg_waitlsn/pg_waitlsn.c
@@ -0,0 +1,299 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_waitlsn
+ *
+ * Portions Copyright (c) 2012-2017, PostgresPro Global Development Group
+ *
+ * IDENTIFICATION
+ *		  contrib/pg_waitlsn/pg_waitlsn.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "fmgr.h"
+#include "pgstat.h"
+#include "access/xlog.h"
+#include "utils/pg_lsn.h"
+#include "storage/latch.h"
+#include "miscadmin.h"
+#include "storage/spin.h"
+#include "storage/backendid.h"
+#include "access/xact.h"
+#include "storage/shmem.h"
+#include "storage/ipc.h"
+#include "utils/timestamp.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "access/transam.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+static bool pg_waitlsn_internal(XLogRecPtr lsn, uint64_t delay);
+
+/* Hooks values */
+static lsn_updated_hook_type prev_lsn_updated_hook = NULL;
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static void wl_lsn_updated_hook(void);
+static uint32 estimate_shmem_size(void);
+
+/* Latch own/disown helpers and abort callback */
+static void disown_latches_on_abort(XactEvent event, void *arg);
+static void wl_own_latch(void);
+static void wl_disown_latch(void);
+
+/* GUC variable */
+int				count_waitlsn = 10;
+int				interval_waitlsn = 100;
+
+/* Globals */
+TimestampTz		time_waitlsn = 0;
+int				counter_waitlsn = 0;
+
+void			_PG_init(void);
+
+/* Shared memory structures */
+typedef struct
+{
+	int					pid;
+	volatile slock_t	slock;
+	Latch				latch;
+} BIDLatch;
+
+typedef struct
+{
+	char		dummy;
+	int			backend_maxid;
+	BIDLatch	l_arr[FLEXIBLE_ARRAY_MEMBER];
+} GlobState;
+
+static volatile GlobState  *state;
+bool						is_latch_owned = false;
+
+static uint32
+estimate_shmem_size(void)
+{
+	return offsetof(GlobState, l_arr) + sizeof(BIDLatch) * (MaxConnections+1);
+}
+
+static void
+wl_own_latch(void)
+{
+	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+	OwnLatch(&state->l_arr[MyBackendId].latch);
+	is_latch_owned = true;
+
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->l_arr[MyBackendId].pid = MyProcPid;
+	SpinLockRelease(&state->l_arr[MyBackendId].slock);
+}
+
+static void
+wl_disown_latch(void)
+{
+	int		i;
+	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+	DisownLatch(&state->l_arr[MyBackendId].latch);
+	is_latch_owned = false;
+	state->l_arr[MyBackendId].pid = 0;
+
+	if (state->backend_maxid == MyBackendId)
+		for (i = MaxConnections; i >= 0; i--)
+			if (state->l_arr[i].pid != 0)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->l_arr[MyBackendId].slock);
+}
+
+/* CallBack function */
+static void
+disown_latches_on_abort(XactEvent event, void *arg)
+{
+	if (is_latch_owned && (event == XACT_EVENT_PARALLEL_ABORT ||
+						   event == XACT_EVENT_ABORT))
+	{
+		wl_disown_latch();
+	}
+}
+
+/*
+ * Allocate shared memory and initialize spinlocks and latches.
+ */
+static void
+qs_shmem_startup(void)
+{
+	bool	found;
+	uint	i;
+
+	state = (GlobState *) ShmemInitStruct("pg_wait_lsn",
+										  estimate_shmem_size(),
+										  &found);
+	if (!found)
+	{
+		for (i = 0; i < (MaxConnections+1); i++)
+		{
+			state->l_arr[i].pid = 0;
+			SpinLockInit(&state->l_arr[i].slock);
+			InitSharedLatch(&state->l_arr[i].latch);
+		}
+	}
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+}
+
+/* Module load callback */
+void
+_PG_init(void)
+{
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	time_waitlsn = GetCurrentTimestamp();
+
+	RequestAddinShmemSpace(estimate_shmem_size());
+
+	/* Define interval_waitlsn */
+	DefineCustomIntVariable(
+		"interval_waitlsn",
+
+		"Set interval of time (ms) how often LSN will be checked.",
+
+		"Set interval of time (ms) how often LSN will be checked to "
+		"make less influence on StartupXLOG() process.",
+		&interval_waitlsn,
+		100, 0, INT_MAX,
+		PGC_SUSET,
+		GUC_UNIT_MS,
+		NULL, NULL, NULL);
+
+	/* Define count_waitlsn */
+	DefineCustomIntVariable(
+		"count_waitlsn",
+
+		"How often LSN will be checked.",
+
+		"Set count of LSNs that will be passed before LSN check to "
+		"make less influence on StartupXLOG() process.",
+		&count_waitlsn,
+		10, 1, INT_MAX,
+		PGC_SUSET,
+		GUC_NOT_IN_SAMPLE,
+		NULL, NULL, NULL);
+
+	prev_lsn_updated_hook = lsn_updated_hook;
+	lsn_updated_hook = wl_lsn_updated_hook;
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = qs_shmem_startup;
+
+	if (!IsUnderPostmaster)
+		RegisterXactCallback(disown_latches_on_abort, NULL);
+}
+
+/* Hook function */
+static void
+wl_lsn_updated_hook(void)
+{
+	uint i;
+	/*
+	 * After update lastReplayedEndRecPtr set Latches in SHMEM array
+	 */
+	if (counter_waitlsn % count_waitlsn == 0
+		|| TimestampDifferenceExceeds(time_waitlsn,GetCurrentTimestamp(),interval_waitlsn))
+	{
+		for (i = 0; i <= state->backend_maxid; i++)
+		{
+			SpinLockAcquire(&state->l_arr[i].slock);
+			if (state->l_arr[i].pid != 0)
+				SetLatch(&state->l_arr[i].latch);
+			SpinLockRelease(&state->l_arr[i].slock);
+		}
+		elog(DEBUG2,"WAITLSN  - %d / %s", counter_waitlsn, timestamptz_to_str(GetCurrentTimestamp()));
+		time_waitlsn = GetCurrentTimestamp();
+	}
+	counter_waitlsn++;
+}
+
+PG_FUNCTION_INFO_V1( pg_waitlsn );
+PG_FUNCTION_INFO_V1( pg_waitlsn_infinite );
+PG_FUNCTION_INFO_V1( pg_waitlsn_no_wait );
+
+
+Datum
+pg_waitlsn(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr		trg_lsn = PG_GETARG_LSN(0);
+	uint64_t		delay = PG_GETARG_INT32(1);
+
+	PG_RETURN_BOOL(pg_waitlsn_internal(trg_lsn, delay));
+}
+
+Datum
+pg_waitlsn_infinite(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr		trg_lsn = PG_GETARG_LSN(0);
+
+	PG_RETURN_BOOL(pg_waitlsn_internal(trg_lsn, 0));
+}
+
+Datum
+pg_waitlsn_no_wait(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr		trg_lsn = PG_GETARG_LSN(0);
+
+	PG_RETURN_BOOL(pg_waitlsn_internal(trg_lsn, 1));
+}
+
+static bool
+pg_waitlsn_internal(XLogRecPtr trg_lsn, uint64_t delay)
+{
+	XLogRecPtr		cur_lsn = GetXLogReplayRecPtr(NULL);
+	int				latch_events;
+	uint64_t		tdelay = delay;
+	long			secs;
+	int				microsecs;
+	TimestampTz		timer = GetCurrentTimestamp();
+
+	if (delay > 0)
+		latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;
+	else
+		latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;
+
+	wl_own_latch();
+	for (;;)
+	{
+		ResetLatch(&state->l_arr[MyBackendId].latch);
+		cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN had been Replayed */
+		if (trg_lsn <= cur_lsn)
+			break;
+
+		/* If the postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		/* If Delay time is over */
+		if (latch_events & WL_TIMEOUT)
+		{
+			if (TimestampDifferenceExceeds(timer,GetCurrentTimestamp(),delay))
+				break;
+			TimestampDifference(timer,GetCurrentTimestamp(),&secs, &microsecs);
+			tdelay = delay - (secs*1000 + microsecs/1000);
+		}
+
+		elog(DEBUG2,"WAITLSN  %x", MyPgXact->xmin);
+		MyPgXact->xmin = InvalidTransactionId;
+		WaitLatch(&state->l_arr[MyBackendId].latch, latch_events, tdelay, WAIT_EVENT_CLIENT_READ);
+		CHECK_FOR_INTERRUPTS();
+	}
+	wl_disown_latch();
+
+	return trg_lsn <= GetXLogReplayRecPtr(NULL);
+}
diff --git a/contrib/pg_waitlsn/pg_waitlsn.control b/contrib/pg_waitlsn/pg_waitlsn.control
new file mode 100644
index 0000000..7be85d6
--- /dev/null
+++ b/contrib/pg_waitlsn/pg_waitlsn.control
@@ -0,0 +1,5 @@
+# pg_waitlsn extension
+comment = 'target LSN waiter for slave replica'
+default_version = '1.0'
+module_pathname = '$libdir/pg_waitlsn'
+relocatable = true
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2dcff7f..c6018e5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -832,6 +832,9 @@ static bool holdingAllLocks = false;
 static MemoryContext walDebugCxt = NULL;
 #endif
 
+/* Hook after xlogreader replayed lsn */
+lsn_updated_hook_type lsn_updated_hook = NULL;
+
 static void readRecoveryCommandFile(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
@@ -7174,6 +7177,12 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * Hook after update lastReplayedEndRecPtr
+				 */
+				if (lsn_updated_hook != NULL)
+					lsn_updated_hook();
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
 			} while (record != NULL);
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 9f036c7..175023c 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -287,6 +287,12 @@ extern void assign_max_wal_size(int newval, void *extra);
 extern void assign_checkpoint_completion_target(double newval, void *extra);
 
 /*
+ * Hook after xlogreader replayed lsn
+ */
+typedef void (*lsn_updated_hook_type) (void);
+extern PGDLLIMPORT lsn_updated_hook_type lsn_updated_hook;
+
+/*
  * Starting/stopping a base backup
  */
 extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
#11Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Ivan Kartyshov (#10)
Re: make async slave to wait for lsn to be replayed

On Tue, Mar 7, 2017 at 8:48 PM, Ivan Kartyshov
<i.kartyshov@postgrespro.ru> wrote:

Rebase done.

Thank you for updating the patch.

Meanwhile I made some more changes.

Changes
=======
1) WAITLSN is now implemented as an extension called "pg_waitlsn"

I've read the discussion so far but I didn't see the reason why you've
changed it to as a contrib module. Could you tell me about that? I
guess this feature would be more useful if provided as a core feature
and we need to discuss about syntax as Thomas mentioned.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Masahiko Sawada (#11)
Re: make async slave to wait for lsn to be replayed

On Wed, Mar 8, 2017 at 1:58 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Mar 7, 2017 at 8:48 PM, Ivan Kartyshov
<i.kartyshov@postgrespro.ru> wrote:

Rebase done.

Thank you for updating the patch.

Meanwhile I made some more changes.

Changes
=======
1) WAITLSN is now implemented as an extension called "pg_waitlsn"

I've read the discussion so far but I didn't see the reason why you've
changed it to as a contrib module. Could you tell me about that? I
guess this feature would be more useful if provided as a core feature
and we need to discuss about syntax as Thomas mentioned.

The problem with using functions like pg_waitlsn('LSN' [, timeout in
ms]) instead of new syntax for transaction starting commands like
BEGIN TRANSACTION ... WAIT FOR ... is that it doesn't work for the
higher isolation levels. In READ COMMITTED it's fine, because every
statement runs with its own snapshot, so when SELECT
pg_waitlsn(some_lsn) returns, the next statement will run with a
snapshot that can see the effects of some_lsn being applied. But in
REPEATABLE READ and SERIALIZABLE, even though pg_waitlsn(some_lsn)
waits for the LSN to be applied, the next statement will run with the
snapshot from before and never see the transaction you were waiting
for.
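The difference between the isolation levels can be illustrated with a deliberately simplified toy model in standalone C. This is only a sketch: real PostgreSQL snapshots are built from transaction IDs, not LSNs, and the names ToyTxn and toy_statement_snapshot are hypothetical. The model only captures when a snapshot is taken: per statement in READ COMMITTED, once per transaction in REPEATABLE READ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy isolation levels; SERIALIZABLE behaves like REPEATABLE_READ here. */
typedef enum
{
	READ_COMMITTED,
	REPEATABLE_READ
} IsoLevel;

/* Hypothetical transaction state for the toy model. */
typedef struct
{
	IsoLevel	level;
	bool		have_snapshot;
	uint64_t	snapshot_lsn;	/* toy: the snapshot sees WAL replayed up to here */
} ToyTxn;

/*
 * Return the snapshot a statement runs with, given the replay position at
 * the moment the statement starts.  READ COMMITTED takes a fresh snapshot
 * for every statement; REPEATABLE READ keeps the first one it took.
 */
static uint64_t
toy_statement_snapshot(ToyTxn *txn, uint64_t replayed_lsn)
{
	if (txn->level == READ_COMMITTED || !txn->have_snapshot)
	{
		txn->snapshot_lsn = replayed_lsn;
		txn->have_snapshot = true;
	}
	return txn->snapshot_lsn;
}
```

In this model, if the waiting statement starts while replay is at position 100 and returns once replay has reached 200, the following statement sees 200 under READ_COMMITTED but still sees 100 under REPEATABLE_READ, which is the behaviour described above.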

--
Thomas Munro
http://www.enterprisedb.com


#13Ivan Kartyshov
i.kartyshov@postgrespro.ru
In reply to: Masahiko Sawada (#11)
1 attachment(s)
Re: make async slave to wait for lsn to be replayed

Hello

On 07.03.2017 15:58, Masahiko Sawada wrote:

I've read the discussion so far but I didn't see the reason why you've
changed it to as a contrib module. Could you tell me about that?

I did it at the request of our customer, who preferred an
implementation in the form of a contrib module. The contrib
implementation of WAITLSN adds only a single hook to core.
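The hook approach can be sketched in standalone C as follows. The type and variable names mirror the ones in the patch (lsn_updated_hook_type, lsn_updated_hook), but everything here is a stand-in: replay_records is a hypothetical substitute for the redo loop, and the counter replaces the latch wake-ups. Calling the previously installed hook is the customary convention for PostgreSQL hooks and is shown here as an assumption about how such a hook would normally be chained.

```c
#include <stddef.h>

/* Stand-in for the hook type that the patch adds to xlog.h. */
typedef void (*lsn_updated_hook_type) (void);

/* Stand-in for the core-side variable; NULL means no extension is installed. */
static lsn_updated_hook_type lsn_updated_hook = NULL;

/* Hypothetical extension-side state. */
static lsn_updated_hook_type prev_lsn_updated_hook = NULL;
static int	wakeups = 0;

/* Extension hook: a real extension would set the waiters' latches here. */
static void
my_lsn_updated_hook(void)
{
	wakeups++;

	/* Chain to whichever hook was installed before this one. */
	if (prev_lsn_updated_hook != NULL)
		prev_lsn_updated_hook();
}

/* What the extension's load callback would do: save, then replace. */
static void
install_hook(void)
{
	prev_lsn_updated_hook = lsn_updated_hook;
	lsn_updated_hook = my_lsn_updated_hook;
}

/* Stand-in for the redo loop: call the hook after each replayed record. */
static void
replay_records(int nrecords)
{
	int			i;

	for (i = 0; i < nrecords; i++)
	{
		if (lsn_updated_hook != NULL)
			lsn_updated_hook();
	}
}
```

Before install_hook() runs, replaying records touches nothing in the extension; afterwards each replayed record reaches my_lsn_updated_hook() exactly once, so core only needs the function pointer and the single call site.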

On 07.03.2017 15:58, Masahiko Sawada wrote:

I guess this feature would be more useful if provided as a core
feature and we need to discuss about syntax as Thomas mentioned.

Here I have attached the rebased patch waitlsn_10dev_v3 (core feature).
I will leave the choice of implementation (core or contrib) to the
discretion of the community.

I will be glad to hear your suggestions about the syntax, the code and the patch.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

waitlsn_10dev_v3_rebased.patch (text/x-patch)
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 2bc4d9f..6d5a81e 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -178,6 +178,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY update             SYSTEM "update.sgml">
 <!ENTITY vacuum             SYSTEM "vacuum.sgml">
 <!ENTITY values             SYSTEM "values.sgml">
+<!ENTITY waitlsn            SYSTEM "waitlsn.sgml">
 
 <!-- applications and utilities -->
 <!ENTITY clusterdb          SYSTEM "clusterdb.sgml">
diff --git a/doc/src/sgml/ref/waitlsn.sgml b/doc/src/sgml/ref/waitlsn.sgml
new file mode 100644
index 0000000..338187b
--- /dev/null
+++ b/doc/src/sgml/ref/waitlsn.sgml
@@ -0,0 +1,134 @@
+<!--
+doc/src/sgml/ref/waitlsn.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-WAITLSN">
+ <indexterm zone="sql-waitlsn">
+  <primary>WAITLSN</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>WAITLSN</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>WAITLSN</refname>
+  <refpurpose>wait until target <acronym>LSN</> has been replayed</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+WAITLSN <replaceable class="PARAMETER">'LSN'</replaceable> [ , <replaceable class="PARAMETER">delay</replaceable> ]
+WAITLSN_INFINITE <replaceable class="PARAMETER">'LSN'</replaceable>
+WAITLSN_NO_WAIT <replaceable class="PARAMETER">'LSN'</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   The <command>WAITLSN</command> command waits until the target
+   <acronym>LSN</> has been replayed, with an optional <quote>delay</>
+   in milliseconds (by default, it waits indefinitely).
+  </para>
+
+  <para>
+   <command>WAITLSN</command> provides a simple
+   interprocess <acronym>LSN</> wait mechanism for backends on a slave
+   in a master-slave replication setup of a <productname>PostgreSQL</productname> database.
+  </para>
+
+  <para>
+   The <command>WAITLSN_INFINITE</command> command waits until the target
+   <acronym>LSN</> has been replayed on the slave.
+  </para>
+
+  <para>
+   The <command>WAITLSN_NO_WAIT</command> command reports whether the target <acronym>LSN</> has been replayed, without waiting.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="PARAMETER">LSN</replaceable></term>
+    <listitem>
+     <para>
+      Target log sequence number to wait for.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+    <term><replaceable class="PARAMETER">delay</replaceable></term>
+    <listitem>
+     <para>
+      Time in milliseconds to wait for the LSN to be replayed.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+  <para>
+   The delay to wait for the LSN to be replayed must be an integer. By
+   default the wait is indefinite. Waiting can be interrupted with Ctrl+C, or
+   by postmaster death.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>GUCs</title>
+
+  <para>
+ Two GUCs help tune the impact on StartupXLOG:
+ count_waitlsn (waiters are woken only after every count_waitlsn-th replayed record)
+ int	count_waitlsn    = 10;
+  </para>
+
+  <para>
+ interval_waitlsn (interval in milliseconds after which waiters are woken regardless of the count)
+ int	interval_waitlsn = 100;
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Configure and execute a waitlsn from
+   <application>psql</application>:
+
+<programlisting>
+WAITLSN '0/3F07A6B1', 10000;
+NOTICE:  LSN is not reached. Try to make bigger delay.
+WAITLSN
+
+WAITLSN '0/3F07A611';
+WAITLSN
+
+WAITLSN '0/3F0FF791', 500000;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to make bigger delay.
+ERROR:  canceling statement due to user request
+</programlisting>
+</para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>WAITLSN</command> statement in the SQL
+   standard.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index c8191de..a5100a2 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -206,6 +206,7 @@
    &update;
    &vacuum;
    &values;
+   &waitlsn;
 
  </reference>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2dcff7f..9c2def1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -39,6 +39,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -145,6 +146,9 @@ const struct config_enum_entry sync_method_options[] = {
 	{NULL, 0, false}
 };
 
+/* GUC variable */
+int				count_waitlsn = 10;
+int				interval_waitlsn = 100;
 
 /*
  * Although only "on", "off", and "always" are documented,
@@ -6948,6 +6952,8 @@ StartupXLOG(void)
 		{
 			ErrorContextCallback errcallback;
 			TimestampTz xtime;
+			TimestampTz			time_waitlsn = GetCurrentTimestamp();
+			int					counter_waitlsn = 0;
 
 			InRedo = true;
 
@@ -7174,6 +7180,17 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * After update lastReplayedEndRecPtr set Latches in SHMEM array
+				 */
+				if (counter_waitlsn % count_waitlsn == 0
+					|| TimestampDifferenceExceeds(time_waitlsn,GetCurrentTimestamp(),interval_waitlsn))
+				{
+					WaitLSNSetLatch();
+					time_waitlsn = GetCurrentTimestamp();
+				}
+				counter_waitlsn++;
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e0fab38..8a7e2bd 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -20,6 +20,6 @@ OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
 	policy.o portalcmds.o prepare.o proclang.o publicationcmds.o \
 	schemacmds.o seclabel.o sequence.o subscriptioncmds.o tablecmds.o \
 	tablespace.o trigger.o tsearchcmds.o typecmds.o user.o vacuum.o \
-	vacuumlazy.o variable.o view.o
+	vacuumlazy.o variable.o view.o waitlsn.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index e32d7a1..ce1c7f7 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -139,7 +139,6 @@
 #include "utils/ps_status.h"
 #include "utils/timestamp.h"
 
-
 /*
  * Maximum size of a NOTIFY payload, including terminating NULL.  This
  * must be kept small enough so that a notification message fits on one
diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
new file mode 100644
index 0000000..47bd90d
--- /dev/null
+++ b/src/backend/commands/waitlsn.c
@@ -0,0 +1,241 @@
+/*-------------------------------------------------------------------------
+ *
+ * waitlsn.c
+ *	  WaitLSN statement: WAITLSN
+ *
+ * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2017, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/waitlsn.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/*
+ * -------------------------------------------------------------------------
+ * Wait for LSN to be replayed on slave
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "fmgr.h"
+#include "pgstat.h"
+#include "utils/pg_lsn.h"
+#include "storage/latch.h"
+#include "miscadmin.h"
+#include "storage/spin.h"
+#include "storage/backendid.h"
+#include "access/xact.h"
+#include "storage/shmem.h"
+#include "storage/ipc.h"
+#include "utils/timestamp.h"
+#include "storage/pmsignal.h"
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "commands/waitlsn.h"
+#include "storage/proc.h"
+#include "access/transam.h"
+#include "funcapi.h"
+#include "catalog/pg_type.h"
+#include "utils/builtins.h"
+
+/* Latch own/disown helpers and abort callback */
+static uint32 WaitLSNShmemSize(void);
+static void WLDisownLatchAbort(XactEvent event, void *arg);
+static void WLOwnLatch(void);
+static void WLDisownLatch(void);
+
+void		_PG_init(void);
+
+/* Shared memory structures */
+typedef struct
+{
+	int					pid;
+	volatile slock_t	slock;
+	Latch				latch;
+} BIDLatch;
+
+typedef struct
+{
+	char		dummy;
+	int			backend_maxid;
+	BIDLatch	l_arr[FLEXIBLE_ARRAY_MEMBER];
+} GlobState;
+
+static volatile GlobState  *state;
+bool						is_latch_owned = false;
+
+/* Take Latch for current backend at the beginning of WAITLSN */
+static void
+WLOwnLatch(void)
+{
+	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+	OwnLatch(&state->l_arr[MyBackendId].latch);
+	is_latch_owned = true;
+
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->l_arr[MyBackendId].pid = MyProcPid;
+	SpinLockRelease(&state->l_arr[MyBackendId].slock);
+}
+
+/* Release Latch for current backend at the end of WAITLSN */
+static void
+WLDisownLatch(void)
+{
+	int i;
+	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+	DisownLatch(&state->l_arr[MyBackendId].latch);
+	is_latch_owned = false;
+	state->l_arr[MyBackendId].pid = 0;
+
+	if (state->backend_maxid == MyBackendId)
+		for (i = MaxConnections; i >= 0; i--)
+			if (state->l_arr[i].pid != 0)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->l_arr[MyBackendId].slock);
+}
+
+/* CallBack function on abort*/
+static void
+WLDisownLatchAbort(XactEvent event, void *arg)
+{
+	if (is_latch_owned && (event == XACT_EVENT_PARALLEL_ABORT ||
+						   event == XACT_EVENT_ABORT))
+	{
+		WLDisownLatch();
+	}
+}
+
+/* Module load callback */
+void
+_PG_init(void)
+{
+	if (!IsUnderPostmaster)
+		RegisterXactCallback(WLDisownLatchAbort, NULL);
+}
+
+/* Get size of shared memory needed for GlobState */
+static uint32
+WaitLSNShmemSize(void)
+{
+	return offsetof(GlobState, l_arr) + sizeof(BIDLatch) * (MaxConnections+1);
+}
+
+/* Init array of Latches in shared memory */
+void
+WaitLSNShmemInit(void)
+{
+	bool	found;
+	uint	i;
+
+	state = (GlobState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitLSNShmemSize(),
+										  &found);
+	if (!found)
+	{
+		for (i = 0; i < (MaxConnections+1); i++)
+		{
+			state->l_arr[i].pid = 0;
+			SpinLockInit(&state->l_arr[i].slock);
+			InitSharedLatch(&state->l_arr[i].latch);
+		}
+		state->backend_maxid = 0;
+	}
+}
+
+/* Set all latches in shared memory because a new LSN has been replayed */
+void
+WaitLSNSetLatch(void)
+{
+	uint i;
+	for (i = 0; i <= state->backend_maxid; i++)
+	{
+		SpinLockAcquire(&state->l_arr[i].slock);
+		if (state->l_arr[i].pid != 0)
+			SetLatch(&state->l_arr[i].latch);
+		SpinLockRelease(&state->l_arr[i].slock);
+	}
+}
+
+/*
+ * On WAITLSN own latch and wait till LSN is replayed, Postmaster death, interruption
+ * or timeout.
+ */
+void
+WaitLSNUtility(const char *lsn, const int delay, DestReceiver *dest)
+{
+	XLogRecPtr		trg_lsn = DatumGetLSN(DirectFunctionCall1(pg_lsn_in, CStringGetDatum(lsn)));
+	XLogRecPtr		cur_lsn;
+	int				latch_events;
+	uint64_t	tdelay = delay;
+	long			secs;
+	int				microsecs;
+	TimestampTz		timer = GetCurrentTimestamp();
+	TupOutputState	*tstate;
+	TupleDesc		tupdesc;
+	char		   *value = "false";
+
+	if (delay > 0)
+		latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;
+	else
+		latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;
+
+	WLOwnLatch();
+
+	for (;;)
+	{
+		cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN had been Replayed */
+		if (trg_lsn <= cur_lsn)
+			break;
+
+		/* If the postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		/* If Delay time is over */
+		if (latch_events & WL_TIMEOUT)
+		{
+			if (TimestampDifferenceExceeds(timer,GetCurrentTimestamp(),delay))
+				break;
+			TimestampDifference(timer,GetCurrentTimestamp(),&secs, &microsecs);
+			tdelay = delay - (secs*1000 + microsecs/1000);
+		}
+
+		MyPgXact->xmin = InvalidTransactionId;
+		WaitLatch(&state->l_arr[MyBackendId].latch, latch_events, tdelay, WAIT_EVENT_CLIENT_READ);
+		ResetLatch(&state->l_arr[MyBackendId].latch);
+
+		/* If interrupts are pending, disown the current latch before processing them */
+		if (InterruptPending)
+		{
+			WLDisownLatch();
+			ProcessInterrupts();
+		}
+
+	}
+
+	WLDisownLatch();
+
+	if (trg_lsn > cur_lsn)
+		elog(NOTICE,"LSN is not reached. Try to make bigger delay.");
+	else
+		value = "true";
+
+	/* need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1, false);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+	/* prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc);
+	/* Send it */
+	do_text_output_oneline(tstate, value);
+	end_tup_output(tstate);
+}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 174773b..dcaed39 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -275,7 +275,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		SecLabelStmt SelectStmt TransactionStmt TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
 		VariableResetStmt VariableSetStmt VariableShowStmt
-		ViewStmt CheckPointStmt CreateConversionStmt
+		ViewStmt WaitLSNStmt CheckPointStmt CreateConversionStmt
 		DeallocateStmt PrepareStmt ExecuteStmt
 		DropOwnedStmt ReassignOwnedStmt
 		AlterTSConfigurationStmt AlterTSDictionaryStmt
@@ -320,7 +320,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <list>	OptSchemaEltList
 
 %type <boolean> TriggerForSpec TriggerForType
-%type <ival>	TriggerActionTime
+%type <ival>	TriggerActionTime WaitDelay
 %type <list>	TriggerEvents TriggerOneEvent
 %type <value>	TriggerFuncArg
 %type <node>	TriggerWhen
@@ -678,7 +678,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAITLSN WAITLSN_INFINITE WAITLSN_NO_WAIT WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLPARSE
 	XMLPI XMLROOT XMLSERIALIZE
@@ -935,6 +936,7 @@ stmt :
 			| VariableSetStmt
 			| VariableShowStmt
 			| ViewStmt
+			| WaitLSNStmt
 			| /*EMPTY*/
 				{ $$ = NULL; }
 		;
@@ -13507,7 +13509,41 @@ frame_bound:
 				}
 		;
 
+/*****************************************************************************
+ *
+ *		QUERY:
+ *				WAITLSN <LSN> can appear as a query-level command
+ *
+ *
+ *****************************************************************************/
 
+WaitLSNStmt:
+			WAITLSN Sconst WaitDelay
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = $3;
+					$$ = (Node *)n;
+				}
+			| WAITLSN_INFINITE Sconst
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = 0;
+					$$ = (Node *)n;
+				}
+			| WAITLSN_NO_WAIT Sconst
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = 1;
+					$$ = (Node *)n;
+				}
+		;
+WaitDelay:
+			',' Iconst							{ $$ = $2; }
+			| /*EMPTY*/							{ $$ = 0; }
+		;
 /*
  * Supporting nonterminals for expressions.
  */
@@ -14541,6 +14577,9 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAITLSN
+			| WAITLSN_INFINITE
+			| WAITLSN_NO_WAIT
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2d1ed14..932136f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -271,6 +272,11 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	AsyncShmemInit();
 	BackendRandomShmemInit();
 
+	/*
+	 * Init array of Latches  in SHMEM for WAITLSN
+	 */
+	WaitLSNShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 5d3be38..6595be6 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -56,6 +56,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -917,6 +918,20 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				break;
 			}
 
+		case T_WaitLSNStmt:
+			{
+				WaitLSNStmt *stmt = (WaitLSNStmt *) parsetree;
+				if (!RecoveryInProgress())
+				{
+					ereport(ERROR,(errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
+							errmsg("cannot execute %s not during recovery",
+							"WaitLSN")));
+				}
+				else
+					WaitLSNUtility(stmt->lsn, stmt->delay, dest);
+			}
+			break;
+
 		default:
 			/* All other statement types have event trigger support */
 			ProcessUtilitySlow(pstate, pstmt, queryString,
@@ -2453,6 +2468,10 @@ CreateCommandTag(Node *parsetree)
 			tag = "NOTIFY";
 			break;
 
+		case T_WaitLSNStmt:
+			tag = "WAITLSN";
+			break;
+
 		case T_ListenStmt:
 			tag = "LISTEN";
 			break;
@@ -3064,6 +3083,10 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
+		case T_WaitLSNStmt:
+			lev = LOGSTMT_ALL;
+			break;
+
 		case T_ListenStmt:
 			lev = LOGSTMT_ALL;
 			break;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 0249721..ab52ed4 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2835,6 +2835,30 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"interval_waitlsn", PGC_SUSET, DEVELOPER_OPTIONS,
+			gettext_noop("Set interval of time (ms) how often LSN will be checked."),
+			gettext_noop("Set interval of time (ms) how often LSN will be checked to "
+						 "make less influence on StartupXLOG() process."),
+			GUC_UNIT_MS
+		},
+		&interval_waitlsn,
+		100, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"count_waitlsn", PGC_SUSET, DEVELOPER_OPTIONS,
+			gettext_noop("How often LSN will be checked."),
+			gettext_noop("Set count of LSNs that will be passed before LSN check to "
+						 "make less influence on StartupXLOG() process."),
+			GUC_NOT_IN_SAMPLE
+		},
+		&count_waitlsn,
+		10, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 9f036c7..5249c8e 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -111,6 +111,9 @@ extern bool log_checkpoints;
 
 extern int	CheckPointSegments;
 
+extern int interval_waitlsn;
+extern int count_waitlsn;
+
 /* Archive modes */
 typedef enum ArchiveMode
 {
diff --git a/src/include/commands/waitlsn.h b/src/include/commands/waitlsn.h
new file mode 100644
index 0000000..2e39608
--- /dev/null
+++ b/src/include/commands/waitlsn.h
@@ -0,0 +1,21 @@
+/*-------------------------------------------------------------------------
+ *
+ * waitlsn.h
+ *	  WaitLSN notification: WAITLSN
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2016, Regents of PostgresPRO
+ *
+ * src/include/commands/waitlsn.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAITLSN_H
+#define WAITLSN_H
+#include "tcop/dest.h"
+
+extern void WaitLSNUtility(const char *lsn, const int delay, DestReceiver *dest);
+extern void WaitLSNShmemInit(void);
+extern void WaitLSNSetLatch(void);
+
+#endif   /* WAITLSN_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 95dd8ba..5f02105 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -478,6 +478,7 @@ typedef enum NodeTag
 	T_DropReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
+	T_WaitLSNStmt,
 
 	/*
 	 * TAGS FOR RANDOM OTHER STUFF
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 07a8436..732304d 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3299,4 +3299,16 @@ typedef struct DropSubscriptionStmt
 	bool		missing_ok;		/* Skip error if missing? */
 } DropSubscriptionStmt;
 
+/* ----------------------
+ *		WaitLSN Statement
+ * ----------------------
+ */
+typedef struct WaitLSNStmt
+{
+	NodeTag		type;
+	char	   *lsn;			/* Target LSN to wait for */
+	int			delay;			/* Delay to wait for LSN */
+	bool		nowait;			/* Do not wait for LSN, just report the result */
+} WaitLSNStmt;
+
 #endif   /* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 985d650..c77f7b5 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -430,6 +430,9 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("waitlsn", WAITLSN, UNRESERVED_KEYWORD)
+PG_KEYWORD("waitlsn_infinite", WAITLSN_INFINITE, UNRESERVED_KEYWORD)
+PG_KEYWORD("waitlsn_no_wait", WAITLSN_NO_WAIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
#14Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Ivan Kartyshov (#13)
Re: make async slave to wait for lsn to be replayed

On Fri, Mar 10, 2017 at 1:49 AM, Ivan Kartyshov
<i.kartyshov@postgrespro.ru> wrote:

Here I attached rebased patch waitlsn_10dev_v3 (core feature)
I will leave the choice of implementation (core/contrib) to the discretion
of the community.

Will be glad to hear your suggestion about syntax, code and patch.

Hi Ivan,

Here is some feedback based on a first read-through of the v4 patch.
I haven't tested it yet.

First, I'll restate my view of the syntax-vs-function question: I
think an fmgr function is the wrong place to do this, because it
doesn't work for our 2 higher isolation levels as mentioned. Most
people probably use READ COMMITTED most of the time so the extension
version you've proposed is certainly useful for many people and I like
it, but I will vote against inclusion in core of any feature that
doesn't consider all three of our isolation levels, especially if
there is no way to extend support to other levels later. I don't want
PostgreSQL to keep adding features that eventually force everyone to
use READ COMMITTED because they want to use feature X, Y or Z.

Maybe someone can think of a clever way for an extension to insert a
wait for a user-supplied LSN *before* acquiring a snapshot so it can
work for the higher levels, or maybe the hooks should go into core
PostgreSQL so that the extension can exist as an external project not
requiring a patched PostgreSQL installation, or maybe this should be
done with new core syntax that extends transaction commands. Do other
people have views on this?

+ * Portions Copyright (c) 2012-2017, PostgresPro Global Development Group

This should say PostgreSQL.

+wl_lsn_updated_hook(void)
+{
+    uint i;
+    /*
+     * After update lastReplayedEndRecPtr set Latches in SHMEM array
+     */
+    if (counter_waitlsn % count_waitlsn == 0
+        || TimestampDifferenceExceeds(time_waitlsn,GetCurrentTimestamp(),interval_waitlsn))
+    {

Doesn't this mean that if you are waiting for LSN 1234, and the
primary sends that LSN and then doesn't send anything for another
hour, a standby waiting in pg_waitlsn is quite likely to skip that
update (either because of count_waitlsn or interval_waitlsn), and then
finish up waiting for a very long time?

I'm not sure if this is a good idea, but it's an idea: You could keep
your update skipping logic, but make sure you always catch the
important case where recovery hits the end of WAL, by invoking your
callback from WaitForWALToBecomeAvailable. It could have a boolean
parameter that means 'don't skip this one!'. In other words, it's OK
to skip updates, but not if you know there is no more WAL available to
apply (since you have no idea how long it will be for more to arrive).
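
To make the "don't skip this one" idea concrete, here is a minimal sketch. It is not code from the patch: WakeupThrottle, lsn_updated and the every_n counter are hypothetical names, and the actual latch-setting is reduced to a counter. Ordinary calls are skipped until every_n records have passed, while a forced call (as one might make when recovery runs out of WAL) always notifies unless nothing new was replayed since the last notification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for throttled wakeups; not part of the patch. */
typedef struct WakeupThrottle
{
	uint32_t	every_n;			/* notify on every Nth replayed record */
	uint32_t	seen;				/* records seen since last notification */
	uint64_t	last_notified_lsn;	/* LSN passed to the last notification */
	uint32_t	notifications;		/* stand-in for actually setting latches */
} WakeupThrottle;

/*
 * Called after a record is replayed (force = false), or when recovery has
 * run out of WAL to apply (force = true).  Returns true if waiters were
 * notified.
 */
static bool
lsn_updated(WakeupThrottle *t, uint64_t replayed_lsn, bool force)
{
	assert(t->every_n > 0);

	if (force)
	{
		/* Nothing new since the last notification: no need to wake anyone. */
		if (replayed_lsn == t->last_notified_lsn)
			return false;
	}
	else
	{
		/* Ordinary update: skip until enough records have gone by. */
		t->seen++;
		if (t->seen < t->every_n)
			return false;
	}

	t->seen = 0;
	t->last_notified_lsn = replayed_lsn;
	t->notifications++;
	return true;
}
```

With every_n = 3, two ordinary updates are skipped and the third notifies; a forced call at the end of WAL then notifies immediately, whatever the counter says.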

Calling GetCurrentTimestamp() at high frequency (after every WAL
record is replayed) may be a bad idea. It's a slow system call on
some operating systems. Can we use an existing timestamp for free,
like recoveryLastXTime? Remembering that the motivation for using
this feature is to wait for *whole transactions* to be replayed and
become visible, there is no sensible reason to wait for random WAL
positions inside a transaction, so if you used that then you would
effectively skip all non-COMMIT records and also skip some COMMIT
records that are coming down the pipe too fast.

+static void
+wl_own_latch(void)
+{
+    SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+    OwnLatch(&state->l_arr[MyBackendId].latch);
+    is_latch_owned = true;
+
+    if (state->backend_maxid < MyBackendId)
+        state->backend_maxid = MyBackendId;
+
+    state->l_arr[MyBackendId].pid = MyProcPid;
+    SpinLockRelease(&state->l_arr[MyBackendId].slock);
+}

What is the point of using extra latches for this? Why not just use
the standard latch? Syncrep and many other things do that. I'm not
actually sure if there is ever a reason to create more latches in
regular backends. SIGUSR1 will be delivered and set the main latch
anyway.

There are cases of specialised latches in the system, like the wal
receiver latch, and I'm reliably informed that that solves problems
like delivering a wakeup message without having to know which backend
is currently the wal receiver (a problem we might consider solving
today with a condition variable?) I don't think anything like that
applies here.

+        for (i = 0; i <= state->backend_maxid; i++)
+        {
+            SpinLockAcquire(&state->l_arr[i].slock);
+            if (state->l_arr[i].pid != 0)
+                SetLatch(&state->l_arr[i].latch);
+            SpinLockRelease(&state->l_arr[i].slock);
+        }

Once we get through the update-skipping logic above, we hit this loop
which acquires spinlocks for every backend one after another and sets
the latches of every backend, no matter whether they are waiting for
the LSN that has been applied. Assuming we go with this
scan-the-whole-array approach, why not include the LSN waited for in
the array slots, so that we can avoid disturbing processes that are
waiting for a later LSN?
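
A rough sketch of what I mean, with the locking left out and all names (WaitSlot, wake_reached, INVALID_LSN) made up for illustration. The woken flag stands in for SetLatch(), and INVALID_LSN stands in for InvalidXLogRecPtr:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_LSN 0			/* stand-in for InvalidXLogRecPtr */

/* Hypothetical per-backend slot; the real patch also has a spinlock and latch. */
typedef struct WaitSlot
{
	uint64_t	target_lsn;		/* INVALID_LSN when the slot is unused */
	bool		woken;			/* stand-in for SetLatch() on this slot */
} WaitSlot;

/*
 * Scan the whole array, but only wake waiters whose target LSN has been
 * replayed.  Returns the number of waiters woken.
 */
static size_t
wake_reached(WaitSlot *slots, size_t nslots, uint64_t replayed_lsn)
{
	size_t		nwoken = 0;

	assert(slots != NULL || nslots == 0);

	for (size_t i = 0; i < nslots; i++)
	{
		if (slots[i].target_lsn == INVALID_LSN)
			continue;			/* nobody is waiting in this slot */
		if (slots[i].target_lsn <= replayed_lsn)
		{
			slots[i].woken = true;
			nwoken++;
		}
	}
	return nwoken;
}
```

Waiters for a later LSN are left alone, so they do not wake up only to go back to sleep.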

Could you talk a bit about the trade-off between this approach and a
queue based approach? I would like to understand whether this really
is the best way to do it.

One way to use a queue would be
ConditionVariableBroadcast(&state->lsn_moved), and then waiters would
use a condition variable wait loop. That would make your code much
simpler (you wouldn't even need your array of spinlocks + latches) but
would still have the problem of processes waking up even though
they're actually waiting for a later LSN. Another choice would be to
create a custom wait list which actually holds the LSNs waited for in
sorted order, so that we wake up exactly the right processes, cheaply,
or in arbitrary order which makes insertion cheaper but search more
expensive.
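
For the sorted variant, something along these lines is what I have in mind. This is only a sketch under simplifying assumptions: a fixed-size array instead of a shared-memory list, no locking, and hypothetical names (WaitList, waitlist_insert, waitlist_wake). Insertion keeps the list ordered by target LSN, so the wakeup side only has to look at the front of the list:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_WAITERS 64			/* arbitrary bound for this sketch */

typedef struct Waiter
{
	uint64_t	target_lsn;		/* LSN this backend is waiting for */
	int			backend_id;		/* who to wake */
} Waiter;

typedef struct WaitList
{
	Waiter		items[MAX_WAITERS];	/* sorted by target_lsn, ascending */
	size_t		n;
} WaitList;

/* Insert a waiter, keeping the list sorted (O(n) insertion). */
static void
waitlist_insert(WaitList *wl, uint64_t target_lsn, int backend_id)
{
	size_t		pos = wl->n;

	assert(wl->n < MAX_WAITERS);

	while (pos > 0 && wl->items[pos - 1].target_lsn > target_lsn)
	{
		wl->items[pos] = wl->items[pos - 1];
		pos--;
	}
	wl->items[pos].target_lsn = target_lsn;
	wl->items[pos].backend_id = backend_id;
	wl->n++;
}

/*
 * Remove every waiter whose target has been replayed and report its
 * backend id in woken[].  Only the front of the list is examined.
 */
static size_t
waitlist_wake(WaitList *wl, uint64_t replayed_lsn, int *woken, size_t woken_cap)
{
	size_t		k = 0;

	while (k < wl->n && k < woken_cap &&
		   wl->items[k].target_lsn <= replayed_lsn)
	{
		woken[k] = wl->items[k].backend_id;
		k++;
	}
	memmove(wl->items, wl->items + k, (wl->n - k) * sizeof(Waiter));
	wl->n -= k;
	return k;
}
```

The cost moves from the startup process (which replays WAL and should stay fast) to the backend that registers the wait, which is probably the right side of the trade.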

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15David Steele
david@pgmasters.net
In reply to: Thomas Munro (#14)
Re: make async slave to wait for lsn to be replayed

Hi Ivan,

On 3/12/17 10:20 PM, Thomas Munro wrote:

On Fri, Mar 10, 2017 at 1:49 AM, Ivan Kartyshov
<i.kartyshov@postgrespro.ru> wrote:

Here I attached rebased patch waitlsn_10dev_v3 (core feature)
I will leave the choice of implementation (core/contrib) to the discretion
of the community.

Will be glad to hear your suggestion about syntax, code and patch.

Hi Ivan,

Here is some feedback based on a first read-through of the v4 patch.
I haven't tested it yet.

This thread has been idle for over a week. Please respond and/or post a
new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be
marked "Returned with Feedback".

Thanks,
--
-David
david@pgmasters.net


#16Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#14)
Re: make async slave to wait for lsn to be replayed

On Sun, Mar 12, 2017 at 10:20 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Maybe someone can think of a clever way for an extension to insert a
wait for a user-supplied LSN *before* acquiring a snapshot so it can
work for the higher levels, or maybe the hooks should go into core
PostgreSQL so that the extension can exist as an external project not
requiring a patched PostgreSQL installation, or maybe this should be
done with new core syntax that extends transaction commands. Do other
people have views on this?

IMHO, trying to do this using a function-based interface is a really
bad idea for exactly the reasons you mention. I don't see why we'd
resist the idea of core syntax here; transactions are a core part of
PostgreSQL.

There is, of course, the question of whether making LSNs such a
user-visible thing is a good idea in the first place, but that's a
separate question from issue of what syntax for such a thing is best.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#17David Steele
david@pgmasters.net
In reply to: David Steele (#15)
Re: make async slave to wait for lsn to be replayed

Hi Ivan,

On 3/21/17 1:06 PM, David Steele wrote:

Hi Ivan,

On 3/12/17 10:20 PM, Thomas Munro wrote:

On Fri, Mar 10, 2017 at 1:49 AM, Ivan Kartyshov
<i.kartyshov@postgrespro.ru> wrote:

Here I attached rebased patch waitlsn_10dev_v3 (core feature)
I will leave the choice of implementation (core/contrib) to the
discretion
of the community.

Will be glad to hear your suggestion about syntax, code and patch.

Hi Ivan,

Here is some feedback based on a first read-through of the v4 patch.
I haven't tested it yet.

This thread has been idle for over a week. Please respond and/or post a
new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be
marked "Returned with Feedback".

This submission has been marked "Returned with Feedback". Please feel
free to resubmit to a future commitfest.

Regards,
--
-David
david@pgmasters.net


#18Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Ivan Kartyshov (#13)
Re: make async slave to wait for lsn to be replayed

On 3/9/17 07:49, Ivan Kartyshov wrote:

Here I attached rebased patch waitlsn_10dev_v3 (core feature)
I will leave the choice of implementation (core/contrib) to the
discretion of the community.

This patch is registered in the upcoming commit fest, but it needs to be
rebased.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#19Craig Ringer
craig@2ndquadrant.com
In reply to: Robert Haas (#16)
Re: make async slave to wait for lsn to be replayed

On 22 March 2017 at 01:17, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Mar 12, 2017 at 10:20 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Maybe someone can think of a clever way for an extension to insert a
wait for a user-supplied LSN *before* acquiring a snapshot so it can
work for the higher levels, or maybe the hooks should go into core
PostgreSQL so that the extension can exist as an external project not
requiring a patched PostgreSQL installation, or maybe this should be
done with new core syntax that extends transaction commands. Do other
people have views on this?

IMHO, trying to do this using a function-based interface is a really
bad idea for exactly the reasons you mention. I don't see why we'd
resist the idea of core syntax here; transactions are a core part of
PostgreSQL.

There is, of course, the question of whether making LSNs such a
user-visible thing is a good idea in the first place, but that's a
separate question from issue of what syntax for such a thing is best.

(I know this is old, but):

That ship sailed a long time ago unfortunately, they're all over
pg_stat_replication and pg_replication_slots and so on. They're already
routinely used for monitoring replication lag in bytes, waiting for a peer
to catch up, etc.
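
As an illustration of the lag-in-bytes use, here is a small client-side sketch. It relies only on the textual LSN format (two hexadecimal numbers separated by a slash: the high 32 bits, then the low 32 bits); parse_lsn and lsn_lag_bytes are hypothetical helper names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Parse an LSN such as "0/303EC60" into a 64-bit byte position. */
static bool
parse_lsn(const char *text, uint64_t *lsn)
{
	unsigned int hi;
	unsigned int lo;
	int			consumed = 0;

	assert(text != NULL && lsn != NULL);

	/* %n records how much input was consumed so trailing junk is rejected */
	if (sscanf(text, "%8X/%8X%n", &hi, &lo, &consumed) != 2 ||
		text[consumed] != '\0')
		return false;

	*lsn = ((uint64_t) hi << 32) | lo;
	return true;
}

/* Bytes of WAL the standby still has to replay; 0 if it has caught up. */
static uint64_t
lsn_lag_bytes(uint64_t primary_lsn, uint64_t replay_lsn)
{
	return primary_lsn > replay_lsn ? primary_lsn - replay_lsn : 0;
}
```

A monitoring tool would feed it the LSNs reported on the primary and on the standby, and a "wait for a peer to catch up" loop is just polling until the lag reaches zero.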

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#20Noname
i.kartyshov@postgrespro.ru
In reply to: Thomas Munro (#14)
Re: make async slave to wait for lsn to be replayed

Hello, thank you for your comments on the main idea and the code.

On 13.03.2017 05:20, Thomas Munro wrote:
1)

First, I'll restate my view of the syntax-vs-function question: I
think an fmgr function is the wrong place to do this, because it
doesn't work for our 2 higher isolation levels as mentioned. Most
people probably use READ COMMITTED most of the time so the extension
version you've proposed is certainly useful for many people and I like
it, but I will vote against inclusion in core of any feature that
doesn't consider all three of our isolation levels, especially if
there is no way to extend support to other levels later. I don't want
PostgreSQL to keep adding features that eventually force everyone to
use READ COMMITTED because they want to use feature X, Y or Z.

Waiting for an LSN is expected to be used on a read-only hot standby.
Only READ COMMITTED and REPEATABLE READ are allowed on a hot standby.

Maybe someone can think of a clever way for an extension to insert a
wait for a user-supplied LSN *before* acquiring a snapshot so it can
work for the higher levels, or maybe the hooks should go into core
PostgreSQL so that the extension can exist as an external project not
requiring a patched PostgreSQL installation, or maybe this should be
done with new core syntax that extends transaction commands. Do other
people have views on this?

I think it is a good idea, but it is not clear to me how to do it
properly.

2)

+wl_lsn_updated_hook(void)
+{
+    uint i;
+    /*
+     * After update lastReplayedEndRecPtr set Latches in SHMEM array
+     */
+    if (counter_waitlsn % count_waitlsn == 0
+        || TimestampDifferenceExceeds(time_waitlsn,GetCurrentTimestamp(),interval_waitlsn))
+    {

Doesn't this mean that if you are waiting for LSN 1234, and the
primary sends that LSN and then doesn't send anything for another
hour, a standby waiting in pg_waitlsn is quite likely to skip that
update (either because of count_waitlsn or interval_waitlsn), and then
finish up waiting for a very long time?

I'm not sure if this is a good idea, but it's an idea: You could keep
your update skipping logic, but make sure you always catch the
important case where recovery hits the end of WAL, by invoking your
callback from WaitForWALToBecomeAvailable. It could have a boolean
parameter that means 'don't skip this one!'. In other words, it's OK
to skip updates, but not if you know there is no more WAL available to
apply (since you have no idea how long it will be for more to arrive).

Calling GetCurrentTimestamp() at high frequency (after every WAL
record is replayed) may be a bad idea. It's a slow system call on
some operating systems. Can we use an existing timestamp for free,
like recoveryLastXTime? Remembering that the motivation for using
this feature is to wait for *whole transactions* to be replayed and
become visible, there is no sensible reason to wait for random WAL
positions inside a transaction, so if you used that then you would
effectively skip all non-COMMIT records and also skip some COMMIT
records that are coming down the pipe too fast.

Yes, I applied this idea and prepared a new patch.

3)

+static void
+wl_own_latch(void)
+{
+    SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+    OwnLatch(&state->l_arr[MyBackendId].latch);
+    is_latch_owned = true;
+
+    if (state->backend_maxid < MyBackendId)
+        state->backend_maxid = MyBackendId;
+
+    state->l_arr[MyBackendId].pid = MyProcPid;
+    SpinLockRelease(&state->l_arr[MyBackendId].slock);
+}

What is the point of using extra latches for this? Why not just use
the standard latch? Syncrep and many other things do that. I'm not
actually sure if there is ever a reason to create more latches in
regular backends. SIGUSR1 will be delivered and set the main latch
anyway.

There are cases of specialised latches in the system, like the wal
receiver latch, and I'm reliably informed that that solves problems
like delivering a wakeup message without having to know which backend
is currently the wal receiver (a problem we might consider solving
today with a condition variable?) I don't think anything like that
applies here.

In my case I create a bunch of shared latches, one for each backend. I'll
think about how to use the standard latches efficiently. As for the
specialized latches on the standby, they are already busy with WAL replay
functions.

4)

+        for (i = 0; i <= state->backend_maxid; i++)
+        {
+            SpinLockAcquire(&state->l_arr[i].slock);
+            if (state->l_arr[i].pid != 0)
+                SetLatch(&state->l_arr[i].latch);
+            SpinLockRelease(&state->l_arr[i].slock);
+        }

Once we get through the update-skipping logic above, we hit this loop
which acquires spinlocks for every backend one after another and sets
the latches of every backend, no matter whether they are waiting for
the LSN that has been applied. Assuming we go with this
scan-the-whole-array approach, why not include the LSN waited for in
the array slots, so that we can avoid disturbing processes that are
waiting for a later LSN?

Done.

Could you talk a bit about the trade-off between this approach and a
queue based approach? I would like to understand whether this really
is the best way to do it.
One way to use a queue would be
ConditionVariableBroadcast(&state->lsn_moved), and then waiters would
use a condition variable wait loop. That would make your code much
simpler (you wouldn't even need your array of spinlocks + latches) but
would still have the problem of processes waking up even though
they're actually waiting for a later LSN. Another choice would be to
create a custom wait list which actually holds the LSNs waited for in
sorted order, so that we wake up exactly the right processes, cheaply,
or in arbitrary order which makes insertion cheaper but search more
expensive.

I'll think about how to implement waiting for an LSN with a queue in the
next patch.

Thank you for your review.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


#21Noname
i.kartyshov@postgrespro.ru
In reply to: Craig Ringer (#19)
1 attachment(s)
Re: make async slave to wait for lsn to be replayed

I forgot to include the patch in my last letter.

Craig Ringer wrote 2017-08-15 05:00:

That ship sailed a long time ago unfortunately, they're all over
pg_stat_replication and pg_replication_slots and so on. They're
already routinely used for monitoring replication lag in bytes,
waiting for a peer to catch up, etc.

--

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

The pg_replication_slots view is used only on the master, while waitlsn is
a function for asynchronous hot standby replication. It allows us to wait
until an insert made on the master has been replayed on the replica.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments:

waitlsn_10dev_v6.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 01acc2e..6792eb0 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -181,6 +181,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY update             SYSTEM "update.sgml">
 <!ENTITY vacuum             SYSTEM "vacuum.sgml">
 <!ENTITY values             SYSTEM "values.sgml">
+<!ENTITY waitlsn            SYSTEM "waitlsn.sgml">
 
 <!-- applications and utilities -->
 <!ENTITY clusterdb          SYSTEM "clusterdb.sgml">
diff --git a/doc/src/sgml/ref/waitlsn.sgml b/doc/src/sgml/ref/waitlsn.sgml
new file mode 100644
index 0000000..077e869
--- /dev/null
+++ b/doc/src/sgml/ref/waitlsn.sgml
@@ -0,0 +1,119 @@
+<!--
+doc/src/sgml/ref/waitlsn.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-WAITLSN">
+ <indexterm zone="sql-waitlsn">
+  <primary>WAITLSN</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>WAITLSN</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>WAITLSN</refname>
+  <refpurpose>wait until target <acronym>LSN</> has been replayed</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+WAITLSN <replaceable class="PARAMETER">'LSN'</replaceable> [ , <replaceable class="PARAMETER">delay</replaceable> ]
+WAITLSN_INFINITE <replaceable class="PARAMETER">'LSN'</replaceable>
+WAITLSN_NO_WAIT <replaceable class="PARAMETER">'LSN'</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>WAITLSN</command> waits until the target <acronym>LSN</> has
+   been replayed. An optional <quote>delay</> (in milliseconds) bounds the
+   wait; by default the wait is unbounded.
+  </para>
+
+  <para>
+   <command>WAITLSN</command> provides a simple
+   interprocess <acronym>LSN</> wait mechanism for backends on a slave
+   in a master-slave replication scheme of a <productname>PostgreSQL</productname> database.
+  </para>
+
+  <para>
+   <command>WAITLSN_INFINITE</command> waits until the target <acronym>LSN</> has
+   been replayed on the slave.
+  </para>
+
+  <para>
+   <command>WAITLSN_NO_WAIT</command> reports whether the target <acronym>LSN</> has been replayed, without waiting.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="PARAMETER">LSN</replaceable></term>
+    <listitem>
+     <para>
+      Target log sequence number to wait for.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+    <term><replaceable class="PARAMETER">delay</replaceable></term>
+    <listitem>
+     <para>
+      Time in milliseconds to wait for the LSN to be replayed.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+  <para>
+   The delay must be an integer number of milliseconds. By default
+   the wait is unbounded. Waiting can be interrupted with Ctrl+C or
+   by postmaster death.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Configure and execute a waitlsn from
+   <application>psql</application>:
+
+<programlisting>
+WAITLSN '0/3F07A6B1', 10000;
+NOTICE:  LSN is not reached. Try to make bigger delay.
+WAITLSN
+
+WAITLSN '0/3F07A611';
+WAITLSN
+
+WAITLSN '0/3F0FF791', 500000;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to make bigger delay.
+ERROR:  canceling statement due to user request
+</programlisting>
+</para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>WAITLSN</command> statement in the SQL
+   standard.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 9000b3a..0c5951a 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -209,6 +209,7 @@
    &update;
    &vacuum;
    &values;
+   &waitlsn;
 
  </reference>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index df4843f..9dfb59d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -147,7 +148,6 @@ const struct config_enum_entry sync_method_options[] = {
 	{NULL, 0, false}
 };
 
-
 /*
  * Although only "on", "off", and "always" are documented,
  * we accept all the likely variants of "on" and "off".
@@ -7237,6 +7237,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * After updating lastReplayedEndRecPtr, set latches in the SHMEM array
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitLSN())
+				{
+
+					WaitLSNSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 4a6c99e..0d10117 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -20,6 +20,6 @@ OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
 	policy.o portalcmds.o prepare.o proclang.o publicationcmds.o \
 	schemacmds.o seclabel.o sequence.o statscmds.o subscriptioncmds.o \
 	tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o user.o \
-	vacuum.o vacuumlazy.o variable.o view.o
+	vacuum.o vacuumlazy.o variable.o view.o waitlsn.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index bacc08e..7d6011f 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -139,7 +139,6 @@
 #include "utils/ps_status.h"
 #include "utils/timestamp.h"
 
-
 /*
  * Maximum size of a NOTIFY payload, including terminating NULL.  This
  * must be kept small enough so that a notification message fits on one
diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
new file mode 100644
index 0000000..4f6cb7b
--- /dev/null
+++ b/src/backend/commands/waitlsn.c
@@ -0,0 +1,273 @@
+/*-------------------------------------------------------------------------
+ *
+ * waitlsn.c
+ *	  WaitLSN statement: WAITLSN
+ *
+ * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2017, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/waitlsn.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/*
+ * -------------------------------------------------------------------------
+ * Wait for an LSN to be replayed on the slave
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "fmgr.h"
+#include "pgstat.h"
+#include "utils/pg_lsn.h"
+#include "storage/latch.h"
+#include "miscadmin.h"
+#include "storage/spin.h"
+#include "storage/backendid.h"
+#include "access/xact.h"
+#include "storage/shmem.h"
+#include "storage/ipc.h"
+#include "utils/timestamp.h"
+#include "storage/pmsignal.h"
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "commands/waitlsn.h"
+#include "storage/proc.h"
+#include "access/transam.h"
+#include "funcapi.h"
+#include "catalog/pg_type.h"
+#include "utils/builtins.h"
+
+/* Latch own/disown helpers and the abort callback */
+static uint32 WaitLSNShmemSize(void);
+static void WLDisownLatchAbort(XactEvent event, void *arg);
+static void WLOwnLatch(XLogRecPtr trg_lsn);
+static void WLDisownLatch(void);
+
+void		_PG_init(void);
+
+/* Shared memory structures */
+typedef struct
+{
+	int					pid;
+	volatile slock_t	slock;
+	Latch				latch;
+	XLogRecPtr			trg_lsn;
+} BIDLatch;
+
+typedef struct
+{
+	char			dummy;			/* TODO: remove */
+	int				backend_maxid;
+	XLogRecPtr		min_lsn;
+	BIDLatch		l_arr[FLEXIBLE_ARRAY_MEMBER];
+} GlobState;
+
+static volatile GlobState  *state;
+bool						is_latch_owned = false;
+
+/* Take the latch for the current backend at the beginning of WAITLSN */
+static void
+WLOwnLatch(XLogRecPtr trg_lsn)
+{
+	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+	OwnLatch(&state->l_arr[MyBackendId].latch);
+	is_latch_owned = true;
+
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->l_arr[MyBackendId].pid = MyProcPid;
+	state->l_arr[MyBackendId].trg_lsn = trg_lsn;
+	SpinLockRelease(&state->l_arr[MyBackendId].slock);
+
+	if (trg_lsn < state->min_lsn)
+		state->min_lsn = trg_lsn;
+}
+
+/* Release Latch for current backend at the end of WAITLSN */
+static void
+WLDisownLatch(void)
+{
+	int i;
+	XLogRecPtr trg_lsn = state->l_arr[MyBackendId].trg_lsn;
+
+	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+	DisownLatch(&state->l_arr[MyBackendId].latch);
+	is_latch_owned = false;
+	state->l_arr[MyBackendId].pid = 0;
+	state->l_arr[MyBackendId].trg_lsn = InvalidXLogRecPtr;
+
+	/* Update state->min_lsn only if necessary, choosing the next min_lsn */
+	if (state->min_lsn == trg_lsn)
+	{
+		state->min_lsn = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->l_arr[i].trg_lsn != InvalidXLogRecPtr &&
+				state->l_arr[i].trg_lsn < state->min_lsn)
+				state->min_lsn = state->l_arr[i].trg_lsn;
+	}
+
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MaxConnections+1); i >=2; i--)
+			if (state->l_arr[i].pid != 0)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->l_arr[MyBackendId].slock);
+}
+
+/* Callback function on abort */
+static void
+WLDisownLatchAbort(XactEvent event, void *arg)
+{
+	if (is_latch_owned && (event == XACT_EVENT_PARALLEL_ABORT ||
+						   event == XACT_EVENT_ABORT))
+	{
+		WLDisownLatch();
+	}
+}
+
+/* Module load callback */
+void
+_PG_init(void)
+{
+	if (!IsUnderPostmaster)
+		RegisterXactCallback(WLDisownLatchAbort, NULL);
+}
+
+/* Get the size of shared memory needed to hold GlobState */
+static uint32
+WaitLSNShmemSize(void)
+{
+	return offsetof(GlobState, l_arr) + sizeof(BIDLatch) * (MaxConnections+1);
+}
+
+/* Init array of Latches in shared memory */
+void
+WaitLSNShmemInit(void)
+{
+	bool	found;
+	uint	i;
+
+	state = (GlobState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitLSNShmemSize(),
+										  &found);
+	if (!found)
+	{
+		for (i = 0; i < (MaxConnections+1); i++)
+		{
+			state->l_arr[i].pid = 0;
+			state->l_arr[i].trg_lsn = InvalidXLogRecPtr;
+			SpinLockInit(&state->l_arr[i].slock);
+			InitSharedLatch(&state->l_arr[i].latch);
+		}
+		state->backend_maxid = 0;
+		state->min_lsn = PG_UINT64_MAX;
+	}
+}
+
+/* Set the latch of every waiter whose target LSN has now been replayed */
+void
+WaitLSNSetLatch(XLogRecPtr cur_lsn)
+{
+	uint i;
+
+	for (i = 2; i <= state->backend_maxid; i++)
+	{
+		if (state->l_arr[i].trg_lsn != 0)
+		{
+		SpinLockAcquire(&state->l_arr[i].slock);
+		if (state->l_arr[i].trg_lsn <= cur_lsn)
+			SetLatch(&state->l_arr[i].latch);
+		SpinLockRelease(&state->l_arr[i].slock);
+		}
+	}
+}
+
+/* Get minimal LSN that will be next */
+XLogRecPtr
+GetMinWaitLSN(void)
+{
+	return state->min_lsn;
+}
+
+/*
+ * On WAITLSN own latch and wait till LSN is replayed, Postmaster death, interruption
+ * or timeout.
+ */
+void
+WaitLSNUtility(const char *lsn, const int delay, DestReceiver *dest)
+{
+	XLogRecPtr		trg_lsn = DatumGetLSN(DirectFunctionCall1(pg_lsn_in, CStringGetDatum(lsn)));
+	XLogRecPtr		cur_lsn;
+	int				latch_events;
+	uint64_t		tdelay = delay;
+	long			secs;
+	int				microsecs;
+	TimestampTz		timer = GetCurrentTimestamp();
+	TupOutputState	*tstate;
+	TupleDesc		tupdesc;
+	char		   *value = "false";
+
+	if (delay > 0)
+		latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;
+	else
+		latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;
+
+	WLOwnLatch(trg_lsn);
+
+	for (;;)
+	{
+		cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN had been Replayed */
+		if (trg_lsn <= cur_lsn)
+			break;
+
+		/* If the postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		/* If Delay time is over */
+		if (latch_events & WL_TIMEOUT)
+		{
+			if (TimestampDifferenceExceeds(timer,GetCurrentTimestamp(),delay))
+				break;
+			TimestampDifference(timer,GetCurrentTimestamp(),&secs, &microsecs);
+			tdelay = delay - (secs*1000 + microsecs/1000);
+		}
+
+		MyPgXact->xmin = InvalidTransactionId;
+		WaitLatch(&state->l_arr[MyBackendId].latch, latch_events, tdelay, WAIT_EVENT_CLIENT_READ);
+		ResetLatch(&state->l_arr[MyBackendId].latch);
+
+		/* If an interrupt is pending, disown the latch before processing it */
+		if (InterruptPending)
+		{
+			WLDisownLatch();
+			ProcessInterrupts();
+		}
+
+	}
+
+	WLDisownLatch();
+
+	if (trg_lsn > cur_lsn)
+		elog(NOTICE,"LSN is not reached. Try to make bigger delay.");
+	else
+		value = "true";
+
+	/* need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1, false);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+	/* prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc);
+	/* Send it */
+	do_text_output_oneline(tstate, value);
+	end_tup_output(tstate);
+}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7d0de99..89e1e1d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -275,7 +275,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		SecLabelStmt SelectStmt TransactionStmt TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
 		VariableResetStmt VariableSetStmt VariableShowStmt
-		ViewStmt CheckPointStmt CreateConversionStmt
+		ViewStmt WaitLSNStmt CheckPointStmt CreateConversionStmt
 		DeallocateStmt PrepareStmt ExecuteStmt
 		DropOwnedStmt ReassignOwnedStmt
 		AlterTSConfigurationStmt AlterTSDictionaryStmt
@@ -322,7 +322,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <list>	OptSchemaEltList
 
 %type <boolean> TriggerForSpec TriggerForType
-%type <ival>	TriggerActionTime
+%type <ival>	TriggerActionTime WaitDelay
 %type <list>	TriggerEvents TriggerOneEvent
 %type <value>	TriggerFuncArg
 %type <node>	TriggerWhen
@@ -682,7 +682,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAITLSN WAITLSN_INFINITE WAITLSN_NO_WAIT WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLNAMESPACES
 	XMLPARSE XMLPI XMLROOT XMLSERIALIZE XMLTABLE
@@ -933,6 +934,7 @@ stmt :
 			| VariableSetStmt
 			| VariableShowStmt
 			| ViewStmt
+			| WaitLSNStmt
 			| /*EMPTY*/
 				{ $$ = NULL; }
 		;
@@ -13819,7 +13821,41 @@ frame_bound:
 				}
 		;
 
+/*****************************************************************************
+ *
+ *		QUERY:
+ *				WAITLSN <LSN> can appear as a query-level command
+ *
+ *
+ *****************************************************************************/
 
+WaitLSNStmt:
+			WAITLSN Sconst WaitDelay
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = $3;
+					$$ = (Node *)n;
+				}
+			| WAITLSN_INFINITE Sconst
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = 0;
+					$$ = (Node *)n;
+				}
+			| WAITLSN_NO_WAIT Sconst
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = 1;
+					$$ = (Node *)n;
+				}
+		;
+WaitDelay:
+			',' Iconst							{ $$ = $2; }
+			| /*EMPTY*/							{ $$ = 0; }
+		;
 /*
  * Supporting nonterminals for expressions.
  */
@@ -14856,6 +14892,9 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAITLSN
+			| WAITLSN_INFINITE
+			| WAITLSN_NO_WAIT
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2d1ed14..932136f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -271,6 +272,11 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	AsyncShmemInit();
 	BackendRandomShmemInit();
 
+	/*
+	 * Init array of Latches  in SHMEM for WAITLSN
+	 */
+	WaitLSNShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 775477c..61f0674 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -56,6 +56,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -923,6 +924,20 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				break;
 			}
 
+		case T_WaitLSNStmt:
+			{
+				WaitLSNStmt *stmt = (WaitLSNStmt *) parsetree;
+				if (!RecoveryInProgress())
+				{
+					ereport(ERROR,(errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
+							errmsg("cannot execute %s not during recovery",
+							"WaitLSN")));
+				}
+				else
+					WaitLSNUtility(stmt->lsn, stmt->delay, dest);
+			}
+			break;
+
 		default:
 			/* All other statement types have event trigger support */
 			ProcessUtilitySlow(pstate, pstmt, queryString,
@@ -2481,6 +2496,10 @@ CreateCommandTag(Node *parsetree)
 			tag = "NOTIFY";
 			break;
 
+		case T_WaitLSNStmt:
+			tag = "WAITLSN";
+			break;
+
 		case T_ListenStmt:
 			tag = "LISTEN";
 			break;
@@ -3100,6 +3119,10 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
+		case T_WaitLSNStmt:
+			lev = LOGSTMT_ALL;
+			break;
+
 		case T_ListenStmt:
 			lev = LOGSTMT_ALL;
 			break;
diff --git a/src/include/commands/waitlsn.h b/src/include/commands/waitlsn.h
new file mode 100644
index 0000000..49cf9e8
--- /dev/null
+++ b/src/include/commands/waitlsn.h
@@ -0,0 +1,22 @@
+/*-------------------------------------------------------------------------
+ *
+ * waitlsn.h
+ *	  WaitLSN notification: WAITLSN
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2016, Regents of PostgresPRO
+ *
+ * src/include/commands/waitlsn.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAITLSN_H
+#define WAITLSN_H
+#include "tcop/dest.h"
+
+extern void WaitLSNUtility(const char *lsn, const int delay, DestReceiver *dest);
+extern void WaitLSNShmemInit(void);
+extern void WaitLSNSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWaitLSN(void);
+
+#endif   /* WAITLSN_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 27bd4f3..df5d296 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -479,6 +479,7 @@ typedef enum NodeTag
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
 	T_SQLCmd,
+	T_WaitLSNStmt,
 
 	/*
 	 * TAGS FOR RANDOM OTHER STUFF
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 5f2a4a7..0ff537c 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3424,4 +3424,16 @@ typedef struct DropSubscriptionStmt
 	DropBehavior behavior;		/* RESTRICT or CASCADE behavior */
 } DropSubscriptionStmt;
 
+/* ----------------------
+ *             WaitLSN Statement
+ * ----------------------
+ */
+typedef struct WaitLSNStmt
+{
+	NodeTag         type;
+	char       *lsn;                  /* Target LSN to wait for */
+	int                     delay;    /* Delay to wait for LSN*/
+	bool            nowait;           /* No wait for LSN just result*/
+} WaitLSNStmt;
+
 #endif							/* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index f50e45e..6ebb4a8 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -433,6 +433,9 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("waitlsn", WAITLSN, UNRESERVED_KEYWORD)
+PG_KEYWORD("waitlsn_infinite", WAITLSN_INFINITE, UNRESERVED_KEYWORD)
+PG_KEYWORD("waitlsn_no_wait", WAITLSN_NO_WAIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
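
Note on the wait loop in the patch above: it recomputes the remaining timeout on every pass as `delay - elapsed` and stores the result in an unsigned variable (`tdelay`). If the clock advances past `delay` between the `TimestampDifferenceExceeds()` check and the `TimestampDifference()` call, that subtraction could in principle go negative and wrap. A minimal standalone sketch of a clamped computation, using plain integers instead of `TimestampTz`; `remaining_ms` is a hypothetical helper and not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): milliseconds left before a
 * timeout of delay_ms expires, given elapsed_ms already spent waiting.
 * Clamps at zero instead of wrapping when elapsed_ms >= delay_ms.
 */
static int64_t
remaining_ms(int64_t delay_ms, int64_t elapsed_ms)
{
	assert(delay_ms >= 0 && elapsed_ms >= 0);

	if (elapsed_ms >= delay_ms)
		return 0;				/* deadline already passed */
	return delay_ms - elapsed_ms;
}
```

In the loop, the result would be passed as the timeout argument of `WaitLatch()`; a result of 0 means the deadline has already passed and the loop can exit.
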
#22Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Ivan Kartyshov (#6)
Re: make async slave to wait for lsn to be replayed

On Mon, Jan 23, 2017 at 2:56 PM, Ivan Kartyshov <i.kartyshov@postgrespro.ru>
wrote:

How to use it
==========
WAITLSN 'LSN' [, timeout in ms];
WAITLSN_INFINITE 'LSN';
WAITLSN_NO_WAIT 'LSN';

Adding a suffix to the command name looks weird. We don't do so for any
other command.
I propose the following syntax options.

WAITLSN lsn;
WAITLSN lsn TIMEOUT delay;
WAITLSN lsn INFINITE;
WAITLSN lsn NOWAIT;

That looks rather better to me. What do you think?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#23Ants Aasma
ants.aasma@eesti.ee
In reply to: Craig Ringer (#19)
Re: make async slave to wait for lsn to be replayed

On Tue, Aug 15, 2017 at 5:00 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 22 March 2017 at 01:17, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Mar 12, 2017 at 10:20 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Maybe someone can think of a clever way for an extension to insert a
wait for a user-supplied LSN *before* acquiring a snapshot so it can
work for the higher levels, or maybe the hooks should go into core
PostgreSQL so that the extension can exist as an external project not
requiring a patched PostgreSQL installation, or maybe this should be
done with new core syntax that extends transaction commands. Do other
people have views on this?

IMHO, trying to do this using a function-based interface is a really
bad idea for exactly the reasons you mention. I don't see why we'd
resist the idea of core syntax here; transactions are a core part of
PostgreSQL.

There is, of course, the question of whether making LSNs such a
user-visible thing is a good idea in the first place, but that's a
separate question from issue of what syntax for such a thing is best.

(I know this is old, but):

That ship sailed a long time ago unfortunately, they're all over
pg_stat_replication and pg_replication_slots and so on. They're already
routinely used for monitoring replication lag in bytes, waiting for a peer
to catch up, etc.

(continuing the trend of resurrecting old topics)

Exposing this interface as WAITLSN will encode the assumption that visibility
order matches LSN order. This removes any chance of fixing, for example, the
visibility order of async vs. sync transactions. Just renaming it, so that
the token is an opaque commit visibility token that just happens to be
an LSN, would still allow for progress in transaction management. For
example, making PostgreSQL distributed will likely want timestamp
and/or vector clock based visibility rules.
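
One possible reading of the suggestion above, as a rough sketch: the client is handed a token it never inspects, and only the server knows whether the token holds an LSN or something else. All type and function names below are hypothetical and appear nowhere in the patch; the timestamp variant is only an assumed example of a future visibility rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: kinds of position a commit-visibility token might carry. */
typedef enum
{
	TOKEN_LSN,
	TOKEN_TIMESTAMP
} TokenKind;

/* Hypothetical opaque token: clients pass it around but never inspect it. */
typedef struct
{
	TokenKind	kind;
	uint64_t	value;
} VisibilityToken;

/* Hypothetical summary of how far the standby has progressed. */
typedef struct
{
	uint64_t	replayed_lsn;
	uint64_t	applied_timestamp;
} StandbyProgress;

/* True if everything covered by the token is already visible on the standby. */
static bool
token_is_visible(VisibilityToken tok, StandbyProgress progress)
{
	switch (tok.kind)
	{
		case TOKEN_LSN:
			return tok.value <= progress.replayed_lsn;
		case TOKEN_TIMESTAMP:
			return tok.value <= progress.applied_timestamp;
	}
	assert(0 && "unknown token kind");
	return false;
}
```

With an interface of this shape, switching the visibility rule later would change only the server-side comparison, not the statement that clients issue.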

Regards,
Ants Aasma


#24Ivan Kartyshov
i.kartyshov@postgrespro.ru
In reply to: Alexander Korotkov (#22)
1 attachment(s)
Re: make async slave to wait for lsn to be replayed

Alexander Korotkov wrote on 2017-09-26 12:07:

I propose the following syntax options.

WAITLSN lsn;
WAITLSN lsn TIMEOUT delay;
WAITLSN lsn INFINITE;
WAITLSN lsn NOWAIT;

That looks rather better to me. What do you think?

I agree with you; the syntax looks better now.
A new patch is attached to this mail.
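
For reference, the grammar in the attached patch reduces all forms of the statement to one integer `delay` field in the parse node: 0 for an unlimited wait, the given value for TIMEOUT, and 1 for NOWAIT (the attached patch spells the unlimited form INFINITELY). A minimal standalone sketch of that mapping; the enum and function names are hypothetical and not taken from the patch:

```c
#include <assert.h>

/* Hypothetical enum naming the four forms of the statement. */
typedef enum
{
	WAIT_DEFAULT,				/* WAITLSN lsn */
	WAIT_TIMEOUT,				/* WAITLSN lsn TIMEOUT delay */
	WAIT_INFINITE,				/* WAITLSN lsn INFINITE */
	WAIT_NOWAIT					/* WAITLSN lsn NOWAIT */
} WaitForm;

/*
 * Return the delay in milliseconds that would be stored in the parse node:
 * 0 means wait without limit, 1 stands in for "do not wait".
 */
static int
wait_form_to_delay(WaitForm form, int timeout_ms)
{
	switch (form)
	{
		case WAIT_TIMEOUT:
			assert(timeout_ms >= 0);
			return timeout_ms;
		case WAIT_NOWAIT:
			return 1;
		case WAIT_DEFAULT:
		case WAIT_INFINITE:
			return 0;
	}
	return 0;
}
```

Because `WaitLSNUtility()` only arms the timeout when `delay > 0`, `TIMEOUT 0` would behave like an unlimited wait under this encoding, and NOWAIT is in effect a 1 ms wait rather than an immediate check.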

Ants Aasma wrote on 2017-09-26 13:00:

Exposing this interface as WAITLSN will encode the assumption that visibility
order matches LSN order. This removes any chance of fixing, for example, the
visibility order of async vs. sync transactions. Just renaming it, so that
the token is an opaque commit visibility token that just happens to be
an LSN, would still allow for progress in transaction management. For
example, making PostgreSQL distributed will likely want timestamp
and/or vector clock based visibility rules.

I'm sorry, I did not understand exactly what you meant.
Could you please explain this in more detail?

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

waitlsn_11dev_v7.patch (text/x-diff)
commit 217f842726531edb1b0056a5c5727ab01bab7f9b
Author: i.kartyshov <i.kartyshov@postgrespro.com>
Date:   Mon Oct 23 12:08:59 2017 +0300

    Cherry picked and ported 11dev

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 01acc2e..6792eb0 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -181,6 +181,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY update             SYSTEM "update.sgml">
 <!ENTITY vacuum             SYSTEM "vacuum.sgml">
 <!ENTITY values             SYSTEM "values.sgml">
+<!ENTITY waitlsn            SYSTEM "waitlsn.sgml">
 
 <!-- applications and utilities -->
 <!ENTITY clusterdb          SYSTEM "clusterdb.sgml">
diff --git a/doc/src/sgml/ref/waitlsn.sgml b/doc/src/sgml/ref/waitlsn.sgml
new file mode 100644
index 0000000..6f389ca
--- /dev/null
+++ b/doc/src/sgml/ref/waitlsn.sgml
@@ -0,0 +1,144 @@
+<!--
+doc/src/sgml/ref/waitlsn.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-WAITLSN">
+ <indexterm zone="sql-waitlsn">
+  <primary>WAITLSN</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>WAITLSN</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>WAITLSN</refname>
+  <refpurpose>wait for the target <acronym>LSN</> to be replayed</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+WAITLSN '<replaceable class="PARAMETER">LSN</replaceable>' [ INFINITELY ]
+WAITLSN '<replaceable class="PARAMETER">LSN</replaceable>' TIMEOUT <replaceable class="PARAMETER">wait_time</replaceable>
+WAITLSN '<replaceable class="PARAMETER">LSN</replaceable>' NOWAIT
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>WAITLSN</command> is an interprocess communication mechanism that waits
+   for the target log sequence number (<acronym>LSN</>) on standby in <productname>&productname;</productname>
+   databases with master-standby asynchronous replication. When run with the
+   <replaceable>LSN</replaceable> option, the <command>WAITLSN</command> command
+   waits for the specified <acronym>LSN</> to be replayed. By default, wait
+   time is unlimited. Waiting can be interrupted using <literal>Ctrl+C</>, or
+   by shutting down the <literal>postgres</> server. You can also limit the wait
+   time using the <option>TIMEOUT</> option, or check the target <acronym>LSN</>
+   status immediately using the <option>NOWAIT</> option.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="PARAMETER">LSN</replaceable></term>
+    <listitem>
+     <para>
+	  Specify the target log sequence number to wait for.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+	<term>INFINITELY</term>
+    <listitem>
+     <para>
+	  Wait until the target <acronym>LSN</> is replayed on standby.
+	  This is an optional parameter reinforcing the default behavior.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+	<term>TIMEOUT <replaceable class="PARAMETER">wait_time</replaceable></term>
+	<listitem>
+	 <para>
+	  Limit the time to wait for the target <acronym>LSN</> to be replayed.
+	  The specified <replaceable>wait_time</replaceable> must be an integer
+	  and is measured in milliseconds. If the time expires first, the
+	  command reports that the <acronym>LSN</> was not reached.
+	 </para>
+	</listitem>
+   </varlistentry>
+
+   <varlistentry>
+	<term>NOWAIT</term>
+	<listitem>
+	 <para>
+	  Report whether the target <acronym>LSN</> has been replayed already,
+	  without any waiting.
+	 </para>
+	</listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Run <literal>WAITLSN</> from <application>psql</application>,
+   limiting wait time to 10000 milliseconds:
+
+<screen>
+WAITLSN '0/3F07A6B1' TIMEOUT 10000;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-------------
+ f
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Wait until the specified <acronym>LSN</> is replayed:
+<screen>
+WAITLSN '0/3F07A611';
+LSN reached
+-------------
+ t
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Limit <acronym>LSN</> wait time to 500000 milliseconds, and then cancel the command:
+<screen>
+WAITLSN '0/3F0FF791' TIMEOUT 500000;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-------------
+ f
+(1 row)
+</screen>
+</para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>WAITLSN</command> statement in the SQL
+   standard.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 9000b3a..0c5951a 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -209,6 +209,7 @@
    &update;
    &vacuum;
    &values;
+   &waitlsn;
 
  </reference>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dd028a1..117cc9b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -149,7 +150,6 @@ const struct config_enum_entry sync_method_options[] = {
 	{NULL, 0, false}
 };
 
-
 /*
  * Although only "on", "off", and "always" are documented,
  * we accept all the likely variants of "on" and "off".
@@ -7312,6 +7312,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * After update lastReplayedEndRecPtr set Latches in SHMEM array
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitLSN())
+				{
+
+					WaitLSNSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 4a6c99e..0d10117 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -20,6 +20,6 @@ OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
 	policy.o portalcmds.o prepare.o proclang.o publicationcmds.o \
 	schemacmds.o seclabel.o sequence.o statscmds.o subscriptioncmds.o \
 	tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o user.o \
-	vacuum.o vacuumlazy.o variable.o view.o
+	vacuum.o vacuumlazy.o variable.o view.o waitlsn.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index f7de742..cdeddfc 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -141,7 +141,6 @@
 #include "utils/timestamp.h"
 #include "utils/tqual.h"
 
-
 /*
  * Maximum size of a NOTIFY payload, including terminating NULL.  This
  * must be kept small enough so that a notification message fits on one
diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
new file mode 100644
index 0000000..db2f549
--- /dev/null
+++ b/src/backend/commands/waitlsn.c
@@ -0,0 +1,273 @@
+/*-------------------------------------------------------------------------
+ *
+ * waitlsn.c
+ *	  WaitLSN statement: WAITLSN
+ *
+ * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2017, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/waitlsn.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/*
+ * -------------------------------------------------------------------------
+ * Wait for an LSN to be replayed on the standby
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "fmgr.h"
+#include "pgstat.h"
+#include "utils/pg_lsn.h"
+#include "storage/latch.h"
+#include "miscadmin.h"
+#include "storage/spin.h"
+#include "storage/backendid.h"
+#include "access/xact.h"
+#include "storage/shmem.h"
+#include "storage/ipc.h"
+#include "utils/timestamp.h"
+#include "storage/pmsignal.h"
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "commands/waitlsn.h"
+#include "storage/proc.h"
+#include "access/transam.h"
+#include "funcapi.h"
+#include "catalog/pg_type.h"
+#include "utils/builtins.h"
+
+/* Latches Own-DisownLatch and AbortCallBack */
+static uint32 WaitLSNShmemSize(void);
+static void WLDisownLatchAbort(XactEvent event, void *arg);
+static void WLOwnLatch(XLogRecPtr trg_lsn);
+static void WLDisownLatch(void);
+
+void		_PG_init(void);
+
+/* Shared memory structures */
+typedef struct
+{
+	int					pid;
+	volatile slock_t	slock;
+	Latch				latch;
+	XLogRecPtr			trg_lsn;
+} BIDLatch;
+
+typedef struct
+{
+	char			dummy;			/* TODO: remove */
+	int				backend_maxid;
+	XLogRecPtr		min_lsn;
+	BIDLatch		l_arr[FLEXIBLE_ARRAY_MEMBER];
+} GlobState;
+
+static volatile GlobState  *state;
+bool						is_latch_owned = false;
+
+/* Take Latch for current backend at the beginning of WAITLSN */
+static void
+WLOwnLatch(XLogRecPtr trg_lsn)
+{
+	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+	OwnLatch(&state->l_arr[MyBackendId].latch);
+	is_latch_owned = true;
+
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->l_arr[MyBackendId].pid = MyProcPid;
+	state->l_arr[MyBackendId].trg_lsn = trg_lsn;
+	SpinLockRelease(&state->l_arr[MyBackendId].slock);
+
+	if (trg_lsn < state->min_lsn)
+		state->min_lsn = trg_lsn;
+}
+
+/* Release Latch for current backend at the end of WAITLSN */
+static void
+WLDisownLatch(void)
+{
+	int i;
+	XLogRecPtr trg_lsn = state->l_arr[MyBackendId].trg_lsn;
+
+	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+	DisownLatch(&state->l_arr[MyBackendId].latch);
+	is_latch_owned = false;
+	state->l_arr[MyBackendId].pid = 0;
+	state->l_arr[MyBackendId].trg_lsn = InvalidXLogRecPtr;
+
+	/* If we held the minimum, recompute state->min_lsn from the remaining waiters */
+	if (state->min_lsn == trg_lsn)
+	{
+		state->min_lsn = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->l_arr[i].trg_lsn != InvalidXLogRecPtr &&
+				state->l_arr[i].trg_lsn < state->min_lsn)
+				state->min_lsn = state->l_arr[i].trg_lsn;
+	}
+
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MaxConnections+1); i >=2; i--)
+			if (state->l_arr[i].pid != 0)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->l_arr[MyBackendId].slock);
+}
+
+/* CallBack function on abort*/
+static void
+WLDisownLatchAbort(XactEvent event, void *arg)
+{
+	if (is_latch_owned && (event == XACT_EVENT_PARALLEL_ABORT ||
+						   event == XACT_EVENT_ABORT))
+	{
+		WLDisownLatch();
+	}
+}
+
+/* Module load callback */
+void
+_PG_init(void)
+{
+	if (!IsUnderPostmaster)
+		RegisterXactCallback(WLDisownLatchAbort, NULL);
+}
+
+/* Get size of shared memory needed to hold GlobState */
+static uint32
+WaitLSNShmemSize(void)
+{
+	return offsetof(GlobState, l_arr) + sizeof(BIDLatch) * (MaxConnections+1);
+}
+
+/* Init array of Latches in shared memory */
+void
+WaitLSNShmemInit(void)
+{
+	bool	found;
+	uint32	i;
+
+	state = (GlobState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitLSNShmemSize(),
+										  &found);
+	if (!found)
+	{
+		for (i = 0; i < (MaxConnections+1); i++)
+		{
+			state->l_arr[i].pid = 0;
+			state->l_arr[i].trg_lsn = InvalidXLogRecPtr;
+			SpinLockInit(&state->l_arr[i].slock);
+			InitSharedLatch(&state->l_arr[i].latch);
+		}
+		state->backend_maxid = 0;
+		state->min_lsn = PG_UINT64_MAX;
+	}
+}
+
+/* Set the latch of every waiter whose target LSN has now been replayed */
+void
+WaitLSNSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32 i;
+
+	for (i = 2; i <= state->backend_maxid; i++)
+	{
+		if (state->l_arr[i].trg_lsn != 0)
+		{
+		SpinLockAcquire(&state->l_arr[i].slock);
+		if (state->l_arr[i].trg_lsn <= cur_lsn)
+			SetLatch(&state->l_arr[i].latch);
+		SpinLockRelease(&state->l_arr[i].slock);
+		}
+	}
+}
+
+/* Get minimal LSN that will be next */
+XLogRecPtr
+GetMinWaitLSN(void)
+{
+	return state->min_lsn;
+}
+
+/*
+ * On WAITLSN own latch and wait till LSN is replayed, Postmaster death, interruption
+ * or timeout.
+ */
+void
+WaitLSNUtility(const char *lsn, const int delay, DestReceiver *dest)
+{
+	XLogRecPtr		trg_lsn = DatumGetLSN(DirectFunctionCall1(pg_lsn_in, CStringGetDatum(lsn)));
+	XLogRecPtr		cur_lsn;
+	int				latch_events;
+	uint64			tdelay = delay;
+	long			secs;
+	int				microsecs;
+	TimestampTz		timer = GetCurrentTimestamp();
+	TupOutputState	*tstate;
+	TupleDesc		tupdesc;
+	char		   *value = "f";
+
+	if (delay > 0)
+		latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;
+	else
+		latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;
+
+	WLOwnLatch(trg_lsn);
+
+	for (;;)
+	{
+		cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN had been Replayed */
+		if (trg_lsn <= cur_lsn)
+			break;
+
+		/* If the postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		/* If Delay time is over */
+		if (latch_events & WL_TIMEOUT)
+		{
+			if (TimestampDifferenceExceeds(timer,GetCurrentTimestamp(),delay))
+				break;
+			TimestampDifference(timer,GetCurrentTimestamp(),&secs, &microsecs);
+			tdelay = delay - (secs*1000 + microsecs/1000);
+		}
+
+		MyPgXact->xmin = InvalidTransactionId;
+		WaitLatch(&state->l_arr[MyBackendId].latch, latch_events, tdelay, WAIT_EVENT_CLIENT_READ);
+		ResetLatch(&state->l_arr[MyBackendId].latch);
+
+		/* If an interrupt is pending, disown the latch before processing it */
+		if (InterruptPending)
+		{
+			WLDisownLatch();
+			ProcessInterrupts();
+		}
+
+	}
+
+	WLDisownLatch();
+
+	if (trg_lsn > cur_lsn)
+		elog(NOTICE,"LSN is not reached. Try to increase wait time.");
+	else
+		value = "t";
+
+	/* need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1, false);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+	/* prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc);
+	/* Send it */
+	do_text_output_oneline(tstate, value);
+	end_tup_output(tstate);
+}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4c83a63..a149b54 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -275,7 +275,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		SecLabelStmt SelectStmt TransactionStmt TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
 		VariableResetStmt VariableSetStmt VariableShowStmt
-		ViewStmt CheckPointStmt CreateConversionStmt
+		ViewStmt WaitLSNStmt CheckPointStmt CreateConversionStmt
 		DeallocateStmt PrepareStmt ExecuteStmt
 		DropOwnedStmt ReassignOwnedStmt
 		AlterTSConfigurationStmt AlterTSDictionaryStmt
@@ -322,7 +322,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <list>	OptSchemaEltList
 
 %type <boolean> TriggerForSpec TriggerForType
-%type <ival>	TriggerActionTime
+%type <ival>	TriggerActionTime WaitDelay
 %type <list>	TriggerEvents TriggerOneEvent
 %type <value>	TriggerFuncArg
 %type <node>	TriggerWhen
@@ -636,7 +636,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	HANDLER HAVING HEADER_P HOLD HOUR_P
 
 	IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P
-	INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+	INCLUDING INCREMENT INDEX INDEXES INFINITELY INHERIT INHERITS INITIALLY INLINE_P
 	INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
 	INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
 
@@ -675,7 +675,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	SUBSCRIPTION SUBSTRING SYMMETRIC SYSID SYSTEM_P
 
 	TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
-	TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM TREAT TRIGGER TRIM TRUE_P
+	TIME TIMEOUT TIMESTAMP TO TRAILING TRANSACTION TRANSFORM TREAT TRIGGER TRIM TRUE_P
 	TRUNCATE TRUSTED TYPE_P TYPES_P
 
 	UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
@@ -684,7 +684,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAITLSN WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLNAMESPACES
 	XMLPARSE XMLPI XMLROOT XMLSERIALIZE XMLTABLE
@@ -935,6 +936,7 @@ stmt :
 			| VariableSetStmt
 			| VariableShowStmt
 			| ViewStmt
+			| WaitLSNStmt
 			| /*EMPTY*/
 				{ $$ = NULL; }
 		;
@@ -13831,6 +13833,44 @@ frame_bound:
 				}
 		;
 
+/*****************************************************************************
+ *
+ *		QUERY:
+ *				WAITLSN <LSN> can appear as a query-level command
+ *
+ *
+ *****************************************************************************/
+
+WaitLSNStmt:
+			WAITLSN Sconst
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = 0;
+					$$ = (Node *)n;
+				}
+			| WAITLSN Sconst TIMEOUT Iconst
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = $4;
+					$$ = (Node *)n;
+				}
+			| WAITLSN Sconst INFINITELY
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = 0;
+					$$ = (Node *)n;
+				}
+			| WAITLSN Sconst NOWAIT
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = 1;
+					$$ = (Node *)n;
+				}
+		;
 
 /*
  * Supporting nonterminals for expressions.
@@ -14705,6 +14745,7 @@ unreserved_keyword:
 			| INCREMENT
 			| INDEX
 			| INDEXES
+			| INFINITELY
 			| INHERIT
 			| INHERITS
 			| INLINE_P
@@ -14843,6 +14884,7 @@ unreserved_keyword:
 			| TEMPLATE
 			| TEMPORARY
 			| TEXT_P
+			| TIMEOUT
 			| TRANSACTION
 			| TRANSFORM
 			| TRIGGER
@@ -14868,6 +14910,7 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAITLSN
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2d1ed14..932136f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -271,6 +272,11 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	AsyncShmemInit();
 	BackendRandomShmemInit();
 
+	/*
+	 * Init array of Latches  in SHMEM for WAITLSN
+	 */
+	WaitLSNShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 82a707a..544baeb 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -56,6 +56,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -923,6 +924,20 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				break;
 			}
 
+		case T_WaitLSNStmt:
+			{
+				WaitLSNStmt *stmt = (WaitLSNStmt *) parsetree;
+				if (!RecoveryInProgress())
+				{
+					ereport(ERROR,(errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
+							errmsg("cannot execute %s not during recovery",
+							"WaitLSN")));
+				}
+				else
+					WaitLSNUtility(stmt->lsn, stmt->delay, dest);
+			}
+			break;
+
 		default:
 			/* All other statement types have event trigger support */
 			ProcessUtilitySlow(pstate, pstmt, queryString,
@@ -2481,6 +2496,10 @@ CreateCommandTag(Node *parsetree)
 			tag = "NOTIFY";
 			break;
 
+		case T_WaitLSNStmt:
+			tag = "WAITLSN";
+			break;
+
 		case T_ListenStmt:
 			tag = "LISTEN";
 			break;
@@ -3104,6 +3123,10 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
+		case T_WaitLSNStmt:
+			lev = LOGSTMT_ALL;
+			break;
+
 		case T_ListenStmt:
 			lev = LOGSTMT_ALL;
 			break;
diff --git a/src/include/commands/waitlsn.h b/src/include/commands/waitlsn.h
new file mode 100644
index 0000000..49cf9e8
--- /dev/null
+++ b/src/include/commands/waitlsn.h
@@ -0,0 +1,22 @@
+/*-------------------------------------------------------------------------
+ *
+ * waitlsn.h
+ *	  WaitLSN notification: WAITLSN
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2016, Regents of PostgresPRO
+ *
+ * src/include/commands/waitlsn.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAITLSN_H
+#define WAITLSN_H
+#include "tcop/dest.h"
+
+extern void WaitLSNUtility(const char *lsn, const int delay, DestReceiver *dest);
+extern void WaitLSNShmemInit(void);
+extern void WaitLSNSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWaitLSN(void);
+
+#endif   /* WAITLSN_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index ffeeb49..201677b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -479,6 +479,7 @@ typedef enum NodeTag
 	T_DropReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
+	T_WaitLSNStmt,
 	T_SQLCmd,
 
 	/*
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 732e5d6..55ffda8 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3446,4 +3446,16 @@ typedef struct DropSubscriptionStmt
 	DropBehavior behavior;		/* RESTRICT or CASCADE behavior */
 } DropSubscriptionStmt;
 
+/* ----------------------
+ *		WaitLSN Statement
+ * ----------------------
+ */
+typedef struct WaitLSNStmt
+{
+	NodeTag		type;
+	char	   *lsn;			/* Target LSN to wait for */
+	int			delay;			/* Delay to wait for LSN */
+	bool		nowait;			/* No wait for LSN, just report the result */
+} WaitLSNStmt;
+
 #endif							/* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index f50e45e..618cdb2 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -198,6 +198,7 @@ PG_KEYWORD("including", INCLUDING, UNRESERVED_KEYWORD)
 PG_KEYWORD("increment", INCREMENT, UNRESERVED_KEYWORD)
 PG_KEYWORD("index", INDEX, UNRESERVED_KEYWORD)
 PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD)
+PG_KEYWORD("infinitely", INFINITELY, UNRESERVED_KEYWORD)
 PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("inherits", INHERITS, UNRESERVED_KEYWORD)
 PG_KEYWORD("initially", INITIALLY, RESERVED_KEYWORD)
@@ -394,6 +395,7 @@ PG_KEYWORD("temporary", TEMPORARY, UNRESERVED_KEYWORD)
 PG_KEYWORD("text", TEXT_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("then", THEN, RESERVED_KEYWORD)
 PG_KEYWORD("time", TIME, COL_NAME_KEYWORD)
+PG_KEYWORD("timeout", TIMEOUT, UNRESERVED_KEYWORD)
 PG_KEYWORD("timestamp", TIMESTAMP, COL_NAME_KEYWORD)
 PG_KEYWORD("to", TO, RESERVED_KEYWORD)
 PG_KEYWORD("trailing", TRAILING, RESERVED_KEYWORD)
@@ -433,6 +435,7 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("waitlsn", WAITLSN, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
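
Editorial sketch, not part of the patch: the grammar above hands the LSN to the backend as a plain string constant (Sconst), and waitlsn.h declares it as `const char *lsn`. An LSN such as '0/3F07A6B1' is written as two hexadecimal numbers separated by a slash, holding the high and low 32 bits of a 64-bit WAL position. The helper name `parse_lsn` below is hypothetical and only illustrates that text format; inside the server the conversion is done by the existing pg_lsn input function, not by code like this.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (an assumption for illustration, not patch code):
 * parse an LSN written as "hi/lo" (two hexadecimal numbers) into one
 * 64-bit position.  Returns 1 on success, 0 on malformed input.
 * Each half is assumed to fit in 32 bits.
 */
static int
parse_lsn(const char *text, uint64_t *out)
{
	unsigned int hi;
	unsigned int lo;
	int			consumed = 0;

	assert(text != NULL && out != NULL);

	/* %n records how many characters were consumed, to reject trailing junk */
	if (sscanf(text, "%X/%X%n", &hi, &lo, &consumed) != 2)
		return 0;
	if (text[consumed] != '\0')
		return 0;

	/* High half goes into the upper 32 bits, low half into the lower 32 */
	*out = ((uint64_t) hi << 32) | lo;
	return 1;
}
```

Because the result is a single 64-bit integer, "has this LSN been replayed?" reduces to an ordinary `<=` comparison between the target and the current replay position.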
#25Ivan Kartyshov
i.kartyshov@postgrespro.ru
In reply to: Ivan Kartyshov (#24)
1 attachment(s)
Re: make async slave to wait for lsn to be replayed

A few more small code cleanup changes.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

waitlsn_11dev_v7edit.patch (text/x-diff)
commit 217f842726531edb1b0056a5c5727ab01bab7f9b
Author: i.kartyshov <i.kartyshov@postgrespro.com>
Date:   Mon Oct 23 12:08:59 2017 +0300

    Cherry picked and ported 11dev

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 01acc2e..6792eb0 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -181,6 +181,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY update             SYSTEM "update.sgml">
 <!ENTITY vacuum             SYSTEM "vacuum.sgml">
 <!ENTITY values             SYSTEM "values.sgml">
+<!ENTITY waitlsn            SYSTEM "waitlsn.sgml">
 
 <!-- applications and utilities -->
 <!ENTITY clusterdb          SYSTEM "clusterdb.sgml">
diff --git a/doc/src/sgml/ref/waitlsn.sgml b/doc/src/sgml/ref/waitlsn.sgml
new file mode 100644
index 0000000..6f389ca
--- /dev/null
+++ b/doc/src/sgml/ref/waitlsn.sgml
@@ -0,0 +1,144 @@
+<!--
+doc/src/sgml/ref/waitlsn.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-WAITLSN">
+ <indexterm zone="sql-waitlsn">
+  <primary>WAITLSN</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>WAITLSN</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>WAITLSN</refname>
+  <refpurpose>wait for the target <acronym>LSN</> to be replayed</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+WAITLSN '<replaceable class="PARAMETER">LSN</replaceable>' [ INFINITELY ]
+WAITLSN '<replaceable class="PARAMETER">LSN</replaceable>' TIMEOUT <replaceable class="PARAMETER">wait_time</replaceable>
+WAITLSN '<replaceable class="PARAMETER">LSN</replaceable>' NOWAIT
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>WAITLSN</command> provides an interprocess communication mechanism to wait for the target log sequence
+   number (<acronym>LSN</>) on standby in <productname>&productname;</productname>
+   databases with master-standby asynchronous replication. When run with the
+   <replaceable>LSN</replaceable> option, the <command>WAITLSN</command> command
+   waits for the specified <acronym>LSN</> to be replayed. By default, wait
+   time is unlimited. Waiting can be interrupted using <literal>Ctrl+C</>, or
+   by shutting down the <literal>postgres</> server. You can also limit the wait
+   time using the <option>TIMEOUT</> option, or check the target <acronym>LSN</>
+   status immediately using the <option>NOWAIT</> option.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="PARAMETER">LSN</replaceable></term>
+    <listitem>
+     <para>
+	  Specify the target log sequence number to wait for.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+	<term>INFINITELY</term>
+    <listitem>
+     <para>
+	  Wait until the target <acronym>LSN</> is replayed on standby.
+	  This is an optional parameter reinforcing the default behavior.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+   <varlistentry>
+	<term>TIMEOUT <replaceable class="PARAMETER">wait_time</replaceable></term>
+	<listitem>
+	 <para>
+	  Limit the time to wait for the LSN to be replayed.
+	  The specified <replaceable>wait_time</replaceable> must be an integer
+	  and is measured in milliseconds.
+	 </para>
+	</listitem>
+   </varlistentry>
+
+   <varlistentry>
+	<term>NOWAIT</term>
+	<listitem>
+	 <para>
+	  Report whether the target <acronym>LSN</> has been replayed already,
+	  without any waiting.
+	 </para>
+	</listitem>
+   </varlistentry>
+  </variablelist>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Run <literal>WAITLSN</> from <application>psql</application>,
+   limiting wait time to 10000 milliseconds:
+
+<screen>
+WAITLSN '0/3F07A6B1' TIMEOUT 10000;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-------------
+ f
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Wait until the specified <acronym>LSN</> is replayed:
+<screen>
+WAITLSN '0/3F07A611';
+LSN reached
+-------------
+ t
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Limit <acronym>LSN</> wait time to 500000 milliseconds, and then cancel the command:
+<screen>
+WAITLSN '0/3F0FF791' TIMEOUT 500000;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-------------
+ f
+(1 row)
+</screen>
+</para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>WAITLSN</command> statement in the SQL
+   standard.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 9000b3a..0c5951a 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -209,6 +209,7 @@
    &update;
    &vacuum;
    &values;
+   &waitlsn;
 
  </reference>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dd028a1..117cc9b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -149,7 +150,6 @@ const struct config_enum_entry sync_method_options[] = {
 	{NULL, 0, false}
 };
 
-
 /*
  * Although only "on", "off", and "always" are documented,
  * we accept all the likely variants of "on" and "off".
@@ -7312,6 +7312,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * After update lastReplayedEndRecPtr set Latches in SHMEM array
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitLSN())
+				{
+
+					WaitLSNSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 4a6c99e..0d10117 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -20,6 +20,6 @@ OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
 	policy.o portalcmds.o prepare.o proclang.o publicationcmds.o \
 	schemacmds.o seclabel.o sequence.o statscmds.o subscriptioncmds.o \
 	tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o user.o \
-	vacuum.o vacuumlazy.o variable.o view.o
+	vacuum.o vacuumlazy.o variable.o view.o waitlsn.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index f7de742..cdeddfc 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -141,7 +141,6 @@
 #include "utils/timestamp.h"
 #include "utils/tqual.h"
 
-
 /*
  * Maximum size of a NOTIFY payload, including terminating NULL.  This
  * must be kept small enough so that a notification message fits on one
diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
new file mode 100644
index 0000000..db2f549
--- /dev/null
+++ b/src/backend/commands/waitlsn.c
@@ -0,0 +1,273 @@
+/*-------------------------------------------------------------------------
+ *
+ * waitlsn.c
+ *	  WaitLSN statement: WAITLSN
+ *
+ * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2017, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/waitlsn.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/*
+ * -------------------------------------------------------------------------
+ * Wait for an LSN to be replayed on the standby
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "fmgr.h"
+#include "pgstat.h"
+#include "utils/pg_lsn.h"
+#include "storage/latch.h"
+#include "miscadmin.h"
+#include "storage/spin.h"
+#include "storage/backendid.h"
+#include "access/xact.h"
+#include "storage/shmem.h"
+#include "storage/ipc.h"
+#include "utils/timestamp.h"
+#include "storage/pmsignal.h"
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "commands/waitlsn.h"
+#include "storage/proc.h"
+#include "access/transam.h"
+#include "funcapi.h"
+#include "catalog/pg_type.h"
+#include "utils/builtins.h"
+
+/* Latches Own-DisownLatch and AbortCallBack */
+static uint32 WaitLSNShmemSize(void);
+static void WLDisownLatchAbort(XactEvent event, void *arg);
+static void WLOwnLatch(XLogRecPtr trg_lsn);
+static void WLDisownLatch(void);
+
+void		_PG_init(void);
+
+/* Shared memory structures */
+typedef struct
+{
+	int					pid;
+	volatile slock_t	slock;
+	Latch				latch;
+	XLogRecPtr			trg_lsn;
+} BIDLatch;
+
+
+typedef struct
+{
+	int				backend_maxid;
+	XLogRecPtr		min_lsn;
+	BIDLatch		l_arr[FLEXIBLE_ARRAY_MEMBER];
+} GlobState;
+
+static volatile GlobState  *state;
+bool						is_latch_owned = false;
+
+/* Take Latch for current backend at the beginning of WAITLSN */
+static void
+WLOwnLatch(XLogRecPtr trg_lsn)
+{
+	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+	OwnLatch(&state->l_arr[MyBackendId].latch);
+	is_latch_owned = true;
+
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->l_arr[MyBackendId].pid = MyProcPid;
+	state->l_arr[MyBackendId].trg_lsn = trg_lsn;
+	SpinLockRelease(&state->l_arr[MyBackendId].slock);
+
+	if (trg_lsn < state->min_lsn)
+		state->min_lsn = trg_lsn;
+}
+
+/* Release Latch for current backend at the end of WAITLSN */
+static void
+WLDisownLatch(void)
+{
+	int i;
+	XLogRecPtr trg_lsn = state->l_arr[MyBackendId].trg_lsn;
+
+	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+	DisownLatch(&state->l_arr[MyBackendId].latch);
+	is_latch_owned = false;
+	state->l_arr[MyBackendId].pid = 0;
+	state->l_arr[MyBackendId].trg_lsn = InvalidXLogRecPtr;
+
+	/* Update state->min_lsn only if necessary, choosing the next min_lsn */
+	if (state->min_lsn == trg_lsn)
+	{
+		state->min_lsn = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->l_arr[i].trg_lsn != InvalidXLogRecPtr &&
+				state->l_arr[i].trg_lsn < state->min_lsn)
+				state->min_lsn = state->l_arr[i].trg_lsn;
+	}
+
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MaxConnections+1); i >=2; i--)
+			if (state->l_arr[i].pid != 0)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->l_arr[MyBackendId].slock);
+}
+
+/* Callback function on abort */
+static void
+WLDisownLatchAbort(XactEvent event, void *arg)
+{
+	if (is_latch_owned && (event == XACT_EVENT_PARALLEL_ABORT ||
+						   event == XACT_EVENT_ABORT))
+	{
+		WLDisownLatch();
+	}
+}
+
+/* Module load callback */
+void
+_PG_init(void)
+{
+	if (!IsUnderPostmaster)
+		RegisterXactCallback(WLDisownLatchAbort, NULL);
+}
+
+/* Get size of shared memory needed to hold GlobState */
+static uint32
+WaitLSNShmemSize(void)
+{
+	return offsetof(GlobState, l_arr) + sizeof(BIDLatch) * (MaxConnections+1);
+}
+
+/* Init array of Latches in shared memory */
+void
+WaitLSNShmemInit(void)
+{
+	bool	found;
+	uint32	i;
+
+	state = (GlobState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitLSNShmemSize(),
+										  &found);
+	if (!found)
+	{
+		for (i = 0; i < (MaxConnections+1); i++)
+		{
+			state->l_arr[i].pid = 0;
+			state->l_arr[i].trg_lsn = InvalidXLogRecPtr;
+			SpinLockInit(&state->l_arr[i].slock);
+			InitSharedLatch(&state->l_arr[i].latch);
+		}
+		state->backend_maxid = 0;
+		state->min_lsn = PG_UINT64_MAX;
+	}
+}
+
+/* Set Latches in shared memory for waiters whose target LSN has been replayed */
+void
+WaitLSNSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32 i;
+
+	for (i = 2; i <= state->backend_maxid; i++)
+	{
+		if (state->l_arr[i].trg_lsn != 0)
+		{
+		SpinLockAcquire(&state->l_arr[i].slock);
+		if (state->l_arr[i].trg_lsn <= cur_lsn)
+			SetLatch(&state->l_arr[i].latch);
+		SpinLockRelease(&state->l_arr[i].slock);
+		}
+	}
+}
+
+/* Get minimal LSN that will be next */
+XLogRecPtr
+GetMinWaitLSN(void)
+{
+	return state->min_lsn;
+}
+
+/*
+ * On WAITLSN own latch and wait till LSN is replayed, Postmaster death, interruption
+ * or timeout.
+ */
+void
+WaitLSNUtility(const char *lsn, const int delay, DestReceiver *dest)
+{
+	XLogRecPtr		trg_lsn = DatumGetLSN(DirectFunctionCall1(pg_lsn_in, CStringGetDatum(lsn)));
+	XLogRecPtr		cur_lsn;
+	int				latch_events;
+	uint64			tdelay = delay;
+	long			secs;
+	int				microsecs;
+	TimestampTz		timer = GetCurrentTimestamp();
+	TupOutputState	*tstate;
+	TupleDesc		tupdesc;
+	char		   *value = "f";
+
+	if (delay > 0)
+		latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;
+	else
+		latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;
+
+	WLOwnLatch(trg_lsn);
+
+	for (;;)
+	{
+		cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN had been Replayed */
+		if (trg_lsn <= cur_lsn)
+			break;
+
+		/* If the postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		/* If Delay time is over */
+		if (latch_events & WL_TIMEOUT)
+		{
+			if (TimestampDifferenceExceeds(timer,GetCurrentTimestamp(),delay))
+				break;
+			TimestampDifference(timer,GetCurrentTimestamp(),&secs, &microsecs);
+			tdelay = delay - (secs*1000 + microsecs/1000);
+		}
+
+		MyPgXact->xmin = InvalidTransactionId;
+		WaitLatch(&state->l_arr[MyBackendId].latch, latch_events, tdelay, WAIT_EVENT_CLIENT_READ);
+		ResetLatch(&state->l_arr[MyBackendId].latch);
+
+		/* CHECK_FOR_INTERRUPTS if they comes then disown latch current */
+		if (InterruptPending)
+		{
+			WLDisownLatch();
+			ProcessInterrupts();
+		}
+
+	}
+
+	WLDisownLatch();
+
+	if (trg_lsn > cur_lsn)
+		elog(NOTICE,"LSN is not reached. Try to increase wait time.");
+	else
+		value = "t";
+
+	/* need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1, false);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+	/* prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc);
+	/* Send it */
+	do_text_output_oneline(tstate, value);
+	end_tup_output(tstate);
+}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4c83a63..a149b54 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -275,7 +275,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		SecLabelStmt SelectStmt TransactionStmt TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
 		VariableResetStmt VariableSetStmt VariableShowStmt
-		ViewStmt CheckPointStmt CreateConversionStmt
+		ViewStmt WaitLSNStmt CheckPointStmt CreateConversionStmt
 		DeallocateStmt PrepareStmt ExecuteStmt
 		DropOwnedStmt ReassignOwnedStmt
 		AlterTSConfigurationStmt AlterTSDictionaryStmt
@@ -322,7 +322,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <list>	OptSchemaEltList
 
 %type <boolean> TriggerForSpec TriggerForType
-%type <ival>	TriggerActionTime
+%type <ival>	TriggerActionTime WaitDelay
 %type <list>	TriggerEvents TriggerOneEvent
 %type <value>	TriggerFuncArg
 %type <node>	TriggerWhen
@@ -636,7 +636,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	HANDLER HAVING HEADER_P HOLD HOUR_P
 
 	IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P
-	INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+	INCLUDING INCREMENT INDEX INDEXES INFINITELY INHERIT INHERITS INITIALLY INLINE_P
 	INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
 	INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
 
@@ -675,7 +675,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	SUBSCRIPTION SUBSTRING SYMMETRIC SYSID SYSTEM_P
 
 	TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
-	TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM TREAT TRIGGER TRIM TRUE_P
+	TIME TIMEOUT TIMESTAMP TO TRAILING TRANSACTION TRANSFORM TREAT TRIGGER TRIM TRUE_P
 	TRUNCATE TRUSTED TYPE_P TYPES_P
 
 	UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
@@ -684,7 +684,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAITLSN WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLNAMESPACES
 	XMLPARSE XMLPI XMLROOT XMLSERIALIZE XMLTABLE
@@ -935,6 +936,7 @@ stmt :
 			| VariableSetStmt
 			| VariableShowStmt
 			| ViewStmt
+			| WaitLSNStmt
 			| /*EMPTY*/
 				{ $$ = NULL; }
 		;
@@ -13831,6 +13833,44 @@ frame_bound:
 				}
 		;
 
+/*****************************************************************************
+ *
+ *		QUERY:
+ *				WAITLSN <LSN> can appear as a query-level command
+ *
+ *
+ *****************************************************************************/
+
+WaitLSNStmt:
+			WAITLSN Sconst
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = 0;
+					$$ = (Node *)n;
+				}
+			| WAITLSN Sconst TIMEOUT Iconst
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = $4;
+					$$ = (Node *)n;
+				}
+			| WAITLSN Sconst INFINITELY
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = 0;
+					$$ = (Node *)n;
+				}
+			| WAITLSN Sconst NOWAIT
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = 1;
+					$$ = (Node *)n;
+				}
+		;
 
 /*
  * Supporting nonterminals for expressions.
@@ -14705,6 +14745,7 @@ unreserved_keyword:
 			| INCREMENT
 			| INDEX
 			| INDEXES
+			| INFINITELY
 			| INHERIT
 			| INHERITS
 			| INLINE_P
@@ -14843,6 +14884,7 @@ unreserved_keyword:
 			| TEMPLATE
 			| TEMPORARY
 			| TEXT_P
+			| TIMEOUT
 			| TRANSACTION
 			| TRANSFORM
 			| TRIGGER
@@ -14868,6 +14910,7 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAITLSN
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2d1ed14..932136f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -271,6 +272,11 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	AsyncShmemInit();
 	BackendRandomShmemInit();
 
+	/*
+	 * Init array of Latches  in SHMEM for WAITLSN
+	 */
+	WaitLSNShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 82a707a..544baeb 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -56,6 +56,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -923,6 +924,20 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				break;
 			}
 
+		case T_WaitLSNStmt:
+			{
+				WaitLSNStmt *stmt = (WaitLSNStmt *) parsetree;
+				if (!RecoveryInProgress())
+				{
+					ereport(ERROR,(errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
+							errmsg("cannot execute %s not during recovery",
+							"WaitLSN")));
+				}
+				else
+					WaitLSNUtility(stmt->lsn, stmt->delay, dest);
+			}
+			break;
+
 		default:
 			/* All other statement types have event trigger support */
 			ProcessUtilitySlow(pstate, pstmt, queryString,
@@ -2481,6 +2496,10 @@ CreateCommandTag(Node *parsetree)
 			tag = "NOTIFY";
 			break;
 
+		case T_WaitLSNStmt:
+			tag = "WAITLSN";
+			break;
+
 		case T_ListenStmt:
 			tag = "LISTEN";
 			break;
@@ -3104,6 +3123,10 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
+		case T_WaitLSNStmt:
+			lev = LOGSTMT_ALL;
+			break;
+
 		case T_ListenStmt:
 			lev = LOGSTMT_ALL;
 			break;
diff --git a/src/include/commands/waitlsn.h b/src/include/commands/waitlsn.h
new file mode 100644
index 0000000..49cf9e8
--- /dev/null
+++ b/src/include/commands/waitlsn.h
@@ -0,0 +1,22 @@
+/*-------------------------------------------------------------------------
+ *
+ * waitlsn.h
+ *	  WaitLSN notification: WAITLSN
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2016, Regents of PostgresPRO
+ *
+ * src/include/commands/waitlsn.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAITLSN_H
+#define WAITLSN_H
+#include "tcop/dest.h"
+
+extern void WaitLSNUtility(const char *lsn, const int delay, DestReceiver *dest);
+extern void WaitLSNShmemInit(void);
+extern void WaitLSNSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWaitLSN(void);
+
+#endif   /* WAITLSN_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index ffeeb49..201677b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -479,6 +479,7 @@ typedef enum NodeTag
 	T_DropReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
+	T_WaitLSNStmt,
 	T_SQLCmd,
 
 	/*
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 732e5d6..55ffda8 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3446,4 +3446,16 @@ typedef struct DropSubscriptionStmt
 	DropBehavior behavior;		/* RESTRICT or CASCADE behavior */
 } DropSubscriptionStmt;
 
+/* ----------------------
+ *		WaitLSN Statement
+ * ----------------------
+ */
+typedef struct WaitLSNStmt
+{
+	NodeTag		type;
+	char	   *lsn;			/* Target LSN to wait for */
+	int			delay;			/* Delay to wait for LSN */
+	bool		nowait;			/* No wait for LSN, just report the result */
+} WaitLSNStmt;
+
 #endif							/* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index f50e45e..618cdb2 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -198,6 +198,7 @@ PG_KEYWORD("including", INCLUDING, UNRESERVED_KEYWORD)
 PG_KEYWORD("increment", INCREMENT, UNRESERVED_KEYWORD)
 PG_KEYWORD("index", INDEX, UNRESERVED_KEYWORD)
 PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD)
+PG_KEYWORD("infinitely", INFINITELY, UNRESERVED_KEYWORD)
 PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("inherits", INHERITS, UNRESERVED_KEYWORD)
 PG_KEYWORD("initially", INITIALLY, RESERVED_KEYWORD)
@@ -394,6 +395,7 @@ PG_KEYWORD("temporary", TEMPORARY, UNRESERVED_KEYWORD)
 PG_KEYWORD("text", TEXT_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("then", THEN, RESERVED_KEYWORD)
 PG_KEYWORD("time", TIME, COL_NAME_KEYWORD)
+PG_KEYWORD("timeout", TIMEOUT, UNRESERVED_KEYWORD)
 PG_KEYWORD("timestamp", TIMESTAMP, COL_NAME_KEYWORD)
 PG_KEYWORD("to", TO, RESERVED_KEYWORD)
 PG_KEYWORD("trailing", TRAILING, RESERVED_KEYWORD)
@@ -433,6 +435,7 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("waitlsn", WAITLSN, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
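
Editorial sketch, not part of the patch: in WaitLSNUtility above, each pass through the loop recomputes how long the next WaitLatch() call may sleep, as the total timeout minus the time already elapsed (`secs*1000 + microsecs/1000`). The standalone helper below restates that arithmetic. The name `remaining_timeout_ms` and the clamping to zero are assumptions made for illustration; the patch instead relies on the preceding TimestampDifferenceExceeds() check to leave the loop before the difference can go negative.

```c
#include <assert.h>

/*
 * Hypothetical helper (an assumption for illustration, not patch code):
 * given the total timeout in milliseconds and the time already spent
 * waiting (split into seconds and microseconds, as TimestampDifference()
 * reports it), return how many milliseconds the next wait may sleep.
 * Returns 0 once the timeout is used up.
 */
static long
remaining_timeout_ms(long delay_ms, long elapsed_secs, int elapsed_usecs)
{
	long		elapsed_ms;

	assert(delay_ms >= 0 && elapsed_secs >= 0 && elapsed_usecs >= 0);

	/* Same conversion as in the patch: whole milliseconds, truncated */
	elapsed_ms = elapsed_secs * 1000 + elapsed_usecs / 1000;

	/* Clamp so the caller never sees a negative sleep time */
	if (elapsed_ms >= delay_ms)
		return 0;
	return delay_ms - elapsed_ms;
}
```

For example, with `TIMEOUT 10000` and 2.5 seconds already spent, the next wait would be limited to 7500 ms. Recomputing the remainder on every wake-up matters because the latch can be set many times before the target LSN is reached, and each wake-up must not restart the full timeout.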
#26Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Ivan Kartyshov (#25)
Re: make async slave to wait for lsn to be replayed

On Mon, Oct 23, 2017 at 12:42 PM, Ivan Kartyshov <i.kartyshov@postgrespro.ru> wrote:

A few more small code cleanup changes.

Despite the code cleanup, your patch still contains some random removals of
empty lines.

@@ -149,7 +150,6 @@ const struct config_enum_entry sync_method_options[] = {

{NULL, 0, false}
};

-
/*
* Although only "on", "off", and "always" are documented,
* we accept all the likely variants of "on" and "off".

@@ -141,7 +141,6 @@

#include "utils/timestamp.h"
#include "utils/tqual.h"

-
/*
* Maximum size of a NOTIFY payload, including terminating NULL. This
* must be kept small enough so that a notification message fits on one

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#27Ivan Kartyshov
i.kartyshov@postgrespro.ru
In reply to: Alexander Korotkov (#26)
1 attachment(s)
Re: make async slave to wait for lsn to be replayed

Alexander Korotkov wrote on 2017-10-23 13:19:

Despite the code cleanup, your patch still contains some random removals of
empty lines.

I reconfigured my IDE to avoid this in the future.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

waitlsn_11dev_v8.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 01acc2e..6792eb0 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -181,6 +181,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY update             SYSTEM "update.sgml">
 <!ENTITY vacuum             SYSTEM "vacuum.sgml">
 <!ENTITY values             SYSTEM "values.sgml">
+<!ENTITY waitlsn            SYSTEM "waitlsn.sgml">
 
 <!-- applications and utilities -->
 <!ENTITY clusterdb          SYSTEM "clusterdb.sgml">
diff --git a/doc/src/sgml/ref/waitlsn.sgml b/doc/src/sgml/ref/waitlsn.sgml
new file mode 100644
index 0000000..6f389ca
--- /dev/null
+++ b/doc/src/sgml/ref/waitlsn.sgml
@@ -0,0 +1,144 @@
+<!--
+doc/src/sgml/ref/waitlsn.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-WAITLSN">
+ <indexterm zone="sql-waitlsn">
+  <primary>WAITLSN</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>WAITLSN</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>WAITLSN</refname>
+  <refpurpose>wait for the target <acronym>LSN</> to be replayed</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+WAITLSN '<replaceable class="PARAMETER">LSN</replaceable>' [ INFINITELY ]
+WAITLSN '<replaceable class="PARAMETER">LSN</replaceable>' TIMEOUT <replaceable class="PARAMETER">wait_time</replaceable>
+WAITLSN '<replaceable class="PARAMETER">LSN</replaceable>' NOWAIT
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>WAITLSN</command> is an interprocess communication mechanism to wait for the target log
+   sequence number (<acronym>LSN</>) on a standby in <productname>&productname;</productname>
+   databases with master-standby asynchronous replication. When run with the
+   <replaceable>LSN</replaceable> option, the <command>WAITLSN</command> command
+   waits for the specified <acronym>LSN</> to be replayed. By default, wait
+   time is unlimited. Waiting can be interrupted using <literal>Ctrl+C</>, or
+   by shutting down the <literal>postgres</> server. You can also limit the wait
+   time using the <option>TIMEOUT</> option, or check the target <acronym>LSN</>
+   status immediately using the <option>NOWAIT</> option.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="PARAMETER">LSN</replaceable></term>
+    <listitem>
+     <para>
+	  Specify the target log sequence number to wait for.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+	<term>INFINITELY</term>
+    <listitem>
+     <para>
+	  Wait until the target <acronym>LSN</> is replayed on standby.
+	  This is an optional parameter reinforcing the default behavior.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>TIMEOUT <replaceable class="PARAMETER">wait_time</replaceable></term>
+    <listitem>
+     <para>
+      Limit the time to wait for the target <acronym>LSN</> to be replayed.
+      The specified <replaceable>wait_time</replaceable> must be an integer
+      and is measured in milliseconds. If the timeout expires first, the
+      command reports that the <acronym>LSN</> was not reached.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>NOWAIT</term>
+    <listitem>
+     <para>
+      Report whether the target <acronym>LSN</> has been replayed already,
+      without any waiting.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Run <literal>WAITLSN</> from <application>psql</application>,
+   limiting wait time to 10000 milliseconds:
+
+<screen>
+WAITLSN '0/3F07A6B1' TIMEOUT 10000;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-------------
+ f
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Wait until the specified <acronym>LSN</> is replayed:
+<screen>
+WAITLSN '0/3F07A611';
+LSN reached
+-------------
+ t
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Limit <acronym>LSN</> wait time to 500000 milliseconds, and then cancel the command:
+<screen>
+WAITLSN '0/3F0FF791' TIMEOUT 500000;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-------------
+ f
+(1 row)
+</screen>
+</para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>WAITLSN</command> statement in the SQL
+   standard.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 9000b3a..0c5951a 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -209,6 +209,7 @@
    &update;
    &vacuum;
    &values;
+   &waitlsn;
 
  </reference>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dd028a1..117cc9b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -7312,6 +7312,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * After updating lastReplayedEndRecPtr, set latches in the SHMEM array
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitLSN())
+				{
+
+					WaitLSNSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 4a6c99e..0d10117 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -20,6 +20,6 @@ OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
 	policy.o portalcmds.o prepare.o proclang.o publicationcmds.o \
 	schemacmds.o seclabel.o sequence.o statscmds.o subscriptioncmds.o \
 	tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o user.o \
-	vacuum.o vacuumlazy.o variable.o view.o
+	vacuum.o vacuumlazy.o variable.o view.o waitlsn.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
new file mode 100644
index 0000000..db2f549
--- /dev/null
+++ b/src/backend/commands/waitlsn.c
@@ -0,0 +1,273 @@
+/*-------------------------------------------------------------------------
+ *
+ * waitlsn.c
+ *	  WaitLSN statement: WAITLSN
+ *
+ * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2017, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/waitlsn.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/*
+ * -------------------------------------------------------------------------
+ * Wait for LSN to be replayed on slave
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "fmgr.h"
+#include "pgstat.h"
+#include "utils/pg_lsn.h"
+#include "storage/latch.h"
+#include "miscadmin.h"
+#include "storage/spin.h"
+#include "storage/backendid.h"
+#include "access/xact.h"
+#include "storage/shmem.h"
+#include "storage/ipc.h"
+#include "utils/timestamp.h"
+#include "storage/pmsignal.h"
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "commands/waitlsn.h"
+#include "storage/proc.h"
+#include "access/transam.h"
+#include "funcapi.h"
+#include "catalog/pg_type.h"
+#include "utils/builtins.h"
+
+/* Latches Own-DisownLatch and AbortCallBack */
+static uint32 WaitLSNShmemSize(void);
+static void WLDisownLatchAbort(XactEvent event, void *arg);
+static void WLOwnLatch(XLogRecPtr trg_lsn);
+static void WLDisownLatch(void);
+
+void		_PG_init(void);
+
+/* Shared memory structures */
+typedef struct
+{
+	int					pid;
+	volatile slock_t	slock;
+	Latch				latch;
+	XLogRecPtr			trg_lsn;
+} BIDLatch;
+
+
+typedef struct
+{
+	int				backend_maxid;
+	XLogRecPtr		min_lsn;
+	BIDLatch		l_arr[FLEXIBLE_ARRAY_MEMBER];
+} GlobState;
+
+static volatile GlobState  *state;
+bool						is_latch_owned = false;
+
+/* Take Latch for current backend at the beginning of WAITLSN */
+static void
+WLOwnLatch(XLogRecPtr trg_lsn)
+{
+	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+	OwnLatch(&state->l_arr[MyBackendId].latch);
+	is_latch_owned = true;
+
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->l_arr[MyBackendId].pid = MyProcPid;
+	state->l_arr[MyBackendId].trg_lsn = trg_lsn;
+	SpinLockRelease(&state->l_arr[MyBackendId].slock);
+
+	if (trg_lsn < state->min_lsn)
+		state->min_lsn = trg_lsn;
+}
+
+/* Release Latch for current backend at the end of WAITLSN */
+static void
+WLDisownLatch(void)
+{
+	int i;
+	XLogRecPtr trg_lsn = state->l_arr[MyBackendId].trg_lsn;
+
+	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+	DisownLatch(&state->l_arr[MyBackendId].latch);
+	is_latch_owned = false;
+	state->l_arr[MyBackendId].pid = 0;
+	state->l_arr[MyBackendId].trg_lsn = InvalidXLogRecPtr;
+
+	/* Update state->min_lsn if necessary, choosing the next minimum */
+	if (state->min_lsn == trg_lsn)
+	{
+		state->min_lsn = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->l_arr[i].trg_lsn != InvalidXLogRecPtr &&
+				state->l_arr[i].trg_lsn < state->min_lsn)
+				state->min_lsn = state->l_arr[i].trg_lsn;
+	}
+
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MaxConnections+1); i >=2; i--)
+			if (state->l_arr[i].pid != 0)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->l_arr[MyBackendId].slock);
+}
+
+/* CallBack function on abort*/
+static void
+WLDisownLatchAbort(XactEvent event, void *arg)
+{
+	if (is_latch_owned && (event == XACT_EVENT_PARALLEL_ABORT ||
+						   event == XACT_EVENT_ABORT))
+	{
+		WLDisownLatch();
+	}
+}
+
+/* Module load callback */
+void
+_PG_init(void)
+{
+	if (!IsUnderPostmaster)
+		RegisterXactCallback(WLDisownLatchAbort, NULL);
+}
+
+/* Get size of shared memory needed to hold GlobState */
+static uint32
+WaitLSNShmemSize(void)
+{
+	return offsetof(GlobState, l_arr) + sizeof(BIDLatch) * (MaxConnections+1);
+}
+
+/* Init array of Latches in shared memory */
+void
+WaitLSNShmemInit(void)
+{
+	bool	found;
+	uint32	i;
+
+	state = (GlobState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitLSNShmemSize(),
+										  &found);
+	if (!found)
+	{
+		for (i = 0; i < (MaxConnections+1); i++)
+		{
+			state->l_arr[i].pid = 0;
+			state->l_arr[i].trg_lsn = InvalidXLogRecPtr;
+			SpinLockInit(&state->l_arr[i].slock);
+			InitSharedLatch(&state->l_arr[i].latch);
+		}
+		state->backend_maxid = 0;
+		state->min_lsn = PG_UINT64_MAX;
+	}
+}
+
+/* Set latches in shared memory for backends whose target LSN has been replayed */
+void
+WaitLSNSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32 i;
+
+	for (i = 2; i <= state->backend_maxid; i++)
+	{
+		if (state->l_arr[i].trg_lsn != 0)
+		{
+		SpinLockAcquire(&state->l_arr[i].slock);
+		if (state->l_arr[i].trg_lsn <= cur_lsn)
+			SetLatch(&state->l_arr[i].latch);
+		SpinLockRelease(&state->l_arr[i].slock);
+		}
+	}
+}
+
+/* Get minimal LSN that will be next */
+XLogRecPtr
+GetMinWaitLSN(void)
+{
+	return state->min_lsn;
+}
+
+/*
+ * On WAITLSN own latch and wait till LSN is replayed, Postmaster death, interruption
+ * or timeout.
+ */
+void
+WaitLSNUtility(const char *lsn, const int delay, DestReceiver *dest)
+{
+	XLogRecPtr		trg_lsn = DatumGetLSN(DirectFunctionCall1(pg_lsn_in, CStringGetDatum(lsn)));
+	XLogRecPtr		cur_lsn;
+	int				latch_events;
+	uint64			tdelay = delay;
+	long			secs;
+	int				microsecs;
+	TimestampTz		timer = GetCurrentTimestamp();
+	TupOutputState	*tstate;
+	TupleDesc		tupdesc;
+	char		   *value = "f";
+
+	if (delay > 0)
+		latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;
+	else
+		latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;
+
+	WLOwnLatch(trg_lsn);
+
+	for (;;)
+	{
+		cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN had been Replayed */
+		if (trg_lsn <= cur_lsn)
+			break;
+
+		/* If the postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		/* If Delay time is over */
+		if (latch_events & WL_TIMEOUT)
+		{
+			if (TimestampDifferenceExceeds(timer,GetCurrentTimestamp(),delay))
+				break;
+			TimestampDifference(timer,GetCurrentTimestamp(),&secs, &microsecs);
+			tdelay = delay - (secs*1000 + microsecs/1000);
+		}
+
+		MyPgXact->xmin = InvalidTransactionId;
+		WaitLatch(&state->l_arr[MyBackendId].latch, latch_events, tdelay, WAIT_EVENT_CLIENT_READ);
+		ResetLatch(&state->l_arr[MyBackendId].latch);
+
+		/* If an interrupt is pending, disown the current latch before processing it */
+		if (InterruptPending)
+		{
+			WLDisownLatch();
+			ProcessInterrupts();
+		}
+
+	}
+
+	WLDisownLatch();
+
+	if (trg_lsn > cur_lsn)
+		elog(NOTICE,"LSN is not reached. Try to increase wait time.");
+	else
+		value = "t";
+
+	/* need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1, false);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+	/* prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc);
+	/* Send it */
+	do_text_output_oneline(tstate, value);
+	end_tup_output(tstate);
+}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4c83a63..a149b54 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -275,7 +275,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		SecLabelStmt SelectStmt TransactionStmt TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
 		VariableResetStmt VariableSetStmt VariableShowStmt
-		ViewStmt CheckPointStmt CreateConversionStmt
+		ViewStmt WaitLSNStmt CheckPointStmt CreateConversionStmt
 		DeallocateStmt PrepareStmt ExecuteStmt
 		DropOwnedStmt ReassignOwnedStmt
 		AlterTSConfigurationStmt AlterTSDictionaryStmt
@@ -322,7 +322,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <list>	OptSchemaEltList
 
 %type <boolean> TriggerForSpec TriggerForType
-%type <ival>	TriggerActionTime
+%type <ival>	TriggerActionTime WaitDelay
 %type <list>	TriggerEvents TriggerOneEvent
 %type <value>	TriggerFuncArg
 %type <node>	TriggerWhen
@@ -636,7 +636,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	HANDLER HAVING HEADER_P HOLD HOUR_P
 
 	IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P
-	INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+	INCLUDING INCREMENT INDEX INDEXES INFINITELY INHERIT INHERITS INITIALLY INLINE_P
 	INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
 	INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
 
@@ -675,7 +675,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	SUBSCRIPTION SUBSTRING SYMMETRIC SYSID SYSTEM_P
 
 	TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
-	TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM TREAT TRIGGER TRIM TRUE_P
+	TIME TIMEOUT TIMESTAMP TO TRAILING TRANSACTION TRANSFORM TREAT TRIGGER TRIM TRUE_P
 	TRUNCATE TRUSTED TYPE_P TYPES_P
 
 	UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
@@ -684,7 +684,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAITLSN WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLNAMESPACES
 	XMLPARSE XMLPI XMLROOT XMLSERIALIZE XMLTABLE
@@ -935,6 +936,7 @@ stmt :
 			| VariableSetStmt
 			| VariableShowStmt
 			| ViewStmt
+			| WaitLSNStmt
 			| /*EMPTY*/
 				{ $$ = NULL; }
 		;
@@ -13831,6 +13833,44 @@ frame_bound:
 				}
 		;
 
+/*****************************************************************************
+ *
+ *		QUERY:
+ *				WAITLSN <LSN> can appear as a query-level command
+ *
+ *
+ *****************************************************************************/
+
+WaitLSNStmt:
+			WAITLSN Sconst
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = 0;
+					$$ = (Node *)n;
+				}
+			| WAITLSN Sconst TIMEOUT Iconst
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = $4;
+					$$ = (Node *)n;
+				}
+			| WAITLSN Sconst INFINITELY
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = 0;
+					$$ = (Node *)n;
+				}
+			| WAITLSN Sconst NOWAIT
+				{
+					WaitLSNStmt *n = makeNode(WaitLSNStmt);
+					n->lsn = $2;
+					n->delay = 1;
+					$$ = (Node *)n;
+				}
+		;
 
 /*
  * Supporting nonterminals for expressions.
@@ -14705,6 +14745,7 @@ unreserved_keyword:
 			| INCREMENT
 			| INDEX
 			| INDEXES
+			| INFINITELY
 			| INHERIT
 			| INHERITS
 			| INLINE_P
@@ -14843,6 +14884,7 @@ unreserved_keyword:
 			| TEMPLATE
 			| TEMPORARY
 			| TEXT_P
+			| TIMEOUT
 			| TRANSACTION
 			| TRANSFORM
 			| TRIGGER
@@ -14868,6 +14910,7 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAITLSN
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2d1ed14..932136f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -271,6 +272,11 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	AsyncShmemInit();
 	BackendRandomShmemInit();
 
+	/*
+	 * Init array of Latches  in SHMEM for WAITLSN
+	 */
+	WaitLSNShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 82a707a..544baeb 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -56,6 +56,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -923,6 +924,20 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				break;
 			}
 
+		case T_WaitLSNStmt:
+			{
+				WaitLSNStmt *stmt = (WaitLSNStmt *) parsetree;
+				if (!RecoveryInProgress())
+				{
+					ereport(ERROR,(errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
+							errmsg("cannot execute %s not during recovery",
+							"WaitLSN")));
+				}
+				else
+					WaitLSNUtility(stmt->lsn, stmt->delay, dest);
+			}
+			break;
+
 		default:
 			/* All other statement types have event trigger support */
 			ProcessUtilitySlow(pstate, pstmt, queryString,
@@ -2481,6 +2496,10 @@ CreateCommandTag(Node *parsetree)
 			tag = "NOTIFY";
 			break;
 
+		case T_WaitLSNStmt:
+			tag = "WAITLSN";
+			break;
+
 		case T_ListenStmt:
 			tag = "LISTEN";
 			break;
@@ -3104,6 +3123,10 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
+		case T_WaitLSNStmt:
+			lev = LOGSTMT_ALL;
+			break;
+
 		case T_ListenStmt:
 			lev = LOGSTMT_ALL;
 			break;
diff --git a/src/include/commands/waitlsn.h b/src/include/commands/waitlsn.h
new file mode 100644
index 0000000..49cf9e8
--- /dev/null
+++ b/src/include/commands/waitlsn.h
@@ -0,0 +1,22 @@
+/*-------------------------------------------------------------------------
+ *
+ * waitlsn.h
+ *	  WaitLSN notification: WAITLSN
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2016, Regents of PostgresPRO
+ *
+ * src/include/commands/waitlsn.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAITLSN_H
+#define WAITLSN_H
+#include "tcop/dest.h"
+
+extern void WaitLSNUtility(const char *lsn, const int delay, DestReceiver *dest);
+extern void WaitLSNShmemInit(void);
+extern void WaitLSNSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWaitLSN(void);
+
+#endif   /* WAITLSN_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index ffeeb49..201677b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -479,6 +479,7 @@ typedef enum NodeTag
 	T_DropReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
+	T_WaitLSNStmt,
 	T_SQLCmd,
 
 	/*
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 732e5d6..55ffda8 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3446,4 +3446,16 @@ typedef struct DropSubscriptionStmt
 	DropBehavior behavior;		/* RESTRICT or CASCADE behavior */
 } DropSubscriptionStmt;
 
+/* ----------------------
+ *		WaitLSN Statement
+ * ----------------------
+ */
+typedef struct WaitLSNStmt
+{
+	NodeTag		type;
+	char	   *lsn;			/* Target LSN to wait for */
+	int			delay;			/* Delay to wait for LSN */
+	bool		nowait;			/* No wait for LSN, just report the result */
+} WaitLSNStmt;
+
 #endif							/* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index f50e45e..618cdb2 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -198,6 +198,7 @@ PG_KEYWORD("including", INCLUDING, UNRESERVED_KEYWORD)
 PG_KEYWORD("increment", INCREMENT, UNRESERVED_KEYWORD)
 PG_KEYWORD("index", INDEX, UNRESERVED_KEYWORD)
 PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD)
+PG_KEYWORD("infinitely", INFINITELY, UNRESERVED_KEYWORD)
 PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("inherits", INHERITS, UNRESERVED_KEYWORD)
 PG_KEYWORD("initially", INITIALLY, RESERVED_KEYWORD)
@@ -394,6 +395,7 @@ PG_KEYWORD("temporary", TEMPORARY, UNRESERVED_KEYWORD)
 PG_KEYWORD("text", TEXT_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("then", THEN, RESERVED_KEYWORD)
 PG_KEYWORD("time", TIME, COL_NAME_KEYWORD)
+PG_KEYWORD("timeout", TIMEOUT, UNRESERVED_KEYWORD)
 PG_KEYWORD("timestamp", TIMESTAMP, COL_NAME_KEYWORD)
 PG_KEYWORD("to", TO, RESERVED_KEYWORD)
 PG_KEYWORD("trailing", TRAILING, RESERVED_KEYWORD)
@@ -433,6 +435,7 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("waitlsn", WAITLSN, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
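
The wait loop in WaitLSNUtility() above recomputes the remaining wait on each iteration as delay - (secs*1000 + microsecs/1000). Below is a minimal standalone sketch of that arithmetic. The helper name remaining_timeout_ms is hypothetical and does not appear in the patch. Unlike the patch, it clamps the result at zero, so a caller would never hand a negative timeout to its wait primitive.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the patch: given the total timeout in
 * milliseconds and the time already spent waiting (split into seconds and
 * microseconds, as TimestampDifference() reports it), return how many
 * milliseconds are left.  Returns 0 once the deadline has passed.
 */
static long
remaining_timeout_ms(long total_ms, long elapsed_secs, int elapsed_usecs)
{
	long		elapsed_ms;

	assert(total_ms >= 0 && elapsed_secs >= 0 && elapsed_usecs >= 0);

	/* Same conversion as in the patch: whole milliseconds only */
	elapsed_ms = elapsed_secs * 1000 + elapsed_usecs / 1000;

	if (elapsed_ms >= total_ms)
		return 0;
	return total_ms - elapsed_ms;
}
```

For example, with a 10000 ms timeout and 2.5 seconds already elapsed, the helper would return 7500.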
#28Robert Haas
robertmhaas@gmail.com
In reply to: Ants Aasma (#23)
Re: make async slave to wait for lsn to be replayed

On Tue, Sep 26, 2017 at 12:00 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:

Exposing this interface as WAITLSN will encode that visibility order
matches LSN order.

That would be a bad thing to encode because it doesn't.

Well... actually on the standby it does, and that's the only thing
that matters in this case I guess. But I agree with you that it's
not a wonderful thing to bake into the UI, because we might want to
change it some day.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29Ants Aasma
ants.aasma@eesti.ee
In reply to: Ivan Kartyshov (#24)
Re: make async slave to wait for lsn to be replayed

On Mon, Oct 23, 2017 at 12:29 PM, Ivan Kartyshov
<i.kartyshov@postgrespro.ru> wrote:

Ants Aasma wrote on 2017-09-26 13:00:

Exposing this interface as WAITLSN will encode that visibility order
matches LSN order. This removes any chance of fixing for example
visibility order of async/vs/sync transactions. Just renaming it so
the token is an opaque commit visibility token that just happens to be
a LSN would still allow for progress in transaction management. For
example, making PostgreSQL distributed will likely want timestamp
and/or vector clock based visibility rules.

I'm sorry I did not understand exactly what you meant.
Please explain this in more detail.

Currently transactions on the master become visible when xid is
removed from proc array. This depends on the order of acquiring
ProcArrayLock, which can happen in a different order from inserting
the commit record to wal. Whereas on the standby the transactions will
become visible in the same order that commit records appear in wal.
The difference can be quite large when transactions are using
different values for synchronous_commit. Fixing this is not easy, but
we may want to do it someday. IIRC CSN threads contained more
discussion on this topic. If we do fix it, it seems likely that what
we need to wait on is not LSN, but some other kind of value. If the UI
were something like "WAITVISIBILITY token", then we can later change
the token to something other than LSN.

Regards,
Ants Aasma

#30Robert Haas
robertmhaas@gmail.com
In reply to: Ants Aasma (#29)
Re: make async slave to wait for lsn to be replayed

On Thu, Oct 26, 2017 at 4:29 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:

If the UI
were something like "WAITVISIBILITY token", then we can later change
the token to something other than LSN.

That assumes, probably optimistically, that nobody will develop a
dependency on it being, precisely, an LSN.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#31Ivan Kartyshov
i.kartyshov@postgrespro.ru
In reply to: Ants Aasma (#29)
Re: make async slave to wait for lsn to be replayed

Ants Aasma wrote on 2017-10-26 17:29:

On Mon, Oct 23, 2017 at 12:29 PM, Ivan Kartyshov
<i.kartyshov@postgrespro.ru> wrote:

Ants Aasma wrote on 2017-09-26 13:00:

Exposing this interface as WAITLSN will encode that visibility order
matches LSN order. This removes any chance of fixing for example
visibility order of async/vs/sync transactions. Just renaming it so
the token is an opaque commit visibility token that just happens to be
a LSN would still allow for progress in transaction management. For
example, making PostgreSQL distributed will likely want timestamp
and/or vector clock based visibility rules.

I'm sorry I did not understand exactly what you meant.
Please explain this in more detail.

Currently transactions on the master become visible when xid is
removed from proc array. This depends on the order of acquiring
ProcArrayLock, which can happen in a different order from inserting
the commit record to wal. Whereas on the standby the transactions will
become visible in the same order that commit records appear in wal.
The difference can be quite large when transactions are using
different values for synchronous_commit. Fixing this is not easy, but
we may want to do it someday. IIRC CSN threads contained more
discussion on this topic. If we do fix it, it seems likely that what
we need to wait on is not LSN, but some other kind of value. If the UI
were something like "WAITVISIBILITY token", then we can later change
the token to something other than LSN.

Regards,
Ants Aasma

It sounds reasonable. I can offer the following version.

WAIT LSN lsn_number;
WAIT LSN lsn_number TIMEOUT delay;
WAIT LSN lsn_number INFINITE;
WAIT LSN lsn_number NOWAIT;

WAIT [token] wait_value [option];

token - [LSN, TIME | TIMESTAMP | CSN | XID]
option - [TIMEOUT | INFINITE | NOWAIT]
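
A rough sketch of how the generalized WAIT [token] wait_value [option] form could be represented internally, under the assumption that the three options are kept as distinct values. In the v8 patch the grammar instead stores NOWAIT as delay = 1 and both the default and INFINITELY as delay = 0. Every name below is hypothetical and none of them exists in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of these exist in the patch. */
typedef enum WaitTokenKind
{
	WAIT_TOKEN_LSN,
	WAIT_TOKEN_TIME,
	WAIT_TOKEN_TIMESTAMP,
	WAIT_TOKEN_CSN,
	WAIT_TOKEN_XID
} WaitTokenKind;

typedef enum WaitOption
{
	WAIT_OPT_INFINITE,			/* default: wait until the value is reached */
	WAIT_OPT_TIMEOUT,			/* give up after timeout_ms milliseconds */
	WAIT_OPT_NOWAIT				/* report the current status immediately */
} WaitOption;

typedef struct WaitStmtSketch
{
	WaitTokenKind kind;			/* what kind of value to wait for */
	const char *value;			/* textual wait_value, e.g. "0/303EC60" */
	WaitOption	option;
	int			timeout_ms;		/* only meaningful with WAIT_OPT_TIMEOUT */
} WaitStmtSketch;

/*
 * Map the statement to a timeout: -1 means wait forever, 0 means do not
 * wait at all, and a positive value is a limit in milliseconds.
 */
static long
wait_stmt_timeout_ms(const WaitStmtSketch *stmt)
{
	assert(stmt != NULL);

	switch (stmt->option)
	{
		case WAIT_OPT_NOWAIT:
			return 0;
		case WAIT_OPT_TIMEOUT:
			return stmt->timeout_ms;
		case WAIT_OPT_INFINITE:
			break;
	}
	return -1;
}
```

Keeping NOWAIT separate from the timeout value means an immediate check does not have to be expressed as a one-millisecond wait.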

Ready to listen to your suggestions.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#32Ants Aasma
ants.aasma@eesti.ee
In reply to: Ivan Kartyshov (#31)
Re: make async slave to wait for lsn to be replayed

On Mon, Oct 30, 2017 at 7:25 PM, Ivan Kartyshov
<i.kartyshov@postgrespro.ru> wrote:

It sounds reasonable. I can offer the following version.

WAIT LSN lsn_number;
WAIT LSN lsn_number TIMEOUT delay;
WAIT LSN lsn_number INFINITE;
WAIT LSN lsn_number NOWAIT;

WAIT [token] wait_value [option];

token - [LSN, TIME | TIMESTAMP | CSN | XID]
option - [TIMEOUT | INFINITE | NOWAIT]

Ready to listen to your suggestions.

Making the interface more specific about the mechanism is not what I
had in mind, quite the opposite. I would like to see the interface
describe the desired effect of the wait.

Stepping back for a while, from what I understand the reason we want
to wait is to prevent observation of database state going
backwards. To limit the amount of waiting needed we tell the database
what we have observed. For example "I just observed my transaction
commit", or "the last time I observed state was X", and then have the
database provide us with a snapshot that is causally dependent on
those states. This does not give us linearizability, for that we still
need at the very least serializable transactions on standby. But it
seems to provide a form of sequential consistency, which (if it can be
proved to hold) makes reasoning about concurrency a lot nicer.

For lack of a better proposal I would like something along the lines of:

WAIT FOR state_id[, state_id] [ OPTIONS (..)]

And to get the tokens maybe a function pg_last_commit_state().

Optionally, to provide read-to-read causality, pg_snapshot_state()
could return for example replay_lsn at the start of the current
transaction. This makes sure that things don't just appear and
disappear when load balancing across many standby servers.

WAIT FOR semantics is to ensure that next snapshot is causally
dependent (happens after) each of the specified observed states.
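
As a minimal sketch of these semantics, assume every state_id is an LSN (an assumption; the proposal leaves the token format open). On a standby, replay only moves forward, so waiting for several observed states reduces to waiting for the largest one. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Largest LSN among the observed states; 0 if the list is empty. */
static uint64_t
max_target_lsn(const uint64_t *targets, size_t ntargets)
{
	uint64_t	max = 0;
	size_t		i;

	assert(targets != NULL || ntargets == 0);

	for (i = 0; i < ntargets; i++)
		if (targets[i] > max)
			max = targets[i];
	return max;
}

/*
 * True when the replay position has passed every observed state, i.e. a
 * snapshot taken now would be causally after all of them.
 */
static bool
all_states_reached(uint64_t replay_lsn, const uint64_t *targets, size_t ntargets)
{
	return replay_lsn >= max_target_lsn(targets, ntargets);
}
```

With targets 0x10, 0x30 and 0x20, the wait would be satisfied as soon as replay reaches 0x30.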

The state_id could simply be a LSN, or to allow for future expansion
something like 'lsn:0000/1234'. Example extension would be to allow
for waiting on xids. On master that would be just a ShareLock on the
transactionid. On standby it would wait for the commit or rollback
record for that transaction to be replayed.
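
A small sketch of how such a state_id could be parsed, assuming only two spellings: a bare LSN such as '0/1234', or the same value with an 'lsn:' prefix. The type and function names are hypothetical. The only fact taken from PostgreSQL is the textual LSN form itself: two hexadecimal numbers separated by a slash, giving the high and low 32 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types; only the LSN kind is sketched here. */
typedef enum StateKind
{
	STATE_KIND_LSN
} StateKind;

typedef struct StateId
{
	StateKind	kind;
	uint64_t	lsn;
} StateId;

/*
 * Parse "lsn:HI/LO" or plain "HI/LO" (hexadecimal halves of up to eight
 * digits each).  Returns false for any other prefix or for trailing junk.
 */
static bool
parse_state_id(const char *text, StateId *out)
{
	unsigned int hi;
	unsigned int lo;
	int			consumed = 0;

	assert(text != NULL && out != NULL);

	/* The prefix is optional, so bare LSNs keep working */
	if (strncmp(text, "lsn:", 4) == 0)
		text += 4;

	if (sscanf(text, "%8X/%8X%n", &hi, &lo, &consumed) != 2 ||
		text[consumed] != '\0')
		return false;

	out->kind = STATE_KIND_LSN;
	out->lsn = ((uint64_t) hi << 32) | lo;
	return true;
}
```

A future token kind such as 'xid:...' would be rejected by this sketch rather than misread as an LSN, which is the behavior an explicit prefix is meant to allow.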

Robert made a good point that people will still rely on the token
being an LSN, but perhaps they will be slightly less angry when we
explicitly tell them that this might change in the future.

Regards,
Ants Aasma

[1]: https://www.postgresql.org/docs/devel/static/functions-admin.html#functions-snapshot-synchronization

#33Michael Paquier
michael.paquier@gmail.com
In reply to: Ants Aasma (#32)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On Tue, Oct 31, 2017 at 9:42 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:

Robert made a good point that people will still rely on the token
being an LSN, but perhaps they will be slightly less angry when we
explicitly tell them that this might change in the future.

This thread has stalled, so I am marking the patch as returned with
feedback, since feedback is what the latest replies amount to.
--
Michael

#34Stephen Frost
sfrost@snowman.net
In reply to: Ivan Kartyshov (#31)
Re: [HACKERS] make async slave to wait for lsn to be replayed

Greetings Ivan,

* Ivan Kartyshov (i.kartyshov@postgrespro.ru) wrote:

Ants Aasma wrote on 2017-10-26 17:29:

On Mon, Oct 23, 2017 at 12:29 PM, Ivan Kartyshov
<i.kartyshov@postgrespro.ru> wrote:

Ants Aasma wrote on 2017-09-26 13:00:

Exposing this interface as WAITLSN will encode that visibility order
matches LSN order. This removes any chance of fixing for example
visibility order of async/vs/sync transactions. Just renaming it so
the token is an opaque commit visibility token that just happens to be
a LSN would still allow for progress in transaction management. For
example, making PostgreSQL distributed will likely want timestamp
and/or vector clock based visibility rules.

I'm sorry I did not understand exactly what you meant.
Please explain this in more detail.

Currently transactions on the master become visible when xid is
removed from proc array. This depends on the order of acquiring
ProcArrayLock, which can happen in a different order from inserting
the commit record to wal. Whereas on the standby the transactions will
become visible in the same order that commit records appear in wal.
The difference can be quite large when transactions are using
different values for synchronous_commit. Fixing this is not easy, but
we may want to do it someday. IIRC CSN threads contained more
discussion on this topic. If we do fix it, it seems likely that what
we need to wait on is not LSN, but some other kind of value. If the UI
were something like "WAITVISIBILITY token", then we can later change
the token to something other than LSN.

Regards,
Ants Aasma

It sounds reasonable. I can offer the following version.

WAIT LSN lsn_number;
WAIT LSN lsn_number TIMEOUT delay;
WAIT LSN lsn_number INFINITE;
WAIT LSN lsn_number NOWAIT;

WAIT [token] wait_value [option];

token - [LSN, TIME | TIMESTAMP | CSN | XID]
option - [TIMEOUT | INFINITE | NOWAIT]

Ready to listen to your suggestions.

There were a few different suggestions made, but somehow this thread
ended up in Needs Review again while still having LSNs. I've changed it
back to Waiting for Author since it seems pretty unlikely that using LSN
is going to be acceptable based on the feedback.

Thanks!

Stephen

#35Simon Riggs
simon@2ndquadrant.com
In reply to: Stephen Frost (#34)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 22 January 2018 at 23:21, Stephen Frost <sfrost@snowman.net> wrote:

It sounds reasonable. I can offer the following version.

WAIT LSN lsn_number;
WAIT LSN lsn_number TIMEOUT delay;
WAIT LSN lsn_number INFINITE;
WAIT LSN lsn_number NOWAIT;

WAIT [token] wait_value [option];

token - [LSN, TIME | TIMESTAMP | CSN | XID]
option - [TIMEOUT | INFINITE | NOWAIT]

Ready to listen to your suggestions.

There were a few different suggestions made, but somehow this thread
ended up in Needs Review again while still having LSNs. I've changed it
back to Waiting for Author since it seems pretty unlikely that using LSN
is going to be acceptable based on the feedback.

I agree with the need for a separate command rather than a function.

I agree that WAIT LSN is good syntax because this allows us to wait
for something else in future.

Without having reviewed the patch, I think we want this feature in PG11.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#36Simon Riggs
simon@2ndquadrant.com
In reply to: Ivan Kartyshov (#31)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 30 October 2017 at 17:25, Ivan Kartyshov <i.kartyshov@postgrespro.ru> wrote:

It sounds reasonable. I can offer the following version.

WAIT LSN lsn_number;
WAIT LSN lsn_number TIMEOUT delay;
WAIT LSN lsn_number INFINITE;
WAIT LSN lsn_number NOWAIT;

WAIT [token] wait_value [option];

token - [LSN, TIME | TIMESTAMP | CSN | XID]
option - [TIMEOUT | INFINITE | NOWAIT]

Ready to listen to your suggestions.

OK, on review we want this feature in PG11

Many people think we will want to wait on a variety of things in the
future. Support for those things can come in the future when/if they
exist.

In PG11, I propose the following command, sticking mostly to Ants'
syntax, and allowing to wait for multiple events before it returns. It
doesn't hold snapshot and will not get cancelled by Hot Standby.

WAIT FOR event [, event ...] options

event is
LSN value
TIMESTAMP value

options
TIMEOUT delay
UNTIL TIMESTAMP timestamp
(we have both, so people don't need to do math, they can use whichever
they have)
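
The arithmetic that having both options spares the user is small. A minimal sketch, assuming times are represented as microseconds since some fixed epoch; the helper names below are hypothetical and are not part of any patch in this thread:

```c
#include <stdint.h>

/* Hypothetical helpers; times are microseconds since an arbitrary epoch. */

/* Convert an absolute deadline (UNTIL TIMESTAMP) into a relative timeout in ms. */
static int64_t
timeout_ms_from_deadline(int64_t now_us, int64_t deadline_us)
{
	if (deadline_us <= now_us)
		return 0;				/* deadline already passed: do not wait */
	return (deadline_us - now_us + 999) / 1000;	/* round up to whole ms */
}

/* Convert a relative timeout (TIMEOUT delay, in ms) into an absolute deadline. */
static int64_t
deadline_from_timeout_ms(int64_t now_us, int64_t timeout_ms)
{
	return now_us + timeout_ms * 1000;
}
```

Whichever form the user supplies, an implementation could normalize it to one representation internally and recompute the remaining time on each wakeup.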

We do not need "INFINITE" or "INFINITELY", obviously the default mode
for WAIT is to continue waiting until the thing you asked for happens.

I couldn't see the point of the NOWAIT option, was that a Zen joke?

WAIT can be issued on masters as well as standbys, no need to block that.

If you want this in PG11, please work on this now, including docs and
tap tests. Please submit before 1 March and I will shepherd this to
commit.

Thomas, I suggest we also do what Robert suggested elsewhere which was
to have a connection option that returns xid or lsn (or both) via the
protocol, so we can use that with the WAIT command and you can get
the overall causal consistency feature into PG11. I'll be reviewer for
that feature also if you submit.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#37Stephen Frost
sfrost@snowman.net
In reply to: Simon Riggs (#35)
Re: [HACKERS] make async slave to wait for lsn to be replayed

Greetings,

* Simon Riggs (simon@2ndquadrant.com) wrote:

On 22 January 2018 at 23:21, Stephen Frost <sfrost@snowman.net> wrote:

It sounds reasonable. I can offer the following version.

WAIT LSN lsn_number;
WAIT LSN lsn_number TIMEOUT delay;
WAIT LSN lsn_number INFINITE;
WAIT LSN lsn_number NOWAIT;

WAIT [token] wait_value [option];

token - [LSN, TIME | TIMESTAMP | CSN | XID]
option - [TIMEOUT | INFINITE | NOWAIT]

Ready to listen to your suggestions.

There were a few different suggestions made, but somehow this thread
ended up in Needs Review again while still having LSNs. I've changed it
back to Waiting for Author since it seems pretty unlikely that using LSN
is going to be acceptable based on the feedback.

I agree with the need for a separate command rather than a function.

I agree that WAIT LSN is good syntax because this allows us to wait
for something else in future.

Without having reviewed the patch, I think we want this feature in PG11.

I've also looked back through this and while I understand the up-thread
discussion about having something better than LSN, I don't see any
particular reason to not allow waiting on LSN, so I agree with Simon
that this makes sense to include. There are definite cases it helps
with today and it doesn't block off future work.

As we're closing out the January commitfest, I've moved this to the next
one.

Thanks!

Stephen

#38Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#36)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On Fri, Feb 2, 2018 at 3:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

In PG11, I propose the following command, sticking mostly to Ants'
syntax, and allowing to wait for multiple events before it returns. It
doesn't hold snapshot and will not get cancelled by Hot Standby.

WAIT FOR event [, event ...] options

event is
LSN value
TIMESTAMP value

options
TIMEOUT delay
UNTIL TIMESTAMP timestamp
(we have both, so people don't need to do math, they can use whichever
they have)

WAIT FOR TIMEOUT sounds a lot like SELECT pg_sleep_for(), and WAIT
UNTIL TIMESTAMP sounds a lot like SELECT pg_sleep_until().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#39Simon Riggs
simon@2ndquadrant.com
In reply to: Robert Haas (#38)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2 February 2018 at 18:46, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Feb 2, 2018 at 3:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

In PG11, I propose the following command, sticking mostly to Ants'
syntax, and allowing to wait for multiple events before it returns. It
doesn't hold snapshot and will not get cancelled by Hot Standby.

WAIT FOR event [, event ...] options

event is
LSN value
TIMESTAMP value

options
TIMEOUT delay
UNTIL TIMESTAMP timestamp
(we have both, so people don't need to do math, they can use whichever
they have)

WAIT FOR TIMEOUT sounds a lot like SELECT pg_sleep_for(), and WAIT
UNTIL TIMESTAMP sounds a lot like SELECT pg_sleep_until().

Yes, it sounds very similar. It's the behavior that differs; I read
and agreed with yours and Thomas' earlier comments on that point.

As pointed out upthread, the key difference is whether it gets
cancelled on Hot Standby and whether you can call it in a non-READ
COMMITTED transaction.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#40Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#39)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2018-02-02 19:41:37 +0000, Simon Riggs wrote:

On 2 February 2018 at 18:46, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Feb 2, 2018 at 3:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

In PG11, I propose the following command, sticking mostly to Ants'
syntax, and allowing to wait for multiple events before it returns. It
doesn't hold snapshot and will not get cancelled by Hot Standby.

WAIT FOR event [, event ...] options

event is
LSN value
TIMESTAMP value

options
TIMEOUT delay
UNTIL TIMESTAMP timestamp
(we have both, so people don't need to do math, they can use whichever
they have)

WAIT FOR TIMEOUT sounds a lot like SELECT pg_sleep_for(), and WAIT
UNTIL TIMESTAMP sounds a lot like SELECT pg_sleep_until().

Yes, it sounds very similar. It's the behavior that differs; I read
and agreed with yours and Thomas' earlier comments on that point.

As pointed out upthread, the key difference is whether it gets
cancelled on Hot Standby and whether you can call it in a non-READ
COMMITTED transaction.

Given that nobody has updated the patch or even discussed doing so, I
assume this CF entry should now appropriately be classified as
returned with feedback?

Greetings,

Andres Freund

#41Ivan Kartyshov
i.kartyshov@postgrespro.ru
In reply to: Andres Freund (#40)
Re: [HACKERS] make async slave to wait for lsn to be replayed

Andres Freund wrote 2018-03-02 03:47:

On 2018-02-02 19:41:37 +0000, Simon Riggs wrote:

On 2 February 2018 at 18:46, Robert Haas <robertmhaas@gmail.com>
wrote:

On Fri, Feb 2, 2018 at 3:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

In PG11, I propose the following command, sticking mostly to Ants'
syntax, and allowing to wait for multiple events before it returns. It
doesn't hold snapshot and will not get cancelled by Hot Standby.

WAIT FOR event [, event ...] options

event is
LSN value
TIMESTAMP value

options
TIMEOUT delay
UNTIL TIMESTAMP timestamp
(we have both, so people don't need to do math, they can use whichever
they have)

WAIT FOR TIMEOUT sounds a lot like SELECT pg_sleep_for(), and WAIT
UNTIL TIMESTAMP sounds a lot like SELECT pg_sleep_until().

Yes, it sounds very similar. It's the behavior that differs; I read
and agreed with yours and Thomas' earlier comments on that point.

As pointed out upthread, the key difference is whether it gets
cancelled on Hot Standby and whether you can call it in a non-READ
COMMITTED transaction.

Given that nobody has updated the patch or even discussed doing so, I
assume this CF entry should now appropriately be classified as
returned with feedback?

Hello, I am now preparing the patch using the syntax that Simon proposed,
and I will update it in a few days.
Thank you for your interest in this thread.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#42Dmitry Ivanov
d.ivanov@postgrespro.ru
In reply to: Simon Riggs (#36)
Re: [HACKERS] make async slave to wait for lsn to be replayed

In PG11, I propose the following command, sticking mostly to Ants'
syntax, and allowing to wait for multiple events before it returns. It
doesn't hold snapshot and will not get cancelled by Hot Standby.

WAIT FOR event [, event ...] options

event is
LSN value
TIMESTAMP value

options
TIMEOUT delay
UNTIL TIMESTAMP timestamp
(we have both, so people don't need to do math, they can use whichever
they have)

I have a (possibly) dumb question: if we have specified several events,
should WAIT finish if only one of them triggered? It's not immediately
obvious if we're waiting for ALL of them to trigger, or just one will
suffice (ANY). IMO the syntax could be extended to something like:

WAIT FOR [ANY | ALL] event [, event ...] options,

with ANY being the default variant.
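
The difference between the two variants can be sketched in a few lines of C. This is only an illustration of the proposed semantics, not code from the patch: `WaitMode`, `EventKind`, `WaitEvent` and `wait_satisfied()` are hypothetical names, and LSNs and timestamps are reduced to plain 64-bit integers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; none of these exist in PostgreSQL. */
typedef enum { WAIT_ANY, WAIT_ALL } WaitMode;
typedef enum { EVENT_LSN, EVENT_TIMESTAMP } EventKind;

typedef struct
{
	EventKind	kind;
	uint64_t	target;		/* target LSN, or target time in microseconds */
} WaitEvent;

/*
 * Return true once the wait condition holds, given the current replay
 * position and the current time.
 */
static bool
wait_satisfied(WaitMode mode, const WaitEvent *events, size_t nevents,
			   uint64_t replayed_lsn, uint64_t now_us)
{
	size_t		i;

	for (i = 0; i < nevents; i++)
	{
		uint64_t	current = (events[i].kind == EVENT_LSN) ? replayed_lsn : now_us;
		bool		reached = (current >= events[i].target);

		if (mode == WAIT_ANY && reached)
			return true;		/* one event is enough */
		if (mode == WAIT_ALL && !reached)
			return false;		/* at least one event is still pending */
	}

	/* ANY: nothing was reached; ALL: everything was reached (or list empty) */
	return mode == WAIT_ALL;
}
```

With ANY, a waiter would be released by whichever event fires first; with ALL, it would have to re-check the whole list on every wakeup.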

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#43Simon Riggs
simon@2ndquadrant.com
In reply to: Dmitry Ivanov (#42)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 6 March 2018 at 11:24, Dmitry Ivanov <d.ivanov@postgrespro.ru> wrote:

In PG11, I propose the following command, sticking mostly to Ants'
syntax, and allowing to wait for multiple events before it returns. It
doesn't hold snapshot and will not get cancelled by Hot Standby.

WAIT FOR event [, event ...] options

event is
LSN value
TIMESTAMP value

options
TIMEOUT delay
UNTIL TIMESTAMP timestamp
(we have both, so people don't need to do math, they can use whichever
they have)

I have a (possibly) dumb question: if we have specified several events,
should WAIT finish if only one of them triggered? It's not immediately
obvious if we're waiting for ALL of them to trigger, or just one will
suffice (ANY). IMO the syntax could be extended to something like:

WAIT FOR [ANY | ALL] event [, event ...] options,

with ANY being the default variant.

+1

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#44Michael Paquier
michael@paquier.xyz
In reply to: Ivan Kartyshov (#41)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On Tue, Mar 06, 2018 at 02:24:24PM +0300, Ivan Kartyshov wrote:

Hello, I am now preparing the patch using the syntax that Simon proposed,
and I will update it in a few days.
Thank you for your interest in this thread.

It has been more than one month since a patch update has been requested,
and time is growing short. This refactored patch introduces a whole new
concept as well, so my recommendation would be to mark this patch as
returned with feedback, and then review it freshly for v12 if this
concept is still alive and around.
--
Michael

#45David Steele
david@pgmasters.net
In reply to: Michael Paquier (#44)
Re: Re: [HACKERS] make async slave to wait for lsn to be replayed

Hi Ivan,

On 3/6/18 9:25 PM, Michael Paquier wrote:

On Tue, Mar 06, 2018 at 02:24:24PM +0300, Ivan Kartyshov wrote:

Hello, I am now preparing the patch using the syntax that Simon proposed,
and I will update it in a few days.
Thank you for your interest in this thread.

It has been more than one month since a patch update has been requested,
and time is growing short. This refactored patch introduces a whole new
concept as well, so my recommendation would be to mark this patch as
returned with feedback, and then review it freshly for v12 if this
concept is still alive and around.

This patch wasn't updated at the beginning of the CF and still hasn't
been updated after almost two weeks.

I have marked the patch Returned with Feedback. Please resubmit to a
new CF when you have an updated patch.

Regards,
--
-David
david@pgmasters.net

#46Fujii Masao
masao.fujii@gmail.com
In reply to: David Steele (#45)
Re: Re: [HACKERS] make async slave to wait for lsn to be replayed

On Tue, Mar 13, 2018 at 10:06 PM David Steele <david@pgmasters.net> wrote:

Hi Ivan,

On 3/6/18 9:25 PM, Michael Paquier wrote:

On Tue, Mar 06, 2018 at 02:24:24PM +0300, Ivan Kartyshov wrote:

Hello, I am now preparing the patch using the syntax that Simon proposed,
and I will update it in a few days.
Thank you for your interest in this thread.

It has been more than one month since a patch update has been requested,
and time is growing short. This refactored patch introduces a whole new
concept as well, so my recommendation would be to mark this patch as
returned with feedback, and then review it freshly for v12 if this
concept is still alive and around.

This patch wasn't updated at the beginning of the CF and still hasn't
been updated after almost two weeks.

I have marked the patch Returned with Feedback. Please resubmit to a
new CF when you have an updated patch.

There have been no updates for about two years, but this patch
has been registered in CF 2020-03. Not sure why. Should it be marked
as Returned with Feedback again?

Regards,

--
Fujii Masao

#47David Steele
david@pgmasters.net
In reply to: Fujii Masao (#46)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 3/4/20 5:36 AM, Fujii Masao wrote:

On Tue, Mar 13, 2018 at 10:06 PM David Steele <david@pgmasters.net> wrote:

On 3/6/18 9:25 PM, Michael Paquier wrote:

On Tue, Mar 06, 2018 at 02:24:24PM +0300, Ivan Kartyshov wrote:

Hello, I am now preparing the patch using the syntax that Simon proposed,
and I will update it in a few days.
Thank you for your interest in this thread.

It has been more than one month since a patch update has been requested,
and time is growing short. This refactored patch introduces a whole new
concept as well, so my recommendation would be to mark this patch as
returned with feedback, and then review it freshly for v12 if this
concept is still alive and around.

This patch wasn't updated at the beginning of the CF and still hasn't
been updated after almost two weeks.

I have marked the patch Returned with Feedback. Please resubmit to a
new CF when you have an updated patch.

There have been no updates for about two years, but this patch
has been registered in CF 2020-03. Not sure why. Should it be marked
as Returned with Feedback again?

Worse, it was marked Needs Review even though no new patch was provided.

I'm going to set this back to Returned with Feedback. If anyone has a
good reason that it should be in the CF we can always revive it.

Regards,
--
-David
david@pgmasters.net

#48Michael Paquier
michael@paquier.xyz
In reply to: David Steele (#47)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On Wed, Mar 04, 2020 at 07:17:31AM -0500, David Steele wrote:

On 3/4/20 5:36 AM, Fujii Masao wrote:

There have been no updates for about two years, but this patch
has been registered in CF 2020-03. Not sure why. Should it be marked
as Returned with Feedback again?

Worse, it was marked Needs Review even though no new patch was provided.

I'm going to set this back to Returned with Feedback. If anyone has a good
reason that it should be in the CF we can always revive it.

+1.
--
Michael

#49Kartyshov Ivan
i.kartyshov@postgrespro.ru
In reply to: Michael Paquier (#48)
1 attachment(s)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2018-03-06 14:50, Simon Riggs wrote:

On 6 March 2018 at 11:24, Dmitry Ivanov <d.ivanov@postgrespro.ru>
wrote:

In PG11, I propose the following command, sticking mostly to Ants'
syntax, and allowing to wait for multiple events before it returns. It
doesn't hold snapshot and will not get cancelled by Hot Standby.

WAIT FOR event [, event ...] options

event is
LSN value
TIMESTAMP value

options
TIMEOUT delay
UNTIL TIMESTAMP timestamp
(we have both, so people don't need to do math, they can use whichever
they have)

I have a (possibly) dumb question: if we have specified several events,
should WAIT finish if only one of them triggered? It's not immediately
obvious if we're waiting for ALL of them to trigger, or just one will
suffice (ANY). IMO the syntax could be extended to something like:

WAIT FOR [ANY | ALL] event [, event ...] options,

with ANY being the default variant.

+1

Here is a new patch for the feature discussed above.

WAIT FOR - statement that pauses processing of the statements that follow it
==========

Synopsis
==========
WAIT FOR [ANY | SOME | ALL] event [, event ...] options

where event is:
LSN value
TIMESTAMP value

and options is:
TIMEOUT delay
UNTIL TIMESTAMP timestamp

Description
==========
WAIT FOR puts the session to sleep until the requested event happens, so
the statements that follow it are not processed until then.

How to use it
==========
WAIT FOR LSN 'LSN' [, timeout in ms];

# Wait until LSN 0/303EC60 is replayed, or 10 seconds have passed.
WAIT FOR ANY LSN '0/303EC60', TIMEOUT 10000;

# Or the same without a timeout.
WAIT FOR LSN '0/303EC60';

# Or wait for some timestamp.
WAIT FOR TIMESTAMP '2020-01-02 17:20:19.028161+03';

# Wait until ALL events occur: the LSN is replayed and the timestamp has passed.
WAIT FOR ALL LSN '0/303EC60', TIMESTAMP '2020-01-28 11:10:39.021341+03';

Notice: WAIT FOR will release on PostmasterDeath or Interruption events
if they come earlier than the LSN or the timeout.
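
For readers unfamiliar with the LSN literals used in the examples above: an LSN is written as two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit WAL position. The sketch below shows how such a string maps to a single comparable integer. It is a simplified illustration, not code from the patch: `parse_lsn()` and `lsn_reached()` are hypothetical helpers, and the parsing is less strict than the server's own input function.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of the patch: parse "hi/lo" (two hex
 * numbers) into a 64-bit position. Less strict than the server's parser.
 */
static bool
parse_lsn(const char *str, uint64_t *lsn)
{
	unsigned int hi;
	unsigned int lo;
	char		extra;

	/* Exactly two fields must match; a third means trailing garbage. */
	if (sscanf(str, "%X/%X%c", &hi, &lo, &extra) != 2)
		return false;

	*lsn = ((uint64_t) hi << 32) | lo;
	return true;
}

/* A wait for target is over once replay has reached or passed it. */
static bool
lsn_reached(uint64_t target, uint64_t replayed)
{
	return target <= replayed;
}
```

Because the result is a plain unsigned 64-bit value, "has this LSN been replayed" reduces to an integer comparison, which is the same test the patch's wait loop performs against the current replay position.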

Testing the implementation
======================
The implementation was tested with src/test/recovery/t/018_waitfor.pl

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

wait_for_v1.patchtext/x-diff; name=wait_for_v1.patchDownload
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 8d91f3529e6..1a2bc7dfa93 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -187,6 +187,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY update             SYSTEM "update.sgml">
 <!ENTITY vacuum             SYSTEM "vacuum.sgml">
 <!ENTITY values             SYSTEM "values.sgml">
+<!ENTITY waitfor            SYSTEM "waitfor.sgml">
 
 <!-- applications and utilities -->
 <!ENTITY clusterdb          SYSTEM "clusterdb.sgml">
diff --git a/doc/src/sgml/ref/waitfor.sgml b/doc/src/sgml/ref/waitfor.sgml
new file mode 100644
index 00000000000..8fa2ddb4492
--- /dev/null
+++ b/doc/src/sgml/ref/waitfor.sgml
@@ -0,0 +1,136 @@
+<!--
+doc/src/sgml/ref/waitfor.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-waitlsn">
+ <indexterm zone="sql-waitlsn">
+  <primary>WAIT FOR</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>WAIT FOR</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>WAIT FOR</refname>
+  <refpurpose>wait for the target <acronym>LSN</acronym> to be replayed</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>' 
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>' TIMEOUT <replaceable class="parameter">wait_timeout</replaceable>
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>' UNTIL TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR TIMESTAMP <replaceable class="parameter">wait_time</replaceable> 
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>WAIT FOR</command> provides a simple
+   interprocess communication mechanism to wait for a timestamp or the target log sequence
+   number (<acronym>LSN</acronym>) on standby in <productname>PostgreSQL</productname>
+   databases with master-standby asynchronous replication. When run with the
+   <replaceable>LSN</replaceable> option, the <command>WAIT FOR</command> command
+   waits for the specified <acronym>LSN</acronym> to be replayed. By default, wait
+   time is unlimited. Waiting can be interrupted using <literal>Ctrl+C</literal>, or
+   by shutting down the <literal>postgres</literal> server. You can also limit the wait
+   time using the <option>TIMEOUT</option> option.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">LSN</replaceable></term>
+    <listitem>
+     <para>
+      Specify the target log sequence number to wait for.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>TIMEOUT <replaceable class="parameter">wait_timeout</replaceable></term>
+    <listitem>
+     <para>
+      Limit the time interval to wait for the LSN to be replayed.
+      The specified <replaceable>wait_timeout</replaceable> must be an integer
+      and is measured in milliseconds.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>UNTIL TIMESTAMP <replaceable class="parameter">wait_time</replaceable></term>
+    <listitem>
+     <para>
+      Limit the time to wait for the LSN to be replayed.
+      The specified <replaceable>wait_time</replaceable> must be a timestamp.
+     </para>
+    </listitem>
+   </varlistentry>
+
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Run <literal>WAIT FOR</literal> from <application>psql</application>,
+   limiting wait time to 10000 milliseconds:
+
+<screen>
+WAIT FOR LSN '0/3F07A6B1' TIMEOUT 10000;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-------------
+ f
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Wait until the specified <acronym>LSN</acronym> is replayed:
+<screen>
+WAIT FOR LSN '0/3F07A611';
+LSN reached
+-------------
+ t
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Limit <acronym>LSN</acronym> wait time to 500000 milliseconds, and then cancel the command:
+<screen>
+WAIT FOR LSN '0/3F0FF791' TIMEOUT 500000;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-------------
+ f
+(1 row)
+</screen>
+</para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>WAIT FOR</command> statement in the SQL
+   standard.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index cef09dd38b3..bfe62ee3516 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -215,6 +215,7 @@
    &update;
    &vacuum;
    &values;
+   &waitfor;
 
  </reference>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 63ab0ab6c2e..2263e930ac6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -7245,6 +7246,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * After update lastReplayedEndRecPtr set Latches in SHMEM array
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWait())
+				{
+
+					WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d64628566d3..ba42e494aa0 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -20,6 +20,6 @@ OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
 	policy.o portalcmds.o prepare.o proclang.o publicationcmds.o \
 	schemacmds.o seclabel.o sequence.o statscmds.o subscriptioncmds.o \
 	tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o user.o \
-	vacuum.o variable.o view.o
+	vacuum.o variable.o view.o wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00000000000..bceffeb600c
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,265 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.c
+ *	  wait statement: wait
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2019, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/*
+ * -------------------------------------------------------------------------
+ * Wait for LSN to be replayed on replica
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "pgstat.h"
+#include "fmgr.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlogdefs.h"
+#include "access/xlog.h"
+#include "catalog/pg_type.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/backendid.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+
+/* Add/delete to shmem array */
+static void AddEvent(XLogRecPtr trg_lsn);
+static void DeleteEvent(void);
+
+/* Shared memory structures */
+typedef struct
+{
+	XLogRecPtr	trg_lsn;
+	/* Left here struct BIDState for future compatibility */
+} BIDState;
+
+typedef struct
+{
+	int			backend_maxid;
+	XLogRecPtr	min_lsn;
+	slock_t		mutex;
+	BIDState	event_arr[FLEXIBLE_ARRAY_MEMBER];
+} GlobState;
+
+static volatile GlobState *state;
+
+/* Add event for current backend to shmem array */
+static void
+AddEvent(XLogRecPtr trg_lsn)
+{
+	SpinLockAcquire(&state->mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->event_arr[MyBackendId].trg_lsn = trg_lsn;
+
+	if (trg_lsn < state->min_lsn)
+		state->min_lsn = trg_lsn;
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Delete event for current backend from shared array.
+ *
+ * TODO: Consider state cleanup on backend failure.
+ */
+static void
+DeleteEvent(void)
+{
+	int i;
+	XLogRecPtr trg_lsn = state->event_arr[MyBackendId].trg_lsn;
+
+	state->event_arr[MyBackendId].trg_lsn = InvalidXLogRecPtr;
+
+	SpinLockAcquire(&state->mutex);
+	/* Update state->min_lsn if necessary, choosing the next minimum */
+	if (state->min_lsn == trg_lsn)
+	{
+		state->min_lsn = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->event_arr[i].trg_lsn != InvalidXLogRecPtr &&
+				state->event_arr[i].trg_lsn < state->min_lsn)
+				state->min_lsn = state->event_arr[i].trg_lsn;
+	}
+
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >=2; i--)
+			if (state->event_arr[i].trg_lsn != InvalidXLogRecPtr)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->mutex);
+}
+
+/* Get size of shared memory for GlobState */
+Size
+WaitShmemSize(void)
+{
+	return offsetof(GlobState, event_arr) + sizeof(BIDState) * (MaxBackends+1);
+}
+
+/* Init array of events in shared memory */
+void
+WaitShmemInit(void)
+{
+	bool	found;
+	uint32	i;
+
+	state = (GlobState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitShmemSize(),
+										  &found);
+	if (!found)
+	{
+		SpinLockInit(&state->mutex);
+
+		for (i = 0; i < (MaxBackends+1); i++)
+			state->event_arr[i].trg_lsn = InvalidXLogRecPtr;
+
+		state->backend_maxid = 0;
+		state->min_lsn = PG_UINT64_MAX;
+	}
+}
+
+/* Set latches of waiting backends because a new LSN has been replayed */
+void
+WaitSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32		i;
+	int 		backend_maxid;
+	PGPROC	   *backend;
+
+	SpinLockAcquire(&state->mutex);
+	backend_maxid = state->backend_maxid;
+	SpinLockRelease(&state->mutex);
+
+	for (i = 2; i <= backend_maxid; i++)
+	{
+		backend = BackendIdGetProc(i);
+		if (state->event_arr[i].trg_lsn != 0)
+		{
+			if (state->event_arr[i].trg_lsn <= cur_lsn)
+				SetLatch(&backend->procLatch);
+		}
+	}
+}
+
+/* Get minimal LSN that will be next */
+XLogRecPtr
+GetMinWait(void)
+{
+	return state->min_lsn;
+}
+
+/*
+ * On WAIT, use MyLatch and wait until the LSN is replayed, the postmaster
+ * dies, an interrupt arrives, or the timeout expires.
+ */
+void
+WaitUtility(const char *lsn, const int delay, DestReceiver *dest)
+{
+	XLogRecPtr		trg_lsn = DatumGetLSN(DirectFunctionCall1(pg_lsn_in, CStringGetDatum(lsn)));
+	XLogRecPtr		cur_lsn;
+	int				latch_events;
+	uint64			tdelay = delay;
+	long			secs;
+	int				microsecs;
+	TimestampTz		timer = GetCurrentTimestamp();
+	TupOutputState *tstate;
+	TupleDesc		tupdesc;
+	char		   *value = "f";
+
+	if (delay > 0)
+		latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;
+	else
+		latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;
+
+	AddEvent(trg_lsn);
+
+	for (;;)
+	{
+		cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN has been Replayed */
+		if (trg_lsn <= cur_lsn)
+			break;
+
+		/* If the postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		/* If delay time is over */
+		if (latch_events & WL_TIMEOUT)
+		{
+			if (TimestampDifferenceExceeds(timer,GetCurrentTimestamp(),delay))
+				break;
+			TimestampDifference(timer,GetCurrentTimestamp(),&secs, &microsecs);
+			tdelay = delay - (secs*1000 + microsecs/1000);
+		}
+
+		/* Small hack, like SnapshotResetXmin, so that we work without a snapshot */
+		MyPgXact->xmin = InvalidTransactionId;
+		WaitLatch(MyLatch, latch_events, tdelay, WAIT_EVENT_CLIENT_READ);
+		ResetLatch(MyLatch);
+
+		/* CHECK_FOR_INTERRUPTS if they come then Delete current event from array */
+		if (InterruptPending)
+		{
+			DeleteEvent();
+			ProcessInterrupts();
+		}
+
+	}
+
+	DeleteEvent();
+
+	if (trg_lsn > cur_lsn)
+		elog(NOTICE,"LSN is not reached. Try to increase wait time.");
+	else
+		value = "t";
+
+	/* Need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+	/* Prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsMinimalTuple);
+
+	/* Send it */
+	do_text_output_oneline(tstate, value);
+	end_tup_output(tstate);
+}
+
+void
+WaitTimeUtility(const int delay)
+{
+	int				latch_events;
+
+	if (delay < 0)
+		return ;
+
+	latch_events = WL_TIMEOUT | WL_POSTMASTER_DEATH;
+
+	MyPgXact->xmin = InvalidTransactionId;
+	WaitLatch(MyLatch, latch_events, delay, WAIT_EVENT_CLIENT_READ);
+	ResetLatch(MyLatch);
+}
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index ecd6a8bae7d..5ccf14ead46 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -78,6 +78,7 @@ static Query *transformCreateTableAsStmt(ParseState *pstate,
 										 CreateTableAsStmt *stmt);
 static Query *transformCallStmt(ParseState *pstate,
 								CallStmt *stmt);
+static Query *transformWaitForStmt(ParseState *pstate, WaitStmt *stmt);
 static void transformLockingClause(ParseState *pstate, Query *qry,
 								   LockingClause *lc, bool pushedDown);
 #ifdef RAW_EXPRESSION_COVERAGE_TEST
@@ -326,6 +327,9 @@ transformStmt(ParseState *pstate, Node *parseTree)
 			result = transformCallStmt(pstate,
 									   (CallStmt *) parseTree);
 			break;
+		case T_WaitStmt:
+			result = transformWaitForStmt(pstate, (WaitStmt *) parseTree);
+			break;
 
 		default:
 
@@ -2981,6 +2985,20 @@ applyLockingClause(Query *qry, Index rtindex,
 	qry->rowMarks = lappend(qry->rowMarks, rc);
 }
 
+static Query *
+transformWaitForStmt(ParseState *pstate, WaitStmt *stmt)
+{
+	Query *result;
+
+	stmt->time = transformExpr(pstate, stmt->time, EXPR_KIND_OTHER);
+
+	result = makeNode(Query);
+	result->commandType = CMD_UTILITY;
+	result->utilityStmt = (Node *) stmt;
+
+	return result;
+}
+
 /*
  * Coverage testing for raw_expression_tree_walker().
  *
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 208b4a1f28a..d13e22fea86 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -276,7 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		SecLabelStmt SelectStmt TransactionStmt TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
 		VariableResetStmt VariableSetStmt VariableShowStmt
-		ViewStmt CheckPointStmt CreateConversionStmt
+		ViewStmt WaitStmt CheckPointStmt CreateConversionStmt
 		DeallocateStmt PrepareStmt ExecuteStmt
 		DropOwnedStmt ReassignOwnedStmt
 		AlterTSConfigurationStmt AlterTSDictionaryStmt
@@ -485,7 +485,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <list>	row explicit_row implicit_row type_list array_expr_list
 %type <node>	case_expr case_arg when_clause case_default
 %type <list>	when_clause_list
-%type <ival>	sub_type opt_materialized
+%type <ival>	sub_type wait_strategy opt_materialized
 %type <value>	NumericOnly
 %type <list>	NumericOnly_list
 %type <alias>	alias_clause opt_alias_clause
@@ -655,7 +655,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	LABEL LANGUAGE LARGE_P LAST_P LATERAL_P
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
-	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
+	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED LSN
 
 	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
@@ -685,7 +685,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P
 
 	TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
-	TIES TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
+	TIES TIME TIMEOUT TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
 	TREAT TRIGGER TRIM TRUE_P
 	TRUNCATE TRUSTED TYPE_P TYPES_P
 
@@ -695,7 +695,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAIT WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLNAMESPACES
 	XMLPARSE XMLPI XMLROOT XMLSERIALIZE XMLTABLE
@@ -947,6 +948,7 @@ stmt :
 			| VariableSetStmt
 			| VariableShowStmt
 			| ViewStmt
+			| WaitStmt
 			| /*EMPTY*/
 				{ $$ = NULL; }
 		;
@@ -14010,6 +14012,88 @@ xml_passing_mech:
 			| BY VALUE_P
 		;
 
+/*****************************************************************************
+ *
+ *		QUERY:
+ *				WAIT FOR <event> [, <event> ...] [option]
+ *				event:
+ *					LSN value
+ *					TIMEOUT value
+ *					TIMESTAMP timestamp
+ *				option:
+ *					TIMEOUT delay
+ *					UNTIL TIMESTAMP timestamp
+ *
+ *****************************************************************************/
+
+WaitStmt:
+			WAIT FOR wait_strategy LSN Sconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->kind = WAIT_FOR_LSN;
+					n->lsn = $5;
+					n->delay = 0;
+					n->time = 0;
+					$$ = (Node *)n;
+				}
+			| WAIT FOR  wait_strategy LSN Sconst TIMEOUT Iconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->kind = WAIT_FOR_MIX;
+					n->lsn = $5;
+					n->delay = $7;
+					n->time = 0;
+					$$ = (Node *)n;
+				}
+			| WAIT FOR  wait_strategy LSN Sconst UNTIL ConstDatetime Sconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->kind = WAIT_FOR_MIX;
+					n->lsn = $5;
+					n->delay = 0;
+					n->time = makeStringConstCast($8, @8, $7);
+					$$ = (Node *)n;
+				}
+			| WAIT FOR wait_strategy ConstDatetime Sconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->kind = WAIT_FOR_TIME;
+					n->lsn = NULL;
+					n->delay = 0;
+					n->time = makeStringConstCast($5, @5, $4);
+					$$ = (Node *)n;
+				}
+		;
+
+wait_strategy:
+			ALL					{ $$ = WAIT_FOR_ALL; }
+			| ANY				{ $$ = WAIT_FOR_ANY; }
+			| SOME				{ $$ = WAIT_FOR_ANY; }
+			| /* EMPTY */		{ $$ = WAIT_FOR_ANY; }
+		;
+/*
+WaitEvent:
+			LSN Sconst WaitOption
+				{
+					TypeName *t = makeTypeNameFromNameList("pg_lsn");
+					t->location = @1;
+					$$ = makeStringConstCast($2, @2, t);
+				}
+			| WaitOption
+				{
+					$$ = $1;
+				}
+		;
+
+WaitOption:
+			ConstDatetime Sconst
+				{
+				$$ = makeStringConstCast($2, @2, $1);
+				}
+			|
+				{ $$ = NIL; }
+		;
+*/
 
 /*
  * Aggregate decoration clauses
@@ -15153,6 +15237,7 @@ unreserved_keyword:
 			| LOCK_P
 			| LOCKED
 			| LOGGED
+			| LSN
 			| MAPPING
 			| MATCH
 			| MATERIALIZED
@@ -15275,6 +15360,7 @@ unreserved_keyword:
 			| TEMPORARY
 			| TEXT_P
 			| TIES
+			| TIMEOUT
 			| TRANSACTION
 			| TRANSFORM
 			| TRIGGER
@@ -15300,6 +15386,7 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAIT
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index d7d733530ff..05876634d22 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -147,6 +148,7 @@ CreateSharedMemoryAndSemaphores(int port)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WaitShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -264,6 +266,11 @@ CreateSharedMemoryAndSemaphores(int port)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of Latches in SHMEM for Wait
+	 */
+	WaitShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 05ec7f3ac61..a1e07c2c8f0 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -57,6 +57,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -70,6 +71,7 @@
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 #include "utils/rel.h"
+#include "executor/spi.h"
 
 
 /* Hook for plugins to get control in ProcessUtility() */
@@ -922,6 +924,61 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				break;
 			}
 
+		case T_WaitStmt:
+			{
+				WaitStmt *stmt = (WaitStmt *) parsetree;
+				float8 time_val = 0;	/* stays 0 when no timestamp was given */
+
+				if (stmt->time)
+				{
+					Const *time = (Const *) stmt->time;
+					int ret;
+
+					Oid		types[] = { time->consttype };
+					Datum	values[] = { time->constvalue };
+					char	nulls[] = { " " };
+
+					Datum result;
+					bool isnull;
+
+					SPI_connect();
+
+					if (time->consttype == 1083)
+						ret = SPI_execute_with_args("select extract (epoch from ($1 - now()::time))",
+													1, types, values, nulls, true, 0);
+					else if (time->consttype == 1266)
+						ret = SPI_execute_with_args("select extract (epoch from (timezone('UTC',$1)::time - timezone('UTC', now()::timetz)::time))",
+													1, types, values, nulls, true, 0);
+					else
+						ret = SPI_execute_with_args("select extract (epoch from ($1 - now()))",
+													1, types, values, nulls, true, 0);
+
+					Assert(ret >= 0);
+					result = SPI_getbinval(SPI_tuptable->vals[0],
+										   SPI_tuptable->tupdesc,
+										   1, &isnull);
+
+					Assert(!isnull);
+					time_val = DatumGetFloat8(result);
+
+					elog(INFO, "time: %f", time_val);
+
+					SPI_finish();
+				}
+
+				if (time_val <= 0)
+					time_val = 0;
+
+				if (!stmt->delay)
+					stmt->delay = (int)(time_val * 1000);
+
+				if (stmt->kind == WAIT_FOR_TIME)
+					WaitTimeUtility(time_val * 1000);
+				else
+					WaitUtility(stmt->lsn, stmt->delay, dest);
+			}
+			break;
+
 		default:
 			/* All other statement types have event trigger support */
 			ProcessUtilitySlow(pstate, pstmt, queryString,
@@ -2567,6 +2624,10 @@ CreateCommandTag(Node *parsetree)
 			tag = "NOTIFY";
 			break;
 
+		case T_WaitStmt:
+			tag = "WAIT";
+			break;
+
 		case T_ListenStmt:
 			tag = "LISTEN";
 			break;
@@ -3194,6 +3255,10 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
+		case T_WaitStmt:
+			lev = LOGSTMT_ALL;
+			break;
+
 		case T_ListenStmt:
 			lev = LOGSTMT_ALL;
 			break;
diff --git a/src/include/commands/wait.h b/src/include/commands/wait.h
new file mode 100644
index 00000000000..815aaa2f92f
--- /dev/null
+++ b/src/include/commands/wait.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.h
+ *	  wait notification: wait
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2016, Regents of PostgresPRO
+ *
+ * src/include/commands/wait.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAIT_H
+#define WAIT_H
+#include "postgres.h"
+#include "tcop/dest.h"
+
+extern void WaitUtility(const char *lsn, const int delay, DestReceiver *dest);
+extern void WaitTimeUtility(const int delay);
+extern Size WaitShmemSize(void);
+extern void WaitShmemInit(void);
+extern void WaitSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWait(void);
+
+#endif   /* WAIT_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 4e2fb39105b..ef94eaf50bf 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -486,6 +486,7 @@ typedef enum NodeTag
 	T_DropReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
+	T_WaitStmt,
 	T_SQLCmd,
 
 	/*
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d6b943c898c..4c58bf3737a 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3521,4 +3521,32 @@ typedef struct DropSubscriptionStmt
 	DropBehavior behavior;		/* RESTRICT or CASCADE behavior */
 } DropSubscriptionStmt;
 
+/* ----------------------
+ *		Wait Statement
+ * ----------------------
+ */
+
+typedef enum WaitForType
+{
+	WAIT_FOR_LSN = 0,
+	WAIT_FOR_TIME,
+	WAIT_FOR_MIX
+} WaitForType;
+
+typedef enum WaitForStrategy
+{
+	WAIT_FOR_ANY = 0,
+	WAIT_FOR_ALL
+} WaitForStrategy;
+
+typedef struct WaitStmt
+{
+	NodeTag			type;
+	WaitForType		kind;
+	WaitForStrategy	strategy;
+	char		   *lsn;		/* Target LSN to wait for */
+	int				delay;		/* Delay to wait for LSN */
+	Node		   *time;		/* Wait for timestamp */
+} WaitStmt;
+
 #endif							/* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 00ace8425e2..51d01e10f56 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -242,6 +242,7 @@ PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
+PG_KEYWORD("lsn", LSN, UNRESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
 PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
@@ -403,6 +404,7 @@ PG_KEYWORD("text", TEXT_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("then", THEN, RESERVED_KEYWORD)
 PG_KEYWORD("ties", TIES, UNRESERVED_KEYWORD)
 PG_KEYWORD("time", TIME, COL_NAME_KEYWORD)
+PG_KEYWORD("timeout", TIMEOUT, UNRESERVED_KEYWORD)
 PG_KEYWORD("timestamp", TIMESTAMP, COL_NAME_KEYWORD)
 PG_KEYWORD("to", TO, RESERVED_KEYWORD)
 PG_KEYWORD("trailing", TRAILING, RESERVED_KEYWORD)
@@ -442,6 +444,7 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("wait", WAIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
diff --git a/src/test/recovery/t/018_waitfor.pl b/src/test/recovery/t/018_waitfor.pl
new file mode 100644
index 00000000000..6817431e9c8
--- /dev/null
+++ b/src/test/recovery/t/018_waitfor.pl
@@ -0,0 +1,64 @@
+# Checks WAIT FOR
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+
+# And some content
+$node_master->safe_psql('postgres',
+	"CREATE TABLE tab_int AS SELECT generate_series(1, 10) AS a");
+
+# Take backup
+my $backup_name = 'my_backup';
+$node_master->backup($backup_name);
+
+# Create streaming standby from backup
+my $node_standby = get_new_node('standby');
+my $delay        = 4;
+$node_standby->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby->append_conf(
+	'postgresql.conf', qq(
+recovery_min_apply_delay = '${delay}s'
+));
+$node_standby->start;
+
+# Make new content on master and check its presence in standby depending
+# on the delay applied above. Before doing the insertion, get the
+# current timestamp that will be used as a comparison base. Even on slow
+# machines, this allows to have a predictable behavior when comparing the
+# delay between data insertion moment on master and replay time on standby.
+my $master_insert_time = time();
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(11, 20))");
+
+# Now wait for replay to complete on standby. We're done waiting when the
+# standby has replayed up to the previously saved master LSN.
+my $until_lsn =
+  $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(21, 30))");
+
+# Check that WAIT FOR LSN can wait without a timeout and return once
+# the target LSN has been replayed.
+$node_standby->safe_psql('postgres',
+    "WAIT FOR LSN '$until_lsn'", 't')
+  or die "standby never caught up";
+
+# Check that WAIT FOR LSN with a short TIMEOUT eventually returns 't'.
+$node_standby->poll_query_until('postgres',
+    "WAIT FOR LSN '$until_lsn' TIMEOUT 1", 't')
+  or die "standby never caught up";
+
+# This test is successful if and only if the LSN has been applied with at least
+# the configured apply delay.
+my $time_waited = time() - $master_insert_time;
+ok($time_waited >= $delay,"standby applies WAL only after replication delay");
#50Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kartyshov Ivan (#49)
Re: [HACKERS] make async slave to wait for lsn to be replayed

Hello.

I looked at this briefly but have not tested it.

At Fri, 06 Mar 2020 00:24:01 +0300, Kartyshov Ivan <i.kartyshov@postgrespro.ru> wrote in

On 2018-03-06 14:50, Simon Riggs wrote:

On 6 March 2018 at 11:24, Dmitry Ivanov <d.ivanov@postgrespro.ru>
wrote:

In PG11, I propose the following command, sticking mostly to Ants'
syntax, and allowing to wait for multiple events before it returns. It
doesn't hold snapshot and will not get cancelled by Hot Standby.
WAIT FOR event [, event ...] options
event is
LSN value
TIMESTAMP value
options
TIMEOUT delay
UNTIL TIMESTAMP timestamp
(we have both, so people don't need to do math, they can use whichever
they have)
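
The "math" that the two options save people from is the conversion between an absolute deadline and a relative delay. A minimal C sketch of that conversion follows; the function name and the microsecond time base are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert an absolute deadline (UNTIL TIMESTAMP) into
 * a relative delay in milliseconds (TIMEOUT).  Both arguments are
 * microseconds since the same arbitrary epoch and must be non-negative.
 * Returns 0 if the deadline has already passed; clamps to INT_MAX.
 */
int
deadline_to_timeout_ms(int64_t deadline_us, int64_t now_us)
{
    int64_t remaining_ms;

    /* Non-negative inputs guarantee the subtraction cannot overflow. */
    assert(deadline_us >= 0 && now_us >= 0);

    if (deadline_us <= now_us)
        return 0;               /* deadline is already in the past */

    remaining_ms = (deadline_us - now_us) / 1000;
    if (remaining_ms > INT_MAX)
        return INT_MAX;         /* keep the result within an int */

    return (int) remaining_ms;
}
```

Whatever consumes the result has to decide what a delay of 0 means: "give up immediately" and "wait forever" are both plausible readings, so the caller should pick one explicitly.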

I have a (possibly) dumb question: if we have specified several
events,
should WAIT finish if only one of them triggered? It's not immediately
obvious if we're waiting for ALL of them to trigger, or just one will
suffice (ANY). IMO the syntax could be extended to something like:
WAIT FOR [ANY | ALL] event [, event ...] options,
with ANY being the default variant.
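
To make the two readings concrete, here is a minimal C sketch of how ANY and ALL could be evaluated over a list of events. The enum, the function name, and the boolean array are hypothetical and serve only to illustrate the semantics being discussed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical strategy names, not the identifiers used by any patch. */
typedef enum { STRAT_ANY, STRAT_ALL } wait_strategy;

/*
 * Decide whether a WAIT over several events may return.
 * fired[i] is true once event i has triggered.
 */
bool
wait_is_satisfied(wait_strategy strategy, const bool *fired, size_t nevents)
{
    size_t i;

    assert(fired != NULL && nevents > 0);

    for (i = 0; i < nevents; i++)
    {
        if (strategy == STRAT_ANY && fired[i])
            return true;        /* one triggered event is enough */
        if (strategy == STRAT_ALL && !fired[i])
            return false;       /* at least one event is still pending */
    }

    /* ANY: nothing has fired yet.  ALL: every event has fired. */
    return strategy == STRAT_ALL;
}
```

With a single event the two strategies behave identically, which is one reason ANY is a safe default.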

+1

Here is a new patch implementing the feature discussed above.

WAIT FOR - statement that pauses the statements that follow it
==========

Synopsis
==========
WAIT FOR [ANY | SOME | ALL] event [, event ...] options
and event is:
LSN value
TIMESTAMP value

and options is:
TIMEOUT delay
UNTIL TIMESTAMP timestamp

The syntax seems confusing. What happens if we type in the
command "WAIT FOR TIMESTAMP '...' UNTIL TIMESTAMP '....'"? It seems
to me that the options are useless there. Couldn't the TIMEOUT option be
a part of the event? I know gram.y doesn't accept that syntax, but that
is not apparent from the description above.

As I read through the previous thread, one of the reasons this
feature was implemented as syntax is that it was intended to be combined
with the BEGIN statement. If there is no use case for the feature in the
middle of a transaction, why don't you make it a part of the BEGIN command?

Description
==========
WAIT FOR - puts the statements that follow it to sleep until the
event happens (no new queries are processed until then).

...

Notice: WAIT FOR will release on PostmasterDeath or Interruption
events if they come earlier than the LSN or timeout.

I think interrupts ought to result in ERROR.

wait.c adds a fair amount of code and uses a proc-array based
approach. But Thomas suggested a queue-based approach and I also think
it is better. We already have a queue-based mechanism that behaves
almost the same as this feature in the commit code on the master side. It
avoids spurious backend wakeups. Couldn't we extend SyncRepWaitForLSN
or share part of the code/infrastructure so that this feature can
reuse it?
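
To illustrate why a queue avoids spurious wakeups, here is a minimal, single-process C sketch. It is not the SyncRepWaitForLSN code; the struct, the function names, and the `woken` flag (standing in for SetLatch()) are assumptions made for the example, and a real implementation would keep the list in shared memory under a lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical waiter entry; "woken" stands in for setting a latch. */
typedef struct waiter
{
    uint64_t        target_lsn; /* LSN this backend waits for */
    int             woken;      /* set to 1 when the waiter is released */
    struct waiter  *next;
} waiter;

/* Insert w so that the queue stays ordered by ascending target_lsn. */
void
queue_insert(waiter **head, waiter *w)
{
    waiter **link = head;

    assert(head != NULL && w != NULL);

    while (*link != NULL && (*link)->target_lsn <= w->target_lsn)
        link = &(*link)->next;
    w->next = *link;
    *link = w;
}

/*
 * Release every waiter whose target LSN has been replayed and unlink it.
 * The walk stops at the first waiter whose target is still ahead, so
 * backends that cannot proceed yet are never touched.  Returns the number
 * of waiters released.
 */
int
queue_wake_upto(waiter **head, uint64_t replayed_lsn)
{
    int released = 0;

    assert(head != NULL);

    while (*head != NULL && (*head)->target_lsn <= replayed_lsn)
    {
        waiter *w = *head;

        *head = w->next;
        w->next = NULL;
        w->woken = 1;
        released++;
    }
    return released;
}
```

Because the queue is sorted, the head entry is also the minimum pending LSN, so no separate min_lsn bookkeeping is needed in this arrangement.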

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#51Kartyshov Ivan
i.kartyshov@postgrespro.ru
In reply to: Kyotaro Horiguchi (#50)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020-03-06 08:54, Kyotaro Horiguchi wrote:

The syntax seems confusing. What happens if we type in the
command "WAIT FOR TIMESTAMP '...' UNTIL TIMESTAMP '....'"? It seems
to me that the options are useless there. Couldn't the TIMEOUT option be
a part of the event? I know gram.y doesn't accept that syntax, but that
is not apparent from the description above.

I'll fix the doc file.

Synopsis
==========
WAIT FOR [ANY | SOME | ALL] event [, event ...]
and event is:
LSN value [options]
TIMESTAMP value

and options is:
TIMEOUT delay
UNTIL TIMESTAMP timestamp

As I read through the previous thread, one of the reasons this
feature was implemented as syntax is that it was intended to be combined
with the BEGIN statement. If there is no use case for the feature in the
middle of a transaction, why don't you make it a part of the BEGIN command?

It seems to have some limitations on hot standbys. I'll take a few days
to make a prototype.

Description
==========
WAIT FOR - puts the statements that follow it to sleep until the
event happens (no new queries are processed until then).

...

Notice: WAIT FOR will release on PostmasterDeath or Interruption
events if they come earlier than the LSN or timeout.

I think interrupts ought to result in ERROR.

wait.c adds a fair amount of code and uses a proc-array based
approach. But Thomas suggested a queue-based approach and I also think
it is better. We already have a queue-based mechanism that behaves
almost the same as this feature in the commit code on the master side. It
avoids spurious backend wakeups. Couldn't we extend SyncRepWaitForLSN
or share part of the code/infrastructure so that this feature can
reuse it?

I'll take a look at it.

Thank you for your review.

Rebased patch is attached.
--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#52Adam Brusselback
adambrusselback@gmail.com
In reply to: Kartyshov Ivan (#51)
Re: [HACKERS] make async slave to wait for lsn to be replayed

I just wanted to express my excitement that this is being picked up again.
I was very much looking forward to this years ago, and the use case for me
is still there, so I am excited to see this moving again.

#53Kartyshov Ivan
i.kartyshov@postgrespro.ru
In reply to: Kartyshov Ivan (#51)
1 attachment(s)
Re: [HACKERS] make async slave to wait for lsn to be replayed

Sorry, I had some trouble sending email.
On 2020-03-06 08:54, Kyotaro Horiguchi wrote:

The syntax seems confusing. What happens if we type in the
command "WAIT FOR TIMESTAMP '...' UNTIL TIMESTAMP '....'"? It seems
to me that the options are useless there. Couldn't the TIMEOUT option be
a part of the event? I know gram.y doesn't accept that syntax, but that
is not apparent from the description above.

I'll fix the doc file.

Synopsis
==========
WAIT FOR [ANY | SOME | ALL] event [, event ...]
and event is:
LSN value [options]
TIMESTAMP value

and options is:
TIMEOUT delay
UNTIL TIMESTAMP timestamp

As I read through the previous thread, one of the reasons this
feature was implemented as syntax is that it was intended to be combined
with the BEGIN statement. If there is no use case for the feature in the
middle of a transaction, why don't you make it a part of the BEGIN command?

It seems to have some limitations on hot standbys. I'll take a few days
to make a prototype.

Description
==========
WAIT FOR - puts the statements that follow it to sleep until the
event happens (no new queries are processed until then).

...

Notice: WAIT FOR will release on PostmasterDeath or Interruption
events if they come earlier than the LSN or timeout.

I think interrupts ought to result in ERROR.

wait.c adds a fair amount of code and uses a proc-array based
approach. But Thomas suggested a queue-based approach and I also think
it is better. We already have a queue-based mechanism that behaves
almost the same as this feature in the commit code on the master side. It
avoids spurious backend wakeups. Couldn't we extend SyncRepWaitForLSN
or share part of the code/infrastructure so that this feature can
reuse it?

I'll take a look at it.

Thank you for your review.

Rebased patch is attached.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

wait_for_v2.patch (text/x-diff; name=wait_for_v2.patch)
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 8d91f3529e..1a2bc7dfa9 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -187,6 +187,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY update             SYSTEM "update.sgml">
 <!ENTITY vacuum             SYSTEM "vacuum.sgml">
 <!ENTITY values             SYSTEM "values.sgml">
+<!ENTITY waitfor            SYSTEM "waitfor.sgml">
 
 <!-- applications and utilities -->
 <!ENTITY clusterdb          SYSTEM "clusterdb.sgml">
diff --git a/doc/src/sgml/ref/waitfor.sgml b/doc/src/sgml/ref/waitfor.sgml
new file mode 100644
index 0000000000..8fa2ddb449
--- /dev/null
+++ b/doc/src/sgml/ref/waitfor.sgml
@@ -0,0 +1,136 @@
+<!--
+doc/src/sgml/ref/waitfor.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-waitlsn">
+ <indexterm zone="sql-waitlsn">
+  <primary>WAIT FOR</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>WAIT FOR</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>WAIT FOR</refname>
+  <refpurpose>wait for the target <acronym>LSN</acronym> to be replayed</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>' 
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>' TIMEOUT <replaceable class="parameter">wait_timeout</replaceable>
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>' UNTIL TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR TIMESTAMP <replaceable class="parameter">wait_time</replaceable> 
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>WAIT FOR</command> provides a simple
+   interprocess communication mechanism to wait for a timestamp or for the target log
+   sequence number (<acronym>LSN</acronym>) on a standby in <productname>PostgreSQL</productname>
+   databases with asynchronous master-standby replication. When run with the
+   <replaceable>LSN</replaceable> option, the <command>WAIT FOR</command> command
+   waits for the specified <acronym>LSN</acronym> to be replayed. By default, wait
+   time is unlimited. Waiting can be interrupted using <literal>Ctrl+C</literal>, or
+   by shutting down the <literal>postgres</literal> server. You can also limit the wait
+   time using the <option>TIMEOUT</option> option.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">LSN</replaceable></term>
+    <listitem>
+     <para>
+      Specify the target log sequence number to wait for.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>TIMEOUT <replaceable class="parameter">wait_timeout</replaceable></term>
+    <listitem>
+     <para>
+      Limit the time interval to wait for the LSN to be replayed.
+      The specified <replaceable>wait_timeout</replaceable> must be an integer
+      and is measured in milliseconds.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>UNTIL TIMESTAMP <replaceable class="parameter">wait_time</replaceable></term>
+    <listitem>
+     <para>
+      Limit the time to wait for the LSN to be replayed.
+      The specified <replaceable>wait_time</replaceable> must be a timestamp.
+     </para>
+    </listitem>
+   </varlistentry>
+
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Run <literal>WAIT FOR</literal> from <application>psql</application>,
+   limiting wait time to 10000 milliseconds:
+
+<screen>
+WAIT FOR LSN '0/3F07A6B1' TIMEOUT 10000;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-------------
+ f
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Wait until the specified <acronym>LSN</acronym> is replayed:
+<screen>
+WAIT FOR LSN '0/3F07A611';
+LSN reached
+-------------
+ t
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Limit <acronym>LSN</acronym> wait time to 500000 milliseconds, and then cancel the command:
+<screen>
+WAIT FOR LSN '0/3F0FF791' TIMEOUT 500000;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-------------
+ f
+(1 row)
+</screen>
+</para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>WAIT FOR</command> statement in the SQL
+   standard.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index cef09dd38b..bfe62ee351 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -215,6 +215,7 @@
    &update;
    &vacuum;
    &values;
+   &waitfor;
 
  </reference>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4361568882..d1963b1728 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -41,6 +41,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
@@ -7285,6 +7286,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * After updating lastReplayedEndRecPtr, set latches in the SHMEM array
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWait())
+				{
+
+					WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3ce6..9b310926c1 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 0000000000..bceffeb600
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,265 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.c
+ *	  wait statement: wait
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2019, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/*
+ * -------------------------------------------------------------------------
+ * Wait for an LSN to be replayed on a replica
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "pgstat.h"
+#include "fmgr.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlogdefs.h"
+#include "access/xlog.h"
+#include "catalog/pg_type.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/backendid.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+
+/* Add/delete to shmem array */
+static void AddEvent(XLogRecPtr trg_lsn);
+static void DeleteEvent(void);
+
+/* Shared memory structures */
+typedef struct
+{
+	XLogRecPtr	trg_lsn;
+	/* BIDState is kept as a struct so that fields can be added later */
+} BIDState;
+
+typedef struct
+{
+	int			backend_maxid;
+	XLogRecPtr	min_lsn;
+	slock_t		mutex;
+	BIDState	event_arr[FLEXIBLE_ARRAY_MEMBER];
+} GlobState;
+
+static volatile GlobState *state;
+
+/* Add event for current backend to shmem array */
+static void
+AddEvent(XLogRecPtr trg_lsn)
+{
+	SpinLockAcquire(&state->mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->event_arr[MyBackendId].trg_lsn = trg_lsn;
+
+	if (trg_lsn < state->min_lsn)
+		state->min_lsn = trg_lsn;
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Delete the current backend's event from the shared array.
+ *
+ * TODO: Consider state cleanup on backend failure.
+ */
+static void
+DeleteEvent(void)
+{
+	int i;
+	XLogRecPtr trg_lsn = state->event_arr[MyBackendId].trg_lsn;
+
+	state->event_arr[MyBackendId].trg_lsn = InvalidXLogRecPtr;
+
+	SpinLockAcquire(&state->mutex);
+	/* Recompute state->min_lsn only if the deleted event held the minimum */
+	if (state->min_lsn == trg_lsn)
+	{
+		state->min_lsn = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->event_arr[i].trg_lsn != InvalidXLogRecPtr &&
+				state->event_arr[i].trg_lsn < state->min_lsn)
+				state->min_lsn = state->event_arr[i].trg_lsn;
+	}
+
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >=2; i--)
+			if (state->event_arr[i].trg_lsn != InvalidXLogRecPtr)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->mutex);
+}
+
+/* Get size of shared memory for GlobState */
+Size
+WaitShmemSize(void)
+{
+	return offsetof(GlobState, event_arr) + sizeof(BIDState) * (MaxBackends+1);
+}
+
+/* Init array of events in shared memory */
+void
+WaitShmemInit(void)
+{
+	bool	found;
+	uint32	i;
+
+	state = (GlobState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitShmemSize(),
+										  &found);
+	if (!found)
+	{
+		SpinLockInit(&state->mutex);
+
+		for (i = 0; i < (MaxBackends+1); i++)
+			state->event_arr[i].trg_lsn = InvalidXLogRecPtr;
+
+		state->backend_maxid = 0;
+		state->min_lsn = PG_UINT64_MAX;
+	}
+}
+
+/* Set all latches in shared memory to signal that a new LSN has been replayed */
+void
+WaitSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32		i;
+	int 		backend_maxid;
+	PGPROC	   *backend;
+
+	SpinLockAcquire(&state->mutex);
+	backend_maxid = state->backend_maxid;
+	SpinLockRelease(&state->mutex);
+
+	for (i = 2; i <= backend_maxid; i++)
+	{
+		backend = BackendIdGetProc(i);
+		if (state->event_arr[i].trg_lsn != 0)
+		{
+			if (state->event_arr[i].trg_lsn <= cur_lsn)
+				SetLatch(&backend->procLatch);
+		}
+	}
+}
+
+/* Get minimal LSN that will be next */
+XLogRecPtr
+GetMinWait(void)
+{
+	return state->min_lsn;
+}
+
+/*
+ * On WAIT, use MyLatch to wait until the LSN is replayed, the postmaster dies,
+ * an interrupt arrives, or the timeout expires.
+ */
+void
+WaitUtility(const char *lsn, const int delay, DestReceiver *dest)
+{
+	XLogRecPtr		trg_lsn = DatumGetLSN(DirectFunctionCall1(pg_lsn_in, CStringGetDatum(lsn)));
+	XLogRecPtr		cur_lsn;
+	int				latch_events;
+	uint64			tdelay = delay;
+	long			secs;
+	int				microsecs;
+	TimestampTz		timer = GetCurrentTimestamp();
+	TupOutputState *tstate;
+	TupleDesc		tupdesc;
+	char		   *value = "f";
+
+	if (delay > 0)
+		latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;
+	else
+		latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;
+
+	AddEvent(trg_lsn);
+
+	for (;;)
+	{
+		cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN has been Replayed */
+		if (trg_lsn <= cur_lsn)
+			break;
+
+		/* If the postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		/* If delay time is over */
+		if (latch_events & WL_TIMEOUT)
+		{
+			if (TimestampDifferenceExceeds(timer,GetCurrentTimestamp(),delay))
+				break;
+			TimestampDifference(timer,GetCurrentTimestamp(),&secs, &microsecs);
+			tdelay = delay - (secs*1000 + microsecs/1000);
+		}
+
+		/* Small hack: reset xmin, like SnapshotResetXmin, to work outside a snapshot */
+		MyPgXact->xmin = InvalidTransactionId;
+		WaitLatch(MyLatch, latch_events, tdelay, WAIT_EVENT_CLIENT_READ);
+		ResetLatch(MyLatch);
+
+		/* Like CHECK_FOR_INTERRUPTS, but delete the current event from the array first */
+		if (InterruptPending)
+		{
+			DeleteEvent();
+			ProcessInterrupts();
+		}
+
+	}
+
+	DeleteEvent();
+
+	if (trg_lsn > cur_lsn)
+		elog(NOTICE,"LSN is not reached. Try to increase wait time.");
+	else
+		value = "t";
+
+	/* Need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+	/* Prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsMinimalTuple);
+
+	/* Send it */
+	do_text_output_oneline(tstate, value);
+	end_tup_output(tstate);
+}
+
+void
+WaitTimeUtility(const int delay)
+{
+	int				latch_events;
+
+	if (delay < 0)
+		return ;
+
+	latch_events = WL_TIMEOUT | WL_POSTMASTER_DEATH;
+
+	MyPgXact->xmin = InvalidTransactionId;
+	WaitLatch(MyLatch, latch_events, delay, WAIT_EVENT_CLIENT_READ);
+	ResetLatch(MyLatch);
+}
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 6676412842..56d2f15f99 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -78,6 +78,7 @@ static Query *transformCreateTableAsStmt(ParseState *pstate,
 										 CreateTableAsStmt *stmt);
 static Query *transformCallStmt(ParseState *pstate,
 								CallStmt *stmt);
+static Query *transformWaitForStmt(ParseState *pstate, WaitStmt *stmt);
 static void transformLockingClause(ParseState *pstate, Query *qry,
 								   LockingClause *lc, bool pushedDown);
 #ifdef RAW_EXPRESSION_COVERAGE_TEST
@@ -326,6 +327,9 @@ transformStmt(ParseState *pstate, Node *parseTree)
 			result = transformCallStmt(pstate,
 									   (CallStmt *) parseTree);
 			break;
+		case T_WaitStmt:
+			result = transformWaitForStmt(pstate, (WaitStmt *) parseTree);
+			break;
 
 		default:
 
@@ -2981,6 +2985,20 @@ applyLockingClause(Query *qry, Index rtindex,
 	qry->rowMarks = lappend(qry->rowMarks, rc);
 }
 
+static Query *
+transformWaitForStmt(ParseState *pstate, WaitStmt *stmt)
+{
+	Query *result;
+
+	stmt->time = transformExpr(pstate, stmt->time, EXPR_KIND_OTHER);
+
+	result = makeNode(Query);
+	result->commandType = CMD_UTILITY;
+	result->utilityStmt = (Node *) stmt;
+
+	return result;
+}
+
 /*
  * Coverage testing for raw_expression_tree_walker().
  *
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 96e7fdbcfe..b80195a85f 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -276,7 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		SecLabelStmt SelectStmt TransactionStmt TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
 		VariableResetStmt VariableSetStmt VariableShowStmt
-		ViewStmt CheckPointStmt CreateConversionStmt
+		ViewStmt WaitStmt CheckPointStmt CreateConversionStmt
 		DeallocateStmt PrepareStmt ExecuteStmt
 		DropOwnedStmt ReassignOwnedStmt
 		AlterTSConfigurationStmt AlterTSDictionaryStmt
@@ -487,7 +487,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <list>	row explicit_row implicit_row type_list array_expr_list
 %type <node>	case_expr case_arg when_clause case_default
 %type <list>	when_clause_list
-%type <ival>	sub_type opt_materialized
+%type <ival>	sub_type wait_strategy opt_materialized
 %type <value>	NumericOnly
 %type <list>	NumericOnly_list
 %type <alias>	alias_clause opt_alias_clause
@@ -660,7 +660,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	LABEL LANGUAGE LARGE_P LAST_P LATERAL_P
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
-	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
+	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED LSN
 
 	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
@@ -690,7 +690,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P
 
 	TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
-	TIES TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
+	TIES TIME TIMEOUT TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
 	TREAT TRIGGER TRIM TRUE_P
 	TRUNCATE TRUSTED TYPE_P TYPES_P
 
@@ -700,7 +700,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAIT WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLNAMESPACES
 	XMLPARSE XMLPI XMLROOT XMLSERIALIZE XMLTABLE
@@ -953,6 +954,7 @@ stmt :
 			| VariableSetStmt
 			| VariableShowStmt
 			| ViewStmt
+			| WaitStmt
 			| /*EMPTY*/
 				{ $$ = NULL; }
 		;
@@ -14128,6 +14130,88 @@ xml_passing_mech:
 			| BY VALUE_P
 		;
 
+/*****************************************************************************
+ *
+ *		QUERY:
+ *				WAIT FOR <event> [, <event> ...] [option]
+ *				event:
+ *					LSN value
+ *					TIMEOUT value
+ *					TIMESTAMP timestamp
+ *				option:
+ *					TIMEOUT delay
+ *					UNTIL TIMESTAMP timestamp
+ *
+ *****************************************************************************/
+
+WaitStmt:
+			WAIT FOR wait_strategy LSN Sconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->kind = WAIT_FOR_LSN;
+					n->lsn = $5;
+					n->delay = 0;
+					n->time = 0;
+					$$ = (Node *)n;
+				}
+			| WAIT FOR  wait_strategy LSN Sconst TIMEOUT Iconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->kind = WAIT_FOR_MIX;
+					n->lsn = $5;
+					n->delay = $7;
+					n->time = 0;
+					$$ = (Node *)n;
+				}
+			| WAIT FOR  wait_strategy LSN Sconst UNTIL ConstDatetime Sconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->kind = WAIT_FOR_MIX;
+					n->lsn = $5;
+					n->delay = 0;
+					n->time = makeStringConstCast($8, @8, $7);
+					$$ = (Node *)n;
+				}
+			| WAIT FOR wait_strategy ConstDatetime Sconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->kind = WAIT_FOR_TIME;
+					n->lsn = NULL;
+					n->delay = 0;
+					n->time = makeStringConstCast($5, @5, $4);
+					$$ = (Node *)n;
+				}
+		;
+
+wait_strategy:
+			ALL					{ $$ = WAIT_FOR_ALL; }
+			| ANY				{ $$ = WAIT_FOR_ANY; }
+			| SOME				{ $$ = WAIT_FOR_ANY; }
+			| /* EMPTY */		{ $$ = WAIT_FOR_ANY; }
+		;
+/*
+WaitEvent:
+			LSN Sconst WaitOption
+				{
+					TypeName *t = makeTypeNameFromNameList("pg_lsn");
+					t->location = @1;
+					$$ = makeStringConstCast($2, @2, t);
+				}
+			| WaitOption
+				{
+					$$ = $1;
+				}
+		;
+
+WaitOption:
+			ConstDatetime Sconst
+				{
+				$$ = makeStringConstCast($2, @2, $1);
+				}
+			|
+				{ $$ = NIL; }
+		;
+*/
 
 /*
  * Aggregate decoration clauses
@@ -15272,6 +15356,7 @@ unreserved_keyword:
 			| LOCK_P
 			| LOCKED
 			| LOGGED
+			| LSN
 			| MAPPING
 			| MATCH
 			| MATERIALIZED
@@ -15394,6 +15479,7 @@ unreserved_keyword:
 			| TEMPORARY
 			| TEXT_P
 			| TIES
+			| TIMEOUT
 			| TRANSACTION
 			| TRANSFORM
 			| TRIGGER
@@ -15420,6 +15506,7 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAIT
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cd..8c3d196a9a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -147,6 +148,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WaitShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -264,6 +266,11 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of Latches in SHMEM for Wait
+	 */
+	WaitShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1b460a2612..b8061863d7 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -5,7 +5,7 @@
  *	  commands.  At one time acted as an interface between the Lisp and C
  *	  systems.
  *
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  *
@@ -57,6 +57,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -70,6 +71,7 @@
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
+#include "executor/spi.h"
 
 /* Hook for plugins to get control in ProcessUtility() */
 ProcessUtility_hook_type ProcessUtility_hook = NULL;
@@ -267,6 +269,7 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 		case T_LoadStmt:
 		case T_PrepareStmt:
 		case T_UnlistenStmt:
+		case T_WaitStmt:
 		case T_VariableSetStmt:
 			{
 				/*
@@ -1061,6 +1064,61 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				break;
 			}
 
+		case T_WaitStmt:
+			{
+				WaitStmt *stmt = (WaitStmt *) parsetree;
+				float8 time_val = 0;
+
+				if (stmt->time)
+				{
+					Const *time = (Const *) stmt->time;
+					int ret;
+
+					Oid		types[] = { time->consttype };
+					Datum	values[] = { time->constvalue };
+					char	nulls[] = { " " };
+
+					Datum result;
+					bool isnull;
+
+					SPI_connect();
+
+					if (time->consttype == 1083)	/* TIMEOID */
+						ret = SPI_execute_with_args("select extract (epoch from ($1 - now()::time))",
+													1, types, values, nulls, true, 0);
+					else if (time->consttype == 1266)	/* TIMETZOID */
+						ret = SPI_execute_with_args("select extract (epoch from (timezone('UTC',$1)::time - timezone('UTC', now()::timetz)::time))",
+													1, types, values, nulls, true, 0);
+					else
+						ret = SPI_execute_with_args("select extract (epoch from ($1 - now()))",
+													1, types, values, nulls, true, 0);
+
+					Assert(ret >= 0);
+					result = SPI_getbinval(SPI_tuptable->vals[0],
+										   SPI_tuptable->tupdesc,
+										   1, &isnull);
+
+					Assert(!isnull);
+					time_val = DatumGetFloat8(result);
+
+					elog(INFO, "time: %f", time_val);
+
+					SPI_finish();
+				}
+
+				if (time_val <= 0)
+					time_val = 0;
+
+				if (!stmt->delay)
+					stmt->delay = (int)(time_val * 1000);
+
+				if (stmt->kind == WAIT_FOR_TIME)
+					WaitTimeUtility(time_val * 1000);
+				else
+					WaitUtility(stmt->lsn, stmt->delay, dest);
+			}
+			break;
+
 		default:
 			/* All other statement types have event trigger support */
 			ProcessUtilitySlow(pstate, pstmt, queryString,
@@ -2713,6 +2771,10 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_NOTIFY;
 			break;
 
+		case T_WaitStmt:
+			tag = CMDTAG_WAIT;
+			break;
+
 		case T_ListenStmt:
 			tag = CMDTAG_LISTEN;
 			break;
@@ -3344,6 +3406,10 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
+		case T_WaitStmt:
+			lev = LOGSTMT_ALL;
+			break;
+
 		case T_ListenStmt:
 			lev = LOGSTMT_ALL;
 			break;
diff --git a/src/include/commands/wait.h b/src/include/commands/wait.h
new file mode 100644
index 0000000000..815aaa2f92
--- /dev/null
+++ b/src/include/commands/wait.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.h
+ *	  wait notification: wait
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2016, Regents of PostgresPRO
+ *
+ * src/include/commands/wait.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAIT_H
+#define WAIT_H
+#include "postgres.h"
+#include "tcop/dest.h"
+
+extern void WaitUtility(const char *lsn, const int delay, DestReceiver *dest);
+extern void WaitTimeUtility(const int delay);
+extern Size WaitShmemSize(void);
+extern void WaitShmemInit(void);
+extern void WaitSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWait(void);
+
+#endif   /* WAIT_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index baced7eec0..a3cfa92ab2 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -487,6 +487,7 @@ typedef enum NodeTag
 	T_DropReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
+	T_WaitStmt,
 	T_SQLCmd,
 
 	/*
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index da0706add5..463afdc817 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3554,4 +3554,32 @@ typedef struct DropSubscriptionStmt
 	DropBehavior behavior;		/* RESTRICT or CASCADE behavior */
 } DropSubscriptionStmt;
 
+/* ----------------------
+ *		Wait Statement
+ * ----------------------
+ */
+
+typedef enum WaitForType
+{
+	WAIT_FOR_LSN = 0,
+	WAIT_FOR_TIME,
+	WAIT_FOR_MIX
+} WaitForType;
+
+typedef enum WaitForStrategy
+{
+	WAIT_FOR_ANY = 0,
+	WAIT_FOR_ALL
+} WaitForStrategy;
+
+typedef struct WaitStmt
+{
+	NodeTag			type;
+	WaitForType		kind;
+	WaitForStrategy	strategy;
+	char		   *lsn;		/* Target LSN to wait for */
+	int				delay;		/* Delay to wait for LSN */
+	Node		   *time;		/* Wait for timestamp */
+} WaitStmt;
+
 #endif							/* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index b1184c2d15..dd22e358b9 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -243,6 +243,7 @@ PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
+PG_KEYWORD("lsn", LSN, UNRESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
 PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
@@ -404,6 +405,7 @@ PG_KEYWORD("text", TEXT_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("then", THEN, RESERVED_KEYWORD)
 PG_KEYWORD("ties", TIES, UNRESERVED_KEYWORD)
 PG_KEYWORD("time", TIME, COL_NAME_KEYWORD)
+PG_KEYWORD("timeout", TIMEOUT, UNRESERVED_KEYWORD)
 PG_KEYWORD("timestamp", TIMESTAMP, COL_NAME_KEYWORD)
 PG_KEYWORD("to", TO, RESERVED_KEYWORD)
 PG_KEYWORD("trailing", TRAILING, RESERVED_KEYWORD)
@@ -444,6 +446,7 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("wait", WAIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index d28145a50d..28f5a2ff67 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -216,3 +216,4 @@ PG_CMDTAG(CMDTAG_TRUNCATE_TABLE, "TRUNCATE TABLE", false, false, false)
 PG_CMDTAG(CMDTAG_UNLISTEN, "UNLISTEN", false, false, false)
 PG_CMDTAG(CMDTAG_UPDATE, "UPDATE", false, false, true)
 PG_CMDTAG(CMDTAG_VACUUM, "VACUUM", false, false, false)
+PG_CMDTAG(CMDTAG_WAIT, "WAIT", false, false, false)
diff --git a/src/test/recovery/t/018_waitfor.pl b/src/test/recovery/t/018_waitfor.pl
new file mode 100644
index 0000000000..6817431e9c
--- /dev/null
+++ b/src/test/recovery/t/018_waitfor.pl
@@ -0,0 +1,64 @@
+# Checks WAIT FOR
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+
+# And some content
+$node_master->safe_psql('postgres',
+	"CREATE TABLE tab_int AS SELECT generate_series(1, 10) AS a");
+
+# Take backup
+my $backup_name = 'my_backup';
+$node_master->backup($backup_name);
+
+# Create streaming standby from backup
+my $node_standby = get_new_node('standby');
+my $delay        = 4;
+$node_standby->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby->append_conf(
+	'postgresql.conf', qq(
+recovery_min_apply_delay = '${delay}s'
+));
+$node_standby->start;
+
+# Make new content on master and check its presence in standby depending
+# on the delay applied above. Before doing the insertion, get the
+# current timestamp that will be used as a comparison base. Even on slow
+# machines, this allows to have a predictable behavior when comparing the
+# delay between data insertion moment on master and replay time on standby.
+my $master_insert_time = time();
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(11, 20))");
+
+# Now wait for replay to complete on standby. We're done waiting when the
+# standby has replayed up to the previously saved master LSN.
+my $until_lsn =
+  $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(21, 30))");
+
+# Check that WAIT FOR LSN can wait without a timeout and return once
+# the LSN has been replayed.
+$node_standby->safe_psql('postgres',
+    "WAIT FOR LSN '$until_lsn'", 't')
+  or die "standby never caught up";
+
+# Check that WAIT FOR with a short TIMEOUT returns 't' once the LSN is replayed.
+$node_standby->poll_query_until('postgres',
+    "WAIT FOR LSN '$until_lsn' TIMEOUT 1", 't')
+  or die "standby never caught up";
+
+# This test is successful if and only if the LSN has been applied with at least
+# the configured apply delay.
+my $time_waited = time() - $master_insert_time;
+ok($time_waited >= $delay,"standby applies WAL only after replication delay");
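
The bookkeeping that wait.c in the patch above does with AddEvent, DeleteEvent and GetMinWait can be modeled outside the server. The following is a minimal single-process sketch, not code from the patch: WaiterTable, table_init, table_add, table_delete, table_count_ready and MAX_WAITERS are hypothetical names, and the spinlock and latches are left out on purpose. It only shows how the cached minimum lets the replay loop skip the scan when no waiter can be satisfied yet.

```c
#include <stdint.h>

#define MAX_WAITERS 8			/* hypothetical fixed capacity */
#define NO_TARGET   0			/* plays the role of InvalidXLogRecPtr */

typedef struct
{
	uint64_t	target[MAX_WAITERS];	/* per-slot target LSN, NO_TARGET if idle */
	uint64_t	min_target;				/* smallest active target */
} WaiterTable;

static void
table_init(WaiterTable *t)
{
	for (int i = 0; i < MAX_WAITERS; i++)
		t->target[i] = NO_TARGET;
	t->min_target = UINT64_MAX;
}

/* Register a waiter and lower the cached minimum if needed. */
static void
table_add(WaiterTable *t, int slot, uint64_t lsn)
{
	t->target[slot] = lsn;
	if (lsn < t->min_target)
		t->min_target = lsn;
}

/* Remove a waiter; rescan only if it held the cached minimum. */
static void
table_delete(WaiterTable *t, int slot)
{
	uint64_t	old = t->target[slot];

	t->target[slot] = NO_TARGET;
	if (old != t->min_target)
		return;

	t->min_target = UINT64_MAX;
	for (int i = 0; i < MAX_WAITERS; i++)
		if (t->target[i] != NO_TARGET && t->target[i] < t->min_target)
			t->min_target = t->target[i];
}

/* Count waiters whose target is at or below the replayed position. */
static int
table_count_ready(const WaiterTable *t, uint64_t replayed)
{
	int			ready = 0;

	/* Cheap early exit, mirroring the GetMinWait() test during replay. */
	if (replayed < t->min_target)
		return 0;

	for (int i = 0; i < MAX_WAITERS; i++)
		if (t->target[i] != NO_TARGET && t->target[i] <= replayed)
			ready++;
	return ready;
}
```

In the patch the same structure lives in shared memory, the slot index is MyBackendId, and "ready" waiters are woken with SetLatch rather than counted.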
#54Kartyshov Ivan
i.kartyshov@postgrespro.ru
In reply to: Kartyshov Ivan (#53)
1 attachment(s)
Re: [HACKERS] make async slave to wait for lsn to be replayed

I made some improvements over the old WAIT FOR implementation.

Synopsis
==========
WAIT FOR [ANY | SOME | ALL] event [, event ...]
where event is one of:
    LSN value [options]
    TIMESTAMP value

and options is one of:
    TIMEOUT delay
    UNTIL TIMESTAMP timestamp

ALL is the strategy used by default.
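
For readers less familiar with the LSN value in the synopsis: an LSN literal such as '0/303EC60' is two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit WAL position. The sketch below is a standalone illustration, not code from the patch; parse_lsn and lsn_reached are hypothetical helpers, and the sscanf-based parsing is more lenient than the server's own input function.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse "hi/lo" hexadecimal text into a 64-bit
 * position. Returns false if the text does not have that shape.
 */
static bool
parse_lsn(const char *text, uint64_t *lsn)
{
	unsigned int hi;
	unsigned int lo;
	char		extra;

	/* The trailing %c only matches if garbage follows the low half. */
	if (sscanf(text, "%X/%X%c", &hi, &lo, &extra) != 2)
		return false;

	*lsn = ((uint64_t) hi << 32) | lo;
	return true;
}

/* The wait is over once replay has reached or passed the target. */
static bool
lsn_reached(uint64_t target, uint64_t replayed)
{
	return target <= replayed;
}
```

Inside the server, the earlier version of the patch converts the text with pg_lsn_in and compares the result against GetXLogReplayRecPtr() using the same "target <= replayed" test.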

P.S. I am now testing the BEGIN-based WAIT prototype, as discussed earlier.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

wait_for_v3.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 8d91f3529e..8697f9807f 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -187,6 +187,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY update             SYSTEM "update.sgml">
 <!ENTITY vacuum             SYSTEM "vacuum.sgml">
 <!ENTITY values             SYSTEM "values.sgml">
+<!ENTITY wait               SYSTEM "wait.sgml">
 
 <!-- applications and utilities -->
 <!ENTITY clusterdb          SYSTEM "clusterdb.sgml">
diff --git a/doc/src/sgml/ref/wait.sgml b/doc/src/sgml/ref/wait.sgml
new file mode 100644
index 0000000000..9a79524779
--- /dev/null
+++ b/doc/src/sgml/ref/wait.sgml
@@ -0,0 +1,138 @@
+<!--
+doc/src/sgml/ref/wait.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-waitlsn">
+ <indexterm zone="sql-waitlsn">
+  <primary>WAIT FOR</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>WAIT FOR</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>WAIT FOR</refname>
+  <refpurpose>wait for the target <acronym>LSN</acronym> to be replayed</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>'
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>' TIMEOUT <replaceable class="parameter">wait_timeout</replaceable>
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>' UNTIL TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR ALL LSN '<replaceable class="parameter">lsn_number</replaceable>' TIMEOUT <replaceable class="parameter">wait_timeout</replaceable>, TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR ANY LSN '<replaceable class="parameter">lsn_number</replaceable>', TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>WAIT FOR</command> provides a simple
+   interprocess communication mechanism to wait for a timestamp or a target log sequence
+   number (<acronym>LSN</acronym>) on a standby in <productname>PostgreSQL</productname>
+   databases with master-standby asynchronous replication. When run with the
+   <replaceable>LSN</replaceable> option, the <command>WAIT FOR</command> command
+   waits for the specified <acronym>LSN</acronym> to be replayed. By default, wait
+   time is unlimited. Waiting can be interrupted using <literal>Ctrl+C</literal>, or
+   by shutting down the <literal>postgres</literal> server. You can also limit the wait
+   time using the <option>TIMEOUT</option> option.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">LSN</replaceable></term>
+    <listitem>
+     <para>
+      Specify the target log sequence number to wait for.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>TIMEOUT <replaceable class="parameter">wait_timeout</replaceable></term>
+    <listitem>
+     <para>
+      Limit the time interval to wait for the LSN to be replayed.
+      The specified <replaceable>wait_timeout</replaceable> must be an integer
+      and is measured in milliseconds.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>UNTIL TIMESTAMP <replaceable class="parameter">wait_time</replaceable></term>
+    <listitem>
+     <para>
+      Limit the time to wait for the LSN to be replayed.
+      The specified <replaceable>wait_time</replaceable> must be a timestamp.
+     </para>
+    </listitem>
+   </varlistentry>
+
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Run <literal>WAIT FOR</literal> from <application>psql</application>,
+   limiting wait time to 10000 milliseconds:
+
+<screen>
+WAIT FOR LSN '0/3F07A6B1' TIMEOUT 10000;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-------------
+ f
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Wait until the specified <acronym>LSN</acronym> is replayed:
+<screen>
+WAIT FOR LSN '0/3F07A611';
+LSN reached
+-------------
+ t
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Limit <acronym>LSN</acronym> wait time to 500000 milliseconds, and then cancel the command:
+<screen>
+WAIT FOR LSN '0/3F0FF791' TIMEOUT 500000;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-------------
+ f
+(1 row)
+</screen>
+</para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>WAIT FOR</command> statement in the SQL
+   standard.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index cef09dd38b..588e96aa14 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -215,6 +215,7 @@
    &update;
    &vacuum;
    &values;
+   &wait;
 
  </reference>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4361568882..f7f5a76216 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -41,6 +41,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
@@ -7285,6 +7286,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * If lastReplayedEndRecPtr was updated,
+				 * set latches in SHMEM array.
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWait())
+				{
+					WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3ce6..9b310926c1 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 0000000000..17a201b31c
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,308 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.c
+ *	  Implements WAIT - a utility command that allows
+ *	  waiting for LSN to have been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "pgstat.h"
+#include "fmgr.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlogdefs.h"
+#include "access/xlog.h"
+#include "catalog/pg_type.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/backendid.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+#include "executor/spi.h"
+#include "utils/fmgrprotos.h"
+
+/* Add to / delete from shmem array */
+static void AddEvent(XLogRecPtr trg_lsn);
+static void DeleteEvent(void);
+
+/* Shared memory structures */
+typedef struct
+{
+	XLogRecPtr	trg_lsn;
+	/*
+	 * Left struct BIDState here for compatibility with
+	 * a planned future patch that will allow waiting for XIDs.
+	 */
+} BIDState;
+
+typedef struct
+{
+	int			backend_maxid;
+	XLogRecPtr	min_lsn;
+	slock_t		mutex;
+	BIDState	event_arr[FLEXIBLE_ARRAY_MEMBER];
+} GlobState;
+
+static volatile GlobState *state;
+
+/* Add event of the current backend to shmem array */
+static void
+AddEvent(XLogRecPtr trg_lsn)
+{
+	SpinLockAcquire(&state->mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->event_arr[MyBackendId].trg_lsn = trg_lsn;
+
+	if (trg_lsn < state->min_lsn)
+		state->min_lsn = trg_lsn;
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Delete event of the current backend from the shared array.
+ *
+ * TODO: Consider state cleanup on backend failure.
+ */
+static void
+DeleteEvent(void)
+{
+	int i;
+	XLogRecPtr trg_lsn = state->event_arr[MyBackendId].trg_lsn;
+
+	state->event_arr[MyBackendId].trg_lsn = InvalidXLogRecPtr;
+
+	SpinLockAcquire(&state->mutex);
+	/* Recompute state->min_lsn only if the deleted event held the minimum */
+	if (state->min_lsn == trg_lsn)
+	{
+		state->min_lsn = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->event_arr[i].trg_lsn != InvalidXLogRecPtr &&
+				state->event_arr[i].trg_lsn < state->min_lsn)
+				state->min_lsn = state->event_arr[i].trg_lsn;
+	}
+
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >=2; i--)
+			if (state->event_arr[i].trg_lsn != InvalidXLogRecPtr)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->mutex);
+}
+
+/* Get size of shared memory for GlobState */
+Size
+WaitShmemSize(void)
+{
+	return offsetof(GlobState, event_arr) + sizeof(BIDState) * (MaxBackends+1);
+}
+
+/* Init array of events in shared memory */
+void
+WaitShmemInit(void)
+{
+	bool	found;
+	uint32	i;
+
+	state = (GlobState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitShmemSize(),
+										  &found);
+	if (!found)
+	{
+		SpinLockInit(&state->mutex);
+
+		for (i = 0; i < (MaxBackends+1); i++)
+			state->event_arr[i].trg_lsn = InvalidXLogRecPtr;
+
+		state->backend_maxid = 0;
+		state->min_lsn = PG_UINT64_MAX;
+	}
+}
+
+/* Set all latches in shared memory to signal that new LSN has been replayed */
+void
+WaitSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32		i;
+	int 		backend_maxid;
+	PGPROC	   *backend;
+
+	SpinLockAcquire(&state->mutex);
+	backend_maxid = state->backend_maxid;
+	SpinLockRelease(&state->mutex);
+
+	for (i = 2; i <= backend_maxid; i++)
+	{
+		backend = BackendIdGetProc(i);
+		if (state->event_arr[i].trg_lsn != 0)
+		{
+			if (state->event_arr[i].trg_lsn <= cur_lsn)
+				SetLatch(&backend->procLatch);
+		}
+	}
+}
+
+/* Get minimal LSN that will be next */
+XLogRecPtr
+GetMinWait(void)
+{
+	return state->min_lsn;
+}
+
+/*
+ * On WAIT use MyLatch to wait till LSN is replayed,
+ * postmaster dies or timeout happens.
+ */
+void
+WaitUtility(XLogRecPtr lsn, const int delay, DestReceiver *dest)
+{
+	XLogRecPtr		trg_lsn = lsn;
+	XLogRecPtr		cur_lsn;
+	int				latch_events;
+	uint64			tdelay = delay;
+	long			secs;
+	int				microsecs;
+	TimestampTz		timer = GetCurrentTimestamp();
+	TupOutputState *tstate;
+	TupleDesc		tupdesc;
+	char		   *value = "f";
+
+	if (delay > 0)
+		latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;
+	else
+		latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;
+
+	AddEvent(trg_lsn);
+
+	for (;;)
+	{
+		cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN has been replayed */
+		if (trg_lsn <= cur_lsn)
+			break;
+
+		/* If postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		/* If delay time is over */
+		if (latch_events & WL_TIMEOUT)
+		{
+			if (TimestampDifferenceExceeds(timer,GetCurrentTimestamp(),delay))
+				break;
+			TimestampDifference(timer,GetCurrentTimestamp(),&secs, &microsecs);
+			tdelay = delay - (secs*1000 + microsecs/1000);
+		}
+
+		/* A little hack similar to SnapshotResetXmin to work out of snapshot */
+		MyPgXact->xmin = InvalidTransactionId;
+		WaitLatch(MyLatch, latch_events, tdelay, WAIT_EVENT_CLIENT_READ);
+		ResetLatch(MyLatch);
+
+		/*
+		 * If received an interruption from CHECK_FOR_INTERRUPTS,
+		 * then delete the current event from array.
+		 */
+		if (InterruptPending)
+		{
+			DeleteEvent();
+			ProcessInterrupts();
+		}
+
+	}
+
+	DeleteEvent();
+
+	if (trg_lsn > cur_lsn)
+		elog(NOTICE,"LSN is not reached. Try to increase wait time.");
+	else
+		value = "t";
+
+	/* Need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+	/* Prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsMinimalTuple);
+
+	/* Send it */
+	do_text_output_oneline(tstate, value);
+	end_tup_output(tstate);
+}
+
+void
+WaitTimeUtility(const int delay)
+{
+	int				latch_events;
+
+	if (delay < 0)
+		return ;
+
+	latch_events = WL_TIMEOUT | WL_POSTMASTER_DEATH;
+
+	MyPgXact->xmin = InvalidTransactionId;
+	WaitLatch(MyLatch, latch_events, delay, WAIT_EVENT_CLIENT_READ);
+	ResetLatch(MyLatch);
+}
+
+/* Get universal time */
+float8
+WaitTimeResolve(Const *time)
+{
+	int			ret;
+	float8		val;
+
+	Oid		types[] = { time->consttype };
+	Datum	values[] = { time->constvalue };
+	char	nulls[] = { " " };
+
+	Datum result;
+	bool isnull;
+
+	SPI_connect();
+
+	if (time->consttype == 1083)
+		ret = SPI_execute_with_args("select extract (epoch from ($1 - now()::time))",
+									1, types, values, nulls, true, 0);
+	else if (time->consttype == 1266)
+		ret = SPI_execute_with_args("select extract (epoch from (timezone('UTC',$1)::time - timezone('UTC', now()::timetz)::time))",
+									1, types, values, nulls, true, 0);
+	else
+		ret = SPI_execute_with_args("select extract (epoch from ($1 - now()))",
+									1, types, values, nulls, true, 0);
+
+	Assert(ret >= 0);
+	result = SPI_getbinval(SPI_tuptable->vals[0],
+						   SPI_tuptable->tupdesc,
+						   1, &isnull);
+
+	Assert(!isnull);
+	val = DatumGetFloat8(result);
+
+	elog(INFO, "time: %f", val);
+
+	SPI_finish();
+	return val;
+}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e084c3f069..cc8d20a91b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2760,6 +2760,19 @@ _outDefElem(StringInfo str, const DefElem *node)
 	WRITE_LOCATION_FIELD(location);
 }
 
+static void
+_outWaitStmt(StringInfo str, const WaitStmt *node)
+{
+	WRITE_NODE_TYPE("WAITSTMT");
+
+	WRITE_STRING_FIELD(lsn);
+	WRITE_INT_FIELD(delay);
+	WRITE_NODE_FIELD(events);
+	WRITE_NODE_FIELD(time);
+	WRITE_ENUM_FIELD(wait_type, WaitType);
+	WRITE_ENUM_FIELD(strategy, WaitForStrategy);
+}
+
 static void
 _outTableLikeClause(StringInfo str, const TableLikeClause *node)
 {
@@ -4305,6 +4318,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PartitionRangeDatum:
 				_outPartitionRangeDatum(str, obj);
 				break;
+			case T_WaitStmt:
+				_outWaitStmt(str, obj);
+				break;
 
 			default:
 
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 6676412842..4543fa1b9f 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -78,6 +78,7 @@ static Query *transformCreateTableAsStmt(ParseState *pstate,
 										 CreateTableAsStmt *stmt);
 static Query *transformCallStmt(ParseState *pstate,
 								CallStmt *stmt);
+static Query *transformWaitForStmt(ParseState *pstate, WaitStmt *stmt);
 static void transformLockingClause(ParseState *pstate, Query *qry,
 								   LockingClause *lc, bool pushedDown);
 #ifdef RAW_EXPRESSION_COVERAGE_TEST
@@ -326,6 +327,9 @@ transformStmt(ParseState *pstate, Node *parseTree)
 			result = transformCallStmt(pstate,
 									   (CallStmt *) parseTree);
 			break;
+		case T_WaitStmt:
+			result = transformWaitForStmt(pstate, (WaitStmt *) parseTree);
+			break;
 
 		default:
 
@@ -2981,6 +2985,25 @@ applyLockingClause(Query *qry, Index rtindex,
 	qry->rowMarks = lappend(qry->rowMarks, rc);
 }
 
+static Query *
+transformWaitForStmt(ParseState *pstate, WaitStmt *stmt)
+{
+	Query		   *result;
+	ListCell	   *events;
+
+	foreach(events, stmt->events)
+	{
+		WaitStmt	   *event = (WaitStmt *) lfirst(events);
+		event->time = transformExpr(pstate, event->time, EXPR_KIND_OTHER);
+	}
+
+	result = makeNode(Query);
+	result->commandType = CMD_UTILITY;
+	result->utilityStmt = (Node *) stmt;
+
+	return result;
+}
+
 /*
  * Coverage testing for raw_expression_tree_walker().
  *
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 96e7fdbcfe..8b8144d12c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -276,7 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		SecLabelStmt SelectStmt TransactionStmt TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
 		VariableResetStmt VariableSetStmt VariableShowStmt
-		ViewStmt CheckPointStmt CreateConversionStmt
+		ViewStmt WaitStmt CheckPointStmt CreateConversionStmt
 		DeallocateStmt PrepareStmt ExecuteStmt
 		DropOwnedStmt ReassignOwnedStmt
 		AlterTSConfigurationStmt AlterTSDictionaryStmt
@@ -487,7 +487,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <list>	row explicit_row implicit_row type_list array_expr_list
 %type <node>	case_expr case_arg when_clause case_default
 %type <list>	when_clause_list
-%type <ival>	sub_type opt_materialized
+%type <ival>	sub_type wait_strategy opt_materialized
 %type <value>	NumericOnly
 %type <list>	NumericOnly_list
 %type <alias>	alias_clause opt_alias_clause
@@ -591,6 +591,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <partboundspec> PartitionBoundSpec
 %type <list>		hash_partbound
 %type <defelt>		hash_partbound_elem
+%type <list>		wait_list
+%type <node>		WaitEvent
 
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
@@ -660,7 +662,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	LABEL LANGUAGE LARGE_P LAST_P LATERAL_P
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
-	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
+	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED LSN
 
 	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
@@ -690,7 +692,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P
 
 	TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
-	TIES TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
+	TIES TIME TIMEOUT TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
 	TREAT TRIGGER TRIM TRUE_P
 	TRUNCATE TRUSTED TYPE_P TYPES_P
 
@@ -700,7 +702,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAIT WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLNAMESPACES
 	XMLPARSE XMLPI XMLROOT XMLSERIALIZE XMLTABLE
@@ -953,6 +956,7 @@ stmt :
 			| VariableSetStmt
 			| VariableShowStmt
 			| ViewStmt
+			| WaitStmt
 			| /*EMPTY*/
 				{ $$ = NULL; }
 		;
@@ -14128,6 +14132,82 @@ xml_passing_mech:
 			| BY VALUE_P
 		;
 
+/*****************************************************************************
+ *
+ *		QUERY:
+ *				WAIT FOR <event> [, <event> ...]
+ *				event  [option]:
+ *					LSN value
+ *					TIMEOUT value
+ *					TIMESTAMP timestamp
+ *				option:
+ *					TIMEOUT delay
+ *					UNTIL TIMESTAMP timestamp
+ *
+ *****************************************************************************/
+WaitStmt:
+			WAIT FOR wait_strategy wait_list
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->wait_type = WAIT_EVENT_NONE;
+					n->strategy = $3;
+					n->events = $4;
+					$$ = (Node *)n;
+				}
+			;
+
+wait_strategy:
+			ALL					{ $$ = WAIT_FOR_ALL; }
+			| ANY				{ $$ = WAIT_FOR_ANY; }
+			| /* EMPTY */		{ $$ = WAIT_FOR_ALL; }
+		;
+
+wait_list:
+			WaitEvent					{ $$ = list_make1($1); }
+			| wait_list ',' WaitEvent	{ $$ = lappend($1, $3); }
+			| wait_list WaitEvent		{ $$ = lappend($1, $2); }
+		;
+
+WaitEvent:
+			LSN Sconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->wait_type = WAIT_EVENT_LSN;
+					n->lsn = $2;
+					n->delay = 0;
+					n->time = NULL;
+					$$ = (Node *)n;
+				}
+
+			| LSN Sconst TIMEOUT Iconst
+					{
+						WaitStmt *n = makeNode(WaitStmt);
+						n->wait_type = WAIT_EVENT_MIX;
+						n->lsn = $2;
+						n->delay = $4;
+						n->time = NULL;
+						$$ = (Node *)n;
+					}
+			| LSN Sconst UNTIL ConstDatetime Sconst
+					{
+						WaitStmt *n = makeNode(WaitStmt);
+						n->wait_type = WAIT_EVENT_MIX;
+						n->lsn = $2;
+						n->delay = 0;
+						n->time = makeStringConstCast($5, @5, $4);
+						$$ = (Node *)n;
+					}
+			| ConstDatetime Sconst
+					{
+						WaitStmt *n = makeNode(WaitStmt);
+						n->wait_type = WAIT_EVENT_TIME;
+						n->lsn = NULL;
+						n->delay = 0;
+						n->time = makeStringConstCast($2, @2, $1);
+						$$ = (Node *)n;
+					}
+			;
+
 
 /*
  * Aggregate decoration clauses
@@ -15272,6 +15352,7 @@ unreserved_keyword:
 			| LOCK_P
 			| LOCKED
 			| LOGGED
+			| LSN
 			| MAPPING
 			| MATCH
 			| MATERIALIZED
@@ -15394,6 +15475,7 @@ unreserved_keyword:
 			| TEMPORARY
 			| TEXT_P
 			| TIES
+			| TIMEOUT
 			| TRANSACTION
 			| TRANSFORM
 			| TRIGGER
@@ -15420,6 +15502,7 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAIT
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cd..8c3d196a9a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -147,6 +148,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WaitShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -264,6 +266,11 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of Latches in SHMEM for Wait
+	 */
+	WaitShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1b460a2612..b3e6dcf492 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -15,6 +15,7 @@
  *-------------------------------------------------------------------------
  */
 #include "postgres.h"
+#include <float.h>
 
 #include "access/htup_details.h"
 #include "access/reloptions.h"
@@ -57,6 +58,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -70,6 +72,9 @@
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
+#include "executor/spi.h"
+#include "utils/fmgrprotos.h"
+#include "utils/pg_lsn.h"
 
 /* Hook for plugins to get control in ProcessUtility() */
 ProcessUtility_hook_type ProcessUtility_hook = NULL;
@@ -267,6 +272,7 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 		case T_LoadStmt:
 		case T_PrepareStmt:
 		case T_UnlistenStmt:
+		case T_WaitStmt:
 		case T_VariableSetStmt:
 			{
 				/*
@@ -1061,6 +1067,104 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				break;
 			}
 
+		case T_WaitStmt:
+			{
+				WaitStmt *stmt = (WaitStmt *) parsetree;
+				float8			time_val = 0;
+				float8			val = 0;
+				ListCell	   *events;
+				XLogRecPtr		lsn = InvalidXLogRecPtr;
+				XLogRecPtr		trg_lsn = InvalidXLogRecPtr;
+
+				if (stmt->strategy == WAIT_FOR_ANY)
+				{
+					time_val = DBL_MAX;
+					lsn = PG_UINT64_MAX;
+				}
+
+				/* Extract options from the statement node tree */
+				foreach(events, stmt->events)
+				{
+					WaitStmt    *event = (WaitStmt *) lfirst(events);
+
+					if (event->lsn)
+					{
+						int32	res;
+						trg_lsn = DatumGetLSN(
+									DirectFunctionCall1(pg_lsn_in,
+										CStringGetDatum(event->lsn)));
+						res = DatumGetUInt32(
+									DirectFunctionCall2(pg_lsn_cmp,
+										lsn, trg_lsn));
+
+						/* Nice behaviour for an LSN from the past */
+						if (stmt->strategy == WAIT_FOR_ALL)
+						{
+							if (res <= 0)
+								lsn = trg_lsn;
+						}
+						else
+						{
+							if (res > 0)
+								lsn = trg_lsn;
+						}
+
+						if (stmt->wait_type == WAIT_EVENT_TIME)
+							stmt->wait_type = WAIT_EVENT_MIX;
+						else if (stmt->wait_type == WAIT_EVENT_NONE)
+							stmt->wait_type = WAIT_EVENT_LSN;
+
+						if (event->delay)
+						{
+							if (stmt->strategy == WAIT_FOR_ANY)
+							{
+								if (event->delay < time_val)
+									time_val = event->delay / 1000;
+							}
+							else
+							{
+								if (event->delay >= time_val)
+									time_val = event->delay / 1000;
+							}
+						}
+					}
+
+					if (event->time)
+					{
+						Const *time = (Const *) event->time;
+						val = WaitTimeResolve(time);
+
+						if (stmt->wait_type == WAIT_EVENT_LSN)
+							stmt->wait_type = WAIT_EVENT_MIX;
+						else if (stmt->wait_type == WAIT_EVENT_NONE)
+							stmt->wait_type = WAIT_EVENT_TIME;
+
+						/* if val == 0 ??  */
+						if (stmt->strategy == WAIT_FOR_ALL)
+						{
+							if (time_val <= val)
+								time_val = val;
+						}
+						else
+						{
+							if (time_val > val)
+								time_val = val;
+						}
+					}
+
+				}
+
+				/* Time <= 0 iff time event is passed */
+				if (time_val <= 0)
+					time_val = 1;
+
+				if (stmt->wait_type == WAIT_EVENT_TIME)
+					WaitTimeUtility(time_val * 1000);
+				else
+					WaitUtility(lsn, (int)(time_val * 1000), dest);
+			}
+			break;
+
 		default:
 			/* All other statement types have event trigger support */
 			ProcessUtilitySlow(pstate, pstmt, queryString,
@@ -2713,6 +2817,10 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_NOTIFY;
 			break;
 
+		case T_WaitStmt:
+			tag = CMDTAG_WAIT;
+			break;
+
 		case T_ListenStmt:
 			tag = CMDTAG_LISTEN;
 			break;
@@ -3344,6 +3452,10 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
+		case T_WaitStmt:
+			lev = LOGSTMT_ALL;
+			break;
+
 		case T_ListenStmt:
 			lev = LOGSTMT_ALL;
 			break;
diff --git a/src/include/commands/wait.h b/src/include/commands/wait.h
new file mode 100644
index 0000000000..6aebf67459
--- /dev/null
+++ b/src/include/commands/wait.h
@@ -0,0 +1,26 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.h
+ *	  prototypes for commands/wait.c
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2016, Regents of PostgresPRO
+ *
+ * src/include/commands/wait.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAIT_H
+#define WAIT_H
+#include "postgres.h"
+#include "tcop/dest.h"
+
+extern void WaitUtility(XLogRecPtr lsn, const int delay, DestReceiver *dest);
+extern void WaitTimeUtility(const int delay);
+extern Size WaitShmemSize(void);
+extern void WaitShmemInit(void);
+extern void WaitSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWait(void);
+extern float8 WaitTimeResolve(Const *time);
+
+#endif   /* WAIT_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index baced7eec0..a3cfa92ab2 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -487,6 +487,7 @@ typedef enum NodeTag
 	T_DropReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
+	T_WaitStmt,
 	T_SQLCmd,
 
 	/*
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index da0706add5..b7a4fa8bcc 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3554,4 +3554,34 @@ typedef struct DropSubscriptionStmt
 	DropBehavior behavior;		/* RESTRICT or CASCADE behavior */
 } DropSubscriptionStmt;
 
+/* ----------------------
+ *		Wait Statement
+ * ----------------------
+ */
+
+typedef enum WaitType
+{
+	WAIT_EVENT_NONE = 0,
+	WAIT_EVENT_LSN,
+	WAIT_EVENT_TIME,
+	WAIT_EVENT_MIX
+} WaitType;
+
+typedef enum WaitForStrategy
+{
+	WAIT_FOR_ANY = 0,
+	WAIT_FOR_ALL
+} WaitForStrategy;
+
+typedef struct WaitStmt
+{
+	NodeTag			type;
+	WaitType		wait_type;
+	WaitForStrategy	strategy;
+	List		   *events;		/* option */
+	char		   *lsn;		/* Target LSN to wait for */
+	int				delay;		/* Timeout when waiting for LSN, in msec */
+	Node		   *time;		/* Wait for timestamp */
+} WaitStmt;
+
 #endif							/* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index b1184c2d15..dd22e358b9 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -243,6 +243,7 @@ PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
+PG_KEYWORD("lsn", LSN, UNRESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
 PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
@@ -404,6 +405,7 @@ PG_KEYWORD("text", TEXT_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("then", THEN, RESERVED_KEYWORD)
 PG_KEYWORD("ties", TIES, UNRESERVED_KEYWORD)
 PG_KEYWORD("time", TIME, COL_NAME_KEYWORD)
+PG_KEYWORD("timeout", TIMEOUT, UNRESERVED_KEYWORD)
 PG_KEYWORD("timestamp", TIMESTAMP, COL_NAME_KEYWORD)
 PG_KEYWORD("to", TO, RESERVED_KEYWORD)
 PG_KEYWORD("trailing", TRAILING, RESERVED_KEYWORD)
@@ -444,6 +446,7 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("wait", WAIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index d28145a50d..0e36f1049e 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -216,3 +216,4 @@ PG_CMDTAG(CMDTAG_TRUNCATE_TABLE, "TRUNCATE TABLE", false, false, false)
 PG_CMDTAG(CMDTAG_UNLISTEN, "UNLISTEN", false, false, false)
 PG_CMDTAG(CMDTAG_UPDATE, "UPDATE", false, false, true)
 PG_CMDTAG(CMDTAG_VACUUM, "VACUUM", false, false, false)
+PG_CMDTAG(CMDTAG_WAIT, "WAIT FOR", false, false, false)
diff --git a/src/test/recovery/t/018_waitfor.pl b/src/test/recovery/t/018_waitfor.pl
new file mode 100644
index 0000000000..6817431e9c
--- /dev/null
+++ b/src/test/recovery/t/018_waitfor.pl
@@ -0,0 +1,64 @@
+# Checks WAIT FOR
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+
+# And some content
+$node_master->safe_psql('postgres',
+	"CREATE TABLE tab_int AS SELECT generate_series(1, 10) AS a");
+
+# Take backup
+my $backup_name = 'my_backup';
+$node_master->backup($backup_name);
+
+# Create streaming standby from backup
+my $node_standby = get_new_node('standby');
+my $delay        = 4;
+$node_standby->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby->append_conf(
+	'postgresql.conf', qq(
+recovery_min_apply_delay = '${delay}s'
+));
+$node_standby->start;
+
+# Make new content on master and check its presence in standby depending
+# on the delay applied above. Before doing the insertion, get the
+# current timestamp that will be used as a comparison base. Even on slow
+# machines, this allows to have a predictable behavior when comparing the
+# delay between data insertion moment on master and replay time on standby.
+my $master_insert_time = time();
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(11, 20))");
+
+# Now wait for replay to complete on standby. We're done waiting when the
+# standby has replayed up to the previously saved master LSN.
+my $until_lsn =
+  $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(21, 30))");
+
+# Check that WAIT FOR without TIMEOUT waits indefinitely and returns
+# once the target LSN has been replayed.
+$node_standby->safe_psql('postgres',
+    "WAIT FOR LSN '$until_lsn'", 't')
+  or die "standby never caught up";
+
+# Check that WAIT FOR returns promptly when given a minimal TIMEOUT.
+$node_standby->poll_query_until('postgres',
+    "WAIT FOR LSN '$until_lsn' TIMEOUT 1", 't')
+  or die "standby never caught up";
+
+# This test is successful if and only if the LSN has been applied with at least
+# the configured apply delay.
+my $time_waited = time() - $master_insert_time;
+ok($time_waited >= $delay,"standby applies WAL only after replication delay");
#55Anna Akenteva
a.akenteva@postgrespro.ru
In reply to: Kartyshov Ivan (#54)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020-03-17 15:47, Kartyshov Ivan wrote:

> Synopsis
> ==========
> WAIT FOR [ANY | SOME | ALL] event [, event ...]

I'm confused as to what SOME would mean in this
command's syntax, but I can see you removed it
from gram.y since the last patch. Did you
decide to not implement it after all?

Also, I had a look at the code and tested it a bit.

================
If I specify many events, here's what happens:

For WAIT_FOR_ALL strategy, it chooses
- maximum LSN
- maximum delay
and waits for the resulting event.

For WAIT_FOR_ANY strategy - same, but it uses
minimal LSN/delay.

In other words, the statements

    (1) WAIT FOR ALL
          LSN '7F97208' TIMEOUT 11,
          LSN '3002808' TIMEOUT 50;
    (2) WAIT FOR ANY
          LSN '7F97208' TIMEOUT 11,
          LSN '3002808' TIMEOUT 50;

are essentially equivalent to:

    (1) WAIT FOR LSN '7F97208' TIMEOUT 50;
    (2) WAIT FOR LSN '3002808' TIMEOUT 11;

It seems a bit counter-intuitive to me, because
I expected events to be treated independently.
Is this the expected behaviour?
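
The reduction described above can be sketched in C as follows. This is
only an illustration of the observed behaviour, not the patch's code:
Strategy, Event and collapse_events() are hypothetical names, LSNs are
modelled as plain 64-bit integers, and the patch's special handling of
"no timeout given" is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch; not the patch's own types. */
typedef enum
{
	STRATEGY_ANY,
	STRATEGY_ALL
} Strategy;

typedef struct
{
	uint64_t	lsn;			/* target position as a 64-bit integer */
	int			timeout_ms;		/* per-event timeout in milliseconds */
} Event;

/*
 * Collapse a list of events into one (lsn, timeout) pair:
 * ALL keeps the largest LSN and the largest timeout,
 * ANY keeps the smallest LSN and the smallest timeout.
 */
Event
collapse_events(Strategy strategy, const Event *events, size_t n)
{
	Event		result;
	size_t		i;

	assert(n > 0);
	result = events[0];

	for (i = 1; i < n; i++)
	{
		if (strategy == STRATEGY_ALL)
		{
			if (events[i].lsn > result.lsn)
				result.lsn = events[i].lsn;
			if (events[i].timeout_ms > result.timeout_ms)
				result.timeout_ms = events[i].timeout_ms;
		}
		else
		{
			if (events[i].lsn < result.lsn)
				result.lsn = events[i].lsn;
			if (events[i].timeout_ms < result.timeout_ms)
				result.timeout_ms = events[i].timeout_ms;
		}
	}

	return result;
}
```

Because the LSN and the timeout are reduced separately, the resulting
pair can combine the LSN of one event with the timeout of another,
which is why the events do not behave independently.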

================
In utility.c:

    if (event->delay < time_val)
        time_val = event->delay / 1000;

Since event->delay is an int, the result will
be zero for any delay value less than 1000.
I suggest either dividing by 1000.0 or
explicitly converting int to float.

Also, shouldn't event->delay be divided
by 1000 in the 'if' part as well?
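
A small standalone C sketch of both points, assuming the timeout is kept
in seconds as a floating-point value. The function names are
hypothetical and are not taken from the patch.

```c
#include <assert.h>

/* Integer division truncates toward zero: 0..999 ms all become 0 s. */
int
ms_to_whole_seconds(int delay_ms)
{
	return delay_ms / 1000;
}

/* Dividing by 1000.0 keeps sub-second delays. */
double
ms_to_seconds(int delay_ms)
{
	assert(delay_ms >= 0);
	return delay_ms / 1000.0;
}

/*
 * Pick the shorter timeout, converting to seconds before comparing so
 * that both sides of the comparison use the same unit.
 */
double
min_timeout_seconds(double current_s, int delay_ms)
{
	double		delay_s = ms_to_seconds(delay_ms);

	return (delay_s < current_s) ? delay_s : current_s;
}
```

With these helpers, a 500 ms delay yields 0.5 seconds rather than 0,
and a 1500 ms delay is correctly seen as shorter than a current value
of 2.0 seconds.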

================
You compare two LSNs using pg_lsn_cmp():

    res = DatumGetUInt32(
                DirectFunctionCall2(pg_lsn_cmp,
                    lsn, trg_lsn));

As far as I understand, it'd be enough to use
operators such as "<=", as you do in wait.c:

    /* If LSN has been replayed */
    if (trg_lsn <= cur_lsn)
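
A minimal sketch of why plain operators are sufficient: XLogRecPtr is a
64-bit unsigned integer, so ordinary integer comparison gives the same
ordering as a three-way compare function. LsnSketch, make_lsn(),
lsn_cmp3(), lsn_reached() and lsn_max() are hypothetical names used
only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for XLogRecPtr, which is a 64-bit unsigned integer. */
typedef uint64_t LsnSketch;

/* Build a position from the two halves of the textual "hi/lo" form. */
LsnSketch
make_lsn(uint32_t hi, uint32_t lo)
{
	return ((LsnSketch) hi << 32) | lo;
}

/* Three-way comparison, shown only for contrast with the operators. */
int
lsn_cmp3(LsnSketch a, LsnSketch b)
{
	if (a > b)
		return 1;
	if (a == b)
		return 0;
	return -1;
}

/* True once replay has reached or passed the target position. */
bool
lsn_reached(LsnSketch target, LsnSketch current)
{
	return target <= current;
}

/* Larger of two positions, using only the built-in operator. */
LsnSketch
lsn_max(LsnSketch a, LsnSketch b)
{
	LsnSketch	result = (a > b) ? a : b;

	/* The operator result agrees with the three-way comparison. */
	assert(lsn_cmp3(result, a) >= 0 && lsn_cmp3(result, b) >= 0);
	return result;
}
```

The comparison then needs neither a Datum conversion nor a function
call through the fmgr interface.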

--
Anna Akenteva
Postgres Professional:
The Russian Postgres Company
http://www.postgrespro.com

#56Kartyshov Ivan
i.kartyshov@postgrespro.ru
In reply to: Kyotaro Horiguchi (#50)
1 attachment(s)
Re: [HACKERS] make async slave to wait for lsn to be replayed

As discussed earlier, I added the WAIT FOR clause to the
BEGIN/START TRANSACTION statement.

Synopsis
==========
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] wait_for_event

where transaction_mode is one of:

    ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
    READ WRITE | READ ONLY
    [ NOT ] DEFERRABLE

and wait_for_event is:

    WAIT FOR [ANY | SOME | ALL] event [, event ...]

and event is:

    LSN value [options]
    TIMESTAMP value

and options is:

    TIMEOUT delay
    UNTIL TIMESTAMP timestamp

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

begin_waitfor_v1.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 8d91f3529e..8697f9807f 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -187,6 +187,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY update             SYSTEM "update.sgml">
 <!ENTITY vacuum             SYSTEM "vacuum.sgml">
 <!ENTITY values             SYSTEM "values.sgml">
+<!ENTITY wait               SYSTEM "wait.sgml">
 
 <!-- applications and utilities -->
 <!ENTITY clusterdb          SYSTEM "clusterdb.sgml">
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index c23bbfb4e7..d86465b6d3 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,13 +21,25 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] <replaceable class="parameter">wait_for_event</replaceable>
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
     ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
     READ WRITE | READ ONLY
     [ NOT ] DEFERRABLE
+
+<replaceable class="parameter">wait_for_event</replaceable>
+    WAIT FOR [ANY | SOME | ALL] <replaceable class="parameter">event</replaceable> [, <replaceable class="parameter">event</replaceable> ...]
+
+<phrase>where <replaceable class="parameter">event</replaceable> is:</phrase>
+    LSN value [<replaceable class="parameter">options</replaceable>]
+    TIMESTAMP value
+
+<phrase>and where <replaceable class="parameter">options</replaceable> is one of:</phrase>
+    TIMEOUT delay
+    UNTIL TIMESTAMP timestamp
+
 </synopsis>
  </refsynopsisdiv>
 
diff --git a/doc/src/sgml/ref/wait.sgml b/doc/src/sgml/ref/wait.sgml
new file mode 100644
index 0000000000..41b8eb019f
--- /dev/null
+++ b/doc/src/sgml/ref/wait.sgml
@@ -0,0 +1,148 @@
+<!--
+doc/src/sgml/ref/wait.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-waitlsn">
+ <indexterm zone="sql-waitlsn">
+  <primary>WAIT FOR</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>WAIT FOR</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>WAIT FOR</refname>
+  <refpurpose>wait for the target <acronym>LSN</acronym> to be replayed</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+WAIT FOR [ANY | SOME | ALL] <replaceable class="parameter">event</replaceable> [, <replaceable class="parameter">event</replaceable> ...]
+
+<phrase>where <replaceable class="parameter">event</replaceable> is:</phrase>
+    LSN value [<replaceable class="parameter">options</replaceable>]
+    TIMESTAMP value
+
+<phrase>and where <replaceable class="parameter">options</replaceable> is one of:</phrase>
+    TIMEOUT delay
+    UNTIL TIMESTAMP timestamp
+
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>'
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>' TIMEOUT <replaceable class="parameter">wait_timeout</replaceable>
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>' UNTIL TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR ALL LSN '<replaceable class="parameter">lsn_number</replaceable>' TIMEOUT <replaceable class="parameter">wait_timeout</replaceable>, TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR ANY LSN '<replaceable class="parameter">lsn_number</replaceable>', TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>WAIT FOR</command> provides a simple
+   interprocess communication mechanism to wait for a timestamp or the target log sequence
+   number (<acronym>LSN</acronym>) on standby in <productname>PostgreSQL</productname>
+   databases with master-standby asynchronous replication. When run with the
+   <replaceable>LSN</replaceable> option, the <command>WAIT FOR</command> command
+   waits for the specified <acronym>LSN</acronym> to be replayed. By default, wait
+   time is unlimited. Waiting can be interrupted using <literal>Ctrl+C</literal>, or
+   by shutting down the <literal>postgres</literal> server. You can also limit the wait
+   time using the <option>TIMEOUT</option> option.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">LSN</replaceable></term>
+    <listitem>
+     <para>
+      Specify the target log sequence number to wait for.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>TIMEOUT <replaceable class="parameter">wait_timeout</replaceable></term>
+    <listitem>
+     <para>
+      Limit the time interval to wait for the LSN to be replayed.
+      The specified <replaceable>wait_timeout</replaceable> must be an integer
+      and is measured in milliseconds.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>UNTIL TIMESTAMP <replaceable class="parameter">wait_time</replaceable></term>
+    <listitem>
+     <para>
+      Limit the time to wait for the LSN to be replayed.
+      The specified <replaceable>wait_time</replaceable> must be a timestamp.
+     </para>
+    </listitem>
+   </varlistentry>
+
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Run <literal>WAIT FOR</literal> from <application>psql</application>,
+   limiting wait time to 10000 milliseconds:
+
+<screen>
+WAIT FOR LSN '0/3F07A6B1' TIMEOUT 10000;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-------------
+ f
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Wait until the specified <acronym>LSN</acronym> is replayed:
+<screen>
+WAIT FOR LSN '0/3F07A611';
+LSN reached
+-------------
+ t
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Limit <acronym>LSN</acronym> wait time to 500000 milliseconds, and then cancel the command:
+<screen>
+WAIT FOR LSN '0/3F0FF791' TIMEOUT 500000;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-------------
+ f
+(1 row)
+</screen>
+</para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>WAIT FOR</command> statement in the SQL
+   standard.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index cef09dd38b..588e96aa14 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -215,6 +215,7 @@
    &update;
    &vacuum;
    &values;
+   &wait;
 
  </reference>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 793c076da6..8591472e21 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -41,6 +41,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
@@ -7285,6 +7286,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * If lastReplayedEndRecPtr was updated,
+				 * set latches in SHMEM array.
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWait())
+				{
+					WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3ce6..9b310926c1 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 0000000000..e0891b3840
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,412 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.c
+ *	  Implements WAIT - a utility command that allows
+ *	  waiting for LSN to have been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include <float.h>
+#include "postgres.h"
+#include "pgstat.h"
+#include "fmgr.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlogdefs.h"
+#include "access/xlog.h"
+#include "catalog/pg_type.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/backendid.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+#include "executor/spi.h"
+#include "utils/fmgrprotos.h"
+
+/* Add to / delete from shmem array */
+static void AddEvent(XLogRecPtr trg_lsn);
+static void DeleteEvent(void);
+
+/* Shared memory structures */
+typedef struct
+{
+	XLogRecPtr	trg_lsn;
+	/*
+	 * Left struct BIDState here for compatibility with
+	 * a planned future patch that will allow waiting for XIDs.
+	 */
+} BIDState;
+
+typedef struct
+{
+	int			backend_maxid;
+	XLogRecPtr	min_lsn;
+	slock_t		mutex;
+	BIDState	event_arr[FLEXIBLE_ARRAY_MEMBER];
+} GlobState;
+
+static volatile GlobState *state;
+
+/* Add event of the current backend to shmem array */
+static void
+AddEvent(XLogRecPtr trg_lsn)
+{
+	SpinLockAcquire(&state->mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->event_arr[MyBackendId].trg_lsn = trg_lsn;
+
+	if (trg_lsn < state->min_lsn)
+		state->min_lsn = trg_lsn;
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Delete event of the current backend from the shared array.
+ *
+ * TODO: Consider state cleanup on backend failure.
+ */
+static void
+DeleteEvent(void)
+{
+	int i;
+	XLogRecPtr trg_lsn = state->event_arr[MyBackendId].trg_lsn;
+
+	state->event_arr[MyBackendId].trg_lsn = InvalidXLogRecPtr;
+
+	SpinLockAcquire(&state->mutex);
+	/* Update state->min_lsn iff it is necessary for choosing next min_lsn */
+	if (state->min_lsn == trg_lsn)
+	{
+		state->min_lsn = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->event_arr[i].trg_lsn != InvalidXLogRecPtr &&
+				state->event_arr[i].trg_lsn < state->min_lsn)
+				state->min_lsn = state->event_arr[i].trg_lsn;
+	}
+
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >=2; i--)
+			if (state->event_arr[i].trg_lsn != InvalidXLogRecPtr)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->mutex);
+}
+
+/* Get size of shared memory for GlobState */
+Size
+WaitShmemSize(void)
+{
+	return offsetof(GlobState, event_arr) + sizeof(BIDState) * (MaxBackends+1);
+}
+
+/* Init array of events in shared memory */
+void
+WaitShmemInit(void)
+{
+	bool	found;
+	uint32	i;
+
+	state = (GlobState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitShmemSize(),
+										  &found);
+	if (!found)
+	{
+		SpinLockInit(&state->mutex);
+
+		for (i = 0; i < (MaxBackends+1); i++)
+			state->event_arr[i].trg_lsn = InvalidXLogRecPtr;
+
+		state->backend_maxid = 0;
+		state->min_lsn = PG_UINT64_MAX;
+	}
+}
+
+/* Set all latches in shared memory to signal that new LSN has been replayed */
+void
+WaitSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32		i;
+	int 		backend_maxid;
+	PGPROC	   *backend;
+
+	SpinLockAcquire(&state->mutex);
+	backend_maxid = state->backend_maxid;
+	SpinLockRelease(&state->mutex);
+
+	for (i = 2; i <= backend_maxid; i++)
+	{
+		backend = BackendIdGetProc(i);
+		if (state->event_arr[i].trg_lsn != 0)
+		{
+			if (state->event_arr[i].trg_lsn <= cur_lsn)
+				SetLatch(&backend->procLatch);
+		}
+	}
+}
+
+/* Get minimal LSN that will be next */
+XLogRecPtr
+GetMinWait(void)
+{
+	return state->min_lsn;
+}
+
+/*
+ * On WAIT use MyLatch to wait till LSN is replayed,
+ * postmaster dies or timeout happens.
+ */
+int
+WaitUtility(XLogRecPtr lsn, const int delay, DestReceiver *dest)
+{
+	XLogRecPtr		trg_lsn = lsn;
+	XLogRecPtr		cur_lsn;
+	int				latch_events;
+	uint64			tdelay = delay;
+	long			secs;
+	int				microsecs;
+	TimestampTz		timer = GetCurrentTimestamp();
+	TupOutputState *tstate;
+	TupleDesc		tupdesc;
+	char		   *value = "f";
+
+	if (delay > 0)
+		latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;
+	else
+		latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;
+
+	AddEvent(trg_lsn);
+
+	for (;;)
+	{
+		cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN has been replayed */
+		if (trg_lsn <= cur_lsn)
+			break;
+
+		/* If postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		/* If delay time is over */
+		if (latch_events & WL_TIMEOUT)
+		{
+			if (TimestampDifferenceExceeds(timer,GetCurrentTimestamp(),delay))
+				break;
+			TimestampDifference(timer,GetCurrentTimestamp(),&secs, &microsecs);
+			tdelay = delay - (secs*1000 + microsecs/1000);
+		}
+
+		/* A little hack similar to SnapshotResetXmin to work out of snapshot */
+		MyPgXact->xmin = InvalidTransactionId;
+		WaitLatch(MyLatch, latch_events, tdelay, WAIT_EVENT_CLIENT_READ);
+		ResetLatch(MyLatch);
+
+		/*
+		 * If received an interruption from CHECK_FOR_INTERRUPTS,
+		 * then delete the current event from array.
+		 */
+		if (InterruptPending)
+		{
+			DeleteEvent();
+			ProcessInterrupts();
+		}
+
+	}
+
+	DeleteEvent();
+
+	if (trg_lsn > cur_lsn)
+		elog(NOTICE,"LSN is not reached. Try to increase wait time.");
+	else
+		value = "t";
+
+	/* Need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+	/* Prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsMinimalTuple);
+
+	/* Send it */
+	do_text_output_oneline(tstate, value);
+	end_tup_output(tstate);
+	return strcmp(value,"t")?1:-1;
+}
+
+void
+WaitTimeUtility(const int delay)
+{
+	int				latch_events;
+
+	if (delay < 0)
+		return ;
+
+	latch_events = WL_TIMEOUT | WL_POSTMASTER_DEATH;
+
+	MyPgXact->xmin = InvalidTransactionId;
+	WaitLatch(MyLatch, latch_events, delay, WAIT_EVENT_CLIENT_READ);
+	ResetLatch(MyLatch);
+}
+
+/* Get universal time */
+float8
+WaitTimeResolve(Const *time)
+{
+	int			ret;
+	float8		val;
+
+	Oid		types[] = { time->consttype };
+	Datum	values[] = { time->constvalue };
+	char	nulls[] = { " " };
+
+	Datum result;
+	bool isnull;
+
+	SPI_connect();
+
+	if (time->consttype == 1083)
+		ret = SPI_execute_with_args("select extract (epoch from ($1 - now()::time))",
+									1, types, values, nulls, true, 0);
+	else if (time->consttype == 1266)
+		ret = SPI_execute_with_args("select extract (epoch from (timezone('UTC',$1)::time - timezone('UTC', now()::timetz)::time))",
+									1, types, values, nulls, true, 0);
+	else
+		ret = SPI_execute_with_args("select extract (epoch from ($1 - now()))",
+									1, types, values, nulls, true, 0);
+
+	Assert(ret >= 0);
+	result = SPI_getbinval(SPI_tuptable->vals[0],
+						   SPI_tuptable->tupdesc,
+						   1, &isnull);
+
+	Assert(!isnull);
+	val = DatumGetFloat8(result);
+
+	elog(INFO, "time: %f", val);
+
+	SPI_finish();
+	return val;
+}
+
+int
+WaitMain(WaitStmt *stmt, DestReceiver *dest)
+{
+	int				res = 0;
+	float8			val = 0;
+	XLogRecPtr		trg_lsn = InvalidXLogRecPtr;
+	ListCell	   *events;
+	float8			time_val = 0;
+	XLogRecPtr		lsn = InvalidXLogRecPtr;
+
+	if (stmt->strategy == WAIT_FOR_ANY)
+	{
+		time_val = DBL_MAX;
+		lsn = PG_UINT64_MAX;
+	}
+
+	/* Extract options from the statement node tree */
+	foreach(events, stmt->events)
+	{
+		WaitStmt	   *event = (WaitStmt *) lfirst(events);
+
+		if (event->lsn)
+		{
+			int32	res;
+			trg_lsn = DatumGetLSN(
+						DirectFunctionCall1(pg_lsn_in,
+							CStringGetDatum(event->lsn)));
+			res = DatumGetUInt32(
+						DirectFunctionCall2(pg_lsn_cmp,
+							lsn, trg_lsn));
+
+			/* Nice behaviour on LSN from past */
+			if (stmt->strategy == WAIT_FOR_ALL)
+			{
+				if (res <= 0)
+					lsn = trg_lsn;
+			}
+			else
+			{
+				if (res > 0)
+					lsn = trg_lsn;
+			}
+
+			if (stmt->wait_type == WAIT_EVENT_TIME)
+				stmt->wait_type = WAIT_EVENT_MIX;
+			else if (stmt->wait_type == WAIT_EVENT_NONE)
+				stmt->wait_type = WAIT_EVENT_LSN;
+
+			if (event->delay)
+			{
+				if (stmt->strategy == WAIT_FOR_ANY)
+				{
+					if (event->delay < time_val)
+						time_val = event->delay / 1000;
+				}
+				else
+				{
+					if (event->delay >= time_val)
+						time_val = event->delay / 1000;
+				}
+			}
+		}
+
+		if (event->time)
+		{
+			Const *time = (Const *) event->time;
+			val = WaitTimeResolve(time);
+
+			if (stmt->wait_type == WAIT_EVENT_LSN)
+				stmt->wait_type = WAIT_EVENT_MIX;
+			else if (stmt->wait_type == WAIT_EVENT_NONE)
+				stmt->wait_type = WAIT_EVENT_TIME;
+
+			/* if val == 0 ??  */
+			if (stmt->strategy == WAIT_FOR_ALL)
+			{
+				if (time_val <= val)
+					time_val = val;
+			}
+			else
+			{
+				if (time_val > val)
+					time_val = val;
+			}
+		}
+
+	}
+
+	/* Time <= 0 iff time event is passed */
+	if (time_val <= 0)
+		time_val = 1;
+
+	if (stmt->wait_type == WAIT_EVENT_TIME)
+	{
+		WaitTimeUtility(time_val * 1000);
+		res = 1;
+	}
+	else
+		res = WaitUtility(lsn, (int)(time_val * 1000), dest);
+	return res;
+}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e084c3f069..cc8d20a91b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2760,6 +2760,19 @@ _outDefElem(StringInfo str, const DefElem *node)
 	WRITE_LOCATION_FIELD(location);
 }
 
+static void
+_outWaitStmt(StringInfo str, const WaitStmt *node)
+{
+	WRITE_NODE_TYPE("WAITSTMT");
+
+	WRITE_STRING_FIELD(lsn);
+	WRITE_INT_FIELD(delay);
+	WRITE_NODE_FIELD(events);
+	WRITE_NODE_FIELD(time);
+	WRITE_ENUM_FIELD(wait_type, WaitType);
+	WRITE_ENUM_FIELD(strategy, WaitForStrategy);
+}
+
 static void
 _outTableLikeClause(StringInfo str, const TableLikeClause *node)
 {
@@ -4305,6 +4318,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PartitionRangeDatum:
 				_outPartitionRangeDatum(str, obj);
 				break;
+			case T_WaitStmt:
+				_outWaitStmt(str, obj);
+				break;
 
 			default:
 
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 6676412842..8777ba81fc 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -78,6 +78,7 @@ static Query *transformCreateTableAsStmt(ParseState *pstate,
 										 CreateTableAsStmt *stmt);
 static Query *transformCallStmt(ParseState *pstate,
 								CallStmt *stmt);
+static Query *transformWaitForStmt(ParseState *pstate, WaitStmt *stmt);
 static void transformLockingClause(ParseState *pstate, Query *qry,
 								   LockingClause *lc, bool pushedDown);
 #ifdef RAW_EXPRESSION_COVERAGE_TEST
@@ -279,6 +280,12 @@ transformStmt(ParseState *pstate, Node *parseTree)
 			/*
 			 * Optimizable statements
 			 */
+		case T_TransactionStmt:
+			{
+				TransactionStmt *stmt = (TransactionStmt *) parseTree;
+				result = transformWaitForStmt(pstate, (WaitStmt *) stmt->wait);
+				break;
+			}
 		case T_InsertStmt:
 			result = transformInsertStmt(pstate, (InsertStmt *) parseTree);
 			break;
@@ -326,6 +333,9 @@ transformStmt(ParseState *pstate, Node *parseTree)
 			result = transformCallStmt(pstate,
 									   (CallStmt *) parseTree);
 			break;
+		case T_WaitStmt:
+			result = transformWaitForStmt(pstate, (WaitStmt *) parseTree);
+			break;
 
 		default:
 
@@ -2981,6 +2991,25 @@ applyLockingClause(Query *qry, Index rtindex,
 	qry->rowMarks = lappend(qry->rowMarks, rc);
 }
 
+static Query *
+transformWaitForStmt(ParseState *pstate, WaitStmt *stmt)
+{
+	Query		   *result;
+	ListCell	   *events;
+
+	foreach(events, stmt->events)
+	{
+		WaitStmt	   *event = (WaitStmt *) lfirst(events);
+		event->time = transformExpr(pstate, event->time, EXPR_KIND_OTHER);
+	}
+
+	result = makeNode(Query);
+	result->commandType = CMD_UTILITY;
+	result->utilityStmt = (Node *) stmt;
+
+	return result;
+}
+
 /*
  * Coverage testing for raw_expression_tree_walker().
  *
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7e384f956c..bd5f2f0329 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -276,7 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		SecLabelStmt SelectStmt TransactionStmt TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
 		VariableResetStmt VariableSetStmt VariableShowStmt
-		ViewStmt CheckPointStmt CreateConversionStmt
+		ViewStmt WaitStmt CheckPointStmt CreateConversionStmt
 		DeallocateStmt PrepareStmt ExecuteStmt
 		DropOwnedStmt ReassignOwnedStmt
 		AlterTSConfigurationStmt AlterTSDictionaryStmt
@@ -487,7 +487,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <list>	row explicit_row implicit_row type_list array_expr_list
 %type <node>	case_expr case_arg when_clause case_default
 %type <list>	when_clause_list
-%type <ival>	sub_type opt_materialized
+%type <ival>	sub_type wait_strategy opt_materialized
 %type <value>	NumericOnly
 %type <list>	NumericOnly_list
 %type <alias>	alias_clause opt_alias_clause
@@ -591,6 +591,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <partboundspec> PartitionBoundSpec
 %type <list>		hash_partbound
 %type <defelt>		hash_partbound_elem
+%type <list>		wait_list
+%type <node>		WaitEvent wait_for
 
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
@@ -660,7 +662,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	LABEL LANGUAGE LARGE_P LAST_P LATERAL_P
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
-	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
+	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED LSN
 
 	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
@@ -690,7 +692,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P
 
 	TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
-	TIES TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
+	TIES TIME TIMEOUT TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
 	TREAT TRIGGER TRIM TRUE_P
 	TRUNCATE TRUSTED TYPE_P TYPES_P
 
@@ -700,7 +702,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAIT WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLNAMESPACES
 	XMLPARSE XMLPI XMLROOT XMLSERIALIZE XMLTABLE
@@ -954,6 +957,7 @@ stmt :
 			| VariableSetStmt
 			| VariableShowStmt
 			| ViewStmt
+			| WaitStmt
 			| /*EMPTY*/
 				{ $$ = NULL; }
 		;
@@ -9930,18 +9934,20 @@ TransactionStmt:
 					n->chain = $3;
 					$$ = (Node *)n;
 				}
-			| BEGIN_P opt_transaction transaction_mode_list_or_empty
+			| BEGIN_P opt_transaction transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_BEGIN;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
-			| START TRANSACTION transaction_mode_list_or_empty
+			| START TRANSACTION transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_START;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
 			| COMMIT opt_transaction opt_transaction_chain
@@ -14147,6 +14153,93 @@ xml_passing_mech:
 			| BY VALUE_P
 		;
 
+/*****************************************************************************
+ *
+ *		QUERY:
+ *				WAIT FOR <event> [, <event> ...]
+ *				event  [option]:
+ *					LSN value
+ *					TIMEOUT value
+ *					TIMESTAMP timestamp
+ *				option:
+ *					TIMEOUT delay
+ *					UNTIL TIMESTAMP timestamp
+ *
+ *****************************************************************************/
+WaitStmt:
+			WAIT FOR wait_strategy wait_list
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->wait_type = WAIT_EVENT_NONE;
+					n->strategy = $3;
+					n->events = $4;
+					$$ = (Node *)n;
+				}
+//			| /* EMPTY */		{ $$ = NULL; }
+			;
+wait_for:
+			WAIT FOR wait_strategy wait_list
+			{
+				WaitStmt *n = makeNode(WaitStmt);
+				n->wait_type = WAIT_EVENT_NONE;
+				n->strategy = $3;
+				n->events = $4;
+				$$ = (Node *)n;
+			}
+			| /* EMPTY */		{ $$ = NULL; };
+
+wait_strategy:
+			ALL					{ $$ = WAIT_FOR_ALL; }
+			| ANY				{ $$ = WAIT_FOR_ANY; }
+			| /* EMPTY */		{ $$ = WAIT_FOR_ALL; }
+		;
+
+wait_list:
+			WaitEvent					{ $$ = list_make1($1); }
+			| wait_list ',' WaitEvent	{ $$ = lappend($1, $3); }
+			| wait_list WaitEvent		{ $$ = lappend($1, $2); }
+		;
+
+WaitEvent:
+			LSN Sconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->wait_type = WAIT_EVENT_LSN;
+					n->lsn = $2;
+					n->delay = 0;
+					n->time = NULL;
+					$$ = (Node *)n;
+				}
+
+			| LSN Sconst TIMEOUT Iconst
+					{
+						WaitStmt *n = makeNode(WaitStmt);
+						n->wait_type = WAIT_EVENT_MIX;
+						n->lsn = $2;
+						n->delay = $4;
+						n->time = NULL;
+						$$ = (Node *)n;
+					}
+			| LSN Sconst UNTIL ConstDatetime Sconst
+					{
+						WaitStmt *n = makeNode(WaitStmt);
+						n->wait_type = WAIT_EVENT_MIX;
+						n->lsn = $2;
+						n->delay = 0;
+						n->time = makeStringConstCast($5, @5, $4);
+						$$ = (Node *)n;
+					}
+			| ConstDatetime Sconst
+					{
+						WaitStmt *n = makeNode(WaitStmt);
+						n->wait_type = WAIT_EVENT_TIME;
+						n->lsn = NULL;
+						n->delay = 0;
+						n->time = makeStringConstCast($2, @2, $1);
+						$$ = (Node *)n;
+					}
+			;
+
 
 /*
  * Aggregate decoration clauses
@@ -15291,6 +15384,7 @@ unreserved_keyword:
 			| LOCK_P
 			| LOCKED
 			| LOGGED
+			| LSN
 			| MAPPING
 			| MATCH
 			| MATERIALIZED
@@ -15413,6 +15507,7 @@ unreserved_keyword:
 			| TEMPORARY
 			| TEXT_P
 			| TIES
+			| TIMEOUT
 			| TRANSACTION
 			| TRANSFORM
 			| TRIGGER
@@ -15439,6 +15534,7 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAIT
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cd..8c3d196a9a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -147,6 +148,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WaitShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -264,6 +266,11 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of Latches in SHMEM for Wait
+	 */
+	WaitShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index b1f7f6e2d0..8f2cffb658 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -15,6 +15,7 @@
  *-------------------------------------------------------------------------
  */
 #include "postgres.h"
+#include <float.h>
 
 #include "access/htup_details.h"
 #include "access/reloptions.h"
@@ -57,6 +58,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -70,6 +72,9 @@
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
+#include "executor/spi.h"
+#include "utils/fmgrprotos.h"
+#include "utils/pg_lsn.h"
 
 /* Hook for plugins to get control in ProcessUtility() */
 ProcessUtility_hook_type ProcessUtility_hook = NULL;
@@ -268,6 +273,7 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 		case T_LoadStmt:
 		case T_PrepareStmt:
 		case T_UnlistenStmt:
+		case T_WaitStmt:
 		case T_VariableSetStmt:
 			{
 				/*
@@ -591,6 +597,12 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 					case TRANS_STMT_START:
 						{
 							ListCell   *lc;
+							int			res = 0;
+							WaitStmt   *waitstmt = (WaitStmt *) stmt->wait;
+
+							res = WaitMain(waitstmt, dest);
+							if (res <= 0)
+								break;
 
 							BeginTransactionBlock();
 							foreach(lc, stmt->options)
@@ -1062,6 +1074,13 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				break;
 			}
 
+		case T_WaitStmt:
+			{
+				WaitStmt *stmt = (WaitStmt *) parsetree;
+				WaitMain(stmt, dest);
+				break;
+			}
+
 		default:
 			/* All other statement types have event trigger support */
 			ProcessUtilitySlow(pstate, pstmt, queryString,
@@ -2718,6 +2737,10 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_NOTIFY;
 			break;
 
+		case T_WaitStmt:
+			tag = CMDTAG_WAIT;
+			break;
+
 		case T_ListenStmt:
 			tag = CMDTAG_LISTEN;
 			break;
@@ -3357,6 +3380,10 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
+		case T_WaitStmt:
+			lev = LOGSTMT_ALL;
+			break;
+
 		case T_ListenStmt:
 			lev = LOGSTMT_ALL;
 			break;
diff --git a/src/include/commands/wait.h b/src/include/commands/wait.h
new file mode 100644
index 0000000000..62253db4e6
--- /dev/null
+++ b/src/include/commands/wait.h
@@ -0,0 +1,27 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.h
+ *	  prototypes for commands/wait.c
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2016, Regents of PostgresPRO
+ *
+ * src/include/commands/wait.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAIT_H
+#define WAIT_H
+#include "postgres.h"
+#include "tcop/dest.h"
+
+extern int WaitUtility(XLogRecPtr lsn, const int delay, DestReceiver *dest);
+extern void WaitTimeUtility(const int delay);
+extern Size WaitShmemSize(void);
+extern void WaitShmemInit(void);
+extern void WaitSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWait(void);
+extern float8 WaitTimeResolve(Const *time);
+extern int WaitMain(WaitStmt *stmt, DestReceiver *dest);
+
+#endif   /* WAIT_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 8a76afe8cc..348de76c5f 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -488,6 +488,7 @@ typedef enum NodeTag
 	T_DropReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
+	T_WaitStmt,
 	T_SQLCmd,
 
 	/*
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2039b42449..0429eebb5f 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3054,6 +3054,7 @@ typedef struct TransactionStmt
 	char	   *savepoint_name; /* for savepoint commands */
 	char	   *gid;			/* for two-phase-commit related commands */
 	bool		chain;			/* AND CHAIN option */
+	Node		*wait;			/* Wait for event node or NULL */
 } TransactionStmt;
 
 /* ----------------------
@@ -3563,4 +3564,34 @@ typedef struct DropSubscriptionStmt
 	DropBehavior behavior;		/* RESTRICT or CASCADE behavior */
 } DropSubscriptionStmt;
 
+/* ----------------------
+ *		Wait Statement
+ * ----------------------
+ */
+
+typedef enum WaitType
+{
+	WAIT_EVENT_NONE = 0,
+	WAIT_EVENT_LSN,
+	WAIT_EVENT_TIME,
+	WAIT_EVENT_MIX
+} WaitType;
+
+typedef enum WaitForStrategy
+{
+	WAIT_FOR_ANY = 0,
+	WAIT_FOR_ALL
+} WaitForStrategy;
+
+typedef struct WaitStmt
+{
+	NodeTag			type;
+	WaitType		wait_type;
+	WaitForStrategy	strategy;
+	List		   *events;		/* option */
+	char		   *lsn;		/* Target LSN to wait for */
+	int				delay;		/* Timeout when waiting for LSN, in msec */
+	Node		   *time;		/* Wait for timestamp */
+} WaitStmt;
+
 #endif							/* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index b1184c2d15..dd22e358b9 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -243,6 +243,7 @@ PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
+PG_KEYWORD("lsn", LSN, UNRESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
 PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
@@ -404,6 +405,7 @@ PG_KEYWORD("text", TEXT_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("then", THEN, RESERVED_KEYWORD)
 PG_KEYWORD("ties", TIES, UNRESERVED_KEYWORD)
 PG_KEYWORD("time", TIME, COL_NAME_KEYWORD)
+PG_KEYWORD("timeout", TIMEOUT, UNRESERVED_KEYWORD)
 PG_KEYWORD("timestamp", TIMESTAMP, COL_NAME_KEYWORD)
 PG_KEYWORD("to", TO, RESERVED_KEYWORD)
 PG_KEYWORD("trailing", TRAILING, RESERVED_KEYWORD)
@@ -444,6 +446,7 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("wait", WAIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index ed72770978..9a8ae14fee 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -216,3 +216,4 @@ PG_CMDTAG(CMDTAG_TRUNCATE_TABLE, "TRUNCATE TABLE", false, false, false)
 PG_CMDTAG(CMDTAG_UNLISTEN, "UNLISTEN", false, false, false)
 PG_CMDTAG(CMDTAG_UPDATE, "UPDATE", false, false, true)
 PG_CMDTAG(CMDTAG_VACUUM, "VACUUM", false, false, false)
+PG_CMDTAG(CMDTAG_WAIT, "WAIT FOR", false, false, false)
diff --git a/src/test/recovery/t/018_waitfor.pl b/src/test/recovery/t/018_waitfor.pl
new file mode 100644
index 0000000000..6817431e9c
--- /dev/null
+++ b/src/test/recovery/t/018_waitfor.pl
@@ -0,0 +1,64 @@
+# Checks WAIT FOR
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+
+# And some content
+$node_master->safe_psql('postgres',
+	"CREATE TABLE tab_int AS SELECT generate_series(1, 10) AS a");
+
+# Take backup
+my $backup_name = 'my_backup';
+$node_master->backup($backup_name);
+
+# Create streaming standby from backup
+my $node_standby = get_new_node('standby');
+my $delay        = 4;
+$node_standby->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby->append_conf(
+	'postgresql.conf', qq(
+recovery_min_apply_delay = '${delay}s'
+));
+$node_standby->start;
+
+# Make new content on master and check its presence in standby depending
+# on the delay applied above. Before doing the insertion, get the
+# current timestamp that will be used as a comparison base. Even on slow
+# machines, this allows to have a predictable behavior when comparing the
+# delay between data insertion moment on master and replay time on standby.
+my $master_insert_time = time();
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(11, 20))");
+
+# Now wait for replay to complete on standby. We're done waiting when the
+# standby has replayed up to the previously saved master LSN.
+my $until_lsn =
+  $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(21, 30))");
+
+# Check that waitlsn is able to set up an infinite waiting loop and exit
+# it without timeouts.
+$node_standby->safe_psql('postgres',
+    "WAIT FOR LSN '$until_lsn'", 't')
+  or die "standby never caught up";
+
+# Check that waitlsn returns a result promptly with a 1 ms TIMEOUT.
+$node_standby->poll_query_until('postgres',
+    "WAIT FOR LSN '$until_lsn' TIMEOUT 1", 't')
+  or die "standby never caught up";
+
+# This test is successful if and only if the LSN has been applied with at least
+# the configured apply delay.
+my $time_waited = time() - $master_insert_time;
+ok($time_waited >= $delay,"standby applies WAL only after replication delay");
#57Anna Akenteva
a.akenteva@postgrespro.ru
In reply to: Kartyshov Ivan (#56)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020-03-21 14:16, Kartyshov Ivan wrote:

As it was discussed earlier, I added wait for statement into
begin/start statement.

Thanks! To address the discussion: I like the idea of having WAIT as a
part of BEGIN statement rather than a separate command, as Thomas Munro
suggested. That way, the syntax itself enforces that WAIT FOR LSN will
actually take effect, even for single-snapshot transactions. It seems
more convenient for the user, who won't have to remember the details
about how WAIT interacts with isolation levels.

BEGIN [ WORK | TRANSACTION ] [ transaction_mode[, ...] ] wait_for_event

Not sure about this, but could we add "WAIT FOR .." as another
transaction_mode rather than a separate thing? That way, the user won't
have to worry about the order. As of now, one has to remember to always
put WAIT FOR as the last parameter in the BEGIN statement.

and event is:
LSN value [options]
TIMESTAMP value

I would maybe remove WAIT FOR TIMESTAMP. As Robert Haas has pointed out,
it seems a lot like pg_sleep_until(). Besides, it doesn't necessarily
need to be connected to transaction start, which makes it different from
WAIT FOR LSN - so I wouldn't mix them together.

I had another look at the code:

===
In WaitShmemSize() you might want to use functions that check for
overflow - add_size()/mul_size(). They're used in similar situations,
for example in BTreeShmemSize().
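
For illustration, here is a minimal standalone sketch of such an
overflow-checked size computation. checked_add_size(), checked_mul_size()
and wait_shmem_size() are hypothetical stand-ins written for this example
only; as far as I know the real add_size()/mul_size() raise an error on
overflow, whereas this sketch simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for add_size(): fails instead of wrapping around. */
static bool
checked_add_size(size_t a, size_t b, size_t *out)
{
	assert(out != NULL);
	if (a > SIZE_MAX - b)
		return false;
	*out = a + b;
	return true;
}

/* Hypothetical stand-in for mul_size(): fails instead of wrapping around. */
static bool
checked_mul_size(size_t a, size_t b, size_t *out)
{
	assert(out != NULL);
	if (a != 0 && b > SIZE_MAX / a)
		return false;
	*out = a * b;
	return true;
}

/*
 * Same shape as the patch's WaitShmemSize():
 *     header_size + elem_size * (max_backends + 1)
 * but every step is checked for overflow.
 */
static bool
wait_shmem_size(size_t header_size, size_t elem_size,
				size_t max_backends, size_t *out)
{
	size_t		nelems;
	size_t		array_size;

	if (!checked_add_size(max_backends, 1, &nelems))
		return false;
	if (!checked_mul_size(elem_size, nelems, &array_size))
		return false;
	return checked_add_size(header_size, array_size, out);
}
```

In the patch itself the same structure would presumably be written
directly with add_size() and mul_size(), without the bool return values.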

===
This is how WaitUtility() is called - note that time_val will always be > 0:

+    if (time_val <= 0)
+        time_val = 1;
+...
+    res = WaitUtility(lsn, (int)(time_val * 1000), dest);

(time_val * 1000) is passed to WaitUtility() as the delay argument. And
inside WaitUtility() we have this:

+if (delay > 0)
+    latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;
+else
+    latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;

Since we always pass a delay value greater than 0, we'll never get to
the "else" clause here and we'll never be ready to wait for LSN forever.
Perhaps due to that, the current test outputs this after a simple WAIT
FOR LSN command:
psql:<stdin>:1: NOTICE:  LSN is not reached.
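
One possible way to keep "no timeout" distinguishable from "a very short
timeout" is sketched below. This is only an assumption about how it could
be structured, not what the patch does: the EV_* flags are hypothetical
stand-ins for the WL_* constants, and timeout_to_ms() and
choose_wait_events() are names made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values standing in for WL_LATCH_SET and friends. */
enum
{
	EV_LATCH_SET = 1 << 0,
	EV_TIMEOUT = 1 << 1,
	EV_POSTMASTER_DEATH = 1 << 2
};

/*
 * Convert the user-visible timeout to milliseconds.
 * 0 is reserved to mean "no timeout was given, wait forever";
 * an explicit timeout is never rounded down to 0.
 */
static long
timeout_to_ms(bool timeout_given, double secs)
{
	long		ms;

	if (!timeout_given)
		return 0;
	ms = (long) (secs * 1000.0);
	return (ms > 0) ? ms : 1;
}

/* Only ask for a timeout event when a timeout was actually requested. */
static int
choose_wait_events(long timeout_ms)
{
	int			events = EV_LATCH_SET | EV_POSTMASTER_DEATH;

	assert(timeout_ms >= 0);
	if (timeout_ms > 0)
		events |= EV_TIMEOUT;
	return events;
}
```

With this split, a plain WAIT FOR LSN would reach the branch without the
timeout flag, while WAIT FOR LSN ... TIMEOUT n would still time out.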

===
Speaking of tests,

When I tried to test BEGIN TRANSACTION WAIT FOR LSN, I got a segfault:
LOG: statement: BEGIN TRANSACTION WAIT FOR LSN '0/3002808'
LOG: server process (PID 10385) was terminated by signal 11:
Segmentation fault
DETAIL: Failed process was running: COMMIT

Could you add some more tests to the patch when this is fixed? With WAIT
as part of BEGIN statement + with things such as WAIT FOR ALL ... / WAIT
FOR ANY ... / WAIT FOR LSN ... UNTIL TIMESTAMP ...

===
In WaitSetLatch() we should probably check backend for NULL before
calling SetLatch(&backend->procLatch)

We might also need to check wait statement for NULL in these two cases:
+   case T_TransactionStmt:
+   {...
+       result = transformWaitForStmt(pstate, (WaitStmt *) stmt->wait);
case TRANS_STMT_START:
{...
+   WaitStmt   *waitstmt = (WaitStmt *) stmt->wait;
+   res = WaitMain(waitstmt, dest);
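
A minimal standalone sketch of the kind of guard meant here. WaitSpec and
TxnStmt are hypothetical, simplified stand-ins for WaitStmt and
TransactionStmt, and run_wait() stands in for WaitMain(); none of this is
the patch's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for WaitStmt. */
typedef struct WaitSpec
{
	const char *lsn;			/* target LSN as text, or NULL */
} WaitSpec;

/* Hypothetical, simplified stand-in for TransactionStmt. */
typedef struct TxnStmt
{
	WaitSpec   *wait;			/* NULL for a plain BEGIN / START / COMMIT */
} TxnStmt;

/* Stand-in for WaitMain(): must only be called with a real wait clause. */
static int
run_wait(const WaitSpec *w)
{
	assert(w != NULL);
	return (w->lsn != NULL) ? 1 : 0;
}

/* Check the optional clause for NULL before dereferencing it. */
static int
begin_transaction(const TxnStmt *stmt)
{
	if (stmt->wait == NULL)
		return 0;				/* nothing to wait for */
	return run_wait(stmt->wait);
}
```

The same pattern would apply to the PGPROC pointer in WaitSetLatch():
skip the entry when the lookup returns NULL instead of calling SetLatch().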

===
After we added the "wait" attribute to TransactionStmt struct, do we
perhaps need to add something to _copyTransactionStmt() /
_equalTransactionStmt()?

--
Anna Akenteva
Postgres Professional:
The Russian Postgres Company
http://www.postgrespro.com

#58Kartyshov Ivan
i.kartyshov@postgrespro.ru
In reply to: Anna Akenteva (#57)
1 attachment(s)
Re: [HACKERS] make async slave to wait for lsn to be replayed

Anna, thank you for your review.

On 2020-03-25 21:10, Anna Akenteva wrote:

On 2020-03-21 14:16, Kartyshov Ivan wrote:

and event is:
LSN value [options]
TIMESTAMP value

I would maybe remove WAIT FOR TIMESTAMP. As Robert Haas has pointed
out, it seems a lot like pg_sleep_until(). Besides, it doesn't
necessarily need to be connected to transaction start, which makes it
different from WAIT FOR LSN - so I wouldn't mix them together.

I don't mind.
But I think we should get one more opinion on this point.

===
This is how WaitUtility() is called - note that time_val will always be > 0:

+    if (time_val <= 0)
+        time_val = 1;
+...
+    res = WaitUtility(lsn, (int)(time_val * 1000), dest);

(time_val * 1000) is passed to WaitUtility() as the delay argument.
And inside WaitUtility() we have this:

+if (delay > 0)
+    latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;
+else
+    latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;

Since we always pass a delay value greater than 0, we'll never get to
the "else" clause here and we'll never be ready to wait for LSN
forever. Perhaps due to that, the current test outputs this after a
simple WAIT FOR LSN command:
psql:<stdin>:1: NOTICE:  LSN is not reached.

I fixed this, along with the interrupt handling, in the latest patch.

Anna, feel free to work on this patch.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

begin_waitfor_v2.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 8d91f3529e..8697f9807f 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -187,6 +187,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY update             SYSTEM "update.sgml">
 <!ENTITY vacuum             SYSTEM "vacuum.sgml">
 <!ENTITY values             SYSTEM "values.sgml">
+<!ENTITY wait               SYSTEM "wait.sgml">
 
 <!-- applications and utilities -->
 <!ENTITY clusterdb          SYSTEM "clusterdb.sgml">
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index c23bbfb4e7..45289c0173 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,13 +21,25 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] <replaceable class="parameter">wait_for_event</replaceable>
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
     ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
     READ WRITE | READ ONLY
     [ NOT ] DEFERRABLE
+
+<replaceable class="parameter">wait_for_event</replaceable>
+    WAIT FOR [ANY | ALL] <replaceable class="parameter">event</replaceable> [, <replaceable class="parameter">event</replaceable> ...]
+
+<phrase>where <replaceable class="parameter">event</replaceable> is:</phrase>
+    LSN value [<replaceable class="parameter">options</replaceable>]
+    TIMESTAMP value
+
+<phrase>and where <replaceable class="parameter">options</replaceable> is one of:</phrase>
+    TIMEOUT delay
+    UNTIL TIMESTAMP timestamp
+
 </synopsis>
  </refsynopsisdiv>
 
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index d6cd1d4177..01b402e9cd 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,13 +21,24 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] <replaceable class="parameter">wait_for_event</replaceable>
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
     ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
     READ WRITE | READ ONLY
     [ NOT ] DEFERRABLE
+
+<replaceable class="parameter">wait_for_event</replaceable>
+    WAIT FOR [ANY | ALL] <replaceable class="parameter">event</replaceable> [, <replaceable class="parameter">event</replaceable> ...]
+
+<phrase>where <replaceable class="parameter">event</replaceable> is:</phrase>
+    LSN value [<replaceable class="parameter">options</replaceable>]
+    TIMESTAMP value
+
+<phrase>and where <replaceable class="parameter">options</replaceable> is one of:</phrase>
+    TIMEOUT delay
+    UNTIL TIMESTAMP timestamp
 </synopsis>
  </refsynopsisdiv>
 
diff --git a/doc/src/sgml/ref/wait.sgml b/doc/src/sgml/ref/wait.sgml
new file mode 100644
index 0000000000..b824088f6c
--- /dev/null
+++ b/doc/src/sgml/ref/wait.sgml
@@ -0,0 +1,148 @@
+<!--
+doc/src/sgml/ref/wait.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-waitlsn">
+ <indexterm zone="sql-waitlsn">
+  <primary>WAIT FOR</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>WAIT FOR</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>WAIT FOR</refname>
+  <refpurpose>wait for the target <acronym>LSN</acronym> to be replayed</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+WAIT FOR [ANY | ALL] <replaceable class="parameter">event</replaceable> [, <replaceable class="parameter">event</replaceable> ...]
+
+<phrase>where <replaceable class="parameter">event</replaceable> is:</phrase>
+    LSN value [<replaceable class="parameter">options</replaceable>]
+    TIMESTAMP value
+
+<phrase>and where <replaceable class="parameter">options</replaceable> is one of:</phrase>
+    TIMEOUT delay
+    UNTIL TIMESTAMP timestamp
+
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>'
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>' TIMEOUT <replaceable class="parameter">wait_timeout</replaceable>
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>' UNTIL TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR ALL LSN '<replaceable class="parameter">lsn_number</replaceable>' TIMEOUT <replaceable class="parameter">wait_timeout</replaceable>, TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR ANY LSN '<replaceable class="parameter">lsn_number</replaceable>', TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>WAIT FOR</command> provides a simple
+   interprocess communication mechanism to wait for a timestamp or the target log sequence
+   number (<acronym>LSN</acronym>) on standby in <productname>PostgreSQL</productname>
+   databases with master-standby asynchronous replication. When run with the
+   <replaceable>LSN</replaceable> option, the <command>WAIT FOR</command> command
+   waits for the specified <acronym>LSN</acronym> to be replayed. By default, wait
+   time is unlimited. Waiting can be interrupted using <literal>Ctrl+C</literal>, or
+   by shutting down the <literal>postgres</literal> server. You can also limit the wait
+   time using the <option>TIMEOUT</option> option.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">LSN</replaceable></term>
+    <listitem>
+     <para>
+      Specify the target log sequence number to wait for.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>TIMEOUT <replaceable class="parameter">wait_timeout</replaceable></term>
+    <listitem>
+     <para>
+      Limit the time interval to wait for the LSN to be replayed.
+      The specified <replaceable>wait_timeout</replaceable> must be an integer
+      and is measured in milliseconds.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>UNTIL TIMESTAMP <replaceable class="parameter">wait_time</replaceable></term>
+    <listitem>
+     <para>
+      Limit the time to wait for the LSN to be replayed.
+      The specified <replaceable>wait_time</replaceable> must be a timestamp.
+     </para>
+    </listitem>
+   </varlistentry>
+
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Run <literal>WAIT FOR</literal> from <application>psql</application>,
+   limiting wait time to 10000 milliseconds:
+
+<screen>
+WAIT FOR LSN '0/3F07A6B1' TIMEOUT 10000;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-------------
+ f
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Wait until the specified <acronym>LSN</acronym> is replayed:
+<screen>
+WAIT FOR LSN '0/3F07A611';
+LSN reached
+-------------
+ t
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Limit <acronym>LSN</acronym> wait time to 500000 milliseconds, and then cancel the command:
+<screen>
+WAIT FOR LSN '0/3F0FF791' TIMEOUT 500000;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-------------
+ f
+(1 row)
+</screen>
+</para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>WAIT FOR</command> statement in the SQL
+   standard.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index cef09dd38b..588e96aa14 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -215,6 +215,7 @@
    &update;
    &vacuum;
    &values;
+   &wait;
 
  </reference>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7621fc05e2..a753065e99 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -41,6 +41,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
@@ -7307,6 +7308,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * If lastReplayedEndRecPtr was updated,
+				 * set latches in SHMEM array.
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWait())
+				{
+					WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3ce6..9b310926c1 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 0000000000..b8fa9e4903
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,413 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.c
+ *	  Implements WAIT - a utility command that allows
+ *	  waiting for LSN to have been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include <float.h>
+#include <math.h>
+#include "postgres.h"
+#include "pgstat.h"
+#include "fmgr.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlogdefs.h"
+#include "access/xlog.h"
+#include "catalog/pg_type.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/backendid.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+#include "executor/spi.h"
+#include "utils/fmgrprotos.h"
+
+/* Add to / delete from shmem array */
+static void AddEvent(XLogRecPtr trg_lsn);
+static void DeleteEvent(void);
+
+/* Shared memory structures */
+typedef struct
+{
+	XLogRecPtr	trg_lsn;
+	/*
+	 * Left struct BIDState here for compatibility with
+	 * a planned future patch that will allow waiting for XIDs.
+	 */
+} BIDState;
+
+typedef struct
+{
+	int			backend_maxid;
+	XLogRecPtr	min_lsn;
+	slock_t		mutex;
+	BIDState	event_arr[FLEXIBLE_ARRAY_MEMBER];
+} GlobState;
+
+static volatile GlobState *state;
+
+/* Add event of the current backend to shmem array */
+static void
+AddEvent(XLogRecPtr trg_lsn)
+{
+	SpinLockAcquire(&state->mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->event_arr[MyBackendId].trg_lsn = trg_lsn;
+
+	if (trg_lsn < state->min_lsn)
+		state->min_lsn = trg_lsn;
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Delete event of the current backend from the shared array.
+ *
+ * TODO: Consider state cleanup on backend failure.
+ */
+static void
+DeleteEvent(void)
+{
+	int i;
+	XLogRecPtr trg_lsn = state->event_arr[MyBackendId].trg_lsn;
+
+	state->event_arr[MyBackendId].trg_lsn = InvalidXLogRecPtr;
+
+	SpinLockAcquire(&state->mutex);
+	/* Update state->min_lsn iff it is necessary for choosing next min_lsn */
+	if (state->min_lsn == trg_lsn)
+	{
+		state->min_lsn = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->event_arr[i].trg_lsn != InvalidXLogRecPtr &&
+				state->event_arr[i].trg_lsn < state->min_lsn)
+				state->min_lsn = state->event_arr[i].trg_lsn;
+	}
+
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >=2; i--)
+			if (state->event_arr[i].trg_lsn != InvalidXLogRecPtr)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->mutex);
+}
+
+/* Get size of shared memory for GlobState */
+Size
+WaitShmemSize(void)
+{
+	return offsetof(GlobState, event_arr) + sizeof(BIDState) * (MaxBackends+1);
+}
+
+/* Init array of events in shared memory */
+void
+WaitShmemInit(void)
+{
+	bool	found;
+	uint32	i;
+
+	state = (GlobState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitShmemSize(),
+										  &found);
+	if (!found)
+	{
+		SpinLockInit(&state->mutex);
+
+		for (i = 0; i < (MaxBackends+1); i++)
+			state->event_arr[i].trg_lsn = InvalidXLogRecPtr;
+
+		state->backend_maxid = 0;
+		state->min_lsn = PG_UINT64_MAX;
+	}
+}
+
+/* Set all latches in shared memory to signal that new LSN has been replayed */
+void
+WaitSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32		i;
+	int 		backend_maxid;
+	PGPROC	   *backend;
+
+	SpinLockAcquire(&state->mutex);
+	backend_maxid = state->backend_maxid;
+	SpinLockRelease(&state->mutex);
+
+	for (i = 2; i <= backend_maxid; i++)
+	{
+		backend = BackendIdGetProc(i);
+		if (state->event_arr[i].trg_lsn != 0)
+		{
+			if (state->event_arr[i].trg_lsn <= cur_lsn)
+				SetLatch(&backend->procLatch);
+		}
+	}
+}
+
+/* Get minimal LSN that will be next */
+XLogRecPtr
+GetMinWait(void)
+{
+	return state->min_lsn;
+}
+
+/*
+ * On WAIT use MyLatch to wait till LSN is replayed,
+ * postmaster dies or timeout happens.
+ */
+int
+WaitUtility(XLogRecPtr lsn, const float8 secs, DestReceiver *dest)
+{
+	XLogRecPtr		trg_lsn = lsn;
+	XLogRecPtr		cur_lsn = GetXLogReplayRecPtr(NULL);
+	int				latch_events;
+	float8			endtime;
+	TupOutputState *tstate;
+	TupleDesc		tupdesc;
+	char		   *value = "f";
+
+	latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH;
+
+	if (lsn)
+		AddEvent(trg_lsn);
+
+#define GetNowFloat()	((float8) GetCurrentTimestamp() / 1000000.0)
+	endtime = GetNowFloat() + secs;
+
+	for (;;)
+	{
+		int			rc;
+		float8		delay = 0;
+		long		delay_ms;
+
+		if (secs > 0)
+			delay = endtime - GetNowFloat();
+		else if (secs == 0) /* 10 second timeout to check for interrupts */
+			delay = 10;
+		else
+			delay = 1;
+
+		if (delay > 0.0)
+			delay_ms = (long) ceil(delay * 1000.0);
+		else
+			break;
+
+		/*
+		 * If received an interruption from CHECK_FOR_INTERRUPTS,
+		 * then delete the current event from array.
+		 */
+		if (InterruptPending)
+		{
+			if (lsn)
+				DeleteEvent();
+			ProcessInterrupts();
+		}
+
+		if (lsn && rc & WL_LATCH_SET)
+			cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN has been replayed */
+		if (lsn && trg_lsn <= cur_lsn)
+			break;
+
+		/* If postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		/* A little hack similar to SnapshotResetXmin to work out of snapshot */
+		MyPgXact->xmin = InvalidTransactionId;
+		rc = WaitLatch(MyLatch, latch_events, delay_ms, 
+					   WAIT_EVENT_CLIENT_READ);
+
+		if (rc & WL_LATCH_SET)
+			ResetLatch(MyLatch);
+	}
+
+	DeleteEvent();
+
+	if (lsn && trg_lsn > cur_lsn)
+		elog(NOTICE,"LSN is not reached. Try to increase wait time.");
+	else
+		value = "t";
+
+	/* Need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+	/* Prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsMinimalTuple);
+
+	/* Send it */
+	do_text_output_oneline(tstate, value);
+	end_tup_output(tstate);
+	return strcmp(value,"t")?1:0;
+}
+
+void
+WaitTimeUtility(float8 delay)
+{
+	int				latch_events;
+
+	if (delay < 0)
+		return ;
+
+	latch_events = WL_TIMEOUT | WL_POSTMASTER_DEATH;
+
+	MyPgXact->xmin = InvalidTransactionId;
+	WaitLatch(MyLatch, latch_events, (long) ceil(delay * 1000.0), WAIT_EVENT_CLIENT_READ);
+	ResetLatch(MyLatch);
+}
+
+/* Get universal time */
+float8
+WaitTimeResolve(Const *time)
+{
+	int			ret;
+	float8		val;
+
+	Oid		types[] = { time->consttype };
+	Datum	values[] = { time->constvalue };
+	char	nulls[] = { " " };
+
+	Datum result;
+	bool isnull;
+
+	SPI_connect();
+
+	if (time->consttype == 1083)
+		ret = SPI_execute_with_args("select extract (epoch from ($1 - now()::time))",
+									1, types, values, nulls, true, 0);
+	else if (time->consttype == 1266)
+		ret = SPI_execute_with_args("select extract (epoch from (timezone('UTC',$1)::time - timezone('UTC', now()::timetz)::time))",
+									1, types, values, nulls, true, 0);
+	else
+		ret = SPI_execute_with_args("select extract (epoch from ($1 - now()))",
+									1, types, values, nulls, true, 0);
+
+	Assert(ret >= 0);
+	result = SPI_getbinval(SPI_tuptable->vals[0],
+						   SPI_tuptable->tupdesc,
+						   1, &isnull);
+
+	Assert(!isnull);
+	val = DatumGetFloat8(result);
+
+	elog(INFO, "time: %f", val);
+
+	SPI_finish();
+	return val;
+}
+
+int
+WaitMain(WaitStmt *stmt, DestReceiver *dest)
+{
+	int				res = 0;
+	float8			val = 0;
+	XLogRecPtr		trg_lsn = InvalidXLogRecPtr;
+	ListCell	   *events;
+	float8			time_val = 0;
+	XLogRecPtr		lsn = InvalidXLogRecPtr;
+
+	if (stmt->strategy == WAIT_FOR_ANY)
+	{
+		time_val = DBL_MAX;
+		lsn = PG_UINT64_MAX;
+	}
+
+	/* Extract options from the statement node tree */
+	foreach(events, stmt->events)
+	{
+		WaitStmt	   *event = (WaitStmt *) lfirst(events);
+
+		if (event->lsn)
+		{
+			int32	res;
+			trg_lsn = DatumGetLSN(
+						DirectFunctionCall1(pg_lsn_in,
+							CStringGetDatum(event->lsn)));
+			res = DatumGetUInt32(
+						DirectFunctionCall2(pg_lsn_cmp,
+							lsn, trg_lsn));
+
+			/* Nice behaviour on LSN from past */
+			if (stmt->strategy == WAIT_FOR_ALL)
+			{
+				if (res <= 0)
+				{
+					lsn = trg_lsn;
+					if (event->delay)
+						time_val = event->delay / 1000;
+				}
+			}
+			else
+			{
+				if (res > 0)
+				{
+					lsn = trg_lsn;
+					if (event->delay)
+						time_val = event->delay / 1000;
+				}
+			}
+
+			if (stmt->wait_type == WAIT_EVENT_TIME)
+				stmt->wait_type = WAIT_EVENT_MIX;
+			else if (stmt->wait_type == WAIT_EVENT_NONE)
+				stmt->wait_type = WAIT_EVENT_LSN;
+
+		}
+
+		if (event->time)
+		{
+			Const *time = (Const *) event->time;
+			val = WaitTimeResolve(time);
+
+			if (stmt->wait_type == WAIT_EVENT_LSN)
+				stmt->wait_type = WAIT_EVENT_MIX;
+			else if (stmt->wait_type == WAIT_EVENT_NONE)
+				stmt->wait_type = WAIT_EVENT_TIME;
+
+			/* if val == 0 ??  */
+			if (stmt->strategy == WAIT_FOR_ALL)
+			{
+				if (time_val <= val)
+					time_val = val;
+			}
+			else
+			{
+				if (time_val > val)
+					time_val = val;
+			}
+		}
+
+	}
+
+	if (stmt->wait_type == WAIT_EVENT_TIME)
+	{
+		WaitTimeUtility(time_val * 1000);
+		res = 1;
+	}
+	else
+		res = WaitUtility(lsn, time_val, dest);
+	return res;
+}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e084c3f069..cc8d20a91b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2760,6 +2760,19 @@ _outDefElem(StringInfo str, const DefElem *node)
 	WRITE_LOCATION_FIELD(location);
 }
 
+static void
+_outWaitStmt(StringInfo str, const WaitStmt *node)
+{
+	WRITE_NODE_TYPE("WAITSTMT");
+
+	WRITE_STRING_FIELD(lsn);
+	WRITE_INT_FIELD(delay);
+	WRITE_NODE_FIELD(events);
+	WRITE_NODE_FIELD(time);
+	WRITE_ENUM_FIELD(wait_type, WaitType);
+	WRITE_ENUM_FIELD(strategy, WaitForStrategy);
+}
+
 static void
 _outTableLikeClause(StringInfo str, const TableLikeClause *node)
 {
@@ -4305,6 +4318,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PartitionRangeDatum:
 				_outPartitionRangeDatum(str, obj);
 				break;
+			case T_WaitStmt:
+				_outWaitStmt(str, obj);
+				break;
 
 			default:
 
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 6676412842..413faad65a 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -78,6 +78,7 @@ static Query *transformCreateTableAsStmt(ParseState *pstate,
 										 CreateTableAsStmt *stmt);
 static Query *transformCallStmt(ParseState *pstate,
 								CallStmt *stmt);
+static Query *transformWaitForStmt(ParseState *pstate, WaitStmt *stmt);
 static void transformLockingClause(ParseState *pstate, Query *qry,
 								   LockingClause *lc, bool pushedDown);
 #ifdef RAW_EXPRESSION_COVERAGE_TEST
@@ -279,6 +280,12 @@ transformStmt(ParseState *pstate, Node *parseTree)
 			/*
 			 * Optimizable statements
 			 */
+		case T_TransactionStmt:
+			{
+				TransactionStmt *stmt = (TransactionStmt *) parseTree;
+				result = transformWaitForStmt(pstate, (WaitStmt *) stmt->wait);
+				break;
+			}
 		case T_InsertStmt:
 			result = transformInsertStmt(pstate, (InsertStmt *) parseTree);
 			break;
@@ -326,6 +333,9 @@ transformStmt(ParseState *pstate, Node *parseTree)
 			result = transformCallStmt(pstate,
 									   (CallStmt *) parseTree);
 			break;
+		case T_WaitStmt:
+			result = transformWaitForStmt(pstate, (WaitStmt *) parseTree);
+			break;
 
 		default:
 
@@ -2981,6 +2991,26 @@ applyLockingClause(Query *qry, Index rtindex,
 	qry->rowMarks = lappend(qry->rowMarks, rc);
 }
 
+static Query *
+transformWaitForStmt(ParseState *pstate, WaitStmt *stmt)
+{
+	Query		   *result;
+	ListCell	   *events;
+
+	if (stmt)
+		foreach(events, stmt->events)
+		{
+			WaitStmt	   *event = (WaitStmt *) lfirst(events);
+			event->time = transformExpr(pstate, event->time, EXPR_KIND_OTHER);
+		}
+
+	result = makeNode(Query);
+	result->commandType = CMD_UTILITY;
+	result->utilityStmt = (Node *) stmt;
+
+	return result;
+}
+
 /*
  * Coverage testing for raw_expression_tree_walker().
  *
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7e384f956c..00504a1fb7 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -276,7 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		SecLabelStmt SelectStmt TransactionStmt TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
 		VariableResetStmt VariableSetStmt VariableShowStmt
-		ViewStmt CheckPointStmt CreateConversionStmt
+		ViewStmt WaitStmt CheckPointStmt CreateConversionStmt
 		DeallocateStmt PrepareStmt ExecuteStmt
 		DropOwnedStmt ReassignOwnedStmt
 		AlterTSConfigurationStmt AlterTSDictionaryStmt
@@ -487,7 +487,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <list>	row explicit_row implicit_row type_list array_expr_list
 %type <node>	case_expr case_arg when_clause case_default
 %type <list>	when_clause_list
-%type <ival>	sub_type opt_materialized
+%type <ival>	sub_type wait_strategy opt_materialized
 %type <value>	NumericOnly
 %type <list>	NumericOnly_list
 %type <alias>	alias_clause opt_alias_clause
@@ -591,6 +591,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <partboundspec> PartitionBoundSpec
 %type <list>		hash_partbound
 %type <defelt>		hash_partbound_elem
+%type <list>		wait_list
+%type <node>		WaitEvent wait_for
 
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
@@ -660,7 +662,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	LABEL LANGUAGE LARGE_P LAST_P LATERAL_P
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
-	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
+	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED LSN
 
 	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
@@ -690,7 +692,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P
 
 	TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
-	TIES TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
+	TIES TIME TIMEOUT TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
 	TREAT TRIGGER TRIM TRUE_P
 	TRUNCATE TRUSTED TYPE_P TYPES_P
 
@@ -700,7 +702,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAIT WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLNAMESPACES
 	XMLPARSE XMLPI XMLROOT XMLSERIALIZE XMLTABLE
@@ -954,6 +957,7 @@ stmt :
 			| VariableSetStmt
 			| VariableShowStmt
 			| ViewStmt
+			| WaitStmt
 			| /*EMPTY*/
 				{ $$ = NULL; }
 		;
@@ -9930,18 +9934,20 @@ TransactionStmt:
 					n->chain = $3;
 					$$ = (Node *)n;
 				}
-			| BEGIN_P opt_transaction transaction_mode_list_or_empty
+			| BEGIN_P opt_transaction transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_BEGIN;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
-			| START TRANSACTION transaction_mode_list_or_empty
+			| START TRANSACTION transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_START;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
 			| COMMIT opt_transaction opt_transaction_chain
@@ -14147,6 +14153,92 @@ xml_passing_mech:
 			| BY VALUE_P
 		;
 
+/*****************************************************************************
+ *
+ *		QUERY:
+ *				WAIT FOR <event> [, <event> ...]
+ *				event  [option]:
+ *					LSN value
+ *					TIMEOUT value
+ *					TIMESTAMP timestamp
+ *				option:
+ *					TIMEOUT delay
+ *					UNTIL TIMESTAMP timestamp
+ *
+ *****************************************************************************/
+WaitStmt:
+			WAIT FOR wait_strategy wait_list
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->wait_type = WAIT_EVENT_NONE;
+					n->strategy = $3;
+					n->events = $4;
+					$$ = (Node *)n;
+				}
+			;
+wait_for:
+			WAIT FOR wait_strategy wait_list
+			{
+				WaitStmt *n = makeNode(WaitStmt);
+				n->wait_type = WAIT_EVENT_NONE;
+				n->strategy = $3;
+				n->events = $4;
+				$$ = (Node *)n;
+			}
+			| /* EMPTY */		{ $$ = NULL; };
+
+wait_strategy:
+			ALL					{ $$ = WAIT_FOR_ALL; }
+			| ANY				{ $$ = WAIT_FOR_ANY; }
+			| /* EMPTY */		{ $$ = WAIT_FOR_ALL; }
+		;
+
+wait_list:
+			WaitEvent					{ $$ = list_make1($1); }
+			| wait_list ',' WaitEvent	{ $$ = lappend($1, $3); }
+			| wait_list WaitEvent		{ $$ = lappend($1, $2); }
+		;
+
+WaitEvent:
+			LSN Sconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->wait_type = WAIT_EVENT_LSN;
+					n->lsn = $2;
+					n->delay = 0;
+					n->time = NULL;
+					$$ = (Node *)n;
+				}
+
+			| LSN Sconst TIMEOUT Iconst
+					{
+						WaitStmt *n = makeNode(WaitStmt);
+						n->wait_type = WAIT_EVENT_MIX;
+						n->lsn = $2;
+						n->delay = $4;
+						n->time = NULL;
+						$$ = (Node *)n;
+					}
+			| LSN Sconst UNTIL ConstDatetime Sconst
+					{
+						WaitStmt *n = makeNode(WaitStmt);
+						n->wait_type = WAIT_EVENT_MIX;
+						n->lsn = $2;
+						n->delay = 0;
+						n->time = makeStringConstCast($5, @5, $4);
+						$$ = (Node *)n;
+					}
+			| ConstDatetime Sconst
+					{
+						WaitStmt *n = makeNode(WaitStmt);
+						n->wait_type = WAIT_EVENT_TIME;
+						n->lsn = NULL;
+						n->delay = 0;
+						n->time = makeStringConstCast($2, @2, $1);
+						$$ = (Node *)n;
+					}
+			;
+
 
 /*
  * Aggregate decoration clauses
@@ -15291,6 +15383,7 @@ unreserved_keyword:
 			| LOCK_P
 			| LOCKED
 			| LOGGED
+			| LSN
 			| MAPPING
 			| MATCH
 			| MATERIALIZED
@@ -15413,6 +15506,7 @@ unreserved_keyword:
 			| TEMPORARY
 			| TEXT_P
 			| TIES
+			| TIMEOUT
 			| TRANSACTION
 			| TRANSFORM
 			| TRIGGER
@@ -15439,6 +15533,7 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAIT
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cd..8c3d196a9a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -147,6 +148,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WaitShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -264,6 +266,11 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of Latches in SHMEM for Wait
+	 */
+	WaitShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index b1f7f6e2d0..f9a276e84b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -15,6 +15,7 @@
  *-------------------------------------------------------------------------
  */
 #include "postgres.h"
+#include <float.h>
 
 #include "access/htup_details.h"
 #include "access/reloptions.h"
@@ -57,6 +58,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -70,6 +72,9 @@
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
+#include "executor/spi.h"
+#include "utils/fmgrprotos.h"
+#include "utils/pg_lsn.h"
 
 /* Hook for plugins to get control in ProcessUtility() */
 ProcessUtility_hook_type ProcessUtility_hook = NULL;
@@ -268,6 +273,7 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 		case T_LoadStmt:
 		case T_PrepareStmt:
 		case T_UnlistenStmt:
+		case T_WaitStmt:
 		case T_VariableSetStmt:
 			{
 				/*
@@ -591,6 +597,13 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 					case TRANS_STMT_START:
 						{
 							ListCell   *lc;
+							int			res = -1;
+							WaitStmt   *waitstmt = (WaitStmt *) stmt->wait;
+
+							if (stmt->wait)
+								res = WaitMain(waitstmt, dest);
+							if (res == 0)
+								break;
 
 							BeginTransactionBlock();
 							foreach(lc, stmt->options)
@@ -1062,6 +1075,13 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				break;
 			}
 
+		case T_WaitStmt:
+			{
+				WaitStmt *stmt = (WaitStmt *) parsetree;
+				WaitMain(stmt, dest);
+				break;
+			}
+
 		default:
 			/* All other statement types have event trigger support */
 			ProcessUtilitySlow(pstate, pstmt, queryString,
@@ -2718,6 +2738,10 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_NOTIFY;
 			break;
 
+		case T_WaitStmt:
+			tag = CMDTAG_WAIT;
+			break;
+
 		case T_ListenStmt:
 			tag = CMDTAG_LISTEN;
 			break;
@@ -3357,6 +3381,10 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
+		case T_WaitStmt:
+			lev = LOGSTMT_ALL;
+			break;
+
 		case T_ListenStmt:
 			lev = LOGSTMT_ALL;
 			break;
diff --git a/src/include/commands/wait.h b/src/include/commands/wait.h
new file mode 100644
index 0000000000..11115b9dab
--- /dev/null
+++ b/src/include/commands/wait.h
@@ -0,0 +1,27 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.h
+ *	  prototypes for commands/wait.c
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2016, Regents of PostgresPRO
+ *
+ * src/include/commands/wait.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAIT_H
+#define WAIT_H
+#include "postgres.h"
+#include "tcop/dest.h"
+
+extern int WaitUtility(XLogRecPtr lsn, const float8 delay, DestReceiver *dest);
+extern void WaitTimeUtility(float8 delay);
+extern Size WaitShmemSize(void);
+extern void WaitShmemInit(void);
+extern void WaitSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWait(void);
+extern float8 WaitTimeResolve(Const *time);
+extern int WaitMain(WaitStmt *stmt, DestReceiver *dest);
+
+#endif   /* WAIT_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 8a76afe8cc..348de76c5f 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -488,6 +488,7 @@ typedef enum NodeTag
 	T_DropReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
+	T_WaitStmt,
 	T_SQLCmd,
 
 	/*
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2039b42449..0429eebb5f 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3054,6 +3054,7 @@ typedef struct TransactionStmt
 	char	   *savepoint_name; /* for savepoint commands */
 	char	   *gid;			/* for two-phase-commit related commands */
 	bool		chain;			/* AND CHAIN option */
+	Node		*wait;			/* Wait for event node or NULL */
 } TransactionStmt;
 
 /* ----------------------
@@ -3563,4 +3564,34 @@ typedef struct DropSubscriptionStmt
 	DropBehavior behavior;		/* RESTRICT or CASCADE behavior */
 } DropSubscriptionStmt;
 
+/* ----------------------
+ *		Wait Statement
+ * ----------------------
+ */
+
+typedef enum WaitType
+{
+	WAIT_EVENT_NONE = 0,
+	WAIT_EVENT_LSN,
+	WAIT_EVENT_TIME,
+	WAIT_EVENT_MIX
+} WaitType;
+
+typedef enum WaitForStrategy
+{
+	WAIT_FOR_ANY = 0,
+	WAIT_FOR_ALL
+} WaitForStrategy;
+
+typedef struct WaitStmt
+{
+	NodeTag			type;
+	WaitType		wait_type;
+	WaitForStrategy	strategy;
+	List		   *events;		/* option */
+	char		   *lsn;		/* Target LSN to wait for */
+	int				delay;		/* Timeout when waiting for LSN, in msec */
+	Node		   *time;		/* Wait for timestamp */
+} WaitStmt;
+
 #endif							/* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index b1184c2d15..dd22e358b9 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -243,6 +243,7 @@ PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
+PG_KEYWORD("lsn", LSN, UNRESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
 PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
@@ -404,6 +405,7 @@ PG_KEYWORD("text", TEXT_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("then", THEN, RESERVED_KEYWORD)
 PG_KEYWORD("ties", TIES, UNRESERVED_KEYWORD)
 PG_KEYWORD("time", TIME, COL_NAME_KEYWORD)
+PG_KEYWORD("timeout", TIMEOUT, UNRESERVED_KEYWORD)
 PG_KEYWORD("timestamp", TIMESTAMP, COL_NAME_KEYWORD)
 PG_KEYWORD("to", TO, RESERVED_KEYWORD)
 PG_KEYWORD("trailing", TRAILING, RESERVED_KEYWORD)
@@ -444,6 +446,7 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("wait", WAIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index 8ef0f55e74..430bb5c717 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -216,3 +216,4 @@ PG_CMDTAG(CMDTAG_TRUNCATE_TABLE, "TRUNCATE TABLE", false, false, false)
 PG_CMDTAG(CMDTAG_UNLISTEN, "UNLISTEN", false, false, false)
 PG_CMDTAG(CMDTAG_UPDATE, "UPDATE", false, false, true)
 PG_CMDTAG(CMDTAG_VACUUM, "VACUUM", false, false, false)
+PG_CMDTAG(CMDTAG_WAIT, "WAIT FOR", false, false, false)
diff --git a/src/test/recovery/t/018_waitfor.pl b/src/test/recovery/t/018_waitfor.pl
new file mode 100644
index 0000000000..6817431e9c
--- /dev/null
+++ b/src/test/recovery/t/018_waitfor.pl
@@ -0,0 +1,64 @@
+# Checks WAIT FOR
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+
+# And some content
+$node_master->safe_psql('postgres',
+	"CREATE TABLE tab_int AS SELECT generate_series(1, 10) AS a");
+
+# Take backup
+my $backup_name = 'my_backup';
+$node_master->backup($backup_name);
+
+# Create streaming standby from backup
+my $node_standby = get_new_node('standby');
+my $delay        = 4;
+$node_standby->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby->append_conf(
+	'postgresql.conf', qq(
+recovery_min_apply_delay = '${delay}s'
+));
+$node_standby->start;
+
+# Make new content on master and check its presence in standby depending
+# on the delay applied above. Before doing the insertion, get the
+# current timestamp that will be used as a comparison base. Even on slow
+# machines, this allows to have a predictable behavior when comparing the
+# delay between data insertion moment on master and replay time on standby.
+my $master_insert_time = time();
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(11, 20))");
+
+# Now wait for replay to complete on standby. We're done waiting when the
+# standby has replayed up to the previously saved master LSN.
+my $until_lsn =
+  $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(21, 30))");
+
+# Check that WAIT FOR LSN can set up an unbounded wait and exit
+# it without a timeout.
+$node_standby->safe_psql('postgres',
+    "WAIT FOR LSN '$until_lsn'", 't')
+  or die "standby never caught up";
+
+# Check that WAIT FOR LSN returns promptly when given a short TIMEOUT.
+$node_standby->poll_query_until('postgres',
+    "WAIT FOR LSN '$until_lsn' TIMEOUT 1", 't')
+  or die "standby never caught up";
+
+# This test is successful if and only if the LSN has been applied with at least
+# the configured apply delay.
+my $time_waited = time() - $master_insert_time;
+ok($time_waited >= $delay,"standby applies WAL only after replication delay");
#59Anna Akenteva
a.akenteva@postgrespro.ru
In reply to: Kartyshov Ivan (#58)
1 attachment(s)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020-03-27 04:15, Kartyshov Ivan wrote:

> Anna, feel free to work on this patch.

Ivan and I worked on this patch a bit more. We fixed the bugs that we
could find and cleaned up the code. For now, we've kept both options:
WAIT as a standalone statement and WAIT as a part of BEGIN. The new
patch is attached.

The syntax looks a bit different now:

- WAIT FOR [ANY | ALL] event [, ...]
- BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] [ WAIT FOR [ANY | ALL] event [, ...] ]

where event is one of:
    LSN value
    TIMEOUT number_of_milliseconds
    timestamp

Now, a single event cannot contain both an LSN and a TIMEOUT. With this
syntax, the behaviour seems to make sense. For the (default) WAIT FOR
ALL strategy, we pick the maximum LSN and the maximum allowed timeout,
and wait for that LSN until the timeout expires. If no timeout is
specified, we wait forever. If no LSN is specified, we just wait for the
time to pass. For the WAIT FOR ANY strategy, the behaviour is the same,
except that we pick the minimum LSN and the minimum timeout.
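
The selection rule described above can be sketched in a few lines of
standalone C. This is only an illustration of the rule, not the code from
the attached patch: the type and function names (Lsn, SketchEvent,
SketchResolved, resolve_events) are hypothetical, an LSN of 0 is assumed to
mean "no LSN given", and timestamps are assumed to have already been
converted to a number of seconds from now.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's definitions. */
typedef uint64_t Lsn;				/* 0 is assumed to mean "no LSN given" */

typedef enum { SKETCH_WAIT_ANY, SKETCH_WAIT_ALL } SketchStrategy;

typedef struct
{
	Lsn			lsn;			/* target LSN, or 0 if the event has none */
	bool		has_time;		/* true for TIMEOUT and timestamp events */
	double		delay_sec;		/* seconds to wait, used only if has_time */
} SketchEvent;

typedef struct
{
	bool		has_lsn;		/* at least one LSN event was given */
	Lsn			lsn;			/* LSN to wait for, valid if has_lsn */
	bool		wait_forever;	/* no TIMEOUT or timestamp was given */
	double		delay_sec;		/* time limit, valid if !wait_forever */
} SketchResolved;

/*
 * Collapse a list of events into one LSN and one time limit:
 * the maximum of each for ALL, the minimum of each for ANY.
 */
static SketchResolved
resolve_events(SketchStrategy strategy, const SketchEvent *events, int n)
{
	SketchResolved r = {false, 0, true, 0.0};

	assert(n >= 0);

	for (int i = 0; i < n; i++)
	{
		const SketchEvent *e = &events[i];

		if (e->lsn != 0)
		{
			bool		take = (strategy == SKETCH_WAIT_ALL)
				? (e->lsn > r.lsn) : (e->lsn < r.lsn);

			/* The first LSN seen is always taken. */
			if (!r.has_lsn || take)
				r.lsn = e->lsn;
			r.has_lsn = true;
		}

		if (e->has_time)
		{
			/* A point in time that has already passed means "do not wait". */
			double		d = (e->delay_sec < 0) ? 0 : e->delay_sec;
			bool		take = (strategy == SKETCH_WAIT_ALL)
				? (d > r.delay_sec) : (d < r.delay_sec);

			/* The first time limit seen is always taken. */
			if (r.wait_forever || take)
				r.delay_sec = d;
			r.wait_forever = false;
		}
	}

	return r;
}
```

With two LSN events and two timeouts, ALL resolves to the larger LSN and the
longer timeout, while ANY resolves to the smaller of each. If no time event
is present, wait_forever stays true, and if no LSN event is present the
caller would simply sleep for delay_sec.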

There are still some questions left:
1) Should we only keep the BEGIN option, or does the standalone command
have potential after all?
2) Should we change the grammar so that WAIT can be in any position of
the BEGIN statement, not necessarily at the end? Ivan and I haven't come
to a consensus about this, so more opinions would be helpful.
3) Since we added the "wait" attribute to the TransactionStmt struct, do
we need to add something to _copyTransactionStmt() /
_equalTransactionStmt()?

--
Anna Akenteva
Postgres Professional:
The Russian Postgres Company
http://www.postgrespro.com

Attachments:

begin_waitfor_v3.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 8d91f3529e6..8697f9807ff 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -187,6 +187,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY update             SYSTEM "update.sgml">
 <!ENTITY vacuum             SYSTEM "vacuum.sgml">
 <!ENTITY values             SYSTEM "values.sgml">
+<!ENTITY wait               SYSTEM "wait.sgml">
 
 <!-- applications and utilities -->
 <!ENTITY clusterdb          SYSTEM "clusterdb.sgml">
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index c23bbfb4e71..558ff668dbd 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,13 +21,21 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] <replaceable class="parameter">wait_for_event</replaceable>
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
     ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
     READ WRITE | READ ONLY
     [ NOT ] DEFERRABLE
+
+<phrase>and <replaceable class="parameter">wait_for_event</replaceable> is:</phrase>
+    WAIT FOR [ANY | ALL] <replaceable class="parameter">event</replaceable> [, <replaceable class="parameter">event</replaceable> ...]
+
+<phrase>where <replaceable class="parameter">event</replaceable> is one of:</phrase>
+    LSN lsn_value
+    TIMEOUT number_of_milliseconds
+    timestamp
 </synopsis>
  </refsynopsisdiv>
 
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index d6cd1d41779..e5e2e15cdf0 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,13 +21,21 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] <replaceable class="parameter">wait_for_event</replaceable>
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
     ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
     READ WRITE | READ ONLY
     [ NOT ] DEFERRABLE
+
+<phrase>and <replaceable class="parameter">wait_for_event</replaceable> is:</phrase>
+    WAIT FOR [ANY | ALL] <replaceable class="parameter">event</replaceable> [, <replaceable class="parameter">event</replaceable> ...]
+
+<phrase>where <replaceable class="parameter">event</replaceable> is one of:</phrase>
+    LSN lsn_value
+    TIMEOUT number_of_milliseconds
+    timestamp
 </synopsis>
  </refsynopsisdiv>
 
diff --git a/doc/src/sgml/ref/wait.sgml b/doc/src/sgml/ref/wait.sgml
new file mode 100644
index 00000000000..0e5bd2523d6
--- /dev/null
+++ b/doc/src/sgml/ref/wait.sgml
@@ -0,0 +1,146 @@
+<!--
+doc/src/sgml/ref/wait.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-waitlsn">
+ <indexterm zone="sql-waitlsn">
+  <primary>WAIT FOR</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>WAIT FOR</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>WAIT FOR</refname>
+  <refpurpose>wait for the target <acronym>LSN</acronym> to be replayed</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+WAIT FOR [ANY | ALL] <replaceable class="parameter">event</replaceable> [, <replaceable class="parameter">event</replaceable> ...]
+
+<phrase>where <replaceable class="parameter">event</replaceable> is one of:</phrase>
+    LSN value
+    TIMEOUT number_of_milliseconds
+    timestamp
+
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>'
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>' TIMEOUT <replaceable class="parameter">wait_timeout</replaceable>
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>', TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR ALL LSN '<replaceable class="parameter">lsn_number</replaceable>' TIMEOUT <replaceable class="parameter">wait_timeout</replaceable>, TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR ANY LSN '<replaceable class="parameter">lsn_number</replaceable>', TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>WAIT FOR</command> provides a simple interprocess communication
+   mechanism to wait for a timestamp, a timeout, or the target log sequence number
+   (<acronym>LSN</acronym>) on standby in <productname>PostgreSQL</productname>
+   databases with master-standby asynchronous replication. When run with the
+   <replaceable>LSN</replaceable> option, the <command>WAIT FOR</command>
+   command waits for the specified <acronym>LSN</acronym> to be replayed.
+   If no timestamp or timeout was specified, wait time is unlimited.
+   Waiting can be interrupted using <literal>Ctrl+C</literal>, or
+   by shutting down the <literal>postgres</literal> server.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">LSN</replaceable></term>
+    <listitem>
+     <para>
+      Specify the target log sequence number to wait for.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>TIMEOUT <replaceable class="parameter">wait_timeout</replaceable></term>
+    <listitem>
+     <para>
+      Limit the time interval to wait for the LSN to be replayed.
+      The specified <replaceable>wait_timeout</replaceable> must be an integer
+      and is measured in milliseconds.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>UNTIL TIMESTAMP <replaceable class="parameter">wait_time</replaceable></term>
+    <listitem>
+     <para>
+      Limit the time to wait for the LSN to be replayed.
+      The specified <replaceable>wait_time</replaceable> must be a timestamp.
+     </para>
+    </listitem>
+   </varlistentry>
+
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Run <literal>WAIT FOR</literal> from <application>psql</application>,
+   limiting wait time to 10000 milliseconds:
+
+<screen>
+WAIT FOR LSN '0/3F07A6B1' TIMEOUT 10000;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-------------
+ f
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Wait until the specified <acronym>LSN</acronym> is replayed:
+<screen>
+WAIT FOR LSN '0/3F07A611';
+LSN reached
+-------------
+ t
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Limit <acronym>LSN</acronym> wait time to 500000 milliseconds,
+   and then cancel the command if <acronym>LSN</acronym> was not reached:
+<screen>
+WAIT FOR LSN '0/3F0FF791' TIMEOUT 500000;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-------------
+ f
+(1 row)
+</screen>
+</para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>WAIT FOR</command> statement in the SQL
+   standard.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index cef09dd38b3..588e96aa143 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -215,6 +215,7 @@
    &update;
    &vacuum;
    &values;
+   &wait;
 
  </reference>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8fe92962b0d..b8e884ccc4f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -41,6 +41,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
@@ -7312,6 +7313,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * If we replayed an LSN that someone was waiting for,
+				 * set latches in shared memory array to notify the waiter.
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWait())
+				{
+					WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3ce65..9b310926c12 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00000000000..bd4f57630eb
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,410 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.c
+ *	  Implements WAIT FOR, which allows
+ *	  waiting for LSN to have been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include <float.h>
+#include <math.h>
+#include "postgres.h"
+#include "pgstat.h"
+#include "fmgr.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlogdefs.h"
+#include "access/xlog.h"
+#include "catalog/pg_type.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/backendid.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+#include "executor/spi.h"
+#include "utils/fmgrprotos.h"
+
+/* Add to / delete from shmem array */
+static void AddEvent(XLogRecPtr trg_lsn);
+static void DeleteEvent(void);
+
+/* Shared memory structures */
+typedef struct
+{
+	XLogRecPtr	trg_lsn;
+	/*
+	 * Left struct BIDState here for compatibility with
+	 * a planned future patch that will allow waiting for XIDs.
+	 */
+} BIDState;
+
+typedef struct
+{
+	int			backend_maxid;
+	XLogRecPtr	min_lsn;
+	slock_t		mutex;
+	BIDState	event_arr[FLEXIBLE_ARRAY_MEMBER];
+} WaitState;
+
+static volatile WaitState *state;
+
+/* Add event of the current backend to shmem array */
+static void
+AddEvent(XLogRecPtr trg_lsn)
+{
+	SpinLockAcquire(&state->mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->event_arr[MyBackendId].trg_lsn = trg_lsn;
+
+	if (trg_lsn < state->min_lsn)
+		state->min_lsn = trg_lsn;
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Delete event of the current backend from the shared array.
+ *
+ * TODO: Consider state cleanup on backend failure.
+ */
+static void
+DeleteEvent(void)
+{
+	int			i;
+	XLogRecPtr	trg_lsn = state->event_arr[MyBackendId].trg_lsn;
+
+	state->event_arr[MyBackendId].trg_lsn = InvalidXLogRecPtr;
+
+	SpinLockAcquire(&state->mutex);
+	/* Update state->min_lsn iff it is necessary for choosing next min_lsn */
+	if (state->min_lsn == trg_lsn)
+	{
+		state->min_lsn = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->event_arr[i].trg_lsn != InvalidXLogRecPtr &&
+				state->event_arr[i].trg_lsn < state->min_lsn)
+				state->min_lsn = state->event_arr[i].trg_lsn;
+	}
+
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >= 2; i--)
+			if (state->event_arr[i].trg_lsn != InvalidXLogRecPtr)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Report amount of shared memory space needed for WaitState
+ */
+Size
+WaitShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(WaitState, event_arr);
+	size = add_size(size, mul_size(MaxBackends + 1, sizeof(BIDState)));
+	return size;
+}
+
+/* Init array of events in shared memory */
+void
+WaitShmemInit(void)
+{
+	bool		found;
+	uint32		i;
+
+	state = (WaitState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitShmemSize(),
+										  &found);
+	if (!found)
+	{
+		SpinLockInit(&state->mutex);
+
+		for (i = 0; i < (MaxBackends+1); i++)
+			state->event_arr[i].trg_lsn = InvalidXLogRecPtr;
+
+		state->backend_maxid = 0;
+		state->min_lsn = PG_UINT64_MAX;
+	}
+}
+
+/* Set all latches in shared memory to signal that new LSN has been replayed */
+void
+WaitSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32		i;
+	int 		backend_maxid;
+	PGPROC	   *backend;
+
+	SpinLockAcquire(&state->mutex);
+	backend_maxid = state->backend_maxid;
+	SpinLockRelease(&state->mutex);
+
+	for (i = 2; i <= backend_maxid; i++)
+	{
+		backend = BackendIdGetProc(i);
+		if (state->event_arr[i].trg_lsn != 0)
+		{
+			if (backend && state->event_arr[i].trg_lsn <= cur_lsn)
+				SetLatch(&backend->procLatch);
+		}
+	}
+}
+
+/* Get minimal LSN that will be next */
+XLogRecPtr
+GetMinWait(void)
+{
+	return state->min_lsn;
+}
+
+/*
+ * On WAIT use MyLatch to wait till LSN is replayed,
+ * postmaster dies or timeout happens.
+ */
+int
+WaitUtility(XLogRecPtr lsn, const float8 secs, DestReceiver *dest)
+{
+	XLogRecPtr	trg_lsn = lsn;
+	XLogRecPtr	cur_lsn = GetXLogReplayRecPtr(NULL);
+	int			latch_events;
+	float8		endtime;
+	TupOutputState *tstate;
+	TupleDesc	tupdesc;
+	char	   *value = "f";
+
+	latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH;
+
+	if (lsn)
+		AddEvent(trg_lsn);
+
+#define GetNowFloat()	((float8) GetCurrentTimestamp() / 1000000.0)
+	endtime = GetNowFloat() + secs;
+
+	for (;;)
+	{
+		int			rc;
+		float8		delay = 0;
+		long		delay_ms;
+
+		if (secs > 0)
+			delay = endtime - GetNowFloat();
+		else if (secs == 0) /* 10 second timeout to check for interrupts */
+			delay = 10;
+		else
+			delay = 1;
+
+		if (delay > 0.0)
+			delay_ms = (long) ceil(delay * 1000.0);
+		else
+			break;
+
+		/*
+		 * If received an interruption from CHECK_FOR_INTERRUPTS,
+		 * then delete the current event from array.
+		 */
+		if (InterruptPending)
+		{
+			if (lsn)
+				DeleteEvent();
+			ProcessInterrupts();
+		}
+
+		if (lsn && rc & WL_LATCH_SET)
+			cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN has been replayed */
+		if (lsn && trg_lsn <= cur_lsn)
+			break;
+
+		/* If postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		/* A little hack similar to SnapshotResetXmin to work out of snapshot */
+		MyPgXact->xmin = InvalidTransactionId;
+		rc = WaitLatch(MyLatch, latch_events, delay_ms,
+					   WAIT_EVENT_CLIENT_READ);
+
+		if (rc & WL_LATCH_SET)
+			ResetLatch(MyLatch);
+	}
+
+	DeleteEvent();
+
+	if (lsn && trg_lsn > cur_lsn)
+		elog(NOTICE,"LSN is not reached. Try to increase wait time.");
+	else
+		value = "t";
+
+	/* Need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+	/* Prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsMinimalTuple);
+
+	/* Send it */
+	do_text_output_oneline(tstate, value);
+	end_tup_output(tstate);
+	return strcmp(value,"t");
+}
+
+void
+WaitTimeUtility(float8 delay)
+{
+	int			latch_events;
+
+	if (delay <= 0)
+		return;
+
+	latch_events = WL_TIMEOUT | WL_POSTMASTER_DEATH;
+
+	MyPgXact->xmin = InvalidTransactionId;
+	WaitLatch(MyLatch, latch_events, (long) ceil(delay * 1000.0), WAIT_EVENT_CLIENT_READ);
+	ResetLatch(MyLatch);
+}
+
+/*
+ * Get the amount of seconds left till the specified time.
+ */
+float8
+WaitTimeResolve(Const *time)
+{
+	int			ret;
+	float8		val;
+	Oid			types[] = { time->consttype };
+	Datum		values[] = { time->constvalue };
+	char		nulls[] = { " " };
+	Datum		result;
+	bool		isnull;
+
+	SPI_connect();
+
+	if (time->consttype == 1083)
+		ret = SPI_execute_with_args("select extract (epoch from ($1 - now()::time))",
+									1, types, values, nulls, true, 0);
+	else if (time->consttype == 1266)
+		ret = SPI_execute_with_args("select extract (epoch from (timezone('UTC',$1)::time - timezone('UTC', now()::timetz)::time))",
+									1, types, values, nulls, true, 0);
+	else
+		ret = SPI_execute_with_args("select extract (epoch from ($1 - now()))",
+									1, types, values, nulls, true, 0);
+
+	Assert(ret >= 0);
+	result = SPI_getbinval(SPI_tuptable->vals[0],
+						   SPI_tuptable->tupdesc,
+						   1, &isnull);
+
+	Assert(!isnull);
+	val = DatumGetFloat8(result);
+
+	elog(INFO, "time: %f", val);
+
+	SPI_finish();
+	return val;
+}
+
+/* Implementation of WAIT FOR */
+int
+WaitMain(WaitStmt *stmt, DestReceiver *dest)
+{
+	ListCell   *events;
+	float8		delay = 0;
+	float8		final_delay = 0;
+	XLogRecPtr	lsn = InvalidXLogRecPtr;
+	XLogRecPtr	final_lsn = InvalidXLogRecPtr;
+	bool		has_lsn = false;
+	bool		wait_forever = true;
+	int			res = 1;
+
+	if (stmt->strategy == WAIT_FOR_ANY)
+	{
+		/* Prepare to find minimum LSN and delay */
+		final_delay = DBL_MAX;
+		final_lsn = PG_UINT64_MAX;
+	}
+
+	/* Extract options from the statement node tree */
+	foreach(events, stmt->events)
+	{
+		WaitStmt   *event = (WaitStmt *) lfirst(events);
+
+		/* LSN to wait for */
+		if (event->lsn)
+		{
+			has_lsn = true;
+			lsn = DatumGetLSN(
+						DirectFunctionCall1(pg_lsn_in,
+							CStringGetDatum(event->lsn)));
+
+			/*
+			 * When waiting for ALL, select max LSN to wait for.
+			 * When waiting for ANY, select min LSN to wait for.
+			 */
+			if ((stmt->strategy == WAIT_FOR_ALL && final_lsn <= lsn) ||
+				(stmt->strategy == WAIT_FOR_ANY && final_lsn > lsn))
+			{
+				final_lsn = lsn;
+			}
+		}
+
+		/* Time delay to wait for */
+		if (event->time || event->delay)
+		{
+			wait_forever = false;
+
+			if (event->delay)
+				delay = event->delay / 1000.0;
+
+			if (event->time)
+			{
+				Const *time = (Const *) event->time;
+				delay = WaitTimeResolve(time);
+			}
+
+			if (delay < 0)
+				delay = 0;
+
+			/*
+			 * When waiting for ALL, select max delay to wait for.
+			 * When waiting for ANY, select min delay to wait for.
+			 */
+			if ((stmt->strategy == WAIT_FOR_ALL && final_delay <= delay) ||
+				(stmt->strategy == WAIT_FOR_ANY && final_delay > delay))
+			{
+				final_delay = delay;
+			}
+		}
+	}
+
+	if (!has_lsn)
+	{
+		WaitTimeUtility(final_delay);
+		res = 0;
+	}
+	else
+		res = WaitUtility(final_lsn, wait_forever ? 0 : final_delay, dest);
+
+	return res;
+}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e084c3f0699..c980c6d560e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2760,6 +2760,18 @@ _outDefElem(StringInfo str, const DefElem *node)
 	WRITE_LOCATION_FIELD(location);
 }
 
+static void
+_outWaitStmt(StringInfo str, const WaitStmt *node)
+{
+	WRITE_NODE_TYPE("WAITSTMT");
+
+	WRITE_STRING_FIELD(lsn);
+	WRITE_INT_FIELD(delay);
+	WRITE_NODE_FIELD(events);
+	WRITE_NODE_FIELD(time);
+	WRITE_ENUM_FIELD(strategy, WaitForStrategy);
+}
+
 static void
 _outTableLikeClause(StringInfo str, const TableLikeClause *node)
 {
@@ -4305,6 +4317,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PartitionRangeDatum:
 				_outPartitionRangeDatum(str, obj);
 				break;
+			case T_WaitStmt:
+				_outWaitStmt(str, obj);
+				break;
 
 			default:
 
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 6676412842b..bc2304a0ba1 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -78,6 +78,7 @@ static Query *transformCreateTableAsStmt(ParseState *pstate,
 										 CreateTableAsStmt *stmt);
 static Query *transformCallStmt(ParseState *pstate,
 								CallStmt *stmt);
+static void transformWaitForStmt(ParseState *pstate, WaitStmt *stmt);
 static void transformLockingClause(ParseState *pstate, Query *qry,
 								   LockingClause *lc, bool pushedDown);
 #ifdef RAW_EXPRESSION_COVERAGE_TEST
@@ -326,6 +327,19 @@ transformStmt(ParseState *pstate, Node *parseTree)
 			result = transformCallStmt(pstate,
 									   (CallStmt *) parseTree);
 			break;
+		case T_WaitStmt:
+			transformWaitForStmt(pstate, (WaitStmt *) parseTree);
+			result = makeNode(Query);
+			result->commandType = CMD_UTILITY;
+			result->utilityStmt = (Node *) parseTree;
+			break;
+		case T_TransactionStmt:
+			{
+				TransactionStmt *stmt = (TransactionStmt *) parseTree;
+				if ((stmt->kind == TRANS_STMT_BEGIN ||
+						stmt->kind == TRANS_STMT_START) && stmt->wait)
+					transformWaitForStmt(pstate, (WaitStmt *) stmt->wait);
+			}
 
 		default:
 
@@ -2981,6 +2995,18 @@ applyLockingClause(Query *qry, Index rtindex,
 	qry->rowMarks = lappend(qry->rowMarks, rc);
 }
 
+static void
+transformWaitForStmt(ParseState *pstate, WaitStmt *stmt)
+{
+	ListCell   *events;
+
+	foreach(events, stmt->events)
+	{
+		WaitStmt   *event = (WaitStmt *) lfirst(events);
+		event->time = transformExpr(pstate, event->time, EXPR_KIND_OTHER);
+	}
+}
+
 /*
  * Coverage testing for raw_expression_tree_walker().
  *
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7e384f956c8..63e7b7224c3 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -276,7 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		SecLabelStmt SelectStmt TransactionStmt TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
 		VariableResetStmt VariableSetStmt VariableShowStmt
-		ViewStmt CheckPointStmt CreateConversionStmt
+		ViewStmt WaitStmt CheckPointStmt CreateConversionStmt
 		DeallocateStmt PrepareStmt ExecuteStmt
 		DropOwnedStmt ReassignOwnedStmt
 		AlterTSConfigurationStmt AlterTSDictionaryStmt
@@ -487,7 +487,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <list>	row explicit_row implicit_row type_list array_expr_list
 %type <node>	case_expr case_arg when_clause case_default
 %type <list>	when_clause_list
-%type <ival>	sub_type opt_materialized
+%type <ival>	sub_type wait_strategy opt_materialized
 %type <value>	NumericOnly
 %type <list>	NumericOnly_list
 %type <alias>	alias_clause opt_alias_clause
@@ -591,6 +591,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <partboundspec> PartitionBoundSpec
 %type <list>		hash_partbound
 %type <defelt>		hash_partbound_elem
+%type <list>		wait_list
+%type <node>		WaitEvent wait_for
 
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
@@ -660,7 +662,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	LABEL LANGUAGE LARGE_P LAST_P LATERAL_P
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
-	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
+	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED LSN
 
 	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
@@ -690,7 +692,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P
 
 	TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
-	TIES TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
+	TIES TIME TIMEOUT TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
 	TREAT TRIGGER TRIM TRUE_P
 	TRUNCATE TRUSTED TYPE_P TYPES_P
 
@@ -700,7 +702,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAIT WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLNAMESPACES
 	XMLPARSE XMLPI XMLROOT XMLSERIALIZE XMLTABLE
@@ -954,6 +957,7 @@ stmt :
 			| VariableSetStmt
 			| VariableShowStmt
 			| ViewStmt
+			| WaitStmt
 			| /*EMPTY*/
 				{ $$ = NULL; }
 		;
@@ -9930,18 +9934,20 @@ TransactionStmt:
 					n->chain = $3;
 					$$ = (Node *)n;
 				}
-			| BEGIN_P opt_transaction transaction_mode_list_or_empty
+			| BEGIN_P opt_transaction transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_BEGIN;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
-			| START TRANSACTION transaction_mode_list_or_empty
+			| START TRANSACTION transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_START;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
 			| COMMIT opt_transaction opt_transaction_chain
@@ -14147,6 +14153,74 @@ xml_passing_mech:
 			| BY VALUE_P
 		;
 
+/*****************************************************************************
+ *
+ *		QUERY:
+ *				WAIT FOR <event> [, <event> ...]
+ *				event is one of:
+ *					LSN value
+ *					TIMEOUT delay
+ *					timestamp
+ *
+ *****************************************************************************/
+WaitStmt:
+			WAIT FOR wait_strategy wait_list
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->strategy = $3;
+					n->events = $4;
+					$$ = (Node *)n;
+				}
+			;
+wait_for:
+			WAIT FOR wait_strategy wait_list
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->strategy = $3;
+					n->events = $4;
+					$$ = (Node *)n;
+				}
+			| /* EMPTY */		{ $$ = NULL; };
+
+wait_strategy:
+			ALL					{ $$ = WAIT_FOR_ALL; }
+			| ANY				{ $$ = WAIT_FOR_ANY; }
+			| /* EMPTY */		{ $$ = WAIT_FOR_ALL; }
+		;
+
+wait_list:
+			WaitEvent					{ $$ = list_make1($1); }
+			| wait_list ',' WaitEvent	{ $$ = lappend($1, $3); }
+			| wait_list WaitEvent		{ $$ = lappend($1, $2); }
+		;
+
+WaitEvent:
+			LSN Sconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->lsn = $2;
+					n->delay = 0;
+					n->time = NULL;
+					$$ = (Node *)n;
+				}
+			| TIMEOUT Iconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->lsn = NULL;
+					n->delay = $2;
+					n->time = NULL;
+					$$ = (Node *)n;
+				}
+			| ConstDatetime Sconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->lsn = NULL;
+					n->delay = 0;
+					n->time = makeStringConstCast($2, @2, $1);
+					$$ = (Node *)n;
+				}
+			;
+
 
 /*
  * Aggregate decoration clauses
@@ -15291,6 +15365,7 @@ unreserved_keyword:
 			| LOCK_P
 			| LOCKED
 			| LOGGED
+			| LSN
 			| MAPPING
 			| MATCH
 			| MATERIALIZED
@@ -15413,6 +15488,7 @@ unreserved_keyword:
 			| TEMPORARY
 			| TEXT_P
 			| TIES
+			| TIMEOUT
 			| TRANSACTION
 			| TRANSFORM
 			| TRIGGER
@@ -15439,6 +15515,7 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAIT
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cde..bb8af349808 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -147,6 +148,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WaitShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -264,6 +266,11 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of Latches in shared memory for WAIT
+	 */
+	WaitShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index b1f7f6e2d01..daec0551717 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -15,6 +15,7 @@
  *-------------------------------------------------------------------------
  */
 #include "postgres.h"
+#include <float.h>
 
 #include "access/htup_details.h"
 #include "access/reloptions.h"
@@ -57,6 +58,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -70,6 +72,9 @@
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
+#include "executor/spi.h"
+#include "utils/fmgrprotos.h"
+#include "utils/pg_lsn.h"
 
 /* Hook for plugins to get control in ProcessUtility() */
 ProcessUtility_hook_type ProcessUtility_hook = NULL;
@@ -268,6 +273,7 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 		case T_LoadStmt:
 		case T_PrepareStmt:
 		case T_UnlistenStmt:
+		case T_WaitStmt:
 		case T_VariableSetStmt:
 			{
 				/*
@@ -591,6 +597,11 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 					case TRANS_STMT_START:
 						{
 							ListCell   *lc;
+							WaitStmt   *waitstmt = (WaitStmt *) stmt->wait;
+
+							/* If a WAIT FOR clause was given but not satisfied, do not begin */
+							if (stmt->wait && WaitMain(waitstmt, dest) != 0)
+								break;
 
 							BeginTransactionBlock();
 							foreach(lc, stmt->options)
@@ -1062,6 +1073,13 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				break;
 			}
 
+		case T_WaitStmt:
+			{
+				WaitStmt   *stmt = (WaitStmt *) parsetree;
+				WaitMain(stmt, dest);
+				break;
+			}
+
 		default:
 			/* All other statement types have event trigger support */
 			ProcessUtilitySlow(pstate, pstmt, queryString,
@@ -2718,6 +2736,10 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_NOTIFY;
 			break;
 
+		case T_WaitStmt:
+			tag = CMDTAG_WAIT;
+			break;
+
 		case T_ListenStmt:
 			tag = CMDTAG_LISTEN;
 			break;
@@ -3357,6 +3379,10 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
+		case T_WaitStmt:
+			lev = LOGSTMT_ALL;
+			break;
+
 		case T_ListenStmt:
 			lev = LOGSTMT_ALL;
 			break;
diff --git a/src/include/commands/wait.h b/src/include/commands/wait.h
new file mode 100644
index 00000000000..11115b9dab6
--- /dev/null
+++ b/src/include/commands/wait.h
@@ -0,0 +1,27 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.h
+ *	  prototypes for commands/wait.c
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2016, Regents of PostgresPRO
+ *
+ * src/include/commands/wait.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAIT_H
+#define WAIT_H
+#include "postgres.h"
+#include "tcop/dest.h"
+
+extern int WaitUtility(XLogRecPtr lsn, const float8 delay, DestReceiver *dest);
+extern void WaitTimeUtility(float8 delay);
+extern Size WaitShmemSize(void);
+extern void WaitShmemInit(void);
+extern void WaitSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWait(void);
+extern float8 WaitTimeResolve(Const *time);
+extern int WaitMain(WaitStmt *stmt, DestReceiver *dest);
+
+#endif   /* WAIT_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 8a76afe8ccb..348de76c5f4 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -488,6 +488,7 @@ typedef enum NodeTag
 	T_DropReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
+	T_WaitStmt,
 	T_SQLCmd,
 
 	/*
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2039b424499..30af2cfd505 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3054,6 +3054,7 @@ typedef struct TransactionStmt
 	char	   *savepoint_name; /* for savepoint commands */
 	char	   *gid;			/* for two-phase-commit related commands */
 	bool		chain;			/* AND CHAIN option */
+	Node		*wait;			/* Wait for event node or NULL */
 } TransactionStmt;
 
 /* ----------------------
@@ -3563,4 +3564,25 @@ typedef struct DropSubscriptionStmt
 	DropBehavior behavior;		/* RESTRICT or CASCADE behavior */
 } DropSubscriptionStmt;
 
+/* ----------------------
+ *		Wait Statement
+ * ----------------------
+ */
+
+typedef enum WaitForStrategy
+{
+	WAIT_FOR_ANY = 0,
+	WAIT_FOR_ALL
+} WaitForStrategy;
+
+typedef struct WaitStmt
+{
+	NodeTag			type;
+	WaitForStrategy	strategy;
+	List		   *events;		/* list of WaitStmt events to wait for */
+	char		   *lsn;		/* Target LSN to wait for */
+	int				delay;		/* Timeout when waiting for LSN, in msec */
+	Node		   *time;		/* Wait for timestamp */
+} WaitStmt;
+
 #endif							/* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index b1184c2d158..dd22e358b9a 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -243,6 +243,7 @@ PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
+PG_KEYWORD("lsn", LSN, UNRESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
 PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
@@ -404,6 +405,7 @@ PG_KEYWORD("text", TEXT_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("then", THEN, RESERVED_KEYWORD)
 PG_KEYWORD("ties", TIES, UNRESERVED_KEYWORD)
 PG_KEYWORD("time", TIME, COL_NAME_KEYWORD)
+PG_KEYWORD("timeout", TIMEOUT, UNRESERVED_KEYWORD)
 PG_KEYWORD("timestamp", TIMESTAMP, COL_NAME_KEYWORD)
 PG_KEYWORD("to", TO, RESERVED_KEYWORD)
 PG_KEYWORD("trailing", TRAILING, RESERVED_KEYWORD)
@@ -444,6 +446,7 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("wait", WAIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index 8ef0f55e748..430bb5c7171 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -216,3 +216,4 @@ PG_CMDTAG(CMDTAG_TRUNCATE_TABLE, "TRUNCATE TABLE", false, false, false)
 PG_CMDTAG(CMDTAG_UNLISTEN, "UNLISTEN", false, false, false)
 PG_CMDTAG(CMDTAG_UPDATE, "UPDATE", false, false, true)
 PG_CMDTAG(CMDTAG_VACUUM, "VACUUM", false, false, false)
+PG_CMDTAG(CMDTAG_WAIT, "WAIT FOR", false, false, false)
diff --git a/src/test/recovery/t/018_waitfor.pl b/src/test/recovery/t/018_waitfor.pl
new file mode 100644
index 00000000000..42900dd8867
--- /dev/null
+++ b/src/test/recovery/t/018_waitfor.pl
@@ -0,0 +1,106 @@
+# Checks WAIT FOR
+
+# TODO:
+# - test WAIT FOR ALL (many LSNs without time, many LSNs with time)
+# - test WAIT FOR ANY (many LSNs without time, many LSNs with time)
+# - test WAIT FOR TIMESTAMP (with LSN + without LSN)
+# - test WAIT FOR TIMEOUT (with LSN + without LSN)
+
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+
+# And some content
+$node_master->safe_psql('postgres',
+	"CREATE TABLE tab_int AS SELECT generate_series(1, 10) AS a");
+
+# Take backup
+my $backup_name = 'my_backup';
+$node_master->backup($backup_name);
+
+# Create streaming standby from backup
+my $node_standby = get_new_node('standby');
+my $delay        = 1;
+$node_standby->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby->append_conf(
+	'postgresql.conf', qq(
+recovery_min_apply_delay = '${delay}s'
+));
+$node_standby->start;
+
+# Make new content on master and check its presence in standby depending
+# on the delay applied above. Before doing the insertion, get the
+# current timestamp that will be used as a comparison base. Even on slow
+# machines, this allows to have a predictable behavior when comparing the
+# delay between data insertion moment on master and replay time on standby.
+my $master_insert_time = time();
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(11, 20))");
+
+# Now wait for replay to complete on standby. We're done waiting when the
+# standby has replayed up to the previously saved master LSN.
+my $lsn1 =
+  $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+# Check that WAIT FOR LSN can wait without a timeout and returns
+# once the LSN has been replayed.
+$node_standby->safe_psql('postgres',
+    "WAIT FOR LSN '$lsn1'")
+  or die "standby never caught up";
+
+# This test is successful iff the LSN has been applied with at least
+# the configured apply delay.
+my $time_waited = time() - $master_insert_time;
+ok($time_waited >= $delay, "standby applies WAL only after replication delay");
+
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(21, 30))");
+
+# Check that WAIT FOR returns promptly for an already-replayed LSN with a short TIMEOUT.
+$node_standby->poll_query_until('postgres',
+    "WAIT FOR LSN '$lsn1' TIMEOUT 1", 't')
+  or die "standby never caught up";
+
+# Check that timeouts work on their own, without an LSN specified
+my $timeout = 1000; # 1 second
+my $start_time = time();
+$node_standby->safe_psql('postgres', "WAIT FOR TIMEOUT $timeout");
+$time_waited = time() - $start_time;
+note("Asked for timeout $timeout ms, got delayed for $time_waited second(s)");
+ok($time_waited >= $timeout / 1000, "WAIT FOR TIMEOUT works without LSN");
+
+# ==============================================================================
+# TODO: Check that isolation level doesn't get broken due to WAIT
+# TODO: Add more tests for WAIT FOR ALL / WAIT FOR ANY / WAIT FOR DATETIME
+# TODO: Cleanup tests
+# ==============================================================================
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(31, 40))");
+
+my $lsn2 =
+  $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+$node_master->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(41, 500000))");
+
+my $standby_max =
+  $node_standby->safe_psql('postgres',
+    "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ WAIT FOR LSN '$lsn2'; SELECT pg_sleep_for('2s'); SELECT max(a) FROM tab_int; COMMIT;");
+
+my $lsn_master =
+  $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+my $lsn_standby =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+note("Waited for LSN $lsn2, got LSN $lsn_standby on standby (current master LSN = $lsn_master), standby max = $standby_max");
+
+$node_master->stop;
+$node_standby->stop;
#60Alexey Kondratov
a.kondratov@postgrespro.ru
In reply to: Anna Akenteva (#59)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020-04-01 02:26, Anna Akenteva wrote:

On 2020-03-27 04:15, Kartyshov Ivan wrote:

Anna, feel free to work on this patch.

Ivan and I worked on this patch a bit more. We fixed the bugs that we
could find and cleaned up the code. For now, we've kept both options:
WAIT as a standalone statement and WAIT as a part of BEGIN. The new
patch is attached.

The syntax looks a bit different now:

- WAIT FOR [ANY | ALL] event [, ...]
- BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] [ WAIT FOR
[ANY | ALL] event [, ...]]
where event is one of:
LSN value
TIMEOUT number_of_milliseconds
timestamp

Now, one event cannot contain both an LSN and a TIMEOUT.

In my understanding the whole idea of having TIMEOUT was to do something
like 'Do wait for this LSN to be replicated, but no longer than TIMEOUT
milliseconds'. What is the point of having plain TIMEOUT? It seems to be
equivalent to pg_sleep, doesn't it?

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

#61Anna Akenteva
a.akenteva@postgrespro.ru
In reply to: Alexey Kondratov (#60)
1 attachment(s)
Re: [HACKERS] make async slave to wait for lsn to be replayed

I did some code cleanup and added tests - both for the standalone WAIT
FOR statement and for WAIT FOR as a part of BEGIN. The new patch is
attached.

On 2020-04-03 17:29, Alexey Kondratov wrote:

On 2020-04-01 02:26, Anna Akenteva wrote:

- WAIT FOR [ANY | ALL] event [, ...]
- BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] [ WAIT FOR
[ANY | ALL] event [, ...]]
where event is one of:
LSN value
TIMEOUT number_of_milliseconds
timestamp

Now, one event cannot contain both an LSN and a TIMEOUT.

In my understanding the whole idea of having TIMEOUT was to do
something like 'Do wait for this LSN to be replicated, but no longer
than TIMEOUT milliseconds'. What is the point of having plain TIMEOUT?
It seems to be equivalent to pg_sleep, doesn't it?

In the patch that I reviewed, you could do things like:
WAIT FOR
LSN lsn0,
LSN lsn1 TIMEOUT time1,
LSN lsn2 TIMEOUT time2;
and such a statement was in practice equivalent to
WAIT FOR LSN(max(lsn0, lsn1, lsn2)) TIMEOUT (max(time1, time2))

As you can see, even though grammatically lsn1 is grouped with time1 and
lsn2 is grouped with time2, both timeouts that we specified are not
connected to their respective LSN-s, and instead they kinda act like
global timeouts. Therefore, I didn't see a point in keeping TIMEOUT
necessarily grammatically connected to LSN.
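
To make that reduction concrete, here is a minimal C sketch, assuming
each event may carry an LSN, a timeout, or both (as in the old grammar).
It mirrors the min/max selection done in the patch's WaitMain() for both
strategies. Every Sketch* name is hypothetical and not part of the patch.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the patch's WaitForStrategy and WaitStmt. */
typedef enum { SKETCH_WAIT_ANY, SKETCH_WAIT_ALL } SketchStrategy;

typedef struct
{
	bool		has_lsn;	/* event carries an LSN */
	uint64_t	lsn;
	bool		has_delay;	/* event carries a timeout */
	double		delay_ms;
} SketchEvent;

typedef struct
{
	bool		has_lsn;	/* at least one LSN was given */
	bool		has_delay;	/* at least one timeout was given */
	uint64_t	lsn;		/* the single LSN actually waited for */
	double		delay_ms;	/* the single timeout actually applied */
} SketchTarget;

/*
 * Collapse a list of events into one LSN and one timeout:
 * ALL keeps the maximum of each, ANY keeps the minimum of each.
 * The LSN and the timeout are reduced independently, which is why
 * a timeout is never tied to the LSN it was written next to.
 */
static SketchTarget
sketch_reduce(SketchStrategy strategy, const SketchEvent *events, size_t n)
{
	SketchTarget t = {false, false, 0, 0.0};
	size_t		i;

	assert(events != NULL || n == 0);

	if (strategy == SKETCH_WAIT_ANY)
	{
		/* Start from the largest values so any real one is smaller. */
		t.lsn = UINT64_MAX;
		t.delay_ms = DBL_MAX;
	}

	for (i = 0; i < n; i++)
	{
		const SketchEvent *e = &events[i];

		if (e->has_lsn)
		{
			t.has_lsn = true;
			if (strategy == SKETCH_WAIT_ALL ? e->lsn >= t.lsn : e->lsn < t.lsn)
				t.lsn = e->lsn;
		}

		if (e->has_delay)
		{
			/* Negative delays are clamped to zero, as in WaitMain(). */
			double		d = e->delay_ms < 0 ? 0 : e->delay_ms;

			t.has_delay = true;
			if (strategy == SKETCH_WAIT_ALL ? d >= t.delay_ms : d < t.delay_ms)
				t.delay_ms = d;
		}
	}

	return t;
}
```

With the three events above under ALL, the result holds the largest LSN
and the largest timeout, regardless of which LSN each timeout was
written next to.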

In the new syntax our statement would look like this:
WAIT FOR LSN lsn0, LSN lsn1, LSN lsn2, TIMEOUT time1, TIMEOUT time2;
TIMEOUT-s are not forced to be grouped with LSN-s anymore, which makes
it more clear that all specified TIMEOUTs will be global and will apply
to all LSN-s at once.

The point of having TIMEOUT is still to let us limit the time of waiting
for LSNs. It's just that with the new syntax, we can also use TIMEOUT
without an LSN. You are right, such a case is equivalent to pg_sleep.
One way to avoid that is to prohibit waiting for TIMEOUT without
specifying an LSN. Do you think we should do that?
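
As a rough C sketch of the three cases discussed here, assuming the
dispatch at the end of the patch's WaitMain(): no LSN means a plain
sleep, an LSN without a timeout means an unbounded wait, and an LSN with
a timeout means a bounded wait. The sketch_allowed() check illustrates
the possible restriction asked about above; it is purely hypothetical,
as are all other names in the block.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
	SKETCH_SLEEP_ONLY,		/* no LSN given: behaves like pg_sleep */
	SKETCH_WAIT_UNBOUNDED,	/* LSN given, no timeout: wait until replayed */
	SKETCH_WAIT_BOUNDED		/* LSN and timeout: wait, but give up on timeout */
} SketchPlan;

/* Decide how to wait, given which kinds of events were specified. */
static SketchPlan
sketch_plan(bool has_lsn, bool has_timeout)
{
	if (!has_lsn)
	{
		/* The grammar requires at least one event, so it must be a timeout. */
		assert(has_timeout);
		return SKETCH_SLEEP_ONLY;
	}

	return has_timeout ? SKETCH_WAIT_BOUNDED : SKETCH_WAIT_UNBOUNDED;
}

/*
 * Hypothetical validation: if plain timeouts were prohibited, a
 * statement without any LSN would be rejected before waiting.
 */
static bool
sketch_allowed(bool has_lsn, bool forbid_plain_timeout)
{
	return has_lsn || !forbid_plain_timeout;
}
```

Prohibiting the plain-TIMEOUT form would only remove the first case.
The other two would behave as before.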

--
Anna Akenteva
Postgres Professional:
The Russian Postgres Company
http://www.postgrespro.com

Attachments:

begin_waitfor_v4.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 8d91f3529e6..8697f9807ff 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -187,6 +187,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY update             SYSTEM "update.sgml">
 <!ENTITY vacuum             SYSTEM "vacuum.sgml">
 <!ENTITY values             SYSTEM "values.sgml">
+<!ENTITY wait               SYSTEM "wait.sgml">
 
 <!-- applications and utilities -->
 <!ENTITY clusterdb          SYSTEM "clusterdb.sgml">
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index c23bbfb4e71..cfee3c8f102 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,13 +21,21 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] <replaceable class="parameter">wait_for_event</replaceable>
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
     ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
     READ WRITE | READ ONLY
     [ NOT ] DEFERRABLE
+
+<phrase>where <replaceable class="parameter">wait_for_event</replaceable> is:</phrase>
+    WAIT FOR [ANY | ALL] <replaceable class="parameter">event</replaceable> [, ...]
+
+<phrase>and <replaceable class="parameter">event</replaceable> is one of:</phrase>
+    LSN lsn_value
+    TIMEOUT number_of_milliseconds
+    timestamp
 </synopsis>
  </refsynopsisdiv>
 
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index d6cd1d41779..0a2ea7e80be 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,13 +21,21 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] <replaceable class="parameter">wait_for_event</replaceable>
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
     ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
     READ WRITE | READ ONLY
     [ NOT ] DEFERRABLE
+
+<phrase>where <replaceable class="parameter">wait_for_event</replaceable> is:</phrase>
+    WAIT FOR [ANY | ALL] <replaceable class="parameter">event</replaceable> [, ...]
+
+<phrase>and <replaceable class="parameter">event</replaceable> is one of:</phrase>
+    LSN lsn_value
+    TIMEOUT number_of_milliseconds
+    timestamp
 </synopsis>
  </refsynopsisdiv>
 
diff --git a/doc/src/sgml/ref/wait.sgml b/doc/src/sgml/ref/wait.sgml
new file mode 100644
index 00000000000..26cae3ad859
--- /dev/null
+++ b/doc/src/sgml/ref/wait.sgml
@@ -0,0 +1,146 @@
+<!--
+doc/src/sgml/ref/wait.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-wait">
+ <indexterm zone="sql-wait">
+  <primary>WAIT FOR</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>WAIT FOR</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>WAIT FOR</refname>
+  <refpurpose>wait for the target <acronym>LSN</acronym> to be replayed or for a specified time to pass</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+WAIT FOR [ANY | ALL] <replaceable class="parameter">event</replaceable> [, ...]
+
+<phrase>where <replaceable class="parameter">event</replaceable> is one of:</phrase>
+    LSN value
+    TIMEOUT number_of_milliseconds
+    timestamp
+
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>'
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>' TIMEOUT <replaceable class="parameter">wait_timeout</replaceable>
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>', TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR ALL LSN '<replaceable class="parameter">lsn_number</replaceable>' TIMEOUT <replaceable class="parameter">wait_timeout</replaceable>, TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR ANY LSN '<replaceable class="parameter">lsn_number</replaceable>', TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>WAIT FOR</command> provides a simple interprocess communication
+   mechanism to wait for a timestamp, a timeout, or the target log sequence number
+   (<acronym>LSN</acronym>) on standby in <productname>PostgreSQL</productname>
+   databases with master-standby asynchronous replication. When run with the
+   <replaceable>LSN</replaceable> option, the <command>WAIT FOR</command>
+   command waits for the specified <acronym>LSN</acronym> to be replayed.
+   If no timestamp or timeout was specified, wait time is unlimited.
+   Waiting can be interrupted using <literal>Ctrl+C</literal>, or
+   by shutting down the <literal>postgres</literal> server.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">LSN</replaceable></term>
+    <listitem>
+     <para>
+      Specify the target log sequence number to wait for.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>TIMEOUT <replaceable class="parameter">wait_timeout</replaceable></term>
+    <listitem>
+     <para>
+      Limit the time interval to wait for the LSN to be replayed.
+      The specified <replaceable>wait_timeout</replaceable> must be an integer
+      and is measured in milliseconds.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>TIMESTAMP <replaceable class="parameter">wait_time</replaceable></term>
+    <listitem>
+     <para>
+      Limit the time to wait for the LSN to be replayed.
+      The specified <replaceable>wait_time</replaceable> must be a timestamp.
+     </para>
+    </listitem>
+   </varlistentry>
+
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Run <literal>WAIT FOR</literal> from <application>psql</application>,
+   limiting wait time to 10000 milliseconds:
+
+<screen>
+WAIT FOR LSN '0/3F07A6B1' TIMEOUT 10000;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-------------
+ f
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Wait until the specified <acronym>LSN</acronym> is replayed:
+<screen>
+WAIT FOR LSN '0/3F07A611';
+LSN reached
+-------------
+ t
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Limit <acronym>LSN</acronym> wait time to 500000 milliseconds,
+   and then cancel the command if <acronym>LSN</acronym> was not reached:
+<screen>
+WAIT FOR LSN '0/3F0FF791' TIMEOUT 500000;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-------------
+ f
+(1 row)
+</screen>
+</para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>WAIT FOR</command> statement in the SQL
+   standard.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index cef09dd38b3..588e96aa143 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -215,6 +215,7 @@
    &update;
    &vacuum;
    &values;
+   &wait;
 
  </reference>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 977d448f502..c0e2c2141a8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
@@ -7376,6 +7377,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * If we replayed an LSN that someone was waiting for,
+				 * set latches in shared memory array to notify the waiter.
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWait())
+				{
+					WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3ce65..9b310926c12 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00000000000..01a90a12c8a
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,402 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.c
+ *	  Implements WAIT FOR, which allows waiting for events such as
+ *	  time passing or LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include <float.h>
+#include <math.h>
+#include "pgstat.h"
+#include "fmgr.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlogdefs.h"
+#include "access/xlog.h"
+#include "catalog/pg_type.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/backendid.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+#include "executor/spi.h"
+#include "utils/fmgrprotos.h"
+
+/* Add to / delete from shared memory array */
+static void AddEvent(XLogRecPtr lsn_to_wait);
+static void DeleteEvent(void);
+
+/* Shared memory structure */
+typedef struct
+{
+	int			backend_maxid;
+	XLogRecPtr	min_lsn;
+	slock_t		mutex;
+	XLogRecPtr	waited_lsn[FLEXIBLE_ARRAY_MEMBER];
+} WaitState;
+
+static volatile WaitState *state;
+
+/* Add the event of the current backend to the shared memory array */
+static void
+AddEvent(XLogRecPtr lsn_to_wait)
+{
+	SpinLockAcquire(&state->mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->waited_lsn[MyBackendId] = lsn_to_wait;
+
+	if (lsn_to_wait < state->min_lsn)
+		state->min_lsn = lsn_to_wait;
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Delete event of the current backend from the shared memory array.
+ *
+ * TODO: Consider state cleanup on backend failure.
+ */
+static void
+DeleteEvent(void)
+{
+	int			i;
+	XLogRecPtr	lsn_to_delete;
+
+	SpinLockAcquire(&state->mutex);
+	lsn_to_delete = state->waited_lsn[MyBackendId];
+	state->waited_lsn[MyBackendId] = InvalidXLogRecPtr;
+
+	/* If we need to choose the next min_lsn, update state->min_lsn */
+	if (state->min_lsn == lsn_to_delete)
+	{
+		state->min_lsn = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->waited_lsn[i] != InvalidXLogRecPtr &&
+				state->waited_lsn[i] < state->min_lsn)
+				state->min_lsn = state->waited_lsn[i];
+	}
+
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >= 2; i--)
+			if (state->waited_lsn[i] != InvalidXLogRecPtr)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Report amount of shared memory space needed for WaitState
+ */
+Size
+WaitShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(WaitState, waited_lsn);
+	size = add_size(size, mul_size(MaxBackends + 1, sizeof(XLogRecPtr)));
+	return size;
+}
+
+/* Init array of events in shared memory */
+void
+WaitShmemInit(void)
+{
+	bool		found;
+	uint32		i;
+
+	state = (WaitState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitShmemSize(),
+										  &found);
+	if (!found)
+	{
+		SpinLockInit(&state->mutex);
+
+		for (i = 0; i < (MaxBackends + 1); i++)
+			state->waited_lsn[i] = InvalidXLogRecPtr;
+
+		state->backend_maxid = 0;
+		state->min_lsn = PG_UINT64_MAX;
+	}
+}
+
+/* Set all latches in shared memory to signal that new LSN has been replayed */
+void
+WaitSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32		i;
+	int 		backend_maxid;
+	PGPROC	   *backend;
+
+	SpinLockAcquire(&state->mutex);
+	backend_maxid = state->backend_maxid;
+	SpinLockRelease(&state->mutex);
+
+	for (i = 2; i <= backend_maxid; i++)
+	{
+		backend = BackendIdGetProc(i);
+		if (state->waited_lsn[i] != 0)
+		{
+			if (backend && state->waited_lsn[i] <= cur_lsn)
+				SetLatch(&backend->procLatch);
+		}
+	}
+}
+
+/* Get the minimum LSN that any backend is currently waiting for */
+XLogRecPtr
+GetMinWait(void)
+{
+	return state->min_lsn;
+}
+
+/*
+ * On WAIT use MyLatch to wait till LSN is replayed,
+ * postmaster dies or timeout happens.
+ */
+int
+WaitUtility(XLogRecPtr lsn, const float8 secs, DestReceiver *dest)
+{
+	XLogRecPtr	cur_lsn = GetXLogReplayRecPtr(NULL);
+	int			latch_events;
+	float8		endtime;
+	TupOutputState *tstate;
+	TupleDesc	tupdesc;
+	char	   *value = "f";
+
+	latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH;
+
+	if (lsn != InvalidXLogRecPtr)
+		AddEvent(lsn);
+
+#define GetNowFloat()	((float8) GetCurrentTimestamp() / 1000000.0)
+	endtime = GetNowFloat() + secs;
+
+	for (;;)
+	{
+		int			rc;
+		float8		delay = 0;
+		long		delay_ms;
+
+		if (secs > 0)
+			delay = endtime - GetNowFloat();
+		else if (secs == 0) /* 10 second timeout to check for interrupts */
+			delay = 10;
+		else
+			delay = 1;
+
+		if (delay > 0.0)
+			delay_ms = (long) ceil(delay * 1000.0);
+		else
+			break;
+
+		/*
+		 * If an interrupt is pending, delete the current event from
+		 * the array before processing the interrupt.
+		 */
+		if (InterruptPending)
+		{
+			if (lsn != InvalidXLogRecPtr)
+				DeleteEvent();
+			ProcessInterrupts();
+		}
+
+		if (lsn != InvalidXLogRecPtr)
+			cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN has been replayed */
+		if (lsn != InvalidXLogRecPtr && lsn <= cur_lsn)
+			break;
+
+		/* If postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		/* A little hack similar to SnapshotResetXmin to work outside a snapshot */
+		MyPgXact->xmin = InvalidTransactionId;
+		rc = WaitLatch(MyLatch, latch_events, delay_ms,
+					   WAIT_EVENT_CLIENT_READ);
+
+		if (rc & WL_LATCH_SET)
+			ResetLatch(MyLatch);
+	}
+
+	DeleteEvent();
+
+	if (lsn != InvalidXLogRecPtr && lsn > cur_lsn)
+		elog(NOTICE,"LSN is not reached. Try to increase wait time.");
+	else
+		value = "t";
+
+	/* Need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+	/* Prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsMinimalTuple);
+
+	/* Send it */
+	do_text_output_oneline(tstate, value);
+	end_tup_output(tstate);
+	return strcmp(value,"t");
+}
+
+void
+WaitTimeUtility(float8 delay)
+{
+	int			latch_events;
+
+	if (delay <= 0)
+		return;
+
+	latch_events = WL_TIMEOUT | WL_POSTMASTER_DEATH;
+
+	MyPgXact->xmin = InvalidTransactionId;
+	WaitLatch(MyLatch, latch_events, (long) ceil(delay * 1000.0),
+			  WAIT_EVENT_CLIENT_READ);
+	ResetLatch(MyLatch);
+}
+
+/*
+ * Get the number of seconds left until the specified time.
+ */
+float8
+WaitTimeResolve(Const *time)
+{
+	int			ret;
+	float8		val;
+	Oid			types[] = { time->consttype };
+	Datum		values[] = { time->constvalue };
+	char		nulls[] = { " " };
+	Datum		result;
+	bool		isnull;
+
+	SPI_connect();
+
+	if (time->consttype == TIMEOID)
+		ret = SPI_execute_with_args("select extract (epoch from ($1 - now()::time))",
+									1, types, values, nulls, true, 0);
+	else if (time->consttype == TIMETZOID)
+		ret = SPI_execute_with_args("select extract (epoch from (timezone('UTC',$1)::time - timezone('UTC', now()::timetz)::time))",
+									1, types, values, nulls, true, 0);
+	else
+		ret = SPI_execute_with_args("select extract (epoch from ($1 - now()))",
+									1, types, values, nulls, true, 0);
+
+	Assert(ret >= 0);
+	result = SPI_getbinval(SPI_tuptable->vals[0],
+						   SPI_tuptable->tupdesc,
+						   1, &isnull);
+
+	Assert(!isnull);
+	val = DatumGetFloat8(result);
+
+	elog(INFO, "time: %f", val);
+
+	SPI_finish();
+	return val;
+}
+
+/* Implementation of WAIT FOR */
+int
+WaitMain(WaitStmt *stmt, DestReceiver *dest)
+{
+	ListCell   *events;
+	float8		delay = 0;
+	float8		final_delay = 0;
+	XLogRecPtr	lsn = InvalidXLogRecPtr;
+	XLogRecPtr	final_lsn = InvalidXLogRecPtr;
+	bool		has_lsn = false;
+	bool		wait_forever = true;
+	int			res = 1;
+
+	if (stmt->strategy == WAIT_FOR_ANY)
+	{
+		/* Prepare to find minimum LSN and delay */
+		final_delay = DBL_MAX;
+		final_lsn = PG_UINT64_MAX;
+	}
+
+	/* Extract options from the statement node tree */
+	foreach(events, stmt->events)
+	{
+		WaitStmt   *event = (WaitStmt *) lfirst(events);
+
+		/* LSN to wait for */
+		if (event->lsn)
+		{
+			has_lsn = true;
+			lsn = DatumGetLSN(
+						DirectFunctionCall1(pg_lsn_in,
+							CStringGetDatum(event->lsn)));
+
+			/*
+			 * When waiting for ALL, select max LSN to wait for.
+			 * When waiting for ANY, select min LSN to wait for.
+			 */
+			if ((stmt->strategy == WAIT_FOR_ALL && final_lsn <= lsn) ||
+				(stmt->strategy == WAIT_FOR_ANY && final_lsn > lsn))
+			{
+				final_lsn = lsn;
+			}
+		}
+
+		/* Time delay to wait for */
+		if (event->time || event->delay)
+		{
+			wait_forever = false;
+
+			if (event->delay)
+				delay = event->delay / 1000.0;
+
+			if (event->time)
+			{
+				Const *time = (Const *) event->time;
+				delay = WaitTimeResolve(time);
+			}
+
+			if (delay < 0)
+				delay = 0;
+
+			/*
+			 * When waiting for ALL, select max delay to wait for.
+			 * When waiting for ANY, select min delay to wait for.
+			 */
+			if ((stmt->strategy == WAIT_FOR_ALL && final_delay <= delay) ||
+				(stmt->strategy == WAIT_FOR_ANY && final_delay > delay))
+			{
+				final_delay = delay;
+			}
+		}
+	}
+
+	if (!has_lsn)
+	{
+		WaitTimeUtility(final_delay);
+		res = 0;
+	}
+	else
+		res = WaitUtility(final_lsn, wait_forever ? 0 : final_delay, dest);
+
+	return res;
+}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index eb168ffd6da..830bdbb6ab0 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2760,6 +2760,18 @@ _outDefElem(StringInfo str, const DefElem *node)
 	WRITE_LOCATION_FIELD(location);
 }
 
+static void
+_outWaitStmt(StringInfo str, const WaitStmt *node)
+{
+	WRITE_NODE_TYPE("WAITSTMT");
+
+	WRITE_STRING_FIELD(lsn);
+	WRITE_INT_FIELD(delay);
+	WRITE_NODE_FIELD(events);
+	WRITE_NODE_FIELD(time);
+	WRITE_ENUM_FIELD(strategy, WaitForStrategy);
+}
+
 static void
 _outTableLikeClause(StringInfo str, const TableLikeClause *node)
 {
@@ -4306,6 +4318,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PartitionRangeDatum:
 				_outPartitionRangeDatum(str, obj);
 				break;
+			case T_WaitStmt:
+				_outWaitStmt(str, obj);
+				break;
 
 			default:
 
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 6676412842b..08e2649b9df 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -78,6 +78,7 @@ static Query *transformCreateTableAsStmt(ParseState *pstate,
 										 CreateTableAsStmt *stmt);
 static Query *transformCallStmt(ParseState *pstate,
 								CallStmt *stmt);
+static void transformWaitForStmt(ParseState *pstate, WaitStmt *stmt);
 static void transformLockingClause(ParseState *pstate, Query *qry,
 								   LockingClause *lc, bool pushedDown);
 #ifdef RAW_EXPRESSION_COVERAGE_TEST
@@ -326,7 +327,20 @@ transformStmt(ParseState *pstate, Node *parseTree)
 			result = transformCallStmt(pstate,
 									   (CallStmt *) parseTree);
 			break;
-
+		case T_WaitStmt:
+			transformWaitForStmt(pstate, (WaitStmt *) parseTree);
+			result = makeNode(Query);
+			result->commandType = CMD_UTILITY;
+			result->utilityStmt = (Node *) parseTree;
+			break;
+		case T_TransactionStmt:
+			{
+				TransactionStmt *stmt = (TransactionStmt *) parseTree;
+				if ((stmt->kind == TRANS_STMT_BEGIN ||
+						stmt->kind == TRANS_STMT_START) && stmt->wait)
+					transformWaitForStmt(pstate, (WaitStmt *) stmt->wait);
+			}
+			/* no break here - we want to fall through to the default */
 		default:
 
 			/*
@@ -2981,6 +2995,23 @@ applyLockingClause(Query *qry, Index rtindex,
 	qry->rowMarks = lappend(qry->rowMarks, rc);
 }
 
+/*
+ * transformWaitForStmt -
+ *	transform the WAIT FOR clause of the BEGIN statement
+ *	transform the WAIT FOR statement (TODO: remove this line if we don't keep it)
+ */
+static void
+transformWaitForStmt(ParseState *pstate, WaitStmt *stmt)
+{
+	ListCell   *events;
+
+	foreach(events, stmt->events)
+	{
+		WaitStmt   *event = (WaitStmt *) lfirst(events);
+		event->time = transformExpr(pstate, event->time, EXPR_KIND_OTHER);
+	}
+}
+
 /*
  * Coverage testing for raw_expression_tree_walker().
  *
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index eb0bf12cd8b..4ce315f95d9 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -276,7 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		SecLabelStmt SelectStmt TransactionStmt TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
 		VariableResetStmt VariableSetStmt VariableShowStmt
-		ViewStmt CheckPointStmt CreateConversionStmt
+		ViewStmt WaitStmt CheckPointStmt CreateConversionStmt
 		DeallocateStmt PrepareStmt ExecuteStmt
 		DropOwnedStmt ReassignOwnedStmt
 		AlterTSConfigurationStmt AlterTSDictionaryStmt
@@ -487,7 +487,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <list>	row explicit_row implicit_row type_list array_expr_list
 %type <node>	case_expr case_arg when_clause case_default
 %type <list>	when_clause_list
-%type <ival>	sub_type opt_materialized
+%type <ival>	sub_type wait_strategy opt_materialized
 %type <value>	NumericOnly
 %type <list>	NumericOnly_list
 %type <alias>	alias_clause opt_alias_clause
@@ -591,6 +591,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <partboundspec> PartitionBoundSpec
 %type <list>		hash_partbound
 %type <defelt>		hash_partbound_elem
+%type <list>		wait_list
+%type <node>		WaitEvent wait_for
 
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
@@ -660,7 +662,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	LABEL LANGUAGE LARGE_P LAST_P LATERAL_P
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
-	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
+	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED LSN
 
 	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
@@ -690,7 +692,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P
 
 	TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
-	TIES TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
+	TIES TIME TIMEOUT TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
 	TREAT TRIGGER TRIM TRUE_P
 	TRUNCATE TRUSTED TYPE_P TYPES_P
 
@@ -700,7 +702,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAIT WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLNAMESPACES
 	XMLPARSE XMLPI XMLROOT XMLSERIALIZE XMLTABLE
@@ -954,6 +957,7 @@ stmt :
 			| VariableSetStmt
 			| VariableShowStmt
 			| ViewStmt
+			| WaitStmt
 			| /*EMPTY*/
 				{ $$ = NULL; }
 		;
@@ -9940,18 +9944,20 @@ TransactionStmt:
 					n->chain = $3;
 					$$ = (Node *)n;
 				}
-			| BEGIN_P opt_transaction transaction_mode_list_or_empty
+			| BEGIN_P opt_transaction transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_BEGIN;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
-			| START TRANSACTION transaction_mode_list_or_empty
+			| START TRANSACTION transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_START;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
 			| COMMIT opt_transaction opt_transaction_chain
@@ -14157,6 +14163,74 @@ xml_passing_mech:
 			| BY VALUE_P
 		;
 
+/*****************************************************************************
+ *
+ *		QUERY:
+ *				WAIT FOR <event> [, <event> ...]
+ *				event is one of:
+ *					LSN value
+ *					TIMEOUT delay
+ *					timestamp
+ *
+ *****************************************************************************/
+WaitStmt:
+			WAIT FOR wait_strategy wait_list
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->strategy = $3;
+					n->events = $4;
+					$$ = (Node *)n;
+				}
+			;
+wait_for:
+			WAIT FOR wait_strategy wait_list
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->strategy = $3;
+					n->events = $4;
+					$$ = (Node *)n;
+				}
+			| /* EMPTY */		{ $$ = NULL; };
+
+wait_strategy:
+			ALL					{ $$ = WAIT_FOR_ALL; }
+			| ANY				{ $$ = WAIT_FOR_ANY; }
+			| /* EMPTY */		{ $$ = WAIT_FOR_ALL; }
+		;
+
+wait_list:
+			WaitEvent					{ $$ = list_make1($1); }
+			| wait_list ',' WaitEvent	{ $$ = lappend($1, $3); }
+			| wait_list WaitEvent		{ $$ = lappend($1, $2); }
+		;
+
+WaitEvent:
+			LSN Sconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->lsn = $2;
+					n->delay = 0;
+					n->time = NULL;
+					$$ = (Node *)n;
+				}
+			| TIMEOUT Iconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->lsn = NULL;
+					n->delay = $2;
+					n->time = NULL;
+					$$ = (Node *)n;
+				}
+			| ConstDatetime Sconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->lsn = NULL;
+					n->delay = 0;
+					n->time = makeStringConstCast($2, @2, $1);
+					$$ = (Node *)n;
+				}
+			;
+
 
 /*
  * Aggregate decoration clauses
@@ -15301,6 +15375,7 @@ unreserved_keyword:
 			| LOCK_P
 			| LOCKED
 			| LOGGED
+			| LSN
 			| MAPPING
 			| MATCH
 			| MATERIALIZED
@@ -15423,6 +15498,7 @@ unreserved_keyword:
 			| TEMPORARY
 			| TEXT_P
 			| TIES
+			| TIMEOUT
 			| TRANSACTION
 			| TRANSFORM
 			| TRIGGER
@@ -15449,6 +15525,7 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAIT
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cde..bb8af349808 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -147,6 +148,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WaitShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -264,6 +266,11 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of waited-for LSNs in shared memory for WAIT
+	 */
+	WaitShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index b1f7f6e2d01..ad85f106040 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -15,6 +15,7 @@
  *-------------------------------------------------------------------------
  */
 #include "postgres.h"
+#include <float.h>
 
 #include "access/htup_details.h"
 #include "access/reloptions.h"
@@ -57,6 +58,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -70,6 +72,9 @@
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
+#include "executor/spi.h"
+#include "utils/fmgrprotos.h"
+#include "utils/pg_lsn.h"
 
 /* Hook for plugins to get control in ProcessUtility() */
 ProcessUtility_hook_type ProcessUtility_hook = NULL;
@@ -268,6 +273,7 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 		case T_LoadStmt:
 		case T_PrepareStmt:
 		case T_UnlistenStmt:
+		case T_WaitStmt:
 		case T_VariableSetStmt:
 			{
 				/*
@@ -591,6 +597,11 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 					case TRANS_STMT_START:
 						{
 							ListCell   *lc;
+							WaitStmt   *waitstmt = (WaitStmt *) stmt->wait;
+
+							/* If we had to WAIT FOR something but the wait failed */
+							if (stmt->wait && WaitMain(waitstmt, dest) != 0)
+								break;
 
 							BeginTransactionBlock();
 							foreach(lc, stmt->options)
@@ -1062,6 +1073,13 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				break;
 			}
 
+		case T_WaitStmt:
+			{
+				WaitStmt   *stmt = (WaitStmt *) parsetree;
+				WaitMain(stmt, dest);
+				break;
+			}
+
 		default:
 			/* All other statement types have event trigger support */
 			ProcessUtilitySlow(pstate, pstmt, queryString,
@@ -2718,6 +2736,10 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_NOTIFY;
 			break;
 
+		case T_WaitStmt:
+			tag = CMDTAG_WAIT;
+			break;
+
 		case T_ListenStmt:
 			tag = CMDTAG_LISTEN;
 			break;
@@ -3357,6 +3379,10 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
+		case T_WaitStmt:
+			lev = LOGSTMT_ALL;
+			break;
+
 		case T_ListenStmt:
 			lev = LOGSTMT_ALL;
 			break;
diff --git a/src/include/commands/wait.h b/src/include/commands/wait.h
new file mode 100644
index 00000000000..11115b9dab6
--- /dev/null
+++ b/src/include/commands/wait.h
@@ -0,0 +1,27 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.h
+ *	  prototypes for commands/wait.c
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * src/include/commands/wait.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAIT_H
+#define WAIT_H
+#include "postgres.h"
+#include "tcop/dest.h"
+
+extern int WaitUtility(XLogRecPtr lsn, const float8 delay, DestReceiver *dest);
+extern void WaitTimeUtility(float8 delay);
+extern Size WaitShmemSize(void);
+extern void WaitShmemInit(void);
+extern void WaitSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWait(void);
+extern float8 WaitTimeResolve(Const *time);
+extern int WaitMain(WaitStmt *stmt, DestReceiver *dest);
+
+#endif   /* WAIT_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 8a76afe8ccb..348de76c5f4 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -488,6 +488,7 @@ typedef enum NodeTag
 	T_DropReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
+	T_WaitStmt,
 	T_SQLCmd,
 
 	/*
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 77943f06376..971f343cf7a 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3055,6 +3055,7 @@ typedef struct TransactionStmt
 	char	   *savepoint_name; /* for savepoint commands */
 	char	   *gid;			/* for two-phase-commit related commands */
 	bool		chain;			/* AND CHAIN option */
+	Node		*wait;			/* WAIT clause: list of events to wait for */
 } TransactionStmt;
 
 /* ----------------------
@@ -3564,4 +3565,26 @@ typedef struct DropSubscriptionStmt
 	DropBehavior behavior;		/* RESTRICT or CASCADE behavior */
 } DropSubscriptionStmt;
 
+/* ----------------------
+ *		WAIT FOR Statement + WAIT FOR clause of BEGIN statement
+ *		TODO: if we only pick one, remove the other
+ * ----------------------
+ */
+
+typedef enum WaitForStrategy
+{
+	WAIT_FOR_ANY = 0,
+	WAIT_FOR_ALL
+} WaitForStrategy;
+
+typedef struct WaitStmt
+{
+	NodeTag		type;
+	WaitForStrategy strategy;
+	List	   *events;		/* used as a pointer to the next WAIT event */
+	char	   *lsn;		/* WAIT FOR LSN */
+	int			delay;		/* WAIT FOR TIMEOUT */
+	Node	   *time;		/* WAIT FOR TIMESTAMP or TIME */
+} WaitStmt;
+
 #endif							/* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index b1184c2d158..dd22e358b9a 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -243,6 +243,7 @@ PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
+PG_KEYWORD("lsn", LSN, UNRESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
 PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
@@ -404,6 +405,7 @@ PG_KEYWORD("text", TEXT_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("then", THEN, RESERVED_KEYWORD)
 PG_KEYWORD("ties", TIES, UNRESERVED_KEYWORD)
 PG_KEYWORD("time", TIME, COL_NAME_KEYWORD)
+PG_KEYWORD("timeout", TIMEOUT, UNRESERVED_KEYWORD)
 PG_KEYWORD("timestamp", TIMESTAMP, COL_NAME_KEYWORD)
 PG_KEYWORD("to", TO, RESERVED_KEYWORD)
 PG_KEYWORD("trailing", TRAILING, RESERVED_KEYWORD)
@@ -444,6 +446,7 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("wait", WAIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index 8ef0f55e748..430bb5c7171 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -216,3 +216,4 @@ PG_CMDTAG(CMDTAG_TRUNCATE_TABLE, "TRUNCATE TABLE", false, false, false)
 PG_CMDTAG(CMDTAG_UNLISTEN, "UNLISTEN", false, false, false)
 PG_CMDTAG(CMDTAG_UPDATE, "UPDATE", false, false, true)
 PG_CMDTAG(CMDTAG_VACUUM, "VACUUM", false, false, false)
+PG_CMDTAG(CMDTAG_WAIT, "WAIT FOR", false, false, false)
diff --git a/src/test/recovery/t/020_begin_wait.pl b/src/test/recovery/t/020_begin_wait.pl
new file mode 100644
index 00000000000..9bec57ee8f4
--- /dev/null
+++ b/src/test/recovery/t/020_begin_wait.pl
@@ -0,0 +1,145 @@
+# Checks WAIT FOR
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 8;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+
+# Add some content and take a backup
+$node_master->safe_psql('postgres',
+	"CREATE TABLE wait_test AS SELECT generate_series(1,10) AS a");
+my $backup_name = 'my_backup';
+$node_master->backup($backup_name);
+
+# Create a streaming standby with a 1 second delay from the backup
+my $node_standby = get_new_node('standby');
+my $delay        = 1;
+$node_standby->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby->append_conf('postgresql.conf', qq[
+	recovery_min_apply_delay = '${delay}s'
+]);
+$node_standby->start;
+
+
+
+# Make sure that WAIT FOR LSN works: add new content to master and memorize
+# master's new LSN, then wait for master's LSN on standby. Prove that WAIT is
+# able to set up an infinite waiting loop and exit it if given no wait timeout.
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(11, 20))");
+my $lsn1 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_standby->safe_psql('postgres', "BEGIN WAIT FOR LSN '$lsn1'");
+
+# Get the current LSN on standby and make sure it's the same as master's LSN
+my $lsn_standby = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+my $compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn1'::pg_lsn)");
+ok($compare_lsns eq 0, "standby reached the same LSN as master after WAIT");
+
+
+
+# Check that timeouts work on their own and let us wait for specified time (1s)
+my $current_time = $node_standby->safe_psql('postgres', "SELECT now()");
+my $one_second = 1000; # in milliseconds
+my $start_time = time();
+# While we're at it, also make sure that the syntax with commas works fine and
+# that by default we use WAIT FOR ALL strategy, which means waiting for max time
+$node_standby->safe_psql('postgres',
+	"WAIT FOR TIMEOUT $one_second, TIMESTAMP '$current_time'");
+my $time_waited = (time() - $start_time) * 1000; # convert to milliseconds
+ok($time_waited >= $one_second, "WAIT FOR TIMEOUT waits for enough time");
+
+# Now, check that timeouts work as expected when waiting for LSN
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(21, 30))");
+my $lsn2 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+my $reached_lsn = $node_standby->safe_psql('postgres',
+	"BEGIN WAIT FOR LSN '$lsn2' TIMEOUT 1");
+ok($reached_lsn eq "f", "WAIT doesn't reach LSN if given too little wait time");
+
+
+#===============================================================================
+# TODO: remove this test if we remove the standalone "WAIT FOR" command
+#===============================================================================
+# We need to check that WAIT works fine inside transactions. For that, let's
+# get two LSNs that will correspond to two different max values in our table.
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(31, 40))");
+my $lsn3 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(41, 50))");
+my $lsn4 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+# Before starting transaction, wait for LSN which ensures a max value of 40.
+# Inside the transaction, wait for LSN that ensures a max value of 50.
+# Due to ISOLATION LEVEL REPEATABLE READ, we should NOT see the new max value.
+my $standby_results = $node_standby->safe_psql(
+	'postgres', qq[
+	BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ WAIT FOR LSN '$lsn3';
+	SELECT max(a) FROM wait_test;
+	BEGIN WAIT FOR LSN '$lsn4';
+	SELECT pg_last_wal_replay_lsn();
+	SELECT max(a) FROM wait_test;
+	COMMIT;
+]);
+
+# Make sure that we indeed reach master's last LSN inside the transaction.
+# For that, check that calling pg_last_wal_replay_lsn returned that LSN.
+my $last_lsn_reached = $standby_results =~ /$lsn4/;
+ok($last_lsn_reached, "WAIT FOR LSN works inside a transaction");
+
+# Check that transaction doesn't break and show us the new max value after WAIT.
+# For that, make sure that the older max value is repeated twice in the results.
+my $count = () = $standby_results =~ /40/g;
+ok($count eq 2, "transaction isolation level doesn't get broken due to WAIT");
+
+
+
+# Get multiple LSNs for testing WAIT FOR ANY / WAIT FOR ALL
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(51, 60))");
+my $lsn5 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(61, 70000))");
+my $lsn6 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(61, 800000))");
+my $lsn7 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+# Check that WAIT FOR ANY works fine
+$node_standby->safe_psql('postgres',
+	"BEGIN WAIT FOR ANY LSN '$lsn5' LSN '$lsn6' LSN '$lsn7'");
+$lsn_standby = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+$compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn5'::pg_lsn)");
+ok($compare_lsns ge 0,
+	"WAIT FOR ANY makes us reach at least the minimum LSN from the list");
+$compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn7'::pg_lsn)");
+# TODO: Could this somehow fail due to the machine being very fast at applying LSN?
+ok($compare_lsns lt 0,
+	"WAIT FOR ANY didn't make us reach the maximum LSN from the list");
+
+# Check that WAIT FOR ALL works fine
+$node_standby->safe_psql('postgres',
+	"BEGIN WAIT FOR ALL LSN '$lsn5', LSN '$lsn6', LSN '$lsn7'");
+$lsn_standby = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+$compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn7'::pg_lsn)");
+ok($compare_lsns eq 0,
+	"WAIT FOR ALL makes us reach the maximum LSN from the list");
+
+
+
+$node_standby->stop;
+$node_master->stop;
diff --git a/src/test/recovery/t/021_wait.pl b/src/test/recovery/t/021_wait.pl
new file mode 100644
index 00000000000..c270e785740
--- /dev/null
+++ b/src/test/recovery/t/021_wait.pl
@@ -0,0 +1,144 @@
+# Checks WAIT FOR
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 8;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+
+# Add some content and take a backup
+$node_master->safe_psql('postgres',
+	"CREATE TABLE wait_test AS SELECT generate_series(1,10) AS a");
+my $backup_name = 'my_backup';
+$node_master->backup($backup_name);
+
+# Create a streaming standby with a 1 second delay from the backup
+my $node_standby = get_new_node('standby');
+my $delay        = 1;
+$node_standby->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby->append_conf('postgresql.conf', qq[
+	recovery_min_apply_delay = '${delay}s'
+]);
+$node_standby->start;
+
+
+
+# Make sure that WAIT FOR LSN works: add new content to master and memorize
+# master's new LSN, then wait for master's LSN on standby. Prove that WAIT is
+# able to set up an infinite waiting loop and exit it if given no wait timeout.
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(11, 20))");
+my $lsn1 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_standby->safe_psql('postgres', "WAIT FOR LSN '$lsn1'");
+
+# Get the current LSN on standby and make sure it's the same as master's LSN
+my $lsn_standby = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+my $compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn1'::pg_lsn)");
+ok($compare_lsns eq 0, "standby reached the same LSN as master after WAIT");
+
+
+
+# Check that timeouts work on their own and let us wait for specified time (1s)
+my $current_time = $node_standby->safe_psql('postgres', "SELECT now()");
+my $one_second = 1000; # in milliseconds
+my $start_time = time();
+# While we're at it, also make sure that the syntax with commas works fine and
+# that by default we use WAIT FOR ALL strategy, which means waiting for max time
+$node_standby->safe_psql('postgres',
+	"WAIT FOR TIMEOUT $one_second, TIMESTAMP '$current_time'");
+my $time_waited = (time() - $start_time) * 1000; # convert to milliseconds
+ok($time_waited >= $one_second, "WAIT FOR TIMEOUT waits for enough time");
+
+# Now, check that timeouts work as expected when waiting for LSN
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(21, 30))");
+my $lsn2 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+my $reached_lsn = $node_standby->safe_psql('postgres',
+	"WAIT FOR LSN '$lsn2' TIMEOUT 1");
+ok($reached_lsn eq "f", "WAIT doesn't reach LSN if given too little wait time");
+
+
+
+# We need to check that WAIT works fine inside transactions. For that, let's
+# get two LSNs that will correspond to two different max values in our table.
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(31, 40))");
+my $lsn3 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(41, 50))");
+my $lsn4 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+# Before starting transaction, wait for LSN which ensures a max value of 40.
+# Inside the transaction, wait for LSN that ensures a max value of 50.
+# Due to ISOLATION LEVEL REPEATABLE READ, we should NOT see the new max value.
+my $standby_results = $node_standby->safe_psql(
+	'postgres', qq[
+	WAIT FOR LSN '$lsn3';
+	BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
+	SELECT max(a) FROM wait_test;
+	WAIT FOR LSN '$lsn4';
+	SELECT pg_last_wal_replay_lsn();
+	SELECT max(a) FROM wait_test;
+	COMMIT;
+]);
+
+# Make sure that we indeed reach master's last LSN inside the transaction.
+# For that, check that calling pg_last_wal_replay_lsn returned that LSN.
+my $last_lsn_reached = $standby_results =~ /$lsn4/;
+ok($last_lsn_reached, "WAIT FOR LSN works inside a transaction");
+
+# Check that transaction doesn't break and show us the new max value after WAIT.
+# For that, make sure that the older max value is repeated twice in the results.
+my $count = () = $standby_results =~ /40/g;
+ok($count eq 2, "transaction isolation level doesn't get broken due to WAIT");
+
+
+
+# Get multiple LSNs for testing WAIT FOR ANY / WAIT FOR ALL
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(51, 60))");
+my $lsn5 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(61, 70000))");
+my $lsn6 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(61, 800000))");
+my $lsn7 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+# Check that WAIT FOR ANY works fine
+$node_standby->safe_psql('postgres',
+	"WAIT FOR ANY LSN '$lsn5' LSN '$lsn6' LSN '$lsn7'");
+$lsn_standby = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+$compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn5'::pg_lsn)");
+ok($compare_lsns ge 0,
+	"WAIT FOR ANY makes us reach at least the minimum LSN from the list");
+$compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn7'::pg_lsn)");
+# TODO: Could this somehow fail due to the machine being very fast at applying LSN?
+ok($compare_lsns lt 0,
+	"WAIT FOR ANY didn't make us reach the maximum LSN from the list");
+
+# Check that WAIT FOR ALL works fine
+$node_standby->safe_psql('postgres',
+	"WAIT FOR ALL LSN '$lsn5', LSN '$lsn6', LSN '$lsn7'");
+$lsn_standby = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+$compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn7'::pg_lsn)");
+ok($compare_lsns eq 0,
+	"WAIT FOR ALL makes us reach the maximum LSN from the list");
+
+
+
+$node_standby->stop;
+$node_master->stop;
#62Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Anna Akenteva (#61)
Re: [HACKERS] make async slave to wait for lsn to be replayed

Hi!

On Fri, Apr 3, 2020 at 9:51 PM Anna Akenteva <a.akenteva@postgrespro.ru> wrote:

In the patch that I reviewed, you could do things like:
WAIT FOR
LSN lsn0,
LSN lsn1 TIMEOUT time1,
LSN lsn2 TIMEOUT time2;
and such a statement was in practice equivalent to
WAIT FOR LSN(max(lsn0, lsn1, lsn2)) TIMEOUT (max(time1, time2))

As you can see, even though grammatically lsn1 is grouped with time1 and
lsn2 is grouped with time2, both timeouts that we specified are not
connected to their respective LSN-s, and instead they kinda act like
global timeouts. Therefore, I didn't see a point in keeping TIMEOUT
necessarily grammatically connected to LSN.

In the new syntax our statement would look like this:
WAIT FOR LSN lsn0, LSN lsn1, LSN lsn2, TIMEOUT time1, TIMEOUT time2;
TIMEOUT-s are not forced to be grouped with LSN-s anymore, which makes
it more clear that all specified TIMEOUTs will be global and will apply
to all LSN-s at once.
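
To make the semantics described above concrete, here is a minimal sketch in C of how a list of events might be folded into a single LSN and a single timeout. It is not taken from the patch: the names wait_event, wait_target and collapse_events are hypothetical, and LSNs are modelled as plain 64-bit integers. Under ALL the maximum of each kind wins; under ANY the minimum does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the parsed events; not the patch's node types. */
typedef enum { EV_LSN, EV_TIMEOUT } event_kind;
typedef enum { WAIT_ALL, WAIT_ANY } wait_strategy;

typedef struct
{
	event_kind	kind;
	uint64_t	value;			/* LSN position, or timeout in milliseconds */
} wait_event;

typedef struct
{
	int			has_lsn;		/* nonzero if at least one LSN was listed */
	int			has_timeout;	/* nonzero if at least one TIMEOUT was listed */
	uint64_t	lsn;
	uint64_t	timeout_ms;
} wait_target;

/*
 * Fold all events into one LSN and one timeout.
 * WAIT_ALL keeps the largest value of each kind, WAIT_ANY the smallest.
 * Timeouts are global: they are never paired with a particular LSN.
 */
wait_target
collapse_events(const wait_event *events, size_t n, wait_strategy strategy)
{
	wait_target t = {0, 0, 0, 0};
	size_t		i;

	assert(events != NULL || n == 0);

	for (i = 0; i < n; i++)
	{
		uint64_t	v = events[i].value;

		if (events[i].kind == EV_LSN)
		{
			if (!t.has_lsn ||
				(strategy == WAIT_ALL ? v > t.lsn : v < t.lsn))
				t.lsn = v;
			t.has_lsn = 1;
		}
		else
		{
			if (!t.has_timeout ||
				(strategy == WAIT_ALL ? v > t.timeout_ms : v < t.timeout_ms))
				t.timeout_ms = v;
			t.has_timeout = 1;
		}
	}
	return t;
}
```

With three LSNs and two timeouts, this yields the same single pair regardless of the order or grouping in which the events were written, which is the reason the grouping in the old syntax carried no meaning.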

The point of having TIMEOUT is still to let us limit the time of waiting
for LSNs. It's just that with the new syntax, we can also use TIMEOUT
without an LSN. You are right, such a case is equivalent to pg_sleep.
One way to avoid that is to prohibit waiting for TIMEOUT without
specifying an LSN. Do you think we should do that?

I think specifying multiple LSNs/TIMEOUTs is kind of ridiculous. We
can assume that the client application is smart enough to calculate
the minimum/maximum on its side. When multiple LSNs/TIMEOUTs are
specified, what should we wait for? Reaching all the LSNs? Reaching
any of the LSNs? Are timeouts independent from LSNs or tied to them?
So if we didn't manage to reach LSN1 in TIMEOUT1, then do we still wait
for LSN2 with TIMEOUT2 (or not)?
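
As an illustration of computing the maximum on the client side, here is a rough sketch in C. It assumes only the textual LSN format shown elsewhere in this thread (two hexadecimal halves separated by a slash, e.g. 0/303EC60); the helper names parse_lsn and later_lsn are made up for this example and are not part of libpq or the patch. The point is that LSNs must be compared numerically, not as strings.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse "hi/lo" (two hexadecimal 32-bit halves) into a 64-bit position.
 * Returns 1 on success, 0 on malformed input.
 * Note: strtoull is lenient (leading blanks, "0x" prefix); a stricter
 * parser may be preferable in real code.
 */
int
parse_lsn(const char *text, uint64_t *out)
{
	char	   *end;
	unsigned long long hi;
	unsigned long long lo;

	assert(text != NULL && out != NULL);

	errno = 0;
	hi = strtoull(text, &end, 16);
	if (end == text || *end != '/' || errno != 0 || hi > UINT32_MAX)
		return 0;

	text = end + 1;
	lo = strtoull(text, &end, 16);
	if (end == text || *end != '\0' || errno != 0 || lo > UINT32_MAX)
		return 0;

	*out = ((uint64_t) hi << 32) | (uint64_t) lo;
	return 1;
}

/*
 * Return whichever of the two LSN strings is the later position,
 * or NULL if either string cannot be parsed.
 */
const char *
later_lsn(const char *a, const char *b)
{
	uint64_t	pa;
	uint64_t	pb;

	if (!parse_lsn(a, &pa) || !parse_lsn(b, &pb))
		return NULL;
	return pa >= pb ? a : b;
}
```

For example, a plain string comparison would rank 0/A0 after 0/1000, while later_lsn picks 0/1000, since 0x1000 is the larger position.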

I think that for now we would be fine with a single LSN and a single
TIMEOUT. In the future we may add multiple LSNs/TIMEOUTs and/or support
for expressions as LSNs/TIMEOUTs if we figure out it's necessary.

I also think that coupling waiting for an LSN with the beginning of a
transaction is a good idea. A separate WAIT FOR LSN statement called in
the middle of a transaction looks problematic to me. Imagine we have RR
isolation and have already acquired the snapshot. Then our snapshot can
block applying the WAL records which we are waiting for. That would be
an implicit deadlock. It would be nice to avoid such deadlocks by
design.
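
One way to picture "by design" is a guard that refuses to wait once the session already holds a snapshot. The following is only a toy model in C under that assumption: session_state, wait_check and check_wait_allowed are hypothetical names, and nothing here reflects how the backend actually tracks snapshots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a session; hypothetical, not backend state. */
typedef struct
{
	bool		in_transaction_block;
	bool		snapshot_held;	/* e.g. REPEATABLE READ after its first query */
} session_state;

typedef enum
{
	WAIT_OK,
	WAIT_REJECT_SNAPSHOT_HELD
} wait_check;

/*
 * Waiting is only safe while no snapshot is held: a held snapshot may
 * itself delay replay of the WAL records the session is waiting for.
 */
wait_check
check_wait_allowed(const session_state *s)
{
	assert(s != NULL);

	if (s->in_transaction_block && s->snapshot_held)
		return WAIT_REJECT_SNAPSHOT_HELD;
	return WAIT_OK;
}
```

Attaching the wait to BEGIN gives the same guarantee without any runtime check, because at that point the transaction cannot have taken a snapshot yet.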

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#63Kartyshov Ivan
i.kartyshov@postgrespro.ru
In reply to: Anna Akenteva (#61)
1 attachment(s)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020-04-03 21:51, Anna Akenteva wrote:

I did some code cleanup and added tests - both for the standalone WAIT
FOR statement and for WAIT FOR as a part of BEGIN. The new patch is
attached.

I did more cleanup and optimized the code that waits for events on the
latch. I also rebased the patch.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

begin_waitfor_v5.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index ab71176cdf..6b4e45bc68 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -187,6 +187,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY update             SYSTEM "update.sgml">
 <!ENTITY vacuum             SYSTEM "vacuum.sgml">
 <!ENTITY values             SYSTEM "values.sgml">
+<!ENTITY wait               SYSTEM "wait.sgml">
 
 <!-- applications and utilities -->
 <!ENTITY clusterdb          SYSTEM "clusterdb.sgml">
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index c23bbfb4e7..cfee3c8f10 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,13 +21,21 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] <replaceable class="parameter">wait_for_event</replaceable>
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
     ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
     READ WRITE | READ ONLY
     [ NOT ] DEFERRABLE
+
+<phrase>where <replaceable class="parameter">wait_for_event</replaceable> is:</phrase>
+    WAIT FOR [ANY | ALL] <replaceable class="parameter">event</replaceable> [, ...]
+
+<phrase>and <replaceable class="parameter">event</replaceable> is one of:</phrase>
+    LSN lsn_value
+    TIMEOUT number_of_milliseconds
+    timestamp
 </synopsis>
  </refsynopsisdiv>
 
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index d6cd1d4177..0a2ea7e80b 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,13 +21,21 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] <replaceable class="parameter">wait_for_event</replaceable>
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
     ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
     READ WRITE | READ ONLY
     [ NOT ] DEFERRABLE
+
+<phrase>where <replaceable class="parameter">wait_for_event</replaceable> is:</phrase>
+    WAIT FOR [ANY | ALL] <replaceable class="parameter">event</replaceable> [, ...]
+
+<phrase>and <replaceable class="parameter">event</replaceable> is one of:</phrase>
+    LSN lsn_value
+    TIMEOUT number_of_milliseconds
+    timestamp
 </synopsis>
  </refsynopsisdiv>
 
diff --git a/doc/src/sgml/ref/wait.sgml b/doc/src/sgml/ref/wait.sgml
new file mode 100644
index 0000000000..26cae3ad85
--- /dev/null
+++ b/doc/src/sgml/ref/wait.sgml
@@ -0,0 +1,146 @@
+<!--
+doc/src/sgml/ref/wait.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-wait">
+ <indexterm zone="sql-wait">
+  <primary>WAIT FOR</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>WAIT FOR</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>WAIT FOR</refname>
+  <refpurpose>wait for the target <acronym>LSN</acronym> to be replayed or for specified time to pass</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+WAIT FOR [ANY | ALL] <replaceable class="parameter">event</replaceable> [, ...]
+
+<phrase>where <replaceable class="parameter">event</replaceable> is one of:</phrase>
+    LSN value
+    TIMEOUT number_of_milliseconds
+    timestamp
+
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>'
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>' TIMEOUT <replaceable class="parameter">wait_timeout</replaceable>
+WAIT FOR LSN '<replaceable class="parameter">lsn_number</replaceable>', TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR ALL LSN '<replaceable class="parameter">lsn_number</replaceable>' TIMEOUT <replaceable class="parameter">wait_timeout</replaceable>, TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+WAIT FOR ANY LSN '<replaceable class="parameter">lsn_number</replaceable>', TIMESTAMP <replaceable class="parameter">wait_time</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>WAIT FOR</command> provides a simple interprocess communication
+   mechanism to wait for timestamp, timeout or the target log sequence number
+   (<acronym>LSN</acronym>) on standby in <productname>PostgreSQL</productname>
+   databases with master-standby asynchronous replication. When run with the
+   <replaceable>LSN</replaceable> option, the <command>WAIT FOR</command>
+   command waits for the specified <acronym>LSN</acronym> to be replayed.
+   If no timestamp or timeout was specified, wait time is unlimited.
+   Waiting can be interrupted using <literal>Ctrl+C</literal>, or
+   by shutting down the <literal>postgres</literal> server.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">LSN</replaceable></term>
+    <listitem>
+     <para>
+      Specify the target log sequence number to wait for.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>TIMEOUT <replaceable class="parameter">wait_timeout</replaceable></term>
+    <listitem>
+     <para>
+      Limit the time interval to wait for the LSN to be replayed.
+      The specified <replaceable>wait_timeout</replaceable> must be an integer
+      and is measured in milliseconds.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>UNTIL TIMESTAMP <replaceable class="parameter">wait_time</replaceable></term>
+    <listitem>
+     <para>
+      Limit the time to wait for the LSN to be replayed.
+      The specified <replaceable>wait_time</replaceable> must be timestamp.
+     </para>
+    </listitem>
+   </varlistentry>
+
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Run <literal>WAIT FOR</literal> from <application>psql</application>,
+   limiting wait time to 10000 milliseconds:
+
+<screen>
+WAIT FOR LSN '0/3F07A6B1' TIMEOUT 10000;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-------------
+ f
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Wait until the specified <acronym>LSN</acronym> is replayed:
+<screen>
+WAIT FOR LSN '0/3F07A611';
+LSN reached
+-------------
+ t
+(1 row)
+</screen>
+  </para>
+
+  <para>
+   Limit <acronym>LSN</acronym> wait time to 500000 milliseconds,
+   and then cancel the command if <acronym>LSN</acronym> was not reached:
+<screen>
+WAIT FOR LSN '0/3F0FF791' TIMEOUT 500000;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-------------
+ f
+(1 row)
+</screen>
+</para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>WAIT FOR</command> statement in the SQL
+   standard.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index d25a77b13c..dbd40f66ed 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -215,6 +215,7 @@
    &update;
    &vacuum;
    &values;
+   &wait;
 
  </reference>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index abf954ba39..d2856c8894 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -7332,6 +7333,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * If we replayed an LSN that someone was waiting for,
+				 * set latches in shared memory array to notify the waiter.
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWait())
+				{
+					WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3ce6..9b310926c1 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 0000000000..ac9905e632
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,394 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.c
+ *	  Implements WAIT FOR, which allows waiting for events such as
+ *	  time passing or LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include <float.h>
+#include <math.h>
+#include "postgres.h"
+#include "pgstat.h"
+#include "fmgr.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlogdefs.h"
+#include "access/xlog.h"
+#include "catalog/pg_type.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/backendid.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+#include "executor/spi.h"
+#include "utils/fmgrprotos.h"
+
+/* Add to / delete from shared memory array */
+static void AddEvent(XLogRecPtr lsn_to_wait);
+static void DeleteEvent(void);
+
+/* Shared memory structure */
+typedef struct
+{
+	int			backend_maxid;
+	XLogRecPtr	min_lsn;
+	slock_t		mutex;
+	XLogRecPtr	waited_lsn[FLEXIBLE_ARRAY_MEMBER];
+} WaitState;
+
+static volatile WaitState *state;
+
+/* Add the event of the current backend to the shared memory array */
+static void
+AddEvent(XLogRecPtr lsn_to_wait)
+{
+	SpinLockAcquire(&state->mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->waited_lsn[MyBackendId] = lsn_to_wait;
+
+	if (lsn_to_wait < state->min_lsn)
+		state->min_lsn = lsn_to_wait;
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Delete event of the current backend from the shared memory array.
+ *
+ * TODO: Consider state cleanup on backend failure.
+ * Check:
+ * 1) normal|smart|fast|immediate stop
+ * 2) SIGKILL and SIGTERM
+ */
+static void
+DeleteEvent(void)
+{
+	int			i;
+	XLogRecPtr	lsn_to_delete = state->waited_lsn[MyBackendId];
+
+	state->waited_lsn[MyBackendId] = InvalidXLogRecPtr;
+
+	SpinLockAcquire(&state->mutex);
+
+	/* If we need to choose the next min_lsn, update state->min_lsn */
+	if (state->min_lsn == lsn_to_delete)
+	{
+		state->min_lsn = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->waited_lsn[i] != InvalidXLogRecPtr &&
+				state->waited_lsn[i] < state->min_lsn)
+				state->min_lsn = state->waited_lsn[i];
+	}
+
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >= 2; i--)
+			if (state->waited_lsn[i] != InvalidXLogRecPtr)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Report amount of shared memory space needed for WaitState
+ */
+Size
+WaitShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(WaitState, waited_lsn);
+	size = add_size(size, mul_size(MaxBackends + 1, sizeof(XLogRecPtr)));
+	return size;
+}
+
+/* Init array of events in shared memory */
+void
+WaitShmemInit(void)
+{
+	bool		found;
+	uint32		i;
+
+	state = (WaitState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitShmemSize(),
+										  &found);
+	if (!found)
+	{
+		SpinLockInit(&state->mutex);
+
+		for (i = 0; i < (MaxBackends + 1); i++)
+			state->waited_lsn[i] = InvalidXLogRecPtr;
+
+		state->backend_maxid = 0;
+		state->min_lsn = PG_UINT64_MAX;
+	}
+}
+
+/* Set all latches in shared memory to signal that new LSN has been replayed */
+void
+WaitSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32		i;
+	int 		backend_maxid;
+	PGPROC	   *backend;
+
+	SpinLockAcquire(&state->mutex);
+	backend_maxid = state->backend_maxid;
+	SpinLockRelease(&state->mutex);
+
+	for (i = 2; i <= backend_maxid; i++)
+	{
+		backend = BackendIdGetProc(i);
+		if (state->waited_lsn[i] != 0)
+		{
+			if (backend && state->waited_lsn[i] <= cur_lsn)
+				SetLatch(&backend->procLatch);
+		}
+	}
+}
+
+/* Get minimal LSN that will be next */
+XLogRecPtr
+GetMinWait(void)
+{
+	return state->min_lsn;
+}
+
+/*
+ * On WAIT use MyLatch to wait till LSN is replayed,
+ * postmaster dies or timeout happens.
+ */
+int
+WaitUtility(XLogRecPtr lsn, const float8 secs)
+{
+	XLogRecPtr	cur_lsn = GetXLogReplayRecPtr(NULL);
+	int			latch_events;
+	float8		endtime;
+	uint		res = 0;
+
+#define GetNowFloat()	((float8) GetCurrentTimestamp() / 1000000.0)
+	endtime = GetNowFloat() + secs;
+
+	latch_events = WL_TIMEOUT | WL_EXIT_ON_PM_DEATH;
+
+	if (lsn != InvalidXLogRecPtr)
+	{
+		/* Just check if we reached */
+		if (lsn < cur_lsn || secs < 0)
+			return (lsn < cur_lsn);
+
+		latch_events |= WL_LATCH_SET;
+		AddEvent(lsn);
+	}
+	else if (!secs)
+		return 1;
+
+	for (;;)
+	{
+		int			rc;
+		float8		delay = 0;
+		long		delay_ms;
+
+		if (secs > 0)
+			delay = endtime - GetNowFloat();
+		else if (secs == 0)
+			/*
+			 * If we wait forever, wake up every minute to check
+			 * for interrupts.
+			 */
+			delay = 60;
+
+		if (delay > 0.0)
+			delay_ms = (long) ceil(delay * 1000.0);
+		else
+			break;
+
+		/*
+		 * If received an interruption from CHECK_FOR_INTERRUPTS,
+		 * then delete the current event from array.
+		 */
+		if (InterruptPending)
+		{
+			if (lsn != InvalidXLogRecPtr)
+				DeleteEvent();
+			ProcessInterrupts();
+		}
+
+		/* If postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		rc = WaitLatch(MyLatch, latch_events, delay_ms,
+					   WAIT_EVENT_CLIENT_READ);
+
+		if (rc & WL_LATCH_SET)
+			ResetLatch(MyLatch);
+
+		if (lsn && rc & WL_LATCH_SET)
+			cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN has been replayed */
+		if (lsn && lsn <= cur_lsn)
+			break;
+	}
+
+	if (lsn != InvalidXLogRecPtr)
+		DeleteEvent();
+
+	if (lsn != InvalidXLogRecPtr && lsn > cur_lsn)
+		elog(NOTICE,"LSN is not reached. Try to increase wait time.");
+	else
+		res = 1;
+
+	return res;
+}
+
+/*
+ * Get the amount of seconds left till the specified time.
+ */
+float8
+WaitTimeResolve(Const *time)
+{
+	int			ret;
+	float8		val;
+	Oid			types[] = { time->consttype };
+	Datum		values[] = { time->constvalue };
+	char		nulls[] = { " " };
+	Datum		result;
+	bool		isnull;
+
+	SPI_connect();
+
+	if (time->consttype == 1083)
+		ret = SPI_execute_with_args("select extract (epoch from ($1 - now()::time))",
+									1, types, values, nulls, true, 0);
+	else if (time->consttype == 1266)
+		ret = SPI_execute_with_args("select extract (epoch from (timezone('UTC',$1)::time - timezone('UTC', now()::timetz)::time))",
+									1, types, values, nulls, true, 0);
+	else
+		ret = SPI_execute_with_args("select extract (epoch from ($1 - now()))",
+									1, types, values, nulls, true, 0);
+
+	Assert(ret >= 0);
+	result = SPI_getbinval(SPI_tuptable->vals[0],
+						   SPI_tuptable->tupdesc,
+						   1, &isnull);
+
+	Assert(!isnull);
+	val = DatumGetFloat8(result);
+
+	elog(INFO, "time: %f", val);
+
+	SPI_finish();
+	return val;
+}
+
+/* Implementation of WAIT FOR */
+int
+WaitMain(WaitStmt *stmt, DestReceiver *dest)
+{
+	ListCell   *events;
+	TupleDesc	tupdesc;
+	TupOutputState *tstate;
+	float8		delay = 0;
+	float8		final_delay = 0;
+	XLogRecPtr	lsn = InvalidXLogRecPtr;
+	XLogRecPtr	final_lsn = InvalidXLogRecPtr;
+	bool		has_lsn = false;
+	bool		wait_forever = true;
+	int			res = 0;
+
+	if (stmt->strategy == WAIT_FOR_ANY)
+	{
+		/* Prepare to find minimum LSN and delay */
+		final_delay = DBL_MAX;
+		final_lsn = PG_UINT64_MAX;
+	}
+
+	/* Extract options from the statement node tree */
+	foreach(events, stmt->events)
+	{
+		WaitStmt   *event = (WaitStmt *) lfirst(events);
+
+		/* LSN to wait for */
+		if (event->lsn)
+		{
+			has_lsn = true;
+			lsn = DatumGetLSN(
+						DirectFunctionCall1(pg_lsn_in,
+							CStringGetDatum(event->lsn)));
+
+			/*
+			 * When waiting for ALL, select max LSN to wait for.
+			 * When waiting for ANY, select min LSN to wait for.
+			 */
+			if ((stmt->strategy == WAIT_FOR_ALL && final_lsn <= lsn) ||
+				(stmt->strategy == WAIT_FOR_ANY && final_lsn > lsn))
+			{
+				final_lsn = lsn;
+			}
+		}
+
+		/* Time delay to wait for */
+		if (event->time || event->delay)
+		{
+			wait_forever = false;
+
+			if (event->delay)
+				delay = event->delay / 1000.0;
+
+			if (event->time)
+			{
+				Const *time = (Const *) event->time;
+				delay = WaitTimeResolve(time);
+			}
+
+			/*
+			 * When waiting for ALL, select max delay to wait for.
+			 * When waiting for ANY, select min delay to wait for.
+			 */
+			if ((stmt->strategy == WAIT_FOR_ALL && final_delay <= delay) ||
+				(stmt->strategy == WAIT_FOR_ANY && final_delay > delay))
+			{
+				final_delay = delay;
+			}
+		}
+	}
+	if (wait_forever)
+		final_delay = 0;
+	if (!has_lsn)
+		final_lsn = InvalidXLogRecPtr;
+
+	res = WaitUtility(final_lsn, final_delay);
+
+	/* Need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+	/* Prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsMinimalTuple);
+
+	/* Send it */
+	do_text_output_oneline(tstate, res?"t":"f");
+	end_tup_output(tstate);
+	return res;
+}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f4aecdcbcd..4c2c2735e2 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2762,6 +2762,18 @@ _outDefElem(StringInfo str, const DefElem *node)
 	WRITE_LOCATION_FIELD(location);
 }
 
+static void
+_outWaitStmt(StringInfo str, const WaitStmt *node)
+{
+	WRITE_NODE_TYPE("WAITSTMT");
+
+	WRITE_STRING_FIELD(lsn);
+	WRITE_INT_FIELD(delay);
+	WRITE_NODE_FIELD(events);
+	WRITE_NODE_FIELD(time);
+	WRITE_ENUM_FIELD(strategy, WaitForStrategy);
+}
+
 static void
 _outTableLikeClause(StringInfo str, const TableLikeClause *node)
 {
@@ -4308,6 +4320,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PartitionRangeDatum:
 				_outPartitionRangeDatum(str, obj);
 				break;
+			case T_WaitStmt:
+				_outWaitStmt(str, obj);
+				break;
 
 			default:
 
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 6676412842..08e2649b9d 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -78,6 +78,7 @@ static Query *transformCreateTableAsStmt(ParseState *pstate,
 										 CreateTableAsStmt *stmt);
 static Query *transformCallStmt(ParseState *pstate,
 								CallStmt *stmt);
+static void transformWaitForStmt(ParseState *pstate, WaitStmt *stmt);
 static void transformLockingClause(ParseState *pstate, Query *qry,
 								   LockingClause *lc, bool pushedDown);
 #ifdef RAW_EXPRESSION_COVERAGE_TEST
@@ -326,7 +327,20 @@ transformStmt(ParseState *pstate, Node *parseTree)
 			result = transformCallStmt(pstate,
 									   (CallStmt *) parseTree);
 			break;
-
+		case T_WaitStmt:
+			transformWaitForStmt(pstate, (WaitStmt *) parseTree);
+			result = makeNode(Query);
+			result->commandType = CMD_UTILITY;
+			result->utilityStmt = (Node *) parseTree;
+			break;
+		case T_TransactionStmt:
+			{
+				TransactionStmt *stmt = (TransactionStmt *) parseTree;
+				if ((stmt->kind == TRANS_STMT_BEGIN ||
+						stmt->kind == TRANS_STMT_START) && stmt->wait)
+					transformWaitForStmt(pstate, (WaitStmt *) stmt->wait);
+			}
+			/* no break here - we want to fall through to the default */
 		default:
 
 			/*
@@ -2981,6 +2995,23 @@ applyLockingClause(Query *qry, Index rtindex,
 	qry->rowMarks = lappend(qry->rowMarks, rc);
 }
 
+/*
+ * transformWaitForStmt -
+ *	transform the WAIT FOR clause of the BEGIN statement
+ *	transform the WAIT FOR statement (TODO: remove this line if we don't keep it)
+ */
+static void
+transformWaitForStmt(ParseState *pstate, WaitStmt *stmt)
+{
+	ListCell   *events;
+
+	foreach(events, stmt->events)
+	{
+		WaitStmt   *event = (WaitStmt *) lfirst(events);
+		event->time = transformExpr(pstate, event->time, EXPR_KIND_OTHER);
+	}
+}
+
 /*
  * Coverage testing for raw_expression_tree_walker().
  *
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3449c26bd1..917bda51ac 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -276,7 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		SecLabelStmt SelectStmt TransactionStmt TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
 		VariableResetStmt VariableSetStmt VariableShowStmt
-		ViewStmt CheckPointStmt CreateConversionStmt
+		ViewStmt WaitStmt CheckPointStmt CreateConversionStmt
 		DeallocateStmt PrepareStmt ExecuteStmt
 		DropOwnedStmt ReassignOwnedStmt
 		AlterTSConfigurationStmt AlterTSDictionaryStmt
@@ -488,7 +488,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <list>	row explicit_row implicit_row type_list array_expr_list
 %type <node>	case_expr case_arg when_clause case_default
 %type <list>	when_clause_list
-%type <ival>	sub_type opt_materialized
+%type <ival>	sub_type wait_strategy opt_materialized
 %type <value>	NumericOnly
 %type <list>	NumericOnly_list
 %type <alias>	alias_clause opt_alias_clause
@@ -592,6 +592,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <partboundspec> PartitionBoundSpec
 %type <list>		hash_partbound
 %type <defelt>		hash_partbound_elem
+%type <list>		wait_list
+%type <node>		WaitEvent wait_for
 
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
@@ -661,7 +663,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	LABEL LANGUAGE LARGE_P LAST_P LATERAL_P
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
-	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
+	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED LSN
 
 	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
@@ -692,7 +694,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P
 
 	TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
-	TIES TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
+	TIES TIME TIMEOUT TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
 	TREAT TRIGGER TRIM TRUE_P
 	TRUNCATE TRUSTED TYPE_P TYPES_P
 
@@ -702,7 +704,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAIT WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLNAMESPACES
 	XMLPARSE XMLPI XMLROOT XMLSERIALIZE XMLTABLE
@@ -956,6 +959,7 @@ stmt :
 			| VariableSetStmt
 			| VariableShowStmt
 			| ViewStmt
+			| WaitStmt
 			| /*EMPTY*/
 				{ $$ = NULL; }
 		;
@@ -9946,18 +9950,20 @@ TransactionStmt:
 					n->chain = $3;
 					$$ = (Node *)n;
 				}
-			| BEGIN_P opt_transaction transaction_mode_list_or_empty
+			| BEGIN_P opt_transaction transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_BEGIN;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
-			| START TRANSACTION transaction_mode_list_or_empty
+			| START TRANSACTION transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_START;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
 			| COMMIT opt_transaction opt_transaction_chain
@@ -14187,6 +14193,74 @@ xml_passing_mech:
 			| BY VALUE_P
 		;
 
+/*****************************************************************************
+ *
+ *		QUERY:
+ *				WAIT FOR <event> [, <event> ...]
+ *				event is one of:
+ *					LSN value
+ *					TIMEOUT delay
+ *					timestamp
+ *
+ *****************************************************************************/
+WaitStmt:
+			WAIT FOR wait_strategy wait_list
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->strategy = $3;
+					n->events = $4;
+					$$ = (Node *)n;
+				}
+			;
+wait_for:
+			WAIT FOR wait_strategy wait_list
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->strategy = $3;
+					n->events = $4;
+					$$ = (Node *)n;
+				}
+			| /* EMPTY */		{ $$ = NULL; };
+
+wait_strategy:
+			ALL					{ $$ = WAIT_FOR_ALL; }
+			| ANY				{ $$ = WAIT_FOR_ANY; }
+			| /* EMPTY */		{ $$ = WAIT_FOR_ALL; }
+		;
+
+wait_list:
+			WaitEvent					{ $$ = list_make1($1); }
+			| wait_list ',' WaitEvent	{ $$ = lappend($1, $3); }
+			| wait_list WaitEvent		{ $$ = lappend($1, $2); }
+		;
+
+WaitEvent:
+			LSN Sconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->lsn = $2;
+					n->delay = 0;
+					n->time = NULL;
+					$$ = (Node *)n;
+				}
+			| TIMEOUT Iconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->lsn = NULL;
+					n->delay = $2;
+					n->time = NULL;
+					$$ = (Node *)n;
+				}
+			| ConstDatetime Sconst
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->lsn = NULL;
+					n->delay = 0;
+					n->time = makeStringConstCast($2, @2, $1);
+					$$ = (Node *)n;
+				}
+			;
+
 
 /*
  * Aggregate decoration clauses
@@ -15338,6 +15412,7 @@ unreserved_keyword:
 			| LOCK_P
 			| LOCKED
 			| LOGGED
+			| LSN
 			| MAPPING
 			| MATCH
 			| MATERIALIZED
@@ -15465,6 +15540,7 @@ unreserved_keyword:
 			| TEMPORARY
 			| TEXT_P
 			| TIES
+			| TIMEOUT
 			| TRANSACTION
 			| TRANSFORM
 			| TRIGGER
@@ -15491,6 +15567,7 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAIT
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cd..bb8af34980 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -147,6 +148,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WaitShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -264,6 +266,11 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of Latches in shared memory for WAIT
+	 */
+	WaitShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index b1f7f6e2d0..700a13f05b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -15,6 +15,7 @@
  *-------------------------------------------------------------------------
  */
 #include "postgres.h"
+#include <float.h>
 
 #include "access/htup_details.h"
 #include "access/reloptions.h"
@@ -57,6 +58,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -70,6 +72,9 @@
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
+#include "executor/spi.h"
+#include "utils/fmgrprotos.h"
+#include "utils/pg_lsn.h"
 
 /* Hook for plugins to get control in ProcessUtility() */
 ProcessUtility_hook_type ProcessUtility_hook = NULL;
@@ -268,6 +273,7 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 		case T_LoadStmt:
 		case T_PrepareStmt:
 		case T_UnlistenStmt:
+		case T_WaitStmt:
 		case T_VariableSetStmt:
 			{
 				/*
@@ -591,6 +597,11 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 					case TRANS_STMT_START:
 						{
 							ListCell   *lc;
+							WaitStmt   *waitstmt = (WaitStmt *) stmt->wait;
+
+							/* If needed to WAIT FOR something but failed */
+							if (stmt->wait && WaitMain(waitstmt, dest) == 0)
+								break;
 
 							BeginTransactionBlock();
 							foreach(lc, stmt->options)
@@ -1062,6 +1073,13 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				break;
 			}
 
+		case T_WaitStmt:
+			{
+				WaitStmt   *stmt = (WaitStmt *) parsetree;
+				WaitMain(stmt, dest);
+				break;
+			}
+
 		default:
 			/* All other statement types have event trigger support */
 			ProcessUtilitySlow(pstate, pstmt, queryString,
@@ -2718,6 +2736,10 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_NOTIFY;
 			break;
 
+		case T_WaitStmt:
+			tag = CMDTAG_WAIT;
+			break;
+
 		case T_ListenStmt:
 			tag = CMDTAG_LISTEN;
 			break;
@@ -3357,6 +3379,10 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
+		case T_WaitStmt:
+			lev = LOGSTMT_ALL;
+			break;
+
 		case T_ListenStmt:
 			lev = LOGSTMT_ALL;
 			break;
diff --git a/src/include/commands/wait.h b/src/include/commands/wait.h
new file mode 100644
index 0000000000..0270160d44
--- /dev/null
+++ b/src/include/commands/wait.h
@@ -0,0 +1,26 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.h
+ *	  prototypes for commands/wait.c
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2016, Regents of PostgresPRO
+ *
+ * src/include/commands/wait.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAIT_H
+#define WAIT_H
+#include "postgres.h"
+#include "tcop/dest.h"
+
+extern int WaitUtility(XLogRecPtr lsn, const float8 delay);
+extern Size WaitShmemSize(void);
+extern void WaitShmemInit(void);
+extern void WaitSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWait(void);
+extern float8 WaitTimeResolve(Const *time);
+extern int WaitMain(WaitStmt *stmt, DestReceiver *dest);
+
+#endif   /* WAIT_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 8a76afe8cc..348de76c5f 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -488,6 +488,7 @@ typedef enum NodeTag
 	T_DropReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
+	T_WaitStmt,
 	T_SQLCmd,
 
 	/*
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index cd6f1be643..bfed954d21 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3058,6 +3058,7 @@ typedef struct TransactionStmt
 	char	   *savepoint_name; /* for savepoint commands */
 	char	   *gid;			/* for two-phase-commit related commands */
 	bool		chain;			/* AND CHAIN option */
+	Node		*wait;			/* WAIT clause: list of events to wait for */
 } TransactionStmt;
 
 /* ----------------------
@@ -3567,4 +3568,26 @@ typedef struct DropSubscriptionStmt
 	DropBehavior behavior;		/* RESTRICT or CASCADE behavior */
 } DropSubscriptionStmt;
 
+/* ----------------------
+ *		WAIT FOR Statement + WAIT FOR clause of BEGIN statement
+ *		TODO: if we only pick one, remove the other
+ * ----------------------
+ */
+
+typedef enum WaitForStrategy
+{
+	WAIT_FOR_ANY = 0,
+	WAIT_FOR_ALL
+} WaitForStrategy;
+
+typedef struct WaitStmt
+{
+	NodeTag		type;
+	WaitForStrategy strategy;
+	List	   *events;		/* used as a pointer to the next WAIT event */
+	char	   *lsn;		/* WAIT FOR LSN */
+	int			delay;		/* WAIT FOR TIMEOUT */
+	Node	   *time;		/* WAIT FOR TIMESTAMP or TIME */
+} WaitStmt;
+
 #endif							/* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 08f22ce211..6e1848fe4c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -243,6 +243,7 @@ PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
+PG_KEYWORD("lsn", LSN, UNRESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
 PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
@@ -410,6 +411,7 @@ PG_KEYWORD("text", TEXT_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("then", THEN, RESERVED_KEYWORD)
 PG_KEYWORD("ties", TIES, UNRESERVED_KEYWORD)
 PG_KEYWORD("time", TIME, COL_NAME_KEYWORD)
+PG_KEYWORD("timeout", TIMEOUT, UNRESERVED_KEYWORD)
 PG_KEYWORD("timestamp", TIMESTAMP, COL_NAME_KEYWORD)
 PG_KEYWORD("to", TO, RESERVED_KEYWORD)
 PG_KEYWORD("trailing", TRAILING, RESERVED_KEYWORD)
@@ -450,6 +452,7 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("wait", WAIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index 8ef0f55e74..430bb5c717 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -216,3 +216,4 @@ PG_CMDTAG(CMDTAG_TRUNCATE_TABLE, "TRUNCATE TABLE", false, false, false)
 PG_CMDTAG(CMDTAG_UNLISTEN, "UNLISTEN", false, false, false)
 PG_CMDTAG(CMDTAG_UPDATE, "UPDATE", false, false, true)
 PG_CMDTAG(CMDTAG_VACUUM, "VACUUM", false, false, false)
+PG_CMDTAG(CMDTAG_WAIT, "WAIT FOR", false, false, false)
diff --git a/src/test/recovery/t/020_begin_wait.pl b/src/test/recovery/t/020_begin_wait.pl
new file mode 100644
index 0000000000..9bec57ee8f
--- /dev/null
+++ b/src/test/recovery/t/020_begin_wait.pl
@@ -0,0 +1,145 @@
+# Checks WAIT FOR
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 8;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+
+# Add some content and take a backup
+$node_master->safe_psql('postgres',
+	"CREATE TABLE wait_test AS SELECT generate_series(1,10) AS a");
+my $backup_name = 'my_backup';
+$node_master->backup($backup_name);
+
+# Create a streaming standby with a 1 second delay from the backup
+my $node_standby = get_new_node('standby');
+my $delay        = 1;
+$node_standby->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby->append_conf('postgresql.conf', qq[
+	recovery_min_apply_delay = '${delay}s'
+]);
+$node_standby->start;
+
+
+
+# Make sure that WAIT FOR LSN works: add new content to master and memorize
+# master's new LSN, then wait for master's LSN on standby. Prove that WAIT is
+# able to set up an infinite waiting loop and exit it if given no wait timeout.
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(11, 20))");
+my $lsn1 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_standby->safe_psql('postgres', "BEGIN WAIT FOR LSN '$lsn1'");
+
+# Get the current LSN on standby and make sure it's the same as master's LSN
+my $lsn_standby = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+my $compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn1'::pg_lsn)");
+ok($compare_lsns eq 0, "standby reached the same LSN as master after WAIT");
+
+
+
+# Check that timeouts work on their own and let us wait for specified time (1s)
+my $current_time = $node_standby->safe_psql('postgres', "SELECT now()");
+my $one_second = 1000; # in milliseconds
+my $start_time = time();
+# While we're at it, also make sure that the syntax with commas works fine and
+# that by default we use WAIT FOR ALL strategy, which means waiting for max time
+$node_standby->safe_psql('postgres',
+	"WAIT FOR TIMEOUT $one_second, TIMESTAMP '$current_time'");
+my $time_waited = (time() - $start_time) * 1000; # convert to milliseconds
+ok($time_waited >= $one_second, "WAIT FOR TIMEOUT waits for enough time");
+
+# Now, check that timeouts work as expected when waiting for LSN
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(21, 30))");
+my $lsn2 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+my $reached_lsn = $node_standby->safe_psql('postgres',
+	"BEGIN WAIT FOR LSN '$lsn2' TIMEOUT 1");
+ok($reached_lsn eq "f", "WAIT doesn't reach LSN if given too little wait time");
+
+
+#===============================================================================
+# TODO: remove this test if we remove the standalone "WAIT FOR" command
+#===============================================================================
+# We need to check that WAIT works fine inside transactions. For that, let's
+# get two LSNs that will correspond to two different max values in our table.
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(31, 40))");
+my $lsn3 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(41, 50))");
+my $lsn4 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+# Before starting transaction, wait for LSN which ensures a max value of 40.
+# Inside the transaction, wait for LSN that ensures a max value of 50.
+# Due to ISOLATION LEVEL REPEATABLE READ, we should NOT see the new max value.
+my $standby_results = $node_standby->safe_psql(
+	'postgres', qq[
+	BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ WAIT FOR LSN '$lsn3';
+	SELECT max(a) FROM wait_test;
+	BEGIN WAIT FOR LSN '$lsn4';
+	SELECT pg_last_wal_replay_lsn();
+	SELECT max(a) FROM wait_test;
+	COMMIT;
+]);
+
+# Make sure that we indeed reach master's last LSN inside the transaction.
+# For that, check that calling pg_last_wal_replay_lsn returned that LSN.
+my $last_lsn_reached = $standby_results =~ /$lsn4/;
+ok($last_lsn_reached, "WAIT FOR LSN works inside a transaction");
+
+# Check that transaction doesn't break and show us the new max value after WAIT.
+# For that, make sure that the older max value is repeated twice in the results.
+my $count = () = $standby_results =~ /40/g;
+ok($count eq 2, "transaction isolation level doesn't get broken due to WAIT");
+
+
+
+# Get multiple LSNs for testing WAIT FOR ANY / WAIT FOR ALL
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(51, 60))");
+my $lsn5 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(61, 70000))");
+my $lsn6 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(61, 800000))");
+my $lsn7 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+# Check that WAIT FOR ANY works fine
+$node_standby->safe_psql('postgres',
+	"BEGIN WAIT FOR ANY LSN '$lsn5' LSN '$lsn6' LSN '$lsn7'");
+$lsn_standby = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+$compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn5'::pg_lsn)");
+ok($compare_lsns ge 0,
+	"WAIT FOR ANY makes us reach at least the minimum LSN from the list");
+$compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn7'::pg_lsn)");
+# TODO: Could this somehow fail due to the machine being very fast at applying LSN?
+ok($compare_lsns lt 0,
+	"WAIT FOR ANY didn't make us reach the maximum LSN from the list");
+
+# Check that WAIT FOR ALL works fine
+$node_standby->safe_psql('postgres',
+	"BEGIN WAIT FOR ALL LSN '$lsn5', LSN '$lsn6', LSN '$lsn7'");
+$lsn_standby = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+$compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn7'::pg_lsn)");
+ok($compare_lsns eq 0,
+	"WAIT FOR ALL makes us reach the maximum LSN from the list");
+
+
+
+$node_standby->stop;
+$node_master->stop;
diff --git a/src/test/recovery/t/021_wait.pl b/src/test/recovery/t/021_wait.pl
new file mode 100644
index 0000000000..c270e78574
--- /dev/null
+++ b/src/test/recovery/t/021_wait.pl
@@ -0,0 +1,144 @@
+# Checks WAIT FOR
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 8;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+
+# Add some content and take a backup
+$node_master->safe_psql('postgres',
+	"CREATE TABLE wait_test AS SELECT generate_series(1,10) AS a");
+my $backup_name = 'my_backup';
+$node_master->backup($backup_name);
+
+# Create a streaming standby with a 1 second delay from the backup
+my $node_standby = get_new_node('standby');
+my $delay        = 1;
+$node_standby->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby->append_conf('postgresql.conf', qq[
+	recovery_min_apply_delay = '${delay}s'
+]);
+$node_standby->start;
+
+
+
+# Make sure that WAIT FOR LSN works: add new content to master and memorize
+# master's new LSN, then wait for master's LSN on standby. Prove that WAIT is
+# able to set up an infinite waiting loop and exit it if given no wait timeout.
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(11, 20))");
+my $lsn1 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_standby->safe_psql('postgres', "WAIT FOR LSN '$lsn1'");
+
+# Get the current LSN on standby and make sure it's the same as master's LSN
+my $lsn_standby = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+my $compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn1'::pg_lsn)");
+ok($compare_lsns eq 0, "standby reached the same LSN as master after WAIT");
+
+
+
+# Check that timeouts work on their own and let us wait for specified time (1s)
+my $current_time = $node_standby->safe_psql('postgres', "SELECT now()");
+my $one_second = 1000; # in milliseconds
+my $start_time = time();
+# While we're at it, also make sure that the syntax with commas works fine and
+# that by default we use WAIT FOR ALL strategy, which means waiting for max time
+$node_standby->safe_psql('postgres',
+	"WAIT FOR TIMEOUT $one_second, TIMESTAMP '$current_time'");
+my $time_waited = (time() - $start_time) * 1000; # convert to milliseconds
+ok($time_waited >= $one_second, "WAIT FOR TIMEOUT waits for enough time");
+
+# Now, check that timeouts work as expected when waiting for LSN
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(21, 30))");
+my $lsn2 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+my $reached_lsn = $node_standby->safe_psql('postgres',
+	"WAIT FOR LSN '$lsn2' TIMEOUT 1");
+ok($reached_lsn eq "f", "WAIT doesn't reach LSN if given too little wait time");
+
+
+
+# We need to check that WAIT works fine inside transactions. For that, let's
+# get two LSNs that will correspond to two different max values in our table.
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(31, 40))");
+my $lsn3 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(41, 50))");
+my $lsn4 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+# Before starting transaction, wait for LSN which ensures a max value of 40.
+# Inside the transaction, wait for LSN that ensures a max value of 50.
+# Due to ISOLATION LEVEL REPEATABLE READ, we should NOT see the new max value.
+my $standby_results = $node_standby->safe_psql(
+	'postgres', qq[
+	WAIT FOR LSN '$lsn3';
+	BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
+	SELECT max(a) FROM wait_test;
+	WAIT FOR LSN '$lsn4';
+	SELECT pg_last_wal_replay_lsn();
+	SELECT max(a) FROM wait_test;
+	COMMIT;
+]);
+
+# Make sure that we indeed reach master's last LSN inside the transaction.
+# For that, check that calling pg_last_wal_replay_lsn returned that LSN.
+my $last_lsn_reached = $standby_results =~ /$lsn4/;
+ok($last_lsn_reached, "WAIT FOR LSN works inside a transaction");
+
+# Check that transaction doesn't break and show us the new max value after WAIT.
+# For that, make sure that the older max value is repeated twice in the results.
+my $count = () = $standby_results =~ /40/g;
+ok($count eq 2, "transaction isolation level doesn't get broken due to WAIT");
+
+
+
+# Get multiple LSNs for testing WAIT FOR ANY / WAIT FOR ALL
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(51, 60))");
+my $lsn5 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(61, 70000))");
+my $lsn6 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(61, 800000))");
+my $lsn7 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+# Check that WAIT FOR ANY works fine
+$node_standby->safe_psql('postgres',
+	"WAIT FOR ANY LSN '$lsn5' LSN '$lsn6' LSN '$lsn7'");
+$lsn_standby = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+$compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn5'::pg_lsn)");
+ok($compare_lsns ge 0,
+	"WAIT FOR ANY makes us reach at least the minimum LSN from the list");
+$compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn7'::pg_lsn)");
+# TODO: Could this somehow fail due to the machine being very fast at applying LSN?
+ok($compare_lsns lt 0,
+	"WAIT FOR ANY didn't make us reach the maximum LSN from the list");
+
+# Check that WAIT FOR ALL works fine
+$node_standby->safe_psql('postgres',
+	"WAIT FOR ALL LSN '$lsn5', LSN '$lsn6', LSN '$lsn7'");
+$lsn_standby = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+$compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn7'::pg_lsn)");
+ok($compare_lsns eq 0,
+	"WAIT FOR ALL makes us reach the maximum LSN from the list");
+
+
+
+$node_standby->stop;
+$node_master->stop;
#64Kartyshov Ivan
i.kartyshov@postgrespro.ru
In reply to: Alexander Korotkov (#62)
1 attachment(s)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020-04-04 03:14, Alexander Korotkov wrote:

I think that for now we would be fine with a single LSN and a single
TIMEOUT. In the future we may add multiple LSNs/TIMEOUTs and/or support for
expressions as LSNs/TIMEOUTs if we figure out it's necessary.

I also think that coupling the wait for an LSN with the beginning of a
transaction is a good idea. A separate WAIT FOR LSN statement called in
the middle of a transaction looks problematic to me. Imagine we have RR
isolation and have already acquired the snapshot. Then our snapshot can
block applying the WAL records we are waiting for. That would be an
implicit deadlock. It would be nice to avoid such deadlocks by
design.

OK, here is a new version of the patch with a single LSN and a single TIMEOUT.

Synopsis
==========
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] [ WAIT FOR LSN
'lsn' [ TIMEOUT value ] ]
and
START TRANSACTION [ transaction_mode [, ...] ] [ WAIT FOR LSN 'lsn' [
TIMEOUT value ] ]
where lsn is the result of pg_current_wal_flush_lsn() on the master,
and value is an unsigned integer time interval in milliseconds.
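
For readers unfamiliar with the text form of 'lsn': a minimal sketch, assuming
the usual notation of two hexadecimal numbers separated by a slash (upper 32
bits, then lower 32 bits), as printed by pg_current_wal_flush_lsn(). The helper
names parse_lsn and format_lsn are hypothetical and are not part of the patch;
the server has its own, stricter parser.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): convert "hi/lo" hexadecimal text
 * into a 64-bit position.  sscanf's %X is more lenient than the server-side
 * parser (it tolerates leading whitespace, a sign, or a 0x prefix), so treat
 * this as an illustration only.
 */
bool
parse_lsn(const char *text, uint64_t *out)
{
	unsigned int hi;
	unsigned int lo;
	char		trailing;

	/* Exactly two fields must match; a third means trailing garbage. */
	if (sscanf(text, "%8X/%8X%c", &hi, &lo, &trailing) != 2)
		return false;

	*out = ((uint64_t) hi << 32) | lo;
	return true;
}

/* Hypothetical helper: print a 64-bit position back in "hi/lo" form. */
int
format_lsn(uint64_t lsn, char *buf, size_t buflen)
{
	return snprintf(buf, buflen, "%X/%X",
					(unsigned int) (lsn >> 32), (unsigned int) lsn);
}
```

Once both positions are plain 64-bit integers, "has this LSN been replayed
yet" reduces to an ordinary integer comparison.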

Description
==========
BEGIN/START ... WAIT FOR pauses the start of the transaction until the
specified LSN has been replayed. If the LSN is not reached before the
timeout expires, the transaction is not opened.
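
The behaviour described above can be sketched roughly as follows. This is a
simulation under stated assumptions, not the patch's code: the real
implementation sleeps on a latch instead of stepping a fake clock, and the
names wait_for_lsn, replay_pos_fn, linear_replay and linear_pos are
hypothetical. Treating a timeout of 0 as "no limit" is also an assumption of
this sketch.

```c
#include <stdint.h>

typedef uint64_t lsn_t;

typedef enum
{
	WAIT_REACHED,				/* target replayed: the transaction may start */
	WAIT_TIMED_OUT				/* timeout expired first: do not start it */
} wait_result;

/* Hypothetical callback: replay position after elapsed_ms milliseconds. */
typedef lsn_t (*replay_pos_fn) (int64_t elapsed_ms, void *arg);

/* Hypothetical model of a standby replaying WAL at a constant rate. */
typedef struct linear_replay
{
	lsn_t		start;
	lsn_t		per_ms;
} linear_replay;

lsn_t
linear_pos(int64_t elapsed_ms, void *arg)
{
	const linear_replay *r = arg;

	return r->start + r->per_ms * (lsn_t) elapsed_ms;
}

/*
 * Wait until the replay position reaches target or timeout_ms elapses.
 * With timeout_ms == 0 the loop only ends once the target is reached.
 */
wait_result
wait_for_lsn(lsn_t target, int64_t timeout_ms, int64_t step_ms,
			 replay_pos_fn pos, void *arg)
{
	int64_t		elapsed = 0;

	for (;;)
	{
		/* Check the position first, so an already-replayed LSN never waits. */
		if (pos(elapsed, arg) >= target)
			return WAIT_REACHED;

		if (timeout_ms > 0 && elapsed >= timeout_ms)
			return WAIT_TIMED_OUT;

		/* Real code would block here until woken up or timed out. */
		elapsed += step_ms;
	}
}
```

A caller would open the transaction only on WAIT_REACHED, which mirrors the
rule that the transaction is not started if the LSN is not reached in time.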

How to use it
==========
BEGIN ... WAIT FOR LSN 'lsn' [ TIMEOUT timeout_in_ms ];

# Before starting the transaction, wait until LSN 0/84832E8 is replayed.
# The wait time is unlimited here because no timeout was specified.
BEGIN WAIT FOR LSN '0/84832E8';

# Before starting the transaction, wait until LSN 0/8DFFB88 is replayed.
# Limit the wait time to 10 seconds; if the LSN is not reached by then,
# don't start the transaction.
START TRANSACTION WAIT FOR LSN '0/8DFFB88' TIMEOUT 10000;

# Same kind of wait as in the previous example, but with transaction
# isolation level REPEATABLE READ.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ WAIT FOR LSN
'0/815C0F1' TIMEOUT 10000;

Note: WAIT FOR is also released on postmaster death or on an interrupt
if either occurs before the LSN is reached or the timeout expires.

Testing the implementation
======================
The implementation was tested with src/test/recovery/t/020_begin_wait.pl

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

begin_waitfor_v6.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index c23bbfb4e71..7a71769cd8f 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] [ WAIT FOR LSN <replaceable class="parameter">lsn_value</replaceable> [TIMEOUT <replaceable class="parameter">number_of_milliseconds</replaceable> ] ]
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
@@ -63,6 +63,16 @@ BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</
    <xref linkend="sql-set-transaction"/>
    was executed.
   </para>
+
+  <para>
+   The <literal>WAIT FOR</literal> clause allows waiting for the target log
+   sequence number (<acronym>LSN</acronym>) to be replayed on standby before
+   starting the transaction in <productname>PostgreSQL</productname> databases
+   with master-standby asynchronous replication. Wait time can be limited by
+   specifying a timeout, which is measured in milliseconds and must be an
+   integer. Waiting can be interrupted using <literal>Ctrl+C</literal>, or by
+   shutting down the <literal>postgres</literal> server.
+  </para>
  </refsect1>
 
  <refsect1>
@@ -146,6 +156,10 @@ BEGIN;
    different purpose in embedded SQL. You are advised to be careful
    about the transaction semantics when porting database applications.
   </para>
+
+  <para>
+   There is no <command>WAIT FOR</command> clause in the SQL standard.
+  </para>
  </refsect1>
 
  <refsect1>
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index d6cd1d41779..f5412c2ca7b 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] [ WAIT FOR LSN <replaceable class="parameter">lsn_value</replaceable> [TIMEOUT <replaceable class="parameter">number_of_milliseconds</replaceable> ] ]
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
@@ -40,6 +40,16 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable
    characteristics, as if <xref linkend="sql-set-transaction"/> was executed. This is the same
    as the <xref linkend="sql-begin"/> command.
   </para>
+
+  <para>
+   The <literal>WAIT FOR</literal> clause allows waiting for the target log
+   sequence number (<acronym>LSN</acronym>) to be replayed on standby before
+   starting the transaction in <productname>PostgreSQL</productname> databases
+   with master-standby asynchronous replication. Wait time can be limited by
+   specifying a timeout, which is measured in milliseconds and must be an
+   integer. Waiting can be interrupted using <literal>Ctrl+C</literal>, or by
+   shutting down the <literal>postgres</literal> server.
+  </para>
  </refsect1>
 
  <refsect1>
@@ -78,6 +88,10 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable
    omitted.
   </para>
 
+  <para>
+   There is no <command>WAIT FOR</command> clause in the SQL standard.
+  </para>
+
   <para>
    See also the compatibility section of <xref linkend="sql-set-transaction"/>.
   </para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index abf954ba392..d2856c88943 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -7332,6 +7333,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * If we replayed an LSN that someone was waiting for,
+				 * set latches in shared memory array to notify the waiter.
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWait())
+				{
+					WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3ce65..9b310926c12 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00000000000..66245d43882
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,279 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.c
+ *	  Implements WAIT FOR, which allows waiting for events such as
+ *	  time passing or LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include <float.h>
+#include <math.h>
+#include "postgres.h"
+#include "pgstat.h"
+#include "fmgr.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlogdefs.h"
+#include "access/xlog.h"
+#include "catalog/pg_type.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/backendid.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+#include "executor/spi.h"
+#include "utils/fmgrprotos.h"
+
+/* Add to / delete from shared memory array */
+static void AddEvent(XLogRecPtr lsn_to_wait);
+static void DeleteEvent(void);
+
+/* Shared memory structure */
+typedef struct
+{
+	int			backend_maxid;
+	XLogRecPtr	min_lsn;
+	slock_t		mutex;
+	XLogRecPtr	waited_lsn[FLEXIBLE_ARRAY_MEMBER];
+} WaitState;
+
+static volatile WaitState *state;
+
+/* Add the event of the current backend to the shared memory array */
+static void
+AddEvent(XLogRecPtr lsn_to_wait)
+{
+	SpinLockAcquire(&state->mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->waited_lsn[MyBackendId] = lsn_to_wait;
+
+	if (lsn_to_wait < state->min_lsn)
+		state->min_lsn = lsn_to_wait;
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Delete event of the current backend from the shared memory array.
+ *
+ * TODO: Consider state cleanup on backend failure.
+ * Check:
+ * 1) normal|smart|fast|immediate stop
+ * 2) SIGKILL and SIGTERM
+ */
+static void
+DeleteEvent(void)
+{
+	int			i;
+	XLogRecPtr	lsn_to_delete = state->waited_lsn[MyBackendId];
+
+	state->waited_lsn[MyBackendId] = InvalidXLogRecPtr;
+
+	SpinLockAcquire(&state->mutex);
+
+	/* If we need to choose the next min_lsn, update state->min_lsn */
+	if (state->min_lsn == lsn_to_delete)
+	{
+		state->min_lsn = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->waited_lsn[i] != InvalidXLogRecPtr &&
+				state->waited_lsn[i] < state->min_lsn)
+				state->min_lsn = state->waited_lsn[i];
+	}
+
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >= 2; i--)
+			if (state->waited_lsn[i] != InvalidXLogRecPtr)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Report amount of shared memory space needed for WaitState
+ */
+Size
+WaitShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(WaitState, waited_lsn);
+	size = add_size(size, mul_size(MaxBackends + 1, sizeof(XLogRecPtr)));
+	return size;
+}
+
+/* Init array of events in shared memory */
+void
+WaitShmemInit(void)
+{
+	bool		found;
+	uint32		i;
+
+	state = (WaitState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitShmemSize(),
+										  &found);
+	if (!found)
+	{
+		SpinLockInit(&state->mutex);
+
+		for (i = 0; i < (MaxBackends + 1); i++)
+			state->waited_lsn[i] = InvalidXLogRecPtr;
+
+		state->backend_maxid = 0;
+		state->min_lsn = PG_UINT64_MAX;
+	}
+}
+
+/* Set all latches in shared memory to signal that new LSN has been replayed */
+void
+WaitSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32		i;
+	int 		backend_maxid;
+	PGPROC	   *backend;
+
+	SpinLockAcquire(&state->mutex);
+	backend_maxid = state->backend_maxid;
+	SpinLockRelease(&state->mutex);
+
+	for (i = 2; i <= backend_maxid; i++)
+	{
+		backend = BackendIdGetProc(i);
+		if (state->waited_lsn[i] != 0)
+		{
+			if (backend && state->waited_lsn[i] <= cur_lsn)
+				SetLatch(&backend->procLatch);
+		}
+	}
+}
+
+/* Get minimal LSN that will be next */
+XLogRecPtr
+GetMinWait(void)
+{
+	return state->min_lsn;
+}
+
+/*
+ * On WAIT use MyLatch to wait till LSN is replayed,
+ * postmaster dies or timeout happens.
+ */
+int
+WaitUtility(XLogRecPtr lsn, const float8 secs)
+{
+	XLogRecPtr	cur_lsn = GetXLogReplayRecPtr(NULL);
+	int			latch_events;
+	float8		endtime;
+	uint		res = 0;
+
+#define GetNowFloat()	((float8) GetCurrentTimestamp() / 1000000.0)
+	endtime = GetNowFloat() + secs;
+
+	latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH;
+
+	/* Return at once if the target LSN has already been replayed */
+	if (lsn <= cur_lsn)
+		return (lsn <= cur_lsn);
+
+	AddEvent(lsn);
+
+	for (;;)
+	{
+		int			rc;
+		float8		delay = 0;
+		long		delay_ms;
+
+		if (secs > 0)
+			delay = endtime - GetNowFloat();
+		else if (secs == 0)
+			/*
+			 * If we wait forever, use a 1 minute timeout so that we
+			 * wake up to check for interrupts.
+			 */
+			delay = 60;
+
+		if (delay > 0.0)
+			delay_ms = (long) ceil(delay * 1000.0);
+		else
+			break;
+
+		/*
+		 * If an interrupt is pending, delete the current event from the
+		 * array before processing the interrupt.
+		 */
+		if (InterruptPending)
+		{
+			DeleteEvent();
+			ProcessInterrupts();
+		}
+
+		/* If postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		rc = WaitLatch(MyLatch, latch_events, delay_ms,
+					   WAIT_EVENT_CLIENT_READ);
+
+		ResetLatch(MyLatch);
+
+		if (rc & WL_LATCH_SET)
+			cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN has been replayed */
+		if (lsn <= cur_lsn)
+			break;
+	}
+
+	DeleteEvent();
+
+	if (lsn > cur_lsn)
+		elog(NOTICE,"LSN is not reached. Try to increase wait time.");
+	else
+		res = 1;
+
+	return res;
+}
+
+/* Implementation of WAIT FOR */
+int
+WaitMain(WaitStmt *stmt, DestReceiver *dest)
+{
+	TupleDesc	tupdesc;
+	TupOutputState *tstate;
+	int			res = 0;
+
+	res = WaitUtility(DatumGetLSN(
+				  DirectFunctionCall1(pg_lsn_in,CStringGetDatum(stmt->lsn))),
+				  (float8)stmt->delay/1000);
+
+	/* Need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+	/* Prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsMinimalTuple);
+
+	/* Send it */
+	do_text_output_oneline(tstate, res?"t":"f");
+	end_tup_output(tstate);
+	return res;
+}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f4aecdcbcda..b3160eb204a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2762,6 +2762,28 @@ _outDefElem(StringInfo str, const DefElem *node)
 	WRITE_LOCATION_FIELD(location);
 }
 
+static void
+_outWaitStmt(StringInfo str, const WaitStmt *node)
+{
+	WRITE_NODE_TYPE("WAITSTMT");
+
+	WRITE_STRING_FIELD(lsn);
+	WRITE_UINT_FIELD(delay);
+}
+
+static void
+_outTransactionStmt(StringInfo str, const TransactionStmt *node)
+{
+	WRITE_NODE_TYPE("TRANSACTIONSTMT");
+
+	WRITE_STRING_FIELD(savepoint_name);
+	WRITE_STRING_FIELD(gid);
+	WRITE_NODE_FIELD(options);
+	WRITE_BOOL_FIELD(chain);
+	WRITE_ENUM_FIELD(kind, TransactionStmtKind);
+	WRITE_NODE_FIELD(wait);
+}
+
 static void
 _outTableLikeClause(StringInfo str, const TableLikeClause *node)
 {
@@ -4308,6 +4330,12 @@ outNode(StringInfo str, const void *obj)
 			case T_PartitionRangeDatum:
 				_outPartitionRangeDatum(str, obj);
 				break;
+			case T_WaitStmt:
+				_outWaitStmt(str, obj);
+				break;
+			case T_TransactionStmt:
+				_outTransactionStmt(str, obj);
+				break;
 
 			default:
 
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 6676412842b..8eba11c6221 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -326,7 +326,6 @@ transformStmt(ParseState *pstate, Node *parseTree)
 			result = transformCallStmt(pstate,
 									   (CallStmt *) parseTree);
 			break;
-
 		default:
 
 			/*
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3449c26bd11..156878d8f73 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -592,6 +592,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <partboundspec> PartitionBoundSpec
 %type <list>		hash_partbound
 %type <defelt>		hash_partbound_elem
+%type <ival>		wait_time
+%type <node>		wait_for
 
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
@@ -661,7 +663,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	LABEL LANGUAGE LARGE_P LAST_P LATERAL_P
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
-	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
+	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED LSN
 
 	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
@@ -692,7 +694,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P
 
 	TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
-	TIES TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
+	TIES TIME TIMEOUT TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
 	TREAT TRIGGER TRIM TRUE_P
 	TRUNCATE TRUSTED TYPE_P TYPES_P
 
@@ -702,7 +704,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAIT WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLNAMESPACES
 	XMLPARSE XMLPI XMLROOT XMLSERIALIZE XMLTABLE
@@ -9946,18 +9949,20 @@ TransactionStmt:
 					n->chain = $3;
 					$$ = (Node *)n;
 				}
-			| BEGIN_P opt_transaction transaction_mode_list_or_empty
+			| BEGIN_P opt_transaction transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_BEGIN;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
-			| START TRANSACTION transaction_mode_list_or_empty
+			| START TRANSACTION transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_START;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
 			| COMMIT opt_transaction opt_transaction_chain
@@ -14187,6 +14192,31 @@ xml_passing_mech:
 			| BY VALUE_P
 		;
 
+/*****************************************************************************
+ *
+ *		SUBQUERY:
+ *				WAIT FOR <event>
+ *				event is one of:
+ *					LSN value TIMEOUT delay
+ *					TIMEOUT delay
+ *
+ *****************************************************************************/
+wait_for:
+			WAIT FOR LSN Sconst wait_time
+				{
+					WaitStmt *n = makeNode(WaitStmt);
+					n->lsn = $4;
+					n->delay = $5;
+					$$ = (Node *)n;
+				}
+			| /* EMPTY */		{ $$ = NULL; }
+		;
+
+wait_time:
+			TIMEOUT Iconst		{ $$ = $2; }
+			| /* EMPTY */		{ $$ = 0; }
+		;
+
 
 /*
  * Aggregate decoration clauses
@@ -15338,6 +15368,7 @@ unreserved_keyword:
 			| LOCK_P
 			| LOCKED
 			| LOGGED
+			| LSN
 			| MAPPING
 			| MATCH
 			| MATERIALIZED
@@ -15465,6 +15496,7 @@ unreserved_keyword:
 			| TEMPORARY
 			| TEXT_P
 			| TIES
+			| TIMEOUT
 			| TRANSACTION
 			| TRANSFORM
 			| TRIGGER
@@ -15491,6 +15523,7 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAIT
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cde..bb8af349808 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -147,6 +148,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WaitShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -264,6 +266,11 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of Latches in shared memory for WAIT
+	 */
+	WaitShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index b1f7f6e2d01..7345513de55 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -15,6 +15,7 @@
  *-------------------------------------------------------------------------
  */
 #include "postgres.h"
+#include <float.h>
 
 #include "access/htup_details.h"
 #include "access/reloptions.h"
@@ -57,6 +58,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -70,6 +72,9 @@
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
+#include "executor/spi.h"
+#include "utils/fmgrprotos.h"
+#include "utils/pg_lsn.h"
 
 /* Hook for plugins to get control in ProcessUtility() */
 ProcessUtility_hook_type ProcessUtility_hook = NULL;
@@ -591,6 +596,11 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 					case TRANS_STMT_START:
 						{
 							ListCell   *lc;
+							WaitStmt   *waitstmt = (WaitStmt *) stmt->wait;
+
+							/* If needed to WAIT FOR something but failed */
+							if (stmt->wait && WaitMain(waitstmt, dest) == 0)
+								break;
 
 							BeginTransactionBlock();
 							foreach(lc, stmt->options)
@@ -2718,6 +2728,10 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_NOTIFY;
 			break;
 
+		case T_WaitStmt:
+			tag = CMDTAG_WAIT;
+			break;
+
 		case T_ListenStmt:
 			tag = CMDTAG_LISTEN;
 			break;
@@ -3357,6 +3371,10 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
+		case T_WaitStmt:
+			lev = LOGSTMT_ALL;
+			break;
+
 		case T_ListenStmt:
 			lev = LOGSTMT_ALL;
 			break;
diff --git a/src/include/commands/wait.h b/src/include/commands/wait.h
new file mode 100644
index 00000000000..e612eb6138c
--- /dev/null
+++ b/src/include/commands/wait.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.h
+ *	  prototypes for commands/wait.c
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group       
+ * Portions Copyright (c) 2020, Regents of PostgresPro 
+ *
+ * src/include/commands/wait.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAIT_H
+#define WAIT_H
+#include "postgres.h"
+#include "tcop/dest.h"
+
+extern int WaitUtility(XLogRecPtr lsn, const float8 delay);
+extern Size WaitShmemSize(void);
+extern void WaitShmemInit(void);
+extern void WaitSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWait(void);
+extern int WaitMain(WaitStmt *stmt, DestReceiver *dest);
+
+#endif   /* WAIT_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 8a76afe8ccb..348de76c5f4 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -488,6 +488,7 @@ typedef enum NodeTag
 	T_DropReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
+	T_WaitStmt,
 	T_SQLCmd,
 
 	/*
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index cd6f1be6435..306b2ef4df9 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3058,6 +3058,7 @@ typedef struct TransactionStmt
 	char	   *savepoint_name; /* for savepoint commands */
 	char	   *gid;			/* for two-phase-commit related commands */
 	bool		chain;			/* AND CHAIN option */
+	Node		*wait;			/* WAIT clause: list of events to wait for */
 } TransactionStmt;
 
 /* ----------------------
@@ -3567,4 +3568,17 @@ typedef struct DropSubscriptionStmt
 	DropBehavior behavior;		/* RESTRICT or CASCADE behavior */
 } DropSubscriptionStmt;
 
+/* ----------------------
+ *		WAIT FOR Statement + WAIT FOR clause of BEGIN statement
+ *		TODO: if we only pick one, remove the other
+ * ----------------------
+ */
+
+typedef struct WaitStmt
+{
+	NodeTag		type;
+	char	   *lsn;		/* WAIT FOR LSN */
+	uint			delay;		/* TIMEOUT in milliseconds, 0 if not given */
+} WaitStmt;
+
 #endif							/* PARSENODES_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 08f22ce211d..6e1848fe4cc 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -243,6 +243,7 @@ PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
+PG_KEYWORD("lsn", LSN, UNRESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
 PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
@@ -410,6 +411,7 @@ PG_KEYWORD("text", TEXT_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("then", THEN, RESERVED_KEYWORD)
 PG_KEYWORD("ties", TIES, UNRESERVED_KEYWORD)
 PG_KEYWORD("time", TIME, COL_NAME_KEYWORD)
+PG_KEYWORD("timeout", TIMEOUT, UNRESERVED_KEYWORD)
 PG_KEYWORD("timestamp", TIMESTAMP, COL_NAME_KEYWORD)
 PG_KEYWORD("to", TO, RESERVED_KEYWORD)
 PG_KEYWORD("trailing", TRAILING, RESERVED_KEYWORD)
@@ -450,6 +452,7 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("wait", WAIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index 8ef0f55e748..430bb5c7171 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -216,3 +216,4 @@ PG_CMDTAG(CMDTAG_TRUNCATE_TABLE, "TRUNCATE TABLE", false, false, false)
 PG_CMDTAG(CMDTAG_UNLISTEN, "UNLISTEN", false, false, false)
 PG_CMDTAG(CMDTAG_UPDATE, "UPDATE", false, false, true)
 PG_CMDTAG(CMDTAG_VACUUM, "VACUUM", false, false, false)
+PG_CMDTAG(CMDTAG_WAIT, "WAIT FOR", false, false, false)
diff --git a/src/test/recovery/t/020_begin_wait.pl b/src/test/recovery/t/020_begin_wait.pl
new file mode 100644
index 00000000000..033e65458bf
--- /dev/null
+++ b/src/test/recovery/t/020_begin_wait.pl
@@ -0,0 +1,121 @@
+# Checks WAIT FOR
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+
+# Add some content and take a backup
+$node_master->safe_psql('postgres',
+	"CREATE TABLE wait_test AS SELECT generate_series(1,10) AS a");
+my $backup_name = 'my_backup';
+$node_master->backup($backup_name);
+
+# Create a streaming standby with a 1 second delay from the backup
+my $node_standby = get_new_node('standby');
+my $delay        = 1;
+$node_standby->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby->append_conf('postgresql.conf', qq[
+	recovery_min_apply_delay = '${delay}s'
+]);
+$node_standby->start;
+
+
+
+# Make sure that WAIT FOR LSN works: add new content to master and memorize
+# master's new LSN, then wait for master's LSN on standby. Prove that WAIT is
+# able to setup an infinite waiting loop and exit it if given no wait timeout.
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(11, 20))");
+my $lsn1 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_standby->safe_psql('postgres', "BEGIN WAIT FOR LSN '$lsn1'");
+
+# Get the current LSN on standby and make sure it's the same as master's LSN
+my $lsn_standby = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+my $compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn1'::pg_lsn)");
+ok($compare_lsns eq 0, "standby reached the same LSN as master after WAIT");
+
+
+
+# Check that timeouts work on their own and let us wait for specified time (1s)
+my $current_time = $node_standby->safe_psql('postgres', "SELECT now()");
+my $one_second = 1000; # in milliseconds
+my $start_time = time();
+# While we're at it, also make sure that the syntax with commas works fine and
+# that by default we use WAIT FOR ALL strategy, which means waiting for max time
+$node_standby->safe_psql('postgres',
+	"BEGIN WAIT FOR LSN '0/FFFFFFFF' TIMEOUT $one_second");
+my $time_waited = (time() - $start_time) * 1000; # convert to milliseconds
+ok($time_waited >= $one_second, "WAIT FOR TIMEOUT waits for enough time");
+
+# Now, check that timeouts work as expected when waiting for LSN
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(21, 30))");
+my $lsn2 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+my $reached_lsn = $node_standby->safe_psql('postgres',
+	"BEGIN WAIT FOR LSN '$lsn2' TIMEOUT 1");
+ok($reached_lsn eq "f", "WAIT doesn't reach LSN if given too little wait time");
+
+
+#===============================================================================
+# TODO: remove this test if we remove the standalone "WAIT FOR" command
+#===============================================================================
+# We need to check that WAIT works fine inside transactions. For that, let's
+# get two LSNs that will correspond to two different max values in our table.
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(31, 40))");
+my $lsn3 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(41, 50))");
+my $lsn4 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+
+# Get multiple LSNs for testing WAIT FOR ANY / WAIT FOR ALL
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(51, 60))");
+my $lsn5 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(61, 70000))");
+my $lsn6 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(61, 800000))");
+my $lsn7 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+# Check that WAIT FOR works fine
+$node_standby->safe_psql('postgres',
+	"BEGIN WAIT FOR LSN '$lsn5'");
+$lsn_standby = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+$compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn5'::pg_lsn)");
+ok($compare_lsns ge 0,
+	"WAIT FOR ANY makes us reach at least the minimum LSN from the list");
+$compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn7'::pg_lsn)");
+# TODO: Could this somehow fail due to the machine being very fast at applying LSN?
+ok($compare_lsns lt 0,
+	"WAIT FOR ANY didn't make us reach the maximum LSN from the list");
+
+# Check that WAIT FOR works fine
+$node_standby->safe_psql('postgres',
+	"BEGIN WAIT FOR LSN '$lsn7'");
+$lsn_standby = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+$compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$lsn_standby'::pg_lsn, '$lsn7'::pg_lsn)");
+ok($compare_lsns eq 0,
+	"WAIT FOR makes us reach the maximum LSN");
+
+
+
+$node_standby->stop;
+$node_master->stop;
#65Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Kartyshov Ivan (#64)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On Tue, Apr 7, 2020 at 12:58 AM Kartyshov Ivan
<i.kartyshov@postgrespro.ru> wrote:

On 2020-04-04 03:14, Alexander Korotkov wrote:

I think that for now we would be fine with a single LSN and a single
TIMEOUT. In future we may add multiple LSNs/TIMEOUTs and/or support for
expressions as LSNs/TIMEOUTs if we figure out it's necessary.

I also think that coupling the wait for an LSN with the beginning of a
transaction is a good idea. A separate WAIT FOR LSN statement called in
the middle of a transaction looks problematic to me. Imagine we have RR
isolation and have already acquired the snapshot. Then our snapshot can
block applying the WAL records we are waiting for. That would be an
implicit deadlock. It would be nice to avoid such deadlocks by
design.

Ok, here is a new version of the patch with a single LSN and TIMEOUT.

I think this is quite a small feature, which has already received quite
an amount of review. The last version is very pinched. But I think it
would be good to commit some very basic version, which is at least some
progress in the area and could be extended in future. I'm going to
pass through the code tomorrow and commit this unless I find major
issues or somebody objects.

------
Alexander Korotkov

Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#66Anna Akenteva
a.akenteva@postgrespro.ru
In reply to: Kartyshov Ivan (#64)
1 attachment(s)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020-04-07 00:58, Kartyshov Ivan wrote:

Ok, here is a new version of the patch with a single LSN and TIMEOUT.

I had a look at the code and did some more code cleanup, with Ivan's
permission.
This is what I did:
- Removed "WAIT FOR" command tag from cmdtaglist.h and renamed WaitStmt
to WaitClause (since there's no standalone WAIT FOR command anymore)
- Added _copyWaitClause() and _equalWaitClause()
- Removed unused #include-s from utility.c
- Adjusted tests and documentation
- Fixed/added some code comments

I have a couple of questions about WaitUtility() though:
- When waiting forever (due to not specifying a timeout), isn't 60
seconds too long of an interval to check for interrupts?
- If we did specify a timeout, it might be a very long one. In this
case, shouldn't we also make sure to wake up sometimes to check for
interrupts?
- Is it OK that specifying timeout = 0 (BEGIN WAIT FOR LSN ... TIMEOUT
0) is the same as not specifying timeout at all?
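
The three questions above can be made concrete with a small standalone
sketch. It is not taken from the patch: next_wait_slice(), MAX_SLICE_MS and
NO_TIMEOUT are hypothetical names chosen for illustration, and the 1 second
cap is an arbitrary assumption. The idea is to cap every individual sleep so
the loop wakes up regularly regardless of the total timeout, and to use a
separate sentinel for "no TIMEOUT clause" so that TIMEOUT 0 can mean "do not
wait at all".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap on a single sleep, so interrupts are noticed promptly. */
#define MAX_SLICE_MS	1000

/* Hypothetical sentinel for "no TIMEOUT given"; keeps TIMEOUT 0 distinct. */
#define NO_TIMEOUT		(-1)

/*
 * Compute how long the next sleep may last.
 *
 * timeout_ms: NO_TIMEOUT to wait forever, otherwise the total wait budget
 *             in milliseconds (0 means "do not wait").
 * elapsed_ms: time already spent waiting, in milliseconds.
 *
 * Returns false when the budget is exhausted; otherwise returns true and
 * stores the length of the next sleep in *slice_ms.
 */
static bool
next_wait_slice(int64_t timeout_ms, int64_t elapsed_ms, int64_t *slice_ms)
{
	int64_t		remaining;

	assert(elapsed_ms >= 0);
	assert(timeout_ms >= NO_TIMEOUT);

	/* Waiting forever still sleeps in short slices. */
	if (timeout_ms == NO_TIMEOUT)
	{
		*slice_ms = MAX_SLICE_MS;
		return true;
	}

	remaining = timeout_ms - elapsed_ms;
	if (remaining <= 0)
		return false;

	/* A long timeout is also cut into short slices. */
	*slice_ms = (remaining < MAX_SLICE_MS) ? remaining : MAX_SLICE_MS;
	return true;
}
```

A wait loop built this way would call the helper before each sleep, pass the
resulting slice as the sleep timeout, and check for interrupts and for the
replayed LSN after every wake-up.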

--
Anna Akenteva
Postgres Professional:
The Russian Postgres Company
http://www.postgrespro.com

Attachments:

begin_waitfor_v7.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index c23bbfb4e71..19a33d7d8fb 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] [ WAIT FOR LSN <replaceable class="parameter">lsn_value</replaceable> [TIMEOUT <replaceable class="parameter">number_of_milliseconds</replaceable> ] ]
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
@@ -63,6 +63,17 @@ BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</
    <xref linkend="sql-set-transaction"/>
    was executed.
   </para>
+
+  <para>
+   The <literal>WAIT FOR</literal> clause waits for the target log
+   sequence number (<acronym>LSN</acronym>) to be replayed on the standby before
+   starting the transaction in <productname>PostgreSQL</productname> databases
+   with master-standby asynchronous replication. The wait time can be limited by
+   specifying a timeout, which is measured in milliseconds and must be a positive
+   integer. If the <acronym>LSN</acronym> is not reached before the timeout, the
+   transaction does not begin. Waiting can be interrupted using
+   <literal>Ctrl+C</literal>, or by shutting down the <literal>postgres</literal> server.
+  </para>
  </refsect1>
 
  <refsect1>
@@ -146,6 +157,10 @@ BEGIN;
    different purpose in embedded SQL. You are advised to be careful
    about the transaction semantics when porting database applications.
   </para>
+
+  <para>
+   There is no <command>WAIT FOR</command> clause in the SQL standard.
+  </para>
  </refsect1>
 
  <refsect1>
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index d6cd1d41779..c9f70d2709a 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] [ WAIT FOR LSN <replaceable class="parameter">lsn_value</replaceable> [TIMEOUT <replaceable class="parameter">number_of_milliseconds</replaceable> ] ]
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
@@ -40,6 +40,17 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable
    characteristics, as if <xref linkend="sql-set-transaction"/> was executed. This is the same
    as the <xref linkend="sql-begin"/> command.
   </para>
+
+  <para>
+   The <literal>WAIT FOR</literal> clause waits for the target log
+   sequence number (<acronym>LSN</acronym>) to be replayed on the standby before
+   starting the transaction in <productname>PostgreSQL</productname> databases
+   with master-standby asynchronous replication. The wait time can be limited by
+   specifying a timeout, which is measured in milliseconds and must be a positive
+   integer. If the <acronym>LSN</acronym> is not reached before the timeout, the
+   transaction does not begin. Waiting can be interrupted using
+   <literal>Ctrl+C</literal>, or by shutting down the <literal>postgres</literal> server.
+  </para>
  </refsect1>
 
  <refsect1>
@@ -78,6 +89,10 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable
    omitted.
   </para>
 
+  <para>
+   There is no <command>WAIT FOR</command> clause in the SQL standard.
+  </para>
+
   <para>
    See also the compatibility section of <xref linkend="sql-set-transaction"/>.
   </para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index abf954ba392..d2856c88943 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -7332,6 +7333,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * If we replayed an LSN that someone was waiting for,
+				 * set latches in shared memory array to notify the waiter.
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWait())
+				{
+					WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3ce65..9b310926c12 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00000000000..d7dbf3b725f
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,291 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.c
+ *	  Implements WAIT FOR, which allows waiting for events such as
+ *	  LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include <float.h>
+#include <math.h>
+#include "postgres.h"
+#include "pgstat.h"
+#include "fmgr.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlogdefs.h"
+#include "access/xlog.h"
+#include "catalog/pg_type.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/backendid.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+#include "executor/spi.h"
+#include "utils/fmgrprotos.h"
+
+/* Add to / delete from shared memory array */
+static void AddEvent(XLogRecPtr lsn_to_wait);
+static void DeleteEvent(void);
+
+/* Shared memory structure */
+typedef struct
+{
+	int			backend_maxid;
+	XLogRecPtr	min_lsn;
+	slock_t		mutex;
+	/* LSNs that different backends are waiting for */
+	XLogRecPtr	lsn[FLEXIBLE_ARRAY_MEMBER];
+} WaitState;
+
+static volatile WaitState *state;
+
+/*
+ * Add the wait event of the current backend to shared memory array
+ */
+static void
+AddEvent(XLogRecPtr lsn_to_wait)
+{
+	SpinLockAcquire(&state->mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->lsn[MyBackendId] = lsn_to_wait;
+
+	if (lsn_to_wait < state->min_lsn)
+		state->min_lsn = lsn_to_wait;
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Delete wait event of the current backend from the shared memory array.
+ *
+ * TODO: Consider state cleanup on backend failure.
+ * Check:
+ * 1) normal|smart|fast|immediate stop
+ * 2) SIGKILL and SIGTERM
+ */
+static void
+DeleteEvent(void)
+{
+	int			i;
+	XLogRecPtr	lsn_to_delete = state->lsn[MyBackendId];
+
+	state->lsn[MyBackendId] = InvalidXLogRecPtr;
+
+	SpinLockAcquire(&state->mutex);
+
+	/* If we are deleting the minimal LSN, then choose the next min_lsn */
+	if (state->min_lsn == lsn_to_delete)
+	{
+		state->min_lsn = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->lsn[i] != InvalidXLogRecPtr &&
+				state->lsn[i] < state->min_lsn)
+				state->min_lsn = state->lsn[i];
+	}
+
+	/* If deleting from the end of the array, shorten the array's used part */
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >= 2; i--)
+			if (state->lsn[i] != InvalidXLogRecPtr)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Report amount of shared memory space needed for WaitState
+ */
+Size
+WaitShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(WaitState, lsn);
+	size = add_size(size, mul_size(MaxBackends + 1, sizeof(XLogRecPtr)));
+	return size;
+}
+
+/*
+ * Initialize an array of events to wait for in shared memory
+ */
+void
+WaitShmemInit(void)
+{
+	bool		found;
+	uint32		i;
+
+	state = (WaitState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitShmemSize(),
+										  &found);
+	if (!found)
+	{
+		SpinLockInit(&state->mutex);
+
+		for (i = 0; i < (MaxBackends + 1); i++)
+			state->lsn[i] = InvalidXLogRecPtr;
+
+		state->backend_maxid = 0;
+		state->min_lsn = PG_UINT64_MAX;
+	}
+}
+
+/*
+ * Set latches in shared memory to signal that new LSN has been replayed
+ */
+void
+WaitSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32		i;
+	int			backend_maxid;
+	PGPROC	   *backend;
+
+	SpinLockAcquire(&state->mutex);
+	backend_maxid = state->backend_maxid;
+	SpinLockRelease(&state->mutex);
+
+	for (i = 2; i <= backend_maxid; i++)
+	{
+		backend = BackendIdGetProc(i);
+
+		if (backend && state->lsn[i] != 0 &&
+			state->lsn[i] <= cur_lsn)
+		{
+			SetLatch(&backend->procLatch);
+		}
+	}
+}
+
+/*
+ * Get minimal LSN that someone waits for
+ */
+XLogRecPtr
+GetMinWait(void)
+{
+	return state->min_lsn;
+}
+
+/*
+ * On WAIT use a latch to wait till LSN is replayed,
+ * postmaster dies or timeout happens.
+ * Returns 1 if LSN was reached and 0 otherwise.
+ */
+int
+WaitUtility(XLogRecPtr target_lsn, const float8 secs)
+{
+	XLogRecPtr	cur_lsn = GetXLogReplayRecPtr(NULL);
+	int			latch_events;
+	float8		endtime;
+	int			res = 0;
+
+#define GetNowFloat() ((float8) GetCurrentTimestamp() / 1000000.0)
+	endtime = GetNowFloat() + secs;
+
+	latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH;
+
+	/* Check if we already reached the needed LSN */
+	if (cur_lsn >= target_lsn)
+		return 1;
+
+	AddEvent(target_lsn);
+
+	for (;;)
+	{
+		int			rc;
+		float8		delay = 0;
+		long		delay_ms;
+
+		if (secs > 0.0)
+			delay = endtime - GetNowFloat();
+		else
+			/* If we wait forever, use 1 min timeout to check for interrupts */
+			delay = 60;
+
+		if (delay > 0.0)
+			delay_ms = (long) ceil(delay * 1000.0);
+		else
+			break;
+
+		/*
+		 * If received an interruption from CHECK_FOR_INTERRUPTS,
+		 * then delete the current event from array.
+		 */
+		if (InterruptPending)
+		{
+			DeleteEvent();
+			ProcessInterrupts();
+		}
+
+		/* If postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		rc = WaitLatch(MyLatch, latch_events, delay_ms,
+					   WAIT_EVENT_CLIENT_READ);
+
+		ResetLatch(MyLatch);
+
+		if (rc & WL_LATCH_SET)
+			cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		/* If LSN has been replayed */
+		if (target_lsn <= cur_lsn)
+			break;
+	}
+
+	DeleteEvent();
+
+	if (cur_lsn < target_lsn)
+		elog(NOTICE,"LSN is not reached. Try to increase wait time.");
+	else
+		res = 1;
+
+	return res;
+}
+
+/*
+ * Implementation of WAIT FOR
+ */
+int
+WaitMain(WaitClause *stmt, DestReceiver *dest)
+{
+	TupleDesc	tupdesc;
+	TupOutputState *tstate;
+	int			res = 0;
+
+	res = WaitUtility(DatumGetLSN(
+				  DirectFunctionCall1(pg_lsn_in,CStringGetDatum(stmt->lsn))),
+				  stmt->timeout / 1000.0);
+
+	/* Need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+
+	/* Prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsMinimalTuple);
+
+	/* Send the result */
+	do_text_output_oneline(tstate, res ? "t" : "f");
+	end_tup_output(tstate);
+	return res;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f9d86859ee7..b00c772b71e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3741,10 +3741,22 @@ _copyTransactionStmt(const TransactionStmt *from)
 	COPY_STRING_FIELD(savepoint_name);
 	COPY_STRING_FIELD(gid);
 	COPY_SCALAR_FIELD(chain);
+	COPY_NODE_FIELD(wait);
 
 	return newnode;
 }
 
+static WaitClause *
+_copyWaitClause(const WaitClause *from)
+{
+	WaitClause *newnode = makeNode(WaitClause);
+
+	COPY_STRING_FIELD(lsn);
+	COPY_SCALAR_FIELD(timeout);
+
+	return newnode;
+}
+
 static CompositeTypeStmt *
 _copyCompositeTypeStmt(const CompositeTypeStmt *from)
 {
@@ -5332,6 +5344,9 @@ copyObjectImpl(const void *from)
 		case T_TransactionStmt:
 			retval = _copyTransactionStmt(from);
 			break;
+		case T_WaitClause:
+			retval = _copyWaitClause(from);
+			break;
 		case T_CompositeTypeStmt:
 			retval = _copyCompositeTypeStmt(from);
 			break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index e8e781834a5..c1b622ea301 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1539,6 +1539,16 @@ _equalTransactionStmt(const TransactionStmt *a, const TransactionStmt *b)
 	COMPARE_STRING_FIELD(savepoint_name);
 	COMPARE_STRING_FIELD(gid);
 	COMPARE_SCALAR_FIELD(chain);
+	COMPARE_NODE_FIELD(wait);
+
+	return true;
+}
+
+static bool
+_equalWaitClause(const WaitClause *a, const WaitClause *b)
+{
+	COMPARE_STRING_FIELD(lsn);
+	COMPARE_SCALAR_FIELD(timeout);
 
 	return true;
 }
@@ -3389,6 +3399,9 @@ equal(const void *a, const void *b)
 		case T_TransactionStmt:
 			retval = _equalTransactionStmt(a, b);
 			break;
+		case T_WaitClause:
+			retval = _equalWaitClause(a, b);
+			break;
 		case T_CompositeTypeStmt:
 			retval = _equalCompositeTypeStmt(a, b);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 35ed8c0d538..8eb54d4ab05 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2778,6 +2778,28 @@ _outDefElem(StringInfo str, const DefElem *node)
 	WRITE_LOCATION_FIELD(location);
 }
 
+static void
+_outTransactionStmt(StringInfo str, const TransactionStmt *node)
+{
+	WRITE_NODE_TYPE("TRANSACTIONSTMT");
+
+	WRITE_STRING_FIELD(savepoint_name);
+	WRITE_STRING_FIELD(gid);
+	WRITE_NODE_FIELD(options);
+	WRITE_BOOL_FIELD(chain);
+	WRITE_ENUM_FIELD(kind, TransactionStmtKind);
+	WRITE_NODE_FIELD(wait);
+}
+
+static void
+_outWaitClause(StringInfo str, const WaitClause *node)
+{
+	WRITE_NODE_TYPE("WAITCLAUSE");
+
+	WRITE_STRING_FIELD(lsn);
+	WRITE_UINT_FIELD(timeout);
+}
+
 static void
 _outTableLikeClause(StringInfo str, const TableLikeClause *node)
 {
@@ -4327,6 +4349,12 @@ outNode(StringInfo str, const void *obj)
 			case T_PartitionRangeDatum:
 				_outPartitionRangeDatum(str, obj);
 				break;
+			case T_TransactionStmt:
+				_outTransactionStmt(str, obj);
+				break;
+			case T_WaitClause:
+				_outWaitClause(str, obj);
+				break;
 
 			default:
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3449c26bd11..a2f40285c6e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -592,6 +592,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <partboundspec> PartitionBoundSpec
 %type <list>		hash_partbound
 %type <defelt>		hash_partbound_elem
+%type <ival>		wait_time
+%type <node>		wait_for
 
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
@@ -661,7 +663,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	LABEL LANGUAGE LARGE_P LAST_P LATERAL_P
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
-	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
+	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED LSN
 
 	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
@@ -692,7 +694,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P
 
 	TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
-	TIES TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
+	TIES TIME TIMEOUT TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
 	TREAT TRIGGER TRIM TRUE_P
 	TRUNCATE TRUSTED TYPE_P TYPES_P
 
@@ -702,7 +704,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAIT WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLNAMESPACES
 	XMLPARSE XMLPI XMLROOT XMLSERIALIZE XMLTABLE
@@ -9946,18 +9949,20 @@ TransactionStmt:
 					n->chain = $3;
 					$$ = (Node *)n;
 				}
-			| BEGIN_P opt_transaction transaction_mode_list_or_empty
+			| BEGIN_P opt_transaction transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_BEGIN;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
-			| START TRANSACTION transaction_mode_list_or_empty
+			| START TRANSACTION transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_START;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
 			| COMMIT opt_transaction opt_transaction_chain
@@ -14187,6 +14192,25 @@ xml_passing_mech:
 			| BY VALUE_P
 		;
 
+/*
+ * WAIT FOR clause of BEGIN and START TRANSACTION statements
+ */
+wait_for:
+			WAIT FOR LSN Sconst wait_time
+				{
+					WaitClause *n = makeNode(WaitClause);
+					n->lsn = $4;
+					n->timeout = $5;
+					$$ = (Node *)n;
+				}
+			| /* EMPTY */		{ $$ = NULL; }
+		;
+
+wait_time:
+			TIMEOUT Iconst		{ $$ = $2; }
+			| /* EMPTY */		{ $$ = 0; }
+		;
+
 
 /*
  * Aggregate decoration clauses
@@ -15338,6 +15362,7 @@ unreserved_keyword:
 			| LOCK_P
 			| LOCKED
 			| LOGGED
+			| LSN
 			| MAPPING
 			| MATCH
 			| MATERIALIZED
@@ -15465,6 +15490,7 @@ unreserved_keyword:
 			| TEMPORARY
 			| TEXT_P
 			| TIES
+			| TIMEOUT
 			| TRANSACTION
 			| TRANSFORM
 			| TRIGGER
@@ -15491,6 +15517,7 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAIT
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cde..bb8af349808 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -147,6 +148,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WaitShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -264,6 +266,11 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of Latches in shared memory for WAIT
+	 */
+	WaitShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index b1f7f6e2d01..e0ca38106d4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -57,6 +57,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -591,6 +592,11 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 					case TRANS_STMT_START:
 						{
 							ListCell   *lc;
+							WaitClause   *waitstmt = (WaitClause *) stmt->wait;
+
+							/* If needed to WAIT FOR something but failed */
+							if (stmt->wait && WaitMain(waitstmt, dest) == 0)
+								break;
 
 							BeginTransactionBlock();
 							foreach(lc, stmt->options)
diff --git a/src/include/commands/wait.h b/src/include/commands/wait.h
new file mode 100644
index 00000000000..4df5ed5638f
--- /dev/null
+++ b/src/include/commands/wait.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.h
+ *	  prototypes for commands/wait.c
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * src/include/commands/wait.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAIT_H
+#define WAIT_H
+#include "postgres.h"
+#include "tcop/dest.h"
+
+extern int WaitUtility(XLogRecPtr lsn, const float8 delay);
+extern Size WaitShmemSize(void);
+extern void WaitShmemInit(void);
+extern void WaitSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWait(void);
+extern int WaitMain(WaitClause *stmt, DestReceiver *dest);
+
+#endif   /* WAIT_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 50b1ba51863..c37663a28bd 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -492,6 +492,7 @@ typedef enum NodeTag
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
 	T_SQLCmd,
+	T_WaitClause,
 
 	/*
 	 * TAGS FOR RANDOM OTHER STUFF
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index cd6f1be6435..30f5d7c0916 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1430,6 +1430,17 @@ typedef struct OnConflictClause
 	int			location;		/* token location, or -1 if unknown */
 } OnConflictClause;
 
+/*
+ * WaitClause -
+ *		representation of WAIT FOR clause for BEGIN and START TRANSACTION.
+ */
+typedef struct WaitClause
+{
+	NodeTag		type;
+	char	   *lsn;			/* LSN to wait for */
+	int			timeout;		/* Number of milliseconds to limit wait time */
+} WaitClause;
+
 /*
  * CommonTableExpr -
  *	   representation of WITH list element
@@ -3058,6 +3069,7 @@ typedef struct TransactionStmt
 	char	   *savepoint_name; /* for savepoint commands */
 	char	   *gid;			/* for two-phase-commit related commands */
 	bool		chain;			/* AND CHAIN option */
+	Node		*wait;			/* WAIT FOR clause */
 } TransactionStmt;
 
 /* ----------------------
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 08f22ce211d..6e1848fe4cc 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -243,6 +243,7 @@ PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
+PG_KEYWORD("lsn", LSN, UNRESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
 PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
@@ -410,6 +411,7 @@ PG_KEYWORD("text", TEXT_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("then", THEN, RESERVED_KEYWORD)
 PG_KEYWORD("ties", TIES, UNRESERVED_KEYWORD)
 PG_KEYWORD("time", TIME, COL_NAME_KEYWORD)
+PG_KEYWORD("timeout", TIMEOUT, UNRESERVED_KEYWORD)
 PG_KEYWORD("timestamp", TIMESTAMP, COL_NAME_KEYWORD)
 PG_KEYWORD("to", TO, RESERVED_KEYWORD)
 PG_KEYWORD("trailing", TRAILING, RESERVED_KEYWORD)
@@ -450,6 +452,7 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("wait", WAIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
diff --git a/src/test/recovery/t/020_begin_wait.pl b/src/test/recovery/t/020_begin_wait.pl
new file mode 100644
index 00000000000..da18ac3761c
--- /dev/null
+++ b/src/test/recovery/t/020_begin_wait.pl
@@ -0,0 +1,71 @@
+# Checks WAIT FOR
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 3;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+
+# Add some content and take a backup
+$node_master->safe_psql('postgres',
+	"CREATE TABLE wait_test AS SELECT generate_series(1,10) AS a");
+my $backup_name = 'my_backup';
+$node_master->backup($backup_name);
+
+# Using the backup, create a streaming standby with a 1 second delay
+my $node_standby = get_new_node('standby');
+my $delay        = 1;
+$node_standby->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby->append_conf('postgresql.conf', qq[
+	recovery_min_apply_delay = '${delay}s'
+]);
+$node_standby->start;
+
+
+# Check that timeouts make us wait for the specified time (1s here)
+my $current_time = $node_standby->safe_psql('postgres', "SELECT now()");
+my $one_second = 1000; # in milliseconds
+my $start_time = time();
+$node_standby->safe_psql('postgres',
+	"BEGIN WAIT FOR LSN '0/FFFFFFFF' TIMEOUT $one_second");
+my $time_waited = (time() - $start_time) * 1000; # convert to milliseconds
+ok($time_waited >= $one_second, "WAIT FOR TIMEOUT waits for enough time");
+
+
+# Check that timeouts let us stop waiting right away, before reaching target LSN
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(11, 20))");
+my $lsn1 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+my $reached_lsn = $node_standby->safe_psql('postgres',
+	"BEGIN WAIT FOR LSN '$lsn1' TIMEOUT 1");
+ok($reached_lsn eq "f", "WAIT doesn't reach LSN if given too little wait time");
+
+
+# Check that WAIT FOR works fine and reaches target LSN if given no timeout
+
+# Add data on master, memorize master's last LSN
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(21, 30))");
+my $lsn2 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+# Wait for it to appear on replica, memorize replica's last LSN
+$node_standby->safe_psql('postgres',
+	"BEGIN WAIT FOR LSN '$lsn2'");
+$reached_lsn = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+
+# Make sure that master's and replica's LSNs are the same after WAIT
+my $compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$reached_lsn'::pg_lsn, '$lsn2'::pg_lsn)");
+ok($compare_lsns eq 0,
+	"standby reached the same LSN as master before starting transaction");
+
+
+$node_standby->stop;
+$node_master->stop;
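
The bookkeeping that wait.c keeps in shared memory (one target LSN per backend slot plus a cached minimum) can be sketched in isolation as below. This is a simplified, single-process illustration and not the patch's code: WaitTable, table_add, table_delete and table_count_ready are hypothetical names, there is no spinlock, and plain slot numbers stand in for backend IDs.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WAITERS 16			/* hypothetical fixed slot count */
#define NO_WAITER	UINT64_MAX	/* cached minimum when nobody waits */

typedef struct
{
	uint64_t	lsn[MAX_WAITERS];	/* 0 means the slot is unused */
	uint64_t	min_lsn;			/* smallest LSN anyone waits for */
} WaitTable;

static void
table_init(WaitTable *t)
{
	for (int i = 0; i < MAX_WAITERS; i++)
		t->lsn[i] = 0;
	t->min_lsn = NO_WAITER;
}

/* Register a waiter in an unused slot; the minimum can only go down here. */
static void
table_add(WaitTable *t, int slot, uint64_t lsn)
{
	assert(slot >= 0 && slot < MAX_WAITERS);
	assert(lsn != 0 && t->lsn[slot] == 0);
	t->lsn[slot] = lsn;
	if (lsn < t->min_lsn)
		t->min_lsn = lsn;
}

/* Remove a waiter and recompute the minimum from the remaining slots. */
static void
table_delete(WaitTable *t, int slot)
{
	assert(slot >= 0 && slot < MAX_WAITERS);
	t->lsn[slot] = 0;
	t->min_lsn = NO_WAITER;
	for (int i = 0; i < MAX_WAITERS; i++)
		if (t->lsn[i] != 0 && t->lsn[i] < t->min_lsn)
			t->min_lsn = t->lsn[i];
}

/* Count the waiters whose target is covered by the replayed position. */
static int
table_count_ready(const WaitTable *t, uint64_t replayed)
{
	int			ready = 0;

	if (replayed < t->min_lsn)
		return 0;				/* cheap early exit for the replay loop */
	for (int i = 0; i < MAX_WAITERS; i++)
		if (t->lsn[i] != 0 && t->lsn[i] <= replayed)
			ready++;
	return ready;
}
```

Recomputing the minimum on every delete keeps the cached value correct when the last waiter leaves, at the cost of one scan per delete.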
#67Amit Kapila
amit.kapila16@gmail.com
In reply to: Anna Akenteva (#66)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On Tue, Apr 7, 2020 at 7:56 AM Anna Akenteva <a.akenteva@postgrespro.ru> wrote:

On 2020-04-07 00:58, Kartyshov Ivan wrote:

Ok, here is a new version of patch with single LSN and TIMEOUT.

I had a look at the code and did some more code cleanup, with Ivan's
permission.
This is what I did:
- Removed "WAIT FOR" command tag from cmdtaglist.h and renamed WaitStmt
to WaitClause (since there's no standalone WAIT FOR command anymore)
- Added _copyWaitClause() and _equalWaitClause()
- Removed unused #include-s from utility.c
- Adjusted tests and documentation
- Fixed/added some code comments

I have a couple of questions about WaitUtility() though:
- When waiting forever (due to not specifying a timeout), isn't 60
seconds too long of an interval to check for interrupts?
- If we did specify a timeout, it might be a very long one. In this
case, shouldn't we also make sure to wake up sometimes to check for
interrupts?

Right, we should probably wait for 100ms before checking for
interrupts. See the similar logic in pg_promote, where we wait for the
specified number of seconds.
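
The idea of waking up every 100ms, whether or not a timeout was given, can be sketched as a loop that waits in short slices. This is only an illustration under stated assumptions: wait_in_slices, SimWorld and the two callbacks are hypothetical stand-ins (for WaitLatch() and the interrupt check), and elapsed time is simulated by counting slices, whereas real code would recompute the remaining time from a clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POLL_INTERVAL_MS 100L	/* assumed slice length */

typedef enum { WAIT_REACHED, WAIT_TIMED_OUT, WAIT_INTERRUPTED } WaitResult;

/* Hypothetical callbacks: wait one slice / ask whether to abandon the wait */
typedef bool (*wait_slice_fn) (long timeout_ms, void *arg);
typedef bool (*interrupt_fn) (void *arg);

/* total_ms <= 0 means "no timeout", matching TIMEOUT 0 in the patch */
static WaitResult
wait_in_slices(long total_ms, wait_slice_fn wait_slice,
			   interrupt_fn interrupted, void *arg)
{
	bool		forever = (total_ms <= 0);
	long		remaining = total_ms;

	assert(wait_slice != NULL && interrupted != NULL);

	for (;;)
	{
		long		slice = POLL_INTERVAL_MS;

		/* Checked before every slice, so latency is at most one slice */
		if (interrupted(arg))
			return WAIT_INTERRUPTED;

		if (!forever)
		{
			if (remaining <= 0)
				return WAIT_TIMED_OUT;
			if (remaining < slice)
				slice = remaining;
		}

		if (wait_slice(slice, arg))
			return WAIT_REACHED;

		if (!forever)
			remaining -= slice;	/* simulated clock: assume a full slice passed */
	}
}

/* Simulated environment used to exercise the loop without sleeping */
typedef struct
{
	int			calls;			/* slices waited so far */
	int			reach_after;	/* target reached on this slice; 0 = never */
	int			interrupt_after;	/* interrupt after this many slices; 0 = never */
	long		waited_ms;		/* total simulated wait time */
} SimWorld;

static bool
sim_wait_slice(long timeout_ms, void *arg)
{
	SimWorld   *w = arg;

	w->calls++;
	w->waited_ms += timeout_ms;
	return w->reach_after != 0 && w->calls >= w->reach_after;
}

static bool
sim_interrupted(void *arg)
{
	SimWorld   *w = arg;

	return w->interrupt_after != 0 && w->calls >= w->interrupt_after;
}
```

With this shape, a very long timeout and no timeout at all behave the same with respect to interrupts: both are noticed within one slice.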

- Is it OK that specifying timeout = 0 (BEGIN WAIT FOR LSN ... TIMEOUT
0) is the same as not specifying timeout at all?

Yes that sounds reasonable to me.

Review comments:
--------------------------
1.
+/*
+ * Delete wait event of the current backend from the shared memory array.
+ *
+ * TODO: Consider state cleanup on backend failure.
+ * Check:
+ * 1) normal|smart|fast|immediate stop
+ * 2) SIGKILL and SIGTERM
+ */
+static void
+DeleteEvent(void)

I don't see how this is implemented or called to handle any errors.
For example, in WaitUtility, if WaitLatch errors out for any reason,
then the event won't be deleted. I think we can't assume that
WaitLatch or any other code in this code path will never error out.
For example, WaitLatch -> WaitEventSetWaitBlock() can error out. Also, in
the future we may add more code that can error out.

2.
+ /*
+ * If received an interruption from CHECK_FOR_INTERRUPTS,
+ * then delete the current event from array.
+ */
+ if (InterruptPending)
+ {
+ DeleteEvent();
+ ProcessInterrupts();
+ }

We generally do this type of handling via CHECK_FOR_INTERRUPTS. One
reason is that it behaves slightly differently on Windows. I am not
sure why we want to do it differently here. This looks quite ad hoc to me
and may not be correct. If we handle this event in the error path, then
we might not need any special handling.

3.
+/*
+ * On WAIT use a latch to wait till LSN is replayed,
+ * postmaster dies or timeout happens.
+ * Returns 1 if LSN was reached and 0 otherwise.
+ */
+int
+WaitUtility(XLogRecPtr target_lsn, const float8 secs)

Isn't it better to have the return value as bool? In other words, why
does this function need int as its return value?

4.
+#define GetNowFloat() ((float8) GetCurrentTimestamp() / 1000000.0)

The same define is used elsewhere in the code as well; maybe we can
define it in some central place and use it.
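
Review comment 1 asks that the wait entry be removed even when the wait itself errors out. The shape of such cleanup can be sketched in self-contained C, using setjmp/longjmp as a rough analogue of an error-trapping block. All names here (wait_slots, raise_error, wait_with_cleanup) are hypothetical and exist only for this illustration. In the backend the same effect would more likely come from the existing error-handling machinery, and this toy version returns false where real code would re-throw the error.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 8

static uint64_t wait_slots[MAX_SLOTS];	/* 0 means the slot is unused */
static jmp_buf *error_handler = NULL;	/* innermost active handler */

/* Stand-in for an error raised somewhere below the wait call */
static void
raise_error(void)
{
	assert(error_handler != NULL);
	longjmp(*error_handler, 1);
}

static void
add_event(int slot, uint64_t lsn)
{
	wait_slots[slot] = lsn;
}

static void
delete_event(int slot)
{
	wait_slots[slot] = 0;
}

/*
 * Register the wait, run the (possibly failing) wait step, and always
 * unregister afterwards.  Returns true if the wait step did not error out.
 */
static bool
wait_with_cleanup(int slot, uint64_t lsn, bool simulate_error)
{
	jmp_buf		local;
	jmp_buf    *saved = error_handler;
	bool		failed = false;

	assert(slot >= 0 && slot < MAX_SLOTS);

	add_event(slot, lsn);
	error_handler = &local;

	if (setjmp(local) == 0)
	{
		/* The step that may fail; stands in for the latch wait */
		if (simulate_error)
			raise_error();
	}
	else
		failed = true;			/* reached via longjmp from raise_error() */

	/* Cleanup runs on both the normal and the error path */
	error_handler = saved;
	delete_event(slot);

	return !failed;
}
```

Keeping the cleanup in one place on both paths also removes the need for the special InterruptPending branch discussed in comment 2.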

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#68Amit Kapila
amit.kapila16@gmail.com
In reply to: Alexander Korotkov (#65)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On Tue, Apr 7, 2020 at 5:56 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

On Tue, Apr 7, 2020 at 12:58 AM Kartyshov Ivan
<i.kartyshov@postgrespro.ru> wrote:

On 2020-04-04 03:14, Alexander Korotkov wrote:

I think that now we would be fine with single LSN and single TIMEOUT.
In future we may add multiple LSNs/TIMEOUTs or/and support for
expressions as LSNs/TIMEOUTs if we figure out it's necessary.

I also think that coupling waiting for LSN with the beginning of a
transaction is a good idea. A separate WAIT FOR LSN statement called in
the middle of a transaction looks problematic to me. Imagine we have RR
isolation and have already acquired the snapshot. Then our snapshot can
block applying the WAL records that we are waiting for. That would be an
implicit deadlock. It would be nice to avoid such deadlocks by
design.

Ok, here is a new version of patch with single LSN and TIMEOUT.

I think this is quite a small feature, which has already received quite a
lot of review. The last version is very pared down. But I think it would be
good to commit some very basic version, which is at least some
progress in the area and could be extended in the future. I'm going to
go through the code tomorrow and commit this unless I find major
issues or somebody objects.

I have gone through this thread and skimmed through the patch and I am
not sure if we can say that this patch is ready to go. First, I don't
think we have a consensus on the syntax being used in the patch
(various people didn't agree to LSN specific syntax). They wanted a
more generic syntax and I see that we tried to implement it and it
turns out to be a bit complex but that doesn't mean we just give up on
the idea and take the simplest approach and that too without a broader
agreement. Second, on my quick review, it seems there are a few
things, like error handling and interrupt checking, which need more work.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#69Anna Akenteva
a.akenteva@postgrespro.ru
In reply to: Amit Kapila (#68)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020-04-07 13:32, Amit Kapila wrote:

First, I don't
think we have a consensus on the syntax being used in the patch
(various people didn't agree to LSN specific syntax). They wanted a
more generic syntax and I see that we tried to implement it and it
turns out to be a bit complex but that doesn't mean we just give up on
the idea and take the simplest approach and that too without a broader
agreement.

Thank you for your comments!

Initially, the syntax used to be "WAITLSN", which confined us to waiting
only for LSN-s and nothing else. So we switched to "WAIT FOR
LSN", which would allow us to add variations like "WAIT FOR XID" or
"WAIT FOR COMMIT TOKEN" in the future if we wanted. A few people seemed
to imply that this kind of syntax is expandable enough:

On 2018-02-01 14:47, Simon Riggs wrote:

I agree that WAIT LSN is good syntax because this allows us to wait
for something else in future.

On 2017-10-31 12:42:56, Ants Aasma wrote:

For lack of a better proposal I would like something along the lines
of:
WAIT FOR state_id[, state_id] [ OPTIONS (..)]

As for giving up waiting for multiple events: we can only wait for LSN-s
at the moment, and there seems to be no point in waiting for multiple
LSN-s at once, because it's equivalent to waiting for the biggest LSN.
So we opted for simpler grammar for now, only letting the user specify
one LSN and one TIMEOUT. If in the future we allow waiting for something
else, like XID-s, we can expand the grammar as needed.
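
The claim that waiting for several LSN-s equals waiting for the biggest one can be made concrete with a small sketch. An LSN written as 'hi/lo' is two hexadecimal halves of one 64-bit position, so the comparison has to be numeric rather than textual. parse_lsn and max_lsn are hypothetical helpers for this illustration only, and the parsing here is more lenient than the server's own pg_lsn input routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Convert "hi/lo" (two hex numbers of up to 8 digits) to a 64-bit position */
static bool
parse_lsn(const char *text, uint64_t *lsn)
{
	unsigned int hi;
	unsigned int lo;
	int			consumed = 0;

	assert(text != NULL && lsn != NULL);

	if (sscanf(text, "%8X/%8X%n", &hi, &lo, &consumed) != 2 ||
		text[consumed] != '\0')
		return false;

	*lsn = ((uint64_t) hi << 32) | lo;
	return true;
}

/*
 * Reduce a list of target LSNs to the single one worth waiting for.
 * Replay is monotonic, so once the largest target has been replayed
 * every smaller target has been replayed as well.
 */
static bool
max_lsn(const char *const *texts, size_t n, uint64_t *result)
{
	uint64_t	best = 0;

	for (size_t i = 0; i < n; i++)
	{
		uint64_t	cur;

		if (!parse_lsn(texts[i], &cur))
			return false;
		if (cur > best)
			best = cur;
	}

	*result = best;
	return n > 0;
}
```

For example, among '0/303EC60', '1/0' and '0/FFFFFFFF' the target to wait for is '1/0', even though '0/FFFFFFFF' looks larger when read as text.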

What are your own thoughts on the syntax?

--
Anna Akenteva
Postgres Professional:
The Russian Postgres Company
http://www.postgrespro.com

#70Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Anna Akenteva (#69)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On Tue, Apr 7, 2020 at 3:07 PM Anna Akenteva <a.akenteva@postgrespro.ru> wrote:

On 2017-10-31 12:42:56, Ants Aasma wrote:

For lack of a better proposal I would like something along the lines
of:
WAIT FOR state_id[, state_id] [ OPTIONS (..)]

As for giving up waiting for multiple events: we can only wait for LSN-s
at the moment, and there seems to be no point in waiting for multiple
LSN-s at once, because it's equivalent to waiting for the biggest LSN.
So we opted for simpler grammar for now, only letting the user specify
one LSN and one TIMEOUT. If in the future we allow waiting for something
else, like XID-s, we can expand the grammar as needed.

+1
In the latest version of the patch we have a very brief and simple syntax
that allows waiting for a given LSN with a given timeout. In the future we can
expand this syntax in different ways. I don't see the current syntax
limiting us in any way.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#71Amit Kapila
amit.kapila16@gmail.com
In reply to: Anna Akenteva (#69)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On Tue, Apr 7, 2020 at 5:37 PM Anna Akenteva <a.akenteva@postgrespro.ru> wrote:

On 2020-04-07 13:32, Amit Kapila wrote:

First, I don't
think we have a consensus on the syntax being used in the patch
(various people didn't agree to LSN specific syntax). They wanted a
more generic syntax and I see that we tried to implement it and it
turns out to be a bit complex but that doesn't mean we just give up on
the idea and take the simplest approach and that too without a broader
agreement.

Thank you for your comments!

Initially, the syntax used to be "WAITLSN", which confined us to waiting
only for LSN-s and nothing else. So we switched to "WAIT FOR
LSN", which would allow us to add variations like "WAIT FOR XID" or
"WAIT FOR COMMIT TOKEN" in the future if we wanted. A few people seemed
to imply that this kind of syntax is expandable enough:

On 2018-02-01 14:47, Simon Riggs wrote:

I agree that WAIT LSN is good syntax because this allows us to wait
for something else in future.

On 2017-10-31 12:42:56, Ants Aasma wrote:

For lack of a better proposal I would like something along the lines
of:
WAIT FOR state_id[, state_id] [ OPTIONS (..)]

As for giving up waiting for multiple events: we can only wait for LSN-s
at the moment, and there seems to be no point in waiting for multiple
LSN-s at once, because it's equivalent to waiting for the biggest LSN.
So we opted for simpler grammar for now, only letting the user specify
one LSN and one TIMEOUT. If in the future we allow waiting for something
else, like XID-s, we can expand the grammar as needed.

What are your own thoughts on the syntax?

I don't know how users can specify the LSN value, but OTOH I can see that
if users can somehow provide the correct commit LSN for which
they want to wait, then it could work out. It is possible that I
misread and we do have a consensus on WAIT FOR LSN [option], because I saw
that what Simon and Ants have proposed includes multiple states/events, and
it might be fine to have just one event for now.
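
On the question of how a user supplies the LSN: the test in the attached patch reads it on the master with pg_current_wal_lsn() and passes it as a string such as '0/303EC60'. That text form is two hexadecimal numbers separated by a slash, the high and low 32 bits of the position. A minimal sketch of the conversion follows; `parse_lsn` and `lsn_t` are hypothetical names, and this parser is more lenient than the server's own pg_lsn input routine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for a 64-bit WAL position (assumption, not the patch's type). */
typedef uint64_t lsn_t;

/*
 * Hypothetical parser for the textual "hi/lo" LSN form, e.g. "0/303EC60".
 * Returns 1 on success and stores the 64-bit position in *out;
 * returns 0 if the text does not have the expected shape.
 */
int
parse_lsn(const char *text, lsn_t *out)
{
    unsigned int hi;
    unsigned int lo;
    char trailing;

    assert(text != NULL && out != NULL);

    /* Exactly two hex fields must match; a third match means trailing junk. */
    if (sscanf(text, "%8X/%8X%c", &hi, &lo, &trailing) != 2)
        return 0;

    *out = ((lsn_t) hi << 32) | lo;
    return 1;
}
```

An application would typically capture the LSN right after its commit on the master and hand that string to the standby session that needs to see the change.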

Does anyone else want to share an opinion on the syntax?

I think even if we are good with the syntax, I can see that the code is not
completely ready to go, as mentioned in a few comments I raised. I
am not sure if we want to commit it in the current form and then
improve it after feature freeze. If it is possible to fix the loose ends
quickly and there are no more comments from anyone, then we
might be able to commit it.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#72Anna Akenteva
a.akenteva@postgrespro.ru
In reply to: Amit Kapila (#67)
1 attachment(s)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020-04-07 12:58, Amit Kapila wrote:

Review comments:
1.
+static void
+DeleteEvent(void)
I don't see how this is implemented or called to handle any errors.

2.
+ if (InterruptPending)
+ {
+ DeleteEvent();
+ ProcessInterrupts();
+ }
We generally do this type of handling via CHECK_FOR_INTERRUPTS.

3.
+int
+WaitUtility(XLogRecPtr target_lsn, const float8 secs)
Isn't it better to have a return value as bool?

4.
+#define GetNowFloat() ((float8) GetCurrentTimestamp() / 1000000.0)
This same define is used elsewhere in the code as well; maybe we can
define it in some central place and use it.

Thank you for your review!
Ivan and I have worked on the patch and tried to address your comments:

0. Now we wake up at least every 100ms to check for interrupts.
1. Now we call DeleteWaitedLSN() from
ProcessInterrupts()=>LockErrorCleanup(). It seems that we can only exit
the WAIT cycle improperly due to interrupts, so this should be enough
(?)
2. Now we use CHECK_FOR_INTERRUPTS() instead of ProcessInterrupts()
3. Now WaitUtility() returns bool rather than int
4. Now GetNowFloat() is only defined at one place in the code
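
Item 0 above, waking up at least every 100 ms, amounts to capping each individual wait at 100 ms while still honouring the user's overall timeout. A minimal sketch of that calculation, using integer milliseconds rather than the float8 seconds the patch uses; `next_wait_slice_ms` and `CHECK_INTERVAL_MS` are hypothetical names. Unlike the patch's loop, it returns 0 once the deadline has passed instead of sleeping for one more slice.

```c
#include <assert.h>

#define CHECK_INTERVAL_MS 100   /* wake-up period from item 0 */

/*
 * Hypothetical helper: how long the next single wait should last.
 * timeout_ms <= 0 means "no timeout", as in the patch.
 * elapsed_ms is the time already spent waiting.
 */
long
next_wait_slice_ms(long timeout_ms, long elapsed_ms)
{
    long remaining;

    assert(elapsed_ms >= 0);

    /* No overall deadline: keep waiting in fixed steps. */
    if (timeout_ms <= 0)
        return CHECK_INTERVAL_MS;

    remaining = timeout_ms - elapsed_ms;
    if (remaining <= 0)
        return 0;               /* deadline passed, caller should stop */

    /* Never sleep past the deadline, never longer than one interval. */
    return remaining < CHECK_INTERVAL_MS ? remaining : CHECK_INTERVAL_MS;
}
```

The caller would check for interrupts before each slice, so a cancel request is noticed within roughly one interval even when no timeout was given.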

What we changed additionally:
- Prohibited using WAIT FOR LSN on master
- Added more tests
- Checked the code with pgindent and adjusted pgindent/typedefs.list
- Changed min_lsn's type to pg_atomic_uint64 + fixed how we work with
mutex
- Code cleanup in wait.[c|h]: cleaned up #include-s, gave better names
to functions, changed elog() to ereport()
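
The min_lsn bookkeeping mentioned in the list above can be sketched in isolation: each waiter records its target in its own slot, a cached minimum lets the replay side skip work cheaply, and the minimum is recomputed only when the entry being removed was the minimum. This is a single-threaded illustration with hypothetical names (`wait_table`, `MAX_SLOTS`, `NO_LSN`); the patch itself keeps the array in shared memory under a spinlock and sizes it by MaxBackends.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 64-bit WAL position (assumption, not the patch's type). */
typedef uint64_t lsn_t;

#define MAX_SLOTS 8             /* illustrative size only */
#define NO_LSN ((lsn_t) 0)      /* plays the role of InvalidXLogRecPtr */

typedef struct
{
    lsn_t min_lsn;              /* smallest waited-for LSN, UINT64_MAX if none */
    lsn_t lsn[MAX_SLOTS];       /* per-slot target, NO_LSN if not waiting */
} wait_table;

void
wait_table_init(wait_table *t)
{
    for (int i = 0; i < MAX_SLOTS; i++)
        t->lsn[i] = NO_LSN;
    t->min_lsn = UINT64_MAX;
}

/* Register a waiter; lowering the cached minimum is O(1). */
void
wait_table_add(wait_table *t, int slot, lsn_t target)
{
    assert(slot >= 0 && slot < MAX_SLOTS && target != NO_LSN);
    t->lsn[slot] = target;
    if (target < t->min_lsn)
        t->min_lsn = target;
}

/* Unregister a waiter; rescan only if it held the cached minimum. */
void
wait_table_delete(wait_table *t, int slot)
{
    lsn_t removed;

    assert(slot >= 0 && slot < MAX_SLOTS);
    removed = t->lsn[slot];
    t->lsn[slot] = NO_LSN;

    if (removed != NO_LSN && removed == t->min_lsn)
    {
        t->min_lsn = UINT64_MAX;
        for (int i = 0; i < MAX_SLOTS; i++)
            if (t->lsn[i] != NO_LSN && t->lsn[i] < t->min_lsn)
                t->min_lsn = t->lsn[i];
    }
}
```

With this layout the replay loop only needs to compare the last replayed position against min_lsn; when nobody is waiting, the comparison against UINT64_MAX is always false and no slots are scanned.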

--
Anna Akenteva
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

begin_waitfor_v8.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index c23bbfb4e71..19a33d7d8fb 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] [ WAIT FOR LSN <replaceable class="parameter">lsn_value</replaceable> [TIMEOUT <replaceable class="parameter">number_of_milliseconds</replaceable> ] ]
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
@@ -63,6 +63,17 @@ BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</
    <xref linkend="sql-set-transaction"/>
    was executed.
   </para>
+
+  <para>
+   The <literal>WAIT FOR</literal> clause allows waiting for the target log
+   sequence number (<acronym>LSN</acronym>) to be replayed on the standby before
+   starting the transaction in <productname>PostgreSQL</productname> databases
+   with master-standby asynchronous replication. The wait time can be limited by
+   specifying a timeout, which is measured in milliseconds and must be a positive
+   integer. If the <acronym>LSN</acronym> is not reached before the timeout, the
+   transaction does not begin. Waiting can be interrupted using <literal>Ctrl+C</literal>, or
+   by shutting down the <literal>postgres</literal> server.
+  </para>
  </refsect1>
 
  <refsect1>
@@ -146,6 +157,10 @@ BEGIN;
    different purpose in embedded SQL. You are advised to be careful
    about the transaction semantics when porting database applications.
   </para>
+
+  <para>
+   There is no <command>WAIT FOR</command> clause in the SQL standard.
+  </para>
  </refsect1>
 
  <refsect1>
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index d6cd1d41779..c9f70d2709a 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] [ WAIT FOR LSN <replaceable class="parameter">lsn_value</replaceable> [TIMEOUT <replaceable class="parameter">number_of_milliseconds</replaceable> ] ]
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
@@ -40,6 +40,17 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable
    characteristics, as if <xref linkend="sql-set-transaction"/> was executed. This is the same
    as the <xref linkend="sql-begin"/> command.
   </para>
+
+  <para>
+   The <literal>WAIT FOR</literal> clause allows waiting for the target log
+   sequence number (<acronym>LSN</acronym>) to be replayed on the standby before
+   starting the transaction in <productname>PostgreSQL</productname> databases
+   with master-standby asynchronous replication. The wait time can be limited by
+   specifying a timeout, which is measured in milliseconds and must be a positive
+   integer. If the <acronym>LSN</acronym> is not reached before the timeout, the
+   transaction does not begin. Waiting can be interrupted using <literal>Ctrl+C</literal>, or
+   by shutting down the <literal>postgres</literal> server.
+  </para>
  </refsect1>
 
  <refsect1>
@@ -78,6 +89,10 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable
    omitted.
   </para>
 
+  <para>
+   There is no <command>WAIT FOR</command> clause in the SQL standard.
+  </para>
+
   <para>
    See also the compatibility section of <xref linkend="sql-set-transaction"/>.
   </para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index abf954ba392..4c7eb0cb219 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -7332,6 +7333,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * If we replayed an LSN that someone was waiting for,
+				 * set latches in shared memory array to notify the waiter.
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+				{
+					WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3ce65..9b310926c12 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00000000000..e1123df321e
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,282 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.c
+ *	  Implements WAIT FOR, which allows waiting for events such as
+ *	  LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <math.h>
+
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/backendid.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/sinvaladt.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+
+/* Add to shared memory array */
+static void AddWaitedLSN(XLogRecPtr lsn_to_wait);
+
+/* Shared memory structure */
+typedef struct
+{
+	int			backend_maxid;
+	pg_atomic_uint64 min_lsn; /* XLogRecPtr of minimal waited for LSN */
+	slock_t		mutex;
+	/* LSNs that different backends are waiting */
+	XLogRecPtr	lsn[FLEXIBLE_ARRAY_MEMBER];
+} WaitState;
+
+static WaitState *state;
+
+/*
+ * Add the wait event of the current backend to shared memory array
+ */
+static void
+AddWaitedLSN(XLogRecPtr lsn_to_wait)
+{
+	SpinLockAcquire(&state->mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->lsn[MyBackendId] = lsn_to_wait;
+
+	if (lsn_to_wait < state->min_lsn.value)
+		state->min_lsn.value = lsn_to_wait;
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Delete wait event of the current backend from the shared memory array.
+ */
+void
+DeleteWaitedLSN(void)
+{
+	int			i;
+	XLogRecPtr	lsn_to_delete;
+
+	SpinLockAcquire(&state->mutex);
+
+	lsn_to_delete = state->lsn[MyBackendId];
+	state->lsn[MyBackendId] = InvalidXLogRecPtr;
+
+	/* If we are deleting the minimal LSN, then choose the next min_lsn */
+	if (lsn_to_delete != InvalidXLogRecPtr &&
+		lsn_to_delete == state->min_lsn.value)
+	{
+		state->min_lsn.value = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->lsn[i] != InvalidXLogRecPtr &&
+				state->lsn[i] < state->min_lsn.value)
+				state->min_lsn.value = state->lsn[i];
+	}
+
+	/* If deleting from the end of the array, shorten the array's used part */
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >= 2; i--)
+			if (state->lsn[i] != InvalidXLogRecPtr)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Report amount of shared memory space needed for WaitState
+ */
+Size
+WaitShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(WaitState, lsn);
+	size = add_size(size, mul_size(MaxBackends + 1, sizeof(XLogRecPtr)));
+	return size;
+}
+
+/*
+ * Initialize an array of events to wait for in shared memory
+ */
+void
+WaitShmemInit(void)
+{
+	bool		found;
+	uint32		i;
+
+	state = (WaitState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitShmemSize(),
+										  &found);
+	if (!found)
+	{
+		SpinLockInit(&state->mutex);
+
+		for (i = 0; i < (MaxBackends + 1); i++)
+			state->lsn[i] = InvalidXLogRecPtr;
+
+		state->backend_maxid = 0;
+		state->min_lsn.value = PG_UINT64_MAX;
+	}
+}
+
+/*
+ * Set latches in shared memory to signal that new LSN has been replayed
+ */
+void
+WaitSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32		i;
+	int			backend_maxid;
+	PGPROC	   *backend;
+
+	SpinLockAcquire(&state->mutex);
+	backend_maxid = state->backend_maxid;
+
+	for (i = 2; i <= backend_maxid; i++)
+	{
+		backend = BackendIdGetProc(i);
+
+		if (backend && state->lsn[i] != 0 &&
+			state->lsn[i] <= cur_lsn)
+		{
+			SetLatch(&backend->procLatch);
+		}
+	}
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Get minimal LSN that someone waits for
+ */
+XLogRecPtr
+GetMinWaitedLSN(void)
+{
+	return state->min_lsn.value;
+}
+
+/*
+ * On WAIT use a latch to wait till LSN is replayed,
+ * postmaster dies or timeout happens. Timeout is specified in milliseconds.
+ * Returns true if LSN was reached and false otherwise.
+ */
+bool
+WaitUtility(XLogRecPtr target_lsn, const int timeout_ms)
+{
+	XLogRecPtr	cur_lsn = GetXLogReplayRecPtr(NULL);
+	int			latch_events;
+	float8		endtime;
+	bool		res = false;
+	bool		wait_forever = (timeout_ms <= 0);
+
+	endtime = GetNowFloat() + timeout_ms / 1000.0;
+
+	latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH;
+
+	/* Check if we already reached the needed LSN */
+	if (cur_lsn >= target_lsn)
+		return true;
+
+	AddWaitedLSN(target_lsn);
+
+	for (;;)
+	{
+		int			rc;
+		float8		time_left = 0;
+		long		time_left_ms = 0;
+
+		time_left = endtime - GetNowFloat();
+
+		/* Use 100 ms as the default timeout to check for interrupts */
+		if (wait_forever || time_left < 0 || time_left > 0.1)
+			time_left_ms = 100;
+		else
+			time_left_ms = (long) ceil(time_left * 1000.0);
+
+		/* If interrupt, LockErrorCleanup() will do DeleteWaitedLSN() for us */
+		CHECK_FOR_INTERRUPTS();
+
+		/* If postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		rc = WaitLatch(MyLatch, latch_events, time_left_ms,
+					   WAIT_EVENT_CLIENT_READ);
+
+		ResetLatch(MyLatch);
+
+		if (rc & WL_LATCH_SET)
+			cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		if (rc & WL_TIMEOUT)
+		{
+			cur_lsn = GetXLogReplayRecPtr(NULL);
+			/* If the time specified by user has passed, stop waiting */
+			time_left = endtime - GetNowFloat();
+			if (!wait_forever && time_left <= 0.0)
+				break;
+		}
+
+		/* If LSN has been replayed */
+		if (target_lsn <= cur_lsn)
+			break;
+	}
+
+	DeleteWaitedLSN();
+
+	if (cur_lsn < target_lsn)
+		ereport(WARNING,
+				(errcode(ERRCODE_NO_ACTIVE_SQL_TRANSACTION),
+				 errmsg("didn't start transaction because LSN was not reached"),
+				 errhint("Try to increase wait time.")));
+	else
+		res = true;
+
+	return res;
+}
+
+/*
+ * Implementation of WAIT FOR
+ */
+int
+WaitMain(WaitClause *stmt, DestReceiver *dest)
+{
+	TupleDesc	tupdesc;
+	TupOutputState *tstate;
+	XLogRecPtr	target_lsn;
+	bool		res = false;
+
+	target_lsn = DatumGetLSN(DirectFunctionCall1(pg_lsn_in,
+												 CStringGetDatum(stmt->lsn)));
+	res = WaitUtility(target_lsn, stmt->timeout);
+
+	/* Need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+
+	/* Prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsMinimalTuple);
+
+	/* Send the result */
+	do_text_output_oneline(tstate, res ? "t" : "f");
+	end_tup_output(tstate);
+	return res;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f9d86859ee7..b00c772b71e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3741,10 +3741,22 @@ _copyTransactionStmt(const TransactionStmt *from)
 	COPY_STRING_FIELD(savepoint_name);
 	COPY_STRING_FIELD(gid);
 	COPY_SCALAR_FIELD(chain);
+	COPY_NODE_FIELD(wait);
 
 	return newnode;
 }
 
+static WaitClause *
+_copyWaitClause(const WaitClause *from)
+{
+	WaitClause *newnode = makeNode(WaitClause);
+
+	COPY_STRING_FIELD(lsn);
+	COPY_SCALAR_FIELD(timeout);
+
+	return newnode;
+}
+
 static CompositeTypeStmt *
 _copyCompositeTypeStmt(const CompositeTypeStmt *from)
 {
@@ -5332,6 +5344,9 @@ copyObjectImpl(const void *from)
 		case T_TransactionStmt:
 			retval = _copyTransactionStmt(from);
 			break;
+		case T_WaitClause:
+			retval = _copyWaitClause(from);
+			break;
 		case T_CompositeTypeStmt:
 			retval = _copyCompositeTypeStmt(from);
 			break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index e8e781834a5..c1b622ea301 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1539,6 +1539,16 @@ _equalTransactionStmt(const TransactionStmt *a, const TransactionStmt *b)
 	COMPARE_STRING_FIELD(savepoint_name);
 	COMPARE_STRING_FIELD(gid);
 	COMPARE_SCALAR_FIELD(chain);
+	COMPARE_NODE_FIELD(wait);
+
+	return true;
+}
+
+static bool
+_equalWaitClause(const WaitClause *a, const WaitClause *b)
+{
+	COMPARE_STRING_FIELD(lsn);
+	COMPARE_SCALAR_FIELD(timeout);
 
 	return true;
 }
@@ -3389,6 +3399,9 @@ equal(const void *a, const void *b)
 		case T_TransactionStmt:
 			retval = _equalTransactionStmt(a, b);
 			break;
+		case T_WaitClause:
+			retval = _equalWaitClause(a, b);
+			break;
 		case T_CompositeTypeStmt:
 			retval = _equalCompositeTypeStmt(a, b);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 35ed8c0d538..8eb54d4ab05 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2778,6 +2778,28 @@ _outDefElem(StringInfo str, const DefElem *node)
 	WRITE_LOCATION_FIELD(location);
 }
 
+static void
+_outTransactionStmt(StringInfo str, const TransactionStmt *node)
+{
+	WRITE_NODE_TYPE("TRANSACTIONSTMT");
+
+	WRITE_STRING_FIELD(savepoint_name);
+	WRITE_STRING_FIELD(gid);
+	WRITE_NODE_FIELD(options);
+	WRITE_BOOL_FIELD(chain);
+	WRITE_ENUM_FIELD(kind, TransactionStmtKind);
+	WRITE_NODE_FIELD(wait);
+}
+
+static void
+_outWaitClause(StringInfo str, const WaitClause *node)
+{
+	WRITE_NODE_TYPE("WAITCLAUSE");
+
+	WRITE_STRING_FIELD(lsn);
+	WRITE_UINT_FIELD(timeout);
+}
+
 static void
 _outTableLikeClause(StringInfo str, const TableLikeClause *node)
 {
@@ -4327,6 +4349,12 @@ outNode(StringInfo str, const void *obj)
 			case T_PartitionRangeDatum:
 				_outPartitionRangeDatum(str, obj);
 				break;
+			case T_TransactionStmt:
+				_outTransactionStmt(str, obj);
+				break;
+			case T_WaitClause:
+				_outWaitClause(str, obj);
+				break;
 
 			default:
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3449c26bd11..a2f40285c6e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -592,6 +592,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <partboundspec> PartitionBoundSpec
 %type <list>		hash_partbound
 %type <defelt>		hash_partbound_elem
+%type <ival>		wait_time
+%type <node>		wait_for
 
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
@@ -661,7 +663,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	LABEL LANGUAGE LARGE_P LAST_P LATERAL_P
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
-	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
+	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED LSN
 
 	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
@@ -692,7 +694,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P
 
 	TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
-	TIES TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
+	TIES TIME TIMEOUT TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
 	TREAT TRIGGER TRIM TRUE_P
 	TRUNCATE TRUSTED TYPE_P TYPES_P
 
@@ -702,7 +704,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAIT WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLNAMESPACES
 	XMLPARSE XMLPI XMLROOT XMLSERIALIZE XMLTABLE
@@ -9946,18 +9949,20 @@ TransactionStmt:
 					n->chain = $3;
 					$$ = (Node *)n;
 				}
-			| BEGIN_P opt_transaction transaction_mode_list_or_empty
+			| BEGIN_P opt_transaction transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_BEGIN;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
-			| START TRANSACTION transaction_mode_list_or_empty
+			| START TRANSACTION transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_START;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
 			| COMMIT opt_transaction opt_transaction_chain
@@ -14187,6 +14192,25 @@ xml_passing_mech:
 			| BY VALUE_P
 		;
 
+/*
+ * WAIT FOR clause of BEGIN and START TRANSACTION statements
+ */
+wait_for:
+			WAIT FOR LSN Sconst wait_time
+				{
+					WaitClause *n = makeNode(WaitClause);
+					n->lsn = $4;
+					n->timeout = $5;
+					$$ = (Node *)n;
+				}
+			| /* EMPTY */		{ $$ = NULL; }
+		;
+
+wait_time:
+			TIMEOUT Iconst		{ $$ = $2; }
+			| /* EMPTY */		{ $$ = 0; }
+		;
+
 
 /*
  * Aggregate decoration clauses
@@ -15338,6 +15362,7 @@ unreserved_keyword:
 			| LOCK_P
 			| LOCKED
 			| LOGGED
+			| LSN
 			| MAPPING
 			| MATCH
 			| MATERIALIZED
@@ -15465,6 +15490,7 @@ unreserved_keyword:
 			| TEMPORARY
 			| TEXT_P
 			| TIES
+			| TIMEOUT
 			| TRANSACTION
 			| TRANSFORM
 			| TRIGGER
@@ -15491,6 +15517,7 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAIT
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cde..2dcfdde5f3f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -147,6 +148,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WaitShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -264,6 +266,11 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of events for the WAIT FOR clause in shared memory
+	 */
+	WaitShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 9938cddb570..a7887bd98e2 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -38,6 +38,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -717,6 +718,9 @@ LockErrorCleanup(void)
 
 	AbortStrongLockAcquire();
 
+	/* If BEGIN WAIT FOR LSN was interrupted, then stop waiting for that LSN */
+	DeleteWaitedLSN();
+
 	/* Nothing to do if we weren't waiting for a lock */
 	if (lockAwaited == NULL)
 	{
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index b1f7f6e2d01..59c041d8507 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -57,6 +57,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -591,6 +592,18 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 					case TRANS_STMT_START:
 						{
 							ListCell   *lc;
+							WaitClause *waitstmt = (WaitClause *) stmt->wait;
+
+							/* WAIT FOR cannot be used on master */
+							if (stmt->wait && !RecoveryInProgress())
+								ereport(ERROR,
+										(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+										 errmsg("WAIT FOR can only be "
+												"used on standby")));
+
+							/* If needed to WAIT FOR something but failed */
+							if (stmt->wait && WaitMain(waitstmt, dest) == 0)
+								break;
 
 							BeginTransactionBlock();
 							foreach(lc, stmt->options)
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index ee340fb0f02..03f997cba70 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -372,8 +372,6 @@ pg_sleep(PG_FUNCTION_ARGS)
 	 * less than the specified time when WaitLatch is terminated early by a
 	 * non-query-canceling signal such as SIGHUP.
 	 */
-#define GetNowFloat()	((float8) GetCurrentTimestamp() / 1000000.0)
-
 	endtime = GetNowFloat() + secs;
 
 	for (;;)
diff --git a/src/include/commands/wait.h b/src/include/commands/wait.h
new file mode 100644
index 00000000000..d08ee574ed3
--- /dev/null
+++ b/src/include/commands/wait.h
@@ -0,0 +1,27 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.h
+ *	  prototypes for commands/wait.c
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * src/include/commands/wait.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAIT_H
+#define WAIT_H
+#include "postgres.h"
+#include "tcop/dest.h"
+#include "nodes/parsenodes.h"
+
+extern bool WaitUtility(XLogRecPtr lsn, const int timeout_ms);
+extern Size WaitShmemSize(void);
+extern void WaitShmemInit(void);
+extern void WaitSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWaitedLSN(void);
+extern int	WaitMain(WaitClause *stmt, DestReceiver *dest);
+extern void DeleteWaitedLSN(void);
+
+#endif							/* WAIT_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 50b1ba51863..c37663a28bd 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -492,6 +492,7 @@ typedef enum NodeTag
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
 	T_SQLCmd,
+	T_WaitClause,
 
 	/*
 	 * TAGS FOR RANDOM OTHER STUFF
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index cd6f1be6435..2d0aad8df98 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1430,6 +1430,17 @@ typedef struct OnConflictClause
 	int			location;		/* token location, or -1 if unknown */
 } OnConflictClause;
 
+/*
+ * WaitClause -
+ *		representation of WAIT FOR clause for BEGIN and START TRANSACTION.
+ */
+typedef struct WaitClause
+{
+	NodeTag		type;
+	char	   *lsn;			/* LSN to wait for */
+	int			timeout;		/* Number of milliseconds to limit wait time */
+} WaitClause;
+
 /*
  * CommonTableExpr -
  *	   representation of WITH list element
@@ -3058,6 +3069,7 @@ typedef struct TransactionStmt
 	char	   *savepoint_name; /* for savepoint commands */
 	char	   *gid;			/* for two-phase-commit related commands */
 	bool		chain;			/* AND CHAIN option */
+	Node	   *wait;			/* WAIT FOR clause */
 } TransactionStmt;
 
 /* ----------------------
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 08f22ce211d..6e1848fe4cc 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -243,6 +243,7 @@ PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
+PG_KEYWORD("lsn", LSN, UNRESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
 PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
@@ -410,6 +411,7 @@ PG_KEYWORD("text", TEXT_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("then", THEN, RESERVED_KEYWORD)
 PG_KEYWORD("ties", TIES, UNRESERVED_KEYWORD)
 PG_KEYWORD("time", TIME, COL_NAME_KEYWORD)
+PG_KEYWORD("timeout", TIMEOUT, UNRESERVED_KEYWORD)
 PG_KEYWORD("timestamp", TIMESTAMP, COL_NAME_KEYWORD)
 PG_KEYWORD("to", TO, RESERVED_KEYWORD)
 PG_KEYWORD("trailing", TRAILING, RESERVED_KEYWORD)
@@ -450,6 +452,7 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("wait", WAIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
diff --git a/src/include/utils/timestamp.h b/src/include/utils/timestamp.h
index 03a1de569f0..eaeeb79c411 100644
--- a/src/include/utils/timestamp.h
+++ b/src/include/utils/timestamp.h
@@ -109,4 +109,6 @@ extern int	date2isoyearday(int year, int mon, int mday);
 
 extern bool TimestampTimestampTzRequiresRewrite(void);
 
+#define GetNowFloat() ((float8) GetCurrentTimestamp() / 1000000.0)
+
 #endif							/* TIMESTAMP_H */
diff --git a/src/test/recovery/t/020_begin_wait.pl b/src/test/recovery/t/020_begin_wait.pl
new file mode 100644
index 00000000000..da4bfb4ef32
--- /dev/null
+++ b/src/test/recovery/t/020_begin_wait.pl
@@ -0,0 +1,85 @@
+# Checks WAIT FOR
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 8;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+
+# Add some content and take a backup
+$node_master->safe_psql('postgres',
+	"CREATE TABLE wait_test AS SELECT generate_series(1,10) AS a");
+my $backup_name = 'my_backup';
+$node_master->backup($backup_name);
+
+# Using the backup, create a streaming standby with a 1 second delay
+my $node_standby = get_new_node('standby');
+my $delay        = 1;
+$node_standby->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby->append_conf('postgresql.conf', qq[
+	recovery_min_apply_delay = '${delay}s'
+]);
+$node_standby->start;
+
+
+# Check that timeouts make us wait for the specified time (2s here)
+my $current_time = $node_standby->safe_psql('postgres', "SELECT now()");
+my $two_seconds = 2000; # in milliseconds
+my $start_time = time();
+$node_standby->safe_psql('postgres',
+	"BEGIN WAIT FOR LSN '0/FFFFFFFF' TIMEOUT $two_seconds");
+my $time_waited = (time() - $start_time) * 1000; # convert to milliseconds
+ok($time_waited >= $two_seconds, "WAIT FOR TIMEOUT waits for enough time");
+
+
+# Check that timeouts let us stop waiting right away, before reaching target LSN
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(11, 20))");
+my $lsn1 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+my ($ret, $out, $err) = $node_standby->psql('postgres',
+	"BEGIN WAIT FOR LSN '$lsn1' TIMEOUT 1");
+
+ok($ret == 0, "zero return value when failed to WAIT FOR LSN on standby");
+ok($err =~ /WARNING:  didn't start transaction because LSN was not reached/,
+	"correct error message when failed to WAIT FOR LSN on standby");
+ok($out eq "f", "if given too little wait time, WAIT doesn't reach target LSN");
+
+
+# Check that WAIT FOR works fine and reaches target LSN if given no timeout
+
+# Add data on master, memorize master's last LSN
+$node_master->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(21, 30))");
+my $lsn2 = $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+# Wait for it to appear on replica, memorize replica's last LSN
+$node_standby->safe_psql('postgres',
+	"BEGIN WAIT FOR LSN '$lsn2'");
+my $reached_lsn = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+
+# Make sure that master's and replica's LSNs are the same after WAIT
+my $compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$reached_lsn'::pg_lsn, '$lsn2'::pg_lsn)");
+ok($compare_lsns eq 0,
+	"standby reached the same LSN as master before starting transaction");
+
+
+# Make sure that it's not allowed to use WAIT FOR on master
+($ret, $out, $err) = $node_master->psql('postgres',
+	"BEGIN WAIT FOR LSN '0/FFFFFFFF'");
+
+ok($ret != 0, "non-zero return value when trying to WAIT FOR LSN on master");
+ok($err =~ /ERROR:  WAIT FOR can only be used on standby/,
+	"correct error message when trying to WAIT FOR LSN on master");
+ok($out eq '', "empty output when trying to WAIT FOR LSN on master");
+
+
+$node_standby->stop;
+$node_master->stop;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 34623523a70..a2d1b9defc2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2621,6 +2621,7 @@ WSABUF
 WSADATA
 WSANETWORKEVENTS
 WSAPROTOCOL_INFO
+WaitClause
 WaitEvent
 WaitEventActivity
 WaitEventClient
@@ -2629,6 +2630,7 @@ WaitEventIPC
 WaitEventSet
 WaitEventTimeout
 WaitPMResult
+WaitState
 WalCloseMethod
 WalLevel
 WalRcvData
#73Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Anna Akenteva (#72)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On Tue, Apr 7, 2020 at 10:58 PM Anna Akenteva <a.akenteva@postgrespro.ru> wrote:

Thank you for your review!
Ivan and I have worked on the patch and tried to address your comments:

I've pushed this. I promise to do careful post-commit review and
resolve any issues arising.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#74Kartyshov Ivan
i.kartyshov@postgrespro.ru
In reply to: Alexander Korotkov (#73)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020-04-08 00:27, Tom Lane wrote:

Alexander Korotkov <akorotkov@postgresql.org> writes:

» WAIT FOR LSN lsn [ TIMEOUT timeout ]

This seems like a really carelessly chosen syntax --- *three* new
keywords, when you probably didn't need any. Are you not aware that
there is distributed overhead in the grammar for every keyword?
Plus, each new keyword carries the risk of breaking existing
applications, since it no longer works as an alias-not-preceded-by-AS.

To avoid creating new keywords, we could change syntax in the following
way:
WAIT FOR => DEPENDS ON
LSN => EVENT
TIMEOUT => WITH INTERVAL

So
START TRANSACTION WAIT FOR LSN '0/3F07A6B1' TIMEOUT 5000;
would instead look as
START TRANSACTION DEPENDS ON EVENT '0/3F07A6B1' WITH INTERVAL '5 seconds';

[1]: /messages/by-id/28209.1586294824@sss.pgh.pa.us

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#75Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Kartyshov Ivan (#74)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On Wed, Apr 8, 2020 at 2:14 AM Kartyshov Ivan
<i.kartyshov@postgrespro.ru> wrote:

On 2020-04-08 00:27, Tom Lane wrote:

Alexander Korotkov <akorotkov@postgresql.org> writes:

» WAIT FOR LSN lsn [ TIMEOUT timeout ]

This seems like a really carelessly chosen syntax --- *three* new
keywords, when you probably didn't need any. Are you not aware that
there is distributed overhead in the grammar for every keyword?
Plus, each new keyword carries the risk of breaking existing
applications, since it no longer works as an alias-not-preceded-by-AS.

To avoid creating new keywords, we could change syntax in the following
way:
WAIT FOR => DEPENDS ON

Looks OK for me.

LSN => EVENT

I think it's too generic. Not every event is lsn. TBH, lsn is not
event at all :)

I wonder if we can still use the word lsn, but not use a keyword for that.
Can we take an arbitrary non-quoted literal there and check it later?

TIMEOUT => WITH INTERVAL

I'm not yet sure about this. Probably there are better options.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#76Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Alexander Korotkov (#75)
Re: [HACKERS] make async slave to wait for lsn to be replayed

At Wed, 8 Apr 2020 02:52:55 +0300, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote in

On Wed, Apr 8, 2020 at 2:14 AM Kartyshov Ivan
<i.kartyshov@postgrespro.ru> wrote:

On 2020-04-08 00:27, Tom Lane wrote:

Alexander Korotkov <akorotkov@postgresql.org> writes:

» WAIT FOR LSN lsn [ TIMEOUT timeout ]

This seems like a really carelessly chosen syntax --- *three* new
keywords, when you probably didn't need any. Are you not aware that
there is distributed overhead in the grammar for every keyword?
Plus, each new keyword carries the risk of breaking existing
applications, since it no longer works as an alias-not-preceded-by-AS.

To avoid creating new keywords, we could change syntax in the following
way:
WAIT FOR => DEPENDS ON

Looks OK for me.

LSN => EVENT

I think it's too generic. Not every event is lsn. TBH, lsn is not
event at all :)

I wonder if we can still use the word lsn, but not use a keyword for that.
Can we take an arbitrary non-quoted literal there and check it later?

TIMEOUT => WITH INTERVAL

I'm not yet sure about this. Probably there are better options.

How about something like the following?

BEGIN AFTER ColId Sconst
BEGIN FOLLOWING ColId Sconst

UNTIL <absolute time>;
LIMIT BY <interval>;
WITHIN Iconst;

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#77Anna Akenteva
a.akenteva@postgrespro.ru
In reply to: Kyotaro Horiguchi (#76)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020-04-08 04:09, Kyotaro Horiguchi wrote:

How about something like the following?

BEGIN AFTER ColId Sconst
BEGIN FOLLOWING ColId Sconst

UNTIL <absolute time>;
LIMIT BY <interval>;
WITHIN Iconst;

regards.

I like your suggested keywords! I think that "AFTER" + "WITHIN" sound
the most natural. We could completely give up the LSN keyword for now.
The final command could look something like:

BEGIN AFTER '0/303EC60' WITHIN '5 seconds';
or
BEGIN AFTER '0/303EC60' WITHIN 5000;

I'd like to hear others' opinions on the syntax as well.

--
Anna Akenteva
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#78Tom Lane
tgl@sss.pgh.pa.us
In reply to: Anna Akenteva (#77)
Re: [HACKERS] make async slave to wait for lsn to be replayed

Anna Akenteva <a.akenteva@postgrespro.ru> writes:

I'd like to hear others' opinions on the syntax as well.

Pardon me for coming very late to the party, but it seems like there are
other questions that ought to be answered before we worry about any of
this. Why is this getting grafted onto BEGIN/START TRANSACTION in the
first place? It seems like it would be just as useful as a separate
command, if not more so. You could always start a transaction just
after waiting. Conversely, there might be reasons to want to wait
within an already-started transaction.

If it could survive as a separate command, then I'd humbly suggest
that it requires no grammar work at all. You could just invent one
or more functions that take suitable parameters.

regards, tom lane

#79Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tom Lane (#78)
Re: [HACKERS] make async slave to wait for lsn to be replayed

At Wed, 08 Apr 2020 16:35:46 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in

Anna Akenteva <a.akenteva@postgrespro.ru> writes:

I'd like to hear others' opinions on the syntax as well.

Pardon me for coming very late to the party, but it seems like there are
other questions that ought to be answered before we worry about any of
this. Why is this getting grafted onto BEGIN/START TRANSACTION in the
first place? It seems like it would be just as useful as a separate
command, if not more so. You could always start a transaction just
after waiting. Conversely, there might be reasons to want to wait
within an already-started transaction.

If it could survive as a separate command, then I'd humbly suggest
that it requires no grammar work at all. You could just invent one
or more functions that take suitable parameters.

The rationale for not being a fmgr function is stated in the following
comments.

/messages/by-id/CAEepm=0V74EApmfv=MGZa24Ac_pV1vGrp3Ovnv-3rUXwxu9epg@mail.gmail.com
| because it doesn't work for our 2 higher isolation levels as
| mentioned."

/messages/by-id/CA+Tgmob-aG3Lqh6OpvMDYTNR5eyq94VugyEejyk7pLhE9uwnyA@mail.gmail.com

| IMHO, trying to do this using a function-based interface is a really
| bad idea for exactly the reasons you mention. I don't see why we'd
| resist the idea of core syntax here; transactions are a core part of
| PostgreSQL.

It seemed to me that they were suggesting it be part of the BEGIN
command, but the next proposed patch implemented a separate "WAIT FOR"
command, for reasons that are unclear to me. I don't object to a separate
command if it is more useful than being part of the BEGIN command.

By the way, for example, pg_current_wal_lsn() is a volatile function
and repeated calls within a SERIALIZABLE transaction can return
different values.

If there's no necessity for this feature to be a core command, I think
I would like it to be a function.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#80Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Kyotaro Horiguchi (#79)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020/04/09 16:11, Kyotaro Horiguchi wrote:

At Wed, 08 Apr 2020 16:35:46 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in

Anna Akenteva <a.akenteva@postgrespro.ru> writes:

I'd like to hear others' opinions on the syntax as well.

Pardon me for coming very late to the party, but it seems like there are
other questions that ought to be answered before we worry about any of
this. Why is this getting grafted onto BEGIN/START TRANSACTION in the
first place? It seems like it would be just as useful as a separate
command, if not more so. You could always start a transaction just
after waiting. Conversely, there might be reasons to want to wait
within an already-started transaction.

If it could survive as a separate command, then I'd humbly suggest
that it requires no grammar work at all. You could just invent one
or more functions that take suitable parameters.

The rationale for not being a fmgr function is stated in the following
comments.

This issue happens because the function is executed after BEGIN? If yes,
what about executing the function (i.e., as separate transaction) before BEGIN?
If so, the snapshot taken in the function doesn't affect the subsequent
transaction whatever its isolation level is.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#81Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fujii Masao (#80)
Re: [HACKERS] make async slave to wait for lsn to be replayed

Fujii Masao <masao.fujii@oss.nttdata.com> writes:

On 2020/04/09 16:11, Kyotaro Horiguchi wrote:

At Wed, 08 Apr 2020 16:35:46 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in

Why is this getting grafted onto BEGIN/START TRANSACTION in the
first place?

The rationale for not being a fmgr function is stated in the following
comments. [...]

This issue happens because the function is executed after BEGIN? If yes,
what about executing the function (i.e., as separate transaction) before BEGIN?
If so, the snapshot taken in the function doesn't affect the subsequent
transaction whatever its isolation level is.

I wonder whether making it a procedure, rather than a plain function,
would help any.

regards, tom lane

#82Alexey Kondratov
a.kondratov@postgrespro.ru
In reply to: Tom Lane (#81)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020-04-09 16:33, Tom Lane wrote:

Fujii Masao <masao.fujii@oss.nttdata.com> writes:

On 2020/04/09 16:11, Kyotaro Horiguchi wrote:

At Wed, 08 Apr 2020 16:35:46 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in

Why is this getting grafted onto BEGIN/START TRANSACTION in the
first place?

The rationale for not being a fmgr function is stated in the following
comments. [...]

This issue happens because the function is executed after BEGIN? If yes,
what about executing the function (i.e., as separate transaction) before BEGIN?
If so, the snapshot taken in the function doesn't affect the subsequent
transaction whatever its isolation level is.

I wonder whether making it a procedure, rather than a plain function,
would help any.

Just another idea in case if one will still decide to go with a separate
statement + BEGIN integration instead of a function. We could use
parenthesized options list here. This is already implemented for VACUUM,
REINDEX, etc. There was an idea to allow CONCURRENTLY in REINDEX there
[1] and recently this was proposed again for new options [2], since it
is much more extensible from the grammar perspective.

That way, the whole feature may look like:

WAIT (LSN '16/B374D848', TIMEOUT 100);

and/or

BEGIN
WAIT (LSN '16/B374D848', WHATEVER_OPTION_YOU_WANT);
...
COMMIT;

It requires only one reserved keyword 'WAIT'. The advantage of this
approach is that it can be extended to support xid, timestamp, csn or
anything else, that may be invented in the future, without affecting the
grammar.

What do you think?

Personally, I find this syntax to be more convenient and human-readable
compared with function call:

SELECT pg_wait_for_lsn('16/B374D848');
BEGIN;

[1]: /messages/by-id/aad2ec49-5142-7356-ffb2-a9b2649cdd1f@2ndquadrant.com

[2]: /messages/by-id/20200401060334.GB142683@paquier.xyz

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

#83Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Alexey Kondratov (#82)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020/04/10 3:16, Alexey Kondratov wrote:

On 2020-04-09 16:33, Tom Lane wrote:

Fujii Masao <masao.fujii@oss.nttdata.com> writes:

On 2020/04/09 16:11, Kyotaro Horiguchi wrote:

At Wed, 08 Apr 2020 16:35:46 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in

Why is this getting grafted onto BEGIN/START TRANSACTION in the
first place?

The rationale for not being a fmgr function is stated in the following
comments. [...]

This issue happens because the function is executed after BEGIN? If yes,
what about executing the function (i.e., as separate transaction) before BEGIN?
If so, the snapshot taken in the function doesn't affect the subsequent
transaction whatever its isolation level is.

I wonder whether making it a procedure, rather than a plain function,
would help any.

Just another idea in case if one will still decide to go with a separate statement + BEGIN integration instead of a function. We could use parenthesized options list here. This is already implemented for VACUUM, REINDEX, etc. There was an idea to allow CONCURRENTLY in REINDEX there [1] and recently this was proposed again for new options [2], since it is much more extensible from the grammar perspective.

That way, the whole feature may look like:

WAIT (LSN '16/B374D848', TIMEOUT 100);

and/or

BEGIN
WAIT (LSN '16/B374D848', WHATEVER_OPTION_YOU_WANT);
...
COMMIT;

It requires only one reserved keyword 'WAIT'. The advantage of this approach is that it can be extended to support xid, timestamp, csn or anything else, that may be invented in the future, without affecting the grammar.

What do you think?

Personally, I find this syntax to be more convenient and human-readable compared with function call:

SELECT pg_wait_for_lsn('16/B374D848');
BEGIN;

I can imagine that some users want to specify the LSN to wait for,
from the result of another query, for example,
SELECT pg_wait_for_lsn(lsn) FROM xxx. If this is valid use case,
isn't the function better?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#84Alexey Kondratov
a.kondratov@postgrespro.ru
In reply to: Fujii Masao (#83)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020-04-10 05:25, Fujii Masao wrote:

On 2020/04/10 3:16, Alexey Kondratov wrote:

Just another idea in case if one will still decide to go with a
separate statement + BEGIN integration instead of a function. We could
use parenthesized options list here. This is already implemented for
VACUUM, REINDEX, etc. There was an idea to allow CONCURRENTLY in
REINDEX there [1] and recently this was proposed again for new options
[2], since it is much more extensible from the grammar perspective.

That way, the whole feature may look like:

WAIT (LSN '16/B374D848', TIMEOUT 100);

and/or

BEGIN
WAIT (LSN '16/B374D848', WHATEVER_OPTION_YOU_WANT);
...
COMMIT;

It requires only one reserved keyword 'WAIT'. The advantage of this
approach is that it can be extended to support xid, timestamp, csn or
anything else, that may be invented in the future, without affecting
the grammar.

What do you think?

Personally, I find this syntax to be more convenient and
human-readable compared with function call:

SELECT pg_wait_for_lsn('16/B374D848');
BEGIN;

I can imagine that some users want to specify the LSN to wait for,
from the result of another query, for example,
SELECT pg_wait_for_lsn(lsn) FROM xxx. If this is valid use case,
isn't the function better?

I think that the main purpose of the feature is to achieve
read-your-writes-consistency, while using async replica for reads. In
that case lsn of last modification is stored inside application, so
there is no need to do any query for that. Moreover, you cannot store
this lsn inside database, since reads are distributed across all
replicas (+ primary).

Thus, I could imagine that 'xxx' in your example stands for some kind of
stored procedure that fetches the lsn from off-postgres storage, but that
looks like a very narrow case to count on, doesn't it?

Anyway, I am not against implementing this as a function. That was just
another option to consider.

Just realized that the last patch I have seen does not allow usage of
wait on primary. It may be a problem if reads are pooled not only across
replicas, but on primary as well, which should be quite usual I guess.
In that case the application does not know whether a request will be
processed on a replica or on the primary. I think it should be allowed
without any warning, or with just a LOG/DEBUG message at most, saying
that no waiting was performed.

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

#85Andres Freund
andres@anarazel.de
In reply to: Fujii Masao (#83)
Re: [HACKERS] make async slave to wait for lsn to be replayed

Hi,

On 2020-04-10 11:25:02 +0900, Fujii Masao wrote:

BEGIN
WAIT (LSN '16/B374D848', WHATEVER_OPTION_YOU_WANT);
...
COMMIT;

It requires only one reserved keyword 'WAIT'. The advantage of this approach is that it can be extended to support xid, timestamp, csn or anything else, that may be invented in the future, without affecting the grammar.

What do you think?

Personally, I find this syntax to be more convenient and human-readable compared with function call:

SELECT pg_wait_for_lsn('16/B374D848');
BEGIN;

I can imagine that some users want to specify the LSN to wait for,
from the result of another query, for example,
SELECT pg_wait_for_lsn(lsn) FROM xxx. If this is valid use case,
isn't the function better?

I don't think a function is a good idea - it'll cause a snapshot to be
held while waiting. Which in turn will cause hot_standby_feedback to not
be able to report an increased xmin up. And it will possibly hit
snapshot recovery conflicts.

Whereas explicit syntax, especially if a transaction control statement,
won't have that problem.

I'd personally look at 'AFTER' instead of 'WAIT'. Upthread you talked
about a reserved keyword - why does it have to be reserved?

FWIW, I'm not really convinced there needs to be bespoke timeout syntax
for this feature. I can see reasons why you'd not just want to rely on
statement_timeout, but at least that should be discussed.

Greetings,

Andres Freund

#86Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#85)
Re: [HACKERS] make async slave to wait for lsn to be replayed

Andres Freund <andres@anarazel.de> writes:

I don't think a function is a good idea - it'll cause a snapshot to be
held while waiting. Which in turn will cause hot_standby_feedback to not
be able to report an increased xmin up. And it will possibly hit
snapshot recovery conflicts.

Good point, but we could address that by making it a procedure no?

regards, tom lane

#87Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#86)
Re: [HACKERS] make async slave to wait for lsn to be replayed

Hi,

On 2020-04-10 16:29:39 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

I don't think a function is a good idea - it'll cause a snapshot to be
held while waiting. Which in turn will cause hot_standby_feedback to not
be able to report an increased xmin up. And it will possibly hit
snapshot recovery conflicts.

Good point, but we could address that by making it a procedure no?

Probably. Don't think we have great infrastructure for builtin
procedures yet though? We'd presumably not want to use plpgsql.

ISTM that we can make it BEGIN AFTER 'xx/xx' or such, which'd not
require any keywords, it'd be easier to use than a procedure.

With a separate procedure, you'd likely need more roundtrips / complex
logic at the client. You either need to check first if the procedure
errored out, and then send the BEGIN, or send both together and separate
out potential errors.

Greetings,

Andres Freund

#88Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#87)
Re: [HACKERS] make async slave to wait for lsn to be replayed

Andres Freund <andres@anarazel.de> writes:

On 2020-04-10 16:29:39 -0400, Tom Lane wrote:

Good point, but we could address that by making it a procedure no?

Probably. Don't think we have great infrastructure for builtin
procedures yet though? We'd presumably not want to use plpgsql.

Don't think anyone's tried yet. It's not instantly clear that the
amount of code needed would be more than comes along with new
syntax, though.

ISTM that we can make it BEGIN AFTER 'xx/xx' or such, which'd not
require any keywords, it'd be easier to use than a procedure.

I still don't see a good argument for tying this to BEGIN. If it
has to be a statement, why not a standalone statement?

(I also have a lurking suspicion that this shouldn't be SQL at all
but part of the replication command set.)

regards, tom lane

#89Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#88)
Re: [HACKERS] make async slave to wait for lsn to be replayed

Hi,

On 2020-04-10 17:17:10 -0400, Tom Lane wrote:

ISTM that we can make it BEGIN AFTER 'xx/xx' or such, which'd not
require any keywords, it'd be easier to use than a procedure.

I still don't see a good argument for tying this to BEGIN. If it
has to be a statement, why not a standalone statement?

Because the goal is to start a transaction where a certain action from
the primary is visible.

I think there's also some advantages of having it in a single statement
for poolers. If a pooler analyzes BEGIN AFTER 'xx/xx' it could
e.g. redirect the transaction to a node that's caught up far enough,
instead of blocking. But that can't work even close to as easily if it's
something that has to be executed before transaction begin.
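
A minimal sketch of the pooler-side check described above, in C. It is only an illustration under stated assumptions: the names parse_lsn, StandbyStatus and pick_caught_up_standby are hypothetical and not part of any patch in this thread, and the pooler is assumed to already know each standby's replay position (for example by polling pg_last_wal_replay_lsn()). An LSN in its textual 'hi/lo' hexadecimal form is converted to a 64-bit value so that two positions can be compared numerically.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical view a pooler might keep of each standby. */
typedef struct
{
	const char *name;			/* node label, illustration only */
	const char *replay_lsn;		/* last replayed LSN in "hi/lo" hex text */
} StandbyStatus;

/*
 * Parse an LSN written as two hexadecimal numbers separated by a slash.
 * Returns 1 on success and 0 on malformed input. Inputs with more than
 * eight hex digits per half are not handled by this sketch.
 */
int
parse_lsn(const char *text, uint64_t *lsn)
{
	unsigned int hi;
	unsigned int lo;
	int			consumed = 0;

	if (sscanf(text, "%X/%X%n", &hi, &lo, &consumed) != 2)
		return 0;
	if (text[consumed] != '\0')
		return 0;				/* trailing garbage after the LSN */

	*lsn = ((uint64_t) hi << 32) | lo;
	return 1;
}

/*
 * Return the name of the first standby whose replay position is at or
 * past the target LSN, or NULL if none has caught up yet (in which case
 * the caller would have to wait or fall back to the primary).
 */
const char *
pick_caught_up_standby(const StandbyStatus *nodes, size_t n,
					   const char *target)
{
	uint64_t	want;
	uint64_t	have;
	size_t		i;

	if (!parse_lsn(target, &want))
		return NULL;

	for (i = 0; i < n; i++)
	{
		if (parse_lsn(nodes[i].replay_lsn, &have) && have >= want)
			return nodes[i].name;
	}
	return NULL;
}
```

With two standbys at '16/B374D000' and '16/B374D900', a request for BEGIN AFTER '16/B374D848' would be routed to the second one, since only its replay position is at or past the target.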

(I also have a lurking suspicion that this shouldn't be SQL at all
but part of the replication command set.)

Hm? I'm not quite following. The feature is useful to achieve
read-your-own-writes consistency. Consider

Primary: INSERT INTO app_users ...; SELECT pg_current_wal_lsn();
Standby: BEGIN AFTER 'returned/lsn';
Standby: SELECT i_am_a_set_of_very_expensive_queries FROM ..., app_users;

without the AFTER/WAIT whatnot, you cannot rely on the insert having
been replicated to the standby.

Offloading queries from the write node to replicas is a pretty standard
technique for scaling out databases (including PG). We just make it
harder than necessary.

How would this be part of the replication command set? This shouldn't
require replication permissions for the user executing the queries.
While I'm in favor of merging the replication protocol entirely with the
normal protocol, I've so far received very little support for that
proposition...

Greetings,

Andres Freund

#90Anna Akenteva
a.akenteva@postgrespro.ru
In reply to: Andres Freund (#89)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020-04-11 00:44, Andres Freund wrote:

I think there's also some advantages of having it in a single statement
for poolers. If a pooler analyzes BEGIN AFTER 'xx/xx' it could
e.g. redirect the transaction to a node that's caught up far enough,
instead of blocking. But that can't work even close to as easily if it's
something that has to be executed before transaction begin.

I think that's a good point.

Also, I'm not sure how we'd expect a wait-for-LSN procedure to work
inside a single-snapshot transaction. Would it throw an error inside a
RR transaction block? Would it give a warning?

--
Anna Akenteva
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#91Daniel Gustafsson
daniel@yesql.se
In reply to: Anna Akenteva (#90)
Re: [HACKERS] make async slave to wait for lsn to be replayed

This patch requires some rewording of documentation/comments and variable names
after the language change introduced by 229f8c219f8f..a9a4a7ad565b; the thread
below can be used as a reference for how to change:

/messages/by-id/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de

cheers ./daniel

#92Anna Akenteva
a.akenteva@postgrespro.ru
In reply to: Daniel Gustafsson (#91)
1 attachment(s)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On 2020-07-13 14:21, Daniel Gustafsson wrote:

This patch requires some rewording of documentation/comments and variable names
after the language change introduced by 229f8c219f8f..a9a4a7ad565b; the thread
below can be used as a reference for how to change:

/messages/by-id/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de

Thank you for the heads up!

I updated the most recent patch and removed the use of "master" from it,
replacing it with "primary".

--
Anna Akenteva
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

begin_waitfor_v9.patch (text/x-diff)
commit 1021b166c0cb4e4017d503eb481e787638842ab1
Author: Akenteva Anna <a.akenteva@postgrespro.ru>
Date:   Tue Aug 18 12:37:16 2020 +0300

    wait for lsn

diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index c23bbfb4e71..fbfadbac8e9 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] [ WAIT FOR LSN <replaceable class="parameter">lsn_value</replaceable> [TIMEOUT <replaceable class="parameter">number_of_milliseconds</replaceable> ] ]
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
@@ -63,6 +63,17 @@ BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</
    <xref linkend="sql-set-transaction"/>
    was executed.
   </para>
+
+  <para>
+   The <literal>WAIT FOR</literal> clause waits for the target log
+   sequence number (<acronym>LSN</acronym>) to be replayed on the standby before
+   starting the transaction in <productname>PostgreSQL</productname> databases
+   with primary-standby asynchronous replication. Wait time can be limited by
+   specifying a timeout, which is measured in milliseconds and must be a positive
+   integer. If the <acronym>LSN</acronym> is not reached before the timeout, the
+   transaction does not begin. Waiting can be interrupted using <literal>Ctrl+C</literal>,
+   or by shutting down the <literal>postgres</literal> server.
+  </para>
  </refsect1>
 
  <refsect1>
@@ -146,6 +157,10 @@ BEGIN;
    different purpose in embedded SQL. You are advised to be careful
    about the transaction semantics when porting database applications.
   </para>
+
+  <para>
+   There is no <command>WAIT FOR</command> clause in the SQL standard.
+  </para>
  </refsect1>
 
  <refsect1>
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index d6cd1d41779..016e59930df 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] [ WAIT FOR LSN <replaceable class="parameter">lsn_value</replaceable> [TIMEOUT <replaceable class="parameter">number_of_milliseconds</replaceable> ] ]
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
@@ -40,6 +40,17 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable
    characteristics, as if <xref linkend="sql-set-transaction"/> was executed. This is the same
    as the <xref linkend="sql-begin"/> command.
   </para>
+
+  <para>
+   The <literal>WAIT FOR</literal> clause waits for the target log
+   sequence number (<acronym>LSN</acronym>) to be replayed on the standby before
+   starting the transaction in <productname>PostgreSQL</productname> databases
+   with primary-standby asynchronous replication. Wait time can be limited by
+   specifying a timeout, which is measured in milliseconds and must be a positive
+   integer. If the <acronym>LSN</acronym> is not reached before the timeout, the
+   transaction does not begin. Waiting can be interrupted using <literal>Ctrl+C</literal>,
+   or by shutting down the <literal>postgres</literal> server.
+  </para>
  </refsect1>
 
  <refsect1>
@@ -78,6 +89,10 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable
    omitted.
   </para>
 
+  <para>
+   There is no <command>WAIT FOR</command> clause in the SQL standard.
+  </para>
+
   <para>
    See also the compatibility section of <xref linkend="sql-set-transaction"/>.
   </para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 09c01ed4ae4..19cb04869c1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -7385,6 +7386,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * If we replayed an LSN that someone was waiting for,
+				 * set latches in shared memory array to notify the waiter.
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+				{
+					WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3ce65..9b310926c12 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00000000000..e1123df321e
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,282 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.c
+ *	  Implements WAIT FOR, which allows waiting for events such as
+ *	  LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <math.h>
+
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/backendid.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/sinvaladt.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+
+/* Add to shared memory array */
+static void AddWaitedLSN(XLogRecPtr lsn_to_wait);
+
+/* Shared memory structure */
+typedef struct
+{
+	int			backend_maxid;
+	pg_atomic_uint64 min_lsn; /* XLogRecPtr of minimal waited for LSN */
+	slock_t		mutex;
+	/* LSNs that different backends are waiting */
+	XLogRecPtr	lsn[FLEXIBLE_ARRAY_MEMBER];
+} WaitState;
+
+static WaitState *state;
+
+/*
+ * Add the wait event of the current backend to shared memory array
+ */
+static void
+AddWaitedLSN(XLogRecPtr lsn_to_wait)
+{
+	SpinLockAcquire(&state->mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->lsn[MyBackendId] = lsn_to_wait;
+
+	if (lsn_to_wait < state->min_lsn.value)
+		state->min_lsn.value = lsn_to_wait;
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Delete wait event of the current backend from the shared memory array.
+ */
+void
+DeleteWaitedLSN(void)
+{
+	int			i;
+	XLogRecPtr	lsn_to_delete;
+
+	SpinLockAcquire(&state->mutex);
+
+	lsn_to_delete = state->lsn[MyBackendId];
+	state->lsn[MyBackendId] = InvalidXLogRecPtr;
+
+	/* If we are deleting the minimal LSN, then choose the next min_lsn */
+	if (lsn_to_delete != InvalidXLogRecPtr &&
+		lsn_to_delete == state->min_lsn.value)
+	{
+		state->min_lsn.value = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->lsn[i] != InvalidXLogRecPtr &&
+				state->lsn[i] < state->min_lsn.value)
+				state->min_lsn.value = state->lsn[i];
+	}
+
+	/* If deleting from the end of the array, shorten the array's used part */
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >= 2; i--)
+			if (state->lsn[i] != InvalidXLogRecPtr)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Report amount of shared memory space needed for WaitState
+ */
+Size
+WaitShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(WaitState, lsn);
+	size = add_size(size, mul_size(MaxBackends + 1, sizeof(XLogRecPtr)));
+	return size;
+}
+
+/*
+ * Initialize an array of events to wait for in shared memory
+ */
+void
+WaitShmemInit(void)
+{
+	bool		found;
+	uint32		i;
+
+	state = (WaitState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitShmemSize(),
+										  &found);
+	if (!found)
+	{
+		SpinLockInit(&state->mutex);
+
+		for (i = 0; i < (MaxBackends + 1); i++)
+			state->lsn[i] = InvalidXLogRecPtr;
+
+		state->backend_maxid = 0;
+		state->min_lsn.value = PG_UINT64_MAX;
+	}
+}
+
+/*
+ * Set latches in shared memory to signal that new LSN has been replayed
+ */
+void
+WaitSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32		i;
+	int			backend_maxid;
+	PGPROC	   *backend;
+
+	SpinLockAcquire(&state->mutex);
+	backend_maxid = state->backend_maxid;
+
+	for (i = 2; i <= backend_maxid; i++)
+	{
+		backend = BackendIdGetProc(i);
+
+		if (backend && state->lsn[i] != 0 &&
+			state->lsn[i] <= cur_lsn)
+		{
+			SetLatch(&backend->procLatch);
+		}
+	}
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Get minimal LSN that someone waits for
+ */
+XLogRecPtr
+GetMinWaitedLSN(void)
+{
+	return state->min_lsn.value;
+}
+
+/*
+ * On WAIT use a latch to wait till LSN is replayed,
+ * postmaster dies or timeout happens. Timeout is specified in milliseconds.
+ * Returns true if LSN was reached and false otherwise.
+ */
+bool
+WaitUtility(XLogRecPtr target_lsn, const int timeout_ms)
+{
+	XLogRecPtr	cur_lsn = GetXLogReplayRecPtr(NULL);
+	int			latch_events;
+	float8		endtime;
+	bool		res = false;
+	bool		wait_forever = (timeout_ms <= 0);
+
+	endtime = GetNowFloat() + timeout_ms / 1000.0;
+
+	latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH;
+
+	/* Check if we already reached the needed LSN */
+	if (cur_lsn >= target_lsn)
+		return true;
+
+	AddWaitedLSN(target_lsn);
+
+	for (;;)
+	{
+		int			rc;
+		float8		time_left = 0;
+		long		time_left_ms = 0;
+
+		time_left = endtime - GetNowFloat();
+
+		/* Use 100 ms as the default timeout to check for interrupts */
+		if (wait_forever || time_left < 0 || time_left > 0.1)
+			time_left_ms = 100;
+		else
+			time_left_ms = (long) ceil(time_left * 1000.0);
+
+		/* If interrupt, LockErrorCleanup() will do DeleteWaitedLSN() for us */
+		CHECK_FOR_INTERRUPTS();
+
+		/* If postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		rc = WaitLatch(MyLatch, latch_events, time_left_ms,
+					   WAIT_EVENT_CLIENT_READ);
+
+		ResetLatch(MyLatch);
+
+		if (rc & WL_LATCH_SET)
+			cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		if (rc & WL_TIMEOUT)
+		{
+			cur_lsn = GetXLogReplayRecPtr(NULL);
+			/* If the time specified by user has passed, stop waiting */
+			time_left = endtime - GetNowFloat();
+			if (!wait_forever && time_left <= 0.0)
+				break;
+		}
+
+		/* If LSN has been replayed */
+		if (target_lsn <= cur_lsn)
+			break;
+	}
+
+	DeleteWaitedLSN();
+
+	if (cur_lsn < target_lsn)
+		ereport(WARNING,
+				(errcode(ERRCODE_NO_ACTIVE_SQL_TRANSACTION),
+				 errmsg("didn't start transaction because LSN was not reached"),
+				 errhint("Try to increase wait time.")));
+	else
+		res = true;
+
+	return res;
+}
+
+/*
+ * Implementation of WAIT FOR
+ */
+int
+WaitMain(WaitClause *stmt, DestReceiver *dest)
+{
+	TupleDesc	tupdesc;
+	TupOutputState *tstate;
+	XLogRecPtr	target_lsn;
+	bool		res = false;
+
+	target_lsn = DatumGetLSN(DirectFunctionCall1(pg_lsn_in,
+												 CStringGetDatum(stmt->lsn)));
+	res = WaitUtility(target_lsn, stmt->timeout);
+
+	/* Need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+
+	/* Prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsMinimalTuple);
+
+	/* Send the result */
+	do_text_output_oneline(tstate, res ? "t" : "f");
+	end_tup_output(tstate);
+	return res;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 89c409de664..2e8041a95df 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3749,10 +3749,22 @@ _copyTransactionStmt(const TransactionStmt *from)
 	COPY_STRING_FIELD(savepoint_name);
 	COPY_STRING_FIELD(gid);
 	COPY_SCALAR_FIELD(chain);
+	COPY_NODE_FIELD(wait);
 
 	return newnode;
 }
 
+static WaitClause *
+_copyWaitClause(const WaitClause *from)
+{
+	WaitClause *newnode = makeNode(WaitClause);
+
+	COPY_STRING_FIELD(lsn);
+	COPY_SCALAR_FIELD(timeout);
+
+	return newnode;
+};
+
 static CompositeTypeStmt *
 _copyCompositeTypeStmt(const CompositeTypeStmt *from)
 {
@@ -5340,6 +5352,9 @@ copyObjectImpl(const void *from)
 		case T_TransactionStmt:
 			retval = _copyTransactionStmt(from);
 			break;
+		case T_WaitClause:
+			retval = _copyWaitClause(from);
+			break;
 		case T_CompositeTypeStmt:
 			retval = _copyCompositeTypeStmt(from);
 			break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index e3f33c40be5..c04412ea0d9 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1542,6 +1542,16 @@ _equalTransactionStmt(const TransactionStmt *a, const TransactionStmt *b)
 	COMPARE_STRING_FIELD(savepoint_name);
 	COMPARE_STRING_FIELD(gid);
 	COMPARE_SCALAR_FIELD(chain);
+	COMPARE_NODE_FIELD(wait);
+
+	return true;
+}
+
+static bool
+_equalWaitClause(const WaitClause *a, const WaitClause *b)
+{
+	COMPARE_STRING_FIELD(lsn);
+	COMPARE_SCALAR_FIELD(timeout);
 
 	return true;
 }
@@ -3392,6 +3402,9 @@ equal(const void *a, const void *b)
 		case T_TransactionStmt:
 			retval = _equalTransactionStmt(a, b);
 			break;
+		case T_WaitClause:
+			retval = _equalWaitClause(a, b);
+			break;
 		case T_CompositeTypeStmt:
 			retval = _equalCompositeTypeStmt(a, b);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e2f177515da..626b7552629 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2787,6 +2787,28 @@ _outDefElem(StringInfo str, const DefElem *node)
 	WRITE_LOCATION_FIELD(location);
 }
 
+static void
+_outTransactionStmt(StringInfo str, const TransactionStmt *node)
+{
+	WRITE_NODE_TYPE("TRANSACTIONSTMT");
+
+	WRITE_STRING_FIELD(savepoint_name);
+	WRITE_STRING_FIELD(gid);
+	WRITE_NODE_FIELD(options);
+	WRITE_BOOL_FIELD(chain);
+	WRITE_ENUM_FIELD(kind, TransactionStmtKind);
+	WRITE_NODE_FIELD(wait);
+}
+
+static void
+_outWaitClause(StringInfo str, const WaitClause *node)
+{
+	WRITE_NODE_TYPE("WAITCLAUSE");
+
+	WRITE_STRING_FIELD(lsn);
+	WRITE_UINT_FIELD(timeout);
+}
+
 static void
 _outTableLikeClause(StringInfo str, const TableLikeClause *node)
 {
@@ -4337,6 +4359,12 @@ outNode(StringInfo str, const void *obj)
 			case T_PartitionRangeDatum:
 				_outPartitionRangeDatum(str, obj);
 				break;
+			case T_TransactionStmt:
+				_outTransactionStmt(str, obj);
+				break;
+			case T_WaitClause:
+				_outWaitClause(str, obj);
+				break;
 
 			default:
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index dbb47d49829..78a17190497 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -598,6 +598,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <partboundspec> PartitionBoundSpec
 %type <list>		hash_partbound
 %type <defelt>		hash_partbound_elem
+%type <ival>		wait_time
+%type <node>		wait_for
 
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
@@ -667,7 +669,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	LABEL LANGUAGE LARGE_P LAST_P LATERAL_P
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
-	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
+	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED LSN
 
 	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
@@ -698,7 +700,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P
 
 	TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
-	TIES TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
+	TIES TIME TIMEOUT TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
 	TREAT TRIGGER TRIM TRUE_P
 	TRUNCATE TRUSTED TYPE_P TYPES_P
 
@@ -708,7 +710,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WAIT WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLNAMESPACES
 	XMLPARSE XMLPI XMLROOT XMLSERIALIZE XMLTABLE
@@ -9724,18 +9727,20 @@ TransactionStmt:
 					n->chain = $3;
 					$$ = (Node *)n;
 				}
-			| BEGIN_P opt_transaction transaction_mode_list_or_empty
+			| BEGIN_P opt_transaction transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_BEGIN;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
-			| START TRANSACTION transaction_mode_list_or_empty
+			| START TRANSACTION transaction_mode_list_or_empty wait_for
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_START;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
 			| COMMIT opt_transaction opt_transaction_chain
@@ -14012,6 +14017,25 @@ xml_passing_mech:
 			| BY VALUE_P
 		;
 
+/*
+ * WAIT FOR clause of BEGIN and START TRANSACTION statements
+ */
+wait_for:
+			WAIT FOR LSN Sconst wait_time
+				{
+					WaitClause *n = makeNode(WaitClause);
+					n->lsn = $4;
+					n->timeout = $5;
+					$$ = (Node *)n;
+				}
+			| /* EMPTY */		{ $$ = NULL; }
+		;
+
+wait_time:
+			TIMEOUT Iconst		{ $$ = $2; }
+			| /* EMPTY */		{ $$ = 0; }
+		;
+
 
 /*
  * Aggregate decoration clauses
@@ -15158,6 +15182,7 @@ unreserved_keyword:
 			| LOCK_P
 			| LOCKED
 			| LOGGED
+			| LSN
 			| MAPPING
 			| MATCH
 			| MATERIALIZED
@@ -15285,6 +15310,7 @@ unreserved_keyword:
 			| TEMPORARY
 			| TEXT_P
 			| TIES
+			| TIMEOUT
 			| TRANSACTION
 			| TRANSFORM
 			| TRIGGER
@@ -15311,6 +15337,7 @@ unreserved_keyword:
 			| VIEW
 			| VIEWS
 			| VOLATILE
+			| WAIT
 			| WHITESPACE_P
 			| WITHIN
 			| WITHOUT
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 96c2aaabbd6..3b941dc7b8c 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -23,6 +23,7 @@
 #include "access/syncscan.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WaitShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -268,6 +270,11 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of events for the WAIT FOR clause in shared memory
+	 */
+	WaitShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index aa9fbd80545..48dacb3366e 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -38,6 +38,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -712,6 +713,9 @@ LockErrorCleanup(void)
 
 	AbortStrongLockAcquire();
 
+	/* If BEGIN WAIT FOR LSN was interrupted, then stop waiting for that LSN */
+	DeleteWaitedLSN();
+
 	/* Nothing to do if we weren't waiting for a lock */
 	if (lockAwaited == NULL)
 	{
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 9b0c376c8cb..618804e0311 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -57,6 +57,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -593,6 +594,18 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 					case TRANS_STMT_START:
 						{
 							ListCell   *lc;
+							WaitClause *waitstmt = (WaitClause *) stmt->wait;
+
+							/* WAIT FOR cannot be used on primary */
+							if (stmt->wait && !RecoveryInProgress())
+								ereport(ERROR,
+										(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+										 errmsg("WAIT FOR can only be "
+												"used on standby")));
+
+							/* If needed to WAIT FOR something but failed */
+							if (stmt->wait && WaitMain(waitstmt, dest) == 0)
+								break;
 
 							BeginTransactionBlock();
 							foreach(lc, stmt->options)
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 37c23c9155a..2181191d99e 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -373,8 +373,6 @@ pg_sleep(PG_FUNCTION_ARGS)
 	 * less than the specified time when WaitLatch is terminated early by a
 	 * non-query-canceling signal such as SIGHUP.
 	 */
-#define GetNowFloat()	((float8) GetCurrentTimestamp() / 1000000.0)
-
 	endtime = GetNowFloat() + secs;
 
 	for (;;)
diff --git a/src/include/commands/wait.h b/src/include/commands/wait.h
new file mode 100644
index 00000000000..d08ee574ed3
--- /dev/null
+++ b/src/include/commands/wait.h
@@ -0,0 +1,27 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.h
+ *	  prototypes for commands/wait.c
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * src/include/commands/wait.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAIT_H
+#define WAIT_H
+#include "postgres.h"
+#include "tcop/dest.h"
+#include "nodes/parsenodes.h"
+
+extern bool WaitUtility(XLogRecPtr lsn, const int timeout_ms);
+extern Size WaitShmemSize(void);
+extern void WaitShmemInit(void);
+extern void WaitSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWaitedLSN(void);
+extern int	WaitMain(WaitClause *stmt, DestReceiver *dest);
+extern void DeleteWaitedLSN(void);
+
+#endif							/* WAIT_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 381d84b4e4f..822827aa32d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -492,6 +492,7 @@ typedef enum NodeTag
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
 	T_SQLCmd,
+	T_WaitClause,
 
 	/*
 	 * TAGS FOR RANDOM OTHER STUFF
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 151bcdb7ef5..6a21ed5faeb 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1431,6 +1431,17 @@ typedef struct OnConflictClause
 	int			location;		/* token location, or -1 if unknown */
 } OnConflictClause;
 
+/*
+ * WaitClause -
+ *		representation of WAIT FOR clause for BEGIN and START TRANSACTION.
+ */
+typedef struct WaitClause
+{
+	NodeTag		type;
+	char	   *lsn;			/* LSN to wait for */
+	int			timeout;		/* Number of milliseconds to limit wait time */
+} WaitClause;
+
 /*
  * CommonTableExpr -
  *	   representation of WITH list element
@@ -3061,6 +3072,7 @@ typedef struct TransactionStmt
 	char	   *savepoint_name; /* for savepoint commands */
 	char	   *gid;			/* for two-phase-commit related commands */
 	bool		chain;			/* AND CHAIN option */
+	Node	   *wait;			/* WAIT FOR clause */
 } TransactionStmt;
 
 /* ----------------------
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 08f22ce211d..6e1848fe4cc 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -243,6 +243,7 @@ PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
+PG_KEYWORD("lsn", LSN, UNRESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
 PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
@@ -410,6 +411,7 @@ PG_KEYWORD("text", TEXT_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("then", THEN, RESERVED_KEYWORD)
 PG_KEYWORD("ties", TIES, UNRESERVED_KEYWORD)
 PG_KEYWORD("time", TIME, COL_NAME_KEYWORD)
+PG_KEYWORD("timeout", TIMEOUT, UNRESERVED_KEYWORD)
 PG_KEYWORD("timestamp", TIMESTAMP, COL_NAME_KEYWORD)
 PG_KEYWORD("to", TO, RESERVED_KEYWORD)
 PG_KEYWORD("trailing", TRAILING, RESERVED_KEYWORD)
@@ -450,6 +452,7 @@ PG_KEYWORD("version", VERSION_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("view", VIEW, UNRESERVED_KEYWORD)
 PG_KEYWORD("views", VIEWS, UNRESERVED_KEYWORD)
 PG_KEYWORD("volatile", VOLATILE, UNRESERVED_KEYWORD)
+PG_KEYWORD("wait", WAIT, UNRESERVED_KEYWORD)
 PG_KEYWORD("when", WHEN, RESERVED_KEYWORD)
 PG_KEYWORD("where", WHERE, RESERVED_KEYWORD)
 PG_KEYWORD("whitespace", WHITESPACE_P, UNRESERVED_KEYWORD)
diff --git a/src/include/utils/timestamp.h b/src/include/utils/timestamp.h
index 03a1de569f0..eaeeb79c411 100644
--- a/src/include/utils/timestamp.h
+++ b/src/include/utils/timestamp.h
@@ -109,4 +109,6 @@ extern int	date2isoyearday(int year, int mon, int mday);
 
 extern bool TimestampTimestampTzRequiresRewrite(void);
 
+#define GetNowFloat() ((float8) GetCurrentTimestamp() / 1000000.0)
+
 #endif							/* TIMESTAMP_H */
diff --git a/src/test/recovery/t/020_begin_wait.pl b/src/test/recovery/t/020_begin_wait.pl
new file mode 100644
index 00000000000..1cb3c59cace
--- /dev/null
+++ b/src/test/recovery/t/020_begin_wait.pl
@@ -0,0 +1,85 @@
+# Checks WAIT FOR
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 8;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+
+# Add some content and take a backup
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE wait_test AS SELECT generate_series(1,10) AS a");
+my $backup_name = 'my_backup';
+$node_primary->backup($backup_name);
+
+# Using the backup, create a streaming standby with a 1 second delay
+my $node_standby = get_new_node('standby');
+my $delay        = 1;
+$node_standby->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby->append_conf('postgresql.conf', qq[
+	recovery_min_apply_delay = '${delay}s'
+]);
+$node_standby->start;
+
+
+# Check that timeouts make us wait for the specified time (2s here)
+my $current_time = $node_standby->safe_psql('postgres', "SELECT now()");
+my $two_seconds = 2000; # in milliseconds
+my $start_time = time();
+$node_standby->safe_psql('postgres',
+	"BEGIN WAIT FOR LSN '0/FFFFFFFF' TIMEOUT $two_seconds");
+my $time_waited = (time() - $start_time) * 1000; # convert to milliseconds
+ok($time_waited >= $two_seconds, "WAIT FOR TIMEOUT waits for enough time");
+
+
+# Check that timeouts let us stop waiting right away, before reaching target LSN
+$node_primary->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(11, 20))");
+my $lsn1 = $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+my ($ret, $out, $err) = $node_standby->psql('postgres',
+	"BEGIN WAIT FOR LSN '$lsn1' TIMEOUT 1");
+
+ok($ret == 0, "zero return value when failed to WAIT FOR LSN on standby");
+ok($err =~ /WARNING:  didn't start transaction because LSN was not reached/,
+	"correct error message when failed to WAIT FOR LSN on standby");
+ok($out eq "f", "if given too little wait time, WAIT doesn't reach target LSN");
+
+
+# Check that WAIT FOR works fine and reaches target LSN if given no timeout
+
+# Add data on primary, memorize primary's last LSN
+$node_primary->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(21, 30))");
+my $lsn2 = $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+# Wait for it to appear on replica, memorize replica's last LSN
+$node_standby->safe_psql('postgres',
+	"BEGIN WAIT FOR LSN '$lsn2'");
+my $reached_lsn = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+
+# Make sure that primary's and replica's LSNs are the same after WAIT
+my $compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$reached_lsn'::pg_lsn, '$lsn2'::pg_lsn)");
+ok($compare_lsns eq 0,
+	"standby reached the same LSN as primary before starting transaction");
+
+
+# Make sure that it's not allowed to use WAIT FOR on primary
+($ret, $out, $err) = $node_primary->psql('postgres',
+	"BEGIN WAIT FOR LSN '0/FFFFFFFF'");
+
+ok($ret != 0, "non-zero return value when trying to WAIT FOR LSN on primary");
+ok($err =~ /ERROR:  WAIT FOR can only be used on standby/,
+	"correct error message when trying to WAIT FOR LSN on primary");
+ok($out eq '', "empty output when trying to WAIT FOR LSN on primary");
+
+
+$node_standby->stop;
+$node_primary->stop;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3d990463ce9..eee95edfe11 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2679,6 +2679,7 @@ WSABUF
 WSADATA
 WSANETWORKEVENTS
 WSAPROTOCOL_INFO
+WaitClause
 WaitEvent
 WaitEventActivity
 WaitEventClient
@@ -2687,6 +2688,7 @@ WaitEventIPC
 WaitEventSet
 WaitEventTimeout
 WaitPMResult
+WaitState
 WalCloseMethod
 WalLevel
 WalRcvData
#93Michael Paquier
michael@paquier.xyz
In reply to: Anna Akenteva (#92)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On Tue, Aug 18, 2020 at 01:12:51PM +0300, Anna Akenteva wrote:

I updated the most recent patch and removed the use of "master" from it,
replacing it with "primary".

This is failing to apply lately, causing the CF bot to complain:
http://cfbot.cputube.org/patch_29_772.log
--
Michael

#94Noname
a.pervushina@postgrespro.ru
In reply to: Anna Akenteva (#77)
1 attachment(s)
Re: [HACKERS] make async slave to wait for lsn to be replayed

Anna Akenteva wrote on 2020-04-08 22:36:

On 2020-04-08 04:09, Kyotaro Horiguchi wrote:

I like your suggested keywords! I think that "AFTER" + "WITHIN" sound
the most natural. We could completely give up the LSN keyword for now.
The final command could look something like:

BEGIN AFTER '0/303EC60' WITHIN '5 seconds';
or
BEGIN AFTER '0/303EC60' WITHIN 5000;

Hello,

I've changed the syntax of the command from BEGIN [ WAIT FOR LSN value [
TIMEOUT delay ]] to BEGIN [ AFTER value [ WITHIN delay ]] and removed
all the unnecessary keywords.
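
To illustrate how the WITHIN delay is consumed, here is a standalone sketch (not code from the patch) of the timing rule that WaitUtility() in the attached wait.c follows: each sleep is capped at 100 ms so interrupts get checked, and a delay of zero or less means "wait forever". The helper names next_sleep_ms() and deadline_passed() are hypothetical and exist only for this illustration; the patch itself rounds up with ceil(), which is done by hand here to avoid linking libm.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: length of the next latch sleep in milliseconds,
 * given the time left (in seconds) until the user's deadline.
 */
static long
next_sleep_ms(bool wait_forever, double time_left_sec)
{
	double		ms;
	long		whole_ms;

	/* Cap every sleep at 100 ms so interrupts are checked regularly */
	if (wait_forever || time_left_sec < 0 || time_left_sec > 0.1)
		return 100;

	/* Close to the deadline: sleep only for what is left, rounded up */
	ms = time_left_sec * 1000.0;
	whole_ms = (long) ms;
	if ((double) whole_ms < ms)
		whole_ms++;
	return whole_ms;
}

/*
 * Hypothetical helper: true once the WITHIN deadline has passed.
 * Never true when no deadline was given.
 */
static bool
deadline_passed(bool wait_forever, double time_left_sec)
{
	return !wait_forever && time_left_sec <= 0.0;
}
```

With these rules, BEGIN AFTER '0/303EC60' WITHIN 5000 would sleep in slices of at most 100 ms for up to five seconds, while omitting WITHIN keeps sleeping in 100 ms slices until the LSN is replayed or the wait is interrupted.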

Best regards,
Alexandra Pervushina.

Attachments:

begin_after.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index 53eac922645..c5bf1a99039 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,7 +21,7 @@ doc/src/sgml/ref/begin.sgml
 
  <refsynopsisdiv>
 <synopsis>
-BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] [AFTER <replaceable class="parameter">lsn_value</replaceable> [WITHIN <replaceable class="parameter">number_of_milliseconds</replaceable> ] ]
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
@@ -63,6 +63,17 @@ BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</
    <xref linkend="sql-set-transaction"/>
    was executed.
   </para>
+
+  <para>
+   The <literal>BEGIN AFTER</literal> clause allows waiting for the target log
+   sequence number (<acronym>LSN</acronym>) to be replayed on standby before
+   starting the transaction in <productname>PostgreSQL</productname> databases
+   with primary-standby asynchronous replication. Wait time can be limited by
+   specifying a timeout, which is measured in milliseconds and must be a positive
+   integer. If <acronym>LSN</acronym> was not reached before timeout, transaction
+   doesn't begin. Waiting can be interrupted using <literal>Ctrl+C</literal>, or
+   by shutting down the <literal>postgres</literal> server.
+  </para>
  </refsect1>
 
  <refsect1>
@@ -146,6 +157,10 @@ BEGIN;
    different purpose in embedded SQL. You are advised to be careful
    about the transaction semantics when porting database applications.
   </para>
+
+  <para>
+   There is no <command>AFTER</command> clause in the SQL standard.
+  </para>
  </refsect1>
 
  <refsect1>
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index d5dfa413aeb..ea9b0b43fd0 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,7 +21,7 @@ doc/src/sgml/ref/start_transaction.sgml
 
  <refsynopsisdiv>
 <synopsis>
-START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ]
+START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable> [, ...] ] [ AFTER <replaceable class="parameter">lsn_value</replaceable> [WITHIN <replaceable class="parameter">number_of_milliseconds</replaceable> ] ]
 
 <phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
 
@@ -40,6 +40,17 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable
    characteristics, as if <xref linkend="sql-set-transaction"/> was executed. This is the same
    as the <xref linkend="sql-begin"/> command.
   </para>
+
+  <para>
+   The <literal>AFTER</literal> clause allows waiting for the target log
+   sequence number (<acronym>LSN</acronym>) to be replayed on standby before
+   starting the transaction in <productname>PostgreSQL</productname> databases
+   with primary-standby asynchronous replication. Wait time can be limited by
+   specifying a timeout, which is measured in milliseconds and must be a positive
+   integer. If <acronym>LSN</acronym> was not reached before timeout, transaction
+   doesn't begin. Waiting can be interrupted using <literal>Ctrl+C</literal>, or
+   by shutting down the <literal>postgres</literal> server.
+  </para>
  </refsect1>
 
  <refsect1>
@@ -78,6 +89,10 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable
    omitted.
   </para>
 
+  <para>
+   There is no <command>AFTER</command> clause in the SQL standard.
+  </para>
+
   <para>
    See also the compatibility section of <xref linkend="sql-set-transaction"/>.
   </para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9d3f1c12fc5..27e964c8111 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -7387,6 +7388,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * If we replayed an LSN that someone was waiting for,
+				 * set latches in shared memory array to notify the waiter.
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+				{
+					WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3ce65..9b310926c12 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00000000000..84195e2a7df
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,282 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.c
+ *	  Implements BEGIN AFTER, which allows waiting for events such as
+ *	  LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <math.h>
+
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/backendid.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/sinvaladt.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+
+/* Add to shared memory array */
+static void AddWaitedLSN(XLogRecPtr lsn_to_wait);
+
+/* Shared memory structure */
+typedef struct
+{
+	int			backend_maxid;
+	pg_atomic_uint64 min_lsn; /* XLogRecPtr of minimal waited for LSN */
+	slock_t		mutex;
+	/* LSNs that different backends are waiting for */
+	XLogRecPtr	lsn[FLEXIBLE_ARRAY_MEMBER];
+} WaitState;
+
+static WaitState *state;
+
+/*
+ * Add the wait event of the current backend to shared memory array
+ */
+static void
+AddWaitedLSN(XLogRecPtr lsn_to_wait)
+{
+	SpinLockAcquire(&state->mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->lsn[MyBackendId] = lsn_to_wait;
+
+	if (lsn_to_wait < state->min_lsn.value)
+		state->min_lsn.value = lsn_to_wait;
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Delete wait event of the current backend from the shared memory array.
+ */
+void
+DeleteWaitedLSN(void)
+{
+	int			i;
+	XLogRecPtr	lsn_to_delete;
+
+	SpinLockAcquire(&state->mutex);
+
+	lsn_to_delete = state->lsn[MyBackendId];
+	state->lsn[MyBackendId] = InvalidXLogRecPtr;
+
+	/* If we are deleting the minimal LSN, then choose the next min_lsn */
+	if (lsn_to_delete != InvalidXLogRecPtr &&
+		lsn_to_delete == state->min_lsn.value)
+	{
+		state->min_lsn.value = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->lsn[i] != InvalidXLogRecPtr &&
+				state->lsn[i] < state->min_lsn.value)
+				state->min_lsn.value = state->lsn[i];
+	}
+
+	/* If deleting from the end of the array, shorten the array's used part */
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >= 2; i--)
+			if (state->lsn[i] != InvalidXLogRecPtr)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Report amount of shared memory space needed for WaitState
+ */
+Size
+WaitShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(WaitState, lsn);
+	size = add_size(size, mul_size(MaxBackends + 1, sizeof(XLogRecPtr)));
+	return size;
+}
+
+/*
+ * Initialize an array of events to wait for in shared memory
+ */
+void
+WaitShmemInit(void)
+{
+	bool		found;
+	uint32		i;
+
+	state = (WaitState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitShmemSize(),
+										  &found);
+	if (!found)
+	{
+		SpinLockInit(&state->mutex);
+
+		for (i = 0; i < (MaxBackends + 1); i++)
+			state->lsn[i] = InvalidXLogRecPtr;
+
+		state->backend_maxid = 0;
+		state->min_lsn.value = PG_UINT64_MAX;
+	}
+}
+
+/*
+ * Set latches in shared memory to signal that new LSN has been replayed
+ */
+void
+WaitSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32		i;
+	int			backend_maxid;
+	PGPROC	   *backend;
+
+	SpinLockAcquire(&state->mutex);
+	backend_maxid = state->backend_maxid;
+
+	for (i = 2; i <= backend_maxid; i++)
+	{
+		backend = BackendIdGetProc(i);
+
+		if (backend && state->lsn[i] != 0 &&
+			state->lsn[i] <= cur_lsn)
+		{
+			SetLatch(&backend->procLatch);
+		}
+	}
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Get minimal LSN that someone waits for
+ */
+XLogRecPtr
+GetMinWaitedLSN(void)
+{
+	return state->min_lsn.value;
+}
+
+/*
+ * On WAIT use a latch to wait till LSN is replayed,
+ * postmaster dies or timeout happens. Timeout is specified in milliseconds.
+ * Returns true if LSN was reached and false otherwise.
+ */
+bool
+WaitUtility(XLogRecPtr target_lsn, const int timeout_ms)
+{
+	XLogRecPtr	cur_lsn = GetXLogReplayRecPtr(NULL);
+	int			latch_events;
+	float8		endtime;
+	bool		res = false;
+	bool		wait_forever = (timeout_ms <= 0);
+
+	endtime = GetNowFloat() + timeout_ms / 1000.0;
+
+	latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH;
+
+	/* Check if we already reached the needed LSN */
+	if (cur_lsn >= target_lsn)
+		return true;
+
+	AddWaitedLSN(target_lsn);
+
+	for (;;)
+	{
+		int			rc;
+		float8		time_left = 0;
+		long		time_left_ms = 0;
+
+		time_left = endtime - GetNowFloat();
+
+		/* Use 100 ms as the default timeout to check for interrupts */
+		if (wait_forever || time_left < 0 || time_left > 0.1)
+			time_left_ms = 100;
+		else
+			time_left_ms = (long) ceil(time_left * 1000.0);
+
+		/* If interrupt, LockErrorCleanup() will do DeleteWaitedLSN() for us */
+		CHECK_FOR_INTERRUPTS();
+
+		/* If postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		rc = WaitLatch(MyLatch, latch_events, time_left_ms,
+					   WAIT_EVENT_CLIENT_READ);
+
+		ResetLatch(MyLatch);
+
+		if (rc & WL_LATCH_SET)
+			cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		if (rc & WL_TIMEOUT)
+		{
+			cur_lsn = GetXLogReplayRecPtr(NULL);
+			/* If the time specified by user has passed, stop waiting */
+			time_left = endtime - GetNowFloat();
+			if (!wait_forever && time_left <= 0.0)
+				break;
+		}
+
+		/* If LSN has been replayed */
+		if (target_lsn <= cur_lsn)
+			break;
+	}
+
+	DeleteWaitedLSN();
+
+	if (cur_lsn < target_lsn)
+		ereport(WARNING,
+				(errcode(ERRCODE_NO_ACTIVE_SQL_TRANSACTION),
+				 errmsg("didn't start transaction because LSN was not reached"),
+				 errhint("Try to increase wait time.")));
+	else
+		res = true;
+
+	return res;
+}
+
+/*
+ * Implementation of BEGIN AFTER
+ */
+int
+WaitMain(WaitClause *stmt, DestReceiver *dest)
+{
+	TupleDesc	tupdesc;
+	TupOutputState *tstate;
+	XLogRecPtr	target_lsn;
+	bool		res = false;
+
+	target_lsn = DatumGetLSN(DirectFunctionCall1(pg_lsn_in,
+												 CStringGetDatum(stmt->lsn)));
+	res = WaitUtility(target_lsn, stmt->timeout);
+
+	/* Need a tuple descriptor representing a single TEXT column */
+	tupdesc = CreateTemplateTupleDesc(1);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "LSN reached", TEXTOID, -1, 0);
+
+	/* Prepare for projection of tuples */
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsMinimalTuple);
+
+	/* Send the result */
+	do_text_output_oneline(tstate, res ? "t" : "f");
+	end_tup_output(tstate);
+	return res;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index d8cf87e6d08..df2a68156cc 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3749,10 +3749,22 @@ _copyTransactionStmt(const TransactionStmt *from)
 	COPY_STRING_FIELD(savepoint_name);
 	COPY_STRING_FIELD(gid);
 	COPY_SCALAR_FIELD(chain);
+	COPY_NODE_FIELD(wait);
 
 	return newnode;
 }
 
+static WaitClause *
+_copyWaitClause(const WaitClause *from)
+{
+	WaitClause *newnode = makeNode(WaitClause);
+
+	COPY_STRING_FIELD(lsn);
+	COPY_SCALAR_FIELD(timeout);
+
+	return newnode;
+}
+
 static CompositeTypeStmt *
 _copyCompositeTypeStmt(const CompositeTypeStmt *from)
 {
@@ -5340,6 +5352,9 @@ copyObjectImpl(const void *from)
 		case T_TransactionStmt:
 			retval = _copyTransactionStmt(from);
 			break;
+		case T_WaitClause:
+			retval = _copyWaitClause(from);
+			break;
 		case T_CompositeTypeStmt:
 			retval = _copyCompositeTypeStmt(from);
 			break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 627b026b195..ecad865771a 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1542,6 +1542,16 @@ _equalTransactionStmt(const TransactionStmt *a, const TransactionStmt *b)
 	COMPARE_STRING_FIELD(savepoint_name);
 	COMPARE_STRING_FIELD(gid);
 	COMPARE_SCALAR_FIELD(chain);
+	COMPARE_NODE_FIELD(wait);
+
+	return true;
+}
+
+static bool
+_equalWaitClause(const WaitClause *a, const WaitClause *b)
+{
+	COMPARE_STRING_FIELD(lsn);
+	COMPARE_SCALAR_FIELD(timeout);
 
 	return true;
 }
@@ -3392,6 +3402,9 @@ equal(const void *a, const void *b)
 		case T_TransactionStmt:
 			retval = _equalTransactionStmt(a, b);
 			break;
+		case T_WaitClause:
+			retval = _equalWaitClause(a, b);
+			break;
 		case T_CompositeTypeStmt:
 			retval = _equalCompositeTypeStmt(a, b);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f1883847077..8455f991b18 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2788,6 +2788,28 @@ _outDefElem(StringInfo str, const DefElem *node)
 	WRITE_LOCATION_FIELD(location);
 }
 
+static void
+_outTransactionStmt(StringInfo str, const TransactionStmt *node)
+{
+	WRITE_NODE_TYPE("TRANSACTIONSTMT");
+
+	WRITE_STRING_FIELD(savepoint_name);
+	WRITE_STRING_FIELD(gid);
+	WRITE_NODE_FIELD(options);
+	WRITE_BOOL_FIELD(chain);
+	WRITE_ENUM_FIELD(kind, TransactionStmtKind);
+	WRITE_NODE_FIELD(wait);
+}
+
+static void
+_outWaitClause(StringInfo str, const WaitClause *node)
+{
+	WRITE_NODE_TYPE("WAITCLAUSE");
+
+	WRITE_STRING_FIELD(lsn);
+	WRITE_UINT_FIELD(timeout);
+}
+
 static void
 _outTableLikeClause(StringInfo str, const TableLikeClause *node)
 {
@@ -4338,6 +4360,12 @@ outNode(StringInfo str, const void *obj)
 			case T_PartitionRangeDatum:
 				_outPartitionRangeDatum(str, obj);
 				break;
+			case T_TransactionStmt:
+				_outTransactionStmt(str, obj);
+				break;
+			case T_WaitClause:
+				_outWaitClause(str, obj);
+				break;
 
 			default:
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 67483552441..6c24f518acc 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -600,6 +600,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <partboundspec> PartitionBoundSpec
 %type <list>		hash_partbound
 %type <defelt>		hash_partbound_elem
+%type <ival>		wait_time
+%type <node>		begin_after
 
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
@@ -710,7 +712,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	VACUUM VALID VALIDATE VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VIEWS VOLATILE
 
-	WHEN WHERE WHITESPACE_P WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE
+	WHEN WHERE WHITESPACE_P WINDOW
+	WITH WITHIN WITHOUT WORK WRAPPER WRITE
 
 	XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLEXISTS XMLFOREST XMLNAMESPACES
 	XMLPARSE XMLPI XMLROOT XMLSERIALIZE XMLTABLE
@@ -9960,18 +9963,20 @@ TransactionStmt:
 					n->chain = $3;
 					$$ = (Node *)n;
 				}
-			| BEGIN_P opt_transaction transaction_mode_list_or_empty
+			| BEGIN_P opt_transaction transaction_mode_list_or_empty begin_after
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_BEGIN;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
-			| START TRANSACTION transaction_mode_list_or_empty
+			| START TRANSACTION transaction_mode_list_or_empty begin_after
 				{
 					TransactionStmt *n = makeNode(TransactionStmt);
 					n->kind = TRANS_STMT_START;
 					n->options = $3;
+					n->wait = $4;
 					$$ = (Node *)n;
 				}
 			| COMMIT opt_transaction opt_transaction_chain
@@ -14259,6 +14264,25 @@ xml_passing_mech:
 			| BY VALUE_P
 		;
 
+/*
+ * AFTER clause of BEGIN and START TRANSACTION statements
+ */
+begin_after:
+			AFTER Sconst wait_time
+				{
+					WaitClause *n = makeNode(WaitClause);
+					n->lsn = $2;
+					n->timeout = $3;
+					$$ = (Node *)n;
+				}
+			| /* EMPTY */		{ $$ = NULL; }
+		;
+
+wait_time:
+			WITHIN Iconst		{ $$ = $2; }
+			| /* EMPTY */		{ $$ = 0; }
+		;
+
 
 /*
  * Aggregate decoration clauses
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cde..abc93295478 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -147,6 +148,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WaitShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -264,6 +266,11 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of events for the BEGIN AFTER clause in shared memory
+	 */
+	WaitShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index f79fa687b73..94f1df64af1 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -38,6 +38,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -720,6 +721,9 @@ LockErrorCleanup(void)
 
 	AbortStrongLockAcquire();
 
+	/* If BEGIN AFTER was interrupted, then stop waiting for that LSN */
+	DeleteWaitedLSN();
+
 	/* Nothing to do if we weren't waiting for a lock */
 	if (lockAwaited == NULL)
 	{
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 0070dfa5b82..022816cbbb9 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -57,6 +57,7 @@
 #include "commands/user.h"
 #include "commands/vacuum.h"
 #include "commands/view.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "parser/parse_utilcmd.h"
 #include "postmaster/bgwriter.h"
@@ -593,6 +594,18 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 					case TRANS_STMT_START:
 						{
 							ListCell   *lc;
+							WaitClause *waitstmt = (WaitClause *) stmt->wait;
+
+							/* BEGIN AFTER cannot be used on primary */
+							if (stmt->wait && !RecoveryInProgress())
+								ereport(ERROR,
+										(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+										 errmsg("BEGIN AFTER can only be "
+												"used on standby")));
+
+							/* If needed to BEGIN AFTER something but failed */
+							if (stmt->wait && WaitMain(waitstmt, dest) == 0)
+								break;
 
 							BeginTransactionBlock();
 							foreach(lc, stmt->options)
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 37c23c9155a..2181191d99e 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -373,8 +373,6 @@ pg_sleep(PG_FUNCTION_ARGS)
 	 * less than the specified time when WaitLatch is terminated early by a
 	 * non-query-canceling signal such as SIGHUP.
 	 */
-#define GetNowFloat()	((float8) GetCurrentTimestamp() / 1000000.0)
-
 	endtime = GetNowFloat() + secs;
 
 	for (;;)
diff --git a/src/include/commands/wait.h b/src/include/commands/wait.h
new file mode 100644
index 00000000000..d08ee574ed3
--- /dev/null
+++ b/src/include/commands/wait.h
@@ -0,0 +1,27 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.h
+ *	  prototypes for commands/wait.c
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * src/include/commands/wait.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAIT_H
+#define WAIT_H
+#include "postgres.h"
+#include "tcop/dest.h"
+#include "nodes/parsenodes.h"
+
+extern bool WaitUtility(XLogRecPtr lsn, const int timeout_ms);
+extern Size WaitShmemSize(void);
+extern void WaitShmemInit(void);
+extern void WaitSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWaitedLSN(void);
+extern int	WaitMain(WaitClause *stmt, DestReceiver *dest);
+extern void DeleteWaitedLSN(void);
+
+#endif							/* WAIT_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 381d84b4e4f..822827aa32d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -492,6 +492,7 @@ typedef enum NodeTag
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
 	T_SQLCmd,
+	T_WaitClause,
 
 	/*
 	 * TAGS FOR RANDOM OTHER STUFF
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index ada17bf6eea..7d8c1fdcf92 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1431,6 +1431,17 @@ typedef struct OnConflictClause
 	int			location;		/* token location, or -1 if unknown */
 } OnConflictClause;
 
+/*
+ * WaitClause -
+ *		representation of the AFTER clause for BEGIN and START TRANSACTION.
+ */
+typedef struct WaitClause
+{
+	NodeTag		type;
+	char	   *lsn;			/* LSN to wait for */
+	int			timeout;		/* Number of milliseconds to limit wait time */
+} WaitClause;
+
 /*
  * CommonTableExpr -
  *	   representation of WITH list element
@@ -3062,6 +3073,7 @@ typedef struct TransactionStmt
 	char	   *savepoint_name; /* for savepoint commands */
 	char	   *gid;			/* for two-phase-commit related commands */
 	bool		chain;			/* AND CHAIN option */
+	Node	   *wait;			/* BEGIN AFTER clause */
 } TransactionStmt;
 
 /* ----------------------
diff --git a/src/include/utils/timestamp.h b/src/include/utils/timestamp.h
index 03a1de569f0..eaeeb79c411 100644
--- a/src/include/utils/timestamp.h
+++ b/src/include/utils/timestamp.h
@@ -109,4 +109,6 @@ extern int	date2isoyearday(int year, int mon, int mday);
 
 extern bool TimestampTimestampTzRequiresRewrite(void);
 
+#define GetNowFloat() ((float8) GetCurrentTimestamp() / 1000000.0)
+
 #endif							/* TIMESTAMP_H */
diff --git a/src/test/recovery/t/021_begin_after.pl b/src/test/recovery/t/021_begin_after.pl
new file mode 100644
index 00000000000..1e3570d7633
--- /dev/null
+++ b/src/test/recovery/t/021_begin_after.pl
@@ -0,0 +1,85 @@
+# Checks BEGIN AFTER
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 8;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+
+# Add some content and take a backup
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE wait_test AS SELECT generate_series(1,10) AS a");
+my $backup_name = 'my_backup';
+$node_primary->backup($backup_name);
+
+# Using the backup, create a streaming standby with a 1 second delay
+my $node_standby = get_new_node('standby');
+my $delay        = 1;
+$node_standby->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby->append_conf('postgresql.conf', qq[
+	recovery_min_apply_delay = '${delay}s'
+]);
+$node_standby->start;
+
+
+# Check that WITHIN makes us wait for the specified time (2s here)
+my $current_time = $node_standby->safe_psql('postgres', "SELECT now()");
+my $two_seconds = 2000; # in milliseconds
+my $start_time = time();
+$node_standby->safe_psql('postgres',
+	"BEGIN AFTER '0/FFFFFFFF' WITHIN $two_seconds");
+my $time_waited = (time() - $start_time) * 1000; # convert to milliseconds
+ok($time_waited >= $two_seconds, "BEGIN AFTER WITHIN waits for enough time");
+
+
+# Check that within lets us stop waiting right away, before reaching target LSN
+$node_primary->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(11, 20))");
+my $lsn1 = $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+my ($ret, $out, $err) = $node_standby->psql('postgres',
+	"BEGIN AFTER '$lsn1' WITHIN 1");
+
+ok($ret == 0, "zero return value when failed to BEGIN AFTER on standby");
+ok($err =~ /WARNING:  didn't start transaction because LSN was not reached/,
+	"correct error message when failed to BEGIN AFTER on standby");
+ok($out eq "f", "if given too little wait time, WAIT doesn't reach target LSN");
+
+
+# Check that BEGIN AFTER works fine and reaches target LSN if given no within
+
+# Add data on primary, memorize primary's last LSN
+$node_primary->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(21, 30))");
+my $lsn2 = $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+# Wait for it to appear on replica, memorize replica's last LSN
+$node_standby->safe_psql('postgres',
+	"BEGIN AFTER '$lsn2'");
+my $reached_lsn = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+
+# Make sure that primary's and replica's LSNs are the same after WAIT
+my $compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$reached_lsn'::pg_lsn, '$lsn2'::pg_lsn)");
+ok($compare_lsns eq 0,
+	"standby reached the same LSN as primary before starting transaction");
+
+
+# Make sure that it's not allowed to use BEGIN AFTER on primary
+($ret, $out, $err) = $node_primary->psql('postgres',
+	"BEGIN AFTER '0/FFFFFFFF'");
+
+ok($ret != 0, "non-zero return value when trying to BEGIN AFTER on primary");
+ok($err =~ /ERROR:  BEGIN AFTER can only be used on standby/,
+	"correct error message when trying to BEGIN AFTER on primary");
+ok($out eq '', "empty output when trying to BEGIN AFTER on primary");
+
+
+$node_standby->stop;
+$node_primary->stop;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 7eaaad1e140..652e83b550f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2678,6 +2678,7 @@ WSABUF
 WSADATA
 WSANETWORKEVENTS
 WSAPROTOCOL_INFO
+WaitClause
 WaitEvent
 WaitEventActivity
 WaitEventClient
@@ -2686,6 +2687,7 @@ WaitEventIPC
 WaitEventSet
 WaitEventTimeout
 WaitPMResult
+WaitState
 WalCloseMethod
 WalLevel
 WalRcvData
#95Noname
a.pervushina@postgrespro.ru
In reply to: Noname (#94)
1 attachment(s)
Re: [HACKERS] make async slave to wait for lsn to be replayed

Hello,

I've replaced the BEGIN WAIT FOR LSN statement with the core functions
pg_waitlsn, pg_waitlsn_infinite and pg_waitlsn_no_wait.
Currently the functions work inside repeatable read transactions, but
waitlsn takes a snapshot if it is the first call in a transaction block,
which can cause the transaction to work incorrectly, so the function
emits a warning in that case.

Usage examples
==============
select pg_waitlsn('LSN', timeout);
select pg_waitlsn_infinite('LSN');
select pg_waitlsn_no_wait('LSN');
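To make the relation between the three functions easier to see, here is a small standalone C sketch of the timeout each of them hands to WaitUtility() in the attached patch: pg_waitlsn forwards the caller's timeout, pg_waitlsn_infinite passes 0 (treated as "wait forever" because timeout_ms <= 0), and pg_waitlsn_no_wait passes 1 ms. It also models how the wait loop sleeps in slices of at most 100 ms. This is only a model under those assumptions, not backend code: the enum WaitLsnVariant and the helpers variant_timeout_ms() and next_sleep_ms() are hypothetical names that do not exist in the patch, and the ceiling is computed by hand instead of with ceil().

```c
#include <stdbool.h>

/* Hypothetical enum: one value per SQL function added by the patch. */
typedef enum
{
	WAITLSN_TIMEOUT,			/* pg_waitlsn(lsn, timeout) */
	WAITLSN_INFINITE,			/* pg_waitlsn_infinite(lsn) */
	WAITLSN_NO_WAIT				/* pg_waitlsn_no_wait(lsn) */
} WaitLsnVariant;

/*
 * Hypothetical helper: the timeout_ms value each function passes to
 * WaitUtility(), as read from the patch.
 */
static int
variant_timeout_ms(WaitLsnVariant variant, int user_timeout_ms)
{
	switch (variant)
	{
		case WAITLSN_TIMEOUT:
			return user_timeout_ms;
		case WAITLSN_INFINITE:
			return 0;			/* timeout_ms <= 0 means wait forever */
		case WAITLSN_NO_WAIT:
			return 1;			/* a single 1 ms wait, then give up */
	}
	return 0;
}

/*
 * Hypothetical helper: length of the next sleep, mirroring the slice
 * computation in the WaitUtility() loop.  time_left_s is the time remaining
 * until the deadline, in seconds.
 */
static long
next_sleep_ms(int timeout_ms, double time_left_s)
{
	bool		wait_forever = (timeout_ms <= 0);
	double		ms;
	long		whole_ms;

	/* Wake up at least every 100 ms so interrupts can be checked */
	if (wait_forever || time_left_s < 0 || time_left_s > 0.1)
		return 100;

	/* Round the remaining time up to a whole millisecond */
	ms = time_left_s * 1000.0;
	whole_ms = (long) ms;
	if ((double) whole_ms < ms)
		whole_ms++;
	return whole_ms;
}
```

For instance, next_sleep_ms(variant_timeout_ms(WAITLSN_INFINITE, 0), 5.0) yields a 100 ms slice; the loop in the patch keeps taking such slices until the target LSN has been replayed or the deadline passes.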

Attachments:

pg_waitlsn_v10.patch (text/x-diff)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9d3f1c12fc5..27e964c8111 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -7387,6 +7388,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * If we replayed an LSN that someone was waiting for,
+				 * set latches in shared memory array to notify the waiter.
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+				{
+					WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3ce65..9b310926c12 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00000000000..ac18040c362
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,297 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.c
+ *	  Implements waitlsn, which allows waiting for events such as
+ *	  LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <math.h>
+
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/backendid.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/sinvaladt.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+
+/* Add to shared memory array */
+static void AddWaitedLSN(XLogRecPtr lsn_to_wait);
+
+/* Shared memory structure */
+typedef struct
+{
+	int			backend_maxid;
+	pg_atomic_uint64 min_lsn; /* XLogRecPtr of minimal waited for LSN */
+	slock_t		mutex;
+	/* LSNs that different backends are waiting for */
+	XLogRecPtr	lsn[FLEXIBLE_ARRAY_MEMBER];
+} WaitState;
+
+static WaitState *state;
+
+/*
+ * Add the wait event of the current backend to shared memory array
+ */
+static void
+AddWaitedLSN(XLogRecPtr lsn_to_wait)
+{
+	SpinLockAcquire(&state->mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->lsn[MyBackendId] = lsn_to_wait;
+
+	if (lsn_to_wait < state->min_lsn.value)
+		state->min_lsn.value = lsn_to_wait;
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Delete wait event of the current backend from the shared memory array.
+ */
+void
+DeleteWaitedLSN(void)
+{
+	int			i;
+	XLogRecPtr	lsn_to_delete;
+
+	SpinLockAcquire(&state->mutex);
+
+	lsn_to_delete = state->lsn[MyBackendId];
+	state->lsn[MyBackendId] = InvalidXLogRecPtr;
+
+	/* If we are deleting the minimal LSN, then choose the next min_lsn */
+	if (lsn_to_delete != InvalidXLogRecPtr &&
+		lsn_to_delete == state->min_lsn.value)
+	{
+		state->min_lsn.value = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->lsn[i] != InvalidXLogRecPtr &&
+				state->lsn[i] < state->min_lsn.value)
+				state->min_lsn.value = state->lsn[i];
+	}
+
+	/* If deleting from the end of the array, shorten the array's used part */
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >= 2; i--)
+			if (state->lsn[i] != InvalidXLogRecPtr)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Report amount of shared memory space needed for WaitState
+ */
+Size
+WaitShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(WaitState, lsn);
+	size = add_size(size, mul_size(MaxBackends + 1, sizeof(XLogRecPtr)));
+	return size;
+}
+
+/*
+ * Initialize an array of events to wait for in shared memory
+ */
+void
+WaitShmemInit(void)
+{
+	bool		found;
+	uint32		i;
+
+	state = (WaitState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitShmemSize(),
+										  &found);
+	if (!found)
+	{
+		SpinLockInit(&state->mutex);
+
+		for (i = 0; i < (MaxBackends + 1); i++)
+			state->lsn[i] = InvalidXLogRecPtr;
+
+		state->backend_maxid = 0;
+		state->min_lsn.value = PG_UINT64_MAX;
+	}
+}
+
+/*
+ * Set latches in shared memory to signal that new LSN has been replayed
+ */
+void
+WaitSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32		i;
+	int			backend_maxid;
+	PGPROC	   *backend;
+
+	SpinLockAcquire(&state->mutex);
+	backend_maxid = state->backend_maxid;
+
+	for (i = 2; i <= backend_maxid; i++)
+	{
+		backend = BackendIdGetProc(i);
+
+		if (backend && state->lsn[i] != 0 &&
+			state->lsn[i] <= cur_lsn)
+		{
+			SetLatch(&backend->procLatch);
+		}
+	}
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Get minimal LSN that someone waits for
+ */
+XLogRecPtr
+GetMinWaitedLSN(void)
+{
+	return state->min_lsn.value;
+}
+
+/*
+ * On WAIT use a latch to wait till LSN is replayed,
+ * postmaster dies or timeout happens. Timeout is specified in milliseconds.
+ * Returns true if LSN was reached and false otherwise.
+ */
+bool
+WaitUtility(XLogRecPtr target_lsn, const int timeout_ms)
+{
+	XLogRecPtr	cur_lsn = GetXLogReplayRecPtr(NULL);
+	int			latch_events;
+	float8		endtime;
+	bool		res = false;
+	bool		wait_forever = (timeout_ms <= 0);
+
+	if (!RecoveryInProgress()) {
+		ereport(ERROR,
+				errmsg("Cannot use waitlsn on primary"));
+		return false;
+	}
+
+	/*
+	 * In transactions, that have isolation level repeatable read or higher
+	 * waitlsn creates a snapshot if called first in a block, which can
+	 * lead the transaction to working incorrectly
+	 */
+
+	if (IsTransactionBlock() && XactIsoLevel != XACT_READ_COMMITTED) {
+		ereport(WARNING,
+				errmsg("Waitlsn may work incorrectly in this isolation level"),
+				errhint("Call waitlsn before starting the transaction"));
+	}
+
+	endtime = GetNowFloat() + timeout_ms / 1000.0;
+
+	latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH;
+
+	/* Check if we already reached the needed LSN */
+	if (cur_lsn >= target_lsn)
+		return true;
+
+	AddWaitedLSN(target_lsn);
+
+	for (;;)
+	{
+		int			rc;
+		float8		time_left = 0;
+		long		time_left_ms = 0;
+
+		time_left = endtime - GetNowFloat();
+
+		/* Use 100 ms as the default timeout to check for interrupts */
+		if (wait_forever || time_left < 0 || time_left > 0.1)
+			time_left_ms = 100;
+		else
+			time_left_ms = (long) ceil(time_left * 1000.0);
+
+		/* If interrupt, LockErrorCleanup() will do DeleteWaitedLSN() for us */
+		CHECK_FOR_INTERRUPTS();
+
+		/* If postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		rc = WaitLatch(MyLatch, latch_events, time_left_ms,
+					   WAIT_EVENT_CLIENT_READ);
+
+		ResetLatch(MyLatch);
+
+		if (rc & WL_LATCH_SET)
+			cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		if (rc & WL_TIMEOUT)
+		{
+			cur_lsn = GetXLogReplayRecPtr(NULL);
+			/* If the time specified by user has passed, stop waiting */
+			time_left = endtime - GetNowFloat();
+			if (!wait_forever && time_left <= 0.0)
+				break;
+		}
+
+		/* If LSN has been replayed */
+		if (target_lsn <= cur_lsn)
+			break;
+	}
+
+	DeleteWaitedLSN();
+
+	if (cur_lsn < target_lsn)
+		ereport(WARNING,
+				 errmsg("LSN was not reached"),
+				 errhint("Try to increase wait time."));
+	else
+		res = true;
+
+	return res;
+}
+
+Datum
+pg_waitlsn(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr		trg_lsn = PG_GETARG_LSN(0);
+	uint64_t		delay = PG_GETARG_INT32(1);
+
+	PG_RETURN_BOOL(WaitUtility(trg_lsn, delay));
+}
+
+Datum
+pg_waitlsn_infinite(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr		trg_lsn = PG_GETARG_LSN(0);
+
+	PG_RETURN_BOOL(WaitUtility(trg_lsn, 0));
+}
+
+Datum
+pg_waitlsn_no_wait(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr		trg_lsn = PG_GETARG_LSN(0);
+
+	PG_RETURN_BOOL(WaitUtility(trg_lsn, 1));
+}
\ No newline at end of file
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cde..2dcfdde5f3f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/syncscan.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -147,6 +148,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WaitShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -264,6 +266,11 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of events for the wait clause in shared memory
+	 */
+	WaitShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 5844540ccca..42f8b33c2f4 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -38,6 +38,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -717,6 +718,9 @@ LockErrorCleanup(void)
 
 	AbortStrongLockAcquire();
 
+	/* If waitlsn was interrupted, then stop waiting for that LSN */
+	DeleteWaitedLSN();
+
 	/* Nothing to do if we weren't waiting for a lock */
 	if (lockAwaited == NULL)
 	{
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 37c23c9155a..2181191d99e 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -373,8 +373,6 @@ pg_sleep(PG_FUNCTION_ARGS)
 	 * less than the specified time when WaitLatch is terminated early by a
 	 * non-query-canceling signal such as SIGHUP.
 	 */
-#define GetNowFloat()	((float8) GetCurrentTimestamp() / 1000000.0)
-
 	endtime = GetNowFloat() + secs;
 
 	for (;;)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c2287273a93..16177049f6f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10937,4 +10937,19 @@
   proname => 'is_normalized', prorettype => 'bool', proargtypes => 'text text',
   prosrc => 'unicode_is_normalized' },
 
+{ oid => '16387', descr => 'wait for LSN until timeout',
+  proname => 'pg_waitlsn', prorettype => 'bool', proargtypes => 'pg_lsn int8',
+  proargnames => '{trg_lsn,delay}',
+  prosrc => 'pg_waitlsn' },
+
+{ oid => '16388', descr => 'wait for LSN for an infinite time',
+  proname => 'pg_waitlsn_infinite', prorettype => 'bool', proargtypes => 'pg_lsn',
+  proargnames => '{trg_lsn}',
+  prosrc => 'pg_waitlsn_infinite' },
+
+{ oid => '16389', descr => 'wait for LSN with no timeout',
+  proname => 'pg_waitlsn_no_wait', prorettype => 'bool', proargtypes => 'pg_lsn',
+  proargnames => '{trg_lsn}',
+  prosrc => 'pg_waitlsn_no_wait' },
+
 ]
diff --git a/src/include/commands/wait.h b/src/include/commands/wait.h
new file mode 100644
index 00000000000..fd21e434167
--- /dev/null
+++ b/src/include/commands/wait.h
@@ -0,0 +1,26 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.h
+ *	  prototypes for commands/wait.c
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * src/include/commands/wait.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAIT_H
+#define WAIT_H
+#include "postgres.h"
+#include "tcop/dest.h"
+#include "nodes/parsenodes.h"
+
+extern bool WaitUtility(XLogRecPtr lsn, const int timeout_ms);
+extern Size WaitShmemSize(void);
+extern void WaitShmemInit(void);
+extern void WaitSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWaitedLSN(void);
+extern void DeleteWaitedLSN(void);
+
+#endif							/* WAIT_H */
diff --git a/src/include/utils/timestamp.h b/src/include/utils/timestamp.h
index 16c3fd8ec97..4e85b6f9db1 100644
--- a/src/include/utils/timestamp.h
+++ b/src/include/utils/timestamp.h
@@ -111,4 +111,6 @@ extern int	date2isoyearday(int year, int mon, int mday);
 
 extern bool TimestampTimestampTzRequiresRewrite(void);
 
+#define GetNowFloat() ((float8) GetCurrentTimestamp() / 1000000.0)
+
 #endif							/* TIMESTAMP_H */
diff --git a/src/test/recovery/t/021_waitlsn.pl b/src/test/recovery/t/021_waitlsn.pl
new file mode 100644
index 00000000000..81dd70ef963
--- /dev/null
+++ b/src/test/recovery/t/021_waitlsn.pl
@@ -0,0 +1,91 @@
+# Checks waitlsn
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 11;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+
+# Add some content and take a backup
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE wait_test AS SELECT generate_series(1,10) AS a");
+my $backup_name = 'my_backup';
+$node_primary->backup($backup_name);
+
+# Using the backup, create a streaming standby with a 1 second delay
+my $node_standby = get_new_node('standby');
+my $delay        = 1;
+$node_standby->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby->append_conf('postgresql.conf', qq[
+	recovery_min_apply_delay = '${delay}s'
+]);
+$node_standby->start;
+
+# Check that timeouts make us wait for the specified time (2s here)
+my $current_time = $node_standby->safe_psql('postgres', "SELECT now()");
+my $two_seconds = 2000; # in milliseconds
+my $start_time = time();
+$node_standby->safe_psql('postgres',
+	"SELECT pg_waitlsn('0/FFFFFFFF', $two_seconds)");
+my $time_waited = (time() - $start_time) * 1000; # convert to milliseconds
+ok($time_waited >= $two_seconds, "waitlsn waits for enough time");
+
+# Check that timeouts let us stop waiting right away, before reaching target LSN
+$node_primary->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(11, 20))");
+my $lsn1 = $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+my ($ret, $out, $err) = $node_standby->psql('postgres',
+	"SELECT pg_waitlsn('$lsn1', 1)");
+
+ok($ret == 0, "zero return value when failed to waitlsn on standby");
+ok($err =~ /WARNING:  LSN was not reached/,
+	"correct error message when failed to waitlsn on standby");
+ok($out eq "f", "if given too little wait time, WAIT doesn't reach target LSN");
+
+
+# Check that waitlsn works fine and reaches target LSN if given no timeout
+
+# Add data on primary, memorize primary's last LSN
+$node_primary->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(21, 30))");
+my $lsn2 = $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+# Wait for it to appear on replica, memorize replica's last LSN
+$node_standby->safe_psql('postgres',
+	"SELECT pg_waitlsn_infinite('$lsn2')");
+my $reached_lsn = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+
+# Make sure that primary's and replica's LSNs are the same after WAIT
+my $compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$reached_lsn'::pg_lsn, '$lsn2'::pg_lsn)");
+ok($compare_lsns eq 0,
+	"standby reached the same LSN as primary before starting transaction");
+
+
+# Make sure that it's not allowed to use waitlsn on primary
+($ret, $out, $err) = $node_primary->psql('postgres',
+	"SELECT pg_waitlsn_infinite('0/FFFFFFFF')");
+
+ok($ret != 0, "non-zero return value when trying to waitlsn on primary");
+ok($err =~ /ERROR:  Cannot use waitlsn on primary/,
+	"correct error message when trying to waitlsn on primary");
+ok($out eq '', "empty output when trying to waitlsn on primary");
+
+# Make sure that waitlsn gives a warning inside a repeatable read transaction
+
+($ret, $out, $err) = $node_standby->psql('postgres',
+	"BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT pg_waitlsn_no_wait('0/FFFFFFFF')");
+ok($ret == 0, "zero return value when trying to waitlsn in transaction");
+ok($err =~ /WARNING:  Waitlsn may work incorrectly in this isolation level/,
+	"correct warning message when trying to waitlsn in transaction");
+ok($out eq "f", "non empty output when trying to waitlsn in transaction");
+
+$node_standby->stop;
+$node_primary->stop;
\ No newline at end of file
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 7eaaad1e140..204a9bc67e1 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2686,6 +2686,7 @@ WaitEventIPC
 WaitEventSet
 WaitEventTimeout
 WaitPMResult
+WaitState
 WalCloseMethod
 WalLevel
 WalRcvData
#96 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Noname (#95)
1 attachment(s)
Re: [HACKERS] make async slave to wait for lsn to be replayed

Hello.

At Wed, 18 Nov 2020 15:05:00 +0300, a.pervushina@postgrespro.ru wrote in

> I've changed the BEGIN WAIT FOR LSN statement to core functions
> pg_waitlsn, pg_waitlsn_infinite and pg_waitlsn_no_wait.
> Currently the functions work inside repeatable read transactions, but
> waitlsn creates a snapshot if called first in a transaction block,
> which can possibly lead the transaction to work incorrectly, so the
> function gives a warning.

According to the discussion here, implementing this as functions is not
optimal. As a PoC, I made it a procedure. I'm not sure this is the
correct way to implement a native procedure, but it seems to work as
expected.

> Usage examples
> ==========
> select pg_waitlsn('LSN', timeout);
> select pg_waitlsn_infinite('LSN');
> select pg_waitlsn_no_wait('LSN');

The first and second usages are covered by a single procedure. The last
function is equivalent to pg_last_wal_replay_lsn(). As a result, the
attached patch provides the following procedure.

pg_waitlsn(wait_lsn pg_lsn, timeout integer DEFAULT -1)
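
The way a single timeout argument can cover both the bounded and the
unbounded wait can be sketched in plain C. This is only an illustration
modeled on the WaitUtility() loop in the pg_waitlsn patch in this thread,
where a timeout of zero or less means "wait forever" and each sleep is
capped at 100 ms so that interrupts are noticed. It assumes the
procedure's DEFAULT -1 is passed through unchanged; next_sleep_ms(),
deadline_passed() and WAIT_SLICE_MS are hypothetical names, not part of
the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Cap on a single sleep so that interrupts are checked regularly. */
#define WAIT_SLICE_MS 100L

/*
 * Hypothetical helper: length of the next sleep in milliseconds.
 * timeout_ms <= 0 means "wait forever"; time_left is the remaining
 * time in seconds and may already be negative.
 */
long
next_sleep_ms(int timeout_ms, double time_left)
{
	bool		wait_forever = (timeout_ms <= 0);
	double		left_ms = time_left * 1000.0;
	long		sleep_ms;

	if (wait_forever || time_left < 0 || time_left > 0.1)
		return WAIT_SLICE_MS;

	/* Round up to whole milliseconds without needing libm's ceil(). */
	sleep_ms = (long) left_ms;
	if ((double) sleep_ms < left_ms)
		sleep_ms++;

	assert(sleep_ms >= 0 && sleep_ms <= WAIT_SLICE_MS);
	return sleep_ms;
}

/* Hypothetical helper: true once a finite timeout has run out. */
bool
deadline_passed(int timeout_ms, double time_left)
{
	return timeout_ms > 0 && time_left <= 0.0;
}
```

With a timeout of -1 the loop would keep sleeping in 100 ms slices until
the LSN is replayed; with a positive timeout the last slice shrinks to
whatever time is left.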

Any opinions, mainly on how this compares with implementing it as a command?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

pg_waitlsn_v10_2_kh.patch (text/x-patch; charset=us-ascii)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 470e113b33..4283b98eb4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -7463,6 +7464,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * If we replayed an LSN that someone was waiting for,
+				 * set latches in shared memory array to notify the waiter.
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+				{
+					WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d7..c19d49e7a4 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1460,6 +1460,10 @@ LANGUAGE internal
 STRICT IMMUTABLE PARALLEL SAFE
 AS 'unicode_is_normalized';
 
+CREATE OR REPLACE PROCEDURE
+  pg_waitlsn(wait_lsn pg_lsn, timeout integer DEFAULT -1)
+  LANGUAGE internal AS 'pg_waitlsn';
+
 --
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e8504f0ae4..2c0bd41336 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..959e96b7e0 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -23,6 +23,7 @@
 #include "access/syncscan.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WaitShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -268,6 +270,11 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of events for the wait clause in shared memory
+	 */
+	WaitShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index c87ffc6549..2b4d73ba2f 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -38,6 +38,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -713,6 +714,9 @@ LockErrorCleanup(void)
 
 	AbortStrongLockAcquire();
 
+	/* If waitlsn was interrupted, then stop waiting for that LSN */
+	DeleteWaitedLSN();
+
 	/* Nothing to do if we weren't waiting for a lock */
 	if (lockAwaited == NULL)
 	{
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 4096faff9a..90876da120 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -373,8 +373,6 @@ pg_sleep(PG_FUNCTION_ARGS)
 	 * less than the specified time when WaitLatch is terminated early by a
 	 * non-query-canceling signal such as SIGHUP.
 	 */
-#define GetNowFloat()	((float8) GetCurrentTimestamp() / 1000000.0)
-
 	endtime = GetNowFloat() + secs;
 
 	for (;;)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b5f52d4e4a..918eaedfd5 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11375,4 +11375,8 @@
   proname => 'is_normalized', prorettype => 'bool', proargtypes => 'text text',
   prosrc => 'unicode_is_normalized' },
 
+{ oid => '9313', descr => 'wait for LSN to be replayed',
+  proname => 'pg_waitlsn',  prokind => 'p',prorettype => 'void', proargtypes => 'pg_lsn int4',
+  proargnames => '{wait_lsn,timeout}',
+  prosrc => 'pg_waitlsn' }
 ]
diff --git a/src/include/utils/timestamp.h b/src/include/utils/timestamp.h
index 63bf71ac61..6c4ecd704d 100644
--- a/src/include/utils/timestamp.h
+++ b/src/include/utils/timestamp.h
@@ -113,4 +113,6 @@ extern int	date2isoyearday(int year, int mon, int mday);
 
 extern bool TimestampTimestampTzRequiresRewrite(void);
 
+#define GetNowFloat() ((float8) GetCurrentTimestamp() / 1000000.0)
+
 #endif							/* TIMESTAMP_H */
#97 Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: Kyotaro Horiguchi (#96)
Re: [HACKERS] make async slave to wait for lsn to be replayed

On Thu, Jan 21, 2021 at 1:30 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com>
wrote:

> Hello.
>
> At Wed, 18 Nov 2020 15:05:00 +0300, a.pervushina@postgrespro.ru wrote in
>
>> I've changed the BEGIN WAIT FOR LSN statement to core functions
>> pg_waitlsn, pg_waitlsn_infinite and pg_waitlsn_no_wait.
>> Currently the functions work inside repeatable read transactions, but
>> waitlsn creates a snapshot if called first in a transaction block,
>> which can possibly lead the transaction to work incorrectly, so the
>> function gives a warning.
>
> According to the discussion here, implementing this as functions is not
> optimal. As a PoC, I made it a procedure. I'm not sure this is the
> correct way to implement a native procedure, but it seems to work as
> expected.
>
>> Usage examples
>> ==========
>> select pg_waitlsn('LSN', timeout);
>> select pg_waitlsn_infinite('LSN');
>> select pg_waitlsn_no_wait('LSN');
>
> The first and second usages are covered by a single procedure. The last
> function is equivalent to pg_last_wal_replay_lsn(). As a result, the
> attached patch provides the following procedure.
>
> pg_waitlsn(wait_lsn pg_lsn, timeout integer DEFAULT -1)
>
> Any opinions, mainly on how this compares with implementing it as a command?
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center

The patch (pg_waitlsn_v10_2_kh.patch) does not compile. Can you please
take a look?

https://cirrus-ci.com/task/6241565996744704

xlog.c:45:10: fatal error: commands/wait.h: No such file or directory
#include "commands/wait.h"
^~~~~~~~~~~~~~~~~
compilation terminated.
make[4]: *** [<builtin>: xlog.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [../../../src/backend/common.mk:39: transam-recursive] Error 2
make[2]: *** [common.mk:39: access-recursive] Error 2
make[1]: *** [Makefile:42: all-backend-recurse] Error 2
make: *** [GNUmakefile:11: all-src-recurse] Error 2

I am changing the status to "Waiting on Author"

--
Ibrar Ahmed

#98 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Ibrar Ahmed (#97)
2 attachment(s)
Re: [HACKERS] make async slave to wait for lsn to be replayed

At Thu, 18 Mar 2021 18:57:15 +0500, Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote in

> On Thu, Jan 21, 2021 at 1:30 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> wrote:
>
>> Hello.
>>
>> At Wed, 18 Nov 2020 15:05:00 +0300, a.pervushina@postgrespro.ru wrote in
>>
>>> I've changed the BEGIN WAIT FOR LSN statement to core functions
>>> pg_waitlsn, pg_waitlsn_infinite and pg_waitlsn_no_wait.
>>> Currently the functions work inside repeatable read transactions, but
>>> waitlsn creates a snapshot if called first in a transaction block,
>>> which can possibly lead the transaction to work incorrectly, so the
>>> function gives a warning.
>>
>> According to the discussion here, implementing this as functions is not
>> optimal. As a PoC, I made it a procedure. I'm not sure this is the
>> correct way to implement a native procedure, but it seems to work as
>> expected.
>>
>>> Usage examples
>>> ==========
>>> select pg_waitlsn('LSN', timeout);
>>> select pg_waitlsn_infinite('LSN');
>>> select pg_waitlsn_no_wait('LSN');
>>
>> The first and second usages are covered by a single procedure. The last
>> function is equivalent to pg_last_wal_replay_lsn(). As a result, the
>> attached patch provides the following procedure.
>>
>> pg_waitlsn(wait_lsn pg_lsn, timeout integer DEFAULT -1)
>>
>> Any opinions, mainly on how this compares with implementing it as a command?
>>
>> regards.
>>
>> --
>> Kyotaro Horiguchi
>> NTT Open Source Software Center
>
> The patch (pg_waitlsn_v10_2_kh.patch) does not compile. Can you please
> take a look?
>
> https://cirrus-ci.com/task/6241565996744704
>
> xlog.c:45:10: fatal error: commands/wait.h: No such file or directory
> #include "commands/wait.h"
> ^~~~~~~~~~~~~~~~~
> compilation terminated.
> make[4]: *** [<builtin>: xlog.o] Error 1
> make[4]: *** Waiting for unfinished jobs....
> make[3]: *** [../../../src/backend/common.mk:39: transam-recursive] Error 2
> make[2]: *** [common.mk:39: access-recursive] Error 2
> make[1]: *** [Makefile:42: all-backend-recurse] Error 2
> make: *** [GNUmakefile:11: all-src-recurse] Error 2
>
> I am changing the status to "Waiting on Author"

Anna is the author. The "patch" was only meant to show how we can
implement the feature as a procedure. (Sorry for the bad mistake I made.)

The patch still applies to master, so I am resending just the rebased
version as v10_2, and have attached the "PoC" as *.txt, which applies on
top of the patch.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

pg_waitlsn_v10_2.patch (text/x-patch; charset=us-ascii)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6f8810e149..3c580083dd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -7535,6 +7536,15 @@ StartupXLOG(void)
 					break;
 				}
 
+				/*
+				 * If we replayed an LSN that someone was waiting for,
+				 * set latches in shared memory array to notify the waiter.
+				 */
+				if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+				{
+					WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+				}
+
 				/* Else, try to fetch the next WAL record */
 				record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e8504f0ae4..2c0bd41336 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 0000000000..1f2483672b
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,297 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.c
+ *	  Implements waitlsn, which allows waiting for events such as
+ *	  LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <math.h>
+
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/backendid.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/sinvaladt.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+
+/* Add to shared memory array */
+static void AddWaitedLSN(XLogRecPtr lsn_to_wait);
+
+/* Shared memory structure */
+typedef struct
+{
+	int			backend_maxid;
+	pg_atomic_uint64 min_lsn; /* XLogRecPtr of minimal waited for LSN */
+	slock_t		mutex;
+	/* LSNs that different backends are waiting for */
+	XLogRecPtr	lsn[FLEXIBLE_ARRAY_MEMBER];
+} WaitState;
+
+static WaitState *state;
+
+/*
+ * Add the wait event of the current backend to shared memory array
+ */
+static void
+AddWaitedLSN(XLogRecPtr lsn_to_wait)
+{
+	SpinLockAcquire(&state->mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->lsn[MyBackendId] = lsn_to_wait;
+
+	if (lsn_to_wait < state->min_lsn.value)
+		state->min_lsn.value = lsn_to_wait;
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Delete wait event of the current backend from the shared memory array.
+ */
+void
+DeleteWaitedLSN(void)
+{
+	int			i;
+	XLogRecPtr	lsn_to_delete;
+
+	SpinLockAcquire(&state->mutex);
+
+	lsn_to_delete = state->lsn[MyBackendId];
+	state->lsn[MyBackendId] = InvalidXLogRecPtr;
+
+	/* If we are deleting the minimal LSN, then choose the next min_lsn */
+	if (lsn_to_delete != InvalidXLogRecPtr &&
+		lsn_to_delete == state->min_lsn.value)
+	{
+		state->min_lsn.value = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->lsn[i] != InvalidXLogRecPtr &&
+				state->lsn[i] < state->min_lsn.value)
+				state->min_lsn.value = state->lsn[i];
+	}
+
+	/* If deleting from the end of the array, shorten the array's used part */
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >= 2; i--)
+			if (state->lsn[i] != InvalidXLogRecPtr)
+			{
+				state->backend_maxid = i;
+				break;
+			}
+
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Report amount of shared memory space needed for WaitState
+ */
+Size
+WaitShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(WaitState, lsn);
+	size = add_size(size, mul_size(MaxBackends + 1, sizeof(XLogRecPtr)));
+	return size;
+}
+
+/*
+ * Initialize an array of events to wait for in shared memory
+ */
+void
+WaitShmemInit(void)
+{
+	bool		found;
+	uint32		i;
+
+	state = (WaitState *) ShmemInitStruct("pg_wait_lsn",
+										  WaitShmemSize(),
+										  &found);
+	if (!found)
+	{
+		SpinLockInit(&state->mutex);
+
+		for (i = 0; i < (MaxBackends + 1); i++)
+			state->lsn[i] = InvalidXLogRecPtr;
+
+		state->backend_maxid = 0;
+		state->min_lsn.value = PG_UINT64_MAX;
+	}
+}
+
+/*
+ * Set latches in shared memory to signal that new LSN has been replayed
+ */
+void
+WaitSetLatch(XLogRecPtr cur_lsn)
+{
+	uint32		i;
+	int			backend_maxid;
+	PGPROC	   *backend;
+
+	SpinLockAcquire(&state->mutex);
+	backend_maxid = state->backend_maxid;
+
+	for (i = 2; i <= backend_maxid; i++)
+	{
+		backend = BackendIdGetProc(i);
+
+		if (backend && state->lsn[i] != 0 &&
+			state->lsn[i] <= cur_lsn)
+		{
+			SetLatch(&backend->procLatch);
+		}
+	}
+	SpinLockRelease(&state->mutex);
+}
+
+/*
+ * Get minimal LSN that someone waits for
+ */
+XLogRecPtr
+GetMinWaitedLSN(void)
+{
+	return state->min_lsn.value;
+}
+
+/*
+ * On WAIT use a latch to wait till LSN is replayed,
+ * postmaster dies or timeout happens. Timeout is specified in milliseconds.
+ * Returns true if LSN was reached and false otherwise.
+ */
+bool
+WaitUtility(XLogRecPtr target_lsn, const int timeout_ms)
+{
+	XLogRecPtr	cur_lsn = GetXLogReplayRecPtr(NULL);
+	int			latch_events;
+	float8		endtime;
+	bool		res = false;
+	bool		wait_forever = (timeout_ms <= 0);
+
+	if (!RecoveryInProgress()) {
+		ereport(ERROR,
+				errmsg("Cannot use waitlsn on primary"));
+		return false;
+	}
+
+	/*
+	 * In transactions with isolation level repeatable read or higher,
+	 * waitlsn creates a snapshot if called first in a block, which can
+	 * cause the transaction to work incorrectly.
+	 */
+
+	if (IsTransactionBlock() && XactIsoLevel != XACT_READ_COMMITTED) {
+		ereport(WARNING,
+				errmsg("Waitlsn may work incorrectly in this isolation level"),
+				errhint("Call waitlsn before starting the transaction"));
+	}
+
+	endtime = GetNowFloat() + timeout_ms / 1000.0;
+
+	latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH;
+
+	/* Check if we already reached the needed LSN */
+	if (cur_lsn >= target_lsn)
+		return true;
+
+	AddWaitedLSN(target_lsn);
+
+	for (;;)
+	{
+		int			rc;
+		float8		time_left = 0;
+		long		time_left_ms = 0;
+
+		time_left = endtime - GetNowFloat();
+
+		/* Use 100 ms as the default timeout to check for interrupts */
+		if (wait_forever || time_left < 0 || time_left > 0.1)
+			time_left_ms = 100;
+		else
+			time_left_ms = (long) ceil(time_left * 1000.0);
+
+		/* If interrupt, LockErrorCleanup() will do DeleteWaitedLSN() for us */
+		CHECK_FOR_INTERRUPTS();
+
+		/* If postmaster dies, finish immediately */
+		if (!PostmasterIsAlive())
+			break;
+
+		rc = WaitLatch(MyLatch, latch_events, time_left_ms,
+					   WAIT_EVENT_CLIENT_READ);
+
+		ResetLatch(MyLatch);
+
+		if (rc & WL_LATCH_SET)
+			cur_lsn = GetXLogReplayRecPtr(NULL);
+
+		if (rc & WL_TIMEOUT)
+		{
+			cur_lsn = GetXLogReplayRecPtr(NULL);
+			/* If the time specified by user has passed, stop waiting */
+			time_left = endtime - GetNowFloat();
+			if (!wait_forever && time_left <= 0.0)
+				break;
+		}
+
+		/* If LSN has been replayed */
+		if (target_lsn <= cur_lsn)
+			break;
+	}
+
+	DeleteWaitedLSN();
+
+	if (cur_lsn < target_lsn)
+		ereport(WARNING,
+				 errmsg("LSN was not reached"),
+				 errhint("Try to increase wait time."));
+	else
+		res = true;
+
+	return res;
+}
+
+Datum
+pg_waitlsn(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr		trg_lsn = PG_GETARG_LSN(0);
+	int64			delay = PG_GETARG_INT64(1);	/* declared int8 in pg_proc.dat */
+
+	PG_RETURN_BOOL(WaitUtility(trg_lsn, (int) delay));
+}
+
+Datum
+pg_waitlsn_infinite(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr		trg_lsn = PG_GETARG_LSN(0);
+
+	PG_RETURN_BOOL(WaitUtility(trg_lsn, 0));
+}
+
+Datum
+pg_waitlsn_no_wait(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr		trg_lsn = PG_GETARG_LSN(0);
+
+	PG_RETURN_BOOL(WaitUtility(trg_lsn, 1));
+}
\ No newline at end of file
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..fb8f8588a7 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -23,6 +23,7 @@
 #include "access/syncscan.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -150,6 +151,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WaitShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -270,6 +272,11 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of events for the wait clause in shared memory
+	 */
+	WaitShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 897045ee27..540991146a 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -38,6 +38,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -716,6 +717,9 @@ LockErrorCleanup(void)
 
 	AbortStrongLockAcquire();
 
+	/* If waitlsn was interrupted, then stop waiting for that LSN */
+	DeleteWaitedLSN();
+
 	/* Nothing to do if we weren't waiting for a lock */
 	if (lockAwaited == NULL)
 	{
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 634f574d7e..50c836fdb7 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -375,8 +375,6 @@ pg_sleep(PG_FUNCTION_ARGS)
 	 * less than the specified time when WaitLatch is terminated early by a
 	 * non-query-canceling signal such as SIGHUP.
 	 */
-#define GetNowFloat()	((float8) GetCurrentTimestamp() / 1000000.0)
-
 	endtime = GetNowFloat() + secs;
 
 	for (;;)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index e259531f60..c11387961e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11411,4 +11411,19 @@
   proname => 'is_normalized', prorettype => 'bool', proargtypes => 'text text',
   prosrc => 'unicode_is_normalized' },
 
+{ oid => '16387', descr => 'wait for LSN until timeout',
+  proname => 'pg_waitlsn', prorettype => 'bool', proargtypes => 'pg_lsn int8',
+  proargnames => '{trg_lsn,delay}',
+  prosrc => 'pg_waitlsn' },
+
+{ oid => '16388', descr => 'wait for LSN for an infinite time',
+  proname => 'pg_waitlsn_infinite', prorettype => 'bool', proargtypes => 'pg_lsn',
+  proargnames => '{trg_lsn}',
+  prosrc => 'pg_waitlsn_infinite' },
+
+{ oid => '16389', descr => 'wait for LSN with no timeout',
+  proname => 'pg_waitlsn_no_wait', prorettype => 'bool', proargtypes => 'pg_lsn',
+  proargnames => '{trg_lsn}',
+  prosrc => 'pg_waitlsn_no_wait' },
+
 ]
diff --git a/src/include/commands/wait.h b/src/include/commands/wait.h
new file mode 100644
index 0000000000..fd21e43416
--- /dev/null
+++ b/src/include/commands/wait.h
@@ -0,0 +1,26 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.h
+ *	  prototypes for commands/wait.c
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * src/include/commands/wait.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef WAIT_H
+#define WAIT_H
+#include "postgres.h"
+#include "tcop/dest.h"
+#include "nodes/parsenodes.h"
+
+extern bool WaitUtility(XLogRecPtr lsn, const int timeout_ms);
+extern Size WaitShmemSize(void);
+extern void WaitShmemInit(void);
+extern void WaitSetLatch(XLogRecPtr cur_lsn);
+extern XLogRecPtr GetMinWaitedLSN(void);
+extern void DeleteWaitedLSN(void);
+
+#endif							/* WAIT_H */
diff --git a/src/include/utils/timestamp.h b/src/include/utils/timestamp.h
index 63bf71ac61..6c4ecd704d 100644
--- a/src/include/utils/timestamp.h
+++ b/src/include/utils/timestamp.h
@@ -113,4 +113,6 @@ extern int	date2isoyearday(int year, int mon, int mday);
 
 extern bool TimestampTimestampTzRequiresRewrite(void);
 
+#define GetNowFloat() ((float8) GetCurrentTimestamp() / 1000000.0)
+
 #endif							/* TIMESTAMP_H */
diff --git a/src/test/recovery/t/021_waitlsn.pl b/src/test/recovery/t/021_waitlsn.pl
new file mode 100644
index 0000000000..81dd70ef96
--- /dev/null
+++ b/src/test/recovery/t/021_waitlsn.pl
@@ -0,0 +1,91 @@
+# Checks waitlsn
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 11;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+
+# Add some content and take a backup
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE wait_test AS SELECT generate_series(1,10) AS a");
+my $backup_name = 'my_backup';
+$node_primary->backup($backup_name);
+
+# Using the backup, create a streaming standby with a 1 second delay
+my $node_standby = get_new_node('standby');
+my $delay        = 1;
+$node_standby->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby->append_conf('postgresql.conf', qq[
+	recovery_min_apply_delay = '${delay}s'
+]);
+$node_standby->start;
+
+# Check that timeouts make us wait for the specified time (2s here)
+my $current_time = $node_standby->safe_psql('postgres', "SELECT now()");
+my $two_seconds = 2000; # in milliseconds
+my $start_time = time();
+$node_standby->safe_psql('postgres',
+	"SELECT pg_waitlsn('0/FFFFFFFF', $two_seconds)");
+my $time_waited = (time() - $start_time) * 1000; # convert to milliseconds
+ok($time_waited >= $two_seconds, "waitlsn waits for enough time");
+
+# Check that timeouts let us stop waiting right away, before reaching target LSN
+$node_primary->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(11, 20))");
+my $lsn1 = $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+my ($ret, $out, $err) = $node_standby->psql('postgres',
+	"SELECT pg_waitlsn('$lsn1', 1)");
+
+ok($ret == 0, "zero return value when failed to waitlsn on standby");
+ok($err =~ /WARNING:  LSN was not reached/,
+	"correct warning message when waitlsn times out on standby");
+ok($out eq "f", "if given too little wait time, WAIT doesn't reach target LSN");
+
+
+# Check that waitlsn works fine and reaches target LSN if given no timeout
+
+# Add data on primary, memorize primary's last LSN
+$node_primary->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(21, 30))");
+my $lsn2 = $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+# Wait for it to appear on replica, memorize replica's last LSN
+$node_standby->safe_psql('postgres',
+	"SELECT pg_waitlsn_infinite('$lsn2')");
+my $reached_lsn = $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn()");
+
+# Make sure that primary's and replica's LSNs are the same after WAIT
+my $compare_lsns = $node_standby->safe_psql('postgres',
+	"SELECT pg_lsn_cmp('$reached_lsn'::pg_lsn, '$lsn2'::pg_lsn)");
+ok($compare_lsns eq 0,
+	"standby reached the same LSN as primary before starting transaction");
+
+
+# Make sure that it's not allowed to use waitlsn on primary
+($ret, $out, $err) = $node_primary->psql('postgres',
+	"SELECT pg_waitlsn_infinite('0/FFFFFFFF')");
+
+ok($ret != 0, "non-zero return value when trying to waitlsn on primary");
+ok($err =~ /ERROR:  Cannot use waitlsn on primary/,
+	"correct error message when trying to waitlsn on primary");
+ok($out eq '', "empty output when trying to waitlsn on primary");
+
+# Make sure that waitlsn gives a warning inside a repeatable read transaction
+
+($ret, $out, $err) = $node_standby->psql('postgres',
+	"BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT pg_waitlsn_no_wait('0/FFFFFFFF')");
+ok($ret == 0, "zero return value when trying to waitlsn in transaction");
+ok($err =~ /WARNING:  Waitlsn may work incorrectly in this isolation level/,
+	"correct warning message when trying to waitlsn in transaction");
+ok($out eq "f", "non empty output when trying to waitlsn in transaction");
+
+$node_standby->stop;
+$node_primary->stop;
\ No newline at end of file
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d1d5d2f0e..6075ee5e77 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2730,6 +2730,7 @@ WaitEventIPC
 WaitEventSet
 WaitEventTimeout
 WaitPMResult
+WaitState
 WalCloseMethod
 WalLevel
 WalRcvData
procedure.txt (text/plain; charset=us-ascii)
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 0dca65dc7b..635508639a 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1474,6 +1474,10 @@ LANGUAGE internal
 STRICT IMMUTABLE PARALLEL SAFE
 AS 'unicode_is_normalized';
 
+CREATE OR REPLACE PROCEDURE
+  pg_waitlsn(wait_lsn pg_lsn, timeout integer DEFAULT -1)
+  LANGUAGE internal AS 'pg_waitlsn';
+
 --
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c11387961e..7f25938cbc 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11426,4 +11426,8 @@
   proargnames => '{trg_lsn}',
   prosrc => 'pg_waitlsn_no_wait' },
 
+{ oid => '9313', descr => 'wait for LSN to be replayed',
+  proname => 'pg_waitlsn', prokind => 'p', prorettype => 'void', proargtypes => 'pg_lsn int4',
+  proargnames => '{wait_lsn,timeout}',
+  prosrc => 'pg_waitlsn' }
 ]