some review comments on logical rep code

Started by Fujii Masaoover 8 years ago45 messages
#1Fujii Masao
masao.fujii@gmail.com

Hi,

Though I've read only a part of the logical rep code yet, I'd like to
share some (relatively minor) review comments that I got so far.

In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
value from the argument, instead of DatumGetObjectId().

No one resets on_commit_launcher_wakeup flag to false.

ApplyLauncherWakeup() should be static function.

Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
subcommands (e.g., ENABLED one) should wake the launcher up on commit.

Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?

Normal statements that the walsender for logical rep runs are logged
by log_replication_commands. So if log_statement is also set,
those commands are logged twice.

LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
an error. The callback function to reset it needs to be registered
via on_shmem_exit().

Typo: "an subscriber" should be "a subscriber" in some places.

/* Get conninfo */
datum = SysCacheGetAttr(SUBSCRIPTIONOID,
tup,
Anum_pg_subscription_subconninfo,
&isnull);
Assert(!isnull);

This assertion makes me think that pg_subscription.subconnifo should
have NOT NULL constraint. Also subslotname and subpublications
should have NOT NULL constraint because of the same reason.

SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate =
GetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
&MyLogicalRepWorker->relstate_lsn,
false);
SpinLockRelease(&MyLogicalRepWorker->relmutex);

Non-"short-term" function like GetSubscriptionRelState() should not
be called while spinlock is being held.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Noah Misch
noah@leadboat.com
In reply to: Fujii Masao (#1)
Re: some review comments on logical rep code

On Fri, Apr 14, 2017 at 04:47:12AM +0900, Fujii Masao wrote:

Though I've read only a part of the logical rep code yet, I'd like to
share some (relatively minor) review comments that I got so far.

In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
value from the argument, instead of DatumGetObjectId().

No one resets on_commit_launcher_wakeup flag to false.

ApplyLauncherWakeup() should be static function.

Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
subcommands (e.g., ENABLED one) should wake the launcher up on commit.

Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?

Normal statements that the walsender for logical rep runs are logged
by log_replication_commands. So if log_statement is also set,
those commands are logged twice.

LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
an error. The callback function to reset it needs to be registered
via on_shmem_exit().

Typo: "an subscriber" should be "a subscriber" in some places.

/* Get conninfo */
datum = SysCacheGetAttr(SUBSCRIPTIONOID,
tup,
Anum_pg_subscription_subconninfo,
&isnull);
Assert(!isnull);

This assertion makes me think that pg_subscription.subconnifo should
have NOT NULL constraint. Also subslotname and subpublications
should have NOT NULL constraint because of the same reason.

SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate =
GetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
&MyLogicalRepWorker->relstate_lsn,
false);
SpinLockRelease(&MyLogicalRepWorker->relmutex);

Non-"short-term" function like GetSubscriptionRelState() should not
be called while spinlock is being held.

[Action required within three days. This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item. Peter,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1]/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.

[1]: /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Fujii Masao (#1)
6 attachment(s)
Re: some review comments on logical rep code

On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

Hi,

Though I've read only a part of the logical rep code yet, I'd like to
share some (relatively minor) review comments that I got so far.

It seems nobody is working on dealing with these review comments, so
I've attached patches. Since there are lots of review comment I
numbered each review comment. The prefix of patches I attached is
corresponding to review comment number.

1.

In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
value from the argument, instead of DatumGetObjectId().

Attached 001 patch fixes it.

2.

No one resets on_commit_launcher_wakeup flag to false.

Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
up the launcher process.

3.

ApplyLauncherWakeup() should be static function.

Attached 003 patch fixes it (and also fixes #5 review comment).

4.

Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
subcommands (e.g., ENABLED one) should wake the launcher up on commit.

This is also reported by Horiguchi-san on another thread[1]/messages/by-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp, and patch exists.

5.

Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?

I also guess it's necessary. This change is included in #3 patch.

6.

Normal statements that the walsender for logical rep runs are logged
by log_replication_commands. So if log_statement is also set,
those commands are logged twice.

Attached 006 patch fixes it by adding "-c log_statement = none" to
connection parameter of libpqrcv if logical = true.

7.

LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
an error. The callback function to reset it needs to be registered
via on_shmem_exit().

Attached 007 patch adds callback function to reset LogicalRepCtx->launcher_pid.

8.

Typo: "an subscriber" should be "a subscriber" in some places.

It seems that there is no longer these typo.

9.

/* Get conninfo */
datum = SysCacheGetAttr(SUBSCRIPTIONOID,
tup,
Anum_pg_subscription_subconninfo,
&isnull);
Assert(!isnull);

This assertion makes me think that pg_subscription.subconnifo should
have NOT NULL constraint. Also subslotname and subpublications
should have NOT NULL constraint because of the same reason.

Agreed. Attached 009 patch add NOT NULL constraint to all other
columns of pg_subscription. pg_subscription.subsynccommit should have
it as well.

10.

SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate =
GetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
&MyLogicalRepWorker->relstate_lsn,
false);
SpinLockRelease(&MyLogicalRepWorker->relmutex);

Non-"short-term" function like GetSubscriptionRelState() should not
be called while spinlock is being held.

One option is adding new LWLock for the relation state change but this
lock is used at many locations where the operation actually doesn't
take time. I think that the discussion would be needed to get
consensus, so patch for it is not attached.

[1]: /messages/by-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

001_change_DatumGetObjectId_to_DatumGetInt32.patchapplication/octet-stream; name=001_change_DatumGetObjectId_to_DatumGetInt32.patchDownload
commit 99d4318bf605be07b6393c7ecc5336f0864cd9a5
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Date:   Mon Apr 17 17:27:50 2017 +0900

    Change DatumGetObjectId to DatumGetInt32.

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 656d399..fa4f3b6 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1441,7 +1441,7 @@ subscription_change_cb(Datum arg, int cacheid, uint32 hashvalue)
 void
 ApplyWorkerMain(Datum main_arg)
 {
-	int				worker_slot = DatumGetObjectId(main_arg);
+	int				worker_slot = DatumGetInt32(main_arg);
 	MemoryContext	oldctx;
 	char			originname[NAMEDATALEN];
 	XLogRecPtr		origin_startpos;
002_Reset_on_commit_launcher_wakeup.patchapplication/octet-stream; name=002_Reset_on_commit_launcher_wakeup.patchDownload
commit 6e1af14d03c4bd9ce7d983b769ff903a98ce9167
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Date:   Mon Apr 17 17:32:37 2017 +0900

    Reset on_commit_launcher_wakeup after woke up.

diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 2d663f6..a9688bb 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -620,7 +620,10 @@ void
 AtCommit_ApplyLauncher(void)
 {
 	if (on_commit_launcher_wakeup)
+	{
 		ApplyLauncherWakeup();
+		on_commit_launcher_wakeup = false;
+	}
 }
 
 /*
003_Fix_ApplyLancherWakeUp_function.patchapplication/octet-stream; name=003_Fix_ApplyLancherWakeUp_function.patchDownload
commit 7362ce14def1243808dd535ad53014053551a3e2
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Date:   Mon Apr 17 17:36:00 2017 +0900

    Make ApplyLauncherWakeup static function.

diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index a9688bb..8a8dfe8 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -72,6 +72,7 @@ typedef struct LogicalRepCtxStruct
 
 LogicalRepCtxStruct *LogicalRepCtx;
 
+static void ApplyLauncherWakeup(void);
 static void logicalrep_worker_onexit(int code, Datum arg);
 static void logicalrep_worker_detach(void);
 
@@ -640,11 +641,10 @@ ApplyLauncherWakeupAtCommit(void)
 		on_commit_launcher_wakeup = true;
 }
 
-void
+static void
 ApplyLauncherWakeup(void)
 {
-	if (IsBackendPid(LogicalRepCtx->launcher_pid))
-		kill(LogicalRepCtx->launcher_pid, SIGUSR1);
+	kill(LogicalRepCtx->launcher_pid, SIGUSR1);
 }
 
 /*
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 060946a..0c2bf03 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -21,7 +21,6 @@ extern void ApplyLauncherMain(Datum main_arg);
 extern Size ApplyLauncherShmemSize(void);
 extern void ApplyLauncherShmemInit(void);
 
-extern void ApplyLauncherWakeup(void);
 extern void ApplyLauncherWakeupAtCommit(void);
 extern void AtCommit_ApplyLauncher(void);
 
006_Prevent_to_emit_statement_log_double.patchapplication/octet-stream; name=006_Prevent_to_emit_statement_log_double.patchDownload
commit 568ecee9953e2b5eca8156071db24fc3ae23d3f1
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Date:   Mon Apr 17 17:37:03 2017 +0900

    Prevent to log statement double.

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 9d7bb25..69d266f 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -119,8 +119,8 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 {
 	WalReceiverConn *conn;
 	PostgresPollingStatusType status;
-	const char *keys[5];
-	const char *vals[5];
+	const char *keys[6];
+	const char *vals[6];
 	int			i = 0;
 
 	/*
@@ -145,6 +145,9 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 	{
 		keys[++i] = "client_encoding";
 		vals[i] = GetDatabaseEncodingName();
+		/* Prevent that command is doubly logged  */
+		keys[++i] = "options";
+		vals[i] = "-c log_statement=none";
 	}
 	keys[++i] = NULL;
 	vals[i] = NULL;
007_Regiter_on_shmem_exit_for_launcher.patchapplication/octet-stream; name=007_Regiter_on_shmem_exit_for_launcher.patchDownload
commit b898b351e074fb3713c9ce5543b1da727239b7fe
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Date:   Mon Apr 17 17:38:06 2017 +0900

    Register shmem_exit function to reset LogicalRepCtx->launcher_pid.

diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 8a8dfe8..2e4275d 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,6 +73,7 @@ typedef struct LogicalRepCtxStruct
 LogicalRepCtxStruct *LogicalRepCtx;
 
 static void ApplyLauncherWakeup(void);
+static void	logicalrep_launcher_onexit(int code, Datum arg);
 static void logicalrep_worker_onexit(int code, Datum arg);
 static void logicalrep_worker_detach(void);
 
@@ -481,6 +482,17 @@ logicalrep_worker_detach(void)
 }
 
 /*
+ * Cleanup function for logical replication launcher.
+ *
+ * Called on logical replication launcher exit.
+ */
+static void
+logicalrep_launcher_onexit(int code, Datum arg)
+{
+	LogicalRepCtx->launcher_pid = 0;
+}
+
+/*
  * Cleanup function.
  *
  * Called on logical replication worker exit.
@@ -656,6 +668,8 @@ ApplyLauncherMain(Datum main_arg)
 	ereport(DEBUG1,
 			(errmsg("logical replication launcher started")));
 
+	before_shmem_exit(logicalrep_launcher_onexit, (Datum) 0);
+
 	/* Establish signal handlers. */
 	pqsignal(SIGHUP, logicalrep_worker_sighup);
 	pqsignal(SIGTERM, logicalrep_worker_sigterm);
009_Add_not_null_constraint.patchapplication/octet-stream; name=009_Add_not_null_constraint.patchDownload
commit 0c0dcbf03de108cd923cb771fa5fa2aed781a71b
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Date:   Mon Apr 17 17:38:56 2017 +0900

    Add NOT NULL constraint to all columns of pg_subscription.

diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index a183850..3cc762a 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -74,7 +74,6 @@ GetSubscription(Oid subid, bool missing_ok)
 							tup,
 							Anum_pg_subscription_subconninfo,
 							&isnull);
-	Assert(!isnull);
 	sub->conninfo = TextDatumGetCString(datum);
 
 	/* Get slotname */
@@ -82,7 +81,6 @@ GetSubscription(Oid subid, bool missing_ok)
 							tup,
 							Anum_pg_subscription_subslotname,
 							&isnull);
-	Assert(!isnull);
 	sub->slotname = pstrdup(NameStr(*DatumGetName(datum)));
 
 	/* Get synccommit */
@@ -90,7 +88,6 @@ GetSubscription(Oid subid, bool missing_ok)
 							tup,
 							Anum_pg_subscription_subsynccommit,
 							&isnull);
-	Assert(!isnull);
 	sub->synccommit = TextDatumGetCString(datum);
 
 	/* Get publications */
@@ -98,7 +95,6 @@ GetSubscription(Oid subid, bool missing_ok)
 							tup,
 							Anum_pg_subscription_subpublications,
 							&isnull);
-	Assert(!isnull);
 	sub->publications = textarray_to_stringlist(DatumGetArrayTypeP(datum));
 
 	ReleaseSysCache(tup);
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 519c684..1ef4fa3 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -796,19 +796,16 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
 	/* Get subname */
 	datum = SysCacheGetAttr(SUBSCRIPTIONOID, tup,
 							Anum_pg_subscription_subname, &isnull);
-	Assert(!isnull);
 	subname = pstrdup(NameStr(*DatumGetName(datum)));
 
 	/* Get conninfo */
 	datum = SysCacheGetAttr(SUBSCRIPTIONOID, tup,
 							Anum_pg_subscription_subconninfo, &isnull);
-	Assert(!isnull);
 	conninfo = TextDatumGetCString(datum);
 
 	/* Get slotname */
 	datum = SysCacheGetAttr(SUBSCRIPTIONOID, tup,
 							Anum_pg_subscription_subslotname, &isnull);
-	Assert(!isnull);
 	slotname = pstrdup(NameStr(*DatumGetName(datum)));
 
 	ObjectAddressSet(myself, SubscriptionRelationId, subid);
diff --git a/src/include/catalog/pg_subscription.h b/src/include/catalog/pg_subscription.h
index fae542b..a069772 100644
--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -41,10 +41,10 @@ CATALOG(pg_subscription,6100) BKI_SHARED_RELATION BKI_ROWTYPE_OID(6101) BKI_SCHE
 								 * (the worker should be running) */
 
 #ifdef CATALOG_VARLEN			/* variable-length fields start here */
-	text		subconninfo;	/* Connection string to the publisher */
-	NameData	subslotname;	/* Slot name on publisher */
-	text		subsynccommit;	/* Synchronous commit setting for worker */
-	text		subpublications[1];	/* List of publications subscribed to */
+	text		subconninfo BKI_FORCE_NOT_NULL;	/* Connection string to the publisher */
+	NameData	subslotname BKI_FORCE_NOT_NULL;	/* Slot name on publisher */
+	text		subsynccommit BKI_FORCE_NOT_NULL;	/* Synchronous commit setting for worker */
+	text		subpublications[1] BKI_FORCE_NOT_NULL;	/* List of publications subscribed to */
 #endif
 } FormData_pg_subscription;
 
#4Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Masahiko Sawada (#3)
Re: some review comments on logical rep code

At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>

On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

Hi,

Though I've read only a part of the logical rep code yet, I'd like to
share some (relatively minor) review comments that I got so far.

It seems nobody is working on dealing with these review comments, so
I've attached patches. Since there are lots of review comment I
numbered each review comment. The prefix of patches I attached is
corresponding to review comment number.

1.

In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
value from the argument, instead of DatumGetObjectId().

Attached 001 patch fixes it.

Hmm. I looked at the launcher side and found that it assigns bare
integer to bgw_main_arg. It also should use Int32GetDatum.

logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname, Oid userid,
Oid relid)
{
int slot;

...

for (slot = 0; slot < max_logical_replication_workers; slot++)

...

bgw.bgw_main_arg = slot;

2.

No one resets on_commit_launcher_wakeup flag to false.

Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
up the launcher process.

It is omitting the abort case. Maybe it should be
AtEOXact_ApplyLauncher(bool commit)?

3.

ApplyLauncherWakeup() should be static function.

Attached 003 patch fixes it (and also fixes #5 review comment).

4.

Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
subcommands (e.g., ENABLED one) should wake the launcher up on commit.

This is also reported by Horiguchi-san on another thread[1], and patch exists.

5.

Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?

I also guess it's necessary. This change is included in #3 patch.

IsBackendPid() is not free, I suppose that just ignoring failure
with ESRCH is enough.

6.

Normal statements that the walsender for logical rep runs are logged
by log_replication_commands. So if log_statement is also set,
those commands are logged twice.

Attached 006 patch fixes it by adding "-c log_statement = none" to
connection parameter of libpqrcv if logical = true.

The control by log_replication_commands is performed on
walsender, so this also shuld be done on the same place. Anyway
log_statement is irrelevant to walsender.

7.

LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
an error. The callback function to reset it needs to be registered
via on_shmem_exit().

Attached 007 patch adds callback function to reset LogicalRepCtx->launcher_pid.

8.

Typo: "an subscriber" should be "a subscriber" in some places.

It seems that there is no longer these typo.

9.

/* Get conninfo */
datum = SysCacheGetAttr(SUBSCRIPTIONOID,
tup,
Anum_pg_subscription_subconninfo,
&isnull);
Assert(!isnull);

This assertion makes me think that pg_subscription.subconnifo should
have NOT NULL constraint. Also subslotname and subpublications
should have NOT NULL constraint because of the same reason.

Agreed. Attached 009 patch add NOT NULL constraint to all other
columns of pg_subscription. pg_subscription.subsynccommit should have
it as well.

I'm not sure the policy here, but I suppose that the assertions
there are still required irrelevantly from the nun-nullness of
the attribute.

10.

SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate =
GetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
&MyLogicalRepWorker->relstate_lsn,
false);
SpinLockRelease(&MyLogicalRepWorker->relmutex);

Non-"short-term" function like GetSubscriptionRelState() should not
be called while spinlock is being held.

One option is adding new LWLock for the relation state change but this
lock is used at many locations where the operation actually doesn't
take time. I think that the discussion would be needed to get
consensus, so patch for it is not attached.

From the point of view of time, I agree that it doesn't seem to
harm. Bt I thing it as more significant problem that
potentially-throwable function is called within the mutex
region. It potentially causes a kind of dead lock. (That said,
theoretically dosn't occur and I'm not sure what happens by the
dead lock..)

[1] /messages/by-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kyotaro HORIGUCHI (#4)
Re: some review comments on logical rep code

On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>

On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

Hi,

Though I've read only a part of the logical rep code yet, I'd like to
share some (relatively minor) review comments that I got so far.

It seems nobody is working on dealing with these review comments, so
I've attached patches. Since there are lots of review comment I
numbered each review comment. The prefix of patches I attached is
corresponding to review comment number.

Thank you for reviewing.

1.

In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
value from the argument, instead of DatumGetObjectId().

Attached 001 patch fixes it.

Hmm. I looked at the launcher side and found that it assigns bare
integer to bgw_main_arg. It also should use Int32GetDatum.

Yeah, I agreed. Will fix it.

logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname, Oid userid,
Oid relid)
{
int slot;

...

for (slot = 0; slot < max_logical_replication_workers; slot++)

...

bgw.bgw_main_arg = slot;

2.

No one resets on_commit_launcher_wakeup flag to false.

Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
up the launcher process.

It is omitting the abort case. Maybe it should be
AtEOXact_ApplyLauncher(bool commit)?

Should we wake up the launcher process even when abort?

3.

ApplyLauncherWakeup() should be static function.

Attached 003 patch fixes it (and also fixes #5 review comment).

4.

Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
subcommands (e.g., ENABLED one) should wake the launcher up on commit.

This is also reported by Horiguchi-san on another thread[1], and patch exists.

5.

Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?

I also guess it's necessary. This change is included in #3 patch.

IsBackendPid() is not free, I suppose that just ignoring failure
with ESRCH is enough.

After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
check if launcher_pid != 0?

6.

Normal statements that the walsender for logical rep runs are logged
by log_replication_commands. So if log_statement is also set,
those commands are logged twice.

Attached 006 patch fixes it by adding "-c log_statement = none" to
connection parameter of libpqrcv if logical = true.

The control by log_replication_commands is performed on
walsender, so this also shuld be done on the same place. Anyway
log_statement is irrelevant to walsender.

Patch 006 emits all logs executed by the apply worker as a log of
log_replication_command but perhaps you're right. It would be better
to leave the log of log_statement if the command executed by the apply
worker is a SQL command. Will fix.

7.

LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
an error. The callback function to reset it needs to be registered
via on_shmem_exit().

Attached 007 patch adds callback function to reset LogicalRepCtx->launcher_pid.

8.

Typo: "an subscriber" should be "a subscriber" in some places.

It seems that there is no longer these typo.

9.

/* Get conninfo */
datum = SysCacheGetAttr(SUBSCRIPTIONOID,
tup,
Anum_pg_subscription_subconninfo,
&isnull);
Assert(!isnull);

This assertion makes me think that pg_subscription.subconnifo should
have NOT NULL constraint. Also subslotname and subpublications
should have NOT NULL constraint because of the same reason.

Agreed. Attached 009 patch add NOT NULL constraint to all other
columns of pg_subscription. pg_subscription.subsynccommit should have
it as well.

I'm not sure the policy here, but I suppose that the assertions
there are still required irrelevantly from the nun-nullness of
the attribute.

You're right. Will fix it.

10.

SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate =
GetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
&MyLogicalRepWorker->relstate_lsn,
false);
SpinLockRelease(&MyLogicalRepWorker->relmutex);

Non-"short-term" function like GetSubscriptionRelState() should not
be called while spinlock is being held.

One option is adding new LWLock for the relation state change but this
lock is used at many locations where the operation actually doesn't
take time. I think that the discussion would be needed to get
consensus, so patch for it is not attached.

From the point of view of time, I agree that it doesn't seem to
harm. Bt I thing it as more significant problem that
potentially-throwable function is called within the mutex
region. It potentially causes a kind of dead lock. (That said,
theoretically dosn't occur and I'm not sure what happens by the
dead lock..)

[1] /messages/by-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp

--
Kyotaro Horiguchi
NTT Open Source Software Center

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#5)
3 attachment(s)
Re: some review comments on logical rep code

On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>

On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

Hi,

Though I've read only a part of the logical rep code yet, I'd like to
share some (relatively minor) review comments that I got so far.

It seems nobody is working on dealing with these review comments, so
I've attached patches. Since there are lots of review comment I
numbered each review comment. The prefix of patches I attached is
corresponding to review comment number.

Thank you for reviewing.

1.

In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
value from the argument, instead of DatumGetObjectId().

Attached 001 patch fixes it.

Hmm. I looked at the launcher side and found that it assigns bare
integer to bgw_main_arg. It also should use Int32GetDatum.

Yeah, I agreed. Will fix it.

logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname, Oid userid,
Oid relid)
{
int slot;

...

for (slot = 0; slot < max_logical_replication_workers; slot++)

...

bgw.bgw_main_arg = slot;

2.

No one resets on_commit_launcher_wakeup flag to false.

Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
up the launcher process.

It is omitting the abort case. Maybe it should be
AtEOXact_ApplyLauncher(bool commit)?

Should we wake up the launcher process even when abort?

3.

ApplyLauncherWakeup() should be static function.

Attached 003 patch fixes it (and also fixes #5 review comment).

4.

Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
subcommands (e.g., ENABLED one) should wake the launcher up on commit.

This is also reported by Horiguchi-san on another thread[1], and patch exists.

5.

Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?

I also guess it's necessary. This change is included in #3 patch.

IsBackendPid() is not free, I suppose that just ignoring failure
with ESRCH is enough.

After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
check if launcher_pid != 0?

6.

Normal statements that the walsender for logical rep runs are logged
by log_replication_commands. So if log_statement is also set,
those commands are logged twice.

Attached 006 patch fixes it by adding "-c log_statement = none" to
connection parameter of libpqrcv if logical = true.

The control by log_replication_commands is performed on
walsender, so this also shuld be done on the same place. Anyway
log_statement is irrelevant to walsender.

Patch 006 emits all logs executed by the apply worker as a log of
log_replication_command but perhaps you're right. It would be better
to leave the log of log_statement if the command executed by the apply
worker is a SQL command. Will fix.

7.

LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
an error. The callback function to reset it needs to be registered
via on_shmem_exit().

Attached 007 patch adds callback function to reset LogicalRepCtx->launcher_pid.

8.

Typo: "an subscriber" should be "a subscriber" in some places.

It seems that there is no longer these typo.

9.

/* Get conninfo */
datum = SysCacheGetAttr(SUBSCRIPTIONOID,
tup,
Anum_pg_subscription_subconninfo,
&isnull);
Assert(!isnull);

This assertion makes me think that pg_subscription.subconnifo should
have NOT NULL constraint. Also subslotname and subpublications
should have NOT NULL constraint because of the same reason.

Agreed. Attached 009 patch add NOT NULL constraint to all other
columns of pg_subscription. pg_subscription.subsynccommit should have
it as well.

I'm not sure the policy here, but I suppose that the assertions
there are still required irrelevantly from the nun-nullness of
the attribute.

You're right. Will fix it.

10.

SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate =
GetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
&MyLogicalRepWorker->relstate_lsn,
false);
SpinLockRelease(&MyLogicalRepWorker->relmutex);

Non-"short-term" function like GetSubscriptionRelState() should not
be called while spinlock is being held.

One option is adding new LWLock for the relation state change but this
lock is used at many locations where the operation actually doesn't
take time. I think that the discussion would be needed to get
consensus, so patch for it is not attached.

From the point of view of time, I agree that it doesn't seem to
harm. Bt I thing it as more significant problem that
potentially-throwable function is called within the mutex
region. It potentially causes a kind of dead lock. (That said,
theoretically dosn't occur and I'm not sure what happens by the
dead lock..)

[1] /messages/by-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp

--
Kyotaro Horiguchi
NTT Open Source Software Center

I've attached latest v2 three patches; 001, 006 and 009. The review
comments I got so far are incorporated.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

001_change_DatumGetObjectId_to_DatumGetInt32_v2.patchapplication/octet-stream; name=001_change_DatumGetObjectId_to_DatumGetInt32_v2.patchDownload
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 2d663f6..394b1eb 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -305,7 +305,7 @@ logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname, Oid userid,
 
 	bgw.bgw_restart_time = BGW_NEVER_RESTART;
 	bgw.bgw_notify_pid = MyProcPid;
-	bgw.bgw_main_arg = slot;
+	bgw.bgw_main_arg = Int32GetDatum(slot);
 
 	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
 	{
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 656d399..fa4f3b6 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1441,7 +1441,7 @@ subscription_change_cb(Datum arg, int cacheid, uint32 hashvalue)
 void
 ApplyWorkerMain(Datum main_arg)
 {
-	int				worker_slot = DatumGetObjectId(main_arg);
+	int				worker_slot = DatumGetInt32(main_arg);
 	MemoryContext	oldctx;
 	char			originname[NAMEDATALEN];
 	XLogRecPtr		origin_startpos;
006_Prevent_to_emit_statement_log_double_v2.patchapplication/octet-stream; name=006_Prevent_to_emit_statement_log_double_v2.patchDownload
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index dbb10c7..678ae83 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1369,14 +1369,6 @@ exec_replication_command(const char *cmd_string)
 	MemoryContext old_context;
 
 	/*
-	 * Log replication command if log_replication_commands is enabled. Even
-	 * when it's disabled, log the command with DEBUG1 level for backward
-	 * compatibility.
-	 */
-	ereport(log_replication_commands ? LOG : DEBUG1,
-			(errmsg("received replication command: %s", cmd_string)));
-
-	/*
 	 * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
 	 * command arrives. Clean up the old stuff if there's anything.
 	 */
@@ -1400,6 +1392,16 @@ exec_replication_command(const char *cmd_string)
 	cmd_node = replication_parse_result;
 
 	/*
+	 * Log replication command if log_replication_commands is enabled. Even
+	 * when it's disabled, log the command with DEBUG1 level for backward
+	 * compatibility. To prevent the command is logged doubly, we doesn't
+	 * log it when the command is a SQL command.
+	 */
+	if (cmd_node->type != T_SQLCmd)
+		ereport(log_replication_commands ? LOG : DEBUG1,
+				(errmsg("received replication command: %s", cmd_string)));
+
+	/*
 	 * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot. If it was
 	 * called outside of transaction the snapshot should be cleared here.
 	 */
009_Add_not_null_constraint_v2.patchapplication/octet-stream; name=009_Add_not_null_constraint_v2.patchDownload
diff --git a/src/include/catalog/pg_subscription.h b/src/include/catalog/pg_subscription.h
index fae542b..a069772 100644
--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -41,10 +41,10 @@ CATALOG(pg_subscription,6100) BKI_SHARED_RELATION BKI_ROWTYPE_OID(6101) BKI_SCHE
 								 * (the worker should be running) */
 
 #ifdef CATALOG_VARLEN			/* variable-length fields start here */
-	text		subconninfo;	/* Connection string to the publisher */
-	NameData	subslotname;	/* Slot name on publisher */
-	text		subsynccommit;	/* Synchronous commit setting for worker */
-	text		subpublications[1];	/* List of publications subscribed to */
+	text		subconninfo BKI_FORCE_NOT_NULL;	/* Connection string to the publisher */
+	NameData	subslotname BKI_FORCE_NOT_NULL;	/* Slot name on publisher */
+	text		subsynccommit BKI_FORCE_NOT_NULL;	/* Synchronous commit setting for worker */
+	text		subpublications[1] BKI_FORCE_NOT_NULL;	/* List of publications subscribed to */
 #endif
 } FormData_pg_subscription;
 
#7Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Masahiko Sawada (#6)
Re: some review comments on logical rep code

Hi,

Thank you for the revised version.

At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoA+yFJTbw0PCw-ttmh9TsTygM3=iavXF+5Bx4O9-UXdsQ@mail.gmail.com>

On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>

On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
1.

In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
value from the argument, instead of DatumGetObjectId().

Attached 001 patch fixes it.

Hmm. I looked at the launcher side and found that it assigns bare
integer to bgw_main_arg. It also should use Int32GetDatum.

Yeah, I agreed. Will fix it.

Thanks.

2.

No one resets on_commit_launcher_wakeup flag to false.

Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
up the launcher process.

It is omitting the abort case. Maybe it should be
AtEOXact_ApplyLauncher(bool commit)?

Should we wake up the launcher process even when abort?

No, I meant that on_commit_launcher_wakeup should be just reset
without waking up launcher on abort.

3.

ApplyLauncherWakeup() should be static function.

Attached 003 patch fixes it (and also fixes #5 review comment).

4.

Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
subcommands (e.g., ENABLED one) should wake the launcher up on commit.

This is also reported by Horiguchi-san on another thread[1], and patch exists.

5.

Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?

I also guess it's necessary. This change is included in #3 patch.

IsBackendPid() is not free, I suppose that just ignoring failure
with ESRCH is enough.

After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
check if launcher_pid != 0?

Yes. For example, code to send a signal to autovacuum launcher
looks as the follows. I don't think there's a reason to do
different thing here.

| avlauncher_needs_signal = false;
| if (AutoVacPID != 0)
| kill(AutoVacPID, SIGUSR2);

6.

Normal statements that the walsender for logical rep runs are logged
by log_replication_commands. So if log_statement is also set,
those commands are logged twice.

Attached 006 patch fixes it by adding "-c log_statement = none" to
connection parameter of libpqrcv if logical = true.

The control by log_replication_commands is performed on
walsender, so this also shuld be done on the same place. Anyway
log_statement is irrelevant to walsender.

Patch 006 emits all logs executed by the apply worker as a log of
log_replication_command but perhaps you're right. It would be better
to leave the log of log_statement if the command executed by the apply
worker is a SQL command. Will fix.

Here, we can choose whether a SQL command is a part of
replication commands or not. The previous fix thought it is and
this doesn't. (They are emitted in different forms) Is this ok?

I'm not sure why you have moved the location of the code, but if
it should show all received commands, it might be better to be
placed before CHECK_FOR_INTERRUPTS..

The comment for the code seems should be more clearly.

- * compatibility. To prevent the command is logged doubly, we doesn't
- * log it when the command is a SQL command.
+ * compatibility. SQL command are logged later according
+ * to log_statement setting.

7.

LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
an error. The callback function to reset it needs to be registered
via on_shmem_exit().

Attached 007 patch adds callback function to reset LogicalRepCtx->launcher_pid.

8.

Typo: "an subscriber" should be "a subscriber" in some places.

It seems that there is no longer these typo.

9.

/* Get conninfo */
datum = SysCacheGetAttr(SUBSCRIPTIONOID,
tup,
Anum_pg_subscription_subconninfo,
&isnull);
Assert(!isnull);

This assertion makes me think that pg_subscription.subconnifo should
have NOT NULL constraint. Also subslotname and subpublications
should have NOT NULL constraint because of the same reason.

Agreed. Attached 009 patch add NOT NULL constraint to all other
columns of pg_subscription. pg_subscription.subsynccommit should have
it as well.

I'm not sure the policy here, but I suppose that the assertions
there are still required irrelevantly from the nun-nullness of
the attribute.

You're right. Will fix it.

10.

SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate =
GetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
&MyLogicalRepWorker->relstate_lsn,
false);
SpinLockRelease(&MyLogicalRepWorker->relmutex);

Non-"short-term" function like GetSubscriptionRelState() should not
be called while spinlock is being held.

One option is adding new LWLock for the relation state change but this
lock is used at many locations where the operation actually doesn't
take time. I think that the discussion would be needed to get
consensus, so patch for it is not attached.

From the point of view of time, I agree that it doesn't seem to
harm. Bt I thing it as more significant problem that
potentially-throwable function is called within the mutex
region. It potentially causes a kind of dead lock. (That said,
theoretically dosn't occur and I'm not sure what happens by the
dead lock..)

[1] /messages/by-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp

--
Kyotaro Horiguchi
NTT Open Source Software Center

I've attached latest v2 three patches; 001, 006 and 009. The review
comments I got so far are incorporated.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kyotaro HORIGUCHI (#7)
6 attachment(s)
Re: some review comments on logical rep code

On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hi,

Thank you for the revised version.

At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoA+yFJTbw0PCw-ttmh9TsTygM3=iavXF+5Bx4O9-UXdsQ@mail.gmail.com>

On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>

On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
1.

In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
value from the argument, instead of DatumGetObjectId().

Attached 001 patch fixes it.

Hmm. I looked at the launcher side and found that it assigns bare
integer to bgw_main_arg. It also should use Int32GetDatum.

Yeah, I agreed. Will fix it.

Thanks.

2.

No one resets on_commit_launcher_wakeup flag to false.

Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
up the launcher process.

It is omitting the abort case. Maybe it should be
AtEOXact_ApplyLauncher(bool commit)?

Should we wake up the launcher process even when abort?

No, I meant that on_commit_launcher_wakeup should be just reset
without waking up launcher on abort.

I understood. Sounds reasonable.

3.

ApplyLauncherWakeup() should be static function.

Attached 003 patch fixes it (and also fixes #5 review comment).

4.

Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
subcommands (e.g., ENABLED one) should wake the launcher up on commit.

This is also reported by Horiguchi-san on another thread[1], and patch exists.

5.

Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?

I also guess it's necessary. This change is included in #3 patch.

IsBackendPid() is not free, I suppose that just ignoring failure
with ESRCH is enough.

After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
check if launcher_pid != 0?

Yes. For example, code to send a signal to autovacuum launcher
looks as the follows. I don't think there's a reason to do
different thing here.

| avlauncher_needs_signal = false;
| if (AutoVacPID != 0)
| kill(AutoVacPID, SIGUSR2);

Fixed.

6.

Normal statements that the walsender for logical rep runs are logged
by log_replication_commands. So if log_statement is also set,
those commands are logged twice.

Attached 006 patch fixes it by adding "-c log_statement = none" to
connection parameter of libpqrcv if logical = true.

The control by log_replication_commands is performed on
walsender, so this also shuld be done on the same place. Anyway
log_statement is irrelevant to walsender.

Patch 006 emits all logs executed by the apply worker as a log of
log_replication_command but perhaps you're right. It would be better
to leave the log of log_statement if the command executed by the apply
worker is a SQL command. Will fix.

Here, we can choose whether a SQL command is a part of
replication commands or not. The previous fix thought it is and
this doesn't. (They are emitted in different forms) Is this ok?

Yes, v2 patch logs other than T_SQLCmd as a replication command, and
T_SQLCmd is logged later in exec_simple_query. The
log_replication_command logs only replication commands[1]https://www.postgresql.org/docs/devel/static/runtime-config-logging.html#guc-log-replication-commands, it doesn't
mean to log commands executed on replication connection. I think such
behavior is right.

I'm not sure why you have moved the location of the code, but if
it should show all received commands, it might be better to be
placed before CHECK_FOR_INTERRUPTS..

Hmm, the reason why I've moved it is that we cannot know whether given
cmd_string is a SQL command or a replication command before parsing.

The comment for the code seems should be more clearly.

- * compatibility. To prevent the command is logged doubly, we doesn't
- * log it when the command is a SQL command.
+ * compatibility. SQL command are logged later according
+ * to log_statement setting.

Fixed.

7.

LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
an error. The callback function to reset it needs to be registered
via on_shmem_exit().

Attached 007 patch adds callback function to reset LogicalRepCtx->launcher_pid.

8.

Typo: "an subscriber" should be "a subscriber" in some places.

It seems that there is no longer these typo.

9.

/* Get conninfo */
datum = SysCacheGetAttr(SUBSCRIPTIONOID,
tup,
Anum_pg_subscription_subconninfo,
&isnull);
Assert(!isnull);

This assertion makes me think that pg_subscription.subconnifo should
have NOT NULL constraint. Also subslotname and subpublications
should have NOT NULL constraint because of the same reason.

Agreed. Attached 009 patch add NOT NULL constraint to all other
columns of pg_subscription. pg_subscription.subsynccommit should have
it as well.

I'm not sure the policy here, but I suppose that the assertions
there are still required irrelevantly from the nun-nullness of
the attribute.

You're right. Will fix it.

10.

SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate =
GetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
&MyLogicalRepWorker->relstate_lsn,
false);
SpinLockRelease(&MyLogicalRepWorker->relmutex);

Non-"short-term" function like GetSubscriptionRelState() should not
be called while spinlock is being held.

One option is adding new LWLock for the relation state change but this
lock is used at many locations where the operation actually doesn't
take time. I think that the discussion would be needed to get
consensus, so patch for it is not attached.

From the point of view of time, I agree that it doesn't seem to
harm. Bt I thing it as more significant problem that
potentially-throwable function is called within the mutex
region. It potentially causes a kind of dead lock. (That said,
theoretically dosn't occur and I'm not sure what happens by the
dead lock..)

[1] /messages/by-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp

--
Kyotaro Horiguchi
NTT Open Source Software Center

I've attached latest v2 three patches; 001, 006 and 009. The review
comments I got so far are incorporated.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attached current latest all patches.

[1]: https://www.postgresql.org/docs/devel/static/runtime-config-logging.html#guc-log-replication-commands

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

006_Prevent_to_emit_statement_log_double_v3.patchapplication/octet-stream; name=006_Prevent_to_emit_statement_log_double_v3.patchDownload
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index dbb10c7..f56b557 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1369,14 +1369,6 @@ exec_replication_command(const char *cmd_string)
 	MemoryContext old_context;
 
 	/*
-	 * Log replication command if log_replication_commands is enabled. Even
-	 * when it's disabled, log the command with DEBUG1 level for backward
-	 * compatibility.
-	 */
-	ereport(log_replication_commands ? LOG : DEBUG1,
-			(errmsg("received replication command: %s", cmd_string)));
-
-	/*
 	 * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
 	 * command arrives. Clean up the old stuff if there's anything.
 	 */
@@ -1400,6 +1392,17 @@ exec_replication_command(const char *cmd_string)
 	cmd_node = replication_parse_result;
 
 	/*
+	 * Log replication command if log_replication_commands is enabled. Even
+	 * when it's disabled, log the command with DEBUG1 level for backward
+	 * compatibility. SQL commmands are logged later according to log_statement
+	 * setting. To prevent the command is logged doubly, we doesn't log it
+	 * when the command is a SQL command.
+	 */
+	if (cmd_node->type != T_SQLCmd)
+		ereport(log_replication_commands ? LOG : DEBUG1,
+				(errmsg("received replication command: %s", cmd_string)));
+
+	/*
 	 * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot. If it was
 	 * called outside of transaction the snapshot should be cleared here.
 	 */
001_change_DatumGetObjectId_to_DatumGetInt32_v3.patchapplication/octet-stream; name=001_change_DatumGetObjectId_to_DatumGetInt32_v3.patchDownload
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 2d663f6..394b1eb 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -305,7 +305,7 @@ logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname, Oid userid,
 
 	bgw.bgw_restart_time = BGW_NEVER_RESTART;
 	bgw.bgw_notify_pid = MyProcPid;
-	bgw.bgw_main_arg = slot;
+	bgw.bgw_main_arg = Int32GetDatum(slot);
 
 	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
 	{
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 656d399..fa4f3b6 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1441,7 +1441,7 @@ subscription_change_cb(Datum arg, int cacheid, uint32 hashvalue)
 void
 ApplyWorkerMain(Datum main_arg)
 {
-	int				worker_slot = DatumGetObjectId(main_arg);
+	int				worker_slot = DatumGetInt32(main_arg);
 	MemoryContext	oldctx;
 	char			originname[NAMEDATALEN];
 	XLogRecPtr		origin_startpos;
002_Reset_on_commit_launcher_wakeup_v3.patchapplication/octet-stream; name=002_Reset_on_commit_launcher_wakeup_v3.patchDownload
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 92b263a..e09ed67 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2138,7 +2138,7 @@ CommitTransaction(void)
 	AtEOXact_HashTables(true);
 	AtEOXact_PgStat(true);
 	AtEOXact_Snapshot(true, false);
-	AtCommit_ApplyLauncher();
+	AtEOXact_ApplyLauncher(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2612,6 +2612,7 @@ AbortTransaction(void)
 		AtEOXact_ComboCid();
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false);
+		AtEOXact_ApplyLauncher(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 2d663f6..d51629b 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -614,13 +614,17 @@ ApplyLauncherShmemInit(void)
 }
 
 /*
+ * AtEOXact_ApplyLauncher
+ *
  * Wakeup the launcher on commit if requested.
  */
 void
-AtCommit_ApplyLauncher(void)
+AtEOXact_ApplyLauncher(bool isCommit)
 {
-	if (on_commit_launcher_wakeup)
+	if (isCommit && on_commit_launcher_wakeup)
 		ApplyLauncherWakeup();
+
+	on_commit_launcher_wakeup = false;
 }
 
 /*
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 060946a..8ad5798 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -23,6 +23,6 @@ extern void ApplyLauncherShmemInit(void);
 
 extern void ApplyLauncherWakeup(void);
 extern void ApplyLauncherWakeupAtCommit(void);
-extern void AtCommit_ApplyLauncher(void);
+extern void AtEOXact_ApplyLauncher(bool isCommit);
 
 #endif   /* LOGICALLAUNCHER_H */
009_Add_not_null_constraint_v3.patchapplication/octet-stream; name=009_Add_not_null_constraint_v3.patchDownload
diff --git a/src/include/catalog/pg_subscription.h b/src/include/catalog/pg_subscription.h
index fae542b..a069772 100644
--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -41,10 +41,10 @@ CATALOG(pg_subscription,6100) BKI_SHARED_RELATION BKI_ROWTYPE_OID(6101) BKI_SCHE
 								 * (the worker should be running) */
 
 #ifdef CATALOG_VARLEN			/* variable-length fields start here */
-	text		subconninfo;	/* Connection string to the publisher */
-	NameData	subslotname;	/* Slot name on publisher */
-	text		subsynccommit;	/* Synchronous commit setting for worker */
-	text		subpublications[1];	/* List of publications subscribed to */
+	text		subconninfo BKI_FORCE_NOT_NULL;	/* Connection string to the publisher */
+	NameData	subslotname BKI_FORCE_NOT_NULL;	/* Slot name on publisher */
+	text		subsynccommit BKI_FORCE_NOT_NULL;	/* Synchronous commit setting for worker */
+	text		subpublications[1] BKI_FORCE_NOT_NULL;	/* List of publications subscribed to */
 #endif
 } FormData_pg_subscription;
 
007_Regiter_on_shmem_exit_for_launcher_v3.patchapplication/octet-stream; name=007_Regiter_on_shmem_exit_for_launcher_v3.patchDownload
commit b898b351e074fb3713c9ce5543b1da727239b7fe
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Date:   Mon Apr 17 17:38:06 2017 +0900

    Register shmem_exit function to reset LogicalRepCtx->launcher_pid.

diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 8a8dfe8..2e4275d 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,6 +73,7 @@ typedef struct LogicalRepCtxStruct
 LogicalRepCtxStruct *LogicalRepCtx;
 
 static void ApplyLauncherWakeup(void);
+static void	logicalrep_launcher_onexit(int code, Datum arg);
 static void logicalrep_worker_onexit(int code, Datum arg);
 static void logicalrep_worker_detach(void);
 
@@ -481,6 +482,17 @@ logicalrep_worker_detach(void)
 }
 
 /*
+ * Cleanup function for logical replication launcher.
+ *
+ * Called on logical replication launcher exit.
+ */
+static void
+logicalrep_launcher_onexit(int code, Datum arg)
+{
+	LogicalRepCtx->launcher_pid = 0;
+}
+
+/*
  * Cleanup function.
  *
  * Called on logical replication worker exit.
@@ -656,6 +668,8 @@ ApplyLauncherMain(Datum main_arg)
 	ereport(DEBUG1,
 			(errmsg("logical replication launcher started")));
 
+	before_shmem_exit(logicalrep_launcher_onexit, (Datum) 0);
+
 	/* Establish signal handlers. */
 	pqsignal(SIGHUP, logicalrep_worker_sighup);
 	pqsignal(SIGTERM, logicalrep_worker_sigterm);
003_005_Fix_ApplyLancherWakeUp_function_v3.patchapplication/octet-stream; name=003_005_Fix_ApplyLancherWakeUp_function_v3.patchDownload
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 2d663f6..c025d6b 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -72,6 +72,7 @@ typedef struct LogicalRepCtxStruct
 
 LogicalRepCtxStruct *LogicalRepCtx;
 
+static void ApplyLauncherWakeup(void);
 static void logicalrep_worker_onexit(int code, Datum arg);
 static void logicalrep_worker_detach(void);
 
@@ -637,10 +638,10 @@ ApplyLauncherWakeupAtCommit(void)
 		on_commit_launcher_wakeup = true;
 }
 
-void
+static void
 ApplyLauncherWakeup(void)
 {
-	if (IsBackendPid(LogicalRepCtx->launcher_pid))
+	if (LogicalRepCtx->launcher_pid != 0)
 		kill(LogicalRepCtx->launcher_pid, SIGUSR1);
 }
 
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 060946a..0c2bf03 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -21,7 +21,6 @@ extern void ApplyLauncherMain(Datum main_arg);
 extern Size ApplyLauncherShmemSize(void);
 extern void ApplyLauncherShmemInit(void);
 
-extern void ApplyLauncherWakeup(void);
 extern void ApplyLauncherWakeupAtCommit(void);
 extern void AtCommit_ApplyLauncher(void);
 
#9Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Masahiko Sawada (#8)
1 attachment(s)
Re: some review comments on logical rep code

Thank you for working on this!

On 18/04/17 10:16, Masahiko Sawada wrote:

On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

3.

ApplyLauncherWakeup() should be static function.

Attached 003 patch fixes it (and also fixes #5 review comment).

4.

Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
subcommands (e.g., ENABLED one) should wake the launcher up on commit.

This is also reported by Horiguchi-san on another thread[1], and patch exists.

5.

Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?

I also guess it's necessary. This change is included in #3 patch.

IsBackendPid() is not free, I suppose that just ignoring failure
with ESRCH is enough.

After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
check if launcher_pid != 0?

Yes. For example, code to send a signal to autovacuum launcher
looks as the follows. I don't think there's a reason to do
different thing here.

| avlauncher_needs_signal = false;
| if (AutoVacPID != 0)
| kill(AutoVacPID, SIGUSR2);

Fixed.

Yes the IsBackendPid was needed only because we didn't reset
launcher_pid and it was causing issues otherwise obviously (and I didn't
realize we are not resetting it...)

6.

Normal statements that the walsender for logical rep runs are logged
by log_replication_commands. So if log_statement is also set,
those commands are logged twice.

Attached 006 patch fixes it by adding "-c log_statement = none" to
connection parameter of libpqrcv if logical = true.

The control by log_replication_commands is performed on
walsender, so this also shuld be done on the same place. Anyway
log_statement is irrelevant to walsender.

Patch 006 emits all logs executed by the apply worker as a log of
log_replication_command but perhaps you're right. It would be better
to leave the log of log_statement if the command executed by the apply
worker is a SQL command. Will fix.

Here, we can choose whether a SQL command is a part of
replication commands or not. The previous fix thought it is and
this doesn't. (They are emitted in different forms) Is this ok?

Yes, v2 patch logs other than T_SQLCmd as a replication command, and
T_SQLCmd is logged later in exec_simple_query. The
log_replication_command logs only replication commands[1], it doesn't
mean to log commands executed on replication connection. I think such
behavior is right.

I'm not sure why you have moved the location of the code, but if
it should show all received commands, it might be better to be
placed before CHECK_FOR_INTERRUPTS..

Hmm, the reason why I've moved it is that we cannot know whether given
cmd_string is a SQL command or a replication command before parsing.

Well the issue is that then the query is not logged if there was parsing
issue even when logging was specifically requested. I don't know what's
good solution here, maybe teaching exec_simple_query to not log the
query if it came from walsender.

BTW reading that function in walsender, there is :

/*
* CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
* command arrives. Clean up the old stuff if there's anything.
*/
SnapBuildClearExportedSnapshot();

and then

/*
* CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot. If it was
* called outside of transaction the snapshot should be cleared here.
*/
if (!IsTransactionBlock())
SnapBuildClearExportedSnapshot();

The first one should not be there, it looks like a result of some merge
conflict being solved wrongly. Maybe your patch could fix that too?

10.

SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate =
GetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
&MyLogicalRepWorker->relstate_lsn,
false);
SpinLockRelease(&MyLogicalRepWorker->relmutex);

Non-"short-term" function like GetSubscriptionRelState() should not
be called while spinlock is being held.

One option is adding new LWLock for the relation state change but this
lock is used at many locations where the operation actually doesn't
take time. I think that the discussion would be needed to get
consensus, so patch for it is not attached.

From the point of view of time, I agree that it doesn't seem to
harm. Bt I thing it as more significant problem that
potentially-throwable function is called within the mutex
region. It potentially causes a kind of dead lock. (That said,
theoretically dosn't occur and I'm not sure what happens by the
dead lock..)

Hmm, I think doing what I attached should be fine here. We don't need to
hold spinlock for table read, just for changing the
MyLogicalRepWorker->relstate.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

010_dont-read-catalog-inside-spinlock.patchtext/plain; charset=UTF-8; name=010_dont-read-catalog-inside-spinlock.patchDownload
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 108326b..73e8c42 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -705,17 +705,19 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
 {
 	char		   *slotname;
 	char		   *err;
+	char			relstate;
 
 	/* Check the state of the table synchronization. */
 	StartTransactionCommand();
+	relstate = GetSubscriptionRelState(MyLogicalRepWorker->subid,
+									   MyLogicalRepWorker->relid,
+									   &MyLogicalRepWorker->relstate_lsn,
+									   false);
+	CommitTransactionCommand();
+
 	SpinLockAcquire(&MyLogicalRepWorker->relmutex);
-	MyLogicalRepWorker->relstate =
-		GetSubscriptionRelState(MyLogicalRepWorker->subid,
-								MyLogicalRepWorker->relid,
-								&MyLogicalRepWorker->relstate_lsn,
-								false);
+	MyLogicalRepWorker->relstate = relstate;
 	SpinLockRelease(&MyLogicalRepWorker->relmutex);
-	CommitTransactionCommand();
 
 	/*
 	 * To build a slot name for the sync work, we are limited to NAMEDATALEN -
#10Fujii Masao
masao.fujii@gmail.com
In reply to: Petr Jelinek (#9)
Re: some review comments on logical rep code

On Wed, Apr 19, 2017 at 1:35 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:

Thank you for working on this!

On 18/04/17 10:16, Masahiko Sawada wrote:

On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

3.

ApplyLauncherWakeup() should be static function.

Attached 003 patch fixes it (and also fixes #5 review comment).

4.

Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
subcommands (e.g., ENABLED one) should wake the launcher up on commit.

This is also reported by Horiguchi-san on another thread[1], and patch exists.

5.

Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?

I also guess it's necessary. This change is included in #3 patch.

IsBackendPid() is not free, I suppose that just ignoring failure
with ESRCH is enough.

After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
check if launcher_pid != 0?

Yes. For example, code to send a signal to autovacuum launcher
looks as the follows. I don't think there's a reason to do
different thing here.

| avlauncher_needs_signal = false;
| if (AutoVacPID != 0)
| kill(AutoVacPID, SIGUSR2);

Fixed.

Yes the IsBackendPid was needed only because we didn't reset
launcher_pid and it was causing issues otherwise obviously (and I didn't
realize we are not resetting it...)

6.

Normal statements that the walsender for logical rep runs are logged
by log_replication_commands. So if log_statement is also set,
those commands are logged twice.

Attached 006 patch fixes it by adding "-c log_statement = none" to
connection parameter of libpqrcv if logical = true.

The control by log_replication_commands is performed on
walsender, so this also shuld be done on the same place. Anyway
log_statement is irrelevant to walsender.

Patch 006 emits all logs executed by the apply worker as a log of
log_replication_command but perhaps you're right. It would be better
to leave the log of log_statement if the command executed by the apply
worker is a SQL command. Will fix.

Here, we can choose whether a SQL command is a part of
replication commands or not. The previous fix thought it is and
this doesn't. (They are emitted in different forms) Is this ok?

Yes, v2 patch logs other than T_SQLCmd as a replication command, and
T_SQLCmd is logged later in exec_simple_query. The
log_replication_command logs only replication commands[1], it doesn't
mean to log commands executed on replication connection. I think such
behavior is right.

I'm not sure why you have moved the location of the code, but if
it should show all received commands, it might be better to be
placed before CHECK_FOR_INTERRUPTS..

Hmm, the reason why I've moved it is that we cannot know whether given
cmd_string is a SQL command or a replication command before parsing.

Well the issue is that then the query is not logged if there was parsing
issue even when logging was specifically requested. I don't know what's
good solution here, maybe teaching exec_simple_query to not log the
query if it came from walsender.

BTW reading that function in walsender, there is :

/*
* CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
* command arrives. Clean up the old stuff if there's anything.
*/
SnapBuildClearExportedSnapshot();

and then

/*
* CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot. If it was
* called outside of transaction the snapshot should be cleared here.
*/
if (!IsTransactionBlock())
SnapBuildClearExportedSnapshot();

The first one should not be there, it looks like a result of some merge
conflict being solved wrongly. Maybe your patch could fix that too?

10.

SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate =
GetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
&MyLogicalRepWorker->relstate_lsn,
false);
SpinLockRelease(&MyLogicalRepWorker->relmutex);

Non-"short-term" function like GetSubscriptionRelState() should not
be called while spinlock is being held.

One option is adding new LWLock for the relation state change but this
lock is used at many locations where the operation actually doesn't
take time. I think that the discussion would be needed to get
consensus, so patch for it is not attached.

From the point of view of time, I agree that it doesn't seem to
harm. Bt I thing it as more significant problem that
potentially-throwable function is called within the mutex
region. It potentially causes a kind of dead lock. (That said,
theoretically dosn't occur and I'm not sure what happens by the
dead lock..)

Hmm, I think doing what I attached should be fine here.

Thanks for the patch!

We don't need to
hold spinlock for table read, just for changing the
MyLogicalRepWorker->relstate.

Yes, but the update of MyLogicalRepWorker->relstate_lsn also should
be protected with the spinlock.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Fujii Masao (#10)
1 attachment(s)
Re: some review comments on logical rep code

On 18/04/17 19:27, Fujii Masao wrote:

On Wed, Apr 19, 2017 at 1:35 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:

Thank you for working on this!

On 18/04/17 10:16, Masahiko Sawada wrote:

On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

10.

SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate =
GetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
&MyLogicalRepWorker->relstate_lsn,
false);
SpinLockRelease(&MyLogicalRepWorker->relmutex);

Non-"short-term" function like GetSubscriptionRelState() should not
be called while spinlock is being held.

One option is adding new LWLock for the relation state change but this
lock is used at many locations where the operation actually doesn't
take time. I think that the discussion would be needed to get
consensus, so patch for it is not attached.

From the point of view of time, I agree that it doesn't seem to
harm. Bt I thing it as more significant problem that
potentially-throwable function is called within the mutex
region. It potentially causes a kind of dead lock. (That said,
theoretically dosn't occur and I'm not sure what happens by the
dead lock..)

Hmm, I think doing what I attached should be fine here.

Thanks for the patch!

We don't need to
hold spinlock for table read, just for changing the
MyLogicalRepWorker->relstate.

Yes, but the update of MyLogicalRepWorker->relstate_lsn also should
be protected with the spinlock.

Yes, sorry tired/blind, fixed.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

010_dont-read-catalog-inside-spinlock-v2.patchtext/plain; charset=UTF-8; name=010_dont-read-catalog-inside-spinlock-v2.patchDownload
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 108326b..7262709 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -703,19 +703,22 @@ copy_table(Relation rel)
 char *
 LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
 {
-	char		   *slotname;
-	char		   *err;
+	char	   *slotname;
+	char	   *err;
+	char		relstate;
+	XLogRecPtr	relstate_lsn;
 
 	/* Check the state of the table synchronization. */
 	StartTransactionCommand();
+	relstate = GetSubscriptionRelState(MyLogicalRepWorker->subid,
+									   MyLogicalRepWorker->relid,
+									   &relstate_lsn, false);
+	CommitTransactionCommand();
+
 	SpinLockAcquire(&MyLogicalRepWorker->relmutex);
-	MyLogicalRepWorker->relstate =
-		GetSubscriptionRelState(MyLogicalRepWorker->subid,
-								MyLogicalRepWorker->relid,
-								&MyLogicalRepWorker->relstate_lsn,
-								false);
+	MyLogicalRepWorker->relstate = relstate;
+	MyLogicalRepWorker->relstate_lsn = relstate_lsn;
 	SpinLockRelease(&MyLogicalRepWorker->relmutex);
-	CommitTransactionCommand();
 
 	/*
 	 * To build a slot name for the sync work, we are limited to NAMEDATALEN -
#12Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Petr Jelinek (#11)
Re: some review comments on logical rep code

At Wed, 19 Apr 2017 04:18:18 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <c2cfda3b-9335-2b57-e9ee-a55a8646afcd@2ndquadrant.com>

On 18/04/17 19:27, Fujii Masao wrote:

On Wed, Apr 19, 2017 at 1:35 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:

Thank you for working on this!

On 18/04/17 10:16, Masahiko Sawada wrote:

On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

10.

SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate =
GetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
&MyLogicalRepWorker->relstate_lsn,
false);
SpinLockRelease(&MyLogicalRepWorker->relmutex);

Non-"short-term" function like GetSubscriptionRelState() should not
be called while spinlock is being held.

One option is adding new LWLock for the relation state change but this
lock is used at many locations where the operation actually doesn't
take time. I think that the discussion would be needed to get
consensus, so patch for it is not attached.

From the point of view of time, I agree that it doesn't seem to
harm. Bt I thing it as more significant problem that
potentially-throwable function is called within the mutex
region. It potentially causes a kind of dead lock. (That said,
theoretically dosn't occur and I'm not sure what happens by the
dead lock..)

Hmm, I think doing what I attached should be fine here.

Thanks for the patch!

We don't need to
hold spinlock for table read, just for changing the
MyLogicalRepWorker->relstate.

Yes, but the update of MyLogicalRepWorker->relstate_lsn also should
be protected with the spinlock.

Yes, sorry tired/blind, fixed.

Commit has been moved from after to before of the lock section.
This causes potential race condition. (As the same as the
potential dead-lock, I'm not sure it can cause realistic problem,
though..) Isn't it better to be after the lock section?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Kyotaro HORIGUCHI (#12)
Re: some review comments on logical rep code

On 19/04/17 10:25, Kyotaro HORIGUCHI wrote:

At Wed, 19 Apr 2017 04:18:18 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <c2cfda3b-9335-2b57-e9ee-a55a8646afcd@2ndquadrant.com>

On 18/04/17 19:27, Fujii Masao wrote:

On Wed, Apr 19, 2017 at 1:35 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:

Thank you for working on this!

On 18/04/17 10:16, Masahiko Sawada wrote:

On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

10.

SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate =
GetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
&MyLogicalRepWorker->relstate_lsn,
false);
SpinLockRelease(&MyLogicalRepWorker->relmutex);

Non-"short-term" function like GetSubscriptionRelState() should not
be called while spinlock is being held.

One option is adding new LWLock for the relation state change but this
lock is used at many locations where the operation actually doesn't
take time. I think that the discussion would be needed to get
consensus, so patch for it is not attached.

From the point of view of time, I agree that it doesn't seem to
harm. Bt I thing it as more significant problem that
potentially-throwable function is called within the mutex
region. It potentially causes a kind of dead lock. (That said,
theoretically dosn't occur and I'm not sure what happens by the
dead lock..)

Hmm, I think doing what I attached should be fine here.

Thanks for the patch!

We don't need to
hold spinlock for table read, just for changing the
MyLogicalRepWorker->relstate.

Yes, but the update of MyLogicalRepWorker->relstate_lsn also should
be protected with the spinlock.

Yes, sorry tired/blind, fixed.

Commit has been moved from after to before of the lock section.
This causes potential race condition. (As the same as the
potential dead-lock, I'm not sure it can cause realistic problem,
though..) Isn't it better to be after the lock section?

We just read catalogs so there should not be locking issues.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Petr Jelinek (#13)
Re: some review comments on logical rep code

At Wed, 19 Apr 2017 10:33:29 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <ed73a706-9e15-f137-2d55-f05361f2de9a@2ndquadrant.com>

Commit has been moved from after to before of the lock section.
This causes potential race condition. (As the same as the
potential dead-lock, I'm not sure it can cause realistic problem,
though..) Isn't it better to be after the lock section?

We just read catalogs so there should not be locking issues.

Some other process may modify it then go to there. With a kind of
priority inversion, the process may modify the data on the memory
*before* we modify it. Of course this is rather unrealistic,
though.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#14)
Re: some review comments on logical rep code

At Wed, 19 Apr 2017 17:43:17 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170419.174317.114509231.horiguchi.kyotaro@lab.ntt.co.jp>

At Wed, 19 Apr 2017 10:33:29 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <ed73a706-9e15-f137-2d55-f05361f2de9a@2ndquadrant.com>

Commit has been moved from after to before of the lock section.
This causes potential race condition. (As the same as the
potential dead-lock, I'm not sure it can cause realistic problem,
though..) Isn't it better to be after the lock section?

We just read catalogs so there should not be locking issues.

Some other process may modify it then go to there. With a kind of
priority inversion, the process may modify the data on the memory
*before* we modify it. Of course this is rather unrealistic,
though.

A bit short.

Some other process may modify it *after* we read it then go to
there. With a kind of priority inversion, the process may modify
the data on the memory *before* we modify it. Of course this is
rather unrealistic, though.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Kyotaro HORIGUCHI (#15)
Re: some review comments on logical rep code

On 19/04/17 10:45, Kyotaro HORIGUCHI wrote:

At Wed, 19 Apr 2017 17:43:17 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170419.174317.114509231.horiguchi.kyotaro@lab.ntt.co.jp>

At Wed, 19 Apr 2017 10:33:29 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <ed73a706-9e15-f137-2d55-f05361f2de9a@2ndquadrant.com>

Commit has been moved from after to before of the lock section.
This causes potential race condition. (As the same as the
potential dead-lock, I'm not sure it can cause realistic problem,
though..) Isn't it better to be after the lock section?

We just read catalogs so there should not be locking issues.

Some other process may modify it then go to there. With a kind of
priority inversion, the process may modify the data on the memory
*before* we modify it. Of course this is rather unrealistic,
though.

A bit short.

Some other process may modify it *after* we read it then go to
there. With a kind of priority inversion, the process may modify
the data on the memory *before* we modify it. Of course this is
rather unrealistic, though.

Yeah but I think that's risk anyway in MVCC read committed database, the
only way to protect against that would be to hold lock over the catalogs
access which was what we wanted to get rid of :)

BTW the whole sync coordination is explicitly written in a way that
unless you mess with catalogs manually only the tablesync process should
do UPDATEs to that catalog (with the exception of s->r state switch
which can happen in apply and has no effect on sync because both states
mean that synchronization is done, only makes apply stop tracking the
table individually).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#2)
Re: some review comments on logical rep code

On Sun, Apr 16, 2017 at 06:14:49AM +0000, Noah Misch wrote:

On Fri, Apr 14, 2017 at 04:47:12AM +0900, Fujii Masao wrote:

Though I've read only a part of the logical rep code yet, I'd like to
share some (relatively minor) review comments that I got so far.

In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
value from the argument, instead of DatumGetObjectId().

No one resets on_commit_launcher_wakeup flag to false.

ApplyLauncherWakeup() should be static function.

Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
subcommands (e.g., ENABLED one) should wake the launcher up on commit.

Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?

Normal statements that the walsender for logical rep runs are logged
by log_replication_commands. So if log_statement is also set,
those commands are logged twice.

LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
an error. The callback function to reset it needs to be registered
via on_shmem_exit().

Typo: "an subscriber" should be "a subscriber" in some places.

/* Get conninfo */
datum = SysCacheGetAttr(SUBSCRIPTIONOID,
tup,
Anum_pg_subscription_subconninfo,
&isnull);
Assert(!isnull);

This assertion makes me think that pg_subscription.subconnifo should
have NOT NULL constraint. Also subslotname and subpublications
should have NOT NULL constraint because of the same reason.

SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate =
GetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
&MyLogicalRepWorker->relstate_lsn,
false);
SpinLockRelease(&MyLogicalRepWorker->relmutex);

Non-"short-term" function like GetSubscriptionRelState() should not
be called while spinlock is being held.

[Action required within three days. This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item. Peter,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.

[1] /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Fujii Masao
masao.fujii@gmail.com
In reply to: Masahiko Sawada (#8)
Re: some review comments on logical rep code

On Tue, Apr 18, 2017 at 5:16 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hi,

Thank you for the revised version.

At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoA+yFJTbw0PCw-ttmh9TsTygM3=iavXF+5Bx4O9-UXdsQ@mail.gmail.com>

On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>

On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
1.

In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
value from the argument, instead of DatumGetObjectId().

Attached 001 patch fixes it.

Hmm. I looked at the launcher side and found that it assigns bare
integer to bgw_main_arg. It also should use Int32GetDatum.

Yeah, I agreed. Will fix it.

Thanks.

2.

No one resets on_commit_launcher_wakeup flag to false.

Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
up the launcher process.

It is omitting the abort case. Maybe it should be
AtEOXact_ApplyLauncher(bool commit)?

Should we wake up the launcher process even when abort?

No, I meant that on_commit_launcher_wakeup should be just reset
without waking up launcher on abort.

I understood. Sounds reasonable.

ROLLBACK PREPARED should not wake up the launcher, but not even with the patch.
To fix this issue, we should call AtEOXact_ApplyLauncher() in
FinishPreparedTransaction() or somewhere?

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Fujii Masao
masao.fujii@gmail.com
In reply to: Noah Misch (#17)
Re: some review comments on logical rep code

On Thu, Apr 20, 2017 at 12:05 PM, Noah Misch <noah@leadboat.com> wrote:

On Sun, Apr 16, 2017 at 06:14:49AM +0000, Noah Misch wrote:

On Fri, Apr 14, 2017 at 04:47:12AM +0900, Fujii Masao wrote:

Though I've read only a part of the logical rep code yet, I'd like to
share some (relatively minor) review comments that I got so far.

In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
value from the argument, instead of DatumGetObjectId().

No one resets on_commit_launcher_wakeup flag to false.

ApplyLauncherWakeup() should be static function.

Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
subcommands (e.g., ENABLED one) should wake the launcher up on commit.

Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?

Normal statements that the walsender for logical rep runs are logged
by log_replication_commands. So if log_statement is also set,
those commands are logged twice.

LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
an error. The callback function to reset it needs to be registered
via on_shmem_exit().

Typo: "an subscriber" should be "a subscriber" in some places.

/* Get conninfo */
datum = SysCacheGetAttr(SUBSCRIPTIONOID,
tup,
Anum_pg_subscription_subconninfo,
&isnull);
Assert(!isnull);

This assertion makes me think that pg_subscription.subconnifo should
have NOT NULL constraint. Also subslotname and subpublications
should have NOT NULL constraint because of the same reason.

SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate =
GetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
&MyLogicalRepWorker->relstate_lsn,
false);
SpinLockRelease(&MyLogicalRepWorker->relmutex);

Non-"short-term" function like GetSubscriptionRelState() should not
be called while spinlock is being held.

[Action required within three days. This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item. Peter,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.

[1] /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com

I'm not the owner of this item, but the current status is;

I've pushed several patches, and there is now only one remaining patch.
I posted the review comment on that patch, and I'm expecting that
Masahiko-san will update the patch. So what about waiting for the updated
version of the patch by next Monday (April 24th)?

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Fujii Masao (#19)
Re: some review comments on logical rep code

On 4/20/17 12:30, Fujii Masao wrote:

I've pushed several patches, and there is now only one remaining patch.
I posted the review comment on that patch, and I'm expecting that
Masahiko-san will update the patch. So what about waiting for the updated
version of the patch by next Monday (April 24th)?

Thank you for pitching in.

Where is your latest patch?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Eisentraut (#20)
Re: some review comments on logical rep code

On Fri, Apr 21, 2017 at 4:41 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/20/17 12:30, Fujii Masao wrote:

I've pushed several patches, and there is now only one remaining patch.
I posted the review comment on that patch, and I'm expecting that
Masahiko-san will update the patch. So what about waiting for the updated
version of the patch by next Monday (April 24th)?

Thank you for pitching in.

Where is your latest patch?

The remaining patch is 002_Reset_on_commit_launcher_wakeup_v3.patch
attached on this mail[1]/messages/by-id/CAD21AoB1HiZV++ah=VrQjVZoH+b6Rn8Atv6Miz+HCp0j+L6GZQ@mail.gmail.com. I'll update this patch and send it.

[1]: /messages/by-id/CAD21AoB1HiZV++ah=VrQjVZoH+b6Rn8Atv6Miz+HCp0j+L6GZQ@mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Petr Jelinek (#16)
Re: some review comments on logical rep code

At Wed, 19 Apr 2017 10:59:00 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <3ef9c831-0508-51a9-5ded-c2e31e958a9f@2ndquadrant.com>

On 19/04/17 10:45, Kyotaro HORIGUCHI wrote:

At Wed, 19 Apr 2017 17:43:17 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170419.174317.114509231.horiguchi.kyotaro@lab.ntt.co.jp>

At Wed, 19 Apr 2017 10:33:29 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <ed73a706-9e15-f137-2d55-f05361f2de9a@2ndquadrant.com>
Some other process may modify it then go to there. With a kind of
priority inversion, the process may modify the data on the memory
*before* we modify it. Of course this is rather unrealistic,
though.

A bit short.

Some other process may modify it *after* we read it then go to
there. With a kind of priority inversion, the process may modify
the data on the memory *before* we modify it. Of course this is
rather unrealistic, though.

Yeah but I think that's risk anyway in MVCC read committed database, the
only way to protect against that would be to hold lock over the catalogs
access which was what we wanted to get rid of :)

Agreed, or, I'm not sure about the real risks.

BTW the whole sync coordination is explicitly written in a way that
unless you mess with catalogs manually only the tablesync process should
do UPDATEs to that catalog (with the exception of s->r state switch
which can happen in apply and has no effect on sync because both states
mean that synchronization is done, only makes apply stop tracking the
table individually).

Hmm. Inhibiting retrospective updates of the on-memory data would
save from some kind of priority inversions but such kind of
manual operation is similar to overwriting of pg_control and I
think is not worth bothering about:p

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Fujii Masao (#18)
Re: some review comments on logical rep code

On Fri, Apr 21, 2017 at 1:19 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Apr 18, 2017 at 5:16 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hi,

Thank you for the revised version.

At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoA+yFJTbw0PCw-ttmh9TsTygM3=iavXF+5Bx4O9-UXdsQ@mail.gmail.com>

On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>

On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
1.

In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
value from the argument, instead of DatumGetObjectId().

Attached 001 patch fixes it.

Hmm. I looked at the launcher side and found that it assigns bare
integer to bgw_main_arg. It also should use Int32GetDatum.

Yeah, I agreed. Will fix it.

Thanks.

2.

No one resets on_commit_launcher_wakeup flag to false.

Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
up the launcher process.

It is omitting the abort case. Maybe it should be
AtEOXact_ApplyLauncher(bool commit)?

Should we wake up the launcher process even when abort?

No, I meant that on_commit_launcher_wakeup should be just reset
without waking up launcher on abort.

I understood. Sounds reasonable.

ROLLBACK PREPARED should not wake up the launcher, but not even with the patch.

Yeah, I've missed it. But on_commit_launcher_wakeup dosen't correspond
to a prepared transaction. Even COMMIT PREPARED might wake up the
launcher process but might not wake up it. ROLLBACK PREPARED is also
the same. For example, in the following situation we wake up the
launcher at COMMIT, not at COMMIT PREPARED.

(BTW, CreateSubscription, is the only one function that sets
on_commit_launcher_wakeup for now, cannot be called in a transaction
block. So it doesn't actually happen that we wake up the launcher when
a ROLLBACK PREPARED. But considering waking up the launcher by ALTER
SUBSCRIPTION ENABLE, we need to deal with it.)

BEGIN;
ALTER SUBSCRIPTION hoge_sub ENABLE;
PREPARE TRANSACTION 'g';
BEGIN;
SELECT 1;
COMMIT; -- wake up the launcher at this point.
COMMIT PREPARED 'g';

Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
not a big deal to the launcher process actually. Compared with the
complexly of changing the logic so that on_commit_launcher_wakeup
corresponds to prepared transaction, we might want to accept this
behavior.

To fix this issue, we should call AtEOXact_ApplyLauncher() in
FinishPreparedTransaction() or somewhere?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#21)
Re: some review comments on logical rep code

On Fri, Apr 21, 2017 at 11:55 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 21, 2017 at 4:41 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/20/17 12:30, Fujii Masao wrote:

I've pushed several patches, and there is now only one remaining patch.
I posted the review comment on that patch, and I'm expecting that
Masahiko-san will update the patch. So what about waiting for the updated
version of the patch by next Monday (April 24th)?

Thank you for pitching in.

Where is your latest patch?

The remaining patch is 002_Reset_on_commit_launcher_wakeup_v3.patch

Oops, there is one more remaining issue that reported on this thread
and another one[1]/messages/by-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp. The latest patch is attached on [1]/messages/by-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp.

[1]: /messages/by-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25Fujii Masao
masao.fujii@gmail.com
In reply to: Masahiko Sawada (#23)
Re: some review comments on logical rep code

On Fri, Apr 21, 2017 at 4:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 21, 2017 at 1:19 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Apr 18, 2017 at 5:16 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hi,

Thank you for the revised version.

At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoA+yFJTbw0PCw-ttmh9TsTygM3=iavXF+5Bx4O9-UXdsQ@mail.gmail.com>

On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>

On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
1.

In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
value from the argument, instead of DatumGetObjectId().

Attached 001 patch fixes it.

Hmm. I looked at the launcher side and found that it assigns bare
integer to bgw_main_arg. It also should use Int32GetDatum.

Yeah, I agreed. Will fix it.

Thanks.

2.

No one resets on_commit_launcher_wakeup flag to false.

Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
up the launcher process.

It is omitting the abort case. Maybe it should be
AtEOXact_ApplyLauncher(bool commit)?

Should we wake up the launcher process even when abort?

No, I meant that on_commit_launcher_wakeup should be just reset
without waking up launcher on abort.

I understood. Sounds reasonable.

ROLLBACK PREPARED should not wake up the launcher, but not even with the patch.

Yeah, I've missed it. But on_commit_launcher_wakeup dosen't correspond
to a prepared transaction. Even COMMIT PREPARED might wake up the
launcher process but might not wake up it. ROLLBACK PREPARED is also
the same. For example, in the following situation we wake up the
launcher at COMMIT, not at COMMIT PREPARED.

(BTW, CreateSubscription, is the only one function that sets
on_commit_launcher_wakeup for now, cannot be called in a transaction
block. So it doesn't actually happen that we wake up the launcher when
a ROLLBACK PREPARED. But considering waking up the launcher by ALTER
SUBSCRIPTION ENABLE, we need to deal with it.)

We can run CREATE SUBSCRIPTION with NOCREATE SLOT option
within the transaction block.

BEGIN;
ALTER SUBSCRIPTION hoge_sub ENABLE;
PREPARE TRANSACTION 'g';
BEGIN;
SELECT 1;
COMMIT; -- wake up the launcher at this point.
COMMIT PREPARED 'g';

Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
not a big deal to the launcher process actually. Compared with the
complexly of changing the logic so that on_commit_launcher_wakeup
corresponds to prepared transaction, we might want to accept this
behavior.

on_commit_launcher_wakeup needs to be recoreded in 2PC state
file to handle this issue properly. But I agree with you, that would
be overkill for small gain. So I'm thinking to add the following
comment into your patch and push it. Thought?

-------------------------
Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
For example, COMMIT PREPARED on the transaction enabling the flag
doesn't wake up
the logical replication launcher if ROLLBACK on another transaction
runs before it.
To handle this case properly, the flag needs to be recorded in 2PC
state file so that
COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
the launcher. However this is overkill for small gain and false wakeup
of the launcher
is not so harmful (probably we can live with that), so we do nothing
here for this issue.
-------------------------

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Fujii Masao (#25)
1 attachment(s)
Re: some review comments on logical rep code

On Sat, Apr 22, 2017 at 4:26 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Apr 21, 2017 at 4:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 21, 2017 at 1:19 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Apr 18, 2017 at 5:16 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hi,

Thank you for the revised version.

At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoA+yFJTbw0PCw-ttmh9TsTygM3=iavXF+5Bx4O9-UXdsQ@mail.gmail.com>

On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>

On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
1.

In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
value from the argument, instead of DatumGetObjectId().

Attached 001 patch fixes it.

Hmm. I looked at the launcher side and found that it assigns bare
integer to bgw_main_arg. It also should use Int32GetDatum.

Yeah, I agreed. Will fix it.

Thanks.

2.

No one resets on_commit_launcher_wakeup flag to false.

Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
up the launcher process.

It is omitting the abort case. Maybe it should be
AtEOXact_ApplyLauncher(bool commit)?

Should we wake up the launcher process even when abort?

No, I meant that on_commit_launcher_wakeup should be just reset
without waking up launcher on abort.

I understood. Sounds reasonable.

ROLLBACK PREPARED should not wake up the launcher, but not even with the patch.

Yeah, I've missed it. But on_commit_launcher_wakeup dosen't correspond
to a prepared transaction. Even COMMIT PREPARED might wake up the
launcher process but might not wake up it. ROLLBACK PREPARED is also
the same. For example, in the following situation we wake up the
launcher at COMMIT, not at COMMIT PREPARED.

(BTW, CreateSubscription, is the only one function that sets
on_commit_launcher_wakeup for now, cannot be called in a transaction
block. So it doesn't actually happen that we wake up the launcher when
a ROLLBACK PREPARED. But considering waking up the launcher by ALTER
SUBSCRIPTION ENABLE, we need to deal with it.)

We can run CREATE SUBSCRIPTION with NOCREATE SLOT option
within the transaction block.

Oops, I'd missed it.

BEGIN;
ALTER SUBSCRIPTION hoge_sub ENABLE;
PREPARE TRANSACTION 'g';
BEGIN;
SELECT 1;
COMMIT; -- wake up the launcher at this point.
COMMIT PREPARED 'g';

Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
not a big deal to the launcher process actually. Compared with the
complexly of changing the logic so that on_commit_launcher_wakeup
corresponds to prepared transaction, we might want to accept this
behavior.

on_commit_launcher_wakeup needs to be recoreded in 2PC state
file to handle this issue properly. But I agree with you, that would
be overkill for small gain. So I'm thinking to add the following
comment into your patch and push it. Thought?

-------------------------
Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
For example, COMMIT PREPARED on the transaction enabling the flag
doesn't wake up
the logical replication launcher if ROLLBACK on another transaction
runs before it.
To handle this case properly, the flag needs to be recorded in 2PC
state file so that
COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
the launcher. However this is overkill for small gain and false wakeup
of the launcher
is not so harmful (probably we can live with that), so we do nothing
here for this issue.
-------------------------

Agreed.

I added this comment to the patch and attached it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

002_Reset_on_commit_launcher_wakeup_v4.patchapplication/octet-stream; name=002_Reset_on_commit_launcher_wakeup_v4.patchDownload
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 92b263a..e09ed67 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2138,7 +2138,7 @@ CommitTransaction(void)
 	AtEOXact_HashTables(true);
 	AtEOXact_PgStat(true);
 	AtEOXact_Snapshot(true, false);
-	AtCommit_ApplyLauncher();
+	AtEOXact_ApplyLauncher(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2612,6 +2612,7 @@ AbortTransaction(void)
 		AtEOXact_ComboCid();
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false);
+		AtEOXact_ApplyLauncher(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index f6ae610..92cf274 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -633,13 +633,26 @@ ApplyLauncherShmemInit(void)
 }
 
 /*
- * Wakeup the launcher on commit if requested.
+ * AtEOXact_ApplyLauncher
+ *      Wakeup the launcher on commit if requested.
+ *
+ * Note that on_commit_launcher_wakeup flag doesn't work as expected in
+ * two-phase commit case. For example, COMMIT PREPARED on the transaction
+ * enabling the flag doesn't wake up  the logical replication launcher if
+ * ROLLBACK on another transaction runs before it. To handle this case
+ * properly, the flag needs to be recoreded in two-phase commit state file
+ * so that COMMIT PREPARED and ROLLBACK PREPREAED can determine whether to
+ * wake up the launcher. However this is overkill for small gain and false
+ * wakeup of the launcher is not so harmful (probably we can live with
+ * that), so we do nothing here for this issue.
  */
 void
-AtCommit_ApplyLauncher(void)
+AtEOXact_ApplyLauncher(bool isCommit)
 {
-	if (on_commit_launcher_wakeup)
+	if (isCommit && on_commit_launcher_wakeup)
 		ApplyLauncherWakeup();
+
+	on_commit_launcher_wakeup = false;
 }
 
 /*
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 0c2bf03..fb3c2f5 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,6 +22,6 @@ extern Size ApplyLauncherShmemSize(void);
 extern void ApplyLauncherShmemInit(void);
 
 extern void ApplyLauncherWakeupAtCommit(void);
-extern void AtCommit_ApplyLauncher(void);
+extern void AtEOXact_ApplyLauncher(bool isCommit);
 
 #endif   /* LOGICALLAUNCHER_H */
#27Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Masahiko Sawada (#26)
Re: some review comments on logical rep code

Hello,

At Mon, 24 Apr 2017 11:18:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoBU8mZdGx-sTJ3u+qkaej5RPnOuotW4kfeXc4XdDNF8cw@mail.gmail.com>

BEGIN;
ALTER SUBSCRIPTION hoge_sub ENABLE;
PREPARE TRANSACTION 'g';
BEGIN;
SELECT 1;
COMMIT; -- wake up the launcher at this point.
COMMIT PREPARED 'g';

Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
not a big deal to the launcher process actually. Compared with the
complexly of changing the logic so that on_commit_launcher_wakeup
corresponds to prepared transaction, we might want to accept this
behavior.

on_commit_launcher_wakeup needs to be recoreded in 2PC state
file to handle this issue properly. But I agree with you, that would
be overkill for small gain. So I'm thinking to add the following
comment into your patch and push it. Thought?

-------------------------
Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
For example, COMMIT PREPARED on the transaction enabling the flag
doesn't wake up
the logical replication launcher if ROLLBACK on another transaction
runs before it.
To handle this case properly, the flag needs to be recorded in 2PC
state file so that
COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
the launcher. However this is overkill for small gain and false wakeup
of the launcher
is not so harmful (probably we can live with that), so we do nothing
here for this issue.
-------------------------

Agreed.

I added this comment to the patch and attached it.

The following "However" may need a follwoing comma.

However this is overkill for small gain and false wakeup of the
launcher is not so harmful (probably we can live with that), so
we do nothing here for this issue.

I agree this as a whole. But I think that the main issue here is
not false wakeups, but 'possible delay of launching new workers
by 3 minutes at most' (this is centainly a kind of false wakeups,
though). We can live with this failure when using two-paase
commmit, but I think it shouldn't happen silently.

How about providing AtPrepare_ApplyLauncher(void) like the
follows and calling it in PrepareTransaction?

void
AtPrepare_ApplyLauncher(void)
{
if (on_commit_launcher_wakeup)
ereport(WARNING,
(errmsg ("logical replication may not start immediately"),

errhint("PREPARE TRANSACTION can hinder immediate lanching of logical replication worker.")));
}

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kyotaro HORIGUCHI (#27)
Re: some review comments on logical rep code

On Mon, Apr 24, 2017 at 7:57 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello,

At Mon, 24 Apr 2017 11:18:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoBU8mZdGx-sTJ3u+qkaej5RPnOuotW4kfeXc4XdDNF8cw@mail.gmail.com>

BEGIN;
ALTER SUBSCRIPTION hoge_sub ENABLE;
PREPARE TRANSACTION 'g';
BEGIN;
SELECT 1;
COMMIT; -- wake up the launcher at this point.
COMMIT PREPARED 'g';

Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
not a big deal to the launcher process actually. Compared with the
complexly of changing the logic so that on_commit_launcher_wakeup
corresponds to prepared transaction, we might want to accept this
behavior.

on_commit_launcher_wakeup needs to be recoreded in 2PC state
file to handle this issue properly. But I agree with you, that would
be overkill for small gain. So I'm thinking to add the following
comment into your patch and push it. Thought?

-------------------------
Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
For example, COMMIT PREPARED on the transaction enabling the flag
doesn't wake up
the logical replication launcher if ROLLBACK on another transaction
runs before it.
To handle this case properly, the flag needs to be recorded in 2PC
state file so that
COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
the launcher. However this is overkill for small gain and false wakeup
of the launcher
is not so harmful (probably we can live with that), so we do nothing
here for this issue.
-------------------------

Agreed.

I added this comment to the patch and attached it.

The following "However" may need a follwoing comma.

However this is overkill for small gain and false wakeup of the
launcher is not so harmful (probably we can live with that), so
we do nothing here for this issue.

I agree this as a whole. But I think that the main issue here is
not false wakeups, but 'possible delay of launching new workers
by 3 minutes at most'

In what case does launching of the apply worker delay 3 minutes at most?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Masahiko Sawada (#28)
Re: some review comments on logical rep code

At Tue, 25 Apr 2017 10:11:06 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoBAXRMBBaH5-mYXxksPaTis0xt_-yvvb2Y+jG2m-EWAoQ@mail.gmail.com>

On Mon, Apr 24, 2017 at 7:57 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello,

At Mon, 24 Apr 2017 11:18:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoBU8mZdGx-sTJ3u+qkaej5RPnOuotW4kfeXc4XdDNF8cw@mail.gmail.com>

BEGIN;
ALTER SUBSCRIPTION hoge_sub ENABLE;
PREPARE TRANSACTION 'g';
BEGIN;
SELECT 1;
COMMIT; -- wake up the launcher at this point.
COMMIT PREPARED 'g';

Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
not a big deal to the launcher process actually. Compared with the
complexly of changing the logic so that on_commit_launcher_wakeup
corresponds to prepared transaction, we might want to accept this
behavior.

on_commit_launcher_wakeup needs to be recoreded in 2PC state
file to handle this issue properly. But I agree with you, that would
be overkill for small gain. So I'm thinking to add the following
comment into your patch and push it. Thought?

-------------------------
Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
For example, COMMIT PREPARED on the transaction enabling the flag
doesn't wake up
the logical replication launcher if ROLLBACK on another transaction
runs before it.
To handle this case properly, the flag needs to be recorded in 2PC
state file so that
COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
the launcher. However this is overkill for small gain and false wakeup
of the launcher
is not so harmful (probably we can live with that), so we do nothing
here for this issue.
-------------------------

Agreed.

I added this comment to the patch and attached it.

The following "However" may need a follwoing comma.

However this is overkill for small gain and false wakeup of the
launcher is not so harmful (probably we can live with that), so
we do nothing here for this issue.

I agree this as a whole. But I think that the main issue here is
not false wakeups, but 'possible delay of launching new workers
by 3 minutes at most'

In what case does launching of the apply worker delay 3 minutes at most?

In the "while (!got_SIGTERM)" loop in ApplyLauncherMain, if we
tried to run enabled subscriptions but no subscription is
enabled, it waits for 3 minutes. If we turn on a subscription but
the trigger is consumed by the false wakeup, the loop sees no
enabled subscription and will sleep for 3 minutes unless any
other trigger comes.

# I haven't tried that, though.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30Fujii Masao
masao.fujii@gmail.com
In reply to: Kyotaro HORIGUCHI (#27)
1 attachment(s)
Re: some review comments on logical rep code

On Mon, Apr 24, 2017 at 7:57 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello,

At Mon, 24 Apr 2017 11:18:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoBU8mZdGx-sTJ3u+qkaej5RPnOuotW4kfeXc4XdDNF8cw@mail.gmail.com>

BEGIN;
ALTER SUBSCRIPTION hoge_sub ENABLE;
PREPARE TRANSACTION 'g';
BEGIN;
SELECT 1;
COMMIT; -- wake up the launcher at this point.
COMMIT PREPARED 'g';

Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
not a big deal to the launcher process actually. Compared with the
complexly of changing the logic so that on_commit_launcher_wakeup
corresponds to prepared transaction, we might want to accept this
behavior.

on_commit_launcher_wakeup needs to be recoreded in 2PC state
file to handle this issue properly. But I agree with you, that would
be overkill for small gain. So I'm thinking to add the following
comment into your patch and push it. Thought?

-------------------------
Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
For example, COMMIT PREPARED on the transaction enabling the flag
doesn't wake up
the logical replication launcher if ROLLBACK on another transaction
runs before it.
To handle this case properly, the flag needs to be recorded in 2PC
state file so that
COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
the launcher. However this is overkill for small gain and false wakeup
of the launcher
is not so harmful (probably we can live with that), so we do nothing
here for this issue.
-------------------------

Agreed.

I added this comment to the patch and attached it.

The following "However" may need a follwoing comma.

However this is overkill for small gain and false wakeup of the
launcher is not so harmful (probably we can live with that), so
we do nothing here for this issue.

I agree this as a whole. But I think that the main issue here is
not false wakeups, but 'possible delay of launching new workers
by 3 minutes at most' (this is centainly a kind of false wakeups,
though). We can live with this failure when using two-paase
commmit, but I think it shouldn't happen silently.

How about providing AtPrepare_ApplyLauncher(void) like the
follows and calling it in PrepareTransaction?

Or we should apply the attached patch and handle the 2PC case properly?
I was thinking that it's overkill more than necessary, but that seems not true
as far as I implement that.

Regards,

--
Fujii Masao

Attachments:

002_Reset_on_commit_launcher_wakeup_v5.patchapplication/octet-stream; name=002_Reset_on_commit_launcher_wakeup_v5.patchDownload
*** a/src/backend/access/transam/twophase.c
--- b/src/backend/access/transam/twophase.c
***************
*** 93,98 ****
--- 93,99 ----
  #include "miscadmin.h"
  #include "pg_trace.h"
  #include "pgstat.h"
+ #include "replication/logicallauncher.h"
  #include "replication/origin.h"
  #include "replication/syncrep.h"
  #include "replication/walsender.h"
***************
*** 914,919 **** typedef struct TwoPhaseFileHeader
--- 915,921 ----
  	int32		nabortrels;		/* number of delete-on-abort rels */
  	int32		ninvalmsgs;		/* number of cache invalidation messages */
  	bool		initfileinval;	/* does relcache init file need invalidation? */
+ 	bool		wakeuplauncher;	/* need to wake up logical rep launcher? */
  	uint16		gidlen;			/* length of the GID - GID follows the header */
  } TwoPhaseFileHeader;
  
***************
*** 1025,1030 **** StartPrepare(GlobalTransaction gxact)
--- 1027,1034 ----
  	hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels);
  	hdr.ninvalmsgs = xactGetCommittedInvalidationMessages(&invalmsgs,
  														  &hdr.initfileinval);
+ 	hdr.wakeuplauncher = on_commit_launcher_wakeup;
+ 	on_commit_launcher_wakeup = false;
  	hdr.gidlen = strlen(gxact->gid) + 1;		/* Include '\0' */
  
  	save_state_data(&hdr, sizeof(TwoPhaseFileHeader));
***************
*** 1501,1506 **** FinishPreparedTransaction(const char *gid, bool isCommit)
--- 1505,1517 ----
  	/* Count the prepared xact as committed or aborted */
  	AtEOXact_PgStat(isCommit);
  
+ 	/* Wake up the logical replication launcher if necessary */
+ 	if (hdr->wakeuplauncher)
+ 	{
+ 		ApplyLauncherWakeupAtCommit();
+ 		AtEOXact_ApplyLauncher(isCommit);
+ 	}
+ 
  	/*
  	 * And now we can clean up any files we may have left.
  	 */
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
***************
*** 2138,2144 **** CommitTransaction(void)
  	AtEOXact_HashTables(true);
  	AtEOXact_PgStat(true);
  	AtEOXact_Snapshot(true, false);
! 	AtCommit_ApplyLauncher();
  	pgstat_report_xact_timestamp(0);
  
  	CurrentResourceOwner = NULL;
--- 2138,2144 ----
  	AtEOXact_HashTables(true);
  	AtEOXact_PgStat(true);
  	AtEOXact_Snapshot(true, false);
! 	AtEOXact_ApplyLauncher(true);
  	pgstat_report_xact_timestamp(0);
  
  	CurrentResourceOwner = NULL;
***************
*** 2612,2617 **** AbortTransaction(void)
--- 2612,2618 ----
  		AtEOXact_ComboCid();
  		AtEOXact_HashTables(false);
  		AtEOXact_PgStat(false);
+ 		AtEOXact_ApplyLauncher(false);
  		pgstat_report_xact_timestamp(0);
  	}
  
*** a/src/backend/replication/logical/launcher.c
--- b/src/backend/replication/logical/launcher.c
***************
*** 81,87 **** static void logicalrep_worker_detach(void);
  volatile sig_atomic_t got_SIGHUP = false;
  volatile sig_atomic_t got_SIGTERM = false;
  
! static bool	on_commit_launcher_wakeup = false;
  
  Datum pg_stat_get_subscription(PG_FUNCTION_ARGS);
  
--- 81,87 ----
  volatile sig_atomic_t got_SIGHUP = false;
  volatile sig_atomic_t got_SIGTERM = false;
  
! bool	on_commit_launcher_wakeup = false;
  
  Datum pg_stat_get_subscription(PG_FUNCTION_ARGS);
  
***************
*** 633,645 **** ApplyLauncherShmemInit(void)
  }
  
  /*
!  * Wakeup the launcher on commit if requested.
   */
  void
! AtCommit_ApplyLauncher(void)
  {
! 	if (on_commit_launcher_wakeup)
  		ApplyLauncherWakeup();
  }
  
  /*
--- 633,648 ----
  }
  
  /*
!  * AtEOXact_ApplyLauncher
!  *      Wakeup the launcher on commit if requested.
   */
  void
! AtEOXact_ApplyLauncher(bool isCommit)
  {
! 	if (isCommit && on_commit_launcher_wakeup)
  		ApplyLauncherWakeup();
+ 
+ 	on_commit_launcher_wakeup = false;
  }
  
  /*
*** a/src/include/replication/logicallauncher.h
--- b/src/include/replication/logicallauncher.h
***************
*** 22,27 **** extern Size ApplyLauncherShmemSize(void);
  extern void ApplyLauncherShmemInit(void);
  
  extern void ApplyLauncherWakeupAtCommit(void);
! extern void AtCommit_ApplyLauncher(void);
  
  #endif   /* LOGICALLAUNCHER_H */
--- 22,29 ----
  extern void ApplyLauncherShmemInit(void);
  
  extern void ApplyLauncherWakeupAtCommit(void);
! extern void AtEOXact_ApplyLauncher(bool isCommit);
! 
! extern bool	on_commit_launcher_wakeup;
  
  #endif   /* LOGICALLAUNCHER_H */
#31Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Fujii Masao (#30)
Re: some review comments on logical rep code

On 26/04/17 01:01, Fujii Masao wrote:

On Mon, Apr 24, 2017 at 7:57 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello,

At Mon, 24 Apr 2017 11:18:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoBU8mZdGx-sTJ3u+qkaej5RPnOuotW4kfeXc4XdDNF8cw@mail.gmail.com>

BEGIN;
ALTER SUBSCRIPTION hoge_sub ENABLE;
PREPARE TRANSACTION 'g';
BEGIN;
SELECT 1;
COMMIT; -- wake up the launcher at this point.
COMMIT PREPARED 'g';

Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
not a big deal to the launcher process actually. Compared with the
complexly of changing the logic so that on_commit_launcher_wakeup
corresponds to prepared transaction, we might want to accept this
behavior.

on_commit_launcher_wakeup needs to be recoreded in 2PC state
file to handle this issue properly. But I agree with you, that would
be overkill for small gain. So I'm thinking to add the following
comment into your patch and push it. Thought?

-------------------------
Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
For example, COMMIT PREPARED on the transaction enabling the flag
doesn't wake up
the logical replication launcher if ROLLBACK on another transaction
runs before it.
To handle this case properly, the flag needs to be recorded in 2PC
state file so that
COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
the launcher. However this is overkill for small gain and false wakeup
of the launcher
is not so harmful (probably we can live with that), so we do nothing
here for this issue.
-------------------------

Agreed.

I added this comment to the patch and attached it.

The following "However" may need a follwoing comma.

However this is overkill for small gain and false wakeup of the
launcher is not so harmful (probably we can live with that), so
we do nothing here for this issue.

I agree this as a whole. But I think that the main issue here is
not false wakeups, but 'possible delay of launching new workers
by 3 minutes at most' (this is centainly a kind of false wakeups,
though). We can live with this failure when using two-paase
commmit, but I think it shouldn't happen silently.

How about providing AtPrepare_ApplyLauncher(void) like the
follows and calling it in PrepareTransaction?

Or we should apply the attached patch and handle the 2PC case properly?
I was thinking that it's overkill more than necessary, but that seems not true
as far as I implement that.

Looks like it does not even increase size of the 2pc file, +1 for this.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#32Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Petr Jelinek (#31)
Re: some review comments on logical rep code

On Wed, Apr 26, 2017 at 12:35 PM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:

On 26/04/17 01:01, Fujii Masao wrote:

On Mon, Apr 24, 2017 at 7:57 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello,

At Mon, 24 Apr 2017 11:18:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoBU8mZdGx-sTJ3u+qkaej5RPnOuotW4kfeXc4XdDNF8cw@mail.gmail.com>

BEGIN;
ALTER SUBSCRIPTION hoge_sub ENABLE;
PREPARE TRANSACTION 'g';
BEGIN;
SELECT 1;
COMMIT; -- wake up the launcher at this point.
COMMIT PREPARED 'g';

Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
not a big deal to the launcher process actually. Compared with the
complexly of changing the logic so that on_commit_launcher_wakeup
corresponds to prepared transaction, we might want to accept this
behavior.

on_commit_launcher_wakeup needs to be recoreded in 2PC state
file to handle this issue properly. But I agree with you, that would
be overkill for small gain. So I'm thinking to add the following
comment into your patch and push it. Thought?

-------------------------
Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
For example, COMMIT PREPARED on the transaction enabling the flag
doesn't wake up
the logical replication launcher if ROLLBACK on another transaction
runs before it.
To handle this case properly, the flag needs to be recorded in 2PC
state file so that
COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
the launcher. However this is overkill for small gain and false wakeup
of the launcher
is not so harmful (probably we can live with that), so we do nothing
here for this issue.
-------------------------

Agreed.

I added this comment to the patch and attached it.

The following "However" may need a follwoing comma.

However this is overkill for small gain and false wakeup of the
launcher is not so harmful (probably we can live with that), so
we do nothing here for this issue.

I agree this as a whole. But I think that the main issue here is
not false wakeups, but 'possible delay of launching new workers
by 3 minutes at most' (this is centainly a kind of false wakeups,
though). We can live with this failure when using two-paase
commmit, but I think it shouldn't happen silently.

How about providing AtPrepare_ApplyLauncher(void) like the
follows and calling it in PrepareTransaction?

Or we should apply the attached patch and handle the 2PC case properly?
I was thinking that it's overkill more than necessary, but that seems not true
as far as I implement that.

In my honest opinion, I didn't have a big will that we should handle
even two-phase commit case, because this case is very rare (I could
not image such case) and doesn't mean to lead a harmful result such as
crash of server and returning inconsistent result. it just delays the
launching worker for at most 3 minutes. We also can deal with this for
example by making maximum nap time of apply launcher user-configurable
and document it.
But if we can deal with it by minimum changes like attached your patch I agree.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#33Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Masahiko Sawada (#32)
Re: some review comments on logical rep code

At Wed, 26 Apr 2017 14:31:12 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDMy8a6UwMrRh8pigQbDC+JAOQ4m_tXT41VRP4SYp23=w@mail.gmail.com>

On Wed, Apr 26, 2017 at 12:35 PM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:

On 26/04/17 01:01, Fujii Masao wrote:

However this is overkill for small gain and false wakeup of the
launcher is not so harmful (probably we can live with that), so
we do nothing here for this issue.

I agree this as a whole. But I think that the main issue here is
not false wakeups, but 'possible delay of launching new workers
by 3 minutes at most' (this is centainly a kind of false wakeups,
though). We can live with this failure when using two-paase
commmit, but I think it shouldn't happen silently.

How about providing AtPrepare_ApplyLauncher(void) like the
follows and calling it in PrepareTransaction?

Or we should apply the attached patch and handle the 2PC case properly?
I was thinking that it's overkill more than necessary, but that seems not true
as far as I implement that.

Looks like it does not even increase size of the 2pc file, +1 for this.

In my honest opinion, I didn't have a big will that we should handle
even two-phase commit case, because this case is very rare (I could
not image such case) and doesn't mean to lead a harmful result such as
crash of server and returning inconsistent result. it just delays the
launching worker for at most 3 minutes. We also can deal with this for
example by making maximum nap time of apply launcher user-configurable
and document it.
But if we can deal with it by minimum changes like attached your patch I agree.

This change looks reasonable to me, +1 from me to this.

The patch reads on_commit_launcher_wakeup directly then updates
it via ApplyALuncherWakeupAtCommit() but it's too much to add a
function for the sake of this.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#34Fujii Masao
masao.fujii@gmail.com
In reply to: Kyotaro HORIGUCHI (#33)
1 attachment(s)
Re: some review comments on logical rep code

On Wed, Apr 26, 2017 at 3:47 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Wed, 26 Apr 2017 14:31:12 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDMy8a6UwMrRh8pigQbDC+JAOQ4m_tXT41VRP4SYp23=w@mail.gmail.com>

On Wed, Apr 26, 2017 at 12:35 PM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:

On 26/04/17 01:01, Fujii Masao wrote:

However this is overkill for small gain and false wakeup of the
launcher is not so harmful (probably we can live with that), so
we do nothing here for this issue.

I agree this as a whole. But I think that the main issue here is
not false wakeups, but 'possible delay of launching new workers
by 3 minutes at most' (this is centainly a kind of false wakeups,
though). We can live with this failure when using two-paase
commmit, but I think it shouldn't happen silently.

How about providing AtPrepare_ApplyLauncher(void) like the
follows and calling it in PrepareTransaction?

Or we should apply the attached patch and handle the 2PC case properly?
I was thinking that it's overkill more than necessary, but that seems not true
as far as I implement that.

Looks like it does not even increase size of the 2pc file, +1 for this.

In my honest opinion, I didn't have a big will that we should handle
even two-phase commit case, because this case is very rare (I could
not image such case) and doesn't mean to lead a harmful result such as
crash of server and returning inconsistent result. it just delays the
launching worker for at most 3 minutes. We also can deal with this for
example by making maximum nap time of apply launcher user-configurable
and document it.
But if we can deal with it by minimum changes like attached your patch I agree.

This change looks reasonable to me, +1 from me to this.

The patch reads on_commit_launcher_wakeup directly then updates
it via ApplyALuncherWakeupAtCommit() but it's too much to add a
function for the sake of this.

OK, so what about the attached patch? I replaced all the calls to
ApplyLauncherWakeupAtCommit() with the code "on_commit_launcher_wakeup = true".

Regards,

--
Fujii Masao

Attachments:

002_Reset_on_commit_launcher_wakeup_v6.patchapplication/octet-stream; name=002_Reset_on_commit_launcher_wakeup_v6.patchDownload
*** a/src/backend/access/transam/twophase.c
--- b/src/backend/access/transam/twophase.c
***************
*** 93,98 ****
--- 93,99 ----
  #include "miscadmin.h"
  #include "pg_trace.h"
  #include "pgstat.h"
+ #include "replication/logicallauncher.h"
  #include "replication/origin.h"
  #include "replication/syncrep.h"
  #include "replication/walsender.h"
***************
*** 914,919 **** typedef struct TwoPhaseFileHeader
--- 915,921 ----
  	int32		nabortrels;		/* number of delete-on-abort rels */
  	int32		ninvalmsgs;		/* number of cache invalidation messages */
  	bool		initfileinval;	/* does relcache init file need invalidation? */
+ 	bool		wakeuplauncher;	/* need to wake up logical rep launcher? */
  	uint16		gidlen;			/* length of the GID - GID follows the header */
  } TwoPhaseFileHeader;
  
***************
*** 1025,1030 **** StartPrepare(GlobalTransaction gxact)
--- 1027,1034 ----
  	hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels);
  	hdr.ninvalmsgs = xactGetCommittedInvalidationMessages(&invalmsgs,
  														  &hdr.initfileinval);
+ 	hdr.wakeuplauncher = on_commit_launcher_wakeup;
+ 	on_commit_launcher_wakeup = false;
  	hdr.gidlen = strlen(gxact->gid) + 1;		/* Include '\0' */
  
  	save_state_data(&hdr, sizeof(TwoPhaseFileHeader));
***************
*** 1501,1506 **** FinishPreparedTransaction(const char *gid, bool isCommit)
--- 1505,1517 ----
  	/* Count the prepared xact as committed or aborted */
  	AtEOXact_PgStat(isCommit);
  
+ 	/* Wake up the logical replication launcher if necessary */
+ 	if (hdr->wakeuplauncher)
+ 	{
+ 		on_commit_launcher_wakeup = true;
+ 		AtEOXact_ApplyLauncher(isCommit);
+ 	}
+ 
  	/*
  	 * And now we can clean up any files we may have left.
  	 */
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
***************
*** 2138,2144 **** CommitTransaction(void)
  	AtEOXact_HashTables(true);
  	AtEOXact_PgStat(true);
  	AtEOXact_Snapshot(true, false);
! 	AtCommit_ApplyLauncher();
  	pgstat_report_xact_timestamp(0);
  
  	CurrentResourceOwner = NULL;
--- 2138,2144 ----
  	AtEOXact_HashTables(true);
  	AtEOXact_PgStat(true);
  	AtEOXact_Snapshot(true, false);
! 	AtEOXact_ApplyLauncher(true);
  	pgstat_report_xact_timestamp(0);
  
  	CurrentResourceOwner = NULL;
***************
*** 2612,2617 **** AbortTransaction(void)
--- 2612,2618 ----
  		AtEOXact_ComboCid();
  		AtEOXact_HashTables(false);
  		AtEOXact_PgStat(false);
+ 		AtEOXact_ApplyLauncher(false);
  		pgstat_report_xact_timestamp(0);
  	}
  
*** a/src/backend/commands/subscriptioncmds.c
--- b/src/backend/commands/subscriptioncmds.c
***************
*** 452,458 **** CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel)
  
  	heap_close(rel, RowExclusiveLock);
  
! 	ApplyLauncherWakeupAtCommit();
  
  	ObjectAddressSet(myself, SubscriptionRelationId, subid);
  
--- 452,464 ----
  
  	heap_close(rel, RowExclusiveLock);
  
! 	/*
! 	* Request wakeup of the launcher on commit of the transaction.
! 	* This is used to send launcher signal to stop sleeping and process
! 	* the subscriptions when current transaction commits.
! 	*/
! 	if (enabled)
! 		on_commit_launcher_wakeup = true;
  
  	ObjectAddressSet(myself, SubscriptionRelationId, subid);
  
***************
*** 645,651 **** AlterSubscription(AlterSubscriptionStmt *stmt)
  				replaces[Anum_pg_subscription_subenabled - 1] = true;
  
  				if (enabled)
! 					ApplyLauncherWakeupAtCommit();
  
  				update_tuple = true;
  				break;
--- 651,657 ----
  				replaces[Anum_pg_subscription_subenabled - 1] = true;
  
  				if (enabled)
! 					on_commit_launcher_wakeup = true;
  
  				update_tuple = true;
  				break;
*** a/src/backend/replication/logical/launcher.c
--- b/src/backend/replication/logical/launcher.c
***************
*** 83,89 **** static void logicalrep_worker_cleanup(LogicalRepWorker *worker);
  volatile sig_atomic_t got_SIGHUP = false;
  volatile sig_atomic_t got_SIGTERM = false;
  
! static bool	on_commit_launcher_wakeup = false;
  
  Datum pg_stat_get_subscription(PG_FUNCTION_ARGS);
  
--- 83,89 ----
  volatile sig_atomic_t got_SIGHUP = false;
  volatile sig_atomic_t got_SIGTERM = false;
  
! bool	on_commit_launcher_wakeup = false;
  
  Datum pg_stat_get_subscription(PG_FUNCTION_ARGS);
  
***************
*** 745,771 **** ApplyLauncherShmemInit(void)
  }
  
  /*
!  * Wakeup the launcher on commit if requested.
   */
  void
! AtCommit_ApplyLauncher(void)
  {
! 	if (on_commit_launcher_wakeup)
  		ApplyLauncherWakeup();
- }
  
! /*
!  * Request wakeup of the launcher on commit of the transaction.
!  *
!  * This is used to send launcher signal to stop sleeping and process the
!  * subscriptions when current transaction commits. Should be used when new
!  * tuple was added to the pg_subscription catalog.
! */
! void
! ApplyLauncherWakeupAtCommit(void)
! {
! 	if (!on_commit_launcher_wakeup)
! 		on_commit_launcher_wakeup = true;
  }
  
  static void
--- 745,760 ----
  }
  
  /*
!  * AtEOXact_ApplyLauncher
!  *      Wakeup the launcher on commit if requested.
   */
  void
! AtEOXact_ApplyLauncher(bool isCommit)
  {
! 	if (isCommit && on_commit_launcher_wakeup)
  		ApplyLauncherWakeup();
  
! 	on_commit_launcher_wakeup = false;
  }
  
  static void
*** a/src/include/replication/logicallauncher.h
--- b/src/include/replication/logicallauncher.h
***************
*** 21,27 **** extern void ApplyLauncherMain(Datum main_arg);
  extern Size ApplyLauncherShmemSize(void);
  extern void ApplyLauncherShmemInit(void);
  
! extern void ApplyLauncherWakeupAtCommit(void);
! extern void AtCommit_ApplyLauncher(void);
  
  #endif   /* LOGICALLAUNCHER_H */
--- 21,28 ----
  extern Size ApplyLauncherShmemSize(void);
  extern void ApplyLauncherShmemInit(void);
  
! extern void AtEOXact_ApplyLauncher(bool isCommit);
! 
! extern bool	on_commit_launcher_wakeup;
  
  #endif   /* LOGICALLAUNCHER_H */
#35Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#34)
1 attachment(s)
Re: some review comments on logical rep code

On Thu, Apr 27, 2017 at 1:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Wed, Apr 26, 2017 at 3:47 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Wed, 26 Apr 2017 14:31:12 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDMy8a6UwMrRh8pigQbDC+JAOQ4m_tXT41VRP4SYp23=w@mail.gmail.com>

On Wed, Apr 26, 2017 at 12:35 PM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:

On 26/04/17 01:01, Fujii Masao wrote:

However this is overkill for small gain and false wakeup of the
launcher is not so harmful (probably we can live with that), so
we do nothing here for this issue.

I agree this as a whole. But I think that the main issue here is
not false wakeups, but 'possible delay of launching new workers
by 3 minutes at most' (this is centainly a kind of false wakeups,
though). We can live with this failure when using two-paase
commmit, but I think it shouldn't happen silently.

How about providing AtPrepare_ApplyLauncher(void) like the
follows and calling it in PrepareTransaction?

Or we should apply the attached patch and handle the 2PC case properly?
I was thinking that it's overkill more than necessary, but that seems not true
as far as I implement that.

Looks like it does not even increase size of the 2pc file, +1 for this.

In my honest opinion, I didn't have a big will that we should handle
even two-phase commit case, because this case is very rare (I could
not image such case) and doesn't mean to lead a harmful result such as
crash of server and returning inconsistent result. it just delays the
launching worker for at most 3 minutes. We also can deal with this for
example by making maximum nap time of apply launcher user-configurable
and document it.
But if we can deal with it by minimum changes like attached your patch I agree.

This change looks reasonable to me, +1 from me to this.

The patch reads on_commit_launcher_wakeup directly then updates
it via ApplyALuncherWakeupAtCommit() but it's too much to add a
function for the sake of this.

OK, so what about the attached patch? I replaced all the calls to
ApplyLauncherWakeupAtCommit() with the code "on_commit_launcher_wakeup = true".

BTW, while I was reading the code to implement the patch that
I posted upthread, I found that the following condition doesn't
work as expected. This is because "last_start_time" is always 0.

/* Limit the start retry to once a wal_retrieve_retry_interval */
if (TimestampDifferenceExceeds(last_start_time, now,
wal_retrieve_retry_interval))

"last_start_time" is set to "now" when the launcher starts up new
worker. But "last_start_time" is defined and always initialized with 0
at the beginning of the main loop in ApplyLauncherMain(), so
the above condition becomes always true. This is obviously a bug.
Attached patch fixes this issue.

Regards,

--
Fujii Masao

Attachments:

bugfix.patchapplication/octet-stream; name=bugfix.patchDownload
*** a/src/backend/replication/logical/launcher.c
--- b/src/backend/replication/logical/launcher.c
***************
*** 781,786 **** ApplyLauncherWakeup(void)
--- 781,788 ----
  void
  ApplyLauncherMain(Datum main_arg)
  {
+ 	TimestampTz		last_start_time = 0;
+ 
  	ereport(DEBUG1,
  			(errmsg("logical replication launcher started")));
  
***************
*** 812,818 **** ApplyLauncherMain(Datum main_arg)
  		MemoryContext	subctx;
  		MemoryContext	oldctx;
  		TimestampTz		now;
- 		TimestampTz		last_start_time = 0;
  		long			wait_time = DEFAULT_NAPTIME_PER_CYCLE;
  
  		now = GetCurrentTimestamp();
--- 814,819 ----
#36Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Fujii Masao (#35)
Re: some review comments on logical rep code

On 26/04/17 18:36, Fujii Masao wrote:

On Thu, Apr 27, 2017 at 1:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Wed, Apr 26, 2017 at 3:47 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Wed, 26 Apr 2017 14:31:12 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDMy8a6UwMrRh8pigQbDC+JAOQ4m_tXT41VRP4SYp23=w@mail.gmail.com>

On Wed, Apr 26, 2017 at 12:35 PM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:

On 26/04/17 01:01, Fujii Masao wrote:

However this is overkill for small gain and false wakeup of the
launcher is not so harmful (probably we can live with that), so
we do nothing here for this issue.

I agree this as a whole. But I think that the main issue here is
not false wakeups, but 'possible delay of launching new workers
by 3 minutes at most' (this is centainly a kind of false wakeups,
though). We can live with this failure when using two-paase
commmit, but I think it shouldn't happen silently.

How about providing AtPrepare_ApplyLauncher(void) like the
follows and calling it in PrepareTransaction?

Or we should apply the attached patch and handle the 2PC case properly?
I was thinking that it's overkill more than necessary, but that seems not true
as far as I implement that.

Looks like it does not even increase size of the 2pc file, +1 for this.

In my honest opinion, I didn't have a big will that we should handle
even two-phase commit case, because this case is very rare (I could
not image such case) and doesn't mean to lead a harmful result such as
crash of server and returning inconsistent result. it just delays the
launching worker for at most 3 minutes. We also can deal with this for
example by making maximum nap time of apply launcher user-configurable
and document it.
But if we can deal with it by minimum changes like attached your patch I agree.

This change looks reasonable to me, +1 from me to this.

The patch reads on_commit_launcher_wakeup directly then updates
it via ApplyALuncherWakeupAtCommit() but it's too much to add a
function for the sake of this.

OK, so what about the attached patch? I replaced all the calls to
ApplyLauncherWakeupAtCommit() with the code "on_commit_launcher_wakeup = true".

BTW, while I was reading the code to implement the patch that
I posted upthread, I found that the following condition doesn't
work as expected. This is because "last_start_time" is always 0.

/* Limit the start retry to once a wal_retrieve_retry_interval */
if (TimestampDifferenceExceeds(last_start_time, now,
wal_retrieve_retry_interval))

"last_start_time" is set to "now" when the launcher starts up new
worker. But "last_start_time" is defined and always initialized with 0
at the beginning of the main loop in ApplyLauncherMain(), so
the above condition becomes always true. This is obviously a bug.
Attached patch fixes this issue.

Yes, please go ahead and commit this.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#37Fujii Masao
masao.fujii@gmail.com
In reply to: Petr Jelinek (#36)
Re: some review comments on logical rep code

On Thu, Apr 27, 2017 at 5:37 PM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:

On 26/04/17 18:36, Fujii Masao wrote:

On Thu, Apr 27, 2017 at 1:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Wed, Apr 26, 2017 at 3:47 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Wed, 26 Apr 2017 14:31:12 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDMy8a6UwMrRh8pigQbDC+JAOQ4m_tXT41VRP4SYp23=w@mail.gmail.com>

On Wed, Apr 26, 2017 at 12:35 PM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:

On 26/04/17 01:01, Fujii Masao wrote:

However this is overkill for small gain and false wakeup of the
launcher is not so harmful (probably we can live with that), so
we do nothing here for this issue.

I agree this as a whole. But I think that the main issue here is
not false wakeups, but 'possible delay of launching new workers
by 3 minutes at most' (this is centainly a kind of false wakeups,
though). We can live with this failure when using two-paase
commmit, but I think it shouldn't happen silently.

How about providing AtPrepare_ApplyLauncher(void) like the
follows and calling it in PrepareTransaction?

Or we should apply the attached patch and handle the 2PC case properly?
I was thinking that it's overkill more than necessary, but that seems not true
as far as I implement that.

Looks like it does not even increase size of the 2pc file, +1 for this.

In my honest opinion, I didn't have a big will that we should handle
even two-phase commit case, because this case is very rare (I could
not image such case) and doesn't mean to lead a harmful result such as
crash of server and returning inconsistent result. it just delays the
launching worker for at most 3 minutes. We also can deal with this for
example by making maximum nap time of apply launcher user-configurable
and document it.
But if we can deal with it by minimum changes like attached your patch I agree.

This change looks reasonable to me, +1 from me to this.

The patch reads on_commit_launcher_wakeup directly then updates
it via ApplyALuncherWakeupAtCommit() but it's too much to add a
function for the sake of this.

OK, so what about the attached patch? I replaced all the calls to
ApplyLauncherWakeupAtCommit() with the code "on_commit_launcher_wakeup = true".

BTW, while I was reading the code to implement the patch that
I posted upthread, I found that the following condition doesn't
work as expected. This is because "last_start_time" is always 0.

/* Limit the start retry to once a wal_retrieve_retry_interval */
if (TimestampDifferenceExceeds(last_start_time, now,
wal_retrieve_retry_interval))

"last_start_time" is set to "now" when the launcher starts up new
worker. But "last_start_time" is defined and always initialized with 0
at the beginning of the main loop in ApplyLauncherMain(), so
the above condition becomes always true. This is obviously a bug.
Attached patch fixes this issue.

Yes, please go ahead and commit this.

Pushed. Thanks!

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#38Noah Misch
noah@leadboat.com
In reply to: Fujii Masao (#37)
Re: some review comments on logical rep code

On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:

Pushed. Thanks!

Does this close the open item, or is there more to do?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#39Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Noah Misch (#38)
Re: some review comments on logical rep code

On Fri, Apr 28, 2017 at 1:42 PM, Noah Misch <noah@leadboat.com> wrote:

On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:

Pushed. Thanks!

Does this close the open item, or is there more to do?

There is only one item remaining, and the patch is attached on
here[1]/messages/by-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP+y5hecsoQmQqzZf8T7A@mail.gmail.com. I guess reviewing that patch is almost done.

[1]: /messages/by-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP+y5hecsoQmQqzZf8T7A@mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#40Noah Misch
noah@leadboat.com
In reply to: Masahiko Sawada (#39)
Re: some review comments on logical rep code

On Fri, Apr 28, 2017 at 01:55:48PM +0900, Masahiko Sawada wrote:

On Fri, Apr 28, 2017 at 1:42 PM, Noah Misch <noah@leadboat.com> wrote:

On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:

Pushed. Thanks!

Does this close the open item, or is there more to do?

There is only one item remaining, and the patch is attached on
here[1]. I guess reviewing that patch is almost done.

[1] /messages/by-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP+y5hecsoQmQqzZf8T7A@mail.gmail.com

Thanks. Peter, This PostgreSQL 10 open item is past due for your status
update. Kindly send a status update within 24 hours, and include a date for
your subsequent status update. Refer to the policy on open item ownership:
/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#41Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Noah Misch (#40)
Re: some review comments on logical rep code

On 4/28/17 01:01, Noah Misch wrote:

On Fri, Apr 28, 2017 at 01:55:48PM +0900, Masahiko Sawada wrote:

On Fri, Apr 28, 2017 at 1:42 PM, Noah Misch <noah@leadboat.com> wrote:

On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:

Pushed. Thanks!

Does this close the open item, or is there more to do?

There is only one item remaining, and the patch is attached on
here[1]. I guess reviewing that patch is almost done.

[1] /messages/by-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP+y5hecsoQmQqzZf8T7A@mail.gmail.com

Thanks. Peter, This PostgreSQL 10 open item is past due for your status
update. Kindly send a status update within 24 hours, and include a date for
your subsequent status update. Refer to the policy on open item ownership:
/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com

I think the patch that Fujii Masao has proposed has found general
agreement. I would recommend that he commits it as he sees fit.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#42Noah Misch
noah@leadboat.com
In reply to: Peter Eisentraut (#41)
Re: some review comments on logical rep code

On Fri, Apr 28, 2017 at 02:13:48PM -0400, Peter Eisentraut wrote:

On 4/28/17 01:01, Noah Misch wrote:

On Fri, Apr 28, 2017 at 01:55:48PM +0900, Masahiko Sawada wrote:

On Fri, Apr 28, 2017 at 1:42 PM, Noah Misch <noah@leadboat.com> wrote:

On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:

Pushed. Thanks!

Does this close the open item, or is there more to do?

There is only one item remaining, and the patch is attached on
here[1]. I guess reviewing that patch is almost done.

[1] /messages/by-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP+y5hecsoQmQqzZf8T7A@mail.gmail.com

Thanks. Peter, This PostgreSQL 10 open item is past due for your status
update. Kindly send a status update within 24 hours, and include a date for
your subsequent status update. Refer to the policy on open item ownership:
/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com

I think the patch that Fujii Masao has proposed has found general
agreement. I would recommend that he commits it as he sees fit.

This is not a conforming status update, because it does not specify a date for
your next update.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#43Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#42)
Re: some review comments on logical rep code

On Sat, Apr 29, 2017 at 12:33 AM, Noah Misch <noah@leadboat.com> wrote:

I think the patch that Fujii Masao has proposed has found general
agreement. I would recommend that he commits it as he sees fit.

This is not a conforming status update, because it does not specify a date for
your next update.

If Peter thinks that the issues fixed by this patch don't really need
to be corrected, then we could interpret this as a statement that it
should not be listed as an open item but that he does not object to
someone else fixing it anyway.

If these are real issues, then Peter should take responsibility for
them because they were created by his commit. If Fujii Masao is
willing to help by committing the patch in question, great; but if
not, then Peter should be responsible for committing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#44Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Masahiko Sawada (#26)
Re: some review comments on logical rep code

I have committed this version.

I have omitted all the talk about 2PC. There are discussions ongoing
about changing the transaction behavior of CREATE SUBSCRIPTION, which
might interfere with that. If someone wants to rebase and propose the
parts about 2PC separately, I don't object, but it can be handled
separately.

I will close this open item now, since I think we have covered
everything that was originally reported. Please start new threads if
there are any issues that were missed or that have been discovered
during the discussion.

On 4/23/17 22:18, Masahiko Sawada wrote:

On Sat, Apr 22, 2017 at 4:26 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Apr 21, 2017 at 4:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 21, 2017 at 1:19 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Apr 18, 2017 at 5:16 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hi,

Thank you for the revised version.

At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoA+yFJTbw0PCw-ttmh9TsTygM3=iavXF+5Bx4O9-UXdsQ@mail.gmail.com>

On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>

On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
1.

In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
value from the argument, instead of DatumGetObjectId().

Attached 001 patch fixes it.

Hmm. I looked at the launcher side and found that it assigns bare
integer to bgw_main_arg. It also should use Int32GetDatum.

Yeah, I agreed. Will fix it.

Thanks.

2.

No one resets on_commit_launcher_wakeup flag to false.

Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
up the launcher process.

It is omitting the abort case. Maybe it should be
AtEOXact_ApplyLauncher(bool commit)?

Should we wake up the launcher process even when abort?

No, I meant that on_commit_launcher_wakeup should be just reset
without waking up launcher on abort.

I understood. Sounds reasonable.

ROLLBACK PREPARED should not wake up the launcher, but not even with the patch.

Yeah, I've missed it. But on_commit_launcher_wakeup dosen't correspond
to a prepared transaction. Even COMMIT PREPARED might wake up the
launcher process but might not wake up it. ROLLBACK PREPARED is also
the same. For example, in the following situation we wake up the
launcher at COMMIT, not at COMMIT PREPARED.

(BTW, CreateSubscription, is the only one function that sets
on_commit_launcher_wakeup for now, cannot be called in a transaction
block. So it doesn't actually happen that we wake up the launcher when
a ROLLBACK PREPARED. But considering waking up the launcher by ALTER
SUBSCRIPTION ENABLE, we need to deal with it.)

We can run CREATE SUBSCRIPTION with NOCREATE SLOT option
within the transaction block.

Oops, I'd missed it.

BEGIN;
ALTER SUBSCRIPTION hoge_sub ENABLE;
PREPARE TRANSACTION 'g';
BEGIN;
SELECT 1;
COMMIT; -- wake up the launcher at this point.
COMMIT PREPARED 'g';

Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
not a big deal to the launcher process actually. Compared with the
complexly of changing the logic so that on_commit_launcher_wakeup
corresponds to prepared transaction, we might want to accept this
behavior.

on_commit_launcher_wakeup needs to be recoreded in 2PC state
file to handle this issue properly. But I agree with you, that would
be overkill for small gain. So I'm thinking to add the following
comment into your patch and push it. Thought?

-------------------------
Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
For example, COMMIT PREPARED on the transaction enabling the flag
doesn't wake up
the logical replication launcher if ROLLBACK on another transaction
runs before it.
To handle this case properly, the flag needs to be recorded in 2PC
state file so that
COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
the launcher. However this is overkill for small gain and false wakeup
of the launcher
is not so harmful (probably we can live with that), so we do nothing
here for this issue.
-------------------------

Agreed.

I added this comment to the patch and attached it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#45Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Noah Misch (#42)
Re: some review comments on logical rep code

On 4/29/17 00:33, Noah Misch wrote:

On Fri, Apr 28, 2017 at 02:13:48PM -0400, Peter Eisentraut wrote:

On 4/28/17 01:01, Noah Misch wrote:

On Fri, Apr 28, 2017 at 01:55:48PM +0900, Masahiko Sawada wrote:

On Fri, Apr 28, 2017 at 1:42 PM, Noah Misch <noah@leadboat.com> wrote:

On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:

Pushed. Thanks!

Does this close the open item, or is there more to do?

There is only one item remaining, and the patch is attached on
here[1]. I guess reviewing that patch is almost done.

[1] /messages/by-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP+y5hecsoQmQqzZf8T7A@mail.gmail.com

Thanks. Peter, This PostgreSQL 10 open item is past due for your status
update. Kindly send a status update within 24 hours, and include a date for
your subsequent status update. Refer to the policy on open item ownership:
/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com

I think the patch that Fujii Masao has proposed has found general
agreement. I would recommend that he commits it as he sees fit.

This is not a conforming status update, because it does not specify a date for
your next update.

Everything related to this is fixed now.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers