Error while creating subscription when server is running in single user mode

Started by tusharover 8 years ago18 messages
#1tushar
tushar.ahuja@enterprisedb.com

Hi,

There is an error while creating subscription when server is running in
single user mode

centos@centos-cpula bin]$ ./postgres --single postgres -D m1data
PostgreSQL stand-alone backend 10beta1
backend> create subscription sub connection 'dbname=postgres port=5433
user=centos' publication p with (create_slot=0,enabled=off);
2017-05-31 12:53:09.318 BST [10469] LOG: statement: create subscription
sub connection 'dbname=postgres port=5433 user=centos' publication p
with (create_slot=0,enabled=off);

2017-05-31 12:53:09.326 BST [10469] ERROR: epoll_ctl() failed: Bad file
descriptor

--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Dilip Kumar
dilipbalaut@gmail.com
In reply to: tushar (#1)
Re: Error while creating subscription when server is running in single user mode

On Wed, May 31, 2017 at 7:54 AM, tushar <tushar.ahuja@enterprisedb.com> wrote:

centos@centos-cpula bin]$ ./postgres --single postgres -D m1data
PostgreSQL stand-alone backend 10beta1
backend> create subscription sub connection 'dbname=postgres port=5433
user=centos' publication p with (create_slot=0,enabled=off);
2017-05-31 12:53:09.318 BST [10469] LOG: statement: create subscription sub
connection 'dbname=postgres port=5433 user=centos' publication p with
(create_slot=0,enabled=off);

2017-05-31 12:53:09.326 BST [10469] ERROR: epoll_ctl() failed: Bad file
descriptor

IMHO, In single user mode, it can not support replication (it can not
have background WALReciver task). However, I believe there should be a
proper error if the above statement is correct.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Michael Paquier
michael.paquier@gmail.com
In reply to: Dilip Kumar (#2)
Re: Error while creating subscription when server is running in single user mode

On Wed, May 31, 2017 at 7:01 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, May 31, 2017 at 7:54 AM, tushar <tushar.ahuja@enterprisedb.com> wrote:

centos@centos-cpula bin]$ ./postgres --single postgres -D m1data
PostgreSQL stand-alone backend 10beta1
backend> create subscription sub connection 'dbname=postgres port=5433
user=centos' publication p with (create_slot=0,enabled=off);
2017-05-31 12:53:09.318 BST [10469] LOG: statement: create subscription sub
connection 'dbname=postgres port=5433 user=centos' publication p with
(create_slot=0,enabled=off);

2017-05-31 12:53:09.326 BST [10469] ERROR: epoll_ctl() failed: Bad file
descriptor

IMHO, In single user mode, it can not support replication (it can not
have background WALReciver task). However, I believe there should be a
proper error if the above statement is correct.

Yeah, see 0e0f43d6 for example. A simple fix is to look at
IsUnderPostmaster when creating, altering or dropping a subscription
in subscriptioncmds.c.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Dilip Kumar
dilipbalaut@gmail.com
In reply to: Michael Paquier (#3)
1 attachment(s)
Re: Error while creating subscription when server is running in single user mode

On Wed, May 31, 2017 at 2:20 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Yeah, see 0e0f43d6 for example. A simple fix is to look at
IsUnderPostmaster when creating, altering or dropping a subscription
in subscriptioncmds.c.

Yeah, below patch fixes that.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

subscription_error.patchapplication/octet-stream; name=subscription_error.patchDownload
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 21ef15f..95c34fe 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -291,6 +291,11 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel)
 	bool		create_slot;
 	List	   *publications;
 
+	if (!IsUnderPostmaster)
+		ereport(FATAL,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("subscription commands are not supported by single-user servers")));
+
 	/*
 	 * Parse and check options.
 	 * Connection and publication should not be specified here.
@@ -583,6 +588,12 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
 	bool		update_tuple = false;
 	Subscription   *sub;
 
+	if (!IsUnderPostmaster)
+		ereport(FATAL,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("subscription commands are not supported by single-user servers")));
+
+
 	rel = heap_open(SubscriptionRelationId, RowExclusiveLock);
 
 	/* Fetch the existing tuple. */
@@ -780,6 +791,11 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
 	WalReceiverConn	   *wrconn = NULL;
 	StringInfoData		cmd;
 
+	if (!IsUnderPostmaster)
+		ereport(FATAL,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("subscription commands are not supported by single-user servers")));
+
 	/*
 	 * Lock pg_subscription with AccessExclusiveLock to ensure
 	 * that the launcher doesn't restart new worker during dropping
#5Michael Paquier
michael.paquier@gmail.com
In reply to: Dilip Kumar (#4)
Re: Error while creating subscription when server is running in single user mode

On Wed, May 31, 2017 at 10:49 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, May 31, 2017 at 2:20 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Yeah, see 0e0f43d6 for example. A simple fix is to look at
IsUnderPostmaster when creating, altering or dropping a subscription
in subscriptioncmds.c.

Yeah, below patch fixes that.

Thanks, this looks correct to me at quick glance.

+    if (!IsUnderPostmaster)
+        ereport(FATAL,
+                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+            errmsg("subscription commands are not supported by
single-user servers")));
The messages could be more detailed, like directly the operation of
CREATE/ALTER/DROP SUBCRIPTION in each error message. But that's a nit.
-- 
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Dilip Kumar
dilipbalaut@gmail.com
In reply to: Michael Paquier (#5)
1 attachment(s)
Re: Error while creating subscription when server is running in single user mode

On Thu, Jun 1, 2017 at 1:02 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Thanks, this looks correct to me at quick glance.

+    if (!IsUnderPostmaster)
+        ereport(FATAL,
+                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+            errmsg("subscription commands are not supported by
single-user servers")));
The messages could be more detailed, like directly the operation of
CREATE/ALTER/DROP SUBCRIPTION in each error message. But that's a nit.

Thanks for looking into it. Yeah, I think it's better to give
specific message instead of generic because we still support some of
the subscription commands even in single-user mode i.e ALTER
SUBSCRIPTION OWNER. My patch doesn't block this command and there is
no harm in supporting this in single-user mode but does this make any
sense? We may create some use case like creation subscription in
normal mode and then ALTER OWNER in single user mode but it makes
little sense to me.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

subscription_error_v1.patchapplication/octet-stream; name=subscription_error_v1.patchDownload
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 21ef15f..ff611fc 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -291,6 +291,11 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel)
 	bool		create_slot;
 	List	   *publications;
 
+	if (!IsUnderPostmaster)
+		ereport(FATAL,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("CREATE SUBSCRIPTION is not supported by single-user servers")));
+
 	/*
 	 * Parse and check options.
 	 * Connection and publication should not be specified here.
@@ -583,6 +588,12 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
 	bool		update_tuple = false;
 	Subscription   *sub;
 
+	if (!IsUnderPostmaster)
+		ereport(FATAL,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("ALTER SUBSCRIPTION is not supported by single-user servers")));
+
+
 	rel = heap_open(SubscriptionRelationId, RowExclusiveLock);
 
 	/* Fetch the existing tuple. */
@@ -780,6 +791,11 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
 	WalReceiverConn	   *wrconn = NULL;
 	StringInfoData		cmd;
 
+	if (!IsUnderPostmaster)
+		ereport(FATAL,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("DROP SUBSCRIPTION is not supported by single-user servers")));
+
 	/*
 	 * Lock pg_subscription with AccessExclusiveLock to ensure
 	 * that the launcher doesn't restart new worker during dropping
#7Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Dilip Kumar (#6)
Re: Error while creating subscription when server is running in single user mode

On 6/1/17 04:49, Dilip Kumar wrote:

On Thu, Jun 1, 2017 at 1:02 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Thanks, this looks correct to me at quick glance.

+    if (!IsUnderPostmaster)
+        ereport(FATAL,
+                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+            errmsg("subscription commands are not supported by
single-user servers")));
The messages could be more detailed, like directly the operation of
CREATE/ALTER/DROP SUBCRIPTION in each error message. But that's a nit.

Thanks for looking into it. Yeah, I think it's better to give
specific message instead of generic because we still support some of
the subscription commands even in single-user mode i.e ALTER
SUBSCRIPTION OWNER. My patch doesn't block this command and there is
no harm in supporting this in single-user mode but does this make any
sense? We may create some use case like creation subscription in
normal mode and then ALTER OWNER in single user mode but it makes
little sense to me.

We should look at what the underlying problem is before we prohibit
anything at a high level.

When I try it, I get a

TRAP: FailedAssertion("!(event->fd != (-1))", File: "latch.c", Line: 861)

which might indicate that there is a more general problem with latch use
in single-user mode.

If I remove that assertion, things work fine after that. The originally
reported error "epoll_ctl() failed: Bad file descriptor" might indicate
that there is platform-dependent behavior.

I think the general problem is that the latch code that checks for
postmaster death does not handle single-user mode well.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#7)
Re: Error while creating subscription when server is running in single user mode

On 2017-06-01 21:42:41 -0400, Peter Eisentraut wrote:

We should look at what the underlying problem is before we prohibit
anything at a high level.

I'm not sure there's any underlying issue here, except being in single
user mode.

When I try it, I get a

TRAP: FailedAssertion("!(event->fd != (-1))", File: "latch.c", Line: 861)

which might indicate that there is a more general problem with latch use
in single-user mode.

That just means that the latch isn't initialized. Which makes:

If I remove that assertion, things work fine after that. The originally
reported error "epoll_ctl() failed: Bad file descriptor" might indicate
that there is platform-dependent behavior.

quite unsurprising. I'm not sure how this hints at platform dependent
behaviour?

libpqrcv_connect() uses MyProc->procLatch, which doesn't exist/isn't
initialized in single user mode. I'm very unclear why that code uses
MyProc->procLatch rather than MyLatch, but that'd not change anything -
the tablesync stuff etc would still not work.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Andres Freund (#8)
Re: Error while creating subscription when server is running in single user mode

On 6/1/17 21:55, Andres Freund wrote:

On 2017-06-01 21:42:41 -0400, Peter Eisentraut wrote:

We should look at what the underlying problem is before we prohibit
anything at a high level.

I'm not sure there's any underlying issue here, except being in single
user mode.

My point is that we shouldn't be putting checks into DDL commands about
single-user mode if the actual cause of the issue is in a lower-level
system. Not all uses of a particular DDL command necessary use a latch,
for example. Also, there could be other things that hit a latch that
are reachable in single-user mode that we haven't found yet.

So I think the check should either go somewhere in the latch code, or
possibly in the libpqwalreceiver code. Or we make the latch code work
so that the check-for-postmaster-death code becomes a noop in
single-user mode. Suggestions?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#9)
Re: Error while creating subscription when server is running in single user mode

On 2017-06-02 15:00:21 -0400, Peter Eisentraut wrote:

On 6/1/17 21:55, Andres Freund wrote:

On 2017-06-01 21:42:41 -0400, Peter Eisentraut wrote:

We should look at what the underlying problem is before we prohibit
anything at a high level.

I'm not sure there's any underlying issue here, except being in single
user mode.

My point is that we shouldn't be putting checks into DDL commands about
single-user mode if the actual cause of the issue is in a lower-level
system.

But it's not really.

Not all uses of a particular DDL command necessary use a latch,
for example. Also, there could be other things that hit a latch that
are reachable in single-user mode that we haven't found yet.

Latches work in single user mode, it's just that the new code for some
reason uses uninitialized memory as the latch. As I pointed out above,
the new code really should just use MyLatch instead of
MyProc->procLatch.

So I think the check should either go somewhere in the latch code, or
possibly in the libpqwalreceiver code. Or we make the latch code work
so that the check-for-postmaster-death code becomes a noop in
single-user mode. Suggestions?

I don't think the postmaster death code is really the issue here. Nor
is libpqwalreceiver really the issue. We can put ERRORs in a bunch of
unrelated subsystems, sure, but that doesn't really solve the issue that
logical rep pretty essentially requires multiple processes. We've
prevented parallelism from being used in general (cf. standard_planner),
we've not put checks in all the subsystems it uses.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#9)
Re: Error while creating subscription when server is running in single user mode

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

My point is that we shouldn't be putting checks into DDL commands about
single-user mode if the actual cause of the issue is in a lower-level
system. Not all uses of a particular DDL command necessary use a latch,
for example. Also, there could be other things that hit a latch that
are reachable in single-user mode that we haven't found yet.

So I think the check should either go somewhere in the latch code, or
possibly in the libpqwalreceiver code. Or we make the latch code work
so that the check-for-postmaster-death code becomes a noop in
single-user mode. Suggestions?

It's certainly plausible that we could have the latch code just ignore
WL_POSTMASTER_DEATH if not IsUnderPostmaster. I think that the original
reasoning for not doing that was that the calling code should know which
environment it's in, and not pass an unimplementable wait-exit reason;
so silently ignoring the bit could mask a bug. Perhaps that argument is
no longer attractive. Alternatively, we could fix the relevant call sites
to do "(IsUnderPostmaster ? WL_POSTMASTER_DEATH : 0)", and keep the strict
behavior for the majority of call sites.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Tom Lane (#11)
1 attachment(s)
Re: Error while creating subscription when server is running in single user mode

On 6/2/17 15:41, Tom Lane wrote:

It's certainly plausible that we could have the latch code just ignore
WL_POSTMASTER_DEATH if not IsUnderPostmaster. I think that the original
reasoning for not doing that was that the calling code should know which
environment it's in, and not pass an unimplementable wait-exit reason;
so silently ignoring the bit could mask a bug. Perhaps that argument is
no longer attractive. Alternatively, we could fix the relevant call sites
to do "(IsUnderPostmaster ? WL_POSTMASTER_DEATH : 0)", and keep the strict
behavior for the majority of call sites.

There are a lot of those call sites. (And a lot of duplicate code for
what to do if postmaster death actually happens.) I doubt we want to
check them all.

The attached patch fixes the reported issue for me.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Ignore-WL_POSTMASTER_DEATH-latch-event-in-single-use.patchtext/plain; charset=UTF-8; name=0001-Ignore-WL_POSTMASTER_DEATH-latch-event-in-single-use.patch; x-mac-creator=0; x-mac-type=0Download
From 8f0e1c844aabd4a0e3c245180d9019cbf65b1601 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Fri, 2 Jun 2017 23:02:13 -0400
Subject: [PATCH] Ignore WL_POSTMASTER_DEATH latch event in single user mode

Otherwise code that uses this will abort with an assertion failure,
because postmaster_alive_fds are not initialized.

Reported-by: tushar <tushar.ahuja@enterprisedb.com>
---
 src/backend/storage/ipc/latch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 53e6bf2477..55959de91f 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -370,7 +370,7 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 		AddWaitEventToSet(set, WL_LATCH_SET, PGINVALID_SOCKET,
 						  (Latch *) latch, NULL);
 
-	if (wakeEvents & WL_POSTMASTER_DEATH)
+	if (wakeEvents & WL_POSTMASTER_DEATH && IsUnderPostmaster)
 		AddWaitEventToSet(set, WL_POSTMASTER_DEATH, PGINVALID_SOCKET,
 						  NULL, NULL);
 
-- 
2.13.0

#13Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Andres Freund (#10)
Re: Error while creating subscription when server is running in single user mode

On 6/2/17 15:24, Andres Freund wrote:

but that doesn't really solve the issue that
logical rep pretty essentially requires multiple processes.

But it may be sensible to execute certain DDL commands for repair, which
is why I'm arguing for a finer-grained approach than just prohibiting
everything.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Noah Misch
noah@leadboat.com
In reply to: Peter Eisentraut (#12)
Re: Error while creating subscription when server is running in single user mode

On Fri, Jun 02, 2017 at 11:06:52PM -0400, Peter Eisentraut wrote:

On 6/2/17 15:41, Tom Lane wrote:

It's certainly plausible that we could have the latch code just ignore
WL_POSTMASTER_DEATH if not IsUnderPostmaster. I think that the original
reasoning for not doing that was that the calling code should know which
environment it's in, and not pass an unimplementable wait-exit reason;
so silently ignoring the bit could mask a bug. Perhaps that argument is
no longer attractive. Alternatively, we could fix the relevant call sites
to do "(IsUnderPostmaster ? WL_POSTMASTER_DEATH : 0)", and keep the strict
behavior for the majority of call sites.

There are a lot of those call sites. (And a lot of duplicate code for
what to do if postmaster death actually happens.) I doubt we want to
check them all.

[Action required within three days. This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item. Peter,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1]/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.

[1]: /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Peter Eisentraut (#12)
Re: Error while creating subscription when server is running in single user mode

On 6/2/17 23:06, Peter Eisentraut wrote:

On 6/2/17 15:41, Tom Lane wrote:

It's certainly plausible that we could have the latch code just ignore
WL_POSTMASTER_DEATH if not IsUnderPostmaster. I think that the original
reasoning for not doing that was that the calling code should know which
environment it's in, and not pass an unimplementable wait-exit reason;
so silently ignoring the bit could mask a bug. Perhaps that argument is
no longer attractive. Alternatively, we could fix the relevant call sites
to do "(IsUnderPostmaster ? WL_POSTMASTER_DEATH : 0)", and keep the strict
behavior for the majority of call sites.

There are a lot of those call sites. (And a lot of duplicate code for
what to do if postmaster death actually happens.) I doubt we want to
check them all.

The attached patch fixes the reported issue for me.

committed

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#10)
Re: Error while creating subscription when server is running in single user mode

On Fri, Jun 2, 2017 at 3:24 PM, Andres Freund <andres@anarazel.de> wrote:

Latches work in single user mode, it's just that the new code for some
reason uses uninitialized memory as the latch. As I pointed out above,
the new code really should just use MyLatch instead of
MyProc->procLatch.

We seem to have accumulated quite a few instance of that.

[rhaas pgsql]$ git grep MyLatch | wc -l
116
[rhaas pgsql]$ git grep 'MyProc->procLatch' | wc -l
33

Most of the offenders are in src/backend/replication, but there are
some that are related to parallelism as well (bgworker.c, pqmq.c,
parallel.c, condition_variable.c). Maybe we (you?) should just go and
change them all. I don't think using MyLatch instead of
MyProc->procLatch has become automatic for everyone yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#16)
Re: Error while creating subscription when server is running in single user mode

On 2017-06-06 15:48:42 -0400, Robert Haas wrote:

On Fri, Jun 2, 2017 at 3:24 PM, Andres Freund <andres@anarazel.de> wrote:

Latches work in single user mode, it's just that the new code for some
reason uses uninitialized memory as the latch. As I pointed out above,
the new code really should just use MyLatch instead of
MyProc->procLatch.

FWIW, I'd misremembered some code here, and we actually reach the
function initializing the shared latch, even in single user mode.

We seem to have accumulated quite a few instance of that.

[rhaas pgsql]$ git grep MyLatch | wc -l
116
[rhaas pgsql]$ git grep 'MyProc->procLatch' | wc -l
33

Most of the offenders are in src/backend/replication, but there are
some that are related to parallelism as well (bgworker.c, pqmq.c,
parallel.c, condition_variable.c). Maybe we (you?) should just go and
change them all. I don't think using MyLatch instead of
MyProc->procLatch has become automatic for everyone yet.

Nevertheless this should be changed. Will do.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#17)
1 attachment(s)
Re: Error while creating subscription when server is running in single user mode

On 2017-06-06 12:53:21 -0700, Andres Freund wrote:

On 2017-06-06 15:48:42 -0400, Robert Haas wrote:

On Fri, Jun 2, 2017 at 3:24 PM, Andres Freund <andres@anarazel.de> wrote:

Latches work in single user mode, it's just that the new code for some
reason uses uninitialized memory as the latch. As I pointed out above,
the new code really should just use MyLatch instead of
MyProc->procLatch.

FWIW, I'd misremembered some code here, and we actually reach the
function initializing the shared latch, even in single user mode.

We seem to have accumulated quite a few instance of that.

[rhaas pgsql]$ git grep MyLatch | wc -l
116
[rhaas pgsql]$ git grep 'MyProc->procLatch' | wc -l
33

Most of the offenders are in src/backend/replication, but there are
some that are related to parallelism as well (bgworker.c, pqmq.c,
parallel.c, condition_variable.c). Maybe we (you?) should just go and
change them all. I don't think using MyLatch instead of
MyProc->procLatch has become automatic for everyone yet.

Nevertheless this should be changed. Will do.

Here's the patch for that, also addressing some issues I found while
updating those callsites (separate thread started, too).

- Andres

Attachments:

0001-Clean-up-latch-related-code.patchtext/x-patch; charset=us-asciiDownload
From 9206ced1dc05d3a9cc99faafa22d5d8b16d998d1 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 6 Jun 2017 16:13:00 -0700
Subject: [PATCH] Clean up latch related code.

The larger part of this patch replaces usages of MyProc->procLatch
with MyLatch.  The latter works even early during backend startup,
where MyProc->procLatch doesn't yet.  While the affected code
shouldn't run in cases where it's not initialized, it might get copied
into places where it might.  Using MyLatch is simpler and a bit faster
to boot, so there's little point to stick with the previous coding.

While doing so I noticed some weaknesses around newly introduced uses
of latches that could lead to missed events, and an omitted
CHECK_FOR_INTERRUPTS() call in worker_spi.

As all the actual bugs are in v10 code, there doesn't seem to be
sufficient reason to backpatch this.

Author: Andres Freund
Discussion:
    https://postgr.es/m/20170606195321.sjmenrfgl2nu6j63@alap3.anarazel.de
    https://postgr.es/m/20170606210405.sim3yl6vpudhmufo@alap3.anarazel.de
Backpatch: -
---
 src/backend/access/transam/parallel.c              |  4 +--
 src/backend/libpq/pqmq.c                           |  4 +--
 src/backend/postmaster/bgworker.c                  |  4 +--
 .../libpqwalreceiver/libpqwalreceiver.c            | 13 ++++----
 src/backend/replication/logical/launcher.c         | 35 +++++++++++++++-------
 src/backend/replication/logical/tablesync.c        | 12 ++++----
 src/backend/replication/logical/worker.c           | 10 +++++--
 src/backend/storage/lmgr/condition_variable.c      |  6 ++--
 src/test/modules/worker_spi/worker_spi.c           |  2 ++
 9 files changed, 56 insertions(+), 34 deletions(-)

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index cb22174270..16a0bee610 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -527,9 +527,9 @@ WaitForParallelWorkersToFinish(ParallelContext *pcxt)
 		if (!anyone_alive)
 			break;
 
-		WaitLatch(&MyProc->procLatch, WL_LATCH_SET, -1,
+		WaitLatch(MyLatch, WL_LATCH_SET, -1,
 				  WAIT_EVENT_PARALLEL_FINISH);
-		ResetLatch(&MyProc->procLatch);
+		ResetLatch(MyLatch);
 	}
 
 	if (pcxt->toc != NULL)
diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
index 96939327c3..8fbc03819d 100644
--- a/src/backend/libpq/pqmq.c
+++ b/src/backend/libpq/pqmq.c
@@ -172,9 +172,9 @@ mq_putmessage(char msgtype, const char *s, size_t len)
 		if (result != SHM_MQ_WOULD_BLOCK)
 			break;
 
-		WaitLatch(&MyProc->procLatch, WL_LATCH_SET, 0,
+		WaitLatch(MyLatch, WL_LATCH_SET, 0,
 				  WAIT_EVENT_MQ_PUT_MESSAGE);
-		ResetLatch(&MyProc->procLatch);
+		ResetLatch(MyLatch);
 		CHECK_FOR_INTERRUPTS();
 	}
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index c3454276bf..712d700481 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -1144,7 +1144,7 @@ WaitForBackgroundWorkerShutdown(BackgroundWorkerHandle *handle)
 		if (status == BGWH_STOPPED)
 			break;
 
-		rc = WaitLatch(&MyProc->procLatch,
+		rc = WaitLatch(MyLatch,
 					   WL_LATCH_SET | WL_POSTMASTER_DEATH, 0,
 					   WAIT_EVENT_BGWORKER_SHUTDOWN);
 
@@ -1154,7 +1154,7 @@ WaitForBackgroundWorkerShutdown(BackgroundWorkerHandle *handle)
 			break;
 		}
 
-		ResetLatch(&MyProc->procLatch);
+		ResetLatch(MyLatch);
 	}
 
 	return status;
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 726d1b5bd8..89c34b8225 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -176,7 +176,7 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 				   ? WL_SOCKET_READABLE
 				   : WL_SOCKET_WRITEABLE);
 
-		rc = WaitLatchOrSocket(&MyProc->procLatch,
+		rc = WaitLatchOrSocket(MyLatch,
 							   WL_POSTMASTER_DEATH |
 							   WL_LATCH_SET | io_flag,
 							   PQsocket(conn->streamConn),
@@ -190,7 +190,7 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 		/* Interrupted? */
 		if (rc & WL_LATCH_SET)
 		{
-			ResetLatch(&MyProc->procLatch);
+			ResetLatch(MyLatch);
 			CHECK_FOR_INTERRUPTS();
 		}
 
@@ -574,21 +574,22 @@ libpqrcv_PQexec(PGconn *streamConn, const char *query)
 			 * the signal arrives in the middle of establishment of
 			 * replication connection.
 			 */
-			ResetLatch(&MyProc->procLatch);
-			rc = WaitLatchOrSocket(&MyProc->procLatch,
+			rc = WaitLatchOrSocket(MyLatch,
 								   WL_POSTMASTER_DEATH | WL_SOCKET_READABLE |
 								   WL_LATCH_SET,
 								   PQsocket(streamConn),
 								   0,
 								   WAIT_EVENT_LIBPQWALRECEIVER);
+
+			/* Emergency bailout? */
 			if (rc & WL_POSTMASTER_DEATH)
 				exit(1);
 
-			/* interrupted */
+			/* Interrupted? */
 			if (rc & WL_LATCH_SET)
 			{
+				ResetLatch(MyLatch);
 				CHECK_FOR_INTERRUPTS();
-				continue;
 			}
 			if (PQconsumeInput(streamConn) == 0)
 				return NULL;	/* trouble */
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 5aaf24bfe4..5a3274b2c2 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -208,10 +208,15 @@ WaitForReplicationWorkerAttach(LogicalRepWorker *worker,
 					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
 					   1000L, WAIT_EVENT_BGWORKER_STARTUP);
 
+		/* emergency bailout if postmaster has died */
 		if (rc & WL_POSTMASTER_DEATH)
 			proc_exit(1);
 
-		ResetLatch(MyLatch);
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
 	}
 
 	return;
@@ -440,10 +445,8 @@ logicalrep_worker_stop(Oid subid, Oid relid)
 
 		LWLockRelease(LogicalRepWorkerLock);
 
-		CHECK_FOR_INTERRUPTS();
-
 		/* Wait for signal. */
-		rc = WaitLatch(&MyProc->procLatch,
+		rc = WaitLatch(MyLatch,
 					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
 					   1000L, WAIT_EVENT_BGWORKER_STARTUP);
 
@@ -451,7 +454,11 @@ logicalrep_worker_stop(Oid subid, Oid relid)
 		if (rc & WL_POSTMASTER_DEATH)
 			proc_exit(1);
 
-		ResetLatch(&MyProc->procLatch);
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
 
 		/* Check worker status. */
 		LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
@@ -492,7 +499,7 @@ logicalrep_worker_stop(Oid subid, Oid relid)
 		CHECK_FOR_INTERRUPTS();
 
 		/* Wait for more work. */
-		rc = WaitLatch(&MyProc->procLatch,
+		rc = WaitLatch(MyLatch,
 					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
 					   1000L, WAIT_EVENT_BGWORKER_SHUTDOWN);
 
@@ -500,7 +507,11 @@ logicalrep_worker_stop(Oid subid, Oid relid)
 		if (rc & WL_POSTMASTER_DEATH)
 			proc_exit(1);
 
-		ResetLatch(&MyProc->procLatch);
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
 	}
 }
 
@@ -876,7 +887,7 @@ ApplyLauncherMain(Datum main_arg)
 		}
 
 		/* Wait for more work. */
-		rc = WaitLatch(&MyProc->procLatch,
+		rc = WaitLatch(MyLatch,
 					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
 					   wait_time,
 					   WAIT_EVENT_LOGICAL_LAUNCHER_MAIN);
@@ -885,13 +896,17 @@ ApplyLauncherMain(Datum main_arg)
 		if (rc & WL_POSTMASTER_DEATH)
 			proc_exit(1);
 
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
 		if (got_SIGHUP)
 		{
 			got_SIGHUP = false;
 			ProcessConfigFile(PGC_SIGHUP);
 		}
-
-		ResetLatch(&MyProc->procLatch);
 	}
 
 	LogicalRepCtx->launcher_pid = 0;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ed66602935..6e55d2d606 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -191,7 +191,7 @@ wait_for_relation_state_change(Oid relid, char expected_state)
 		if (!worker)
 			return false;
 
-		rc = WaitLatch(&MyProc->procLatch,
+		rc = WaitLatch(MyLatch,
 					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
 					   1000L, WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE);
 
@@ -199,7 +199,7 @@ wait_for_relation_state_change(Oid relid, char expected_state)
 		if (rc & WL_POSTMASTER_DEATH)
 			proc_exit(1);
 
-		ResetLatch(&MyProc->procLatch);
+		ResetLatch(MyLatch);
 	}
 
 	return false;
@@ -236,7 +236,7 @@ wait_for_worker_state_change(char expected_state)
 		if (MyLogicalRepWorker->relstate == expected_state)
 			return true;
 
-		rc = WaitLatch(&MyProc->procLatch,
+		rc = WaitLatch(MyLatch,
 					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
 					   1000L, WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE);
 
@@ -244,7 +244,7 @@ wait_for_worker_state_change(char expected_state)
 		if (rc & WL_POSTMASTER_DEATH)
 			proc_exit(1);
 
-		ResetLatch(&MyProc->procLatch);
+		ResetLatch(MyLatch);
 	}
 
 	return false;
@@ -604,7 +604,7 @@ copy_read_data(void *outbuf, int minread, int maxread)
 		/*
 		 * Wait for more data or latch.
 		 */
-		rc = WaitLatchOrSocket(&MyProc->procLatch,
+		rc = WaitLatchOrSocket(MyLatch,
 							   WL_SOCKET_READABLE | WL_LATCH_SET |
 							   WL_TIMEOUT | WL_POSTMASTER_DEATH,
 							   fd, 1000L, WAIT_EVENT_LOGICAL_SYNC_DATA);
@@ -613,7 +613,7 @@ copy_read_data(void *outbuf, int minread, int maxread)
 		if (rc & WL_POSTMASTER_DEATH)
 			proc_exit(1);
 
-		ResetLatch(&MyProc->procLatch);
+		ResetLatch(MyLatch);
 	}
 
 	return bytesread;
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 51a64487cd..999d627c87 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1146,7 +1146,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 		/*
 		 * Wait for more data or latch.
 		 */
-		rc = WaitLatchOrSocket(&MyProc->procLatch,
+		rc = WaitLatchOrSocket(MyLatch,
 							   WL_SOCKET_READABLE | WL_LATCH_SET |
 							   WL_TIMEOUT | WL_POSTMASTER_DEATH,
 							   fd, NAPTIME_PER_CYCLE,
@@ -1156,6 +1156,12 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 		if (rc & WL_POSTMASTER_DEATH)
 			proc_exit(1);
 
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
 		if (got_SIGHUP)
 		{
 			got_SIGHUP = false;
@@ -1209,8 +1215,6 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 
 			send_feedback(last_received, requestReply, requestReply);
 		}
-
-		ResetLatch(&MyProc->procLatch);
 	}
 }
 
diff --git a/src/backend/storage/lmgr/condition_variable.c b/src/backend/storage/lmgr/condition_variable.c
index 5afb21121b..b4b7d28dd5 100644
--- a/src/backend/storage/lmgr/condition_variable.c
+++ b/src/backend/storage/lmgr/condition_variable.c
@@ -68,14 +68,14 @@ ConditionVariablePrepareToSleep(ConditionVariable *cv)
 	{
 		cv_wait_event_set = CreateWaitEventSet(TopMemoryContext, 1);
 		AddWaitEventToSet(cv_wait_event_set, WL_LATCH_SET, PGINVALID_SOCKET,
-						  &MyProc->procLatch, NULL);
+						  MyLatch, NULL);
 	}
 
 	/*
 	 * Reset my latch before adding myself to the queue and before entering
 	 * the caller's predicate loop.
 	 */
-	ResetLatch(&MyProc->procLatch);
+	ResetLatch(MyLatch);
 
 	/* Add myself to the wait queue. */
 	SpinLockAcquire(&cv->mutex);
@@ -135,7 +135,7 @@ ConditionVariableSleep(ConditionVariable *cv, uint32 wait_event_info)
 		WaitEventSetWait(cv_wait_event_set, -1, &event, 1, wait_event_info);
 
 		/* Reset latch before testing whether we can return. */
-		ResetLatch(&MyProc->procLatch);
+		ResetLatch(MyLatch);
 
 		/*
 		 * If this process has been taken out of the wait list, then we know
diff --git a/src/test/modules/worker_spi/worker_spi.c b/src/test/modules/worker_spi/worker_spi.c
index 9abfc714a9..553baf0045 100644
--- a/src/test/modules/worker_spi/worker_spi.c
+++ b/src/test/modules/worker_spi/worker_spi.c
@@ -235,6 +235,8 @@ worker_spi_main(Datum main_arg)
 		if (rc & WL_POSTMASTER_DEATH)
 			proc_exit(1);
 
+		CHECK_FOR_INTERRUPTS();
+
 		/*
 		 * In case of a SIGHUP, just reload the configuration.
 		 */
-- 
2.12.0.264.gd6db3f2165.dirty