Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

Started by Bharath Rupireddyover 5 years ago29 messages
#1Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
1 attachment(s)

Hi,

I found an issue while executing a backup use case(please see [1]CREATE TABLE t1 (id SERIAL); INSERT INTO t1 (id) SELECT * FROM generate_series(1, 20000000); SELECT * FROM pg_start_backup('label', false, false); /*JIT is enabled in my session and it is being picked by below query*/ EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT COUNT(*) FROM t1; SELECT * FROM pg_stop_backup(false, true); for
queries) on postgres version 12.

Firstly, pg_start_backup registers nonexclusive_base_backup_cleanup as
on_shmem_exit call back which will
add this function to the before_shmem_exit_list array which is
supposed to be removed on pg_stop_backup
so that we can do the pending cleanup and issue a warning for each
pg_start_backup for which we did not call
the pg_stop backup. Now, I executed a query for which JIT is picked
up, then the the llvm compiler inserts it's
own exit callback i.e. llvm_shutdown at the end of
before_shmem_exit_list array. Now, I executed pg_stop_backup
and call to cancel_before_shmem_exit() is made with the expectation
that the nonexclusive_base_backup_cleanup
callback is removed from before_shmem_exit_list array.

Since the cancel_before_shmem_exit() only checks the last entry in the
before_shmem_exit_list array, which is
llvm compiler's exit callback, so we exit the
cancel_before_shmem_exit() without removing the intended
nonexclusive_base_backup_cleanup callback which remains still the
before_shmem_exit_list and gets executed
during the session exit throwing a warning "aborting backup due to
backend exiting before pg_stop_backup was called",
which is unintended.

Attached is the patch that fixes the above problem by making
cancel_before_shmem_exit() to look for the
given function(and for given args) in the entire
before_shmem_exit_list array, not just the last entry, starting
from the last entry.

Request the community take this patch for review for v12.

Having said that, abovementioned problem for backup use case does not
occur for v13 and latest versions of
postgres (please below description[2]for v13 and latest versions, start_backup first registers do_pg_abort_backup, and then pg_start_backup_callback, performs startup backup operations and unregisters only pg_start_backup_callback from before_shmem_exit_list, retaining do_pg_abort_backup still in the list, which is to be called on session's exit JIT compiler inserts it's own exit call back at the end of before_shmem_exit_list array. stop_backup registers pg_stop_backup_callback, performs stop operations, unregisters pg_stop_backup_callback from before_shmem_exit_list, and sets the sessionBackupState = SESSION_BACKUP_NONE, note that the callback do_pg_abort_backup registered by start_backup command still exists in the before_shmem_exit_list and will not be removed by stop_backup. On session exit, do_pg_abort_backup gets called but returns without performing any operations(not even throwing a warning), by checking sessionBackupState which was set to SESSION_BACKUP_NONE by stop_backup.), but these kinds of issues can
come, if the cancel_before_shmem_exit()
is left to just look at the last array entry while removing a
registered callback.

There's also a comment in cancel_before_shmem_exit() function
description "For simplicity, only the latest entry
can be removed. (We could work harder but there is no need for current uses.)

Since we start to find use cases now, there is a need to enhance
cancel_before_shmem_exit(), so I also propose
to have the same attached patch for v13 and latest versions.

Thoughts?

[1]: CREATE TABLE t1 (id SERIAL); INSERT INTO t1 (id) SELECT * FROM generate_series(1, 20000000); SELECT * FROM pg_start_backup('label', false, false); /*JIT is enabled in my session and it is being picked by below query*/ EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT COUNT(*) FROM t1; SELECT * FROM pg_stop_backup(false, true);
CREATE TABLE t1 (id SERIAL);
INSERT INTO t1 (id) SELECT * FROM generate_series(1, 20000000);
SELECT * FROM pg_start_backup('label', false, false);
/*JIT is enabled in my session and it is being picked by below query*/
EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT COUNT(*) FROM t1;
SELECT * FROM pg_stop_backup(false, true);

[2]: for v13 and latest versions, start_backup first registers do_pg_abort_backup, and then pg_start_backup_callback, performs startup backup operations and unregisters only pg_start_backup_callback from before_shmem_exit_list, retaining do_pg_abort_backup still in the list, which is to be called on session's exit JIT compiler inserts it's own exit call back at the end of before_shmem_exit_list array. stop_backup registers pg_stop_backup_callback, performs stop operations, unregisters pg_stop_backup_callback from before_shmem_exit_list, and sets the sessionBackupState = SESSION_BACKUP_NONE, note that the callback do_pg_abort_backup registered by start_backup command still exists in the before_shmem_exit_list and will not be removed by stop_backup. On session exit, do_pg_abort_backup gets called but returns without performing any operations(not even throwing a warning), by checking sessionBackupState which was set to SESSION_BACKUP_NONE by stop_backup.
for v13 and latest versions, start_backup first registers do_pg_abort_backup,
and then pg_start_backup_callback, performs startup backup operations
and unregisters only pg_start_backup_callback from before_shmem_exit_list,
retaining do_pg_abort_backup still in the list, which is to be called
on session's exit
JIT compiler inserts it's own exit call back at the end of
before_shmem_exit_list array.
stop_backup registers pg_stop_backup_callback, performs stop operations,
unregisters pg_stop_backup_callback from before_shmem_exit_list, and sets
the sessionBackupState = SESSION_BACKUP_NONE, note that the callback
do_pg_abort_backup registered by start_backup command still exists in the
before_shmem_exit_list and will not be removed by stop_backup. On session exit,
do_pg_abort_backup gets called but returns without performing any operations(not
even throwing a warning), by checking sessionBackupState which was set to
SESSION_BACKUP_NONE by stop_backup.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v1-0001-Modify-cancel_before_shmem_exit-to-search-all-reg.patchapplication/octet-stream; name=v1-0001-Modify-cancel_before_shmem_exit-to-search-all-reg.patchDownload
From ef10416c6cdce0b2fbd7a0539c864655b18f0036 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Fri, 3 Jul 2020 16:19:06 +0530
Subject: [PATCH v1] Modify cancel_before_shmem_exit to search all registered
 exit callbacks

cancel_before_shmem_exit: currently looks for only one entry
pointed to by before_shmem_exit_index i.e last entry in
before_shmem_exit_list array. This might have problems.
For instance, (happens only postgres version 12)
pg_start_backup  makes nonexclusive_base_backup_cleanup callback
entry in the list, executes a jit enabled query with jit compiler
registering it's own exit call back llvm_shutdown, and then user
issues pg_stop_backup which tries to unregister the exit callback
i.e. nonexclusive_base_backup_cleanup using
cancel_before_shmem_exit, looks at only the last element i.e.
llvm_shutdown in the list and exits, causing intended one
remaining in the list which gets executed at the proc exit of
the session causing some of the callbacks not called at the
intended moment. Hence, cancel_before_shmem_exit is imrpoved
to look for the requested entry in the entire list/array not
only the latest/last entry pointed to by before_shmem_exit_index
and remove it.
---
 src/backend/storage/ipc/ipc.c | 61 ++++++++++++++++++++++++++++++-----
 1 file changed, 53 insertions(+), 8 deletions(-)

diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index bdbc2c3ac4..1475cf49df 100644
--- a/src/backend/storage/ipc/ipc.c
+++ b/src/backend/storage/ipc/ipc.c
@@ -381,19 +381,64 @@ on_shmem_exit(pg_on_exit_callback function, Datum arg)
  *		cancel_before_shmem_exit
  *
  *		this function removes a previously-registered before_shmem_exit
- *		callback.  For simplicity, only the latest entry can be
- *		removed.  (We could work harder but there is no need for
- *		current uses.)
+ *		callback.  Look through all entries in before_shmem_exit_list
+ *		starting from the latest entry i.e. entry at
+ *		before_shmem_exit_index - 1, try to find the entry, if found
+ *      adjust the elements next to the found entry and decrement
+ *  	the before_shmem_exit_index.
  * ----------------------------------------------------------------
  */
 void
 cancel_before_shmem_exit(pg_on_exit_callback function, Datum arg)
 {
-	if (before_shmem_exit_index > 0 &&
-		before_shmem_exit_list[before_shmem_exit_index - 1].function
-		== function &&
-		before_shmem_exit_list[before_shmem_exit_index - 1].arg == arg)
-		--before_shmem_exit_index;
+	int idx;
+
+	/* Start from the last entry. */
+	idx = before_shmem_exit_index - 1;
+
+	while (idx > 0)
+	{
+		if (before_shmem_exit_list[idx].function == function &&
+			before_shmem_exit_list[idx].arg == arg)
+		{
+			/* Entry found. */
+			if (idx == before_shmem_exit_index - 1)
+			{
+				/*
+				 * Found entry is last entry in the array,
+				 * so, just reinitialize function and arg,
+				 * adjust the index and exit.
+				 */
+				before_shmem_exit_list[idx].function = NULL;
+				before_shmem_exit_list[idx].arg = (Datum) 0;
+				before_shmem_exit_index--;
+				break;
+			}
+			else
+			{
+				/*
+				 * Found entry is not the last entry in the array,
+				 * it is somewhere else in the array. Move the entries
+				 * next to the found entry location
+				 */
+				int idx1 = idx;
+
+				while (idx1 < before_shmem_exit_index - 1)
+				{
+					before_shmem_exit_list[idx1].function = before_shmem_exit_list[idx1+1].function;
+					before_shmem_exit_list[idx1].arg = before_shmem_exit_list[idx1+1].arg;
+					idx1++;
+				}
+
+				/* Reinitialize the last entry, for further usage. */
+				before_shmem_exit_list[idx1].function = NULL;
+				before_shmem_exit_list[idx1].arg = (Datum) 0;
+				before_shmem_exit_index--;
+				break;
+			}
+		}
+		idx--;
+	}
 }
 
 /* ----------------------------------------------------------------
-- 
2.25.1

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bharath Rupireddy (#1)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:

Firstly, pg_start_backup registers nonexclusive_base_backup_cleanup as
on_shmem_exit call back which will
add this function to the before_shmem_exit_list array which is
supposed to be removed on pg_stop_backup
so that we can do the pending cleanup and issue a warning for each
pg_start_backup for which we did not call
the pg_stop backup. Now, I executed a query for which JIT is picked
up, then the the llvm compiler inserts it's
own exit callback i.e. llvm_shutdown at the end of
before_shmem_exit_list array. Now, I executed pg_stop_backup
and call to cancel_before_shmem_exit() is made with the expectation
that the nonexclusive_base_backup_cleanup
callback is removed from before_shmem_exit_list array.

I'm of the opinion that the JIT code is abusing this mechanism, and the
right thing to do is fix that. The restriction you propose to remove is
not just there out of laziness, it's an expectation about what safe use of
this mechanism would involve. Un-ordered removal of callbacks seems
pretty risky: it would mean that whatever cleanup is needed is not going
to be done in LIFO order.

regards, tom lane

#3Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#2)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

Hi,

On 2020-07-07 09:44:41 -0400, Tom Lane wrote:

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:

Firstly, pg_start_backup registers nonexclusive_base_backup_cleanup as
on_shmem_exit call back which will
add this function to the before_shmem_exit_list array which is
supposed to be removed on pg_stop_backup
so that we can do the pending cleanup and issue a warning for each
pg_start_backup for which we did not call
the pg_stop backup. Now, I executed a query for which JIT is picked
up, then the the llvm compiler inserts it's
own exit callback i.e. llvm_shutdown at the end of
before_shmem_exit_list array. Now, I executed pg_stop_backup
and call to cancel_before_shmem_exit() is made with the expectation
that the nonexclusive_base_backup_cleanup
callback is removed from before_shmem_exit_list array.

I'm of the opinion that the JIT code is abusing this mechanism, and the
right thing to do is fix that.

What are you proposing? For now we could easily enough work around this
by just making it a on_proc_exit() callback, but that doesn't really
change the fundamental issue imo.

The restriction you propose to remove is not just there out of
laziness, it's an expectation about what safe use of this mechanism
would involve. Un-ordered removal of callbacks seems pretty risky: it
would mean that whatever cleanup is needed is not going to be done in
LIFO order.

Maybe I am confused, but isn't it <13's pg_start_backup() that's
violating the protocol much more clearly than the JIT code? Given that
it relies on there not being any callbacks registered between two SQL
function calls? I mean, what it does is basically:

1) before_shmem_exit(nonexclusive_base_backup_cleanup...
2) arbitrary code executed for a long time
3) cancel_before_shmem_exit(nonexclusive_base_backup_cleanup...

which pretty obviously can't at all deal with any other
before_shmem_exit callbacks being registered in 2). Won't this be a
problem for every other before_shmem_exit callback that we register
on-demand? Say Async_UnlistenOnExit, RemoveTempRelationsCallback?

Greetings,

Andres Freund

#4Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Andres Freund (#3)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On Tue, Jul 7, 2020 at 10:24 PM Andres Freund <andres@anarazel.de> wrote:

On 2020-07-07 09:44:41 -0400, Tom Lane wrote:

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:

Firstly, pg_start_backup registers nonexclusive_base_backup_cleanup as
on_shmem_exit call back which will
add this function to the before_shmem_exit_list array which is
supposed to be removed on pg_stop_backup
so that we can do the pending cleanup and issue a warning for each
pg_start_backup for which we did not call
the pg_stop backup. Now, I executed a query for which JIT is picked
up, then the the llvm compiler inserts it's
own exit callback i.e. llvm_shutdown at the end of
before_shmem_exit_list array. Now, I executed pg_stop_backup
and call to cancel_before_shmem_exit() is made with the expectation
that the nonexclusive_base_backup_cleanup
callback is removed from before_shmem_exit_list array.

I'm of the opinion that the JIT code is abusing this mechanism, and the
right thing to do is fix that.

What are you proposing? For now we could easily enough work around this
by just making it a on_proc_exit() callback, but that doesn't really
change the fundamental issue imo.

The restriction you propose to remove is not just there out of
laziness, it's an expectation about what safe use of this mechanism
would involve. Un-ordered removal of callbacks seems pretty risky: it
would mean that whatever cleanup is needed is not going to be done in
LIFO order.

I quickly searched(in HEAD) what all the callbacks are getting
registered to before_shmem_exit_list, with the intention to see if
they also call corresponding cancel_before_shmem_exit() after their
intended usage is done.

For few of the callbacks there is no cancel_before_shmem_exit(). This
seems expected; those callbacks ought to be executed before shmem
exit. These callbacks are(let say SET 1): ShutdownPostgres,
logicalrep_worker_onexit, llvm_shutdown, Async_UnlistenOnExit,
RemoveTempRelationsCallback, ShutdownAuxiliaryProcess,
do_pg_abort_backup in xlog.c (this callback exist only in v13 or
later), AtProcExit_Twophase.

Which means, once they are into the before_shmem_exit_list array, in
some order, they are never going to be removed from it as they don't
have corresponding cancel_before_shmem_exit() and the relative order
of execution remains the same.

And there are other callbacks that are getting registered to
before_shmem_exit_list array(let say SET 2): apw_detach_shmem,
_bt_end_vacuum_callback, pg_start_backup_callback,
pg_stop_backup_callback, createdb_failure_callback,
movedb_failure_callback, do_pg_abort_backup(in basebackup.c). They all
have corresponding cancel_before_shmem_exit() to unregister/remove the
callbacks from before_shmem_exit_list array.

I think the callbacks that have no cancel_before_shmem_exit()(SET 1)
may have to be executed in the LIFO order: it makes sense to execute
ShutdownPostgres at the end after let's say other callbacks in SET 1.

And the SET 2 callbacks have cancel_before_shmem_exit() with the only
intention that there's no need to call the callbacks on the
before_shmem_exit(), since they are not needed, and try to remove from
the before_shmem_exit_list array and may fail, if any other callback
gets registered in between.

If I'm not wrong with the above points, we must enhance
cancel_before_shmem_exit() or have cancel_before_shmem_exit_v2() (as
mentioned in my below response).

Maybe I am confused, but isn't it <13's pg_start_backup() that's
violating the protocol much more clearly than the JIT code? Given that
it relies on there not being any callbacks registered between two SQL
function calls? I mean, what it does is basically:

1) before_shmem_exit(nonexclusive_base_backup_cleanup...
2) arbitrary code executed for a long time
3) cancel_before_shmem_exit(nonexclusive_base_backup_cleanup...

which pretty obviously can't at all deal with any other
before_shmem_exit callbacks being registered in 2). Won't this be a
problem for every other before_shmem_exit callback that we register
on-demand? Say Async_UnlistenOnExit, RemoveTempRelationsCallback?

Yes, for versions <13's, clearly pg_start_backup causes the problem
and the issue can also be reproduced with Async_UnlistenOnExit,
RemoveTempRelationsCallback coming in between pg_start_backup and
pg_stop_backup.

We can have it fixed in a few ways: 1) enhance
cancel_before_shmem_exit() as attached in the original patch. 2) have
existing cancel_before_shmem_exit(), whenever called for
nonexclusive_base_backup_cleanup(), we can look for the entire array
instead of just the last entry. 3) have a separate function, say,
cancel_before_shmem_exit_v2(), that searches for the entire
before_shmem_exit_list array(the logic proposed in this patch) so that
it will not disturb the existing cancel_before_shmem_exit(). 4) or try
to have the pg_start_backup code that exists in after > 13 versions.

If okay to have cancel_before_shmem_exit_v2() for versions < 13's,
with the searching for the entire array instead of just the last
element to fix the abort issue, maybe we can have this function in
version 13 and latest as well(at least with disable mode, something
like #if 0 ... #endif), so that in case if any of similar issues arise
we could just quickly reuse.

If the before_shmem_exit_list array is to be used in LIFO order, do we
have some comment/usage guideline mentioned in the ipc.c/.h? I didn't
find one.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#5Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#3)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On Tue, Jul 7, 2020 at 12:55 PM Andres Freund <andres@anarazel.de> wrote:

What are you proposing? For now we could easily enough work around this
by just making it a on_proc_exit() callback, but that doesn't really
change the fundamental issue imo.

I think it would be more correct for it to be an on_proc_exit()
callback, because before_shmem_exit() callbacks can and do perform
actions which rely on an awful lot of the system being still in a
working state. RemoveTempRelationsCallback() is a good example: it
thinks it can start and end transactions and make a bunch of catalog
changes. I don't know that any of that could use JIT, but moving the
JIT cleanup to the on_shmem_exit() stage seems better. At that point,
there shouldn't be anybody doing anything that relies on being able to
perform logical changes to the database; we're just shutting down
low-level subsystems at that point, and thus presumably not doing
anything that could possibly need JIT.

But I also agree that what pg_start_backup() was doing before v13 was
wrong; that's why I committed
303640199d0436c5e7acdf50b837a027b5726594. The only reason I didn't
back-patch it is because the consequences are so minor I didn't think
it was worth worrying about. We could, though. I'd be somewhat
inclined to both do that and also change LLVM to use on_proc_exit() in
master, but I don't feel super-strongly about it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Robert Haas (#5)
4 attachment(s)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On Tue, Jul 21, 2020 at 1:17 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jul 7, 2020 at 12:55 PM Andres Freund <andres@anarazel.de> wrote:

What are you proposing? For now we could easily enough work around this
by just making it a on_proc_exit() callback, but that doesn't really
change the fundamental issue imo.

I think it would be more correct for it to be an on_proc_exit()
callback, because before_shmem_exit() callbacks can and do perform
actions which rely on an awful lot of the system being still in a
working state. RemoveTempRelationsCallback() is a good example: it
thinks it can start and end transactions and make a bunch of catalog
changes. I don't know that any of that could use JIT, but moving the
JIT cleanup to the on_shmem_exit() stage seems better. At that point,
there shouldn't be anybody doing anything that relies on being able to
perform logical changes to the database; we're just shutting down
low-level subsystems at that point, and thus presumably not doing
anything that could possibly need JIT.

I looked at what actually llvm_shutdown() does? It frees up JIT stacks,
also if exists perf related resource, using LLVMOrcDisposeInstance() and
LLVMOrcUnregisterPerf(), that were dynamically allocated in
llvm_session_initialize through a JIT library function
LLVMOrcCreateInstance() [1]https://llvm.org/doxygen/OrcCBindings_8cpp_source.html.

It looks like there is no problem in moving llvm_shutdown to either
on_shmem_exit() or on_proc_exit().

[1]: https://llvm.org/doxygen/OrcCBindings_8cpp_source.html

But I also agree that what pg_start_backup() was doing before v13 was
wrong; that's why I committed
303640199d0436c5e7acdf50b837a027b5726594. The only reason I didn't
back-patch it is because the consequences are so minor I didn't think
it was worth worrying about. We could, though. I'd be somewhat
inclined to both do that and also change LLVM to use on_proc_exit() in
master, but I don't feel super-strongly about it.

Patch: v1-0001-Move-llvm_shutdown-to-on_proc_exit-list-from-befo.patch
Moved llvm_shutdown to on_proc_exit() call back list. Request to consider
this change for master, if possible <=13 versions. Basic JIT use cases and
regression tests are working fine with the patch.

Patches: PG11-0001-Fix-minor-problems-with-non-exclusive-backup-clea.patch
and PG12-0001-Fix-minor-problems-with-non-exclusive-backup-cleanup.patch
Request to consider the commit
303640199d0436c5e7acdf50b837a027b5726594(above two patches are for this
commit) to versions < 13, to fix the abort issue. Please note that the
above two patches have no difference in the code, just I made it applicable
on PG11.

Patch: v1-0001-Modify-cancel_before_shmem_exit-comments.patch
This patch, modifies cancel_before_shmem_exit() function comment to reflect
the safe usage of before_shmem_exit_list callback mechanism and also
removes the point "For simplicity, only the latest entry can be
removed*********" as this gives a meaning that there is still scope for
improvement in cancel_before_shmem_exit() search mechanism.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v1-0001-Move-llvm_shutdown-to-on_proc_exit-list-from-befo.patchapplication/x-patch; name=v1-0001-Move-llvm_shutdown-to-on_proc_exit-list-from-befo.patchDownload
From 5fae4b3a046789fb7b54e689c979b01cb322aaf9 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Thu, 23 Jul 2020 17:56:21 +0530
Subject: [PATCH v1] Move llvm_shutdown to on_proc_exit list from
 before_shmem_exit.

llvm_shutdown frees up dynamically allocated memory for jit
compilers in a session/backend. Having this as on_proc_exit doesn't
cause any harm.
---
 src/backend/jit/llvm/llvmjit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index af8b34aaaf..43bed78a52 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -683,7 +683,7 @@ llvm_session_initialize(void)
 	}
 #endif
 
-	before_shmem_exit(llvm_shutdown, 0);
+	on_proc_exit(llvm_shutdown, 0);
 
 	llvm_session_initialized = true;
 
-- 
2.25.1

PG11-0001-Fix-minor-problems-with-non-exclusive-backup-clea.patchapplication/x-patch; name=PG11-0001-Fix-minor-problems-with-non-exclusive-backup-clea.patchDownload
From 3b0a048afd7ad6d8564a54d13397c72f6eadc5da Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Fri, 24 Jul 2020 08:55:52 +0530
Subject: [PATCH v5] Fix minor problems with non-exclusive backup cleanup.

The previous coding imagined that it could call before_shmem_exit()
when a non-exclusive backup began and then remove the previously-added
handler by calling cancel_before_shmem_exit() when that backup
ended. However, this only works provided that nothing else in the
system has registered a before_shmem_exit() hook in the interim,
because cancel_before_shmem_exit() is documented to remove a callback
only if it is the latest callback registered. It also only works
if nothing can ERROR out between the time that sessionBackupState
is reset and the time that cancel_before_shmem_exit(), which doesn't
seem to be strictly true.

To fix, leave the handler installed for the lifetime of the session,
arrange to install it just once, and teach it to quietly do nothing if
there isn't a non-exclusive backup in process.

This is a bug, but for now I'm not going to back-patch, because the
consequences are minor. It's possible to cause a spurious warning
to be generated, but that doesn't really matter. It's also possible
to trigger an assertion failure, but production builds shouldn't
have assertions enabled.
---
 src/backend/access/transam/xlog.c      | 32 +++++++++++++++++++++++---
 src/backend/access/transam/xlogfuncs.c | 17 ++------------
 src/backend/replication/basebackup.c   | 16 ++-----------
 src/include/access/xlog.h              |  3 ++-
 4 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 28292393d1..aad37d9509 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11329,23 +11329,30 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
  * system out of backup mode, thus making it a lot more safe to call from
  * an error handler.
  *
+ * The caller can pass 'arg' as 'true' or 'false' to control whether a warning
+ * is emitted.
+ *
  * NB: This is only for aborting a non-exclusive backup that doesn't write
  * backup_label. A backup started with pg_start_backup() needs to be finished
  * with pg_stop_backup().
+ *
+ * NB: This gets used as a before_shmem_exit handler, hence the odd-looking
+ * signature.
  */
 void
-do_pg_abort_backup(void)
+do_pg_abort_backup(int code, Datum arg)
 {
+	bool	emit_warning = DatumGetBool(arg);
+
 	/*
 	 * Quick exit if session is not keeping around a non-exclusive backup
 	 * already started.
 	 */
-	if (sessionBackupState == SESSION_BACKUP_NONE)
+	if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
 		return;
 
 	WALInsertLockAcquireExclusive();
 	Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
-	Assert(sessionBackupState == SESSION_BACKUP_NON_EXCLUSIVE);
 	XLogCtl->Insert.nonExclusiveBackups--;
 
 	if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
@@ -11354,6 +11361,25 @@ do_pg_abort_backup(void)
 		XLogCtl->Insert.forcePageWrites = false;
 	}
 	WALInsertLockRelease();
+
+	if (emit_warning)
+		ereport(WARNING,
+				(errmsg("aborting backup due to backend exiting before pg_stop_back up was called")));
+}
+
+/*
+ * Register a handler that will warn about unterminated backups at end of
+ * session, unless this has already been done.
+ */
+void
+register_persistent_abort_backup_handler(void)
+{
+	static bool already_done = false;
+
+	if (already_done)
+		return;
+	before_shmem_exit(do_pg_abort_backup, DatumGetBool(true));
+	already_done = true;
 }
 
 /*
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 9731742978..cc5e97c38b 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -42,18 +42,6 @@
 static StringInfo label_file;
 static StringInfo tblspc_map_file;
 
-/*
- * Called when the backend exits with a running non-exclusive base backup,
- * to clean up state.
- */
-static void
-nonexclusive_base_backup_cleanup(int code, Datum arg)
-{
-	do_pg_abort_backup();
-	ereport(WARNING,
-			(errmsg("aborting backup due to backend exiting before pg_stop_backup was called")));
-}
-
 /*
  * pg_start_backup: set up for taking an on-line backup dump
  *
@@ -101,10 +89,10 @@ pg_start_backup(PG_FUNCTION_ARGS)
 		tblspc_map_file = makeStringInfo();
 		MemoryContextSwitchTo(oldcontext);
 
+		register_persistent_abort_backup_handler();
+
 		startpoint = do_pg_start_backup(backupidstr, fast, NULL, label_file,
 										NULL, tblspc_map_file, false, true);
-
-		before_shmem_exit(nonexclusive_base_backup_cleanup, (Datum) 0);
 	}
 
 	PG_RETURN_LSN(startpoint);
@@ -246,7 +234,6 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 		 * and tablespace map so they can be written to disk by the caller.
 		 */
 		stoppoint = do_pg_stop_backup(label_file->data, waitforarchive, NULL);
-		cancel_before_shmem_exit(nonexclusive_base_backup_cleanup, (Datum) 0);
 
 		values[1] = CStringGetTextDatum(label_file->data);
 		values[2] = CStringGetTextDatum(tblspc_map_file->data);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 2eba0dc21a..3e53b3df6f 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -66,7 +66,6 @@ static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *sta
 			 bool sizeonly);
 static void send_int8_string(StringInfoData *buf, int64 intval);
 static void SendBackupHeader(List *tablespaces);
-static void base_backup_cleanup(int code, Datum arg);
 static void perform_base_backup(basebackup_options *opt);
 static void parse_basebackup_options(List *options, basebackup_options *opt);
 static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
@@ -231,17 +230,6 @@ static const struct exclude_list_item noChecksumFiles[] = {
 	{NULL, false}
 };
 
-
-/*
- * Called when ERROR or FATAL happens in perform_base_backup() after
- * we have started the backup - make sure we end it!
- */
-static void
-base_backup_cleanup(int code, Datum arg)
-{
-	do_pg_abort_backup();
-}
-
 /*
  * Actually do a base backup for the specified tablespaces.
  *
@@ -280,7 +268,7 @@ perform_base_backup(basebackup_options *opt)
 	 * do_pg_stop_backup() should be inside the error cleanup block!
 	 */
 
-	PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+	PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
 	{
 		ListCell   *lc;
 		tablespaceinfo *ti;
@@ -389,7 +377,7 @@ perform_base_backup(basebackup_options *opt)
 
 		endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+	PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
 
 
 	if (opt->includewal)
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 1d9c87cb82..e422a9dc8b 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -326,7 +326,8 @@ extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
 				   bool needtblspcmapfile);
 extern XLogRecPtr do_pg_stop_backup(char *labelfile, bool waitforarchive,
 				  TimeLineID *stoptli_p);
-extern void do_pg_abort_backup(void);
+extern void do_pg_abort_backup(int code, Datum arg);
+extern void register_persistent_abort_backup_handler(void);
 extern SessionBackupState get_backup_status(void);
 
 /* File path names (all relative to $PGDATA) */
-- 
2.25.1

PG12-0001-Fix-minor-problems-with-non-exclusive-backup-cleanup.patchapplication/x-patch; name=PG12-0001-Fix-minor-problems-with-non-exclusive-backup-cleanup.patchDownload
From 303640199d0436c5e7acdf50b837a027b5726594 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 19 Dec 2019 09:06:54 -0500
Subject: [PATCH] Fix minor problems with non-exclusive backup cleanup.

The previous coding imagined that it could call before_shmem_exit()
when a non-exclusive backup began and then remove the previously-added
handler by calling cancel_before_shmem_exit() when that backup
ended. However, this only works provided that nothing else in the
system has registered a before_shmem_exit() hook in the interim,
because cancel_before_shmem_exit() is documented to remove a callback
only if it is the latest callback registered. It also only works
if nothing can ERROR out between the time that sessionBackupState
is reset and the time that cancel_before_shmem_exit(), which doesn't
seem to be strictly true.

To fix, leave the handler installed for the lifetime of the session,
arrange to install it just once, and teach it to quietly do nothing if
there isn't a non-exclusive backup in process.

This is a bug, but for now I'm not going to back-patch, because the
consequences are minor. It's possible to cause a spurious warning
to be generated, but that doesn't really matter. It's also possible
to trigger an assertion failure, but production builds shouldn't
have assertions enabled.

Patch by me, reviewed by Kyotaro Horiguchi, Michael Paquier (who
preferred a different approach, but got outvoted), Fujii Masao,
and Tom Lane, and with comments by various others.

Discussion: http://postgr.es/m/CA+TgmobMjnyBfNhGTKQEDbqXYE3_rXWpc4CM63fhyerNCes3mA@mail.gmail.com
---
 src/backend/access/transam/xlog.c      | 32 +++++++++++++++++++++++---
 src/backend/access/transam/xlogfuncs.c | 17 ++------------
 src/backend/replication/basebackup.c   | 16 ++-----------
 src/include/access/xlog.h              |  3 ++-
 4 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 71b8389ba1..edee0c0f22 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11133,23 +11133,30 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
  * system out of backup mode, thus making it a lot more safe to call from
  * an error handler.
  *
+ * The caller can pass 'arg' as 'true' or 'false' to control whether a warning
+ * is emitted.
+ *
  * NB: This is only for aborting a non-exclusive backup that doesn't write
  * backup_label. A backup started with pg_start_backup() needs to be finished
  * with pg_stop_backup().
+ *
+ * NB: This gets used as a before_shmem_exit handler, hence the odd-looking
+ * signature.
  */
 void
-do_pg_abort_backup(void)
+do_pg_abort_backup(int code, Datum arg)
 {
+	bool	emit_warning = DatumGetBool(arg);
+
 	/*
 	 * Quick exit if session is not keeping around a non-exclusive backup
 	 * already started.
 	 */
-	if (sessionBackupState == SESSION_BACKUP_NONE)
+	if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
 		return;
 
 	WALInsertLockAcquireExclusive();
 	Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
-	Assert(sessionBackupState == SESSION_BACKUP_NON_EXCLUSIVE);
 	XLogCtl->Insert.nonExclusiveBackups--;
 
 	if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
@@ -11158,6 +11165,25 @@ do_pg_abort_backup(void)
 		XLogCtl->Insert.forcePageWrites = false;
 	}
 	WALInsertLockRelease();
+
+	if (emit_warning)
+		ereport(WARNING,
+				(errmsg("aborting backup due to backend exiting before pg_stop_back up was called")));
+}
+
+/*
+ * Register a handler that will warn about unterminated backups at end of
+ * session, unless this has already been done.
+ */
+void
+register_persistent_abort_backup_handler(void)
+{
+	static bool already_done = false;
+
+	if (already_done)
+		return;
+	before_shmem_exit(do_pg_abort_backup, DatumGetBool(true));
+	already_done = true;
 }
 
 /*
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 1fccf29a36..5fd12152b2 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -44,18 +44,6 @@
 static StringInfo label_file;
 static StringInfo tblspc_map_file;
 
-/*
- * Called when the backend exits with a running non-exclusive base backup,
- * to clean up state.
- */
-static void
-nonexclusive_base_backup_cleanup(int code, Datum arg)
-{
-	do_pg_abort_backup();
-	ereport(WARNING,
-			(errmsg("aborting backup due to backend exiting before pg_stop_backup was called")));
-}
-
 /*
  * pg_start_backup: set up for taking an on-line backup dump
  *
@@ -103,10 +91,10 @@ pg_start_backup(PG_FUNCTION_ARGS)
 		tblspc_map_file = makeStringInfo();
 		MemoryContextSwitchTo(oldcontext);
 
+		register_persistent_abort_backup_handler();
+
 		startpoint = do_pg_start_backup(backupidstr, fast, NULL, label_file,
 										NULL, tblspc_map_file, false, true);
-
-		before_shmem_exit(nonexclusive_base_backup_cleanup, (Datum) 0);
 	}
 
 	PG_RETURN_LSN(startpoint);
@@ -248,7 +236,6 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 		 * and tablespace map so they can be written to disk by the caller.
 		 */
 		stoppoint = do_pg_stop_backup(label_file->data, waitforarchive, NULL);
-		cancel_before_shmem_exit(nonexclusive_base_backup_cleanup, (Datum) 0);
 
 		values[1] = CStringGetTextDatum(label_file->data);
 		values[2] = CStringGetTextDatum(tblspc_map_file->data);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 1fa4551eff..1bb72a0a57 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -65,7 +65,6 @@ static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *sta
 						  bool sizeonly);
 static void send_int8_string(StringInfoData *buf, int64 intval);
 static void SendBackupHeader(List *tablespaces);
-static void base_backup_cleanup(int code, Datum arg);
 static void perform_base_backup(basebackup_options *opt);
 static void parse_basebackup_options(List *options, basebackup_options *opt);
 static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
@@ -216,17 +215,6 @@ static const char *const noChecksumFiles[] = {
 	NULL,
 };
 
-
-/*
- * Called when ERROR or FATAL happens in perform_base_backup() after
- * we have started the backup - make sure we end it!
- */
-static void
-base_backup_cleanup(int code, Datum arg)
-{
-	do_pg_abort_backup();
-}
-
 /*
  * Actually do a base backup for the specified tablespaces.
  *
@@ -265,7 +253,7 @@ perform_base_backup(basebackup_options *opt)
 	 * do_pg_stop_backup() should be inside the error cleanup block!
 	 */
 
-	PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+	PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
 	{
 		ListCell   *lc;
 		tablespaceinfo *ti;
@@ -374,7 +362,7 @@ perform_base_backup(basebackup_options *opt)
 
 		endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+	PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
 
 
 	if (opt->includewal)
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 9b588c87a5..3fea1993bc 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -349,7 +349,8 @@ extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
 									 bool needtblspcmapfile);
 extern XLogRecPtr do_pg_stop_backup(char *labelfile, bool waitforarchive,
 									TimeLineID *stoptli_p);
-extern void do_pg_abort_backup(void);
+extern void do_pg_abort_backup(int code, Datum arg);
+extern void register_persistent_abort_backup_handler(void);
 extern SessionBackupState get_backup_status(void);
 
 /* File path names (all relative to $PGDATA) */
-- 
2.25.1

v1-0001-Modify-cancel_before_shmem_exit-comments.patchapplication/x-patch; name=v1-0001-Modify-cancel_before_shmem_exit-comments.patchDownload
From b1c8e83b5c81102070a66d7761ee2aa8894d6ec2 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Fri, 24 Jul 2020 09:35:54 +0530
Subject: [PATCH v1] Modify cancel_before_shmem_exit() comments

Currently, there is no mention of safe usage of cancel_before_shmem_exit()
in the function comment and also it has a point that, there is still
scope for improving the way cancel_before_shmem_exit() looks for callback
to removed from before_shmem_exit_list. So, this patch modifies the comment
to reflect how the caller needs to use before_shmem_exit_list mechanism.
---
 src/backend/storage/ipc/ipc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index bdbc2c3ac4..4ade61fe38 100644
--- a/src/backend/storage/ipc/ipc.c
+++ b/src/backend/storage/ipc/ipc.c
@@ -381,9 +381,9 @@ on_shmem_exit(pg_on_exit_callback function, Datum arg)
  *		cancel_before_shmem_exit
  *
  *		this function removes a previously-registered before_shmem_exit
- *		callback.  For simplicity, only the latest entry can be
- *		removed.  (We could work harder but there is no need for
- *		current uses.)
+ *		callback.  We only look at the latest entry for removal, as we
+ * 		expect the caller to use before_shmem_exit callback mechanism
+ * 		in the LIFO order. Un-ordered removal of callbacks may be risky.
  * ----------------------------------------------------------------
  */
 void
-- 
2.25.1

#7Robert Haas
robertmhaas@gmail.com
In reply to: Bharath Rupireddy (#6)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On Fri, Jul 24, 2020 at 7:10 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

I looked at what actually llvm_shutdown() does? It frees up JIT stacks, also if exists perf related resource, using LLVMOrcDisposeInstance() and LLVMOrcUnregisterPerf(), that were dynamically allocated in llvm_session_initialize through a JIT library function LLVMOrcCreateInstance() [1].

It looks like there is no problem in moving llvm_shutdown to either on_shmem_exit() or on_proc_exit().

If it doesn't involve shared memory, I guess it can be on_proc_exit()
rather than on_shmem_exit().

I guess the other question is why we're doing it at all. What
resources are being allocated that wouldn't be freed up by process
exit anyway?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Robert Haas (#7)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On Fri, Jul 24, 2020 at 8:07 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Jul 24, 2020 at 7:10 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

I looked at what actually llvm_shutdown() does? It frees up JIT stacks,

also if exists perf related resource, using LLVMOrcDisposeInstance() and
LLVMOrcUnregisterPerf(), that were dynamically allocated in
llvm_session_initialize through a JIT library function
LLVMOrcCreateInstance() [1]https://llvm.org/doxygen/OrcCBindings_8cpp_source.html.

It looks like there is no problem in moving llvm_shutdown to either

on_shmem_exit() or on_proc_exit().

If it doesn't involve shared memory, I guess it can be on_proc_exit()
rather than on_shmem_exit().

I guess the other question is why we're doing it at all. What
resources are being allocated that wouldn't be freed up by process
exit anyway?

LLVMOrcCreateInstance() and LLVMOrcDisposeInstance() are doing new and
delete respectively, I just found these functions from the link [1]https://llvm.org/doxygen/OrcCBindings_8cpp_source.html. But I
don't exactly know whether there are any other resources being allocated
that can't be freed up by proc_exit(). Tagging @Andres Freund for inputs
on whether we have any problem making llvm_shutdown() a on_proc_exit()
callback instead of before_shmem_exit() callback.

And as suggested in the previous mails, we wanted to make it on_proc_exit()
to avoid the abort issue reported in this mail chain, however if we take
the abort issue fix commit # 303640199d0436c5e7acdf50b837a027b5726594 as
mentioned in the previous response[2]/messages/by-id/CALj2ACVwOKZ8qYUsZrU2y2efnYZOLRxPC6k52FQcB3oriH9Kcg@mail.gmail.com, then it may not be necessary, right
now, but just to be safer and to avoid any of these similar kind of issues
in future, we can consider this change as well.

[1]: https://llvm.org/doxygen/OrcCBindings_8cpp_source.html

LLVMOrcJITStackRef LLVMOrcCreateInstance(LLVMTargetMachineRef TM) {
TargetMachine *TM2(unwrap(TM));
Triple T(TM2->getTargetTriple());
auto IndirectStubsMgrBuilder =
orc::createLocalIndirectStubsManagerBuilder(T);
OrcCBindingsStack *JITStack =
new OrcCBindingsStack(*TM2, std::move(IndirectStubsMgrBuilder));
return wrap(JITStack);
}

LLVMErrorRef LLVMOrcDisposeInstance(LLVMOrcJITStackRef JITStack) {
auto *J = unwrap(JITStack);
auto Err = J->shutdown();
delete J;
return wrap(std::move(Err));
}

[2]: /messages/by-id/CALj2ACVwOKZ8qYUsZrU2y2efnYZOLRxPC6k52FQcB3oriH9Kcg@mail.gmail.com
/messages/by-id/CALj2ACVwOKZ8qYUsZrU2y2efnYZOLRxPC6k52FQcB3oriH9Kcg@mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#9Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#5)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On Mon, Jul 20, 2020 at 3:47 PM Robert Haas <robertmhaas@gmail.com> wrote:

But I also agree that what pg_start_backup() was doing before v13 was
wrong; that's why I committed
303640199d0436c5e7acdf50b837a027b5726594. The only reason I didn't
back-patch it is because the consequences are so minor I didn't think
it was worth worrying about. We could, though. I'd be somewhat
inclined to both do that and also change LLVM to use on_proc_exit() in
master, but I don't feel super-strongly about it.

Unless somebody complains pretty soon, I'm going to go ahead and do
what is described above.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#9)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On Thu, Jul 30, 2020 at 8:11 AM Robert Haas <robertmhaas@gmail.com> wrote:

Unless somebody complains pretty soon, I'm going to go ahead and do
what is described above.

Done.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Robert Haas (#10)
1 attachment(s)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On Thu, Aug 6, 2020 at 11:51 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jul 30, 2020 at 8:11 AM Robert Haas <robertmhaas@gmail.com> wrote:

Unless somebody complains pretty soon, I'm going to go ahead and do
what is described above.

Done.

Thanks!

I have one more request to make: since we are of the opinion to not
change the way cancel_before_shmem_exit() searches
before_shmem_exit_list array, wouldn't it be good to adjust comments
before the function cancel_before_shmem_exit()?

I sent the patch previously[1]/messages/by-id/CALj2ACVwOKZ8qYUsZrU2y2efnYZOLRxPC6k52FQcB3oriH9Kcg@mail.gmail.com, but attaching here again, modifies
cancel_before_shmem_exit() function comment to reflect the safe usage
of before_shmem_exit_list callback mechanism and also removes the
point "For simplicity, only the latest entry can be removed*********"
as this gives a meaning that there is still scope for improvement in
cancel_before_shmem_exit() search mechanism.

Thoughts?

[1]: /messages/by-id/CALj2ACVwOKZ8qYUsZrU2y2efnYZOLRxPC6k52FQcB3oriH9Kcg@mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v1-0001-Modify-cancel_before_shmem_exit-comments.patchapplication/x-patch; name=v1-0001-Modify-cancel_before_shmem_exit-comments.patchDownload
From b1c8e83b5c81102070a66d7761ee2aa8894d6ec2 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Fri, 24 Jul 2020 09:35:54 +0530
Subject: [PATCH v1] Modify cancel_before_shmem_exit() comments

Currently, there is no mention of safe usage of cancel_before_shmem_exit()
in the function comment and also it has a point that, there is still
scope for improving the way cancel_before_shmem_exit() looks for callback
to removed from before_shmem_exit_list. So, this patch modifies the comment
to reflect how the caller needs to use before_shmem_exit_list mechanism.
---
 src/backend/storage/ipc/ipc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index bdbc2c3ac4..4ade61fe38 100644
--- a/src/backend/storage/ipc/ipc.c
+++ b/src/backend/storage/ipc/ipc.c
@@ -381,9 +381,9 @@ on_shmem_exit(pg_on_exit_callback function, Datum arg)
  *		cancel_before_shmem_exit
  *
  *		this function removes a previously-registered before_shmem_exit
- *		callback.  For simplicity, only the latest entry can be
- *		removed.  (We could work harder but there is no need for
- *		current uses.)
+ *		callback.  We only look at the latest entry for removal, as we
+ * 		expect the caller to use before_shmem_exit callback mechanism
+ * 		in the LIFO order. Un-ordered removal of callbacks may be risky.
  * ----------------------------------------------------------------
  */
 void
-- 
2.25.1

#12Robert Haas
robertmhaas@gmail.com
In reply to: Bharath Rupireddy (#11)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On Thu, Aug 6, 2020 at 11:46 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

I sent the patch previously[1], but attaching here again, modifies
cancel_before_shmem_exit() function comment to reflect the safe usage
of before_shmem_exit_list callback mechanism and also removes the
point "For simplicity, only the latest entry can be removed*********"
as this gives a meaning that there is still scope for improvement in
cancel_before_shmem_exit() search mechanism.

Thoughts?

I think that the first part of the comment change you suggest is a
good idea and would avoid developer confusion, but I think that the
statement about unordered removal of comments being risky doesn't add
much. It's too vague to help anybody and I don't think I believe it,
either. So I suggest something more like:

- * callback.  For simplicity, only the latest entry can be
- * removed.  (We could work harder but there is no need for
- * current uses.)
+ * callback.  We only look at the latest entry for removal, as we
+ * expect the caller to use before_shmem_exit callback mechanism
+ * in the LIFO order.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#12)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

Robert Haas <robertmhaas@gmail.com> writes:

... So I suggest something more like:

- * callback.  For simplicity, only the latest entry can be
- * removed.  (We could work harder but there is no need for
- * current uses.)
+ * callback.  We only look at the latest entry for removal, as we
+ * expect the caller to use before_shmem_exit callback mechanism
+ * in the LIFO order.

That's a meaningless statement for any one caller. So it needs to be more
like "we expect callers to add and remove temporary before_shmem_exit
callbacks in strict LIFO order".

I wonder whether we ought to change the function to complain if the
last list entry doesn't match. We'd have caught this bug sooner
if it did, and it's not very clear why silently doing nothing is
a good idea when there's no match.

regards, tom lane

#14Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#13)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On Fri, Aug 7, 2020 at 1:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

That's a meaningless statement for any one caller. So it needs to be more
like "we expect callers to add and remove temporary before_shmem_exit
callbacks in strict LIFO order".

Sure, that seems fine.

I wonder whether we ought to change the function to complain if the
last list entry doesn't match. We'd have caught this bug sooner
if it did, and it's not very clear why silently doing nothing is
a good idea when there's no match.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#12)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

Hi,

On 2020-08-07 12:29:03 -0400, Robert Haas wrote:

On Thu, Aug 6, 2020 at 11:46 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

I sent the patch previously[1], but attaching here again, modifies
cancel_before_shmem_exit() function comment to reflect the safe usage
of before_shmem_exit_list callback mechanism and also removes the
point "For simplicity, only the latest entry can be removed*********"
as this gives a meaning that there is still scope for improvement in
cancel_before_shmem_exit() search mechanism.

Thoughts?

I think that the first part of the comment change you suggest is a
good idea and would avoid developer confusion, but I think that the
statement about unordered removal of comments being risky doesn't add
much. It's too vague to help anybody and I don't think I believe it,
either. So I suggest something more like:

- * callback.  For simplicity, only the latest entry can be
- * removed.  (We could work harder but there is no need for
- * current uses.)
+ * callback.  We only look at the latest entry for removal, as we
+ * expect the caller to use before_shmem_exit callback mechanism
+ * in the LIFO order.

In which situations is the removal actually useful *and* safe, with
these constraints? You'd have to have a very narrow set of functions
that are called while the exit hook is present, i.e. basically this
would only be usable for PG_ENSURE_ERROR_CLEANUP and nothing else. And
even there it seems like it's pretty easy to get into a situation where
it's not safe.

Greetings,

Andres Freund

#16Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Robert Haas (#14)
2 attachment(s)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On Fri, Aug 7, 2020 at 11:09 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Aug 7, 2020 at 1:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

That's a meaningless statement for any one caller. So it needs to be more
like "we expect callers to add and remove temporary before_shmem_exit
callbacks in strict LIFO order".

Sure, that seems fine.

v2 patch has the comments modified.

I wonder whether we ought to change the function to complain if the
last list entry doesn't match. We'd have caught this bug sooner
if it did, and it's not very clear why silently doing nothing is
a good idea when there's no match.

+1.

This is a good idea. v3 patch has both the modified comments(from v2)
as well as a DEBUG3 (DEBUG3 level, because the other
non-error/non-fatal logs in ipc.c are using the same level) log to
report when the latest entry for removal is not matched with the one
the caller cancel_before_shmem_exit() is looking for and a hint on how
to safely use temporary before_shmem_exit() callbacks. In v3 patch,
function pointer is being printed, I'm not sure how much it is helpful
to have function pointers in the logs though there are some other
places printing pointers into the logs, I wish I could print function
names. (Is there a way we could get function names from function
pointers?).

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v2-0001-Modify-cancel_before_shmem_exit-comments.patchapplication/x-patch; name=v2-0001-Modify-cancel_before_shmem_exit-comments.patchDownload
From 7cdea387c2e18eb537077ccf518747b9cffb405c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 10 Aug 2020 10:42:23 +0530
Subject: [PATCH v2] Modify cancel_before_shmem_exit() comments

Currently, there is no mention of safe usage of cancel_before_shmem_exit()
in the function comment and also it has a point that, there is still
scope for improving the way cancel_before_shmem_exit() looks for callback
to removed from before_shmem_exit_list. So, this patch modifies the comment
to reflect how the caller needs to use before_shmem_exit_list mechanism.
---
 src/backend/storage/ipc/ipc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index bdbc2c3ac4..3ca5e7c89c 100644
--- a/src/backend/storage/ipc/ipc.c
+++ b/src/backend/storage/ipc/ipc.c
@@ -381,9 +381,9 @@ on_shmem_exit(pg_on_exit_callback function, Datum arg)
  *		cancel_before_shmem_exit
  *
  *		this function removes a previously-registered before_shmem_exit
- *		callback.  For simplicity, only the latest entry can be
- *		removed.  (We could work harder but there is no need for
- *		current uses.)
+ *		callback.  We only look at the latest entry for removal, as we
+ * 		expect callers to add and remove temporary before_shmem_exit
+ * 		callbacks in strict LIFO order.
  * ----------------------------------------------------------------
  */
 void
-- 
2.25.1

v3-0001-Modify-cancel_before_shmem_exit-comments.patchapplication/x-patch; name=v3-0001-Modify-cancel_before_shmem_exit-comments.patchDownload
From b41e4fa116f1e6eb5d1cee25f9fc6b0fe020a8af Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 10 Aug 2020 12:13:31 +0530
Subject: [PATCH v3] Modify cancel_before_shmem_exit() comments

Currently, there is no mention of safe usage of cancel_before_shmem_exit()
in the function comment and also it has a point that, there is still
scope for improving the way cancel_before_shmem_exit() looks for callback
to removed from before_shmem_exit_list. So, this patch modifies the comment
to reflect how the caller needs to use before_shmem_exit_list mechanism.
---
 src/backend/storage/ipc/ipc.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index bdbc2c3ac4..3cc5cf9b9b 100644
--- a/src/backend/storage/ipc/ipc.c
+++ b/src/backend/storage/ipc/ipc.c
@@ -381,9 +381,9 @@ on_shmem_exit(pg_on_exit_callback function, Datum arg)
  *		cancel_before_shmem_exit
  *
  *		this function removes a previously-registered before_shmem_exit
- *		callback.  For simplicity, only the latest entry can be
- *		removed.  (We could work harder but there is no need for
- *		current uses.)
+ *		callback.  We only look at the latest entry for removal, as we
+ * 		expect callers to add and remove temporary before_shmem_exit
+ * 		callbacks in strict LIFO order.
  * ----------------------------------------------------------------
  */
 void
@@ -393,7 +393,18 @@ cancel_before_shmem_exit(pg_on_exit_callback function, Datum arg)
 		before_shmem_exit_list[before_shmem_exit_index - 1].function
 		== function &&
 		before_shmem_exit_list[before_shmem_exit_index - 1].arg == arg)
+	{
 		--before_shmem_exit_index;
+	}
+	else if (before_shmem_exit_index > 0)
+	{
+		ereport(DEBUG3,
+				(errmsg("before_shmem_exit callback %p cannot be removed from the callback list as it doesn't match with the latest entry %p at index %d",
+						function,
+						before_shmem_exit_list[before_shmem_exit_index - 1].function,
+						before_shmem_exit_index),
+				 errhint("Make sure the temporary before_shmem_exit callbacks are added and removed in strict Last-In-First-Out(LIFO) order.")));
+	}
 }
 
 /* ----------------------------------------------------------------
-- 
2.25.1

#17Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#15)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On Fri, Aug 7, 2020 at 5:20 PM Andres Freund <andres@anarazel.de> wrote:

In which situations is the removal actually useful *and* safe, with
these constraints? You'd have to have a very narrow set of functions
that are called while the exit hook is present, i.e. basically this
would only be usable for PG_ENSURE_ERROR_CLEANUP and nothing else. And
even there it seems like it's pretty easy to get into a situation where
it's not safe.

Well, I don't really care whether or not we change this function to
iterate over the callback list or whether we add a warning that you
need to use it in LIFO order, but I think we should do one or the
other, because this same confusion has come up multiple times. I
thought that Tom was opposed to making it iterate over the callback
list (for reasons I don't really understand, honestly) so adding a
comment and a cross-check seemed like the practical option. Now I also
think it's fine to iterate over the callback list: this function
doesn't get used so much that it's likely to be a performance problem,
and I don't think this is the first bug that would have become a
non-bug had we done that years and years ago whenever it was first
proposed. In fact, I'd go so far as to say that the latter is a
slightly better option. However, doing nothing is clearly worst.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#17)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

Robert Haas <robertmhaas@gmail.com> writes:

Well, I don't really care whether or not we change this function to
iterate over the callback list or whether we add a warning that you
need to use it in LIFO order, but I think we should do one or the
other, because this same confusion has come up multiple times. I
thought that Tom was opposed to making it iterate over the callback
list (for reasons I don't really understand, honestly) so adding a
comment and a cross-check seemed like the practical option. Now I also
think it's fine to iterate over the callback list: this function
doesn't get used so much that it's likely to be a performance problem,
and I don't think this is the first bug that would have become a
non-bug had we done that years and years ago whenever it was first
proposed. In fact, I'd go so far as to say that the latter is a
slightly better option. However, doing nothing is clearly worst.

I agree that doing nothing seems like a bad idea. My concern about
allowing non-LIFO callback removal is that it seems to me that such
usage patterns have likely got *other* bugs, so we should discourage
that. These callbacks don't exist in a vacuum: they reflect that
the mainline code path has set up, or torn down, important state.
Non-LIFO usage requires very strong assumptions that the states in
question are not interdependent, and that's something I'd rather not
rely on if we don't have to.

regards, tom lane

#19Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#18)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On Mon, Aug 10, 2020 at 10:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I agree that doing nothing seems like a bad idea. My concern about
allowing non-LIFO callback removal is that it seems to me that such
usage patterns have likely got *other* bugs, so we should discourage
that. These callbacks don't exist in a vacuum: they reflect that
the mainline code path has set up, or torn down, important state.
Non-LIFO usage requires very strong assumptions that the states in
question are not interdependent, and that's something I'd rather not
rely on if we don't have to.

I have mixed feelings about this. On the one hand, I think saying that
it requires "very strong assumptions" about interdependence makes it
sound scarier than it is -- not that your statement is wrong, but that
in most cases we kinda know whether that's true or not, and the idea
that you shouldn't remove a callback upon which some later callback
might be depending is not too difficult for anybody to understand. On
the other hand, I think that in most cases we ought to be discouraging
people who are trying to do per-subsystem cleanup from using
before_shmem_exit() at all. I think such callbacks ought to be done
using on_shmem_exit() if they involve shared memory or on_proc_exit()
if they do not. Those functions don't have cancel_blah_exit()
variants, and I don't think they should: the right way to code those
things is not to remove the callbacks from the stack when they're no
longer needed, but rather to code the callbacks so that they will do
nothing if no work is required, leaving them permanently registered.

And the main reason why I think that such callbacks should be
registered using on_shmem_exit() or on_proc_exit() rather than
before_shmem_exit() is because of (1) the interdependency issue you
raise and (2) the fact that cancel_before_shmem_exit doesn't do what
people tend to think it does. For an example of (1), look at
ShutdownPostgres() and RemoveTempRelationsCallback(). The latter
aborts out of any transaction and then starts a new one to drop your
temp schema. But that might fail, so ShutdownPostgres() also needs to
be prepared to AbortOutOfAnyTransaction(). If you inserted more
cleanup steps that were thematically similar to those, each one of
them would also need to begin with AbortOutOfAnyTransaction(). That
suggests that this whole thing is a bit under-engineered. Some of the
other before_shmem_exit() callbacks don't start with that incantation,
but that's because they are per-subsystem callbacks that likely ought
to be using on_shmem_exit() rather than actually being the same sort
of thing.

Perhaps we really have four categories here:
(1) Temporary handlers for PG_ENSURE_ERROR_CLEANUP().
(2) High-level cleanup that needs to run after aborting out of the
current transaction.
(3) Per-subsystem shutdown for shared memory stuff.
(4) Per-subsystem shutdown for backend-private stuff.

Right now we blend (1), (2), and some of (3) together, but we could
try to create a cleaner line. We could redefine before_shmem_exit() as
being exactly #2, and abort out of any transaction before calling each
step, and document that you shouldn't use it unless you need that
behavior. And we could have a separate stack for #1 that is explicitly
LIFO and not intended for any other use. But then again maybe that's
overkill. What I do think we should do, after thinking about it more,
is discourage the casual use of before_shmem_exit() for things where
on_shmem_exit() or on_proc_exit() would be just as good. I think
that's what would avoid the most problems here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#19)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

Robert Haas <robertmhaas@gmail.com> writes:

Perhaps we really have four categories here:
(1) Temporary handlers for PG_ENSURE_ERROR_CLEANUP().
(2) High-level cleanup that needs to run after aborting out of the
current transaction.
(3) Per-subsystem shutdown for shared memory stuff.
(4) Per-subsystem shutdown for backend-private stuff.

Hmm, I don't think we actually have any of (2) do we? Or at least
we aren't using ipc.c callbacks for them.

What I do think we should do, after thinking about it more,
is discourage the casual use of before_shmem_exit() for things where
on_shmem_exit() or on_proc_exit() would be just as good. I think
that's what would avoid the most problems here.

I think we're mostly in violent agreement here. The interesting
question seems to be Andres' one about whether before_shmem_exit
actually has any safe use except for PG_ENSURE_ERROR_CLEANUP.
It may not, in which case perhaps we oughta rename it?

regards, tom lane

#21Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#20)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On Mon, Aug 10, 2020 at 3:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Perhaps we really have four categories here:
(1) Temporary handlers for PG_ENSURE_ERROR_CLEANUP().
(2) High-level cleanup that needs to run after aborting out of the
current transaction.
(3) Per-subsystem shutdown for shared memory stuff.
(4) Per-subsystem shutdown for backend-private stuff.

Hmm, I don't think we actually have any of (2) do we? Or at least
we aren't using ipc.c callbacks for them.

Well, I was thinking about the place where ShutdownPostgres() does
LockReleaseAll(), and also the stuff in RemoveTempRelationsCallback().
Those are pretty high-level operations that need to happen before we
start shutting down subsystems. Especially the removal of temp
relations.

What I do think we should do, after thinking about it more,
is discourage the casual use of before_shmem_exit() for things where
on_shmem_exit() or on_proc_exit() would be just as good. I think
that's what would avoid the most problems here.

I think we're mostly in violent agreement here. The interesting
question seems to be Andres' one about whether before_shmem_exit
actually has any safe use except for PG_ENSURE_ERROR_CLEANUP.
It may not, in which case perhaps we oughta rename it?

If we could eliminate the other places where it's used, that'd be
great. That's not too clear to me, though, because of the above two
cases.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#22Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#21)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

Hi,

On 2020-08-10 15:50:19 -0400, Robert Haas wrote:

On Mon, Aug 10, 2020 at 3:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

What I do think we should do, after thinking about it more,
is discourage the casual use of before_shmem_exit() for things where
on_shmem_exit() or on_proc_exit() would be just as good. I think
that's what would avoid the most problems here.

I think we're mostly in violent agreement here. The interesting
question seems to be Andres' one about whether before_shmem_exit
actually has any safe use except for PG_ENSURE_ERROR_CLEANUP.
It may not, in which case perhaps we oughta rename it?

If we could eliminate the other places where it's used, that'd be
great. That's not too clear to me, though, because of the above two
cases.

I think there's two different aspects here: Having before_shmem_exit(),
and having cancel_before_shmem_exit(). We could just not have the
latter, and instead use a separate list for PG_ENSURE_ERROR_CLEANUP
internally. With the callback for PG_ENSURE_ERROR_CLEANUP calling those
from its private list. There's no other uses of
cancel_before_shmem_exit afaict.

I guess alternatively we at some point might just need a more complex
callback system, where one can specify where in relation to another
callback a callback needs to be registered etc.

Greetings,

Andres Freund

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#22)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

Andres Freund <andres@anarazel.de> writes:

I think there's two different aspects here: Having before_shmem_exit(),
and having cancel_before_shmem_exit(). We could just not have the
latter, and instead use a separate list for PG_ENSURE_ERROR_CLEANUP
internally. With the callback for PG_ENSURE_ERROR_CLEANUP calling those
from its private list. There's no other uses of
cancel_before_shmem_exit afaict.

It's certainly arguable that PG_ENSURE_ERROR_CLEANUP is a special
snowflake and needs to use a separate mechanism. What is not real clear
to me is why there are any other callers that must use before_shmem_exit
rather than on_shmem_exit --- IOW, except for P_E_E_C's use, I have never
been persuaded that the former callback list should exist at all. The
expectation for on_shmem_exit is that callbacks correspond to system
service modules that are initialized in a particular order, and can safely
be torn down in the reverse order. Why can't the existing callers just
make even-later entries into that same callback list?

regards, tom lane

#24Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#23)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On Mon, Aug 10, 2020 at 8:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's certainly arguable that PG_ENSURE_ERROR_CLEANUP is a special
snowflake and needs to use a separate mechanism. What is not real clear
to me is why there are any other callers that must use before_shmem_exit
rather than on_shmem_exit --- IOW, except for P_E_E_C's use, I have never
been persuaded that the former callback list should exist at all. The
expectation for on_shmem_exit is that callbacks correspond to system
service modules that are initialized in a particular order, and can safely
be torn down in the reverse order. Why can't the existing callers just
make even-later entries into that same callback list?

That split dates to the parallel query work, and there are some
comments in shmem_exit() about it; see in particular the explanation
in the middle where it says "Call dynamic shared memory callbacks." It
seemed to me that I needed the re-entrancy behavior that is described
there, but for a set of callbacks that needed to run before some of
the existing callbacks and after others.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#25Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#24)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On Mon, Aug 10, 2020 at 10:10:08PM -0400, Robert Haas wrote:

That split dates to the parallel query work, and there are some
comments in shmem_exit() about it; see in particular the explanation
in the middle where it says "Call dynamic shared memory callbacks." It
seemed to me that I needed the re-entrancy behavior that is described
there, but for a set of callbacks that needed to run before some of
the existing callbacks and after others.

We still have a CF entry here:
https://commitfest.postgresql.org/29/2649/

Is there still something that needs to absolutely be done here knowing
that we have bab1500 that got rid of the root issue? Can the CF entry
be marked as committed?

(FWIW, I would move any discussion about improving more stuff related
to shared memory cleanup code at proc exit into a new thread, as that
looks like a new topic.)
--
Michael

#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#25)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

Michael Paquier <michael@paquier.xyz> writes:

Is there still something that needs to absolutely be done here knowing
that we have bab1500 that got rid of the root issue? Can the CF entry
be marked as committed?

I think there is agreement that we're not going to change
cancel_before_shmem_exit's restriction to only allow LIFO popping.
So we should improve its comment to explain why. The other thing
that seems legitimately on-the-table for this CF entry is whether
we should change cancel_before_shmem_exit to complain, rather than
silently do nothing, if it fails to pop the stack. Bharath's
last patchset proposed to add an elog(DEBUG3) complaint, which
seems to me to be just about entirely useless. I'd make it an
ERROR, or maybe an Assert.

regards, tom lane

#27Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Tom Lane (#26)
1 attachment(s)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On Mon, Sep 7, 2020 at 8:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think there is agreement that we're not going to change
cancel_before_shmem_exit's restriction to only allow LIFO popping.
So we should improve its comment to explain why. The other thing
that seems legitimately on-the-table for this CF entry is whether
we should change cancel_before_shmem_exit to complain, rather than
silently do nothing, if it fails to pop the stack. Bharath's
last patchset proposed to add an elog(DEBUG3) complaint, which
seems to me to be just about entirely useless. I'd make it an
ERROR, or maybe an Assert.

Attaching a patch with both the comments modification and changing
DEBUG3 to ERROR. make check and make world-check passes on this patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v4-0001-Modify-cancel_before_shmem_exit-comments.patchapplication/x-patch; name=v4-0001-Modify-cancel_before_shmem_exit-comments.patchDownload
From 45cb0ada52eba25ce83985cc23edab0d34be9e4a Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 7 Sep 2020 23:46:26 -0700
Subject: [PATCH v4] Modify cancel_before_shmem_exit() comments

Currently, there is no mention of safe usage of cancel_before_shmem_exit()
in the function comment and also it has a mis-directing point that, there
is still scope for improving the way cancel_before_shmem_exit() looks
for callback to be removed from before_shmem_exit_list. So, this patch
modifies the comment to reflect how the caller needs to use
before_shmem_exit_list mechanism.

This patch also adds an error in case the before_shmem_exit
function is not found at the latest entry.
---
 src/backend/storage/ipc/ipc.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index bdbc2c3ac4..3c2c30c189 100644
--- a/src/backend/storage/ipc/ipc.c
+++ b/src/backend/storage/ipc/ipc.c
@@ -381,9 +381,9 @@ on_shmem_exit(pg_on_exit_callback function, Datum arg)
  *		cancel_before_shmem_exit
  *
  *		this function removes a previously-registered before_shmem_exit
- *		callback.  For simplicity, only the latest entry can be
- *		removed.  (We could work harder but there is no need for
- *		current uses.)
+ *		callback.  We only look at the latest entry for removal, as we
+ * 		expect callers to add and remove temporary before_shmem_exit
+ * 		callbacks in strict LIFO order.
  * ----------------------------------------------------------------
  */
 void
@@ -393,7 +393,18 @@ cancel_before_shmem_exit(pg_on_exit_callback function, Datum arg)
 		before_shmem_exit_list[before_shmem_exit_index - 1].function
 		== function &&
 		before_shmem_exit_list[before_shmem_exit_index - 1].arg == arg)
+	{
 		--before_shmem_exit_index;
+	}
+	else if (before_shmem_exit_index > 0)
+	{
+		ereport(ERROR,
+				(errmsg("before_shmem_exit callback %p cannot be removed from the callback list as it doesn't match with the latest entry %p at index %d",
+						function,
+						before_shmem_exit_list[before_shmem_exit_index - 1].function,
+						before_shmem_exit_index),
+				 errhint("Make sure the temporary before_shmem_exit callbacks are added and removed in strict Last-In-First-Out(LIFO) order.")));
+	}
 }
 
 /* ----------------------------------------------------------------
-- 
2.25.1

#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bharath Rupireddy (#27)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:

Attaching a patch with both the comments modification and changing
DEBUG3 to ERROR. make check and make world-check passes on this patch.

I pushed this after simplifying the ereport down to an elog. I see
no reason to consider this a user-facing error, so there's no need
to make translators deal with the message.

regards, tom lane

#29Daniel Gustafsson
daniel@yesql.se
In reply to: Robert Haas (#9)
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

On 30 Jul 2020, at 14:11, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jul 20, 2020 at 3:47 PM Robert Haas <robertmhaas@gmail.com> wrote:

But I also agree that what pg_start_backup() was doing before v13 was
wrong; that's why I committed
303640199d0436c5e7acdf50b837a027b5726594. The only reason I didn't
back-patch it is because the consequences are so minor I didn't think
it was worth worrying about. We could, though. I'd be somewhat
inclined to both do that and also change LLVM to use on_proc_exit() in
master, but I don't feel super-strongly about it.

Unless somebody complains pretty soon, I'm going to go ahead and do
what is described above.

When backpatching 9dce22033d5d I ran into this in v13 and below, since it needs
llvm_shutdown to happen via on_proc_exit in order for all llvm_release_context
calls to have finished. Unless anyone objects I will backpatch bab150045bd97
to v12 and v13 as part of my backpatch.

--
Daniel Gustafsson