A new function to wait for the backend exit after termination

Started by Bharath Rupireddyabout 5 years ago57 messages
#1Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
1 attachment(s)

Hi,

Currently pg_terminate_backend(), sends SIGTERM to the backend process but
doesn't ensure it's exit. There are chances that backends still are
running(even after pg_terminate_backend() is called) until the interrupts
are processed(using ProcessInterrupts()). This could cause problems
especially in testing, for instance in a sql file right after
pg_terminate_backend(), if any test case depends on the backend's
non-existence[1]/messages/by-id/f31cc4da-a7ea-677f-cf64-a2f9db854bf5@oss.nttdata.com, but the backend is not terminated. As discussed in [1]/messages/by-id/f31cc4da-a7ea-677f-cf64-a2f9db854bf5@oss.nttdata.com,
we have wait_pid()(see regress.c and sql/dblink.sql), but it's not usable
across the system. In [1]/messages/by-id/f31cc4da-a7ea-677f-cf64-a2f9db854bf5@oss.nttdata.com, we thought it would be better to have functions
ensuring the backend's exit on the similar lines of pg_terminate_backend().

I propose to have two functions:

1. pg_terminate_backend_and_wait() -- which sends SIGTERM to the backend
and wait's until it's exit.
2. pg_wait_backend() -- which waits for a given backend process. Note that
this function has to be used carefully after pg_terminate_backend(), if
used on a backend that's not ternmited it simply keeps waiting in a loop.

Attaching a WIP patch herewith.

Thoughts?

[1]: /messages/by-id/f31cc4da-a7ea-677f-cf64-a2f9db854bf5@oss.nttdata.com
/messages/by-id/f31cc4da-a7ea-677f-cf64-a2f9db854bf5@oss.nttdata.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v1-0001-pg_terminate_backend_and_wait.patchapplication/x-patch; name=v1-0001-pg_terminate_backend_and_wait.patchDownload
From b1714f9fd95007b9fb6c26b675f83fa73fecd562 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Wed, 21 Oct 2020 11:52:34 +0530
Subject: [PATCH v1] pg_terminate_backend_and_wait

---
 src/backend/storage/ipc/signalfuncs.c | 71 +++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat       |  7 +++
 2 files changed, 78 insertions(+)

diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index d822e82cb9..002ac5f10d 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -149,6 +149,77 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
 }
 
+/*
+ * Signal to terminate a backend process and wait until no process has a given
+ * PID. This is allowed if you are a member of the role whose process is being
+ * terminated.
+ *
+ * Note that only superusers can signal superuser-owned processes.
+ */
+Datum
+pg_terminate_backend_and_wait(PG_FUNCTION_ARGS)
+{
+	int			pid = PG_GETARG_INT32(0);
+	bool		r = DatumGetBool(DirectFunctionCall1(pg_terminate_backend,
+													 PG_GETARG_DATUM(0)));
+	if (r)
+	{
+		while (kill(pid, 0) == 0)
+		{
+			/* Process interrupts in case of self termination. */
+			if (pid == MyProcPid)
+				CHECK_FOR_INTERRUPTS();
+
+			pg_usleep(50000);
+		}
+
+		if (errno != ESRCH)
+			elog(ERROR, "could not check PID %d liveness: %m", pid);
+	}
+
+	PG_RETURN_BOOL(r);
+}
+
+Datum
+pg_wait_backend(PG_FUNCTION_ARGS)
+{
+	int			pid = PG_GETARG_INT32(0);
+	PGPROC	   *proc = BackendPidGetProc(pid);
+
+	/*
+	 * BackendPidGetProc returns NULL if the pid isn't valid; but by the time
+	 * we reach kill(), a process for which we get a valid proc here might
+	 * have terminated on its own.  There's no way to acquire a lock on an
+	 * arbitrary process to prevent that. But since so far all the callers of
+	 * this mechanism involve some request for ending the process anyway, that
+	 * it might end on its own first is not a problem.
+	 */
+	if (proc == NULL)
+	{
+		/*
+		 * This is just a warning so a loop-through-resultset will not abort
+		 * if one backend terminated on its own during the run.
+		 */
+		ereport(WARNING,
+				(errmsg("PID %d is not a PostgreSQL server process", pid)));
+		PG_RETURN_BOOL(false);
+	}
+
+	if (!superuser())
+		elog(ERROR, "must be superuser to check PID liveness");
+
+	while (kill(pid, 0) == 0)
+	{
+		//CHECK_FOR_INTERRUPTS();
+		pg_usleep(50000);
+	}
+
+	if (errno != ESRCH)
+		elog(ERROR, "could not check PID %d liveness: %m", pid);
+
+	PG_RETURN_BOOL(true);
+}
+
 /*
  * Signal to reload the database configuration
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 22340baf1c..1c502d51e3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6085,6 +6085,13 @@
 { oid => '2096', descr => 'terminate a server process',
   proname => 'pg_terminate_backend', provolatile => 'v', prorettype => 'bool',
   proargtypes => 'int4', prosrc => 'pg_terminate_backend' },
+{ oid => '16386', descr => 'terminate a server process and wait for it\'s exit',
+  proname => 'pg_terminate_backend_and_wait', provolatile => 'v',
+  prorettype => 'bool', proargtypes => 'int4',
+  prosrc => 'pg_terminate_backend_and_wait' },
+{ oid => '16387', descr => 'waits for a backend\'s exit',
+  proname => 'pg_wait_backend', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'int4', prosrc => 'pg_wait_backend' },
 { oid => '2172', descr => 'prepare for taking an online backup',
   proname => 'pg_start_backup', provolatile => 'v', proparallel => 'r',
   prorettype => 'pg_lsn', proargtypes => 'text bool bool',
-- 
2.25.1

#2Magnus Hagander
magnus@hagander.net
In reply to: Bharath Rupireddy (#1)
Re: A new function to wait for the backend exit after termination

On Wed, Oct 21, 2020 at 3:02 PM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:

Hi,

Currently pg_terminate_backend(), sends SIGTERM to the backend process but
doesn't ensure it's exit. There are chances that backends still are
running(even after pg_terminate_backend() is called) until the interrupts
are processed(using ProcessInterrupts()). This could cause problems
especially in testing, for instance in a sql file right after
pg_terminate_backend(), if any test case depends on the backend's
non-existence[1], but the backend is not terminated. As discussed in [1],
we have wait_pid()(see regress.c and sql/dblink.sql), but it's not usable
across the system. In [1], we thought it would be better to have functions
ensuring the backend's exit on the similar lines of pg_terminate_backend().

I propose to have two functions:

1. pg_terminate_backend_and_wait() -- which sends SIGTERM to the backend
and wait's until it's exit.

I think it would be nicer to have a pg_terminate_backend(pid, wait=false),
so a function with a second parameter which defaults to the current
behaviour of not waiting. And it might be a good idea to also give it a
timeout parameter?

2. pg_wait_backend() -- which waits for a given backend process. Note that
this function has to be used carefully after pg_terminate_backend(), if
used on a backend that's not ternmited it simply keeps waiting in a loop.

It seems this one also very much would need a timeout value.

And surely we should show some sort of wait event when it's waiting.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Magnus Hagander (#2)
Re: A new function to wait for the backend exit after termination

On Wed, Oct 21, 2020 at 6:13 AM Magnus Hagander <magnus@hagander.net> wrote:

On Wed, Oct 21, 2020 at 3:02 PM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:

Hi,

Currently pg_terminate_backend(), sends SIGTERM to the backend process
but doesn't ensure it's exit. There are chances that backends still are
running(even after pg_terminate_backend() is called) until the interrupts
are processed(using ProcessInterrupts()). This could cause problems
especially in testing, for instance in a sql file right after
pg_terminate_backend(), if any test case depends on the backend's
non-existence[1], but the backend is not terminated. As discussed in [1],
we have wait_pid()(see regress.c and sql/dblink.sql), but it's not usable
across the system. In [1], we thought it would be better to have functions
ensuring the backend's exit on the similar lines of pg_terminate_backend().

I propose to have two functions:

1. pg_terminate_backend_and_wait() -- which sends SIGTERM to the backend
and wait's until it's exit.

I think it would be nicer to have a pg_terminate_backend(pid, wait=false),
so a function with a second parameter which defaults to the current
behaviour of not waiting. And it might be a good idea to also give it a
timeout parameter?

Agreed on the overload, and the timeouts make sense too - with the caller
deciding whether a timeout results in a failure or a false return value.

2. pg_wait_backend() -- which waits for a given backend process. Note
that this function has to be used carefully after pg_terminate_backend(),
if used on a backend that's not ternmited it simply keeps waiting in a loop.

It seems this one also very much would need a timeout value.

Is there a requirement for waiting to be superuser only? You are not
affecting any session but your own during the waiting period.

I could imagine, in theory at least, wanting to wait for a backend to go
idle as well as for it disappearing. Scope creep in terms of this patch's
goal but worth at least considering now.

David J.

#4Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Magnus Hagander (#2)
Re: A new function to wait for the backend exit after termination

Thanks for the feedback.

On Wed, Oct 21, 2020 at 6:43 PM Magnus Hagander <magnus@hagander.net> wrote:

Currently pg_terminate_backend(), sends SIGTERM to the backend process but doesn't ensure it's exit. There are chances that backends still are running(even after pg_terminate_backend() is called) until the interrupts are processed(using ProcessInterrupts()). This could cause problems especially in testing, for instance in a sql file right after pg_terminate_backend(), if any test case depends on the backend's non-existence[1], but the backend is not terminated. As discussed in [1], we have wait_pid()(see regress.c and sql/dblink.sql), but it's not usable across the system. In [1], we thought it would be better to have functions ensuring the backend's exit on the similar lines of pg_terminate_backend().

I propose to have two functions:

1. pg_terminate_backend_and_wait() -- which sends SIGTERM to the backend and wait's until it's exit.

I think it would be nicer to have a pg_terminate_backend(pid, wait=false), so a function with a second parameter which defaults to the current behaviour of not waiting. And it might be a good idea to also give it a timeout parameter?

+1 to have pg_terminate_backend(pid, wait=false, timeout), timeout in
milliseconds only valid if wait = true.

2. pg_wait_backend() -- which waits for a given backend process. Note that this function has to be used carefully after pg_terminate_backend(), if used on a backend that's not ternmited it simply keeps waiting in a loop.

It seems this one also very much would need a timeout value.

And surely we should show some sort of wait event when it's waiting.

Yes for this function too we can have a timeout value.
pg_wait_backend(pid, timeout), timeout in milliseconds.

I think we can use WaitLatch with the given timeout and with a new
wait event type WAIT_EVENT_BACKEND_SHUTDOWN instead of pg_usleep for
achieving the given timeout mechanism. With WaitLatch we would also
get the waiting event in stats. Thoughts?

rc = WaitLatch(MyLatch,
WL_LATCH_SET | WL_POSTMASTER_DEATH, timeout,
WAIT_EVENT_BACKEND_SHUTDOWN);

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#5Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#2)
Re: A new function to wait for the backend exit after termination

On 2020-10-21 15:13:36 +0200, Magnus Hagander wrote:

It seems this one also very much would need a timeout value.

I'm not really against that, but I wonder if we just end up
reimplementing statement timeout...

#6Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David G. Johnston (#3)
Re: A new function to wait for the backend exit after termination

Thanks for the feedback.

On Wed, Oct 21, 2020 at 8:01 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Wed, Oct 21, 2020 at 6:13 AM Magnus Hagander <magnus@hagander.net> wrote:

I think it would be nicer to have a pg_terminate_backend(pid, wait=false), so a function with a second parameter which defaults to the current behaviour of not waiting. And it might be a good idea to also give it a timeout parameter?

Agreed on the overload, and the timeouts make sense too - with the caller deciding whether a timeout results in a failure or a false return value.

If the backend is terminated within the user specified timeout then
the function returns true, otherwise false.

2. pg_wait_backend() -- which waits for a given backend process. Note that this function has to be used carefully after pg_terminate_backend(), if used on a backend that's not ternmited it simply keeps waiting in a loop.

It seems this one also very much would need a timeout value.

Is there a requirement for waiting to be superuser only? You are not affecting any session but your own during the waiting period.

IIUC, in the same patch instead of returning an error in case of
non-superusers, do we need to wait for user provided timeout
milliseconds until the current user becomes superuser and then throw
error if still non-superuser, and proceed further if superuser?

Do we need to have a new function that waits until a current
non-superuser in a session becomes superuser?

Something else?

I could imagine, in theory at least, wanting to wait for a backend to go idle as well as for it disappearing. Scope creep in terms of this patch's goal but worth at least considering now.

IIUC, do we need a new option, something like pg_wait_backend(pid,
timeout, waituntil) where "waituntil" if specified "idle" waits until
the given backend goes to idle mode, or "termination" waits until
termination?

If my understanding is wrong, could you please explain more?

#7David G. Johnston
david.g.johnston@gmail.com
In reply to: Bharath Rupireddy (#6)
Re: A new function to wait for the backend exit after termination

On Wednesday, October 21, 2020, Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:

Thanks for the feedback.

On Wed, Oct 21, 2020 at 8:01 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Wed, Oct 21, 2020 at 6:13 AM Magnus Hagander <magnus@hagander.net>

wrote:

I think it would be nicer to have a pg_terminate_backend(pid,

wait=false), so a function with a second parameter which defaults to the
current behaviour of not waiting. And it might be a good idea to also give
it a timeout parameter?

Agreed on the overload, and the timeouts make sense too - with the

caller deciding whether a timeout results in a failure or a false return
value.

If the backend is terminated within the user specified timeout then
the function returns true, otherwise false.

I’m suggesting an option for the second case to fail instead of returning
false.

2. pg_wait_backend() -- which waits for a given backend process. Note

that this function has to be used carefully after pg_terminate_backend(),
if used on a backend that's not ternmited it simply keeps waiting in a loop.

It seems this one also very much would need a timeout value.

Is there a requirement for waiting to be superuser only? You are not

affecting any session but your own during the waiting period.

IIUC, in the same patch instead of returning an error in case of
non-superusers, do we need to wait for user provided timeout
milliseconds until the current user becomes superuser and then throw
error if still non-superuser, and proceed further if superuser?

Do we need to have a new function that waits until a current
non-superuser in a session becomes superuser?

Something else?

Not sure how that would even be possible mid-statement. I was suggesting
removing the superuser check altogether and letting any user execute “wait”.

I could imagine, in theory at least, wanting to wait for a backend to go

idle as well as for it disappearing. Scope creep in terms of this patch's
goal but worth at least considering now.

IIUC, do we need a new option, something like pg_wait_backend(pid,
timeout, waituntil) where "waituntil" if specified "idle" waits until
the given backend goes to idle mode, or "termination" waits until
termination?

If my understanding is wrong, could you please explain more?

Yes, this describes what i was thinking.

David J.

#8Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David G. Johnston (#7)
Re: A new function to wait for the backend exit after termination

On Thu, Oct 22, 2020 at 8:39 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

If the backend is terminated within the user specified timeout then
the function returns true, otherwise false.

I’m suggesting an option for the second case to fail instead of returning false.

That seems fine.

I could imagine, in theory at least, wanting to wait for a backend to go idle as well as for it disappearing. Scope creep in terms of this patch's goal but worth at least considering now.

IIUC, do we need a new option, something like pg_wait_backend(pid,
timeout, waituntil) where "waituntil" if specified "idle" waits until
the given backend goes to idle mode, or "termination" waits until
termination?

If my understanding is wrong, could you please explain more?

Yes, this describes what i was thinking.

+1.

I will implement these functionality and post a new patch soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#9Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Magnus Hagander (#2)
1 attachment(s)
Re: A new function to wait for the backend exit after termination

On Wed, Oct 21, 2020 at 6:43 PM Magnus Hagander <magnus@hagander.net> wrote:

I think it would be nicer to have a pg_terminate_backend(pid, wait=false), so a function with a second parameter which defaults to the current behaviour of not waiting. And it might be a good idea to also give it a timeout parameter?

Done.

2. pg_wait_backend() -- which waits for a given backend process. Note that this function has to be used carefully after pg_terminate_backend(), if used on a backend that's not ternmited it simply keeps waiting in a loop.

It seems this one also very much would need a timeout value.

Done.

And surely we should show some sort of wait event when it's waiting.

Added two wait events.

If the backend is terminated within the user specified timeout then
the function returns true, otherwise false.

I’m suggesting an option for the second case to fail instead of returning false.

Done.

I could imagine, in theory at least, wanting to wait for a backend to go idle as well as for it disappearing. Scope creep in terms of this patch's goal but worth at least considering now.

IIUC, do we need a new option, something like pg_wait_backend(pid,
timeout, waituntil) where "waituntil" if specified "idle" waits until
the given backend goes to idle mode, or "termination" waits until
termination?

Done.

Attaching a v2 patch herewith.

Thoughts and feedback are welcome.

Below things are still pending, which I plan to work on soon:

1. More testing and addition of test cases into the regression test suite.
2. Addition of the new function information into the docs.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v2-0001-pg_wait_backend-and-pg_terminate_backend-with-wai.patchapplication/x-patch; name=v2-0001-pg_wait_backend-and-pg_terminate_backend-with-wai.patchDownload
From da1d6d54d877689dac4c2cc5338e15b73e0c6979 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Wed, 28 Oct 2020 06:56:16 +0530
Subject: [PATCH v2] pg_wait_backend() and pg_terminate_backend() with wait and
 timeout options

---
 src/backend/postmaster/pgstat.c       | 175 ++++++++++++++++----------
 src/backend/storage/ipc/signalfuncs.c | 164 ++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat       |   9 ++
 src/include/pgstat.h                  |   6 +-
 4 files changed, 290 insertions(+), 64 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 822f0ebc62..c077f3c072 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3778,6 +3778,89 @@ pgstat_get_wait_event(uint32 wait_event_info)
 	return event_name;
 }
 
+/* ----------
+ * pgstat_get_backend_status_entry() -
+ *
+ * Return the entry of the backend with the given PID from the backend status
+ * array. This looks directly at the BackendStatusArray, and so will provide
+ * current information regardless of the age of our transaction's snapshot of
+ * the status array.
+ * ----------
+ */
+bool
+pgstat_get_backend_status_entry(int pid, PgBackendStatus *bestatusentry)
+{
+	PgBackendStatus *beentry;
+	int			i;
+
+	beentry = BackendStatusArray;
+	for (i = 1; i <= MaxBackends; i++)
+	{
+		/*
+		 * Although we expect the target backend's entry to be stable, that
+		 * doesn't imply that anyone else's is.  To avoid identifying the
+		 * wrong backend, while we check for a match to the desired PID we
+		 * must follow the protocol of retrying if st_changecount changes
+		 * while we examine the entry, or if it's odd.  (This might be
+		 * unnecessary, since fetching or storing an int is almost certainly
+		 * atomic, but let's play it safe.)  We use a volatile pointer here to
+		 * ensure the compiler doesn't try to get cute.
+		 */
+		volatile PgBackendStatus *vbeentry = beentry;
+		bool		found;
+
+		for (;;)
+		{
+			int			before_changecount;
+			int			after_changecount;
+
+			pgstat_begin_read_activity(vbeentry, before_changecount);
+
+			found = (vbeentry->st_procpid == pid);
+
+			pgstat_end_read_activity(vbeentry, after_changecount);
+
+			if (pgstat_read_activity_complete(before_changecount,
+											  after_changecount))
+				break;
+
+			/* Make sure we can break out of loop if stuck... */
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (found)
+		{
+			/* Now it is safe to use the non-volatile pointer */
+			bestatusentry = beentry;
+			return true;
+		}
+
+		beentry++;
+	}
+
+	bestatusentry = NULL;
+	return false;
+}
+
+/* ----------
+ * pgstat_get_backend_state() -
+ *
+ * Return a pointer to the state of the backend with the specified PID.
+ * ----------
+ */
+BackendState *
+pgstat_get_backend_state(int pid)
+{
+	PgBackendStatus *beentry = NULL;
+	bool found = pgstat_get_backend_status_entry(pid, beentry);
+
+	if(found && beentry != NULL)
+		return &beentry->st_state;
+
+	/* PID not found */
+	return NULL;
+}
+
 /* ----------
  * pgstat_get_wait_activity() -
  *
@@ -4021,6 +4104,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_XACT_GROUP_UPDATE:
 			event_name = "XactGroupUpdate";
 			break;
+		case WAIT_EVENT_BACKEND_TERMINATION:
+			event_name = "BackendTermination";
+			break;
+		case WAIT_EVENT_BACKEND_IDLE:
+			event_name = "BackendIdle";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
@@ -4307,78 +4396,38 @@ pgstat_get_wait_io(WaitEventIO w)
 /* ----------
  * pgstat_get_backend_current_activity() -
  *
- *	Return a string representing the current activity of the backend with
- *	the specified PID.  This looks directly at the BackendStatusArray,
- *	and so will provide current information regardless of the age of our
- *	transaction's snapshot of the status array.
- *
- *	It is the caller's responsibility to invoke this only for backends whose
- *	state is expected to remain stable while the result is in use.  The
- *	only current use is in deadlock reporting, where we can expect that
- *	the target backend is blocked on a lock.  (There are corner cases
- *	where the target's wait could get aborted while we are looking at it,
- *	but the very worst consequence is to return a pointer to a string
- *	that's been changed, so we won't worry too much.)
- *
- *	Note: return strings for special cases match pg_stat_get_backend_activity.
+ * Return a string representing the current activity of the backend with
+ * the specified PID.
+ *
+ * It is the caller's responsibility to invoke this only for backends whose
+ * state is expected to remain stable while the result is in use.  The only
+ * current use is in deadlock reporting, where we can expect that the target
+ * backend is blocked on a lock.  (There are corner cases where the target's
+ * wait could get aborted while we are looking at it, but the very worst
+ * consequence is to return a pointer to a string that's been changed, so we
+ * won't worry too much.)
+ *
+ * Note: return strings for special cases match pg_stat_get_backend_activity.
  * ----------
  */
 const char *
 pgstat_get_backend_current_activity(int pid, bool checkUser)
 {
-	PgBackendStatus *beentry;
-	int			i;
+	PgBackendStatus *beentry = NULL;
+	bool found = pgstat_get_backend_status_entry(pid, beentry);
 
-	beentry = BackendStatusArray;
-	for (i = 1; i <= MaxBackends; i++)
+	if (found && beentry != NULL)
 	{
-		/*
-		 * Although we expect the target backend's entry to be stable, that
-		 * doesn't imply that anyone else's is.  To avoid identifying the
-		 * wrong backend, while we check for a match to the desired PID we
-		 * must follow the protocol of retrying if st_changecount changes
-		 * while we examine the entry, or if it's odd.  (This might be
-		 * unnecessary, since fetching or storing an int is almost certainly
-		 * atomic, but let's play it safe.)  We use a volatile pointer here to
-		 * ensure the compiler doesn't try to get cute.
-		 */
-		volatile PgBackendStatus *vbeentry = beentry;
-		bool		found;
-
-		for (;;)
-		{
-			int			before_changecount;
-			int			after_changecount;
-
-			pgstat_begin_read_activity(vbeentry, before_changecount);
-
-			found = (vbeentry->st_procpid == pid);
-
-			pgstat_end_read_activity(vbeentry, after_changecount);
-
-			if (pgstat_read_activity_complete(before_changecount,
-											  after_changecount))
-				break;
-
-			/* Make sure we can break out of loop if stuck... */
-			CHECK_FOR_INTERRUPTS();
-		}
-
-		if (found)
+		/* Now it is safe to use the non-volatile pointer */
+		if (checkUser && !superuser() && beentry->st_userid != GetUserId())
+			return "<insufficient privilege>";
+		else if (*(beentry->st_activity_raw) == '\0')
+			return "<command string not enabled>";
+		else
 		{
-			/* Now it is safe to use the non-volatile pointer */
-			if (checkUser && !superuser() && beentry->st_userid != GetUserId())
-				return "<insufficient privilege>";
-			else if (*(beentry->st_activity_raw) == '\0')
-				return "<command string not enabled>";
-			else
-			{
-				/* this'll leak a bit of memory, but that seems acceptable */
-				return pgstat_clip_activity(beentry->st_activity_raw);
-			}
+			/* this'll leak a bit of memory, but that seems acceptable */
+			return pgstat_clip_activity(beentry->st_activity_raw);
 		}
-
-		beentry++;
 	}
 
 	/* If we get here, caller is in error ... */
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index d822e82cb9..8a9b8ef254 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -17,13 +17,16 @@
 #include <signal.h>
 
 #include "catalog/pg_authid.h"
+#include "catalog/pg_collation.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/syslogger.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/formatting.h"
 
 
 /*
@@ -44,6 +47,18 @@
 #define SIGNAL_BACKEND_ERROR 1
 #define SIGNAL_BACKEND_NOPERMISSION 2
 #define SIGNAL_BACKEND_NOSUPERUSER 3
+
+/* ----------
+ * The types of backend wait mode
+ * ----------
+ */
+typedef enum BackendWaitMode
+{
+	PGSTAT_BACKEND_UNDEFINED,
+	PGSTAT_BACKEND_TERMINATE,
+	PGSTAT_BACKEND_IDLE
+} BackendWaitMode;
+
 static int
 pg_signal_backend(int pid, int sig)
 {
@@ -149,6 +164,155 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
 }
 
+/*
+ * Wait until there is no backend process with given PID or wait until timeout.
+ * On timeout, an error is thrown to the user.
+ */
+static void
+pg_wait_until_termination_or_idle(int pid,
+								  float8 timeout,
+								  BackendWaitMode mode)
+{
+	float8	sleeptime = 50000;
+	float8	totalsleeptime = 0;
+	float8	remainingtime = 0;
+
+	if (timeout < 0)
+		ereport(ERROR,
+				(errmsg("timeout cannot be negative: %f", timeout)));
+
+	remainingtime = timeout * 1000L;
+
+	if (mode == PGSTAT_BACKEND_TERMINATE)
+	{
+		while (kill(pid, 0) == 0)
+		{
+			if (remainingtime < sleeptime)
+				sleeptime = remainingtime;
+
+			pgstat_report_wait_start(WAIT_EVENT_BACKEND_TERMINATION);
+			pg_usleep(sleeptime);
+			pgstat_report_wait_end();
+
+			totalsleeptime += sleeptime;
+
+			if (totalsleeptime >= timeout * 1000L)
+				ereport(ERROR,
+						(errmsg("could not check PID %d liveness due to timeout",
+								pid)));
+
+			remainingtime = timeout * 1000L - totalsleeptime;
+		}
+
+		if (errno != ESRCH)
+			ereport(ERROR,
+					(errmsg("could not check PID %d liveness: %m", pid)));
+	}
+	else if (mode == PGSTAT_BACKEND_IDLE)
+	{
+		BackendState *state;
+
+		while (1)
+		{
+			state = pgstat_get_backend_state(pid);
+
+			if (state != NULL &&
+				*state == STATE_IDLE)
+				break;
+
+			if (remainingtime < sleeptime)
+				sleeptime = remainingtime;
+
+			pgstat_report_wait_start(WAIT_EVENT_BACKEND_IDLE);
+			pg_usleep(sleeptime);
+			pgstat_report_wait_end();
+
+			totalsleeptime += sleeptime;
+
+			if (totalsleeptime >= timeout * 1000L)
+				ereport(ERROR,
+						(errmsg("could not wait until backend with PID %d becomes idle due to timeout",
+								pid)));
+
+			remainingtime = timeout * 1000L - totalsleeptime;
+		}
+	}
+}
+
+/*
+ * Signal to terminate a backend process and wait until timeout or no process
+ * has a given PID. If the wait input argument is false, this function
+ * behaviour is same as pg_terminate_backend. On timeout, an error is thrown to
+ * the user. Self termination is allowed but waiting is not. Because once the
+ * backend is self terminated there is no point in waiting for it to go away.
+ */
+Datum
+pg_terminate_backend_and_wait(PG_FUNCTION_ARGS)
+{
+	int 	pid = PG_GETARG_DATUM(0);
+	bool	wait = PG_GETARG_BOOL(1);
+	bool	r;
+
+	/* Self termination is allowed but waiting is not. */
+	if (pid == MyProcPid && wait)
+		wait = false;
+
+	r = DatumGetBool(DirectFunctionCall1(pg_terminate_backend, pid));
+
+	if (wait && r)
+		pg_wait_until_termination_or_idle(pid,
+										  PG_GETARG_FLOAT8(2),
+										  PGSTAT_BACKEND_TERMINATE);
+
+	PG_RETURN_BOOL(r);
+}
+
+/*
+ * This function waits for a backend with given PID to become idle or be
+ * terminated or it gets timed out. If the PID given is of the current backend
+ * that issued this function, then, in wait until idle mode mostly it gets
+ * timed out because the backend never reaches idle state. In wait until
+ * termination mode, either timeout can occur or the function can succeed. It
+ * depends on whether the backend itself got teriminated or someone else
+ * has terminated the backend.
+ */
+Datum
+pg_wait_backend(PG_FUNCTION_ARGS)
+{
+	int				pid = PG_GETARG_INT32(0);
+	PGPROC	   		*proc = BackendPidGetProc(pid);
+	char	   		*wait_until = NULL;
+	char 	   		*wait_until_lower = NULL;
+	BackendWaitMode	mode = PGSTAT_BACKEND_UNDEFINED;
+
+	if (proc == NULL)
+	{
+		ereport(WARNING,
+				(errmsg("PID %d is not a PostgreSQL server process",
+				 pid)));
+		PG_RETURN_BOOL(false);
+	}
+
+	wait_until = text_to_cstring(PG_GETARG_TEXT_PP(1));
+	wait_until_lower = str_tolower(wait_until, strlen(wait_until),
+								 DEFAULT_COLLATION_OID);
+
+	if (strcmp(wait_until_lower, "termination") == 0)
+		mode = PGSTAT_BACKEND_TERMINATE;
+	else if(strcmp(wait_until_lower, "idle") == 0)
+		mode = PGSTAT_BACKEND_IDLE;
+	else
+		ereport(ERROR,
+				(errmsg("specified wait until parameter \"%s\" is invalid",
+						 wait_until)));
+
+	pg_wait_until_termination_or_idle(pid, PG_GETARG_FLOAT8(2), mode);
+
+	pfree(wait_until_lower);
+
+	PG_RETURN_BOOL(true);
+}
+
 /*
  * Signal to reload the database configuration
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index bbcac69d48..e04d6b0a9e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6085,6 +6085,15 @@
 { oid => '2096', descr => 'terminate a server process',
   proname => 'pg_terminate_backend', provolatile => 'v', prorettype => 'bool',
   proargtypes => 'int4', prosrc => 'pg_terminate_backend' },
+{ oid => '16386', descr => 'terminate a server process and wait for it\'s exit or until timeout occurs',
+  proname => 'pg_terminate_backend', provolatile => 'v',
+  prorettype => 'bool', proargtypes => 'int4 bool float8',
+  proargnames => '{pid,wait,timeout}',
+  prosrc => 'pg_terminate_backend_and_wait' },
+{ oid => '16387', descr => 'waits for a backend\'s exit or until the backend goes to idle mode or timeout occurs',
+  proname => 'pg_wait_backend', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'int4 text float8', proargnames => '{pid,wait_until,timeout}',
+  prosrc => 'pg_wait_backend' },
 { oid => '2172', descr => 'prepare for taking an online backup',
   proname => 'pg_start_backup', provolatile => 'v', proparallel => 'r',
   prorettype => 'pg_lsn', proargtypes => 'text bool bool',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a821ff4f15..fc44aa6173 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -952,7 +952,9 @@ typedef enum
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
 	WAIT_EVENT_SYNC_REP,
-	WAIT_EVENT_XACT_GROUP_UPDATE
+	WAIT_EVENT_XACT_GROUP_UPDATE,
+	WAIT_EVENT_BACKEND_TERMINATION,
+	WAIT_EVENT_BACKEND_IDLE
 } WaitEventIPC;
 
 /* ----------
@@ -1399,6 +1401,8 @@ extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
 extern const char *pgstat_get_wait_event(uint32 wait_event_info);
 extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
+extern bool pgstat_get_backend_status_entry(int pid, PgBackendStatus *bestatusentry);
+extern BackendState *pgstat_get_backend_state(int pid);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
-- 
2.25.1

#10Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Bharath Rupireddy (#9)
Re: A new function to wait for the backend exit after termination

On 2020/10/28 20:50, Bharath Rupireddy wrote:

On Wed, Oct 21, 2020 at 6:43 PM Magnus Hagander <magnus@hagander.net> wrote:

I think it would be nicer to have a pg_terminate_backend(pid, wait=false), so a function with a second parameter which defaults to the current behaviour of not waiting. And it might be a good idea to also give it a timeout parameter?

Done.

2. pg_wait_backend() -- which waits for a given backend process. Note that this function has to be used carefully after pg_terminate_backend(), if used on a backend that's not ternmited it simply keeps waiting in a loop.

It seems this one also very much would need a timeout value.

Done.

And surely we should show some sort of wait event when it's waiting.

Added two wait events.

If the backend is terminated within the user specified timeout then
the function returns true, otherwise false.

I’m suggesting an option for the second case to fail instead of returning false.

Done.

I prefer that false is returned when the timeout happens,
like pg_promote() does.

I could imagine, in theory at least, wanting to wait for a backend to go idle as well as for it disappearing. Scope creep in terms of this patch's goal but worth at least considering now.

IIUC, do we need a new option, something like pg_wait_backend(pid,
timeout, waituntil) where "waituntil" if specified "idle" waits until
the given backend goes to idle mode, or "termination" waits until
termination?

Isn't this wait-for-idle mode fragile? Because there is no guarantee
that the backend is still in idle state when pg_wait_backend(idle) returns.

Done.

Attaching a v2 patch herewith.

Thoughts and feedback are welcome.

Thanks for the patch!

When the specified timeout is negative, the following error is thrown *after*
SIGTERM is signaled to the target backend. This seems strange to me.
The timeout value should be verified at the beginning of the function, instead.

ERROR: timeout cannot be negative

pg_terminate_backend(xxx, false) failed with the following error. I think
it's more helpful if the function can work even without the timeout value.
That is, what about redefining the function in src/backend/catalog/system_views.sql
and specifying the DEFAULT values for the arguments "wait" and "timeout"?
The similar function "pg_promote" would be good reference to you.

ERROR: function pg_terminate_backend(integer, boolean) does not exist at character 8

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#11Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Fujii Masao (#10)
Re: A new function to wait for the backend exit after termination

Thanks for the comments.

On Wed, Oct 28, 2020 at 6:41 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

I prefer that false is returned when the timeout happens,
like pg_promote() does.

Earlier it was suggested to error out on timeout. Since users can not
guess on time it takes to terminate or become idle, throwing error
seems to be odd on timeout. And also in case if the given pid is not a
backend pid, we are throwing a warning and returning false but not
error. Similarly we can return false on timeout, if required a
warning. Thoughts?

IIUC, do we need a new option, something like pg_wait_backend(pid,
timeout, waituntil) where "waituntil" if specified "idle" waits until
the given backend goes to idle mode, or "termination" waits until
termination?

Isn't this wait-for-idle mode fragile? Because there is no guarantee
that the backend is still in idle state when pg_wait_backend(idle) returns.

Yeah this can happen. By the time pg_wait_backend returns we could
have the idle state of the backend changed. Looks like this is also a
problem with the existing pgstat_get_backend_current_activity()
function. There we have a comment saying below and the function
returns a pointer to the current activity string. Maybe we could have
similar comments about the usage in the document?

* It is the caller's responsibility to invoke this only for backends whose
* state is expected to remain stable while the result is in use.

Does this problem exist even if we use pg_stat_activity()?

When the specified timeout is negative, the following error is thrown *after*
SIGTERM is signaled to the target backend. This seems strange to me.
The timeout value should be verified at the beginning of the function, instead.

ERROR: timeout cannot be negative

Okay. I will change that.

pg_terminate_backend(xxx, false) failed with the following error. I think
it's more helpful if the function can work even without the timeout value.
That is, what about redefining the function in src/backend/catalog/system_views.sql
and specifying the DEFAULT values for the arguments "wait" and "timeout"?
The similar function "pg_promote" would be good reference to you.

ERROR: function pg_terminate_backend(integer, boolean) does not exist at character 8

Yeah. This seems good. I will have false as default value for the wait
parameter. I have defined the timeout to be in milliseconds, then how
about having a default value of 100 milliseconds?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#12David G. Johnston
david.g.johnston@gmail.com
In reply to: Bharath Rupireddy (#11)
Re: A new function to wait for the backend exit after termination

On Wed, Oct 28, 2020 at 6:50 AM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:

Thanks for the comments.

On Wed, Oct 28, 2020 at 6:41 PM Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:

I prefer that false is returned when the timeout happens,
like pg_promote() does.

Earlier it was suggested to error out on timeout.

For consideration. I'll give a point for being consistent with other
existing functions, and it wouldn't be hard to extend should we want to add
the option later, so while the more flexible API seems better on its face
limiting ourselves to boolean false isn't a big deal to me; especially as
I've yet to write code that would make use of this feature.

Since users can not

guess on time it takes to terminate or become idle, throwing error
seems to be odd on timeout.

I don't see how the one follows from the other.

And also in case if the given pid is not a

backend pid, we are throwing a warning and returning false but not
error.

Similarly we can return false on timeout, if required a

warning. Thoughts?

IMO, if there are multiple ways to return false then all of them should
emit a notice or warning describing which of the false conditions was hit.

IIUC, do we need a new option, something like pg_wait_backend(pid,
timeout, waituntil) where "waituntil" if specified "idle" waits until
the given backend goes to idle mode, or "termination" waits until
termination?

Isn't this wait-for-idle mode fragile? Because there is no guarantee
that the backend is still in idle state when pg_wait_backend(idle)

returns.

I was thinking this would be useful for orchestration. However, as you
say, its a pretty fragile method. I withdraw the suggestion. What I would
replace it with is a pg_wait_for_notify(payload_test) function that allows
an SQL user to plug itself into the listen/notify feature and pause the
session until a notification arrives. The session it is coordinating with
would simply notify just before ending its script/transaction.

David J.

#13Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David G. Johnston (#12)
Re: A new function to wait for the backend exit after termination

On Wed, Oct 28, 2020 at 7:51 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

And also in case if the given pid is not a
backend pid, we are throwing a warning and returning false but not
error.

Similarly we can return false on timeout, if required a
warning. Thoughts?

IMO, if there are multiple ways to return false then all of them should emit a notice or warning describing which of the false conditions was hit.

Currently there are two possibilities in pg_teriminate_backend where a
warning is thrown and false is returned. 1. when the process with a
given pid is not a backend 2. when we can not send the SIGTERM to the
given backend.

I will add another case to throw the warning and return false when
timeout occurs.

IIUC, do we need a new option, something like pg_wait_backend(pid,
timeout, waituntil) where "waituntil" if specified "idle" waits until
the given backend goes to idle mode, or "termination" waits until
termination?

Isn't this wait-for-idle mode fragile? Because there is no guarantee
that the backend is still in idle state when pg_wait_backend(idle) returns.

I was thinking this would be useful for orchestration. However, as you say, its a pretty fragile method. I withdraw the suggestion.

So, pg_wait_backend(pid, timeout) waits until the backend with a given
pid is terminated?

What I would replace it with is a pg_wait_for_notify(payload_test) function that allows an SQL user to plug itself into the listen/notify feature and pause the session until a notification arrives. The session it is coordinating with would >simply notify just before ending its script/transaction.

Why does one session need to listen and wait until another session
notifies? If my understanding is wrong, could you please elaborate on
the above point, the usage and the use case?

For consideration. I'll give a point for being consistent with other existing functions, and it wouldn't be hard to extend should we want to add the option later, so while the more flexible API seems better on its face limiting ourselves to >boolean false isn't a big deal to me; especially as I've yet to write code that would make use of this feature.

I see that this pg_wait_backend(pid, timeout) functionality can be
right away used in two places, one in dblink.sql where wait_pid is
being used, second in postgres_fdw.sql where
terminate_backend_and_wait() is being used. However we can make these
changes as part of another patch set after the proposed two new
functions are finalized and reviewed.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#14David G. Johnston
david.g.johnston@gmail.com
In reply to: Bharath Rupireddy (#13)
Re: A new function to wait for the backend exit after termination

On Wed, Oct 28, 2020 at 10:14 PM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Oct 28, 2020 at 7:51 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

I was thinking this would be useful for orchestration. However, as you

say, its a pretty fragile method. I withdraw the suggestion.

So, pg_wait_backend(pid, timeout) waits until the backend with a given
pid is terminated?

Yes. The original proposal.

What I would replace it with is a pg_wait_for_notify(payload_test)

function that allows an SQL user to plug itself into the listen/notify
feature and pause the session until a notification arrives. The session it
is coordinating with would >simply notify just before ending its
script/transaction.

Why does one session need to listen and wait until another session
notifies? If my understanding is wrong, could you please elaborate on
the above point, the usage and the use case?

Theory, but I imagine writing an isolation test like test script where the
two sessions wait for notifications instead of sleep for random amounts of
time.

More generally, psql is very powerful but doesn't allow scripting to plug
into pub/sub. I don't have a concrete use case for why it should but the
capability doesn't seem far-fetched.

I'm not saying this is something that is needed, rather it would seem more
useful than wait_for_idle.

David J.

#15Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Fujii Masao (#10)
1 attachment(s)
Re: A new function to wait for the backend exit after termination

On Wed, Oct 28, 2020 at 6:41 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

I prefer that false is returned when the timeout happens,
like pg_promote() does.

Done.

When the specified timeout is negative, the following error is thrown *after*
SIGTERM is signaled to the target backend. This seems strange to me.
The timeout value should be verified at the beginning of the function, instead.

ERROR: timeout cannot be negative

I'm not throwing error for this case, instead a warning and returning
false. This is to keep it consistent with other cases such as the
given pid is not a backend pid.

Attaching the v3 patch. I tried to address the review comments
received so far and added documentation. I tested the patch locally
here. I saw that we don't have any test cases for existing
pg_terminate_backend(), do we need to add test cases into regression
suites for these two new functions?

Please review the v3 patch and let me know comments.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v3-0001-pg_terminate_backend-with-wait-timeout-and-pg_wai.patchapplication/x-patch; name=v3-0001-pg_terminate_backend-with-wait-timeout-and-pg_wai.patchDownload
From 6d6602d6caa0773327444ec1bc1ead2f7afc5c30 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Sat, 31 Oct 2020 15:56:09 +0530
Subject: [PATCH v1] pg_terminate_backend with wait, timeout and
 pg_wait_backend with timeout

This patch adds two new functions:
1. pg_terminate_backend(pid, wait, timeout) - terminate and
wait or timeout for a given backend.
2. pg_wait_backend(pid, timeout) - check the existence of the
backend with a given PID and wait or timeout until it goes away.
---
 doc/src/sgml/func.sgml                |  28 +++++-
 src/backend/catalog/system_views.sql  |  10 ++
 src/backend/postmaster/pgstat.c       |   3 +
 src/backend/storage/ipc/signalfuncs.c | 135 ++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat       |   9 ++
 src/include/pgstat.h                  |   3 +-
 6 files changed, 185 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index d8eee3a826..0472f9166f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -23972,7 +23972,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <indexterm>
          <primary>pg_terminate_backend</primary>
         </indexterm>
-        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type> )
+        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>wait</parameter> <type>boolean</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>100</literal> )
         <returnvalue>boolean</returnvalue>
        </para>
        <para>
@@ -23980,7 +23980,31 @@ SELECT collation for ('foo' COLLATE "de_DE");
         specified process ID.  This is also allowed if the calling role
         is a member of the role whose backend is being terminated or the
         calling role has been granted <literal>pg_signal_backend</literal>,
-        however only superusers can terminate superuser backends.
+        however only superusers can terminate superuser backends. When no
+        <parameter>wait</parameter> and <parameter>timeout</parameter> are
+        provided, only SIGTERM is sent to the backend with the given process
+        ID and <literal>false</literal> is returned immediately. But the
+        backend may still exist. With <parameter>wait</parameter> specified as
+        <literal>true</literal>, the function waits for the backend termination
+        until the given <parameter>timeout</parameter> or the default 100
+        milliseconds. On timeout <literal>false</literal> is returned.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_wait_backend</primary>
+        </indexterm>
+        <function>pg_wait_backend</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>100</literal> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Check the existence of the session whose backend process has the
+        specified process ID. On success return <literal>true</literal>. This
+        function waits until the given <parameter>timeout</parameter> or the
+        default 100 milliseconds. On timeout <literal>false</literal> is
+        returned.
        </para></entry>
       </row>
      </tbody>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5171ea05c7..4a5ce09817 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1243,6 +1243,16 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION
+  pg_terminate_backend(pid integer, wait boolean, timeout int8 DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_terminate_backend_and_wait'
+  PARALLEL UNSAFE;
+
+CREATE OR REPLACE FUNCTION
+  pg_wait_backend(pid integer, timeout int8 DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_wait_backend'
+  PARALLEL UNSAFE;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f1dca2f25b..4eb11382cd 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4024,6 +4024,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_XACT_GROUP_UPDATE:
 			event_name = "XactGroupUpdate";
 			break;
+		case WAIT_EVENT_BACKEND_TERMINATION:
+			event_name = "BackendTermination";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index d822e82cb9..4ed0c135b0 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -18,6 +18,7 @@
 
 #include "catalog/pg_authid.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/syslogger.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
@@ -149,6 +150,140 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
 }
 
+/*
+ * Wait until there is no backend process with the given PID or timeout. On
+ * timeout, a warning is thrown to the user.
+ */
+static bool
+pg_wait_until_termination(int pid, int64 timeout)
+{
+	/*
+	 * Wait in steps of waittime milliseconds until this function exits or
+	 * timeout.
+	 */
+	int64	waittime = 10;
+	/*
+	 * Initially remaining time is the entire timeout specified by the user.
+	 */
+	int64	remainingtime = timeout;
+
+	/*
+	 * Check the backend existence. If doesn't exist, then check for the errno
+	 * ESRCH that confirms it and return true. If still exist, then wait for
+	 * waittime milliseconds, again check for the existence. Repeat until
+	 * timeout or an error occurs or a pending interrupt such as query cancel
+	 * is processed.
+	 */
+	for (;;)
+	{
+		if (remainingtime < waittime)
+			waittime = remainingtime;
+
+		if (kill(pid, 0) == -1)
+		{
+			if (errno == ESRCH)
+				return true;
+			else
+			{
+				ereport(WARNING,
+						(errmsg("could not check the existence of the backend with PID %d: %m",
+								pid)));
+				return false;
+			}
+		}
+
+		/* Process interrupts, if any, before going for wait. */
+		CHECK_FOR_INTERRUPTS();
+
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						 waittime,
+						 WAIT_EVENT_BACKEND_TERMINATION);
+
+		remainingtime -= waittime;
+
+		if (remainingtime <= 0)
+			break;
+	}
+
+	ereport(WARNING,
+			(errmsg("could not wait for the termination of the backend with PID %d within %ld milliseconds",
+					pid, timeout)));
+
+	return false;
+}
+
+/*
+ * Signal termination of a backend process, wait until timeout or no backend
+ * has the given PID. If the wait input argument is false, this function
+ * behaviour is same as pg_terminate_backend. On timeout, a warning is thrown
+ * to the user. Self termination is allowed but waiting is not, because once
+ * the backend is self terminated there is no point in waiting for it to go
+ * away.
+ */
+Datum
+pg_terminate_backend_and_wait(PG_FUNCTION_ARGS)
+{
+	int 	pid = PG_GETARG_DATUM(0);
+	bool	wait = PG_GETARG_BOOL(1);
+	int64	timeout = PG_GETARG_INT64(2);
+	bool	r = false;
+
+	if (timeout <= 0)
+	{
+		ereport(WARNING,
+				(errmsg("timeout cannot be negative or zero: %ld", timeout)));
+		PG_RETURN_BOOL(r);
+	}
+
+	/* Self termination is allowed but waiting is not after that. */
+	if (pid == MyProcPid && wait)
+		wait = false;
+
+	r = DatumGetBool(DirectFunctionCall1(pg_terminate_backend, pid));
+
+	/* Wait only if requested and the termination is successful. */
+	if (wait && r)
+		r = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(r);
+}
+
+/*
+ * This function waits for a backend with the given PID to be terminated or
+ * until the timeout occurs. If the PID is of the backend which issued this
+ * function, either timeout can occur or the function can succeed if the
+ * backend gets terminated by some other process, say postmaster.
+ */
+Datum
+pg_wait_backend(PG_FUNCTION_ARGS)
+{
+	int		pid = PG_GETARG_INT32(0);
+	int64  	timeout = PG_GETARG_INT64(1);
+	PGPROC	*proc = NULL;
+	bool	r = false;
+
+	if (timeout <= 0)
+	{
+		ereport(WARNING,
+				(errmsg("timeout cannot be negative or zero: %ld", timeout)));
+		PG_RETURN_BOOL(r);
+	}
+
+	proc = BackendPidGetProc(pid);
+
+	if (proc == NULL)
+	{
+		ereport(WARNING,
+				(errmsg("PID %d is not a PostgreSQL server process", pid)));
+		PG_RETURN_BOOL(r);
+	}
+
+	r = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(r);
+}
+
 /*
  * Signal to reload the database configuration
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d9770bbadd..ce6604dab6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6084,6 +6084,15 @@
 { oid => '2096', descr => 'terminate a server process',
   proname => 'pg_terminate_backend', provolatile => 'v', prorettype => 'bool',
   proargtypes => 'int4', prosrc => 'pg_terminate_backend' },
+{ oid => '16386', descr => 'terminate a backend process and wait for it\'s exit or until timeout occurs',
+  proname => 'pg_terminate_backend', provolatile => 'v',
+  prorettype => 'bool', proargtypes => 'int4 bool int8',
+  proargnames => '{pid,wait,timeout}',
+  prosrc => 'pg_terminate_backend_and_wait' },
+{ oid => '16387', descr => 'waits for a backend process exit or timeout occurs',
+  proname => 'pg_wait_backend', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'int4 int8', proargnames => '{pid,timeout}',
+  prosrc => 'pg_wait_backend' },
 { oid => '2172', descr => 'prepare for taking an online backup',
   proname => 'pg_start_backup', provolatile => 'v', proparallel => 'r',
   prorettype => 'pg_lsn', proargtypes => 'text bool bool',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 257e515bfe..d100a1c737 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -958,7 +958,8 @@ typedef enum
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
 	WAIT_EVENT_SYNC_REP,
-	WAIT_EVENT_XACT_GROUP_UPDATE
+	WAIT_EVENT_XACT_GROUP_UPDATE,
+	WAIT_EVENT_BACKEND_TERMINATION
 } WaitEventIPC;
 
 /* ----------
-- 
2.25.1

#16Muhammad Usama
m.usama@gmail.com
In reply to: Bharath Rupireddy (#15)
Re: A new function to wait for the backend exit after termination

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

I have tested the patch against current master branch (commit:6742e14959a3033d946ab3d67f5ce4c99367d332)
Both functions work without a problem and as expected.
Just a tiny comment/suggestion.
specifying a -ve timeout in pg_terminate_backed rightly throws an error,
I am not sure if it would be right or a wrong approach but I guess we can ignore -ve
timeout in pg_terminate_backend function when wait (second argument) is false.

e.g. pg_terminate_backend(12320, false,-1); -- ignore -1 timout since wait is false

The new status of this patch is: Ready for Committer

#17Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Muhammad Usama (#16)
Re: A new function to wait for the backend exit after termination

On Mon, Nov 30, 2020 at 8:10 PM Muhammad Usama <m.usama@gmail.com> wrote:

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

I have tested the patch against current master branch (commit:6742e14959a3033d946ab3d67f5ce4c99367d332)
Both functions work without a problem and as expected.

Thanks!

Just a tiny comment/suggestion.
specifying a -ve timeout in pg_terminate_backed rightly throws an error,
I am not sure if it would be right or a wrong approach but I guess we can ignore -ve
timeout in pg_terminate_backend function when wait (second argument) is false.

e.g. pg_terminate_backend(12320, false,-1); -- ignore -1 timout since wait is false

IMO, that's not a good idea. I see it this way, for any function first
the input args have to be validated. If okay, then follows the use of
those args and the main functionality. I can also see pg_promote(),
which first does the input timeout validation throwing error if it is
<= 0.

We can retain the existing behaviour.

The new status of this patch is: Ready for Committer

Thanks!

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#18Muhammad Usama
m.usama@gmail.com
In reply to: Bharath Rupireddy (#17)
Re: A new function to wait for the backend exit after termination

On Wed, Dec 2, 2020 at 1:30 PM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:

On Mon, Nov 30, 2020 at 8:10 PM Muhammad Usama <m.usama@gmail.com> wrote:

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

I have tested the patch against current master branch

(commit:6742e14959a3033d946ab3d67f5ce4c99367d332)

Both functions work without a problem and as expected.

Thanks!

Just a tiny comment/suggestion.
specifying a -ve timeout in pg_terminate_backed rightly throws an error,
I am not sure if it would be right or a wrong approach but I guess we

can ignore -ve

timeout in pg_terminate_backend function when wait (second argument) is

false.

e.g. pg_terminate_backend(12320, false,-1); -- ignore -1 timout since

wait is false

IMO, that's not a good idea. I see it this way, for any function first
the input args have to be validated. If okay, then follows the use of
those args and the main functionality. I can also see pg_promote(),
which first does the input timeout validation throwing error if it is
<= 0.

We can retain the existing behaviour.

Agreed!

Thanks
Best regards
Muhammad Usama

Show quoted text

The new status of this patch is: Ready for Committer

Thanks!

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#19Hou, Zhijie
houzj.fnst@cn.fujitsu.com
In reply to: Muhammad Usama (#18)
RE: A new function to wait for the backend exit after termination

Hi

I take a look into the patch, and here some comments.

1.
+
+	ereport(WARNING,
+			(errmsg("could not wait for the termination of the backend with PID %d within %ld milliseconds",
+					pid, timeout)));
+

The code use %ld to print int64 type.
How about use INT64_FORMAT, which looks more appropriate.

2.
+	if (timeout <= 0)
+	{
+		ereport(WARNING,
+				(errmsg("timeout cannot be negative or zero: %ld", timeout)));
+		PG_RETURN_BOOL(r);
+	}

The same as 1.

3.
+pg_terminate_backend_and_wait(PG_FUNCTION_ARGS)
+{
+	int 	pid = PG_GETARG_DATUM(0);
+pg_wait_backend(PG_FUNCTION_ARGS)
+{
+	int		pid = PG_GETARG_INT32(0);

The code use different macro to get pid,
How about use PG_GETARG_INT32(0) for each one.

I changed the status to 'wait on anthor'.
The others of the patch LGTM,
I think it can be changed to Ready for Committer again, when this comment is confirmed.

Best regards,
houzj

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hou, Zhijie (#19)
Re: A new function to wait for the backend exit after termination

"Hou, Zhijie" <houzj.fnst@cn.fujitsu.com> writes:

+	ereport(WARNING,
+			(errmsg("could not wait for the termination of the backend with PID %d within %ld milliseconds",
+					pid, timeout)));

The code use %ld to print int64 type.
How about use INT64_FORMAT, which looks more appropriate.

This is a translatable message, so INT64_FORMAT is no good -- we need
something that is the same across platforms. The current project standard
for this problem is to use "%lld" and explicitly cast the argument to long
long int to match that.

regards, tom lane

#21Hou, Zhijie
houzj.fnst@cn.fujitsu.com
In reply to: Tom Lane (#20)
RE: A new function to wait for the backend exit after termination
+	ereport(WARNING,
+			(errmsg("could not wait for the termination of the

backend with PID %d within %ld milliseconds",

+ pid, timeout)));

The code use %ld to print int64 type.
How about use INT64_FORMAT, which looks more appropriate.

This is a translatable message, so INT64_FORMAT is no good -- we need
something that is the same across platforms. The current project standard
for this problem is to use "%lld" and explicitly cast the argument to long
long int to match that.

Thank you for pointing out that,
And sorry for did not think of it.

Yes, we can use %lld, (long long int) timeout.

Best regards,
houzj

#22Hou, Zhijie
houzj.fnst@cn.fujitsu.com
In reply to: Hou, Zhijie (#19)
RE: A new function to wait for the backend exit after termination

I changed the status to 'wait on anthor'.
The others of the patch LGTM,
I think it can be changed to Ready for Committer again, when this comment
is confirmed.

I am Sorry I forgot a possible typo comment.

+{ oid => '16386', descr => 'terminate a backend process and wait for it\'s exit or until timeout occurs'

Does the following change looks better?

Wait for it\'s exit => Wait for its exit

Best regards,
houzj

#23Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Hou, Zhijie (#19)
1 attachment(s)
Re: A new function to wait for the backend exit after termination

Thanks for the review.

On Thu, Dec 3, 2020 at 7:24 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:

1.
+
+       ereport(WARNING,
+                       (errmsg("could not wait for the termination of the backend with PID %d within %ld milliseconds",
+                                       pid, timeout)));
+

The code use %ld to print int64 type.
How about use INT64_FORMAT, which looks more appropriate.

Changed it to use %lld and typecasting timeout to (long long int) as
suggested by Tom.

2.
+       if (timeout <= 0)
+       {
+               ereport(WARNING,
+                               (errmsg("timeout cannot be negative or zero: %ld", timeout)));
+               PG_RETURN_BOOL(r);
+       }

The same as 1.

Changed.

3.
+pg_terminate_backend_and_wait(PG_FUNCTION_ARGS)
+{
+       int     pid = PG_GETARG_DATUM(0);
+pg_wait_backend(PG_FUNCTION_ARGS)
+{
+       int             pid = PG_GETARG_INT32(0);

The code use different macro to get pid,
How about use PG_GETARG_INT32(0) for each one.

Changed.

I am Sorry I forgot a possible typo comment.

+{ oid => '16386', descr => 'terminate a backend process and wait for it\'s exit or until timeout occurs'

Does the following change looks better?

Wait for it\'s exit => Wait for its exit

Changed.

I changed the status to 'wait on anthor'.
The others of the patch LGTM,
I think it can be changed to Ready for Committer again, when this comment is confirmed.

Attaching v4 patch. Please have a look.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v4-0001-pg_terminate_backend-with-wait-timeout-and-pg_wai.patchapplication/x-patch; name=v4-0001-pg_terminate_backend-with-wait-timeout-and-pg_wai.patchDownload
From e3f364e8a5a06b3e122b657e668146f220549acb Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Thu, 3 Dec 2020 09:09:13 +0530
Subject: [PATCH v4] pg_terminate_backend with wait, timeout and
 pg_wait_backend with timeout

This patch adds two new functions:
1. pg_terminate_backend(pid, wait, timeout) - terminate and
wait or timeout for a given backend.
2. pg_wait_backend(pid, timeout) - check the existence of the
backend with a given PID and wait or timeout until it goes away.
---
 doc/src/sgml/func.sgml                |  28 +++++-
 src/backend/catalog/system_views.sql  |  10 ++
 src/backend/postmaster/pgstat.c       |   3 +
 src/backend/storage/ipc/signalfuncs.c | 137 ++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat       |   9 ++
 src/include/pgstat.h                  |   3 +-
 6 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index df29af6371..6b77b80336 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -23954,7 +23954,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <indexterm>
          <primary>pg_terminate_backend</primary>
         </indexterm>
-        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type> )
+        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>wait</parameter> <type>boolean</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>100</literal> )
         <returnvalue>boolean</returnvalue>
        </para>
        <para>
@@ -23962,7 +23962,31 @@ SELECT collation for ('foo' COLLATE "de_DE");
         specified process ID.  This is also allowed if the calling role
         is a member of the role whose backend is being terminated or the
         calling role has been granted <literal>pg_signal_backend</literal>,
-        however only superusers can terminate superuser backends.
+        however only superusers can terminate superuser backends. When no
+        <parameter>wait</parameter> and <parameter>timeout</parameter> are
+        provided, only SIGTERM is sent to the backend with the given process
+        ID and <literal>false</literal> is returned immediately. But the
+        backend may still exist. With <parameter>wait</parameter> specified as
+        <literal>true</literal>, the function waits for the backend termination
+        until the given <parameter>timeout</parameter> or the default 100
+        milliseconds. On timeout <literal>false</literal> is returned.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_wait_backend</primary>
+        </indexterm>
+        <function>pg_wait_backend</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>100</literal> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Check the existence of the session whose backend process has the
+        specified process ID. On success return <literal>true</literal>. This
+        function waits until the given <parameter>timeout</parameter> or the
+        default 100 milliseconds. On timeout <literal>false</literal> is
+        returned.
        </para></entry>
       </row>
      </tbody>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b140c210bc..2d1907b4a8 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1246,6 +1246,16 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION
+  pg_terminate_backend(pid integer, wait boolean, timeout int8 DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_terminate_backend_and_wait'
+  PARALLEL UNSAFE;
+
+CREATE OR REPLACE FUNCTION
+  pg_wait_backend(pid integer, timeout int8 DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_wait_backend'
+  PARALLEL UNSAFE;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 9bad14981b..cf4d9f8730 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4040,6 +4040,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_XACT_GROUP_UPDATE:
 			event_name = "XactGroupUpdate";
 			break;
+		case WAIT_EVENT_BACKEND_TERMINATION:
+			event_name = "BackendTermination";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index d822e82cb9..1d5f001efc 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -18,6 +18,7 @@
 
 #include "catalog/pg_authid.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/syslogger.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
@@ -149,6 +150,142 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
 }
 
+/*
+ * Wait until there is no backend process with the given PID or timeout. On
+ * timeout, a warning is thrown to the user.
+ */
+static bool
+pg_wait_until_termination(int pid, int64 timeout)
+{
+	/*
+	 * Wait in steps of waittime milliseconds until this function exits or
+	 * timeout.
+	 */
+	int64	waittime = 10;
+	/*
+	 * Initially remaining time is the entire timeout specified by the user.
+	 */
+	int64	remainingtime = timeout;
+
+	/*
+	 * Check the backend existence. If doesn't exist, then check for the errno
+	 * ESRCH that confirms it and return true. If still exist, then wait for
+	 * waittime milliseconds, again check for the existence. Repeat until
+	 * timeout or an error occurs or a pending interrupt such as query cancel
+	 * is processed.
+	 */
+	for (;;)
+	{
+		if (remainingtime < waittime)
+			waittime = remainingtime;
+
+		if (kill(pid, 0) == -1)
+		{
+			if (errno == ESRCH)
+				return true;
+			else
+			{
+				ereport(WARNING,
+						(errmsg("could not check the existence of the backend with PID %d: %m",
+								pid)));
+				return false;
+			}
+		}
+
+		/* Process interrupts, if any, before going for wait. */
+		CHECK_FOR_INTERRUPTS();
+
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						 waittime,
+						 WAIT_EVENT_BACKEND_TERMINATION);
+
+		remainingtime -= waittime;
+
+		if (remainingtime <= 0)
+			break;
+	}
+
+	ereport(WARNING,
+			(errmsg("could not wait for the termination of the backend with PID %d within %lld milliseconds",
+					pid, (long long int) timeout)));
+
+	return false;
+}
+
+/*
+ * Signal termination of a backend process, wait until timeout or no backend
+ * has the given PID. If the wait input argument is false, this function
+ * behaviour is same as pg_terminate_backend. On timeout, a warning is thrown
+ * to the user. Self termination is allowed but waiting is not, because once
+ * the backend is self terminated there is no point in waiting for it to go
+ * away.
+ */
+Datum
+pg_terminate_backend_and_wait(PG_FUNCTION_ARGS)
+{
+	int 	pid = PG_GETARG_INT32(0);
+	bool	wait = PG_GETARG_BOOL(1);
+	int64	timeout = PG_GETARG_INT64(2);
+	bool	r = false;
+
+	if (timeout < 1)
+	{
+		ereport(WARNING,
+				(errmsg("timeout cannot be negative or zero: %lld",
+						(long long int) timeout)));
+		PG_RETURN_BOOL(r);
+	}
+
+	/* Self termination is allowed but waiting is not after that. */
+	if (pid == MyProcPid && wait)
+		wait = false;
+
+	r = DatumGetBool(DirectFunctionCall1(pg_terminate_backend, pid));
+
+	/* Wait only if requested and the termination is successful. */
+	if (wait && r)
+		r = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(r);
+}
+
+/*
+ * This function waits for a backend with the given PID to be terminated or
+ * until the timeout occurs. If the PID is of the backend which issued this
+ * function, either timeout can occur or the function can succeed if the
+ * backend gets terminated by some other process, say postmaster.
+ */
+Datum
+pg_wait_backend(PG_FUNCTION_ARGS)
+{
+	int		pid = PG_GETARG_INT32(0);
+	int64  	timeout = PG_GETARG_INT64(1);
+	PGPROC	*proc = NULL;
+	bool	r = false;
+
+	if (timeout < 1)
+	{
+		ereport(WARNING,
+				(errmsg("timeout cannot be negative or zero: %lld",
+						(long long int) timeout)));
+		PG_RETURN_BOOL(r);
+	}
+
+	proc = BackendPidGetProc(pid);
+
+	if (proc == NULL)
+	{
+		ereport(WARNING,
+				(errmsg("PID %d is not a PostgreSQL server process", pid)));
+		PG_RETURN_BOOL(r);
+	}
+
+	r = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(r);
+}
+
 /*
  * Signal to reload the database configuration
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fc2202b843..f77eadaa54 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6094,6 +6094,15 @@
 { oid => '2096', descr => 'terminate a server process',
   proname => 'pg_terminate_backend', provolatile => 'v', prorettype => 'bool',
   proargtypes => 'int4', prosrc => 'pg_terminate_backend' },
+{ oid => '16386', descr => 'terminate a backend process and wait for its exit or until timeout occurs',
+  proname => 'pg_terminate_backend', provolatile => 'v',
+  prorettype => 'bool', proargtypes => 'int4 bool int8',
+  proargnames => '{pid,wait,timeout}',
+  prosrc => 'pg_terminate_backend_and_wait' },
+{ oid => '16387', descr => 'waits for a backend process exit or timeout occurs',
+  proname => 'pg_wait_backend', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'int4 int8', proargnames => '{pid,timeout}',
+  prosrc => 'pg_wait_backend' },
 { oid => '2172', descr => 'prepare for taking an online backup',
   proname => 'pg_start_backup', provolatile => 'v', proparallel => 'r',
   prorettype => 'pg_lsn', proargtypes => 'text bool bool',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5954068dec..c9f5668509 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -964,7 +964,8 @@ typedef enum
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
 	WAIT_EVENT_SYNC_REP,
-	WAIT_EVENT_XACT_GROUP_UPDATE
+	WAIT_EVENT_XACT_GROUP_UPDATE,
+	WAIT_EVENT_BACKEND_TERMINATION
 } WaitEventIPC;
 
 /* ----------
-- 
2.25.1

#24Hou, Zhijie
houzj.fnst@cn.fujitsu.com
In reply to: Bharath Rupireddy (#23)
RE: A new function to wait for the backend exit after termination

Hi,

-        however only superusers can terminate superuser backends.
+        however only superusers can terminate superuser backends. When no
+        <parameter>wait</parameter> and <parameter>timeout</parameter> are
+        provided, only SIGTERM is sent to the backend with the given process
+        ID and <literal>false</literal> is returned immediately. But the

I test the case when no wait and timeout are provided.
True is returned as the following which seems different from the doc.

postgres=# select pg_terminate_backend(pid);
pg_terminate_backend
----------------------
t
(1 row)

Best regards,
houzj

#25Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Hou, Zhijie (#24)
1 attachment(s)
Re: A new function to wait for the backend exit after termination

On Fri, Dec 4, 2020 at 8:44 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com>
wrote:

-        however only superusers can terminate superuser backends.
+        however only superusers can terminate superuser backends. When no
+        <parameter>wait</parameter> and <parameter>timeout</parameter>

are

+ provided, only SIGTERM is sent to the backend with the given

process

+ ID and <literal>false</literal> is returned immediately. But the

I test the case when no wait and timeout are provided.
True is returned as the following which seems different from the doc.

postgres=# select pg_terminate_backend(pid);
pg_terminate_backend
----------------------
t
(1 row)

Thanks for pointing that out. I reworded that statement. Attaching v5
patch. Please have a look.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v5-0001-pg_terminate_backend-with-wait-timeout-and-pg_wai.patchapplication/octet-stream; name=v5-0001-pg_terminate_backend-with-wait-timeout-and-pg_wai.patchDownload
From ada257ccf8a438571ac5c07f2b900d30f9351f34 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Fri, 4 Dec 2020 11:55:35 +0530
Subject: [PATCH v5] pg_terminate_backend with wait, timeout and
 pg_wait_backend with timeout

This patch adds two new functions:
1. pg_terminate_backend(pid, wait, timeout) - terminate and
wait or timeout for a given backend.
2. pg_wait_backend(pid, timeout) - check the existence of the
backend with a given PID and wait or timeout until it goes away.
---
 doc/src/sgml/func.sgml                |  28 +++++-
 src/backend/catalog/system_views.sql  |  10 ++
 src/backend/postmaster/pgstat.c       |   3 +
 src/backend/storage/ipc/signalfuncs.c | 137 ++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat       |   9 ++
 src/include/pgstat.h                  |   3 +-
 6 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index df29af6371..0c23df6f37 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -23954,7 +23954,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <indexterm>
          <primary>pg_terminate_backend</primary>
         </indexterm>
-        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type> )
+        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>wait</parameter> <type>boolean</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>100</literal> )
         <returnvalue>boolean</returnvalue>
        </para>
        <para>
@@ -23962,7 +23962,31 @@ SELECT collation for ('foo' COLLATE "de_DE");
         specified process ID.  This is also allowed if the calling role
         is a member of the role whose backend is being terminated or the
         calling role has been granted <literal>pg_signal_backend</literal>,
-        however only superusers can terminate superuser backends.
+        however only superusers can terminate superuser backends. When no
+        <parameter>wait</parameter> and <parameter>timeout</parameter> are
+        provided, <function>pg_terminate_backend</function> sends SIGTERM to
+        the backend with the given process ID without waiting for its
+        termination. But the backend may still exist. With <parameter>wait</parameter> specified as
+        <literal>true</literal>, the function waits for the backend termination
+        until the given <parameter>timeout</parameter> or the default 100
+        milliseconds. On timeout <literal>false</literal> is returned.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_wait_backend</primary>
+        </indexterm>
+        <function>pg_wait_backend</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>100</literal> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Check the existence of the session whose backend process has the
+        specified process ID. On success return <literal>true</literal>. This
+        function waits until the given <parameter>timeout</parameter> or the
+        default 100 milliseconds. On timeout <literal>false</literal> is
+        returned.
        </para></entry>
       </row>
      </tbody>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b140c210bc..2d1907b4a8 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1246,6 +1246,16 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION
+  pg_terminate_backend(pid integer, wait boolean, timeout int8 DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_terminate_backend_and_wait'
+  PARALLEL UNSAFE;
+
+CREATE OR REPLACE FUNCTION
+  pg_wait_backend(pid integer, timeout int8 DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_wait_backend'
+  PARALLEL UNSAFE;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 9bad14981b..cf4d9f8730 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4040,6 +4040,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_XACT_GROUP_UPDATE:
 			event_name = "XactGroupUpdate";
 			break;
+		case WAIT_EVENT_BACKEND_TERMINATION:
+			event_name = "BackendTermination";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index d822e82cb9..1d5f001efc 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -18,6 +18,7 @@
 
 #include "catalog/pg_authid.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/syslogger.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
@@ -149,6 +150,142 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
 }
 
+/*
+ * Wait until there is no backend process with the given PID or timeout. On
+ * timeout, a warning is thrown to the user.
+ */
+static bool
+pg_wait_until_termination(int pid, int64 timeout)
+{
+	/*
+	 * Wait in steps of waittime milliseconds until this function exits or
+	 * timeout.
+	 */
+	int64	waittime = 10;
+	/*
+	 * Initially remaining time is the entire timeout specified by the user.
+	 */
+	int64	remainingtime = timeout;
+
+	/*
+	 * Check the backend existence. If doesn't exist, then check for the errno
+	 * ESRCH that confirms it and return true. If still exist, then wait for
+	 * waittime milliseconds, again check for the existence. Repeat until
+	 * timeout or an error occurs or a pending interrupt such as query cancel
+	 * is processed.
+	 */
+	for (;;)
+	{
+		if (remainingtime < waittime)
+			waittime = remainingtime;
+
+		if (kill(pid, 0) == -1)
+		{
+			if (errno == ESRCH)
+				return true;
+			else
+			{
+				ereport(WARNING,
+						(errmsg("could not check the existence of the backend with PID %d: %m",
+								pid)));
+				return false;
+			}
+		}
+
+		/* Process interrupts, if any, before going for wait. */
+		CHECK_FOR_INTERRUPTS();
+
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						 waittime,
+						 WAIT_EVENT_BACKEND_TERMINATION);
+
+		remainingtime -= waittime;
+
+		if (remainingtime <= 0)
+			break;
+	}
+
+	ereport(WARNING,
+			(errmsg("could not wait for the termination of the backend with PID %d within %lld milliseconds",
+					pid, (long long int) timeout)));
+
+	return false;
+}
+
+/*
+ * Signal termination of a backend process, wait until timeout or no backend
+ * has the given PID. If the wait input argument is false, this function
+ * behaviour is same as pg_terminate_backend. On timeout, a warning is thrown
+ * to the user. Self termination is allowed but waiting is not, because once
+ * the backend is self terminated there is no point in waiting for it to go
+ * away.
+ */
+Datum
+pg_terminate_backend_and_wait(PG_FUNCTION_ARGS)
+{
+	int 	pid = PG_GETARG_INT32(0);
+	bool	wait = PG_GETARG_BOOL(1);
+	int64	timeout = PG_GETARG_INT64(2);
+	bool	r = false;
+
+	if (timeout < 1)
+	{
+		ereport(WARNING,
+				(errmsg("timeout cannot be negative or zero: %lld",
+						(long long int) timeout)));
+		PG_RETURN_BOOL(r);
+	}
+
+	/* Self termination is allowed but waiting is not after that. */
+	if (pid == MyProcPid && wait)
+		wait = false;
+
+	r = DatumGetBool(DirectFunctionCall1(pg_terminate_backend, pid));
+
+	/* Wait only if requested and the termination is successful. */
+	if (wait && r)
+		r = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(r);
+}
+
+/*
+ * This function waits for a backend with the given PID to be terminated or
+ * until the timeout occurs. If the PID is of the backend which issued this
+ * function, either timeout can occur or the function can succeed if the
+ * backend gets terminated by some other process, say postmaster.
+ */
+Datum
+pg_wait_backend(PG_FUNCTION_ARGS)
+{
+	int		pid = PG_GETARG_INT32(0);
+	int64  	timeout = PG_GETARG_INT64(1);
+	PGPROC	*proc = NULL;
+	bool	r = false;
+
+	if (timeout < 1)
+	{
+		ereport(WARNING,
+				(errmsg("timeout cannot be negative or zero: %lld",
+						(long long int) timeout)));
+		PG_RETURN_BOOL(r);
+	}
+
+	proc = BackendPidGetProc(pid);
+
+	if (proc == NULL)
+	{
+		ereport(WARNING,
+				(errmsg("PID %d is not a PostgreSQL server process", pid)));
+		PG_RETURN_BOOL(r);
+	}
+
+	r = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(r);
+}
+
 /*
  * Signal to reload the database configuration
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fc2202b843..f77eadaa54 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6094,6 +6094,15 @@
 { oid => '2096', descr => 'terminate a server process',
   proname => 'pg_terminate_backend', provolatile => 'v', prorettype => 'bool',
   proargtypes => 'int4', prosrc => 'pg_terminate_backend' },
+{ oid => '16386', descr => 'terminate a backend process and wait for its exit or until timeout occurs',
+  proname => 'pg_terminate_backend', provolatile => 'v',
+  prorettype => 'bool', proargtypes => 'int4 bool int8',
+  proargnames => '{pid,wait,timeout}',
+  prosrc => 'pg_terminate_backend_and_wait' },
+{ oid => '16387', descr => 'waits for a backend process exit or timeout occurs',
+  proname => 'pg_wait_backend', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'int4 int8', proargnames => '{pid,timeout}',
+  prosrc => 'pg_wait_backend' },
 { oid => '2172', descr => 'prepare for taking an online backup',
   proname => 'pg_start_backup', provolatile => 'v', proparallel => 'r',
   prorettype => 'pg_lsn', proargtypes => 'text bool bool',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5954068dec..c9f5668509 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -964,7 +964,8 @@ typedef enum
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
 	WAIT_EVENT_SYNC_REP,
-	WAIT_EVENT_XACT_GROUP_UPDATE
+	WAIT_EVENT_XACT_GROUP_UPDATE,
+	WAIT_EVENT_BACKEND_TERMINATION
 } WaitEventIPC;
 
 /* ----------
-- 
2.25.1

#26Hou, Zhijie
houzj.fnst@cn.fujitsu.com
In reply to: Bharath Rupireddy (#25)
RE: A new function to wait for the backend exit after termination

Hi,

When test pg_terminate_backend_and_wait with parallel query.
I noticed that the function is not defined as parallel safe.

I am not very familiar with the standard about whether a function should be parallel safe.
But I found the following function are all defined as parallel safe:

pg_promote
pg_terminate_backend(integer)
pg_sleep*

Is there a reason why pg_terminate_backend_and_wait are not parallel safe ?
(I'm sorry if I miss something in previous mails.)

Best regards,
houzj

#27Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Hou, Zhijie (#26)
1 attachment(s)
Re: A new function to wait for the backend exit after termination

On Fri, Dec 4, 2020 at 2:02 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:

Hi,

When test pg_terminate_backend_and_wait with parallel query.
I noticed that the function is not defined as parallel safe.

I am not very familiar with the standard about whether a function should be parallel safe.
But I found the following function are all defined as parallel safe:

pg_promote
pg_terminate_backend(integer)
pg_sleep*

Is there a reason why pg_terminate_backend_and_wait are not parallel safe ?
(I'm sorry if I miss something in previous mails.)

I'm not quite sure of a use case where existing pg_terminate_backend()
or for that matter the new pg_terminate_backend_and_wait() and
pg_wait_backend() will ever get used from parallel workers. Having
said that, I marked the new functions as parallel safe to keep it the
way it is with existing pg_terminate_backend().

postgres=# select proparallel, proname, prosrc from pg_proc where
proname IN ('pg_wait_backend', 'pg_terminate_backend');
proparallel | proname | prosrc
-------------+----------------------+-------------------------------
s | pg_terminate_backend | pg_terminate_backend
s | pg_wait_backend | pg_wait_backend
s | pg_terminate_backend | pg_terminate_backend_and_wait
(3 rows)

Attaching v6 patch. Please have a look.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v6-0001-pg_terminate_backend-with-wait-timeout-and-pg_wai.patchapplication/octet-stream; name=v6-0001-pg_terminate_backend-with-wait-timeout-and-pg_wai.patchDownload
From 546ce72eceace17257d2d8f75b63fcce174f917e Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Fri, 4 Dec 2020 14:41:32 +0530
Subject: [PATCH v6] pg_terminate_backend with wait, timeout and
 pg_wait_backend with timeout

This patch adds two new functions:
1. pg_terminate_backend(pid, wait, timeout) - terminate and
wait or timeout for a given backend.
2. pg_wait_backend(pid, timeout) - check the existence of the
backend with a given PID and wait or timeout until it goes away.
---
 doc/src/sgml/func.sgml                |  28 +++++-
 src/backend/catalog/system_views.sql  |  10 ++
 src/backend/postmaster/pgstat.c       |   3 +
 src/backend/storage/ipc/signalfuncs.c | 137 ++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat       |   9 ++
 src/include/pgstat.h                  |   3 +-
 6 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index df29af6371..0c23df6f37 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -23954,7 +23954,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <indexterm>
          <primary>pg_terminate_backend</primary>
         </indexterm>
-        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type> )
+        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>wait</parameter> <type>boolean</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>100</literal> )
         <returnvalue>boolean</returnvalue>
        </para>
        <para>
@@ -23962,7 +23962,31 @@ SELECT collation for ('foo' COLLATE "de_DE");
         specified process ID.  This is also allowed if the calling role
         is a member of the role whose backend is being terminated or the
         calling role has been granted <literal>pg_signal_backend</literal>,
-        however only superusers can terminate superuser backends.
+        however only superusers can terminate superuser backends. When no
+        <parameter>wait</parameter> and <parameter>timeout</parameter> are
+        provided, <function>pg_terminate_backend</function> sends SIGTERM to
+        the backend with the given process ID without waiting for its
+        termination. But the backend may still exist. With <parameter>wait</parameter> specified as
+        <literal>true</literal>, the function waits for the backend termination
+        until the given <parameter>timeout</parameter> or the default 100
+        milliseconds. On timeout <literal>false</literal> is returned.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_wait_backend</primary>
+        </indexterm>
+        <function>pg_wait_backend</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>100</literal> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Check the existence of the session whose backend process has the
+        specified process ID. On success return <literal>true</literal>. This
+        function waits until the given <parameter>timeout</parameter> or the
+        default 100 milliseconds. On timeout <literal>false</literal> is
+        returned.
        </para></entry>
       </row>
      </tbody>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b140c210bc..c71d1bd10a 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1246,6 +1246,16 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION
+  pg_terminate_backend(pid integer, wait boolean, timeout int8 DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_terminate_backend_and_wait'
+  PARALLEL SAFE;
+
+CREATE OR REPLACE FUNCTION
+  pg_wait_backend(pid integer, timeout int8 DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_wait_backend'
+  PARALLEL SAFE;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 9bad14981b..cf4d9f8730 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4040,6 +4040,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_XACT_GROUP_UPDATE:
 			event_name = "XactGroupUpdate";
 			break;
+		case WAIT_EVENT_BACKEND_TERMINATION:
+			event_name = "BackendTermination";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index d822e82cb9..1d5f001efc 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -18,6 +18,7 @@
 
 #include "catalog/pg_authid.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/syslogger.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
@@ -149,6 +150,142 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
 }
 
+/*
+ * Wait until there is no backend process with the given PID or timeout. On
+ * timeout, a warning is thrown to the user.
+ */
+static bool
+pg_wait_until_termination(int pid, int64 timeout)
+{
+	/*
+	 * Wait in steps of waittime milliseconds until this function exits or
+	 * timeout.
+	 */
+	int64	waittime = 10;
+	/*
+	 * Initially remaining time is the entire timeout specified by the user.
+	 */
+	int64	remainingtime = timeout;
+
+	/*
+	 * Check the backend existence. If doesn't exist, then check for the errno
+	 * ESRCH that confirms it and return true. If still exist, then wait for
+	 * waittime milliseconds, again check for the existence. Repeat until
+	 * timeout or an error occurs or a pending interrupt such as query cancel
+	 * is processed.
+	 */
+	for (;;)
+	{
+		if (remainingtime < waittime)
+			waittime = remainingtime;
+
+		if (kill(pid, 0) == -1)
+		{
+			if (errno == ESRCH)
+				return true;
+			else
+			{
+				ereport(WARNING,
+						(errmsg("could not check the existence of the backend with PID %d: %m",
+								pid)));
+				return false;
+			}
+		}
+
+		/* Process interrupts, if any, before going for wait. */
+		CHECK_FOR_INTERRUPTS();
+
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						 waittime,
+						 WAIT_EVENT_BACKEND_TERMINATION);
+
+		remainingtime -= waittime;
+
+		if (remainingtime <= 0)
+			break;
+	}
+
+	ereport(WARNING,
+			(errmsg("could not wait for the termination of the backend with PID %d within %lld milliseconds",
+					pid, (long long int) timeout)));
+
+	return false;
+}
+
+/*
+ * Signal termination of a backend process, wait until timeout or no backend
+ * has the given PID. If the wait input argument is false, this function
+ * behaviour is same as pg_terminate_backend. On timeout, a warning is thrown
+ * to the user. Self termination is allowed but waiting is not, because once
+ * the backend is self terminated there is no point in waiting for it to go
+ * away.
+ */
+Datum
+pg_terminate_backend_and_wait(PG_FUNCTION_ARGS)
+{
+	int 	pid = PG_GETARG_INT32(0);
+	bool	wait = PG_GETARG_BOOL(1);
+	int64	timeout = PG_GETARG_INT64(2);
+	bool	r = false;
+
+	if (timeout < 1)
+	{
+		ereport(WARNING,
+				(errmsg("timeout cannot be negative or zero: %lld",
+						(long long int) timeout)));
+		PG_RETURN_BOOL(r);
+	}
+
+	/* Self termination is allowed but waiting is not after that. */
+	if (pid == MyProcPid && wait)
+		wait = false;
+
+	r = DatumGetBool(DirectFunctionCall1(pg_terminate_backend, pid));
+
+	/* Wait only if requested and the termination is successful. */
+	if (wait && r)
+		r = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(r);
+}
+
+/*
+ * This function waits for a backend with the given PID to be terminated or
+ * until the timeout occurs. If the PID is of the backend which issued this
+ * function, either timeout can occur or the function can succeed if the
+ * backend gets terminated by some other process, say postmaster.
+ */
+Datum
+pg_wait_backend(PG_FUNCTION_ARGS)
+{
+	int		pid = PG_GETARG_INT32(0);
+	int64  	timeout = PG_GETARG_INT64(1);
+	PGPROC	*proc = NULL;
+	bool	r = false;
+
+	if (timeout < 1)
+	{
+		ereport(WARNING,
+				(errmsg("timeout cannot be negative or zero: %lld",
+						(long long int) timeout)));
+		PG_RETURN_BOOL(r);
+	}
+
+	proc = BackendPidGetProc(pid);
+
+	if (proc == NULL)
+	{
+		ereport(WARNING,
+				(errmsg("PID %d is not a PostgreSQL server process", pid)));
+		PG_RETURN_BOOL(r);
+	}
+
+	r = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(r);
+}
+
 /*
  * Signal to reload the database configuration
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fc2202b843..f77eadaa54 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6094,6 +6094,15 @@
 { oid => '2096', descr => 'terminate a server process',
   proname => 'pg_terminate_backend', provolatile => 'v', prorettype => 'bool',
   proargtypes => 'int4', prosrc => 'pg_terminate_backend' },
+{ oid => '16386', descr => 'terminate a backend process and wait for its exit or until timeout occurs',
+  proname => 'pg_terminate_backend', provolatile => 'v',
+  prorettype => 'bool', proargtypes => 'int4 bool int8',
+  proargnames => '{pid,wait,timeout}',
+  prosrc => 'pg_terminate_backend_and_wait' },
+{ oid => '16387', descr => 'waits for a backend process exit or timeout occurs',
+  proname => 'pg_wait_backend', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'int4 int8', proargnames => '{pid,timeout}',
+  prosrc => 'pg_wait_backend' },
 { oid => '2172', descr => 'prepare for taking an online backup',
   proname => 'pg_start_backup', provolatile => 'v', proparallel => 'r',
   prorettype => 'pg_lsn', proargtypes => 'text bool bool',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5954068dec..c9f5668509 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -964,7 +964,8 @@ typedef enum
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
 	WAIT_EVENT_SYNC_REP,
-	WAIT_EVENT_XACT_GROUP_UPDATE
+	WAIT_EVENT_XACT_GROUP_UPDATE,
+	WAIT_EVENT_BACKEND_TERMINATION
 } WaitEventIPC;
 
 /* ----------
-- 
2.25.1

#28hou zhijie
houzj.fnst@cn.fujitsu.com
In reply to: Bharath Rupireddy (#27)
Re: A new function to wait for the backend exit after termination

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

Thanks for the new patch, the patch LGTM and works as expected

The new status of this patch is: Ready for Committer

#29Magnus Hagander
magnus@hagander.net
In reply to: Bharath Rupireddy (#27)
Re: A new function to wait for the backend exit after termination

On Fri, Dec 4, 2020 at 10:13 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Dec 4, 2020 at 2:02 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:

Hi,

When test pg_terminate_backend_and_wait with parallel query.
I noticed that the function is not defined as parallel safe.

I am not very familiar with the standard about whether a function should be parallel safe.
But I found the following function are all defined as parallel safe:

pg_promote
pg_terminate_backend(integer)
pg_sleep*

Is there a reason why pg_terminate_backend_and_wait are not parallel safe ?
(I'm sorry if I miss something in previous mails.)

I'm not quite sure of a use case where existing pg_terminate_backend()
or for that matter the new pg_terminate_backend_and_wait() and
pg_wait_backend() will ever get used from parallel workers. Having
said that, I marked the new functions as parallel safe to keep it the
way it is with existing pg_terminate_backend().

postgres=# select proparallel, proname, prosrc from pg_proc where
proname IN ('pg_wait_backend', 'pg_terminate_backend');
proparallel | proname | prosrc
-------------+----------------------+-------------------------------
s | pg_terminate_backend | pg_terminate_backend
s | pg_wait_backend | pg_wait_backend
s | pg_terminate_backend | pg_terminate_backend_and_wait
(3 rows)

Attaching v6 patch. Please have a look.

Taking another look at this patch. Here are a few more comments:

For pg_terminate_backend, wouldn't it be easier to just create one
function that has a default for wait and a default for timeout?
Instead of having one version that takes one argument, and another
version that takes 3? Seems that would also simplify the
implementation by not having to set things up and call indirectly?

pg_wait_backend() "checks the existence of the session", and "returns
true on success". It's unclear from that what's considered a success.
Also, technically, it only checks for the existence of the backend and
not the session inside, I think?

But also the fact is that it returns true when the backend is *gone*,
which I think is a very strange definition of "success". In fact,
isn't pg_wait_backend() is a pretty bad name for a function that does
this? Maybe pg_wait_for_backend_termination()? (the internal function
has a name that more matches what it does, but the SQL function does
not)

Why is the for(;;) loop in pg_wait_until_termination not a do {}
while(remainingtime > 0)?

The wait event needs to be added to the list in the documentation.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#30Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Magnus Hagander (#29)
1 attachment(s)
Re: A new function to wait for the backend exit after termination

On Sat, Mar 6, 2021 at 10:36 PM Magnus Hagander <magnus@hagander.net> wrote:

Attaching v6 patch. Please have a look.

Taking another look at this patch. Here are a few more comments:

Thanks for the review comments.

For pg_terminate_backend, wouldn't it be easier to just create one
function that has a default for wait and a default for timeout?
Instead of having one version that takes one argument, and another
version that takes 3? Seems that would also simplify the
implementation by not having to set things up and call indirectly?

Done.

pg_wait_backend() "checks the existence of the session", and "returns
true on success". It's unclear from that what's considered a success.
Also, technically, it only checks for the existence of the backend and
not the session inside, I think?
But also the fact is that it returns true when the backend is *gone*,
which I think is a very strange definition of "success".

Yes, it only checks the existence of the backend process. Changed the
phrasing a bit to make things clear.

In fact, isn't pg_wait_backend() is a pretty bad name for a function that does
this? Maybe pg_wait_for_backend_termination()? (the internal function
has a name that more matches what it does, but the SQL function does
not)

pg_wait_for_backend_termination LGTM, so changed pg_wait_backend to that name.

Why is the for(;;) loop in pg_wait_until_termination not a do {}
while(remainingtime > 0)?

Done.

The wait event needs to be added to the list in the documentation.

Added to monitoring.sgml's IPC wait event type.

Attaching v7 patch for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v7-0001-pg_terminate_backend-with-wait-and-timeout.patchapplication/octet-stream; name=v7-0001-pg_terminate_backend-with-wait-and-timeout.patchDownload
From a11c71c0865a16b7f23fc088452123b720da4544 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Sun, 7 Mar 2021 14:33:14 +0530
Subject: [PATCH v7] pg_terminate_backend with wait and timeout

This patch adds extra default arguments wait and timeout
to pg_terminate_backend which will be used to terminate
and wait timeout milliseconds for the backend's termination.

This patch also adds a new function
pg_wait_for_backend_termination(pid, timeout) which checks
for the existence of the backend with given PID and waits
timeout milliseconds for its termination.
---
 doc/src/sgml/func.sgml                |  31 +++++-
 doc/src/sgml/monitoring.sgml          |   4 +
 src/backend/catalog/system_views.sql  |  10 ++
 src/backend/postmaster/pgstat.c       |   3 +
 src/backend/storage/ipc/signalfuncs.c | 145 +++++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat       |   9 +-
 src/include/pgstat.h                  |   3 +-
 7 files changed, 196 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ece09699ef..b72dec1f77 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24821,7 +24821,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <indexterm>
          <primary>pg_terminate_backend</primary>
         </indexterm>
-        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type> )
+        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>wait</parameter> <type>boolean</type> <literal>DEFAULT</literal> <literal>false</literal>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>100</literal> )
         <returnvalue>boolean</returnvalue>
        </para>
        <para>
@@ -24829,7 +24829,34 @@ SELECT collation for ('foo' COLLATE "de_DE");
         specified process ID.  This is also allowed if the calling role
         is a member of the role whose backend is being terminated or the
         calling role has been granted <literal>pg_signal_backend</literal>,
-        however only superusers can terminate superuser backends.
+        however only superusers can terminate superuser backends. When no
+        <parameter>wait</parameter> and <parameter>timeout</parameter> are
+        specified, this function sends SIGTERM to the backend with the given
+        process ID without waiting for its actual termination and returns <literal>true</literal>.
+        Sometimes the backend process may still exist. With <parameter>wait</parameter>
+        specified as <literal>true</literal>, the function ensures that the
+        backend process is actually terminated. It keeps checking for the
+        existence of the backend process either until the given <parameter>timeout</parameter>
+        or default 100 milliseconds and returns <literal>true</literal>. On
+        timeout a warning is emitted and <literal>false</literal> is returned.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_wait_for_backend_termination</primary>
+        </indexterm>
+        <function>pg_wait_for_backend_termination</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>100</literal> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Checks the existence of backend process with specified process ID. It
+        waits until the given <parameter>timeout</parameter> or the default 100
+        milliseconds. <literal>true</literal> is returned when the backend is
+        found to be not existing within the <parameter>timeout</parameter> milliseconds.
+        On timeout, a warning is emitted and <literal>false</literal> is
+        returned.
        </para></entry>
       </row>
      </tbody>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3513e127b7..1d97482849 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1762,6 +1762,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for the group leader to update transaction status at
        end of a parallel operation.</entry>
      </row>
+     <row>
+      <entry><literal>BackendTermination</literal></entry>
+      <entry>Waiting for the termination of a backend.</entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fc94a73a54..b7842c787d 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1265,6 +1265,16 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION
+  pg_terminate_backend(pid integer, wait boolean DEFAULT false, timeout int8 DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_terminate_backend'
+  PARALLEL SAFE;
+
+CREATE OR REPLACE FUNCTION
+  pg_wait_for_backend_termination(pid integer, timeout int8 DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_wait_for_backend_termination'
+  PARALLEL SAFE;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f75b52719d..eb8d659c28 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4125,6 +4125,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_XACT_GROUP_UPDATE:
 			event_name = "XactGroupUpdate";
 			break;
+		case WAIT_EVENT_BACKEND_TERMINATION:
+			event_name = "BackendTermination";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 69fe23a256..3b19f4f95d 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -18,6 +18,7 @@
 
 #include "catalog/pg_authid.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/syslogger.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
@@ -126,15 +127,88 @@ pg_cancel_backend(PG_FUNCTION_ARGS)
 }
 
 /*
- * Signal to terminate a backend process.  This is allowed if you are a member
- * of the role whose process is being terminated.
+ * Wait until there is no backend process with the given PID and return true.
+ * On timeout, a warning is emitted and false is returned.
+ */
+static bool
+pg_wait_until_termination(int pid, int64 timeout)
+{
+	/*
+	 * Wait in steps of waittime milliseconds until this function exits or
+	 * timeout.
+	 */
+	int64	waittime = 10;
+	/*
+	 * Initially remaining time is the entire timeout specified by the user.
+	 */
+	int64	remainingtime = timeout;
+
+	/*
+	 * Check existence of the backend. If it doesn't exist, then check for the
+	 * errno ESRCH that confirms it and return true. If the backend still
+	 * exists, then wait for waittime milliseconds, again check for the
+	 * existence. Repeat this until timeout or an error occurs or a pending
+	 * interrupt such as query cancel gets processed.
+	 */
+	do
+	{
+		if (remainingtime < waittime)
+			waittime = remainingtime;
+
+		if (kill(pid, 0) == -1)
+		{
+			if (errno == ESRCH)
+				return true;
+			else
+			{
+				ereport(WARNING,
+						(errmsg("could not check the existence of the backend with PID %d: %m",
+								pid)));
+				return false;
+			}
+		}
+
+		/* Process interrupts, if any, before going for wait. */
+		CHECK_FOR_INTERRUPTS();
+
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						 waittime,
+						 WAIT_EVENT_BACKEND_TERMINATION);
+
+		remainingtime -= waittime;
+	} while (remainingtime > 0);
+
+	ereport(WARNING,
+			(errmsg("could not wait for the termination of the backend with PID %d within %lld milliseconds",
+					pid, (long long int) timeout)));
+
+	return false;
+}
+
+/*
+ * Signal to terminate a backend process. This is allowed if you are a member
+ * of the role whose process is being terminated. If wait input argument is
+ * true, then wait until given timeout (default 100) milliseconds or no backend
+ * has the given PID. On timeout, a warning is emitted. Self termination is
+ * allowed but waiting is not, because once the backend is self-terminated
+ * there is no point in waiting for it to go away. If wait input argument is
+ * false (which is default), then this function just signals the backend and
+ * exits.
  *
  * Note that only superusers can signal superuser-owned processes.
  */
 Datum
 pg_terminate_backend(PG_FUNCTION_ARGS)
 {
-	int			r = pg_signal_backend(PG_GETARG_INT32(0), SIGTERM);
+	int	 pid;
+	int	 r;
+	bool wait;
+	bool result;
+
+	pid = PG_GETARG_INT32(0);
+	wait = PG_GETARG_BOOL(1);
+	r = pg_signal_backend(pid, SIGTERM);
 
 	if (r == SIGNAL_BACKEND_NOSUPERUSER)
 		ereport(ERROR,
@@ -146,7 +220,70 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 				 errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend")));
 
-	PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
+	result = (r == SIGNAL_BACKEND_SUCCESS);
+
+	/*
+	 * Wait only if requested and the termination is successful. Self
+	 * termination is allowed but waiting is not.
+	 */
+	if (wait && pid != MyProcPid && result)
+	{
+		int64	timeout = PG_GETARG_INT64(2);
+
+		if (timeout < 1)
+		{
+			ereport(WARNING,
+					(errmsg("timeout cannot be negative or zero: %lld",
+							(long long int) timeout)));
+
+			result = false;
+		}
+		else
+			result = pg_wait_until_termination(pid, timeout);
+	}
+
+	PG_RETURN_BOOL(result);
+}
+
+/*
+ * This function waits for a backend with the given PID to be terminated or
+ * until the timeout milliseconds occurs. If the PID is of the backend which
+ * issued this function, either timeout can occur or the function can succeed
+ * if the backend gets terminated by some other process, say postmaster.
+ */
+Datum
+pg_wait_for_backend_termination(PG_FUNCTION_ARGS)
+{
+	int	 pid;
+	int64	timeout;
+	PGPROC	*proc = NULL;
+	bool	result = false;
+
+	pid = PG_GETARG_INT32(0);
+	timeout = PG_GETARG_INT64(1);
+
+	if (timeout < 1)
+	{
+		ereport(WARNING,
+				(errmsg("timeout cannot be negative or zero: %lld",
+						(long long int) timeout)));
+
+		PG_RETURN_BOOL(result);
+	}
+
+	proc = BackendPidGetProc(pid);
+
+	if (proc == NULL)
+	{
+		ereport(WARNING,
+				(errmsg("PID %d is not a PostgreSQL server process", pid)));
+
+		PG_RETURN_BOOL(result);
+	}
+
+	result = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(result);
 }
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 506689d8ac..42d0d876d9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6142,9 +6142,14 @@
 { oid => '2171', descr => 'cancel a server process\' current query',
   proname => 'pg_cancel_backend', provolatile => 'v', prorettype => 'bool',
   proargtypes => 'int4', prosrc => 'pg_cancel_backend' },
-{ oid => '2096', descr => 'terminate a server process',
+{ oid => '2096', descr => 'terminate a backend process and if specified, wait for its exit or until timeout occurs',
   proname => 'pg_terminate_backend', provolatile => 'v', prorettype => 'bool',
-  proargtypes => 'int4', prosrc => 'pg_terminate_backend' },
+  proargtypes => 'int4 bool int8', proargnames => '{pid,wait,timeout}',
+  prosrc => 'pg_terminate_backend' },
+{ oid => '16386', descr => 'wait for a backend process exit or timeout occurs',
+  proname => 'pg_wait_for_backend_termination', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'int4 int8', proargnames => '{pid,timeout}',
+  prosrc => 'pg_wait_for_backend_termination' },
 { oid => '2172', descr => 'prepare for taking an online backup',
   proname => 'pg_start_backup', provolatile => 'v', proparallel => 'r',
   prorettype => 'pg_lsn', proargtypes => 'text bool bool',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 724068cf87..6bc8b88b45 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -998,7 +998,8 @@ typedef enum
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
 	WAIT_EVENT_SYNC_REP,
-	WAIT_EVENT_XACT_GROUP_UPDATE
+	WAIT_EVENT_XACT_GROUP_UPDATE,
+	WAIT_EVENT_BACKEND_TERMINATION
 } WaitEventIPC;
 
 /* ----------
-- 
2.25.1

#31Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#30)
1 attachment(s)
Re: A new function to wait for the backend exit after termination

On Sun, Mar 7, 2021 at 2:39 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Attaching v7 patch for further review.

Attaching v8 patch after rebasing on to the latest master.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v8-0001-pg_terminate_backend-with-wait-and-timeout.patchapplication/octet-stream; name=v8-0001-pg_terminate_backend-with-wait-and-timeout.patchDownload
From 5294cb760ec370580dd375079bf2f9c2d112c0ad Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 15 Mar 2021 08:56:55 +0530
Subject: [PATCH v8] pg_terminate_backend with wait and timeout

This patch adds extra default arguments wait and timeout
to pg_terminate_backend which will be used to terminate
and wait timeout milliseconds for the backend's termination.

This patch also adds a new function
pg_wait_for_backend_termination(pid, timeout) which checks
for the existence of the backend with given PID and waits
timeout milliseconds for its termination.
---
 doc/src/sgml/func.sgml                |  31 +++++-
 doc/src/sgml/monitoring.sgml          |   4 +
 src/backend/catalog/system_views.sql  |  10 ++
 src/backend/postmaster/pgstat.c       |   3 +
 src/backend/storage/ipc/signalfuncs.c | 145 +++++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat       |   9 +-
 src/include/pgstat.h                  |   3 +-
 7 files changed, 196 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9492a3c6b9..0a417a0b2b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24821,7 +24821,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <indexterm>
          <primary>pg_terminate_backend</primary>
         </indexterm>
-        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type> )
+        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>wait</parameter> <type>boolean</type> <literal>DEFAULT</literal> <literal>false</literal>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>100</literal> )
         <returnvalue>boolean</returnvalue>
        </para>
        <para>
@@ -24829,7 +24829,34 @@ SELECT collation for ('foo' COLLATE "de_DE");
         specified process ID.  This is also allowed if the calling role
         is a member of the role whose backend is being terminated or the
         calling role has been granted <literal>pg_signal_backend</literal>,
-        however only superusers can terminate superuser backends.
+        however only superusers can terminate superuser backends. When no
+        <parameter>wait</parameter> and <parameter>timeout</parameter> are
+        specified, this function sends SIGTERM to the backend with the given
+        process ID without waiting for its actual termination and returns <literal>true</literal>.
+        Sometimes the backend process may still exist. With <parameter>wait</parameter>
+        specified as <literal>true</literal>, the function ensures that the
+        backend process is actually terminated. It keeps checking for the
+        existence of the backend process either until the given <parameter>timeout</parameter>
+        or default 100 milliseconds and returns <literal>true</literal>. On
+        timeout a warning is emitted and <literal>false</literal> is returned.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_wait_for_backend_termination</primary>
+        </indexterm>
+        <function>pg_wait_for_backend_termination</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>100</literal> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Checks the existence of backend process with specified process ID. It
+        waits until the given <parameter>timeout</parameter> or the default 100
+        milliseconds. <literal>true</literal> is returned when the backend is
+        found to be not existing within the <parameter>timeout</parameter> milliseconds.
+        On timeout, a warning is emitted and <literal>false</literal> is
+        returned.
        </para></entry>
       </row>
      </tbody>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index c35045faa1..50eb46f607 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1775,6 +1775,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for the group leader to update transaction status at
        end of a parallel operation.</entry>
      </row>
+     <row>
+      <entry><literal>BackendTermination</literal></entry>
+      <entry>Waiting for the termination of a backend.</entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 0dca65dc7b..85521546e3 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1277,6 +1277,16 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION
+  pg_terminate_backend(pid integer, wait boolean DEFAULT false, timeout int8 DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_terminate_backend'
+  PARALLEL SAFE;
+
+CREATE OR REPLACE FUNCTION
+  pg_wait_for_backend_termination(pid integer, timeout int8 DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_wait_for_backend_termination'
+  PARALLEL SAFE;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b1e2d94951..2f72241d2f 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4130,6 +4130,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_XACT_GROUP_UPDATE:
 			event_name = "XactGroupUpdate";
 			break;
+		case WAIT_EVENT_BACKEND_TERMINATION:
+			event_name = "BackendTermination";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 69fe23a256..a3331dfad6 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -18,6 +18,7 @@
 
 #include "catalog/pg_authid.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/syslogger.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
@@ -126,15 +127,88 @@ pg_cancel_backend(PG_FUNCTION_ARGS)
 }
 
 /*
- * Signal to terminate a backend process.  This is allowed if you are a member
- * of the role whose process is being terminated.
+ * Wait until there is no backend process with the given PID and return true.
+ * On timeout, a warning is emitted and false is returned.
+ */
+static bool
+pg_wait_until_termination(int pid, int64 timeout)
+{
+	/*
+	 * Wait in steps of waittime milliseconds until this function exits or
+	 * timeout.
+	 */
+	int64	waittime = 10;
+	/*
+	 * Initially remaining time is the entire timeout specified by the user.
+	 */
+	int64	remainingtime = timeout;
+
+	/*
+	 * Check existence of the backend. If it doesn't exist, then check for the
+	 * errno ESRCH that confirms it and return true. If the backend still
+	 * exists, then wait for waittime milliseconds, again check for the
+	 * existence. Repeat this until timeout or an error occurs or a pending
+	 * interrupt such as query cancel gets processed.
+	 */
+	do
+	{
+		if (remainingtime < waittime)
+			waittime = remainingtime;
+
+		if (kill(pid, 0) == -1)
+		{
+			if (errno == ESRCH)
+				return true;
+			else
+			{
+				ereport(WARNING,
+						(errmsg("could not check the existence of the backend with PID %d: %m",
+								pid)));
+				return false;
+			}
+		}
+
+		/* Process interrupts, if any, before going for wait. */
+		CHECK_FOR_INTERRUPTS();
+
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						 waittime,
+						 WAIT_EVENT_BACKEND_TERMINATION);
+
+		remainingtime -= waittime;
+	} while (remainingtime > 0);
+
+	ereport(WARNING,
+			(errmsg("could not wait for the termination of the backend with PID %d within %lld milliseconds",
+					pid, (long long int) timeout)));
+
+	return false;
+}
+
+/*
+ * Signal to terminate a backend process. This is allowed if you are a member
+ * of the role whose process is being terminated. If wait input argument is
+ * true, then wait until given timeout (default 100) milliseconds or no backend
+ * has the given PID. On timeout, a warning is emitted. Self termination is
+ * allowed but waiting is not, because once the backend is self-terminated
+ * there is no point in waiting for it to go away. If wait input argument is
+ * false (which is default), then this function just signals the backend and
+ * exits.
  *
  * Note that only superusers can signal superuser-owned processes.
  */
 Datum
 pg_terminate_backend(PG_FUNCTION_ARGS)
 {
-	int			r = pg_signal_backend(PG_GETARG_INT32(0), SIGTERM);
+	int	 pid;
+	int	 r;
+	bool wait;
+	bool result;
+
+	pid = PG_GETARG_INT32(0);
+	wait = PG_GETARG_BOOL(1);
+	r = pg_signal_backend(pid, SIGTERM);
 
 	if (r == SIGNAL_BACKEND_NOSUPERUSER)
 		ereport(ERROR,
@@ -146,7 +220,70 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 				 errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend")));
 
-	PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
+	result = (r == SIGNAL_BACKEND_SUCCESS);
+
+	/*
+	 * Wait only if requested and the termination is successful. Self
+	 * termination is allowed but waiting is not.
+	 */
+	if (wait && pid != MyProcPid && result)
+	{
+		int64	timeout = PG_GETARG_INT64(2);
+
+		if (timeout < 1)
+		{
+			ereport(WARNING,
+					(errmsg("timeout cannot be negative or zero: %lld",
+							(long long int) timeout)));
+
+			result = false;
+		}
+		else
+			result = pg_wait_until_termination(pid, timeout);
+	}
+
+	PG_RETURN_BOOL(result);
+}
+
+/*
+ * This function waits for a backend with the given PID to be terminated or
+ * until the timeout milliseconds occurs. If the PID is of the backend which
+ * issued this function, either timeout can occur or the function can succeed
+ * if the backend gets terminated by some other process, say postmaster.
+ */
+Datum
+pg_wait_for_backend_termination(PG_FUNCTION_ARGS)
+{
+	int	 pid;
+	int64	timeout;
+	PGPROC	*proc = NULL;
+	bool	result = false;
+
+	pid = PG_GETARG_INT32(0);
+	timeout = PG_GETARG_INT64(1);
+
+	if (timeout < 1)
+	{
+		ereport(WARNING,
+				(errmsg("timeout cannot be negative or zero: %lld",
+						(long long int) timeout)));
+
+		PG_RETURN_BOOL(result);
+	}
+
+	proc = BackendPidGetProc(pid);
+
+	if (proc == NULL)
+	{
+		ereport(WARNING,
+				(errmsg("PID %d is not a PostgreSQL server process", pid)));
+
+		PG_RETURN_BOOL(result);
+	}
+
+	result = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(result);
 }
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 93393fcfd4..3f158de1dc 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6143,9 +6143,14 @@
 { oid => '2171', descr => 'cancel a server process\' current query',
   proname => 'pg_cancel_backend', provolatile => 'v', prorettype => 'bool',
   proargtypes => 'int4', prosrc => 'pg_cancel_backend' },
-{ oid => '2096', descr => 'terminate a server process',
+{ oid => '2096', descr => 'terminate a backend process and if specified, wait for its exit or until timeout occurs',
   proname => 'pg_terminate_backend', provolatile => 'v', prorettype => 'bool',
-  proargtypes => 'int4', prosrc => 'pg_terminate_backend' },
+  proargtypes => 'int4 bool int8', proargnames => '{pid,wait,timeout}',
+  prosrc => 'pg_terminate_backend' },
+{ oid => '16386', descr => 'wait for a backend process exit or timeout occurs',
+  proname => 'pg_wait_for_backend_termination', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'int4 int8', proargnames => '{pid,timeout}',
+  prosrc => 'pg_wait_for_backend_termination' },
 { oid => '2172', descr => 'prepare for taking an online backup',
   proname => 'pg_start_backup', provolatile => 'v', proparallel => 'r',
   prorettype => 'pg_lsn', proargtypes => 'text bool bool',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be43c04802..c6bc672be6 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1010,7 +1010,8 @@ typedef enum
 	WAIT_EVENT_SAFE_SNAPSHOT,
 	WAIT_EVENT_SYNC_REP,
 	WAIT_EVENT_WALRCV_EXIT,
-	WAIT_EVENT_XACT_GROUP_UPDATE
+	WAIT_EVENT_XACT_GROUP_UPDATE,
+	WAIT_EVENT_BACKEND_TERMINATION
 } WaitEventIPC;
 
 /* ----------
-- 
2.25.1

#32Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Bharath Rupireddy (#31)
Re: A new function to wait for the backend exit after termination

On 2021/03/15 12:27, Bharath Rupireddy wrote:

On Sun, Mar 7, 2021 at 2:39 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Attaching v7 patch for further review.

Attaching v8 patch after rebasing on to the latest master.

Thanks for rebasing the patch!

-       WAIT_EVENT_XACT_GROUP_UPDATE
+       WAIT_EVENT_XACT_GROUP_UPDATE,
+       WAIT_EVENT_BACKEND_TERMINATION

These should be listed in alphabetical order.

In pg_wait_until_termination's do-while loop, ResetLatch() should be called. Otherwise, it would enter busy-loop after any signal arrives. Because the latch is kept set and WaitLatch() always exits immediately in that case.

+	/*
+	 * Wait in steps of waittime milliseconds until this function exits or
+	 * timeout.
+	 */
+	int64	waittime = 10;

10 ms per cycle seems too frequent?

+			ereport(WARNING,
+					(errmsg("timeout cannot be negative or zero: %lld",
+							(long long int) timeout)));
+
+			result = false;

IMO the parameter should be verified before doing the actual thing.

Why is WARNING thrown in this case? Isn't it better to throw ERROR like pg_promote() does?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#33Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Fujii Masao (#32)
1 attachment(s)
Re: A new function to wait for the backend exit after termination

On Mon, Mar 15, 2021 at 10:38 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2021/03/15 12:27, Bharath Rupireddy wrote:

On Sun, Mar 7, 2021 at 2:39 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Attaching v7 patch for further review.

Attaching v8 patch after rebasing on to the latest master.

Thanks for rebasing the patch!

Thanks for reviewing.

-       WAIT_EVENT_XACT_GROUP_UPDATE
+       WAIT_EVENT_XACT_GROUP_UPDATE,
+       WAIT_EVENT_BACKEND_TERMINATION

These should be listed in alphabetical order.

Done.

In pg_wait_until_termination's do-while loop, ResetLatch() should be called. Otherwise, it would enter busy-loop after any signal arrives. Because the latch is kept set and WaitLatch() always exits immediately in that case.

Done.

+       /*
+        * Wait in steps of waittime milliseconds until this function exits or
+        * timeout.
+        */
+       int64   waittime = 10;

10 ms per cycle seems too frequent?

Increased it to 100msec.

+                       ereport(WARNING,
+                                       (errmsg("timeout cannot be negative or zero: %lld",
+                                                       (long long int) timeout)));
+
+                       result = false;

IMO the parameter should be verified before doing the actual thing.

Done.

Why is WARNING thrown in this case? Isn't it better to throw ERROR like pg_promote() does?

Done.

Attaching v9 patch for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v9-0001-pg_terminate_backend-with-wait-and-timeout.patchapplication/x-patch; name=v9-0001-pg_terminate_backend-with-wait-and-timeout.patchDownload
From ddcf3f3f4a3dbabdf558b828c5728dfbbdc13a31 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Tue, 16 Mar 2021 14:46:41 +0530
Subject: [PATCH v9] pg_terminate_backend with wait and timeout

This patch adds extra default arguments wait and timeout
to pg_terminate_backend which will be used to terminate
and wait timeout milliseconds for the backend's termination.

This patch also adds a new function
pg_wait_for_backend_termination(pid, timeout) which checks
for the existence of the backend with given PID and waits
timeout milliseconds for its termination.
---
 doc/src/sgml/func.sgml                |  31 +++++-
 doc/src/sgml/monitoring.sgml          |   4 +
 src/backend/catalog/system_views.sql  |  10 ++
 src/backend/postmaster/pgstat.c       |   3 +
 src/backend/storage/ipc/signalfuncs.c | 143 +++++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat       |   9 +-
 src/include/pgstat.h                  |   3 +-
 7 files changed, 194 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9492a3c6b9..0a417a0b2b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24821,7 +24821,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <indexterm>
          <primary>pg_terminate_backend</primary>
         </indexterm>
-        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type> )
+        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>wait</parameter> <type>boolean</type> <literal>DEFAULT</literal> <literal>false</literal>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>100</literal> )
         <returnvalue>boolean</returnvalue>
        </para>
        <para>
@@ -24829,7 +24829,34 @@ SELECT collation for ('foo' COLLATE "de_DE");
         specified process ID.  This is also allowed if the calling role
         is a member of the role whose backend is being terminated or the
         calling role has been granted <literal>pg_signal_backend</literal>,
-        however only superusers can terminate superuser backends.
+        however only superusers can terminate superuser backends. When no
+        <parameter>wait</parameter> and <parameter>timeout</parameter> are
+        specified, this function sends SIGTERM to the backend with the given
+        process ID without waiting for its actual termination and returns <literal>true</literal>.
+        Sometimes the backend process may still exist. With <parameter>wait</parameter>
+        specified as <literal>true</literal>, the function ensures that the
+        backend process is actually terminated. It keeps checking for the
+        existence of the backend process either until the given <parameter>timeout</parameter>
+        or default 100 milliseconds and returns <literal>true</literal>. On
+        timeout a warning is emitted and <literal>false</literal> is returned.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_wait_for_backend_termination</primary>
+        </indexterm>
+        <function>pg_wait_for_backend_termination</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>100</literal> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Checks the existence of backend process with specified process ID. It
+        waits until the given <parameter>timeout</parameter> or the default 100
+        milliseconds. <literal>true</literal> is returned when the backend is
+        found to be not existing within the <parameter>timeout</parameter> milliseconds.
+        On timeout, a warning is emitted and <literal>false</literal> is
+        returned.
        </para></entry>
       </row>
      </tbody>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index db4b4e460c..f362184d5e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1569,6 +1569,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
     </thead>
 
     <tbody>
+    <row>
+      <entry><literal>BackendTermination</literal></entry>
+      <entry>Waiting for the termination of a backend.</entry>
+     </row>
      <row>
       <entry><literal>BackupWaitWalArchive</literal></entry>
       <entry>Waiting for WAL files required for a backup to be successfully
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 0dca65dc7b..85521546e3 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1277,6 +1277,16 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION
+  pg_terminate_backend(pid integer, wait boolean DEFAULT false, timeout int8 DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_terminate_backend'
+  PARALLEL SAFE;
+
+CREATE OR REPLACE FUNCTION
+  pg_wait_for_backend_termination(pid integer, timeout int8 DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_wait_for_backend_termination'
+  PARALLEL SAFE;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b1e2d94951..2f72241d2f 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4130,6 +4130,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_XACT_GROUP_UPDATE:
 			event_name = "XactGroupUpdate";
 			break;
+		case WAIT_EVENT_BACKEND_TERMINATION:
+			event_name = "BackendTermination";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 69fe23a256..8d879af47c 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -18,6 +18,7 @@
 
 #include "catalog/pg_authid.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/syslogger.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
@@ -126,15 +127,103 @@ pg_cancel_backend(PG_FUNCTION_ARGS)
 }
 
 /*
- * Signal to terminate a backend process.  This is allowed if you are a member
- * of the role whose process is being terminated.
+ * Wait until there is no backend process with the given PID and return true.
+ * On timeout, a warning is emitted and false is returned.
+ */
+static bool
+pg_wait_until_termination(int pid, int64 timeout)
+{
+	/*
+	 * Wait in steps of waittime milliseconds until this function exits or
+	 * timeout.
+	 */
+	int64	waittime = 100;
+	/*
+	 * Initially remaining time is the entire timeout specified by the user.
+	 */
+	int64	remainingtime = timeout;
+
+	/*
+	 * Check existence of the backend. If it doesn't exist, then check for the
+	 * errno ESRCH that confirms it and return true. If the backend still
+	 * exists, then wait for waittime milliseconds, again check for the
+	 * existence. Repeat this until timeout or an error occurs or a pending
+	 * interrupt such as query cancel gets processed.
+	 */
+	do
+	{
+		if (remainingtime < waittime)
+			waittime = remainingtime;
+
+		if (kill(pid, 0) == -1)
+		{
+			if (errno == ESRCH)
+				return true;
+			else
+			{
+				ereport(WARNING,
+						(errmsg("could not check the existence of the backend with PID %d: %m",
+								pid)));
+				return false;
+			}
+		}
+
+		/* Process interrupts, if any, before going for wait. */
+		CHECK_FOR_INTERRUPTS();
+
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						 waittime,
+						 WAIT_EVENT_BACKEND_TERMINATION);
+
+		ResetLatch(MyLatch);
+
+		remainingtime -= waittime;
+	} while (remainingtime > 0);
+
+	ereport(WARNING,
+			(errmsg("could not wait for the termination of the backend with PID %d within %lld milliseconds",
+					pid, (long long int) timeout)));
+
+	return false;
+}
+
+/*
+ * Signal to terminate a backend process. This is allowed if you are a member
+ * of the role whose process is being terminated. If wait input argument is
+ * true, then wait until given timeout (default 100) milliseconds or no backend
+ * has the given PID. On timeout, a warning is emitted. Self termination is
+ * allowed but waiting is not, because once the backend is self-terminated
+ * there is no point in waiting for it to go away. If wait input argument is
+ * false (which is default), then this function just signals the backend and
+ * exits.
  *
  * Note that only superusers can signal superuser-owned processes.
  */
 Datum
 pg_terminate_backend(PG_FUNCTION_ARGS)
 {
-	int			r = pg_signal_backend(PG_GETARG_INT32(0), SIGTERM);
+	int	 pid;
+	int	 r;
+	bool wait;
+	int timeout;
+	bool result;
+
+	pid = PG_GETARG_INT32(0);
+	wait = PG_GETARG_BOOL(1);
+
+	/* If asked to wait, check whether the timeout value is valid or not. */
+	if (wait && pid != MyProcPid)
+	{
+		timeout = PG_GETARG_INT64(2);
+
+		if (timeout <= 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+					 errmsg("\"timeout\" must not be negative or zero")));
+	}
+
+	r = pg_signal_backend(pid, SIGTERM);
 
 	if (r == SIGNAL_BACKEND_NOSUPERUSER)
 		ereport(ERROR,
@@ -146,7 +235,53 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 				 errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend")));
 
-	PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
+	result = (r == SIGNAL_BACKEND_SUCCESS);
+
+	/*
+	 * Wait only if requested and the termination is successful. Self
+	 * termination is allowed but waiting is not.
+	 */
+	if (wait && pid != MyProcPid && result)
+		result = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(result);
+}
+
+/*
+ * This function waits for a backend with the given PID to be terminated or
+ * until the timeout milliseconds occurs. If the PID is of the backend which
+ * issued this function, either timeout can occur or the function can succeed
+ * if the backend gets terminated by some other process, say postmaster.
+ */
+Datum
+pg_wait_for_backend_termination(PG_FUNCTION_ARGS)
+{
+	int	 pid;
+	int64	timeout;
+	PGPROC	*proc = NULL;
+	bool	result = false;
+
+	pid = PG_GETARG_INT32(0);
+	timeout = PG_GETARG_INT64(1);
+
+	if (timeout <= 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				errmsg("\"timeout\" must not be negative or zero")));
+
+	proc = BackendPidGetProc(pid);
+
+	if (proc == NULL)
+	{
+		ereport(WARNING,
+				(errmsg("PID %d is not a PostgreSQL server process", pid)));
+
+		PG_RETURN_BOOL(result);
+	}
+
+	result = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(result);
 }
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 93393fcfd4..3f158de1dc 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6143,9 +6143,14 @@
 { oid => '2171', descr => 'cancel a server process\' current query',
   proname => 'pg_cancel_backend', provolatile => 'v', prorettype => 'bool',
   proargtypes => 'int4', prosrc => 'pg_cancel_backend' },
-{ oid => '2096', descr => 'terminate a server process',
+{ oid => '2096', descr => 'terminate a backend process and if specified, wait for its exit or until timeout occurs',
   proname => 'pg_terminate_backend', provolatile => 'v', prorettype => 'bool',
-  proargtypes => 'int4', prosrc => 'pg_terminate_backend' },
+  proargtypes => 'int4 bool int8', proargnames => '{pid,wait,timeout}',
+  prosrc => 'pg_terminate_backend' },
+{ oid => '16386', descr => 'wait for a backend process exit or timeout occurs',
+  proname => 'pg_wait_for_backend_termination', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'int4 int8', proargnames => '{pid,timeout}',
+  prosrc => 'pg_wait_for_backend_termination' },
 { oid => '2172', descr => 'prepare for taking an online backup',
   proname => 'pg_start_backup', provolatile => 'v', proparallel => 'r',
   prorettype => 'pg_lsn', proargtypes => 'text bool bool',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be43c04802..bbfe2f2f04 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -967,7 +967,8 @@ typedef enum
  */
 typedef enum
 {
-	WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE = PG_WAIT_IPC,
+	WAIT_EVENT_BACKEND_TERMINATION = PG_WAIT_IPC,
+	WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE,
 	WAIT_EVENT_BGWORKER_SHUTDOWN,
 	WAIT_EVENT_BGWORKER_STARTUP,
 	WAIT_EVENT_BTREE_PAGE,
-- 
2.25.1

#34Magnus Hagander
magnus@hagander.net
In reply to: Bharath Rupireddy (#33)
Re: A new function to wait for the backend exit after termination

On Tue, Mar 16, 2021 at 10:38 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Mon, Mar 15, 2021 at 10:38 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2021/03/15 12:27, Bharath Rupireddy wrote:

On Sun, Mar 7, 2021 at 2:39 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Attaching v7 patch for further review.

Attaching v8 patch after rebasing on to the latest master.

Thanks for rebasing the patch!

Thanks for reviewing.

-       WAIT_EVENT_XACT_GROUP_UPDATE
+       WAIT_EVENT_XACT_GROUP_UPDATE,
+       WAIT_EVENT_BACKEND_TERMINATION

These should be listed in alphabetical order.

Done.

In pg_wait_until_termination's do-while loop, ResetLatch() should be called. Otherwise, it would enter busy-loop after any signal arrives. Because the latch is kept set and WaitLatch() always exits immediately in that case.

Done.

+       /*
+        * Wait in steps of waittime milliseconds until this function exits or
+        * timeout.
+        */
+       int64   waittime = 10;

10 ms per cycle seems too frequent?

Increased it to 100msec.

+                       ereport(WARNING,
+                                       (errmsg("timeout cannot be negative or zero: %lld",
+                                                       (long long int) timeout)));
+
+                       result = false;

IMO the parameter should be verified before doing the actual thing.

Done.

Why is WARNING thrown in this case? Isn't it better to throw ERROR like pg_promote() does?

Done.

Attaching v9 patch for further review.

Almost there :)

Does it really make sense that pg_wait_for_backend_termination()
defaults to waiting *100 milliseconds*, and then logs a warning? That
seems extremely short if I'm explicitly asking it to wait.

I'd argue that 100ms is too short for pg_terminate_backend() as well,
but I think it's a bit more reasonable there.

Wait events should be in alphabetical order in pgstat_get_wait_ipc()
as well, not just in the header (which was adjusted per Fujii's
comment)

+ (errmsg("could not wait for the termination of
the backend with PID %d within %lld milliseconds",

That's not true though? The wait succeeded, it just timed out? Isn't
itm ore like "backend with PID %d did not terminate within %lld
milliseconds"?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#35Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Magnus Hagander (#34)
1 attachment(s)
Re: A new function to wait for the backend exit after termination

On Tue, Mar 16, 2021 at 9:48 PM Magnus Hagander <magnus@hagander.net> wrote:

Does it really make sense that pg_wait_for_backend_termination()
defaults to waiting *100 milliseconds*, and then logs a warning? That
seems extremely short if I'm explicitly asking it to wait.

I increased the default wait timeout to 5seconds.

Wait events should be in alphabetical order in pgstat_get_wait_ipc()
as well, not just in the header (which was adjusted per Fujii's
comment)

Done.

+ (errmsg("could not wait for the termination of
the backend with PID %d within %lld milliseconds",

That's not true though? The wait succeeded, it just timed out? Isn't
itm ore like "backend with PID %d did not terminate within %lld
milliseconds"?

Looks better. Done.

Attaching v10 patch for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v10-0001-pg_terminate_backend-with-wait-and-timeout.patchapplication/octet-stream; name=v10-0001-pg_terminate_backend-with-wait-and-timeout.patchDownload
From 706decaed16d501c637390fe8d58abc9315b6de5 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Wed, 17 Mar 2021 06:55:27 +0530
Subject: [PATCH v10] pg_terminate_backend with wait and timeout

This patch adds extra default arguments wait and timeout
to pg_terminate_backend which will be used to terminate
and wait timeout milliseconds for the backend's termination.

This patch also adds a new function
pg_wait_for_backend_termination(pid, timeout) which checks
for the existence of the backend with given PID and waits
timeout milliseconds for its termination.
---
 doc/src/sgml/func.sgml                |  31 +++++-
 doc/src/sgml/monitoring.sgml          |   4 +
 src/backend/catalog/system_views.sql  |  10 ++
 src/backend/postmaster/pgstat.c       |   3 +
 src/backend/storage/ipc/signalfuncs.c | 143 +++++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat       |   9 +-
 src/include/pgstat.h                  |   3 +-
 7 files changed, 194 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9492a3c6b9..c0f47008ad 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24821,7 +24821,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <indexterm>
          <primary>pg_terminate_backend</primary>
         </indexterm>
-        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type> )
+        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>wait</parameter> <type>boolean</type> <literal>DEFAULT</literal> <literal>false</literal>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>100</literal> )
         <returnvalue>boolean</returnvalue>
        </para>
        <para>
@@ -24829,7 +24829,34 @@ SELECT collation for ('foo' COLLATE "de_DE");
         specified process ID.  This is also allowed if the calling role
         is a member of the role whose backend is being terminated or the
         calling role has been granted <literal>pg_signal_backend</literal>,
-        however only superusers can terminate superuser backends.
+        however only superusers can terminate superuser backends. When no
+        <parameter>wait</parameter> and <parameter>timeout</parameter> are
+        specified, this function sends SIGTERM to the backend with the given
+        process ID without waiting for its actual termination and returns <literal>true</literal>.
+        Sometimes the backend process may still exist. With <parameter>wait</parameter>
+        specified as <literal>true</literal>, the function ensures that the
+        backend process is actually terminated. It keeps checking for the
+        existence of the backend process either until the given <parameter>timeout</parameter>
+        or default 100 milliseconds and returns <literal>true</literal>. On
+        timeout a warning is emitted and <literal>false</literal> is returned.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_wait_for_backend_termination</primary>
+        </indexterm>
+        <function>pg_wait_for_backend_termination</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>5000</literal> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Checks the existence of backend process with specified process ID. It
+        waits until the given <parameter>timeout</parameter> or the default
+        5000 milliseconds or 5 seconds. <literal>true</literal> is returned
+        when the backend is found to be not existing within the
+        <parameter>timeout</parameter> milliseconds. On timeout, a warning is
+        emitted and <literal>false</literal> is returned.
        </para></entry>
       </row>
      </tbody>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index db4b4e460c..f362184d5e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1569,6 +1569,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
     </thead>
 
     <tbody>
+    <row>
+      <entry><literal>BackendTermination</literal></entry>
+      <entry>Waiting for the termination of a backend.</entry>
+     </row>
      <row>
       <entry><literal>BackupWaitWalArchive</literal></entry>
       <entry>Waiting for WAL files required for a backup to be successfully
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 0dca65dc7b..ee2ee8204a 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1277,6 +1277,16 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION
+  pg_terminate_backend(pid integer, wait boolean DEFAULT false, timeout int8 DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_terminate_backend'
+  PARALLEL SAFE;
+
+CREATE OR REPLACE FUNCTION
+  pg_wait_for_backend_termination(pid integer, timeout int8 DEFAULT 5000)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_wait_for_backend_termination'
+  PARALLEL SAFE;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b1e2d94951..ce34d39a73 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3998,6 +3998,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 
 	switch (w)
 	{
+		case WAIT_EVENT_BACKEND_TERMINATION:
+			event_name = "BackendTermination";
+			break;
 		case WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE:
 			event_name = "BackupWaitWalArchive";
 			break;
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 69fe23a256..331cc9c2be 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -18,6 +18,7 @@
 
 #include "catalog/pg_authid.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/syslogger.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
@@ -126,15 +127,103 @@ pg_cancel_backend(PG_FUNCTION_ARGS)
 }
 
 /*
- * Signal to terminate a backend process.  This is allowed if you are a member
- * of the role whose process is being terminated.
+ * Wait until there is no backend process with the given PID and return true.
+ * On timeout, a warning is emitted and false is returned.
+ */
+static bool
+pg_wait_until_termination(int pid, int64 timeout)
+{
+	/*
+	 * Wait in steps of waittime milliseconds until this function exits or
+	 * timeout.
+	 */
+	int64	waittime = 100;
+	/*
+	 * Initially remaining time is the entire timeout specified by the user.
+	 */
+	int64	remainingtime = timeout;
+
+	/*
+	 * Check existence of the backend. If it doesn't exist, then check for the
+	 * errno ESRCH that confirms it and return true. If the backend still
+	 * exists, then wait for waittime milliseconds, again check for the
+	 * existence. Repeat this until timeout or an error occurs or a pending
+	 * interrupt such as query cancel gets processed.
+	 */
+	do
+	{
+		if (remainingtime < waittime)
+			waittime = remainingtime;
+
+		if (kill(pid, 0) == -1)
+		{
+			if (errno == ESRCH)
+				return true;
+			else
+			{
+				ereport(WARNING,
+						(errmsg("could not check the existence of the backend with PID %d: %m",
+								pid)));
+				return false;
+			}
+		}
+
+		/* Process interrupts, if any, before going for wait. */
+		CHECK_FOR_INTERRUPTS();
+
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						 waittime,
+						 WAIT_EVENT_BACKEND_TERMINATION);
+
+		ResetLatch(MyLatch);
+
+		remainingtime -= waittime;
+	} while (remainingtime > 0);
+
+	ereport(WARNING,
+			(errmsg("backend with PID %d did not terminate within %lld milliseconds",
+					pid, (long long int) timeout)));
+
+	return false;
+}
+
+/*
+ * Signal to terminate a backend process. This is allowed if you are a member
+ * of the role whose process is being terminated. If wait input argument is
+ * true, then wait until given timeout (default 100) milliseconds or no backend
+ * has the given PID. On timeout, a warning is emitted. Self termination is
+ * allowed but waiting is not, because once the backend is self-terminated
+ * there is no point in waiting for it to go away. If wait input argument is
+ * false (which is default), then this function just signals the backend and
+ * exits.
  *
  * Note that only superusers can signal superuser-owned processes.
  */
 Datum
 pg_terminate_backend(PG_FUNCTION_ARGS)
 {
-	int			r = pg_signal_backend(PG_GETARG_INT32(0), SIGTERM);
+	int	 pid;
+	int	 r;
+	bool wait;
+	int timeout;
+	bool result;
+
+	pid = PG_GETARG_INT32(0);
+	wait = PG_GETARG_BOOL(1);
+
+	/* If asked to wait, check whether the timeout value is valid or not. */
+	if (wait && pid != MyProcPid)
+	{
+		timeout = PG_GETARG_INT64(2);
+
+		if (timeout <= 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+					 errmsg("\"timeout\" must not be negative or zero")));
+	}
+
+	r = pg_signal_backend(pid, SIGTERM);
 
 	if (r == SIGNAL_BACKEND_NOSUPERUSER)
 		ereport(ERROR,
@@ -146,7 +235,53 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 				 errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend")));
 
-	PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
+	result = (r == SIGNAL_BACKEND_SUCCESS);
+
+	/*
+	 * Wait only if requested and the termination is successful. Self
+	 * termination is allowed but waiting is not.
+	 */
+	if (wait && pid != MyProcPid && result)
+		result = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(result);
+}
+
+/*
+ * This function waits for a backend with the given PID to be terminated or
+ * until the timeout milliseconds occurs. If the PID is of the backend which
+ * issued this function, either timeout can occur or the function can succeed
+ * if the backend gets terminated by some other process, say postmaster.
+ */
+Datum
+pg_wait_for_backend_termination(PG_FUNCTION_ARGS)
+{
+	int	 pid;
+	int64	timeout;
+	PGPROC	*proc = NULL;
+	bool	result = false;
+
+	pid = PG_GETARG_INT32(0);
+	timeout = PG_GETARG_INT64(1);
+
+	if (timeout <= 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				errmsg("\"timeout\" must not be negative or zero")));
+
+	proc = BackendPidGetProc(pid);
+
+	if (proc == NULL)
+	{
+		ereport(WARNING,
+				(errmsg("PID %d is not a PostgreSQL server process", pid)));
+
+		PG_RETURN_BOOL(result);
+	}
+
+	result = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(result);
 }
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 93393fcfd4..3f158de1dc 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6143,9 +6143,14 @@
 { oid => '2171', descr => 'cancel a server process\' current query',
   proname => 'pg_cancel_backend', provolatile => 'v', prorettype => 'bool',
   proargtypes => 'int4', prosrc => 'pg_cancel_backend' },
-{ oid => '2096', descr => 'terminate a server process',
+{ oid => '2096', descr => 'terminate a backend process and if specified, wait for its exit or until timeout occurs',
   proname => 'pg_terminate_backend', provolatile => 'v', prorettype => 'bool',
-  proargtypes => 'int4', prosrc => 'pg_terminate_backend' },
+  proargtypes => 'int4 bool int8', proargnames => '{pid,wait,timeout}',
+  prosrc => 'pg_terminate_backend' },
+{ oid => '16386', descr => 'wait for a backend process exit or timeout occurs',
+  proname => 'pg_wait_for_backend_termination', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'int4 int8', proargnames => '{pid,timeout}',
+  prosrc => 'pg_wait_for_backend_termination' },
 { oid => '2172', descr => 'prepare for taking an online backup',
   proname => 'pg_start_backup', provolatile => 'v', proparallel => 'r',
   prorettype => 'pg_lsn', proargtypes => 'text bool bool',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be43c04802..bbfe2f2f04 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -967,7 +967,8 @@ typedef enum
  */
 typedef enum
 {
-	WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE = PG_WAIT_IPC,
+	WAIT_EVENT_BACKEND_TERMINATION = PG_WAIT_IPC,
+	WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE,
 	WAIT_EVENT_BGWORKER_SHUTDOWN,
 	WAIT_EVENT_BGWORKER_STARTUP,
 	WAIT_EVENT_BTREE_PAGE,
-- 
2.25.1

#36Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bharath Rupireddy (#35)
Re: A new function to wait for the backend exit after termination

At Wed, 17 Mar 2021 07:01:39 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in

Attaching v10 patch for further review.

The time-out mechanism doesn't count remainingtime as expected,
concretely it does the following.

do {
kill();
WaitLatch(WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, waittime);
ResetLatch(MyLatch);
remainingtime -= waittime;
} while (remainingtime > 0);

So, the WaitLatch doesn't consume as much time as the set waittime in
case of latch set. remainingtime reduces faster than the real at the
iteration.

It wouldn't happen actually but I concern about PID recycling. We can
make sure to get rid of the fear by checking for our BEENTRY instead
of PID. However, it seems to me that some additional function is
needed in pgstat.c so that we can check the realtime value of
PgBackendStatus, which might be too much.

+	/* If asked to wait, check whether the timeout value is valid or not. */
+	if (wait && pid != MyProcPid)
+	{
+		timeout = PG_GETARG_INT64(2);
+
+		if (timeout <= 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+					 errmsg("\"timeout\" must not be negative or zero")));

This means that pg_terminate_backend accepts negative timeouts when
terminating myself, which looks odd.

Is there any reason to reject 0 as timeout?

+	 * Wait only if requested and the termination is successful. Self
+	 * termination is allowed but waiting is not.
+	 */
+	if (wait && pid != MyProcPid && result)
+		result = pg_wait_until_termination(pid, timeout);

Why don't we wait for myself to be terminated? There's no guarantee
that myself will be terminated without failure. (I agree that that is
not so useful, but I think there's no reason not to do so.)

The first suggested signature for pg_terminate_backend() with timeout
was pg_terminate_backend(pid, timeout). The current signature (pid,
wait?, timeout) looks redundant. Maybe the reason for rejecting 0
astimeout is pg_terminate_backend(pid, true, 0) looks odd but it we
can wait forever in that case (as other features does). On the other
hand pg_terminate_backend(pid, false, 100) is apparently odd but this
patch doesn't seem to reject it. If there's no considerable reason
for the current signature, I would suggest that:

pg_terminate_backend(pid, timeout), where it waits forever if timeout
is zero and waits for the timeout if positive. Negative values are not
accepted.

That being said, I didn't find the disucssion about allowing default
timeout value by separating the boolean, if it is the consensus on
this thread, sorry for the noise.

+				ereport(WARNING,
+						(errmsg("could not check the existence of the backend with PID %d: %m",
+								pid)));
+				return false;

I think this is worth ERROR. We can avoid this handling if we look
into PgBackendEntry instead.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#37Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Kyotaro Horiguchi (#36)
Re: A new function to wait for the backend exit after termination

On Wed, Mar 17, 2021 at 8:28 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Wed, 17 Mar 2021 07:01:39 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in

Attaching v10 patch for further review.

The time-out mechanism doesn't count remainingtime as expected,
concretely it does the following.

do {
kill();
WaitLatch(WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, waittime);
ResetLatch(MyLatch);
remainingtime -= waittime;
} while (remainingtime > 0);

So, the WaitLatch doesn't consume as much time as the set waittime in
case of latch set. remainingtime reduces faster than the real at the
iteration.

WaitLatch can exit without waiting for the waittime duration whenever
the MyLatch is set (SetLatch). Now the question is how frequently
SetLatch can get called in a backend? For instance, if we keep calling
pg_reload_conf in any of the backends in the cluster, then the
SetLatch will be called and the timeout in pg_wait_until_termination
will be reached fastly. I see that this problem can also exist in
case of pg_promote function. Similarly it may exist in other places
where we have WaitLatch for timeouts.

IMO, the frequency of SetLatch calls may not be that much in real time
scenarios. If at all, the latch gets set too frequently, then the
terminate and wait functions might timeout earlier. But is it a
critical problem to worry about? (IMHO, it's not that critical) If
yes, we might as well need to fix it (I don't know how?) in other
critical areas like pg_promote?

It wouldn't happen actually but I concern about PID recycling. We can
make sure to get rid of the fear by checking for our BEENTRY instead
of PID. However, it seems to me that some additional function is
needed in pgstat.c so that we can check the realtime value of
PgBackendStatus, which might be too much.

The aim of the wait logic is to ensure that the process is gone from
the system processes that is why using kill(), not it's entries are
gone from the shared memory.

+       /* If asked to wait, check whether the timeout value is valid or not. */
+       if (wait && pid != MyProcPid)
+       {
+               timeout = PG_GETARG_INT64(2);
+
+               if (timeout <= 0)
+                       ereport(ERROR,
+                                       (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+                                        errmsg("\"timeout\" must not be negative or zero")));

This means that pg_terminate_backend accepts negative timeouts when
terminating myself, which looks odd.

I will change this.

Is there any reason to reject 0 as timeout?

Actually, timeout 0 should mean that "don't wait" and we can error out
on negative values. Thoughts?

+        * Wait only if requested and the termination is successful. Self
+        * termination is allowed but waiting is not.
+        */
+       if (wait && pid != MyProcPid && result)
+               result = pg_wait_until_termination(pid, timeout);

Why don't we wait for myself to be terminated? There's no guarantee
that myself will be terminated without failure. (I agree that that is
not so useful, but I think there's no reason not to do so.)

We could programmatically allow it to wait in case of self termination
and it doesn't make any difference to the user, they would see
"Terminating connection due to administrator command" FATAL error. I
can remove pid != MyProcPid.

The first suggested signature for pg_terminate_backend() with timeout
was pg_terminate_backend(pid, timeout). The current signature (pid,
wait?, timeout) looks redundant. Maybe the reason for rejecting 0
astimeout is pg_terminate_backend(pid, true, 0) looks odd but it we
can wait forever in that case (as other features does). On the other
hand pg_terminate_backend(pid, false, 100) is apparently odd but this
patch doesn't seem to reject it. If there's no considerable reason
for the current signature, I would suggest that:

pg_terminate_backend(pid, timeout), where it waits forever if timeout
is zero and waits for the timeout if positive. Negative values are not
accepted.

So, as stated above, how about a timeout 0 (which is default) telling
"don't wait", negative error out, a positive milliseconds value
indicating that we should wait after termination?

And for pg_wait_for_backend_termination timeout 0 or negative, we error out?

IMO, the above semantics are better than timeout 0 meaning "wait
forever". Thoughts?

+                               ereport(WARNING,
+                                               (errmsg("could not check the existence of the backend with PID %d: %m",
+                                                               pid)));
+                               return false;

I think this is worth ERROR. We can avoid this handling if we look
into PgBackendEntry instead.

I will change it to ERROR.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#38Zhihong Yu
zyu@yugabyte.com
In reply to: Bharath Rupireddy (#37)
Re: A new function to wait for the backend exit after termination

Hi,
w.r.t. WaitLatch(), if its return value is WL_TIMEOUT, we know the
specified timeout has elapsed.
It seems WaitLatch() can be enhanced to also return the actual duration of
the wait.
This way, the caller can utilize the duration directly.

As for other places where WaitLatch() is called, similar change can be
applied on a per-case basis (with separate patches, not under this topic).

Cheers

On Wed, Mar 17, 2021 at 2:04 AM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:

Show quoted text

On Wed, Mar 17, 2021 at 8:28 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Wed, 17 Mar 2021 07:01:39 +0530, Bharath Rupireddy <

bharath.rupireddyforpostgres@gmail.com> wrote in

Attaching v10 patch for further review.

The time-out mechanism doesn't count remainingtime as expected,
concretely it does the following.

do {
kill();
WaitLatch(WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, waittime);
ResetLatch(MyLatch);
remainingtime -= waittime;
} while (remainingtime > 0);

So, the WaitLatch doesn't consume as much time as the set waittime in
case of latch set. remainingtime reduces faster than the real at the
iteration.

WaitLatch can exit without waiting for the waittime duration whenever
the MyLatch is set (SetLatch). Now the question is how frequently
SetLatch can get called in a backend? For instance, if we keep calling
pg_reload_conf in any of the backends in the cluster, then the
SetLatch will be called and the timeout in pg_wait_until_termination
will be reached fastly. I see that this problem can also exist in
case of pg_promote function. Similarly it may exist in other places
where we have WaitLatch for timeouts.

IMO, the frequency of SetLatch calls may not be that much in real time
scenarios. If at all, the latch gets set too frequently, then the
terminate and wait functions might timeout earlier. But is it a
critical problem to worry about? (IMHO, it's not that critical) If
yes, we might as well need to fix it (I don't know how?) in other
critical areas like pg_promote?

It wouldn't happen actually but I concern about PID recycling. We can
make sure to get rid of the fear by checking for our BEENTRY instead
of PID. However, it seems to me that some additional function is
needed in pgstat.c so that we can check the realtime value of
PgBackendStatus, which might be too much.

The aim of the wait logic is to ensure that the process is gone from
the system processes that is why using kill(), not it's entries are
gone from the shared memory.

+ /* If asked to wait, check whether the timeout value is valid or

not. */

+       if (wait && pid != MyProcPid)
+       {
+               timeout = PG_GETARG_INT64(2);
+
+               if (timeout <= 0)
+                       ereport(ERROR,
+

(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),

+ errmsg("\"timeout\" must not be

negative or zero")));

This means that pg_terminate_backend accepts negative timeouts when
terminating myself, which looks odd.

I will change this.

Is there any reason to reject 0 as timeout?

Actually, timeout 0 should mean that "don't wait" and we can error out
on negative values. Thoughts?

+        * Wait only if requested and the termination is successful. Self
+        * termination is allowed but waiting is not.
+        */
+       if (wait && pid != MyProcPid && result)
+               result = pg_wait_until_termination(pid, timeout);

Why don't we wait for myself to be terminated? There's no guarantee
that myself will be terminated without failure. (I agree that that is
not so useful, but I think there's no reason not to do so.)

We could programmatically allow it to wait in case of self termination
and it doesn't make any difference to the user, they would see
"Terminating connection due to administrator command" FATAL error. I
can remove pid != MyProcPid.

The first suggested signature for pg_terminate_backend() with timeout
was pg_terminate_backend(pid, timeout). The current signature (pid,
wait?, timeout) looks redundant. Maybe the reason for rejecting 0
astimeout is pg_terminate_backend(pid, true, 0) looks odd but it we
can wait forever in that case (as other features does). On the other
hand pg_terminate_backend(pid, false, 100) is apparently odd but this
patch doesn't seem to reject it. If there's no considerable reason
for the current signature, I would suggest that:

pg_terminate_backend(pid, timeout), where it waits forever if timeout
is zero and waits for the timeout if positive. Negative values are not
accepted.

So, as stated above, how about a timeout 0 (which is default) telling
"don't wait", negative error out, a positive milliseconds value
indicating that we should wait after termination?

And for pg_wait_for_backend_termination timeout 0 or negative, we error
out?

IMO, the above semantics are better than timeout 0 meaning "wait
forever". Thoughts?

+                               ereport(WARNING,
+                                               (errmsg("could not check

the existence of the backend with PID %d: %m",

+                                                               pid)));
+                               return false;

I think this is worth ERROR. We can avoid this handling if we look
into PgBackendEntry instead.

I will change it to ERROR.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#39Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Kyotaro Horiguchi (#36)
Re: A new function to wait for the backend exit after termination

On 2021/03/17 11:58, Kyotaro Horiguchi wrote:

The first suggested signature for pg_terminate_backend() with timeout
was pg_terminate_backend(pid, timeout). The current signature (pid,
wait?, timeout) looks redundant. Maybe the reason for rejecting 0
astimeout is pg_terminate_backend(pid, true, 0) looks odd but it we
can wait forever in that case (as other features does).

I'm afraid that "waiting forever" can cause something like deadlock situation,
as follows. We have no mechanism to detect this for now.

1. backend 1 took the lock on the relation A.
2. backend 2 took the lock on the relation B.
3. backend 1 tries to take the lock on the relation B and is waiting for
the lock to be released.
4. backend 2 accidentally executes pg_wait_for_backend_termination() with
the pid of backend 1, and then is waiting for backend 1 to be terminated.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#40Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Fujii Masao (#39)
Re: A new function to wait for the backend exit after termination

On Thu, Mar 18, 2021 at 12:46 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2021/03/17 11:58, Kyotaro Horiguchi wrote:

The first suggested signature for pg_terminate_backend() with timeout
was pg_terminate_backend(pid, timeout). The current signature (pid,
wait?, timeout) looks redundant. Maybe the reason for rejecting 0
astimeout is pg_terminate_backend(pid, true, 0) looks odd but it we
can wait forever in that case (as other features does).

I'm afraid that "waiting forever" can cause something like deadlock situation,
as follows. We have no mechanism to detect this for now.

1. backend 1 took the lock on the relation A.
2. backend 2 took the lock on the relation B.
3. backend 1 tries to take the lock on the relation B and is waiting for
the lock to be released.
4. backend 2 accidentally executes pg_wait_for_backend_termination() with
the pid of backend 1, and then is waiting for backend 1 to be terminated.

Yeah this can happen.

So, as stated upthread, how about a timeout 0 (which is default)
telling "don't wait", erroring out on negative value and when
specified a positive milliseconds value, then wait for that amount of
time. With this semantics, we can remove the wait flag for
pg_terminate_backend(pid, 0). Thoughts?

And for pg_wait_for_backend_termination timeout 0 or negative, we
error out. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#41Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#40)
1 attachment(s)
Re: A new function to wait for the backend exit after termination

On Thu, Mar 18, 2021 at 1:11 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Mar 18, 2021 at 12:46 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2021/03/17 11:58, Kyotaro Horiguchi wrote:

The first suggested signature for pg_terminate_backend() with timeout
was pg_terminate_backend(pid, timeout). The current signature (pid,
wait?, timeout) looks redundant. Maybe the reason for rejecting 0
astimeout is pg_terminate_backend(pid, true, 0) looks odd but it we
can wait forever in that case (as other features does).

I'm afraid that "waiting forever" can cause something like deadlock situation,
as follows. We have no mechanism to detect this for now.

1. backend 1 took the lock on the relation A.
2. backend 2 took the lock on the relation B.
3. backend 1 tries to take the lock on the relation B and is waiting for
the lock to be released.
4. backend 2 accidentally executes pg_wait_for_backend_termination() with
the pid of backend 1, and then is waiting for backend 1 to be terminated.

Yeah this can happen.

So, as stated upthread, how about a timeout 0 (which is default)
telling "don't wait", erroring out on negative value and when
specified a positive milliseconds value, then wait for that amount of
time. With this semantics, we can remove the wait flag for
pg_terminate_backend(pid, 0). Thoughts?

And for pg_wait_for_backend_termination timeout 0 or negative, we
error out. Thoughts?

Attaching v11 patch that removed the wait boolean flag in the
pg_terminate_backend and timeout 0 indicates "no wait", negative value
"errors out", positive value "waits for those many milliseconds". Also
addressed other review comments that I received upthread. Please
review v11 further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v11-0001-pg_terminate_backend-with-wait-and-timeout.patchapplication/x-patch; name=v11-0001-pg_terminate_backend-with-wait-and-timeout.patchDownload
From 96c74184bdf378cb6105ae7d24bfc7a6888b955a Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Fri, 19 Mar 2021 11:19:36 +0530
Subject: [PATCH v11] pg_terminate_backend with wait and timeout

This patch adds extra default argument timeout (with default
value 0 meaning don't wait), to pg_terminate_backend which
will be used to terminate and wait timeout milliseconds for
the backend's termination.

This patch also adds a new function
pg_wait_for_backend_termination(pid, timeout) which checks
for the existence of the backend with given PID and waits
timeout milliseconds for its termination.
---
 doc/src/sgml/func.sgml                |  32 ++++++-
 doc/src/sgml/monitoring.sgml          |   4 +
 src/backend/catalog/system_views.sql  |  10 ++
 src/backend/postmaster/pgstat.c       |   3 +
 src/backend/storage/ipc/signalfuncs.c | 132 +++++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat       |   9 +-
 src/include/pgstat.h                  |   3 +-
 7 files changed, 184 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9492a3c6b9..523e6dd72a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24821,7 +24821,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <indexterm>
          <primary>pg_terminate_backend</primary>
         </indexterm>
-        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type> )
+        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>0</literal> )
         <returnvalue>boolean</returnvalue>
        </para>
        <para>
@@ -24829,7 +24829,35 @@ SELECT collation for ('foo' COLLATE "de_DE");
         specified process ID.  This is also allowed if the calling role
         is a member of the role whose backend is being terminated or the
         calling role has been granted <literal>pg_signal_backend</literal>,
-        however only superusers can terminate superuser backends.
+        however only superusers can terminate superuser backends. When
+        <parameter>timeout</parameter> is not specified, this function sends
+        SIGTERM to the backend with the given process ID without waiting for
+        its actual termination and returns <literal>true</literal>. Sometimes
+        the backend process may still exist. With <parameter>timeout</parameter>
+        specified in milliseconds, the function ensures that the backend
+        process is actually goes away from the system processes. It keeps
+        checking for the existence of the backend process until the given
+        <parameter>timeout</parameter> occurs and returns <literal>true</literal>
+        if the process doesn't exit. On timeout a warning is emitted and
+        <literal>false</literal> is returned.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_wait_for_backend_termination</primary>
+        </indexterm>
+        <function>pg_wait_for_backend_termination</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>5000</literal> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Checks the existence of backend process with specified process ID. It
+        waits until the given <parameter>timeout</parameter> or the default
+        5000 milliseconds or 5 seconds. <literal>true</literal> is returned
+        when the backend is found to be not existing within the
+        <parameter>timeout</parameter> milliseconds. On timeout, a warning is
+        emitted and <literal>false</literal> is returned.
        </para></entry>
       </row>
      </tbody>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index db4b4e460c..f362184d5e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1569,6 +1569,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
     </thead>
 
     <tbody>
+    <row>
+      <entry><literal>BackendTermination</literal></entry>
+      <entry>Waiting for the termination of a backend.</entry>
+     </row>
      <row>
       <entry><literal>BackupWaitWalArchive</literal></entry>
       <entry>Waiting for WAL files required for a backup to be successfully
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 0dca65dc7b..192c0ae2bd 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1277,6 +1277,16 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION
+  pg_terminate_backend(pid integer, timeout int8 DEFAULT 0)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_terminate_backend'
+  PARALLEL SAFE;
+
+CREATE OR REPLACE FUNCTION
+  pg_wait_for_backend_termination(pid integer, timeout int8 DEFAULT 5000)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_wait_for_backend_termination'
+  PARALLEL SAFE;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 208a33692f..78729f1b92 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3998,6 +3998,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 
 	switch (w)
 	{
+		case WAIT_EVENT_BACKEND_TERMINATION:
+			event_name = "BackendTermination";
+			break;
 		case WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE:
 			event_name = "BackupWaitWalArchive";
 			break;
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 69fe23a256..1e141fba7c 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -18,6 +18,7 @@
 
 #include "catalog/pg_authid.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/syslogger.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
@@ -126,15 +127,96 @@ pg_cancel_backend(PG_FUNCTION_ARGS)
 }
 
 /*
- * Signal to terminate a backend process.  This is allowed if you are a member
- * of the role whose process is being terminated.
+ * Wait until there is no backend process with the given PID and return true.
+ * On timeout, a warning is emitted and false is returned.
+ */
+static bool
+pg_wait_until_termination(int pid, int64 timeout)
+{
+	/*
+	 * Wait in steps of waittime milliseconds until this function exits or
+	 * timeout.
+	 */
+	int64	waittime = 100;
+	/*
+	 * Initially remaining time is the entire timeout specified by the user.
+	 */
+	int64	remainingtime = timeout;
+
+	/*
+	 * Check existence of the backend. If it doesn't exist, then check for the
+	 * errno ESRCH that confirms it and return true. If the backend still
+	 * exists, then wait for waittime milliseconds, again check for the
+	 * existence. Repeat this until timeout or an error occurs or a pending
+	 * interrupt such as query cancel gets processed.
+	 */
+	do
+	{
+		if (remainingtime < waittime)
+			waittime = remainingtime;
+
+		if (kill(pid, 0) == -1)
+		{
+			if (errno == ESRCH)
+				return true;
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_INTERNAL_ERROR),
+						 errmsg("could not check the existence of the backend with PID %d: %m",
+								 pid)));
+		}
+
+		/* Process interrupts, if any, before going for wait. */
+		CHECK_FOR_INTERRUPTS();
+
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						 waittime,
+						 WAIT_EVENT_BACKEND_TERMINATION);
+
+		ResetLatch(MyLatch);
+
+		remainingtime -= waittime;
+	} while (remainingtime > 0);
+
+	ereport(WARNING,
+			(errmsg("backend with PID %d did not terminate within %lld milliseconds",
+					pid, (long long int) timeout)));
+
+	return false;
+}
+
+/*
+ * Signal to terminate a backend process. This is allowed if you are a member
+ * of the role whose process is being terminated. If timeout input argument is
+ * 0 (which is default), then this function just signals the backend and
+ * doesn't wait. Otherwise it waits until given the timeout milliseconds or no
+ * process has the given PID and returns true. On timeout, a warning is emitted
+ * and false is returned.
  *
  * Note that only superusers can signal superuser-owned processes.
  */
 Datum
 pg_terminate_backend(PG_FUNCTION_ARGS)
 {
-	int			r = pg_signal_backend(PG_GETARG_INT32(0), SIGTERM);
+	int	 pid;
+	int	 r;
+	int timeout;
+	bool wait;
+	bool result;
+
+	pid = PG_GETARG_INT32(0);
+	timeout = PG_GETARG_INT64(1);
+
+	if (timeout < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+					errmsg("\"timeout\" must not be negative")));
+
+	/* If timeout is zero (which is default), then don't wait. */
+	wait = (timeout == 0) ? false : true;
+
+	r = pg_signal_backend(pid, SIGTERM);
 
 	if (r == SIGNAL_BACKEND_NOSUPERUSER)
 		ereport(ERROR,
@@ -146,7 +228,49 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 				 errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend")));
 
-	PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
+	result = (r == SIGNAL_BACKEND_SUCCESS);
+
+	/* Wait only if requested and the termination is successful. */
+	if (wait && result)
+		result = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(result);
+}
+
+/*
+ * This function waits for a backend with the given PID to be terminated or
+ * until the given timeout milliseconds occurs. Returns true if the backend
+ * doesn't exist. On Timeout a warning is emitted and false is returned.
+ */
+Datum
+pg_wait_for_backend_termination(PG_FUNCTION_ARGS)
+{
+	int	 pid;
+	int64	timeout;
+	PGPROC	*proc = NULL;
+	bool	result = false;
+
+	pid = PG_GETARG_INT32(0);
+	timeout = PG_GETARG_INT64(1);
+
+	if (timeout <= 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				errmsg("\"timeout\" must not be negative or zero")));
+
+	proc = BackendPidGetProc(pid);
+
+	if (proc == NULL)
+	{
+		ereport(WARNING,
+				(errmsg("PID %d is not a PostgreSQL server process", pid)));
+
+		PG_RETURN_BOOL(result);
+	}
+
+	result = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(result);
 }
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 93393fcfd4..2f5fa24bf0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6143,9 +6143,14 @@
 { oid => '2171', descr => 'cancel a server process\' current query',
   proname => 'pg_cancel_backend', provolatile => 'v', prorettype => 'bool',
   proargtypes => 'int4', prosrc => 'pg_cancel_backend' },
-{ oid => '2096', descr => 'terminate a server process',
+{ oid => '2096', descr => 'terminate a backend process and if timeout is specified, wait for its exit or until timeout occurs',
   proname => 'pg_terminate_backend', provolatile => 'v', prorettype => 'bool',
-  proargtypes => 'int4', prosrc => 'pg_terminate_backend' },
+  proargtypes => 'int4 int8', proargnames => '{pid,timeout}',
+  prosrc => 'pg_terminate_backend' },
+{ oid => '16386', descr => 'wait for a backend process exit or timeout occurs',
+  proname => 'pg_wait_for_backend_termination', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'int4 int8', proargnames => '{pid,timeout}',
+  prosrc => 'pg_wait_for_backend_termination' },
 { oid => '2172', descr => 'prepare for taking an online backup',
   proname => 'pg_start_backup', provolatile => 'v', proparallel => 'r',
   prorettype => 'pg_lsn', proargtypes => 'text bool bool',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be43c04802..bbfe2f2f04 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -967,7 +967,8 @@ typedef enum
  */
 typedef enum
 {
-	WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE = PG_WAIT_IPC,
+	WAIT_EVENT_BACKEND_TERMINATION = PG_WAIT_IPC,
+	WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE,
 	WAIT_EVENT_BGWORKER_SHUTDOWN,
 	WAIT_EVENT_BGWORKER_STARTUP,
 	WAIT_EVENT_BTREE_PAGE,
-- 
2.25.1

#42Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#41)
1 attachment(s)
Re: A new function to wait for the backend exit after termination

On Fri, Mar 19, 2021 at 11:37 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Attaching v11 patch that removed the wait boolean flag in the
pg_terminate_backend and timeout 0 indicates "no wait", negative value
"errors out", positive value "waits for those many milliseconds". Also
addressed other review comments that I received upthread. Please
review v11 further.

Attaching v12 patch after rebasing onto the latest master.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v12-0001-pg_terminate_backend-with-wait-and-timeout.patchapplication/octet-stream; name=v12-0001-pg_terminate_backend-with-wait-and-timeout.patchDownload
From c28058cccf77eb56369ef2f468f88884d0de369f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 5 Apr 2021 08:44:44 +0530
Subject: [PATCH v12] pg_terminate_backend with wait and timeout

This patch adds extra default argument timeout (with default
value 0 meaning don't wait), to pg_terminate_backend which
will be used to terminate and wait timeout milliseconds for
the backend's termination.

This patch also adds a new function
pg_wait_for_backend_termination(pid, timeout) which checks
for the existence of the backend with given PID and waits
timeout milliseconds for its termination.
---
 doc/src/sgml/func.sgml                  |  32 +++++-
 doc/src/sgml/monitoring.sgml            |   4 +
 src/backend/catalog/system_views.sql    |  10 ++
 src/backend/storage/ipc/signalfuncs.c   | 132 +++++++++++++++++++++++-
 src/backend/utils/activity/wait_event.c |   3 +
 src/include/catalog/pg_proc.dat         |   9 +-
 src/include/utils/wait_event.h          |   1 +
 7 files changed, 183 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 3cf243a16a..793416772b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24953,7 +24953,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <indexterm>
          <primary>pg_terminate_backend</primary>
         </indexterm>
-        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type> )
+        <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>0</literal> )
         <returnvalue>boolean</returnvalue>
        </para>
        <para>
@@ -24961,7 +24961,35 @@ SELECT collation for ('foo' COLLATE "de_DE");
         specified process ID.  This is also allowed if the calling role
         is a member of the role whose backend is being terminated or the
         calling role has been granted <literal>pg_signal_backend</literal>,
-        however only superusers can terminate superuser backends.
+        however only superusers can terminate superuser backends. When
+        <parameter>timeout</parameter> is not specified, this function sends
+        SIGTERM to the backend with the given process ID without waiting for
+        its actual termination and returns <literal>true</literal>. Sometimes
+        the backend process may still exist. With <parameter>timeout</parameter>
+        specified in milliseconds, the function ensures that the backend
+        process is actually goes away from the system processes. It keeps
+        checking for the existence of the backend process until the given
+        <parameter>timeout</parameter> occurs and returns <literal>true</literal>
+        if the process doesn't exit. On timeout a warning is emitted and
+        <literal>false</literal> is returned.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_wait_for_backend_termination</primary>
+        </indexterm>
+        <function>pg_wait_for_backend_termination</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>5000</literal> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Checks the existence of backend process with specified process ID. It
+        waits until the given <parameter>timeout</parameter> or the default
+        5000 milliseconds or 5 seconds. <literal>true</literal> is returned
+        when the backend is found to be not existing within the
+        <parameter>timeout</parameter> milliseconds. On timeout, a warning is
+        emitted and <literal>false</literal> is returned.
        </para></entry>
       </row>
      </tbody>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 56018745c8..caa1918656 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1569,6 +1569,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for subplan nodes of an <literal>Append</literal> plan
        node to be ready.</entry>
      </row>
+    <row>
+      <entry><literal>BackendTermination</literal></entry>
+      <entry>Waiting for the termination of a backend.</entry>
+     </row>
      <row>
       <entry><literal>BackupWaitWalArchive</literal></entry>
       <entry>Waiting for WAL files required for a backup to be successfully
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5f2541d316..137ac8569b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1346,6 +1346,16 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION
+  pg_terminate_backend(pid integer, timeout int8 DEFAULT 0)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_terminate_backend'
+  PARALLEL SAFE;
+
+CREATE OR REPLACE FUNCTION
+  pg_wait_for_backend_termination(pid integer, timeout int8 DEFAULT 5000)
+  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_wait_for_backend_termination'
+  PARALLEL SAFE;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 8b55ff6e76..ec2da29141 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -18,6 +18,7 @@
 
 #include "catalog/pg_authid.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "postmaster/syslogger.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
@@ -126,15 +127,96 @@ pg_cancel_backend(PG_FUNCTION_ARGS)
 }
 
 /*
- * Signal to terminate a backend process.  This is allowed if you are a member
- * of the role whose process is being terminated.
+ * Wait until there is no backend process with the given PID and return true.
+ * On timeout, a warning is emitted and false is returned.
+ */
+static bool
+pg_wait_until_termination(int pid, int64 timeout)
+{
+	/*
+	 * Wait in steps of waittime milliseconds until this function exits or
+	 * timeout.
+	 */
+	int64	waittime = 100;
+	/*
+	 * Initially remaining time is the entire timeout specified by the user.
+	 */
+	int64	remainingtime = timeout;
+
+	/*
+	 * Check existence of the backend. If it doesn't exist, then check for the
+	 * errno ESRCH that confirms it and return true. If the backend still
+	 * exists, then wait for waittime milliseconds, again check for the
+	 * existence. Repeat this until timeout or an error occurs or a pending
+	 * interrupt such as query cancel gets processed.
+	 */
+	do
+	{
+		if (remainingtime < waittime)
+			waittime = remainingtime;
+
+		if (kill(pid, 0) == -1)
+		{
+			if (errno == ESRCH)
+				return true;
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_INTERNAL_ERROR),
+						 errmsg("could not check the existence of the backend with PID %d: %m",
+								 pid)));
+		}
+
+		/* Process interrupts, if any, before going for wait. */
+		CHECK_FOR_INTERRUPTS();
+
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						 waittime,
+						 WAIT_EVENT_BACKEND_TERMINATION);
+
+		ResetLatch(MyLatch);
+
+		remainingtime -= waittime;
+	} while (remainingtime > 0);
+
+	ereport(WARNING,
+			(errmsg("backend with PID %d did not terminate within %lld milliseconds",
+					pid, (long long int) timeout)));
+
+	return false;
+}
+
+/*
+ * Signal to terminate a backend process. This is allowed if you are a member
+ * of the role whose process is being terminated. If timeout input argument is
+ * 0 (which is default), then this function just signals the backend and
+ * doesn't wait. Otherwise it waits until given the timeout milliseconds or no
+ * process has the given PID and returns true. On timeout, a warning is emitted
+ * and false is returned.
  *
  * Note that only superusers can signal superuser-owned processes.
  */
 Datum
 pg_terminate_backend(PG_FUNCTION_ARGS)
 {
-	int			r = pg_signal_backend(PG_GETARG_INT32(0), SIGTERM);
+	int	 pid;
+	int	 r;
+	int timeout;
+	bool wait;
+	bool result;
+
+	pid = PG_GETARG_INT32(0);
+	timeout = PG_GETARG_INT64(1);
+
+	if (timeout < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+					errmsg("\"timeout\" must not be negative")));
+
+	/* If timeout is zero (which is default), then don't wait. */
+	wait = (timeout == 0) ? false : true;
+
+	r = pg_signal_backend(pid, SIGTERM);
 
 	if (r == SIGNAL_BACKEND_NOSUPERUSER)
 		ereport(ERROR,
@@ -146,7 +228,49 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 				 errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend")));
 
-	PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
+	result = (r == SIGNAL_BACKEND_SUCCESS);
+
+	/* Wait only if requested and the termination is successful. */
+	if (wait && result)
+		result = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(result);
+}
+
+/*
+ * This function waits for a backend with the given PID to be terminated or
+ * until the given timeout milliseconds occurs. Returns true if the backend
+ * doesn't exist. On Timeout a warning is emitted and false is returned.
+ */
+Datum
+pg_wait_for_backend_termination(PG_FUNCTION_ARGS)
+{
+	int	 pid;
+	int64	timeout;
+	PGPROC	*proc = NULL;
+	bool	result = false;
+
+	pid = PG_GETARG_INT32(0);
+	timeout = PG_GETARG_INT64(1);
+
+	if (timeout <= 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				errmsg("\"timeout\" must not be negative or zero")));
+
+	proc = BackendPidGetProc(pid);
+
+	if (proc == NULL)
+	{
+		ereport(WARNING,
+				(errmsg("PID %d is not a PostgreSQL server process", pid)));
+
+		PG_RETURN_BOOL(result);
+	}
+
+	result = pg_wait_until_termination(pid, timeout);
+
+	PG_RETURN_BOOL(result);
 }
 
 /*
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index accc1eb577..89b5b8b7b9 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -313,6 +313,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_APPEND_READY:
 			event_name = "AppendReady";
 			break;
+		case WAIT_EVENT_BACKEND_TERMINATION:
+			event_name = "BackendTermination";
+			break;
 		case WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE:
 			event_name = "BackupWaitWalArchive";
 			break;
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 69ffd0c3f4..e04d9d4141 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6168,9 +6168,14 @@
 { oid => '2171', descr => 'cancel a server process\' current query',
   proname => 'pg_cancel_backend', provolatile => 'v', prorettype => 'bool',
   proargtypes => 'int4', prosrc => 'pg_cancel_backend' },
-{ oid => '2096', descr => 'terminate a server process',
+{ oid => '2096', descr => 'terminate a backend process and if timeout is specified, wait for its exit or until timeout occurs',
   proname => 'pg_terminate_backend', provolatile => 'v', prorettype => 'bool',
-  proargtypes => 'int4', prosrc => 'pg_terminate_backend' },
+  proargtypes => 'int4 int8', proargnames => '{pid,timeout}',
+  prosrc => 'pg_terminate_backend' },
+{ oid => '16386', descr => 'wait for a backend process exit or timeout occurs',
+  proname => 'pg_wait_for_backend_termination', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'int4 int8', proargnames => '{pid,timeout}',
+  prosrc => 'pg_wait_for_backend_termination' },
 { oid => '2172', descr => 'prepare for taking an online backup',
   proname => 'pg_start_backup', provolatile => 'v', proparallel => 'r',
   prorettype => 'pg_lsn', proargtypes => 'text bool bool',
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 44448b48ec..47accc5ffe 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -80,6 +80,7 @@ typedef enum
 typedef enum
 {
 	WAIT_EVENT_APPEND_READY = PG_WAIT_IPC,
+	WAIT_EVENT_BACKEND_TERMINATION,
 	WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE,
 	WAIT_EVENT_BGWORKER_SHUTDOWN,
 	WAIT_EVENT_BGWORKER_STARTUP,
-- 
2.25.1

#43Magnus Hagander
magnus@hagander.net
In reply to: Bharath Rupireddy (#42)
Re: A new function to wait for the backend exit after termination

On Mon, Apr 5, 2021 at 5:21 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Mar 19, 2021 at 11:37 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Attaching v11 patch that removed the wait boolean flag in the
pg_terminate_backend and timeout 0 indicates "no wait", negative value
"errors out", positive value "waits for those many milliseconds". Also
addressed other review comments that I received upthread. Please
review v11 further.

Attaching v12 patch after rebasing onto the latest master.

I've applied this patch with some minor changes.

I rewrote some parts of the documentation to make it more focused on
the end user rather than the implementation. I also made a small
simplification in pg_terminate_backend() which removes the "wait"
variable (seems like a bit of a leftover since the time when it was a
separate argument). And picked a correct oid for the function (oids
8000-9999 should be used for new patches, 16386 is in the user area of
oids)

Thanks!

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#44Noah Misch
noah@leadboat.com
In reply to: Magnus Hagander (#43)
Re: A new function to wait for the backend exit after termination

On Thu, Apr 08, 2021 at 11:41:17AM +0200, Magnus Hagander wrote:

I've applied this patch with some minor changes.

I wondered if the new pg_wait_for_backend_termination() could replace
regress.c:wait_pid(). I think it can't, because the new function requires the
backend to still be present in the procarray:

proc = BackendPidGetProc(pid);

if (proc == NULL)
{
ereport(WARNING,
(errmsg("PID %d is not a PostgreSQL server process", pid)));

PG_RETURN_BOOL(false);
}

PG_RETURN_BOOL(pg_wait_until_termination(pid, timeout));

If a backend has left the procarray but not yet left the kernel process table,
this function will issue the warning and not wait for the final exit. Given
that limitation, is pg_wait_for_backend_termination() useful enough? If
waiting for procarray departure is enough, should pg_wait_until_termination()
check BackendPidGetProc(pid) instead of kill(0, pid), so it can return
earlier? I can see the value of adding the pg_terminate_backend() timeout
argument, in any case.

#45Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Noah Misch (#44)
Re: A new function to wait for the backend exit after termination

On Tue, Jun 1, 2021 at 9:19 AM Noah Misch <noah@leadboat.com> wrote:

On Thu, Apr 08, 2021 at 11:41:17AM +0200, Magnus Hagander wrote:

I've applied this patch with some minor changes.

Thanks for taking a look at this function.

I wondered if the new pg_wait_for_backend_termination() could replace
regress.c:wait_pid().

I was earlier thinking of replacing the wait_pid() with the new
function but arrived at a similar conclusion as yours.

I think it can't, because the new function requires the
backend to still be present in the procarray:

proc = BackendPidGetProc(pid);

if (proc == NULL)
{
ereport(WARNING,
(errmsg("PID %d is not a PostgreSQL server process", pid)));

PG_RETURN_BOOL(false);
}

PG_RETURN_BOOL(pg_wait_until_termination(pid, timeout));

If a backend has left the procarray but not yet left the kernel process table,
this function will issue the warning and not wait for the final exit.

Yes, if the backend is not in procarray but still in the kernel
process table, it emits a warning "PID %d is not a PostgreSQL server
process" and returns false.

Given that limitation, is pg_wait_for_backend_termination() useful enough? If
waiting for procarray departure is enough, should pg_wait_until_termination()
check BackendPidGetProc(pid) instead of kill(0, pid), so it can return
earlier?

We can just remove BackendPidGetProc(pid) in
pg_wait_for_backend_termination. With this change, we can get rid of
the wait_pid() from regress.c. But, my concern is that the
pg_wait_for_backend_termination() can also check non-postgres server
process pid. Is this okay? In that case, this function becomes a
generic(OS level function) rather than a postgres server specific
function. I'm not sure if all agree to that. Thoughts?

I can see the value of adding the pg_terminate_backend() timeout
argument, in any case.

True. We can leave pg_terminate_backend() as is.

With Regards,
Bharath Rupireddy.

#46Noah Misch
noah@leadboat.com
In reply to: Bharath Rupireddy (#45)
Re: A new function to wait for the backend exit after termination

On Tue, Jun 01, 2021 at 01:25:24PM +0530, Bharath Rupireddy wrote:

On Tue, Jun 1, 2021 at 9:19 AM Noah Misch <noah@leadboat.com> wrote:

Given that limitation, is pg_wait_for_backend_termination() useful enough? If
waiting for procarray departure is enough, should pg_wait_until_termination()
check BackendPidGetProc(pid) instead of kill(0, pid), so it can return
earlier?

We can just remove BackendPidGetProc(pid) in
pg_wait_for_backend_termination. With this change, we can get rid of
the wait_pid() from regress.c. But, my concern is that the
pg_wait_for_backend_termination() can also check non-postgres server
process pid. Is this okay?

It may or may not be okay. I would not feel good about it.

In that case, this function becomes a
generic(OS level function) rather than a postgres server specific
function. I'm not sure if all agree to that. Thoughts?

My preference is to remove pg_wait_for_backend_termination(). The use case
that prompted this thread used pg_terminate_backend(pid, 180000); it doesn't
need pg_wait_for_backend_termination().

#47Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Noah Misch (#46)
Re: A new function to wait for the backend exit after termination

On Sat, Jun 5, 2021 at 7:02 AM Noah Misch <noah@leadboat.com> wrote:

On Tue, Jun 01, 2021 at 01:25:24PM +0530, Bharath Rupireddy wrote:

On Tue, Jun 1, 2021 at 9:19 AM Noah Misch <noah@leadboat.com> wrote:

Given that limitation, is pg_wait_for_backend_termination() useful enough? If
waiting for procarray departure is enough, should pg_wait_until_termination()
check BackendPidGetProc(pid) instead of kill(0, pid), so it can return
earlier?

We can just remove BackendPidGetProc(pid) in
pg_wait_for_backend_termination. With this change, we can get rid of
the wait_pid() from regress.c. But, my concern is that the
pg_wait_for_backend_termination() can also check non-postgres server
process pid. Is this okay?

It may or may not be okay. I would not feel good about it.

In that case, this function becomes a
generic(OS level function) rather than a postgres server specific
function. I'm not sure if all agree to that. Thoughts?

My preference is to remove pg_wait_for_backend_termination(). The use case
that prompted this thread used pg_terminate_backend(pid, 180000); it doesn't
need pg_wait_for_backend_termination().

I was earlier thinking that the function
pg_wait_for_backend_termination() will be useful:
1) If the user wants to pg_terminate_backend(<<pid>>); and
pg_wait_for_backend_termination(<<pid>>, <<timeout>>); separately. It
seems like the proc array entry will be removed as part of SITERM
processing (see [1](gdb) bt #0 ProcArrayRemove (proc=0x55b27f26356c <CleanupInvalidationState+278>, latestXid=32764) at procarray.c:526 #1 0x000055b27f281c9d in RemoveProcFromArray (code=1, arg=0) at proc.c:812 #2 0x000055b27f2542ce in shmem_exit (code=1) at ipc.c:272 #3 0x000055b27f2540d5 in proc_exit_prepare (code=1) at ipc.c:194 #4 0x000055b27f254022 in proc_exit (code=1) at ipc.c:107 #5 0x000055b27f449479 in errfinish (filename=0x55b27f61cd65 "postgres.c", lineno=3191, funcname=0x55b27f61e770 <__func__.40727> "ProcessInterrupts") at elog.c:666 #6 0x000055b27f29097e in ProcessInterrupts () at postgres.c:3191 #7 0x000055b27f28cbf0 in ProcessClientReadInterrupt (blocked=true) at postgres.c:499) and the BackendPidGetProc will return NULL. So,
it's not useful here.
2) If the user wants to pg_wait_for_backend_termination(<<pid>>,
<<timeout>>);, thinking that some event might cause the backend to be
terminated within the <<timeout>>. So, it's still useful here.

[1]: (gdb) bt #0 ProcArrayRemove (proc=0x55b27f26356c <CleanupInvalidationState+278>, latestXid=32764) at procarray.c:526 #1 0x000055b27f281c9d in RemoveProcFromArray (code=1, arg=0) at proc.c:812 #2 0x000055b27f2542ce in shmem_exit (code=1) at ipc.c:272 #3 0x000055b27f2540d5 in proc_exit_prepare (code=1) at ipc.c:194 #4 0x000055b27f254022 in proc_exit (code=1) at ipc.c:107 #5 0x000055b27f449479 in errfinish (filename=0x55b27f61cd65 "postgres.c", lineno=3191, funcname=0x55b27f61e770 <__func__.40727> "ProcessInterrupts") at elog.c:666 #6 0x000055b27f29097e in ProcessInterrupts () at postgres.c:3191 #7 0x000055b27f28cbf0 in ProcessClientReadInterrupt (blocked=true) at postgres.c:499
(gdb) bt
#0 ProcArrayRemove (proc=0x55b27f26356c
<CleanupInvalidationState+278>, latestXid=32764)
at procarray.c:526
#1 0x000055b27f281c9d in RemoveProcFromArray (code=1, arg=0) at proc.c:812
#2 0x000055b27f2542ce in shmem_exit (code=1) at ipc.c:272
#3 0x000055b27f2540d5 in proc_exit_prepare (code=1) at ipc.c:194
#4 0x000055b27f254022 in proc_exit (code=1) at ipc.c:107
#5 0x000055b27f449479 in errfinish (filename=0x55b27f61cd65
"postgres.c", lineno=3191,
funcname=0x55b27f61e770 <__func__.40727> "ProcessInterrupts") at elog.c:666
#6 0x000055b27f29097e in ProcessInterrupts () at postgres.c:3191
#7 0x000055b27f28cbf0 in ProcessClientReadInterrupt (blocked=true) at
postgres.c:499

With Regards,
Bharath Rupireddy.

#48Noah Misch
noah@leadboat.com
In reply to: Bharath Rupireddy (#47)
Re: A new function to wait for the backend exit after termination

On Sat, Jun 05, 2021 at 12:06:46PM +0530, Bharath Rupireddy wrote:

On Sat, Jun 5, 2021 at 7:02 AM Noah Misch <noah@leadboat.com> wrote:

On Tue, Jun 01, 2021 at 01:25:24PM +0530, Bharath Rupireddy wrote:

On Tue, Jun 1, 2021 at 9:19 AM Noah Misch <noah@leadboat.com> wrote:

Given that limitation, is pg_wait_for_backend_termination() useful enough? If
waiting for procarray departure is enough, should pg_wait_until_termination()
check BackendPidGetProc(pid) instead of kill(0, pid), so it can return
earlier?

We can just remove BackendPidGetProc(pid) in
pg_wait_for_backend_termination. With this change, we can get rid of
the wait_pid() from regress.c. But, my concern is that the
pg_wait_for_backend_termination() can also check non-postgres server
process pid. Is this okay?

It may or may not be okay. I would not feel good about it.

In that case, this function becomes a
generic(OS level function) rather than a postgres server specific
function. I'm not sure if all agree to that. Thoughts?

My preference is to remove pg_wait_for_backend_termination(). The use case
that prompted this thread used pg_terminate_backend(pid, 180000); it doesn't
need pg_wait_for_backend_termination().

I was earlier thinking that the function
pg_wait_for_backend_termination() will be useful:
1) If the user wants to pg_terminate_backend(<<pid>>); and
pg_wait_for_backend_termination(<<pid>>, <<timeout>>); separately. It
seems like the proc array entry will be removed as part of SITERM
processing (see [1]) and the BackendPidGetProc will return NULL. So,
it's not useful here.
2) If the user wants to pg_wait_for_backend_termination(<<pid>>,
<<timeout>>);, thinking that some event might cause the backend to be
terminated within the <<timeout>>. So, it's still useful here.

That is factual. That pg_wait_for_backend_termination() appears to be useful
for (1) but isn't useful for (1) reduces its value. I think it reduces the
value slightly below zero. Relevant to that, if a user doesn't care about the
distinction between "backend has left the procarray" and "backend's PID has
left the kernel process table", that user can poll pg_stat_activity to achieve
the same level of certainty that pg_wait_for_backend_termination() offers.

#49Justin Pryzby
pryzby@telsasoft.com
In reply to: Noah Misch (#48)
Re: A new function to wait for the backend exit after termination

On Sat, Jun 05, 2021 at 12:08:01PM -0700, Noah Misch wrote:

My preference is to remove pg_wait_for_backend_termination(). The use case
that prompted this thread used pg_terminate_backend(pid, 180000); it doesn't
need pg_wait_for_backend_termination().

Is this an Opened Issue ?

--
Justin

#50Noah Misch
noah@leadboat.com
In reply to: Justin Pryzby (#49)
1 attachment(s)
Re: A new function to wait for the backend exit after termination

On Fri, Jun 11, 2021 at 08:54:08PM -0500, Justin Pryzby wrote:

On Sat, Jun 05, 2021 at 12:08:01PM -0700, Noah Misch wrote:

My preference is to remove pg_wait_for_backend_termination(). The use case
that prompted this thread used pg_terminate_backend(pid, 180000); it doesn't
need pg_wait_for_backend_termination().

Is this an Opened Issue ?

An Open Item? Not really, since there's no objective defect. Nonetheless,
the attached is what I'd like to use.

Attachments:

pg_wait_for_backend_termination-v1.patchtext/plain; charset=us-asciiDownload
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Remove pg_wait_for_backend_termination().
    
    It was unable wait on a backend that had already left the procarray.
    Users tolerant of that limitation can poll pg_stat_activity.  Other
    users can employ the "timeout" argument of pg_terminate_backend().
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20210605013236.GA208701@rfd.leadboat.com

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fbc80c1..d387a32 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25002,23 +25002,6 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <literal>false</literal> is returned.
        </para></entry>
       </row>
-
-      <row>
-       <entry role="func_table_entry"><para role="func_signature">
-        <indexterm>
-         <primary>pg_wait_for_backend_termination</primary>
-        </indexterm>
-        <function>pg_wait_for_backend_termination</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>5000</literal> )
-        <returnvalue>boolean</returnvalue>
-       </para>
-       <para>
-        Waits for the backend process with the specified Process ID to
-        terminate.  If the process terminates before
-        the <parameter>timeout</parameter> (in milliseconds)
-        expires, <literal>true</literal> is returned.  On timeout, a warning
-        is emitted and <literal>false</literal> is returned.
-       </para></entry>
-      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/doc/src/sgml/release-14.sgml b/doc/src/sgml/release-14.sgml
index a2ad120..045f2b0 100644
--- a/doc/src/sgml/release-14.sgml
+++ b/doc/src/sgml/release-14.sgml
@@ -611,13 +611,7 @@ Author: Magnus Hagander <magnus@hagander.net>
 -->
 
       <para>
-       Add function <link
-       linkend="functions-admin-signal"><function>pg_wait_for_backend_termination()</function></link>
-       that waits for session exit (Bharath Rupireddy)
-      </para>
-
-      <para>
-       Also add a similar optional wait parameter to <link
+       Add an optional wait parameter to <link
        linkend="functions-admin-signal"><function>pg_terminate_backend()</function></link>
       </para>
      </listitem>
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index a4373b1..a416e94 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -397,11 +397,6 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_terminate_backend'
   PARALLEL SAFE;
 
-CREATE OR REPLACE FUNCTION
-  pg_wait_for_backend_termination(pid integer, timeout int8 DEFAULT 5000)
-  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_wait_for_backend_termination'
-  PARALLEL SAFE;
-
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 8376994..5ed8b2e 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -231,42 +231,6 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 }
 
 /*
- * Wait for a backend process with the given PID to exit or until the given
- * timeout milliseconds occurs. Returns true if the backend has exited. On
- * timeout a warning is emitted and false is returned.
- *
- * We allow any user to call this function, consistent with any user being
- * able to view the pid of the process in pg_stat_activity etc.
- */
-Datum
-pg_wait_for_backend_termination(PG_FUNCTION_ARGS)
-{
-	int			pid;
-	int64		timeout;
-	PGPROC	   *proc = NULL;
-
-	pid = PG_GETARG_INT32(0);
-	timeout = PG_GETARG_INT64(1);
-
-	if (timeout <= 0)
-		ereport(ERROR,
-				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-				 errmsg("\"timeout\" must not be negative or zero")));
-
-	proc = BackendPidGetProc(pid);
-
-	if (proc == NULL)
-	{
-		ereport(WARNING,
-				(errmsg("PID %d is not a PostgreSQL server process", pid)));
-
-		PG_RETURN_BOOL(false);
-	}
-
-	PG_RETURN_BOOL(pg_wait_until_termination(pid, timeout));
-}
-
-/*
  * Signal to reload the database configuration
  *
  * Permission checking for this function is managed through the normal
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index acbcae4..2d45765 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6191,10 +6191,6 @@
   proname => 'pg_terminate_backend', provolatile => 'v', prorettype => 'bool',
   proargtypes => 'int4 int8', proargnames => '{pid,timeout}',
   prosrc => 'pg_terminate_backend' },
-{ oid => '2137', descr => 'wait for a backend process exit or timeout occurs',
-  proname => 'pg_wait_for_backend_termination', provolatile => 'v',
-  prorettype => 'bool', proargtypes => 'int4 int8',
-  proargnames => '{pid,timeout}', prosrc => 'pg_wait_for_backend_termination' },
 { oid => '2172', descr => 'prepare for taking an online backup',
   proname => 'pg_start_backup', provolatile => 'v', proparallel => 'r',
   prorettype => 'pg_lsn', proargtypes => 'text bool bool',
#51Justin Pryzby
pryzby@telsasoft.com
In reply to: Noah Misch (#50)
Re: A new function to wait for the backend exit after termination

On Fri, Jun 11, 2021 at 09:37:50PM -0700, Noah Misch wrote:

On Fri, Jun 11, 2021 at 08:54:08PM -0500, Justin Pryzby wrote:

On Sat, Jun 05, 2021 at 12:08:01PM -0700, Noah Misch wrote:

My preference is to remove pg_wait_for_backend_termination(). The use case
that prompted this thread used pg_terminate_backend(pid, 180000); it doesn't
need pg_wait_for_backend_termination().

Is this an Opened Issue ?

An Open Item? Not really, since there's no objective defect. Nonetheless,
the attached is what I'd like to use.

I think of this as a list of stuff to avoid forgetting that needs to be
addressed or settled before the release.

If the value of the new function is marginal, it may be good to remove it, else
we're committed to supporting it.

Even if it's not removed, the descriptions should be cleaned up.

| src/include/catalog/pg_proc.dat- descr => 'terminate a backend process and if timeout is specified, wait for its exit or until timeout occurs',
=> I think doesn't need to change or mention the optional timeout at all

| src/include/catalog/pg_proc.dat-{ oid => '2137', descr => 'wait for a backend process exit or timeout occurs',
=> should just say "wait for a backend process to exit". The timeout has a default.

#52Noah Misch
noah@leadboat.com
In reply to: Justin Pryzby (#51)
Re: A new function to wait for the backend exit after termination

On Sat, Jun 12, 2021 at 12:12:12AM -0500, Justin Pryzby wrote:

Even if it's not removed, the descriptions should be cleaned up.

| src/include/catalog/pg_proc.dat- descr => 'terminate a backend process and if timeout is specified, wait for its exit or until timeout occurs',
=> I think doesn't need to change or mention the optional timeout at all

Agreed, these strings generally give less detail. I can revert that to the
v13 wording, 'terminate a server process'.

#53Justin Pryzby
pryzby@telsasoft.com
In reply to: Noah Misch (#52)
Re: A new function to wait for the backend exit after termination

On Sat, Jun 12, 2021 at 08:21:39AM -0700, Noah Misch wrote:

On Sat, Jun 12, 2021 at 12:12:12AM -0500, Justin Pryzby wrote:

Even if it's not removed, the descriptions should be cleaned up.

| src/include/catalog/pg_proc.dat- descr => 'terminate a backend process and if timeout is specified, wait for its exit or until timeout occurs',
=> I think doesn't need to change or mention the optional timeout at all

Agreed, these strings generally give less detail. I can revert that to the
v13 wording, 'terminate a server process'.

Maybe you'd also update the release notes.

I suggest some edits from the remaining parts of the original patch.

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fbc80c1403..b7383bc8aa 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24998,7 +24998,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
         milliseconds) and greater than zero, the function waits until the
         process is actually terminated or until the given time has passed. If
         the process is terminated, the function
-        returns <literal>true</literal>.  On timeout a warning is emitted and
+        returns <literal>true</literal>.  On timeout, a warning is emitted and
         <literal>false</literal> is returned.
        </para></entry>
       </row>
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 837699481c..f12c417854 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -187,12 +187,12 @@ pg_wait_until_termination(int pid, int64 timeout)
 }
 /*
- * Signal to terminate a backend process. This is allowed if you are a member
- * of the role whose process is being terminated. If timeout input argument is
- * 0 (which is default), then this function just signals the backend and
- * doesn't wait. Otherwise it waits until given the timeout milliseconds or no
- * process has the given PID and returns true. On timeout, a warning is emitted
- * and false is returned.
+ * Send a signal to terminate a backend process. This is allowed if you are a
+ * member of the role whose process is being terminated. If the timeout input
+ * argument is 0, then this function just signals the backend and returns true.
+ * If timeout is nonzero, then it waits until no process has the given PID; if
+ * the process ends within the timeout, true is returned, and if the timeout is
+ * exceeded, a warning is emitted and false is returned.
  *
  * Note that only superusers can signal superuser-owned processes.
  */
@@ -201,7 +201,7 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 {
 	int			pid;
 	int			r;
-	int			timeout;
+	int			timeout; /* milliseconds */

pid = PG_GETARG_INT32(0);
timeout = PG_GETARG_INT64(1);

#54Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Noah Misch (#50)
1 attachment(s)
Re: A new function to wait for the backend exit after termination

On Sat, Jun 12, 2021 at 10:07 AM Noah Misch <noah@leadboat.com> wrote:

On Fri, Jun 11, 2021 at 08:54:08PM -0500, Justin Pryzby wrote:

On Sat, Jun 05, 2021 at 12:08:01PM -0700, Noah Misch wrote:

My preference is to remove pg_wait_for_backend_termination(). The use case
that prompted this thread used pg_terminate_backend(pid, 180000); it doesn't
need pg_wait_for_backend_termination().

Is this an Opened Issue ?

An Open Item? Not really, since there's no objective defect. Nonetheless,
the attached is what I'd like to use.

Thanks. +1 to remove the pg_wait_for_backend_termination function. The
patch basically looks good to me. I'm attaching an updated patch. I
corrected a minor typo in the commit message, took docs and code
comment changes suggested by Justin Pryzby.

Please note that release notes still have the headline "Add functions
to wait for backend termination" of the original commit that added the
pg_wait_for_backend_termination. With the removal of it, I'm not quite
sure if we retain the the commit message or tweak it to something like
"Add optional timeout parameter to pg_terminate_backend".
<!--
Author: Magnus Hagander <magnus@hagander.net>
2021-04-08 [aaf043257] Add functions to wait for backend termination
-->

With Regards,
Bharath Rupireddy.

Attachments:

v2-0001-Remove-pg_wait_for_backend_termination-function.patchapplication/octet-stream; name=v2-0001-Remove-pg_wait_for_backend_termination-function.patchDownload
From 157c17195b085907c19d4c935cc28076a2aca56c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Mon, 14 Jun 2021 07:00:09 -0700
Subject: [PATCH v2] Remove pg_wait_for_backend_termination function

It was unable to wait on a backend that had already left the
procarray. Users tolerant of that limitation can poll
pg_stat_activity.  Other users can employ the "timeout" argument
of pg_terminate_backend function.

Per suggestion from Justin Pryzby, reworded the docs, code comments
and release notes relating to the pg_terminate_backend function.
---
 doc/src/sgml/func.sgml                   | 19 +--------
 doc/src/sgml/release-14.sgml             |  8 +---
 src/backend/catalog/system_functions.sql |  5 ---
 src/backend/storage/ipc/signalfuncs.c    | 50 ++++--------------------
 src/include/catalog/pg_proc.dat          |  6 +--
 5 files changed, 10 insertions(+), 78 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fbc80c1403..764c3ad307 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24998,27 +24998,10 @@ SELECT collation for ('foo' COLLATE "de_DE");
         milliseconds) and greater than zero, the function waits until the
         process is actually terminated or until the given time has passed. If
         the process is terminated, the function
-        returns <literal>true</literal>.  On timeout a warning is emitted and
+        returns <literal>true</literal>.  On timeout, a warning is emitted and
         <literal>false</literal> is returned.
        </para></entry>
       </row>
-
-      <row>
-       <entry role="func_table_entry"><para role="func_signature">
-        <indexterm>
-         <primary>pg_wait_for_backend_termination</primary>
-        </indexterm>
-        <function>pg_wait_for_backend_termination</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>5000</literal> )
-        <returnvalue>boolean</returnvalue>
-       </para>
-       <para>
-        Waits for the backend process with the specified Process ID to
-        terminate.  If the process terminates before
-        the <parameter>timeout</parameter> (in milliseconds)
-        expires, <literal>true</literal> is returned.  On timeout, a warning
-        is emitted and <literal>false</literal> is returned.
-       </para></entry>
-      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/doc/src/sgml/release-14.sgml b/doc/src/sgml/release-14.sgml
index 058ba7cd4e..89bf78500a 100644
--- a/doc/src/sgml/release-14.sgml
+++ b/doc/src/sgml/release-14.sgml
@@ -611,13 +611,7 @@ Author: Magnus Hagander <magnus@hagander.net>
 -->
 
       <para>
-       Add function <link
-       linkend="functions-admin-signal"><function>pg_wait_for_backend_termination()</function></link>
-       that waits for session exit (Bharath Rupireddy)
-      </para>
-
-      <para>
-       Also add a similar optional wait parameter to <link
+       Add an optional timeout parameter to <link
        linkend="functions-admin-signal"><function>pg_terminate_backend()</function></link>
       </para>
      </listitem>
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index a4373b176c..a416e94d37 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -397,11 +397,6 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_terminate_backend'
   PARALLEL SAFE;
 
-CREATE OR REPLACE FUNCTION
-  pg_wait_for_backend_termination(pid integer, timeout int8 DEFAULT 5000)
-  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_wait_for_backend_termination'
-  PARALLEL SAFE;
-
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 837699481c..30c4516f8d 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -187,12 +187,12 @@ pg_wait_until_termination(int pid, int64 timeout)
 }
 
 /*
- * Signal to terminate a backend process. This is allowed if you are a member
- * of the role whose process is being terminated. If timeout input argument is
- * 0 (which is default), then this function just signals the backend and
- * doesn't wait. Otherwise it waits until given the timeout milliseconds or no
- * process has the given PID and returns true. On timeout, a warning is emitted
- * and false is returned.
+ * Send a signal to terminate a backend process. This is allowed if you are a
+ * member of the role whose process is being terminated. If the timeout input
+ * argument is 0, then this function just signals the backend and returns true.
+ * If timeout is nonzero, then it waits until no process has the given PID; if
+ * the process ends within the timeout, true is returned, and if the timeout is
+ * exceeded, a warning is emitted and false is returned.
  *
  * Note that only superusers can signal superuser-owned processes.
  */
@@ -201,7 +201,7 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 {
 	int			pid;
 	int			r;
-	int			timeout;
+	int			timeout;  /* milliseconds */
 
 	pid = PG_GETARG_INT32(0);
 	timeout = PG_GETARG_INT64(1);
@@ -230,42 +230,6 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 		PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
 }
 
-/*
- * Wait for a backend process with the given PID to exit or until the given
- * timeout milliseconds occurs. Returns true if the backend has exited. On
- * timeout a warning is emitted and false is returned.
- *
- * We allow any user to call this function, consistent with any user being
- * able to view the pid of the process in pg_stat_activity etc.
- */
-Datum
-pg_wait_for_backend_termination(PG_FUNCTION_ARGS)
-{
-	int			pid;
-	int64		timeout;
-	PGPROC	   *proc = NULL;
-
-	pid = PG_GETARG_INT32(0);
-	timeout = PG_GETARG_INT64(1);
-
-	if (timeout <= 0)
-		ereport(ERROR,
-				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-				 errmsg("\"timeout\" must not be negative or zero")));
-
-	proc = BackendPidGetProc(pid);
-
-	if (proc == NULL)
-	{
-		ereport(WARNING,
-				(errmsg("PID %d is not a PostgreSQL server process", pid)));
-
-		PG_RETURN_BOOL(false);
-	}
-
-	PG_RETURN_BOOL(pg_wait_until_termination(pid, timeout));
-}
-
 /*
  * Signal to reload the database configuration
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index acbcae4607..8acb39aa7b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6187,14 +6187,10 @@
   proname => 'pg_cancel_backend', provolatile => 'v', prorettype => 'bool',
   proargtypes => 'int4', prosrc => 'pg_cancel_backend' },
 { oid => '2096',
-  descr => 'terminate a backend process and if timeout is specified, wait for its exit or until timeout occurs',
+  descr => 'terminate a server process',
   proname => 'pg_terminate_backend', provolatile => 'v', prorettype => 'bool',
   proargtypes => 'int4 int8', proargnames => '{pid,timeout}',
   prosrc => 'pg_terminate_backend' },
-{ oid => '2137', descr => 'wait for a backend process exit or timeout occurs',
-  proname => 'pg_wait_for_backend_termination', provolatile => 'v',
-  prorettype => 'bool', proargtypes => 'int4 int8',
-  proargnames => '{pid,timeout}', prosrc => 'pg_wait_for_backend_termination' },
 { oid => '2172', descr => 'prepare for taking an online backup',
   proname => 'pg_start_backup', provolatile => 'v', proparallel => 'r',
   prorettype => 'pg_lsn', proargtypes => 'text bool bool',
-- 
2.25.1

#55Noah Misch
noah@leadboat.com
In reply to: Justin Pryzby (#53)
Re: A new function to wait for the backend exit after termination

On Sat, Jun 12, 2021 at 01:27:43PM -0500, Justin Pryzby wrote:

On Sat, Jun 12, 2021 at 08:21:39AM -0700, Noah Misch wrote:

On Sat, Jun 12, 2021 at 12:12:12AM -0500, Justin Pryzby wrote:

Even if it's not removed, the descriptions should be cleaned up.

| src/include/catalog/pg_proc.dat- descr => 'terminate a backend process and if timeout is specified, wait for its exit or until timeout occurs',
=> I think doesn't need to change or mention the optional timeout at all

Agreed, these strings generally give less detail. I can revert that to the
v13 wording, 'terminate a server process'.

Maybe you'd also update the release notes.

What part of the notes did you expect to change that the patch did not change?

I suggest some edits from the remaining parts of the original patch.

These look good. I will push them when I push the other part.

#56Justin Pryzby
pryzby@telsasoft.com
In reply to: Noah Misch (#55)
Re: A new function to wait for the backend exit after termination

On Mon, Jun 14, 2021 at 09:40:27AM -0700, Noah Misch wrote:

Agreed, these strings generally give less detail. I can revert that to the
v13 wording, 'terminate a server process'.

...

Maybe you'd also update the release notes.

What part of the notes did you expect to change that the patch did not change?

Sorry, I didn't notice that your patch already adjusted the v14 notes.

Note that Bharath also corrected your commit message to say "unable *to*", and
revert the verbose pg_proc.dat descr change.

Thanks,
--
Justin

#57Noah Misch
noah@leadboat.com
In reply to: Bharath Rupireddy (#54)
Re: A new function to wait for the backend exit after termination

On Mon, Jun 14, 2021 at 07:34:59PM +0530, Bharath Rupireddy wrote:

Thanks. +1 to remove the pg_wait_for_backend_termination function. The
patch basically looks good to me. I'm attaching an updated patch. I
corrected a minor typo in the commit message, took docs and code
comment changes suggested by Justin Pryzby.

Pushed as two commits. I used your log message typo fix. Here were the diffs
in your v2 and not in an earlier patch:

-+       Add an optional wait parameter to <link
++       Add an optional timeout parameter to <link

I used this.

-+	int			timeout; /* milliseconds */
++	int			timeout;  /* milliseconds */

pgindent chooses a third option, so I ran pgindent instead of using this.

Please note that release notes still have the headline "Add functions
to wait for backend termination" of the original commit that added the
pg_wait_for_backend_termination. With the removal of it, I'm not quite
sure if we retain the the commit message or tweak it to something like
"Add optional timeout parameter to pg_terminate_backend".
<!--
Author: Magnus Hagander <magnus@hagander.net>
2021-04-08 [aaf043257] Add functions to wait for backend termination
-->

That part is supposed to mirror "git log --pretty=format:%s", no matter what
happens later. The next set of release note updates might add my latest
commit (5f1df62) to this SGML comment, on another line.