strange parallel query behavior after OOM crashes
Hi,
While doing some benchmarking, I've run into a fairly strange issue with
OOM breaking LaunchParallelWorkers() after the restart. What I see
happening is this:
1) a query is executed, and at the end of LaunchParallelWorkers we get
nworkers=8 nworkers_launched=8
2) the query does a Hash Aggregate, but ends up eating much more memory
due to an n_distinct underestimate (see [1] from 2015 for details), and
gets killed by OOM
3) the server restarts, the query is executed again, but this time we
get in LaunchParallelWorkers
nworkers=8 nworkers_launched=0
There's nothing else running on the server, and there definitely should
be free parallel workers.
4) The query gets killed again, and on the next execution we get
nworkers=8 nworkers_launched=8
again, although not always. I wonder whether the exact impact depends on
OOM killing the leader or worker, for example.
regards
[1]: /messages/by-id/CAFWGqnsxryEevA5A_CqT3dExmTaT44mBpNTy8TWVsSVDS71QMg@mail.gmail.com
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Mar 31, 2017 at 7:38 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
While doing some benchmarking, I've run into a fairly strange issue with
OOM breaking LaunchParallelWorkers() after the restart. [...]
I don't know what's going on but I think I have seen this once or
twice myself while hacking on test code that crashed. I wonder if the
DSM_CREATE_NULL_IF_MAXSEGMENTS case could be being triggered because
the DSM control is somehow confused?
--
Thomas Munro
http://www.enterprisedb.com
On Fri, Mar 31, 2017 at 12:32 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
[...] I don't know what's going on but I think I have seen this once or
twice myself while hacking on test code that crashed. I wonder if the
DSM_CREATE_NULL_IF_MAXSEGMENTS case could be being triggered because
the DSM control is somehow confused?
I think I've run into the same problem while working on parallelizing
plans containing InitPlans. You can reproduce the scenario with the
following steps:
1. Put an Assert(0) in ParallelQueryMain(), start server and execute
any parallel query.
In LaunchParallelWorkers, you can see
nworkers = n nworkers_launched = n (n>0)
But, all the workers will crash because of the assert statement.
2. The server restarts automatically and reinitializes
BackgroundWorkerData->parallel_register_count and
BackgroundWorkerData->parallel_terminate_count in shared memory.
After that, it calls ForgetBackgroundWorker and it increments
parallel_terminate_count. In LaunchParallelWorkers, we have the
following condition:
if ((BackgroundWorkerData->parallel_register_count -
BackgroundWorkerData->parallel_terminate_count) >=
max_parallel_workers)
DO NOT launch any parallel worker.
Hence, nworkers = n nworkers_launched = 0.
At first I thought the parallel worker was crashing because of a stupid
mistake of mine, so I assumed this was supposed to happen. Sorry for that.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On Fri, Mar 31, 2017 at 2:05 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
[...] In LaunchParallelWorkers, we have the following condition:
if ((BackgroundWorkerData->parallel_register_count -
BackgroundWorkerData->parallel_terminate_count) >=
max_parallel_workers)
DO NOT launch any parallel worker.
Hence, nworkers = n nworkers_launched = 0.
parallel_register_count and parallel_terminate_count are both
unsigned integers. So, whenever the difference would be negative, it
wraps to a well-defined unsigned value that is certainly much larger
than max_parallel_workers. Hence, no workers will be launched. I've
attached a patch to fix this.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
fix-workers-launched.patch
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 10e0f88..4367785 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -931,7 +931,7 @@ RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
* postmaster must not take locks; a memory barrier wouldn't guarantee
* anything useful.
*/
- if (parallel && (BackgroundWorkerData->parallel_register_count -
+ if (parallel && (int) (BackgroundWorkerData->parallel_register_count -
BackgroundWorkerData->parallel_terminate_count) >=
max_parallel_workers)
{
On Fri, Mar 31, 2017 at 8:29 AM, Kuntal Ghosh <kuntalghosh.2007@gmail.com>
wrote:
[...] parallel_register_count and parallel_terminate_count are both
unsigned integers. So, whenever the difference would be negative, it
wraps to a well-defined unsigned value that is certainly much larger
than max_parallel_workers. Hence, no workers will be launched. I've
attached a patch to fix this.
The current explanation of the active number of parallel workers is:
* The active
* number of parallel workers is the number of registered workers minus the
* terminated ones.
In situations like the one you mentioned above, this formula can give a
negative number for active parallel workers. However, a negative number
of active parallel workers does not make any sense.
I feel it would be better to explain in the code in what situations the
formula can generate a negative result and what that means.
Regards,
Neha
On Fri, Mar 31, 2017 at 5:43 AM, Neha Khatri <nehakhatri5@gmail.com> wrote:
[...] In situations like the one you mentioned above, this formula can
give a negative number for active parallel workers. However, a negative
number of active parallel workers does not make any sense.
Agreed.
I feel it would be better to explain in the code in what situations the
formula can generate a negative result and what that means.
I think that we need to find a fix so that it never generates a
negative result. The last patch submitted by me generates a negative
value correctly. But, surely that's not enough.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
2. The server restarts automatically and reinitializes
BackgroundWorkerData->parallel_register_count and
BackgroundWorkerData->parallel_terminate_count in shared memory.
After that, it calls ForgetBackgroundWorker and it increments
parallel_terminate_count.
Hmm. So this seems like the root of the problem. Presumably those
things need to be reset AFTER forgetting any background workers from
before the crash.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
[...] Hmm. So this seems like the root of the problem. Presumably those
things need to be reset AFTER forgetting any background workers from
before the crash.
IMHO, the fix would be not to increase the terminated parallel worker
count whenever ForgetBackgroundWorker is called due to a bgworker
crash. I've attached a patch for the same. PFA.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0001-Fix-parallel-worker-counts-after-a-crash.patch
From b34e788895c33385494674a2a7b8bc93f829b78f Mon Sep 17 00:00:00 2001
From: Kuntal Ghosh <kuntal.ghosh@enterprisedb.com>
Date: Mon, 3 Apr 2017 15:22:33 +0530
Subject: [PATCH] Fix parallel worker counts after a crash
The number of terminated parallel workers should be at most the number
of registered parallel workers. When ForgetBackgroundWorker is called
due to a bgworker crash, we should not increase the terminated parallel
worker count.
---
src/backend/postmaster/bgworker.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 10e0f88..9c3aa9a 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -399,7 +399,17 @@ ForgetBackgroundWorker(slist_mutable_iter *cur)
Assert(rw->rw_shmem_slot < max_worker_processes);
slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
if ((rw->rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
- BackgroundWorkerData->parallel_terminate_count++;
+ {
+ /*
+ * If register count is just initialized, i.e., equals to
+ * zero, we should not increase the terminate count.
+ */
+ if (BackgroundWorkerData->parallel_register_count > 0)
+ BackgroundWorkerData->parallel_terminate_count++;
+
+ Assert(BackgroundWorkerData->parallel_register_count >=
+ BackgroundWorkerData->parallel_terminate_count);
+ }
slot->in_use = false;
@@ -931,6 +941,9 @@ RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
* postmaster must not take locks; a memory barrier wouldn't guarantee
* anything useful.
*/
+ Assert(BackgroundWorkerData->parallel_register_count >=
+ BackgroundWorkerData->parallel_terminate_count);
+
if (parallel && (BackgroundWorkerData->parallel_register_count -
BackgroundWorkerData->parallel_terminate_count) >=
max_parallel_workers)
--
1.8.3.1
Looking further in this context, the number of active parallel workers is:
parallel_register_count - parallel_terminate_count
Can active workers ever be greater than max_parallel_workers? I think
not. Then why is there a greater-than check in the following condition:
if (parallel && (BackgroundWorkerData->parallel_register_count -
BackgroundWorkerData->parallel_terminate_count) >= max_parallel_workers)
I feel there should be an assert if
(BackgroundWorkerData->parallel_register_count
- BackgroundWorkerData->parallel_terminate_count) > max_parallel_workers)
And the check could be
if (parallel && (active_parallel_workers == max_parallel_workers))
then do not register new parallel workers and return.
There should be no tolerance for the case where active_parallel_workers >
max_parallel_workers; after all, that is the purpose of max_parallel_workers.
Or is the greater-than check there because multiple backends may try to
register parallel workers at the same time?
Thoughts?
Regards,
Neha
On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
[...] IMHO, the fix would be not to increase the terminated parallel
worker count whenever ForgetBackgroundWorker is called due to a
bgworker crash. I've attached a patch for the same. PFA.
While I'm not opposed to that approach, I don't think this is a good
way to implement it. If you want to pass an explicit flag to
ForgetBackgroundWorker telling it whether or not it should perform
the increment, fine. But with what you've got here, you're
essentially relying on "spooky action at a distance". It would be
easy for future code changes to break this, not realizing that
somebody's got a hard dependency on 0 having a specific meaning.
BTW, if this isn't on the open items list, it should be. It's
presumably the fault of b460f5d6693103076dc554aa7cbb96e1e53074f9.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 04/04/2017 06:52 PM, Robert Haas wrote:
[...] While I'm not opposed to that approach, I don't think this is a
good way to implement it. If you want to pass an explicit flag to
ForgetBackgroundWorker telling it whether or not it should perform the
increment, fine. But with what you've got here, you're essentially
relying on "spooky action at a distance". It would be easy for future
code changes to break this, not realizing that somebody's got a hard
dependency on 0 having a specific meaning.
I'm probably missing something, but I don't quite understand how these
values actually survive the crash. I mean, what I observed is OOM
followed by a restart, so shouldn't BackgroundWorkerShmemInit() simply
reset the values back to 0? Or do we call ForgetBackgroundWorker() after
the crash for some reason?
In any case, the comment right before BackgroundWorkerArray says this:
* These counters can of course overflow, but it's not important here
* since the subtraction will still give the right number.
which means that this assert
+ Assert(BackgroundWorkerData->parallel_register_count >=
+ BackgroundWorkerData->parallel_terminate_count);
is outright broken, just like any other attempts to rely on simple
comparisons of these two counters, no?
BTW, if this isn't on the open items list, it should be. It's
presumably the fault of b460f5d6693103076dc554aa7cbb96e1e53074f9.
Unrelated, but perhaps we should also add this to open items:
/messages/by-id/57bbc52c-5d40-bb80-2992-7e1027d0b67c@2ndquadrant.com
(although that's more a 9.6 material).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
[...] I'm probably missing something, but I don't quite understand how
these values actually survive the crash. I mean, what I observed is OOM
followed by a restart, so shouldn't BackgroundWorkerShmemInit() simply
reset the values back to 0? Or do we call ForgetBackgroundWorker()
after the crash for some reason?
AFAICU, during crash recovery, we wait for all non-syslogger children
to exit, then reset shmem (calling BackgroundWorkerShmemInit) and
perform StartupDataBase. While starting the startup process, we check
whether any bgworker is scheduled for a restart. If a bgworker has
crashed and is not meant for a restart (a parallel worker, in our
case), we call ForgetBackgroundWorker() to discard it.
In any case, the comment right before BackgroundWorkerArray says this:
* These counters can of course overflow, but it's not important here
* since the subtraction will still give the right number.
which means that this assert
+ Assert(BackgroundWorkerData->parallel_register_count >=
+ BackgroundWorkerData->parallel_terminate_count);
is outright broken, just like any other attempts to rely on simple
comparisons of these two counters, no?
IIUC, the assert statement ensures that the register count should
always be greater than or equal to the terminate count.
Whereas, the comment refers to the fact that the register count and
terminate count indicate the total number of parallel workers spawned
and terminated, respectively, since the server was started. We never
reset the counts, hence they can overflow. But their subtraction should
give the number of active parallel workers at any instant.
So, I'm not able to see how the assert is broken w.r.t. the aforesaid
comment. Am I missing something here?
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 4, 2017 at 12:16 PM, Neha Khatri <nehakhatri5@gmail.com> wrote:
I feel there should be an assert if
(BackgroundWorkerData->parallel_register_count -
BackgroundWorkerData->parallel_terminate_count) > max_parallel_workers)
Backend 1 > SET max_parallel_workers = 8;
Backend 1 > Execute a long-running parallel query q1 with, say, 4
parallel workers spawned.
Backend 2 > SET max_parallel_workers = 3;
Backend 2 > Execute a long-running parallel query q2 with the number of
parallel workers spawned > 0.
The above assert statement would bring down the server unnecessarily
while executing q2.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On 04/05/2017 09:05 AM, Kuntal Ghosh wrote:
[...] AFAICU, during crash recovery, we wait for all non-syslogger
children to exit, then reset shmem (calling BackgroundWorkerShmemInit)
and perform StartupDataBase. While starting the startup process, we
check whether any bgworker is scheduled for a restart. If a bgworker
has crashed and is not meant for a restart (a parallel worker, in our
case), we call ForgetBackgroundWorker() to discard it.
OK, so essentially we reset the counters, getting
parallel_register_count = 0
parallel_terminate_count = 0
and then ForgetBackgroundWorker() runs and increments the
terminate_count, breaking the logic?
And you're using (parallel_register_count > 0) to detect whether we're
still in the init phase or not. Hmmm.
[...] IIUC, the assert statement ensures that the register count should
always be greater than or equal to the terminate count. Whereas, the
comment refers to the fact that the register count and terminate count
indicate the total number of parallel workers spawned and terminated,
respectively, since the server was started. We never reset the counts,
hence they can overflow. But their subtraction should give the number
of active parallel workers at any instant.
So, I'm not able to see how the assert is broken w.r.t. the aforesaid
comment. Am I missing something here?
The comment says that the counters are allowed to overflow, i.e. after a
long uptime you might get these values
parallel_register_count = UINT_MAX + 1 = 0
parallel_terminate_count = UINT_MAX
which is fine, because C handles the overflow during the subtraction,
and so
parallel_register_count - parallel_terminate_count = 1
But the assert is not doing subtraction, it's comparing the values
directly:
Assert(parallel_register_count >= parallel_terminate_count);
and the (perfectly valid) values trivially violate this comparison.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 5, 2017 at 3:07 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
[...] OK, so essentially we reset the counters, getting
parallel_register_count = 0
parallel_terminate_count = 0
and then ForgetBackgroundWorker() runs and increments the
terminate_count, breaking the logic?
And you're using (parallel_register_count > 0) to detect whether we're
still in the init phase or not. Hmmm.
Yes. But, as Robert suggested up in the thread, we should not use
(parallel_register_count == 0) as a proxy for a bgworker crash. Hence,
I've added an argument named 'wasCrashed' to ForgetBackgroundWorker to
indicate a bgworker crash.
[...] The comment says that the counters are allowed to overflow, i.e.
after a long uptime you might get these values
parallel_register_count = UINT_MAX + 1 = 0
parallel_terminate_count = UINT_MAX
which is fine, because C handles the overflow during the subtraction,
and so
parallel_register_count - parallel_terminate_count = 1
But the assert is not doing subtraction, it's comparing the values
directly:
Assert(parallel_register_count >= parallel_terminate_count);
and the (perfectly valid) values trivially violate this comparison.
Thanks for the explanation. So, we can't use the above assert
statement. Even the following assert statement will not be helpful:
Assert(parallel_register_count - parallel_terminate_count >= 0);
because it'll fail to track the scenario when parallel_register_count
has not overflowed but is still less than parallel_terminate_count. :(
It seems to me the only thing we can make sure of is
(parallel_register_count == parallel_terminate_count == 0) in
ForgetBackgroundWorker in the case of a crash.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On 04/05/2017 12:36 PM, Kuntal Ghosh wrote:
[...] Yes. But, as Robert suggested up in the thread, we should not use
(parallel_register_count == 0) as a proxy for a bgworker crash. Hence,
I've added an argument named 'wasCrashed' to ForgetBackgroundWorker to
indicate a bgworker crash.
Sure, and I agree that having an explicit flag is the right solution.
I'm just trying to make sure I understand what's happening.
The comment says that the counters are allowed to overflow, i.e. after a
long uptime you might get these values
parallel_register_count = UINT_MAX + 1 = 1
parallel_terminate_count = UINT_MAX
which is fine, because C handles the overflow during subtraction and so
parallel_register_count - parallel_terminate_count = 1
But the assert is not doing subtraction, it's comparing the values directly:
Assert(parallel_register_count >= parallel_terminate_count);
and the (perfectly valid) values trivially violate this comparison.
Thanks for the explanation. So, we can't use the above assert
statement. Even the following assert statement will not be helpful:
Assert(parallel_register_count - parallel_terminate_count >= 0);
Because, it'll fail to catch the scenario where parallel_register_count
has not overflowed but is still less than parallel_terminate_count. :(
Actually, that assert would work, because C does handle overflows on
uint values during subtraction just fine. That is,
(UINT_MAX+x) - UINT_MAX = x
Also, the comment about overflows before BackgroundWorkerArray claims
this is the case.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 5, 2017 at 4:13 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
The comment says that the counters are allowed to overflow, i.e. after a
long uptime you might get these values
parallel_register_count = UINT_MAX + 1 = 1
parallel_terminate_count = UINT_MAX
which is fine, because C handles the overflow during subtraction and so
parallel_register_count - parallel_terminate_count = 1
But the assert is not doing subtraction, it's comparing the values
directly:
Assert(parallel_register_count >= parallel_terminate_count);
and the (perfectly valid) values trivially violate this comparison.
Thanks for the explanation. So, we can't use the above assert
statement. Even the following assert statement will not be helpful:
Assert(parallel_register_count - parallel_terminate_count >= 0);
Because, it'll fail to catch the scenario where parallel_register_count
has not overflowed but is still less than parallel_terminate_count. :(
Actually, that assert would work, because C does handle overflows on uint
values during subtraction just fine. That is,
(UINT_MAX+x) - UINT_MAX = x
Also, the comment about overflows before BackgroundWorkerArray claims this
is the case.
Agreed on the overflowed case. But, my concern is when an overflow has
not yet happened:
Suppose,
uint parallel_register_count = 1; /* Didn't overflow */
uint parallel_terminate_count = 2; /* Didn't overflow */
Assert(parallel_register_count - parallel_terminate_count >= 0);
We want the assert statement to fail here, but I think it won't, since
the unsigned subtraction wraps -1 around to a large value, which is
greater than 0, no?
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On 04/05/2017 01:09 PM, Kuntal Ghosh wrote:
On Wed, Apr 5, 2017 at 4:13 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:The comment says that the counters are allowed to overflow, i.e. after a
long uptime you might get these valuesparallel_register_count = UINT_MAX + 1 = 1
parallel_terminate_count = UINT_MAXwhich is fine, because the C handles the overflow during subtraction and
soparallel_register_count - parallel_terminate_count = 1
But the assert is not doing subtraction, it's comparing the values
directly:Assert(parallel_register_count >= parallel_terminate_count);
and the (perfectly valid) values trivially violate this comparison.
Thanks for the explanation. So, we can't use the above assert
statement. Even the following assert statement will not be helpful:
Assert(parallel_register_count - parallel_terminate_count >= 0);
Because, it'll fail to track the scenario when parallel_register_count
is not overflowed, still less than parallel_terminate_count. :(Actually, that assert would work, because C does handle overflows on uint
values during subtraction just fine. That is,(UINT_MAX+x) - UINT_MAX = x
Also, the comment about overflows before BackgroundWorkerArray claims this
is the case.Agreed on the overflowed case. But, my concern is when an overflow has
not yet happened:Suppose,
uint parallel_register_count = 1; /* Didn't overflow* /
uint parallel_terminate_count = 2; /* Didn't overflow */Assert(parallel_register_count - parallel_terminate_count >= 0);
We want the assert statement to fail here, but I think it won't since
-1 has a valid representation in unsigned int and it is greater than
0, no?
I don't follow. How exactly do you get into this situation, assuming you
actually enforce the (register_count > terminate_count) invariant
consistently? In the modulo arithmetic of course.
But now that I'm thinking about it, the subtraction actually happens in
unsigned ints, so the result will be unsigned int too, i.e. always >=0.
Whether it makes sense depends on the invariant being true.
But I think this would work:
Assert(parallel_register_count - parallel_terminate_count <=
max_parallel_workers);
If the difference gets 'negative' and wraps around, it'll be close to
UINT_MAX.
But man, this unsigned int arithmetic makes my brain hurt ...
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 5, 2017 at 12:35 PM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
On 04/04/2017 06:52 PM, Robert Haas wrote:
On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh <kuntalghosh.2007@gmail.com>
wrote:
On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas <robertmhaas@gmail.com>
wrote:
On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
2. the server restarts automatically, initialize
BackgroundWorkerData->parallel_register_count and
BackgroundWorkerData->parallel_terminate_count in the shared memory.
After that, it calls ForgetBackgroundWorker and it increments
parallel_terminate_count.
Hmm. So this seems like the root of the problem. Presumably those
things need to be reset AFTER forgetting any background workers from
before the crash.
IMHO, the fix would be not to increase the terminated parallel worker
count whenever ForgetBackgroundWorker is called due to a bgworker
crash. I've attached a patch for the same. PFA.
While I'm not opposed to that approach, I don't think this is a good
way to implement it. If you want to pass an explicit flag to
ForgetBackgroundWorker telling it whether or not it should perform
the increment, fine. But with what you've got here, you're
essentially relying on "spooky action at a distance". It would be
easy for future code changes to break this, not realizing that
somebody's got a hard dependency on 0 having a specific meaning.
I'm probably missing something, but I don't quite understand how these
values actually survive the crash. I mean, what I observed is OOM followed
by a restart, so shouldn't BackgroundWorkerShmemInit() simply reset the
values back to 0? Or do we call ForgetBackgroundWorker() after the crash for
some reason?
AFAICU, during crash recovery, we wait for all non-syslogger children
to exit, then reset shmem (call BackgroundWorkerShmemInit) and perform
StartupDataBase. While starting the startup process we check if any
bgworker is scheduled for a restart.
In general, your theory appears right, but can you check how it
behaves in standby server because there is a difference in how the
startup process behaves during master and standby startup? In master,
it stops after recovery whereas in standby it will keep on running to
receive WAL.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 4, 2017 at 1:52 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
In any case, the comment right before BackgroundWorkerArray says this:
* These counters can of course overflow, but it's not important here
* since the subtraction will still give the right number.
which means that this assert
+ Assert(BackgroundWorkerData->parallel_register_count >=
+        BackgroundWorkerData->parallel_terminate_count);
is outright broken, just like any other attempts to rely on simple
comparisons of these two counters, no?
Yeah, that's busted. Good catch.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Apr 5, 2017 at 6:36 AM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
Yes. But, as Robert suggested up in the thread, we should not use
(parallel_register_count = 0) as an alternative to define a bgworker
crash. Hence, I've added an argument named 'wasCrashed' in
ForgetBackgroundWorker to indicate a bgworker crash.
Did you intend to attach that patch to this email?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Apr 5, 2017 at 7:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Apr 5, 2017 at 6:36 AM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
Yes. But, as Robert suggested up in the thread, we should not use
(parallel_register_count = 0) as an alternative to define a bgworker
crash. Hence, I've added an argument named 'wasCrashed' in
ForgetBackgroundWorker to indicate a bgworker crash.
Did you intend to attach that patch to this email?
Actually, I'm confused about how we should ensure the (register_count >
terminate_count) invariant. I think the assert Tomas suggested up in
the thread can still bring down the system:
Assert(parallel_register_count - parallel_terminate_count <=
max_parallel_workers);
Backend 1 > SET max_parallel_workers = 8;
Backend 1 > Execute a long-running parallel query q1 with, say, 4
parallel workers spawned.
Backend 2 > SET max_parallel_workers = 3;
Backend 2 > Try to execute any parallel query q2 with the number of
parallel workers spawned > 0.
The above assert statement will bring down the server unnecessarily
while executing q2. If the assert statement was not there, it could
have gone ahead without launching any workers.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 5, 2017 at 10:09 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
Did you intend to attach that patch to this email?
Actually, I'm confused how we should ensure (register_count >
terminate_count) invariant. I think there can be a system crash what
Tomas has suggested up in the thread.
Assert(parallel_register_count - parallel_terminate_count <=
max_parallel_workers);
Backend 1 > SET max_parallel_worker = 8;
Backend 1 > Execute a long running parallel query q1 with number of
parallel worker spawned is say, 4.
At this point, parallel_register_count should be equal to
parallel_terminate_count. 4 workers were started, and 4 have
terminated.
Backend 2> SET max_parallel_worker = 3;
Backend 2 > Try to execute any parallel query q2 with number of
parallel worker spawned > 0.
Now here parallel_register_count should get bumped up to 4+(# of
workers now launched) at the beginning and then
parallel_terminate_count at the end. No problem.
What's going wrong, here?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 04/05/2017 04:09 PM, Kuntal Ghosh wrote:
On Wed, Apr 5, 2017 at 7:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Apr 5, 2017 at 6:36 AM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
Yes. But, as Robert suggested up in the thread, we should not use
(parallel_register_count = 0) as an alternative to define a bgworker
crash. Hence, I've added an argument named 'wasCrashed' in
ForgetBackgroundWorker to indicate a bgworker crash.
Did you intend to attach that patch to this email?
Actually, I'm confused how we should ensure (register_count >
terminate_count) invariant. I think there can be a system crash what
Tomas has suggested up in the thread.
Assert(parallel_register_count - parallel_terminate_count <=
max_parallel_workers);
Backend 1 > SET max_parallel_worker = 8;
Backend 1 > Execute a long running parallel query q1 with number of
parallel worker spawned is say, 4.
Backend 2 > SET max_parallel_worker = 3;
Backend 2 > Try to execute any parallel query q2 with number of
parallel worker spawned > 0.
The above assert statement will bring down the server unnecessarily
while executing q2. If the assert statement was not there, it could
have gone ahead without launching any workers.
Ah, right. I forgot max_parallel_workers may be changed in session. I
think we can use max_worker_processes instead:
Assert(parallel_register_count - parallel_terminate_count <=
max_worker_processes);
The whole point is that if parallel_terminate_count exceeds
parallel_register_count, the subtraction wraps around to a value close
to UINT_MAX. All we need is a maximum possible delta between the two
values to detect this, and max_worker_processes seems fine. We could
also use UINT_MAX/2 for example.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 04/05/2017 04:15 PM, Robert Haas wrote:
On Wed, Apr 5, 2017 at 10:09 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
Did you intend to attach that patch to this email?
Actually, I'm confused how we should ensure (register_count >
terminate_count) invariant. I think there can be a system crash what
Tomas has suggested up in the thread.
Assert(parallel_register_count - parallel_terminate_count <=
max_parallel_workers);
Backend 1 > SET max_parallel_worker = 8;
Backend 1 > Execute a long running parallel query q1 with number of
parallel worker spawned is say, 4.
At this point, parallel_register_count should be equal to
parallel_terminate_count. 4 workers were started, and 4 have
terminated.
No, the workers were started, but are still running, so
register_count = 4
terminate_count = 0
Backend 2> SET max_parallel_worker = 3;
Backend 2 > Try to execute any parallel query q2 with number of
parallel worker spawned > 0.
Now here parallel_register_count should get bumped up to 4+(# of
workers now launched) at the beginning and then
parallel_terminate_count at the end. No problem.
What's going wrong, here?
Well, assuming the other workers are still running, you get
register_count = 4
terminate_count = 0
and the difference is greater than max_parallel_workers = 3. Which
violates the assert.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 5, 2017 at 7:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Apr 5, 2017 at 10:09 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
Did you intend to attach that patch to this email?
Actually, I'm confused how we should ensure (register_count >
terminate_count) invariant. I think there can be a system crash what
Tomas has suggested up in the thread.
Assert(parallel_register_count - parallel_terminate_count <=
max_parallel_workers);
Backend 1 > SET max_parallel_worker = 8;
Backend 1 > Execute a long running parallel query q1 with number of
parallel worker spawned is say, 4.
At this point, parallel_register_count should be equal to
parallel_terminate_count. 4 workers were started, and 4 have
terminated.
Actually, I'm referring to the case when q1 is still running. In that
case, parallel_register_count = 4 and parallel_terminate_count = 0.
Backend 2> SET max_parallel_worker = 3;
Now, parallel_register_count - parallel_terminate_count = 4 >
max_parallel_worker.
Backend 2 > Try to execute any parallel query q2 with number of
parallel worker spawned > 0.
Hence, the assert will fail here.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On 04/05/2017 04:26 PM, Kuntal Ghosh wrote:
On Wed, Apr 5, 2017 at 7:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Apr 5, 2017 at 10:09 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
Did you intend to attach that patch to this email?
Actually, I'm confused how we should ensure (register_count >
terminate_count) invariant. I think there can be a system crash what
Tomas has suggested up in the thread.
Assert(parallel_register_count - parallel_terminate_count <=
max_parallel_workers);
Backend 1 > SET max_parallel_worker = 8;
Backend 1 > Execute a long running parallel query q1 with number of
parallel worker spawned is say, 4.
At this point, parallel_register_count should be equal to
parallel_terminate_count. 4 workers were started, and 4 have
terminated.
Actually, I'm referring to the case when q1 is still running. In that
case, parallel_register_count = 4 and parallel_terminate_count = 0.
Backend 2> SET max_parallel_worker = 3;
Now, parallel_register_count - parallel_terminate_count = 4 >
max_parallel_worker.
Backend 2 > Try to execute any parallel query q2 with number of
parallel worker spawned > 0.
Hence, the assert will fail here.
Actually, you probably don't even need two sessions to trigger the
assert. All you need is to tweak the max_parallel_workers and then
reload the config while the parallel query is running. Then
ForgetBackgroundWorker() will see the new value.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 5, 2017 at 5:34 PM, Kuntal Ghosh <kuntalghosh.2007@gmail.com>
wrote:
On Tue, Apr 4, 2017 at 12:16 PM, Neha Khatri <nehakhatri5@gmail.com>
wrote:
I feel there should be an assert if
(BackgroundWorkerData->parallel_register_count -
BackgroundWorkerData->parallel_terminate_count) > max_parallel_workers)
Backend 1 > SET max_parallel_worker = 8;
Backend 1 > Execute a long running parallel query q1 with number of
parallel worker spawned is say, 4.
Backend 2> SET max_parallel_worker = 3;
Backend 2 > Execute a long running parallel query q2 with number of
parallel worker spawned > 0.
The above assert statement will bring down the server unnecessarily
while executing q2.
Right, with multiple backends trying to fiddle with max_parallel_workers,
that might bring the server down with the said assert:
Assert(parallel_register_count - parallel_terminate_count <=
max_parallel_workers)
The problem here seems to be the change in the max_parallel_workers value
while the parallel workers are still under execution. This poses two
questions:
1. From a use-case point of view, why would there be a need to tweak
max_parallel_workers exactly when parallel workers are at play?
2. Could there be a restriction on tweaking max_parallel_workers while
the parallel workers are at play? At least do not allow setting
max_parallel_workers less than the current # of active parallel workers.
Regards,
Neha
On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri <nehakhatri5@gmail.com> wrote:
The problem here seem to be the change in the max_parallel_workers value
while the parallel workers are still under execution. So this poses two
questions:
1. From usecase point of view, why could there be a need to tweak the
max_parallel_workers exactly at the time when the parallel workers are at
play.
2. Could there be a restriction on tweaking of max_parallel_workers while
the parallel workers are at play? At least do not allow setting the
max_parallel_workers less than the current # of active parallel workers.
Well, that would be letting the tail wag the dog. The maximum value
of max_parallel_workers is only 1024, and what we're really worried
about here is seeing a value near PG_UINT32_MAX, which leaves a lot of
daylight. How about just creating a #define that's used by guc.c as
the maximum for the GUC, and here we assert that we're <= that value?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Apr 5, 2017 at 6:49 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Apr 5, 2017 at 12:35 PM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra
I'm probably missing something, but I don't quite understand how these
values actually survive the crash. I mean, what I observed is OOM followed
by a restart, so shouldn't BackgroundWorkerShmemInit() simply reset the
values back to 0? Or do we call ForgetBackgroundWorker() after the crash for
some reason?
AFAICU, during crash recovery, we wait for all non-syslogger children
to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform
StartupDataBase. While starting the startup process we check if any
bgworker is scheduled for a restart.
In general, your theory appears right, but can you check how it
behaves in standby server because there is a difference in how the
startup process behaves during master and standby startup? In master,
it stops after recovery whereas in standby it will keep on running to
receive WAL.
While performing StartupDatabase, both master and standby servers
behave in a similar way until the postmaster spawns the startup process.
In master, startup process completes its job and dies. As a result,
reaper is called which in turn calls maybe_start_bgworker().
In standby, after getting a valid snapshot, startup process sends
postmaster a signal to enable connections. Signal handler in
postmaster calls maybe_start_bgworker().
In maybe_start_bgworker(), if we find a crashed bgworker (crashed_at !=
0) with a NEVER RESTART flag, we call ForgetBackgroundWorker() to
forget the bgworker process.
I've attached the patch for adding an argument in
ForgetBackgroundWorker() to indicate a crashed situation. Based on
that, we can take the necessary actions. I've not included the Assert
statement in this patch.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0001-Fix-parallel-worker-counts-after-a-crash_v1.patch
From deee881ab6651dde633d0d53c9bf81d67135ac04 Mon Sep 17 00:00:00 2001
From: Kuntal Ghosh <kuntal.ghosh@enterprisedb.com>
Date: Wed, 5 Apr 2017 14:10:14 +0530
Subject: [PATCH] Fix parallel worker counts after a crash
The number of registered parallel workers should be at least the
number of terminated parallel workers. When ForgetBackgroundWorker is
called due to a bgworker crash, we should not increase the terminated
parallel worker count.
---
src/backend/postmaster/bgworker.c | 16 +++++++++++++---
src/backend/postmaster/postmaster.c | 6 +++---
src/include/postmaster/bgworker_internals.h | 2 +-
3 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 0823317..59b13fc 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -399,10 +399,12 @@ BackgroundWorkerStateChange(void)
* points to it. This convention allows deletion of workers during
* searches of the worker list, and saves having to search the list again.
*
+ * wasCrashed indicates whether the worker crashed previously.
+ *
* This function must be invoked only in the postmaster.
*/
void
-ForgetBackgroundWorker(slist_mutable_iter *cur)
+ForgetBackgroundWorker(slist_mutable_iter *cur, bool wasCrashed)
{
RegisteredBgWorker *rw;
BackgroundWorkerSlot *slot;
@@ -412,7 +414,15 @@ ForgetBackgroundWorker(slist_mutable_iter *cur)
Assert(rw->rw_shmem_slot < max_worker_processes);
slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
if ((rw->rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
- BackgroundWorkerData->parallel_terminate_count++;
+ {
+ /*
+ * If the worker crashed previously, shared memory must have been
+ * initialized. Hence, we don't increase the terminate count in
+ * that case.
+ */
+ if (!wasCrashed)
+ BackgroundWorkerData->parallel_terminate_count++;
+ }
slot->in_use = false;
@@ -471,7 +481,7 @@ ReportBackgroundWorkerExit(slist_mutable_iter *cur)
*/
if (rw->rw_terminate ||
rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
- ForgetBackgroundWorker(cur);
+ ForgetBackgroundWorker(cur, false);
if (notify_pid != 0)
kill(notify_pid, SIGUSR1);
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6831342..aa7ccf3 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1601,7 +1601,7 @@ DetermineSleepTime(struct timeval * timeout)
if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART
|| rw->rw_terminate)
{
- ForgetBackgroundWorker(&siter);
+ ForgetBackgroundWorker(&siter, false);
continue;
}
@@ -5716,7 +5716,7 @@ maybe_start_bgworker(void)
/* marked for death? */
if (rw->rw_terminate)
{
- ForgetBackgroundWorker(&iter);
+ ForgetBackgroundWorker(&iter, false);
continue;
}
@@ -5731,7 +5731,7 @@ maybe_start_bgworker(void)
{
if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
{
- ForgetBackgroundWorker(&iter);
+ ForgetBackgroundWorker(&iter, true);
continue;
}
diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index 9a2de4f..f50b2b1 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -40,7 +40,7 @@ extern slist_head BackgroundWorkerList;
extern Size BackgroundWorkerShmemSize(void);
extern void BackgroundWorkerShmemInit(void);
extern void BackgroundWorkerStateChange(void);
-extern void ForgetBackgroundWorker(slist_mutable_iter *cur);
+extern void ForgetBackgroundWorker(slist_mutable_iter *cur, bool wasCrashed);
extern void ReportBackgroundWorkerPID(RegisteredBgWorker *);
extern void ReportBackgroundWorkerExit(slist_mutable_iter *cur);
extern void BackgroundWorkerStopNotifications(pid_t pid);
--
1.8.3.1
On Thu, Apr 06, 2017 at 03:04:13PM +0530, Kuntal Ghosh wrote:
On Wed, Apr 5, 2017 at 6:49 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Apr 5, 2017 at 12:35 PM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra
I'm probably missing something, but I don't quite understand how these
values actually survive the crash. I mean, what I observed is OOM followed
by a restart, so shouldn't BackgroundWorkerShmemInit() simply reset the
values back to 0? Or do we call ForgetBackgroundWorker() after the crash for
some reason?
AFAICU, during crash recovery, we wait for all non-syslogger children
to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform
StartupDataBase. While starting the startup process we check if any
bgworker is scheduled for a restart.
In general, your theory appears right, but can you check how it
behaves in standby server because there is a difference in how the
startup process behaves during master and standby startup? In master,
it stops after recovery whereas in standby it will keep on running to
receive WAL.
While performing StartupDatabase, both master and standby server
behave in similar way till postmaster spawns startup process.
In master, startup process completes its job and dies. As a result,
reaper is called which in turn calls maybe_start_bgworker().
In standby, after getting a valid snapshot, startup process sends
postmaster a signal to enable connections. Signal handler in
postmaster calls maybe_start_bgworker().
In maybe_start_bgworker(), if we find a crashed bgworker (crashed_at !=
0) with a NEVER RESTART flag, we call ForgetBackgroundWorker() to
forget the bgworker process.
I've attached the patch for adding an argument in
ForgetBackgroundWorker() to indicate a crashed situation. Based on
that, we can take the necessary actions. I've not included the Assert
statement in this patch.
[Action required within three days. This is a generic notification.]
The above-described topic is currently a PostgreSQL 10 open item. Robert,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.
[1]: /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
On Thu, Apr 6, 2017 at 6:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri <nehakhatri5@gmail.com> wrote:
The problem here seem to be the change in the max_parallel_workers value
while the parallel workers are still under execution. So this poses two
questions:
1. From usecase point of view, why could there be a need to tweak the
max_parallel_workers exactly at the time when the parallel workers are at
play.
2. Could there be a restriction on tweaking of max_parallel_workers while
the parallel workers are at play? At least do not allow setting the
max_parallel_workers less than the current # of active parallel workers.
Well, that would be letting the tail wag the dog. The maximum value
of max_parallel_workers is only 1024, and what we're really worried
about here is seeing a value near PG_UINT32_MAX, which leaves a lot of
daylight. How about just creating a #define that's used by guc.c as
the maximum for the GUC, and here we assert that we're <= that value?
I've added a GUC parameter named MAX_PARALLEL_WORKER_LIMIT and
asserted that the absolute difference between the two counts <= that
value, i.e.,
abs((int)(register_count - terminate_count)) <= MAX_PARALLEL_WORKER_LIMIT;
Because, it's possible that register count has overflowed but
terminate count has not yet overflowed. As a result, the difference in
unsigned integer will be near UINT32_MAX. Hence, we need the absolute
difference after typecasting the same to integer. This value should be
less than the max_parallel_workers upper limit.
I've attached both the patches for better accessibility. PFA.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0001-Fix-parallel-worker-counts-after-a-crash.patch (application/octet-stream)
From efdae1208a9e997a74d35b75ed00a7643691a6dd Mon Sep 17 00:00:00 2001
From: Kuntal Ghosh <kuntal.ghosh@enterprisedb.com>
Date: Wed, 5 Apr 2017 14:10:14 +0530
Subject: [PATCH 1/2] Fix parallel worker counts after a crash
The number of terminated parallel workers should be at most the number
of registered parallel workers. When ForgetBackgroundWorker is called
due to a bgworker crash, we should not increase the terminated parallel
worker count.
---
src/backend/postmaster/bgworker.c | 16 +++++++++++++---
src/backend/postmaster/postmaster.c | 6 +++---
src/include/postmaster/bgworker_internals.h | 2 +-
3 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 0823317..59b13fc 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -399,10 +399,12 @@ BackgroundWorkerStateChange(void)
* points to it. This convention allows deletion of workers during
* searches of the worker list, and saves having to search the list again.
*
+ * wasCrashed indicates whether the worker crashed previously.
+ *
* This function must be invoked only in the postmaster.
*/
void
-ForgetBackgroundWorker(slist_mutable_iter *cur)
+ForgetBackgroundWorker(slist_mutable_iter *cur, bool wasCrashed)
{
RegisteredBgWorker *rw;
BackgroundWorkerSlot *slot;
@@ -412,7 +414,15 @@ ForgetBackgroundWorker(slist_mutable_iter *cur)
Assert(rw->rw_shmem_slot < max_worker_processes);
slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
if ((rw->rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
- BackgroundWorkerData->parallel_terminate_count++;
+ {
+ /*
+ * If the worker crashed previously, shared memory must have been
+ * initialized. Hence, we don't increase the terminate count in
+ * that case.
+ */
+ if (!wasCrashed)
+ BackgroundWorkerData->parallel_terminate_count++;
+ }
slot->in_use = false;
@@ -471,7 +481,7 @@ ReportBackgroundWorkerExit(slist_mutable_iter *cur)
*/
if (rw->rw_terminate ||
rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
- ForgetBackgroundWorker(cur);
+ ForgetBackgroundWorker(cur, false);
if (notify_pid != 0)
kill(notify_pid, SIGUSR1);
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6831342..aa7ccf3 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1601,7 +1601,7 @@ DetermineSleepTime(struct timeval * timeout)
if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART
|| rw->rw_terminate)
{
- ForgetBackgroundWorker(&siter);
+ ForgetBackgroundWorker(&siter, false);
continue;
}
@@ -5716,7 +5716,7 @@ maybe_start_bgworker(void)
/* marked for death? */
if (rw->rw_terminate)
{
- ForgetBackgroundWorker(&iter);
+ ForgetBackgroundWorker(&iter, false);
continue;
}
@@ -5731,7 +5731,7 @@ maybe_start_bgworker(void)
{
if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
{
- ForgetBackgroundWorker(&iter);
+ ForgetBackgroundWorker(&iter, true);
continue;
}
diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index 9a2de4f..f50b2b1 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -40,7 +40,7 @@ extern slist_head BackgroundWorkerList;
extern Size BackgroundWorkerShmemSize(void);
extern void BackgroundWorkerShmemInit(void);
extern void BackgroundWorkerStateChange(void);
-extern void ForgetBackgroundWorker(slist_mutable_iter *cur);
+extern void ForgetBackgroundWorker(slist_mutable_iter *cur, bool wasCrashed);
extern void ReportBackgroundWorkerPID(RegisteredBgWorker *);
extern void ReportBackgroundWorkerExit(slist_mutable_iter *cur);
extern void BackgroundWorkerStopNotifications(pid_t pid);
--
1.8.3.1
0002-Add-GUC-for-max_parallel_workers-upper-limit.patch (application/octet-stream)
From b8917f6b5f159ecc6c73957c7806a14f256b1608 Mon Sep 17 00:00:00 2001
From: Kuntal Ghosh <kuntal.ghosh@enterprisedb.com>
Date: Mon, 10 Apr 2017 15:43:17 +0530
Subject: [PATCH 2/2] Add GUC for max_parallel_workers upper limit
---
src/backend/postmaster/bgworker.c | 10 ++++++++++
src/backend/utils/misc/guc.c | 4 ++--
src/include/postmaster/bgworker.h | 7 +++++++
3 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 59b13fc..d386a3a 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -941,6 +941,16 @@ RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
* postmaster must not take locks; a memory barrier wouldn't guarantee
* anything useful.
*/
+
+ /*
+ * For a parallel worker, the absolute difference between the parallel
+ * worker register count and terminate count must be at most the maximum
+ * parallel worker count limit.
+ */
+ Assert(!parallel || abs((int)(BackgroundWorkerData->parallel_register_count -
+ BackgroundWorkerData->parallel_terminate_count)) <=
+ MAX_PARALLEL_WORKER_LIMIT);
+
if (parallel && (BackgroundWorkerData->parallel_register_count -
BackgroundWorkerData->parallel_terminate_count) >=
max_parallel_workers)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a57b175..3081cfb 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2689,7 +2689,7 @@ static struct config_int ConfigureNamesInt[] =
NULL
},
&max_parallel_workers_per_gather,
- 2, 0, 1024,
+ 2, 0, MAX_PARALLEL_WORKER_LIMIT,
NULL, NULL, NULL
},
@@ -2699,7 +2699,7 @@ static struct config_int ConfigureNamesInt[] =
NULL
},
&max_parallel_workers,
- 8, 0, 1024,
+ 8, 0, MAX_PARALLEL_WORKER_LIMIT,
NULL, NULL, NULL
},
diff --git a/src/include/postmaster/bgworker.h b/src/include/postmaster/bgworker.h
index 51a5978..7f55f4b 100644
--- a/src/include/postmaster/bgworker.h
+++ b/src/include/postmaster/bgworker.h
@@ -41,6 +41,13 @@
#ifndef BGWORKER_H
#define BGWORKER_H
+/* GUC options */
+
+/*
+ * Maximum possible value of parallel workers.
+ */
+#define MAX_PARALLEL_WORKER_LIMIT 1024
+
/*---------------------------------------------------------------------
* External module API.
*---------------------------------------------------------------------
--
1.8.3.1
[ Adding Julien, whose patch this was. ]
On Thu, Apr 6, 2017 at 5:34 AM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
While performing StartupDatabase, both master and standby server
behave in similar way till postmaster spawns startup process.
In master, startup process completes its job and dies. As a result,
reaper is called which in turn calls maybe_start_bgworker().
In standby, after getting a valid snapshot, startup process sends
postmaster a signal to enable connections. Signal handler in
postmaster calls maybe_start_bgworker().
In maybe_start_bgworker(), if we find a crashed bgworker (crashed_at !=
0) with a NEVER RESTART flag, we call ForgetBackgroundWorker() to
forget the bgworker process. I've attached the patch for adding an argument in
ForgetBackgroundWorker() to indicate a crashed situation. Based on
that, we can take the necessary actions. I've not included the Assert
statement in this patch.
This doesn't seem right, because this will potentially pass wasCrashed
= true even if there has been no crash-and-restart cycle. The place
where you're passing "true" only knows that rw->rw_crashed_at != 0 &&
rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART, but that doesn't
prove much of anything. Some *other* background worker could have
crashed, rather than the one at issue. Even if there's only one worker
involved, the test for whether to call HandleChildCrash() involves
other factors:
if ((rw->rw_worker.bgw_flags & BGWORKER_SHMEM_ACCESS) != 0)
{
if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
{
HandleChildCrash(pid, exitstatus, namebuf);
return true;
}
}
if (!ReleasePostmasterChildSlot(rw->rw_child_slot) &&
(rw->rw_worker.bgw_flags & BGWORKER_SHMEM_ACCESS) != 0)
{
HandleChildCrash(pid, exitstatus, namebuf);
return true;
}
After some study, I think the solution here is as follows:
1. Forget BGW_NEVER_RESTART workers in
ResetBackgroundWorkerCrashTimes() rather than leaving them around to
be cleaned up after the conclusion of the restart, so that they go
away before rather than after shared memory is reset.
2. Don't allow BGWORKER_CLASS_PARALLEL workers to be anything other
than BGW_NEVER_RESTART, so that (1) suffices to guarantee that, after
recovering from a crash, 0 is the correct starting value for
parallel_register_count. (Alternatively, we could iterate through the
worker list and compute the correct starting value for
parallel_register_count, but that seems silly since we only ever
expect BGW_NEVER_RESTART here anyway.)
The attached patch seems to fix the problem for me. I'll commit it
tomorrow unless somebody sees a problem with it; if that happens, I'll
provide some other kind of update tomorrow. [For clarity, for
official status update purposes, I am setting a next-update date of
tomorrow, 2017-04-11.]
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
forget-before-crash-v1.patch (application/octet-stream)
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 0823317..6b2385c 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -515,13 +515,34 @@ ResetBackgroundWorkerCrashTimes(void)
rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
- /*
- * For workers that should not be restarted, we don't want to lose the
- * information that they have crashed; otherwise, they would be
- * restarted, which is wrong.
- */
- if (rw->rw_worker.bgw_restart_time != BGW_NEVER_RESTART)
+ if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
+ {
+ /*
+ * Workers marked BGW_NEVER_RESTART shouldn't get relaunched after
+ * the crash, so forget about them. (If we wait until after the
+ * crash to forget about them, and they are parallel workers,
+ * parallel_terminate_count will get incremented after we've
+ * already zeroed parallel_register_count, which would be bad.)
+ */
+ ForgetBackgroundWorker(&iter);
+ }
+ else
+ {
+ /*
+ * The accounting which we do via parallel_register_count and
+ * parallel_terminate_count would get messed up if a worker marked
+ * parallel could survive a crash and restart cycle. All such
+ * workers should be marked BGW_NEVER_RESTART, and thus control
+ * should never reach this branch.
+ */
+ Assert((rw->rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) == 0);
+
+ /*
+ * Allow this worker to be restarted immediately after we finish
+ * resetting.
+ */
rw->rw_crashed_at = 0;
+ }
}
}
@@ -589,6 +610,21 @@ SanityCheckBackgroundWorker(BackgroundWorker *worker, int elevel)
return false;
}
+ /*
+ * Parallel workers may not be configured for restart, because the
+ * parallel_register_count/parallel_terminate_count accounting can't
+ * handle parallel workers lasting through a crash-and-restart cycle.
+ */
+ if (worker->bgw_restart_time != BGW_NEVER_RESTART &&
+ (worker->bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
+ {
+ ereport(elevel,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("background worker \"%s\": parallel workers may not be configured for restart",
+ worker->bgw_name)));
+ return false;
+ }
+
return true;
}
On Tue, Apr 11, 2017 at 1:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
1. Forget BGW_NEVER_RESTART workers in
ResetBackgroundWorkerCrashTimes() rather than leaving them around to
be cleaned up after the conclusion of the restart, so that they go
away before rather than after shared memory is reset.
Now with this, would it still be required to forget BGW_NEVER_RESTART
workers in maybe_start_bgworker():
if (rw->rw_crashed_at != 0)
{
if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
{
ForgetBackgroundWorker(&iter);
continue;
}
......
}
Regards,
Neha
On Mon, Apr 10, 2017 at 2:32 PM, Neha Khatri <nehakhatri5@gmail.com> wrote:
On Tue, Apr 11, 2017 at 1:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
1. Forget BGW_NEVER_RESTART workers in
ResetBackgroundWorkerCrashTimes() rather than leaving them around to
be cleaned up after the conclusion of the restart, so that they go
away before rather than after shared memory is reset.

Now with this, would it still be required to forget BGW_NEVER_RESTART
workers in maybe_start_bgworker():

if (rw->rw_crashed_at != 0)
{
if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
{
ForgetBackgroundWorker(&iter);
continue;
}
......
}
Well, as noted before, not every background worker failure leads to a
crash-and-restart; for example, a non-shmem-connected worker or one
that exits with status 1 will set rw_crashed_at but will not cause a
crash-and-restart cycle. It's not 100% clear to me whether the code
you're talking about can be reached in those cases, but I wouldn't bet
against it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 04/10/2017 01:39 PM, Kuntal Ghosh wrote:
On Thu, Apr 6, 2017 at 6:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri <nehakhatri5@gmail.com> wrote:
The problem here seems to be the change in the max_parallel_workers value
while the parallel workers are still under execution. So this poses two
questions:
1. From a usecase point of view, why could there be a need to tweak
max_parallel_workers exactly at the time when the parallel workers are at
play?
2. Could there be a restriction on tweaking max_parallel_workers while
the parallel workers are at play? At least do not allow setting
max_parallel_workers less than the current # of active parallel workers.

Well, that would be letting the tail wag the dog. The maximum value
of max_parallel_workers is only 1024, and what we're really worried
about here is seeing a value near PG_UINT32_MAX, which leaves a lot of
daylight. How about just creating a #define that's used by guc.c as
the maximum for the GUC, and here we assert that we're <= that value?

I've added a GUC parameter named MAX_PARALLEL_WORKER_LIMIT and
asserted that the absolute difference between the two counts <= that
value, i.e.,
abs((int)(register_count - terminate_count)) <= MAX_PARALLEL_WORKER_LIMIT;

Because, it's possible that the register count has overflowed but
terminate count has not yet overflowed. As a result, the difference in
unsigned integer will be near UINT32_MAX. Hence, we need the absolute
difference after typecasting the same to integer. This value should be
less than the max_parallel_workers upper limit.

I've attached both the patches for better accessibility. PFA.
At first I was like 'WTF? Why do we need a new GUC just because of an
assert?' but you're actually not adding a new GUC parameter, you're
adding a constant which is then used as the max value for the two
existing parallel GUCs.
I think this is fine.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Apr 11, 2017 at 2:36 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Apr 10, 2017 at 2:32 PM, Neha Khatri <nehakhatri5@gmail.com> wrote:
On Tue, Apr 11, 2017 at 1:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
1. Forget BGW_NEVER_RESTART workers in
ResetBackgroundWorkerCrashTimes() rather than leaving them around to
be cleaned up after the conclusion of the restart, so that they go
away before rather than after shared memory is reset.
+1 for the solution.
Now with this, would it still be required to forget BGW_NEVER_RESTART
workers in maybe_start_bgworker():if (rw->rw_crashed_at != 0)
{
if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
{
ForgetBackgroundWorker(&iter);
continue;
}
......
}

Well, as noted before, not every background worker failure leads to a
crash-and-restart; for example, a non-shmem-connected worker or one
that exits with status 1 will set rw_crashed_at but will not cause a
crash-and-restart cycle. It's not 100% clear to me whether the code
you're talking about can be reached in those cases, but I wouldn't bet
against it.
I think that for the above-mentioned background workers, we follow a
different path to call ForgetBackgroundWorker().
CleanupBackgroundWorker()
- ReportBackgroundWorkerExit()
- ForgetBackgroundWorker()
But, I'm not sure for any other cases.
Should we include the assert statement as well?
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On Mon, Apr 10, 2017 at 7:17 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
At first I was like 'WTF? Why do we need a new GUC just because of an
assert?' but you're actually not adding a new GUC parameter, you're adding a
constant which is then used as the max value for the two existing
parallel GUCs. I think this is fine.
I think it is pretty odd-looking. As written, it casts an unsigned
-- and therefore necessarily non-negative -- value to a signed --
and thus possibly negative -- value only to pass it to abs() to
make sure it's not negative:

+ Assert(!parallel || abs((int)(BackgroundWorkerData->parallel_register_count -
+ BackgroundWorkerData->parallel_terminate_count)) <=
+ MAX_PARALLEL_WORKER_LIMIT);
I think we can just say
Assert(!parallel || BackgroundWorkerData->parallel_register_count -
BackgroundWorkerData->parallel_terminate_count <= MAX_PARALLEL_WORKER_LIMIT);
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Apr 11, 2017 at 12:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Apr 10, 2017 at 7:17 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

At first I was like 'WTF? Why do we need a new GUC just because of an
assert?' but you're actually not adding a new GUC parameter, you're adding a
constant which is then used as the max value for the two existing
parallel GUCs. I think this is fine.
I think it is pretty odd-looking. As written, it casts an unsigned
-- and therefore necessarily non-negative -- value to a signed --
and thus possibly negative -- value only to pass it to abs() to
make sure it's not negative:

+ Assert(!parallel || abs((int)(BackgroundWorkerData->parallel_register_count -
+ BackgroundWorkerData->parallel_terminate_count)) <=
+ MAX_PARALLEL_WORKER_LIMIT);

I think we can just say
Assert(!parallel || BackgroundWorkerData->parallel_register_count -
BackgroundWorkerData->parallel_terminate_count <= MAX_PARALLEL_WORKER_LIMIT);
Actually, there's an even simpler way: stick it inside the if () block
where we return false if we're outside the limit. Then we don't need
to test the "parallel" bool either, because it's already known to be
true. Committed that way.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company