Segmentation fault when max_parallel degree is very High

Started by Dilip Kumar over 9 years ago · 4 messages
#1Dilip Kumar
dilipbalaut@gmail.com
1 attachment(s)

When parallel degree is set to a very high value, say 70000, there is a
segmentation fault in the parallel code, and that is because a type cast
is missing in the code.

Take a look at the test case below:

create table abd(n int) with (parallel_degree=70000);
insert into abd values (generate_series(1,1000000)); analyze abd; vacuum
abd;
set max_parallel_degree=70000;
explain analyze verbose select * from abd where n<=1;

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: LOG: server
process (PID 41906) was terminated by signal 11: Segmentation fault
DETAIL: Failed process was running: explain analyze verbose select * from
abd where n<=1;

This is crashing because of this loop in the ExecParallelSetupTupleQueues
function:

	for (i = 0; i < pcxt->nworkers; ++i)
	{
		...
		mq = shm_mq_create(tqueuespace + i * PARALLEL_TUPLE_QUEUE_SIZE,
						   (Size) PARALLEL_TUPLE_QUEUE_SIZE);
		...
	}

Here i is an int, so the offset i * PARALLEL_TUPLE_QUEUE_SIZE is computed
in int arithmetic. Once the worker index goes beyond 32767, the product
exceeds INT_MAX (32768 * 65536 = 2^31) and overflows the integer boundary,
so shm_mq_create is handed an illegal address and the backend crashes or
corrupts memory. The fix is to cast before multiplying:

	i * PARALLEL_TUPLE_QUEUE_SIZE  -->  (Size) i * PARALLEL_TUPLE_QUEUE_SIZE

The attached patch fixes this issue. Apart from this spot, I have added the
cast at other places as well, wherever it is needed. Those casts also fix
another issue (ERROR: requested shared memory size overflows size_t)
described in the mail thread below:

/messages/by-id/570BACFC.6020305@enterprisedb.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

max_parallel_degree_bug.patch (application/octet-stream)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 0bba9a7..07e4ae0 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -241,7 +241,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 						 PARALLEL_ERROR_QUEUE_SIZE,
 						 "parallel error queue size not buffer-aligned");
 		shm_toc_estimate_chunk(&pcxt->estimator,
-							   PARALLEL_ERROR_QUEUE_SIZE * pcxt->nworkers);
+							   PARALLEL_ERROR_QUEUE_SIZE * (Size)pcxt->nworkers);
 		shm_toc_estimate_keys(&pcxt->estimator, 1);
 
 		/* Estimate how much we'll need for extension entrypoint info. */
@@ -347,7 +347,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 		 */
 		error_queue_space =
 			shm_toc_allocate(pcxt->toc,
-							 PARALLEL_ERROR_QUEUE_SIZE * pcxt->nworkers);
+							 PARALLEL_ERROR_QUEUE_SIZE * (Size)pcxt->nworkers);
 		for (i = 0; i < pcxt->nworkers; ++i)
 		{
 			char	   *start;
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 6df62a7..a90e82c 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -278,7 +278,7 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 
 	/* Allocate memory for shared memory queue handles. */
 	responseq = (shm_mq_handle **)
-		palloc(pcxt->nworkers * sizeof(shm_mq_handle *));
+		palloc((Size)pcxt->nworkers * sizeof(shm_mq_handle *));
 
 	/*
 	 * If not reinitializing, allocate space from the DSM for the queues;
@@ -287,7 +287,7 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 	if (!reinitialize)
 		tqueuespace =
 			shm_toc_allocate(pcxt->toc,
-							 PARALLEL_TUPLE_QUEUE_SIZE * pcxt->nworkers);
+							 PARALLEL_TUPLE_QUEUE_SIZE * (Size)pcxt->nworkers);
 	else
 		tqueuespace = shm_toc_lookup(pcxt->toc, PARALLEL_KEY_TUPLE_QUEUE);
 
@@ -296,7 +296,7 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
 	{
 		shm_mq	   *mq;
 
-		mq = shm_mq_create(tqueuespace + i * PARALLEL_TUPLE_QUEUE_SIZE,
+		mq = shm_mq_create(tqueuespace + (Size)i * PARALLEL_TUPLE_QUEUE_SIZE,
 						   (Size) PARALLEL_TUPLE_QUEUE_SIZE);
 
 		shm_mq_set_receiver(mq, MyProc);
@@ -380,12 +380,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
 	 * looking at pgBufferUsage, so do it unconditionally.
 	 */
 	shm_toc_estimate_chunk(&pcxt->estimator,
-						   sizeof(BufferUsage) * pcxt->nworkers);
+						   sizeof(BufferUsage) * (Size)pcxt->nworkers);
 	shm_toc_estimate_keys(&pcxt->estimator, 1);
 
 	/* Estimate space for tuple queues. */
 	shm_toc_estimate_chunk(&pcxt->estimator,
-						   PARALLEL_TUPLE_QUEUE_SIZE * pcxt->nworkers);
+						   PARALLEL_TUPLE_QUEUE_SIZE * (Size)pcxt->nworkers);
 	shm_toc_estimate_keys(&pcxt->estimator, 1);
 
 	/*
@@ -432,7 +432,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
 
 	/* Allocate space for each worker's BufferUsage; no need to initialize. */
 	bufusage_space = shm_toc_allocate(pcxt->toc,
-									  sizeof(BufferUsage) * pcxt->nworkers);
+								sizeof(BufferUsage) * (Size)pcxt->nworkers);
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufusage_space);
 	pei->buffer_usage = bufusage_space;
 
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dilip Kumar (#1)
Re: Segmentation fault when max_parallel degree is very High

Dilip Kumar <dilipbalaut@gmail.com> writes:

When parallel degree is set to very high say 70000, there is a segmentation
fault in parallel code,
and that is because type casting is missing in the code..

I'd say the cause is not having a sane range limit on the GUC.

or corrupt some memory. Need to typecast
i * PARALLEL_TUPLE_QUEUE_SIZE --> (Size) i * PARALLEL_TUPLE_QUEUE_SIZE and
this will fix

That might "fix" it on 64-bit machines, but not 32-bit.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#2)
Re: Segmentation fault when max_parallel degree is very High

On Wed, May 4, 2016 at 8:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dilip Kumar <dilipbalaut@gmail.com> writes:

When parallel degree is set to very high say 70000, there is a segmentation
fault in parallel code,
and that is because type casting is missing in the code..

I'd say the cause is not having a sane range limit on the GUC.

I think it might not be advisable to have this value more than the number
of CPU cores, so how about limiting it to 512 or 1024?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#4Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#2)
Re: Segmentation fault when max_parallel degree is very High

On Wed, May 4, 2016 at 11:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dilip Kumar <dilipbalaut@gmail.com> writes:

When parallel degree is set to very high say 70000, there is a segmentation
fault in parallel code,
and that is because type casting is missing in the code..

I'd say the cause is not having a sane range limit on the GUC.

or corrupt some memory. Need to typecast
i * PARALLEL_TUPLE_QUEUE_SIZE --> (Size) i * PARALLEL_TUPLE_QUEUE_SIZE and
this will fix

That might "fix" it on 64-bit machines, but not 32-bit.

Yeah, I think what we should do here is use mul_size(), which will
error out instead of crashing.

Putting a range limit on the GUC is a good idea, too, but I like
having overflow checks built into these code paths as a backstop, in
case a value that we think is a safe upper limit turns out to be less
safe than we think ... especially on 32-bit platforms.

I'll go do that, and also limit the maximum parallel degree to 1024,
which ought to be enough for anyone (see what I did there?).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
