Parallel Query

Started by Krithika Venkateshabout 8 years ago2 messagesgeneral
Jump to latest
#1Krithika Venkatesh
krithikavenkatesh31@gmail.com

Hi All,

I was reading about parallel queries in postgresql 10.

I came across the property max_parallel_workers_per_gather. It sets the
maximum number of workers that can be started by a single Gather or Gather
Merge node.

I am not able to understand gather and gather merge node.

What is the difference between max_parallel_workers_per_gather and
max_parallel_processes. What is meant by gather node.

Please let me know.

Thanks,
Krithika

#2Rafia Sabih
rafia.sabih@enterprisedb.com
In reply to: Krithika Venkatesh (#1)
Re: Parallel Query

On Thu, Jan 11, 2018 at 12:24 PM, Krithika Venkatesh <
krithikavenkatesh31@gmail.com> wrote:

Hi All,

I was reading about parallel queries in postgresql 10.

I came across the property max_parallel_workers_per_gather. It sets the
maximum number of workers that can be started by a single Gather or Gather
Merge node.

I am not able to understand gather and gather merge node.

With parallel query in picture, there are two new types of nodes

introduced in postgresql 10 viz, Gather node and other is Gather-Merge
node.
Now, if seq-scan has to be performed in parallel, then gather node comes
into picture, which you may think of as a master node for the parallel
workers which are performing sequential scan. e.g.
QUERY PLAN
------------------------------------------------------------------
Gather (cost=0.00..7.60 rows=246 width=8)
Workers Planned: 2
-> Parallel Seq Scan on t (cost=0.00..7.60 rows=102 width=8)
Filter: (i < 50)
(4 rows)

Here, there are two workers which are in parallel scanning the table using
seq-scan and gather is the process which collects the result of this data
and passes it to the node above it, if any.

Now, just like in seq-scan the order of tuples is not maintained so is the
case in gather node. However, for the cases wherein you need to maintain
the order, gather-merge node comes into picture. For example when index
scan is used in parallel then gather-merge will be above it to collect the
results of parallel scans respecting the order of indexes.

What is the difference between max_parallel_workers_per_gather and

max_parallel_processes. What is meant by gather node.

As you mentioned above, max_parallel_workers_per_gather is the total

number of parallel workers allowed for a node. The parameter
max_parallel_processes sets the upper limit of total parallel workers
running. As a base rule, the value of max_parallel_workers_per_gather
should be less than or equal to max_parallel_processes, otherwise the value
of max_parallel_processes will be the maximum allowable workers for a
parallel node.

You may have a look at the documentation of these parameters for more
details --
https://www.postgresql.org/docs/10/static/when-can-parallel-query-be-used.html

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/