How does Postgres decide if to use additional workers?

Started by Thorsten Schöningabout 5 years ago3 messagesgeneral
Jump to latest
#1Thorsten Schöning
tschoening@am-soft.de

Hi all,

I have the following table with an added BTREE-index on "captured_at".

CREATE TABLE datagram
(
id bigserial NOT NULL,
src_re integer NOT NULL,
src_clt integer NOT NULL,
src_meter integer NOT NULL,
captured_at timestamp with time zone NOT NULL,
captured_rssi smallint NOT NULL,
oms_status smallint NOT NULL,
oms_enc bytea,
oms_dec bytea
);

I have a query containing the following condition:

AND "datagram"."captured_at" BETWEEN (CAST('2020-08-28T10:34:32.855+02:00' AS TIMESTAMP WITH TIME ZONE) - CAST('P5D' AS INTERVAL)) AND (CAST('2020-08-28T10:34:32.855+02:00' AS TIMESTAMP WITH TIME ZONE) + CAST('P0D' AS INTERVAL))

When that query is applied to the above table with it's index on
"captured_at", that results in the following query plan. The important
thing to note is that NO parallel workers a re used.

-> Hash Join (cost=246164.35..2004405.07 rows=11323 width=51) (actual time=93.802..5776.755 rows=104607 loops=1)
Hash Cond: (meter.meter_bcd = meter_bcd.id)
-> Hash Join (cost=246019.19..2003889.83 rows=68494 width=37) (actual time=93.067..5744.787 rows=104607 loops=1)
Hash Cond: (datagram.src_meter = meter.id)
-> Index Scan using idx_datagram_captured_at_btree on datagram (cost=0.57..1756571.73 rows=495033 width=20) (actual time=0.054..5451.417 rows=514369 loops=1)
Index Cond: ((captured_at >= ('2020-08-28 10:34:32.855+02'::timestamp with time zone - '5 days'::interval)) AND (captured_at <= ('2020-08-28 10:34:32.855+02'::timestamp with time zone + '00:00:00'::interval)))

For various reasons I tested the above table as a partitioned one as
well, with individual partitions containing the rows of one year only.
The important thing to note is that I simply kept the same index on
"captured_at" like before, though the query plan looks different now:

Workers Planned: 2
Workers Launched: 2
-> Hash Join (cost=245966.53..272335.67 rows=5419 width=51) (actual time=625.846..1560.103 rows=34869 loops=3)
Hash Cond: (datagram_y2020_h2.src_meter = meter.id)
-> Parallel Append (cost=4.19..25430.72 rows=236911 width=20) (actual time=2.827..863.298 rows=171456 loops=3)
Subplans Removed: 23
-> Parallel Index Scan using datagram_y2020_h2_captured_at_idx on datagram_y2020_h2 (cost=0.44..24051.22 rows=236888 width=20) (actual time=2.826..848.388 rows=171456 loops=3)
Index Cond: ((captured_at >= ('2020-08-28 10:34:32.855+02'::timestamp with time zone - '5 days'::interval)) AND (captured_at <= ('2020-08-28 10:34:32.855+02'::timestamp with time zone + '00:00:00'::interval)))

It seems that only because of a different number of rows per
individual table, additional workers are used. Though, in the past I
had all of those "captured_at" in one table as well, only with far
less columns and for that table additional workers have been used,
pretty much like is the case now:

Workers Planned: 2
Workers Launched: 2
-> Hash Join (cost=264793.42..1666293.23 rows=4332 width=51) (actual time=96.080..638.802 rows=34869 loops=3)
Hash Cond: (oms_rec.meter = meter.id)
-> Nested Loop (cost=1.14..1400747.39 rows=189399 width=20) (actual time=0.145..496.366 rows=171456 loops=3)
-> Hash (cost=264709.53..264709.53 rows=6620 width=39) (actual time=95.521..95.528 rows=40044 loops=3)
Buckets: 65536 (originally 8192) Batches: 1 (originally 1) Memory Usage: 3016kB
-> Parallel Index Scan using idx_clt_rec_captured_at on clt_rec (cost=0.57..14853.95 rows=189399 width=24) (actual time=0.098..81.556 rows=171456 loops=3)
-> Index Scan using pk_oms_rec on oms_rec (cost=0.57..7.32 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=514369)
-> Hash Join (cost=145.59..264709.53 rows=6620 width=39) (actual time=9.883..86.390 rows=40044 loops=3)
Index Cond: (id = clt_rec.oms_rec)
Index Cond: ((captured_at >= ('2020-08-28 10:34:32.855+02'::timestamp with time zone - '5 days'::interval)) AND (captured_at <= ('2020-08-28 10:34:32.855+02'::timestamp with time zone + '00:00:00'::interval)))
Hash Cond: (meter.meter_bcd = meter_bcd.id)

So, based on which facts does Postgres decide if to use aadditional
workers or not? Can I see those decisions explained somewhere? I don't
see anything in the query plan. Thanks!

Mit freundlichen Grüßen

Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK

E-Mail: Thorsten.Schoening@AM-SoFT.de
Web: http://www.AM-SoFT.de/

Telefon: 05151- 9468- 0
Telefon: 05151- 9468-55
Fax: 05151- 9468-88
Mobil: 0178-8 9468-04

AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska

Für Rückfragen stehe ich Ihnen sehr gerne zur Verfügung.

Mit freundlichen Grüßen

Thorsten Schöning

Tel: 05151 9468 0
Fax: 05151 9468 88
Mobil:
Webseite: https://www.am-soft.de

AM-Soft IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK

AM-Soft IT-Service - Bitstore Hameln GmbH i.G.
Brandenburger Str. 7c
31789 Hameln
Tel: 05151 9468 0

Bitstore IT-Consulting GmbH
Zentrale - Berlin Lichtenberg
Frankfurter Allee 285
10317 Berlin
Tel: 030 453 087 80

CBS IT-Service - Bitstore Kaulsdorf UG
Tel: 030 453 087 880 1

Büro Dallgow-Döberitz
Tel: 03322 507 020

Büro Kloster Lehnin
Tel: 033207 566 530

PCE IT-Service - Bitstore Darmstadt UG
Darmstadt
Tel: 06151 392 973 0

Büro Neuruppin
Tel: 033932 606 090

ACI EDV Systemhaus Dresden GmbH
Dresden
Tel: 0351 254 410

Das Systemhaus - Bitstore Magdeburg GmbH
Magdeburg
Tel: 0391 636 651 0

Allerdata.IT - Bitstore Wittenberg GmbH
Wittenberg
Tel: 03491 876 735 7

Büro Liebenwalde
Tel: 033054 810 00

HSA - das Büro - Bitstore Altenburg UG
Altenburg
Tel: 0344 784 390 97

Bitstore IT – Consulting GmbH
NL Piesteritz
Piesteritz
Tel: 03491 644 868 6

Solltec IT-Services - Bitstore Braunschweig UG
Braunschweig
Tel: 0531 206 068 0

MF Computer Service - Bitstore Gütersloh GmbH
Gütersloh
Tel: 05245 920 809 3

Firmensitz: MF Computer Service - Bitstore Gütersloh GmbH, Gildestraße 25, 33442 Herzebrock-Clarholz
Geschäftsführer Janine Galonska

#2Noname
luis.roberto@siscobra.com.br
In reply to: Thorsten Schöning (#1)
Re: How does Postgres decide if to use additional workers?

De: "Thorsten Schöning" <tschoening@am-soft.de>
Para: "pgsql-general" <pgsql-general@lists.postgresql.org>
Enviadas: Terça-feira, 9 de fevereiro de 2021 12:52:02
Assunto: How does Postgres decide if to use additional workers?
So, based on which facts does Postgres decide if to use aadditional
workers or not? Can I see those decisions explained somewhere? I don't
see anything in the query plan. Thanks!

Mit freundlichen Grüßen

Thorsten Schöning

Hi!

I think you should look into min_parallel_table_scan_size and min_parallel_index_scan_size GUCs.

#3Philip Semanchuk
philip@americanefficient.com
In reply to: Thorsten Schöning (#1)
Re: How does Postgres decide if to use additional workers?

On Feb 9, 2021, at 10:52 AM, Thorsten Schöning <tschoening@am-soft.de> wrote:

So, based on which facts does Postgres decide if to use aadditional
workers or not? Can I see those decisions explained somewhere? I don't
see anything in the query plan. Thanks!

Hi Thorsten,
This is an interesting topic for me too. here’s a formula for the max number of workers that Postgres will consider for a table. Of course, most queries use more than just one table, and I don’t yet understand how Postgres handles the situation where the formula suggests multiple workers for some tables and a single worker for others.

There was some conversation about this on the performance mailing list in June. Here’s a link to the message that contains the formula; there’s more items of interest in the whole thread:
/messages/by-id/89423FD3-0F13-447D-8D9E-EB1722150F94@americanefficient.com

Hope this helps
Philip