"actual time" in QUERY PLAN for parallel operation when loops is bigger than 1

Started by PG Bug reporting formover 4 years ago5 messagesdocs

noreply@postgresql.org

over 4 years ago

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/using-explain.html
Description:

https://www.postgresql.org/docs/14/using-explain.html
The doc says
"the loops value reports the total number of executions of the node, and the
actual time and rows values shown are averages per-execution. This is done
to make the numbers comparable with the way that the cost estimates are
shown."

But I found for parallel operation, the above description maybe not
correct.

For example
postgres=# create table c(id int);
CREATE TABLE
postgres=# insert into c select generate_series(1,1000000);
INSERT 0 1000000
postgres=# explain analyze select count(*) from c;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=10633.55..10633.56 rows=1 width=8) (actual
time=290.460..290.508 rows=1 loops=1)
-> Gather (cost=10633.33..10633.54 rows=2 width=8) (actual
time=289.605..290.484 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=9633.33..9633.34 rows=1 width=8)
(actual time=188.336..188.337 rows=1 loops=3)
-> Parallel Seq Scan on c (cost=0.00..8591.67 rows=416667
width=0) (actual time=0.030..140.036 rows=333333 loops=3)
Planning Time: 0.331 ms
Execution Time: 290.607 ms
(8 rows)

postgres=#

According to PG-doc, the "Parallel Seq Scan" node cost 140.036*3=420ms, but
the total cost for this SQL is only 290ms.
Is the output of this explain correct?

nikolai.berkoff

nikolai.berkoff@pm.me

over 4 years ago

In reply to: PG Bug reporting form (#1)

Re: "actual time" in QUERY PLAN for parallel operation when loops is bigger than 1

https://www.postgresql.org/docs/14/using-explain.html
The doc says

"the loops value reports the total number of executions of the node, and the

actual time and rows values shown are averages per-execution. This is done

to make the numbers comparable with the way that the cost estimates are

shown."

But I found for parallel operation, the above description maybe not

correct.

For example

postgres=# create table c(id int);

CREATE TABLE

postgres=# insert into c select generate_series(1,1000000);

INSERT 0 1000000

postgres=# explain analyze select count(*) from c;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Finalize Aggregate (cost=10633.55..10633.56 rows=1 width=8) (actual

time=290.460..290.508 rows=1 loops=1)

-> Gather (cost=10633.33..10633.54 rows=2 width=8) (actual

time=289.605..290.484 rows=3 loops=1)

Workers Planned: 2

Workers Launched: 2

-> Partial Aggregate (cost=9633.33..9633.34 rows=1 width=8)

(actual time=188.336..188.337 rows=1 loops=3)

-> Parallel Seq Scan on c (cost=0.00..8591.67 rows=416667

width=0) (actual time=0.030..140.036 rows=333333 loops=3)

Planning Time: 0.331 ms

Execution Time: 290.607 ms

(8 rows)

postgres=#

According to PG-doc, the "Parallel Seq Scan" node cost 140.036*3=420ms, but

the total cost for this SQL is only 290ms.

Is the output of this explain correct?

Parallel query is explained in https://www.postgresql.org/docs/14/how-parallel-query-works.html and https://www.postgresql.org/docs/14/parallel-plans.html

The docs seem clear to me that as the nodes are executed in parallel then the time execution time is not 140.036*3. The 140.036 value is actual time the Parallel Seq Scan nodes ran for but there were up to 2 running in parallel.

tanghy.fnst@fujitsu.com

over 4 years ago

In reply to: nikolai.berkoff (#2)

RE: "actual time" in QUERY PLAN for parallel operation when loops is bigger than 1

Wednesday, November 17, 2021 6:29 AM, nikolai.berkoff <nikolai.berkoff@pm.me> wrote:

Parallel query is explained in
https://www.postgresql.org/docs/14/how-parallel-query-works.html and
https://www.postgresql.org/docs/14/parallel-plans.html

The docs seem clear to me that as the nodes are executed in parallel then the
time execution time is not 140.036*3. The 140.036 value is actual time the Parallel
Seq Scan nodes ran for but there were up to 2 running in parallel.

Thanks for your reply.
I read your references but still confused about the 'loops' in parallel query result.

-> Parallel Seq Scan on c (cost=0.00..8591.67 rows=416667 width=0) (actual time=0.030..140.036 rows=333333 loops=3)

In my previous example, actual row number is 333333*3=1e6(which is correct), so I think the actual time is 140.036*3ms.
Do your think the loops(3) has no meaning for parallel scan node when calculate actual time?

Regards,
Tang

Show quoted text

-----Original Message-----
From: nikolai.berkoff <nikolai.berkoff@pm.me>
Sent: Wednesday, November 17, 2021 6:29 AM
To: Tang, Haiying/唐海英 <tanghy.fnst@fujitsu.com>;
pgsql-docs@lists.postgresql.org
Subject: Re: "actual time" in QUERY PLAN for parallel operation when loops is
bigger than 1

https://www.postgresql.org/docs/14/using-explain.html
The doc says

"the loops value reports the total number of executions of the node, and the

actual time and rows values shown are averages per-execution. This is done

to make the numbers comparable with the way that the cost estimates are

shown."

But I found for parallel operation, the above description maybe not

correct.

For example

postgres=# create table c(id int);

CREATE TABLE

postgres=# insert into c select generate_series(1,1000000);

INSERT 0 1000000

postgres=# explain analyze select count(*) from c;

QUERY PLAN

---------------------------------------------------------------------------
---------------------------------------------------------------------------
---------------------------------------------------------------------------
---------------------------------------------------------------------------
---------------------------------------------------------------------------
---------------------------------------------------------------------------
---------------------------------------------------------------------------
---------------------------------------------------------------------------
---------------------------------------------------------------------------
-----------------------------------------------------------

Finalize Aggregate (cost=10633.55..10633.56 rows=1 width=8) (actual

time=290.460..290.508 rows=1 loops=1)

-> Gather (cost=10633.33..10633.54 rows=2 width=8) (actual

time=289.605..290.484 rows=3 loops=1)

Workers Planned: 2

Workers Launched: 2

-> Partial Aggregate (cost=9633.33..9633.34 rows=1 width=8)

(actual time=188.336..188.337 rows=1 loops=3)

-> Parallel Seq Scan on c (cost=0.00..8591.67 rows=416667

width=0) (actual time=0.030..140.036 rows=333333 loops=3)

Planning Time: 0.331 ms

Execution Time: 290.607 ms

(8 rows)

postgres=#

According to PG-doc, the "Parallel Seq Scan" node cost 140.036*3=420ms, but

the total cost for this SQL is only 290ms.

Is the output of this explain correct?

Parallel query is explained in
https://www.postgresql.org/docs/14/how-parallel-query-works.html and
https://www.postgresql.org/docs/14/parallel-plans.html

The docs seem clear to me that as the nodes are executed in parallel then the
time execution time is not 140.036*3. The 140.036 value is actual time the Parallel
Seq Scan nodes ran for but there were up to 2 running in parallel.

Guillaume Lelarge

guillaume@lelarge.info

over 4 years ago

In reply to: tanghy.fnst@fujitsu.com (#3)

Re: "actual time" in QUERY PLAN for parallel operation when loops is bigger than 1

Hi,

Le mer. 17 nov. 2021 à 06:32, tanghy.fnst@fujitsu.com <
tanghy.fnst@fujitsu.com> a écrit :

Wednesday, November 17, 2021 6:29 AM, nikolai.berkoff <
nikolai.berkoff@pm.me> wrote:

Parallel query is explained in
https://www.postgresql.org/docs/14/how-parallel-query-works.html and
https://www.postgresql.org/docs/14/parallel-plans.html

The docs seem clear to me that as the nodes are executed in parallel

then the

time execution time is not 140.036*3. The 140.036 value is actual time

the Parallel

Seq Scan nodes ran for but there were up to 2 running in parallel.

Thanks for your reply.
I read your references but still confused about the 'loops' in parallel
query result.

-> Parallel Seq Scan on c (cost=0.00..8591.67 rows=416667 width=0)

(actual time=0.030..140.036 rows=333333 loops=3)
In my previous example, actual row number is 333333*3=1e6(which is
correct), so I think the actual time is 140.036*3ms.
Do your think the loops(3) has no meaning for parallel scan node when
calculate actual time?

As far as I understand it, you have to multiply the number of rows by the
number of loops, but this doesn't apply to duration at least for parallel
queries.

--
Guillaume.

tanghy.fnst@fujitsu.com

over 4 years ago

In reply to: Guillaume Lelarge (#4)

RE: "actual time" in QUERY PLAN for parallel operation when loops is bigger than 1

On Wednesday, November 17, 2021 4:19 PM, Guillaume Lelarge guillaume@lelarge.info<mailto:guillaume@lelarge.info> wrote:

-> Parallel Seq Scan on c (cost=0.00..8591.67 rows=416667 width=0) (actual time=0.030..140.036 rows=333333 loops=3)

In my previous example, actual row number is 333333*3=1e6(which is correct), so I think the actual time is 140.036*3ms.
Do your think the loops(3) has no meaning for parallel scan node when calculate actual time?

As far as I understand it, you have to multiply the number of rows by the number of loops, but this doesn't apply to duration at least for parallel >queries.

Yes, I got your point.
I’m not familiar with PostgreSQL planner/executor, but if the code is correct, then maybe some modification should be done at [1]https://www.postgresql.org/docs/14/using-explain.html as below:

Before:
Multiply by the loops value to get the total time actually spent in the node.

After:
Multiply by the loops value to get the total time actually spent in the node (for node in the parallel portion of the plan, this is not needed).

[1]: https://www.postgresql.org/docs/14/using-explain.html

What do you think?

Regards,
Tang

"actual time" in QUERY PLAN for parallel operation when loops is bigger than 1

Attachments: