fix cost subqueryscan wrong parallel cost

robertmhaas@gmail.com

about 4 years ago

In reply to: bucoo@sohu.com (#1)

Re: fix cost subqueryscan wrong parallel cost

On Tue, Apr 12, 2022 at 2:57 AM bucoo@sohu.com <bucoo@sohu.com> wrote:

The cost_subqueryscan function does not judge whether it is parallel.

I don't see any reason why it would need to do that. A subquery scan
isn't parallel aware.

regress
-- Incremental sort vs. set operations with varno 0
set enable_hashagg to off;
explain (costs off) select * from t union select * from t order by 1,3;
QUERY PLAN
----------------------------------------------------------
Incremental Sort
Sort Key: t.a, t.c
Presorted Key: t.a
-> Unique
-> Sort
Sort Key: t.a, t.b, t.c
-> Append
-> Gather
Workers Planned: 2
-> Parallel Seq Scan on t
-> Gather
Workers Planned: 2
-> Parallel Seq Scan on t t_1
to
Incremental Sort
Sort Key: t.a, t.c
Presorted Key: t.a
-> Unique
-> Sort
Sort Key: t.a, t.b, t.c
-> Gather
Workers Planned: 2
-> Parallel Append
-> Parallel Seq Scan on t
-> Parallel Seq Scan on t t_1
Obviously the latter is less expensive

Generally it should be. But there's no subquery scan visible here.

There may well be something wrong here, but I don't think that you've
diagnosed the problem correctly, or explained it clearly.

--
Robert Haas
EDB: http://www.enterprisedb.com

Richard Guo

guofenglinux@gmail.com

about 4 years ago

In reply to: Robert Haas (#2)

Re: fix cost subqueryscan wrong parallel cost

On Fri, Apr 15, 2022 at 12:50 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Apr 12, 2022 at 2:57 AM bucoo@sohu.com <bucoo@sohu.com> wrote:

The cost_subqueryscan function does not judge whether it is parallel.

I don't see any reason why it would need to do that. A subquery scan
isn't parallel aware.

regress
-- Incremental sort vs. set operations with varno 0
set enable_hashagg to off;
explain (costs off) select * from t union select * from t order by 1,3;
QUERY PLAN
----------------------------------------------------------
Incremental Sort
Sort Key: t.a, t.c
Presorted Key: t.a
-> Unique
-> Sort
Sort Key: t.a, t.b, t.c
-> Append
-> Gather
Workers Planned: 2
-> Parallel Seq Scan on t
-> Gather
Workers Planned: 2
-> Parallel Seq Scan on t t_1
to
Incremental Sort
Sort Key: t.a, t.c
Presorted Key: t.a
-> Unique
-> Sort
Sort Key: t.a, t.b, t.c
-> Gather
Workers Planned: 2
-> Parallel Append
-> Parallel Seq Scan on t
-> Parallel Seq Scan on t t_1
Obviously the latter is less expensive

Generally it should be. But there's no subquery scan visible here.

The paths of subtrees in set operations would be type of subqueryscan.
The SubqueryScan nodes are removed later in set_plan_references() in
this case as they are considered as being trivial.

There may well be something wrong here, but I don't think that you've
diagnosed the problem correctly, or explained it clearly.

Some debugging work shows that the second path is generated but then
fails when competing with the first path. So if there is something
wrong, I think cost calculation is the suspicious point.

Not related to this topic but I noticed another problem from the plan.
Note the first Sort node which is to unique-ify the result of the UNION.
Why cannot we re-arrange the sort keys from (a, b, c) to (a, c, b) so
that we can avoid the second Sort node?

Thanks
Richard

about 4 years ago

In reply to: bucoo@sohu.com (#1)

Re: Re: fix cost subqueryscan wrong parallel cost

Generally it should be. But there's no subquery scan visible here.

I wrote a patch for distinct/union and aggregate support last year(I want restart it again).
/messages/by-id/2021091517250848215321@sohu.com
If not apply this patch, some parallel paths will naver be selected.

Some debugging work shows that the second path is generated but then
fails when competing with the first path. So if there is something
wrong, I think cost calculation is the suspicious point.

Maybe, I will check it again.

Not related to this topic but I noticed another problem from the plan.
Note the first Sort node which is to unique-ify the result of the UNION.
Why cannot we re-arrange the sort keys from (a, b, c) to (a, c, b) so
that we can avoid the second Sort node?

This is a regress test, just for test Incremental Sort plan.

Justin Pryzby

pryzby@telsasoft.com

about 4 years ago

In reply to: Richard Guo (#3)

Re: fix cost subqueryscan wrong parallel cost

On Fri, Apr 15, 2022 at 05:16:44PM +0800, Richard Guo wrote:

Not related to this topic but I noticed another problem from the plan.
Note the first Sort node which is to unique-ify the result of the UNION.
Why cannot we re-arrange the sort keys from (a, b, c) to (a, c, b) so
that we can avoid the second Sort node?

I don't know, but it's possible there's a solution related to commit db0d67db2
"Optimize order of GROUP BY keys" - DISTINCT is the same as GROUP BY k1, ...,
kN. I guess UNION [DISTINCT] should learn to use GROUP BY rather than
DISTINCT?

--
Justin

robertmhaas@gmail.com

about 4 years ago

In reply to: bucoo@sohu.com (#4)

Re: Re: fix cost subqueryscan wrong parallel cost

On Fri, Apr 15, 2022 at 6:06 AM bucoo@sohu.com <bucoo@sohu.com> wrote:

Generally it should be. But there's no subquery scan visible here.

I wrote a patch for distinct/union and aggregate support last year(I want restart it again).
/messages/by-id/2021091517250848215321@sohu.com
If not apply this patch, some parallel paths will naver be selected.

Sure, but that doesn't make the patch correct. The patch proposes
that, when parallelism in use, a subquery scan will produce fewer rows
than when parallelism is not in use, and that's 100% false. Compare
this with the case of a parallel sequential scan. If a table contains
1000 rows, and we scan it with a regular Seq Scan, the Seq Scan will
return 1000 rows. But if we scan it with a Parallel Seq Scan using
say 4 workers, the number of rows returned in each worker will be
substantially less than 1000, because 1000 is now the *total* number
of rows to be returned across *all* processes, and what we need is the
number of rows returned in *each* process.

The same thing isn't true for a subquery scan. Consider:

Gather
-> Subquery Scan
-> Parallel Seq Scan

One thing is for sure: the number of rows that will be produced by the
subquery scan in each backend is exactly equal to the number of rows
that the subquery scan receives from its subpath. Parallel Seq Scan
can't just return a row count estimate based on the number of rows in
the table, because those rows are going to be divided among the
workers. But the Subquery Scan doesn't do anything like that. If it
receives let's say 250 rows as input in each worker, it's going to
produce 250 output rows in each worker. Your patch says it's going to
produce fewer than that, and that's wrong, regardless of whether it
gives you the plan you want in this particular case.

--
Robert Haas
EDB: http://www.enterprisedb.com

about 4 years ago

In reply to: bucoo@sohu.com (#1)

Re: Re: fix cost subqueryscan wrong parallel cost

Sure, but that doesn't make the patch correct. The patch proposes
that, when parallelism in use, a subquery scan will produce fewer rows
than when parallelism is not in use, and that's 100% false. Compare
this with the case of a parallel sequential scan. If a table contains
1000 rows, and we scan it with a regular Seq Scan, the Seq Scan will
return 1000 rows. But if we scan it with a Parallel Seq Scan using
say 4 workers, the number of rows returned in each worker will be
substantially less than 1000, because 1000 is now the *total* number
of rows to be returned across *all* processes, and what we need is the
number of rows returned in *each* process.

for now fuction cost_subqueryscan always using *total* rows even parallel
path. like this:

Gather (rows=30000)
Workers Planned: 2
-> Subquery Scan (rows=30000) -- *total* rows, should be equal subpath
-> Parallel Seq Scan (rows=10000)

Maybe the codes:

/* Mark the path with the correct row estimate */
if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = baserel->rows;

should change to:

/* Mark the path with the correct row estimate */
if (path->path.parallel_workers > 0)
path->path.rows = path->subpath->rows;
else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = baserel->rows;

bucoo@sohu.com

robertmhaas@gmail.com

about 4 years ago

In reply to: bucoo@sohu.com (#7)

Re: Re: fix cost subqueryscan wrong parallel cost

On Wed, Apr 20, 2022 at 10:01 AM bucoo@sohu.com <bucoo@sohu.com> wrote:

for now fuction cost_subqueryscan always using *total* rows even parallel
path. like this:

Gather (rows=30000)
Workers Planned: 2
-> Subquery Scan (rows=30000) -- *total* rows, should be equal subpath
-> Parallel Seq Scan (rows=10000)

OK, that's bad.

Maybe the codes:

/* Mark the path with the correct row estimate */
if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = baserel->rows;

should change to:

/* Mark the path with the correct row estimate */
if (path->path.parallel_workers > 0)
path->path.rows = path->subpath->rows;
else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = baserel->rows;

Suppose parallelism is not in use and that param_info is NULL. Then,
is path->subpath->rows guaranteed to be equal to baserel->rows? If
yes, then we don't need to a three-part if statement as you propose
here and can just change the "else" clause to say path->path.rows =
path->subpath->rows. If no, then your change gives the wrong answer.

--
Robert Haas
EDB: http://www.enterprisedb.com

about 4 years ago

In reply to: bucoo@sohu.com (#1)

Re: Re: fix cost subqueryscan wrong parallel cost

for now fuction cost_subqueryscan always using *total* rows even parallel
path. like this:

Gather (rows=30000)
Workers Planned: 2
-> Subquery Scan (rows=30000) -- *total* rows, should be equal subpath
-> Parallel Seq Scan (rows=10000)

OK, that's bad.

Maybe the codes:

/* Mark the path with the correct row estimate */
if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = baserel->rows;

should change to:

/* Mark the path with the correct row estimate */
if (path->path.parallel_workers > 0)
path->path.rows = path->subpath->rows;
else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = baserel->rows;

Suppose parallelism is not in use and that param_info is NULL. Then,
is path->subpath->rows guaranteed to be equal to baserel->rows? If
yes, then we don't need to a three-part if statement as you propose
here and can just change the "else" clause to say path->path.rows =
path->subpath->rows. If no, then your change gives the wrong answer.

I checked some regress test, Sometimes subquery scan have filter,
so path->subpath->row guaranteed *not* to be equal to baserel->rows.
If the first patch is false, I don't known how to fix this,
looks like need someone's help.

#10

robertmhaas@gmail.com

about 4 years ago

In reply to: bucoo@sohu.com (#9)

Re: Re: fix cost subqueryscan wrong parallel cost

On Thu, Apr 21, 2022 at 2:38 AM bucoo@sohu.com <bucoo@sohu.com> wrote:

Suppose parallelism is not in use and that param_info is NULL. Then,
is path->subpath->rows guaranteed to be equal to baserel->rows? If
yes, then we don't need to a three-part if statement as you propose
here and can just change the "else" clause to say path->path.rows =
path->subpath->rows. If no, then your change gives the wrong answer.

I checked some regress test, Sometimes subquery scan have filter,
so path->subpath->row guaranteed *not* to be equal to baserel->rows.
If the first patch is false, I don't known how to fix this,
looks like need someone's help.

Please fix your mailer so that it doesn't send me a bounce message
every time I reply to one of your messages on list.

I don't know how to fix this right now either, then; maybe I or
someone else will have a good idea later.

--
Robert Haas
EDB: http://www.enterprisedb.com

#11

about 4 years ago

In reply to: Robert Haas (#10)

Re: fix cost subqueryscan wrong parallel cost

Suppose parallelism is not in use and that param_info is NULL. Then,
is path->subpath->rows guaranteed to be equal to baserel->rows? If
yes, then we don't need to a three-part if statement as you propose
here and can just change the "else" clause to say path->path.rows =
path->subpath->rows. If no, then your change gives the wrong answer.

I checked some regress test, Sometimes subquery scan have filter,
so path->subpath->row guaranteed *not* to be equal to baserel->rows.
If the first patch is false, I don't known how to fix this,
looks like need someone's help.

Please fix your mailer so that it doesn't send me a bounce message
every time I reply to one of your messages on list.

This message send using Outlook.

I don't know how to fix this right now either, then; maybe I or
someone else will have a good idea later.

I don't known too.

#12

david.g.johnston@gmail.com

about 4 years ago

In reply to: bucoo@sohu.com (#9)

Re: Re: fix cost subqueryscan wrong parallel cost

On Wed, Apr 20, 2022 at 11:38 PM bucoo@sohu.com <bucoo@sohu.com> wrote:

for now fuction cost_subqueryscan always using *total* rows even

parallel

path. like this:

Gather (rows=30000)
Workers Planned: 2
-> Subquery Scan (rows=30000) -- *total* rows, should be equal

subpath

-> Parallel Seq Scan (rows=10000)

OK, that's bad.

I don't understand how that plan shape is possible. Gather requires a
parallel aware subpath, so said subpath can be executed multiple times in
parallel, and subquery isn't. If there is parallelism happening within a
subquery the results are consolidated using Append or Gather first - and
the output rows of that path entry (all subpaths of Subquery have the same
->row value per set_subquery_size_estimates), become the input tuples for
Subquery, to which it then applies its selectivity multiplier and stores
the final result in baserel->rows; which the costing code then examines
when costing the RTE_SUBQUERY path entry.

David J.

#13

robertmhaas@gmail.com

about 4 years ago

In reply to: David G. Johnston (#12)

Re: Re: fix cost subqueryscan wrong parallel cost

On Fri, Apr 22, 2022 at 11:55 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Wed, Apr 20, 2022 at 11:38 PM bucoo@sohu.com <bucoo@sohu.com> wrote:

for now fuction cost_subqueryscan always using *total* rows even parallel
path. like this:

Gather (rows=30000)
Workers Planned: 2
-> Subquery Scan (rows=30000) -- *total* rows, should be equal subpath
-> Parallel Seq Scan (rows=10000)

OK, that's bad.

I don't understand how that plan shape is possible. Gather requires a parallel aware subpath, so said subpath can be executed multiple times in parallel, and subquery isn't. If there is parallelism happening within a subquery the results are consolidated using Append or Gather first - and the output rows of that path entry (all subpaths of Subquery have the same ->row value per set_subquery_size_estimates), become the input tuples for Subquery, to which it then applies its selectivity multiplier and stores the final result in baserel->rows; which the costing code then examines when costing the RTE_SUBQUERY path entry.

Gather doesn't require a parallel aware subpath, just a parallel-safe
subpath. In a case like this, the parallel seq scan will divide the
rows from the underlying relation across the three processes executing
it. Each process will pass the rows it receives through its own copy
of the subquery scan. Then, the Gather node will collect all the rows
from all the workers to produce the final result.

It's an extremely important feature of parallel query that the
parallel-aware node doesn't have to be immediately beneath the Gather.
You need to have a parallel-aware node in there someplace, but it
could be separated from the gather by any number of levels e.g.

Gather
-> Nested Loop
-> Nested Loop
-> Nested Loop
-> Parallel Seq Scan
-> Index Scan
-> Index Scan
-> Index Scan

You can stick as many parameterized index scans in there as you like
and you still only need one parallel-aware node at the bottom. Once
the parallel seq scan divides up the rows across workers, each worker
can perform the index lookups for the rows that it receives without
any coordination with other workers. It neither knows nor cares that
this is happening in the midst of an operation that is parallel
overall; all the nested loops and index scans work just as they would
in a non-parallel plan. The only things that needed to know about
parallelism are the operation that is dividing up the rows among
workers (here, the parallel seq scan) and the operation that is
gathering up all the results produced by individual workers (here, the
gather node).

--
Robert Haas
EDB: http://www.enterprisedb.com

#14

david.g.johnston@gmail.com

about 4 years ago

In reply to: Robert Haas (#13)

Re: Re: fix cost subqueryscan wrong parallel cost

On Thu, Apr 28, 2022 at 9:53 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Apr 22, 2022 at 11:55 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Wed, Apr 20, 2022 at 11:38 PM bucoo@sohu.com <bucoo@sohu.com> wrote:

for now fuction cost_subqueryscan always using *total* rows even

parallel

path. like this:

Gather (rows=30000)
Workers Planned: 2
-> Subquery Scan (rows=30000) -- *total* rows, should be equal

subpath

-> Parallel Seq Scan (rows=10000)

OK, that's bad.

Gather doesn't require a parallel aware subpath, just a parallel-safe
subpath. In a case like this, the parallel seq scan will divide the
rows from the underlying relation across the three processes executing
it. Each process will pass the rows it receives through its own copy
of the subquery scan. Then, the Gather node will collect all the rows
from all the workers to produce the final result.

Thank you. I think I got off on a tangent there and do understand the
general design here better now.

I feel like the fact that the 2.4 divisor (for 2 planned workers) isn't
shown in the explain plan anywhere is an oversight.

To move the original complaint forward a bit I am posting the three plan
changes that using path->subpath->rows provokes in the regression tests.

======================================================================
--
-- Check for incorrect optimization when IN subquery contains a SRF
--
select * from int4_tbl o where (f1, f1) in
(select f1, generate_series(1,50) / 10 g from int4_tbl i group by f1);

The material difference between the existing plan and this one is the
estimation of 250 rows
here compared to 1 row.
So (rel.rows != path->subpath->rows) at the top of cost_subqueryscan
+               ->  Subquery Scan on "ANY_subquery"  (cost=1.06..9.28
rows=250 width=8)
+                     Output: "ANY_subquery".f1, "ANY_subquery".g
+                     Filter: ("ANY_subquery".f1 = "ANY_subquery".g)
+                     ->  Result  (cost=1.06..6.15 rows=250 width=8)
======================================================================
The second plan change is basically this same thing, going from rows=4 to
rows=1
causes the plan to include a materialize node.  The shape for purposes of
the security barrier
remains correct.
======================================================================
select * from t union select * from t order by 1,3;
Gather here costs 2,600 vs the Append being 2,950 in the existing plan
shape.
+               ->  Gather  (cost=0.00..2600.00 rows=120000 width=12)
+                     Workers Planned: 2
+                     ->  Parallel Append  (cost=0.00..2600.00 rows=50000
width=12)
+                           ->  Parallel Seq Scan on t  (cost=0.00..575.00
rows=25000 width=12)
+                           ->  Parallel Seq Scan on t t_1
 (cost=0.00..575.00 rows=25000 width=12)
=======================================================================

I've attached the two raw regression output diffs.

Using path->subpath->rows ignores the impact of the node's own filters, but
the base pre-filter number is/should be the correct one; though it is
difficult to say that with certainty when most of these nodes are discarded
and one cannot debug in the middle but only observe the end results.
Disabling that optimization is presently beyond my skill though I may take
it up anyway as its likely still orders easier to do, and then hope some of
these plans produce using data to check with, than actually diving into a C
debugger for the first time.

Reverse engineering the 350 difference may be another approach - namely is
it strictly due to the different plan shape or is it due to the number of
rows. The fact that the row difference is 35,000 and the cost is 1%
(cpu_tuple_cost = 0.01) of that number seems like a red herring after
thinking it through...to many scans plus the differing shapes.

David J.

#15

Richard Guo

guofenglinux@gmail.com

about 4 years ago

In reply to: Robert Haas (#13)

Re: Re: fix cost subqueryscan wrong parallel cost

On Fri, Apr 29, 2022 at 12:53 AM Robert Haas <robertmhaas@gmail.com> wrote:

Gather doesn't require a parallel aware subpath, just a parallel-safe
subpath. In a case like this, the parallel seq scan will divide the
rows from the underlying relation across the three processes executing
it. Each process will pass the rows it receives through its own copy
of the subquery scan. Then, the Gather node will collect all the rows
from all the workers to produce the final result.

It's an extremely important feature of parallel query that the
parallel-aware node doesn't have to be immediately beneath the Gather.
You need to have a parallel-aware node in there someplace, but it
could be separated from the gather by any number of levels e.g.

Gather
-> Nested Loop
-> Nested Loop
-> Nested Loop
-> Parallel Seq Scan
-> Index Scan
-> Index Scan
-> Index Scan

Thanks for the explanation. That's really helpful to understand the
parallel query mechanism.

So for the nodes between Gather and parallel-aware node, how should we
calculate their estimated rows?

Currently subquery scan is using rel->rows (if no parameterization),
which I believe is not correct. That's not the size the subquery scan
node in each worker needs to handle, as the rows have been divided
across workers by the parallel-aware node.

Using subpath->rows is not correct either, as subquery scan node may
have quals.

It seems to me the right way is to divide the rel->rows among all the
workers.

Thanks
Richard

#16

tgl@sss.pgh.pa.us

about 4 years ago

In reply to: Richard Guo (#15)

Re: fix cost subqueryscan wrong parallel cost

Richard Guo <guofenglinux@gmail.com> writes:

Currently subquery scan is using rel->rows (if no parameterization),
which I believe is not correct. That's not the size the subquery scan
node in each worker needs to handle, as the rows have been divided
across workers by the parallel-aware node.

Really? Maybe I misunderstand the case under consideration, but
what I think will be happening is that each worker will re-execute
the pushed-down subquery in full. Otherwise it can't compute the
correct answer. What gets divided across the set of workers is
the total *number of executions* of the subquery, which should be
independent of the number of workers, so that the cost is (more
or less) the same as the non-parallel case.

At least that's true for a standard correlated subplan, which is
normally run again for each row processed by the parent node.
For hashed subplans and initplans, what would have been "execute
once" semantics becomes "execute once per worker", creating a
strict cost disadvantage for parallelization. I don't know
whether the current costing model accounts for that. But if it
does that wrong, arbitrarily altering the number of rows won't
make it better.

regards, tom lane

#17

david.g.johnston@gmail.com

about 4 years ago

In reply to: Tom Lane (#16)

Re: fix cost subqueryscan wrong parallel cost

On Fri, Apr 29, 2022 at 7:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Richard Guo <guofenglinux@gmail.com> writes:

Currently subquery scan is using rel->rows (if no parameterization),
which I believe is not correct. That's not the size the subquery scan
node in each worker needs to handle, as the rows have been divided
across workers by the parallel-aware node.

Really? Maybe I misunderstand the case under consideration, but
what I think will be happening is that each worker will re-execute
the pushed-down subquery in full. Otherwise it can't compute the
correct answer. What gets divided across the set of workers is
the total *number of executions* of the subquery, which should be
independent of the number of workers, so that the cost is (more
or less) the same as the non-parallel case.

I've been operating under the belief that a subquery node (or, rather, all
nodes) in a path get their row count estimates from self-selectivity * rows
provided by the next node down in the path. In a partial path, eventually
the parallel aware node at the bottom of the path divides its row count
estimate by the parallel divisor (2.4 for two workers). That value then
returns back up through the path until it hits a gather. Every node in
between, which are almost exclusively parallel-safe (not parallel aware),
can just use the row count being percolated up the path (i.e., they get the
divided row count in their partial pathing and full row counts in the
normal pathing). The gather node, realizing that the row count it is
dealing with has been divided by the parallel divisor, undoes the division
in order to provide the correct row count to the non-parallel executing
nodes above it.

So a subquery is only ever executed once in a path - but the number of
returned rows depends on the number of planned workers done under the
assumption that a query executed among 2 workers costs 1/2.4 the amount of
the same query done with just the leader. It is an independent factor
compared to everything else going on and so the multiplication can simply
be done first to the original row count.

allpaths.c@2974-2975 (generate_gather_paths)
(L: 2993 as well)
(L: 3167 - generate_useful_gather_paths)

rows =
cheapest_partial_path->rows * cheapest_partial_path->parallel_workers;

Shouldn't that be:

rows =
cheapest_partial_path->rows * (get_parallel_divisor(cheapest_partial_path))

David J.

#18

tgl@sss.pgh.pa.us

about 4 years ago

In reply to: Tom Lane (#16)

Re: fix cost subqueryscan wrong parallel cost

I wrote:

Really? Maybe I misunderstand the case under consideration, but
what I think will be happening is that each worker will re-execute
the pushed-down subquery in full.

Oh ... nevermind that: what I was thinking about was the SubLink/SubPlan
case, which is unrelated to SubqueryScan. -ENOCAFFEINE, sorry about that.

regards, tom lane

#19

tgl@sss.pgh.pa.us

about 4 years ago

In reply to: Tom Lane (#18)

Re: fix cost subqueryscan wrong parallel cost

I wrote:

-ENOCAFFEINE, sorry about that.

As penance for that blunder, I spent a little time looking into this
issue (responding to Robert's upthread complaint that it wasn't
explained clearly). See the test case in parallel_subquery.sql,
attached, which is adapted from the test in incremental_sort.sql.
It produces these plans:

explain select * from t union select * from t;

Unique (cost=29344.85..30544.85 rows=120000 width=12)
-> Sort (cost=29344.85..29644.85 rows=120000 width=12)
Sort Key: t.a, t.b, t.c
-> Append (cost=0.00..2950.00 rows=120000 width=12)
-> Gather (cost=0.00..575.00 rows=60000 width=12)
Workers Planned: 2
-> Parallel Seq Scan on t (cost=0.00..575.00 rows=25000 width=12)
-> Gather (cost=0.00..575.00 rows=60000 width=12)
Workers Planned: 2
-> Parallel Seq Scan on t t_1 (cost=0.00..575.00 rows=25000 width=12)

explain select * from t union all select * from t;

Gather (cost=0.00..1400.00 rows=120000 width=12)
Workers Planned: 2
-> Parallel Append (cost=0.00..1400.00 rows=50000 width=12)
-> Parallel Seq Scan on t (cost=0.00..575.00 rows=25000 width=12)
-> Parallel Seq Scan on t t_1 (cost=0.00..575.00 rows=25000 width=12)

I take no position on whether the second plan's costs are correct;
but if they are, then the Gather-atop-Append structure is clearly
cheaper than the Append-atop-Gathers structure, so the planner made
the wrong choice in the first case.

I then modified setrefs.c to not remove SubqueryScan nodes
(just make trivial_subqueryscan() return constant false)
and got this output from the UNION case:

Unique (cost=29344.85..30544.85 rows=120000 width=12)
-> Sort (cost=29344.85..29644.85 rows=120000 width=12)
Sort Key: "*SELECT* 1".a, "*SELECT* 1".b, "*SELECT* 1".c
-> Append (cost=0.00..2950.00 rows=120000 width=12)
-> Subquery Scan on "*SELECT* 1" (cost=0.00..1175.00 rows=60000 width=12)
-> Gather (cost=0.00..575.00 rows=60000 width=12)
Workers Planned: 2
-> Parallel Seq Scan on t (cost=0.00..575.00 rows=25000 width=12)
-> Subquery Scan on "*SELECT* 2" (cost=0.00..1175.00 rows=60000 width=12)
-> Gather (cost=0.00..575.00 rows=60000 width=12)
Workers Planned: 2
-> Parallel Seq Scan on t t_1 (cost=0.00..575.00 rows=25000 width=12)

The UNION ALL plan doesn't change, because no SubqueryScans are ever
created in that case (we flattened the UNION ALL to an appendrel
early on). Removing the test case's settings to allow a parallel
plan, the non-parallelized plan for the UNION case looks like

Unique (cost=30044.85..31244.85 rows=120000 width=12)
-> Sort (cost=30044.85..30344.85 rows=120000 width=12)
Sort Key: "*SELECT* 1".a, "*SELECT* 1".b, "*SELECT* 1".c
-> Append (cost=0.00..3650.00 rows=120000 width=12)
-> Subquery Scan on "*SELECT* 1" (cost=0.00..1525.00 rows=60000 width=12)
-> Seq Scan on t (cost=0.00..925.00 rows=60000 width=12)
-> Subquery Scan on "*SELECT* 2" (cost=0.00..1525.00 rows=60000 width=12)
-> Seq Scan on t t_1 (cost=0.00..925.00 rows=60000 width=12)

So: the row counts for SubqueryScan are correct, thus the upthread
proposal to change them is not. The cost estimates, however, seem
pretty bogus. What we're seeing here is that we're charging 600
cost units to pass the data through SubqueryScan, and that's
consistent across the parallel and non-parallel cases, so it's
correct on its own terms. But actually it's totally wrong because
we're going to elide that node later.

So I think the actual problem here is that we leave the decision
to elide no-op SubqueryScan nodes till setrefs.c. We should detect
that earlier, and when it applies, label the SubqueryScanPath with
the exact cost of its child.

(I think the current implementation might've been all right when
it was devised, on the grounds that we had no real planning
flexibility for UNION operations anyway. But now we do, and the
bogus cost charge is causing visible problems.)

regards, tom lane

#20

tgl@sss.pgh.pa.us

about 4 years ago

In reply to: Tom Lane (#19)

Re: fix cost subqueryscan wrong parallel cost

I wrote:

So I think the actual problem here is that we leave the decision
to elide no-op SubqueryScan nodes till setrefs.c. We should detect
that earlier, and when it applies, label the SubqueryScanPath with
the exact cost of its child.

Hmm ... actually, while doing that seems like it'd be a good idea,
it doesn't have much bearing on the case at hand. I approximated
the results of such a change on this test case by just removing
the "cpu_tuple_cost" component from cost_subqueryscan:

@@ -1420,7 +1420,7 @@ cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);

    startup_cost = qpqual_cost.startup;
-   cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+   cpu_per_tuple = qpqual_cost.per_tuple;
    run_cost = cpu_per_tuple * baserel->tuples;

/* tlist eval costs are paid per output row, not per tuple scanned */

and what I got was

Unique (cost=28144.85..29344.85 rows=120000 width=12)
-> Sort (cost=28144.85..28444.85 rows=120000 width=12)
Sort Key: "*SELECT* 1".a, "*SELECT* 1".b, "*SELECT* 1".c
-> Append (cost=0.00..1750.00 rows=120000 width=12)
-> Subquery Scan on "*SELECT* 1" (cost=0.00..575.00 rows=60000 width=12)
-> Gather (cost=0.00..575.00 rows=60000 width=12)
Workers Planned: 2
-> Parallel Seq Scan on t (cost=0.00..575.00 rows=25000 width=12)
-> Subquery Scan on "*SELECT* 2" (cost=0.00..575.00 rows=60000 width=12)
-> Gather (cost=0.00..575.00 rows=60000 width=12)
Workers Planned: 2
-> Parallel Seq Scan on t t_1 (cost=0.00..575.00 rows=25000 width=12)

The subqueryscans are being costed the way we want now, but it's still
going for the append-atop-gathers plan. So I dug a bit deeper, and
found that generate_union_paths does also create the gather-atop-append
plan, but *it's costed at 1750 units*, exactly the same as the other
path. So add_path is just making an arbitrary choice between two paths
of identical cost, and it happens to pick this one.

(If I don't make the above tweak to cost_subqueryscan, the same thing
happens, although the two competing paths now each have cost 2950.)

So: why do we come out with a cost of 1750 for the very same plan
that in the UNION ALL case is costed at 1400? AFAICS it's because
the UNION ALL case thinks that the two inputs of the parallel append
produce 25000 rows apiece so the parallel append produces 50000 rows,
and it costs the append's overhead on that basis. But in the UNION
case, the parallel append sees two inputs that are claimed to return
60000 rows, so the append produces 120000 rows, meaning more append
overhead. We can't readily get EXPLAIN to print this tree since
it's rejected by add_path, but what I'm seeing (with the above
costing hack) is

{GATHERPATH
:rows 120000
:startup_cost 0.00
:total_cost 1750.00
:subpath
{APPENDPATH
:parallel_aware true
:parallel_safe true
:parallel_workers 2
:rows 120000
:startup_cost 0.00
:total_cost 1750.00
:subpaths (
{SUBQUERYSCANPATH
:rows 60000
:startup_cost 0.00
:total_cost 575.00
:subpath
{PATH
:parallel_aware true
:parallel_safe true
:parallel_workers 2
:rows 25000
:startup_cost 0.00
:total_cost 575.00
}
}
{SUBQUERYSCANPATH
:rows 60000
:startup_cost 0.00
:total_cost 575.00
:subpath
{PATH
:parallel_aware true
:parallel_safe true
:parallel_workers 2
:rows 25000
:startup_cost 0.00
:total_cost 575.00
}
}
)
}
}

In short, these SubqueryScans are being labeled as producing 60000 rows
when their input only produces 25000 rows, which is surely insane.

So: even though the SubqueryScan itself isn't parallel-aware, the number
of rows it processes has to be de-rated according to the number of workers
involved. Perhaps something like the original patch in this thread is
indeed correct. But I can't help feeling that this definition of
path_rows is mighty unintuitive and error-prone, and that if
cost_subqueryscan is wrong then it's likely got lots of company.
Maybe we need to take two steps back and rethink the whole approach.

regards, tom lane

#21

david.g.johnston@gmail.com

about 4 years ago

In reply to: Tom Lane (#20)

#22