Unwanted expression simplification in PG12b2

Started by Darafei "Komяpa" Praliaskouskiover 6 years ago8 messages

Hi,

Many thanks for the parallel improvements in Postgres 12. Here is one of
cases where a costy function gets moved from a parallel worker into main
one, rendering spatial processing single core once again on some queries.
Perhaps an assumption "expressions should be mashed together as much as
possible" should be reviewed and something along "biggest part of
expression should be pushed down into parallel worker"?

PostgreSQL 12beta2 (Ubuntu 12~beta2-1.pgdg19.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, 64-bit

Here is a reproducer:

-- setup
create extension postgis;
create table postgis_test_table (a geometry, b geometry, id int);
set force_parallel_mode to on;
insert into postgis_test_table (select 'POINT EMPTY', 'POINT EMPTY',
generate_series(0,1000) );

-- unwanted inlining moves difference and unary union calculation into
master worker
21:43:06 [gis] > explain verbose select ST_Collect(geom), id from
(select ST_Difference(a,ST_UnaryUnion(b)) as geom, id from
postgis_test_table) z group by id;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Gather (cost=159.86..42668.93 rows=200 width=36)

│ Output: (st_collect(st_difference(postgis_test_table.a,
st_unaryunion(postgis_test_table.b)))), postgis_test_table.id │
│ Workers Planned: 1

│ Single Copy: true

│ -> GroupAggregate (cost=59.86..42568.73 rows=200 width=36)

│ Output: st_collect(st_difference(postgis_test_table.a,
st_unaryunion(postgis_test_table.b))), postgis_test_table.id │
│ Group Key: postgis_test_table.id

│ -> Sort (cost=59.86..61.98 rows=850 width=68)

│ Output: postgis_test_table.id, postgis_test_table.a,
postgis_test_table.b │
│ Sort Key: postgis_test_table.id

│ -> Seq Scan on public.postgis_test_table
(cost=0.00..18.50 rows=850 width=68) │
│ Output: postgis_test_table.id,
postgis_test_table.a, postgis_test_table.b

└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(12 rows)

-- when constrained by OFFSET 0, costy calculation is kept in parallel workers
21:43:12 [gis] > explain verbose select ST_Collect(geom), id from
(select ST_Difference(a,ST_UnaryUnion(b)) as geom, id from
postgis_test_table offset 0) z group by id;
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY
PLAN │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ GroupAggregate (cost=13863.45..13872.33 rows=200 width=36)

│ Output: st_collect(z.geom), z.id

│ Group Key: z.id

│ -> Sort (cost=13863.45..13865.58 rows=850 width=36)

│ Output: z.id, z.geom

│ Sort Key: z.id

│ -> Subquery Scan on z (cost=100.00..13822.09 rows=850
width=36) │
│ Output: z.id, z.geom

│ -> Gather (cost=100.00..13813.59 rows=850 width=36)

│ Output: (st_difference(postgis_test_table.a,
st_unaryunion(postgis_test_table.b))), postgis_test_table.id │
│ Workers Planned: 3

│ -> Parallel Seq Scan on
public.postgis_test_table (cost=0.00..13712.74 rows=274 width=36)

│ Output:
st_difference(postgis_test_table.a,
st_unaryunion(postgis_test_table.b)), postgis_test_table.id │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(13 rows)

-- teardown
drop table postgis_test_table;

--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Darafei "Komяpa" Praliaskouski (#1)
Re: Unwanted expression simplification in PG12b2

=?UTF-8?Q?Darafei_=22Kom=D1=8Fpa=22_Praliaskouski?= <me@komzpa.net> writes:

Many thanks for the parallel improvements in Postgres 12. Here is one of
cases where a costy function gets moved from a parallel worker into main
one, rendering spatial processing single core once again on some queries.
Perhaps an assumption "expressions should be mashed together as much as
possible" should be reviewed and something along "biggest part of
expression should be pushed down into parallel worker"?

I don't see anything in your test case that proves what you think it does.
The expensive calculation *is* being done in the worker in the first
example. It's not real clear to me why the first example is only choosing
to use one worker rather than 3, but probably with a larger test case
(ie bigger table) that decision would change.

Just to clarify --- when you see something like this:

│ Gather (cost=159.86..42668.93 rows=200 width=36)
│ Output: (st_collect(st_difference(postgis_test_table.a,st_unaryunion(postgis_test_table.b)))), postgis_test_table.id
│ -> GroupAggregate (cost=59.86..42568.73 rows=200 width=36)
│ Output: st_collect(st_difference(postgis_test_table.a,st_unaryunion(postgis_test_table.b))), postgis_test_table.id

EXPLAIN is trying to tell you that the expression value is being
computed by the lower plan node, and just passed up to the upper
plan node --- that's what the extra parens in the upper expression
printout mean. Perhaps there's some way to make that clearer,
but I haven't thought of one that doesn't seem very clutter-y.

regards, tom lane

In reply to: Tom Lane (#2)
Re: Unwanted expression simplification in PG12b2

Hi,

On Wed, Jul 17, 2019 at 11:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

=?UTF-8?Q?Darafei_=22Kom=D1=8Fpa=22_Praliaskouski?= <me@komzpa.net>
writes:

Many thanks for the parallel improvements in Postgres 12. Here is one of
cases where a costy function gets moved from a parallel worker into main
one, rendering spatial processing single core once again on some queries.
Perhaps an assumption "expressions should be mashed together as much as
possible" should be reviewed and something along "biggest part of
expression should be pushed down into parallel worker"?

I don't see anything in your test case that proves what you think it does.
The expensive calculation *is* being done in the worker in the first
example. It's not real clear to me why the first example is only choosing
to use one worker rather than 3, but probably with a larger test case
(ie bigger table) that decision would change.

Indeed, it seems I failed to minimize my example.

Here is the actual one, on 90GB table with 16M rows:
https://gist.github.com/Komzpa/8d5b9008ad60f9ccc62423c256e78b4c

I can share the table on request if needed, but hope that plan may be
enough.

--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa

#4Robert Haas
robertmhaas@gmail.com
In reply to: Darafei "Komяpa" Praliaskouski (#3)
Re: Unwanted expression simplification in PG12b2

On Wed, Jul 17, 2019 at 5:20 PM Darafei "Komяpa" Praliaskouski
<me@komzpa.net> wrote:

Indeed, it seems I failed to minimize my example.

Here is the actual one, on 90GB table with 16M rows:
https://gist.github.com/Komzpa/8d5b9008ad60f9ccc62423c256e78b4c

I can share the table on request if needed, but hope that plan may be enough.

[ replying to an old thread ]

I think that this boils down to a lack of planner smarts about target
lists. The planner currently assumes that any given relation - which
for planner purposes might be an actual table or might be the result
of joining multiple tables, aggregating something, running a subquery,
etc. - more or less has one thing that it's supposed to produce. It
only tries to generate plans that produce that target list. There's
some support in there for the idea that there might be various paths
for the same relation that produce different answers, but I don't know
of that actually being used anywhere (but it might be).

What I taught the planner to do here had to do with making the costing
more accurate for cases like this. It now figures out that if it's
going to stick a Gather in at that point, computing the expressions
below the Gather rather than above the Gather makes a difference to
the cost of that plan vs. other plans. However, it still doesn't
consider any more paths than it did before; it just costs them more
accurately. In your first example, I believe that the planner should
be able to consider both GroupAggregate -> Gather Merge -> Sort ->
Parallel Seq Scan and GroupAggregate -> Sort -> Gather -> Parallel Seq
Scan, but I think it's got a fixed idea about which fields should be
fed into the Sort. In particular, I believe it thinks that sorting
more data is so undesirable that it doesn't want to carry any
unnecessary baggage through the Sort for any reason. To solve this
problem, I think it would need to cost the second plan with projection
done both before the Sort and after the Sort and decide which one was
cheaper.

This class of problem is somewhat annoying in that the extra planner
cycles and complexity to deal with getting this right would be useless
for many queries, but at the same time, there are a few cases where it
can win big. I don't know what to do about that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In reply to: Robert Haas (#4)
Re: Unwanted expression simplification in PG12b2

Hi,

On Fri, Sep 20, 2019 at 11:14 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jul 17, 2019 at 5:20 PM Darafei "Komяpa" Praliaskouski
<me@komzpa.net> wrote:

Indeed, it seems I failed to minimize my example.

Here is the actual one, on 90GB table with 16M rows:
https://gist.github.com/Komzpa/8d5b9008ad60f9ccc62423c256e78b4c

I can share the table on request if needed, but hope that plan may be

enough.

What I taught the planner to do here had to do with making the costing
more accurate for cases like this. It now figures out that if it's
going to stick a Gather in at that point, computing the expressions
below the Gather rather than above the Gather makes a difference to
the cost of that plan vs. other plans. However, it still doesn't
consider any more paths than it did before; it just costs them more
accurately. In your first example, I believe that the planner should
be able to consider both GroupAggregate -> Gather Merge -> Sort ->
Parallel Seq Scan and GroupAggregate -> Sort -> Gather -> Parallel Seq
Scan, but I think it's got a fixed idea about which fields should be
fed into the Sort. In particular, I believe it thinks that sorting
more data is so undesirable that it doesn't want to carry any
unnecessary baggage through the Sort for any reason. To solve this
problem, I think it would need to cost the second plan with projection
done both before the Sort and after the Sort and decide which one was
cheaper.

This class of problem is somewhat annoying in that the extra planner
cycles and complexity to deal with getting this right would be useless
for many queries, but at the same time, there are a few cases where it
can win big. I don't know what to do about that.

A heuristic I believe should help my case (and I hardly imagine how it can
break others) is that in presence of Gather, all the function calls that
are parallel safe should be pushed into it.
In a perfect future this query shouldn't even have a subquery that I have
extracted for the sake of OFFSET 0 demo. Probably as a single loop that in
case of presence of a Gather tries to push down all the inner part of the
nested functions call that is Parallel Safe.
If we go as far as starting more workers, it really makes sense to load
them with actual work and not only wait for the master process.

--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa

#6Robert Haas
robertmhaas@gmail.com
In reply to: Darafei "Komяpa" Praliaskouski (#5)
Re: Unwanted expression simplification in PG12b2

On Sun, Sep 22, 2019 at 7:47 AM Darafei "Komяpa" Praliaskouski
<me@komzpa.net> wrote:

A heuristic I believe should help my case (and I hardly imagine how it can break others) is that in presence of Gather, all the function calls that are parallel safe should be pushed into it.

The cost of pushing data through the Sort is not necessarily
insignificant. Your functions are (IIUC) extremely expensive, so it's
worth going to any length to reduce the time spent evaluating them.
However, if someone has ||(text,text) in the tlist, that might be the
wrong approach, because it's not saving much to compute that earlier
and it might make the sort a lot wider, especially if de-TOASTing is
involved.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Finnerty, Jim
jfinnert@amazon.com
In reply to: Robert Haas (#6)
Re: Unwanted expression simplification in PG12b2

If the function was moved to the FROM clause where it would be executed as a lateral cross join instead of a target list expression, how would this affect the cost-based positioning of the Gather?

On 9/23/19, 8:59 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:

On Sun, Sep 22, 2019 at 7:47 AM Darafei "Komяpa" Praliaskouski
<me@komzpa.net> wrote:

A heuristic I believe should help my case (and I hardly imagine how it can break others) is that in presence of Gather, all the function calls that are parallel safe should be pushed into it.

The cost of pushing data through the Sort is not necessarily
insignificant. Your functions are (IIUC) extremely expensive, so it's
worth going to any length to reduce the time spent evaluating them.
However, if someone has ||(text,text) in the tlist, that might be the
wrong approach, because it's not saving much to compute that earlier
and it might make the sort a lot wider, especially if de-TOASTing is
involved.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Robert Haas
robertmhaas@gmail.com
In reply to: Finnerty, Jim (#7)
Re: Unwanted expression simplification in PG12b2

On Mon, Sep 23, 2019 at 9:20 PM Finnerty, Jim <jfinnert@amazon.com> wrote:

If the function was moved to the FROM clause where it would be executed as a lateral cross join instead of a target list expression, how would this affect the cost-based positioning of the Gather?

I think you'd end up turning what is now a Seq Scan into a Nested Loop
with a Seq Scan on one side. I think the computation of the target
list would be done by the Function Scan or Result node on the other
side of the Nested Loop, and couldn't move anywhere else. The planner
would consider putting the Gather either on top of the Nested Loop or
on top of the Seq Scan, and the former would probably win. So I think
this would give the desired behavior, but I haven't thought about it
very hard.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company