Parallel Plans and Cost of non-filter functions

Started by Paul Ramsey about 8 years ago · 8 messages
#1 Paul Ramsey
pramsey@cleverelephant.ca

I'm working on a custom aggregate that generates a serialized data format.
The preparation of the geometry before being formatted is pretty intense,
so it is probably a good thing for that work to be done in parallel, in
partial aggregates. Here's an example SQL call:

EXPLAIN ANALYZE
SELECT length(ST_AsMVT(a)) FROM (
  SELECT ST_AsMVTGeom(p.geom, ::geometry_literal, 4096, 0, true), gid, fed_num
  FROM pts_10 p
  WHERE p.geom && ::geometry_literal
    AND p.geom IS NOT NULL
) a;
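
For context, for the partial-aggregate plan to be possible at all, the
aggregate has to be declared parallel safe, with combine and
serialize/deserialize support. A minimal sketch of the shape such a
definition takes, with hypothetical function names rather than the real
ST_AsMVT internals:

CREATE AGGREGATE my_mvt_agg (anyelement) (
    SFUNC        = my_mvt_transfn,     -- the expensive per-row work
    STYPE        = internal,
    FINALFUNC    = my_mvt_finalfn,     -- emits the serialized format
    COMBINEFUNC  = my_mvt_combinefn,   -- merges partial states from workers
    SERIALFUNC   = my_mvt_serialfn,    -- internal state must cross processes
    DESERIALFUNC = my_mvt_deserialfn,
    PARALLEL     = SAFE
);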

The ST_AsMVTGeom() function can be comically expensive; it really benefits
from running inside partial aggregates. But the cost of the function seems
to be ignored.

(First note that, in order to consistently get parallel plans, I have to
brutally suppress parallel_tuple_cost, as described here:
http://blog.cleverelephant.ca/2017/10/parallel-postgis-2.html)
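
The brute-force suppression amounts to something like this per session
(the exact values are illustrative):

SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = 0;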

Whether I get a parallel aggregate seems entirely determined by the number
of rows, not the cost of preparing those rows.

When changing the number of rows in the subquery with a LIMIT, I can move
from a seq scan to a parallel seq scan and finally to a parallel aggregate
as the number of rows goes up.
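
For reference, the parallel-aggregate end of that progression is a plan
shaped like this (shape only, worker count illustrative; costs and row
counts omitted):

Finalize Aggregate
  -> Gather
       Workers Planned: 3
       -> Partial Aggregate
            -> Parallel Seq Scan on pts_10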

An odd effect: when I have enough rows to get a parallel seq scan, I can
flip it back to a seq scan by *increasing* the cost of ST_AsMVTGeom. That
seems odd and backwards.
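
(By "the cost of ST_AsMVTGeom" I mean its declared function cost, adjusted
along these lines; the number is illustrative and the signature is the
five-argument form used in the query above.)

ALTER FUNCTION ST_AsMVTGeom(geometry, box2d, integer, integer, boolean)
    COST 10000;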

Is there a guide or rough description anywhere of how costs are used in
determining parallel plans? The empirical approach starts to wear one down
after a while :)

P.

#2 Paul Ramsey
pramsey@cleverelephant.ca
In reply to: Paul Ramsey (#1)
Re: Parallel Plans and Cost of non-filter functions

Just clarifying myself a little, since I made a dumb error partway through.

On Thu, Nov 2, 2017 at 10:09 AM, Paul Ramsey <pramsey@cleverelephant.ca>
wrote:

I'm working on a custom aggregate that generates a serialized data
format. The preparation of the geometry before being formatted is pretty
intense, so it is probably a good thing for that work to be done in
parallel, in partial aggregates. Here's an example SQL call:

EXPLAIN ANALYZE
SELECT length(ST_AsMVT(a)) FROM (
  SELECT ST_AsMVTGeom(p.geom, ::geometry_literal, 4096, 0, true), gid, fed_num
  FROM pts_10 p
  WHERE p.geom && ::geometry_literal
    AND p.geom IS NOT NULL
) a;

The ST_AsMVTGeom() function can be comically expensive; it really benefits
from running inside partial aggregates. But the cost of the function seems
to be ignored.

(First note that, in order to consistently get parallel plans, I have to
brutally suppress parallel_tuple_cost, as described here:
http://blog.cleverelephant.ca/2017/10/parallel-postgis-2.html)

Whether I get a parallel aggregate seems entirely determined by the number
of rows, not the cost of preparing those rows.

This is true, as far as I can tell, and unfortunate. Feeding tables with
100Ks of rows, I get parallel plans; feeding 10Ks of rows, I never do, no
matter how costly the work going on within. That's true of changing costs
on the subquery select list, and on the aggregate transfn.

When changing the number of rows in the subquery with a LIMIT, I can move
from a seq scan to a parallel seq scan and finally to a parallel aggregate
as the number of rows goes up.

I see now that as soon as I brought the LIMIT in, the plans had to go
sequential, just due to the nature of a LIMIT in a subquery. Ignore the
below, sorry.
Thanks!
P


An odd effect: when I have enough rows to get a parallel seq scan, I can
flip it back to a seq scan by *increasing* the cost of ST_AsMVTGeom. That
seems odd and backwards.

Is there anywhere a guide or rough description to how costs are used in
determining parallel plans? The empirical approach starts to wear one down
after a while :)

P.

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Paul Ramsey (#2)
Re: Parallel Plans and Cost of non-filter functions

Paul Ramsey <pramsey@cleverelephant.ca> writes:

Whether I get a parallel aggregate seems entirely determined by the number
of rows, not the cost of preparing those rows.

This is true, as far as I can tell, and unfortunate. Feeding tables with
100Ks of rows, I get parallel plans; feeding 10Ks of rows, I never do, no
matter how costly the work going on within. That's true of changing costs
on the subquery select list, and on the aggregate transfn.

This sounds like it might be the same issue being discussed in

/messages/by-id/CAMkU=1ycXNipvhWuweUVpKuyu6SpNjF=yHWu4c4US5JgVGxtZQ@mail.gmail.com

regards, tom lane


#4 Tels
nospam-pg-abuse@bloodgate.com
In reply to: Tom Lane (#3)
Re: Parallel Plans and Cost of non-filter functions

Moin,

On Fri, November 3, 2017 7:13 pm, Tom Lane wrote:

Paul Ramsey <pramsey@cleverelephant.ca> writes:

Whether I get a parallel aggregate seems entirely determined by the number
of rows, not the cost of preparing those rows.

This is true, as far as I can tell, and unfortunate. Feeding tables with
100Ks of rows, I get parallel plans; feeding 10Ks of rows, I never do, no
matter how costly the work going on within. That's true of changing costs
on the subquery select list, and on the aggregate transfn.

This sounds like it might be the same issue being discussed in

/messages/by-id/CAMkU=1ycXNipvhWuweUVpKuyu6SpNjF=yHWu4c4US5JgVGxtZQ@mail.gmail.com

When looking at the web archive, the link is broken, even though in the
mail above it appears correct for me:

/messages/by-id/28621.1509750807@sss.pgh.pa.us

(shortened: http://bit.ly/2zetO5T)

Seems the email obfuscation breaks such links?

Here is a short link for people reading it via the web archive:

http://bit.ly/2hF4lIt

Best regards,

Tels


#5 Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#3)
Re: Parallel Plans and Cost of non-filter functions

On Sat, Nov 4, 2017 at 4:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Paul Ramsey <pramsey@cleverelephant.ca> writes:

Whether I get a parallel aggregate seems entirely determined by the number
of rows, not the cost of preparing those rows.

This is true, as far as I can tell, and unfortunate. Feeding tables with
100Ks of rows, I get parallel plans; feeding 10Ks of rows, I never do, no
matter how costly the work going on within. That's true of changing costs
on the subquery select list, and on the aggregate transfn.

This sounds like it might be the same issue being discussed in

/messages/by-id/CAMkU=1ycXNipvhWuweUVpKuyu6SpNjF=yHWu4c4US5JgVGxtZQ@mail.gmail.com

I have rebased the patch being discussed on that thread.

Paul, you might want to check once with the recent patch [1] posted on
the thread mentioned by Tom.

[1]: /messages/by-id/CAA4eK1+1H5Urm0_Wp-n5XszdLX1YXBqS_zW0f-vvWKwdh3eCJA@mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#6 Paul Ramsey
pramsey@cleverelephant.ca
In reply to: Amit Kapila (#5)
Re: Parallel Plans and Cost of non-filter functions

On Sat, Nov 4, 2017 at 10:02 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Sat, Nov 4, 2017 at 4:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Paul Ramsey <pramsey@cleverelephant.ca> writes:

Whether I get a parallel aggregate seems entirely determined by the number
of rows, not the cost of preparing those rows.

This is true, as far as I can tell, and unfortunate. Feeding tables with
100Ks of rows, I get parallel plans; feeding 10Ks of rows, I never do, no
matter how costly the work going on within. That's true of changing costs
on the subquery select list, and on the aggregate transfn.

This sounds like it might be the same issue being discussed in

/messages/by-id/CAMkU=1ycXNipvhWuweUVpKuyu6SpNjF=yHWu4c4US5JgVGxtZQ@mail.gmail.com

Thanks Tom, Amit; yes, this issue (expensive things in target lists not
affecting plans) seems to be what I'm talking about in this particular
case, and something that shows up a lot in PostGIS use cases: a function on
a target list like ST_Buffer() or ST_Intersection() will be a couple of
orders of magnitude more expensive than anything in the filters.
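
To make that relative expense visible to the planner, PostGIS could declare
costs along these lines; the numbers are illustrative, not what PostGIS
actually ships:

ALTER FUNCTION ST_Buffer(geometry, float8) COST 10000;
ALTER FUNCTION ST_Intersection(geometry, geometry) COST 10000;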

I have rebased the patch being discussed on that thread.

Paul, you might want to check once with the recent patch [1] posted on
the thread mentioned by Tom.

[1] - /messages/by-id/CAA4eK1+1H5Urm0_Wp-n5XszdLX1YXBqS_zW0f-vvWKwdh3eCJA%40mail.gmail.com

Awesome! I will compare and report back.
Thanks much!
P


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#7 Paul Ramsey
pramsey@cleverelephant.ca
In reply to: Amit Kapila (#5)
Re: Parallel Plans and Cost of non-filter functions

From my perspective, this is much, much better. For sufficiently large
tables, I get parallel behaviour without jimmying with the defaults
on parallel_setup_cost and parallel_tuple_cost. *And*, the parallel
behaviour *is* sensitive to the costs of functions in target lists, so
reasonably chosen costs will flip us into a parallel mode for expensive
functions against smaller tables too.
Hopefully some variant of this finds its way into core! Is there any way I
can productively help?
P.

On Sat, Nov 4, 2017 at 10:02 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:


On Sat, Nov 4, 2017 at 4:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Paul Ramsey <pramsey@cleverelephant.ca> writes:

Whether I get a parallel aggregate seems entirely determined by the number
of rows, not the cost of preparing those rows.

This is true, as far as I can tell, and unfortunate. Feeding tables with
100Ks of rows, I get parallel plans; feeding 10Ks of rows, I never do, no
matter how costly the work going on within. That's true of changing costs
on the subquery select list, and on the aggregate transfn.

This sounds like it might be the same issue being discussed in

/messages/by-id/CAMkU=1ycXNipvhWuweUVpKuyu6SpNjF=yHWu4c4US5JgVGxtZQ@mail.gmail.com

I have rebased the patch being discussed on that thread.

Paul, you might want to once check with the recent patch [1] posted on
the thread mentioned by Tom.

[1] - /messages/by-id/CAA4eK1+1H5Urm0_Wp-n5XszdLX1YXBqS_zW0f-vvWKwdh3eCJA%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#8 Amit Kapila
amit.kapila16@gmail.com
In reply to: Paul Ramsey (#7)
Re: Parallel Plans and Cost of non-filter functions

On Mon, Nov 6, 2017 at 7:40 PM, Paul Ramsey <pramsey@cleverelephant.ca> wrote:

From my perspective, this is much, much better. For sufficiently large
tables, I get parallel behaviour without jimmying with the defaults on
parallel_setup_cost and parallel_tuple_cost. *And*, the parallel behaviour
*is* sensitive to the costs of functions in target lists, so reasonably
chosen costs will flip us into a parallel mode for expensive functions
against smaller tables too.

Thanks for the confirmation.

Hopefully some variant of this finds its way into core! Is there any way I
can productively help?

You have already helped a lot by providing the use case, but feel free
to ping on that thread if you find it is not moving.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
