Inaccurate Explain Cost
Hey Everyone,
I seem to be getting an inaccurate cost from explain. Here are two examples for one query with two different query plans:
exchange_prod=# set enable_nestloop = on;
SET
exchange_prod=#
exchange_prod=# explain analyze SELECT COUNT(DISTINCT "exchange_uploads"."id") FROM "exchange_uploads" INNER JOIN "upload_destinations" ON "upload_destinations"."id" = "exchange_uploads"."upload_destination_id" LEFT OUTER JOIN "uploads" ON "uploads"."id" = "exchange_uploads"."upload_id" LEFT OUTER JOIN "import_errors" ON "import_errors"."exchange_upload_id" = "exchange_uploads"."id" LEFT OUTER JOIN "exchanges" ON "exchanges"."id" = "upload_destinations"."exchange_id" WHERE (("exchange_uploads"."created_at" >= '2012-07-27 21:21:57.363944' AND "upload_destinations"."office_id" = 6));
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=190169.54..190169.55 rows=1 width=4) (actual time=199806.806..199806.807 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.00..190162.49 rows=2817 width=4) (actual time=163.293..199753.548 rows=43904 loops=1)
-> Nested Loop (cost=0.00..151986.53 rows=2817 width=4) (actual time=163.275..186869.844 rows=43904 loops=1)
-> Index Scan using upload_destinations_office_id_idx on upload_destinations (cost=0.00..29.95 rows=4 width=8) (actual time=0.060..0.093 rows=6 loops=1)
Index Cond: (office_id = 6)
-> Index Scan using index_exchange_uploads_on_upload_destination_id on exchange_uploads (cost=0.00..37978.21 rows=875 width=12) (actual time=27.197..31140.375 rows=7317 loops=6)
Index Cond: (upload_destination_id = upload_destinations.id)
Filter: (created_at >= '2012-07-27 21:21:57.363944'::timestamp without time zone)
-> Index Scan using index_import_errors_on_exchange_upload_id on import_errors (cost=0.00..8.49 rows=405 width=4) (actual time=0.291..0.291 rows=0 loops=43904)
Index Cond: (exchange_upload_id = exchange_uploads.id)
Total runtime: 199806.951 ms
(11 rows)
exchange_prod=#
exchange_prod=# set enable_nestloop = off;
SET
exchange_prod=#
exchange_prod=# explain analyze SELECT COUNT(DISTINCT "exchange_uploads"."id") FROM "exchange_uploads" INNER JOIN "upload_destinations" ON "upload_destinations"."id" = "exchange_uploads"."upload_destination_id" LEFT OUTER JOIN "uploads" ON "uploads"."id" = "exchange_uploads"."upload_id" LEFT OUTER JOIN "import_errors" ON "import_errors"."exchange_upload_id" = "exchange_uploads"."id" LEFT OUTER JOIN "exchanges" ON "exchanges"."id" = "upload_destinations"."exchange_id" WHERE (("exchange_uploads"."created_at" >= '2012-07-27 21:21:57.363944' AND "upload_destinations"."office_id" = 6));
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2535992.33..2535992.34 rows=1 width=4) (actual time=133447.507..133447.507 rows=1 loops=1)
-> Hash Right Join (cost=1816553.69..2535985.56 rows=2708 width=4) (actual time=133405.326..133417.078 rows=43906 loops=1)
Hash Cond: (import_errors.exchange_upload_id = exchange_uploads.id)
-> Seq Scan on import_errors (cost=0.00..710802.71 rows=2300471 width=4) (actual time=0.006..19199.569 rows=2321888 loops=1)
-> Hash (cost=1816519.84..1816519.84 rows=2708 width=4) (actual time=112938.606..112938.606 rows=43906 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1544kB
-> Hash Join (cost=28.25..1816519.84 rows=2708 width=4) (actual time=42.957..112892.689 rows=43906 loops=1)
Hash Cond: (exchange_uploads.upload_destination_id = upload_destinations.id)
-> Index Scan using index_upload_destinations_on_created_at on exchange_uploads (cost=0.00..1804094.96 rows=3298545 width=12) (actual time=17.686..111649.272 rows=3303488 loops=1)
Index Cond: (created_at >= '2012-07-27 21:21:57.363944'::timestamp without time zone)
-> Hash (cost=28.20..28.20 rows=4 width=8) (actual time=0.043..0.043 rows=6 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Bitmap Heap Scan on upload_destinations (cost=6.28..28.20 rows=4 width=8) (actual time=0.026..0.036 rows=6 loops=1)
Recheck Cond: (office_id = 6)
-> Bitmap Index Scan on upload_destinations_office_id_idx (cost=0.00..6.28 rows=4 width=0) (actual time=0.020..0.020 rows=6 loops=1)
Index Cond: (office_id = 6)
Total runtime: 133447.790 ms
(17 rows)
The first query shows a cost of 190,169.55 and runs in 199,806.951 ms. When I disable nested loops, I get a cost of 2,535,992.34, which runs in only 133,447.790 ms. We have run queries on our database with a cost of 200K before, and they ran in less than a few seconds, which makes me wonder if the first query plan is inaccurate. The other issue is understanding why a query plan with a much higher cost takes less time to run.
I do not think these queries are cached differently, as we have gotten the same results when running them a couple of times across a few days. We also analyzed the tables we are querying before trying the explain analyze again, and were met with the same statistics. Any help on how Postgres comes up with a query plan like this, and why there is such a difference, would be very helpful.
Thanks!
--
Robert Sosinski
On 09/26/2012 01:38 PM, Robert Sosinski wrote:
I seem to be getting an inaccurate cost from explain. Here are two
examples for one query with two different query plans:
Well, there's this:
Nested Loop (cost=0.00..151986.53 rows=2817 width=4) (actual
time=163.275..186869.844 rows=43904 loops=1)
If anything's a smoking gun, that is. I could see why you'd want to turn
off nested loops to get better execution time. But the question is: why
did it think it would match so few rows in the first place? The planner
probably would have thrown away this query plan had it known it would
loop 20x more than it thought.
I think we need to know what your default_statistics_target is set at,
and really... all of your relevant postgresql settings.
Please see this:
http://wiki.postgresql.org/wiki/Slow_Query_Questions
But you also may need to look a lot more into your query itself. The
difference between a 2 or a 3 minute query isn't going to help you
much. Over here, we tend to spend more of our time turning 2 or 3 minute
queries into 20 or 30ms queries. But judging by your date range, getting
the last 2 months of data from a table that large generally won't be
fast by any means.
That said, looking at your actual query:
SELECT COUNT(DISTINCT eu.id)
FROM exchange_uploads eu
JOIN upload_destinations ud ON ud.id = eu.upload_destination_id
LEFT JOIN uploads u ON u.id = eu.upload_id
LEFT JOIN import_errors ie ON ie.exchange_upload_id = eu.id
LEFT JOIN exchanges e ON e.id = ud.exchange_id
WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
AND ud.office_id = 6;
Doesn't need half of these joins. They're left joins, and never used in
the query results or where criteria. You could just use this:
SELECT COUNT(DISTINCT eu.id)
FROM exchange_uploads eu
JOIN upload_destinations ud ON (ud.id = eu.upload_destination_id)
WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
AND ud.office_id = 6;
Though I presume this is just a count precursor to a query that fetches
the actual results and does need the left join. Either way, the index
scan from your second example matches 3.3M rows by using the created_at
index on exchange_uploads. That's not really very restrictive, and so
you have two problems:
1. Your nested loop stats from office_id are somehow wrong. Try
increasing your stats on that column, or just default_statistics_target
in general, and re-analyze.
2. Your created_at criteria above match way too many rows, and will also
take a long time to process.
Those are your two actual problems. We can probably get your query to
run faster, but those are pretty significant hurdles.
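A minimal sketch of that first suggestion, using the column from the plans above (the target value of 1000 is purely illustrative; pick something appropriate after testing):

```sql
-- Raise the per-column statistics target so ANALYZE samples more
-- distinct values and frequencies for office_id, then rebuild stats.
ALTER TABLE upload_destinations ALTER COLUMN office_id SET STATISTICS 1000;
ANALYZE upload_destinations;

-- Or raise the default for all columns (in postgresql.conf, or per
-- session as below) and re-analyze everything:
SET default_statistics_target = 500;  -- illustrative value; 9.x default is 100
ANALYZE;
```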
--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
______________________________________________
See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
On 26/09/2012 17:03, Shaun Thomas wrote:
On 09/26/2012 01:38 PM, Robert Sosinski wrote:
I seem to be getting an inaccurate cost from explain. Here are two
examples for one query with two different query plans:

That said, looking at your actual query:

SELECT COUNT(DISTINCT eu.id)
FROM exchange_uploads eu
JOIN upload_destinations ud ON ud.id = eu.upload_destination_id
LEFT JOIN uploads u ON u.id = eu.upload_id
LEFT JOIN import_errors ie ON ie.exchange_upload_id = eu.id
LEFT JOIN exchanges e ON e.id = ud.exchange_id
WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
AND ud.office_id = 6;

Doesn't need half of these joins. They're left joins, and never used
in the query results or where criteria. You could just use this:

SELECT COUNT(DISTINCT eu.id)
FROM exchange_uploads eu
JOIN upload_destinations ud ON (ud.id = eu.upload_destination_id)
WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
AND ud.office_id = 6;
Interesting. I have a similar situation, where the user can choose a set of
filters, and then the query must have several left joins "just in case"
(depending on which filters the user needs).
I know of other databases that are able to remove unnecessary outer joins
from queries when they are not relevant, and thereby run faster.
Can't PostgreSQL do the same?
Regards,
Edson.
On Wed, Sep 26, 2012 at 02:38:09PM -0400, Robert Sosinski wrote:
The first query shows a cost of 190,169.55 and runs in 199,806.951 ms.
When I disable nested loop, I get a cost of 2,535,992.34 which runs in
only 133,447.790 ms. We have run queries on our database with a cost
of 200K before and they ran in less than a few seconds, which makes
me wonder if the first query plan is inaccurate. The other issue is
understanding why a query plan with a much higher cost is taking less
time to run.
Are you under the impression that cost should be somehow related to actual
time?
If yes - that's not true, and afaik never was.
The fact that you got similar time and cost is just a coincidence.
Best regards,
depesz
--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
On Wed, Sep 26, 2012 at 1:21 PM, hubert depesz lubaczewski <
depesz@depesz.com> wrote:
On Wed, Sep 26, 2012 at 02:38:09PM -0400, Robert Sosinski wrote:
The first query shows a cost of 190,169.55 and runs in 199,806.951 ms.
When I disable nested loop, I get a cost of 2,535,992.34 which runs in
only 133,447.790 ms. We have run queries on our database with a cost
of 200K before and they ran in less than a few seconds, which makes
me wonder if the first query plan is inaccurate. The other issue is
understanding why a query plan with a much higher cost is taking less
time to run.

Are you under the impression that cost should be somehow related to actual
time?
If yes - that's not true, and afaik never was.
The fact that you got similar time and cost is just a coincidence.
Well...only sort of. In a well-tuned db with accurate statistics, relative
cost between 2 plans should be reflected in relative execution time between
those 2 queries (assuming the data in memory is similar for both runs,
anyway), and that's what he seems to be complaining about. The plan with
higher cost had lower execution time, which resulted in the planner picking
the slower query. But the reason for the execution time discrepancy would
appear to be, at least in part, inaccurate statistics resulting in an
incorrect estimate of number of rows in a loop iteration. More info about
the db config would help to identify other things contributing to the
inaccurate cost estimate - as mentioned earlier, please refer to
http://wiki.postgresql.org/wiki/Slow_Query_Questions when asking
performance questions.
And yes, I know you know all of this, Hubert. I wrote it for the benefit
of the original questioner.
--sam
Edson Richter <edsonrichter@hotmail.com> writes:
That said, looking at your actual query:

SELECT COUNT(DISTINCT eu.id)
FROM exchange_uploads eu
JOIN upload_destinations ud ON ud.id = eu.upload_destination_id
LEFT JOIN uploads u ON u.id = eu.upload_id
LEFT JOIN import_errors ie ON ie.exchange_upload_id = eu.id
LEFT JOIN exchanges e ON e.id = ud.exchange_id
WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
AND ud.office_id = 6;

Doesn't need half of these joins. They're left joins, and never used
in the query results or where criteria. You could just use this:
Interesting. I have a similar situation, where the user can choose a set of
filters, and then the query must have several left joins "just in case"
(depending on which filters the user needs).
I know of other databases that are able to remove unnecessary outer joins
from queries when they are not relevant, and thereby run faster.
Can't PostgreSQL do the same?
It does, and did - note the query plan is only scanning 3 of the 5
tables mentioned in the query. (The other left join appears to be
to a non-unique column, which makes it not redundant.)
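That distinction can be sketched with hypothetical tables (the names a, b, and c are made up; this shows the join-removal behavior PostgreSQL has had since 9.0):

```sql
CREATE TABLE a (id int PRIMARY KEY, b_id int);
CREATE TABLE b (id int PRIMARY KEY);   -- unique join column
CREATE TABLE c (a_id int);             -- non-unique join column

-- b is never referenced in the output or WHERE clause, and b.id is
-- unique, so the left join cannot change the result and the planner
-- should drop it; the plan should scan only "a":
EXPLAIN SELECT a.id FROM a LEFT JOIN b ON b.id = a.b_id;

-- c.a_id is not unique, so a row of "a" could match several rows of
-- "c" and the join can change the row count; it cannot be removed,
-- and the plan should scan both tables:
EXPLAIN SELECT a.id FROM a LEFT JOIN c ON c.a_id = a.id;
```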
regards, tom lane
On Wed, Sep 26, 2012 at 1:21 PM, hubert depesz lubaczewski
<depesz@depesz.com> wrote:
On Wed, Sep 26, 2012 at 02:38:09PM -0400, Robert Sosinski wrote:
The first query shows a cost of 190,169.55 and runs in 199,806.951 ms.
When I disable nested loop, I get a cost of 2,535,992.34 which runs in
only 133,447.790 ms. We have run queries on our database with a cost
of 200K before and they ran in less than a few seconds, which makes
me wonder if the first query plan is inaccurate. The other issue is
understanding why a query plan with a much higher cost is taking less
time to run.

Are you under the impression that cost should be somehow related to actual
time?
I am certainly under that impression. If the estimated cost has
nothing to do with run time, then what is it that the cost-based
optimizer is trying to optimize?
The arbitrary numbers of the cost parameters do not formally have any
units, but they had better have some vaguely proportional relationship
with the dimension of time, or else there is no point in having an
optimizer. For any given piece of hardware (including table-space, if
you have different table-spaces on different storage), configuration
and cachedness, there should be some constant factor to translate cost
into time. To the extent that there fails to be such a constant
factor, it is either a misconfiguration, or a room for improvement in
the planner.
The only exceptions I can think of are 1) when there is only one
way to do something, the planner may not bother to cost it (i.e.
assign it a cost of zero) because it will not help make a decision.
However, the only instances of this that I know of are in DML, not in
pure selects, and 2) the costs of setting hint bits and such in
selects is not estimated, except to the extent they are folded into
something else, like the page visiting costs.
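For what it's worth, that "single constant factor" test can be applied to the two plans quoted at the top of this thread; a quick back-of-the-envelope check (plain arithmetic on the totals reported above) shows the implied cost-per-millisecond differs by roughly 20x between the plans, which lines up with the large row underestimate in the nested-loop plan:

```python
# Cost and runtime totals from the two EXPLAIN ANALYZE outputs above.
plans = {
    "nested loop": {"cost": 190_169.55, "ms": 199_806.951},
    "hash join":   {"cost": 2_535_992.34, "ms": 133_447.790},
}

# If cost tracked time with a single constant factor, these ratios
# would be roughly equal across plans.
ratios = {name: p["cost"] / p["ms"] for name, p in plans.items()}
for name, r in ratios.items():
    print(f"{name}: {r:.2f} cost units per ms")

# Here they differ by about 20x, consistent with the row misestimate
# (2817 estimated vs 43904 actual) in the nested-loop plan.
print(f"ratio of ratios: {ratios['hash join'] / ratios['nested loop']:.1f}")
```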
Cheers,
Jeff
On 27.09.2012 at 02:04, Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Sep 26, 2012 at 1:21 PM, hubert depesz lubaczewski
<depesz@depesz.com> wrote:
Are you under the impression that cost should be somehow related to actual
time?

I am certainly under that impression. If the estimated cost has
nothing to do with run time, then what is it that the cost-based
optimizer is trying to optimize?
See http://www.postgresql.org/docs/9.2/static/runtime-config-query.html
section "18.7.2. Planner Cost Constants".
-Matthias
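Those constants are what give cost numbers their meaning; a sketch of inspecting and adjusting them (the names come from that documentation section; the value 2.0 below is purely illustrative, not a recommendation for this database):

```sql
-- Cost estimates are expressed in multiples of seq_page_cost (1.0 by
-- convention); the planner has no absolute notion of milliseconds.
SELECT name, setting
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost',
               'cpu_tuple_cost', 'cpu_index_tuple_cost',
               'cpu_operator_cost', 'effective_cache_size');

-- random_page_cost defaults to 4.0; on a system where the working set
-- is mostly cached, lowering it can make index-scan costs track actual
-- run time more closely:
SET random_page_cost = 2.0;
```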