tricking EXPLAIN?

Started by Wim Bertelsover 14 years ago4 messagesgeneral
Jump to latest
#1Wim Bertels
wim.bertels@khleuven.be

Hallo,

if u compare the 2 queries, then they should be equivalent:

-- normal
-- EXPLAIN ANALYZE
SELECT amproc, amprocnum - average AS difference
FROM pg_amproc,
(SELECT avg(amprocnum) AS average
FROM pg_amproc) AS tmp;

-- trying to trick explain with a redundant join
-- EXPLAIN ANALYZE
SELECT amproc, amprocnum - average AS difference
FROM pg_amproc INNER JOIN
(SELECT avg(amprocnum) AS average
FROM pg_amproc) AS tmp
ON pg_amproc.amproc = pg_amproc.amproc;

If we look at the output of EXPLAIN ANALYZE,
then according to the COST the second query is best one,
but according to the ACTUAL TIME the first query is best
(which seems logical intuitively).

So explain is being tricked,
and the reason for this seems the number of rows in de nested loop,
which are reduced to 1 for explain because of the join.
http://www.postgresql.org/docs/8.4/static/using-explain.html

Suggestions, comments are always welcome.

mvg,
Wim Bertels

#2Szymon Guz
mabewlun@gmail.com
In reply to: Wim Bertels (#1)
Re: tricking EXPLAIN?

On 28 November 2011 12:55, Wim Bertels <wim.bertels@khleuven.be> wrote:

Hallo,

if u compare the 2 queries, then they should be equivalent:

-- normal
-- EXPLAIN ANALYZE
SELECT amproc, amprocnum - average AS difference
FROM pg_amproc,
(SELECT avg(amprocnum) AS average
FROM pg_amproc) AS tmp;

-- trying to trick explain with a redundant join
-- EXPLAIN ANALYZE
SELECT amproc, amprocnum - average AS difference
FROM pg_amproc INNER JOIN
(SELECT avg(amprocnum) AS average
FROM pg_amproc) AS tmp
ON pg_amproc.amproc = pg_amproc.amproc;

If we look at the output of EXPLAIN ANALYZE,
then according to the COST the second query is best one,
but according to the ACTUAL TIME the first query is best
(which seems logical intuitively).

So explain is being tricked,
and the reason for this seems the number of rows in de nested loop,
which are reduced to 1 for explain because of the join.
http://www.postgresql.org/docs/8.4/static/using-explain.html

Suggestions, comments are always welcome.

mvg,
Wim Bertels

Hi,
could you show us the output of explain analyze?

regards
Szymon

#3Wim Bertels
wim.bertels@khleuven.be
In reply to: Szymon Guz (#2)
Re: tricking EXPLAIN?

On ma, 2011-11-28 at 13:00 +0100, Szymon Guz wrote:

On 28 November 2011 12:55, Wim Bertels <wim.bertels@khleuven.be>
wrote:
Hallo,

if u compare the 2 queries, then they should be equivalent:

-- normal
-- EXPLAIN ANALYZE
SELECT amproc, amprocnum - average AS difference
FROM pg_amproc,
(SELECT avg(amprocnum) AS average
FROM pg_amproc) AS tmp;

"Nested Loop (cost=5.04..13.13 rows=243 width=38) (actual
time=0.333..0.953 rows=243 loops=1)"
" -> Aggregate (cost=5.04..5.05 rows=1 width=2) (actual
time=0.326..0.327 rows=1 loops=1)"
" -> Seq Scan on pg_amproc (cost=0.00..4.43 rows=243 width=2)
(actual time=0.003..0.157 rows=243 loops=1)"
" -> Seq Scan on pg_amproc (cost=0.00..4.43 rows=243 width=6) (actual
time=0.002..0.147 rows=243 loops=1)"
"Total runtime: 1.117 ms"

-- trying to trick explain with a redundant join
-- EXPLAIN ANALYZE
SELECT amproc, amprocnum - average AS difference
FROM pg_amproc INNER JOIN
(SELECT avg(amprocnum) AS average
FROM pg_amproc) AS tmp
ON pg_amproc.amproc = pg_amproc.amproc;

"Nested Loop (cost=5.04..10.11 rows=1 width=38) (actual
time=0.376..80.891 rows=243 loops=1)"
" -> Seq Scan on pg_amproc (cost=0.00..5.04 rows=1 width=6) (actual
time=0.028..0.249 rows=243 loops=1)"
" Filter: ((amproc)::oid = (amproc)::oid)"
" -> Aggregate (cost=5.04..5.05 rows=1 width=2) (actual
time=0.327..0.328 rows=1 loops=243)"
" -> Seq Scan on pg_amproc (cost=0.00..4.43 rows=243 width=2)
(actual time=0.002..0.156 rows=243 loops=243)"
"Total runtime: 81.101 ms"

If we look at the output of EXPLAIN ANALYZE,
then according to the COST the second query is best one,
but according to the ACTUAL TIME the first query is best
(which seems logical intuitively).

So explain is being tricked,
and the reason for this seems the number of rows in de nested
loop,
which are reduced to 1 for explain because of the join.
http://www.postgresql.org/docs/8.4/static/using-explain.html

Suggestions, comments are always welcome.

mvg,
Wim Bertels

Hi,
could you show us the output of explain analyze?

cf supra,
Wim

#4Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: Wim Bertels (#1)
Re: tricking EXPLAIN?

(2011/11/28 20:55), Wim Bertels wrote:

If we look at the output of EXPLAIN ANALYZE,
then according to the COST the second query is best one,
but according to the ACTUAL TIME the first query is best
(which seems logical intuitively).

So explain is being tricked,
and the reason for this seems the number of rows in de nested loop,
which are reduced to 1 for explain because of the join.
http://www.postgresql.org/docs/8.4/static/using-explain.html

Suggestions, comments are always welcome.

Interesting. I tried a modified version of second query, and got same
EXPLAIN output as first query.

SELECT amproc, amprocnum - average AS difference
FROM pg_amproc INNER JOIN
(SELECT avg(amprocnum) AS average
FROM pg_amproc) AS tmp
ON true; -- semantically same as "amproc = amproc"

So, I think that the point of this issue is somehow PG thinks wrongly
that "amporc = amproc" filters the result to just one row, though such
condition never reduces result. I also tried simplified query, and got
another result which shows that PG estimates that same condition reduces
to half.

postgres=# EXPLAIN ANALYZE SELECT * FROM pg_amproc WHERE (amproc = amproc);
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Seq Scan on pg_amproc (cost=0.00..67.52 rows=126 width=18) (actual
time=0.039..1.356 rows=252 loops=1)
Filter: (amproc = amproc)
Total runtime: 1.445 ms
(3 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM pg_amproc WHERE (true);
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Seq Scan on pg_amproc (cost=0.00..4.52 rows=252 width=18) (actual
time=0.008..0.045 rows=252 loops=1)
Total runtime: 0.089 ms
(2 rows)

IMHO planner should be modified so that it can estimate result rows
accurately in this case.

--
Shigeru Hanada