BUG #14646: performance hint to remove

Started by Boshomi Phenixalmost 9 years ago3 messagesbugs

boshomi@gmail.com

almost 9 years ago

The following bug has been logged on the website:

Bug reference: 14646
Logged by: Boshomi
Email address: boshomi@gmail.com
PostgreSQL version: 9.6.2
Operating system: Linux opensuse
Description:

the order of tables in from clause or where clause can result in different
query plans, with different execution time.

Begin;
--DROP TABLE testjoinperf.tempsmall CASCADE;
--DROP TABLE testjoinperf.tempbig;
/*
CREATE SCHEMA testjoinperf;
CREATE TABLE testjoinperf.tempsmall
(
idsmall serial NOT NULL,
smallrnd double precision,
CONSTRAINT pk_tempsamll PRIMARY KEY (idsmall)
);

CREATE TABLE testjoinperf.tempbig
(
idbig serial NOT NULL,
idsmall integer,
bigrnd double precision,
CONSTRAINT pk_tempbig PRIMARY KEY (idbig),
CONSTRAINT fk_tempsmall FOREIGN KEY (idsmall)
REFERENCES testjoinperf.tempsmall (idsmall) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
);

insert into testjoinperf.tempsmall (idsmall, smallrnd)
select a.a, random() from generate_series(1,50000) as a;

insert into testjoinperf.tempbig (idbig, idsmall, bigrnd)
select b.b, (b.b % 50000)+1 , random() from generate_series(1,1000000) as
b;
analyze testjoinperf.tempsmall;
analyze testjoinperf.tempbig;
analyze;
*/
/*

--fast query big table left
explain analyze select * from testjoinperf.tempbig join
testjoinperf.tempsmall using(idsmall)
where idsmall between 48000 and 49000
limit 100;
*/

-- slow query, small table left.
explain analyze select * from testjoinperf.tempsmall join
testjoinperf.tempbig using(idsmall)
where idsmall between 48000 and 49000
limit 100;
*/
rollback;

/*
faster query
QUERY PLAN
Limit (cost=0.29..137.92 rows=100 width=24) (actual time=6.589..6.896
rows=100 loops=1)
-> Nested Loop (cost=0.29..28656.67 rows=20821 width=24) (actual
time=6.588..6.880 rows=100 loops=1)
-> Seq Scan on tempbig (cost=0.00..20406.00 rows=20821 width=16)
(actual time=6.576..6.609 rows=100 loops=1)
Filter: ((idsmall >= 48000) AND (idsmall <= 49000))
Rows Removed by Filter: 47998
-> Index Scan using pk_tempsamll on tempsmall (cost=0.29..0.39
rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=100)
Index Cond: (idsmall = tempbig.idsmall)
Planning time: 0.272 ms
Execution time: 6.935 ms

slower query
QUERY PLAN
Limit (cost=52.46..153.33 rows=100 width=24) (actual time=13.063..13.123
rows=100 loops=1)
-> Hash Join (cost=52.46..19399.50 rows=19180 width=24) (actual
time=13.062..13.108 rows=100 loops=1)
Hash Cond: (tempbig.idsmall = tempsmall.idsmall)
-> Seq Scan on tempbig (cost=0.00..15406.00 rows=1000000 width=16)
(actual time=0.011..5.361 rows=48098 loops=1)
-> Hash (cost=40.47..40.47 rows=959 width=12) (actual
time=0.406..0.406 rows=1001 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 55kB
-> Index Scan using pk_tempsamll on tempsmall
(cost=0.29..40.47 rows=959 width=12) (actual time=0.019..0.239 rows=1001
loops=1)
Index Cond: ((idsmall >= 48000) AND (idsmall <=
49000))
Planning time: 0.277 ms
Execution time: 13.166 ms

the query are eqal.

it also makes a differnence if the where clause is
where tempbig.idsmall between 48000 and 49000 /* fast */

where tempsmall.idsmall between 48000 and 49000 /* slow */

to change the order of tables is a performance hint, and should be
removed.

(Boshomi is my username on german wikipedia (de.wikipedia.org), feel free to
contact me for any question.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

David G. Johnston

david.g.johnston@gmail.com

almost 9 years ago

In reply to: Boshomi Phenix (#1)

Re: BUG #14646: performance hint to remove

On Wed, May 10, 2017 at 10:37 AM, <boshomi@gmail.com> wrote:

The following bug has been logged on the website:

Bug reference: 14646
Logged by: Boshomi
Email address: boshomi@gmail.com
PostgreSQL version: 9.6.2
Operating system: Linux opensuse
Description:

the order of tables in from clause or where clause can result in different
query plans, with different execution time.

[...]

--fast query big table left
explain analyze select * from testjoinperf.tempbig join
testjoinperf.tempsmall using(idsmall)
where idsmall between 48000 and 49000

-- slow query, small table left.
explain analyze select * from testjoinperf.tempsmall join
testjoinperf.tempbig using(idsmall)
where idsmall between 48000 and 49000

Confirmed 9.6.2 on Ubuntu. Not sure this is properly classified as a bug
but its definitely an area where improvement would seem desirable. I am a
particularly heavy user of the join predicate "USING" and never really
thought to look at this dynamic (without the WHERE clause it doesn't
matter, both tables up end sequentially scanned).

Someone more knowledgeable than myself will need to comment on the
technical aspects as to why "where idsmall" seems to be linked to the left
relation instead of the one with a more favorable execution plan.

Boshomi, how did you stumble across this anyway - just with artificial data
or did you come up with that after hitting the problem with real data?

David J.

Import Notes

Resolved by subject fallback

Tom Lane

tgl@sss.pgh.pa.us

almost 9 years ago

In reply to: David G. Johnston (#2)

Re: BUG #14646: performance hint to remove

"David G. Johnston" <david.g.johnston@gmail.com> writes:

Confirmed 9.6.2 on Ubuntu. Not sure this is properly classified as a bug
but its definitely an area where improvement would seem desirable. I am a
particularly heavy user of the join predicate "USING" and never really
thought to look at this dynamic (without the WHERE clause it doesn't
matter, both tables up end sequentially scanned).

Yeah, if you use USING the parser tries to define the merged join variable
as equivalent to just one or the other input variable. With an inner join
it'll arbitrarily pick the left input, with a left or right join it will
use the outer-side variable (since the inner-side one might go to NULL).
With a FULL join, the merged variable is actually defined as
COALESCE(left-input, right-input), which is necessary for semantic
correctness but is pretty awful for optimization purposes.

If you put an equality constraint on the merged join variable, everything
works pretty well anyway because of deduction of implied equalities (ie,
we have left-input = right-input from the USING's join clause, and then
if we also know e.g. left-input = 42 we can deduce right-input = 42).
There are some hacks that make that carry through for outer join cases
as well.

The OP's WHERE clause isn't a simple equality so the equivalence-class
machinery doesn't help there. There's been some speculation about
improving that, but there are a bunch of semantic and practical problems
in the way. One of the biggest is that we can't simply duplicate the
restriction for each input variable, because that would cause the planner
to misestimate their selectivity and produce bad plans of a different
sort (not knowing that the restrictions are redundant). The eclass
machinery knows enough to compensate for that, but it's not clear how
to do it for arbitrary restrictions.

Anyway, this isn't a bug but a missed optimization opportunity, and
I would counsel not holding your breath waiting for an improvement.
It's a difficult problem and it's not tremendously high priority.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs