BUG #8130: Hashjoin still gives issues
The following bug has been logged on the website:
Bug reference: 8130
Logged by: Stefan de Konink
Email address: stefan@konink.de
PostgreSQL version: 9.2.4
Operating system: Linux
Description:
We figured out that two very close query give a massive difference
performance between using select * vs select id.
SELECT *
FROM ambit_privateevent_calendars AS a
,ambit_privateevent AS b
,ambit_calendarsubscription AS c
,ambit_calendar AS d
WHERE c.calendar_id = d.id
AND a.privateevent_id = b.id
AND c.user_id = 1270
AND c.calendar_id = a.calendar_id
AND c.STATUS IN (
1
,8
,2
,15
,18
,4
,12
,20
)
AND NOT b.main_recurrence = true;
With some help on IRC we figured out that "there was a bugfix in hash
estimation recently and I was hoping you were older than that", but since we
are not:
PostgreSQL 9.2.4 on x86_64-pc-linux-gnu, compiled by x86_64-pc-linux-gnu-gcc
(Gentoo 4.7.2-r1 p1.6, pie-0.5.5) 4.7.2, 64-bit
...there might still be a bug around.
We compare:
http://explain.depesz.com/s/jRx
http://explain.depesz.com/s/eKE
By setting "set enable_hashjoin = off;" performance in our entire
application increased 30 fold in throughput, which was a bit unexpected but
highly appreciated. The result of the last query:
http://explain.depesz.com/s/AWB
What can we do to provide a bit more of information?
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
stefan@konink.de writes:
By setting "set enable_hashjoin = off;" performance in our entire
application increased 30 fold in throughput, which was a bit unexpected but
highly appreciated. The result of the last query:
At least in this example, the query appears to be fully cached and so
you would need a random_page_cost near 1 to reflect the system's
behavior properly. If your DB fits mostly in RAM, adjusting the cost
parameters is a much better idea than fooling with the enable_
parameters.
What can we do to provide a bit more of information?
https://wiki.postgresql.org/wiki/Slow_Query_Questions
There is no particularly good reason to think this is a bug; please
take it up on pgsql-performance if you have more questions.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Dear Tom,
On Wed, 1 May 2013, Tom Lane wrote:
What can we do to provide a bit more of information?
https://wiki.postgresql.org/wiki/Slow_Query_Questions
There is no particularly good reason to think this is a bug; please
take it up on pgsql-performance if you have more questions.
I beg to disagree, the performance of a select * query and the select b.id
query are both "hot". The result in a fundamentally different query plan
(and performance). Combined with the recent bugfix regarding hash
estimation, it gives me a good indication that there might be a bug.
I am not deep into the query optimiser of PostgreSQL but given the above
same were different selections can change an entire query plan (and * is
in fact out of the box 30 times faster than b.id) it does. When hash is
disabled the entire query is -depending on the system checked- 2 to
30x faster.
The original query:
select * from ambit_privateevent_calendars as a, ambit_privateevent as b,
ambit_calendarsubscription as c, ambit_calendar as d where c.calendar_id =
d.id and a.privateevent_id = b.id and c.user_id = 1270 and c.calendar_id
= a.calendar_id and c.STATUS IN (1, 8, 2, 15, 18, 4, 12, 20) and not
b.main_recurrence = true;
select b.id from ambit_privateevent_calendars as a, ambit_privateevent as
b, ambit_calendarsubscription as c, ambit_calendar as d where c.calendar_id =
d.id and a.privateevent_id = b.id and c.user_id = 1270 and c.calendar_id
= a.calendar_id and c.STATUS IN (1, 8, 2, 15, 18, 4, 12, 20) and not
b.main_recurrence = true;
(select * => select b.id, the star query is *fastest*)
We compare:
http://explain.depesz.com/s/jRx
http://explain.depesz.com/s/eKE
By setting "set enable_hashjoin = off;" performance in our entire
application increased 30 fold in throughput, which was a bit unexpected
but highly appreciated. The result of the last query switch the mergejoin:
http://explain.depesz.com/s/AWB
It is also visible that after hashjoin is off, the b.id query is faster
than the * query (what would be expected).
Our test machine is overbudgetted, 4x the memory of the entire database
~4GB, and uses the PostgreSQL stock settings.
Stefan
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
-----Original Message-----
The original query:
select * from ambit_privateevent_calendars as a, ambit_privateevent as
b, ambit_calendarsubscription as c, ambit_calendar as d where
c.calendar_id = d.id and a.privateevent_id = b.id and c.user_id = 1270
and c.calendar_id = a.calendar_id and c.STATUS IN (1, 8, 2, 15, 18, 4,
12, 20) and not b.main_recurrence = true;select b.id from ambit_privateevent_calendars as a, ambit_privateevent
as b, ambit_calendarsubscription as c, ambit_calendar as d where
c.calendar_id = d.id and a.privateevent_id = b.id and c.user_id = 1270
and c.calendar_id = a.calendar_id and c.STATUS IN (1, 8, 2, 15, 18, 4,
12, 20) and not b.main_recurrence = true;(select * => select b.id, the star query is *fastest*)
We compare:
http://explain.depesz.com/s/jRx
http://explain.depesz.com/s/eKEBy setting "set enable_hashjoin = off;" performance in our entire
application increased 30 fold in throughput, which was a bit unexpected
but highly appreciated. The result of the last query switch the
mergejoin:http://explain.depesz.com/s/AWB
It is also visible that after hashjoin is off, the b.id query is faster
than the * query (what would be expected).Our test machine is overbudgetted, 4x the memory of the entire database
~4GB, and uses the PostgreSQL stock settings.Stefan
I'd suggest that you adjust Postgres configuration, specifically memory settings (buffer_cache, work_mem, effective_cache_size), to reflect your hardware config, and see how it affects your query.
Regards,
Igor Neyman
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
On Wed, 2013-05-01 at 17:44 +0200, Stefan de Konink wrote:
Combined with the recent bugfix regarding hash
estimation, it gives me a good indication that there might be a bug.
To which recent bugfix are you referring?
The best venue for fixing an issue like this is pgsql-performance -- it
doesn't make too much difference whether it's a "bug" or not.
Performance problems sometimes end up as bugs and sometimes end up being
treated more like an enhancement; but most of the progress is made on
pgsql-performance regardless.
Regards,
Jeff Davis
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance