BUG #8130: Hashjoin still gives issues

Started by Stefan de Koninkalmost 13 years ago5 messagesbugs
Jump to latest
#1Stefan de Konink
stefan@konink.de

The following bug has been logged on the website:

Bug reference: 8130
Logged by: Stefan de Konink
Email address: stefan@konink.de
PostgreSQL version: 9.2.4
Operating system: Linux
Description:

We figured out that two very close query give a massive difference
performance between using select * vs select id.

SELECT *
FROM ambit_privateevent_calendars AS a
,ambit_privateevent AS b
,ambit_calendarsubscription AS c
,ambit_calendar AS d
WHERE c.calendar_id = d.id
AND a.privateevent_id = b.id
AND c.user_id = 1270
AND c.calendar_id = a.calendar_id
AND c.STATUS IN (
1
,8
,2
,15
,18
,4
,12
,20
)
AND NOT b.main_recurrence = true;

With some help on IRC we figured out that "there was a bugfix in hash
estimation recently and I was hoping you were older than that", but since we
are not:
PostgreSQL 9.2.4 on x86_64-pc-linux-gnu, compiled by x86_64-pc-linux-gnu-gcc
(Gentoo 4.7.2-r1 p1.6, pie-0.5.5) 4.7.2, 64-bit

...there might still be a bug around.

We compare:
http://explain.depesz.com/s/jRx
http://explain.depesz.com/s/eKE

By setting "set enable_hashjoin = off;" performance in our entire
application increased 30 fold in throughput, which was a bit unexpected but
highly appreciated. The result of the last query:

http://explain.depesz.com/s/AWB

What can we do to provide a bit more of information?

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stefan de Konink (#1)
Re: BUG #8130: Hashjoin still gives issues

stefan@konink.de writes:

By setting "set enable_hashjoin = off;" performance in our entire
application increased 30 fold in throughput, which was a bit unexpected but
highly appreciated. The result of the last query:

At least in this example, the query appears to be fully cached and so
you would need a random_page_cost near 1 to reflect the system's
behavior properly. If your DB fits mostly in RAM, adjusting the cost
parameters is a much better idea than fooling with the enable_
parameters.

What can we do to provide a bit more of information?

https://wiki.postgresql.org/wiki/Slow_Query_Questions

There is no particularly good reason to think this is a bug; please
take it up on pgsql-performance if you have more questions.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Stefan de Konink
stefan@konink.de
In reply to: Tom Lane (#2)
Re: [BUGS] BUG #8130: Hashjoin still gives issues

Dear Tom,

On Wed, 1 May 2013, Tom Lane wrote:

What can we do to provide a bit more of information?

https://wiki.postgresql.org/wiki/Slow_Query_Questions

There is no particularly good reason to think this is a bug; please
take it up on pgsql-performance if you have more questions.

I beg to disagree, the performance of a select * query and the select b.id
query are both "hot". The result in a fundamentally different query plan
(and performance). Combined with the recent bugfix regarding hash
estimation, it gives me a good indication that there might be a bug.

I am not deep into the query optimiser of PostgreSQL but given the above
same were different selections can change an entire query plan (and * is
in fact out of the box 30 times faster than b.id) it does. When hash is
disabled the entire query is -depending on the system checked- 2 to
30x faster.

The original query:

select * from ambit_privateevent_calendars as a, ambit_privateevent as b,
ambit_calendarsubscription as c, ambit_calendar as d where c.calendar_id =
d.id and a.privateevent_id = b.id and c.user_id = 1270 and c.calendar_id
= a.calendar_id and c.STATUS IN (1, 8, 2, 15, 18, 4, 12, 20) and not
b.main_recurrence = true;

select b.id from ambit_privateevent_calendars as a, ambit_privateevent as
b, ambit_calendarsubscription as c, ambit_calendar as d where c.calendar_id =
d.id and a.privateevent_id = b.id and c.user_id = 1270 and c.calendar_id
= a.calendar_id and c.STATUS IN (1, 8, 2, 15, 18, 4, 12, 20) and not
b.main_recurrence = true;

(select * => select b.id, the star query is *fastest*)

We compare:
http://explain.depesz.com/s/jRx
http://explain.depesz.com/s/eKE

By setting "set enable_hashjoin = off;" performance in our entire
application increased 30 fold in throughput, which was a bit unexpected
but highly appreciated. The result of the last query switch the mergejoin:

http://explain.depesz.com/s/AWB

It is also visible that after hashjoin is off, the b.id query is faster
than the * query (what would be expected).

Our test machine is overbudgetted, 4x the memory of the entire database
~4GB, and uses the PostgreSQL stock settings.

Stefan

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

#4Igor Neyman
ineyman@perceptron.com
In reply to: Stefan de Konink (#3)
Re: [BUGS] BUG #8130: Hashjoin still gives issues

-----Original Message-----

The original query:

select * from ambit_privateevent_calendars as a, ambit_privateevent as
b, ambit_calendarsubscription as c, ambit_calendar as d where
c.calendar_id = d.id and a.privateevent_id = b.id and c.user_id = 1270
and c.calendar_id = a.calendar_id and c.STATUS IN (1, 8, 2, 15, 18, 4,
12, 20) and not b.main_recurrence = true;

select b.id from ambit_privateevent_calendars as a, ambit_privateevent
as b, ambit_calendarsubscription as c, ambit_calendar as d where
c.calendar_id = d.id and a.privateevent_id = b.id and c.user_id = 1270
and c.calendar_id = a.calendar_id and c.STATUS IN (1, 8, 2, 15, 18, 4,
12, 20) and not b.main_recurrence = true;

(select * => select b.id, the star query is *fastest*)

We compare:
http://explain.depesz.com/s/jRx
http://explain.depesz.com/s/eKE

By setting "set enable_hashjoin = off;" performance in our entire
application increased 30 fold in throughput, which was a bit unexpected
but highly appreciated. The result of the last query switch the
mergejoin:

http://explain.depesz.com/s/AWB

It is also visible that after hashjoin is off, the b.id query is faster
than the * query (what would be expected).

Our test machine is overbudgetted, 4x the memory of the entire database
~4GB, and uses the PostgreSQL stock settings.

Stefan

I'd suggest that you adjust Postgres configuration, specifically memory settings (buffer_cache, work_mem, effective_cache_size), to reflect your hardware config, and see how it affects your query.

Regards,
Igor Neyman

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

#5Jeff Davis
pgsql@j-davis.com
In reply to: Stefan de Konink (#3)
Re: [BUGS] BUG #8130: Hashjoin still gives issues

On Wed, 2013-05-01 at 17:44 +0200, Stefan de Konink wrote:

Combined with the recent bugfix regarding hash
estimation, it gives me a good indication that there might be a bug.

To which recent bugfix are you referring?

The best venue for fixing an issue like this is pgsql-performance -- it
doesn't make too much difference whether it's a "bug" or not.
Performance problems sometimes end up as bugs and sometimes end up being
treated more like an enhancement; but most of the progress is made on
pgsql-performance regardless.

Regards,
Jeff Davis

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance