Join Bad Performance on different data types

Started by Adarsh Sharmaabout 12 years ago7 messagesgeneral

eddy.adarsh@gmail.com

about 12 years ago

Hi,

Today i need to change datatype of one of my tables from *bigint to
bigint[] *due to
application requirements. But One of my query hangs after this change :-

select DISTINCT glt.id || ':' || gtt.name as id_type, glt.name, latitude,
longitude, radius, latitude || ',' || longitude as latlon from graph1 glt,
graph2 gcr, graph3 gtd, graph5 td, graph6 gtt where gtt.id=td.entity_type_id
AND glt.id=ANY(gtd.id_list) and *gtd.id <http://gtd.id> = gcr.t_ids*) AND
gcr.id=ANY(td.detail_id);

*Explain Analyze Expected time : Total runtime: 19460.809 ms*

Query not finishing from last 1 hour. Tables size are ~ 5k rows and fairly
small tables.

I tried creating simple and gin indexes on the column(t_ids) but still not
helping. Anyone has any idea or faced this before. Postgresql version is
9.2.

Thanks

Sameer Kumar

sameer.kumar@ashnik.com

about 12 years ago

In reply to: Adarsh Sharma (#1)

Re: Join Bad Performance on different data types

On Tue, Mar 4, 2014 at 2:57 PM, Adarsh Sharma <eddy.adarsh@gmail.com> wrote:

I tried creating simple and gin indexes on the column(t_ids) but still not
helping. Anyone has any idea or faced this before. Postgresql version is
9.2.

have you done a vacuum analyze or analyze after this step?

You might have to disable sequential scans
set enable_seqscan=off;

And then fire the query.

Best Regards,

*Sameer Kumar | Database Consultant*

*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533

M: *+65 8110 0350* T: +65 6438 3504 | www.ashnik.com

*[image: icons]*

[image: Email patch] <http://www.ashnik.com/>

This email may contain confidential, privileged or copyright material and
is solely for the use of the intended recipient(s).

Pavel Stehule

pavel.stehule@gmail.com

about 12 years ago

In reply to: Adarsh Sharma (#1)

Re: Join Bad Performance on different data types

Hello

PostgreSQL doesn't use index when types on left and right part are not
equal.

Probably you lost some index - you can see a difference in EXPLAIN SELECT
...

Regards

Pavel

2014-03-04 7:57 GMT+01:00 Adarsh Sharma <eddy.adarsh@gmail.com>:

Show quoted text

Hi,

Today i need to change datatype of one of my tables from *bigint to
bigint[] *due to
application requirements. But One of my query hangs after this change :-

select DISTINCT glt.id || ':' || gtt.name as id_type, glt.name, latitude,
longitude, radius, latitude || ',' || longitude as latlon from graph1 glt,
graph2 gcr, graph3 gtd, graph5 td, graph6 gtt where gtt.id=td.entity_type_id
AND glt.id=ANY(gtd.id_list) and *gtd.id <http://gtd.id> = gcr.t_ids*) AND
gcr.id=ANY(td.detail_id);

*Explain Analyze Expected time : Total runtime: 19460.809 ms*

select DISTINCT glt.id || ':' || gtt.name as id_type, glt.name, latitude,
longitude, radius, latitude || ',' || longitude as latlon from graph1 glt,
graph2 gcr, graph3 gtd, graph5 td, graph6 gtt where gtt.id=td.entity_type_id
AND glt.id=ANY(gtd.id_list) and *gtd.id <http://gtd.id> = ANY(gcr.t_ids)*AND
gcr.id=ANY(td.detail_id);

Query not finishing from last 1 hour. Tables size are ~ 5k rows and fairly
small tables.

I tried creating simple and gin indexes on the column(t_ids) but still not
helping. Anyone has any idea or faced this before. Postgresql version is
9.2.

Thanks

Adarsh Sharma

eddy.adarsh@gmail.com

about 12 years ago

In reply to: Sameer Kumar (#2)

Re: Join Bad Performance on different data types

On Tue, Mar 4, 2014 at 1:13 PM, Sameer Kumar <sameer.kumar@ashnik.com>wrote:

On Tue, Mar 4, 2014 at 2:57 PM, Adarsh Sharma <eddy.adarsh@gmail.com>wrote:

I tried creating simple and gin indexes on the column(t_ids) but still
not helping. Anyone has any idea or faced this before. Postgresql version
is 9.2.

have you done a vacuum analyze or analyze after this step?

You might have to disable sequential scans
set enable_seqscan=off;

And then fire the query.

Thanks Sameer. yes i already did vacuum analyze but i tried
enable_seqscan=off; this time and explain analyze finished in 34 seconds.

PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=10651634346.70..10651780073.12 rows=4163612 width=64)
(actual time=34375.675..34764.705 rows=751392 loops=1)
-> Nested Loop (cost=10000000000.03..10646590270.49 rows=336271747
width=64) (actual time=0.217..24988.534 rows=6541944 loops=1)
-> Nested Loop (cost=10000000000.02..10012318364.23
rows=33628639 width=116) (actual time=0.177..3427.380 rows=1431 loops=1)
-> Nested Loop (cost=10000000000.01..10001045237.36
rows=3368723 width=38) (actual time=0.138..3373.767 rows=1431 loops=1)
-> Nested Loop (cost=10000000000.00..10000097742.23
rows=340181 width=38) (actual time=0.047..2151.183 rows=418145 loops=1)
Join Filter: (td.entity_type_id = gtt.id)
Rows Removed by Join Filter: 1269335
-> Seq Scan on graph5 td
(cost=10000000000.00..10000077008.13 rows=345413 width=33) (actual
time=0.020..1231.823 rows=421870 loops=1)
-> Materialize (cost=0.00..9.33 rows=4
width=13) (actual time=0.000..0.001 rows=4 loops=421870)
-> Index Scan using geo_type_pkey on
graph6 gtt (cost=0.00..9.31 rows=4 width=13) (actual time=0.009..0.012
rows=4 loops=1)
-> Index Scan using graph2_pkey on graph2 gcr
(cost=0.01..2.69 rows=10 width=33) (actual time=0.002..0.002 rows=0
loops=418145)
Index Cond: (id = ANY (td.graph3_id))
-> Index Scan using graph3_pkey on graph3 gtd
(cost=0.01..3.25 rows=10 width=115) (actual time=0.035..0.036 rows=1
loops=1431)
Index Cond: (id = ANY (gcr.t_ids))
-> Index Scan using graph1_pkey on graph1 glt (cost=0.01..18.51
rows=10 width=55) (actual time=0.085..9.082 rows=4572 loops=1431)
Index Cond: (id = ANY (gtd.lat_long_id_list))
Total runtime: 34810.040 ms

Is dere any way i can rewrite the query so that i need not to set
seqscan-off, because i dont want to embed one more line in application
code and also dont want to change global setting in postgresql.conf to
disable seqscan.

Thanks

Show quoted text

This email may contain confidential, privileged or copyright material and
is solely for the use of the intended recipient(s).

Sameer Kumar

sameer.kumar@ashnik.com

about 12 years ago

In reply to: Adarsh Sharma (#4)

Re: Join Bad Performance on different data types

On Tue, Mar 4, 2014 at 4:19 PM, Adarsh Sharma <eddy.adarsh@gmail.com> wrote:

Is dere any way i can rewrite the query so that i need not to set
seqscan-off, because i dont want to embed one more line in application
code and also dont want to change global setting in postgresql.conf to
disable seqscan.

You can use a specific user for this query and set parameter's value at
user level.

But if you don't have any way of doing that then you pretty much have to
"embed one more line in application code" [though I have never understood
why application teams do not want to do that]. It should be fine, unless
you are developing a product which can be used with someother RDBMS and
hence you don't want to introduce a PostgreSQL specific line in code.

Best Regards,

*Sameer Kumar | Database Consultant*

*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533

M: *+65 8110 0350* T: +65 6438 3504 | www.ashnik.com

*[image: icons]*

[image: Email patch] <http://www.ashnik.com/>

This email may contain confidential, privileged or copyright material and
is solely for the use of the intended recipient(s).

Victor Yegorov

vyegorov@gmail.com

about 12 years ago

In reply to: Adarsh Sharma (#4)

Re: Join Bad Performance on different data types

2014-03-04 10:19 GMT+02:00 Adarsh Sharma <eddy.adarsh@gmail.com>:

PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=10651634346.70..10651780073.12 rows=4163612
width=64) (actual time=34375.675..34764.705 rows=751392 loops=1)
-> Nested Loop (cost=10000000000.03..10646590270.49 rows=336271747
width=64) (actual time=0.217..24988.534 rows=6541944 loops=1)
-> Nested Loop (cost=10000000000.02..10012318364.23
rows=33628639 width=116) (actual time=0.177..3427.380 rows=1431 loops=1)
-> Nested Loop (cost=10000000000.01..10001045237.36
rows=3368723 width=38) (actual time=0.138..3373.767 rows=1431 loops=1)
-> Nested Loop (cost=10000000000.00..10000097742.23
rows=340181 width=38) (actual time=0.047..2151.183 rows=418145 loops=1)
Join Filter: (td.entity_type_id = gtt.id)
Rows Removed by Join Filter: 1269335
-> Seq Scan on graph5 td
(cost=10000000000.00..10000077008.13 rows=345413 width=33) (actual
time=0.020..1231.823 rows=421870 loops=1)
-> Materialize (cost=0.00..9.33 rows=4
width=13) (actual time=0.000..0.001 rows=4 loops=421870)
-> Index Scan using geo_type_pkey on
graph6 gtt (cost=0.00..9.31 rows=4 width=13) (actual time=0.009..0.012
rows=4 loops=1)
-> Index Scan using graph2_pkey on graph2 gcr
(cost=0.01..2.69 rows=10 width=33) (actual time=0.002..0.002 rows=0
loops=418145)
Index Cond: (id = ANY (td.graph3_id))
-> Index Scan using graph3_pkey on graph3 gtd
(cost=0.01..3.25 rows=10 width=115) (actual time=0.035..0.036 rows=1
loops=1431)
Index Cond: (id = ANY (gcr.t_ids))
-> Index Scan using graph1_pkey on graph1 glt (cost=0.01..18.51
rows=10 width=55) (actual time=0.085..9.082 rows=4572 loops=1431)
Index Cond: (id = ANY (gtd.lat_long_id_list))
Total runtime: 34810.040 ms

Is dere any way i can rewrite the query so that i need not to set
seqscan-off, because i dont want to embed one more line in application
code and also dont want to change global setting in postgresql.conf to
disable seqscan.

Could you, kindly, also post `EXPLAIN` output of your original and modified
queries also, leaving out all `enable_...` setting at their defaults.
Just explain, without `analyze`.

--
Victor Y. Yegorov

Victor Yegorov

vyegorov@gmail.com

about 12 years ago

In reply to: Adarsh Sharma (#1)

Re: Join Bad Performance on different data types

2014-03-05 6:53 GMT+02:00 Adarsh Sharma <eddy.adarsh@gmail.com>:

Anyhow working on finding the root cause.

I would do the following:

1. Check your `default_statistics_target`, like:

SHOW default_statistics_target;

2. Your previously posted `EXPLAIN ANALYZE` with `set enable_seqscan=off`
shows
a significant skew in the estiamted number of rows. I would really bump
up the
`default_statistics_target` higher (modify your `postgresql.conf` and
reload),
up to 500, re-analyzed all tables and checked the performance again.

If you don't want to do it for the whole cluster, then increase for
individual columns.
I think increasing it up to 500 (or higher) for the columns containing
arrays will be
beneficial in any case:

ALTER TABLE graph5 ALTER detail_id SET STATISTICS 1000;
ALTER TABLE graph3 ALTER id_list SET STATISTICS 1000;
ALTER TABLE graph2 ALTER targeting_ids SET STATISTICS 1000;

Of course, you need to `ANALYZE` (or `VACUUM ANALYZE`) the tables after
this change.

3. Next, I've reformatted your query the following way (easier for me):

SELECT DISTINCT
glt.id || ':' || gtt.name as id_type,
glt.name,latitude,longitude,radius,
latitude || ',' || longitude as latlon
FROM graph5 td JOIN graph6 gtt ON gtt.id=td.entity_type_id JOIN
graph2 gcr ON gcr.id=ANY(td.detail_id) JOIN graph3 gtd ON gtd.id=gcr.t_ids
JOIN graph1 glt ON glt.id=ANY(gtd.id_list);

I've ordered joins (well, tables) in the way they depend on each other.
I would try out to move the first join into the CTE in order to isolate
it's results
and help optimizer to better estiamte the rows. Something like this:

WITH first AS (
SELECT td.*,gtt.*
FROM graph5 td
JOIN graph6 gtt ON gtt.id=td.entity_type_id
)
SELECT DISTINCT
glt.id || ':' || first.name as id_type,
glt.name,latitude,longitude,radius,
latitude || ',' || longitude as latlon
FROM first
JOIN graph2 gcr ON gcr.id=ANY(first.detail_id)
JOIN graph3 gtd ON gtd.id=gcr.t_ids
JOIN graph1 glt ON glt.id=ANY(gtd.id_list)

4. Try disabling materialization, like `set enable_material=off`.

--
Victor Y. Yegorov

Import Notes

Reply to msg id not found: CAGx-QqKpnWxVTZGZ2K-V9FTMhkfXhXwA4OxHva=x8+TKUFVXHw@mail.gmail.com

Join Bad Performance on different data types

Attachments:

Attachments: