query problem

Started by Robin Ericssonover 21 years ago6 messagesgeneral

robin.ericsson@profecta.se

over 21 years ago

Hi,

I sent this to general earlier but I was redirected to performance.

The query have been running ok for quite some time, but after I did a
vacuum on the database, it's very very slow. This IN-query is only 2
ids. Before the problem that in was a subselect-query returning around
6-7 ids. The tables included in the query are described in database.txt.

status=# select count(id) from data;
count
---------
1577621
(1 row)

status=# select count(data_id) from data_values;
count
---------
9680931
(1 row)

I did run a new explain analyze on the query and found the attached
result. The obvious problem I see is a full index scan in
"idx_dv_data_id". I tried dropping and adding the index again, thats why
is't called "idx_data_values_data_id" in the dump.

status=# EXPLAIN ANALYZE
status-# SELECT
status-# data.entered,
status-# data.machine_id,
status-# datatemplate_intervals.template_id,
status-# data_values.value
status-# FROM
status-# data, data_values, datatemplate_intervals
status-# WHERE
status-# datatemplate_intervals.id = data_values.template_id AND
status-# data_values.data_id = data.id AND
status-# data.machine_id IN (2,3) AND
status-# current_timestamp::timestamp - interval '60 seconds' <
data.entered;

Regards,
Robin

--
Robin Ericsson <robin.ericsson@profecta.se>
Profecta HB

ken

southerland@samsixedd.com

over 21 years ago

In reply to: Robin Ericsson (#1)

Re: query problem

On Wed, 2004-10-13 at 02:21, Robin Ericsson wrote:

Hi,

I sent this to general earlier but I was redirected to performance.

The query have been running ok for quite some time, but after I did a
vacuum on the database, it's very very slow.

Did you do a VACUUM FULL ANALYZE on the database or just a VACUUM? It
looks like your statistics in your query are all off which ANALYZE
should fix.

Show quoted text

This IN-query is only 2
ids. Before the problem that in was a subselect-query returning around
6-7 ids. The tables included in the query are described in database.txt.

status=# select count(id) from data;
count
---------
1577621
(1 row)

status=# select count(data_id) from data_values;
count
---------
9680931
(1 row)

I did run a new explain analyze on the query and found the attached
result. The obvious problem I see is a full index scan in
"idx_dv_data_id". I tried dropping and adding the index again, thats why
is't called "idx_data_values_data_id" in the dump.

status=# EXPLAIN ANALYZE
status-# SELECT
status-# data.entered,
status-# data.machine_id,
status-# datatemplate_intervals.template_id,
status-# data_values.value
status-# FROM
status-# data, data_values, datatemplate_intervals
status-# WHERE
status-# datatemplate_intervals.id = data_values.template_id AND
status-# data_values.data_id = data.id AND
status-# data.machine_id IN (2,3) AND
status-# current_timestamp::timestamp - interval '60 seconds' <
data.entered;

Regards,
Robin

Tom Lane

tgl@sss.pgh.pa.us

over 21 years ago

In reply to: Robin Ericsson (#1)

Re: query problem

Robin Ericsson <robin.ericsson@profecta.se> writes:

I sent this to general earlier but I was redirected to performance.

Actually, I think I suggested that you consult the pgsql-performance
archives, where this type of problem has been hashed out before.
See for instance this thread:
http://archives.postgresql.org/pgsql-performance/2004-07/msg00169.php
particularly
http://archives.postgresql.org/pgsql-performance/2004-07/msg00175.php
http://archives.postgresql.org/pgsql-performance/2004-07/msg00184.php
http://archives.postgresql.org/pgsql-performance/2004-07/msg00185.php
which show three different ways of getting the planner to do something
sane with an index range bound like "now() - interval".

regards, tom lane

Robin Ericsson

robin.ericsson@profecta.se

over 21 years ago

In reply to: Tom Lane (#3)

Re: query problem

On Wed, 2004-10-13 at 11:03 -0400, Tom Lane wrote:

Robin Ericsson <robin.ericsson@profecta.se> writes:

I sent this to general earlier but I was redirected to performance.

Actually, I think I suggested that you consult the pgsql-performance
archives, where this type of problem has been hashed out before.
See for instance this thread:
http://archives.postgresql.org/pgsql-performance/2004-07/msg00169.php
particularly
http://archives.postgresql.org/pgsql-performance/2004-07/msg00175.php
http://archives.postgresql.org/pgsql-performance/2004-07/msg00184.php
http://archives.postgresql.org/pgsql-performance/2004-07/msg00185.php
which show three different ways of getting the planner to do something
sane with an index range bound like "now() - interval".

Using exact timestamp makes the query go back as it should in speed (see
explain below). However I still have the problem using a stored
procedure or even using the "ago"-example from above.

regards,
Robin

status=# explain analyse
status-# SELECT
status-# data.entered,
status-# data.machine_id,
status-# datatemplate_intervals.template_id,
status-# data_values.value
status-# FROM
status-# data, data_values, datatemplate_intervals
status-# WHERE
status-# datatemplate_intervals.id =
data_values.template_id AND
status-# data_values.data_id = data.id AND
status-# data.machine_id IN (SELECT machine_id FROM
machine_group_xref WHERE group_id = 1) AND
status-# '2004-10-13 17:47:36.902062' < data.entered
status-# ;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=3.09..481.28 rows=777 width=24) (actual
time=0.637..1.804 rows=57 loops=1)
Hash Cond: ("outer".template_id = "inner".id)
-> Nested Loop (cost=1.17..467.71 rows=776 width=24) (actual
time=0.212..1.012 rows=57 loops=1)
-> Hash IN Join (cost=1.17..9.56 rows=146 width=16) (actual
time=0.165..0.265 rows=9 loops=1)
Hash Cond: ("outer".machine_id = "inner".machine_id)
-> Index Scan using idx_d_entered on data
(cost=0.00..6.14 rows=159 width=16) (actual time=0.051..0.097 rows=10
loops=1)
Index Cond: ('2004-10-13
17:47:36.902062'::timestamp without time zone < entered)
-> Hash (cost=1.14..1.14 rows=11 width=4) (actual
time=0.076..0.076 rows=0 loops=1)
-> Seq Scan on machine_group_xref
(cost=0.00..1.14 rows=11 width=4) (actual time=0.017..0.054 rows=11
loops=1)
Filter: (group_id = 1)
-> Index Scan using idx_data_values_data_id on data_values
(cost=0.00..3.07 rows=5 width=16) (actual time=0.018..0.047 rows=6
loops=9)
Index Cond: (data_values.data_id = "outer".id)
-> Hash (cost=1.74..1.74 rows=74 width=8) (actual time=0.382..0.382
rows=0 loops=1)
-> Seq Scan on datatemplate_intervals (cost=0.00..1.74
rows=74 width=8) (actual time=0.024..0.248 rows=74 loops=1)
Total runtime: 2.145 ms
(15 rows)

Robin Ericsson

robin.ericsson@profecta.se

over 21 years ago

In reply to: Robin Ericsson (#4)

Re: [PERFORM] query problem

On Wed, 2004-10-13 at 18:01 +0200, Robin Ericsson wrote:

Using exact timestamp makes the query go back as it should in speed (see
explain below). However I still have the problem using a stored
procedure or even using the "ago"-example from above.

Well, changing ago() to use timestamp without time zone it goes ok in
the query. This query now takes ~2ms.

SELECT
data.entered,
data.machine_id,
datatemplate_intervals.template_id,
data_values.value
FROM
data, data_values, datatemplate_intervals
WHERE
datatemplate_intervals.id = data_values.template_id AND
data_values.data_id = data.id AND
data.machine_id IN (SELECT machine_id FROM machine_group_xref
WHERE group_id = 1) AND
ago('60 seconds') < data.entered

Using it in this procedure.
select * from get_current_machine_status('60 seconds', 1);
takes ~100s. Maybe there's some obvious wrong I do about it?

CREATE TYPE public.mstatus_holder AS
(entered timestamp,
machine_id int4,
template_id int4,
value varchar);
CREATE OR REPLACE FUNCTION public.get_current_machine_status(interval,
int4)
RETURNS SETOF mstatus_holder AS
'
SELECT
data.entered,
data.machine_id,
datatemplate_intervals.template_id,
data_values.value
FROM
data, data_values, datatemplate_intervals
WHERE
datatemplate_intervals.id = data_values.template_id AND
data_values.data_id = data.id AND
data.machine_id IN (SELECT machine_id FROM machine_group_xref WHERE
group_id = $2) AND
ago($1) < data.entered
'
LANGUAGE 'sql' VOLATILE;

Regards,
Robin

Robin Ericsson

robin.ericsson@profecta.se

over 21 years ago

In reply to: Robin Ericsson (#5)

Re: [PERFORM] query problem

Sorry, this should have been going to performance.

Regards,
Robin

query problem

Attachments: