query problem
Hi,
I sent this to general earlier but I was redirected to performance.
The query have been running ok for quite some time, but after I did a
vacuum on the database, it's very very slow. This IN-query is only 2
ids. Before the problem that in was a subselect-query returning around
6-7 ids. The tables included in the query are described in database.txt.
status=# select count(id) from data;
count
---------
1577621
(1 row)
status=# select count(data_id) from data_values;
count
---------
9680931
(1 row)
I did run a new explain analyze on the query and found the attached
result. The obvious problem I see is a full index scan in
"idx_dv_data_id". I tried dropping and adding the index again, thats why
is't called "idx_data_values_data_id" in the dump.
status=# EXPLAIN ANALYZE
status-# SELECT
status-# data.entered,
status-# data.machine_id,
status-# datatemplate_intervals.template_id,
status-# data_values.value
status-# FROM
status-# data, data_values, datatemplate_intervals
status-# WHERE
status-# datatemplate_intervals.id = data_values.template_id AND
status-# data_values.data_id = data.id AND
status-# data.machine_id IN (2,3) AND
status-# current_timestamp::timestamp - interval '60 seconds' <
data.entered;
Regards,
Robin
--
Robin Ericsson <robin.ericsson@profecta.se>
Profecta HB
On Wed, 2004-10-13 at 02:21, Robin Ericsson wrote:
Hi,
I sent this to general earlier but I was redirected to performance.
The query have been running ok for quite some time, but after I did a
vacuum on the database, it's very very slow.
Did you do a VACUUM FULL ANALYZE on the database or just a VACUUM? It
looks like your statistics in your query are all off which ANALYZE
should fix.
Show quoted text
This IN-query is only 2
ids. Before the problem that in was a subselect-query returning around
6-7 ids. The tables included in the query are described in database.txt.status=# select count(id) from data;
count
---------
1577621
(1 row)status=# select count(data_id) from data_values;
count
---------
9680931
(1 row)I did run a new explain analyze on the query and found the attached
result. The obvious problem I see is a full index scan in
"idx_dv_data_id". I tried dropping and adding the index again, thats why
is't called "idx_data_values_data_id" in the dump.status=# EXPLAIN ANALYZE
status-# SELECT
status-# data.entered,
status-# data.machine_id,
status-# datatemplate_intervals.template_id,
status-# data_values.value
status-# FROM
status-# data, data_values, datatemplate_intervals
status-# WHERE
status-# datatemplate_intervals.id = data_values.template_id AND
status-# data_values.data_id = data.id AND
status-# data.machine_id IN (2,3) AND
status-# current_timestamp::timestamp - interval '60 seconds' <
data.entered;Regards,
Robin
Robin Ericsson <robin.ericsson@profecta.se> writes:
I sent this to general earlier but I was redirected to performance.
Actually, I think I suggested that you consult the pgsql-performance
archives, where this type of problem has been hashed out before.
See for instance this thread:
http://archives.postgresql.org/pgsql-performance/2004-07/msg00169.php
particularly
http://archives.postgresql.org/pgsql-performance/2004-07/msg00175.php
http://archives.postgresql.org/pgsql-performance/2004-07/msg00184.php
http://archives.postgresql.org/pgsql-performance/2004-07/msg00185.php
which show three different ways of getting the planner to do something
sane with an index range bound like "now() - interval".
regards, tom lane
On Wed, 2004-10-13 at 11:03 -0400, Tom Lane wrote:
Robin Ericsson <robin.ericsson@profecta.se> writes:
I sent this to general earlier but I was redirected to performance.
Actually, I think I suggested that you consult the pgsql-performance
archives, where this type of problem has been hashed out before.
See for instance this thread:
http://archives.postgresql.org/pgsql-performance/2004-07/msg00169.php
particularly
http://archives.postgresql.org/pgsql-performance/2004-07/msg00175.php
http://archives.postgresql.org/pgsql-performance/2004-07/msg00184.php
http://archives.postgresql.org/pgsql-performance/2004-07/msg00185.php
which show three different ways of getting the planner to do something
sane with an index range bound like "now() - interval".
Using exact timestamp makes the query go back as it should in speed (see
explain below). However I still have the problem using a stored
procedure or even using the "ago"-example from above.
regards,
Robin
status=# explain analyse
status-# SELECT
status-# data.entered,
status-# data.machine_id,
status-# datatemplate_intervals.template_id,
status-# data_values.value
status-# FROM
status-# data, data_values, datatemplate_intervals
status-# WHERE
status-# datatemplate_intervals.id =
data_values.template_id AND
status-# data_values.data_id = data.id AND
status-# data.machine_id IN (SELECT machine_id FROM
machine_group_xref WHERE group_id = 1) AND
status-# '2004-10-13 17:47:36.902062' < data.entered
status-# ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=3.09..481.28 rows=777 width=24) (actual
time=0.637..1.804 rows=57 loops=1)
Hash Cond: ("outer".template_id = "inner".id)
-> Nested Loop (cost=1.17..467.71 rows=776 width=24) (actual
time=0.212..1.012 rows=57 loops=1)
-> Hash IN Join (cost=1.17..9.56 rows=146 width=16) (actual
time=0.165..0.265 rows=9 loops=1)
Hash Cond: ("outer".machine_id = "inner".machine_id)
-> Index Scan using idx_d_entered on data
(cost=0.00..6.14 rows=159 width=16) (actual time=0.051..0.097 rows=10
loops=1)
Index Cond: ('2004-10-13
17:47:36.902062'::timestamp without time zone < entered)
-> Hash (cost=1.14..1.14 rows=11 width=4) (actual
time=0.076..0.076 rows=0 loops=1)
-> Seq Scan on machine_group_xref
(cost=0.00..1.14 rows=11 width=4) (actual time=0.017..0.054 rows=11
loops=1)
Filter: (group_id = 1)
-> Index Scan using idx_data_values_data_id on data_values
(cost=0.00..3.07 rows=5 width=16) (actual time=0.018..0.047 rows=6
loops=9)
Index Cond: (data_values.data_id = "outer".id)
-> Hash (cost=1.74..1.74 rows=74 width=8) (actual time=0.382..0.382
rows=0 loops=1)
-> Seq Scan on datatemplate_intervals (cost=0.00..1.74
rows=74 width=8) (actual time=0.024..0.248 rows=74 loops=1)
Total runtime: 2.145 ms
(15 rows)
On Wed, 2004-10-13 at 18:01 +0200, Robin Ericsson wrote:
Using exact timestamp makes the query go back as it should in speed (see
explain below). However I still have the problem using a stored
procedure or even using the "ago"-example from above.
Well, changing ago() to use timestamp without time zone it goes ok in
the query. This query now takes ~2ms.
SELECT
data.entered,
data.machine_id,
datatemplate_intervals.template_id,
data_values.value
FROM
data, data_values, datatemplate_intervals
WHERE
datatemplate_intervals.id = data_values.template_id AND
data_values.data_id = data.id AND
data.machine_id IN (SELECT machine_id FROM machine_group_xref
WHERE group_id = 1) AND
ago('60 seconds') < data.entered
Using it in this procedure.
select * from get_current_machine_status('60 seconds', 1);
takes ~100s. Maybe there's some obvious wrong I do about it?
CREATE TYPE public.mstatus_holder AS
(entered timestamp,
machine_id int4,
template_id int4,
value varchar);
CREATE OR REPLACE FUNCTION public.get_current_machine_status(interval,
int4)
RETURNS SETOF mstatus_holder AS
'
SELECT
data.entered,
data.machine_id,
datatemplate_intervals.template_id,
data_values.value
FROM
data, data_values, datatemplate_intervals
WHERE
datatemplate_intervals.id = data_values.template_id AND
data_values.data_id = data.id AND
data.machine_id IN (SELECT machine_id FROM machine_group_xref WHERE
group_id = $2) AND
ago($1) < data.entered
'
LANGUAGE 'sql' VOLATILE;
Regards,
Robin