how to remove a for-loop from programming language and put it into the query?

Started by Pedro Zorzenon Netoalmost 16 years ago9 messagesgeneral
Jump to latest
#1Pedro Zorzenon Neto
pedro2009@mandic.com.br

Hello!

Can someone help me to develop a query?

Things are more complex than this example, but with this simple example
I can explain what I need and get an answer.

Table: "diagnose_logs"
Fields:
- id serial
- hardware_id integer
- diag_value integer
- ts timestamp

So I collect many diagnose information from many hardwares.

So, I need to get a report of all diagnostics of all hardware on
december 25th.

(external programming language)
for ($i = 1; $i < 500; $i++) {
// return me the "most recent" diag_value from a hardware_id $i
// at the desired timestamp
runquery("select diag_value from diagnose_logs where ts <= '2009-12-25
23:59:59' and hardware_id = $i order by ts desc limit 1");
}

Currently I have an index on diagnose_logs(ts,hardware_id)
I have 3 milion registers of 500 different hardware_id.

The time to run 500 times this query is long... about 1 minute. When I
need a montly day-by-day report of 500 hardwares, it takes about half an
hour.

can I turn this for-loop into a single query to run in postgres?

Thanks,
Pedro

#2Thom Brown
thombrown@gmail.com
In reply to: Pedro Zorzenon Neto (#1)
Re: how to remove a for-loop from programming language and put it into the query?

On 5 July 2010 15:48, Pedro Zorzenon Neto <pedro2009@mandic.com.br> wrote:

Hello!

Can someone help me to develop a query?

Things are more complex than this example, but with this simple example
I can explain what I need and get an answer.

Table: "diagnose_logs"
Fields:
 - id serial
 - hardware_id integer
 - diag_value integer
 - ts timestamp

So I collect many diagnose information from many hardwares.

So, I need to get a report of all diagnostics of all hardware on
december 25th.

(external programming language)
for ($i = 1; $i < 500; $i++) {
 // return me the "most recent" diag_value from a hardware_id $i
 // at the desired timestamp
 runquery("select diag_value from diagnose_logs where ts <= '2009-12-25
23:59:59' and hardware_id = $i order by ts desc limit 1");
}

Currently I have an index on diagnose_logs(ts,hardware_id)
I have 3 milion registers of 500 different hardware_id.

The time to run 500 times this query is long... about 1 minute. When I
need a montly day-by-day report of 500 hardwares, it takes about half an
hour.

can I turn this for-loop into a single query to run in postgres?

Thanks,
Pedro

I'm probably misunderstanding the problem, but can't you just do:

SELECT
diag_value
FROM
diagnose_logs
WHERE
ts <= '2009-12-25 23:59:59'
AND
hardware_id BETWEEN 1 AND 500
ORDER BY
ts DESC
LIMIT 1

Regards

Thom

#3Sam Mason
sam@samason.me.uk
In reply to: Pedro Zorzenon Neto (#1)
Re: how to remove a for-loop from programming language and put it into the query?

On Mon, Jul 05, 2010 at 11:48:37AM -0300, Pedro Zorzenon Neto wrote:

for ($i = 1; $i < 500; $i++) {
// return me the "most recent" diag_value from a hardware_id $i
// at the desired timestamp
runquery("select diag_value from diagnose_logs where ts <= '2009-12-25
23:59:59' and hardware_id = $i order by ts desc limit 1");
}

can I turn this for-loop into a single query to run in postgres?

You want to be using DISTINCT ON or some sort of WINDOW function.
DISTINCT ON works with older version of PG, but isn't as standards'
conforming. The following should do the trick with DISTINCT ON:

SELECT DISTINCT ON (hardware_id) hardware_id, diag_value, ts
FROM diagnose_logs
WHERE ts <= '2009-12-25 23:59:59'
ORDER BY hardware_id, ts DESC;

You can obviously put in the normal clauses to limit the hardware_ids to
be things you consider important in the normal ways.

--
Sam http://samason.me.uk/

#4Pedro Zorzenon Neto
pedro2009@mandic.com.br
In reply to: Thom Brown (#2)
Re: how to remove a for-loop from programming language and put it into the query?

I'm probably misunderstanding the problem, but can't you just do:

SELECT
diag_value
FROM
diagnose_logs
WHERE
ts <= '2009-12-25 23:59:59'
AND
hardware_id BETWEEN 1 AND 500
ORDER BY
ts DESC
LIMIT 1

Hi Thom,

Yes, I think you misunderstood.

An example of a table:

hardware_id | ts | diag_value
1 | 2009-12-25 14:00:00 | 43.5 (*)
1 | 2009-12-26 15:00:00 | 43.6
1 | 2009-12-24 13:00:00 | 43.7
2 | 2009-12-24 15:00:00 | 43.8 (*)
2 | 2009-12-24 14:00:00 | 43.9
2 | 2009-12-24 14:16:00 | 43.9
2 | 2009-12-27 14:00:00 | 44.0

I need to get the "most recent" value before "2009-12-25 23:59:59" from
every hardware_id.

For hardware_id=1, the value would be:
1 | 2009-12-25 14:00:00 | 43.5
for hardware_id=2, the value would be:
2 | 2009-12-24 15:00:00 | 43.8

I need a query that will return me those lines marked with (*) :-) is
this possible?

Thanks!

#5Tim Landscheidt
tim@tim-landscheidt.de
In reply to: Pedro Zorzenon Neto (#1)
Re: how to remove a for-loop from programming language and put it into the query?

Pedro Zorzenon Neto <pedro2009@mandic.com.br> wrote:

[...]
So, I need to get a report of all diagnostics of all hardware on
december 25th.

(external programming language)
for ($i = 1; $i < 500; $i++) {
// return me the "most recent" diag_value from a hardware_id $i
// at the desired timestamp
runquery("select diag_value from diagnose_logs where ts <= '2009-12-25
23:59:59' and hardware_id = $i order by ts desc limit 1");
}

Currently I have an index on diagnose_logs(ts,hardware_id)
I have 3 milion registers of 500 different hardware_id.

The time to run 500 times this query is long... about 1 minute. When I
need a montly day-by-day report of 500 hardwares, it takes about half an
hour.

can I turn this for-loop into a single query to run in postgres?

Another month, another case for "DISTINCT ON":

| SELECT DISTINCT ON (hardware_id)
| hardware_id, diag_value
| FROM diagnose_logs
| WHERE ts <= '2009-12-25 23:59:59'
| ORDER BY hardware_id, ts DESC;

BTW, I'd prefer "WHERE ts < '2009-12-26'" as otherwise you
don't catch a timestamp '2009-12-25 23:59:59.5' (not to
speak of leap seconds).

Tim

#6Thom Brown
thombrown@gmail.com
In reply to: Pedro Zorzenon Neto (#4)
Re: how to remove a for-loop from programming language and put it into the query?

On 5 July 2010 16:26, Pedro Zorzenon Neto <pedro2009@mandic.com.br> wrote:

I'm probably misunderstanding the problem, but can't you just do:

SELECT
      diag_value
FROM
      diagnose_logs
WHERE
      ts <= '2009-12-25 23:59:59'
AND
      hardware_id BETWEEN 1 AND 500
ORDER BY
      ts DESC
LIMIT 1

Hi Thom,

Yes, I think you misunderstood.

An example of a table:

hardware_id | ts                  | diag_value
         1 | 2009-12-25 14:00:00 | 43.5 (*)
         1 | 2009-12-26 15:00:00 | 43.6
         1 | 2009-12-24 13:00:00 | 43.7
         2 | 2009-12-24 15:00:00 | 43.8 (*)
         2 | 2009-12-24 14:00:00 | 43.9
         2 | 2009-12-24 14:16:00 | 43.9
         2 | 2009-12-27 14:00:00 | 44.0

I need to get the "most recent" value before "2009-12-25 23:59:59" from
every hardware_id.

For hardware_id=1, the value would be:
         1 | 2009-12-25 14:00:00 | 43.5
for hardware_id=2, the value would be:
         2 | 2009-12-24 15:00:00 | 43.8

I need a query that will return me those lines marked with (*) :-) is
this possible?

Thanks!

D'oh! I completely ignored that "LIMIT 1". okay... what Sam said.

Thom

#7Pedro Zorzenon Neto
pedro2009@mandic.com.br
In reply to: Sam Mason (#3)
Re: how to remove a for-loop from programming language and put it into the query?

Em 05-07-2010 12:22, Sam Mason escreveu:

You want to be using DISTINCT ON or some sort of WINDOW function.
DISTINCT ON works with older version of PG, but isn't as standards'
conforming. The following should do the trick with DISTINCT ON:

SELECT DISTINCT ON (hardware_id) hardware_id, diag_value, ts
FROM diagnose_logs
WHERE ts <= '2009-12-25 23:59:59'
ORDER BY hardware_id, ts DESC;

You can obviously put in the normal clauses to limit the hardware_ids to
be things you consider important in the normal ways.

Hi Sam!

It worked ok! your solution solves what I need. The process time went
from 60 to 20 seconds. nice!

Can you help me to discover why the "Seq Scan" in explain analyse? I
tried to create some indexes to change seq scan to index scan, but
couldn't do it.

Now the real table and field names...

explain analyse select distinct on (callbox_id) callbox_id, ts, imei,
temperatura from diag_resultados where ts <= '2010-06-15 00:00:00' order
by callbox_id, ts desc;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=408118.90..417725.43 rows=406 width=18) (actual
time=19608.347..22626.744 rows=458 loops=1)
-> Sort (cost=408118.90..412922.17 rows=1921306 width=18) (actual
time=19608.345..21503.135 rows=1905941 loops=1)
Sort Key: callbox_id, ts
-> Seq Scan on diag_resultados (cost=0.00..58795.50
rows=1921306 width=18) (actual time=0.024..4886.113 rows=1905941 loops=1)
Filter: (ts <= '2010-06-15 00:00:00-03'::timestamp with
time zone)
Total runtime: 22762.754 ms

Pg is old in this machine. 7.4.17

#8Sam Mason
sam@samason.me.uk
In reply to: Pedro Zorzenon Neto (#7)
Re: how to remove a for-loop from programming language and put it into the query?

On Mon, Jul 05, 2010 at 12:44:55PM -0300, Pedro Zorzenon Neto wrote:

Em 05-07-2010 12:22, Sam Mason escreveu:

SELECT DISTINCT ON (hardware_id) hardware_id, diag_value, ts
FROM diagnose_logs
WHERE ts <= '2009-12-25 23:59:59'
ORDER BY hardware_id, ts DESC;

It worked ok! your solution solves what I need. The process time went
from 60 to 20 seconds. nice!

Always nice when less code is faster!

Can you help me to discover why the "Seq Scan" in explain analyse? I
tried to create some indexes to change seq scan to index scan, but
couldn't do it.

It's because the only way PG knows how to do a DISTINCT ON is to sort
the whole table and then pull out the appropriate values. Sorting the
whole of a table is generally going to be faster than referring to an
index for every row and hence PG won't use an index.

I'm not sure if that's changed more recently, but for 7.4 I'm pretty
sure that's the case anyway.

--
Sam http://samason.me.uk/

#9Sim Zacks
sim@compulab.co.il
In reply to: Pedro Zorzenon Neto (#1)
Re: how to remove a for-loop from programming language and put it into the query?

(external programming language)
for ($i = 1; $i < 500; $i++) {
// return me the "most recent" diag_value from a hardware_id $i
// at the desired timestamp
runquery("select diag_value from diagnose_logs where ts <= '2009-12-25
23:59:59' and hardware_id = $i order by ts desc limit 1");
}

can I turn this for-loop into a single query to run in postgres?

Thanks,
Pedro

Try:

SELECT diag_value FROM diagnose_logs a where id in
(
SELECT id FROM diagnose_logs b
WHERE a.hardware_id=b.hardware_id
and ts <= '2009-12-25 23:59:59'
and hardware_id between 1 and 500
ORDER BY ts LIMIT 1)
ORDER BY hardware_id;