SQL Help - Finding Next Lowest Value of Current Row Value

Started by Jeff Adamsover 14 years ago5 messagesgeneral
Jump to latest
#1Jeff Adams
Jeff.Adams@noaa.gov

Greetings,

I have a large table (~19 million records). Records contains a field
identifying a vessel and a field containing an time (epoch). Using the
current rows vessel and time values, I need to be able to find the next
lowest time value for the vessel and use it to compute how much time has
elapsed between the records. I have tried a scalar subquery in the SELECT,
which works, but it runs quite slowly. Is there an alternative approach that
might perform better for this type of query. Any information would be
greatly appreciated. Thanks...

Jeff

#2Chris Travers
chris.travers@gmail.com
In reply to: Jeff Adams (#1)
Re: SQL Help - Finding Next Lowest Value of Current Row Value

On Sat, Oct 1, 2011 at 2:30 PM, Jeff Adams <Jeff.Adams@noaa.gov> wrote:

Greetings,

I have a large table (~19 million records). Records contains a field
identifying a vessel and a field containing an time (epoch). Using the
current rows vessel and time values, I need to be able to find the next
lowest time value for the vessel and use it to compute how much time has
elapsed between the records. I have tried a scalar subquery in the SELECT,
which works, but it runs quite slowly. Is there an alternative approach that
might perform better for this type of query. Any information would be
greatly appreciated. Thanks...

How many rows per vessel?

Either way it seems to me we are talking about selecting the current
record, and then selecting the max of another record within a
timeframe. I would probably use a plpgsql function that would do this
in two stages, ideally being able to do an index scan twice....

Best Wishes,
Chris Travers

#3Chris Curvey
chris@chriscurvey.com
In reply to: Jeff Adams (#1)
Re: SQL Help - Finding Next Lowest Value of Current Row Value

On Sat, Oct 1, 2011 at 5:30 PM, Jeff Adams <Jeff.Adams@noaa.gov> wrote:

Greetings,

I have a large table (~19 million records). Records contains a field
identifying a vessel and a field containing an time (epoch). Using the
current rows vessel and time values, I need to be able to find the next
lowest time value for the vessel and use it to compute how much time has
elapsed between the records. I have tried a scalar subquery in the SELECT,
which works, but it runs quite slowly. Is there an alternative approach
that
might perform better for this type of query. Any information would be
greatly appreciated. Thanks...

Jeff

Would a self-join with a MAX() help, like this? (Where "v" is your
vessel_id and "e" is your time value?)

create table stuff
( v int
, e timestamp
);

insert into stuff (v, e) values (1, '1/1/2011');
insert into stuff (v, e) values (1, '1/2/2011');
insert into stuff (v, e) values (1, '1/3/2011');

insert into stuff (v, e) values (2, '2/1/2011');
insert into stuff (v, e) values (2, '2/2/2011');

select a.v, a.e, max(b.e), a.e - max(b.e)
from stuff a
join stuff b on a.v = b.v
where a.e > b.e
group by a.v, a.e

I don't have a multi-million row table handy, but I'd be interested to hear
your results.

--
e-Mail is the equivalent of a postcard written in pencil. This message may
not have been sent by me, or intended for you. It may have been read or
even modified while in transit. e-Mail disclaimers have the same force in
law as a note passed in study hall. If your corporate attorney says that
you need an disclaimer in your signature, you need a new corporate attorney.

#4Thomas Kellerer
spam_eater@gmx.net
In reply to: Jeff Adams (#1)
Re: SQL Help - Finding Next Lowest Value of Current Row Value

Jeff Adams wrote on 01.10.2011 23:30:

Greetings,

I have a large table (~19 million records). Records contains a field
identifying a vessel and a field containing an time (epoch). Using the
current rows vessel and time values, I need to be able to find the next
lowest time value for the vessel and use it to compute how much time has
elapsed between the records. I have tried a scalar subquery in the SELECT,
which works, but it runs quite slowly. Is there an alternative approach that
might perform better for this type of query. Any information would be
greatly appreciated. Thanks...

Jeff

Something like:

SELECT vessel,
time_column,
time_column - lag(time_column) over (partition by vessel order by time_column) as diff
FROM your_table

No sure how good that performs though.

#5Jeff Adams
Jeff.Adams@noaa.gov
In reply to: Chris Curvey (#3)
Re: SQL Help - Finding Next Lowest Value of Current Row Value

Thanks for the suggestions Chris (and Chris). After a bit more investigation
I stumbled upon the Window functions. The approach below turned out to be
much more efficient that a function or self join approach. The SQL that I
used is provided below (event_id and mmsi uniquely identify a vessel transit
for which I wished to compute how much time had elapsed between successive
records):

SELECT

a.event_id,

a.mmsi,

(a.epoch - lag(epoch) OVER (PARTITION BY event_id, mmsi ORDER BY epoch
ASC))/60 AS elapsed

FROM

dmas_ais a

Jeff

From: ccurvey@gmail.com [mailto:ccurvey@gmail.com] On Behalf Of Chris Curvey
Sent: Saturday, October 01, 2011 10:55 PM
To: Jeff Adams
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] SQL Help - Finding Next Lowest Value of Current Row
Value

On Sat, Oct 1, 2011 at 5:30 PM, Jeff Adams <Jeff.Adams@noaa.gov> wrote:

Greetings,

I have a large table (~19 million records). Records contains a field
identifying a vessel and a field containing an time (epoch). Using the
current rows vessel and time values, I need to be able to find the next
lowest time value for the vessel and use it to compute how much time has
elapsed between the records. I have tried a scalar subquery in the SELECT,
which works, but it runs quite slowly. Is there an alternative approach that
might perform better for this type of query. Any information would be
greatly appreciated. Thanks...

Jeff

Would a self-join with a MAX() help, like this? (Where "v" is your
vessel_id and "e" is your time value?)

create table stuff

( v int

, e timestamp

);

insert into stuff (v, e) values (1, '1/1/2011');

insert into stuff (v, e) values (1, '1/2/2011');

insert into stuff (v, e) values (1, '1/3/2011');

insert into stuff (v, e) values (2, '2/1/2011');

insert into stuff (v, e) values (2, '2/2/2011');

select a.v, a.e, max(b.e), a.e - max(b.e)

from stuff a

join stuff b on a.v = b.v

where a.e > b.e

group by a.v, a.e

I don't have a multi-million row table handy, but I'd be interested to hear
your results.

--

e-Mail is the equivalent of a postcard written in pencil. This message may
not have been sent by me, or intended for you. It may have been read or
even modified while in transit. e-Mail disclaimers have the same force in
law as a note passed in study hall. If your corporate attorney says that
you need an disclaimer in your signature, you need a new corporate attorney.