Improve a query...

Started by Eric G. Milleralmost 25 years ago6 messagesgeneral
Jump to latest
#1Eric G. Miller
egm2@jps.net

Looking for the best way to formulate a query to select
the most "recent" entry for an organization in a table
like:

CREATE TABLE reports (

-- Report Id used to link up related 1:M rather than multi-key
rpt_id SERIAL NOT NULL PRIMARY KEY,

-- A Unique ID for the organization
org_id char(10) NOT NULL CHECK(CHARACTER_LENGTH(op_id) = 10),

-- The reporting period
period integer NOT NULL

-- Various and Sundry ...
.
.
.

UNIQUE (org_id,period)
);

If max(period) for an organization yields the most recent reporting,
I want to get whatever is the most recent report for each organization.

This query works, but seems expensive...

SELECT a.rpt_id, a.org_id, a.period, ...
FROM reports As a
INNER JOIN
(SELECT b.org_id, max(b.period) As period
FROM reports b group by b.org_id) As c
ON a.org_id = c.org_id and a.period = c.period;

EXPLAIN looks thusly:

NOTICE: QUERY PLAN:

Merge Join (cost=147.98..164.48 rows=10 width=48)
-> Sort (cost=69.83..69.83 rows=1000 width=32)
-> Seq Scan on reports a (cost=0.00..20.00 rows=1000 width=32)
-> Sort (cost=78.15..78.15 rows=100 width=16)
-> Subquery Scan c (cost=69.83..74.83 rows=100 width=16)
-> Aggregate (cost=69.83..74.83 rows=100 width=16)
-> Group (cost=69.83..72.33 rows=1000 width=16)
-> Sort (cost=69.83..69.83 rows=1000 width=16)
-> Seq Scan on reports b (cost=0.00..20.00
rows=1000 width=16)

The data is very hierarchical so I didn't want to carry around alot of
key fields in related "many-sided" tables which may also have related
"many-sided" tables. Any ideas on how to minimize the multiple
scans on the table? The numbers for explain probably aren't telling
much since there's not much real data in the table at this time...

--
Eric G. Miller <egm2@jps.net>

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Eric G. Miller (#1)
Re: Improve a query...

"Eric G. Miller" <egm2@jps.net> writes:

Looking for the best way to formulate a query to select
the most "recent" entry for an organization in a table
like:

Take a look at the SELECT reference page's example for
SELECT DISTINCT ON:

: For example,
:
: SELECT DISTINCT ON (location) location, time, report
: FROM weatherReports
: ORDER BY location, time DESC;
:
: retrieves the most recent weather report for each location.

A tad nonstandard, but bloody effective, particularly if you
have an appropriate index in place...

regards, tom lane

#3Eric G. Miller
egm2@jps.net
In reply to: Tom Lane (#2)
Re: Improve a query...

On Wed, May 02, 2001 at 01:17:44AM -0400, Tom Lane wrote:

"Eric G. Miller" <egm2@jps.net> writes:

Looking for the best way to formulate a query to select
the most "recent" entry for an organization in a table
like:

Take a look at the SELECT reference page's example for
SELECT DISTINCT ON:

: For example,
:
: SELECT DISTINCT ON (location) location, time, report
: FROM weatherReports
: ORDER BY location, time DESC;
:
: retrieves the most recent weather report for each location.

A tad nonstandard, but bloody effective, particularly if you
have an appropriate index in place...

Tricky! After adding the unique 2 column index and using that construct
I get:

NOTICE: QUERY PLAN:

Unique (cost=1.20..1.22 rows=1 width=32)
-> Sort (cost=1.20..1.20 rows=8 width=32)
-> Seq Scan on reports (cost=0.00..1.08 rows=8 width=32)

EXPLAIN

--
Eric G. Miller <egm2@jps.net>

#4Jeff Eckermann
jeckermann@verio.net
In reply to: Eric G. Miller (#3)
RE: Improve a query...

Try:
SELECT DISTINCT ON (org_id) rpt_id, org_id, period,...
FROM reports
ORDER BY period DESC;

Show quoted text

-----Original Message-----
From: Eric G. Miller [SMTP:egm2@jps.net]
Sent: Tuesday, May 01, 2001 4:04 AM
To: PostgreSQL General
Subject: [GENERAL] Improve a query...

Looking for the best way to formulate a query to select
the most "recent" entry for an organization in a table
like:

CREATE TABLE reports (

-- Report Id used to link up related 1:M rather than multi-key
rpt_id SERIAL NOT NULL PRIMARY KEY,

-- A Unique ID for the organization
org_id char(10) NOT NULL CHECK(CHARACTER_LENGTH(op_id) = 10),

-- The reporting period
period integer NOT NULL

-- Various and Sundry ...
.
.
.

UNIQUE (org_id,period)
);

If max(period) for an organization yields the most recent reporting,
I want to get whatever is the most recent report for each organization.

This query works, but seems expensive...

SELECT a.rpt_id, a.org_id, a.period, ...
FROM reports As a
INNER JOIN
(SELECT b.org_id, max(b.period) As period
FROM reports b group by b.org_id) As c
ON a.org_id = c.org_id and a.period = c.period;

EXPLAIN looks thusly:

NOTICE: QUERY PLAN:

Merge Join (cost=147.98..164.48 rows=10 width=48)
-> Sort (cost=69.83..69.83 rows=1000 width=32)
-> Seq Scan on reports a (cost=0.00..20.00 rows=1000 width=32)
-> Sort (cost=78.15..78.15 rows=100 width=16)
-> Subquery Scan c (cost=69.83..74.83 rows=100 width=16)
-> Aggregate (cost=69.83..74.83 rows=100 width=16)
-> Group (cost=69.83..72.33 rows=1000 width=16)
-> Sort (cost=69.83..69.83 rows=1000 width=16)
-> Seq Scan on reports b
(cost=0.00..20.00
rows=1000 width=16)

The data is very hierarchical so I didn't want to carry around alot of
key fields in related "many-sided" tables which may also have related
"many-sided" tables. Any ideas on how to minimize the multiple
scans on the table? The numbers for explain probably aren't telling
much since there's not much real data in the table at this time...

--
Eric G. Miller <egm2@jps.net>

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

#5Peter Eisentraut
peter_e@gmx.net
In reply to: Eric G. Miller (#1)
Re: Improve a query...

Eric G. Miller writes:

If max(period) for an organization yields the most recent reporting,
I want to get whatever is the most recent report for each organization.

SELECT org_id, max(period) FROM reports GROUP BY org_id;

--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter

#6Eric G. Miller
egm2@jps.net
In reply to: Peter Eisentraut (#5)
Re: Improve a query...

On Wed, May 02, 2001 at 07:30:13PM +0200, Peter Eisentraut wrote:

Eric G. Miller writes:

If max(period) for an organization yields the most recent reporting,
I want to get whatever is the most recent report for each organization.

SELECT org_id, max(period) FROM reports GROUP BY org_id;

Naw, doesn't work. I need the other columns, so the GROUP BY is out
(except in a subselect). The DISTINCT ON () syntax, though
non-standard, seems to do the job.

--
Eric G. Miller <egm2@jps.net>