Question about plan

Started by Martijn van Oosterhout over 24 years ago · 3 messages · general
#1 Martijn van Oosterhout
kleptog@svana.org

Basically, I have one table A with a set of start times and durations, and
another table B with just a set of times. What I want is a result that, for
each time in table B, counts the number of intervals in table A that it
falls within.

The query is not so difficult but it's a large set. The plan looks like
this:

Aggregate  (cost=17365.57..17671.74 rows=3062 width=28)
  ->  Group  (cost=17365.57..17595.20 rows=30618 width=28)
        ->  Sort  (cost=17365.57..17365.57 rows=30618 width=28)
              ->  Nested Loop  (cost=0.00..14718.41 rows=30618 width=28)
                    ->  Seq Scan on internetusage  (cost=0.00..596.04 rows=191 width=16)
                    ->  Seq Scan on times  (cost=0.00..23.40 rows=1440 width=12)

The problem is that it is running out of memory (after using around 500MB).
I imagine this is because it has to build the entire result in the sort
step. Can someone tell me why it doesn't use a plan like this one?

Aggregate
  ->  Group
        ->  Merge Join
              ->  Sort
                    ->  Seq Scan on internetusage
              ->  Sort
                    ->  Seq Scan on times

This would use a fraction of the memory, and in this case it would also be
faster since it could start producing output sooner. Using limit 1 has no
effect. The version is 7.0.2.

If later versions fix it, let me know. Workarounds would be appreciated
also. The actual query follows:

select date_trunc('day', starttime) + minute as date,
       count(iuid)
from times, internetusage
where date_trunc('month', starttime) = '2001-07-01'
  and minute between starttime::time
              and starttime::time + (duration || 'seconds')::interval
group by date, starttime, minute;
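
One workaround that may bound the memory use is to process the month a day
at a time, using a plain range predicate on starttime. Unlike the
date_trunc('month', ...) = ... form, a range predicate could also make use
of an index on starttime if one exists. A sketch, reusing the column names
from the query above (the date literals are for one example day):

-- Hypothetical workaround: run the query for one day at a time to keep
-- the sort's working set small; repeat for each day of the month.
select date_trunc('day', starttime) + minute as date,
       count(iuid)
from times, internetusage
where starttime >= '2001-07-01' and starttime < '2001-07-02'
  and minute between starttime::time
              and starttime::time + (duration || 'seconds')::interval
group by date, starttime, minute;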
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/


It would be nice if someone came up with a certification system that
actually separated those who can barely regurgitate what they crammed over
the last few weeks from those who command secret ninja networking powers.

#2 omid omoomi
oomoomi@hotmail.com
In reply to: Martijn van Oosterhout (#1)
Re: Question about plan

hi,
Haven't you made any index on internetusage(starttime)? I think it would be
very useful.
At the moment I have the same internet accounting system with a log table of
about 500'000 records, and it is doing well even with pg v. 6.5.2. I have
indexed my table on (userID, starttime).
If I were you, I would change my table so that it contains
starttime and stoptime, indexed on starttime (if no userID is needed).
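
A sketch of the schema change described above (the index name is
hypothetical; the table and column names follow the original query):

-- Hypothetical sketch: store an explicit stoptime alongside starttime,
-- then index on starttime as suggested above.
alter table internetusage add column stoptime timestamp;
update internetusage
   set stoptime = starttime + (duration || 'seconds')::interval;
create index internetusage_starttime_idx on internetusage (starttime);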

HTH
Omid

From: Martijn van Oosterhout <kleptog@svana.org>
Reply-To: Martijn van Oosterhout <kleptog@svana.org>
To: Postgresql General <pgsql-general@postgresql.org>
Subject: [GENERAL] Question about plan
Date: Wed, 29 Aug 2001 16:55:48 +1000

[quoted text of message #1 trimmed]



#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#1)
Re: Question about plan

Martijn van Oosterhout <kleptog@svana.org> writes:
> The problem is that it is running out of memory (after using around 500MB).

Update to 7.1, which fixes lots of intra-query memory leakage problems.

regards, tom lane