max time in a table query takes ages

Started by Grzegorz Jaśkiewicz over 17 years ago · 8 messages · general
#1Grzegorz Jaśkiewicz
gryzman@gmail.com

hey folks

I have a simple query over a fairly simple table here that scans for the max
date in a table that's fairly huge (300M rows). There's an index on that field
and it's being used, but for whatever reason it takes ages. Ideas?

select date_trunc('day', max(data)) into dt from staticstats where
processed = false

explain analyze:

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=3.89..3.90 rows=1 width=0) (actual
time=2558459.883..2558459.884 rows=1 loops=1)
InitPlan
-> Limit (cost=0.00..3.89 rows=1 width=8) (actual
time=2558362.751..2558362.753 rows=1 loops=1)
-> Index Scan Backward using sstats_date_idx on staticstats
(cost=0.00..1566198296.88 rows=402561795 width=8) (actual
time=2558362.747..2558362.747 rows=1 loops=1)
Filter: ((data IS NOT NULL) AND (NOT processed))
Total runtime: 2558540.800 ms
(6 rows)

Time: 2558545.012 ms

One thing that surprises me is the "data IS NOT NULL" filter, since that
column is declared NOT NULL. Take a look at the schema:

staty=> \d+ staticstats
Table "public.staticstats"
Column | Type | Modifiers | Description
-----------+--------------------------------+------------------------------------------------------+-------------
data | timestamp(0) without time zone | not null |
size | integer | not null default 0 |
proto | integer | not null |
macfrom | integer | not null |
macto | integer | not null |
processed | boolean | not null default false |
id | bigint | not null default nextval('sstatic_id_seq'::regclass) |
Indexes:
"blah123s" PRIMARY KEY, btree (macto, data, proto, macfrom)
"sstats_id_idx" UNIQUE, btree (id)
"sstats_date_idx" btree (data)
"staticstat_processed_idxs" btree (processed)
Foreign-key constraints:
"staty_fk1s" FOREIGN KEY (macfrom) REFERENCES macs(id)
"staty_fks" FOREIGN KEY (macto) REFERENCES macs(id)
Has OIDs: no

It takes milliseconds if something has recently been added to that table.
The table itself is vacuumed/analyzed quite often, and is more or less
clustered by sstats_date_idx, although in this instance I wasn't able to
recluster it because there isn't enough disk space (only 45GB free, and
for whatever reason, even though the table is only about 25GB in size,
PostgreSQL requires more than 40GB of space to recluster it).

Any hints, please?

--
GJ

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Grzegorz Jaśkiewicz (#1)
Re: max time in a table query takes ages

Hello

try VACUUM and REINDEX

regards
Pavel Stehule


#3Grzegorz Jaśkiewicz
gryzman@gmail.com
In reply to: Pavel Stehule (#2)
Re: max time in a table query takes ages

If reindexing helps (I will run it overnight and let you know tomorrow),
I find it quite worrying, because this is 8.3, and I was hoping, with HOT
in place and all these features, that reindexing of that table wouldn't be
needed. It is 'only' 375M rows now, but I can definitely feel the pain
people with much bigger DBs/tables have if they need to reindex once in a
while.

I also have to think about maybe partitioning that table by quarters, or
even months, but that's a subject for a different set of questions if I run
into any problems here.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Grzegorz Jaśkiewicz (#1)
Re: max time in a table query takes ages

Grzegorz Jaśkiewicz <gryzman@gmail.com> writes:

I have a simple query over a fairly simple table here that scans for the max
date in a table that's fairly huge (300M rows). There's an index on that field
and it's being used, but for whatever reason it takes ages. Ideas?

select date_trunc('day', max(data)) into dt from staticstats where
processed = false

I suppose the problem is that rows with processed = false are very few
in the upper range of data. If so, and if you really need this to go
fast, a partial index might be worth its overhead:
create index foo on staticstats(data) where processed = false;
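
As an illustration of the idea, here is a runnable sketch. SQLite stands in
for PostgreSQL (both support partial indexes), and the table is a hypothetical
miniature of the one in the thread:

```python
import sqlite3

# Hypothetical miniature of the staticstats table from the thread.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE staticstats (
        data      TEXT    NOT NULL,  -- ISO 8601 timestamps sort correctly as text
        processed INTEGER NOT NULL DEFAULT 0
    )
""")
conn.executemany(
    "INSERT INTO staticstats VALUES (?, ?)",
    [
        ("2008-10-01 00:00:00", 1),
        ("2008-10-15 00:00:00", 0),  # the only unprocessed row
        ("2008-10-22 00:00:00", 1),  # the newest row is already processed
    ],
)

# The partial index covers only the (typically few) unprocessed rows, so
# max(data) among them becomes a single index probe instead of a backward
# scan that has to skip over millions of processed rows.
conn.execute("CREATE INDEX foo ON staticstats(data) WHERE processed = 0")

(latest,) = conn.execute(
    "SELECT max(data) FROM staticstats WHERE processed = 0"
).fetchone()
print(latest)  # -> 2008-10-15 00:00:00
```

The PostgreSQL form is exactly the statement above:
create index foo on staticstats(data) where processed = false;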

regards, tom lane

#5Grzegorz Jaśkiewicz
gryzman@gmail.com
In reply to: Tom Lane (#4)
Re: max time in a table query takes ages

On Thu, Oct 23, 2008 at 2:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I suppose the problem is that rows with processed = false are very few
in the upper range of data. If so, and if you really need this to go
fast, a partial index might be worth its overhead:
create index foo on staticstats(data) where processed = false;

I mostly scan that table for processed = false. Everything with processed =
true is really useless, and is only there to guard other tables against
duplication (all other tables contain pretty much the same data, but the
relation information isn't stored, because the whole structure is quite
complicated, and I need to retrieve data from the other tables in a matter
of milliseconds: I get live graphs plotted based on a huge range of input
parameters, with no chance to pre-render them).
I also thought about: create index bar on staticstats(data, processed) where
processed = false;
but that would be useful only for that query.
Can Postgres use combined indices for queries that only require part of
them?
In other words, do I have to create an index for every possible combination
that can be used in a query? What's the best way to do it?

--
GJ

#6Scott Ribe
scott_ribe@killerbytes.com
In reply to: Grzegorz Jaśkiewicz (#5)
Re: max time in a table query takes ages

Can Postgres use combined indices for queries that only require part of
them?

Even if not, if there is at least one index that reduces the potential
matches to a small set, then scanning those rows against the other criteria
won't take so long. (Assuming good stats and PG choosing a good plan.)
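
That leading-column behaviour can be checked directly against a planner. A
minimal sketch, using SQLite's EXPLAIN QUERY PLAN as a stand-in for
PostgreSQL's EXPLAIN, with the index and column names borrowed from the
thread's schema:

```python
import sqlite3

# Sketch: a multicolumn btree index can serve a query that constrains only
# its leading column(s).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE staticstats (
        macto   INTEGER NOT NULL,
        data    TEXT    NOT NULL,
        proto   INTEGER NOT NULL,
        macfrom INTEGER NOT NULL
    )
""")
conn.execute(
    "CREATE INDEX blah123s ON staticstats(macto, data, proto, macfrom)"
)
conn.execute("INSERT INTO staticstats VALUES (1, '2008-10-23', 6, 2)")

# The query constrains only macto, the leading index column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM staticstats WHERE macto = 1"
).fetchall()
detail = " ".join(row[3] for row in plan)
print(detail)  # mentions blah123s: the multicolumn index is used
```

In PostgreSQL a multicolumn btree index is most effective when the query
constrains the leading column(s); constraints on later columns alone can
still use the index, but far less efficiently.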

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice

#7Grzegorz Jaśkiewicz
gryzman@gmail.com
In reply to: Scott Ribe (#6)
Re: max time in a table query takes ages

That index did the job; reindexing, and getting rid of two other rarely
used indices, also helped a lot. Now, this whole machine is a fairly simple
two-way P4, with two SATA discs in software RAID 1 on Linux, and it seems
to spend loads of time (40-60% sometimes) on I/O waits. I guess this is due
to the lack of AIO support in PostgreSQL, but it's still times better than
what it used to be on 8.1 (the machine is running 8.3).

#8Alan Hodgson
ahodgson@simkin.ca
In reply to: Grzegorz Jaśkiewicz (#7)
Re: max time in a table query takes ages

On Friday 24 October 2008, "Grzegorz Jaśkiewicz" <gryzman@gmail.com> wrote:

with two sata discs in software raid 1 on linux. And it seems to spend
loads of time (40-60% sometimes) on waits. I guess this is due to lack of
aio support in postgresql,

No, it's due to the fact that hard disks are slow; much, much slower than
CPU or memory.

--
Alan