max time in a table query takes ages

Started by Grzegorz Jaśkiewiczover 17 years ago8 messagesgeneral

gryzman@gmail.com

over 17 years ago

hey folks

I have a simple query over a fairly simple query here, that scans for max
date in a table that's fairly hudge (300M rows). there's index on that field
that's being used, but for whatever reason, it takes ages. Ideas ?

select date_trunc('day', max(data)) into dt from staticstats where
processed = false

explain analyze:

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=3.89..3.90 rows=1 width=0) (actual
time=2558459.883..2558459.884 rows=1 loops=1)
InitPlan
-> Limit (cost=0.00..3.89 rows=1 width=8) (actual
time=2558362.751..2558362.753 rows=1 loops=1)
-> Index Scan Backward using sstats_date_idx on staticstats
(cost=0.00..1566198296.88 rows=402561795 width=8) (actual
time=2558362.747..2558362.747 rows=1 loops=1)
Filter: ((data IS NOT NULL) AND (NOT processed))
Total runtime: 2558540.800 ms
(6 rows)

Time: 2558545.012 ms

one thing I am amazed by, is the filter data is not null, well - take a look
at the schema here:

it takes ms if there's somethign that's been recently added to that table.
The table itself is vacuumed/analyzed quite often, and more or less
clustered by sstats_date_idx - althrough in that instance, I wasn't able to
recluster it - because there's not enough disc space (only 45GB free, and
for whatever reason - even tho the table is only about 25GB in size -
postgresql requires more than 40GB of space to recluster it).

any hints please ?

--
GJ

Pavel Stehule

pavel.stehule@gmail.com

over 17 years ago

In reply to: Grzegorz Jaśkiewicz (#1)

Re: max time in a table query takes ages

Hello

try VACUUM and REINDEX

regards
Pavel Stehule

2008/10/23 Grzegorz Jaśkiewicz <gryzman@gmail.com>:

Show quoted text

hey folks

I have a simple query over a fairly simple query here, that scans for max
date in a table that's fairly hudge (300M rows). there's index on that field
that's being used, but for whatever reason, it takes ages. Ideas ?

select date_trunc('day', max(data)) into dt from staticstats where
processed = false

explain analyze:

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=3.89..3.90 rows=1 width=0) (actual
time=2558459.883..2558459.884 rows=1 loops=1)
InitPlan
-> Limit (cost=0.00..3.89 rows=1 width=8) (actual
time=2558362.751..2558362.753 rows=1 loops=1)
-> Index Scan Backward using sstats_date_idx on staticstats
(cost=0.00..1566198296.88 rows=402561795 width=8) (actual
time=2558362.747..2558362.747 rows=1 loops=1)
Filter: ((data IS NOT NULL) AND (NOT processed))
Total runtime: 2558540.800 ms
(6 rows)

Time: 2558545.012 ms

one thing I am amazed by, is the filter data is not null, well - take a look
at the schema here:

staty=> \d+ staticstats
Table "public.staticstats"
Column | Type | Modifiers | Description
-----------+--------------------------------+------------------------------------------------------+-------------
data | timestamp(0) without time zone | not null |
size | integer | not null default 0 |
proto | integer | not null |
macfrom | integer | not null |
macto | integer | not null |
processed | boolean | not null default false |
id | bigint | not null default nextval('sstatic_id_seq'::regclass) |
Indexes:
"blah123s" PRIMARY KEY, btree (macto, data, proto, macfrom)
"sstats_id_idx" UNIQUE, btree (id)
"sstats_date_idx" btree (data)
"staticstat_processed_idxs" btree (processed)
Foreign-key constraints:
"staty_fk1s" FOREIGN KEY (macfrom) REFERENCES macs(id)
"staty_fks" FOREIGN KEY (macto) REFERENCES macs(id)
Has OIDs: no

it takes ms if there's somethign that's been recently added to that table.
The table itself is vacuumed/analyzed quite often, and more or less
clustered by sstats_date_idx - althrough in that instance, I wasn't able to
recluster it - because there's not enough disc space (only 45GB free, and
for whatever reason - even tho the table is only about 25GB in size -
postgresql requires more than 40GB of space to recluster it).

any hints please ?

--

GJ

Grzegorz Jaśkiewicz

gryzman@gmail.com

over 17 years ago

In reply to: Pavel Stehule (#2)

Re: max time in a table query takes ages

if reindex will help (which I will run overnight, and will let you know
Tomorrow) - I find it quite worrying, cos it is 8.3, and I was hoping - with
HOT in place, and all these features - that reindexing of that table
wouldn't be needed. it is 'only' 375M rows now, but I can definetively feel
pain ppl with much bigger dbs/tables have, if they have to reindex once in a
while.

I also have to think about maybe partitioning that table by quarters, or
months even, but that's a subject for different set of questions - if I run
into any problems here.

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Grzegorz Jaśkiewicz (#1)

Re: max time in a table query takes ages

"=?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?=" <gryzman@gmail.com> writes:

I have a simple query over a fairly simple query here, that scans for max
date in a table that's fairly hudge (300M rows). there's index on that field
that's being used, but for whatever reason, it takes ages. Ideas ?

select date_trunc('day', max(data)) into dt from staticstats where
processed = false

I suppose the problem is that rows with processed = false are very few
in the upper range of data. If so, and if you really need this to go
fast, a partial index might be worth its overhead:
create index foo on staticstats(data) where processed = false;

regards, tom lane

Grzegorz Jaśkiewicz

gryzman@gmail.com

over 17 years ago

In reply to: Tom Lane (#4)

Re: max time in a table query takes ages

On Thu, Oct 23, 2008 at 2:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I suppose the problem is that rows with processed = false are very few
in the upper range of data. If so, and if you really need this to go
fast, a partial index might be worth its overhead:
create index foo on staticstats(data) where processed = false;

I mostly scan that table for processed=false. Everything with processed =
true is really useless, and there to guard other tables against duplication
(all other tables contain pretty much the same data, but information about
relation isn't stored - because whole structure is quite complicated, and I
need to retrieve data from other tables in matter of miliseconds - I get
live graphs plotted based on huge range of input params, no chance to
pre-render it).
I also thought about create index bar staticstats(data, processed) where
processed = false;
but that would be useful only to that query.
Can postgres use combined indicies for queries that would only require part
of it ?
iow, do I have to create index for every possible combination that can be
used in a query ? what's the best way to do it ?

--
GJ

Scott Ribe

scott_ribe@killerbytes.com

over 17 years ago

In reply to: Grzegorz Jaśkiewicz (#5)

Re: max time in a table query takes ages

Can postgres use combined indicies for queries that would only require part of
it ?

Even if not, if there is at least one index that reduces the potential
matches to a small set, then scanning those rows against the other criteria
won't take so long. (Assuming good stats and PG choosing a good plan.)

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice

Grzegorz Jaśkiewicz

gryzman@gmail.com

over 17 years ago

In reply to: Scott Ribe (#6)

Re: max time in a table query takes ages

that index did the job, also reindexing, and getting rid of two other not
quite often used indices helped a lot. Now, this whole machine is fairly
simple two way p4, with two sata discs in software raid 1 on linux. And it
seems to spend loads of time (40-60% sometimes) on waits. I guess this is
due to lack of aio support in postgresql, but still its times better from
what it used to be on 8.1 (the machine is running 8.3).

Alan Hodgson

ahodgson@simkin.ca

over 17 years ago

In reply to: Grzegorz Jaśkiewicz (#7)

Re: max time in a table query takes ages

On Friday 24 October 2008, "Grzegorz Jaśkiewicz" <gryzman@gmail.com> wrote:

with two sata discs in software raid 1 on linux. And it seems to spend
loads of time (40-60% sometimes) on waits. I guess this is due to lack of

aio support in postgresql,

No, it's due to the fact that hard disks are slow; much, much slower than
CPU or memory.

--
Alan