more problems with count(*) on large table

Started by Mike Charnokyover 18 years ago27 messagesgeneral

Jump to latest

Mike Charnoky

noky@nextbus.com

over 18 years ago

Hi,

I am still having problems performing a count(*) on a large table. This
is a followup from a recent thread:

http://archives.postgresql.org/pgsql-general/2007-09/msg00561.php

Since the last time these problems happened, we have tweaked some
postgresql config parameters (fsm, etc). I also recreated the large
table, with the assumption it was somehow corrupted.

Now, certain count(*) queries are failing to complete for certain time
ranges (I killed the query after about 24 hours). The table is indexed
on a timestamp field. Here is one query that hangs:

select count(*) from mytable where evtime between '2007-09-26' and
'2007-09-27';

However, this query runs successfully and takes 2 minutes:

select count(*) from mytable where evtime between '2007-09-25' and
'2007-09-26';

count
----------
14150928
(1 row)

What is going on? I analyzed the table before running the query and
have no reason to believe that the amount of data added to the table
varies much from day to day. No data has been deleted from the table
yet, just added.

Here is some config info:

PostgreSQL 8.1.8 on Fedora 3

shared_buffers = 8000
temp_buffers = 1000
work_mem = 16384
maintenance_work_mem = 262144
max_fsm_pages = 500000
max_fsm_relations = 30000

Mike

A. Kretschmer

andreas.kretschmer@schollglas.com

over 18 years ago

In reply to: Mike Charnoky (#1)

Re: more problems with count(*) on large table

am Fri, dem 28.09.2007, um 11:56:46 -0400 mailte Mike Charnoky folgendes:

Hi,

I am still having problems performing a count(*) on a large table. This

Now, certain count(*) queries are failing to complete for certain time
ranges (I killed the query after about 24 hours). The table is indexed
on a timestamp field. Here is one query that hangs:

Again: an index can't help! Because of MVCC: 'select count(*)' without
WHERE-condition forces an seq. table-scan.

Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net

Alvaro Herrera

alvherre@2ndquadrant.com

over 18 years ago

In reply to: A. Kretschmer (#2)

Re: more problems with count(*) on large table

A. Kretschmer wrote:

am Fri, dem 28.09.2007, um 11:56:46 -0400 mailte Mike Charnoky folgendes:

Hi,

I am still having problems performing a count(*) on a large table. This

Now, certain count(*) queries are failing to complete for certain time
ranges (I killed the query after about 24 hours). The table is indexed
on a timestamp field. Here is one query that hangs:

Again: an index can't help! Because of MVCC: 'select count(*)' without
WHERE-condition forces an seq. table-scan.

But he does have a WHERE condition. THe problem is, probably, that the
condition is not selective enough so the planner chooses to do a
seqscan.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Andrew Sullivan

ajs@crankycanuck.ca

over 18 years ago

In reply to: Alvaro Herrera (#3)

Re: more problems with count(*) on large table

On Fri, Sep 28, 2007 at 12:50:34PM -0400, Alvaro Herrera wrote:

But he does have a WHERE condition. THe problem is, probably, that the
condition is not selective enough so the planner chooses to do a
seqscan.

Or else the planner has a bad idea of how selective the condition is.
I've found that this sort of inconsistent selectivity on an indexed
column can be because the column statistics aren't good enough. You
can try ALTER TABLE...SET STATISTICS to see if it helps.

--
Andrew Sullivan | ajs@crankycanuck.ca
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well.
--Dennis Ritchie

A. Kretschmer

andreas.kretschmer@schollglas.com

over 18 years ago

In reply to: Alvaro Herrera (#3)

Re: more problems with count(*) on large table

am Fri, dem 28.09.2007, um 12:50:34 -0400 mailte Alvaro Herrera folgendes:

A. Kretschmer wrote:

am Fri, dem 28.09.2007, um 11:56:46 -0400 mailte Mike Charnoky folgendes:

Hi,

I am still having problems performing a count(*) on a large table. This

Now, certain count(*) queries are failing to complete for certain time
ranges (I killed the query after about 24 hours). The table is indexed
on a timestamp field. Here is one query that hangs:

Again: an index can't help! Because of MVCC: 'select count(*)' without
WHERE-condition forces an seq. table-scan.

But he does have a WHERE condition. THe problem is, probably, that the
condition is not selective enough so the planner chooses to do a
seqscan.

Yes, i'm sorry, i havn't recognize this. Maybe wrong statistics. As
Andrew suggested, a 'ALTER TABLE...SET STATISTICS' can help.

Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net

Bruce Momjian

bruce@momjian.us

over 18 years ago

In reply to: Alvaro Herrera (#3)

Re: more problems with count(*) on large table

"Alvaro Herrera" <alvherre@commandprompt.com> writes:

A. Kretschmer wrote:

am Fri, dem 28.09.2007, um 11:56:46 -0400 mailte Mike Charnoky folgendes:

Hi,

I am still having problems performing a count(*) on a large table. This

Now, certain count(*) queries are failing to complete for certain time
ranges (I killed the query after about 24 hours). The table is indexed
on a timestamp field. Here is one query that hangs:

Again: an index can't help! Because of MVCC: 'select count(*)' without
WHERE-condition forces an seq. table-scan.

But he does have a WHERE condition. THe problem is, probably, that the
condition is not selective enough so the planner chooses to do a
seqscan.

What does EXPLAIN SELECT ... say?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

Sean Davis

sdavis2@mail.nih.gov

over 18 years ago

In reply to: Mike Charnoky (#1)

Re: more problems with count(*) on large table

Mike Charnoky wrote:

Hi,

I am still having problems performing a count(*) on a large table. This
is a followup from a recent thread:

http://archives.postgresql.org/pgsql-general/2007-09/msg00561.php

Since the last time these problems happened, we have tweaked some
postgresql config parameters (fsm, etc). I also recreated the large
table, with the assumption it was somehow corrupted.

Now, certain count(*) queries are failing to complete for certain time
ranges (I killed the query after about 24 hours). The table is indexed
on a timestamp field. Here is one query that hangs:

select count(*) from mytable where evtime between '2007-09-26' and
'2007-09-27';

However, this query runs successfully and takes 2 minutes:

select count(*) from mytable where evtime between '2007-09-25' and
'2007-09-26';

count
----------
14150928
(1 row)

What is going on? I analyzed the table before running the query and
have no reason to believe that the amount of data added to the table
varies much from day to day. No data has been deleted from the table
yet, just added.

Here is some config info:

PostgreSQL 8.1.8 on Fedora 3

Have you vacuumed recently?

Sean

Mike Charnoky

noky@nextbus.com

over 18 years ago

In reply to: Sean Davis (#7)

Re: more problems with count(*) on large table

The autovacuum is turned on. Since this is pg8.1, I don't know when the
table was actually last vacuumed. I *did* run analyze on the table,
though. Also, nothing has been deleted in this table... so vacuum
should have no affect, right?

Mike

Sean Davis wrote:

Show quoted text

Mike Charnoky wrote:

Hi,

I am still having problems performing a count(*) on a large table.

Have you vacuumed recently?

Sean

Bill Moran

wmoran@potentialtech.com

over 18 years ago

In reply to: Mike Charnoky (#8)

Re: more problems with count(*) on large table

In response to Mike Charnoky <noky@nextbus.com>:

The autovacuum is turned on. Since this is pg8.1, I don't know when the
table was actually last vacuumed. I *did* run analyze on the table,
though. Also, nothing has been deleted in this table... so vacuum
should have no affect, right?

Updated rows also produce dead tuples that require vacuuming. If the
table is insert only, you don't need vacuum.

--
Bill Moran
http://www.potentialtech.com

#10

Mike Charnoky

noky@nextbus.com

over 18 years ago

In reply to: Bruce Momjian (#6)

Re: more problems with count(*) on large table

With respect to the ALTER TABLE SET STATISTICS... how do I determine a
good value to use? This wasn't really clear in the pg docs. Also, do I
need to run ANALYZE on the table after I change the statistics?

Here are the EXPLAINs from the queries:

db=# explain select count(*) from prediction_accuracy where evtime
between '2007-09-25' and '2007-09-26';

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=475677.40..475677.41 rows=1 width=0)
-> Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..444451.44 rows=12490383 width=0)
Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
time zone))
(3 rows)

db=# explain select count(*) from prediction_accuracy where evtime
between '2007-09-26' and '2007-09-27';

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=486615.04..486615.05 rows=1 width=0)
-> Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..454671.07 rows=12777586 width=0)
Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with
time zone))
(3 rows)

Mike

Gregory Stark wrote:

Show quoted text

"Alvaro Herrera" <alvherre@commandprompt.com> writes:

A. Kretschmer wrote:

am Fri, dem 28.09.2007, um 11:56:46 -0400 mailte Mike Charnoky folgendes:

Hi,

I am still having problems performing a count(*) on a large table. This

Now, certain count(*) queries are failing to complete for certain time
ranges (I killed the query after about 24 hours). The table is indexed
on a timestamp field. Here is one query that hangs:

Again: an index can't help! Because of MVCC: 'select count(*)' without
WHERE-condition forces an seq. table-scan.

But he does have a WHERE condition. THe problem is, probably, that the
condition is not selective enough so the planner chooses to do a
seqscan.

What does EXPLAIN SELECT ... say?

#11

Scott Marlowe

scott.marlowe@gmail.com

over 18 years ago

In reply to: Mike Charnoky (#1)

Re: more problems with count(*) on large table

On 9/28/07, Mike Charnoky <noky@nextbus.com> wrote:

Hi,

I am still having problems performing a count(*) on a large table. This
is a followup from a recent thread:

http://archives.postgresql.org/pgsql-general/2007-09/msg00561.php

Since the last time these problems happened, we have tweaked some
postgresql config parameters (fsm, etc). I also recreated the large
table, with the assumption it was somehow corrupted.

Now, certain count(*) queries are failing to complete for certain time
ranges (I killed the query after about 24 hours). The table is indexed
on a timestamp field. Here is one query that hangs:

select count(*) from mytable where evtime between '2007-09-26' and
'2007-09-27';

However, this query runs successfully and takes 2 minutes:

select count(*) from mytable where evtime between '2007-09-25' and
'2007-09-26';

I had a similar problem pop up recently in my medium large reporting
database. I found the problem solved by two things, one was upping
the stats target on the time column to 100, and the other was running
regularly scheduled analyzes on the large reporting table, about once
every 8 hours.

autovacuum works fine otherwise, but was not running analyze often enough

#12

Alban Hertroys

a.hertroys@magproductions.nl

over 18 years ago

In reply to: A. Kretschmer (#2)

Re: more problems with count(*) on large table

A. Kretschmer wrote:

Again: an index can't help! Because of MVCC: 'select count(*)' without
WHERE-condition forces an seq. table-scan.

That has very little to do with MVCC.

If I understand correctly, MVCC is about the availability of records in
the current transaction. For that it makes no difference whether a seq
scan or an index scan is performed - both cases need to check at the
record level whether it's visible (where the seq scan is already looking
at the actual record, of course).

The only difference MVCC makes here is that the balance between a seq
scan or an index scan being more efficient is somewhat sooner in favour
of the seq scan than on some other databases, because the index scan
needs to look at the actual record for visibility.

I pleed not guilty ;)

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

#13

Alban Hertroys

a.hertroys@magproductions.nl

over 18 years ago

In reply to: Mike Charnoky (#10)

Re: more problems with count(*) on large table

Mike Charnoky wrote:

With respect to the ALTER TABLE SET STATISTICS... how do I determine a
good value to use? This wasn't really clear in the pg docs. Also, do I
need to run ANALYZE on the table after I change the statistics?

Here are the EXPLAINs from the queries:

db=# explain select count(*) from prediction_accuracy where evtime
between '2007-09-25' and '2007-09-26';

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=475677.40..475677.41 rows=1 width=0)
-> Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..444451.44 rows=12490383 width=0)
Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
time zone))
(3 rows)

db=# explain select count(*) from prediction_accuracy where evtime
between '2007-09-26' and '2007-09-27';

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=486615.04..486615.05 rows=1 width=0)
-> Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..454671.07 rows=12777586 width=0)
Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with
time zone))
(3 rows)

Interesting, same plans and no sequential scans... Yet totally different
run times. Almost as if something prevents you to read some records
between 26 and 27 september...

I'm no expert on locking in Postgres, but AFAIK locks that prevent you
from reading records are rather rare and probably only issued from
userland code.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

#14

Bruce Momjian

bruce@momjian.us

over 18 years ago

In reply to: Alban Hertroys (#13)

Re: more problems with count(*) on large table

"Alban Hertroys" <a.hertroys@magproductions.nl> writes:

Mike Charnoky wrote:

With respect to the ALTER TABLE SET STATISTICS... how do I determine a
good value to use? This wasn't really clear in the pg docs. Also, do I
need to run ANALYZE on the table after I change the statistics?

Here are the EXPLAINs from the queries:

db=# explain select count(*) from prediction_accuracy where evtime
between '2007-09-25' and '2007-09-26';

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=475677.40..475677.41 rows=1 width=0)
-> Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..444451.44 rows=12490383 width=0)
Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
time zone))
(3 rows)

db=# explain select count(*) from prediction_accuracy where evtime
between '2007-09-26' and '2007-09-27';

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=486615.04..486615.05 rows=1 width=0)
-> Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..454671.07 rows=12777586 width=0)
Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with
time zone))
(3 rows)

Interesting, same plans and no sequential scans... Yet totally different
run times. Almost as if something prevents you to read some records
between 26 and 27 september...

Just to be sure we're looking at the right plan do this:

\timing
PREPARE stmt AS
SELECT count(*)
FROM prediction_accuracy
WHERE evtime BETWEEN '2007-09-25' AND '2007-09-26';

EXPLAIN EXECUTE stmt;
EXECUTE stmt;

I'm no expert on locking in Postgres, but AFAIK locks that prevent you
from reading records are rather rare and probably only issued from
userland code.

Pages can be locked but never for very long.

What other work is going on in this server? Is there anything which might be
locking the table periodically (either a VACUUM FULL, CLUSTER, ALTER TABLE,
etc?)

Perhaps there's a hardware problem, is there anything in your system logs from
dmesg?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#15

Laurenz Albe

laurenz.albe@cybertec.at

over 18 years ago

In reply to: Alban Hertroys (#12)

Re: more problems with count(*) on large table

Alban Hertroys wrote:

A. Kretschmer wrote:

Again: an index can't help! Because of MVCC: 'select count(*)'

without

WHERE-condition forces an seq. table-scan.

That has very little to do with MVCC.

[...] For that it makes no difference whether a seq
scan or an index scan is performed - both cases need to check at the
record level whether it's visible (where the seq scan is
already looking at the actual record, of course).

If you do not use MVCC (say, you use DB2), you need not check
the record itself because if it is there (which it is if there
is an index entry), it will be 'visible'.

I pleed not guilty ;)

Declined, sorry.

Yours,
Laurenz Albe

#16

Alban Hertroys

a.hertroys@magproductions.nl

over 18 years ago

In reply to: Laurenz Albe (#15)

Re: more problems with count(*) on large table

Albe Laurenz wrote:

Alban Hertroys wrote:

A. Kretschmer wrote:

Again: an index can't help! Because of MVCC: 'select count(*)'

without

WHERE-condition forces an seq. table-scan.

That has very little to do with MVCC.

[...] For that it makes no difference whether a seq
scan or an index scan is performed - both cases need to check at the
record level whether it's visible (where the seq scan is
already looking at the actual record, of course).

If you do not use MVCC (say, you use DB2), you need not check
the record itself because if it is there (which it is if there
is an index entry), it will be 'visible'.

Still, that's not because of MVCC, but because of the way it is
implemented in PostgreSQL.

There has been talk in the past (regularly) about why the MVCC
information is not in the index and whether it should be, see the ML
archives.

Besides, there are still many situations where a sequential scan
(whether for count(*) or not) is faster than an index scan, no matter
whether you have MVCC or not.

As I said, MVCC has little to do with it.
The real problem is that in postgres you cannot tell from an index
whether a record is visible or not, while you can in DB2 (because it has
an index entry or not).

I pleed not guilty ;)

Declined, sorry.

Overruled, sorry.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

#17

Mike Charnoky

noky@nextbus.com

over 18 years ago

In reply to: Bruce Momjian (#14)

Re: more problems with count(*) on large table

I altered the table in question, with "set statistics 100" on the
timestamp column, then ran analyze. This seemed to help somewhat. Now,
queries don't seem to hang, but it still takes a long time to do the count:
* "where evtime between '2007-09-26' and '2007-09-27'"
took 37 minutes to run (result was ~12 million)
* "where evtime between '2007-09-25' and '2007-09-26'"
took 40 minutes to run (result was ~14 million)

Still stymied about the seemingly random performance, especially since I
have seen this query execute in 2 minutes.

Nothing should be locking the table, the only things happening with this
database are: periodic autovacuums and a process which is constantly
inserting data into the large table in question (and a few other smaller
logging tables).

FYI: Here is the output from explain (I never knew about "\timing" in
psql, that is a big help!). This was run immediately after I ran the
query a first time (without using the prepare and explain)... This
second run took a little over a minute! Bizarre...

mydb=# prepare stmt as select count(*) from prediction_accuracy where
evtime between '2007-09-25' and '2007-09-26';
PREPARE
Time: 90.854 ms
mydb=# explain execute stmt;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=642338.27..642338.28 rows=1 width=0)
-> Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..600219.37 rows=16847557 width=0)
Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
time zone))
(3 rows)

Time: 131.559 ms
mydb# execute stmt;
count
----------
14150928
(1 row)

Time: 101721.346 ms

Mike

Gregory Stark wrote:

Show quoted text

"Alban Hertroys" <a.hertroys@magproductions.nl> writes:

Mike Charnoky wrote:

With respect to the ALTER TABLE SET STATISTICS... how do I determine a
good value to use? This wasn't really clear in the pg docs. Also, do I
need to run ANALYZE on the table after I change the statistics?

Here are the EXPLAINs from the queries:

db=# explain select count(*) from prediction_accuracy where evtime
between '2007-09-25' and '2007-09-26';

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=475677.40..475677.41 rows=1 width=0)
-> Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..444451.44 rows=12490383 width=0)
Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
time zone))
(3 rows)

db=# explain select count(*) from prediction_accuracy where evtime
between '2007-09-26' and '2007-09-27';

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=486615.04..486615.05 rows=1 width=0)
-> Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..454671.07 rows=12777586 width=0)
Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with
time zone))
(3 rows)

Interesting, same plans and no sequential scans... Yet totally different
run times. Almost as if something prevents you to read some records
between 26 and 27 september...

Just to be sure we're looking at the right plan do this:

\timing
PREPARE stmt AS
SELECT count(*)
FROM prediction_accuracy
WHERE evtime BETWEEN '2007-09-25' AND '2007-09-26';

EXPLAIN EXECUTE stmt;
EXECUTE stmt;

I'm no expert on locking in Postgres, but AFAIK locks that prevent you
from reading records are rather rare and probably only issued from
userland code.

Pages can be locked but never for very long.

What other work is going on in this server? Is there anything which might be
locking the table periodically (either a VACUUM FULL, CLUSTER, ALTER TABLE,
etc?)

Perhaps there's a hardware problem, is there anything in your system logs from
dmesg?

#18

Bruce Momjian

bruce@momjian.us

over 18 years ago

In reply to: Mike Charnoky (#17)

Re: more problems with count(*) on large table

"Mike Charnoky" <noky@nextbus.com> writes:

I altered the table in question, with "set statistics 100" on the
timestamp column, then ran analyze. This seemed to help somewhat. Now,
queries don't seem to hang, but it still takes a long time to do the count:
* "where evtime between '2007-09-26' and '2007-09-27'"
took 37 minutes to run (result was ~12 million)
* "where evtime between '2007-09-25' and '2007-09-26'"
took 40 minutes to run (result was ~14 million)

Still stymied about the seemingly random performance, especially since I
have seen this query execute in 2 minutes.

And the "explain analyze" for these?

Are you still sure it's certain date ranges which are consistently problems
and others are consistently fast? Or could it be something unrelated.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#19

Mike Charnoky

noky@nextbus.com

over 18 years ago

In reply to: Bruce Momjian (#18)

Re: more problems with count(*) on large table

This is strange... count(*) operations over a period of one day's worth
of data now take ~1-2 minutes to run or ~40 minutes. It seems that the
first time the data is queried it takes about 40 minutes. If I try the
query again, it finishes in 1-2 minutes!

Again, nothing else is happening on this db server except for a constant
insertion into this table and a few others. I have done "set statistics
100" for the evtime field in this table.

Here is the output from EXPLAIN ANALYZE. This is the same query run
back to back, first time takes 42 minutes, second time takes less than 2
minutes!

mydb=# explain analyze select count(*) from prediction_accuracy where
evtime between '2007-09-29' and '2007-09-30';

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=3.02..3.03 rows=1 width=0) (actual
time=2549854.351..2549854.352 rows=1 loops=1)
-> Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..3.02 rows=1 width=0) (actual time=97.676..2532824.892
rows=11423786 loops=1)
Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
time zone))
Total runtime: 2549854.411 ms
(4 rows)

Time: 2549943.506 ms
mydb=# explain analyze select count(*) from prediction_accuracy where
evtime between '2007-09-29' and '2007-09-30';

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=3.02..3.03 rows=1 width=0) (actual
time=111200.943..111200.944 rows=1 loops=1)
-> Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..3.02 rows=1 width=0) (actual time=36.396..96347.483
rows=11423786 loops=1)
Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
time zone))
Total runtime: 111201.000 ms
(4 rows)

Time: 111298.695 ms

Mike

Gregory Stark wrote:

Show quoted text

"Mike Charnoky" <noky@nextbus.com> writes:

I altered the table in question, with "set statistics 100" on the
timestamp column, then ran analyze. This seemed to help somewhat. Now,
queries don't seem to hang, but it still takes a long time to do the count:
* "where evtime between '2007-09-26' and '2007-09-27'"
took 37 minutes to run (result was ~12 million)
* "where evtime between '2007-09-25' and '2007-09-26'"
took 40 minutes to run (result was ~14 million)

Still stymied about the seemingly random performance, especially since I
have seen this query execute in 2 minutes.

And the "explain analyze" for these?

Are you still sure it's certain date ranges which are consistently problems
and others are consistently fast? Or could it be something unrelated.

#20

Alan Hodgson

ahodgson@simkin.ca

over 18 years ago

In reply to: Mike Charnoky (#19)

Re: more problems with count(*) on large table

On Monday 01 October 2007, Mike Charnoky <noky@nextbus.com> wrote:

This is strange... count(*) operations over a period of one day's worth
of data now take ~1-2 minutes to run or ~40 minutes. It seems that the
first time the data is queried it takes about 40 minutes. If I try the
query again, it finishes in 1-2 minutes!

Again, nothing else is happening on this db server except for a constant
insertion into this table and a few others. I have done "set statistics
100" for the evtime field in this table.

The first time, you're reading from disk. The second time, you're reading
from cache. Tens of millions of disk seeks don't come cheap.

--
We're Microsoft. Everything else is just patent infringement.

#21

Alvaro Herrera

alvherre@2ndquadrant.com

over 18 years ago

In reply to: Mike Charnoky (#19)

#22

Bill Moran

wmoran@potentialtech.com

over 18 years ago

In reply to: Mike Charnoky (#19)