Slow count(*)
Hi,
We are looking for a patch that will help us count using the indexes.
Our product is about 20 times slower on Postgres compared to MS SQL
Server.
Any ideas?
Danny Abraham
BMC Software
CTM&D Business Unit
972-52-4286-513
danny_abraham@bmc.com
On 02/01/2008, Abraham, Danny <danny_abraham@bmc.com> wrote:
Hi,
We are looking for a patch that will help us count using the indexes.
Our product is about 20 times slower on Postgres compared to MS SQL
Server.Any ideas?
There isn't any similar patch and will not be.
Use materialized views or similar techniques.
Are you sure, so all your problems are only in SELECT COUNT(*)?
Check, please, all slow queries.
Regards
Pavel Stehule
Show quoted text
Danny Abraham
BMC Software
CTM&D Business Unit
972-52-4286-513
danny_abraham@bmc.com---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
am Wed, dem 02.01.2008, um 9:29:24 -0600 mailte Abraham, Danny folgendes:
Hi,
We are looking for a patch that will help us count using the indexes.
Our product is about 20 times slower on Postgres compared to MS SQL
Server.Any ideas?
Please show us your SQL and the execution plan (EXPLAIN or, better,
EXPLAIN ANALYSE) and read our FAQ.
Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
On Wed, Jan 02, 2008 at 09:29:24AM -0600, Abraham, Danny wrote:
We are looking for a patch that will help us count using the indexes.
Is this for
SELECT count(*) FROM table;
or
SELECT count(1) FROM table WHERE. . .
The latter _will_ use an index, if the index is correct, the statistics are
right, and the index selectivity is worth the cost of reading the index.
The former will not use an index at all, because the answer depends on
visibility, and you can't know that without reading the table. If you're
counting how many rows are in the table (for, for instance, display
purposes), you probably need to do something else.
Our product is about 20 times slower on Postgres compared to MS SQL
Server.Any ideas?
Not without the queries, the EXPLAIN ANALYZE plans, and some information
about the database.
A
On 02/01/2008, Abraham, Danny <danny_abraham@bmc.com> wrote:
When comparing the OLTP part of our product, PG is about 15% slower
compared to Oracle, which is reasonable.When comparing the DSS part, it is about 20 times slower.
Unfortunately, we need both.
Send slow queries, please, and than we can help you.
don't forget on VACUUM and ANALYZE statements before and then send
execution plans
http://www.postgresql.org/docs/8.2/static/sql-explain.html
Show quoted text
-----Original Message-----
From: Pavel Stehule [mailto:pavel.stehule@gmail.com]
Sent: Wednesday, January 02, 2008 5:47 PM
To: Abraham, Danny
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Slow count(*)On 02/01/2008, Abraham, Danny <danny_abraham@bmc.com> wrote:
Hi,
We are looking for a patch that will help us count using the indexes.
Our product is about 20 times slower on Postgres compared to MS SQL
Server.Any ideas?
There isn't any similar patch and will not be.
Use materialized views or similar techniques.
Are you sure, so all your problems are only in SELECT COUNT(*)?
Check, please, all slow queries.
Regards
Pavel StehuleDanny Abraham
BMC Software
CTM&D Business Unit
972-52-4286-513
danny_abraham@bmc.com---------------------------(end of
broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
Import Notes
Reply to msg id not found: BE67D1149BBD5746984545ED91F702E04DCCF3@hou-ex-02.adprod.bmc.com
On Wed, 2008-01-02 at 09:29 -0600, Abraham, Danny wrote:
Our product is about 20 times slower on Postgres compared to MS SQL
Server.
If you want to have a cross-platform product then you must consider how
to access multiple systems both accurately and quickly. Not much point
catering for the different SQL dialects and then ignoring the
performance differences. All products are not the same; you will find
many advantages with Postgres.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
On Wed, Jan 2, 2008 at 9:29 AM, in message
<BE67D1149BBD5746984545ED91F702E04DCC7A@hou-ex-02.adprod.bmc.com>, "Abraham,
Danny" <danny_abraham@bmc.com> wrote:
We are looking for a patch that will help us count using the indexes.
As others have mentioned, that's not currently possible for a count
of all rows in a table, because there can be many versions of a row
under PostgreSQL's MVCC techniques, and the row must currently be
visited to determine whether it is visible in the context of your
database transaction.
Our product is about 20 times slower on Postgres compared to MS SQL
Server.Any ideas?
Again, it is best to show a particular example of a problem, because
you might be making a bad assumption about the cause of your slowness.
If you don't understand MVCC and the need for maintenance, you might
have table bloat which could be the issue. Also, always give the
exact version of PostgreSQL, the OS, and a description of the
hardware.
If you really are doing proper maintenance, and you don't need exact
counts, you might be able to use the approximation stored in the
system tables:
cc=> \timing
Timing is on.
cc=> select count(*) from "Party";
count
--------
135093
(1 row)
Time: 48.626 ms
cc=> select reltuples from pg_class where relname = 'Party';
reltuples
-----------
135091
(1 row)
Time: 9.799 ms
-Kevin
Kevin Grittner wrote:
If you really are doing proper maintenance, and you don't need exact
counts, you might be able to use the approximation stored in the
system tables:
Also, if you're using count(*) as an existance test (common in Mysql
code), it's better to use exists instead. Using a table in my system, I
see:
proddb=> explain analyze select count(*) from instrument_listings
where update_date is null and delist_date is null;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=18385.45..18385.46 rows=1 width=0) (actual
time=897.799..897.801 rows=1 loops=1)
-> Seq Scan on instrument_listings (cost=0.00..17973.43
rows=164807 width=0) (actual time=0.018..634.197 rows=146122 loops=1)
Filter: ((update_date IS NULL) AND (delist_date IS NULL))
Total runtime: 897.846 ms
(4 rows)Time: 898.478 ms
proddb=> explain analyze select true where exists(select 1 from
instrument_listings where update_date is null and delist_date is null
limit 1);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.11..0.12 rows=1 width=0) (actual time=0.026..0.028
rows=1 loops=1)
One-Time Filter: $0
InitPlan
-> Limit (cost=0.00..0.11 rows=1 width=0) (actual
time=0.022..0.022 rows=1 loops=1)
-> Seq Scan on instrument_listings (cost=0.00..17973.43
rows=164807 width=0) (actual time=0.017..0.017 rows=1 loops=1)
Filter: ((update_date IS NULL) AND (delist_date IS NULL))
Total runtime: 0.063 ms
(7 rows)Time: 0.768 ms
proddb=>
The exists version is over 1000x faster (and the only reason it's not
more studly is that I'm working on the table as we speak, so it's all in
memory).
As a general rule in postgres, don't do count(*) unless you really mean it.
Brian