Postgres not using indexes
We have a huge performance issue in Postgres that surfaced because existing indexes are not being used, as in the example below, in both the 8.3 and 9.0 versions.
Client_Orders table with an int ID as PK, which is the order_id, and indexed - about 155,000 rows
Order_Items table with an int ID primary key and an INDEX on Order_id (int) matching the ID in the above Client_Orders table - about 33 million rows
A query like the one below takes almost ten minutes to complete even though the result set is only 33768 rows (Total query runtime: 427539 ms)! This is because the index on Order_Items is NOT used and a sequential scan is done instead, which is obviously not acceptable from a performance point of view. If I add a LIMIT 1000, for instance, then the index is used and the query returns results in no time, as expected, but as soon as I raise the limit to a few thousand, the index on Order_Items.Order_id is no longer used. Why? Is there any way to force Postgres to use the existing indexes instead of a table seq scan, which is deadly?
SELECT oi.id FROM order_items oi INNER JOIN client_orders co ON oi.order_id = co.id;
Regards,
Nenea Nelu.
________________________________
Attention:
The information contained in this message and or attachments is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any system and destroy any copies.
I force postgresql to use indexes instead of sequential scans by setting
enable_seqscan = off in postgresql.conf and it helps in a lot of cases.
Probably not the best practice, but it does improve a lot of the queries
we will execute on a regular basis. It forces the planner to prefer
indexes. I've also noticed that limit behavior which is sort of puzzling
to me.
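For testing, the same effect can be had per session rather than by editing postgresql.conf globally; a minimal sketch, using the table names from the original post:

```sql
-- Discourage sequential scans for this session only, then compare plans.
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT oi.id
FROM order_items oi
INNER JOIN client_orders co ON oi.order_id = co.id;
RESET enable_seqscan;
```

This limits the blast radius of the workaround: only the current connection's planner is affected, and RESET restores the default.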
From: pgsql-bugs-owner@postgresql.org
[mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of Lawrence Cohan
Sent: Wednesday, March 30, 2011 10:01 AM
To: pgsql-bugs@postgresql.org
Subject: [BUGS] Postgres not using indexes
Hello
2011/3/30 Lawrence Cohan <LCohan@web.com>:
Did you run ANALYZE and VACUUM? Can you send the result of EXPLAIN
ANALYZE SELECT ...?
Please run ANALYZE and VACUUM first.
regards
Pavel Stehule
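A sketch of what Pavel is asking for, using the query from the original post:

```sql
-- Refresh planner statistics first, then capture the actual plan
-- with real row counts and timings.
VACUUM ANALYZE client_orders;
VACUUM ANALYZE order_items;

EXPLAIN ANALYZE
SELECT oi.id
FROM order_items oi
INNER JOIN client_orders co ON oi.order_id = co.id;
```

EXPLAIN ANALYZE output shows both the planner's row estimates and the actual counts, which is what makes it possible to tell a statistics problem from a costing problem.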
Thanks for the tip; however, No. 1 is that we can't do that in the production environment due to the impact, and No. 2 is that I tried it and it is still not using an index on the large table but a seq scan.
From: Nathan M. Davalos [mailto:n.davalos@sharedmarketing.com]
Sent: March-30-11 12:05 PM
To: Lawrence Cohan; pgsql-bugs@postgresql.org
Subject: RE: [BUGS] Postgres not using indexes
Lawrence Cohan <LCohan@web.com> wrote:
We have a huge performance issue in Postgres that surfaced due to
existing indexes not being used
This doesn't sound like a bug; it sounds like you haven't tuned your
server.
For starters, you should check out this page:
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
As a quick try, you could issue these statements on the connection
right before one of the problem queries:
set effective_cache_size = '7GB';
-- use 1 or 2 GB less than RAM on the machine
set random_page_cost = 2;
-- now try your query
If, after reading the above-cited page and tuning your server you
still have performance problems, pick one query to work on first,
and follow the steps outlined here:
http://wiki.postgresql.org/wiki/SlowQueryQuestions
Use the pgsql-performance list for performance issues, not the bugs
list. You'll find more people who will be able to help you with
performance issues there.
-Kevin
Please see the results attached, before and after the vacuum, and note they are taken from version 9.0.
As regular maintenance we reindex/vacuum/analyze the entire database once a week and run ANALYZE against it every few hours.
Lawrence Cohan.
-----Original Message-----
From: Pavel Stehule [mailto:pavel.stehule@gmail.com]
Sent: March-30-11 12:08 PM
To: Lawrence Cohan
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Postgres not using indexes
We thank you for the links, which have lots of info, and please note that we tuned our servers as recommended by Enterprise DB experts while they were in house for our hardware/software migrations, and the settings you mentioned are in place already.
Regards,
Lawrence Cohan.
-----Original Message-----
From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Sent: March-30-11 12:45 PM
To: pgsql-bugs@postgresql.org; Lawrence Cohan
Subject: Re: [BUGS] Postgres not using indexes
Lawrence Cohan <LCohan@web.com> wrote:
From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
[configuration advice]
If, after reading the above-cited page and tuning your server you
still have performance problems, pick one query to work on first,
and follow the steps outlined here:
We thank you for the links, which have lots of info, and please
note that we tuned our servers as recommended by Enterprise DB
experts while they were in house for our hardware/software
migrations and the setting you mentioned are in place already.
Then the next step would be to provide enough information on one of
the slow queries for people to be able to offer useful advice. Your
other post showed the query and the EXPLAIN ANALYZE output, but the
other information listed in the above-cited page is useful when
trying to understand a problem. I'm particularly curious about the
data types of the id columns and the specifics of the index
definitions.
-Kevin
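The column types and index definitions Kevin asks about can be pulled straight from the catalogs; a sketch, using the table and column names from the thread:

```sql
-- Data types of the join columns on both tables.
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_name IN ('order_items', 'client_orders')
  AND column_name IN ('id', 'order_id');

-- Full index definitions for both tables.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('order_items', 'client_orders');
```

A type mismatch between oi.order_id and co.id, or a partial/expression index that doesn't match the join, are the usual things to rule out here.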
Please see the updated attachment, which includes the tables involved in the simple query below and all their indexes. We believe the performance issue is due to the query not using any index and doing seq scans instead, and this has very little to do with the material in the link you posted below. As you can see, we picked a simple query with an INNER JOIN between two indexed tables where Postgres 8.3 and 9.0 decide not to use the existing indexes for whatever reason.
SELECT oi.id FROM order_items oi INNER JOIN clients_orders co ON oi.order_id = co.id;
Lawrence Cohan.
-----Original Message-----
From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Sent: March-30-11 1:33 PM
To: pgsql-bugs@postgresql.org; Lawrence Cohan
Subject: RE: [BUGS] Postgres not using indexes
Lawrence Cohan <LCohan@web.com> wrote:
Please see updated attachment that includes the tables involved in
the simple query below and all their indexes.
Well, that rules out a couple common problems (comparisons between
different types and incorrect indexing).
We believe that the performance issue is due to the query not
using any index but doing seq scans instead
So it appears.
and this is very little related to the knowledge from the link you
posted below.
Oh, but it is very much related. The PostgreSQL optimizer looks at
all the various plans available, calculates a cost for each, and runs
the one with the lowest calculated cost. Various configuration
parameters affect the costing calculations, and thus the plan
ultimately chosen. To get good plans, the configuration must
accurately model the actual costs for your particular machine.
Having seen that the types match and the indexes look usable, it
must come down to something in your configuration. Probably the
easiest way to show that is to run the query here and post the
results:
http://wiki.postgresql.org/wiki/Server_Configuration
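The wiki page boils down to listing every setting that has been changed from its built-in default, along these lines:

```sql
-- Report all settings that differ from their built-in defaults.
SELECT name, current_setting(name), source
FROM pg_settings
WHERE source NOT IN ('default', 'override');
```

The source column also shows where each value came from (configuration file, ALTER ROLE, session SET, and so on), which helps spot a setting being overridden somewhere unexpected.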
As you can see we picked a simple query with INNER JOIN between
two indexed tables where postgres 8.3 and 9.0 decides to not use
existing indexes for whatever reason.
The reason is that with the configuration you're using, PostgreSQL
calculates the cost of using the index as being higher than the cost
of a sequential scan. The trick is to find where your configuration
is wrong, so that the calculated costs better match the reality on
your server.
-Kevin
I think you are right (my bad); please see the results below, plus a little more info about the environment, and sorry I missed that before. I've been told the server was tuned to the best for what we need, and it looks like we will need to change at least the two values below and maybe play with work_mem to see if it solves our issues.
The only issue is that we are running a 24/7 web site against the db, and if we need to restart PG for the changes to take place, we will need to wait for a downtime before any changes can be made.
'shared_buffers';'500MB' - shared_buffers should be 10% to 25% of available RAM -> change it to 2GB
'effective_cache_size';'2GB' - effective_cache_size should be 75% of available RAM -> change it to 10GB
'work_mem';'1MB' - increase it to 8MB, 32MB, 256MB, 1GB and check for better results.
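Note that, unlike shared_buffers, work_mem does not require a restart; candidate values can be tried per session against the problem query first, for example:

```sql
-- Try a larger sort/hash memory budget for this session only.
SET work_mem = '32MB';
EXPLAIN ANALYZE
SELECT oi.id
FROM order_items oi
INNER JOIN client_orders co ON oi.order_id = co.id;
RESET work_mem;
```

If the plan or runtime improves at a given value, that value becomes a candidate for postgresql.conf (or a per-role setting) later, during a maintenance window.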
PostgreSQL 9.0.2 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit
2 x Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
4 x 4GB = 16GB RAM
--query results below:
'version';'PostgreSQL 9.0.2 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit'
'archive_command';'cp %p /pglog/wal_export/%f'
'archive_mode';'on'
'archive_timeout';'3min'
'autovacuum_analyze_threshold';'1000'
'autovacuum_vacuum_threshold';'1000'
'bytea_output';'escape'
'checkpoint_segments';'64'
'checkpoint_warning';'1min'
'client_encoding';'UNICODE'
'effective_cache_size';'2GB'
'escape_string_warning';'off'
'lc_collate';'en_US.UTF-8'
'lc_ctype';'en_US.UTF-8'
'listen_addresses';'xxx.xxx.xxx.xxx'
'log_autovacuum_min_duration';'2s'
'log_checkpoints';'on'
'log_destination';'syslog'
'log_line_prefix';'user=%u,db=%d '
'log_min_duration_statement';'1s'
'maintenance_work_mem';'256MB'
'max_connections';'1200'
'max_stack_depth';'2MB'
'port';'5432'
'server_encoding';'UTF8'
'shared_buffers';'500MB'
'syslog_facility';'local0'
'syslog_ident';'postgres'
'TimeZone';'Canada/Eastern'
'vacuum_cost_delay';'10ms'
'wal_buffers';'4MB'
'wal_level';'hot_standby'
-----Original Message-----
From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Sent: March-30-11 1:33 PM
To: pgsql-bugs@postgresql.org; Lawrence Cohan
Subject: RE: [BUGS] Postgres not using indexes
I'm just a lowly end user. Bumping the default statistics target or
using ALTER TABLE SET STATISTICS has made large differences in query
performance on large tables.
The query planner has to guesstimate using the statistics. Sometimes on
large tables the guesstimate isn't very good with a small statistical
sample.
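A sketch of the approach Harry describes, using the thread's large table and join column (the target value of 1000 is illustrative):

```sql
-- Sample more rows when gathering statistics for this column,
-- then re-analyze so the planner sees the new statistics.
ALTER TABLE order_items ALTER COLUMN order_id SET STATISTICS 1000;
ANALYZE order_items;
```

This trades longer ANALYZE runs and slightly more planning work for better row-count estimates on skewed or large tables.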
On 3/30/2011 12:05 PM, Lawrence Cohan wrote:
Harry Rossignol <harrywr2@comcast.net> wrote:
I'm just a lowly end user. Bumping the default statistics target
or using ALTER TABLE SET STATISTICS has made large differences in
query performance on large tables.
The default has been bumped up in later versions, so that shouldn't
be as big a problem as it once was. In this particular case the
EXPLAIN ANALYZE showed the estimated row counts were quite accurate
compared to the actual, so this definitely wasn't the problem this
time.
-Kevin
Lawrence Cohan <LCohan@web.com> wrote:
looks like we will need to change at least the two values below
and maybe play with work_mem to see if it solves our issues.
You will probably get better throughput by bumping up
shared_buffers to the recommended setting, but beware of "stalls" in
query processing at checkpoint time. If that happens you want to
make the background writer more aggressive and/or back off on
shared_memory, so that there isn't such a glut of dirty pages to
write during a checkpoint.
I think even the recommended setting for effective_cache_size is on
the low side. This one affects how expensive the optimizer thinks
index usage will be, so given your current problem this is probably
important to raise. I add up shared_buffers and what free tells me
is cached space after PostgreSQL has been running a while. That
usually winds up being 1GB to 2GB less than total memory on our
machines, so actually, I usually just start there.
We usually need to reduce random_page_cost to get good plans. For a
fully-cached database you may want to reduce both seq_page_cost and
random_page_cost to equal numbers around 0.05. With partial
caching, we often leave seq_page_cost alone and reduce
random_page_cost to 2. YMMV.
The setting for work_mem can be tricky, especially with 1200
connections configured. Each connection may be using one or more
allocations of work_mem at the same time. Which leads to the
question of why you have 1200 connections configured. You are
almost always better off using a connection pooler to limit this to
something on the order of twice your CPU cores plus your effective
spindle count. Tomcat has a very good connection pooler built in,
as do many other products. There are also good external poolers,
like pgpool and pgbouncer.
With a reasonable amount of RAM you're almost always better off
bumping wal_buffers to 32MB.
The only issue is that we are running a 24/7 web site against the
db and if we need to restart PG for the changes to take place we
will need to wait for a downtime before any changes can be made.
Some of these can be set per user with ALTER ROLE. New connections
would then start using the new settings with no down time.
-Kevin
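A sketch of the per-role approach Kevin describes; the role name webapp is hypothetical:

```sql
-- New connections for this role pick these up with no server restart.
ALTER ROLE webapp SET work_mem = '8MB';
ALTER ROLE webapp SET random_page_cost = 2;

-- Verify what is stored for the role.
SELECT rolname, rolconfig FROM pg_roles WHERE rolname = 'webapp';
```

Existing connections keep their old settings, so on a pooled 24/7 site the change rolls in gradually as connections are recycled.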
Thank you for all your suggestions; we will attempt to make the changes as recommended, one at a time, and will post back the results.
Regards,
Lawrence Cohan.
-----Original Message-----
From: pgsql-bugs-owner@postgresql.org [mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of Kevin Grittner
Sent: March-30-11 4:12 PM
To: pgsql-bugs@postgresql.org; Lawrence Cohan
Subject: Re: [BUGS] Postgres not using indexes
Lawrence Cohan <LCohan@web.com> wrote:
looks like we will need to change at least the two values below
and maybe play with work_mem to see if it solves our issues.
You will probably get better throughput by bumping up
shared_buffers to the recommended setting, but beware of "stalls" in
query processing at checkpoint time. If that happens you want to
make the background writer more aggressive and/or back off on
shared_memory, so that there isn't such a glut of dirty pages to
write during a checkpoint.
I think even the recommended setting for effective_cache size is on
the low side. This one affects how expensive the optimizer thinks
index usage will be, so given your current problem this is probably
important to raise. I add up shared_buffers and what free tells me
is cached space is after PostgreSQL has been running a while. That
usually winds up being 1GB to 2GB less than total memory on our
machines, so actually, I usually just start there.
We usually need to reduce random_page_cost to get good plans. For a
fully-cached database you may want to reduce both seq_page_cost and
random_page_cost to equal numbers around 0.05. With partial
caching, we often leave seq_page_cost alone and reduce
random_page_cost to 2. YMMV.
The setting for work_mem can be tricky, especially with 1200
connections configured. Each connection may be using one or more
allocations of work_mem at the same time. Which leads to the
question of why you have 1200 connections configured. You are
almost always better off using a connection pooler to limit this to
something on the order of twice your CPU cores plus your effective
spindle count. Tomcat has a very good connection pooler built in,
as do many other products. There are also good external poolers,
like pgpool and pgbouncer.
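As an illustration of that rule of thumb, a pgbouncer configuration for a hypothetical 16-core server with 8 effective spindles might cap the backend pool at 2*16 + 8 = 40 (database name and all values here are illustrative assumptions):

```
[databases]
webdb = host=127.0.0.1 dbname=webdb

[pgbouncer]
pool_mode = transaction
default_pool_size = 40   ; 2 * 16 cores + 8 spindles (illustrative)
max_client_conn = 1200   ; existing clients still connect, but share 40 backends
```

The 1200 application connections then multiplex over a small number of real backends, so each backend's work_mem allocations stop multiplying out of control.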
With a reasonable amount of RAM you're almost always better off
bumping wal_buffers to 32MB.
The only issue is that we are running a 24/7 web site against the
db and if we need to restart PG for the changes to take place we
will need to wait for a downtime before any changes can be made.
Some of these can be set per user with ALTER ROLE. New connections
would then start using the new settings with no down time.
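For example (the role name is hypothetical), the planner settings discussed above can be applied per role without a restart:

```sql
-- Takes effect for new sessions of this role; no server restart needed.
ALTER ROLE webapp SET random_page_cost = 2;
ALTER ROLE webapp SET effective_cache_size = '24GB';
```

Note this works for planner settings like these, but not for parameters such as shared_buffers or wal_buffers, which still require a restart.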
-Kevin
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Wed, Mar 30, 2011 at 7:32 PM, Lawrence Cohan <LCohan@web.com> wrote:
Please see the updated attachment, which includes the tables involved in the simple query below and all their indexes. We believe the performance issue is due to the query not using any index and doing seq scans instead, which has little to do with the material in the link you posted below. As you can see, we picked a simple query with an INNER JOIN between two indexed tables where Postgres 8.3 and 9.0 decide not to use the existing indexes, for whatever reason.
Have you tried using enable_seqscan = off and looked at the explain output then?
I think you'll find Postgres is choosing not to use the indexes
because they're just not helpful. If you disable seqscan it should use
the indexes but I expect it will be slower. The current plan is doing
a single pass through both tables using entirely sequential i/o.
That's about as fast as you could hope for it to be as your query does
require reading all the data.
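One way to run that experiment in a single session, without touching postgresql.conf (table names taken from the plan quoted later in this thread):

```sql
-- Session-local experiment: discourage seq scans, then compare the
-- resulting plan and timings against the original hash-join plan.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT oi.id
FROM order_items oi
INNER JOIN clients_orders co ON oi.order_id = co.id;

RESET enable_seqscan;
```

SET only affects the current session, and RESET restores the default, so other connections are unaffected while you compare the two plans.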
--
greg
Greg Stark <gsstark@mit.edu> wrote:
your query does require reading all the data.
Huh? It requires reading all the data from at least *one* of the
tables. It could conceivably be faster to read all the data from the
table with 23,980 rows and randomly pick out the necessary 33,768
rows from the table with 33,909,690 rows using the
idx_order_items_order_id index. That seems like it might be faster.
-Kevin
On Thu, Mar 31, 2011 at 11:33 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
Greg Stark <gsstark@mit.edu> wrote:
your query does require reading all the data.
Huh? It requires reading all the data from at least *one* of the
tables.
The query he posted a plan for was:
EXPLAIN ANALYZE select oi.id from order_items oi inner join
clients_orders co on oi.order_id = co.id;
And the plan for it looks like it's optimal to me:
'Hash Join (cost=780.55..1908023.16 rows=1027457 width=4) (actual
time=63.506..85607.003 rows=33768 loops=1)'
' Hash Cond: (oi.order_id = co.id)'
' -> Seq Scan on order_items oi (cost=0.00..1558536.52
rows=33843152 width=8) (actual time=0.005..69718.563 rows=33909137
loops=1)'
' -> Hash (cost=480.80..480.80 rows=23980 width=4) (actual
time=13.072..13.072 rows=23980 loops=1)'
' Buckets: 4096 Batches: 1 Memory Usage: 844kB'
' -> Seq Scan on clients_orders co (cost=0.00..480.80
rows=23980 width=4) (actual time=0.006..6.570 rows=23980 loops=1)'
'Total runtime: 85613.391 ms'
--
greg
Greg Stark <gsstark@mit.edu> wrote:
[quoted exchange and EXPLAIN ANALYZE output trimmed; identical to the message above]
It may or may not be optimal, but the assertion that all 33.9
*million* order_items rows must be read to pick out the needed 33.8
*thousand* is just plain incorrect. Personally, I won't be shocked
if using the index to cut the tuples accessed by three orders of
magnitude is faster.
-Kevin
Thank you for all your suggestions. I hope "set enable_seqscan = off;" will work for the time being, until we can make the PG config changes and do more testing in the near future. We do expect much better performance with the index being used on the 33+ million row table instead of a seq scan, and I will post back real-world timings as soon as I can get them from the production servers.
Regards,
Lawrence Cohan.
-----Original Message-----
From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Sent: April-01-11 10:38 AM
To: Greg Stark
Cc: pgsql-bugs@postgresql.org; Lawrence Cohan
Subject: Re: Postgres not using indexes
Greg Stark <gsstark@mit.edu> wrote:
[quoted exchange and EXPLAIN ANALYZE output trimmed; identical to the messages above]
It may or may not be optimal, but the assertion that all 33.9
*million* order_items rows must be read to pick out the needed 33.8
*thousand* is just plain incorrect. Personally, I won't be shocked
if using the index to cut the tuples accessed by three orders of
magnitude is faster.
-Kevin