Planner very slow on same query to slightly different tables
I'm using Postgres 7.2.1 on a dual-Athlon running RedHat 7.3bigmem
with 2 Gig of RAM and a 240 Gig RAID 5 (3ware IDE RAID). I just did a
'vacuum analyze' on the database, however the same query to two
similar tables is coming up quite different. The two tables only
differ in that one ("center_out_cell") has an extra int2 field called
"target" which can take up to 8 different values.
Here are the queries:
db02=# explain select distinct area from center_out_cell where subject
= 'M' and arm = 'R' and rep = 10 and success = 1 and direction = 1;
NOTICE: QUERY PLAN:
Unique (cost=87795.47..87795.80 rows=13 width=5)
-> Sort (cost=87795.47..87795.47 rows=131 width=5)
-> Seq Scan on center_out_cell (cost=0.00..87790.87 rows=131
width=5)
EXPLAIN
db02=# explain select distinct area from circles_cell where subject =
'M' and arm = 'R' and rep = 10 and success = 1 and direction = 1;
NOTICE: QUERY PLAN:
Unique (cost=258.36..258.52 rows=6 width=5)
-> Sort (cost=258.36..258.36 rows=64 width=5)
-> Index Scan using pk1circles_cell on circles_cell
(cost=0.00..256.43 rows=64 width=5)
EXPLAIN
Here are the definitions for the 2 tables:
db02=# \d center_out_cell
Table "center_out_cell"
Column | Type | Modifiers
------------+--------------------+-----------
subject | text |
arm | character(1) |
target | smallint |
rep | integer |
direction | smallint |
success | smallint |
hemisphere | character(1) |
area | text |
filenumber | integer |
dsp_chan | text |
num_spikes | integer |
spike_data | double precision[] |
Unique keys: pk0center_out_cell,
pk1center_out_cell
where:
db02=# \d pk1center_out_cell
Index "pk1center_out_cell"
Column | Type
------------+--------------
subject | text
arm | character(1)
target | smallint
rep | integer
hemisphere | character(1)
area | text
filenumber | integer
dsp_chan | text
direction | smallint
unique btree
Index predicate: (success = 1)
and
db02=# \d pk0center_out_cell
Index "pk0center_out_cell"
Column | Type
------------+--------------
subject | text
arm | character(1)
target | smallint
rep | integer
hemisphere | character(1)
area | text
filenumber | integer
dsp_chan | text
direction | smallint
unique btree
Index predicate: (success = 0)
db02=# \d circles_cell
Table "circles_cell"
Column | Type | Modifiers
------------+--------------------+-----------
subject | text |
arm | character(1) |
rep | integer |
direction | smallint |
success | smallint |
hemisphere | character(1) |
area | text |
filenumber | integer |
dsp_chan | text |
num_spikes | integer |
spike_data | double precision[] |
Unique keys: pk0circles_cell,
pk1circles_cell
where:
db02=# \d pk1circles_cell
Index "pk1circles_cell"
Column | Type
------------+--------------
subject | text
arm | character(1)
rep | integer
direction | smallint
hemisphere | character(1)
area | text
filenumber | integer
dsp_chan | text
unique btree
Index predicate: (success = 1)
db02=# \d pk0circles_cell
Index "pk0circles_cell"
Column | Type
------------+--------------
subject | text
arm | character(1)
rep | integer
direction | smallint
hemisphere | character(1)
area | text
filenumber | integer
dsp_chan | text
unique btree
Index predicate: (success = 0)
Now I know that, due to the extra field "target", "center_out_cell"
can be as large as 8 times "circles_cell", but according to the cost
of the planner, the statement is 340 times more costly. I think this
is because the planner is using the index in the circles_cell case and
not in the center_out_cell case. However, I don't pretend to
understand the intricasies of the planner to make an intelligent
guess. I've been trying random changes to postgresql.conf like
increasing the shared memory size, changing the random_page_cost size,
etc., but would like some help in trying to speed things up.
Here are some relevant settings from my postgresql.conf (made in an
attempt to max out buffers):
shared_buffers = 9000 # 2*max_connections, min 16
wal_buffers = 32 # min 4
sort_mem = 64000 # min 32
vacuum_mem = 16384 # min 1024
wal_files = 32
effective_cache_size = 1000 # default in 8k pages
Thanks in advance.
-Tony
reina@nsi.edu (Tony Reina) writes:
db02=# explain select distinct area from center_out_cell where subject
= 'M' and arm = 'R' and rep = 10 and success = 1 and direction = 1;
NOTICE: QUERY PLAN:
Unique (cost=87795.47..87795.80 rows=13 width=5)
-> Sort (cost=87795.47..87795.47 rows=131 width=5)
-> Seq Scan on center_out_cell (cost=0.00..87790.87 rows=131
width=5)
Index "pk1center_out_cell"
Column | Type
------------+--------------
subject | text
arm | character(1)
target | smallint
rep | integer
hemisphere | character(1)
area | text
filenumber | integer
dsp_chan | text
direction | smallint
unique btree
Index predicate: (success = 1)
I imagine the problem with this index is that there's no constraint for
"target" in the query; so the planner could only use the first two index
columns (subject and arm), which probably isn't very selective. The
index used in the other query is defined differently:
db02=# \d pk1circles_cell
Index "pk1circles_cell"
Column | Type
------------+--------------
subject | text
arm | character(1)
rep | integer
direction | smallint
hemisphere | character(1)
area | text
filenumber | integer
dsp_chan | text
unique btree
Index predicate: (success = 1)
This allows "rep" to be used in the indexscan too (and if you were to
cast properly, viz "direction = 1::smallint", then that column could be
used as well).
regards, tom lane
If I understand correctly, I tried specifying the target and even casting
all of the smallint's, but it still is a slow estimate. Perhaps, this is
just due to a large amount of data, but my gut is telling me that I have
something wrong here.
db02=# explain select distinct area from center_out_cell where subject =
'M' and arm = 'R' and rep = 10 and success = 1::smallint and direction =
1::smallint and target = 3::smallint;
NOTICE: QUERY PLAN:
Unique (cost=100105115.88..100105115.93 rows=2 width=5)
-> Sort (cost=100105115.88..100105115.88 rows=19 width=5)
-> Seq Scan on center_out_cell (cost=100000000.00..100105115.47
rows=19 width=5)
EXPLAIN
db02=# explain select distinct area from center_out_cell where subject =
'M' and arm = 'R' and rep = 10::int and success = 1::smallint and direction
= 1::smallint and target = 3::smallint;
NOTICE: QUERY PLAN:
Unique (cost=100105115.88..100105115.93 rows=2 width=5)
-> Sort (cost=100105115.88..100105115.88 rows=19 width=5)
-> Seq Scan on center_out_cell (cost=100000000.00..100105115.47
rows=19 width=5)
EXPLAIN
db02=#
-Tony
At 09:47 PM 7/17/02 -0400, Tom Lane wrote:
Show quoted text
reina@nsi.edu (Tony Reina) writes:
db02=# explain select distinct area from center_out_cell where subject
= 'M' and arm = 'R' and rep = 10 and success = 1 and direction = 1;
NOTICE: QUERY PLAN:Unique (cost=87795.47..87795.80 rows=13 width=5)
-> Sort (cost=87795.47..87795.47 rows=131 width=5)
-> Seq Scan on center_out_cell (cost=0.00..87790.87 rows=131
width=5)Index "pk1center_out_cell"
Column | Type
------------+--------------
subject | text
arm | character(1)
target | smallint
rep | integer
hemisphere | character(1)
area | text
filenumber | integer
dsp_chan | text
direction | smallint
unique btree
Index predicate: (success = 1)I imagine the problem with this index is that there's no constraint for
"target" in the query; so the planner could only use the first two index
columns (subject and arm), which probably isn't very selective. The
index used in the other query is defined differently:db02=# \d pk1circles_cell
Index "pk1circles_cell"
Column | Type
------------+--------------
subject | text
arm | character(1)
rep | integer
direction | smallint
hemisphere | character(1)
area | text
filenumber | integer
dsp_chan | text
unique btree
Index predicate: (success = 1)This allows "rep" to be used in the indexscan too (and if you were to
cast properly, viz "direction = 1::smallint", then that column could be
used as well).regards, tom lane
As a followup to my slow query plans:
I've experimented with removing the inheritance schema to see how it
affects my database calls. Originally, because several fields of the
primary key (subject, arm, rep, direction, success) were common to
every table, I made those fields a separate table and had subsequent
tables inherit the fields. Just out of curiosity, I built a new
database with the same data but didn't use the inheritance (i.e. each
table had its own copy of those common fields). It looks like about a
20% increase in execution speed when I run my programs side by side.
I'm not sure if that kind of performance hit should be expected.
Anyone have an idea about this?
-Tony
reina@nsi.edu (Tony Reina) wrote in message news:<5.1.1.6.0.20020718095319.009ecec0@schubert.nsi.edu>...
Show quoted text
If I understand correctly, I tried specifying the target and even casting
all of the smallint's, but it still is a slow estimate. Perhaps, this is
just due to a large amount of data, but my gut is telling me that I have
something wrong here.
On 19 Jul 2002, Tony Reina wrote:
Just out of curiosity, I built a new
database with the same data but didn't use the inheritance (i.e. each
table had its own copy of those common fields). It looks like about a
20% increase in execution speed when I run my programs side by side.
Have you tried it using the standard relational method of doing this?
(I.e., you put the common fields in one table, and the extra fields in
other tables, along with a foreign key relating the extra fields back
to the main table.) That would more accurately replacate what you were
doing with inheritance.
I have a suspicion, in fact, that inheritance may just be syntatic sugar
for doing this and adding a couple of views. :-)
Anyway, it could be that by denormalizing the data (copying it to the
other tables), you reduced the number of joins you do, and so you got a
performance increase.
cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC
At 02:17 PM 7/20/02 +0900, Curt Sampson wrote:
Have you tried it using the standard relational method of doing this?
(I.e., you put the common fields in one table, and the extra fields in
other tables, along with a foreign key relating the extra fields back
to the main table.) That would more accurately replacate what you were
doing with inheritance.I have a suspicion, in fact, that inheritance may just be syntatic sugar
for doing this and adding a couple of views. :-)
Yes, I thought this was the case too. I haven't specifically setup foreign
keys, but I was under the impression that the "INHERITS" command would do this.
Anyway, it could be that by denormalizing the data (copying it to the
other tables), you reduced the number of joins you do, and so you got a
performance increase.
Yes, I guess this is probably the case although it speaks against
normalizing too much. I guess too much of a good thing is bad.
-Tony