Planner very slow on same query to slightly different tables

Started by Nonameover 23 years ago6 messages
#1Noname
reina@nsi.edu

I'm using Postgres 7.2.1 on a dual-Athlon running RedHat 7.3bigmem
with 2 Gig of RAM and a 240 Gig RAID 5 (3ware IDE RAID). I just did a
'vacuum analyze' on the database, however the same query to two
similar tables is coming up quite different. The two tables only
differ in that one ("center_out_cell") has an extra int2 field called
"target" which can take up to 8 different values.

Here are the queries:

db02=# explain select distinct area from center_out_cell where subject
= 'M' and arm = 'R' and rep = 10 and success = 1 and direction = 1;
NOTICE: QUERY PLAN:

Unique (cost=87795.47..87795.80 rows=13 width=5)
-> Sort (cost=87795.47..87795.47 rows=131 width=5)
-> Seq Scan on center_out_cell (cost=0.00..87790.87 rows=131
width=5)

EXPLAIN
db02=# explain select distinct area from circles_cell where subject =
'M' and arm = 'R' and rep = 10 and success = 1 and direction = 1;
NOTICE: QUERY PLAN:

Unique (cost=258.36..258.52 rows=6 width=5)
-> Sort (cost=258.36..258.36 rows=64 width=5)
-> Index Scan using pk1circles_cell on circles_cell
(cost=0.00..256.43 rows=64 width=5)

EXPLAIN

Here are the definitions for the 2 tables:

db02=# \d center_out_cell
Table "center_out_cell"
Column | Type | Modifiers
------------+--------------------+-----------
subject | text |
arm | character(1) |
target | smallint |
rep | integer |
direction | smallint |
success | smallint |
hemisphere | character(1) |
area | text |
filenumber | integer |
dsp_chan | text |
num_spikes | integer |
spike_data | double precision[] |
Unique keys: pk0center_out_cell,
pk1center_out_cell

where:
db02=# \d pk1center_out_cell
Index "pk1center_out_cell"
Column | Type
------------+--------------
subject | text
arm | character(1)
target | smallint
rep | integer
hemisphere | character(1)
area | text
filenumber | integer
dsp_chan | text
direction | smallint
unique btree
Index predicate: (success = 1)

and

db02=# \d pk0center_out_cell
Index "pk0center_out_cell"
Column | Type
------------+--------------
subject | text
arm | character(1)
target | smallint
rep | integer
hemisphere | character(1)
area | text
filenumber | integer
dsp_chan | text
direction | smallint
unique btree
Index predicate: (success = 0)

db02=# \d circles_cell
Table "circles_cell"
Column | Type | Modifiers
------------+--------------------+-----------
subject | text |
arm | character(1) |
rep | integer |
direction | smallint |
success | smallint |
hemisphere | character(1) |
area | text |
filenumber | integer |
dsp_chan | text |
num_spikes | integer |
spike_data | double precision[] |
Unique keys: pk0circles_cell,
pk1circles_cell

where:

db02=# \d pk1circles_cell
Index "pk1circles_cell"
Column | Type
------------+--------------
subject | text
arm | character(1)
rep | integer
direction | smallint
hemisphere | character(1)
area | text
filenumber | integer
dsp_chan | text
unique btree
Index predicate: (success = 1)

db02=# \d pk0circles_cell
Index "pk0circles_cell"
Column | Type
------------+--------------
subject | text
arm | character(1)
rep | integer
direction | smallint
hemisphere | character(1)
area | text
filenumber | integer
dsp_chan | text
unique btree
Index predicate: (success = 0)

Now I know that, due to the extra field "target", "center_out_cell"
can be as large as 8 times "circles_cell", but according to the cost
of the planner, the statement is 340 times more costly. I think this
is because the planner is using the index in the circles_cell case and
not in the center_out_cell case. However, I don't pretend to
understand the intricasies of the planner to make an intelligent
guess. I've been trying random changes to postgresql.conf like
increasing the shared memory size, changing the random_page_cost size,
etc., but would like some help in trying to speed things up.

Here are some relevant settings from my postgresql.conf (made in an
attempt to max out buffers):

shared_buffers = 9000 # 2*max_connections, min 16
wal_buffers = 32 # min 4
sort_mem = 64000 # min 32
vacuum_mem = 16384 # min 1024
wal_files = 32
effective_cache_size = 1000 # default in 8k pages

Thanks in advance.
-Tony

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#1)
Re: Planner very slow on same query to slightly different tables

reina@nsi.edu (Tony Reina) writes:

db02=# explain select distinct area from center_out_cell where subject
= 'M' and arm = 'R' and rep = 10 and success = 1 and direction = 1;
NOTICE: QUERY PLAN:

Unique (cost=87795.47..87795.80 rows=13 width=5)
-> Sort (cost=87795.47..87795.47 rows=131 width=5)
-> Seq Scan on center_out_cell (cost=0.00..87790.87 rows=131
width=5)

Index "pk1center_out_cell"
Column | Type
------------+--------------
subject | text
arm | character(1)
target | smallint
rep | integer
hemisphere | character(1)
area | text
filenumber | integer
dsp_chan | text
direction | smallint
unique btree
Index predicate: (success = 1)

I imagine the problem with this index is that there's no constraint for
"target" in the query; so the planner could only use the first two index
columns (subject and arm), which probably isn't very selective. The
index used in the other query is defined differently:

db02=# \d pk1circles_cell
Index "pk1circles_cell"
Column | Type
------------+--------------
subject | text
arm | character(1)
rep | integer
direction | smallint
hemisphere | character(1)
area | text
filenumber | integer
dsp_chan | text
unique btree
Index predicate: (success = 1)

This allows "rep" to be used in the indexscan too (and if you were to
cast properly, viz "direction = 1::smallint", then that column could be
used as well).

regards, tom lane

#3Tony Reina
reina@nsi.edu
In reply to: Tom Lane (#2)
Re: Planner very slow on same query to slightly

If I understand correctly, I tried specifying the target and even casting
all of the smallint's, but it still is a slow estimate. Perhaps, this is
just due to a large amount of data, but my gut is telling me that I have
something wrong here.

db02=# explain select distinct area from center_out_cell where subject =
'M' and arm = 'R' and rep = 10 and success = 1::smallint and direction =
1::smallint and target = 3::smallint;
NOTICE: QUERY PLAN:
Unique (cost=100105115.88..100105115.93 rows=2 width=5)
-> Sort (cost=100105115.88..100105115.88 rows=19 width=5)
-> Seq Scan on center_out_cell (cost=100000000.00..100105115.47
rows=19 width=5)
EXPLAIN
db02=# explain select distinct area from center_out_cell where subject =
'M' and arm = 'R' and rep = 10::int and success = 1::smallint and direction
= 1::smallint and target = 3::smallint;
NOTICE: QUERY PLAN:
Unique (cost=100105115.88..100105115.93 rows=2 width=5)
-> Sort (cost=100105115.88..100105115.88 rows=19 width=5)
-> Seq Scan on center_out_cell (cost=100000000.00..100105115.47
rows=19 width=5)
EXPLAIN
db02=#

-Tony

At 09:47 PM 7/17/02 -0400, Tom Lane wrote:

Show quoted text

reina@nsi.edu (Tony Reina) writes:

db02=# explain select distinct area from center_out_cell where subject
= 'M' and arm = 'R' and rep = 10 and success = 1 and direction = 1;
NOTICE: QUERY PLAN:

Unique (cost=87795.47..87795.80 rows=13 width=5)
-> Sort (cost=87795.47..87795.47 rows=131 width=5)
-> Seq Scan on center_out_cell (cost=0.00..87790.87 rows=131
width=5)

Index "pk1center_out_cell"
Column | Type
------------+--------------
subject | text
arm | character(1)
target | smallint
rep | integer
hemisphere | character(1)
area | text
filenumber | integer
dsp_chan | text
direction | smallint
unique btree
Index predicate: (success = 1)

I imagine the problem with this index is that there's no constraint for
"target" in the query; so the planner could only use the first two index
columns (subject and arm), which probably isn't very selective. The
index used in the other query is defined differently:

db02=# \d pk1circles_cell
Index "pk1circles_cell"
Column | Type
------------+--------------
subject | text
arm | character(1)
rep | integer
direction | smallint
hemisphere | character(1)
area | text
filenumber | integer
dsp_chan | text
unique btree
Index predicate: (success = 1)

This allows "rep" to be used in the indexscan too (and if you were to
cast properly, viz "direction = 1::smallint", then that column could be
used as well).

regards, tom lane

#4Noname
reina@nsi.edu
In reply to: Tony Reina (#3)
Inheritance a burden?

As a followup to my slow query plans:

I've experimented with removing the inheritance schema to see how it
affects my database calls. Originally, because several fields of the
primary key (subject, arm, rep, direction, success) were common to
every table, I made those fields a separate table and had subsequent
tables inherit the fields. Just out of curiosity, I built a new
database with the same data but didn't use the inheritance (i.e. each
table had its own copy of those common fields). It looks like about a
20% increase in execution speed when I run my programs side by side.
I'm not sure if that kind of performance hit should be expected.
Anyone have an idea about this?

-Tony

reina@nsi.edu (Tony Reina) wrote in message news:<5.1.1.6.0.20020718095319.009ecec0@schubert.nsi.edu>...

Show quoted text

If I understand correctly, I tried specifying the target and even casting
all of the smallint's, but it still is a slow estimate. Perhaps, this is
just due to a large amount of data, but my gut is telling me that I have
something wrong here.

#5Curt Sampson
cjs@cynic.net
In reply to: Noname (#4)
Re: Inheritance a burden?

On 19 Jul 2002, Tony Reina wrote:

Just out of curiosity, I built a new
database with the same data but didn't use the inheritance (i.e. each
table had its own copy of those common fields). It looks like about a
20% increase in execution speed when I run my programs side by side.

Have you tried it using the standard relational method of doing this?
(I.e., you put the common fields in one table, and the extra fields in
other tables, along with a foreign key relating the extra fields back
to the main table.) That would more accurately replacate what you were
doing with inheritance.

I have a suspicion, in fact, that inheritance may just be syntatic sugar
for doing this and adding a couple of views. :-)

Anyway, it could be that by denormalizing the data (copying it to the
other tables), you reduced the number of joins you do, and so you got a
performance increase.

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC

#6Tony Reina
reina@nsi.edu
In reply to: Curt Sampson (#5)
Re: Inheritance a burden?

At 02:17 PM 7/20/02 +0900, Curt Sampson wrote:

Have you tried it using the standard relational method of doing this?
(I.e., you put the common fields in one table, and the extra fields in
other tables, along with a foreign key relating the extra fields back
to the main table.) That would more accurately replacate what you were
doing with inheritance.

I have a suspicion, in fact, that inheritance may just be syntatic sugar
for doing this and adding a couple of views. :-)

Yes, I thought this was the case too. I haven't specifically setup foreign
keys, but I was under the impression that the "INHERITS" command would do this.

Anyway, it could be that by denormalizing the data (copying it to the
other tables), you reduced the number of joins you do, and so you got a
performance increase.

Yes, I guess this is probably the case although it speaks against
normalizing too much. I guess too much of a good thing is bad.

-Tony