Performance degradation in PostgreSQL 7.1beta3 vs 6.5.3

Started by Nonamealmost 25 years ago5 messages
#1Noname
bruc@stone.congenomics.com
3 attachment(s)

Dear Tom,
I am writing to you because you are the maintainer of the
query optimizer and planner.
I have found a very significant performance degradation
between PostgreSQL 6.5.3 and 7.1beta3, which will severely impact two
large applications that I have developed and maintain for several
clients. The performance difference is seen with the use of indices in
SELECT statements, whereby the current release does not make effective
use of the indices and 6.5.3 does. All of these tests were run on a SGI
R10000 Indigo2 system running Irix 6.5. All the regression tests passed
as expected for both versions.
I have followed the discussion in pgsql-hackers over the previous
months and others have noted some performance problems, and the response
has typically been to VACUUM the tables. Unfortunately, this is not a
practical option for my applications. They are very large -- I have one
table that is 17GB in length, and the applications are used frequently.
More importantly, PostgreSQL 6.5.3 works very, very well without
VACUUM'ing.
In order to assist you to diagnosing and correcting this
problem, I have prepared a test database that shows the problems. I
will attach three files; the test script, the log from running it on
version 6.5.3, and the log from running it on version 7.1beta3. In
addition, I have setup an anonymous FTP directory on
ftp.congen.com:/pub/pg_perf which contains all of these files as well
as the compressed table dumps used to build the test database. (When
you have finished copying the files, please let me know.)
The test script creates the database including the necessary
indexing, and then runs EXPLAIN on each of the queries followed by
actually executing the queries with "timex" commands to report elapsed
times. The striking difference in the query plans is that 7.1 uses
only sequential searches for the SELECT's whereas 6.5.3 uses index
scans. As a result, 7.1 is almost two orders of magnitude slower than
6.5.3 with exactly the same data, schema, and queries.

I plead with you to revisit this question of performance and
fix PostgreSQL 7.1 to work as well as PostgreSQL 6.5.3 does. I depend
upon PostgreSQL for much of my work, and I do not want to abandon it
because of this performance problem which arose only recently. Thank
you.

+----------------------------------+------------------------------------+
| Robert E. Bruccoleri, Ph.D.      | Phone: 609 737 6383                |
| President, Congenomics, Inc.     | Fax:   609 737 7528                |
| 114 W Franklin Ave, Suite K1,4,5 | email: bruc@acm.org                |
| P.O. Box 314                     | URL:   http://www.congen.com/~bruc |
| Pennington, NJ 08534             |                                    |
+----------------------------------+------------------------------------+

Attachments:

perf_test.shtext/plain; charset=US-ASCIIDownload
perf_test.log.6.5.3text/plain; charset=US-ASCIIDownload
perf_test.log.7.1beta3text/plain; charset=US-ASCIIDownload
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#1)
Re: Performance degradation in PostgreSQL 7.1beta3 vs 6.5.3

bruc@stone.congenomics.com (Robert E. Bruccoleri) writes:

I have followed the discussion in pgsql-hackers over the previous
months and others have noted some performance problems, and the response
has typically been to VACUUM the tables. Unfortunately, this is not a
practical option for my applications. They are very large -- I have one
table that is 17GB in length, and the applications are used frequently.

You can't afford to run a VACUUM ANALYZE even once in the lifetime of
the table?

More importantly, PostgreSQL 6.5.3 works very, very well without
VACUUM'ing.

6.5 effectively assumes that "foo = constant" will select exactly one
row, if it has no statistics to prove otherwise. I don't regard that
as a well-chosen default, even if it does happen to work OK for your
application. Selecting an indexscan when a seqscan is needed is just
as evil as doing the reverse; what's much worse is that 6.5 will
pick incredibly bad join plans (ie, nested loops) because it thinks
that very little data is coming out of the scans.

If you want to revert to the 6.5 behavior without doing a VACUUM, you
could probably get pretty close with
update pg_attribute set attdispersion = -1.0;

Stats-gathering and planning certainly does need a great deal of
additional work, but I'm afraid that none of that will happen before
7.1.

regards, tom lane

#3mlw
markw@mohawksoft.com
In reply to: Noname (#1)
Re: Performance degradation in PostgreSQL 7.1beta3 vs 6.5.3

"Robert E. Bruccoleri" wrote:

You can try starting postmaster with the "-o -fs" option. This will disable sequential scans if there is an index. There is also an environment variable you can set, prior to the operation. I have run into this same problem.

Show quoted text

Dear Tom,
I am writing to you because you are the maintainer of the
query optimizer and planner.
I have found a very significant performance degradation
between PostgreSQL 6.5.3 and 7.1beta3, which will severely impact two
large applications that I have developed and maintain for several
clients. The performance difference is seen with the use of indices in
SELECT statements, whereby the current release does not make effective
use of the indices and 6.5.3 does. All of these tests were run on a SGI
R10000 Indigo2 system running Irix 6.5. All the regression tests passed
as expected for both versions.
I have followed the discussion in pgsql-hackers over the previous
months and others have noted some performance problems, and the response
has typically been to VACUUM the tables. Unfortunately, this is not a
practical option for my applications. They are very large -- I have one
table that is 17GB in length, and the applications are used frequently.
More importantly, PostgreSQL 6.5.3 works very, very well without
VACUUM'ing.
In order to assist you to diagnosing and correcting this
problem, I have prepared a test database that shows the problems. I
will attach three files; the test script, the log from running it on
version 6.5.3, and the log from running it on version 7.1beta3. In
addition, I have setup an anonymous FTP directory on
ftp.congen.com:/pub/pg_perf which contains all of these files as well
as the compressed table dumps used to build the test database. (When
you have finished copying the files, please let me know.)
The test script creates the database including the necessary
indexing, and then runs EXPLAIN on each of the queries followed by
actually executing the queries with "timex" commands to report elapsed
times. The striking difference in the query plans is that 7.1 uses
only sequential searches for the SELECT's whereas 6.5.3 uses index
scans. As a result, 7.1 is almost two orders of magnitude slower than
6.5.3 with exactly the same data, schema, and queries.

I plead with you to revisit this question of performance and
fix PostgreSQL 7.1 to work as well as PostgreSQL 6.5.3 does. I depend
upon PostgreSQL for much of my work, and I do not want to abandon it
because of this performance problem which arose only recently. Thank
you.

+----------------------------------+------------------------------------+
| Robert E. Bruccoleri, Ph.D.      | Phone: 609 737 6383                |
| President, Congenomics, Inc.     | Fax:   609 737 7528                |
| 114 W Franklin Ave, Suite K1,4,5 | email: bruc@acm.org                |
| P.O. Box 314                     | URL:   http://www.congen.com/~bruc |
| Pennington, NJ 08534             |                                    |
+----------------------------------+------------------------------------+

------------------------------------------------------------------------
#!/bin/csh

createdb perf_test

gunzip <proteins.dmp.gz | timex psql -e perf_test
gunzip <comparisons_4.dmp.gz | timex psql -e perf_test
gunzip <concordance_91.dmp.gz | timex psql -e perf_test

psql -e perf_test <<EOF
explain select * from comparisons_4 where name1 = 'HI0001';
explain select count(*) from comparisons_4 where code = 80003;
explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name;
explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name = p.name;
EOF

timex psql -e -c "select * from comparisons_4 where name1 = 'HI0001'" perf_test
timex psql -e -c "select count(*) from comparisons_4 where code = 80003" perf_test
timex psql -e -c "select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name" perf_test
timex psql -e -c "select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name = p.name" perf_test

------------------------------------------------------------------------
CREATE TABLE "proteins" (
"name" character varying(16),
"organism" text,
"start_position" int4,
"last_position" int4,
"seq" text,
"purpose" text,
"alternate_key" character varying(16),
"comment" text,
"compared" bool,
"complement" bool,
"chromosome" character varying(4),
"essentiality" float8);
QUERY: CREATE TABLE "proteins" (
"name" character varying(16),
"organism" text,
"start_position" int4,
"last_position" int4,
"seq" text,
"purpose" text,
"alternate_key" character varying(16),
"comment" text,
"compared" bool,
"complement" bool,
"chromosome" character varying(4),
"essentiality" float8);
COPY "proteins" FROM stdin;
QUERY: COPY "proteins" FROM stdin;
CREATE INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" );
QUERY: CREATE INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" );
CREATE INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" );
QUERY: CREATE INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" );
CREATE
CREATE
CREATE
EOF

real 1:11.42
user 3.15
sys 0.53

CREATE TABLE "comparisons_4" (
"name1" character varying(16),
"name2" character varying(16),
"z_score" float8,
"expected" float8,
"local_overlap_ratio" float8,
"local_overlap_count" int4,
"overlap_ratio" float8,
"code" int4);
QUERY: CREATE TABLE "comparisons_4" (
"name1" character varying(16),
"name2" character varying(16),
"z_score" float8,
"expected" float8,
"local_overlap_ratio" float8,
"local_overlap_count" int4,
"overlap_ratio" float8,
"code" int4);
COPY "comparisons_4" FROM stdin;
QUERY: COPY "comparisons_4" FROM stdin;
CREATE INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" );
QUERY: CREATE INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" );
CREATE INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" );
QUERY: CREATE INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" );
CREATE INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" );
QUERY: CREATE INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" );
CREATE
CREATE
CREATE
CREATE
EOF

real 16:42.13
user 5.86
sys 0.96

CREATE TABLE "concordance_91" (
"target_name" character varying(16),
"matched_name" character varying(16),
"score" text);
QUERY: CREATE TABLE "concordance_91" (
"target_name" character varying(16),
"matched_name" character varying(16),
"score" text);
REVOKE ALL on "concordance_91" from PUBLIC;
QUERY: REVOKE ALL on "concordance_91" from PUBLIC;
GRANT ALL on "concordance_91" to PUBLIC;
QUERY: GRANT ALL on "concordance_91" to PUBLIC;
COPY "concordance_91" FROM stdin;
QUERY: COPY "concordance_91" FROM stdin;
CREATE
CHANGE
CHANGE
EOF

real 0.30
user 0.02
sys 0.04

explain select * from comparisons_4 where name1 = 'HI0001';
QUERY: explain select * from comparisons_4 where name1 = 'HI0001';
NOTICE: QUERY PLAN:

Index Scan using comparisons_4_name1 on comparisons_4 (cost=2.05 rows=1 width=64)

explain select count(*) from comparisons_4 where code = 80003;
QUERY: explain select count(*) from comparisons_4 where code = 80003;
NOTICE: QUERY PLAN:

Aggregate (cost=2.05 rows=1 width=12)
-> Index Scan using comparisons_4_code on comparisons_4 (cost=2.05 rows=1 width=12)

explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name;
QUERY: explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name;
NOTICE: QUERY PLAN:

Nested Loop (cost=4.10 rows=1 width=36)
-> Index Scan using comparisons_4_name1 on comparisons_4 c (cost=2.05 rows=1 width=12)
-> Index Scan using protein_names on proteins p (cost=2.05 rows=36840 width=24)

explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name = p.name;
QUERY: explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name = p.name;
NOTICE: QUERY PLAN:

Nested Loop (cost=2093.00 rows=36840 width=60)
-> Seq Scan on concordance_91 c (cost=43.00 rows=1000 width=36)
-> Index Scan using protein_names on proteins p (cost=2.05 rows=36840 width=24)

EXPLAIN
EXPLAIN
EXPLAIN
EXPLAIN
EOF
QUERY: select * from comparisons_4 where name1 = 'HI0001'
name1 |name2 |z_score|expected|local_overlap_ratio|local_overlap_count|overlap_ratio| code
------+---------------+-------+--------+-------------------+-------------------+-------------+-----
HI0001|PDB2DBV_O | 1217.4| 0| 0.56716| 335| 0.560468|30012
HI0001|PDB4DBV_O | 1207| 0| 0.56418| 335| 0.557523|30012
HI0001|PDB2GD1_P | 1226.4| 0| 0.57015| 335| 0.563423|30012
HI0001|PDB1GAE_O | 1861.8| 0| 0.83133| 332| 0.814164|30012
HI0001|PDB4GPD_1 | 1357.8| 0| 0.64865| 333| 0.637169|30012
HI0001|HP1346 | 850.3| 6.9e-41| 0.39222| 334| 0.386435|30005
HI0001|TP0844 | 780.3| 5.8e-37| 0.46307| 352| 0.465716|30014
HI0001|PDB1HDG_O | 1020.4| 0| 0.48024| 329| 0.466074|30012
HI0001|SCPIR-DEBYG1 | 1405.2| 0| 0.6497| 334| 0.640117|30000
HI0001|Rv1436 | 970.4| 0| 0.49558| 339| 0.49558|30010
HI0001|PDB1CER_O | 949.7| 0| 0.47734| 331| 0.466075|30012
HI0001|PDB1NLH_ | 935.1| 0| 0.46847| 333| 0.458825|30012
HI0001|PDB1GGA_A | 918| 0| 0.52125| 353| 0.51397|30012
HI0001|PDB1GAD_O | 1869.5| 0| 0.83434| 332| 0.817112|30012
HI0001|PDB1GYP_A | 900.1| 0| 0.51275| 353| 0.505589|30012
HI0001|MG301 | 866.7| 0| 0.43155| 336| 0.427731|30004
HI0001|SCSW-G3P1_YEAST| 1425.3| 0| 0.65868| 334| 0.648965|30000
HI0001|ScTDH1 | 1424.6| 0| 0.65868| 334| 0.648965|30013
HI0001|ScTDH2 | 1405.2| 0| 0.6497| 334| 0.640117|30013
HI0001|SCSW-G3P3_YEAST| 1417.5| 0| 0.65868| 334| 0.648965|30000
HI0001|ScTDH3 | 1416.8| 0| 0.65868| 334| 0.648965|30013
HI0001|SCGP-3720 | 1416.8| 0| 0.66168| 334| 0.651921|30000
HI0001|SCGP-E243731 | 1416.8| 0| 0.65868| 334| 0.648965|30000
HI0001|SCSW-G3P2_YEAST| 1405.9| 0| 0.6497| 334| 0.640117|30000
HI0001|SCGP-1008189 | 1424.6| 0| 0.65868| 334| 0.648965|30000
HI0001|SCGP-3726 | 1398.7| 0| 0.6497| 334| 0.640117|30000
HI0001|PDB3GPD_R | 1432.2| 0| 0.63772| 334| 0.628314|30012
HI0001|HP0921 | 762.6| 5.6e-36| 0.40407| 344| 0.41003|30005
HI0001|MJ1146 | 124.7| 1.9| 0.25094| 267| 0.195338|30007
HI0001|SCGP-3724 | 1371.5| 0| 0.63772| 334| 0.628314|30000
(30 rows)

real 0.18
user 0.02
sys 0.03

QUERY: select count(*) from comparisons_4 where code = 80003
count
-----
3231
(1 row)

real 0.34
user 0.02
sys 0.03

QUERY: select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name
name |purpose
-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
MG263 |hypothetical protein
HP0652 |phosphoserine phosphatase
MJ1594 |phosphoserine phosphatase
MG125 |hypothetical protein
TP0290 |conserved hypothetical protein
HI1033 |phosphoserine phosphatase (o-phosphoserine phosphohydrolase)
HI0597 |hypothetical protein
Rv3813c|(MTCY409.17), len: 273. Unknown, similar to many hypothetical proteins eg. YXEH_BACSU P54947 hypothetical 30.2 kd protein in idh-deor (270 aa), fasta results; opt: 329 z-score: 456.0 E(): 2.2e-18, 32.2% identity in 267 aa overlap
Rv3042c|(MTV012.57c), len: 409. The C-terminal domain (150-409) is highly similar to several SerB proteins e.g. P06862|SERB_ECOLI. N-terminus (1-150) shows no similarity, FASTA score: sp|P06862|SERB_ECOLI PHOSPHOSERINE PHOSPHATASE (EC 3.1 (322 aa) opt: 628 z-score: 753.3 E(): 0; 46.8%identity in 235 aa overlap. TBparse score is 0.884
MG265 |hypothetical protein
(10 rows)

real 0.24
user 0.02
sys 0.03

QUERY: select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name = p.name
target_name|matched_name| score|purpose
-----------+------------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ECinfA |BSInfA |0.680556|initiation factor IF-1
ECinfA |HI0548 | 0.80952|initiation factor IF-1
ECinfA |HP1298 | 0.61111|translation initiation factor EF-1
ECinfA |Rv3462c |0.684936|(MTCY13E12.15c), len: 73 aa. infA. Probable initiation factor IF-1. FASTA results: identical to IF1_MYCBO P45957 initiation factor if-1 (72 aa)
ECrpmA |BB0780 |0.635297|ribosomal protein L27
ECrpmA |HI0879 | 0.87059|ribosomal protein L27
ECrpmA |HP0297 |0.613632|ribosomal protein L27
ECrpmA |Rv2441c |0.616278|(MTCY428.05), len: 86. Probable rpmA, similar to eg RL27_ECOLI P02427 50s ribosomal protein l27, (84 aa), fasta scores, opt: 328, E(): 7.1e-17, (64.2% identity in 81 aa overlap); contains PS00831 Ribosomal protein L27 signature
(8 rows)

real 0.17
user 0.02
sys 0.03

------------------------------------------------------------------------
CREATE DATABASE
CREATE TABLE "proteins" (
"name" character varying(16),
"organism" text,
"start_position" int4,
"last_position" int4,
"seq" text,
"purpose" text,
"alternate_key" character varying(16),
"comment" text,
"compared" bool,
"complement" bool,
"chromosome" character varying(4),
"essentiality" float8);
CREATE
COPY "proteins" FROM stdin;
CREATE INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" );
CREATE
CREATE INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" );
CREATE

real 1:04.49
user 3.14
sys 0.57

CREATE TABLE "comparisons_4" (
"name1" character varying(16),
"name2" character varying(16),
"z_score" float8,
"expected" float8,
"local_overlap_ratio" float8,
"local_overlap_count" int4,
"overlap_ratio" float8,
"code" int4);
CREATE
COPY "comparisons_4" FROM stdin;
CREATE INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" );
CREATE
CREATE INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" );
CREATE
CREATE INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" );
CREATE

real 7:04.43
user 5.87
sys 1.03

CREATE TABLE "concordance_91" (
"target_name" character varying(16),
"matched_name" character varying(16),
"score" text);
CREATE
REVOKE ALL on "concordance_91" from PUBLIC;
CHANGE
GRANT ALL on "concordance_91" to PUBLIC;
CHANGE
COPY "concordance_91" FROM stdin;

real 0.60
user 0.01
sys 0.03

explain select * from comparisons_4 where name1 = 'HI0001';
NOTICE: QUERY PLAN:

Seq Scan on comparisons_4 (cost=0.00..15640.81 rows=5918 width=64)

EXPLAIN
explain select count(*) from comparisons_4 where code = 80003;
NOTICE: QUERY PLAN:

Aggregate (cost=15655.61..15655.61 rows=1 width=0)
-> Seq Scan on comparisons_4 (cost=0.00..15640.81 rows=5918 width=0)

EXPLAIN
explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name;
NOTICE: QUERY PLAN:

Merge Join (cost=22495.22..23029.70 rows=2180283 width=36)
-> Sort (cost=16011.62..16011.62 rows=5918 width=12)
-> Seq Scan on comparisons_4 c (cost=0.00..15640.81 rows=5918 width=12)
-> Sort (cost=6483.60..6483.60 rows=36840 width=24)
-> Seq Scan on proteins p (cost=0.00..3247.40 rows=36840 width=24)

EXPLAIN
explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name = p.name;
NOTICE: QUERY PLAN:

Merge Join (cost=6553.43..7026.43 rows=368400 width=60)
-> Sort (cost=69.83..69.83 rows=1000 width=36)
-> Seq Scan on concordance_91 c (cost=0.00..20.00 rows=1000 width=36)
-> Sort (cost=6483.60..6483.60 rows=36840 width=24)
-> Seq Scan on proteins p (cost=0.00..3247.40 rows=36840 width=24)

EXPLAIN
select * from comparisons_4 where name1 = 'HI0001'
name1 | name2 | z_score | expected | local_overlap_ratio | local_overlap_count | overlap_ratio | code
--------+-----------------+---------+----------+---------------------+---------------------+---------------+-------
HI0001 | PDB1GAD_O | 1869.5 | 0 | 0.83434 | 332 | 0.817112 | 30012
HI0001 | PDB1GAE_O | 1861.8 | 0 | 0.83133 | 332 | 0.814164 | 30012
HI0001 | PDB3GPD_R | 1432.2 | 0 | 0.63772 | 334 | 0.628314 | 30012
HI0001 | SCSW-G3P1_YEAST | 1425.3 | 0 | 0.65868 | 334 | 0.648965 | 30000
HI0001 | SCGP-1008189 | 1424.6 | 0 | 0.65868 | 334 | 0.648965 | 30000
HI0001 | ScTDH1 | 1424.6 | 0 | 0.65868 | 334 | 0.648965 | 30013
HI0001 | SCSW-G3P3_YEAST | 1417.5 | 0 | 0.65868 | 334 | 0.648965 | 30000
HI0001 | ScTDH3 | 1416.8 | 0 | 0.65868 | 334 | 0.648965 | 30013
HI0001 | SCGP-3720 | 1416.8 | 0 | 0.66168 | 334 | 0.651921 | 30000
HI0001 | SCGP-E243731 | 1416.8 | 0 | 0.65868 | 334 | 0.648965 | 30000
HI0001 | SCSW-G3P2_YEAST | 1405.9 | 0 | 0.6497 | 334 | 0.640117 | 30000
HI0001 | ScTDH2 | 1405.2 | 0 | 0.6497 | 334 | 0.640117 | 30013
HI0001 | SCPIR-DEBYG1 | 1405.2 | 0 | 0.6497 | 334 | 0.640117 | 30000
HI0001 | SCGP-3726 | 1398.7 | 0 | 0.6497 | 334 | 0.640117 | 30000
HI0001 | SCGP-3724 | 1371.5 | 0 | 0.63772 | 334 | 0.628314 | 30000
HI0001 | PDB4GPD_1 | 1357.8 | 0 | 0.64865 | 333 | 0.637169 | 30012
HI0001 | PDB2GD1_P | 1226.4 | 0 | 0.57015 | 335 | 0.563423 | 30012
HI0001 | PDB2DBV_O | 1217.4 | 0 | 0.56716 | 335 | 0.560468 | 30012
HI0001 | PDB4DBV_O | 1207 | 0 | 0.56418 | 335 | 0.557523 | 30012
HI0001 | PDB1HDG_O | 1020.4 | 0 | 0.48024 | 329 | 0.466074 | 30012
HI0001 | Rv1436 | 970.4 | 0 | 0.49558 | 339 | 0.49558 | 30010
HI0001 | PDB1CER_O | 949.7 | 0 | 0.47734 | 331 | 0.466075 | 30012
HI0001 | PDB1NLH_ | 935.1 | 0 | 0.46847 | 333 | 0.458825 | 30012
HI0001 | PDB1GGA_A | 918 | 0 | 0.52125 | 353 | 0.51397 | 30012
HI0001 | PDB1GYP_A | 900.1 | 0 | 0.51275 | 353 | 0.505589 | 30012
HI0001 | MG301 | 866.7 | 0 | 0.43155 | 336 | 0.427731 | 30004
HI0001 | HP1346 | 850.3 | 6.9e-41 | 0.39222 | 334 | 0.386435 | 30005
HI0001 | TP0844 | 780.3 | 5.8e-37 | 0.46307 | 352 | 0.465716 | 30014
HI0001 | HP0921 | 762.6 | 5.6e-36 | 0.40407 | 344 | 0.41003 | 30005
HI0001 | MJ1146 | 124.7 | 1.9 | 0.25094 | 267 | 0.195338 | 30007
(30 rows)

real 22.68
user 0.01
sys 0.03

select count(*) from comparisons_4 where code = 80003
count
-------
3231
(1 row)

real 21.49
user 0.01
sys 0.03

select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name
name | purpose
---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HI0597 | hypothetical protein
HI1033 | phosphoserine phosphatase (o-phosphoserine phosphohydrolase)
HP0652 | phosphoserine phosphatase
MG125 | hypothetical protein
MG263 | hypothetical protein
MG265 | hypothetical protein
MJ1594 | phosphoserine phosphatase
Rv3042c | (MTV012.57c), len: 409. The C-terminal domain (150-409) is highly similar to several SerB proteins e.g. P06862|SERB_ECOLI. N-terminus (1-150) shows no similarity, FASTA score: sp|P06862|SERB_ECOLI PHOSPHOSERINE PHOSPHATASE (EC 3.1 (322 aa) opt: 628 z-score: 753.3 E(): 0; 46.8%identity in 235 aa overlap. TBparse score is 0.884
Rv3813c | (MTCY409.17), len: 273. Unknown, similar to many hypothetical proteins eg. YXEH_BACSU P54947 hypothetical 30.2 kd protein in idh-deor (270 aa), fasta results; opt: 329 z-score: 456.0 E(): 2.2e-18, 32.2% identity in 267 aa overlap
TP0290 | conserved hypothetical protein
(10 rows)

real 23.13
user 0.01
sys 0.03

select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name = p.name
target_name | matched_name | score | purpose
-------------+--------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ECrpmA | BB0780 | 0.635297 | ribosomal protein L27
ECinfA | BSInfA | 0.680556 | initiation factor IF-1
ECinfA | HI0548 | 0.80952 | initiation factor IF-1
ECrpmA | HI0879 | 0.87059 | ribosomal protein L27
ECrpmA | HP0297 | 0.613632 | ribosomal protein L27
ECinfA | HP1298 | 0.61111 | translation initiation factor EF-1
ECrpmA | Rv2441c | 0.616278 | (MTCY428.05), len: 86. Probable rpmA, similar to eg RL27_ECOLI P02427 50s ribosomal protein l27, (84 aa), fasta scores, opt: 328, E(): 7.1e-17, (64.2% identity in 81 aa overlap); contains PS00831 Ribosomal protein L27 signature
ECinfA | Rv3462c | 0.684936 | (MTCY13E12.15c), len: 73 aa. infA. Probable initiation factor IF-1. FASTA results: identical to IF1_MYCBO P45957 initiation factor if-1 (72 aa)
(8 rows)

real 11.16
user 0.01
sys 0.03

#4Noname
bruc@stone.congenomics.com
In reply to: Tom Lane (#2)
Re: Performance degradation in PostgreSQL 7.1beta3 vs 6.5.3

Dear Tom,

You can't afford to run a VACUUM ANALYZE even once in the lifetime of
the table?

Not very often at best, and certainly not routinely. Some of my
tables exceed 10GB and have multiple indices.

However, to test your suggestion, I modified my performance test
script to "VACUUM ANALYZE" all the tables prior to invoking EXPLAIN,
and it improves all of the searches except this one (EXPLAIN output
also included):

explain select count(*) from comparisons_4 where code = 80003;
NOTICE: QUERY PLAN:

Aggregate (cost=15659.29..15659.29 rows=1 width=0)
-> Seq Scan on comparisons_4 (cost=0.00..15640.81 rows=7391 width=0)

EXPLAIN

The choice of sequential scan here takes 30x longer.

More importantly, PostgreSQL 6.5.3 works very, very well without
VACUUM'ing.

6.5 effectively assumes that "foo = constant" will select exactly one
row, if it has no statistics to prove otherwise. I don't regard that
as a well-chosen default, even if it does happen to work OK for your
application. Selecting an indexscan when a seqscan is needed is just
as evil as doing the reverse; what's much worse is that 6.5 will
pick incredibly bad join plans (ie, nested loops) because it thinks
that very little data is coming out of the scans.

I've tuned my applications to work well with these defaults (and
demonstrate to my peers that PostgreSQL is comparable to Oracle in
performance for these types of queries). I am willing to make changes
to my applications to make them work as well with 7.1, but the
performance of the query above worries me. I think the current planner
will make the wrong decision more often than the right one. To test
this further on this table, I went through the comparisons_4 table and
found that code 13 appears the most (73912) out of 591825 rows. In
this case, 6.5.3 takes 8.56 seconds to return its answer, whereas 7.1
takes 12.11 seconds. Even in the worst case for this table, the
indexed scan is faster, but the optimizer decides on the sequential
scan. It appears that the decision point for the switch to sequential
scans isn't set properly. To me, this is a bug.

If you want to revert to the 6.5 behavior without doing a VACUUM, you
could probably get pretty close with
update pg_attribute set attdispersion = -1.0;

Does VACUUM ANALYZE set this column to its calculated value? What
kinds of queries would not give 6.5 behavior if I set this column as
you suggest?

Alternatively, how hard would it be to add another SET variable like
USE_6_5_PLANNING_RULES? Personally, that would be most helpful from an
application development viewpoint because I could switch to PostgreSQL
7.1 without destroying the performance of my applications, and then
test new versions with the 7.1 planner with less potential for service
disruption.

Stats-gathering and planning certainly does need a great deal of
additional work, but I'm afraid that none of that will happen before
7.1.

As I said above, I've put a lot of effort into making my applications
work quickly with Postgres, and I'm looking forward to using the new
features that are available with version 7.1. However, I'm very
concerned that I will not be able to achieve the same performance
without detailed knowledge of the internals. Shouldn't I be assured
that I will improve the performance of a query by creating a index on
the fields used for selecting the row? That is not the case for the
query above.

Finally, I apologize for being a little strident here. I've been
advocating for and using Postgres for four years, and it's frustrating
when a new version results in a serious and noticeable performance
degradation.

Sincerely,
Bob

+----------------------------------+------------------------------------+
| Robert E. Bruccoleri, Ph.D.      | Phone: 609 737 6383                |
| President, Congenomics, Inc.     | Fax:   609 737 7528                |
| 114 W Franklin Ave, Suite K1,4,5 | email: bruc@acm.org                |
| P.O. Box 314                     | URL:   http://www.congen.com/~bruc |
| Pennington, NJ 08534             |                                    |
+----------------------------------+------------------------------------+
#5Hannu Krosing
hannu@tm.ee
In reply to: Noname (#4)
Re: Re: Performance degradation in PostgreSQL 7.1beta3 vs 6.5.3

"Robert E. Bruccoleri" wrote:

explain select count(*) from comparisons_4 where code = 80003;
NOTICE: QUERY PLAN:

Aggregate (cost=15659.29..15659.29 rows=1 width=0)
-> Seq Scan on comparisons_4 (cost=0.00..15640.81 rows=7391 width=0)

EXPLAIN

What is the type of field "code" ?

---------------
Hannu