important decrease of performance using the BETA version in one particular case

Started by Isabelle Therrienabout 25 years ago5 messagesbugs
Jump to latest
#1Isabelle Therrien
therriei@LUB.UMontreal.CA

(sorry for previous HTML)

Your name: Isabelle Therrien
Your email address : therriei@cirano.qc.ca

System Configuration
---------------------
Architecture : Intel Pentium

Operating System : Linux

PostgreSQL version : PostgreSQL-7.1 Beta (but I use JDBC drivers
v7.0.2)

I'm using Weblogic Server 5.1

Please enter a FULL description of your problem:
------------------------------------------------
I observed an important decrease of performance using the BETA version of
PostgreSQL in one
particular case:

I have a big query, reported below, that is called several times in my
application.
At least 4 active connections call it at the same time.
Normally, this query is executed in about 30-50 milliseconds.
But after a while (depending on how many connections are used, and how
often the query is called),
the query is executed in 1000ms, then 2000ms, and it continues to grow
exponentially. I've already
seen it reaching 80 seconds.

While having these time reports, I tried to query the database directly
using "psql", and I got the
same result (around 80 sec)

I compared this to an older version (7.0.2) of PostgreSQL, in the same
circumstances, and the worst
I had was a 4000ms, without going any further. That's why I suppose the
problem comes from the BETA
version.

----------
The query: (the ? are replaced by 'GPK', GPK being the key we want to look
for)
----------
SELECT quotes.xmldocument, prodrefs.xmldocument, orders.filteredorder,
responses.xmldocument
FROM quotes,prodrefs,responses,orders
WHERE quotes.negotiationGPK = ?
AND quotes.ctime = (SELECT MAX(ctime) FROM quotes WHERE
negotiationgpk = ?)
AND prodrefs.negotiationGPK = ?
AND orders.productreferenceGPK = prodrefs.gpk
AND owner=u1 AND overridden=FALSE
AND responses.orderGPK = orders.gpk
AND responses.ctime = (SELECT MAX(ctime) FROM responses WHERE
ordergpk=orders.gpk)
UNION
SELECT quotes.xmldocument,prodrefs.xmldocument,(''),('')
FROM quotes,prodrefs
WHERE quotes.negotiationGPK = ?
AND quotes.ctime = (SELECT MAX(ctime) FROM quotes WHERE
negotiationgpk = ?)
AND prodrefs.negotiationGPK = ?

----------- 
The tables: 
----------- 
            Table "quotes" 
   Attribute    |   Type    | Modifier 
----------------+-----------+---------- 
 gpk            | bigint    | not null 
 xmldocument    | text      | not null 
 negotiationgpk | bigint    | not null 
 ctime          | timestamp | not null 
Index: quotes_pkey 
           Table "prodrefs" 
   Attribute    |   Type    | Modifier 
----------------+-----------+---------- 
 gpk            | bigint    | not null 
 ctime          | timestamp | not null 
 xmldocument    | text      | not null 
 negotiationgpk | bigint    | not null 
Index: prodrefs_pkey 
             Table "orders" 
      Attribute      |   Type    | Modifier 
---------------------+-----------+---------- 
 gpk                 | bigint    | not null 
 ctime               | timestamp | not null 
 productreferencegpk | bigint    | 
 owner               | text      | not null 
 overridden          | boolean   | 
 overridingordergpk  | bigint    | 
 rawdata             | text      | not null 
 filteredorder       | text      | not null 
 previousquotegpk    | bigint    | 
Index: orders_pkey 
         Table "responses" 
  Attribute  |   Type    | Modifier 
-------------+-----------+---------- 
 gpk         | bigint    | not null 
 ctime       | timestamp | not null 
 ordergpk    | bigint    | 
 xmldocument | text      | not null 
 quotegpk    | bigint    | 
Index: responses_pkey 
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Isabelle Therrien (#1)
Re: important decrease of performance using the BETA version in one particular case

Isabelle Therrien <therriei@LUB.UMontreal.CA> writes:

I have a big query, reported below, that is called several times in my
application.
At least 4 active connections call it at the same time.
Normally, this query is executed in about 30-50 milliseconds.
But after a while (depending on how many connections are used, and how
often the query is called),
the query is executed in 1000ms, then 2000ms, and it continues to grow
exponentially. I've already seen it reaching 80 seconds.

Hmm, that's odd.  What causes the time to drop back down to milliseconds
--- do you have to restart the whole database, or just run it in a new
backend?  Does the amount of memory being used by the backend increase
as the time goes up?  What does EXPLAIN show as the query plan for the
query?  How large are the tables, and how many tuples are actually
retrieved?

Also, which beta release exactly, and how did you build it (what
configure options)?

Finally, it would be nice to see the full schemas for these tables, to
be sure we're not missing something. You can generate those via

pg_dump -s -t tablename databasename

regards, tom lane

#3Isabelle Therrien
therriei@LUB.UMontreal.CA
In reply to: Isabelle Therrien (#1)
Re: important decrease of performance using the BETA version in one particular case

Tom Lane wrote:

Isabelle Therrien <therriei@LUB.UMontreal.CA> writes:

I have a big query, reported below, that is called several times in my
application.
At least 4 active connections call it at the same time.
Normally, this query is executed in about 30-50 milliseconds.
But after a while (depending on how many connections are used, and how
often the query is called),
the query is executed in 1000ms, then 2000ms, and it continues to grow
exponentially. I've already seen it reaching 80 seconds.

Hmm, that's odd.  What causes the time to drop back down to milliseconds
--- do you have to restart the whole database, or just run it in a new
backend?  Does The amount of memory being used by the backend increase
as the time goes up?  What does EXPLAIN show as the query plan for the
query?  How large are the tables, and how many tuples are actually
retrieved?

Also, which beta release exactly, and how did you build it (what
configure options)?

Finally, it would be nice to see the full schemas for these tables, to
be sure we're not missing something. You can generate those via

pg_dump -s -t tablename databasename

regards, tom lane

The only way the time drops down to milliseconds is to wait a couple of hours or the restart the
whole database.
The amount of memory being used by the backend doesn't seem to increase significantly.
The tables are emptied often. We don't keep these datas. So there's never more than 50 tuples per
table. And with this query, about 3-4 tuples are retrieved.
I have these results with BETA1 (no idea what configure options were used) AND BETA4(rpm).

EXPLAIN:

NOTICE: QUERY PLAN:
------------------
(PostgreSQL 7.0.2)
------------------
Unique (cost=10766.48..10766.50 rows=0 width=88)
InitPlan
-> Aggregate (cost=18.69..18.69 rows=1 width=8)
-> Seq Scan on quotes (cost=0.00..18.69 rows=1 width=8)
-> Aggregate (cost=18.69..18.69 rows=1 width=8)
-> Seq Scan on quotes (cost=0.00..18.69 rows=1 width=8)
-> Sort (cost=10766.48..10766.48 rows=2 width=88)
-> Append (cost=0.00..10766.47 rows=2 width=88)
-> Nested Loop (cost=0.00..10745.06 rows=1 width=88)
-> Nested Loop (cost=0.00..10726.03 rows=1 width=76)
-> Nested Loop (cost=0.00..18.01 rows=1 width=48)
-> Seq Scan on orders (cost=0.00..15.98 rows=1 width=28)
-> Index Scan using prodrefs_pkey on prodrefs (cost=0.00..2.02
rows=1 width=20)
-> Seq Scan on responses (cost=0.00..25.96 rows=396 width=28)
SubPlan
-> Aggregate (cost=26.96..26.96 rows=1 width=8)
-> Seq Scan on responses (cost=0.00..26.95 rows=4 width=8)
-> Seq Scan on quotes (cost=0.00..19.02 rows=1 width=12)
-> Nested Loop (cost=0.00..21.41 rows=1 width=24)
-> Seq Scan on prodrefs (cost=0.00..2.38 rows=1 width=12)
-> Seq Scan on quotes (cost=0.00..19.02 rows=1 width=12)

-----------------------
(PostgreSQL 7.1 BETA 1)
-----------------------
Unique (cost=0.06..0.08 rows=1 width=88)
-> Sort (cost=0.06..0.06 rows=2 width=88)
-> Append (cost=0.00..0.05 rows=2 width=88)
-> Subquery Scan *SELECT* 1 (cost=0.00..0.04 rows=1 width=88)
-> Nested Loop (cost=0.00..0.04 rows=1 width=88)
InitPlan
-> Aggregate (cost=0.00..0.00 rows=1 width=8)
-> Seq Scan on quotes (cost=0.00..0.00 rows=1 width=8)
-> Nested Loop (cost=0.00..0.03 rows=1 width=68)
-> Nested Loop (cost=0.00..0.01 rows=1 width=40)
-> Seq Scan on quotes (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on responses (cost=0.00..0.00 rows=1 width=28)
-> Seq Scan on orders (cost=0.00..0.00 rows=1 width=28)
SubPlan
-> Aggregate (cost=0.00..0.00 rows=1 width=8)
-> Seq Scan on responses (cost=0.00..0.00 rows=1 width=8)
-> Seq Scan on prodrefs (cost=0.00..0.00 rows=1 width=20)
-> Subquery Scan *SELECT* 2 (cost=0.00..0.01 rows=1 width=24)
-> Nested Loop (cost=0.00..0.01 rows=1 width=24)
InitPlan
-> Aggregate (cost=0.00..0.00 rows=1 width=8)
-> Seq Scan on quotes (cost=0.00..0.00 rows=1 width=8)
-> Seq Scan on quotes (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on prodrefs (cost=0.00..0.00 rows=1 width=12)

FULL SCH�MAS:

CREATE TABLE "quotes" (
"gpk" int8 NOT NULL,
"xmldocument" text NOT NULL,
"negotiationgpk" int8 NOT NULL,
"ctime" timestamp NOT NULL,
PRIMARY KEY ("gpk")
);
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER INSERT OR UPDATE ON "quotes" NOT DEFERRABLE INITIALLY
IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_check_ins" ('<unnamed>', 'quotes', 'negotiations',
'UNSPECIFIED', 'negotiationgpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER DELETE ON "quotes" NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_cascade_del" ('<unnamed>', 'orders', 'quotes',
'UNSPECIFIED', 'previousquotegpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER UPDATE ON "quotes" NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_noaction_upd" ('<unnamed>', 'orders', 'quotes',
'UNSPECIFIED', 'previousquotegpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER DELETE ON "quotes" NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_cascade_del" ('<unnamed>', 'responses', 'quotes',
'UNSPECIFIED', 'quotegpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER UPDATE ON "quotes" NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_noaction_upd" ('<unnamed>', 'responses', 'quotes',
'UNSPECIFIED', 'quotegpk', 'gpk');

CREATE TABLE "prodrefs" (
"gpk" int8 NOT NULL,
"ctime" timestamp NOT NULL,
"xmldocument" text NOT NULL,
"negotiationgpk" int8 NOT NULL,
PRIMARY KEY ("gpk")
);
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER INSERT OR UPDATE ON "prodrefs" NOT DEFERRABLE INITIALLY
IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_check_ins" ('<unnamed>', 'prodrefs',
'negotiations', 'UNSPECIFIED', 'negotiationgpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER DELETE ON "prodrefs" NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_cascade_del" ('<unnamed>', 'orders', 'prodrefs',
'UNSPECIFIED', 'productreferencegpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER UPDATE ON "prodrefs" NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_noaction_upd" ('<unnamed>', 'orders', 'prodrefs',
'UNSPECIFIED', 'productreferencegpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER DELETE ON "prodrefs" NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_cascade_del" ('<unnamed>', 'adjudications', 'prodrefs',
'UNSPECIFIED', 'productreferencegpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER UPDATE ON "prodrefs" NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_noaction_upd" ('<unnamed>', 'adjudications', 'prodrefs',
'UNSPECIFIED', 'productreferencegpk', 'gpk');

CREATE TABLE "orders" (
"gpk" int8 NOT NULL,
"ctime" timestamp NOT NULL,
"productreferencegpk" int8,
"owner" text NOT NULL,
"overridden" bool,
"overridingordergpk" int8,
"rawdata" text NOT NULL,
"filteredorder" text NOT NULL,
"previousquotegpk" int8,
PRIMARY KEY ("gpk")
);
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER INSERT OR UPDATE ON "orders" NOT DEFERRABLE INITIALLY
IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_check_ins" ('<unnamed>', 'orders', 'prodrefs',
'UNSPECIFIED', 'productreferencegpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER INSERT OR UPDATE ON "orders" NOT DEFERRABLE INITIALLY
IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_check_ins" ('<unnamed>', 'orders', 'orders',
'UNSPECIFIED', 'overridingordergpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER DELETE ON "orders" NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_cascade_del" ('<unnamed>', 'orders', 'orders',
'UNSPECIFIED', 'overridingordergpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER UPDATE ON "orders" NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_noaction_upd" ('<unnamed>', 'orders', 'orders',
'UNSPECIFIED', 'overridingordergpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER INSERT OR UPDATE ON "orders" NOT DEFERRABLE INITIALLY
IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_check_ins" ('<unnamed>', 'orders', 'quotes',
'UNSPECIFIED', 'previousquotegpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER DELETE ON "orders" NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_cascade_del" ('<unnamed>', 'adjudications', 'orders',
'UNSPECIFIED', 'sellordergpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER UPDATE ON "orders" NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_noaction_upd" ('<unnamed>', 'adjudications', 'orders',
'UNSPECIFIED', 'sellordergpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER DELETE ON "orders" NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_cascade_del" ('<unnamed>', 'adjudications', 'orders',
'UNSPECIFIED', 'buyordergpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER UPDATE ON "orders" NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_noaction_upd" ('<unnamed>', 'adjudications', 'orders',
'UNSPECIFIED', 'buyordergpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER DELETE ON "orders" NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_cascade_del" ('<unnamed>', 'responses', 'orders',
'UNSPECIFIED', 'ordergpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER UPDATE ON "orders" NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_noaction_upd" ('<unnamed>', 'responses', 'orders',
'UNSPECIFIED', 'ordergpk', 'gpk');

CREATE TABLE "responses" (
"gpk" int8 NOT NULL,
"ctime" timestamp NOT NULL,
"ordergpk" int8,
"xmldocument" text NOT NULL,
"quotegpk" int8,
PRIMARY KEY ("gpk")
);
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER INSERT OR UPDATE ON "responses" NOT DEFERRABLE
INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_check_ins" ('<unnamed>', 'responses',
'orders', 'UNSPECIFIED', 'ordergpk', 'gpk');
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER INSERT OR UPDATE ON "responses" NOT DEFERRABLE
INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_check_ins" ('<unnamed>', 'responses',
'quotes', 'UNSPECIFIED', 'quotegpk', 'gpk');

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Isabelle Therrien (#3)
Re: important decrease of performance using the BETA version in one particular case

Isabelle Therrien <therriei@LUB.UMontreal.CA> writes:

The tables are emptied often. We don't keep these datas. So there's
never more than 50 tuples per table. And with this query, about 3-4
tuples are retrieved.

Well, it would appear that in the 7.1 installation, you last vacuumed
the tables just after emptying them --- notice how all the cost
estimates are nearly zero. The 7.0 optimizer on the other hand is
working with more reasonable cost values, and is presumably able to
select a smarter plan because of that.

I'd suggest making a practice of vacuum analyzing the tables just before
you empty them, not just after. This may seem weird but it will leave
the optimizer with appropriate statistics.

If you see a decrease in performance even when 7.1 and 7.0 are being
given equivalent vacuum statistics, then I'd like to know more.

BTW, this query could be rewritten to be much more efficient by using
outer joins and SELECT DISTINCT ON ... but that's not really relevant
to the question of why 7.1 is slower than 7.0 for you ...

regards, tom lane

#5Isabelle Therrien
therriei@LUB.UMontreal.CA
In reply to: Isabelle Therrien (#3)
Re: important decrease of performance using the BETA version in one particular case

Thanks, I will try that. It is in fact possible that I vacuumed the tables after emptying them. I
couldn't imagine it had so much effect on the optimizer. But it didn't happen just once, so I
wonder if the problem is only related to that.
And thanks for the tip for my query, I will surely change it.

Isabelle Therrien

Tom Lane wrote:

Show quoted text

Isabelle Therrien <therriei@LUB.UMontreal.CA> writes:

The tables are emptied often. We don't keep these datas. So there's
never more than 50 tuples per table. And with this query, about 3-4
tuples are retrieved.

Well, it would appear that in the 7.1 installation, you last vacuumed
the tables just after emptying them --- notice how all the cost
estimates are nearly zero. The 7.0 optimizer on the other hand is
working with more reasonable cost values, and is presumably able to
select a smarter plan because of that.

I'd suggest making a practice of vacuum analyzing the tables just before
you empty them, not just after. This may seem weird but it will leave
the optimizer with appropriate statistics.

If you see a decrease in performance even when 7.1 and 7.0 are being
given equivalent vacuum statistics, then I'd like to know more.

BTW, this query could be rewritten to be much more efficient by using
outer joins and SELECT DISTINCT ON ... but that's not really relevant
to the question of why 7.1 is slower than 7.0 for you ...

regards, tom lane