Optimizer generates bad plans.
While adding schema support to the JDBC Driver, I came across a query
which occasionally generates some spectacularly bad plans. I have
attached the query and explain analyze outputs against today's cvs head
for queries that take between 9 and 845941 msec. In the JDBC Driver I
will specify a reasonable join order using explicit JOINs, but I thought
someone might be interested in a test case for the optimizer.
Kris Jurka
The query tries to determine what foreign keys exist between the
following tables.
create table people (id int4 primary key, name text);
create table policy (id int4 primary key, name text);
create table users (id int4 primary key, people_id int4,
policy_id int4,
CONSTRAINT people FOREIGN KEY (people_id) references people(id),
constraint policy FOREIGN KEY (policy_id) references policy(id));
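For context, the kind of catalog lookup involved might be sketched like this (an untested sketch against the pg_constraint and pg_class catalogs; column names follow the 7.3 development tree):

```sql
-- Sketch: list foreign key constraints and the tables they connect.
SELECT c.conname,
       t.relname AS referencing_table,
       r.relname AS referenced_table
FROM pg_constraint c
JOIN pg_class t ON t.oid = c.conrelid
JOIN pg_class r ON r.oid = c.confrelid
WHERE c.contype = 'f';
```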
Congratulations. That is the largest plan I have ever seen. ;-)
---------------------------------------------------------------------------
Kris Jurka wrote:
While adding schema support to the JDBC Driver, I came across a query
which occasionally generates some spectacularly bad plans. I have
attached the query and explain analyze outputs against today's cvs head
for queries that take between 9 and 845941 msec. In the JDBC Driver I
will specify a reasonable join order using explicit JOINs, but I thought
someone might be interested in a test case for the optimizer.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Well I was really hoping pg_constraint would solve all my problems, but
since contrib/array is not installed by default the conkeys and confkeys
columns aren't terribly useful because they can't be joined to
pg_attribute.
Also there is not a column to tell you the unique constraint that
supports a given foreign key constraint.
See my post to bugs:
http://fts.postgresql.org/db/mw/msg.html?mid=1074855
Kris Jurka
On Thu, 19 Sep 2002, Bruce Momjian wrote:
Congratulations. That is the largest plan I have ever seen. ;-)
---------------------------------------------------------------------------
Kris Jurka wrote:
While adding schema support to the JDBC Driver, I came across a query
which occasionally generates some spectacularly bad plans. I have
attached the query and explain analyze outputs against today's cvs head
for queries that take between 9 and 845941 msec. In the JDBC Driver I
will specify a reasonable join order using explicit JOINs, but I thought
someone might be interested in a test case for the optimizer.
Kris Jurka <jurka@ejurka.com> writes:
While adding schema support to the JDBC Driver, I came across a query
which occasionally generates some spectacularly bad plans.
Hm, does an ANALYZE help?
regards, tom lane
Maybe not nice, but there's only 32 (64 now?) of them...
JOIN pg_attribute WHERE attnum IN (conkeys[1], conkeys[2], conkeys[3],
..., conkeys[32])
Great fun...
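Written out, that workaround might look something like the following (a sketch, not tested against the development tree; the conkeys column name and the 32-subscript limit follow the discussion above):

```sql
-- Sketch: match foreign key constraint columns to pg_attribute without
-- array support, by enumerating every possible conkeys subscript.
SELECT c.conname, a.attname
FROM pg_constraint c
JOIN pg_attribute a
  ON a.attrelid = c.conrelid
 AND a.attnum IN (c.conkeys[1], c.conkeys[2], c.conkeys[3]
                  /* ..., c.conkeys[32] */)
WHERE c.contype = 'f';
```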
On Thu, 2002-09-19 at 18:31, Kris Jurka wrote:
Well I was really hoping pg_constraint would solve all my problems, but
since contrib/array is not installed by default the conkeys and confkeys
columns aren't terribly useful because they can't be joined to
pg_attribute.
Also there is not a column to tell you the unique constraint that
supports a given foreign key constraint.
See my post to bugs:
http://fts.postgresql.org/db/mw/msg.html?mid=1074855
Kris Jurka
--
Rod Taylor
Kris Jurka <jurka@ejurka.com> writes:
While adding schema support to the JDBC Driver, I came across a
query which occasionally generates some spectacularly bad plans.
Interesting. The inconsistency you're seeing is a result of GEQO. I
would have hoped that it would have produced a better quality plan
more often, but apparently not. On my system, the regular query
optimizer handily beats GEQO for this query: it produces more
efficient query plans 100% of the time and takes less time to do so.
For *this* query at least, raising geqo_threshold would be a good
idea, but that may not be true universally.
I thought someone might be interested in a test case for the
optimizer.
Thanks, it's a useful query -- I've been meaning to take a look at
GEQO for a while now...
Cheers,
Neil
--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
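For anyone who wants to try this, the threshold is an ordinary run-time setting (a sketch; the value 20 here is arbitrary, not a recommendation):

```sql
-- Use the exhaustive planner for joins of fewer than 20 relations:
SET geqo_threshold = 20;
-- Or bypass GEQO entirely for the current session:
SET geqo = off;
```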
On Thu, 19 Sep 2002, Tom Lane wrote:
Kris Jurka <jurka@ejurka.com> writes:
While adding schema support to the JDBC Driver, I came across a query
which occasionally generates some spectacularly bad plans.
Hm, does an ANALYZE help?
Yes, it does, but I don't understand why. The query is entirely against
pg_catalog tables which have had all of three tables added to them. How
can the new ANALYZE stats be significantly different from what came from
the already-ANALYZEd template1?
Kris Jurka
Neil Conway <neilc@samurai.com> writes:
Interesting. The inconsistency you're seeing is a result of GEQO. I
would have hoped that it would have produced a better quality plan
more often, but apparently not. On my system, the regular query
optimizer handily beats GEQO for this query: it produces more
efficient query plans 100% of the time and takes less time to do so.
For *this* query at least, raising geqo_threshold would be a good
idea, but that may not be true universally.
The current GEQO threshold was set some time ago; since then, the
regular optimizer has been improved while the GEQO code hasn't been
touched. It might well be time to ratchet up the threshold.
Anyone care to do some additional experiments?
regards, tom lane
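A rough experimental setup might look like the following in psql (a sketch; <query> stands in for the problem query from the original report):

```sql
-- Time the same join with each planner:
SET geqo = off;           -- exhaustive planner
EXPLAIN ANALYZE <query>;
SET geqo = on;
SET geqo_threshold = 2;   -- force GEQO even for small joins
EXPLAIN ANALYZE <query>;
```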
On Thu, 19 Sep 2002, Kris Jurka wrote:
On Thu, 19 Sep 2002, Tom Lane wrote:
Kris Jurka <jurka@ejurka.com> writes:
While adding schema support to the JDBC Driver, I came across a query
which occasionally generates some spectacularly bad plans.
Hm, does an ANALYZE help?
Yes, it does, but I don't understand why. The query is entirely against
pg_catalog tables which have had all of three tables added to them. How
can the new ANALYZE stats be significantly different from what came from
the already-ANALYZEd template1?
Kris Jurka
Looking at the differences in statistics before and after the ANALYZE the
only differences are in correlation. This comes from initdb around line
1046...
"$PGPATH"/postgres $PGSQL_OPT template1 >/dev/null <<EOF
ANALYZE;
VACUUM FULL FREEZE;
EOF
Could this be done better in the one step VACUUM FULL FREEZE ANALYZE or
ANALYZING after the VACUUM FULL?
Kris Jurka
Kris Jurka <books@ejurka.com> writes:
Looking at the differences in statistics before and after the ANALYZE the
only differences are in correlation. This comes from initdb around line
1046...
"$PGPATH"/postgres $PGSQL_OPT template1 >/dev/null <<EOF
ANALYZE;
VACUUM FULL FREEZE;
EOF
Could this be done better in the one step VACUUM FULL FREEZE ANALYZE or
ANALYZING after the VACUUM FULL?
Hm. We can't do it like that, because that would leave the pg_statistic
rows unfrozen. I suppose we could do
VACUUM FULL;
ANALYZE;
VACUUM FREEZE;
regards, tom lane
Tom Lane wrote:
Neil Conway <neilc@samurai.com> writes:
Interesting. The inconsistency you're seeing is a result of GEQO. I
would have hoped that it would have produced a better quality plan
more often, but apparently not. On my system, the regular query
optimizer handily beats GEQO for this query: it produces more
efficient query plans 100% of the time and takes less time to do so.
For *this* query at least, raising geqo_threshold would be a good
idea, but that may not be true universally.
The current GEQO threshold was set some time ago; since then, the
regular optimizer has been improved while the GEQO code hasn't been
touched. It might well be time to ratchet up the threshold.
Anyone care to do some additional experiments?
Added to TODO:
* Check GUC geqo_threshold to see if it is still accurate
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073