Weird query plans for my queries, causing terrible performance.

Started by Arjen van der Meijdenabout 23 years ago5 messagesgeneral
Jump to latest
#1Arjen van der Meijden
acm@tweakers.net

To create a list of recent topics in a forum I use these queries in mysql:
SELECT TopicID, StatusID, UserID
FROM F_Topics
WHERE ForumID = 15 AND Lastmessage > '2002-08-01 23:27:03+02'
AND Deleted = false
ORDER BY Lastmessage DESC
LIMIT 1000

and

SELECT TopicID, StatusID, UserID
FROM F_Topics
WHERE StatusID IN(1) AND Deleted = false AND ForumID = 15

Where in the client-code some simple checks are done on the queryresults
(for the rights a user has), which would make the queries quite slow on
mysql.

And after that it simply selects the needed info from the database using
a in list:

SELECT F_Topics.Name, F_Topics.TopicID, etc
FROM
F_Topics
LEFT JOIN
F_Users AS L_Users ON (L_Users.UserID = F_Topics.LastposterID)
, F_Users
WHERE
TopicID IN (list of topicids gathered from the above queries)
AND
F_Users.UserID = F_Topics.UserID
AND
!F_Topics.Deleted
AND
F_Topics.ForumID=15

Together it takes around 60ms to finish in mysql. That isn't the nicest
query setup. So I tried to create a single query for this in postgresql.

The F_Topics table has some fields, including TopicID (primary key),
UserID (foreign key to F_Users.UserID), LastPosterID (nullable foreign
to F_Users.UserID), LastMessage (timestamp), ForumID, StatusID (being a
integer with just 3 different values) and deleted (boolean)

There are three indices:
f_topics_pkey (on the topicid)
f_topics_forum_lastmessage_deleted (forumid, lastmessage, deleted)
f_topics_forum_status_deleted (forumid, statusid, deleted)

and the only relevant index on the F_Users table is its primary key on
UserID.

The query:
SELECT F_Topics.TopicID,F_Topics.StatusID, UserID
FROM F_Topics
WHERE (StatusID IN(1) OR F_Topics.Lastmessage > '2002-08-01 23:27:03+02')
AND F_Topics.Deleted = false AND ForumID = 15

Results in the plan:

Seq Scan on f_topics (cost=0.00..20347.58 rows=59 width=12)
Filter: (((statusid = 1) OR (lastmessage > '2002-08-01
23:27:03'::timestamp without time zone)) AND (deleted = false) AND
(forumid = 15))
Which would take around 2 seconds to complete.

While the query:
SELECT TopicID, StatusID, UserID
FROM F_Topics
WHERE (StatusID IN(1) AND Deleted = false AND ForumID = 15)
OR
(Lastmessage > '2002-08-01 23:27:03+02' AND Deleted = false AND ForumID
= 15)

Results in the plan:

Index Scan using f_topics_forum_status_deleted,
f_topics_forum_lastmessage_deleted on f_topics (cost=0.00..241.87
rows=60 width=12)
Index Cond: (((forumid = 15) AND (statusid = 1) AND (deleted =
false)) OR ((forumid = 15) AND (lastmessage > '2002-08-01
23:27:03'::timestamp without time zone) AND (deleted = false)))
Filter: (((statusid = 1) AND (deleted = false) AND (forumid = 15))
OR ((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone)
AND (deleted = false) AND (forumid = 15)))
(3 rows)

Executing in around 9ms and exactly what I need.

Of coarse the query needs to be joined with the F_Users table.
My tries were:
SELECT TopicID, StatusID as StatusID, F_Users.UserID
FROM F_Topics, F_Users
WHERE (StatusID IN(1) AND Deleted = false AND ForumID = 15 AND
F_Topics.UserID = F_Users.UserID)
OR
(Lastmessage > '2002-08-01 23:27:03+02' AND Deleted = false AND ForumID
= 15 AND F_Topics.UserID = F_Users.UserID)

and

SELECT F_Topics.TopicID,F_Topics.StatusID as StatusID, F_Users.UserID
FROM F_Topics, F_Users
WHERE (
(StatusID IN(1) AND F_Topics.Deleted = false AND ForumID = 15)
OR
(F_Topics.Lastmessage > '2002-08-01 23:27:03+02' AND F_Topics.Deleted =
false AND ForumID = 15))
AND F_Topics.UserID = F_Users.UserID

(and a inner join'ed version of the above)

All three took 500-600ms to complete and changed the query plan to
something similar to:
Nested Loop (cost=0.00..23788.87 rows=1 width=16)
-> Index Scan using f_topics_forum_status_deleted on f_topics
(cost=0.00..23785.84 rows=1 width=12)
Index Cond: (forumid = 15)
Filter: (((lastmessage > '2002-08-01 23:27:03'::timestamp
without time zone) OR (statusid = 1)) AND ((deleted = false) OR
(statusid = 1)) AND ((forumid = 15) OR (statusid = 1)) AND ((lastmessage

'2002-08-01 23:27:03'::timestamp without time zone) OR (deleted =

false)) AND (deleted = false) AND ((forumid = 15) OR (deleted = false))
AND ((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone)
OR (forumid = 15)) AND ((deleted = false) OR (forumid = 15)))
-> Index Scan using f_users_pkey on f_users (cost=0.00..3.01
rows=1 width=4)
Index Cond: ("outer".userid = f_users.userid)
(6 rows)

I.e. It creates a much harder query plan to execute and terribly slows
down what should be much simpler and much faster.

By the way, I did do 'vacuum full analyze' before gathering these plans.
The result of the above queries is 134 rows from 549679 topicrows in
total. 3 rows being selected due to the status = 1 and 131 due to the
lastmessage time restriction.
In total there are 15943 topics in the 15th forum, in total 102 having a
statusid = 1, 5321 meeting the last message requirement and 15 topics
have the deleted boolean set to true.

I hope someone can shine a bright light on this, since I'm out of clues.
I also tries using subselects for the separate parts either union'ed
together or OR'ed together. Resulting in even worse plans (with
seq-scans for the f_users.userid for instance, or the f_topics.topicid)
of over 60 seconds execution time.

Best regards,

Arjen van der Meijden

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Arjen van der Meijden (#1)
Re: Weird query plans for my queries, causing terrible performance.

Arjen van der Meijden <acm@tweakers.net> writes:

Of coarse the query needs to be joined with the F_Users table.
My tries were:
...
SELECT F_Topics.TopicID,F_Topics.StatusID as StatusID, F_Users.UserID
FROM F_Topics, F_Users
WHERE (
(StatusID IN(1) AND F_Topics.Deleted = false AND ForumID = 15)
OR
(F_Topics.Lastmessage > '2002-08-01 23:27:03+02' AND F_Topics.Deleted =
false AND ForumID = 15))
AND F_Topics.UserID = F_Users.UserID

(and a inner join'ed version of the above)

You sure you tried the inner-join case? I did

explain SELECT F_Topics.TopicID,F_Topics.StatusID as StatusID, F_Users.UserID
FROM F_Topics JOIN F_Users USING (userid)
WHERE
(StatusID IN(1) AND F_Topics.Deleted = false AND ForumID = 15)
OR
(F_Topics.Lastmessage > '2002-08-01 23:27:03+02' AND F_Topics.Deleted =
false AND ForumID = 15)

and got the plan you wanted:

Nested Loop (cost=0.00..14.51 rows=1 width=16)
-> Index Scan using f_topics_forum_status_deleted, f_topics_forum_lastmessage_deleted on f_topics (cost=0.00..9.67 rows=1 width=12)
Index Cond: (((forumid = 15) AND (statusid = 1) AND (deleted = false)) OR ((forumid = 15) AND (lastmessage > '2002-08-01 17:27:03-04'::timestamp with time zone) AND (deleted = false)))
Filter: (((statusid = 1) AND (deleted = false) AND (forumid = 15)) OR ((lastmessage > '2002-08-01 17:27:03-04'::timestamp with time zone) AND (deleted = false) AND (forumid = 15)))
-> Index Scan using f_users_pkey on f_users (cost=0.00..4.82 rows=1 width=4)
Index Cond: ("outer".userid = f_users.userid)

Of course the cost estimates are bogus because I have no data in the
test tables, but the thing is capable of producing the plan you want.

I believe the reason the query you give above doesn't work like you want
is that the planner first converts the whole WHERE clause to CNF form
(canonical AND-of-OR layout), and then is unable to separate out the
join clause from the spaghetti-like restriction clause. The
CNF-conversion heuristic is usually a good one, but not in this case.

Writing the join clause as a JOIN clause keeps it separate from WHERE,
preventing this mistake from being made. Then the WHERE clause is
already in DNF (canonical OR-of-ANDs) form, which the planner also
likes, so it doesn't rearrange it.

It might be that we could improve the qual-rearrangement heuristics
by trying to keep join clauses separate from single-relation
restrictions. Anyone care to take a look at it? The gold is all hidden
in src/backend/optimizer/prep/prepqual.c ...

regards, tom lane

#3Arjen van der Meijden
acm@tweakers.net
In reply to: Tom Lane (#2)
Re: Weird query plans for my queries,

Van: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Verzonden: vrijdag 31 januari 2003 2:35

You sure you tried the inner-join case? I did

explain SELECT F_Topics.TopicID,F_Topics.StatusID as
StatusID, F_Users.UserID FROM F_Topics JOIN F_Users USING
(userid) WHERE (StatusID IN(1) AND F_Topics.Deleted = false
AND ForumID = 15) OR (F_Topics.Lastmessage > '2002-08-01
23:27:03+02' AND F_Topics.Deleted =
false AND ForumID = 15)

and got the plan you wanted:

Nested Loop (cost=0.00..14.51 rows=1 width=16)
-> Index Scan using f_topics_forum_status_deleted,
f_topics_forum_lastmessage_deleted on f_topics
(cost=0.00..9.67 rows=1 width=12)
Index Cond: (((forumid = 15) AND (statusid = 1) AND
(deleted = false)) OR ((forumid = 15) AND (lastmessage >
'2002-08-01 17:27:03-04'::timestamp with time zone) AND
(deleted = false)))
Filter: (((statusid = 1) AND (deleted = false) AND
(forumid = 15)) OR ((lastmessage > '2002-08-01
17:27:03-04'::timestamp with time zone) AND (deleted = false)
AND (forumid = 15)))
-> Index Scan using f_users_pkey on f_users
(cost=0.00..4.82 rows=1 width=4)
Index Cond: ("outer".userid = f_users.userid)

That is weird, a copy&paste of your command into my psql results in:

Nested Loop (cost=0.00..23788.87 rows=1 width=16)
-> Index Scan using f_topics_forum_status_deleted on f_topics
(cost=0.00..23785.84 rows=1 width=12)
Index Cond: (forumid = 15)
Filter: (((lastmessage > '2002-08-01 23:27:03'::timestamp
without time zone) OR (statusid = 1)) AND ((deleted = false) OR
(statusid = 1)) AND ((forumid = 15) OR (statusid = 1)) AND ((lastmessage

'2002-08-01 23:27:03'::timestamp without time zone) OR (deleted =

false)) AND (deleted = false) AND ((forumid = 15) OR (deleted = false))
AND ((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone)
OR (forumid = 15)) AND ((deleted = false) OR (forumid = 15)))
-> Index Scan using f_users_pkey on f_users (cost=0.00..3.01 rows=1
width=4)
Index Cond: ("outer".userid = f_users.userid)

Which is the same as the one as when using a normal join in the where
part.

Of course the cost estimates are bogus because I have no data
in the test tables, but the thing is capable of producing the
plan you want.

I didn't doubt it could, I was even counting on it :)

Actually, when I tried *not* to have postgres use any of the
dataknowledge using this:

prepare the_query(integer, integer, timestamp) AS
SELECT TopicID, StatusID as StatusID, F_Users.UserID
FROM F_Topics INNER JOIN F_Users USING (UserID)
WHERE (StatusID IN ( $1 ) AND F_Topics.Deleted = false AND ForumID = $2
)
OR
(Lastmessage > $3 AND F_Topics.Deleted = false AND ForumID = $2 )

execute the_query(1, 15, '2002-08-01 23:27:03+02')

The execution time is 2 seconds, ie it is doing a sequential scan
somewhere along the path.

I believe the reason the query you give above doesn't work
like you want is that the planner first converts the whole
WHERE clause to CNF form (canonical AND-of-OR layout), and
then is unable to separate out the join clause from the
spaghetti-like restriction clause. The CNF-conversion
heuristic is usually a good one, but not in this case.

Writing the join clause as a JOIN clause keeps it separate
from WHERE, preventing this mistake from being made. Then
the WHERE clause is already in DNF (canonical OR-of-ANDs)
form, which the planner also likes, so it doesn't rearrange it.

That would be nice, but in my case postgres 7.3 decides to rearrange it
apparently.
For all variants, I know, that I tried the same explain output resulted.

Regards,

Arjen

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Arjen van der Meijden (#3)
Re: Weird query plans for my queries, causing terrible performance.

Arjen van der Meijden <acm@tweakers.net> writes:

That is weird, a copy&paste of your command into my psql results in:
[different results]

Hm, there must be some difference on this query between 7.3 and CVS tip
then; I was not expecting that. [ ... a debugger is fired up ... time
passes ... ]

Ah hah. There's a rather shaky heuristic in canonicalize_qual that
prefers DNF if certain things are true, one of them being that the
qual condition mentions only one relation. This test is being fooled
because you wrote some of the variables with "F_Topics." and some
without (which means that, according to the letter of the SQL spec,
they refer to the join relation's result and not the original table).
I find that 7.3 will produce the desired plan if I write all the
variables in the WHERE clause the same way, either with or without
"F_Topics.". CVS tip doesn't show this effect because it handles join
variables differently.

So that's your workaround for the moment. As I was saying, this code
could use some fresh ideas...

regards, tom lane

#5Arjen van der Meijden
acm@tweakers.net
In reply to: Tom Lane (#4)
Re: Weird query plans for my queries,

I suppose you meant running this:

explain SELECT F_Topics.TopicID,F_Topics.StatusID, F_Users.UserID
FROM F_Topics JOIN F_Users ON F_Topics.UserID = F_Users.UserID
WHERE ((F_Topics.StatusID IN(1) AND F_Topics.Deleted = false AND
F_Topics.ForumID = 15) OR (F_Topics.Lastmessage > '2002-08-01
23:27:03+02' AND F_Topics.Deleted =
false AND F_Topics.ForumID = 15))

Did indeed produce the required query plan, even after adding a second
(left) join to the F_Users table.
You might like to know that mysql takes around 330ms to complete the
above query, after allowing it to fetch and cache anything it wants
(initial runs take over half a second, which is why we chose the
separate queries) while postgresql on the same machines finishes it in
12ms...

The mysql-multi-query variant takes around 45ms of time for the queries,
while the single query postgresvariant (including more advanced results
and another join) does the same thing in the same time.
Though with much less code-iterations involved.

Thanks for your help,

Regards,

Arjen van der Meijden

Show quoted text

-----Oorspronkelijk bericht-----
Van: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Verzonden: vrijdag 31 januari 2003 16:31
Aan: Arjen van der Meijden
CC: pgsql-general@postgresql.org
Onderwerp: Re: [GENERAL] Weird query plans for my queries,
causing terrible performance.

Arjen van der Meijden <acm@tweakers.net> writes:

That is weird, a copy&paste of your command into my psql

results in:

[different results]

Hm, there must be some difference on this query between 7.3
and CVS tip then; I was not expecting that. [ ... a debugger
is fired up ... time passes ... ]

Ah hah. There's a rather shaky heuristic in
canonicalize_qual that prefers DNF if certain things are
true, one of them being that the qual condition mentions only
one relation. This test is being fooled because you wrote
some of the variables with "F_Topics." and some without
(which means that, according to the letter of the SQL spec,
they refer to the join relation's result and not the original
table). I find that 7.3 will produce the desired plan if I
write all the variables in the WHERE clause the same way,
either with or without "F_Topics.". CVS tip doesn't show
this effect because it handles join variables differently.

So that's your workaround for the moment. As I was saying,
this code could use some fresh ideas...

regards, tom lane