Why is posgres picking a suboptimal plan for this query?

Started by Sam Saffronalmost 9 years ago5 messagesgeneral

sam.saffron@gmail.com

almost 9 years ago

I have this query that is not picking the right index unless I hard code dates:

SELECT "topics".* FROM "topics"
WHERE topics.last_unread_at >= '2017-05-11 20:56:24'

"Index Scan using index_topics_on_last_unread_at on topics
(cost=0.41..8.43 rows=1 width=725) (actual time=0.005..0.065 rows=5
loops=1)"
" Index Cond: (last_unread_at >= '2017-05-11 20:56:24'::timestamp
without time zone)"
"Planning time: 0.136 ms"
"Execution time: 0.087 ms"

SELECT "topics".* FROM "topics"
WHERE topics.last_unread_at >= (select first_topic_unread_at from
user_stats us where us.user_id = 1)

"Seq Scan on topics (cost=8.30..24030.12 rows=20555 width=725) (actual
time=3.186..59.636 rows=5 loops=1)"
" Filter: (last_unread_at >= $0)"
" Rows Removed by Filter: 61660"
" InitPlan 1 (returns $0)"
" -> Index Scan using user_stats_pkey on user_stats us
(cost=0.29..8.30 rows=1 width=8) (actual time=0.008..0.009 rows=1
loops=1)"
" Index Cond: (user_id = 1)"
"Planning time: 0.147 ms"
"Execution time: 59.671 ms"

select first_topic_unread_at from user_stats us where us.user_id = 1
"2017-05-11 20:56:24.842356"

The results here simply do not make sense to me, should I be piping
dates in here to avoid this issue and running 2 queries instead of 1?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

jeff.janes@gmail.com

almost 9 years ago

In reply to: Sam Saffron (#1)

Re: Why is posgres picking a suboptimal plan for this query?

On Wed, May 24, 2017 at 1:42 PM, Sam Saffron <sam.saffron@gmail.com> wrote:

I have this query that is not picking the right index unless I hard code
dates:

SELECT "topics".* FROM "topics"
WHERE topics.last_unread_at >= '2017-05-11 20:56:24'

"Index Scan using index_topics_on_last_unread_at on topics
(cost=0.41..8.43 rows=1 width=725) (actual time=0.005..0.065 rows=5
loops=1)"
" Index Cond: (last_unread_at >= '2017-05-11 20:56:24'::timestamp
without time zone)"
"Planning time: 0.136 ms"
"Execution time: 0.087 ms"

PostgreSQL knows that few entries come after 2017-05-11 (it thinks 1,
actually 5) and comes up with a plan which works well for that situation.

SELECT "topics".* FROM "topics"

WHERE topics.last_unread_at >= (select first_topic_unread_at from
user_stats us where us.user_id = 1)

"Seq Scan on topics (cost=8.30..24030.12 rows=20555 width=725) (actual
time=3.186..59.636 rows=5 loops=1)"
" Filter: (last_unread_at >= $0)"
" Rows Removed by Filter: 61660"
" InitPlan 1 (returns $0)"
" -> Index Scan using user_stats_pkey on user_stats us
(cost=0.29..8.30 rows=1 width=8) (actual time=0.008..0.009 rows=1
loops=1)"
" Index Cond: (user_id = 1)"
"Planning time: 0.147 ms"
"Execution time: 59.671 ms"

At the time PostgreSQL plans this query, it doesn't know what the answer to
the subquery is going to be. Not having the true answer at its fingertips,
it guesses that one third of the table is going to fall after the results
of that subquery.

Maybe it should first execute the subquery and then re-plan the rest of the
query based on the results. But there is no provision for it to do that,
and no concrete plans (that I know of) to implement such a thing.

The results here simply do not make sense to me, should I be piping
dates in here to avoid this issue and running 2 queries instead of 1?

That is the most pragmatic approach. It isn't very nice, but the
alternatives are worse.

Cheers,

Jeff

tgl@sss.pgh.pa.us

almost 9 years ago

In reply to: Jeff Janes (#2)

Re: Why is posgres picking a suboptimal plan for this query?

Jeff Janes <jeff.janes@gmail.com> writes:

On Wed, May 24, 2017 at 1:42 PM, Sam Saffron <sam.saffron@gmail.com> wrote:

I have this query that is not picking the right index unless I hard code
dates:
...

Maybe it should first execute the subquery and then re-plan the rest of the
query based on the results. But there is no provision for it to do that,
and no concrete plans (that I know of) to implement such a thing.

I don't know of any plans for that, either.

The results here simply do not make sense to me, should I be piping
dates in here to avoid this issue and running 2 queries instead of 1?

That is the most pragmatic approach. It isn't very nice, but the
alternatives are worse.

You could probably get the behavior you want by replacing the subquery
with a "stable" function:

create function first_topic_unread_for(userid int) returns timestamp as
'select first_topic_unread_at from user_stats us where us.user_id = $1'
language sql stable;

SELECT "topics".* FROM "topics"
WHERE topics.last_unread_at >= first_topic_unread_for(1);

This should convince the planner to pre-run the function to get an
estimated result at plan time.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

sam.saffron@gmail.com

almost 9 years ago

In reply to: Tom Lane (#3)

Re: Why is posgres picking a suboptimal plan for this query?

Awesome, thanks! I will give that a shot

On Wed, 24 May 2017 at 6:14 pm, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Jeff Janes <jeff.janes@gmail.com> writes:

On Wed, May 24, 2017 at 1:42 PM, Sam Saffron <sam.saffron@gmail.com>

wrote:

I have this query that is not picking the right index unless I hard code
dates:
...

Maybe it should first execute the subquery and then re-plan the rest of

the

query based on the results. But there is no provision for it to do that,
and no concrete plans (that I know of) to implement such a thing.

I don't know of any plans for that, either.

The results here simply do not make sense to me, should I be piping
dates in here to avoid this issue and running 2 queries instead of 1?

That is the most pragmatic approach. It isn't very nice, but the
alternatives are worse.

You could probably get the behavior you want by replacing the subquery
with a "stable" function:

create function first_topic_unread_for(userid int) returns timestamp as
'select first_topic_unread_at from user_stats us where us.user_id = $1'
language sql stable;

SELECT "topics".* FROM "topics"
WHERE topics.last_unread_at >= first_topic_unread_for(1);

This should convince the planner to pre-run the function to get an
estimated result at plan time.

regards, tom lane

sam.saffron@gmail.com

almost 9 years ago

In reply to: Sam Saffron (#4)

Re: Why is posgres picking a suboptimal plan for this query?

OK, I committed a fix to Discourse, the suggested pattern by Tom works
like a charm, in my particular user case it cuts a query down from
200-500ms to 8ms.

Thank you heaps

https://github.com/discourse/discourse/commit/29fac1ac18acdc1f0d2c1650d33d2d4a1aab0a0b

On Wed, May 24, 2017 at 6:33 PM, Sam Saffron <sam.saffron@gmail.com> wrote:

Awesome, thanks! I will give that a shot

On Wed, 24 May 2017 at 6:14 pm, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

On Wed, May 24, 2017 at 1:42 PM, Sam Saffron <sam.saffron@gmail.com>
wrote:

I have this query that is not picking the right index unless I hard
code
dates:
...

Maybe it should first execute the subquery and then re-plan the rest of
the
query based on the results. But there is no provision for it to do
that,
and no concrete plans (that I know of) to implement such a thing.

I don't know of any plans for that, either.

The results here simply do not make sense to me, should I be piping
dates in here to avoid this issue and running 2 queries instead of 1?

That is the most pragmatic approach. It isn't very nice, but the
alternatives are worse.

You could probably get the behavior you want by replacing the subquery
with a "stable" function:

create function first_topic_unread_for(userid int) returns timestamp as
'select first_topic_unread_at from user_stats us where us.user_id = $1'
language sql stable;

SELECT "topics".* FROM "topics"
WHERE topics.last_unread_at >= first_topic_unread_for(1);

This should convince the planner to pre-run the function to get an
estimated result at plan time.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general