Plan changes from index scan to seq scan after 5 executions

Started by Alexander Kukushkinover 8 years ago4 messagesgeneral
Jump to latest
#1Alexander Kukushkin
cyberdemn@gmail.com

Hi,

Recently I've been investigating a strange behavior of one stored procedure.
According to the statistics its execution time was very high (15 seconds),
but if I run the same statement from console it was very fast, just a few
milliseconds.

At the end I was able to prepare a short script, which reproduces the
problem:

localhost/postgres=# create table test(id serial not null primary key,
value text);
CREATE TABLE
localhost/postgres=# create index ON test(id) where value =
'l';
CREATE INDEX
localhost/postgres=# insert into test(value) select 'h' FROM
generate_series(1, 1000000);
INSERT 0 1000000
localhost/postgres=# analyze test;
ANALYZE
localhost/postgres=# prepare foo as select * FROM test where value = $1
limit 1;
PREPARE

Now we will run prepared statement. First 5 times it will produce following
output:
localhost/postgres=# explain analyze execute foo('l');
QUERY
PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Limit (cost=0.12..4.14 rows=1 width=6) (actual time=0.003..0.003 rows=0
loops=1)
-> Index Scan using test_id_idx on test (cost=0.12..4.14 rows=1
width=6) (actual time=0.002..0.002 rows=0 loops=1)
Planning time: 0.188 ms
Execution time: 0.014 ms
(4 rows)

But on the 6th time (and further) plan it starts using Seq Scan:
localhost/postgres=# explain analyze execute foo('l');
QUERY
PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Limit (cost=0.00..0.02 rows=1 width=6) (actual time=77.890..77.890 rows=0
loops=1)
-> Seq Scan on test (cost=0.00..16925.00 rows=1000000 width=6) (actual
time=77.889..77.889 rows=0 loops=1)
Filter: (value = $1)
Rows Removed by Filter: 1000000
Planning time: 0.104 ms
Execution time: 77.904 ms
(6 rows)

If I remove "LIMIT 1" from the query, everything is good, it will always
use Index Scan.

Regards,
--
Alexander Kukushkin

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Alexander Kukushkin (#1)
Re: Plan changes from index scan to seq scan after 5 executions

On Sat, Sep 30, 2017 at 10:57 AM, Alexander Kukushkin <cyberdemn@gmail.com>
wrote:

Hi,

Recently I've been investigating a strange behavior of one stored
procedure.

Please provide the output of:

SELECT version();

David J.

#3Alexander Kukushkin
cyberdemn@gmail.com
In reply to: David G. Johnston (#2)
Re: Plan changes from index scan to seq scan after 5 executions

Hi David,

sorry, absolutely forgot about important stuff like version.
Original problem has been found on 9.4.9, but I was able to reproduce it on
10rc1.

localhost/postgres=# select version();

version
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
PostgreSQL 10rc1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit
(1 row)

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Kukushkin (#3)
Re: Plan changes from index scan to seq scan after 5 executions

Alexander Kukushkin <cyberdemn@gmail.com> writes:

Original problem has been found on 9.4.9, but I was able to reproduce it on
10rc1.

The problem here is that the generic estimate of the selectivity of
"where value = $1" is 1.0, because var_eq_non_const makes the not
unreasonable assumption that the unknown value being searched for is
one of the values appearing in the table, and there is only one.
This makes the estimated cost of the seqscan+limit plan nearly nil,
since it's expected to return the first row it comes to. So that
plan gets chosen if we're considering a generic plan that doesn't
know the specific value of $1. And at that point the plancache
says to itself "I'm not getting any benefit in estimated cost for
the custom plans I've been making, so I'll stop doing that".

This is not an easy thing to improve without making other cases
worse :-(. One heuristic that I've been wondering about is whether
we could say "if the generic plan appears cheaper than any custom
plan we've made so far, disbelieve it, because probably it's based
on wrong estimates". But I'm not sure if that would have its own
failure modes. Anyway nobody's tried it yet.

You can find more discussion of this problem in the -hackers archives.

As for workarounds, the only short-term fix I can suggest is to use
EXECUTE for this query in your function, thus preventing caching of
a plan for it.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general