Query plan not updated after dropped index

Started by Victor Blomqvistabout 10 years ago6 messagesgeneral

vb@viblo.se

about 10 years ago

Hello!

We just had a major issue on our databases, after a index was replaced a
user defined function didnt change its query plan to use the new index. At
least this is our theory, since the function in question became much slower
and as a result brought our system to a halt.

Basically it went:
1. create new index (a simple btree on a bigint column index)
2. drop old index
3. rename new index to old index name
3. analyze table

After these steps normally our functions will update their plans and use
the new index just fine. However this time the function (only one function
use this particular index) seemed to take forever to complete. This is a
40GB table so querying for something not indexed would take a long time.
Therefor my suspicion is that the function didnt start to use the new index.

Adding to the strangeness is that if I ran the function manually it was
fast, only when called from our application through pg_bouncer it was slow.
I should also say that the function is only used on our 3 read slaves setup
to our database.

Things we tried to fix this:
1. Analyze table
2. Restart our application
3. Recreate the function
4. Kill the slow running queries with pg_cancel_backend()

These things did not help.

Instead what helped in the end was to replace the function with an extra
useless where clause (in the hope that it would force it to create a new
plan)

So, the function only have a single SELECT inside:
RETURN QUERY
SELECT * FROM table
WHERE bigint_column = X
LIMIT 100 OFFSET 0;

And this is my modification that made it work again:
RETURN QUERY
SELECT * FROM table
WHERE bigint_column = X AND 1=1
LIMIT 100 OFFSET 0;

Obviously we are now worried why this happened and how we can avoid it in
the future? We run Postgres 9.3 on CentOS 6.

Thanks!
Victor

Oleg Bartunov

oleg@sai.msu.su

about 10 years ago

In reply to: Victor Blomqvist (#1)

Re: Query plan not updated after dropped index

On Thu, Feb 18, 2016 at 11:17 AM, Victor Blomqvist <vb@viblo.se> wrote:

Hello!

We just had a major issue on our databases, after a index was replaced a
user defined function didnt change its query plan to use the new index. At
least this is our theory, since the function in question became much slower
and as a result brought our system to a halt.

Basically it went:
1. create new index (a simple btree on a bigint column index)
2. drop old index
3. rename new index to old index name

why do you did this !?

Show quoted text

3. analyze table

After these steps normally our functions will update their plans and use
the new index just fine. However this time the function (only one function
use this particular index) seemed to take forever to complete. This is a
40GB table so querying for something not indexed would take a long time.
Therefor my suspicion is that the function didnt start to use the new index.

Adding to the strangeness is that if I ran the function manually it was
fast, only when called from our application through pg_bouncer it was slow.
I should also say that the function is only used on our 3 read slaves setup
to our database.

Things we tried to fix this:
1. Analyze table
2. Restart our application
3. Recreate the function
4. Kill the slow running queries with pg_cancel_backend()

These things did not help.

Instead what helped in the end was to replace the function with an extra
useless where clause (in the hope that it would force it to create a new
plan)

So, the function only have a single SELECT inside:
RETURN QUERY
SELECT * FROM table
WHERE bigint_column = X
LIMIT 100 OFFSET 0;

And this is my modification that made it work again:
RETURN QUERY
SELECT * FROM table
WHERE bigint_column = X AND 1=1
LIMIT 100 OFFSET 0;

Obviously we are now worried why this happened and how we can avoid it in
the future? We run Postgres 9.3 on CentOS 6.

Thanks!
Victor

Victor Blomqvist

vb@viblo.se

about 10 years ago

In reply to: Oleg Bartunov (#2)

Re: Query plan not updated after dropped index

The end goal is to get rid of index bloat. If there is a better way to
handle this Im all ears!

/Victor

On Thu, Feb 18, 2016 at 5:21 PM, Oleg Bartunov <obartunov@gmail.com> wrote:

Show quoted text

On Thu, Feb 18, 2016 at 11:17 AM, Victor Blomqvist <vb@viblo.se> wrote:

Hello!

We just had a major issue on our databases, after a index was replaced a
user defined function didnt change its query plan to use the new index. At
least this is our theory, since the function in question became much slower
and as a result brought our system to a halt.

Basically it went:
1. create new index (a simple btree on a bigint column index)
2. drop old index
3. rename new index to old index name

why do you did this !?

3. analyze table

After these steps normally our functions will update their plans and use
the new index just fine. However this time the function (only one function
use this particular index) seemed to take forever to complete. This is a
40GB table so querying for something not indexed would take a long time.
Therefor my suspicion is that the function didnt start to use the new index.

Adding to the strangeness is that if I ran the function manually it was
fast, only when called from our application through pg_bouncer it was slow.
I should also say that the function is only used on our 3 read slaves setup
to our database.

Things we tried to fix this:
1. Analyze table
2. Restart our application
3. Recreate the function
4. Kill the slow running queries with pg_cancel_backend()

These things did not help.

Instead what helped in the end was to replace the function with an extra
useless where clause (in the hope that it would force it to create a new
plan)

So, the function only have a single SELECT inside:
RETURN QUERY
SELECT * FROM table
WHERE bigint_column = X
LIMIT 100 OFFSET 0;

And this is my modification that made it work again:
RETURN QUERY
SELECT * FROM table
WHERE bigint_column = X AND 1=1
LIMIT 100 OFFSET 0;

Obviously we are now worried why this happened and how we can avoid it in
the future? We run Postgres 9.3 on CentOS 6.

Thanks!
Victor

Vitaly Burovoy

vitaly.burovoy@gmail.com

about 10 years ago

In reply to: Victor Blomqvist (#1)

Re: Query plan not updated after dropped index

On 2/18/16, Victor Blomqvist <vb@viblo.se> wrote:

Hello!

We just had a major issue on our databases, after a index was replaced a
user defined function didnt change its query plan to use the new index. At
least this is our theory, since the function in question became much slower
and as a result brought our system to a halt.

Basically it went:
1. create new index (a simple btree on a bigint column index)
2. drop old index
3. rename new index to old index name
4. analyze table

After these steps normally our functions will update their plans and use
the new index just fine. However this time the function (only one function
use this particular index) seemed to take forever to complete. This is a
40GB table so querying for something not indexed would take a long time.
Therefore my suspicion is that the function didnt start to use the new
index.

My guess is that backends somehow cached generic plan[1]http://www.postgresql.org/docs/devel/static/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING and didn't
recalculate it.

Adding to the strangeness is that if I ran the function manually it was
fast,

It is because _backends_ (processes) cache plans, not DBMS (i.e. they
are not shared).
So you connected to DB (making a new backend process), run the
function; backend could not find cached plan and create it for itself.
Plan for _your_ connection includes the new index.

only when called from our application through pg_bouncer it was slow.
I should also say that the function is only used on our 3 read slaves setup
to our database.

Things we tried to fix this:
1. Analyze table
2. Restart our application
3. Recreate the function
4. Kill the slow running queries with pg_cancel_backend()

These things did not help.

Since pgbouncer reuses connections then backends processes still have
cached plan. If you tried pg_terminate_backend() it could help.

Instead what helped in the end was to replace the function with an extra
useless where clause (in the hope that it would force it to create a new
plan)

So, the function only have a single SELECT inside:
RETURN QUERY
SELECT * FROM table
WHERE bigint_column = X
LIMIT 100 OFFSET 0;

And this is my modification that made it work again:
RETURN QUERY
SELECT * FROM table
WHERE bigint_column = X AND 1=1
LIMIT 100 OFFSET 0;

Yes, it is a new query for PG, and therefore it requires a new plan
because it is not in a cache.

Obviously we are now worried why this happened

Also my guess you did CREATE INDEX CONCURRENTLY and there is several
cases[2]http://www.postgresql.org/docs/devel/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY -- Best regards, Vitaly Burovoy when it can not be used ("invalid" state or waiting for
unfinished transactions).
When the old index is dropped but the new index is not accessible
while a query/function is running a generated plan does not include
that index. And in case of caching such plan is caching and uses later
without index too.

and how we can avoid it in
the future? We run Postgres 9.3 on CentOS 6.

Firstly you can drop the old index not immediately but a little later
depending on yours queries time.
Also after creating the new index (and possible waiting a little) you
can drop index in a _transaction_ and see whether the new index is
used in an EXPLAIN of any query that use it or not. In the first case
do COMMIT, in the second case just do ROLLBACK and leave old index for
using.

Thanks!
Victor

[1]: http://www.postgresql.org/docs/devel/static/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
[2]: http://www.postgresql.org/docs/devel/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY -- Best regards, Vitaly Burovoy
--
Best regards,
Vitaly Burovoy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Tom Lane

tgl@sss.pgh.pa.us

about 10 years ago

In reply to: Victor Blomqvist (#1)

Re: Query plan not updated after dropped index

Victor Blomqvist <vb@viblo.se> writes:

We just had a major issue on our databases, after a index was replaced a
user defined function didnt change its query plan to use the new index.

I'm suspicious that this is some variant of the problem discussed a couple
days ago:

/messages/by-id/CAKt_ZfsvdE3WiHUjg81MT1KzOkebqz_bg=DBHf4L6rkfjtEJ_w@mail.gmail.com

However, that theory requires that the index not have been immediately
usable, which implies that it initially had some broken HOT chains,
which really should not have happened if you were simply replacing one
index with an identical one. (The pre-existing index should've been
enough to ensure HOT chain consistency for its columns.)

Perhaps you were doing something "cute" like replacing a single-column
index with a multi-column one?

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Victor Blomqvist

vb@viblo.se

about 10 years ago

In reply to: Tom Lane (#5)

Re: Query plan not updated after dropped index

On Thu, Feb 18, 2016 at 11:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Victor Blomqvist <vb@viblo.se> writes:

We just had a major issue on our databases, after a index was replaced a
user defined function didnt change its query plan to use the new index.

I'm suspicious that this is some variant of the problem discussed a couple
days ago:

/messages/by-id/CAKt_ZfsvdE3WiHUjg81MT1KzOkebqz_bg=DBHf4L6rkfjtEJ_w@mail.gmail.com

However, that theory requires that the index not have been immediately
usable, which implies that it initially had some broken HOT chains,
which really should not have happened if you were simply replacing one
index with an identical one. (The pre-existing index should've been
enough to ensure HOT chain consistency for its columns.)

Perhaps you were doing something "cute" like replacing a single-column
index with a multi-column one?

No the new index looked exactly as the old one. The index was created with
CONCURRENTLY, and we waited until it returned.

I could use the index just fine when running a query in a separate
connection, so I am not sure if it helps dropping the old index in a
transaction and run a query to verify that the new index is usable? How can
I know that the new index is usable from already open connections?

Another thing to note about this index is that it sometimes has an
unexpected zero page inside, resulting in this error: ERROR: index
"user_pictures_picture_dhash_idx" contains unexpected zero page at block
123780. But it always happens sporadically on hour read slaves, and
rerunning the query that cause it doesnt reproduce the error. I asked about
this problem before, here:
/messages/by-id/CAL870DVXR9fHkyEJ5sMydK4pJUPL5kWabUhSPbTQeK03gdDcqA@mail.gmail.com

Show quoted text

regards, tom lane