BUG #14237: Terrible performance after accidentally running 'drop index' for index still being created

Started by Noname · almost 10 years ago · 5 messages · pgsql-bugs
#1 Noname
dwaller@microsoft.com

The following bug has been logged on the website:

Bug reference: 14237
Logged by: David Waller
Email address: dwaller@microsoft.com
PostgreSQL version: 9.3.7
Operating system: Linux (Ubuntu)
Description:

Summary: While running ‘create index concurrently’ on a very large table,
running ‘drop index’ for the same index caused Postgres to perform terribly
badly, until the ‘create index’ server process was killed. I would expect
that the ‘drop index’ would either fail immediately, or wait, without
performance impact, until the ‘create index concurrently’ had completed.

In detail:

I started adding an index to a large table using 'create index concurrently
...' (I was running the command using a Rails migration, connecting to
Postgres remotely, via PgBouncer.) It seems that something failed in the
connection to Postgres as my migration terminated with an error after about
1.5 hours (for comparison creating this index successfully took about 3.5
hours).

\d table_name showed the index marked as invalid so I ran 'drop index',
intending to run the index creation again. The average query duration
against this database immediately increased from ~1ms to ~250ms, and stayed
in the range 50–350ms for the next ten minutes, until someone spotted that
there was still a process on the server that was running ‘create index’.
Killing that process caused performance to return to normal.
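[Editor's note: for anyone who lands in the same state, a hedged sketch of checking for a lingering index build before issuing DROP INDEX. The table name is hypothetical; column names assume PostgreSQL 9.2 or later.]

```sql
-- Look for a backend still running a CREATE INDEX on this database,
-- other than our own session:
SELECT pid, state, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE query ILIKE '%create index%'
  AND pid <> pg_backend_pid();

-- If one is found, cancel it cleanly rather than killing the OS process:
-- SELECT pg_cancel_backend(<pid>);
```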

This was rather bad for the availability of our application during those ten
minutes!

I would expect that running ‘drop index’ while that index is still being
created would either fail immediately, or wait, without performance impact,
until the ‘create index concurrently’ had completed.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#1)
Re: BUG #14237: Terrible performance after accidentally running 'drop index' for index still being created

dwaller@microsoft.com writes:

> Summary: While running ‘create index concurrently’ on a very large table,
> running ‘drop index’ for the same index caused Postgres to perform terribly
> badly, until the ‘create index’ server process was killed. I would expect
> that the ‘drop index’ would either fail immediately, or wait, without
> performance impact, until the ‘create index concurrently’ had completed.

Hmm, the DROP INDEX should have blocked waiting for an exclusive lock on
the table, and then other queries on the table should have queued up
behind that lock request. I think the fact that they didn't just stop
dead probably indicates that when the deadlock checker ran, it concluded
it needed to let them jump the queue and go in front of the DROP INDEX to
avoid a deadlock --- likely because the lock manager could see that the
DROP was blocked by the CREATE INDEX CONCURRENTLY while the latter was
waiting for the other queries to finish. So your slowdown corresponds to
an extra wait of deadlock_timeout ms per query. This is not a bug, but
designed behavior.
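[Editor's note: the queue Tom describes can be observed while the DROP is blocked. A sketch — the table name is hypothetical, and the view layout matches the 9.3-era catalogs.]

```sql
-- Granted and waiting locks on the table: CREATE INDEX CONCURRENTLY holds
-- its lock (granted = t), DROP INDEX waits behind it (granted = f), and
-- new queries initially queue behind the DROP:
SELECT locktype, relation::regclass, mode, granted, pid
FROM pg_locks
WHERE relation = 'table_name'::regclass
ORDER BY granted DESC;

-- How long each blocked query waits before the deadlock checker runs
-- and may let it jump ahead of the DROP (default 1s):
SHOW deadlock_timeout;
```

That per-query wait of deadlock_timeout before being allowed past the DROP is what produced the observed jump from ~1ms to hundreds of milliseconds.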

> I would expect that running ‘drop index’ while that index is still being
> created would either fail immediately, or wait, without performance impact,
> until the ‘create index concurrently’ had completed.

If the DROP's lock request is not allowed to block other incoming requests
for a table lock, the DROP could face indefinite lock starvation. That is
not better. Reducing deadlock_timeout to zero is not going to improve
your overall performance, either.

regards, tom lane


#3 David Waller
dwaller@yammer-inc.com
In reply to: Tom Lane (#2)
Re: BUG #14237: Terrible performance after accidentally running 'drop index' for index still being created

On 08/07/2016, 16:08, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> dwaller@microsoft.com writes:
>
>> Summary: While running ‘create index concurrently’ on a very large table,
>> running ‘drop index’ for the same index caused Postgres to perform terribly
>> badly, until the ‘create index’ server process was killed. I would expect
>> that the ‘drop index’ would either fail immediately, or wait, without
>> performance impact, until the ‘create index concurrently’ had completed.
>
> Hmm, the DROP INDEX should have blocked waiting for an exclusive lock on
> the table, and then other queries on the table should have queued up
> behind that lock request. I think the fact that they didn't just stop
> dead probably indicates that when the deadlock checker ran, it concluded
> it needed to let them jump the queue and go in front of the DROP INDEX to
> avoid a deadlock --- likely because the lock manager could see that the
> DROP was blocked by the CREATE INDEX CONCURRENTLY while the latter was
> waiting for the other queries to finish. So your slowdown corresponds to
> an extra wait of deadlock_timeout ms per query. This is not a bug, but
> designed behavior.
>
>> I would expect that running ‘drop index’ while that index is still being
>> created would either fail immediately, or wait, without performance impact,
>> until the ‘create index concurrently’ had completed.
>
> If the DROP's lock request is not allowed to block other incoming requests
> for a table lock, the DROP could face indefinite lock starvation. That is
> not better. Reducing deadlock_timeout to zero is not going to improve
> your overall performance, either.

Thank you for the detailed explanation. This all seems very sensible, and
reasonable behaviour from Postgres. Yet... it still 'allowed' me to shoot myself
painfully in the foot. User error, I agree, yet people make mistakes - could
Postgres behave more gracefully?

For example, would it be at all feasible for Postgres to handle DDL statements
differently from regular requests? In this example it was pointless for DROP
INDEX to take any locks while there was already another DDL statement (CREATE
INDEX) running. Could it have been added to a queue of DDL statements against
that table and not attempted to take a lock until CREATE INDEX completed and
DROP INDEX then reached the head of the queue?

Thanks,

David


#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Waller (#3)
Re: BUG #14237: Terrible performance after accidentally running 'drop index' for index still being created

David Waller <dwaller@yammer-inc.com> writes:

> Thank you for the detailed explanation. This all seems very sensible, and
> reasonable behaviour from Postgres. Yet... it still 'allowed' me to shoot myself
> painfully in the foot. User error, I agree, yet people make mistakes - could
> Postgres behave more gracefully?

Well, there are always tradeoffs. You could choose to run with a
non-infinite setting of lock_timeout, which would have caused the DROP to
fail after waiting a second or two (or whatever you set the timeout to
be). That would move the denial of service over to the problematic DDL,
which might be a good tradeoff for your environment. But not everybody is
going to think that query failure is a "more graceful" solution.
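[Editor's note: a minimal sketch of the lock_timeout approach Tom suggests. lock_timeout is available from PostgreSQL 9.3; the index name is hypothetical.]

```sql
-- Bound how long the DROP may wait for its lock, so it fails fast
-- instead of stalling the queries queued behind it:
SET lock_timeout = '2s';
DROP INDEX some_invalid_index;
-- If the lock cannot be acquired within 2s, the statement is cancelled
-- with: ERROR: canceling statement due to lock timeout
RESET lock_timeout;
```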

> For example, would it be at all feasible for Postgres to handle DDL statements
> differently from regular requests? In this example it was pointless for DROP
> INDEX to take any locks while there was already another DDL statement (CREATE
> INDEX) running. Could it have been added to a queue of DDL statements against
> that table and not attempted to take a lock until CREATE INDEX completed and
> DROP INDEX then reached the head of the queue?

This is handwaving: the DROP already was in a lock queue. I really doubt
there are any easy fixes that won't create as many problems as they solve.

regards, tom lane


#5 David Waller
dwaller@yammer-inc.com
In reply to: Tom Lane (#4)
Re: BUG #14237: Terrible performance after accidentally running 'drop index' for index still being created

On 11/07/2016, 18:08, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> David Waller <dwaller@yammer-inc.com> writes:
>
>> Thank you for the detailed explanation. This all seems very sensible, and
>> reasonable behaviour from Postgres. Yet... it still 'allowed' me to shoot myself
>> painfully in the foot. User error, I agree, yet people make mistakes - could
>> Postgres behave more gracefully?
>
> Well, there are always tradeoffs. You could choose to run with a
> non-infinite setting of lock_timeout, which would have caused the DROP to
> fail after waiting a second or two (or whatever you set the timeout to
> be). That would move the denial of service over to the problematic DDL,
> which might be a good tradeoff for your environment. But not everybody is
> going to think that query failure is a "more graceful" solution.

Thank you! lock_timeout sounds like exactly what I need to set.

Now that I know about lock_timeout, I agree there's nothing worth changing
here. Postgres already has the tools built in to give me the behaviour I
wanted in this situation.

Thank you for your patient explanations.

David
