BUG #14768: CREATE INDEX CONCURRENTLY IF NOT EXISTS cancels autovacuum even if the index already exists.

Started by Marcin Barczyńskialmost 9 years ago4 messagesbugs

mba.ogolny@gmail.com

almost 9 years ago

The following bug has been logged on the website:

Bug reference: 14768
Logged by: Marcin Barczyński
Email address: mba.ogolny@gmail.com
PostgreSQL version: 9.6.3
Operating system: Ubuntu 14.04 but most probably it doesn't matter
Description:

I am perfectly aware of the fact that CREATE INDEX CONCURRENTLY on a table
cancels a running autovacuum process on that table.
But CREATE INDEX CONCURRENTLY IF NOT EXISTS should take
ShareUpdateExclusiveLock only after checking that the index doesn't exist.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Michael Paquier

michael@paquier.xyz

almost 9 years ago

In reply to: Marcin Barczyński (#1)

Re: BUG #14768: CREATE INDEX CONCURRENTLY IF NOT EXISTS cancels autovacuum even if the index already exists.

On Wed, Aug 2, 2017 at 12:39 PM, <mba.ogolny@gmail.com> wrote:

I am perfectly aware of the fact that CREATE INDEX CONCURRENTLY on a table
cancels a running autovacuum process on that table.
But CREATE INDEX CONCURRENTLY IF NOT EXISTS should take
ShareUpdateExclusiveLock only after checking that the index doesn't exist.

Logically the checks in index_create could happen in DefineIndex() as
there is no if_not_exists logic for toast indexes. But do we want to
skip all the sanity checks done before that, particularly for
exclusion constraints with concurrent creation? I am less sure, and
this point deserves extra opinions as those pre-checks have value in
themselves because the query defined is incorrectly shaped, so likely
users want to know about that before being sure that the named
relation already exists or not.
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Tom Lane

tgl@sss.pgh.pa.us

almost 9 years ago

In reply to: Michael Paquier (#2)

Re: BUG #14768: CREATE INDEX CONCURRENTLY IF NOT EXISTS cancels autovacuum even if the index already exists.

Michael Paquier <michael.paquier@gmail.com> writes:

On Wed, Aug 2, 2017 at 12:39 PM, <mba.ogolny@gmail.com> wrote:

I am perfectly aware of the fact that CREATE INDEX CONCURRENTLY on a table
cancels a running autovacuum process on that table.
But CREATE INDEX CONCURRENTLY IF NOT EXISTS should take
ShareUpdateExclusiveLock only after checking that the index doesn't exist.

Logically the checks in index_create could happen in DefineIndex() as
there is no if_not_exists logic for toast indexes. But do we want to
skip all the sanity checks done before that, particularly for
exclusion constraints with concurrent creation?

I'm afraid this complaint is just wishful/sloppy thinking. It's useless
to perform an "index doesn't exist" check without holding a lock that's
sufficient to prevent such an index from being created by a concurrent
transaction. There is no lock level less than SHARE UPDATE EXCLUSIVE
that would prevent that; and even if there was, taking that level to
make the check and then upgrading to SHARE UPDATE EXCLUSIVE would
constitute a deadlock risk in itself.

Perhaps the OP's problem --- which he failed to state exactly, but
I suppose can be written as "I wish a failed CREATE INDEX CONCURRENTLY
didn't kill a concurrent autovacuum before failing" --- could be resolved
by subdividing SHARE UPDATE EXCLUSIVE into more than one lock level.
But that's not exactly a trivial change. And it's not very clear why
this is such a big problem that we need to be making a delicate redesign
of the locking logic to avoid it. Autovacuum cancels are pretty routine,
while I'm having a hard time understanding why index builds would happen
so often that they'd lock out autovacuum for problematic amounts of time.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Marcin Barczyński

mba.ogolny@gmail.com

almost 9 years ago

In reply to: Tom Lane (#3)

Re: BUG #14768: CREATE INDEX CONCURRENTLY IF NOT EXISTS cancels autovacuum even if the index already exists.

On Thu, Aug 24, 2017 at 7:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm afraid this complaint is just wishful/sloppy thinking.

Cancelling autovacuum was a problem for two reasons:

1. We ran a bunch of CREATE INDEX CONCURRENTLY IF NOT EXISTS queries on our
service start up. We thought it's a no-op if an index already exists, and
wanted to have some kind of background migrations: the service works
normally, but some operations are slower until the index is created.

2. Autovacuum takes days/weeks in our scale (billions of rows), so
effectively it never completed due to service restarts. We investigated the
problem, and it turned out that most of the time was spent on vacuuming a
GIST index on timestamp range. I took a look at the code, and during vacuum
GIST index is traversed in a logical order which translates into random
disk accesses (function gistbulkdelete in gistvacuum.c). Our index has
almost 800 GB, so random accesses affect us badly. On the other hand, btree
indexes are vacuumed in physical order (function btvacuumscan in nbtree.c).

We replaced CREATE INDEX CONCURRENTLY IF NOT EXISTS with proper migrations.
By the way, CREATE INDEX IF NOT EXISTS also cancels autovacuum task. I get
your point with locks, but from a user's perspective, I don't understand
why I have to resort to the following code to avoid cancelling autovacuum:

DO $$
BEGIN
IF NOT EXISTS (
SELECT 1
FROM pg_class c JOIN pg_namespace n ON n.oid =
c.relnamespace
WHERE n.nspname = 'my_namespace' AND c.relname =
'my_index') THEN
CREATE INDEX my_index ...;
END IF;
END$$;

As for the long autovacuum, maybe I should report it as a separate bug. For
now, I'm planning to replace all uses of 'contains' operator with the
following function:

CREATE OR REPLACE FUNCTION tstzrange_contains(
range tstzrange,
ts timestamptz)
RETURNS bool AS
$$
SELECT (ts >= lower(range) AND (lower_inc(range) OR ts > lower(range)))
AND (ts <= upper(range) AND (upper_inc(range) OR ts < upper(range)))
$$ LANGUAGE SQL IMMUTABLE;

and create btree indexes on lower and upper bound:

CREATE INDEX my_table_time_range_lower_idx ON my_table
(lower(time_range));
CREATE INDEX my_table_time_range_upper_idx ON my_table
(upper(time_range));

Is it the best approach?

--
Regards,
Marcin