Duplicate entries in pg_depend after REINDEX CONCURRENTLY

Started by Michael Paquierover 6 years ago4 messageshackers
Jump to latest
#1Michael Paquier
michael@paquier.xyz

Hi all,

While digging into a separate issue, I have found a new bug with
REINDEX CONCURRENTLY. Once the new index is built and validated,
a couple of things are done at the swap phase, like switching
constraints, comments, and dependencies. The current code moves all
the dependency entries of pg_depend from the old index to the new
index, but it never counted on the fact that the new index may have
some entries already. So, once the swapping is done, pg_depend
finishes with duplicated entries: the ones coming from the old index
and the ones of the index freshly-created. For example, take an index
which uses an attribute or an expression and has dependencies with the
parent's columns.

Attached is a patch to fix the issue. As we know that the old index
will have a definition and dependencies that match with the old one, I
think that we should just remove any dependency records on the new
index before moving the new set of dependencies from the old to the
new index. The patch includes regression tests that scan pg_depend to
check that everything remains consistent after REINDEX CONCURRENTLY.

Any thoughts?
--
Michael

Attachments:

reindex-conc-deps.patchtext/x-diff; charset=us-asciiDownload+185-1
#2Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#1)
Re: Duplicate entries in pg_depend after REINDEX CONCURRENTLY

On Fri, Oct 25, 2019 at 03:43:18PM +0900, Michael Paquier wrote:

Attached is a patch to fix the issue. As we know that the old index
will have a definition and dependencies that match with the old one, I
think that we should just remove any dependency records on the new
index before moving the new set of dependencies from the old to the
new index. The patch includes regression tests that scan pg_depend to
check that everything remains consistent after REINDEX CONCURRENTLY.

Any thoughts?

I have done more tests for this one through the day, and committed the
patch. There is still one bug pending related to partitioned indexes
where REINDEX CONCURRENTLY is cancelled after phase 4 (swap) has
committed. I am still looking more into that.
--
Michael

#3Bruce Momjian
bruce@momjian.us
In reply to: Michael Paquier (#2)
Re: Duplicate entries in pg_depend after REINDEX CONCURRENTLY

On Mon, Oct 28, 2019 at 03:01:31PM +0900, Michael Paquier wrote:

On Fri, Oct 25, 2019 at 03:43:18PM +0900, Michael Paquier wrote:

Attached is a patch to fix the issue. As we know that the old index
will have a definition and dependencies that match with the old one, I
think that we should just remove any dependency records on the new
index before moving the new set of dependencies from the old to the
new index. The patch includes regression tests that scan pg_depend to
check that everything remains consistent after REINDEX CONCURRENTLY.

Any thoughts?

I have done more tests for this one through the day, and committed the
patch. There is still one bug pending related to partitioned indexes
where REINDEX CONCURRENTLY is cancelled after phase 4 (swap) has
committed. I am still looking more into that.

Are there any bad effects of this bug on PG 12?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#4Michael Paquier
michael@paquier.xyz
In reply to: Bruce Momjian (#3)
Re: Duplicate entries in pg_depend after REINDEX CONCURRENTLY

On Tue, Nov 05, 2019 at 06:26:56PM -0500, Bruce Momjian wrote:

Are there any bad effects of this bug on PG 12?

Not that I could guess, except a bloat of pg_depend... The more you
issue REINDEX CONCURRENTLY on an index, the more duplicated entries
accumulate in pg_depend as the dependencies of the old index are
passed to the new one, say:
=# create table aa (a int);
CREATE TABLE
=# create index aai on aa(a);
CREATE INDEX
=# select count(pg_describe_object(classid, objid, objsubid))
from pg_depend
where classid = 'pg_class'::regclass AND
objid in ('aai'::regclass);
count
-------
1
(1 row)
=# reindex index concurrently aai;
REINDEX
=# reindex index concurrently aai;
REINDEX
=# select count(pg_describe_object(classid, objid, objsubid))
from pg_depend
where classid = 'pg_class'::regclass AND
objid in ('aai'::regclass);
count
-------
3
(1 row)

After that, if for example one drops a column the rebuilt index
depends on or just drops the index, then all the duplicated entries
get removed as well with the index. Note that we have also cases
where it is legit to have multiple entries in pg_depend. For example
take the case of one index which lists two times the same column.
--
Michael