REINDEX SCHEMA/DATABASE/SYSTEM weak with dropped relations
Hi all,
While working on support for REINDEX for partitioned relations, I have
noticed an old bug in the logic of ReindexMultipleTables(): the list
of relations to process is built in a first transaction, and then each
table is done in an independent transaction, but we don't actually
check that the relation still exists when doing the real work. I
think that a complete fix involves two things:
1) Addition of one SearchSysCacheExists1() at the beginning of the
loop processing each item in the list in ReindexMultipleTables().
This would protect from a relation dropped, but that would not be
enough if ReindexMultipleTables() is looking at a relation dropped
before we lock it in reindex_table() or ReindexRelationConcurrently().
Still that's simple, cheaper, and would protect from most problems.
2) Be completely water-proof and adopt a logic close to what we do for
VACUUM where we try to open a relation, but leave if it does not
exist. This can be achieved with just some try_relation_open() calls
with the correct lock used, and we also need to have a new
REINDEXOPT_* flag to control this behavior conditionally.
2) and 1) are complementary, but 2) is invasive, so based on the lack
of complaints we have seen that does not seem really worth
backpatching to me, and I think that we could also just have 1) on
stable branches as a minimal fix, to take care of most of the
problems that could show up to users.
Attached is a patch to fix all that, with a cheap isolation test that
fails on HEAD with a cache lookup failure. I am adding that to the
next CF for now, and I would rather fix this issue before moving on
with REINDEX for partitioned relations as fixing this issue reduces
the use of session locks for partition trees.
Any thoughts?
--
Michael
Attachments:
reindex-schema-fix-v1.patchtext/x-diff; charset=us-asciiDownload+155-8
On 13.08.2020 07:38, Michael Paquier wrote:
Hi all,
While working on support for REINDEX for partitioned relations, I have
noticed an old bug in the logic of ReindexMultipleTables(): the list
of relations to process is built in a first transaction, and then each
table is done in an independent transaction, but we don't actually
check that the relation still exists when doing the real work. I
think that a complete fix involves two things:
1) Addition of one SearchSysCacheExists1() at the beginning of the
loop processing each item in the list in ReindexMultipleTables().
This would protect from a relation dropped, but that would not be
enough if ReindexMultipleTables() is looking at a relation dropped
before we lock it in reindex_table() or ReindexRelationConcurrently().
Still that's simple, cheaper, and would protect from most problems.
2) Be completely water-proof and adopt a logic close to what we do for
VACUUM where we try to open a relation, but leave if it does not
exist. This can be achieved with just some try_relation_open() calls
with the correct lock used, and we also need to have a new
REINDEXOPT_* flag to control this behavior conditionally.2) and 1) are complementary, but 2) is invasive, so based on the lack
of complaints we have seen that does not seem really worth
backpatching to me, and I think that we could also just have 1) on
stable branches as a minimal fix, to take care of most of the
problems that could show up to users.Attached is a patch to fix all that, with a cheap isolation test that
fails on HEAD with a cache lookup failure. I am adding that to the
next CF for now, and I would rather fix this issue before moving on
with REINDEX for partitioned relations as fixing this issue reduces
the use of session locks for partition trees.Any thoughts?
--
Michael
Hi,
I reviewed the patch. It does work and the code is clean and sane. It
implements a logic that we already had in CLUSTER, so I think it was
simply an oversight. Thank you for fixing this.
I noticed that REINDEXOPT_MISSING_OK can be passed to the TOAST table
reindex. I think it would be better to reset the flag in this
reindex_relation() call, as we don't expect a concurrent DROP here.
��� /*
��� �* If the relation has a secondary toast rel, reindex that too while we
��� �* still hold the lock on the main table.
��� �*/
��� if ((flags & REINDEX_REL_PROCESS_TOAST) && OidIsValid(toast_relid))
��� ��� result |= reindex_relation(toast_relid, flags, options);
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Mon, Aug 31, 2020 at 06:10:46PM +0300, Anastasia Lubennikova wrote:
I reviewed the patch. It does work and the code is clean and sane. It
implements a logic that we already had in CLUSTER, so I think it was simply
an oversight. Thank you for fixing this.
Thanks Anastasia for the review.
I noticed that REINDEXOPT_MISSING_OK can be passed to the TOAST table
reindex. I think it would be better to reset the flag in this
reindex_relation() call, as we don't expect a concurrent DROP here.
Good point, and fixed by resetting the flag here if it is set.
I have added some extra comments. There is one in
ReindexRelationConcurrently() to mention that there should be no extra
use of MISSING_OK once the list of indexes is built as session locks
are taken where needed.
Does the version attached look fine to you? I have done one round of
indentation while on it.
--
Michael
Attachments:
reindex-schema-fix-v2.patchtext/x-diff; charset=us-asciiDownload+175-9
On 01.09.2020 04:38, Michael Paquier wrote:
I have added some extra comments. There is one in
ReindexRelationConcurrently() to mention that there should be no extra
use of MISSING_OK once the list of indexes is built as session locks
are taken where needed.
Great, it took me a moment to understand the logic around index list
check at first pass.
Does the version attached look fine to you? I have done one round of
indentation while on it.
Yes, this version is good.
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Sep 01, 2020 at 01:25:27PM +0300, Anastasia Lubennikova wrote:
Yes, this version is good.
Thanks. I have added an extra comment for the case of RELKIND_INDEX
with REINDEXOPT_MISSING_OK while on it, as it was not really obvious
why the parent relation needs to be locked (at least attempted to) at
this stage. And applied the change. Thanks for the review,
Anastasia.
--
Michael
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 47d4c07306..23840bb8e6 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -3352,6 +3352,7 @@ typedef struct ConstraintsSetStmt /* Reindex options */ #define REINDEXOPT_VERBOSE (1 << 0) /* print progress info */ #define REINDEXOPT_REPORT_PROGRESS (1 << 1) /* report pgstat progress */ +#define REINDEXOPT_MISSING_OK (2 << 1) /* skip missing relations */
I think you probably intended to write: 1<<2
Even though it's the same, someone is likely to be confused if they try to use
3<<1 vs 1<<3.
I noticed while resolving merge conflict.
--
Justin