reindex concurrently and two toast indexes
Forking old, long thread:
/messages/by-id/36712441546604286@sas1-890ba5c2334a.qloud-c.yandex.net
On Fri, Jan 04, 2019 at 03:18:06PM +0300, Sergei Kornilov wrote:
About reindexing invalid indexes - I found one good question in the archives [1]: what about toast indexes?
I checked it now: I am able to drop an invalid toast index, but I cannot drop the redundant valid index.
Reproduce:
session 1: begin; select from test_toast ... for update;
session 2: reindex table CONCURRENTLY test_toast ;
session 2: interrupt by ctrl+C
session 1: commit
session 2: reindex table test_toast ;
and now we have two toast indexes. DROP INDEX is able to remove only the invalid ones; the valid index gives "ERROR: permission denied: "pg_toast_16426_index_ccnew" is a system catalog"
[1]: /messages/by-id/CAB7nPqT+6igqbUb59y04NEgHoBeUGYteuUr89AKnLTFNdB8Hyw@mail.gmail.com
It looks like this was never addressed.
I noticed a ccnew toast index sitting around since October - what do I do with it?
ts=# DROP INDEX pg_toast.pg_toast_463881620_index_ccnew;
ERROR: permission denied: "pg_toast_463881620_index_ccnew" is a system catalog
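For reference, a query along these lines lists every invalid index (TOAST ones included) together with its parent relation; the _ccnew/_ccold suffixes mark leftovers from REINDEX CONCURRENTLY:

```sql
-- Invalid indexes left behind by an interrupted REINDEX CONCURRENTLY;
-- leftovers carry a _ccnew or _ccold suffix in their name.
SELECT indexrelid::regclass, indrelid::regclass
FROM pg_index
WHERE NOT indisvalid;
```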
--
Justin
On Sun, Feb 16, 2020 at 01:08:35PM -0600, Justin Pryzby wrote:
Forking old, long thread:
/messages/by-id/36712441546604286@sas1-890ba5c2334a.qloud-c.yandex.net
On Fri, Jan 04, 2019 at 03:18:06PM +0300, Sergei Kornilov wrote:
[...]
Reproduce:
session 1: begin; select from test_toast ... for update;
session 2: reindex table CONCURRENTLY test_toast ;
session 2: interrupt by ctrl+C
session 1: commit
session 2: reindex table test_toast ;
and now we have two toast indexes. DROP INDEX is able to remove
only the invalid ones; the valid index gives "ERROR: permission denied:
"pg_toast_16426_index_ccnew" is a system catalog"
[...]
It looks like this was never addressed.
On HEAD, this exact scenario leads to the presence of an old toast
index pg_toast.pg_toast_*_index_ccold, causing the index to be skipped
on a follow-up concurrent reindex:
=# reindex table CONCURRENTLY test_toast ;
WARNING: XX002: cannot reindex invalid index
"pg_toast.pg_toast_16385_index_ccold" concurrently, skipping
LOCATION: ReindexRelationConcurrently, indexcmds.c:2863
REINDEX
And this toast index can be dropped while it remains invalid:
=# drop index pg_toast.pg_toast_16385_index_ccold;
DROP INDEX
I recall testing that stuff for all the interrupts which could be
triggered and in this case, this waits at step 5 within
WaitForLockersMultiple(). Now, in your case you take an extra step
with a plain REINDEX, which forces a rebuild of the invalid toast
index, making it valid again and hence not droppable.
Hmm. There could be an argument here for skipping invalid toast
indexes within reindex_index(), because we are sure about having at
least one valid toast index at any time, and these are not concerned
with CIC.
Any thoughts?
--
Michael
On Tue, Feb 18, 2020 at 6:30 AM Michael Paquier <michael@paquier.xyz> wrote:
On Sun, Feb 16, 2020 at 01:08:35PM -0600, Justin Pryzby wrote:
[...]
Hmm. There could be an argument here for skipping invalid toast
indexes within reindex_index(), because we are sure about having at
least one valid toast index at any time, and these are not concerned
with CIC.
Or even automatically drop any invalid index on a toast relation in
reindex_relation, since those can't be due to a failed CIC?
On Tue, Feb 18, 2020 at 07:06:25AM +0100, Julien Rouhaud wrote:
On Tue, Feb 18, 2020 at 6:30 AM Michael Paquier <michael@paquier.xyz> wrote:
[...]
Or even automatically drop any invalid index on a toast relation in
reindex_relation, since those can't be due to a failed CIC?
No, I don't much like outsmarting REINDEX with more index drops than
it needs to do. And this would not take care of the case of a REINDEX
INDEX done directly on a toast index.
--
Michael
On Tue, Feb 18, 2020 at 7:19 AM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Feb 18, 2020 at 07:06:25AM +0100, Julien Rouhaud wrote:
[...]
No, I don't much like outsmarting REINDEX with more index drops than
it needs to do. And this would not take care of the case of a REINDEX
INDEX done directly on a toast index.
Well, we could still do both but I get the objection. Then skipping
invalid toast indexes in reindex_relation looks like the best fix.
On Tue, Feb 18, 2020 at 07:39:49AM +0100, Julien Rouhaud wrote:
On Tue, Feb 18, 2020 at 7:19 AM Michael Paquier <michael@paquier.xyz> wrote:
[...]
Well, we could still do both but I get the objection. Then skipping
invalid toast indexes in reindex_relation looks like the best fix.
PFA a patch to fix the problem using this approach.
I also added isolation tester regression tests. The failure is simulated using
a pg_cancel_backend() on top of pg_stat_activity, using filters on a
specifically set application name and the query text to avoid any unwanted
interaction. I also added a 1s locking delay, to ensure that even slow/CCA
machines can consistently reproduce the failure. Maybe that's not enough, or
maybe testing this scenario is not worth the extra time.
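For reference, the simulated failure boils down to a cancellation query along these lines (the application name shown here is illustrative; the spec sets its own value):

```sql
-- Cancel the REINDEX running in the other session, matched by the
-- application_name set in the test spec and by its query text.
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE application_name = 'isolation_reindex'  -- illustrative value
  AND query ~* 'REINDEX TABLE CONCURRENTLY';
```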
Attachments:
0001-Don-t-reindex-invalid-indexes-on-TOAST-tables-v1.patch (text/plain, +100/-2)
On Tue, Feb 18, 2020 at 02:29:33PM +0900, Michael Paquier wrote:
On Sun, Feb 16, 2020 at 01:08:35PM -0600, Justin Pryzby wrote:
[...]
Hmm. There could be an argument here for skipping invalid toast
indexes within reindex_index(), because we are sure about having at
least one valid toast index at any time, and these are not concerned
with CIC.
Julien sent a patch for that, but here are my ideas (which you are free to
reject):
Could you require an AEL for that case, or something which would preclude
reindex table test_toast from working?
Could you use atomic updates to ensure that exactly one index in an {old,new}
pair is invalid at any given time ?
Could you make the new (invalid) toast index not visible to other transactions?
--
Justin Pryzby
On Sat, Feb 22, 2020 at 08:09:24AM +0100, Julien Rouhaud wrote:
On Tue, Feb 18, 2020 at 07:39:49AM +0100, Julien Rouhaud wrote:
[...]
PFA a patch to fix the problem using this approach.
[...]
Sorry, I just realized that I forgot to commit the last changes before sending
the patch, so here's the correct v2.
Attachments:
0001-Don-t-reindex-invalid-indexes-on-TOAST-tables-v2.patch (text/plain, +110/-2)
On Sat, Feb 22, 2020 at 04:06:57PM +0100, Julien Rouhaud wrote:
Sorry, I just realized that I forgot to commit the last changes before sending
the patch, so here's the correct v2.
Thanks for the patch.
+		if (skipit)
+		{
+			ereport(NOTICE,
+					(errmsg("skipping invalid index \"%s.%s\"",
+							get_namespace_name(get_rel_namespace(indexOid)),
+							get_rel_name(indexOid))));
ReindexRelationConcurrently() issues a WARNING when bumping on an
invalid index, shouldn't the same log level be used?
Even with this patch, it is possible to reindex an invalid toast index
with REINDEX INDEX (with and without CONCURRENTLY), which is the
problem I mentioned upthread (Er, actually only for the non-concurrent
case as told about reindex_index). Shouldn't both cases be prevented
as well with an ERROR?
--
Michael
On Thu, Feb 27, 2020 at 04:32:11PM +0900, Michael Paquier wrote:
On Sat, Feb 22, 2020 at 04:06:57PM +0100, Julien Rouhaud wrote:
[...]
ReindexRelationConcurrently() issues a WARNING when bumping on an
invalid index, shouldn't the same log level be used?
For ReindexRelationConcurrently, the index is skipped because the feature isn't
supported, thus a warning. In this case that would work, it's just that we
don't want to process such indexes, so I used a notice instead.
I'm not opposed to using a warning instead if you prefer. What errcode should
be used though, ERRCODE_WARNING? ERRCODE_FEATURE_NOT_SUPPORTED doesn't feel
right.
Even with this patch, it is possible to reindex an invalid toast index
with REINDEX INDEX (with and without CONCURRENTLY), which is the
problem I mentioned upthread (Er, actually only for the non-concurrent
case as told about reindex_index). Shouldn't both cases be prevented
as well with an ERROR?
Ah indeed, sorry I missed that.
While looking at it, I see that invalid indexes seem to be leaked when the
table is dropped, with no way to get rid of them:
s1:
CREATE TABLE t1(val text);
CREATE INDEX ON t1 (val);
BEGIN;
SELECT * FROM t1 FOR UPDATE;
s2:
REINDEX TABLE CONCURRENTLY t1;
[stuck and canceled]
SELECT indexrelid::regclass, indrelid::regclass FROM pg_index WHERE NOT indisvalid;
indexrelid | indrelid
-------------------------------------+-------------------------
t1_val_idx_ccold | t1
pg_toast.pg_toast_16385_index_ccold | pg_toast.pg_toast_16385
(2 rows)
s1:
ROLLBACK;
DROP TABLE t1;
SELECT indexrelid::regclass, indrelid::regclass FROM pg_index WHERE NOT indisvalid;
indexrelid | indrelid
-------------------------------------+----------
t1_val_idx_ccold | 16385
pg_toast.pg_toast_16385_index_ccold | 16388
(2 rows)
REINDEX INDEX t1_val_idx_ccold;
ERROR: XX000: could not open relation with OID 16385
LOCATION: relation_open, relation.c:62
DROP INDEX t1_val_idx_ccold;
ERROR: XX000: could not open relation with OID 16385
LOCATION: relation_open, relation.c:62
REINDEX INDEX pg_toast.pg_toast_16385_index_ccold;
ERROR: XX000: could not open relation with OID 16388
LOCATION: relation_open, relation.c:62
DROP INDEX pg_toast.pg_toast_16385_index_ccold;
ERROR: XX000: could not open relation with OID 16388
LOCATION: relation_open, relation.c:62
REINDEX DATABASE rjuju;
REINDEX
SELECT indexrelid::regclass, indrelid::regclass FROM pg_index WHERE NOT indisvalid;
indexrelid | indrelid
-------------------------------------+----------
t1_val_idx_ccold | 16385
pg_toast.pg_toast_16385_index_ccold | 16388
(2 rows)
Shouldn't DROP TABLE be fixed to also drop invalid indexes?
On Thu, Feb 27, 2020 at 09:07:35AM +0100, Julien Rouhaud wrote:
While looking at it, I see that invalid indexes seem to be leaked when the
table is dropped, with no way to get rid of them:
[...]
Shouldn't DROP TABLE be fixed to also drop invalid indexes?
Hmm. The problem here is that I think that we don't have the correct
interface to handle the dependency switching between the old and new
indexes from the start, and 68ac9cf made things better in some aspects
(like non-cancellation and old index drop) but not in others (like
yours, or even a column drop). changeDependenciesOf/On() have been
added especially for REINDEX CONCURRENTLY, but they are not actually
able to handle the case we want them to handle: doing the switch for
both relations within the same scan. It is possible to call the
existing routines three times with a couple of CCIs in-between and
what I would call a fake placeholder OID to switch all the records
cleanly, but it would actually be cleaner to do a single scan of
pg_depend and switch the dependencies of both objects at once.
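Expressed as if it were plain SQL (purely illustrative; the real code goes through the C catalog interface, and :old, :new and :placeholder stand for the two index OIDs plus a spare OID not used by any other object), the three-step variant would be:

```sql
-- Swap pg_depend entries between two objects via a placeholder OID,
-- so that the two sets of records never get mixed together.
UPDATE pg_depend SET objid = :placeholder WHERE objid = :old;          -- 1
UPDATE pg_depend SET objid = :old         WHERE objid = :new;          -- 2
UPDATE pg_depend SET objid = :new         WHERE objid = :placeholder;  -- 3
-- (the refobjid side of the catalog is switched the same way)
```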
Attached is a draft patch to take care of that problem for HEAD. It
still needs a lot of polishing (variable names are not actually old
or new anymore, etc.), but that's enough to show the idea. If a
version of this reaches PG12, we would need to keep the past routines
around to avoid an ABI breakage, even if I doubt there are callers of
them, but who knows..
--
Michael
Attachments:
reindex-deps-v1.patch (text/x-diff, +55/-55)
On Tue, Mar 03, 2020 at 05:06:42PM +0900, Michael Paquier wrote:
[...]
Or actually, a simpler solution is to abuse the two existing routines
so that the dependency switch is done the other way around, from the
new index to the old one. That would visibly work because there is no
CCI between each scan, and it is faster because the scan of pg_depend
is done only on the entries that need an update. I'll look at that
again tomorrow; it is late here and I may be missing something
obvious.
--
Michael
Attachments:
reindex-deps-v2.patch (text/x-diff, +4/-5)
On Tue, Mar 03, 2020 at 06:25:51PM +0900, Michael Paquier wrote:
[...]
It was a good inspiration. I have been torturing this patch today,
playing with it by injecting elog(ERROR) calls in the middle of
reindex concurrently for all the phases, and checked manually that the
pg_depend entries for the new and old indexes map correctly. So this
takes care of your problem. Attached is an updated patch with an
updated comment about the dependency of this code on CCIs. I'd like
to go fix this issue first.
--
Michael
Attachments:
reindex-deps-v3.patch (text/x-diff, +6/-5)
On Wed, Mar 4, 2020 at 6:15 AM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Mar 03, 2020 at 06:25:51PM +0900, Michael Paquier wrote:
[...]
It was a good inspiration. I have been torturing this patch today,
playing with it by injecting elog(ERROR) calls in the middle of
reindex concurrently for all the phases, and checked manually that the
pg_depend entries for the new and old indexes map correctly. So this
takes care of your problem. Attached is an updated patch with an
updated comment about the dependency of this code on CCIs. I'd like
to go fix this issue first.
Thanks for the patch! I started to look at it during the weekend, but
I got interrupted and unfortunately didn't have time to look at it
since.
The fix looks good to me. I also tried multiple failure scenarios and
it's unsurprisingly working just fine. Should we add some regression
tests for that? I guess most of it could be borrowed from the patch
to fix the toast index issue I sent last week.
On Wed, Mar 04, 2020 at 09:21:45AM +0100, Julien Rouhaud wrote:
Thanks for the patch! I started to look at it during the weekend, but
I got interrupted and unfortunately didn't have time to look at it
since.
No problem, thanks for looking at it. I have looked at it again this
morning, and applied it.
The fix looks good to me. I also tried multiple failure scenarios and
it's unsurprisingly working just fine. Should we add some regression
tests for that? I guess most of it could be borrowed from the patch
to fix the toast index issue I sent last week.
I have doubts when it comes to using a strategy based on
pg_cancel_backend() and a match on application_name (see for example
5ad72ce, but I cannot find the associated thread). I think that we
could design something more robust here, usable by all tests, with
two things coming to mind:
- A new meta-command for isolation tests to be able to cancel a
session with PQcancel().
- Fault injection in the backend.
For the case of this thread, the cancellation command would be a better
match.
--
Michael
On Thu, Mar 05, 2020 at 12:53:54PM +0900, Michael Paquier wrote:
On Wed, Mar 04, 2020 at 09:21:45AM +0100, Julien Rouhaud wrote:
[...]
I have doubts when it comes to using a strategy based on
pg_cancel_backend() and a match on application_name (see for example
5ad72ce, but I cannot find the associated thread). I think that we
could design something more robust here, usable by all tests, with
two things coming to mind:
- A new meta-command for isolation tests to be able to cancel a
session with PQcancel().
- Fault injection in the backend.
For the case of this thread, the cancellation command would be a better
match.
I agree that the approach wasn't quite robust. I'll try to look at adding a
new command for isolationtester, but that's probably not something we want to
put in pg13?
Here's a v3 that addresses the various comments you previously noted, and
from which I also removed the regression tests.
Note that while looking at it, I noticed another bug in RIC:
# create table t1(id integer, val text); create index on t1(val);
CREATE TABLE
CREATE INDEX
# reindex table concurrently t1;
^CCancel request sent
ERROR: 57014: canceling statement due to user request
LOCATION: ProcessInterrupts, postgres.c:3171
# select indexrelid::regclass, indrelid::regclass, indexrelid, indrelid from pg_index where not indisvalid;
indexrelid | indrelid | indexrelid | indrelid
-------------------------------------+-------------------------+------------+----------
t1_val_idx_ccold | t1 | 16401 | 16395
pg_toast.pg_toast_16395_index_ccold | pg_toast.pg_toast_16395 | 16400 | 16398
(2 rows)
# reindex table concurrently t1;
WARNING: 0A000: cannot reindex invalid index "public.t1_val_idx_ccold" concurrently, skipping
LOCATION: ReindexRelationConcurrently, indexcmds.c:2821
WARNING: XX002: cannot reindex invalid index "pg_toast.pg_toast_16395_index_ccold" concurrently, skipping
LOCATION: ReindexRelationConcurrently, indexcmds.c:2867
REINDEX
# reindex index concurrently t1_val_idx_ccold;
REINDEX
That case is also fixed in this patch.
Attachments:
0001-Don-t-reindex-invalid-indexes-on-TOAST-tables-v3.patch (text/plain, +67/-11)
On Thu, Mar 05, 2020 at 05:57:07PM +0100, Julien Rouhaud wrote:
I agree that the approach wasn't quite robust. I'll try to look at adding a
new command for isolationtester, but that's probably not something we want to
put in pg13?
Yes, that's too late.
Note that while looking at it, I noticed another bug in RIC:
[...]
# reindex table concurrently t1;
WARNING: 0A000: cannot reindex invalid index "public.t1_val_idx_ccold" concurrently, skipping
LOCATION: ReindexRelationConcurrently, indexcmds.c:2821
WARNING: XX002: cannot reindex invalid index "pg_toast.pg_toast_16395_index_ccold" concurrently, skipping
LOCATION: ReindexRelationConcurrently, indexcmds.c:2867
REINDEX
# reindex index concurrently t1_val_idx_ccold;
REINDEX
That case is also fixed in this patch.
This choice is intentional. The idea behind bypassing invalid indexes
for a table-level REINDEX is that reindexing them would lead to a
bloat in the number of relations to handle if multiple runs fail,
leading to more and more invalid indexes to handle each time.
Allowing a single invalid non-toast index to be reindexed with
CONCURRENTLY can be helpful in some cases, like for example a CIC for
a unique index that failed and was left invalid, where the relation
already defined can be reused.
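As an illustration of that last case (names invented): a unique CIC that fails on duplicate values leaves the index behind in an invalid state, and retrying it in place avoids re-creating the index definition:

```sql
CREATE UNIQUE INDEX CONCURRENTLY t1_val_key ON t1 (val);
-- fails on duplicate values, leaving t1_val_key present but invalid;
-- once the duplicates are cleaned up, rebuild it in place:
REINDEX INDEX CONCURRENTLY t1_val_key;
```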
--
Michael
On Fri, Mar 06, 2020 at 10:38:44AM +0900, Michael Paquier wrote:
On Thu, Mar 05, 2020 at 05:57:07PM +0100, Julien Rouhaud wrote:
[...]
This choice is intentional. The idea behind bypassing invalid indexes
for a table-level REINDEX is that reindexing them would lead to a
bloat in the number of relations to handle if multiple runs fail,
leading to more and more invalid indexes to handle each time.
Allowing a single invalid non-toast index to be reindexed with
CONCURRENTLY can be helpful in some cases, like for example a CIC for
a unique index that failed and was left invalid, where the relation
already defined can be reused.
Ah I see, thanks for the clarification. I guess there's room for improvement
in the comments about that, since the ERRCODE_FEATURE_NOT_SUPPORTED usage is
quite misleading there.
v4 attached, which doesn't prevent a REINDEX INDEX CONCURRENTLY on any invalid
non-TOAST index anymore.
Attachments:
0001-Don-t-reindex-invalid-indexes-on-TOAST-tables-v4.patch (text/plain, +45/-1)
On Thu, Mar 05, 2020 at 12:53:54PM +0900, Michael Paquier wrote:
On Wed, Mar 04, 2020 at 09:21:45AM +0100, Julien Rouhaud wrote:
[...]
I think that we
could design something more robust here, usable by all tests, with
two things coming to mind:
- A new meta-command for isolation tests to be able to cancel a
session with PQcancel().
- Fault injection in the backend.
For the case of this thread, the cancellation command would be a better
match.
Here's a patch to add an optional "timeout val" clause to isolationtester's
step definition. When used, isolationtester will actively wait on the query
rather than continuing with the permutation next step, and will issue a cancel
once the defined timeout is reached. I also added as a POC the previous
regression tests for invalid TOAST indexes, updated to use this new
infrastructure (which won't pass as long as the original bug for invalid TOAST
indexes isn't fixed).
I'll park that in the next commitfest, with a v14 target version.
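As a sketch, assuming the grammar proposed in the patch (the exact spelling is whatever the patch defines), a step using the clause might read:

```
step "reindex" { REINDEX TABLE CONCURRENTLY test_toast; } timeout 1000
```

isolationtester would then wait on that query and send a cancel once the timeout expires instead of moving on to the next step of the permutation.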
On Fri, Mar 06, 2020 at 02:15:47PM +0100, Julien Rouhaud wrote:
Here's a patch to add an optional "timeout val" clause to isolationtester's
step definition.
[...]
One problem with this approach is that it does not address the stability
of the test on very slow machines, and there are some of them in the
buildfarm. Taking your patch, I can make the test fail by applying
the following sleep, because the query would be canceled before some
of the indexes are marked as invalid:
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -3046,6 +3046,8 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	pg_usleep(100000L * 10); /* 10s */
+
 	/*
 	 * Phase 2 of REINDEX CONCURRENTLY
Another problem is that on faster machines the test is slow because of
the timeout used. What are your thoughts about having a cancel
meta-command instead?
--
Michael