reindex concurrently and two toast indexes

Started by Justin Pryzbyabout 6 years ago48 messageshackers
Jump to latest
#1Justin Pryzby
pryzby@telsasoft.com

Forking old, long thread:
/messages/by-id/36712441546604286@sas1-890ba5c2334a.qloud-c.yandex.net
On Fri, Jan 04, 2019 at 03:18:06PM +0300, Sergei Kornilov wrote:

About reindex invalid indexes - i found one good question in archives [1]: how about toast indexes?
I check it now, i am able drop invalid toast index, but i can not drop reduntant valid index.
Reproduce:
session 1: begin; select from test_toast ... for update;
session 2: reindex table CONCURRENTLY test_toast ;
session 2: interrupt by ctrl+C
session 1: commit
session 2: reindex table test_toast ;
and now we have two toast indexes. DROP INDEX is able to remove only invalid ones. Valid index gives "ERROR: permission denied: "pg_toast_16426_index_ccnew" is a system catalog"
[1]: /messages/by-id/CAB7nPqT+6igqbUb59y04NEgHoBeUGYteuUr89AKnLTFNdB8Hyw@mail.gmail.com

It looks like this was never addressed.

I noticed a ccnew toast index sitting around since October - what do I do with it ?

ts=# DROP INDEX pg_toast.pg_toast_463881620_index_ccnew;
ERROR: permission denied: "pg_toast_463881620_index_ccnew" is a system catalog

--
Justin

#2Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#1)
Re: reindex concurrently and two toast indexes

On Sun, Feb 16, 2020 at 01:08:35PM -0600, Justin Pryzby wrote:

Forking old, long thread:
/messages/by-id/36712441546604286@sas1-890ba5c2334a.qloud-c.yandex.net
On Fri, Jan 04, 2019 at 03:18:06PM +0300, Sergei Kornilov wrote:

About reindex invalid indexes - i found one good question in archives [1]: how about toast indexes?
I check it now, i am able drop invalid toast index, but i can not drop reduntant valid index.
Reproduce:
session 1: begin; select from test_toast ... for update;
session 2: reindex table CONCURRENTLY test_toast ;
session 2: interrupt by ctrl+C
session 1: commit
session 2: reindex table test_toast ;
and now we have two toast indexes. DROP INDEX is able to remove
only invalid ones. Valid index gives "ERROR: permission denied:
"pg_toast_16426_index_ccnew" is a system catalog"
[1]: /messages/by-id/CAB7nPqT+6igqbUb59y04NEgHoBeUGYteuUr89AKnLTFNdB8Hyw@mail.gmail.com

It looks like this was never addressed.

On HEAD, this exact scenario leads to the presence of an old toast
index pg_toast.pg_toast_*_index_ccold, causing the index to be skipped
on a follow-up concurrent reindex:
=# reindex table CONCURRENTLY test_toast ;
WARNING: XX002: cannot reindex invalid index
"pg_toast.pg_toast_16385_index_ccold" concurrently, skipping
LOCATION: ReindexRelationConcurrently, indexcmds.c:2863
REINDEX

And this toast index can be dropped while it remains invalid:
=# drop index pg_toast.pg_toast_16385_index_ccold;
DROP INDEX

I recall testing that stuff for all the interrupts which could be
triggered and in this case, this waits at step 5 within
WaitForLockersMultiple(). Now, in your case you take an extra step
with a plain REINDEX, which forces a rebuild of the invalid toast
index, making it per se valid, and not droppable.

Hmm. There could be an argument here for skipping invalid toast
indexes within reindex_index(), because we are sure about having at
least one valid toast index at anytime, and these are not concerned
with CIC.

Any thoughts?
--
Michael

#3Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#2)
Re: reindex concurrently and two toast indexes

On Tue, Feb 18, 2020 at 6:30 AM Michael Paquier <michael@paquier.xyz> wrote:

On Sun, Feb 16, 2020 at 01:08:35PM -0600, Justin Pryzby wrote:

Forking old, long thread:
/messages/by-id/36712441546604286@sas1-890ba5c2334a.qloud-c.yandex.net
On Fri, Jan 04, 2019 at 03:18:06PM +0300, Sergei Kornilov wrote:

About reindex invalid indexes - i found one good question in archives [1]: how about toast indexes?
I check it now, i am able drop invalid toast index, but i can not drop reduntant valid index.
Reproduce:
session 1: begin; select from test_toast ... for update;
session 2: reindex table CONCURRENTLY test_toast ;
session 2: interrupt by ctrl+C
session 1: commit
session 2: reindex table test_toast ;
and now we have two toast indexes. DROP INDEX is able to remove
only invalid ones. Valid index gives "ERROR: permission denied:
"pg_toast_16426_index_ccnew" is a system catalog"
[1]: /messages/by-id/CAB7nPqT+6igqbUb59y04NEgHoBeUGYteuUr89AKnLTFNdB8Hyw@mail.gmail.com

It looks like this was never addressed.

On HEAD, this exact scenario leads to the presence of an old toast
index pg_toast.pg_toast_*_index_ccold, causing the index to be skipped
on a follow-up concurrent reindex:
=# reindex table CONCURRENTLY test_toast ;
WARNING: XX002: cannot reindex invalid index
"pg_toast.pg_toast_16385_index_ccold" concurrently, skipping
LOCATION: ReindexRelationConcurrently, indexcmds.c:2863
REINDEX

And this toast index can be dropped while it remains invalid:
=# drop index pg_toast.pg_toast_16385_index_ccold;
DROP INDEX

I recall testing that stuff for all the interrupts which could be
triggered and in this case, this waits at step 5 within
WaitForLockersMultiple(). Now, in your case you take an extra step
with a plain REINDEX, which forces a rebuild of the invalid toast
index, making it per se valid, and not droppable.

Hmm. There could be an argument here for skipping invalid toast
indexes within reindex_index(), because we are sure about having at
least one valid toast index at anytime, and these are not concerned
with CIC.

Or even automatically drop any invalid index on toast relation in
reindex_relation, since those can't be due to a failed CIC?

#4Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#3)
Re: reindex concurrently and two toast indexes

On Tue, Feb 18, 2020 at 07:06:25AM +0100, Julien Rouhaud wrote:

On Tue, Feb 18, 2020 at 6:30 AM Michael Paquier <michael@paquier.xyz> wrote:

Hmm. There could be an argument here for skipping invalid toast
indexes within reindex_index(), because we are sure about having at
least one valid toast index at anytime, and these are not concerned
with CIC.

Or even automatically drop any invalid index on toast relation in
reindex_relation, since those can't be due to a failed CIC?

No, I don't like much outsmarting REINDEX with more index drops than
it needs to do. And this would not take care of the case with REINDEX
INDEX done directly on a toast index.
--
Michael

#5Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#4)
Re: reindex concurrently and two toast indexes

On Tue, Feb 18, 2020 at 7:19 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Feb 18, 2020 at 07:06:25AM +0100, Julien Rouhaud wrote:

On Tue, Feb 18, 2020 at 6:30 AM Michael Paquier <michael@paquier.xyz> wrote:

Hmm. There could be an argument here for skipping invalid toast
indexes within reindex_index(), because we are sure about having at
least one valid toast index at anytime, and these are not concerned
with CIC.

Or even automatically drop any invalid index on toast relation in
reindex_relation, since those can't be due to a failed CIC?

No, I don't like much outsmarting REINDEX with more index drops than
it needs to do. And this would not take care of the case with REINDEX
INDEX done directly on a toast index.

Well, we could still do both but I get the objection. Then skipping
invalid toast indexes in reindex_relation looks like the best fix.

#6Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#5)
Re: reindex concurrently and two toast indexes

On Tue, Feb 18, 2020 at 07:39:49AM +0100, Julien Rouhaud wrote:

On Tue, Feb 18, 2020 at 7:19 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Feb 18, 2020 at 07:06:25AM +0100, Julien Rouhaud wrote:

On Tue, Feb 18, 2020 at 6:30 AM Michael Paquier <michael@paquier.xyz> wrote:

Hmm. There could be an argument here for skipping invalid toast
indexes within reindex_index(), because we are sure about having at
least one valid toast index at anytime, and these are not concerned
with CIC.

Or even automatically drop any invalid index on toast relation in
reindex_relation, since those can't be due to a failed CIC?

No, I don't like much outsmarting REINDEX with more index drops than
it needs to do. And this would not take care of the case with REINDEX
INDEX done directly on a toast index.

Well, we could still do both but I get the objection. Then skipping
invalid toast indexes in reindex_relation looks like the best fix.

PFA a patch to fix the problem using this approach.

I also added isolation tester regression tests. The failure is simulated using
a pg_cancel_backend() on top of pg_stat_activity, using filters on a
specifically set application name and the query text to avoid any unwanted
interaction. I also added a 1s locking delay, to ensure that even slow/CCA
machines can consistently reproduce the failure. Maybe that's not enough, or
maybe testing this scenario is not worth the extra time.

Attachments:

0001-Don-t-reindex-invalid-indexes-on-TOAST-tables-v1.patchtext/plain; charset=us-asciiDownload+100-2
#7Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#2)
Re: reindex concurrently and two toast indexes

On Tue, Feb 18, 2020 at 02:29:33PM +0900, Michael Paquier wrote:

On Sun, Feb 16, 2020 at 01:08:35PM -0600, Justin Pryzby wrote:

Forking old, long thread:
/messages/by-id/36712441546604286@sas1-890ba5c2334a.qloud-c.yandex.net
On Fri, Jan 04, 2019 at 03:18:06PM +0300, Sergei Kornilov wrote:

About reindex invalid indexes - i found one good question in archives [1]: how about toast indexes?
I check it now, i am able drop invalid toast index, but i can not drop reduntant valid index.
Reproduce:
session 1: begin; select from test_toast ... for update;
session 2: reindex table CONCURRENTLY test_toast ;
session 2: interrupt by ctrl+C
session 1: commit
session 2: reindex table test_toast ;
and now we have two toast indexes. DROP INDEX is able to remove
only invalid ones. Valid index gives "ERROR: permission denied:
"pg_toast_16426_index_ccnew" is a system catalog"
[1]: /messages/by-id/CAB7nPqT+6igqbUb59y04NEgHoBeUGYteuUr89AKnLTFNdB8Hyw@mail.gmail.com

It looks like this was never addressed.

On HEAD, this exact scenario leads to the presence of an old toast
index pg_toast.pg_toast_*_index_ccold, causing the index to be skipped
on a follow-up concurrent reindex:
=# reindex table CONCURRENTLY test_toast ;
WARNING: XX002: cannot reindex invalid index
"pg_toast.pg_toast_16385_index_ccold" concurrently, skipping
LOCATION: ReindexRelationConcurrently, indexcmds.c:2863
REINDEX

And this toast index can be dropped while it remains invalid:
=# drop index pg_toast.pg_toast_16385_index_ccold;
DROP INDEX

I recall testing that stuff for all the interrupts which could be
triggered and in this case, this waits at step 5 within
WaitForLockersMultiple(). Now, in your case you take an extra step
with a plain REINDEX, which forces a rebuild of the invalid toast
index, making it per se valid, and not droppable.

Hmm. There could be an argument here for skipping invalid toast
indexes within reindex_index(), because we are sure about having at
least one valid toast index at anytime, and these are not concerned
with CIC.

Julien sent a patch for that, but here are my ideas (which you are free to
reject):

Could you require an AEL for that case, or something which will preclude
reindex table test_toast from working ?

Could you use atomic updates to ensure that exactly one index in an {old,new}
pair is invalid at any given time ?

Could you make the new (invalid) toast index not visible to other transactions?

--
Justin Pryzby

#8Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#6)
Re: reindex concurrently and two toast indexes

On Sat, Feb 22, 2020 at 08:09:24AM +0100, Julien Rouhaud wrote:

On Tue, Feb 18, 2020 at 07:39:49AM +0100, Julien Rouhaud wrote:

On Tue, Feb 18, 2020 at 7:19 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Feb 18, 2020 at 07:06:25AM +0100, Julien Rouhaud wrote:

On Tue, Feb 18, 2020 at 6:30 AM Michael Paquier <michael@paquier.xyz> wrote:

Hmm. There could be an argument here for skipping invalid toast
indexes within reindex_index(), because we are sure about having at
least one valid toast index at anytime, and these are not concerned
with CIC.

PFA a patch to fix the problem using this approach.

I also added isolation tester regression tests. The failure is simulated using
a pg_cancel_backend() on top of pg_stat_activity, using filters on a
specifically set application name and the query text to avoid any unwanted
interaction. I also added a 1s locking delay, to ensure that even slow/CCA
machines can consistently reproduce the failure. Maybe that's not enough, or
maybe testing this scenario is not worth the extra time.

Sorry, I just realized that I forgot to commit the last changes before sending
the patch, so here's the correct v2.

Attachments:

0001-Don-t-reindex-invalid-indexes-on-TOAST-tables-v2.patchtext/plain; charset=us-asciiDownload+110-2
#9Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#8)
Re: reindex concurrently and two toast indexes

On Sat, Feb 22, 2020 at 04:06:57PM +0100, Julien Rouhaud wrote:

Sorry, I just realized that I forgot to commit the last changes before sending
the patch, so here's the correct v2.

Thanks for the patch.

+	if (skipit)
+	{
+		ereport(NOTICE,
+			 (errmsg("skipping invalid index \"%s.%s\"",
+				 get_namespace_name(get_rel_namespace(indexOid)),
+				 get_rel_name(indexOid))));

ReindexRelationConcurrently() issues a WARNING when bumping on an
invalid index, shouldn't the same log level be used?

Even with this patch, it is possible to reindex an invalid toast index
with REINDEX INDEX (with and without CONCURRENTLY), which is the
problem I mentioned upthread (Er, actually only for the non-concurrent
case as told about reindex_index). Shouldn't both cases be prevented
as well with an ERROR?
--
Michael

#10Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#9)
Re: reindex concurrently and two toast indexes

On Thu, Feb 27, 2020 at 04:32:11PM +0900, Michael Paquier wrote:

On Sat, Feb 22, 2020 at 04:06:57PM +0100, Julien Rouhaud wrote:

Sorry, I just realized that I forgot to commit the last changes before sending
the patch, so here's the correct v2.

Thanks for the patch.

+	if (skipit)
+	{
+		ereport(NOTICE,
+			 (errmsg("skipping invalid index \"%s.%s\"",
+				 get_namespace_name(get_rel_namespace(indexOid)),
+				 get_rel_name(indexOid))));

ReindexRelationConcurrently() issues a WARNING when bumping on an
invalid index, shouldn't the same log level be used?

For ReindexRelationConcurrently, the index is skipped because the feature isn't
supported, thus a warning. In this case that would work, it's just that we
don't want to process such indexes, so I used a notice instead.

I'm not opposed to use a warning instead if you prefer. What errcode should be
used though, ERRCODE_WARNING? ERRCODE_FEATURE_NOT_SUPPORTED doesn't feel
right.

Even with this patch, it is possible to reindex an invalid toast index
with REINDEX INDEX (with and without CONCURRENTLY), which is the
problem I mentioned upthread (Er, actually only for the non-concurrent
case as told about reindex_index). Shouldn't both cases be prevented
as well with an ERROR?

Ah indeed, sorry I missed that.

While looking at it, I see that invalid indexes seem to leaked when the table
is dropped, with no way to get rid of them:

s1:
CREATE TABLE t1(val text);
CREATE INDEX ON t1 (val);
BEGIN;
SELECT * FROM t1 FOR UPDATE;

s2:
REINDEX TABLE CONCURRENTLY t1;
[stucked and canceled]
SELECT indexrelid::regclass, indrelid::regclass FROM pg_index WHERE NOT indisvalid;
indexrelid | indrelid
-------------------------------------+-------------------------
t1_val_idx_ccold | t1
pg_toast.pg_toast_16385_index_ccold | pg_toast.pg_toast_16385
(2 rows)

s1:
ROLLBACK;
DROP TABLE t1;

SELECT indexrelid::regclass, indrelid::regclass FROM pg_index WHERE NOT indisvalid;
indexrelid | indrelid
-------------------------------------+----------
t1_val_idx_ccold | 16385
pg_toast.pg_toast_16385_index_ccold | 16388
(2 rows)

REINDEX INDEX t1_val_idx_ccold;
ERROR: XX000: could not open relation with OID 16385
LOCATION: relation_open, relation.c:62

DROP INDEX t1_val_idx_ccold;
ERROR: XX000: could not open relation with OID 16385
LOCATION: relation_open, relation.c:62

REINDEX INDEX pg_toast.pg_toast_16385_index_ccold;
ERROR: XX000: could not open relation with OID 16388
LOCATION: relation_open, relation.c:62

DROP INDEX pg_toast.pg_toast_16385_index_ccold;
ERROR: XX000: could not open relation with OID 16388
LOCATION: relation_open, relation.c:62

REINDEX DATABASE rjuju;
REINDEX

SELECT indexrelid::regclass, indrelid::regclass FROM pg_index WHERE NOT indisvalid;
indexrelid | indrelid
-------------------------------------+----------
t1_val_idx_ccold | 16385
pg_toast.pg_toast_16385_index_ccold | 16388
(2 rows)

Shouldn't DROP TABLE be fixed to also drop invalid indexes?

#11Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#10)
Re: reindex concurrently and two toast indexes

On Thu, Feb 27, 2020 at 09:07:35AM +0100, Julien Rouhaud wrote:

While looking at it, I see that invalid indexes seem to leaked when the table
is dropped, with no way to get rid of them:

Shouldn't DROP TABLE be fixed to also drop invalid indexes?

Hmm. The problem here is that I think that we don't have the correct
correct interface to handle the dependency switching between the old
and new indexes from the start, and 68ac9cf made things better in some
aspects (like non-cancellation and old index drop) but not in others
(like yours, or even a column drop). changeDependenciesOf/On() have
been added especially for REINDEX CONCURRENTLY, but they are not
actually able to handle the case we want them to handle: do a switch
for both relations within the same scan. It is possible to use three
times the existing routines with a couple of CCIs in-between and what
I would call a fake placeholder OID to switch all the records cleanly,
but it would be actually cleaner to do a single scan of pg_depend and
switch the dependencies of both objects at once.

Attached is a draft patch to take care of that problem for HEAD. It
still needs a lot of polishing (variable names are not actually old
or new anymore, etc.) but that's enough to show the idea. If a version
reaches PG12, we would need to keep around the past routines to avoid
an ABI breakage, even if I doubt there are callers of it, but who
knows..
--
Michael

Attachments:

reindex-deps-v1.patchtext/x-diff; charset=us-asciiDownload+55-55
#12Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#11)
Re: reindex concurrently and two toast indexes

On Tue, Mar 03, 2020 at 05:06:42PM +0900, Michael Paquier wrote:

Attached is a draft patch to take care of that problem for HEAD. It
still needs a lot of polishing (variable names are not actually old
or new anymore, etc.) but that's enough to show the idea. If a version
reaches PG12, we would need to keep around the past routines to avoid
an ABI breakage, even if I doubt there are callers of it, but who
knows..

Or actually, a more simple solution is to abuse of the two existing
routines so as the dependency switch is done the other way around,
from the new index to the old one. That would visibly work because
there is no CCI between each scan, and that's faster because the scan
of pg_depend is done only on the entries in need of an update. I'll
look at that again tomorrow, it is late here and I may be missing
something obvious.
--
Michael

Attachments:

reindex-deps-v2.patchtext/x-diff; charset=us-asciiDownload+4-5
#13Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#12)
Re: reindex concurrently and two toast indexes

On Tue, Mar 03, 2020 at 06:25:51PM +0900, Michael Paquier wrote:

Or actually, a more simple solution is to abuse of the two existing
routines so as the dependency switch is done the other way around,
from the new index to the old one. That would visibly work because
there is no CCI between each scan, and that's faster because the scan
of pg_depend is done only on the entries in need of an update. I'll
look at that again tomorrow, it is late here and I may be missing
something obvious.

It was a good inspiration. I have been torturing this patch today and
played with it by injecting elog(ERROR) calls in the middle of reindex
concurrently for all the phases, and checked manually the handling of
entries in pg_depend for the new and old indexes, and these correctly
map. So this is taking care of your problem. Attached is an updated
patch with an updated comment about the dependency of this code with
CCIs. I'd like to go fix this issue first.
--
Michael

Attachments:

reindex-deps-v3.patchtext/x-diff; charset=us-asciiDownload+6-5
#14Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#13)
Re: reindex concurrently and two toast indexes

On Wed, Mar 4, 2020 at 6:15 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Mar 03, 2020 at 06:25:51PM +0900, Michael Paquier wrote:

Or actually, a more simple solution is to abuse of the two existing
routines so as the dependency switch is done the other way around,
from the new index to the old one. That would visibly work because
there is no CCI between each scan, and that's faster because the scan
of pg_depend is done only on the entries in need of an update. I'll
look at that again tomorrow, it is late here and I may be missing
something obvious.

It was a good inspiration. I have been torturing this patch today and
played with it by injecting elog(ERROR) calls in the middle of reindex
concurrently for all the phases, and checked manually the handling of
entries in pg_depend for the new and old indexes, and these correctly
map. So this is taking care of your problem. Attached is an updated
patch with an updated comment about the dependency of this code with
CCIs. I'd like to go fix this issue first.

Thanks for the patch! I started to look at it during the weekend, but
I got interrupted and unfortunately didn't had time to look at it
since.

The fix looks good to me. I also tried multiple failure scenario and
it's unsurprisingly working just fine. Should we add some regression
tests for that? I guess most of it could be borrowed from the patch
to fix the toast index issue I sent last week.

#15Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#14)
Re: reindex concurrently and two toast indexes

On Wed, Mar 04, 2020 at 09:21:45AM +0100, Julien Rouhaud wrote:

Thanks for the patch! I started to look at it during the weekend, but
I got interrupted and unfortunately didn't had time to look at it
since.

No problem, thanks for looking at it. I have looked at it again this
morning, and applied it.

The fix looks good to me. I also tried multiple failure scenario and
it's unsurprisingly working just fine. Should we add some regression
tests for that? I guess most of it could be borrowed from the patch
to fix the toast index issue I sent last week.

I have doubts when it comes to use a strategy based on
pg_cancel_backend() and a match of application_name (see for example
5ad72ce but I cannot find the associated thread). I think that we
could design something more robust here and usable by all tests, with
two things coming into my mind:
- A new meta-command for isolation tests to be able to cancel a
session with PQcancel().
- Fault injection in the backend.
For the case of this thread, the cancellation command would be a better
match.
--
Michael

#16Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#15)
Re: reindex concurrently and two toast indexes

On Thu, Mar 05, 2020 at 12:53:54PM +0900, Michael Paquier wrote:

On Wed, Mar 04, 2020 at 09:21:45AM +0100, Julien Rouhaud wrote:

Thanks for the patch! I started to look at it during the weekend, but
I got interrupted and unfortunately didn't had time to look at it
since.

No problem, thanks for looking at it. I have looked at it again this
morning, and applied it.

The fix looks good to me. I also tried multiple failure scenario and
it's unsurprisingly working just fine. Should we add some regression
tests for that? I guess most of it could be borrowed from the patch
to fix the toast index issue I sent last week.

I have doubts when it comes to use a strategy based on
pg_cancel_backend() and a match of application_name (see for example
5ad72ce but I cannot find the associated thread). I think that we
could design something more robust here and usable by all tests, with
two things coming into my mind:
- A new meta-command for isolation tests to be able to cancel a
session with PQcancel().
- Fault injection in the backend.
For the case of this thread, the cancellation command would be a better
match.

I agree that the approach wasn't quite robust. I'll try to look at adding a
new command for isolationtester, but that's probably not something we want to
put in pg13?

Here's a v3 that takes address the various comments you previously noted, and
for which I also removed the regression tests.

Note that while looking at it, I noticed another bug in RIC:

# create table t1(id integer, val text); create index on t1(val);
CREATE TABLE

CREATE INDEX

# reindex table concurrently t1;
^CCancel request sent
ERROR: 57014: canceling statement due to user request
LOCATION: ProcessInterrupts, postgres.c:3171

# select indexrelid::regclass, indrelid::regclass, indexrelid, indrelid from pg_index where not indisvalid;
indexrelid | indrelid | indexrelid | indrelid
-------------------------------------+-------------------------+------------+----------
t1_val_idx_ccold | t1 | 16401 | 16395
pg_toast.pg_toast_16395_index_ccold | pg_toast.pg_toast_16395 | 16400 | 16398
(2 rows)

# reindex table concurrently t1;
WARNING: 0A000: cannot reindex invalid index "public.t1_val_idx_ccold" concurrently, skipping
LOCATION: ReindexRelationConcurrently, indexcmds.c:2821
WARNING: XX002: cannot reindex invalid index "pg_toast.pg_toast_16395_index_ccold" concurrently, skipping
LOCATION: ReindexRelationConcurrently, indexcmds.c:2867
REINDEX

# reindex index concurrently t1_val_idx_ccold;
REINDEX

That case is also fixed in this patch.

Attachments:

0001-Don-t-reindex-invalid-indexes-on-TOAST-tables-v3.patchtext/plain; charset=us-asciiDownload+67-11
#17Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#16)
Re: reindex concurrently and two toast indexes

On Thu, Mar 05, 2020 at 05:57:07PM +0100, Julien Rouhaud wrote:

I agree that the approach wasn't quite robust. I'll try to look at adding a
new command for isolationtester, but that's probably not something we want to
put in pg13?

Yes, that's too late.

Note that while looking at it, I noticed another bug in RIC:

[...]

# reindex table concurrently t1;
WARNING: 0A000: cannot reindex invalid index "public.t1_val_idx_ccold" concurrently, skipping
LOCATION: ReindexRelationConcurrently, indexcmds.c:2821
WARNING: XX002: cannot reindex invalid index "pg_toast.pg_toast_16395_index_ccold" concurrently, skipping
LOCATION: ReindexRelationConcurrently, indexcmds.c:2867
REINDEX
# reindex index concurrently t1_val_idx_ccold;
REINDEX

That case is also fixed in this patch.

This choice is intentional. The idea about bypassing invalid indexes
for table-level REINDEX is that this would lead to a bloat in the
number of relations to handling if multiple runs are failing, leading
to more and more invalid indexes to handle each time. Allowing a
single invalid non-toast index to be reindexed with CONCURRENTLY can
be helpful in some cases, like for example a CIC for a unique index
that failed and was invalid, where the relation already defined can be
reused.
--
Michael

#18Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#17)
Re: reindex concurrently and two toast indexes

On Fri, Mar 06, 2020 at 10:38:44AM +0900, Michael Paquier wrote:

On Thu, Mar 05, 2020 at 05:57:07PM +0100, Julien Rouhaud wrote:

I agree that the approach wasn't quite robust. I'll try to look at adding a
new command for isolationtester, but that's probably not something we want to
put in pg13?

Yes, that's too late.

Note that while looking at it, I noticed another bug in RIC:

[...]

# reindex table concurrently t1;
WARNING: 0A000: cannot reindex invalid index "public.t1_val_idx_ccold" concurrently, skipping
LOCATION: ReindexRelationConcurrently, indexcmds.c:2821
WARNING: XX002: cannot reindex invalid index "pg_toast.pg_toast_16395_index_ccold" concurrently, skipping
LOCATION: ReindexRelationConcurrently, indexcmds.c:2867
REINDEX
# reindex index concurrently t1_val_idx_ccold;
REINDEX

That case is also fixed in this patch.

This choice is intentional. The idea about bypassing invalid indexes
for table-level REINDEX is that this would lead to a bloat in the
number of relations to handling if multiple runs are failing, leading
to more and more invalid indexes to handle each time. Allowing a
single invalid non-toast index to be reindexed with CONCURRENTLY can
be helpful in some cases, like for example a CIC for a unique index
that failed and was invalid, where the relation already defined can be
reused.

Ah I see, thanks for the clarification. I guess there's room for improvement
in the comments about that, since the ERRCODE_FEATURE_NOT_SUPPORTED usage is
quite misleading there.

v4 attached, which doesn't prevent a REINDEX INDEX CONCURRENTLY on any invalid
non-TOAST index anymore.

Attachments:

0001-Don-t-reindex-invalid-indexes-on-TOAST-tables-v4.patchtext/plain; charset=us-asciiDownload+45-1
#19Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#15)
Add an optional timeout clause to isolationtester step.

On Thu, Mar 05, 2020 at 12:53:54PM +0900, Michael Paquier wrote:

On Wed, Mar 04, 2020 at 09:21:45AM +0100, Julien Rouhaud wrote:

Should we add some regression
tests for that? I guess most of it could be borrowed from the patch
to fix the toast index issue I sent last week.

I have doubts when it comes to use a strategy based on
pg_cancel_backend() and a match of application_name (see for example
5ad72ce but I cannot find the associated thread). I think that we
could design something more robust here and usable by all tests, with
two things coming into my mind:
- A new meta-command for isolation tests to be able to cancel a
session with PQcancel().
- Fault injection in the backend.
For the case of this thread, the cancellation command would be a better
match.

Here's a patch to add an optional "timeout val" clause to isolationtester's
step definition. When used, isolationtester will actively wait on the query
rather than continuing with the permutation next step, and will issue a cancel
once the defined timeout is reached. I also added as a POC the previous
regression tests for invalid TOAST indexes, updated to use this new
infrastructure (which won't pass as long as the original bug for invalid TOAST
indexes isn't fixed).

I'll park that in the next commitfest, with a v14 target version.

Attachments:

0001-Add-an-optional-timeout-value-to-isolationtester-ste-v1.patchtext/x-diff; charset=us-asciiDownload+70-12
0002-Add-regression-tests-for-failed-REINDEX-TABLE-CONCUR-v2.patchtext/x-diff; charset=us-asciiDownload+72-2
#20Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#19)
Re: Add an optional timeout clause to isolationtester step.

On Fri, Mar 06, 2020 at 02:15:47PM +0100, Julien Rouhaud wrote:

Here's a patch to add an optional "timeout val" clause to isolationtester's
step definition. When used, isolationtester will actively wait on the query
rather than continuing with the permutation next step, and will issue a cancel
once the defined timeout is reached. I also added as a POC the previous
regression tests for invalid TOAST indexes, updated to use this new
infrastructure (which won't pass as long as the original bug for invalid TOAST
indexes isn't fixed).

One problem with this approach is that it does address the stability
of the test on very slow machines, and there are some of them in the
buildfarm.  Taking your patch, I can make the test fail by applying
the following sleep because the query would be cancelled before some
of the indexes are marked as invalid:
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -3046,6 +3046,8 @@ ReindexRelationConcurrently(Oid relationOid, int
options)
    CommitTransactionCommand();
        StartTransactionCommand();
+   pg_usleep(100000L * 10); /* 10s */
+
    /*
     * Phase 2 of REINDEX CONCURRENTLY

Another problem is that on faster machines the test is slow because of
the timeout used. What are your thoughts about having instead a
cancel meta-command instead?
--
Michael

#21Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Julien Rouhaud (#21)
#23Julien Rouhaud
rjuju123@gmail.com
In reply to: Tom Lane (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Julien Rouhaud (#23)
#25Julien Rouhaud
rjuju123@gmail.com
In reply to: Tom Lane (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Julien Rouhaud (#25)
#27Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#26)
#28Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#18)
#29Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#28)
#30Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#22)
#31Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#30)
#32Andres Freund
andres@anarazel.de
In reply to: Julien Rouhaud (#25)
#33Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#32)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#33)
#35Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#34)
#36Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#29)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#35)
#38Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#36)
#39Julien Rouhaud
rjuju123@gmail.com
In reply to: Tom Lane (#37)
#40Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#39)
#41Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#40)
#42Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#41)
#43Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#42)
#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#43)
#45Julien Rouhaud
rjuju123@gmail.com
In reply to: Alvaro Herrera (#42)
#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Julien Rouhaud (#45)
#47Julien Rouhaud
rjuju123@gmail.com
In reply to: Tom Lane (#46)
#48Tom Lane
tgl@sss.pgh.pa.us
In reply to: Julien Rouhaud (#47)