Does a cancelled REINDEX CONCURRENTLY need to be messy?

Started by Thom Brownalmost 3 years ago9 messageshackers

thom@linux.com

almost 3 years ago

Hi,

It's documented that a failed REINDEX can leave behind a transient
index, and I'm not going to speculate on all the conditions that could
lead to this situation. However, cancelling a REINDEX CONCURRENTLY
will reliably leave behind the index it was building (<index
name>_ccnew).

Doesn't a cancellation instruct the process that the user has made a
decision regarding the fate of the new version of the index? Is there
a situation where the incomplete transient index might need to be
inspected following a cancellation?

Because if not, why not get it to tidy up after itself? If the
process crashed, fair enough, but it just doesn't sit well that
leftover artifacts of an aborted operation aren't tidied up,
especially since subsequent attempts to REINDEX will find these
invalid transient versions and attempt to REINDEX them. Why should
the user need to know about them and take manual action in the case of
a cancellation?

I get the feeling that this is deliberate, and perhaps an attempt to
mitigate locking issues, or some other explanation, but the rationale
isn't immediately apparent to me if this is the case.

Thanks

Thom

Andreas Karlsson

andreas.karlsson@percona.com

almost 3 years ago

In reply to: Thom Brown (#1)

Re: Does a cancelled REINDEX CONCURRENTLY need to be messy?

On 6/29/23 11:13, Thom Brown wrote:

I get the feeling that this is deliberate, and perhaps an attempt to
mitigate locking issues, or some other explanation, but the rationale
isn't immediately apparent to me if this is the case.

I have always assumed the reason is that there might be other
transactions using the index so if we are going to drop it on rollback
we might get stuck forever waiting for an exclusive lock on the index.
How do you get around that? Rollback being stuck waiting forever is
certainly not a nice behavior.

Andreas

tgl@sss.pgh.pa.us

almost 3 years ago

In reply to: Andreas Karlsson (#2)

Re: Does a cancelled REINDEX CONCURRENTLY need to be messy?

Andreas Karlsson <andreas@proxel.se> writes:

On 6/29/23 11:13, Thom Brown wrote:

I get the feeling that this is deliberate, and perhaps an attempt to
mitigate locking issues, or some other explanation, but the rationale
isn't immediately apparent to me if this is the case.

I have always assumed the reason is that there might be other
transactions using the index so if we are going to drop it on rollback
we might get stuck forever waiting for an exclusive lock on the index.
How do you get around that? Rollback being stuck waiting forever is
certainly not a nice behavior.

Right. The whole point of CONCURRENTLY is to never take an exclusive
lock. But once we reach the stage where the index is open for other
transactions to insert into, it's difficult to back out in a nice way.

Now that we have DROP INDEX CONCURRENTLY, you could imagine switching
into that code path --- but that *also* involves waiting for other
transactions, so you still have the problem that the transaction may
appear to be stuck and not responding to cancel.

(IIRC, cancelling DROP INDEX CONCURRENTLY also leads to a messy
situation, in that the index is still there but might not be fully
functional.)

regards, tom lane

alvherre@2ndquadrant.com

almost 3 years ago

In reply to: Tom Lane (#3)

Re: Does a cancelled REINDEX CONCURRENTLY need to be messy?

ALTER TABLE DETACH CONCURRENTLY had to deal with this also, and it did it by having a COMPLETE option you can run later in case things got stuck the first time around. I suppose we could do something similar, where the server automatically does the needful, whatever that is.

thom@linux.com

almost 3 years ago

In reply to: Alvaro Herrera (#4)

Re: Does a cancelled REINDEX CONCURRENTLY need to be messy?

On Thu, 29 Jun 2023, 14:45 Álvaro Herrera, <alvherre@alvh.no-ip.org> wrote:

ALTER TABLE DETACH CONCURRENTLY had to deal with this also, and it did it
by having a COMPLETE option you can run later in case things got stuck the
first time around. I suppose we could do something similar, where the
server automatically does the needful, whatever that is.

So there doesn't appear to be provision for deferred activities.

Could, perhaps, the fact that it is an invalid index that has no locks on
it, and is dependent on the table mean it could be removed by a VACUUM?

I just don't like the idea of the user needing to remove broken things.

Thom

alvherre@2ndquadrant.com

almost 3 years ago

In reply to: Thom Brown (#5)

Re: Does a cancelled REINDEX CONCURRENTLY need to be messy?

On 2023-Jul-01, Thom Brown wrote:

On Thu, 29 Jun 2023, 14:45 Álvaro Herrera, <alvherre@alvh.no-ip.org> wrote:

ALTER TABLE DETACH CONCURRENTLY had to deal with this also, and it did it
by having a COMPLETE option you can run later in case things got stuck the
first time around. I suppose we could do something similar, where the
server automatically does the needful, whatever that is.

So there doesn't appear to be provision for deferred activities.

There is not.

Could, perhaps, the fact that it is an invalid index that has no locks on
it, and is dependent on the table mean it could be removed by a VACUUM?

Well, I definitely agree that it would be useful to have *something*
that automatically removes debris (I'm not sure VACUUM is the best place
to do it. Perhaps we could have autovacuum check for it, and do it
separately of vacuum proper.)

On the whole, the reason we don't have such a mechanism AFAIK is that
nobody has presented a credible implementation for it. There was a push
to use UNDO to remove orphan files; if we had that, we could also use it
to implement cleanup of dead indexes and partially-detached partitions.
However, that project crashed and burned a long time ago and has seen no
resurrection as yet.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Find a bug in a program, and fix it, and the program will work today.
Show the program how to find and fix a bug, and the program
will work forever" (Oliver Silfridge)

Michael Paquier

michael@paquier.xyz

almost 3 years ago

In reply to: Alvaro Herrera (#6)

Re: Does a cancelled REINDEX CONCURRENTLY need to be messy?

On Mon, Jul 03, 2023 at 07:46:27PM +0200, Alvaro Herrera wrote:

On 2023-Jul-01, Thom Brown wrote:

On Thu, 29 Jun 2023, 14:45 Álvaro Herrera, <alvherre@alvh.no-ip.org> wrote:

ALTER TABLE DETACH CONCURRENTLY had to deal with this also, and it did it
by having a COMPLETE option you can run later in case things got stuck the
first time around. I suppose we could do something similar, where the
server automatically does the needful, whatever that is.

I could imagine a code path for manual and automatic operations for
REINDEX (?) at table level and at database level, but using this
keyword would be strange, as well. CONCURRENTLY cannot work on system
indexes so SYSTEM does not make sense, and index level is no different
than a DROP.

Well, I definitely agree that it would be useful to have *something*
that automatically removes debris (I'm not sure VACUUM is the best place
to do it. Perhaps we could have autovacuum check for it, and do it
separately of vacuum proper.)

Being able to reuse some of the worker/launcher parts from autovacuum
could make things easier for a bgworker implementation, perhaps?
--
Michael

alvherre@2ndquadrant.com

almost 3 years ago

In reply to: Michael Paquier (#7)

Re: Does a cancelled REINDEX CONCURRENTLY need to be messy?

On 2023-Jul-04, Michael Paquier wrote:

On Mon, Jul 03, 2023 at 07:46:27PM +0200, Alvaro Herrera wrote:

Perhaps we could have autovacuum check for it, and do it
separately of vacuum proper.)

Being able to reuse some of the worker/launcher parts from autovacuum
could make things easier for a bgworker implementation, perhaps?

TBH I don't understand what you are thinking about.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"I can see support will not be a problem. 10 out of 10." (Simon Wittber)
(http://archives.postgresql.org/pgsql-general/2004-12/msg00159.php)

Peter Eisentraut

peter_e@gmx.net

almost 3 years ago

In reply to: Alvaro Herrera (#6)

Re: Does a cancelled REINDEX CONCURRENTLY need to be messy?

On 03.07.23 19:46, Álvaro Herrera wrote:

Well, I definitely agree that it would be useful to have*something*
that automatically removes debris

Yeah, like "undo".