Should I implement DROP INDEX CONCURRENTLY?
Hello list,
At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
recently, while frobbing around some indexes, I realized that there is
no equivalent for DROP INDEX. This is a similar but lesser problem
(since CREATE INDEX takes much longer): DROP INDEX takes an ACCESS
EXCLUSIVE lock on the parent table while doing the work to unlink
files, which one would nominally think trivial, but I assure you it is
not, at times, even for indexes that are only a handful of gigabytes
(say, a dozen or fewer). By non-trivial, I mean it can take 30+
seconds, though less than a couple of minutes. The storage layer
(starting from the higher levels of abstraction) is XFS, a somewhat
trivial LVM setup, mdraid (8-way), and Amazon EBS (NBD?).
I was poking around in tablecmds.c and index.c and wondered whether a
two-pass approach similar to the one used by CREATE INDEX CONCURRENTLY
could be used to implement DROP INDEX CONCURRENTLY, and whether there
would be any interest in accepting such a patch.
Quoth index.c:
/*
* To drop an index safely, we must grab exclusive lock on its parent
* table. Exclusive lock on the index alone is insufficient because
* another backend might be about to execute a query on the parent table.
* If it relies on a previously cached list of index OIDs, then it could
* attempt to access the just-dropped index. We must therefore take a
* table lock strong enough to prevent all queries on the table from
* proceeding until we commit and send out a shared-cache-inval notice
* that will make them update their index lists.
*/
Could I make the ACCESS EXCLUSIVE section just long enough to commit
catalog updates, and then have the bulk of the work happen afterwards?
The general idea is:
1) Set the index as "invalid", to ensure no backend will use it in
planning.
2) Wait for the xmin horizon to advance, to ensure that any open
snapshots that might not see the invalidation of the index are gone.
(Is there a way to tighten that up? Although even this conservative
version would be 80-90% of the value for us...)
3) Then use performDeletion without taking a lock on the parent
table, similar to what's in tablecmds.c already (a sketch of this
flow follows below).
Just like CREATE INDEX CONCURRENTLY, a DROP INDEX CONCURRENTLY may
leave an invalid index behind if it is aborted rather than run to
completion.
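To make the proposal concrete, here is a minimal standalone sketch of
the two-phase flow. This is not backend code: IndexEntry,
wait_for_old_snapshots(), and perform_deletion() are invented
stand-ins for the pg_index entry, the xmin-horizon wait, and
performDeletion() respectively.

/*
 * Minimal standalone sketch (not PostgreSQL source) of the proposed
 * two-phase DROP INDEX CONCURRENTLY.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct IndexEntry
{
    const char *name;
    bool        indisvalid;     /* may the planner use this index? */
} IndexEntry;

/* stand-in: block until no snapshot predates our catalog update */
static void
wait_for_old_snapshots(void)
{
    printf("waiting for the xmin horizon to advance...\n");
}

/* stand-in for performDeletion(): unlink files, drop catalog rows */
static void
perform_deletion(IndexEntry *idx)
{
    printf("unlinking storage for %s (no table lock held)\n", idx->name);
}

static void
drop_index_concurrently(IndexEntry *idx)
{
    /* phase 1: brief catalog update; only this needs the strong lock */
    idx->indisvalid = false;
    printf("committed indisvalid = false for %s\n", idx->name);

    /* phase 2: wait out snapshots that might still see the old state */
    wait_for_old_snapshots();

    /* phase 3: the slow part, without blocking the parent table */
    perform_deletion(idx);
}

int
main(void)
{
    IndexEntry  idx = {"some_big_index", true};

    drop_index_concurrently(&idx);
    return 0;
}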
--
fdr
On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina <daniel@heroku.com> wrote:
Hello list,
At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
recently, while frobbing around some indexes, I realized that there is
no equivalent for DROP INDEX. This is a similar but lesser problem
(since CREATE INDEX takes much longer): DROP INDEX takes an ACCESS
EXCLUSIVE lock on the parent table while doing the work to unlink
files, which one would nominally think trivial, but I assure you it is
not, at times, even for indexes that are only a handful of gigabytes
(say, a dozen or fewer). By non-trivial, I mean it can take 30+
seconds, though less than a couple of minutes. The storage layer
(starting from the higher levels of abstraction) is XFS, a somewhat
trivial LVM setup, mdraid (8-way), and Amazon EBS (NBD?).
Are you sure that you are really waiting on the time to unlink the
file? There's other stuff going on in there, like waiting for locks,
plan invalidation, etc. Point being, maybe the time-consuming stuff
can't really be deferred, which would make the proposal moot.
merlin
On Wed, Aug 24, 2011 at 2:24 PM, Daniel Farina <daniel@heroku.com> wrote:
Could I make the ACCESS EXCLUSIVE section just long enough to commit
catalog updates, and then have the bulk of the work happen afterwards?
The general idea is:
1) Set the index as "invalid", to ensure no backend will use it in
planning.
2) Wait for the xmin horizon to advance, to ensure that any open
snapshots that might not see the invalidation of the index are gone.
(Is there a way to tighten that up? Although even this conservative
version would be 80-90% of the value for us...)
3) Then use performDeletion without taking a lock on the parent
table, similar to what's in tablecmds.c already.
Just like CREATE INDEX CONCURRENTLY, a DROP INDEX CONCURRENTLY may
leave an invalid index behind if it is aborted rather than run to
completion.
This might be a dumb idea, but could we rearrange CommitTransaction()
so that smgrDoPendingDeletes() happens just a bit further down, after
those ResourceOwnerRelease() calls? It seems like that might
accomplish what you're trying to do here without needing a new
command.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Merlin Moncure <mmoncure@gmail.com> writes:
On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina <daniel@heroku.com> wrote:
At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
recently, while frobbing around some indexes, I realized that there is
no equivalent for DROP INDEX. This is a similar but lesser problem
(since CREATE INDEX takes much longer): DROP INDEX takes an ACCESS
EXCLUSIVE lock on the parent table while doing the work to unlink
files, which one would nominally think trivial, but I assure you it is
not, at times, even for indexes that are only a handful of gigabytes
(say, a dozen or fewer).
Are you sure that you are really waiting on the time to unlink the
file? There's other stuff going on in there, like waiting for locks,
plan invalidation, etc. Point being, maybe the time-consuming stuff
can't really be deferred, which would make the proposal moot.
Assuming the issue really is the physical unlinks (which I agree I'd
like to see some evidence for), I wonder whether the problem could be
addressed by moving smgrDoPendingDeletes() to after locks are released,
instead of before, in CommitTransaction/AbortTransaction. There does
not seem to be any strong reason why we have to do that before lock
release, since incoming potential users of a table should not be trying
to access the old physical storage after that anyway.
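To illustrate, a trivial standalone sketch of the reordering being
suggested: release_locks() and do_pending_deletes() are invented
stand-ins for the lock-release and smgrDoPendingDeletes() steps of
CommitTransaction(), which of course does far more than this.

#include <stdio.h>

static void
release_locks(void)
{
    printf("locks released; blocked sessions may proceed\n");
}

static void
do_pending_deletes(void)
{
    printf("unlinking dropped relation files (possibly slow)\n");
}

/* today: the slow unlinks happen while locks are still held */
static void
commit_transaction_current(void)
{
    do_pending_deletes();
    release_locks();
}

/* proposed: release locks first, then do the filesystem work */
static void
commit_transaction_proposed(void)
{
    release_locks();
    do_pending_deletes();
}

int
main(void)
{
    commit_transaction_current();
    commit_transaction_proposed();
    return 0;
}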
regards, tom lane
On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Merlin Moncure <mmoncure@gmail.com> writes:
On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina <daniel@heroku.com> wrote:
At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
recently, while frobbing around some indexes, I realized that there is
no equivalent for DROP INDEX. This is a similar but lesser problem
(since CREATE INDEX takes much longer): DROP INDEX takes an ACCESS
EXCLUSIVE lock on the parent table while doing the work to unlink
files, which one would nominally think trivial, but I assure you it is
not, at times, even for indexes that are only a handful of gigabytes
(say, a dozen or fewer).
Are you sure that you are really waiting on the time to unlink the
file? There's other stuff going on in there, like waiting for locks,
plan invalidation, etc. Point being, maybe the time-consuming stuff
can't really be deferred, which would make the proposal moot.
Assuming the issue really is the physical unlinks (which I agree I'd
like to see some evidence for), I wonder whether the problem could be
addressed by moving smgrDoPendingDeletes() to after locks are released,
instead of before, in CommitTransaction/AbortTransaction. There does
not seem to be any strong reason why we have to do that before lock
release, since incoming potential users of a table should not be trying
to access the old physical storage after that anyway.
Alright: this concern about confirming the expensive part of index
dropping has come up a few times, but otherwise the waters are warm,
so I'll go ahead and do some work to pin things down before we
continue working on those assumptions.
--
fdr
On Wed, Aug 24, 2011 at 1:04 PM, Daniel Farina <daniel@heroku.com> wrote:
On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Assuming the issue really is the physical unlinks (which I agree I'd
like to see some evidence for), I wonder whether the problem could be
addressed by moving smgrDoPendingDeletes() to after locks are released,
instead of before, in CommitTransaction/AbortTransaction. There does
not seem to be any strong reason why we have to do that before lock
release, since incoming potential users of a table should not be trying
to access the old physical storage after that anyway.
Alright: this concern about confirming the expensive part of index
dropping has come up a few times, but otherwise the waters are warm,
so I'll go ahead and do some work to pin things down before we
continue working on those assumptions.
This suspicion seems to have been proven correct: an opportunity came
up where we were removing some indexes on a live system, and I took
the chance to carefully control and time the process. There's not much
relationship between the size of the index and the delay, but the
pauses are still very real. (The first time this was noticed, the
system was under significantly higher load.)
I'd still like to do something to solve this problem, though: even if
the time-consuming part of the process is not file unlinking, it's
clearly something that happens after the AccessExclusiveLock is
acquired, based on our other measurements.
Back to the drawing board...
--
fdr
On Wed, 2011-08-24 at 11:24 -0700, Daniel Farina wrote:
At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
recently, while frobbing around some indexes, I realized that there is
no equivalent for DROP INDEX. This is a similar but lesser problem
(since CREATE INDEX takes much longer): DROP INDEX takes an ACCESS
EXCLUSIVE lock on the parent table while doing the work to unlink
files, which one would nominally think trivial, but I assure you it is
not, at times, even for indexes that are only a handful of gigabytes
(say, a dozen or fewer). By non-trivial, I mean it can take 30+
seconds, though less than a couple of minutes. The storage layer
(starting from the higher levels of abstraction) is XFS, a somewhat
trivial LVM setup, mdraid (8-way), and Amazon EBS (NBD?).
I was poking around in tablecmds.c and index.c and wondered whether a
two-pass approach similar to the one used by CREATE INDEX CONCURRENTLY
could be used to implement DROP INDEX CONCURRENTLY, and whether there
would be any interest in accepting such a patch.
Hmm, it seems I just independently came up with this same concept. My
problem is that if a CREATE INDEX CONCURRENTLY fails, you need an
exclusive lock on the table just to clean that up. If the table is
under constant load, you can't easily do that. So a two-pass DROP INDEX
CONCURRENTLY might have been helpful for me.
On Fri, Dec 30, 2011 at 10:20 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On Wed, 2011-08-24 at 11:24 -0700, Daniel Farina wrote:
I was poking around in tablecmds.c and index.c and wondered whether a
two-pass approach similar to the one used by CREATE INDEX CONCURRENTLY
could be used to implement DROP INDEX CONCURRENTLY, and whether there
would be any interest in accepting such a patch.
Hmm, it seems I just independently came up with this same concept. My
problem is that if a CREATE INDEX CONCURRENTLY fails, you need an
exclusive lock on the table just to clean that up. If the table is
under constant load, you can't easily do that. So a two-pass DROP INDEX
CONCURRENTLY might have been helpful for me.
Here's a patch for this. Please review.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment: drop_index_concurrently.v1.patch (text/x-patch, +314/-23 lines)
On Fri, Sep 9, 2011 at 11:02 PM, Daniel Farina <daniel@heroku.com> wrote:
On Wed, Aug 24, 2011 at 1:04 PM, Daniel Farina <daniel@heroku.com> wrote:
On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Assuming the issue really is the physical unlinks (which I agree I'd
like to see some evidence for), I wonder whether the problem could be
addressed by moving smgrDoPendingDeletes() to after locks are released,
instead of before, in CommitTransaction/AbortTransaction. There does
not seem to be any strong reason why we have to do that before lock
release, since incoming potential users of a table should not be trying
to access the old physical storage after that anyway.
Alright: this concern about confirming the expensive part of index
dropping has come up a few times, but otherwise the waters are warm,
so I'll go ahead and do some work to pin things down before we
continue working on those assumptions.
This suspicion seems to have been proven correct: an opportunity came
up where we were removing some indexes on a live system, and I took
the chance to carefully control and time the process. There's not much
relationship between the size of the index and the delay, but the
pauses are still very real. (The first time this was noticed, the
system was under significantly higher load.)
I'd still like to do something to solve this problem, though: even if
the time-consuming part of the process is not file unlinking, it's
clearly something that happens after the AccessExclusiveLock is
acquired, based on our other measurements.
This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.
On top of that, taking what Robert Haas mentioned on another thread,
InvalidateBuffer currently calls StrategyFreeBuffer(), which waits for
exclusive access to the BufFreelistLock. On a busy system this will be
heavily contended, so adding blocks to the freelist only if the lock
is free seems warranted.
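As a rough standalone illustration of the O(N) cost (the structures
are simplified stand-ins for the real buffer headers; this is not the
actual DropRelFileNodeBuffers() code):

#include <stdbool.h>
#include <stdio.h>

#define NBUFFERS 1024           /* stands in for shared_buffers */

typedef struct BufferDesc
{
    unsigned    relnode;        /* which relation owns this page */
    bool        valid;
    bool        dirty;
} BufferDesc;

static BufferDesc buffers[NBUFFERS];

/* mimics DropRelFileNodeBuffers(): full scan, whatever the rel size */
static void
drop_rel_buffers(unsigned relnode)
{
    int         ndropped = 0;

    for (int i = 0; i < NBUFFERS; i++)  /* O(NBuffers), always */
    {
        if (buffers[i].valid && buffers[i].relnode == relnode)
        {
            /* must kill even dirty pages: their file is going away */
            buffers[i].valid = buffers[i].dirty = false;
            ndropped++;
        }
    }
    printf("scanned %d buffers to drop %d pages\n", NBUFFERS, ndropped);
}

int
main(void)
{
    buffers[3] = (BufferDesc) {42, true, true};
    buffers[700] = (BufferDesc) {42, true, false};
    drop_rel_buffers(42);       /* scans all 1024 buffers to find 2 */
    return 0;
}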
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment: strategyfreebuffer.v1.patch (text/x-patch, +11/-2 lines)
On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.
On top of that, taking what Robert Haas mentioned on another thread,
InvalidateBuffer currently calls StrategyFreeBuffer(), which waits for
exclusive access to the BufFreelistLock. On a busy system this will be
heavily contended, so adding blocks to the freelist only if the lock
is free seems warranted.
Couldn't we just leave the buffers alone? Once an index is dropped, and that's pushed out through the catalog, nothing should be trying to access them, and they'll eventually just get aged out.
In fact, IIRC the function that scans for buffers actually checks to see if a rel still exists before it returns the buffer...
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
Jim Nasby <jim@nasby.net> writes:
On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.
Couldn't we just leave the buffers alone? Once an index is dropped and that's pushed out through the catalog then nothing should be trying to access them and they'll eventually just get aged out.
No, we can't, because if they're still dirty then the bgwriter would
first try to write them to the no-longer-existing storage file. It's
important that we kill the buffers immediately during relation drop.
I'm still thinking that it might be sufficient to mark the buffers
invalid and let the clock sweep find them, thereby eliminating the need
for a freelist. Simon is after a different solution involving getting
rid of the clock sweep, but he has failed to explain how that's not
going to end up being the same type of contention-prone coding that we
got rid of by adopting the clock sweep, some years ago. Yeah, the sweep
takes a lot of spinlocks, but that only matters if there is contention
for them, and the sweep approach avoids the need for a centralized data
structure.
(BTW, do we have a separate clock sweep hand for each backend? If not,
there might be some low hanging fruit there.)
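For illustration, a standalone toy of that mark-invalid-and-let-the-
sweep-reclaim idea (simplified stand-ins throughout; this is not
bufmgr code):

#include <stdbool.h>
#include <stdio.h>

#define NBUFFERS 8

typedef struct BufferDesc
{
    bool        valid;
    int         usage_count;    /* decremented by the sweep */
} BufferDesc;

static BufferDesc buffers[NBUFFERS];
static int  clock_hand = 0;

/* one clock-sweep victim search: invalid or usage_count==0 wins */
static int
clock_sweep(void)
{
    for (;;)
    {
        BufferDesc *buf = &buffers[clock_hand];
        int         victim = clock_hand;

        clock_hand = (clock_hand + 1) % NBUFFERS;
        if (!buf->valid || buf->usage_count == 0)
            return victim;      /* reclaimed with no freelist at all */
        buf->usage_count--;     /* give it another lap */
    }
}

int
main(void)
{
    for (int i = 0; i < NBUFFERS; i++)
        buffers[i] = (BufferDesc) {true, 2};

    /* a relation drop just marks its buffers invalid... */
    buffers[5].valid = false;

    /* ...and the sweep finds them without a centralized structure */
    printf("victim: %d\n", clock_sweep());
    return 0;
}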
regards, tom lane
On Jan 3, 2012, at 5:28 PM, Tom Lane wrote:
Jim Nasby <jim@nasby.net> writes:
On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.
Couldn't we just leave the buffers alone? Once an index is dropped, and
that's pushed out through the catalog, nothing should be trying to
access them, and they'll eventually just get aged out.
No, we can't, because if they're still dirty then the bgwriter would
first try to write them to the no-longer-existing storage file. It's
important that we kill the buffers immediately during relation drop.
I'm still thinking that it might be sufficient to mark the buffers
invalid and let the clock sweep find them, thereby eliminating the need
for a freelist. Simon is after a different solution involving getting
rid of the clock sweep, but he has failed to explain how that's not
going to end up being the same type of contention-prone coding that we
got rid of by adopting the clock sweep, some years ago. Yeah, the sweep
takes a lot of spinlocks, but that only matters if there is contention
for them, and the sweep approach avoids the need for a centralized data
structure.
Yeah, but the problem we run into is that with every backend trying to run the clock on its own, we end up with high contention again; it's just in a different place than when we had a true LRU. The clock sweep might be cheaper than the linked list was, but it's still awfully expensive. I believe our best bet is to have a free list that is actually useful in normal operations, and then to optimize the cost of pulling buffers out of that list as much as possible (and let the bgwriter deal with keeping enough pages in that list to satisfy demand).
Heh, it occurs to me that the SQL analogy for how things work right now is that backends currently have to run a SeqScan (or 5) to find a free page... what we need to do is CREATE INDEX free ON buffers(buffer_id) WHERE count = 0;.
(BTW, do we have a separate clock sweep hand for each backend? If not,
there might be some low hanging fruit there.)
No... having multiple clock hands is an interesting idea, but I'm worried that it could potentially get us into trouble if scores of backends were suddenly decrementing usage counts all over the place. For example, what if five backends all had their hands in basically the same place, all pointing at a very heavily used buffer? All five backends go for free space, each grabs the spinlock on that buffer in succession, and suddenly this highly used buffer that started with a count of 5 has been freed. We could potentially use more than one hand, but I think the relation between the number of hands and the maximum usage count has to be tightly controlled.
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
Jim Nasby <jim@nasby.net> writes:
Yeah, but the problem we run into is that with every backend trying to run the clock on its own, we end up with high contention again; it's just in a different place than when we had a true LRU. The clock sweep might be cheaper than the linked list was, but it's still awfully expensive. I believe our best bet is to have a free list that is actually useful in normal operations, and then to optimize the cost of pulling buffers out of that list as much as possible (and let the bgwriter deal with keeping enough pages in that list to satisfy demand).
Well, maybe, but I think the historical evidence suggests that that
approach will be a loser, simply because no matter how cheap, the
freelist will remain a centralized and heavily contended data structure.
IMO we need to be looking for a mechanism that has no single point of
contention, and modifying the clock sweep rules looks like the best way
to get there.
Still, he who wants to do the work can try whatever approach he feels
like.
regards, tom lane
On Tue, Jan 3, 2012 at 7:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jim Nasby <jim@nasby.net> writes:
Yeah, but the problem we run into is that with every backend trying to run the clock on its own, we end up with high contention again; it's just in a different place than when we had a true LRU. The clock sweep might be cheaper than the linked list was, but it's still awfully expensive. I believe our best bet is to have a free list that is actually useful in normal operations, and then to optimize the cost of pulling buffers out of that list as much as possible (and let the bgwriter deal with keeping enough pages in that list to satisfy demand).
Well, maybe, but I think the historical evidence suggests that that
approach will be a loser, simply because no matter how cheap, the
freelist will remain a centralized and heavily contended data structure.
IMO we need to be looking for a mechanism that has no single point of
contention, and modifying the clock sweep rules looks like the best way
to get there.
Still, he who wants to do the work can try whatever approach he feels
like.
It might be possible to partition the free list. So imagine, for
example, 8 free lists. The background writer runs the clock sweep and
finds some buffers that are about ready to be reallocated and
distributes one-eighth of them to each free list. Then, when a
backend wants to allocate a buffer, it picks a free list somehow
(round robin?) and pulls a buffer off it. If the buffer turns out to
have been used since it was added to the free list, we give up on it
and try again. This hopefully shouldn't happen too often, though, as
long as we only add enough buffers to the free list to satisfy the
requests that we expect to get over the next
small-fraction-of-a-second.
Of course, you have to think about what happens if you find that your
chosen free list is empty. In that case you probably want to try a
different one, and in the worst case, where they're all empty, run
the clock sweep in the foreground. You probably also want to kick the
background writer in the pants if even a single one is empty (or
nearly empty?) and tell it to hurry up and add some more buffers to
the freelist.
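A standalone sketch of what that partitioning might look like (all
names are invented, and plain mutexes stand in for whatever
lightweight locking the real thing would use):

#include <pthread.h>
#include <stdio.h>

#define NPARTITIONS 8
#define LIST_CAP    64

typedef struct FreeList
{
    pthread_mutex_t lock;
    int         bufs[LIST_CAP];
    int         nbufs;
} FreeList;

static FreeList freelists[NPARTITIONS];

/* bgwriter side: spread sweep results one-eighth to each partition */
static void
refill(const int *victims, int n)
{
    for (int i = 0; i < n; i++)
    {
        FreeList   *fl = &freelists[i % NPARTITIONS];

        pthread_mutex_lock(&fl->lock);
        if (fl->nbufs < LIST_CAP)
            fl->bufs[fl->nbufs++] = victims[i];
        pthread_mutex_unlock(&fl->lock);
    }
}

/* backend side: round-robin over partitions; -1 means "all empty,
 * run the clock sweep in the foreground and kick the bgwriter" */
static int
get_free_buffer(void)
{
    static int  next = 0;       /* round-robin starting point */

    for (int tries = 0; tries < NPARTITIONS; tries++)
    {
        FreeList   *fl = &freelists[(next + tries) % NPARTITIONS];
        int         buf = -1;

        pthread_mutex_lock(&fl->lock);
        if (fl->nbufs > 0)
            buf = fl->bufs[--fl->nbufs];
        pthread_mutex_unlock(&fl->lock);
        if (buf >= 0)
        {
            next = (next + tries + 1) % NPARTITIONS;
            return buf;
        }
    }
    return -1;
}

int
main(void)
{
    int         victims[] = {10, 11, 12, 13, 14, 15, 16, 17};

    for (int i = 0; i < NPARTITIONS; i++)
        pthread_mutex_init(&freelists[i].lock, NULL);
    refill(victims, 8);

    int         a = get_free_buffer();
    int         b = get_free_buffer();

    printf("got buffers %d and %d from different partitions\n", a, b);
    return 0;
}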
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jan 3, 2012, at 7:34 PM, Robert Haas wrote:
On Tue, Jan 3, 2012 at 7:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jim Nasby <jim@nasby.net> writes:
Yeah, but the problem we run into is that with every backend trying to run the clock on its own, we end up with high contention again; it's just in a different place than when we had a true LRU. The clock sweep might be cheaper than the linked list was, but it's still awfully expensive. I believe our best bet is to have a free list that is actually useful in normal operations, and then to optimize the cost of pulling buffers out of that list as much as possible (and let the bgwriter deal with keeping enough pages in that list to satisfy demand).
Well, maybe, but I think the historical evidence suggests that that
approach will be a loser, simply because no matter how cheap, the
freelist will remain a centralized and heavily contended data structure.
IMO we need to be looking for a mechanism that has no single point of
contention, and modifying the clock sweep rules looks like the best way
to get there.
Still, he who wants to do the work can try whatever approach he feels
like.
It might be possible to partition the free list. So imagine, for
example, 8 free lists. The background writer runs the clock sweep and
finds some buffers that are about ready to be reallocated and
distributes one-eighth of them to each free list. Then, when a
backend wants to allocate a buffer, it picks a free list somehow
(round robin?) and pulls a buffer off it. If the buffer turns out to
have been used since it was added to the free list, we give up on it
and try again. This hopefully shouldn't happen too often, though, as
long as we only add enough buffers to the free list to satisfy the
requests that we expect to get over the next
small-fraction-of-a-second.
Of course, you have to think about what happens if you find that your
chosen free list is empty. In that case you probably want to try a
different one, and in the worst case, where they're all empty, run
the clock sweep in the foreground. You probably also want to kick the
background writer in the pants if even a single one is empty (or
nearly empty?) and tell it to hurry up and add some more buffers to
the freelist.
If it comes down to it, we can look at partitioning the free list. But here's the thing: this is the strategy that FreeBSD (and I think now Linux as well) uses to service memory requests, be they for free memory or for reading data from disk. If it's good enough for an OS to use, I would expect we could make it work as well.
I would expect that pulling a page off the free list would be an extremely fast operation: lock the read lock on the list (we should probably have separate locks for putting stuff on the list vs. taking it off), pin the buffer (which shouldn't be contentious), update the read pointer, and drop the read lock; see the sketch below. Even in C code that's probably fewer than 100 machine instructions, with no looping. Compared to the cost of running a clock sweep it should be significantly faster, so while there will be contention on the free list read lock, none of that contention should last very long.
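A standalone sketch of that fast path, with separate push/pop locks as
suggested (names invented; the memory-ordering details a real
concurrent implementation would need are elided for brevity):

#include <pthread.h>
#include <stdio.h>

#define CAP 128

static int  ring[CAP];
static int  head, tail;         /* pop at head, push at tail */
static pthread_mutex_t head_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t tail_lock = PTHREAD_MUTEX_INITIALIZER;

static void
push_free(int buf)
{
    pthread_mutex_lock(&tail_lock);
    ring[tail % CAP] = buf;
    tail++;
    pthread_mutex_unlock(&tail_lock);
}

/* the short critical section: a handful of instructions, no loops */
static int
pop_free(void)
{
    int         buf = -1;

    pthread_mutex_lock(&head_lock);
    if (head != tail)           /* list non-empty */
        buf = ring[head++ % CAP];
    pthread_mutex_unlock(&head_lock);
    return buf;                 /* -1: empty, fall back to the sweep */
}

int
main(void)
{
    push_free(7);
    printf("popped buffer %d\n", pop_free());
    return 0;
}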
Do we have any other places where we take extremely short locks like this and still run into problems?
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jim Nasby <jim@nasby.net> writes:
On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.
Couldn't we just leave the buffers alone? Once an index is dropped, and
that's pushed out through the catalog, nothing should be trying to
access them, and they'll eventually just get aged out.
No, we can't, because if they're still dirty then the bgwriter would
first try to write them to the no-longer-existing storage file. It's
important that we kill the buffers immediately during relation drop.
I'm still thinking that it might be sufficient to mark the buffers
invalid and let the clock sweep find them, thereby eliminating the need
for a freelist.
My patch puts things on the freelist only when it is free to do so.
Not having a freelist at all is probably a simpler way of avoiding the
lock contention, so I'll happily back that suggestion instead. Patch
attached, previous patch revoked.
Simon is after a different solution involving getting
rid of the clock sweep...
Err, no, he isn't. Not sure where that came from, since I'm advocating
only minor changes there to curb worst-case behaviour.
But let's discuss that on the main freelist thread.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment: removebufmgrfreelist.v1.patch (text/x-patch, +40/-11 lines)
On Wed, Jan 4, 2012 at 11:14 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Not having a freelist at all is probably a simpler way of avoiding the
lock contention, so I'll happily back that suggestion instead. Patch
attached, previous patch revoked.
v2 attached, with cleanup of some random stuff that crept into the patch.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment: removebufmgrfreelist.v2.patch (text/x-patch, +29/-3 lines)
On Sat, Dec 31, 2011 at 8:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Fri, Dec 30, 2011 at 10:20 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On Wed, 2011-08-24 at 11:24 -0700, Daniel Farina wrote:
I was poking around in tablecmds.c and index.c and wondered whether a
two-pass approach similar to the one used by CREATE INDEX CONCURRENTLY
could be used to implement DROP INDEX CONCURRENTLY, and whether there
would be any interest in accepting such a patch.
Hmm, it seems I just independently came up with this same concept. My
problem is that if a CREATE INDEX CONCURRENTLY fails, you need an
exclusive lock on the table just to clean that up. If the table is
under constant load, you can't easily do that. So a two-pass DROP INDEX
CONCURRENTLY might have been helpful for me.
Here's a patch for this. Please review.
I don't see how setting indisvalid to false helps with this. IIUC,
when a session sees indisvalid = false, it is supposed to avoid using
the index for queries but still make new index entries when a write
operation happens; but to drop an index, I think you'd need to get
into a state where no one was using the index for anything at all.
Maybe we need to change indisvalid to a "char" and make it
three-valued: c = being created currently, v = valid, d = being
dropped concurrently, or something like that.
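A standalone sketch of that three-valued state and what each value
would permit (the enum and helper functions are invented for
illustration; this is not what pg_index actually stores today):

#include <stdbool.h>
#include <stdio.h>

typedef enum IndexState
{
    INDEX_BEING_CREATED = 'c',  /* maintain on writes, don't plan with it */
    INDEX_VALID = 'v',          /* maintain on writes, plan with it */
    INDEX_BEING_DROPPED = 'd'   /* don't maintain, don't plan with it */
} IndexState;

static bool
usable_for_queries(IndexState s)
{
    return s == INDEX_VALID;
}

static bool
maintained_on_writes(IndexState s)
{
    /* 'd' is the state a concurrent drop needs: untouched by everyone */
    return s != INDEX_BEING_DROPPED;
}

int
main(void)
{
    IndexState  states[] = {INDEX_BEING_CREATED, INDEX_VALID,
                            INDEX_BEING_DROPPED};

    for (int i = 0; i < 3; i++)
        printf("'%c': plan=%d maintain=%d\n", states[i],
               usable_for_queries(states[i]),
               maintained_on_writes(states[i]));
    return 0;
}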
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
I don't see how setting indisvalid to false helps with this. IIUC,
when a session sees indisvalid = false, it is supposed to avoid using
the index for queries but still make new index entries when a write
operation happens; but to drop an index, I think you'd need to get
into a state where no one was using the index for anything at all.
ISTM that one would need to set indisready to false instead.
On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On Mon, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
I don't see how setting indisvalid to false helps with this. IIUC,
when a session sees indisvalid = false, it is supposed to avoid using
the index for queries but still make new index entries when a write
operation happens; but to drop an index, I think you'd need to get
into a state where no one was using the index for anything at all.
ISTM that one would need to set indisready to false instead.
Maybe we should set both to false?
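For reference, a standalone sketch of the two flags and what each
combination permits, as I understand their semantics (the struct and
meaning strings are illustrative, not backend code):

#include <stdbool.h>
#include <stdio.h>

typedef struct IndexFlags
{
    bool        indisvalid;     /* planner may use the index for scans */
    bool        indisready;     /* writers must insert new entries */
} IndexFlags;

int
main(void)
{
    /* (valid, ready) pairs and what each combination means */
    struct
    {
        IndexFlags  f;
        const char *meaning;
    } cases[] = {
        {{true, true}, "normal index: scanned and maintained"},
        {{false, true}, "CREATE INDEX CONCURRENTLY in flight: maintained only"},
        {{false, false}, "initial CIC phase, and the proposed pre-drop state: "
         "neither scanned nor maintained"},
    };

    for (int i = 0; i < 3; i++)
        printf("valid=%d ready=%d -> %s\n",
               cases[i].f.indisvalid, cases[i].f.indisready,
               cases[i].meaning);
    return 0;
}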
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company