Bugs in CREATE/DROP INDEX CONCURRENTLY
1. These operations think they can use ordinary heap_update operations
to change pg_index entries when they don't have exclusive lock on the
parent table. The lack of ex-lock means that another backend could be
currently loading up its list of index OIDs for the table --- and since
it scans pg_index with SnapshotNow to do that, the heap_update could
result in the other backend failing to see this index *at all*. That's
okay if it causes the other backend to not use the index for scanning...
but not okay if it causes the other backend to fail to make index
entries it is supposed to make.
I think this could possibly be fixed by using nontransactional
update-in-place when we're trying to change indisvalid and/or
indisready, but I've not really thought through the details.
2. DROP INDEX CONCURRENTLY doesn't bother to do
TransferPredicateLocksToHeapRelation until long after it's invalidated
the index. Surely that's no good? Is it even possible to do that
correctly, when we don't have a lock that will prevent new predicate
locks from being taken out meanwhile?
regards, tom lane
On 6 October 2012 00:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
1. These operations think they can use ordinary heap_update operations
to change pg_index entries when they don't have exclusive lock on the
parent table. The lack of ex-lock means that another backend could be
currently loading up its list of index OIDs for the table --- and since
it scans pg_index with SnapshotNow to do that, the heap_update could
result in the other backend failing to see this index *at all*. That's
okay if it causes the other backend to not use the index for scanning...
but not okay if it causes the other backend to fail to make index
entries it is supposed to make.I think this could possibly be fixed by using nontransactional
update-in-place when we're trying to change indisvalid and/or
indisready, but I've not really thought through the details.2. DROP INDEX CONCURRENTLY doesn't bother to do
TransferPredicateLocksToHeapRelation until long after it's invalidated
the index. Surely that's no good? Is it even possible to do that
correctly, when we don't have a lock that will prevent new predicate
locks from being taken out meanwhile?
I'm in the middle of reviewing other fixes there, so will comment
soon, just not right now.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Oct 6, 2012 at 12:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
1. These operations think they can use ordinary heap_update operations
to change pg_index entries when they don't have exclusive lock on the
parent table.
I wonder if we need a manual that lists exhaustively what operations
can be taken with locks on various objects. Relying on understanding
every other possible operation in the system and knowing which
operations might or might not happen is clearly getting us into
trouble.
This would be a lot less worrying if we had a regression test suite
that could reliably test race conditions like this.
The lack of ex-lock means that another backend could be
currently loading up its list of index OIDs for the table --- and since
it scans pg_index with SnapshotNow to do that, the heap_update could
result in the other backend failing to see this index *at all*. That's
okay if it causes the other backend to not use the index for scanning...
but not okay if it causes the other backend to fail to make index
entries it is supposed to make.
I marked this email to read later but now I'm reading it and I realize
I'm not sure I can really help. Using a nontransactional update to
flag indisready makes sense to me since the whole point of the wait is
that we've waited long enough that anyone should see this record. I
guess that's what I had in mind was happening when I wrote the update.
But I'm not sure what other requirements nontransactional updates have
to work properly.
--
greg
On 6 October 2012 00:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
1. These operations think they can use ordinary heap_update operations
to change pg_index entries when they don't have exclusive lock on the
parent table. The lack of ex-lock means that another backend could be
currently loading up its list of index OIDs for the table --- and since
it scans pg_index with SnapshotNow to do that, the heap_update could
result in the other backend failing to see this index *at all*. That's
okay if it causes the other backend to not use the index for scanning...
but not okay if it causes the other backend to fail to make index
entries it is supposed to make.I think this could possibly be fixed by using nontransactional
update-in-place when we're trying to change indisvalid and/or
indisready, but I've not really thought through the details.
Only solution there is to fix SnapshotNow, as we discussed last
Christmas. I'll dig out my patch for that, which IIRC I was nervous of
committing at last minute into 9.2.
2. DROP INDEX CONCURRENTLY doesn't bother to do
TransferPredicateLocksToHeapRelation until long after it's invalidated
the index. Surely that's no good? Is it even possible to do that
correctly, when we don't have a lock that will prevent new predicate
locks from being taken out meanwhile?
No idea there. Input appreciated.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Oct 9, 2012 at 6:11 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 6 October 2012 00:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
1. These operations think they can use ordinary heap_update operations
to change pg_index entries when they don't have exclusive lock on the
parent table. The lack of ex-lock means that another backend could be
currently loading up its list of index OIDs for the table --- and since
it scans pg_index with SnapshotNow to do that, the heap_update could
result in the other backend failing to see this index *at all*. That's
okay if it causes the other backend to not use the index for scanning...
but not okay if it causes the other backend to fail to make index
entries it is supposed to make.I think this could possibly be fixed by using nontransactional
update-in-place when we're trying to change indisvalid and/or
indisready, but I've not really thought through the details.Only solution there is to fix SnapshotNow, as we discussed last
Christmas. I'll dig out my patch for that, which IIRC I was nervous of
committing at last minute into 9.2.
Hi Simon,
Do you have an URL to this patch?
--
Michael Paquier
http://michael.otacoo.com
On 2012-10-05 19:56:40 -0400, Tom Lane wrote:
1. These operations think they can use ordinary heap_update operations
to change pg_index entries when they don't have exclusive lock on the
parent table. The lack of ex-lock means that another backend could be
currently loading up its list of index OIDs for the table --- and since
it scans pg_index with SnapshotNow to do that, the heap_update could
result in the other backend failing to see this index *at all*. That's
okay if it causes the other backend to not use the index for scanning...
but not okay if it causes the other backend to fail to make index
entries it is supposed to make.I think this could possibly be fixed by using nontransactional
update-in-place when we're trying to change indisvalid and/or
indisready, but I've not really thought through the details.
I couldn't really think of any realistic method to fix this other than
update in place. I thought about it for a while and I think it should
work, but I have to say it makes me slightly uneasy.
If we could could ensure both land on the same page it would be possible
to fix in a nicer way, but thats not really possible. Especially not in
any way thats backpatchable.
Unless somebody has a better idea I am going to write a patch for that.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
On 2012-10-05 19:56:40 -0400, Tom Lane wrote:
I think this could possibly be fixed by using nontransactional
update-in-place when we're trying to change indisvalid and/or
indisready, but I've not really thought through the details.
I couldn't really think of any realistic method to fix this other than
update in place. I thought about it for a while and I think it should
work, but I have to say it makes me slightly uneasy.
I looked through the code a bit, and I think we should be able to make
this work, but note there are also places that update indcheckxmin using
heap_update, and that's just as dangerous as changing the other two
flags via heap_update, if you don't have exclusive lock on the table.
This is going to need some careful thought, because we currently expect
that such an update will set the pg_index row's xmin to the current XID,
which is something an in-place update can *not* do. I think this is a
non-problem during construction of a new index, since the xmin ought to
be the current XID already anyway, but it's less clear what to do in
REINDEX. In the short term there may be no problem since REINDEX takes
exclusive lock on the parent table anyway (and hence could safely do
a transactional update) but if we ever want REINDEX CONCURRENTLY we'll
need a better answer.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2012-11-26 22:44:49 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2012-10-05 19:56:40 -0400, Tom Lane wrote:
I think this could possibly be fixed by using nontransactional
update-in-place when we're trying to change indisvalid and/or
indisready, but I've not really thought through the details.I couldn't really think of any realistic method to fix this other than
update in place. I thought about it for a while and I think it should
work, but I have to say it makes me slightly uneasy.I looked through the code a bit, and I think we should be able to make
this work, but note there are also places that update indcheckxmin using
heap_update, and that's just as dangerous as changing the other two
flags via heap_update, if you don't have exclusive lock on the table.
This is going to need some careful thought, because we currently expect
that such an update will set the pg_index row's xmin to the current XID,
which is something an in-place update can *not* do. I think this is a
non-problem during construction of a new index, since the xmin ought to
be the current XID already anyway, but it's less clear what to do in
REINDEX. In the short term there may be no problem since REINDEX takes
exclusive lock on the parent table anyway (and hence could safely do
a transactional update) but if we ever want REINDEX CONCURRENTLY we'll
need a better answer.
Isn't setting indexcheckxmin already problematic in the case of CREATE
INDEX CONCURRENTLY? index_build already runs in a separate transaction
there.
I am really surprised that we haven't seen any evidence of a problem
with this. On a database with a busy & bigger catalog that ought to be
a real problem.
I wonder whether we couldn't fix it by introducing a variant/wrapper of
heap_fetch et al. that follows t_ctid if the tuple became invisible
"recently". That ought to be able to fix most of these issues in a
pretty general fashion.
That would make a nice implementation of REINDEX CONCURRENTLY easier as
well...
Btw, even if we manage to find a sensible fix for this I would suggest
postponing it after the next back branch release.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@2ndquadrant.com> writes:
On 2012-11-26 22:44:49 -0500, Tom Lane wrote:
I looked through the code a bit, and I think we should be able to make
this work, but note there are also places that update indcheckxmin using
heap_update, and that's just as dangerous as changing the other two
flags via heap_update, if you don't have exclusive lock on the table.
Isn't setting indexcheckxmin already problematic in the case of CREATE
INDEX CONCURRENTLY? index_build already runs in a separate transaction
there.
Yeah, you are right, except that AFAICS indcheckxmin is really only
needed for regular non-concurrent CREATE INDEX, which needs it because
it commits without waiting for readers that might be bothered by broken
HOT chains. In a concurrent CREATE INDEX, we handle that problem by
waiting out all such readers before setting indisvalid. So the
concurrent code path should not be bothering to set indcheckxmin at all,
I think. (This is underdocumented.)
Looking closer at the comment in reindex_index, what it's really full of
angst about is that simple_heap_update will update the tuple's xmin
*when we would rather that it didn't*. So switching to update-in-place
there will solve a problem, not create one.
In short, all flag changes in pg_index should be done by
update-in-place, and the tuple's xmin will never change from the
original creating transaction (until frozen).
What we want the xmin to do, for indcheckxmin purposes, is reflect the
time at which the index started to be included in HOT-safety decisions.
Since we're never going to move the xmin, that means that *we cannot
allow REINDEX to mark a dead index live again*. Once DROP INDEX
CONCURRENTLY has reached the final state, you can't revalidate the index
by reindexing it, you'll have to drop it and then make a brand new one.
That seems like a pretty minor compromise.
What I propose to do next is create a patch for HEAD that includes a
new pg_index flag column, since I think the logic will be clearer
with that. Once we're happy with that, we can back-port the patch
into a form that uses the existing flag columns.
Anybody feel like bikeshedding the flag column name? I'm thinking
"indislive" but maybe there's a better name.
Note: I'm not impressed by the proposal to replace these columns with
a single integer flag column. Aside from any possible incompatibility
with existing client code, it just isn't going to be easy to read the
index's state manually if we do that. We could maybe dodge that
complaint with a char (pseudo-enum) status column but I don't think
that will simplify the code at all, and it's got the same or worse
compatibility issues.
Btw, even if we manage to find a sensible fix for this I would suggest
postponing it after the next back branch release.
AFAICS this is a data loss/corruption problem, and as such a "must fix".
If we can't have it done by next week, I'd rather postpone the releases
until it is done.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2012-11-27 13:45:08 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2012-11-26 22:44:49 -0500, Tom Lane wrote:
I looked through the code a bit, and I think we should be able to make
this work, but note there are also places that update indcheckxmin using
heap_update, and that's just as dangerous as changing the other two
flags via heap_update, if you don't have exclusive lock on the table.Isn't setting indexcheckxmin already problematic in the case of CREATE
INDEX CONCURRENTLY? index_build already runs in a separate transaction
there.Yeah, you are right, except that AFAICS indcheckxmin is really only
needed for regular non-concurrent CREATE INDEX, which needs it because
it commits without waiting for readers that might be bothered by broken
HOT chains. In a concurrent CREATE INDEX, we handle that problem by
waiting out all such readers before setting indisvalid. So the
concurrent code path should not be bothering to set indcheckxmin at all,
I think. (This is underdocumented.)
Seems to be correct to me.
Looking closer at the comment in reindex_index, what it's really full of
angst about is that simple_heap_update will update the tuple's xmin
*when we would rather that it didn't*. So switching to update-in-place
there will solve a problem, not create one.
It strikes me that the whole idea of reusing xmin when indexcheckxmin is
set is overly clever and storing the xid itself would have be been
better... Too late though.
In short, all flag changes in pg_index should be done by
update-in-place, and the tuple's xmin will never change from the
original creating transaction (until frozen).
Hm. That doesn't sound that easy to guarantee. Several ALTER TABLE and
ALTER INDEX operations are expected to work transactionally right
now. As far as I see there is nothing that prohibits a indexcheckxmin
requiring index to be altered while indexcheckxmin is still required?
What we want the xmin to do, for indcheckxmin purposes, is reflect the
time at which the index started to be included in HOT-safety decisions.
Since we're never going to move the xmin, that means that *we cannot
allow REINDEX to mark a dead index live again*. Once DROP INDEX
CONCURRENTLY has reached the final state, you can't revalidate the index
by reindexing it, you'll have to drop it and then make a brand new one.
That seems like a pretty minor compromise.
That would be a regression compared with the current state though. We
have officially documented REINDEX as a way out of INVALID indexes...
If we store the xid of the reindexing transaction there that might be
pessimal (because there should be not HOT safety problems) but should
always be correct, or am I missing something?
What I propose to do next is create a patch for HEAD that includes a
new pg_index flag column, since I think the logic will be clearer
with that. Once we're happy with that, we can back-port the patch
into a form that uses the existing flag columns.Anybody feel like bikeshedding the flag column name? I'm thinking
"indislive" but maybe there's a better name.
I personally would slightly favor indisdead instead...
Btw, even if we manage to find a sensible fix for this I would suggest
postponing it after the next back branch release.AFAICS this is a data loss/corruption problem, and as such a "must fix".
If we can't have it done by next week, I'd rather postpone the releases
until it is done.
Ok, just seemed like a rather complex fix in a short time for something
that seemingly hasn't been noticed since 8.3. I am a bit worried about
introducing something worse while fixing this.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@2ndquadrant.com> writes:
On 2012-11-27 13:45:08 -0500, Tom Lane wrote:
In short, all flag changes in pg_index should be done by
update-in-place, and the tuple's xmin will never change from the
original creating transaction (until frozen).
Hm. That doesn't sound that easy to guarantee. Several ALTER TABLE and
ALTER INDEX operations are expected to work transactionally right
now. As far as I see there is nothing that prohibits a indexcheckxmin
requiring index to be altered while indexcheckxmin is still required?
I said "in pg_index". There is no reason to ever alter an index's
pg_index entry transactionally, because we don't support redefining
the index columns. The stuff you are allowed to ALTER is pretty much
irrelevant to the index's life as an index.
What we want the xmin to do, for indcheckxmin purposes, is reflect the
time at which the index started to be included in HOT-safety decisions.
Since we're never going to move the xmin, that means that *we cannot
allow REINDEX to mark a dead index live again*.
That would be a regression compared with the current state though. We
have officially documented REINDEX as a way out of INVALID indexes...
It's a way out of failed CREATE operations. If DROP fails at the last
step, you won't be able to go back, but why would you want to? Just
do the DROP again.
Anybody feel like bikeshedding the flag column name? I'm thinking
"indislive" but maybe there's a better name.
I personally would slightly favor indisdead instead...
Meh ... the other two flags are positive, in the sense of
true-is-the-normal-state, so I thought this one should be too.
I had also toyed with "indishot", to reflect the idea that this controls
whether the index is included in HOT-safety decisions, but that seems
maybe a little too cute.
Btw, even if we manage to find a sensible fix for this I would suggest
postponing it after the next back branch release.
AFAICS this is a data loss/corruption problem, and as such a "must fix".
If we can't have it done by next week, I'd rather postpone the releases
until it is done.
Ok, just seemed like a rather complex fix in a short time for something
that seemingly hasn't been noticed since 8.3. I am a bit worried about
introducing something worse while fixing this.
Hm? The fact that the DROP patch broke CREATE INDEX CONCURRENTLY is a
new and very nasty bug in 9.2. I would agree with you if we were
considering the unsafe-row-update problem alone, but it seems like we
might as well fix both aspects while we're looking at this code.
There is a question of whether we should risk trying to back-patch the
in-place-update changes further than 9.2. Given the lack of complaints
I'm inclined not to, but could be persuaded differently.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2012-11-27 14:41:34 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2012-11-27 13:45:08 -0500, Tom Lane wrote:
In short, all flag changes in pg_index should be done by
update-in-place, and the tuple's xmin will never change from the
original creating transaction (until frozen).Hm. That doesn't sound that easy to guarantee. Several ALTER TABLE and
ALTER INDEX operations are expected to work transactionally right
now. As far as I see there is nothing that prohibits a indexcheckxmin
requiring index to be altered while indexcheckxmin is still required?I said "in pg_index". There is no reason to ever alter an index's
pg_index entry transactionally, because we don't support redefining
the index columns. The stuff you are allowed to ALTER is pretty much
irrelevant to the index's life as an index.
Isn't inisprimary updated when an ALTER TABLE ... ADD PRIMARY KEY
... USING someindex ; is done? Also I think indoption might be written
to as well.
What we want the xmin to do, for indcheckxmin purposes, is reflect the
time at which the index started to be included in HOT-safety decisions.
Since we're never going to move the xmin, that means that *we cannot
allow REINDEX to mark a dead index live again*.That would be a regression compared with the current state though. We
have officially documented REINDEX as a way out of INVALID indexes...It's a way out of failed CREATE operations. If DROP fails at the last
step, you won't be able to go back, but why would you want to? Just
do the DROP again.
Oh, sorry, misunderstood you.
Anybody feel like bikeshedding the flag column name? I'm thinking
"indislive" but maybe there's a better name.I personally would slightly favor indisdead instead...
Meh ... the other two flags are positive, in the sense of
true-is-the-normal-state, so I thought this one should be too.
Good point.
I had also toyed with "indishot", to reflect the idea that this controls
whether the index is included in HOT-safety decisions, but that seems
maybe a little too cute.
indislive seems better than that, yes.
Btw, even if we manage to find a sensible fix for this I would suggest
postponing it after the next back branch release.AFAICS this is a data loss/corruption problem, and as such a "must fix".
If we can't have it done by next week, I'd rather postpone the releases
until it is done.Ok, just seemed like a rather complex fix in a short time for something
that seemingly hasn't been noticed since 8.3. I am a bit worried about
introducing something worse while fixing this.Hm? The fact that the DROP patch broke CREATE INDEX CONCURRENTLY is a
new and very nasty bug in 9.2. I would agree with you if we were
considering the unsafe-row-update problem alone, but it seems like we
might as well fix both aspects while we're looking at this code.
There is a question of whether we should risk trying to back-patch the
in-place-update changes further than 9.2. Given the lack of complaints
I'm inclined not to, but could be persuaded differently.
Oh, I only was talking about the inplace changes. The DROP INDEX
CONCURRENTLY breakage definitely needs to get backpatched.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@2ndquadrant.com> writes:
On 2012-11-27 14:41:34 -0500, Tom Lane wrote:
The stuff you are allowed to ALTER is pretty much
irrelevant to the index's life as an index.
Isn't inisprimary updated when an ALTER TABLE ... ADD PRIMARY KEY
... USING someindex ; is done? Also I think indoption might be written
to as well.
Ugh, you're right. Somebody wasn't paying attention when those ALTER
commands were added.
We could probably alleviate the consequences of this by having those
operations reset indcheckxmin if the tuple's old xmin is below the
GlobalXmin horizon. That's something for later though --- it's not
a data corruption issue, it just means that the index might unexpectedly
not be used for queries for a little bit after an ALTER.
It strikes me that the whole idea of reusing xmin when indexcheckxmin is
set is overly clever and storing the xid itself would have be been
better... Too late though.
Well, the reason for that is documented in README.HOT: if the XID were
in an ordinary column there'd be no nice way to get it frozen when it
gets too old. As-is, VACUUM takes care of that problem automatically.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2012-11-27 16:31:15 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2012-11-27 14:41:34 -0500, Tom Lane wrote:
The stuff you are allowed to ALTER is pretty much
irrelevant to the index's life as an index.Isn't inisprimary updated when an ALTER TABLE ... ADD PRIMARY KEY
... USING someindex ; is done? Also I think indoption might be written
to as well.Ugh, you're right. Somebody wasn't paying attention when those ALTER
commands were added.We could probably alleviate the consequences of this by having those
operations reset indcheckxmin if the tuple's old xmin is below the
GlobalXmin horizon. That's something for later though --- it's not
a data corruption issue, it just means that the index might unexpectedly
not be used for queries for a little bit after an ALTER.
mark_index_clustered() does the same btw, its not a problem in the
CLUSTER ... USING ...; case because that creates a new pg_index entry
anyway but in the ALTER TABLE one thats not the case.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@2ndquadrant.com> writes:
On 2012-11-27 16:31:15 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Isn't inisprimary updated when an ALTER TABLE ... ADD PRIMARY KEY
... USING someindex ; is done? Also I think indoption might be written
to as well.
Ugh, you're right. Somebody wasn't paying attention when those ALTER
commands were added.
On closer look, indoption is never updated --- perhaps you were thinking
about pg_class.reloptions. indisprimary, indimmediate, and
indisclustered are all subject to post-creation updates though.
We could probably alleviate the consequences of this by having those
operations reset indcheckxmin if the tuple's old xmin is below the
GlobalXmin horizon. That's something for later though --- it's not
a data corruption issue, it just means that the index might unexpectedly
not be used for queries for a little bit after an ALTER.
mark_index_clustered() does the same btw, its not a problem in the
CLUSTER ... USING ...; case because that creates a new pg_index entry
anyway but in the ALTER TABLE one thats not the case.
After further study I think the situation is that
(1) The indisvalid/indisready/indislive updates in CREATE/DROP INDEX
CONCURRENTLY can, and must, be done in-place since we don't have
exclusive lock on the parent table.
(2) All the other updates can be transactional because we hold
sufficient locks to ensure that nothing bad will happen. The proposed
reductions in ALTER TABLE lock strength would break this in several
cases, but that's a problem for another day.
Attached is a very preliminary draft patch for this. I've not addressed
the question of whether we can clear indcheckxmin during transactional
updates of pg_index rows, but I think it covers everything else talked
about in this thread.
regards, tom lane
Attachments:
index-flags-fix.patchtext/x-patch; charset=us-asciiDownload+515-320
On 2012-11-27 23:46:58 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2012-11-27 16:31:15 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Isn't inisprimary updated when an ALTER TABLE ... ADD PRIMARY KEY
... USING someindex ; is done? Also I think indoption might be written
to as well.Ugh, you're right. Somebody wasn't paying attention when those ALTER
commands were added.On closer look, indoption is never updated --- perhaps you were thinking
about pg_class.reloptions. indisprimary, indimmediate, and
indisclustered are all subject to post-creation updates though.
Yea, I haven't really checked what inoption actually does.
We could probably alleviate the consequences of this by having those
operations reset indcheckxmin if the tuple's old xmin is below the
GlobalXmin horizon. That's something for later though --- it's not
a data corruption issue, it just means that the index might unexpectedly
not be used for queries for a little bit after an ALTER.mark_index_clustered() does the same btw, its not a problem in the
CLUSTER ... USING ...; case because that creates a new pg_index entry
anyway but in the ALTER TABLE one thats not the case.After further study I think the situation is that
(1) The indisvalid/indisready/indislive updates in CREATE/DROP INDEX
CONCURRENTLY can, and must, be done in-place since we don't have
exclusive lock on the parent table.(2) All the other updates can be transactional because we hold
sufficient locks to ensure that nothing bad will happen. The proposed
reductions in ALTER TABLE lock strength would break this in several
cases, but that's a problem for another day.
Attached is a very preliminary draft patch for this. I've not addressed
the question of whether we can clear indcheckxmin during transactional
updates of pg_index rows, but I think it covers everything else talked
about in this thread.
Looks good on a quick lookthrough. Will play a bit more once the
indexcheckxmin stuff is sorted out.
Some comments:
- INDEX_DROP_CLEAR_READY clears indislive, perhasp INDEX_DROP_SET_DEAD
or NOT_ALIVE is more appropriate?
- I noticed while trying my old isolationtester test that
heap_update_inplace disregards any locks on the tuple. I don't really
see a scenario where this is problematic right now, seems a bit
dangerous for the future though.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@2ndquadrant.com> writes:
On 2012-11-27 23:46:58 -0500, Tom Lane wrote:
Attached is a very preliminary draft patch for this. I've not addressed
the question of whether we can clear indcheckxmin during transactional
updates of pg_index rows, but I think it covers everything else talked
about in this thread.
Looks good on a quick lookthrough. Will play a bit more once the
indexcheckxmin stuff is sorted out.
Some comments:
- INDEX_DROP_CLEAR_READY clears indislive, perhasp INDEX_DROP_SET_DEAD
or NOT_ALIVE is more appropriate?
I changed it to SET_DEAD. Also, on further reflection, I took your
advice to use macros instead of direct tests of the flag columns where
possible.
- I noticed while trying my old isolationtester test that
heap_update_inplace disregards any locks on the tuple. I don't really
see a scenario where this is problematic right now, seems a bit
dangerous for the future though.
I think this should be all right --- we have at least
ShareUpdateExclusiveLock on the table and the index before we do
anything, so nobody else should be fooling with its pg_index entry.
Attached is an updated patch for HEAD that I think is about ready to go.
I'll start making a back-patchable version shortly.
regards, tom lane
Attachments:
index-flags-fix-2.patchtext/x-patch; charset=us-asciiDownload+693-485
On 2012-11-28 14:09:11 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2012-11-27 23:46:58 -0500, Tom Lane wrote:
Attached is a very preliminary draft patch for this. I've not addressed
the question of whether we can clear indcheckxmin during transactional
updates of pg_index rows, but I think it covers everything else talked
about in this thread.
- I noticed while trying my old isolationtester test that
heap_update_inplace disregards any locks on the tuple. I don't really
see a scenario where this is problematic right now, seems a bit
dangerous for the future though.I think this should be all right --- we have at least
ShareUpdateExclusiveLock on the table and the index before we do
anything, so nobody else should be fooling with its pg_index entry.Attached is an updated patch for HEAD that I think is about ready to go.
I'll start making a back-patchable version shortly.
Looks good!
One minor thing I haven't noticed earlier: Perhaps we should also skip
over invalid indexes in transformTableLikeClause's
CREATE_TABLE_LIKE_INDEXES case?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@2ndquadrant.com> writes:
One minor thing I haven't noticed earlier: Perhaps we should also skip
over invalid indexes in transformTableLikeClause's
CREATE_TABLE_LIKE_INDEXES case?
I left that as-is intentionally: the fact that an index isn't valid
doesn't prevent us from cloning it. A relevant data point is that
pg_dump doesn't care whether indexes are valid or not --- it'll dump
their definitions anyway.
I agree it's a judgment call, though. Anybody want to argue for the
other position?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2012-11-28 17:42:18 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
One minor thing I haven't noticed earlier: Perhaps we should also skip
over invalid indexes in transformTableLikeClause's
CREATE_TABLE_LIKE_INDEXES case?I left that as-is intentionally: the fact that an index isn't valid
doesn't prevent us from cloning it. A relevant data point is that
pg_dump doesn't care whether indexes are valid or not --- it'll dump
their definitions anyway.I agree it's a judgment call, though. Anybody want to argue for the
other position?
Hm. Seems odd to include indexes that are being dropped concurrently at
that moment. But then, we can't really detect that situation and as you
say its consistent with pg_dump...
Hm.
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers