Tricky bugs in concurrent index build

Started by Tom Lane over 19 years ago · 59 messages · pgsql-hackers
#1 Tom Lane
tgl@sss.pgh.pa.us

I see fairly nasty problems in the concurrent-index patch when it's
trying to build a unique index. I think it's solvable but want some
more eyeballs on my reasoning.

Look at the code in IndexBuildHeapScan where we are deciding whether or
not to include a tuple in the index ("indexIt") and also whether or not
to include it in the uniqueness check ("tupleIsAlive"). In a normal
non-concurrent build, we have to include recently-dead tuples in the
index because transactions started before ours might try to use the
index after we finish it. (Remember system catalogs generally operate
on SnapshotNow, so a query could use a newly created index even though
it will be run with a serializable snapshot much older than the index.)
So we have to put tuples into the index if any active transaction might
wish to see those tuples. OTOH, we should exclude dead tuples from the
uniqueness check: the uniqueness constraint ought to be across
currently-valid tuples only. In particular, for tuples previously
created or deleted by our own transaction, we certainly must include
created ones and not include deleted ones in the uniqueness check.

In the past, the only way we could see HEAPTUPLE_INSERT_IN_PROGRESS
or HEAPTUPLE_DELETE_IN_PROGRESS was for tuples created/deleted by our
own transaction, and so the actions taken by IndexBuildHeapScan are
to include in the index in both cases, but exclude DELETE_IN_PROGRESS
tuples from the uniqueness check.

This does not work for a concurrent build, though, because if the
in-progress delete is from another transaction, it could roll back after
we look. In that case we have an entry that is in the index and has
escaped the uniqueness check. If it conflicts with another tuple also
entered into the index in the first pass, we'll never notice that.

I think we can solve this by having IndexBuildHeapScan not index
DELETE_IN_PROGRESS tuples if it's doing a concurrent build. The problem
of old transactions trying to use the index does not exist, because
we'll wait 'em out before marking the index valid, so we need not
worry about preserving validity for old snapshots. And if the deletion
does in fact roll back, we'll insert the tuple during the second pass,
and catch any uniqueness violation at that point.
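To make these first-pass rules concrete, here is a small illustrative sketch (my own model for this writeup, not the actual IndexBuildHeapScan code; the status names merely mirror the HeapTupleSatisfiesVacuum result codes) of the indexIt / tupleIsAlive decisions, with the proposed skip-DELETE_IN_PROGRESS behavior for concurrent builds:

```c
#include <stdbool.h>

/* Illustrative stand-ins for the HeapTupleSatisfiesVacuum result codes. */
typedef enum
{
    TUPLE_LIVE,
    TUPLE_RECENTLY_DEAD,
    TUPLE_DEAD,
    TUPLE_INSERT_IN_PROGRESS,
    TUPLE_DELETE_IN_PROGRESS
} TupleStatus;

/*
 * First-pass decision: should the tuple go into the index (indexIt),
 * and should it participate in the uniqueness check (tupleIsAlive)?
 * The "concurrent" flag applies the proposed fix: don't index
 * DELETE_IN_PROGRESS tuples at all; if the delete later aborts, the
 * second pass will insert the tuple and catch any duplicate then.
 */
static void
first_pass_decision(TupleStatus status, bool concurrent,
                    bool *indexIt, bool *tupleIsAlive)
{
    switch (status)
    {
        case TUPLE_LIVE:
        case TUPLE_INSERT_IN_PROGRESS:
            *indexIt = true;
            *tupleIsAlive = true;
            break;

        case TUPLE_RECENTLY_DEAD:
            /* older snapshots may still need it, but it must not
             * count against uniqueness */
            *indexIt = true;
            *tupleIsAlive = false;
            break;

        case TUPLE_DELETE_IN_PROGRESS:
            /* non-concurrent: the deleter is our own transaction, so
             * index it for older snapshots; concurrent: leave it out,
             * since a foreign deleter could roll back after we look */
            *indexIt = !concurrent;
            *tupleIsAlive = false;
            break;

        case TUPLE_DEAD:
            *indexIt = false;
            *tupleIsAlive = false;
            break;
    }
}
```

(The INSERT_IN_PROGRESS case is shown at its simplest here; later messages in this thread show it has problems of its own.)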

But wait, there's more: in the patch as it stands, the second pass
over the table ignores DELETE_IN_PROGRESS tuples, which is wrong.
It's entirely possible for a tuple that is RECENTLY_DEAD or
DELETE_IN_PROGRESS to have no entry in the index, if it was inserted
during the first pass, and then the deletion occurred after the first
pass (and either has or hasn't committed yet). If we ignore
DELETE_IN_PROGRESS and then the deleter rolls back, the tuple never
gets into the index at all. Furthermore, the patch also tries to
insert RECENTLY_DEAD tuples, which is good for MVCC coverage, but wrong
for uniqueness checking --- keep in mind that in the second pass,
we are just doing normal index insertions, and so anything we insert
into the index will be uniqueness-checked as though still alive.
We could get a uniqueness failure that should not occur, eg from
trying to insert the old version of a recently-updated row.

What I think we can do about this is to include DELETE_IN_PROGRESS
tuples in the set of candidate tuples to insert in the second pass.
During the merge step that verifies whether the tuple is already
in the index, if we find that it's not, then we must wait for the
deleter to commit or roll back. If the deleter commits then we
ignore the tuple. If the deleter rolls back then we have to insert
the tuple in the index. (I think we have to actually take a FOR
UPDATE or possibly FOR SHARE lock on the tuple while we do this,
else we have race conditions against someone else starting a new
deletion attempt on the tuple.) In the commit case we've essentially
waited long enough to transform DELETE_IN_PROGRESS into RECENTLY_DEAD,
and for both of those statuses we are not going to insert into the
index for fear of causing a false uniqueness violation. What that
means is that after we finish the second pass, we need *another*
wait-for-everyone-to-die before we can mark the index valid. Otherwise
we have the same MVCC problem that someone might try to use the index
for a query with a snapshot old enough that it should be able to see
the not-inserted tuple.
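The proposed second-pass merge logic can be sketched the same way (again an illustrative model, not the patch itself; `deleter_committed` stands in for the outcome learned by locking the tuple and waiting out the deleting transaction, as described above):

```c
#include <stdbool.h>

/* Illustrative stand-ins for the HeapTupleSatisfiesVacuum result codes. */
typedef enum
{
    TUPLE_LIVE,
    TUPLE_RECENTLY_DEAD,
    TUPLE_DEAD,
    TUPLE_INSERT_IN_PROGRESS,
    TUPLE_DELETE_IN_PROGRESS
} TupleStatus;

/*
 * For a candidate tuple in the second pass, decide whether to do a
 * normal (uniqueness-checked) index insertion.  Tuples already found
 * in the index during the merge step are skipped outright.  For a
 * DELETE_IN_PROGRESS tuple missing from the index, the caller is
 * assumed to have locked the tuple and waited for the deleting
 * transaction, so deleter_committed is known by the time we're called.
 */
static bool
second_pass_should_insert(TupleStatus status, bool already_in_index,
                          bool deleter_committed)
{
    if (already_in_index)
        return false;

    switch (status)
    {
        case TUPLE_LIVE:
        case TUPLE_INSERT_IN_PROGRESS:
            return true;        /* normal insertion, uniqueness-checked */

        case TUPLE_RECENTLY_DEAD:
            /* inserting would uniqueness-check it as though alive,
             * risking a false duplicate from an updated row's old
             * version */
            return false;

        case TUPLE_DELETE_IN_PROGRESS:
            /* a committed delete makes this equivalent to
             * RECENTLY_DEAD; an aborted delete means the tuple is
             * live and must go in */
            return !deleter_committed;

        case TUPLE_DEAD:
            return false;
    }
    return false;               /* not reached */
}
```

Because RECENTLY_DEAD and committed-delete tuples are left out, this is exactly the shape that forces the extra wait-for-everyone-to-die step before the index can be marked valid.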

Have I missed anything? This is tricky stuff.

In any case it's clear that the patch still needs major work.
Greg, do you have cycles to spare now?

regards, tom lane

#2 Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#1)
Re: Tricky bugs in concurrent index build

Tom Lane <tgl@sss.pgh.pa.us> writes:

> I think we can solve this by having IndexBuildHeapScan not index
> DELETE_IN_PROGRESS tuples if it's doing a concurrent build.

Sure.

> It's entirely possible for a tuple that is RECENTLY_DEAD or
> DELETE_IN_PROGRESS to have no entry in the index, if it was inserted
> during the first pass, and then the deletion occurred after the first
> pass (and either has or hasn't committed yet).

Egads. That's nasty indeed.

> Furthermore, the patch also tries to insert RECENTLY_DEAD tuples, which is
> good for MVCC coverage, but wrong for uniqueness checking --- keep in mind
> that in the second pass, we are just doing normal index insertions, and so
> anything we insert into the index will be uniqueness-checked as though still
> alive. We could get a uniqueness failure that should not occur, eg from
> trying to insert the old version of a recently-updated row.

Hm, I hadn't absorbed the purpose of tupleIsAlive and the distinction between
live for uniqueness checks and live for index build purposes.

Is it not possible to brute force this by adding an AM method to insert
without the uniqueness check? That would mean the index build would fail even
if the transaction eventually aborts though. (Or even if it has already
aborted?)

> [ extended description of complex footwork involving more waiting while
>   holding locks ]

> Have I missed anything? This is tricky stuff.

Wow, that seems pretty unsatisfactory; all the waiting and locking sounds
awful. If you have a lot of update transactions starting continuously you
could keep bumping into this situation and repeatedly have to wait for new
transactions to end.

It also seems like a lot of code :(

> In any case it's clear that the patch still needs major work.
> Greg, do you have cycles to spare now?

I do. But I'll have to spend some time just rereading the code and your
comments to convince myself that all this waiting and locking is the best
solution.

--
greg

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#2)
Re: Tricky bugs in concurrent index build

Greg Stark <gsstark@mit.edu> writes:

> Is it not possible to brute force this by adding an AM method to insert
> without the uniqueness check?

Hm. Actually there already is a feature of aminsert to allow
suppressing the unique check, but I'm not sure whether using it for
RECENTLY_DEAD tuples helps. Seems like we have to wait to see whether
DELETE_IN_PROGRESS deleters commit in any case.

>> Have I missed anything? This is tricky stuff.

> Wow, that seems pretty unsatisfactory; all the waiting and locking sounds
> awful.

Yeah, I'm very unhappy. The whole idea may be going down in flames :-(
It's fairly clear that we could support concurrent builds of nonunique
indexes, but is that enough of a use-case to justify it?

regards, tom lane

#4 Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#3)
Re: Tricky bugs in concurrent index build

>> Wow, that seems pretty unsatisfactory; all the waiting and locking sounds
>> awful.

> Yeah, I'm very unhappy. The whole idea may be going down in flames :-(
> It's fairly clear that we could support concurrent builds of nonunique
> indexes, but is that enough of a use-case to justify it?

I believe there would be. Most PostgreSQL users I run into develop in
production, which means being able to add an index they forgot when
doing query analysis.

Most of the time (I would say >95%) this is not a unique index.

Sincerely,

Joshua D. Drake


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua D. Drake (#4)
Re: Tricky bugs in concurrent index build

"Joshua D. Drake" <jd@commandprompt.com> writes:

>> It's fairly clear that we could support concurrent builds of nonunique
>> indexes, but is that enough of a use-case to justify it?

> I believe there would be. Most PostgreSQL users I run into develop in
> production, which means being able to add an index they forgot when
> doing query analysis.

True, unique constraints are usually something you should get right to
start with. But it'll be annoying if we can do everything BUT that :-(

regards, tom lane

#6 Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#5)
Re: Tricky bugs in concurrent index build

Tom Lane wrote:

> "Joshua D. Drake" <jd@commandprompt.com> writes:

>>> It's fairly clear that we could support concurrent builds of nonunique
>>> indexes, but is that enough of a use-case to justify it?

>> I believe there would be. Most PostgreSQL users I run into develop in
>> production, which means being able to add an index they forgot when
>> doing query analysis.

> True, unique constraints are usually something you should get right to
> start with. But it'll be annoying if we can do everything BUT that :-(

Agreed, but better than nothing :).

Sincerely,

Joshua D. Drake



#7 Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#3)
Re: Tricky bugs in concurrent index build

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:

>> Is it not possible to brute force this by adding an AM method to insert
>> without the uniqueness check?

> Hm. Actually there already is a feature of aminsert to allow
> suppressing the unique check, but I'm not sure whether using it for
> RECENTLY_DEAD tuples helps. Seems like we have to wait to see whether
> DELETE_IN_PROGRESS deleters commit in any case.

Hm, actually don't we need both of these to make it work? We need to see
whether the deleter commits to determine whether to enforce the uniqueness
constraint on the missing tuple.

- If the deleter aborts, we need to insert the tuple normally, including
  enforcing the constraint.

- If the deleter commits, then we still need to insert the tuple, but without
  enforcing the constraint, in case some old transaction queries the index.
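These two cases can be stated as a one-line rule (an illustrative sketch only; the real mechanism would presumably be aminsert's existing ability to suppress the unique check, and `DeleterOutcome` is a name invented here):

```c
#include <stdbool.h>

/* Outcome of waiting out the transaction that had the delete in progress. */
typedef enum
{
    DELETER_ABORTED,
    DELETER_COMMITTED
} DeleterOutcome;

/*
 * Should the deferred insertion of this tuple enforce uniqueness?
 * An aborted delete leaves a live tuple, so the check applies; a
 * committed delete leaves a tuple kept only for old snapshots, so
 * the check must be suppressed.
 */
static bool
enforce_unique_on_insert(DeleterOutcome outcome)
{
    return outcome == DELETER_ABORTED;
}
```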

What would happen if we just insert DELETE_IN_PROGRESS tuples normally? Would
the only risk be that the index build would fail with a spurious unique
constraint violation? I suppose it would be pretty common though given how
updates work.

Incidentally, does this point out a problem with the planner depending on
unique constraints? For old (serializable) transactions the unique index
exists and the constraint is enforced, but they can still find tuples that were
deleted before the index was built and might violate the unique constraint.
Even if we had the plan invalidation mechanism that's frequently mentioned, you
would still have to check constraints against your snapshot and not just
SnapshotNow for planning purposes.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#7)
Re: Tricky bugs in concurrent index build

Greg Stark <gsstark@mit.edu> writes:

> What would happen if we just insert DELETE_IN_PROGRESS tuples normally? Would
> the only risk be that the index build would fail with a spurious unique
> constraint violation? I suppose it would be pretty common though given how
> updates work.

Yeah, that's the problem: if we can't support UPDATEs that don't change
the to-be-unique column, it ain't much of a feature.

> Incidentally, does this point out a problem with the planner depending on
> unique constraints?

Good point. It doesn't depend on them yet, but we've been hoping to
make it do so once we have plan invalidation capability. We shall have
to think very carefully about timing semantics of all that.

regards, tom lane

#9 Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#5)
Re: Tricky bugs in concurrent index build

On Tuesday, 2006-08-22 at 16:48, Tom Lane wrote:

> "Joshua D. Drake" <jd@commandprompt.com> writes:

>>> It's fairly clear that we could support concurrent builds of nonunique
>>> indexes, but is that enough of a use-case to justify it?

>> I believe there would be. Most PostgreSQL users I run into develop in
>> production, which means being able to add an index they forgot when
>> doing query analysis.

> True, unique constraints are usually something you should get right to
> start with. But it'll be annoying if we can do everything BUT that :-(

Maybe we could find a way to build a non-unique index first and then
convert it to a unique one later, in yet another pass?

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

#10 Hannu Krosing
hannu@tm.ee
In reply to: Hannu Krosing (#9)
Re: Tricky bugs in concurrent index build

On Wednesday, 2006-08-23 at 11:05, Hannu Krosing wrote:

> On Tuesday, 2006-08-22 at 16:48, Tom Lane wrote:

>> "Joshua D. Drake" <jd@commandprompt.com> writes:

>>>> It's fairly clear that we could support concurrent builds of nonunique
>>>> indexes, but is that enough of a use-case to justify it?

>>> I believe there would be. Most PostgreSQL users I run into develop in
>>> production, which means being able to add an index they forgot when
>>> doing query analysis.

>> True, unique constraints are usually something you should get right to
>> start with. But it'll be annoying if we can do everything BUT that :-(

> Maybe we could find a way to build a non-unique index first and then
> convert it to a unique one later, in yet another pass?

Or even add an ALTER INDEX myindex ADD/DROP UNIQUE; command.


#11 Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#3)
Re: Tricky bugs in concurrent index build

>> Is it not possible to brute force this by adding an AM method to insert
>> without the uniqueness check?

> Hm. Actually there already is a feature of aminsert to allow
> suppressing the unique check, but I'm not sure whether using
> it for RECENTLY_DEAD tuples helps. Seems like we have to
> wait to see whether DELETE_IN_PROGRESS deleters commit in any case.

Um, but if we wait for the DELETE_IN_PROGRESS tuple, after the wait we
can add it either with or without the unique check (depending on
commit/abort).

Then at least we don't need to wait in a 3rd pass for readers?

Andreas

#12 Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#1)
Re: Tricky bugs in concurrent index build

Tom Lane <tgl@sss.pgh.pa.us> writes:

> What I think we can do about this is to include DELETE_IN_PROGRESS
> tuples in the set of candidate tuples to insert in the second pass.
> During the merge step that verifies whether the tuple is already
> in the index, if we find that it's not, then we must wait for the
> deleter to commit or roll back. If the deleter commits then we
> ignore the tuple. If the deleter rolls back then we have to insert
> the tuple in the index. (I think we have to actually take a FOR
> UPDATE or possibly FOR SHARE lock on the tuple while we do this,
> else we have race conditions against someone else starting a new
> deletion attempt on the tuple.)

Hm, my first thought was to just try to get a lock on the record, which would
inherently wait until the deleter commits or aborts.

But then wouldn't we have deadlock risks, if we come across these records in a
different order from someone else (possibly even the deleter) who also wants
to lock them? Or would it be safe to lock and release them one by one so we
only ever hold one lock at a time?

I'm also pondering whether it might be worth saving up all the
DELETE_IN_PROGRESS tuples in a second tuplesort and processing them all in a
third phase. That seems like it would reduce the amount of waiting that might
be involved. The fear I have though is that this third phase could become
quite large.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#13 Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#1)
Re: Tricky bugs in concurrent index build

Tom Lane <tgl@sss.pgh.pa.us> writes:

> In the past, the only way we could see HEAPTUPLE_INSERT_IN_PROGRESS
> or HEAPTUPLE_DELETE_IN_PROGRESS was for tuples created/deleted by our
> own transaction, and so the actions taken by IndexBuildHeapScan are
> to include in the index in both cases, but exclude DELETE_IN_PROGRESS
> tuples from the uniqueness check.

> This does not work for a concurrent build, though, because if the
> in-progress delete is from another transaction, it could roll back after
> we look. In that case we have an entry that is in the index and has
> escaped the uniqueness check. If it conflicts with another tuple also
> entered into the index in the first pass, we'll never notice that.

> I think we can solve this by having IndexBuildHeapScan not index
> DELETE_IN_PROGRESS tuples if it's doing a concurrent build. The problem
> of old transactions trying to use the index does not exist, because
> we'll wait 'em out before marking the index valid, so we need not
> worry about preserving validity for old snapshots. And if the deletion
> does in fact roll back, we'll insert the tuple during the second pass,
> and catch any uniqueness violation at that point.

Actually that's a bit of a pain. This function is called from the AM functions,
so it doesn't have any access to whether it's doing a concurrent build or not.
I would have to stuff a flag in the indexInfo or change the AM API.

The API is pretty bizarre around this. The core calls the AM to build the
index, the AM calls back this function in core to do the scan, then this
function calls back into the AM to handle the inserts.

It seems like it would be simpler to leave the core in charge the whole time.
It would call an AM method to initialize state, then call an AM method for
each tuple that should be indexed, and lastly call a finalize method.

Also, I think there may be another problem here with INSERT_IN_PROGRESS. I'm
currently testing unique index builds while pgbench is running and I'm
consistently getting unique index violations from phase 1. I think what's
happening is that inserts that haven't committed yet (and hence ought to be
invisible to us) are hitting unique constraint violations against older
versions that are still alive to us.

--
greg


#15 Greg Stark
gsstark@mit.edu
In reply to: Hannu Krosing (#10)
Re: Tricky bugs in concurrent index build

Hannu Krosing <hannu@skype.net> writes:

> On Wednesday, 2006-08-23 at 11:05, Hannu Krosing wrote:

>> Maybe we could find a way to build a non-unique index first and then
>> convert it to a unique one later, in yet another pass?

> Or even add an ALTER INDEX myindex ADD/DROP UNIQUE; command.

That would be great. But note that it suffers from precisely the same problem.
If you come across a couple of records with the same key and one of them is
DELETE_IN_PROGRESS, then you'll have to wait until you can acquire a share lock
on it before you can determine if there's a constraint violation.

Hmmm. Or is that true? The problem may be somewhat easier, since at least you
can be sure every tuple in the heap is in the index. So if you see a
DELETE_IN_PROGRESS, either it *was* a constraint violation prior to the delete,
and failing is reasonable, or it's an update, in which case maybe it's possible
to detect that they're part of the same chain?

(Actually there is another corner case: a transaction that inserts a value,
then deletes it in the same transaction, then inserts that same value again.
Now you have an INSERT_IN_PROGRESS and a DELETE_IN_PROGRESS that conflict but
should be allowed since they come from the same transaction. Hopefully the
ALTER INDEX command would be able to determine that they come from the same
transaction.)

In the case of concurrent index builds that's not really safe, since you don't
have the other tuples you're conflicting with together at the same time, and
even if you did you may or may not have a complete set of them.

Tom's right. This stuff is tricky.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#16 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#12)
Re: Tricky bugs in concurrent index build

Greg Stark <gsstark@mit.edu> writes:

> But then wouldn't we have deadlock risks, if we come across these records in a
> different order from someone else (possibly even the deleter) who also wants
> to lock them? Or would it be safe to lock and release them one by one so we
> only ever hold one lock at a time?

AFAICS we could release the lock as soon as we've inserted the index
entry. (Whether there is any infrastructure to do that is another
question...)

> I'm also pondering whether it might be worth saving up all the
> DELETE_IN_PROGRESS tuples in a second tuplesort and processing them all in a
> third phase. That seems like it would reduce the amount of waiting that might
> be involved. The fear I have though is that this third phase could become
> quite large.

Actually --- a tuple that is live when we do the "second pass" scan
could well be DELETE_IN_PROGRESS (or even RECENTLY_DEAD) by the time we
do the merge and discover that it hasn't got an index entry. So offhand
I'm thinking that we *must* take a tuple lock on *every* tuple we insert
in stage two. Ugh.

regards, tom lane

#17 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#13)
Re: Tricky bugs in concurrent index build

Greg Stark <gsstark@mit.edu> writes:

> It seems like it would be simpler to leave the core in charge the whole time.
> It would call an AM method to initialize state, then call an AM method for
> each tuple that should be indexed, and lastly call a finalize method.

[ shrug... ] I'm uninterested in refactoring the AM API right now.
We've got enough stuff to deal with before beta, not to mention an
uncommitted bitmap AM patch that it would certainly break.

> Also, I think there may be another problem here with INSERT_IN_PROGRESS. I'm
> currently testing unique index builds while pgbench is running and I'm
> consistently getting unique index violations from phase 1. I think what's
> happening is that inserts that haven't committed yet (and hence ought to be
> invisible to us) are hitting unique constraint violations against older
> versions that are still alive to us.

Hmm ... it's certainly highly likely that we could pick up multiple
versions of a row during pass 1, but the uniqueness checker should
notice that some versions are dead? Oooh, no there's a problem:
the tuple we are inserting could become dead in the interval between
when we pick it up and when we put it into the index. So we could
try to put multiple versions of the same row into the uniqueness check.

Right at the moment, unique index builds with this mechanism are looking
unfixably broken :-(. Anyone see any chance at all of making them work?
Maybe we should just cut our losses and go with the nonunique case only.
That one is pretty easy: just stuff everything in the index ...

regards, tom lane

#18 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#15)
Re: Tricky bugs in concurrent index build

Greg Stark <gsstark@mit.edu> writes:

> Hmmm. Or is that true? The problem may be somewhat easier, since at least you
> can be sure every tuple in the heap is in the index. So if you see a
> DELETE_IN_PROGRESS, either it *was* a constraint violation prior to the delete,
> and failing is reasonable, or it's an update, in which case maybe it's possible
> to detect that they're part of the same chain?

Unless we are willing to lock every single tuple while we insert it,
this seems unfixable to me. Without a lock, the tuple could become
DELETE_IN_PROGRESS immediately after we look at it.

Actually it's worse than that. We could examine a tuple, see that
it's good, include it in the uniqueness check. Then someone updates
the tuple and puts the new version near the end of the table. By
the time we reach that version, it could be committed good. There
is absolutely no way that we could notice an issue without applying
extremely expensive tests to *every* apparently-good tuple.

[ thinks for a bit... ] At least, it seems hopeless if we use
SnapshotNow. Does it help if we use a real snapshot? I'm thinking
pass 1 inserts exactly those tuples that are good according to a
snap taken at its beginning, and then pass 2 considers only tuples
that are good according to a snap taken at *its* beginning. But
having consumed no caffeine yet this morning, I'm not sure I can
spot any flaws that might exist in this idea.
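As a toy model of this two-snapshot idea (purely illustrative: snapshots reduced to single instants on a timeline, tuples to insert/delete times), here is which tuples end up indexed if pass 1 takes exactly what its snapshot sees and the pass-2 merge adds exactly what *its* snapshot sees:

```c
#include <stdbool.h>

/* Toy tuple: insertion time and deletion time (0 = never deleted).
 * A "snapshot" here is just an instant on that timeline. */
typedef struct
{
    int insert_t;
    int delete_t;               /* 0 means not deleted */
} ToyTuple;

/* Is the tuple visible to a snapshot taken at time snap? */
static bool
visible(ToyTuple t, int snap)
{
    return t.insert_t <= snap && (t.delete_t == 0 || t.delete_t > snap);
}

/* Pass 1 indexes what snap1 sees; the pass-2 merge adds what snap2 sees. */
static bool
indexed_after_both_passes(ToyTuple t, int snap1, int snap2)
{
    return visible(t, snap1) || visible(t, snap2);
}
```

In this model a tuple inserted between the snapshots is caught by pass 2, and one deleted between them was already indexed by pass 1; but a tuple whose entire lifetime falls between the two snapshots is seen by neither pass, even though transactions with snapshots from inside that window could still see it.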

regards, tom lane

#19 Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#18)
Re: Tricky bugs in concurrent index build

On Wednesday, 2006-08-23 at 09:01, Tom Lane wrote:

> Greg Stark <gsstark@mit.edu> writes:

>> Hmmm. Or is that true? The problem may be somewhat easier, since at least you
>> can be sure every tuple in the heap is in the index. So if you see a
>> DELETE_IN_PROGRESS, either it *was* a constraint violation prior to the delete,
>> and failing is reasonable, or it's an update, in which case maybe it's possible
>> to detect that they're part of the same chain?

> Unless we are willing to lock every single tuple while we insert it,
> this seems unfixable to me. Without a lock, the tuple could become
> DELETE_IN_PROGRESS immediately after we look at it.

> Actually it's worse than that. We could examine a tuple, see that
> it's good, include it in the uniqueness check. Then someone updates
> the tuple and puts the new version near the end of the table. By
> the time we reach that version, it could be committed good.

Perhaps we should scan the index in index order, and set the unique flag
per index page.

That would
a) help us avoid going to heap unless there are multiple entries for
some value and
b) enable us, once all index pages containing pointers to some possibly
duplicate value are checked, to release locks on those tuples.


#20 Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#18)
Re: Tricky bugs in concurrent index build

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:

>> Hmmm. Or is that true? The problem may be somewhat easier, since at least you
>> can be sure every tuple in the heap is in the index. So if you see a
>> DELETE_IN_PROGRESS, either it *was* a constraint violation prior to the delete,
>> and failing is reasonable, or it's an update, in which case maybe it's possible
>> to detect that they're part of the same chain?

> Unless we are willing to lock every single tuple while we insert it,
> this seems unfixable to me. Without a lock, the tuple could become
> DELETE_IN_PROGRESS immediately after we look at it.

I think there's some confusion here. This above paragraph was taken from some
thoughts about Hannu's suggestion of having a separate ALTER INDEX SET UNIQUE
command. That command might have an advantage over CREATE INDEX CONCURRENTLY
because it knows the index is already complete; it doesn't have to worry about
potential conflicts with tuples that it will only find later in the scan.

Effectively this is equivalent to making CREATE UNIQUE INDEX CONCURRENTLY
three phases. The first two phases would be a regular CREATE INDEX
CONCURRENTLY and the third phase would be what ALTER INDEX SET UNIQUE does
which is scan the index and verify that it's unique.

ALTER INDEX SET UNIQUE would have to perform a similar two-transaction dance
though. It would have to set the index unique, wait until everyone has seen
the new constraint. Then verify that the property is indeed unique, possibly
rolling back the constraint creation if it's not.

That would make the whole process of creating a unique index quite long. On
the plus side it would be a useful command in itself. Doing an index scan
might be pretty slow but if the table is mostly clean of dead and recently
dead tuples it won't have to visit the heap much and should still be much
quicker than building a new index. And it would itself be a concurrent
command.

> Actually it's worse than that. We could examine a tuple, see that
> it's good, include it in the uniqueness check. Then someone updates
> the tuple and puts the new version near the end of the table. By
> the time we reach that version, it could be committed good. There
> is absolutely no way that we could notice an issue without applying
> extremely expensive tests to *every* apparently-good tuple.

I think ALTER INDEX SET UNIQUE would not have this problem. It would only have
to look at tuples using its own snapshot and see if there's a violation. If
there isn't a violation as of its own snapshot then it can be sure later
transactions will preserve this property since the index was always complete
and it waited after creating the constraint.

> [ thinks for a bit... ] At least, it seems hopeless if we use
> SnapshotNow. Does it help if we use a real snapshot? I'm thinking
> pass 1 inserts exactly those tuples that are good according to a
> snap taken at its beginning, and then pass 2 considers only tuples
> that are good according to a snap taken at *its* beginning. But
> having consumed no caffeine yet this morning, I'm not sure I can
> spot any flaws that might exist in this idea.

What about tuples that are inserted and committed in the window between the
two phases? I.e., they're RECENTLY_DEAD but not in phase 2's snapshot.

Or do you mean we use SatisfiesVacuum to determine what to insert but
SatisfiesSnapshot to determine whether to check uniqueness?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
