Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

Started by Goel, Dhruvalmost 7 years ago15 messageshackers
Jump to latest
#1Goel, Dhruv
goeldhru@amazon.com

Hi,
Currently any DDL operations (Create Indexes, Drop Indexes etc.) when run during an existing concurrent index build on the same table causes the index build to fail with “deadlock detected”. This is a pain-point specially when we want to kick-off multiple concurrent index builds on the same table; the index build will reach phase 3 (consuming resources) and then fail with deadlock errors.

I have a patch that might improve the build times and reduce deadlock occurrences. Is this something the community would be interested in? I might be missing some documentation changes in the patch but wanted to get some feedback on the functional aspect of the patch first.

Problem:
In the Concurrent Index creation implementation there are three waits that are relevant:

1. Wait 1 at start of Phase 2: Postgres waits for all transactions that started before this transaction and conflict with “Share Lock” on this relation. This is to make sure from this point forward all HOT updates to the table will be compatible with the new index.
2. Wait 2 at the start of Phase 3: Postgres waits for all transactions that started before this transaction and conflict with “Share Lock” on this relation.
3. Wait 3 at the end of Phase 3: PG waits for all transactions that started before this transaction primarily because they should not start using the index as they might be using an older snapshot and the index does not have all the entries (missing deleted tuples) for snapshot.

Typically, all the three wait states can cause deadlocks. Deadlocks due to the third wait state is reproduced by transactions that are waiting for a lock to be freed from “CREATE INDEX CONCURRENTLY” will cause deadlocks (primarily DDLs). The former 2 waits are much harder to reproduce with the test case being a Insert/Update/Delete as first statement of the transaction and then another DDL which causes lock escalation.

Proposed Solution:
We remove the third wait state completely from the concurrent index build. When we mark the index as ready, we also mark “indcheckxmin” to true which essentially enforces Postgres to not use this index for older snapshots.

Tests:
Added an isolation test which breaks without the patch. Manual test with a Repeatable Read Transaction that has an older snapshot with a tuple that has been deleted since and not part of the index.

May the force be with you,
Dhruv

Attachments:

patch-cic.patchapplication/octet-stream; name=patch-cic.patchDownload+123-15
#2Kuntal Ghosh
kuntalghosh.2007@gmail.com
In reply to: Goel, Dhruv (#1)
Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

Hello,

On Wed, May 15, 2019 at 1:45 PM Goel, Dhruv <goeldhru@amazon.com> wrote:

Proposed Solution:

We remove the third wait state completely from the concurrent index build.
When we mark the index as ready, we also mark “indcheckxmin” to true which
essentially enforces Postgres to not use this index for older snapshots.

I think there is a problem in the proposed solution. When phase 3 is
reached, the index is valid. But it might not contain tuples deleted
just before the reference snapshot was taken. Hence, we wait for those
transactions that might have older snapshot. The TransactionXmin of these
transactions can be greater than the xmin of the pg_index entry for this
index.
Instead of waiting in the third phase, if we just set indcheckxmin as true,
the above transactions will be able to use the index which is wrong.
(because they won't find the recently deleted tuples from the index that
are still live according to their snapshots)

The respective code from get_relation_info:
if (index->indcheckxmin &&

!TransactionIdPrecedes(HeapTupleHeaderGetXmin(indexRelation->rd_indextuple->t_data),
TransactionXmin))
{ /* don't use this index */ }

Please let me know if I'm missing something.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

#3Goel, Dhruv
goeldhru@amazon.com
In reply to: Kuntal Ghosh (#2)
Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

Yes, you are correct. The test case here was that if a tuple is inserted after the reference snapshot is taken in Phase 2 and before the index is marked ready. If this tuple is deleted before the reference snapshot of Phase 3, it will never make it to the index. I have fixed this problem by making pg_index tuple updates transactional (I believe there is no reason why it has to be in place now) so that the xmin of the pg_index tuple is same the xmin of the snapshot in Phase 3.

Attached the amended patch.

From: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Date: Wednesday, May 15, 2019 at 3:45 AM
To: "Goel, Dhruv" <goeldhru@amazon.com>
Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Subject: Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

Hello,

On Wed, May 15, 2019 at 1:45 PM Goel, Dhruv <goeldhru@amazon.com<mailto:goeldhru@amazon.com>> wrote:

Proposed Solution:
We remove the third wait state completely from the concurrent index build. When we mark the index as ready, we also mark “indcheckxmin” to true which essentially enforces Postgres to not use this index for older snapshots.

I think there is a problem in the proposed solution. When phase 3 is reached, the index is valid. But it might not contain tuples deleted just before the reference snapshot was taken. Hence, we wait for those transactions that might have older snapshot. The TransactionXmin of these transactions can be greater than the xmin of the pg_index entry for this index.
Instead of waiting in the third phase, if we just set indcheckxmin as true, the above transactions will be able to use the index which is wrong. (because they won't find the recently deleted tuples from the index that are still live according to their snapshots)

The respective code from get_relation_info:
if (index->indcheckxmin &&
!TransactionIdPrecedes(HeapTupleHeaderGetXmin(indexRelation->rd_indextuple->t_data), TransactionXmin))
{ /* don't use this index */ }

Please let me know if I'm missing something.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Attachments:

patch-cic.patchapplication/octet-stream; name=patch-cic.patchDownload+126-20
#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Goel, Dhruv (#3)
Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

"Goel, Dhruv" <goeldhru@amazon.com> writes:

Yes, you are correct. The test case here was that if a tuple is inserted after the reference snapshot is taken in Phase 2 and before the index is marked ready. If this tuple is deleted before the reference snapshot of Phase 3, it will never make it to the index. I have fixed this problem by making pg_index tuple updates transactional (I believe there is no reason why it has to be in place now) so that the xmin of the pg_index tuple is same the xmin of the snapshot in Phase 3.

I think you are mistaken that doing transactional updates in pg_index
is OK. If memory serves, we rely on xmin of the pg_index row for purposes
such as detecting whether a concurrently-created index is safe to use yet.
So a transactional update would restart that clock and result in temporary
denial of service.

regards, tom lane

#5Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#4)
Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

Hi,

On June 9, 2019 8:36:37 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Goel, Dhruv" <goeldhru@amazon.com> writes:
I think you are mistaken that doing transactional updates in pg_index
is OK. If memory serves, we rely on xmin of the pg_index row for
purposes
such as detecting whether a concurrently-created index is safe to use
yet.

We could replace that with storing a 64 xid in a normal column nowadays.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#5)
Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

Andres Freund <andres@anarazel.de> writes:

On June 9, 2019 8:36:37 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think you are mistaken that doing transactional updates in pg_index
is OK. If memory serves, we rely on xmin of the pg_index row for
purposes such as detecting whether a concurrently-created index is safe
to use yet.

We could replace that with storing a 64 xid in a normal column nowadays.

Perhaps, but that's a nontrivial change that'd be prerequisite to
doing what's suggested in this thread.

regards, tom lane

#7Goel, Dhruv
goeldhru@amazon.com
In reply to: Tom Lane (#6)
Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

On Jun 9, 2019, at 5:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On June 9, 2019 8:36:37 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think you are mistaken that doing transactional updates in pg_index
is OK. If memory serves, we rely on xmin of the pg_index row for
purposes such as detecting whether a concurrently-created index is safe
to use yet.

I took a deeper look regarding this use case but was unable to find more evidence. As part of this patch, we essentially make concurrently-created index safe to use only if transaction started after the xmin of Phase 3. Even today concurrent indexes can not be used for transactions before this xmin because of the wait (which I am trying to get rid of in this patch), is there any other denial of service you are talking about? Both the other states indislive, indisready can be transactional updates as far as I understand. Is there anything more I am missing here?

#8Goel, Dhruv
goeldhru@amazon.com
In reply to: Goel, Dhruv (#7)
Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

On Jun 10, 2019, at 1:20 PM, Goel, Dhruv <goeldhru@amazon.com> wrote:

On Jun 9, 2019, at 5:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On June 9, 2019 8:36:37 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think you are mistaken that doing transactional updates in pg_index
is OK. If memory serves, we rely on xmin of the pg_index row for
purposes such as detecting whether a concurrently-created index is safe
to use yet.

I took a deeper look regarding this use case but was unable to find more evidence. As part of this patch, we essentially make concurrently-created index safe to use only if transaction started after the xmin of Phase 3. Even today concurrent indexes can not be used for transactions before this xmin because of the wait (which I am trying to get rid of in this patch), is there any other denial of service you are talking about? Both the other states indislive, indisready can be transactional updates as far as I understand. Is there anything more I am missing here?

Hi,

I did some more concurrency testing here through some python scripts which compare the end state of the concurrently created indexes. I also back-ported this patch to PG 9.6 and ran some custom concurrency tests (Inserts, Deletes, and Create Index Concurrently) which seem to succeed. The intermediate states unfortunately are not easy to test in an automated manner, but to be fair concurrent indexes could never be used for older transactions. Do you have more inputs/ideas on this patch?

Thanks,
Dhruv

#9Thomas Munro
thomas.munro@gmail.com
In reply to: Goel, Dhruv (#8)
Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

On Sun, Jun 30, 2019 at 7:30 PM Goel, Dhruv <goeldhru@amazon.com> wrote:

On Jun 10, 2019, at 1:20 PM, Goel, Dhruv <goeldhru@amazon.com> wrote:

On Jun 9, 2019, at 5:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:

On June 9, 2019 8:36:37 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think you are mistaken that doing transactional updates in pg_index
is OK. If memory serves, we rely on xmin of the pg_index row for
purposes such as detecting whether a concurrently-created index is safe
to use yet.

I took a deeper look regarding this use case but was unable to find more evidence. As part of this patch, we essentially make concurrently-created index safe to use only if transaction started after the xmin of Phase 3. Even today concurrent indexes can not be used for transactions before this xmin because of the wait (which I am trying to get rid of in this patch), is there any other denial of service you are talking about? Both the other states indislive, indisready can be transactional updates as far as I understand. Is there anything more I am missing here?

I did some more concurrency testing here through some python scripts which compare the end state of the concurrently created indexes. I also back-ported this patch to PG 9.6 and ran some custom concurrency tests (Inserts, Deletes, and Create Index Concurrently) which seem to succeed. The intermediate states unfortunately are not easy to test in an automated manner, but to be fair concurrent indexes could never be used for older transactions. Do you have more inputs/ideas on this patch?

I noticed that check-world passed several times with this patch
applied, but the most recent CI run failed in multiple-cic:

+error in steps s2i s1i: ERROR: cache lookup failed for index 26303

https://travis-ci.org/postgresql-cfbot/postgresql/builds/555472214

--
Thomas Munro
https://enterprisedb.com

#10Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#9)
Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

On Mon, Jul 8, 2019 at 9:51 AM Thomas Munro <thomas.munro@gmail.com> wrote:

I noticed that check-world passed several times with this patch
applied, but the most recent CI run failed in multiple-cic:

+error in steps s2i s1i: ERROR: cache lookup failed for index 26303

https://travis-ci.org/postgresql-cfbot/postgresql/builds/555472214

And in another run, this time on Windows, create_index failed:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46455

--
Thomas Munro
https://enterprisedb.com

#11Goel, Dhruv
goeldhru@amazon.com
In reply to: Thomas Munro (#10)
Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

Hi,

Thank you Thomas.

On Jul 7, 2019, at 3:15 PM, Thomas Munro <thomas.munro@gmail.com<mailto:thomas.munro@gmail.com>> wrote:

On Mon, Jul 8, 2019 at 9:51 AM Thomas Munro <thomas.munro@gmail.com<mailto:thomas.munro@gmail.com>> wrote:
I noticed that check-world passed several times with this patch
applied, but the most recent CI run failed in multiple-cic:

+error in steps s2i s1i: ERROR: cache lookup failed for index 26303

https://travis-ci.org/postgresql-cfbot/postgresql/builds/555472214

And in another run, this time on Windows, create_index failed:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46455

--
Thomas Munro
https://enterprisedb.com
I have attached the revised patch. I ran check-world multiple times on my machine and it seems to succeed now. Do you mind kicking-off the CI build with the latest patch?

Attachments:

patch-cic.patchapplication/octet-stream; name=patch-cic.patchDownload+126-20
ATT00001.htmtext/html; name=ATT00001.htmDownload
#12Thomas Munro
thomas.munro@gmail.com
In reply to: Goel, Dhruv (#11)
Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

On Tue, Jul 9, 2019 at 10:33 AM Goel, Dhruv <goeldhru@amazon.com> wrote:

I have attached the revised patch. I ran check-world multiple times on my machine and it seems to succeed now. Do you mind kicking-off the CI build with the latest patch?

Thanks.

It's triggered automatically when you post patches to the thread and
also once a day, though it took ~35 minutes to get around to noticing
your new version due to other activity in other threads, and general
lack of horsepower. I'm planning to fix that with more horses.

It passed on both OSes. See here:

http://cfbot.cputube.org/dhruv-goel.html

--
Thomas Munro
https://enterprisedb.com

#13imai.yoshikazu@fujitsu.com
imai.yoshikazu@fujitsu.com
In reply to: Goel, Dhruv (#8)
RE: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

Hi Dhruv,

On Sun, June 30, 2019 at 7:30 AM, Goel, Dhruv wrote:

On Jun 10, 2019, at 1:20 PM, Goel, Dhruv <goeldhru@amazon.com> wrote:

On Jun 9, 2019, at 5:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:

On June 9, 2019 8:36:37 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think you are mistaken that doing transactional updates in
pg_index is OK. If memory serves, we rely on xmin of the pg_index
row for purposes such as detecting whether a concurrently-created
index is safe to use yet.

I took a deeper look regarding this use case but was unable to find more evidence. As part of this patch, we essentially

make concurrently-created index safe to use only if transaction started after the xmin of Phase 3. Even today concurrent
indexes can not be used for transactions before this xmin because of the wait (which I am trying to get rid of in this
patch), is there any other denial of service you are talking about? Both the other states indislive, indisready can
be transactional updates as far as I understand. Is there anything more I am missing here?

Hi,

I did some more concurrency testing here through some python scripts which compare the end state of the concurrently
created indexes. I also back-ported this patch to PG 9.6 and ran some custom concurrency tests (Inserts, Deletes, and
Create Index Concurrently) which seem to succeed. The intermediate states unfortunately are not easy to test in an
automated manner, but to be fair concurrent indexes could never be used for older transactions. Do you have more
inputs/ideas on this patch?

According to the commit 3c8404649 [1]https://github.com/postgres/postgres/commit/3c84046490bed3c22e0873dc6ba492e02b8b9051#diff-b279fc6d56760ed80ce4178de1401d2c, transactional update in pg_index is not safe in non-MVCC catalog scans before PG9.4.
But it seems to me that we can use transactional update in pg_index after the commit 813fb03155 [2]https://github.com/postgres/postgres/commit/813fb0315587d32e3b77af1051a0ef517d187763#diff-b279fc6d56760ed80ce4178de1401d2c which got rid of SnapshotNow.

If we apply this patch back to 9.3 or earlier, we might need to consider another way or take the Andres suggestion (which I don't understand the way fully though), but which version do you want/do we need to apply this patch?

Also, if we apply this patch in this way, there are several comments to be fixed which state the method of CREATE INDEX CONCURRENTLY.

ex.
[index.c]
/*
* validate_index - support code for concurrent index builds
...
* After completing validate_index(), we wait until all transactions that
* were alive at the time of the reference snapshot are gone; this is
* necessary to be sure there are none left with a transaction snapshot
* older than the reference (and hence possibly able to see tuples we did
* not index). Then we mark the index "indisvalid" and commit. Subsequent
* transactions will be able to use it for queries.
...
valiate_index()
{
}

[1]: https://github.com/postgres/postgres/commit/3c84046490bed3c22e0873dc6ba492e02b8b9051#diff-b279fc6d56760ed80ce4178de1401d2c
[2]: https://github.com/postgres/postgres/commit/813fb0315587d32e3b77af1051a0ef517d187763#diff-b279fc6d56760ed80ce4178de1401d2c

--
Yoshikazu Imai

#14Michael Paquier
michael@paquier.xyz
In reply to: imai.yoshikazu@fujitsu.com (#13)
Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

On Mon, Oct 28, 2019 at 05:17:52AM +0000, imai.yoshikazu@fujitsu.com wrote:

According to the commit 3c8404649 [1], transactional update in
pg_index is not safe in non-MVCC catalog scans before PG9.4.
But it seems to me that we can use transactional update in pg_index
after the commit 813fb03155 [2] which got rid of SnapshotNow.

That's actually this part of the patch:
- /* Assert that current xact hasn't done any transactional updates */
- Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
And this thread (for commit 3c84046):
/messages/by-id/19082.1349481400@sss.pgh.pa.us

And while looking at this patch, I have doubts that what you are doing
is actually safe either.

If we apply this patch back to 9.3 or earlier, we might need to
consider another way or take the Andres suggestion (which I don't
understand the way fully though), but which version do you want/do
we need to apply this patch?

Per the arguments of upthread, storing a 64-bit XID would require a
catalog change and you cannot backpatch that. I would suggest to keep
this patch focused on HEAD, and keep it as an improvement of the
existing features. Concurrent deadlock risks caused by CCI exist
since the feature came to life.

Also, if we apply this patch in this way, there are several comments
to be fixed which state the method of CREATE INDEX CONCURRENTLY.

Are we sure as well that all the cache lookup failures are addressed?
The CF robot does not complain per its latest status, but are we sure
to be out of the ground here?

The indentation of your patch is wrong in some places by the way.
--
Michael

#15Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#14)
Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

On Fri, Nov 08, 2019 at 10:30:39AM +0900, Michael Paquier wrote:

Per the arguments of upthread, storing a 64-bit XID would require a
catalog change and you cannot backpatch that. I would suggest to keep
this patch focused on HEAD, and keep it as an improvement of the
existing features. Concurrent deadlock risks caused by CCI exist
since the feature came to life.

Marked as returned with feedback per lack of activity and the patch
was waiting on author for a bit more than two weeks.
--
Michael