remove spurious CREATE INDEX CONCURRENTLY wait
I previously[1]/messages/by-id/20200805021109.GA9079@alvherre.pgsql posted a patch to have multiple CREATE INDEX CONCURRENTLY
not wait for the slowest of them. This is an update of that, with minor
conflicts fixed and a fresh thread.
To recap: currently, any CREATE INDEX CONCURRENTLY will wait for all
other CICs running concurrently to finish, because they can't be
distinguished amidst other old snapshots. We can change things by
having CIC set a special flag in PGPROC (like PROC_IN_VACUUM) indicating
that it's doing CIC; other CICs will see that flag and will know that
they don't need to wait for those processes. With this, CIC on small
tables don't have to wait for CIC on large tables to complete.
[1]: /messages/by-id/20200805021109.GA9079@alvherre.pgsql
--
�lvaro Herrera http://www.linkedin.com/in/alvherre
"Escucha y olvidar�s; ve y recordar�s; haz y entender�s" (Confucio)
Attachments:
0001-Flag-CREATE-INDEX-CONCURRENTLY-to-avoid-spurious-wai.patchtext/x-diff; charset=us-asciiDownload+18-4
+ James Coleman
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
To recap: currently, any CREATE INDEX CONCURRENTLY will wait for all
other CICs running concurrently to finish, because they can't be
distinguished amidst other old snapshots. We can change things by
having CIC set a special flag in PGPROC (like PROC_IN_VACUUM) indicating
that it's doing CIC; other CICs will see that flag and will know that
they don't need to wait for those processes. With this, CIC on small
tables don't have to wait for CIC on large tables to complete.
Hm. +1 for improving this, if we can, but ...
It seems clearly unsafe to ignore a CIC that is in active index-building;
a snapshot held for that purpose is just as real as any other. It *might*
be all right to ignore a CIC that is just waiting, but you haven't made
any argument in the patch comments as to why that's safe either.
(Moreover, at the points where we're just waiting, I don't think we have
a snapshot, so another CIC's WaitForOlderSnapshots shouldn't wait for us
anyway.)
Actually, it doesn't look like you've touched the comments at all.
WaitForOlderSnapshots' header comment has a long explanation of why
it's safe to ignore certain processes. That certainly needs to be
updated by any patch that's going to change the rules.
BTW, what about REINDEX CONCURRENTLY?
regards, tom lane
On Mon, Aug 10, 2020 at 8:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
To recap: currently, any CREATE INDEX CONCURRENTLY will wait for all
other CICs running concurrently to finish, because they can't be
distinguished amidst other old snapshots. We can change things by
having CIC set a special flag in PGPROC (like PROC_IN_VACUUM) indicating
that it's doing CIC; other CICs will see that flag and will know that
they don't need to wait for those processes. With this, CIC on small
tables don't have to wait for CIC on large tables to complete.Hm. +1 for improving this, if we can, but ...
It seems clearly unsafe to ignore a CIC that is in active index-building;
a snapshot held for that purpose is just as real as any other. It *might*
be all right to ignore a CIC that is just waiting, but you haven't made
any argument in the patch comments as to why that's safe either.
(Moreover, at the points where we're just waiting, I don't think we have
a snapshot, so another CIC's WaitForOlderSnapshots shouldn't wait for us
anyway.)
Why is a CIC in active index-building something we need to wait for?
Wouldn't it fall under a similar kind of logic to the other snapshot
types we can explicitly ignore? CIC can't be run in a manual
transaction, so the snapshot it holds won't be used to perform
arbitrary operations (i.e., the reason why a manual ANALYZE can't be
ignored).
Actually, it doesn't look like you've touched the comments at all.
WaitForOlderSnapshots' header comment has a long explanation of why
it's safe to ignore certain processes. That certainly needs to be
updated by any patch that's going to change the rules.
Agreed that the comment needs to be updated to discuss the
(im)possibility of arbitrary operations within a snapshot held by CIC.
James
James Coleman <jtc331@gmail.com> writes:
Why is a CIC in active index-building something we need to wait for?
Wouldn't it fall under a similar kind of logic to the other snapshot
types we can explicitly ignore? CIC can't be run in a manual
transaction, so the snapshot it holds won't be used to perform
arbitrary operations (i.e., the reason why a manual ANALYZE can't be
ignored).
Expression indexes that call user-defined functions seem like a
pretty serious risk factor for that argument. Those are exactly
the same expressions that ANALYZE will evaluate, as a result of
which we judge it unsafe to ignore. Why would CIC be different?
regards, tom lane
On Tue, Aug 11, 2020 at 11:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
James Coleman <jtc331@gmail.com> writes:
Why is a CIC in active index-building something we need to wait for?
Wouldn't it fall under a similar kind of logic to the other snapshot
types we can explicitly ignore? CIC can't be run in a manual
transaction, so the snapshot it holds won't be used to perform
arbitrary operations (i.e., the reason why a manual ANALYZE can't be
ignored).Expression indexes that call user-defined functions seem like a
pretty serious risk factor for that argument. Those are exactly
the same expressions that ANALYZE will evaluate, as a result of
which we judge it unsafe to ignore. Why would CIC be different?
The comments for WaitForOlderSnapshots() don't discuss that problem at
all; as far as ANALYZE goes they only say:
* Manual ANALYZEs, however, can't be excluded because they
* might be within transactions that are going to do arbitrary operations
* later.
But nonetheless over in the thread Álvaro linked to we'd discussed
these issues already. Andres in [1] and [2] believed that:
For the snapshot waits we can add a procarray flag
(alongside PROCARRAY_VACUUM_FLAG) indicating that the backend is
doing. Which WaitForOlderSnapshots() can then use to ignore those CICs,
which is safe, because those transactions definitely don't insert into
relations targeted by CIC. The change to WaitForOlderSnapshots() would
just be to pass the new flag to GetCurrentVirtualXIDs, I think.
and
What I was thinking of was a new flag, with a distinct value from
PROC_IN_VACUUM. It'd currently just be specified in the
GetCurrentVirtualXIDs() calls in WaitForOlderSnapshots(). That'd avoid
needing to wait for other CICs on different relations. Since CIC is not
permitted on system tables, and CIC doesn't do DML on normal tables, it
seems fairly obviously correct to exclude other CICs.
In [3] I'd brought up that a function index can do arbitrary
operations (including, terribly, a query of another table), and Andres
(in [4]) noted that:
Well, even if we consider this an actual problem, we could still use
PROC_IN_CIC for non-expression non-partial indexes (index operator
themselves better ensure this isn't a problem, or they're ridiculously
broken already - they can get called during vacuum).
But went on to describe how this is a problem with all expression
indexes (even if those expressions don't do dangerous things) because
of syscache lookups, but that ideally for expression indexes we'd have
a way to use a different (or more frequently taken) snapshot for the
purpose of computing those expressions. That's a significantly more
involved patch though.
So from what I understand, everything that I'd claimed in my previous
message still holds true for non-expression/non-partial indexes. Is
there something else I'm missing?
James
1: /messages/by-id/20200325191935.jjhdg2zy5k7ath5v@alap3.anarazel.de
2: /messages/by-id/20200325195841.gq4hpl25t6pxv3gl@alap3.anarazel.de
3: /messages/by-id/CAAaqYe_fveT_tvKonVt1imujOURUUMrydGeaBGahqLLy9D-REw@mail.gmail.com
4: /messages/by-id/20200416221207.wmnzbxvjqqpazeob@alap3.anarazel.de
On 2020-Aug-11, James Coleman wrote:
In [3] I'd brought up that a function index can do arbitrary
operations (including, terribly, a query of another table), and Andres
(in [4]) noted that:Well, even if we consider this an actual problem, we could still use
PROC_IN_CIC for non-expression non-partial indexes (index operator
themselves better ensure this isn't a problem, or they're ridiculously
broken already - they can get called during vacuum).But went on to describe how this is a problem with all expression
indexes (even if those expressions don't do dangerous things) because
of syscache lookups, but that ideally for expression indexes we'd have
a way to use a different (or more frequently taken) snapshot for the
purpose of computing those expressions. That's a significantly more
involved patch though.
So the easy first patch here is to add the flag as PROC_IN_SAFE_CIC,
which is set only for CIC when it's used for indexes that are neither
on expressions nor partial. Then ignore those in WaitForOlderSnapshots.
The attached patch does it that way. (Also updated per recent
conflicts).
I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
can be done too, since in essence it's the same thing as a CIC from a
snapshot management point of view.
Also, per [1]/messages/by-id/20191101203310.GA12239@alvherre.pgsql, ISTM this flag could be used to tell lazy VACUUM to
ignore the Xmin of this process too, which the previous formulation
(where all CICs were so marked) could not. This patch doesn't do that
yet, but it seems the natural next step to take.
[1]: /messages/by-id/20191101203310.GA12239@alvherre.pgsql
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v2-0001-Avoid-spurious-CREATE-INDEX-CONCURRENTLY-waits.patchtext/x-diff; charset=us-asciiDownload+52-5
On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote:
I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
can be done too, since in essence it's the same thing as a CIC from a
snapshot management point of view.
Yes, I see no problems for REINDEX CONCURRENTLY as well as long as
there are no predicates and expressions involved. The transactions
that should be patched are all started in ReindexRelationConcurrently.
The transaction of index_concurrently_swap() cannot set up that
though. Only thing to be careful is to make sure that safe_flag is
correct depending on the list of indexes worked on.
Also, per [1], ISTM this flag could be used to tell lazy VACUUM to
ignore the Xmin of this process too, which the previous formulation
(where all CICs were so marked) could not. This patch doesn't do that
yet, but it seems the natural next step to take.
Could we consider renaming vacuumFlags? With more flags associated to
a PGPROC entry that are not related to vacuum, the current naming
makes things confusing. Something like statusFlags could fit better
in the picture?
--
Michael
On Thu, Aug 20, 2020 at 03:11:19PM +0900, Michael Paquier wrote:
On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote:I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
can be done too, since in essence it's the same thing as a CIC from a
snapshot management point of view.Yes, I see no problems for REINDEX CONCURRENTLY as well as long as
there are no predicates and expressions involved. The transactions
that should be patched are all started in ReindexRelationConcurrently.
The transaction of index_concurrently_swap() cannot set up that
though. Only thing to be careful is to make sure that safe_flag is
correct depending on the list of indexes worked on.
Hi,
After looking through the thread and reading the patch it seems good,
and there are only few minor questions:
* Doing the same for REINDEX CONCURRENTLY, which does make sense. In
fact it's already mentioned in the commentaries as done, which a bit
confusing.
* Naming, to be more precise what suggested Michael:
Could we consider renaming vacuumFlags? With more flags associated to
a PGPROC entry that are not related to vacuum, the current naming
makes things confusing. Something like statusFlags could fit better
in the picture?
which sounds reasonable, and similar one about flag name
PROC_IN_SAFE_CIC - if it covers both CREATE INDEX/REINDEX CONCURRENTLY
maybe just PROC_IN_SAFE_IC?
Any plans about those questions? I can imagine that are the only missing
parts.
On Tue, Nov 03, 2020 at 07:14:47PM +0100, Dmitry Dolgov wrote:
On Thu, Aug 20, 2020 at 03:11:19PM +0900, Michael Paquier wrote:
On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote:I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
can be done too, since in essence it's the same thing as a CIC from a
snapshot management point of view.Yes, I see no problems for REINDEX CONCURRENTLY as well as long as
there are no predicates and expressions involved. The transactions
that should be patched are all started in ReindexRelationConcurrently.
The transaction of index_concurrently_swap() cannot set up that
though. Only thing to be careful is to make sure that safe_flag is
correct depending on the list of indexes worked on.Hi,
After looking through the thread and reading the patch it seems good,
and there are only few minor questions:* Doing the same for REINDEX CONCURRENTLY, which does make sense. In
fact it's already mentioned in the commentaries as done, which a bit
confusing.
Just to give it a shot, would the attached change be enough?
Attachments:
v3-0001-Avoid-spurious-CREATE-INDEX-CONCURRENTLY-waits.patchtext/x-diff; charset=us-asciiDownload+124-5
On Mon, Nov 09, 2020 at 04:47:43PM +0100, Dmitry Dolgov wrote:
On Tue, Nov 03, 2020 at 07:14:47PM +0100, Dmitry Dolgov wrote:
On Thu, Aug 20, 2020 at 03:11:19PM +0900, Michael Paquier wrote:
On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote:I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
can be done too, since in essence it's the same thing as a CIC from a
snapshot management point of view.Yes, I see no problems for REINDEX CONCURRENTLY as well as long as
there are no predicates and expressions involved. The transactions
that should be patched are all started in ReindexRelationConcurrently.
The transaction of index_concurrently_swap() cannot set up that
though. Only thing to be careful is to make sure that safe_flag is
correct depending on the list of indexes worked on.Hi,
After looking through the thread and reading the patch it seems good,
and there are only few minor questions:* Doing the same for REINDEX CONCURRENTLY, which does make sense. In
fact it's already mentioned in the commentaries as done, which a bit
confusing.Just to give it a shot, would the attached change be enough?
Could it be possible to rename vacuumFlags to statusFlags first? I
did not see any objection to do this suggestion.
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); + MyProc->vacuumFlags |= PROC_IN_SAFE_IC; + ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags; + LWLockRelease(ProcArrayLock);
I can't help noticing that you are repeating the same code pattern
eight times. I think that this should be in its own routine, and that
we had better document that this should be called just after starting
a transaction, with an assertion enforcing that.
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); + MyProc->vacuumFlags |= PROC_IN_SAFE_IC; + ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags; + LWLockRelease(ProcArrayLock);
I can't help noticing that you are repeating the same code pattern
eight times. I think that this should be in its own routine, and that
we had better document that this should be called just after starting
a transaction, with an assertion enforcing that.
Do we really need exclusive lock on the ProcArray to make this flag
change? That seems pretty bad from a concurrency standpoint.
regards, tom lane
On Mon, Nov 09, 2020 at 08:32:13PM -0500, Tom Lane wrote:
Do we really need exclusive lock on the ProcArray to make this flag
change? That seems pretty bad from a concurrency standpoint.
Any place where we update vacuumFlags acquires an exclusive LWLock on
ProcArrayLock. That's held for a very short time, so IMO it won't
matter much in practice, particularly if you compare that with the
potential gains related to the existing wait phases.
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On Mon, Nov 09, 2020 at 08:32:13PM -0500, Tom Lane wrote:
Do we really need exclusive lock on the ProcArray to make this flag
change? That seems pretty bad from a concurrency standpoint.
Any place where we update vacuumFlags acquires an exclusive LWLock on
ProcArrayLock. That's held for a very short time, so IMO it won't
matter much in practice, particularly if you compare that with the
potential gains related to the existing wait phases.
Not sure I believe that it doesn't matter much in practice. If there's
a steady stream of shared ProcArrayLock acquisitions (for snapshot
acquisition) then somebody wanting exclusive lock will create a big
hiccup, whether they hold it for a short time or not.
regards, tom lane
On 2020-Nov-09, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
On Mon, Nov 09, 2020 at 08:32:13PM -0500, Tom Lane wrote:
Do we really need exclusive lock on the ProcArray to make this flag
change? That seems pretty bad from a concurrency standpoint.Any place where we update vacuumFlags acquires an exclusive LWLock on
ProcArrayLock. That's held for a very short time, so IMO it won't
matter much in practice, particularly if you compare that with the
potential gains related to the existing wait phases.Not sure I believe that it doesn't matter much in practice. If there's
a steady stream of shared ProcArrayLock acquisitions (for snapshot
acquisition) then somebody wanting exclusive lock will create a big
hiccup, whether they hold it for a short time or not.
Yeah ... it would be much better if we can make it use atomics instead.
Currently it's an uint8, and in PGPROC itself it's probably not a big
deal to enlarge that, but I fear that quadrupling the size of the
mirroring array in PROC_HDR might be bad for performance. However,
maybe if we use atomics to access it, then we don't need to mirror it
anymore? That would need some benchmarking of GetSnapshotData.
On Mon, Nov 09, 2020 at 11:31:15PM -0300, Alvaro Herrera wrote:
Yeah ... it would be much better if we can make it use atomics instead.
Currently it's an uint8, and in PGPROC itself it's probably not a big
deal to enlarge that, but I fear that quadrupling the size of the
mirroring array in PROC_HDR might be bad for performance. However,
maybe if we use atomics to access it, then we don't need to mirror it
anymore? That would need some benchmarking of GetSnapshotData.
Hmm. If you worry about the performance impact here, it is possible
to do a small performance test without this patch. vacuum_rel() sets
the flag for a non-full VACUUM, so with one backend running a manual
VACUUM in loop on an empty relation could apply some pressure on any
benchmark, even a simple pgbench.
--
Michael
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Yeah ... it would be much better if we can make it use atomics instead.
I was thinking more like "do we need any locking at all".
Assuming that a proc's vacuumFlags can be set by only the process itself,
there's no write conflicts to worry about. On the read side, there's a
hazard that onlookers will not see the PROC_IN_SAFE_IC flag set; but
that's not any different from what the outcome would be if they looked
just before this stanza executes. And even if they don't see it, at worst
we lose the optimization being proposed.
There is a question of whether it's important that both copies of the flag
appear to update atomically ... but that just begs the question "why in
heaven's name are there two copies?"
regards, tom lane
On Mon, Nov 09, 2020 at 10:02:27PM -0500, Tom Lane wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Yeah ... it would be much better if we can make it use atomics instead.
I was thinking more like "do we need any locking at all".
Assuming that a proc's vacuumFlags can be set by only the process itself,
there's no write conflicts to worry about. On the read side, there's a
hazard that onlookers will not see the PROC_IN_SAFE_IC flag set; but
that's not any different from what the outcome would be if they looked
just before this stanza executes. And even if they don't see it, at worst
we lose the optimization being proposed.There is a question of whether it's important that both copies of the flag
appear to update atomically ... but that just begs the question "why in
heaven's name are there two copies?"
Sounds right, but after reading the thread about GetSnapshotData
scalability more thoroughly there seem to be an assumption that those
copies have to be updated at the same time under the same lock, and
claims that in some cases justification for correctness around not
taking ProcArrayLock is too complicated, at least for now.
Interesting enough, similar discussion happened about vaccumFlags before
with the same conclusion that theoretically it's fine to update without
holding the lock, but this assumption could change one day and it's
better to avoid such risks. Having said that I believe it makes sense to
continue with locking. Are there any other opinions? I'll try to
benchmark it in the meantime.
On Thu, Nov 12, 2020 at 04:36:32PM +0100, Dmitry Dolgov wrote:
Interesting enough, similar discussion happened about vaccumFlags before
with the same conclusion that theoretically it's fine to update without
holding the lock, but this assumption could change one day and it's
better to avoid such risks. Having said that I believe it makes sense to
continue with locking. Are there any other opinions? I'll try to
benchmark it in the meantime.
Thanks for planning some benchmarking for this specific patch. I have
to admit that the possibility of switching vacuumFlags to use atomics
is very appealing in the long term, with or without considering this
patch, even if we had better be sure that this patch has no actual
effect on concurrency first if atomics are not used in worst-case
scenarios.
--
Michael
On Fri, Nov 13, 2020 at 09:25:40AM +0900, Michael Paquier wrote:
On Thu, Nov 12, 2020 at 04:36:32PM +0100, Dmitry Dolgov wrote:Interesting enough, similar discussion happened about vaccumFlags before
with the same conclusion that theoretically it's fine to update without
holding the lock, but this assumption could change one day and it's
better to avoid such risks. Having said that I believe it makes sense to
continue with locking. Are there any other opinions? I'll try to
benchmark it in the meantime.Thanks for planning some benchmarking for this specific patch. I have
to admit that the possibility of switching vacuumFlags to use atomics
is very appealing in the long term, with or without considering this
patch, even if we had better be sure that this patch has no actual
effect on concurrency first if atomics are not used in worst-case
scenarios.
I've tried first to test scenarios where GetSnapshotData produces
significant lock contention and "reindex concurrently" implementation
with locks interferes with it. The idea I had is to create a test
function that constantly calls GetSnapshotData (perf indeed shows
significant portion of time spent on contended lock), and clash it with
a stream of "reindex concurrently" of an empty relation (which still
reaches safe_index check). I guess it could be considered as an
artificial extreme case. Measuring GetSnapshotData (or rather the
surrounding wrapper, to distinguish calls from the test function from
everything else) latency without reindex, with reindex and locks, with
reindex without locks should produce different "modes" and comparing
them we can make some conclusions.
Latency histograms without reindex (nanoseconds):
nsecs : count distribution
512 -> 1023 : 0 | |
1024 -> 2047 : 10001209 |****************************************|
2048 -> 4095 : 76936 | |
4096 -> 8191 : 1468 | |
8192 -> 16383 : 98 | |
16384 -> 32767 : 39 | |
32768 -> 65535 : 6 | |
The same with reindex without locks:
nsecs : count distribution
512 -> 1023 : 0 | |
1024 -> 2047 : 111345 | |
2048 -> 4095 : 6997627 |****************************************|
4096 -> 8191 : 18575 | |
8192 -> 16383 : 586 | |
16384 -> 32767 : 312 | |
32768 -> 65535 : 18 | |
The same with reindex with locks:
nsecs : count distribution
512 -> 1023 : 0 | |
1024 -> 2047 : 59438 | |
2048 -> 4095 : 6901187 |****************************************|
4096 -> 8191 : 18584 | |
8192 -> 16383 : 581 | |
16384 -> 32767 : 280 | |
32768 -> 65535 : 84 | |
Looks like with reindex without locks is indeed faster (there are mode
samples in lower time section), but not particularly significant to the
whole distribution, especially taking into account extremity of the
test.
I'll take a look at benchmarking of switching vacuumFlags to use
atomics, but as it's probably a bit off topic I'm going to attach
another version of the patch with locks and suggested changes. To which
I have one question:
Michael Paquier <michael@paquier.xyz> writes:
I think that this should be in its own routine, and that we had better
document that this should be called just after starting a transaction,
with an assertion enforcing that.
I'm not sure which exactly assertion condition do you mean?