pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.
Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.
CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start. That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index. Queries that use the
resulting index can silently fail to find rows. Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation. It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).
Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.
Discussion: /messages/by-id/20210730022548.GA1940096@gust.leadboat.com
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/fdd965d074d46765c295223b119ca437dbcac973
Modified Files
--------------
contrib/amcheck/t/002_cic.pl | 78 ++++++++++++++++
src/backend/utils/cache/inval.c | 12 ++-
src/backend/utils/cache/relcache.c | 115 +++++++++++++++++++++--
src/bin/pgbench/t/001_pgbench_with_server.pl | 118 +++++++----------------
src/include/utils/inval.h | 1 +
src/include/utils/relcache.h | 2 +-
src/test/perl/PostgresNode.pm | 134 +++++++++++++++++++++++++++
src/tools/pgindent/typedefs.list | 1 +
8 files changed, 368 insertions(+), 93 deletions(-)
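For orientation, the retry structure described in the commit message boils down to something like the following. This is a minimal sketch rather than the committed relcache.c code: in_progress_list and the invalidated flag follow the commit, while enter_in_progress_list(), build_relation_once(), and leave_in_progress_list() are hypothetical stand-ins for the surrounding bookkeeping and the existing descriptor-building steps.

static Relation
RelationBuildDesc(Oid targetRelId, bool insertIt)
{
	int			in_progress_offset;
	Relation	relation;

	/* Register this build so concurrent invalidations can flag it. */
	in_progress_offset = enter_in_progress_list(targetRelId);	/* hypothetical */

retry:
	in_progress_list[in_progress_offset].invalidated = false;

	/* All of the existing catalog scans and descriptor assembly. */
	relation = build_relation_once(targetRelId, insertIt);		/* hypothetical */

	/*
	 * If a relevant invalidation was accepted mid-build, the descriptor may
	 * reflect a stale catalog view (e.g. missing a just-created CIC index),
	 * so discard it and rebuild from scratch.
	 */
	if (in_progress_list[in_progress_offset].invalidated)
	{
		RelationDestroyRelation(relation, false);
		goto retry;
	}

	leave_in_progress_list(in_progress_offset);					/* hypothetical */
	return relation;
}

The key property is that the invalidated flag is owned by the invalidation machinery: if any relevant sinval message is processed while the build is in flight, the freshly built descriptor is thrown away rather than trusted.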
On 10/24/21 03:40, Noah Misch wrote:
Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.
CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start. That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index. Queries that use the
resulting index can silently fail to find rows. Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation. It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).
Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.
Discussion: /messages/by-id/20210730022548.GA1940096@gust.leadboat.com
Unfortunately, this seems to have broken CLOBBER_CACHE_ALWAYS builds.
Since this commit, initdb never completes due to infinite retrying over
and over (on the first RelationBuildDesc call).
We have a CLOBBER_CACHE_ALWAYS buildfarm machine "avocet", and that
currently looks like this (top):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2626 buildfa+ 20 0 202888 21416 20084 R 98.34 0.531 151507:16
/home/buildfarm/avocet/buildroot/REL9_6_STABLE/pgsql.build/tmp_install/home/buildfarm/avocet/buildroot/REL9_6_STABLE/inst/bin/postgres
--boot -x1 -F
Yep, that's 151507 minutes, i.e. about 105 days in initdb :-/
I haven't looked at this very closely yet, but it seems the whole
problem is we do this at the very beginning:
	in_progress_list[in_progress_offset].invalidated = false;

	/*
	 * find the tuple in pg_class corresponding to the given relation id
	 */
	pg_class_tuple = ScanPgRelation(targetRelId, true, false);
which seems entirely self-defeating, because ScanPgRelation acquires a
lock (on pg_class), which accepts invalidations, which invalidates
system caches (in clobber_cache_always), which promptly sets
in_progress_list[in_progress_offset].invalidated = true;
guaranteeing an infinite loop.
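Reduced to just the effect that matters here, the interaction looks roughly like this (a sketch, not the actual code: the bodies are cut down, the debug_discard parameter the commit added is omitted, and in_progress_list_len is assumed to be the length of the in-progress array):

/* Every heavyweight lock acquisition drains pending invalidations ... */
void
AcceptInvalidationMessages(void)
{
	ReceiveSharedInvalidMessages(LocalExecuteInvalidationMessage,
								 InvalidateSystemCaches);

#if defined(CLOBBER_CACHE_ALWAYS)
	/*
	 * ... and a clobber build additionally flushes everything on every
	 * call, treating the flush like a real invalidation rather than a
	 * debug-induced one.
	 */
	InvalidateSystemCaches();
#endif
}

/* A flush that is not debug-induced flags every in-progress build. */
void
RelationCacheInvalidate(void)	/* debug_discard parameter omitted here */
{
	/* ... flush/rebuild all existing relcache entries ... */

	for (int i = 0; i < in_progress_list_len; i++)
		in_progress_list[i].invalidated = true;
}

So the very first catalog scan of each retry re-marks the build as invalidated, and the loop never exits.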
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2022-02-08 22:13:01 +0100, Tomas Vondra wrote:
On 10/24/21 03:40, Noah Misch wrote:
Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.
CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start. That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index. Queries that use the
resulting index can silently fail to find rows. Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation. It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).
Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.
Discussion: /messages/by-id/20210730022548.GA1940096@gust.leadboat.com
Unfortunately, this seems to have broken CLOBBER_CACHE_ALWAYS builds. Since
this commit, initdb never completes due to infinite retrying over and over
(on the first RelationBuildDesc call).
Ugh. Do we need to do something WRT the next set of minor releases? Is
there a chance of this occurring in "real" workloads?
Greetings,
Andres Freund
On Tue, Feb 08, 2022 at 04:43:47PM -0800, Andres Freund wrote:
Ugh. Do we need to do something WRT the next set of minor
releases?
The set of minor releases of this week has already been stamped, so
that's too late :/
Is there a chance of this occurring in "real" workloads?
Ugh++. The problem is that we would not really detect that
automatically, would we?
--
Michael
On 2/9/22 01:43, Andres Freund wrote:
Hi,
On 2022-02-08 22:13:01 +0100, Tomas Vondra wrote:
On 10/24/21 03:40, Noah Misch wrote:
Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.
CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start. That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index. Queries that use the
resulting index can silently fail to find rows. Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation. It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).
Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.
Discussion: /messages/by-id/20210730022548.GA1940096@gust.leadboat.com
Unfortunately, this seems to have broken CLOBBER_CACHE_ALWAYS builds. Since
this commit, initdb never completes due to infinite retrying over and over
(on the first RelationBuildDesc call).
Ugh. Do we need to do something WRT the next set of minor releases? Is
there a chance of this occurring in "real" workloads?
AFAICS this only affects builds with CLOBBER_CACHE_ALWAYS, and anyone
running such a build in production clearly likes painful things anyway.
But really, for the infinite loop to happen, building a relation
descriptor has to invalidate a cache. And I haven't found a way to do
that without the CLOBBER_CACHE_ALWAYS thing.
Also, all the November minor releases include this commit, and there
were no reports about this (pretty obvious) issue. Buildfarm did not
complain either (but an animal may be stuck for months and we would not
know about it).
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2022-02-09 10:23:06 +0900, Michael Paquier wrote:
On Tue, Feb 08, 2022 at 04:43:47PM -0800, Andres Freund wrote:
Ugh. Do we need to do something WRT the next set of minor
releases?
The set of minor releases of this week has already been stamped, so
that's too late :/
It's stamped, not tagged, so we could send out new tarballs. Or we could skip
a release number. IIRC we had to do something along those lines before.
Ugh++. The problem is that we would not really detect that
automatically, would we?
What do you mean by detect here?
Greetings,
Andres Freund
Hi,
On 2022-02-09 02:25:09 +0100, Tomas Vondra wrote:
AFAICS this only affects builds with CLOBBER_CACHE_ALWAYS, and anyone
running such a build in production clearly likes painful things anyway.
Yea, realistically nobody does that.
But really, for the infinite loop to happen, building a relation descriptor
has to invalidate a cache. And I haven't found a way to do that without the
CLOBBER_CACHE_ALWAYS thing.
Phew.
Also, all the November minor releases include this commit, and there were no
reports about this (pretty obvious) issue. Buildfarm did not complain either
(but an animal may be stuck for months and we would not know about it).
Ah, somehow I thought that wasn't yet in the last set of releases. Phew #2.
Greetings,
Andres Freund
On Tue, Feb 08, 2022 at 04:43:47PM -0800, Andres Freund wrote:
On 2022-02-08 22:13:01 +0100, Tomas Vondra wrote:
On 10/24/21 03:40, Noah Misch wrote:
Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.
CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start. That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index. Queries that use the
resulting index can silently fail to find rows. Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation. It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.Discussion: /messages/by-id/20210730022548.GA1940096@gust.leadboat.com
Unfortunately, this seems to have broken CLOBBER_CACHE_ALWAYS builds. Since
this commit, initdb never completes due to infinite retrying over and over
(on the first RelationBuildDesc call).
Thanks for the report. I had added the debug_discard arguments of
InvalidateSystemCachesExtended() and RelationCacheInvalidate() to make the new
code survive a CREATE TABLE at debug_discard_caches=5. Apparently that's not
enough for initdb. I'll queue a task to look at it.
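Roughly, that escape hatch is supposed to look like this (a sketch with abbreviated bodies, following the signatures the commit introduced; in_progress_list_len is assumed to be the length of the in-progress array). Debug-induced flushes (debug_discard_caches / the CLOBBER_CACHE_* builds) pass debug_discard = true, and only "real" invalidations flag in-progress builds, which is what keeps the retry loop finite:

void
InvalidateSystemCachesExtended(bool debug_discard)
{
	InvalidateCatalogSnapshot();
	ResetCatalogCaches();
	RelationCacheInvalidate(debug_discard);
	/* ... plus the usual callback notifications ... */
}

void
RelationCacheInvalidate(bool debug_discard)
{
	/* ... flush/rebuild all existing relcache entries as before ... */

	if (!debug_discard)
	{
		/* Only non-debug invalidations mark in-progress rebuilds stale. */
		for (int i = 0; i < in_progress_list_len; i++)
			in_progress_list[i].invalidated = true;
	}
}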
It's a good reminder to set wait_timeout on buildfarm animals. (I should take
that advice, too.)
Ugh. Do we need to do something WRT the next set of minor releases?
No, given that this code already debuted in the November releases.
On Tue, Feb 08, 2022 at 05:43:34PM -0800, Andres Freund wrote:
It's stamped, not tagged, so we could send out new tarballs. Or we could skip
a release number. IIRC we had to do something along those lines before.
It does not matter now, but the release is stamped and tagged.
What do you mean by detect here?
Well, we would not be able to see that something is stuck by default,
but Noah has just answered my question by mentioning wait_timeout
in the buildfarm configuration.
--
Michael
On Tue, Feb 08, 2022 at 06:04:03PM -0800, Noah Misch wrote:
On Tue, Feb 08, 2022 at 04:43:47PM -0800, Andres Freund wrote:
On 2022-02-08 22:13:01 +0100, Tomas Vondra wrote:
On 10/24/21 03:40, Noah Misch wrote:
Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.
CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start. That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index. Queries that use the
resulting index can silently fail to find rows. Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation. It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).
Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.
Discussion: /messages/by-id/20210730022548.GA1940096@gust.leadboat.com
Unfortunately, this seems to have broken CLOBBER_CACHE_ALWAYS builds. Since
this commit, initdb never completes due to infinite retrying over and over
(on the first RelationBuildDesc call).
Thanks for the report. I had added the debug_discard arguments of
InvalidateSystemCachesExtended() and RelationCacheInvalidate() to make the new
code survive a CREATE TABLE at debug_discard_caches=5. Apparently that's not
enough for initdb. I'll queue a task to look at it.
The explanation was more boring than that. v13 and earlier have an additional
InvalidateSystemCaches() call site, which I neglected to update. Here's the
fix I intend to push.
Attachments:
fix-inval-build-race-v1.patch (text/plain; charset=us-ascii)
Author: Noah Misch <noah@leadboat.com>
Commit: Noah Misch <noah@leadboat.com>
Fix back-patch of "Avoid race in RelationBuildDesc() ..."
The back-patch of commit fdd965d074d46765c295223b119ca437dbcac973 broke
CLOBBER_CACHE_ALWAYS for v9.6 through v13, because it updated the
InvalidateSystemCaches() call pertaining to CLOBBER_CACHE_RECURSIVELY
and not also the one pertaining to CLOBBER_CACHE_ALWAYS. Back-patch to
v13, v12, v11, and v10.
Reported by Tomas Vondra. Reviewed by FIXME.
Discussion: https://postgr.es/m/df7b4c0b-7d92-f03f-75c4-9e08b269a716@enterprisedb.com
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 8b0503f..8fe1f0d 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -715,27 +715,27 @@ AcceptInvalidationMessages(void)
 	 * slows things by at least a factor of 10000, so I wouldn't suggest
 	 * trying to run the entire regression tests that way. It's useful to try
 	 * a few simple tests, to make sure that cache reload isn't subject to
 	 * internal cache-flush hazards, but after you've done a few thousand
 	 * recursive reloads it's unlikely you'll learn more.
 	 */
 #if defined(CLOBBER_CACHE_ALWAYS)
 	{
 		static bool in_recursion = false;
 
 		if (!in_recursion)
 		{
 			in_recursion = true;
-			InvalidateSystemCaches();
+			InvalidateSystemCachesExtended(true);
 			in_recursion = false;
 		}
 	}
 #elif defined(CLOBBER_CACHE_RECURSIVELY)
 	{
 		static int	recursion_depth = 0;
 
 		/* Maximum depth is arbitrary depending on your threshold of pain */
 		if (recursion_depth < 3)
 		{
 			recursion_depth++;
 			InvalidateSystemCachesExtended(true);
 			recursion_depth--;
Michael Paquier <michael@paquier.xyz> writes:
On Tue, Feb 08, 2022 at 05:43:34PM -0800, Andres Freund wrote:
It's stamped, not tagged, so we could send out new tarballs. Or we could skip
a release number. IIRC we had to do something along those lines before.
It does not matter now, but the release is stamped and tagged.
Yeah, I see no need to do anything about this on an emergency
basis.
What do you mean by detect here?
Well, we would not be able to see that something is stuck by default,
but Noah has just answered my question by mentioning wait_timeout
in the buildfarm configuration.
The buildfarm's wait_timeout option isn't that helpful here, because
when it triggers, the client just goes belly-up *with no report*.
So even if the CCA animals had it on, you'd not notice unless you
started to wonder why they'd not reported lately.
I think that's a bug that ought to be fixed. I do agree that
wait_timeout ought to be finite by default, too.
regards, tom lane
On 2/9/22 06:41, Noah Misch wrote:
The explanation was more boring than that. v13 and earlier have an additional
InvalidateSystemCaches() call site, which I neglected to update. Here's the
fix I intend to push.
I tried this patch on 10 and 13, and it seems to fix the issue. So +1.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company