pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURR

Started by Noah Mischabout 4 years ago12 messages
#1Noah Misch
noah@leadboat.com

Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.

CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start. That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index. Queries that use the
resulting index can silently fail to find rows. Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation. It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).

Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.

Discussion: /messages/by-id/20210730022548.GA1940096@gust.leadboat.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/fdd965d074d46765c295223b119ca437dbcac973

Modified Files
--------------
contrib/amcheck/t/002_cic.pl | 78 ++++++++++++++++
src/backend/utils/cache/inval.c | 12 ++-
src/backend/utils/cache/relcache.c | 115 +++++++++++++++++++++--
src/bin/pgbench/t/001_pgbench_with_server.pl | 118 +++++++----------------
src/include/utils/inval.h | 1 +
src/include/utils/relcache.h | 2 +-
src/test/perl/PostgresNode.pm | 134 +++++++++++++++++++++++++++
src/tools/pgindent/typedefs.list | 1 +
8 files changed, 368 insertions(+), 93 deletions(-)

#2Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Noah Misch (#1)
Re: pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURR

On 10/24/21 03:40, Noah Misch wrote:

Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.

CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start. That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index. Queries that use the
resulting index can silently fail to find rows. Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation. It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).

Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.

Discussion: /messages/by-id/20210730022548.GA1940096@gust.leadboat.com

Unfortunately, this seems to have broken CLOBBER_CACHE_ALWAYS builds.
Since this commit, initdb never completes due to infinite retrying over
and over (on the first RelationBuildDesc call).

We have a CLOBBER_CACHE_ALWAYS buildfarm machine "avocet", and that
currently looks like this (top):

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
COMMAND
2626 buildfa+ 20 0 202888 21416 20084 R 98.34 0.531 151507:16
/home/buildfarm/avocet/buildroot/REL9_6_STABLE/pgsql.build/tmp_install/home/buildfarm/avocet/buildroot/REL9_6_STABLE/inst/bin/postgres
--boot -x1 -F

Yep, that's 151507 minutes, i.e. 104 days in initdb :-/

I haven't looked at this very closely yet, but it seems the whole
problem is we do this at the very beginning:

in_progress_list[in_progress_offset].invalidated = false;

/*
* find the tuple in pg_class corresponding to the given relation id
*/
pg_class_tuple = ScanPgRelation(targetRelId, true, false);

which seems entirely self-defeating, because ScanPgRelation acquires a
lock (on pg_class), which accepts invalidations, which invalidates
system caches (in clobber_cache_always), which sets promptly sets

in_progress_list[in_progress_offset].invalidated = false;

guaranteeing an infinite loop.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#2)
Re: pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURR

Hi,

On 2022-02-08 22:13:01 +0100, Tomas Vondra wrote:

On 10/24/21 03:40, Noah Misch wrote:

Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.

CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start. That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index. Queries that use the
resulting index can silently fail to find rows. Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation. It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).

Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.

Discussion: /messages/by-id/20210730022548.GA1940096@gust.leadboat.com

Unfortunately, this seems to have broken CLOBBER_CACHE_ALWAYS builds. Since
this commit, initdb never completes due to infinite retrying over and over
(on the first RelationBuildDesc call).

Ugh. Do we need to do something about WRT the next set of minor releases? Is
there a a chance of this occuring in "real" workloads?

Greetings,

Andres Freund

#4Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#3)
Re: pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURR

On Tue, Feb 08, 2022 at 04:43:47PM -0800, Andres Freund wrote:

Ugh. Do we need to do something about WRT the next set of minor
releases?

The set of minor releases of this week has already been stamped, so
that's too late :/

Is there a a chance of this occuring in "real" workloads?

Ugh++. The problem is that we would not really detect that
automatically, isn't it?
--
Michael

#5Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Andres Freund (#3)
Re: pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURR

On 2/9/22 01:43, Andres Freund wrote:

Hi,

On 2022-02-08 22:13:01 +0100, Tomas Vondra wrote:

On 10/24/21 03:40, Noah Misch wrote:

Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.

CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start. That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index. Queries that use the
resulting index can silently fail to find rows. Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation. It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).

Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.

Discussion: /messages/by-id/20210730022548.GA1940096@gust.leadboat.com

Unfortunately, this seems to have broken CLOBBER_CACHE_ALWAYS builds. Since
this commit, initdb never completes due to infinite retrying over and over
(on the first RelationBuildDesc call).

Ugh. Do we need to do something about WRT the next set of minor releases? Is
there a a chance of this occuring in "real" workloads?

AFAICS this only affects builds with CLOBBER_CACHE_ALWAYS, and anyone
running such build in production clearly likes painful things anyway.

But really, for the infinite loop to happen, building a relation
descriptor has to invalidate a cache. And I haven't found a way to do
that without the CLOBBER_CACHE_ALWAYS thing.

Also, all the November minor releases include this commit, and there
were no reports about this (pretty obvious) issue. Buildfarm did not
complain either (but an animal may be stuck for months and we would not
know about it).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#4)
Re: pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURR

Hi,

On 2022-02-09 10:23:06 +0900, Michael Paquier wrote:

On Tue, Feb 08, 2022 at 04:43:47PM -0800, Andres Freund wrote:

Ugh. Do we need to do something about WRT the next set of minor
releases?

The set of minor releases of this week has already been stamped, so
that's too late :/

It's stamped, not tagged, so we could send out new tarballs. Or we could skip
a release number. IIRC we had to do something along those lines before.

Ugh++. The problem is that we would not really detect that
automatically, isn't it?

What do you mean with detect here?

Greetings,

Andres Freund

#7Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#5)
Re: pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURR

Hi,

On 2022-02-09 02:25:09 +0100, Tomas Vondra wrote:

AFAICS this only affects builds with CLOBBER_CACHE_ALWAYS, and anyone
running such build in production clearly likes painful things anyway.

Yea, realistically nobody does that.

But really, for the infinite loop to happen, building a relation descriptor
has to invalidate a cache. And I haven't found a way to do that without the
CLOBBER_CACHE_ALWAYS thing.

Phew.

Also, all the November minor releases include this commit, and there were no
reports about this (pretty obvious) issue. Buildfarm did not complain either
(but an animal may be stuck for months and we would not know about it).

Ah, somehow I thought that wasn't yet in the last set of releases. Phew #2.

Greetings,

Andres Freund

#8Noah Misch
noah@leadboat.com
In reply to: Andres Freund (#3)
Re: pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURR

On Tue, Feb 08, 2022 at 04:43:47PM -0800, Andres Freund wrote:

On 2022-02-08 22:13:01 +0100, Tomas Vondra wrote:

On 10/24/21 03:40, Noah Misch wrote:

Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.

CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start. That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index. Queries that use the
resulting index can silently fail to find rows. Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation. It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).

Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.

Discussion: /messages/by-id/20210730022548.GA1940096@gust.leadboat.com

Unfortunately, this seems to have broken CLOBBER_CACHE_ALWAYS builds. Since
this commit, initdb never completes due to infinite retrying over and over
(on the first RelationBuildDesc call).

Thanks for the report. I had added the debug_discard arguments of
InvalidateSystemCachesExtended() and RelationCacheInvalidate() to make the new
code survive a CREATE TABLE at debug_discard_caches=5. Apparently that's not
enough for initdb. I'll queue a task to look at it.

It's a good reminder to set wait_timeout on buildfarm animals. (I should take
that advice, too.)

Ugh. Do we need to do something about WRT the next set of minor releases?

No, given that this code already debuted in the November releases.

#9Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#6)
Re: pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURR

On Tue, Feb 08, 2022 at 05:43:34PM -0800, Andres Freund wrote:

It's stamped, not tagged, so we could send out new tarballs. Or we could skip
a release number. IIRC we had to do something along those lines before.

It does not matter now, but the release is stamped and tagged.

What do you mean with detect here?

Well, we would not be able to see that something is stuck by default,
but Noah has just answered to my question by mentioning wait_timeout
in the buildfarm configuration.
--
Michael

#10Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#8)
1 attachment(s)
Re: pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURR

On Tue, Feb 08, 2022 at 06:04:03PM -0800, Noah Misch wrote:

On Tue, Feb 08, 2022 at 04:43:47PM -0800, Andres Freund wrote:

On 2022-02-08 22:13:01 +0100, Tomas Vondra wrote:

On 10/24/21 03:40, Noah Misch wrote:

Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.

CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start. That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index. Queries that use the
resulting index can silently fail to find rows. Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation. It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).

Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.

Discussion: /messages/by-id/20210730022548.GA1940096@gust.leadboat.com

Unfortunately, this seems to have broken CLOBBER_CACHE_ALWAYS builds. Since
this commit, initdb never completes due to infinite retrying over and over
(on the first RelationBuildDesc call).

Thanks for the report. I had added the debug_discard arguments of
InvalidateSystemCachesExtended() and RelationCacheInvalidate() to make the new
code survive a CREATE TABLE at debug_discard_caches=5. Apparently that's not
enough for initdb. I'll queue a task to look at it.

The explanation was more boring than that. v13 and earlier have an additional
InvalidateSystemCaches() call site, which I neglected to update. Here's the
fix I intend to push.

Attachments:

fix-inval-build-race-v1.patchtext/plain; charset=us-asciiDownload
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Fix back-patch of "Avoid race in RelationBuildDesc() ..."
    
    The back-patch of commit fdd965d074d46765c295223b119ca437dbcac973 broke
    CLOBBER_CACHE_ALWAYS for v9.6 through v13, because it updated the
    InvalidateSystemCaches() call pertaining to CLOBBER_CACHE_RECURSIVELY
    and not also the one pertaining to CLOBBER_CACHE_ALWAYS.  Back-patch to
    v13, v12, v11, and v10.
    
    Reported by Tomas Vondra.  Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/df7b4c0b-7d92-f03f-75c4-9e08b269a716@enterprisedb.com

diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 8b0503f..8fe1f0d 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -715,27 +715,27 @@ AcceptInvalidationMessages(void)
 	 * slows things by at least a factor of 10000, so I wouldn't suggest
 	 * trying to run the entire regression tests that way.  It's useful to try
 	 * a few simple tests, to make sure that cache reload isn't subject to
 	 * internal cache-flush hazards, but after you've done a few thousand
 	 * recursive reloads it's unlikely you'll learn more.
 	 */
 #if defined(CLOBBER_CACHE_ALWAYS)
 	{
 		static bool in_recursion = false;
 
 		if (!in_recursion)
 		{
 			in_recursion = true;
-			InvalidateSystemCaches();
+			InvalidateSystemCachesExtended(true);
 			in_recursion = false;
 		}
 	}
 #elif defined(CLOBBER_CACHE_RECURSIVELY)
 	{
 		static int	recursion_depth = 0;
 
 		/* Maximum depth is arbitrary depending on your threshold of pain */
 		if (recursion_depth < 3)
 		{
 			recursion_depth++;
 			InvalidateSystemCachesExtended(true);
 			recursion_depth--;
#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#9)
Re: pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURR

Michael Paquier <michael@paquier.xyz> writes:

On Tue, Feb 08, 2022 at 05:43:34PM -0800, Andres Freund wrote:

It's stamped, not tagged, so we could send out new tarballs. Or we could skip
a release number. IIRC we had to do something along those lines before.

It does not matter now, but the release is stamped and tagged.

Yeah, I see no need to do anything about this on an emergency
basis.

What do you mean with detect here?

Well, we would not be able to see that something is stuck by default,
but Noah has just answered to my question by mentioning wait_timeout
in the buildfarm configuration.

The buildfarm's wait_timeout option isn't that helpful here, because
when it triggers, the client just goes belly-up *with no report*.
So even if the CCA animals had it on, you'd not notice unless you
started to wonder why they'd not reported lately.

I think that's a bug that ought to be fixed. I do agree that
wait_timeout ought to be finite by default, too.

regards, tom lane

#12Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Noah Misch (#10)
Re: pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURR

On 2/9/22 06:41, Noah Misch wrote:

The explanation was more boring than that. v13 and earlier have an additional
InvalidateSystemCaches() call site, which I neglected to update. Here's the
fix I intend to push.

I tried this patch on 10 and 13, and it seems to fix the issue. So +1.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company