Concurrency bug with vacuum full (cluster) and toast

Started by Alexander Korotkov · almost 7 years ago · 7 messages
#1 Alexander Korotkov
a.korotkov@postgrespro.ru
1 attachment(s)

Hi all,

I've discovered a bug where VACUUM FULL fails with an error because it
couldn't find TOAST chunks pruned by its own TOAST accesses. That
happens because cluster_rel() sets OldestXmin, but TOAST accesses get
their snapshots later and independently. That causes
heap_page_prune_opt() to clean up chunks which rebuild_relation()
expects to exist. This bug very rarely happens on busy systems which
actively update TOAST values, but I found a way to reliably reproduce
it using a debugger.
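
For context: a TOAST value is stored in the table's TOAST relation as a
set of rows sharing one chunk_id, with consecutive chunk_seq numbers.
A rough sketch of how to look at them (names like pg_toast_16387 are
from my run and will differ; reading pg_toast tables directly normally
requires superuser):

-- Find the TOAST relation backing "test".
SELECT reltoastrelid::regclass AS toast_rel
FROM pg_class
WHERE relname = 'test';

-- Each value is one chunk_id with consecutive chunk_seq numbers; the
-- bug strikes when some of these rows get pruned while VACUUM FULL
-- still expects them under its older OldestXmin.
SELECT chunk_id, chunk_seq
FROM pg_toast.pg_toast_16387
ORDER BY chunk_id, chunk_seq;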

*Setup*

CREATE FUNCTION random_string(seed integer, length integer) RETURNS text
AS $$
    SELECT substr(
        string_agg(
            substr(
                encode(
                    decode(
                        md5(seed::text || '-' || i::text),
                        'hex'),
                    'base64'),
                1, 21),
            ''),
        1, length)
    FROM generate_series(1, (length + 20) / 21) i;
$$ LANGUAGE SQL;

CREATE TABLE test (val text);
INSERT INTO test VALUES (random_string(1,100000));
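
As a sanity check (assuming default TOAST settings, which slice values
into roughly 2000-byte chunks), the ~100kB value should now be stored
out of line, as about 50 chunks:

-- Expect a nonempty TOAST relation behind "test".
SELECT reltoastrelid::regclass AS toast_rel,
       pg_relation_size(reltoastrelid) AS toast_bytes
FROM pg_class
WHERE relname = 'test';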

*Reproduction steps*

s1-s3 are three parallel PostgreSQL sessions;
s3lldb is an lldb session attached to s3's backend.
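
To attach lldb to the right process, take the backend PID from s3
first and run "lldb -p <pid>":

-- Run in s3 before attaching the debugger.
SELECT pg_backend_pid();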

First, s1 acquires a snapshot and holds it.

s1# begin transaction isolation level repeatable read;
s1# select 1;

Then s2 updates our TOASTed value several times.

s2# update test set val = random_string(2,100000);
s2# update test set val = random_string(3,100000);
s2# update test set val = random_string(4,100000);
s2# update test set val = random_string(5,100000);
s2# update test set val = random_string(6,100000);
s2# update test set val = random_string(7,100000);
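
At this point the TOAST table holds one live generation of chunks plus
six dead ones, which s1's open snapshot keeps from being pruned. The
pgstattuple extension can confirm that (pg_toast_16387 is, again, the
name from my run):

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Roughly 50 live chunks for the current value, and about 300 dead
-- ones from the six superseded generations held back by s1's snapshot.
SELECT tuple_count, dead_tuple_count
FROM pgstattuple('pg_toast.pg_toast_16387');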

Then s3 starts VACUUM FULL, stopping at vacuum_set_xid_limits().

s3lldb# b vacuum_set_xid_limits
s3# vacuum full test;

We step past vacuum_set_xid_limits(), making sure the old tuple
versions made by s2 will count as recently dead for VACUUM FULL.

s3lldb# finish

Then s1 releases its snapshot. After that, heap_page_prune_opt(),
called from TOAST accesses, can clean up the TOAST chunks which VACUUM
FULL expects to be merely recently dead.

s1# commit;

Finally, we continue our VACUUM FULL and get the error!

s3lldb# continue
s3#
ERROR: unexpected chunk number 50 (expected 2) for toast value 16429
in pg_toast_16387
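
The chunks really are gone at that point: a scan under a fresh
snapshot no longer sees value 16429 at all, while VACUUM FULL, reading
under its older OldestXmin, still expected them (identifiers are from
my run):

-- Returns zero rows: the superseded chunks were pruned away, and a
-- fresh snapshot could not see the deleted versions anyway; only
-- VACUUM FULL's older horizon still expected them to exist.
SELECT chunk_id, chunk_seq
FROM pg_toast.pg_toast_16387
WHERE chunk_id = 16429
ORDER BY chunk_seq;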

The attached patch contains a dirty fix for this bug: it simply
prevents heap_page_prune_opt() from cleaning tuples when called from
within rebuild_relation(). It's not something I'm proposing to commit
or even review; it might just serve as a starting point for thoughts.

Any ideas?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

cluster_toast_concurrency_fix.patch (application/octet-stream)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a3e51922d85..c2eadf58f87 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -20,6 +20,7 @@
 #include "access/htup_details.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
+#include "commands/cluster.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
@@ -84,6 +85,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	if (RecoveryInProgress())
 		return;
 
+	if (is_inside_rebuild_relation())
+		return;
+
 	/*
 	 * Use the appropriate xmin horizon for this relation. If it's a proper
 	 * catalog relation or a user defined, additional, catalog relation, we
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 3e2a807640f..0493d748dc8 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -561,6 +561,8 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	table_close(pg_index, RowExclusiveLock);
 }
 
+bool inside_rebuild_relation = false;
+
 /*
  * rebuild_relation: rebuild an existing relation in index or physical order
  *
@@ -581,36 +583,55 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
 
-	/* Mark the correct index as clustered */
-	if (OidIsValid(indexOid))
-		mark_index_clustered(OldHeap, indexOid, true);
+	PG_TRY();
+	{
+		inside_rebuild_relation = true;
 
-	/* Remember info about rel before closing OldHeap */
-	relpersistence = OldHeap->rd_rel->relpersistence;
-	is_system_catalog = IsSystemRelation(OldHeap);
+		/* Mark the correct index as clustered */
+		if (OidIsValid(indexOid))
+			mark_index_clustered(OldHeap, indexOid, true);
 
-	/* Close relcache entry, but keep lock until transaction commit */
-	table_close(OldHeap, NoLock);
+		/* Remember info about rel before closing OldHeap */
+		relpersistence = OldHeap->rd_rel->relpersistence;
+		is_system_catalog = IsSystemRelation(OldHeap);
 
-	/* Create the transient table that will receive the re-ordered data */
-	OIDNewHeap = make_new_heap(tableOid, tableSpace,
-							   relpersistence,
-							   AccessExclusiveLock);
+		/* Close relcache entry, but keep lock until transaction commit */
+		table_close(OldHeap, NoLock);
 
-	/* Copy the heap data into the new table in the desired order */
-	copy_heap_data(OIDNewHeap, tableOid, indexOid, verbose,
-				   &swap_toast_by_content, &frozenXid, &cutoffMulti);
+		/* Create the transient table that will receive the re-ordered data */
+		OIDNewHeap = make_new_heap(tableOid, tableSpace,
+								   relpersistence,
+								   AccessExclusiveLock);
 
-	/*
-	 * Swap the physical files of the target and transient tables, then
-	 * rebuild the target's indexes and throw away the transient table.
-	 */
-	finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
-					 swap_toast_by_content, false, true,
-					 frozenXid, cutoffMulti,
-					 relpersistence);
+		/* Copy the heap data into the new table in the desired order */
+		copy_heap_data(OIDNewHeap, tableOid, indexOid, verbose,
+					   &swap_toast_by_content, &frozenXid, &cutoffMulti);
+
+		/*
+		 * Swap the physical files of the target and transient tables, then
+		 * rebuild the target's indexes and throw away the transient table.
+		 */
+		finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
+						 swap_toast_by_content, false, true,
+						 frozenXid, cutoffMulti,
+						 relpersistence);
+
+		/* Re-enable opportunistic pruning on the success path too */
+		inside_rebuild_relation = false;
+	}
+	PG_CATCH();
+	{
+		inside_rebuild_relation = false;
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
 }
 
+bool
+is_inside_rebuild_relation(void)
+{
+	return inside_rebuild_relation;
+}
 
 /*
  * Create the transient table that will be filled with new data during
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index d30e9c66dae..ab2ee8046d8 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -34,5 +34,6 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 				 TransactionId frozenXid,
 				 MultiXactId minMulti,
 				 char newrelpersistence);
+extern bool is_inside_rebuild_relation(void);
 
 #endif							/* CLUSTER_H */
#2 Robert Haas
robertmhaas@gmail.com
In reply to: Alexander Korotkov (#1)
Re: Concurrency bug with vacuum full (cluster) and toast

On Mon, Mar 18, 2019 at 12:53 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

I've discovered a bug where VACUUM FULL fails with an error because it
couldn't find TOAST chunks pruned by its own TOAST accesses. That
happens because cluster_rel() sets OldestXmin, but TOAST accesses get
their snapshots later and independently. That causes
heap_page_prune_opt() to clean up chunks which rebuild_relation()
expects to exist. This bug very rarely happens on busy systems which
actively update TOAST values, but I found a way to reliably reproduce
it using a debugger.

Boy, I really feel like we've talked about this before. These are
somewhat-related discussions, but none of them are exactly the same
thing:

/messages/by-id/1335.1304187758@sss.pgh.pa.us
/messages/by-id/20362.1359747327@sss.pgh.pa.us
/messages/by-id/87in8nec96.fsf@news-spur.riddles.org.uk

I don't know whether we've actually talked about this precise problem
before and I just can't find the thread, or whether I'm confusing what
you've found here with some closely-related issue.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3 Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Robert Haas (#2)
Re: Concurrency bug with vacuum full (cluster) and toast

On Tue, Mar 19, 2019 at 6:48 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Mar 18, 2019 at 12:53 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

I've discovered a bug where VACUUM FULL fails with an error because it
couldn't find TOAST chunks pruned by its own TOAST accesses. That
happens because cluster_rel() sets OldestXmin, but TOAST accesses get
their snapshots later and independently. That causes
heap_page_prune_opt() to clean up chunks which rebuild_relation()
expects to exist. This bug very rarely happens on busy systems which
actively update TOAST values, but I found a way to reliably reproduce
it using a debugger.

Boy, I really feel like we've talked about this before. These are
somewhat-related discussions, but none of them are exactly the same
thing:

/messages/by-id/1335.1304187758@sss.pgh.pa.us
/messages/by-id/20362.1359747327@sss.pgh.pa.us
/messages/by-id/87in8nec96.fsf@news-spur.riddles.org.uk

I don't know whether we've actually talked about this precise problem
before and I just can't find the thread, or whether I'm confusing what
you've found here with some closely-related issue.

Thank you for the pointers, but none of the threads you mentioned
describe this exact problem. Now I see this bug has a set of cute
siblings :)

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#4 Robert Haas
robertmhaas@gmail.com
In reply to: Alexander Korotkov (#3)
Re: Concurrency bug with vacuum full (cluster) and toast

On Tue, Mar 19, 2019 at 1:37 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

Thank you for the pointers, but none of the threads you mentioned
describe this exact problem. Now I see this bug has a set of cute
siblings :)

Yeah. I really thought this precise issue -- the interlocking between
the VACUUM of the main table and the VACUUM of the TOAST table -- had
been discussed somewhere before. But I couldn't find that discussion.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5 Justin Pryzby
pryzby@telsasoft.com
In reply to: Robert Haas (#4)
Re: Concurrency bug with vacuum full (cluster) and toast

Hi,

On Fri, Mar 22, 2019 at 02:27:07PM -0400, Robert Haas wrote:

On Tue, Mar 19, 2019 at 1:37 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

Thank you for the pointers, but none of the threads you mentioned
describe this exact problem. Now I see this bug has a set of cute
siblings :)

Yeah. I really thought this precise issue -- the interlocking between
the VACUUM of the main table and the VACUUM of the TOAST table -- had
been discussed somewhere before. But I couldn't find that discussion.

That also describes the longstanding issue with pg_statistic /
pg_toast_2619, no?

I think that's maybe what Robert is remembering, and searching for
pg_toast_2619 gives a good number of results (including my own problem report).

Is this an "Open Item"?

#6 Stephen Frost
sfrost@snowman.net
In reply to: Justin Pryzby (#5)
Re: Concurrency bug with vacuum full (cluster) and toast

Greetings,

* Justin Pryzby (pryzby@telsasoft.com) wrote:

On Fri, Mar 22, 2019 at 02:27:07PM -0400, Robert Haas wrote:

On Tue, Mar 19, 2019 at 1:37 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

Thank you for the pointers, but none of the threads you mentioned
describe this exact problem. Now I see this bug has a set of cute
siblings :)

Yeah. I really thought this precise issue -- the interlocking between
the VACUUM of the main table and the VACUUM of the TOAST table -- had
been discussed somewhere before. But I couldn't find that discussion.

That also describes the longstanding issue with pg_statistic /
pg_toast_2619, no?

I think that's maybe what Robert is remembering, and searching for
pg_toast_2619 gives a good number of results (including my own problem report).

Is this an "Open Item"?

If you're referring to the v12 open items list, then no, I wouldn't
think it belongs there, as it's not a new issue (unless I've
misunderstood). Only regressions from prior versions are appropriate
for the v12 open items list. Long-standing bugs/issues should be
addressed, of course, but those would be fixed and then back-patched.

Thanks!

Stephen

#7 Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#6)
Re: Concurrency bug with vacuum full (cluster) and toast

On Wed, Apr 03, 2019 at 11:26:20AM -0400, Stephen Frost wrote:

If you're referring to the v12 open items list, then no, I wouldn't
think it belongs there, as it's not a new issue (unless I've
misunderstood). Only regressions from prior versions are appropriate
for the v12 open items list. Long-standing bugs/issues should be
addressed, of course, but those would be fixed and then back-patched.

Please, no open items which do not apply directly and only to v12.
There is, however, a section on the page for older bugs, which could
prove useful for this case (items listed in that section normally do
not have any impact on the release):
https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items#Older_Bugs
--
Michael