vac_update_datfrozenxid will raise "wrong tuple length" if pg_database tuple contains toast attribute.
Hi hackers,
Recently, we encounter an issue that if the database grant too many roles that `datacl` toasted, vacuum freeze will fail with error "wrong tuple length".
To reproduce the issue, please follow below steps:
CREATE DATABASE vacuum_freeze_test;
-- create helper function
create or replace function toast_pg_database_datacl() returns text as $body$
declare
mycounter int;
begin
for mycounter in select i from generate_series(1, 2800) i loop
execute 'create role aaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb' || mycounter;
execute 'grant ALL on database vacuum_freeze_test to aaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb' || mycounter;
end loop;
return 'ok';
end;
$body$ language plpgsql volatile strict;
-- create roles and grant on the database
select toast_pg_database_datacl();
-- connect to the database
\c vacuum_freeze_test
-- chech the size of column datacl
select datname, pg_column_size(datacl) as datacl_size, age(datfrozenxid) from pg_database where datname='vacuum_freeze_test';
-- execute vacuum freeze and it should raise "wrong tuple length"
vacuum freeze;
The root cause is that vac_update_datfrozenxid fetch a copy of pg_database tuple from system cache.
But the system cache flatten any toast attributes, which cause the length chech failed in heap_inplace_update.
A path is attached co auther by Ashwin Agrawal, the solution is to fetch the pg_database tuple from disk instead of system cache if needed.
Attachments:
v1-0001-Fix-vacuum-freeze-with-pg_database-toast-attribute.patchapplication/octet-stream; name=v1-0001-Fix-vacuum-freeze-with-pg_database-toast-attribute.patchDownload
From a08e420a59244c98ba5be8ed5b17746ee5bf5404 Mon Sep 17 00:00:00 2001
From: "Junfeng(Jerome) Yang" <jeyang@pivotal.io>
Date: Wed, 18 Nov 2020 14:16:23 +0800
Subject: [PATCH] Fix vacuum freeze with pg_database toast attribute.
In vac_update_datfrozenxid(), the pg_database tuple for current
database should be fetched from disk heap table instead of system cache.
Since the cache already flatten toast tuple, so if the tuple in
pg_database contains toast attribute, heap_inplace_update() will fail
with "wrong tuple length".
Co-authored-by: Ashwin Agrawal <aashwin@vmware.com>
---
src/backend/commands/vacuum.c | 84 ++++++++++++++++++++--------
src/test/regress/expected/vacuum.out | 58 +++++++++++++++++++
src/test/regress/sql/vacuum.sql | 40 +++++++++++++
3 files changed, 160 insertions(+), 22 deletions(-)
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index d89ba3031f..be2c7e17ef 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1328,6 +1328,35 @@ vac_update_relstats(Relation relation,
table_close(rd, RowExclusiveLock);
}
+/*
+ * fetch_database_tuple - Fetch a copy of database tuple from pg_database.
+ *
+ * This using disk heap table instead of system cache.
+ * relation: opened pg_database relation in vac_update_datfrozenxid().
+ */
+static HeapTuple
+fetch_database_tuple(Relation relation, Oid dbOid)
+{
+ ScanKeyData skey[1];
+ SysScanDesc sscan;
+ HeapTuple tuple = NULL;
+
+ ScanKeyInit(&skey[0],
+ Anum_pg_database_oid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(dbOid));
+
+ sscan = systable_beginscan(relation, DatabaseOidIndexId, true,
+ NULL, 1, skey);
+
+ tuple = systable_getnext(sscan);
+ if (HeapTupleIsValid(tuple))
+ tuple = heap_copytuple(tuple);
+
+ systable_endscan(sscan);
+
+ return tuple;
+}
/*
* vac_update_datfrozenxid() -- update pg_database.datfrozenxid for our DB
@@ -1350,8 +1379,8 @@ vac_update_relstats(Relation relation,
void
vac_update_datfrozenxid(void)
{
- HeapTuple tuple;
- Form_pg_database dbform;
+ HeapTuple cached_tuple;
+ Form_pg_database cached_dbform;
Relation relation;
SysScanDesc scan;
HeapTuple classTup;
@@ -1479,42 +1508,53 @@ vac_update_datfrozenxid(void)
/* Now fetch the pg_database tuple we need to update. */
relation = table_open(DatabaseRelationId, RowExclusiveLock);
- /* Fetch a copy of the tuple to scribble on */
- tuple = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
- if (!HeapTupleIsValid(tuple))
- elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
- dbform = (Form_pg_database) GETSTRUCT(tuple);
+ /* Get the pg_database tuple to scribble on */
+ cached_tuple = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+ cached_dbform = (Form_pg_database) GETSTRUCT(cached_tuple);
/*
* As in vac_update_relstats(), we ordinarily don't want to let
* datfrozenxid go backward; but if it's "in the future" then it must be
* corrupt and it seems best to overwrite it.
*/
- if (dbform->datfrozenxid != newFrozenXid &&
- (TransactionIdPrecedes(dbform->datfrozenxid, newFrozenXid) ||
- TransactionIdPrecedes(lastSaneFrozenXid, dbform->datfrozenxid)))
- {
- dbform->datfrozenxid = newFrozenXid;
+ if (cached_dbform->datfrozenxid != newFrozenXid &&
+ (TransactionIdPrecedes(cached_dbform->datfrozenxid, newFrozenXid) ||
+ TransactionIdPrecedes(lastSaneFrozenXid, cached_dbform->datfrozenxid)))
dirty = true;
- }
else
- newFrozenXid = dbform->datfrozenxid;
+ newFrozenXid = cached_dbform->datfrozenxid;
/* Ditto for datminmxid */
- if (dbform->datminmxid != newMinMulti &&
- (MultiXactIdPrecedes(dbform->datminmxid, newMinMulti) ||
- MultiXactIdPrecedes(lastSaneMinMulti, dbform->datminmxid)))
- {
- dbform->datminmxid = newMinMulti;
+ if (cached_dbform->datminmxid != newMinMulti &&
+ (MultiXactIdPrecedes(cached_dbform->datminmxid, newMinMulti) ||
+ MultiXactIdPrecedes(lastSaneMinMulti, cached_dbform->datminmxid)))
dirty = true;
- }
else
- newMinMulti = dbform->datminmxid;
+ newMinMulti = cached_dbform->datminmxid;
if (dirty)
+ {
+ HeapTuple tuple;
+ Form_pg_database tmp_dbform;
+ /*
+ * Fetch a copy of the tuple to scribble on from pg_database disk
+ * heap table instead of system cache
+ * "SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId))".
+ * Since the cache already flatten toast tuple, so the
+ * heap_inplace_update will fail with "wrong tuple length".
+ */
+ tuple = fetch_database_tuple(relation, MyDatabaseId);
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
+ tmp_dbform = (Form_pg_database) GETSTRUCT(tuple);
+ tmp_dbform->datfrozenxid = newFrozenXid;
+ tmp_dbform->datminmxid = newMinMulti;
+
heap_inplace_update(relation, tuple);
+ heap_freetuple(tuple);
+ }
- heap_freetuple(tuple);
+ ReleaseSysCache(cached_tuple);
table_close(relation, RowExclusiveLock);
/*
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index 3fccb183c0..5ff2732a70 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -383,3 +383,61 @@ RESET ROLE;
DROP TABLE vacowned;
DROP TABLE vacowned_parted;
DROP ROLE regress_vacuum;
+-- Vacuum freeze for database with toast attribute in pg_database tuple cause
+-- heap_inplace_update raise error "wrong tuple length". This is because system
+-- cache flatten toast tuple.
+DROP DATABASE IF EXISTS vacuum_freeze_test;
+NOTICE: database "vacuum_freeze_test" does not exist, skipping
+CREATE DATABASE vacuum_freeze_test;
+create or replace function toast_pg_database_datacl() returns text as $body$
+declare
+ mycounter int;
+begin
+ for mycounter in select i from generate_series(1, 2800) i loop
+ execute 'create role aaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb' || mycounter;
+ execute 'grant ALL on database vacuum_freeze_test to aaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb' || mycounter;
+ end loop;
+ return 'ok';
+end;
+$body$ language plpgsql volatile strict;
+create or replace function clean_roles() returns text as $body$
+declare
+ mycounter int;
+begin
+ for mycounter in select i from generate_series(1, 2800) i loop
+ execute 'drop role aaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb' || mycounter;
+ end loop;
+ return 'ok';
+end;
+$body$ language plpgsql volatile strict;
+select toast_pg_database_datacl();
+ toast_pg_database_datacl
+--------------------------
+ ok
+(1 row)
+
+\c vacuum_freeze_test
+create temp table before_vacuum as select datname, pg_column_size(datacl) > 8192 as datacl_size, age(datfrozenxid) from pg_database where datname='vacuum_freeze_test';
+select datname, datacl_size from before_vacuum;
+ datname | datacl_size
+--------------------+-------------
+ vacuum_freeze_test | t
+(1 row)
+
+vacuum freeze;
+select datname, pg_column_size(datacl) > 8192 as datacl_size, age(datfrozenxid) != (select age from before_vacuum) as age_changed from pg_database where datname='vacuum_freeze_test';
+ datname | datacl_size | age_changed
+--------------------+-------------+-------------
+ vacuum_freeze_test | t | t
+(1 row)
+
+\c regression
+DROP DATABASE vacuum_freeze_test;
+select clean_roles();
+ clean_roles
+-------------
+ ok
+(1 row)
+
+drop function toast_pg_database_datacl();
+drop function clean_roles();
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index c7b5f96f6b..79fc8e91e0 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -294,3 +294,43 @@ RESET ROLE;
DROP TABLE vacowned;
DROP TABLE vacowned_parted;
DROP ROLE regress_vacuum;
+
+-- Vacuum freeze for database with toast attribute in pg_database tuple cause
+-- heap_inplace_update raise error "wrong tuple length". This is because system
+-- cache flatten toast tuple.
+DROP DATABASE IF EXISTS vacuum_freeze_test;
+CREATE DATABASE vacuum_freeze_test;
+create or replace function toast_pg_database_datacl() returns text as $body$
+declare
+ mycounter int;
+begin
+ for mycounter in select i from generate_series(1, 2800) i loop
+ execute 'create role aaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb' || mycounter;
+ execute 'grant ALL on database vacuum_freeze_test to aaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb' || mycounter;
+ end loop;
+ return 'ok';
+end;
+$body$ language plpgsql volatile strict;
+
+create or replace function clean_roles() returns text as $body$
+declare
+ mycounter int;
+begin
+ for mycounter in select i from generate_series(1, 2800) i loop
+ execute 'drop role aaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb' || mycounter;
+ end loop;
+ return 'ok';
+end;
+$body$ language plpgsql volatile strict;
+
+select toast_pg_database_datacl();
+\c vacuum_freeze_test
+create temp table before_vacuum as select datname, pg_column_size(datacl) > 8192 as datacl_size, age(datfrozenxid) from pg_database where datname='vacuum_freeze_test';
+select datname, datacl_size from before_vacuum;
+vacuum freeze;
+select datname, pg_column_size(datacl) > 8192 as datacl_size, age(datfrozenxid) != (select age from before_vacuum) as age_changed from pg_database where datname='vacuum_freeze_test';
+\c regression
+DROP DATABASE vacuum_freeze_test;
+select clean_roles();
+drop function toast_pg_database_datacl();
+drop function clean_roles();
--
2.17.1
Hi hackers,
Can anyone help to verify this?
On Tue, Nov 24, 2020 at 2:07 PM Junfeng Yang <yjerome@vmware.com> wrote:
Hi hackers,
Can anyone help to verify this?
I think one way to get feedback is to register this patch for the next
commit fest (https://commitfest.postgresql.org/31/)
--
With Regards,
Amit Kapila.
On Wed, Nov 18, 2020 at 06:32:51AM +0000, Junfeng Yang wrote:
A path is attached co auther by Ashwin Agrawal, the solution is to
fetch the pg_database tuple from disk instead of system cache if
needed.
Yeah, we had better fix and I guess backpatch something here. That's
annoying.
+DROP DATABASE IF EXISTS vacuum_freeze_test;
+NOTICE: database "vacuum_freeze_test" does not exist, skipping
+CREATE DATABASE vacuum_freeze_test;
This test is costly. Extracting that into a TAP test would be more
adapted, still I am not really convinced that this is a case worth
spending extra cycles on.
+ tuple = systable_getnext(sscan);
+ if (HeapTupleIsValid(tuple))
+ tuple = heap_copytuple(tuple);
That's an unsafe pattern as the tuple scanned could finish by being
invalid.
One thing that strikes me as unwise is that we could run into similar
problems with vac_update_relstats() in the future, and there have been
recent talks about having more toast tables like for pg_class. That
is not worth caring about on stable branches because it is not an
active problem there, but we could do something better on HEAD.
+ /* Get the pg_database tuple to scribble on */
+ cached_tuple = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+ cached_dbform = (Form_pg_database) GETSTRUCT(cached_tuple);
Er, shouldn't we *not* use the system cache at all here then? Or am I
missing something obvious? Please see the attached, that is more
simpleg.
--
Michael
Attachments:
pg_database-vacuum-v2.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 395e75f768..6cab35126b 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1361,6 +1361,7 @@ vac_update_datfrozenxid(void)
MultiXactId lastSaneMinMulti;
bool bogus = false;
bool dirty = false;
+ ScanKeyData key[1];
/*
* Restrict this task to one backend per database. This avoids race
@@ -1479,10 +1480,24 @@ vac_update_datfrozenxid(void)
/* Now fetch the pg_database tuple we need to update. */
relation = table_open(DatabaseRelationId, RowExclusiveLock);
- /* Fetch a copy of the tuple to scribble on */
- tuple = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+ /*
+ * Get the pg_database tuple to scribble on, and do not rely on
+ * the syscache to avoid problem with toasted values.
+ */
+ ScanKeyInit(&key[0],
+ Anum_pg_database_oid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(MyDatabaseId));
+
+ scan = systable_beginscan(relation, DatabaseOidIndexId, true,
+ NULL, 1, key);
+ tuple = systable_getnext(scan);
+ tuple = heap_copytuple(tuple);
+ systable_endscan(scan);
+
if (!HeapTupleIsValid(tuple))
elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
+
dbform = (Form_pg_database) GETSTRUCT(tuple);
/*
On Wed, Nov 25, 2020 at 04:12:15PM +0900, Michael Paquier wrote:
Yeah, we had better fix and I guess backpatch something here. That's
annoying.
I have considered that over the weekend, and let's be conservative.
pg_database has gained a toast table since 12, and before that we
would just have an error "row is too big", but nobody has really
complained about that either.
One thing that strikes me as unwise is that we could run into similar
problems with vac_update_relstats() in the future, and there have been
recent talks about having more toast tables like for pg_class. That
is not worth caring about on stable branches because it is not an
active problem there, but we could do something better on HEAD.
For now, I have added just a comment at the top of
heap_inplace_update() to warn callers.
Junfeng and Ashwin also mentioned to me off-list that their patch used
a second copy for performance reasons, but I don't see why this could
become an issue as we update each pg_database row only once a job is
done. So I'd like to apply something like the attached on HEAD,
comments are welcome.
--
Michael
Attachments:
pg_database-vacuum-v3.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1b2f70499e..992e559216 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -5710,6 +5710,9 @@ heap_abort_speculative(Relation relation, ItemPointer tid)
*
* tuple is an in-memory tuple structure containing the data to be written
* over the target tuple. Also, tuple->t_self identifies the target tuple.
+ *
+ * Note that the tuple updated here had better not come directly from the
+ * syscache if the relation has toasted values as these would be flattened.
*/
void
heap_inplace_update(Relation relation, HeapTuple tuple)
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index f1112111de..98270a1049 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1361,6 +1361,7 @@ vac_update_datfrozenxid(void)
MultiXactId lastSaneMinMulti;
bool bogus = false;
bool dirty = false;
+ ScanKeyData key[1];
/*
* Restrict this task to one backend per database. This avoids race
@@ -1479,10 +1480,25 @@ vac_update_datfrozenxid(void)
/* Now fetch the pg_database tuple we need to update. */
relation = table_open(DatabaseRelationId, RowExclusiveLock);
- /* Fetch a copy of the tuple to scribble on */
- tuple = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+ /*
+ * Get the pg_database tuple to scribble on. Note that this does not
+ * directly rely on the syscache to avoid issues with flattened toast
+ * values for the in-place update.
+ */
+ ScanKeyInit(&key[0],
+ Anum_pg_database_oid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(MyDatabaseId));
+
+ scan = systable_beginscan(relation, DatabaseOidIndexId, true,
+ NULL, 1, key);
+ tuple = systable_getnext(scan);
+ tuple = heap_copytuple(tuple);
+ systable_endscan(scan);
+
if (!HeapTupleIsValid(tuple))
elog(ERROR, "could not find tuple for database %u", MyDatabaseId);
+
dbform = (Form_pg_database) GETSTRUCT(tuple);
/*
On Sun, Nov 29, 2020 at 5:27 PM Michael Paquier <michael@paquier.xyz> wrote:
One thing that strikes me as unwise is that we could run into similar
problems with vac_update_relstats() in the future, and there have been
recent talks about having more toast tables like for pg_class. That
is not worth caring about on stable branches because it is not an
active problem there, but we could do something better on HEAD.For now, I have added just a comment at the top of
heap_inplace_update() to warn callers.
I am thinking if there is some way to assert this aspect, but seems no way.
So, yes, having at least a comment is good for now.
Junfeng and Ashwin also mentioned to me off-list that their patch used
a second copy for performance reasons, but I don't see why this could
become an issue as we update each pg_database row only once a job is
done. So I'd like to apply something like the attached on HEAD,
comments are welcome.
Yes the attached patch looks good to me for PostgreSQL. Thanks Michael.
(In Greenplum, due to per table dispatch to segments, during database wide
vacuum this function gets called per table instead of only at the end, hence
we were trying to be conservative.)
On Mon, Nov 30, 2020 at 04:37:14PM -0800, Ashwin Agrawal wrote:
I am thinking if there is some way to assert this aspect, but seems no way.
So, yes, having at least a comment is good for now.
Yeah, I have been thinking about this one without coming to a clear
idea, but that would be nice.
Yes the attached patch looks good to me for PostgreSQL. Thanks Michael.
Okay, committed to HEAD then.
--
Michael
Hi Michael,
I got a customer case that matches this issue. The customer is on an
old version of 12, but I see the patch only went into 14:
On Tue, 8 Dec 2020 at 00:32, Michael Paquier <michael@paquier.xyz> wrote:
Yes the attached patch looks good to me for PostgreSQL. Thanks Michael.
Okay, committed to HEAD then.
commit 947789f1f5fb61daf663f26325cbe7cad8197d58
Author: Michael Paquier <michael@paquier.xyz>
Date: Tue Dec 8 12:13:19 2020 +0900
Avoid using tuple from syscache for update of pg_database.datfrozenxid
Would it be possible to backport the patch to 12 and 13?
--
Arthur Nascimento
EDB
On Thu, Jan 19, 2023 at 10:42:47AM -0300, Arthur Nascimento wrote:
Would it be possible to backport the patch to 12 and 13?
This was recently back-patched [0]/messages/by-id/Y70XNVbUWQsR2Car@paquier.xyz [1]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c0ee694 [2]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=72b6098.
[0]: /messages/by-id/Y70XNVbUWQsR2Car@paquier.xyz
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c0ee694
[2]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=72b6098
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, 19 Jan 2023 at 14:10, Nathan Bossart <nathandbossart@gmail.com> wrote:
This was recently back-patched [0] [1] [2].
Oh, I see that now. Thanks! Sorry for the noise.
--
Arthur Nascimento
EDB