Drop database command will raise "wrong tuple length" if pg_database tuple contains toast attribute.

Started by Ayush Tiwariover 1 year ago6 messages
#1Ayush Tiwari
ayushtiwari.slg01@gmail.com
1 attachment(s)

Hi hackers,

We encountered an issue lately, that if the database grants too many roles
`datacl` is toasted, following which, the drop database command will fail
with error "wrong tuple length".

To reproduce the issue, please follow below steps:

CREATE DATABASE test;

-- create helper function
CREATE OR REPLACE FUNCTION data_tuple() returns text as $body$
declare
mycounter int;
begin
for mycounter in select i from generate_series(1,2000) i loop
execute 'CREATE
ROLE aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbb ' || mycounter;
execute 'GRANT ALL ON DATABASE test to
aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbb ' || mycounter;
end loop;
return 'ok';
end;
$body$ language plpgsql volatile strict;

-- create roles and grant on the database.
SELECT data_tuple();

-- drop database command, this will result in "wrong tuple length" error.
DROP DATABASE test;

The root cause of this behaviour is that the HeapTuple in dropdb
function fetches a copy of pg_database tuple from system cache.
But the system cache flattens any toast attributes, which cause the length
check to fail in heap_inplace_update.

A patch for this issue is attached to the mail, the solution is to
change the logic to fetch the tuple by directly scanning pg_database rather
than using the catcache.

Regards,
Ayush

Attachments:

0001-Fix-drop-database-with-pg_database-toast-attribute.patchapplication/x-patch; name=0001-Fix-drop-database-with-pg_database-toast-attribute.patchDownload
From 52bfa40c67815e5929f09d6b5c48310c6a4db49b Mon Sep 17 00:00:00 2001
From: Ayush Tiwari <aytiwari@microsoft.com>
Date: Tue, 13 Aug 2024 02:54:57 +0530
Subject: [PATCH] Fix-drop-database-with-pg_database-toast-attribute

---
 src/backend/commands/dbcommands.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 7026352bc9..b5673217b8 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1649,6 +1649,8 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	bool		db_istemplate;
 	Relation	pgdbrel;
 	HeapTuple	tup;
+	ScanKeyData scankey;
+	SysScanDesc scan;
 	Form_pg_database datform;
 	int			notherbackends;
 	int			npreparedxacts;
@@ -1786,7 +1788,15 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 */
 	pgstat_drop_database(db_id);
 
-	tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
+	ScanKeyInit(&scankey,
+				Anum_pg_database_datname,
+				BTEqualStrategyNumber, F_NAMEEQ,
+				CStringGetDatum(dbname));
+
+	scan = systable_beginscan(pgdbrel, DatabaseNameIndexId, true,
+							  NULL, 1, &scankey);
+	tup = systable_getnext(scan);
+
 	if (!HeapTupleIsValid(tup))
 		elog(ERROR, "cache lookup failed for database %u", db_id);
 	datform = (Form_pg_database) GETSTRUCT(tup);
@@ -1812,6 +1822,8 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 */
 	CatalogTupleDelete(pgdbrel, &tup->t_self);
 
+	systable_endscan(scan);
+
 	/*
 	 * Drop db-specific replication slots.
 	 */
-- 
2.34.1

#2Tomas Vondra
tomas@vondra.me
In reply to: Ayush Tiwari (#1)
Re: Drop database command will raise "wrong tuple length" if pg_database tuple contains toast attribute.

Hi Ayush,

On 8/13/24 07:37, Ayush Tiwari wrote:

Hi hackers,

We encountered an issue lately, that if the database grants too many
roles `datacl` is toasted, following which, the drop database command
will fail with error "wrong tuple length".

To reproduce the issue, please follow below steps:

CREATE DATABASE test;

-- create helper function
CREATE OR REPLACE FUNCTION data_tuple() returns text as $body$
declare
          mycounter int;
begin
          for mycounter in select i from generate_series(1,2000) i loop
                    execute 'CREATE
ROLE aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbb ' || mycounter;
                    execute 'GRANT ALL ON DATABASE test to
aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbb ' || mycounter;
          end loop;
          return 'ok';
end;
$body$ language plpgsql volatile strict; 

-- create roles and grant on the database.
SELECT data_tuple(); 

-- drop database command, this will result in "wrong tuple length" error.
DROP DATABASE test;

The root cause of this behaviour is that the HeapTuple in dropdb
function fetches a copy of pg_database tuple from system cache.
But the system cache flattens any toast attributes, which cause the
length check to fail in heap_inplace_update.

A patch for this issue is attached to the mail, the solution is to
change the logic to fetch the tuple by directly scanning pg_database
rather than using the catcache.

Thanks for the report. I can reproduce the issue following your
instructions, and the fix seems reasonable ...

But there's also one thing I don't quite understand. I did look for
other places that might have a similar issue, that is places that

1) lookup tuple using SearchSysCacheCopy1

2) call on the tuple heap_inplace_update

And I found about four places doing that:

- index_update_stats (src/backend/catalog/index.c)

- create_toast_table (src/backend/catalog/toasting.c)

- vac_update_relstats / vac_update_datfrozenxid (commands/vacuum.c)

But I haven't managed to trigger the same kind of failure for any of
those places, despite trying. AFAIK that's because those places update
pg_class, and that doesn't have TOAST, so the tuple length can't change.

So this fix seems reasonable.

--
Tomas Vondra

#3Tomas Vondra
tomas@vondra.me
In reply to: Tomas Vondra (#2)
Re: Drop database command will raise "wrong tuple length" if pg_database tuple contains toast attribute.

On 8/16/24 13:26, Tomas Vondra wrote:

Hi Ayush,

...

So this fix seems reasonable.

I've pushed this to all affected branches, except for 11 which is EOL.

I thought about adding a test, but I couldn't think of a TAP test where
this would really fit, and it didn't seem very practical to have a test
creating hundreds of roles. So I abandoned the idea.

Thanks for the report and the fix!

--
Tomas Vondra

#4Yugo Nagata
nagata@sraoss.co.jp
In reply to: Tomas Vondra (#3)
1 attachment(s)
Re: Drop database command will raise "wrong tuple length" if pg_database tuple contains toast attribute.

On Mon, 19 Aug 2024 00:35:39 +0200
Tomas Vondra <tomas@vondra.me> wrote:

On 8/16/24 13:26, Tomas Vondra wrote:

Hi Ayush,

...

So this fix seems reasonable.

I've pushed this to all affected branches, except for 11 which is EOL.

I thought about adding a test, but I couldn't think of a TAP test where
this would really fit, and it didn't seem very practical to have a test
creating hundreds of roles. So I abandoned the idea.

I tried to add Assert in heap_inplace_update to prevent possible similar
failures, but I gave up because I could not find a good way to determine if
a tuple is detoasted of not.

By the way, I found a comment in vac_update_datfrozenxid() and EventTriggerOnLogin()
that explains why we could not use tuples from the syscache for heap_inplace_update.
I think it is better ad d the same comment in dropdb(). I attached a trivial patch for it.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>

Attachments:

improve_comment_on_dropdb.patchtext/x-diff; name=improve_comment_on_dropdb.patchDownload
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 7a7e2c6e3e..d00ae40e19 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1790,7 +1790,9 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	pgstat_drop_database(db_id);
 
 	/*
-	 * Update the database's pg_database tuple
+	 * Get the pg_database tuple to scribble on.  Note that this does not
+	 * directly rely on the syscache to avoid issues with flattened toast
+	 * values for the in-place update.
 	 */
 	ScanKeyInit(&scankey,
 				Anum_pg_database_datname,
#5Tomas Vondra
tomas@vondra.me
In reply to: Yugo Nagata (#4)
Re: Drop database command will raise "wrong tuple length" if pg_database tuple contains toast attribute.

On 8/19/24 11:01, Yugo Nagata wrote:

On Mon, 19 Aug 2024 00:35:39 +0200
Tomas Vondra <tomas@vondra.me> wrote:

On 8/16/24 13:26, Tomas Vondra wrote:

Hi Ayush,

...

So this fix seems reasonable.

I've pushed this to all affected branches, except for 11 which is EOL.

I thought about adding a test, but I couldn't think of a TAP test where
this would really fit, and it didn't seem very practical to have a test
creating hundreds of roles. So I abandoned the idea.

I tried to add Assert in heap_inplace_update to prevent possible similar
failures, but I gave up because I could not find a good way to determine if
a tuple is detoasted of not.

Right, not sure there's a good way to check for that.

By the way, I found a comment in vac_update_datfrozenxid() and EventTriggerOnLogin()
that explains why we could not use tuples from the syscache for heap_inplace_update.
I think it is better ad d the same comment in dropdb(). I attached a trivial patch for it.

Agreed. That seems like a nice improvement to the comment.

regards

--
Tomas Vondra

#6Tomas Vondra
tomas@vondra.me
In reply to: Tomas Vondra (#5)
Re: Drop database command will raise "wrong tuple length" if pg_database tuple contains toast attribute.

On 8/19/24 12:16, Tomas Vondra wrote:

On 8/19/24 11:01, Yugo Nagata wrote:

...

By the way, I found a comment in vac_update_datfrozenxid() and EventTriggerOnLogin()
that explains why we could not use tuples from the syscache for heap_inplace_update.
I think it is better ad d the same comment in dropdb(). I attached a trivial patch for it.

Agreed. That seems like a nice improvement to the comment.

Done, thanks for the suggestion / patch.

regards

--
Tomas Vondra