Improve behavior of concurrent ANALYZE/VACUUM

Started by Michael Paquier over 7 years ago, 18 messages

michael@paquier.xyz

over 7 years ago

1 attachment(s)

Hi all,
(In CC are the folks who have reviewed the first patch versions, Nathan
and Horiguchi-san.)

After TRUNCATE and REINDEX, here is the third and last thread I am
spawning from the previous thread "Canceling authentication due to
timeout aka Denial of Service Attack":
/messages/by-id/152512087100.19803.12733865831237526317@wrigleys.postgresql.org

And this time the discussion is about VACUUM/ANALYZE. In this case, we
also check relation ownership after queuing for a lock, which can allow
any user to potentially lock a relation which others could use,
particularly with VACUUM FULL which needs an AEL (access exclusive
lock).

In the previous thread, we discussed a couple of approaches, but I was
not happy with any of those, hence I have been spending more time in
getting to a solution which has no user-facing changes, and still solves
the problems folks have been complaining about, and the result is the
patch attached. The patch changes a couple of things regarding ACL
checks, by simply centralizing the ownership checks into a single
routine used by both ANALYZE and VACUUM. This routine is then used in
two more places for manual ANALYZE and VACUUM:
- When specifying directly one or more relations in the command, in
expand_vacuum_rel().
- When building the complete list of relations to work on in the case of
a database-wide operation, in get_all_vacuum_rels().

analyze_rel() and vacuum_rel() have been using the same logic to check
for relation ownership, so refactoring things into a single routine is a
win in my opinion.
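
To make the shape of such a shared routine concrete, here is a minimal,
self-contained C sketch. It is not the code from the patch: RelInfo,
Caller, OPT_VACUUM, OPT_ANALYZE and may_process() are hypothetical
stand-ins for Form_pg_class, the current user, the VACOPT_* flags and
vacuum_is_relation_owner(), and the message is written to a buffer
instead of being reported with ereport().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real catalog data; not PostgreSQL's API. */
#define OPT_VACUUM  (1 << 0)
#define OPT_ANALYZE (1 << 1)

typedef struct RelInfo
{
	const char *name;
	unsigned	owner;			/* role that owns the relation */
	bool		is_shared;		/* shared catalog? */
	bool		in_pg_catalog;	/* lives in pg_catalog? */
} RelInfo;

typedef struct Caller
{
	unsigned	role;
	bool		is_superuser;
	bool		is_db_owner;
} Caller;

/*
 * Return true if the caller may process rel.  Otherwise write a
 * "skipping" message into buf and return false, leaving the decision
 * about what to do with the relation to the caller.
 */
bool
may_process(const RelInfo *rel, const Caller *c, int options,
			char *buf, size_t buflen)
{
	const char *verb = (options & OPT_ANALYZE) ? "analyze" : "vacuum";
	const char *who;

	/* Superuser, table owner, or database owner of a non-shared relation. */
	if (c->is_superuser || c->role == rel->owner ||
		(c->is_db_owner && !rel->is_shared))
		return true;

	if (rel->is_shared)
		who = "superuser";
	else if (rel->in_pg_catalog)
		who = "superuser or database owner";
	else
		who = "table or database owner";

	snprintf(buf, buflen, "skipping \"%s\" --- only %s can %s it",
			 rel->name, who, verb);
	return false;
}
```

Because the permission rule and the message selection live in one
function, a caller that builds a relation list and a caller that
processes a single relation would both see the same decision and the
same wording.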

While reviewing the code, I noticed that analyze_rel() makes an effort
not to produce a WARNING if both VACOPT_VACUUM and VACOPT_ANALYZE are
specified in VacuumStmt->options. However, we can never see that
scenario, as analyze_rel() never gets called at the same time as
vacuum_rel() for a single relation.

The patch attached includes tests which I have used to also check that
correct error messages are produced for VACUUM, VACUUM ANALYZE and
ANALYZE.

Please note that, as with the previous patch for TRUNCATE, I have no
plans for a back-patch, for the same reasons as before. There are also
serious bugs being worked on for REL_11_STABLE so I don't want to take
any risk for this branch.

Thoughts?
--
Michael

Attachments:

0001-Improve-VACUUM-and-ANALYZE-by-avoiding-early-lock-qu.patch (text/x-diff; charset=us-ascii)

From 879284c919ee5d40bb258354dd162395191015a3 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 13 Aug 2018 00:17:20 +0200
Subject: [PATCH] Improve VACUUM and ANALYZE by avoiding early lock queue

A caller of VACUUM can perform early lock obtention which can cause
other sessions to block on the request done, potentially causing DOS
attacks as even a non-privileged user can attempt a VACUUM FULL of a
critical catalog table to block even all incoming connection attempts.

Contrary to TRUNCATE, a client could attempt a system-wide VACUUM after
building the list of relations to VACUUM, which can cause vacuum_rel()
to try to lock the relation but the thing would just block.  When the
client specifies a list of relations and the relation needs to be
skipped, fail hard so as there is no conflict with any relation a user
has no rights to work on.

vacuum_rel() already had the sanity checks needed, except that those
were applied too late.  This commit refactors the code so as relation
skips are checked beforehand, making it safer to avoid too early lock,
for both manual VACUUM with and without a list of relations specified.

Reported-by: Lloyd Albin, Jeremy Schneider
Author: Michael Paquier
Reviewed-by: Nathan Bossart, Kyotaro Horiguchi
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org
---
 src/backend/commands/analyze.c                |  27 +--
 src/backend/commands/vacuum.c                 | 156 +++++++++++++-----
 src/include/commands/vacuum.h                 |   3 +
 .../isolation/expected/vacuum-conflict.out    | 149 +++++++++++++++++
 src/test/isolation/isolation_schedule         |   1 +
 src/test/isolation/specs/vacuum-conflict.spec |  51 ++++++
 6 files changed, 331 insertions(+), 56 deletions(-)
 create mode 100644 src/test/isolation/expected/vacuum-conflict.out
 create mode 100644 src/test/isolation/specs/vacuum-conflict.spec

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 3e148f03d0..ff641c2f24 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -196,27 +196,16 @@ analyze_rel(Oid relid, RangeVar *relation, int options,
 	}
 
 	/*
-	 * Check permissions --- this should match vacuum's check!
+	 * Check if relation needs to be skipped based on ownership.  This
+	 * check happens also when building the relation list to analyze
+	 * for a manual operation, and needs to be done additionally here
+	 * as ANALYZE could happen across multiple transactions where relation
+	 * ownership could have changed in-between.
 	 */
-	if (!(pg_class_ownercheck(RelationGetRelid(onerel), GetUserId()) ||
-		  (pg_database_ownercheck(MyDatabaseId, GetUserId()) && !onerel->rd_rel->relisshared)))
+	if (!vacuum_is_relation_owner(RelationGetRelid(onerel),
+								  onerel->rd_rel,
+								  options))
 	{
-		/* No need for a WARNING if we already complained during VACUUM */
-		if (!(options & VACOPT_VACUUM))
-		{
-			if (onerel->rd_rel->relisshared)
-				ereport(WARNING,
-						(errmsg("skipping \"%s\" --- only superuser can analyze it",
-								RelationGetRelationName(onerel))));
-			else if (onerel->rd_rel->relnamespace == PG_CATALOG_NAMESPACE)
-				ereport(WARNING,
-						(errmsg("skipping \"%s\" --- only superuser or database owner can analyze it",
-								RelationGetRelationName(onerel))));
-			else
-				ereport(WARNING,
-						(errmsg("skipping \"%s\" --- only table or database owner can analyze it",
-								RelationGetRelationName(onerel))));
-		}
 		relation_close(onerel, ShareUpdateExclusiveLock);
 		return;
 	}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index ee32fe8871..36ad286278 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -68,8 +68,8 @@ static BufferAccessStrategy vac_strategy;
 
 
 /* non-export function prototypes */
-static List *expand_vacuum_rel(VacuumRelation *vrel);
-static List *get_all_vacuum_rels(void);
+static List *expand_vacuum_rel(VacuumRelation *vrel, int options);
+static List *get_all_vacuum_rels(int options);
 static void vac_truncate_clog(TransactionId frozenXID,
 				  MultiXactId minMulti,
 				  TransactionId lastSaneFrozenXid,
@@ -257,7 +257,7 @@ vacuum(int options, List *relations, VacuumParams *params,
 			List	   *sublist;
 			MemoryContext old_context;
 
-			sublist = expand_vacuum_rel(vrel);
+			sublist = expand_vacuum_rel(vrel, options);
 			old_context = MemoryContextSwitchTo(vac_context);
 			newrels = list_concat(newrels, sublist);
 			MemoryContextSwitchTo(old_context);
@@ -265,7 +265,15 @@ vacuum(int options, List *relations, VacuumParams *params,
 		relations = newrels;
 	}
 	else
-		relations = get_all_vacuum_rels();
+		relations = get_all_vacuum_rels(options);
+
+	/*
+	 * Depending on the permission checks done while building the list
+	 * of relations to work on, it could be possible that the list is
+	 * empty, hence do nothing in this case.
+	 */
+	if (relations == NIL)
+		return;
 
 	/*
 	 * Decide whether we need to start/commit our own transactions.
@@ -408,6 +416,75 @@ vacuum(int options, List *relations, VacuumParams *params,
 	vac_context = NULL;
 }
 
+/*
+ * Check if a given relation can be safely vacuumed or not.  If the
+ * user is not the relation owner, issue a WARNING log message and return
+ * false to let the caller decide what to do with this relation.
+ */
+bool
+vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple, int options)
+{
+	Assert(options & (VACOPT_VACUUM | VACOPT_ANALYZE));
+
+	/*
+	 * Check permissions.
+	 *
+	 * We allow the user to vacuum a table if he is superuser, the table
+	 * owner, or the database owner (but in the latter case, only if it's not
+	 * a shared relation).  pg_class_ownercheck includes the superuser case.
+	 *
+	 * Note we choose to treat permissions failure as a WARNING and keep
+	 * trying to vacuum the rest of the DB --- is this appropriate?
+	 */
+	if (!(pg_class_ownercheck(relid, GetUserId()) ||
+		  (pg_database_ownercheck(MyDatabaseId, GetUserId()) && !reltuple->relisshared)))
+	{
+		char		   *relname = NameStr(reltuple->relname);
+
+		/*
+		 * VACOPT_ANALYZE is checked first because both VACOPT_ANALYZE and
+		 * VACOPT_VACUUM can be set for a VACUUM ANALYZE query, in which
+		 * case we still want the WARNING to tell the user that the
+		 * relation is skipped for analyze.
+		 */
+		if ((options & VACOPT_ANALYZE) != 0)
+		{
+			if (reltuple->relisshared)
+				ereport(WARNING,
+						(errmsg("skipping \"%s\" --- only superuser can analyze it",
+								relname)));
+			else if (reltuple->relnamespace == PG_CATALOG_NAMESPACE)
+				ereport(WARNING,
+						(errmsg("skipping \"%s\" --- only superuser or database owner can analyze it",
+								relname)));
+			else
+				ereport(WARNING,
+						(errmsg("skipping \"%s\" --- only table or database owner can analyze it",
+								relname)));
+		}
+		else
+		{
+			if (reltuple->relisshared)
+				ereport(WARNING,
+						(errmsg("skipping \"%s\" --- only superuser can vacuum it",
+								relname)));
+			else if (reltuple->relnamespace == PG_CATALOG_NAMESPACE)
+				ereport(WARNING,
+						(errmsg("skipping \"%s\" --- only superuser or database owner can vacuum it",
+								relname)));
+			else
+				ereport(WARNING,
+						(errmsg("skipping \"%s\" --- only table or database owner can vacuum it",
+								relname)));
+		}
+
+		return false;
+	}
+
+	return true;
+}
+
+
 /*
  * Given a VacuumRelation, fill in the table OID if it wasn't specified,
  * and optionally add VacuumRelations for partitions of the table.
@@ -423,7 +500,7 @@ vacuum(int options, List *relations, VacuumParams *params,
  * are made in vac_context.
  */
 static List *
-expand_vacuum_rel(VacuumRelation *vrel)
+expand_vacuum_rel(VacuumRelation *vrel, int options)
 {
 	List	   *vacrels = NIL;
 	MemoryContext oldcontext;
@@ -456,6 +533,28 @@ expand_vacuum_rel(VacuumRelation *vrel)
 		 */
 		relid = RangeVarGetRelid(vrel->relation, AccessShareLock, false);
 
+		/*
+		 * To check whether the relation is a partitioned table and its
+		 * ownership, fetch its syscache entry.
+		 */
+		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+		if (!HeapTupleIsValid(tuple))
+			elog(ERROR, "cache lookup failed for relation %u", relid);
+		classForm = (Form_pg_class) GETSTRUCT(tuple);
+
+		/* check permissions of relation */
+		if (!vacuum_is_relation_owner(relid, classForm, options))
+		{
+			ReleaseSysCache(tuple);
+
+			/*
+			 * Release lock again with AccessShareLock -- see below for
+			 * the reason why this lock is released.
+			 */
+			UnlockRelationOid(relid, AccessShareLock);
+			return vacrels;
+		}
+
 		/*
 		 * Make a returnable VacuumRelation for this rel.
 		 */
@@ -465,14 +564,6 @@ expand_vacuum_rel(VacuumRelation *vrel)
 													  vrel->va_cols));
 		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * To check whether the relation is a partitioned table, fetch its
-		 * syscache entry.
-		 */
-		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
-		if (!HeapTupleIsValid(tuple))
-			elog(ERROR, "cache lookup failed for relation %u", relid);
-		classForm = (Form_pg_class) GETSTRUCT(tuple);
 		include_parts = (classForm->relkind == RELKIND_PARTITIONED_TABLE);
 		ReleaseSysCache(tuple);
 
@@ -530,7 +621,7 @@ expand_vacuum_rel(VacuumRelation *vrel)
  * the current database.  The list is built in vac_context.
  */
 static List *
-get_all_vacuum_rels(void)
+get_all_vacuum_rels(int options)
 {
 	List	   *vacrels = NIL;
 	Relation	pgclass;
@@ -545,6 +636,11 @@ get_all_vacuum_rels(void)
 	{
 		Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);
 		MemoryContext oldcontext;
+		Oid			relid = HeapTupleGetOid(tuple);
+
+		/* check permissions of relation */
+		if (!vacuum_is_relation_owner(relid, classForm, options))
+			continue;
 
 		/*
 		 * We include partitioned tables here; depending on which operation is
@@ -563,7 +659,7 @@ get_all_vacuum_rels(void)
 		 */
 		oldcontext = MemoryContextSwitchTo(vac_context);
 		vacrels = lappend(vacrels, makeVacuumRelation(NULL,
-													  HeapTupleGetOid(tuple),
+													  relid,
 													  NIL));
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -1436,30 +1532,16 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	}
 
 	/*
-	 * Check permissions.
-	 *
-	 * We allow the user to vacuum a table if he is superuser, the table
-	 * owner, or the database owner (but in the latter case, only if it's not
-	 * a shared relation).  pg_class_ownercheck includes the superuser case.
-	 *
-	 * Note we choose to treat permissions failure as a WARNING and keep
-	 * trying to vacuum the rest of the DB --- is this appropriate?
+	 * Check if relation needs to be skipped based on ownership.  This
+	 * check happens also when building the relation list to vacuum
+	 * for a manual operation, and needs to be done additionally here
+	 * as VACUUM could happen across multiple transactions where relation
+	 * ownership could have changed in-between.
 	 */
-	if (!(pg_class_ownercheck(RelationGetRelid(onerel), GetUserId()) ||
-		  (pg_database_ownercheck(MyDatabaseId, GetUserId()) && !onerel->rd_rel->relisshared)))
+	if (!vacuum_is_relation_owner(RelationGetRelid(onerel),
+								  onerel->rd_rel,
+								  options))
 	{
-		if (onerel->rd_rel->relisshared)
-			ereport(WARNING,
-					(errmsg("skipping \"%s\" --- only superuser can vacuum it",
-							RelationGetRelationName(onerel))));
-		else if (onerel->rd_rel->relnamespace == PG_CATALOG_NAMESPACE)
-			ereport(WARNING,
-					(errmsg("skipping \"%s\" --- only superuser or database owner can vacuum it",
-							RelationGetRelationName(onerel))));
-		else
-			ereport(WARNING,
-					(errmsg("skipping \"%s\" --- only table or database owner can vacuum it",
-							RelationGetRelationName(onerel))));
 		relation_close(onerel, lmode);
 		PopActiveSnapshot();
 		CommitTransactionCommand();
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 85d472f0a5..85b181bf2f 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,7 @@
 #define VACUUM_H
 
 #include "access/htup.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_type.h"
 #include "nodes/parsenodes.h"
@@ -185,6 +186,8 @@ extern void vacuum_set_xid_limits(Relation rel,
 					  MultiXactId *mxactFullScanLimit);
 extern void vac_update_datfrozenxid(void);
 extern void vacuum_delay_point(void);
+extern bool vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple,
+									 int options);
 
 /* in commands/vacuumlazy.c */
 extern void lazy_vacuum_rel(Relation onerel, int options,
diff --git a/src/test/isolation/expected/vacuum-conflict.out b/src/test/isolation/expected/vacuum-conflict.out
new file mode 100644
index 0000000000..06ac75ef23
--- /dev/null
+++ b/src/test/isolation/expected/vacuum-conflict.out
@@ -0,0 +1,149 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1_begin s1_lock s2_auth s2_vacuum s1_commit s2_reset
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can vacuum it
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s2_vacuum s1_lock s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can vacuum it
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s1_lock s2_vacuum s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can vacuum it
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s2_auth s2_vacuum s1_begin s1_lock s1_commit s2_reset
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can vacuum it
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s1_lock s2_auth s2_analyze s1_commit s2_reset
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can analyze it
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s2_analyze s1_lock s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can analyze it
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s1_lock s2_analyze s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can analyze it
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s2_auth s2_analyze s1_begin s1_lock s1_commit s2_reset
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can analyze it
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s1_lock s2_auth s2_vacuum s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_vacuum: VACUUM vacuum_tab; <waiting ...>
+step s1_commit: COMMIT;
+step s2_vacuum: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s2_auth s2_vacuum s1_lock s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s2_auth s1_lock s2_vacuum s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_vacuum: VACUUM vacuum_tab; <waiting ...>
+step s1_commit: COMMIT;
+step s2_vacuum: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s2_grant s2_auth s2_vacuum s1_begin s1_lock s1_commit s2_reset
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s1_lock s2_auth s2_analyze s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_analyze: ANALYZE vacuum_tab; <waiting ...>
+step s1_commit: COMMIT;
+step s2_analyze: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s2_auth s2_analyze s1_lock s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s2_auth s1_lock s2_analyze s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_analyze: ANALYZE vacuum_tab; <waiting ...>
+step s1_commit: COMMIT;
+step s2_analyze: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s2_grant s2_auth s2_analyze s1_begin s1_lock s1_commit s2_reset
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 48ae740739..c23b401225 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -66,6 +66,7 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+test: vacuum-conflict
 test: predicate-hash
 test: predicate-gist
 test: predicate-gin
diff --git a/src/test/isolation/specs/vacuum-conflict.spec b/src/test/isolation/specs/vacuum-conflict.spec
new file mode 100644
index 0000000000..9b45d26c65
--- /dev/null
+++ b/src/test/isolation/specs/vacuum-conflict.spec
@@ -0,0 +1,51 @@
+# Tests for locking conflicts with VACUUM and ANALYZE commands.
+
+setup
+{
+	CREATE ROLE regress_vacuum_conflict;
+	CREATE TABLE vacuum_tab (a int);
+}
+
+teardown
+{
+	DROP TABLE vacuum_tab;
+	DROP ROLE regress_vacuum_conflict;
+}
+
+session "s1"
+step "s1_begin"          { BEGIN; }
+step "s1_lock"           { LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE; }
+step "s1_commit"         { COMMIT; }
+
+session "s2"
+step "s2_grant"          { ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict; }
+step "s2_auth"           { SET ROLE regress_vacuum_conflict; }
+step "s2_vacuum"         { VACUUM vacuum_tab; }
+step "s2_analyze"        { ANALYZE vacuum_tab; }
+step "s2_reset"          { RESET ROLE; }
+
+# The role doesn't have privileges to vacuum the table, so VACUUM should
+# immediately skip the table without waiting for a lock.
+permutation "s1_begin" "s1_lock" "s2_auth" "s2_vacuum" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_auth" "s2_vacuum" "s1_lock" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_auth" "s1_lock" "s2_vacuum" "s1_commit" "s2_reset"
+permutation "s2_auth" "s2_vacuum" "s1_begin" "s1_lock" "s1_commit" "s2_reset"
+
+# Same as previously for ANALYZE
+permutation "s1_begin" "s1_lock" "s2_auth" "s2_analyze" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_auth" "s2_analyze" "s1_lock" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_auth" "s1_lock" "s2_analyze" "s1_commit" "s2_reset"
+permutation "s2_auth" "s2_analyze" "s1_begin" "s1_lock" "s1_commit" "s2_reset"
+
+# The role has privileges to vacuum the table, VACUUM will block if
+# another session holds a lock on the table and succeed in all cases.
+permutation "s1_begin" "s2_grant" "s1_lock" "s2_auth" "s2_vacuum" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s2_vacuum" "s1_lock" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s1_lock" "s2_vacuum" "s1_commit" "s2_reset"
+permutation "s2_grant" "s2_auth" "s2_vacuum" "s1_begin" "s1_lock" "s1_commit" "s2_reset"
+
+# Same as previously for ANALYZE
+permutation "s1_begin" "s2_grant" "s1_lock" "s2_auth" "s2_analyze" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s2_analyze" "s1_lock" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s1_lock" "s2_analyze" "s1_commit" "s2_reset"
+permutation "s2_grant" "s2_auth" "s2_analyze" "s1_begin" "s1_lock" "s1_commit" "s2_reset"
-- 
2.18.0

Robert Haas

robertmhaas@gmail.com

over 7 years ago

In reply to: Michael Paquier (#1)

Re: Improve behavior of concurrent ANALYZE/VACUUM

On Sun, Aug 12, 2018 at 10:21 PM, Michael Paquier <michael@paquier.xyz> wrote:

> In the previous thread, we discussed a couple of approaches, but I was
> not happy with any of those, hence I have been spending more time in
> getting to a solution which has no user-facing changes, and still solves
> the problems folks have been complaining about, and the result is the
> patch attached. The patch changes a couple of things regarding ACL
> checks, by simply centralizing the ownership checks into a single
> routine used by both ANALYZE and VACUUM. This routine is then used in
> two more places for manual ANALYZE and VACUUM:
> - When specifying directly one or more relations in the command, in
> expand_vacuum_rel().
> - When building the complete list of relations to work on in the case of
> a database-wide operation, in get_all_vacuum_rels().

I feel like you're not being very clear about exactly what this new
approach is. Sorry if I'm being dense.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Robert Haas (#2)

Re: Improve behavior of concurrent ANALYZE/VACUUM

On Tue, Aug 14, 2018 at 03:26:29PM +0000, Robert Haas wrote:

> I feel like you're not being very clear about exactly what this new
> approach is. Sorry if I'm being dense.

On HEAD, we check the ownership of the relation vacuumed or analyzed
after taking a lock on it in respectively vacuum_rel() and
analyze_rel(), where we already know the OID of the relation and there
may be no RangeVar which we could use with RangeVarGetRelidExtended
(like partitions). I don't think that we want to use again
RangeVarGetRelidExtended once the relation OID is known anyway. So my
proposal is to add more ownership checks when building the list of
VacuumRelations, and skip the relations the user has no right to work on
at an earlier stage. Looking at the code may be easier than reading a
description of it; please remember that there are three code paths used
to build the list of VacuumRelations (each one may be processed in its
own transaction):
1) autovacuum, which specifies only one relation at a time with its OID,
and we build the list in expand_vacuum_rel(), which finishes with a
single element.
2) Manual VACUUM with a list of relations specified, where the list of
elements is built in the second part of expand_vacuum_rel(), which is
able to expand partitioned tables as well.
3) Manual VACUUM with no list specified, where the list of relations is
built in get_all_vacuum_rels().

My proposal is to add two more ownership checks in 2) and 3), and also
refactor the code so as we use a single routine for ANALYZE and VACUUM.
This has the advantage of not making the code of ANALYZE and VACUUM
diverge anymore for the existing ownership checks, and we still generate
WARNINGs if a relation needs to be skipped.
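
The effect of the extra checks in paths 2) and 3) can be pictured with
the following small C sketch. It only illustrates the idea of dropping
relations from the work list before any lock is requested. Rel,
caller_owns() and build_work_list() are names invented here; the real
code works on VacuumRelation nodes and also lets superusers and the
database owner through.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified relation entry; not a PostgreSQL structure. */
typedef struct Rel
{
	unsigned	oid;
	unsigned	owner;
} Rel;

/* Hypothetical permission test: only the owner may process a relation. */
bool
caller_owns(const Rel *rel, unsigned role)
{
	return rel->owner == role;
}

/*
 * Copy into out[] the OIDs of the relations the caller may process and
 * return how many were kept.  Nothing is locked at this stage.
 */
size_t
build_work_list(const Rel *rels, size_t nrels, unsigned role,
				unsigned *out)
{
	size_t		kept = 0;

	for (size_t i = 0; i < nrels; i++)
	{
		if (!caller_owns(&rels[i], role))
			continue;			/* skipped before any lock request */
		out[kept++] = rels[i].oid;
	}
	return kept;
}
```

A result of zero corresponds to the case where every relation was
skipped and the command has nothing left to do.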

(Thinking about it, it could make sense to add an extra assert at the
beginning of expand_vacuum_rel like I did in 6551f3d...)
--
Michael

Robert Haas

robertmhaas@gmail.com

over 7 years ago

In reply to: Michael Paquier (#3)

Re: Improve behavior of concurrent ANALYZE/VACUUM

On Tue, Aug 14, 2018 at 11:59 AM, Michael Paquier <michael@paquier.xyz> wrote:

> On HEAD, we check the ownership of the relation vacuumed or analyzed
> after taking a lock on it in respectively vacuum_rel() and
> analyze_rel(), where we already know the OID of the relation and there
> may be no RangeVar which we could use with RangeVarGetRelidExtended
> (like partitions). I don't think that we want to use again
> RangeVarGetRelidExtended once the relation OID is known anyway.

We definitely don't want to use RangeVarGetRelidExtended more than
necessary. It is important that we use that function only when
necessary - that is, to look up names supplied by users - and it is
also important that we look up each user-supplied name only once, lest
we get different answers on different occasions, possibly introducing
either outright security problems or at the least ludicrous behavior.

In the case where we have an OID already, I think we could just
perform a permissions test before locking the OID. It's true that
permissions might be revoked after we test them and before the lock is
acquired, but that doesn't seem terrible. The real point of all of
this stuff is to keep users from locking objects which they never had
any right to access, not to worry about what happens if permissions
are concurrently revoked while we're getting the lock.
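
The difference between the two orderings can be shown with a toy model
in C. This is only a sketch under simplifying assumptions: Table,
process_lock_first() and process_check_first() are hypothetical names,
ownership is reduced to a single role comparison, and the lock is
represented by a counter rather than a real lock manager call.

```c
#include <stdbool.h>

/* Hypothetical model of a table; not a PostgreSQL structure. */
typedef struct Table
{
	unsigned	owner;			/* role owning the table */
	int			lock_requests;	/* number of lock requests received */
} Table;

/* Old order: queue for the lock, then find out the caller is not allowed. */
bool
process_lock_first(Table *t, unsigned role)
{
	t->lock_requests++;			/* stand-in for taking the lock */
	if (t->owner != role)
		return false;			/* skipped, but the lock was requested */
	return true;
}

/* Proposed order: test permissions on the OID, lock only if they pass. */
bool
process_check_first(Table *t, unsigned role)
{
	if (t->owner != role)
		return false;			/* skipped without touching the lock queue */
	t->lock_requests++;			/* stand-in for taking the lock */
	return true;
}
```

In the second function a caller without rights never adds a lock
request, which is the property that matters here. A change of
permissions between the test and the lock is deliberately not handled
in this sketch.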

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Robert Haas (#4)

Re: Improve behavior of concurrent ANALYZE/VACUUM

On Thu, Aug 16, 2018 at 03:07:18PM -0400, Robert Haas wrote:

> We definitely don't want to use RangeVarGetRelidExtended more than
> necessary. It is important that we use that function only when
> necessary - that is, to look up names supplied by users - and it is
> also important that we look up each user-supplied name only once, lest
> we get different answers on different occasions, possibly introducing
> either outright security problems or at the least ludicrous behavior.

Thanks, that matches my feelings about this stuff.

> In the case where we have an OID already, I think we could just
> perform a permissions test before locking the OID. It's true that
> permissions might be revoked after we test them and before the lock is
> acquired, but that doesn't seem terrible. The real point of all of
> this stuff is to keep users from locking objects which they never had
> any right to access, not to worry about what happens if permissions
> are concurrently revoked while we're getting the lock.

One thing is that neither pg_class_ownercheck nor pg_database_ownercheck
is fail-safe, and they would issue an ERROR when the relation does not
exist, and this can happen when using multiple transactions for VACUUM
FULL or such, so we cannot simply swap the owner checks before trying to
lock the relation in vacuum_rel() and analyze_rel(). Or we invent new
flavors of those routines able to handle missing relations, then swap the
ACL checks to happen before the relations are locked.
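
As a rough picture of what such a "missing relation is OK" flavor could
look like, here is a self-contained C sketch. Everything in it is
hypothetical: OwnerCheck, Rel and owner_check_missing_ok() are invented
for illustration, and the catalog is modelled as a plain array rather
than a syscache lookup.

```c
#include <stddef.h>

/* Hypothetical, simplified catalog entry; not a PostgreSQL structure. */
typedef struct Rel
{
	unsigned	oid;
	unsigned	owner;
} Rel;

typedef enum OwnerCheck
{
	OWNER_YES,
	OWNER_NO,
	OWNER_REL_MISSING			/* relation vanished; caller should skip it */
} OwnerCheck;

/*
 * Ownership test that reports a missing relation as a distinct result
 * instead of raising an error.
 */
OwnerCheck
owner_check_missing_ok(const Rel *catalog, size_t nrels,
					   unsigned oid, unsigned role)
{
	for (size_t i = 0; i < nrels; i++)
	{
		if (catalog[i].oid == oid)
			return (catalog[i].owner == role) ? OWNER_YES : OWNER_NO;
	}
	return OWNER_REL_MISSING;
}
```

With a three-way result the caller can tell "not allowed" apart from
"dropped in the meantime" and skip quietly in the latter case.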

For VACUUM/ANALYZE, I tend to think that it is incorrect to include,
from the start, relations the user does not own in the list of relations
to process, so my approach seems quite natural, at least to me. Each of
the two approaches has its good and bad sides.
--
Michael

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Michael Paquier (#1)

1 attachment(s)

Re: Improve behavior of concurrent ANALYZE/VACUUM

On Mon, Aug 13, 2018 at 12:21:42AM +0200, Michael Paquier wrote:

> The patch attached includes tests which I have used to also check that
> correct error messages are produced for VACUUM, VACUUM ANALYZE and
> ANALYZE.

I have reworked the patch on this side, clarifying the use of the new
common API for the logs. One thing I am wondering about is what we
want to do when VACUUM ANALYZE is used. As of HEAD, if vacuum_rel()
stops, then analyze_rel() is never called, and the only log showing up
to a non-owner user would be:
skipping "foo" --- only superuser can vacuum it

With this patch, things become perhaps more confusing by generating two
WARNING log entries:
skipping "foo" --- only superuser can vacuum it
skipping "foo" --- only superuser can analyze it

We could either combine both in a single message, or just generate the
message for vacuum as HEAD does now. I have also added some simple
regression tests triggering the skipping logs for shared catalogs,
non-shared catalogs and non-owners. This could be a separate patch as
well.

Input is welcome.
--
Michael

Attachments:

0001-Improve-VACUUM-and-ANALYZE-by-avoiding-early-lock-qu.patch (text/x-diff; charset=us-ascii)

From 3faa871f439bf0b9477c978557f1b90604ffa5d3 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Fri, 17 Aug 2018 15:24:30 +0900
Subject: [PATCH] Improve VACUUM and ANALYZE by avoiding early lock queue

A caller of VACUUM can queue for a lock on a relation before its
ownership is checked, which can cause other sessions to block behind
that request.  This potentially allows DOS attacks, as even a
non-privileged user can attempt a VACUUM FULL of a critical catalog
table and block all incoming connection attempts.

Contrary to TRUNCATE, a client could attempt a system-wide VACUUM after
building the list of relations to VACUUM, which can cause vacuum_rel()
to try to lock a relation, where the request would just block.  When
the client specifies a list of relations and the relation needs to be
skipped, fail hard so that there is no conflict with any relation a user
has no rights to work on.

vacuum_rel() already had the sanity checks needed, except that those
were applied too late.  This commit refactors the code so that relation
skips are checked beforehand, which avoids taking locks too early, for
manual VACUUM both with and without a list of relations specified.

Reported-by: Lloyd Albin, Jeremy Schneider
Author: Michael Paquier
Reviewed-by: Nathan Bossart, Kyotaro Horiguchi
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org
---
 src/backend/commands/analyze.c                |  28 ++--
 src/backend/commands/vacuum.c                 | 152 +++++++++++++-----
 src/include/commands/vacuum.h                 |   3 +
 .../isolation/expected/vacuum-conflict.out    | 149 +++++++++++++++++
 src/test/isolation/isolation_schedule         |   1 +
 src/test/isolation/specs/vacuum-conflict.spec |  51 ++++++
 src/test/regress/expected/vacuum.out          |  31 ++++
 src/test/regress/sql/vacuum.sql               |  20 +++
 8 files changed, 379 insertions(+), 56 deletions(-)
 create mode 100644 src/test/isolation/expected/vacuum-conflict.out
 create mode 100644 src/test/isolation/specs/vacuum-conflict.spec

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 3e148f03d0..4f0753e02a 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -196,27 +196,17 @@ analyze_rel(Oid relid, RangeVar *relation, int options,
 	}
 
 	/*
-	 * Check permissions --- this should match vacuum's check!
+	 * Check if relation needs to be skipped based on ownership.  This
+	 * check happens also when building the relation list to analyze
+	 * for a manual operation, and needs to be done additionally here
+	 * as ANALYZE could happen across multiple transactions where relation
+	 * ownership could have changed in-between.  Make sure to generate
+	 * only logs for ANALYZE in this case.
 	 */
-	if (!(pg_class_ownercheck(RelationGetRelid(onerel), GetUserId()) ||
-		  (pg_database_ownercheck(MyDatabaseId, GetUserId()) && !onerel->rd_rel->relisshared)))
+	if (!vacuum_is_relation_owner(RelationGetRelid(onerel),
+								  onerel->rd_rel,
+								  options & VACOPT_ANALYZE))
 	{
-		/* No need for a WARNING if we already complained during VACUUM */
-		if (!(options & VACOPT_VACUUM))
-		{
-			if (onerel->rd_rel->relisshared)
-				ereport(WARNING,
-						(errmsg("skipping \"%s\" --- only superuser can analyze it",
-								RelationGetRelationName(onerel))));
-			else if (onerel->rd_rel->relnamespace == PG_CATALOG_NAMESPACE)
-				ereport(WARNING,
-						(errmsg("skipping \"%s\" --- only superuser or database owner can analyze it",
-								RelationGetRelationName(onerel))));
-			else
-				ereport(WARNING,
-						(errmsg("skipping \"%s\" --- only table or database owner can analyze it",
-								RelationGetRelationName(onerel))));
-		}
 		relation_close(onerel, ShareUpdateExclusiveLock);
 		return;
 	}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index ee32fe8871..39816a742d 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -68,8 +68,8 @@ static BufferAccessStrategy vac_strategy;
 
 
 /* non-export function prototypes */
-static List *expand_vacuum_rel(VacuumRelation *vrel);
-static List *get_all_vacuum_rels(void);
+static List *expand_vacuum_rel(VacuumRelation *vrel, int options);
+static List *get_all_vacuum_rels(int options);
 static void vac_truncate_clog(TransactionId frozenXID,
 				  MultiXactId minMulti,
 				  TransactionId lastSaneFrozenXid,
@@ -257,7 +257,7 @@ vacuum(int options, List *relations, VacuumParams *params,
 			List	   *sublist;
 			MemoryContext old_context;
 
-			sublist = expand_vacuum_rel(vrel);
+			sublist = expand_vacuum_rel(vrel, options);
 			old_context = MemoryContextSwitchTo(vac_context);
 			newrels = list_concat(newrels, sublist);
 			MemoryContextSwitchTo(old_context);
@@ -265,7 +265,15 @@ vacuum(int options, List *relations, VacuumParams *params,
 		relations = newrels;
 	}
 	else
-		relations = get_all_vacuum_rels();
+		relations = get_all_vacuum_rels(options);
+
+	/*
+	 * Depending on the permission checks done while building the list
+	 * of relations to work on, it could be possible that the list is
+	 * empty, hence do nothing in this case.
+	 */
+	if (relations == NIL)
+		return;
 
 	/*
 	 * Decide whether we need to start/commit our own transactions.
@@ -408,6 +416,70 @@ vacuum(int options, List *relations, VacuumParams *params,
 	vac_context = NULL;
 }
 
+/*
+ * Check if a given relation can be safely vacuumed or not.  If the
+ * user is not the relation owner, issue a WARNING log message and return
+ * false to let the caller decide what to do with this relation.
+ */
+bool
+vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple, int options)
+{
+	Assert((options & (VACOPT_VACUUM | VACOPT_ANALYZE)) != 0);
+
+	/*
+	 * Check permissions.
+	 *
+	 * We allow the user to vacuum a table if he is superuser, the table
+	 * owner, or the database owner (but in the latter case, only if it's not
+	 * a shared relation).  pg_class_ownercheck includes the superuser case.
+	 *
+	 * Note we choose to treat permissions failure as a WARNING and keep
+	 * trying to vacuum the rest of the DB --- is this appropriate?
+	 */
+	if (!(pg_class_ownercheck(relid, GetUserId()) ||
+		  (pg_database_ownercheck(MyDatabaseId, GetUserId()) && !reltuple->relisshared)))
+	{
+		char		   *relname = NameStr(reltuple->relname);
+
+		if ((options & VACOPT_VACUUM) != 0)
+		{
+			if (reltuple->relisshared)
+				ereport(WARNING,
+						(errmsg("skipping \"%s\" --- only superuser can vacuum it",
+								relname)));
+			else if (reltuple->relnamespace == PG_CATALOG_NAMESPACE)
+				ereport(WARNING,
+						(errmsg("skipping \"%s\" --- only superuser or database owner can vacuum it",
+								relname)));
+			else
+				ereport(WARNING,
+						(errmsg("skipping \"%s\" --- only table or database owner can vacuum it",
+								relname)));
+		}
+
+		if ((options & VACOPT_ANALYZE) != 0)
+		{
+			if (reltuple->relisshared)
+				ereport(WARNING,
+						(errmsg("skipping \"%s\" --- only superuser can analyze it",
+								relname)));
+			else if (reltuple->relnamespace == PG_CATALOG_NAMESPACE)
+				ereport(WARNING,
+						(errmsg("skipping \"%s\" --- only superuser or database owner can analyze it",
+								relname)));
+			else
+				ereport(WARNING,
+						(errmsg("skipping \"%s\" --- only table or database owner can analyze it",
+								relname)));
+		}
+
+		return false;
+	}
+
+	return true;
+}
+
+
 /*
  * Given a VacuumRelation, fill in the table OID if it wasn't specified,
  * and optionally add VacuumRelations for partitions of the table.
@@ -423,7 +495,7 @@ vacuum(int options, List *relations, VacuumParams *params,
  * are made in vac_context.
  */
 static List *
-expand_vacuum_rel(VacuumRelation *vrel)
+expand_vacuum_rel(VacuumRelation *vrel, int options)
 {
 	List	   *vacrels = NIL;
 	MemoryContext oldcontext;
@@ -456,6 +528,28 @@ expand_vacuum_rel(VacuumRelation *vrel)
 		 */
 		relid = RangeVarGetRelid(vrel->relation, AccessShareLock, false);
 
+		/*
+		 * To check whether the relation is a partitioned table and its
+		 * ownership, fetch its syscache entry.
+		 */
+		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+		if (!HeapTupleIsValid(tuple))
+			elog(ERROR, "cache lookup failed for relation %u", relid);
+		classForm = (Form_pg_class) GETSTRUCT(tuple);
+
+		/* check permissions of relation */
+		if (!vacuum_is_relation_owner(relid, classForm, options))
+		{
+			ReleaseSysCache(tuple);
+
+			/*
+			 * Release lock again with AccessShareLock -- see below for
+			 * the reason why this lock is released.
+			 */
+			UnlockRelationOid(relid, AccessShareLock);
+			return vacrels;
+		}
+
 		/*
 		 * Make a returnable VacuumRelation for this rel.
 		 */
@@ -465,14 +559,6 @@ expand_vacuum_rel(VacuumRelation *vrel)
 													  vrel->va_cols));
 		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * To check whether the relation is a partitioned table, fetch its
-		 * syscache entry.
-		 */
-		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
-		if (!HeapTupleIsValid(tuple))
-			elog(ERROR, "cache lookup failed for relation %u", relid);
-		classForm = (Form_pg_class) GETSTRUCT(tuple);
 		include_parts = (classForm->relkind == RELKIND_PARTITIONED_TABLE);
 		ReleaseSysCache(tuple);
 
@@ -530,7 +616,7 @@ expand_vacuum_rel(VacuumRelation *vrel)
  * the current database.  The list is built in vac_context.
  */
 static List *
-get_all_vacuum_rels(void)
+get_all_vacuum_rels(int options)
 {
 	List	   *vacrels = NIL;
 	Relation	pgclass;
@@ -545,6 +631,11 @@ get_all_vacuum_rels(void)
 	{
 		Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);
 		MemoryContext oldcontext;
+		Oid			relid = HeapTupleGetOid(tuple);
+
+		/* check permissions of relation */
+		if (!vacuum_is_relation_owner(relid, classForm, options))
+			continue;
 
 		/*
 		 * We include partitioned tables here; depending on which operation is
@@ -563,7 +654,7 @@ get_all_vacuum_rels(void)
 		 */
 		oldcontext = MemoryContextSwitchTo(vac_context);
 		vacrels = lappend(vacrels, makeVacuumRelation(NULL,
-													  HeapTupleGetOid(tuple),
+													  relid,
 													  NIL));
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -1436,30 +1527,17 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	}
 
 	/*
-	 * Check permissions.
-	 *
-	 * We allow the user to vacuum a table if he is superuser, the table
-	 * owner, or the database owner (but in the latter case, only if it's not
-	 * a shared relation).  pg_class_ownercheck includes the superuser case.
-	 *
-	 * Note we choose to treat permissions failure as a WARNING and keep
-	 * trying to vacuum the rest of the DB --- is this appropriate?
+	 * Check if relation needs to be skipped based on ownership.  This
+	 * check happens also when building the relation list to vacuum
+	 * for a manual operation, and needs to be done additionally here
+	 * as VACUUM could happen across multiple transactions where relation
+	 * ownership could have changed in-between.  Make sure to only generate
+	 * logs for VACUUM in this case.
 	 */
-	if (!(pg_class_ownercheck(RelationGetRelid(onerel), GetUserId()) ||
-		  (pg_database_ownercheck(MyDatabaseId, GetUserId()) && !onerel->rd_rel->relisshared)))
+	if (!vacuum_is_relation_owner(RelationGetRelid(onerel),
+								  onerel->rd_rel,
+								  options & VACOPT_VACUUM))
 	{
-		if (onerel->rd_rel->relisshared)
-			ereport(WARNING,
-					(errmsg("skipping \"%s\" --- only superuser can vacuum it",
-							RelationGetRelationName(onerel))));
-		else if (onerel->rd_rel->relnamespace == PG_CATALOG_NAMESPACE)
-			ereport(WARNING,
-					(errmsg("skipping \"%s\" --- only superuser or database owner can vacuum it",
-							RelationGetRelationName(onerel))));
-		else
-			ereport(WARNING,
-					(errmsg("skipping \"%s\" --- only table or database owner can vacuum it",
-							RelationGetRelationName(onerel))));
 		relation_close(onerel, lmode);
 		PopActiveSnapshot();
 		CommitTransactionCommand();
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 85d472f0a5..85b181bf2f 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,7 @@
 #define VACUUM_H
 
 #include "access/htup.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_type.h"
 #include "nodes/parsenodes.h"
@@ -185,6 +186,8 @@ extern void vacuum_set_xid_limits(Relation rel,
 					  MultiXactId *mxactFullScanLimit);
 extern void vac_update_datfrozenxid(void);
 extern void vacuum_delay_point(void);
+extern bool vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple,
+									 int options);
 
 /* in commands/vacuumlazy.c */
 extern void lazy_vacuum_rel(Relation onerel, int options,
diff --git a/src/test/isolation/expected/vacuum-conflict.out b/src/test/isolation/expected/vacuum-conflict.out
new file mode 100644
index 0000000000..06ac75ef23
--- /dev/null
+++ b/src/test/isolation/expected/vacuum-conflict.out
@@ -0,0 +1,149 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1_begin s1_lock s2_auth s2_vacuum s1_commit s2_reset
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can vacuum it
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s2_vacuum s1_lock s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can vacuum it
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s1_lock s2_vacuum s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can vacuum it
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s2_auth s2_vacuum s1_begin s1_lock s1_commit s2_reset
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can vacuum it
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s1_lock s2_auth s2_analyze s1_commit s2_reset
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can analyze it
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s2_analyze s1_lock s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can analyze it
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s1_lock s2_analyze s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can analyze it
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s2_auth s2_analyze s1_begin s1_lock s1_commit s2_reset
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can analyze it
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s1_lock s2_auth s2_vacuum s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_vacuum: VACUUM vacuum_tab; <waiting ...>
+step s1_commit: COMMIT;
+step s2_vacuum: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s2_auth s2_vacuum s1_lock s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s2_auth s1_lock s2_vacuum s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_vacuum: VACUUM vacuum_tab; <waiting ...>
+step s1_commit: COMMIT;
+step s2_vacuum: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s2_grant s2_auth s2_vacuum s1_begin s1_lock s1_commit s2_reset
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s1_lock s2_auth s2_analyze s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_analyze: ANALYZE vacuum_tab; <waiting ...>
+step s1_commit: COMMIT;
+step s2_analyze: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s2_auth s2_analyze s1_lock s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s2_auth s1_lock s2_analyze s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_analyze: ANALYZE vacuum_tab; <waiting ...>
+step s1_commit: COMMIT;
+step s2_analyze: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s2_grant s2_auth s2_analyze s1_begin s1_lock s1_commit s2_reset
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 48ae740739..c23b401225 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -66,6 +66,7 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+test: vacuum-conflict
 test: predicate-hash
 test: predicate-gist
 test: predicate-gin
diff --git a/src/test/isolation/specs/vacuum-conflict.spec b/src/test/isolation/specs/vacuum-conflict.spec
new file mode 100644
index 0000000000..9b45d26c65
--- /dev/null
+++ b/src/test/isolation/specs/vacuum-conflict.spec
@@ -0,0 +1,51 @@
+# Tests for locking conflicts with VACUUM and ANALYZE commands.
+
+setup
+{
+	CREATE ROLE regress_vacuum_conflict;
+	CREATE TABLE vacuum_tab (a int);
+}
+
+teardown
+{
+	DROP TABLE vacuum_tab;
+	DROP ROLE regress_vacuum_conflict;
+}
+
+session "s1"
+step "s1_begin"          { BEGIN; }
+step "s1_lock"           { LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE; }
+step "s1_commit"         { COMMIT; }
+
+session "s2"
+step "s2_grant"          { ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict; }
+step "s2_auth"           { SET ROLE regress_vacuum_conflict; }
+step "s2_vacuum"         { VACUUM vacuum_tab; }
+step "s2_analyze"        { ANALYZE vacuum_tab; }
+step "s2_reset"          { RESET ROLE; }
+
+# The role doesn't have privileges to vacuum the table, so VACUUM should
+# immediately skip the table without waiting for a lock.
+permutation "s1_begin" "s1_lock" "s2_auth" "s2_vacuum" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_auth" "s2_vacuum" "s1_lock" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_auth" "s1_lock" "s2_vacuum" "s1_commit" "s2_reset"
+permutation "s2_auth" "s2_vacuum" "s1_begin" "s1_lock" "s1_commit" "s2_reset"
+
+# Same as previously for ANALYZE
+permutation "s1_begin" "s1_lock" "s2_auth" "s2_analyze" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_auth" "s2_analyze" "s1_lock" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_auth" "s1_lock" "s2_analyze" "s1_commit" "s2_reset"
+permutation "s2_auth" "s2_analyze" "s1_begin" "s1_lock" "s1_commit" "s2_reset"
+
+# The role has privileges to vacuum the table, VACUUM will block if
+# another session holds a lock on the table and succeed in all cases.
+permutation "s1_begin" "s2_grant" "s1_lock" "s2_auth" "s2_vacuum" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s2_vacuum" "s1_lock" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s1_lock" "s2_vacuum" "s1_commit" "s2_reset"
+permutation "s2_grant" "s2_auth" "s2_vacuum" "s1_begin" "s1_lock" "s1_commit" "s2_reset"
+
+# Same as previously for ANALYZE
+permutation "s1_begin" "s2_grant" "s1_lock" "s2_auth" "s2_analyze" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s2_analyze" "s1_lock" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s1_lock" "s2_analyze" "s1_commit" "s2_reset"
+permutation "s2_grant" "s2_auth" "s2_analyze" "s1_begin" "s1_lock" "s1_commit" "s2_reset"
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index d66e2aa3b7..6daee8e47d 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -122,3 +122,34 @@ LINE 1: ANALYZE (nonexistant-arg) does_not_exist;
 DROP TABLE vaccluster;
 DROP TABLE vactst;
 DROP TABLE vacparted;
+-- relation ownership, WARNING logs generated as all are skipped.
+CREATE TABLE vacowned (a int);
+CREATE ROLE regress_vacuum;
+SET ROLE regress_vacuum;
+-- Simple table
+VACUUM vacowned;
+WARNING:  skipping "vacowned" --- only table or database owner can vacuum it
+ANALYZE vacowned;
+WARNING:  skipping "vacowned" --- only table or database owner can analyze it
+VACUUM (ANALYZE) vacowned;
+WARNING:  skipping "vacowned" --- only table or database owner can vacuum it
+WARNING:  skipping "vacowned" --- only table or database owner can analyze it
+-- Catalog
+VACUUM pg_catalog.pg_class;
+WARNING:  skipping "pg_class" --- only superuser or database owner can vacuum it
+ANALYZE pg_catalog.pg_class;
+WARNING:  skipping "pg_class" --- only superuser or database owner can analyze it
+VACUUM (ANALYZE) pg_catalog.pg_class;
+WARNING:  skipping "pg_class" --- only superuser or database owner can vacuum it
+WARNING:  skipping "pg_class" --- only superuser or database owner can analyze it
+-- Shared catalog
+VACUUM pg_catalog.pg_authid;
+WARNING:  skipping "pg_authid" --- only superuser can vacuum it
+ANALYZE pg_catalog.pg_authid;
+WARNING:  skipping "pg_authid" --- only superuser can analyze it
+VACUUM (ANALYZE) pg_catalog.pg_authid;
+WARNING:  skipping "pg_authid" --- only superuser can vacuum it
+WARNING:  skipping "pg_authid" --- only superuser can analyze it
+RESET ROLE;
+DROP TABLE vacowned;
+DROP ROLE regress_vacuum;
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 275ce2e270..0feff7c413 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -96,3 +96,23 @@ ANALYZE (nonexistant-arg) does_not_exist;
 DROP TABLE vaccluster;
 DROP TABLE vactst;
 DROP TABLE vacparted;
+
+-- relation ownership, WARNING logs generated as all are skipped.
+CREATE TABLE vacowned (a int);
+CREATE ROLE regress_vacuum;
+SET ROLE regress_vacuum;
+-- Simple table
+VACUUM vacowned;
+ANALYZE vacowned;
+VACUUM (ANALYZE) vacowned;
+-- Catalog
+VACUUM pg_catalog.pg_class;
+ANALYZE pg_catalog.pg_class;
+VACUUM (ANALYZE) pg_catalog.pg_class;
+-- Shared catalog
+VACUUM pg_catalog.pg_authid;
+ANALYZE pg_catalog.pg_authid;
+VACUUM (ANALYZE) pg_catalog.pg_authid;
+RESET ROLE;
+DROP TABLE vacowned;
+DROP ROLE regress_vacuum;
-- 
2.18.0

Bossart, Nathan

bossartn@amazon.com

over 7 years ago

In reply to: Michael Paquier (#6)

Re: Improve behavior of concurrent ANALYZE/VACUUM

Hi,

Sorry for the delay! I looked through the latest patch.

On 8/17/18, 1:43 AM, "Michael Paquier" <michael@paquier.xyz> wrote:

I have reworked the patch on this side, clarifying the use of the new
common API for the logs. One thing I am wondering about is what we
want to do when VACUUM ANALYZE is used. As of HEAD, if vacuum_rel()
stops, then analyze_rel() is never called, and the only log showing up
to a non-owner user would be:
skipping "foo" --- only superuser can vacuum it

With this patch, things become perhaps more confusing by generating two
WARNING log entries:
skipping "foo" --- only superuser can vacuum it
skipping "foo" --- only superuser can analyze it

We could either combine both in a single message, or just generate the
message for vacuum as HEAD does now. I have also added some simple
regression tests triggering the skipping logs for shared catalogs,
non-shared catalogs and non-owners. This could be a separate patch as
well.

I like the idea of emitting a separate WARNING for each for clarity on
what operations are being skipped. However, I think it could be a bit
confusing with the current wording. Perhaps something like "skipping
vacuum of..." and "skipping analyze of..." would make things clearer.
Another thing to keep in mind is how often only one of these messages
will apply. IIUC the vast majority of VACUUM (ANALYZE) statements
that need to emit such log statements would emit both. Plus, while
VACUUM (ANALYZE) is clearly documented as performing both operations,
I can easily see the argument that users may view it as one big
command and that emitting multiple log entries could be a confusing
change in behavior.

In short, my vote would be to maintain the current behavior for now
and to bring up any logging improvements separately.

+       /*
+        * Depending on the permission checks done while building the list
+        * of relations to work on, it could be possible that the list is
+        * empty, hence do nothing in this case.
+        */
+       if (relations == NIL)
+               return;

It might be better to let the command go through normally so that we
don't miss any cleanup at the end (e.g. deleting vac_context).

+/*
+ * Check if a given relation can be safely vacuumed or not.  If the
+ * user is not the relation owner, issue a WARNING log message and return
+ * false to let the caller decide what to do with this relation.
+ */
+bool
+vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple, int options)
+{
+       Assert((options & (VACOPT_VACUUM | VACOPT_ANALYZE)) != 0);
+
+       /*
+        * Check permissions.
+        *
+        * We allow the user to vacuum a table if he is superuser, the table
+        * owner, or the database owner (but in the latter case, only if it's not
+        * a shared relation).  pg_class_ownercheck includes the superuser case.
+        *
+        * Note we choose to treat permissions failure as a WARNING and keep
+        * trying to vacuum the rest of the DB --- is this appropriate?
+        */

Do you think it's worth adding ANALYZE to these comments as well?

+       if (!(pg_class_ownercheck(relid, GetUserId()) ||
+                 (pg_database_ownercheck(MyDatabaseId, GetUserId()) && !reltuple->relisshared)))

Returning true right away when the role does have permissions might be
a nice way to save a level of indentation.
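
As a rough illustration of that shape, the sketch below returns early for the permitted case and leaves the choice of message unindented. It is a stand-alone model rather than the patch itself: ToyRel, ToySkipReason and toy_check_owner() are hypothetical names, role OIDs are plain integers, and the superuser case (which pg_class_ownercheck covers in the real code) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outcome of the check; the comments give the matching WARNING wording. */
typedef enum ToySkipReason
{
    TOY_ALLOWED,        /* role may process the relation */
    TOY_SKIP_SHARED,    /* "only superuser can ..." */
    TOY_SKIP_CATALOG,   /* "only superuser or database owner can ..." */
    TOY_SKIP_OTHER      /* "only table or database owner can ..." */
} ToySkipReason;

/* Hypothetical stand-in for the few pg_class fields the check looks at. */
typedef struct ToyRel
{
    unsigned int owner;         /* role owning the relation */
    bool        relisshared;    /* shared across databases? */
    bool        in_pg_catalog;  /* lives in the pg_catalog schema? */
} ToyRel;

static ToySkipReason
toy_check_owner(const ToyRel *rel, unsigned int roleid, unsigned int dbowner)
{
    assert(rel != NULL);

    /* Permitted case first: table owner, or database owner if not shared. */
    if (rel->owner == roleid ||
        (roleid == dbowner && !rel->relisshared))
        return TOY_ALLOWED;

    /* From here on the role lacks permission; only the message differs. */
    if (rel->relisshared)
        return TOY_SKIP_SHARED;
    if (rel->in_pg_catalog)
        return TOY_SKIP_CATALOG;
    return TOY_SKIP_OTHER;
}
```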

+               /*
+                * To check whether the relation is a partitioned table and its
+                * ownership, fetch its syscache entry.
+                */
+               tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+               if (!HeapTupleIsValid(tuple))
+                       elog(ERROR, "cache lookup failed for relation %u", relid);
+               classForm = (Form_pg_class) GETSTRUCT(tuple);
+
+               /* check permissions of relation */
+               if (!vacuum_is_relation_owner(relid, classForm, options))
+               {
+                       ReleaseSysCache(tuple);
+
+                       /*
+                        * Release lock again with AccessShareLock -- see below for
+                        * the reason why this lock is released.
+                        */
+                       UnlockRelationOid(relid, AccessShareLock);
+                       return vacrels;
+               }

I think this actually changes the behavior for partitioned tables.
Presently, we still go through and collect all the partitions in the
vacrels list. With this change, we will skip collecting a table's
partitions if the current role doesn't have the required permissions.
Perhaps we should skip adding the current relation to vacrels if
vacuum_is_relation_owner() returns false, and then we could go through
the partitions and check permissions on those as well. Since we don't
take any locks on the individual partitions yet, getting the relation
name and calling pg_class_ownercheck() safely might be tricky, though.
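
To show what I mean, here is a simplified model of that expansion. It deliberately ignores the locking problem mentioned above and only illustrates the list-building logic; ToyRelEntry and toy_expand_partitioned() are hypothetical names, and ownership is reduced to comparing role OIDs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical (relation OID, owner OID) pair standing in for pg_class. */
typedef struct ToyRelEntry
{
    unsigned int relid;
    unsigned int owner;
} ToyRelEntry;

/*
 * Build the list of relations to process for one partitioned table.
 * The parent is added only if roleid owns it, but its partitions are
 * still examined one by one, so a partition owned by the role is kept
 * even when the parent is skipped.  Returns the number of OIDs written
 * to out[], which must have room for nparts + 1 entries.
 */
static size_t
toy_expand_partitioned(const ToyRelEntry *parent,
                       const ToyRelEntry *parts, size_t nparts,
                       unsigned int roleid,
                       unsigned int *out, size_t outcap)
{
    size_t      n = 0;
    size_t      i;

    assert(parent != NULL && out != NULL);
    assert(outcap >= nparts + 1);

    if (parent->owner == roleid)
        out[n++] = parent->relid;

    for (i = 0; i < nparts; i++)
    {
        if (parts[i].owner == roleid)
            out[n++] = parts[i].relid;
    }

    return n;
}
```

With a parent owned by one role and a partition owned by another, the second role would still get its partition processed, which matches the current behavior of collecting partitions regardless of the parent's ownership.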

Nathan

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Bossart, Nathan (#7)

2 attachment(s)

Re: Improve behavior of concurrent ANALYZE/VACUUM

On Mon, Aug 20, 2018 at 08:57:00PM +0000, Bossart, Nathan wrote:

Sorry for the delay! I looked through the latest patch.

Thanks a lot for the review!

I like the idea of emitting a separate WARNING for each for clarity on
what operations are being skipped. However, I think it could be a bit
confusing with the current wording. Perhaps something like "skipping
vacuum of..." and "skipping analyze of..." would make things clearer.
Another thing to keep in mind is how often only one of these messages
will apply. IIUC the vast majority of VACUUM (ANALYZE) statements
that need to emit such log statements would emit both. Plus, while
VACUUM (ANALYZE) is clearly documented as performing both operations,
I can easily see the argument that users may view it as one big
command and that emitting multiple log entries could be a confusing
change in behavior.

In short, my vote would be to maintain the current behavior for now
and to bring up any logging improvements separately.

On the other hand, it would be useful for the user to know exactly what
is getting skipped. For example if VACUUM ANALYZE is used then both
operations would happen, but now the user would only know that VACUUM
has been skipped, and may miss the fact that ANALYZE was not attempted.
Let's do as you suggest at the end, aka if both VACOPT_VACUUM and
VACOPT_ANALYZE are passed down to vacuum_is_relation_owner, then only
the log for VACUUM is generated, which is consistent. Any other changes
could happen later on if necessary.
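
As a rough illustration of the rule settled on above (a standalone sketch, not code from the patch), the wording of the skip message can be modelled as below. The `SKETCH_OPT_*` flags, `SketchRel` and `sketch_skip_message()` are hypothetical names invented for this example; only the message texts are taken from the regression test output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for VACOPT_VACUUM / VACOPT_ANALYZE (values illustrative). */
#define SKETCH_OPT_VACUUM   (1 << 0)
#define SKETCH_OPT_ANALYZE  (1 << 1)

/* Simplified relation descriptor, used only by this sketch. */
typedef struct SketchRel
{
    const char *name;
    bool        is_shared;      /* shared catalog such as pg_authid */
    bool        in_pg_catalog;  /* relation lives in pg_catalog */
} SketchRel;

/*
 * Format the WARNING text for a skipped relation into buf and return buf.
 * If both option bits are set, only the VACUUM wording is produced.
 */
const char *
sketch_skip_message(const SketchRel *rel, int options,
                    char *buf, size_t buflen)
{
    const char *who;
    const char *verb;

    assert((options & (SKETCH_OPT_VACUUM | SKETCH_OPT_ANALYZE)) != 0);

    /* VACUUM takes precedence over ANALYZE when choosing the wording. */
    verb = (options & SKETCH_OPT_VACUUM) ? "vacuum" : "analyze";

    if (rel->is_shared)
        who = "superuser";
    else if (rel->in_pg_catalog)
        who = "superuser or database owner";
    else
        who = "table or database owner";

    snprintf(buf, buflen, "skipping \"%s\" --- only %s can %s it",
             rel->name, who, verb);
    return buf;
}
```

Calling it with both bits set for a plain user table yields the "can vacuum it" wording, which is what the regression test expects for VACUUM (ANALYZE).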

+       /*
+        * Depending on the permission checks done while building the list
+        * of relations to work on, it could be possible that the list is
+        * empty, hence do nothing in this case.
+        */
+       if (relations == NIL)
+               return;
It might be better to let the command go through normally so that we
don't miss any cleanup at the end (e.g. deleting vac_context).

Right, that was a bad idea. Updating datfrozenxid can actually be a
good thing.
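
To make the cleanup concern concrete, here is a minimal standalone model (an assumption-laden sketch, not the actual vacuum() code). `SketchRun` and both functions are hypothetical; `context_live` stands in for vac_context and `frozenxid_updated` for the datfrozenxid update done at the end of the command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one VACUUM invocation; not a PostgreSQL struct. */
typedef struct SketchRun
{
    bool    context_live;       /* stands in for vac_context still allocated */
    bool    frozenxid_updated;  /* stands in for the datfrozenxid update */
    size_t  processed;          /* relations actually worked on */
} SketchRun;

/* Variant with the early exit: an empty list skips the end-of-command steps. */
void
sketch_run_early_return(SketchRun *run, size_t nrels)
{
    assert(run != NULL);

    run->context_live = true;
    run->frozenxid_updated = false;
    run->processed = 0;

    if (nrels == 0)
        return;                 /* the cleanup below is never reached */

    run->processed = nrels;
    run->frozenxid_updated = true;
    run->context_live = false;
}

/* Variant without the early exit: the loop is simply a no-op for an empty list. */
void
sketch_run_fall_through(SketchRun *run, size_t nrels)
{
    assert(run != NULL);

    run->context_live = true;
    run->frozenxid_updated = false;
    run->processed = 0;

    for (size_t i = 0; i < nrels; i++)
        run->processed++;

    run->frozenxid_updated = true;
    run->context_live = false;
}
```

With an empty list, the first variant leaves the context allocated and never reaches the final update, while the second still runs both end-of-command steps.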

+/*
+ * Check if a given relation can be safely vacuumed or not.  If the
+ * user is not the relation owner, issue a WARNING log message and return
+ * false to let the caller decide what to do with this relation.
+ */
+bool
+vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple, int options)
+{
+       Assert((options & (VACOPT_VACUUM | VACOPT_ANALYZE)) != 0);
+
+       /*
+        * Check permissions.
+        *
+        * We allow the user to vacuum a table if he is superuser, the table
+        * owner, or the database owner (but in the latter case, only if it's not
+        * a shared relation).  pg_class_ownercheck includes the superuser case.
+        *
+        * Note we choose to treat permissions failure as a WARNING and keep
+        * trying to vacuum the rest of the DB --- is this appropriate?
+        */

Do you think it's worth adding ANALYZE to these comments as well?

Done.

+       if (!(pg_class_ownercheck(relid, GetUserId()) ||
+                 (pg_database_ownercheck(MyDatabaseId, GetUserId()) && !reltuple->relisshared)))
Returning true right away when the role does have permissions might be
a nice way to save a level of indentation.

Done.

I think this actually changes the behavior for partitioned tables.
Presently, we still go through and collect all the partitions in the
vacrels list. With this change, we will skip collecting a table's
partitions if the current role doesn't have the required permissions.
Perhaps we should skip adding the current relation to vacrels if
vacuum_is_relation_owner() returns false, and then we could go through
the partitions and check permissions on those as well. Since we don't
take any locks on the individual partitions yet, getting the relation
name and calling pg_class_ownercheck() safely might be tricky, though.

Yes, that's actually intentional on my side as this keeps the logic
simpler, and we avoid risks around deadlocks when working on partitions.
It seems to me that it is also more intuitive to *not* scan a full
partition tree if the user does not own its root when the relation is
listed explicitly, instead of trying to scan all leaves to perhaps find
some that the user owns. In most data models it would not matter much
anyway, no?
This is also more consistent with what is done for TRUNCATE where the
ACLs of the parent are considered first. The documentation also
actually mentions that:
"To vacuum a table, one must ordinarily be the table's owner or a
superuser."
Perhaps we could make that clearer for partitions, with something like:
"If listed explicitly, the vacuum of a partitioned table will include
all its partitions if the user is the owner of the partitioned table."

If we don't want to change the current behavior, then one simple
solution would be close to what you mention, aka skip adding the
partitioned table to the list, include *all* the partitions in the list
as we cannot sanely check their ACLs at this stage, and rely on the
checks already happening in vacuum_rel() and analyze_rel(). This would
cause the original early lock attempts to not be solved for partitions,
which is why the approach taken in the patches makes the most sense.
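
The difference between the two options can be sketched with a toy model (hypothetical types and function names, counting relations instead of building a real VacuumRelation list):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an explicitly listed partitioned table. */
typedef struct SketchTree
{
    bool    user_owns_root;
    size_t  npartitions;
} SketchTree;

/*
 * Approach taken in the patch: the ownership check on the listed root gates
 * the whole tree.  Returns how many relations end up in the work list.
 */
size_t
sketch_expand_gated(const SketchTree *tree)
{
    assert(tree != NULL);

    if (!tree->user_owns_root)
        return 0;               /* root and all partitions are skipped */
    return 1 + tree->npartitions;
}

/*
 * Alternative preserving the old behavior: drop an unowned root but queue
 * every partition, leaving their checks to the per-relation code that runs
 * only after the lock has been requested.
 */
size_t
sketch_expand_deferred(const SketchTree *tree)
{
    assert(tree != NULL);

    return (tree->user_owns_root ? 1 : 0) + tree->npartitions;
}
```

For an unowned root with three partitions, the gated variant queues nothing, while the deferred variant still queues three relations whose locks would be requested before any ownership check.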

I have split the patch into two parts:
- 0001 includes new tests which generate WARNING messages for VACUUM,
ANALYZE and VACUUM (ANALYZE). That's useful separately.
- 0002 is the original patch discussed here.

Thanks,
--
Michael

Attachments:

0001-Add-regression-tests-for-VACUUM-and-ANALYZE-with-rel.patch (text/x-diff; charset=us-ascii)

From 4215f41d38d8a3cbe1d52d97bf471d745783ac79 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 21 Aug 2018 10:11:45 +0900
Subject: [PATCH 1/2] Add regression tests for VACUUM and ANALYZE with relation
 skips

When a user does not have ownership on a relation, then specific log
messages are generated.  This new test suite adds coverage for all the
possible log messages generated.
---
 src/test/regress/expected/vacuum.out | 28 ++++++++++++++++++++++++++++
 src/test/regress/sql/vacuum.sql      | 20 ++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index d66e2aa3b7..c9be71ef60 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -122,3 +122,31 @@ LINE 1: ANALYZE (nonexistant-arg) does_not_exist;
 DROP TABLE vaccluster;
 DROP TABLE vactst;
 DROP TABLE vacparted;
+-- relation ownership, WARNING logs generated as all are skipped.
+CREATE TABLE vacowned (a int);
+CREATE ROLE regress_vacuum;
+SET ROLE regress_vacuum;
+-- Simple table
+VACUUM vacowned;
+WARNING:  skipping "vacowned" --- only table or database owner can vacuum it
+ANALYZE vacowned;
+WARNING:  skipping "vacowned" --- only table or database owner can analyze it
+VACUUM (ANALYZE) vacowned;
+WARNING:  skipping "vacowned" --- only table or database owner can vacuum it
+-- Catalog
+VACUUM pg_catalog.pg_class;
+WARNING:  skipping "pg_class" --- only superuser or database owner can vacuum it
+ANALYZE pg_catalog.pg_class;
+WARNING:  skipping "pg_class" --- only superuser or database owner can analyze it
+VACUUM (ANALYZE) pg_catalog.pg_class;
+WARNING:  skipping "pg_class" --- only superuser or database owner can vacuum it
+-- Shared catalog
+VACUUM pg_catalog.pg_authid;
+WARNING:  skipping "pg_authid" --- only superuser can vacuum it
+ANALYZE pg_catalog.pg_authid;
+WARNING:  skipping "pg_authid" --- only superuser can analyze it
+VACUUM (ANALYZE) pg_catalog.pg_authid;
+WARNING:  skipping "pg_authid" --- only superuser can vacuum it
+RESET ROLE;
+DROP TABLE vacowned;
+DROP ROLE regress_vacuum;
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 275ce2e270..0feff7c413 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -96,3 +96,23 @@ ANALYZE (nonexistant-arg) does_not_exist;
 DROP TABLE vaccluster;
 DROP TABLE vactst;
 DROP TABLE vacparted;
+
+-- relation ownership, WARNING logs generated as all are skipped.
+CREATE TABLE vacowned (a int);
+CREATE ROLE regress_vacuum;
+SET ROLE regress_vacuum;
+-- Simple table
+VACUUM vacowned;
+ANALYZE vacowned;
+VACUUM (ANALYZE) vacowned;
+-- Catalog
+VACUUM pg_catalog.pg_class;
+ANALYZE pg_catalog.pg_class;
+VACUUM (ANALYZE) pg_catalog.pg_class;
+-- Shared catalog
+VACUUM pg_catalog.pg_authid;
+ANALYZE pg_catalog.pg_authid;
+VACUUM (ANALYZE) pg_catalog.pg_authid;
+RESET ROLE;
+DROP TABLE vacowned;
+DROP ROLE regress_vacuum;
-- 
2.18.0

0002-Improve-VACUUM-and-ANALYZE-by-avoiding-early-lock-qu.patch (text/x-diff; charset=us-ascii)

From f5b0f63c50cb8436bed093f280124fb2e2709af1 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 21 Aug 2018 10:25:05 +0900
Subject: [PATCH 2/2] Improve VACUUM and ANALYZE by avoiding early lock queue

A caller of VACUUM can request a lock on a relation before its ownership
is checked, which can cause other sessions to block on the request done,
potentially allowing DOS attacks as even a non-privileged user can
attempt a VACUUM FULL of a critical catalog table to block even all
incoming connection attempts.

Contrary to TRUNCATE, a client could attempt a system-wide VACUUM after
building the list of relations to VACUUM, which can cause vacuum_rel()
to try to lock the relation but the operation would just block.  When the
client specifies a list of relations and the relation needs to be
skipped, skip it with a WARNING while building the list so that no lock
is queued later for a relation the user has no rights to work on.

vacuum_rel() already had the sanity checks needed, except that those
were applied too late.  This commit refactors the code so as relation
skips are checked beforehand, making it safer to avoid too early lock,
for both manual VACUUM with and without a list of relations specified.

Reported-by: Lloyd Albin, Jeremy Schneider
Author: Michael Paquier
Reviewed-by: Nathan Bossart, Kyotaro Horiguchi
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org
---
 src/backend/commands/analyze.c                |  28 ++--
 src/backend/commands/vacuum.c                 | 153 +++++++++++++-----
 src/include/commands/vacuum.h                 |   3 +
 .../isolation/expected/vacuum-conflict.out    | 149 +++++++++++++++++
 src/test/isolation/isolation_schedule         |   1 +
 src/test/isolation/specs/vacuum-conflict.spec |  51 ++++++
 6 files changed, 329 insertions(+), 56 deletions(-)
 create mode 100644 src/test/isolation/expected/vacuum-conflict.out
 create mode 100644 src/test/isolation/specs/vacuum-conflict.spec

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 3e148f03d0..4f0753e02a 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -196,27 +196,17 @@ analyze_rel(Oid relid, RangeVar *relation, int options,
 	}
 
 	/*
-	 * Check permissions --- this should match vacuum's check!
+	 * Check if relation needs to be skipped based on ownership.  This
+	 * check happens also when building the relation list to analyze
+	 * for a manual operation, and needs to be done additionally here
+	 * as ANALYZE could happen across multiple transactions where relation
+	 * ownership could have changed in-between.  Make sure to generate
+	 * only logs for ANALYZE in this case.
 	 */
-	if (!(pg_class_ownercheck(RelationGetRelid(onerel), GetUserId()) ||
-		  (pg_database_ownercheck(MyDatabaseId, GetUserId()) && !onerel->rd_rel->relisshared)))
+	if (!vacuum_is_relation_owner(RelationGetRelid(onerel),
+								  onerel->rd_rel,
+								  options & VACOPT_ANALYZE))
 	{
-		/* No need for a WARNING if we already complained during VACUUM */
-		if (!(options & VACOPT_VACUUM))
-		{
-			if (onerel->rd_rel->relisshared)
-				ereport(WARNING,
-						(errmsg("skipping \"%s\" --- only superuser can analyze it",
-								RelationGetRelationName(onerel))));
-			else if (onerel->rd_rel->relnamespace == PG_CATALOG_NAMESPACE)
-				ereport(WARNING,
-						(errmsg("skipping \"%s\" --- only superuser or database owner can analyze it",
-								RelationGetRelationName(onerel))));
-			else
-				ereport(WARNING,
-						(errmsg("skipping \"%s\" --- only table or database owner can analyze it",
-								RelationGetRelationName(onerel))));
-		}
 		relation_close(onerel, ShareUpdateExclusiveLock);
 		return;
 	}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index ee32fe8871..2cf2393ce1 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -68,8 +68,8 @@ static BufferAccessStrategy vac_strategy;
 
 
 /* non-export function prototypes */
-static List *expand_vacuum_rel(VacuumRelation *vrel);
-static List *get_all_vacuum_rels(void);
+static List *expand_vacuum_rel(VacuumRelation *vrel, int options);
+static List *get_all_vacuum_rels(int options);
 static void vac_truncate_clog(TransactionId frozenXID,
 				  MultiXactId minMulti,
 				  TransactionId lastSaneFrozenXid,
@@ -257,7 +257,7 @@ vacuum(int options, List *relations, VacuumParams *params,
 			List	   *sublist;
 			MemoryContext old_context;
 
-			sublist = expand_vacuum_rel(vrel);
+			sublist = expand_vacuum_rel(vrel, options);
 			old_context = MemoryContextSwitchTo(vac_context);
 			newrels = list_concat(newrels, sublist);
 			MemoryContextSwitchTo(old_context);
@@ -265,7 +265,7 @@ vacuum(int options, List *relations, VacuumParams *params,
 		relations = newrels;
 	}
 	else
-		relations = get_all_vacuum_rels();
+		relations = get_all_vacuum_rels(options);
 
 	/*
 	 * Decide whether we need to start/commit our own transactions.
@@ -408,6 +408,79 @@ vacuum(int options, List *relations, VacuumParams *params,
 	vac_context = NULL;
 }
 
+/*
+ * Check if a given relation can be safely vacuumed or analyzed.  If the
+ * user is not the relation owner, issue a WARNING log message and return
+ * false to let the caller decide what to do with this relation.  This
+ * routine is used for the decision-making of VACUUM and ANALYZE.
+ */
+bool
+vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple, int options)
+{
+	char		   *relname;
+
+	Assert((options & (VACOPT_VACUUM | VACOPT_ANALYZE)) != 0);
+
+	/*
+	 * Check permissions.
+	 *
+	 * We allow the user to vacuum or analyze a table if he is superuser, the
+	 * table owner, or the database owner (but in the latter case, only if
+	 * it's not a shared relation).  pg_class_ownercheck includes the
+	 * superuser case.
+	 *
+	 * Note we choose to treat permissions failure as a WARNING and keep
+	 * trying to vacuum or analyze the rest of the DB --- is this appropriate?
+	 */
+	if (pg_class_ownercheck(relid, GetUserId()) ||
+		(pg_database_ownercheck(MyDatabaseId, GetUserId()) && !reltuple->relisshared))
+		return true;
+
+	relname = NameStr(reltuple->relname);
+
+	if ((options & VACOPT_VACUUM) != 0)
+	{
+		if (reltuple->relisshared)
+			ereport(WARNING,
+					(errmsg("skipping \"%s\" --- only superuser can vacuum it",
+							relname)));
+		else if (reltuple->relnamespace == PG_CATALOG_NAMESPACE)
+			ereport(WARNING,
+					(errmsg("skipping \"%s\" --- only superuser or database owner can vacuum it",
+							relname)));
+		else
+			ereport(WARNING,
+					(errmsg("skipping \"%s\" --- only table or database owner can vacuum it",
+							relname)));
+
+		/*
+		 * For VACUUM ANALYZE, both logs could show up, but just generate
+		 * information for VACUUM as that would be the first one to
+		 * process.
+		 */
+		return false;
+	}
+
+	if ((options & VACOPT_ANALYZE) != 0)
+	{
+		if (reltuple->relisshared)
+			ereport(WARNING,
+					(errmsg("skipping \"%s\" --- only superuser can analyze it",
+							relname)));
+		else if (reltuple->relnamespace == PG_CATALOG_NAMESPACE)
+			ereport(WARNING,
+					(errmsg("skipping \"%s\" --- only superuser or database owner can analyze it",
+							relname)));
+		else
+			ereport(WARNING,
+					(errmsg("skipping \"%s\" --- only table or database owner can analyze it",
+							relname)));
+	}
+
+	return false;
+}
+
+
 /*
  * Given a VacuumRelation, fill in the table OID if it wasn't specified,
  * and optionally add VacuumRelations for partitions of the table.
@@ -423,7 +496,7 @@ vacuum(int options, List *relations, VacuumParams *params,
  * are made in vac_context.
  */
 static List *
-expand_vacuum_rel(VacuumRelation *vrel)
+expand_vacuum_rel(VacuumRelation *vrel, int options)
 {
 	List	   *vacrels = NIL;
 	MemoryContext oldcontext;
@@ -456,6 +529,28 @@ expand_vacuum_rel(VacuumRelation *vrel)
 		 */
 		relid = RangeVarGetRelid(vrel->relation, AccessShareLock, false);
 
+		/*
+		 * To check whether the relation is a partitioned table and its
+		 * ownership, fetch its syscache entry.
+		 */
+		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+		if (!HeapTupleIsValid(tuple))
+			elog(ERROR, "cache lookup failed for relation %u", relid);
+		classForm = (Form_pg_class) GETSTRUCT(tuple);
+
+		/* check permissions of relation */
+		if (!vacuum_is_relation_owner(relid, classForm, options))
+		{
+			ReleaseSysCache(tuple);
+
+			/*
+			 * Release lock again with AccessShareLock -- see below for
+			 * the reason why this lock is released.
+			 */
+			UnlockRelationOid(relid, AccessShareLock);
+			return vacrels;
+		}
+
 		/*
 		 * Make a returnable VacuumRelation for this rel.
 		 */
@@ -465,14 +560,6 @@ expand_vacuum_rel(VacuumRelation *vrel)
 													  vrel->va_cols));
 		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * To check whether the relation is a partitioned table, fetch its
-		 * syscache entry.
-		 */
-		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
-		if (!HeapTupleIsValid(tuple))
-			elog(ERROR, "cache lookup failed for relation %u", relid);
-		classForm = (Form_pg_class) GETSTRUCT(tuple);
 		include_parts = (classForm->relkind == RELKIND_PARTITIONED_TABLE);
 		ReleaseSysCache(tuple);
 
@@ -530,7 +617,7 @@ expand_vacuum_rel(VacuumRelation *vrel)
  * the current database.  The list is built in vac_context.
  */
 static List *
-get_all_vacuum_rels(void)
+get_all_vacuum_rels(int options)
 {
 	List	   *vacrels = NIL;
 	Relation	pgclass;
@@ -545,6 +632,11 @@ get_all_vacuum_rels(void)
 	{
 		Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);
 		MemoryContext oldcontext;
+		Oid			relid = HeapTupleGetOid(tuple);
+
+		/* check permissions of relation */
+		if (!vacuum_is_relation_owner(relid, classForm, options))
+			continue;
 
 		/*
 		 * We include partitioned tables here; depending on which operation is
@@ -563,7 +655,7 @@ get_all_vacuum_rels(void)
 		 */
 		oldcontext = MemoryContextSwitchTo(vac_context);
 		vacrels = lappend(vacrels, makeVacuumRelation(NULL,
-													  HeapTupleGetOid(tuple),
+													  relid,
 													  NIL));
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -1436,30 +1528,17 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	}
 
 	/*
-	 * Check permissions.
-	 *
-	 * We allow the user to vacuum a table if he is superuser, the table
-	 * owner, or the database owner (but in the latter case, only if it's not
-	 * a shared relation).  pg_class_ownercheck includes the superuser case.
-	 *
-	 * Note we choose to treat permissions failure as a WARNING and keep
-	 * trying to vacuum the rest of the DB --- is this appropriate?
+	 * Check if relation needs to be skipped based on ownership.  This
+	 * check happens also when building the relation list to vacuum
+	 * for a manual operation, and needs to be done additionally here
+	 * as VACUUM could happen across multiple transactions where relation
+	 * ownership could have changed in-between.  Make sure to only generate
+	 * logs for VACUUM in this case.
 	 */
-	if (!(pg_class_ownercheck(RelationGetRelid(onerel), GetUserId()) ||
-		  (pg_database_ownercheck(MyDatabaseId, GetUserId()) && !onerel->rd_rel->relisshared)))
+	if (!vacuum_is_relation_owner(RelationGetRelid(onerel),
+								  onerel->rd_rel,
+								  options & VACOPT_VACUUM))
 	{
-		if (onerel->rd_rel->relisshared)
-			ereport(WARNING,
-					(errmsg("skipping \"%s\" --- only superuser can vacuum it",
-							RelationGetRelationName(onerel))));
-		else if (onerel->rd_rel->relnamespace == PG_CATALOG_NAMESPACE)
-			ereport(WARNING,
-					(errmsg("skipping \"%s\" --- only superuser or database owner can vacuum it",
-							RelationGetRelationName(onerel))));
-		else
-			ereport(WARNING,
-					(errmsg("skipping \"%s\" --- only table or database owner can vacuum it",
-							RelationGetRelationName(onerel))));
 		relation_close(onerel, lmode);
 		PopActiveSnapshot();
 		CommitTransactionCommand();
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 85d472f0a5..85b181bf2f 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,7 @@
 #define VACUUM_H
 
 #include "access/htup.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_type.h"
 #include "nodes/parsenodes.h"
@@ -185,6 +186,8 @@ extern void vacuum_set_xid_limits(Relation rel,
 					  MultiXactId *mxactFullScanLimit);
 extern void vac_update_datfrozenxid(void);
 extern void vacuum_delay_point(void);
+extern bool vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple,
+									 int options);
 
 /* in commands/vacuumlazy.c */
 extern void lazy_vacuum_rel(Relation onerel, int options,
diff --git a/src/test/isolation/expected/vacuum-conflict.out b/src/test/isolation/expected/vacuum-conflict.out
new file mode 100644
index 0000000000..06ac75ef23
--- /dev/null
+++ b/src/test/isolation/expected/vacuum-conflict.out
@@ -0,0 +1,149 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1_begin s1_lock s2_auth s2_vacuum s1_commit s2_reset
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can vacuum it
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s2_vacuum s1_lock s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can vacuum it
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s1_lock s2_vacuum s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can vacuum it
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s2_auth s2_vacuum s1_begin s1_lock s1_commit s2_reset
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can vacuum it
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s1_lock s2_auth s2_analyze s1_commit s2_reset
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can analyze it
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s2_analyze s1_lock s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can analyze it
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s1_lock s2_analyze s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can analyze it
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s2_auth s2_analyze s1_begin s1_lock s1_commit s2_reset
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can analyze it
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s1_lock s2_auth s2_vacuum s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_vacuum: VACUUM vacuum_tab; <waiting ...>
+step s1_commit: COMMIT;
+step s2_vacuum: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s2_auth s2_vacuum s1_lock s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s2_auth s1_lock s2_vacuum s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_vacuum: VACUUM vacuum_tab; <waiting ...>
+step s1_commit: COMMIT;
+step s2_vacuum: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s2_grant s2_auth s2_vacuum s1_begin s1_lock s1_commit s2_reset
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s1_lock s2_auth s2_analyze s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_analyze: ANALYZE vacuum_tab; <waiting ...>
+step s1_commit: COMMIT;
+step s2_analyze: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s2_auth s2_analyze s1_lock s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s2_auth s1_lock s2_analyze s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_analyze: ANALYZE vacuum_tab; <waiting ...>
+step s1_commit: COMMIT;
+step s2_analyze: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s2_grant s2_auth s2_analyze s1_begin s1_lock s1_commit s2_reset
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 48ae740739..c23b401225 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -66,6 +66,7 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+test: vacuum-conflict
 test: predicate-hash
 test: predicate-gist
 test: predicate-gin
diff --git a/src/test/isolation/specs/vacuum-conflict.spec b/src/test/isolation/specs/vacuum-conflict.spec
new file mode 100644
index 0000000000..9b45d26c65
--- /dev/null
+++ b/src/test/isolation/specs/vacuum-conflict.spec
@@ -0,0 +1,51 @@
+# Tests for locking conflicts with VACUUM and ANALYZE commands.
+
+setup
+{
+	CREATE ROLE regress_vacuum_conflict;
+	CREATE TABLE vacuum_tab (a int);
+}
+
+teardown
+{
+	DROP TABLE vacuum_tab;
+	DROP ROLE regress_vacuum_conflict;
+}
+
+session "s1"
+step "s1_begin"          { BEGIN; }
+step "s1_lock"           { LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE; }
+step "s1_commit"         { COMMIT; }
+
+session "s2"
+step "s2_grant"          { ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict; }
+step "s2_auth"           { SET ROLE regress_vacuum_conflict; }
+step "s2_vacuum"         { VACUUM vacuum_tab; }
+step "s2_analyze"        { ANALYZE vacuum_tab; }
+step "s2_reset"          { RESET ROLE; }
+
+# The role doesn't have privileges to vacuum the table, so VACUUM should
+# immediately skip the table without waiting for a lock.
+permutation "s1_begin" "s1_lock" "s2_auth" "s2_vacuum" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_auth" "s2_vacuum" "s1_lock" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_auth" "s1_lock" "s2_vacuum" "s1_commit" "s2_reset"
+permutation "s2_auth" "s2_vacuum" "s1_begin" "s1_lock" "s1_commit" "s2_reset"
+
+# Same as previously for ANALYZE
+permutation "s1_begin" "s1_lock" "s2_auth" "s2_analyze" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_auth" "s2_analyze" "s1_lock" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_auth" "s1_lock" "s2_analyze" "s1_commit" "s2_reset"
+permutation "s2_auth" "s2_analyze" "s1_begin" "s1_lock" "s1_commit" "s2_reset"
+
+# The role has privileges to vacuum the table, VACUUM will block if
+# another session holds a lock on the table and succeed in all cases.
+permutation "s1_begin" "s2_grant" "s1_lock" "s2_auth" "s2_vacuum" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s2_vacuum" "s1_lock" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s1_lock" "s2_vacuum" "s1_commit" "s2_reset"
+permutation "s2_grant" "s2_auth" "s2_vacuum" "s1_begin" "s1_lock" "s1_commit" "s2_reset"
+
+# Same as previously for ANALYZE
+permutation "s1_begin" "s2_grant" "s1_lock" "s2_auth" "s2_analyze" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s2_analyze" "s1_lock" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s1_lock" "s2_analyze" "s1_commit" "s2_reset"
+permutation "s2_grant" "s2_auth" "s2_analyze" "s1_begin" "s1_lock" "s1_commit" "s2_reset"
-- 
2.18.0

Bossart, Nathan

bossartn@amazon.com

over 7 years ago

In reply to: Michael Paquier (#8)

Re: Improve behavior of concurrent ANALYZE/VACUUM

On 8/20/18, 8:29 PM, "Michael Paquier" <michael@paquier.xyz> wrote:

In short, my vote would be to maintain the current behavior for now
and to bring up any logging improvements separately.

On the other hand, it would be useful for the user to know exactly what
is getting skipped. For example if VACUUM ANALYZE is used then both
operations would happen, but now the user would only know that VACUUM
has been skipped, and may miss the fact that ANALYZE was not attempted.
Let's do as you suggest at the end, aka if both VACOPT_VACUUM and
VACOPT_ANALYZE are passed down to vacuum_is_relation_owner, then only
the log for VACUUM is generated, which is consistent. Any other changes
could happen later on if necessary.

Sounds good.

If we don't want to change the current behavior, then one simple
solution would be close to what you mention, aka skip adding the
partitioned table to the list, include *all* the partitions in the list
as we cannot sanely check their ACLs at this stage, and rely on the
checks already happening in vacuum_rel() and analyze_rel(). This would
cause the original early lock attempts to not be solved for partitions,
which is why the approach taken in the patches makes the most sense.

I think my biggest concern with this approach is that we'd be
introducing inconsistent behavior whenever there are concurrent
changes. If a user never had permissions to VACUUM the partitioned
table, the partitions are skipped outright. However, if the user
loses permissions to VACUUM the partitioned table between
expand_vacuum_rel() and vacuum_rel(), we'll still attempt to VACUUM
each individual partition.

I'll admit I don't have a great alternative proposal that doesn't
involve adding deadlock risk or complexity, but it still seems worth
mulling over.

I have split the patch into two parts:
- 0001 includes new tests which generate WARNING messages for VACUUM,
ANALYZE and VACUUM (ANALYZE). That's useful separately.

0001 looks good to me.

- 0002 is the original patch discussed here.

I'd suggest even splitting 0002 into two patches: one for refactoring
the existing permissions checks into vacuum_is_relation_owner() and
another for the new checks.

+# The role doesn't have privileges to vacuum the table, so VACUUM should
+# immediately skip the table without waiting for a lock.

Can we add tests for concurrent changes that cause the relation to be
skipped in vacuum_rel() and analyze_rel() instead of
expand_vacuum_rel()?

Nathan

#10

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Bossart, Nathan (#9)

Re: Improve behavior of concurrent ANALYZE/VACUUM

On Tue, Aug 21, 2018 at 04:01:50PM +0000, Bossart, Nathan wrote:

I think my biggest concern with this approach is that we'd be
introducing inconsistent behavior whenever there are concurrent
changes. If a user never had permissions to VACUUM the partitioned
table, the partitions are skipped outright. However, if the user
loses permissions to VACUUM the partitioned table between
expand_vacuum_rel() and vacuum_rel(), we'll still attempt to VACUUM
each individual partition.

I'll admit I don't have a great alternative proposal that doesn't
involve adding deadlock risk or complexity, but it still seems worth
mulling over.

That applies only to a manual VACUUM/ANALYZE which directly lists the
relation in question. If running a system-wide VACUUM then all the
relations are still processed. This is rather an edge case in my opinion,
but I don't mind mulling it over (as you say). So please let me
think about it for a couple of days. Right now I don't see a smart
solution which does not create risks of lock upgrades and deadlocks,
though there may be one able to preserve the existing behavior.

I have split the patch into two parts:
- 0001 includes new tests which generate WARNING messages for VACUUM,
ANALYZE and VACUUM (ANALYZE). That's useful separately.

0001 looks good to me.

Thanks, I have pushed this one.

- 0002 is the original patch discussed here.

I'd suggest even splitting 0002 into two patches: one for refactoring
the existing permissions checks into vacuum_is_relation_owner() and
another for the new checks.

Hmmm. The second patch also changes some comment blocks when calling
vacuum_is_relation_owner(), so we would end up changing the same code
areas twice, resulting in more code churn for no real gain.

+# The role doesn't have privileges to vacuum the table, so VACUUM should
+# immediately skip the table without waiting for a lock.
Can we add tests for concurrent changes that cause the relation to be
skipped in vacuum_rel() and analyze_rel() instead of
expand_vacuum_rel()?

Doing that deterministically with concurrent tests looks difficult to me,
as doing ALTER TABLE OWNER TO on a relation in a first session causes a
second session running VACUUM to block in expand_vacuum_rel(), be it
with a plain table or a partitioned table (doing the ALTER TABLE on a
leaf will block scanning the parent as well).
--
Michael

#11

Bossart, Nathan

bossartn@amazon.com

over 7 years ago

In reply to: Michael Paquier (#10)

Re: Improve behavior of concurrent ANALYZE/VACUUM

On 8/21/18, 7:44 PM, "Michael Paquier" <michael@paquier.xyz> wrote:

On Tue, Aug 21, 2018 at 04:01:50PM +0000, Bossart, Nathan wrote:

I think my biggest concern with this approach is that we'd be
introducing inconsistent behavior whenever there are concurrent
changes. If a user never had permissions to VACUUM the partitioned
table, the partitions are skipped outright. However, if the user
loses permissions to VACUUM the partitioned table between
expand_vacuum_rel() and vacuum_rel(), we'll still attempt to VACUUM
each individual partition.

I'll admit I don't have a great alternative proposal that doesn't
involve adding deadlock risk or complexity, but it still seems worth
mulling over.

That applies only to a manual VACUUM/ANALYZE which directly lists the
relation in question. If running a system-wide VACUUM then all the
relations are still processed. This is rather an edge case in my opinion,
but I don't mind mulling it over (as you say). So please let me
think about it for a couple of days. Right now I don't see a smart
solution which does not create risks of lock upgrades and deadlocks,
though there may be one able to preserve the existing behavior.

Right. If we don't come up with anything, the behavior change for
this edge case is probably reasonable as long as we update the
documentation like you proposed earlier.

- 0002 is the original patch discussed here.

I'd suggest even splitting 0002 into two patches: one for refactoring
the existing permissions checks into vacuum_is_relation_owner() and
another for the new checks.

Hmmm. The second patch also changes some comment blocks when calling
vacuum_is_relation_owner(), so we would end up changing the same code
areas twice, resulting in more code churn for no real gain.

I see. I only made this suggestion so that we could get some of the
easy changes out of the way, but there's no need if it's just adding
unnecessary code churn.

+# The role doesn't have privileges to vacuum the table, so VACUUM should
+# immediately skip the table without waiting for a lock.
Can we add tests for concurrent changes that cause the relation to be
skipped in vacuum_rel() and analyze_rel() instead of
expand_vacuum_rel()?
Doing that deterministically with concurrent tests looks difficult to me,
as doing ALTER TABLE OWNER TO on a relation in a first session causes a
second session running VACUUM to block in expand_vacuum_rel(), be it
with a plain table or a partitioned table (doing the ALTER TABLE on a
leaf will block scanning the parent as well).

I think this is doable by locking the table in SHARE mode. That won't
conflict with the AccessShareLock that expand_vacuum_rel() obtains,
but it will conflict with the ShareUpdateExclusiveLock or
AccessExclusiveLock that vacuum_rel() takes.

session 1> BEGIN; LOCK test IN SHARE MODE;
session 2> VACUUM test;
session 1> ALTER TABLE test OWNER TO not_session_2_user; COMMIT;
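
The reasoning behind the SHARE-mode trick can be checked against PostgreSQL's table-level lock conflict rules. The following is a small standalone sketch of that conflict matrix; the LM_* names, conflict_table and lock_conflicts() are illustrative and are not taken from the backend's lock manager source.

```c
#include <assert.h>
#include <stdbool.h>

/* Table-level lock modes, weakest to strongest (names are illustrative). */
typedef enum {
    LM_ACCESS_SHARE = 1,
    LM_ROW_SHARE,
    LM_ROW_EXCLUSIVE,
    LM_SHARE_UPDATE_EXCLUSIVE,
    LM_SHARE,
    LM_SHARE_ROW_EXCLUSIVE,
    LM_EXCLUSIVE,
    LM_ACCESS_EXCLUSIVE
} LockMode;

#define LM_BIT(m) (1u << (m))

/* For each requested mode, the set of held modes it conflicts with. */
static const unsigned conflict_table[] = {
    [LM_ACCESS_SHARE] = LM_BIT(LM_ACCESS_EXCLUSIVE),
    [LM_ROW_SHARE] = LM_BIT(LM_EXCLUSIVE) | LM_BIT(LM_ACCESS_EXCLUSIVE),
    [LM_ROW_EXCLUSIVE] = LM_BIT(LM_SHARE) | LM_BIT(LM_SHARE_ROW_EXCLUSIVE) |
        LM_BIT(LM_EXCLUSIVE) | LM_BIT(LM_ACCESS_EXCLUSIVE),
    [LM_SHARE_UPDATE_EXCLUSIVE] = LM_BIT(LM_SHARE_UPDATE_EXCLUSIVE) |
        LM_BIT(LM_SHARE) | LM_BIT(LM_SHARE_ROW_EXCLUSIVE) |
        LM_BIT(LM_EXCLUSIVE) | LM_BIT(LM_ACCESS_EXCLUSIVE),
    [LM_SHARE] = LM_BIT(LM_ROW_EXCLUSIVE) | LM_BIT(LM_SHARE_UPDATE_EXCLUSIVE) |
        LM_BIT(LM_SHARE_ROW_EXCLUSIVE) | LM_BIT(LM_EXCLUSIVE) |
        LM_BIT(LM_ACCESS_EXCLUSIVE),
    [LM_SHARE_ROW_EXCLUSIVE] = LM_BIT(LM_ROW_EXCLUSIVE) |
        LM_BIT(LM_SHARE_UPDATE_EXCLUSIVE) | LM_BIT(LM_SHARE) |
        LM_BIT(LM_SHARE_ROW_EXCLUSIVE) | LM_BIT(LM_EXCLUSIVE) |
        LM_BIT(LM_ACCESS_EXCLUSIVE),
    [LM_EXCLUSIVE] = LM_BIT(LM_ROW_SHARE) | LM_BIT(LM_ROW_EXCLUSIVE) |
        LM_BIT(LM_SHARE_UPDATE_EXCLUSIVE) | LM_BIT(LM_SHARE) |
        LM_BIT(LM_SHARE_ROW_EXCLUSIVE) | LM_BIT(LM_EXCLUSIVE) |
        LM_BIT(LM_ACCESS_EXCLUSIVE),
    [LM_ACCESS_EXCLUSIVE] = LM_BIT(LM_ACCESS_SHARE) | LM_BIT(LM_ROW_SHARE) |
        LM_BIT(LM_ROW_EXCLUSIVE) | LM_BIT(LM_SHARE_UPDATE_EXCLUSIVE) |
        LM_BIT(LM_SHARE) | LM_BIT(LM_SHARE_ROW_EXCLUSIVE) |
        LM_BIT(LM_EXCLUSIVE) | LM_BIT(LM_ACCESS_EXCLUSIVE),
};

/* True if a request for `requested` must wait behind a held `held` lock. */
static bool
lock_conflicts(LockMode held, LockMode requested)
{
    assert(held >= LM_ACCESS_SHARE && held <= LM_ACCESS_EXCLUSIVE);
    assert(requested >= LM_ACCESS_SHARE && requested <= LM_ACCESS_EXCLUSIVE);
    return (conflict_table[requested] & LM_BIT(held)) != 0;
}
```

With session 1 holding LM_SHARE, a request for LM_ACCESS_SHARE (the lookup in expand_vacuum_rel()) does not conflict, while LM_SHARE_UPDATE_EXCLUSIVE (plain VACUUM) and LM_ACCESS_EXCLUSIVE (VACUUM FULL) both do, so session 2 gets past the list-building step and then waits in vacuum_rel().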

Nathan

#12

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Bossart, Nathan (#11)

Re: Improve behavior of concurrent ANALYZE/VACUUM

On Wed, Aug 22, 2018 at 02:17:44AM +0000, Bossart, Nathan wrote:

I think this is doable by locking the table in SHARE mode. That won't
conflict with the AccessShareLock that expand_vacuum_rel() obtains,
but it will conflict with the ShareUpdateExclusiveLock or
AccessExclusiveLock that vacuum_rel() takes.

Good point. Still, is that really worth adding? This implies a test
which has at least two roles, with ownership switched back and forth
between them. At the least, that should go in a different isolation
spec file so as not to complicate the first one.
--
Michael

#13

Bossart, Nathan

bossartn@amazon.com

over 7 years ago

In reply to: Michael Paquier (#12)

1 attachment(s)

Re: Improve behavior of concurrent ANALYZE/VACUUM

On 8/22/18, 12:37 AM, "Michael Paquier" <michael@paquier.xyz> wrote:

On Wed, Aug 22, 2018 at 02:17:44AM +0000, Bossart, Nathan wrote:

I think this is doable by locking the table in SHARE mode. That won't
conflict with the AccessShareLock that expand_vacuum_rel() obtains,
but it will conflict with the ShareUpdateExclusiveLock or
AccessExclusiveLock that vacuum_rel() takes.

Good point. Still, is that really worth adding? This implies a test
which has at least two roles, with ownership switched back and forth
between them. At the least, that should go in a different isolation
spec file so as not to complicate the first one.

I think so, since these are the only ownership checks we do on
individual partitions. Another simple way to test this would be to
create a partitioned table with a different owner than the partitions
and to run VACUUM as the partitioned table owner. In this case, we
would still rely on the checks in vacuum_rel() and analyze_rel(). IMO
this is a reason to avoid skipping gathering the individual partitions
based upon the ownership of the partitioned table. It's true that
this wouldn't fix the locking issue for partitions, but the
aforementioned edge case is still present with 0002 anyway. Plus, it
would add a bit more consistency to partition handling in VACUUM.

I've attached a patch that applies on top of 0002 that adds a simple
test to exercise the checks in vacuum_rel() and analyze_rel().

+		/*
+		 * For VACUUM ANALYZE, both logs could show up, but just generate
+		 * information for VACUUM as that would be the first one to
+		 * process.
+		 */
+		return;

We should probably return false here.

Nathan

Attachments:

vacuum_permission_checks.patch (application/octet-stream)

diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index c9be71ef60..45e69573ce 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -119,9 +119,21 @@ ANALYZE (nonexistant-arg) does_not_exist;
 ERROR:  syntax error at or near "nonexistant"
 LINE 1: ANALYZE (nonexistant-arg) does_not_exist;
                  ^
+-- permission checks in vacuum_rel() and analyze_rel()
+CREATE ROLE vacparted_owner;
+ALTER TABLE vacparted OWNER TO vacparted_owner;
+SET ROLE vacparted_owner;
+VACUUM vacparted;
+WARNING:  skipping "vacparted1" --- only table or database owner can vacuum it
+ANALYZE vacparted;
+WARNING:  skipping "vacparted1" --- only table or database owner can analyze it
+VACUUM (ANALYZE) vacparted;
+WARNING:  skipping "vacparted1" --- only table or database owner can vacuum it
+RESET ROLE;
 DROP TABLE vaccluster;
 DROP TABLE vactst;
 DROP TABLE vacparted;
+DROP ROLE vacparted_owner;
 -- relation ownership, WARNING logs generated as all are skipped.
 CREATE TABLE vacowned (a int);
 CREATE ROLE regress_vacuum;
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 0feff7c413..9e4d081da9 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -93,9 +93,19 @@ ANALYZE vactst (i), vacparted (does_not_exist);
 ANALYZE (VERBOSE) does_not_exist;
 ANALYZE (nonexistant-arg) does_not_exist;
 
+-- permission checks in vacuum_rel() and analyze_rel()
+CREATE ROLE vacparted_owner;
+ALTER TABLE vacparted OWNER TO vacparted_owner;
+SET ROLE vacparted_owner;
+VACUUM vacparted;
+ANALYZE vacparted;
+VACUUM (ANALYZE) vacparted;
+RESET ROLE;
+
 DROP TABLE vaccluster;
 DROP TABLE vactst;
 DROP TABLE vacparted;
+DROP ROLE vacparted_owner;
 
 -- relation ownership, WARNING logs generated as all are skipped.
 CREATE TABLE vacowned (a int);

#14

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Bossart, Nathan (#13)

1 attachment(s)

Re: Improve behavior of concurrent ANALYZE/VACUUM

On Wed, Aug 22, 2018 at 03:49:16PM +0000, Bossart, Nathan wrote:

I think so, since these are the only ownership checks we do on
individual partitions. Another simple way to test this would be to
create a partitioned table with a different owner than the partitions
and to run VACUUM as the partitioned table owner. In this case, we
would still rely on the checks in vacuum_rel() and analyze_rel(). IMO
this is a reason to avoid skipping gathering the individual partitions
based upon the ownership of the partitioned table. It's true that
this wouldn't fix the locking issue for partitions, but the
aforementioned edge case is still present with 0002 anyway. Plus, it
would add a bit more consistency to partition handling in VACUUM.

Normal regression tests are less costly than isolation tests, so let's
use them where possible. What you attached covers only a portion of
all the scenarios though, as it is also interesting to see what
happens if another user owns only the partitioned table, only one
partition, or the partitioned table as well as at least one partition. I
have extended your patch as attached. It applies on top of HEAD. Once
applied with the other patch, one can easily spot the difference in
behavior, and this stresses the ownership checks in vacuum_rel() and
analyze_rel() as well. Perhaps we could begin with that?

We should probably return false here.

Oh, my compiler complained here as well. Fixed it on my branch.
--
Michael

Attachments:

vacuum_permission_checks_v2.patch (text/x-diff; charset=us-ascii)

diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index c9be71ef60..57e451738c 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -124,6 +124,9 @@ DROP TABLE vactst;
 DROP TABLE vacparted;
 -- relation ownership, WARNING logs generated as all are skipped.
 CREATE TABLE vacowned (a int);
+CREATE TABLE vacowned_parted (a int) PARTITION BY LIST (a);
+CREATE TABLE vacowned_part1 PARTITION OF vacowned_parted FOR VALUES IN (1);
+CREATE TABLE vacowned_part2 PARTITION OF vacowned_parted FOR VALUES IN (2);
 CREATE ROLE regress_vacuum;
 SET ROLE regress_vacuum;
 -- Simple table
@@ -147,6 +150,102 @@ ANALYZE pg_catalog.pg_authid;
 WARNING:  skipping "pg_authid" --- only superuser can analyze it
 VACUUM (ANALYZE) pg_catalog.pg_authid;
 WARNING:  skipping "pg_authid" --- only superuser can vacuum it
+-- partitioned table and its partitions, no ownership.
+-- Relations are not listed in a single command to test ownership
+-- independently.
+VACUUM vacowned_parted;
+WARNING:  skipping "vacowned_parted" --- only table or database owner can vacuum it
+WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
+WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+VACUUM vacowned_part1;
+WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
+VACUUM vacowned_part2;
+WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+ANALYZE vacowned_parted;
+WARNING:  skipping "vacowned_parted" --- only table or database owner can analyze it
+WARNING:  skipping "vacowned_part1" --- only table or database owner can analyze it
+WARNING:  skipping "vacowned_part2" --- only table or database owner can analyze it
+ANALYZE vacowned_part1;
+WARNING:  skipping "vacowned_part1" --- only table or database owner can analyze it
+ANALYZE vacowned_part2;
+WARNING:  skipping "vacowned_part2" --- only table or database owner can analyze it
+VACUUM (ANALYZE) vacowned_parted;
+WARNING:  skipping "vacowned_parted" --- only table or database owner can vacuum it
+WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
+WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+VACUUM (ANALYZE) vacowned_part1;
+WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
+VACUUM (ANALYZE) vacowned_part2;
+WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+RESET ROLE;
+-- Partitioned table and one partition owned by other user.
+ALTER TABLE vacowned_parted OWNER TO regress_vacuum;
+ALTER TABLE vacowned_part1 OWNER TO regress_vacuum;
+SET ROLE regress_vacuum;
+VACUUM vacowned_parted;
+WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+VACUUM vacowned_part1;
+VACUUM vacowned_part2;
+WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+ANALYZE vacowned_parted;
+WARNING:  skipping "vacowned_part2" --- only table or database owner can analyze it
+ANALYZE vacowned_part1;
+ANALYZE vacowned_part2;
+WARNING:  skipping "vacowned_part2" --- only table or database owner can analyze it
+VACUUM (ANALYZE) vacowned_parted;
+WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+VACUUM (ANALYZE) vacowned_part1;
+VACUUM (ANALYZE) vacowned_part2;
+WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+RESET ROLE;
+-- Only one partition owned by other user
+ALTER TABLE vacowned_parted OWNER TO CURRENT_USER;
+SET ROLE regress_vacuum;
+VACUUM vacowned_parted;
+WARNING:  skipping "vacowned_parted" --- only table or database owner can vacuum it
+WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+VACUUM vacowned_part1;
+VACUUM vacowned_part2;
+WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+ANALYZE vacowned_parted;
+WARNING:  skipping "vacowned_parted" --- only table or database owner can analyze it
+WARNING:  skipping "vacowned_part2" --- only table or database owner can analyze it
+ANALYZE vacowned_part1;
+ANALYZE vacowned_part2;
+WARNING:  skipping "vacowned_part2" --- only table or database owner can analyze it
+VACUUM (ANALYZE) vacowned_parted;
+WARNING:  skipping "vacowned_parted" --- only table or database owner can vacuum it
+WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+VACUUM (ANALYZE) vacowned_part1;
+VACUUM (ANALYZE) vacowned_part2;
+WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+RESET ROLE;
+-- Only partitioned table owned by other user
+ALTER TABLE vacowned_parted OWNER TO regress_vacuum;
+ALTER TABLE vacowned_part1 OWNER TO CURRENT_USER;
+SET ROLE regress_vacuum;
+VACUUM vacowned_parted;
+WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
+WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+VACUUM vacowned_part1;
+WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
+VACUUM vacowned_part2;
+WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+ANALYZE vacowned_parted;
+WARNING:  skipping "vacowned_part1" --- only table or database owner can analyze it
+WARNING:  skipping "vacowned_part2" --- only table or database owner can analyze it
+ANALYZE vacowned_part1;
+WARNING:  skipping "vacowned_part1" --- only table or database owner can analyze it
+ANALYZE vacowned_part2;
+WARNING:  skipping "vacowned_part2" --- only table or database owner can analyze it
+VACUUM (ANALYZE) vacowned_parted;
+WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
+WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+VACUUM (ANALYZE) vacowned_part1;
+WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
+VACUUM (ANALYZE) vacowned_part2;
+WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
 RESET ROLE;
 DROP TABLE vacowned;
+DROP TABLE vacowned_parted;
 DROP ROLE regress_vacuum;
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 0feff7c413..fc4c4a3bcc 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -99,6 +99,9 @@ DROP TABLE vacparted;
 
 -- relation ownership, WARNING logs generated as all are skipped.
 CREATE TABLE vacowned (a int);
+CREATE TABLE vacowned_parted (a int) PARTITION BY LIST (a);
+CREATE TABLE vacowned_part1 PARTITION OF vacowned_parted FOR VALUES IN (1);
+CREATE TABLE vacowned_part2 PARTITION OF vacowned_parted FOR VALUES IN (2);
 CREATE ROLE regress_vacuum;
 SET ROLE regress_vacuum;
 -- Simple table
@@ -113,6 +116,60 @@ VACUUM (ANALYZE) pg_catalog.pg_class;
 VACUUM pg_catalog.pg_authid;
 ANALYZE pg_catalog.pg_authid;
 VACUUM (ANALYZE) pg_catalog.pg_authid;
+-- partitioned table and its partitions, no ownership.
+-- Relations are not listed in a single command to test ownership
+-- independently.
+VACUUM vacowned_parted;
+VACUUM vacowned_part1;
+VACUUM vacowned_part2;
+ANALYZE vacowned_parted;
+ANALYZE vacowned_part1;
+ANALYZE vacowned_part2;
+VACUUM (ANALYZE) vacowned_parted;
+VACUUM (ANALYZE) vacowned_part1;
+VACUUM (ANALYZE) vacowned_part2;
+RESET ROLE;
+-- Partitioned table and one partition owned by other user.
+ALTER TABLE vacowned_parted OWNER TO regress_vacuum;
+ALTER TABLE vacowned_part1 OWNER TO regress_vacuum;
+SET ROLE regress_vacuum;
+VACUUM vacowned_parted;
+VACUUM vacowned_part1;
+VACUUM vacowned_part2;
+ANALYZE vacowned_parted;
+ANALYZE vacowned_part1;
+ANALYZE vacowned_part2;
+VACUUM (ANALYZE) vacowned_parted;
+VACUUM (ANALYZE) vacowned_part1;
+VACUUM (ANALYZE) vacowned_part2;
+RESET ROLE;
+-- Only one partition owned by other user
+ALTER TABLE vacowned_parted OWNER TO CURRENT_USER;
+SET ROLE regress_vacuum;
+VACUUM vacowned_parted;
+VACUUM vacowned_part1;
+VACUUM vacowned_part2;
+ANALYZE vacowned_parted;
+ANALYZE vacowned_part1;
+ANALYZE vacowned_part2;
+VACUUM (ANALYZE) vacowned_parted;
+VACUUM (ANALYZE) vacowned_part1;
+VACUUM (ANALYZE) vacowned_part2;
+RESET ROLE;
+-- Only partitioned table owned by other user
+ALTER TABLE vacowned_parted OWNER TO regress_vacuum;
+ALTER TABLE vacowned_part1 OWNER TO CURRENT_USER;
+SET ROLE regress_vacuum;
+VACUUM vacowned_parted;
+VACUUM vacowned_part1;
+VACUUM vacowned_part2;
+ANALYZE vacowned_parted;
+ANALYZE vacowned_part1;
+ANALYZE vacowned_part2;
+VACUUM (ANALYZE) vacowned_parted;
+VACUUM (ANALYZE) vacowned_part1;
+VACUUM (ANALYZE) vacowned_part2;
 RESET ROLE;
 DROP TABLE vacowned;
+DROP TABLE vacowned_parted;
 DROP ROLE regress_vacuum;

#15

Bossart, Nathan

bossartn@amazon.com

over 7 years ago

In reply to: Michael Paquier (#14)

Re: Improve behavior of concurrent ANALYZE/VACUUM

On 8/23/18, 12:08 AM, "Michael Paquier" <michael@paquier.xyz> wrote:

Normal regression tests are less costly than isolation tests, so let's
use them where possible. What you attached covers only a portion of
all the scenarios though, as it is also interesting to see what
happens if another user owns only the partitioned table, only one
partition, or the partitioned table as well as at least one partition. I
have extended your patch as attached. It applies on top of HEAD. Once
applied with the other patch, one can easily spot the difference in
behavior, and this stresses the ownership checks in vacuum_rel() and
analyze_rel() as well. Perhaps we could begin with that?

This seems reasonable to me. I think establishing the expected
behavior here is a good idea.

Nathan

#16

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Bossart, Nathan (#15)

1 attachment(s)

Re: Improve behavior of concurrent ANALYZE/VACUUM

On Thu, Aug 23, 2018 at 09:53:57PM +0000, Bossart, Nathan wrote:

This seems reasonable to me. I think establishing the expected
behavior here is a good idea.

Thanks, I have pushed the new test series, and reused it to check the
new version of the main patch, which is attached. I have added a commit
message and I have indented the thing.

After pondering it, I have also reworked the portion for
partitioned tables so that the list of partitions processed is unchanged
from HEAD, and we keep a behavior consistent with past versions.
If VACUUM processing for partitioned tables was something new in 11, I
think that we could have considered it, but silently changing something
that people may have relied on for more than a year now is not very
appealing.
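
As a rough illustration of the behavior being preserved, here is a toy model of the two phases: an early ownership check on the listed relation only, followed by a late per-relation check. The Rel struct and both function names are hypothetical, owners are plain integers, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified relation: a name and a numeric owner id. */
typedef struct {
    const char *name;
    int         owner;
} Rel;

/*
 * Phase 1 (modelled on list building): the listed parent is queued only
 * if the caller owns it, but every partition is queued unconditionally.
 * Returns the number of queue entries written.
 */
static size_t
expand_listed_rel(const Rel *parent, const Rel *parts, size_t nparts,
                  int user, const Rel **queue)
{
    size_t n = 0;
    size_t i;

    assert(parent != NULL && queue != NULL);

    if (parent->owner == user)
        queue[n++] = parent;
    for (i = 0; i < nparts; i++)
        queue[n++] = &parts[i];
    return n;
}

/*
 * Phase 2 (modelled on per-relation processing): each queued relation is
 * checked again, and only those the caller owns are counted as processed.
 */
static size_t
process_queue(const Rel **queue, size_t n, int user)
{
    size_t done = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (queue[i]->owner == user)
            done++;
    return done;
}
```

In this model, a user who owns a single partition but not the parent still gets that partition processed when listing the parent, while the parent and the other partitions are skipped.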

I can personally imagine data models with multiple layers of partitions
where the top-most parent has the most restricted access, and
access becomes more permissive the further down you go. For example, let's
imagine a table listing a population, which is split by city. The
top-most partitioned table covers the whole country, which, say, only
the president has access to. Then there are partitions which can be
accessed only by the mayor of each city. In this case, if a mayor
does a VACUUM FULL of their leaf partition, then a full read would be
blocked even for the president.

The reverse is technically possible, aka the top-most parent is not
really restrictive and the leaves get more and more restricted, but
logically that does not make much sense, as the top-most parent would be
useless for any kind of operation such as a full table scan.

Still, in the first case, say that each city mayor uses the same
application layer which vacuums the top-most parent; then we'd break
something that worked in 10 and 11.
--
Michael

Attachments:

vacuum-locks-v5.patch (text/x-diff; charset=us-ascii)

From 4f0b6da8236286ab1487d2a537ae777aa3a6490b Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Fri, 24 Aug 2018 11:02:19 +0900
Subject: [PATCH] Improve VACUUM and ANALYZE by avoiding early lock queue

A caller of VACUUM can perform early lookup obtention which can cause
other sessions to block on the request done, causing potentially DOS
attacks as even a non-privileged user can attempt a truncation of a
critical catalog table to block even all incoming connection attempts.

Contrary to TRUNCATE, a client could attempt a system-wide VACUUM after
building the list of relations to VACUUM, which can cause vacuum_rel()
to try to lock the relation but the thing would just lock.  When the
client specifies a list of relations and the relation needs to be
skipped, fail hard so as there is no conflict with any relation a user
has no rights to work on.

vacuum_rel() already had the sanity checks needed, except that those
were applied too late.  This commit refactors the code so as relation
skips are checked beforehand, making it safer to avoid too early lock,
for both manual VACUUM with and without a list of relations specified.

An isolation test is added emulating the fact that early locks do not
happen anymore.

When a partitioned table is listed in a manual VACUUM or ANALYZE
command, its full list of partitions is fetched, all partition gets
added to the list to work on, and then each one of them is processed one
by one, with ownership checks happening at the later phase of
vacuum_rel() or analyze_rel().  Trying to do early ownership checks for
each partition is proving to be tedious as this would result in deadlock
risks with lock upgrades, and skipping all partitions if the listed
partitioned table is not owned would result in a behavior change
compared to how Postgres 10 has implemented vacuum for partitioned
tables.  The original problem reported related to early lock queue for
critical relations is fixed anyway, so priority is given to avoiding a
backward-incompatible behavior.

Reported-by: Lloyd Albin, Jeremy Schneider
Author: Michael Paquier
Reviewed by: Nathan Bossart, Kyotaro Horiguchi
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org
---
 src/backend/commands/analyze.c                |  28 +---
 src/backend/commands/vacuum.c                 | 156 +++++++++++++-----
 src/include/commands/vacuum.h                 |   3 +
 .../isolation/expected/vacuum-conflict.out    | 149 +++++++++++++++++
 src/test/isolation/isolation_schedule         |   1 +
 src/test/isolation/specs/vacuum-conflict.spec |  51 ++++++
 6 files changed, 328 insertions(+), 60 deletions(-)
 create mode 100644 src/test/isolation/expected/vacuum-conflict.out
 create mode 100644 src/test/isolation/specs/vacuum-conflict.spec

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 3e148f03d0..edbdce81f2 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -196,27 +196,17 @@ analyze_rel(Oid relid, RangeVar *relation, int options,
 	}
 
 	/*
-	 * Check permissions --- this should match vacuum's check!
+	 * Check if relation needs to be skipped based on ownership.  This check
+	 * happens also when building the relation list to analyze for a manual
+	 * operation, and needs to be done additionally here as ANALYZE could
+	 * happen across multiple transactions where relation ownership could have
+	 * changed in-between.  Make sure to generate only logs for ANALYZE in
+	 * this case.
 	 */
-	if (!(pg_class_ownercheck(RelationGetRelid(onerel), GetUserId()) ||
-		  (pg_database_ownercheck(MyDatabaseId, GetUserId()) && !onerel->rd_rel->relisshared)))
+	if (!vacuum_is_relation_owner(RelationGetRelid(onerel),
+								  onerel->rd_rel,
+								  options & VACOPT_ANALYZE))
 	{
-		/* No need for a WARNING if we already complained during VACUUM */
-		if (!(options & VACOPT_VACUUM))
-		{
-			if (onerel->rd_rel->relisshared)
-				ereport(WARNING,
-						(errmsg("skipping \"%s\" --- only superuser can analyze it",
-								RelationGetRelationName(onerel))));
-			else if (onerel->rd_rel->relnamespace == PG_CATALOG_NAMESPACE)
-				ereport(WARNING,
-						(errmsg("skipping \"%s\" --- only superuser or database owner can analyze it",
-								RelationGetRelationName(onerel))));
-			else
-				ereport(WARNING,
-						(errmsg("skipping \"%s\" --- only table or database owner can analyze it",
-								RelationGetRelationName(onerel))));
-		}
 		relation_close(onerel, ShareUpdateExclusiveLock);
 		return;
 	}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index ee32fe8871..f166509734 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -68,8 +68,8 @@ static BufferAccessStrategy vac_strategy;
 
 
 /* non-export function prototypes */
-static List *expand_vacuum_rel(VacuumRelation *vrel);
-static List *get_all_vacuum_rels(void);
+static List *expand_vacuum_rel(VacuumRelation *vrel, int options);
+static List *get_all_vacuum_rels(int options);
 static void vac_truncate_clog(TransactionId frozenXID,
 				  MultiXactId minMulti,
 				  TransactionId lastSaneFrozenXid,
@@ -257,7 +257,7 @@ vacuum(int options, List *relations, VacuumParams *params,
 			List	   *sublist;
 			MemoryContext old_context;
 
-			sublist = expand_vacuum_rel(vrel);
+			sublist = expand_vacuum_rel(vrel, options);
 			old_context = MemoryContextSwitchTo(vac_context);
 			newrels = list_concat(newrels, sublist);
 			MemoryContextSwitchTo(old_context);
@@ -265,7 +265,7 @@ vacuum(int options, List *relations, VacuumParams *params,
 		relations = newrels;
 	}
 	else
-		relations = get_all_vacuum_rels();
+		relations = get_all_vacuum_rels(options);
 
 	/*
 	 * Decide whether we need to start/commit our own transactions.
@@ -408,6 +408,80 @@ vacuum(int options, List *relations, VacuumParams *params,
 	vac_context = NULL;
 }
 
+/*
+ * Check if a given relation can be safely vacuumed or analyzed.  If the
+ * user is not the relation owner, issue a WARNING log message and return
+ * false to let the caller decide what to do with this relation.  This
+ * routine is used to decide if a relation can be processed for VACUUM or
+ * ANALYZE.
+ */
+bool
+vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple, int options)
+{
+	char	   *relname;
+
+	Assert((options & (VACOPT_VACUUM | VACOPT_ANALYZE)) != 0);
+
+	/*
+	 * Check permissions.
+	 *
+	 * We allow the user to vacuum or analyze a table if he is superuser, the
+	 * table owner, or the database owner (but in the latter case, only if
+	 * it's not a shared relation).  pg_class_ownercheck includes the
+	 * superuser case.
+	 *
+	 * Note we choose to treat permissions failure as a WARNING and keep
+	 * trying to vacuum or analyze the rest of the DB --- is this appropriate?
+	 */
+	if (pg_class_ownercheck(relid, GetUserId()) ||
+		(pg_database_ownercheck(MyDatabaseId, GetUserId()) && !reltuple->relisshared))
+		return true;
+
+	relname = NameStr(reltuple->relname);
+
+	if ((options & VACOPT_VACUUM) != 0)
+	{
+		if (reltuple->relisshared)
+			ereport(WARNING,
+					(errmsg("skipping \"%s\" --- only superuser can vacuum it",
+							relname)));
+		else if (reltuple->relnamespace == PG_CATALOG_NAMESPACE)
+			ereport(WARNING,
+					(errmsg("skipping \"%s\" --- only superuser or database owner can vacuum it",
+							relname)));
+		else
+			ereport(WARNING,
+					(errmsg("skipping \"%s\" --- only table or database owner can vacuum it",
+							relname)));
+
+		/*
+		 * For VACUUM ANALYZE, both logs could show up, but just generate
+		 * information for VACUUM as that would be the first one to be
+		 * processed.
+		 */
+		return false;
+	}
+
+	if ((options & VACOPT_ANALYZE) != 0)
+	{
+		if (reltuple->relisshared)
+			ereport(WARNING,
+					(errmsg("skipping \"%s\" --- only superuser can analyze it",
+							relname)));
+		else if (reltuple->relnamespace == PG_CATALOG_NAMESPACE)
+			ereport(WARNING,
+					(errmsg("skipping \"%s\" --- only superuser or database owner can analyze it",
+							relname)));
+		else
+			ereport(WARNING,
+					(errmsg("skipping \"%s\" --- only table or database owner can analyze it",
+							relname)));
+	}
+
+	return false;
+}
+
+
 /*
  * Given a VacuumRelation, fill in the table OID if it wasn't specified,
  * and optionally add VacuumRelations for partitions of the table.
@@ -423,7 +497,7 @@ vacuum(int options, List *relations, VacuumParams *params,
  * are made in vac_context.
  */
 static List *
-expand_vacuum_rel(VacuumRelation *vrel)
+expand_vacuum_rel(VacuumRelation *vrel, int options)
 {
 	List	   *vacrels = NIL;
 	MemoryContext oldcontext;
@@ -457,22 +531,28 @@ expand_vacuum_rel(VacuumRelation *vrel)
 		relid = RangeVarGetRelid(vrel->relation, AccessShareLock, false);
 
 		/*
-		 * Make a returnable VacuumRelation for this rel.
-		 */
-		oldcontext = MemoryContextSwitchTo(vac_context);
-		vacrels = lappend(vacrels, makeVacuumRelation(vrel->relation,
-													  relid,
-													  vrel->va_cols));
-		MemoryContextSwitchTo(oldcontext);
-
-		/*
-		 * To check whether the relation is a partitioned table, fetch its
-		 * syscache entry.
+		 * To check whether the relation is a partitioned table and its
+		 * ownership, fetch its syscache entry.
 		 */
 		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for relation %u", relid);
 		classForm = (Form_pg_class) GETSTRUCT(tuple);
+
+		/*
+		 * Make a returnable VacuumRelation for this rel if user is a proper
+		 * owner.
+		 */
+		if (vacuum_is_relation_owner(relid, classForm, options))
+		{
+			oldcontext = MemoryContextSwitchTo(vac_context);
+			vacrels = lappend(vacrels, makeVacuumRelation(vrel->relation,
+														  relid,
+														  vrel->va_cols));
+			MemoryContextSwitchTo(oldcontext);
+		}
+
+
 		include_parts = (classForm->relkind == RELKIND_PARTITIONED_TABLE);
 		ReleaseSysCache(tuple);
 
@@ -481,7 +561,9 @@ expand_vacuum_rel(VacuumRelation *vrel)
 		 * the list returned by find_all_inheritors() includes the passed-in
 		 * OID, so we have to skip that.  There's no point in taking locks on
 		 * the individual partitions yet, and doing so would just add
-		 * unnecessary deadlock risk.
+		 * unnecessary deadlock risk.  For this last reason we do not check
+		 * yet the ownership of the partitions, which get added to the list to
+		 * process.  Ownership will be checked later on anyway.
 		 */
 		if (include_parts)
 		{
@@ -530,7 +612,7 @@ expand_vacuum_rel(VacuumRelation *vrel)
  * the current database.  The list is built in vac_context.
  */
 static List *
-get_all_vacuum_rels(void)
+get_all_vacuum_rels(int options)
 {
 	List	   *vacrels = NIL;
 	Relation	pgclass;
@@ -545,6 +627,11 @@ get_all_vacuum_rels(void)
 	{
 		Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);
 		MemoryContext oldcontext;
+		Oid			relid = HeapTupleGetOid(tuple);
+
+		/* check permissions of relation */
+		if (!vacuum_is_relation_owner(relid, classForm, options))
+			continue;
 
 		/*
 		 * We include partitioned tables here; depending on which operation is
@@ -563,7 +650,7 @@ get_all_vacuum_rels(void)
 		 */
 		oldcontext = MemoryContextSwitchTo(vac_context);
 		vacrels = lappend(vacrels, makeVacuumRelation(NULL,
-													  HeapTupleGetOid(tuple),
+													  relid,
 													  NIL));
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -1436,30 +1523,17 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	}
 
 	/*
-	 * Check permissions.
-	 *
-	 * We allow the user to vacuum a table if he is superuser, the table
-	 * owner, or the database owner (but in the latter case, only if it's not
-	 * a shared relation).  pg_class_ownercheck includes the superuser case.
-	 *
-	 * Note we choose to treat permissions failure as a WARNING and keep
-	 * trying to vacuum the rest of the DB --- is this appropriate?
+	 * Check if relation needs to be skipped based on ownership.  This check
+	 * happens also when building the relation list to vacuum for a manual
+	 * operation, and needs to be done additionally here as VACUUM could
+	 * happen across multiple transactions where relation ownership could have
+	 * changed in-between.  Make sure to only generate logs for VACUUM in this
+	 * case.
 	 */
-	if (!(pg_class_ownercheck(RelationGetRelid(onerel), GetUserId()) ||
-		  (pg_database_ownercheck(MyDatabaseId, GetUserId()) && !onerel->rd_rel->relisshared)))
+	if (!vacuum_is_relation_owner(RelationGetRelid(onerel),
+								  onerel->rd_rel,
+								  options & VACOPT_VACUUM))
 	{
-		if (onerel->rd_rel->relisshared)
-			ereport(WARNING,
-					(errmsg("skipping \"%s\" --- only superuser can vacuum it",
-							RelationGetRelationName(onerel))));
-		else if (onerel->rd_rel->relnamespace == PG_CATALOG_NAMESPACE)
-			ereport(WARNING,
-					(errmsg("skipping \"%s\" --- only superuser or database owner can vacuum it",
-							RelationGetRelationName(onerel))));
-		else
-			ereport(WARNING,
-					(errmsg("skipping \"%s\" --- only table or database owner can vacuum it",
-							RelationGetRelationName(onerel))));
 		relation_close(onerel, lmode);
 		PopActiveSnapshot();
 		CommitTransactionCommand();
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 85d472f0a5..5af96fdc8a 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,7 @@
 #define VACUUM_H
 
 #include "access/htup.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_type.h"
 #include "nodes/parsenodes.h"
@@ -185,6 +186,8 @@ extern void vacuum_set_xid_limits(Relation rel,
 					  MultiXactId *mxactFullScanLimit);
 extern void vac_update_datfrozenxid(void);
 extern void vacuum_delay_point(void);
+extern bool vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple,
+						 int options);
 
 /* in commands/vacuumlazy.c */
 extern void lazy_vacuum_rel(Relation onerel, int options,
diff --git a/src/test/isolation/expected/vacuum-conflict.out b/src/test/isolation/expected/vacuum-conflict.out
new file mode 100644
index 0000000000..06ac75ef23
--- /dev/null
+++ b/src/test/isolation/expected/vacuum-conflict.out
@@ -0,0 +1,149 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1_begin s1_lock s2_auth s2_vacuum s1_commit s2_reset
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can vacuum it
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s2_vacuum s1_lock s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can vacuum it
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s1_lock s2_vacuum s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can vacuum it
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s2_auth s2_vacuum s1_begin s1_lock s1_commit s2_reset
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can vacuum it
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s1_lock s2_auth s2_analyze s1_commit s2_reset
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can analyze it
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s2_analyze s1_lock s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can analyze it
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s1_lock s2_analyze s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can analyze it
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s2_auth s2_analyze s1_begin s1_lock s1_commit s2_reset
+step s2_auth: SET ROLE regress_vacuum_conflict;
+WARNING:  skipping "vacuum_tab" --- only table or database owner can analyze it
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s1_lock s2_auth s2_vacuum s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_vacuum: VACUUM vacuum_tab; <waiting ...>
+step s1_commit: COMMIT;
+step s2_vacuum: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s2_auth s2_vacuum s1_lock s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s2_auth s1_lock s2_vacuum s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_vacuum: VACUUM vacuum_tab; <waiting ...>
+step s1_commit: COMMIT;
+step s2_vacuum: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s2_grant s2_auth s2_vacuum s1_begin s1_lock s1_commit s2_reset
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_vacuum: VACUUM vacuum_tab;
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s1_lock s2_auth s2_analyze s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_analyze: ANALYZE vacuum_tab; <waiting ...>
+step s1_commit: COMMIT;
+step s2_analyze: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s2_auth s2_analyze s1_lock s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_grant s2_auth s1_lock s2_analyze s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_analyze: ANALYZE vacuum_tab; <waiting ...>
+step s1_commit: COMMIT;
+step s2_analyze: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s2_grant s2_auth s2_analyze s1_begin s1_lock s1_commit s2_reset
+step s2_grant: ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict;
+step s2_auth: SET ROLE regress_vacuum_conflict;
+step s2_analyze: ANALYZE vacuum_tab;
+step s1_begin: BEGIN;
+step s1_lock: LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 48ae740739..c23b401225 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -66,6 +66,7 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+test: vacuum-conflict
 test: predicate-hash
 test: predicate-gist
 test: predicate-gin
diff --git a/src/test/isolation/specs/vacuum-conflict.spec b/src/test/isolation/specs/vacuum-conflict.spec
new file mode 100644
index 0000000000..9b45d26c65
--- /dev/null
+++ b/src/test/isolation/specs/vacuum-conflict.spec
@@ -0,0 +1,51 @@
+# Tests for locking conflicts with VACUUM and ANALYZE commands.
+
+setup
+{
+	CREATE ROLE regress_vacuum_conflict;
+	CREATE TABLE vacuum_tab (a int);
+}
+
+teardown
+{
+	DROP TABLE vacuum_tab;
+	DROP ROLE regress_vacuum_conflict;
+}
+
+session "s1"
+step "s1_begin"          { BEGIN; }
+step "s1_lock"           { LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE; }
+step "s1_commit"         { COMMIT; }
+
+session "s2"
+step "s2_grant"          { ALTER TABLE vacuum_tab OWNER TO regress_vacuum_conflict; }
+step "s2_auth"           { SET ROLE regress_vacuum_conflict; }
+step "s2_vacuum"         { VACUUM vacuum_tab; }
+step "s2_analyze"        { ANALYZE vacuum_tab; }
+step "s2_reset"          { RESET ROLE; }
+
+# The role doesn't have privileges to vacuum the table, so VACUUM should
+# immediately skip the table without waiting for a lock.
+permutation "s1_begin" "s1_lock" "s2_auth" "s2_vacuum" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_auth" "s2_vacuum" "s1_lock" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_auth" "s1_lock" "s2_vacuum" "s1_commit" "s2_reset"
+permutation "s2_auth" "s2_vacuum" "s1_begin" "s1_lock" "s1_commit" "s2_reset"
+
+# Same as previously for ANALYZE
+permutation "s1_begin" "s1_lock" "s2_auth" "s2_analyze" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_auth" "s2_analyze" "s1_lock" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_auth" "s1_lock" "s2_analyze" "s1_commit" "s2_reset"
+permutation "s2_auth" "s2_analyze" "s1_begin" "s1_lock" "s1_commit" "s2_reset"
+
+# The role has privileges to vacuum the table, VACUUM will block if
+# another session holds a lock on the table and succeed in all cases.
+permutation "s1_begin" "s2_grant" "s1_lock" "s2_auth" "s2_vacuum" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s2_vacuum" "s1_lock" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s1_lock" "s2_vacuum" "s1_commit" "s2_reset"
+permutation "s2_grant" "s2_auth" "s2_vacuum" "s1_begin" "s1_lock" "s1_commit" "s2_reset"
+
+# Same as previously for ANALYZE
+permutation "s1_begin" "s2_grant" "s1_lock" "s2_auth" "s2_analyze" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s2_analyze" "s1_lock" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s1_lock" "s2_analyze" "s1_commit" "s2_reset"
+permutation "s2_grant" "s2_auth" "s2_analyze" "s1_begin" "s1_lock" "s1_commit" "s2_reset"
-- 
2.18.0
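
For readers skimming the diff, the ownership rule that vacuum_is_relation_owner() centralizes can be modeled outside the backend. This is only a sketch under stated assumptions: CallerPrivs, RelInfo, SkipReason, the MODEL_OPT_* bits and the model_* functions are hypothetical names and not PostgreSQL APIs, and the booleans stand in for what pg_class_ownercheck() and pg_database_ownercheck() would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option bits; the real flags are VACOPT_VACUUM and VACOPT_ANALYZE. */
#define MODEL_OPT_VACUUM	(1 << 0)
#define MODEL_OPT_ANALYZE	(1 << 1)

/* Hypothetical summary of what the ownership checks would report. */
typedef struct CallerPrivs
{
	bool		is_superuser;
	bool		owns_table;
	bool		owns_database;
} CallerPrivs;

/* Hypothetical stand-in for the pg_class fields the check looks at. */
typedef struct RelInfo
{
	bool		is_shared;		/* models reltuple->relisshared */
	bool		in_pg_catalog;	/* models relnamespace == PG_CATALOG_NAMESPACE */
} RelInfo;

typedef enum SkipReason
{
	SKIP_NONE,					/* relation can be processed */
	SKIP_SUPERUSER_ONLY,		/* "only superuser can ..." */
	SKIP_SUPERUSER_OR_DB_OWNER, /* "only superuser or database owner can ..." */
	SKIP_TABLE_OR_DB_OWNER		/* "only table or database owner can ..." */
} SkipReason;

/*
 * Superusers and table owners always pass; a database owner passes only
 * for relations that are not shared.  Otherwise pick the warning flavor
 * in the same order as the patch: shared, then pg_catalog, then the rest.
 */
static SkipReason
model_skip_reason(CallerPrivs who, RelInfo rel)
{
	if (who.is_superuser || who.owns_table ||
		(who.owns_database && !rel.is_shared))
		return SKIP_NONE;
	if (rel.is_shared)
		return SKIP_SUPERUSER_ONLY;
	if (rel.in_pg_catalog)
		return SKIP_SUPERUSER_OR_DB_OWNER;
	return SKIP_TABLE_OR_DB_OWNER;
}

/*
 * Verb used in the warning.  For VACUUM ANALYZE only the VACUUM wording
 * is produced, since VACUUM is processed first.
 */
static const char *
model_warning_verb(int options)
{
	assert((options & (MODEL_OPT_VACUUM | MODEL_OPT_ANALYZE)) != 0);
	return (options & MODEL_OPT_VACUUM) ? "vacuum" : "analyze";
}
```

In this model a database owner is still refused on a shared relation, which matches the "only superuser can vacuum it" branch of the patch.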

#17

Bossart, Nathan

bossartn@amazon.com

over 7 years ago

In reply to: Michael Paquier (#16)

Re: Improve behavior of concurrent ANALYZE/VACUUM

On 8/23/18, 9:16 PM, "Michael Paquier" <michael@paquier.xyz> wrote:

Thanks, I have pushed the new test series, and reused it to check the
new version of the main patch, which is attached. I have added a commit
message and I have indented the thing.

Thanks for the new version!

After pondering it, I have also reworked the portion for
partitioned tables so that the list of partitions processed is unchanged
on HEAD, and we keep a behavior consistent with past versions.
If VACUUM processing for partitioned tables were something new in 11, I
think that we could have considered it, but silently changing something
that people may have relied on for more than one year now is not very
appealing.

Agreed. Even though we're not fixing the issue for partitions yet,
this patch should still fix the originally reported authentication
issue (which I see is highlighted in your commit message). I think
there's still a slight behavior change with the ordering of the
"skipped" log messages in some cases, but that doesn't seem terribly
important. We might be able to work around this by storing all the
information we need for the log message in the VacuumRelation and
waiting to emit it until vacuum_rel() or analyze_rel(), but I doubt
it's worth the complexity.

Without patch:
postgres=> VACUUM parted1, parted2;
WARNING: skipping "parted1" --- only table or database owner can vacuum it
WARNING: skipping "parted1_part1" --- only table or database owner can vacuum it
WARNING: skipping "parted1_part2" --- only table or database owner can vacuum it
WARNING: skipping "parted2" --- only table or database owner can vacuum it
WARNING: skipping "parted2_part1" --- only table or database owner can vacuum it
WARNING: skipping "parted2_part2" --- only table or database owner can vacuum it
VACUUM

With patch:
postgres=> VACUUM parted1, parted2;
WARNING: skipping "parted1" --- only table or database owner can vacuum it
WARNING: skipping "parted2" --- only table or database owner can vacuum it
WARNING: skipping "parted1_part1" --- only table or database owner can vacuum it
WARNING: skipping "parted1_part2" --- only table or database owner can vacuum it
WARNING: skipping "parted2_part1" --- only table or database owner can vacuum it
WARNING: skipping "parted2_part2" --- only table or database owner can vacuum it
VACUUM
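
The reordering shown above follows from where the ownership check runs. A minimal sketch, assuming a caller who owns none of the relations and a two-phase flow (build the list, then process it); ModelRel, SkipLog, log_skip() and model_vacuum() are hypothetical names used only for this illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOG 16

/* A relation named in the command, with up to two partitions. */
typedef struct ModelRel
{
	const char *name;
	const char *parts[2];
} ModelRel;

/* Records the order in which "skipping ..." warnings would be emitted. */
typedef struct SkipLog
{
	const char *names[MAX_LOG];
	size_t		count;
} SkipLog;

static void
log_skip(SkipLog *log, const char *name)
{
	assert(log != NULL);
	if (log->count < MAX_LOG)
		log->names[log->count++] = name;
}

/*
 * check_at_expand = false models the old behavior: every relation is
 * queued and ownership is only checked while processing the queue.
 * check_at_expand = true models the patched behavior: a relation named
 * in the command is checked while the list is built, and its partitions
 * are still queued and checked later.
 */
static void
model_vacuum(const ModelRel *rels, size_t nrels, bool check_at_expand,
			 SkipLog *log)
{
	const char *todo[MAX_LOG];
	size_t		ntodo = 0;
	size_t		i;
	size_t		j;

	/* Phase 1: build the list of relations to process. */
	for (i = 0; i < nrels; i++)
	{
		if (check_at_expand)
			log_skip(log, rels[i].name);	/* rejected right away */
		else if (ntodo < MAX_LOG)
			todo[ntodo++] = rels[i].name;

		for (j = 0; j < 2; j++)
		{
			if (rels[i].parts[j] != NULL && ntodo < MAX_LOG)
				todo[ntodo++] = rels[i].parts[j];
		}
	}

	/* Phase 2: process the queue; the caller owns nothing, so all skip. */
	for (i = 0; i < ntodo; i++)
		log_skip(log, todo[i]);
}
```

With parted1 and parted2 as input, the unpatched model logs each parent followed by its partitions, while the patched model logs both parents first and the four partitions afterwards. With a single relation the two orders coincide.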

The new version of the patch applies cleanly, builds cleanly, and
'make check-world' succeeds. Also, I'm no longer able to reproduce
the authentication issue involving 'VACUUM FULL' run by
non-superusers, so it looks good to me.
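
The reason the issue no longer reproduces can be pictured with a toy model of the two orderings: check ownership after queuing for the lock, or before. This is a sketch, not backend code: ModelTable, Outcome and model_vacuum_attempt() are hypothetical, and "lock" here means the stronger lock the operation itself needs (the patched expand_vacuum_rel() still goes through RangeVarGetRelid() with AccessShareLock before the check).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock state for one table. */
typedef struct ModelTable
{
	bool		conflicting_lock_held;	/* another session holds a conflicting lock */
	int			queued_requests;		/* requests that had to wait in the lock queue */
} ModelTable;

typedef enum Outcome
{
	OUTCOME_PROCESSED,
	OUTCOME_SKIPPED
} Outcome;

/*
 * One VACUUM or ANALYZE attempt on the table.  With check_before_lock
 * set, a non-owner is rejected before any lock request is made;
 * otherwise ownership is only discovered once the lock has been granted.
 */
static Outcome
model_vacuum_attempt(ModelTable *table, bool is_owner, bool check_before_lock)
{
	assert(table != NULL);

	/* New order: reject non-owners without touching the lock queue. */
	if (check_before_lock && !is_owner)
		return OUTCOME_SKIPPED;

	/* Lock acquisition: the request queues if a conflicting lock is held. */
	if (table->conflicting_lock_held)
		table->queued_requests++;

	/* Old order: the ownership check happens only at this point. */
	return is_owner ? OUTCOME_PROCESSED : OUTCOME_SKIPPED;
}
```

Both orderings end with the non-owner being skipped; the difference is that only the old ordering leaves a request from that user in the lock queue, which is what other sessions could get stuck behind.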

Nathan

#18

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Bossart, Nathan (#17)

Re: Improve behavior of concurrent ANALYZE/VACUUM

On Fri, Aug 24, 2018 at 05:30:01PM +0000, Bossart, Nathan wrote:

On 8/23/18, 9:16 PM, "Michael Paquier" <michael@paquier.xyz> wrote:

Thanks, I have pushed the new test series, and reused it to check the
new version of the main patch, which is attached. I have added a commit
message and I have indented the thing.

Thanks for the new version!

I have finally been able to come back to this, and have pushed the latest
version. We have come a long way... I'll check the rest of the backend
code for suspicious calls of relation_open() and the like, as we may have
other cases with similar problems.

After pondering it, I have also reworked the portion for
partitioned tables so that the list of partitions processed is unchanged
on HEAD, and we keep a behavior consistent with past versions.
If VACUUM processing for partitioned tables were something new in 11, I
think that we could have considered it, but silently changing something
that people may have relied on for more than one year now is not very
appealing.

Agreed. Even though we're not fixing the issue for partitions yet,
this patch should still fix the originally reported authentication
issue (which I see is highlighted in your commit message). I think
there's still a slight behavior change with the ordering of the
"skipped" log messages in some cases, but that doesn't seem terribly
important. We might be able to work around this by storing all the
information we need for the log message in the VacuumRelation and
waiting to emit it until vacuum_rel() or analyze_rel(), but I doubt
it's worth the complexity.

This one is definitely not worth worrying about in my opinion: we still
process the same relations, and the order is preserved when using a
single relation.

The new version of the patch applies cleanly, builds cleanly, and
'make check-world' succeeds. Also, I'm no longer able to reproduce
the authentication issue involving 'VACUUM FULL' run by
non-superusers, so it looks good to me.

Thanks for the help!
--
Michael