Allow non-superuser to cancel superuser tasks.
Hi hackers!
In our Cloud we have a patch, which allows non-superuser role ('mdb_admin')
to do some superuser things.
In particular, we have a patch that allows mdb admin to cancel the
autovacuum process and some other processes (processes with
application_name = 'MDB'), see the attachment.
This is needed to allow non-superuser roles to run pg_repack and to cancel
pg_repack.
We need to cancel running autovac to run pg_repack (because of locks), and
we need to cancel pg_repack sometimes also.
I want to reduce our internal patch size and transfer this logic to
extension or to core.
I have found similar threads [1]/messages/by-id/F9408A5A-B20B-42D2-9E7F-49CD3D1547BC@enterprisedb.com and [2]/messages/by-id/20220722203735.GB3996698@nathanxps13, but, as far as I understand, they
do not solve this particular case.
I see 2 possible ways to implement this. The first one is to have hool in
pg_signal_backend, and define a hook in extension which can do the thing.
The second one is to have a predefined role. Something like a
`pg_signal_autovacuum` role which can signal running autovac to cancel. But
I don't see how we can handle specific `application_name` with this
solution.
[1]: /messages/by-id/F9408A5A-B20B-42D2-9E7F-49CD3D1547BC@enterprisedb.com
/messages/by-id/F9408A5A-B20B-42D2-9E7F-49CD3D1547BC@enterprisedb.com
[2]: /messages/by-id/20220722203735.GB3996698@nathanxps13
/messages/by-id/20220722203735.GB3996698@nathanxps13
Attachments:
v1-0001-cloud-mdb_admin-patch-part-to-illustrate.patchapplication/octet-stream; name=v1-0001-cloud-mdb_admin-patch-part-to-illustrate.patchDownload
From 23b2805caf73f44167472b668702fb8746681ba3 Mon Sep 17 00:00:00 2001
From: Kirill Reshke <reshke@qavm-ac96707f.qemu>
Date: Mon, 26 Feb 2024 09:59:13 +0300
Subject: [PATCH v1] cloud mdb_admin patch part to illustrate.
---
src/backend/storage/ipc/signalfuncs.c | 30 +++++++++++++++++++++++++--
1 file changed, 28 insertions(+), 2 deletions(-)
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 81d1a59659..1885e31e7d 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -74,14 +74,40 @@ pg_signal_backend(int pid, int sig)
return SIGNAL_BACKEND_ERROR;
}
+ local_beentry = pgstat_get_local_beentry_by_backend_id(proc->backendId);
+ if (local_beentry != NULL)
+ beentry = &local_beentry->backendStatus;
/*
* Only allow superusers to signal superuser-owned backends. Any process
* not advertising a role might have the importance of a superuser-owned
* backend, so treat it that way.
*/
if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
- !superuser())
- return SIGNAL_BACKEND_NOSUPERUSER;
+ !superuser()) {
+ Oid role;
+ char * appname;
+
+ if (local_beentry == NULL) {
+ return SIGNAL_BACKEND_NOSUPERUSER;
+ }
+
+ role = get_role_oid("mdb_admin", true /*if nodoby created mdb_admin role in this database*/);
+ appname = local_beentry->backendStatus.st_appname;
+
+ // only allow mdb_admin to kill su queries
+ if (!is_member_of_role(GetUserId(), role)) {
+ return SIGNAL_BACKEND_NOSUPERUSER;
+ }
+
+ /* mdb admin allowed to kill proc with application name 'MDB' or autovacuum */
+ if (local_beentry->backendStatus.st_backendType == B_AUTOVAC_WORKER) {
+ // ok
+ } else if (appname != NULL && strcmp(appname, "MDB") == 0) {
+ // ok
+ } else {
+ return SIGNAL_BACKEND_NOSUPERUSER;
+ }
+ }
/* Users can signal backends they have role membership in. */
if (!has_privs_of_role(GetUserId(), proc->roleId) &&
--
2.43.1
On Mon, Feb 26, 2024 at 12:38:40PM +0500, Kirill Reshke wrote:
I see 2 possible ways to implement this. The first one is to have hool in
pg_signal_backend, and define a hook in extension which can do the thing.
The second one is to have a predefined role. Something like a
`pg_signal_autovacuum` role which can signal running autovac to cancel. But
I don't see how we can handle specific `application_name` with this
solution.
pg_signal_autovacuum seems useful given commit 3a9b18b.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Mon, 26 Feb 2024 at 20:10, Nathan Bossart <nathandbossart@gmail.com>
wrote:
On Mon, Feb 26, 2024 at 12:38:40PM +0500, Kirill Reshke wrote:
I see 2 possible ways to implement this. The first one is to have hool in
pg_signal_backend, and define a hook in extension which can do the thing.
The second one is to have a predefined role. Something like a
`pg_signal_autovacuum` role which can signal running autovac to cancel.But
I don't see how we can handle specific `application_name` with this
solution.pg_signal_autovacuum seems useful given commit 3a9b18b.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Thank you for your response.
Please find a patch attached.
In patch, pg_signal_autovacuum role with oid 6312 added. I grabbed oid from
unused_oids script output.
Also, tap tests for functionality added. I'm not sure where to place them,
so I placed them in a separate directory in `src/test/`
Seems that regression tests for this feature are not possible, am i right?
Also, I was thinking of pg_signal_autovacuum vs pg_signal_backend.
Should pg_signal_autovacuum have power of pg_signal_backend (implicity)? Or
should this role have such little scope...
Attachments:
v1-0001-Allow-non-superuser-to-cancel-superuser-tasks.patchapplication/octet-stream; name=v1-0001-Allow-non-superuser-to-cancel-superuser-tasks.patchDownload
From d096d391630186fa89a937f1bd209d35754181c2 Mon Sep 17 00:00:00 2001
From: reshke kirill <reshke@double.cloud>
Date: Mon, 26 Feb 2024 20:10:21 +0000
Subject: [PATCH v1] Allow non-superuser to cancel superuser tasks.
Patch adds new pre-defined role `pg_signal_autovacuum` and tap tests.
Role, granted with `pg_signal_autovacuum` is able to cancel running
autovacuum worker. Note that to cancel all other type of queries
role need to be granted with `pg_singal_backend`.
---
src/backend/storage/ipc/signalfuncs.c | 28 ++++++---
src/include/catalog/pg_authid.dat | 5 ++
src/test/signals/.gitignore | 2 +
src/test/signals/Makefile | 23 ++++++++
src/test/signals/t/001_signal_autovacuum.pl | 63 +++++++++++++++++++++
5 files changed, 113 insertions(+), 8 deletions(-)
create mode 100644 src/test/signals/.gitignore
create mode 100644 src/test/signals/Makefile
create mode 100644 src/test/signals/t/001_signal_autovacuum.pl
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 81d1a59659..c006f4101e 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -78,15 +78,27 @@ pg_signal_backend(int pid, int sig)
* Only allow superusers to signal superuser-owned backends. Any process
* not advertising a role might have the importance of a superuser-owned
* backend, so treat it that way.
+ * In later case, check if pid is actually autovacuum_worker, and allow
+ * to signal if role has proper priviledge.
*/
- if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
- !superuser())
- return SIGNAL_BACKEND_NOSUPERUSER;
-
- /* Users can signal backends they have role membership in. */
- if (!has_privs_of_role(GetUserId(), proc->roleId) &&
- !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
- return SIGNAL_BACKEND_NOPERMISSION;
+ if (!superuser()) {
+ if (!OidIsValid(proc->roleId)) {
+ LocalPgBackendStatus *local_beentry;
+ local_beentry = pgstat_get_local_beentry_by_backend_id(proc->backendId);
+
+ if (!(local_beentry && local_beentry->backendStatus.st_backendType == B_AUTOVAC_WORKER &&
+ has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM)))
+ return SIGNAL_BACKEND_NOSUPERUSER;
+ } else {
+ if (superuser_arg(proc->roleId))
+ return SIGNAL_BACKEND_NOSUPERUSER;
+
+ /* Users can signal backends they have role membership in. */
+ if (!has_privs_of_role(GetUserId(), proc->roleId) &&
+ !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
+ return SIGNAL_BACKEND_NOPERMISSION;
+ }
+ }
/*
* Can the process we just validated above end, followed by the pid being
diff --git a/src/include/catalog/pg_authid.dat b/src/include/catalog/pg_authid.dat
index 82a2ec2862..aa0d9845ef 100644
--- a/src/include/catalog/pg_authid.dat
+++ b/src/include/catalog/pg_authid.dat
@@ -94,5 +94,10 @@
rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
rolpassword => '_null_', rolvaliduntil => '_null_' },
+{ oid => '6312', oid_symbol => 'ROLE_PG_SIGNAL_AUTOVACUUM',
+ rolname => 'pg_signal_autovacuum', rolsuper => 'f', rolinherit => 't',
+ rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
+ rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
+ rolpassword => '_null_', rolvaliduntil => '_null_' },
]
diff --git a/src/test/signals/.gitignore b/src/test/signals/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/signals/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/signals/Makefile b/src/test/signals/Makefile
new file mode 100644
index 0000000000..688f6a7e9e
--- /dev/null
+++ b/src/test/signals/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/recovery
+#
+# Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/recovery/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/signals
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+ $(prove_check)
+
+installcheck:
+ $(prove_installcheck)
+
+clean distclean maintainer-clean:
+ rm -rf tmp_check
diff --git a/src/test/signals/t/001_signal_autovacuum.pl b/src/test/signals/t/001_signal_autovacuum.pl
new file mode 100644
index 0000000000..ec4b3e78de
--- /dev/null
+++ b/src/test/signals/t/001_signal_autovacuum.pl
@@ -0,0 +1,63 @@
+# Copyright (c) 2024-2024, PostgreSQL Global Development Group
+
+# Minimal test testing pg_signal_autovacuum role.
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize primary node
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+$node_primary->init();
+$node_primary->start;
+
+$node_primary->safe_psql('postgres',
+ "
+ CREATE DATABASE regress;
+ CREATE ROLE psa_reg_role_1;
+ CREATE ROLE psa_reg_role_2;
+ GRANT pg_signal_backend TO psa_reg_role_1;
+ GRANT pg_signal_autovacuum TO psa_reg_role_2;
+");
+
+# Create some content on primary and set autovacuum setting such that
+# it would be triggered.
+$node_primary->safe_psql('regress',
+ "
+ CREATE TABLE tab_int(i int);
+ INSERT INTO tab_int SELECT * FROM generate_series(1, 1000000);
+ ALTER SYSTEM SET autovacuum_vacuum_cost_limit TO 1;
+ ALTER SYSTEM SET autovacuum_vacuum_cost_delay TO 100;
+ ALTER SYSTEM SET autovacuum_naptime TO 1;
+");
+
+$node_primary->restart;
+
+#wait for autovac to start.
+
+sleep 1;
+
+my $res_pid = $node_primary->safe_psql('regress',
+ "
+ SELECT pid FROM pg_stat_activity WHERE backend_type = 'autovacuum worker' and datname = 'regress';;
+");
+
+
+my ($res_reg_psa_1, $stdout_reg_psa_1, $stderr_reg_psa_1) = $node_primary->psql('regress',
+ "
+ SET ROLE psa_reg_role_1;
+ SELECT pg_terminate_backend($res_pid);
+");
+
+ok($res_reg_psa_1 != 0, "should fail for non pg_signal_autovacuum");
+like($stderr_reg_psa_1, qr/Only roles with the SUPERUSER attribute may terminate processes of roles with the SUPERUSER attribute./, "matches");
+
+my ($res_reg_psa_2, $stdout_reg_psa_2, $stderr_reg_psa_2) = $node_primary->psql('regress', "
+ SET ROLE psa_reg_role_2;
+ SELECT pg_terminate_backend($res_pid);
+");
+
+ok($res_reg_psa_2 == 0, "should success for pg_signal_autovacuum");
+
+done_testing();
--
2.34.1
On Tue, 27 Feb 2024 at 01:22, Kirill Reshke <reshkekirill@gmail.com> wrote:
On Mon, 26 Feb 2024 at 20:10, Nathan Bossart <nathandbossart@gmail.com>
wrote:On Mon, Feb 26, 2024 at 12:38:40PM +0500, Kirill Reshke wrote:
I see 2 possible ways to implement this. The first one is to have hool
in
pg_signal_backend, and define a hook in extension which can do the
thing.
The second one is to have a predefined role. Something like a
`pg_signal_autovacuum` role which can signal running autovac to cancel.But
I don't see how we can handle specific `application_name` with this
solution.pg_signal_autovacuum seems useful given commit 3a9b18b.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.comThank you for your response.
Please find a patch attached.In patch, pg_signal_autovacuum role with oid 6312 added. I grabbed oid
from unused_oids script output.
Also, tap tests for functionality added. I'm not sure where to place them,
so I placed them in a separate directory in `src/test/`
Seems that regression tests for this feature are not possible, am i right?
Also, I was thinking of pg_signal_autovacuum vs pg_signal_backend.
Should pg_signal_autovacuum have power of pg_signal_backend (implicity)?
Or should this role have such little scope...Have a little thought on this, will share.
Do we need to test the pg_cancel_backend vs autovacuum case at all?
I think we do. Would it be better to split work into 2 patches: first one
with tests against current logic, and second
one with some changes/enhancements which allows to cancel running autovac
to non-superuser (via `pg_signal_autovacuum` role or some other mechanism)?
On Tue, Feb 27, 2024 at 01:22:31AM +0500, Kirill Reshke wrote:
Also, tap tests for functionality added. I'm not sure where to place them,
so I placed them in a separate directory in `src/test/`
Seems that regression tests for this feature are not possible, am i right?
It might be difficult to create reliable tests for pg_signal_autovacuum.
If we can, it would probably be easiest to do with a TAP test.
Also, I was thinking of pg_signal_autovacuum vs pg_signal_backend.
Should pg_signal_autovacuum have power of pg_signal_backend (implicity)? Or
should this role have such little scope...
-1. I don't see why giving a role privileges of pg_signal_autovacuum
should also give them the ability to signal all other non-superuser
backends.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Tue, Feb 27, 2024 at 11:59:00PM +0500, Kirill Reshke wrote:
Do we need to test the pg_cancel_backend vs autovacuum case at all?
I think we do. Would it be better to split work into 2 patches: first one
with tests against current logic, and second
one with some changes/enhancements which allows to cancel running autovac
to non-superuser (via `pg_signal_autovacuum` role or some other mechanism)?
If we need to add tests for pg_signal_backend, I think it's reasonable to
keep those in a separate patch from pg_signal_autovacuum.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Hi,
I'm new to reviewing postgres patches, but I took an interest in reviewing this patch as recommended by Nathan.
I have the following comments:
if (!superuser()) {
if (!OidIsValid(proc->roleId)) {
LocalPgBackendStatus *local_beentry;
local_beentry = pgstat_get_local_beentry_by_backend_id(proc->backendId);if (!(local_beentry && local_beentry->backendStatus.st_backendType == B_AUTOVAC_WORKER &&
has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM)))
return SIGNAL_BACKEND_NOSUPERUSER;
} else {
if (superuser_arg(proc->roleId))
return SIGNAL_BACKEND_NOSUPERUSER;/* Users can signal backends they have role membership in. */
if (!has_privs_of_role(GetUserId(), proc->roleId) &&
!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
return SIGNAL_BACKEND_NOPERMISSION;
}
}
1. I would suggest not to do nested if blocks since it's becoming harder to read. Also, does it make sense to have a utilities function in backend_status.c to check if a backend of a given backend id is of a certain backend_type. And should we check if proc->backendId is invalid?
ALTER SYSTEM SET autovacuum_vacuum_cost_limit TO 1;
ALTER SYSTEM SET autovacuum_vacuum_cost_delay TO 100;
ALTER SYSTEM SET autovacuum_naptime TO 1;
2. Could we set these parameters at the beginning of the test before $node->start with $node->append_conf ? That way we can avoid restarting the node and doing the sleep later on.
my $res_pid = $node_primary->safe_psql(
. 'regress',
"SELECT pid FROM pg_stat_activity WHERE backend_type = 'autovacuum worker' and datname = 'regress';"
);my ($res_reg_psa_1, $stdout_reg_psa_1, $stderr_reg_psa_1) = $node_primary->psql('regress', qq[
SET ROLE psa_reg_role_1;
SELECT pg_terminate_backend($res_pid);
]);ok($res_reg_psa_1 != 0, "should fail for non pg_signal_autovacuum");
like($stderr_reg_psa_1, qr/Only roles with the SUPERUSER attribute may terminate processes of roles with the SUPERUSER attribute./, "matches");my ($res_reg_psa_2, $stdout_reg_psa_2, $stderr_reg_psa_2) = $node_primary->psql('regress', qq[
SET ROLE psa_reg_role_2;
SELECT pg_terminate_backend($res_pid);
]");
3. Some nits on styling
4. According to Postgres styles, I believe open brackets should be in a new line
Another comment that I forgot to mention is that we should also make the documentation change in doc/src/sgml/user-manag.sgml for this new predefined role
Thanks.
--
Anthony Leung
Amazon Web Services: https://aws.amazon.com
I took the liberty of continuing to work on this after chatting with Nathan.
I've attached the updated patch with some improvements.
Thanks.
--
Anthony Leung
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-Introduce-pg_signal_autovacuum-role-to-allow-non-sup.patchapplication/octet-stream; name=v1-0001-Introduce-pg_signal_autovacuum-role-to-allow-non-sup.patchDownload
From 763f9052f291f4cfe32197379f58b9d9bc35ebb7 Mon Sep 17 00:00:00 2001
From: Anthony Leung <antholeu@amazon.com>
Date: Mon, 1 Apr 2024 04:46:55 +0000
Subject: [PATCH] Introduce pg_signal_autovacuum role to allow non-superuser to
termiante autovacuum worker
Non-superusers are currently not allowed to termiante autovacuum
backends which can create friction in some scenarios when performing
maintenance. This commit introduce a new pre-defined role
'pg_signal_autovacuum'. Non-superusers granted with this role are
allowed to terminate backends that are running autovacuum.
---
doc/src/sgml/user-manag.sgml | 4 ++
src/backend/storage/ipc/signalfuncs.c | 31 ++++++++---
src/backend/utils/activity/backend_status.c | 20 +++++++
src/include/catalog/pg_authid.dat | 5 ++
src/include/utils/backend_status.h | 2 +-
src/test/signals/.gitignore | 2 +
src/test/signals/Makefile | 23 ++++++++
src/test/signals/t/001_signal_autovacuum.pl | 59 +++++++++++++++++++++
8 files changed, 137 insertions(+), 9 deletions(-)
create mode 100644 src/test/signals/.gitignore
create mode 100644 src/test/signals/Makefile
create mode 100644 src/test/signals/t/001_signal_autovacuum.pl
diff --git a/doc/src/sgml/user-manag.sgml b/doc/src/sgml/user-manag.sgml
index 1f7d7e75ce..541a45ab4a 100644
--- a/doc/src/sgml/user-manag.sgml
+++ b/doc/src/sgml/user-manag.sgml
@@ -693,6 +693,10 @@ DROP ROLE doomed_role;
database to issue
<link linkend="sql-createsubscription"><command>CREATE SUBSCRIPTION</command></link>.</entry>
</row>
+ <row>
+ <entry>pg_signal_autovacuum</entry>
+ <entry>Allow terminating backend running autovacuum</entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index b595c2d691..a9f0466c46 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -79,14 +79,29 @@ pg_signal_backend(int pid, int sig)
* not advertising a role might have the importance of a superuser-owned
* backend, so treat it that way.
*/
- if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
- !superuser())
- return SIGNAL_BACKEND_NOSUPERUSER;
-
- /* Users can signal backends they have role membership in. */
- if (!has_privs_of_role(GetUserId(), proc->roleId) &&
- !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
- return SIGNAL_BACKEND_NOPERMISSION;
+ if (!superuser())
+ {
+ if (!OidIsValid(proc->roleId))
+ {
+ /*
+ * We only allow user with pg_signal_autovacuum role to terminate
+ * autovacuum worker as an exception.
+ */
+ if (!(pg_stat_is_backend_autovac_worker(proc->backendId) &&
+ has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM)))
+ return SIGNAL_BACKEND_NOSUPERUSER;
+ }
+ else
+ {
+ if (superuser_arg(proc->roleId))
+ return SIGNAL_BACKEND_NOSUPERUSER;
+
+ /* Users can signal backends they have role membership in. */
+ if (!has_privs_of_role(GetUserId(), proc->roleId) &&
+ !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
+ return SIGNAL_BACKEND_NOPERMISSION;
+ }
+ }
/*
* Can the process we just validated above end, followed by the pid being
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index f78ccc204b..b38ba319d2 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -1057,6 +1057,26 @@ pgstat_get_my_query_id(void)
return MyBEEntry->st_query_id;
}
+/* ----------
+ * pg_stat_is_backend_autovac_worker() -
+ *
+ * Return whether the backend of the given backend id is of type autovacuum worker.
+ */
+bool
+pg_stat_is_backend_autovac_worker(BackendId beid)
+{
+ PgBackendStatus *ret;
+
+ Assert(beid != InvalidBackendId);
+
+ ret = pgstat_get_beentry_by_backend_id(beid);
+
+ if (!ret)
+ return false;
+
+ return ret->st_backendType == B_AUTOVAC_WORKER;
+}
+
/* ----------
* cmp_lbestatus
*
diff --git a/src/include/catalog/pg_authid.dat b/src/include/catalog/pg_authid.dat
index 6b4a0aaaad..556a38f702 100644
--- a/src/include/catalog/pg_authid.dat
+++ b/src/include/catalog/pg_authid.dat
@@ -94,5 +94,10 @@
rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
rolpassword => '_null_', rolvaliduntil => '_null_' },
+{ oid => '6312', oid_symbol => 'ROLE_PG_SIGNAL_AUTOVACUUM',
+ rolname => 'pg_signal_autovacuum', rolsuper => 'f', rolinherit => 't',
+ rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
+ rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
+ rolpassword => '_null_', rolvaliduntil => '_null_' },
]
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index d51c840daf..a84aafabd9 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -325,7 +325,7 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern bool pg_stat_is_backend_autovac_worker(BackendId beid);
/* ----------
* Support functions for the SQL-callable functions to
diff --git a/src/test/signals/.gitignore b/src/test/signals/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/signals/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/signals/Makefile b/src/test/signals/Makefile
new file mode 100644
index 0000000000..688f6a7e9e
--- /dev/null
+++ b/src/test/signals/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/recovery
+#
+# Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/recovery/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/signals
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+ $(prove_check)
+
+installcheck:
+ $(prove_installcheck)
+
+clean distclean maintainer-clean:
+ rm -rf tmp_check
diff --git a/src/test/signals/t/001_signal_autovacuum.pl b/src/test/signals/t/001_signal_autovacuum.pl
new file mode 100644
index 0000000000..93432b3a6b
--- /dev/null
+++ b/src/test/signals/t/001_signal_autovacuum.pl
@@ -0,0 +1,59 @@
+# Copyright (c) 2024-2024, PostgreSQL Global Development Group
+
+# Minimal test testing pg_signal_autovacuum role.
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize postgres
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init();
+$node->start;
+
+$node->safe_psql('postgres', qq(
+ CREATE DATABASE regress;
+ CREATE ROLE psa_reg_role_1;
+ CREATE ROLE psa_reg_role_2;
+ GRANT pg_signal_backend TO psa_reg_role_1;
+ GRANT pg_signal_autovacuum TO psa_reg_role_2;
+));
+
+# Create some content in a table and set autovacuum setting such that
+# it would be triggered.
+$node->safe_psql('regress', qq(
+ CREATE TABLE t(i int);
+ INSERT INTO t SELECT * FROM generate_series(1, 1000000);
+ ALTER SYSTEM SET autovacuum_vacuum_cost_limit TO 1;
+ ALTER SYSTEM SET autovacuum_vacuum_cost_delay TO 100;
+ ALTER SYSTEM SET autovacuum_naptime TO 1;
+));
+
+$node->restart;
+
+# wait for autovacuum to start.
+
+sleep 1;
+
+my $res_pid = $node->safe_psql('regress', qq(
+ SELECT pid FROM pg_stat_activity WHERE backend_type = 'autovacuum worker' and datname = 'regress';;
+));
+
+
+my ($res_reg_psa_1, $stdout_reg_psa_1, $stderr_reg_psa_1) = $node->psql('regress', qq(
+ SET ROLE psa_reg_role_1;
+ SELECT pg_terminate_backend($res_pid);
+));
+
+ok($res_reg_psa_1 != 0, "should fail for non pg_signal_autovacuum");
+like($stderr_reg_psa_1, qr/Only roles with the SUPERUSER attribute may terminate processes of roles with the SUPERUSER attribute./, "matches");
+
+my ($res_reg_psa_2, $stdout_reg_psa_2, $stderr_reg_psa_2) = $node->psql('regress', qq(
+ SET ROLE psa_reg_role_2;
+ SELECT pg_terminate_backend($res_pid);
+));
+
+ok($res_reg_psa_2 == 0, "should success for pg_signal_autovacuum");
+
+done_testing();
--
2.40.1
On Mon, Apr 01, 2024 at 02:29:29PM +0000, Leung, Anthony wrote:
I've attached the updated patch with some improvements.
Thanks!
+ <row>
+ <entry>pg_signal_autovacuum</entry>
+ <entry>Allow terminating backend running autovacuum</entry>
+ </row>
I think we should be more precise here by calling out the exact types of
workers:
"Allow signaling autovacuum worker processes to..."
- if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
- !superuser())
- return SIGNAL_BACKEND_NOSUPERUSER;
-
- /* Users can signal backends they have role membership in. */
- if (!has_privs_of_role(GetUserId(), proc->roleId) &&
- !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
- return SIGNAL_BACKEND_NOPERMISSION;
+ if (!superuser())
+ {
+ if (!OidIsValid(proc->roleId))
+ {
+ /*
+ * We only allow user with pg_signal_autovacuum role to terminate
+ * autovacuum worker as an exception.
+ */
+ if (!(pg_stat_is_backend_autovac_worker(proc->backendId) &&
+ has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM)))
+ return SIGNAL_BACKEND_NOSUPERUSER;
+ }
+ else
+ {
+ if (superuser_arg(proc->roleId))
+ return SIGNAL_BACKEND_NOSUPERUSER;
+
+ /* Users can signal backends they have role membership in. */
+ if (!has_privs_of_role(GetUserId(), proc->roleId) &&
+ !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
+ return SIGNAL_BACKEND_NOPERMISSION;
+ }
+ }
I don't think we should rely on !OidIsValid(proc->roleId) for signaling
autovacuum workers. That might not always be true, and I don't see any
need to rely on that otherwise. IMHO we should just add a few lines before
the existing code, which doesn't need to be changed at all:
if (pg_stat_is_backend_autovac_worker(proc->backendId) &&
!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM))
return SIGNAL_BACKEND_NOAUTOVACUUM;
I also think we need to return something besides SIGNAL_BACKEND_NOSUPERUSER
in this case. Specifically, we probably need to introduce a new value and
provide the relevant error messages in pg_cancel_backend() and
pg_terminate_backend().
+/* ----------
+ * pg_stat_is_backend_autovac_worker() -
+ *
+ * Return whether the backend of the given backend id is of type autovacuum worker.
+ */
+bool
+pg_stat_is_backend_autovac_worker(BackendId beid)
+{
+ PgBackendStatus *ret;
+
+ Assert(beid != InvalidBackendId);
+
+ ret = pgstat_get_beentry_by_backend_id(beid);
+
+ if (!ret)
+ return false;
+
+ return ret->st_backendType == B_AUTOVAC_WORKER;
+}
Can we have this function return the backend type so that we don't have to
create a new function for every possible type? That might be handy in the
future.
I haven't looked too closely, but I'm pretty skeptical that the test suite
in your patch would be stable. Unfortunately, I don't have any better
ideas at the moment besides not adding a test for this new role.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On 2 Apr 2024, at 01:21, Nathan Bossart <nathandbossart@gmail.com> wrote:
I haven't looked too closely, but I'm pretty skeptical that the test suite
in your patch would be stable. Unfortunately, I don't have any better
ideas at the moment besides not adding a test for this new role.
We can add tests just like [0]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=eeefd4280f6 with injection points.
I mean replace that "sleep 1" with something like "$node->wait_for_event('autovacuum worker', 'autocauum-runing');".
Currently we have no infrastructure to wait for autovacuum of particular table, but I think it's doable.
Also I do not like that test is changing system-wide autovac settings, AFAIR these settings can be set for particular table.
Thanks!
Best regards, Andrey Borodin.
[0]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=eeefd4280f6
I don't think we should rely on !OidIsValid(proc->roleId) for signaling
autovacuum workers. That might not always be true, and I don't see any
need to rely on that otherwise. IMHO we should just add a few lines before
the existing code, which doesn't need to be changed at all:if (pg_stat_is_backend_autovac_worker(proc->backendId) &&
!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM))
return SIGNAL_BACKEND_NOAUTOVACUUM;
I tried to add them above the existing code. When I test it locally, a user without pg_signal_autovacuum will actually fail at this block because the user is not superuser and !OidIsValid(proc->roleId) is also true in the following:
/*
* Only allow superusers to signal superuser-owned backends. Any process
* not advertising a role might have the importance of a superuser-owned
* backend, so treat it that way.
*/
if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
!superuser())
return SIGNAL_BACKEND_NOSUPERUSER;
This is what Im planning to do - If the backend is autovacuum worker and the user is not superuser or has pg_signal_autovacuum role, we return the new value and provide the relevant error message
/*
* If the backend is autovacuum worker, allow user with privileges of the
* pg_signal_autovacuum role to signal the backend.
*/
if (pgstat_get_backend_type(proc->backendId) == B_AUTOVAC_WORKER)
{
if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM) || !superuser())
return SIGNAL_BACKEND_NOAUTOVACUUM;
}
/*
* Only allow superusers to signal superuser-owned backends. Any process
* not advertising a role might have the importance of a superuser-owned
* backend, so treat it that way.
*/
else if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
!superuser())
{
return SIGNAL_BACKEND_NOSUPERUSER;
}
/* Users can signal backends they have role membership in. */
else if (!has_privs_of_role(GetUserId(), proc->roleId) &&
!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
{
return SIGNAL_BACKEND_NOPERMISSION;
}
We can add tests just like [0] with injection points.
I mean replace that "sleep 1" with something like "$node->wait_for_event('autovacuum worker', 'autocauum-runing');".
Currently we have no infrastructure to wait for autovacuum of particular table, but I think it's doable.
Also I do not like that test is changing system-wide autovac settings, AFAIR these settings can be set for particular table.
Thanks for the suggestion. I will take a look at this. Let me also separate the test into a different patch file.
--
Anthony Leung
Amazon Web Services: https://aws.amazon.com
Update - the condition should be &&
if (pgstat_get_backend_type(proc->backendId) == B_AUTOVAC_WORKER)
{
if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM) && !superuser())
return SIGNAL_BACKEND_NOAUTOVACUUM;
}
Thanks
--
Anthony Leung
Amazon Web Services: https://aws.amazon.com
On Tue, Apr 02, 2024 at 04:35:28PM +0500, Andrey M. Borodin wrote:
We can add tests just like [0] with injection points.
I mean replace that "sleep 1" with something like
"$node->wait_for_event('autovacuum worker', 'autocauum-runing');".
Currently we have no infrastructure to wait for autovacuum of
particular table, but I think it's doable.
Also I do not like that test is changing system-wide autovac
settings, AFAIR these settings can be set for particular table.
Yeah, hardcoded sleeps are not really acceptable. On fast machines
they eat in global runtime making the whole slower, impacting the CI.
On slow machines, that's not going to be stable and we have a lot of
buildfarm animals starved on CPU, like the ones running valgrind or
just because their environment is slow (one of my animals runs on a
RPI, for example). Note that slow machines have a lot of value
because they're usually better at catching race conditions. Injection
points would indeed make the tests more deterministic by controlling
the waits and wakeups you'd like to have in the patch's tests.
eeefd4280f6e would be a good example of how to implement a test.
--
Michael
On Thu, Apr 04, 2024 at 12:30:51AM +0000, Leung, Anthony wrote:
if (pg_stat_is_backend_autovac_worker(proc->backendId) &&
!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM))
return SIGNAL_BACKEND_NOAUTOVACUUM;I tried to add them above the existing code. When I test it locally, a
user without pg_signal_autovacuum will actually fail at this block
because the user is not superuser and !OidIsValid(proc->roleId) is also
true in the following:
Good catch.
This is what Im planning to do - If the backend is autovacuum worker and
the user is not superuser or has pg_signal_autovacuum role, we return the
new value and provide the relevant error message/*
* If the backend is autovacuum worker, allow user with privileges of the
* pg_signal_autovacuum role to signal the backend.
*/
if (pgstat_get_backend_type(proc->backendId) == B_AUTOVAC_WORKER)
{
if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM) || !superuser())
return SIGNAL_BACKEND_NOAUTOVACUUM;
}
/*
* Only allow superusers to signal superuser-owned backends. Any process
* not advertising a role might have the importance of a superuser-owned
* backend, so treat it that way.
*/
else if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
!superuser())
{
return SIGNAL_BACKEND_NOSUPERUSER;
}
/* Users can signal backends they have role membership in. */
else if (!has_privs_of_role(GetUserId(), proc->roleId) &&
!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
{
return SIGNAL_BACKEND_NOPERMISSION;
}
There's no need for the explicit superuser() check in the
pg_signal_autovacuum section. That's built into has_privs_of_role()
already.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
I made some updates based on the feedbacks in v2. This patch only contains the code change for allowing the signaling to av worker with pg_signal_autovacuum. I will send a separate patch for the tap test shortly.
Thanks
--
Anthony Leung
Amazon Web Services: https://aws.amazon.com
Attachments:
v2-0001-Introduce-pg_signal_autovacuum-role-to-cancel-or-termiante-autovacuum-worker.patchapplication/octet-stream; name=v2-0001-Introduce-pg_signal_autovacuum-role-to-cancel-or-termiante-autovacuum-worker.patchDownload
From dc132bf5b15ddde69615a297db5f7969aef93547 Mon Sep 17 00:00:00 2001
From: Anthony Leung <antholeu@amazon.com>
Date: Thu, 4 Apr 2024 20:24:33 +0000
Subject: [PATCH] [PATCH v2 1/2] Introduce pg_signal_autovacuum role to allow
non-superuser to cancel or termiante autovacuum worker
Non-superusers are currently not allowed to termiante autovacuum
backends which can create friction in some scenarios when performing
maintenance. This commit introduce a new pre-defined role
'pg_signal_autovacuum'. Non-superusers granted with this role are
allowed to terminate backends that are running autovacuum.
---
doc/src/sgml/user-manag.sgml | 4 +++
src/backend/storage/ipc/signalfuncs.c | 39 +++++++++++++++++----
src/backend/utils/activity/backend_status.c | 19 ++++++++++
src/include/catalog/pg_authid.dat | 5 +++
src/include/utils/backend_status.h | 2 +-
5 files changed, 62 insertions(+), 7 deletions(-)
diff --git a/doc/src/sgml/user-manag.sgml b/doc/src/sgml/user-manag.sgml
index 07a16247d7..de8b6c28a4 100644
--- a/doc/src/sgml/user-manag.sgml
+++ b/doc/src/sgml/user-manag.sgml
@@ -705,6 +705,10 @@ DROP ROLE doomed_role;
database to issue
<link linkend="sql-createsubscription"><command>CREATE SUBSCRIPTION</command></link>.</entry>
</row>
+ <row>
+ <entry>pg_signal_autovacuum</entry>
+ <entry>Allow signaling autovacuum worker backend to cancel or terminate</entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 88e9bf8125..5a533eb6ac 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -45,6 +45,7 @@
#define SIGNAL_BACKEND_ERROR 1
#define SIGNAL_BACKEND_NOPERMISSION 2
#define SIGNAL_BACKEND_NOSUPERUSER 3
+#define SIGNAL_BACKEND_NOAUTOVACUUM 4
static int
pg_signal_backend(int pid, int sig)
{
@@ -74,19 +75,31 @@ pg_signal_backend(int pid, int sig)
return SIGNAL_BACKEND_ERROR;
}
+ /*
+ * If the backend is autovacuum worker, allow user with the privileges of
+ * pg_signal_autovacuum role to signal the backend.
+ */
+ if (pgstat_get_backend_type(GetNumberFromPGProc(proc)) == B_AUTOVAC_WORKER)
+ {
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM))
+ return SIGNAL_BACKEND_NOAUTOVACUUM;
+ }
/*
* Only allow superusers to signal superuser-owned backends. Any process
* not advertising a role might have the importance of a superuser-owned
* backend, so treat it that way.
- */
- if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
- !superuser())
+ */
+ else if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
+ !superuser())
+ {
return SIGNAL_BACKEND_NOSUPERUSER;
-
+ }
/* Users can signal backends they have role membership in. */
- if (!has_privs_of_role(GetUserId(), proc->roleId) &&
- !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
+ else if (!has_privs_of_role(GetUserId(), proc->roleId) &&
+ !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
+ {
return SIGNAL_BACKEND_NOPERMISSION;
+ }
/*
* Can the process we just validated above end, followed by the pid being
@@ -137,6 +150,13 @@ pg_cancel_backend(PG_FUNCTION_ARGS)
errdetail("Only roles with privileges of the role whose query is being canceled or with privileges of the \"%s\" role may cancel this query.",
"pg_signal_backend")));
+ if (r == SIGNAL_BACKEND_NOAUTOVACUUM)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to cancel autovacuum worker backend"),
+ errdetail("Only roles with the %s attribute or with privileges of the \"%s\" role may cancel autovacuum worker backend",
+ "SUPERUSER", "pg_signal_autovacuum")));
+
PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
}
@@ -243,6 +263,13 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
errdetail("Only roles with privileges of the role whose process is being terminated or with privileges of the \"%s\" role may terminate this process.",
"pg_signal_backend")));
+ if (r == SIGNAL_BACKEND_NOAUTOVACUUM)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to terminate autovacuum worker process"),
+ errdetail("Only roles with the %s attribute or with privileges of the \"%s\" role may terminate autovacuum worker backend",
+ "SUPERUSER", "pg_signal_autovacuum")));
+
/* Wait only on success and if actually requested */
if (r == SIGNAL_BACKEND_SUCCESS && timeout > 0)
PG_RETURN_BOOL(pg_wait_until_termination(pid, timeout));
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 1ccf4c6d83..edbb4341f5 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -1038,6 +1038,25 @@ pgstat_get_my_query_id(void)
return MyBEEntry->st_query_id;
}
+/* ----------
+ * pgstat_get_backend_type() -
+ *
+ * Return the backend type of the backend for the given proc number.
+ * ----------
+ */
+BackendType
+pgstat_get_backend_type(ProcNumber procNumber)
+{
+ PgBackendStatus *ret;
+
+ ret = pgstat_get_beentry_by_proc_number(procNumber);
+
+ if (!ret)
+ return false;
+
+ return ret->st_backendType;
+}
+
/* ----------
* cmp_lbestatus
*
diff --git a/src/include/catalog/pg_authid.dat b/src/include/catalog/pg_authid.dat
index 55cabdda6f..a481f0bcf1 100644
--- a/src/include/catalog/pg_authid.dat
+++ b/src/include/catalog/pg_authid.dat
@@ -99,5 +99,10 @@
rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
rolpassword => '_null_', rolvaliduntil => '_null_' },
+{ oid => '6312', oid_symbol => 'ROLE_PG_SIGNAL_AUTOVACUUM',
+ rolname => 'pg_signal_autovacuum', rolsuper => 'f', rolinherit => 't',
+ rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
+ rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
+ rolpassword => '_null_', rolvaliduntil => '_null_' },
]
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 7b7f6f59d0..7fd9cd7167 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -323,7 +323,7 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern BackendType pgstat_get_backend_type(ProcNumber procNumber);
/* ----------
* Support functions for the SQL-callable functions to
--
2.40.1
Adding tap test for pg_signal_autovacuum using injection points as a separate patch. I also made a minor change on the original patch.
Thanks.
--
Anthony
Amazon Web Services: https://aws.amazon.com
Attachments:
v3-0001-Introduce-pg_signal_autovacuum-role-to-signal-autovacuum-worker.patchapplication/octet-stream; name=v3-0001-Introduce-pg_signal_autovacuum-role-to-signal-autovacuum-worker.patchDownload
From 6b11275dbe04068a9ed1b744c3eeff4395f2ef81 Mon Sep 17 00:00:00 2001
From: Anthony Leung <antholeu@amazon.com>
Date: Thu, 4 Apr 2024 20:24:33 +0000
Subject: [PATCH] Introduce pg_signal_autovacuum role to allow non-superuser to
signal autovacuum worker
Non-superusers are currently not allowed to signal autovacuum
workers which can create friction in some scenarios when performing
maintenance. This commit introduce a new pre-defined role
'pg_signal_autovacuum'. Non-superusers granted with this role are
allowed to signal backends that are running autovacuum.
---
doc/src/sgml/user-manag.sgml | 4 +++
src/backend/storage/ipc/signalfuncs.c | 39 +++++++++++++++++----
src/backend/utils/activity/backend_status.c | 19 ++++++++++
src/include/catalog/pg_authid.dat | 5 +++
src/include/utils/backend_status.h | 2 +-
5 files changed, 62 insertions(+), 7 deletions(-)
diff --git a/doc/src/sgml/user-manag.sgml b/doc/src/sgml/user-manag.sgml
index 07a16247d7..de8b6c28a4 100644
--- a/doc/src/sgml/user-manag.sgml
+++ b/doc/src/sgml/user-manag.sgml
@@ -705,6 +705,10 @@ DROP ROLE doomed_role;
database to issue
<link linkend="sql-createsubscription"><command>CREATE SUBSCRIPTION</command></link>.</entry>
</row>
+ <row>
+ <entry>pg_signal_autovacuum</entry>
+ <entry>Allow signaling autovacuum worker backend to cancel or terminate</entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 88e9bf8125..4d769c85d3 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -45,6 +45,7 @@
#define SIGNAL_BACKEND_ERROR 1
#define SIGNAL_BACKEND_NOPERMISSION 2
#define SIGNAL_BACKEND_NOSUPERUSER 3
+#define SIGNAL_BACKEND_NOAUTOVACUUM 4
static int
pg_signal_backend(int pid, int sig)
{
@@ -74,19 +75,31 @@ pg_signal_backend(int pid, int sig)
return SIGNAL_BACKEND_ERROR;
}
+ /*
+ * If the backend is autovacuum worker, allow user with the privileges of
+ * pg_signal_autovacuum role to signal the backend.
+ */
+ if (pgstat_get_backend_type(GetNumberFromPGProc(proc)) == B_AUTOVAC_WORKER)
+ {
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM))
+ return SIGNAL_BACKEND_NOAUTOVACUUM;
+ }
/*
* Only allow superusers to signal superuser-owned backends. Any process
* not advertising a role might have the importance of a superuser-owned
* backend, so treat it that way.
- */
- if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
- !superuser())
+ */
+ else if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
+ !superuser())
+ {
return SIGNAL_BACKEND_NOSUPERUSER;
-
+ }
/* Users can signal backends they have role membership in. */
- if (!has_privs_of_role(GetUserId(), proc->roleId) &&
- !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
+ else if (!has_privs_of_role(GetUserId(), proc->roleId) &&
+ !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
+ {
return SIGNAL_BACKEND_NOPERMISSION;
+ }
/*
* Can the process we just validated above end, followed by the pid being
@@ -137,6 +150,13 @@ pg_cancel_backend(PG_FUNCTION_ARGS)
errdetail("Only roles with privileges of the role whose query is being canceled or with privileges of the \"%s\" role may cancel this query.",
"pg_signal_backend")));
+ if (r == SIGNAL_BACKEND_NOAUTOVACUUM)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to cancel autovacuum worker backend"),
+ errdetail("Only roles with the %s attribute or with privileges of the \"%s\" role may cancel autovacuum worker backend",
+ "SUPERUSER", "pg_signal_autovacuum")));
+
PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
}
@@ -243,6 +263,13 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
errdetail("Only roles with privileges of the role whose process is being terminated or with privileges of the \"%s\" role may terminate this process.",
"pg_signal_backend")));
+ if (r == SIGNAL_BACKEND_NOAUTOVACUUM)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to terminate autovacuum worker backend"),
+ errdetail("Only roles with the %s attribute or with privileges of the \"%s\" role may terminate autovacuum worker backend",
+ "SUPERUSER", "pg_signal_autovacuum")));
+
/* Wait only on success and if actually requested */
if (r == SIGNAL_BACKEND_SUCCESS && timeout > 0)
PG_RETURN_BOOL(pg_wait_until_termination(pid, timeout));
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 1ccf4c6d83..edbb4341f5 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -1038,6 +1038,25 @@ pgstat_get_my_query_id(void)
return MyBEEntry->st_query_id;
}
+/* ----------
+ * pgstat_get_backend_type() -
+ *
+ * Return the backend type of the backend for the given proc number.
+ * ----------
+ */
+BackendType
+pgstat_get_backend_type(ProcNumber procNumber)
+{
+ PgBackendStatus *ret;
+
+ ret = pgstat_get_beentry_by_proc_number(procNumber);
+
+ if (!ret)
+ return false;
+
+ return ret->st_backendType;
+}
+
/* ----------
* cmp_lbestatus
*
diff --git a/src/include/catalog/pg_authid.dat b/src/include/catalog/pg_authid.dat
index 55cabdda6f..a481f0bcf1 100644
--- a/src/include/catalog/pg_authid.dat
+++ b/src/include/catalog/pg_authid.dat
@@ -99,5 +99,10 @@
rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
rolpassword => '_null_', rolvaliduntil => '_null_' },
+{ oid => '6312', oid_symbol => 'ROLE_PG_SIGNAL_AUTOVACUUM',
+ rolname => 'pg_signal_autovacuum', rolsuper => 'f', rolinherit => 't',
+ rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
+ rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
+ rolpassword => '_null_', rolvaliduntil => '_null_' },
]
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 7b7f6f59d0..7fd9cd7167 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -323,7 +323,7 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern BackendType pgstat_get_backend_type(ProcNumber procNumber);
/* ----------
* Support functions for the SQL-callable functions to
--
2.40.1
v3-0002-Add-tap-test-for-pg-signal-autovacuum-role.patchapplication/octet-stream; name=v3-0002-Add-tap-test-for-pg-signal-autovacuum-role.patchDownload
From 83345fd5405039ed28b728ebe7f05ab84b9562f8 Mon Sep 17 00:00:00 2001
From: Anthony Leung <antholeu@amazon.com>
Date: Thu, 4 Apr 2024 23:33:13 +0000
Subject: [PATCH] Add tap test for pg_signal_autovacuum role
Add a tap test to validate the following:
1. Normal user cannot signal autovacuum worker backend
2. User with pg_signal_backend role cannot signal autovacuum worker
backend
3. User with pg_signal_autovacuum role can signal autovacuum worker
backend
---
src/backend/postmaster/autovacuum.c | 3 +
src/test/signals/.gitignore | 2 +
src/test/signals/Makefile | 27 +++++++
src/test/signals/meson.build | 15 ++++
src/test/signals/t/001_signal_autovacuum.pl | 87 +++++++++++++++++++++
5 files changed, 134 insertions(+)
create mode 100644 src/test/signals/.gitignore
create mode 100644 src/test/signals/Makefile
create mode 100644 src/test/signals/meson.build
create mode 100644 src/test/signals/t/001_signal_autovacuum.pl
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index c367ede6f8..241d8712c4 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -100,6 +100,7 @@
#include "utils/fmgroids.h"
#include "utils/fmgrprotos.h"
#include "utils/guc_hooks.h"
+#include "utils/injection_point.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/ps_status.h"
@@ -1889,6 +1890,8 @@ do_autovacuum(void)
bool found_concurrent_worker = false;
int i;
+ INJECTION_POINT("autovacuum-start");
+
/*
* StartTransactionCommand and CommitTransactionCommand will automatically
* switch to other contexts. We need this one to keep the list of
diff --git a/src/test/signals/.gitignore b/src/test/signals/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/signals/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/signals/Makefile b/src/test/signals/Makefile
new file mode 100644
index 0000000000..18d8d2d0af
--- /dev/null
+++ b/src/test/signals/Makefile
@@ -0,0 +1,27 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/recovery
+#
+# Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/recovery/Makefile
+#
+#-------------------------------------------------------------------------
+
+EXTRA_INSTALL=src/test/modules/injection_points
+
+subdir = src/test/signals
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+export enable_injection_points enable_injection_points
+
+check:
+ $(prove_check)
+
+installcheck:
+ $(prove_installcheck)
+
+clean distclean maintainer-clean:
+ rm -rf tmp_check
diff --git a/src/test/signals/meson.build b/src/test/signals/meson.build
new file mode 100644
index 0000000000..3280d79ebc
--- /dev/null
+++ b/src/test/signals/meson.build
@@ -0,0 +1,15 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+tests += {
+ 'name': 'signals',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'tap': {
+ 'env': {
+ 'enable_injection_points': get_option('injection_points') ? 'yes' : 'no',
+ },
+ 'tests': [
+ 't/001_signal_autovacuum.pl',
+ ],
+ },
+}
diff --git a/src/test/signals/t/001_signal_autovacuum.pl b/src/test/signals/t/001_signal_autovacuum.pl
new file mode 100644
index 0000000000..a219b3e2a1
--- /dev/null
+++ b/src/test/signals/t/001_signal_autovacuum.pl
@@ -0,0 +1,87 @@
+# Copyright (c) 2024-2024, PostgreSQL Global Development Group
+
+# Test signaling for pg_signal_autovacuum role.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+# Initialize postgres
+my $psql_err = '';
+my $psql_out = '';
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init();
+$node->append_conf(
+ 'postgresql.conf', 'autovacuum_naptime = 1
+');
+$node->start;
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+# Regular user and user with pg_signal_backend role cannot signal autovacuum worker
+# User with pg_signal_backend can signal autovacuum worker
+$node->safe_psql('postgres', qq(
+ CREATE ROLE regular_role;
+ CREATE ROLE signal_backend_role;
+ CREATE ROLE signal_autovacuum_role;
+ GRANT pg_signal_backend TO signal_backend_role;
+ GRANT pg_signal_autovacuum TO signal_autovacuum_role;
+));
+
+# From this point, autovacuum worker will wait before doing any vacuum.
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('autovacuum-start', 'wait');");
+
+# Create some content and set autovacuum setting such that it would be triggered.
+$node->safe_psql('postgres', qq(
+ CREATE TABLE tab_int(i int);
+ ALTER TABLE tab_int SET (autovacuum_vacuum_cost_limit = 1);
+ ALTER TABLE tab_int SET (autovacuum_vacuum_cost_delay = 100);
+));
+
+$node->safe_psql('postgres', qq(
+ INSERT INTO tab_int SELECT * FROM generate_series(1, 1000000);
+));
+
+# Wait until the autovacuum worker starts
+$node->wait_for_event('autovacuum worker', 'autovacuum-start');
+
+my $av_pid = $node->safe_psql('postgres', qq(
+ SELECT pid FROM pg_stat_activity WHERE backend_type = 'autovacuum worker';
+));
+
+# Regular user cannot terminate autovacuum worker
+my $terminate_with_no_pg_signal_av = $node->psql('postgres', qq(
+ SET ROLE regular_role;
+ SELECT pg_terminate_backend($av_pid);
+), stdout => \$psql_out, stderr => \$psql_err);
+
+ok($terminate_with_no_pg_signal_av != 0, "Terminating autovacuum worker should not succeed without pg_signal_autovacuum role");
+like($psql_err, qr/ERROR: permission denied to terminate autovacuum worker backend\nDETAIL: Only roles with the SUPERUSER attribute or with privileges of the "pg_signal_autovacuum" role may terminate autovacuum worker backend/,
+ "passcheck errors gracefully when passcheck database is invalid");
+
+# User with signal_backend_role cannot terminate autovacuum worker
+my $terminate_with_signal_backend_role = $node->psql('postgres', qq(
+ SET ROLE signal_backend_role;
+ SELECT pg_terminate_backend($av_pid);
+), stdout => \$psql_out, stderr => \$psql_err);
+
+ok($terminate_with_no_pg_signal_av != 0, "Terminating autovacuum worker should not succeed without pg_signal_autovacuum role");
+like($psql_err, qr/ERROR: permission denied to terminate autovacuum worker backend\nDETAIL: Only roles with the SUPERUSER attribute or with privileges of the "pg_signal_autovacuum" role may terminate autovacuum worker backend/,
+ "passcheck errors gracefully when passcheck database is invalid");
+
+# User with pg_signal_backend can terminate autovacuum worker
+my $terminate_with_pg_signal_av = $node->psql('postgres', qq(
+ SET ROLE signal_autovacuum_role;
+ SELECT pg_terminate_backend($av_pid);
+), stdout => \$psql_out, stderr => \$psql_err);
+
+ok($terminate_with_pg_signal_av == 0, "Terminating autovacuum worker should succeed with pg_signal_autovacuum role");
+
+done_testing();
--
2.40.1
On 5 Apr 2024, at 05:03, Leung, Anthony <antholeu@amazon.com> wrote:
Adding tap test for pg_signal_autovacuum using injection points as a separate patch. I also made a minor change on the original patch.
The test looks good, but:
1. remove references to passcheck :)
2. detach injection point when it's not needed anymore
Thanks!
Best regards, Andrey Borodin.
On Fri, Apr 05, 2024 at 12:03:05AM +0000, Leung, Anthony wrote:
Adding tap test for pg_signal_autovacuum using injection points as a
separate patch. I also made a minor change on the original patch.
+ ret = pgstat_get_beentry_by_proc_number(procNumber);
+
+ if (!ret)
+ return false;
An invalid BackendType is not false, but B_INVALID.
+{ oid => '6312', oid_symbol => 'ROLE_PG_SIGNAL_AUTOVACUUM',
OIDs in patches under development should use a value in the range
8000-9999. Newly-assigned OIDs are renumbered after the feature
freeze.
+ /*
+ * If the backend is autovacuum worker, allow user with the privileges of
+ * pg_signal_autovacuum role to signal the backend.
+ */
+ if (pgstat_get_backend_type(GetNumberFromPGProc(proc)) == B_AUTOVAC_WORKER)
+ {
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM))
+ return SIGNAL_BACKEND_NOAUTOVACUUM;
+ }
I was wondering why this is not done after we've checked that we have
a superuser-owned backend, and this is giving me a pause. @Nathan,
why do you think we should not rely on the roleId for an autovacuum
worker? In core, do_autovacuum() is only called in a process without
a role specified, and I've noticed your remark here:
/messages/by-id/20240401202146.GA2354284@nathanxps13
It's feeling more natural here to check that we have a superuser-owned
backend first, and then do a lookup of the process type.
One thing that we should definitely not do is letting any user calling
pg_signal_backend() know that a given PID maps to an autovacuum
worker. This information is hidden in pg_stat_activity. And
actually, doesn't the patch leak this information to all users when
calling pg_signal_backend with random PID numbers because of the fact
that SIGNAL_BACKEND_NOAUTOVACUUM exists? Any role could guess which
PIDs are used by an autovacuum worker because of the granularity
required for the error related to pg_signal_autovacuum.
+ INJECTION_POINT("autovacuum-start");
Perhaps autovacuum-worker-start is more suited here. I am not sure
that the beginning of do_autovacuum() is the optimal location, as what
matters is that we've done InitPostgres() to be able to grab the PID
from pg_stat_activity. This location does the job.
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
[...]
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('autovacuum-start', 'wait');");
[...]
+# Wait until the autovacuum worker starts
+$node->wait_for_event('autovacuum worker', 'autovacuum-start');
This integration with injection points looks correct to me.
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
[...]
+# Copyright (c) 2024-2024, PostgreSQL Global Development Group
These need to be cleaned up.
+# Makefile for src/test/recovery
+#
+# src/test/recovery/Makefile
This is incorrect, twice. No problems for me with using a new path in
src/test/ for that kind of tests. There are no similar locations.
+ INSERT INTO tab_int SELECT * FROM generate_series(1, 1000000);
A good chunk of the test would be spent on that, but you don't need
that many tuples to trigger an autovacuum worker as the short naptime
is able to do it. I would recommend to reduce that to a minimum.
+# User with signal_backend_role cannot terminate autovacuum worker
Not sure that there is a huge point in checking after a role that
holds pg_signal_backend. An autovacuum worker is not a backend. Only
the case of a role not member of pg_signal_autovacuum should be
enough.
+# Test signaling for pg_signal_autovacuum role.
This needs a better documentation: the purpose of the test is to
signal an autovacuum worker, aka it uses an injection point to ensure
that the worker for the whole duration of the test.
It seems to me that it would be a better practice to wakeup the
injection point and detach it before being done with the worker.
That's not mandatory but it would encourage the correct flow if this
code is copy-pasted around to other tests.
+like($psql_err, qr/ERROR: permission denied to terminate ...
Checking only the ERRROR, and not the DETAIL should be sufficient
here.
+# User with pg_signal_backend can terminate autovacuum worker
+my $terminate_with_pg_signal_av = $node->psql('postgres', qq(
+ SET ROLE signal_autovacuum_role;
+ SELECT pg_terminate_backend($av_pid);
+), stdout => \$psql_out, stderr => \$psql_err);
+
+ok($terminate_with_pg_signal_av == 0, "Terminating autovacuum worker should succeed with pg_signal_autovacuum role");
Is that enough for the validation? How about checking some pattern in
the server logs from an offset before running this last query?
--
Michael
On Fri, Apr 05, 2024 at 02:39:05PM +0900, Michael Paquier wrote:
+ /* + * If the backend is autovacuum worker, allow user with the privileges of + * pg_signal_autovacuum role to signal the backend. + */ + if (pgstat_get_backend_type(GetNumberFromPGProc(proc)) == B_AUTOVAC_WORKER) + { + if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM)) + return SIGNAL_BACKEND_NOAUTOVACUUM; + }I was wondering why this is not done after we've checked that we have
a superuser-owned backend, and this is giving me a pause. @Nathan,
why do you think we should not rely on the roleId for an autovacuum
worker? In core, do_autovacuum() is only called in a process without
a role specified, and I've noticed your remark here:
/messages/by-id/20240401202146.GA2354284@nathanxps13
It's feeling more natural here to check that we have a superuser-owned
backend first, and then do a lookup of the process type.
I figured since there's no reason to rely on that behavior, we might as
well do a bit of future-proofing in case autovacuum workers are ever not
run as InvalidOid. It'd be easy enough to fix this code if that ever
happened, so I'm not too worried about this.
One thing that we should definitely not do is letting any user calling
pg_signal_backend() know that a given PID maps to an autovacuum
worker. This information is hidden in pg_stat_activity. And
actually, doesn't the patch leak this information to all users when
calling pg_signal_backend with random PID numbers because of the fact
that SIGNAL_BACKEND_NOAUTOVACUUM exists? Any role could guess which
PIDs are used by an autovacuum worker because of the granularity
required for the error related to pg_signal_autovacuum.
Hm. I hadn't considered that angle. IIUC right now they'll just get the
generic superuser error for autovacuum workers. I don't know how concerned
to be about users distinguishing autovacuum workers from other superuser
backends, _but_ if roles with pg_signal_autovacuum can't even figure out
the PIDs for the autovacuum workers, then this feature seems kind-of
useless. Perhaps we should allow roles with privileges of
pg_signal_autovacuum to see the autovacuum workers in pg_stat_activity.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Fri, Apr 05, 2024 at 07:56:56AM -0500, Nathan Bossart wrote:
On Fri, Apr 05, 2024 at 02:39:05PM +0900, Michael Paquier wrote:
One thing that we should definitely not do is letting any user calling
pg_signal_backend() know that a given PID maps to an autovacuum
worker. This information is hidden in pg_stat_activity. And
actually, doesn't the patch leak this information to all users when
calling pg_signal_backend with random PID numbers because of the fact
that SIGNAL_BACKEND_NOAUTOVACUUM exists? Any role could guess which
PIDs are used by an autovacuum worker because of the granularity
required for the error related to pg_signal_autovacuum.Hm. I hadn't considered that angle. IIUC right now they'll just get the
generic superuser error for autovacuum workers. I don't know how concerned
to be about users distinguishing autovacuum workers from other superuser
backends, _but_ if roles with pg_signal_autovacuum can't even figure out
the PIDs for the autovacuum workers, then this feature seems kind-of
useless. Perhaps we should allow roles with privileges of
pg_signal_autovacuum to see the autovacuum workers in pg_stat_activity.
There is pg_read_all_stats as well, so I don't see a big issue in
requiring to be a member of this role as well for the sake of what's
proposing here. I'd rather not leak any information at the end for
anybody calling pg_signal_backend without access to the stats, so
checking the backend type after the role sounds kind of a safer
long-term approach for me.
--
Michael
On Sat, Apr 06, 2024 at 08:56:04AM +0900, Michael Paquier wrote:
There is pg_read_all_stats as well, so I don't see a big issue in
requiring to be a member of this role as well for the sake of what's
proposing here.
Well, that tells you quite a bit more than just which PIDs correspond to
autovacuum workers, but maybe that's good enough for now.
I'd rather not leak any information at the end for
anybody calling pg_signal_backend without access to the stats, so
checking the backend type after the role sounds kind of a safer
long-term approach for me.
I'm not following what you mean by this. Are you suggesting that we should
keep the existing superuser message for the autovacuum workers?
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Fri, Apr 05, 2024 at 08:07:51PM -0500, Nathan Bossart wrote:
On Sat, Apr 06, 2024 at 08:56:04AM +0900, Michael Paquier wrote:
There is pg_read_all_stats as well, so I don't see a big issue in
requiring to be a member of this role as well for the sake of what's
proposing here.Well, that tells you quite a bit more than just which PIDs correspond to
autovacuum workers, but maybe that's good enough for now.
That may be a good initial compromise, for now.
I'd rather not leak any information at the end for
anybody calling pg_signal_backend without access to the stats, so
checking the backend type after the role sounds kind of a safer
long-term approach for me.I'm not following what you mean by this. Are you suggesting that we should
keep the existing superuser message for the autovacuum workers?
Mostly. Just to be clear the patch has the following problem:
=# CREATE ROLE popo LOGIN;
CREATE ROLE
=# CREATE EXTENSION injection_points;
CREATE EXTENSION
=# select injection_points_attach('autovacuum-start', 'wait');
injection_points_attach
-------------------------
(1 row)
=# select pid, backend_type from pg_stat_activity
where wait_event = 'autovacuum-start' LIMIT 1;
pid | backend_type
-------+-------------------
14163 | autovacuum worker
(1 row)
=> \c postgres popo
You are now connected to database "postgres" as user "popo".
=> select pg_terminate_backend(14163);
ERROR: 42501: permission denied to terminate autovacuum worker backend
DETAIL: Only roles with the SUPERUSER attribute or with privileges of
the "pg_signal_autovacuum" role may terminate autovacuum worker
backend
LOCATION: pg_terminate_backend, signalfuncs.c:267
=> select backend_type from pg_stat_activity where pid = 14163;
backend_type
--------------
null
(1 row)
And we should try to reshape things so as we get an ERROR like
"permission denied to terminate process" or "permission denied to
cancel query" for all the error paths, including autovacuum workers
and backends, so as we never leak any information about the backend
types involved when a role has no permission to issue the signal.
Perhaps that's the most intuitive thing as well, because autovacuum
workers are backends. One thing that we could do is to mention both
pg_signal_backend and pg_signal_autovacuum in the errdetail, and have
both cases be handled by SIGNAL_BACKEND_NOPERMISSION on failure.
--
Michael
There is pg_read_all_stats as well, so I don't see a big issue in
requiring to be a member of this role as well for the sake of what's
proposing here.Well, that tells you quite a bit more than just which PIDs correspond to
autovacuum workers, but maybe that's good enough for now.That may be a good initial compromise, for now.
Sounds good to me. I will update the documentation.
And we should try to reshape things so as we get an ERROR like
"permission denied to terminate process" or "permission denied to
cancel query" for all the error paths, including autovacuum workers
and backends, so as we never leak any information about the backend
types involved when a role has no permission to issue the signal.
Perhaps that's the most intuitive thing as well, because autovacuum
workers are backends. One thing that we could do is to mention both
pg_signal_backend and pg_signal_autovacuum in the errdetail, and have
both cases be handled by SIGNAL_BACKEND_NOPERMISSION on failure.
I understand your concern that we should avoid exposing the fact that the backend which the user is attempting to terminate is an AV worker unless the user has pg_signal_backend privileges and pg_signal_autovacuum privileges.
But Im not following how we can re-use SIGNAL_BACKEND_NOPERMISSION for this. If we return SIGNAL_BACKEND_NOPERMISSION here as the following, it'll stay return the "permission denied to terminate / cancel query" errmsg and errdetail in pg_cancel/terminate_backend.
/*
* If the backend is autovacuum worker, allow user with the privileges of
* pg_signal_autovacuum role to signal the backend.
*/
if (pgstat_get_backend_type(GetNumberFromPGProc(proc)) == B_AUTOVAC_WORKER)
{
if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM))
return SIGNAL_BACKEND_NOPERMISSION;
}
Are you suggesting that we check if the backend is B_AUTOVAC in pg_cancel/ terminate_backend? That seems a bit unclean to me since pg_cancel_backend & pg_cancel_backend does not access to the procNumber to check the type of the backend.
IMHO, we can keep SIGNAL_BACKEND_NOAUTOVACUUM but just improve the errmsg / errdetail to not expose that the backend is an AV worker. It'll also be helpful if you can suggest what errdetail we should use here.
Thanks
--
Anthony Leung
Amazon Web Services: https://aws.amazon.com
On Mon, Apr 08, 2024 at 05:42:05PM +0000, Leung, Anthony wrote:
Are you suggesting that we check if the backend is B_AUTOVAC in
pg_cancel/ terminate_backend? That seems a bit unclean to me since
pg_cancel_backend & pg_cancel_backend does not access to the
procNumber to check the type of the backend.IMHO, we can keep SIGNAL_BACKEND_NOAUTOVACUUM but just improve the
errmsg / errdetail to not expose that the backend is an AV
worker. It'll also be helpful if you can suggest what errdetail we
should use here.
The thing is that you cannot rely on a lookup of the backend type for
the error information, or you open yourself to letting the caller of
pg_cancel_backend or pg_terminate_backend know if a backend is
controlled by a superuser or if a backend is an autovacuum worker.
And they may have no access to this information by default, except if
the role is a member of pg_read_all_stats able to scan
pg_stat_activity. An option that I can think of, even if it is not
the most elegant ever, would be list all the possible system users
that can be used in the errdetail under a single SIGNAL_BACKEND_NO*
state.
In the case of your patch it would mean to mention both
pg_signal_backend and pg_signal_autovacuum.
The choice of pg_signal_autovacuum is a bit inconsistent, as well,
because autovacuum workers operate like regular backends. This name
can also be confused with the autovacuum launcher.
--
Michael
Hi, thanks for looking into this.
On Tue, 9 Apr 2024 at 08:53, Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Apr 08, 2024 at 05:42:05PM +0000, Leung, Anthony wrote:
Are you suggesting that we check if the backend is B_AUTOVAC in
pg_cancel/ terminate_backend? That seems a bit unclean to me since
pg_cancel_backend & pg_cancel_backend does not access to the
procNumber to check the type of the backend.IMHO, we can keep SIGNAL_BACKEND_NOAUTOVACUUM but just improve the
errmsg / errdetail to not expose that the backend is an AV
worker. It'll also be helpful if you can suggest what errdetail we
should use here.The thing is that you cannot rely on a lookup of the backend type for
the error information, or you open yourself to letting the caller of
pg_cancel_backend or pg_terminate_backend know if a backend is
controlled by a superuser or if a backend is an autovacuum worker.
Good catch. Thanks. I think we need to update the error message to not
leak backend type info.
The choice of pg_signal_autovacuum is a bit inconsistent, as well,
because autovacuum workers operate like regular backends. This name
can also be confused with the autovacuum launcher.
Ok. What would be a good choice? Is `pg_signal_autovacuum_worker` good
enough?
On Wed, Apr 10, 2024 at 12:52:19AM +0300, Kirill Reshke wrote:
On Tue, 9 Apr 2024 at 08:53, Michael Paquier <michael@paquier.xyz> wrote:
The thing is that you cannot rely on a lookup of the backend type for
the error information, or you open yourself to letting the caller of
pg_cancel_backend or pg_terminate_backend know if a backend is
controlled by a superuser or if a backend is an autovacuum worker.Good catch. Thanks. I think we need to update the error message to not
leak backend type info.
Yep, that's necessary I am afraid.
The choice of pg_signal_autovacuum is a bit inconsistent, as well,
because autovacuum workers operate like regular backends. This name
can also be confused with the autovacuum launcher.Ok. What would be a good choice? Is `pg_signal_autovacuum_worker` good
enough?
Sounds fine to me. Perhaps others have an opinion about that?
--
Michael
On Wed, Apr 10, 2024 at 07:58:39AM +0900, Michael Paquier wrote:
On Wed, Apr 10, 2024 at 12:52:19AM +0300, Kirill Reshke wrote:
On Tue, 9 Apr 2024 at 08:53, Michael Paquier <michael@paquier.xyz> wrote:
The thing is that you cannot rely on a lookup of the backend type for
the error information, or you open yourself to letting the caller of
pg_cancel_backend or pg_terminate_backend know if a backend is
controlled by a superuser or if a backend is an autovacuum worker.Good catch. Thanks. I think we need to update the error message to not
leak backend type info.Yep, that's necessary I am afraid.
Isn't it relatively easy to discover this same information today via
pg_stat_progress_vacuum? That has the following code:
/* Value available to all callers */
values[0] = Int32GetDatum(beentry->st_procpid);
values[1] = ObjectIdGetDatum(beentry->st_databaseid);
I guess I'm not quite following why we are worried about leaking whether a
backend is an autovacuum worker.
The choice of pg_signal_autovacuum is a bit inconsistent, as well,
because autovacuum workers operate like regular backends. This name
can also be confused with the autovacuum launcher.Ok. What would be a good choice? Is `pg_signal_autovacuum_worker` good
enough?Sounds fine to me. Perhaps others have an opinion about that?
WFM
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Wed, Apr 10, 2024 at 10:00:34AM -0500, Nathan Bossart wrote:
Isn't it relatively easy to discover this same information today via
pg_stat_progress_vacuum? That has the following code:/* Value available to all callers */
values[0] = Int32GetDatum(beentry->st_procpid);
values[1] = ObjectIdGetDatum(beentry->st_databaseid);I guess I'm not quite following why we are worried about leaking whether a
backend is an autovacuum worker.
Good point. I've missed that we make no effort currently to hide any
PID information from the progress tables. And we can guess more
context data because of the per-table split of the progress tables.
This choice comes down to b6fb6471f6af that has introduced the
progress report facility, so this ship has long sailed it seems. And
it makes my argument kind of moot.
--
Michael
Posting updated version of this patch with comments above addressed.
1) pg_signal_autovacuum -> pg_signal_autovacuum_worker, as there seems
to be no objections to that.
2)
There are comments on how to write if statement:
In core, do_autovacuum() is only called in a process without
a role specified
It's feeling more natural here to check that we have a superuser-owned
backend first, and then do a lookup of the process type.
I figured since there's no reason to rely on that behavior, we might as
well do a bit of future-proofing in case autovacuum workers are ever not
run as InvalidOid.
I have combined them into this:
if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId))
&& pgstat_get_backend_type(GetNumberFromPGProc(proc)) == B_AUTOVAC_WORKER)
This is both future-proofing and natural, I suppose. Downside of this
is double checking condition (!OidIsValid(proc->roleId) ||
superuser_arg(proc->roleId)), but i think that is ok for the sake of
simplicity.
3) pg_signal_autovacuum_worker Oid changed to random one: 8916
4)
An invalid BackendType is not false, but B_INVALID.
fixed, thanks
5)
There is pg_read_all_stats as well, so I don't see a big issue in
requiring to be a member of this role as well for the sake of what's
proposing here.Well, that tells you quite a bit more than just which PIDs correspond to
autovacuum workers, but maybe that's good enough for now.That may be a good initial compromise, for now.
Sounds good to me. I will update the documentation.
@Anthony if you feel that documentation update adds much value here,
please do. Given that we know autovacuum worker PIDs from
pg_stat_progress_vacuum, I don't know how to reflect something about
pg_stat_autovac_worker in doc, and if it is worth it.
6)
+ INJECTION_POINT("autovacuum-start");
Perhaps autovacuum-worker-start is more suited here
fixed, thanks
7)
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group [...] +# Copyright (c) 2024-2024, PostgreSQL Global Development Group
These need to be cleaned up.
+# Makefile for src/test/recovery +# +# src/test/recovery/Makefile
This is incorrect, twice.
Cleaned up, thanks!
8)
Not sure that there is a huge point in checking after a role that
holds pg_signal_backend.
Ok. Removed.
Then:
+like($psql_err, qr/ERROR: permission denied to terminate ...
Checking only the ERRROR, and not the DETAIL should be sufficient
here.
After removing the pg_signal_backend test case we have only one place
where errors check is done. So, I think we should keep DETAIL here to
ensure detail is correct (it differs from regular backend case).
9)
+# Test signaling for pg_signal_autovacuum role.
This needs a better documentation:
Updated. Hope now the test documentation helps to understand it.
10)
+ok($terminate_with_pg_signal_av == 0, "Terminating autovacuum worker should succeed with pg_signal_autovacuum role");
Is that enough for the validation?
Added:
ok($node->log_contains(qr/FATAL: terminating autovacuum process due to
administrator command/, $offset),
"Autovacuum terminates when role is granted with pg_signal_autovacuum_worker");
11) references to `passcheck` extension removed. errors messages rephrased.
12) injection_point_detach added.
13)
+ INSERT INTO tab_int SELECT * FROM generate_series(1, 1000000);
A good chunk of the test would be spent on that, but you don't need
that many tuples to trigger an autovacuum worker as the short naptime
is able to do it. I would recommend to reduce that to a minimum.
+1
Single tuple works.
14)
v3 suffers from segfault:
2024-04-11 11:28:31.116 UTC [147437] 001_signal_autovacuum.pl LOG:
statement: SELECT pg_terminate_backend(147427);
2024-04-11 11:28:31.116 UTC [147427] FATAL: terminating autovacuum
process due to administrator command
2024-04-11 11:28:31.116 UTC [147410] LOG: server process (PID 147427)
was terminated by signal 11: Segmentation fault
2024-04-11 11:28:31.116 UTC [147410] LOG: terminating any other
active server processes
2024-04-11 11:28:31.117 UTC [147410] LOG: shutting down because
restart_after_crash is off
2024-04-11 11:28:31.121 UTC [147410] LOG: database system is shut down
The test doesn't fail because pg_terminate_backend actually meets his
point: autovac is killed. But while dying, autovac also receives
segfault. Thats because of injections points:
(gdb) bt
#0 0x000056361c3379ea in tas (lock=0x7fbcb9632224 <error: Cannot
access memory at address 0x7fbcb9632224>) at
../../../../src/include/storage/s_lock.h:228
#1 ConditionVariableCancelSleep () at condition_variable.c:238
#2 0x000056361c337e4b in ConditionVariableBroadcast
(cv=0x7fbcb66f498c) at condition_variable.c:310
#3 0x000056361c330a40 in CleanupProcSignalState (status=<optimized
out>, arg=<optimized out>) at procsignal.c:240
#4 0x000056361c328801 in shmem_exit (code=code@entry=1) at ipc.c:276
#5 0x000056361c3288fc in proc_exit_prepare (code=code@entry=1) at ipc.c:198
#6 0x000056361c3289bf in proc_exit (code=code@entry=1) at ipc.c:111
#7 0x000056361c49ffa8 in errfinish (filename=<optimized out>,
lineno=<optimized out>, funcname=0x56361c654370 <__func__.16>
"ProcessInterrupts") at elog.c:592
#8 0x000056361bf7191e in ProcessInterrupts () at postgres.c:3264
#9 0x000056361c3378d7 in ConditionVariableTimedSleep
(cv=0x7fbcb9632224, timeout=timeout@entry=-1,
wait_event_info=117440513) at condition_variable.c:196
#10 0x000056361c337d0b in ConditionVariableTimedSleep
(wait_event_info=<optimized out>, timeout=-1, cv=<optimized out>) at
condition_variable.c:135
#11 ConditionVariableSleep (cv=<optimized out>,
wait_event_info=<optimized out>) at condition_variable.c:98
#12 0x00000000b96347d0 in ?? ()
#13 0x3a3f1d9baa4f5500 in ?? ()
#14 0x000056361cc6cbd0 in ?? ()
#15 0x000056361ccac300 in ?? ()
#16 0x000056361c62be63 in ?? ()
#17 0x00007fbcb96347d0 in ?? () at injection_points.c:201 from
/home/reshke/postgres/tmp_install/home/reshke/postgres/pgbin/lib/injection_points.so
#18 0x00007fffe4122b10 in ?? ()
#19 0x00007fffe4122b70 in ?? ()
#20 0x0000000000000000 in ?? ()
discovered because of
# Release injection point.
$node->safe_psql('postgres',
"SELECT injection_point_detach('autovacuum-worker-start');");
added
v4 also suffers from that. i will try to fix that.
Attachments:
v4-0001-Introduce-pg_signal_autovacuum-role-to-allow-non-.patchapplication/octet-stream; name=v4-0001-Introduce-pg_signal_autovacuum-role-to-allow-non-.patchDownload
From 591fffa188eaecf59280ff32fe377fc782dd6457 Mon Sep 17 00:00:00 2001
From: Anthony Leung <antholeu@amazon.com>
Date: Thu, 4 Apr 2024 20:24:33 +0000
Subject: [PATCH v4 1/2] Introduce pg_signal_autovacuum role to allow
non-superuser to signal autovacuum worker
Non-superusers are currently not allowed to signal autovacuum
workers which can create friction in some scenarios when performing
maintenance. This commit introduce a new pre-defined role
'pg_signal_autovacuum'. Non-superusers granted with this role are
allowed to signal backends that are running autovacuum.
---
doc/src/sgml/user-manag.sgml | 4 +++
src/backend/storage/ipc/signalfuncs.c | 40 +++++++++++++++++----
src/backend/utils/activity/backend_status.c | 19 ++++++++++
src/include/catalog/pg_authid.dat | 5 +++
src/include/utils/backend_status.h | 2 +-
5 files changed, 63 insertions(+), 7 deletions(-)
diff --git a/doc/src/sgml/user-manag.sgml b/doc/src/sgml/user-manag.sgml
index 07a16247d7..2c5b569d4d 100644
--- a/doc/src/sgml/user-manag.sgml
+++ b/doc/src/sgml/user-manag.sgml
@@ -705,6 +705,10 @@ DROP ROLE doomed_role;
database to issue
<link linkend="sql-createsubscription"><command>CREATE SUBSCRIPTION</command></link>.</entry>
</row>
+ <row>
+ <entry>pg_signal_autovacuum_worker</entry>
+ <entry>Allow signaling autovacuum worker backend to cancel or terminate</entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 88e9bf8125..12d93ba83f 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -45,6 +45,7 @@
#define SIGNAL_BACKEND_ERROR 1
#define SIGNAL_BACKEND_NOPERMISSION 2
#define SIGNAL_BACKEND_NOSUPERUSER 3
+#define SIGNAL_BACKEND_NOAUTOVACUUM 4
static int
pg_signal_backend(int pid, int sig)
{
@@ -74,19 +75,32 @@ pg_signal_backend(int pid, int sig)
return SIGNAL_BACKEND_ERROR;
}
+ /*
+ * If the backend is autovacuum worker, allow user with the privileges of
+ * pg_signal_autovacuum_worker role to signal the backend.
+ */
+ if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId))
+ && pgstat_get_backend_type(GetNumberFromPGProc(proc)) == B_AUTOVAC_WORKER)
+ {
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM_WORKER))
+ return SIGNAL_BACKEND_NOAUTOVACUUM;
+ }
/*
* Only allow superusers to signal superuser-owned backends. Any process
* not advertising a role might have the importance of a superuser-owned
* backend, so treat it that way.
- */
- if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
- !superuser())
+ */
+ else if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
+ !superuser())
+ {
return SIGNAL_BACKEND_NOSUPERUSER;
-
+ }
/* Users can signal backends they have role membership in. */
- if (!has_privs_of_role(GetUserId(), proc->roleId) &&
- !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
+ else if (!has_privs_of_role(GetUserId(), proc->roleId) &&
+ !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
+ {
return SIGNAL_BACKEND_NOPERMISSION;
+ }
/*
* Can the process we just validated above end, followed by the pid being
@@ -137,6 +151,13 @@ pg_cancel_backend(PG_FUNCTION_ARGS)
errdetail("Only roles with privileges of the role whose query is being canceled or with privileges of the \"%s\" role may cancel this query.",
"pg_signal_backend")));
+ if (r == SIGNAL_BACKEND_NOAUTOVACUUM)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to cancel autovacuum worker backend"),
+ errdetail("Only roles with the %s attribute or with privileges of the \"%s\" role may cancel autovacuum worker backend",
+ "SUPERUSER", "pg_signal_autovacuum_worker")));
+
PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
}
@@ -243,6 +264,13 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
errdetail("Only roles with privileges of the role whose process is being terminated or with privileges of the \"%s\" role may terminate this process.",
"pg_signal_backend")));
+ if (r == SIGNAL_BACKEND_NOAUTOVACUUM)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to terminate autovacuum worker backend"),
+ errdetail("Only roles with the %s attribute or with privileges of the \"%s\" role may terminate autovacuum worker backend",
+ "SUPERUSER", "pg_signal_autovacuum_worker")));
+
/* Wait only on success and if actually requested */
if (r == SIGNAL_BACKEND_SUCCESS && timeout > 0)
PG_RETURN_BOOL(pg_wait_until_termination(pid, timeout));
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 1ccf4c6d83..a6c7899318 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -1038,6 +1038,25 @@ pgstat_get_my_query_id(void)
return MyBEEntry->st_query_id;
}
+/* ----------
+ * pgstat_get_backend_type() -
+ *
+ * Return the backend type of the backend for the given proc number.
+ * ----------
+ */
+BackendType
+pgstat_get_backend_type(ProcNumber procNumber)
+{
+ PgBackendStatus *ret;
+
+ ret = pgstat_get_beentry_by_proc_number(procNumber);
+
+ if (!ret)
+ return B_INVALID;
+
+ return ret->st_backendType;
+}
+
/* ----------
* cmp_lbestatus
*
diff --git a/src/include/catalog/pg_authid.dat b/src/include/catalog/pg_authid.dat
index 55cabdda6f..6156fc55dd 100644
--- a/src/include/catalog/pg_authid.dat
+++ b/src/include/catalog/pg_authid.dat
@@ -99,5 +99,10 @@
rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
rolpassword => '_null_', rolvaliduntil => '_null_' },
+{ oid => '8916', oid_symbol => 'ROLE_PG_SIGNAL_AUTOVACUUM_WORKER',
+ rolname => 'pg_signal_autovacuum_worker', rolsuper => 'f', rolinherit => 't',
+ rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
+ rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
+ rolpassword => '_null_', rolvaliduntil => '_null_' },
]
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 7b7f6f59d0..7fd9cd7167 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -323,7 +323,7 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern BackendType pgstat_get_backend_type(ProcNumber procNumber);
/* ----------
* Support functions for the SQL-callable functions to
--
2.25.1
v4-0002-Add-tap-test-for-pg_signal_autovacuum-role.patchapplication/octet-stream; name=v4-0002-Add-tap-test-for-pg_signal_autovacuum-role.patchDownload
From 7661fb08515c2d1f159111e3f11a92d406050cb0 Mon Sep 17 00:00:00 2001
From: Anthony Leung <antholeu@amazon.com>
Date: Thu, 4 Apr 2024 23:33:13 +0000
Subject: [PATCH v4 2/2] Add tap test for pg_signal_autovacuum role
Add a tap test to validate the following:
1. Normal user cannot signal autovacuum worker backend
2. User with pg_signal_backend role cannot signal autovacuum worker
backend
3. User with pg_signal_autovacuum role can signal autovacuum worker
backend
---
src/backend/postmaster/autovacuum.c | 3 +
src/test/signals/.gitignore | 2 +
src/test/signals/Makefile | 27 ++++++
src/test/signals/meson.build | 15 ++++
src/test/signals/t/001_signal_autovacuum.pl | 93 +++++++++++++++++++++
5 files changed, 140 insertions(+)
create mode 100644 src/test/signals/.gitignore
create mode 100644 src/test/signals/Makefile
create mode 100644 src/test/signals/meson.build
create mode 100644 src/test/signals/t/001_signal_autovacuum.pl
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 170b973cc5..f13076f3b3 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -100,6 +100,7 @@
#include "utils/fmgroids.h"
#include "utils/fmgrprotos.h"
#include "utils/guc_hooks.h"
+#include "utils/injection_point.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/ps_status.h"
@@ -1889,6 +1890,8 @@ do_autovacuum(void)
bool found_concurrent_worker = false;
int i;
+ INJECTION_POINT("autovacuum-worker-start");
+
/*
* StartTransactionCommand and CommitTransactionCommand will automatically
* switch to other contexts. We need this one to keep the list of
diff --git a/src/test/signals/.gitignore b/src/test/signals/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/signals/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/signals/Makefile b/src/test/signals/Makefile
new file mode 100644
index 0000000000..b5c1d65be4
--- /dev/null
+++ b/src/test/signals/Makefile
@@ -0,0 +1,27 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/signals
+#
+# Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/signals/Makefile
+#
+#-------------------------------------------------------------------------
+
+EXTRA_INSTALL=src/test/modules/injection_points
+
+subdir = src/test/signals
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+export enable_injection_points enable_injection_points
+
+check:
+ $(prove_check)
+
+installcheck:
+ $(prove_installcheck)
+
+clean distclean maintainer-clean:
+ rm -rf tmp_check
diff --git a/src/test/signals/meson.build b/src/test/signals/meson.build
new file mode 100644
index 0000000000..aec7437fb0
--- /dev/null
+++ b/src/test/signals/meson.build
@@ -0,0 +1,15 @@
+# Copyright (c) 2024-2024, PostgreSQL Global Development Group
+
+tests += {
+ 'name': 'signals',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'tap': {
+ 'env': {
+ 'enable_injection_points': get_option('injection_points') ? 'yes' : 'no',
+ },
+ 'tests': [
+ 't/001_signal_autovacuum.pl',
+ ],
+ },
+}
diff --git a/src/test/signals/t/001_signal_autovacuum.pl b/src/test/signals/t/001_signal_autovacuum.pl
new file mode 100644
index 0000000000..fd5850958c
--- /dev/null
+++ b/src/test/signals/t/001_signal_autovacuum.pl
@@ -0,0 +1,93 @@
+# Copyright (c) 2024-2024, PostgreSQL Global Development Group
+
+# Test signaling autovacuum worker backend by non-superuser role.
+# Regular roles are tested to fail to signals autovacuum worker, roles
+# with pg_signal_autovacuum_worker are tested to success this.
+# Test uses an injection point to ensure that the worker is present
+# for the whole duration of the test.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+# Initialize postgres
+my $psql_err = '';
+my $psql_out = '';
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init();
+$node->append_conf(
+ 'postgresql.conf', 'autovacuum_naptime = 1
+');
+$node->start;
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+# Regular user and user with pg_signal_backend role cannot signal autovacuum worker
+# User with pg_signal_backend can signal autovacuum worker
+$node->safe_psql('postgres', qq(
+ CREATE ROLE regular_role;
+ CREATE ROLE signal_backend_role;
+ CREATE ROLE signal_autovacuum_worker_role;
+ GRANT pg_signal_backend TO signal_backend_role;
+ GRANT pg_signal_autovacuum_worker TO signal_autovacuum_worker_role;
+));
+
+# From this point, autovacuum worker will wait before doing any vacuum.
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('autovacuum-worker-start', 'wait');");
+
+# Create some content and set autovacuum setting such that it would be triggered.
+$node->safe_psql('postgres', qq(
+ CREATE TABLE tab_int(i int);
+ ALTER TABLE tab_int SET (autovacuum_vacuum_cost_limit = 1);
+ ALTER TABLE tab_int SET (autovacuum_vacuum_cost_delay = 100);
+));
+
+$node->safe_psql('postgres', qq(
+ INSERT INTO tab_int VALUES(1);
+));
+
+# Wait until the autovacuum worker starts
+$node->wait_for_event('autovacuum worker', 'autovacuum-worker-start');
+
+my $av_pid = $node->safe_psql('postgres', qq(
+ SELECT pid FROM pg_stat_activity WHERE backend_type = 'autovacuum worker';
+));
+
+# Regular user cannot terminate autovacuum worker
+my $terminate_with_no_pg_signal_av = $node->psql('postgres', qq(
+ SET ROLE regular_role;
+ SELECT pg_terminate_backend($av_pid);
+), stdout => \$psql_out, stderr => \$psql_err);
+
+ok($terminate_with_no_pg_signal_av != 0, "Terminating autovacuum worker should not succeed without pg_signal_autovacuum role");
+like($psql_err, qr/ERROR: permission denied to terminate autovacuum worker backend\nDETAIL: Only roles with the SUPERUSER attribute or with privileges of the "pg_signal_autovacuum_worker" role may terminate autovacuum worker backend/,
+ "Terminating autovacuum errors gracefully when role is not granted with pg_signal_autovacuum_worker");
+
+my $offset = -s $node->logfile;
+
+# User with pg_signal_autovacuum can terminate autovacuum worker
+my $terminate_with_pg_signal_av = $node->psql('postgres', qq(
+ SET ROLE signal_autovacuum_worker_role;
+ SELECT pg_terminate_backend($av_pid);
+), stdout => \$psql_out, stderr => \$psql_err);
+
+
+ok($terminate_with_pg_signal_av == 0, "Terminating autovacuum worker should succeed with pg_signal_autovacuum role");
+
+# Check that the primary server logs a FATAL indicating that
+# autovacuum is terminated.
+ok($node->log_contains(qr/FATAL: terminating autovacuum process due to administrator command/, $offset),
+ "Autovacuum terminates when role is granted with pg_signal_autovacuum_worker");
+
+# Release injection point.
+$node->safe_psql('postgres',
+ "SELECT injection_point_detach('autovacuum-worker-start');");
+
+done_testing();
--
2.25.1
On Thu, Apr 11, 2024 at 04:55:59PM +0500, Kirill Reshke wrote:
Posting updated version of this patch with comments above addressed.
I look for a commitfest entry for this one, but couldn't find it. Would
you mind either creating one or, if I've somehow missed it, pointing me to
the existing entry?
https://commitfest.postgresql.org/48/
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Apr 11, 2024 at 04:55:59PM +0500, Kirill Reshke wrote:
It's feeling more natural here to check that we have a superuser-owned
backend first, and then do a lookup of the process type.I figured since there's no reason to rely on that behavior, we might as
well do a bit of future-proofing in case autovacuum workers are ever not
run as InvalidOid.I have combined them into this:
if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId))
&& pgstat_get_backend_type(GetNumberFromPGProc(proc)) == B_AUTOVAC_WORKER)This is both future-proofing and natural, I suppose. Downside of this
is double checking condition (!OidIsValid(proc->roleId) ||
superuser_arg(proc->roleId)), but i think that is ok for the sake of
simplicity.
If we want to retain the check, IMO we might as well combine the first two
blocks like Anthony proposed:
if (!OidIsValid(proc->roleId) || superuser_arg(proc->roleId))
{
ProcNumber procNumber = GetNumberFromPGProc(proc);
PGBackendStatus procStatus = pgstat_get_beentry_by_proc_number(procNumber);
if (procStatus && procStatus->st_backendType == B_AUTOVAC_WORKER &&
!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVAC_WORKER))
return SIGNAL_BACKEND_NOAUTOVAC;
else if (!superuser())
return SIGNAL_BACKEND_NOSUPERUSER;
}
+ <row>
+ <entry>pg_signal_autovacuum_worker</entry>
+ <entry>Allow signaling autovacuum worker backend to cancel or terminate</entry>
+ </row>
I think we need to be more specific about what pg_cancel_backend() and
pg_terminate_backend() do for autovacuum workers. The code offers some
clues:
/*
* SIGINT is used to signal canceling the current table's vacuum; SIGTERM
* means abort and exit cleanly, and SIGQUIT means abandon ship.
*/
pqsignal(SIGINT, StatementCancelHandler);
pqsignal(SIGTERM, die);
+/* ----------
+ * pgstat_get_backend_type() -
+ *
+ * Return the backend type of the backend for the given proc number.
+ * ----------
+ */
+BackendType
+pgstat_get_backend_type(ProcNumber procNumber)
+{
+ PgBackendStatus *ret;
+
+ ret = pgstat_get_beentry_by_proc_number(procNumber);
+
+ if (!ret)
+ return B_INVALID;
+
+ return ret->st_backendType;
+}
I'm not sure we really need to introduce a new function for this. I
avoided using it in my example snippet above. But, maybe it'll come in
handy down the road...
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, 11 Apr 2024 at 19:07, Nathan Bossart <nathandbossart@gmail.com> wrote:
On Thu, Apr 11, 2024 at 04:55:59PM +0500, Kirill Reshke wrote:
Posting updated version of this patch with comments above addressed.
I look for a commitfest entry for this one, but couldn't find it. Would
you mind either creating one
On Thu, Apr 11, 2024 at 04:55:59PM +0500, Kirill Reshke wrote:
The test doesn't fail because pg_terminate_backend actually meets his
point: autovac is killed. But while dying, autovac also receives
segfault. Thats because of injections points:(gdb) bt
#0 0x000056361c3379ea in tas (lock=0x7fbcb9632224 <error: Cannot
access memory at address 0x7fbcb9632224>) at
../../../../src/include/storage/s_lock.h:228
#1 ConditionVariableCancelSleep () at condition_variable.c:238
#2 0x000056361c337e4b in ConditionVariableBroadcast
(cv=0x7fbcb66f498c) at condition_variable.c:310
#3 0x000056361c330a40 in CleanupProcSignalState (status=<optimized
out>, arg=<optimized out>) at procsignal.c:240
#4 0x000056361c328801 in shmem_exit (code=code@entry=1) at ipc.c:276
#5 0x000056361c3288fc in proc_exit_prepare (code=code@entry=1) at ipc.c:198
#6 0x000056361c3289bf in proc_exit (code=code@entry=1) at ipc.c:111
#7 0x000056361c49ffa8 in errfinish (filename=<optimized out>,
lineno=<optimized out>, funcname=0x56361c654370 <__func__.16>
"ProcessInterrupts") at elog.c:592
#8 0x000056361bf7191e in ProcessInterrupts () at postgres.c:3264
#9 0x000056361c3378d7 in ConditionVariableTimedSleep
(cv=0x7fbcb9632224, timeout=timeout@entry=-1,
wait_event_info=117440513) at condition_variable.c:196
#10 0x000056361c337d0b in ConditionVariableTimedSleep
(wait_event_info=<optimized out>, timeout=-1, cv=<optimized out>) at
condition_variable.c:135
#11 ConditionVariableSleep (cv=<optimized out>,
wait_event_info=<optimized out>) at condition_variable.c:98discovered because of
# Release injection point.
$node->safe_psql('postgres',
"SELECT injection_point_detach('autovacuum-worker-start');");
addedv4 also suffers from that. i will try to fix that.
I can see this stack trace as well. Capturing a bit more than your
own stack, this is crashing in the autovacuum worker while waiting on
a condition variable when processing a ProcessInterrupts().
That may point to a legit bug with condition variables in this
context, actually? From what I can see, issuing a signal on a backend
process waiting with a condition variable is able to process the
interrupt correctly.
--
Michael
At Fri, 12 Apr 2024 09:10:35 +0900, Michael Paquier <michael@paquier.xyz> wrote in
On Thu, Apr 11, 2024 at 04:55:59PM +0500, Kirill Reshke wrote:
The test doesn't fail because pg_terminate_backend actually meets his
point: autovac is killed. But while dying, autovac also receives
segfault. Thats because of injections points:(gdb) bt
#0 0x000056361c3379ea in tas (lock=0x7fbcb9632224 <error: Cannot
access memory at address 0x7fbcb9632224>) at
../../../../src/include/storage/s_lock.h:228
#1 ConditionVariableCancelSleep () at condition_variable.c:238
...
#3 0x000056361c330a40 in CleanupProcSignalState (status=<optimized
out>, arg=<optimized out>) at procsignal.c:240
#4 0x000056361c328801 in shmem_exit (code=code@entry=1) at ipc.c:276
#9 0x000056361c3378d7 in ConditionVariableTimedSleep
(cv=0x7fbcb9632224, timeout=timeout@entry=-1,
...
I can see this stack trace as well. Capturing a bit more than your
own stack, this is crashing in the autovacuum worker while waiting on
a condition variable when processing a ProcessInterrupts().That may point to a legit bug with condition variables in this
context, actually? From what I can see, issuing a signal on a backend
process waiting with a condition variable is able to process the
interrupt correctly.
ProcSignalInit sets up CleanupProcSignalState to be called via
on_shmem_exit. If the CV is allocated in a dsm segment, shmem_exit
should have detached the region for the CV. CV cleanup code should be
invoked via before_shmem_exit.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Thu, 11 Apr 2024 at 16:55, Kirill Reshke <reshkekirill@gmail.com> wrote:
7)
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group [...] +# Copyright (c) 2024-2024, PostgreSQL Global Development GroupThese need to be cleaned up.
+# Makefile for src/test/recovery +# +# src/test/recovery/MakefileThis is incorrect, twice.
Cleaned up, thanks!
Oh, wait, I did this wrong.
Should i use
+# Copyright (c) 2024-2024, PostgreSQL Global Development Group
(Like in src/test/signals/meson.build &
src/test/signals/t/001_signal_autovacuum.pl)
or
+#
+# Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
(Like in src/test/signals/Makefile)
at the beginning of each added file?
On Fri, Apr 12, 2024 at 01:32:42PM +0500, Kirill Reshke wrote:
+# +# Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group +# Portions Copyright (c) 1994, Regents of the University of California +# (Like in src/test/signals/Makefile)at the beginning of each added file?
Assuming that these files are merged in 2024, you could just use:
Copyright (c) 2024, PostgreSQL Global Development Group
See for example slotsync.c introduced recently in commit ddd5f4f54a02.
--
Michael
I adjusted 0001 based on my upthread feedback.
--
nathan
Attachments:
v5-0001-Introduce-pg_signal_autovacuum_worker.patchtext/plain; charset=us-asciiDownload
From 118f95346fcf8099ab28d2f9186185171e3b88af Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Wed, 12 Jun 2024 15:38:14 -0500
Subject: [PATCH v5 1/1] Introduce pg_signal_autovacuum_worker.
Since commit 3a9b18b309, roles with privileges of pg_signal_backend
cannot signal autovacuum workers. Many users treated the ability
to signal autovacuum workers as a feature instead of a bug, so we
are reintroducing it via a new predefined role. Having privileges
of this new role, named pg_signal_autovacuum_worker, only permits
signaling autovacuum workers. It does not permit signaling other
types of superuser backends.
Author: Kirill Reshke
Reviewed-by: Anthony Leung, Michael Paquier
Discussion: https://postgr.es/m/CALdSSPhC4GGmbnugHfB9G0%3DfAxjCSug_-rmL9oUh0LTxsyBfsg%40mail.gmail.com
---
doc/src/sgml/func.sgml | 8 +++--
doc/src/sgml/user-manag.sgml | 5 +++
src/backend/storage/ipc/signalfuncs.c | 46 +++++++++++++++++++++------
src/include/catalog/pg_authid.dat | 5 +++
4 files changed, 53 insertions(+), 11 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 17c44bc338..7d5b7acb68 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28037,7 +28037,9 @@ SELECT currval(pg_get_serial_sequence('sometable', 'id'));
<para>
Cancels the current query of the session whose backend process has the
specified process ID. This is also allowed if the
- calling role is a member of the role whose backend is being canceled or
+ calling role has privileges of the role whose backend is being canceled,
+ the backend process is an autovacuum worker and the calling role has
+ privileges of <literal>pg_signal_autovacuum_worker</literal>, or
the calling role has privileges of <literal>pg_signal_backend</literal>,
however only superusers can cancel superuser backends.
</para></entry>
@@ -28111,7 +28113,9 @@ SELECT currval(pg_get_serial_sequence('sometable', 'id'));
<para>
Terminates the session whose backend process has the
specified process ID. This is also allowed if the calling role
- is a member of the role whose backend is being terminated or the
+ has privileges of the role whose backend is being terminated,
+ the backend process is an autovacuum worker and the calling role has
+ privileges of <literal>pg_signal_autovacuum_worker</literal>, or the
calling role has privileges of <literal>pg_signal_backend</literal>,
however only superusers can terminate superuser backends.
</para>
diff --git a/doc/src/sgml/user-manag.sgml b/doc/src/sgml/user-manag.sgml
index 07a16247d7..340cefff70 100644
--- a/doc/src/sgml/user-manag.sgml
+++ b/doc/src/sgml/user-manag.sgml
@@ -661,6 +661,11 @@ DROP ROLE doomed_role;
<entry>pg_signal_backend</entry>
<entry>Signal another backend to cancel a query or terminate its session.</entry>
</row>
+ <row>
+ <entry>pg_signal_autovacuum_worker</entry>
+ <entry>Signal an autovacuum worker to cancel the current table's vacuum
+ or terminate its session.</entry>
+ </row>
<row>
<entry>pg_read_server_files</entry>
<entry>Allow reading files from any location the database can access on the server with COPY and
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 88e9bf8125..780a1c682a 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -34,8 +34,9 @@
* role as the backend being signaled. For "dangerous" signals, an explicit
* check for superuser needs to be done prior to calling this function.
*
- * Returns 0 on success, 1 on general failure, 2 on normal permission error
- * and 3 if the caller needs to be a superuser.
+ * Returns 0 on success, 1 on general failure, 2 on normal permission error,
+ * 3 if the caller needs to be a superuser, and 4 if the caller needs to have
+ * privileges of pg_signal_autovacuum_worker.
*
* In the event of a general failure (return code 1), a warning message will
* be emitted. For permission errors, doing that is the responsibility of
@@ -45,6 +46,7 @@
#define SIGNAL_BACKEND_ERROR 1
#define SIGNAL_BACKEND_NOPERMISSION 2
#define SIGNAL_BACKEND_NOSUPERUSER 3
+#define SIGNAL_BACKEND_NOAUTOVAC 4
static int
pg_signal_backend(int pid, int sig)
{
@@ -77,15 +79,27 @@ pg_signal_backend(int pid, int sig)
/*
* Only allow superusers to signal superuser-owned backends. Any process
* not advertising a role might have the importance of a superuser-owned
- * backend, so treat it that way.
+ * backend, so treat it that way. As an exception, we allow roles with
+ * privileges of pg_signal_autovacuum_worker to signal autovacuum workers
+ * (which do not advertise a role).
+ *
+ * Otherwise, users can signal backends for roles they have privileges of.
*/
- if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
- !superuser())
- return SIGNAL_BACKEND_NOSUPERUSER;
+ if (!OidIsValid(proc->roleId) || superuser_arg(proc->roleId))
+ {
+ ProcNumber procNumber = GetNumberFromPGProc(proc);
+ PgBackendStatus *procStatus = pgstat_get_beentry_by_proc_number(procNumber);
- /* Users can signal backends they have role membership in. */
- if (!has_privs_of_role(GetUserId(), proc->roleId) &&
- !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
+ if (procStatus && procStatus->st_backendType == B_AUTOVAC_WORKER)
+ {
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM_WORKER))
+ return SIGNAL_BACKEND_NOAUTOVAC;
+ }
+ else if (!superuser())
+ return SIGNAL_BACKEND_NOSUPERUSER;
+ }
+ else if (!has_privs_of_role(GetUserId(), proc->roleId) &&
+ !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
return SIGNAL_BACKEND_NOPERMISSION;
/*
@@ -130,6 +144,13 @@ pg_cancel_backend(PG_FUNCTION_ARGS)
errdetail("Only roles with the %s attribute may cancel queries of roles with the %s attribute.",
"SUPERUSER", "SUPERUSER")));
+ if (r == SIGNAL_BACKEND_NOAUTOVAC)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to cancel query"),
+ errdetail("Only roles with privileges of the \"%s\" role may cancel queries of autovacuum workers.",
+ "pg_signal_autovacuum_worker")));
+
if (r == SIGNAL_BACKEND_NOPERMISSION)
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
@@ -236,6 +257,13 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
errdetail("Only roles with the %s attribute may terminate processes of roles with the %s attribute.",
"SUPERUSER", "SUPERUSER")));
+ if (r == SIGNAL_BACKEND_NOAUTOVAC)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to terminate process"),
+ errdetail("Only roles with privileges of the \"%s\" role may terminate autovacuum worker processes.",
+ "pg_signal_autovacuum_worker")));
+
if (r == SIGNAL_BACKEND_NOPERMISSION)
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
diff --git a/src/include/catalog/pg_authid.dat b/src/include/catalog/pg_authid.dat
index bf00815c14..0129f67eaa 100644
--- a/src/include/catalog/pg_authid.dat
+++ b/src/include/catalog/pg_authid.dat
@@ -99,5 +99,10 @@
rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
rolpassword => '_null_', rolvaliduntil => '_null_' },
+{ oid => '8916', oid_symbol => 'ROLE_PG_SIGNAL_AUTOVACUUM_WORKER',
+ rolname => 'pg_signal_autovacuum_worker', rolsuper => 'f', rolinherit => 't',
+ rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
+ rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
+ rolpassword => '_null_', rolvaliduntil => '_null_' },
]
--
2.39.3 (Apple Git-146)
On 13 Jun 2024, at 02:04, Nathan Bossart <nathandbossart@gmail.com> wrote:
I adjusted 0001 based on my upthread feedback.
This patch looks good to me. Spellchecker is complaining about “signaling” instead of “signalling”, but ISTM it’s OK.
I’ve tried to dig into the test.
The problem is CV is allocated in
inj_state = GetNamedDSMSegment("injection_points”,
which seems to be destroyed in
shmem_exit() calling dsm_backend_shutdown()
This happens before we broadcast that sleep is over.
I think this might happen with any wait on injection point if it is pg_terminate_backend()ed.
Is there way to wake up from CV sleep before processing actual termination?
Thanks!
Best regards, Andrey Borodin.
On Fri, Jun 14, 2024 at 12:06:36PM +0500, Andrey M. Borodin wrote:
This patch looks good to me.
Thanks for looking.
Spellchecker is complaining about "signaling" instead of "signalling",
but ISTM it�s OK.
I think this is an en-US versus en-GB thing. We've standardized on en-US
for "cancel" (see commits 8c9da14, 21f1e15, and af26857), so IMO we might
as well do so for "signal," too.
--
nathan
On Fri, Jun 14, 2024 at 12:06:36PM +0500, Andrey M. Borodin wrote:
I’ve tried to dig into the test.
The problem is CV is allocated ininj_state = GetNamedDSMSegment("injection_points”,
which seems to be destroyed in
shmem_exit() calling dsm_backend_shutdown()
This happens before we broadcast that sleep is over.
I think this might happen with any wait on injection point if it is
pg_terminate_backend()ed.
Except if I am missing something, this is not a problem for a normal
backend, for example with one using a `SELECT injection_points_run()`.
Is there way to wake up from CV sleep before processing actual termination?
I am honestly not sure if this is worth complicating the sigjmp path
of the autovacuum worker just for the sake of this test. It seems to
me that it would be simple enough to move the injection point
autovacuum-worker-start within the transaction block a few lines down
in do_autovacuum(), no?
--
Michael
On 21 Jun 2024, at 09:01, Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Jun 14, 2024 at 12:06:36PM +0500, Andrey M. Borodin wrote:
I’ve tried to dig into the test.
The problem is CV is allocated ininj_state = GetNamedDSMSegment("injection_points”,
which seems to be destroyed in
shmem_exit() calling dsm_backend_shutdown()
This happens before we broadcast that sleep is over.
I think this might happen with any wait on injection point if it is
pg_terminate_backend()ed.Except if I am missing something, this is not a problem for a normal
backend, for example with one using a `SELECT injection_points_run()`.
Yes, i’ve tried to get similar error in other CV-sleeps and in injection points of normal backend - everything works just fine. The error is specific to just this test.
Is there way to wake up from CV sleep before processing actual termination?
I am honestly not sure if this is worth complicating the sigjmp path
of the autovacuum worker just for the sake of this test. It seems to
me that it would be simple enough to move the injection point
autovacuum-worker-start within the transaction block a few lines down
in do_autovacuum(), no?
Thanks for the pointer, I’ll try this approach!
Best regards, Andrey Borodin,
On Fri, Jun 14, 2024 at 03:12:50PM -0500, Nathan Bossart wrote:
On Fri, Jun 14, 2024 at 12:06:36PM +0500, Andrey M. Borodin wrote:
This patch looks good to me.
Thanks for looking.
While double-checking the whole, where I don't have much to say about
0001, I have fixed a few issues with the test presented upthread and
stabilized it (CI and my stuff are both OK). I'd suggest to move it
to test_misc/, because there is no clear category where to put it, and
we have another test with injection points there for timeouts so the
module dependency with EXTRA_INSTALL is already cleared.
What do you think?
--
Michael
Attachments:
v6-0001-Introduce-pg_signal_autovacuum_worker.patchtext/x-diff; charset=us-asciiDownload
From 2d3e219149b8dbdbaf2b465f619f898c61969b17 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Wed, 12 Jun 2024 15:38:14 -0500
Subject: [PATCH v6 1/2] Introduce pg_signal_autovacuum_worker.
Since commit 3a9b18b309, roles with privileges of pg_signal_backend
cannot signal autovacuum workers. Many users treated the ability
to signal autovacuum workers as a feature instead of a bug, so we
are reintroducing it via a new predefined role. Having privileges
of this new role, named pg_signal_autovacuum_worker, only permits
signaling autovacuum workers. It does not permit signaling other
types of superuser backends.
Author: Kirill Reshke
Reviewed-by: Anthony Leung, Michael Paquier
Discussion: https://postgr.es/m/CALdSSPhC4GGmbnugHfB9G0%3DfAxjCSug_-rmL9oUh0LTxsyBfsg%40mail.gmail.com
---
src/include/catalog/pg_authid.dat | 5 +++
src/backend/storage/ipc/signalfuncs.c | 46 +++++++++++++++++++++------
doc/src/sgml/func.sgml | 8 +++--
doc/src/sgml/user-manag.sgml | 5 +++
4 files changed, 53 insertions(+), 11 deletions(-)
diff --git a/src/include/catalog/pg_authid.dat b/src/include/catalog/pg_authid.dat
index bf00815c14..0129f67eaa 100644
--- a/src/include/catalog/pg_authid.dat
+++ b/src/include/catalog/pg_authid.dat
@@ -99,5 +99,10 @@
rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
rolpassword => '_null_', rolvaliduntil => '_null_' },
+{ oid => '8916', oid_symbol => 'ROLE_PG_SIGNAL_AUTOVACUUM_WORKER',
+ rolname => 'pg_signal_autovacuum_worker', rolsuper => 'f', rolinherit => 't',
+ rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
+ rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
+ rolpassword => '_null_', rolvaliduntil => '_null_' },
]
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 88e9bf8125..780a1c682a 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -34,8 +34,9 @@
* role as the backend being signaled. For "dangerous" signals, an explicit
* check for superuser needs to be done prior to calling this function.
*
- * Returns 0 on success, 1 on general failure, 2 on normal permission error
- * and 3 if the caller needs to be a superuser.
+ * Returns 0 on success, 1 on general failure, 2 on normal permission error,
+ * 3 if the caller needs to be a superuser, and 4 if the caller needs to have
+ * privileges of pg_signal_autovacuum_worker.
*
* In the event of a general failure (return code 1), a warning message will
* be emitted. For permission errors, doing that is the responsibility of
@@ -45,6 +46,7 @@
#define SIGNAL_BACKEND_ERROR 1
#define SIGNAL_BACKEND_NOPERMISSION 2
#define SIGNAL_BACKEND_NOSUPERUSER 3
+#define SIGNAL_BACKEND_NOAUTOVAC 4
static int
pg_signal_backend(int pid, int sig)
{
@@ -77,15 +79,27 @@ pg_signal_backend(int pid, int sig)
/*
* Only allow superusers to signal superuser-owned backends. Any process
* not advertising a role might have the importance of a superuser-owned
- * backend, so treat it that way.
+ * backend, so treat it that way. As an exception, we allow roles with
+ * privileges of pg_signal_autovacuum_worker to signal autovacuum workers
+ * (which do not advertise a role).
+ *
+ * Otherwise, users can signal backends for roles they have privileges of.
*/
- if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
- !superuser())
- return SIGNAL_BACKEND_NOSUPERUSER;
+ if (!OidIsValid(proc->roleId) || superuser_arg(proc->roleId))
+ {
+ ProcNumber procNumber = GetNumberFromPGProc(proc);
+ PgBackendStatus *procStatus = pgstat_get_beentry_by_proc_number(procNumber);
- /* Users can signal backends they have role membership in. */
- if (!has_privs_of_role(GetUserId(), proc->roleId) &&
- !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
+ if (procStatus && procStatus->st_backendType == B_AUTOVAC_WORKER)
+ {
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM_WORKER))
+ return SIGNAL_BACKEND_NOAUTOVAC;
+ }
+ else if (!superuser())
+ return SIGNAL_BACKEND_NOSUPERUSER;
+ }
+ else if (!has_privs_of_role(GetUserId(), proc->roleId) &&
+ !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
return SIGNAL_BACKEND_NOPERMISSION;
/*
@@ -130,6 +144,13 @@ pg_cancel_backend(PG_FUNCTION_ARGS)
errdetail("Only roles with the %s attribute may cancel queries of roles with the %s attribute.",
"SUPERUSER", "SUPERUSER")));
+ if (r == SIGNAL_BACKEND_NOAUTOVAC)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to cancel query"),
+ errdetail("Only roles with privileges of the \"%s\" role may cancel queries of autovacuum workers.",
+ "pg_signal_autovacuum_worker")));
+
if (r == SIGNAL_BACKEND_NOPERMISSION)
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
@@ -236,6 +257,13 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
errdetail("Only roles with the %s attribute may terminate processes of roles with the %s attribute.",
"SUPERUSER", "SUPERUSER")));
+ if (r == SIGNAL_BACKEND_NOAUTOVAC)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to terminate process"),
+ errdetail("Only roles with privileges of the \"%s\" role may terminate autovacuum worker processes.",
+ "pg_signal_autovacuum_worker")));
+
if (r == SIGNAL_BACKEND_NOPERMISSION)
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2609269610..f73ba439c0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28040,7 +28040,9 @@ SELECT currval(pg_get_serial_sequence('sometable', 'id'));
<para>
Cancels the current query of the session whose backend process has the
specified process ID. This is also allowed if the
- calling role is a member of the role whose backend is being canceled or
+ calling role has privileges of the role whose backend is being canceled,
+ the backend process is an autovacuum worker and the calling role has
+ privileges of <literal>pg_signal_autovacuum_worker</literal>, or
the calling role has privileges of <literal>pg_signal_backend</literal>,
however only superusers can cancel superuser backends.
</para></entry>
@@ -28114,7 +28116,9 @@ SELECT currval(pg_get_serial_sequence('sometable', 'id'));
<para>
Terminates the session whose backend process has the
specified process ID. This is also allowed if the calling role
- is a member of the role whose backend is being terminated or the
+ has privileges of the role whose backend is being terminated,
+ the backend process is an autovacuum worker and the calling role has
+ privileges of <literal>pg_signal_autovacuum_worker</literal>, or the
calling role has privileges of <literal>pg_signal_backend</literal>,
however only superusers can terminate superuser backends.
</para>
diff --git a/doc/src/sgml/user-manag.sgml b/doc/src/sgml/user-manag.sgml
index 07a16247d7..340cefff70 100644
--- a/doc/src/sgml/user-manag.sgml
+++ b/doc/src/sgml/user-manag.sgml
@@ -661,6 +661,11 @@ DROP ROLE doomed_role;
<entry>pg_signal_backend</entry>
<entry>Signal another backend to cancel a query or terminate its session.</entry>
</row>
+ <row>
+ <entry>pg_signal_autovacuum_worker</entry>
+ <entry>Signal an autovacuum worker to cancel the current table's vacuum
+ or terminate its session.</entry>
+ </row>
<row>
<entry>pg_read_server_files</entry>
<entry>Allow reading files from any location the database can access on the server with COPY and
--
2.45.2
v6-0002-Add-tap-test-for-pg_signal_autovacuum-role.patchtext/x-diff; charset=us-asciiDownload
From 93b7d69585aa4a408749bee6239006d6494df1bb Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Fri, 21 Jun 2024 14:29:23 +0900
Subject: [PATCH v6 2/2] Add tap test for pg_signal_autovacuum role
---
src/backend/postmaster/autovacuum.c | 7 ++
.../test_misc/t/006_signal_autovacuum.pl | 100 ++++++++++++++++++
2 files changed, 107 insertions(+)
create mode 100644 src/test/modules/test_misc/t/006_signal_autovacuum.pl
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 9a925a10cd..0d4e2c5140 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -100,6 +100,7 @@
#include "utils/fmgroids.h"
#include "utils/fmgrprotos.h"
#include "utils/guc_hooks.h"
+#include "utils/injection_point.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/ps_status.h"
@@ -1902,6 +1903,12 @@ do_autovacuum(void)
/* Start a transaction so our commands have one to play into. */
StartTransactionCommand();
+ /*
+ * This injection point is put in a transaction block to work with a wait
+ * that uses a condition variable.
+ */
+ INJECTION_POINT("autovacuum-worker-start");
+
/*
* Compute the multixact age for which freezing is urgent. This is
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
diff --git a/src/test/modules/test_misc/t/006_signal_autovacuum.pl b/src/test/modules/test_misc/t/006_signal_autovacuum.pl
new file mode 100644
index 0000000000..98cbea3744
--- /dev/null
+++ b/src/test/modules/test_misc/t/006_signal_autovacuum.pl
@@ -0,0 +1,100 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Test signaling autovacuum worker backend by non-superuser role.
+#
+# Only non-superuser roles granted pg_signal_autovacuum_worker are allowed
+# to signal autovacuum workers. This test uses an injection point located
+# at the beginning of the autovacuum worker startup.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+# Initialize postgres
+my $psql_err = '';
+my $psql_out = '';
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init;
+
+# This ensures a quick worker spawn.
+$node->append_conf(
+ 'postgresql.conf', 'autovacuum_naptime = 1');
+$node->start;
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE ROLE regular_role;
+ CREATE ROLE signal_autovacuum_worker_role;
+ GRANT pg_signal_autovacuum_worker TO signal_autovacuum_worker_role;
+));
+
+# From this point, autovacuum worker will wait at startup.
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('autovacuum-worker-start', 'wait');");
+
+# Create some content and set an aggressive autovacuum.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE tab_int(i int);
+ ALTER TABLE tab_int SET (autovacuum_vacuum_cost_limit = 1);
+ ALTER TABLE tab_int SET (autovacuum_vacuum_cost_delay = 100);
+));
+
+$node->safe_psql(
+ 'postgres', qq(
+ INSERT INTO tab_int VALUES(1);
+));
+
+# Wait until an autovacuum worker starts.
+$node->wait_for_event('autovacuum worker', 'autovacuum-worker-start');
+
+my $av_pid = $node->safe_psql(
+ 'postgres', qq(
+ SELECT pid FROM pg_stat_activity WHERE backend_type = 'autovacuum worker';
+));
+
+# Regular role cannot terminate autovacuum worker.
+my $terminate_with_no_pg_signal_av = $node->psql(
+ 'postgres', qq(
+ SET ROLE regular_role;
+ SELECT pg_terminate_backend($av_pid);
+),
+ stdout => \$psql_out,
+ stderr => \$psql_err);
+
+like(
+ $psql_err,
+ qr/ERROR: permission denied to terminate process\nDETAIL: Only roles with privileges of the "pg_signal_autovacuum_worker" role may terminate autovacuum worker processes./,
+ "autovacuum worker not signaled with regular role");
+
+my $offset = -s $node->logfile;
+
+# Role with pg_signal_autovacuum can terminate autovacuum worker.
+my $terminate_with_pg_signal_av = $node->psql(
+ 'postgres', qq(
+ SET ROLE signal_autovacuum_worker_role;
+ SELECT pg_terminate_backend($av_pid);
+),
+ stdout => \$psql_out,
+ stderr => \$psql_err);
+
+# Check that the primary server logs a FATAL indicating that autovacuum
+# is terminated.
+ok( $node->log_contains(
+ qr/FATAL: terminating autovacuum process due to administrator command/,
+ $offset),
+ "autovacuum worker signaled with pg_signal_autovacuum_worker granted"
+);
+
+# Release injection point.
+$node->safe_psql('postgres',
+ "SELECT injection_points_detach('autovacuum-worker-start');");
+
+done_testing();
--
2.45.2
On Fri, Jun 21, 2024 at 10:31:30AM +0500, Andrey M. Borodin wrote:
Thanks for the pointer, I’ll try this approach!
Thanks. FWIW, I've put my mind into it, and fixed the thing a few
minutes ago:
/messages/by-id/ZnURUaujl39wSoEW@paquier.xyz
--
Michael
On 21 Jun 2024, at 10:36, Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Jun 14, 2024 at 03:12:50PM -0500, Nathan Bossart wrote:
On Fri, Jun 14, 2024 at 12:06:36PM +0500, Andrey M. Borodin wrote:
This patch looks good to me.
Thanks for looking.
While double-checking the whole, where I don't have much to say about
0001, I have fixed a few issues with the test presented upthread and
stabilized it (CI and my stuff are both OK). I'd suggest to move it
to test_misc/, because there is no clear category where to put it, and
we have another test with injection points there for timeouts so the
module dependency with EXTRA_INSTALL is already cleared.What do you think?
Thanks Michael!
All changes look good to me.
I just have one more concern: we do not wakeup() upon test end. I observed that there might happen more autovacuums and start sleeping in injection point. In every case I observed - these autovacuums quit gracefully. But is it guaranteed that test will shut down node even if some of backends are waiting in injection points?
Or, perhaps, should we always wakeup() after detaching? (in case when new point run might happen)
Best regards, Andrey Borodin.
I've committed 0001. It looks like 0002 failed CI testing [0]https://cirrus-ci.com/task/5668467599212544, but I
haven't investigated why.
[0]: https://cirrus-ci.com/task/5668467599212544
--
nathan
Hi
On Tue, 9 Jul 2024 at 23:13, Nathan Bossart <nathandbossart@gmail.com> wrote:
I've committed 0001. It looks like 0002 failed CI testing [0], but I
haven't investigated why.[0] https://cirrus-ci.com/task/5668467599212544
--
nathan
The problem is the error message has been changed.
# DETAIL: Only roles with privileges of the
"pg_signal_autovacuum_worker" role may terminate autovacuum workers.'
# doesn't match '(?^:ERROR: permission denied to terminate
process\nDETAIL: Only roles with privileges of the
"pg_signal_autovacuum_worker" role may terminate autovacuum worker
processes.)'
# Looks like you failed 1 test of 2.
I changed the test to match the error message.
Attachments:
v7-0002-Add-tap-test-for-pg_signal_autovacuum-role.patchapplication/octet-stream; name=v7-0002-Add-tap-test-for-pg_signal_autovacuum-role.patchDownload
From e07f8fbc8a7e5c024deda800b97456d5446e622f Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Fri, 21 Jun 2024 14:29:23 +0900
Subject: [PATCH v7] Add tap test for pg_signal_autovacuum role
---
src/backend/postmaster/autovacuum.c | 7 ++
.../test_misc/t/006_signal_autovacuum.pl | 100 ++++++++++++++++++
2 files changed, 107 insertions(+)
create mode 100644 src/test/modules/test_misc/t/006_signal_autovacuum.pl
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 928754b51c..4e4a0ccbef 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -100,6 +100,7 @@
#include "utils/fmgroids.h"
#include "utils/fmgrprotos.h"
#include "utils/guc_hooks.h"
+#include "utils/injection_point.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/ps_status.h"
@@ -1902,6 +1903,12 @@ do_autovacuum(void)
/* Start a transaction so our commands have one to play into. */
StartTransactionCommand();
+ /*
+ * This injection point is put in a transaction block to work with a wait
+ * that uses a condition variable.
+ */
+ INJECTION_POINT("autovacuum-worker-start");
+
/*
* Compute the multixact age for which freezing is urgent. This is
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
diff --git a/src/test/modules/test_misc/t/006_signal_autovacuum.pl b/src/test/modules/test_misc/t/006_signal_autovacuum.pl
new file mode 100644
index 0000000000..250f598401
--- /dev/null
+++ b/src/test/modules/test_misc/t/006_signal_autovacuum.pl
@@ -0,0 +1,100 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Test signaling autovacuum worker backend by non-superuser role.
+#
+# Only non-superuser roles granted pg_signal_autovacuum_worker are allowed
+# to signal autovacuum workers. This test uses an injection point located
+# at the beginning of the autovacuum worker startup.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+# Initialize postgres
+my $psql_err = '';
+my $psql_out = '';
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init;
+
+# This ensures a quick worker spawn.
+$node->append_conf(
+ 'postgresql.conf', 'autovacuum_naptime = 1');
+$node->start;
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE ROLE regular_role;
+ CREATE ROLE signal_autovacuum_worker_role;
+ GRANT pg_signal_autovacuum_worker TO signal_autovacuum_worker_role;
+));
+
+# From this point, autovacuum worker will wait at startup.
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('autovacuum-worker-start', 'wait');");
+
+# Create some content and set an aggressive autovacuum.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE tab_int(i int);
+ ALTER TABLE tab_int SET (autovacuum_vacuum_cost_limit = 1);
+ ALTER TABLE tab_int SET (autovacuum_vacuum_cost_delay = 100);
+));
+
+$node->safe_psql(
+ 'postgres', qq(
+ INSERT INTO tab_int VALUES(1);
+));
+
+# Wait until an autovacuum worker starts.
+$node->wait_for_event('autovacuum worker', 'autovacuum-worker-start');
+
+my $av_pid = $node->safe_psql(
+ 'postgres', qq(
+ SELECT pid FROM pg_stat_activity WHERE backend_type = 'autovacuum worker';
+));
+
+# Regular role cannot terminate autovacuum worker.
+my $terminate_with_no_pg_signal_av = $node->psql(
+ 'postgres', qq(
+ SET ROLE regular_role;
+ SELECT pg_terminate_backend($av_pid);
+),
+ stdout => \$psql_out,
+ stderr => \$psql_err);
+
+like(
+ $psql_err,
+ qr/ERROR: permission denied to terminate process\nDETAIL: Only roles with privileges of the "pg_signal_autovacuum_worker" role may terminate autovacuum worker./,
+ "autovacuum worker not signaled with regular role");
+
+my $offset = -s $node->logfile;
+
+# Role with pg_signal_autovacuum can terminate autovacuum worker.
+my $terminate_with_pg_signal_av = $node->psql(
+ 'postgres', qq(
+ SET ROLE signal_autovacuum_worker_role;
+ SELECT pg_terminate_backend($av_pid);
+),
+ stdout => \$psql_out,
+ stderr => \$psql_err);
+
+# Check that the primary server logs a FATAL indicating that autovacuum
+# is terminated.
+ok( $node->log_contains(
+ qr/FATAL: terminating autovacuum process due to administrator command/,
+ $offset),
+ "autovacuum worker signaled with pg_signal_autovacuum_worker granted"
+);
+
+# Release injection point.
+$node->safe_psql('postgres',
+ "SELECT injection_points_detach('autovacuum-worker-start');");
+
+done_testing();
--
2.25.1
On Tue, Jul 09, 2024 at 01:12:59PM -0500, Nathan Bossart wrote:
I've committed 0001. It looks like 0002 failed CI testing [0], but I
haven't investigated why.
Nice catch by the CI. This looks like a race condition to me. I
think that we should wait for the autovacuum worker to exit, and then
scan the server logs we expect.
For this failure, look at the timestamps of the server logs:
2024-07-08 12:48:23.271 UTC [32697][client backend]
[006_signal_autovacuum.pl][11/3:0] LOG: statement: SELECT
pg_terminate_backend(32672);
2024-07-08 12:48:23.275 UTC [32697][client backend]
[006_signal_autovacuum.pl][:0] LOG: disconnection: session time:
0:00:00.018 user=postgres database=postgres host=[local]
2024-07-08 12:48:23.278 UTC [32672][autovacuum worker] FATAL:
terminating autovacuum process due to administrator command
And then the timestamp of the tests:
[12:48:23.277](0.058s) not ok 2 - autovacuum worker signaled with
pg_signal_autovacuum_worker granted
We check for the contents of the logs 1ms before they are generated,
hence failing the lookup check because the test is faster than the
backend.
Like what we are doing in 010_pg_basebackup.pl, we could do a
poll_query_until() until the PID of the autovacuum worker is gone from
pg_stat_activity before fetching the logs as ProcessInterrupts() stuff
would be logged before the process exits, say:
+# Wait for the autovacuum worker to exit before scanning the logs.
+$node->poll_query_until('postgres',
+ "SELECT count(*) = 0 FROM pg_stat_activity "
+ . "WHERE pid = $av_pid AND backend_type = 'autovacuum worker';");
That gives something like the attached. Does that look correct to
you?
--
Michael
Attachments:
v7-0002-Add-tap-test-for-pg_signal_autovacuum-role.patchtext/x-diff; charset=us-asciiDownload
From 4f79a7741a0ff38ab063c02bc81a2ace60848d92 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 10 Jul 2024 14:13:00 +0900
Subject: [PATCH v7] Add tap test for pg_signal_autovacuum role
---
src/backend/postmaster/autovacuum.c | 7 ++
.../test_misc/t/006_signal_autovacuum.pl | 105 ++++++++++++++++++
2 files changed, 112 insertions(+)
create mode 100644 src/test/modules/test_misc/t/006_signal_autovacuum.pl
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 928754b51c..4e4a0ccbef 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -100,6 +100,7 @@
#include "utils/fmgroids.h"
#include "utils/fmgrprotos.h"
#include "utils/guc_hooks.h"
+#include "utils/injection_point.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/ps_status.h"
@@ -1902,6 +1903,12 @@ do_autovacuum(void)
/* Start a transaction so our commands have one to play into. */
StartTransactionCommand();
+ /*
+ * This injection point is put in a transaction block to work with a wait
+ * that uses a condition variable.
+ */
+ INJECTION_POINT("autovacuum-worker-start");
+
/*
* Compute the multixact age for which freezing is urgent. This is
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
diff --git a/src/test/modules/test_misc/t/006_signal_autovacuum.pl b/src/test/modules/test_misc/t/006_signal_autovacuum.pl
new file mode 100644
index 0000000000..136dc8c52e
--- /dev/null
+++ b/src/test/modules/test_misc/t/006_signal_autovacuum.pl
@@ -0,0 +1,105 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Test signaling autovacuum worker backend by non-superuser role.
+#
+# Only non-superuser roles granted pg_signal_autovacuum_worker are allowed
+# to signal autovacuum workers. This test uses an injection point located
+# at the beginning of the autovacuum worker startup.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+# Initialize postgres
+my $psql_err = '';
+my $psql_out = '';
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init;
+
+# This ensures a quick worker spawn.
+$node->append_conf(
+ 'postgresql.conf', 'autovacuum_naptime = 1');
+$node->start;
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE ROLE regular_role;
+ CREATE ROLE signal_autovacuum_worker_role;
+ GRANT pg_signal_autovacuum_worker TO signal_autovacuum_worker_role;
+));
+
+# From this point, autovacuum worker will wait at startup.
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('autovacuum-worker-start', 'wait');");
+
+# Create some content and set an aggressive autovacuum.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE tab_int(i int);
+ ALTER TABLE tab_int SET (autovacuum_vacuum_cost_limit = 1);
+ ALTER TABLE tab_int SET (autovacuum_vacuum_cost_delay = 100);
+));
+
+$node->safe_psql(
+ 'postgres', qq(
+ INSERT INTO tab_int VALUES(1);
+));
+
+# Wait until an autovacuum worker starts.
+$node->wait_for_event('autovacuum worker', 'autovacuum-worker-start');
+
+my $av_pid = $node->safe_psql(
+ 'postgres', qq(
+ SELECT pid FROM pg_stat_activity WHERE backend_type = 'autovacuum worker';
+));
+
+# Regular role cannot terminate autovacuum worker.
+my $terminate_with_no_pg_signal_av = $node->psql(
+ 'postgres', qq(
+ SET ROLE regular_role;
+ SELECT pg_terminate_backend($av_pid);
+),
+ stdout => \$psql_out,
+ stderr => \$psql_err);
+
+like(
+ $psql_err,
+ qr/ERROR: permission denied to terminate process\nDETAIL: Only roles with privileges of the "pg_signal_autovacuum_worker" role may terminate autovacuum workers./,
+ "autovacuum worker not signaled with regular role");
+
+my $offset = -s $node->logfile;
+
+# Role with pg_signal_autovacuum can terminate autovacuum worker.
+my $terminate_with_pg_signal_av = $node->psql(
+ 'postgres', qq(
+ SET ROLE signal_autovacuum_worker_role;
+ SELECT pg_terminate_backend($av_pid);
+),
+ stdout => \$psql_out,
+ stderr => \$psql_err);
+
+# Wait for the autovacuum worker to exit before scanning the logs.
+$node->poll_query_until('postgres',
+ "SELECT count(*) = 0 FROM pg_stat_activity "
+ . "WHERE pid = $av_pid AND backend_type = 'autovacuum worker';");
+
+# Check that the primary server logs a FATAL indicating that autovacuum
+# is terminated.
+ok( $node->log_contains(
+ qr/FATAL: terminating autovacuum process due to administrator command/,
+ $offset),
+ "autovacuum worker signaled with pg_signal_autovacuum_worker granted"
+);
+
+# Release injection point.
+$node->safe_psql('postgres',
+ "SELECT injection_points_detach('autovacuum-worker-start');");
+
+done_testing();
--
2.45.2
On Wed, Jul 10, 2024 at 10:03:04AM +0500, Kirill Reshke wrote:
The problem is the error message has been changed.
# DETAIL: Only roles with privileges of the
"pg_signal_autovacuum_worker" role may terminate autovacuum workers.'
# doesn't match '(?^:ERROR: permission denied to terminate
process\nDETAIL: Only roles with privileges of the
"pg_signal_autovacuum_worker" role may terminate autovacuum worker
processes.)'
# Looks like you failed 1 test of 2.I changed the test to match the error message.
The script has two tests, and the CI is failing for the second test
where we expect the signal to be processed:
[12:48:23.370] # Failed test 'autovacuum worker signaled with
pg_signal_autovacuum_worker granted'
[12:48:23.370] # at t/006_signal_autovacuum.pl line 90.
It is true that the first test where we expect the signal to not go
through also failed as the DETAIL string has been changed, which is
what you've fixed :)
--
Michael
The script has two tests, and the CI is failing for the second test
where we expect the signal to be processed:
[12:48:23.370] # Failed test 'autovacuum worker signaled with
pg_signal_autovacuum_worker granted'
[12:48:23.370] # at t/006_signal_autovacuum.pl line 90.
--
Michael
That's very strange, because the test works fine on my virtual
machine. Also, it seems that it works in Cirrus [0]https://api.cirrus-ci.com/v1/artifact/task/5668467599212544/log/src/test/modules/test_misc/tmp_check/log/006_signal_autovacuum_node.log, as there is this
line:
[autovacuum worker] FATAL: terminating autovacuum process due to
administrator command
after `SET ROLE signal_autovacuum_worker_role;` and `SELECT
pg_terminate_backend` in the log file.
Somehow the `node->log_contains` check does not catch that. Maybe
there is some issue with `$offset`? Will try to investigate
On 10 Jul 2024, at 11:27, Kirill Reshke <reshkekirill@gmail.com> wrote:
That's very strange, because the test works fine on my virtual
machine. Also, it seems that it works in Cirrus [0], as there is this
line:
So far I could not reproduce that failure.
I’ve checkouted 6edec53 from CFbot repository, but it works fine in both Cirrus[0,1,2] and my machines…
It seems like we have to rely on intuition to know what happened.
Best regards, Andrey Borodin.
[0]: https://github.com/x4m/postgres_g/runs/27266322657
[1]: https://github.com/x4m/postgres_g/runs/27266278325
[2]: https://github.com/x4m/postgres_g/runs/27266052318
Hi, that's for digging into this. Turns out I completely missed one of
your emails today morning.
On Wed, 10 Jul 2024 at 10:15, Michael Paquier <michael@paquier.xyz> wrote:
And then the timestamp of the tests:
[12:48:23.277](0.058s) not ok 2 - autovacuum worker signaled with
pg_signal_autovacuum_worker grantedWe check for the contents of the logs 1ms before they are generated,
hence failing the lookup check because the test is faster than the
backend.Like what we are doing in 010_pg_basebackup.pl, we could do a poll_query_until() until the PID of the autovacuum worker is gone from pg_stat_activity before fetching the logs as ProcessInterrupts() stuff would be logged before the process exits, say: +# Wait for the autovacuum worker to exit before scanning the logs. +$node->poll_query_until('postgres', + "SELECT count(*) = 0 FROM pg_stat_activity " + . "WHERE pid = $av_pid AND backend_type = 'autovacuum worker';");That gives something like the attached. Does that look correct to
you?
--
Michael
+1.
On Wed, Jul 10, 2024 at 10:57:45PM +0500, Kirill Reshke wrote:
Hi, that's for digging into this. Turns out I completely missed one of
your emails today morning.
Don't worry. Using this domain tends to put my emails in one's spam
folder.
--
Michael
On Wed, Jul 10, 2024 at 02:14:55PM +0900, Michael Paquier wrote:
+# Only non-superuser roles granted pg_signal_autovacuum_worker are allowed +# to signal autovacuum workers. This test uses an injection point located +# at the beginning of the autovacuum worker startup.
nitpick: Superuser roles are also allowed to signal autovacuum workers.
Maybe this should read "Only roles with privileges of..."
+# Create some content and set an aggressive autovacuum. +$node->safe_psql( + 'postgres', qq( + CREATE TABLE tab_int(i int); + ALTER TABLE tab_int SET (autovacuum_vacuum_cost_limit = 1); + ALTER TABLE tab_int SET (autovacuum_vacuum_cost_delay = 100); +)); + +$node->safe_psql( + 'postgres', qq( + INSERT INTO tab_int VALUES(1); +)); + +# Wait until an autovacuum worker starts. +$node->wait_for_event('autovacuum worker', 'autovacuum-worker-start');
I'm not following how this is guaranteed to trigger an autovacuum quickly.
Shouldn't we set autovacuum_vacuum_insert_threshold to 1 so that it is
eligible for autovacuum?
+# Wait for the autovacuum worker to exit before scanning the logs. +$node->poll_query_until('postgres', + "SELECT count(*) = 0 FROM pg_stat_activity " + . "WHERE pid = $av_pid AND backend_type = 'autovacuum worker';");
WFM. Even if the PID is quickly reused, this should work. We just might
end up waiting a little longer.
Is it worth testing cancellation, too?
--
nathan
On Thu, Jul 11, 2024 at 08:50:57PM -0500, Nathan Bossart wrote:
I'm not following how this is guaranteed to trigger an autovacuum quickly.
Shouldn't we set autovacuum_vacuum_insert_threshold to 1 so that it is
eligible for autovacuum?
You are right, this is not going to influence a faster startup of a
worker, so we could remove that entirely. On closer look, the main
bottlebeck is that the test is spending a lot of time waiting on the
naptime even if we are using the minimum value of 1s, as the scan of
pg_stat_activity checking for workers keeps looping.
[ ..thinks.. ]
I have one trick in my sleeve for this one to make the launcher more
responsive in checking the timestamps of the database list. That's
not perfect, but it reduces the wait behind the worker startups by
400ms (1s previously as of the naptime, 600ms now) with a reload to
set the launcher's latch after the injection point has been attached.
The difference in runtime is noticeable.
+# Wait for the autovacuum worker to exit before scanning the logs. +$node->poll_query_until('postgres', + "SELECT count(*) = 0 FROM pg_stat_activity " + . "WHERE pid = $av_pid AND backend_type = 'autovacuum worker';");WFM. Even if the PID is quickly reused, this should work. We just might
end up waiting a little longer.Is it worth testing cancellation, too?
The point is to check after pg_signal_backend, so I am not sure it is
worth the extra cycles for the cancellation.
Attaching the idea, with a fix for the comment you have mentioned and
appending "regress_" the role names for the warnings generated by
-DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS, while on it.
What do you think?
--
Michael
Attachments:
v3-0001-Add-tap-test-for-pg_signal_autovacuum-role.patchtext/x-diff; charset=us-asciiDownload
From 8a827a8a59da4d3c2f2115151e1241cf663c11a1 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Fri, 12 Jul 2024 14:19:51 +0900
Subject: [PATCH v3] Add tap test for pg_signal_autovacuum role
---
src/backend/postmaster/autovacuum.c | 7 ++
.../test_misc/t/006_signal_autovacuum.pl | 94 +++++++++++++++++++
2 files changed, 101 insertions(+)
create mode 100644 src/test/modules/test_misc/t/006_signal_autovacuum.pl
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 928754b51c..4e4a0ccbef 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -100,6 +100,7 @@
#include "utils/fmgroids.h"
#include "utils/fmgrprotos.h"
#include "utils/guc_hooks.h"
+#include "utils/injection_point.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/ps_status.h"
@@ -1902,6 +1903,12 @@ do_autovacuum(void)
/* Start a transaction so our commands have one to play into. */
StartTransactionCommand();
+ /*
+ * This injection point is put in a transaction block to work with a wait
+ * that uses a condition variable.
+ */
+ INJECTION_POINT("autovacuum-worker-start");
+
/*
* Compute the multixact age for which freezing is urgent. This is
* normally autovacuum_multixact_freeze_max_age, but may be less if we are
diff --git a/src/test/modules/test_misc/t/006_signal_autovacuum.pl b/src/test/modules/test_misc/t/006_signal_autovacuum.pl
new file mode 100644
index 0000000000..ded42b1be9
--- /dev/null
+++ b/src/test/modules/test_misc/t/006_signal_autovacuum.pl
@@ -0,0 +1,94 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Test signaling autovacuum worker with pg_signal_autovacuum_worker.
+#
+# Only roles with privileges of pg_signal_autovacuum_worker are allowed to
+# signal autovacuum workers. This test uses an injection point located
+# at the beginning of the autovacuum worker startup.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+# Initialize postgres
+my $psql_err = '';
+my $psql_out = '';
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init;
+
+# This ensures a quick worker spawn.
+$node->append_conf(
+ 'postgresql.conf', 'autovacuum_naptime = 1');
+$node->start;
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE ROLE regress_regular_role;
+ CREATE ROLE regress_worker_role;
+ GRANT pg_signal_autovacuum_worker TO regress_worker_role;
+));
+
+# From this point, autovacuum worker will wait at startup.
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('autovacuum-worker-start', 'wait');");
+
+$node->reload();
+
+# Wait until an autovacuum worker starts.
+$node->wait_for_event('autovacuum worker', 'autovacuum-worker-start');
+
+my $av_pid = $node->safe_psql(
+ 'postgres', qq(
+ SELECT pid FROM pg_stat_activity WHERE backend_type = 'autovacuum worker';
+));
+
+# Regular role cannot terminate autovacuum worker.
+my $terminate_with_no_pg_signal_av = $node->psql(
+ 'postgres', qq(
+ SET ROLE regress_regular_role;
+ SELECT pg_terminate_backend($av_pid);
+),
+ stdout => \$psql_out,
+ stderr => \$psql_err);
+
+like(
+ $psql_err,
+ qr/ERROR: permission denied to terminate process\nDETAIL: Only roles with privileges of the "pg_signal_autovacuum_worker" role may terminate autovacuum workers./,
+ "autovacuum worker not signaled with regular role");
+
+my $offset = -s $node->logfile;
+
+# Role with pg_signal_autovacuum can terminate autovacuum worker.
+my $terminate_with_pg_signal_av = $node->psql(
+ 'postgres', qq(
+ SET ROLE regress_worker_role;
+ SELECT pg_terminate_backend($av_pid);
+),
+ stdout => \$psql_out,
+ stderr => \$psql_err);
+
+# Wait for the autovacuum worker to exit before scanning the logs.
+$node->poll_query_until('postgres',
+ "SELECT count(*) = 0 FROM pg_stat_activity "
+ . "WHERE pid = $av_pid AND backend_type = 'autovacuum worker';");
+
+# Check that the primary server logs a FATAL indicating that autovacuum
+# is terminated.
+ok( $node->log_contains(
+ qr/FATAL: terminating autovacuum process due to administrator command/,
+ $offset),
+ "autovacuum worker signaled with pg_signal_autovacuum_worker granted"
+);
+
+# Release injection point.
+$node->safe_psql('postgres',
+ "SELECT injection_points_detach('autovacuum-worker-start');");
+
+done_testing();
--
2.45.2
On Fri, Jul 12, 2024 at 02:21:09PM +0900, Michael Paquier wrote:
On Thu, Jul 11, 2024 at 08:50:57PM -0500, Nathan Bossart wrote:
I'm not following how this is guaranteed to trigger an autovacuum quickly.
Shouldn't we set autovacuum_vacuum_insert_threshold to 1 so that it is
eligible for autovacuum?You are right, this is not going to influence a faster startup of a
worker, so we could remove that entirely. On closer look, the main
bottlebeck is that the test is spending a lot of time waiting on the
naptime even if we are using the minimum value of 1s, as the scan of
pg_stat_activity checking for workers keeps looping.
I suppose it would be silly to allow even lower values for
autovacuum_naptime (e.g., by moving it to ConfigureNamesReal and setting
the minimum to 0.1).
I have one trick in my sleeve for this one to make the launcher more
responsive in checking the timestamps of the database list. That's
not perfect, but it reduces the wait behind the worker startups by
400ms (1s previously as of the naptime, 600ms now) with a reload to
set the launcher's latch after the injection point has been attached.
The difference in runtime is noticeable.
That's a neat trick. I was confused why this test generates an autovacuum
worker at all, but I now see that you are pausing it before we even gather
the list of tables that need to be vacuumed.
Is it worth testing cancellation, too?
The point is to check after pg_signal_backend, so I am not sure it is
worth the extra cycles for the cancellation.
Agreed.
What do you think?
Looks reasonable to me.
--
nathan
On Fri, Jul 12, 2024 at 11:19:05AM -0500, Nathan Bossart wrote:
I suppose it would be silly to allow even lower values for
autovacuum_naptime (e.g., by moving it to ConfigureNamesReal and setting
the minimum to 0.1).
I've thought about that as well, and did not mention it as this would
encourage insanely low naptime values resulting in fork() bursts.
That's a neat trick. I was confused why this test generates an autovacuum
worker at all, but I now see that you are pausing it before we even gather
the list of tables that need to be vacuumed.
Yep. More aggressive signals aren't going to help. One thing I also
considered here is to manipulate the db list timestamps inside a
USE_INJECTION_POINTS block in the launcher to make the spawn more
aggressive. Anyway, with 600ms in detection where I've tested it, I
can live with the responsiveness of the patch as proposed.
Looks reasonable to me.
Thanks. I'll see about stressing the buildfarm tomorrow or so, after
looking at how the CI reacts.
--
Michael
On Mon, Jul 15, 2024 at 09:54:43AM +0900, Michael Paquier wrote:
Thanks. I'll see about stressing the buildfarm tomorrow or so, after
looking at how the CI reacts.
There were a few more things here:
1) The new test was missing from test_misc/meson.build.
2) With 1) fixed, the CI has been complaining about the test
stability, when retrieving the PID of a worker with this query:
SELECT pid FROM pg_stat_activity WHERE backend_type = 'autovacuum worker'
And it's annoying to have missed what's wrong here:
- We don't check that the PID comes from a worker waiting on an
injection point, so it could be a PID of something running, still gone
once the signals are processed.
- No limit check, so we could finish with a list of PIDs while only
one is necessary. Windows was slow enough to spot that, spawning
multiple autovacuum workers waiting on the injection point.
After improving all that, I have checked again the CI and it was
happy, so applied on HEAD. Let's see now how the buildfarm reacts.
--
Michael
Hi,
On 2024-07-09 13:12:59 -0500, Nathan Bossart wrote:
I've committed 0001.
I justed ended up looking at this code for some boring reason. One thing that
has me worried a bit is that pg_signal_backend() now does
pgstat_get_beentry_by_proc_number(), triggering a pgstat_read_current_status()
further down.
pgstat_read_current_status() can be fairly expensive, both in CPU and in
memory. It copies each proc's activity strings, which each can be 1kB by
default!
The "saving grace" is that most of the time the pid will be sourced from
pg_stat_activity, which already will will have done
pgstat_read_current_status(). But I don't think that's always the case.
Another concern is that pgstat_read_current_status() won't actually refresh
the information if already cached, which might cause us to operate on outdated
information. A malicious user could start a transaction, cause
pgstat_read_current_status() to be called, and wait for the PID to be reused
for some interesting process, and then signal, which the outdated information
from the prior pgstat_read_current_status() would allow.
The threat of this isn't huge, there's some fundamental raciness around
pg_signal_backend() but this does open the window a lot wider.
And all of this just because we need the target proc's BackendType, a single 4
byte value.
I think it'd be better to introduce something that fetches a live
BackendType. We have such functionality already, see
pgstat_get_backend_current_activity().
Or we could have a copy of the backend type in PGPROC.
Greetings,
Andres Freund
On Fri, 22 Nov 2024 at 22:13, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2024-07-09 13:12:59 -0500, Nathan Bossart wrote:
I've committed 0001.
I justed ended up looking at this code for some boring reason. One thing that
has me worried a bit is that pg_signal_backend() now does
pgstat_get_beentry_by_proc_number(), triggering a pgstat_read_current_status()
further down.pgstat_read_current_status() can be fairly expensive, both in CPU and in
memory. It copies each proc's activity strings, which each can be 1kB by
default!The "saving grace" is that most of the time the pid will be sourced from
pg_stat_activity, which already will will have done
pgstat_read_current_status(). But I don't think that's always the case.Another concern is that pgstat_read_current_status() won't actually refresh
the information if already cached, which might cause us to operate on outdated
information. A malicious user could start a transaction, cause
pgstat_read_current_status() to be called, and wait for the PID to be reused
for some interesting process, and then signal, which the outdated information
from the prior pgstat_read_current_status() would allow.The threat of this isn't huge, there's some fundamental raciness around
pg_signal_backend() but this does open the window a lot wider.And all of this just because we need the target proc's BackendType, a single 4
byte value.I think it'd be better to introduce something that fetches a live
BackendType. We have such functionality already, see
pgstat_get_backend_current_activity().Or we could have a copy of the backend type in PGPROC.
Greetings,
Andres Freund
Hi, thanks for taking care of this. I agree with this analysis, and it
appears that the worries are legitimate.
For the fix, I believe that copy-pasting
`pgstat_get_backend_current_activity` to get the backend type should
solve the issue. Not sure if this is the correct way of doing this.
Enlarging PGPROC somehow feels worse.
--
Best regards,
Kirill Reshke
On Fri, Nov 22, 2024 at 12:13:49PM -0500, Andres Freund wrote:
I justed ended up looking at this code for some boring reason. One thing that
has me worried a bit is that pg_signal_backend() now does
pgstat_get_beentry_by_proc_number(), triggering a pgstat_read_current_status()
further down.pgstat_read_current_status() can be fairly expensive, both in CPU and in
memory. It copies each proc's activity strings, which each can be 1kB by
default!
Thanks for bringing this to my attention.
I think it'd be better to introduce something that fetches a live
BackendType. We have such functionality already, see
pgstat_get_backend_current_activity().
Here is a draft-grade patch for this one. It seems pretty
straightforward...
--
nathan
Attachments:
v1-0001-avoid-calling-pgstat_read_current_status-in-pg_si.patchtext/plain; charset=us-asciiDownload
From a5a3eb803b255acf4757217b9715e76f0a3c192c Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Fri, 22 Nov 2024 13:19:44 -0600
Subject: [PATCH v1 1/1] avoid calling pgstat_read_current_status() in
pg_signal_backend()
---
src/backend/storage/ipc/signalfuncs.c | 5 +--
src/backend/utils/activity/backend_status.c | 37 +++++++++++++++++++++
src/include/utils/backend_status.h | 1 +
3 files changed, 39 insertions(+), 4 deletions(-)
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index aa729a36e3..8e165e9729 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -87,10 +87,7 @@ pg_signal_backend(int pid, int sig)
*/
if (!OidIsValid(proc->roleId) || superuser_arg(proc->roleId))
{
- ProcNumber procNumber = GetNumberFromPGProc(proc);
- PgBackendStatus *procStatus = pgstat_get_beentry_by_proc_number(procNumber);
-
- if (procStatus && procStatus->st_backendType == B_AUTOVAC_WORKER)
+ if (pgstat_get_backend_type(pid) == B_AUTOVAC_WORKER)
{
if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM_WORKER))
return SIGNAL_BACKEND_NOAUTOVAC;
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index bdb3a296ca..4b39e30ca5 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -941,6 +941,43 @@ pgstat_get_backend_current_activity(int pid, bool checkUser)
return "<backend information not available>";
}
+BackendType
+pgstat_get_backend_type(int pid)
+{
+ PgBackendStatus *beentry = BackendStatusArray;
+
+ for (int i = 1; i <= MaxBackends; i++)
+ {
+ volatile PgBackendStatus *vbeentry = beentry;
+ bool found;
+
+ for (;;)
+ {
+ int before_changecount;
+ int after_changecount;
+
+ pgstat_begin_read_activity(vbeentry, before_changecount);
+
+ found = (vbeentry->st_procpid == pid);
+
+ pgstat_end_read_activity(vbeentry, after_changecount);
+
+ if (pgstat_read_activity_complete(before_changecount,
+ after_changecount))
+ break;
+
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ if (found)
+ return beentry->st_backendType;
+
+ beentry++;
+ }
+
+ return B_INVALID;
+}
+
/* ----------
* pgstat_get_crashed_backend_activity() -
*
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 97874300c3..dfa4aa6127 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -320,6 +320,7 @@ extern void pgstat_report_tempfile(size_t filesize);
extern void pgstat_report_appname(const char *appname);
extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
+extern BackendType pgstat_get_backend_type(int pid);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
--
2.39.5 (Apple Git-154)
Hi,
On 2024-11-22 13:21:53 -0600, Nathan Bossart wrote:
Here is a draft-grade patch for this one. It seems pretty
straightforward...
Thanks for looking into it quickly.
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c index aa729a36e3..8e165e9729 100644 --- a/src/backend/storage/ipc/signalfuncs.c +++ b/src/backend/storage/ipc/signalfuncs.c @@ -87,10 +87,7 @@ pg_signal_backend(int pid, int sig) */ if (!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) { - ProcNumber procNumber = GetNumberFromPGProc(proc); - PgBackendStatus *procStatus = pgstat_get_beentry_by_proc_number(procNumber); - - if (procStatus && procStatus->st_backendType == B_AUTOVAC_WORKER) + if (pgstat_get_backend_type(pid) == B_AUTOVAC_WORKER) { if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM_WORKER)) return SIGNAL_BACKEND_NOAUTOVAC;
Because we already mapped the pid to a ProcNumber, it'd be cheaper to access
the backend status via procnumber.
+BackendType +pgstat_get_backend_type(int pid) +{ + PgBackendStatus *beentry = BackendStatusArray; + + for (int i = 1; i <= MaxBackends; i++) + { + volatile PgBackendStatus *vbeentry = beentry; + bool found; + + for (;;) + { + int before_changecount; + int after_changecount; + + pgstat_begin_read_activity(vbeentry, before_changecount); + + found = (vbeentry->st_procpid == pid); + + pgstat_end_read_activity(vbeentry, after_changecount);
We don't need the pgstat_begin_read_activity() protocol when just accessing a
single 4 byte value - we assume in lots of places that can be read in a
non-tearable way.
+ if (pgstat_read_activity_complete(before_changecount, + after_changecount)) + break; + + CHECK_FOR_INTERRUPTS(); + } + + if (found) + return beentry->st_backendType;
But if we were to follow it, we'd certainly need to use it here too.
Greetings,
Andres Freund
On Fri, Nov 22, 2024 at 06:13:16PM -0500, Andres Freund wrote:
- if (procStatus && procStatus->st_backendType == B_AUTOVAC_WORKER) + if (pgstat_get_backend_type(pid) == B_AUTOVAC_WORKER)Because we already mapped the pid to a ProcNumber, it'd be cheaper to access
the backend status via procnumber.
D'oh, I missed that ProcNumber could be used as an index for the
BackendStatusArray. Is the attached more like what you are imagining?
We don't need the pgstat_begin_read_activity() protocol when just accessing a
single 4 byte value - we assume in lots of places that can be read in a
non-tearable way.+ if (pgstat_read_activity_complete(before_changecount, + after_changecount)) + break; + + CHECK_FOR_INTERRUPTS(); + } + + if (found) + return beentry->st_backendType;But if we were to follow it, we'd certainly need to use it here too.
I see.
--
nathan
Attachments:
v2-0001-avoid-calling-pgstat_read_current_status-in-pg_si.patchtext/plain; charset=us-asciiDownload
From 50220e60b8f1e15a63bdfec22cdaef2ebce549d6 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Fri, 22 Nov 2024 20:39:35 -0600
Subject: [PATCH v2 1/1] avoid calling pgstat_read_current_status() in
pg_signal_backend()
---
src/backend/storage/ipc/signalfuncs.c | 4 ++--
src/backend/utils/activity/backend_status.c | 11 +++++++++++
src/include/utils/backend_status.h | 1 +
3 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index aa729a36e3..d835327600 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -88,9 +88,9 @@ pg_signal_backend(int pid, int sig)
if (!OidIsValid(proc->roleId) || superuser_arg(proc->roleId))
{
ProcNumber procNumber = GetNumberFromPGProc(proc);
- PgBackendStatus *procStatus = pgstat_get_beentry_by_proc_number(procNumber);
+ BackendType backendType = pgstat_get_backend_type_by_proc_number(procNumber);
- if (procStatus && procStatus->st_backendType == B_AUTOVAC_WORKER)
+ if (backendType == B_AUTOVAC_WORKER)
{
if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM_WORKER))
return SIGNAL_BACKEND_NOAUTOVAC;
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index bdb3a296ca..19796909e7 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -1134,6 +1134,17 @@ pgstat_get_local_beentry_by_index(int idx)
}
+BackendType
+pgstat_get_backend_type_by_proc_number(ProcNumber procNumber)
+{
+ volatile PgBackendStatus *backendStatus;
+
+ backendStatus = &BackendStatusArray[procNumber];
+
+ return backendStatus->st_backendType;
+}
+
+
/* ----------
* pgstat_fetch_stat_numbackends() -
*
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 97874300c3..9640820edd 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -334,6 +334,7 @@ extern int pgstat_fetch_stat_numbackends(void);
extern PgBackendStatus *pgstat_get_beentry_by_proc_number(ProcNumber procNumber);
extern LocalPgBackendStatus *pgstat_get_local_beentry_by_proc_number(ProcNumber procNumber);
extern LocalPgBackendStatus *pgstat_get_local_beentry_by_index(int idx);
+extern BackendType pgstat_get_backend_type_by_proc_number(ProcNumber procNumber);
extern char *pgstat_clip_activity(const char *raw_activity);
--
2.39.5 (Apple Git-154)
On Sat, 23 Nov 2024, 07:44 Nathan Bossart, <nathandbossart@gmail.com> wrote:
On Fri, Nov 22, 2024 at 06:13:16PM -0500, Andres Freund wrote:
- if (procStatus && procStatus->st_backendType ==
B_AUTOVAC_WORKER)
+ if (pgstat_get_backend_type(pid) == B_AUTOVAC_WORKER)
Because we already mapped the pid to a ProcNumber, it'd be cheaper to
access
the backend status via procnumber.
D'oh, I missed that ProcNumber could be used as an index for the
BackendStatusArray. Is the attached more like what you are imagining?We don't need the pgstat_begin_read_activity() protocol when just
accessing a
single 4 byte value - we assume in lots of places that can be read in a
non-tearable way.+ if
(pgstat_read_activity_complete(before_changecount,
+
after_changecount))
+ break; + + CHECK_FOR_INTERRUPTS(); + } + + if (found) + return beentry->st_backendType;But if we were to follow it, we'd certainly need to use it here too.
I see.
--
nathan
Hi!
LGTM
Show quoted text
Hi,
On 2024-11-22 20:44:34 -0600, Nathan Bossart wrote:
On Fri, Nov 22, 2024 at 06:13:16PM -0500, Andres Freund wrote:
- if (procStatus && procStatus->st_backendType == B_AUTOVAC_WORKER) + if (pgstat_get_backend_type(pid) == B_AUTOVAC_WORKER)Because we already mapped the pid to a ProcNumber, it'd be cheaper to access
the backend status via procnumber.D'oh, I missed that ProcNumber could be used as an index for the
BackendStatusArray. Is the attached more like what you are imagining?
Yes.
I'd probably add two function header comments:
1) explicit caution that this is fetching information not from the snapshot
but "live" data
2) the return value might be out of date, that the procnumber needs to be
valid and that the caller is responsible for permission checking
I'd also add a comment do the code saying that it's fine to bypass the
changecount mechanism, because we're reading a single 4 byte integer.
Greetings,
Andres Freund
On Tue, Nov 26, 2024 at 12:27:33PM -0500, Andres Freund wrote:
On 2024-11-22 20:44:34 -0600, Nathan Bossart wrote:
D'oh, I missed that ProcNumber could be used as an index for the
BackendStatusArray. Is the attached more like what you are imagining?Yes.
I'd probably add two function header comments:
1) explicit caution that this is fetching information not from the snapshot
but "live" data
2) the return value might be out of date, that the procnumber needs to be
valid and that the caller is responsible for permission checkingI'd also add a comment do the code saying that it's fine to bypass the
changecount mechanism, because we're reading a single 4 byte integer.
I've attempted to add all these details in v3.
--
nathan
Attachments:
v3-0001-Look-up-backend-type-in-pg_signal_backend-more-ch.patchtext/plain; charset=us-asciiDownload
From 42c35664c8a842c4d6df9cd68a58dc6cf1e35454 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Tue, 26 Nov 2024 15:04:41 -0600
Subject: [PATCH v3 1/1] Look up backend type in pg_signal_backend() more
cheaply.
Commit ccd38024bc, which introduced the pg_signal_autovacuum_worker
role, added a call to pgstat_get_beentry_by_proc_number() for the
purpose of determining whether the targeted backend process is an
autovacuum worker. This function calls
pgstat_read_current_status(), which can be fairly expensive and may
return cached, out-of-date information. Since we just need to look
up the target backend's BackendType, and we already know its
ProcNumber, we can instead inspect the BackendStatusArray directly,
which is much less expensive and probably more up-to-date. There
are some caveats with this approach (which are documented in the
code), but it's still substantially better than before.
Reported-by: Andres Freund
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/ujenaa2uabzfkwxwmfifawzdozh3ljr7geozlhftsuosgm7n7q%40g3utqqyyosb6
---
src/backend/storage/ipc/signalfuncs.c | 4 ++--
src/backend/utils/activity/backend_status.c | 25 +++++++++++++++++++++
src/include/utils/backend_status.h | 1 +
3 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index aa729a36e3..d835327600 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -88,9 +88,9 @@ pg_signal_backend(int pid, int sig)
if (!OidIsValid(proc->roleId) || superuser_arg(proc->roleId))
{
ProcNumber procNumber = GetNumberFromPGProc(proc);
- PgBackendStatus *procStatus = pgstat_get_beentry_by_proc_number(procNumber);
+ BackendType backendType = pgstat_get_backend_type_by_proc_number(procNumber);
- if (procStatus && procStatus->st_backendType == B_AUTOVAC_WORKER)
+ if (backendType == B_AUTOVAC_WORKER)
{
if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM_WORKER))
return SIGNAL_BACKEND_NOAUTOVAC;
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index bdb3a296ca..777f7410a7 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -1036,6 +1036,31 @@ pgstat_get_my_query_id(void)
return MyBEEntry->st_query_id;
}
+/* ----------
+ * pgstat_get_backend_type_by_proc_number() -
+ *
+ * Return the type of the backend with the specified ProcNumber. This looks
+ * directly at the BackendStatusArray, so the return value may be out of date.
+ * The only current use of this function is in pg_signal_backend(), which is
+ * inherently racy, so we don't worry too much about this.
+ *
+ * It is the caller's responsibility to use this wisely; at minimum, callers
+ * should ensure that procNumber is valid and perform the required permissions
+ * checking.
+ * ----------
+ */
+BackendType
+pgstat_get_backend_type_by_proc_number(ProcNumber procNumber)
+{
+ volatile PgBackendStatus *status = &BackendStatusArray[procNumber];
+
+ /*
+ * We bypass the changecount mechanism since fetching and storing an int
+ * is almost certainly atomic.
+ */
+ return status->st_backendType;
+}
+
/* ----------
* cmp_lbestatus
*
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 97874300c3..4e8b39a66d 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -323,6 +323,7 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
+extern BackendType pgstat_get_backend_type_by_proc_number(ProcNumber procNumber);
/* ----------
--
2.39.5 (Apple Git-154)
On 2024-11-26 15:07:24 -0600, Nathan Bossart wrote:
I've attempted to add all these details in v3.
LGTM!