Weird test mixup
I got a weird test failure while testing my forking refactor patches on
Cirrus CI
(https://cirrus-ci.com/task/5880724448870400?logs=test_running#L121):
[16:52:39.753] Summary of Failures:
[16:52:39.753]
[16:52:39.753] 66/73 postgresql:intarray-running / intarray-running/regress ERROR 6.27s exit status 1
[16:52:39.753]
[16:52:39.753] Ok: 72
[16:52:39.753] Expected Fail: 0
[16:52:39.753] Fail: 1
[16:52:39.753] Unexpected Pass: 0
[16:52:39.753] Skipped: 0
[16:52:39.753] Timeout: 0
[16:52:39.753]
[16:52:39.753] Full log written to /tmp/cirrus-ci-build/build/meson-logs/testlog-running.txt
And:
diff -U3 /tmp/cirrus-ci-build/contrib/intarray/expected/_int.out /tmp/cirrus-ci-build/build/testrun/intarray-running/regress/results/_int.out
--- /tmp/cirrus-ci-build/contrib/intarray/expected/_int.out	2024-03-14 16:48:48.690367000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/intarray-running/regress/results/_int.out	2024-03-14 16:52:05.759444000 +0000
@@ -804,6 +804,7 @@
 DROP INDEX text_idx;
 CREATE INDEX text_idx on test__int using gin ( a gin__int_ops );
+ERROR: error triggered for injection point gin-leave-leaf-split-incomplete
 SELECT count(*) from test__int WHERE a && '{23,50}';
  count
 -------
@@ -877,6 +878,7 @@
 (1 row)
 DROP INDEX text_idx;
+ERROR: index "text_idx" does not exist
 -- Repeat the same queries with an extended data set. The data set is the
 -- same that we used before, except that each element in the array is
 -- repeated three times, offset by 1000 and 2000. For example, {1, 5}
Somehow the 'gin-leave-leaf-split-incomplete' injection point was active
in the 'intarray' test. That makes no sense. That injection point is
only used by the test in src/test/modules/gin/. Perhaps that ran at the
same time as the intarray test? But they run in separate instances, with
different data directories. And the 'gin' test passed.
I'm completely stumped. Anyone have a theory?
--
Heikki Linnakangas
Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes:
Somehow the 'gin-leave-leaf-split-incomplete' injection point was active
in the 'intarray' test. That makes no sense. That injection point is
only used by the test in src/test/modules/gin/. Perhaps that ran at the
same time as the intarray test? But they run in separate instances, with
different data directories.
Do they? It'd be fairly easy to explain this if these things were
being run in "installcheck" style. I'm not sure about CI, but from
memory, the buildfarm does use installcheck for some things.
I wonder if it'd be wise to adjust the injection point stuff so that
it's active in only the specific database the injection point was
activated in.
regards, tom lane
On Fri, Mar 15, 2024 at 11:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
Somehow the 'gin-leave-leaf-split-incomplete' injection point was active
in the 'intarray' test. That makes no sense. That injection point is
only used by the test in src/test/modules/gin/. Perhaps that ran at the
same time as the intarray test? But they run in separate instances, with
different data directories.
Do they? It'd be fairly easy to explain this if these things were
being run in "installcheck" style. I'm not sure about CI, but from
memory, the buildfarm does use installcheck for some things.
Right, as mentioned here:
/messages/by-id/CA+hUKGJYhcG_o2nwSK6r01eOZJwNWUJUbX==AVnW84f-+8yamQ@mail.gmail.com
That's the "running" test, which is like the old installcheck.
I wrote:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
Somehow the 'gin-leave-leaf-split-incomplete' injection point was active
in the 'intarray' test. That makes no sense. That injection point is
only used by the test in src/test/modules/gin/. Perhaps that ran at the
same time as the intarray test? But they run in separate instances, with
different data directories.
Do they? It'd be fairly easy to explain this if these things were
being run in "installcheck" style. I'm not sure about CI, but from
memory, the buildfarm does use installcheck for some things.
Hmm, Munro's comment yesterday[1] says that current CI does use
installcheck mode in some cases.
regards, tom lane
[1]: /messages/by-id/CA+hUKGJYhcG_o2nwSK6r01eOZJwNWUJUbX==AVnW84f-+8yamQ@mail.gmail.com
Thomas Munro <thomas.munro@gmail.com> writes:
On Fri, Mar 15, 2024 at 11:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Do they? It'd be fairly easy to explain this if these things were
being run in "installcheck" style. I'm not sure about CI, but from
memory, the buildfarm does use installcheck for some things.
Right, as mentioned here:
/messages/by-id/CA+hUKGJYhcG_o2nwSK6r01eOZJwNWUJUbX==AVnW84f-+8yamQ@mail.gmail.com
That's the "running" test, which is like the old installcheck.
Hmm. Seems like maybe we need to institute a rule that anything
using injection points has to be marked NO_INSTALLCHECK. That's
kind of a big hammer though.
regards, tom lane
On Thu, Mar 14, 2024 at 06:19:38PM -0400, Tom Lane wrote:
Do they? It'd be fairly easy to explain this if these things were
being run in "installcheck" style. I'm not sure about CI, but from
memory, the buildfarm does use installcheck for some things.
I wonder if it'd be wise to adjust the injection point stuff so that
it's active in only the specific database the injection point was
activated in.
It can be made optional by extending InjectionPointAttach() to
specify a database OID or a database name. Note that
041_checkpoint_at_promote.pl wants an injection point to run in the
checkpointer, where we don't have a database requirement.
Or we could just disable runningcheck because of the concurrency
requirement in this test. The test would still be able to run, just
less times.
--
Michael
On Fri, Mar 15, 2024 at 07:53:57AM +0900, Michael Paquier wrote:
It can be made optional by extending InjectionPointAttach() to
specify a database OID or a database name. Note that
041_checkpoint_at_promote.pl wants an injection point to run in the
checkpointer, where we don't have a database requirement.
Slight correction here. It is also possible to not touch
InjectionPointAttach() at all: just tweak the callbacks to do that as
long as the database that should be used is tracked in shmem with its
point name, say with new fields in InjectionPointSharedState. That
keeps the backend APIs in a cleaner state.
--
Michael
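To make the module-side approach described above concrete, here is a
minimal sketch (hypothetical names only; the POC patch later in the
thread is the authoritative version): the module's shared state keeps
(point name, database OID) pairs, and each callback returns early when
the current backend's database does not match.

/* Sketch only: hypothetical module-side condition check. */
#include "postgres.h"

#include "miscadmin.h"			/* MyDatabaseId */
#include "storage/spin.h"

#define COND_NAME_MAXLEN	64
#define MAX_CONDITIONS		8

typedef struct PointCondition
{
    char		name[COND_NAME_MAXLEN];	/* injection point name */
    Oid			dboid;					/* database allowed to run it */
} PointCondition;

typedef struct ConditionState
{
    slock_t		lock;
    PointCondition conditions[MAX_CONDITIONS];
} ConditionState;

/* set up elsewhere, e.g. via the DSM registry the module already uses */
static ConditionState *cond_state;

/* Return true if the named point may run in this backend. */
static bool
point_allowed_here(const char *name)
{
    bool		allowed = true;

    SpinLockAcquire(&cond_state->lock);
    for (int i = 0; i < MAX_CONDITIONS; i++)
    {
        PointCondition *cond = &cond_state->conditions[i];

        if (strcmp(cond->name, name) == 0 && cond->dboid != MyDatabaseId)
        {
            allowed = false;
            break;
        }
    }
    SpinLockRelease(&cond_state->lock);

    return allowed;
}

/* Each callback guards itself with the check before acting. */
void
injection_error_sketch(const char *name)
{
    if (!point_allowed_here(name))
        return;

    elog(ERROR, "error triggered for injection point %s", name);
}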
Michael Paquier <michael@paquier.xyz> writes:
On Thu, Mar 14, 2024 at 06:19:38PM -0400, Tom Lane wrote:
I wonder if it'd be wise to adjust the injection point stuff so that
it's active in only the specific database the injection point was
activated in.
It can be made optional by extending InjectionPointAttach() to
specify a database OID or a database name. Note that
041_checkpoint_at_promote.pl wants an injection point to run in the
checkpointer, where we don't have a database requirement.
Or we could just disable runningcheck because of the concurrency
requirement in this test. The test would still be able to run, just
less times.
No, actually we *must* mark all these tests NO_INSTALLCHECK if we
stick with the current definition of injection points. The point
of installcheck mode is that the tests are supposed to be safe to
run in a live installation. Side-effects occurring in other
databases are completely not OK.
I can see that some tests would want to be able to inject code
cluster-wide, but I bet that's going to be a small minority.
I suggest that we invent a notion of "global" vs "local"
injection points, where a "local" one only fires in the DB it
was defined in. Then only tests that require a global injection
point need to be NO_INSTALLCHECK.
regards, tom lane
On Thu, Mar 14, 2024 at 07:13:53PM -0400, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
Or we could just disable runningcheck because of the concurrency
requirement in this test. The test would still be able to run, just
less times.
No, actually we *must* mark all these tests NO_INSTALLCHECK if we
stick with the current definition of injection points. The point
of installcheck mode is that the tests are supposed to be safe to
run in a live installation. Side-effects occurring in other
databases are completely not OK.
I really don't want to plug any runtime conditions into the backend
APIs, because there can be so much more that can be done there than
only restricting a callback to a database. I can imagine process type
restrictions, process PID restrictions, etc. So this knowledge should
stick into the test module itself, and be expanded there. That's
easier ABI-wise, as well.
I can see that some tests would want to be able to inject code
cluster-wide, but I bet that's going to be a small minority.
I suggest that we invent a notion of "global" vs "local"
injection points, where a "local" one only fires in the DB it
was defined in. Then only tests that require a global injection
point need to be NO_INSTALLCHECK.
Attached is a POC of what could be done. I have extended the module
injection_points so that it is possible to register what I'm calling a
"condition" in the module that can be defined with a new SQL function.
The condition is stored in shared memory with the point name, then at
runtime the conditions are cross-checked in the callbacks. With the
interface of this patch, the condition should be registered *before* a
point is attached, but this stuff could also be written so that
injection_points_attach() takes an optional argument with a database
name. Or this could use a different, new SQL function, say an
injection_points_attach_local() that registers a condition with
MyDatabaseId on top of attaching the point, doing all of that while
holding the spinlock of the module's shmem state only once.
By the way, modules/gin/ was missing a detach, so the test was
not repeatable with successive installchecks. Adding a pg_sleep of a
few seconds after 'gin-leave-leaf-split-incomplete' is registered
enlarges the window, and the patch avoids failures when running
installcheck in parallel for modules/gin/ and something else using
gin, like contrib/btree_gin/:
while make USE_MODULE_DB=1 installcheck; do :; done
0001 is the condition facility for the module, 0002 is a fix for the
GIN test. Thoughts are welcome.
--
Michael
Attachments:
0001-Introduce-runtime-conditions-in-module-injection_poi.patch (text/x-diff; charset=us-ascii)
From 5f00092a0bd193f70d9daff77caacfc2ba83ca52 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Fri, 15 Mar 2024 16:03:05 +0900
Subject: [PATCH 1/2] Introduce runtime conditions in module injection_points
This adds a new SQL function injection_points_condition() that can be
used to register a runtime condition to an injection point, declared or
not. For now, it is possible to register a database name to make an
injection point run only on a defined database.
---
.../expected/injection_points.out | 54 ++++++++
.../injection_points--1.0.sql | 13 ++
.../injection_points/injection_points.c | 131 ++++++++++++++++++
.../injection_points/sql/injection_points.sql | 14 ++
4 files changed, 212 insertions(+)
diff --git a/src/test/modules/injection_points/expected/injection_points.out b/src/test/modules/injection_points/expected/injection_points.out
index 827456ccc5..ccb83b291f 100644
--- a/src/test/modules/injection_points/expected/injection_points.out
+++ b/src/test/modules/injection_points/expected/injection_points.out
@@ -115,4 +115,58 @@ NOTICE: notice triggered for injection point TestInjectionLog2
(1 row)
+SELECT injection_points_detach('TestInjectionLog2');
+ injection_points_detach
+-------------------------
+
+(1 row)
+
+-- Conditions
+SELECT current_database() AS datname \gset
+-- Register injection point to run on database 'postgres'.
+SELECT injection_points_attach('TestConditionError', 'error');
+ injection_points_attach
+-------------------------
+
+(1 row)
+
+SELECT injection_points_condition('TestConditionError', 'postgres');
+ injection_points_condition
+----------------------------
+
+(1 row)
+
+SELECT injection_points_run('TestConditionError'); -- nothing
+ injection_points_run
+----------------------
+
+(1 row)
+
+SELECT injection_points_detach('TestConditionError');
+ injection_points_detach
+-------------------------
+
+(1 row)
+
+-- Register injection point to run on current database
+SELECT injection_points_attach('TestConditionError', 'error');
+ injection_points_attach
+-------------------------
+
+(1 row)
+
+SELECT injection_points_condition('TestConditionError', :'datname');
+ injection_points_condition
+----------------------------
+
+(1 row)
+
+SELECT injection_points_run('TestConditionError'); -- error
+ERROR: error triggered for injection point TestConditionError
+SELECT injection_points_detach('TestConditionError');
+ injection_points_detach
+-------------------------
+
+(1 row)
+
DROP EXTENSION injection_points;
diff --git a/src/test/modules/injection_points/injection_points--1.0.sql b/src/test/modules/injection_points/injection_points--1.0.sql
index 0a2e59aba7..b707f23dad 100644
--- a/src/test/modules/injection_points/injection_points--1.0.sql
+++ b/src/test/modules/injection_points/injection_points--1.0.sql
@@ -34,6 +34,19 @@ RETURNS void
AS 'MODULE_PATHNAME', 'injection_points_wakeup'
LANGUAGE C STRICT PARALLEL UNSAFE;
+--
+-- injection_points_condition()
+--
+-- Register a runtime condition associated to an injection point.
+--
+-- dbname can be specified to restrict an injection point to run on a
+-- wanted database.
+--
+CREATE FUNCTION injection_points_condition(IN point_name TEXT, IN dbname text)
+RETURNS void
+AS 'MODULE_PATHNAME', 'injection_points_condition'
+LANGUAGE C STRICT PARALLEL UNSAFE;
+
--
-- injection_points_detach()
--
diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c
index 0730254d54..ce9317f0ee 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -17,7 +17,9 @@
#include "postgres.h"
+#include "commands/dbcommands.h"
#include "fmgr.h"
+#include "miscadmin.h"
#include "storage/condition_variable.h"
#include "storage/dsm_registry.h"
#include "storage/lwlock.h"
@@ -31,6 +33,23 @@ PG_MODULE_MAGIC;
/* Maximum number of waits usable in injection points at once */
#define INJ_MAX_WAIT 8
#define INJ_NAME_MAXLEN 64
+#define INJ_MAX_CONDITION 8
+
+/*
+ * Conditions related to injection points. This tracks in shared memory the
+ * runtime conditions under which an injection point is allowed to run.
+ *
+ * If more types of runtime conditions need to be tracked, this structure
+ * should be expanded.
+ */
+typedef struct InjectionPointCondition
+{
+ /* Name of the injection point related to this condition */
+ char name[INJ_NAME_MAXLEN];
+
+ /* Oid of the database where the injection point is allowed to run */
+ Oid dboid;
+} InjectionPointCondition;
/* Shared state information for injection points. */
typedef struct InjectionPointSharedState
@@ -44,6 +63,9 @@ typedef struct InjectionPointSharedState
/* Names of injection points attached to wait counters */
char name[INJ_MAX_WAIT][INJ_NAME_MAXLEN];
+ /* Condition to run an injection point */
+ InjectionPointCondition condition[INJ_MAX_CONDITION];
+
/* Condition variable used for waits and wakeups */
ConditionVariable wait_point;
} InjectionPointSharedState;
@@ -67,6 +89,7 @@ injection_point_init_state(void *ptr)
SpinLockInit(&state->lock);
memset(state->wait_counts, 0, sizeof(state->wait_counts));
memset(state->name, 0, sizeof(state->name));
+ memset(state->condition, 0, sizeof(state->condition));
ConditionVariableInit(&state->wait_point);
}
@@ -87,16 +110,62 @@ injection_init_shmem(void)
&found);
}
+/*
+ * Check runtime conditions associated to an injection point.
+ *
+ * Returns true if the named injection point is allowed to run, and false
+ * otherwise. Multiple conditions can be associated to a single injection
+ * point, so check them all.
+ */
+static bool
+injection_point_allowed(const char *name)
+{
+ bool result = true;
+
+ if (inj_state == NULL)
+ injection_init_shmem();
+
+ SpinLockAcquire(&inj_state->lock);
+
+ for (int i = 0; i < INJ_MAX_CONDITION; i++)
+ {
+ InjectionPointCondition *condition = &inj_state->condition[i];
+
+ if (strcmp(condition->name, name) == 0)
+ {
+ /*
+ * Check if the database matches what's used in this
+ * backend.
+ */
+ if (MyDatabaseId != condition->dboid)
+ {
+ result = false;
+ break;
+ }
+ }
+ }
+
+ SpinLockRelease(&inj_state->lock);
+
+ return result;
+}
+
/* Set of callbacks available to be attached to an injection point. */
void
injection_error(const char *name)
{
+ if (!injection_point_allowed(name))
+ return;
+
elog(ERROR, "error triggered for injection point %s", name);
}
void
injection_notice(const char *name)
{
+ if (!injection_point_allowed(name))
+ return;
+
elog(NOTICE, "notice triggered for injection point %s", name);
}
@@ -111,6 +180,9 @@ injection_wait(const char *name)
if (inj_state == NULL)
injection_init_shmem();
+ if (!injection_point_allowed(name))
+ return;
+
/*
* Use the injection point name for this custom wait event. Note that
* this custom wait event name is not released, but we don't care much for
@@ -235,6 +307,48 @@ injection_points_wakeup(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * injection_points_condition
+ *
+ * Register a runtime condition associated to an injection point. Currently,
+ * it is possible to define a database name where an injection point can be
+ * restricted to run on.
+ */
+PG_FUNCTION_INFO_V1(injection_points_condition);
+Datum
+injection_points_condition(PG_FUNCTION_ARGS)
+{
+ char *name = text_to_cstring(PG_GETARG_TEXT_PP(0));
+ char *dbname = text_to_cstring(PG_GETARG_TEXT_PP(1));
+ Oid dboid = get_database_oid(dbname, false);
+ int index = -1;
+
+ if (inj_state == NULL)
+ injection_init_shmem();
+
+ SpinLockAcquire(&inj_state->lock);
+
+ for (int i = 0; i < INJ_MAX_CONDITION; i++)
+ {
+ InjectionPointCondition *condition = &inj_state->condition[i];
+
+ if (condition->name[0] == '\0')
+ {
+ index = i;
+ strlcpy(condition->name, name, INJ_NAME_MAXLEN);
+ condition->dboid = dboid;
+ break;
+ }
+ }
+ SpinLockRelease(&inj_state->lock);
+
+ if (index < 0)
+ elog(ERROR, "could not find free slot for condition of injection point %s",
+ name);
+
+ PG_RETURN_VOID();
+}
+
/*
* SQL function for dropping an injection point.
*/
@@ -246,5 +360,22 @@ injection_points_detach(PG_FUNCTION_ARGS)
InjectionPointDetach(name);
+ if (inj_state == NULL)
+ injection_init_shmem();
+
+ /* Clean up any conditions associated to this injection point */
+ SpinLockAcquire(&inj_state->lock);
+ for (int i = 0; i < INJ_MAX_CONDITION; i++)
+ {
+ InjectionPointCondition *condition = &inj_state->condition[i];
+
+ if (strcmp(condition->name, name) == 0)
+ {
+ condition->dboid = InvalidOid;
+ condition->name[0] = '\0';
+ }
+ }
+ SpinLockRelease(&inj_state->lock);
+
PG_RETURN_VOID();
}
diff --git a/src/test/modules/injection_points/sql/injection_points.sql b/src/test/modules/injection_points/sql/injection_points.sql
index 23c7e435ad..1b52e7e6d7 100644
--- a/src/test/modules/injection_points/sql/injection_points.sql
+++ b/src/test/modules/injection_points/sql/injection_points.sql
@@ -30,5 +30,19 @@ SELECT injection_points_run('TestInjectionLog2'); -- notice
SELECT injection_points_detach('TestInjectionLog'); -- fails
SELECT injection_points_run('TestInjectionLog2'); -- notice
+SELECT injection_points_detach('TestInjectionLog2');
+
+-- Conditions
+SELECT current_database() AS datname \gset
+-- Register injection point to run on database 'postgres'.
+SELECT injection_points_attach('TestConditionError', 'error');
+SELECT injection_points_condition('TestConditionError', 'postgres');
+SELECT injection_points_run('TestConditionError'); -- nothing
+SELECT injection_points_detach('TestConditionError');
+-- Register injection point to run on current database
+SELECT injection_points_attach('TestConditionError', 'error');
+SELECT injection_points_condition('TestConditionError', :'datname');
+SELECT injection_points_run('TestConditionError'); -- error
+SELECT injection_points_detach('TestConditionError');
DROP EXTENSION injection_points;
--
2.43.0
0002-Restrict-GIN-test-with-injection-points-to-run-on-lo.patch (text/x-diff; charset=us-ascii)
From f0b6e81a8652cbfe8733c34bcd035997392011b8 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Fri, 15 Mar 2024 16:26:52 +0900
Subject: [PATCH 2/2] Restrict GIN test with injection points to run on local
database
It is possible that this test conflicts with parallel installcheck
activities. While on it, add a forgotten detach to the GIN test to make
it repeatable.
---
.../gin/expected/gin_incomplete_splits.out | 27 +++++++++++++++++++
.../modules/gin/sql/gin_incomplete_splits.sql | 8 ++++++
2 files changed, 35 insertions(+)
diff --git a/src/test/modules/gin/expected/gin_incomplete_splits.out b/src/test/modules/gin/expected/gin_incomplete_splits.out
index 9a60f8c128..7803dd7a90 100644
--- a/src/test/modules/gin/expected/gin_incomplete_splits.out
+++ b/src/test/modules/gin/expected/gin_incomplete_splits.out
@@ -14,7 +14,16 @@
create extension injection_points;
-- Use the index for all the queries
set enable_seqscan=off;
+-- Limit all the injection points of this script to run on the current
+-- database, for safety under concurrent activity.
+SELECT current_database() AS datname \gset
-- Print a NOTICE whenever an incomplete split gets fixed
+SELECT injection_points_condition('gin-finish-incomplete-split', :'datname');
+ injection_points_condition
+----------------------------
+
+(1 row)
+
SELECT injection_points_attach('gin-finish-incomplete-split', 'notice');
injection_points_attach
-------------------------
@@ -109,6 +118,12 @@ select verify(:next_i);
--
-- Test incomplete leaf split
--
+SELECT injection_points_condition('gin-leave-leaf-split-incomplete', :'datname');
+ injection_points_condition
+----------------------------
+
+(1 row)
+
SELECT injection_points_attach('gin-leave-leaf-split-incomplete', 'error');
injection_points_attach
-------------------------
@@ -145,6 +160,12 @@ select verify(:next_i);
--
-- Test incomplete internal page split
--
+SELECT injection_points_condition('gin-leave-internal-split-incomplete', :'datname');
+ injection_points_condition
+----------------------------
+
+(1 row)
+
SELECT injection_points_attach('gin-leave-internal-split-incomplete', 'error');
injection_points_attach
-------------------------
@@ -178,3 +199,9 @@ select verify(:next_i);
t
(1 row)
+SELECT injection_points_detach('gin-finish-incomplete-split');
+ injection_points_detach
+-------------------------
+
+(1 row)
+
diff --git a/src/test/modules/gin/sql/gin_incomplete_splits.sql b/src/test/modules/gin/sql/gin_incomplete_splits.sql
index 4e180b2519..5b4521edd9 100644
--- a/src/test/modules/gin/sql/gin_incomplete_splits.sql
+++ b/src/test/modules/gin/sql/gin_incomplete_splits.sql
@@ -17,7 +17,12 @@ create extension injection_points;
-- Use the index for all the queries
set enable_seqscan=off;
+-- Limit all the injection points of this script to run on the current
+-- database, for safety under concurrent activity.
+SELECT current_database() AS datname \gset
+
-- Print a NOTICE whenever an incomplete split gets fixed
+SELECT injection_points_condition('gin-finish-incomplete-split', :'datname');
SELECT injection_points_attach('gin-finish-incomplete-split', 'notice');
--
@@ -111,6 +116,7 @@ select verify(:next_i);
--
-- Test incomplete leaf split
--
+SELECT injection_points_condition('gin-leave-leaf-split-incomplete', :'datname');
SELECT injection_points_attach('gin-leave-leaf-split-incomplete', 'error');
select insert_until_fail(:next_i) as next_i
\gset
@@ -129,6 +135,7 @@ select verify(:next_i);
--
-- Test incomplete internal page split
--
+SELECT injection_points_condition('gin-leave-internal-split-incomplete', :'datname');
SELECT injection_points_attach('gin-leave-internal-split-incomplete', 'error');
select insert_until_fail(:next_i, 100) as next_i
\gset
@@ -142,3 +149,4 @@ select insert_n(:next_i, 10) as next_i
\gset
-- Verify that a scan still works
select verify(:next_i);
+SELECT injection_points_detach('gin-finish-incomplete-split');
--
2.43.0
On 15/03/2024 09:39, Michael Paquier wrote:
On Thu, Mar 14, 2024 at 07:13:53PM -0400, Tom Lane wrote:
I can see that some tests would want to be able to inject code
cluster-wide, but I bet that's going to be a small minority.
I suggest that we invent a notion of "global" vs "local"
injection points, where a "local" one only fires in the DB it
was defined in. Then only tests that require a global injection
point need to be NO_INSTALLCHECK.
Attached is a POC of what could be done. I have extended the module
injection_points so that it is possible to register what I'm calling a
"condition" in the module that can be defined with a new SQL function.
The condition is stored in shared memory with the point name, then at
runtime the conditions are cross-checked in the callbacks. With the
interface of this patch, the condition should be registered *before* a
point is attached, but this stuff could also be written so that
injection_points_attach() takes an optional argument with a database
name. Or this could use a different, new SQL function, say an
injection_points_attach_local() that registers a condition with
MyDatabaseId on top of attaching the point, doing all of that while
holding the spinlock of the module's shmem state only once.
For the gin test, a single "SELECT injection_points_attach_local()" at
the top of the test file would be most convenient.
If I have to do "SELECT
injection_points_condition('gin-finish-incomplete-split', :'datname');"
for every injection point in the test, I will surely forget it sometimes.
In the 'gin' test, they could actually be scoped to the same backend.
Wrt. the spinlock and shared memory handling, I think this would be
simpler if you could pass some payload in the InjectionPointAttach()
call, which would be passed back to the callback function:
void
InjectionPointAttach(const char *name,
const char *library,
- const char *function)
+ const char *function,
+ uint64 payload)
In this case, the payload would be the "slot index" in shared memory.
Or perhaps always allocate, say, 1024 bytes of working area for every
attached injection point that the test module can use any way it wants.
Like for storing extra conditions, or for the wakeup counter stuff in
injection_wait(). A fixed size working area is a little crude, but would
be very handy in practice.
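As a rough illustration of that working-area idea, here is a sketch
under the assumption of a hypothetical attach variant that reserves a
fixed-size scratch buffer per attached point and hands it back to the
callback; none of these names exist in the tree.

/* Sketch only: hypothetical per-point working area, not a real API. */
#include "postgres.h"

#include "miscadmin.h"			/* MyDatabaseId */

#define INJ_WORKAREA_SIZE	1024

/* Hypothetical callback signature that also receives the working area. */
typedef void (*InjectionPointCallbackWithArea) (const char *name,
                                                void *workarea);

/*
 * What a test module might keep in its area: a database restriction and
 * the wakeup counter currently stored in InjectionPointSharedState.
 */
typedef struct TestPointState
{
    Oid			dboid;			/* only fire in this database, if valid */
    int			wakeup_count;	/* counter used by injection_wait() */
} TestPointState;

static void
injection_error_with_area(const char *name, void *workarea)
{
    TestPointState *state = (TestPointState *) workarea;

    if (OidIsValid(state->dboid) && state->dboid != MyDatabaseId)
        return;

    elog(ERROR, "error triggered for injection point %s", name);
}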
By the way, modules/gin/ was missing a detach, so the test was
not repeatable with successive installchecks.
Oops.
It would be nice to automatically detach all the injection points on
process exit. You wouldn't always want that, but I think most tests hold
a session open throughout the test, and for those it would be handy.
--
Heikki Linnakangas
Neon (https://neon.tech)
On 15/03/2024 01:13, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
Or we could just disable runningcheck because of the concurrency
requirement in this test. The test would still be able to run, just
less times.
No, actually we *must* mark all these tests NO_INSTALLCHECK if we
stick with the current definition of injection points. The point
of installcheck mode is that the tests are supposed to be safe to
run in a live installation. Side-effects occurring in other
databases are completely not OK.
I committed a patch to do that, to put out the fire.
--
Heikki Linnakangas
Neon (https://neon.tech)
On 15/03/2024 13:09, Heikki Linnakangas wrote:
On 15/03/2024 01:13, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
Or we could just disable runningcheck because of the concurrency
requirement in this test. The test would still be able to run, just
less times.
No, actually we *must* mark all these tests NO_INSTALLCHECK if we
stick with the current definition of injection points. The point
of installcheck mode is that the tests are supposed to be safe to
run in a live installation. Side-effects occurring in other
databases are completely not OK.
I committed a patch to do that, to put out the fire.
That's turning the buildfarm quite red. Many, but not all animals are
failing like this:
--- /home/buildfarm/hippopotamus/buildroot/HEAD/pgsql.build/src/test/modules/injection_points/expected/injection_points.out	2024-03-15 12:41:16.363286975 +0100
+++ /home/buildfarm/hippopotamus/buildroot/HEAD/pgsql.build/src/test/modules/injection_points/results/injection_points.out	2024-03-15 12:53:11.528159615 +0100
@@ -1,118 +1,111 @@
 CREATE EXTENSION injection_points;
+ERROR: extension "injection_points" is not available
+DETAIL: Could not open extension control file "/home/buildfarm/hippopotamus/buildroot/HEAD/pgsql.build/tmp_install/home/buildfarm/hippopotamus/buildroot/HEAD/inst/share/postgresql/extension/injection_points.control": No such file or directory.
+HINT: The extension must first be installed on the system where PostgreSQL is running.
...
Looks like adding NO_INSTALLCHECK somehow affected how the modules are
installed in tmp_install. I'll investigate..
--
Heikki Linnakangas
Neon (https://neon.tech)
On 15/03/2024 14:10, Heikki Linnakangas wrote:
On 15/03/2024 13:09, Heikki Linnakangas wrote:
I committed a patch to do that, to put out the fire.
That's turning the buildfarm quite red. Many, but not all animals are
failing like this:
--- /home/buildfarm/hippopotamus/buildroot/HEAD/pgsql.build/src/test/modules/injection_points/expected/injection_points.out	2024-03-15 12:41:16.363286975 +0100
+++ /home/buildfarm/hippopotamus/buildroot/HEAD/pgsql.build/src/test/modules/injection_points/results/injection_points.out	2024-03-15 12:53:11.528159615 +0100
@@ -1,118 +1,111 @@
 CREATE EXTENSION injection_points;
+ERROR: extension "injection_points" is not available
+DETAIL: Could not open extension control file "/home/buildfarm/hippopotamus/buildroot/HEAD/pgsql.build/tmp_install/home/buildfarm/hippopotamus/buildroot/HEAD/inst/share/postgresql/extension/injection_points.control": No such file or directory.
+HINT: The extension must first be installed on the system where PostgreSQL is running.
...
Looks like adding NO_INSTALLCHECK somehow affected how the modules are
installed in tmp_install. I'll investigate..
I think this is a bug in the buildfarm client. In the make_misc_check
step, it does this (reduced to just the interesting parts):
# run the modules that can't be run with installcheck
sub make_misc_check
{
    ...
    my @dirs = glob("$pgsql/src/test/modules/* $pgsql/contrib/*");
    foreach my $dir (@dirs)
    {
        next unless -e "$dir/Makefile";
        my $makefile = file_contents("$dir/Makefile");
        next unless $makefile =~ /^NO_INSTALLCHECK/m;
        my $test = basename($dir);

        # skip redundant TAP tests which are called elsewhere
        my @out = run_log("cd $dir && $make $instflags TAP_TESTS= check");
        ...
    }
So it scans src/test/modules, and runs "make check" for all
subdirectories that have NO_INSTALLCHECK in the makefile. But the
injection fault tests are also conditional on the
enable_injection_points in the parent Makefile:
ifeq ($(enable_injection_points),yes)
SUBDIRS += injection_points gin
else
ALWAYS_SUBDIRS += injection_points gin
endif
The buildfarm client doesn't pay any attention to that, and runs the
test anyway.
I committed an ugly hack to the subdirectory Makefiles, to turn "make
check" into a no-op if injection points are disabled. Normally when you
run "make check" at the parent level, it doesn't even recurse to the
directories, but this works around the buildfarm script. I hope...
--
Heikki Linnakangas
Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes:
On 15/03/2024 13:09, Heikki Linnakangas wrote:
I committed a patch to do that, to put out the fire.
That's turning the buildfarm quite red. Many, but not all animals are
failing like this:
It may be even worse than it appears from the buildfarm status page.
My animals were stuck in infinite loops that required a manual "kill"
to get out of, and it's reasonable to assume there are others that
will require owner intervention. Why would this test have done that,
if the module failed to load?
regards, tom lane
On 15/03/2024 16:00, Tom Lane wrote:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
On 15/03/2024 13:09, Heikki Linnakangas wrote:
I committed a patch to do that, to put out the fire.
That's turning the buildfarm quite red. Many, but not all animals are
failing like this:
It may be even worse than it appears from the buildfarm status page.
My animals were stuck in infinite loops that required a manual "kill"
to get out of, and it's reasonable to assume there are others that
will require owner intervention. Why would this test have done that,
if the module failed to load?
The gin_incomplete_splits test inserts rows until it hits the injection
point, at page split. There is a backstop, it should give up after 10000
iterations, but that was broken. Fixed that, thanks for the report!
Hmm, don't we have any timeout that would kill tests if they get stuck?
--
Heikki Linnakangas
Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes:
On 15/03/2024 16:00, Tom Lane wrote:
It may be even worse than it appears from the buildfarm status page.
My animals were stuck in infinite loops that required a manual "kill"
to get out of, and it's reasonable to assume there are others that
will require owner intervention. Why would this test have done that,
if the module failed to load?
The gin_incomplete_splits test inserts rows until it hits the injection
point, at page split. There is a backstop, it should give up after 10000
iterations, but that was broken. Fixed that, thanks for the report!
Duh ...
Hmm, don't we have any timeout that would kill tests if they get stuck?
AFAIK, the only constraint on a buildfarm animal's runtime is the
wait_timeout setting, which is infinite by default, and was on my
machines. (Not anymore ;-).) We do have timeouts in (most?) TAP
tests, but this wasn't a TAP test.
If this is a continuous-insertion loop, presumably it will run the
machine out of disk space eventually, which could be unpleasant
if there are other services running besides the buildfarm.
I think I'll go notify the buildfarm owners list to check for
trouble.
Are there limits on the runtime of CI or cfbot jobs? Maybe
somebody should go check those systems.
regards, tom lane
On Sat, Mar 16, 2024 at 7:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Are there limits on the runtime of CI or cfbot jobs? Maybe
somebody should go check those systems.
Those get killed at a higher level after 60 minutes (configurable but
we didn't change it AFAIK):
https://cirrus-ci.org/faq/#instance-timed-out
It's a fresh virtual machine for each run, and after that it's gone
(well the ccache directory survives but only by being
uploaded/downloaded in explicit steps to transmit it between runs).
On Fri, Mar 15, 2024 at 11:23:31AM +0200, Heikki Linnakangas wrote:
For the gin test, a single "SELECT injection_points_attach_local()" at the
top of the test file would be most convenient.
If I have to do "SELECT
injection_points_condition('gin-finish-incomplete-split', :'datname');" for
every injection point in the test, I will surely forget it sometimes.
So will I, most likely.. The odds never play in favor of hackers. I
have a few more tests in mind that can be linked to a specific
backend with SQL queries, but I've not been able to get back to it
yet.
Wrt. the spinlock and shared memory handling, I think this would be simpler
if you could pass some payload in the InjectionPointAttach() call, which
would be passed back to the callback function:
In this case, the payload would be the "slot index" in shared memory.
Or perhaps always allocate, say, 1024 bytes of working area for every
attached injection point that the test module can use any way it wants. Like
for storing extra conditions, or for the wakeup counter stuff in
injection_wait(). A fixed size working area is a little crude, but would be
very handy in practice.
Perhaps. I am not sure that we need more than the current signature,
all that can just be handled in some module-specific shmem area. The
key is to be able to link a point name to some state related to it.
Using a hash table would be more efficient, but performance-wise an
array is not going to matter as there will most likely never be more
than 8 points. 4 would already be a lot; doubling that is just a
safety margin.
It would be nice to automatically detach all the injection points on process
exit. You wouldn't always want that, but I think most tests hold a session
open throughout the test, and for those it would be handy.
Linking all the points to a PID with an injection_points_attach_local()
that switches a static flag while registering a before_shmem_exit() to
do an automated cleanup sounds like the simplest approach to me based
on what I'm reading on this thread.
(Just saw the buildfarm storm, wow.)
--
Michael
On Sat, Mar 16, 2024 at 08:40:21AM +0900, Michael Paquier wrote:
Linking all the points to a PID with an injection_points_attach_local()
that switches a static flag while registering a before_shmem_exit() to
do an automated cleanup sounds like the simplest approach to me based
on what I'm reading on this thread.
Please find a patch to do exactly that, without touching the backend
APIs. 0001 adds a new function called injection_points_local() that can
be added on top of a SQL test to make it concurrent-safe. 0002 is the
fix for the GIN tests.
I am going to add an open item to not forget about all that.
Comments are welcome.
--
Michael
Attachments:
v2-0001-Introduce-runtime-conditions-in-module-injection_.patch (text/x-diff; charset=us-ascii)
From a4c219f0142a36b2a009af1f9e9affdeab6ab383 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 18 Mar 2024 09:53:05 +0900
Subject: [PATCH v2 1/2] Introduce runtime conditions in module
injection_points
This adds a new SQL function injection_points_local() that can be used
to force injection points to be run only on the process where they are
attached. This is handy for SQL tests to:
- detach automatically local injection points when the process exits.
- allow tests with injection points to run concurrently with other test
suites.
---
.../expected/injection_points.out | 77 ++++++++
.../injection_points--1.0.sql | 11 ++
.../injection_points/injection_points.c | 176 ++++++++++++++++++
.../injection_points/sql/injection_points.sql | 20 ++
src/tools/pgindent/typedefs.list | 1 +
5 files changed, 285 insertions(+)
diff --git a/src/test/modules/injection_points/expected/injection_points.out b/src/test/modules/injection_points/expected/injection_points.out
index 827456ccc5..f24a602def 100644
--- a/src/test/modules/injection_points/expected/injection_points.out
+++ b/src/test/modules/injection_points/expected/injection_points.out
@@ -115,4 +115,81 @@ NOTICE: notice triggered for injection point TestInjectionLog2
(1 row)
+SELECT injection_points_detach('TestInjectionLog2');
+ injection_points_detach
+-------------------------
+
+(1 row)
+
+-- Runtime conditions
+SELECT injection_points_attach('TestConditionError', 'error');
+ injection_points_attach
+-------------------------
+
+(1 row)
+
+-- Any follow-up injection point attached will be local to this process.
+SELECT injection_points_local();
+ injection_points_local
+------------------------
+
+(1 row)
+
+SELECT injection_points_attach('TestConditionLocal1', 'error');
+ injection_points_attach
+-------------------------
+
+(1 row)
+
+SELECT injection_points_attach('TestConditionLocal2', 'notice');
+ injection_points_attach
+-------------------------
+
+(1 row)
+
+SELECT injection_points_run('TestConditionLocal1'); -- error
+ERROR: error triggered for injection point TestConditionLocal1
+SELECT injection_points_run('TestConditionLocal2'); -- notice
+NOTICE: notice triggered for injection point TestConditionLocal2
+ injection_points_run
+----------------------
+
+(1 row)
+
+-- reload, local injection points should be gone.
+\c
+SELECT injection_points_run('TestConditionLocal1'); -- nothing
+ injection_points_run
+----------------------
+
+(1 row)
+
+SELECT injection_points_run('TestConditionLocal2'); -- nothing
+ injection_points_run
+----------------------
+
+(1 row)
+
+SELECT injection_points_run('TestConditionError'); -- error
+ERROR: error triggered for injection point TestConditionError
+SELECT injection_points_detach('TestConditionError');
+ injection_points_detach
+-------------------------
+
+(1 row)
+
+-- Attaching injection points that use the same name as one defined locally
+-- previously should work.
+SELECT injection_points_attach('TestConditionLocal1', 'error');
+ injection_points_attach
+-------------------------
+
+(1 row)
+
+SELECT injection_points_detach('TestConditionLocal1');
+ injection_points_detach
+-------------------------
+
+(1 row)
+
DROP EXTENSION injection_points;
diff --git a/src/test/modules/injection_points/injection_points--1.0.sql b/src/test/modules/injection_points/injection_points--1.0.sql
index 0a2e59aba7..1813a4d6d8 100644
--- a/src/test/modules/injection_points/injection_points--1.0.sql
+++ b/src/test/modules/injection_points/injection_points--1.0.sql
@@ -34,6 +34,17 @@ RETURNS void
AS 'MODULE_PATHNAME', 'injection_points_wakeup'
LANGUAGE C STRICT PARALLEL UNSAFE;
+--
+-- injection_points_local()
+--
+-- Trigger switch to link any future injection points attached to the
+-- current process, useful to make SQL tests concurrent-safe.
+--
+CREATE FUNCTION injection_points_local()
+RETURNS void
+AS 'MODULE_PATHNAME', 'injection_points_local'
+LANGUAGE C STRICT PARALLEL UNSAFE;
+
--
-- injection_points_detach()
--
diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c
index 0730254d54..8be081db99 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -18,8 +18,10 @@
#include "postgres.h"
#include "fmgr.h"
+#include "miscadmin.h"
#include "storage/condition_variable.h"
#include "storage/dsm_registry.h"
+#include "storage/ipc.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "utils/builtins.h"
@@ -31,6 +33,23 @@ PG_MODULE_MAGIC;
/* Maximum number of waits usable in injection points at once */
#define INJ_MAX_WAIT 8
#define INJ_NAME_MAXLEN 64
+#define INJ_MAX_CONDITION 4
+
+/*
+ * Conditions related to injection points. This tracks in shared memory the
+ * runtime conditions under which an injection point is allowed to run.
+ *
+ * If more types of runtime conditions need to be tracked, this structure
+ * should be expanded.
+ */
+typedef struct InjectionPointCondition
+{
+ /* Name of the injection point related to this condition */
+ char name[INJ_NAME_MAXLEN];
+
+ /* ID of the process where the injection point is allowed to run */
+ int pid;
+} InjectionPointCondition;
/* Shared state information for injection points. */
typedef struct InjectionPointSharedState
@@ -44,6 +63,9 @@ typedef struct InjectionPointSharedState
/* Names of injection points attached to wait counters */
char name[INJ_MAX_WAIT][INJ_NAME_MAXLEN];
+ /* Condition to run an injection point */
+ InjectionPointCondition condition[INJ_MAX_CONDITION];
+
/* Condition variable used for waits and wakeups */
ConditionVariable wait_point;
} InjectionPointSharedState;
@@ -55,6 +77,8 @@ extern PGDLLEXPORT void injection_error(const char *name);
extern PGDLLEXPORT void injection_notice(const char *name);
extern PGDLLEXPORT void injection_wait(const char *name);
+/* track if injection points attached in this process are linked to it */
+static bool injection_point_local = false;
/*
* Callback for shared memory area initialization.
@@ -67,6 +91,7 @@ injection_point_init_state(void *ptr)
SpinLockInit(&state->lock);
memset(state->wait_counts, 0, sizeof(state->wait_counts));
memset(state->name, 0, sizeof(state->name));
+ memset(state->condition, 0, sizeof(state->condition));
ConditionVariableInit(&state->wait_point);
}
@@ -87,16 +112,92 @@ injection_init_shmem(void)
&found);
}
+/*
+ * Check runtime conditions associated to an injection point.
+ *
+ * Returns true if the named injection point is allowed to run, and false
+ * otherwise. Multiple conditions can be associated to a single injection
+ * point, so check them all.
+ */
+static bool
+injection_point_allowed(const char *name)
+{
+ bool result = true;
+
+ if (inj_state == NULL)
+ injection_init_shmem();
+
+ SpinLockAcquire(&inj_state->lock);
+
+ for (int i = 0; i < INJ_MAX_CONDITION; i++)
+ {
+ InjectionPointCondition *condition = &inj_state->condition[i];
+
+ if (strcmp(condition->name, name) == 0)
+ {
+ /*
+ * Check if this injection point is allowed to run in this
+ * process.
+ */
+ if (MyProcPid != condition->pid)
+ {
+ result = false;
+ break;
+ }
+ }
+ }
+
+ SpinLockRelease(&inj_state->lock);
+
+ return result;
+}
+
+/*
+ * before_shmem_exit callback to remove injection points linked to a
+ * specific process.
+ */
+static void
+injection_points_cleanup(int code, Datum arg)
+{
+ /* Leave if nothing is tracked locally */
+ if (!injection_point_local)
+ return;
+
+ SpinLockAcquire(&inj_state->lock);
+ for (int i = 0; i < INJ_MAX_CONDITION; i++)
+ {
+ InjectionPointCondition *condition = &inj_state->condition[i];
+
+ if (condition->name[0] == '\0')
+ continue;
+
+ if (condition->pid != MyProcPid)
+ continue;
+
+ /* Detach the injection point and unregister condition */
+ InjectionPointDetach(condition->name);
+ condition->name[0] = '\0';
+ condition->pid = 0;
+ }
+ SpinLockRelease(&inj_state->lock);
+}
+
/* Set of callbacks available to be attached to an injection point. */
void
injection_error(const char *name)
{
+ if (!injection_point_allowed(name))
+ return;
+
elog(ERROR, "error triggered for injection point %s", name);
}
void
injection_notice(const char *name)
{
+ if (!injection_point_allowed(name))
+ return;
+
elog(NOTICE, "notice triggered for injection point %s", name);
}
@@ -111,6 +212,9 @@ injection_wait(const char *name)
if (inj_state == NULL)
injection_init_shmem();
+ if (!injection_point_allowed(name))
+ return;
+
/*
* Use the injection point name for this custom wait event. Note that
* this custom wait event name is not released, but we don't care much for
@@ -182,6 +286,35 @@ injection_points_attach(PG_FUNCTION_ARGS)
InjectionPointAttach(name, "injection_points", function);
+ if (injection_point_local)
+ {
+ int index = -1;
+
+ /*
+ * Register runtime condition to link this injection point to the
+ * current process.
+ */
+ SpinLockAcquire(&inj_state->lock);
+ for (int i = 0; i < INJ_MAX_CONDITION; i++)
+ {
+ InjectionPointCondition *condition = &inj_state->condition[i];
+
+ if (condition->name[0] == '\0')
+ {
+ index = i;
+ strlcpy(condition->name, name, INJ_NAME_MAXLEN);
+ condition->pid = MyProcPid;
+ break;
+ }
+ }
+ SpinLockRelease(&inj_state->lock);
+
+ if (index < 0)
+ elog(FATAL,
+ "could not find free slot for condition of injection point %s",
+ name);
+ }
+
PG_RETURN_VOID();
}
@@ -235,6 +368,32 @@ injection_points_wakeup(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * injection_points_local
+ *
+ * Track if any injection point created in this process ought to run only
+ * in this process. Such injection points are detached automatically when
+ * this process exits. This is useful to make test suites concurrent-safe.
+ */
+PG_FUNCTION_INFO_V1(injection_points_local);
+Datum
+injection_points_local(PG_FUNCTION_ARGS)
+{
+ /* Enable flag to add a runtime condition based on this process ID */
+ injection_point_local = true;
+
+ if (inj_state == NULL)
+ injection_init_shmem();
+
+ /*
+ * Register a before_shmem_exit callback to remove any injection points
+ * linked to this process.
+ */
+ before_shmem_exit(injection_points_cleanup, (Datum) 0);
+
+ PG_RETURN_VOID();
+}
+
/*
* SQL function for dropping an injection point.
*/
@@ -246,5 +405,22 @@ injection_points_detach(PG_FUNCTION_ARGS)
InjectionPointDetach(name);
+ if (inj_state == NULL)
+ injection_init_shmem();
+
+ /* Clean up any conditions associated to this injection point */
+ SpinLockAcquire(&inj_state->lock);
+ for (int i = 0; i < INJ_MAX_CONDITION; i++)
+ {
+ InjectionPointCondition *condition = &inj_state->condition[i];
+
+ if (strcmp(condition->name, name) == 0)
+ {
+ condition->pid = 0;
+ condition->name[0] = '\0';
+ }
+ }
+ SpinLockRelease(&inj_state->lock);
+
PG_RETURN_VOID();
}
diff --git a/src/test/modules/injection_points/sql/injection_points.sql b/src/test/modules/injection_points/sql/injection_points.sql
index 23c7e435ad..27424bdfe7 100644
--- a/src/test/modules/injection_points/sql/injection_points.sql
+++ b/src/test/modules/injection_points/sql/injection_points.sql
@@ -30,5 +30,25 @@ SELECT injection_points_run('TestInjectionLog2'); -- notice
SELECT injection_points_detach('TestInjectionLog'); -- fails
SELECT injection_points_run('TestInjectionLog2'); -- notice
+SELECT injection_points_detach('TestInjectionLog2');
+
+-- Runtime conditions
+SELECT injection_points_attach('TestConditionError', 'error');
+-- Any follow-up injection point attached will be local to this process.
+SELECT injection_points_local();
+SELECT injection_points_attach('TestConditionLocal1', 'error');
+SELECT injection_points_attach('TestConditionLocal2', 'notice');
+SELECT injection_points_run('TestConditionLocal1'); -- error
+SELECT injection_points_run('TestConditionLocal2'); -- notice
+-- reload, local injection points should be gone.
+\c
+SELECT injection_points_run('TestConditionLocal1'); -- nothing
+SELECT injection_points_run('TestConditionLocal2'); -- nothing
+SELECT injection_points_run('TestConditionError'); -- error
+SELECT injection_points_detach('TestConditionError');
+-- Attaching injection points that use the same name as one defined locally
+-- previously should work.
+SELECT injection_points_attach('TestConditionLocal1', 'error');
+SELECT injection_points_detach('TestConditionLocal1');
DROP EXTENSION injection_points;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6ca93b1e47..d429d6035a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1210,6 +1210,7 @@ InitSampleScan_function
InitializeDSMForeignScan_function
InitializeWorkerForeignScan_function
InjectionPointCacheEntry
+InjectionPointCondition
InjectionPointEntry
InjectionPointSharedState
InlineCodeBlock
--
2.43.0
v2-0002-Switch-GIN-tests-with-injection-points-to-be-conc.patch (text/x-diff; charset=us-ascii)
From 8d7338f68b6add2b28acfebd7541ef5c21913264 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 18 Mar 2024 09:56:35 +0900
Subject: [PATCH v2 2/2] Switch GIN tests with injection points to be
concurrent-safe
---
src/test/modules/gin/Makefile | 3 ---
src/test/modules/gin/expected/gin_incomplete_splits.out | 7 +++++++
src/test/modules/gin/meson.build | 2 --
src/test/modules/gin/sql/gin_incomplete_splits.sql | 3 +++
4 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/src/test/modules/gin/Makefile b/src/test/modules/gin/Makefile
index da4c9cea5e..e007e38ac2 100644
--- a/src/test/modules/gin/Makefile
+++ b/src/test/modules/gin/Makefile
@@ -4,9 +4,6 @@ EXTRA_INSTALL = src/test/modules/injection_points
REGRESS = gin_incomplete_splits
-# The injection points are cluster-wide, so disable installcheck
-NO_INSTALLCHECK = 1
-
ifdef USE_PGXS
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/src/test/modules/gin/expected/gin_incomplete_splits.out b/src/test/modules/gin/expected/gin_incomplete_splits.out
index 822b78cffc..f503d3dd3c 100644
--- a/src/test/modules/gin/expected/gin_incomplete_splits.out
+++ b/src/test/modules/gin/expected/gin_incomplete_splits.out
@@ -12,6 +12,13 @@
-- This uses injection points to cause errors that leave some page
-- splits in "incomplete" state
create extension injection_points;
+-- Make all injection points local to this process, for concurrency.
+SELECT injection_points_local();
+ injection_points_local
+------------------------
+
+(1 row)
+
-- Use the index for all the queries
set enable_seqscan=off;
-- Print a NOTICE whenever an incomplete split gets fixed
diff --git a/src/test/modules/gin/meson.build b/src/test/modules/gin/meson.build
index 5ec0760a27..9734b51de2 100644
--- a/src/test/modules/gin/meson.build
+++ b/src/test/modules/gin/meson.build
@@ -12,7 +12,5 @@ tests += {
'sql': [
'gin_incomplete_splits',
],
- # The injection points are cluster-wide, so disable installcheck
- 'runningcheck': false,
},
}
diff --git a/src/test/modules/gin/sql/gin_incomplete_splits.sql b/src/test/modules/gin/sql/gin_incomplete_splits.sql
index 71253b71f3..f413b54708 100644
--- a/src/test/modules/gin/sql/gin_incomplete_splits.sql
+++ b/src/test/modules/gin/sql/gin_incomplete_splits.sql
@@ -14,6 +14,9 @@
-- splits in "incomplete" state
create extension injection_points;
+-- Make all injection points local to this process, for concurrency.
+SELECT injection_points_local();
+
-- Use the index for all the queries
set enable_seqscan=off;
--
2.43.0
On 18 Mar 2024, at 06:04, Michael Paquier <michael@paquier.xyz> wrote:
new function called injection_points_local() that can
be added on top of a SQL test to make it concurrent-safe.
Maybe consider a function injection_points_attach_local(‘point name’) instead of a static switch?
Or even injection_points_attach_global(‘point name’), while injection_points_attach(‘point name’) stays local? This would favour writing concurrent tests by default…
Best regards, Andrey Borodin.
On Mon, Mar 18, 2024 at 10:50:25AM +0500, Andrey M. Borodin wrote:
Maybe consider function injection_points_attach_local(‘point name’)
instead of static switch?
Or even injection_points_attach_global(‘point name’), while function
injection_points_attach(‘point name’) will be global? This would
favour writing concurrent test by default…
The point is to limit accidents like the one of this thread. So, for
cases already in the tree, not giving the point name in the SQL
function would be simple enough.
What you are suggesting can be simply done, as well, though I'd rather
wait for a reason to justify doing so.
--
Michael
On Mon, Mar 18, 2024 at 10:04:45AM +0900, Michael Paquier wrote:
Please find a patch to do exactly that, without touching the backend
APIs. 0001 adds a new function call injection_points_local() that can
be added on top of a SQL test to make it concurrent-safe. 0002 is the
fix for the GIN tests.I am going to add an open item to not forget about all that.
It's been a couple of weeks since this has been sent, and this did not
get any reviews. I'd still be happy with the simplicity of a single
injection_points_local() that can be used to link all the injection
points created in a single process to it, discarding them once the
process exits with a shmem exit callback. And I don't really see an
argument to tweak the backend-side routines, as well. Comments and/or
objections?
--
Michael
On 5 Apr 2024, at 07:19, Michael Paquier <michael@paquier.xyz> wrote:
It's been a couple of weeks since this has been sent, and this did not
get any reviews. I'd still be happy with the simplicity of a single
injection_points_local() that can be used to link all the injection
points created in a single process to it, discarding them once the
process exits with a shmem exit callback.
OK, makes sense.
I find the name of the function "injection_points_local()" strange, because there is no verb in the name. How about "injection_points_set_local"?
And I don't really see an
argument to tweak the backend-side routines, as well.
Comments and/or
objections?
I'm not sure if we should refactor anything here, but InjectionPointSharedState has singular name, plural wait_counts and singular condition.
InjectionPointSharedState already holds an array of injection points; maybe let's add an optional pid there instead of inventing a separate array of pids?
Can we set a global point to 'notice', but the same local one to 'wait'? It looks like we can't now, but allowing that would make the code simpler.
Besides this opportunity to simplify stuff, both patches look good to me.
Best regards, Andrey Borodin.
On Sat, Apr 06, 2024 at 10:34:46AM +0500, Andrey M. Borodin wrote:
I find name of the function "injection_points_local()" strange,
because there is no verb in the name. How about
"injection_points_set_local"?
That makes sense.
I'm not sure if we should refactor anything here, but
InjectionPointSharedState has singular name, plural wait_counts and
singular condition.
InjectionPointSharedState is already an array of injection points,
maybe let's add there optional pid instead of inventing separate
array of pids?
Perhaps we could unify these two concepts, indeed, with a "kind" added
to InjectionPointCondition. Now waits/wakeups are a different beast
than the conditions that could be assigned to a point to filter if it
should be executed. More runtime conditions that immediately come to
mind, which could be added to this structure, relate mostly to global
objects, like:
- Specific database name and/or OID.
- Specific role(s).
So that's mostly cross-checking states coming from miscadmin.h for
now.
Can we set global point to 'notice', but same local to 'wait'? Looks
like now we can't, but allowing to do so would make code simpler.
You mean using the same point name with more than one callback? Not
sure we'd want to be able to do that. Perhaps you're right, though,
if there is a use case that justifies it.
Besides this opportunity to simplify stuff, both patches look good
to me.
Yeah, this module can be always tweaked more if necessary. Saying
that, naming the new thing "condition" in InjectionPointSharedState
felt strange, as you said, because it is an array of multiple
conditions.
For now I have applied 997db123c054 to make the GIN tests with
injection points repeatable as it was an independent issue, and
f587338dec87 to add the local function pieces.
Attached is the last piece to switch the GIN test to use local
injection points. 85f65d7a26fc should keep the buildfarm at bay,
but I'd rather not take a bet and accidentally break the buildfarm, as
that would impact folks who aim at getting patches committed just before
the finish line. So I am holding off on this one for a few more days
until we're past the freeze and the buildfarm is more stable.
--
Michael
Attachments:
v3-0002-Switch-GIN-tests-with-injection-points-to-be-conc.patch (text/x-diff; charset=us-ascii)
From bc2ce072cad0efa94084297330db3102ee346c25 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 8 Apr 2024 10:20:19 +0900
Subject: [PATCH v3] Switch GIN tests with injection points to be
concurrent-safe
---
src/test/modules/gin/Makefile | 3 ---
src/test/modules/gin/expected/gin_incomplete_splits.out | 7 +++++++
src/test/modules/gin/meson.build | 2 --
src/test/modules/gin/sql/gin_incomplete_splits.sql | 3 +++
4 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/src/test/modules/gin/Makefile b/src/test/modules/gin/Makefile
index da4c9cea5e..e007e38ac2 100644
--- a/src/test/modules/gin/Makefile
+++ b/src/test/modules/gin/Makefile
@@ -4,9 +4,6 @@ EXTRA_INSTALL = src/test/modules/injection_points
REGRESS = gin_incomplete_splits
-# The injection points are cluster-wide, so disable installcheck
-NO_INSTALLCHECK = 1
-
ifdef USE_PGXS
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/src/test/modules/gin/expected/gin_incomplete_splits.out b/src/test/modules/gin/expected/gin_incomplete_splits.out
index 973a8ce6c8..15574e547a 100644
--- a/src/test/modules/gin/expected/gin_incomplete_splits.out
+++ b/src/test/modules/gin/expected/gin_incomplete_splits.out
@@ -12,6 +12,13 @@
-- This uses injection points to cause errors that leave some page
-- splits in "incomplete" state
create extension injection_points;
+-- Make all injection points local to this process, for concurrency.
+SELECT injection_points_set_local();
+ injection_points_set_local
+----------------------------
+
+(1 row)
+
-- Use the index for all the queries
set enable_seqscan=off;
-- Print a NOTICE whenever an incomplete split gets fixed
diff --git a/src/test/modules/gin/meson.build b/src/test/modules/gin/meson.build
index 5ec0760a27..9734b51de2 100644
--- a/src/test/modules/gin/meson.build
+++ b/src/test/modules/gin/meson.build
@@ -12,7 +12,5 @@ tests += {
'sql': [
'gin_incomplete_splits',
],
- # The injection points are cluster-wide, so disable installcheck
- 'runningcheck': false,
},
}
diff --git a/src/test/modules/gin/sql/gin_incomplete_splits.sql b/src/test/modules/gin/sql/gin_incomplete_splits.sql
index ea3667b38d..ebf0f620f0 100644
--- a/src/test/modules/gin/sql/gin_incomplete_splits.sql
+++ b/src/test/modules/gin/sql/gin_incomplete_splits.sql
@@ -14,6 +14,9 @@
-- splits in "incomplete" state
create extension injection_points;
+-- Make all injection points local to this process, for concurrency.
+SELECT injection_points_set_local();
+
-- Use the index for all the queries
set enable_seqscan=off;
--
2.43.0
On Mon, Apr 08, 2024 at 10:22:40AM +0900, Michael Paquier wrote:
For now I have applied 997db123c054 to make the GIN tests with
injection points repeatable as it was an independent issue, and
f587338dec87 to add the local function pieces.
Bharath has reported to me offlist that one of the new tests has a race
condition when doing the reconnection. When the backend creating the
local points is very slow to exit, the backend created after the
reconnection may detect that a local point previously created still
exists, causing a failure. The failure can be reproduced with a sleep
in the shmem exit callback, like:
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -163,6 +163,8 @@ injection_points_cleanup(int code, Datum arg)
if (!injection_point_local)
return;
+ pg_usleep(1000000 * 1L);
+
SpinLockAcquire(&inj_state->lock);
for (int i = 0; i < INJ_MAX_CONDITION; i++)
{
At first I was looking at a loop with a scan of pg_stat_activity, but
I've noticed that regress.so includes a wait_pid() that we can use to
make sure that a given process exits before moving on to the next
parts of a test, so I propose to just reuse that here. This requires
tweaks with --dlpath for meson and ./configure, nothing new. The CI
is clean. Patch attached.
Thoughts?
--
Michael
Attachments:
0001-Stabilize-injection-point-test.patch (text/x-diff; charset=us-ascii)
From 8f43807000c1f33a0238d7bbcc148a196e4134e4 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 8 Apr 2024 16:28:17 +0900
Subject: [PATCH] Stabilize injection point test
---
src/test/modules/injection_points/Makefile | 1 +
.../expected/injection_points.out | 16 ++++++++++++++++
src/test/modules/injection_points/meson.build | 1 +
.../injection_points/sql/injection_points.sql | 15 +++++++++++++++
4 files changed, 33 insertions(+)
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index 1cb395c37a..31bd787994 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -7,6 +7,7 @@ DATA = injection_points--1.0.sql
PGFILEDESC = "injection_points - facility for injection points"
REGRESS = injection_points
+REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
# The injection points are cluster-wide, so disable installcheck
NO_INSTALLCHECK = 1
diff --git a/src/test/modules/injection_points/expected/injection_points.out b/src/test/modules/injection_points/expected/injection_points.out
index 3d94016ac9..d2a69b54e8 100644
--- a/src/test/modules/injection_points/expected/injection_points.out
+++ b/src/test/modules/injection_points/expected/injection_points.out
@@ -1,4 +1,11 @@
CREATE EXTENSION injection_points;
+\getenv libdir PG_LIBDIR
+\getenv dlsuffix PG_DLSUFFIX
+\set regresslib :libdir '/regress' :dlsuffix
+CREATE FUNCTION wait_pid(int)
+ RETURNS void
+ AS :'regresslib'
+ LANGUAGE C STRICT;
SELECT injection_points_attach('TestInjectionBooh', 'booh');
ERROR: incorrect action "booh" for injection point creation
SELECT injection_points_attach('TestInjectionError', 'error');
@@ -156,8 +163,17 @@ NOTICE: notice triggered for injection point TestConditionLocal2
(1 row)
+select pg_backend_pid() as oldpid \gset
-- reload, local injection points should be gone.
\c
+-- Wait for the previous backend process to exit, ensuring that its local
+-- injection points are cleaned up.
+SELECT wait_pid(:'oldpid');
+ wait_pid
+----------
+
+(1 row)
+
SELECT injection_points_run('TestConditionLocal1'); -- nothing
injection_points_run
----------------------
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index a29217f75f..8e1b5b4539 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -33,6 +33,7 @@ tests += {
'sql': [
'injection_points',
],
+ 'regress_args': ['--dlpath', meson.build_root() / 'src/test/regress'],
# The injection points are cluster-wide, so disable installcheck
'runningcheck': false,
},
diff --git a/src/test/modules/injection_points/sql/injection_points.sql b/src/test/modules/injection_points/sql/injection_points.sql
index 2aa76a542b..e18146eb8c 100644
--- a/src/test/modules/injection_points/sql/injection_points.sql
+++ b/src/test/modules/injection_points/sql/injection_points.sql
@@ -1,5 +1,14 @@
CREATE EXTENSION injection_points;
+\getenv libdir PG_LIBDIR
+\getenv dlsuffix PG_DLSUFFIX
+\set regresslib :libdir '/regress' :dlsuffix
+
+CREATE FUNCTION wait_pid(int)
+ RETURNS void
+ AS :'regresslib'
+ LANGUAGE C STRICT;
+
SELECT injection_points_attach('TestInjectionBooh', 'booh');
SELECT injection_points_attach('TestInjectionError', 'error');
SELECT injection_points_attach('TestInjectionLog', 'notice');
@@ -40,8 +49,14 @@ SELECT injection_points_attach('TestConditionLocal1', 'error');
SELECT injection_points_attach('TestConditionLocal2', 'notice');
SELECT injection_points_run('TestConditionLocal1'); -- error
SELECT injection_points_run('TestConditionLocal2'); -- notice
+
+select pg_backend_pid() as oldpid \gset
+
-- reload, local injection points should be gone.
\c
+-- Wait for the previous backend process to exit, ensuring that its local
+-- injection points are cleaned up.
+SELECT wait_pid(:'oldpid');
SELECT injection_points_run('TestConditionLocal1'); -- nothing
SELECT injection_points_run('TestConditionLocal2'); -- nothing
SELECT injection_points_run('TestConditionError'); -- error
--
2.43.0
On 8 Apr 2024, at 10:33, Michael Paquier <michael@paquier.xyz> wrote:
Thoughts?
As an alternative we can make local injection points mutually exclusive.
Best regards, Andrey Borodin.
On Mon, Apr 08, 2024 at 10:42:08AM +0300, Andrey M. Borodin wrote:
As an alternative we can make local injection points mutually exclusive.
Sure. Now, the point of the test is to make sure that the local
cleanup happens, so I'd rather keep it as-is and use the same names
across reloads.
--
Michael
On 8 Apr 2024, at 11:55, Michael Paquier <michael@paquier.xyz> wrote:
the point of the test is to make sure that the local
cleanup happens
Uh, I did not understand this, because the commit message was about stabilizing tests, not extending coverage.
Also, should we drop the function wait_pid() at the end of the test?
Given that tweaks with --dlpath are nothing new, I think the patch looks good.
Best regards, Andrey Borodin.
On Mon, Apr 08, 2024 at 12:29:43PM +0300, Andrey M. Borodin wrote:
On 8 Apr 2024, at 11:55, Michael Paquier <michael@paquier.xyz> wrote:
Uh, I did not understand this, because the commit message was about
stabilizing tests, not extending coverage.
Okay, it is about stabilizing an existing test.
Also, should we drop function wait_pid() at the end of a test?
Sure.
Given that tweaks with --dlpath are nothing new, I think the patch looks good.
Applied that after a second check. And thanks to Bharath for the
poke.
--
Michael
On Tue, Apr 09, 2024 at 12:41:57PM +0900, Michael Paquier wrote:
Applied that after a second check. And thanks to Bharath for the
poke.
And now that the buildfarm is cooler, I've also applied the final
patch in the series as of 5105c9079681 to make the GIN module
concurrent-safe using injection_points_set_local().
--
Michael
While writing an injection point test, I encountered a variant of the race
condition that f4083c4 fixed. It had three sessions and this sequence of
events:
s1: local-attach to POINT
s2: enter InjectionPointRun(POINT), yield CPU just before injection_callback()
s3: detach POINT, deleting the InjectionPointCondition record
s2: wake up and run POINT as though it had been non-local
On Sat, Mar 16, 2024 at 08:40:21AM +0900, Michael Paquier wrote:
On Fri, Mar 15, 2024 at 11:23:31AM +0200, Heikki Linnakangas wrote:
Wrt. the spinlock and shared memory handling, I think this would be simpler
if you could pass some payload in the InjectionPointAttach() call, which
would be passed back to the callback function:
In this case, the payload would be the "slot index" in shared memory.
Or perhaps always allocate, say, 1024 bytes of working area for every
attached injection point that the test module can use any way it wants. Like
for storing extra conditions, or for the wakeup counter stuff in
injection_wait(). A fixed size working area is a little crude, but would be
very handy in practice.
That would be one good way to solve it. (Storing a slot index has the same
race condition, but it fixes the race to store a struct containing the PID.)
The best alternative I see is to keep an InjectionPointCondition forever after
creating it. Give it a "bool valid" field that we set on detach. I don't see
a major reason to prefer one of these over the other. One puts a negligible
amount of memory pressure on the main segment, but it simplifies the module
code. I lean toward the "1024 bytes of working area" idea. Other ideas or
opinions?
Separately, injection_points_cleanup() breaks the rules by calling
InjectionPointDetach() while holding a spinlock. The latter has an
elog(ERROR), and reaching that elog(ERROR) leaves a stuck spinlock. I haven't
given as much thought to solutions for this one.
Thanks,
nm
On Wed, May 01, 2024 at 04:12:14PM -0700, Noah Misch wrote:
While writing an injection point test, I encountered a variant of the race
condition that f4083c4 fixed. It had three sessions and this sequence of
events:
s1: local-attach to POINT
s2: enter InjectionPointRun(POINT), yield CPU just before injection_callback()
s3: detach POINT, deleting the InjectionPointCondition record
s2: wake up and run POINT as though it had been non-local
Fun. One thing I would ask is why it makes sense to be able to detach
a local point from a different session than the one who defined it as
local. Shouldn't the operation of s3 be restricted rather than
authorized as a safety measure, instead?
On Sat, Mar 16, 2024 at 08:40:21AM +0900, Michael Paquier wrote:
On Fri, Mar 15, 2024 at 11:23:31AM +0200, Heikki Linnakangas wrote:
Wrt. the spinlock and shared memory handling, I think this would be simpler
if you could pass some payload in the InjectionPointAttach() call, which
would be passed back to the callback function:
In this case, the payload would be the "slot index" in shared memory.
Or perhaps always allocate, say, 1024 bytes of working area for every
attached injection point that the test module can use any way it wants. Like
for storing extra conditions, or for the wakeup counter stuff in
injection_wait(). A fixed size working area is a little crude, but would be
very handy in practice.
That would be one good way to solve it. (Storing a slot index has the same
race condition, but it fixes the race to store a struct containing the PID.)
The best alternative I see is to keep an InjectionPointCondition forever after
creating it. Give it a "bool valid" field that we set on detach. I don't see
a major reason to prefer one of these over the other. One puts a negligible
amount of memory pressure on the main segment, but it simplifies the module
code. I lean toward the "1024 bytes of working area" idea. Other ideas or
opinions?
If more state data is needed, a fixed area in injection_point.c would
be better. Still, I am not sure that this is required here, either.
Separately, injection_points_cleanup() breaks the rules by calling
InjectionPointDetach() while holding a spinlock. The latter has an
elog(ERROR), and reaching that elog(ERROR) leaves a stuck spinlock. I haven't
given as much thought to solutions for this one.
Indeed. That's a brain fade. This one could be fixed by collecting
the point names when cleaning up the conditions and detaching them after
releasing the spinlock. This opens a race condition between the
moment when the spinlock is released and the detach, where another
backend could come in and detach a point before the shmem_exit
callback has the time to do its cleanup, even if detach() is
restricted for local points. So we could do the callback cleanup in
three steps in the shmem exit callback:
- Collect the names of the points to detach, while holding the
spinlock.
- Do the Detach.
- Take again the spinlock, clean up the conditions.
Please see the attached.
--
Michael
Attachments:
injection-point-detach.patch (text/x-diff; charset=us-ascii)
diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c
index ace386da50..caf58ef9db 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -159,10 +159,39 @@ injection_point_allowed(const char *name)
static void
injection_points_cleanup(int code, Datum arg)
{
+ char names[INJ_MAX_CONDITION][INJ_NAME_MAXLEN] = {0};
+ int count = 0;
+
/* Leave if nothing is tracked locally */
if (!injection_point_local)
return;
+ /*
+ * This is done in three steps: detect the points to detach,
+ * detach them and release their conditions.
+ */
+ SpinLockAcquire(&inj_state->lock);
+ for (int i = 0; i < INJ_MAX_CONDITION; i++)
+ {
+ InjectionPointCondition *condition = &inj_state->conditions[i];
+
+ if (condition->name[0] == '\0')
+ continue;
+
+ if (condition->pid != MyProcPid)
+ continue;
+
+ /* Extract the point name to detach */
+ strlcpy(names[count], condition->name, INJ_NAME_MAXLEN);
+ count++;
+ }
+ SpinLockRelease(&inj_state->lock);
+
+ /* Detach, without holding the spinlock */
+ for (int i = 0; i < count; i++)
+ InjectionPointDetach(names[i]);
+
+ /* Clear all the conditions */
SpinLockAcquire(&inj_state->lock);
for (int i = 0; i < INJ_MAX_CONDITION; i++)
{
@@ -174,8 +203,6 @@ injection_points_cleanup(int code, Datum arg)
if (condition->pid != MyProcPid)
continue;
- /* Detach the injection point and unregister condition */
- InjectionPointDetach(condition->name);
condition->name[0] = '\0';
condition->pid = 0;
}
@@ -403,6 +430,10 @@ injection_points_detach(PG_FUNCTION_ARGS)
{
char *name = text_to_cstring(PG_GETARG_TEXT_PP(0));
+ if (!injection_point_allowed(name))
+ elog(ERROR, "cannot detach injection point \"%s\" not allowed to run",
+ name);
+
InjectionPointDetach(name);
if (inj_state == NULL)
On 2 May 2024, at 12:27, Michael Paquier <michael@paquier.xyz> wrote:
On Wed, May 01, 2024 at 04:12:14PM -0700, Noah Misch wrote:
While writing an injection point test, I encountered a variant of the race
condition that f4083c4 fixed. It had three sessions and this sequence of
events:
s1: local-attach to POINT
s2: enter InjectionPointRun(POINT), yield CPU just before injection_callback()
s3: detach POINT, deleting the InjectionPointCondition record
s2: wake up and run POINT as though it had been non-local
Fun. One thing I would ask is why it makes sense to be able to detach
a local point from a different session than the one who defined it as
local. Shouldn't the operation of s3 be restricted rather than
authorized as a safety measure, instead?
That seems to prevent a meaningful use case. If we want exactly one session to be waiting just before some specific point, the only way to achieve this is to create a local injection point. But the session must be resumable from another session.
Without this, local waiting injection points are meaningless.
Best regards, Andrey Borodin.
On Thu, May 02, 2024 at 01:33:45PM +0500, Andrey M. Borodin wrote:
That seems to prevent a meaningful use case. If we want exactly one
session to be waiting just before some specific point, the only way
to achieve this is to create a local injection point. But the session
must be resumable from another session.
Without this, local waiting injection points are meaningless.
I am not quite sure I follow your argument here. It is still
possible to attach a local injection point with a wait callback that
can be awakened by a different backend:
s1: select injection_points_set_local();
s1: select injection_points_attach('popo', 'wait');
s1: select injection_points_run('popo'); -- waits
s2: select injection_points_wakeup('popo');
s1: -- ready for action.
A detach is not a wakeup.
--
Michael
On 2 May 2024, at 13:43, Michael Paquier <michael@paquier.xyz> wrote:
A detach is not a wakeup.
Oh, now I see. Sorry for the noise.
Detaching a local injection point of another backend seems useless and could be forbidden.
As far as I understand, your patch is already doing this in
+ if (!injection_point_allowed(name))
+ elog(ERROR, "cannot detach injection point \"%s\" not allowed to run",
+ name);
+
As far as I understand, this will effectively forbid calling injection_points_detach() for a local injection point of another backend. Do I get it right?
Best regards, Andrey Borodin.
On Thu, May 02, 2024 at 04:27:12PM +0900, Michael Paquier wrote:
On Wed, May 01, 2024 at 04:12:14PM -0700, Noah Misch wrote:
While writing an injection point test, I encountered a variant of the race
condition that f4083c4 fixed. It had three sessions and this sequence of
events:
s1: local-attach to POINT
s2: enter InjectionPointRun(POINT), yield CPU just before injection_callback()
s3: detach POINT, deleting the InjectionPointCondition record
s2: wake up and run POINT as though it had been non-local
I should have given a simpler example:
s1: local-attach to POINT
s2: enter InjectionPointRun(POINT), yield CPU just before injection_callback()
s1: exit
s2: wake up and run POINT as though it had been non-local
Fun. One thing I would ask is why it makes sense to be able to detach
a local point from a different session than the one who defined it as
local. Shouldn't the operation of s3 be restricted rather than
authorized as a safety measure, instead?
(That's orthogonal to the race condition.) When s1 would wait at the
injection point multiple times in one SQL statement, I like issuing the detach
from s3 so s1 waits at just the first encounter with the injection point.
This mimics setting a gdb breakpoint and deleting that breakpoint before
"continue". The alternative, waking s1 repeatedly until it finishes the SQL
statement, is less convenient. (I also patched _detach() to wake the waiter,
and I plan to propose that.)
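For illustration, a rough sketch of that flow with a made-up point name 'some-point', using the module's existing SQL functions (with the _detach() change proposed above, the explicit wakeup would no longer be needed):
s1: SELECT injection_points_set_local();
s1: SELECT injection_points_attach('some-point', 'wait');
s1: <statement that reaches 'some-point' several times>  -- blocks at the first hit
s3: SELECT injection_points_detach('some-point');  -- like deleting the gdb breakpoint
s3: SELECT injection_points_wakeup('some-point');  -- release s1; later hits no longer wait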
Separately, injection_points_cleanup() breaks the rules by calling
InjectionPointDetach() while holding a spinlock. The latter has an
elog(ERROR), and reaching that elog(ERROR) leaves a stuck spinlock. I haven't
given as much thought to solutions for this one.
Indeed. That's a brain fade. This one could be fixed by collecting
the point names when cleaning up the conditions and detach after
releasing the spinlock. This opens a race condition between the
moment when the spinlock is released and the detach, where another
backend could come in and detach a point before the shmem_exit
callback has the time to do its cleanup, even if detach() is
restricted for local points. So we could do the callback cleanup in
That race condition seems fine. The test can be expected to control the
timing of backend exits vs. detach calls. Unlike the InjectionPointRun()
race, it wouldn't affect backends unrelated to the test.
three steps in the shmem exit callback:
- Collect the names of the points to detach, while holding the
spinlock.
- Do the Detach.
- Take again the spinlock, clean up the conditions.
Please see the attached.
The injection_points_cleanup() parts look good. Thanks.
@@ -403,6 +430,10 @@ injection_points_detach(PG_FUNCTION_ARGS)
{
char *name = text_to_cstring(PG_GETARG_TEXT_PP(0));
+ if (!injection_point_allowed(name))
+ elog(ERROR, "cannot detach injection point \"%s\" not allowed to run",
+ name);
+
As above, I disagree with the injection_points_detach() part.
On Thu, May 02, 2024 at 01:52:20PM +0500, Andrey M. Borodin wrote:
As far as I understand, this will effectively forbid calling
injection_points_detach() for a local injection point of another
backend. Do I get it right?
Yes, that would be the intention. Noah has other use cases in mind
with this interface that I did not think about supporting, hence he's
objecting to this restriction.
--
Michael
On Thu, May 02, 2024 at 12:35:55PM -0700, Noah Misch wrote:
I should have given a simpler example:
s1: local-attach to POINT
s2: enter InjectionPointRun(POINT), yield CPU just before injection_callback()
s1: exit
s2: wake up and run POINT as though it had been non-local
Hmm. Even if you were to emulate that in a controlled manner, you
would need a second injection point that does a wait in s2, which is
something that would happen before injection_callback() and before
scanning the local entry. This relies on the fact that we're holding
CPU in s2 between the backend shmem hash table lookup and the callback
being called.
Fun. One thing I would ask is why it makes sense to be able to detach
a local point from a different session than the one who defined it as
local. Shouldn't the operation of s3 be restricted rather than
authorized as a safety measure, instead?
(That's orthogonal to the race condition.) When s1 would wait at the
injection point multiple times in one SQL statement, I like issuing the detach
from s3 so s1 waits at just the first encounter with the injection point.
This mimics setting a gdb breakpoint and deleting that breakpoint before
"continue". The alternative, waking s1 repeatedly until it finishes the SQL
statement, is less convenient. (I also patched _detach() to wake the waiter,
and I plan to propose that.)
Okay.
Indeed. That's a brain fade. This one could be fixed by collecting
the point names when cleaning up the conditions and detach after
releasing the spinlock. This opens a race condition between the
moment when the spinlock is released and the detach, where another
backend could come in and detach a point before the shmem_exit
callback has the time to do its cleanup, even if detach() is
restricted for local points. So we could do the callback cleanup in
That race condition seems fine. The test can be expected to control the
timing of backend exits vs. detach calls. Unlike the InjectionPointRun()
race, it wouldn't affect backends unrelated to the test.
Sure. The fact that there are two spinlocks in the backend code and
the module opens concurrency issues. Making that more robust if there
is a case for it is OK by me, but I'd rather avoid making the
backend-side more complicated than need be.
three steps in the shmem exit callback:
- Collect the names of the points to detach, while holding the
spinlock.
- Do the Detach.
- Take again the spinlock, clean up the conditions.
Please see the attached.
The injection_points_cleanup() parts look good. Thanks.
Thanks for the check.
@@ -403,6 +430,10 @@ injection_points_detach(PG_FUNCTION_ARGS)
{
char *name = text_to_cstring(PG_GETARG_TEXT_PP(0));
+ if (!injection_point_allowed(name))
+ elog(ERROR, "cannot detach injection point \"%s\" not allowed to run",
+ name);
+
As above, I disagree with the injection_points_detach() part.
Okay, noted. Fine by me to expand this stuff as you see fit; the code
has been written to be extended depending on what people want to
support. There should be tests in the tree that rely on any
new behavior, though.
I've applied the patch to fix the spinlock logic in the exit callback
for now.
--
Michael
On Mon, May 06, 2024 at 10:03:37AM +0900, Michael Paquier wrote:
On Thu, May 02, 2024 at 12:35:55PM -0700, Noah Misch wrote:
I should have given a simpler example:
s1: local-attach to POINT
s2: enter InjectionPointRun(POINT), yield CPU just before injection_callback()
s1: exit
s2: wake up and run POINT as though it had been non-local
Here's how I've patched it locally. It does avoid changing the backend-side,
which has some attraction. Shall I just push this?
Hmm. Even if you were to emulate that in a controlled manner, you
would need a second injection point that does a wait in s2, which is
something that would happen before injection_callback() and before
scanning the local entry. This relies on the fact that we're holding
CPU in s2 between the backend shmem hash table lookup and the callback
being called.
Right. We would need "second-level injection points" to write a test for that
race in the injection point system.
Attachments:
inplace030-inj-exit-race-v1.patch (text/plain; charset=us-ascii)
Author: Noah Misch <noah@leadboat.com>
Commit: Noah Misch <noah@leadboat.com>
Fix race condition in backend exit after injection_points_set_local().
Symptoms were like those prompting commit
f587338dec87d3c35b025e131c5977930ac69077. That is, under installcheck,
backends from unrelated tests could run the injection points.
Reviewed by FIXME.
Discussion: https://postgr.es/m/FIXME
diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c
index ace386d..75466d3 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -37,7 +37,13 @@ PG_MODULE_MAGIC;
/*
* Conditions related to injection points. This tracks in shared memory the
- * runtime conditions under which an injection point is allowed to run.
+ * runtime conditions under which an injection point is allowed to run. Once
+ * set, a name is never changed. That avoids a race condition:
+ *
+ * s1: local-attach to POINT
+ * s2: yield CPU before InjectionPointRun(POINT) calls injection_callback()
+ * s1: exit()
+ * s2: run POINT as though it had been non-local
*
* If more types of runtime conditions need to be tracked, this structure
* should be expanded.
@@ -49,6 +55,9 @@ typedef struct InjectionPointCondition
/* ID of the process where the injection point is allowed to run */
int pid;
+
+ /* Should "pid" run this injection point, or not? */
+ bool valid;
} InjectionPointCondition;
/* Shared state information for injection points. */
@@ -133,7 +142,7 @@ injection_point_allowed(const char *name)
{
InjectionPointCondition *condition = &inj_state->conditions[i];
- if (strcmp(condition->name, name) == 0)
+ if (condition->valid && strcmp(condition->name, name) == 0)
{
/*
* Check if this injection point is allowed to run in this
@@ -168,15 +177,16 @@ injection_points_cleanup(int code, Datum arg)
{
InjectionPointCondition *condition = &inj_state->conditions[i];
- if (condition->name[0] == '\0')
+ if (!condition->valid)
continue;
+ Assert(condition->name[0] != '\0');
if (condition->pid != MyProcPid)
continue;
/* Detach the injection point and unregister condition */
InjectionPointDetach(condition->name);
- condition->name[0] = '\0';
+ condition->valid = false;
condition->pid = 0;
}
SpinLockRelease(&inj_state->lock);
@@ -299,11 +309,13 @@ injection_points_attach(PG_FUNCTION_ARGS)
{
InjectionPointCondition *condition = &inj_state->conditions[i];
- if (condition->name[0] == '\0')
+ if (strcmp(condition->name, name) == 0 ||
+ condition->name[0] == '\0')
{
index = i;
strlcpy(condition->name, name, INJ_NAME_MAXLEN);
condition->pid = MyProcPid;
+ condition->valid = true;
break;
}
}
@@ -416,8 +428,8 @@ injection_points_detach(PG_FUNCTION_ARGS)
if (strcmp(condition->name, name) == 0)
{
+ condition->valid = false;
condition->pid = 0;
- condition->name[0] = '\0';
}
}
SpinLockRelease(&inj_state->lock);
On Mon, May 06, 2024 at 02:23:24PM -0700, Noah Misch wrote:
Here's how I've patched it locally. It does avoid changing the backend-side,
which has some attraction. Shall I just push this?
It looks like you did not rebase on top of HEAD to avoid the spinlock
taken with InjectionPointDetach() in the shmem callback. I think that
you'd mean the attached, once rebased (apologies I've reset the author
field).
+ * s1: local-attach to POINT
+ * s2: yield CPU before InjectionPointRun(POINT) calls injection_callback()
+ * s1: exit()
+ * s2: run POINT as though it had been non-local
I see. So you are taking a shortcut in the shape of never resetting
the name of a condition, so that the point at step 4 can check the
runtime condition against an entry still stored even though the point
has been detached and removed from the hash table.
if (strcmp(condition->name, name) == 0)
{
+ condition->valid = false;
condition->pid = 0;
- condition->name[0] = '\0';
}
}
As of HEAD, we rely on InjectionPointCondition->name to be set to
check if a condition is valid. Your patch adds a different variable
to do mostly the same thing, and never clears the name in any existing
conditions. A side effect is that this causes the conditions to pile
up on a running server when running installcheck, and assuming that
many test suites are run on a server left running, this could cause
spurious failures when failing to find a new slot. Always resetting
condition->name when detaching a point is a simpler flow and saner
IMO.
Overall, this switches from one detach behavior to a different one,
which may or may not be intuitive depending on what one is looking
for. FWIW, I see InjectionPointCondition as something that should be
around as long as its injection point exists, with the condition
entirely gone once the point is detached because it should not exist
anymore on the server running, with no information left in shmem.
Through your patch, you make conditions have a different meaning, with
a mix of "local" definition, but it is kind of permanent as it keeps a
trace of the point's name in shmem. I find the behavior of the patch
less intuitive. Perhaps it would be interesting to see first the bug
and/or problem you are trying to tackle with this different behavior
as I feel like we could do something even with the module as-is. As
far as I understand, the implementation of the module on HEAD allows
one to emulate a breakpoint with a wait/wake, which can avoid the
window mentioned in step 2. Even if a wait point is detached
concurrently, it can be awakened, with its traces in shmem removed.
--
Michael
Attachments:
v2-0001-Fix-race-condition-in-backend-exit-after-injectio.patch (text/x-diff; charset=us-ascii)
From 9cf38b2f6dadd5b4bb81fe5ccea350d259e6a241 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 7 May 2024 10:15:28 +0900
Subject: [PATCH v2] Fix race condition in backend exit after
injection_points_set_local().
Symptoms were like those prompting commit
f587338dec87d3c35b025e131c5977930ac69077. That is, under installcheck,
backends from unrelated tests could run the injection points.
Reviewed by FIXME.
Discussion: https://postgr.es/m/FIXME
---
.../injection_points/injection_points.c | 27 ++++++++++++++-----
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c
index a74a4a28af..8c9a3ebd74 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -37,7 +37,13 @@ PG_MODULE_MAGIC;
/*
* Conditions related to injection points. This tracks in shared memory the
- * runtime conditions under which an injection point is allowed to run.
+ * runtime conditions under which an injection point is allowed to run. Once
+ * set, a name is never changed. That avoids a race condition:
+ *
+ * s1: local-attach to POINT
+ * s2: yield CPU before InjectionPointRun(POINT) calls injection_callback()
+ * s1: exit()
+ * s2: run POINT as though it had been non-local
*
* If more types of runtime conditions need to be tracked, this structure
* should be expanded.
@@ -49,6 +55,9 @@ typedef struct InjectionPointCondition
/* ID of the process where the injection point is allowed to run */
int pid;
+
+ /* Should "pid" run this injection point, or not? */
+ bool valid;
} InjectionPointCondition;
/* Shared state information for injection points. */
@@ -133,7 +142,7 @@ injection_point_allowed(const char *name)
{
InjectionPointCondition *condition = &inj_state->conditions[i];
- if (strcmp(condition->name, name) == 0)
+ if (condition->valid && strcmp(condition->name, name) == 0)
{
/*
* Check if this injection point is allowed to run in this
@@ -175,9 +184,11 @@ injection_points_cleanup(int code, Datum arg)
{
InjectionPointCondition *condition = &inj_state->conditions[i];
- if (condition->name[0] == '\0')
+ if (!condition->valid)
continue;
+ Assert(condition->name[0] != '\0');
+
if (condition->pid != MyProcPid)
continue;
@@ -197,14 +208,14 @@ injection_points_cleanup(int code, Datum arg)
{
InjectionPointCondition *condition = &inj_state->conditions[i];
- if (condition->name[0] == '\0')
+ if (condition->valid)
continue;
if (condition->pid != MyProcPid)
continue;
- condition->name[0] = '\0';
condition->pid = 0;
+ condition->valid = false;
}
SpinLockRelease(&inj_state->lock);
}
@@ -326,11 +337,13 @@ injection_points_attach(PG_FUNCTION_ARGS)
{
InjectionPointCondition *condition = &inj_state->conditions[i];
- if (condition->name[0] == '\0')
+ if (strcmp(condition->name, name) == 0 ||
+ condition->name[0] == '\0')
{
index = i;
strlcpy(condition->name, name, INJ_NAME_MAXLEN);
condition->pid = MyProcPid;
+ condition->valid = true;
break;
}
}
@@ -444,7 +457,7 @@ injection_points_detach(PG_FUNCTION_ARGS)
if (strcmp(condition->name, name) == 0)
{
condition->pid = 0;
- condition->name[0] = '\0';
+ condition->valid = false;
}
}
SpinLockRelease(&inj_state->lock);
--
2.43.0
On Tue, May 07, 2024 at 10:17:49AM +0900, Michael Paquier wrote:
On Mon, May 06, 2024 at 02:23:24PM -0700, Noah Misch wrote:
Here's how I've patched it locally. It does avoid changing the backend-side,
which has some attraction. Shall I just push this?It looks like you did not rebase on top of HEAD
Yes, the base was 713cfaf (Sunday).
A side effect is that this causes the conditions to pile
up on a running server when running installcheck, and assuming that
many test suites are run on a server left running this could cause
spurious failures when failing to find a new slot.
Yes, we'd be raising INJ_MAX_CONDITION more often under this approach.
Always resetting
condition->name when detaching a point is a simpler flow and saner
IMO.
Overall, this switches from one detach behavior to a different one,
Can you say more about that? The only behavior change known to me is that a
given injection point workload uses more of INJ_MAX_CONDITION. If there's
another behavior change, it was likely unintended.
which may or may not be intuitive depending on what one is looking
for. FWIW, I see InjectionPointCondition as something that should be
around as long as its injection point exists, with the condition
entirely gone once the point is detached because it should not exist
anymore on the server running, with no information left in shmem.
Through your patch, you make conditions have a different meaning, with
a mix of "local" definition, but it is kind of permanent as it keeps a
trace of the point's name in shmem. I find the behavior of the patch
less intuitive. Perhaps it would be interesting to see first the bug
and/or problem you are trying to tackle with this different behavior
as I feel like we could do something even with the module as-is. As
far as I understand, the implementation of the module on HEAD allows
one to emulate a breakpoint with a wait/wake, which can avoid the
window mentioned in step 2. Even if a wait point is detached
concurrently, it can be awakened, with its traces in shmem removed.
The problem I'm trying to tackle in this thread is to make
src/test/modules/gin installcheck-safe. $SUBJECT's commit 5105c90 started
that work, having seen the intarray test suite break when run concurrently
with the injection_points test suite. That combination still does break at
the exit-time race condition. To reproduce, apply this attachment to add
sleeps, and run:
make -C src/test/modules/gin installcheck USE_MODULE_DB=1 & sleep 2; make -C contrib/intarray installcheck USE_MODULE_DB=1
Separately, I see injection_points_attach() populates InjectionPointCondition
after InjectionPointAttach(). Shouldn't InjectionPointAttach() come last, to
avoid the same sort of race? I've not tried to reproduce that one.
Attachments:
repro-inj-exit-race-v1.patch (text/plain; charset=us-ascii)
diff --git a/src/test/modules/gin/expected/gin_incomplete_splits.out b/src/test/modules/gin/expected/gin_incomplete_splits.out
index 15574e5..4bc5ef1 100644
--- a/src/test/modules/gin/expected/gin_incomplete_splits.out
+++ b/src/test/modules/gin/expected/gin_incomplete_splits.out
@@ -126,6 +126,12 @@ SELECT injection_points_attach('gin-leave-leaf-split-incomplete', 'error');
select insert_until_fail(:next_i) as next_i
\gset
NOTICE: failed with: error triggered for injection point gin-leave-leaf-split-incomplete
+SELECT pg_sleep(10);
+ pg_sleep
+----------
+
+(1 row)
+
SELECT injection_points_detach('gin-leave-leaf-split-incomplete');
injection_points_detach
-------------------------
diff --git a/src/test/modules/gin/sql/gin_incomplete_splits.sql b/src/test/modules/gin/sql/gin_incomplete_splits.sql
index ebf0f62..fb66bac 100644
--- a/src/test/modules/gin/sql/gin_incomplete_splits.sql
+++ b/src/test/modules/gin/sql/gin_incomplete_splits.sql
@@ -118,6 +118,7 @@ select verify(:next_i);
SELECT injection_points_attach('gin-leave-leaf-split-incomplete', 'error');
select insert_until_fail(:next_i) as next_i
\gset
+SELECT pg_sleep(10);
SELECT injection_points_detach('gin-leave-leaf-split-incomplete');
-- Verify that a scan works even though there's an incomplete split
diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c
index a74a4a2..f9e8a1f 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -18,6 +18,7 @@
#include "postgres.h"
#include "fmgr.h"
+#include "libpq/pqsignal.h"
#include "miscadmin.h"
#include "storage/condition_variable.h"
#include "storage/dsm_registry.h"
@@ -213,6 +214,14 @@ injection_points_cleanup(int code, Datum arg)
void
injection_error(const char *name)
{
+ if (strcmp(name, "gin-leave-leaf-split-incomplete") == 0 &&
+ !injection_point_allowed(name))
+ {
+ sigprocmask(SIG_SETMASK, &BlockSig, NULL);
+ pg_usleep(20 * USECS_PER_SEC); /* let other suite detach */
+ sigprocmask(SIG_SETMASK, &UnBlockSig, NULL);
+ }
+
if (!injection_point_allowed(name))
return;
On Tue, May 07, 2024 at 11:53:10AM -0700, Noah Misch wrote:
On Tue, May 07, 2024 at 10:17:49AM +0900, Michael Paquier wrote:
Overall, this switches from one detach behavior to a different one,
Can you say more about that? The only behavior change known to me is that a
given injection point workload uses more of INJ_MAX_CONDITION. If there's
another behavior change, it was likely unintended.
I see patch inplace030-inj-exit-race-v1.patch does not fix the race seen with
repro-inj-exit-race-v1.patch. I withdraw inplace030-inj-exit-race-v1.patch,
and I withdraw the above question.
To reproduce, apply [repro-inj-exit-race-v1.patch] to add
sleeps, and run:
make -C src/test/modules/gin installcheck USE_MODULE_DB=1 & sleep 2; make -C contrib/intarray installcheck USE_MODULE_DB=1
Separately, I see injection_points_attach() populates InjectionPointCondition
after InjectionPointAttach(). Shouldn't InjectionPointAttach() come last, to
avoid the same sort of race? I've not tried to reproduce that one.
On Tue, May 07, 2024 at 11:53:10AM -0700, Noah Misch wrote:
On Tue, May 07, 2024 at 10:17:49AM +0900, Michael Paquier wrote:
Always resetting
condition->name when detaching a point is a simpler flow and saner
IMO.
Overall, this switches from one detach behavior to a different one,
Can you say more about that? The only behavior change known to me is that a
given injection point workload uses more of INJ_MAX_CONDITION. If there's
another behavior change, it was likely unintended.
As far as I read the previous patch, the conditions stored in
InjectionPointSharedState would never be really gone, even if the
points are removed from InjectionPointHash.
The problem I'm trying to tackle in this thread is to make
src/test/modules/gin installcheck-safe. $SUBJECT's commit 5105c90 started
that work, having seen the intarray test suite break when run concurrently
with the injection_points test suite. That combination still does break at
the exit-time race condition. To reproduce, apply this attachment to add
sleeps, and run:make -C src/test/modules/gin installcheck USE_MODULE_DB=1 & sleep 2; make -C contrib/intarray installcheck USE_MODULE_DB=1
Thanks for that. I am not really sure how to protect against that
without a small cost in flexibility for the detach vs. run paths. This
comes down to the fact that a custom routine could be run while it is
being detached concurrently, removing any stuff a callback could
depend on in the module.
It was mentioned upthread to add to InjectionPointCacheEntry a fixed
area of memory that modules could use to store some "status" data, but
it would not close the run/detach race because a backend could still
hold a pointer to a callback, with concurrent backends playing with
the contents of InjectionPointCacheEntry (concurrent detaches and
attaches that would cause the same entries to be reused).
One way to close entirely the window would be to hold
InjectionPointLock longer in InjectionPointRun() until the callback
finishes or until it triggers an ERROR. This would mean that the case
you've mentioned in [1] would change, by blocking the detach() of s3
until the callback of s2 finishes. We don't have tests in the tree
that do any of that, so holding InjectionPointLock longer would not
break anything on HEAD. A detach being possible while the callback is
run is something I've considered valid in d86d20f0ba79, but perhaps
that was too cute of me, even more so with the use case of local points.
Separately, I see injection_points_attach() populates InjectionPointCondition
after InjectionPointAttach(). Shouldn't InjectionPointAttach() come last, to
avoid the same sort of race? I've not tried to reproduce that one.
Good point. You could run into the case of a concurrent backend
running an injection point that should be local if waiting between
InjectionPointAttach() and the condition getting registered in
injection_points_attach(). That should be reversed.
[1]: /messages/by-id/20240501231214.40@rfd.leadboat.com
--
Michael
On Thu, May 09, 2024 at 09:37:54AM +0900, Michael Paquier wrote:
On Tue, May 07, 2024 at 11:53:10AM -0700, Noah Misch wrote:
The problem I'm trying to tackle in this thread is to make
src/test/modules/gin installcheck-safe. $SUBJECT's commit 5105c90 started
that work, having seen the intarray test suite break when run concurrently
with the injection_points test suite. That combination still does break at
the exit-time race condition. To reproduce, apply this attachment to add
sleeps, and run:
make -C src/test/modules/gin installcheck USE_MODULE_DB=1 & sleep 2; make -C contrib/intarray installcheck USE_MODULE_DB=1
Thanks for that. I am not really sure how to protect that without a
small cost in flexibility for the cases of detach vs run paths. This
comes down to the fact that a custom routine could be run while it
could be detached concurrently, removing any stuff a callback could
depend on in the module.
It was mentioned upthread to add to InjectionPointCacheEntry a fixed
area of memory that modules could use to store some "status" data, but
it would not close the run/detach race because a backend could still
hold a pointer to a callback, with concurrent backends playing with
the contents of InjectionPointCacheEntry (concurrent detaches and
attaches that would cause the same entries to be reused).
One way to close entirely the window would be to hold
InjectionPointLock longer in InjectionPointRun() until the callback
finishes or until it triggers an ERROR. This would mean that the case
you've mentioned in [1] would change, by blocking the detach() of s3
until the callback of s2 finishes.
Yes, that would be a loss for test readability. Also, I wouldn't be surprised
if some test will want to attach POINT-B while POINT-A is in injection_wait().
Various options avoiding those limitations:
1. The data area formerly called a "status" area is immutable after attach.
The core code copies it into a stack buffer to pass in a const callback
argument.
2. InjectionPointAttach() returns an attachment serial number, and the
callback receives that as an argument. injection_points.c would maintain
an InjectionPointCondition for every still-attached serial number,
including global attachments. Finding no condition matching the serial
number argument means detach already happened and callback should do
nothing.
3. Move the PID check into core code.
4. Separate the concept of "make ineligible to fire" from "detach", with
stronger locking for detach. v1 pointed in this direction, though not
using that terminology.
5. Injection point has multiple callbacks. At least one runs with the lock
held and can mutate the "status" data. At least one runs without the lock.
(1) is, I think, simple and sufficient. How about that?
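For concreteness, a minimal interface sketch of what option (1) implies on the backend side; this is only a sketch, and the v3 patch attached downthread takes this general direction:
/* Callback receives the data registered at attach time, as a const pointer. */
typedef void (*InjectionPointCallback) (const char *name,
                                        const void *private_data);
/* Attach copies private_data (bounded by a fixed maximum size) into the
 * shared-memory entry, so callbacks see an immutable snapshot of it. */
extern void InjectionPointAttach(const char *name,
                                 const char *library,
                                 const char *function,
                                 const void *private_data,
                                 int private_data_size);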
On Wed, May 08, 2024 at 08:15:53PM -0700, Noah Misch wrote:
Yes, that would be a loss for test readability. Also, I wouldn't be surprised
if some test will want to attach POINT-B while POINT-A is in injection_wait().
Not impossible, still annoying with more complex scenarios.
Various options avoiding those limitations:
1. The data area formerly called a "status" area is immutable after attach.
The core code copies it into a stack buffer to pass in a const callback
argument.
3. Move the PID check into core code.
The PID checks are specific to the module, and there could be many
more conditions, like running only in specific backend types (the first
example coming to mind), so I want to avoid that kind of knowledge
in the backend.
(1) is, I think, simple and sufficient. How about that?
That sounds fine to do at the end. I'm not sure how large this
chunk area added to each InjectionPointEntry should be, though. 128B
stored in each InjectionPointEntry would be more than enough I guess?
Or more like 1024? The in-core module does not need much currently,
but larger is likely better for pluggability.
--
Michael
On Thu, May 09, 2024 at 01:47:45PM +0900, Michael Paquier wrote:
That sounds fine to do that at the end.. I'm not sure how large this
chunk area added to each InjectionPointEntry should be, though. 128B
stored in each InjectionPointEntry would be more than enough I guess?
Or more like 1024? The in-core module does not need much currently,
but larger is likely better for pluggability.
Actually, this is leading to simplifications in the module, giving me
the attached:
4 files changed, 117 insertions(+), 134 deletions(-)
So I like your suggestion. This version closes the race window
between the shmem exit detach in backend A and a concurrent backend B
running a point local to A, so that B will never run the local point of
A. However, it can cause a failure in the shmem exit callback of
backend A if backend B does a detach of a point local to A because A
tracks its local points with a static list in its TopMemoryContext, at
least in the attached. The second case could be solved by tracking
the list of local points in the module's InjectionPointSharedState,
but is this case really worth the complications it adds in the module
knowing that the first case would be solid enough? Perhaps not.
Another idea I have for the second case is to make
InjectionPointDetach() a bit "softer", by returning a boolean status
rather than failing if the detach cannot be done, so that the shmem exit
callback can still loop through the entries it has in store. It could
always be possible that a concurrent backend does a detach followed by
an attach with the same name, causing the shmem exit callback to drop
a point it should not, but that's not really a plausible case IMO :)
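As a sketch of that idea (not what the attached patch does), the backend routine would become something like:
extern bool InjectionPointDetach(const char *name);  /* false if "name" is not attached */
and the module's exit callback could simply ignore a false result when a concurrent session has already removed the point.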
This stuff can be adjusted in subtle ways depending on the cases you
are most interested in. What do you think?
--
Michael
Attachments:
v3-0001-Add-chunk-area-for-injection-points.patch (text/x-diff; charset=us-ascii)
From 27b0ba88d3530c50cb87d1495439372885f0773b Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 9 May 2024 16:29:16 +0900
Subject: [PATCH v3] Add chunk area for injection points
---
src/include/utils/injection_point.h | 7 +-
src/backend/utils/misc/injection_point.c | 45 ++++-
.../injection_points/injection_points.c | 189 +++++++-----------
doc/src/sgml/xfunc.sgml | 10 +-
4 files changed, 117 insertions(+), 134 deletions(-)
diff --git a/src/include/utils/injection_point.h b/src/include/utils/injection_point.h
index 55524b568f..a8a11de5f2 100644
--- a/src/include/utils/injection_point.h
+++ b/src/include/utils/injection_point.h
@@ -23,14 +23,17 @@
/*
* Typedef for callback function launched by an injection point.
*/
-typedef void (*InjectionPointCallback) (const char *name);
+typedef void (*InjectionPointCallback) (const char *name,
+ const void *private_data);
extern Size InjectionPointShmemSize(void);
extern void InjectionPointShmemInit(void);
extern void InjectionPointAttach(const char *name,
const char *library,
- const char *function);
+ const char *function,
+ const void *private_data,
+ int private_data_size);
extern void InjectionPointRun(const char *name);
extern void InjectionPointDetach(const char *name);
diff --git a/src/backend/utils/misc/injection_point.c b/src/backend/utils/misc/injection_point.c
index 0cf4d51cac..d46ad8b48f 100644
--- a/src/backend/utils/misc/injection_point.c
+++ b/src/backend/utils/misc/injection_point.c
@@ -42,6 +42,7 @@ static HTAB *InjectionPointHash; /* find points from names */
#define INJ_NAME_MAXLEN 64
#define INJ_LIB_MAXLEN 128
#define INJ_FUNC_MAXLEN 128
+#define INJ_PRIVATE_MAXLEN 1024
/* Single injection point stored in InjectionPointHash */
typedef struct InjectionPointEntry
@@ -49,6 +50,12 @@ typedef struct InjectionPointEntry
char name[INJ_NAME_MAXLEN]; /* hash key */
char library[INJ_LIB_MAXLEN]; /* library */
char function[INJ_FUNC_MAXLEN]; /* function */
+
+ /*
+ * Opaque data area that modules can use to pass some custom data to
+ * callbacks, registered when attached.
+ */
+ char private_data[INJ_PRIVATE_MAXLEN];
} InjectionPointEntry;
#define INJECTION_POINT_HASH_INIT_SIZE 16
@@ -61,6 +68,7 @@ typedef struct InjectionPointEntry
typedef struct InjectionPointCacheEntry
{
char name[INJ_NAME_MAXLEN];
+ char private_data[INJ_PRIVATE_MAXLEN];
InjectionPointCallback callback;
} InjectionPointCacheEntry;
@@ -73,7 +81,8 @@ static HTAB *InjectionPointCache = NULL;
*/
static void
injection_point_cache_add(const char *name,
- InjectionPointCallback callback)
+ InjectionPointCallback callback,
+ const void *private_data)
{
InjectionPointCacheEntry *entry;
bool found;
@@ -99,6 +108,7 @@ injection_point_cache_add(const char *name,
Assert(!found);
strlcpy(entry->name, name, sizeof(entry->name));
entry->callback = callback;
+ memcpy(entry->private_data, private_data, INJ_PRIVATE_MAXLEN);
}
/*
@@ -124,11 +134,14 @@ injection_point_cache_remove(const char *name)
* Retrieve an injection point from the local cache, if any.
*/
static InjectionPointCallback
-injection_point_cache_get(const char *name)
+injection_point_cache_get(const char *name, const void **private_data)
{
bool found;
InjectionPointCacheEntry *entry;
+ if (private_data)
+ *private_data = NULL;
+
/* no callback if no cache yet */
if (InjectionPointCache == NULL)
return NULL;
@@ -137,7 +150,11 @@ injection_point_cache_get(const char *name)
hash_search(InjectionPointCache, name, HASH_FIND, &found);
if (found)
+ {
+ if (private_data)
+ *private_data = entry->private_data;
return entry->callback;
+ }
return NULL;
}
@@ -186,7 +203,9 @@ InjectionPointShmemInit(void)
void
InjectionPointAttach(const char *name,
const char *library,
- const char *function)
+ const char *function,
+ const void *private_data,
+ int private_data_size)
{
#ifdef USE_INJECTION_POINTS
InjectionPointEntry *entry_by_name;
@@ -201,6 +220,9 @@ InjectionPointAttach(const char *name,
if (strlen(function) >= INJ_FUNC_MAXLEN)
elog(ERROR, "injection point function %s too long (maximum of %u)",
function, INJ_FUNC_MAXLEN);
+ if (private_data_size >= INJ_PRIVATE_MAXLEN)
+ elog(ERROR, "injection point data too long (maximum of %u)",
+ INJ_PRIVATE_MAXLEN);
/*
* Allocate and register a new injection point. A new point should not
@@ -223,6 +245,7 @@ InjectionPointAttach(const char *name,
entry_by_name->library[INJ_LIB_MAXLEN - 1] = '\0';
strlcpy(entry_by_name->function, function, sizeof(entry_by_name->function));
entry_by_name->function[INJ_FUNC_MAXLEN - 1] = '\0';
+ memcpy(entry_by_name->private_data, private_data, private_data_size);
LWLockRelease(InjectionPointLock);
@@ -265,6 +288,7 @@ InjectionPointRun(const char *name)
InjectionPointEntry *entry_by_name;
bool found;
InjectionPointCallback injection_callback;
+ const void *private_data;
LWLockAcquire(InjectionPointLock, LW_SHARED);
entry_by_name = (InjectionPointEntry *)
@@ -286,10 +310,10 @@ InjectionPointRun(const char *name)
* Check if the callback exists in the local cache, to avoid unnecessary
* external loads.
*/
- injection_callback = injection_point_cache_get(name);
- if (injection_callback == NULL)
+ if (injection_point_cache_get(name, NULL) == NULL)
{
char path[MAXPGPATH];
+ InjectionPointCallback injection_callback_local;
/* not found in local cache, so load and register */
snprintf(path, MAXPGPATH, "%s/%s%s", pkglib_path,
@@ -299,18 +323,21 @@ InjectionPointRun(const char *name)
elog(ERROR, "could not find library \"%s\" for injection point \"%s\"",
path, name);
- injection_callback = (InjectionPointCallback)
+ injection_callback_local = (InjectionPointCallback)
load_external_function(path, entry_by_name->function, false, NULL);
- if (injection_callback == NULL)
+ if (injection_callback_local == NULL)
elog(ERROR, "could not find function \"%s\" in library \"%s\" for injection point \"%s\"",
entry_by_name->function, path, name);
/* add it to the local cache when found */
- injection_point_cache_add(name, injection_callback);
+ injection_point_cache_add(name, injection_callback_local,
+ entry_by_name->private_data);
}
- injection_callback(name);
+ /* Now loaded, so get it. */
+ injection_callback = injection_point_cache_get(name, &private_data);
+ injection_callback(name, private_data);
#else
elog(ERROR, "Injection points are not supported by this build");
#endif
diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c
index a74a4a28af..5199d24307 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -19,6 +19,8 @@
#include "fmgr.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/value.h"
#include "storage/condition_variable.h"
#include "storage/dsm_registry.h"
#include "storage/ipc.h"
@@ -26,6 +28,7 @@
#include "storage/shmem.h"
#include "utils/builtins.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
#include "utils/wait_event.h"
PG_MODULE_MAGIC;
@@ -33,24 +36,37 @@ PG_MODULE_MAGIC;
/* Maximum number of waits usable in injection points at once */
#define INJ_MAX_WAIT 8
#define INJ_NAME_MAXLEN 64
-#define INJ_MAX_CONDITION 4
/*
* Conditions related to injection points. This tracks in shared memory the
- * runtime conditions under which an injection point is allowed to run.
+ * runtime conditions under which an injection point is allowed to run,
+ * stored as private_data when an injection point is attached, and passed as
+ * argument to the callback.
*
* If more types of runtime conditions need to be tracked, this structure
* should be expanded.
*/
+typedef enum InjectionPointConditionType
+{
+ INJ_CONDITION_INVALID = 0,
+ INJ_CONDITION_PID,
+} InjectionPointConditionType;
+
typedef struct InjectionPointCondition
{
- /* Name of the injection point related to this condition */
- char name[INJ_NAME_MAXLEN];
+ /* Type of the condition */
+ InjectionPointConditionType type;
/* ID of the process where the injection point is allowed to run */
int pid;
} InjectionPointCondition;
+/*
+ * List of injection points stored in TopMemoryContext attached
+ * locally to this process.
+ */
+static List *inj_list_local = NIL;
+
/* Shared state information for injection points. */
typedef struct InjectionPointSharedState
{
@@ -65,17 +81,17 @@ typedef struct InjectionPointSharedState
/* Condition variable used for waits and wakeups */
ConditionVariable wait_point;
-
- /* Conditions to run an injection point */
- InjectionPointCondition conditions[INJ_MAX_CONDITION];
} InjectionPointSharedState;
/* Pointer to shared-memory state. */
static InjectionPointSharedState *inj_state = NULL;
-extern PGDLLEXPORT void injection_error(const char *name);
-extern PGDLLEXPORT void injection_notice(const char *name);
-extern PGDLLEXPORT void injection_wait(const char *name);
+extern PGDLLEXPORT void injection_error(const char *name,
+ const void *private_data);
+extern PGDLLEXPORT void injection_notice(const char *name,
+ const void *private_data);
+extern PGDLLEXPORT void injection_wait(const char *name,
+ const void *private_data);
/* track if injection points attached in this process are linked to it */
static bool injection_point_local = false;
@@ -91,7 +107,6 @@ injection_point_init_state(void *ptr)
SpinLockInit(&state->lock);
memset(state->wait_counts, 0, sizeof(state->wait_counts));
memset(state->name, 0, sizeof(state->name));
- memset(state->conditions, 0, sizeof(state->conditions));
ConditionVariableInit(&state->wait_point);
}
@@ -116,39 +131,23 @@ injection_init_shmem(void)
* Check runtime conditions associated to an injection point.
*
* Returns true if the named injection point is allowed to run, and false
- * otherwise. Multiple conditions can be associated to a single injection
- * point, so check them all.
+ * otherwise.
*/
static bool
-injection_point_allowed(const char *name)
+injection_point_allowed(InjectionPointCondition *condition)
{
bool result = true;
- if (inj_state == NULL)
- injection_init_shmem();
-
- SpinLockAcquire(&inj_state->lock);
-
- for (int i = 0; i < INJ_MAX_CONDITION; i++)
+ switch (condition->type)
{
- InjectionPointCondition *condition = &inj_state->conditions[i];
-
- if (strcmp(condition->name, name) == 0)
- {
- /*
- * Check if this injection point is allowed to run in this
- * process.
- */
+ case INJ_CONDITION_PID:
if (MyProcPid != condition->pid)
- {
result = false;
- break;
- }
- }
+ break;
+ case INJ_CONDITION_INVALID:
+ break;
}
- SpinLockRelease(&inj_state->lock);
-
return result;
}
@@ -159,70 +158,38 @@ injection_point_allowed(const char *name)
static void
injection_points_cleanup(int code, Datum arg)
{
- char names[INJ_MAX_CONDITION][INJ_NAME_MAXLEN] = {0};
- int count = 0;
+ ListCell *lc;
/* Leave if nothing is tracked locally */
if (!injection_point_local)
return;
- /*
- * This is done in three steps: detect the points to detach, detach them
- * and release their conditions.
- */
- SpinLockAcquire(&inj_state->lock);
- for (int i = 0; i < INJ_MAX_CONDITION; i++)
+ /* Detach all the local points */
+ foreach(lc, inj_list_local)
{
- InjectionPointCondition *condition = &inj_state->conditions[i];
-
- if (condition->name[0] == '\0')
- continue;
-
- if (condition->pid != MyProcPid)
- continue;
-
- /* Extract the point name to detach */
- strlcpy(names[count], condition->name, INJ_NAME_MAXLEN);
- count++;
+ char *name = strVal(lfirst(lc));
+ InjectionPointDetach(name);
}
- SpinLockRelease(&inj_state->lock);
-
- /* Detach, without holding the spinlock */
- for (int i = 0; i < count; i++)
- InjectionPointDetach(names[i]);
-
- /* Clear all the conditions */
- SpinLockAcquire(&inj_state->lock);
- for (int i = 0; i < INJ_MAX_CONDITION; i++)
- {
- InjectionPointCondition *condition = &inj_state->conditions[i];
-
- if (condition->name[0] == '\0')
- continue;
-
- if (condition->pid != MyProcPid)
- continue;
-
- condition->name[0] = '\0';
- condition->pid = 0;
- }
- SpinLockRelease(&inj_state->lock);
}
/* Set of callbacks available to be attached to an injection point. */
void
-injection_error(const char *name)
+injection_error(const char *name, const void *private_data)
{
- if (!injection_point_allowed(name))
+ InjectionPointCondition *condition = (InjectionPointCondition *) private_data;
+
+ if (!injection_point_allowed(condition))
return;
elog(ERROR, "error triggered for injection point %s", name);
}
void
-injection_notice(const char *name)
+injection_notice(const char *name, const void *private_data)
{
- if (!injection_point_allowed(name))
+ InjectionPointCondition *condition = (InjectionPointCondition *) private_data;
+
+ if (!injection_point_allowed(condition))
return;
elog(NOTICE, "notice triggered for injection point %s", name);
@@ -230,16 +197,17 @@ injection_notice(const char *name)
/* Wait on a condition variable, awaken by injection_points_wakeup() */
void
-injection_wait(const char *name)
+injection_wait(const char *name, const void *private_data)
{
uint32 old_wait_counts = 0;
int index = -1;
uint32 injection_wait_event = 0;
+ InjectionPointCondition *condition = (InjectionPointCondition *) private_data;
if (inj_state == NULL)
injection_init_shmem();
- if (!injection_point_allowed(name))
+ if (!injection_point_allowed(condition))
return;
/*
@@ -301,6 +269,7 @@ injection_points_attach(PG_FUNCTION_ARGS)
char *name = text_to_cstring(PG_GETARG_TEXT_PP(0));
char *action = text_to_cstring(PG_GETARG_TEXT_PP(1));
char *function;
+ InjectionPointCondition condition = {0};
if (strcmp(action, "error") == 0)
function = "injection_error";
@@ -311,37 +280,24 @@ injection_points_attach(PG_FUNCTION_ARGS)
else
elog(ERROR, "incorrect action \"%s\" for injection point creation", action);
- InjectionPointAttach(name, "injection_points", function);
+ if (injection_point_local)
+ {
+ condition.type = INJ_CONDITION_PID;
+ condition.pid = MyProcPid;
+ }
+
+ InjectionPointAttach(name, "injection_points", function, &condition,
+ sizeof(InjectionPointCondition));
if (injection_point_local)
{
- int index = -1;
+ MemoryContext oldctx;
- /*
- * Register runtime condition to link this injection point to the
- * current process.
- */
- SpinLockAcquire(&inj_state->lock);
- for (int i = 0; i < INJ_MAX_CONDITION; i++)
- {
- InjectionPointCondition *condition = &inj_state->conditions[i];
-
- if (condition->name[0] == '\0')
- {
- index = i;
- strlcpy(condition->name, name, INJ_NAME_MAXLEN);
- condition->pid = MyProcPid;
- break;
- }
- }
- SpinLockRelease(&inj_state->lock);
-
- if (index < 0)
- elog(FATAL,
- "could not find free slot for condition of injection point %s",
- name);
+ /* Local injection point, so track it for automated cleanup */
+ oldctx = MemoryContextSwitchTo(TopMemoryContext);
+ inj_list_local = lappend(inj_list_local, makeString(pstrdup(name)));
+ MemoryContextSwitchTo(oldctx);
}
-
PG_RETURN_VOID();
}
@@ -432,22 +388,15 @@ injection_points_detach(PG_FUNCTION_ARGS)
InjectionPointDetach(name);
- if (inj_state == NULL)
- injection_init_shmem();
-
- /* Clean up any conditions associated to this injection point */
- SpinLockAcquire(&inj_state->lock);
- for (int i = 0; i < INJ_MAX_CONDITION; i++)
+ /* Remove point from local list, if required */
+ if (inj_list_local != NIL)
{
- InjectionPointCondition *condition = &inj_state->conditions[i];
+ MemoryContext oldctx;
- if (strcmp(condition->name, name) == 0)
- {
- condition->pid = 0;
- condition->name[0] = '\0';
- }
+ oldctx = MemoryContextSwitchTo(TopMemoryContext);
+ inj_list_local = list_delete(inj_list_local, makeString(name));
+ MemoryContextSwitchTo(oldctx);
}
- SpinLockRelease(&inj_state->lock);
PG_RETURN_VOID();
}
diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
index 7d053698a2..1a6b5c97b6 100644
--- a/doc/src/sgml/xfunc.sgml
+++ b/doc/src/sgml/xfunc.sgml
@@ -3624,12 +3624,16 @@ INJECTION_POINT(name);
<programlisting>
extern void InjectionPointAttach(const char *name,
const char *library,
- const char *function);
+ const char *function,
+ const void *private_data,
+ int private_data_size);
</programlisting>
<literal>name</literal> is the name of the injection point, which when
reached during execution will execute the <literal>function</literal>
- loaded from <literal>library</literal>.
+ loaded from <literal>library</literal>. <literal>private_data</literal>
+ is a private area of data of size <literal>private_data_size</literal>
+ given as an argument to the callback when executed.
</para>
<para>
@@ -3637,7 +3641,7 @@ extern void InjectionPointAttach(const char *name,
<literal>InjectionPointCallback</literal>:
<programlisting>
static void
-custom_injection_callback(const char *name)
+custom_injection_callback(const char *name, const void *private_data)
{
elog(NOTICE, "%s: executed custom callback", name);
}
--
2.43.0
On Thu, May 09, 2024 at 04:40:43PM +0900, Michael Paquier wrote:
So I like your suggestion. This version closes the race window
between the shmem exit detach in backend A and a concurrent backend B
running a point local to A, so that B will never run the local point of
A. However, it can cause a failure in the shmem exit callback of
backend A if backend B does a detach of a point local to A because A
tracks its local points with a static list in its TopMemoryContext, at
least in the attached. The second case could be solved by tracking
the list of local points in the module's InjectionPointSharedState,
but is this case really worth the complications it adds in the module
knowing that the first case would be solid enough? Perhaps not.

Another idea I have for the second case is to make
InjectionPointDetach() a bit "softer", by returning a boolean status
rather than fail if the detach cannot be done, so that the shmem exit
callback can still loop through the entries it has in store.
The return-bool approach sounds fine. Up to you whether to do it in this patch;
otherwise I'll do it when I add the test.
It could
always be possible that a concurrent backend does a detach followed by
an attach with the same name, causing the shmem exit callback to drop
a point it should not, but that's not really a plausible case IMO :)
Agreed. It's reasonable to expect test cases to serialize backend exits,
attach calls, and detach calls. If we need to fix that later, we can use
attachment serial numbers.
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
+typedef enum InjectionPointConditionType
+{
+	INJ_CONDITION_INVALID = 0,
I'd name this INJ_CONDITION_UNCONDITIONAL or INJ_CONDITION_ALWAYS. INVALID
sounds like a can't-happen event or an injection point that never runs.
Otherwise, the patch looks good and makes src/test/modules/gin safe for
installcheck. Thanks.
On Thu, May 09, 2024 at 04:39:00PM -0700, Noah Misch wrote:
Thanks for the feedback.
The return-bool approach sounds fine. Up to you whether to do it in this patch;
otherwise I'll do it when I add the test.
I see no reason not to change the signature of the routine now if we
know that we're going to do it anyway in the future. I was briefly
wondering if doing the same for InjectionPointAttach() would make
sense, but it has more error states, so I'm not really tempted without
an actual reason (cannot think of a case where I'd want to put more
control into a module after a failed attach).
It could
always be possible that a concurrent backend does a detach followed by
an attach with the same name, causing the shmem exit callback to drop
a point it should not, but that's not really a plausible case IMO :)

Agreed. It's reasonable to expect test cases to serialize backend exits,
attach calls, and detach calls. If we need to fix that later, we can use
attachment serial numbers.
Okay by me.
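For illustration only, here is a rough sketch of how such attachment serial
numbers could work; nothing below is part of any patch in this thread, and
all names are invented. The idea is that every attach bumps a generation
counter in the shared entry, and the shmem exit callback only detaches
entries whose generation still matches the one it recorded at attach time:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shared entry, extended with a generation counter. */
typedef struct SerialEntry
{
	char		name[64];
	uint64_t	generation;		/* bumped on every attach */
} SerialEntry;

/* Generation observed by this backend when it attached the point. */
static uint64_t local_attach_generation;

/*
 * Used by the shmem exit callback: detach only if nobody has detached
 * and re-attached the point behind our back.
 */
static bool
safe_to_detach(const SerialEntry *entry, const char *name)
{
	return strcmp(entry->name, name) == 0 &&
		entry->generation == local_attach_generation;
}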
I'd name this INJ_CONDITION_UNCONDITIONAL or INJ_CONDITION_ALWAYS. INVALID
sounds like a can't-happen event or an injection point that never runs.
Otherwise, the patch looks good and makes src/test/modules/gin safe for
installcheck. Thanks.
INJ_CONDITION_ALWAYS sounds like a good compromise here.
Attached is an updated patch for now, indented with a happy CI. I am
still planning to look at that a second time on Monday with a fresher
mind, in case I'm missing something now.
--
Michael
Attachments:
v4-0001-Add-chunk-area-for-injection-points.patch (text/x-diff)
From 233e710c8fc2242bdd4e40d5a20f8de2828765b7 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Fri, 10 May 2024 09:40:42 +0900
Subject: [PATCH v4] Add chunk area for injection points
---
src/include/utils/injection_point.h | 9 +-
src/backend/utils/misc/injection_point.c | 53 ++++-
.../expected/injection_points.out | 2 +-
.../injection_points/injection_points.c | 191 +++++++-----------
doc/src/sgml/xfunc.sgml | 14 +-
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 131 insertions(+), 139 deletions(-)
diff --git a/src/include/utils/injection_point.h b/src/include/utils/injection_point.h
index 55524b568f..a61d5d4439 100644
--- a/src/include/utils/injection_point.h
+++ b/src/include/utils/injection_point.h
@@ -23,15 +23,18 @@
/*
* Typedef for callback function launched by an injection point.
*/
-typedef void (*InjectionPointCallback) (const char *name);
+typedef void (*InjectionPointCallback) (const char *name,
+ const void *private_data);
extern Size InjectionPointShmemSize(void);
extern void InjectionPointShmemInit(void);
extern void InjectionPointAttach(const char *name,
const char *library,
- const char *function);
+ const char *function,
+ const void *private_data,
+ int private_data_size);
extern void InjectionPointRun(const char *name);
-extern void InjectionPointDetach(const char *name);
+extern bool InjectionPointDetach(const char *name);
#endif /* INJECTION_POINT_H */
diff --git a/src/backend/utils/misc/injection_point.c b/src/backend/utils/misc/injection_point.c
index 0cf4d51cac..d5a8726644 100644
--- a/src/backend/utils/misc/injection_point.c
+++ b/src/backend/utils/misc/injection_point.c
@@ -42,6 +42,7 @@ static HTAB *InjectionPointHash; /* find points from names */
#define INJ_NAME_MAXLEN 64
#define INJ_LIB_MAXLEN 128
#define INJ_FUNC_MAXLEN 128
+#define INJ_PRIVATE_MAXLEN 1024
/* Single injection point stored in InjectionPointHash */
typedef struct InjectionPointEntry
@@ -49,6 +50,12 @@ typedef struct InjectionPointEntry
char name[INJ_NAME_MAXLEN]; /* hash key */
char library[INJ_LIB_MAXLEN]; /* library */
char function[INJ_FUNC_MAXLEN]; /* function */
+
+ /*
+ * Opaque data area that modules can use to pass some custom data to
+ * callbacks, registered when attached.
+ */
+ char private_data[INJ_PRIVATE_MAXLEN];
} InjectionPointEntry;
#define INJECTION_POINT_HASH_INIT_SIZE 16
@@ -61,6 +68,7 @@ typedef struct InjectionPointEntry
typedef struct InjectionPointCacheEntry
{
char name[INJ_NAME_MAXLEN];
+ char private_data[INJ_PRIVATE_MAXLEN];
InjectionPointCallback callback;
} InjectionPointCacheEntry;
@@ -73,7 +81,8 @@ static HTAB *InjectionPointCache = NULL;
*/
static void
injection_point_cache_add(const char *name,
- InjectionPointCallback callback)
+ InjectionPointCallback callback,
+ const void *private_data)
{
InjectionPointCacheEntry *entry;
bool found;
@@ -99,6 +108,7 @@ injection_point_cache_add(const char *name,
Assert(!found);
strlcpy(entry->name, name, sizeof(entry->name));
entry->callback = callback;
+ memcpy(entry->private_data, private_data, INJ_PRIVATE_MAXLEN);
}
/*
@@ -124,11 +134,14 @@ injection_point_cache_remove(const char *name)
* Retrieve an injection point from the local cache, if any.
*/
static InjectionPointCallback
-injection_point_cache_get(const char *name)
+injection_point_cache_get(const char *name, const void **private_data)
{
bool found;
InjectionPointCacheEntry *entry;
+ if (private_data)
+ *private_data = NULL;
+
/* no callback if no cache yet */
if (InjectionPointCache == NULL)
return NULL;
@@ -137,7 +150,11 @@ injection_point_cache_get(const char *name)
hash_search(InjectionPointCache, name, HASH_FIND, &found);
if (found)
+ {
+ if (private_data)
+ *private_data = entry->private_data;
return entry->callback;
+ }
return NULL;
}
@@ -186,7 +203,9 @@ InjectionPointShmemInit(void)
void
InjectionPointAttach(const char *name,
const char *library,
- const char *function)
+ const char *function,
+ const void *private_data,
+ int private_data_size)
{
#ifdef USE_INJECTION_POINTS
InjectionPointEntry *entry_by_name;
@@ -201,6 +220,9 @@ InjectionPointAttach(const char *name,
if (strlen(function) >= INJ_FUNC_MAXLEN)
elog(ERROR, "injection point function %s too long (maximum of %u)",
function, INJ_FUNC_MAXLEN);
+ if (private_data_size >= INJ_PRIVATE_MAXLEN)
+ elog(ERROR, "injection point data too long (maximum of %u)",
+ INJ_PRIVATE_MAXLEN);
/*
* Allocate and register a new injection point. A new point should not
@@ -223,6 +245,7 @@ InjectionPointAttach(const char *name,
entry_by_name->library[INJ_LIB_MAXLEN - 1] = '\0';
strlcpy(entry_by_name->function, function, sizeof(entry_by_name->function));
entry_by_name->function[INJ_FUNC_MAXLEN - 1] = '\0';
+ memcpy(entry_by_name->private_data, private_data, private_data_size);
LWLockRelease(InjectionPointLock);
@@ -233,8 +256,10 @@ InjectionPointAttach(const char *name,
/*
* Detach an existing injection point.
+ *
+ * Returns true if the injection point was detached, false otherwise.
*/
-void
+bool
InjectionPointDetach(const char *name)
{
#ifdef USE_INJECTION_POINTS
@@ -245,10 +270,12 @@ InjectionPointDetach(const char *name)
LWLockRelease(InjectionPointLock);
if (!found)
- elog(ERROR, "injection point \"%s\" not found", name);
+ return false;
+ return true;
#else
elog(ERROR, "Injection points are not supported by this build");
+ return true; /* silence compiler */
#endif
}
@@ -265,6 +292,7 @@ InjectionPointRun(const char *name)
InjectionPointEntry *entry_by_name;
bool found;
InjectionPointCallback injection_callback;
+ const void *private_data;
LWLockAcquire(InjectionPointLock, LW_SHARED);
entry_by_name = (InjectionPointEntry *)
@@ -286,10 +314,10 @@ InjectionPointRun(const char *name)
* Check if the callback exists in the local cache, to avoid unnecessary
* external loads.
*/
- injection_callback = injection_point_cache_get(name);
- if (injection_callback == NULL)
+ if (injection_point_cache_get(name, NULL) == NULL)
{
char path[MAXPGPATH];
+ InjectionPointCallback injection_callback_local;
/* not found in local cache, so load and register */
snprintf(path, MAXPGPATH, "%s/%s%s", pkglib_path,
@@ -299,18 +327,21 @@ InjectionPointRun(const char *name)
elog(ERROR, "could not find library \"%s\" for injection point \"%s\"",
path, name);
- injection_callback = (InjectionPointCallback)
+ injection_callback_local = (InjectionPointCallback)
load_external_function(path, entry_by_name->function, false, NULL);
- if (injection_callback == NULL)
+ if (injection_callback_local == NULL)
elog(ERROR, "could not find function \"%s\" in library \"%s\" for injection point \"%s\"",
entry_by_name->function, path, name);
/* add it to the local cache when found */
- injection_point_cache_add(name, injection_callback);
+ injection_point_cache_add(name, injection_callback_local,
+ entry_by_name->private_data);
}
- injection_callback(name);
+ /* Now loaded, so get it. */
+ injection_callback = injection_point_cache_get(name, &private_data);
+ injection_callback(name, private_data);
#else
elog(ERROR, "Injection points are not supported by this build");
#endif
diff --git a/src/test/modules/injection_points/expected/injection_points.out b/src/test/modules/injection_points/expected/injection_points.out
index 1341d66c92..dd9db06e10 100644
--- a/src/test/modules/injection_points/expected/injection_points.out
+++ b/src/test/modules/injection_points/expected/injection_points.out
@@ -114,7 +114,7 @@ NOTICE: notice triggered for injection point TestInjectionLog2
(1 row)
SELECT injection_points_detach('TestInjectionLog'); -- fails
-ERROR: injection point "TestInjectionLog" not found
+ERROR: could not detach injection point "TestInjectionLog"
SELECT injection_points_run('TestInjectionLog2'); -- notice
NOTICE: notice triggered for injection point TestInjectionLog2
injection_points_run
diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c
index a74a4a28af..5c44625d1d 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -19,6 +19,8 @@
#include "fmgr.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/value.h"
#include "storage/condition_variable.h"
#include "storage/dsm_registry.h"
#include "storage/ipc.h"
@@ -26,6 +28,7 @@
#include "storage/shmem.h"
#include "utils/builtins.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
#include "utils/wait_event.h"
PG_MODULE_MAGIC;
@@ -33,24 +36,37 @@ PG_MODULE_MAGIC;
/* Maximum number of waits usable in injection points at once */
#define INJ_MAX_WAIT 8
#define INJ_NAME_MAXLEN 64
-#define INJ_MAX_CONDITION 4
/*
* Conditions related to injection points. This tracks in shared memory the
- * runtime conditions under which an injection point is allowed to run.
+ * runtime conditions under which an injection point is allowed to run,
+ * stored as private_data when an injection point is attached, and passed as
+ * argument to the callback.
*
* If more types of runtime conditions need to be tracked, this structure
* should be expanded.
*/
+typedef enum InjectionPointConditionType
+{
+ INJ_CONDITION_ALWAYS = 0, /* always run */
+ INJ_CONDITION_PID, /* PID restriction */
+} InjectionPointConditionType;
+
typedef struct InjectionPointCondition
{
- /* Name of the injection point related to this condition */
- char name[INJ_NAME_MAXLEN];
+ /* Type of the condition */
+ InjectionPointConditionType type;
/* ID of the process where the injection point is allowed to run */
int pid;
} InjectionPointCondition;
+/*
+ * List of injection points stored in TopMemoryContext attached
+ * locally to this process.
+ */
+static List *inj_list_local = NIL;
+
/* Shared state information for injection points. */
typedef struct InjectionPointSharedState
{
@@ -65,17 +81,17 @@ typedef struct InjectionPointSharedState
/* Condition variable used for waits and wakeups */
ConditionVariable wait_point;
-
- /* Conditions to run an injection point */
- InjectionPointCondition conditions[INJ_MAX_CONDITION];
} InjectionPointSharedState;
/* Pointer to shared-memory state. */
static InjectionPointSharedState *inj_state = NULL;
-extern PGDLLEXPORT void injection_error(const char *name);
-extern PGDLLEXPORT void injection_notice(const char *name);
-extern PGDLLEXPORT void injection_wait(const char *name);
+extern PGDLLEXPORT void injection_error(const char *name,
+ const void *private_data);
+extern PGDLLEXPORT void injection_notice(const char *name,
+ const void *private_data);
+extern PGDLLEXPORT void injection_wait(const char *name,
+ const void *private_data);
/* track if injection points attached in this process are linked to it */
static bool injection_point_local = false;
@@ -91,7 +107,6 @@ injection_point_init_state(void *ptr)
SpinLockInit(&state->lock);
memset(state->wait_counts, 0, sizeof(state->wait_counts));
memset(state->name, 0, sizeof(state->name));
- memset(state->conditions, 0, sizeof(state->conditions));
ConditionVariableInit(&state->wait_point);
}
@@ -116,39 +131,23 @@ injection_init_shmem(void)
* Check runtime conditions associated to an injection point.
*
* Returns true if the named injection point is allowed to run, and false
- * otherwise. Multiple conditions can be associated to a single injection
- * point, so check them all.
+ * otherwise.
*/
static bool
-injection_point_allowed(const char *name)
+injection_point_allowed(InjectionPointCondition *condition)
{
bool result = true;
- if (inj_state == NULL)
- injection_init_shmem();
-
- SpinLockAcquire(&inj_state->lock);
-
- for (int i = 0; i < INJ_MAX_CONDITION; i++)
+ switch (condition->type)
{
- InjectionPointCondition *condition = &inj_state->conditions[i];
-
- if (strcmp(condition->name, name) == 0)
- {
- /*
- * Check if this injection point is allowed to run in this
- * process.
- */
+ case INJ_CONDITION_PID:
if (MyProcPid != condition->pid)
- {
result = false;
- break;
- }
- }
+ break;
+ case INJ_CONDITION_ALWAYS:
+ break;
}
- SpinLockRelease(&inj_state->lock);
-
return result;
}
@@ -159,70 +158,39 @@ injection_point_allowed(const char *name)
static void
injection_points_cleanup(int code, Datum arg)
{
- char names[INJ_MAX_CONDITION][INJ_NAME_MAXLEN] = {0};
- int count = 0;
+ ListCell *lc;
/* Leave if nothing is tracked locally */
if (!injection_point_local)
return;
- /*
- * This is done in three steps: detect the points to detach, detach them
- * and release their conditions.
- */
- SpinLockAcquire(&inj_state->lock);
- for (int i = 0; i < INJ_MAX_CONDITION; i++)
+ /* Detach all the local points */
+ foreach(lc, inj_list_local)
{
- InjectionPointCondition *condition = &inj_state->conditions[i];
+ char *name = strVal(lfirst(lc));
- if (condition->name[0] == '\0')
- continue;
-
- if (condition->pid != MyProcPid)
- continue;
-
- /* Extract the point name to detach */
- strlcpy(names[count], condition->name, INJ_NAME_MAXLEN);
- count++;
+ (void) InjectionPointDetach(name);
}
- SpinLockRelease(&inj_state->lock);
-
- /* Detach, without holding the spinlock */
- for (int i = 0; i < count; i++)
- InjectionPointDetach(names[i]);
-
- /* Clear all the conditions */
- SpinLockAcquire(&inj_state->lock);
- for (int i = 0; i < INJ_MAX_CONDITION; i++)
- {
- InjectionPointCondition *condition = &inj_state->conditions[i];
-
- if (condition->name[0] == '\0')
- continue;
-
- if (condition->pid != MyProcPid)
- continue;
-
- condition->name[0] = '\0';
- condition->pid = 0;
- }
- SpinLockRelease(&inj_state->lock);
}
/* Set of callbacks available to be attached to an injection point. */
void
-injection_error(const char *name)
+injection_error(const char *name, const void *private_data)
{
- if (!injection_point_allowed(name))
+ InjectionPointCondition *condition = (InjectionPointCondition *) private_data;
+
+ if (!injection_point_allowed(condition))
return;
elog(ERROR, "error triggered for injection point %s", name);
}
void
-injection_notice(const char *name)
+injection_notice(const char *name, const void *private_data)
{
- if (!injection_point_allowed(name))
+ InjectionPointCondition *condition = (InjectionPointCondition *) private_data;
+
+ if (!injection_point_allowed(condition))
return;
elog(NOTICE, "notice triggered for injection point %s", name);
@@ -230,16 +198,17 @@ injection_notice(const char *name)
/* Wait on a condition variable, awaken by injection_points_wakeup() */
void
-injection_wait(const char *name)
+injection_wait(const char *name, const void *private_data)
{
uint32 old_wait_counts = 0;
int index = -1;
uint32 injection_wait_event = 0;
+ InjectionPointCondition *condition = (InjectionPointCondition *) private_data;
if (inj_state == NULL)
injection_init_shmem();
- if (!injection_point_allowed(name))
+ if (!injection_point_allowed(condition))
return;
/*
@@ -301,6 +270,7 @@ injection_points_attach(PG_FUNCTION_ARGS)
char *name = text_to_cstring(PG_GETARG_TEXT_PP(0));
char *action = text_to_cstring(PG_GETARG_TEXT_PP(1));
char *function;
+ InjectionPointCondition condition = {0};
if (strcmp(action, "error") == 0)
function = "injection_error";
@@ -311,37 +281,24 @@ injection_points_attach(PG_FUNCTION_ARGS)
else
elog(ERROR, "incorrect action \"%s\" for injection point creation", action);
- InjectionPointAttach(name, "injection_points", function);
+ if (injection_point_local)
+ {
+ condition.type = INJ_CONDITION_PID;
+ condition.pid = MyProcPid;
+ }
+
+ InjectionPointAttach(name, "injection_points", function, &condition,
+ sizeof(InjectionPointCondition));
if (injection_point_local)
{
- int index = -1;
+ MemoryContext oldctx;
- /*
- * Register runtime condition to link this injection point to the
- * current process.
- */
- SpinLockAcquire(&inj_state->lock);
- for (int i = 0; i < INJ_MAX_CONDITION; i++)
- {
- InjectionPointCondition *condition = &inj_state->conditions[i];
-
- if (condition->name[0] == '\0')
- {
- index = i;
- strlcpy(condition->name, name, INJ_NAME_MAXLEN);
- condition->pid = MyProcPid;
- break;
- }
- }
- SpinLockRelease(&inj_state->lock);
-
- if (index < 0)
- elog(FATAL,
- "could not find free slot for condition of injection point %s",
- name);
+ /* Local injection point, so track it for automated cleanup */
+ oldctx = MemoryContextSwitchTo(TopMemoryContext);
+ inj_list_local = lappend(inj_list_local, makeString(pstrdup(name)));
+ MemoryContextSwitchTo(oldctx);
}
-
PG_RETURN_VOID();
}
@@ -430,24 +387,18 @@ injection_points_detach(PG_FUNCTION_ARGS)
{
char *name = text_to_cstring(PG_GETARG_TEXT_PP(0));
- InjectionPointDetach(name);
+ if (!InjectionPointDetach(name))
+ elog(ERROR, "could not detach injection point \"%s\"", name);
- if (inj_state == NULL)
- injection_init_shmem();
-
- /* Clean up any conditions associated to this injection point */
- SpinLockAcquire(&inj_state->lock);
- for (int i = 0; i < INJ_MAX_CONDITION; i++)
+ /* Remove point from local list, if required */
+ if (inj_list_local != NIL)
{
- InjectionPointCondition *condition = &inj_state->conditions[i];
+ MemoryContext oldctx;
- if (strcmp(condition->name, name) == 0)
- {
- condition->pid = 0;
- condition->name[0] = '\0';
- }
+ oldctx = MemoryContextSwitchTo(TopMemoryContext);
+ inj_list_local = list_delete(inj_list_local, makeString(name));
+ MemoryContextSwitchTo(oldctx);
}
- SpinLockRelease(&inj_state->lock);
PG_RETURN_VOID();
}
diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
index 7d053698a2..f5cb4b2212 100644
--- a/doc/src/sgml/xfunc.sgml
+++ b/doc/src/sgml/xfunc.sgml
@@ -3624,12 +3624,16 @@ INJECTION_POINT(name);
<programlisting>
extern void InjectionPointAttach(const char *name,
const char *library,
- const char *function);
+ const char *function,
+ const void *private_data,
+ int private_data_size);
</programlisting>
<literal>name</literal> is the name of the injection point, which when
reached during execution will execute the <literal>function</literal>
- loaded from <literal>library</literal>.
+ loaded from <literal>library</literal>. <literal>private_data</literal>
+ is a private area of data of size <literal>private_data_size</literal>
+ given as an argument to the callback when executed.
</para>
<para>
@@ -3637,7 +3641,7 @@ extern void InjectionPointAttach(const char *name,
<literal>InjectionPointCallback</literal>:
<programlisting>
static void
-custom_injection_callback(const char *name)
+custom_injection_callback(const char *name, const void *private_data)
{
elog(NOTICE, "%s: executed custom callback", name);
}
@@ -3650,8 +3654,10 @@ custom_injection_callback(const char *name)
<para>
Optionally, it is possible to detach an injection point by calling:
<programlisting>
-extern void InjectionPointDetach(const char *name);
+extern bool InjectionPointDetach(const char *name);
</programlisting>
+ On success, <literal>true</literal> is returned, <literal>false</literal>
+ otherwise.
</para>
<para>
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 2311f82d81..34ec87a85e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1219,6 +1219,7 @@ InitializeDSMForeignScan_function
InitializeWorkerForeignScan_function
InjectionPointCacheEntry
InjectionPointCondition
+InjectionPointConditionType
InjectionPointEntry
InjectionPointSharedState
InlineCodeBlock
--
2.43.0
On 10 May 2024, at 06:04, Michael Paquier <michael@paquier.xyz> wrote:
Attached is an updated patch for now
Can you please add some more comments regarding the purpose of private data?
I somewhat lost track of the discussion for a week or so, and I hoped to grasp the idea of private_data from the resulting code, but I cannot do so from the current patch version...
I see that you store the condition in private_data. So "private" means that this is data specific to the extension, do I understand it right?
While I'm at it, I also want to ask some more stupid questions:
1. Where is the border between the responsibility of an extension and the core part? I mean, can we define in simple words what functionality must live in the extension?
2. If we have some concurrency issues, why can't we just protect everything with one giant LWLock/spinlock? We have a locking model instead of serializing access from entry until exit.
Most probably, this was discussed somewhere, but I could not find it.
Thanks!
Best regards, Andrey Borodin.
On Fri, May 10, 2024 at 10:04:17AM +0900, Michael Paquier wrote:
Attached is an updated patch for now, indented with a happy CI. I am
still planning to look at that a second time on Monday with a fresher
mind, in case I'm missing something now.
This looks correct, and it works well in my tests. Thanks.
On Sun, May 12, 2024 at 10:48:51AM -0700, Noah Misch wrote:
This looks correct, and it works well in my tests. Thanks.
Thanks for looking. While looking at it yesterday, I decided to
split the change into two commits, one for the infra and one for the
module. While doing so, I noticed that the case of a private area
passed as NULL was not completely correct, as the memcpy would be
undefined behavior.
The open item has been marked as fixed.
--
Michael
On Sat, May 11, 2024 at 11:45:33AM +0500, Andrey M. Borodin wrote:
I see that you store the condition in private_data. So "private" means
that this is data specific to the extension, do I understand it right?
Yes, it is possible to pass down to the registered callbacks some
custom data generated in a module. One example would be a more complex
condition grammar, like a JSON-based thing. I don't really see the
need for this amount of complexity in the tree yet, but one could do
that outside the tree easily.
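To make that concrete, here is a hedged sketch of what an out-of-tree module
could do with private_data, based only on the InjectionPointAttach() and
callback signatures shown in the patch; the condition structure and the
database-OID check are invented for the example:

#include "postgres.h"

#include "miscadmin.h"
#include "utils/injection_point.h"

/* Custom condition defined by the module, passed through private_data. */
typedef struct MyCondition
{
	Oid			dboid;			/* run only when connected to this database */
} MyCondition;

extern PGDLLEXPORT void my_injection_error(const char *name,
										   const void *private_data);

void
my_injection_error(const char *name, const void *private_data)
{
	const MyCondition *cond = (const MyCondition *) private_data;

	if (OidIsValid(cond->dboid) && cond->dboid != MyDatabaseId)
		return;					/* condition not satisfied, do nothing */

	elog(ERROR, "error triggered for injection point %s", name);
}

/* Attach the point so that it only fires in the current database. */
static void
my_attach_for_current_database(const char *name)
{
	MyCondition cond = {.dboid = MyDatabaseId};

	InjectionPointAttach(name, "my_module", "my_injection_error",
						 &cond, sizeof(cond));
}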
While I'm at it, I also want to ask some more stupid
questions:
1. Where is the border between the responsibility of an extension and
the core part? I mean, can we define in simple words what
functionality must live in the extension?
Rule 0 I've been using here: keep the footprint on the backend as
simple as possible. These have, as an absolute minimum requirement:
- A function name.
- A library name.
- A point name.
The private area contents and size are added to address the
concurrency cases with runtime checks. I didn't see a strong use for
that first, but Noah has been convincing enough with his use cases and
the fact that the race between detach and run was not completely
closed because we lacked consistency with the shmem hash table lookup.
2. If we have some concurrency issues, why can't we just protect
everything with one giant LWLock/spinlock? We have a locking
model instead of serializing access from entry until exit.
That would reduce the test infrastructure's flexibility, because one may want
to attach or detach injection points while a point is running. So it
is by design that the LWLock protecting the shmem hash table is not held
when a point is running. This has been covered a bit upthread, and I
want to be able to do that as well.
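As a simplified sketch of that pattern (modelled on InjectionPointRun() in
the patch above, with the library loading and error paths elided), the
shared hash table is only consulted while the LWLock is held, and the
callback itself runs after the lock has been released:

static void
run_point_sketch(const char *name)
{
	bool		found;
	InjectionPointCallback injection_callback;
	const void *private_data;

	/* Consult the shared hash table only while holding the LWLock. */
	LWLockAcquire(InjectionPointLock, LW_SHARED);
	(void) hash_search(InjectionPointHash, name, HASH_FIND, &found);
	LWLockRelease(InjectionPointLock);

	if (!found)
		return;

	/*
	 * The callback and its private data come from the backend-local cache
	 * (populated on first use, not shown here), so no lock is held while
	 * the point executes and concurrent attach/detach stays possible.
	 */
	injection_callback = injection_point_cache_get(name, &private_data);
	if (injection_callback != NULL)
		injection_callback(name, private_data);
}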
--
Michael