Adding facility for injection points (or probe points?) for more advanced tests
Hi all,
I don't remember how many times in the last few years when I've had to
hack the backend to produce a test case that involves a weird race
condition across multiple processes running in the backend, to be able
to prove a point or just test a fix (one recent case: 2b8e5273e949).
Usually, I come to hardcoding stuff for the following situations:
- Trigger a PANIC, to force recovery.
- A FATAL, to take down a session, or just an ERROR.
- palloc() failure injection.
- Sleep to slow down a code path.
- Pause and release with condition variable.
And, while that's helpful to prove a point on a thread, nothing comes
out of it in terms of regression test coverage in the tree because
these tests are usually too slow and expensive, as they usually rely
on hardcoded timeouts. So that's pretty much attempting to emulate
what one would do with a debugger in a predictable way, without the
manual steps because human hands don't scale well.
The reason behind that is of course more advanced testing, to be able
to expand coverage when we have weird and non-deterministic race
issues to deal with, and the code becoming more complex every year
makes that even harder. Fault and failure injection in specific paths
comes into mind, additionally, particularly if you manage complex
projects based on Postgres.
So, please find attached a patch set that introduces an in-core
facility to be able to set what I'm calling here an "injection point",
that consists of being able to register in shared memory a callback
that can be run within a defined location of the code. It means that
it is not possible to trigger a callback before shared memory is set,
but I've faced far more the case where I wanted to trigger something
after shmem is set anyway. Persisting an injection point across
restarts is also possible by adding some through an extension's shmem
hook, as we do for custom LWLocks for example, as long as a library is
loaded.
This will remind a bit of what Alexander Korotkov has proposed here:
/messages/by-id/CAPpHfdtSEOHX8dSk9Qp+Z++i4BGQoffKip6JDWngEA+g7Z-XmQ@mail.gmail.com
Also, this is much closee to what Craig Ringer is mentioning here,
where it is named probe points, but I am using a minimal design that
allows to achieve the same:
/messages/by-id/CAPpHfdsn-hzneYNbX4qcY5rnwr-BA1ogOCZ4TQCKQAw9qa48kA@mail.gmail.com
A difference is that I don't really see a point in passing to the
callback triggered an area of data coming from the hash table itself,
as at the end a callback could just refer to an area in shared memory
or a static set of variables depending on what it wants, with one or
more injection points (say a location to set a state, and a second to
check it). So, at the end, the problem comes down in my opinion to
two things:
- Possibility to trigger a condition defined by some custom code, in
the backend (core code or even out-of-core).
- Possibility to define a location in the code where a named point
would be checked.
0001 introduces three APIs to create, run, and drop injection points:
+extern void InjectionPointCreate(const char *name,
+ InjectionPointCallback callback);
+extern void InjectionPointRun(const char *name);
+extern void InjectionPointDrop(const char *name);
Then one just needs to add a macro like that to trigger the callback
registered in the code to test:
INJECTION_POINT_RUN("String");
So the footprint in the core tree is not zero, but it is as minimal as
it can be.
I have added some documentation to explain that, as well. I am not
wedded to the name proposed in the patch, so if you feel there is
better, feel free to propose ideas.
This facility is hidden behind a specific configure/Meson switch,
making it a no-op by default:
--enable-injection-points
-Dinjection_points={ true | false }
0002 is a test module to test these routines, that I have kept a
maximum simple to ease review of the basics proposed here. This could
be extended further to propose more default modes with TAP tests on
its own, as I don't see a real point in having the SQL bits or some
common callbacks (like for the PANIC or the FATAL cases) in core.
Thoughts and comments are welcome.
--
Michael
On Wed, Oct 25, 2023 at 9:43 AM Michael Paquier <michael@paquier.xyz> wrote:
Hi all,
I don't remember how many times in the last few years when I've had to
hack the backend to produce a test case that involves a weird race
condition across multiple processes running in the backend, to be able
to prove a point or just test a fix (one recent case: 2b8e5273e949).
Usually, I come to hardcoding stuff for the following situations:
- Trigger a PANIC, to force recovery.
- A FATAL, to take down a session, or just an ERROR.
- palloc() failure injection.
- Sleep to slow down a code path.
- Pause and release with condition variable.
+1 for the feature.
TWIMW, here[1] is an interesting talk from pgconf.in 2020 on the similar
topic.
1] https://pgconf.in/conferences/pgconfin2020/program/proposals/101
Regards,
Amul Sul
On Wed, Oct 25, 2023 at 10:06:17AM +0530, Amul Sul wrote:
+1 for the feature.
TWIMW, here[1] is an interesting talk from pgconf.in 2020 on the similar
topic.1] https://pgconf.in/conferences/pgconfin2020/program/proposals/101
Right, this uses a shared hash table. There is a patch from 2019 that
summarizes this presentation as well:
/messages/by-id/CANXE4TdxdESX1jKw48xet-5GvBFVSq=4cgNeioTQff372KO45A@mail.gmail.com
A different idea is that this patch could leverage a bgworker instead
of having a footprint in the postmaster. FWIW, I think that my patch
is more flexible than the modes added by faultinjector.h (see 0001),
because the actions that can be taken should not be limited by the
core code: the point registered could just use what it wants as
callback, so an extension could register a custom thing as well.
--
Michael
Hi,
On Wed, 25 Oct 2023 at 07:13, Michael Paquier <michael@paquier.xyz> wrote:
Hi all,
I don't remember how many times in the last few years when I've had to
hack the backend to produce a test case that involves a weird race
condition across multiple processes running in the backend, to be able
to prove a point or just test a fix (one recent case: 2b8e5273e949).
Usually, I come to hardcoding stuff for the following situations:
- Trigger a PANIC, to force recovery.
- A FATAL, to take down a session, or just an ERROR.
- palloc() failure injection.
- Sleep to slow down a code path.
- Pause and release with condition variable.
I liked the idea; thanks for working on this!
What do you think about creating a function for updating the already
created injection point's callback or name (mostly callback)? For now,
you need to drop and recreate the injection point to change the
callback or the name.
Here is my code correctness review:
diff --git a/meson_options.txt b/meson_options.txt
+option('injection_points', type: 'boolean', value: true,
+ description: 'Enable injection points')
+
It is enabled by default while building with meson.
diff --git a/src/backend/utils/misc/injection_point.c
b/src/backend/utils/misc/injection_point.c
+ LWLockRelease(InjectionPointLock);
+
+ /* If not found, do nothing? */
+ if (!found)
+ return;
It would be good to log a warning message here.
I tried to compile that with -Dwerror=true -Dinjection_points=false
and got some errors (warnings):
injection_point.c: In function ‘InjectionPointShmemSize’:
injection_point.c:59:1: error: control reaches end of non-void
function [-Werror=return-type]
injection_point.c: At top level:
injection_point.c:32:14: error: ‘InjectionPointHashByName’ defined but
not used [-Werror=unused-variable]
test_injection_points.c: In function ‘test_injection_points_run’:
test_injection_points.c:69:21: error: unused variable ‘name’
[-Werror=unused-variable]
The test_injection_points test runs and passes although I set
-Dinjection_points=false. That could be misleading, IMO the test
should be skipped if Postgres is not compiled with the injection
points.
Regards,
Nazir Bilal Yavuz
Microsoft
On Mon, Nov 06, 2023 at 10:28:14PM +0300, Nazir Bilal Yavuz wrote:
I liked the idea; thanks for working on this!
Thanks for the review.
What do you think about creating a function for updating the already
created injection point's callback or name (mostly callback)? For now,
you need to drop and recreate the injection point to change the
callback or the name.
I am not sure if that's worth the addition. TBH, all the code I've
seen that would benefit from these APIs just set up a cluster,
register a few injection points with a module, and then run a set of
tests. They can also remove points. So I'm just aiming for simplest
for the moment.
Here is my code correctness review:
diff --git a/meson_options.txt b/meson_options.txt +option('injection_points', type: 'boolean', value: true, + description: 'Enable injection points') +It is enabled by default while building with meson.
Indeed, fixed.
diff --git a/src/backend/utils/misc/injection_point.c b/src/backend/utils/misc/injection_point.c + LWLockRelease(InjectionPointLock); + + /* If not found, do nothing? */ + if (!found) + return;It would be good to log a warning message here.
I don't think that's a good idea. If a code path defines a
INJECTION_POINT_RUN() we'd get spurious warnings except if a point is
always defined when the build switch is enabled.
I tried to compile that with -Dwerror=true -Dinjection_points=false
and got some errors (warnings):
Right, fixed these three.
The test_injection_points test runs and passes although I set
-Dinjection_points=false. That could be misleading, IMO the test
should be skipped if Postgres is not compiled with the injection
points.
The test suite has been using an alternate output, but perhaps you are
right that this has little value without the switch enabled anyway.
I've made the processing optional when the option is not used for
meson and ./configure (requires a variable in Makefile.global.in in
the latter case), removing the alternate output.
Please find v2.
--
Michael
Attachments:
v2-0001-Base-facility-for-injection-points.patchtext/x-diff; charset=us-asciiDownload+599-1
On Tue, Nov 07, 2023 at 05:01:16PM +0900, Michael Paquier wrote:
On Mon, Nov 06, 2023 at 10:28:14PM +0300, Nazir Bilal Yavuz wrote:
I liked the idea; thanks for working on this!
+1, this seems very useful.
+#ifdef USE_INJECTION_POINTS +#define INJECTION_POINT_RUN(name) InjectionPointRun(name) +#else +#define INJECTION_POINT_RUN(name) ((void) name) +#endif
nitpick: Why is the non-injection-point version "(void) name"? I see
"(void) true" used elsewhere for this purpose.
+ <para> + Here is an example of callback for + <literal>InjectionPointCallback</literal>: +<programlisting> +static void +custom_injection_callback(const char *name) +{ + elog(NOTICE, "%s: executed custom callback", name); +} +</programlisting>
Why do we provide the name to the callback functions?
Overall, the design looks pretty good to me. I think it's a good idea to
keep it simple to start with. Since this is really only intended for
special tests that run in special builds, it seems like we ought to be able
to change it easily in the future as needed.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Hi,
On 2023-10-25 13:13:38 +0900, Michael Paquier wrote:
So, please find attached a patch set that introduces an in-core
facility to be able to set what I'm calling here an "injection point",
that consists of being able to register in shared memory a callback
that can be run within a defined location of the code. It means that
it is not possible to trigger a callback before shared memory is set,
but I've faced far more the case where I wanted to trigger something
after shmem is set anyway. Persisting an injection point across
restarts is also possible by adding some through an extension's shmem
hook, as we do for custom LWLocks for example, as long as a library is
loaded.
I would like to see a few example tests using this facility - without that
it's a bit hard to judge how the impact on core code would be and how easy
tests are to write.
It also seems like there's a few bits and pieces missing to actually be able
to write interesting tests. It's one thing to be able to inject code, but what
you commonly want to do for tests is to actually wait for such a spot in the
code to be reached, then perhaps wait inside the "modified" code, and do
something else in the test script. But as-is a decent amount of C code would
need to be written to write such a test, from what I can tell?
Greetings,
Andres Freund
On Fri, Nov 10, 2023 at 02:44:25PM -0600, Nathan Bossart wrote:
On Tue, Nov 07, 2023 at 05:01:16PM +0900, Michael Paquier wrote:
+#ifdef USE_INJECTION_POINTS +#define INJECTION_POINT_RUN(name) InjectionPointRun(name) +#else +#define INJECTION_POINT_RUN(name) ((void) name) +#endifnitpick: Why is the non-injection-point version "(void) name"? I see
"(void) true" used elsewhere for this purpose.
Or (void) 0.
+ <para> + Here is an example of callback for + <literal>InjectionPointCallback</literal>: +<programlisting> +static void +custom_injection_callback(const char *name) +{ + elog(NOTICE, "%s: executed custom callback", name); +} +</programlisting>Why do we provide the name to the callback functions?
This is for the use of the same callback across multiple points, and
tracking the name of the event happening was making sense to me to
know which code path is being taken when a callback is called. One
thing that I got in mind as well here is to be able to register custom
wait events based on the name of the callback taken, for example on a
condition variable, a latch or a named LWLock.
Overall, the design looks pretty good to me. I think it's a good idea to
keep it simple to start with. Since this is really only intended for
special tests that run in special builds, it seems like we ought to be able
to change it easily in the future as needed.
Yes, my first idea is to keep the initial design minimal and take the
temperature. As far as I can see, there seem to not be any strong
objection with this basic design, still I agree that I need to show a
bit more code about its usability. I have some SQL and recovery cases
where this is handy and these have piled over time, including at least
two/three of them with more basic APIs in the test module may make
sense in the initial batch of what I am proposing here.
--
Michael
On Fri, Nov 10, 2023 at 06:32:27PM -0800, Andres Freund wrote:
I would like to see a few example tests using this facility - without that
it's a bit hard to judge how the impact on core code would be and how easy
tests are to write.
Sure. I was wondering if people would be interested in that first.
It also seems like there's a few bits and pieces missing to actually be able
to write interesting tests. It's one thing to be able to inject code, but what
you commonly want to do for tests is to actually wait for such a spot in the
code to be reached, then perhaps wait inside the "modified" code, and do
something else in the test script. But as-is a decent amount of C code would
need to be written to write such a test, from what I can tell?
Depends on what you'd want to achieve. As I mentioned at the top of
the thread, error, fatal, panics, hardcoded waits are the most common
cases I've seen in the last years. Conditional waits are not in the
main patch but these are simple to support done (I mean, as in the
0003 attached with a TAP example).
While on it, I have extended the patch in the hash table a library
name and a function name so as the callback is loaded each time an
injection point is run. (Perhaps the list of callbacks already loaded
in a process should be saved in a session-level static list/array to
avoid loading the same callbacks again, not sure if that's worth doing
for a test facility assuming that the number of times a callback is
called in a single session is usually very limited. Anyway, that
would be simple to add if people prefer this addition.)
Anyway, here is a short list of commits that could have taken benefit
from this facility. There are is much more, but that's a list I
grabbed quickly from my notes:
1) 8a4237908c0f
2) cb0cca188072
3) 7863ee4def65 (See /messages/by-id/YnT/Y2sEYj7pyOdc@paquier.xyz
where an expensive TAP test was included, and I've seen users facing
this bug in real life). Revert of the original is clean here as well.
The trick is simple: stop a restartpoint during a promotion, and let
the restartpoint finish after the promotion.
4) 409f9ca44713, where injecting an error would stress the consistency
of the data reset (mentioned an error injected at
/messages/by-id/YWZk6nmAzQZS4B/z@paquier.xyz). This reverts
cleanly even today.
5) b4721f39505b, quite similar (mentioned an error injection exactly
here: /messages/by-id/20181011033810.GB23570@paquier.xyz). This
one requires an error when a transaction is started, something can be
achieved if the error is triggered conditionally (note that hard
failure would prevent the transaction to begin with the initial
snapshot taken in InitPostgres, but the module could just use a static
variable to track that).
Among these, I have implemented two examples on top of the main patch
set in 0002 and 0003: 4) as a TAP test with replication commands and
an error injection, and 3) that relies on a custom wait event and a
conditional variable to make the test posted on the other thread
cheaper, with an injection point waiting for a condition variable in
the middle of a restartpoint in the checkpointer. I don't mean to
necessarily include all that in the upstream tree, these are just here
for reference first.
3) is the most interesting in this set, for sure. That was a nasty
problem, and some cheap coverage in the core tree could be really good
for it, so I'd like to propose for commit after more polishing. The
test of the bug 3) I am referring to takes originally 30~45s to run
and it was unstable as it could timeout. With an injection point it
takes 1~2s. Note that test_injection_points gains a wait/wake logic
to be able to use condition variables to wait on the restartpoint of a
promoted standby). Both tests are not shaped for prime day yet, but
that's enough for a set of examples IMHO to show what can be done.
Does it answer your questions?
--
Michael
Attachments:
v3-0001-Base-facility-for-injection-points.patchtext/x-diff; charset=us-asciiDownload+662-1
v3-0002-Add-regression-test-to-show-snapbuild-consistency.patchtext/x-diff; charset=us-asciiDownload+52-1
v3-0003-Add-basic-test-for-promotion-and-restart-race-con.patchtext/x-diff; charset=us-asciiDownload+228-1
Hello,
Good stuff here, I also have a bunch of bugfix commits that ended up not
having a test because of the need for a debugger or other interaction,
so let's move forward.
I think the docs (and the macro/function naming) describe things
backwards. In my mind, it is INJECTION_POINT_RUN() that creates the
injection point; then InjectionPointCreate() attaches something to it.
So I would rename the macro to just INJECTION_POINT() and the function
to InjectionPointAttach(). This way you're saying "attach function FN
from library L to the injection point P"; where P is an entity that is
being created by the INJECTION_POINT() call in the code.
You named the hash table InjectionPointHashByName, which seems weird.
Is there any *other* way to locate an injection point that is not by
name?
In this patch, injection points are instance-wide (because the hash
table is in shmem). As soon as you install a callback to one point,
that callback will be fired in every session. Maybe for some tests this
is OK (and in particular your TAP tests have them attached in one
->safe_psql call and then they hit a completely different session, which
wouldn't work if the attachments were process-local), but maybe one
would want them limited to some specific process. Maybe give an
optional PID so that if any other process hits that injection point,
nothing happens?
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)
On Tue, Nov 14, 2023 at 02:11:50PM +0100, Alvaro Herrera wrote:
Good stuff here, I also have a bunch of bugfix commits that ended up not
having a test because of the need for a debugger or other interaction,
so let's move forward.I think the docs (and the macro/function naming) describe things
backwards. In my mind, it is INJECTION_POINT_RUN() that creates the
injection point; then InjectionPointCreate() attaches something to it.
So I would rename the macro to just INJECTION_POINT() and the function
to InjectionPointAttach(). This way you're saying "attach function FN
from library L to the injection point P"; where P is an entity that is
being created by the INJECTION_POINT() call in the code.
Okay. I am not strongly attached to the terms used by the patch. The
first WIP I wrote used completely different terms.
You named the hash table InjectionPointHashByName, which seems weird.
Is there any *other* way to locate an injection point that is not by
name?
I am not sure what you mean here. Names are kind of the most
portable and simplest thing I could think of. Is there something else
you have in mind that would allow a mapping between a code path and
what should be run? Perhaps that's useful in some cases, but you were
also thinking about an in-core API where it is possible to retrieve a
list of callbacks based on a library name and/or a function name? I
didn't see a use for it, but why not.
In this patch, injection points are instance-wide (because the hash
table is in shmem). As soon as you install a callback to one point,
that callback will be fired in every session. Maybe for some tests this
is OK (and in particular your TAP tests have them attached in one
->safe_psql call and then they hit a completely different session, which
wouldn't work if the attachments were process-local), but maybe one
would want them limited to some specific process. Maybe give an
optional PID so that if any other process hits that injection point,
nothing happens?
Yes, still not something that's required in the core APIs or an
initial batch. This is something I've seen used and a central place
where the callbacks are registered allows that because the callback is
triggered based on a global state like a MyProcPid or a getpid(), so
it is possible to pass a condition to a callback when it is created
(or attached per your wording), with the condition maintained in a
shmem area that can be part of an extension module that defines the
callbacks (in test_injection_points). One trick sometimes is to know
the PID beforehand, which may need a second wait point (for example)
to make a test deterministic so as a test script has the time to get
the PID of a running session (bgworkers included) before the process
has time to do anything critical for the scenario tested.
An extra thing is that this design can be extended so as it could be
possible to pass down to the callback execution a private pointer of
data, though that's bound to the code path running the injection
point (not in the initial patch). Then it's up to the callback to
decide if it needs to do something or not (say, I don't want to run
this callback except if I am manipulating page N in an access method,
etc.). The conditional complexity is pushed to the injection
callbacks, not the core routines in charge is finding a callback or
attaching/creating one. I am not sure that it is a good idea to
enforce a specific conditional logic in the backend core code.
--
Michael
On 2023-Nov-15, Michael Paquier wrote:
On Tue, Nov 14, 2023 at 02:11:50PM +0100, Alvaro Herrera wrote:
You named the hash table InjectionPointHashByName, which seems weird.
Is there any *other* way to locate an injection point that is not by
name?I am not sure what you mean here. Names are kind of the most
portable and simplest thing I could think of.
Oh, I think you're overthinking what my comment was. I was saying, just
name it "InjectionPointsHash". Since there appears to be no room for
another hash table for injection points, then there's no need to specify
that this one is the ByName hash. I couldn't think of any other way to
organize the injection points either.
In this patch, injection points are instance-wide (because the hash
table is in shmem).Yes, still not something that's required in the core APIs or an
initial batch.
I agree that we can do the easy thing first and build it up later. I
just hope we don't get too wedded on the current interface because of
lack of time in the current release that we get stuck with it.
I am not sure that it is a good idea to enforce a specific conditional
logic in the backend core code.
Agreed, let's get more experience on what other types of tests people
want to build, and how are things going to interact with each other.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"No necesitamos banderas
No reconocemos fronteras" (Jorge González)
On Wed, Nov 15, 2023 at 12:21:40PM +0100, Alvaro Herrera wrote:
On 2023-Nov-15, Michael Paquier wrote:
Oh, I think you're overthinking what my comment was. I was saying, just
name it "InjectionPointsHash". Since there appears to be no room for
another hash table for injection points, then there's no need to specify
that this one is the ByName hash. I couldn't think of any other way to
organize the injection points either.
Aha, OK. No problem, this was itching me as well but I didn't see an
argument with changing these names, so I've renamed things a bit more.
Yes, still not something that's required in the core APIs or an
initial batch.I agree that we can do the easy thing first and build it up later. I
just hope we don't get too wedded on the current interface because of
lack of time in the current release that we get stuck with it.
One thing that I assume we will need with more advanced testing is the
possibility to pass down one (for a structure of data) or more
arguments to the callback a point is attached to. For that, it is
possible to add more macros, like INJECTION_POINT_1ARG,
INJECTION_POINT_ARG(), etc. that use some (void *) pointers. It would
be possible to add that even in stable branches, as need arises,
changing the structure of the shmem hash table if required to control
the way a callback is run.
At the end, I suspect that it is one of these things where we'll need
to adapt depending on what people want to do with this stuff. FWIW, I
can do already quite a bit with this minimalistic design and an
external extension. Custom wait events are also very handy to monitor
a wait.
I am not sure that it is a good idea to enforce a specific conditional
logic in the backend core code.Agreed, let's get more experience on what other types of tests people
want to build, and how are things going to interact with each other.
I've worked more on polishing the core facility today, with 0001
introducing the backend-side facility. One thing that I mentioned
lacking is a local cache for processes so as they don't load an
external callback more than once if run. So I have added this local
cache. When a point is scanned but not found, a previous cache entry
is cleared if any (this leaks but we don't have a way to unload stuff,
and for testing that's not a big deal). I've renamed the routines to
use attach and detach as terms, and adopted the name you've suggested
for the macro. The names around the hash table and its entries have
been changed to what you've suggested. You are right, that's more
intuitive.
0002 is the test module for the basics, split into its own patch, with
a couple of tests for the local cache part. 0003 and 0004 have been
adjusted with the new SQL functions. At the end, I'd like to propose
0004 as it's been a PITA for me and I don't want to break this case
again. It needs more work and can be cheaper. One more argument in
favor of it is the addition of condition variables to wait and wake
points (perhaps with even more variables?) in the test module.
If there is interest for 0003, I'm OK to work more on it as well, but
that's less important IMV.
Thoughts and comments are welcome, with a v4 series attached.
--
Michael
Attachments:
v4-0001-Add-backend-facility-for-injection-points.patchtext/x-diff; charset=us-asciiDownload+529-1
v4-0002-Add-test-module-test_injection_points.patchtext/x-diff; charset=us-asciiDownload+352-1
v4-0003-Add-regression-test-to-show-snapbuild-consistency.patchtext/x-diff; charset=us-asciiDownload+57-1
v4-0004-Add-basic-test-for-promotion-and-restart-race-con.patchtext/x-diff; charset=us-asciiDownload+223-1
On Wed, Oct 25, 2023 at 9:43 AM Michael Paquier <michael@paquier.xyz> wrote:
Hi all,
I don't remember how many times in the last few years when I've had to
hack the backend to produce a test case that involves a weird race
condition across multiple processes running in the backend, to be able
to prove a point or just test a fix (one recent case: 2b8e5273e949).
Usually, I come to hardcoding stuff for the following situations:
- Trigger a PANIC, to force recovery.
- A FATAL, to take down a session, or just an ERROR.
- palloc() failure injection.
- Sleep to slow down a code path.
- Pause and release with condition variable.And, while that's helpful to prove a point on a thread, nothing comes
out of it in terms of regression test coverage in the tree because
these tests are usually too slow and expensive, as they usually rely
on hardcoded timeouts. So that's pretty much attempting to emulate
what one would do with a debugger in a predictable way, without the
manual steps because human hands don't scale well.The reason behind that is of course more advanced testing, to be able
to expand coverage when we have weird and non-deterministic race
issues to deal with, and the code becoming more complex every year
makes that even harder. Fault and failure injection in specific paths
comes into mind, additionally, particularly if you manage complex
projects based on Postgres.So, please find attached a patch set that introduces an in-core
facility to be able to set what I'm calling here an "injection point",
that consists of being able to register in shared memory a callback
that can be run within a defined location of the code. It means that
it is not possible to trigger a callback before shared memory is set,
but I've faced far more the case where I wanted to trigger something
after shmem is set anyway. Persisting an injection point across
restarts is also possible by adding some through an extension's shmem
hook, as we do for custom LWLocks for example, as long as a library is
loaded.This will remind a bit of what Alexander Korotkov has proposed here:
/messages/by-id/CAPpHfdtSEOHX8dSk9Qp+Z++i4BGQoffKip6JDWngEA+g7Z-XmQ@mail.gmail.com
Also, this is much closee to what Craig Ringer is mentioning here,
where it is named probe points, but I am using a minimal design that
allows to achieve the same:
/messages/by-id/CAPpHfdsn-hzneYNbX4qcY5rnwr-BA1ogOCZ4TQCKQAw9qa48kA@mail.gmail.comA difference is that I don't really see a point in passing to the
callback triggered an area of data coming from the hash table itself,
as at the end a callback could just refer to an area in shared memory
or a static set of variables depending on what it wants, with one or
more injection points (say a location to set a state, and a second to
check it). So, at the end, the problem comes down in my opinion to
two things:
- Possibility to trigger a condition defined by some custom code, in
the backend (core code or even out-of-core).
- Possibility to define a location in the code where a named point
would be checked.0001 introduces three APIs to create, run, and drop injection points: +extern void InjectionPointCreate(const char *name, + InjectionPointCallback callback); +extern void InjectionPointRun(const char *name); +extern void InjectionPointDrop(const char *name);Then one just needs to add a macro like that to trigger the callback
registered in the code to test:
INJECTION_POINT_RUN("String");
So the footprint in the core tree is not zero, but it is as minimal as
it can be.I have added some documentation to explain that, as well. I am not
wedded to the name proposed in the patch, so if you feel there is
better, feel free to propose ideas.
Actually with Attach and Detach terminology, INJECTION_POINT becomes
the place where we "declare" the injection point. So the documentation
needs to first explain INJECTION_POINT and then explain the other
operations.
This facility is hidden behind a specific configure/Meson switch,
making it a no-op by default:
--enable-injection-points
-Dinjection_points={ true | false }
That's useful, but we will also see demand to enable those in
production (under controlled circumstances). But let the functionality
mature under a separate flag and developer builds before used in
production.
0002 is a test module to test these routines, that I have kept a
maximum simple to ease review of the basics proposed here. This could
be extended further to propose more default modes with TAP tests on
its own, as I don't see a real point in having the SQL bits or some
common callbacks (like for the PANIC or the FATAL cases) in core.Thoughts and comments are welcome.
I think this is super useful functionality. Some comments here.
+/*
+ * Local cache of injection callbacks already loaded, stored in
+ * TopMemoryContext.
+ */
+typedef struct InjectionPointArrayEntry
+{
+ char name[INJ_NAME_MAXLEN];
+ InjectionPointCallback callback;
+} InjectionPointArrayEntry;
+
+static InjectionPointArrayEntry *InjectionPointCacheArray = NULL;
Initial size of this array is small, but given the size of code in a
given path to be tested, I fear that this may not be sufficient. I
feel it's better to use hash table whose APIs are already available.
+ test_injection_points_attach
+------------------------------
+
+(1 row)
+
+SELECT test_injection_points_run('TestInjectionBooh'); -- nothing
I find it pretty useless to expose that as a SQL API. Also adding it
in tests would make it usable as an example. In this patchset
INJECTION_POINT should be the only way to trigger an injection point.
That also brings another question. The INJECTION_POINT needs to be added in the
core code but can only be used through an extension. I think there should be
some rudimentary albeit raw test in core for this. Extension does some good
things like provides built-in actions when the injection is triggered. So, we
should keep those too.
I am glad that it covers the frequently needed injections error, notice and
wait. If someone adds multiple injection points for wait and all of them are
triggered (in different backends), test_injection_points_wake() would
wake them all. When debugging cascaded functionality it's not easy to
follow sequence add, trigger, wake for multiple injections. We should
associate a conditional variable with the required injection points. Attach
should create conditional variable in the shared memory for that injection
point and detach should remove it. I see this being mentioned in the commit
message, but I think it's something we need in the first version of "wait" to
be useful.
At some point we may want to control how many times an injection is
triggered by using a count. Often, I have wanted an ERROR to be thrown
in a code path once or twice and then stop triggering it. For example
to test WAL sender restart after a certain event. We can't really time
Detach correctly to avoid multiple restarts. A count is useful is such
a case.
+/*
+ * Attach a new injection point.
+ */
+void
+InjectionPointAttach(const char *name,
+ const char *library,
+ const char *function)
+{
+#ifdef USE_INJECTION_POINTS
In a non-injection-point build this function would be compiled and a call to
this function would throw an error. This means that any test we write which
uses injection points can not do so optionally. Tests which can be run with and
without injection points built will reduce duplication. We should define this
function as no-op in non-injection-point build to faciliate such tests.
Those tests need to also install extension. That's another pain point.
So anyone wants to run the tests needs to compile the extension too. I
am wondering whether we should have this functionality in the core
itself somewhere and will be only useful when built with injection.
Many a times it's only a single backend which needs to be subjected to
an injection. For inducing ERROR and NOTICE, many a times it's also
the same backend attached the client session. For WAIT, however we
need a way to inject from some other session. We might be able to use
current signalling mechanism for that (wake sends SIGUSR1 with
reason). Leaving aside WAIT for a moment when the same backend's
behaviour is being controlled, do we really need shared memory and
also affect all the running backends. I see some discussion about
being able to trigger only for a given PID, but when that PID of the
current backend itself shared memory is not required.
--
Best Wishes,
Ashutosh Bapat
On Mon, Nov 20, 2023 at 04:53:45PM +0530, Ashutosh Bapat wrote:
On Wed, Oct 25, 2023 at 9:43 AM Michael Paquier <michael@paquier.xyz> wrote:
I have added some documentation to explain that, as well. I am not
wedded to the name proposed in the patch, so if you feel there is
better, feel free to propose ideas.Actually with Attach and Detach terminology, INJECTION_POINT becomes
the place where we "declare" the injection point. So the documentation
needs to first explain INJECTION_POINT and then explain the other
operations.
Sure.
This facility is hidden behind a specific configure/Meson switch,
making it a no-op by default:
--enable-injection-points
-Dinjection_points={ true | false }That's useful, but we will also see demand to enable those in
production (under controlled circumstances). But let the functionality
mature under a separate flag and developer builds before used in
production.
Hmm. Okay. I'd still keep that under a compile switch for now
anyway to keep a cleaner separation and avoid issues where it would be
possible to inject code in a production build. Note that I was
planning to switch one of my buildfarm animals to use it should this
stuff make it into the tree. And people would be free to use it if
they want. If in production, that would be a risk, IMO.
+/* + * Local cache of injection callbacks already loaded, stored in + * TopMemoryContext. + */ +typedef struct InjectionPointArrayEntry +{ + char name[INJ_NAME_MAXLEN]; + InjectionPointCallback callback; +} InjectionPointArrayEntry; + +static InjectionPointArrayEntry *InjectionPointCacheArray = NULL;Initial size of this array is small, but given the size of code in a
given path to be tested, I fear that this may not be sufficient. I
feel it's better to use hash table whose APIs are already available.
I've never seen in recent years a need for a given test to use more
than 4~5 points. But I guess that you've seen more than that wanted
in a prod environment with a fork of Postgres?
I'm OK to switch that to use a hash table in the initial
implementation, even for a low number of entries with points that are
not in hot code paths that should be OK. At least for my cases
related to testing that's OK. Am I right to assume that you mean a
HTAB in TopMemoryContext?
I find it pretty useless to expose that as a SQL API. Also adding it
in tests would make it usable as an example. In this patchset
INJECTION_POINT should be the only way to trigger an injection point.
That's useful to test the cache logic while providing simple coverage
for the core API, and that's cheap. So I'd rather keep this test
module around with these tests.
That also brings another question. The INJECTION_POINT needs to be added in the
core code but can only be used through an extension. I think there should be
some rudimentary albeit raw test in core for this. Extension does some good
things like provides built-in actions when the injection is triggered. So, we
should keep those too.
Yeah, I'd like to agree with that but everything I've seen in the
recent years is that every setup finishes by having different
assumptions about what they want to do, so my intention is to trim
down the core of the patch to a bare acceptable minimum and then work
on top of that.
I am glad that it covers the frequently needed injections error, notice and
wait. If someone adds multiple injection points for wait and all of them are
triggered (in different backends), test_injection_points_wake() would
wake them all. When debugging cascaded functionality it's not easy to
follow sequence add, trigger, wake for multiple injections. We should
associate a conditional variable with the required injection points. Attach
should create conditional variable in the shared memory for that injection
point and detach should remove it. I see this being mentioned in the commit
message, but I think it's something we need in the first version of "wait" to
be useful.
More to the point, I actually disagree with that, because it could be
possible as well that the same condition variable is used across
multiple points. At the end, IMHO, the central hash table should hold
zero meta-data associated to an injection point like a counter, an
elog, a condition variable, a sleep time, etc. or any combination of
all these ones, and should just know about how to load a callback with
a library path and a routine name. I understand that this is leaving
the responsibility of what a callback should do down to the individual
developer implementing a callback into their own extension, where they
should be free to have conditions of their own.
Something that I agree would be very useful for the in-core APIs,
depending on the cases, is to be able to push some information to the
callback at runtime to let a callback decide what to do depending on a
running state, including a condition registered when a point was
attached. See my argument about an _ARG macro that passes down to the
callback a (void *).
At some point we may want to control how many times an injection is
triggered by using a count. Often, I have wanted an ERROR to be thrown
in a code path once or twice and then stop triggering it. For example
to test WAL sender restart after a certain event. We can't really time
Detach correctly to avoid multiple restarts. A count is useful is such
a case.
Yeah. That's also something that can be achieved outside the shmem
hash table, so this is intentionally not part of InjectionPointHash.
+/* + * Attach a new injection point. + */ +void +InjectionPointAttach(const char *name, + const char *library, + const char *function) +{ +#ifdef USE_INJECTION_POINTSIn a non-injection-point build this function would be compiled and a call to
this function would throw an error. This means that any test we write which
uses injection points can not do so optionally. Tests which can be run with and
without injection points built will reduce duplication. We should define this
function as no-op in non-injection-point build to faciliate such tests.
The argument goes both ways, I guess. I'd rather choose a hard
failure to know that what I am doing is not silently ignored, which is
why I made this choice in the patch.
Those tests need to also install extension. That's another pain point.
So anyone wants to run the tests needs to compile the extension too. I
am wondering whether we should have this functionality in the core
itself somewhere and will be only useful when built with injection.
That's something that may be discussed on top of the backend APIs,
though this comes down to how and what kind of meta-data should be
associated to the central shmem hash table. Keeping the shmem hash as
small as possible to keep minimal the traces of this code in core is a
design choice that I'd rather not change.
Many a times it's only a single backend which needs to be subjected to
an injection. For inducing ERROR and NOTICE, many a times it's also
the same backend attached the client session.
Yep. I've used that across multiple sessions. For the basic
facility, I think that's the absolute minimum.
For WAIT, however we
need a way to inject from some other session.
You can do that already with the patch, no? If you know that a
different session would cross a given path, you could set a macro in
it. If you wish for this session to wait before that, it is possible
to use a second point to make it do so. I've used such techniques as
well for more complex reproducible failures than what I've posted in
the patch series. In the last months, I've topped a TAP test to rely
on 5 deterministic points, I think. Or perhaps 6. That was a fun
exercise, for a TAP test coded while self-complaining about the core
backend code that does not make this stuff easier.
We might be able to use
current signalling mechanism for that (wake sends SIGUSR1 with
reason). Leaving aside WAIT for a moment when the same backend's
behaviour is being controlled, do we really need shared memory and
also affect all the running backends. I see some discussion about
being able to trigger only for a given PID, but when that PID of the
current backend itself shared memory is not required.
I am not convinced that there is any need for signalling in most
cases, as long as you know beforehand the PID of the session you'd
like to stop, because this would still require a second session to
register a condition based on the PID known. Another possibility that
I can think of is to use a custom wait event with a second point to
setup a different condition. At the end, my point is that it is
possible to control everything in some extension code that holds the
callbacks, with an extra shmem area in the extension that associates
some meta-data with a point name, for instance.
--
Michael
On Tue, Nov 21, 2023 at 6:56 AM Michael Paquier <michael@paquier.xyz> wrote:
This facility is hidden behind a specific configure/Meson switch,
making it a no-op by default:
--enable-injection-points
-Dinjection_points={ true | false }That's useful, but we will also see demand to enable those in
production (under controlled circumstances). But let the functionality
mature under a separate flag and developer builds before used in
production.Hmm. Okay. I'd still keep that under a compile switch for now
anyway to keep a cleaner separation and avoid issues where it would be
possible to inject code in a production build. Note that I was
planning to switch one of my buildfarm animals to use it should this
stuff make it into the tree. And people would be free to use it if
they want. If in production, that would be a risk, IMO.
makes sense. Just to be clear - by "in production" I mean user
installations - they may be testing environments or production
environments.
+/* + * Local cache of injection callbacks already loaded, stored in + * TopMemoryContext. + */ +typedef struct InjectionPointArrayEntry +{ + char name[INJ_NAME_MAXLEN]; + InjectionPointCallback callback; +} InjectionPointArrayEntry; + +static InjectionPointArrayEntry *InjectionPointCacheArray = NULL;Initial size of this array is small, but given the size of code in a
given path to be tested, I fear that this may not be sufficient. I
feel it's better to use hash table whose APIs are already available.I've never seen in recent years a need for a given test to use more
than 4~5 points. But I guess that you've seen more than that wanted
in a prod environment with a fork of Postgres?
A given case may not require more than 4 -5 points but users may
create scripts with many frequently required injection points and
install handlers for them.
I'm OK to switch that to use a hash table in the initial
implementation, even for a low number of entries with points that are
not in hot code paths that should be OK. At least for my cases
related to testing that's OK. Am I right to assume that you mean a
HTAB in TopMemoryContext?
Yes.
I am glad that it covers the frequently needed injections error, notice and
wait. If someone adds multiple injection points for wait and all of them are
triggered (in different backends), test_injection_points_wake() would
wake them all. When debugging cascaded functionality it's not easy to
follow sequence add, trigger, wake for multiple injections. We should
associate a conditional variable with the required injection points. Attach
should create conditional variable in the shared memory for that injection
point and detach should remove it. I see this being mentioned in the commit
message, but I think it's something we need in the first version of "wait" to
be useful.More to the point, I actually disagree with that, because it could be
possible as well that the same condition variable is used across
multiple points. At the end, IMHO, the central hash table should hold
zero meta-data associated to an injection point like a counter, an
elog, a condition variable, a sleep time, etc. or any combination of
all these ones, and should just know about how to load a callback with
a library path and a routine name. I understand that this is leaving
the responsibility of what a callback should do down to the individual
developer implementing a callback into their own extension, where they
should be free to have conditions of their own.Something that I agree would be very useful for the in-core APIs,
depending on the cases, is to be able to push some information to the
callback at runtime to let a callback decide what to do depending on a
running state, including a condition registered when a point was
attached. See my argument about an _ARG macro that passes down to the
callback a (void *).
The injection_run function is called from the place where the
injection point is declared but that place does not know what
injection function is going to be run. So a user can not pass
arguments to injection declaration. What injection to run is decided
by the injection_attach and thus one can pass arguments to it but then
injection_attach stores the information in the shared memory from
where it's picked up by injection_run. So even though you don't want
to store the arguments in the shared memory, you are creating a design
which takes us towards that direction eventually - otherwise users
will end up writing many injection functions - one for each possible
combination of count, sleep, conditional variable etc. But let's see
whether that happens to be the case in practice. We will need to
evolve this feature based on usage.
Those tests need to also install extension. That's another pain point.
So anyone wants to run the tests needs to compile the extension too. I
am wondering whether we should have this functionality in the core
itself somewhere and will be only useful when built with injection.That's something that may be discussed on top of the backend APIs,
though this comes down to how and what kind of meta-data should be
associated to the central shmem hash table. Keeping the shmem hash as
small as possible to keep minimal the traces of this code in core is a
design choice that I'd rather not change.
shmem hash size won't depend upon the number of functions we write in
the core. Yes it will add to the core code and may add maintenance
burden. So I understand your inclination to keep the core minimal.
Many a times it's only a single backend which needs to be subjected to
an injection. For inducing ERROR and NOTICE, many a times it's also
the same backend attached the client session.Yep. I've used that across multiple sessions. For the basic
facility, I think that's the absolute minimum.For WAIT, however we
need a way to inject from some other session.You can do that already with the patch, no? If you know that a
different session would cross a given path, you could set a macro in
it. If you wish for this session to wait before that, it is possible
to use a second point to make it do so. I've used such techniques as
well for more complex reproducible failures than what I've posted in
the patch series. In the last months, I've topped a TAP test to rely
on 5 deterministic points, I think. Or perhaps 6. That was a fun
exercise, for a TAP test coded while self-complaining about the core
backend code that does not make this stuff easier.We might be able to use
current signalling mechanism for that (wake sends SIGUSR1 with
reason). Leaving aside WAIT for a moment when the same backend's
behaviour is being controlled, do we really need shared memory and
also affect all the running backends. I see some discussion about
being able to trigger only for a given PID, but when that PID of the
current backend itself shared memory is not required.I am not convinced that there is any need for signalling in most
cases, as long as you know beforehand the PID of the session you'd
like to stop, because this would still require a second session to
register a condition based on the PID known. Another possibility that
I can think of is to use a custom wait event with a second point to
setup a different condition. At the end, my point is that it is
possible to control everything in some extension code that holds the
callbacks, with an extra shmem area in the extension that associates
some meta-data with a point name, for instance.
If the session which attaches to an injection point is same as the
session where the injection point is triggered (most of the ERROR and
NOTICE injections will see this pattern), we don't need shared memory.
There's a performance penalty to it since injection_run will look up
shared memory. For WAIT we may or may not need shared memory. But
let's see what other think and what usage patterns we see.
--
Best Wishes,
Ashutosh Bapat
On Wed, Nov 22, 2023 at 09:23:21PM +0530, Ashutosh Bapat wrote:
On Tue, Nov 21, 2023 at 6:56 AM Michael Paquier <michael@paquier.xyz> wrote:
I've never seen in recent years a need for a given test to use more
than 4~5 points. But I guess that you've seen more than that wanted
in a prod environment with a fork of Postgres?A given case may not require more than 4 -5 points but users may
create scripts with many frequently required injection points and
install handlers for them.
Sure, if a callback is generic enough it could be shared across
multiple points.
The injection_run function is called from the place where the
injection point is declared but that place does not know what
injection function is going to be run. So a user can not pass
arguments to injection declaration.
It is possible to make that predictible but it means that a callback
is most likely to be used by one single point. This makes extensions
in charge of holding the callbacks more complicated, at the benefit of
keeping a minimal footprint in the backend code.
What injection to run is decided
by the injection_attach and thus one can pass arguments to it but then
injection_attach stores the information in the shared memory from
where it's picked up by injection_run. So even though you don't want
to store the arguments in the shared memory, you are creating a design
which takes us towards that direction eventually - otherwise users
will end up writing many injection functions - one for each possible
combination of count, sleep, conditional variable etc. But let's see
whether that happens to be the case in practice. We will need to
evolve this feature based on usage.
A one-one mapping between callback and point is not always necessary.
If you wish to use a combination of N points with a sleep callback and
different sleep times, one can just register a second shmem area in
the extension holding the callbacks that links the point names with
the sleep time to use.
shmem hash size won't depend upon the number of functions we write in
the core. Yes it will add to the core code and may add maintenance
burden. So I understand your inclination to keep the core minimal.
Yeah, without a clear benefit, my point is just to throw the
responsibility to extension developers for now, which would mean the
addition of tests that depend on test_injection_points/, or just
install this extension optionally in other code path that need it.
Maybe 0004 should be in src/test/recovery/ and do that, actually..
I'll most likely agree with extending all the backend stuff in a more
meaningful way, but I am not sure which method should be enforced.
If the session which attaches to an injection point is same as the
session where the injection point is triggered (most of the ERROR and
NOTICE injections will see this pattern), we don't need shared memory.
There's a performance penalty to it since injection_run will look up
shared memory. For WAIT we may or may not need shared memory. But
let's see what other think and what usage patterns we see.
The first POC of the patch that you can find at the top of this thread
did that, actually, but this is too limited. IMO, linking things to a
central table is just *much* more useful.
I've implemented a v5 that switches the cache to use a seconf hash
table on TopMemoryContext for the cache instead of an array. This
makes the cache handling slightly cleaner, so your suggestion was
right. 0003 and 0004 are the same as previously, passing or failing
under the same conditions. I'm wondering if folks have other comments
about 0001 and 0002? It sounds to me like the consensus is that this
stuff is useful and that there are no string objections, so feel free
to comment.
I don't want to propose 0003 in the tree, just an improved version of
0004 for the test coverage (still need to improve that).
--
Michael
Attachments:
v5-0001-Add-backend-facility-for-injection-points.patchtext/x-diff; charset=us-asciiDownload+497-1
v5-0002-Add-test-module-test_injection_points.patchtext/x-diff; charset=us-asciiDownload+352-1
v5-0003-Add-regression-test-to-show-snapbuild-consistency.patchtext/x-diff; charset=us-asciiDownload+57-1
v5-0004-Add-basic-test-for-promotion-and-restart-race-con.patchtext/x-diff; charset=us-asciiDownload+223-1
On Fri, Nov 24, 2023 at 7:26 AM Michael Paquier <michael@paquier.xyz> wrote:
If you wish to use a combination of N points with a sleep callback and
different sleep times, one can just register a second shmem area in
the extension holding the callbacks that links the point names with
the sleep time to use.
Interesting idea. For that the callback needs to know the injection
point name. At least we should pass that to the callback. It's trivial
thing to do.
shmem hash size won't depend upon the number of functions we write in
the core. Yes it will add to the core code and may add maintenance
burden. So I understand your inclination to keep the core minimal.Yeah, without a clear benefit, my point is just to throw the
responsibility to extension developers for now, which would mean the
addition of tests that depend on test_injection_points/, or just
install this extension optionally in other code path that need it.
Maybe 0004 should be in src/test/recovery/ and do that, actually..
That might work, but in order to run tests in that directory one has
to also install the extension. Do we have precedence for such kind of
dependency?
If the session which attaches to an injection point is same as the
session where the injection point is triggered (most of the ERROR and
NOTICE injections will see this pattern), we don't need shared memory.
There's a performance penalty to it since injection_run will look up
shared memory. For WAIT we may or may not need shared memory. But
let's see what other think and what usage patterns we see.The first POC of the patch that you can find at the top of this thread
did that, actually, but this is too limited. IMO, linking things to a
central table is just *much* more useful.I've implemented a v5 that switches the cache to use a seconf hash
table on TopMemoryContext for the cache instead of an array. This
makes the cache handling slightly cleaner, so your suggestion was
right.
glad that you liked the outcome.
0003 and 0004 are the same as previously, passing or failing
under the same conditions. I'm wondering if folks have other comments
about 0001 and 0002? It sounds to me like the consensus is that this
stuff is useful
I think so.
and that there are no string objections, so feel free
to comment.
Let's get some more opinions on the design. I will review the detailed
code then.
I don't want to propose 0003 in the tree, just an improved version of
0004 for the test coverage (still need to improve that).
Are you working on v6 already?
--
Best Wishes,
Ashutosh Bapat
On Fri, Nov 24, 2023 at 04:37:58PM +0530, Ashutosh Bapat wrote:
Interesting idea. For that the callback needs to know the injection
point name. At least we should pass that to the callback. It's trivial
thing to do.
This is what's done from the beginning, as well as of 0001 in the v5
series:
+INJECTION_POINT(name);
[...]
+ injection_callback(name);
That might work, but in order to run tests in that directory one has
to also install the extension. Do we have precedence for such kind of
dependency?
Yes, please see EXTRA_INSTALL in some of the Makefiles. This can
install stuff from paths different than the location where the tests
are run.
and that there are no string objections, so feel free
to comment.Let's get some more opinions on the design. I will review the detailed
code then.
Sure. Thanks.
I don't want to propose 0003 in the tree, just an improved version of
0004 for the test coverage (still need to improve that).Are you working on v6 already?
No, what would be the point at this stage? I dont have much more to
add to 0001 and 0002 at the moment, which focus on the core of the
problem.
--
Michael
On Fri, Nov 24, 2023 at 7:37 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Nov 24, 2023 at 04:37:58PM +0530, Ashutosh Bapat wrote:
Interesting idea. For that the callback needs to know the injection
point name. At least we should pass that to the callback. It's trivial
thing to do.This is what's done from the beginning, as well as of 0001 in the v5
series:
+INJECTION_POINT(name);
[...]
+ injection_callback(name);
In my first look I missed the actual call to the injection callback in
InjectionPointRun()
injection_callback(name);
Sorry for that.
The way I see it is that an extension using this functionality will
create an auxiliary lookup table keyed by the injection point name to
obtain the injection point specific arguments (sleep time, count etc.)
in the shared memory or local memory. Every time an injection callback
is called it will consult this look up table to get the arguments.
That looks ok to me. There might be other ways to achieve the same
effect. We will learn and absorb whatever benefits core and the users.
I like that.
That might work, but in order to run tests in that directory one has
to also install the extension. Do we have precedence for such kind of
dependency?Yes, please see EXTRA_INSTALL in some of the Makefiles. This can
install stuff from paths different than the location where the tests
are run.
WFM then.
and that there are no string objections, so feel free
to comment.Let's get some more opinions on the design. I will review the detailed
code then.Sure. Thanks.
I don't want to propose 0003 in the tree, just an improved version of
0004 for the test coverage (still need to improve that).Are you working on v6 already?
No, what would be the point at this stage? I dont have much more to
add to 0001 and 0002 at the moment, which focus on the core of the
problem.
Since you wroten "(still need to improve ...) I thought you are
working on v6. No problem. Sorry for the confusion.
--
Best Wishes,
Ashutosh Bapat