Add isolation test template in injection_points for wait/wakeup/detach

Started by Michael Paquierabout 1 year ago21 messages
#1Michael Paquier
michael@paquier.xyz
1 attachment(s)

Hi all,

This was on my TODO bucket for some time. The isolation test
"inplace" in the isolation test suite for injection_points relies on a
couple of behaviors implemented in the backend. One of them is that a
point detach should not affect a wait. Rather than having to guess
this stuff from the sole isolation test we have in this module, I
would like to add a simple test to document all these assumptions.

This also works as a simple template for a newcomer willing to
implement an isolation test with injection points. That's less
intimidating than the inplace test with its VACUUM and GRANT
interactions, based on an injection point run when updating the stats
of a relation in VACUUM.

Thoughts?
--
Michael

Attachments:

0001-injection_points-Add-basic-isolation-test.patchtext/x-diff; charset=us-asciiDownload
From 4a0ffa49c7a7b1d3d3b878780aee6f536f38a68f Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Fri, 18 Oct 2024 07:40:46 +0900
Subject: [PATCH] injection_points: Add basic isolation test

This can act as a template when implementing an isolation test with
injection points, and tracks in a much simpler way some of the behaviors
implied in the existing isolation test "inplace".  Particularly, a
detach should not affect a backend wait; a wait can should only be
stopped by a wakeup.
---
 src/test/modules/injection_points/Makefile    |  2 +-
 .../injection_points/expected/basic.out       | 74 +++++++++++++++++++
 src/test/modules/injection_points/meson.build |  1 +
 .../modules/injection_points/specs/basic.spec | 35 +++++++++
 4 files changed, 111 insertions(+), 1 deletion(-)
 create mode 100644 src/test/modules/injection_points/expected/basic.out
 create mode 100644 src/test/modules/injection_points/specs/basic.spec

diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index 8cb8c498e2..0753a9df58 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -13,7 +13,7 @@ PGFILEDESC = "injection_points - facility for injection points"
 REGRESS = injection_points reindex_conc
 REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
-ISOLATION = inplace
+ISOLATION = basic inplace
 
 TAP_TESTS = 1
 
diff --git a/src/test/modules/injection_points/expected/basic.out b/src/test/modules/injection_points/expected/basic.out
new file mode 100644
index 0000000000..840ce2dac9
--- /dev/null
+++ b/src/test/modules/injection_points/expected/basic.out
@@ -0,0 +1,74 @@
+Parsed test spec with 2 sessions
+
+starting permutation: wait1 wakeup2 detach2
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step wait1: SELECT injection_points_run('injection-points-wait'); <waiting ...>
+step wakeup2: SELECT injection_points_wakeup('injection-points-wait');
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step wait1: <... completed>
+injection_points_run
+--------------------
+                    
+(1 row)
+
+step detach2: SELECT injection_points_detach('injection-points-wait');
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+
+starting permutation: wait1 detach2 wakeup2
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step wait1: SELECT injection_points_run('injection-points-wait'); <waiting ...>
+step detach2: SELECT injection_points_detach('injection-points-wait');
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+step wakeup2: SELECT injection_points_wakeup('injection-points-wait');
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step wait1: <... completed>
+injection_points_run
+--------------------
+                    
+(1 row)
+
+
+starting permutation: detach2 wait1 wakeup2
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step detach2: SELECT injection_points_detach('injection-points-wait');
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+step wait1: SELECT injection_points_run('injection-points-wait');
+injection_points_run
+--------------------
+                    
+(1 row)
+
+step wakeup2: SELECT injection_points_wakeup('injection-points-wait');
+ERROR:  could not find injection point injection-points-wait to wake up
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index fdb5a25d7b..58f1900115 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -42,6 +42,7 @@ tests += {
   },
   'isolation': {
     'specs': [
+      'basic',
       'inplace',
     ],
   },
diff --git a/src/test/modules/injection_points/specs/basic.spec b/src/test/modules/injection_points/specs/basic.spec
new file mode 100644
index 0000000000..ccd49e31ed
--- /dev/null
+++ b/src/test/modules/injection_points/specs/basic.spec
@@ -0,0 +1,35 @@
+# Basic isolation test for injection points.
+#
+# This checks the interactions between wakeup, wait and detach.
+# Feel free to use it as a template when implementing an isolation
+# test with injection points.
+
+setup
+{
+	CREATE EXTENSION injection_points;
+}
+teardown
+{
+	DROP EXTENSION injection_points;
+}
+
+# Wait happens in the first session, wakeup in the second session.
+session s1
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('injection-points-wait', 'wait');
+}
+step wait1	{ SELECT injection_points_run('injection-points-wait'); }
+
+session s2
+step wakeup2	{ SELECT injection_points_wakeup('injection-points-wait'); }
+step detach2	{ SELECT injection_points_detach('injection-points-wait'); }
+
+# Detach after wait and wakeup.
+permutation wait1 wakeup2 detach2
+
+# Detach before wake.  s1 waits until wakeup, ignores the detach.
+permutation wait1 detach2 wakeup2
+
+# Detach before wait does not cause a wait.
+permutation detach2 wait1 wakeup2
-- 
2.45.2

#2Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#1)
Re: Add isolation test template in injection_points for wait/wakeup/detach

Hi,

On Fri, Oct 18, 2024 at 07:44:08AM +0900, Michael Paquier wrote:

Hi all,

This was on my TODO bucket for some time. The isolation test
"inplace" in the isolation test suite for injection_points relies on a
couple of behaviors implemented in the backend. One of them is that a
point detach should not affect a wait. Rather than having to guess
this stuff from the sole isolation test we have in this module, I
would like to add a simple test to document all these assumptions.

This also works as a simple template for a newcomer willing to
implement an isolation test with injection points. That's less
intimidating than the inplace test with its VACUUM and GRANT
interactions, based on an injection point run when updating the stats
of a relation in VACUUM.

Thoughts?

I think that it makes sense to "expose" the basics in a dedicated test.

I thought that template.spec could have been another "naming" option but
basic.spec is better I think.

The test is pretty straightforward so I don't have that much to say, just one Nit:

1 ===

+# Detach before wait does not cause a wait.

What about?

Detach before wait does not cause a wait (and so wakeup produces an error)

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#3Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#2)
Re: Add isolation test template in injection_points for wait/wakeup/detach

On Fri, Oct 18, 2024 at 07:58:26AM +0000, Bertrand Drouvot wrote:

I think that it makes sense to "expose" the basics in a dedicated test.

I thought that template.spec could have been another "naming" option but
basic.spec is better I think.

I am not wedded to the name I have proposed. No comment on my naming
sense..

+# Detach before wait does not cause a wait.

What about?

Detach before wait does not cause a wait (and so wakeup produces an error)

Sure, why not.
--
Michael

#4Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#3)
Re: Add isolation test template in injection_points for wait/wakeup/detach

On Fri, Oct 18, 2024 at 06:16:50PM +0900, Michael Paquier wrote:

Sure, why not.

Used "basic" as test name at the end, tweaked the comment close to
what you have suggested, and applied the result.

Thanks for the review.
--
Michael

#5Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#4)
Re: Add isolation test template in injection_points for wait/wakeup/detach

On Mon, Oct 21, 2024 at 11:13:59AM +0900, Michael Paquier wrote:

Used "basic" as test name at the end, tweaked the comment close to
what you have suggested, and applied the result.

I have spotted a race condition with this test in some CF bot runs.
Here are some of them:
https://github.com/michaelpq/postgres/runs/31984161747
https://cirrus-ci.com/task/6741385749987328
https://cirrus-ci.com/task/6056918689513472
https://cirrus-ci.com/task/5863397261049856
https://cirrus-ci.com/task/5565235765968896
https://cirrus-ci.com/task/6238594598174720

All of them show up as follows in regression.diffs:
diff -U3
/tmp/cirrus-ci-build/src/test/modules/injection_points/expected/basic.out
/tmp/cirrus-ci-build/build/testrun/injection_points/isolation/results/basic.out
---
/tmp/cirrus-ci-build/src/test/modules/injection_points/expected/basic.out
2024-10-24 02:36:24.068590000 +0000
+++
/tmp/cirrus-ci-build/build/testrun/injection_points/isolation/results/basic.out
2024-10-24 02:40:09.179695000 +0000
@@ -13,16 +13,16 @@

(1 row)

-step wait1: <... completed>
-injection_points_run
---------------------
-
-(1 row)
-
step detach2: SELECT injection_points_detach('injection-points-wait');
injection_points_detach
-----------------------

+(1 row)
+
+step wait1: <... completed>
+injection_points_run
+--------------------
+                    
 (1 row)

This is in the first permutation of the test done with "wait1 wakeup2
detach2", and the diff means that the backend running the "wait"
callback is reported as finished after the detach is done,
injection_points_run being only used for the waits. Hence the wait is
so slow to finish that the detach has time to complete and finish,
breaking the output.

And here comes the puzzling part: all of failures involve FreeBSD 13
in the CI. Reproducing this failure would not be difficult, I thought
first; we can add a hardcoded pg_usleep() to delay the end of
injection_wait() so as we make sure that the backend doing the wait
reports later than the detach. Or just do the same at the end of
injection_points_run() once the callback exits. I've sure done that,
placing some strategic pg_usleep() calls on Linux to make the paths
that matter in the wait slower, but the test remains stable. The CI
on Linux is stable as well: 3rd and 4th columns of the cfbot are
green, I did not spot any failures related to this isolation test in
injection_points. Only the second column about FreeBSD is going rogue
on a periodic basis.

One fix would is to remove this first permutation test, still that's
hiding a problem rather than solving it, and something looks wrong
with conditional variables, specific to FreeBSD? Hence, I'm puzzled
by that.

I am going to remove the first permutation anyway to stabilize the CI
reports. But it feels like we have a different problem here, and I am
not sure what.

Any thoughts or comments?
--
Michael

#6Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#5)
Re: Add isolation test template in injection_points for wait/wakeup/detach

On Fri, Oct 25, 2024 at 09:33:12AM +0900, Michael Paquier wrote:

I am going to remove the first permutation anyway to stabilize the CI
reports. But it feels like we have a different problem here, and I am
not sure what.

A few days later, I have looked at the full set of CF bot jobs and
9f00edc22888 has stabilized the CI. We went down from 20-ish failures
in the whole set of patches to 0.

I still don't have a clear idea about why this is limited to FreeBSD
13 and only for this permutation. The buildfarm did not report
anything, except if I've missed something, of course.
--
Michael

#7Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#5)
Re: Add isolation test template in injection_points for wait/wakeup/detach

Hi,

On Fri, Oct 25, 2024 at 09:33:12AM +0900, Michael Paquier wrote:

This is in the first permutation of the test done with "wait1 wakeup2
detach2", and the diff means that the backend running the "wait"
callback is reported as finished after the detach is done,
injection_points_run being only used for the waits. Hence the wait is
so slow to finish that the detach has time to complete and finish,
breaking the output.

Yeah, I agree with your analysis.

And here comes the puzzling part: all of failures involve FreeBSD 13
in the CI. Reproducing this failure would not be difficult, I thought
first; we can add a hardcoded pg_usleep() to delay the end of
injection_wait() so as we make sure that the backend doing the wait
reports later than the detach. Or just do the same at the end of
injection_points_run() once the callback exits. I've sure done that,
placing some strategic pg_usleep() calls on Linux to make the paths
that matter in the wait slower, but the test remains stable.

Right, I did the same and observed the exact same behavior.

Still, it's possible to observe the s1 wait finishing after the s2 detach by
running this test manually (means create 2 sessions and run the test commands
manually) and:

1. attach a debuger on the first session (say with a break point in injection_wait()).
or
2. add a hardcoded pg_usleep() in injection_wait()

So I think that the s1 wait finishing after the s2 detach is possible if
the session 1 is "freezed" (gdb case) or slow enough (pg_usleep() case).

The CI
on Linux is stable as well: 3rd and 4th columns of the cfbot are
green, I did not spot any failures related to this isolation test in
injection_points. Only the second column about FreeBSD is going rogue
on a periodic basis.

One fix would is to remove this first permutation test, still that's
hiding a problem rather than solving it, and something looks wrong
with conditional variables, specific to FreeBSD?

Hum, we would probably observe other failures in other tests no?

Any thoughts or comments?

I think that we can not be 100% sure that the s1 wait will finish before the
s2 detach (easy reproducible with gdb attached on s1 or an hardcoded sleep) and
that other OS could also report the test as failing for the same reason.

It's not ideal, but instead of removing this first permutation test what about
adding a "sleep2" step in it (doing say, SELECT pg_sleep(1);) and call this
new step before the detach2 one?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#8Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#7)
Re: Add isolation test template in injection_points for wait/wakeup/detach

On Mon, Oct 28, 2024 at 07:17:28AM +0000, Bertrand Drouvot wrote:

I think that we can not be 100% sure that the s1 wait will finish before the
s2 detach (easy reproducible with gdb attached on s1 or an hardcoded sleep) and
that other OS could also report the test as failing for the same reason.

Yes, the only safe thing we can do in this test is to let the wakeup2
be last, as we are sure that the isolationtester is going to keep at
s2 to finish the call of the wakeup function before moving on with
checking the end of the wait.

It's not ideal, but instead of removing this first permutation test what about
adding a "sleep2" step in it (doing say, SELECT pg_sleep(1);) and call this
new step before the detach2 one?

There is no real guarantee of stability. Under a wait of N seconds,
we could still have environments where the wait() could remain stuck
more than N seconds between the moment the condition variable is woken
up and the result of the wait() is reported back to the client. And
hardcoded sleeps make the test slower even on fast machines.

What we have here seems like just contention of Cirrus with the
FreeBSD hosts while there is much more stability with the linux hosts.
--
Michael

#9Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#8)
Re: Add isolation test template in injection_points for wait/wakeup/detach

Hi,

On Tue, Oct 29, 2024 at 01:12:25PM +0900, Michael Paquier wrote:

On Mon, Oct 28, 2024 at 07:17:28AM +0000, Bertrand Drouvot wrote:

I think that we can not be 100% sure that the s1 wait will finish before the
s2 detach (easy reproducible with gdb attached on s1 or an hardcoded sleep) and
that other OS could also report the test as failing for the same reason.

Yes, the only safe thing we can do in this test is to let the wakeup2
be last, as we are sure that the isolationtester is going to keep at
s2 to finish the call of the wakeup function before moving on with
checking the end of the wait.

Agree.

It's not ideal, but instead of removing this first permutation test what about
adding a "sleep2" step in it (doing say, SELECT pg_sleep(1);) and call this
new step before the detach2 one?

There is no real guarantee of stability.

Right.

Under a wait of N seconds,
we could still have environments where the wait() could remain stuck
more than N seconds between the moment the condition variable is woken
up and the result of the wait() is reported back to the client. And
hardcoded sleeps make the test slower even on fast machines.

Yeah, not ideal so better to keep the test as it is currently (after your
refactoring).

What we have here seems like just contention of Cirrus with the
FreeBSD hosts while there is much more stability with the linux hosts.

That sounds like it, agree.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#10Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#9)
Re: Add isolation test template in injection_points for wait/wakeup/detach

On Tue, Oct 29, 2024 at 07:59:31AM +0000, Bertrand Drouvot wrote:

On Tue, Oct 29, 2024 at 01:12:25PM +0900, Michael Paquier wrote:

Under a wait of N seconds,
we could still have environments where the wait() could remain stuck
more than N seconds between the moment the condition variable is woken
up and the result of the wait() is reported back to the client. And
hardcoded sleeps make the test slower even on fast machines.

Yeah, not ideal so better to keep the test as it is currently (after your
refactoring).

Backpedalling a bit on that.. Perhaps it would be useful to keep the
permutation as disabled in the test and to update the comment in
basic.spec to tell that this had better be avoided when implementing a
test if the timing of the return value of a SQL wait() cannot be
strictly controlled? And also say that putting the wakeup() should be
last for simple permutations? The point of this spec is also to
document some good and bad practices.
--
Michael

#11Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#10)
Re: Add isolation test template in injection_points for wait/wakeup/detach

Hi,

On Tue, Oct 29, 2024 at 05:20:42PM +0900, Michael Paquier wrote:

On Tue, Oct 29, 2024 at 07:59:31AM +0000, Bertrand Drouvot wrote:

On Tue, Oct 29, 2024 at 01:12:25PM +0900, Michael Paquier wrote:

Under a wait of N seconds,
we could still have environments where the wait() could remain stuck
more than N seconds between the moment the condition variable is woken
up and the result of the wait() is reported back to the client. And
hardcoded sleeps make the test slower even on fast machines.

Yeah, not ideal so better to keep the test as it is currently (after your
refactoring).

Backpedalling a bit on that.. Perhaps it would be useful to keep the
permutation as disabled in the test and to update the comment in
basic.spec to tell that this had better be avoided when implementing a
test if the timing of the return value of a SQL wait() cannot be
strictly controlled? And also say that putting the wakeup() should be
last for simple permutations? The point of this spec is also to
document some good and bad practices.

Yeah, agree that it would make sense to document in the test what has been
discovered here.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#12Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#11)
1 attachment(s)
Re: Add isolation test template in injection_points for wait/wakeup/detach

On Tue, Oct 29, 2024 at 02:06:11PM +0000, Bertrand Drouvot wrote:

Yeah, agree that it would make sense to document in the test what has been
discovered here.

How about something like the attached?
--
Michael

Attachments:

inj-permutation-comment.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/test/modules/injection_points/specs/basic.spec b/src/test/modules/injection_points/specs/basic.spec
index 7f44e3ddc3..4a78fda64a 100644
--- a/src/test/modules/injection_points/specs/basic.spec
+++ b/src/test/modules/injection_points/specs/basic.spec
@@ -26,7 +26,11 @@ step wakeup2	{ SELECT injection_points_wakeup('injection-points-wait'); }
 step detach2	{ SELECT injection_points_detach('injection-points-wait'); }
 
 # Detach after wait and wakeup.
-# This permutation is proving to be unstable on FreeBSD, so disable for now.
+# Permutations like this one should be avoided, as it is not possible to
+# strictly control the timing when the SQL function doing the wait returns
+# back its result.  In this case, the detach may finish faster than the
+# wait.  It is recommended to use wakeups as the last permutation should
+# a wait be done within an SQL function.
 #permutation wait1 wakeup2 detach2
 
 # Detach before wakeup.  s1 waits until wakeup, ignores the detach.
#13Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#12)
Re: Add isolation test template in injection_points for wait/wakeup/detach

Hi,

On Wed, Oct 30, 2024 at 12:40:05PM +0900, Michael Paquier wrote:

On Tue, Oct 29, 2024 at 02:06:11PM +0000, Bertrand Drouvot wrote:

Yeah, agree that it would make sense to document in the test what has been
discovered here.

How about something like the attached?

Thanks!

A few comments:

1 === Nit

s/Permutations like this one/Permutations like the following commented out one/ ?

2 ===

s/back its result/back its result once woken up/ ?

3 ===

s/faster than the wait/before the SQL function doing the wait returns its result/ ?

With 2 === and 3 === I think it's easier to "link" both sentences that way.

Thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#14Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#13)
1 attachment(s)
Re: Add isolation test template in injection_points for wait/wakeup/detach

On Wed, Oct 30, 2024 at 07:26:07AM +0000, Bertrand Drouvot wrote:

1 === Nit

s/Permutations like this one/Permutations like the following commented out one/ ?

2 ===

s/back its result/back its result once woken up/ ?

3 ===

s/faster than the wait/before the SQL function doing the wait returns its result/ ?

With 2 === and 3 === I think it's easier to "link" both sentences that way.

What do you think about the updated version attached?
--
Michael

Attachments:

inj-permutation-comment-v2.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/test/modules/injection_points/specs/basic.spec b/src/test/modules/injection_points/specs/basic.spec
index 7f44e3ddc3..47db676c0c 100644
--- a/src/test/modules/injection_points/specs/basic.spec
+++ b/src/test/modules/injection_points/specs/basic.spec
@@ -26,7 +26,10 @@ step wakeup2	{ SELECT injection_points_wakeup('injection-points-wait'); }
 step detach2	{ SELECT injection_points_detach('injection-points-wait'); }
 
 # Detach after wait and wakeup.
-# This permutation is proving to be unstable on FreeBSD, so disable for now.
+# Permutations like the following one commented out should be avoided, as
+# the detach may finish before the SQL function doing the wait returns
+# its result.  It is recommended to use wakeups as the last permutation
+# should a wait be done within an SQL function.
 #permutation wait1 wakeup2 detach2
 
 # Detach before wakeup.  s1 waits until wakeup, ignores the detach.
#15Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#14)
Re: Add isolation test template in injection_points for wait/wakeup/detach

Hi,

On Wed, Oct 30, 2024 at 06:21:04PM +0900, Michael Paquier wrote:

On Wed, Oct 30, 2024 at 07:26:07AM +0000, Bertrand Drouvot wrote:

1 === Nit

s/Permutations like this one/Permutations like the following commented out one/ ?

2 ===

s/back its result/back its result once woken up/ ?

3 ===

s/faster than the wait/before the SQL function doing the wait returns its result/ ?

With 2 === and 3 === I think it's easier to "link" both sentences that way.

What do you think about the updated version attached?

Thanks! LGTM.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#16Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#15)
Re: Add isolation test template in injection_points for wait/wakeup/detach

On Wed, Oct 30, 2024 at 09:27:13AM +0000, Bertrand Drouvot wrote:

Thanks! LGTM.

Thanks for the review. Applied that, then.
--
Michael

#17Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#16)
1 attachment(s)
Re: Add isolation test template in injection_points for wait/wakeup/detach

On Thu, Oct 31, 2024 at 08:47:04AM +0900, Michael Paquier wrote:

Thanks for the review. Applied that, then.

I was looking at src/test/isolation/README, and based on what is
described for parenthesized markers (which I didn't know about so we
learn a new thing every day), it is possible to force a strict
ordering of the test. Quote that matters:
"A marker consisting solely of a step name indicates that this step
may not be reported as completing until that other step has
completed."

In our case, the problem that caused 9f00edc22888 is that the "wait"
step could be so slow that the "detach" step could report before the
"wait" step.

So I'd like to propose that we revisit 9f00edc22888, and re-enable the
permutation with a marker so as the "detach" step waits for the "wait"
step to complete before printing its own report, as per the attached.
That should make the permutation safe to use.

The output of the attached patch is the same as what has been removed
in 9f00edc22888 except that the permutation is changed from that:
permutation wait1 wakeup2 detach2
To that, making sure that detach2 report happens only after wait1
has returned its own result:
permutation wait1 wakeup2 detach2(wait1)

Thoughts?
--
Michael

Attachments:

inj-isolation-test.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/test/modules/injection_points/expected/basic.out b/src/test/modules/injection_points/expected/basic.out
index 9499ef4bd9d..840ce2dac90 100644
--- a/src/test/modules/injection_points/expected/basic.out
+++ b/src/test/modules/injection_points/expected/basic.out
@@ -1,5 +1,31 @@
 Parsed test spec with 2 sessions
 
+starting permutation: wait1 wakeup2 detach2
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step wait1: SELECT injection_points_run('injection-points-wait'); <waiting ...>
+step wakeup2: SELECT injection_points_wakeup('injection-points-wait');
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step wait1: <... completed>
+injection_points_run
+--------------------
+                    
+(1 row)
+
+step detach2: SELECT injection_points_detach('injection-points-wait');
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+
 starting permutation: wait1 detach2 wakeup2
 injection_points_attach
 -----------------------
diff --git a/src/test/modules/injection_points/specs/basic.spec b/src/test/modules/injection_points/specs/basic.spec
index 47db676c0c1..753128e7f36 100644
--- a/src/test/modules/injection_points/specs/basic.spec
+++ b/src/test/modules/injection_points/specs/basic.spec
@@ -25,12 +25,11 @@ session s2
 step wakeup2	{ SELECT injection_points_wakeup('injection-points-wait'); }
 step detach2	{ SELECT injection_points_detach('injection-points-wait'); }
 
-# Detach after wait and wakeup.
-# Permutations like the following one commented out should be avoided, as
-# the detach may finish before the SQL function doing the wait returns
-# its result.  It is recommended to use wakeups as the last permutation
-# should a wait be done within an SQL function.
-#permutation wait1 wakeup2 detach2
+# Detach after wait and wakeup.  Note that the detach may finish before
+# the SQL function doing the wait returns its result.  In order to avoid
+# any ordering issues, the detach step reports its result only once the
+# wait has completed.  This is enforced with a parenthesized marker.
+permutation wait1 wakeup2 detach2(wait1)
 
 # Detach before wakeup.  s1 waits until wakeup, ignores the detach.
 permutation wait1 detach2 wakeup2
#18Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#17)
Re: Add isolation test template in injection_points for wait/wakeup/detach

Hi,

On Wed, Feb 05, 2025 at 03:12:53PM +0900, Michael Paquier wrote:

On Thu, Oct 31, 2024 at 08:47:04AM +0900, Michael Paquier wrote:

Thanks for the review. Applied that, then.

I was looking at src/test/isolation/README, and based on what is
described for parenthesized markers (which I didn't know about so we
learn a new thing every day), it is possible to force a strict
ordering of the test. Quote that matters:
"A marker consisting solely of a step name indicates that this step
may not be reported as completing until that other step has
completed."

Oh, I did not know about it too (thanks for sharing).

In our case, the problem that caused 9f00edc22888 is that the "wait"
step could be so slow that the "detach" step could report before the
"wait" step.

Yup.

So I'd like to propose that we revisit 9f00edc22888, and re-enable the
permutation with a marker so as the "detach" step waits for the "wait"
step to complete before printing its own report, as per the attached.
That should make the permutation safe to use.

The output of the attached patch is the same as what has been removed
in 9f00edc22888 except that the permutation is changed from that:
permutation wait1 wakeup2 detach2
To that, making sure that detach2 report happens only after wait1
has returned its own result:
permutation wait1 wakeup2 detach2(wait1)

Thoughts?

I think that makes sense and the patch LGTM.
A few tests are already using this technique (including injection_points in
inplace.spec).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#19Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#18)
Re: Add isolation test template in injection_points for wait/wakeup/detach

On Wed, Feb 05, 2025 at 11:03:25AM +0000, Bertrand Drouvot wrote:

A few tests are already using this technique (including injection_points in
inplace.spec).

Right. I've just re-enabled the permutation. Now let's keep a track
of the activity in the CI and the buildfarm.
--
Michael

#20Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#18)
1 attachment(s)
Re: Add isolation test template in injection_points for wait/wakeup/detach

On Wed, Feb 05, 2025 at 11:03:25AM +0000, Bertrand Drouvot wrote:

I think that makes sense and the patch LGTM.
A few tests are already using this technique (including injection_points in
inplace.spec).

Well, that's not the end of the story, I have brushed the CF bot for
some activity and noticed this one:
https://cirrus-ci.com/task/6565511640645632

That's kind of rather hard to reach, it seems, but I also got it in
one of my own runs:
--- /tmp/cirrus-ci-build/src/test/modules/injection_points/expected/basic.out	2025-02-06 23:53:40.838077000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/injection_points/isolation/results/basic.out	2025-02-06 23:56:21.848507000 +0000
@@ -13,13 +13,14 @@

(1 row)

+step detach2: SELECT injection_points_detach('injection-points-wait'); <waiting ...>
 step wait1: <... completed>
 injection_points_run
 --------------------

(1 row)

-step detach2: SELECT injection_points_detach('injection-points-wait');
+step detach2: <... completed>
 injection_points_detach
 -----------------------

This is telling us that the detach step could be seen as waiting by
the isolation tester before the wait phase reports for completion. I
didn't think it would be possible to get that, but well, we do.

A marker like detach2(wait1) is not enough to cover that, as this
ensures the order of the step completion output. Using detach2(*)
which would cause the detach2 step to show as <waiting> immediately is
not good either, as the wait could always complete between the
detach's <waiting> and <completed>.

There is an stronger trick mentioned at the end of the README that
should be able to solve this new problem as well as the original one:
an empty step between the wait and the detach. If we do that, the
detach will never be launched until the wait has fully completed,
bringing a stronger ordering of the events: we should never see the
detach as waiting like in this new problem, now would we see the first
problem where the wait would report its result after the detach.

I have done a total of 10 runs in the CI with the attached, without
getting a failure. HEAD was failing a bit more easily than that, with
at least one failure every 5 runs in my branches. Will go adjust that
in a bit as per the attached.
--
Michael

Attachments:

0001-Extra-tweak-for-injection-test-permutation.patchtext/x-diff; charset=us-asciiDownload
From 8f7d244eddd965c4e4725cc88257292ee65214d2 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Fri, 7 Feb 2025 13:36:40 +0900
Subject: [PATCH] Extra tweak for injection test permutation

---
 src/test/modules/injection_points/expected/basic.out | 3 ++-
 src/test/modules/injection_points/specs/basic.spec   | 7 ++++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/test/modules/injection_points/expected/basic.out b/src/test/modules/injection_points/expected/basic.out
index 840ce2dac90..4fc218ed00d 100644
--- a/src/test/modules/injection_points/expected/basic.out
+++ b/src/test/modules/injection_points/expected/basic.out
@@ -1,6 +1,6 @@
 Parsed test spec with 2 sessions
 
-starting permutation: wait1 wakeup2 detach2
+starting permutation: wait1 wakeup2 noop1 detach2
 injection_points_attach
 -----------------------
                        
@@ -19,6 +19,7 @@ injection_points_run
                     
 (1 row)
 
+step noop1: 
 step detach2: SELECT injection_points_detach('injection-points-wait');
 injection_points_detach
 -----------------------
diff --git a/src/test/modules/injection_points/specs/basic.spec b/src/test/modules/injection_points/specs/basic.spec
index 753128e7f36..13d2793f6e4 100644
--- a/src/test/modules/injection_points/specs/basic.spec
+++ b/src/test/modules/injection_points/specs/basic.spec
@@ -20,6 +20,7 @@ setup	{
 	SELECT injection_points_attach('injection-points-wait', 'wait');
 }
 step wait1	{ SELECT injection_points_run('injection-points-wait'); }
+step noop1	{ }
 
 session s2
 step wakeup2	{ SELECT injection_points_wakeup('injection-points-wait'); }
@@ -27,9 +28,9 @@ step detach2	{ SELECT injection_points_detach('injection-points-wait'); }
 
 # Detach after wait and wakeup.  Note that the detach may finish before
 # the SQL function doing the wait returns its result.  In order to avoid
-# any ordering issues, the detach step reports its result only once the
-# wait has completed.  This is enforced with a parenthesized marker.
-permutation wait1 wakeup2 detach2(wait1)
+# any ordering issues, a no-op step is added after the wait, so as the
+# detach is not launched until the wait has completed.
+permutation wait1 wakeup2 noop1 detach2
 
 # Detach before wakeup.  s1 waits until wakeup, ignores the detach.
 permutation wait1 detach2 wakeup2
-- 
2.47.2

#21Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#20)
Re: Add isolation test template in injection_points for wait/wakeup/detach

On Fri, Feb 07, 2025 at 01:41:20PM +0900, Michael Paquier wrote:

I have done a total of 10 runs in the CI with the attached, without
getting a failure. HEAD was failing a bit more easily than that, with
at least one failure every 5 runs in my branches. Will go adjust that
in a bit as per the attached.

For the sake of the archives. I have been staring at the CF bot for
quite a bit, and no problems have showed up since 926c7fce0360. So I
think that we are good here.
--
Michael