Persist injection points across server restarts

Started by Michael Paquier12 months ago18 messageshackers
Jump to latest
#1Michael Paquier
michael@paquier.xyz

Hi all,

A bug related to data visibility on standbys with a race condition
mixing 2PC transactions and synchronous_standby_names with the
checkpointer has been fixed a couple of days ago:
/messages/by-id/163fcbec-900b-4b07-beaa-d2ead8634bec@postgrespro.ru

One issue that we had while discussing this thread is that there was
no easy way to have a regression test because the problem requires a
wait in the checkpointer at a very early startup phase where the
shared memory status data related to s_s_names has *not* been
initialized yet.

I have been working on the problem and found out one nice way to
address this limitation, introducing in the module injection_points a
new function that flushes to a file the set of injection points
currently attached to a cluster, reloading the data from the file to
shmem early at startup when initializing the shmem state data through
shared_preload_libraries.

With that in place, it is then possible to make the checkpointer wait
when it starts at a very early stage, giving a way to reproduce the
original failure reported on the other thread:
- A wait injection point is attached.
- A flush is used to write the points' data to disk.
- Node restarts, loading back their state.
- The wait triggers in the checkpointer.

So, please find attached a patch set for all that:
- 0001 is a patch I have stolen from a different thread (see [1]/messages/by-id/Z_xYkA21KyLEHvWR@paquier.xyz -- Michael),
introducing InjectionPointList() that retrieves a list of the
injection points attached.
- 0002 extends injection_points with a new flush function, that can be
used in TAP tests to persist some points across restarts. One sticky
point is that I did not want to add any of this information in the
core backend injection point APIs, nor to any of the backend
structures because that's not necessary, and what's here is enough for
some TAP tests.
- 0003 adds a new regression test providing some coverage for
2e57790836c6. Reverting 2e57790836c6 causes the test to fail. This
shows how to use this new facility.

This is v19 work, so I am adding that to the next commit fest.

Thanks,

[1]: /messages/by-id/Z_xYkA21KyLEHvWR@paquier.xyz -- Michael
--
Michael

Attachments:

0001-Add-InjectionPointList-to-retrieve-list-of-injection.patchtext/x-diff; charset=us-asciiDownload+63-1
0002-injection_points-Add-function-to-flush-injection-poi.patchtext/x-diff; charset=us-asciiDownload+227-1
0003-Add-regression-test-for-2PC-visibility-check-on-stan.patchtext/x-diff; charset=us-asciiDownload+63-1
#2Andrey Borodin
amborodin@acm.org
In reply to: Michael Paquier (#1)
Re: Persist injection points across server restarts

On 30 Apr 2025, at 10:35, Michael Paquier <michael@paquier.xyz> wrote:

- A flush is used to write the points' data to disk.

Interesting functionality.
Did you consider custom resource manager to WAL-log injection points? This way we do not need to flush them explicitly, they always will be persistent.
Though they will appear on standby, which is, probably, not expected functionality...

Best regards, Andrey Borodin.

#3Michael Paquier
michael@paquier.xyz
In reply to: Andrey Borodin (#2)
Re: Persist injection points across server restarts

On Wed, Apr 30, 2025 at 10:52:51AM +0500, Andrey Borodin wrote:

Did you consider custom resource manager to WAL-log injection
points? This way we do not need to flush them explicitly, they
always will be persistent.

WAL-logging would not work in the case I've faced, because we need a
point much earlier than the startup process beginning any redo
activity, so I don't think that this would be useful.
--
Michael

#4Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#1)
Re: Persist injection points across server restarts

On Wed, Apr 30, 2025 at 02:35:40PM +0900, Michael Paquier wrote:

This is v19 work, so I am adding that to the next commit fest.

Rebased, to fix a conflict I've introduced with a different commit.
--
Michael

Attachments:

v2-0001-Add-InjectionPointList-to-retrieve-list-of-inject.patchtext/x-diff; charset=us-asciiDownload+63-1
v2-0002-injection_points-Add-function-to-flush-injection-.patchtext/x-diff; charset=us-asciiDownload+227-1
v2-0003-Add-regression-test-for-2PC-visibility-check-on-s.patchtext/x-diff; charset=us-asciiDownload+63-1
#5Andrey Borodin
amborodin@acm.org
In reply to: Michael Paquier (#4)
Re: Persist injection points across server restarts

On 21 May 2025, at 11:20, Michael Paquier <michael@paquier.xyz> wrote:

Rebased, to fix a conflict I've introduced with a different commit.

I've looked into the patch set and have some more questions:
1. What if we flush many times? Is it supposed to overwrite injection_points.data?
2. What if we restart many times? first startup will remove the file... maybe remove it explicitly?
3. InjectionPoint private data is not handled, is it OK?
4. How session-local points are expected to be flushed? into which session they will be loaded? Maybe let's document that they are not saved?

Besides these, cool new abilities and a test for a bug, looks good to me.

Best regards, Andrey Borodin.

#6Michael Paquier
michael@paquier.xyz
In reply to: Andrey Borodin (#5)
Re: Persist injection points across server restarts

On Wed, May 21, 2025 at 12:17:55PM +0300, Andrey Borodin wrote:

I've looked into the patch set and have some more questions:
1. What if we flush many times? Is it supposed to overwrite injection_points.data?

Yes.

2. What if we restart many times? first startup will remove the file... maybe remove it explicitly?

Yes. The point is to persist across one restart. That's the same
behavior as what we do for the stats, implying that an extra flush
would be required across multiple restarts. As this behavior is
within the extension, I don't see why we could not change the
internals at some point if we're unhappy with what it does. The only
case where I can see us do a node restart with point persistence is
TAP tests, where local points don't really make sense. It is true
that we could introduce at some point tests with more advanced
filtering conditions, like backend types where flushes of the private
data could make sense. We don't have that now.

3. InjectionPoint private data is not handled, is it OK?

Because I don't have a case for that in core yet. We could add it, of
course, but I've always defined the bar of what gets into the backend
code as of something that is used by the core tests. I've just not
needed it for this test case.

4. How session-local points are expected to be flushed? into which
session they will be loaded? Maybe let's document that they are not
saved?

Yes, I was wondering about that, but finished by discarding it based
on the same argument as the previous point. Again, we could do that.
Flushes with restarts would be part of TAP tests, where we don't care
about concurrent test sessions, so I'm not sure it is worth worrying
at this stage. That feels like bloat in the backend interface than
actually required.
--
Michael

#7Rahila Syed
rahilasyed90@gmail.com
In reply to: Michael Paquier (#4)
Re: Persist injection points across server restarts

Hi,

This is v19 work, so I am adding that to the next commit fest.

Rebased, to fix a conflict I've introduced with a different commit.

Thank you for the rebased patches. I reviewed the code and have a few
comments for
your consideration.

It appears that Github CI is reporting failures with
injection_points/002_data_persist
failing across all OSes.

Following are comments on
v2-0002-injection_points-Add-function-to-flush-injection-.patch
1.
 /*
+        * Rename file into place, so we atomically replace any old one.
+        */
+       (void) durable_rename(INJ_DUMP_FILE_TMP, INJ_DUMP_FILE, LOG);

I wonder if it would be better to include a check for the return value of
`durable_rename` to
manage potential failure scenarios.

2. +
+ file = AllocateFile(INJ_DUMP_FILE ".tmp", PG_BINARY_W);

Could we perhaps add a brief comment clarifying the rationale behind the
use of a
temporary file in this context?

3.   +               if (fread(buf, 1, len + 1, file) != len + 1)
+                       goto error;
+               library = pstrdup(buf);

It seems that `buf` should be null-terminated before being passed to
`pstrdup(buf)`.
Otherwise, `strlen` might not accurately determine the length. This seems
to be
consistent with the handling of `fread` calls in `snapmgr.c`.

Thank you,
Rahila Syed

#8Michael Paquier
michael@paquier.xyz
In reply to: Rahila Syed (#7)
Re: Persist injection points across server restarts

On Mon, May 26, 2025 at 01:17:46PM +0530, Rahila Syed wrote:

It appears that Github CI is reporting failures with
injection_points/002_data_persist
failing across all OSes.

I am not sure to see what you are referring to here, based on the
following reports:
https://commitfest.postgresql.org/patch/5731/
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5731
http://cfbot.cputube.org/highlights/all.html

The last failure reported that I know of was due to the addition of
the runtime arguments to the macro INJECTION_POINT().

I wonder if it would be better to include a check for the return value of
`durable_rename` to
manage potential failure scenarios.

Or we can be smarter and make this call an ERROR.

2. +
+ file = AllocateFile(INJ_DUMP_FILE ".tmp", PG_BINARY_W);

Could we perhaps add a brief comment clarifying the rationale behind the
use of a
temporary file in this context?

Sure. This is about avoiding the load of incomplete files if there's
a crash in the middle of a flush call, for example.

It seems that `buf` should be null-terminated before being passed to
`pstrdup(buf)`.
Otherwise, `strlen` might not accurately determine the length. This seems
to be
consistent with the handling of `fread` calls in `snapmgr.c`.

Hmm, good point. That would be safer, even if the write part should
already push the null termination.
--
Michael

Attachments:

v3-0001-Add-InjectionPointList-to-retrieve-list-of-inject.patchtext/x-diff; charset=us-asciiDownload+63-1
v3-0002-injection_points-Add-function-to-flush-injection-.patchtext/x-diff; charset=us-asciiDownload+234-1
v3-0003-Add-regression-test-for-2PC-visibility-check-on-s.patchtext/x-diff; charset=us-asciiDownload+63-1
#9Rahila Syed
rahilasyed90@gmail.com
In reply to: Michael Paquier (#8)
Re: Persist injection points across server restarts

Hi,

Thank you for addressing the review comments.

It appears that Github CI is reporting failures with

injection_points/002_data_persist
failing across all OSes.

I am not sure to see what you are referring to here, based on the
following reports:
https://commitfest.postgresql.org/patch/5731/
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5731
http://cfbot.cputube.org/highlights/all.html

The last failure reported that I know of was due to the addition of
the runtime arguments to the macro INJECTION_POINT().

After applying the v3-patches, I see failure like these:
macOS - Sonoma - Meson - Cirrus CI
<https://cirrus-ci.com/task/4589192849653760&gt;
Windows - Server 2019, VS 2019 - Meson & ninja - Cirrus CI
<https://cirrus-ci.com/task/5715092756496384&gt;
Linux - Debian Bookworm - Meson - Cirrus CI
<https://cirrus-ci.com/task/6629886430806016&gt;

I wonder if it would be better to include a check for the return value of
`durable_rename` to
manage potential failure scenarios.

Or we can be smarter and make this call an ERROR.

One issue with this approach is that if a failure occurs during rename, we
cannot
unlink(INJ_DUMP_FILE_TMP), because durable_rename does not handle this
before throwing an error.

Should we also consider unlinking the target file INJ_DUMP_FILE in case of
an
error during renaming?
Since target files are already synced at the start in the durable_rename()
function.

Thank you,
Rahila Syed

#10Michael Paquier
michael@paquier.xyz
In reply to: Rahila Syed (#9)
Re: Persist injection points across server restarts

On Thu, May 29, 2025 at 01:17:21PM +0530, Rahila Syed wrote:

After applying the v3-patches, I see failure like these:
macOS - Sonoma - Meson - Cirrus CI
<https://cirrus-ci.com/task/4589192849653760&gt;
Windows - Server 2019, VS 2019 - Meson & ninja - Cirrus CI
<https://cirrus-ci.com/task/5715092756496384&gt;
Linux - Debian Bookworm - Meson - Cirrus CI
<https://cirrus-ci.com/task/6629886430806016&gt;

These tasks complain about the following:
[07:22:58.931] ../src/include/utils/injection_point.h:62:28: note:
'InjectionPointList' declared here [07:22:58.931] 62 | extern
InjectionPointData *InjectionPointList(uint32 *num_points);

The latest version of the patch (v3) posted on this thread includes
the following:
./v3-0001-Add-InjectionPointList-to-retrieve-list-of-inject.patch:
+extern List *InjectionPointList(void);
./v3-0001-Add-InjectionPointList-to-retrieve-list-of-inject.patch:
+InjectionPointList(void)
./v3-0002-injection_points-Add-function-to-flush-injection-.patch:
+ inj_points = InjectionPointList();

How is it possible for what you have applied to include changes that
are not part of the patch set posted? Manual mistake on your side
perhaps where versions of the patches have been mixed?

The CF bot is green:
https://commitfest.postgresql.org/patch/5731/

Should we also consider unlinking the target file INJ_DUMP_FILE in case of
an error during renaming?

I don't think that this is worth caring for the sake of tests. If an
error happens before the atomic rename is done to the final file,
leaving around a temporary file, the next flush would just recreate a
new file from scratch with the new contents.
--
Michael

#11Rahila Syed
rahilasyed90@gmail.com
In reply to: Michael Paquier (#10)
Re: Persist injection points across server restarts

Hi Michael,

On Fri, May 30, 2025 at 5:01 AM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, May 29, 2025 at 01:17:21PM +0530, Rahila Syed wrote:

After applying the v3-patches, I see failure like these:
macOS - Sonoma - Meson - Cirrus CI
<https://cirrus-ci.com/task/4589192849653760&gt;
Windows - Server 2019, VS 2019 - Meson & ninja - Cirrus CI
<https://cirrus-ci.com/task/5715092756496384&gt;
Linux - Debian Bookworm - Meson - Cirrus CI
<https://cirrus-ci.com/task/6629886430806016&gt;

These tasks complain about the following:
[07:22:58.931] ../src/include/utils/injection_point.h:62:28: note:
'InjectionPointList' declared here [07:22:58.931] 62 | extern
InjectionPointData *InjectionPointList(uint32 *num_points);

The latest version of the patch (v3) posted on this thread includes
the following:
./v3-0001-Add-InjectionPointList-to-retrieve-list-of-inject.patch:
+extern List *InjectionPointList(void);
./v3-0001-Add-InjectionPointList-to-retrieve-list-of-inject.patch:
+InjectionPointList(void)
./v3-0002-injection_points-Add-function-to-flush-injection-.patch:
+ inj_points = InjectionPointList();

How is it possible for what you have applied to include changes that
are not part of the patch set posted? Manual mistake on your side
perhaps where versions of the patches have been mixed?

The CF bot is green:
https://commitfest.postgresql.org/patch/5731/

I apologize for the confusion; I mistakenly applied the wrong version of
the 0001 patch.
I can now confirm that the tests are passing on my end.

Thank you,
Rahila Syed

#12Andrey Borodin
amborodin@acm.org
In reply to: Michael Paquier (#8)
Re: Persist injection points across server restarts

On 27 May 2025, at 05:51, Michael Paquier <michael@paquier.xyz> wrote:

<v3-0001-Add-InjectionPointList-to-retrieve-list-of-inject.patch>

<v3-0002-injection_points-Add-function-to-flush-injection-.patch>

<v3-0003-Add-regression-test-for-2PC-visibility-check-on-s.patch>

I've made another round of looking into these patches.
I think it's fine that private data is not included. But the feature looks orthogonal, and I do not see specific reasons why it should be omitted when IPs are persisted.
Another idea of improvement is using distinguishable errors in injection_shmem_startup(). Like differentiating between read error and wrong magic number.
But there's no big value in these improvements, so the patch is fine as is too.

Best regards, Andrey Borodin.

#13Jeff Davis
pgsql@j-davis.com
In reply to: Michael Paquier (#1)
Re: Persist injection points across server restarts

On Wed, 2025-04-30 at 14:35 +0900, Michael Paquier wrote:

- 0001 is a patch I have stolen from a different thread (see [1]),
introducing InjectionPointList() that retrieves a list of the
injection points attached.

Unless that other thread is blocked for some reason, I'd suggest just
going ahead with the SQL callable function first. That seems
independently useful.

- 0002 extends injection_points with a new flush function, that can
be
used in TAP tests to persist some points across restarts.  One sticky
point is that I did not want to add any of this information in the
core backend injection point APIs, nor to any of the backend
structures because that's not necessary, and what's here is enough
for
some TAP tests.

This seems like a fair amount of complexity for a single use case.
Arguably, complexity around testing infrastructure is a lot less of a
problem than other kinds of complexity, so it might be OK. But it would
be nice if there were a couple cases that would benefit rather than
one.

Regards,
Jeff Davis

#14Michael Paquier
michael@paquier.xyz
In reply to: Jeff Davis (#13)
Re: Persist injection points across server restarts

On Mon, Jun 02, 2025 at 12:42:35PM -0700, Jeff Davis wrote:

Unless that other thread is blocked for some reason, I'd suggest just
going ahead with the SQL callable function first. That seems
independently useful.

Thanks. Having InjectionPointList() was also required for the other
thread.

This seems like a fair amount of complexity for a single use case.
Arguably, complexity around testing infrastructure is a lot less of a
problem than other kinds of complexity, so it might be OK. But it would
be nice if there were a couple cases that would benefit rather than
one.

In all the approaches I've considered, this one was the least worst of
all based on the point that all the complexity is hidden in the test
module; there is no need to touch the backend code at all as long as
there is a way to retrieve the list of points that would be dumped to
disk.

Another set of test cases I had in mind was waits during recovery
before consistency is reached. There is no way to add a point without
connecting to the database, and we've had plenty of fixes involving
the startup process and a different process, mostly the checkpointer.
That's an annoying limitation.
--
Michael

#15Jeff Davis
pgsql@j-davis.com
In reply to: Michael Paquier (#14)
Re: Persist injection points across server restarts

On Tue, 2025-06-03 at 13:13 +0900, Michael Paquier wrote:

In all the approaches I've considered, this one was the least worst
of
all based on the point that all the complexity is hidden in the test
module; there is no need to touch the backend code at all as long as
there is a way to retrieve the list of points that would be dumped to
disk.

True, though it does create a new file.

Another set of test cases I had in mind was waits during recovery
before consistency is reached.  There is no way to add a point
without
connecting to the database, and we've had plenty of fixes involving
the startup process and a different process, mostly the checkpointer.
That's an annoying limitation.

If you have in mind some other ways to use it than I like it a lot
more. And I don't have a better idea.

Regards,
Jeff Davis

#16Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#14)
Re: Persist injection points across server restarts

Hi,

On 2025-06-03 13:13:06 +0900, Michael Paquier wrote:

On Mon, Jun 02, 2025 at 12:42:35PM -0700, Jeff Davis wrote:

This seems like a fair amount of complexity for a single use case.
Arguably, complexity around testing infrastructure is a lot less of a
problem than other kinds of complexity, so it might be OK. But it would
be nice if there were a couple cases that would benefit rather than
one.

In all the approaches I've considered, this one was the least worst of
all based on the point that all the complexity is hidden in the test
module; there is no need to touch the backend code at all as long as
there is a way to retrieve the list of points that would be dumped to
disk.

Another set of test cases I had in mind was waits during recovery
before consistency is reached. There is no way to add a point without
connecting to the database, and we've had plenty of fixes involving
the startup process and a different process, mostly the checkpointer.
That's an annoying limitation.

I'm somewhat doubtful this is is the right direction. Tests that require
injection points before consistency also can't wait for injection points using
the SQL interface or such, so most of the stuff has to be written in C
anyway. And if so, you also can attach to injection points in the relevant
shared_preload_libraries entry.

Greetings,

Andres Freund

#17Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#16)
Re: Persist injection points across server restarts

On Tue, Jun 03, 2025 at 03:34:16PM -0400, Andres Freund wrote:

I'm somewhat doubtful this is is the right direction. Tests that require
injection points before consistency also can't wait for injection points using
the SQL interface or such, so most of the stuff has to be written in C
anyway. And if so, you also can attach to injection points in the relevant
shared_preload_libraries entry.

Hmm. I'm wondering about an alternate approach here: a postmaster GUC
in injection_points that can take in input a list of
name/library/function where the module would load them when
initializing. That's a bit artistic, perhaps, still it would work
without having to worry about the flush and reload steps.
--
Michael

#18Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#17)
Re: Persist injection points across server restarts

On Wed, Jun 04, 2025 at 09:15:08AM +0900, Michael Paquier wrote:

On Tue, Jun 03, 2025 at 03:34:16PM -0400, Andres Freund wrote:

I'm somewhat doubtful this is is the right direction. Tests that require
injection points before consistency also can't wait for injection points using
the SQL interface or such, so most of the stuff has to be written in C
anyway. And if so, you also can attach to injection points in the relevant
shared_preload_libraries entry.

Hmm. I'm wondering about an alternate approach here: a postmaster GUC
in injection_points that can take in input a list of
name/library/function where the module would load them when
initializing. That's a bit artistic, perhaps, still it would work
without having to worry about the flush and reload steps.

[A couple of weeks later]

I am still not sure what's the right course of action here, so I'd
agree to just reject the patch for now, but keep it in mind once we
have more cases where this could be useful. The case I've found is
perhaps not enough. Updating the CF app to reflect on this feedback
now, thanks all for the input.
--
Michael