pgsql: Add test case for an archive recovery corner case.

Started by Heikki Linnakangasalmost 4 years ago9 messages
#1Heikki Linnakangas
heikki.linnakangas@iki.fi

Add test case for an archive recovery corner case.

While I was working on a patch to refactor things around xlog.c, I mixed
up EndOfLogTLI and replayTLI at the end of recovery. As a result, if you
recovered to a point with a lower-numbered timeline in a WAL segment
that has a higher TLI in the filename, the end-of-recovery WAL record
was created with invalid PrevTimeLineId. I noticed that while
self-reviewing, but no tests failed. So add a test to cover that corner
case.

Thanks to Amul Sul who also submitted a test case for the same corner
case, although this patch is different from that.

Reviewed-by: Amul Sul, Michael Paquier
Discussion: /messages/by-id/52bc9ccd-8591-431b-0086-15d9acf25a3f@iki.fi
Discussion: /messages/by-id/CAAJ_b94Vjt5cXGza_1MkjLQWciNdEemsmiWuQj0d=M7JfjAa1g@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/50e5bc582a4371a88218078a2c651d901bf87d96

Modified Files
--------------
src/test/recovery/t/028_pitr_timelines.pl | 176 ++++++++++++++++++++++++++++++
1 file changed, 176 insertions(+)

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#1)
Re: pgsql: Add test case for an archive recovery corner case.

Heikki Linnakangas <heikki.linnakangas@iki.fi> writes:

Add test case for an archive recovery corner case.

hoverfly seems not to like this:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&amp;dt=2022-02-14%2012%3A36%3A12

# poll_query_until timed out executing this query:
# SELECT '000000020000000000000003' <= last_archived_wal FROM pg_stat_archiver;
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 1.
[13:22:56] t/028_pitr_timelines.pl ..............
Dubious, test returned 29 (wstat 7424, 0x1d00)
All 1 subtests passed

regards, tom lane

#3Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Tom Lane (#2)
Re: pgsql: Add test case for an archive recovery corner case.

On 14/02/2022 16:41, Tom Lane wrote:

Heikki Linnakangas <heikki.linnakangas@iki.fi> writes:

Add test case for an archive recovery corner case.

hoverfly seems not to like this:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&amp;dt=2022-02-14%2012%3A36%3A12

Hmm, only hoverfly - and even that succeeded on next run. Some kind of a
flakyness I guess. I'll try to run the test in a loop and see if I can
reproduce it.

- Heikki

#4Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Heikki Linnakangas (#3)
1 attachment(s)
last_archived_wal is not necessary the latest WAL file (was Re: pgsql: Add test case for an archive recovery corner case.)

On 14/02/2022 22:43, Heikki Linnakangas wrote:

On 14/02/2022 16:41, Tom Lane wrote:

Heikki Linnakangas <heikki.linnakangas@iki.fi> writes:

Add test case for an archive recovery corner case.

hoverfly seems not to like this:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&amp;dt=2022-02-14%2012%3A36%3A12

Hmm, only hoverfly - and even that succeeded on next run. Some kind of a
flakyness I guess. I'll try to run the test in a loop and see if I can
reproduce it.

That was interesting: the order that WAL segments are archived when a
standby is promoted is not fully deterministic. The test polled
pg_stat_archiver.last_archived_wal to wait until a particular WAL
segment was archived, but it could happen that a lower-numbered WAL
segment was archived *after* the waited-for segment, and
pg_stat_archiver.last_archived_wal therefore displayed the
lower-numbered WAL segment. So the test incorrectly thought that the
higher-numbered segment that it waits for hadn't been archived yet.

I realized that this test doesn't really need to wait for the segment to
be archived, because it will stop the standby server immediately after
that, and stopping a server implicitly waits for all the WAL to be
archived before the archiver process exits. So I just removed it.

During normal operations the WAL segments are archived in order. But I'm
not sure if there are some other corner cases, aside from promoting a
standby server, when this could happen. After crash restart, maybe, if
some .ready/done files are lost.

I find it a bit surprising that pg_stat_archiver.last_archived_wal is
not necessarily the highest-numbered segment that was archived. I
propose that we mention that in the docs, as in the attached patch.

I'll commit this soon, to silence the occasional failures on the
buildfarm, but let me know if you have any better suggestions or thoughts.

- Heikki

Attachments:

0001-Fix-race-condition-in-028_pitr_timelines.pl-test-add.patchtext/x-patch; charset=UTF-8; name=0001-Fix-race-condition-in-028_pitr_timelines.pl-test-add.patchDownload
From 271d4ec8caec018186ab0314295ee761e2882488 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 15 Feb 2022 23:14:59 +0200
Subject: [PATCH 1/1] Fix race condition in 028_pitr_timelines.pl test, add
 note to docs.

The 028_pitr_timelines.pl test would sometimes hang, waiting for a WAL
segment that was just filled up to be archived. It was because the
test used 'pg_stat_archiver.last_archived_wal' to check if a file was
archived, but the order that WAL files are archived when a standby is
promoted is not fully deterministic, and 'last_archived_wal' tracks
the last segment that was archived, not the highest-numbered WAL
segment. Because of that, if the archiver archived segment 3, and then
2, 'last_archived_wal' say 2, and the test query would think that 3
has not been archived yet.

Normally, WAL files are marked ready for archival in order, and the
archiver process will process them in order, so that issue doesn't
arise.  We have used the same query on 'last_archived_wal' in a few
other tests with no problem. But when a standby is promoted, things
are a bit chaotic. After promotion, the server will try to archive all
the WAL segments from the old timeline that are in pg_wal, as well as
the history file and any new WAL segments on the new timeline. The
end-of-recovery checkpoint will create the .ready files for all the
WAL files on the old timeline, but at the same time, the new timeline
is opened up for business. A file from the new timeline can therefore
be archived before the files from the old timeline have been marked as
ready for archival.

It turns out that we don't really need to wait for the archival in
this particular test, because the standby server is about to be
stopped, and stopping a server will wait for the end-of-recovery
checkpoint and all WAL archivals to finish, anyway. So we can just
remove it from the test.

Add a note to the docs on 'pg_stat_archiver' view that files can be
archived out of order.

Discussion: https://www.postgresql.org/message-id/2842251.1644849700@sss.pgh.pa.us
---
 doc/src/sgml/monitoring.sgml              |  9 +++++++++
 src/test/recovery/t/028_pitr_timelines.pl | 13 ++-----------
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 62f2a3332b..44e08225a8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3474,6 +3474,15 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    </tgroup>
   </table>
 
+  <para>
+    Normally, WAL files are archived in order, oldest to newest, but that is
+    not guaranteed, and does not hold under special circumstances like when
+    promoting a standby or after crash recovery. Therefore it is not safe to
+    assume that all files older than
+    <structfield>last_archived_wal</structfield> have also been successfully
+    archived.
+  </para>
+
  </sect2>
 
  <sect2 id="monitoring-pg-stat-bgwriter-view">
diff --git a/src/test/recovery/t/028_pitr_timelines.pl b/src/test/recovery/t/028_pitr_timelines.pl
index c0b76fe37b..a8b12d9af6 100644
--- a/src/test/recovery/t/028_pitr_timelines.pl
+++ b/src/test/recovery/t/028_pitr_timelines.pl
@@ -36,7 +36,6 @@ use File::Compare;
 # Initialize and start primary node with WAL archiving
 my $node_primary = PostgreSQL::Test::Cluster->new('primary');
 $node_primary->init(has_archiving => 1, allows_streaming => 1);
-$node_primary->append_conf('postgresql.conf', 'log_min_messages=debug1');
 $node_primary->start;
 
 # Take a backup.
@@ -70,7 +69,6 @@ $node_standby->init_from_backup(
 	has_archiving => 1,
 	has_restoring => 0);
 $node_standby->append_conf('postgresql.conf', 'archive_mode = always');
-$node_standby->append_conf('postgresql.conf', 'log_min_messages=debug1');
 $node_standby->start;
 $node_primary->wait_for_catchup($node_standby);
 
@@ -93,15 +91,8 @@ my $walfile_to_be_archived = $node_standby->safe_psql('postgres',
 # Make WAL segment eligible for archival
 $node_standby->safe_psql('postgres', 'SELECT pg_switch_wal()');
 
-# Wait until the WAL segment has been archived.
-my $archive_wait_query =
-  "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver;";
-$node_standby->poll_query_until('postgres', $archive_wait_query)
-  or die "Timed out while waiting for WAL segment to be archived";
-my $last_archived_wal_file = $walfile_to_be_archived;
-
-# Ok, the standby has now archived the WAL on timeline 2.  We don't
-# need the standby anymore.
+# We don't need the standby anymore, request shutdown. The server will
+# finish archiving all the WAL on timeline 2 before it exits.
 $node_standby->stop;
 
 # Contents of the WAL archive at this point:
-- 
2.30.2

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#4)
Re: last_archived_wal is not necessary the latest WAL file (was Re: pgsql: Add test case for an archive recovery corner case.)

Heikki Linnakangas <hlinnaka@iki.fi> writes:

That was interesting: the order that WAL segments are archived when a
standby is promoted is not fully deterministic.

Oh, of course.

I find it a bit surprising that pg_stat_archiver.last_archived_wal is
not necessarily the highest-numbered segment that was archived. I
propose that we mention that in the docs, as in the attached patch.

+1, but I think the description of that field in the pg-stat-archiver-view
table is also pretty misleading. Maybe like

-      Name of the last WAL file successfully archived
+      Name of the WAL file most recently successfully archived

and similarly s/last/most recent/ for the other fields claiming
to be "last" something.

regards, tom lane

#6Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Tom Lane (#5)
Re: last_archived_wal is not necessary the latest WAL file (was Re: pgsql: Add test case for an archive recovery corner case.)

On 15/02/2022 23:28, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

That was interesting: the order that WAL segments are archived when a
standby is promoted is not fully deterministic.

Oh, of course.

I find it a bit surprising that pg_stat_archiver.last_archived_wal is
not necessarily the highest-numbered segment that was archived. I
propose that we mention that in the docs, as in the attached patch.

+1, but I think the description of that field in the pg-stat-archiver-view
table is also pretty misleading. Maybe like

-      Name of the last WAL file successfully archived
+      Name of the WAL file most recently successfully archived

and similarly s/last/most recent/ for the other fields claiming
to be "last" something.

Makes sense, committed it that way.

- Heikki

#7Noah Misch
noah@leadboat.com
In reply to: Heikki Linnakangas (#6)
2 attachment(s)
Re: last_archived_wal is not necessary the latest WAL file (was Re: pgsql: Add test case for an archive recovery corner case.)

On Wed, Feb 16, 2022 at 01:42:27AM +0200, Heikki Linnakangas wrote:

On 15/02/2022 23:28, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

That was interesting: the order that WAL segments are archived when a
standby is promoted is not fully deterministic.

Oh, of course.

I find it a bit surprising that pg_stat_archiver.last_archived_wal is
not necessarily the highest-numbered segment that was archived. I
propose that we mention that in the docs, as in the attached patch.

+1, but I think the description of that field in the pg-stat-archiver-view
table is also pretty misleading. Maybe like

-      Name of the last WAL file successfully archived
+      Name of the WAL file most recently successfully archived

and similarly s/last/most recent/ for the other fields claiming
to be "last" something.

Makes sense, committed it that way.

This has seen two failures like the following:
# Failed test 'check table contents after point-in-time recovery'
# at t/028_pitr_timelines.pl line 167.
# got: '2'
# expected: '3'
# Looks like you failed 1 test of 3.

sysname │ snapshot │ branch │ bfurl
──────────┼─────────────────────┼────────┼──────────────────────────────────────────────────────────────────────────────────────────────
sungazer │ 2022-03-08 16:24:46 │ HEAD │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&amp;dt=2022-03-08%2016%3A24%3A46
mylodon │ 2022-05-18 00:14:19 │ HEAD │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&amp;dt=2022-05-18%2000%3A14%3A19

For me, it reproduces consistently with a sleep just before the startup
process exits:

--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -266,6 +266,8 @@ StartupProcessMain(void)
 	 */
 	StartupXLOG();

+ pg_usleep(3 * 1000 * 1000);
+
/*
* Exit normally. Exit code 0 tells postmaster that we completed recovery
* successfully.

A normal/passing run gets this sequence of events:

checkpointer: end-of-recovery checkpoint
startup process: exits
postmaster reaper(): "Startup succeeded, commence normal operations" = StartArchiver()
test issues its INSERT
shutdown checkpoint
checkpointer: exits
postmaster reaper(): signal_child(PgArchPID, SIGUSR2)
archiver: completes archiving, exits

However, if the startup process exit is slow enough, you get:

checkpointer: end-of-recovery checkpoint
test issues its INSERT
shutdown checkpoint
checkpointer: exits
postmaster reaper(): no PgArchPID, skip that part
[no archiving]

One can adapt the test to the server behavior by having the test wait for the
archiver to start, as attached. This is sufficient to make check-world pass
with the above sleep in place. I think we should also modify the PostgresNode
archive_command to log a message. That lack of logging was a obstacle
upthread (as seen in commit 3279cef) and again here.

An alternative would be to declare that the test is right and the server is
wrong. The postmaster knows how to start the checkpointer if the checkpointer
is not running when the postmaster needs a shutdown checkpoint. It could
start the archiver around that same area:

/* Start the checkpointer if not running */
if (CheckpointerPID == 0)
CheckpointerPID = StartCheckpointer();
/* And tell it to shut down */
if (CheckpointerPID != 0)
{
signal_child(CheckpointerPID, SIGUSR2);
pmState = PM_SHUTDOWN;
}

Any opinions between the change-test and change-server approaches?

The test is new in HEAD, but I get the same behavior in v14 and v13. In v12,
node_pitr2 never exits pg_is_in_recovery(). v11 would need test code changes,
which I did not attempt.

Thanks,
nm

Attachments:

archiver-test-wait-v1.patchtext/plain; charset=us-asciiDownload
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    

diff --git a/src/test/recovery/t/028_pitr_timelines.pl b/src/test/recovery/t/028_pitr_timelines.pl
index a8b12d9..bad02ed 100644
--- a/src/test/recovery/t/028_pitr_timelines.pl
+++ b/src/test/recovery/t/028_pitr_timelines.pl
@@ -140,6 +140,13 @@ is($result, qq{1}, "check table contents after point-in-time recovery");
 # back to this timeline.
 $node_pitr->safe_psql('postgres', "INSERT INTO foo VALUES(3);");
 
+# Wait for the archiver to be running.  The startup process might have yet to
+# exit, in which case the postmaster has not started the archiver.  If we
+# stop() without an archiver, the archive will be incomplete.
+$node_pitr->poll_query_until('postgres',
+	"SELECT true FROM pg_stat_activity WHERE backend_type = 'archiver';")
+  or die "Timed out while waiting for archiver to start";
+
 # Stop the node.  This archives the last segment.
 $node_pitr->stop();
 
archiver-test-logging-v1.patchtext/plain; charset=us-asciiDownload
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index c8c7bc5..12f35a7 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1124,7 +1124,7 @@ sub enable_archiving
 	my $copy_command =
 	  $PostgreSQL::Test::Utils::windows_os
 	  ? qq{copy "%p" "$path\\\\%f"}
-	  : qq{cp "%p" "$path/%f"};
+	  : qq{echo >&2 "ARCHIVE_COMMAND %p"; cp "%p" "$path/%f"};
 
 	# Enable archive_mode and archive_command on node
 	$self->append_conf(
#8Michael Paquier
michael@paquier.xyz
In reply to: Noah Misch (#7)
Re: last_archived_wal is not necessary the latest WAL file (was Re: pgsql: Add test case for an archive recovery corner case.)

On Mon, Jun 27, 2022 at 12:04:57AM -0700, Noah Misch wrote:

For me, it reproduces consistently with a sleep just before the startup
process exits:

Nice catch.

One can adapt the test to the server behavior by having the test wait for the
archiver to start, as attached. This is sufficient to make check-world pass
with the above sleep in place. I think we should also modify the PostgresNode
archive_command to log a message. That lack of logging was a obstacle
upthread (as seen in commit 3279cef) and again here.

          ? qq{copy "%p" "$path\\\\%f"}
-         : qq{cp "%p" "$path/%f"};
+         : qq{echo >&2 "ARCHIVE_COMMAND %p"; cp "%p" "$path/%f"};

This is a bit inelegant. Perhaps it would be better through a perl
wrapper like cp_history_files?

An alternative would be to declare that the test is right and the server is
wrong. The postmaster knows how to start the checkpointer if the checkpointer
is not running when the postmaster needs a shutdown checkpoint. It could
start the archiver around that same area:

/* Start the checkpointer if not running */
if (CheckpointerPID == 0)
CheckpointerPID = StartCheckpointer();
/* And tell it to shut down */
if (CheckpointerPID != 0)
{
signal_child(CheckpointerPID, SIGUSR2);
pmState = PM_SHUTDOWN;
}

Any opinions between the change-test and change-server approaches?

The startup sequence can be sometimes tricky. Though I don't have a
specific argument coming into mind, I would stick to a fix in the
test.
--
Michael

#9Noah Misch
noah@leadboat.com
In reply to: Michael Paquier (#8)
Re: last_archived_wal is not necessary the latest WAL file (was Re: pgsql: Add test case for an archive recovery corner case.)

On Mon, Jun 27, 2022 at 07:32:08PM +0900, Michael Paquier wrote:

On Mon, Jun 27, 2022 at 12:04:57AM -0700, Noah Misch wrote:

One can adapt the test to the server behavior by having the test wait for the
archiver to start, as attached. This is sufficient to make check-world pass
with the above sleep in place. I think we should also modify the PostgresNode
archive_command to log a message. That lack of logging was a obstacle
upthread (as seen in commit 3279cef) and again here.

? qq{copy "%p" "$path\\\\%f"}
-         : qq{cp "%p" "$path/%f"};
+         : qq{echo >&2 "ARCHIVE_COMMAND %p"; cp "%p" "$path/%f"};

This is a bit inelegant. Perhaps it would be better through a perl
wrapper like cp_history_files?

I see it the other way. Replacing a 49-character compound command with a
wrapper script would gain no particular advantage, and it would give readers
of the test code one more file to open and understand.