Test to dump and restore objects left behind by regression
Hi All,
In [1] we found that a test to dump and restore objects left
behind by the regression test is missing. Such a test would cover many
dump/restore scenarios without much effort. It would also help identify
problems like the one described in the same thread [2] during development itself.
I am starting a new thread to discuss such a test. Attached is a WIP
version of the test. The test does fail at the restore step when
commit 74563f6b90216180fc13649725179fc119dddeb5 is reverted,
reintroducing the problem.
The attached WIP test is inspired by
src/bin/pg_upgrade/t/002_pg_upgrade.pl, which tests binary-upgrade
dumps; the new test covers the non-binary-upgrade dumps.
Similar to 002_pg_upgrade.pl, the test compares SQL dumps taken before and
after dump and restore to make sure that the objects are restored correctly.
The test has some shortcomings:
1. Objects which are not dumped at all are never tested.
2. Since the rows are dumped in varying order by the two clusters, the
test only tests schema dump and restore.
3. The order of columns of the inheritance child table differs
depending upon the DDLs used to reach a given state. This introduces
diffs in the SQL dumps before and after restore. The test ignores
these diffs by hardcoding the diff in the test.
Even with 1 and 2 the test is useful to detect dump/restore anomalies.
I think we should improve 3, but I don't have a good, simple
solution. I didn't find any way to compare two given clusters in our
TAP test framework. Building one would be a lot of work; I am not sure
it's worth it.
Suggestions welcome.
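For concreteness, the cleanup the WIP test applies to the `diff -u` output before comparing it against the expected diff can be sketched as follows. This is a Python rendering purely for illustration (the test itself is Perl); the regexes mirror the ones in the attached patch:

```python
import re

def normalize_diff(text: str, file1: str, file2: str) -> str:
    # Mirror the test's cleanup of `diff -u` output: drop hunk headers
    # and file-name lines (line numbers and paths vary between runs),
    # drop empty lines, and normalize newlines across platforms.
    text = re.sub(r"^@@.*$", "", text, flags=re.M)
    text = re.sub(rf"^.*{re.escape(file1)}.*$", "", text, flags=re.M)
    text = re.sub(rf"^.*{re.escape(file2)}.*$", "", text, flags=re.M)
    text = re.sub(r"^\s*\n", "", text, flags=re.M)
    return text.replace("\r\n", "\n")

raw = ("--- /tmp/dump1.sql\n"
       "+++ /tmp/dump2.sql\n"
       "@@ -4,2 +4,2 @@\n"
       "-    b integer,\n"
       "+    a integer,\n")
print(normalize_diff(raw, "/tmp/dump1.sql", "/tmp/dump2.sql"))
# -    b integer,
# +    a integer,
```

Only the content lines of the diff survive the normalization, so the comparison against the hardcoded expected diff is stable across platforms and runs.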
[1]: /messages/by-id/CAExHW5vyqv=XLTcNMzCNccOrHiun_XhYPjcRqeV6dLvZSamriQ@mail.gmail.com
[2]: /messages/by-id/3462358.1708107856@sss.pgh.pa.us
--
Best Wishes,
Ashutosh Bapat
Attachments:
0001-WIP-Test-to-dump-and-restore-object-left-be-20240221.patch (application/x-patch)
From 78903d2cad4e94e05db74b2473f82aabb498f987 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Wed, 21 Feb 2024 11:02:40 +0530
Subject: [PATCH 1/2] WIP Test to dump and restore object left behind by
regression
Regression run leaves many database objects behind in a variety of states.
Dumping and restoring these objects covers many dump/restore scenarios not
covered elsewhere. src/bin/pg_upgrade/t/002_pg_upgrade.pl tests pg_upgrade this
way. But it does not cover non-binary-upgrade dump and restore.
The test takes a dump of regression database from one cluster and restores it
on another cluster. It compares the dumps from both the clusters in SQL format
to make sure that the objects dumped were restored properly. Obviously if some
objects were not dumped, those remain untested. Hence this isn't a complete
test.
The order in which data rows are dumped varies a lot between dump and restore,
hence the test compares only schema dumps.
The order in which columns of an inherited table are dumped varies depending
upon the DDLs used to set up inheritance. This introduces some difference in
the SQL dumps taken from the two clusters. Those differences are explicitly
ignored in the test.
TODO:
1. We could test formats other than -Fc
Ashutosh Bapat
---
src/bin/pg_dump/Makefile | 4 +
src/bin/pg_dump/t/006_pg_dump_regress.pl | 180 +++++++++++++++++++++++
2 files changed, 184 insertions(+)
create mode 100644 src/bin/pg_dump/t/006_pg_dump_regress.pl
diff --git a/src/bin/pg_dump/Makefile b/src/bin/pg_dump/Makefile
index 930c741c95..29a3d67953 100644
--- a/src/bin/pg_dump/Makefile
+++ b/src/bin/pg_dump/Makefile
@@ -42,6 +42,10 @@ OBJS = \
pg_backup_tar.o \
pg_backup_utils.o
+# required for 006_pg_dump_regress.pl
+REGRESS_SHLIB=$(abs_top_builddir)/src/test/regress/regress$(DLSUFFIX)
+export REGRESS_SHLIB
+
all: pg_dump pg_restore pg_dumpall
pg_dump: pg_dump.o common.o pg_dump_sort.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
diff --git a/src/bin/pg_dump/t/006_pg_dump_regress.pl b/src/bin/pg_dump/t/006_pg_dump_regress.pl
new file mode 100644
index 0000000000..c3016f975d
--- /dev/null
+++ b/src/bin/pg_dump/t/006_pg_dump_regress.pl
@@ -0,0 +1,180 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+# Test dump and restore of regression data. This is expected to cover dump and
+# restore of most of the types of objects left behind in different states by
+# the regression test.
+use strict;
+use warnings FATAL => 'all';
+
+use Cwd qw(abs_path);
+use File::Basename qw(dirname);
+use File::Compare;
+use File::Find qw(find);
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use PostgreSQL::Test::AdjustUpgrade;
+use Test::More;
+
+sub generate_regress_data
+{
+ my ($node) = @_;
+
+ # The default location of the source code is the root of this directory.
+ my $srcdir = abs_path("../../..");
+
+ # Grab any regression options that may be passed down by caller.
+ my $extra_opts = $ENV{EXTRA_REGRESS_OPTS} || "";
+
+ # --dlpath is needed to be able to find the location of regress.so
+ # and any libraries the regression tests require.
+ my $dlpath = dirname($ENV{REGRESS_SHLIB});
+
+ # --outputdir points to the path where to place the output files.
+ my $outputdir = $PostgreSQL::Test::Utils::tmp_check;
+
+ # --inputdir points to the path of the input files.
+ my $inputdir = "$srcdir/src/test/regress";
+
+ note 'running regression tests in old instance';
+ my $rc =
+ system($ENV{PG_REGRESS}
+ . " $extra_opts "
+ . "--dlpath=\"$dlpath\" "
+ . "--bindir= "
+ . "--host="
+ . $node->host . " "
+ . "--port="
+ . $node->port . " "
+ . "--schedule=$srcdir/src/test/regress/parallel_schedule "
+ . "--max-concurrent-tests=20 "
+ . "--inputdir=\"$inputdir\" "
+ . "--outputdir=\"$outputdir\"");
+ if ($rc != 0)
+ {
+ # Dump out the regression diffs file, if there is one
+ my $diffs = "$outputdir/regression.diffs";
+ if (-e $diffs)
+ {
+ print "=== dumping $diffs ===\n";
+ print slurp_file($diffs);
+ print "=== EOF ===\n";
+ }
+ }
+ is($rc, 0, 'regression tests pass');
+}
+
+# Paths to the dumps taken during the tests.
+my $tempdir = PostgreSQL::Test::Utils::tempdir;
+my $dump1_file = "$tempdir/dump1.sql";
+my $dump2_file = "$tempdir/dump2.sql";
+my $dump_file = "$tempdir/dump.out";
+
+# Initialize source node
+my $src_node =
+ PostgreSQL::Test::Cluster->new('src_node');
+
+$src_node->init();
+$src_node->start;
+generate_regress_data($src_node);
+
+# Initialize a new node for the upgrade.
+my $dst_node = PostgreSQL::Test::Cluster->new('new_node');
+
+$dst_node->init();
+
+# In a VPATH build, we'll be started in the source directory, but we want
+# to run the test in the build directory so that any generated files finish
+# in it.
+chdir ${PostgreSQL::Test::Utils::tmp_check};
+
+# Dump source database for comparison later
+command_ok(
+ [
+ 'pg_dump', '-s', '-d', 'regression',
+ '-h', $src_node->host,
+ '-p', $src_node->port,
+ '-f', $dump1_file
+ ],
+ 'pg_dump on source instance');
+
+# Dump to be restored
+command_ok(
+ [
+ 'pg_dump', '-Fc', '-d', 'regression',
+ '-h', $src_node->host,
+ '-p', $src_node->port,
+ '-f', $dump_file
+ ],
+ 'pg_dump on source instance');
+
+$dst_node->start;
+$dst_node->command_ok(
+ [ 'createdb', 'regression' ],
+ "created destination database");
+
+# Restore into destination database
+command_ok(
+ [
+ 'pg_restore', '-d', 'regression',
+ '-h', $dst_node->host,
+ '-p', $dst_node->port,
+ $dump_file
+ ],
+ 'pg_restore on destination instance');
+
+# Dump from destination database for comparison
+command_ok(
+ [
+ 'pg_dump', '-s', '-d', 'regression',
+ '-h', $dst_node->host,
+ '-p', $dst_node->port,
+ '-f', $dump2_file
+ ],
+ 'pg_dump on destination instance');
+
+# Compare the two dumps. Usually there is no difference except a difference in
+# column order caused because of the way the tables are created in regression
+# tests and the way they are dumped. Treat that as an exception.
+my $expected_diff = " --
+ CREATE TABLE public.gtestxx_4 (
+- b integer,
+- a integer NOT NULL
++ a integer NOT NULL,
++ b integer
+ )
+ INHERITS (public.gtest1);
+ --
+ CREATE TABLE public.test_type_diff2_c1 (
++ int_two smallint,
+ int_four bigint,
+- int_eight bigint,
+- int_two smallint
++ int_eight bigint
+ )
+ INHERITS (public.test_type_diff2);
+ --
+ CREATE TABLE public.test_type_diff2_c2 (
+- int_eight bigint,
+ int_two smallint,
+- int_four bigint
++ int_four bigint,
++ int_eight bigint
+ )
+ INHERITS (public.test_type_diff2);
+ ";
+my ($stdout, $stderr) =
+ run_command([ 'diff', '-u', $dump1_file, $dump2_file]);
+# Clear file names, line numbers from the diffs; those are not going to remain
+# the same always. Also clear empty lines and normalize new line characters
+# across platforms.
+$stdout =~ s/^\@\@.*$//mg;
+$stdout =~ s/^.*$dump1_file.*$//mg;
+$stdout =~ s/^.*$dump2_file.*$//mg;
+$stdout =~ s/^\s*\n//mg;
+$stdout =~ s/\r\n/\n/g;
+$expected_diff =~ s/\r\n/\n/g;
+is($stdout, $expected_diff, 'old and new dumps match after dump and restore');
+
+done_testing();
\ No newline at end of file
--
2.25.1
On Wed, Feb 21, 2024 at 12:18:45PM +0530, Ashutosh Bapat wrote:
Even with 1 and 2 the test is useful to detect dump/restore anomalies.
I think we should improve 3, but I don't have a good and simpler
solution. I didn't find any way to compare two given clusters in our
TAP test framework. Building it will be a lot of work. Not sure if
it's worth it.
+ my $rc =
+ system($ENV{PG_REGRESS}
+ . " $extra_opts "
+ . "--dlpath=\"$dlpath\" "
+ . "--bindir= "
+ . "--host="
+ . $node->host . " "
+ . "--port="
+ . $node->port . " "
+ . "--schedule=$srcdir/src/test/regress/parallel_schedule "
+ . "--max-concurrent-tests=20 "
+ . "--inputdir=\"$inputdir\" "
+ . "--outputdir=\"$outputdir\"");
I am not sure that it is a good idea to add a full regression test
cycle while we have already 027_stream_regress.pl that would be enough
to test some dump scenarios. These are very expensive and easy to
notice even with a high level of parallelization of the tests.
--
Michael
On Thu, Feb 22, 2024 at 6:32 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Feb 21, 2024 at 12:18:45PM +0530, Ashutosh Bapat wrote:
Even with 1 and 2 the test is useful to detect dump/restore anomalies.
I think we should improve 3, but I don't have a good and simpler
solution. I didn't find any way to compare two given clusters in our
TAP test framework. Building it will be a lot of work. Not sure if
it's worth it.
+ my $rc =
+ system($ENV{PG_REGRESS}
+ . " $extra_opts "
+ . "--dlpath=\"$dlpath\" "
+ . "--bindir= "
+ . "--host="
+ . $node->host . " "
+ . "--port="
+ . $node->port . " "
+ . "--schedule=$srcdir/src/test/regress/parallel_schedule "
+ . "--max-concurrent-tests=20 "
+ . "--inputdir=\"$inputdir\" "
+ . "--outputdir=\"$outputdir\"");
I am not sure that it is a good idea to add a full regression test
cycle while we have already 027_stream_regress.pl that would be enough
to test some dump scenarios.
That test *uses* pg_dump as a way to test whether the two clusters are
in sync. The test might change in future to use some other method to
make sure the two clusters are consistent. Adding the test here to
that test will make that change much harder.
It's not the dump, but restore, we are interested in here. No test
that runs PG_REGRESS also runs pg_restore in non-binary mode.
Also we need to keep this test near other pg_dump tests, not far from them.
These are very expensive and easy to
notice even with a high level of parallelization of the tests.
I agree, but I didn't find a suitable test to ride on.
--
Best Wishes,
Ashutosh Bapat
On 22.02.24 02:01, Michael Paquier wrote:
On Wed, Feb 21, 2024 at 12:18:45PM +0530, Ashutosh Bapat wrote:
Even with 1 and 2 the test is useful to detect dump/restore anomalies.
I think we should improve 3, but I don't have a good and simpler
solution. I didn't find any way to compare two given clusters in our
TAP test framework. Building it will be a lot of work. Not sure if
it's worth it.
+ my $rc =
+ system($ENV{PG_REGRESS}
+ . " $extra_opts "
+ . "--dlpath=\"$dlpath\" "
+ . "--bindir= "
+ . "--host="
+ . $node->host . " "
+ . "--port="
+ . $node->port . " "
+ . "--schedule=$srcdir/src/test/regress/parallel_schedule "
+ . "--max-concurrent-tests=20 "
+ . "--inputdir=\"$inputdir\" "
+ . "--outputdir=\"$outputdir\"");
I am not sure that it is a good idea to add a full regression test
cycle while we have already 027_stream_regress.pl that would be enough
to test some dump scenarios. These are very expensive and easy to
notice even with a high level of parallelization of the tests.
The problem is, we don't really have any end-to-end coverage of
dump
restore
dump again
compare the two dumps
with a database with lots of interesting objects in it.
Note that each of these steps could fail.
We have somewhat relied on the pg_upgrade test to provide this testing,
but we have recently discovered that the dumps in binary-upgrade mode
are different enough to not test the normal dumps well.
Yes, this test is a bit expensive. We could save some time by doing the
first dump at the end of the normal regress test and have the pg_dump
test reuse that, but then that would make the regress test run a bit
longer. Is that a better tradeoff?
I have done some timing tests:
master:
pg_dump check: 22s
pg_dump check -j8: 8s
check-world -j8: 2min44s
patched:
pg_dump check: 34s
pg_dump check -j8: 13s
check-world -j8: 2min46s
So overall it doesn't seem that bad.
On 22 Feb 2024, at 10:16, Peter Eisentraut <peter@eisentraut.org> wrote:
We have somewhat relied on the pg_upgrade test to provide this testing, but we have recently discovered that the dumps in binary-upgrade mode are different enough to not test the normal dumps well.
Yes, this test is a bit expensive. We could save some time by doing the first dump at the end of the normal regress test and have the pg_dump test reuse that, but then that would make the regress test run a bit longer. Is that a better tradeoff?
Something this expensive seems like what PG_TEST_EXTRA is intended for, we
already have important test suites there.
But. We know that the cluster has an interesting state when the pg_upgrade
test starts, could we use that to make a dump/restore test before continuing
with testing pg_upgrade? It can be argued that pg_upgrade shouldn't be
responsible for testing pg_dump, but it's already now a pretty important
testcase for pg_dump in binary upgrade mode, so it's not that far off. If pg_dump
has bugs then pg_upgrade risks subtly breaking.
When upgrading to the same version, we could perhaps also use this to test a
scenario like: Dump A, restore into B, upgrade B into C, dump C and compare C
to A.
--
Daniel Gustafsson
On Thu, Feb 22, 2024 at 3:03 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 22 Feb 2024, at 10:16, Peter Eisentraut <peter@eisentraut.org> wrote:
We have somewhat relied on the pg_upgrade test to provide this testing, but we have recently discovered that the dumps in binary-upgrade mode are different enough to not test the normal dumps well.
Yes, this test is a bit expensive. We could save some time by doing the first dump at the end of the normal regress test and have the pg_dump test reuse that, but then that would make the regress test run a bit longer. Is that a better tradeoff?
Something this expensive seems like what PG_TEST_EXTRA is intended for, we
already have important test suites there.
That's ok with me.
But. We know that the cluster has an interesting state when the pg_upgrade
test starts, could we use that to make a dump/restore test before continuing
with testing pg_upgrade? It can be argued that pg_upgrade shouldn't be
responsible for testing pg_dump, but it's already now a pretty important
testcase for pg_dump in binary upgrade mode so it's that far off. If pg_dump
has bugs then pg_upgrade risks subtly breaking.
Somebody looking for dump/restore tests wouldn't search
src/bin/pg_upgrade, I think. However if more people think we should
just add this test 002_pg_upgrade.pl, I am fine with it.
When upgrading to the same version, we could perhaps also use this to test a
scenario like: Dump A, restore into B, upgrade B into C, dump C and compare C
to A.
If comparison of C to A fails, we wouldn't know which step fails. I
would rather compare outputs of each step separately.
--
Best Wishes,
Ashutosh Bapat
On 22 Feb 2024, at 10:55, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Thu, Feb 22, 2024 at 3:03 PM Daniel Gustafsson <daniel@yesql.se> wrote:
Somebody looking for dump/restore tests wouldn't search
src/bin/pg_upgrade, I think.
Quite possibly not, but pg_upgrade is already today an important testsuite for
testing pg_dump in binary-upgrade mode so maybe more developers touching
pg_dump should?
When upgrading to the same version, we could perhaps also use this to test a
scenario like: Dump A, restore into B, upgrade B into C, dump C and compare C
to A.
If comparison of C to A fails, we wouldn't know which step fails. I
would rather compare outputs of each step separately.
To be clear, this wasn't intended to replace what you are proposing, but an
idea for using it to test *more* scenarios.
--
Daniel Gustafsson
On 22.02.24 11:00, Daniel Gustafsson wrote:
On 22 Feb 2024, at 10:55, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Thu, Feb 22, 2024 at 3:03 PM Daniel Gustafsson <daniel@yesql.se> wrote:
Somebody looking for dump/restore tests wouldn't search
src/bin/pg_upgrade, I think.
Quite possibly not, but pg_upgrade is already today an important testsuite for
testing pg_dump in binary-upgrade mode so maybe more developers touching
pg_dump should?
Yeah, I think attaching this to the existing pg_upgrade test would be a
good idea. Not only would it save test run time, it would probably also
reduce code duplication.
Peter Eisentraut <peter@eisentraut.org> writes:
The problem is, we don't really have any end-to-end coverage of
dump
restore
dump again
compare the two dumps
with a database with lots of interesting objects in it.
I'm very much against adding another full run of the core regression
tests to support this. But beyond the problem of not bloating the
check-world test runtime, there is the question of what this would
actually buy us. I doubt that it is worth very much, because
it would not detect bugs-of-omission in pg_dump. As I remarked in
the other thread, if pg_dump is blind to the existence of some
feature or field, testing that the dumps compare equal will fail
to reveal that it didn't restore that property.
I'm not sure what we could do about that. One could imagine writing
some test infrastructure that dumps out the contents of the system
catalogs directly, and comparing that instead of pg_dump output.
But that'd be a lot of infrastructure to write and maintain ...
and it's not real clear why it wouldn't *also* suffer from
I-forgot-to-add-this hazards.
On balance, I think there are good reasons that we've not added
such a test, and I don't believe those tradeoffs have changed.
regards, tom lane
On Thu, Feb 22, 2024 at 3:50 PM Peter Eisentraut <peter@eisentraut.org> wrote:
On 22.02.24 11:00, Daniel Gustafsson wrote:
On 22 Feb 2024, at 10:55, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Thu, Feb 22, 2024 at 3:03 PM Daniel Gustafsson <daniel@yesql.se> wrote:
Somebody looking for dump/restore tests wouldn't search
src/bin/pg_upgrade, I think.
Quite possibly not, but pg_upgrade is already today an important testsuite for
testing pg_dump in binary-upgrade mode so maybe more developers touching
pg_dump should?
Yeah, I think attaching this to the existing pg_upgrade test would be a
good idea. Not only would it save test run time, it would probably also
reduce code duplication.
That's more than one vote for adding the test to 002_pg_upgrade.pl.
Seems fine to me.
--
Best Wishes,
Ashutosh Bapat
On Thu, Feb 22, 2024 at 8:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Eisentraut <peter@eisentraut.org> writes:
The problem is, we don't really have any end-to-end coverage of
dump
restore
dump again
compare the two dumps
with a database with lots of interesting objects in it.
I'm very much against adding another full run of the core regression
tests to support this.
This will be taken care of by Peter's latest idea of augmenting
existing 002_pg_upgrade.pl.
But beyond the problem of not bloating the
check-world test runtime, there is the question of what this would
actually buy us. I doubt that it is worth very much, because
it would not detect bugs-of-omission in pg_dump. As I remarked in
the other thread, if pg_dump is blind to the existence of some
feature or field, testing that the dumps compare equal will fail
to reveal that it didn't restore that property.
I'm not sure what we could do about that. One could imagine writing
some test infrastructure that dumps out the contents of the system
catalogs directly, and comparing that instead of pg_dump output.
But that'd be a lot of infrastructure to write and maintain ...
and it's not real clear why it wouldn't *also* suffer from
I-forgot-to-add-this hazards.
If a developer forgets to add logic to dump objects that their patch
adds, it's hard to detect it, through testing alone, in every possible
case. We need reviewers to take care of that. I don't think that's the
objective of this test case or of pg_upgrade test either.
On balance, I think there are good reasons that we've not added
such a test, and I don't believe those tradeoffs have changed.
I am not aware of those reasons. Are they documented somewhere? Any
pointers to the previous discussion on this topic? Googling "pg_dump
regression pgsql-hackers" returns threads about performance
regressions.
On the flip side, the test I wrote reproduces the COMPRESSION/STORAGE
bug you reported along with a few other bugs in that area which I will
report soon on that thread. I think that shows that we need such a
test.
--
Best Wishes,
Ashutosh Bapat
On Fri, Feb 23, 2024 at 10:46 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Thu, Feb 22, 2024 at 8:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Eisentraut <peter@eisentraut.org> writes:
The problem is, we don't really have any end-to-end coverage of
dump
restore
dump again
compare the two dumps
with a database with lots of interesting objects in it.
I'm very much against adding another full run of the core regression
tests to support this.
This will be taken care of by Peter's latest idea of augmenting
existing 002_pg_upgrade.pl.
Incorporated the test into 002_pg_upgrade.pl.
Some points for discussion:
1. The test still hardcodes the diffs between the two dumps. I haven't found a
better way to do it. I did consider removing the problematic objects from
the regression database but thought against it since we would lose some
coverage.
2. The new code tests dump and restore of just the regression database and
does not use pg_dumpall like pg_upgrade. Should it instead perform
pg_dumpall? I decided against it since a. we are interested in dumping and
restoring objects left behind by regression, b. I didn't find a way to
provide the format option to pg_dumpall. The test could be enhanced to use
different dump formats.
I have added it to the next commitfest.
https://commitfest.postgresql.org/48/4956/
--
Best Wishes,
Ashutosh Bapat
Attachments:
0001-pg_dump-restore-regression-objects-20240426.patch (text/x-patch; charset=US-ASCII)
From cd1d0d3a2fe5ef6b7659ab710f0287d186ca0051 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Mon, 15 Apr 2024 08:20:34 +0200
Subject: [PATCH] pg_dump/restore regression objects
002_pg_upgrade.pl tests pg_upgrade on the regression database left
behind by the regression run. Modify it to also test pg_dump/restore of
the regression database.
Author: Ashutosh Bapat
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 117 +++++++++++++++++++++++++
1 file changed, 117 insertions(+)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 3e67121a8d..e79bd85a2a 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -197,6 +197,7 @@ is( $result,
my $srcdir = abs_path("../../..");
# Set up the data of the old instance with a dump or pg_regress.
+my $db_from_regress;
if (defined($ENV{olddump}))
{
# Use the dump specified.
@@ -207,6 +208,7 @@ if (defined($ENV{olddump}))
# not exist yet, and we are done here.
$oldnode->command_ok([ 'psql', '-X', '-f', $olddumpfile, 'postgres' ],
'loaded old dump file');
+ $db_from_regress = 0;
}
else
{
@@ -258,6 +260,7 @@ else
}
}
is($rc, 0, 'regression tests pass');
+ $db_from_regress = 1;
}
# Initialize a new node for the upgrade.
@@ -510,4 +513,118 @@ if ($compare_res != 0)
print "=== EOF ===\n";
}
+# Test normal dump/restore of the objects left behind by regression. Ideally it
+# should be done in a separate test, but doing it here saves us one full
+# regression run.
+if ($db_from_regress)
+{
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+ my $dump3_file = "$tempdir/dump3.sql";
+ my $dump4_file = "$tempdir/dump4.sql";
+ my $dump5_file = "$tempdir/dump5.sql";
+
+ $dst_node->init();
+ $oldnode->start;
+
+ # Dump source database for comparison later
+ command_ok(
+ [
+ 'pg_dump', '-s', '-d', 'regression',
+ '-h', $oldnode->host,
+ '-p', $oldnode->port,
+ '-f', $dump4_file
+ ],
+ 'pg_dump on source instance');
+
+ # Dump to be restored
+ command_ok(
+ [
+ 'pg_dump', '-Fc', '-d', 'regression',
+ '-h', $oldnode->host,
+ '-p', $oldnode->port,
+ '-f', $dump3_file
+ ],
+ 'pg_dump on source instance');
+
+ $dst_node->start;
+ $dst_node->command_ok(
+ [ 'createdb', 'regression' ],
+ "created destination database");
+
+ # Restore into destination database
+ command_ok(
+ [
+ 'pg_restore', '-d', 'regression',
+ '-h', $dst_node->host,
+ '-p', $dst_node->port,
+ $dump3_file
+ ],
+ 'pg_restore on destination instance');
+
+ # Dump from destination database for comparison
+ command_ok(
+ [
+ 'pg_dump', '-s', '-d', 'regression',
+ '-h', $dst_node->host,
+ '-p', $dst_node->port,
+ '-f', $dump5_file
+ ],
+ 'pg_dump on destination instance');
+
+ # Compare the two dumps. Ideally there should be no difference in the two
+ # dumps. But the column order in the dumps differs for inheritance
+ # children. Some regression tests purposefully create the child table with
+ # columns in different order than the parent using CREATE TABLE ...
+ # followed by ALTER TABLE ... INHERIT. These child tables are dumped as a
+ # single CREATE TABLE ... INHERITS with column order same as the child.
+ # When the child table is restored using this command, it creates the child
+ # table with same column order as the parent. The restored table is dumped
+ # as CREATE TABLE ... INHERITS but with columns order same as parent. Thus
+ # the column orders differ between the two dumps. Treat this difference as
+ # an exception.
+ #
+ # We could avoid this by dumping the database loaded from original dump.
+ # But that would change the state of the objects as left behind by the
+ # regression.
+ my $expected_diff = " --
+ CREATE TABLE public.gtestxx_4 (
+- b integer,
+- a integer NOT NULL
++ a integer NOT NULL,
++ b integer
+ )
+ INHERITS (public.gtest1);
+ --
+ CREATE TABLE public.test_type_diff2_c1 (
++ int_two smallint,
+ int_four bigint,
+- int_eight bigint,
+- int_two smallint
++ int_eight bigint
+ )
+ INHERITS (public.test_type_diff2);
+ --
+ CREATE TABLE public.test_type_diff2_c2 (
+- int_eight bigint,
+ int_two smallint,
+- int_four bigint
++ int_four bigint,
++ int_eight bigint
+ )
+ INHERITS (public.test_type_diff2);
+ ";
+ my ($stdout, $stderr) =
+ run_command([ 'diff', '-u', $dump4_file, $dump5_file]);
+ # Clear file names, line numbers from the diffs; those are not going to
+ # remain the same always. Also clear empty lines and normalize new line
+ # characters across platforms.
+ $stdout =~ s/^\@\@.*$//mg;
+ $stdout =~ s/^.*$dump4_file.*$//mg;
+ $stdout =~ s/^.*$dump5_file.*$//mg;
+ $stdout =~ s/^\s*\n//mg;
+ $stdout =~ s/\r\n/\n/g;
+ $expected_diff =~ s/\r\n/\n/g;
+ is($stdout, $expected_diff, 'old and new dumps match after dump and restore');
+}
+
done_testing();
--
2.34.1
On Fri, Apr 26, 2024 at 06:38:22PM +0530, Ashutosh Bapat wrote:
Some points for discussion:
1. The test still hardcodes the diffs between two dumps. Haven't found a
better way to do it. I did consider removing the problematic objects from
the regression database but thought against it since we would lose some
coverage.
2. The new code tests dump and restore of just the regression database and
does not use pg_dumpall like pg_upgrade. Should it instead perform
pg_dumpall? I decided against it since a. we are interested in dumping and
restoring objects left behind by regression, b. I didn't find a way to
provide the format option to pg_dumpall. The test could be enhanced to use
different dump formats.
I have added it to the next commitfest.
https://commitfest.postgresql.org/48/4956/
Ashutosh and I have discussed this patch a bit last week. Here is a
short summary of my input, after I understood what is going on.
+ # We could avoid this by dumping the database loaded from original dump.
+ # But that would change the state of the objects as left behind by the
+ # regression.
+ my $expected_diff = " --
+ CREATE TABLE public.gtestxx_4 (
+- b integer,
+- a integer NOT NULL
++ a integer NOT NULL,
++ b integer
+ )
[...]
+ my ($stdout, $stderr) =
+ run_command([ 'diff', '-u', $dump4_file, $dump5_file]);
+ # Clear file names, line numbers from the diffs; those are not going to
+ # remain the same always. Also clear empty lines and normalize new line
+ # characters across platforms.
+ $stdout =~ s/^\@\@.*$//mg;
+ $stdout =~ s/^.*$dump4_file.*$//mg;
+ $stdout =~ s/^.*$dump5_file.*$//mg;
+ $stdout =~ s/^\s*\n//mg;
+ $stdout =~ s/\r\n/\n/g;
+ $expected_diff =~ s/\r\n/\n/g;
+ is($stdout, $expected_diff, 'old and new dumps match after dump and restore');
+}
I am not a fan of what this patch does, adding the knowledge related
to the dump filtering within 002_pg_upgrade.pl. Please do not take
me wrong, I am not against the idea of adding that within this
pg_upgrade test to save from one full cycle of `make check` to check
the consistency of the dump. My issue is that this logic should be
externalized, and it should be in fewer lines of code.
For the externalization part, Ashutosh and I considered a few ideas,
but one that we found tempting is to create a small .pm, say named
AdjustDump.pm. This would share some rules with the existing
AdjustUpgrade.pm, which would be fine IMO even if there is a small
overlap, documenting the dependency between each module. That makes
the integration with the buildfarm much simpler by not creating more
dependencies with the modules shared between core and the buildfarm
code. For the "shorter" part, one idea that I had is to apply to the
dump a regexp that wipes out the column definitions within the
parenthesis, keeping around the CREATE TABLE and any other attributes
not impacted by the reordering. All that should be documented in the
module, of course.
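As a rough illustration of that "wipe out the column definitions" idea, the rule could look something like the sketch below. This is Python purely to illustrate the regexp; the real rule would live in the proposed AdjustDump.pm and be written in Perl, and the exact pattern would need to handle every CREATE TABLE variant pg_dump emits:

```python
import re

def strip_column_lists(dump: str) -> str:
    # Blank out the body of each CREATE TABLE's parenthesized column
    # list, keeping the statement head and any trailing clauses such
    # as INHERITS, so attribute reordering no longer produces diffs.
    return re.sub(
        r"(^CREATE TABLE [^(]+\().*?(^\)[^;]*;)",
        r"\1...\2",
        dump,
        flags=re.M | re.S,
    )

dump = """CREATE TABLE public.gtestxx_4 (
    b integer,
    a integer NOT NULL
)
INHERITS (public.gtest1);
"""
print(strip_column_lists(dump))
# CREATE TABLE public.gtestxx_4 (...)
# INHERITS (public.gtest1);
```

The tradeoff, of course, is that such a rule also hides genuine column-level restore bugs inside the wiped region, which is why documenting it carefully in the module matters.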
Another thing would be to improve the backend so as we are able to
a better support for physical column ordering, which would, I assume
(and correct me if I'm wrong!), prevent the reordering of the
attributes like in this inheritance case. But that would not address
the case of dumps taken from older versions with a new version of
pg_dump, which is something that may be interesting to have more tests
for in the long-term. Overall a module sounds like a better solution.
--
Michael
On Tue, Jun 4, 2024 at 4:28 AM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Apr 26, 2024 at 06:38:22PM +0530, Ashutosh Bapat wrote:
Some points for discussion:
1. The test still hardcodes the diffs between two dumps. Haven't found a
better way to do it. I did consider removing the problematic objects from
the regression database but thought against it since we would lose some
coverage.
2. The new code tests dump and restore of just the regression database and
does not use pg_dumpall like pg_upgrade. Should it instead perform
pg_dumpall? I decided against it since a. we are interested in dumping and
restoring objects left behind by regression, b. I didn't find a way to
provide the format option to pg_dumpall. The test could be enhanced to use
different dump formats.
I have added it to the next commitfest.
https://commitfest.postgresql.org/48/4956/
Ashutosh and I have discussed this patch a bit last week. Here is a
short summary of my input, after I understood what is going on.
+ # We could avoid this by dumping the database loaded from original dump.
+ # But that would change the state of the objects as left behind by the
+ # regression.
+ my $expected_diff = " --
+ CREATE TABLE public.gtestxx_4 (
+- b integer,
+- a integer NOT NULL
++ a integer NOT NULL,
++ b integer
+ )
[...]
+ my ($stdout, $stderr) =
+ run_command([ 'diff', '-u', $dump4_file, $dump5_file]);
+ # Clear file names, line numbers from the diffs; those are not going to
+ # remain the same always. Also clear empty lines and normalize new line
+ # characters across platforms.
+ $stdout =~ s/^\@\@.*$//mg;
+ $stdout =~ s/^.*$dump4_file.*$//mg;
+ $stdout =~ s/^.*$dump5_file.*$//mg;
+ $stdout =~ s/^\s*\n//mg;
+ $stdout =~ s/\r\n/\n/g;
+ $expected_diff =~ s/\r\n/\n/g;
+ is($stdout, $expected_diff, 'old and new dumps match after dump and restore');
+}
I am not a fan of what this patch does, adding the knowledge related
to the dump filtering within 002_pg_upgrade.pl. Please do not take
me wrong, I am not against the idea of adding that within this
pg_upgrade test to save from one full cycle of `make check` to check
the consistency of the dump. My issue is that this logic should be
externalized, and it should be in fewer lines of code.
For the externalization part, Ashutosh and I considered a few ideas,
but one that we found tempting is to create a small .pm, say named
AdjustDump.pm. This would share some rules with the existing
AdjustUpgrade.pm, which would be fine IMO even if there is a small
overlap, documenting the dependency between each module. That makes
the integration with the buildfarm much simpler by not creating more
dependencies with the modules shared between core and the buildfarm
code. For the "shorter" part, one idea that I had is to apply to the
dump a regexp that wipes out the column definitions within the
parenthesis, keeping around the CREATE TABLE and any other attributes
not impacted by the reordering. All that should be documented in the
module, of course.
Thanks for the suggestion. I didn't understand the dependency with the
buildfarm module. Will the new module be used in buildfarm separately? I
will work on this soon. Thanks for changing CF entry to WoA.
Another thing would be to improve the backend so that we are able to
have better support for physical column ordering, which would, I assume
(and correct me if I'm wrong!), prevent the reordering of the
attributes like in this inheritance case. But that would not address
the case of dumps taken from older versions with a new version of
pg_dump, which is something that may be interesting to have more tests
for in the long-term. Overall a module sounds like a better solution.
Changing the physical order of columns of a child table based on the
inherited table seems intentional as per MergeAttributes(). That logic
looks sane by itself. In binary mode pg_dump works very hard to retain the
column order by issuing UPDATE commands against catalog tables. I don't
think mimicking that behaviour is the right choice for non-binary dump. I
agree with your conclusion that we fix it by fixing the diffs. The code
to do that will be part of a separate module.
--
Best Wishes,
Ashutosh Bapat
On Wed, Jun 05, 2024 at 05:09:58PM +0530, Ashutosh Bapat wrote:
Thanks for the suggestion. I didn't understand the dependency with the
buildfarm module. Will the new module be used in buildfarm separately? I
will work on this soon. Thanks for changing CF entry to WoA.
I had some doubts about PGBuild/Modules/TestUpgradeXversion.pm, but
after double-checking it loads dynamically AdjustUpgrade from the core
tree based on the base path where all the modules are:
# load helper module from source tree
unshift(@INC, "$srcdir/src/test/perl");
require PostgreSQL::Test::AdjustUpgrade;
PostgreSQL::Test::AdjustUpgrade->import;
shift(@INC);
It would be annoying to tweak the buildfarm code more to have a
different behavior depending on the branch of Postgres tested.
Anyway, from what I can see, you could create a new module with the
dump filtering rules that AdjustUpgrade requires without having to
update the buildfarm code.
Changing the physical order of columns of a child table based on the
inherited table seems intentional as per MergeAttributes(). That logic
looks sane by itself. In binary mode pg_dump works very hard to retain the
column order by issuing UPDATE commands against catalog tables. I don't
think mimicking that behaviour is the right choice for non-binary dump. I
agree with your conclusion that we fix it by fixing the diffs. The code
to do that will be part of a separate module.
Thanks.
--
Michael
Sorry for the delay, but here's the next version of the patchset for review.
On Thu, Jun 6, 2024 at 5:07 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Jun 05, 2024 at 05:09:58PM +0530, Ashutosh Bapat wrote:
Thanks for the suggestion. I didn't understand the dependency with the
buildfarm module. Will the new module be used in buildfarm separately? I
will work on this soon. Thanks for changing CF entry to WoA.
I had some doubts about PGBuild/Modules/TestUpgradeXversion.pm, but
after double-checking it loads dynamically AdjustUpgrade from the core
tree based on the base path where all the modules are:
# load helper module from source tree
unshift(@INC, "$srcdir/src/test/perl");
require PostgreSQL::Test::AdjustUpgrade;
PostgreSQL::Test::AdjustUpgrade->import;
shift(@INC);
It would be annoying to tweak the buildfarm code more to have a
different behavior depending on the branch of Postgres tested.
Anyway, from what I can see, you could create a new module with the
dump filtering rules that AdjustUpgrade requires without having to
update the buildfarm code.
The two filtering rules that I picked from AdjustUpgrade() are a. use Unix
style newlines and b. eliminate blank lines. I think we could copy those rules
into the new module (as done in the patch) without creating any dependency
between modules. There's little gained by creating another perl function
just for those two sed commands. There's no way to do that otherwise. If we
keep those two modules independent, we will be free to change each module
as required in future. Do we need to change buildfarm code to load the
AdjustDump module like above? I am not familiar with the buildfarm code.
Here's a description of patches and some notes
0001
-------
1. Per your suggestion the logic to handle dump output differences is
externalized in PostgreSQL::Test::AdjustDump. Instead of eliminating those
differences altogether from both the dump outputs, the corresponding DDL in
the original dump output is adjusted to look like that from the restored
database. Thus we retain full knowledge of what differences to expect.
2. I have changed the name filter_dump to filter_dump_for_upgrade so as to
differentiate between two adjustments 1. for upgrade and 2. for
dump/restore. Ideally the name should have been adjust_dump_for_upgrade().
It's more of an adjustment than filtering as indicated by the function it
calls. But I haven't changed that. The new function to adjust dumps for
dump and restore tests is named adjust_dump_for_restore() however.
3. As suggested by Daniel upthread, the test for dump and restore happens
before upgrade which might change the old cluster thus changing the state
of objects left behind by regression. The test is not executed if
regression is not used to create the old cluster.
4. The code to compare two dumps and report differences if any is moved to
its own function compare_dumps() which is used for both upgrade and
dump/restore tests.
The test uses the custom dump format for dumping and restoring the database.
0002
------
This commit expands the previous test to test all dump formats. But as
expected that increases the time taken by this test. On my laptop 0001
takes approx 28 seconds to run the test and with 0002 it takes approx 35
seconds. But there's not much impact on the duration of running all the
tests (2m30s vs 2m40s). The code which creates the DDL statements in the
dump is independent of the dump format. So usually we shouldn't need to
test all the formats in this test. But each format stores the dependencies
between dumped objects in a different manner which would be tested with the
changes in this patch. I think this patch is also useful. If we decide to
keep this test, the patch is intended to be merged into 0001.
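For reference, the format-to-flag mapping the new loop relies on (substr($format, 0, 1) in the patch) can be sketched as:

```shell
# Each pg_dump -F spec is just the first letter of the format name, which is
# what the test's substr($format, 0, 1) exploits.
for format in tar directory custom plain; do
    format_spec=$(printf '%.1s' "$format")
    echo "$format -> pg_dump -F$format_spec"
done
# prints:
# tar -> pg_dump -Ft
# directory -> pg_dump -Fd
# custom -> pg_dump -Fc
# plain -> pg_dump -Fp
```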
--
Best Wishes,
Ashutosh Bapat
Attachments:
0002-Test-dump-and-restore-in-all-formats-20240628.patchtext/x-patch; charset=US-ASCII; name=0002-Test-dump-and-restore-in-all-formats-20240628.patchDownload
From dea7d55a8b938c1b670eebe7662a0dace5077a0d Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 15:17:29 +0530
Subject: [PATCH 3/3] Test dump and restore in all formats
Expanding on the previous commit, this commit modifies the test to dump
and restore regression database using all dump formats one by one.
But this changes increases the time to run test from 51s to 78s on my laptop.
If that's acceptable this commit should be squashed into the previous commit
before committing upstream.
Ashutosh Bapat
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 84 +++++++++++++++++---------
1 file changed, 54 insertions(+), 30 deletions(-)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 0e181b294d..5093b2bcaa 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -576,41 +576,65 @@ sub test_regression_dump_restore
{
my ($src_node, %node_params) = @_;
my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
- my $dump3_file = "$tempdir/dump3.custom";
- my $dump4_file = "$tempdir/dump4.sql";
- my $dump5_file = "$tempdir/dump5.sql";
-
-
- # Dump to be restored
- command_ok(
- [
- 'pg_dump', '-Fc', '--no-sync',
- '-d', $src_node->connstr('regression'),
- '-f', $dump3_file
- ],
- 'pg_dump on source instance');
$dst_node->init(%node_params);
$dst_node->start;
- $dst_node->command_ok([ 'createdb', 'regression' ],
- "created destination database");
- # Restore into destination database
- command_ok(
- [ 'pg_restore', '-d', $dst_node->connstr('regression'), $dump3_file ],
- 'pg_restore on destination instance');
-
- # Dump original and restored databases for comparison
+ # Dump original database for comparison
+ my $src_dump_file = "$tempdir/src_dump.sql";
take_dump_for_comparison($src_node->connstr('regression'),
- $dump4_file, 'original');
- take_dump_for_comparison($dst_node->connstr('regression'),
- $dump5_file, 'restored');
- my $dump4_adjusted = adjust_dump_for_restore($dump4_file, 1);
- my $dump5_adjusted = adjust_dump_for_restore($dump5_file, 0);
-
- # Compare adjusted dumps, there should be no differences.
- compare_dumps($dump4_adjusted, $dump5_adjusted,
- 'dump outputs of original and restored regression database match');
+ $src_dump_file, 'original');
+ my $src_adjusted_dump = adjust_dump_for_restore($src_dump_file, 1);
+
+ # Test dump and restore in all formats one by one
+ for my $format ('tar', 'directory', 'custom', 'plain')
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $dst_dump_file = "$tempdir/dest_dump.$format";
+ my $format_spec = substr($format, 0, 1);
+ my $restored_db = 'regression_' . $format;
+
+ command_ok(
+ [
+ 'pg_dump', "-F$format_spec", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '-f', $dump_file
+ ],
+ "pg_dump on source instance in '$format' format");
+
+ $dst_node->command_ok([ 'createdb', $restored_db ],
+ "created destination database '$restored_db'");
+
+ # Restore into destination database.
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [
+ 'psql', '-d', $dst_node->connstr($restored_db),
+ '-f', $dump_file
+ ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '-d',
+ $dst_node->connstr($restored_db), $dump_file
+ ];
+ }
+ command_ok(@restore_command,
+ "pg_restore on destination instance in '$format' format");
+
+ # Dump restored database for comparison
+ take_dump_for_comparison($dst_node->connstr($restored_db),
+ $dst_dump_file, 'restored');
+ my $dst_adjusted_dump = adjust_dump_for_restore($dst_dump_file, 0);
+
+ # Compare adjusted dumps, there should be no differences.
+ compare_dumps($src_adjusted_dump, $dst_adjusted_dump,
+ "dump outputs of original and restored regression database, using format '$format' match"
+ );
+ }
}
done_testing();
--
2.34.1
0001-pg_dump-restore-regression-objects-20240628.patchtext/x-patch; charset=US-ASCII; name=0001-pg_dump-restore-regression-objects-20240628.patchDownload
From 3ce3dfc2375759bd0083c8400d5b4ccaf9149aab Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH 2/3] pg_dump/restore regression objects
002_pg_upgrade.pl tests pg_upgrade on the regression database left
behind by regression run. Modify it to test dump and restore of the
regression database as well.
Regression database created by regression run contains almost all the
database objects supported by PostgreSQL left behind in various states.
The test thus covers wider dump and restore scenarios.
Author: Ashutosh Bapat
Reviewed by: Michael Pacquire
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 125 +++++++++++++++++---
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 108 +++++++++++++++++
2 files changed, 216 insertions(+), 17 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 17af2ce61e..0e181b294d 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -13,6 +13,7 @@ use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -36,9 +37,9 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
-sub filter_dump
+# Filter the contents of a dump before its use in a content comparison when
+# testing upgrade. This returns the path to the filtered dump.
+sub filter_dump_for_upgrade
{
my ($is_old, $old_version, $dump_file) = @_;
my $dump_contents = slurp_file($dump_file);
@@ -61,6 +62,57 @@ sub filter_dump
return $dump_file_filtered;
}
+# Test that the given two files (which usually contain output of pg_dump
+# command in plain format) match. Output the difference if any.
+sub compare_dumps
+{
+ my ($dump1, $dump2, $testname) = @_;
+
+ my $compare_res = compare($dump1, $dump2);
+ is($compare_res, 0, $testname);
+
+ # Provide more context if the dumps do not match.
+ if ($compare_res != 0)
+ {
+ my ($stdout, $stderr) =
+ run_command([ 'diff', '-u', $dump1, $dump2 ]);
+ print "=== diff of $dump1 and $dump2\n";
+ print "=== stdout ===\n";
+ print $stdout;
+ print "=== stderr ===\n";
+ print $stderr;
+ print "=== EOF ===\n";
+ }
+}
+
+# Dump the database specified in the connection string for comparing original
+# and restored databases. The order of columns in COPY statements dumped from
+# the original database and the restored database introduces differences which
+# are difficult to adjust. Hence dump only schema for now.
+sub take_dump_for_comparison
+{
+ my ($connstr, $dump_file, $dbinstance) = @_;
+
+ command_ok(
+ [ 'pg_dump', '-s', '--no-sync', '-d', $connstr, '-f', $dump_file ],
+ 'pg_dump on ' . $dbinstance . ' instance');
+}
+
+# Adjust the contents of a dump before its use in a content comparison when
+# testing dump and restore. This returns the path to the adjusted dump.
+sub adjust_dump_for_restore
+{
+ my ($dump_file, $original) = @_;
+ my $dump_adjusted = "${dump_file}_adjusted";
+
+ open(my $dh, '>', $dump_adjusted)
+ || die "opening $dump_adjusted ";
+ print $dh adjust_regress_dumpfile(slurp_file($dump_file), $original);
+ close($dh);
+
+ return $dump_adjusted;
+}
+
# The test of pg_upgrade requires two clusters, an old one and a new one
# that gets upgraded. Before running the upgrade, a logical dump of the
# old cluster is taken, and a second logical dump of the new one is taken
@@ -258,6 +310,12 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate test, but doing it here saves us one full
+ # regression run. Do this while the old cluster remains usable before
+ # upgrading it.
+ test_regression_dump_restore($oldnode, %node_params);
}
# Initialize a new node for the upgrade.
@@ -502,24 +560,57 @@ push(@dump_command, '--extra-float-digits', '0')
$newnode->command_ok(\@dump_command, 'dump after running pg_upgrade');
# Filter the contents of the dumps.
-my $dump1_filtered = filter_dump(1, $oldnode->pg_version, $dump1_file);
-my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
+my $dump1_filtered =
+ filter_dump_for_upgrade(1, $oldnode->pg_version, $dump1_file);
+my $dump2_filtered =
+ filter_dump_for_upgrade(0, $oldnode->pg_version, $dump2_file);
# Compare the two dumps, there should be no differences.
-my $compare_res = compare($dump1_filtered, $dump2_filtered);
-is($compare_res, 0, 'old and new dumps match after pg_upgrade');
+compare_dumps($dump1_filtered, $dump2_filtered,
+ 'old and new dumps match after pg_upgrade');
-# Provide more context if the dumps do not match.
-if ($compare_res != 0)
+# Test dump and restore of objects left behind regression run. It is expected
+# that regression tests, which create `regression`` database, are run on
+# `src_node` and the node is left in running state.
+sub test_regression_dump_restore
{
- my ($stdout, $stderr) =
- run_command([ 'diff', '-u', $dump1_filtered, $dump2_filtered ]);
- print "=== diff of $dump1_filtered and $dump2_filtered\n";
- print "=== stdout ===\n";
- print $stdout;
- print "=== stderr ===\n";
- print $stderr;
- print "=== EOF ===\n";
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+ my $dump3_file = "$tempdir/dump3.custom";
+ my $dump4_file = "$tempdir/dump4.sql";
+ my $dump5_file = "$tempdir/dump5.sql";
+
+
+ # Dump to be restored
+ command_ok(
+ [
+ 'pg_dump', '-Fc', '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '-f', $dump3_file
+ ],
+ 'pg_dump on source instance');
+
+ $dst_node->init(%node_params);
+ $dst_node->start;
+ $dst_node->command_ok([ 'createdb', 'regression' ],
+ "created destination database");
+
+ # Restore into destination database
+ command_ok(
+ [ 'pg_restore', '-d', $dst_node->connstr('regression'), $dump3_file ],
+ 'pg_restore on destination instance');
+
+ # Dump original and restored databases for comparison
+ take_dump_for_comparison($src_node->connstr('regression'),
+ $dump4_file, 'original');
+ take_dump_for_comparison($dst_node->connstr('regression'),
+ $dump5_file, 'restored');
+ my $dump4_adjusted = adjust_dump_for_restore($dump4_file, 1);
+ my $dump5_adjusted = adjust_dump_for_restore($dump5_file, 0);
+
+ # Compare adjusted dumps, there should be no differences.
+ compare_dumps($dump4_adjusted, $dump5_adjusted,
+ 'dump outputs of original and restored regression database match');
}
done_testing();
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 0000000000..df79cbe7c8
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,108 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for regression dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $original);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and retore tests
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+use PostgreSQL::Version;
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $original)
+
+If we take dump of the regression database left behind after running regression
+tests, restore the dump, and take dump of the restored regression database, the
+outputs of both the dumps differ. Some regression tests purposefully create
+some child tables in such a way that their column orders differ from column
+orders of their respective parents. When these child tables are restore using
+C<CREATE TABLE ... INHERITS> command, they have their column orders same as
+that of their respective parents. Thus the column orders of child tables in the
+original database and those in the restored database differ, causing difference
+in the dump outputs. Adjust these DDL statements in the dump file from original
+database to match those from the restored database so that both the dump files
+match.
+
+Additionally adjust blank and new lines to avoid noise.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<original>: 1 indicates that the given dump file is from the original
+database, else 0
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $original) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($original)
+ {
+ $dump =~ s/(^CREATE\sTABLE\spublic\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger)/$1$3,$2/mgx;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+ $dump =~ s/(CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+ }
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
--
2.34.1
On Fri, Jun 28, 2024 at 06:00:07PM +0530, Ashutosh Bapat wrote:
Here's a description of patches and some notes
0001
-------
1. Per your suggestion the logic to handle dump output differences is
externalized in PostgreSQL::Test::AdjustDump. Instead of eliminating those
differences altogether from both the dump outputs, the corresponding DDL in
the original dump output is adjusted to look like that from the restored
database. Thus we retain full knowledge of what differences to expect.
2. I have changed the name filter_dump to filter_dump_for_upgrade so as to
differentiate between two adjustments 1. for upgrade and 2. for
dump/restore. Ideally the name should have been adjust_dump_for_upgrade().
It's more of an adjustment than filtering as indicated by the function it
calls. But I haven't changed that. The new function to adjust dumps for
dump and restore tests is named adjust_dump_for_restore() however.
3. As suggested by Daniel upthread, the test for dump and restore happens
before upgrade which might change the old cluster thus changing the state
of objects left behind by regression. The test is not executed if
regression is not used to create the old cluster.
4. The code to compare two dumps and report differences if any is moved to
its own function compare_dumps() which is used for both upgrade and
dump/restore tests.
The test uses the custom dump format for dumping and restoring the
database.
At a quick glance, that seems to be going in the right direction. Note
that you have forgotten install and uninstall rules for the new .pm
file.
0002 increasing even more the runtime of a test that's already one of the
longest ones in the tree is not really appealing, I am afraid.
--
Michael
On Fri, Jul 5, 2024 at 10:59 AM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Jun 28, 2024 at 06:00:07PM +0530, Ashutosh Bapat wrote:
Here's a description of patches and some notes
0001
-------
1. Per your suggestion the logic to handle dump output differences is
externalized in PostgreSQL::Test::AdjustDump. Instead of eliminating those
differences altogether from both the dump outputs, the corresponding DDL
in
the original dump output is adjusted to look like that from the restored
database. Thus we retain full knowledge of what differences to expect.
2. I have changed the name filter_dump to filter_dump_for_upgrade so as to
differentiate between two adjustments 1. for upgrade and 2. for
dump/restore. Ideally the name should have been adjust_dump_for_upgrade().
It's more of an adjustment than filtering as indicated by the function it
calls. But I haven't changed that. The new function to adjust dumps for
dump and restore tests is named adjust_dump_for_restore() however.
3. As suggested by Daniel upthread, the test for dump and restore happens
before upgrade which might change the old cluster thus changing the state
of objects left behind by regression. The test is not executed if
regression is not used to create the old cluster.
4. The code to compare two dumps and report differences if any is moved to
its own function compare_dumps() which is used for both upgrade and
dump/restore tests.
The test uses the custom dump format for dumping and restoring the
database.
At a quick glance, that seems to be going in the right direction. Note
that you have forgotten install and uninstall rules for the new .pm
file.
Before submitting the patch, I looked for all the places which mention
AdjustUpgrade or AdjustUpgrade.pm to find places where the new module needs
to be mentioned. But I didn't find any. AdjustUpgrade is not mentioned
in src/test/perl/Makefile or src/test/perl/meson.build. Do we want to also
add AdjustUpgrade.pm in those files?
0002 increasing even more the runtime of a test that's already one of the
longest ones in the tree is not really appealing, I am afraid.
We could forget 0002. I am fine with that. But I can change the code such
that formats other than "plain" are tested when PG_TEST_EXTRA contains
"regress_dump_formats". Would that be acceptable?
--
Best Wishes,
Ashutosh Bapat
On Mon, Jul 08, 2024 at 03:59:30PM +0530, Ashutosh Bapat wrote:
Before submitting the patch, I looked for all the places which mention
AdjustUpgrade or AdjustUpgrade.pm to find places where the new module needs
to be mentioned. But I didn't find any. AdjustUpgrade is not mentioned
in src/test/perl/Makefile or src/test/perl/meson.build. Do we want to also
add AdjustUpgrade.pm in those files?
Good question. This has not been mentioned on the thread that added
the module:
/messages/by-id/891521.1673657296@sss.pgh.pa.us
And I could see it as being useful if installed. The same applies to
Kerberos.pm, actually. I'll ping that on a new thread.
We could forget 0002. I am fine with that. But I can change the code such
that formats other than "plain" are tested when PG_TEST_EXTRA contains
"regress_dump_formats". Would that be acceptable?
Interesting idea. That may be acceptable, under the same arguments as
the xid_wraparound one.
--
Michael
On Tue, Jul 9, 2024 at 1:07 PM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Jul 08, 2024 at 03:59:30PM +0530, Ashutosh Bapat wrote:
Before submitting the patch, I looked for all the places which mention
AdjustUpgrade or AdjustUpgrade.pm to find places where the new module needs
to be mentioned. But I didn't find any. AdjustUpgrade is not mentioned
in src/test/perl/Makefile or src/test/perl/meson.build. Do we want to also
add AdjustUpgrade.pm in those files?
Good question. This has not been mentioned on the thread that added
the module:
/messages/by-id/891521.1673657296@sss.pgh.pa.us
And I could see it as being useful if installed. The same applies to
Kerberos.pm, actually. I'll ping that on a new thread.
For now, it may be better to maintain the status quo. If we see a need to
use these modules in the future, say by extensions or tests outside the core
tree, we will add them to the meson and make files.
We could forget 0002. I am fine with that. But I can change the code such
that formats other than "plain" are tested when PG_TEST_EXTRA contains
"regress_dump_formats". Would that be acceptable?
Interesting idea. That may be acceptable, under the same arguments as
the xid_wraparound one.
Done. Added a new entry in PG_TEST_EXTRA documentation.
I have merged the two patches now.
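For anyone trying this out, the gating is a plain environment-variable check; the invocation and the substring test might look like this (a sketch; the exact make target is an assumption, and the TAP test does the equivalent check on $ENV{PG_TEST_EXTRA} in Perl):

```shell
# Running the pg_upgrade TAP test with all dump formats enabled would look
# something like (hypothetical invocation):
#   make -C src/bin/pg_upgrade check PG_TEST_EXTRA='regress_dump_formats'
# The gating itself is a simple whitespace-delimited substring match:
PG_TEST_EXTRA='kerberos regress_dump_formats'
case " $PG_TEST_EXTRA " in
    *" regress_dump_formats "*) echo 'testing all dump formats' ;;
    *) echo 'testing plain format only' ;;
esac
# prints: testing all dump formats
```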
--
Best Wishes,
Ashutosh Bapat
Attachments:
0001-Test-pg_dump-restore-of-regression-objects-20240712.patchtext/x-patch; charset=US-ASCII; name=0001-Test-pg_dump-restore-of-regression-objects-20240712.patchDownload
From aff0939b8aa8566daecf1ac35b9c7fce9fa851ca Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH 2/2] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade on the regression database left
behind by regression run. Modify it to test dump and restore of the
regression database as well.
Regression database created by regression run contains almost all the
database objects supported by PostgreSQL in various states. The test
thus covers wider dump and restore scenarios.
When PG_TEST_EXTRA has 'regress_dump_formats' in it, test dump and
restore in all supported formats. Otherwise test only "plain" format so
that the test finishes quickly.
Author: Ashutosh Bapat
Reviewed by: Michael Pacquire
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
doc/src/sgml/regress.sgml | 13 ++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 173 ++++++++++++++++++--
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 109 ++++++++++++
3 files changed, 277 insertions(+), 18 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index d1042e0222..8c1a9ddc40 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -336,6 +336,19 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_formats</literal></term>
+ <listitem>
+ <para>
+ When enabled,
+ <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename> tests dump
+ and restore of regression database using all dump formats. Otherwise
+ tests only <literal>plain</literal> format. Not enabled by default
+ because it is resource intensive.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 17af2ce61e..613512ffe7 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -13,6 +13,7 @@ use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -36,9 +37,9 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
-sub filter_dump
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
+sub filter_dump_for_upgrade
{
my ($is_old, $old_version, $dump_file) = @_;
my $dump_contents = slurp_file($dump_file);
@@ -61,6 +62,44 @@ sub filter_dump
return $dump_file_filtered;
}
+# Test that the given two files match. The files usually contain pg_dump
+# output in "plain" format. Output the difference if any.
+sub compare_dumps
+{
+ my ($dump1, $dump2, $testname) = @_;
+
+ my $compare_res = compare($dump1, $dump2);
+ is($compare_res, 0, $testname);
+
+ # Provide more context if the dumps do not match.
+ if ($compare_res != 0)
+ {
+ my ($stdout, $stderr) =
+ run_command([ 'diff', '-u', $dump1, $dump2 ]);
+ print "=== diff of $dump1 and $dump2\n";
+ print "=== stdout ===\n";
+ print $stdout;
+ print "=== stderr ===\n";
+ print $stderr;
+ print "=== EOF ===\n";
+ }
+}
+
+# Adjust the contents of a dump before its use in a content comparison for dump
+# and restore testing. This returns the path to the adjusted dump.
+sub adjust_dump_for_restore
+{
+ my ($dump_file, $is_original) = @_;
+ my $dump_adjusted = "${dump_file}_adjusted";
+
+ open(my $dh, '>', $dump_adjusted)
+ || die "opening $dump_adjusted ";
+ print $dh adjust_regress_dumpfile(slurp_file($dump_file), $is_original);
+ close($dh);
+
+ return $dump_adjusted;
+}
+
# The test of pg_upgrade requires two clusters, an old one and a new one
# that gets upgraded. Before running the upgrade, a logical dump of the
# old cluster is taken, and a second logical dump of the new one is taken
@@ -258,6 +297,12 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate test, but doing it here saves us one full
+ # regression run. Do this while the old cluster remains usable before
+ # upgrading it.
+ test_regression_dump_restore($oldnode, %node_params);
}
# Initialize a new node for the upgrade.
@@ -502,24 +547,116 @@ push(@dump_command, '--extra-float-digits', '0')
$newnode->command_ok(\@dump_command, 'dump after running pg_upgrade');
# Filter the contents of the dumps.
-my $dump1_filtered = filter_dump(1, $oldnode->pg_version, $dump1_file);
-my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
+my $dump1_filtered =
+ filter_dump_for_upgrade(1, $oldnode->pg_version, $dump1_file);
+my $dump2_filtered =
+ filter_dump_for_upgrade(0, $oldnode->pg_version, $dump2_file);
# Compare the two dumps, there should be no differences.
-my $compare_res = compare($dump1_filtered, $dump2_filtered);
-is($compare_res, 0, 'old and new dumps match after pg_upgrade');
-
-# Provide more context if the dumps do not match.
-if ($compare_res != 0)
+compare_dumps($dump1_filtered, $dump2_filtered,
+ 'old and new dumps match after pg_upgrade');
+
+# Test dump and restore of objects left behind by the regression run.
+#
+# It is expected that regression tests, which create `regression` database, are
+# run on `src_node`, which in turn is left in running state. The dump is
+# restored on a fresh node created using given `node_params`. Plain dumps from
+# both the nodes are compared to make sure that all the dumped objects are
+# restored faithfully.
+sub test_regression_dump_restore
{
- my ($stdout, $stderr) =
- run_command([ 'diff', '-u', $dump1_filtered, $dump2_filtered ]);
- print "=== diff of $dump1_filtered and $dump2_filtered\n";
- print "=== stdout ===\n";
- print $stdout;
- print "=== stderr ===\n";
- print $stderr;
- print "=== EOF ===\n";
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+
+ # Dump the original database in "plain" format for comparison later. The
+ # order of columns in COPY statements dumped from the original database and
+ # that from the restored database differs. These differences are hard to
+ # adjust. Hence we compare only schema dumps for now.
+ my $src_dump_file = "$tempdir/src_dump.sql";
+ command_ok(
+ [
+ 'pg_dump', '-s',
+ '--no-sync', '-d',
+ $src_node->connstr('regression'), '-f',
+ $src_dump_file
+ ],
+ 'pg_dump on original instance');
+ my $src_adjusted_dump = adjust_dump_for_restore($src_dump_file, 1);
+
+ # Setup destination database
+ $dst_node->init(%node_params);
+ $dst_node->start;
+
+ # Testing all dump formats takes longer. Do it only when explicitly
+ # requested.
+ my @formats;
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_formats\b/)
+ {
+ @formats = ('tar', 'directory', 'custom', 'plain');
+ }
+ else
+ {
+ @formats = ('plain');
+ }
+
+ for my $format (@formats)
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $format_spec = substr($format, 0, 1);
+ my $restored_db = 'regression_' . $format;
+
+ # Even though we compare only schema from the original and the restored
+ # database, we dump and restore data as well to catch any errors while
+ # doing so.
+ command_ok(
+ [
+ 'pg_dump', "-F$format_spec", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '-f', $dump_file
+ ],
+ "pg_dump on source instance in '$format' format");
+
+ $dst_node->command_ok([ 'createdb', $restored_db ],
+ "created destination database '$restored_db'");
+
+ # Restore into destination database.
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [
+ 'psql', '-d', $dst_node->connstr($restored_db),
+ '-f', $dump_file
+ ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '-d',
+ $dst_node->connstr($restored_db), $dump_file
+ ];
+ }
+ command_ok(@restore_command,
+ "pg_restore on destination instance in '$format' format");
+
+ # Dump restored database for comparison
+ my $dst_dump_file = "$tempdir/dest_dump.$format.sql";
+ command_ok(
+ [
+ 'pg_dump', '-s',
+ '--no-sync', '-d',
+ $dst_node->connstr($restored_db), '-f',
+ $dst_dump_file
+ ],
+ "pg_dump on instance restored with '$format' format");
+ my $dst_adjusted_dump = adjust_dump_for_restore($dst_dump_file, 0);
+
+ # Compare adjusted dumps, there should be no differences.
+ compare_dumps($src_adjusted_dump, $dst_adjusted_dump,
+ "dump outputs of original and restored regression database, using format '$format', match"
+ );
+ }
}
done_testing();
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 0000000000..cd0516b58f
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,109 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $original);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and restore tests.
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $original)
+
+If we take a dump of the regression database left behind after running the
+regression tests, restore the dump, and take a dump of the restored database,
+the outputs of the two dumps differ.
+some child tables in such a way that their column orders differ from column
+orders of their respective parents. In the restored database, however, their
+column orders are the same as those of their respective parents. Thus the column
+orders of these child tables in the original database and those in the restored
+database differ, causing difference in the dump outputs. See MergeAttributes()
+and dumpTableSchema() for details.
+
+This routine rearranges the column declarations in these C<CREATE TABLE ... INHERITS>
+statements in the dump file from original database to match that from the
+restored database.
+
+Additionally it adjusts blank and new lines to avoid noise.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<original>: 1 indicates that the given dump file is from the original
+database, else 0
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $original) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($original)
+ {
+ $dump =~ s/(^CREATE\sTABLE\spublic\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger)/$1$3,$2/mgx;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+ $dump =~ s/(CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+ }
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
--
2.34.1
On Fri, Jul 12, 2024 at 10:42 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
I have merged the two patches now.
894be11adfa60ad1ce5f74534cf5f04e66d51c30 changed the schema in which
objects in test generated_stored.sql are created. Because of this the
new test added by the patch was failing. Fixed the failure in the
attached.
--
Best Wishes,
Ashutosh Bapat
Attachments:
0001-Test-pg_dump-restore-of-regression-objects-20240909.patchtext/x-patch; charset=US-ASCII; name=0001-Test-pg_dump-restore-of-regression-objects-20240909.patchDownload
From 3b4573b0d3bb59fd21e01c3887a3d9cab8643238 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH 2/2] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade on the regression database left
behind by regression run. Modify it to test dump and restore of the
regression database as well.
The regression database created by a regression run contains almost all
the database objects supported by PostgreSQL, in various states. The
test thus covers a wide range of dump and restore scenarios.
When PG_TEST_EXTRA has 'regress_dump_formats' in it, test dump and
restore in all supported formats. Otherwise test only "plain" format so
that the test finishes quickly.
Author: Ashutosh Bapat
Reviewed by: Michael Paquier
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
doc/src/sgml/regress.sgml | 13 ++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 173 ++++++++++++++++++--
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 109 ++++++++++++
3 files changed, 277 insertions(+), 18 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index d1042e02228..8c1a9ddc403 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -336,6 +336,19 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_formats</literal></term>
+ <listitem>
+ <para>
+ When enabled,
+ <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename> tests dump
+ and restore of regression database using all dump formats. Otherwise
+ tests only <literal>plain</literal> format. Not enabled by default
+ because it is resource intensive.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 17af2ce61ef..613512ffe7d 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -13,6 +13,7 @@ use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -36,9 +37,9 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
-sub filter_dump
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
+sub filter_dump_for_upgrade
{
my ($is_old, $old_version, $dump_file) = @_;
my $dump_contents = slurp_file($dump_file);
@@ -61,6 +62,44 @@ sub filter_dump
return $dump_file_filtered;
}
+# Test that the given two files match. The files usually contain pg_dump
+# output in "plain" format. Output the difference if any.
+sub compare_dumps
+{
+ my ($dump1, $dump2, $testname) = @_;
+
+ my $compare_res = compare($dump1, $dump2);
+ is($compare_res, 0, $testname);
+
+ # Provide more context if the dumps do not match.
+ if ($compare_res != 0)
+ {
+ my ($stdout, $stderr) =
+ run_command([ 'diff', '-u', $dump1, $dump2 ]);
+ print "=== diff of $dump1 and $dump2\n";
+ print "=== stdout ===\n";
+ print $stdout;
+ print "=== stderr ===\n";
+ print $stderr;
+ print "=== EOF ===\n";
+ }
+}
+
+# Adjust the contents of a dump before its use in a content comparison for dump
+# and restore testing. This returns the path to the adjusted dump.
+sub adjust_dump_for_restore
+{
+ my ($dump_file, $is_original) = @_;
+ my $dump_adjusted = "${dump_file}_adjusted";
+
+ open(my $dh, '>', $dump_adjusted)
+ || die "opening $dump_adjusted ";
+ print $dh adjust_regress_dumpfile(slurp_file($dump_file), $is_original);
+ close($dh);
+
+ return $dump_adjusted;
+}
+
# The test of pg_upgrade requires two clusters, an old one and a new one
# that gets upgraded. Before running the upgrade, a logical dump of the
# old cluster is taken, and a second logical dump of the new one is taken
@@ -258,6 +297,12 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate test, but doing it here saves us one full
+ # regression run. Do this while the old cluster remains usable before
+ # upgrading it.
+ test_regression_dump_restore($oldnode, %node_params);
}
# Initialize a new node for the upgrade.
@@ -502,24 +547,116 @@ push(@dump_command, '--extra-float-digits', '0')
$newnode->command_ok(\@dump_command, 'dump after running pg_upgrade');
# Filter the contents of the dumps.
-my $dump1_filtered = filter_dump(1, $oldnode->pg_version, $dump1_file);
-my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
+my $dump1_filtered =
+ filter_dump_for_upgrade(1, $oldnode->pg_version, $dump1_file);
+my $dump2_filtered =
+ filter_dump_for_upgrade(0, $oldnode->pg_version, $dump2_file);
# Compare the two dumps, there should be no differences.
-my $compare_res = compare($dump1_filtered, $dump2_filtered);
-is($compare_res, 0, 'old and new dumps match after pg_upgrade');
-
-# Provide more context if the dumps do not match.
-if ($compare_res != 0)
+compare_dumps($dump1_filtered, $dump2_filtered,
+ 'old and new dumps match after pg_upgrade');
+
+# Test dump and restore of objects left behind by the regression run.
+#
+# It is expected that regression tests, which create `regression` database, are
+# run on `src_node`, which in turn is left in running state. The dump is
+# restored on a fresh node created using given `node_params`. Plain dumps from
+# both the nodes are compared to make sure that all the dumped objects are
+# restored faithfully.
+sub test_regression_dump_restore
{
- my ($stdout, $stderr) =
- run_command([ 'diff', '-u', $dump1_filtered, $dump2_filtered ]);
- print "=== diff of $dump1_filtered and $dump2_filtered\n";
- print "=== stdout ===\n";
- print $stdout;
- print "=== stderr ===\n";
- print $stderr;
- print "=== EOF ===\n";
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+
+ # Dump the original database in "plain" format for comparison later. The
+ # order of columns in COPY statements dumped from the original database and
+ # that from the restored database differs. These differences are hard to
+ # adjust. Hence we compare only schema dumps for now.
+ my $src_dump_file = "$tempdir/src_dump.sql";
+ command_ok(
+ [
+ 'pg_dump', '-s',
+ '--no-sync', '-d',
+ $src_node->connstr('regression'), '-f',
+ $src_dump_file
+ ],
+ 'pg_dump on original instance');
+ my $src_adjusted_dump = adjust_dump_for_restore($src_dump_file, 1);
+
+ # Setup destination database
+ $dst_node->init(%node_params);
+ $dst_node->start;
+
+ # Testing all dump formats takes longer. Do it only when explicitly
+ # requested.
+ my @formats;
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_formats\b/)
+ {
+ @formats = ('tar', 'directory', 'custom', 'plain');
+ }
+ else
+ {
+ @formats = ('plain');
+ }
+
+ for my $format (@formats)
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $format_spec = substr($format, 0, 1);
+ my $restored_db = 'regression_' . $format;
+
+ # Even though we compare only schema from the original and the restored
+ # database, we dump and restore data as well to catch any errors while
+ # doing so.
+ command_ok(
+ [
+ 'pg_dump', "-F$format_spec", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '-f', $dump_file
+ ],
+ "pg_dump on source instance in '$format' format");
+
+ $dst_node->command_ok([ 'createdb', $restored_db ],
+ "created destination database '$restored_db'");
+
+ # Restore into destination database.
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [
+ 'psql', '-d', $dst_node->connstr($restored_db),
+ '-f', $dump_file
+ ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '-d',
+ $dst_node->connstr($restored_db), $dump_file
+ ];
+ }
+ command_ok(@restore_command,
+ "pg_restore on destination instance in '$format' format");
+
+ # Dump restored database for comparison
+ my $dst_dump_file = "$tempdir/dest_dump.$format.sql";
+ command_ok(
+ [
+ 'pg_dump', '-s',
+ '--no-sync', '-d',
+ $dst_node->connstr($restored_db), '-f',
+ $dst_dump_file
+ ],
+ "pg_dump on instance restored with '$format' format");
+ my $dst_adjusted_dump = adjust_dump_for_restore($dst_dump_file, 0);
+
+ # Compare adjusted dumps, there should be no differences.
+ compare_dumps($src_adjusted_dump, $dst_adjusted_dump,
+ "dump outputs of original and restored regression database, using format '$format', match"
+ );
+ }
}
done_testing();
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..7697b488b17
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,109 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $original);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and restore tests.
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $original)
+
+If we take a dump of the regression database left behind after running the
+regression tests, restore the dump, and take a dump of the restored database,
+the outputs of the two dumps differ.
+some child tables in such a way that their column orders differ from column
+orders of their respective parents. In the restored database, however, their
+column orders are the same as those of their respective parents. Thus the column
+orders of these child tables in the original database and those in the restored
+database differ, causing difference in the dump outputs. See MergeAttributes()
+and dumpTableSchema() for details.
+
+This routine rearranges the column declarations in these C<CREATE TABLE ... INHERITS>
+statements in the dump file from original database to match that from the
+restored database.
+
+Additionally it adjusts blank and new lines to avoid noise.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<original>: 1 indicates that the given dump file is from the original
+database, else 0
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $original) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($original)
+ {
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger)/$1$3,$2/mgx;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+ $dump =~ s/(CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+ }
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
--
2.34.1
On Mon, Sep 09, 2024 at 03:43:58PM +0530, Ashutosh Bapat wrote:
894be11adfa60ad1ce5f74534cf5f04e66d51c30 changed the schema in which
objects in test generated_stored.sql are created. Because of this the
new test added by the patch was failing. Fixed the failure in the
attached.
On my laptop, testing the plain format adds roughly 12s, in a test
that now takes 1m20s to run vs 1m32s. Enabling regress_dump_formats
and adding three more formats counts for 45s of runtime. For a test
that usually shows up as the last one to finish for a heavily
parallelized run. So even the default of "plain" is going to be
noticeable, I am afraid.
+ test_regression_dump_restore($oldnode, %node_params);
Why is this only done for the main regression test suite? Perhaps it
could be useful as well for tests that want to check after their own
custom dumps, as a shortcut?
Linked to that. Could there be some use in being able to pass down a
list of databases to this routine, rather than being limited only to
"regression"? Think extension databases with USE_MODULE_DB that have
unique names.
+ # Dump the original database in "plain" format for comparison later. The
+ # order of columns in COPY statements dumped from the original database and
[...]
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($original)
+ {
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger)/$1$3,$2/mgx;
The reason why $original exists is documented partially in both
002_pg_upgrade.pl and AdjustDump.pm. It would be better to
consolidate that only in AdjustDump.pm, I guess. Isn't the name
"$original" a bit too general when it comes to applying filters to
the dumps to as the order of the column re-dumped is under control?
Perhaps it would be adapted to use a hash that can be extended with
more than one parameter to control which filters are applied? For
example, imagine a %params where the caller of adjust_dumpfile() can
pass in a "filter_column_order => 1". The filters applied to the dump
are then self-documented. We could do with a second for the
whitespaces, as well.
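A sketch of what that hash-driven interface could look like; the parameter names (filter_whitespace, filter_column_order) and the standalone sub are hypothetical, purely to illustrate the self-documenting call sites:

```perl
use strict;
use warnings;

# Hypothetical %params-driven variant of the dump adjustment; the
# parameter names are assumptions for illustration, not committed API.
sub adjust_dumpfile
{
	my ($dump, %params) = @_;

	if ($params{filter_whitespace})
	{
		# Use Unix newlines and collapse runs of blank lines.
		$dump =~ s/\r\n/\n/g;
		$dump =~ s/\n\n+/\n/g;
	}

	if ($params{filter_column_order})
	{
		# Reorder columns of a known inheritance child, as the
		# existing adjust_regress_dumpfile() does, e.g.:
		$dump =~ s/(^CREATE\sTABLE\spublic\.gtestxx_4\s\()
			(\n\s+b\sinteger),
			(\n\s+a\sinteger)/$1$3,$2/mgx;
	}

	return $dump;
}

# Call sites then document which filters they rely on:
# my $adjusted = adjust_dumpfile($dump,
#	filter_column_order => 1, filter_whitespace => 1);
```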
What's the advantage of testing all the formats? Would that stuff
have been able to catch up more issues related to specific format(s)
when it came to the compression improvements with inheritance?
I'm wondering if it would make sense to also externalize the dump
comparison routine currently in the pg_upgrade script. Perhaps we
should be more ambitious and move more logic into AdjustDump.pm? If
we think that the full cycle of dump -> restore -> dump -> compare
could be used elsewhere, this would limit the footprint of what we are
doing in the pg_upgrade script in this patch and be able to do similar
stuff in out-of-core extensions or other tests. Let's say a
PostgreSQL::Test::Dump.pm?
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On my laptop, testing the plain format adds roughly 12s, in a test
that now takes 1m20s to run vs 1m32s. Enabling regress_dump_formats
and adding three more formats counts for 45s of runtime. For a test
that usually shows up as the last one to finish for a heavily
parallelized run. So even the default of "plain" is going to be
noticeable, I am afraid.
Yeah, that's what I've been afraid of from the start. There's
no way that this will buy us enough new coverage to justify
that sort of addition to every check-world run.
I'd be okay with adding it in a form where the default behavior
is to do no additional checking. Whether that's worth maintaining
is hard to say though.
regards, tom lane
On Thu, Oct 31, 2024 at 10:26:01AM -0400, Tom Lane wrote:
I'd be okay with adding it in a form where the default behavior
is to do no additional checking. Whether that's worth maintaining
is hard to say though.
In terms of maintenance, it would be nice if we are able to minimize
the code added to the pg_upgrade suite, so as it would be simple to
switch this code elsewhere if need be.
I'd imagine a couple of new routines, in the lines of:
- Dump of a database into an output file given in input, as a routine
of Cluster.pm so as it is possible to do dumps from different major
versions. Format should be defined in input.
- Restore to a database from an input file, also as a routine of
Cluster.pm, for the major version argument.
- Filter of the dumps for the contents where column ordering is
inconsistent up at restore. In a new module.
- Comparison of two dumps, with potentially filters applied to them,
with diff printed. In a new module.
--
Michael
Hi Tom and Michael,
Thanks for your inputs.
I am replying to all the comments in a single email arranging related
comments together.
On Thu, Oct 31, 2024 at 11:26 AM Michael Paquier <michael@paquier.xyz> wrote:
On my laptop, testing the plain format adds roughly 12s, in a test
that now takes 1m20s to run vs 1m32s. Enabling regress_dump_formats
and adding three more formats counts for 45s of runtime. For a test
that usually shows up as the last one to finish for a heavily
parallelized run. So even the default of "plain" is going to be
noticeable, I am afraid.
On Thu, Oct 31, 2024 at 10:26:01AM -0400, Tom Lane wrote:
I'd be okay with adding it in a form where the default behavior
is to do no additional checking.
If I run the test alone, it takes 45s (master) vs 54s (with patch) on
my laptop. These readings are similar to what you have observed. The
restore step by itself takes most of the time, even if a. we eliminate
data, b. use formats other than plain or c. use --jobs=2. Hence I am
fine with Tom's suggestion i.e. default behaviour is to do no
additional testing. I propose to test all dump formats (including
plain) only when PG_TEST_EXTRA has "regress_dump_tests". But see the
next point.
What's the advantage of testing all the formats? Would that stuff
have been able to catch up more issues related to specific format(s)
when it came to the compression improvements with inheritance?
I haven't caught any more issues with formats other than "plain". It
is more for future-proof testing. I am fine if we want to test just
plain dump format for now. Adding more formats would be easier if
required.
Whether that's worth maintaining
is hard to say though.
In terms of maintenance, it would be nice if we are able to minimize
the code added to the pg_upgrade suite, so as it would be simple to
switch this code elsewhere if need be.
I think Tom hints at maintenance of
AdjustDump::adjust_dump_for_restore(). In future, if the difference
between dump from the original database and that from the restored
database grows, we will need to update
AdjustDump::adjust_dump_for_restore() accordingly. That will be some
maintenance. But the person introducing such changes will get a chance
to fix them if unintentional. That balances out any maintenance
efforts, I think.
+ test_regression_dump_restore($oldnode, %node_params);
Why is this only done for the main regression test suite? Perhaps it
could be useful as well for tests that want to check after their own
custom dumps, as a shortcut?Linked to that. Could there be some use in being able to pass down a
list of databases to this routine, rather than being limited only to
"regression"? Think extension databases with USE_MODULE_DB that have
unique names.
I did think of it when implementing this function. In order to test
the custom dumps or extensions, adjust_regress_dumpfile() will need to
be extensible or the test will need a way to accept a custom dump file
for comparison. Without a concrete use case, adding the customization
hooks might go wrong and will need rework.
test_regression_dump_restore() itself is isolated enough that we can
extend it when the need arises. When the need arises we will know what
needs to be extensible and how. If you have a specific use case,
please let me know, I will accommodate it in my patch.
Perhaps we
should be more ambitious and move more logic into AdjustDump.pm? If
we think that the full cycle of dump -> restore -> dump -> compare
could be used elsewhere, this would limit the footprint of what we are
doing in the pg_upgrade script in this patch and be able to do similar
stuff in out-of-core extensions or other tests. Let's say a
PostgreSQL::Test::Dump.pm?
dump->restore->dump->compare pattern is seen only in 002_pg_upgrade
test. 002_compare_backups compares dumps from servers but does not use
the dump->restore->dump->compare pattern. If a similar pattern starts
appearing at multiple places, we will easily move
test_regression_dump_restore() to a common module to avoid code
duplication. That function is isolated enough for that purpose.
- Dump of a database into an output file given in input, as a routine
of Cluster.pm so as it is possible to do dumps from different major
versions. Format should be defined in input.
Since you are suggesting adding the new routine to Cluster.pm, I
assume that you would like to use it in many tests (ideally every test
which uses pg_dump). I did attempt this when I wrote the last version
of the patch. Code to run a pg_dump command is just a few lines. The
tests invoke pg_dump in many different ways with many different
combinations of arguments. In order to cater all those invocations,
the routine in Cluster.pm needs to be very versatile and thus complex.
It will certainly be at least a dozen lines. If such a routine were
useful, it would have been added to Cluster.pm already; it's not there
because it would not be useful.
We could turn the two invocations of pg_dump for comparison (in the
patch) into a routine if that helps. It might shave a few lines of
code. Since the routine won't be general, it should reside in
002_pg_upgrade where it is used.
If you have something else in mind, please let me know.
- Restore to a database from an input file, also as a routine of
Cluster.pm, for the major version argument.
Similar to the above, each of the pg_restore invocations is just a few
lines, but there is a lot of variety in those invocations.
- Filter of the dumps for the contents where column ordering is
inconsistent up at restore. In a new module.
Please note, this is filtering + adjustment. The routine is already in
a new module as you suggested earlier.
I'm wondering if it would make sense to also externalize the dump
comparison routine currently in the pg_upgrade script.
- Comparison of two dumps, with potentially filters applied to them,
with diff printed. In a new module.
It is a good idea to externalize the compare_dump() function in
PostgreSQL::Test::Utils. Similar code exists in
002_compare_backups.pl. 027_stream_regress.pl also compares dump
files, but it uses the `diff` command for the same purpose. We can
change both usages to use compare_dump().
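For intuition, the shared helper boils down to "normalize both dumps, then diff them". A rough Python sketch of that idea (illustrative only; the actual helper is Perl, and the names here are invented):

```python
import difflib

def adjust(dump: str) -> str:
    # Normalize line endings and squeeze runs of blank lines,
    # mirroring what the adjustment step does before comparison.
    dump = dump.replace("\r\n", "\n")
    while "\n\n" in dump:
        dump = dump.replace("\n\n", "\n")
    return dump

def compare_dumps(src: str, dst: str) -> list:
    # Return unified-diff lines; an empty list means the dumps match.
    return list(difflib.unified_diff(
        adjust(src).splitlines(), adjust(dst).splitlines(), lineterm=""))

src = "CREATE TABLE t (\n    a integer\n);\n\n\n"
dst = "CREATE TABLE t (\n    a integer\n);\n"
print(compare_dumps(src, dst))  # → []
```

The two dumps above differ only in trailing blank lines, so after normalization the diff is empty.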
+ # Dump the original database in "plain" format for comparison later. The
+ # order of columns in COPY statements dumped from the original database and
[...]
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($original)
+ {
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger)/$1$3,$2/mgx;
The reason why $original exists is documented partially in both
002_pg_upgrade.pl and AdjustDump.pm. It would be better to
consolidate that only in AdjustDump.pm, I guess.
I believe the comment in 002_pg_upgrade.pl you quoted above and the
prologue of adjust_regress_dumpfile() are the two places you are
referring to. They serve different purposes. The one in 002_pg_upgrade
explains why we dump only schema for comparison. It is independent of
whether the dump is taken from the original database or target
database. The argument "original" to adjust_regress_dumpfile() is only
explained in the function's prologue in AdjustDump.pm. Am I missing
something?
Isn't the name
"$original" a bit too general when it comes to applying filters to
the dumps to as the order of the column re-dumped is under control?
Perhaps it would be adapted to use a hash that can be extended with
more than one parameter to control which filters are applied? For
example, imagine a %params where the caller of adjust_dumpfile() can
pass in a "filter_column_order => 1". The filters applied to the dump
are then self-documented. We could do with a second for the
whitespaces, as well.
I agree that "original" is a generic name. And I like your suggestion
partly. I will rename it as "adjust_column_order".
But I don't think we need to use a hash, since filters like the
whitespace one do not depend upon whether the dump is from the source
or the target database. IOW those filters are not optional. A hash
would add extra indirection unnecessarily. If in future we have to add
another adjustment which applies only under certain conditions, we
could use a hash of switches, but till then let's keep it simple.
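To illustrate the kind of adjustment being switched on and off, here is a hypothetical Python rendering of the gtestxx_4 column swap (the real code is a Perl substitution in AdjustDump.pm; the Python function name is made up):

```python
import re

# A child table dumped with columns in creation order (b before a),
# while the restored database dumps them in the parent's order.
dump = (
    "CREATE TABLE generated_stored_tests.gtestxx_4 (\n"
    "    b integer,\n"
    "    a integer\n"
    ")\n"
)

def adjust_child_columns(dump):
    # Swap the two column declarations so the original dump matches
    # the column order produced by the restored database.
    return re.sub(
        r"(^CREATE TABLE generated_stored_tests\.gtestxx_4 \()"
        r"(\n\s+b integer),"
        r"(\n\s+a integer)",
        r"\1\3,\2",
        dump,
        flags=re.M,
    )

print(adjust_child_columns(dump))
```

With the flag off, the dump is left untouched; with it on, `a integer` moves ahead of `b integer`, matching the restored side.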
--
Best Wishes,
Ashutosh Bapat
On Thu, Nov 7, 2024 at 3:59 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
Hi Tom and Michael,
Thanks for your inputs.
I am replying to all the comments in a single email arranging related
comments together.
On Thu, Oct 31, 2024 at 11:26 AM Michael Paquier <michael@paquier.xyz> wrote:
On my laptop, testing the plain format adds roughly 12s, in a test
that now takes 1m20s to run vs 1m32s. Enabling regress_dump_formats
and adding three more formats counts for 45s of runtime. For a test
that usually shows up as the last one to finish for a heavily
parallelized run. So even the default of "plain" is going to be
noticeable, I am afraid.
On Thu, Oct 31, 2024 at 10:26:01AM -0400, Tom Lane wrote:
I'd be okay with adding it in a form where the default behavior
is to do no additional checking.
If I run the test alone, it takes 45s (master) vs 54s (with patch) on
my laptop. These readings are similar to what you have observed. The
restore step by itself takes most of the time, even if a. we eliminate
data, b. use formats other than plain or c. use --jobs=2. Hence I am
fine with Tom's suggestion i.e. default behaviour is to do no
additional testing. I propose to test all dump formats (including
plain) only when PG_TEST_EXTRA has "regress_dump_tests".
Done.
But see next
What's the advantage of testing all the formats? Would that stuff
have been able to catch up more issues related to specific format(s)
when it came to the compression improvements with inheritance?
I haven't caught any more issues with formats other than "plain". It
is more for future-proof testing. I am fine if we want to test just
plain dump format for now. Adding more formats would be easier if
required.
Not done for now. Given that the 'directory' format dumps the tables
into separate files, and thus has some impact on how child tables
are dumped and restored, I think we should at least have plain
and directory tested in this test. But I will wait for other opinions
before removing formats other than plain.
Whether that's worth maintaining
is hard to say though.
In terms of maintenance, it would be nice if we are able to minimize
the code added to the pg_upgrade suite, so as it would be simple to
switch this code elsewhere if need be.
I think Tom hints at maintenance of
AdjustDump::adjust_dump_for_restore(). In future, if the difference
between dump from the original database and that from the restored
database grows, we will need to update
AdjustDump::adjust_dump_for_restore() accordingly. That will be some
maintenance. But the person introducing such changes will get a chance
to fix them if unintentional. That balances out any maintenance
efforts, I think.
I added a test in AdjustDump::adjust_dump_for_restore() to make sure
that the column order adjustment is indeed applied. Thus now the test
will fail when
a. the adjustment is not needed anymore, in which case we could remove
the adjustment logic
b. more adjustments are required
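The guard amounts to applying the substitution and insisting it changed something; a small, hypothetical Python analogue of that pattern (the real check is the ok($saved_dump ne $dump, ...) test in AdjustDump.pm):

```python
import re

def adjust_with_guard(dump, pattern, repl):
    adjusted = re.sub(pattern, repl, dump, flags=re.M)
    # If the substitution no longer matches anything, the workaround
    # is stale and should be removed (case a); any remaining mismatch
    # surfaces later as a dump diff (case b).
    assert adjusted != dump, "dump required adjustment"
    return adjusted

swapped = adjust_with_guard(
    "    b integer,\n    a integer",
    r"(\s+b integer),(\n\s+a integer)",
    r"\2,\1")
print(swapped)
```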
Interestingly, I have caught a new difference in dump from original
and restored database. See the difference between attached plain dump
files. I will start a new thread to see if this difference is
legitimate. Had this test been part of core, we would have caught it
earlier.
Because of this difference, the test is failing. I will wait for the
conclusion on the other thread before adding more adjustments.
We could turn the two invocations of pg_dump for comparison (in the
patch) into a routine if that helps. It might shave a few lines of
code. Since the routine won't be general, it should reside in
002_pg_upgrade where it is used.
Done. Added a function to take the dump output from a given server and
adjust it. The function is used for both the original and restored
databases. It shaves a handful of lines and deduplicates the logic to
take and adjust the dump. I like the end result.
I agree that "original" is a generic name. And I like your suggestion
partly. I will rename it as "adjust_column_order".
Done.
Also added AdjustDump.pm to the list of modules to be installed in
meson.build and Makefile.
All these changes are part of 0001 patch now.
I'm wondering if it would make sense to also externalize the dump
comparison routine currently in the pg_upgrade script.
- Comparison of two dumps, with potentially filters applied to them,
with diff printed. In a new module.
It is a good idea to externalize the compare_dump() function in
PostgreSQL::Test::Utils. Similar code exists in
002_compare_backups.pl. 027_stream_regress.pl also uses compare() to
compare dump files but it uses `diff` command for the same. We can
change both usages to use compare_dump().
I have made that change in the 0002 patch. The resultant code looks
better; it standardizes the way we compare dumps and report
differences, if any. As a bonus, the dump files being compared are
"noted" in regress_log_* so that it's easy to locate them for
debugging and investigation. The new
PostgreSQL::Test::Utils::compare_dumps() routine compares the contents
of given two dump files. If the files do not match it will print the
difference along with the paths of files. If the files match, it will
"note" the paths. Three tests use this routine now:
002_compare_backups, 002_pg_upgrade and 027_stream_regress. With this
change 002_pg_upgrade will start "note"ing path of dump files which it
didn't do before. The resultant output in the regress_log_... file is
more useful, I think.
```
ok 16 - dump outputs of original and restored regression database,
using format 'tar', match
# first dump file:
/masked-path/build/dev/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_RIwT/src_dump.sql_adjusted
# second dump file:
/masked-path/build/dev/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_RIwT/dest_dump.tar.sql_adjusted
```
027_stream_regress used command_ok + diff for the same purpose. But I
don't see a reason, in the relevant thread [1], why it can't use the new
routine instead.
[1]: /messages/by-id/CA+hUKGK-+mg6RWiDu0JudF6jWeL5+gPmi8EKUm1eAzmdbwiE_A@mail.gmail.com
I am not against the other suggestions to make the functions, code
added by this patch more general and extensible. But without an
example or case for such generalization and/or extensibility, it's
hard to get it right. And the functions and code are isolated enough
that we could generalize and extend them if the need arises.
--
Best Wishes,
Ashutosh Bapat
Attachments:
0002-Add-PostreSQL-Test-Utils-compare_dumps-20241114.patchtext/x-patch; charset=US-ASCII; name=0002-Add-PostreSQL-Test-Utils-compare_dumps-20241114.patchDownload
From 7c815b425612dd3682ddd02353e3862815798119 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Wed, 13 Nov 2024 11:41:45 +0530
Subject: [PATCH 3/3] Add PostgreSQL::Test::Utils::compare_dumps()
Multiple tests compare pg_dump outputs taken from two clusters in plain format
as a way to compare the contents of those clusters. Standardize and modularize
that process into a routine.
Author: Ashutosh Bapat (ashutosh.bapat.oss@gmail.com) per suggestion by Michael Paquier
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
.../pg_combinebackup/t/002_compare_backups.pl | 18 +------
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 5 +-
src/test/perl/PostgreSQL/Test/Utils.pm | 48 +++++++++++++++++++
src/test/recovery/t/027_stream_regress.pl | 14 +++---
4 files changed, 57 insertions(+), 28 deletions(-)
diff --git a/src/bin/pg_combinebackup/t/002_compare_backups.pl b/src/bin/pg_combinebackup/t/002_compare_backups.pl
index 63a0255de15..23bc8504eb3 100644
--- a/src/bin/pg_combinebackup/t/002_compare_backups.pl
+++ b/src/bin/pg_combinebackup/t/002_compare_backups.pl
@@ -185,22 +185,6 @@ $pitr1->command_ok(
'dump from PITR 2');
# Compare the two dumps, there should be no differences.
-my $compare_res = compare($dump1, $dump2);
-note($dump1);
-note($dump2);
-is($compare_res, 0, "dumps are identical");
-
-# Provide more context if the dumps do not match.
-if ($compare_res != 0)
-{
- my ($stdout, $stderr) =
- run_command([ 'diff', '-u', $dump1, $dump2 ]);
- print "=== diff of $dump1 and $dump2\n";
- print "=== stdout ===\n";
- print $stdout;
- print "=== stderr ===\n";
- print $stderr;
- print "=== EOF ===\n";
-}
+compare_dumps($dump1, $dump2, "dumps are identical");
done_testing();
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 35d3d3970d8..216574dc6d6 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -6,9 +6,8 @@ use warnings FATAL => 'all';
use Cwd qw(abs_path);
use File::Basename qw(dirname);
-use File::Compare;
-use File::Find qw(find);
-use File::Path qw(rmtree);
+use File::Find qw(find);
+use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 022b44ba22b..6efe5faf77d 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -50,6 +50,7 @@ use Cwd;
use Exporter 'import';
use Fcntl qw(:mode :seek);
use File::Basename;
+use File::Compare;
use File::Find;
use File::Spec;
use File::stat qw(stat);
@@ -89,6 +90,8 @@ our @EXPORT = qw(
command_fails_like
command_checks_all
+ compare_dumps
+
$windows_os
$is_msys2
$use_unix_sockets
@@ -1081,6 +1084,51 @@ sub command_checks_all
=pod
+=item compare_dumps(dump1, dump2, testname)
+
+Test that the given two files match. The files usually contain pg_dump output in
+"plain" format. Output the difference if any.
+
+=over
+
+=item C<dump1> and C<dump2>: Dump files to compare
+
+=item C<testname>: test name
+
+=back
+
+=cut
+
+sub compare_dumps
+{
+ my ($dump1, $dump2, $testname) = @_;
+
+ my $compare_res = compare($dump1, $dump2);
+ is($compare_res, 0, $testname);
+
+ # Provide more context
+ if ($compare_res != 0)
+ {
+ my ($stdout, $stderr) =
+ run_command([ 'diff', '-u', $dump1, $dump2 ]);
+ print "=== diff of $dump1 and $dump2\n";
+ print "=== stdout ===\n";
+ print $stdout;
+ print "=== stderr ===\n";
+ print $stderr;
+ print "=== EOF ===\n";
+ }
+ else
+ {
+ note('first dump file: ' . $dump1);
+ note('second dump file: ' . $dump2);
+ }
+
+ return;
+}
+
+=pod
+
=back
=cut
diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl
index d1ae32d97d6..b5ea1356751 100644
--- a/src/test/recovery/t/027_stream_regress.pl
+++ b/src/test/recovery/t/027_stream_regress.pl
@@ -116,8 +116,9 @@ command_ok(
'--no-sync', '-p', $node_standby_1->port
],
'dump standby server');
-command_ok(
- [ 'diff', $outputdir . '/primary.dump', $outputdir . '/standby.dump' ],
+compare_dumps(
+ $outputdir . '/primary.dump',
+ $outputdir . '/standby.dump',
'compare primary and standby dumps');
# Likewise for the catalogs of the regression database, after disabling
@@ -146,12 +147,9 @@ command_ok(
'regression'
],
'dump catalogs of standby server');
-command_ok(
- [
- 'diff',
- $outputdir . '/catalogs_primary.dump',
- $outputdir . '/catalogs_standby.dump'
- ],
+compare_dumps(
+ $outputdir . '/catalogs_primary.dump',
+ $outputdir . '/catalogs_standby.dump',
'compare primary and standby catalog dumps');
# Check some data from pg_stat_statements.
--
2.34.1
0001-Test-pg_dump-restore-of-regression-objects-20241114.patchtext/x-patch; charset=US-ASCII; name=0001-Test-pg_dump-restore-of-regression-objects-20241114.patchDownload
From 75ed13c1d9d331e1e7735bcc2ad2610e07d92409 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH 2/3] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade on the regression database left
behind by regression run. Modify it to test dump and restore of the
regression database as well.
Regression database created by regression run contains almost all the
database objects supported by PostgreSQL in various states. The test
thus covers dump and restore scenarios not covered by individual
dump/restore cases. Many regression tests leave objects behind so that
they are tested by this test. But till now this test only tested
dump/restore through pg_upgrade which is different from dump/restore
through pg_dump. Adding this test closes that gap.
Testing dump and restore of regression database makes this test run
longer for a relatively smaller benefit. Hence run it only when
explicitly requested by user by specifying "regress_dump_tests" in
PG_TEST_EXTRA.
Author: Ashutosh Bapat
Reviewed by: Michael Paquier, Tom Lane
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
doc/src/sgml/regress.sgml | 13 ++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 141 +++++++++++++++++---
src/test/perl/Makefile | 2 +
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 118 ++++++++++++++++
src/test/perl/meson.build | 1 +
5 files changed, 258 insertions(+), 17 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index f4cef9e80f7..f04799382ba 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -336,6 +336,19 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_formats</literal></term>
+ <listitem>
+ <para>
+ When enabled,
+ <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename> tests dump
+ and restore of the regression database using all dump formats. Otherwise
+ it tests only the <literal>plain</literal> format. Not enabled by default
+ because it is resource intensive.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 3b9cb21cbd5..35d3d3970d8 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -13,6 +13,7 @@ use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -36,9 +37,9 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
-sub filter_dump
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
+sub filter_dump_for_upgrade
{
my ($is_old, $old_version, $dump_file) = @_;
my $dump_contents = slurp_file($dump_file);
@@ -262,6 +263,20 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate test, but doing it here saves us one full
+ # regression run.
+ #
+ # This step takes several extra seconds. Do it only when requested so as to
+ # avoid spending those extra seconds in every check-world run.
+ #
+ # Do this while the old cluster remains usable before upgrading it.
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_tests\b/)
+ {
+ test_regression_dump_restore($oldnode, %node_params);
+ }
}
# Initialize a new node for the upgrade.
@@ -506,24 +521,116 @@ push(@dump_command, '--extra-float-digits', '0')
$newnode->command_ok(\@dump_command, 'dump after running pg_upgrade');
# Filter the contents of the dumps.
-my $dump1_filtered = filter_dump(1, $oldnode->pg_version, $dump1_file);
-my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
+my $dump1_filtered =
+ filter_dump_for_upgrade(1, $oldnode->pg_version, $dump1_file);
+my $dump2_filtered =
+ filter_dump_for_upgrade(0, $oldnode->pg_version, $dump2_file);
# Compare the two dumps, there should be no differences.
-my $compare_res = compare($dump1_filtered, $dump2_filtered);
-is($compare_res, 0, 'old and new dumps match after pg_upgrade');
+compare_dumps($dump1_filtered, $dump2_filtered,
+ 'old and new dumps match after pg_upgrade');
+
+# Test dump and restore of the objects left behind by the regression run.
+#
+# It is expected that the regression tests, which create the `regression`
+# database, are run on `src_node`, which in turn is left in a running state.
+# The dump is restored on a fresh node created using the given `node_params`.
+# Plain dumps from both nodes are compared to make sure that all the dumped
+# objects are restored faithfully.
+sub test_regression_dump_restore
+{
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+
+ # Dump the original database for comparison later.
+ my $src_dump = get_dump_for_comparison($src_node->connstr('regression'),
+ 'src_dump', 1);
+
+ # Setup destination database
+ $dst_node->init(%node_params);
+ $dst_node->start;
+
+ for my $format ('plain', 'tar', 'directory', 'custom')
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $format_spec = substr($format, 0, 1);
+ my $restored_db = 'regression_' . $format;
+
+ # Even though we compare only schema from the original and the restored
+ # database, we dump and restore data as well to catch any errors while
+ # doing so.
+ command_ok(
+ [
+ 'pg_dump', "-F$format_spec", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '-f', $dump_file
+ ],
+ "pg_dump on source instance in '$format' format");
+
+ $dst_node->command_ok([ 'createdb', $restored_db ],
+ "created destination database '$restored_db'");
+
+ # Restore into destination database.
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [
+ 'psql', '-d', $dst_node->connstr($restored_db),
+ '-f', $dump_file
+ ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '-d',
+ $dst_node->connstr($restored_db), $dump_file
+ ];
+ }
+ command_ok(@restore_command,
+ "restore dump taken in '$format' format on destination instance");
+
+ # Dump restored database for comparison
+ my $dst_dump =
+ get_dump_for_comparison($dst_node->connstr($restored_db),
+ 'dest_dump.' . $format, 0);
+
+ # Compare adjusted dumps, there should be no differences.
+ compare_dumps($src_dump, $dst_dump,
+ "dump outputs of original and restored regression database, using format '$format', match"
+ );
+ }
+}
-# Provide more context if the dumps do not match.
-if ($compare_res != 0)
+# Dump the database pointed to by the given connection string in plain format
+# and adjust the dump for database comparison.
+#
+# file_prefix is used to create unique names for all dump files, so that they
+# remain available for debugging in case the test fails.
+#
+# The name of the file containing the adjusted dump is returned.
+sub get_dump_for_comparison
{
- my ($stdout, $stderr) =
- run_command([ 'diff', '-u', $dump1_filtered, $dump2_filtered ]);
- print "=== diff of $dump1_filtered and $dump2_filtered\n";
- print "=== stdout ===\n";
- print $stdout;
- print "=== stderr ===\n";
- print $stderr;
- print "=== EOF ===\n";
+ my ($connstr, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+
+ # The order of columns in COPY statements dumped from the original database
+ # and that from the restored database differs. These differences are hard to
+ # adjust. Hence we compare only schema dumps for now.
+ command_ok(
+ [ 'pg_dump', '-s', '--no-sync', '-d', $connstr, '-f', $dumpfile ],
+ 'dump for comparison succeeded');
+
+ open(my $dh, '>', $dump_adjusted)
+ || die "opening $dump_adjusted ";
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
}
done_testing();
diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile
index c02f18454e3..91235204c7a 100644
--- a/src/test/perl/Makefile
+++ b/src/test/perl/Makefile
@@ -26,6 +26,7 @@ install: all installdirs
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Cluster.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/BackgroundPsql.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustUpgrade.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustDump.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Version.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
uninstall:
@@ -36,6 +37,7 @@ uninstall:
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
endif
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..cc813d9363d
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,118 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $adjust_child_columns);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and restore tests.
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+use Test::More;
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $adjust_child_columns)
+
+If we take a dump of the regression database left behind after running the
+regression tests, restore it, and take a dump of the restored database, the
+outputs of the two dumps differ. Some regression tests purposefully create
+child tables in such a way that their column orders differ from the column
+orders of their respective parents. In the restored database, however, their
+column orders are the same as those of their respective parents. Thus the
+column orders of these child tables in the original database and those in the
+restored database differ, causing a difference in the dump outputs. See
+MergeAttributes() and dumpTableSchema() for details.
+
+This routine rearranges the column declarations in these C<CREATE TABLE ... INHERITS>
+statements in the dump file from original database to match that from the
+restored database.
+
+Additionally, it adjusts blank lines and newlines to avoid noise.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<adjust_child_columns>: 1 indicates that the given dump file requires
+adjusting columns as described above, usually when the dump is from original
+database. 0 indicates no such adjustment is needed, usually when the dump is
+from restored database.
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $adjust_child_columns) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($adjust_child_columns)
+ {
+ my $saved_dump = $dump;
+
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger)/$1$3,$2/mgx;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+ $dump =~ s/(CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+
+ # The following test will fail if this adjustment is not required. We
+ # should then remove the adjustment code.
+ ok($saved_dump ne $dump, 'dump required adjustment');
+ }
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
diff --git a/src/test/perl/meson.build b/src/test/perl/meson.build
index fc9cf971ea3..3a98ac49daa 100644
--- a/src/test/perl/meson.build
+++ b/src/test/perl/meson.build
@@ -14,4 +14,5 @@ install_data(
'PostgreSQL/Test/Cluster.pm',
'PostgreSQL/Test/BackgroundPsql.pm',
'PostgreSQL/Test/AdjustUpgrade.pm',
+ 'PostgreSQL/Test/AdjustDump.pm',
install_dir: dir_pgxs / 'src/test/perl/PostgreSQL/Test')
--
2.34.1
On Thu, Nov 14, 2024 at 4:16 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
But see next
What's the advantage of testing all the formats? Would that stuff
have been able to catch up more issues related to specific format(s)
when it came to the compression improvements with inheritance?
I haven't caught any more issues with formats other than "plain". It
is more for future-proof testing. I am fine if we want to test just
plain dump format for now. Adding more formats would be easier if
required.
Not done for now. Given that the 'directory' format dumps the tables
in separate directories, and thus has some impact on how child tables
would be dumped and restored, I think we should at least have plain
and directory tested in this test. But I will wait for other opinion
before removing formats other than plain.
I gave this another thought. Looking at the documentation [1], each
format does something different that affects the way objects are
dumped and restored. Eliminating one or the other means we lose
corresponding coverage in dump or restore. So I have left this
untouched again.
Interestingly, I have caught a new difference in dump from original
and restored database. See the difference between attached plain dump
files. I will start a new thread to see if this difference is
legitimate. Had this test been part of core, we would have caught it
earlier.
Because of this difference, the test is failing. I will wait for the
conclusion on the other thread before adding more adjustments.
The new test uncovered an issue related to NOT NULL constraints [2].
We have committed a fix for that bug. So far this test has unearthed
two bugs in committed changes in just one year. That proves the worth
of this test. There are many projects, in flight, which implement new
objects or new states of existing objects. I think this test will help
in all those projects.
I have rebased my patches on the current HEAD. The test now passes and
does not show any new diff or bug.
Squashed all the patches into one. While rebasing I found that
002_compare_backups has changed the way it compares dumps slightly. I
have left it outside of this patch right now.
I am not against the other suggestions to make the functions, code
added by this patch more general and extensible. But without an
example or case for such generalization and/or extensibility, it's
hard to get it right. And the functions and code are isolated enough
that we could generalize and extend them if the need arises.
We can work on extending this further after the basic test is
committed. But if we delay committing the test for the extensibility
we might lose another bug.
[1]: https://www.postgresql.org/docs/current/app-pgdump.html
[2]: /messages/by-id/CAExHW5tbdgAKDfqjDJ-7Fk6PJtHg8D4zUF6FQ4H2Pq8zK38Nyw@mail.gmail.com
--
Best Wishes,
Ashutosh Bapat
Attachments:
0001-Test-pg_dump-restore-of-regression-objects-20241218.patchtext/x-patch; charset=US-ASCII; name=0001-Test-pg_dump-restore-of-regression-objects-20241218.patchDownload
From a70b42bb88e7c885a67913b67f630c2e2ea6faa5 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade of the regression database left
behind by regression run. Modify it to test dump and restore of the
regression database as well.
Regression database created by regression run contains almost all the
database objects supported by PostgreSQL in various states. Hence the
new testcase covers dump and restore scenarios not covered by individual
dump/restore cases. Many regression tests mention that they leave objects
behind for dump/restore testing. But till now 002_pg_upgrade only tested
dump/restore through pg_upgrade which is different from dump/restore
through pg_dump. Adding the new testcase closes that gap.
Testing dump and restore of regression database makes this test run
longer for a relatively smaller benefit. Hence run it only when
explicitly requested by user by specifying "regress_dump_test" in
PG_TEST_EXTRA.
Note for the reviewer:
The new test has uncovered two bugs so far in one year.
1. Introduced by 14e87ffa5c54. Fixed in fd41ba93e4630921a72ed5127cd0d552a8f3f8fc.
2. Introduced by 0413a556990ba628a3de8a0b58be020fd9a14ed0. Reverted in 74563f6b90216180fc13649725179fc119dddeb5.
Multiple tests compare pg_dump outputs taken from two clusters in plain
format as a way to compare the contents of those clusters. Add
PostgreSQL::Test::Utils::compare_dumps() to standardize and modularize
the comparison.
Author: Ashutosh Bapat
Reviewed by: Michael Paquier, Tom Lane
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
doc/src/sgml/regress.sgml | 11 ++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 145 +++++++++++++++++---
src/test/perl/Makefile | 2 +
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 122 ++++++++++++++++
src/test/perl/PostgreSQL/Test/Utils.pm | 48 +++++++
src/test/perl/meson.build | 1 +
src/test/recovery/t/027_stream_regress.pl | 14 +-
7 files changed, 315 insertions(+), 28 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index f4cef9e80f7..4be5d2d7d52 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -336,6 +336,17 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_test</literal></term>
+ <listitem>
+ <para>
+ When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
+ tests dump and restore of regression database left behind by the
+ regression run. Not enabled by default because it is time consuming.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 82a82a1841a..42b68527146 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -6,13 +6,13 @@ use warnings FATAL => 'all';
use Cwd qw(abs_path);
use File::Basename qw(dirname);
-use File::Compare;
-use File::Find qw(find);
-use File::Path qw(rmtree);
+use File::Find qw(find);
+use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -36,9 +36,9 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
-sub filter_dump
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
+sub filter_dump_for_upgrade
{
my ($is_old, $old_version, $dump_file) = @_;
my $dump_contents = slurp_file($dump_file);
@@ -262,6 +262,20 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate TAP test, but doing it here saves us one full
+ # regression run.
+ #
+ # This step takes several extra seconds. Do it only when requested so as to
+ # avoid spending those extra seconds in every check-world run.
+ #
+ # Do this while the old cluster is running before the upgrade.
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
+ {
+ test_regression_dump_restore($oldnode, %node_params);
+ }
}
# Initialize a new node for the upgrade.
@@ -511,24 +525,115 @@ push(@dump_command, '--extra-float-digits', '0')
$newnode->command_ok(\@dump_command, 'dump after running pg_upgrade');
# Filter the contents of the dumps.
-my $dump1_filtered = filter_dump(1, $oldnode->pg_version, $dump1_file);
-my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
+my $dump1_filtered =
+ filter_dump_for_upgrade(1, $oldnode->pg_version, $dump1_file);
+my $dump2_filtered =
+ filter_dump_for_upgrade(0, $oldnode->pg_version, $dump2_file);
# Compare the two dumps, there should be no differences.
-my $compare_res = compare($dump1_filtered, $dump2_filtered);
-is($compare_res, 0, 'old and new dumps match after pg_upgrade');
+compare_dumps($dump1_filtered, $dump2_filtered,
+ 'old and new dumps match after pg_upgrade');
+
+# Test dump and restore of objects left behind by the regression run.
+#
+# It is expected that regression tests, which create `regression` database, are
+# run on `src_node`, which in turn is left in running state. The dump is
+# restored on a fresh node created using given `node_params`. Plain dumps from
+# both the nodes are compared to make sure that all the dumped objects are
+# restored faithfully.
+sub test_regression_dump_restore
+{
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+
+ # Dump the original database for comparison later.
+ my $src_dump = get_dump_for_comparison($src_node->connstr('regression'),
+ 'src_dump', 1);
+
+ # Setup destination database
+ $dst_node->init(%node_params);
+ $dst_node->start;
+
+ for my $format ('plain', 'tar', 'directory', 'custom')
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $format_spec = substr($format, 0, 1);
+ my $restored_db = 'regression_' . $format;
+
+ # Even though we compare only schema from the original and the restored
+ # database (See get_dump_for_comparison() for details.), we dump and
+ # restore data as well to catch any errors while doing so.
+ command_ok(
+ [
+ 'pg_dump', "-F$format_spec", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
+
+ $dst_node->command_ok([ 'createdb', $restored_db ],
+ "created destination database '$restored_db'");
+
+ # Restore into destination database.
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [
+ 'psql', '-d', $dst_node->connstr($restored_db),
+ '-f', $dump_file
+ ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '-d',
+ $dst_node->connstr($restored_db), $dump_file
+ ];
+ }
+ command_ok(@restore_command,
+ "restore dump taken in $format format on destination instance");
+
+ # Dump restored database for comparison
+ my $dst_dump =
+ get_dump_for_comparison($dst_node->connstr($restored_db),
+ 'dest_dump.' . $format, 0);
+
+ compare_dumps($src_dump, $dst_dump,
+ "dump outputs of original and restored regression database, using $format format match"
+ );
+ }
+}
-# Provide more context if the dumps do not match.
-if ($compare_res != 0)
+# Dump database pointed by given connection string in plain format and adjust it
+# to compare dumps from original and restored database.
+#
+# file_prefix is used to create unique names for all dump files, so that they
+# remain available for debugging in case the test fails.
+#
+# The name of the file containing adjusted dump is returned.
+sub get_dump_for_comparison
{
- my ($stdout, $stderr) =
- run_command([ 'diff', '-u', $dump1_filtered, $dump2_filtered ]);
- print "=== diff of $dump1_filtered and $dump2_filtered\n";
- print "=== stdout ===\n";
- print $stdout;
- print "=== stderr ===\n";
- print $stderr;
- print "=== EOF ===\n";
+ my ($connstr, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+
+ # The order of columns in COPY statements dumped from the original database
+ # and that from the restored database differs. These differences are hard to
+ # adjust. Hence we compare only schema dumps for now.
+ command_ok(
+ [ 'pg_dump', '-s', '--no-sync', '-d', $connstr, '-f', $dumpfile ],
+ 'dump for comparison succeeded');
+
+ open(my $dh, '>', $dump_adjusted)
+ || die "opening $dump_adjusted ";
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
}
done_testing();
diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile
index c02f18454e3..91235204c7a 100644
--- a/src/test/perl/Makefile
+++ b/src/test/perl/Makefile
@@ -26,6 +26,7 @@ install: all installdirs
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Cluster.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/BackgroundPsql.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustUpgrade.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustDump.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Version.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
uninstall:
@@ -36,6 +37,7 @@ uninstall:
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
endif
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..0b0abb0cefc
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,122 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $original);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and retore tests
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+use Test::More;
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $original)
+
+If we take dump of the regression database left behind after running regression
+tests, restore the dump, and take dump of the restored regression database, the
+outputs of both the dumps differ. Some regression tests purposefully create
+some child tables in such a way that their column orders differ from column
+orders of their respective parents. In the restored database, however, their
+column orders are same as that of their respective parents. Thus the column
+orders of these child tables in the original database and those in the restored
+database differ, causing difference in the dump outputs. See MergeAttributes()
+and dumpTableSchema() for details.
+
+This routine rearranges the column declarations in these C<CREATE TABLE ... INHERITS>
+statements in the dump file from original database to match that from the
+restored database.
+
+Additionally it adjusts blank and new lines to avoid noise.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<adjust_child_columns>: 1 indicates that the given dump file requires
+adjusting columns in the child tables; usually when the dump is from original
+database. 0 indicates no such adjustment is needed; usually when the dump is
+from restored database.
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $adjust_child_columns) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($adjust_child_columns)
+ {
+ my $saved_dump = $dump;
+
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+
+ ok($saved_dump ne $dump, 'applied gtestxx_4 adjustments');
+
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+
+ ok($saved_dump ne $dump, 'applied test_type_diff2_c1 adjustments');
+
+ $dump =~ s/(CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+
+ ok($saved_dump ne $dump, 'applied test_type_diff2_c2 adjustments');
+ }
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 022b44ba22b..6efe5faf77d 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -50,6 +50,7 @@ use Cwd;
use Exporter 'import';
use Fcntl qw(:mode :seek);
use File::Basename;
+use File::Compare;
use File::Find;
use File::Spec;
use File::stat qw(stat);
@@ -89,6 +90,8 @@ our @EXPORT = qw(
command_fails_like
command_checks_all
+ compare_dumps
+
$windows_os
$is_msys2
$use_unix_sockets
@@ -1081,6 +1084,51 @@ sub command_checks_all
=pod
+=item compare_dumps(dump1, dump2, testname)
+
+Test that the given two files match. The files usually contain pg_dump output in
+"plain" format. Output the difference if any.
+
+=over
+
+=item C<dump1> and C<dump2>: Dump files to compare
+
+=item C<testname>: test name
+
+=back
+
+=cut
+
+sub compare_dumps
+{
+ my ($dump1, $dump2, $testname) = @_;
+
+ my $compare_res = compare($dump1, $dump2);
+ is($compare_res, 0, $testname);
+
+ # Provide more context
+ if ($compare_res != 0)
+ {
+ my ($stdout, $stderr) =
+ run_command([ 'diff', '-u', $dump1, $dump2 ]);
+ print "=== diff of $dump1 and $dump2\n";
+ print "=== stdout ===\n";
+ print $stdout;
+ print "=== stderr ===\n";
+ print $stderr;
+ print "=== EOF ===\n";
+ }
+ else
+ {
+ note('first dump file: ' . $dump1);
+ note('second dump file: ' . $dump2);
+ }
+
+ return;
+}
+
+=pod
+
=back
=cut
diff --git a/src/test/perl/meson.build b/src/test/perl/meson.build
index fc9cf971ea3..3a98ac49daa 100644
--- a/src/test/perl/meson.build
+++ b/src/test/perl/meson.build
@@ -14,4 +14,5 @@ install_data(
'PostgreSQL/Test/Cluster.pm',
'PostgreSQL/Test/BackgroundPsql.pm',
'PostgreSQL/Test/AdjustUpgrade.pm',
+ 'PostgreSQL/Test/AdjustDump.pm',
install_dir: dir_pgxs / 'src/test/perl/PostgreSQL/Test')
diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl
index d1ae32d97d6..b5ea1356751 100644
--- a/src/test/recovery/t/027_stream_regress.pl
+++ b/src/test/recovery/t/027_stream_regress.pl
@@ -116,8 +116,9 @@ command_ok(
'--no-sync', '-p', $node_standby_1->port
],
'dump standby server');
-command_ok(
- [ 'diff', $outputdir . '/primary.dump', $outputdir . '/standby.dump' ],
+compare_dumps(
+ $outputdir . '/primary.dump',
+ $outputdir . '/standby.dump',
'compare primary and standby dumps');
# Likewise for the catalogs of the regression database, after disabling
@@ -146,12 +147,9 @@ command_ok(
'regression'
],
'dump catalogs of standby server');
-command_ok(
- [
- 'diff',
- $outputdir . '/catalogs_primary.dump',
- $outputdir . '/catalogs_standby.dump'
- ],
+compare_dumps(
+ $outputdir . '/catalogs_primary.dump',
+ $outputdir . '/catalogs_standby.dump',
'compare primary and standby catalog dumps');
# Check some data from pg_stat_statements.
--
2.34.1
On 18 Dec 2024, at 12:28, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
In general I think it's fine to have such an expensive test gated behind a
PG_TEST_EXTRA flag, and since it's only run on demand we might as well run it
for all formats while at it. If this ran just once per week in the buildfarm
it would still allow us to catch things in time at fairly low overall cost.
I have rebased my patches on the current HEAD. The test now passes and
does not show any new diff or bug.
A few comments on this version of the patch:
+ regression run. Not enabled by default because it is time consuming.
Since this test consumes both time and to some degree diskspace (the dumpfiles)
I wonder if this should be "time and resource consuming".
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
Should this also test that $oldnode and $newnode have matching pg_version to
keep this from running in a cross-version upgrade test? While it can be argued
that running this in a cross-version upgrade is breaking it and getting to keep
both pieces, it's also not ideal to run a resource intensive test we know will
fail. (It can't be done at this exact callsite, just picked to illustrate.)
-sub filter_dump
+sub filter_dump_for_upgrade
What is the reason for the rename? filter_dump() is perhaps generic but it's
also local to the upgrade test so it's also not too unclear.
+ my $format_spec = substr($format, 0, 1);
This doesn't seem great for readability, how about storing the formats and
specfiers in an array of Perl hashes which can be iterated over with
descriptive names, like $format{'name'} and $format{'spec'}?
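For illustration only, the shape being suggested might look roughly like this; the variable names are just for the sketch, not from the patch:

```perl
use strict;
use warnings;

# Sketch: describe each dump format with a readable name and its pg_dump
# -F specifier, instead of deriving the specifier with substr().
my @formats = (
	{ name => 'plain',     spec => 'p' },
	{ name => 'tar',       spec => 't' },
	{ name => 'directory', spec => 'd' },
	{ name => 'custom',    spec => 'c' },
);

for my $format (@formats)
{
	# Each iteration would build the pg_dump command as the patch does,
	# e.g. ('pg_dump', "-F$format->{spec}", ..., '-f',
	# "regression_dump.$format->{name}").
	print "format $format->{name} uses -F$format->{spec}\n";
}
```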
+ || die "opening $dump_adjusted ";
Please include the errno in the message using ": $!" appended to the error
message, it could help in debugging.
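Concretely, the suggested error message would be along these lines (a sketch using a temporary path, not the patch's final wording):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Illustrative path; the patch uses a file under the test's tempdir.
my $dir = tempdir(CLEANUP => 1);
my $dump_adjusted = "$dir/dump_adjusted.sql";

# Including $! reports the OS-level reason (permissions, missing
# directory, ...) when the open fails.
open(my $dh, '>', $dump_adjusted)
  || die "could not open $dump_adjusted: $!";
print $dh "-- adjusted dump contents\n";
close($dh);
```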
+compare the results of dump and retore tests
s/retore/restore/
+ else
+ {
+ note('first dump file: ' . $dump1);
+ note('second dump file: ' . $dump2);
+ }
+
This doesn't seem particularly helpful; if the tests don't fail then printing
the names won't bring any needed information. What we could do here is to add
an is() test in compare_dumps() to ensure the filenames differ, to catch any
programmer error in passing in the same file twice.
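A minimal sketch of such a guard inside compare_dumps(), assuming Test::More is in scope as in the rest of the TAP suite:

```perl
use strict;
use warnings;
use Test::More;

sub compare_dumps_sketch
{
	my ($dump1, $dump2, $testname) = @_;

	# Catch programmer error: comparing a file with itself would
	# trivially "pass" without testing anything.
	isnt($dump1, $dump2, "$testname: dump files to compare differ");

	# ... the actual file comparison would follow here ...
	return;
}

compare_dumps_sketch('primary.dump', 'standby.dump', 'example');
done_testing();
```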
--
Daniel Gustafsson
On Wed, Dec 18, 2024 at 7:39 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 18 Dec 2024, at 12:28, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
In general I think it's fine to have such an expensive test gated behind a
PG_TEST_EXTRA flag, and since it's only run on demand we might as well run it
for all formats while at it. If this ran just once per week in the buildfarm
it would still allow us to catch things in time at fairly low overall cost.
I have rebased my patches on the current HEAD. The test now passes and
does not show any new diff or bug.
A few comments on this version of the patch:
+ regression run. Not enabled by default because it is time consuming.
Since this test consumes both time and to some degree diskspace (the dumpfiles)
I wonder if this should be "time and resource consuming".
Done.
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
Should this also test that $oldnode and $newnode have matching pg_version to
keep this from running in a cross-version upgrade test? While it can be argued
that running this in a cross-version upgrade is breaking it and getting to keep
both pieces, it's also not ideal to run a resource intensive test we know will
fail. (It can't be done at this exact callsite, just picked to illustrate.)
You already wrote it in the parenthetical remark. At this exact callsite
$oldnode and $newnode cannot be of different versions; in fact, $newnode is
yet to be created at this point. And $oldnode has the same version as the
server built from the current code, so in a cross-version upgrade this test
will not be executed. I am confused as to what this comment is about.
-sub filter_dump
+sub filter_dump_for_upgrade
What is the reason for the rename? filter_dump() is perhaps generic but it's
also local to the upgrade test so it's also not too unclear.
In one of the earlier versions of the patch, there was
filter_dump_for_regress or some such function which was used to filter
the dump from the regression database. Name was changed to
differentiate between the two functions. But the new function is now
named as adjust_regress_dumpfile() so this name change is not required
anymore. Reverting it. I have left the comment change since the test
file now has tests for both upgrade and dump/restore.
+ my $format_spec = substr($format, 0, 1);
This doesn't seem great for readability, how about storing the formats and
specfiers in an array of Perl hashes which can be iterated over with
descriptive names, like $format{'name'} and $format{'spec'}?
Instead of an array of hashes, I used a single hash with format
description as key and format spec as value. Hope that's acceptable.
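For illustration, that single-hash shape might look something like this (the hash name is assumed, not taken from the patch):

```perl
use strict;
use warnings;

# Sketch: one hash mapping format name to pg_dump's -F specifier.
my %file_formats = (
	plain     => 'p',
	tar       => 't',
	directory => 'd',
	custom    => 'c',
);

# Sort the keys so the test order is deterministic across runs.
foreach my $format (sort keys %file_formats)
{
	my $format_spec = $file_formats{$format};
	print "dumping in $format format with -F$format_spec\n";
}
```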
+ || die "opening $dump_adjusted ";
Please include the errno in the message using ": $!" appended to the error
message, it could help in debugging.
I didn't see this being used with other open calls in the file. For
that matter we are not using $! with open() in many test files. But it
seems useful. Done.
+compare the results of dump and retore tests
s/retore/restore/
Thanks for pointing out. Fixed.
+ else
+ {
+ note('first dump file: ' . $dump1);
+ note('second dump file: ' . $dump2);
+ }
This doesn't seem particularly helpful; if the tests don't fail then printing
the names won't bring any needed information. What we could do here is to add
an is() test in compare_dumps() to ensure the filenames differ, to catch any
programmer error in passing in the same file twice.
Good suggestion. Done.
0001 - same as 0001 from previous version
0002 - addresses above comments
--
Best Wishes,
Ashutosh Bapat
Attachments:
0001-Test-pg_dump-restore-of-regression-objects-20241220.patchtext/x-patch; charset=US-ASCII; name=0001-Test-pg_dump-restore-of-regression-objects-20241220.patchDownload
From 5ab6dd99438dbb1a77151f5faa0a4104aec5ce74 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH 1/2] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade of the regression database left
behind by regression run. Modify it to test dump and restore of the
regression database as well.
Regression database created by regression run contains almost all the
database objects supported by PostgreSQL in various states. Hence the
new testcase covers dump and restore scenarios not covered by individual
dump/restore cases. Many regression tests mention that they leave objects
behind for dump/restore testing. But till now 002_pg_upgrade only tested
dump/restore through pg_upgrade which is different from dump/restore
through pg_dump. Adding the new testcase closes that gap.
Testing dump and restore of regression database makes this test run
longer for a relatively smaller benefit. Hence run it only when
explicitly requested by user by specifying "regress_dump_test" in
PG_TEST_EXTRA.
Note for the reviewer:
The new test has uncovered two bugs so far in one year.
1. Introduced by 14e87ffa5c54. Fixed in fd41ba93e4630921a72ed5127cd0d552a8f3f8fc.
2. Introduced by 0413a556990ba628a3de8a0b58be020fd9a14ed0. Reverted in 74563f6b90216180fc13649725179fc119dddeb5.
Multiple tests compare pg_dump outputs taken from two clusters in plain
format as a way to compare the contents of those clusters. Add
PostgreSQL::Test::Utils::compare_dumps() to standardize and modularize
the comparison.
Author: Ashutosh Bapat
Reviewed by: Michael Paquier, Tom Lane
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
doc/src/sgml/regress.sgml | 11 ++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 145 +++++++++++++++++---
src/test/perl/Makefile | 2 +
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 122 ++++++++++++++++
src/test/perl/PostgreSQL/Test/Utils.pm | 48 +++++++
src/test/perl/meson.build | 1 +
src/test/recovery/t/027_stream_regress.pl | 14 +-
7 files changed, 315 insertions(+), 28 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index f4cef9e80f7..4be5d2d7d52 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -336,6 +336,17 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_test</literal></term>
+ <listitem>
+ <para>
+ When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
+ tests dump and restore of regression database left behind by the
+ regression run. Not enabled by default because it is time consuming.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 82a82a1841a..42b68527146 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -6,13 +6,13 @@ use warnings FATAL => 'all';
use Cwd qw(abs_path);
use File::Basename qw(dirname);
-use File::Compare;
-use File::Find qw(find);
-use File::Path qw(rmtree);
+use File::Find qw(find);
+use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -36,9 +36,9 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
-sub filter_dump
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
+sub filter_dump_for_upgrade
{
my ($is_old, $old_version, $dump_file) = @_;
my $dump_contents = slurp_file($dump_file);
@@ -262,6 +262,20 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate TAP test, but doing it here saves us one full
+ # regression run.
+ #
+ # This step takes several extra seconds. Do it only when requested so as to
+ # avoid spending those extra seconds in every check-world run.
+ #
+ # Do this while the old cluster is running before the upgrade.
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
+ {
+ test_regression_dump_restore($oldnode, %node_params);
+ }
}
# Initialize a new node for the upgrade.
@@ -511,24 +525,115 @@ push(@dump_command, '--extra-float-digits', '0')
$newnode->command_ok(\@dump_command, 'dump after running pg_upgrade');
# Filter the contents of the dumps.
-my $dump1_filtered = filter_dump(1, $oldnode->pg_version, $dump1_file);
-my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
+my $dump1_filtered =
+ filter_dump_for_upgrade(1, $oldnode->pg_version, $dump1_file);
+my $dump2_filtered =
+ filter_dump_for_upgrade(0, $oldnode->pg_version, $dump2_file);
# Compare the two dumps, there should be no differences.
-my $compare_res = compare($dump1_filtered, $dump2_filtered);
-is($compare_res, 0, 'old and new dumps match after pg_upgrade');
+compare_dumps($dump1_filtered, $dump2_filtered,
+ 'old and new dumps match after pg_upgrade');
+
+# Test dump and restore of objects left behind by the regression run.
+#
+# It is expected that regression tests, which create `regression` database, are
+# run on `src_node`, which in turn is left in running state. The dump is
+# restored on a fresh node created using given `node_params`. Plain dumps from
+# both the nodes are compared to make sure that all the dumped objects are
+# restored faithfully.
+sub test_regression_dump_restore
+{
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+
+ # Dump the original database for comparison later.
+ my $src_dump = get_dump_for_comparison($src_node->connstr('regression'),
+ 'src_dump', 1);
+
+ # Setup destination database
+ $dst_node->init(%node_params);
+ $dst_node->start;
+
+ for my $format ('plain', 'tar', 'directory', 'custom')
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $format_spec = substr($format, 0, 1);
+ my $restored_db = 'regression_' . $format;
+
+ # Even though we compare only schema from the original and the restored
+ # database (See get_dump_for_comparison() for details.), we dump and
+ # restore data as well to catch any errors while doing so.
+ command_ok(
+ [
+ 'pg_dump', "-F$format_spec", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
+
+ $dst_node->command_ok([ 'createdb', $restored_db ],
+ "created destination database '$restored_db'");
+
+ # Restore into destination database.
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [
+ 'psql', '-d', $dst_node->connstr($restored_db),
+ '-f', $dump_file
+ ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '-d',
+ $dst_node->connstr($restored_db), $dump_file
+ ];
+ }
+ command_ok(@restore_command,
+ "restore dump taken in $format format on destination instance");
+
+ # Dump restored database for comparison
+ my $dst_dump =
+ get_dump_for_comparison($dst_node->connstr($restored_db),
+ 'dest_dump.' . $format, 0);
+
+ compare_dumps($src_dump, $dst_dump,
+ "dump outputs of original and restored regression database, using $format format match"
+ );
+ }
+}
-# Provide more context if the dumps do not match.
-if ($compare_res != 0)
+# Dump database pointed by given connection string in plain format and adjust it
+# to compare dumps from original and restored database.
+#
+# file_prefix is used to create unique names for all dump files, so that they
+# remain available for debugging in case the test fails.
+#
+# The name of the file containing adjusted dump is returned.
+sub get_dump_for_comparison
{
- my ($stdout, $stderr) =
- run_command([ 'diff', '-u', $dump1_filtered, $dump2_filtered ]);
- print "=== diff of $dump1_filtered and $dump2_filtered\n";
- print "=== stdout ===\n";
- print $stdout;
- print "=== stderr ===\n";
- print $stderr;
- print "=== EOF ===\n";
+ my ($connstr, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+
+ # The order of columns in COPY statements dumped from the original database
+ # and that from the restored database differs. These differences are hard to
+ # adjust. Hence we compare only schema dumps for now.
+ command_ok(
+ [ 'pg_dump', '-s', '--no-sync', '-d', $connstr, '-f', $dumpfile ],
+ 'dump for comparison succeeded');
+
+ open(my $dh, '>', $dump_adjusted)
+ || die "opening $dump_adjusted ";
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
}
done_testing();
diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile
index c02f18454e3..91235204c7a 100644
--- a/src/test/perl/Makefile
+++ b/src/test/perl/Makefile
@@ -26,6 +26,7 @@ install: all installdirs
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Cluster.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/BackgroundPsql.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustUpgrade.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustDump.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Version.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
uninstall:
@@ -36,6 +37,7 @@ uninstall:
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
endif
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..0b0abb0cefc
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,122 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $original);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and retore tests
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+use Test::More;
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $original)
+
+If we take dump of the regression database left behind after running regression
+tests, restore the dump, and take dump of the restored regression database, the
+outputs of both the dumps differ. Some regression tests purposefully create
+some child tables in such a way that their column orders differ from column
+orders of their respective parents. In the restored database, however, their
+column orders are same as that of their respective parents. Thus the column
+orders of these child tables in the original database and those in the restored
+database differ, causing difference in the dump outputs. See MergeAttributes()
+and dumpTableSchema() for details.
+
+This routine rearranges the column declarations in these C<CREATE TABLE ... INHERITS>
+statements in the dump file from original database to match that from the
+restored database.
+
+Additionally it adjusts blank and new lines to avoid noise.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<adjust_child_columns>: 1 indicates that the given dump file requires
+adjusting columns in the child tables; usually when the dump is from original
+database. 0 indicates no such adjustment is needed; usually when the dump is
+from restored database.
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $adjust_child_columns) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($adjust_child_columns)
+ {
+ my $saved_dump = $dump;
+
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+
+ ok($saved_dump ne $dump, 'applied gtestxx_4 adjustments');
+
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+
+ ok($saved_dump ne $dump, 'applied test_type_diff2_c1 adjustments');
+
+ $dump =~ s/(CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+
+ ok($saved_dump ne $dump, 'applied test_type_diff2_c2 adjustments');
+ }
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 022b44ba22b..6efe5faf77d 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -50,6 +50,7 @@ use Cwd;
use Exporter 'import';
use Fcntl qw(:mode :seek);
use File::Basename;
+use File::Compare;
use File::Find;
use File::Spec;
use File::stat qw(stat);
@@ -89,6 +90,8 @@ our @EXPORT = qw(
command_fails_like
command_checks_all
+ compare_dumps
+
$windows_os
$is_msys2
$use_unix_sockets
@@ -1081,6 +1084,51 @@ sub command_checks_all
=pod
+=item compare_dumps(dump1, dump2, testname)
+
+Test that the given two files match. The files usually contain pg_dump output in
+"plain" format. Output the difference if any.
+
+=over
+
+=item C<dump1> and C<dump2>: Dump files to compare
+
+=item C<testname>: test name
+
+=back
+
+=cut
+
+sub compare_dumps
+{
+ my ($dump1, $dump2, $testname) = @_;
+
+ my $compare_res = compare($dump1, $dump2);
+ is($compare_res, 0, $testname);
+
+ # Provide more context
+ if ($compare_res != 0)
+ {
+ my ($stdout, $stderr) =
+ run_command([ 'diff', '-u', $dump1, $dump2 ]);
+ print "=== diff of $dump1 and $dump2\n";
+ print "=== stdout ===\n";
+ print $stdout;
+ print "=== stderr ===\n";
+ print $stderr;
+ print "=== EOF ===\n";
+ }
+ else
+ {
+ note('first dump file: ' . $dump1);
+ note('second dump file: ' . $dump2);
+ }
+
+ return;
+}
+
+=pod
+
=back
=cut
diff --git a/src/test/perl/meson.build b/src/test/perl/meson.build
index fc9cf971ea3..3a98ac49daa 100644
--- a/src/test/perl/meson.build
+++ b/src/test/perl/meson.build
@@ -14,4 +14,5 @@ install_data(
'PostgreSQL/Test/Cluster.pm',
'PostgreSQL/Test/BackgroundPsql.pm',
'PostgreSQL/Test/AdjustUpgrade.pm',
+ 'PostgreSQL/Test/AdjustDump.pm',
install_dir: dir_pgxs / 'src/test/perl/PostgreSQL/Test')
diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl
index d1ae32d97d6..b5ea1356751 100644
--- a/src/test/recovery/t/027_stream_regress.pl
+++ b/src/test/recovery/t/027_stream_regress.pl
@@ -116,8 +116,9 @@ command_ok(
'--no-sync', '-p', $node_standby_1->port
],
'dump standby server');
-command_ok(
- [ 'diff', $outputdir . '/primary.dump', $outputdir . '/standby.dump' ],
+compare_dumps(
+ $outputdir . '/primary.dump',
+ $outputdir . '/standby.dump',
'compare primary and standby dumps');
# Likewise for the catalogs of the regression database, after disabling
@@ -146,12 +147,9 @@ command_ok(
'regression'
],
'dump catalogs of standby server');
-command_ok(
- [
- 'diff',
- $outputdir . '/catalogs_primary.dump',
- $outputdir . '/catalogs_standby.dump'
- ],
+compare_dumps(
+ $outputdir . '/catalogs_primary.dump',
+ $outputdir . '/catalogs_standby.dump',
'compare primary and standby catalog dumps');
# Check some data from pg_stat_statements.
--
2.34.1
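The column-order adjustment that AdjustDump.pm performs above can be made concrete with a standalone sketch. This is an illustrative Python rendition of the gtestxx_4 substitution (the table name and column layout come from the patch; everything else is scaffolding for the example), showing how moving one captured column declaration ahead of another makes a pre-restore dump compare equal to a post-restore dump:

```python
import re

# A dump of the original database declares gtestxx_4's columns in the order
# the DDL created them; after restore they follow the parent's order. The
# substitution moves column "a" ahead of column "b" so both dumps match.
dump = (
    "CREATE TABLE generated_stored_tests.gtestxx_4 (\n"
    "    b integer,\n"
    "    a integer NOT NULL\n"
    ");"
)

adjusted = re.sub(
    r"(CREATE TABLE generated_stored_tests\.gtestxx_4 \()"
    r"(\n\s+b integer),"
    r"(\n\s+a integer NOT NULL)",
    r"\1\3,\2",
    dump,
)
print(adjusted)
```

The Perl version in the patch does the same thing with the /mgx modifiers so the pattern can be written across multiple lines and anchored per statement.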
0002-Address-comments-by-Daniel-Gustafsson-20241220.patch (text/x-patch)
From 74f9a88c6f7ddfe26019dbd50f98c2789029ad9f Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Fri, 20 Dec 2024 15:22:14 +0530
Subject: [PATCH 2/2] Address comments by Daniel Gustafsson
To be merged with the earlier commit.
---
doc/src/sgml/regress.sgml | 3 ++-
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 14 ++++++--------
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 2 +-
src/test/perl/PostgreSQL/Test/Utils.pm | 7 +++++--
4 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 4be5d2d7d52..60da8eb95e5 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -343,7 +343,8 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
<para>
When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
tests dump and restore of regression database left behind by the
- regression run. Not enabled by default because it is time consuming.
+ regression run. Not enabled by default because it is time and resource
+ consuming.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 42b68527146..a817ed0d00b 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -38,7 +38,7 @@ sub generate_db
# Filter the contents of a dump before its use in a content comparison for
# upgrade testing. This returns the path to the filtered dump.
-sub filter_dump_for_upgrade
+sub filter_dump
{
my ($is_old, $old_version, $dump_file) = @_;
my $dump_contents = slurp_file($dump_file);
@@ -525,10 +525,8 @@ push(@dump_command, '--extra-float-digits', '0')
$newnode->command_ok(\@dump_command, 'dump after running pg_upgrade');
# Filter the contents of the dumps.
-my $dump1_filtered =
- filter_dump_for_upgrade(1, $oldnode->pg_version, $dump1_file);
-my $dump2_filtered =
- filter_dump_for_upgrade(0, $oldnode->pg_version, $dump2_file);
+my $dump1_filtered = filter_dump(1, $oldnode->pg_version, $dump1_file);
+my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
# Compare the two dumps, there should be no differences.
compare_dumps($dump1_filtered, $dump2_filtered,
@@ -545,6 +543,7 @@ sub test_regression_dump_restore
{
my ($src_node, %node_params) = @_;
my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+ my %dump_formats = ('plain' => 'p', 'tar' => 't', 'directory' => 'd', 'custom' => 'c');
# Dump the original database for comparison later.
my $src_dump = get_dump_for_comparison($src_node->connstr('regression'),
@@ -554,10 +553,9 @@ sub test_regression_dump_restore
$dst_node->init(%node_params);
$dst_node->start;
- for my $format ('plain', 'tar', 'directory', 'custom')
+ while (my ($format, $format_spec) = each %dump_formats)
{
my $dump_file = "$tempdir/regression_dump.$format";
- my $format_spec = substr($format, 0, 1);
my $restored_db = 'regression_' . $format;
# Even though we compare only schema from the original and the restored
@@ -628,7 +626,7 @@ sub get_dump_for_comparison
'dump for comparison succeeded');
open(my $dh, '>', $dump_adjusted)
- || die "opening $dump_adjusted ";
+ || die "could not open $dump_adjusted for writing the adjusted dump: $!";
print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
$adjust_child_columns);
close($dh);
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
index 0b0abb0cefc..5b9990e4719 100644
--- a/src/test/perl/PostgreSQL/Test/AdjustDump.pm
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -18,7 +18,7 @@ PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
=head1 DESCRIPTION
C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
-compare the results of dump and retore tests
+compare the results of dump and restore tests
=cut
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 6efe5faf77d..bf56eb4b23c 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -1120,8 +1120,11 @@ sub compare_dumps
}
else
{
- note('first dump file: ' . $dump1);
- note('second dump file: ' . $dump2);
+ # Fail if the comparison succeeds because the files are the same. This
+ # will detect simple programming errors. It won't detect more complex
+ # errors like passing different links pointing to the same underlying
+ # file.
+ ok($dump1 ne $dump2, "dump files being compared are distinct")
}
return;
--
2.34.1
On 20 Dec 2024, at 11:01, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Wed, Dec 18, 2024 at 7:39 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 18 Dec 2024, at 12:28, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
Should this also test that $oldnode and $newnode have matching pg_version to keep this from running in a cross-version upgrade test? While it can be argued that running this in a cross-version upgrade is breaking it and getting to keep both pieces, it's also not ideal to run a resource intensive test we know will fail. (It can't be done at this exact callsite, just picked to illustrate.)
You already wrote it in parenthesis. At the exact callsite $oldnode
and $newnode can not be of different versions. In fact newnode is yet
to be created at this point. But $oldnode has the same version as the
server run from the code. In a cross-version upgrade this test will
not be executed. I am confused as to what this comment is about.
Sure, it can't be checked until $newnode is created, but it seems like a cheap
test to ensure it's not executed as part of someone's cross-version tests.
+ my $format_spec = substr($format, 0, 1);
This doesn't seem great for readability, how about storing the formats and
specifiers in an array of Perl hashes which can be iterated over with
descriptive names, like $format{'name'} and $format{'spec'}?
Instead of an array of hashes, I used a single hash with format
description as key and format spec as value. Hope that's acceptable.
LGTM.
--
Daniel Gustafsson
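As an aside for readers following the thread, the single-hash approach settled on above can be sketched language-neutrally. This is an illustrative Python rendition of the %dump_formats idea, not code from the patch: descriptive format names mapped to the single-letter specifiers that pg_dump's -F option expects, replacing the less readable substr($format, 0, 1).

```python
# Sketch of the %dump_formats hash from the patch: descriptive format names
# mapped to pg_dump's single-letter -F specifiers. Illustration only.
dump_formats = {"plain": "p", "tar": "t", "directory": "d", "custom": "c"}

for fmt, spec in sorted(dump_formats.items()):
    # In the test, each iteration drives one pg_dump/restore round trip.
    print(f"pg_dump -F{spec} -f regression_dump.{fmt}")
```

The point of the hash is that both the human-readable name and the specifier are spelled out once, side by side, rather than derived from each other.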
On Fri, Dec 27, 2024 at 6:17 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 20 Dec 2024, at 11:01, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Wed, Dec 18, 2024 at 7:39 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 18 Dec 2024, at 12:28, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
Should this also test that $oldnode and $newnode have matching pg_version to keep this from running in a cross-version upgrade test? While it can be argued that running this in a cross-version upgrade is breaking it and getting to keep both pieces, it's also not ideal to run a resource intensive test we know will fail. (It can't be done at this exact callsite, just picked to illustrate.)
You already wrote it in parenthesis. At the exact callsite $oldnode
and $newnode can not be of different versions. In fact newnode is yet
to be created at this point. But $oldnode has the same version as the
server run from the code. In a cross-version upgrade this test will
not be executed. I am confused as to what this comment is about.
Sure, it can't be checked until $newnode is created, but it seems like a cheap
test to ensure it's not executed as part of someone's cross-version tests.
Hmm. The new node is always the node created with the version of code.
It's the old node which may have a different version. Hence I added
code to compare the versions of source node (which is the oldnode) and
destination node (which is created the same way as the new node and
hence has the same version as the new node) in
test_regression_dump_restore() itself. Additionally the code makes
sure that the oldnode doesn't use a custom install path. This is 0002
patch. 0001 in this patchset is 0001 + 0002 in the earlier patch set.
--
Best Wishes,
Ashutosh Bapat
Attachments:
0001-Test-pg_dump-restore-of-regression-objects-20241231.patch (text/x-patch)
From 23e97c45827dec4a0bc8544b20b65d4a731b0b5f Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH 1/2] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade of the regression database left
behind by regression run. Modify it to test dump and restore of the
regression database as well.
The regression database created by the regression run contains almost all
the database objects supported by PostgreSQL in various states. Hence the
new test case covers dump and restore scenarios not covered by individual
dump/restore cases. Many regression tests mention that they leave objects
behind for dump/restore testing. But until now 002_pg_upgrade only tested
dump/restore through pg_upgrade, which is different from dump/restore
through pg_dump. Adding the new test case closes that gap.
Testing dump and restore of the regression database makes this test run
longer for a relatively small benefit. Hence run it only when the user
explicitly requests it by specifying "regress_dump_test" in
PG_TEST_EXTRA.
Multiple tests compare pg_dump outputs taken from two clusters in plain
format as a way to compare the contents of those clusters. Add
PostgreSQL::Test::Utils::compare_dumps() to standardize and modularize
the comparison.
Note for the reviewers:
The new test has uncovered two bugs so far in one year.
1. Introduced by 14e87ffa5c54. Fixed in fd41ba93e4630921a72ed5127cd0d552a8f3f8fc.
2. Introduced by 0413a556990ba628a3de8a0b58be020fd9a14ed0. Reverted in 74563f6b90216180fc13649725179fc119dddeb5.
Author: Ashutosh Bapat
Reviewed by: Michael Paquier, Daniel Gustafsson, Tom Lane
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
doc/src/sgml/regress.sgml | 12 ++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 137 +++++++++++++++++---
src/test/perl/Makefile | 2 +
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 122 +++++++++++++++++
src/test/perl/PostgreSQL/Test/Utils.pm | 51 ++++++++
src/test/perl/meson.build | 1 +
src/test/recovery/t/027_stream_regress.pl | 14 +-
7 files changed, 314 insertions(+), 25 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index f4cef9e80f7..60da8eb95e5 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -336,6 +336,18 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_test</literal></term>
+ <listitem>
+ <para>
+ When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
+ tests dump and restore of regression database left behind by the
+ regression run. Not enabled by default because it is time and resource
+ consuming.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 82a82a1841a..a817ed0d00b 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -6,13 +6,13 @@ use warnings FATAL => 'all';
use Cwd qw(abs_path);
use File::Basename qw(dirname);
-use File::Compare;
-use File::Find qw(find);
-use File::Path qw(rmtree);
+use File::Find qw(find);
+use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -36,8 +36,8 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
sub filter_dump
{
my ($is_old, $old_version, $dump_file) = @_;
@@ -262,6 +262,20 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate TAP test, but doing it here saves us one full
+ # regression run.
+ #
+ # This step takes several extra seconds. Do it only when requested so as to
+ # avoid spending those extra seconds in every check-world run.
+ #
+ # Do this while the old cluster is running before the upgrade.
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
+ {
+ test_regression_dump_restore($oldnode, %node_params);
+ }
}
# Initialize a new node for the upgrade.
@@ -515,20 +529,109 @@ my $dump1_filtered = filter_dump(1, $oldnode->pg_version, $dump1_file);
my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
# Compare the two dumps, there should be no differences.
-my $compare_res = compare($dump1_filtered, $dump2_filtered);
-is($compare_res, 0, 'old and new dumps match after pg_upgrade');
+compare_dumps($dump1_filtered, $dump2_filtered,
+ 'old and new dumps match after pg_upgrade');
+
+# Test dump and restore of objects left behind by the regression run.
+#
+# It is expected that regression tests, which create `regression` database, are
+# run on `src_node`, which in turn is left in running state. The dump is
+# restored on a fresh node created using given `node_params`. Plain dumps from
+# both the nodes are compared to make sure that all the dumped objects are
+# restored faithfully.
+sub test_regression_dump_restore
+{
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+ my %dump_formats = ('plain' => 'p', 'tar' => 't', 'directory' => 'd', 'custom' => 'c');
+
+ # Dump the original database for comparison later.
+ my $src_dump = get_dump_for_comparison($src_node->connstr('regression'),
+ 'src_dump', 1);
+
+ # Setup destination database
+ $dst_node->init(%node_params);
+ $dst_node->start;
+
+ while (my ($format, $format_spec) = each %dump_formats)
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $restored_db = 'regression_' . $format;
+
+ # Even though we compare only schema from the original and the restored
+ # database (See get_dump_for_comparison() for details.), we dump and
+ # restore data as well to catch any errors while doing so.
+ command_ok(
+ [
+ 'pg_dump', "-F$format_spec", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
-# Provide more context if the dumps do not match.
-if ($compare_res != 0)
+ $dst_node->command_ok([ 'createdb', $restored_db ],
+ "created destination database '$restored_db'");
+
+ # Restore into destination database.
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [
+ 'psql', '-d', $dst_node->connstr($restored_db),
+ '-f', $dump_file
+ ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '-d',
+ $dst_node->connstr($restored_db), $dump_file
+ ];
+ }
+ command_ok(@restore_command,
+ "restore dump taken in $format format on destination instance");
+
+ # Dump restored database for comparison
+ my $dst_dump =
+ get_dump_for_comparison($dst_node->connstr($restored_db),
+ 'dest_dump.' . $format, 0);
+
+ compare_dumps($src_dump, $dst_dump,
+ "dump outputs of original and restored regression database, using $format format match"
+ );
+ }
+}
+
+# Dump database pointed by given connection string in plain format and adjust it
+# to compare dumps from original and restored database.
+#
+# file_prefix is used to create unique names for all dump files, so that they
+# remain available for debugging in case the test fails.
+#
+# The name of the file containing the adjusted dump is returned.
+sub get_dump_for_comparison
{
- my ($stdout, $stderr) =
- run_command([ 'diff', '-u', $dump1_filtered, $dump2_filtered ]);
- print "=== diff of $dump1_filtered and $dump2_filtered\n";
- print "=== stdout ===\n";
- print $stdout;
- print "=== stderr ===\n";
- print $stderr;
- print "=== EOF ===\n";
+ my ($connstr, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+
+ # The order of columns in COPY statements dumped from the original database
+ # and that from the restored database differs. These differences are hard to
+ # adjust. Hence we compare only schema dumps for now.
+ command_ok(
+ [ 'pg_dump', '-s', '--no-sync', '-d', $connstr, '-f', $dumpfile ],
+ 'dump for comparison succeeded');
+
+ open(my $dh, '>', $dump_adjusted)
+ || die "could not open $dump_adjusted for writing the adjusted dump: $!";
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
}
done_testing();
diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile
index c02f18454e3..91235204c7a 100644
--- a/src/test/perl/Makefile
+++ b/src/test/perl/Makefile
@@ -26,6 +26,7 @@ install: all installdirs
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Cluster.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/BackgroundPsql.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustUpgrade.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustDump.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Version.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
uninstall:
@@ -36,6 +37,7 @@ uninstall:
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
endif
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..5b9990e4719
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,122 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $original);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and restore tests
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+use Test::More;
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $original)
+
+If we take dump of the regression database left behind after running regression
+tests, restore the dump, and take dump of the restored regression database, the
+outputs of both the dumps differ. Some regression tests purposefully create
+some child tables in such a way that their column orders differ from column
+orders of their respective parents. In the restored database, however, their
+column orders are same as that of their respective parents. Thus the column
+orders of these child tables in the original database and those in the restored
+database differ, causing difference in the dump outputs. See MergeAttributes()
+and dumpTableSchema() for details.
+
+This routine rearranges the column declarations in these C<CREATE TABLE ... INHERITS>
+statements in the dump file from original database to match that from the
+restored database.
+
+Additionally it adjusts blank and new lines to avoid noise.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<adjust_child_columns>: 1 indicates that the given dump file requires
+adjusting columns in the child tables; usually when the dump is from original
+database. 0 indicates no such adjustment is needed; usually when the dump is
+from restored database.
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $adjust_child_columns) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($adjust_child_columns)
+ {
+ my $saved_dump = $dump;
+
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+
+ ok($saved_dump ne $dump, 'applied gtestxx_4 adjustments');
+
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+
+ ok($saved_dump ne $dump, 'applied test_type_diff2_c1 adjustments');
+
+ $dump =~ s/(CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+
+ ok($saved_dump ne $dump, 'applied test_type_diff2_c2 adjustments');
+ }
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 022b44ba22b..bf56eb4b23c 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -50,6 +50,7 @@ use Cwd;
use Exporter 'import';
use Fcntl qw(:mode :seek);
use File::Basename;
+use File::Compare;
use File::Find;
use File::Spec;
use File::stat qw(stat);
@@ -89,6 +90,8 @@ our @EXPORT = qw(
command_fails_like
command_checks_all
+ compare_dumps
+
$windows_os
$is_msys2
$use_unix_sockets
@@ -1081,6 +1084,54 @@ sub command_checks_all
=pod
+=item compare_dumps(dump1, dump2, testname)
+
+Test that the given two files match. The files usually contain pg_dump output in
+"plain" format. Output the difference if any.
+
+=over
+
+=item C<dump1> and C<dump2>: Dump files to compare
+
+=item C<testname>: test name
+
+=back
+
+=cut
+
+sub compare_dumps
+{
+ my ($dump1, $dump2, $testname) = @_;
+
+ my $compare_res = compare($dump1, $dump2);
+ is($compare_res, 0, $testname);
+
+ # Provide more context
+ if ($compare_res != 0)
+ {
+ my ($stdout, $stderr) =
+ run_command([ 'diff', '-u', $dump1, $dump2 ]);
+ print "=== diff of $dump1 and $dump2\n";
+ print "=== stdout ===\n";
+ print $stdout;
+ print "=== stderr ===\n";
+ print $stderr;
+ print "=== EOF ===\n";
+ }
+ else
+ {
+ # Fail if the comparison succeeds because the files are the same. This
+ # will detect simple programming errors. It won't detect more complex
+ # errors like passing different links pointing to the same underlying
+ # file.
+ ok($dump1 ne $dump2, "dump files being compared are distinct")
+ }
+
+ return;
+}
+
+=pod
+
=back
=cut
diff --git a/src/test/perl/meson.build b/src/test/perl/meson.build
index fc9cf971ea3..3a98ac49daa 100644
--- a/src/test/perl/meson.build
+++ b/src/test/perl/meson.build
@@ -14,4 +14,5 @@ install_data(
'PostgreSQL/Test/Cluster.pm',
'PostgreSQL/Test/BackgroundPsql.pm',
'PostgreSQL/Test/AdjustUpgrade.pm',
+ 'PostgreSQL/Test/AdjustDump.pm',
install_dir: dir_pgxs / 'src/test/perl/PostgreSQL/Test')
diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl
index d1ae32d97d6..b5ea1356751 100644
--- a/src/test/recovery/t/027_stream_regress.pl
+++ b/src/test/recovery/t/027_stream_regress.pl
@@ -116,8 +116,9 @@ command_ok(
'--no-sync', '-p', $node_standby_1->port
],
'dump standby server');
-command_ok(
- [ 'diff', $outputdir . '/primary.dump', $outputdir . '/standby.dump' ],
+compare_dumps(
+ $outputdir . '/primary.dump',
+ $outputdir . '/standby.dump',
'compare primary and standby dumps');
# Likewise for the catalogs of the regression database, after disabling
@@ -146,12 +147,9 @@ command_ok(
'regression'
],
'dump catalogs of standby server');
-command_ok(
- [
- 'diff',
- $outputdir . '/catalogs_primary.dump',
- $outputdir . '/catalogs_standby.dump'
- ],
+compare_dumps(
+ $outputdir . '/catalogs_primary.dump',
+ $outputdir . '/catalogs_standby.dump',
'compare primary and standby catalog dumps');
# Check some data from pg_stat_statements.
--
2.34.1
0002-Don-t-run-the-test-when-testing-cross-versi-20241231.patch (text/x-patch)
From 9235027bcfca33f781e8de3c4f680903ca46d9df Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Tue, 31 Dec 2024 16:58:52 +0530
Subject: [PATCH 2/2] Don't run the test when testing cross-version setup
... per suggestion by Daniel Gustafsson. To be squashed into the first commit
before committing to head.
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index a817ed0d00b..2ba325f0598 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -545,6 +545,17 @@ sub test_regression_dump_restore
my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
my %dump_formats = ('plain' => 'p', 'tar' => 't', 'directory' => 'd', 'custom' => 'c');
+ # Make sure that the source and destination nodes have the same version and
+ # do not use custom install paths. In either case the dump and restore test
+ # would likely fail, wasting time and resources. Don't run the test in such
+ # a case.
+ if ($src_node->pg_version != $dst_node->pg_version or
+ defined $src_node->{_install_path})
+ {
+ fail("same version dump and restore test using default installation");
+ return;
+ }
+
# Dump the original database for comparison later.
my $src_dump = get_dump_for_comparison($src_node->connstr('regression'),
'src_dump', 1);
--
2.34.1
On Tue, Dec 31, 2024 at 5:24 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Fri, Dec 27, 2024 at 6:17 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 20 Dec 2024, at 11:01, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Wed, Dec 18, 2024 at 7:39 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 18 Dec 2024, at 12:28, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:

+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)

Should this also test that $oldnode and $newnode have matching pg_version to keep this from running in a cross-version upgrade test? While it can be argued that running this in a cross-version upgrade is breaking it and getting to keep both pieces, it's also not ideal to run a resource-intensive test we know will fail. (It can't be done at this exact callsite, just picked to illustrate.)

You already wrote it in parenthesis. At the exact callsite $oldnode
and $newnode cannot be of different versions. In fact, newnode is yet
to be created at this point. But $oldnode has the same version as the
server run from the code. In a cross-version upgrade this test will
not be executed. I am confused as to what this comment is about.

Sure, it can't be checked until $newnode is created, but it seems like a cheap
test to ensure it's not executed as part of someone's cross-version tests.

Hmm. The new node is always the node created with the version of the code.
It's the old node which may have a different version. Hence I added
code to compare the versions of source node (which is the oldnode) and
destination node (which is created the same way as the new node and
hence has the same version as the new node) in
test_regression_dump_restore() itself. Additionally the code makes
sure that the oldnode doesn't use a custom install path. This is 0002
patch. 0001 in this patchset is 0001 + 0002 in the earlier patch set.
Here's a rebased patch with some cosmetic fixes, typos and grammar
fixes after a self review. I have squashed all the patches into a
single patch now.
--
Best Wishes,
Ashutosh Bapat
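For reference, the opt-in gate discussed above can be exercised on its own. This is only a sketch: the sample PG_TEST_EXTRA value is illustrative, and the authoritative check is the Perl regex /\bregress_dump_test\b/ in 002_pg_upgrade.pl.

```shell
# Sketch of the PG_TEST_EXTRA opt-in gate from the patch; the value below is
# only an example of what a user might export before running the tests.
PG_TEST_EXTRA="kerberos regress_dump_test"

# grep -w approximates Perl's \b word-boundary match on the keyword.
if echo "$PG_TEST_EXTRA" | grep -qw regress_dump_test; then
    echo "regress_dump_test enabled"
else
    echo "regress_dump_test skipped"
fi
```

With the patch applied, the real test is enabled the same way as other opt-in tests, e.g. by exporting PG_TEST_EXTRA=regress_dump_test before running the pg_upgrade TAP suite.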
Attachments:
0001-Test-pg_dump-restore-of-regression-objects-20250115.patch (text/x-patch)
From 2778fd8aa11ddfdf7df683803759fadcea40439f Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade of the regression database left
behind by regression run. Modify it to test dump and restore of the
regression database as well.
The regression database created by a regression run contains almost all
the database objects supported by PostgreSQL, in various states. Hence
the new test case covers dump and restore scenarios not covered by
individual dump/restore cases. Till now, 002_pg_upgrade only tested
dump/restore through pg_upgrade, which uses only binary mode. Many
regression tests mention that they leave objects behind for dump/restore
testing, but they were not tested in non-binary mode. The new test case
closes that gap.
Testing dump and restore of the regression database makes this test run
longer for a relatively small benefit. Hence run it only when explicitly
requested by the user by specifying "regress_dump_test" in
PG_TEST_EXTRA.
Multiple tests compare pg_dump outputs taken from two clusters in plain
format as a way to compare the contents of those clusters. Add
PostgreSQL::Test::Utils::compare_dumps() to standardize and modularize
the comparison.
Note for reviewers:
The new test has uncovered two bugs so far in one year.
1. Introduced by 14e87ffa5c54. Fixed in fd41ba93e4630921a72ed5127cd0d552a8f3f8fc.
2. Introduced by 0413a556990ba628a3de8a0b58be020fd9a14ed0. Reverted in 74563f6b90216180fc13649725179fc119dddeb5.
Author: Ashutosh Bapat
Reviewed by: Michael Paquier, Daniel Gustafsson, Tom Lane
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
doc/src/sgml/regress.sgml | 12 ++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 152 +++++++++++++++++---
src/test/perl/Makefile | 2 +
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 125 ++++++++++++++++
src/test/perl/PostgreSQL/Test/Utils.pm | 51 +++++++
src/test/perl/meson.build | 1 +
src/test/recovery/t/027_stream_regress.pl | 14 +-
7 files changed, 332 insertions(+), 25 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index f4cef9e80f7..60da8eb95e5 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -336,6 +336,18 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_test</literal></term>
+ <listitem>
+ <para>
+ When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
+ tests dump and restore of the regression database left behind by the
+ regression run. Not enabled by default because it is time- and
+ resource-consuming.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index e49bff6454a..7c5563d0c9e 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -6,13 +6,13 @@ use warnings FATAL => 'all';
use Cwd qw(abs_path);
use File::Basename qw(dirname);
-use File::Compare;
-use File::Find qw(find);
-use File::Path qw(rmtree);
+use File::Find qw(find);
+use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -36,8 +36,8 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
sub filter_dump
{
my ($is_old, $old_version, $dump_file) = @_;
@@ -262,6 +262,21 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate TAP test, but doing it here saves us one full
+ # regression run.
+ #
+ # This step takes several extra seconds and some extra disk space, so
+ # requires an opt-in with the PG_TEST_EXTRA environment variable.
+ #
+ # Do this while the old cluster is running before it is shut down by the
+ # upgrade test.
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
+ {
+ test_regression_dump_restore($oldnode, %node_params);
+ }
}
# Initialize a new node for the upgrade.
@@ -515,20 +530,123 @@ my $dump1_filtered = filter_dump(1, $oldnode->pg_version, $dump1_file);
my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
# Compare the two dumps, there should be no differences.
-my $compare_res = compare($dump1_filtered, $dump2_filtered);
-is($compare_res, 0, 'old and new dumps match after pg_upgrade');
+compare_dumps($dump1_filtered, $dump2_filtered,
+ 'old and new dumps match after pg_upgrade');
+
+# Test dump and restore of objects left behind by the regression run.
+#
+# It is expected that regression tests, which create `regression` database, are
+# run on `src_node`, which in turn, is left in running state. The dump from
+# `src_node` is restored on a fresh node created using given `node_params`.
+# Plain dumps from both the nodes are compared to make sure that all the dumped
+# objects are restored faithfully.
+sub test_regression_dump_restore
+{
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+ my %dump_formats = ('plain' => 'p', 'tar' => 't', 'directory' => 'd', 'custom' => 'c');
+
+ # Make sure that the source and destination nodes have the same version and
+ # do not use custom install paths. In either case, the dump files may
+ # require additional adjustments unknown to the code here. Do not run this
+ # test in such a case to avoid wasting time and resources.
+ if ($src_node->pg_version != $dst_node->pg_version or
+ defined $src_node->{_install_path})
+ {
+ fail("same version dump and restore test using default installation");
+ return;
+ }
+
+ # Dump the original database for comparison later.
+ my $src_dump = get_dump_for_comparison($src_node->connstr('regression'),
+ 'src_dump', 1);
+
+ # Setup destination database
+ $dst_node->init(%node_params);
+ $dst_node->start;
-# Provide more context if the dumps do not match.
-if ($compare_res != 0)
+ while (my ($format, $format_spec) = each %dump_formats)
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $restored_db = 'regression_' . $format;
+
+ # Even though we compare only the schemas of the original and the restored
+ # databases (see get_dump_for_comparison() for details), we dump and
+ # restore data as well to catch any errors while doing so.
+ command_ok(
+ [
+ 'pg_dump', "-F$format_spec", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
+
+ $dst_node->command_ok([ 'createdb', $restored_db ],
+ "created destination database '$restored_db'");
+
+ # Restore into destination database.
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [
+ 'psql', '-d', $dst_node->connstr($restored_db),
+ '-f', $dump_file
+ ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '-d',
+ $dst_node->connstr($restored_db), $dump_file
+ ];
+ }
+ command_ok(@restore_command,
+ "restored dump taken in $format format on destination instance");
+
+ # Dump restored database for comparison
+ my $dst_dump =
+ get_dump_for_comparison($dst_node->connstr($restored_db),
+ 'dest_dump.' . $format, 0);
+
+ compare_dumps($src_dump, $dst_dump,
+ "dump outputs from original and restored regression database (using $format format) match"
+ );
+ }
+}
+
+# Dump the database pointed to by connection string `connstr` in plain format
+# and adjust it for comparing dumps from the original and the restored database.
+#
+# `file_prefix` is used to create unique names for all dump files so that they
+# remain available for debugging in case the test fails.
+#
+# `adjust_child_columns` is passed to adjust_regress_dumpfile() which actually
+# adjusts the dump output.
+#
+# The name of the file containing the adjusted dump is returned.
+sub get_dump_for_comparison
{
- my ($stdout, $stderr) =
- run_command([ 'diff', '-u', $dump1_filtered, $dump2_filtered ]);
- print "=== diff of $dump1_filtered and $dump2_filtered\n";
- print "=== stdout ===\n";
- print $stdout;
- print "=== stderr ===\n";
- print $stderr;
- print "=== EOF ===\n";
+ my ($connstr, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+
+ # The order of columns in COPY statements dumped from the original database
+ # and that from the restored database differs. These differences are hard to
+ # adjust. Hence we compare only schema dumps for now.
+ command_ok(
+ [ 'pg_dump', '-s', '--no-sync', '-d', $connstr, '-f', $dumpfile ],
+ 'dump for comparison succeeded');
+
+ open(my $dh, '>', $dump_adjusted)
+ || die "could not open $dump_adjusted for writing the adjusted dump: $!";
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
}
done_testing();
diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile
index d82fb67540e..def89650ead 100644
--- a/src/test/perl/Makefile
+++ b/src/test/perl/Makefile
@@ -26,6 +26,7 @@ install: all installdirs
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Cluster.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/BackgroundPsql.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustUpgrade.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustDump.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Version.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
uninstall:
@@ -36,6 +37,7 @@ uninstall:
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
endif
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..c232ce2b1a5
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,125 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $adjust_child_columns);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and restore tests.
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+use Test::More;
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $adjust_child_columns)
+
+If we take dump of the regression database left behind after running regression
+tests, restore the dump, and take dump of the restored regression database, the
+outputs of both the dumps differ. Some regression tests purposefully create
+some child tables in such a way that their column orders differ from column
+orders of their respective parents. In the restored database, however, their
+column orders are the same as those of their respective parents. Thus the column
+orders of these child tables in the original database and those in the restored
+database differ, causing difference in the dump outputs. See MergeAttributes()
+and dumpTableSchema() for details.
+
+This routine rearranges the column declarations in the relevant
+C<CREATE TABLE... INHERITS> statements in the dump file from original database
+to match those from the restored database. We could instead adjust the
+statements in the dump from the restored database to match those from original
+database or adjust both to a canonical order. But we have chosen to adjust the
+statements in the dump from original database for no particular reason.
+
+Additionally, it normalizes blank lines and newlines to avoid noise.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<adjust_child_columns>: 1 indicates that the given dump file requires
+adjusting columns in the child tables; usually when the dump is from original
+database. 0 indicates no such adjustment is needed; usually when the dump is
+from restored database.
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $adjust_child_columns) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($adjust_child_columns)
+ {
+ my $saved_dump = $dump;
+
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+
+ ok($saved_dump ne $dump, 'applied gtestxx_4 adjustments');
+
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+
+ ok($saved_dump ne $dump, 'applied test_type_diff2_c1 adjustments');
+
+ $dump =~ s/(CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+
+ ok($saved_dump ne $dump, 'applied test_type_diff2_c2 adjustments');
+ }
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 9c83d93f79f..0e78973760d 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -50,6 +50,7 @@ use Cwd;
use Exporter 'import';
use Fcntl qw(:mode :seek);
use File::Basename;
+use File::Compare;
use File::Find;
use File::Spec;
use File::stat qw(stat);
@@ -89,6 +90,8 @@ our @EXPORT = qw(
command_fails_like
command_checks_all
+ compare_dumps
+
$windows_os
$is_msys2
$use_unix_sockets
@@ -1081,6 +1084,54 @@ sub command_checks_all
=pod
+=item compare_dumps(dump1, dump2, testname)
+
+Test that the two given files match. The files usually contain pg_dump output
+in "plain" format. Output the difference, if any.
+
+=over
+
+=item C<dump1> and C<dump2>: Dump files to compare
+
+=item C<testname>: test name
+
+=back
+
+=cut
+
+sub compare_dumps
+{
+ my ($dump1, $dump2, $testname) = @_;
+
+ my $compare_res = compare($dump1, $dump2);
+ is($compare_res, 0, $testname);
+
+ # Provide more context
+ if ($compare_res != 0)
+ {
+ my ($stdout, $stderr) =
+ run_command([ 'diff', '-u', $dump1, $dump2 ]);
+ print "=== diff of $dump1 and $dump2\n";
+ print "=== stdout ===\n";
+ print $stdout;
+ print "=== stderr ===\n";
+ print $stderr;
+ print "=== EOF ===\n";
+ }
+ else
+ {
+ # Fail if the comparison succeeds because the files are the same. This
+ # will detect simple programming errors. It won't detect more complex
+ # errors like passing different links pointing to the same underlying
+ # file.
+ ok($dump1 ne $dump2, "dump files being compared are distinct");
+ }
+
+ return;
+}
+
+=pod
+
=back
=cut
diff --git a/src/test/perl/meson.build b/src/test/perl/meson.build
index 58e30f15f9d..492ca571ff8 100644
--- a/src/test/perl/meson.build
+++ b/src/test/perl/meson.build
@@ -14,4 +14,5 @@ install_data(
'PostgreSQL/Test/Cluster.pm',
'PostgreSQL/Test/BackgroundPsql.pm',
'PostgreSQL/Test/AdjustUpgrade.pm',
+ 'PostgreSQL/Test/AdjustDump.pm',
install_dir: dir_pgxs / 'src/test/perl/PostgreSQL/Test')
diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl
index 467113b1379..25ef5e63c05 100644
--- a/src/test/recovery/t/027_stream_regress.pl
+++ b/src/test/recovery/t/027_stream_regress.pl
@@ -116,8 +116,9 @@ command_ok(
'--no-sync', '-p', $node_standby_1->port
],
'dump standby server');
-command_ok(
- [ 'diff', $outputdir . '/primary.dump', $outputdir . '/standby.dump' ],
+compare_dumps(
+ $outputdir . '/primary.dump',
+ $outputdir . '/standby.dump',
'compare primary and standby dumps');
# Likewise for the catalogs of the regression database, after disabling
@@ -146,12 +147,9 @@ command_ok(
'regression'
],
'dump catalogs of standby server');
-command_ok(
- [
- 'diff',
- $outputdir . '/catalogs_primary.dump',
- $outputdir . '/catalogs_standby.dump'
- ],
+compare_dumps(
+ $outputdir . '/catalogs_primary.dump',
+ $outputdir . '/catalogs_standby.dump',
'compare primary and standby catalog dumps');
# Check some data from pg_stat_statements.
--
2.34.1
On Wed, Jan 15, 2025 at 5:59 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Tue, Dec 31, 2024 at 5:24 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Fri, Dec 27, 2024 at 6:17 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 20 Dec 2024, at 11:01, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Wed, Dec 18, 2024 at 7:39 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 18 Dec 2024, at 12:28, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:

+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)

Should this also test that $oldnode and $newnode have matching pg_version to keep this from running in a cross-version upgrade test? While it can be argued that running this in a cross-version upgrade is breaking it and getting to keep both pieces, it's also not ideal to run a resource-intensive test we know will fail. (It can't be done at this exact callsite, just picked to illustrate.)

You already wrote it in parenthesis. At the exact callsite $oldnode
and $newnode cannot be of different versions. In fact, newnode is yet
to be created at this point. But $oldnode has the same version as the
server run from the code. In a cross-version upgrade this test will
not be executed. I am confused as to what this comment is about.

Sure, it can't be checked until $newnode is created, but it seems like a cheap
test to ensure it's not executed as part of someone's cross-version tests.

Hmm. The new node is always the node created with the version of the code.
It's the old node which may have a different version. Hence I added
code to compare the versions of the source node (which is the oldnode) and
the destination node (which is created the same way as the new node and
hence has the same version as the new node) in
test_regression_dump_restore() itself. Additionally, the code makes
sure that the oldnode doesn't use a custom install path. This is the 0002
patch. 0001 in this patchset is 0001 + 0002 in the earlier patch set.

Here's a rebased patch with some cosmetic fixes, typos and grammar
fixes after a self review. I have squashed all the patches into a
single patch now.

PFA the patch rebased on the latest HEAD with conflicts fixed.
--
Best Wishes,
Ashutosh Bapat
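To make the AdjustDump hack in the attached patch easier to follow, here is a hypothetical Python rendering of its first substitution. The table name gtestxx_4 and its column list come from the patch's own regex; the surrounding CREATE TABLE text, including the INHERITS parent name, is illustrative only.

```python
import re

# Illustrative dump fragment: the child table was created with its columns in
# (b, a) order, but the restored database declares them in parent order (a, b).
dump = (
    "CREATE TABLE generated_stored_tests.gtestxx_4 (\n"
    "    b integer,\n"
    "    a integer NOT NULL\n"
    ")\n"
    "INHERITS (generated_stored_tests.gtest1);\n"
)

# Equivalent of the patch's Perl s///mgx: capture the statement header and the
# two column declarations, then emit them with the columns swapped.
adjusted = re.sub(
    r"(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()"
    r"(\n\s+b\sinteger),"
    r"(\n\s+a\sinteger\sNOT\sNULL)",
    r"\1\3,\2",
    dump,
    flags=re.MULTILINE,
)
print(adjusted)
```

After the substitution the fragment lists the columns in (a, b) order, matching what a dump of the restored database produces, which is exactly why the test adjusts the original dump rather than the restored one.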
Attachments:
0001-Test-pg_dump-restore-of-regression-objects-20250127.patch (text/x-patch)
From 363203acf5aa7020ac26d3757468383b85c22f77 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade of the regression database left
behind by regression run. Modify it to test dump and restore of the
regression database as well.
The regression database created by a regression run contains almost all
the database objects supported by PostgreSQL, in various states. Hence
the new test case covers dump and restore scenarios not covered by
individual dump/restore cases. Till now, 002_pg_upgrade only tested
dump/restore through pg_upgrade, which uses only binary mode. Many
regression tests mention that they leave objects behind for dump/restore
testing, but they were not tested in non-binary mode. The new test case
closes that gap.
Testing dump and restore of the regression database makes this test run
longer for a relatively small benefit. Hence run it only when explicitly
requested by the user by specifying "regress_dump_test" in
PG_TEST_EXTRA.
Multiple tests compare pg_dump outputs taken from two clusters in plain
format as a way to compare the contents of those clusters. Add
PostgreSQL::Test::Utils::compare_dumps() to standardize and modularize
the comparison.
Note for reviewers:
The new test has uncovered two bugs so far in one year.
1. Introduced by 14e87ffa5c54. Fixed in fd41ba93e4630921a72ed5127cd0d552a8f3f8fc.
2. Introduced by 0413a556990ba628a3de8a0b58be020fd9a14ed0. Reverted in 74563f6b90216180fc13649725179fc119dddeb5.
Author: Ashutosh Bapat
Reviewed by: Michael Paquier, Daniel Gustafsson, Tom Lane
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
doc/src/sgml/regress.sgml | 12 ++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 152 +++++++++++++++++---
src/test/perl/Makefile | 2 +
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 125 ++++++++++++++++
src/test/perl/PostgreSQL/Test/Utils.pm | 51 +++++++
src/test/perl/meson.build | 1 +
src/test/recovery/t/027_stream_regress.pl | 14 +-
7 files changed, 332 insertions(+), 25 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 7c474559bdf..3061ce42fd1 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -347,6 +347,18 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_test</literal></term>
+ <listitem>
+ <para>
+ When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
+ tests dump and restore of the regression database left behind by the
+ regression run. Not enabled by default because it is time- and
+ resource-consuming.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index e49bff6454a..7c5563d0c9e 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -6,13 +6,13 @@ use warnings FATAL => 'all';
use Cwd qw(abs_path);
use File::Basename qw(dirname);
-use File::Compare;
-use File::Find qw(find);
-use File::Path qw(rmtree);
+use File::Find qw(find);
+use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -36,8 +36,8 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
sub filter_dump
{
my ($is_old, $old_version, $dump_file) = @_;
@@ -262,6 +262,21 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate TAP test, but doing it here saves us one full
+ # regression run.
+ #
+ # This step takes several extra seconds and some extra disk space, so
+ # requires an opt-in with the PG_TEST_EXTRA environment variable.
+ #
+ # Do this while the old cluster is running before it is shut down by the
+ # upgrade test.
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
+ {
+ test_regression_dump_restore($oldnode, %node_params);
+ }
}
# Initialize a new node for the upgrade.
@@ -515,20 +530,123 @@ my $dump1_filtered = filter_dump(1, $oldnode->pg_version, $dump1_file);
my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
# Compare the two dumps, there should be no differences.
-my $compare_res = compare($dump1_filtered, $dump2_filtered);
-is($compare_res, 0, 'old and new dumps match after pg_upgrade');
+compare_dumps($dump1_filtered, $dump2_filtered,
+ 'old and new dumps match after pg_upgrade');
+
+# Test dump and restore of objects left behind by the regression run.
+#
+# It is expected that the regression tests, which create the `regression`
+# database, have been run on `src_node`, which is left running. The dump from
+# `src_node` is restored on a fresh node created using given `node_params`.
+# Plain dumps from both the nodes are compared to make sure that all the dumped
+# objects are restored faithfully.
+sub test_regression_dump_restore
+{
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+ my %dump_formats = ('plain' => 'p', 'tar' => 't', 'directory' => 'd', 'custom' => 'c');
+
+ # Make sure that the source and destination nodes have the same version and
+ # do not use custom install paths. In either case, the dump files may
+ # require additional adjustments unknown to the code here. Do not run this
+ # test in such a case to avoid wasting time and resources.
+ if ($src_node->pg_version != $dst_node->pg_version or
+ defined $src_node->{_install_path})
+ {
+ fail("same version dump and restore test using default installation");
+ return;
+ }
+
+ # Dump the original database for comparison later.
+ my $src_dump = get_dump_for_comparison($src_node->connstr('regression'),
+ 'src_dump', 1);
+
+ # Setup destination database
+ $dst_node->init(%node_params);
+ $dst_node->start;
-# Provide more context if the dumps do not match.
-if ($compare_res != 0)
+ while (my ($format, $format_spec) = each %dump_formats)
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $restored_db = 'regression_' . $format;
+
+ # Even though we compare only the schemas of the original and the restored
+ # databases (see get_dump_for_comparison() for details), we dump and
+ # restore data as well to catch any errors while doing so.
+ command_ok(
+ [
+ 'pg_dump', "-F$format_spec", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
+
+ $dst_node->command_ok([ 'createdb', $restored_db ],
+ "created destination database '$restored_db'");
+
+ # Restore into destination database.
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [
+ 'psql', '-d', $dst_node->connstr($restored_db),
+ '-f', $dump_file
+ ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '-d',
+ $dst_node->connstr($restored_db), $dump_file
+ ];
+ }
+ command_ok(@restore_command,
+ "restored dump taken in $format format on destination instance");
+
+ # Dump restored database for comparison
+ my $dst_dump =
+ get_dump_for_comparison($dst_node->connstr($restored_db),
+ 'dest_dump.' . $format, 0);
+
+ compare_dumps($src_dump, $dst_dump,
+ "dump outputs from original and restored regression database (using $format format) match"
+ );
+ }
+}
+
+# Dump the database pointed to by connection string `connstr` in plain format
+# and adjust it for comparing dumps from the original and the restored database.
+#
+# `file_prefix` is used to create unique names for all dump files so that they
+# remain available for debugging in case the test fails.
+#
+# `adjust_child_columns` is passed to adjust_regress_dumpfile() which actually
+# adjusts the dump output.
+#
+# The name of the file containing the adjusted dump is returned.
+sub get_dump_for_comparison
{
- my ($stdout, $stderr) =
- run_command([ 'diff', '-u', $dump1_filtered, $dump2_filtered ]);
- print "=== diff of $dump1_filtered and $dump2_filtered\n";
- print "=== stdout ===\n";
- print $stdout;
- print "=== stderr ===\n";
- print $stderr;
- print "=== EOF ===\n";
+ my ($connstr, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+
+ # The order of columns in COPY statements dumped from the original database
+ # and that from the restored database differs. These differences are hard to
+ # adjust. Hence we compare only schema dumps for now.
+ command_ok(
+ [ 'pg_dump', '-s', '--no-sync', '-d', $connstr, '-f', $dumpfile ],
+ 'dump for comparison succeeded');
+
+ open(my $dh, '>', $dump_adjusted)
+ || die "could not open $dump_adjusted for writing the adjusted dump: $!";
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
}
done_testing();
diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile
index d82fb67540e..def89650ead 100644
--- a/src/test/perl/Makefile
+++ b/src/test/perl/Makefile
@@ -26,6 +26,7 @@ install: all installdirs
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Cluster.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/BackgroundPsql.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustUpgrade.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustDump.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Version.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
uninstall:
@@ -36,6 +37,7 @@ uninstall:
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
endif
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..c232ce2b1a5
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,125 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $adjust_child_columns);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and restore tests.
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+use Test::More;
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $adjust_child_columns)
+
+If we take a dump of the regression database left behind after running the
+regression tests, restore the dump, and take a dump of the restored database,
+the two dump outputs differ. Some regression tests purposefully create child
+tables in such a way that their column orders differ from the column orders of
+their respective parents. In the restored database, however, their column
+orders are the same as those of their respective parents. Thus the column
+orders of these child tables in the original database and those in the restored
+database differ, causing differences in the dump outputs. See MergeAttributes()
+and dumpTableSchema() for details.
+
+This routine rearranges the column declarations in the relevant
+C<CREATE TABLE... INHERITS> statements in the dump file from original database
+to match those from the restored database. We could instead adjust the
+statements in the dump from the restored database to match those from original
+database or adjust both to a canonical order. But we have chosen to adjust the
+statements in the dump from original database for no particular reason.
+
+Additionally, it normalizes blank lines and newlines to avoid noise.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<adjust_child_columns>: 1 indicates that the given dump file requires
+adjusting columns in the child tables, usually when the dump is from the
+original database. 0 indicates no such adjustment is needed, usually when the
+dump is from the restored database.
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $adjust_child_columns) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($adjust_child_columns)
+ {
+ my $saved_dump = $dump;
+
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+
+ ok($saved_dump ne $dump, 'applied gtestxx_4 adjustments');
+
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+
+ ok($saved_dump ne $dump, 'applied test_type_diff2_c1 adjustments');
+
+ $dump =~ s/(CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+
+ ok($saved_dump ne $dump, 'applied test_type_diff2_c2 adjustments');
+ }
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 9c83d93f79f..0e78973760d 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -50,6 +50,7 @@ use Cwd;
use Exporter 'import';
use Fcntl qw(:mode :seek);
use File::Basename;
+use File::Compare;
use File::Find;
use File::Spec;
use File::stat qw(stat);
@@ -89,6 +90,8 @@ our @EXPORT = qw(
command_fails_like
command_checks_all
+ compare_dumps
+
$windows_os
$is_msys2
$use_unix_sockets
@@ -1081,6 +1084,54 @@ sub command_checks_all
=pod
+=item compare_dumps(dump1, dump2, testname)
+
+Test that the given two files match. The files usually contain pg_dump output in
+"plain" format. Output the difference if any.
+
+=over
+
+=item C<dump1> and C<dump2>: Dump files to compare
+
+=item C<testname>: test name
+
+=back
+
+=cut
+
+sub compare_dumps
+{
+ my ($dump1, $dump2, $testname) = @_;
+
+ my $compare_res = compare($dump1, $dump2);
+ is($compare_res, 0, $testname);
+
+ # Provide more context
+ if ($compare_res != 0)
+ {
+ my ($stdout, $stderr) =
+ run_command([ 'diff', '-u', $dump1, $dump2 ]);
+ print "=== diff of $dump1 and $dump2\n";
+ print "=== stdout ===\n";
+ print $stdout;
+ print "=== stderr ===\n";
+ print $stderr;
+ print "=== EOF ===\n";
+ }
+ else
+ {
+ # Fail if the comparison succeeds because the files are the same. This
+ # will detect simple programming errors. It won't detect more complex
+ # errors like passing different links pointing to the same underlying
+ # file.
+ ok($dump1 ne $dump2, "dump files being compared are distinct")
+ }
+
+ return;
+}
+
+=pod
+
=back
=cut
diff --git a/src/test/perl/meson.build b/src/test/perl/meson.build
index 58e30f15f9d..492ca571ff8 100644
--- a/src/test/perl/meson.build
+++ b/src/test/perl/meson.build
@@ -14,4 +14,5 @@ install_data(
'PostgreSQL/Test/Cluster.pm',
'PostgreSQL/Test/BackgroundPsql.pm',
'PostgreSQL/Test/AdjustUpgrade.pm',
+ 'PostgreSQL/Test/AdjustDump.pm',
install_dir: dir_pgxs / 'src/test/perl/PostgreSQL/Test')
diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl
index bab7b28084b..92b610e709a 100644
--- a/src/test/recovery/t/027_stream_regress.pl
+++ b/src/test/recovery/t/027_stream_regress.pl
@@ -120,8 +120,9 @@ command_ok(
'--port' => $node_standby_1->port,
],
'dump standby server');
-command_ok(
- [ 'diff', $outputdir . '/primary.dump', $outputdir . '/standby.dump', ],
+compare_dumps(
+ $outputdir . '/primary.dump',
+ $outputdir . '/standby.dump',
'compare primary and standby dumps');
# Likewise for the catalogs of the regression database, after disabling
@@ -150,12 +151,9 @@ command_ok(
'regression',
],
'dump catalogs of standby server');
-command_ok(
- [
- 'diff',
- $outputdir . '/catalogs_primary.dump',
- $outputdir . '/catalogs_standby.dump',
- ],
+compare_dumps(
+ $outputdir . '/catalogs_primary.dump',
+ $outputdir . '/catalogs_standby.dump',
'compare primary and standby catalog dumps');
# Check some data from pg_stat_statements.
--
2.34.1
On Mon, Jan 27, 2025 at 03:04:55PM +0530, Ashutosh Bapat wrote:
PFA the patch rebased on the latest HEAD with conflicts fixed.
Thanks for the new patch.
Hmm. I was reading through the patch and there is something that
clearly stands out IMO: the new compare_dumps(). It is in Utils.pm,
and it acts as a wrapper of `diff` with its formalized output format.
It is not really about dumps, but about file comparisons. This should
be renamed compare_files(), with internals adjusted as such, and
reused in all the existing tests. Good idea to use that in
027_stream_regress.pl, actually. I'll go extract that first, to
reduce the presence of `diff` in the whole set of TAP tests.
AdjustDump.pm looks like a fine concept as it stands. I still need to
think more about it. It feels like we don't have the most optimal
interface, though, but perhaps that will be clearer once
compare_dumps() is moved out of the way.
--
Michael
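[Editor's note: the refactoring Michael proposes boils down to "compare two files; on mismatch, print a unified diff for context". Outside the Perl TAP framework, the same idea can be sketched in Python — the function name `compare_files` mirrors the proposal, but this implementation is purely illustrative, not the committed Perl code:]

```python
import difflib
from pathlib import Path

def compare_files(file1, file2):
    """Compare two text files; if they differ, print a unified diff
    in the same '=== diff of ... === EOF ===' layout the TAP helper uses."""
    a = Path(file1).read_text().splitlines(keepends=True)
    b = Path(file2).read_text().splitlines(keepends=True)
    if a == b:
        return True
    diff = difflib.unified_diff(a, b, fromfile=str(file1), tofile=str(file2))
    print(f"=== diff of {file1} and {file2}")
    print("".join(diff), end="")
    print("=== EOF ===")
    return False
```

The design point is the same as in the patch: the pass/fail decision uses a plain file comparison, and the external `diff -u` (here, `difflib`) run is only for diagnostic output when the comparison fails.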
On Wed, Feb 05, 2025 at 03:28:04PM +0900, Michael Paquier wrote:
Hmm. I was reading through the patch and there is something that
clearly stands out IMO: the new compare_dumps(). It is in Utils.pm,
and it acts as a wrapper of `diff` with its formalized output format.
It is not really about dumps, but about file comparisons. This should
be renamed compare_files(), with internals adjusted as such, and
reused in all the existing tests. Good idea to use that in
027_stream_regress.pl, actually. I'll go extract that first, to
reduce the presence of `diff` in the whole set of TAP tests.
The result of this part is pretty neat, resulting in 0001 where it is
possible to use the refactored routine as well in pg_combinebackup
where there is a piece comparing dumps. There are three more tests
with diff commands and assumptions of their own, that I've left out.
This has the merit of unifying the output generated should any diffs
show up, while removing a nice chunk from the main patch.
AdjustDump.pm looks like a fine concept as it stands. I still need to
think more about it. It feels like we don't have the most optimal
interface, though, but perhaps that will be clearer once
compare_dumps() is moved out of the way.
+ my %dump_formats = ('plain' => 'p', 'tar' => 't', 'directory' => 'd', 'custom' => 'c');
No need for this mapping, let's just use the long options.
+ # restore data as well to catch any errors while doing so.
+ command_ok(
+ [
+ 'pg_dump', "-F$format_spec", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
The use of command_ok() looks incorrect here. Shouldn't we use
$src_node->command_ok() here to ensure a correct PATH? That would be
more consistent with the other dump commands. Same remark about
@restore_command.
+ # The order of columns in COPY statements dumped from the original database
+ # and that from the restored database differs. These differences are hard to
What are the relations we are talking about here?
I am attaching the patch set, with 0002 being the main patch adjusted
with the changes of 0001 that I'm planning to apply, before diving
more into the internals of 0002.
--
Michael
Attachments:
0001-Refactor-code-for-file-comparisons-in-TAP-tests.patchtext/x-diff; charset=us-asciiDownload
From 9dc46db3b024b4c3779a3d4ab3b9add06813bf1e Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 6 Feb 2025 14:31:30 +0900
Subject: [PATCH 1/2] Refactor code for file comparisons in TAP tests
This unifies the output used should any differences be found in the
files provided.
There are a couple of tests that still use directly a diff command:
001_pg_bsd_indent, 017_shm and test_json_parser's 003. These rely on
different properties and are left out for now.
---
.../pg_combinebackup/t/002_compare_backups.pl | 19 +--------
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 22 ++--------
src/test/perl/PostgreSQL/Test/Utils.pm | 40 +++++++++++++++++++
src/test/recovery/t/027_stream_regress.pl | 14 +++----
4 files changed, 52 insertions(+), 43 deletions(-)
diff --git a/src/bin/pg_combinebackup/t/002_compare_backups.pl b/src/bin/pg_combinebackup/t/002_compare_backups.pl
index ebd68bfb850..4ca489b4511 100644
--- a/src/bin/pg_combinebackup/t/002_compare_backups.pl
+++ b/src/bin/pg_combinebackup/t/002_compare_backups.pl
@@ -192,27 +192,12 @@ $pitr2->command_ok(
# Compare the two dumps, there should be no differences other than
# the tablespace paths.
-my $compare_res = compare_text(
+my $compare_res = compare_files(
$dump1, $dump2,
+ "contents of dumps match for both PITRs",
sub {
s{create tablespace .* location .*\btspitr\K[12]}{N}i for @_;
return $_[0] ne $_[1];
});
-note($dump1);
-note($dump2);
-is($compare_res, 0, "dumps are identical");
-
-# Provide more context if the dumps do not match.
-if ($compare_res != 0)
-{
- my ($stdout, $stderr) =
- run_command([ 'diff', '-u', $dump1, $dump2 ]);
- print "=== diff of $dump1 and $dump2\n";
- print "=== stdout ===\n";
- print $stdout;
- print "=== stderr ===\n";
- print $stderr;
- print "=== EOF ===\n";
-}
done_testing();
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index e49bff6454a..ddb4c40c2e6 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -6,9 +6,8 @@ use warnings FATAL => 'all';
use Cwd qw(abs_path);
use File::Basename qw(dirname);
-use File::Compare;
-use File::Find qw(find);
-use File::Path qw(rmtree);
+use File::Find qw(find);
+use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
@@ -515,20 +514,7 @@ my $dump1_filtered = filter_dump(1, $oldnode->pg_version, $dump1_file);
my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
# Compare the two dumps, there should be no differences.
-my $compare_res = compare($dump1_filtered, $dump2_filtered);
-is($compare_res, 0, 'old and new dumps match after pg_upgrade');
-
-# Provide more context if the dumps do not match.
-if ($compare_res != 0)
-{
- my ($stdout, $stderr) =
- run_command([ 'diff', '-u', $dump1_filtered, $dump2_filtered ]);
- print "=== diff of $dump1_filtered and $dump2_filtered\n";
- print "=== stdout ===\n";
- print $stdout;
- print "=== stderr ===\n";
- print $stderr;
- print "=== EOF ===\n";
-}
+my $compare_res = compare_files($dump1_filtered, $dump2_filtered,
+ 'old and new dumps match after pg_upgrade');
done_testing();
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 9c83d93f79f..0c867c179f6 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -50,6 +50,7 @@ use Cwd;
use Exporter 'import';
use Fcntl qw(:mode :seek);
use File::Basename;
+use File::Compare;
use File::Find;
use File::Spec;
use File::stat qw(stat);
@@ -70,6 +71,7 @@ our @EXPORT = qw(
check_mode_recursive
chmod_recursive
check_pg_config
+ compare_files
dir_symlink
scan_server_header
system_or_bail
@@ -773,6 +775,44 @@ sub check_pg_config
=pod
+=item compare_files(file1, file2, testname)
+
+Check that two files match, printing the difference if any.
+
+C<line_comp_function> is an optional CODE reference to a line comparison
+function, passed down as-is to File::Compare::compare_text.
+
+=cut
+
+sub compare_files
+{
+ my ($file1, $file2, $testname, $line_comp_function) = @_;
+
+ $line_comp_function = sub { $_[0] ne $_[1] }
+ unless defined $line_comp_function;
+
+ my $compare_res =
+ File::Compare::compare_text($file1, $file2, $line_comp_function);
+ is($compare_res, 0, $testname);
+
+ # Provide more context if the files do not match.
+ if ($compare_res != 0)
+ {
+ my ($stdout, $stderr) =
+ run_command([ 'diff', '-u', $file1, $file2 ]);
+ print "=== diff of $file1 and $file2\n";
+ print "=== stdout ===\n";
+ print $stdout;
+ print "=== stderr ===\n";
+ print $stderr;
+ print "=== EOF ===\n";
+ }
+
+ return;
+}
+
+=pod
+
=item dir_symlink(oldname, newname)
Portably create a symlink for a directory. On Windows this creates a junction
diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl
index bab7b28084b..0eac8f66a9c 100644
--- a/src/test/recovery/t/027_stream_regress.pl
+++ b/src/test/recovery/t/027_stream_regress.pl
@@ -120,8 +120,9 @@ command_ok(
'--port' => $node_standby_1->port,
],
'dump standby server');
-command_ok(
- [ 'diff', $outputdir . '/primary.dump', $outputdir . '/standby.dump', ],
+compare_files(
+ $outputdir . '/primary.dump',
+ $outputdir . '/standby.dump',
'compare primary and standby dumps');
# Likewise for the catalogs of the regression database, after disabling
@@ -150,12 +151,9 @@ command_ok(
'regression',
],
'dump catalogs of standby server');
-command_ok(
- [
- 'diff',
- $outputdir . '/catalogs_primary.dump',
- $outputdir . '/catalogs_standby.dump',
- ],
+compare_files(
+ $outputdir . '/catalogs_primary.dump',
+ $outputdir . '/catalogs_standby.dump',
'compare primary and standby catalog dumps');
# Check some data from pg_stat_statements.
--
2.47.2
0002-Test-pg_dump-restore-of-regression-objects.patchtext/x-diff; charset=us-asciiDownload
From 97920995d9d04eeeaaedda00c441900425762030 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 6 Feb 2025 14:38:59 +0900
Subject: [PATCH 2/2] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade of the regression database left
behind by the regression run. Modify it to test dump and restore of the
regression database as well.
The regression database created by a regression run contains almost all the
database objects supported by PostgreSQL in various states. Hence the
new test case covers dump and restore scenarios not covered by individual
dump/restore cases. Until now 002_pg_upgrade only tested dump/restore
through pg_upgrade, which only uses binary mode. Many regression tests
mention that they leave objects behind for dump/restore testing, but they
were not tested in non-binary mode. The new test case closes that gap.
Testing dump and restore of the regression database makes this test run
longer for a relatively small benefit. Hence run it only when explicitly
requested by the user by specifying "regress_dump_test" in
PG_TEST_EXTRA.
Multiple tests compare pg_dump outputs taken from two clusters in plain
format as a way to compare the contents of those clusters.
Note for the reviewers:
The new test has uncovered two bugs so far in one year.
1. Introduced by 14e87ffa5c54. Fixed in fd41ba93e4630921a72ed5127cd0d552a8f3f8fc.
2. Introduced by 0413a556990ba628a3de8a0b58be020fd9a14ed0. Reverted in 74563f6b90216180fc13649725179fc119dddeb5.
Author: Ashutosh Bapat
Reviewed by: Michael Paquier, Daniel Gustafsson, Tom Lane
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 138 +++++++++++++++++++-
src/test/perl/Makefile | 2 +
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 125 ++++++++++++++++++
src/test/perl/meson.build | 1 +
doc/src/sgml/regress.sgml | 12 ++
5 files changed, 275 insertions(+), 3 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index ddb4c40c2e6..e198615c7cb 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -12,6 +12,7 @@ use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -35,8 +36,8 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
sub filter_dump
{
my ($is_old, $old_version, $dump_file) = @_;
@@ -261,6 +262,21 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate TAP test, but doing it here saves us one full
+ # regression run.
+ #
+ # This step takes several extra seconds and some extra disk space, so
+ # requires an opt-in with the PG_TEST_EXTRA environment variable.
+ #
+ # Do this while the old cluster is running before it is shut down by the
+ # upgrade test.
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
+ {
+ test_regression_dump_restore($oldnode, %node_params);
+ }
}
# Initialize a new node for the upgrade.
@@ -514,7 +530,123 @@ my $dump1_filtered = filter_dump(1, $oldnode->pg_version, $dump1_file);
my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
# Compare the two dumps, there should be no differences.
-my $compare_res = compare_files($dump1_filtered, $dump2_filtered,
+compare_files($dump1_filtered, $dump2_filtered,
'old and new dumps match after pg_upgrade');
+# Test dump and restore of objects left behind by the regression run.
+#
+# It is expected that the regression tests, which create the `regression`
+# database, are run on `src_node`, which in turn is left running. The dump from
+# `src_node` is restored on a fresh node created using the given `node_params`.
+# Plain dumps from both the nodes are compared to make sure that all the dumped
+# objects are restored faithfully.
+sub test_regression_dump_restore
+{
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+ my %dump_formats = ('plain' => 'p', 'tar' => 't', 'directory' => 'd', 'custom' => 'c');
+
+ # Make sure that the source and destination nodes have the same version and
+ # do not use custom install paths. In either case, the dump files may
+ # require additional adjustments unknown to the code here. Do not run this
+ # test in such a case, to avoid wasting time and resources.
+ if ($src_node->pg_version != $dst_node->pg_version or
+ defined $src_node->{_install_path})
+ {
+ fail("same version dump and restore test using default installation");
+ return;
+ }
+
+ # Dump the original database for comparison later.
+ my $src_dump = get_dump_for_comparison($src_node->connstr('regression'),
+ 'src_dump', 1);
+
+ # Setup destination database
+ $dst_node->init(%node_params);
+ $dst_node->start;
+
+ while (my ($format, $format_spec) = each %dump_formats)
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $restored_db = 'regression_' . $format;
+
+ # Even though we compare only the schema from the original and the restored
+ # database (see get_dump_for_comparison() for details), we dump and
+ # restore data as well to catch any errors while doing so.
+ command_ok(
+ [
+ 'pg_dump', "-F$format_spec", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
+
+ $dst_node->command_ok([ 'createdb', $restored_db ],
+ "created destination database '$restored_db'");
+
+ # Restore into destination database.
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [
+ 'psql', '-d', $dst_node->connstr($restored_db),
+ '-f', $dump_file
+ ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '-d',
+ $dst_node->connstr($restored_db), $dump_file
+ ];
+ }
+ command_ok(@restore_command,
+ "restored dump taken in $format format on destination instance");
+
+ # Dump restored database for comparison
+ my $dst_dump =
+ get_dump_for_comparison($dst_node->connstr($restored_db),
+ 'dest_dump.' . $format, 0);
+
+ compare_files($src_dump, $dst_dump,
+ "dump outputs from original and restored regression database (using $format format) match"
+ );
+ }
+}
+
+# Dump the database identified by `connstr` in plain format and adjust the
+# output for comparing dumps from the original and the restored database.
+#
+# `file_prefix` is used to create unique names for all dump files so that they
+# remain available for debugging in case the test fails.
+#
+# `adjust_child_columns` is passed to adjust_regress_dumpfile() which actually
+# adjusts the dump output.
+#
+# The name of the file containing the adjusted dump is returned.
+sub get_dump_for_comparison
+{
+ my ($connstr, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+
+ # The order of columns in COPY statements dumped from the original database
+ # and that from the restored database differs. These differences are hard to
+ # adjust. Hence we compare only schema dumps for now.
+ command_ok(
+ [ 'pg_dump', '-s', '--no-sync', '-d', $connstr, '-f', $dumpfile ],
+ 'dump for comparison succeeded');
+
+ open(my $dh, '>', $dump_adjusted)
+ || die "could not open $dump_adjusted for writing the adjusted dump: $!";
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
+}
+
done_testing();
diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile
index d82fb67540e..def89650ead 100644
--- a/src/test/perl/Makefile
+++ b/src/test/perl/Makefile
@@ -26,6 +26,7 @@ install: all installdirs
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Cluster.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/BackgroundPsql.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustUpgrade.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustDump.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Version.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
uninstall:
@@ -36,6 +37,7 @@ uninstall:
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
endif
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..c232ce2b1a5
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,125 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $adjust_child_columns);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and restore tests.
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+use Test::More;
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $adjust_child_columns)
+
+If we take a dump of the regression database left behind after running the
+regression tests, restore the dump, and take a dump of the restored database,
+the two dump outputs differ. Some regression tests purposefully create child
+tables in such a way that their column orders differ from the column orders of
+their respective parents. In the restored database, however, their column
+orders are the same as those of their respective parents. Thus the column
+orders of these child tables in the original database and those in the restored
+database differ, causing differences in the dump outputs. See MergeAttributes()
+and dumpTableSchema() for details.
+
+This routine rearranges the column declarations in the relevant
+C<CREATE TABLE... INHERITS> statements in the dump file from original database
+to match those from the restored database. We could instead adjust the
+statements in the dump from the restored database to match those from original
+database or adjust both to a canonical order. But we have chosen to adjust the
+statements in the dump from original database for no particular reason.
+
+Additionally, it normalizes blank lines and newlines to avoid noise.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<adjust_child_columns>: 1 indicates that the given dump file requires
+adjusting columns in the child tables, usually when the dump is from the
+original database. 0 indicates no such adjustment is needed, usually when the
+dump is from the restored database.
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $adjust_child_columns) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($adjust_child_columns)
+ {
+ my $saved_dump = $dump;
+
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+
+ ok($saved_dump ne $dump, 'applied gtestxx_4 adjustments');
+
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+
+ ok($saved_dump ne $dump, 'applied test_type_diff2_c1 adjustments');
+
+ $dump =~ s/(CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+
+ ok($saved_dump ne $dump, 'applied test_type_diff2_c2 adjustments');
+ }
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
diff --git a/src/test/perl/meson.build b/src/test/perl/meson.build
index 58e30f15f9d..492ca571ff8 100644
--- a/src/test/perl/meson.build
+++ b/src/test/perl/meson.build
@@ -14,4 +14,5 @@ install_data(
'PostgreSQL/Test/Cluster.pm',
'PostgreSQL/Test/BackgroundPsql.pm',
'PostgreSQL/Test/AdjustUpgrade.pm',
+ 'PostgreSQL/Test/AdjustDump.pm',
install_dir: dir_pgxs / 'src/test/perl/PostgreSQL/Test')
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 7c474559bdf..3061ce42fd1 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -347,6 +347,18 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_test</literal></term>
+ <listitem>
+ <para>
+ When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
tests dump and restore of the regression database left behind by the
regression run. Not enabled by default because it is time- and
resource-consuming.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
--
2.47.2
On 2025-Feb-06, Michael Paquier wrote:
On Wed, Feb 05, 2025 at 03:28:04PM +0900, Michael Paquier wrote:
Hmm. I was reading through the patch and there is something that
clearly stands out IMO: the new compare_dumps(). It is in Utils.pm,
and it acts as a wrapper of `diff` with its formalized output format.
It is not really about dumps, but about file comparisons. This should
be renamed compare_files(), with internals adjusted as such, and
reused in all the existing tests. Good idea to use that in
027_stream_regress.pl, actually. I'll go extract that first, to
reduce the presence of `diff` in the whole set of TAP tests.

The result of this part is pretty neat, resulting in 0001 where it is
possible to use the refactored routine as well in pg_combinebackup
where there is a piece comparing dumps. There are three more tests
with diff commands and assumptions of their own, that I've left out.
Great, I've looked at doing something like this in the libpq_pipeline
test for better diff reporting -- what I have uses Test::Differences,
which is pretty neat and usable, but it's not part of the standard
installed perl modules, which is a large downside. I can probably get
rid of my hack once you get 0001 in.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Find a bug in a program, and fix it, and the program will work today.
Show the program how to find and fix a bug, and the program
will work forever" (Oliver Silfridge)
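As an aside for readers following the compare_dumps() to compare_files() refactoring discussed above: the helper is essentially a generic file comparison that reports mismatches in a formalized unified-diff output. A rough Python sketch of that concept (illustrative only; the actual helper in PostgreSQL::Test::Utils is Perl and shells out to `diff`):

```python
import difflib

def compare_files(path1, path2):
    """Compare two text files; return (equal, unified_diff_text).

    Illustrative sketch of the idea behind the Perl compare_files()
    helper: report any differences in unified-diff format so every
    test produces the same style of failure output.
    """
    with open(path1) as f1, open(path2) as f2:
        lines1, lines2 = f1.readlines(), f2.readlines()
    diff = list(difflib.unified_diff(lines1, lines2,
                                     fromfile=path1, tofile=path2))
    # Empty diff means the files are identical.
    return (not diff), "".join(diff)
```

The payoff, as noted in the thread, is that every caller gets uniform diff reporting instead of each test rolling its own `diff` invocation.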
On Thu, Feb 06, 2025 at 10:43:56AM +0100, Alvaro Herrera wrote:
Great, I've looked at doing something like this in the libpq_pipeline
test for better diff reporting -- what I have uses Test::Differences,
which is pretty neat and usable, but it's not part of the standard
installed perl modules, which is a large downside. I can probably get
rid of my hack once you get 0001 in.
Okay, thanks for the feedback. We have been relying on diff -u for
the parts of the tests touched by 0001 for some time now, so if there
are no objections I would like to apply 0001 in a couple of days.
The CF entry has been switched as waiting on author.
--
Michael
On Fri, Feb 07, 2025 at 07:11:25AM +0900, Michael Paquier wrote:
Okay, thanks for the feedback. We have been relying on diff -u for
the parts of the tests touched by 0001 for some time now, so if there
are no objections I would like to apply 0001 in a couple of days.
This part has been applied as 169208092f5c.
--
Michael
On Thu, Feb 6, 2025 at 11:32 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Feb 05, 2025 at 03:28:04PM +0900, Michael Paquier wrote:
Hmm. I was reading through the patch and there is something that
clearly stands out IMO: the new compare_dumps(). It is in Utils.pm,
and it acts as a wrapper of `diff` with its formalized output format.
It is not really about dumps, but about file comparisons. This should
be renamed compare_files(), with internals adjusted as such, and
reused in all the existing tests. Good idea to use that in
027_stream_regress.pl, actually. I'll go extract that first, to
reduce the presence of `diff` in the whole set of TAP tests.
The result of this part is pretty neat, resulting in 0001 where it is
possible to use the refactored routine as well in pg_combinebackup
where there is a piece comparing dumps. There are three more tests
with diff commands and assumptions of their own, that I've left out.
This has the merit of unifying the output generated should any diffs
show up, while removing a nice chunk from the main patch.
Sorry for replying late here. The refactored code in
002_compare_backups.pl has a potential to cause confusion even without
this refactoring. The differences in tablespace paths are adjusted in
compare_files() and not in the actual dump outputs. In case there's a
difference other than paths, diff between the dump outputs is reported
which will also show the differences in paths. That might mislead
developers into thinking that the differences in paths are also not
expected. Am I right?
I will address other comments soon, but the answer to this question
has some impact there.
--
Best Wishes,
Ashutosh Bapat
Hi Michael,
On Sun, Feb 9, 2025 at 1:25 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Feb 07, 2025 at 07:11:25AM +0900, Michael Paquier wrote:
Okay, thanks for the feedback. We have been relying on diff -u for
the parts of the tests touched by 0001 for some time now, so if there
are no objections I would like to apply 0001 in a couple of days.
This part has been applied as 169208092f5c.
Thanks. PFA rebased patches.
I have added another diff adjustment to adjust_regress_dumpfile().
It's introduced by 83ea6c54025bea67bcd4949a6d58d3fc11c3e21b.
On Thu, Feb 6, 2025 at 11:32 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Feb 05, 2025 at 03:28:04PM +0900, Michael Paquier wrote:
Hmm. I was reading through the patch and there is something that
clearly stands out IMO: the new compare_dumps(). It is in Utils.pm,
and it acts as a wrapper of `diff` with its formalized output format.
It is not really about dumps, but about file comparisons. This should
be renamed compare_files(), with internals adjusted as such, and
reused in all the existing tests. Good idea to use that in
027_stream_regress.pl, actually. I'll go extract that first, to
reduce the presence of `diff` in the whole set of TAP tests.
The result of this part is pretty neat, resulting in 0001 where it is
possible to use the refactored routine as well in pg_combinebackup
where there is a piece comparing dumps. There are three more tests
with diff commands and assumptions of their own, that I've left out.
This has the merit of unifying the output generated should any diffs
show up, while removing a nice chunk from the main patch.
AdjustDump.pm looks like a fine concept as it stands. I still need to
think more about it. It feels like we don't have the most optimal
interface, though, but perhaps that will be clearer once
compare_dumps() is moved out of the way.
Without knowing what makes the interface suboptimal, it's hard to make
it optimal. I did think about getting rid of adjust_child_columns
flag. But that either means we adjust CREATE TABLE ... INHERIT
statements from both the dump outputs from original and the restored
database to a canonical form, or get rid of the tests in that function
which make sure that the adjustment is required. The first seems more
work (coding and run time). The tests look useful to detect when the
adjustment won't be required.
I also looked at the routines which adjust the dumps from upgrade
tests. They seem to be specific to the older versions and lack the
extensibility you mentioned earlier.
The third thing I looked at was the possibility of applying the
adjustments to only the dump from the original database where it is
required by passing the newline adjustments to compare_files().
However 0002 in the attached set of patches adds more logic,
applicable to both the original and restored dump outputs, to
AdjustDump.pm. So we can't do that either.
I am clueless as to what could be improved here.
+ my %dump_formats = ('plain' => 'p', 'tar' => 't', 'directory' => 'd', 'custom' => 'c');
No need for this mapping, let's just use the long options.
Hmm, didn't realize -F accepts whole format name as well. pg_dump
--help doesn't give that impression but the user-facing documentation
mentions it. Done.
+ # restore data as well to catch any errors while doing so.
+ command_ok(
+ [
+ 'pg_dump', "-F$format_spec", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
The use of command_ok() looks incorrect here. Shouldn't we use
$src_node->command_ok() here to ensure a correct PATH? That would be
more consistent with the other dump commands. Same remark about
@restore_command.
Cluster::command_ok's prologue doesn't mention PATH but mentions
PGHOST and PGPORT.
```
Runs a shell command like PostgreSQL::Test::Utils::command_ok, but
with PGHOST and PGPORT set
so that the command will default to connecting to this PostgreSQL::Test::Cluster
```
According to sub _get_env(), PATH is set only when custom install path
is provided. In the absence of that, build path is used. In this case,
the source and destination nodes are created from the build itself, so
no separate path is required. PGHOST and PGPORT are anyway overridden
by the connection string fetched from the node. So I don't think
there's any correctness issue here, but it's better to use
Cluster::command_ok() just for better readability. Done.
+ # The order of columns in COPY statements dumped from the original database
+ # and that from the restored database differs. These differences are hard to
What are the relations we are talking about here?
These are the child tables whose parent had a column added after being
inherited. Now that I have more expertise with Perl regexes, I have
added code in AdjustDump.pm to remove only the COPY statements where
we see a legitimate difference. Added a comment explaining the cause
behind the difference. This patch is supposed to be merged into 0001
before committing the change.
--
Best Wishes,
Ashutosh Bapat
Attachments:
0001-Test-pg_dump-restore-of-regression-objects-20250211.patch
From 42ddf5e9fd6ea64c04e8c45033001fc834cd8039 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH 1/2] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade of the regression database left
behind by the regression run. Modify it to test dump and restore of the
regression database as well.
The regression database created by the regression run contains almost all the
database objects supported by PostgreSQL in various states. Hence the
new testcase covers dump and restore scenarios not covered by individual
dump/restore cases. Till now 002_pg_upgrade only tested dump/restore
through pg_upgrade which only uses binary mode. Many regression tests
mention that they leave objects behind for dump/restore testing but they
are not tested in a non-binary mode. The new testcase closes that
gap.
Testing dump and restore of regression database makes this test run
longer for a relatively small benefit. Hence run it only when
explicitly requested by the user by specifying "regress_dump_test" in
PG_TEST_EXTRA.
Note for the reviewers:
The new test has uncovered two bugs so far in one year.
1. Introduced by 14e87ffa5c54. Fixed in fd41ba93e4630921a72ed5127cd0d552a8f3f8fc.
2. Introduced by 0413a556990ba628a3de8a0b58be020fd9a14ed0. Reverted in 74563f6b90216180fc13649725179fc119dddeb5.
Author: Ashutosh Bapat
Reviewed by: Michael Paquier, Daniel Gustafsson, Tom Lane
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
doc/src/sgml/regress.sgml | 12 ++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 140 +++++++++++++++++++-
src/test/perl/Makefile | 2 +
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 134 +++++++++++++++++++
src/test/perl/meson.build | 1 +
5 files changed, 287 insertions(+), 2 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 7c474559bdf..3061ce42fd1 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -347,6 +347,18 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_test</literal></term>
+ <listitem>
+ <para>
+ When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
+ tests dump and restore of the regression database left behind by the
+ regression run. Not enabled by default because it is time- and
+ resource-consuming.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 68516fa486a..0d636529d74 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -12,6 +12,7 @@ use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -35,8 +36,8 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
sub filter_dump
{
my ($is_old, $old_version, $dump_file) = @_;
@@ -261,6 +262,21 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate TAP test, but doing it here saves us one full
+ # regression run.
+ #
+ # This step takes several extra seconds and some extra disk space, so
+ # requires an opt-in with the PG_TEST_EXTRA environment variable.
+ #
+ # Do this while the old cluster is running before it is shut down by the
+ # upgrade test.
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
+ {
+ test_regression_dump_restore($oldnode, %node_params);
+ }
}
# Initialize a new node for the upgrade.
@@ -517,4 +533,124 @@ my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
compare_files($dump1_filtered, $dump2_filtered,
'old and new dumps match after pg_upgrade');
+# Test dump and restore of objects left behind by the regression run.
+#
+# It is expected that regression tests, which create `regression` database, are
+# run on `src_node`, which in turn, is left in running state. The dump from
+# `src_node` is restored on a fresh node created using given `node_params`.
+# Plain dumps from both the nodes are compared to make sure that all the dumped
+# objects are restored faithfully.
+sub test_regression_dump_restore
+{
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+
+ # Make sure that the source and destination nodes have the same version and
+ # do not use custom install paths. In both the cases, the dump files may
+ # require additional adjustments unknown to code here. Do not run this test
+ # in such a case to avoid utilizing the time and resources unnecessarily.
+ if ($src_node->pg_version != $dst_node->pg_version
+ or defined $src_node->{_install_path})
+ {
+ fail("same version dump and restore test using default installation");
+ return;
+ }
+
+ # Dump the original database for comparison later.
+ my $src_dump =
+ get_dump_for_comparison($src_node, 'regression', 'src_dump', 1);
+
+ # Setup destination database cluster
+ $dst_node->init(%node_params);
+ $dst_node->start;
+
+ for my $format ('plain', 'tar', 'directory', 'custom')
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $restored_db = 'regression_' . $format;
+
+ # Even though we compare only schema from the original and the restored
+ # database (See get_dump_for_comparison() for details.), we dump and
+ # restore data as well to catch any errors while doing so.
+ $src_node->command_ok(
+ [
+ 'pg_dump', "-F$format", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
+
+ # Create a new database for restoring dump from every format so that it
+ # is available for debugging in case the test fails.
+ $dst_node->command_ok([ 'createdb', $restored_db ],
+ "created destination database '$restored_db'");
+
+ # Restore into destination database.
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [
+ 'psql', '-d', $dst_node->connstr($restored_db),
+ '-f', $dump_file
+ ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '-d',
+ $dst_node->connstr($restored_db), $dump_file
+ ];
+ }
+ $dst_node->command_ok(@restore_command,
+ "restored dump taken in $format format on destination instance");
+
+ my $dst_dump =
+ get_dump_for_comparison($dst_node, $restored_db,
+ 'dest_dump.' . $format, 0);
+
+ compare_files($src_dump, $dst_dump,
+ "dump outputs from original and restored regression database (using $format format) match"
+ );
+ }
+}
+
+# Dump database `db` from the given `node` in plain format and adjust it for
+# comparing dumps from the original and the restored database.
+#
+# `file_prefix` is used to create unique names for all dump files so that they
+# remain available for debugging in case the test fails.
+#
+# `adjust_child_columns` is passed to adjust_regress_dumpfile() which actually
+# adjusts the dump output.
+#
+# The name of the file containing the adjusted dump is returned.
+sub get_dump_for_comparison
+{
+ my ($node, $db, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+
+ # The order of columns in COPY statements dumped from the original database
+ # and that from the restored database differs. These differences are hard to
+ # adjust. Hence we compare only schema dumps for now.
+ $node->command_ok(
+ [
+ 'pg_dump', '-s', '--no-sync', '-d',
+ $node->connstr($db), '-f', $dumpfile
+ ],
+ 'dump for comparison succeeded');
+
+ open(my $dh, '>', $dump_adjusted)
+ || die
+ "could not open $dump_adjusted for writing the adjusted dump: $!";
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
+}
+
done_testing();
diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile
index d82fb67540e..def89650ead 100644
--- a/src/test/perl/Makefile
+++ b/src/test/perl/Makefile
@@ -26,6 +26,7 @@ install: all installdirs
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Cluster.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/BackgroundPsql.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustUpgrade.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustDump.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Version.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
uninstall:
@@ -36,6 +37,7 @@ uninstall:
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
endif
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..e3e152b88fa
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,134 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $adjust_child_columns);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and restore tests
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+use Test::More;
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $adjust_child_columns)
+
+If we take dump of the regression database left behind after running regression
+tests, restore the dump, and take dump of the restored regression database, the
+outputs of both the dumps differ. Some regression tests purposefully create
+some child tables in such a way that their column orders differ from column
+orders of their respective parents. In the restored database, however, their
+column orders are same as that of their respective parents. Thus the column
+orders of these child tables in the original database and those in the restored
+database differ, causing difference in the dump outputs. See MergeAttributes()
+and dumpTableSchema() for details.
+
+This routine rearranges the column declarations in the relevant
+C<CREATE TABLE... INHERITS> statements in the dump file from original database
+to match those from the restored database. We could instead adjust the
+statements in the dump from the restored database to match those from original
+database or adjust both to a canonical order. But we have chosen to adjust the
+statements in the dump from original database for no particular reason.
+
+Additionally it adjusts blank and new lines to avoid noise.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<adjust_child_columns>: 1 indicates that the given dump file requires
+adjusting columns in the child tables; usually when the dump is from original
+database. 0 indicates no such adjustment is needed; usually when the dump is
+from restored database.
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $adjust_child_columns) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($adjust_child_columns)
+ {
+ my $saved_dump = $dump;
+
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_stored_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_virtual_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_virtual_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c1 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c2 adjustments');
+ }
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
diff --git a/src/test/perl/meson.build b/src/test/perl/meson.build
index 58e30f15f9d..492ca571ff8 100644
--- a/src/test/perl/meson.build
+++ b/src/test/perl/meson.build
@@ -14,4 +14,5 @@ install_data(
'PostgreSQL/Test/Cluster.pm',
'PostgreSQL/Test/BackgroundPsql.pm',
'PostgreSQL/Test/AdjustUpgrade.pm',
+ 'PostgreSQL/Test/AdjustDump.pm',
install_dir: dir_pgxs / 'src/test/perl/PostgreSQL/Test')
base-commit: 6998db59c2959c4f280a9088054e6dbf7178efe0
--
2.34.1
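A note on the substitutions in adjust_regress_dumpfile() above, for readers less familiar with Perl regex flags: /m makes ^ anchor at line starts, /g applies the substitution everywhere, and /x allows the pattern to be split across lines for readability. A rough Python equivalent of one of the column-swap adjustments (the generated_stored_tests.gtestxx_4 case), shown only to illustrate the regex logic rather than as part of the patch:

```python
import re

# Equivalent of the Perl s/(...)(...)(...)/$1$3,$2/mgx adjustment:
# capture the CREATE TABLE header, the "b integer" declaration, and
# the "a integer NOT NULL" declaration, then emit them with the two
# column declarations swapped so the dump from the original database
# matches the restored one.  re.M corresponds to Perl's /m.
pattern = re.compile(
    r"(^CREATE TABLE generated_stored_tests\.gtestxx_4 \()"
    r"(\n\s+b integer),"
    r"(\n\s+a integer NOT NULL)",
    re.M)

def swap_child_columns(dump):
    # \1\3,\2 mirrors Perl's $1$3,$2 replacement.
    return pattern.sub(r"\1\3,\2", dump)
```

Because the pattern is anchored on the exact table name and column types, it fails to match (and the patch's ok() checks catch this) if a future change makes the adjustment unnecessary.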
0002-Filter-COPY-statements-with-differing-colum-20250211.patch
From 6976ebad1e4fa717cd8fa2f70fbe73cd476e7caf Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Tue, 11 Feb 2025 16:31:10 +0530
Subject: [PATCH 2/2] Filter COPY statements with differing column order
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 10 +---
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 59 +++++++++++++++------
2 files changed, 45 insertions(+), 24 deletions(-)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 0d636529d74..b3e1573ec34 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -569,9 +569,6 @@ sub test_regression_dump_restore
my $dump_file = "$tempdir/regression_dump.$format";
my $restored_db = 'regression_' . $format;
- # Even though we compare only schema from the original and the restored
- # database (See get_dump_for_comparison() for details.), we dump and
- # restore data as well to catch any errors while doing so.
$src_node->command_ok(
[
'pg_dump', "-F$format", '--no-sync',
@@ -633,13 +630,10 @@ sub get_dump_for_comparison
my $dump_adjusted = "${dumpfile}_adjusted";
- # The order of columns in COPY statements dumped from the original database
- # and that from the restored database differs. These differences are hard to
- # adjust. Hence we compare only schema dumps for now.
$node->command_ok(
[
- 'pg_dump', '-s', '--no-sync', '-d',
- $node->connstr($db), '-f', $dumpfile
+ 'pg_dump', '--no-sync', '-d', $node->connstr($db), '-f',
+ $dumpfile
],
'dump for comparison succeeded');
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
index e3e152b88fa..e00a00d1b2c 100644
--- a/src/test/perl/PostgreSQL/Test/AdjustDump.pm
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -44,22 +44,36 @@ our @EXPORT = qw(
If we take dump of the regression database left behind after running regression
tests, restore the dump, and take dump of the restored regression database, the
-outputs of both the dumps differ. Some regression tests purposefully create
-some child tables in such a way that their column orders differ from column
-orders of their respective parents. In the restored database, however, their
-column orders are same as that of their respective parents. Thus the column
+outputs of both the dumps differ in the following cases. This routine adjusts
+the given dump so that dump outputs from the original and restored database,
+respectively, match.
+
+Case 1: Some regression tests purposefully create child tables in such a way
+that the order of their inherited columns differ from column orders of their
+respective parents. In the restored database, however, the order of their
+inherited columns are same as that of their respective parents. Thus the column
orders of these child tables in the original database and those in the restored
database differ, causing difference in the dump outputs. See MergeAttributes()
-and dumpTableSchema() for details.
-
-This routine rearranges the column declarations in the relevant
-C<CREATE TABLE... INHERITS> statements in the dump file from original database
-to match those from the restored database. We could instead adjust the
-statements in the dump from the restored database to match those from original
-database or adjust both to a canonical order. But we have chosen to adjust the
-statements in the dump from original database for no particular reason.
-
-Additionally it adjusts blank and new lines to avoid noise.
+and dumpTableSchema() for details. This routine rearranges the column
+declarations in the relevant C<CREATE TABLE... INHERITS> statements in the dump
+file from original database to match those from the restored database. We could,
+instead, adjust the statements in the dump from the restored database to match
+those from original database or adjust both to a canonical order. But we have
+chosen to adjust the statements in the dump from original database for no
+particular reason.
+
+Case 2: When dumping COPY statements the columns are ordered by their attribute
+number by fmtCopyColumnList(). If a column is added to a parent table after a
+child has inherited the parent and the child has its own columns, the attribute
+number of the column changes after restoring the child table. This is because
+when executing the dumped C<CREATE TABLE... INHERITS> statement all the parent
+attributes are created before any child attributes. Thus the order of columns in
+COPY statements dumped from the original and the restored databases,
+respectively, differs. Such tables in regression tests are listed below. It is
+hard to adjust the column order in the COPY statement along with the data. Hence
+we just remove such COPY statements from the dump output.
+
+Additionally the routine adjusts blank and new lines to avoid noise.
Arguments:
@@ -84,8 +98,6 @@ sub adjust_regress_dumpfile
# use Unix newlines
$dump =~ s/\r\n/\n/g;
- # Suppress blank lines, as some places in pg_dump emit more or fewer.
- $dump =~ s/\n\n+/\n/g;
# Adjust the CREATE TABLE ... INHERITS statements.
if ($adjust_child_columns)
@@ -122,6 +134,21 @@ sub adjust_regress_dumpfile
'applied public.test_type_diff2_c2 adjustments');
}
+ # Remove COPY statements with differing column order
+ for my $table (
+ 'public\.b_star', 'public\.c_star',
+ 'public\.cc2', 'public\.d_star',
+ 'public\.e_star', 'public\.f_star',
+ 'public\.renamecolumnanother', 'public\.renamecolumnchild',
+ 'public\.test_type_diff2_c1', 'public\.test_type_diff2_c2',
+ 'public\.test_type_diff_c')
+ {
+ $dump =~ s/^COPY\s$table\s\(.+?^\\\.$//sm;
+ }
+
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
return $dump;
}
--
2.34.1
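The COPY-filtering substitution in 0002 combines Perl's /s and /m flags: /s lets . cross newlines so the non-greedy .+? can swallow the data rows, while /m lets ^ and $ anchor on the COPY line and on the terminating \. line. A hedged Python sketch of the same idea (here the table name is escaped programmatically, whereas the Perl list in the patch pre-escapes the dots):

```python
import re

def drop_copy_block(dump, table):
    # Perl's /sm maps to re.S | re.M: '.' also matches newlines and
    # '^'/'$' anchor at line boundaries.  The non-greedy .+? stops at
    # the first end-of-data marker "\." on a line of its own, so only
    # the COPY block for the named table is removed.
    pattern = re.compile(
        r"^COPY " + re.escape(table) + r" \(.+?^\\\.$",
        re.S | re.M)
    return pattern.sub("", dump)
```

The non-greedy quantifier matters: a greedy .+ would run past the first \. terminator and eat every subsequent COPY block up to the last one in the dump.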
On Tue, Feb 11, 2025 at 12:19:33PM +0530, Ashutosh Bapat wrote:
Sorry for replying late here. The refactored code in
002_compare_backups.pl has a potential to cause confusion even without
this refactoring. The differences in tablespace paths are adjusted in
compare_files() and not in the actual dump outputs. In case there's a
difference other than paths, diff between the dump outputs is reported
which will also show the differences in paths. That might mislead
developers into thinking that the differences in paths are also not
expected. Am I right?
Logically, 002_compare_backups.pl is still the same, isn't it? We're
still passing the file paths to compare_text(), except that the
comparison routine is given as an argument one level higher.
You are right that there could be an argument for changing the files
are they are on-disk, and do a diff based on what's on disk after what
has changed so as the filtered parts are out of the report. However,
there is also an argument for not changing them as that's more useful
to know the original state of the dump for debugging. This one
involves only a small change, which is OK as-is, IMHO.
--
Michael
On Wed, Feb 12, 2025 at 5:25 AM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Feb 11, 2025 at 12:19:33PM +0530, Ashutosh Bapat wrote:
Sorry for replying late here. The refactored code in
002_compare_backups.pl has a potential to cause confusion even without
this refactoring. The differences in tablespace paths are adjusted in
compare_files() and not in the actual dump outputs. In case there's a
difference other than paths, diff between the dump outputs is reported
which will also show the differences in paths. That might mislead
developers in thinking that the differences in paths are also not
expected. Am I right?
Logically, 002_compare_backups.pl is still the same, isn't it? We're
still passing the file paths to compare_text(), except that the
comparison routine is given as an argument one level higher.
Yes. That's right. Not something introduced by
169208092f5c98a6021b23b38f03a5d65f84ad96.
You are right that there could be an argument for changing the files
are they are on-disk, and do a diff based on what's on disk after what
has changed so as the filtered parts are out of the report. However,
there is also an argument for not changing them as that's more useful
to know the original state of the dump for debugging. This one
involves only a small change, which is OK as-is, IMHO.
Fine. We know what to fix if an ambiguity arises in future.
--
Best Wishes,
Ashutosh Bapat
On Tue, Feb 11, 2025 at 5:53 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
Hi Michael,
On Sun, Feb 9, 2025 at 1:25 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Feb 07, 2025 at 07:11:25AM +0900, Michael Paquier wrote:
Okay, thanks for the feedback. We have been relying on diff -u for
the parts of the tests touched by 0001 for some time now, so if there
are no objections I would like to apply 0001 in a couple of days.
This part has been applied as 169208092f5c.
Thanks. PFA rebased patches.
PFA rebased patches.
After rebasing I found another bug and reported it at [1].
For the time being I have added --no-statistics to the pg_dump command
when taking a dump for comparison.
[1]: /messages/by-id/CAExHW5vf9D+8-a5_BEX3y=2y_xY9hiCxV1=C+FnxDvfprWvkng@mail.gmail.com
--
Best Wishes,
Ashutosh Bapat
Attachments:
0002-Filter-COPY-statements-with-differing-colum-20250225.patchtext/x-patch; charset=US-ASCII; name=0002-Filter-COPY-statements-with-differing-colum-20250225.patchDownload
From d4372813f92ead1a6ebb57c42acc6439c8162427 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Tue, 11 Feb 2025 16:31:10 +0530
Subject: [PATCH 2/3] Filter COPY statements with differing column order
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 10 +---
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 59 +++++++++++++++------
2 files changed, 45 insertions(+), 24 deletions(-)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 25de01615f6..2cc571219ce 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -576,9 +576,6 @@ sub test_regression_dump_restore
my $dump_file = "$tempdir/regression_dump.$format";
my $restored_db = 'regression_' . $format;
- # Even though we compare only schema from the original and the restored
- # database (See get_dump_for_comparison() for details.), we dump and
- # restore data as well to catch any errors while doing so.
$src_node->command_ok(
[
'pg_dump', "-F$format", '--no-sync',
@@ -640,13 +637,10 @@ sub get_dump_for_comparison
my $dump_adjusted = "${dumpfile}_adjusted";
- # The order of columns in COPY statements dumped from the original database
- # and that from the restored database differs. These differences are hard to
- # adjust. Hence we compare only schema dumps for now.
$node->command_ok(
[
- 'pg_dump', '-s', '--no-sync', '-d',
- $node->connstr($db), '-f', $dumpfile
+ 'pg_dump', '--no-sync', '-d', $node->connstr($db), '-f',
+ $dumpfile
],
'dump for comparison succeeded');
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
index e3e152b88fa..e00a00d1b2c 100644
--- a/src/test/perl/PostgreSQL/Test/AdjustDump.pm
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -44,22 +44,36 @@ our @EXPORT = qw(
If we take dump of the regression database left behind after running regression
tests, restore the dump, and take dump of the restored regression database, the
-outputs of both the dumps differ. Some regression tests purposefully create
-some child tables in such a way that their column orders differ from column
-orders of their respective parents. In the restored database, however, their
-column orders are same as that of their respective parents. Thus the column
+outputs of both the dumps differ in the following cases. This routine adjusts
+the given dump so that dump outputs from the original and restored database,
+respectively, match.
+
+Case 1: Some regression tests purposefully create child tables in such a way
+that the order of their inherited columns differ from column orders of their
+respective parents. In the restored database, however, the order of their
+inherited columns are same as that of their respective parents. Thus the column
orders of these child tables in the original database and those in the restored
database differ, causing difference in the dump outputs. See MergeAttributes()
-and dumpTableSchema() for details.
-
-This routine rearranges the column declarations in the relevant
-C<CREATE TABLE... INHERITS> statements in the dump file from original database
-to match those from the restored database. We could instead adjust the
-statements in the dump from the restored database to match those from original
-database or adjust both to a canonical order. But we have chosen to adjust the
-statements in the dump from original database for no particular reason.
-
-Additionally it adjusts blank and new lines to avoid noise.
+and dumpTableSchema() for details. This routine rearranges the column
+declarations in the relevant C<CREATE TABLE... INHERITS> statements in the dump
+file from original database to match those from the restored database. We could,
+instead, adjust the statements in the dump from the restored database to match
+those from original database or adjust both to a canonical order. But we have
+chosen to adjust the statements in the dump from original database for no
+particular reason.
+
+Case 2: When dumping COPY statements the columns are ordered by their attribute
+number by fmtCopyColumnList(). If a column is added to a parent table after a
+child has inherited the parent and the child has its own columns, the attribute
+number of the column changes after restoring the child table. This is because
+when executing the dumped C<CREATE TABLE... INHERITS> statement all the parent
+attributes are created before any child attributes. Thus the order of columns in
+COPY statements dumped from the original and the restored databases,
+respectively, differs. Such tables in regression tests are listed below. It is
+hard to adjust the column order in the COPY statement along with the data. Hence
+we just remove such COPY statements from the dump output.
+
+Additionally the routine adjusts blank and new lines to avoid noise.
Arguments:
@@ -84,8 +98,6 @@ sub adjust_regress_dumpfile
# use Unix newlines
$dump =~ s/\r\n/\n/g;
- # Suppress blank lines, as some places in pg_dump emit more or fewer.
- $dump =~ s/\n\n+/\n/g;
# Adjust the CREATE TABLE ... INHERITS statements.
if ($adjust_child_columns)
@@ -122,6 +134,21 @@ sub adjust_regress_dumpfile
'applied public.test_type_diff2_c2 adjustments');
}
+ # Remove COPY statements with differing column order
+ for my $table (
+ 'public\.b_star', 'public\.c_star',
+ 'public\.cc2', 'public\.d_star',
+ 'public\.e_star', 'public\.f_star',
+ 'public\.renamecolumnanother', 'public\.renamecolumnchild',
+ 'public\.test_type_diff2_c1', 'public\.test_type_diff2_c2',
+ 'public\.test_type_diff_c')
+ {
+ $dump =~ s/^COPY\s$table\s\(.+?^\\\.$//sm;
+ }
+
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
return $dump;
}
--
2.34.1
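The COPY-filtering substitution added by the 0002 patch above is a single Perl s///sm over the whole dump text per table. The same logic, sketched in Python for a self-contained illustration (the sample text and table names here are made up, and the table names are regex-escaped, as in the patch):

```python
import re

def remove_copy_blocks(dump, tables):
    # Mirror the patch's s/^COPY\s$table\s\(.+?^\\\.$//sm for each table:
    # re.S lets "." cross newlines, re.M anchors ^ and $ at line
    # boundaries, and the non-greedy .+? stops at the first "\." line
    # that terminates the COPY data.
    for table in tables:
        dump = re.sub(r'^COPY\s' + table + r'\s\(.+?^\\\.$', '', dump,
                      flags=re.S | re.M)
    # As in the patch, collapse the leftover blank lines only afterwards,
    # so the removal regex still sees the original line layout.
    return re.sub(r'\n\n+', '\n', dump)
```

This also shows why the patch moves the blank-line suppression after the COPY removal: removing a COPY block leaves consecutive newlines behind, which the final substitution then cleans up.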
0001-Test-pg_dump-restore-of-regression-objects-20250225.patchtext/x-patch; charset=US-ASCII; name=0001-Test-pg_dump-restore-of-regression-objects-20250225.patchDownload
From 9c45faedda92901bf7638c4fee42b397b802be96 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH 1/3] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade of the regression database left
behind by regression run. Modify it to test dump and restore of the
regression database as well.
Regression database created by regression run contains almost all the
database objects supported by PostgreSQL in various states. Hence the
new testcase covers dump and restore scenarios not covered by individual
dump/restore cases. Till now 002_pg_upgrade only tested dump/restore
through pg_upgrade which only uses binary mode. Many regression tests
mention that they leave objects behind for dump/restore testing but they
are not tested in a non-binary mode. The new testcase closes that
gap.
Testing dump and restore of regression database makes this test run
longer for a relatively smaller benefit. Hence run it only when
explicitly requested by user by specifying "regress_dump_test" in
PG_TEST_EXTRA.
Note for the reviewers:
The new test has uncovered two bugs so far in one year.
1. Introduced by 14e87ffa5c54. Fixed in fd41ba93e4630921a72ed5127cd0d552a8f3f8fc.
2. Introduced by 0413a556990ba628a3de8a0b58be020fd9a14ed0. Reverted in 74563f6b90216180fc13649725179fc119dddeb5.
Author: Ashutosh Bapat
Reviewed by: Michael Paquier, Daniel Gustafsson, Tom Lane
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
doc/src/sgml/regress.sgml | 12 ++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 140 +++++++++++++++++++-
src/test/perl/Makefile | 2 +
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 134 +++++++++++++++++++
src/test/perl/meson.build | 1 +
5 files changed, 287 insertions(+), 2 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 0e5e8e8f309..237b974b3ab 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -357,6 +357,18 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_test</literal></term>
+ <listitem>
+ <para>
+ When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
+ tests dump and restore of regression database left behind by the
+ regression run. Not enabled by default because it is time and resource
+ consuming.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 45ea94c84bb..25de01615f6 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -12,6 +12,7 @@ use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -35,8 +36,8 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
sub filter_dump
{
my ($is_old, $old_version, $dump_file) = @_;
@@ -261,6 +262,21 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate TAP test, but doing it here saves us one full
+ # regression run.
+ #
+ # This step takes several extra seconds and some extra disk space, so
+ # requires an opt-in with the PG_TEST_EXTRA environment variable.
+ #
+ # Do this while the old cluster is running before it is shut down by the
+ # upgrade test.
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
+ {
+ test_regression_dump_restore($oldnode, %node_params);
+ }
}
# Initialize a new node for the upgrade.
@@ -524,4 +540,124 @@ my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
compare_files($dump1_filtered, $dump2_filtered,
'old and new dumps match after pg_upgrade');
+# Test dump and restore of objects left behind by the regression run.
+#
+# It is expected that regression tests, which create `regression` database, are
+# run on `src_node`, which in turn, is left in running state. The dump from
+# `src_node` is restored on a fresh node created using given `node_params`.
+# Plain dumps from both the nodes are compared to make sure that all the dumped
+# objects are restored faithfully.
+sub test_regression_dump_restore
+{
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+
+ # Make sure that the source and destination nodes have the same version and
+ # do not use custom install paths. In both the cases, the dump files may
+ # require additional adjustments unknown to code here. Do not run this test
+ # in such a case to avoid utilizing the time and resources unnecessarily.
+ if ($src_node->pg_version != $dst_node->pg_version
+ or defined $src_node->{_install_path})
+ {
+ fail("same version dump and restore test using default installation");
+ return;
+ }
+
+ # Dump the original database for comparison later.
+ my $src_dump =
+ get_dump_for_comparison($src_node, 'regression', 'src_dump', 1);
+
+ # Setup destination database cluster
+ $dst_node->init(%node_params);
+ $dst_node->start;
+
+ for my $format ('plain', 'tar', 'directory', 'custom')
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $restored_db = 'regression_' . $format;
+
+ # Even though we compare only schema from the original and the restored
+ # database (See get_dump_for_comparison() for details.), we dump and
+ # restore data as well to catch any errors while doing so.
+ $src_node->command_ok(
+ [
+ 'pg_dump', "-F$format", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
+
+ # Create a new database for restoring dump from every format so that it
+ # is available for debugging in case the test fails.
+ $dst_node->command_ok([ 'createdb', $restored_db ],
+ "created destination database '$restored_db'");
+
+ # Restore into destination database.
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [
+ 'psql', '-d', $dst_node->connstr($restored_db),
+ '-f', $dump_file
+ ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '-d',
+ $dst_node->connstr($restored_db), $dump_file
+ ];
+ }
+ $dst_node->command_ok(@restore_command,
+ "restored dump taken in $format format on destination instance");
+
+ my $dst_dump =
+ get_dump_for_comparison($dst_node, $restored_db,
+ 'dest_dump.' . $format, 0);
+
+ compare_files($src_dump, $dst_dump,
+ "dump outputs from original and restored regression database (using $format format) match"
+ );
+ }
+}
+
+# Dump database `db` from the given `node` in plain format and adjust it for
+# comparing dumps from the original and the restored database.
+#
+# `file_prefix` is used to create unique names for all dump files so that they
+# remain available for debugging in case the test fails.
+#
+# `adjust_child_columns` is passed to adjust_regress_dumpfile() which actually
+# adjusts the dump output.
+#
+# The name of the file containing the adjusted dump is returned.
+sub get_dump_for_comparison
+{
+ my ($node, $db, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+
+ # The order of columns in COPY statements dumped from the original database
+ # and that from the restored database differs. These differences are hard to
+ # adjust. Hence we compare only schema dumps for now.
+ $node->command_ok(
+ [
+ 'pg_dump', '-s', '--no-sync', '-d',
+ $node->connstr($db), '-f', $dumpfile
+ ],
+ 'dump for comparison succeeded');
+
+ open(my $dh, '>', $dump_adjusted)
+ || die
+ "could not open $dump_adjusted for writing the adjusted dump: $!";
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
+}
+
done_testing();
diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile
index d82fb67540e..def89650ead 100644
--- a/src/test/perl/Makefile
+++ b/src/test/perl/Makefile
@@ -26,6 +26,7 @@ install: all installdirs
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Cluster.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/BackgroundPsql.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustUpgrade.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustDump.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Version.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
uninstall:
@@ -36,6 +37,7 @@ uninstall:
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
endif
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..e3e152b88fa
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,134 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $adjust_child_columns);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and restore tests
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+use Test::More;
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $adjust_child_columns)
+
+If we take dump of the regression database left behind after running regression
+tests, restore the dump, and take dump of the restored regression database, the
+outputs of both the dumps differ. Some regression tests purposefully create
+some child tables in such a way that their column orders differ from column
+orders of their respective parents. In the restored database, however, their
+column orders are same as that of their respective parents. Thus the column
+orders of these child tables in the original database and those in the restored
+database differ, causing difference in the dump outputs. See MergeAttributes()
+and dumpTableSchema() for details.
+
+This routine rearranges the column declarations in the relevant
+C<CREATE TABLE... INHERITS> statements in the dump file from original database
+to match those from the restored database. We could instead adjust the
+statements in the dump from the restored database to match those from original
+database or adjust both to a canonical order. But we have chosen to adjust the
+statements in the dump from original database for no particular reason.
+
+Additionally it adjusts blank and new lines to avoid noise.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<adjust_child_columns>: 1 indicates that the given dump file requires
+adjusting columns in the child tables; usually when the dump is from original
+database. 0 indicates no such adjustment is needed; usually when the dump is
+from restored database.
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $adjust_child_columns) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($adjust_child_columns)
+ {
+ my $saved_dump = $dump;
+
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_stored_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_virtual_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_virtual_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c1 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c2 adjustments');
+ }
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
diff --git a/src/test/perl/meson.build b/src/test/perl/meson.build
index 58e30f15f9d..492ca571ff8 100644
--- a/src/test/perl/meson.build
+++ b/src/test/perl/meson.build
@@ -14,4 +14,5 @@ install_data(
'PostgreSQL/Test/Cluster.pm',
'PostgreSQL/Test/BackgroundPsql.pm',
'PostgreSQL/Test/AdjustUpgrade.pm',
+ 'PostgreSQL/Test/AdjustDump.pm',
install_dir: dir_pgxs / 'src/test/perl/PostgreSQL/Test')
base-commit: 5b8f2ccc0a93375acb64a457817e61f400404a1f
--
2.34.1
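The child-column adjustments in the AdjustDump.pm module added by the patch above are capture-and-swap substitutions over the dump text. One of them is sketched here in Python for a self-contained illustration (the DDL string is a made-up sample shaped like pg_dump output; the real code is the Perl s///mgx in adjust_regress_dumpfile()):

```python
import re

# Sample shaped like pg_dump output for an INHERITS child whose own
# column "b" precedes the inherited column "a" in the original database.
ddl = ("CREATE TABLE generated_stored_tests.gtestxx_4 (\n"
       "    b integer,\n"
       "    a integer NOT NULL\n"
       ");\n")

# Capture the CREATE TABLE header and the two column declarations, then
# emit them with the inherited column first, matching the order a dump
# of the restored database would produce.
adjusted = re.sub(
    r'(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()'
    r'(\n\s+b\sinteger),'
    r'(\n\s+a\sinteger\sNOT\sNULL)',
    r'\1\3,\2',
    ddl, flags=re.M)
```

The Perl version additionally asserts via ok() that each substitution actually changed the text, so a regression-test change that renames or drops one of these tables fails loudly instead of silently skipping the adjustment.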
0003-Do-not-dump-statistics-in-the-file-dumped-f-20250225.patchtext/x-patch; charset=US-ASCII; name=0003-Do-not-dump-statistics-in-the-file-dumped-f-20250225.patchDownload
From 996e175a17ff406373560134bcc5c657bc92a643 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Tue, 25 Feb 2025 11:42:51 +0530
Subject: [PATCH 3/3] Do not dump statistics in the file dumped for comparison
As reported at [1], the dumped and restored statistics may differ if there's a
primary key on the table. Hence do not dump the statistics to avoid differences
in the dump output from the original and restored database.
[1] https://www.postgresql.org/message-id/CAExHW5vf9D+8-a5_BEX3y=2y_xY9hiCxV1=C+FnxDvfprWvkng@mail.gmail.com
Ashutosh Bapat
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 2cc571219ce..f3892f7150d 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -639,7 +639,7 @@ sub get_dump_for_comparison
$node->command_ok(
[
- 'pg_dump', '--no-sync', '-d', $node->connstr($db), '-f',
+ 'pg_dump', '--no-sync', '--no-statistics', '-d', $node->connstr($db), '-f',
$dumpfile
],
'dump for comparison succeeded');
--
2.34.1
On Tue, Feb 25, 2025 at 11:59 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Tue, Feb 11, 2025 at 5:53 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
Hi Michael,
On Sun, Feb 9, 2025 at 1:25 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Feb 07, 2025 at 07:11:25AM +0900, Michael Paquier wrote:
Okay, thanks for the feedback. We have been relying on diff -u for
the parts of the tests touched by 0001 for some time now, so if there
are no objections I would like to apply 0001 in a couple of days.This part has been applied as 169208092f5c.
Thanks. PFA rebased patches.
PFA rebased patches.
After rebasing I found another bug and reported it at [1].
This bug has been fixed. But now that it's fixed, it's easy to see
another bug related to materialized view statistics. I have reported
it at [2]. That's the fourth bug identified by this test.
For the time being I have added --no-statistics to the pg_dump command
when taking a dump for comparison.
I have not taken out this option because of the materialized view bug.
[1]: /messages/by-id/CAExHW5vf9D+8-a5_BEX3y=2y_xY9hiCxV1=C+FnxDvfprWvkng@mail.gmail.com
[2]: /messages/by-id/CAExHW5s47kmubpbbRJzSM-Zfe0Tj2O3GBagB7YAyE8rQ-V24Uw@mail.gmail.com
--
Best Wishes,
Ashutosh Bapat
Attachments:
0002-Filter-COPY-statements-with-differing-colum-20250311.patchtext/x-patch; charset=US-ASCII; name=0002-Filter-COPY-statements-with-differing-colum-20250311.patchDownload
From a140d50245249894a49c39a908163e9bac2fe4bb Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Tue, 11 Feb 2025 16:31:10 +0530
Subject: [PATCH 2/3] Filter COPY statements with differing column order
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 10 +---
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 59 +++++++++++++++------
2 files changed, 45 insertions(+), 24 deletions(-)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index c6b99125d9e..e6d8ac9a757 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -581,9 +581,6 @@ sub test_regression_dump_restore
my $dump_file = "$tempdir/regression_dump.$format";
my $restored_db = 'regression_' . $format;
- # Even though we compare only schema from the original and the restored
- # database (See get_dump_for_comparison() for details.), we dump and
- # restore data as well to catch any errors while doing so.
$src_node->command_ok(
[
'pg_dump', "-F$format", '--no-sync',
@@ -645,13 +642,10 @@ sub get_dump_for_comparison
my $dump_adjusted = "${dumpfile}_adjusted";
- # The order of columns in COPY statements dumped from the original database
- # and that from the restored database differs. These differences are hard to
- # adjust. Hence we compare only schema dumps for now.
$node->command_ok(
[
- 'pg_dump', '-s', '--no-sync', '-d',
- $node->connstr($db), '-f', $dumpfile
+ 'pg_dump', '--no-sync', '-d', $node->connstr($db), '-f',
+ $dumpfile
],
'dump for comparison succeeded');
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
index e3e152b88fa..e00a00d1b2c 100644
--- a/src/test/perl/PostgreSQL/Test/AdjustDump.pm
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -44,22 +44,36 @@ our @EXPORT = qw(
If we take dump of the regression database left behind after running regression
tests, restore the dump, and take dump of the restored regression database, the
-outputs of both the dumps differ. Some regression tests purposefully create
-some child tables in such a way that their column orders differ from column
-orders of their respective parents. In the restored database, however, their
-column orders are same as that of their respective parents. Thus the column
+outputs of both the dumps differ in the following cases. This routine adjusts
+the given dump so that dump outputs from the original and restored database,
+respectively, match.
+
+Case 1: Some regression tests purposefully create child tables in such a way
+that the order of their inherited columns differ from column orders of their
+respective parents. In the restored database, however, the order of their
+inherited columns are same as that of their respective parents. Thus the column
orders of these child tables in the original database and those in the restored
database differ, causing difference in the dump outputs. See MergeAttributes()
-and dumpTableSchema() for details.
-
-This routine rearranges the column declarations in the relevant
-C<CREATE TABLE... INHERITS> statements in the dump file from original database
-to match those from the restored database. We could instead adjust the
-statements in the dump from the restored database to match those from original
-database or adjust both to a canonical order. But we have chosen to adjust the
-statements in the dump from original database for no particular reason.
-
-Additionally it adjusts blank and new lines to avoid noise.
+and dumpTableSchema() for details. This routine rearranges the column
+declarations in the relevant C<CREATE TABLE... INHERITS> statements in the dump
+file from original database to match those from the restored database. We could,
+instead, adjust the statements in the dump from the restored database to match
+those from original database or adjust both to a canonical order. But we have
+chosen to adjust the statements in the dump from original database for no
+particular reason.
+
+Case 2: When dumping COPY statements the columns are ordered by their attribute
+number by fmtCopyColumnList(). If a column is added to a parent table after a
+child has inherited the parent and the child has its own columns, the attribute
+number of the column changes after restoring the child table. This is because
+when executing the dumped C<CREATE TABLE... INHERITS> statement all the parent
+attributes are created before any child attributes. Thus the order of columns in
+COPY statements dumped from the original and the restored databases,
+respectively, differs. Such tables in regression tests are listed below. It is
+hard to adjust the column order in the COPY statement along with the data. Hence
+we just remove such COPY statements from the dump output.
+
+Additionally the routine adjusts blank and new lines to avoid noise.
Arguments:
@@ -84,8 +98,6 @@ sub adjust_regress_dumpfile
# use Unix newlines
$dump =~ s/\r\n/\n/g;
- # Suppress blank lines, as some places in pg_dump emit more or fewer.
- $dump =~ s/\n\n+/\n/g;
# Adjust the CREATE TABLE ... INHERITS statements.
if ($adjust_child_columns)
@@ -122,6 +134,21 @@ sub adjust_regress_dumpfile
'applied public.test_type_diff2_c2 adjustments');
}
+ # Remove COPY statements with differing column order
+ for my $table (
+ 'public\.b_star', 'public\.c_star',
+ 'public\.cc2', 'public\.d_star',
+ 'public\.e_star', 'public\.f_star',
+ 'public\.renamecolumnanother', 'public\.renamecolumnchild',
+ 'public\.test_type_diff2_c1', 'public\.test_type_diff2_c2',
+ 'public\.test_type_diff_c')
+ {
+ $dump =~ s/^COPY\s$table\s\(.+?^\\\.$//sm;
+ }
+
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
return $dump;
}
--
2.34.1
0001-Test-pg_dump-restore-of-regression-objects-20250311.patchtext/x-patch; charset=US-ASCII; name=0001-Test-pg_dump-restore-of-regression-objects-20250311.patchDownload
From 25c4c7e4ee754dd989d3fd8f015c7355fd9992d6 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH 1/3] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade of the regression database left
behind by regression run. Modify it to test dump and restore of the
regression database as well.
Regression database created by regression run contains almost all the
database objects supported by PostgreSQL in various states. Hence the
new testcase covers dump and restore scenarios not covered by individual
dump/restore cases. Till now 002_pg_upgrade only tested dump/restore
through pg_upgrade which only uses binary mode. Many regression tests
mention that they leave objects behind for dump/restore testing but they
are not tested in a non-binary mode. The new testcase closes that
gap.
Testing dump and restore of regression database makes this test run
longer for a relatively smaller benefit. Hence run it only when
explicitly requested by user by specifying "regress_dump_test" in
PG_TEST_EXTRA.
Note for the reviewers:
The new test has uncovered two bugs so far in one year.
1. Introduced by 14e87ffa5c54. Fixed in fd41ba93e4630921a72ed5127cd0d552a8f3f8fc.
2. Introduced by 0413a556990ba628a3de8a0b58be020fd9a14ed0. Reverted in 74563f6b90216180fc13649725179fc119dddeb5.
Author: Ashutosh Bapat
Reviewed by: Michael Paquier, Daniel Gustafsson, Tom Lane
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
doc/src/sgml/regress.sgml | 12 ++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 142 +++++++++++++++++++-
src/test/perl/Makefile | 2 +
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 134 ++++++++++++++++++
src/test/perl/meson.build | 1 +
5 files changed, 289 insertions(+), 2 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 0e5e8e8f309..237b974b3ab 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -357,6 +357,18 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_test</literal></term>
+ <listitem>
+ <para>
+ When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
+ tests dump and restore of regression database left behind by the
+ regression run. Not enabled by default because it is time and resource
+ consuming.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index c00cf68d660..c6b99125d9e 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -12,6 +12,7 @@ use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -35,8 +36,8 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
sub filter_dump
{
my ($is_old, $old_version, $dump_file) = @_;
@@ -261,6 +262,21 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate TAP test, but doing it here saves us one full
+ # regression run.
+ #
+ # This step takes several extra seconds and some extra disk space, so
+ # requires an opt-in with the PG_TEST_EXTRA environment variable.
+ #
+ # Do this while the old cluster is running before it is shut down by the
+ # upgrade test.
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
+ {
+ test_regression_dump_restore($oldnode, %node_params);
+ }
}
# Initialize a new node for the upgrade.
@@ -527,4 +543,126 @@ my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
compare_files($dump1_filtered, $dump2_filtered,
'old and new dumps match after pg_upgrade');
+# Test dump and restore of objects left behind by the regression run.
+#
+# It is expected that the regression tests, which create the `regression`
+# database, have been run on `src_node`, which in turn is left running. The
+# dump from `src_node` is restored on a fresh node created using the given
+# `node_params`. Plain dumps from both nodes are compared to make sure that
+# all the dumped objects are restored faithfully.
+sub test_regression_dump_restore
+{
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+
+ # Make sure that the source and destination nodes have the same version and
+ # do not use custom install paths. In both cases, the dump files may require
+ # additional adjustments unknown to the code here. Do not run this test in
+ # such a case, to avoid spending time and resources unnecessarily.
+ if ($src_node->pg_version != $dst_node->pg_version
+ or defined $src_node->{_install_path})
+ {
+ fail("same version dump and restore test using default installation");
+ return;
+ }
+
+ # Dump the original database for comparison later.
+ my $src_dump =
+ get_dump_for_comparison($src_node, 'regression', 'src_dump', 1);
+
+ # Setup destination database cluster
+ $dst_node->init(%node_params);
+ # Stabilize stats for comparison.
+ $dst_node->append_conf('postgresql.conf', 'autovacuum = off');
+ $dst_node->start;
+
+ for my $format ('plain', 'tar', 'directory', 'custom')
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $restored_db = 'regression_' . $format;
+
+ # Even though we compare only the schemas of the original and the restored
+ # database (see get_dump_for_comparison() for details), we dump and
+ # restore the data as well to catch any errors while doing so.
+ $src_node->command_ok(
+ [
+ 'pg_dump', "-F$format", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
+
+ # Create a new database for restoring the dump of each format so that it
+ # remains available for debugging in case the test fails.
+ $dst_node->command_ok([ 'createdb', $restored_db ],
+ "created destination database '$restored_db'");
+
+ # Restore into destination database.
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [
+ 'psql', '-d', $dst_node->connstr($restored_db),
+ '-f', $dump_file
+ ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '-d',
+ $dst_node->connstr($restored_db), $dump_file
+ ];
+ }
+ $dst_node->command_ok(@restore_command,
+ "restored dump taken in $format format on destination instance");
+
+ my $dst_dump =
+ get_dump_for_comparison($dst_node, $restored_db,
+ 'dest_dump.' . $format, 0);
+
+ compare_files($src_dump, $dst_dump,
+ "dump outputs from original and restored regression database (using $format format) match"
+ );
+ }
+}
+
+# Dump database `db` from the given `node` in plain format and adjust it for
+# comparing dumps from the original and the restored database.
+#
+# `file_prefix` is used to create unique names for all dump files so that they
+# remain available for debugging in case the test fails.
+#
+# `adjust_child_columns` is passed to adjust_regress_dumpfile() which actually
+# adjusts the dump output.
+#
+# The name of the file containing the adjusted dump is returned.
+sub get_dump_for_comparison
+{
+ my ($node, $db, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+
+ # The orders of columns in COPY statements dumped from the original and
+ # the restored databases differ. These differences are hard to adjust;
+ # hence we compare only schema dumps for now.
+ $node->command_ok(
+ [
+ 'pg_dump', '-s', '--no-sync', '-d',
+ $node->connstr($db), '-f', $dumpfile
+ ],
+ 'dump for comparison succeeded');
+
+ open(my $dh, '>', $dump_adjusted)
+ || die
+ "could not open $dump_adjusted for writing the adjusted dump: $!";
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
+}
+
done_testing();
diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile
index d82fb67540e..def89650ead 100644
--- a/src/test/perl/Makefile
+++ b/src/test/perl/Makefile
@@ -26,6 +26,7 @@ install: all installdirs
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Cluster.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/BackgroundPsql.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustUpgrade.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustDump.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Version.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
uninstall:
@@ -36,6 +37,7 @@ uninstall:
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
endif
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..e3e152b88fa
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,134 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $adjust_child_columns);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and restore tests
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+use Test::More;
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $adjust_child_columns)
+
+If we take a dump of the regression database left behind after running the
+regression tests, restore it, and take a dump of the restored database, the
+outputs of the two dumps differ. Some regression tests purposefully create
+child tables in such a way that their column orders differ from the column
+orders of their respective parents. In the restored database, however, their
+column orders are the same as those of their respective parents. Thus the
+column orders of these child tables in the original and the restored database
+differ, causing differences in the dump outputs. See MergeAttributes() and
+dumpTableSchema() for details.
+
+This routine rearranges the column declarations in the relevant
+C<CREATE TABLE... INHERITS> statements in the dump file from original database
+to match those from the restored database. We could instead adjust the
+statements in the dump from the restored database to match those from original
+database or adjust both to a canonical order. But we have chosen to adjust the
+statements in the dump from original database for no particular reason.
+
+Additionally, it adjusts blank lines and newlines to avoid noise.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<adjust_child_columns>: 1 indicates that the given dump file requires
+adjusting columns in the child tables, usually when the dump is from the
+original database. 0 indicates that no such adjustment is needed, usually
+when the dump is from the restored database.
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $adjust_child_columns) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($adjust_child_columns)
+ {
+ my $saved_dump = $dump;
+
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_stored_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_virtual_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_virtual_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c1 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c2 adjustments');
+ }
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
diff --git a/src/test/perl/meson.build b/src/test/perl/meson.build
index 58e30f15f9d..492ca571ff8 100644
--- a/src/test/perl/meson.build
+++ b/src/test/perl/meson.build
@@ -14,4 +14,5 @@ install_data(
'PostgreSQL/Test/Cluster.pm',
'PostgreSQL/Test/BackgroundPsql.pm',
'PostgreSQL/Test/AdjustUpgrade.pm',
+ 'PostgreSQL/Test/AdjustDump.pm',
install_dir: dir_pgxs / 'src/test/perl/PostgreSQL/Test')
base-commit: dabccf45139a8c7c3c2e7683a943c31077e55a78
--
2.34.1
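For illustration, the substitution technique used in AdjustDump.pm can be exercised standalone. The following is just a sketch mirroring one of the /mgx regexes in the patch on a toy dump fragment; it is not part of the patch itself:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy dump fragment with the child table's columns in original-database order.
my $dump = "CREATE TABLE public.test_type_diff2_c1 (\n"
  . "    int_four bigint,\n"
  . "    int_eight bigint,\n"
  . "    int_two smallint\n"
  . ")\n";

# Same substitution as in AdjustDump.pm: move int_two in front of the other
# columns, matching the column order seen in the restored database.
$dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
    (\n\s+int_four\sbigint),
    (\n\s+int_eight\sbigint),
    (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;

print $dump;
# int_two now comes first, followed by int_four and int_eight.
```

The /x modifier lets the pattern be split across lines for readability, while /m anchors ^ at each line start of the multi-line dump text.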
Attachment: 0003-Do-not-dump-statistics-in-the-file-dumped-f-20250311.patch (text/x-patch; charset=US-ASCII)
From 4258ed1bcad537418c4c3f4ba0e3712ec515e09e Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Tue, 25 Feb 2025 11:42:51 +0530
Subject: [PATCH 3/3] Do not dump statistics in the file dumped for comparison
As reported at [1], the dumped and restored statistics may differ if there's a
primary key on the table. Hence do not dump statistics, to avoid differences
between the dump outputs of the original and restored databases.
[1] https://www.postgresql.org/message-id/CAExHW5vf9D+8-a5_BEX3y=2y_xY9hiCxV1=C+FnxDvfprWvkng@mail.gmail.com
Ashutosh Bapat
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index e6d8ac9a757..8924cf8344a 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -644,7 +644,7 @@ sub get_dump_for_comparison
$node->command_ok(
[
- 'pg_dump', '--no-sync', '-d', $node->connstr($db), '-f',
+ 'pg_dump', '--no-sync', '--no-statistics', '-d', $node->connstr($db), '-f',
$dumpfile
],
'dump for comparison succeeded');
--
2.34.1
Hello
When running these tests, I encounter this strange diff in the dumps,
which seems to be that the locale for type money does not match. I
imagine the problem is that the locale is not set correctly when
initdb'ing one of them? Grepping the regress_log for initdb, I see
this:
$ grep -B1 'Running: initdb' tmp_check/log/regress_log_002_pg_upgrade
[13:00:57.580](0.003s) # initializing database system by running initdb
# Running: initdb -D /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_old_node_data/pgdata -A trust -N --wal-segsize 1 --allow-group-access --encoding UTF-8 --lc-collate C --lc-ctype C --locale-provider builtin --builtin-locale C.UTF-8 -k
--
[13:01:12.879](0.044s) # initializing database system by running initdb
# Running: initdb -D /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_dst_node_data/pgdata -A trust -N --wal-segsize 1 --allow-group-access --encoding UTF-8 --lc-collate C --lc-ctype C --locale-provider builtin --builtin-locale C.UTF-8 -k
--
[13:01:28.000](0.033s) # initializing database system by running initdb
# Running: initdb -D /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_new_node_data/pgdata -A trust -N --wal-segsize 1 --allow-group-access --encoding SQL_ASCII --locale-provider libc
[12:50:31.838](0.102s) not ok 15 - dump outputs from original and restored regression database (using plain format) match
[12:50:31.839](0.000s)
[12:50:31.839](0.000s) # Failed test 'dump outputs from original and restored regression database (using plain format) match'
# at /pgsql/source/master/src/test/perl/PostgreSQL/Test/Utils.pm line 797.
[12:50:31.839](0.000s) # got: '1'
# expected: '0'
=== diff of /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/tmp_test_vVew/src_dump.sql_adjusted and /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/tmp_test_vVew/dest_dump.plain.sql_adjusted
=== stdout ===
--- /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/tmp_test_vVew/src_dump.sql_adjusted 2025-03-12 12:50:27.674918597 +0100
+++ /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/tmp_test_vVew/dest_dump.plain.sql_adjusted 2025-03-12 12:50:31.778840338 +0100
@@ -208972,7 +208972,7 @@
-- Data for Name: money_data; Type: TABLE DATA; Schema: public; Owner: alvherre
--
COPY public.money_data (m) FROM stdin;
-$123.46
+$ 12.346,00
\.
--
-- Data for Name: mvtest_t; Type: TABLE DATA; Schema: public; Owner: alvherre
@@ -376231,7 +376231,7 @@
-- Data for Name: tab_core_types; Type: TABLE DATA; Schema: public; Owner: alvherre
--
COPY public.tab_core_types (point, line, lseg, box, openedpath, closedpath, polygon, circle, date, "time", "timestamp", timetz, timestamptz, "interval", "json", jsonb, jsonpath, inet, cidr, macaddr8, macaddr, int2, int4, int8, float4, float8, pi, "char", bpchar, "varchar", name, text, bool, bytea, "bit", varbit, money, refcursor, int2vector, oidvector, aclitem, tsvector, tsquery, uuid, xid8, regclass, type, regrole, oid, tid, xid, cid, txid_snapshot, pg_snapshot, pg_lsn, cardinal_number, character_data, sql_identifier, time_stamp, yes_or_no, int4range, int4multirange, int8range, int8multirange, numrange, nummultirange, daterange, datemultirange, tsrange, tsmultirange, tstzrange, tstzmultirange) FROM stdin;
-(11,12) {1,-1,0} [(11,11),(12,12)] (13,13),(11,11) ((11,12),(13,13),(14,14)) [(11,12),(13,13),(14,14)] ((11,12),(13,13),(14,14)) <(1,1),1> 2025-03-12 04:50:14.125899 2025-03-12 04:50:14.125899 04:50:14.125899-07 2025-03-12 12:50:14.125899+01 00:00:12 {"reason":"because"} {"when": "now"} $."a"[*]?(@ > 2) 127.0.0.1 127.0.0.0/8 00:01:03:ff:fe:86:1c:ba 00:01:03:86:1c:ba 2 4 8 4 8 3.14159265358979 f c abc name txt t \\xdeadbeef 1 10001 $12.34 abc 1 2 1 2 alvherre=UC/alvherre 'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat' 'fat' & 'rat' a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11 11 pg_class regtype pg_monitor 1259 (1,1) 2 3 10:20:10,14,15 10:20:10,14,15 16/B374D848 1 l n 2025-03-12 12:50:14.13+01 YES empty {} empty {} (3,4) {(3,4)} [2020-01-03,2021-02-03) {[2020-01-03,2021-02-03)} ("2020-01-02 03:04:05","2021-02-03 06:07:08") {("2020-01-02 03:04:05","2021-02-03 06:07:08")} ("2020-01-02 12:04:05+01","2021-02-03 15:07:08+01") {("2020-01-02 12:04:05+01","2021-02-03 15:07:08+01")}
+(11,12) {1,-1,0} [(11,11),(12,12)] (13,13),(11,11) ((11,12),(13,13),(14,14)) [(11,12),(13,13),(14,14)] ((11,12),(13,13),(14,14)) <(1,1),1> 2025-03-12 04:50:14.125899 2025-03-12 04:50:14.125899 04:50:14.125899-07 2025-03-12 12:50:14.125899+01 00:00:12 {"reason":"because"} {"when": "now"} $."a"[*]?(@ > 2) 127.0.0.1 127.0.0.0/8 00:01:03:ff:fe:86:1c:ba 00:01:03:86:1c:ba 2 4 8 4 8 3.14159265358979 f c abc name txt t \\xdeadbeef 1 10001 $ 1.234,00 abc 1 2 1 2 alvherre=UC/alvherre 'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat' 'fat' & 'rat' a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11 11 pg_class regtype pg_monitor 1259 (1,1) 2 3 10:20:10,14,15 10:20:10,14,15 16/B374D848 1 l n 2025-03-12 12:50:14.13+01 YES empty {} empty {} (3,4) {(3,4)} [2020-01-03,2021-02-03) {[2020-01-03,2021-02-03)} ("2020-01-02 03:04:05","2021-02-03 06:07:08") {("2020-01-02 03:04:05","2021-02-03 06:07:08")} ("2020-01-02 12:04:05+01","2021-02-03 15:07:08+01") {("2020-01-02 12:04:05+01","2021-02-03 15:07:08+01")}
\.
--
-- Data for Name: tableam_parted_a_heap2; Type: TABLE DATA; Schema: public; Owner: alvherre=== stderr ===
=== EOF ===
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"¿Qué importan los años? Lo que realmente importa es comprobar que
a fin de cuentas la mejor edad de la vida es estar vivo" (Mafalda)
On Wed, Mar 12, 2025 at 5:35 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Hello
When running these tests, I encounter this strange diff in the dumps,
which seems to be that the locale for type money does not match. I
imagine the problem is that the locale is not set correctly when
initdb'ing one of them? Grepping the regress_log for initdb, I see
this:
$ grep -B1 'Running: initdb' tmp_check/log/regress_log_002_pg_upgrade
[13:00:57.580](0.003s) # initializing database system by running initdb
# Running: initdb -D /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_old_node_data/pgdata -A trust -N --wal-segsize 1 --allow-group-access --encoding UTF-8 --lc-collate C --lc-ctype C --locale-provider builtin --builtin-locale C.UTF-8 -k
--
[13:01:12.879](0.044s) # initializing database system by running initdb
# Running: initdb -D /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_dst_node_data/pgdata -A trust -N --wal-segsize 1 --allow-group-access --encoding UTF-8 --lc-collate C --lc-ctype C --locale-provider builtin --builtin-locale C.UTF-8 -k
--
[13:01:28.000](0.033s) # initializing database system by running initdb
# Running: initdb -D /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_new_node_data/pgdata -A trust -N --wal-segsize 1 --allow-group-access --encoding SQL_ASCII --locale-provider libc
The original node and the node where the dump is restored use the same
initdb command. It's the upgraded node that has a different initdb
command; but that's how the test was originally written.
However, these differences are between the original and the restored
database, which use the same initdb options.
Does the test pass for you if you don't apply my patches?
Over at [1], I had seen a locale-related failure without applying my patches.
[1]: /messages/by-id/CAExHW5s+XNiP8aPGw9=hvbjdoOG5A-QCJnDdRcKsY1rDdZe4Jw@mail.gmail.com
--
Best Wishes,
Ashutosh Bapat
On 2025-Mar-12, Ashutosh Bapat wrote:
Does the test pass for you if you don't apply my patches?
Yes. It also passes if I keep PG_TEST_EXTRA empty.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Hi Alvaro,
On Wed, Mar 12, 2025 at 9:39 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Mar-12, Ashutosh Bapat wrote:
Does the test pass for you if you don't apply my patches?
Yes. It also passes if I keep PG_TEST_EXTRA empty.
I am not able to reproduce this problem locally.
The test uses
In my case money is printed in $<digits before decimal>.<digits after
decimal> format in both dumps. But in your case the money value printed
from the restored database has a space between $ and the amount, and the
amount also has the decimal point and comma in odd places - I can't
figure out what that means or which lc_monetary value would print
something like that.
Can you please help me with the following?
1. Can you run the test again and share the dump outputs? They
will be located in a temporary directory with the names
src_dump.sql_adjusted and dest_dump.<format>.sql_adjusted.
2. Are you seeing this diff only with the plain format, or with other
formats as well?
Sorry for the trouble.
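(As an aside, the odd separators can be probed from Perl's POSIX bindings. A rough sketch, assuming the locales in question are installed on the system; locales that are missing are simply skipped:)

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_MONETARY localeconv);

# Print the monetary formatting knobs for a few locales, to see which one
# would produce a "$ 12.346,00"-style rendering.
for my $loc ('C', 'en_US.UTF-8', 'es_CL.UTF-8') {
    next unless defined setlocale(LC_MONETARY, $loc);
    my $lc = localeconv();
    printf "%-12s symbol='%s' decimal='%s' thousands='%s'\n", $loc,
      $lc->{currency_symbol}   // '',
      $lc->{mon_decimal_point} // '',
      $lc->{mon_thousands_sep} // '';
}
```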
--
Best Wishes,
Ashutosh Bapat
Hello
On 2025-Mar-13, Ashutosh Bapat wrote:
1. can you please run the test again and share the dump outputs. They
will be located in a temporary directory with names
src_dump.sql_adjusted and dest_dump.<format>.sql_adjusted.
Ah, I see the problem :-) The first initdb does this:
# Running: initdb -D /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_old_node_data/pgdata -A trust -N --wal-segsize 1 --allow-group-access --encoding UTF-8 --lc-collate C --lc-ctype C --locale-provider builtin --builtin-locale C.UTF-8 -k
The files belonging to this database system will be owned by user "alvherre".
This user must also own the server process.
The database cluster will be initialized with this locale configuration:
locale provider: builtin
default collation: C.UTF-8
LC_COLLATE: C
LC_CTYPE: C
LC_MESSAGES: C
LC_MONETARY: es_CL.UTF-8
LC_NUMERIC: es_CL.UTF-8
LC_TIME: es_CL.UTF-8
The default text search configuration will be set to "english".
Data page checksums are enabled.
which for some reason used my environment setting for LC_MONETARY.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"But static content is just dynamic content that isn't moving!"
http://smylers.hates-software.com/2007/08/15/fe244d0c.html
On Thu, Mar 13, 2025 at 2:12 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Hello
On 2025-Mar-13, Ashutosh Bapat wrote:
1. can you please run the test again and share the dump outputs. They
will be located in a temporary directory with names
src_dump.sql_adjusted and dest_dump.<format>.sql_adjusted.
Ah, I see the problem :-) The first initdb does this:
# Running: initdb -D /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_old_node_data/pgdata -A trust -N --wal-segsize 1 --allow-group-access --encoding UTF-8 --lc-collate C --lc-ctype C --locale-provider builtin --builtin-locale C.UTF-8 -k
The files belonging to this database system will be owned by user "alvherre".
This user must also own the server process.
The database cluster will be initialized with this locale configuration:
locale provider: builtin
default collation: C.UTF-8
LC_COLLATE: C
LC_CTYPE: C
LC_MESSAGES: C
LC_MONETARY: es_CL.UTF-8
LC_NUMERIC: es_CL.UTF-8
LC_TIME: es_CL.UTF-8
The default text search configuration will be set to "english".
Data page checksums are enabled.
which for some reason used my environment setting for LC_MONETARY.
Thanks. This is super helpful. I am able to reproduce the problem:
$ unset LC_MONETARY
$ export PG_TEST_EXTRA=regress_dump_test
$ meson test --suite setup && meson test pg_upgrade/002_pg_upgrade
... snip ...
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
72.38s 44 subtests passed
Ok: 1
Expected Fail: 0
Fail: 0
Unexpected Pass: 0
Skipped: 0
Timeout: 0
Full log written to
/home/ashutosh/work/units/pg_dump_test/build/dev/meson-logs/testlog.txt
$ export LC_MONETARY="es_CL.UTF-8"
$ meson test --suite setup && meson test pg_upgrade/002_pg_upgrade
... snip ...
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade ERROR
69.18s exit status 4
with_icu=no LD_LIBRARY_PATH=/home/ashutosh/work/units/pg_dump_test/build/dev/tmp_install//home/ashutosh/work/units/pg_dump_test/build/dev/lib/x86_64-linux-gnu REGRESS_SHLIB=/home/ashutosh/work/units/pg_dump_test/build/dev/src/test/regress/regress.so PATH=/home/ashutosh/work/units/pg_dump_test/build/dev/tmp_install//home/ashutosh/work/units/pg_dump_test/build/dev/bin:/home/ashutosh/work/units/pg_dump_test/build/dev/src/bin/pg_upgrade:/home/ashutosh/work/units/pg_dump_test/build/dev/src/bin/pg_upgrade/test:/home/ashutosh/work/units/pg_dump_test/build/dev/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin MALLOC_PERTURB_=30 share_contrib_dir=/home/ashutosh/work/units/pg_dump_test/build/dev/tmp_install//home/ashutosh/work/units/pg_dump_test/build/dev/share/postgresql/contrib PG_REGRESS=/home/ashutosh/work/units/pg_dump_test/build/dev/src/test/regress/pg_regress top_builddir=/home/ashutosh/work/units/pg_dump_test/build/dev INITDB_TEMPLATE=/home/ashutosh/work/units/pg_dump_test/build/dev/tmp_install/initdb-template /usr/bin/python3 /home/ashutosh/work/units/pg_dump_test/build/dev/../../coderoot/pg/src/tools/testwrap --basedir /home/ashutosh/work/units/pg_dump_test/build/dev --srcdir /home/ashutosh/work/units/pg_dump_test/coderoot/pg/src/bin/pg_upgrade --pg-test-extra '' --testgroup pg_upgrade --testname 002_pg_upgrade -- /usr/bin/perl -I /home/ashutosh/work/units/pg_dump_test/coderoot/pg/src/test/perl -I /home/ashutosh/work/units/pg_dump_test/coderoot/pg/src/bin/pg_upgrade /home/ashutosh/work/units/pg_dump_test/coderoot/pg/src/bin/pg_upgrade/t/002_pg_upgrade.pl
Ok: 0
Expected Fail: 0
Fail: 1
Unexpected Pass: 0
Skipped: 0
Timeout: 0
I see what's happening. If I set the LC_MONETARY environment variable
explicitly, it is picked up by initdb:
$ export LC_MONETARY="es_CL.UTF-8";rm -rf $DataDir; $BinDir/initdb -D
$DataDir -A trust -N --wal-segsize 1 --allow-group-access --encoding
UTF-8 --lc-collate C --lc-ctype C --locale-provider builtin
--builtin-locale C.UTF-8 -k
The files belonging to this database system will be owned by user "ashutosh".
This user must also own the server process.
The database cluster will be initialized with this locale configuration:
locale provider: builtin
default collation: C.UTF-8
LC_COLLATE: C
LC_CTYPE: C
LC_MESSAGES: en_US.UTF-8
LC_MONETARY: es_CL.UTF-8
LC_NUMERIC: en_US.UTF-8
LC_TIME: en_US.UTF-8
The default text search configuration will be set to "english".
If I don't set it explicitly, it's taken from the default settings:
$ unset LC_MONETARY;rm -rf $DataDir; $BinDir/initdb -D $DataDir -A
trust -N --wal-segsize 1 --allow-group-access --encoding UTF-8
--lc-collate C --lc-ctype C --locale-provider builtin --builtin-locale
C.UTF-8 -k
The files belonging to this database system will be owned by user "ashutosh".
This user must also own the server process.
The database cluster will be initialized with this locale configuration:
locale provider: builtin
default collation: C.UTF-8
LC_COLLATE: C
LC_CTYPE: C
LC_MESSAGES: en_US.UTF-8
LC_MONETARY: en_US.UTF-8
LC_NUMERIC: en_US.UTF-8
LC_TIME: en_US.UTF-8
The default text search configuration will be set to "english".
In your case, probably your default setting is es_CL.UTF-8, or you have
set LC_MONETARY explicitly in your environment.
I think the fix is to explicitly pass --lc-monetary to the old cluster
and the restored cluster. The 0003 patch in the attached patch set does
that. Please check whether it fixes the issue for you.
Additionally, we should check that the setting gets copied to the new
cluster as well, but I haven't figured out how to fetch those settings
yet. This treatment is similar to how --lc-collate and --lc-ctype are
handled. I am wondering whether we should explicitly pass
--lc-messages, --lc-time and --lc-numeric as well.
Commit 2d819a08a1cbc11364e36f816b02e33e8dcc030b introduced the builtin
locale provider and added overrides for LC_COLLATE and LC_CTYPE. But it
did not override the other LC_* settings, which I think it should have.
In a pure upgrade test, the upgraded node inherits the locale settings
of the original cluster, so this wasn't apparent. But with pg_dump
testing, the original and restored databases are independent. Hence I
think we have to override all LC_* settings by explicitly passing the
corresponding --lc-* options to initdb. Please let me know what you
think about this.
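Concretely, the kind of initdb invocation being discussed would pin every locale category on the command line, so nothing can leak in from the caller's environment. This is only a sketch; the data directory and option set here are illustrative, not the exact command from the patch:

```shell
# Sketch only: every LC_* category is pinned explicitly, so initdb
# ignores whatever locale the caller's environment happens to have.
initdb -D "$DataDir" -A trust -N --encoding UTF-8 \
    --lc-collate C --lc-ctype C \
    --lc-messages C --lc-monetary C --lc-numeric C --lc-time C \
    --locale-provider builtin --builtin-locale C.UTF-8
```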
--
Best Wishes,
Ashutosh Bapat
Here are patches missing in the previous email.
On Thu, Mar 13, 2025 at 6:09 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Thu, Mar 13, 2025 at 2:12 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Hello
On 2025-Mar-13, Ashutosh Bapat wrote:
1. can you please run the test again and share the dump outputs. They
will be located in a temporary directory with names
src_dump.sql_adjusted and dest_dump.<format>.sql_adjusted.
Ah, I see the problem :-) The first initdb does this:
# Running: initdb -D /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_old_node_data/pgdata -A trust -N --wal-segsize 1 --allow-group-access --encoding UTF-8 --lc-collate C --lc-ctype C --locale-provider builtin --builtin-locale C.UTF-8 -k
The files belonging to this database system will be owned by user "alvherre".
This user must also own the server process.
The database cluster will be initialized with this locale configuration:
locale provider: builtin
default collation: C.UTF-8
LC_COLLATE: C
LC_CTYPE: C
LC_MESSAGES: C
LC_MONETARY: es_CL.UTF-8
LC_NUMERIC: es_CL.UTF-8
LC_TIME: es_CL.UTF-8
The default text search configuration will be set to "english".
Data page checksums are enabled.
which for some reason used my environment setting for LC_MONETARY.
Thanks. This is super helpful. I am able to reproduce the problem
$ unset LC_MONETARY
$ export PG_TEST_EXTRA=regress_dump_test
$ meson test --suite setup && meson test pg_upgrade/002_pg_upgrade
... snip ...
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
72.38s 44 subtests passed
Ok: 1
Expected Fail: 0
Fail: 0
Unexpected Pass: 0
Skipped: 0
Timeout: 0
Full log written to
/home/ashutosh/work/units/pg_dump_test/build/dev/meson-logs/testlog.txt
$ export LC_MONETARY="es_CL.UTF-8"
$ meson test --suite setup && meson test pg_upgrade/002_pg_upgrade
... snip ...
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade ERROR
69.18s exit status 4
with_icu=no LD_LIBRARY_PATH=/home/ashutosh/work/units/pg_dump_test/build/dev/tmp_install//home/ashutosh/work/units/pg_dump_test/build/dev/lib/x86_64-linux-gnu REGRESS_SHLIB=/home/ashutosh/work/units/pg_dump_test/build/dev/src/test/regress/regress.so PATH=/home/ashutosh/work/units/pg_dump_test/build/dev/tmp_install//home/ashutosh/work/units/pg_dump_test/build/dev/bin:/home/ashutosh/work/units/pg_dump_test/build/dev/src/bin/pg_upgrade:/home/ashutosh/work/units/pg_dump_test/build/dev/src/bin/pg_upgrade/test:/home/ashutosh/work/units/pg_dump_test/build/dev/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin MALLOC_PERTURB_=30 share_contrib_dir=/home/ashutosh/work/units/pg_dump_test/build/dev/tmp_install//home/ashutosh/work/units/pg_dump_test/build/dev/share/postgresql/contrib PG_REGRESS=/home/ashutosh/work/units/pg_dump_test/build/dev/src/test/regress/pg_regress top_builddir=/home/ashutosh/work/units/pg_dump_test/build/dev INITDB_TEMPLATE=/home/ashutosh/work/units/pg_dump_test/build/dev/tmp_install/initdb-template /usr/bin/python3 /home/ashutosh/work/units/pg_dump_test/build/dev/../../coderoot/pg/src/tools/testwrap --basedir /home/ashutosh/work/units/pg_dump_test/build/dev --srcdir /home/ashutosh/work/units/pg_dump_test/coderoot/pg/src/bin/pg_upgrade --pg-test-extra '' --testgroup pg_upgrade --testname 002_pg_upgrade -- /usr/bin/perl -I /home/ashutosh/work/units/pg_dump_test/coderoot/pg/src/test/perl -I /home/ashutosh/work/units/pg_dump_test/coderoot/pg/src/bin/pg_upgrade /home/ashutosh/work/units/pg_dump_test/coderoot/pg/src/bin/pg_upgrade/t/002_pg_upgrade.pl
Ok: 0
Expected Fail: 0
Fail: 1
Unexpected Pass: 0
Skipped: 0
Timeout: 0
I see what's happening. If I set LC_MONETARY environment explicitly,
that's taken by initdb
$ export LC_MONETARY="es_CL.UTF-8";rm -rf $DataDir; $BinDir/initdb -D
$DataDir -A trust -N --wal-segsize 1 --allow-group-access --encoding
UTF-8 --lc-collate C --lc-ctype C --locale-provider builtin
--builtin-locale C.UTF-8 -k
The files belonging to this database system will be owned by user "ashutosh".
This user must also own the server process.
The database cluster will be initialized with this locale configuration:
locale provider: builtin
default collation: C.UTF-8
LC_COLLATE: C
LC_CTYPE: C
LC_MESSAGES: en_US.UTF-8
LC_MONETARY: es_CL.UTF-8
LC_NUMERIC: en_US.UTF-8
LC_TIME: en_US.UTF-8
The default text search configuration will be set to "english".
If I don't set it explicitly, it's taken from default settings
$ unset LC_MONETARY;rm -rf $DataDir; $BinDir/initdb -D $DataDir -A
trust -N --wal-segsize 1 --allow-group-access --encoding UTF-8
--lc-collate C --lc-ctype C --locale-provider builtin --builtin-locale
C.UTF-8 -k
The files belonging to this database system will be owned by user "ashutosh".
This user must also own the server process.
The database cluster will be initialized with this locale configuration:
locale provider: builtin
default collation: C.UTF-8
LC_COLLATE: C
LC_CTYPE: C
LC_MESSAGES: en_US.UTF-8
LC_MONETARY: en_US.UTF-8
LC_NUMERIC: en_US.UTF-8
LC_TIME: en_US.UTF-8
The default text search configuration will be set to "english".
In your case probably your default setting is es_CL.UTF-8 or have set
LC_MONETARY explicitly in your environment.
I think the fix is to explicitly pass --lc-monetary to the old cluster
and the restored cluster. 003 patch in the attached patch set does
that. Please check if it fixes the issue for you.
Additionally we should check that it gets copied to the new cluster as
well. But I haven't figured out how to get those settings yet. This
treatment is similar to how --lc-collate and --lc-ctype are treated. I
am wondering whether we should explicitly pass --lc-messages,
--lc-time and --lc-numeric as well.
2d819a08a1cbc11364e36f816b02e33e8dcc030b introduced buildin locale
provider and added overrides to LC_COLLATE and LC_TYPE. But it did not
override other LC_, which I think it should have. In pure upgrade
test, the upgraded node inherits the locale settings of the original
cluster, so this wasn't apparent. But with pg_dump testing, the
original and restored databases are independent. Hence I think we have
to override all LC_* settings by explicitly mentioning --lc-* options
to initdb. Please let me know what you think about this?
--
Best Wishes,
Ashutosh Bapat
--
Best Wishes,
Ashutosh Bapat
Attachments:
0001-Test-pg_dump-restore-of-regression-objects-20250313.patchtext/x-patch; charset=US-ASCII; name=0001-Test-pg_dump-restore-of-regression-objects-20250313.patchDownload
From ec6c178ba0a1a20dc989fee94fdc8d53d531e2e4 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH 1/3] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade of the regression database left
behind by the regression run. Modify it to test dump and restore of the
regression database as well.
The regression database created by the regression run contains almost all the
database objects supported by PostgreSQL in various states. Hence the
new testcase covers dump and restore scenarios not covered by individual
dump/restore cases. Till now 002_pg_upgrade only tested dump/restore
through pg_upgrade which only uses binary mode. Many regression tests
mention that they leave objects behind for dump/restore testing but they
are not tested in a non-binary mode. The new testcase closes that
gap.
Testing dump and restore of the regression database makes this test run
longer for a relatively small benefit. Hence run it only when
explicitly requested by the user by specifying "regress_dump_test" in
PG_TEST_EXTRA.
Note for the reviewers:
The new test has uncovered two bugs so far in one year.
1. Introduced by 14e87ffa5c54. Fixed in fd41ba93e4630921a72ed5127cd0d552a8f3f8fc.
2. Introduced by 0413a556990ba628a3de8a0b58be020fd9a14ed0. Reverted in 74563f6b90216180fc13649725179fc119dddeb5.
Author: Ashutosh Bapat
Reviewed by: Michael Paquier, Daniel Gustafsson, Tom Lane, Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
doc/src/sgml/regress.sgml | 12 ++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 141 ++++++++++++++++-
src/test/perl/Makefile | 2 +
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 167 ++++++++++++++++++++
src/test/perl/meson.build | 1 +
5 files changed, 321 insertions(+), 2 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 0e5e8e8f309..237b974b3ab 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -357,6 +357,18 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_test</literal></term>
+ <listitem>
+ <para>
+ When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
+ tests dump and restore of regression database left behind by the
+ regression run. Not enabled by default because it is time and resource
+ consuming.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index c00cf68d660..bd8313cee6f 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -12,6 +12,7 @@ use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -35,8 +36,8 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
sub filter_dump
{
my ($is_old, $old_version, $dump_file) = @_;
@@ -261,6 +262,21 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate TAP test, but doing it here saves us one full
+ # regression run.
+ #
+ # This step takes several extra seconds and some extra disk space, so
+ # requires an opt-in with the PG_TEST_EXTRA environment variable.
+ #
+ # Do this while the old cluster is running before it is shut down by the
+ # upgrade test.
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
+ {
+ test_regression_dump_restore($oldnode, %node_params);
+ }
}
# Initialize a new node for the upgrade.
@@ -527,4 +543,125 @@ my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
compare_files($dump1_filtered, $dump2_filtered,
'old and new dumps match after pg_upgrade');
+# Test dump and restore of objects left behind by the regression run.
+#
+# It is expected that regression tests, which create `regression` database, are
+# run on `src_node`, which in turn, is left in running state. The dump from
+# `src_node` is restored on a fresh node created using given `node_params`.
+# Plain dumps from both the nodes are compared to make sure that all the dumped
+# objects are restored faithfully.
+sub test_regression_dump_restore
+{
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+
+ # Make sure that the source and destination nodes have the same version and
+ # do not use custom install paths. In both the cases, the dump files may
+ # require additional adjustments unknown to code here. Do not run this test
+ # in such a case to avoid utilizing the time and resources unnecessarily.
+ if ($src_node->pg_version != $dst_node->pg_version
+ or defined $src_node->{_install_path})
+ {
+ fail("same version dump and restore test using default installation");
+ return;
+ }
+
+ # Dump the original database for comparison later.
+ my $src_dump =
+ get_dump_for_comparison($src_node, 'regression', 'src_dump', 1);
+
+ # Setup destination database cluster
+ $dst_node->init(%node_params);
+ # Stabilize stats for comparison.
+ $dst_node->append_conf('postgresql.conf', 'autovacuum = off');
+ $dst_node->start;
+
+ for my $format ('plain', 'tar', 'directory', 'custom')
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $restored_db = 'regression_' . $format;
+
+ $src_node->command_ok(
+ [
+ 'pg_dump', "-F$format", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
+
+ # Create a new database for restoring dump from every format so that it
+ # is available for debugging in case the test fails.
+ $dst_node->command_ok([ 'createdb', $restored_db ],
+ "created destination database '$restored_db'");
+
+ # Restore into destination database.
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [
+ 'psql', '-d', $dst_node->connstr($restored_db),
+ '-f', $dump_file
+ ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '-d',
+ $dst_node->connstr($restored_db), $dump_file
+ ];
+ }
+ $dst_node->command_ok(@restore_command,
+ "restored dump taken in $format format on destination instance");
+
+ my $dst_dump =
+ get_dump_for_comparison($dst_node, $restored_db,
+ 'dest_dump.' . $format, 0);
+
+ compare_files($src_dump, $dst_dump,
+ "dump outputs from original and restored regression database (using $format format) match"
+ );
+ }
+}
+
+# Dump database `db` from the given `node` in plain format and adjust it for
+# comparing dumps from the original and the restored database.
+#
+# `file_prefix` is used to create unique names for all dump files so that they
+# remain available for debugging in case the test fails.
+#
+# `adjust_child_columns` is passed to adjust_regress_dumpfile() which actually
+# adjusts the dump output.
+#
+The name of the file containing the adjusted dump is returned.
+sub get_dump_for_comparison
+{
+ my ($node, $db, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+ # Usually we avoid comparing statistics in our tests since it is flaky by
+ # nature. However, if statistics is dumped and restored it is expected to be
+ # restored as it is i.e. the statistics from the original database and that
+ # from the restored database should match. We turn off autovacuum on the
+ # source and the target database to avoid any statistics update during
+ # restore operation. Hence we do not exclude statistics from dump.
+ $node->command_ok(
+ [
+ 'pg_dump', '--no-sync', '-d', $node->connstr($db), '-f',
+ $dumpfile
+ ],
+ 'dump for comparison succeeded');
+
+ open(my $dh, '>', $dump_adjusted)
+ || die
+ "could not open $dump_adjusted for writing the adjusted dump: $!";
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
+}
+
done_testing();
diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile
index d82fb67540e..def89650ead 100644
--- a/src/test/perl/Makefile
+++ b/src/test/perl/Makefile
@@ -26,6 +26,7 @@ install: all installdirs
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Cluster.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/BackgroundPsql.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustUpgrade.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustDump.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Version.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
uninstall:
@@ -36,6 +37,7 @@ uninstall:
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
endif
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..74b9a60cf34
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,167 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $adjust_child_columns);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and restore tests
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+use Test::More;
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $adjust_child_columns)
+
+If we take dump of the regression database left behind after running regression
+tests, restore the dump, and take dump of the restored regression database, the
+outputs of both the dumps differ in the following cases. This routine adjusts
+the given dump so that dump outputs from the original and restored database,
+respectively, match.
+
+Case 1: Some regression tests purposefully create child tables in such a way
+that the order of their inherited columns differ from column orders of their
+respective parents. In the restored database, however, the order of their
+inherited columns are same as that of their respective parents. Thus the column
+orders of these child tables in the original database and those in the restored
+database differ, causing difference in the dump outputs. See MergeAttributes()
+and dumpTableSchema() for details. This routine rearranges the column
+declarations in the relevant C<CREATE TABLE... INHERITS> statements in the dump
+file from original database to match those from the restored database. We could,
+instead, adjust the statements in the dump from the restored database to match
+those from original database or adjust both to a canonical order. But we have
+chosen to adjust the statements in the dump from original database for no
+particular reason.
+
+Case 2: When dumping COPY statements the columns are ordered by their attribute
+number by fmtCopyColumnList(). If a column is added to a parent table after a
+child has inherited the parent and the child has its own columns, the attribute
+number of the column changes after restoring the child table. This is because
+when executing the dumped C<CREATE TABLE... INHERITS> statement all the parent
+attributes are created before any child attributes. Thus the order of columns in
+COPY statements dumped from the original and the restored databases,
+respectively, differs. Such tables in regression tests are listed below. It is
+hard to adjust the column order in the COPY statement along with the data. Hence
+we just remove such COPY statements from the dump output.
+
+Additionally the routine adjusts blank and new lines to avoid noise.
+
+Note: Usually we avoid comparing statistics in our tests since it is flaky by
+nature. However, if statistics is dumped and restored it is expected to be
+restored as it is i.e. the statistics from the original database and that from
+the restored database should match. Hence we do not filter statistics from dump,
+if it's dumped.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<adjust_child_columns>: 1 indicates that the given dump file requires
+adjusting columns in the child tables; usually when the dump is from original
+database. 0 indicates no such adjustment is needed; usually when the dump is
+from restored database.
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $adjust_child_columns) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($adjust_child_columns)
+ {
+ my $saved_dump = $dump;
+
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_stored_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_virtual_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_virtual_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c1 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c2 adjustments');
+ }
+
+ # Remove COPY statements with differing column order
+ for my $table (
+ 'public\.b_star', 'public\.c_star',
+ 'public\.cc2', 'public\.d_star',
+ 'public\.e_star', 'public\.f_star',
+ 'public\.renamecolumnanother', 'public\.renamecolumnchild',
+ 'public\.test_type_diff2_c1', 'public\.test_type_diff2_c2',
+ 'public\.test_type_diff_c')
+ {
+ $dump =~ s/^COPY\s$table\s\(.+?^\\\.$//sm;
+ }
+
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
diff --git a/src/test/perl/meson.build b/src/test/perl/meson.build
index 58e30f15f9d..492ca571ff8 100644
--- a/src/test/perl/meson.build
+++ b/src/test/perl/meson.build
@@ -14,4 +14,5 @@ install_data(
'PostgreSQL/Test/Cluster.pm',
'PostgreSQL/Test/BackgroundPsql.pm',
'PostgreSQL/Test/AdjustUpgrade.pm',
+ 'PostgreSQL/Test/AdjustDump.pm',
install_dir: dir_pgxs / 'src/test/perl/PostgreSQL/Test')
base-commit: 3691edfab97187789b8a1cbb9dce4acf0ecd8f5a
--
2.34.1
0003-set-lc_monetary-explicitly-at-initdb-time-20250313.patchtext/x-patch; charset=US-ASCII; name=0003-set-lc_monetary-explicitly-at-initdb-time-20250313.patchDownload
From 2f41b4371c0f0b8a3535537c4e4f9ddd1118d1ce Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Thu, 13 Mar 2025 16:17:57 +0530
Subject: [PATCH 3/3] set lc_monetary explicitly at initdb time
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 65f4c7d4f2b..51ba79c8589 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -134,6 +134,7 @@ my $original_enc_name;
my $original_provider;
my $original_datcollate = "C";
my $original_datctype = "C";
+my $original_datmonetary = "C";
my $original_datlocale;
if ($oldnode->pg_version >= '17devel')
@@ -163,6 +164,7 @@ my @initdb_params = @custom_opts;
push @initdb_params, ('--encoding', $original_enc_name);
push @initdb_params, ('--lc-collate', $original_datcollate);
push @initdb_params, ('--lc-ctype', $original_datctype);
+push @initdb_params, ('--lc-monetary', $original_datmonetary);
# add --locale-provider, if supported
my %provider_name = ('b' => 'builtin', 'i' => 'icu', 'c' => 'libc');
--
2.34.1
0002-Do-not-dump-statistics-in-the-file-dumped-f-20250313.patchtext/x-patch; charset=US-ASCII; name=0002-Do-not-dump-statistics-in-the-file-dumped-f-20250313.patchDownload
From 75f2b869764d9db3ac5c548636ed5c2be8b47b36 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Tue, 25 Feb 2025 11:42:51 +0530
Subject: [PATCH 2/3] Do not dump statistics in the file dumped for comparison
The dumped and restored statistics of a materialized view may differ as
reported in [1]. Hence do not dump the statistics to avoid differences
in the dump output from the original and restored database.
[1] https://www.postgresql.org/message-id/CAExHW5s47kmubpbbRJzSM-Zfe0Tj2O3GBagB7YAyE8rQ-V24Uw@mail.gmail.com
Ashutosh Bapat
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index bd8313cee6f..65f4c7d4f2b 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -641,15 +641,15 @@ sub get_dump_for_comparison
my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
my $dump_adjusted = "${dumpfile}_adjusted";
- # Usually we avoid comparing statistics in our tests since it is flaky by
- # nature. However, if statistics is dumped and restored it is expected to be
- # restored as it is i.e. the statistics from the original database and that
- # from the restored database should match. We turn off autovacuum on the
- # source and the target database to avoid any statistics update during
- # restore operation. Hence we do not exclude statistics from dump.
+ # If statistics is dumped and restored it is expected to be restored as it
+ # is i.e. the statistics from the original database and that from the
+ # restored database should match. We turn off autovacuum on the source and
+ # the target database to avoid any statistics update during restore
+ # operation. But as of now, there are cases when statistics is not being
+ # restored faithfully. Hence for now do not dump statistics.
$node->command_ok(
[
- 'pg_dump', '--no-sync', '-d', $node->connstr($db), '-f',
+ 'pg_dump', '--no-sync', '--no-statistics', '-d', $node->connstr($db), '-f',
$dumpfile
],
'dump for comparison succeeded');
--
2.34.1
On Thu, Mar 13, 2025 at 6:10 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
I think the fix is to explicitly pass --lc-monetary to the old cluster
and the restored cluster. 003 patch in the attached patch set does
that. Please check if it fixes the issue for you.
Additionally we should check that it gets copied to the new cluster as
well. But I haven't figured out how to get those settings yet. This
treatment is similar to how --lc-collate and --lc-ctype are treated. I
am wondering whether we should explicitly pass --lc-messages,
--lc-time and --lc-numeric as well.
2d819a08a1cbc11364e36f816b02e33e8dcc030b introduced buildin locale
provider and added overrides to LC_COLLATE and LC_TYPE. But it did not
override other LC_, which I think it should have. In pure upgrade
test, the upgraded node inherits the locale settings of the original
cluster, so this wasn't apparent. But with pg_dump testing, the
original and restored databases are independent. Hence I think we have
to override all LC_* settings by explicitly mentioning --lc-* options
to initdb. Please let me know what you think about this?
Investigated this further. The problem is that the pg_regress run
creates the regression database with specific properties, but the
restored database does not have those properties. That led me to a
better solution, which additionally is local to the new test: use
--create when dumping and restoring the regression database. This way
the database properties, or "configuration variable settings" (as the
pg_dump documentation calls them), are copied to the restored database
as well. Those properties include LC_MONETARY. Additionally, the test
now covers the --create option as well.
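For illustration, the dump and restore sequence with --create might look like the following sketch; the connection strings and file name here are placeholders, not the exact commands used in the test:

```shell
# --create records database-level properties, including the LC_*
# settings, in the dump, so the restored database gets them back.
pg_dump --format=custom --create --no-sync \
    --file=regression.dump "dbname=regression port=5432"
# With --create, pg_restore connects to a maintenance database, issues
# CREATE DATABASE with the recorded properties, then restores into it.
pg_restore --create -d "dbname=postgres port=5433" regression.dump
```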
PFA patches.
--
Best Wishes,
Ashutosh Bapat
Attachments:
0002-Do-not-dump-statistics-in-the-file-dumped-f-20250319.patchtext/x-patch; charset=US-ASCII; name=0002-Do-not-dump-statistics-in-the-file-dumped-f-20250319.patchDownload
From 886e241e304a23bb31b5e59f12149741dfff2b14 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Tue, 25 Feb 2025 11:42:51 +0530
Subject: [PATCH 2/2] Do not dump statistics in the file dumped for comparison
The dumped and restored statistics of a materialized view may differ as
reported in [1]. Hence do not dump the statistics to avoid differences
in the dump output from the original and restored database.
[1] https://www.postgresql.org/message-id/CAExHW5s47kmubpbbRJzSM-Zfe0Tj2O3GBagB7YAyE8rQ-V24Uw@mail.gmail.com
Ashutosh Bapat
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index d08eea6693f..f931fef2307 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -656,15 +656,15 @@ sub get_dump_for_comparison
my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
my $dump_adjusted = "${dumpfile}_adjusted";
- # Usually we avoid comparing statistics in our tests since it is flaky by
- # nature. However, if statistics is dumped and restored it is expected to be
- # restored as it is i.e. the statistics from the original database and that
- # from the restored database should match. We turn off autovacuum on the
- # source and the target database to avoid any statistics update during
- # restore operation. Hence we do not exclude statistics from dump.
+ # If statistics is dumped and restored it is expected to be restored as it
+ # is i.e. the statistics from the original database and that from the
+ # restored database should match. We turn off autovacuum on the source and
+ # the target database to avoid any statistics update during restore
+ # operation. But as of now, there are cases when statistics is not being
+ # restored faithfully. Hence for now do not dump statistics.
$node->command_ok(
[
- 'pg_dump', '--no-sync', '-d', $node->connstr($db), '-f',
+ 'pg_dump', '--no-sync', '--no-statistics', '-d', $node->connstr($db), '-f',
$dumpfile
],
'dump for comparison succeeded');
--
2.34.1
0001-Test-pg_dump-restore-of-regression-objects-20250319.patchtext/x-patch; charset=US-ASCII; name=0001-Test-pg_dump-restore-of-regression-objects-20250319.patchDownload
From 1723050dadb89f3187fef19c994d8c866ee5a788 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH 1/2] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade of the regression database left
behind by the regression run. Modify it to test dump and restore of the
regression database as well.
Regression database created by regression run contains almost all the
database objects supported by PostgreSQL in various states. Hence the
new testcase covers dump and restore scenarios not covered by individual
dump/restore cases. Till now 002_pg_upgrade only tested dump/restore
through pg_upgrade which only uses binary mode. Many regression tests
mention that they leave objects behind for dump/restore testing but they
are not tested in a non-binary mode. The new testcase closes that
gap.
Testing dump and restore of regression database makes this test run
longer for a relatively smaller benefit. Hence run it only when
explicitly requested by user by specifying "regress_dump_test" in
PG_TEST_EXTRA.
Note for reviewers:
The new test has uncovered several bugs so far in one year:
1. Introduced by 14e87ffa5c54. Fixed in fd41ba93e4630921a72ed5127cd0d552a8f3f8fc.
2. Introduced by 0413a556990ba628a3de8a0b58be020fd9a14ed0. Reverted in 74563f6b90216180fc13649725179fc119dddeb5.
3. Fixed by d611f8b1587b8f30caa7c0da99ae5d28e914d54f.
4. Being discussed on hackers at https://www.postgresql.org/message-id/CAExHW5s47kmubpbbRJzSM-Zfe0Tj2O3GBagB7YAyE8rQ-V24Uw@mail.gmail.com
Author: Ashutosh Bapat
Reviewed by: Michael Paquier, Daniel Gustafsson, Tom Lane, Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
doc/src/sgml/regress.sgml | 12 ++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 144 ++++++++++++++++-
src/test/perl/Makefile | 2 +
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 167 ++++++++++++++++++++
src/test/perl/meson.build | 1 +
5 files changed, 324 insertions(+), 2 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 0e5e8e8f309..237b974b3ab 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -357,6 +357,18 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_test</literal></term>
+ <listitem>
+ <para>
+ When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
+ tests dump and restore of the regression database left behind by the
+ regression run. Not enabled by default because it consumes extra
+ time and resources.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 00051b85035..d08eea6693f 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -12,6 +12,7 @@ use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -35,8 +36,8 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
sub filter_dump
{
my ($is_old, $old_version, $dump_file) = @_;
@@ -262,6 +263,21 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate TAP test, but doing it here saves us one full
+ # regression run.
+ #
+ # This step takes several extra seconds and some extra disk space, so
+ # requires an opt-in with the PG_TEST_EXTRA environment variable.
+ #
+ # Do this while the old cluster is running before it is shut down by the
+ # upgrade test.
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
+ {
+ test_regression_dump_restore($oldnode, %node_params);
+ }
}
# Initialize a new node for the upgrade.
@@ -539,4 +555,128 @@ my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
compare_files($dump1_filtered, $dump2_filtered,
'old and new dumps match after pg_upgrade');
+# Test dump and restore of objects left behind by the regression run.
+#
+# It is expected that the regression tests, which create the `regression`
+# database, have been run on `src_node`, which in turn is left running. A fresh
+# node is created using the given `node_params`, expected to be the same ones
+# used to create `src_node`, so as to avoid any differences in the databases.
+#
+# Plain dumps from both the nodes are compared to make sure that all the dumped
+# objects are restored faithfully.
+sub test_regression_dump_restore
+{
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+
+ # Make sure that the source and destination nodes have the same version and
+ # do not use custom install paths. In either case, the dump files may
+ # require additional adjustments unknown to the code here. Do not run this
+ # test in such cases, to avoid spending time and resources unnecessarily.
+ if ($src_node->pg_version != $dst_node->pg_version
+ or defined $src_node->{_install_path})
+ {
+ fail("same version dump and restore test using default installation");
+ return;
+ }
+
+ # Dump the original database for comparison later.
+ my $src_dump =
+ get_dump_for_comparison($src_node, 'regression', 'src_dump', 1);
+
+ # Setup destination database cluster
+ $dst_node->init(%node_params);
+ # Stabilize stats for comparison.
+ $dst_node->append_conf('postgresql.conf', 'autovacuum = off');
+ $dst_node->start;
+
+ # Test all formats one by one.
+ for my $format ('plain', 'tar', 'directory', 'custom')
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $restored_db = 'regression_' . $format;
+
+ # Use --create in dump and restore commands so that the restored
+ # database has the same configuration variable settings as the original
+ # database and the plain dumps taken for comparison do not differ
+ # because of locale changes. Additionally this provides test coverage
+ # for the --create option.
+ $src_node->command_ok(
+ [
+ 'pg_dump', "-F$format", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '--create', '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
+
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [ 'psql', '-d', 'postgres', '-f', $dump_file ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '--create',
+ '-d', 'postgres', $dump_file
+ ];
+ }
+ $dst_node->command_ok(@restore_command,
+ "restored dump taken in $format format on destination instance");
+
+ my $dst_dump =
+ get_dump_for_comparison($dst_node, 'regression',
+ 'dest_dump.' . $format, 0);
+
+ compare_files($src_dump, $dst_dump,
+ "dump outputs from original and restored regression database (using $format format) match"
+ );
+
+ # Rename the restored database so that it is available for debugging in
+ # case the test fails.
+ $dst_node->safe_psql('postgres', "ALTER DATABASE regression RENAME TO $restored_db");
+ }
+}
+
+# Dump database `db` from the given `node` in plain format and adjust it for
+# comparing dumps from the original and the restored database.
+#
+# `file_prefix` is used to create unique names for all dump files so that they
+# remain available for debugging in case the test fails.
+#
+# `adjust_child_columns` is passed to adjust_regress_dumpfile() which actually
+# adjusts the dump output.
+#
+# The name of the file containing the adjusted dump is returned.
+sub get_dump_for_comparison
+{
+ my ($node, $db, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+ # Usually we avoid comparing statistics in our tests since it is flaky by
+ # nature. However, if statistics is dumped and restored it is expected to be
+ # restored as it is i.e. the statistics from the original database and that
+ # from the restored database should match. We turn off autovacuum on the
+ # source and the target database to avoid any statistics update during
+ # restore operation. Hence we do not exclude statistics from dump.
+ $node->command_ok(
+ [
+ 'pg_dump', '--no-sync', '-d', $node->connstr($db), '-f',
+ $dumpfile
+ ],
+ 'dump for comparison succeeded');
+
+ open(my $dh, '>', $dump_adjusted)
+ || die
+ "could not open $dump_adjusted for writing the adjusted dump: $!";
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
+}
+
done_testing();
diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile
index d82fb67540e..def89650ead 100644
--- a/src/test/perl/Makefile
+++ b/src/test/perl/Makefile
@@ -26,6 +26,7 @@ install: all installdirs
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Cluster.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/BackgroundPsql.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustUpgrade.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustDump.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Version.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
uninstall:
@@ -36,6 +37,7 @@ uninstall:
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
endif
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..74b9a60cf34
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,167 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $adjust_child_columns);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and restore tests.
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+use Test::More;
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $adjust_child_columns)
+
+If we take a dump of the regression database left behind after running the
+regression tests, restore the dump, and take a dump of the restored regression
+database, the two dump outputs differ in the following cases. This routine
+adjusts the given dump so that the dump outputs from the original and restored
+databases, respectively, match.
+
+Case 1: Some regression tests purposefully create child tables in such a way
+that the order of their inherited columns differs from the column order of
+their respective parents. In the restored database, however, the order of
+their inherited columns is the same as that of their respective parents. Thus
+the column orders of these child tables in the original database and those in
+the restored database differ, causing differences in the dump outputs. See
+MergeAttributes() and dumpTableSchema() for details. This routine rearranges
+the column declarations in the relevant C<CREATE TABLE... INHERITS> statements
+in the dump file from the original database to match those from the restored
+database. We could, instead, adjust the statements in the dump from the
+restored database to match those from the original database, or adjust both to
+a canonical order. But we have chosen to adjust the statements in the dump
+from the original database for no particular reason.
+
+Case 2: When dumping COPY statements the columns are ordered by their attribute
+number by fmtCopyColumnList(). If a column is added to a parent table after a
+child has inherited the parent and the child has its own columns, the attribute
+number of the column changes after restoring the child table. This is because
+when executing the dumped C<CREATE TABLE... INHERITS> statement all the parent
+attributes are created before any child attributes. Thus the order of columns in
+COPY statements dumped from the original and the restored databases,
+respectively, differs. Such tables in regression tests are listed below. It is
+hard to adjust the column order in the COPY statement along with the data. Hence
+we just remove such COPY statements from the dump output.
+
+Additionally, the routine normalizes blank lines and newlines to avoid noise.
+
+Note: Usually we avoid comparing statistics in our tests since they are flaky
+by nature. However, if statistics are dumped and restored, they are expected
+to be restored as-is, i.e. the statistics from the original database and those
+from the restored database should match. Hence we do not filter statistics
+from the dump, if they are dumped.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<adjust_child_columns>: 1 indicates that the given dump file requires
+adjusting columns in the child tables, usually when the dump is from the
+original database. 0 indicates no such adjustment is needed, usually when the
+dump is from the restored database.
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $adjust_child_columns) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($adjust_child_columns)
+ {
+ my $saved_dump = $dump;
+
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_stored_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_virtual_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_virtual_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c1 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c2 adjustments');
+ }
+
+ # Remove COPY statements with differing column order
+ for my $table (
+ 'public\.b_star', 'public\.c_star',
+ 'public\.cc2', 'public\.d_star',
+ 'public\.e_star', 'public\.f_star',
+ 'public\.renamecolumnanother', 'public\.renamecolumnchild',
+ 'public\.test_type_diff2_c1', 'public\.test_type_diff2_c2',
+ 'public\.test_type_diff_c')
+ {
+ $dump =~ s/^COPY\s$table\s\(.+?^\\\.$//sm;
+ }
+
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
diff --git a/src/test/perl/meson.build b/src/test/perl/meson.build
index 58e30f15f9d..492ca571ff8 100644
--- a/src/test/perl/meson.build
+++ b/src/test/perl/meson.build
@@ -14,4 +14,5 @@ install_data(
'PostgreSQL/Test/Cluster.pm',
'PostgreSQL/Test/BackgroundPsql.pm',
'PostgreSQL/Test/AdjustUpgrade.pm',
+ 'PostgreSQL/Test/AdjustDump.pm',
install_dir: dir_pgxs / 'src/test/perl/PostgreSQL/Test')
base-commit: 190dc27998d5b7b4c36e12bebe62f7176f4b4507
--
2.34.1
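As a side note on the adjustment logic: the COPY-stripping step in AdjustDump.pm above can be illustrated with a small, self-contained sketch. The sample dump text and the two-table list here are made up for illustration; only the regex shape mirrors the Perl code.

```python
import re

# Minimal sketch of AdjustDump.pm's COPY-removal step: drop COPY blocks
# (through the terminating "\.") for tables whose column order differs
# between the original and restored databases.
dump = """COPY public.b_star (class, aa, a) FROM stdin;
1\t2\t3
\\.

CREATE TABLE public.keepme (x integer);
"""

for table in (r'public\.b_star', r'public\.c_star'):
    # re.S lets .+? span lines; re.M anchors ^/$ at line boundaries,
    # matching the Perl s///sm modifiers.
    dump = re.sub(r'^COPY\s' + table + r'\s\(.+?^\\\.$', '', dump,
                  flags=re.S | re.M)

print('keepme' in dump, 'COPY' in dump)  # the b_star COPY block is gone
```

A table with no COPY block in the dump (like `public.c_star` here) simply matches nothing, so listing extra tables is harmless.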
On Wed, 19 Mar 2025 at 17:13, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Thu, Mar 13, 2025 at 6:10 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

I think the fix is to explicitly pass --lc-monetary to the old cluster
and the restored cluster. The 003 patch in the attached patch set does
that. Please check if it fixes the issue for you.

Additionally we should check that it gets copied to the new cluster as
well. But I haven't figured out how to get those settings yet. This
treatment is similar to how --lc-collate and --lc-ctype are treated. I
am wondering whether we should explicitly pass --lc-messages,
--lc-time and --lc-numeric as well.

2d819a08a1cbc11364e36f816b02e33e8dcc030b introduced the builtin locale
provider and added overrides for LC_COLLATE and LC_CTYPE. But it did not
override the other LC_* settings, which I think it should have. In a pure
upgrade test, the upgraded node inherits the locale settings of the
original cluster, so this wasn't apparent. But with pg_dump testing, the
original and restored databases are independent. Hence I think we have
to override all LC_* settings by explicitly passing --lc-* options
to initdb. Please let me know what you think about this?

Investigated this further. The problem is that the pg_regress run
creates the regression database with specific properties but the restored
database does not have those properties. That led me to a better
solution, which is additionally local to the new test: use --create when
dumping and restoring the regression database. This way the database
properties, or "configuration variable settings" (as the pg_dump
documentation calls them), are copied to the restored database as well.
Those properties include LC_MONETARY. Additionally the test now covers
the --create option as well.

PFA patches.
Will it help the execution time if we use --jobs in case of pg_dump
and pg_restore wherever supported:
+ $src_node->command_ok(
+ [
+ 'pg_dump', "-F$format", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '--create', '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
+
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [ 'psql', '-d', 'postgres',
'-f', $dump_file ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '--create',
+ '-d', 'postgres', $dump_file
+ ];
+ }
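On where --jobs can apply: pg_dump supports parallel dumps only with the directory format, while pg_restore supports parallelism with custom and directory archives. A hedged sketch of format-aware command construction (function names and paths are hypothetical, not from the patch):

```python
# Build pg_dump/pg_restore argument lists, adding -j only where the
# tools support parallelism. Nothing is executed here; this only shows
# which flag combinations are valid.
def dump_command(fmt, connstr, dump_file, jobs=None):
    cmd = ['pg_dump', f'-F{fmt}', '--no-sync', '--create',
           '-d', connstr, '-f', dump_file]
    if jobs and fmt == 'directory':
        # pg_dump accepts -j only with the directory format.
        cmd += ['-j', str(jobs)]
    return cmd

def restore_command(fmt, dump_file, jobs=None):
    if fmt == 'plain':
        # Plain-format dumps are restored by feeding the script to psql.
        return ['psql', '-d', 'postgres', '-f', dump_file]
    cmd = ['pg_restore', '--create', '-d', 'postgres', dump_file]
    if jobs and fmt in ('directory', 'custom'):
        # pg_restore accepts -j with custom and directory archives.
        cmd += ['-j', str(jobs)]
    return cmd

print(dump_command('directory', 'dbname=regression', '/tmp/d', jobs=2))
```

Whether the extra worker processes actually pay off for a database of this size is measured later in the thread.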
Should the copyright be only 2025 in this case:
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm
b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..74b9a60cf34
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,167 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
Regards,
Vignesh
On 2025-Mar-20, vignesh C wrote:
Will it help the execution time if we use --jobs in case of pg_dump
and pg_restore wherever supported:
As I said in another thread, I think we should enable this test to run
without requiring any PG_TEST_EXTRA, because otherwise the only way to
know about problems is to commit a patch and wait for buildfarm to run
it. Furthermore, I think running all 4 dump format modes is a waste of
time; there isn't any extra coverage by running this test in additional
formats.
Putting those two thoughts together with yours about running with -j,
I propose that what we should do is make this test use -Fc with no
compression (to avoid wasting CPU on that) and use a lowish -j value for
both pg_dump and pg_restore, probably 2, or 3 at most. (Not more,
because this is likely to run in parallel with other tests anyway.)
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"No renuncies a nada. No te aferres a nada."
On Thu, Mar 20, 2025 at 8:37 PM vignesh C <vignesh21@gmail.com> wrote:
Will it help the execution time if we use --jobs in case of pg_dump
and pg_restore wherever supported:
Will reply to this separately along with reply to Alvaro's comments.
Should the copyright be only 2025 in this case:
The patch was posted to this mailing list in 2024, so we'd better
preserve the copyright since then. I remember a hackers discussion
where a senior member of the community mentioned that there's no harm
in mentioning longer copyright periods rather than being stricter about
it. I couldn't find the discussion though.
--
Best Wishes,
Ashutosh Bapat
On 2025-Mar-21, Ashutosh Bapat wrote:
On Thu, Mar 20, 2025 at 8:37 PM vignesh C <vignesh21@gmail.com> wrote:
Should the copyright be only 2025 in this case:
The patch was posted in 2024 to this mailing list. So we better
protect the copyright since then. I remember a hackers discussion
where a senior member of the community mentioned that there's not harm
in mentioning longer copyright periods than being stricter about it. I
couldn't find the discussion though.
On the other hand, my impression is that we do update copyright years to
current year, when committing new files of patches that have been around
for long.
And there's always
https://liferay.dev/blogs/-/blogs/how-and-why-to-properly-write-copyright-statements-in-your-code
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)
On Fri, Mar 21, 2025 at 6:04 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Mar-21, Ashutosh Bapat wrote:
On Thu, Mar 20, 2025 at 8:37 PM vignesh C <vignesh21@gmail.com> wrote:
Should the copyright be only 2025 in this case:
The patch was posted in 2024 to this mailing list. So we better
protect the copyright since then. I remember a hackers discussion
where a senior member of the community mentioned that there's not harm
in mentioning longer copyright periods than being stricter about it. I
couldn't find the discussion though.

On the other hand, my impression is that we do update copyright years to
current year, when committing new files of patches that have been around
for long.

And there's always
https://liferay.dev/blogs/-/blogs/how-and-why-to-properly-write-copyright-statements-in-your-code
Right. So shouldn't the copyright notice be 2024-2025 rather than just
2025? Next year it will be changed to 2024-2026.
--
Best Wishes,
Ashutosh Bapat
On Thu, 20 Mar 2025 at 22:09, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Mar-20, vignesh C wrote:
Will it help the execution time if we use --jobs in case of pg_dump
and pg_restore wherever supported:

As I said in another thread, I think we should enable this test to run
without requiring any PG_TEST_EXTRA, because otherwise the only way to
know about problems is to commit a patch and wait for buildfarm to run
it. Furthermore, I think running all 4 dump format modes is a waste of
time; there isn't any extra coverage by running this test in additional
formats.
+1 for running it in only one of the formats.
Regards,
Vignesh
On Thu, Mar 20, 2025 at 10:09 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Mar-20, vignesh C wrote:
Will it help the execution time if we use --jobs in case of pg_dump
and pg_restore wherever supported:

As I said in another thread, I think we should enable this test to run
without requiring any PG_TEST_EXTRA, because otherwise the only way to
know about problems is to commit a patch and wait for buildfarm to run
it. Furthermore, I think running all 4 dump format modes is a waste of
time; there isn't any extra coverage by running this test in additional
formats.

Putting those two thoughts together with yours about running with -j,
I propose that what we should do is make this test use -Fc with no
compression (to avoid wasting CPU on that) and use a lowish -j value for
both pg_dump and pg_restore, probably 2, or 3 at most. (Not more,
because this is likely to run in parallel with other tests anyway.)
-Fc and -j together are not allowed; -j is only allowed with the directory format.
$ pg_dump -Fc -j2
pg_dump: error: parallel backup only supported by the directory format
Using just the directory format, on my laptop with a dev build (because
that's what most developers will use when running the tests):
$ meson test -C $BuildDir pg_upgrade/002_pg_upgrade | grep 002_pg_upgrade
without dump/restore test
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
33.51s 19 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
34.22s 19 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
34.64s 19 subtests passed
without -j, extra ~9 seconds
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
43.33s 28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
43.25s 28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
43.10s 28 subtests passed
with -j2, extra 7.5 seconds
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
42.77s 28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
41.67s 28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
41.88s 28 subtests passed
with -j3, extra 7 seconds
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
40.77s 28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
41.05s 28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
41.28s 28 subtests passed
Between -j2 and -j3 there's not much difference so we could use -j2.
But it still takes 7.5 extra seconds, which is almost 20% extra time. Do
you think that will be acceptable? I saw Andres mention somewhere
that he runs this test quite frequently. Please note that I would very
much like this test to be run by default, but Tom Lane has expressed a
concern about adding even that much time [1] to run the test and
mentioned that he would like the test to be opt-in.
When I started writing the test a year ago, people raised concerns
about how useful the test would be. Within a year it has shown 4 bugs.
I have a similar feeling about the formats: their value is doubtful now
but will prove useful soon, especially with the work happening on dump
formats in nearby threads. If we run the test by default, we could run
the directory format with -j by default and leave the other formats as
opt-in, OR just forget those formats for now. But if we are going to
make it opt-in, testing all formats gives the extra coverage.
About the format coverage, the split so far is that Daniel and I are
for including all formats when running the test as opt-in, while Alvaro
and Vignesh are for just one format. We need a tie-breaker or someone
amongst us needs to change their vote :D.
--
Best Wishes,
Ashutosh Bapat
I passed PROVE_FLAGS="--timer -v" to get the timings and ran under
--format=directory.
Without new test:
ok 23400 ms ( 0.00 usr 0.00 sys + 2.84 cusr 1.53 csys = 4.37 CPU)
ok 23409 ms ( 0.00 usr 0.01 sys + 2.81 cusr 1.53 csys = 4.35 CPU)
With new test, under --format=directory:
-j2 (parallel, default gzip compression)
ok 27517 ms ( 0.00 usr 0.00 sys + 3.92 cusr 1.86 csys = 5.78 CPU)
ok 27772 ms ( 0.01 usr 0.00 sys + 3.96 cusr 1.86 csys = 5.83 CPU)
ok 27654 ms ( 0.00 usr 0.00 sys + 3.81 cusr 1.94 csys = 5.75 CPU)
ok 27663 ms ( 0.00 usr 0.00 sys + 4.11 cusr 1.71 csys = 5.82 CPU)
-j2 --compress=0
ok 27710 ms ( 0.00 usr 0.00 sys + 3.79 cusr 1.86 csys = 5.65 CPU)
ok 27567 ms ( 0.01 usr 0.00 sys + 3.67 cusr 1.96 csys = 5.64 CPU)
ok 27582 ms ( 0.00 usr 0.00 sys + 3.60 cusr 1.90 csys = 5.50 CPU)
ok 27519 ms ( 0.01 usr 0.00 sys + 3.71 cusr 1.80 csys = 5.52 CPU)
-j2 --compress=zstd
ok 27240 ms ( 0.01 usr 0.00 sys + 3.65 cusr 2.10 csys = 5.76 CPU)
ok 27301 ms ( 0.01 usr 0.00 sys + 3.77 cusr 1.97 csys = 5.75 CPU)
-j2 --compress=zstd:1
ok 27695 ms ( 0.01 usr 0.00 sys + 3.66 cusr 2.05 csys = 5.72 CPU)
ok 27671 ms ( 0.01 usr 0.00 sys + 3.76 cusr 1.95 csys = 5.72 CPU)
--compress=zstd:1 (no parallelism)
ok 28417 ms ( 0.01 usr 0.00 sys + 3.90 cusr 1.75 csys = 5.66 CPU)
ok 28388 ms ( 0.00 usr 0.00 sys + 3.74 cusr 1.81 csys = 5.55 CPU)
--compress=zstd (no parallelism)
ok 28310 ms ( 0.00 usr 0.01 sys + 3.81 cusr 1.83 csys = 5.65 CPU)
ok 28277 ms ( 0.01 usr 0.00 sys + 3.71 cusr 1.87 csys = 5.59 CPU)
So apparently, zstd if available is a bit better than gzip, and
parallelism is better than none. But the differences are small -- half a
second or so. The total increase in runtime in the best case is about
four seconds. In all cases I used the same parallelism in pg_restore
as in pg_dump; not sure if that could cause a difference.
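For reference, the variants timed above correspond to flag combinations along these lines (a hypothetical sketch: argument lists only, nothing is executed, and connection details are assumed):

```python
# Directory-format pg_dump variants from the timings above, built as
# argument lists. --compress controls the per-file compression of the
# directory archive; -j sets the number of parallel worker processes.
base = ['pg_dump', '-Fd', '--no-sync', '--create']

variants = {
    'gzip -j2': base + ['-j', '2'],                        # default gzip
    'no compression -j2': base + ['-j', '2', '--compress=0'],
    'zstd -j2': base + ['-j', '2', '--compress=zstd'],
    'zstd:1 -j2': base + ['-j', '2', '--compress=zstd:1'],
    'zstd:1 serial': base + ['--compress=zstd:1'],         # no -j
}

for name, cmd in variants.items():
    print(name, '->', ' '.join(cmd))
```

zstd support depends on how the binaries were built, which is one reason the thread treats it as a nice-to-have rather than a default.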
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
On Fri, Mar 21, 2025 at 8:13 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I passed PROVE_FLAGS="--timer -v" to get the timings and run under
--format=directory.

Without new test:
ok 23400 ms ( 0.00 usr 0.00 sys + 2.84 cusr 1.53 csys = 4.37 CPU)
ok 23409 ms ( 0.00 usr 0.01 sys + 2.81 cusr 1.53 csys = 4.35 CPU)

With new test, under --format=directory:

-j2 (parallel, default gzip compression)
ok 27517 ms ( 0.00 usr 0.00 sys + 3.92 cusr 1.86 csys = 5.78 CPU)
ok 27772 ms ( 0.01 usr 0.00 sys + 3.96 cusr 1.86 csys = 5.83 CPU)
ok 27654 ms ( 0.00 usr 0.00 sys + 3.81 cusr 1.94 csys = 5.75 CPU)
ok 27663 ms ( 0.00 usr 0.00 sys + 4.11 cusr 1.71 csys = 5.82 CPU)

-j2 --compress=0
ok 27710 ms ( 0.00 usr 0.00 sys + 3.79 cusr 1.86 csys = 5.65 CPU)
ok 27567 ms ( 0.01 usr 0.00 sys + 3.67 cusr 1.96 csys = 5.64 CPU)
ok 27582 ms ( 0.00 usr 0.00 sys + 3.60 cusr 1.90 csys = 5.50 CPU)
ok 27519 ms ( 0.01 usr 0.00 sys + 3.71 cusr 1.80 csys = 5.52 CPU)

-j2 --compress=zstd
ok 27240 ms ( 0.01 usr 0.00 sys + 3.65 cusr 2.10 csys = 5.76 CPU)
ok 27301 ms ( 0.01 usr 0.00 sys + 3.77 cusr 1.97 csys = 5.75 CPU)

-j2 --compress=zstd:1
ok 27695 ms ( 0.01 usr 0.00 sys + 3.66 cusr 2.05 csys = 5.72 CPU)
ok 27671 ms ( 0.01 usr 0.00 sys + 3.76 cusr 1.95 csys = 5.72 CPU)

--compress=zstd:1 (no parallelism)
ok 28417 ms ( 0.01 usr 0.00 sys + 3.90 cusr 1.75 csys = 5.66 CPU)
ok 28388 ms ( 0.00 usr 0.00 sys + 3.74 cusr 1.81 csys = 5.55 CPU)

--compress=zstd (no parallelism)
ok 28310 ms ( 0.00 usr 0.01 sys + 3.81 cusr 1.83 csys = 5.65 CPU)
ok 28277 ms ( 0.01 usr 0.00 sys + 3.71 cusr 1.87 csys = 5.59 CPU)

So apparently, zstd if available is a bit better than gzip and
parallelism is better than no. But the differences are small -- half a
second or so. The total increase in runtime in the best case is about
four seconds. In all cases I used the same parallelism in pg_restore
than pg_dump; not sure if that could cause a difference.
I used the same parallelism in pg_restore and pg_dump too. And your
numbers seem to be similar to mine; slightly less than 20% slowdown.
But is that slowdown acceptable? From the earlier discussions, it
seems the answer is No. Haven't heard otherwise.
--
Best Wishes,
Ashutosh Bapat
On 2025-Mar-21, Ashutosh Bapat wrote:
I used the same parallelism in pg_restore and pg_dump too. And your
numbers seem to be similar to mine; slightly less than 20% slowdown.
But is that slowdown acceptable? From the earlier discussions, it
seems the answer is No. Haven't heard otherwise.
I don't think we need to see this slowdown in relative terms, the way we
would discuss a change in the executor. This is not a change that
would affect user-level stuff in any way. We need to see it in absolute
terms: in machines similar to mine, the pg_upgrade test would go from
taking 23s to taking 27s. This is 4s slower, but this isn't an increase
in total test runtime, because decently run test suites run multiple
tests in parallel. This is the same point Peter made in [1]. The total
test runtime change might not be *that* large. I'll take a few numbers
and report back.
[1]: /messages/by-id/b0635739-39f0-4a29-9127-f62aa570a2d8@eisentraut.org
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"I love the Postgres community. It's all about doing things _properly_. :-)"
(David Garamond)
On Fri, Mar 21, 2025 at 11:38 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Mar-21, Ashutosh Bapat wrote:
I used the same parallelism in pg_restore and pg_dump too. And your
numbers seem to be similar to mine; slightly less than 20% slowdown.
But is that slowdown acceptable? From the earlier discussions, it
seems the answer is No. Haven't heard otherwise.
I don't think we need to see this slowdown in relative terms, the way we
would discuss a change in the executor. This is not a change that
would affect user-level stuff in any way. We need to see it in absolute
terms: in machines similar to mine, the pg_upgrade test would go from
taking 23s to taking 27s. This is 4s slower, but this isn't an increase
in total test runtime, because decently run test suites run multiple
tests in parallel. This is the same point Peter made in [1]. The total
test runtime change might not be *that* large. I'll take a few numbers
and report back.
Using -j2 in pg_dump and -j3 in pg_restore does not improve timing
much on my laptop. I have used -j2 for both pg_dump and restore
instead of -j3 so as to avoid using more cores when tests are run in
parallel.
Further to reduce run time, I tried -1/--single-transaction but that's
not allowed with --create. I also tried --transaction-size=1000 but
that doesn't affect the run time of the test. Next I thought of using
standard output and input instead of files, but that doesn't help since
1. the directory format cannot use them and it's the only format allowing
parallelism, and 2. it's slower than using files with --no-sync. I didn't
find any other way to reduce the test time.
Please note that the dumps taken for comparison cannot use -j since
they are required to be in "plain" format so that text manipulation
comparison works on them.
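To illustrate, the comparison step boils down to plain text normalization followed by a diff; a minimal sketch with hypothetical file names and contents (the real test normalizes newlines and strips blank lines in adjust_regress_dumpfile before diffing):

```shell
# Hypothetical sketch of the comparison step: normalize two plain-format
# dumps (CRLF -> LF, drop blank lines, as adjust_regress_dumpfile does)
# and diff the results. File names and contents are made up.
printf 'CREATE TABLE t (\r\n    a integer\r\n);\r\n\r\n\r\n' > src.sql
printf 'CREATE TABLE t (\n    a integer\n);\n' > dst.sql
tr -d '\r' < src.sql | sed '/^$/d' > src_adjusted.sql
tr -d '\r' < dst.sql | sed '/^$/d' > dst_adjusted.sql
diff src_adjusted.sql dst_adjusted.sql && echo 'dumps match'
```

With real dumps, the adjusted files are what compare_files() would receive.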
One concern I have with the directory format is that the dumped database is
not readable. This might make investigating a bug identified by the test a
bit more complex. But I guess, in such a case, the investigator can either
use the dumps taken for comparison or change the code to use plain
format for investigation. So it's a price we pay for making the test
faster.
Here's next patchset:
0001 - it's the same 0001 patch as previous one, includes the test
with all formats and also the PG_TEST_EXTRA option
0002 - removes PG_TEST_EXTRA and also tests only one format
--directory with -j2 with default compression. It should be merged
into 0001 before committing. This is a separate patch for now in case
we decide to go back to 0001.
0003 - same as 0002 in the previous patch set. It excludes statistics
from comparison; otherwise the test will fail because of the bug reported
at [1]. Ideally we shouldn't commit this patch, so as to test
statistics dump and restore, but in case we need the test to pass till
the bug is fixed, we should merge this patch to 0001 before
committing.
[1]: /messages/by-id/CAExHW5s47kmubpbbRJzSM-Zfe0Tj2O3GBagB7YAyE8rQ-V24Uw@mail.gmail.com
--
Best Wishes,
Ashutosh Bapat
Attachments:
0001-Test-pg_dump-restore-of-regression-objects-20250324.patchtext/x-patch; charset=US-ASCII; name=0001-Test-pg_dump-restore-of-regression-objects-20250324.patchDownload
From fcfd0d25ecd374d55970817b4d3ea2aecdd58251 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH 1/3] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade of the regression database left
behind by regression run. Modify it to test dump and restore of the
regression database as well.
Regression database created by regression run contains almost all the
database objects supported by PostgreSQL in various states. Hence the
new testcase covers dump and restore scenarios not covered by individual
dump/restore cases. Till now 002_pg_upgrade only tested dump/restore
through pg_upgrade which only uses binary mode. Many regression tests
mention that they leave objects behind for dump/restore testing but they
are not tested in a non-binary mode. The new testcase closes that
gap.
Testing dump and restore of regression database makes this test run
longer for a relatively smaller benefit. Hence run it only when
explicitly requested by user by specifying "regress_dump_test" in
PG_TEST_EXTRA.
Note for reviewers:
The new test has uncovered many bugs so far in one year.
1. Introduced by 14e87ffa5c54. Fixed in fd41ba93e4630921a72ed5127cd0d552a8f3f8fc.
2. Introduced by 0413a556990ba628a3de8a0b58be020fd9a14ed0. Reverted in 74563f6b90216180fc13649725179fc119dddeb5.
3. Fixed by d611f8b1587b8f30caa7c0da99ae5d28e914d54f.
4. Being discussed on hackers at https://www.postgresql.org/message-id/CAExHW5s47kmubpbbRJzSM-Zfe0Tj2O3GBagB7YAyE8rQ-V24Uw@mail.gmail.com
Author: Ashutosh Bapat
Reviewed by: Michael Paquier, Daniel Gustafsson, Tom Lane, Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
doc/src/sgml/regress.sgml | 12 ++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 144 ++++++++++++++++-
src/test/perl/Makefile | 2 +
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 167 ++++++++++++++++++++
src/test/perl/meson.build | 1 +
5 files changed, 324 insertions(+), 2 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 0e5e8e8f309..237b974b3ab 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -357,6 +357,18 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_test</literal></term>
+ <listitem>
+ <para>
+ When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
+ tests dump and restore of regression database left behind by the
+ regression run. Not enabled by default because it is time and resource
+ consuming.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 00051b85035..d08eea6693f 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -12,6 +12,7 @@ use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -35,8 +36,8 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
sub filter_dump
{
my ($is_old, $old_version, $dump_file) = @_;
@@ -262,6 +263,21 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate TAP test, but doing it here saves us one full
+ # regression run.
+ #
+ # This step takes several extra seconds and some extra disk space, so
+ # requires an opt-in with the PG_TEST_EXTRA environment variable.
+ #
+ # Do this while the old cluster is running before it is shut down by the
+ # upgrade test.
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
+ {
+ test_regression_dump_restore($oldnode, %node_params);
+ }
}
# Initialize a new node for the upgrade.
@@ -539,4 +555,128 @@ my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
compare_files($dump1_filtered, $dump2_filtered,
'old and new dumps match after pg_upgrade');
+# Test dump and restore of objects left behind by the regression run.
+#
+# It is expected that regression tests, which create `regression` database, are
+# run on `src_node`, which in turn, is left in running state. A fresh node is
+# created using given `node_params`, which are expected to be the same ones used
+# to create `src_node`, so as to avoid any differences in the databases.
+#
+# Plain dumps from both the nodes are compared to make sure that all the dumped
+# objects are restored faithfully.
+sub test_regression_dump_restore
+{
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+
+ # Make sure that the source and destination nodes have the same version and
+ # do not use custom install paths. In both the cases, the dump files may
+ # require additional adjustments unknown to code here. Do not run this test
+ # in such a case to avoid utilizing the time and resources unnecessarily.
+ if ($src_node->pg_version != $dst_node->pg_version
+ or defined $src_node->{_install_path})
+ {
+ fail("same version dump and restore test using default installation");
+ return;
+ }
+
+ # Dump the original database for comparison later.
+ my $src_dump =
+ get_dump_for_comparison($src_node, 'regression', 'src_dump', 1);
+
+ # Setup destination database cluster
+ $dst_node->init(%node_params);
+ # Stabilize stats for comparison.
+ $dst_node->append_conf('postgresql.conf', 'autovacuum = off');
+ $dst_node->start;
+
+ # Test all formats one by one.
+ for my $format ('plain', 'tar', 'directory', 'custom')
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $restored_db = 'regression_' . $format;
+
+ # Use --create in dump and restore commands so that the restored
+ # database has the same configurable variable settings as the original
+ # database and the plain dumps taken for comparison do not differ
+ # because of locale changes. Additionally this provides test coverage
+ # for --create option.
+ $src_node->command_ok(
+ [
+ 'pg_dump', "-F$format", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '--create', '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
+
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [ 'psql', '-d', 'postgres', '-f', $dump_file ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '--create',
+ '-d', 'postgres', $dump_file
+ ];
+ }
+ $dst_node->command_ok(@restore_command,
+ "restored dump taken in $format format on destination instance");
+
+ my $dst_dump =
+ get_dump_for_comparison($dst_node, 'regression',
+ 'dest_dump.' . $format, 0);
+
+ compare_files($src_dump, $dst_dump,
+ "dump outputs from original and restored regression database (using $format format) match"
+ );
+
+ # Rename the restored database so that it is available for debugging in
+ # case the test fails.
+ $dst_node->safe_psql('postgres', "ALTER DATABASE regression RENAME TO $restored_db");
+ }
+}
+
+# Dump database `db` from the given `node` in plain format and adjust it for
+# comparing dumps from the original and the restored database.
+#
+# `file_prefix` is used to create unique names for all dump files so that they
+# remain available for debugging in case the test fails.
+#
+# `adjust_child_columns` is passed to adjust_regress_dumpfile() which actually
+# adjusts the dump output.
+#
+# The name of the file containing the adjusted dump is returned.
+sub get_dump_for_comparison
+{
+ my ($node, $db, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+ # Usually we avoid comparing statistics in our tests since it is flaky by
+ # nature. However, if statistics is dumped and restored it is expected to be
+ # restored as it is i.e. the statistics from the original database and that
+ # from the restored database should match. We turn off autovacuum on the
+ # source and the target database to avoid any statistics update during
+ # restore operation. Hence we do not exclude statistics from dump.
+ $node->command_ok(
+ [
+ 'pg_dump', '--no-sync', '-d', $node->connstr($db), '-f',
+ $dumpfile
+ ],
+ 'dump for comparison succeeded');
+
+ open(my $dh, '>', $dump_adjusted)
+ || die
+ "could not open $dump_adjusted for writing the adjusted dump: $!";
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
+}
+
done_testing();
diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile
index d82fb67540e..def89650ead 100644
--- a/src/test/perl/Makefile
+++ b/src/test/perl/Makefile
@@ -26,6 +26,7 @@ install: all installdirs
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Cluster.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/BackgroundPsql.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustUpgrade.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustDump.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Version.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
uninstall:
@@ -36,6 +37,7 @@ uninstall:
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
endif
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..74b9a60cf34
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,167 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $adjust_child_columns);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and restore tests
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+use Test::More;
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $adjust_child_columns)
+
+If we take dump of the regression database left behind after running regression
+tests, restore the dump, and take dump of the restored regression database, the
+outputs of both the dumps differ in the following cases. This routine adjusts
+the given dump so that dump outputs from the original and restored database,
+respectively, match.
+
+Case 1: Some regression tests purposefully create child tables in such a way
+that the order of their inherited columns differ from column orders of their
+respective parents. In the restored database, however, the order of their
+inherited columns are same as that of their respective parents. Thus the column
+orders of these child tables in the original database and those in the restored
+database differ, causing difference in the dump outputs. See MergeAttributes()
+and dumpTableSchema() for details. This routine rearranges the column
+declarations in the relevant C<CREATE TABLE... INHERITS> statements in the dump
+file from original database to match those from the restored database. We could,
+instead, adjust the statements in the dump from the restored database to match
+those from original database or adjust both to a canonical order. But we have
+chosen to adjust the statements in the dump from original database for no
+particular reason.
+
+Case 2: When dumping COPY statements the columns are ordered by their attribute
+number by fmtCopyColumnList(). If a column is added to a parent table after a
+child has inherited the parent and the child has its own columns, the attribute
+number of the column changes after restoring the child table. This is because
+when executing the dumped C<CREATE TABLE... INHERITS> statement all the parent
+attributes are created before any child attributes. Thus the order of columns in
+COPY statements dumped from the original and the restored databases,
+respectively, differs. Such tables in regression tests are listed below. It is
+hard to adjust the column order in the COPY statement along with the data. Hence
+we just remove such COPY statements from the dump output.
+
+Additionally the routine adjusts blank and new lines to avoid noise.
+
+Note: Usually we avoid comparing statistics in our tests since it is flaky by
+nature. However, if statistics is dumped and restored it is expected to be
+restored as it is i.e. the statistics from the original database and that from
+the restored database should match. Hence we do not filter statistics from dump,
+if it's dumped.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<adjust_child_columns>: 1 indicates that the given dump file requires
+adjusting columns in the child tables; usually when the dump is from original
+database. 0 indicates no such adjustment is needed; usually when the dump is
+from restored database.
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $adjust_child_columns) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($adjust_child_columns)
+ {
+ my $saved_dump = $dump;
+
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_stored_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_virtual_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_virtual_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c1 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c2 adjustments');
+ }
+
+ # Remove COPY statements with differing column order
+ for my $table (
+ 'public\.b_star', 'public\.c_star',
+ 'public\.cc2', 'public\.d_star',
+ 'public\.e_star', 'public\.f_star',
+ 'public\.renamecolumnanother', 'public\.renamecolumnchild',
+ 'public\.test_type_diff2_c1', 'public\.test_type_diff2_c2',
+ 'public\.test_type_diff_c')
+ {
+ $dump =~ s/^COPY\s$table\s\(.+?^\\\.$//sm;
+ }
+
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
diff --git a/src/test/perl/meson.build b/src/test/perl/meson.build
index 58e30f15f9d..492ca571ff8 100644
--- a/src/test/perl/meson.build
+++ b/src/test/perl/meson.build
@@ -14,4 +14,5 @@ install_data(
'PostgreSQL/Test/Cluster.pm',
'PostgreSQL/Test/BackgroundPsql.pm',
'PostgreSQL/Test/AdjustUpgrade.pm',
+ 'PostgreSQL/Test/AdjustDump.pm',
install_dir: dir_pgxs / 'src/test/perl/PostgreSQL/Test')
base-commit: 73eba5004a06a744b6b8570e42432b9e9f75997b
--
2.34.1
0003-Do-not-dump-statistics-in-the-file-dumped-f-20250324.patchtext/x-patch; charset=US-ASCII; name=0003-Do-not-dump-statistics-in-the-file-dumped-f-20250324.patchDownload
From 435c659489b34a803675abb65144fab6f0550432 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Tue, 25 Feb 2025 11:42:51 +0530
Subject: [PATCH 3/3] Do not dump statistics in the file dumped for comparison
The dumped and restored statistics of a materialized view may differ as
reported in [1]. Hence do not dump the statistics to avoid differences
in the dump output from the original and restored database.
[1] https://www.postgresql.org/message-id/CAExHW5s47kmubpbbRJzSM-Zfe0Tj2O3GBagB7YAyE8rQ-V24Uw@mail.gmail.com
Ashutosh Bapat
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index cbd9831bf9e..abe93a49258 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -630,15 +630,15 @@ sub get_dump_for_comparison
my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
my $dump_adjusted = "${dumpfile}_adjusted";
- # Usually we avoid comparing statistics in our tests since it is flaky by
- # nature. However, if statistics is dumped and restored it is expected to be
- # restored as it is i.e. the statistics from the original database and that
- # from the restored database should match. We turn off autovacuum on the
- # source and the target database to avoid any statistics update during
- # restore operation. Hence we do not exclude statistics from dump.
+ # If statistics is dumped and restored it is expected to be restored as it
+ # is i.e. the statistics from the original database and that from the
+ # restored database should match. We turn off autovacuum on the source and
+ # the target database to avoid any statistics update during restore
+ # operation. But as of now, there are cases when statistics is not being
+ # restored faithfully. Hence for now do not dump statistics.
$node->command_ok(
[
- 'pg_dump', '--no-sync', '-d', $node->connstr($db), '-f',
+ 'pg_dump', '--no-sync', '--no-statistics', '-d', $node->connstr($db), '-f',
$dumpfile
],
'dump for comparison succeeded');
--
2.34.1
0002-Use-only-one-format-and-make-the-test-run-d-20250324.patchtext/x-patch; charset=US-ASCII; name=0002-Use-only-one-format-and-make-the-test-run-d-20250324.patchDownload
From f26f88364a196dc9589ca451cb54f5e514e3422e Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Mon, 24 Mar 2025 11:21:12 +0530
Subject: [PATCH 2/3] Use only one format and make the test run default
According to Alvaro (and I agree with him), the test should be run by
default. Otherwise we get to know about a bug only after a buildfarm
animal where it's enabled reports a failure. Further, testing only one
format may suffice, since all the formats have shown the same bugs till
now.
If we use the directory format we can use -j, which reduces the time taken
by the dump/restore test by about 12%.
This patch removes the PG_TEST_EXTRA option and runs the test only in
directory format with parallelism enabled.
Note for committer: If we decide to accept this change, it should be
merged with the previous commit.
---
doc/src/sgml/regress.sgml | 12 ----
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 76 +++++++++-----------------
2 files changed, 25 insertions(+), 63 deletions(-)
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 237b974b3ab..0e5e8e8f309 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -357,18 +357,6 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
-
- <varlistentry>
- <term><literal>regress_dump_test</literal></term>
- <listitem>
- <para>
- When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
- tests dump and restore of regression database left behind by the
- regression run. Not enabled by default because it is time and resource
- consuming.
- </para>
- </listitem>
- </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index d08eea6693f..cbd9831bf9e 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -268,16 +268,9 @@ else
# should be done in a separate TAP test, but doing it here saves us one full
# regression run.
#
- # This step takes several extra seconds and some extra disk space, so
- # requires an opt-in with the PG_TEST_EXTRA environment variable.
- #
# Do this while the old cluster is running before it is shut down by the
# upgrade test.
- if ( $ENV{PG_TEST_EXTRA}
- && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
- {
- test_regression_dump_restore($oldnode, %node_params);
- }
+ test_regression_dump_restore($oldnode, %node_params);
}
# Initialize a new node for the upgrade.
@@ -590,53 +583,34 @@ sub test_regression_dump_restore
$dst_node->append_conf('postgresql.conf', 'autovacuum = off');
$dst_node->start;
- # Test all formats one by one.
- for my $format ('plain', 'tar', 'directory', 'custom')
- {
- my $dump_file = "$tempdir/regression_dump.$format";
- my $restored_db = 'regression_' . $format;
-
- # Use --create in dump and restore commands so that the restored
- # database has the same configurable variable settings as the original
- # database and the plain dumps taken for comparsion do not differ
- # because of locale changes. Additionally this provides test coverage
- # for --create option.
- $src_node->command_ok(
- [
- 'pg_dump', "-F$format", '--no-sync',
- '-d', $src_node->connstr('regression'),
- '--create', '-f', $dump_file
- ],
- "pg_dump on source instance in $format format");
+ my $dump_file = "$tempdir/regression.dump";
- my @restore_command;
- if ($format eq 'plain')
- {
- # Restore dump in "plain" format using `psql`.
- @restore_command = [ 'psql', '-d', 'postgres', '-f', $dump_file ];
- }
- else
- {
- @restore_command = [
- 'pg_restore', '--create',
- '-d', 'postgres', $dump_file
- ];
- }
- $dst_node->command_ok(@restore_command,
- "restored dump taken in $format format on destination instance");
+ # Use --create in dump and restore commands so that the restored database
+ # has the same configurable variable settings as the original database so
+ # that the plain dumps taken from both the databases for comparison do
+ # not differ because of locale changes. Additionally this provides test
+ # coverage for --create option.
+ #
+ # We use directory format which allows dumping and restoring in parallel to
+ # reduce the test's run time.
+ $src_node->command_ok(
+ [
+ 'pg_dump', '-Fd', '-j2', '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '--create', '-f', $dump_file
+ ],
+ "pg_dump on source instance succeeded");
- my $dst_dump =
- get_dump_for_comparison($dst_node, 'regression',
- 'dest_dump.' . $format, 0);
+ $dst_node->command_ok(
+ [ 'pg_restore', '--create', '-j2', '-d', 'postgres', $dump_file ],
+ "restored dump to destination instance");
- compare_files($src_dump, $dst_dump,
- "dump outputs from original and restored regression database (using $format format) match"
- );
+ my $dst_dump = get_dump_for_comparison($dst_node, 'regression',
+ 'dest_dump', 0);
- # Rename the restored database so that it is available for debugging in
- # case the test fails.
- $dst_node->safe_psql('postgres', "ALTER DATABASE regression RENAME TO $restored_db");
- }
+ compare_files($src_dump, $dst_dump,
+ "dump outputs from original and restored regression database match"
+ );
}
# Dump database `db` from the given `node` in plain format and adjust it for
--
2.34.1
On 24 Mar 2025, at 10:54, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
0003 - same as 0002 in the previous patch set. It excludes statistics
from comparison, otherwise the test will fail because of bug reported
at [1]. Ideally we shouldn't commit this patch so as to test
statistics dump and restore, but in case we need the test to pass till
the bug is fixed, we should merge this patch to 0001 before
committing.
If the reported bug isn't fixed before feature freeze I think we should commit
this regardless as it has clearly shown value by finding bugs (though perhaps
under PG_TEST_EXTRA or in some disconnected state till the bug is fixed to limit the
blast-radius in the buildfarm).
--
Daniel Gustafsson
On 2025-Mar-24, Ashutosh Bapat wrote:
One concern I have with directory format is the dumped database is not
readable. This might make investigating a but identified the test a
bit more complex.
Oh, it's readable all right. You just need to use `pg_restore -f-` to
read it. No big deal.
So I ran this a few times:
/usr/bin/time make -j8 -Otarget -C /pgsql/build/master check-world -s PROVE_FLAGS="-c -j6" > /dev/null
commenting out the call to test_regression_dump_restore() to test how
much additional runtime the new test incurs.
With test:
136.95user 116.56system 1:13.23elapsed 346%CPU (0avgtext+0avgdata 250704maxresident)k
4928inputs+55333008outputs (114major+14784937minor)pagefaults 0swaps
138.11user 117.43system 1:15.54elapsed 338%CPU (0avgtext+0avgdata 278592maxresident)k
48inputs+55333464outputs (80major+14794494minor)pagefaults 0swaps
137.05user 113.13system 1:08.19elapsed 366%CPU (0avgtext+0avgdata 279272maxresident)k
48inputs+55330064outputs (83major+14758028minor)pagefaults 0swaps
without the new test:
135.46user 114.55system 1:14.69elapsed 334%CPU (0avgtext+0avgdata 145372maxresident)k
32inputs+55155256outputs (105major+14737549minor)pagefaults 0swaps
135.48user 114.57system 1:09.60elapsed 359%CPU (0avgtext+0avgdata 148224maxresident)k
16inputs+55155432outputs (95major+14749502minor)pagefaults 0swaps
133.76user 113.26system 1:14.92elapsed 329%CPU (0avgtext+0avgdata 148064maxresident)k
48inputs+55154952outputs (84major+14749531minor)pagefaults 0swaps
134.06user 113.83system 1:16.09elapsed 325%CPU (0avgtext+0avgdata 145940maxresident)k
32inputs+55155032outputs (83major+14738602minor)pagefaults 0swaps
The increase in duration here is less than a second.
My conclusion with these numbers is that it's not worth hiding this test
in PG_TEST_EXTRA. If we really wanted to save some total test runtime,
it might be better to write a regress schedule file for
027_stream_regress.pl which only runs the tests that emit useful WAL,
rather than all tests.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"The ability of users to misuse tools is, of course, legendary" (David Steele)
/messages/by-id/11b38a96-6ded-4668-b772-40f992132797@pgmasters.net
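For reference, the m:ss.cc "elapsed" figures in /usr/bin/time output like the above can be reduced to an average with a small sketch (the three values are copied from the "with test" runs; purely illustrative):

```shell
# Convert /usr/bin/time "elapsed" figures (m:ss.cc) to seconds and average
# them; the three values are copied from the "with test" runs quoted above.
printf '1:13.23elapsed\n1:15.54elapsed\n1:08.19elapsed\n' |
  awk -F'[:e]' '{ t += $1 * 60 + $2; n++ }
                END { printf "average: %.2f s\n", t / n }'
```

The run-to-run noise in these figures is larger than the difference between the two sets, consistent with the observation that the increase is under a second.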
On Mon, Mar 24, 2025 at 5:44 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Mar-24, Ashutosh Bapat wrote:
One concern I have with the directory format is that the dumped database is
not readable. This might make investigating a bug identified by the test a
bit more complex.
Oh, it's readable all right. You just need to use `pg_restore -f-` to
read it. No big deal.
So I ran this a few times:
/usr/bin/time make -j8 -Otarget -C /pgsql/build/master check-world -s PROVE_FLAGS="-c -j6" > /dev/null
commenting out the call to test_regression_dump_restore() to test how
much additional runtime the new test incurs.
With test:
136.95user 116.56system 1:13.23elapsed 346%CPU (0avgtext+0avgdata 250704maxresident)k
4928inputs+55333008outputs (114major+14784937minor)pagefaults 0swaps
138.11user 117.43system 1:15.54elapsed 338%CPU (0avgtext+0avgdata 278592maxresident)k
48inputs+55333464outputs (80major+14794494minor)pagefaults 0swaps
137.05user 113.13system 1:08.19elapsed 366%CPU (0avgtext+0avgdata 279272maxresident)k
48inputs+55330064outputs (83major+14758028minor)pagefaults 0swaps
without the new test:
135.46user 114.55system 1:14.69elapsed 334%CPU (0avgtext+0avgdata 145372maxresident)k
32inputs+55155256outputs (105major+14737549minor)pagefaults 0swaps
135.48user 114.57system 1:09.60elapsed 359%CPU (0avgtext+0avgdata 148224maxresident)k
16inputs+55155432outputs (95major+14749502minor)pagefaults 0swaps
133.76user 113.26system 1:14.92elapsed 329%CPU (0avgtext+0avgdata 148064maxresident)k
48inputs+55154952outputs (84major+14749531minor)pagefaults 0swaps
134.06user 113.83system 1:16.09elapsed 325%CPU (0avgtext+0avgdata 145940maxresident)k
32inputs+55155032outputs (83major+14738602minor)pagefaults 0swaps
The increase in duration here is less than a second.
My conclusion with these numbers is that it's not worth hiding this test
in PG_TEST_EXTRA.
Thanks for the conclusion.
On Mon, Mar 24, 2025 at 3:29 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 24 Mar 2025, at 10:54, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
0003 - same as 0002 in the previous patch set. It excludes statistics
from comparison, otherwise the test will fail because of bug reported
at [1]. Ideally we shouldn't commit this patch so as to test
statistics dump and restore, but in case we need the test to pass till
the bug is fixed, we should merge this patch to 0001 before
committing.

If the reported bug isn't fixed before feature freeze I think we should commit
this regardless as it has clearly shown value by finding bugs (though perhaps
under PG_TEST_EXTRA or in some disconnected till the bug is fixed to limit the
blast-radius in the buildfarm).
Combining Alvaro's and Daniel's recommendations, I think we should
squash all the three of my patches while committing the test if the
bug is not fixed by then. Otherwise we should squash first two patches
and commit it. Just attaching the patches again for reference.
If we really wanted to save some total test runtime,
it might be better to write a regress schedule file for
027_stream_regress.pl which only takes the test that emit useful WAL,
rather than all tests.
That's out of scope for this patch, but it seems like an idea worth exploring.
--
Best Wishes,
Ashutosh Bapat
Attachments:
0002-Use-only-one-format-and-make-the-test-run-d-20250324.patchtext/x-patch; charset=US-ASCII; name=0002-Use-only-one-format-and-make-the-test-run-d-20250324.patchDownload
From f26f88364a196dc9589ca451cb54f5e514e3422e Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Mon, 24 Mar 2025 11:21:12 +0530
Subject: [PATCH 2/3] Use only one format and make the test run default
According to Alvaro (and I agree with him), the test should be run by
default. Otherwise we get to know about a bug only after a buildfarm
animal where it's enabled reports a failure. Further, testing only one
format may suffice, since all the formats have shown the same bugs till
now.
If we use --directory format we can use -j which reduces the time taken
by dump/restore test by about 12%.
This patch removes the PG_TEST_EXTRA option and runs the test only in
directory format with parallelism enabled.
Note for committer: If we decide to accept this change, it should be
merged with the previous commit.
---
doc/src/sgml/regress.sgml | 12 ----
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 76 +++++++++-----------------
2 files changed, 25 insertions(+), 63 deletions(-)
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 237b974b3ab..0e5e8e8f309 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -357,18 +357,6 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
-
- <varlistentry>
- <term><literal>regress_dump_test</literal></term>
- <listitem>
- <para>
- When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
- tests dump and restore of regression database left behind by the
- regression run. Not enabled by default because it is time and resource
- consuming.
- </para>
- </listitem>
- </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index d08eea6693f..cbd9831bf9e 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -268,16 +268,9 @@ else
# should be done in a separate TAP test, but doing it here saves us one full
# regression run.
#
- # This step takes several extra seconds and some extra disk space, so
- # requires an opt-in with the PG_TEST_EXTRA environment variable.
- #
# Do this while the old cluster is running before it is shut down by the
# upgrade test.
- if ( $ENV{PG_TEST_EXTRA}
- && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
- {
- test_regression_dump_restore($oldnode, %node_params);
- }
+ test_regression_dump_restore($oldnode, %node_params);
}
# Initialize a new node for the upgrade.
@@ -590,53 +583,34 @@ sub test_regression_dump_restore
$dst_node->append_conf('postgresql.conf', 'autovacuum = off');
$dst_node->start;
- # Test all formats one by one.
- for my $format ('plain', 'tar', 'directory', 'custom')
- {
- my $dump_file = "$tempdir/regression_dump.$format";
- my $restored_db = 'regression_' . $format;
-
- # Use --create in dump and restore commands so that the restored
- # database has the same configurable variable settings as the original
- # database and the plain dumps taken for comparison do not differ
- # because of locale changes. Additionally this provides test coverage
- # for --create option.
- $src_node->command_ok(
- [
- 'pg_dump', "-F$format", '--no-sync',
- '-d', $src_node->connstr('regression'),
- '--create', '-f', $dump_file
- ],
- "pg_dump on source instance in $format format");
+ my $dump_file = "$tempdir/regression.dump";
- my @restore_command;
- if ($format eq 'plain')
- {
- # Restore dump in "plain" format using `psql`.
- @restore_command = [ 'psql', '-d', 'postgres', '-f', $dump_file ];
- }
- else
- {
- @restore_command = [
- 'pg_restore', '--create',
- '-d', 'postgres', $dump_file
- ];
- }
- $dst_node->command_ok(@restore_command,
- "restored dump taken in $format format on destination instance");
+ # Use --create in dump and restore commands so that the restored database
+ # has the same configurable variable settings as the original database and
+ # the plain dumps taken from both databases for comparison do
+ # not differ because of locale changes. Additionally this provides test
+ # coverage for the --create option.
+ #
+ # We use directory format which allows dumping and restoring in parallel to
+ # reduce the test's run time.
+ $src_node->command_ok(
+ [
+ 'pg_dump', '-Fd', '-j2', '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '--create', '-f', $dump_file
+ ],
+ "pg_dump on source instance succeeded");
- my $dst_dump =
- get_dump_for_comparison($dst_node, 'regression',
- 'dest_dump.' . $format, 0);
+ $dst_node->command_ok(
+ [ 'pg_restore', '--create', '-j2', '-d', 'postgres', $dump_file ],
+ "restored dump to destination instance");
- compare_files($src_dump, $dst_dump,
- "dump outputs from original and restored regression database (using $format format) match"
- );
+ my $dst_dump = get_dump_for_comparison($dst_node, 'regression',
+ 'dest_dump', 0);
- # Rename the restored database so that it is available for debugging in
- # case the test fails.
- $dst_node->safe_psql('postgres', "ALTER DATABASE regression RENAME TO $restored_db");
- }
+ compare_files($src_dump, $dst_dump,
+ "dump outputs from original and restored regression database match"
+ );
}
# Dump database `db` from the given `node` in plain format and adjust it for
--
2.34.1
0001-Test-pg_dump-restore-of-regression-objects-20250324.patchtext/x-patch; charset=US-ASCII; name=0001-Test-pg_dump-restore-of-regression-objects-20250324.patchDownload
From fcfd0d25ecd374d55970817b4d3ea2aecdd58251 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH 1/3] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade of the regression database left
behind by regression run. Modify it to test dump and restore of the
regression database as well.
Regression database created by regression run contains almost all the
database objects supported by PostgreSQL in various states. Hence the
new testcase covers dump and restore scenarios not covered by individual
dump/restore cases. Till now 002_pg_upgrade only tested dump/restore
through pg_upgrade which only uses binary mode. Many regression tests
mention that they leave objects behind for dump/restore testing but they
are not tested in a non-binary mode. The new testcase closes that
gap.
Testing dump and restore of regression database makes this test run
longer for a relatively smaller benefit. Hence run it only when
explicitly requested by user by specifying "regress_dump_test" in
PG_TEST_EXTRA.
Note for the reviewers:
The new test has uncovered many bugs so far in one year.
1. Introduced by 14e87ffa5c54. Fixed in fd41ba93e4630921a72ed5127cd0d552a8f3f8fc.
2. Introduced by 0413a556990ba628a3de8a0b58be020fd9a14ed0. Reverted in 74563f6b90216180fc13649725179fc119dddeb5.
3. Fixed by d611f8b1587b8f30caa7c0da99ae5d28e914d54f
4. Being discussed on hackers at https://www.postgresql.org/message-id/CAExHW5s47kmubpbbRJzSM-Zfe0Tj2O3GBagB7YAyE8rQ-V24Uw@mail.gmail.com
Author: Ashutosh Bapat
Reviewed by: Michael Paquier, Daniel Gustafsson, Tom Lane, Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
doc/src/sgml/regress.sgml | 12 ++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 144 ++++++++++++++++-
src/test/perl/Makefile | 2 +
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 167 ++++++++++++++++++++
src/test/perl/meson.build | 1 +
5 files changed, 324 insertions(+), 2 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 0e5e8e8f309..237b974b3ab 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -357,6 +357,18 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_test</literal></term>
+ <listitem>
+ <para>
+ When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
+ tests dump and restore of regression database left behind by the
+ regression run. Not enabled by default because it is time and resource
+ consuming.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 00051b85035..d08eea6693f 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -12,6 +12,7 @@ use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -35,8 +36,8 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
sub filter_dump
{
my ($is_old, $old_version, $dump_file) = @_;
@@ -262,6 +263,21 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate TAP test, but doing it here saves us one full
+ # regression run.
+ #
+ # This step takes several extra seconds and some extra disk space, so
+ # requires an opt-in with the PG_TEST_EXTRA environment variable.
+ #
+ # Do this while the old cluster is running before it is shut down by the
+ # upgrade test.
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
+ {
+ test_regression_dump_restore($oldnode, %node_params);
+ }
}
# Initialize a new node for the upgrade.
@@ -539,4 +555,128 @@ my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
compare_files($dump1_filtered, $dump2_filtered,
'old and new dumps match after pg_upgrade');
+# Test dump and restore of objects left behind by the regression run.
+#
+# It is expected that regression tests, which create `regression` database, are
+# run on `src_node`, which in turn, is left in running state. A fresh node is
+# created using given `node_params`, which are expected to be the same ones used
+# to create `src_node`, so as to avoid any differences in the databases.
+#
+# Plain dumps from both the nodes are compared to make sure that all the dumped
+# objects are restored faithfully.
+sub test_regression_dump_restore
+{
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+
+ # Make sure that the source and destination nodes have the same version and
+ # do not use custom install paths. In both the cases, the dump files may
+ # require additional adjustments unknown to code here. Do not run this test
+ # in such a case to avoid utilizing the time and resources unnecessarily.
+ if ($src_node->pg_version != $dst_node->pg_version
+ or defined $src_node->{_install_path})
+ {
+ fail("same version dump and restore test using default installation");
+ return;
+ }
+
+ # Dump the original database for comparison later.
+ my $src_dump =
+ get_dump_for_comparison($src_node, 'regression', 'src_dump', 1);
+
+ # Setup destination database cluster
+ $dst_node->init(%node_params);
+ # Stabilize stats for comparison.
+ $dst_node->append_conf('postgresql.conf', 'autovacuum = off');
+ $dst_node->start;
+
+ # Test all formats one by one.
+ for my $format ('plain', 'tar', 'directory', 'custom')
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $restored_db = 'regression_' . $format;
+
+ # Use --create in dump and restore commands so that the restored
+ # database has the same configurable variable settings as the original
+ # database and the plain dumps taken for comparison do not differ
+ # because of locale changes. Additionally this provides test coverage
+ # for --create option.
+ $src_node->command_ok(
+ [
+ 'pg_dump', "-F$format", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '--create', '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
+
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [ 'psql', '-d', 'postgres', '-f', $dump_file ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '--create',
+ '-d', 'postgres', $dump_file
+ ];
+ }
+ $dst_node->command_ok(@restore_command,
+ "restored dump taken in $format format on destination instance");
+
+ my $dst_dump =
+ get_dump_for_comparison($dst_node, 'regression',
+ 'dest_dump.' . $format, 0);
+
+ compare_files($src_dump, $dst_dump,
+ "dump outputs from original and restored regression database (using $format format) match"
+ );
+
+ # Rename the restored database so that it is available for debugging in
+ # case the test fails.
+ $dst_node->safe_psql('postgres', "ALTER DATABASE regression RENAME TO $restored_db");
+ }
+}
+
+# Dump database `db` from the given `node` in plain format and adjust it for
+# comparing dumps from the original and the restored database.
+#
+# `file_prefix` is used to create unique names for all dump files so that they
+# remain available for debugging in case the test fails.
+#
+# `adjust_child_columns` is passed to adjust_regress_dumpfile() which actually
+# adjusts the dump output.
+#
+# The name of the file containing the adjusted dump is returned.
+sub get_dump_for_comparison
+{
+ my ($node, $db, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+ # Usually we avoid comparing statistics in our tests since it is flaky by
+ # nature. However, if statistics is dumped and restored it is expected to be
+ # restored as it is i.e. the statistics from the original database and that
+ # from the restored database should match. We turn off autovacuum on the
+ # source and the target database to avoid any statistics update during
+ # restore operation. Hence we do not exclude statistics from dump.
+ $node->command_ok(
+ [
+ 'pg_dump', '--no-sync', '-d', $node->connstr($db), '-f',
+ $dumpfile
+ ],
+ 'dump for comparison succeeded');
+
+ open(my $dh, '>', $dump_adjusted)
+ || die
+ "could not open $dump_adjusted for writing the adjusted dump: $!";
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
+}
+
done_testing();
diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile
index d82fb67540e..def89650ead 100644
--- a/src/test/perl/Makefile
+++ b/src/test/perl/Makefile
@@ -26,6 +26,7 @@ install: all installdirs
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Cluster.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/BackgroundPsql.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustUpgrade.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustDump.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Version.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
uninstall:
@@ -36,6 +37,7 @@ uninstall:
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
endif
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..74b9a60cf34
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,167 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $adjust_child_columns);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and restore tests
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+use Test::More;
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $adjust_child_columns)
+
+If we take dump of the regression database left behind after running regression
+tests, restore the dump, and take dump of the restored regression database, the
+outputs of both the dumps differ in the following cases. This routine adjusts
+the given dump so that dump outputs from the original and restored database,
+respectively, match.
+
+Case 1: Some regression tests purposefully create child tables in such a way
+that the order of their inherited columns differ from column orders of their
+respective parents. In the restored database, however, the order of their
+inherited columns are same as that of their respective parents. Thus the column
+orders of these child tables in the original database and those in the restored
+database differ, causing difference in the dump outputs. See MergeAttributes()
+and dumpTableSchema() for details. This routine rearranges the column
+declarations in the relevant C<CREATE TABLE... INHERITS> statements in the dump
+file from original database to match those from the restored database. We could,
+instead, adjust the statements in the dump from the restored database to match
+those from original database or adjust both to a canonical order. But we have
+chosen to adjust the statements in the dump from original database for no
+particular reason.
+
+Case 2: When dumping COPY statements the columns are ordered by their attribute
+number by fmtCopyColumnList(). If a column is added to a parent table after a
+child has inherited the parent and the child has its own columns, the attribute
+number of the column changes after restoring the child table. This is because
+when executing the dumped C<CREATE TABLE... INHERITS> statement all the parent
+attributes are created before any child attributes. Thus the order of columns in
+COPY statements dumped from the original and the restored databases,
+respectively, differs. Such tables in regression tests are listed below. It is
+hard to adjust the column order in the COPY statement along with the data. Hence
+we just remove such COPY statements from the dump output.
+
+Additionally the routine adjusts blank and new lines to avoid noise.
+
+Note: Usually we avoid comparing statistics in our tests since it is flaky by
+nature. However, if statistics is dumped and restored it is expected to be
+restored as it is i.e. the statistics from the original database and that from
+the restored database should match. Hence we do not filter statistics from dump,
+if it's dumped.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<adjust_child_columns>: 1 indicates that the given dump file requires
+adjusting columns in the child tables; usually when the dump is from original
+database. 0 indicates no such adjustment is needed; usually when the dump is
+from restored database.
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $adjust_child_columns) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($adjust_child_columns)
+ {
+ my $saved_dump = $dump;
+
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_stored_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_virtual_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_virtual_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c1 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c2 adjustments');
+ }
+
+ # Remove COPY statements with differing column order
+ for my $table (
+ 'public\.b_star', 'public\.c_star',
+ 'public\.cc2', 'public\.d_star',
+ 'public\.e_star', 'public\.f_star',
+ 'public\.renamecolumnanother', 'public\.renamecolumnchild',
+ 'public\.test_type_diff2_c1', 'public\.test_type_diff2_c2',
+ 'public\.test_type_diff_c')
+ {
+ $dump =~ s/^COPY\s$table\s\(.+?^\\\.$//sm;
+ }
+
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
diff --git a/src/test/perl/meson.build b/src/test/perl/meson.build
index 58e30f15f9d..492ca571ff8 100644
--- a/src/test/perl/meson.build
+++ b/src/test/perl/meson.build
@@ -14,4 +14,5 @@ install_data(
'PostgreSQL/Test/Cluster.pm',
'PostgreSQL/Test/BackgroundPsql.pm',
'PostgreSQL/Test/AdjustUpgrade.pm',
+ 'PostgreSQL/Test/AdjustDump.pm',
install_dir: dir_pgxs / 'src/test/perl/PostgreSQL/Test')
base-commit: 73eba5004a06a744b6b8570e42432b9e9f75997b
--
2.34.1
0003-Do-not-dump-statistics-in-the-file-dumped-f-20250324.patchtext/x-patch; charset=US-ASCII; name=0003-Do-not-dump-statistics-in-the-file-dumped-f-20250324.patchDownload
From 435c659489b34a803675abb65144fab6f0550432 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Tue, 25 Feb 2025 11:42:51 +0530
Subject: [PATCH 3/3] Do not dump statistics in the file dumped for comparison
The dumped and restored statistics of a materialized view may differ as
reported in [1]. Hence do not dump the statistics to avoid differences
in the dump output from the original and restored database.
[1] https://www.postgresql.org/message-id/CAExHW5s47kmubpbbRJzSM-Zfe0Tj2O3GBagB7YAyE8rQ-V24Uw@mail.gmail.com
Ashutosh Bapat
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index cbd9831bf9e..abe93a49258 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -630,15 +630,15 @@ sub get_dump_for_comparison
my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
my $dump_adjusted = "${dumpfile}_adjusted";
- # Usually we avoid comparing statistics in our tests since it is flaky by
- # nature. However, if statistics is dumped and restored it is expected to be
- # restored as it is i.e. the statistics from the original database and that
- # from the restored database should match. We turn off autovacuum on the
- # source and the target database to avoid any statistics update during
- # restore operation. Hence we do not exclude statistics from dump.
+ # If statistics is dumped and restored it is expected to be restored as it
+ # is i.e. the statistics from the original database and that from the
+ # restored database should match. We turn off autovacuum on the source and
+ # the target database to avoid any statistics update during restore
+ # operation. But as of now, there are cases when statistics is not being
+ # restored faithfully. Hence for now do not dump statistics.
$node->command_ok(
[
- 'pg_dump', '--no-sync', '-d', $node->connstr($db), '-f',
+ 'pg_dump', '--no-sync', '--no-statistics', '-d', $node->connstr($db), '-f',
$dumpfile
],
'dump for comparison succeeded');
--
2.34.1
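For readers less at home in Perl, the two adjustments that AdjustDump.pm (patch 0001 above) applies to a plain dump can be sketched in Python. The regexes are transliterations of the Perl ones; only one child table and one COPY table are shown as stand-ins for the full lists in the patch, and the toy dump text used below is assumed, not output from a real pg_dump run.

```python
import re

def adjust_dump(dump: str, adjust_child_columns: bool) -> str:
    """Mimic adjust_regress_dumpfile() from AdjustDump.pm on a plain dump."""
    # Normalize to Unix newlines, as the Perl code does.
    dump = dump.replace("\r\n", "\n")

    if adjust_child_columns:
        # Case 1: reorder the inherited child columns of
        # generated_stored_tests.gtestxx_4 to the order the restored
        # database would report them in.
        dump = re.sub(
            r"(^CREATE TABLE generated_stored_tests\.gtestxx_4 \()"
            r"(\n\s+b integer),"
            r"(\n\s+a integer NOT NULL)",
            r"\1\3,\2",
            dump,
            flags=re.MULTILINE,
        )

    # Case 2: drop COPY blocks whose column order cannot be stabilized
    # (public.b_star stands in for the full table list in the patch).
    dump = re.sub(
        r"^COPY public\.b_star \(.+?^\\\.$",
        "",
        dump,
        flags=re.MULTILINE | re.DOTALL,
    )

    # Suppress runs of blank lines, as pg_dump may emit more or fewer.
    return re.sub(r"\n\n+", "\n", dump)
```

Given a toy dump containing the child table followed by a COPY block, the function swaps the two column declarations and removes the COPY statement entirely, leaving only the reordered table definition for comparison.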
On Tue, 25 Mar 2025 at 16:09, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
Combining Alvaro's and Daniel's recommendations, I think we should
squash all the three of my patches while committing the test if the
bug is not fixed by then. Otherwise we should squash first two patches
and commit it. Just attaching the patches again for reference.
Couple of minor thoughts:
1) I felt this error message is not conveying the error message correctly:
+ if ($src_node->pg_version != $dst_node->pg_version
+ or defined $src_node->{_install_path})
+ {
+ fail("same version dump and restore test using default installation");
+ return;
+ }
how about something like below:
fail("source and destination nodes must have the same PostgreSQL version and default installation paths");
2) Should "`" be ' or " here, we generally use "`" to enclose commands:
+# It is expected that regression tests, which create `regression` database, are
+# run on `src_node`, which in turn, is left in running state. A fresh node is
+# created using given `node_params`, which are expected to be the same ones used
+# to create `src_node`, so as to avoid any differences in the databases.
There are a few other similar instances in the file.
Regards,
Vignesh
On Thu, Mar 27, 2025 at 6:01 PM vignesh C <vignesh21@gmail.com> wrote:
On Tue, 25 Mar 2025 at 16:09, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:On Mon, Mar 24, 2025 at 5:44 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Mar-24, Ashutosh Bapat wrote:
One concern I have with directory format is the dumped database is not
readable. This might make investigating a but identified the test a
bit more complex.Oh, it's readable all right. You just need to use `pg_restore -f-` to
read it. No big deal.So I ran this a few times:
/usr/bin/time make -j8 -Otarget -C /pgsql/build/master check-world -s PROVE_FLAGS="-c -j6" > /dev/nullcommenting out the call to test_regression_dump_restore() to test how
much additional runtime does the new test incur.With test:
136.95user 116.56system 1:13.23elapsed 346%CPU (0avgtext+0avgdata 250704maxresident)k
4928inputs+55333008outputs (114major+14784937minor)pagefaults 0swaps
138.11user 117.43system 1:15.54elapsed 338%CPU (0avgtext+0avgdata 278592maxresident)k
48inputs+55333464outputs (80major+14794494minor)pagefaults 0swaps
137.05user 113.13system 1:08.19elapsed 366%CPU (0avgtext+0avgdata 279272maxresident)k
48inputs+55330064outputs (83major+14758028minor)pagefaults 0swaps

without the new test:
135.46user 114.55system 1:14.69elapsed 334%CPU (0avgtext+0avgdata 145372maxresident)k
32inputs+55155256outputs (105major+14737549minor)pagefaults 0swaps
135.48user 114.57system 1:09.60elapsed 359%CPU (0avgtext+0avgdata 148224maxresident)k
16inputs+55155432outputs (95major+14749502minor)pagefaults 0swaps
133.76user 113.26system 1:14.92elapsed 329%CPU (0avgtext+0avgdata 148064maxresident)k
48inputs+55154952outputs (84major+14749531minor)pagefaults 0swaps
134.06user 113.83system 1:16.09elapsed 325%CPU (0avgtext+0avgdata 145940maxresident)k
32inputs+55155032outputs (83major+14738602minor)pagefaults 0swaps

The increase in duration here is less than a second.
My conclusion with these numbers is that it's not worth hiding this test
in PG_TEST_EXTRA.

Thanks for the conclusion.
On Mon, Mar 24, 2025 at 3:29 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 24 Mar 2025, at 10:54, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
0003 - same as 0002 in the previous patch set. It excludes statistics
from comparison, otherwise the test will fail because of bug reported
at [1]. Ideally we shouldn't commit this patch so as to test
statistics dump and restore, but in case we need the test to pass till
the bug is fixed, we should merge this patch to 0001 before
committing.

If the reported bug isn't fixed before feature freeze I think we should commit
this regardless as it has clearly shown value by finding bugs (though perhaps
under PG_TEST_EXTRA or in some disconnected till the bug is fixed to limit the
blast-radius in the buildfarm).

Combining Alvaro's and Daniel's recommendations, I think we should
squash all the three of my patches while committing the test if the
bug is not fixed by then. Otherwise we should squash first two patches
and commit it. Just attaching the patches again for reference.

Couple of minor thoughts:
1) I felt this error message is not conveying the error message correctly:
+    if ($src_node->pg_version != $dst_node->pg_version
+        or defined $src_node->{_install_path})
+    {
+        fail("same version dump and restore test using default installation");
+        return;
+    }
how about something like below:
fail("source and destination nodes must have the same PostgreSQL
version and default installation paths");
The text in ok(), fail() etc. are test names and not error messages.
See [1]. Your suggestion and other versions that I came up with became
too verbose to be test names. So I think the text here is a compromise
between conveying enough information and not being too long. We
usually have to pick the testname and lookup the test code to
investigate the failure. This text serves that purpose.
2) Should "`" be ' or " here, we generally use "`" to enclose commands:
+# It is expected that regression tests, which create `regression` database, are
+# run on `src_node`, which in turn, is left in running state. A fresh node is
+# created using given `node_params`, which are expected to be the same ones used
+# to create `src_node`, so as to avoid any differences in the databases.
Looking at the prologues of some other functions, I see that we don't add
any decoration around the name of the argument. Hence dropped ``
altogether. Will post it with the next set of patches.
[1]: https://metacpan.org/pod/Test::More
--
Best Wishes,
Ashutosh Bapat
On 2025-Mar-27, Ashutosh Bapat wrote:
On Thu, Mar 27, 2025 at 6:01 PM vignesh C <vignesh21@gmail.com> wrote:
Couple of minor thoughts:
1) I felt this error message is not conveying the error message correctly:
+    if ($src_node->pg_version != $dst_node->pg_version
+        or defined $src_node->{_install_path})
+    {
+        fail("same version dump and restore test using default installation");
+        return;
+    }
how about something like below:
fail("source and destination nodes must have the same PostgreSQL
version and default installation paths");

The text in ok(), fail() etc. are test names and not error messages.
See [1]. Your suggestion and other versions that I came up with became
too verbose to be test names. So I think the text here is a compromise
between conveying enough information and not being too long. We
usually have to pick the testname and lookup the test code to
investigate the failure. This text serves that purpose.
Maybe
fail("roundtrip dump/restore of the regression database")
BTW another idea to shorten this test's runtime might be to try and
identify which of parallel_schedule tests leave objects behind and
create a shorter schedule with only those (a possible implementation
might keep a list of the slow tests that don't leave any useful object
behind, then filter parallel_schedule to exclude those; this ensures
test files created in the future are still used.)
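For what it's worth, the filtering described above could be sketched roughly as follows (a minimal sketch, not an implementation: the exclusion set is a made-up example, and pg_regress schedule lines are assumed to be plain `test:` lines plus comments):

```python
# Sketch: filter an existing pg_regress parallel_schedule, dropping tests
# believed to leave no objects behind. EXCLUDE is a hypothetical example
# list, not an audited one.
EXCLUDE = {"random", "portals", "tsrf"}

def filter_schedule(text, exclude):
    out = []
    for line in text.splitlines():
        if line.startswith("test:"):
            tests = [t for t in line[len("test:"):].split()
                     if t not in exclude]
            if not tests:
                continue  # whole parallel group excluded
            line = "test: " + " ".join(tests)
        out.append(line)
    return "\n".join(out) + "\n"

schedule = "# comment\ntest: plancache limit random\ntest: portals tsrf\n"
print(filter_schedule(schedule, EXCLUDE), end="")
```

Because the full schedule stays the single source of truth, a test file added in the future is included by default and only disappears if it is explicitly listed in the exclusion set.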
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"I love the Postgres community. It's all about doing things _properly_. :-)"
(David Garamond)
On Thu, Mar 27, 2025 at 06:15:06PM +0100, Alvaro Herrera wrote:
BTW another idea to shorten this test's runtime might be to try and
identify which of parallel_schedule tests leave objects behind and
create a shorter schedule with only those (a possible implementation
might keep a list of the slow tests that don't leave any useful object
behind, then filter parallel_schedule to exclude those; this ensures
test files created in the future are still used.)
I'm not much a fan of approaches that require an extra schedule,
because this is prone to forget the addition of objects that we'd want
to cover for the scope of this thread with the dump/restore
inter-dependencies, failing our goal of having more coverage. And
history has proven that we are quite bad at maintaining multiple
schedules for the regression test suite (remember the serial one or
the standby one in pg_regress?). So we should really do things so that
the number of schedules is kept to a strict minimum: 1.
If we're worried about the time taken by the test (spoiler: I am and
the upgrade tests already always show as last to finish in parallel
runs), I would recommend to put that under a PG_TEST_EXTRA. I'm OK to
add the switch to my buildfarm animals if this option is the consensus
and if it gets into the tree.
--
Michael
On Thu, Mar 27, 2025 at 10:45 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Mar-27, Ashutosh Bapat wrote:
On Thu, Mar 27, 2025 at 6:01 PM vignesh C <vignesh21@gmail.com> wrote:
Couple of minor thoughts:
1) I felt this error message is not conveying the error message correctly:
+    if ($src_node->pg_version != $dst_node->pg_version
+        or defined $src_node->{_install_path})
+    {
+        fail("same version dump and restore test using default installation");
+        return;
+    }
how about something like below:
fail("source and destination nodes must have the same PostgreSQL
version and default installation paths");

The text in ok(), fail() etc. are test names and not error messages.
See [1]. Your suggestion and other versions that I came up with became
too verbose to be test names. So I think the text here is a compromise
between conveying enough information and not being too long. We
usually have to pick the testname and lookup the test code to
investigate the failure. This text serves that purpose.

Maybe
fail("roundtrip dump/restore of the regression database")
No, that's losing some information like default installation and the
same version.
--
Best Wishes,
Ashutosh Bapat
On Fri, Mar 28, 2025 at 7:07 AM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Mar 27, 2025 at 06:15:06PM +0100, Alvaro Herrera wrote:
BTW another idea to shorten this test's runtime might be to try and
identify which of parallel_schedule tests leave objects behind and
create a shorter schedule with only those (a possible implementation
might keep a list of the slow tests that don't leave any useful object
behind, then filter parallel_schedule to exclude those; this ensures
test files created in the future are still used.)

I'm not much a fan of approaches that require an extra schedule,
because this is prone to forget the addition of objects that we'd want
to cover for the scope of this thread with the dump/restore
inter-dependencies, failing our goal of having more coverage. And
history has proven that we are quite bad at maintaining multiple
schedules for the regression test suite (remember the serial one or
the standby one in pg_regress?). So we should really do things so that
the number of schedules is kept to a strict minimum: 1.
I see Alvaro's point about using a different and minimal schedule. We
already have 002_pg_upgrade and 027_stream_ as candidates which could
use schedules other than default and avoid wasting CPU cycles.
But I also agree with your opinion that maintaining multiple schedules
is painful and prone to errors.
What we could do is to create the schedule files automatically during
build. The automation script will require to know which file to place
in which schedules. That information could be either part of the sql
file itself or could be in a separate text file. For example, every
SQL file has the following line listing all the schedules that this
SQL file should be part of. E.g.
-- schedules: parallel, serial, upgrade
The automated script looks at every .sql file in a given sql directory
and creates the schedule files containing all the SQL files which had
respective schedules mentioned in their "schedule" annotation. The
automation script would flag SQL files that do not have a schedule
annotation so any new file added won't be missed. However, we will
still miss a SQL file if it wasn't part of a given schedule and later
acquired some changes which required it to be added to a new schedule.
If we go this route, we could make 'make check-tests' better. We could
add another annotation for depends listing all the SQL files that a
given SQL file depends upon. make check-tests would collect all
dependencies, sort them and run all the dependencies as well.
Of course that's out of scope for this patch. We don't have time left
for this in PG 18.
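The annotation-driven generator proposed above could be sketched like this (a rough sketch under stated assumptions: file contents are passed in as a dict for illustration, and the `-- schedules:` annotation format is the one suggested in this mail, not an existing convention):

```python
# Sketch of the proposed generator: scan regress SQL files for a
# "-- schedules: ..." annotation, group files per schedule, and flag any
# file that lacks the annotation so new files are not silently missed.
import re

def build_schedules(sql_files):
    schedules, missing = {}, []
    for name, text in sql_files.items():
        m = re.search(r"^--\s*schedules:\s*(.+)$", text, re.MULTILINE)
        if not m:
            missing.append(name)
            continue
        for sched in (s.strip() for s in m.group(1).split(",")):
            schedules.setdefault(sched, []).append(name)
    return schedules, missing

files = {
    "inherit.sql": "-- schedules: parallel, upgrade\nCREATE TABLE p ();",
    "tsrf.sql": "SELECT 1;",  # no annotation: gets flagged
}
print(build_schedules(files))
```

A real script would walk src/test/regress/sql and write one schedule file per collected schedule name at build time.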
If we're worried about the time taken by the test (spoiler: I am and
the upgrade tests already always show as last to finish in parallel
runs), I would recommend to put that under a PG_TEST_EXTRA. I'm OK to
add the switch to my buildfarm animals if this option is the consensus
and if it gets into the tree.
I would prefer to run this test by default as Alvaro mentioned
previously. But if that means that we won't get this test committed at
all, I am ok putting it under PG_TEST_EXTRA. (Hence I have kept 0001
and 0002 separate.) But I will be disappointed if the test, which has
unearthed four bugs in a year alone, does not get committed to PG 18
because of this debate.
--
Best Wishes,
Ashutosh Bapat
Vignesh and Alvaro
On Fri, Mar 28, 2025 at 12:02 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
Maybe
fail("roundtrip dump/restore of the regression database")

No, that's losing some information like default installation and the
same version.
How about "dump and restore across servers with same PostgreSQL
version using default installation". That's still a mouthful but is more
readable.
--
Best Wishes,
Ashutosh Bapat
On 2025-Mar-28, Ashutosh Bapat wrote:
No, that's losing some information like default installation and the
same version.
You don't need to preserve such information. This is just a test name.
People looking for more details can grep for the name and they will find
the comments.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Pido que me den el Nobel por razones humanitarias" (Nicanor Parra)
On Fri, Mar 28, 2025 at 4:05 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Mar-28, Ashutosh Bapat wrote:
No, that's losing some information like default installation and the
same version.

You don't need to preserve such information. This is just a test name.
People looking for more details can grep for the name and they will find
the comments.
Ok. In that case what's wrong with the testname I have in the patch?
--
Best Wishes,
Ashutosh Bapat
On Fri, Mar 28, 2025 at 12:20 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Fri, Mar 28, 2025 at 7:07 AM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Mar 27, 2025 at 06:15:06PM +0100, Alvaro Herrera wrote:
BTW another idea to shorten this test's runtime might be to try and
identify which of parallel_schedule tests leave objects behind and
create a shorter schedule with only those (a possible implementation
might keep a list of the slow tests that don't leave any useful object
behind, then filter parallel_schedule to exclude those; this ensures
test files created in the future are still used.)

I'm not much a fan of approaches that require an extra schedule,
because this is prone to forget the addition of objects that we'd want
to cover for the scope of this thread with the dump/restore
inter-dependencies, failing our goal of having more coverage. And
history has proven that we are quite bad at maintaining multiple
schedules for the regression test suite (remember the serial one or
the standby one in pg_regress?). So we should really do things so that
the number of schedules is kept to a strict minimum: 1.

I see Alvaro's point about using a different and minimal schedule. We
already have 002_pg_upgrade and 027_stream_ as candidates which could
use schedules other than default and avoid wasting CPU cycles.
But I also agree with your opinion that maintaining multiple schedules
is painful and prone to errors.

What we could do is to create the schedule files automatically during
build. The automation script will require to know which file to place
in which schedules. That information could be either part of the sql
file itself or could be in a separate text file. For example, every
SQL file has the following line listing all the schedules that this
SQL file should be part of. E.g.

-- schedules: parallel, serial, upgrade
The automated script looks at every .sql file in a given sql directory
and creates the schedule files containing all the SQL files which had
respective schedules mentioned in their "schedule" annotation. The
automation script would flag SQL files that do not have a schedule
annotation so any new file added won't be missed. However, we will
still miss a SQL file if it wasn't part of a given schedule and later
acquired some changes which required it to be added to a new schedule.

If we go this route, we could make 'make check-tests' better. We could
add another annotation for depends listing all the SQL files that a
given SQL file depends upon. make check-tests would collect all
dependencies, sort them and run all the dependencies as well.

Of course that's out of scope for this patch. We don't have time left
for this in PG 18.
I spent several hours today examining each SQL file to decide whether
or not it has "interesting" objects that it leaves behind for
dump/restore test. I came up with attached schedule - which may not be
accurate since it would require much more time to examine all tests
to get an accurate schedule. But what I have got may be close enough.
With that we could save about 6 seconds on my laptop. If we further
compact the schedule reorganizing the parallel groups we may shave
some more seconds.
no modifications to parallel schedule
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
41.84s 28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
41.80s 28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
41.37s 28 subtests passed
with attached modified parallel schedule
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
36.13s 28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
35.86s 28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
36.33s 28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
36.02s 28 subtests passed
However, it's a very painful process to come up with the schedule and
more painful and error prone to maintain it. It could take many days
to come up with the right schedule, which can become inaccurate the
moment the next SQL file is added OR an existing file is modified to
add/drop "interesting" objects.
--
Best Wishes,
Ashutosh Bapat
Attachments:
On 2025-Mar-28, Ashutosh Bapat wrote:
However, it's a very painful process to come up with the schedule and
more painful and error prone to maintain it. It could take many days
to come up with the right schedule, which can become inaccurate the
moment the next SQL file is added OR an existing file is modified to
add/drop "interesting" objects.
Hmm, I didn't mean that we'd maintain a separate schedule. I meant that
we'd take the existing schedule, then apply some Perl magic to it that
grep-outs the tests that we know to contribute nothing, and generate a
new schedule file dynamically. We don't need to maintain a separate
schedule file.
You're right that if an existing uninteresting test is modified to
create interesting objects, we'd lose coverage of those objects. That
seems a much smaller problem to me. So it's just a matter of doing some
Perl map/grep to generate a new schedule file using the attached
exclusion file.
(For what it's worth, what I did to try to determine which tests to
include, rather than scan each file manually, is to run pg_regress with
"test_setup thetest tablespace", then dump the regression database, and
see if anything is there that's not in the dump when I just with just
"test_setup tablespace". I didn't carry the experiment to completion
though.)
For the future, we could annotate each test as you said, either by
adding a marker on the test file itself, or by adding something next to
its name in the schedule file, so the schedule file could look like:
test: plancache(dump_ignore) limit(stream_ignore) plpgsql copy2
temp(stream_ignore,dump_ignore) domain rangefuncs(stream_ignore)
prepare conversion truncate alter_table
sequence polymorphism rowtypes returning largeobject with xml
... and so on.
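The annotated schedule syntax sketched above is easy to machine-read; a minimal parser might look like this (a sketch only: the `dump_ignore`/`stream_ignore` flag names are taken from the example, and the syntax itself is a proposal, not something pg_regress understands today):

```python
# Sketch: parse an annotated schedule line such as
#   "test: plancache(dump_ignore) temp(stream_ignore,dump_ignore) copy2"
# into (test_name, flag_set) pairs.
import re

TOKEN = re.compile(r"(\w+)(?:\(([^)]*)\))?")

def parse_schedule_line(line):
    assert line.startswith("test:")
    result = []
    for tok in line[len("test:"):].split():
        m = TOKEN.fullmatch(tok)
        name, flags = m.group(1), m.group(2)
        result.append((name, set(flags.split(",")) if flags else set()))
    return result

line = "test: plancache(dump_ignore) temp(stream_ignore,dump_ignore) copy2"
print(parse_schedule_line(line))
```

A consumer such as the dump/restore test would then keep only tests whose flag set does not contain its ignore marker.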
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Attachments:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Hmm, I didn't mean that we'd maintain a separate schedule. I meant that
we'd take the existing schedule, then apply some Perl magic to it that
greps out the tests that we know to contribute nothing, and generate a
new schedule file dynamically. We don't need to maintain a separate
schedule file.
This seems like a fundamentally broken approach to me.
The entire argument for using the core regression tests as a source of
data to test dump/restore is that, more or less "for free", we can
expect to get coverage when new SQL language features are added.
That's always been a little bit questionable --- there's a temptation
to drop objects again at the end of a test script. But with this,
it becomes a complete crapshoot whether the objects you need will be
included in the dump.
I think instead of going this direction, we really need to create a
separately-purposed script that simply creates "one of everything"
without doing anything else (except maybe loading a little data).
I believe it'd be a lot easier to remember to add to that when
inventing new SQL than to remember to leave something behind from the
core regression tests. This would also be far faster to run than any
approach that involves picking a random subset of the core test
scripts.
regards, tom lane
On 2025-Mar-28, Tom Lane wrote:
I think instead of going this direction, we really need to create a
separately-purposed script that simply creates "one of everything"
without doing anything else (except maybe loading a little data).
I believe it'd be a lot easier to remember to add to that when
inventing new SQL than to remember to leave something behind from the
core regression tests. This would also be far faster to run than any
approach that involves picking a random subset of the core test
scripts.
FWIW this sounds closely related to what I tried to do with
src/test/modules/test_ddl_deparse; it's currently incomplete, but maybe
we can use that as a starting point.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Always assume the user will do much worse than the stupidest thing
you can imagine." (Julien PUYDT)
On Fri, Mar 28, 2025 at 11:43 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Mar-28, Tom Lane wrote:
I think instead of going this direction, we really need to create a
separately-purposed script that simply creates "one of everything"
without doing anything else (except maybe loading a little data).
I believe it'd be a lot easier to remember to add to that when
inventing new SQL than to remember to leave something behind from the
core regression tests. This would also be far faster to run than any
approach that involves picking a random subset of the core test
scripts.
It's easier to remember to do something or not do something in the
same file than in some other file. I find it hard to believe that
introducing another set of SQL files somewhere far from regress would
make this problem easier.
The number of states in which objects can be left behind in the
regress/sql is very large - and maintaining that 1:1 in some other set
of scripts is impossible unless it's automated.
FWIW this sounds closely related to what I tried to do with
src/test/modules/test_ddl_deparse; it's currently incomplete, but maybe
we can use that as a starting point.
create_table.sql in test_ddl_deparse has only one statement creating
an inheritance table whereas there are dozens of different states of
parent/child tables created by regress. It will require a lot of work
to bridge the gap between regress_ddl_deparse and regress and more
work to maintain it.
I might be missing something in your ideas.
IMO, whatever we do it should rely on a single set of files. One
possible way could be to break the existing files into three files
each, containing DDL, DML and queries from those files respectively
and create three schedules DDL, DML and queries containing the
respective files. These schedules will be run as required. Standard
regression run runs all the three schedules one by one. But
002_pg_upgrade will run DDL and DML on the source database and run
queries on target - thus checking sanity of the dump/restore or
pg_upgrade beyond just the dump comparison. 027_stream_regress might
run DDL, DML on the source server and queries on the target.
But that too is easier said than done for:
1. Our tests mix all three kinds of statements and also rely on the
order in which they are run. It will require some significant effort
to carefully separate the statements.
2. With the new set of files backpatching would become hard.
--
Best Wishes,
Ashutosh Bapat
On Mon, Mar 31, 2025 at 5:07 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
The bug related to materialized views has been fixed and now the test
passes even if we compare statistics from dumped and restored
databases. Hence removing 0003. In the attached patchset I have also
addressed Vignesh's below comment
On Thu, Mar 27, 2025 at 10:01 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Thu, Mar 27, 2025 at 6:01 PM vignesh C <vignesh21@gmail.com> wrote:
2) Should "`" be ' or " here, we generally use "`" to enclose commands:
+# It is expected that regression tests, which create `regression` database, are
+# run on `src_node`, which in turn, is left in running state. A fresh node is
+# created using given `node_params`, which are expected to be the same ones used
+# to create `src_node`, so as to avoid any differences in the databases.

Looking at the prologues of some other functions, I see that we don't add
any decoration around the name of the argument. Hence dropped ``
altogether. Will post it with the next set of patches.
--
Best Wishes,
Ashutosh Bapat
Attachments:
0002-Use-only-one-format-and-make-the-test-run-d-20250331.patchtext/x-patch; charset=US-ASCII; name=0002-Use-only-one-format-and-make-the-test-run-d-20250331.patchDownload
From 5ef4a15bf229d104028eac3a046636453e1e05fc Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Mon, 24 Mar 2025 11:21:12 +0530
Subject: [PATCH 2/2] Use only one format and make the test run default
According to Alvaro (and I agree with him), the test should be run by
default. Otherwise we get to know about a bug only after buildfarm
animal where it's enabled reports a failure. Further testing only one
format may suffice; since all the formats have shown the same bugs till
now.
If we use --directory format we can use -j which reduces the time taken
by dump/restore test by about 12%.
This patch removes PG_TEST_EXTRA option as well as runs the test only in
directory format with parallelism enabled.
Note for committer: If we decide to accept this change, it should be
merged with the previous commit.
---
doc/src/sgml/regress.sgml | 12 ----
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 76 +++++++++-----------------
2 files changed, 25 insertions(+), 63 deletions(-)
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 237b974b3ab..0e5e8e8f309 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -357,18 +357,6 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
-
- <varlistentry>
- <term><literal>regress_dump_test</literal></term>
- <listitem>
- <para>
- When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
- tests dump and restore of regression database left behind by the
- regression run. Not enabled by default because it is time and resource
- consuming.
- </para>
- </listitem>
- </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 8d22d538529..f7d5b96ecd2 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -268,16 +268,9 @@ else
# should be done in a separate TAP test, but doing it here saves us one full
# regression run.
#
- # This step takes several extra seconds and some extra disk space, so
- # requires an opt-in with the PG_TEST_EXTRA environment variable.
- #
# Do this while the old cluster is running before it is shut down by the
# upgrade test.
- if ( $ENV{PG_TEST_EXTRA}
- && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
- {
- test_regression_dump_restore($oldnode, %node_params);
- }
+ test_regression_dump_restore($oldnode, %node_params);
}
# Initialize a new node for the upgrade.
@@ -590,53 +583,34 @@ sub test_regression_dump_restore
$dst_node->append_conf('postgresql.conf', 'autovacuum = off');
$dst_node->start;
- # Test all formats one by one.
- for my $format ('plain', 'tar', 'directory', 'custom')
- {
- my $dump_file = "$tempdir/regression_dump.$format";
- my $restored_db = 'regression_' . $format;
-
- # Use --create in dump and restore commands so that the restored
- # database has the same configurable variable settings as the original
- # database and the plain dumps taken for comparsion do not differ
- # because of locale changes. Additionally this provides test coverage
- # for --create option.
- $src_node->command_ok(
- [
- 'pg_dump', "-F$format", '--no-sync',
- '-d', $src_node->connstr('regression'),
- '--create', '-f', $dump_file
- ],
- "pg_dump on source instance in $format format");
+ my $dump_file = "$tempdir/regression.dump";
- my @restore_command;
- if ($format eq 'plain')
- {
- # Restore dump in "plain" format using `psql`.
- @restore_command = [ 'psql', '-d', 'postgres', '-f', $dump_file ];
- }
- else
- {
- @restore_command = [
- 'pg_restore', '--create',
- '-d', 'postgres', $dump_file
- ];
- }
- $dst_node->command_ok(@restore_command,
- "restored dump taken in $format format on destination instance");
+ # Use --create in dump and restore commands so that the restored database
+ # has the same configurable variable settings as the original database so
+ # that the plain dumps taken from both the database taken for comparisong do
+ # not differ because of locale changes. Additionally this provides test
+ # coverage for --create option.
+ #
+ # We use directory format which allows dumping and restoring in parallel to
+ # reduce the test's run time.
+ $src_node->command_ok(
+ [
+ 'pg_dump', '-Fd', '-j2', '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '--create', '-f', $dump_file
+ ],
+ "pg_dump on source instance succeeded");
- my $dst_dump =
- get_dump_for_comparison($dst_node, 'regression',
- 'dest_dump.' . $format, 0);
+ $dst_node->command_ok(
+ [ 'pg_restore', '--create', '-j2', '-d', 'postgres', $dump_file ],
+ "restored dump to destination instance");
- compare_files($src_dump, $dst_dump,
- "dump outputs from original and restored regression database (using $format format) match"
- );
+ my $dst_dump = get_dump_for_comparison($dst_node, 'regression',
+ 'dest_dump', 0);
- # Rename the restored database so that it is available for debugging in
- # case the test fails.
- $dst_node->safe_psql('postgres', "ALTER DATABASE regression RENAME TO $restored_db");
- }
+ compare_files($src_dump, $dst_dump,
+ "dump outputs from original and restored regression database match"
+ );
}
# Dump database db from the given node in plain format and adjust it for
--
2.34.1
0001-Test-pg_dump-restore-of-regression-objects-20250331.patchtext/x-patch; charset=US-ASCII; name=0001-Test-pg_dump-restore-of-regression-objects-20250331.patchDownload
From aa1c74951b3b557de8330230185fd5f2ee46ecda Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH 1/2] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade of the regression database left
behind by regression run. Modify it to test dump and restore of the
regression database as well.
Regression database created by regression run contains almost all the
database objects supported by PostgreSQL in various states. Hence the
new testcase covers dump and restore scenarios not covered by individual
dump/restore cases. Till now 002_pg_upgrade only tested dump/restore
through pg_upgrade which only uses binary mode. Many regression tests
mention that they leave objects behind for dump/restore testing but they
are not tested in a non-binary mode. The new testcase closes that
gap.
Testing dump and restore of regression database makes this test run
longer for a relatively smaller benefit. Hence run it only when
explicitly requested by user by specifying "regress_dump_test" in
PG_TEST_EXTRA.
Note For the reviewers:
The new test has uncovered many bugs so far in one year.
1. Introduced by 14e87ffa5c54. Fixed in fd41ba93e4630921a72ed5127cd0d552a8f3f8fc.
2. Introduced by 0413a556990ba628a3de8a0b58be020fd9a14ed0. Reverted in 74563f6b90216180fc13649725179fc119dddeb5.
3. Fixed by d611f8b1587b8f30caa7c0da99ae5d28e914d54f
3. Being discussed on hackers at https://www.postgresql.org/message-id/CAExHW5s47kmubpbbRJzSM-Zfe0Tj2O3GBagB7YAyE8rQ-V24Uw@mail.gmail.com
Author: Ashutosh Bapat
Reviewed by: Michael Pacquire, Daniel Gustafsson, Tom Lane, Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
doc/src/sgml/regress.sgml | 12 ++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 144 ++++++++++++++++-
src/test/perl/Makefile | 2 +
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 167 ++++++++++++++++++++
src/test/perl/meson.build | 1 +
5 files changed, 324 insertions(+), 2 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 0e5e8e8f309..237b974b3ab 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -357,6 +357,18 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_test</literal></term>
+ <listitem>
+ <para>
+ When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
+ tests dump and restore of regression database left behind by the
+ regression run. Not enabled by default because it is time and resource
+ consuming.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 00051b85035..8d22d538529 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -12,6 +12,7 @@ use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -35,8 +36,8 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
sub filter_dump
{
my ($is_old, $old_version, $dump_file) = @_;
@@ -262,6 +263,21 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate TAP test, but doing it here saves us one full
+ # regression run.
+ #
+ # This step takes several extra seconds and some extra disk space, so
+ # requires an opt-in with the PG_TEST_EXTRA environment variable.
+ #
+ # Do this while the old cluster is running before it is shut down by the
+ # upgrade test.
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
+ {
+ test_regression_dump_restore($oldnode, %node_params);
+ }
}
# Initialize a new node for the upgrade.
@@ -539,4 +555,128 @@ my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
compare_files($dump1_filtered, $dump2_filtered,
'old and new dumps match after pg_upgrade');
+# Test dump and restore of objects left behind by the regression run.
+#
+# It is expected that regression tests, which create 'regression' database, are
+# run on src_node, which, in turn, is left in a running state. A fresh node is
+# created using the given node_params, which are expected to be the same ones
+# used to create src_node, so as to avoid any differences in the databases.
+#
+# Plain dumps from both the nodes are compared to make sure that all the dumped
+# objects are restored faithfully.
+sub test_regression_dump_restore
+{
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+
+ # Make sure that the source and destination nodes have the same version and
+ # do not use custom install paths. In both the cases, the dump files may
+ # require additional adjustments unknown to code here. Do not run this test
+ # in such a case to avoid utilizing the time and resources unnecessarily.
+ if ($src_node->pg_version != $dst_node->pg_version
+ or defined $src_node->{_install_path})
+ {
+ fail("same version dump and restore test using default installation");
+ return;
+ }
+
+ # Dump the original database for comparison later.
+ my $src_dump =
+ get_dump_for_comparison($src_node, 'regression', 'src_dump', 1);
+
+ # Setup destination database cluster
+ $dst_node->init(%node_params);
+ # Stabilize stats for comparison.
+ $dst_node->append_conf('postgresql.conf', 'autovacuum = off');
+ $dst_node->start;
+
+ # Test all formats one by one.
+ for my $format ('plain', 'tar', 'directory', 'custom')
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $restored_db = 'regression_' . $format;
+
+ # Use --create in dump and restore commands so that the restored
+ # database has the same configurable variable settings as the original
+ # database and the plain dumps taken for comparison do not differ
+ # because of locale changes. Additionally this provides test coverage
+ # for the --create option.
+ $src_node->command_ok(
+ [
+ 'pg_dump', "-F$format", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '--create', '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
+
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [ 'psql', '-d', 'postgres', '-f', $dump_file ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '--create',
+ '-d', 'postgres', $dump_file
+ ];
+ }
+ $dst_node->command_ok(@restore_command,
+ "restored dump taken in $format format on destination instance");
+
+ my $dst_dump =
+ get_dump_for_comparison($dst_node, 'regression',
+ 'dest_dump.' . $format, 0);
+
+ compare_files($src_dump, $dst_dump,
+ "dump outputs from original and restored regression database (using $format format) match"
+ );
+
+ # Rename the restored database so that it is available for debugging in
+ # case the test fails.
+ $dst_node->safe_psql('postgres', "ALTER DATABASE regression RENAME TO $restored_db");
+ }
+}
+
+# Dump database db from the given node in plain format and adjust it for
+# comparing dumps from the original and the restored database.
+#
+# file_prefix is used to create unique names for all dump files so that they
+# remain available for debugging in case the test fails.
+#
+# adjust_child_columns is passed to adjust_regress_dumpfile() which actually
+# adjusts the dump output.
+#
+# The name of the file containing the adjusted dump is returned.
+sub get_dump_for_comparison
+{
+ my ($node, $db, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+ # Usually we avoid comparing statistics in our tests since they are flaky
+ # by nature. However, if statistics are dumped and restored, they are
+ # expected to be restored as is, i.e. the statistics from the original
+ # database and those from the restored database should match. We turn off
+ # autovacuum on the source and the target databases to avoid any statistics
+ # updates during the restore operation. Hence we do not exclude statistics
+ # from the dump.
+ $node->command_ok(
+ [
+ 'pg_dump', '--no-sync', '-d', $node->connstr($db), '-f',
+ $dumpfile
+ ],
+ 'dump for comparison succeeded');
+
+ open(my $dh, '>', $dump_adjusted)
+ || die
+ "could not open $dump_adjusted for writing the adjusted dump: $!";
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
+}
+
done_testing();
diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile
index d82fb67540e..def89650ead 100644
--- a/src/test/perl/Makefile
+++ b/src/test/perl/Makefile
@@ -26,6 +26,7 @@ install: all installdirs
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Cluster.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/BackgroundPsql.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustUpgrade.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustDump.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Version.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
uninstall:
@@ -36,6 +37,7 @@ uninstall:
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
endif
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..74b9a60cf34
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,167 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $adjust_child_columns);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and restore tests.
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+use Test::More;
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $adjust_child_columns)
+
+If we take a dump of the regression database left behind after running the
+regression tests, restore the dump, and take a dump of the restored regression
+database, the outputs of the two dumps differ in the following cases. This
+routine adjusts the given dump so that the dump outputs from the original and
+restored database, respectively, match.
+
+Case 1: Some regression tests purposefully create child tables in such a way
+that the order of their inherited columns differs from the column order of
+their respective parents. In the restored database, however, the order of
+their inherited columns is the same as that of their respective parents. Thus
+the column orders of these child tables in the original database and those in
+the restored database differ, causing differences in the dump outputs. See
+MergeAttributes()
+and dumpTableSchema() for details. This routine rearranges the column
+declarations in the relevant C<CREATE TABLE... INHERITS> statements in the dump
+file from original database to match those from the restored database. We could,
+instead, adjust the statements in the dump from the restored database to match
+those from original database or adjust both to a canonical order. But we have
+chosen to adjust the statements in the dump from original database for no
+particular reason.
+
+Case 2: When dumping COPY statements, the columns are ordered by their attribute
+number by fmtCopyColumnList(). If a column is added to a parent table after a
+child has inherited the parent and the child has its own columns, the attribute
+number of the column changes after restoring the child table. This is because
+when executing the dumped C<CREATE TABLE... INHERITS> statement all the parent
+attributes are created before any child attributes. Thus the order of columns in
+COPY statements dumped from the original and the restored databases,
+respectively, differs. Such tables in regression tests are listed below. It is
+hard to adjust the column order in the COPY statement along with the data. Hence
+we just remove such COPY statements from the dump output.
+
+Additionally, the routine adjusts blank lines and newlines to avoid noise.
+
+Note: Usually we avoid comparing statistics in our tests since they are flaky
+by nature. However, if statistics are dumped and restored, they are expected to
+be restored as is, i.e. the statistics from the original database and those from
+the restored database should match. Hence we do not filter statistics from the
+dump, if they are dumped.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of dump file
+
+=item C<adjust_child_columns>: 1 indicates that the given dump file requires
+adjusting columns in the child tables; usually when the dump is from original
+database. 0 indicates no such adjustment is needed; usually when the dump is
+from restored database.
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $adjust_child_columns) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($adjust_child_columns)
+ {
+ my $saved_dump = $dump;
+
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_stored_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_virtual_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_virtual_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c1 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c2 adjustments');
+ }
+
+ # Remove COPY statements with differing column order
+ for my $table (
+ 'public\.b_star', 'public\.c_star',
+ 'public\.cc2', 'public\.d_star',
+ 'public\.e_star', 'public\.f_star',
+ 'public\.renamecolumnanother', 'public\.renamecolumnchild',
+ 'public\.test_type_diff2_c1', 'public\.test_type_diff2_c2',
+ 'public\.test_type_diff_c')
+ {
+ $dump =~ s/^COPY\s$table\s\(.+?^\\\.$//sm;
+ }
+
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
diff --git a/src/test/perl/meson.build b/src/test/perl/meson.build
index 58e30f15f9d..492ca571ff8 100644
--- a/src/test/perl/meson.build
+++ b/src/test/perl/meson.build
@@ -14,4 +14,5 @@ install_data(
'PostgreSQL/Test/Cluster.pm',
'PostgreSQL/Test/BackgroundPsql.pm',
'PostgreSQL/Test/AdjustUpgrade.pm',
+ 'PostgreSQL/Test/AdjustDump.pm',
install_dir: dir_pgxs / 'src/test/perl/PostgreSQL/Test')
base-commit: e2809e3a1015697832ee4d37b75ba1cd0caac0f0
--
2.34.1
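The column-reordering substitutions in AdjustDump.pm are plain regexes over the dump text. As an illustration only (this is not part of the patch, which is written in Perl), the same idea for one of the affected tables, generated_stored_tests.gtestxx_4, can be sketched in Python:

```python
import re

# Illustrative sketch (not part of the patch): mirror of the AdjustDump.pm
# substitution for generated_stored_tests.gtestxx_4. The inherited column
# "a" is moved ahead of the local column "b", so that the dump of the
# original database matches the column order produced by the restored one.
GTESTXX_4 = re.compile(
    r"(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()"
    r"(\n\s+b\sinteger),"
    r"(\n\s+a\sinteger\sNOT\sNULL)",
    re.MULTILINE,
)

def reorder_gtestxx_4(dump: str) -> str:
    adjusted = GTESTXX_4.sub(r"\1\3,\2", dump)
    # Like the ok() checks in the Perl module, fail loudly if the
    # expected CREATE TABLE statement was not found in the dump.
    assert adjusted != dump, "gtestxx_4 CREATE TABLE statement not found"
    return adjusted
```

The Perl version applies the analogous substitution with `/mgx` modifiers; `re.MULTILINE` plus string concatenation plays the same role here.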
On 28 Mar 2025, at 19:12, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Mar-28, Tom Lane wrote:
I think instead of going this direction, we really need to create a
separately-purposed script that simply creates "one of everything"
without doing anything else (except maybe loading a little data).
I believe it'd be a lot easier to remember to add to that when
inventing new SQL than to remember to leave something behind from the
core regression tests. This would also be far faster to run than any
approach that involves picking a random subset of the core test
scripts.

FWIW this sounds closely related to what I tried to do with
src/test/modules/test_ddl_deparse; it's currently incomplete, but maybe
we can use that as a starting point.
Given where we are in the cycle, it seems to make sense to stick to using the
schedule we already have rather than invent a new process for generating it,
and work on that for 19?
--
Daniel Gustafsson
On 2025-Mar-31, Daniel Gustafsson wrote:
Given where we are in the cycle, it seems to make sense to stick to using the
schedule we already have rather than invent a new process for generating it,
and work on that for 19?
No objections to that. I'll see about getting this committed during my
morning today, so that I have plenty of time to watch the buildfarm.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Are you not unsure you want to delete Firefox?
[Not unsure] [Not not unsure] [Cancel]
http://smylers.hates-software.com/2008/01/03/566e45b2.html
On Tue, Apr 1, 2025 at 11:52 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Mar-31, Daniel Gustafsson wrote:
Given where we are in the cycle, it seems to make sense to stick to using the
schedule we already have rather than invent a new process for generating it,
and work on that for 19?

No objections to that. I'll see about getting this committed during my
morning today, so that I have plenty of time to watch the buildfarm.
Thanks Alvaro.
Just today morning, I found something which looks like another bug in
statistics dump/restore [1]. As Daniel has expressed upthread [2], we
should go ahead and commit the test even if the bug is not fixed. But
in case it creates a lot of noise and makes the build farm red, we
could suppress the failure by not dumping statistics for comparison
till the bug is fixed. PFA patchset which reintroduces 0003 which
suppresses the statistics dump - in case we think it's needed. I have
made some minor cosmetic changes to 0001 and 0002 as well.
I will also watch buildfarm too, once you commit the patch.
[1]: /messages/by-id/CAExHW5sFOgcUkVtZ8=QCAE+jv=sbNdBKq0xZCNJTh7019ZM+CQ@mail.gmail.com
--
Best Wishes,
Ashutosh Bapat
Attachments:
0002-Use-only-one-format-and-make-the-test-run-d-20250401.patch
From 28a146c7cdaf581d889cec90f541bb42046e7904 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Mon, 24 Mar 2025 11:21:12 +0530
Subject: [PATCH 2/3] Use only one format and make the test run by default
According to Alvaro (and I agree with him), the test should be run by
default. Otherwise we get to know about a bug only after buildfarm
animal where it's enabled reports a failure. Further testing only one
format may suffice; since all the formats have shown the same bugs till
now.
If we use --directory format we can use -j which reduces the time taken
by dump/restore test by about 12%.
This patch removes the PG_TEST_EXTRA option and runs the test only in
directory format with parallelism enabled.
Note for committer: If we decide to accept this change, it should be
merged with the previous commit.
---
doc/src/sgml/regress.sgml | 12 -----
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 75 +++++++++-----------------
2 files changed, 24 insertions(+), 63 deletions(-)
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 237b974b3ab..0e5e8e8f309 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -357,18 +357,6 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
-
- <varlistentry>
- <term><literal>regress_dump_test</literal></term>
- <listitem>
- <para>
- When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
- tests dump and restore of regression database left behind by the
- regression run. Not enabled by default because it is time and resource
- consuming.
- </para>
- </listitem>
- </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 8d22d538529..71dc25ca938 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -268,16 +268,9 @@ else
# should be done in a separate TAP test, but doing it here saves us one full
# regression run.
#
- # This step takes several extra seconds and some extra disk space, so
- # requires an opt-in with the PG_TEST_EXTRA environment variable.
- #
# Do this while the old cluster is running before it is shut down by the
# upgrade test.
- if ( $ENV{PG_TEST_EXTRA}
- && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
- {
- test_regression_dump_restore($oldnode, %node_params);
- }
+ test_regression_dump_restore($oldnode, %node_params);
}
# Initialize a new node for the upgrade.
@@ -590,53 +583,33 @@ sub test_regression_dump_restore
$dst_node->append_conf('postgresql.conf', 'autovacuum = off');
$dst_node->start;
- # Test all formats one by one.
- for my $format ('plain', 'tar', 'directory', 'custom')
- {
- my $dump_file = "$tempdir/regression_dump.$format";
- my $restored_db = 'regression_' . $format;
-
- # Use --create in dump and restore commands so that the restored
- # database has the same configurable variable settings as the original
- # database and the plain dumps taken for comparison do not differ
- # because of locale changes. Additionally this provides test coverage
- # for the --create option.
- $src_node->command_ok(
- [
- 'pg_dump', "-F$format", '--no-sync',
- '-d', $src_node->connstr('regression'),
- '--create', '-f', $dump_file
- ],
- "pg_dump on source instance in $format format");
+ my $dump_file = "$tempdir/regression.dump";
- my @restore_command;
- if ($format eq 'plain')
- {
- # Restore dump in "plain" format using `psql`.
- @restore_command = [ 'psql', '-d', 'postgres', '-f', $dump_file ];
- }
- else
- {
- @restore_command = [
- 'pg_restore', '--create',
- '-d', 'postgres', $dump_file
- ];
- }
- $dst_node->command_ok(@restore_command,
- "restored dump taken in $format format on destination instance");
+ # Use --create in dump and restore commands so that the restored database
+ # has the same configurable variable settings as the original database, so
+ # that the plain dumps taken from both databases for comparison do
+ # not differ because of locale changes. Additionally this provides test
+ # coverage for the --create option.
+ #
+ # We use directory format which allows dumping and restoring in parallel to
+ # reduce the test's run time.
+ $src_node->command_ok(
+ [
+ 'pg_dump', '-Fd', '-j2', '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '--create', '-f', $dump_file
+ ],
+ "pg_dump on source instance succeeded");
- my $dst_dump =
- get_dump_for_comparison($dst_node, 'regression',
- 'dest_dump.' . $format, 0);
+ $dst_node->command_ok(
+ [ 'pg_restore', '--create', '-j2', '-d', 'postgres', $dump_file ],
+ "restored dump to destination instance");
- compare_files($src_dump, $dst_dump,
- "dump outputs from original and restored regression database (using $format format) match"
- );
+ my $dst_dump =
+ get_dump_for_comparison($dst_node, 'regression', 'dest_dump', 0);
- # Rename the restored database so that it is available for debugging in
- # case the test fails.
- $dst_node->safe_psql('postgres', "ALTER DATABASE regression RENAME TO $restored_db");
- }
+ compare_files($src_dump, $dst_dump,
+ "dump outputs from original and restored regression database match");
}
# Dump database db from the given node in plain format and adjust it for
--
2.34.1
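The Case 2 handling in AdjustDump.pm — dropping whole COPY blocks for tables whose column order cannot match after restore — is also a regex over the dump text. As an illustrative re-implementation in Python (again, the actual code in the patch is Perl), it can be sketched as:

```python
import re

# Illustrative sketch (not part of the patch): remove entire COPY blocks,
# from the "COPY table (...)" header through the terminating "\." line,
# for tables whose column order differs between the original and the
# restored database. The Perl module pre-escapes the table names; here
# re.escape() handles the dots in schema-qualified names.
def strip_copy_blocks(dump: str, tables) -> str:
    for table in tables:
        pattern = (r"^COPY\s" + re.escape(table)
                   + r"\s\(.+?^\\\.$")
        dump = re.sub(pattern, "", dump,
                      flags=re.MULTILINE | re.DOTALL)
    return dump
```

The `re.MULTILINE | re.DOTALL` flags correspond to the `/sm` modifiers on the Perl substitution: `^`/`$` match at line boundaries while `.` also crosses newlines, so the non-greedy `.+?` spans the whole data block.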
0001-Test-pg_dump-restore-of-regression-objects-20250401.patch
From d501856270c5cca7a843f7d05ccf59d55ced4c03 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH 1/3] Test pg_dump/restore of regression objects
002_pg_upgrade.pl tests pg_upgrade of the regression database left
behind by regression run. Modify it to test dump and restore of the
regression database as well.
Regression database created by regression run contains almost all the
database objects supported by PostgreSQL in various states. Hence the
new testcase covers dump and restore scenarios not covered by individual
dump/restore cases. Till now 002_pg_upgrade only tested dump/restore
through pg_upgrade which only uses binary mode. Many regression tests
mention that they leave objects behind for dump/restore testing but they
are not tested in a non-binary mode. The new testcase closes that
gap.
Testing dump and restore of the regression database makes this test run
longer for a relatively small benefit. Hence run it only when
explicitly requested by the user by specifying "regress_dump_test" in
PG_TEST_EXTRA.
Note for reviewers:
The new test has uncovered several bugs in its first year.
1. Introduced by 14e87ffa5c54. Fixed in fd41ba93e4630921a72ed5127cd0d552a8f3f8fc.
2. Introduced by 0413a556990ba628a3de8a0b58be020fd9a14ed0. Reverted in 74563f6b90216180fc13649725179fc119dddeb5.
3. Fixed by d611f8b1587b8f30caa7c0da99ae5d28e914d54f
4. Being discussed on hackers at https://www.postgresql.org/message-id/CAExHW5s47kmubpbbRJzSM-Zfe0Tj2O3GBagB7YAyE8rQ-V24Uw@mail.gmail.com
Author: Ashutosh Bapat
Reviewed by: Michael Paquier, Daniel Gustafsson, Tom Lane, Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
---
doc/src/sgml/regress.sgml | 12 ++
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 144 ++++++++++++++++-
src/test/perl/Makefile | 2 +
src/test/perl/PostgreSQL/Test/AdjustDump.pm | 167 ++++++++++++++++++++
src/test/perl/meson.build | 1 +
5 files changed, 324 insertions(+), 2 deletions(-)
create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 0e5e8e8f309..237b974b3ab 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -357,6 +357,18 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>regress_dump_test</literal></term>
+ <listitem>
+ <para>
+ When enabled, <filename>src/bin/pg_upgrade/t/002_pg_upgrade.pl</filename>
+ tests dump and restore of regression database left behind by the
+ regression run. Not enabled by default because it is time and resource
+ consuming.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 00051b85035..8d22d538529 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -12,6 +12,7 @@ use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
use Test::More;
# Can be changed to test the other modes.
@@ -35,8 +36,8 @@ sub generate_db
"created database with ASCII characters from $from_char to $to_char");
}
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
sub filter_dump
{
my ($is_old, $old_version, $dump_file) = @_;
@@ -262,6 +263,21 @@ else
}
}
is($rc, 0, 'regression tests pass');
+
+ # Test dump/restore of the objects left behind by regression. Ideally it
+ # should be done in a separate TAP test, but doing it here saves us one full
+ # regression run.
+ #
+ # This step takes several extra seconds and some extra disk space, so
+ # requires an opt-in with the PG_TEST_EXTRA environment variable.
+ #
+ # Do this while the old cluster is running before it is shut down by the
+ # upgrade test.
+ if ( $ENV{PG_TEST_EXTRA}
+ && $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
+ {
+ test_regression_dump_restore($oldnode, %node_params);
+ }
}
# Initialize a new node for the upgrade.
@@ -539,4 +555,128 @@ my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
compare_files($dump1_filtered, $dump2_filtered,
'old and new dumps match after pg_upgrade');
+# Test dump and restore of objects left behind by the regression run.
+#
+# It is expected that regression tests, which create 'regression' database, are
+# run on src_node, which, in turn, is left in a running state. A fresh node is
+# created using the given node_params, which are expected to be the same ones
+# used to create src_node, so as to avoid any differences in the databases.
+#
+# Plain dumps from both the nodes are compared to make sure that all the dumped
+# objects are restored faithfully.
+sub test_regression_dump_restore
+{
+ my ($src_node, %node_params) = @_;
+ my $dst_node = PostgreSQL::Test::Cluster->new('dst_node');
+
+ # Make sure that the source and destination nodes have the same version and
+ # do not use custom install paths. In both the cases, the dump files may
+ # require additional adjustments unknown to code here. Do not run this test
+ # in such a case to avoid utilizing the time and resources unnecessarily.
+ if ($src_node->pg_version != $dst_node->pg_version
+ or defined $src_node->{_install_path})
+ {
+ fail("same version dump and restore test using default installation");
+ return;
+ }
+
+ # Dump the original database for comparison later.
+ my $src_dump =
+ get_dump_for_comparison($src_node, 'regression', 'src_dump', 1);
+
+ # Setup destination database cluster
+ $dst_node->init(%node_params);
+ # Stabilize stats for comparison.
+ $dst_node->append_conf('postgresql.conf', 'autovacuum = off');
+ $dst_node->start;
+
+ # Test all formats one by one.
+ for my $format ('plain', 'tar', 'directory', 'custom')
+ {
+ my $dump_file = "$tempdir/regression_dump.$format";
+ my $restored_db = 'regression_' . $format;
+
+ # Use --create in dump and restore commands so that the restored
+ # database has the same configurable variable settings as the original
+ # database and the plain dumps taken for comparison do not differ
+ # because of locale changes. Additionally this provides test coverage
+ # for the --create option.
+ $src_node->command_ok(
+ [
+ 'pg_dump', "-F$format", '--no-sync',
+ '-d', $src_node->connstr('regression'),
+ '--create', '-f', $dump_file
+ ],
+ "pg_dump on source instance in $format format");
+
+ my @restore_command;
+ if ($format eq 'plain')
+ {
+ # Restore dump in "plain" format using `psql`.
+ @restore_command = [ 'psql', '-d', 'postgres', '-f', $dump_file ];
+ }
+ else
+ {
+ @restore_command = [
+ 'pg_restore', '--create',
+ '-d', 'postgres', $dump_file
+ ];
+ }
+ $dst_node->command_ok(@restore_command,
+ "restored dump taken in $format format on destination instance");
+
+ my $dst_dump =
+ get_dump_for_comparison($dst_node, 'regression',
+ 'dest_dump.' . $format, 0);
+
+ compare_files($src_dump, $dst_dump,
+ "dump outputs from original and restored regression database (using $format format) match"
+ );
+
+ # Rename the restored database so that it is available for debugging in
+ # case the test fails.
+ $dst_node->safe_psql('postgres', "ALTER DATABASE regression RENAME TO $restored_db");
+ }
+}
+
+# Dump database db from the given node in plain format and adjust it for
+# comparing dumps from the original and the restored database.
+#
+# file_prefix is used to create unique names for all dump files so that they
+# remain available for debugging in case the test fails.
+#
+# adjust_child_columns is passed to adjust_regress_dumpfile() which actually
+# adjusts the dump output.
+#
+# The name of the file containing the adjusted dump is returned.
+sub get_dump_for_comparison
+{
+ my ($node, $db, $file_prefix, $adjust_child_columns) = @_;
+
+ my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+ my $dump_adjusted = "${dumpfile}_adjusted";
+
+ # Usually we avoid comparing statistics in our tests since they are flaky
+ # by nature. However, if statistics are dumped and restored, they are
+ # expected to be restored as is, i.e. the statistics from the original
+ # database and those from the restored database should match. We turn off
+ # autovacuum on the source and the target databases to avoid any statistics
+ # updates during the restore operation. Hence we do not exclude statistics
+ # from the dump.
+ $node->command_ok(
+ [
+ 'pg_dump', '--no-sync', '-d', $node->connstr($db), '-f',
+ $dumpfile
+ ],
+ 'dump for comparison succeeded');
+
+ open(my $dh, '>', $dump_adjusted)
+ || die
+ "could not open $dump_adjusted for writing the adjusted dump: $!";
+ print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+ $adjust_child_columns);
+ close($dh);
+
+ return $dump_adjusted;
+}
+
done_testing();
diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile
index d82fb67540e..def89650ead 100644
--- a/src/test/perl/Makefile
+++ b/src/test/perl/Makefile
@@ -26,6 +26,7 @@ install: all installdirs
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Cluster.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/BackgroundPsql.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustUpgrade.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/AdjustDump.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
$(INSTALL_DATA) $(srcdir)/PostgreSQL/Version.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
uninstall:
@@ -36,6 +37,7 @@ uninstall:
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/BackgroundPsql.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustUpgrade.pm'
+ rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/AdjustDump.pm'
rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Version.pm'
endif
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
new file mode 100644
index 00000000000..74b9a60cf34
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -0,0 +1,167 @@
+
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustDump - helper module for dump and restore tests
+
+=head1 SYNOPSIS
+
+ use PostgreSQL::Test::AdjustDump;
+
+ # Adjust contents of dump output file so that dump output from original
+ # regression database and that from the restored regression database match
+ $dump = adjust_regress_dumpfile($dump, $adjust_child_columns);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustDump> encapsulates various hacks needed to
+compare the results of dump and restore tests.
+
+=cut
+
+package PostgreSQL::Test::AdjustDump;
+
+use strict;
+use warnings FATAL => 'all';
+
+use Exporter 'import';
+use Test::More;
+
+our @EXPORT = qw(
+ adjust_regress_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $dump = adjust_regress_dumpfile($dump, $adjust_child_columns)
+
+If we take a dump of the regression database left behind after running the
+regression tests, restore it, and then take a dump of the restored database,
+the two dump outputs differ in the cases below. This routine adjusts the
+given dump so that the dump outputs from the original and restored
+databases match.
+
+Case 1: Some regression tests purposefully create child tables in such a way
+that the order of their inherited columns differs from the column order of
+their respective parents. In the restored database, however, the order of
+their inherited columns is the same as that of their respective parents.
+Thus the column orders of these child tables in the original database and in
+the restored database differ, causing differences in the dump outputs. See
+MergeAttributes() and dumpTableSchema() for details. This routine rearranges
+the column declarations in the relevant C<CREATE TABLE... INHERITS>
+statements in the dump from the original database to match those from the
+restored database. We could, instead, adjust the statements in the dump from
+the restored database to match those from the original database, or adjust
+both to a canonical order; we have chosen to adjust the dump from the
+original database for no particular reason.
+
+Case 2: When dumping COPY statements, the columns are ordered by attribute
+number by fmtCopyColumnList(). If a column is added to a parent table after
+a child has inherited the parent and the child has its own columns, that
+column's attribute number changes after restoring the child table, because
+the dumped C<CREATE TABLE... INHERITS> statement creates all parent
+attributes before any child attributes. Thus the column order in COPY
+statements dumped from the original and restored databases differs. Such
+tables in the regression tests are listed below. Since it is hard to adjust
+the column order in a COPY statement along with its data, we simply remove
+such COPY statements from the dump output.
+
+Additionally, the routine normalizes newlines and blank lines to avoid noise.
+
+Note: Usually we avoid comparing statistics in our tests since they are
+flaky by nature. However, if statistics are dumped and restored, they are
+expected to be restored as is, i.e. the statistics from the original and
+restored databases should match. Hence we do not filter statistics from
+the dump, if they are dumped at all.
+
+Arguments:
+
+=over
+
+=item C<dump>: Contents of the dump file
+
+=item C<adjust_child_columns>: 1 indicates that the given dump requires
+adjusting the columns of child tables, usually when the dump is from the
+original database. 0 indicates that no such adjustment is needed, usually
+when the dump is from the restored database.
+
+=back
+
+Returns the adjusted dump text.
+
+=cut
+
+sub adjust_regress_dumpfile
+{
+ my ($dump, $adjust_child_columns) = @_;
+
+ # use Unix newlines
+ $dump =~ s/\r\n/\n/g;
+
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($adjust_child_columns)
+ {
+ my $saved_dump = $dump;
+
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_stored_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_virtual_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger\sNOT\sNULL)/$1$3,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied generated_virtual_tests.gtestxx_4 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c1\s\()
+ (\n\s+int_four\sbigint),
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint)/$1$4,$2,$3/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c1 adjustments');
+
+ $saved_dump = $dump;
+ $dump =~ s/(^CREATE\sTABLE\spublic\.test_type_diff2_c2\s\()
+ (\n\s+int_eight\sbigint),
+ (\n\s+int_two\ssmallint),
+ (\n\s+int_four\sbigint)/$1$3,$4,$2/mgx;
+ ok($saved_dump ne $dump,
+ 'applied public.test_type_diff2_c2 adjustments');
+ }
+
+ # Remove COPY statements with differing column order
+ for my $table (
+ 'public\.b_star', 'public\.c_star',
+ 'public\.cc2', 'public\.d_star',
+ 'public\.e_star', 'public\.f_star',
+ 'public\.renamecolumnanother', 'public\.renamecolumnchild',
+ 'public\.test_type_diff2_c1', 'public\.test_type_diff2_c2',
+ 'public\.test_type_diff_c')
+ {
+ $dump =~ s/^COPY\s$table\s\(.+?^\\\.$//sm;
+ }
+
+ # Suppress blank lines, as some places in pg_dump emit more or fewer.
+ $dump =~ s/\n\n+/\n/g;
+
+ return $dump;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
diff --git a/src/test/perl/meson.build b/src/test/perl/meson.build
index 58e30f15f9d..492ca571ff8 100644
--- a/src/test/perl/meson.build
+++ b/src/test/perl/meson.build
@@ -14,4 +14,5 @@ install_data(
'PostgreSQL/Test/Cluster.pm',
'PostgreSQL/Test/BackgroundPsql.pm',
'PostgreSQL/Test/AdjustUpgrade.pm',
+ 'PostgreSQL/Test/AdjustDump.pm',
install_dir: dir_pgxs / 'src/test/perl/PostgreSQL/Test')
base-commit: af0c248557aecb335462d980cb7319bdf85a5c66
--
2.34.1
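The two kinds of adjustments implemented in AdjustDump.pm above are (1) reordering column declarations inside a C<CREATE TABLE ... INHERITS> statement and (2) deleting whole C<COPY ... \.> blocks. For readers less fluent in Perl's `s///smgx` modifiers, here is a rough Python sketch of the same regex ideas. The helper names and the toy input are hypothetical; this is not part of the actual module.

```python
import re

def reorder_child_columns(dump, table, order):
    """Rewrite the column list of a CREATE TABLE statement so the
    declarations appear in the given column-name order, mirroring the
    s/.../$1$3,$2/mgx substitutions in AdjustDump.pm."""
    pattern = re.compile(
        r'(CREATE TABLE %s \()(.*?)(\n\);)' % re.escape(table), re.S)

    def fix(m):
        # Split the column declarations, index them by column name,
        # and re-emit them in the requested order.
        cols = [c.strip() for c in m.group(2).split(',\n')]
        by_name = {c.split()[0]: c for c in cols}
        body = ',\n    '.join(by_name[name] for name in order)
        return m.group(1) + '\n    ' + body + m.group(3)

    return pattern.sub(fix, dump)

def drop_copy_blocks(dump, tables):
    """Remove whole COPY ... \\. blocks for tables whose column order
    cannot be adjusted along with the data, like the loop in AdjustDump.pm."""
    for table in tables:
        # re.S lets .+? span lines; re.M anchors ^ and $ per line,
        # matching the Perl /sm flags on the original substitution.
        dump = re.sub(r'^COPY %s \(.+?^\\\.$\n?' % re.escape(table),
                      '', dump, flags=re.S | re.M)
    return dump
```

The Perl version instead captures the individual declarations of each known table and swaps them in place, which avoids parsing the column list; the effect on the dump text is the same.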
0003-Do-not-dump-statistics-in-the-file-dumped-f-20250401.patchtext/x-patch; charset=US-ASCII; name=0003-Do-not-dump-statistics-in-the-file-dumped-f-20250401.patchDownload
From 33faeadb5e3b52b9e0131ee55a54b9f71d085c2d Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Tue, 25 Feb 2025 11:42:51 +0530
Subject: [PATCH 3/3] Do not dump statistics in the file dumped for comparison
The dumped and restored statistics of a materialized view may differ, as
reported in [1]. Hence do not dump statistics, to avoid differences
between the dump outputs from the original and restored databases.
[1] https://www.postgresql.org/message-id/CAExHW5s47kmubpbbRJzSM-Zfe0Tj2O3GBagB7YAyE8rQ-V24Uw@mail.gmail.com
Ashutosh Bapat
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 71dc25ca938..aed1dcfde62 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -629,15 +629,15 @@ sub get_dump_for_comparison
my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
my $dump_adjusted = "${dumpfile}_adjusted";
- # Usually we avoid comparing statistics in our tests since they are flaky
- # by nature. However, if statistics are dumped and restored, they are
- # expected to be restored as is, i.e. the statistics from the original and
- # restored databases should match. We turn off autovacuum on the source
- # and target databases to avoid any statistics updates during the restore
- # operation. Hence we do not exclude statistics from the dump.
+ # If statistics are dumped and restored, they are expected to be restored
+ # as is, i.e. the statistics from the original and restored databases
+ # should match. We turn off autovacuum on the source and target databases
+ # to avoid any statistics updates during the restore operation. But as of
+ # now there are cases where statistics are not restored faithfully, so for
+ # now do not dump statistics.
$node->command_ok(
[
- 'pg_dump', '--no-sync', '-d', $node->connstr($db), '-f',
+ 'pg_dump', '--no-sync', '--no-statistics', '-d', $node->connstr($db), '-f',
$dumpfile
],
'dump for comparison succeeded');
--
2.34.1
On 2025-Apr-01, Ashutosh Bapat wrote:
Just today morning, I found something which looks like another bug in
statistics dump/restore [1]. As Daniel has expressed upthread [2], we
should go ahead and commit the test even if the bug is not fixed. But
in case it creates a lot of noise and makes the build farm red, we
could suppress the failure by not dumping statistics for comparison
till the bug is fixed. PFA patchset which reintroduces 0003 which
suppresses the statistics dump - in case we think it's needed. I have
made some minor cosmetic changes to 0001 and 0002 as well.
I have made some changes of my own, and included --no-statistics.
But I had already started messing with your patch, so I didn't look at
the cosmetic changes you did here. If they're still relevant, please
send them my way.
Hopefully it won't break, and if it does, it's likely the fault of the
changes I made. I've run it through CI and all is well though, so
fingers crossed.
https://cirrus-ci.com/build/6327169669922816
I observe in the CI results that the pg_upgrade test is not necessarily
the last one to finish. In one case it even finished in place 12!
[16:36:48.447] 12/332 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK 112.16s 22 subtests passed
https://api.cirrus-ci.com/v1/task/5803071017582592/logs/test_world.log
... but if we still find that it's too slow, we can make it into a
PG_TEST_EXTRA test easily with a "skip" line. (Or we can add
a new PG_TEST_EXCLUDE thingy for impatient people).
Thanks!
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On 1 Apr 2025, at 19:01, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Thanks!
Thanks for taking this one across the finishing line!
--
Daniel Gustafsson
On Tue, Apr 1, 2025 at 10:31 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Apr-01, Ashutosh Bapat wrote:
Just today morning, I found something which looks like another bug in
statistics dump/restore [1]. As Daniel has expressed upthread [2], we
should go ahead and commit the test even if the bug is not fixed. But
in case it creates a lot of noise and makes the build farm red, we
could suppress the failure by not dumping statistics for comparison
till the bug is fixed. PFA patchset which reintroduces 0003 which
suppresses the statistics dump - in case we think it's needed. I have
made some minor cosmetic changes to 0001 and 0002 as well.

I have made some changes of my own, and included --no-statistics.
But I had already started messing with your patch, so I didn't look at
the cosmetic changes you did here. If they're still relevant, please
send them my way.
Thanks a lot. I hope the test will now reveal the problems before they
are committed :)
You have edited those places anyway. So it's ok.
I have closed the CF entry
https://commitfest.postgresql.org/patch/4564/ as committed. I will
create another CF entry to park the --no-statistics reversal change.
That way, we will know when statistics dump/restore has become stable.
Hopefully it won't break, and if it does, it's likely fault of the
changes I made. I've run it through CI and all is well though, so
fingers crossed.
https://cirrus-ci.com/build/6327169669922816

I observe in the CI results that the pg_upgrade test is not necessarily
the last one to finish. In one case it even finished in place 12!

[16:36:48.447] 12/332 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK 112.16s 22 subtests passed
https://api.cirrus-ci.com/v1/task/5803071017582592/logs/test_world.log
Yes. In the few animals that I sampled, the test finishes pretty early
even though it takes longer than many other tests. But it's not the
longest. I also looked at red animals, but none of them report this
test as failing.
--
Best Wishes,
Ashutosh Bapat
On 2025-Apr-02, Ashutosh Bapat wrote:
I have closed the CF entry
https://commitfest.postgresql.org/patch/4564/ committed. I will
create another CF entry to park --no-statistics reversal change. That
way, we will know when statistics dump/restore has become stable.
No commitfest entry please. Better to add an open item on the wiki
page.
https://wiki.postgresql.org/wiki/Open_Items
Yes. Few animals that I sampled, the test is finishing pretty early
even though it's taking longer than many other tests. But it's not the
longest. I also looked at red animals, but none of them report this
test to be failing.
Yay. Still, I don't think this is a reason not to seek a way to
optimize the test run time in one of the ways we discussed.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Every machine is a smoke machine if you operate it wrong enough."
https://twitter.com/libseybieda/status/1541673325781196801
Hi Alvaro,
On Wed, Apr 2, 2025 at 2:49 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Apr-02, Ashutosh Bapat wrote:
I have closed the CF entry
https://commitfest.postgresql.org/patch/4564/ committed. I will
create another CF entry to park --no-statistics reversal change. That
way, we will know when statistics dump/restore has become stable.

No commitfest entry please. Better to add an open item on the wiki
page.
https://wiki.postgresql.org/wiki/Open_Items
Posted it on the thread where I have reported the bug. Hopefully, we
will commit both the bug fix and test change to enable stats together.
Yes. Few animals that I sampled, the test is finishing pretty early
even though it's taking longer than many other tests. But it's not the
longest. I also looked at red animals, but none of them report this
test to be failing.

Yay. Still, I don't think this is a reason not to seek a way to
optimize the test run time in one of the ways we discussed.
Yes, sure.
--
Best Wishes,
Ashutosh Bapat
On Wed, 2 Apr 2025 at 13:49, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Tue, Apr 1, 2025 at 10:31 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Apr-01, Ashutosh Bapat wrote:
Just today morning, I found something which looks like another bug in
statistics dump/restore [1]. As Daniel has expressed upthread [2], we
should go ahead and commit the test even if the bug is not fixed. But
in case it creates a lot of noise and makes the build farm red, we
could suppress the failure by not dumping statistics for comparison
till the bug is fixed. PFA patchset which reintroduces 0003 which
suppresses the statistics dump - in case we think it's needed. I have
made some minor cosmetic changes to 0001 and 0002 as well.

I have made some changes of my own, and included --no-statistics.
But I had already started messing with your patch, so I didn't look at
the cosmetic changes you did here. If they're still relevant, please
send them my way.

Thanks a lot. I hope the test will now reveal the problems before they
are committed :)

You have edited those places anyway. So it's ok.
I have closed the CF entry
https://commitfest.postgresql.org/patch/4564/ committed. I will
create another CF entry to park --no-statistics reversal change. That
way, we will know when statistics dump/restore has become stable.

Hopefully it won't break, and if it does, it's likely fault of the
changes I made. I've run it through CI and all is well though, so
fingers crossed.
https://cirrus-ci.com/build/6327169669922816

I observe in the CI results that the pg_upgrade test is not necessarily
the last one to finish. In one case it even finished in place 12!

[16:36:48.447] 12/332 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK 112.16s 22 subtests passed
https://api.cirrus-ci.com/v1/task/5803071017582592/logs/test_world.log

Yes. Few animals that I sampled, the test is finishing pretty early
even though it's taking longer than many other tests. But it's not the
longest. I also looked at red animals, but none of them report this
test to be failing.
I believe this commitfest entry at [1] can be closed now, as the
buildfarm has been running stably for the past few days.
[1]: https://commitfest.postgresql.org/patch/4956/
Regards,
Vignesh
On Thu, Apr 3, 2025 at 9:29 AM vignesh C <vignesh21@gmail.com> wrote:
I believe this commitfest entry at [1] can be closed now, as the
buildfarm has been running stably for the past few days.
[1] - https://commitfest.postgresql.org/patch/4956/
I intended to close this but closed another entry by mistake. If
possible, let's keep this entry open for a few days; I am trying
something that might let us remove --no-statistics as well.
--
Best Wishes,
Ashutosh Bapat
On Wed, Apr 2, 2025 at 3:36 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
No commitfest entry please. Better to add an open item on the wiki
page.
https://wiki.postgresql.org/wiki/Open_Items

Posted it on the thread where I have reported the bug. Hopefully, we
will commit both the bug fix and test change to enable stats together.
Looks like the problem is in the test itself, as pointed out by Jeff in
[1]. PFA patch fixing the test and enabling statistics back.
The test file is arranged as follows:
1. Set up the old cluster (this step also runs the regression tests if needed)
2. Create a new cluster for the upgrade by modifying some configuration
from the old cluster
3. Disable autovacuum on the old cluster
4. Run the dump/restore roundtrip test, which creates a destination
cluster with the same configuration as the old cluster
A note about variable name changes and the introduction of new variables:
we run step 2 between steps 1 and 3 so that autovacuum gets a chance to
run on the old cluster and update statistics. An autovacuum run is not
necessary, but it is useful here. Before these changes, all the cluster
initializations used the same variables, @initdb_params and %node_params.
With these changes, however, we modify those variables in step 2 and then
need their original values again in step 4. So I have used two sets of
variables, prefixed with old_ and new_, for the clusters created in steps
1 and 2 respectively; step 4 uses the variables with the old_ prefix. I
think this change eliminates the confusion caused by reusing the same
variables with different values.
[1]: /messages/by-id/5f3703fd7f27da62a8f3615218f937507f522347.camel@j-davis.com
I will watch the CF CI runs to see if we still see differences in
statistics after this change.
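The pass/fail criterion being watched here is, at its core, just a text comparison of the two dumps after identical normalization. A minimal Python sketch of that final comparison step (the function names are hypothetical, not part of the test suite):

```python
import re

def normalize_dump(text):
    # Mirror the whitespace adjustments in adjust_regress_dumpfile():
    # use Unix newlines and collapse runs of blank lines.
    text = text.replace("\r\n", "\n")
    return re.sub(r"\n\n+", "\n", text)

def dumps_match(src_dump, dst_dump):
    # The roundtrip test passes when the adjusted dumps are identical.
    return normalize_dump(src_dump) == normalize_dump(dst_dump)
```

Because the comparison is exact after normalization, any unstabilized statistics update between the two dumps (for example, an autovacuum run) shows up as a failure.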
--
Best Wishes,
Ashutosh Bapat
Attachments:
0001-Fix-differences-in-dumped-statistics-20250403.patchtext/x-patch; charset=US-ASCII; name=0001-Fix-differences-in-dumped-statistics-20250403.patchDownload
From d3ba3f4259a4594dbb43e43e540c977a2346c523 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Wed, 2 Apr 2025 15:04:16 +0530
Subject: [PATCH] Fix differences in dumped statistics
Autovacuum is turned off after running the dump/restore roundtrip test.
The test executes the following steps sequentially:
1. takes a dump, with statistics, from the source database for comparison
2. initializes the destination database cluster
3. takes a dump to be restored on the destination cluster
4. restores the dump
5. takes a dump from the destination cluster for comparison
If autovacuum is triggered between steps 1 and 3, the statistics in the
dumps taken for comparison may differ, causing the test to fail. In the
test file, autovacuum is turned off on the original cluster before taking
the dump for the upgrade test. Move the dump/restore roundtrip test after
that step so that statistics remain stable on the original cluster for
the entire duration of the test.
With this change we expect that there will be no difference in the
statistics dumped from the original and the restored databases. Hence
enable dumping statistics for comparison.
Author: Ashutosh Bapat (ashutosh.bapat.oss@gmail.com)
Analysis-by: Jeff Davis (pgsql@j-davis.com)
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 148 ++++++++++++-------------
1 file changed, 74 insertions(+), 74 deletions(-)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 7494614ee64..49ee6ae3003 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -86,7 +86,7 @@ sub get_dump_for_comparison
# Don't dump statistics, because there are still some bugs.
$node->run_log(
[
- 'pg_dump', '--no-sync', '--no-statistics',
+ 'pg_dump', '--no-sync',
'-d' => $node->connstr($db),
'-f' => $dumpfile
]);
@@ -128,7 +128,7 @@ my $oldnode =
PostgreSQL::Test::Cluster->new('old_node',
install_path => $ENV{oldinstall});
-my %node_params = ();
+my %old_node_params = ();
# To increase coverage of non-standard segment size and group access without
# increasing test runtime, run these tests with a custom setting.
@@ -194,34 +194,34 @@ else
my %encodings = ('UTF-8' => 6, 'SQL_ASCII' => 0);
my $original_encoding = $encodings{$original_enc_name};
-my @initdb_params = @custom_opts;
+my @old_initdb_params = @custom_opts;
-push @initdb_params, ('--encoding', $original_enc_name);
-push @initdb_params, ('--lc-collate', $original_datcollate);
-push @initdb_params, ('--lc-ctype', $original_datctype);
+push @old_initdb_params, ('--encoding', $original_enc_name);
+push @old_initdb_params, ('--lc-collate', $original_datcollate);
+push @old_initdb_params, ('--lc-ctype', $original_datctype);
# add --locale-provider, if supported
my %provider_name = ('b' => 'builtin', 'i' => 'icu', 'c' => 'libc');
if ($oldnode->pg_version >= 15)
{
- push @initdb_params,
+ push @old_initdb_params,
('--locale-provider', $provider_name{$original_provider});
if ($original_provider eq 'b')
{
- push @initdb_params, ('--builtin-locale', $original_datlocale);
+ push @old_initdb_params, ('--builtin-locale', $original_datlocale);
}
elsif ($original_provider eq 'i')
{
- push @initdb_params, ('--icu-locale', $original_datlocale);
+ push @old_initdb_params, ('--icu-locale', $original_datlocale);
}
}
# Since checksums are now enabled by default, and weren't before 18,
# pass '-k' to initdb on old versions so that upgrades work.
-push @initdb_params, '-k' if $oldnode->pg_version < 18;
+push @old_initdb_params, '-k' if $oldnode->pg_version < 18;
-$node_params{extra} = \@initdb_params;
-$oldnode->init(%node_params);
+$old_node_params{extra} = \@old_initdb_params;
+$oldnode->init(%old_node_params);
$oldnode->start;
my $result;
@@ -301,74 +301,19 @@ else
is($rc, 0, 'regression tests pass');
}
-# Test that dump/restore of the regression database roundtrips cleanly. This
-# doesn't work well when the nodes are different versions, so skip it in that
-# case. Note that this isn't a pg_restore test, but it's convenient to do it
-# here because we've gone to the trouble of creating the regression database.
-#
-# Do this while the old cluster is running before it is shut down by the
-# upgrade test.
-SKIP:
-{
- my $dstnode = PostgreSQL::Test::Cluster->new('dst_node');
-
- skip "different Postgres versions"
- if ($oldnode->pg_version != $dstnode->pg_version);
- skip "source node not using default install"
- if (defined $oldnode->install_path);
-
- # Dump the original database for comparison later.
- my $src_dump =
- get_dump_for_comparison($oldnode, 'regression', 'src_dump', 1);
-
- # Setup destination database cluster
- $dstnode->init(%node_params);
- # Stabilize stats for comparison.
- $dstnode->append_conf('postgresql.conf', 'autovacuum = off');
- $dstnode->start;
-
- my $dump_file = "$tempdir/regression.dump";
-
- # Use --create in dump and restore commands so that the restored
- # database has the same configurable variable settings as the original
- # database so that the dumps taken from both databases taken do not
- # differ because of locale changes. Additionally this provides test
- # coverage for --create option.
- #
- # Use directory format so that we can use parallel dump/restore.
- $oldnode->command_ok(
- [
- 'pg_dump', '-Fd', '-j2', '--no-sync',
- '-d' => $oldnode->connstr('regression'),
- '--create', '-f' => $dump_file
- ],
- 'pg_dump on source instance');
-
- $dstnode->command_ok(
- [ 'pg_restore', '--create', '-j2', '-d' => 'postgres', $dump_file ],
- 'pg_restore to destination instance');
-
- my $dst_dump =
- get_dump_for_comparison($dstnode, 'regression', 'dest_dump', 0);
-
- compare_files($src_dump, $dst_dump,
- 'dump outputs from original and restored regression databases match');
-}
-
# Initialize a new node for the upgrade.
my $newnode = PostgreSQL::Test::Cluster->new('new_node');
-# Reset to original parameters.
-@initdb_params = @custom_opts;
# The new cluster will be initialized with different locale settings,
# but these settings will be overwritten with those of the original
# cluster.
-push @initdb_params, ('--encoding', 'SQL_ASCII');
-push @initdb_params, ('--locale-provider', 'libc');
-
-$node_params{extra} = \@initdb_params;
-$newnode->init(%node_params);
+my %new_node_params = %old_node_params;
+my @new_initdb_params = @custom_opts;
+push @new_initdb_params, ('--encoding', 'SQL_ASCII');
+push @new_initdb_params, ('--locale-provider', 'libc');
+$new_node_params{extra} = \@new_initdb_params;
+$newnode->init(%new_node_params);
# Stabilize stats for comparison.
$newnode->append_conf('postgresql.conf', 'autovacuum = off');
@@ -410,10 +355,65 @@ if (defined($ENV{oldinstall}))
}
}
-# Stabilize stats before pg_dumpall.
+# Stabilize stats before pg_dumpall. Doing it only after initializing the
+# new node gives autovacuum enough time to update statistics on the old node.
$oldnode->append_conf('postgresql.conf', 'autovacuum = off');
$oldnode->restart;
+# Test that dump/restore of the regression database roundtrips cleanly. This
+# doesn't work well when the nodes are different versions, so skip it in that
+# case. Note that this isn't a pg_restore test, but it's convenient to do it
+# here because we've gone to the trouble of creating the regression database.
+#
+# Do this while the old cluster is running before it is shut down by the
+# upgrade test but after turning its autovacuum off for stable statistics.
+SKIP:
+{
+ my $dstnode = PostgreSQL::Test::Cluster->new('dst_node');
+
+ skip "different Postgres versions"
+ if ($oldnode->pg_version != $dstnode->pg_version);
+ skip "source node not using default install"
+ if (defined $oldnode->install_path);
+
+ # Set up the destination database cluster with the same configuration as
+ # the source cluster, to avoid differences between the dumps taken from
+ # the two clusters caused by differences in their configurations.
+ $dstnode->init(%old_node_params);
+ # Stabilize stats for comparison.
+ $dstnode->append_conf('postgresql.conf', 'autovacuum = off');
+ $dstnode->start;
+
+ # Use --create in dump and restore commands so that the restored
+ # database has the same configurable variable settings as the original
+ # database so that the dumps taken from both databases do not
+ # differ because of locale changes. Additionally this provides test
+ # coverage for --create option.
+ #
+ # Use directory format so that we can use parallel dump/restore.
+ my $dump_file = "$tempdir/regression.dump";
+ $oldnode->command_ok(
+ [
+ 'pg_dump', '-Fd', '-j2', '--no-sync',
+ '-d' => $oldnode->connstr('regression'),
+ '--create', '-f' => $dump_file
+ ],
+ 'pg_dump on source instance');
+
+ $dstnode->command_ok(
+ [ 'pg_restore', '--create', '-j2', '-d' => 'postgres', $dump_file ],
+ 'pg_restore to destination instance');
+
+ # Dump original and restored database for comparison.
+ my $src_dump =
+ get_dump_for_comparison($oldnode, 'regression', 'src_dump', 1);
+ my $dst_dump =
+ get_dump_for_comparison($dstnode, 'regression', 'dest_dump', 0);
+
+ compare_files($src_dump, $dst_dump,
+ 'dump outputs from original and restored regression databases match');
+}
+
# Take a dump before performing the upgrade as a base comparison. Note
# that we need to use pg_dumpall from the new node here.
my @dump_command = (
base-commit: a7187c3723b41057522038c5e5db329d84f41ac4
--
2.34.1
On 2025-Apr-03, Ashutosh Bapat wrote:
Looks like the problem is in the test itself as pointed out by Jeff in
[1]. PFA patch fixing the test and enabling statistics back.
Thanks, pushed.
A note about variable name changes and introduction of new variables.
We run step 2 between 1 and 3 so that autovacuum gets a chance to run
on the old cluster and update statistics. Autovacuum run is not
necessary but useful here. Before these changes all the cluster
initializations were using the same variables @initdb_params and
%node_params. However with these changes, we modify the variable in
step 2 and then again require original values in step 4. So I have
used two sets of variables prefixed with old_ and new_ for clusters
created in 1st step and 2nd step respectively. 4th step uses the
variables with prefix old_. I think this change eliminates confusion
caused by using same variables with different values.
This was a good change, thanks.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"No es bueno caminar con un hombre muerto"
On Thu, Apr 3, 2025 at 1:50 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Apr-03, Ashutosh Bapat wrote:
Looks like the problem is in the test itself as pointed out by Jeff in
[1]. PFA patch fixing the test and enabling statistics back.

Thanks, pushed.
Thanks.
--
Best Wishes,
Ashutosh Bapat
Hi,
On 2025-04-03 10:20:09 +0200, Alvaro Herrera wrote:
On 2025-Apr-03, Ashutosh Bapat wrote:
Looks like the problem is in the test itself as pointed out by Jeff in
[1]. PFA patch fixing the test and enabling statistics back.

Thanks, pushed.
Since then the pg_upgrade tests have been failing on skink/valgrind, due to
exceeding the already substantially increased timeout.
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2025-04-03%2007%3A06%3A19&stg=pg_upgrade-check
(note that there are other issues in that run)
284/333 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade TIMEOUT 10000.66s killed by signal 15 SIGTERM
[10:38:19.815](16.712s) ok 20 - check that locales in new cluster match original cluster
...
# Running: pg_dumpall --no-sync --dbname port=15114 host=/tmp/bh_AdT5uvQ dbname='postgres' --file /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_gp2G/dump2.sql
death by signal at /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm line 181.
...
[10:44:11.720](351.905s) # Tests were run but no plan was declared and done_testing() was not seen.
I've increased the timeout even further, but I can't say that I am happy about
the slowest test getting even slower. Adding test time in the serially slowest
test is way worse than adding the same time in a concurrent test.
I suspect that the test would go a bit faster if log_statement weren't
forced on; printing that many log lines, with context, does make valgrind
slower, IME. But Cluster.pm forces it on, and I suspect that putting a
global log_statement=false into TEMP_CONFIG would have its own
disadvantages.
/me goes and checks prices for increasing the size of skink's host.
Greetings,
Andres
On 2025-Apr-03, Andres Freund wrote:
I've increased the timeout even further, but I can't say that I am happy about
the slowest test getting even slower. Adding test time in the serially slowest
test is way worse than adding the same time in a concurrent test.
Yeah. We discussed strategies to shorten the runtime, but the agreement
upthread was that we'd look for more elaborate ways to do that
afterwards. As I mentioned, I can see adding something like
PG_TEST_EXCLUDE that we could use to suppress this test on slow hosts.
Would that work for you?
(We also discussed the fact that this was part of 002_pg_upgrade.pl
instead of being elsewhere. The reason is that this depends on the
regression tests having run, and this is the only TAP test that does
that. Well, this one and 027_stream_regress.pl which is even slower.)
I suspect that the test will go a bit faster if log_statement weren't forced
on; printing that many log lines, with context, does make valgrind slower,
IME. But Cluster.pm forces it to on, and I suspect that putting a global
log_statement=false into TEMP_CONFIG would have its own disadvantages.
I'm sure we can make this change as well somehow, overriding the
setting just for 002_pg_upgrade.pl, as attached. I don't think it's
relevant for this particular test. The log files go from 21 MB to
2.4 MB. It's not nothing ...
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Selbst das größte Genie würde nicht weit kommen, wenn es
alles seinem eigenen Innern verdanken wollte." (Johann Wolfgang von Goethe)
Even the greatest genius would not get very far if he
wanted to owe everything to his own inner self.
Attachments:
nologstatements.patch (text/x-diff)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 311391d7acd..46203c55baf 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -221,6 +221,8 @@ push @old_initdb_params, '-k' if $oldnode->pg_version < 18;
$old_node_params{extra} = \@old_initdb_params;
$oldnode->init(%old_node_params);
+# Override log_statement set by Cluster.pm; undesirable for this test
+$oldnode->append_conf('postgresql.conf', 'log_statement = none');
$oldnode->start;
my $result;
@@ -312,6 +314,7 @@ push @new_initdb_params, ('--encoding', 'SQL_ASCII');
push @new_initdb_params, ('--locale-provider', 'libc');
$new_node_params{extra} = \@new_initdb_params;
$newnode->init(%new_node_params);
+$newnode->append_conf('postgresql.conf', 'log_statement=none'); # see above
# Stabilize stats for comparison.
$newnode->append_conf('postgresql.conf', 'autovacuum = off');
@@ -379,6 +382,7 @@ SKIP:
# source cluster to avoid any differences between dumps taken from both the
# clusters caused by differences in their configurations.
$dstnode->init(%old_node_params);
+ $dstnode->append_conf('postgresql.conf', 'log_statement=none');
# Stabilize stats for comparison.
$dstnode->append_conf('postgresql.conf', 'autovacuum = off');
$dstnode->start;
On Thu, Apr 3, 2025 at 10:44 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Apr-03, Andres Freund wrote:
I've increased the timeout even further, but I can't say that I am happy about
the slowest test getting even slower. Adding test time in the serially slowest
test is way worse than adding the same time in a concurrent test.

Yeah. We discussed strategies to shorten the runtime, but the agreement
upthread was that we'd look for more elaborate ways to do that
afterwards. As I mentioned, I can see adding something like
PG_TEST_EXCLUDE that we could use to suppress this test on slow hosts.
Would that work for you?

(We also discussed the fact that this was part of 002_pg_upgrade.pl
instead of being elsewhere. The reason is that this depends on the
regression tests having run, and this is the only TAP test that does
that. Well, this one and 027_stream_regress.pl which is even slower.)

I suspect that the test will go a bit faster if log_statement weren't forced
on; printing that many log lines, with context, does make valgrind slower,
IME. But Cluster.pm forces it to on, and I suspect that putting a global
log_statement=false into TEMP_CONFIG would have its own disadvantages.

I'm sure we can make this change as well somehow, overriding the
setting just for 002_pg_upgrade.pl, as attached. I don't think it's
relevant for this particular test. The log files go from 21 MB to
2.4 MB. It's not nothing ...
It doesn't show any time improvement on my laptop, but it may improve
valgrind timing. My valgrind setup is broken; I'm trying to fix it and run
it. I have included this as 0002 in the attached patchset.
0001 is an attempt to reduce the runtime of the test by not setting up a
cluster for restoring the database. Instead the test uses the upgraded
node as the target. This works well since we expect the old node and
new node to be running the same version and a default install. The only
unpleasantness is that 1. the dump and restore phases are spatially and
temporally separated, and 2. the upgraded regression database needs to be
renamed to save its state for diagnosis, if required. But as a result
this saves 3 seconds on my laptop. Earlier we saw that the test added
9 seconds on my laptop, and we have gained back 3 of them; doesn't seem bad.
It should show a significant difference in a valgrind run.
--
Best Wishes,
Ashutosh Bapat
Attachments:
0001-Reduce-time-taken-by-002_pg_upgrade-test-to-20250404.patch (text/x-patch)
From ef7880e1f1774c6875f4b265287f70d844e20ca4 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Fri, 4 Apr 2025 14:26:48 +0530
Subject: [PATCH 1/2] Reduce time taken by 002_pg_upgrade test to run
This test is one of the longest running tests, and
172259afb563d35001410dc6daad78b250924038 makes it even longer by adding a test
of the dump/restore roundtrip of the regression database.

The test already creates two clusters for the pg_upgrade test. When these
clusters have the same version and do not use a custom installation, we run the
dump/restore test. The new upgraded cluster can be used as the target of the
restore instead of creating a new cluster.

But this separates the dump and restore phases of the test spatially and
temporally, since the dump needs to be taken while the old cluster is running
and it can be restored only after the new upgraded cluster is running;
in between we run the upgrade test. This separation affects the readability of
the test and hence wasn't attempted before. But since the runtime of the test
seems to be more important, we take a hit on readability. We have added
comments so as to link the two phases and improve readability.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reported-by: Andres Freund <andres@anarazel.de>
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 72 ++++++++++++++++----------
1 file changed, 46 insertions(+), 26 deletions(-)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 311391d7acd..8e7ccd2dad0 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -359,30 +359,34 @@ if (defined($ENV{oldinstall}))
$oldnode->append_conf('postgresql.conf', 'autovacuum = off');
$oldnode->restart;
+# Dump/restore Test.
+#
# Test that dump/restore of the regression database roundtrips cleanly. This
# doesn't work well when the nodes are different versions, so skip it in that
# case. Note that this isn't a pg_upgrade test, but it's convenient to do it
# here because we've gone to the trouble of creating the regression database.
#
-# Do this while the old cluster is running before it is shut down by the
-# upgrade test but after turning its autovacuum off for stable statistics.
+# We execute this in two parts as follows:
+#
+# Part 1: Take dump from the old cluster while it is running before being shut
+# down by the upgrade test but after turning its autovacuum off for stable
+# statistics. If this part succeeds and is not skipped, it will leave behind
+# dump to be restored and a dump file for comparison.
+#
+# Part 2: The dump is restored on the upgraded cluster once it is running.
+#
+# Though this separates the two parts spatially and temporally, it avoids
+# creating a new cluster, thus saving time (and resources) in this already long
+# running test.
+my $regress_dump_file;
+my $src_dump;
SKIP:
{
- my $dstnode = PostgreSQL::Test::Cluster->new('dst_node');
-
skip "different Postgres versions"
- if ($oldnode->pg_version != $dstnode->pg_version);
+ if ($oldnode->pg_version != $newnode->pg_version);
skip "source node not using default install"
if (defined $oldnode->install_path);
- # Setup destination database cluster with the same configuration as the
- # source cluster to avoid any differences between dumps taken from both the
- # clusters caused by differences in their configurations.
- $dstnode->init(%old_node_params);
- # Stabilize stats for comparison.
- $dstnode->append_conf('postgresql.conf', 'autovacuum = off');
- $dstnode->start;
-
# Use --create in dump and restore commands so that the restored
# database has the same configurable variable settings as the original
# database so that the dumps taken from both databases taken do not
@@ -390,27 +394,18 @@ SKIP:
# coverage for --create option.
#
# Use directory format so that we can use parallel dump/restore.
- my $dump_file = "$tempdir/regression.dump";
+ $regress_dump_file = "$tempdir/regression.dump";
$oldnode->command_ok(
[
'pg_dump', '-Fd', '-j2', '--no-sync',
'-d' => $oldnode->connstr('regression'),
- '--create', '-f' => $dump_file
+ '--create', '-f' => $regress_dump_file
],
'pg_dump on source instance');
- $dstnode->command_ok(
- [ 'pg_restore', '--create', '-j2', '-d' => 'postgres', $dump_file ],
- 'pg_restore to destination instance');
-
- # Dump original and restored database for comparison.
- my $src_dump =
+ # Dump original database for comparison.
+ $src_dump =
get_dump_for_comparison($oldnode, 'regression', 'src_dump', 1);
- my $dst_dump =
- get_dump_for_comparison($dstnode, 'regression', 'dest_dump', 0);
-
- compare_files($src_dump, $dst_dump,
- 'dump outputs from original and restored regression databases match');
}
# Take a dump before performing the upgrade as a base comparison. Note
@@ -629,4 +624,29 @@ my $dump2_filtered = filter_dump(0, $oldnode->pg_version, $dump2_file);
compare_files($dump1_filtered, $dump2_filtered,
'old and new dumps match after pg_upgrade');
+# Execute Part 2 of Dump/restore Test.
+SKIP:
+{
+ # Skip Part 2 if the dump to be restored and the dump file for comparison do
+ # not exist. Part 1 was not executed or did not succeed.
+ skip "no dump/restore test"
+ if not defined $regress_dump_file or not defined $src_dump;
+
+ # Use --create option as explained in Part 1. Rename upgraded regression
+ # database so that pg_restore can succeed and so that it's available for
+ # diagnosing problems if any.
+ $newnode->safe_psql('postgres', 'ALTER DATABASE regression RENAME TO
+ regression_upgraded');
+ $newnode->command_ok(
+ [ 'pg_restore', '--create', '-j2', '-d' => 'postgres', $regress_dump_file ],
+ 'pg_restore to destination instance');
+
+ # Dump restored database for comparison.
+ my $dst_dump =
+ get_dump_for_comparison($newnode, 'regression', 'dest_dump', 0);
+
+ compare_files($src_dump, $dst_dump,
+ 'dump outputs from original and restored regression databases match');
+}
+
done_testing();
base-commit: 7afca7edef751b8d7c0f5b6402ffcefc11c67fdd
--
2.34.1
0002-Turn-off-log_statement-to-save-CPU-cycles-20250404.patch (text/x-patch)
From 262fcdab9568e447fade2fba886503aef27dcda7 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Fri, 4 Apr 2025 16:28:20 +0530
Subject: [PATCH 2/2] Turn off log_statement to save CPU cycles
The test exercises the pg_dump and pg_upgrade utilities, so logging the actual
statements executed on the server is not worth the CPU cycles spent on it.
This is significant in a valgrind environment, which slows tests down
considerably.
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 8e7ccd2dad0..96fc0b4dc73 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -221,6 +221,9 @@ push @old_initdb_params, '-k' if $oldnode->pg_version < 18;
$old_node_params{extra} = \@old_initdb_params;
$oldnode->init(%old_node_params);
+# Override log_statement set by Cluster.pm; logged statements are not as useful
+# and consume CPU cycles unnecessarily.
+$oldnode->append_conf('postgresql.conf', 'log_statement = none');
$oldnode->start;
my $result;
@@ -312,6 +315,8 @@ push @new_initdb_params, ('--encoding', 'SQL_ASCII');
push @new_initdb_params, ('--locale-provider', 'libc');
$new_node_params{extra} = \@new_initdb_params;
$newnode->init(%new_node_params);
+# Override log_statement set by Cluster.pm; as explained in case of oldnode.
+$newnode->append_conf('postgresql.conf', 'log_statement=none');
# Stabilize stats for comparison.
$newnode->append_conf('postgresql.conf', 'autovacuum = off');
--
2.34.1
On Fri, Apr 4, 2025 at 4:41 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Thu, Apr 3, 2025 at 10:44 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Apr-03, Andres Freund wrote:
I've increased the timeout even further, but I can't say that I am happy about
the slowest test getting even slower. Adding test time in the serially slowest
test is way worse than adding the same time in a concurrent test.

Yeah. We discussed strategies to shorten the runtime, but the agreement
upthread was that we'd look for more elaborate ways to do that
afterwards. As I mentioned, I can see adding something like
PG_TEST_EXCLUDE that we could use to suppress this test on slow hosts.
Would that work for you?

(We also discussed the fact that this was part of 002_pg_upgrade.pl
instead of being elsewhere. The reason is that this depends on the
regression tests having run, and this is the only TAP test that does
that. Well, this one and 027_stream_regress.pl which is even slower.)

I suspect that the test will go a bit faster if log_statement weren't forced
on; printing that many log lines, with context, does make valgrind slower,
IME. But Cluster.pm forces it to on, and I suspect that putting a global
log_statement=false into TEMP_CONFIG would have its own disadvantages.

I'm sure we can make this change as well somehow, overriding the
setting just for 002_pg_upgrade.pl, as attached. I don't think it's
relevant for this particular test. The log files go from 21 MB to
2.4 MB. It's not nothing ...

It doesn't show any time improvement on my laptop, but it may improve
valgrind timing. My valgrind setup is broken; I'm trying to fix it and run
it. I have included this as 0002 in the attached patchset.

0001 is an attempt to reduce the runtime of the test by not setting up a
cluster for restoring the database. Instead the test uses the upgraded
node as the target. This works well since we expect the old node and
new node to be running the same version and a default install. The only
unpleasantness is that 1. the dump and restore phases are spatially and
temporally separated, and 2. the upgraded regression database needs to be
renamed to save its state for diagnosis, if required. But as a result
this saves 3 seconds on my laptop. Earlier we saw that the test added
9 seconds on my laptop, and we have gained back 3 of them; doesn't seem bad.
It should show a significant difference in a valgrind run.
Forgot to mention that I made sure that the test is still doing its
work correctly by reverting fd41ba93e463 and checking that it brings
back the failure. Also tested exporting LC_MONETARY to make sure that
the locales in the original and restored databases remain the same.
--
Best Wishes,
Ashutosh Bapat
Hi,
On 2025-04-03 19:14:02 +0200, Alvaro Herrera wrote:
On 2025-Apr-03, Andres Freund wrote:
I've increased the timeout even further, but I can't say that I am happy about
the slowest test getting even slower. Adding test time in the serially slowest
test is way worse than adding the same time in a concurrent test.

Yeah. We discussed strategies to shorten the runtime, but the agreement
upthread was that we'd look for more elaborate ways to do that
afterwards.
I think it's not good to just say "we'll maybe somehow fix it in the future",
particularly if the solution is by no means agreed to. If this were a test
that wasn't already the bottleneck for test cycles, it would be a different
story, but it is.
I'm already unhappy that 002_pg_upgrade got noticeably slower due to the stats
dump changes. This made it even worse.
As I mentioned, I can see adding something like PG_TEST_EXCLUDE that we
could use to suppress this test on slow hosts. Would that work for you?
Not particularly well. For one I don't actually think it's good to exclude it
from something like valgrind testing. But it's also something that wouldn't
really be usable locally, I think, given that we'd be expected to keep the
test working. As outlined below, it really affects the test-hack-test loop
times.
(We also discussed the fact that this was part of 002_pg_upgrade.pl
instead of being elsewhere. The reason is that this depends on the
regression tests having run, and this is the only TAP test that does
that. Well, this one and 027_stream_regress.pl which is even slower.)
FWIW, for me 027 is actually considerably faster. In a cassert -O0 build (my
normal development env, I find even -Og too problematic for debugging):
pg_upgrade/002_pg_upgrade 96.61s
recovery/027_stream_regress 66.04s
After
git revert 8806e4e8deb1e21715e031e17181d904825a410e abe56227b2e213755dd3e194c530f5f06467bd7c 172259afb563d35001410dc6daad78b250924038
pg_upgrade/002_pg_upgrade 75.09s
Slower by 29%, far from the 3s increase I saw mentioned somewhere.
And this really affects the overall test time:
All tests before:
real 1m38.173s
user 5m52.500s
sys 4m23.574s
All tests after:
real 2m0.397s
user 5m53.625s
sys 4m30.518s
The CPU time increase is rather minimal, but the wall clock time increase is
22%.
17:
real 1m14.822s
user 4m2.630s
sys 3m22.384s
We regressed wall clock time *60%* from 17->18. Some test cycle increase is
reasonable and can largely be compensated with hardware, but this cycle we're
growing way faster than hardware gets faster. I don't think that's
sustainable.
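[Editor's note: a throwaway arithmetic check (not part of any patch) of the percentages above, using the quoted "real" wall-clock times:]

```python
# Recompute the wall-clock regression percentages from the times quoted above.

def pct_increase(before_s: float, after_s: float) -> float:
    """Percentage increase going from before_s to after_s."""
    return (after_s - before_s) / before_s * 100.0

t_17 = 1 * 60 + 14.822      # "17:               real 1m14.822s"
t_before = 1 * 60 + 38.173  # "All tests before: real 1m38.173s"
t_after = 2 * 60 + 0.397    # "All tests after:  real 2m0.397s"

print(f"{pct_increase(t_before, t_after):.1f}%")  # ~22.6%, the "22%" figure
print(f"{pct_increase(t_17, t_after):.1f}%")      # ~60.9%, the "*60%*" figure
```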
On valgrind it's also very much true that 002_pg_upgrade is slower:
276/333 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK 10337.79s 22 subtests passed
233/333 postgresql:recovery / recovery/027_stream_regress OK 7642.90s 9 subtests passed
To look at the times in a bit more detail:
[09:19:31.371](6146.167s) ok 5 - regression tests pass
...
# Running: pg_dump -Fd -j2 --no-sync -d port=31947 host=/tmp/HlLiWBwv6T dbname='regression' --create -f /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_lFqZ/regression.dump
[09:26:35.178](201.199s) ok 6 - pg_dump on source instance
# Running: pg_restore --create -j2 -d postgres /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_lFqZ/regression.dump
[09:35:53.079](557.901s) ok 7 - pg_restore to destination instance
# Running: pg_dump --no-sync -d port=31947 host=/tmp/HlLiWBwv6T dbname='regression' -f /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_lFqZ/src_dump.sql
# Running: pg_dump --no-sync -d port=31949 host=/tmp/HlLiWBwv6T dbname='regression' -f /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_lFqZ/dest_dump.sql
[09:43:18.485](445.407s) ok 8 - dump outputs from original and restored regression databases match
# Running: pg_dumpall --no-sync --dbname port=31947 host=/tmp/HlLiWBwv6T dbname='postgres' --file /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_lFqZ/dump1.sql
[09:43:45.895](27.410s) ok 9 - dump before running pg_upgrade
I think the time for "pg_dump on source instance" is somewhat misleading; it
includes initdb and starting the server.
But it's pretty obvious that the newly added steps cost quite a bit of time.
I suspect that the test will go a bit faster if log_statement weren't forced
on; printing that many log lines, with context, does make valgrind slower,
IME. But Cluster.pm forces it to on, and I suspect that putting a global
log_statement=false into TEMP_CONFIG would have its own disadvantages.

I'm sure we can make this change as well somehow, overriding the
setting just for 002_pg_upgrade.pl, as attached. I don't think it's
relevant for this particular test. The log files go from 21 MB to
2.4 MB. It's not nothing ...
That is a nice improvement. I have to run a few errands, then I can check how that
affects valgrind times of a dump and restore of the regression db.
Greetings,
Andres Freund
Hi,
On 2025-04-04 12:01:16 -0400, Andres Freund wrote:
FWIW, for me 027 is actually considerably faster. In a cassert -O0 build (my
normal development env, I find even -Og too problematic for debugging):

pg_upgrade/002_pg_upgrade 96.61s
recovery/027_stream_regress 66.04s

After
git revert 8806e4e8deb1e21715e031e17181d904825a410e abe56227b2e213755dd3e194c530f5f06467bd7c 172259afb563d35001410dc6daad78b250924038

pg_upgrade/002_pg_upgrade 75.09s
Slower by 29%, far from the 3s increase I saw mentioned somewhere.
And this really affects the overall test time:
All tests before:
real 1m38.173s
user 5m52.500s
sys 4m23.574s

All tests after:
real 2m0.397s
user 5m53.625s
sys 4m30.518s

The CPU time increase is rather minimal, but the wall clock time increase is
22%.

17:
real 1m14.822s
user 4m2.630s
sys 3m22.384s

We regressed wall clock time *60%* from 17->18. Some test cycle increase is
reasonable and can largely be compensated with hardware, but this cycle we're
growing way faster than hardware gets faster. I don't think that's
sustainable.
FWIW, with cassert and -O2, it's:
17:
real 0m53.981s
user 3m22.837s
sys 3m24.237s
HEAD:
real 1m19.749s
user 4m54.526s
sys 4m15.657s
so this isn't just due to me using -O0. A 48% increase is better than a 60%
increase, but it's still not sustainable.
Greetings,
Andres Freund
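[Editor's note: a scratch check of the 48% figure above, from the quoted -O2 "real" times:]

```python
# Wall-clock "real" times quoted above, converted to seconds.
t_17 = 53.981        # 17:   real 0m53.981s
t_head = 60 + 19.749 # HEAD: real 1m19.749s

increase = (t_head - t_17) / t_17 * 100.0
print(f"{increase:.1f}%")  # ~47.7%, i.e. the "48% increase"
```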
Hello,
On 2025-Apr-04, Andres Freund wrote:
FWIW, with cassert and -O2, it's:
17:
real 0m53.981s
user 3m22.837s
sys 3m24.237s

HEAD:
real 1m19.749s
user 4m54.526s
sys 4m15.657s

so this isn't just due to me using -O0. A 48% increase is better than a 60%
increase, but it's still not sustainable.
I happened to notice that this item was still open in the commitfest,
and rereading the thread I now have second thoughts about having it
enabled by default, given your complaints about speed. How about
applying this to 18 and master?
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"This is a foot just waiting to be shot" (Andrew Dunstan)
Attachments:
0001-Skip-expensive-pg_upgrade-test-unless-PG_TEST_EXTRA.patch (text/x-diff)
From f50d28ceba7d98672f16bf5049203ae9745871f8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Tue, 5 Aug 2025 16:32:16 +0200
Subject: [PATCH] Skip expensive pg_upgrade test unless PG_TEST_EXTRA
---
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 7d82593879d..5afa6fdfe94 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -375,6 +375,8 @@ SKIP:
{
my $dstnode = PostgreSQL::Test::Cluster->new('dst_node');
+ skip "not enabled in PG_TEST_EXTRA"
+ if (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bregress_dump_upgrade\b/);
skip "different Postgres versions"
if ($oldnode->pg_version != $dstnode->pg_version);
skip "source node not using default install"
--
2.39.5
On 5 Aug 2025, at 16:33, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I happened to notice that this item was still open in the commitfest,
and rereading the thread I now have second thoughts about having it
enabled by default, given your complaints about speed. How about
applying this to 18 and master?
Thanks for reviving this. I am +1 on placing this behind PG_TEST_EXTRA as was
discussed upthread.
--
Daniel Gustafsson
Daniel Gustafsson <daniel@yesql.se> writes:
Thanks for reviving this. I am +1 on placing this behind PG_TEST_EXTRA as was
discussed upthread.
+1 here too.
regards, tom lane
On 2025-Aug-05, Tom Lane wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
Thanks for reviving this. I am +1 on placing this behind PG_TEST_EXTRA as was
discussed upthread.

+1 here too.
Cool, thanks, done. Now we need a volunteer to set up a buildfarm
animal with this flag ...
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"If it is not right, do not do it.
If it is not true, do not say it." (Marcus Aurelius, Meditations)
On Tue, Aug 05, 2025 at 08:11:41PM +0200, Alvaro Herrera wrote:
Cool, thanks, done. Now we need a volunteer to set up a buildfarm
animal with this flag ...
Sure. I have added regress_dump_restore to the configuration of
batta, hachi and gokiburi.
--
Michael
On Wed, Aug 6, 2025 at 8:21 AM Alvaro Herrera <alvherre@kurilemu.de> wrote:
On 2025-Aug-05, Tom Lane wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
Thanks for reviving this. I am +1 on placing this behind PG_TEST_EXTRA as was
discussed upthread.

+1 here too.
Cool, thanks, done. Now we need a volunteer to set up a buildfarm
animal with this flag ...
Your patch didn't contain the doc changes, but the commit has them. Thanks.
--
Best Wishes,
Ashutosh Bapat