Test to dump and restore objects left behind by regression

Started by Ashutosh Bapatabout 2 years ago106 messages
Jump to latest
#1Ashutosh Bapat
ashutosh.bapat@enterprisedb.com

Hi All,
In [1]/messages/by-id/CAExHW5vyqv=XLTcNMzCNccOrHiun_XhYPjcRqeV6dLvZSamriQ@mail.gmail.com we found that having a test to dump and restore objects left
behind by regression test is missing. Such a test would cover many
dump restore scenarios without much effort. It will also help identity
problems described in the same thread [2]/messages/by-id/3462358.1708107856@sss.pgh.pa.us during development itself.

I am starting a new thread to discuss such a test. Attached is a WIP
version of the test. The test does fail at the restore step when
commit 74563f6b90216180fc13649725179fc119dddeb5 is reverted
reintroducing the problem.

Attached WIP test is inspired from
src/bin/pg_upgrade/t/002_pg_upgrade.pl which tests binary-upgrade
dumps. Attached test tests the non-binary-upgrade dumps.

Similar to 0002_pg_upgrade.pl the test uses SQL dumps before and after
dump and restore to make sure that the objects are restored correctly.
The test has some shortcomings
1. Objects which are not dumped at all are never tested.
2. Since the rows are dumped in varying order by the two clusters, the
test only tests schema dump and restore.
3. The order of columns of the inheritance child table differs
depending upon the DDLs used to reach a given state. This introduces
diffs in the SQL dumps before and after restore. The test ignores
these diffs by hardcoding the diff in the test.

Even with 1 and 2 the test is useful to detect dump/restore anomalies.
I think we should improve 3, but I don't have a good and simpler
solution. I didn't find any way to compare two given clusters in our
TAP test framework. Building it will be a lot of work. Not sure if
it's worth it.

Suggestions welcome.

[1]: /messages/by-id/CAExHW5vyqv=XLTcNMzCNccOrHiun_XhYPjcRqeV6dLvZSamriQ@mail.gmail.com
[2]: /messages/by-id/3462358.1708107856@sss.pgh.pa.us

--
Best Wishes,
Ashutosh Bapat

Attachments:

0001-WIP-Test-to-dump-and-restore-object-left-be-20240221.patchapplication/x-patch; name=0001-WIP-Test-to-dump-and-restore-object-left-be-20240221.patchDownload+184-1
#2Michael Paquier
michael@paquier.xyz
In reply to: Ashutosh Bapat (#1)
Re: Test to dump and restore objects left behind by regression

On Wed, Feb 21, 2024 at 12:18:45PM +0530, Ashutosh Bapat wrote:

Even with 1 and 2 the test is useful to detect dump/restore anomalies.
I think we should improve 3, but I don't have a good and simpler
solution. I didn't find any way to compare two given clusters in our
TAP test framework. Building it will be a lot of work. Not sure if
it's worth it.

+	my $rc =
+	  system($ENV{PG_REGRESS}
+		  . " $extra_opts "
+		  . "--dlpath=\"$dlpath\" "
+		  . "--bindir= "
+		  . "--host="
+		  . $node->host . " "
+		  . "--port="
+		  . $node->port . " "
+		  . "--schedule=$srcdir/src/test/regress/parallel_schedule "
+		  . "--max-concurrent-tests=20 "
+		  . "--inputdir=\"$inputdir\" "
+		  . "--outputdir=\"$outputdir\"");

I am not sure that it is a good idea to add a full regression test
cycle while we have already 027_stream_regress.pl that would be enough
to test some dump scenarios. These are very expensive and easy to
notice even with a high level of parallelization of the tests.
--
Michael

#3Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Michael Paquier (#2)
Re: Test to dump and restore objects left behind by regression

On Thu, Feb 22, 2024 at 6:32 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Feb 21, 2024 at 12:18:45PM +0530, Ashutosh Bapat wrote:

Even with 1 and 2 the test is useful to detect dump/restore anomalies.
I think we should improve 3, but I don't have a good and simpler
solution. I didn't find any way to compare two given clusters in our
TAP test framework. Building it will be a lot of work. Not sure if
it's worth it.

+       my $rc =
+         system($ENV{PG_REGRESS}
+                 . " $extra_opts "
+                 . "--dlpath=\"$dlpath\" "
+                 . "--bindir= "
+                 . "--host="
+                 . $node->host . " "
+                 . "--port="
+                 . $node->port . " "
+                 . "--schedule=$srcdir/src/test/regress/parallel_schedule "
+                 . "--max-concurrent-tests=20 "
+                 . "--inputdir=\"$inputdir\" "
+                 . "--outputdir=\"$outputdir\"");

I am not sure that it is a good idea to add a full regression test
cycle while we have already 027_stream_regress.pl that would be enough
to test some dump scenarios.

That test *uses* pg_dump as a way to test whether the two clusters are
in sync. The test might change in future to use some other method to
make sure the two clusters are consistent. Adding the test here to
that test will make that change much harder.

It's not the dump, but restore, we are interested in here. No test
that runs PG_REGRESS also runs pg_restore in non-binary mode.

Also we need to keep this test near other pg_dump tests, not far from them.

These are very expensive and easy to
notice even with a high level of parallelization of the tests.

I agree, but I didn't find a suitable test to ride on.

--
Best Wishes,
Ashutosh Bapat

#4Peter Eisentraut
peter_e@gmx.net
In reply to: Michael Paquier (#2)
Re: Test to dump and restore objects left behind by regression

On 22.02.24 02:01, Michael Paquier wrote:

On Wed, Feb 21, 2024 at 12:18:45PM +0530, Ashutosh Bapat wrote:

Even with 1 and 2 the test is useful to detect dump/restore anomalies.
I think we should improve 3, but I don't have a good and simpler
solution. I didn't find any way to compare two given clusters in our
TAP test framework. Building it will be a lot of work. Not sure if
it's worth it.

+	my $rc =
+	  system($ENV{PG_REGRESS}
+		  . " $extra_opts "
+		  . "--dlpath=\"$dlpath\" "
+		  . "--bindir= "
+		  . "--host="
+		  . $node->host . " "
+		  . "--port="
+		  . $node->port . " "
+		  . "--schedule=$srcdir/src/test/regress/parallel_schedule "
+		  . "--max-concurrent-tests=20 "
+		  . "--inputdir=\"$inputdir\" "
+		  . "--outputdir=\"$outputdir\"");

I am not sure that it is a good idea to add a full regression test
cycle while we have already 027_stream_regress.pl that would be enough
to test some dump scenarios. These are very expensive and easy to
notice even with a high level of parallelization of the tests.

The problem is, we don't really have any end-to-end coverage of

dump
restore
dump again
compare the two dumps

with a database with lots of interesting objects in it.

Note that each of these steps could fail.

We have somewhat relied on the pg_upgrade test to provide this testing,
but we have recently discovered that the dumps in binary-upgrade mode
are different enough to not test the normal dumps well.

Yes, this test is a bit expensive. We could save some time by doing the
first dump at the end of the normal regress test and have the pg_dump
test reuse that, but then that would make the regress test run a bit
longer. Is that a better tradeoff?

I have done some timing tests:

master:

pg_dump check: 22s
pg_dump check -j8: 8s
check-world -j8: 2min44s

patched:

pg_dump check: 34s
pg_dump check -j8: 13s
check-world -j8: 2min46s

So overall it doesn't seem that bad.

#5Daniel Gustafsson
daniel@yesql.se
In reply to: Peter Eisentraut (#4)
Re: Test to dump and restore objects left behind by regression

On 22 Feb 2024, at 10:16, Peter Eisentraut <peter@eisentraut.org> wrote:

We have somewhat relied on the pg_upgrade test to provide this testing, but we have recently discovered that the dumps in binary-upgrade mode are different enough to not test the normal dumps well.

Yes, this test is a bit expensive. We could save some time by doing the first dump at the end of the normal regress test and have the pg_dump test reuse that, but then that would make the regress test run a bit longer. Is that a better tradeoff?

Something this expensive seems like what PG_TEST_EXTRA is intended for, we
already have important test suites there.

But. We know that the cluster has an interesting state when the pg_upgrade
test starts, could we use that to make a dump/restore test before continuing
with testing pg_upgrade? It can be argued that pg_upgrade shouldn't be
responsible for testing pg_dump, but it's already now a pretty important
testcase for pg_dump in binary upgrade mode so it's that far off. If pg_dump
has bugs then pg_upgrade risks subtly breaking.

When upgrading to the same version, we could perhaps also use this to test a
scenario like: Dump A, restore into B, upgrade B into C, dump C and compare C
to A.

--
Daniel Gustafsson

#6Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Daniel Gustafsson (#5)
Re: Test to dump and restore objects left behind by regression

On Thu, Feb 22, 2024 at 3:03 PM Daniel Gustafsson <daniel@yesql.se> wrote:

On 22 Feb 2024, at 10:16, Peter Eisentraut <peter@eisentraut.org> wrote:

We have somewhat relied on the pg_upgrade test to provide this testing, but we have recently discovered that the dumps in binary-upgrade mode are different enough to not test the normal dumps well.

Yes, this test is a bit expensive. We could save some time by doing the first dump at the end of the normal regress test and have the pg_dump test reuse that, but then that would make the regress test run a bit longer. Is that a better tradeoff?

Something this expensive seems like what PG_TEST_EXTRA is intended for, we
already have important test suites there.

That's ok with me.

But. We know that the cluster has an interesting state when the pg_upgrade
test starts, could we use that to make a dump/restore test before continuing
with testing pg_upgrade? It can be argued that pg_upgrade shouldn't be
responsible for testing pg_dump, but it's already now a pretty important
testcase for pg_dump in binary upgrade mode so it's that far off. If pg_dump
has bugs then pg_upgrade risks subtly breaking.

Somebody looking for dump/restore tests wouldn't search
src/bin/pg_upgrade, I think. However if more people think we should
just add this test 002_pg_upgrade.pl, I am fine with it.

When upgrading to the same version, we could perhaps also use this to test a
scenario like: Dump A, restore into B, upgrade B into C, dump C and compare C
to A.

If comparison of C to A fails, we wouldn't know which step fails. I
would rather compare outputs of each step separately.

--
Best Wishes,
Ashutosh Bapat

#7Daniel Gustafsson
daniel@yesql.se
In reply to: Ashutosh Bapat (#6)
Re: Test to dump and restore objects left behind by regression

On 22 Feb 2024, at 10:55, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Thu, Feb 22, 2024 at 3:03 PM Daniel Gustafsson <daniel@yesql.se> wrote:

Somebody looking for dump/restore tests wouldn't search
src/bin/pg_upgrade, I think.

Quite possibly not, but pg_upgrade is already today an important testsuite for
testing pg_dump in binary-upgrade mode so maybe more developers touching
pg_dump should?

When upgrading to the same version, we could perhaps also use this to test a
scenario like: Dump A, restore into B, upgrade B into C, dump C and compare C
to A.

If comparison of C to A fails, we wouldn't know which step fails. I
would rather compare outputs of each step separately.

To be clear, this wasn't intended to replace what you are proposing, but an
idea for using it to test *more* scenarios.

--
Daniel Gustafsson

#8Peter Eisentraut
peter_e@gmx.net
In reply to: Daniel Gustafsson (#7)
Re: Test to dump and restore objects left behind by regression

On 22.02.24 11:00, Daniel Gustafsson wrote:

On 22 Feb 2024, at 10:55, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Thu, Feb 22, 2024 at 3:03 PM Daniel Gustafsson <daniel@yesql.se> wrote:

Somebody looking for dump/restore tests wouldn't search
src/bin/pg_upgrade, I think.

Quite possibly not, but pg_upgrade is already today an important testsuite for
testing pg_dump in binary-upgrade mode so maybe more developers touching
pg_dump should?

Yeah, I think attaching this to the existing pg_upgrade test would be a
good idea. Not only would it save test run time, it would probably also
reduce code duplication.

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#4)
Re: Test to dump and restore objects left behind by regression

Peter Eisentraut <peter@eisentraut.org> writes:

The problem is, we don't really have any end-to-end coverage of

dump
restore
dump again
compare the two dumps

with a database with lots of interesting objects in it.

I'm very much against adding another full run of the core regression
tests to support this. But beyond the problem of not bloating the
check-world test runtime, there is the question of what this would
actually buy us. I doubt that it is worth very much, because
it would not detect bugs-of-omission in pg_dump. As I remarked in
the other thread, if pg_dump is blind to the existence of some
feature or field, testing that the dumps compare equal will fail
to reveal that it didn't restore that property.

I'm not sure what we could do about that. One could imagine writing
some test infrastructure that dumps out the contents of the system
catalogs directly, and comparing that instead of pg_dump output.
But that'd be a lot of infrastructure to write and maintain ...
and it's not real clear why it wouldn't *also* suffer from
I-forgot-to-add-this hazards.

On balance, I think there are good reasons that we've not added
such a test, and I don't believe those tradeoffs have changed.

regards, tom lane

#10Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Peter Eisentraut (#8)
Re: Test to dump and restore objects left behind by regression

On Thu, Feb 22, 2024 at 3:50 PM Peter Eisentraut <peter@eisentraut.org> wrote:

On 22.02.24 11:00, Daniel Gustafsson wrote:

On 22 Feb 2024, at 10:55, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Thu, Feb 22, 2024 at 3:03 PM Daniel Gustafsson <daniel@yesql.se> wrote:

Somebody looking for dump/restore tests wouldn't search
src/bin/pg_upgrade, I think.

Quite possibly not, but pg_upgrade is already today an important testsuite for
testing pg_dump in binary-upgrade mode so maybe more developers touching
pg_dump should?

Yeah, I think attaching this to the existing pg_upgrade test would be a
good idea. Not only would it save test run time, it would probably also
reduce code duplication.

That's more than one vote for adding the test to 002_pg_ugprade.pl.
Seems fine to me.

--
Best Wishes,
Ashutosh Bapat

#11Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Tom Lane (#9)
Re: Test to dump and restore objects left behind by regression

On Thu, Feb 22, 2024 at 8:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

The problem is, we don't really have any end-to-end coverage of

dump
restore
dump again
compare the two dumps

with a database with lots of interesting objects in it.

I'm very much against adding another full run of the core regression
tests to support this.

This will be taken care of by Peter's latest idea of augmenting
existing 002_pg_upgrade.pl.

But beyond the problem of not bloating the
check-world test runtime, there is the question of what this would
actually buy us. I doubt that it is worth very much, because
it would not detect bugs-of-omission in pg_dump. As I remarked in
the other thread, if pg_dump is blind to the existence of some
feature or field, testing that the dumps compare equal will fail
to reveal that it didn't restore that property.

I'm not sure what we could do about that. One could imagine writing
some test infrastructure that dumps out the contents of the system
catalogs directly, and comparing that instead of pg_dump output.
But that'd be a lot of infrastructure to write and maintain ...
and it's not real clear why it wouldn't *also* suffer from
I-forgot-to-add-this hazards.

If a developer forgets to add logic to dump objects that their patch
adds, it's hard to detect it, through testing alone, in every possible
case. We need reviewers to take care of that. I don't think that's the
objective of this test case or of pg_upgrade test either.

On balance, I think there are good reasons that we've not added
such a test, and I don't believe those tradeoffs have changed.

I am not aware of those reasons. Are they documented somewhere? Any
pointers to the previous discussion on this topic? Googling "pg_dump
regression pgsql-hackers" returns threads about performance
regressions.

On the flip side, the test I wrote reproduces the COMPRESSION/STORAGE
bug you reported along with a few other bugs in that area which I will
report soon on that thread. I think, that shows that we need such a
test.

--
Best Wishes,
Ashutosh Bapat

#12Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#11)
Re: Test to dump and restore objects left behind by regression

On Fri, Feb 23, 2024 at 10:46 AM Ashutosh Bapat <
ashutosh.bapat.oss@gmail.com> wrote:

On Thu, Feb 22, 2024 at 8:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

The problem is, we don't really have any end-to-end coverage of

dump
restore
dump again
compare the two dumps

with a database with lots of interesting objects in it.

I'm very much against adding another full run of the core regression
tests to support this.

This will be taken care of by Peter's latest idea of augmenting
existing 002_pg_upgrade.pl.

Incorporated the test to 002_pg_ugprade.pl.

Some points for discussion:
1. The test still hardcodes the diffs between two dumps. Haven't found a
better way to do it. I did consider removing the problematic objects from
the regression database but thought against it since we would lose some
coverage.

2. The new code tests dump and restore of just the regression database and
does not use pg_dumpall like pg_upgrade. Should it instead perform
pg_dumpall? I decided against it since a. we are interested in dumping and
restoring objects left behind by regression, b. I didn't find a way to
provide the format option to pg_dumpall. The test could be enhanced to use
different dump formats.

I have added it to the next commitfest.
https://commitfest.postgresql.org/48/4956/

--
Best Wishes,
Ashutosh Bapat

Attachments:

0001-pg_dump-restore-regression-objects-20240426.patchtext/x-patch; charset=US-ASCII; name=0001-pg_dump-restore-regression-objects-20240426.patchDownload+117-1
#13Michael Paquier
michael@paquier.xyz
In reply to: Ashutosh Bapat (#12)
Re: Test to dump and restore objects left behind by regression

On Fri, Apr 26, 2024 at 06:38:22PM +0530, Ashutosh Bapat wrote:

Some points for discussion:
1. The test still hardcodes the diffs between two dumps. Haven't found a
better way to do it. I did consider removing the problematic objects from
the regression database but thought against it since we would lose some
coverage.

2. The new code tests dump and restore of just the regression database and
does not use pg_dumpall like pg_upgrade. Should it instead perform
pg_dumpall? I decided against it since a. we are interested in dumping and
restoring objects left behind by regression, b. I didn't find a way to
provide the format option to pg_dumpall. The test could be enhanced to use
different dump formats.

I have added it to the next commitfest.
https://commitfest.postgresql.org/48/4956/

Ashutosh and I have discussed this patch a bit last week. Here is a
short summary of my input, after I understood what is going on.

+	# We could avoid this by dumping the database loaded from original dump.
+	# But that would change the state of the objects as left behind by the
+	# regression.
+	my $expected_diff = " --
+ CREATE TABLE public.gtestxx_4 (
+-    b integer,
+-    a integer NOT NULL
++    a integer NOT NULL,
++    b integer
+ )
[...]
+	my ($stdout, $stderr) =
+		run_command([ 'diff', '-u', $dump4_file, $dump5_file]);
+	# Clear file names, line numbers from the diffs; those are not going to
+	# remain the same always. Also clear empty lines and normalize new line
+	# characters across platforms.
+	$stdout =~ s/^\@\@.*$//mg;
+	$stdout =~ s/^.*$dump4_file.*$//mg;
+	$stdout =~ s/^.*$dump5_file.*$//mg;
+	$stdout =~ s/^\s*\n//mg;
+	$stdout =~ s/\r\n/\n/g;
+	$expected_diff =~ s/\r\n/\n/g;
+	is($stdout, $expected_diff, 'old and new dumps match after dump and restore');
+}

I am not a fan of what this patch does, adding the knowledge related
to the dump filtering within 002_pg_upgrade.pl. Please do not take
me wrong, I am not against the idea of adding that within this
pg_upgrade test to save from one full cycle of `make check` to check
the consistency of the dump. My issue is that this logic should be
externalized, and it should be in fewer lines of code.

For the externalization part, Ashutosh and I considered a few ideas,
but one that we found tempting is to create a small .pm, say named
AdjustDump.pm. This would share some rules with the existing
AdjustUpgrade.pm, which would be fine IMO even if there is a small
overlap, documenting the dependency between each module. That makes
the integration with the buildfarm much simpler by not creating more
dependencies with the modules shared between core and the buildfarm
code. For the "shorter" part, one idea that I had is to apply to the
dump a regexp that wipes out the column definitions within the
parenthesis, keeping around the CREATE TABLE and any other attributes
not impacted by the reordering. All that should be documented in the
module, of course.

Another thing would be to improve the backend so as we are able to
a better support for physical column ordering, which would, I assume
(and correct me if I'm wrong!), prevent the reordering of the
attributes like in this inheritance case. But that would not address
the case of dumps taken from older versions with a new version of
pg_dump, which is something that may be interesting to have more tests
for in the long-term. Overall a module sounds like a better solution.
--
Michael

#14Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Michael Paquier (#13)
Re: Test to dump and restore objects left behind by regression

On Tue, Jun 4, 2024 at 4:28 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Apr 26, 2024 at 06:38:22PM +0530, Ashutosh Bapat wrote:

Some points for discussion:
1. The test still hardcodes the diffs between two dumps. Haven't found a
better way to do it. I did consider removing the problematic objects from
the regression database but thought against it since we would lose some
coverage.

2. The new code tests dump and restore of just the regression database

and

does not use pg_dumpall like pg_upgrade. Should it instead perform
pg_dumpall? I decided against it since a. we are interested in dumping

and

restoring objects left behind by regression, b. I didn't find a way to
provide the format option to pg_dumpall. The test could be enhanced to

use

different dump formats.

I have added it to the next commitfest.
https://commitfest.postgresql.org/48/4956/

Ashutosh and I have discussed this patch a bit last week. Here is a
short summary of my input, after I understood what is going on.

+       # We could avoid this by dumping the database loaded from original
dump.
+       # But that would change the state of the objects as left behind by
the
+       # regression.
+       my $expected_diff = " --
+ CREATE TABLE public.gtestxx_4 (
+-    b integer,
+-    a integer NOT NULL
++    a integer NOT NULL,
++    b integer
+ )
[...]
+       my ($stdout, $stderr) =
+               run_command([ 'diff', '-u', $dump4_file, $dump5_file]);
+       # Clear file names, line numbers from the diffs; those are not
going to
+       # remain the same always. Also clear empty lines and normalize new
line
+       # characters across platforms.
+       $stdout =~ s/^\@\@.*$//mg;
+       $stdout =~ s/^.*$dump4_file.*$//mg;
+       $stdout =~ s/^.*$dump5_file.*$//mg;
+       $stdout =~ s/^\s*\n//mg;
+       $stdout =~ s/\r\n/\n/g;
+       $expected_diff =~ s/\r\n/\n/g;
+       is($stdout, $expected_diff, 'old and new dumps match after dump
and restore');
+}

I am not a fan of what this patch does, adding the knowledge related
to the dump filtering within 002_pg_upgrade.pl. Please do not take
me wrong, I am not against the idea of adding that within this
pg_upgrade test to save from one full cycle of `make check` to check
the consistency of the dump. My issue is that this logic should be
externalized, and it should be in fewer lines of code.

For the externalization part, Ashutosh and I considered a few ideas,
but one that we found tempting is to create a small .pm, say named
AdjustDump.pm. This would share some rules with the existing
AdjustUpgrade.pm, which would be fine IMO even if there is a small
overlap, documenting the dependency between each module. That makes
the integration with the buildfarm much simpler by not creating more
dependencies with the modules shared between core and the buildfarm
code. For the "shorter" part, one idea that I had is to apply to the
dump a regexp that wipes out the column definitions within the
parenthesis, keeping around the CREATE TABLE and any other attributes
not impacted by the reordering. All that should be documented in the
module, of course.

Thanks for the suggestion. I didn't understand the dependency with the
buildfarm module. Will the new module be used in buildfarm separately? I
will work on this soon. Thanks for changing CF entry to WoA.

Another thing would be to improve the backend so as we are able to
a better support for physical column ordering, which would, I assume
(and correct me if I'm wrong!), prevent the reordering of the
attributes like in this inheritance case. But that would not address
the case of dumps taken from older versions with a new version of
pg_dump, which is something that may be interesting to have more tests
for in the long-term. Overall a module sounds like a better solution.

Changing the physical order of column of a child table based on the
inherited table seems intentional as per MergeAttributes(). That logic
looks sane by itself. In binary mode pg_dump works very hard to retain the
column order by issuing UPDATE commands against catalog tables. I don't
think mimicking that behaviour is the right choice for non-binary dump. I
agree with your conclusion that we fix it in by fixing the diffs. The code
to do that will be part of a separate module.

--
Best Wishes,
Ashutosh Bapat

#15Michael Paquier
michael@paquier.xyz
In reply to: Ashutosh Bapat (#14)
Re: Test to dump and restore objects left behind by regression

On Wed, Jun 05, 2024 at 05:09:58PM +0530, Ashutosh Bapat wrote:

Thanks for the suggestion. I didn't understand the dependency with the
buildfarm module. Will the new module be used in buildfarm separately? I
will work on this soon. Thanks for changing CF entry to WoA.

I had some doubts about PGBuild/Modules/TestUpgradeXversion.pm, but
after double-checking it loads dynamically AdjustUpgrade from the core
tree based on the base path where all the modules are:
# load helper module from source tree
unshift(@INC, "$srcdir/src/test/perl");
require PostgreSQL::Test::AdjustUpgrade;
PostgreSQL::Test::AdjustUpgrade->import;
shift(@INC);

It would be annoying to tweak the buildfarm code more to have a
different behavior depending on the branch of Postgres tested.
Anyway, from what I can see, you could create a new module with the
dump filtering rules that AdjustUpgrade requires without having to
update the buildfarm code.

Changing the physical order of column of a child table based on the
inherited table seems intentional as per MergeAttributes(). That logic
looks sane by itself. In binary mode pg_dump works very hard to retain the
column order by issuing UPDATE commands against catalog tables. I don't
think mimicking that behaviour is the right choice for non-binary dump. I
agree with your conclusion that we fix it in by fixing the diffs. The code
to do that will be part of a separate module.

Thanks.
--
Michael

#16Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Michael Paquier (#15)
Re: Test to dump and restore objects left behind by regression

Sorry for delay, but here's next version of the patchset for review.

On Thu, Jun 6, 2024 at 5:07 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Jun 05, 2024 at 05:09:58PM +0530, Ashutosh Bapat wrote:

Thanks for the suggestion. I didn't understand the dependency with the
buildfarm module. Will the new module be used in buildfarm separately? I
will work on this soon. Thanks for changing CF entry to WoA.

I had some doubts about PGBuild/Modules/TestUpgradeXversion.pm, but
after double-checking it loads dynamically AdjustUpgrade from the core
tree based on the base path where all the modules are:
# load helper module from source tree
unshift(@INC, "$srcdir/src/test/perl");
require PostgreSQL::Test::AdjustUpgrade;
PostgreSQL::Test::AdjustUpgrade->import;
shift(@INC);

It would be annoying to tweak the buildfarm code more to have a
different behavior depending on the branch of Postgres tested.
Anyway, from what I can see, you could create a new module with the
dump filtering rules that AdjustUpgrade requires without having to
update the buildfarm code.

The two filtering rules that I picked from AdjustUpgrade() are a. use unix
style newline b. eliminate blank lines. I think we could copy those rule
into the new module (as done in the patch) without creating any dependency
between modules. There's little gained by creating another perl function
just for those two sed commands. There's no way to do that otherwise. If we
keep those two modules independent, we will be free to change each module
as required in future. Do we need to change buildfarm code to load the
AdjustDump module like above? I am not familiar with the buildfarm code.

Here's a description of patches and some notes
0001
-------
1. Per your suggestion the logic to handle dump output differences is
externalized in PostgreSQL::Test::AdjustDump. Instead of eliminating those
differences altogether from both the dump outputs, the corresponding DDL in
the original dump output is adjusted to look like that from the restored
database. Thus we retain full knowledge of what differences to expect.
2. I have changed the name filter_dump to filter_dump_for_upgrade so as to
differentiate between two adjustments 1. for upgrade and 2. for
dump/restore. Ideally the name should have been adjust_dump_for_ugprade() .
It's more of an adjustment than filtering as indicated by the function it
calls. But I haven't changed that. The new function to adjust dumps for
dump and restore tests is named adjust_dump_for_restore() however.
3. As suggested by Daniel upthread, the test for dump and restore happens
before upgrade which might change the old cluster thus changing the state
of objects left behind by regression. The test is not executed if
regression is not used to create the old cluster.
4. The code to compare two dumps and report differences if any is moved to
its own function compare_dumps() which is used for both upgrade and
dump/restore tests.
The test uses the custom dump format for dumping and restoring the database.

0002
------
This commit expands the previous test to test all dump formats. But as
expected that increases the time taken by this test. On my laptop 0001
takes approx 28 seconds to run the test and with 0002 it takes approx 35
seconds. But there's not much impact on the duration of running all the
tests (2m30s vs 2m40s). The code which creates the DDL statements in the
dump is independent of the dump format. So usually we shouldn't require to
test all the formats in this test. But each format stores the dependencies
between dumped objects in a different manner which would be tested with the
changes in this patch. I think this patch is also useful. If we decide to
keep this test, the patch is intended to be merged into 0001.

--
Best Wishes,
Ashutosh Bapat

Attachments:

0002-Test-dump-and-restore-in-all-formats-20240628.patchtext/x-patch; charset=US-ASCII; name=0002-Test-dump-and-restore-in-all-formats-20240628.patchDownload+54-31
0001-pg_dump-restore-regression-objects-20240628.patchtext/x-patch; charset=US-ASCII; name=0001-pg_dump-restore-regression-objects-20240628.patchDownload+216-18
#17Michael Paquier
michael@paquier.xyz
In reply to: Ashutosh Bapat (#16)
Re: Test to dump and restore objects left behind by regression

On Fri, Jun 28, 2024 at 06:00:07PM +0530, Ashutosh Bapat wrote:

Here's a description of patches and some notes
0001
-------
1. Per your suggestion the logic to handle dump output differences is
externalized in PostgreSQL::Test::AdjustDump. Instead of eliminating those
differences altogether from both the dump outputs, the corresponding DDL in
the original dump output is adjusted to look like that from the restored
database. Thus we retain full knowledge of what differences to expect.
2. I have changed the name filter_dump to filter_dump_for_upgrade so as to
differentiate between two adjustments 1. for upgrade and 2. for
dump/restore. Ideally the name should have been adjust_dump_for_ugprade() .
It's more of an adjustment than filtering as indicated by the function it
calls. But I haven't changed that. The new function to adjust dumps for
dump and restore tests is named adjust_dump_for_restore() however.
3. As suggested by Daniel upthread, the test for dump and restore happens
before upgrade which might change the old cluster thus changing the state
of objects left behind by regression. The test is not executed if
regression is not used to create the old cluster.
4. The code to compare two dumps and report differences if any is moved to
its own function compare_dumps() which is used for both upgrade and
dump/restore tests.
The test uses the custom dump format for dumping and restoring the
database.

At quick glance, that seems to be going in the right direction. Note
that you have forgotten install and uninstall rules for the new .pm
file.

0002 increases more the runtime of a test that's already one of the
longest ones in the tree is not really appealing, I am afraid.
--
Michael

#18Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Michael Paquier (#17)
Re: Test to dump and restore objects left behind by regression

On Fri, Jul 5, 2024 at 10:59 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Jun 28, 2024 at 06:00:07PM +0530, Ashutosh Bapat wrote:

Here's a description of patches and some notes
0001
-------
1. Per your suggestion the logic to handle dump output differences is
externalized in PostgreSQL::Test::AdjustDump. Instead of eliminating

those

differences altogether from both the dump outputs, the corresponding DDL

in

the original dump output is adjusted to look like that from the restored
database. Thus we retain full knowledge of what differences to expect.
2. I have changed the name filter_dump to filter_dump_for_upgrade so as

to

differentiate between two adjustments 1. for upgrade and 2. for
dump/restore. Ideally the name should have been

adjust_dump_for_ugprade() .

It's more of an adjustment than filtering as indicated by the function it
calls. But I haven't changed that. The new function to adjust dumps for
dump and restore tests is named adjust_dump_for_restore() however.
3. As suggested by Daniel upthread, the test for dump and restore happens
before upgrade which might change the old cluster thus changing the state
of objects left behind by regression. The test is not executed if
regression is not used to create the old cluster.
4. The code to compare two dumps and report differences if any is moved

to

its own function compare_dumps() which is used for both upgrade and
dump/restore tests.
The test uses the custom dump format for dumping and restoring the
database.

At quick glance, that seems to be going in the right direction. Note
that you have forgotten install and uninstall rules for the new .pm
file.

Before submitting the patch, I looked for all the places which mention
AdjustUpgrade or AdjustUpgrade.pm to find places where the new module needs
to be mentioned. But I didn't find any. AdjustUpgrade is not mentioned
in src/test/perl/Makefile or src/test/perl/meson.build. Do we want to also
add AdjustUpgrade.pm in those files?

0002 increases more the runtime of a test that's already one of the
longest ones in the tree is not really appealing, I am afraid.

We could forget 0002. I am fine with that. But I can change the code such
that formats other than "plain" are tested when PG_TEST_EXTRAS contains
"regress_dump_formats". Would that be acceptable?

--
Best Wishes,
Ashutosh Bapat

#19Michael Paquier
michael@paquier.xyz
In reply to: Ashutosh Bapat (#18)
Re: Test to dump and restore objects left behind by regression

On Mon, Jul 08, 2024 at 03:59:30PM +0530, Ashutosh Bapat wrote:

Before submitting the patch, I looked for all the places which mention
AdjustUpgrade or AdjustUpgrade.pm to find places where the new module needs
to be mentioned. But I didn't find any. AdjustUpgrade is not mentioned
in src/test/perl/Makefile or src/test/perl/meson.build. Do we want to also
add AdjustUpgrade.pm in those files?

Good question. This has not been mentioned on the thread that added
the module:
/messages/by-id/891521.1673657296@sss.pgh.pa.us

And I could see it as being useful if installed. The same applies to
Kerberos.pm, actually. I'll ping that on a new thread.

We could forget 0002. I am fine with that. But I can change the code such
that formats other than "plain" are tested when PG_TEST_EXTRAS contains
"regress_dump_formats". Would that be acceptable?

Interesting idea. That may be acceptable, under the same arguments as
the xid_wraparound one.
--
Michael

#20Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Michael Paquier (#19)
Re: Test to dump and restore objects left behind by regression

On Tue, Jul 9, 2024 at 1:07 PM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Jul 08, 2024 at 03:59:30PM +0530, Ashutosh Bapat wrote:

Before submitting the patch, I looked for all the places which mention
AdjustUpgrade or AdjustUpgrade.pm to find places where the new module needs
to be mentioned. But I didn't find any. AdjustUpgrade is not mentioned
in src/test/perl/Makefile or src/test/perl/meson.build. Do we want to also
add AdjustUpgrade.pm in those files?

Good question. This has not been mentioned on the thread that added
the module:
/messages/by-id/891521.1673657296@sss.pgh.pa.us

And I could see it as being useful if installed. The same applies to
Kerberos.pm, actually. I'll ping that on a new thread.

For now, it may be better to maintain status-quo. If we see a need to
use these modules in future by say extensions or tests outside core
tree, we will add them to meson and make files.

We could forget 0002. I am fine with that. But I can change the code such
that formats other than "plain" are tested when PG_TEST_EXTRAS contains
"regress_dump_formats". Would that be acceptable?

Interesting idea. That may be acceptable, under the same arguments as
the xid_wraparound one.

Done. Added a new entry in PG_TEST_EXTRA documentation.

I have merged the two patches now.

--
Best Wishes,
Ashutosh Bapat

Attachments:

0001-Test-pg_dump-restore-of-regression-objects-20240712.patchtext/x-patch; charset=US-ASCII; name=0001-Test-pg_dump-restore-of-regression-objects-20240712.patchDownload+277-19
#21Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#20)
#22Michael Paquier
michael@paquier.xyz
In reply to: Ashutosh Bapat (#21)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#22)
#24Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#23)
#25Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Michael Paquier (#24)
#26Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#25)
#27Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#26)
#28Daniel Gustafsson
daniel@yesql.se
In reply to: Ashutosh Bapat (#27)
#29Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Daniel Gustafsson (#28)
#30Daniel Gustafsson
daniel@yesql.se
In reply to: Ashutosh Bapat (#29)
#31Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Daniel Gustafsson (#30)
#32Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#31)
#33Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#32)
#34Michael Paquier
michael@paquier.xyz
In reply to: Ashutosh Bapat (#33)
#35Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#34)
#36Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#35)
#37Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#36)
#38Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#37)
#39Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Michael Paquier (#35)
#40Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Michael Paquier (#38)
#41Michael Paquier
michael@paquier.xyz
In reply to: Ashutosh Bapat (#39)
#42Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Michael Paquier (#41)
#43Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#40)
#44Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#43)
#45Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#44)
#46Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#45)
#47Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#46)
#48Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#47)
#49Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#48)
#50Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#49)
#51Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#50)
#52Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#51)
#53vignesh C
vignesh21@gmail.com
In reply to: Ashutosh Bapat (#52)
#54Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: vignesh C (#53)
#55Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: vignesh C (#53)
#56Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#55)
#57Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#56)
#58vignesh C
vignesh21@gmail.com
In reply to: Alvaro Herrera (#54)
#59Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#54)
#60Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#59)
#61Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#60)
#62Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#61)
#63Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#62)
#64Daniel Gustafsson
daniel@yesql.se
In reply to: Ashutosh Bapat (#63)
#65Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#63)
#66Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#65)
#67vignesh C
vignesh21@gmail.com
In reply to: Ashutosh Bapat (#66)
#68Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: vignesh C (#67)
#69Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#68)
#70Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#69)
#71Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#69)
#72Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Michael Paquier (#70)
#73Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#71)
#74Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#71)
#75Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#74)
#76Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#72)
#77Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#76)
#78Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#77)
#79Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#78)
#80Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#79)
#81Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#80)
#82Daniel Gustafsson
daniel@yesql.se
In reply to: Alvaro Herrera (#79)
#83Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Daniel Gustafsson (#82)
#84Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#83)
#85Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#84)
#86Daniel Gustafsson
daniel@yesql.se
In reply to: Alvaro Herrera (#85)
#87Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#85)
#88Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#87)
#89Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#88)
#90vignesh C
vignesh21@gmail.com
In reply to: Ashutosh Bapat (#87)
#91Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: vignesh C (#90)
#92Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#89)
#93Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#92)
#94Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#93)
#95Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#93)
#96Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#95)
#97Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#96)
#98Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#97)
#99Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#96)
#100Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#99)
#101Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#100)
#102Daniel Gustafsson
daniel@yesql.se
In reply to: Alvaro Herrera (#101)
#103Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Gustafsson (#102)
#104Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#103)
#105Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#104)
#106Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#104)