Proposal: remove obsolete hot-standby testing infrastructure
The attached proposed patch removes some ancient infrastructure for
manually testing hot standby. I doubt anyone has used this in years,
because AFAICS there is nothing here that's not done better by the
src/test/recovery TAP tests. (Or if there is, we ought to migrate
it into the TAP tests.)
Thoughts?
regards, tom lane
Attachments:
0001-remove-obsolete-hot-standby-tests.patchtext/x-diff; charset=us-ascii; name=0001-remove-obsolete-hot-standby-tests.patchDownload+0-709
Hello Tom,
04.01.2022 00:50, Tom Lane wrote:
The attached proposed patch removes some ancient infrastructure for
manually testing hot standby. I doubt anyone has used this in years,
because AFAICS there is nothing here that's not done better by the
src/test/recovery TAP tests. (Or if there is, we ought to migrate
it into the TAP tests.)Thoughts?
It's hardly that important, but we (Postgres Pro) run this test
regularly to check for primary-standby compatibility. It's useful when
checking binary packages from different minor versions. For example, we
setup postgresql-14.0 and postgresql-14.1 aside (renaming one
installation' directory and changing it's port) and perform the test.
What've found with it was e.g. incompatibility due to linkage of
different libicu versions (that was PgPro-only issue). I don't remember
whether we found something related to PostgreSQL itself, but we
definitely use this test and I'm not sure how to replace it in our setup
with a TAP test. On the other hand, testing binaries is not accustomed
in the community yet, so when such testing will be adopted, probably a
brand new set of tests should emerge.
Best regards,
Alexander
Alexander Lakhin <exclusion@gmail.com> writes:
04.01.2022 00:50, Tom Lane wrote:
The attached proposed patch removes some ancient infrastructure for
manually testing hot standby. I doubt anyone has used this in years,
because AFAICS there is nothing here that's not done better by the
src/test/recovery TAP tests. (Or if there is, we ought to migrate
it into the TAP tests.)
It's hardly that important, but we (Postgres Pro) run this test
regularly to check for primary-standby compatibility. It's useful when
checking binary packages from different minor versions. For example, we
setup postgresql-14.0 and postgresql-14.1 aside (renaming one
installation' directory and changing it's port) and perform the test.
What've found with it was e.g. incompatibility due to linkage of
different libicu versions (that was PgPro-only issue). I don't remember
whether we found something related to PostgreSQL itself, but we
definitely use this test and I'm not sure how to replace it in our setup
with a TAP test. On the other hand, testing binaries is not accustomed
in the community yet, so when such testing will be adopted, probably a
brand new set of tests should emerge.
Oh, interesting. I definitely concur that testing compatibility of
different builds or minor versions is an important use-case. And
I concede that making src/test/recovery do it would be tricky and
a bit out-of-scope. But having said that, the hs_standby_* scripts
seem like a poor fit for the job too. AFAICS they don't really
test any user data type except integer (so I'm surprised that they
located an ICU incompatibility for you); and they spend a lot of
effort on stuff that I doubt is relevant because it *is* covered
by the TAP tests.
If I were trying to test that topic using available spare parts,
what I'd do is run the regular regression tests on the primary
and see if the standby could track it. Maybe pg_dump from both
servers afterwards and see if the results match, a la the pg_upgrade
test. Bonus points for a script that could run some other pg_regress
suite such as one of the contrib modules, because then you could
check compatibility of those too.
I'm happy to keep the hs_standby_* scripts if there's a live use-case
for them; but I don't see what they're doing for you that wouldn't be
done better by other pg_regress suites.
regards, tom lane
04.01.2022 18:33, Tom Lane wrote:
Alexander Lakhin <exclusion@gmail.com> writes:
It's hardly that important, but we (Postgres Pro) run this test
regularly to check for primary-standby compatibility. It's useful when
checking binary packages from different minor versions. For example, we
setup postgresql-14.0 and postgresql-14.1 aside (renaming one
installation' directory and changing it's port) and perform the test.
What've found with it was e.g. incompatibility due to linkage of
different libicu versions (that was PgPro-only issue). I don't remember
whether we found something related to PostgreSQL itself, but we
definitely use this test and I'm not sure how to replace it in our setup
with a TAP test. On the other hand, testing binaries is not accustomed
in the community yet, so when such testing will be adopted, probably a
brand new set of tests should emerge.Oh, interesting. I definitely concur that testing compatibility of
different builds or minor versions is an important use-case. And
I concede that making src/test/recovery do it would be tricky and
a bit out-of-scope. But having said that, the hs_standby_* scripts
seem like a poor fit for the job too. AFAICS they don't really
test any user data type except integer (so I'm surprised that they
located an ICU incompatibility for you); and they spend a lot of
effort on stuff that I doubt is relevant because it *is* covered
by the TAP tests.
An ICU incompatibility was detected due to our invention [1] "default
collation" that is checked upon connection (before any query processing):
--- C:/tmp/.../src/test/regress/expected/hs_standby_check.out
2021-10-14 04:07:38.000000000 +0200
+++ C:/tmp/.../src/test/regress/results/hs_standby_check.out
2021-10-14 06:06:12.004043500 +0200
@@ -1,3 +1,6 @@
+WARNING: collation "default" has version mismatch
+DETAIL: The collation in the database was created using version
153.64, but the operating system provides version 153.14.
+HINT: Check all objects affected by this collation and run ALTER
COLLATION pg_catalog."default" REFRESH VERSION
--
-- Hot Standby tests
--
I admit that we decided to use this test mainly because it exists and
described in the documentation, not because it seemed very useful. It's
usage increased test coverage without a doubt, as it requires a rather
non-trivial setup (similar setups performed by TAP tests, but not with
pre-packaged binaries).
If I were trying to test that topic using available spare parts,
what I'd do is run the regular regression tests on the primary
and see if the standby could track it. Maybe pg_dump from both
servers afterwards and see if the results match, a la the pg_upgrade
test. Bonus points for a script that could run some other pg_regress
suite such as one of the contrib modules, because then you could
check compatibility of those too.
Thanks for the idea! We certainly will implement something like that
when we start testing packages for v15. We've already learned to compare
dumps before/after minor upgrade, so we could reuse that logic for this
test too.
I'm happy to keep the hs_standby_* scripts if there's a live use-case
for them; but I don't see what they're doing for you that wouldn't be
done better by other pg_regress suites.
Yes, I will not miss the test in case you will remove it. I just wanted
to mention that we use(d) it in our testing more or less successfully.
[1]: /messages/by-id/37A534BE-CBF7-467C-B096-0AAD25091A9F@yandex-team.ru
/messages/by-id/37A534BE-CBF7-467C-B096-0AAD25091A9F@yandex-team.ru
Best regards,
Alexander
On 03.01.22 22:50, Tom Lane wrote:
The attached proposed patch removes some ancient infrastructure for
manually testing hot standby. I doubt anyone has used this in years,
because AFAICS there is nothing here that's not done better by the
src/test/recovery TAP tests. (Or if there is, we ought to migrate
it into the TAP tests.)
I looked into this some time ago and concluded that this test contains a
significant amount of testing that isn't obviously done anywhere else.
I don't have the notes anymore, and surely some things have progressed
since, but I wouldn't just throw the old test suite away without
actually checking.
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
On 03.01.22 22:50, Tom Lane wrote:
The attached proposed patch removes some ancient infrastructure for
manually testing hot standby.
I looked into this some time ago and concluded that this test contains a
significant amount of testing that isn't obviously done anywhere else.
I don't have the notes anymore, and surely some things have progressed
since, but I wouldn't just throw the old test suite away without
actually checking.
Fair enough ... so I looked, and there's not much at all that
I'm worried about.
hs_standby_allowed:
This is basically checking that the standby can see data from
the primary, which we surely have covered. Although it does
also cover propagation of nextval, which AFAICS is not tested
in src/test/recovery, so perhaps that's worth troubling over.
There are also some checks that particular commands are allowed
on the standby, which seem to me to be not too helpful;
see also comments on the next file.
hs_standby_disallowed:
Inverse of the above: check that some commands are disallowed.
We check some of these in 001_stream_rep.pl, and given the current
code structure in utility.c (ClassifyUtilityCommandAsReadOnly etc),
I do not see much point in adding more test cases of the same sort.
The only likely new bug in that area would be misclassification of
some new command, and no amount of testing of existing cases will
catch that.
There are also tests that particular functions are disallowed, which
isn't something that goes through ClassifyUtilityCommandAsReadOnly.
Nonetheless, adding more test cases here wouldn't help catch future
oversights of that type, so I remain unexcited.
hs_standby_functions:
Mostly also checking that things are disallowed. There's also
a test of pg_cancel_backend, which is cute but probably suffers
from timing instability (i.e., delayed arrival of the signal
might change the output). Moreover, pg_cancel_backend is already
covered in the isolation tests, and I see no reason to think
it'd operate differently on a standby.
hs_standby_check:
Checks pg_is_in_recovery(), which is checked far more thoroughly
by pg_ctl/t/003_promote.pl.
hs_primary_extremes:
Checks that we can cope with deep subtransaction nesting.
Maybe this is worth preserving, but I sort of doubt it ---
the standby doesn't even see the nesting does it?
Also checks that the standby can cope with 257 exclusive
locks at once, which corresponds to no interesting limit
that I know of.
So basically, I'd be inclined to add a couple of tests of
sequence-update propagation to src/test/recovery and
call it good.
regards, tom lane