Addition of --no-sync to pg_upgrade for test speedup
Hi all,
As per $subject, avoiding the flush of the new cluster's data
directory shortens a bint the runtime of the test. In some of my slow
VMs, aka Windows, this shaves a couple of seconds even if the bulk of
the time is still spent on the main regression test suite.
In pg_upgrade, we let the flush happen with initdb --sync-only, based
on the binary path of the new cluster, so I think that we are not
going to miss any test coverage by skipping that.
Thoughts or opinions?
--
Michael
Attachments:
pg_upgrade-nosync.patchtext/x-diff; charset=us-asciiDownload+35-8
On 16.12.21 07:50, Michael Paquier wrote:
As per $subject, avoiding the flush of the new cluster's data
directory shortens a bint the runtime of the test. In some of my slow
VMs, aka Windows, this shaves a couple of seconds even if the bulk of
the time is still spent on the main regression test suite.In pg_upgrade, we let the flush happen with initdb --sync-only, based
on the binary path of the new cluster, so I think that we are not
going to miss any test coverage by skipping that.
I think that is reasonable.
Maybe we could have some global option, like some environment variable,
that enables the "sync" mode in all tests, so it's easy to test that
once in a while. Not really a requirement for your patch, but an idea
in case this is a concern.
On Fri, Dec 17, 2021 at 10:21:04AM +0100, Peter Eisentraut wrote:
On 16.12.21 07:50, Michael Paquier wrote:
As per $subject, avoiding the flush of the new cluster's data
directory shortens a bint the runtime of the test. In some of my slow
VMs, aka Windows, this shaves a couple of seconds even if the bulk of
the time is still spent on the main regression test suite.In pg_upgrade, we let the flush happen with initdb --sync-only, based
on the binary path of the new cluster, so I think that we are not
going to miss any test coverage by skipping that.I think that is reasonable.
Maybe we could have some global option, like some environment variable, that
enables the "sync" mode in all tests, so it's easy to test that once in a
while. Not really a requirement for your patch, but an idea in case this is
a concern.
Yes, I think it would be good to see all the places we might want to
pass the no-sync option.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
On Fri, Dec 17, 2021 at 09:47:05AM -0500, Bruce Momjian wrote:
On Fri, Dec 17, 2021 at 10:21:04AM +0100, Peter Eisentraut wrote:
I think that is reasonable.
Thanks. I have applied that, as that really helped here.
Maybe we could have some global option, like some environment variable, that
enables the "sync" mode in all tests, so it's easy to test that once in a
while. Not really a requirement for your patch, but an idea in case this is
a concern.Yes, I think it would be good to see all the places we might want to
pass the no-sync option.
The remaining places in src/bin/ that I can see are pg_resetwal, where
we would fsync() a WAL segment full of zeros, and pg_recvlogical
OutputFsync(), which does not point to much data, I guess. The first
one may be worth it, but that's just 16MB we are talking about and
WriteEmptyXLOG() is not a code path taken currently by the tests.
We could introduce a new environment variable if one wishes to enforce
those flushes, say PG_TEST_SYNC, on top of patching any TAP test that
has a --no-sync to filter it out.
--
Michael
On 2021-Dec-16, Michael Paquier wrote:
In pg_upgrade, we let the flush happen with initdb --sync-only, based
on the binary path of the new cluster, so I think that we are not
going to miss any test coverage by skipping that.
There was one patch of mine with breakage that only manifested in the
pg_upgrade test *because* of its lack of no-sync. I'm afraid that this
change would hide certain problems.
/messages/by-id/20210130023011.n545o54j65t4kgxn@alap3.anarazel.de
Thoughts or opinions?
I'm not 100% comfortable with this. What can we do to preserve *some*
testing that include syncing? Maybe some option that a few buildfarm
animals use?
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"The Postgresql hackers have what I call a "NASA space shot" mentality.
Quite refreshing in a world of "weekend drag racer" developers."
(Scott Marlowe)
On Mon, Dec 20, 2021 at 10:46:13AM -0300, Alvaro Herrera wrote:
On 2021-Dec-16, Michael Paquier wrote:
In pg_upgrade, we let the flush happen with initdb --sync-only, based
on the binary path of the new cluster, so I think that we are not
going to miss any test coverage by skipping that.There was one patch of mine with breakage that only manifested in the
pg_upgrade test *because* of its lack of no-sync. I'm afraid that this
change would hide certain problems.
/messages/by-id/20210130023011.n545o54j65t4kgxn@alap3.anarazel.de
Hmm. This talks about fsync=on being a factor counting in detecting a
failure with the backend. Why would the fsync done with initdb
--sync-only on the target cluster once pg_upgrade is done change
something here?
I'm not 100% comfortable with this. What can we do to preserve *some*
testing that include syncing? Maybe some option that a few buildfarm
animals use?
If you object about this part, I am fine to revert the change in
test.sh until there is a better facility to enforce syncs across tests
in the buildfarm, though. I can hack something to centralize all
that, of course, but I am not sure when I'll be able to do so in the
short term. Could I keep that in MSVC's vcregress.pl at least for the
time being?
--
Michael