Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?
Hi,
As some of you might have seen when running CI, cirrus-ci is restricting how
many CI cycles everyone can use for free (announcement at [1]). This takes
effect September 1st.
This obviously has consequences both for individual users of CI as well as
cfbot.
The first thing I think we should do is to lower the cost of CI. One thing I
had not entirely realized previously is that macos CI is by far the most
expensive CI to provide. That's not just the case with cirrus-ci, but also
with other providers. See the series of patches described later in this email.
To me, the situation for cfbot is different than the one for individual
users.
IMO, for the individual user case it's important to use CI for "free", without
a whole lot of complexity. That imo rules out approaches like providing
$cloud_provider compute accounts - too much setup work. With the
improvements detailed below, cirrus' free CI would last about 65 runs /
month.
For cfbot I hope we can find funding to pay for compute to use for CI. The, by
far, most expensive bit is macos. To a significant degree due to macos
licensing terms not allowing more than 2 VMs on a physical host :(.
The reasons we chose cirrus-ci were:
a) Ability to use full VMs, rather than a pre-selected set of VMs, which
allows us to test a larger number of platforms
b) Ability to link to log files without requiring an account. E.g. github
actions doesn't allow viewing logs unless logged in.
c) Amount of compute available.
The set of free CI providers has shrunk since we chose cirrus, as have the
"free" resources provided. I started a wiki page at [4], quite incomplete as
of now.
Potential paths forward for individual CI:
- migrate wholesale to another CI provider
- split CI tasks across different CI providers, rely on github et al
displaying the CI status for different platforms
- give up
Potential paths forward for cfbot, in addition to the above:
- Pay for compute / ask the various cloud providers to grant us compute
credits. At least some of the cloud providers can be used via cirrus-ci.
- Host (some) CI runners ourselves. Particularly with macos and windows, that
could provide significant savings.
- Build our own system, using buildbot, jenkins or whatnot.
Opinions as to what to do?
The attached series of patches:
1) Makes startup of macos instances faster, using more efficient caching of
the required packages. Also submitted separately, see [2].
2) Introduces a template initdb that's reused during the tests. Also submitted
separately, see [3].
3) Remove use of -DRANDOMIZE_ALLOCATED_MEMORY from the macos tasks. It's
expensive, and CI also uses asan on linux, so I don't think it's really
needed.
4) Switch tasks to use debugoptimized builds. Previously many tasks used -Og
to get decent backtraces etc., but the amount of CPU burned that way is too
large. One consequence is that use of ccache becomes much more crucial, as
uncached build times increase significantly.
5) Move use of -Dsegsize_blocks=6 from macos to linux
Macos is expensive and -Dsegsize_blocks=6 slows things down. Alternatively we
could stop covering -Dsegsize_blocks in both the meson and autoconf builds. It
does affect runtime on linux as well.
6) Disable write cache flushes on windows
It's a bit ugly to do this without using the UI... Shaves off about 30s
from the tests.
7) pg_regress only checked once a second whether postgres had started up, but
startup is usually much faster. Use pg_ctl's polling logic instead. It might
also be worth replacing the use of psql with direct use of libpq in
pg_regress; the overhead of repeatedly starting psql looks noticeable.
FWIW: with the patches applied, the "credit costs" in cirrus CI are roughly
as follows (depending on caching etc.):
task costs in credits
linux-sanity: 0.01
linux-compiler-warnings: 0.05
linux-meson: 0.07
freebsd : 0.08
linux-autoconf: 0.09
windows : 0.18
macos : 0.28
total task runtime is 40.8
cost in credits is 0.76, monthly credits of 50 allow approx 66.10 runs/month
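For anyone wanting to redo the arithmetic, a quick sketch using the per-task
figures from the table above (the exact totals depend on caching, so the
numbers are approximate):

```python
# Approximate cirrus-ci credit cost per full CI run, using the
# per-task figures listed above.
task_costs = {
    "linux-sanity": 0.01,
    "linux-compiler-warnings": 0.05,
    "linux-meson": 0.07,
    "freebsd": 0.08,
    "linux-autoconf": 0.09,
    "windows": 0.18,
    "macos": 0.28,
}

cost_per_run = sum(task_costs.values())  # ~0.76 credits per run
monthly_credits = 50                     # free-tier allotment
runs_per_month = monthly_credits / cost_per_run

print(f"cost per run: {cost_per_run:.2f} credits")
print(f"runs per month: {runs_per_month:.1f}")
```

With the rounded per-task figures this lands at roughly 66 runs/month,
matching the estimate above.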
Greetings,
Andres Freund
[1]: https://cirrus-ci.org/blog/2023/07/17/limiting-free-usage-of-cirrus-ci/
[2]: /messages/by-id/20230805202539.r3umyamsnctysdc7@awork3.anarazel.de
[3]: /messages/by-id/20220120021859.3zpsfqn4z7ob7afz@alap3.anarazel.de
Attachments:
v1-0001-ci-macos-used-cached-macports-install.patch (text/x-diff)
From 00c48a90f3acc6cba6af873b29429ee6d4ba38a6 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 3 Aug 2023 23:29:13 -0700
Subject: [PATCH v1 1/9] ci: macos: used cached macports install
This substantially speeds up the mac CI time.
Discussion: https://postgr.es/m/20230805202539.r3umyamsnctysdc7@awork3.anarazel.de
---
.cirrus.yml | 63 +++++++-----------
src/tools/ci/ci_macports_packages.sh | 97 ++++++++++++++++++++++++++++
2 files changed, 122 insertions(+), 38 deletions(-)
create mode 100755 src/tools/ci/ci_macports_packages.sh
diff --git a/.cirrus.yml b/.cirrus.yml
index d260f15c4e2..e9cfc542cfe 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -429,8 +429,7 @@ task:
CIRRUS_WORKING_DIR: ${HOME}/pgsql/
CCACHE_DIR: ${HOME}/ccache
- HOMEBREW_CACHE: ${HOME}/homebrew-cache
- PERL5LIB: ${HOME}/perl5/lib/perl5
+ MACPORTS_CACHE: ${HOME}/macports-cache
CC: ccache cc
CXX: ccache c++
@@ -454,55 +453,43 @@ task:
- mkdir ${HOME}/cores
- sudo sysctl kern.corefile="${HOME}/cores/core.%P"
- perl_cache:
- folder: ~/perl5
- cpan_install_script:
- - perl -mIPC::Run -e 1 || cpan -T IPC::Run
- - perl -mIO::Pty -e 1 || cpan -T IO::Pty
- upload_caches: perl
-
-
- # XXX: Could we instead install homebrew into a cached directory? The
- # homebrew installation takes a good bit of time every time, even if the
- # packages do not need to be downloaded.
- homebrew_cache:
- folder: $HOMEBREW_CACHE
+ # Use macports, even though homebrew is installed. The installation
+ # of the additional packages we need would take quite a while with
+ # homebrew, even if we cache the downloads. We can't cache all of
+ # homebrew, because it's already large. So we use macports. To cache
+ # the installation we create a .dmg file that we mount if it already
+ # exists.
+ # XXX: The reason for the direct p5.34* references is that we'd need
+ # the large macport tree around to figure out that p5-io-tty is
+ # actually p5.34-io-tty. Using the unversioned name works, but
+ # updates macports every time.
+ macports_cache:
+ folder: ${MACPORTS_CACHE}
setup_additional_packages_script: |
- brew install \
+ sh src/tools/ci/ci_macports_packages.sh \
ccache \
- icu4c \
- krb5 \
- llvm \
+ icu \
+ kerberos5 \
lz4 \
- make \
meson \
openldap \
openssl \
- python \
- tcl-tk \
+ p5.34-io-tty \
+ p5.34-ipc-run \
+ tcl \
zstd
-
- brew cleanup -s # to reduce cache size
- upload_caches: homebrew
+ # Make macports install visible for subsequent steps
+ echo PATH=/opt/local/sbin/:/opt/local/bin/:$PATH >> $CIRRUS_ENV
+ upload_caches: macports
ccache_cache:
folder: $CCACHE_DIR
configure_script: |
- brewpath="/opt/homebrew"
- PKG_CONFIG_PATH="${brewpath}/lib/pkgconfig:${PKG_CONFIG_PATH}"
-
- for pkg in icu4c krb5 openldap openssl zstd ; do
- pkgpath="${brewpath}/opt/${pkg}"
- PKG_CONFIG_PATH="${pkgpath}/lib/pkgconfig:${PKG_CONFIG_PATH}"
- PATH="${pkgpath}/bin:${pkgpath}/sbin:$PATH"
- done
-
- export PKG_CONFIG_PATH PATH
-
+ export PKG_CONFIG_PATH="/opt/local/lib/pkgconfig/"
meson setup \
--buildtype=debug \
- -Dextra_include_dirs=${brewpath}/include \
- -Dextra_lib_dirs=${brewpath}/lib \
+ -Dextra_include_dirs=/opt/local/include \
+ -Dextra_lib_dirs=/opt/local/lib \
-Dcassert=true \
-Duuid=e2fs -Ddtrace=auto \
-Dsegsize_blocks=6 \
diff --git a/src/tools/ci/ci_macports_packages.sh b/src/tools/ci/ci_macports_packages.sh
new file mode 100755
index 00000000000..5f5d3027760
--- /dev/null
+++ b/src/tools/ci/ci_macports_packages.sh
@@ -0,0 +1,97 @@
+#!/bin/sh
+
+# Installs the passed in packages via macports. To make it fast enough
+# for CI, cache the installation as a .dmg file. To avoid
+# unnecessarily updating the cache, the cached image is only modified
+# when packages are installed or removed. Any package this script is
+# not instructed to install, will be removed again.
+#
+# This currently expects to be run in a macos cirrus-ci environment.
+
+set -e
+set -x
+
+packages="$@"
+
+macports_url="https://github.com/macports/macports-base/releases/download/v2.8.1/MacPorts-2.8.1-13-Ventura.pkg"
+cache_dmg="macports.hfs.dmg"
+
+if [ "$CIRRUS_CI" != "true" ]; then
+ echo "expect to be called within cirrus-ci" 1>&2
+ exit 1
+fi
+
+sudo mkdir -p /opt/local
+mkdir -p ${MACPORTS_CACHE}/
+
+# If we are starting from clean cache, perform a fresh macports
+# install. Otherwise decompress the .dmg we created previously.
+#
+# After this we have a working macports installation, with an unknown set of
+# packages installed.
+new_install=0
+update_cached_image=0
+if [ -e ${MACPORTS_CACHE}/${cache_dmg}.zstd ]; then
+ time zstd -T0 -d ${MACPORTS_CACHE}/${cache_dmg}.zstd -o ${cache_dmg}
+ time sudo hdiutil attach -kernel ${cache_dmg} -owners on -shadow ${cache_dmg}.shadow -mountpoint /opt/local
+else
+ new_install=1
+ curl -fsSL -o macports.pkg "$macports_url"
+ time sudo installer -pkg macports.pkg -target /
+ # this is a throwaway environment, and it'd be a few lines to gin
+ # up a correct user / group when using the cache.
+ echo macportsuser root | sudo tee -a /opt/local/etc/macports/macports.conf
+fi
+export PATH=/opt/local/sbin/:/opt/local/bin/:$PATH
+
+# mark all installed packages unrequested, that allows us to detect
+# packages that aren't needed anymore
+if [ -n "$(port -q installed installed)" ] ; then
+ sudo port unsetrequested installed
+fi
+
+# if setting all the required packages as requested fails, we need
+# to install at least one of them
+if ! sudo port setrequested $packages > /dev/null 2>&1 ; then
+ echo not all required packages installed, doing so now
+ update_cached_image=1
+ # to keep the image small, we deleted the ports tree from the image...
+ sudo port selfupdate
+ # XXX likely we'll need some other way to force an upgrade at some
+ # point...
+ sudo port upgrade outdated
+ sudo port install -N $packages
+ sudo port setrequested $packages
+fi
+
+# check if any ports should be uninstalled
+if [ -n "$(port -q installed rleaves)" ] ; then
+ echo superfluous packages installed
+ update_cached_image=1
+ sudo port uninstall --follow-dependencies rleaves
+
+ # remove prior cache contents, don't want to increase size
+ rm -f ${MACPORTS_CACHE}/*
+fi
+
+# Shrink installation if we created / modified it
+if [ "$new_install" -eq 1 -o "$update_cached_image" -eq 1 ]; then
+ sudo /opt/local/bin/port clean --all installed
+ sudo rm -rf /opt/local/var/macports/{software,sources}/*
+fi
+
+# If we're starting from a clean cache, start a new image. If we have
+# an image, but the contents changed, update the image in the cache
+# location.
+if [ "$new_install" -eq 1 ]; then
+ # use a generous size, so additional software can be installed later
+ time sudo hdiutil create -fs HFS+ -format UDRO -size 10g -layout NONE -srcfolder /opt/local/ ${cache_dmg}
+ time zstd -T0 -10 -z ${cache_dmg} -o ${MACPORTS_CACHE}/${cache_dmg}.zstd
+elif [ "$update_cached_image" -eq 1 ]; then
+ sudo hdiutil detach /opt/local/
+ time hdiutil convert -format UDRO ${cache_dmg} -shadow ${cache_dmg}.shadow -o updated.hfs.dmg
+ rm ${cache_dmg}.shadow
+ mv updated.hfs.dmg ${cache_dmg}
+ time zstd -T0 -10 -z ${cache_dmg} -o ${MACPORTS_CACHE}/${cache_dmg}.zstd
+ time sudo hdiutil attach -kernel ${cache_dmg} -owners on -shadow ${cache_dmg}.shadow -mountpoint /opt/local
+fi
--
2.38.0
v1-0002-Use-template-initdb-in-tests.patch (text/x-diff)
From b050364589df42faf58bc21820fa1593cb0bff5b Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 2 Feb 2023 21:51:53 -0800
Subject: [PATCH v1 2/9] Use "template" initdb in tests
Discussion: https://postgr.es/m/20220120021859.3zpsfqn4z7ob7afz@alap3.anarazel.de
---
meson.build | 30 ++++++++++
.cirrus.yml | 3 +-
src/test/perl/PostgreSQL/Test/Cluster.pm | 46 ++++++++++++++-
src/test/regress/pg_regress.c | 74 ++++++++++++++++++------
src/Makefile.global.in | 52 +++++++++--------
5 files changed, 161 insertions(+), 44 deletions(-)
diff --git a/meson.build b/meson.build
index 0a11efc97a1..0e8a72b12b9 100644
--- a/meson.build
+++ b/meson.build
@@ -3048,8 +3048,10 @@ testport = 40000
test_env = environment()
temp_install_bindir = test_install_location / get_option('bindir')
+test_initdb_template = meson.build_root() / 'tmp_install' / 'initdb-template'
test_env.set('PG_REGRESS', pg_regress.full_path())
test_env.set('REGRESS_SHLIB', regress_module.full_path())
+test_env.set('INITDB_TEMPLATE', test_initdb_template)
# Test suites that are not safe by default but can be run if selected
# by the user via the whitespace-separated list in variable PG_TEST_EXTRA.
@@ -3064,6 +3066,34 @@ if library_path_var != ''
endif
+# Create (and remove old) initdb template directory. Tests use that, where
+# possible, to make it cheaper to run tests.
+#
+# Use python to remove the old cached initdb, as we cannot rely on a working
+# 'rm' binary on windows.
+test('initdb_cache',
+ python,
+ args: [
+ '-c', '''
+import shutil
+import sys
+import subprocess
+
+shutil.rmtree(sys.argv[1], ignore_errors=True)
+sp = subprocess.run(sys.argv[2:] + [sys.argv[1]])
+sys.exit(sp.returncode)
+''',
+ test_initdb_template,
+ temp_install_bindir / 'initdb',
+ '-A', 'trust', '-N', '--no-instructions'
+ ],
+ priority: setup_tests_priority - 1,
+ timeout: 300,
+ is_parallel: false,
+ env: test_env,
+ suite: ['setup'])
+
+
###############################################################
# Test Generation
diff --git a/.cirrus.yml b/.cirrus.yml
index e9cfc542cfe..f08da65ed76 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -115,8 +115,9 @@ task:
test_minimal_script: |
su postgres <<-EOF
ulimit -c unlimited
+ meson test $MTEST_ARGS --suite setup
meson test $MTEST_ARGS --num-processes ${TEST_JOBS} \
- tmp_install cube/regress pg_ctl/001_start_stop
+ cube/regress pg_ctl/001_start_stop
EOF
on_failure:
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 5e161dbee60..4d449c35de9 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -522,8 +522,50 @@ sub init
mkdir $self->backup_dir;
mkdir $self->archive_dir;
- PostgreSQL::Test::Utils::system_or_bail('initdb', '-D', $pgdata, '-A',
- 'trust', '-N', @{ $params{extra} });
+ # If available and if there aren't any parameters, use a previously
+ # initdb'd cluster as a template by copying it. For a lot of tests, that's
+ # substantially cheaper. Do so only if there aren't parameters, it doesn't
+ # seem worth figuring out whether they affect compatibility.
+ #
+ # There's very similar code in pg_regress.c, but we can't easily
+ # deduplicate it until we require perl at build time.
+ if (defined $params{extra} or !defined $ENV{INITDB_TEMPLATE})
+ {
+ note("initializing database system by running initdb");
+ PostgreSQL::Test::Utils::system_or_bail('initdb', '-D', $pgdata, '-A',
+ 'trust', '-N', @{ $params{extra} });
+ }
+ else
+ {
+ my @copycmd;
+ my $expected_exitcode;
+
+ note("initializing database system by copying initdb template");
+
+ if ($PostgreSQL::Test::Utils::windows_os)
+ {
+ @copycmd = qw(robocopy /E /NJS /NJH /NFL /NDL /NP);
+ $expected_exitcode = 1; # 1 denotes files were copied
+ }
+ else
+ {
+ @copycmd = qw(cp -a);
+ $expected_exitcode = 0;
+ }
+
+ @copycmd = (@copycmd, $ENV{INITDB_TEMPLATE}, $pgdata);
+
+ my $ret = PostgreSQL::Test::Utils::system_log(@copycmd);
+
+ # See http://perldoc.perl.org/perlvar.html#%24CHILD_ERROR
+ if ($ret & 127 or $ret >> 8 != $expected_exitcode)
+ {
+ BAIL_OUT(
+ sprintf("failed to execute command \"%s\": $ret",
+ join(" ", @copycmd)));
+ }
+ }
+
PostgreSQL::Test::Utils::system_or_bail($ENV{PG_REGRESS},
'--config-auth', $pgdata, @{ $params{auth_extra} });
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index b68632320a7..407e3915cec 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2295,6 +2295,7 @@ regression_main(int argc, char *argv[],
FILE *pg_conf;
const char *env_wait;
int wait_seconds;
+ const char *initdb_template_dir;
/*
* Prepare the temp instance
@@ -2316,25 +2317,64 @@ regression_main(int argc, char *argv[],
if (!directory_exists(buf))
make_directory(buf);
- /* initdb */
initStringInfo(&cmd);
- appendStringInfo(&cmd,
- "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync",
- bindir ? bindir : "",
- bindir ? "/" : "",
- temp_instance);
- if (debug)
- appendStringInfo(&cmd, " --debug");
- if (nolocale)
- appendStringInfo(&cmd, " --no-locale");
- appendStringInfo(&cmd, " > \"%s/log/initdb.log\" 2>&1", outputdir);
- fflush(NULL);
- if (system(cmd.data))
+
+ /*
+ * Create data directory.
+ *
+ * If available, use a previously initdb'd cluster as a template by
+ * copying it. For a lot of tests, that's substantially cheaper.
+ *
+ * There's very similar code in Cluster.pm, but we can't easily de
+ * duplicate it until we require perl at build time.
+ */
+ initdb_template_dir = getenv("INITDB_TEMPLATE");
+ if (initdb_template_dir == NULL || nolocale || debug)
{
- bail("initdb failed\n"
- "# Examine \"%s/log/initdb.log\" for the reason.\n"
- "# Command was: %s",
- outputdir, cmd.data);
+ note("initializing database system by running initdb");
+
+ appendStringInfo(&cmd,
+ "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync",
+ bindir ? bindir : "",
+ bindir ? "/" : "",
+ temp_instance);
+ if (debug)
+ appendStringInfo(&cmd, " --debug");
+ if (nolocale)
+ appendStringInfo(&cmd, " --no-locale");
+ appendStringInfo(&cmd, " > \"%s/log/initdb.log\" 2>&1", outputdir);
+ fflush(NULL);
+ if (system(cmd.data))
+ {
+ bail("initdb failed\n"
+ "# Examine \"%s/log/initdb.log\" for the reason.\n"
+ "# Command was: %s",
+ outputdir, cmd.data);
+ }
+ }
+ else
+ {
+#ifndef WIN32
+ const char *copycmd = "cp -a \"%s\" \"%s/data\"";
+ int expected_exitcode = 0;
+#else
+ const char *copycmd = "robocopy /E /NJS /NJH /NFL /NDL /NP \"%s\" \"%s/data\"";
+ int expected_exitcode = 1; /* 1 denotes files were copied */
+#endif
+
+ note("initializing database system by copying initdb template");
+
+ appendStringInfo(&cmd,
+ copycmd,
+ initdb_template_dir,
+ temp_instance);
+ if (system(cmd.data) != expected_exitcode)
+ {
+ bail("copying of initdb template failed\n"
+ "# Examine \"%s/log/initdb.log\" for the reason.\n"
+ "# Command was: %s",
+ outputdir, cmd.data);
+ }
}
pfree(cmd.data);
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index df9f721a41a..0b4ca0eb6ae 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -397,30 +397,6 @@ check: temp-install
.PHONY: temp-install
-temp-install: | submake-generated-headers
-ifndef NO_TEMP_INSTALL
-ifneq ($(abs_top_builddir),)
-ifeq ($(MAKELEVEL),0)
- rm -rf '$(abs_top_builddir)'/tmp_install
- $(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
- $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
- $(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
-endif
-endif
-endif
-
-# Tasks to run serially at the end of temp-install. Some EXTRA_INSTALL
-# entries appear more than once in the tree, and parallel installs of the same
-# file can fail with EEXIST.
-checkprep:
- $(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install || exit; done)
-
-PROVE = @PROVE@
-# There are common routines in src/test/perl, and some test suites have
-# extra perl modules in their own directory.
-PG_PROVE_FLAGS = -I $(top_srcdir)/src/test/perl/ -I $(srcdir)
-# User-supplied prove flags such as --verbose can be provided in PROVE_FLAGS.
-PROVE_FLAGS =
# prepend to path if already set, else just set it
define add_to_path
@@ -437,8 +413,36 @@ ld_library_path_var = LD_LIBRARY_PATH
with_temp_install = \
PATH="$(abs_top_builddir)/tmp_install$(bindir):$(CURDIR):$$PATH" \
$(call add_to_path,$(strip $(ld_library_path_var)),$(abs_top_builddir)/tmp_install$(libdir)) \
+ INITDB_TEMPLATE='$(abs_top_builddir)'/tmp_install/initdb-template \
$(with_temp_install_extra)
+temp-install: | submake-generated-headers
+ifndef NO_TEMP_INSTALL
+ifneq ($(abs_top_builddir),)
+ifeq ($(MAKELEVEL),0)
+ rm -rf '$(abs_top_builddir)'/tmp_install
+ $(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
+ $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
+ $(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
+
+ $(with_temp_install) initdb -A trust -N --no-instructions '$(abs_top_builddir)'/tmp_install/initdb-template >>'$(abs_top_builddir)'/tmp_install/log/initdb-template.log 2>&1
+endif
+endif
+endif
+
+# Tasks to run serially at the end of temp-install. Some EXTRA_INSTALL
+# entries appear more than once in the tree, and parallel installs of the same
+# file can fail with EEXIST.
+checkprep:
+ $(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install || exit; done)
+
+PROVE = @PROVE@
+# There are common routines in src/test/perl, and some test suites have
+# extra perl modules in their own directory.
+PG_PROVE_FLAGS = -I $(top_srcdir)/src/test/perl/ -I $(srcdir)
+# User-supplied prove flags such as --verbose can be provided in PROVE_FLAGS.
+PROVE_FLAGS =
+
ifeq ($(enable_tap_tests),yes)
ifndef PGXS
--
2.38.0
v1-0003-ci-macos-Remove-use-of-DRANDOMIZE_ALLOCATED_MEMOR.patch (text/x-diff)
From de80483807aeef8eceef175cf2834bae3da3a0e1 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 17:27:11 -0700
Subject: [PATCH v1 3/9] ci: macos: Remove use of -DRANDOMIZE_ALLOCATED_MEMORY
RANDOMIZE_ALLOCATED_MEMORY causes a measurable slowdown. Macos is, by far, the
most expensive platform to perform CI on, therefore it doesn't make sense to
run such a test there.
Ubsan and asan on linux should detect most of the cases of uninitialized
memory, so it doesn't really seem worth using -DRANDOMIZE_ALLOCATED_MEMORY in
another instance type.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
.cirrus.yml | 1 -
1 file changed, 1 deletion(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index f08da65ed76..f3d63ff3fb0 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -434,7 +434,6 @@ task:
CC: ccache cc
CXX: ccache c++
- CPPFLAGS: -DRANDOMIZE_ALLOCATED_MEMORY
CFLAGS: -Og -ggdb
CXXFLAGS: -Og -ggdb
--
2.38.0
v1-0004-ci-switch-tasks-to-debugoptimized-build.patch (text/x-diff)
From a86a45567151f1b2534cf599f42414edab60e2e0 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 17:27:51 -0700
Subject: [PATCH v1 4/9] ci: switch tasks to debugoptimized build
In aggregate the CI tasks burn a lot of CPU hours. Compared to that, easy-to-read
backtraces aren't as important. Still use -ggdb where appropriate, as
that does make backtraces more reliable, particularly in the face of
optimization.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
.cirrus.yml | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index f3d63ff3fb0..bfe251f48e8 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -140,7 +140,7 @@ task:
CCACHE_DIR: /tmp/ccache_dir
CPPFLAGS: -DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
- CFLAGS: -Og -ggdb
+ CFLAGS: -ggdb
depends_on: SanityCheck
only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*freebsd.*'
@@ -181,7 +181,7 @@ task:
configure_script: |
su postgres <<-EOF
meson setup \
- --buildtype=debug \
+ --buildtype=debugoptimized \
-Dcassert=true -Duuid=bsd -Dtcl_version=tcl86 -Ddtrace=auto \
-DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
-Dextra_lib_dirs=/usr/local/lib -Dextra_include_dirs=/usr/local/include/ \
@@ -275,7 +275,7 @@ task:
ASAN_OPTIONS: print_stacktrace=1:disable_coredump=0:abort_on_error=1:detect_leaks=0
# SANITIZER_FLAGS is set in the tasks below
- CFLAGS: -Og -ggdb -fno-sanitize-recover=all $SANITIZER_FLAGS
+ CFLAGS: -ggdb -fno-sanitize-recover=all $SANITIZER_FLAGS
CXXFLAGS: $CFLAGS
LDFLAGS: $SANITIZER_FLAGS
CC: ccache gcc
@@ -364,7 +364,7 @@ task:
configure_script: |
su postgres <<-EOF
meson setup \
- --buildtype=debug \
+ --buildtype=debugoptimized \
-Dcassert=true \
${LINUX_MESON_FEATURES} \
-DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
@@ -377,7 +377,7 @@ task:
su postgres <<-EOF
export CC='ccache gcc -m32'
meson setup \
- --buildtype=debug \
+ --buildtype=debugoptimized \
-Dcassert=true \
${LINUX_MESON_FEATURES} \
-Dllvm=disabled \
@@ -434,8 +434,8 @@ task:
CC: ccache cc
CXX: ccache c++
- CFLAGS: -Og -ggdb
- CXXFLAGS: -Og -ggdb
+ CFLAGS: -ggdb
+ CXXFLAGS: -ggdb
depends_on: SanityCheck
only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(macos|darwin|osx).*'
@@ -487,7 +487,7 @@ task:
configure_script: |
export PKG_CONFIG_PATH="/opt/local/lib/pkgconfig/"
meson setup \
- --buildtype=debug \
+ --buildtype=debugoptimized \
-Dextra_include_dirs=/opt/local/include \
-Dextra_lib_dirs=/opt/local/lib \
-Dcassert=true \
--
2.38.0
v1-0005-ci-Move-use-of-Dsegsize_blocks-6-from-macos-to-li.patch (text/x-diff)
From f888e984557e0ba79bf04263a962825dbc520c2d Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 5 Aug 2023 13:36:28 -0700
Subject: [PATCH v1 5/9] ci: Move use of -Dsegsize_blocks=6 from macos to linux
The option causes a measurable slowdown. Macos is, by far, the most expensive
platform to perform CI on, therefore move the option to another task. In
addition, the filesystem overhead seems to impact macos worse than other
platforms.
---
.cirrus.yml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index bfe251f48e8..35ef9c97211 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -366,6 +366,7 @@ task:
meson setup \
--buildtype=debugoptimized \
-Dcassert=true \
+ -Dsegsize_blocks=6 \
${LINUX_MESON_FEATURES} \
-DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
build
@@ -492,7 +493,6 @@ task:
-Dextra_lib_dirs=/opt/local/lib \
-Dcassert=true \
-Duuid=e2fs -Ddtrace=auto \
- -Dsegsize_blocks=6 \
-DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
build
--
2.38.0
v1-0006-ci-windows-Disabling-write-cache-flushing-during-.patch (text/x-diff)
From 7169a36f31568782d55cb2e57c5a3d6c87f23607 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 16:56:29 -0700
Subject: [PATCH v1 6/9] ci: windows: Disabling write cache flushing during
test
This has been measured to reduce windows test times by about 30s.
---
.cirrus.yml | 2 ++
src/tools/ci/windows_write_cache.ps1 | 3 +++
2 files changed, 5 insertions(+)
create mode 100644 src/tools/ci/windows_write_cache.ps1
diff --git a/.cirrus.yml b/.cirrus.yml
index 35ef9c97211..ef825485826 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -559,6 +559,8 @@ task:
setup_additional_packages_script: |
REM choco install -y --no-progress ...
+ change_write_caching_script: powershell src/tools/ci/windows_write_cache.ps1
+
setup_hosts_file_script: |
echo 127.0.0.1 pg-loadbalancetest >> c:\Windows\System32\Drivers\etc\hosts
echo 127.0.0.2 pg-loadbalancetest >> c:\Windows\System32\Drivers\etc\hosts
diff --git a/src/tools/ci/windows_write_cache.ps1 b/src/tools/ci/windows_write_cache.ps1
new file mode 100644
index 00000000000..9c52bc886d4
--- /dev/null
+++ b/src/tools/ci/windows_write_cache.ps1
@@ -0,0 +1,3 @@
+Get-ItemProperty -path "HKLM:/SYSTEM/CurrentControlSet/Enum/SCSI/*/*/Device Parameters/Disk" -name CacheIsPowerProtected
+Set-ItemProperty -path "HKLM:/SYSTEM/CurrentControlSet/Enum/SCSI/*/*/Device Parameters/Disk" -name CacheIsPowerProtected -Value 0
+Get-ItemProperty -path "HKLM:/SYSTEM/CurrentControlSet/Enum/SCSI/*/*/Device Parameters/Disk" -name CacheIsPowerProtected
--
2.38.0
v1-0007-regress-Check-for-postgres-startup-completion-mor.patch (text/x-diff)
From 99089f7cfbbfb2d10074009716a1420163bc6697 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 16:51:16 -0700
Subject: [PATCH v1 7/9] regress: Check for postgres startup completion more
often
Previously pg_regress.c checked only once a second whether the server had
started up - in most cases startup is much faster though. Use the same
interval as pg_ctl does.
---
src/test/regress/pg_regress.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 407e3915cec..46af1fddbdb 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -75,6 +75,9 @@ const char *pretty_diff_opts = "-w -U3";
*/
#define TESTNAME_WIDTH 36
+/* how often to recheck if postgres startup completed */
+#define WAITS_PER_SEC 10
+
typedef enum TAPtype
{
DIAG = 0,
@@ -2499,7 +2502,7 @@ regression_main(int argc, char *argv[],
else
wait_seconds = 60;
- for (i = 0; i < wait_seconds; i++)
+ for (i = 0; i < wait_seconds * WAITS_PER_SEC; i++)
{
/* Done if psql succeeds */
fflush(NULL);
@@ -2519,7 +2522,7 @@ regression_main(int argc, char *argv[],
outputdir);
}
- pg_usleep(1000000L);
+ pg_usleep(1000000L / WAITS_PER_SEC);
}
if (i >= wait_seconds)
{
--
2.38.0
v1-0008-ci-Don-t-specify-amount-of-memory.patch (text/x-diff)
From 9edf90d5bfbb1933c68525481636f2e33ea3b1b4 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 17:31:15 -0700
Subject: [PATCH v1 8/9] ci: Don't specify amount of memory
The number of CPUs is the cost-determining factor. Most instance types that
run tests have more memory per core than what we specified; there's no real
benefit in wasting that.
---
.cirrus.yml | 4 ----
1 file changed, 4 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index ef825485826..a4d64955489 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -150,7 +150,6 @@ task:
image: family/pg-ci-freebsd-13
platform: freebsd
cpu: $CPUS
- memory: 4G
disk: 50
sysinfo_script: |
@@ -292,7 +291,6 @@ task:
image: family/pg-ci-bullseye
platform: linux
cpu: $CPUS
- memory: 4G
ccache_cache:
folder: ${CCACHE_DIR}
@@ -554,7 +552,6 @@ task:
image: family/pg-ci-windows-ci-vs-2019
platform: windows
cpu: $CPUS
- memory: 4G
setup_additional_packages_script: |
REM choco install -y --no-progress ...
@@ -604,7 +601,6 @@ task:
image: family/pg-ci-windows-ci-mingw64
platform: windows
cpu: $CPUS
- memory: 4G
env:
TEST_JOBS: 4 # higher concurrency causes occasional failures
--
2.38.0
Hi,
On 2023-08-07 19:15:41 -0700, Andres Freund wrote:
The set of free CI providers has shrunk since we chose cirrus, as have the
"free" resources provided. I started a wiki page at [4], quite incomplete as
of now.
Oops, as Thomas just noticed, I left off that link:
[4]: https://wiki.postgresql.org/wiki/CI_Providers
Greetings,
Andres Freund
On 08/08/2023 05:15, Andres Freund wrote:
IMO, for the individual user case it's important to use CI for "free", without
a whole lot of complexity. That imo rules out approaches like providing
$cloud_provider compute accounts - too much setup work.
+1
With the improvements detailed below, cirrus' free CI would last
about ~65 runs / month.
I think that's plenty.
For cfbot I hope we can find funding to pay for compute to use for CI.
+1
Potential paths forward for cfbot, in addition to the above:
- Pay for compute / ask the various cloud providers to grant us compute
credits. At least some of the cloud providers can be used via cirrus-ci.
- Host (some) CI runners ourselves. Particularly with macos and windows, that
could provide significant savings.
- Build our own system, using buildbot, jenkins or whatnot.
Opinions as to what to do?
The resources for running our own system aren't free either. I'm sure we
can get sponsors for the cirrus-ci credits, or use donations.
I have been quite happy with Cirrus CI overall.
The attached series of patches:
All of this makes sense to me, although I don't use macos myself.
5) Move use of -Dsegsize_blocks=6 from macos to linux
Macos is expensive, -Dsegsize_blocks=6 slows things down. Alternatively we
could stop covering both meson and autoconf segsize_blocks. It does affect
runtime on linux as well.
Could we have a comment somewhere on why we use -Dsegsize_blocks on
these particular CI runs? It seems pretty random. I guess the idea is to
have one autoconf task and one meson task with that option, to check
that the autoconf/meson option works?
6) Disable write cache flushes on windows
It's a bit ugly to do this without using the UI... Shaves off about 30s
from the tests.
A brief comment would be nice: "We don't care about persistence over
hard crashes in the CI, so disable write cache flushes to speed it up."
--
Heikki Linnakangas
Neon (https://neon.tech)
On 08.08.23 04:15, Andres Freund wrote:
Potential paths forward for individual CI:
- migrate wholesale to another CI provider
- split CI tasks across different CI providers, rely on github et al
displaying the CI status for different platforms
- give up
With the proposed optimizations, it seems you can still do a fair amount
of work under the free plan.
Potential paths forward for cfbot, in addition to the above:
- Pay for compute / ask the various cloud providers to grant us compute
credits. At least some of the cloud providers can be used via cirrus-ci.
- Host (some) CI runners ourselves. Particularly with macos and windows, that
could provide significant savings.
- Build our own system, using buildbot, jenkins or whatnot.
I think we should use the "compute credits" plan from Cirrus CI. It
should be possible to estimate the costs for that. Money is available,
I think.
Hi,
On 2023-08-08 16:28:49 +0200, Peter Eisentraut wrote:
On 08.08.23 04:15, Andres Freund wrote:
Potential paths forward for cfbot, in addition to the above:
- Pay for compute / ask the various cloud providers to grant us compute
credits. At least some of the cloud providers can be used via cirrus-ci.
- Host (some) CI runners ourselves. Particularly with macos and windows, that
could provide significant savings.
- Build our own system, using buildbot, jenkins or whatnot.
I think we should use the "compute credits" plan from Cirrus CI. It should
be possible to estimate the costs for that. Money is available, I think.
Unfortunately just doing that seems like it would end up considerably on the
too-expensive side. Here are the stats for last month's cfbot runtimes (provided
by Thomas):
task_name | sum
------------------------------------------------+------------
FreeBSD - 13 - Meson | 1017:56:09
Windows - Server 2019, MinGW64 - Meson | 00:00:00
SanityCheck | 76:48:41
macOS - Ventura - Meson | 873:12:43
Windows - Server 2019, VS 2019 - Meson & ninja | 1251:08:06
Linux - Debian Bullseye - Autoconf | 830:17:26
Linux - Debian Bullseye - Meson | 860:37:21
CompilerWarnings | 935:30:35
(8 rows)
If I did the math right, that's about 7000 credits (and 1 credit costs 1 USD).
task costs in credits
linux-sanity: 55.30
linux-autoconf: 598.04
linux-meson: 619.40
linux-compiler-warnings: 674.28
freebsd : 732.24
windows : 1201.09
macos : 3143.52
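(Editorial sketch, not part of the original mail: summing the per-task figures above confirms the "about 7000 credits" estimate and shows how dominant macos is.)

```python
# Sketch: sum the per-task credit costs quoted above (1 credit = 1 USD).
credits = {
    "linux-sanity": 55.30,
    "linux-autoconf": 598.04,
    "linux-meson": 619.40,
    "linux-compiler-warnings": 674.28,
    "freebsd": 732.24,
    "windows": 1201.09,
    "macos": 3143.52,
}
total = sum(credits.values())
macos_share = credits["macos"] / total
print(f"total: {total:.2f} credits")        # ~7024, i.e. "about 7000"
print(f"macos share: {macos_share:.0%}")    # ~45% of the whole bill
```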
Now, those times are before optimizing test runtime. And besides optimizing
the tasks, we can also optimize not running tests for docs patches etc. And
optimize cfbot to schedule a bit better.
But still, the costs look not realistic to me.
If instead we were to use our own GCP account, it's a lot less. t2d-standard-4
instances, which are faster than what we use right now, cost $0.168984 / hour
as "normal" instances and $0.026764 as "spot" instances right now
(https://cloud.google.com/compute/all-pricing). Windows
VMs are considerably more expensive due to licensing - 0.184$/h in addition.
Assuming spot instances, linux+freebsd tasks would cost ~100 USD / month (maybe
10-20% more in reality, due to a) spot instances getting terminated requiring
retries and b) disks).
Windows would be ~255 USD / month (same retries caveats).
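(Editorial back-of-the-envelope check, using the quoted spot prices and the runtimes from the table above, rounded to tenths of an hour; it lands in the same ballpark as the figures in the mail.)

```python
# Rough sketch of the GCP spot-instance estimate.
SPOT_RATE = 0.026764       # USD/hour, t2d-standard-4 spot (as quoted)
WINDOWS_LICENSE = 0.184    # USD/hour Windows licensing surcharge (as quoted)

# freebsd, sanity, linux-autoconf, linux-meson, compiler-warnings (hours/month)
linux_freebsd_hours = 1017.9 + 76.8 + 830.3 + 860.6 + 935.5
windows_hours = 1251.1

linux_freebsd_cost = linux_freebsd_hours * SPOT_RATE
windows_cost = windows_hours * (SPOT_RATE + WINDOWS_LICENSE)
print(f"linux+freebsd: ~{linux_freebsd_cost:.0f} USD/month")  # ~100
print(f"windows:       ~{windows_cost:.0f} USD/month")        # ~264
```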
Given the cost of macos, it seems like it'd be by far the most affordable
to just buy 1-2 mac minis (2x ~660USD) and stick them in a shelf somewhere, as
persistent runners. Cirrus has builtin macos virtualization support - but can
only host two VMs on each mac, due to macos licensing restrictions. A single
mac mini would suffice to keep up with our unoptimized monthly runtime
(although there likely would be some overhead).
Greetings,
Andres Freund
On Mon Aug 7, 2023 at 9:15 PM CDT, Andres Freund wrote:
FWIW: with the patches applied, the "credit costs" in cirrus CI are roughly
like the following (depends on caching etc):
task costs in credits
linux-sanity: 0.01
linux-compiler-warnings: 0.05
linux-meson: 0.07
freebsd : 0.08
linux-autoconf: 0.09
windows : 0.18
macos : 0.28
total task runtime is 40.8
cost in credits is 0.76, monthly credits of 50 allow approx 66.10 runs/month
I am not in the loop on the autotools vs meson stuff. How much longer do
we anticipate keeping autotools around? Seems like it could be a good
opportunity to reduce some CI usage if autotools were finally dropped,
but I know there are still outstanding tasks to complete.
Back of the napkin math says autotools is about 12% of the credit cost,
though I haven't looked to see if linux-meson and linux-autotools are
1:1.
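(Editorial sketch: the napkin math above roughly checks out against the rounded per-run credit figures quoted earlier in the thread.)

```python
# Per-run credit costs with the patches applied (rounded figures from above).
per_task = {
    "linux-sanity": 0.01, "linux-compiler-warnings": 0.05, "linux-meson": 0.07,
    "freebsd": 0.08, "linux-autoconf": 0.09, "windows": 0.18, "macos": 0.28,
}
per_run = sum(per_task.values())           # ~0.76 credits per full CI run
runs = 50 / per_run                        # free plan: 50 credits per month
autoconf_share = per_task["linux-autoconf"] / per_run
print(f"{per_run:.2f} credits/run, ~{runs:.0f} runs/month, "
      f"autoconf ~{autoconf_share:.0%} of the per-run cost")
```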
--
Tristan Partin
Neon (https://neon.tech)
Hi,
On 2023-08-08 10:25:52 -0500, Tristan Partin wrote:
On Mon Aug 7, 2023 at 9:15 PM CDT, Andres Freund wrote:
FWIW: with the patches applied, the "credit costs" in cirrus CI are roughly
like the following (depends on caching etc):task costs in credits
linux-sanity: 0.01
linux-compiler-warnings: 0.05
linux-meson: 0.07
freebsd : 0.08
linux-autoconf: 0.09
windows : 0.18
macos : 0.28
total task runtime is 40.8
cost in credits is 0.76, monthly credits of 50 allow approx 66.10 runs/month
I am not in the loop on the autotools vs meson stuff. How much longer do
anticipate keeping autotools around?
I think it depends in what fashion. We've been talking about supporting
building out-of-tree modules with "pgxs" for at least a 5 year support
window. But the replacement isn't yet finished [1], so that clock hasn't yet
started ticking.
Seems like it could be a good opportunity to reduce some CI usage if
autotools were finally dropped, but I know there are still outstanding tasks
to complete.
Back of the napkin math says autotools is about 12% of the credit cost,
though I haven't looked to see if linux-meson and linux-autotools are 1:1.
The autoconf task is actually doing quite useful stuff right now, leaving the
use of configure aside, as it builds with address sanitizer. Without that it'd
be a lot faster. But we'd lose, imo quite important, coverage. The tests
would run a bit faster with meson, but it'd be overall a difference on the
margins.
Greetings,
Andres Freund
[1]: https://github.com/anarazel/postgres/tree/meson-pkgconfig
On Tue Aug 8, 2023 at 10:38 AM CDT, Andres Freund wrote:
Hi,
On 2023-08-08 10:25:52 -0500, Tristan Partin wrote:
On Mon Aug 7, 2023 at 9:15 PM CDT, Andres Freund wrote:
FWIW: with the patches applied, the "credit costs" in cirrus CI are roughly
like the following (depends on caching etc):
task costs in credits
linux-sanity: 0.01
linux-compiler-warnings: 0.05
linux-meson: 0.07
freebsd : 0.08
linux-autoconf: 0.09
windows : 0.18
macos : 0.28
total task runtime is 40.8
cost in credits is 0.76, monthly credits of 50 allow approx 66.10 runs/month
I am not in the loop on the autotools vs meson stuff. How much longer do we
anticipate keeping autotools around?
I think it depends in what fashion. We've been talking about supporting
building out-of-tree modules with "pgxs" for at least a 5 year support
window. But the replacement isn't yet finished [1], so that clock hasn't yet
started ticking.
Seems like it could be a good opportunity to reduce some CI usage if
autotools were finally dropped, but I know there are still outstanding tasks
to complete.
Back of the napkin math says autotools is about 12% of the credit cost,
though I haven't looked to see if linux-meson and linux-autotools are 1:1.
The autoconf task is actually doing quite useful stuff right now, leaving the
use of configure aside, as it builds with address sanitizer. Without that it'd
be a lot faster. But we'd lose, imo quite important, coverage. The tests
would run a bit faster with meson, but it'd be overall a difference on the
margins.
[1] https://github.com/anarazel/postgres/tree/meson-pkgconfig
Makes sense. Please let me know if I can help you out in anyway for the
v17 development cycle besides what we have already talked about.
--
Tristan Partin
Neon (https://neon.tech)
On 2023-Aug-08, Andres Freund wrote:
Given the cost of macos, it seems like it'd be by far the most affordable
to just buy 1-2 mac minis (2x ~660USD) and stick them in a shelf somewhere, as
persistent runners. Cirrus has builtin macos virtualization support - but can
only host two VMs on each mac, due to macos licensing restrictions. A single
mac mini would suffice to keep up with our unoptimized monthly runtime
(although there likely would be some overhead).
If using persistent workers is an option, maybe we should explore that.
I think we could move all or some of the Linux - Debian builds to
hardware that we already have in shelves (depending on how much compute
power is really needed.)
I think using other OSes is more difficult, mostly because I doubt we
want to deal with licenses; but even FreeBSD might not be a realistic
option, at least not in the short term.
Still,
task_name | sum
------------------------------------------------+------------
FreeBSD - 13 - Meson | 1017:56:09
Windows - Server 2019, MinGW64 - Meson | 00:00:00
SanityCheck | 76:48:41
macOS - Ventura - Meson | 873:12:43
Windows - Server 2019, VS 2019 - Meson & ninja | 1251:08:06
Linux - Debian Bullseye - Autoconf | 830:17:26
Linux - Debian Bullseye - Meson | 860:37:21
CompilerWarnings | 935:30:35
(8 rows)
moving just Debian, that might alleviate 76+830+860+935 hours from the
Cirrus infra, which is ~46%. Not bad.
(How come Windows - Meson reports allballs?)
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"No tengo por qué estar de acuerdo con lo que pienso"
(Carlos Caszeli)
Hi,
On 2023-08-08 18:34:58 +0200, Alvaro Herrera wrote:
On 2023-Aug-08, Andres Freund wrote:
Given the cost of macos, it seems like it'd be by far the most affordable
to just buy 1-2 mac minis (2x ~660USD) and stick them in a shelf somewhere, as
persistent runners. Cirrus has builtin macos virtualization support - but can
only host two VMs on each mac, due to macos licensing restrictions. A single
mac mini would suffice to keep up with our unoptimized monthly runtime
(although there likely would be some overhead).
If using persistent workers is an option, maybe we should explore that.
I think we could move all or some of the Linux - Debian builds to
hardware that we already have in shelves (depending on how much compute
power is really needed.)
(76+830+860+935)/((365/12)*24) = 3.7
3.7 instances with 4 "vcores" are busy 100% of the time. So we'd need at least
~16 cpu threads - I think cirrus sometimes uses instances that disable HT, so
it'd perhaps be 16 cores actually.
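(Editorial sketch of the estimate above: monthly task-hours divided by the hours in an average month gives the number of always-busy instances.)

```python
# Capacity estimate for self-hosting the Linux tasks.
monthly_task_hours = 76 + 830 + 860 + 935   # sanity, autoconf, meson, warnings
hours_per_month = (365 / 12) * 24           # ~730
busy_instances = monthly_task_hours / hours_per_month
print(f"~{busy_instances:.1f} 4-vCPU instances busy full-time, "
      f"i.e. ~{busy_instances * 4:.0f}+ vCPUs needed")
```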
I think using other OSes is more difficult, mostly because I doubt we
want to deal with licenses; but even FreeBSD might not be a realistic
option, at least not in the short term.
They can be VMs, so that shouldn't be a big issue.
task_name | sum
------------------------------------------------+------------
FreeBSD - 13 - Meson | 1017:56:09
Windows - Server 2019, MinGW64 - Meson | 00:00:00
SanityCheck | 76:48:41
macOS - Ventura - Meson | 873:12:43
Windows - Server 2019, VS 2019 - Meson & ninja | 1251:08:06
Linux - Debian Bullseye - Autoconf | 830:17:26
Linux - Debian Bullseye - Meson | 860:37:21
CompilerWarnings | 935:30:35
(8 rows)
moving just Debian, that might alleviate 76+830+860+935 hours from the
Cirrus infra, which is ~46%. Not bad.
(How come Windows - Meson reports allballs?)
It's mingw64, which we've marked as "manual", because we didn't have the cpu
cycles to run it.
Greetings,
Andres Freund
Hi,
On 2023-08-08 11:58:25 +0300, Heikki Linnakangas wrote:
On 08/08/2023 05:15, Andres Freund wrote:
With the improvements detailed below, cirrus' free CI would last
about ~65 runs / month.
I think that's plenty.
Not so sure, I would regularly exceed it, I think. But it definitely will
suffice for more casual contributors.
Potential paths forward for cfbot, in addition to the above:
- Pay for compute / ask the various cloud providers to grant us compute
credits. At least some of the cloud providers can be used via cirrus-ci.
- Host (some) CI runners ourselves. Particularly with macos and windows, that
could provide significant savings.
- Build our own system, using buildbot, jenkins or whatnot.
Opinions as to what to do?
The resources for running our own system aren't free either. I'm sure we can
get sponsors for the cirrus-ci credits, or use donations.
As outlined in my reply to Alvaro, just using credits likely is financially
not viable...
5) Move use of -Dsegsize_blocks=6 from macos to linux
Macos is expensive, -Dsegsize_blocks=6 slows things down. Alternatively we
could stop covering both meson and autoconf segsize_blocks. It does affect
runtime on linux as well.
Could we have a comment somewhere on why we use -Dsegsize_blocks on these
particular CI runs? It seems pretty random. I guess the idea is to have one
autoconf task and one meson task with that option, to check that the
autoconf/meson option works?
Hm, some of that was in the commit message, but I should have added it to
.cirrus.yml as well.
Normally, the "relation segment" code basically has no coverage in our tests,
because we (quite reasonably) don't generate tables large enough. We've had
plenty of bugs that we didn't notice due to the code not being exercised much. So it
seemed useful to add CI coverage, by making the segments very small.
I chose the tasks by looking at how long they took at the time, I
think, adding them to the slower ones.
6) Disable write cache flushes on windows
It's a bit ugly to do this without using the UI... Shaves off about 30s
from the tests.
A brief comment would be nice: "We don't care about persistence over hard
crashes in the CI, so disable write cache flushes to speed it up."
Turns out that patch doesn't work on its own anyway, at least not
reliably... I tested it by interactively logging into a windows vm and testing
it there. It doesn't actually seem to suffice when run in isolation, because
the relevant registry key doesn't yet exist. I haven't yet figured out the
magic incantations for adding the missing "intermediary", but I'm getting
there...
Greetings,
Andres Freund
On Tue, Aug 8, 2023 at 9:26 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2023-08-08 11:58:25 +0300, Heikki Linnakangas wrote:
On 08/08/2023 05:15, Andres Freund wrote:
With the improvements detailed below, cirrus' free CI would last
about ~65 runs / month.
I think that's plenty.
Not so sure, I would regularly exceed it, I think. But it definitely will
suffice for more casual contributors.
Potential paths forward for cfbot, in addition to the above:
- Pay for compute / ask the various cloud providers to grant us compute
credits. At least some of the cloud providers can be used via cirrus-ci.
- Host (some) CI runners ourselves. Particularly with macos and windows, that
could provide significant savings.
- Build our own system, using buildbot, jenkins or whatnot.
Opinions as to what to do?
The resources for running our own system aren't free either. I'm sure we can
get sponsors for the cirrus-ci credits, or use donations.
As outlined in my reply to Alvaro, just using credits likely is financially
not viable...
In case it's helpful, from an SPI oriented perspective, $7K/month is
probably an order of magnitude more than what we can sustain, so I
don't see a way to make that work without some kind of additional
magic that includes other non-profits and/or commercial companies
changing donation habits between now and September.
Purchasing a couple of mac-mini's (and/or similar gear) would be near
trivial though, just a matter of figuring out where/how to host it
(but I think infra can chime in on that if that's what gets decided).
The other likely option would be to seek out cloud credits from one of
the big three (or others); Amazon has continually said they would be
happy to donate more credits to us if we had a use, and I think some
of the other hosting providers have said similarly at times; so we'd
need to ask and hope it's not too bureaucratic.
Robert Treat
https://xzilla.net
Hi,
On 2023-08-08 22:29:50 -0400, Robert Treat wrote:
In case it's helpful, from an SPI oriented perspective, $7K/month is
probably an order of magnitude more than what we can sustain, so I
don't see a way to make that work without some kind of additional
magic that includes other non-profits and/or commercial companies
changing donation habits between now and September.
Yea, I think that'd make no sense, even if we could afford it. I think the
patches I've written should drop it to 1/2 already. Thomas added some
throttling to push it down further.
Purchasing a couple of mac-mini's (and/or similar gear) would be near
trivial though, just a matter of figuring out where/how to host it
(but I think infra can chime in on that if that's what gets decided).
Cool. Because of the limitation of running two VMs at a time on macos and the
comparatively low cost of mac minis, it seems they beat alternative models by
a fair bit.
Pginfra/sysadmin: ^
Based on being off by an order of magnitude, as you mention earlier, it seems
that:
1) reducing test runtime etc, as already in progress
2) getting 2 mac minis as runners
3) using ~350 USD / mo in GCP costs for windows, linux, freebsd (*)
Would be viable for a month or three? I hope we can get some cloud providers
to chip in for 3), but I'd like to have something in place that doesn't depend
on that.
Given the cost of macos VMs at AWS, the only of the big cloud providers to
have macos instances, I think we'd burn pointlessly quick through credits if
we used VMs for that.
(*) I think we should be able to get below that, but ...
The other likely option would be to seek out cloud credits from one of
the big three (or others); Amazon has continually said they would be
happy to donate more credits to us if we had a use, and I think some
of the other hosting providers have said similarly at times; so we'd
need to ask and hope it's not too bureaucratic.
Yep.
I tried to start that process within microsoft, fwiw. Perhaps Joe and
Jonathan know how to start within AWS? And perhaps Noah inside GCP?
It'd be the least work to get it up and running in GCP, as it's already
running there, but should be quite doable at the others as well.
Greetings,
Andres Freund
On Wed, Aug 9, 2023 at 3:26 AM Andres Freund <andres@anarazel.de> wrote:
6) Disable write cache flushes on windows
It's a bit ugly to do this without using the UI... Shaves off
about 30s
from the tests.
A brief comment would be nice: "We don't care about persistence over hard
crashes in the CI, so disable write cache flushes to speed it up."
Turns out that patch doesn't work on its own anyway, at least not
reliably... I tested it by interactively logging into a windows vm and testing
it there. It doesn't actually seem to suffice when run in isolation, because
the relevant registry key doesn't yet exist. I haven't yet figured out the
magic incantations for adding the missing "intermediary", but I'm getting
there...
You can find a good example on how to accomplish this in:
https://github.com/farag2/Utilities/blob/master/Enable_disk_write_caching.ps1
Regards,
Juan José Santamaría Flecha
Hello
So pginfra had a little chat about this. Firstly, there's consensus
that it makes sense for pginfra to help out with some persistent workers
in our existing VM system; however there are some aspects that need
some further discussion, to avoid destabilizing the rest of the
infrastructure. We're looking into it and we'll let you know.
Hosting a couple of Mac Minis is definitely a possibility, if some
entity like SPI buys them. Let's take this off-list to arrange the
details.
Regards
--
Álvaro Herrera
On Tue, Aug 08, 2023 at 07:59:55PM -0700, Andres Freund wrote:
On 2023-08-08 22:29:50 -0400, Robert Treat wrote:
3) using ~350 USD / mo in GCP costs for windows, linux, freebsd (*)
The other likely option would be to seek out cloud credits
I tried to start that process within microsoft, fwiw. Perhaps Joe and
Jonathan know how to start within AWS? And perhaps Noah inside GCP?
It'd be the least work to get it up and running in GCP, as it's already
running there
I'm looking at this. Thanks for bringing it to my attention.
Hi,
On 2023-08-07 19:15:41 -0700, Andres Freund wrote:
As some of you might have seen when running CI, cirrus-ci is restricting how
much CI cycles everyone can use for free (announcement at [1]). This takes
effect September 1st.
This obviously has consequences both for individual users of CI as well as
cfbot.
[...]
Potential paths forward for individual CI:
- migrate wholesale to another CI provider
- split CI tasks across different CI providers, rely on github et al
displaying the CI status for different platforms
- give up
Potential paths forward for cfbot, in addition to the above:
- Pay for compute / ask the various cloud providers to grant us compute
credits. At least some of the cloud providers can be used via cirrus-ci.
- Host (some) CI runners ourselves. Particularly with macos and windows, that
could provide significant savings.
To make that possible, we need to make the compute resources for CI
configurable on a per-repository basis. After experimenting with a bunch of
ways to do that, I got stuck on that for a while. But as of today we have
sufficient macos runners for cfbot available, so... I think the approach I
finally settled on is decent, although not great. It's described in the "main"
commit message:
ci: Prepare to make compute resources for CI configurable
cirrus-ci will soon restrict the amount of free resources every user gets (as
have many other CI providers). For most users of CI that should not be an
issue. But e.g. for cfbot it will be an issue.
To allow configuring different resources on a per-repository basis, introduce
infrastructure for overriding the task execution environment. Unfortunately
this is not entirely trivial, as yaml anchors have to be defined before their
use, and cirrus-ci only allows injecting additional contents at the end of
.cirrus.yml.
To deal with that, move the definition of the CI tasks to
.cirrus.tasks.yml. The main .cirrus.yml is loaded first, then, if defined, the
file referenced by the REPO_CI_CONFIG_GIT_URL variable, will be added,
followed by the contents of .cirrus.tasks.yml. That allows
REPO_CI_CONFIG_GIT_URL to override the yaml anchors defined in .cirrus.yml.
Unfortunately git's default merge / rebase strategy does not handle copied
files, just renamed ones. To avoid painful rebasing over this change, this
commit just renames .cirrus.yml to .cirrus.tasks.yml, without adding a new
.cirrus.yml. That's done in the followup commit, which moves the relevant
portion of .cirrus.tasks.yml to .cirrus.yml. Until that is done,
REPO_CI_CONFIG_GIT_URL does not fully work.
The subsequent commit adds documentation for how to configure custom compute
resources to src/tools/ci/README
Discussion: /messages/by-id/20230808021541.7lbzdefvma7qmn3w@awork3.anarazel.de
Backpatch: 15-, where CI support was added
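(Editorial sketch of the mechanism the commit message describes; the anchor and variable names here are assumptions based on the templates introduced earlier in this series, not necessarily the final committed names.)

```yaml
# Hypothetical repo-specific CI config, referenced via REPO_CI_CONFIG_GIT_URL.
# It is injected after .cirrus.yml and before .cirrus.tasks.yml, so an anchor
# redefined here shadows the default before the task definitions use it.
default_linux_task_template: &linux_task_template
  env:
    PLATFORM: linux
  compute_engine_instance:
    image_project: my-custom-ci-project   # assumption: repo's own GCP project
    image: family/$IMAGE_FAMILY
    platform: $PLATFORM
    cpu: $CPUS
    disk: $DISK_SIZE
```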
I don't love moving most of the contents of .cirrus.yml into a new file, but I
don't see another way. I did implement it without that as well (see [1]), but
that ends up considerably harder to understand, and hardcodes what cfbot
needs. Splitting the commit, as explained above, at least makes git rebase
fairly painless. FWIW, I did merge the changes into 15, with only reasonable
conflicts (due to new tasks, autoconf->meson).
A prerequisite commit converts "SanityCheck" and "CompilerWarnings" to use a
full VM instead of a container - that way providing custom compute resources
doesn't have to deal with containers in addition to VMs. It also looks like
the increased startup overhead is outweighed by the reduction in runtime
overhead.
I'm hoping to push this fairly soon, as I'll be on vacation the last week of
August. I'll be online intermittently though, if there are issues, I can react
(very limited connectivity from midday Aug 29th to midday Aug 31st though). I'd
appreciate a quick review or two.
Greetings,
Andres Freund
[1]: https://github.com/anarazel/postgres/commit/b95fd302161b951f1dc14d586162ed3d85564bfc
Attachments:
v3-0001-ci-Don-t-specify-amount-of-memory.patch (text/x-diff; charset=us-ascii)
From d1fa394bf9318f08c1c529160c3e6cedc5bb5289 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 17:31:15 -0700
Subject: [PATCH v3 01/10] ci: Don't specify amount of memory
The number of CPUs is the cost-determining factor. Most instance types that
run tests have more memory/core than what we specified, there's no real
benefit in wasting that.
Discussion: https://postgr.es/m/20230808021541.7lbzdefvma7qmn3w@awork3.anarazel.de
Backpatch: 15-, where CI support was added
---
.cirrus.yml | 4 ----
1 file changed, 4 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index 727c434de40..9e84eb95be5 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -149,7 +149,6 @@ task:
image: family/pg-ci-freebsd-13
platform: freebsd
cpu: $CPUS
- memory: 4G
disk: 50
sysinfo_script: |
@@ -291,7 +290,6 @@ task:
image: family/pg-ci-bullseye
platform: linux
cpu: $CPUS
- memory: 4G
ccache_cache:
folder: ${CCACHE_DIR}
@@ -558,7 +556,6 @@ task:
image: family/pg-ci-windows-ci-vs-2019
platform: windows
cpu: $CPUS
- memory: 4G
setup_additional_packages_script: |
REM choco install -y --no-progress ...
@@ -606,7 +603,6 @@ task:
image: family/pg-ci-windows-ci-mingw64
platform: windows
cpu: $CPUS
- memory: 4G
env:
TEST_JOBS: 4 # higher concurrency causes occasional failures
--
2.38.0
v3-0002-ci-Move-execution-method-of-tasks-into-yaml-templ.patch (text/x-diff; charset=us-ascii)
From dee50464cd75c1eadc46e7cc673d23d6bf01e6b6 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 22 Aug 2023 21:25:28 -0700
Subject: [PATCH v3 02/10] ci: Move execution method of tasks into yaml
templates
This is done in preparation for making the compute resources for CI
configurable. It also looks cleaner.
Discussion: https://postgr.es/m/20230808021541.7lbzdefvma7qmn3w@awork3.anarazel.de
Backpatch: 15-, where CI support was added
---
.cirrus.yml | 85 +++++++++++++++++++++++++++++++++++------------------
1 file changed, 57 insertions(+), 28 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index 9e84eb95be5..75747b9b651 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -9,6 +9,7 @@ env:
GCP_PROJECT: pg-ci-images
IMAGE_PROJECT: $GCP_PROJECT
CONTAINER_REPO: us-docker.pkg.dev/${GCP_PROJECT}/ci
+ DISK_SIZE: 25
# The lower depth accelerates git clone. Use a bit of depth so that
# concurrent tasks and retrying older jobs have a chance of working.
@@ -28,6 +29,45 @@ env:
PG_TEST_EXTRA: kerberos ldap ssl load_balance
+# Define how to run various types of tasks.
+
+# VMs provided by cirrus-ci. Each user has a limited number of "free" credits
+# for testing.
+cirrus_community_vm_template: &cirrus_community_vm_template
+ compute_engine_instance:
+ image_project: $IMAGE_PROJECT
+ image: family/$IMAGE_FAMILY
+ platform: $PLATFORM
+ cpu: $CPUS
+ disk: $DISK_SIZE
+
+
+default_linux_task_template: &linux_task_template
+ env:
+ PLATFORM: linux
+ <<: *cirrus_community_vm_template
+
+
+default_freebsd_task_template: &freebsd_task_template
+ env:
+ PLATFORM: freebsd
+ <<: *cirrus_community_vm_template
+
+
+default_windows_task_template: &windows_task_template
+ env:
+ PLATFORM: windows
+ <<: *cirrus_community_vm_template
+
+
+# macos workers provided by cirrus-ci
+default_macos_task_template: &macos_task_template
+ env:
+ PLATFORM: macos
+ macos_instance:
+ image: $IMAGE
+
+
# What files to preserve in case tests fail
on_failure_ac: &on_failure_ac
log_artifacts:
@@ -136,21 +176,18 @@ task:
CPUS: 2
BUILD_JOBS: 3
TEST_JOBS: 3
+ IMAGE_FAMILY: pg-ci-freebsd-13
+ DISK_SIZE: 50
CCACHE_DIR: /tmp/ccache_dir
CPPFLAGS: -DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
CFLAGS: -Og -ggdb
+ <<: *freebsd_task_template
+
depends_on: SanityCheck
only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*freebsd.*'
- compute_engine_instance:
- image_project: $IMAGE_PROJECT
- image: family/pg-ci-freebsd-13
- platform: freebsd
- cpu: $CPUS
- disk: 50
-
sysinfo_script: |
id
uname -a
@@ -250,6 +287,7 @@ task:
CPUS: 4
BUILD_JOBS: 4
TEST_JOBS: 8 # experimentally derived to be a decent choice
+ IMAGE_FAMILY: pg-ci-bullseye
CCACHE_DIR: /tmp/ccache_dir
DEBUGINFOD_URLS: "https://debuginfod.debian.net"
@@ -282,15 +320,11 @@ task:
LINUX_CONFIGURE_FEATURES: *LINUX_CONFIGURE_FEATURES
LINUX_MESON_FEATURES: *LINUX_MESON_FEATURES
+ <<: *linux_task_template
+
depends_on: SanityCheck
only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*linux.*'
- compute_engine_instance:
- image_project: $IMAGE_PROJECT
- image: family/pg-ci-bullseye
- platform: linux
- cpu: $CPUS
-
ccache_cache:
folder: ${CCACHE_DIR}
@@ -430,6 +464,7 @@ task:
# work OK. See
# https://postgr.es/m/20220927040208.l3shfcidovpzqxfh%40awork3.anarazel.de
TEST_JOBS: 8
+ IMAGE: ghcr.io/cirruslabs/macos-ventura-base:latest
CIRRUS_WORKING_DIR: ${HOME}/pgsql/
CCACHE_DIR: ${HOME}/ccache
@@ -440,12 +475,11 @@ task:
CFLAGS: -Og -ggdb
CXXFLAGS: -Og -ggdb
+ <<: *macos_task_template
+
depends_on: SanityCheck
only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(macos|darwin|osx).*'
- macos_instance:
- image: ghcr.io/cirruslabs/macos-ventura-base:latest
-
sysinfo_script: |
id
uname -a
@@ -524,6 +558,7 @@ WINDOWS_ENVIRONMENT_BASE: &WINDOWS_ENVIRONMENT_BASE
# Avoids port conflicts between concurrent tap test runs
PG_TEST_USE_UNIX_SOCKETS: 1
PG_REGRESS_SOCK_DIR: "c:/cirrus/"
+ DISK_SIZE: 50
sysinfo_script: |
chcp
@@ -547,16 +582,13 @@ task:
# given that it explicitly prevents crash dumps from working...
# 0x8001 is SEM_FAILCRITICALERRORS | SEM_NOOPENFILEERRORBOX
CIRRUS_WINDOWS_ERROR_MODE: 0x8001
+ IMAGE_FAMILY: pg-ci-windows-ci-vs-2019
+
+ <<: *windows_task_template
depends_on: SanityCheck
only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*windows.*'
- compute_engine_instance:
- image_project: $IMAGE_PROJECT
- image: family/pg-ci-windows-ci-vs-2019
- platform: windows
- cpu: $CPUS
-
setup_additional_packages_script: |
REM choco install -y --no-progress ...
@@ -598,12 +630,6 @@ task:
# otherwise it'll be sorted before other tasks
depends_on: SanityCheck
- compute_engine_instance:
- image_project: $IMAGE_PROJECT
- image: family/pg-ci-windows-ci-mingw64
- platform: windows
- cpu: $CPUS
-
env:
TEST_JOBS: 4 # higher concurrency causes occasional failures
CCACHE_DIR: C:/msys64/ccache
@@ -617,6 +643,9 @@ task:
# Start bash in current working directory
CHERE_INVOKING: 1
BASH: C:\msys64\usr\bin\bash.exe -l
+ IMAGE_FAMILY: pg-ci-windows-ci-mingw64
+
+ <<: *windows_task_template
ccache_cache:
folder: ${CCACHE_DIR}
--
2.38.0
Attachment: v3-0003-ci-Use-VMs-for-SanityCheck-and-CompilerWarnings.patch (text/x-diff; charset=us-ascii)
From a60733b7bb86e93a3d23e45b98ce179d0f4e26fd Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 22 Aug 2023 21:28:37 -0700
Subject: [PATCH v3 03/10] ci: Use VMs for SanityCheck and CompilerWarnings
The main reason for this change is to reduce different ways of executing
tasks, making it easier to use custom compute resources for cfbot. A secondary
benefit is that the tasks seem slightly faster this way; apparently the
increased startup overhead is outweighed by reduced runtime overhead.
Discussion: https://postgr.es/m/20230808021541.7lbzdefvma7qmn3w@awork3.anarazel.de
Backpatch: 15-, where CI support was added
---
.cirrus.yml | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index 75747b9b651..f4276ad8692 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -110,15 +110,14 @@ task:
CPUS: 4
BUILD_JOBS: 8
TEST_JOBS: 8
+ IMAGE_FAMILY: pg-ci-bullseye
CCACHE_DIR: ${CIRRUS_WORKING_DIR}/ccache_dir
# no options enabled, should be small
CCACHE_MAXSIZE: "150M"
- # Container starts up quickly, but is slower at runtime, particularly for
- # tests. Good for the briefly running sanity check.
- container:
- image: $CONTAINER_REPO/linux_debian_bullseye_ci:latest
- cpu: $CPUS
+ # While containers would start up a bit quicker, building is a bit
+ # slower. This way we don't have to maintain a container image.
+ <<: *linux_task_template
ccache_cache:
folder: $CCACHE_DIR
@@ -691,6 +690,7 @@ task:
env:
CPUS: 4
BUILD_JOBS: 4
+ IMAGE_FAMILY: pg-ci-bullseye
# Use larger ccache cache, as this task compiles with multiple compilers /
# flag combinations
@@ -700,9 +700,7 @@ task:
LINUX_CONFIGURE_FEATURES: *LINUX_CONFIGURE_FEATURES
LINUX_MESON_FEATURES: *LINUX_MESON_FEATURES
- container:
- image: $CONTAINER_REPO/linux_debian_bullseye_ci:latest
- cpu: $CPUS
+ <<: *linux_task_template
sysinfo_script: |
id
--
2.38.0
Attachment: v3-0004-ci-Prepare-to-make-compute-resources-for-CI-confi.patch (text/x-diff; charset=us-ascii)
From a4e238c4c4426f371535fb88c4046fb9e127c923 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 22 Aug 2023 23:48:32 -0700
Subject: [PATCH v3 04/10] ci: Prepare to make compute resources for CI
configurable
cirrus-ci will soon restrict the amount of free resources every user gets (as
have many other CI providers). For most users of CI that should not be an
issue. But e.g. for cfbot it will be an issue.
To allow configuring different resources on a per-repository basis, introduce
infrastructure for overriding the task execution environment. Unfortunately
this is not entirely trivial, as yaml anchors have to be defined before their
use, and cirrus-ci only allows injecting additional contents at the end of
.cirrus.yml.
To deal with that, move the definition of the CI tasks to
.cirrus.tasks.yml. The main .cirrus.yml is loaded first, then, if defined, the
file referenced by the REPO_CI_CONFIG_GIT_URL variable, will be added,
followed by the contents of .cirrus.tasks.yml. That allows
REPO_CI_CONFIG_GIT_URL to override the yaml anchors defined in .cirrus.yml.
Unfortunately git's default merge / rebase strategy does not handle copied
files, just renamed ones. To avoid painful rebasing over this change, this
commit just renames .cirrus.yml to .cirrus.tasks.yml, without adding a new
.cirrus.yml. That's done in the followup commit, which moves the relevant
portion of .cirrus.tasks.yml to .cirrus.yml. Until that is done,
REPO_CI_CONFIG_GIT_URL does not fully work.
The subsequent commit adds documentation for how to configure custom compute
resources to src/tools/ci/README.
Discussion: https://postgr.es/m/20230808021541.7lbzdefvma7qmn3w@awork3.anarazel.de
Backpatch: 15-, where CI support was added
---
.cirrus.star | 63 ++++++++++++++++++++++++++++++++
.cirrus.yml => .cirrus.tasks.yml | 0
2 files changed, 63 insertions(+)
create mode 100644 .cirrus.star
rename .cirrus.yml => .cirrus.tasks.yml (100%)
diff --git a/.cirrus.star b/.cirrus.star
new file mode 100644
index 00000000000..f3ea9b943e4
--- /dev/null
+++ b/.cirrus.star
@@ -0,0 +1,63 @@
+"""Additional CI configuration, using the starlark language. See
+https://cirrus-ci.org/guide/programming-tasks/#introduction-into-starlark
+
+See also the starlark specification at
+https://github.com/bazelbuild/starlark/blob/master/spec.md
+
+See also .cirrus.yml and src/tools/ci/README
+"""
+
+load("cirrus", "env", "fs")
+
+
+def main():
+ """The main function is executed by cirrus-ci after loading .cirrus.yml and can
+ extend the CI definition further.
+
+ As documented in .cirrus.yml, the final CI configuration is composed of
+
+ 1) the contents of this file
+
+ 2) if defined, the contents of the file referenced by the, repository
+ level, REPO_CI_CONFIG_GIT_URL variable (see
+ https://cirrus-ci.org/guide/programming-tasks/#fs for the accepted
+ format)
+
+ 3) .cirrus.tasks.yml
+ """
+
+ output = ""
+
+ # 1) is included implicitly
+
+ # Add 2)
+ repo_config_url = env.get("REPO_CI_CONFIG_GIT_URL")
+ if repo_config_url != None:
+ print("loading additional configuration from \"{}\"".format(repo_config_url))
+ output += config_from(repo_config_url)
+ else:
+ output += "n# REPO_CI_CONFIG_URL was not set\n"
+
+ # Add 3)
+ output += config_from(".cirrus.tasks.yml")
+
+ return output
+
+
+def config_from(config_src):
+ """return contents of config file `config_src`, surrounded by markers
+ indicating start / end of the the included file
+ """
+
+ config_contents = fs.read(config_src)
+ config_fmt = """
+
+###
+# contents of config file `{0}` start here
+###
+{1}
+###
+# contents of config file `{0}` end here
+###
+"""
+ return config_fmt.format(config_src, config_contents)
diff --git a/.cirrus.yml b/.cirrus.tasks.yml
similarity index 100%
rename from .cirrus.yml
rename to .cirrus.tasks.yml
--
2.38.0
Attachment: v3-0005-ci-Make-compute-resources-for-CI-configurable.patch (text/x-diff; charset=us-ascii)
From 53a4c9642dda8ffea3c7b73e97f4b23ea0a4f2a3 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 22 Aug 2023 22:26:54 -0700
Subject: [PATCH v3 05/10] ci: Make compute resources for CI configurable
See prior commit for an explanation for the goal of the change and why it had
to be split into two commits.
Discussion: https://postgr.es/m/20230808021541.7lbzdefvma7qmn3w@awork3.anarazel.de
Backpatch: 15-, where CI support was added
---
.cirrus.yml | 73 +++++++++++++++++++++++++++++++++++++++++++++
.cirrus.tasks.yml | 45 ----------------------------
src/tools/ci/README | 17 +++++++++++
3 files changed, 90 insertions(+), 45 deletions(-)
create mode 100644 .cirrus.yml
diff --git a/.cirrus.yml b/.cirrus.yml
new file mode 100644
index 00000000000..a83129ae46d
--- /dev/null
+++ b/.cirrus.yml
@@ -0,0 +1,73 @@
+# CI configuration file for CI utilizing cirrus-ci.org
+#
+# For instructions on how to enable the CI integration in a repository and
+# further details, see src/tools/ci/README
+#
+#
+# The actual CI tasks are defined in .cirrus.tasks.yml. To make the compute
+# resources for CI configurable on a repository level, the "final" CI
+# configuration is the combination of:
+#
+# 1) the contents of this file
+#
+# 2) if defined, the contents of the file referenced by the, repository
+# level, REPO_CI_CONFIG_GIT_URL variable (see
+# https://cirrus-ci.org/guide/programming-tasks/#fs for the accepted
+# format)
+#
+# 3) .cirrus.tasks.yml
+#
+# This composition is done by .cirrus.star
+
+
+env:
+ # Source of images / containers
+ GCP_PROJECT: pg-ci-images
+ IMAGE_PROJECT: $GCP_PROJECT
+ CONTAINER_REPO: us-docker.pkg.dev/${GCP_PROJECT}/ci
+ DISK_SIZE: 25
+
+
+# Define how to run various types of tasks.
+
+# VMs provided by cirrus-ci. Each user has a limited number of "free" credits
+# for testing.
+cirrus_community_vm_template: &cirrus_community_vm_template
+ compute_engine_instance:
+ image_project: $IMAGE_PROJECT
+ image: family/$IMAGE_FAMILY
+ platform: $PLATFORM
+ cpu: $CPUS
+ disk: $DISK_SIZE
+
+
+default_linux_task_template: &linux_task_template
+ env:
+ PLATFORM: linux
+ <<: *cirrus_community_vm_template
+
+
+default_freebsd_task_template: &freebsd_task_template
+ env:
+ PLATFORM: freebsd
+ <<: *cirrus_community_vm_template
+
+
+default_windows_task_template: &windows_task_template
+ env:
+ PLATFORM: windows
+ <<: *cirrus_community_vm_template
+
+
+# macos workers provided by cirrus-ci
+default_macos_task_template: &macos_task_template
+ env:
+ PLATFORM: macos
+ macos_instance:
+ image: $IMAGE
+
+
+# Contents of REPO_CI_CONFIG_GIT_URL, if defined, will be inserted here,
+# followed by the contents of .cirrus.tasks.yml. This allows
+# REPO_CI_CONFIG_GIT_URL to override how the task types above will be
+# executed, e.g. using a custom compute account or permanent workers.
diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index f4276ad8692..0cf7ba77996 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -5,12 +5,6 @@
env:
- # Source of images / containers
- GCP_PROJECT: pg-ci-images
- IMAGE_PROJECT: $GCP_PROJECT
- CONTAINER_REPO: us-docker.pkg.dev/${GCP_PROJECT}/ci
- DISK_SIZE: 25
-
# The lower depth accelerates git clone. Use a bit of depth so that
# concurrent tasks and retrying older jobs have a chance of working.
CIRRUS_CLONE_DEPTH: 500
@@ -29,45 +23,6 @@ env:
PG_TEST_EXTRA: kerberos ldap ssl load_balance
-# Define how to run various types of tasks.
-
-# VMs provided by cirrus-ci. Each user has a limited number of "free" credits
-# for testing.
-cirrus_community_vm_template: &cirrus_community_vm_template
- compute_engine_instance:
- image_project: $IMAGE_PROJECT
- image: family/$IMAGE_FAMILY
- platform: $PLATFORM
- cpu: $CPUS
- disk: $DISK_SIZE
-
-
-default_linux_task_template: &linux_task_template
- env:
- PLATFORM: linux
- <<: *cirrus_community_vm_template
-
-
-default_freebsd_task_template: &freebsd_task_template
- env:
- PLATFORM: freebsd
- <<: *cirrus_community_vm_template
-
-
-default_windows_task_template: &windows_task_template
- env:
- PLATFORM: windows
- <<: *cirrus_community_vm_template
-
-
-# macos workers provided by cirrus-ci
-default_macos_task_template: &macos_task_template
- env:
- PLATFORM: macos
- macos_instance:
- image: $IMAGE
-
-
# What files to preserve in case tests fail
on_failure_ac: &on_failure_ac
log_artifacts:
diff --git a/src/tools/ci/README b/src/tools/ci/README
index 80d01939e84..30ddd200c96 100644
--- a/src/tools/ci/README
+++ b/src/tools/ci/README
@@ -65,3 +65,20 @@ messages. Currently the following controls are available:
Only runs CI on operating systems specified. This can be useful when
addressing portability issues affecting only a subset of platforms.
+
+
+Using custom compute resources for CI
+=====================================
+
+When running a lot of tests in a repository, cirrus-ci's free credits do not
+suffice. In those cases a repository can be configured to use other
+infrastructure for running tests. To do so, the REPO_CI_CONFIG_GIT_URL
+variable can be configured for the repository in the cirrus-ci web interface,
+at https://cirrus-ci.com/github/<user or organization>. The file referenced
+(see https://cirrus-ci.org/guide/programming-tasks/#fs) by the variable can
+override the default execution method for different operating systems,
+defined in .cirrus.yml, by redefining the relevant yaml anchors.
+
+Custom compute resources can be provided using
+- https://cirrus-ci.org/guide/supported-computing-services/
+- https://cirrus-ci.org/guide/persistent-workers/
--
2.38.0
Attachment: v3-0006-ci-dontmerge-Example-custom-CI-configuration.patch (text/x-diff; charset=us-ascii)
From cae9e8fd2c257eaad0270f79b595eb70a743ab13 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 22 Aug 2023 21:45:31 -0700
Subject: [PATCH v3 06/10] ci: dontmerge: Example custom CI configuration
Uses a custom google compute account (freebsd, linux and windows) and
persistent workers (macos).
This only works in repositories that are configured for both. In addition,
cirrus-ci needs to be configured to set the REPO_CI_CONFIG_GIT_URL pointing to
the file added here.
---
.cirrus.custom.yml | 49 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
create mode 100644 .cirrus.custom.yml
diff --git a/.cirrus.custom.yml b/.cirrus.custom.yml
new file mode 100644
index 00000000000..42cbeecc715
--- /dev/null
+++ b/.cirrus.custom.yml
@@ -0,0 +1,49 @@
+gcp_credentials:
+ workload_identity_provider: projects/1072892761768/locations/global/workloadIdentityPools/cirrus-ci-pool/providers/cirrus-oidc
+ service_account: cirrus-ci@pg-ci-runs.iam.gserviceaccount.com
+
+
+# Defaults
+gce_instance:
+ type: t2d-standard-4
+ spot: true
+ zone: us-west1-a
+ use_ssd: true
+
+
+gce_instance_template: &gce_instance_template
+ gce_instance:
+ image_project: $IMAGE_PROJECT
+ image_family: $IMAGE_FAMILY
+ platform: $PLATFORM
+ disk: $DISK_SIZE
+
+
+gce_linux_task_template: &linux_task_template
+ env:
+ PLATFORM: linux
+ <<: *gce_instance_template
+
+
+gce_freebsd_task_template: &freebsd_task_template
+ env:
+ PLATFORM: freebsd
+ <<: *gce_instance_template
+
+
+gce_windows_task_template: &windows_task_template
+ env:
+ PLATFORM: windows
+ <<: *gce_instance_template
+
+
+persistent_worker_macos_task_template: &macos_task_template
+ env:
+ PLATFORM: macos
+ persistent_worker:
+ isolation:
+ tart:
+ image: $IMAGE
+ user: admin
+ password: admin
--
2.38.0
Attachment: v3-0007-Use-template-initdb-in-tests.patch (text/x-diff; charset=us-ascii)
From 12a0287a38ed77529be2cf8a8a37d772459fa090 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 2 Feb 2023 21:51:53 -0800
Subject: [PATCH v3 07/10] Use "template" initdb in tests
Discussion: https://postgr.es/m/20220120021859.3zpsfqn4z7ob7afz@alap3.anarazel.de
---
meson.build | 30 ++++++++++
src/test/perl/PostgreSQL/Test/Cluster.pm | 46 ++++++++++++++-
src/test/regress/pg_regress.c | 74 ++++++++++++++++++------
.cirrus.tasks.yml | 3 +-
src/Makefile.global.in | 52 +++++++++--------
5 files changed, 161 insertions(+), 44 deletions(-)
diff --git a/meson.build b/meson.build
index f5ec442f9a9..8b2b521a013 100644
--- a/meson.build
+++ b/meson.build
@@ -3070,8 +3070,10 @@ testport = 40000
test_env = environment()
temp_install_bindir = test_install_location / get_option('bindir')
+test_initdb_template = meson.build_root() / 'tmp_install' / 'initdb-template'
test_env.set('PG_REGRESS', pg_regress.full_path())
test_env.set('REGRESS_SHLIB', regress_module.full_path())
+test_env.set('INITDB_TEMPLATE', test_initdb_template)
# Test suites that are not safe by default but can be run if selected
# by the user via the whitespace-separated list in variable PG_TEST_EXTRA.
@@ -3086,6 +3088,34 @@ if library_path_var != ''
endif
+# Create (and remove old) initdb template directory. Tests use that, where
+# possible, to make it cheaper to run tests.
+#
+# Use python to remove the old cached initdb, as we cannot rely on a working
+# 'rm' binary on windows.
+test('initdb_cache',
+ python,
+ args: [
+ '-c', '''
+import shutil
+import sys
+import subprocess
+
+shutil.rmtree(sys.argv[1], ignore_errors=True)
+sp = subprocess.run(sys.argv[2:] + [sys.argv[1]])
+sys.exit(sp.returncode)
+''',
+ test_initdb_template,
+ temp_install_bindir / 'initdb',
+ '-A', 'trust', '-N', '--no-instructions'
+ ],
+ priority: setup_tests_priority - 1,
+ timeout: 300,
+ is_parallel: false,
+ env: test_env,
+ suite: ['setup'])
+
+
###############################################################
# Test Generation
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 5e161dbee60..4d449c35de9 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -522,8 +522,50 @@ sub init
mkdir $self->backup_dir;
mkdir $self->archive_dir;
- PostgreSQL::Test::Utils::system_or_bail('initdb', '-D', $pgdata, '-A',
- 'trust', '-N', @{ $params{extra} });
+ # If available and if there aren't any parameters, use a previously
+ # initdb'd cluster as a template by copying it. For a lot of tests, that's
+ # substantially cheaper. Do so only if there aren't parameters, it doesn't
+ # seem worth figuring out whether they affect compatibility.
+ #
+ # There's very similar code in pg_regress.c, but we can't easily
+ # deduplicate it until we require perl at build time.
+ if (defined $params{extra} or !defined $ENV{INITDB_TEMPLATE})
+ {
+ note("initializing database system by running initdb");
+ PostgreSQL::Test::Utils::system_or_bail('initdb', '-D', $pgdata, '-A',
+ 'trust', '-N', @{ $params{extra} });
+ }
+ else
+ {
+ my @copycmd;
+ my $expected_exitcode;
+
+ note("initializing database system by copying initdb template");
+
+ if ($PostgreSQL::Test::Utils::windows_os)
+ {
+ @copycmd = qw(robocopy /E /NJS /NJH /NFL /NDL /NP);
+ $expected_exitcode = 1; # 1 denotes files were copied
+ }
+ else
+ {
+ @copycmd = qw(cp -a);
+ $expected_exitcode = 0;
+ }
+
+ @copycmd = (@copycmd, $ENV{INITDB_TEMPLATE}, $pgdata);
+
+ my $ret = PostgreSQL::Test::Utils::system_log(@copycmd);
+
+ # See http://perldoc.perl.org/perlvar.html#%24CHILD_ERROR
+ if ($ret & 127 or $ret >> 8 != $expected_exitcode)
+ {
+ BAIL_OUT(
+ sprintf("failed to execute command \"%s\": $ret",
+ join(" ", @copycmd)));
+ }
+ }
+
PostgreSQL::Test::Utils::system_or_bail($ENV{PG_REGRESS},
'--config-auth', $pgdata, @{ $params{auth_extra} });
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index b68632320a7..407e3915cec 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2295,6 +2295,7 @@ regression_main(int argc, char *argv[],
FILE *pg_conf;
const char *env_wait;
int wait_seconds;
+ const char *initdb_template_dir;
/*
* Prepare the temp instance
@@ -2316,25 +2317,64 @@ regression_main(int argc, char *argv[],
if (!directory_exists(buf))
make_directory(buf);
- /* initdb */
initStringInfo(&cmd);
- appendStringInfo(&cmd,
- "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync",
- bindir ? bindir : "",
- bindir ? "/" : "",
- temp_instance);
- if (debug)
- appendStringInfo(&cmd, " --debug");
- if (nolocale)
- appendStringInfo(&cmd, " --no-locale");
- appendStringInfo(&cmd, " > \"%s/log/initdb.log\" 2>&1", outputdir);
- fflush(NULL);
- if (system(cmd.data))
+
+ /*
+ * Create data directory.
+ *
+ * If available, use a previously initdb'd cluster as a template by
+ * copying it. For a lot of tests, that's substantially cheaper.
+ *
+ * There's very similar code in Cluster.pm, but we can't easily
+ * deduplicate it until we require perl at build time.
+ */
+ initdb_template_dir = getenv("INITDB_TEMPLATE");
+ if (initdb_template_dir == NULL || nolocale || debug)
{
- bail("initdb failed\n"
- "# Examine \"%s/log/initdb.log\" for the reason.\n"
- "# Command was: %s",
- outputdir, cmd.data);
+ note("initializing database system by running initdb");
+
+ appendStringInfo(&cmd,
+ "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync",
+ bindir ? bindir : "",
+ bindir ? "/" : "",
+ temp_instance);
+ if (debug)
+ appendStringInfo(&cmd, " --debug");
+ if (nolocale)
+ appendStringInfo(&cmd, " --no-locale");
+ appendStringInfo(&cmd, " > \"%s/log/initdb.log\" 2>&1", outputdir);
+ fflush(NULL);
+ if (system(cmd.data))
+ {
+ bail("initdb failed\n"
+ "# Examine \"%s/log/initdb.log\" for the reason.\n"
+ "# Command was: %s",
+ outputdir, cmd.data);
+ }
+ }
+ else
+ {
+#ifndef WIN32
+ const char *copycmd = "cp -a \"%s\" \"%s/data\"";
+ int expected_exitcode = 0;
+#else
+ const char *copycmd = "robocopy /E /NJS /NJH /NFL /NDL /NP \"%s\" \"%s/data\"";
+ int expected_exitcode = 1; /* 1 denotes files were copied */
+#endif
+
+ note("initializing database system by copying initdb template");
+
+ appendStringInfo(&cmd,
+ copycmd,
+ initdb_template_dir,
+ temp_instance);
+ if (system(cmd.data) != expected_exitcode)
+ {
+ bail("copying of initdb template failed\n"
+ "# Examine \"%s/log/initdb.log\" for the reason.\n"
+ "# Command was: %s",
+ outputdir, cmd.data);
+ }
}
pfree(cmd.data);
diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 0cf7ba77996..e137769850d 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -109,8 +109,9 @@ task:
test_minimal_script: |
su postgres <<-EOF
ulimit -c unlimited
+ meson test $MTEST_ARGS --suite setup
meson test $MTEST_ARGS --num-processes ${TEST_JOBS} \
- tmp_install cube/regress pg_ctl/001_start_stop
+ cube/regress pg_ctl/001_start_stop
EOF
on_failure:
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index df9f721a41a..0b4ca0eb6ae 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -397,30 +397,6 @@ check: temp-install
.PHONY: temp-install
-temp-install: | submake-generated-headers
-ifndef NO_TEMP_INSTALL
-ifneq ($(abs_top_builddir),)
-ifeq ($(MAKELEVEL),0)
- rm -rf '$(abs_top_builddir)'/tmp_install
- $(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
- $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
- $(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
-endif
-endif
-endif
-
-# Tasks to run serially at the end of temp-install. Some EXTRA_INSTALL
-# entries appear more than once in the tree, and parallel installs of the same
-# file can fail with EEXIST.
-checkprep:
- $(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install || exit; done)
-
-PROVE = @PROVE@
-# There are common routines in src/test/perl, and some test suites have
-# extra perl modules in their own directory.
-PG_PROVE_FLAGS = -I $(top_srcdir)/src/test/perl/ -I $(srcdir)
-# User-supplied prove flags such as --verbose can be provided in PROVE_FLAGS.
-PROVE_FLAGS =
# prepend to path if already set, else just set it
define add_to_path
@@ -437,8 +413,36 @@ ld_library_path_var = LD_LIBRARY_PATH
with_temp_install = \
PATH="$(abs_top_builddir)/tmp_install$(bindir):$(CURDIR):$$PATH" \
$(call add_to_path,$(strip $(ld_library_path_var)),$(abs_top_builddir)/tmp_install$(libdir)) \
+ INITDB_TEMPLATE='$(abs_top_builddir)'/tmp_install/initdb-template \
$(with_temp_install_extra)
+temp-install: | submake-generated-headers
+ifndef NO_TEMP_INSTALL
+ifneq ($(abs_top_builddir),)
+ifeq ($(MAKELEVEL),0)
+ rm -rf '$(abs_top_builddir)'/tmp_install
+ $(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
+ $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
+ $(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
+
+ $(with_temp_install) initdb -A trust -N --no-instructions '$(abs_top_builddir)'/tmp_install/initdb-template >>'$(abs_top_builddir)'/tmp_install/log/initdb-template.log 2>&1
+endif
+endif
+endif
+
+# Tasks to run serially at the end of temp-install. Some EXTRA_INSTALL
+# entries appear more than once in the tree, and parallel installs of the same
+# file can fail with EEXIST.
+checkprep:
+ $(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install || exit; done)
+
+PROVE = @PROVE@
+# There are common routines in src/test/perl, and some test suites have
+# extra perl modules in their own directory.
+PG_PROVE_FLAGS = -I $(top_srcdir)/src/test/perl/ -I $(srcdir)
+# User-supplied prove flags such as --verbose can be provided in PROVE_FLAGS.
+PROVE_FLAGS =
+
ifeq ($(enable_tap_tests),yes)
ifndef PGXS
--
2.38.0
Attachment: v3-0008-ci-switch-tasks-to-debugoptimized-build.patch (text/x-diff; charset=us-ascii)
From a30d92a68b4e7a925cc9fccb3c83674f2c6407af Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 17:27:51 -0700
Subject: [PATCH v3 08/10] ci: switch tasks to debugoptimized build
In aggregate the CI tasks burn a lot of cpu hours. Compared to that easy to
read backtraces aren't as important. Still use -ggdb where appropriate, as
that does make backtraces more reliable, particularly in the face of
optimization.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
.cirrus.tasks.yml | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index e137769850d..d1730ce08a8 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -136,7 +136,7 @@ task:
CCACHE_DIR: /tmp/ccache_dir
CPPFLAGS: -DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
- CFLAGS: -Og -ggdb
+ CFLAGS: -ggdb
<<: *freebsd_task_template
@@ -171,7 +171,7 @@ task:
configure_script: |
su postgres <<-EOF
meson setup \
- --buildtype=debug \
+ --buildtype=debugoptimized \
-Dcassert=true -Duuid=bsd -Dtcl_version=tcl86 -Ddtrace=auto \
-DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
-Dextra_lib_dirs=/usr/local/lib -Dextra_include_dirs=/usr/local/include/ \
@@ -266,7 +266,7 @@ task:
ASAN_OPTIONS: print_stacktrace=1:disable_coredump=0:abort_on_error=1:detect_leaks=0
# SANITIZER_FLAGS is set in the tasks below
- CFLAGS: -Og -ggdb -fno-sanitize-recover=all $SANITIZER_FLAGS
+ CFLAGS: -ggdb -fno-sanitize-recover=all $SANITIZER_FLAGS
CXXFLAGS: $CFLAGS
LDFLAGS: $SANITIZER_FLAGS
CC: ccache gcc
@@ -356,7 +356,7 @@ task:
configure_script: |
su postgres <<-EOF
meson setup \
- --buildtype=debug \
+ --buildtype=debugoptimized \
-Dcassert=true \
${LINUX_MESON_FEATURES} \
-DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
@@ -369,7 +369,7 @@ task:
su postgres <<-EOF
export CC='ccache gcc -m32'
meson setup \
- --buildtype=debug \
+ --buildtype=debugoptimized \
-Dcassert=true \
${LINUX_MESON_FEATURES} \
-Dllvm=disabled \
@@ -427,8 +427,8 @@ task:
CC: ccache cc
CXX: ccache c++
- CFLAGS: -Og -ggdb
- CXXFLAGS: -Og -ggdb
+ CFLAGS: -ggdb
+ CXXFLAGS: -ggdb
<<: *macos_task_template
@@ -479,7 +479,7 @@ task:
configure_script: |
export PKG_CONFIG_PATH="/opt/local/lib/pkgconfig/"
meson setup \
- --buildtype=debug \
+ --buildtype=debugoptimized \
-Dextra_include_dirs=/opt/local/include \
-Dextra_lib_dirs=/opt/local/lib \
-Dcassert=true \
--
2.38.0
Attachment: v3-0009-ci-windows-Disabling-write-cache-flushing-during-.patch (text/x-diff; charset=us-ascii)
From df832576932cd3f8dc8e966670eb171353242680 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 16:56:29 -0700
Subject: [PATCH v3 09/10] ci: windows: Disabling write cache flushing during
test
This has been measured to reduce windows test times by about 30s.
---
.cirrus.tasks.yml | 6 ++++++
src/tools/ci/windows_write_cache.ps1 | 20 ++++++++++++++++++++
2 files changed, 26 insertions(+)
create mode 100644 src/tools/ci/windows_write_cache.ps1
diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index d1730ce08a8..360b1c775fd 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -547,6 +547,12 @@ task:
setup_additional_packages_script: |
REM choco install -y --no-progress ...
+ # Define the write cache to be power protected. This reduces the rate of
+ # cache flushes, which seems to help metadata heavy workloads on NTFS. We're
+ # just testing here anyway, so ...
+ change_write_caching_script:
+ - powershell src/tools/ci/windows_write_cache.ps1 2>&1
+
setup_hosts_file_script: |
echo 127.0.0.1 pg-loadbalancetest >> c:\Windows\System32\Drivers\etc\hosts
echo 127.0.0.2 pg-loadbalancetest >> c:\Windows\System32\Drivers\etc\hosts
diff --git a/src/tools/ci/windows_write_cache.ps1 b/src/tools/ci/windows_write_cache.ps1
new file mode 100644
index 00000000000..5c67b3ce54b
--- /dev/null
+++ b/src/tools/ci/windows_write_cache.ps1
@@ -0,0 +1,20 @@
+# Define the write cache to be power protected. This reduces the rate of cache
+# flushes, which seems to help metadata heavy workloads on NTFS. We're just
+# testing here anyway, so ...
+#
+# Let's do so for all disks, this could be useful beyond cirrus-ci.
+
+Set-Location "HKLM:/SYSTEM/CurrentControlSet/Enum/SCSI";
+
+Get-ChildItem -Path "*/*" | foreach-object {
+ Push-Location;
+ cd /$_;
+ pwd;
+ cd 'Device Parameters';
+ if (!(Test-Path -Path "Disk")) {
+ New-Item -Path "Disk";
+ }
+
+ Set-ItemProperty -Path Disk -Type DWord -name CacheIsPowerProtected -Value 1;
+ Pop-Location;
+}
--
2.38.0
Attachment: v3-0010-regress-Check-for-postgres-startup-completion-mor.patch (text/x-diff; charset=us-ascii)
From c7b2c9d9664cc03fc997230c6162c6920181ca85 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 16:51:16 -0700
Subject: [PATCH v3 10/10] regress: Check for postgres startup completion more
often
Previously pg_regress.c checked whether the server had started up only once per
second - in most cases startup is much faster though. Use the same interval as
pg_ctl does.
---
src/test/regress/pg_regress.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 407e3915cec..46af1fddbdb 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -75,6 +75,9 @@ const char *pretty_diff_opts = "-w -U3";
*/
#define TESTNAME_WIDTH 36
+/* how often to recheck if postgres startup completed */
+#define WAITS_PER_SEC 10
+
typedef enum TAPtype
{
DIAG = 0,
@@ -2499,7 +2502,7 @@ regression_main(int argc, char *argv[],
else
wait_seconds = 60;
- for (i = 0; i < wait_seconds; i++)
+ for (i = 0; i < wait_seconds * WAITS_PER_SEC; i++)
{
/* Done if psql succeeds */
fflush(NULL);
@@ -2519,7 +2522,7 @@ regression_main(int argc, char *argv[],
outputdir);
}
- pg_usleep(1000000L);
+ pg_usleep(1000000L / WAITS_PER_SEC);
}
if (i >= wait_seconds)
{
--
2.38.0
On 23 Aug 2023, at 08:58, Andres Freund <andres@anarazel.de> wrote:
I'm hoping to push this fairly soon, as I'll be on vacation the last week of
August. I'll be online intermittently though, if there are issues, I can react
(very limited connectivity for midday Aug 29th - midday Aug 31st though). I'd
appreciate a quick review or two.
I've been reading over these and the thread, and while not within my area of
expertise, nothing really sticks out.
I'll do another pass, but below are a few small comments so far.
I don't know Windows to know the implications, but should the below file have
some sort of warning about not doing that for production/shared systems, only
for dedicated test instances?
+++ b/src/tools/ci/windows_write_cache.ps1
@@ -0,0 +1,20 @@
+# Define the write cache to be power protected. This reduces the rate of cache
+# flushes, which seems to help metadata heavy workloads on NTFS. We're just
+# testing here anyway, so ...
+#
+# Let's do so for all disks, this could be useful beyond cirrus-ci.
One thing in 0010 caught my eye, and while not introduced in this patchset it
might be of interest here. In the below hunks we loop X ticks around
system(psql), with the loop assuming the server can come up really quickly and
sleeping if it doesn't. On my systems I always reach the pg_usleep after
failing the check, but if I reverse the check such it first sleeps and then
checks I only need to check once instead of twice.
@@ -2499,7 +2502,7 @@ regression_main(int argc, char *argv[],
else
wait_seconds = 60;
- for (i = 0; i < wait_seconds; i++)
+ for (i = 0; i < wait_seconds * WAITS_PER_SEC; i++)
{
/* Done if psql succeeds */
fflush(NULL);
@@ -2519,7 +2522,7 @@ regression_main(int argc, char *argv[],
outputdir);
}
- pg_usleep(1000000L);
+ pg_usleep(1000000L / WAITS_PER_SEC);
}
if (i >= wait_seconds)
{
It's a micro-optimization, but if we're changing things here to chase cycles it
might perhaps be worth doing?
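The effect Daniel describes can be demonstrated with a small standalone model of the wait loop (hypothetical Python sketch, not the actual pg_regress C code): with a server that comes up between the first and second tick, probing first costs exactly one extra failed probe compared to sleeping first.

```python
import time

def wait_for_ready(is_ready, ticks, tick_s, sleep_first):
    """Model of the pg_regress startup-wait loop; returns the probe count."""
    probes = 0
    for _ in range(ticks):
        if sleep_first:
            time.sleep(tick_s)
        probes += 1
        if is_ready():
            return probes
        if not sleep_first:
            time.sleep(tick_s)
    return probes

def server_ready_after(delay_s):
    """Simulate a server that finishes starting up after delay_s seconds."""
    start = time.monotonic()
    return lambda: time.monotonic() - start >= delay_s

# Server comes up after ~0.3s; the loop ticks every 0.2s.
check_first = wait_for_ready(server_ready_after(0.3), 600, 0.2, sleep_first=False)
sleep_first = wait_for_ready(server_ready_after(0.3), 600, 0.2, sleep_first=True)
print(check_first, sleep_first)  # typically 3 2
```

The trade-off is that sleeping first always pays one full tick even when the server is already up, which matters less once the tick is shortened to 1/WAITS_PER_SEC of a second as in 0010.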
--
Daniel Gustafsson
Hi,
On 2023-08-23 14:48:26 +0200, Daniel Gustafsson wrote:
On 23 Aug 2023, at 08:58, Andres Freund <andres@anarazel.de> wrote:
I'm hoping to push this fairly soon, as I'll be on vacation the last week of
August. I'll be online intermittently though, if there are issues, I can react
(very limited connectivity for midday Aug 29th - midday Aug 31st though). I'd
appreciate a quick review or two.
I've been reading over these and the thread, and while not within my area of
expertise, nothing really sticks out.
Thanks!
I'll do another pass, but below are a few small comments so far.
I don't know Windows well enough to know the implications, but should the below file have
some sort of warning about not doing that for production/shared systems, only
for dedicated test instances?
Ah, I should have explained that: I'm not planning to apply
- regress: Check for postgres startup completion more often
- ci: windows: Disabling write cache flushing during test
right now. Compared to the other patches the wins are much smaller and/or more
work is needed to make them good.
I think it might be worth going for
- ci: switch tasks to debugoptimized build
because that provides a fair bit of gain. But it might be more hurtful than
helpful due to costing more when ccache doesn't work...
+++ b/src/tools/ci/windows_write_cache.ps1
@@ -0,0 +1,20 @@
+# Define the write cache to be power protected. This reduces the rate of cache
+# flushes, which seems to help metadata heavy workloads on NTFS. We're just
+# testing here anyway, so ...
+#
+# Let's do so for all disks, this could be useful beyond cirrus-ci.
One thing in 0010 caught my eye, and while not introduced in this patchset it
might be of interest here. In the below hunks we loop X ticks around
system(psql), with the loop assuming the server can come up really quickly and
sleeping if it doesn't. On my systems I always reach the pg_usleep after
failing the check, but if I reverse the check such that it first sleeps and then
checks I only need to check once instead of twice.
I think there's more effective ways to make this cheaper. The basic thing
would be to use libpq instead of forking off psql to make a connection
check. Medium term, I think we should invent a way for pg_ctl and other
tooling (including pg_regress) to wait for the service to come up. E.g. having
a named pipe that postmaster opens once the server is up, which should allow
multiple clients to use select/epoll/... to wait for it without looping.
ISTM making pg_regress use libpq w/ PQping() should be a pretty simple patch?
The non-polling approach obviously is even better, but also requires more
thought (and documentation and ...).
@@ -2499,7 +2502,7 @@ regression_main(int argc, char *argv[],
else
wait_seconds = 60;
- for (i = 0; i < wait_seconds; i++)
+ for (i = 0; i < wait_seconds * WAITS_PER_SEC; i++)
{
/* Done if psql succeeds */
fflush(NULL);
@@ -2519,7 +2522,7 @@ regression_main(int argc, char *argv[],
outputdir);
}
- pg_usleep(1000000L);
+ pg_usleep(1000000L / WAITS_PER_SEC);
}
if (i >= wait_seconds)
{
It's a micro-optimization, but if we're changing things here to chase cycles it
might perhaps be worth doing?
I wouldn't quite call not waiting for 1s for the server to start, when it does
so within a few ms, chasing cycles ;). For short tests it's a substantial
fraction of the overall runtime...
Greetings,
Andres Freund
On 23 Aug 2023, at 21:22, Andres Freund <andres@anarazel.de> wrote:
On 2023-08-23 14:48:26 +0200, Daniel Gustafsson wrote:
I'll do another pass, but below are a few small comments so far.
I don't know Windows well enough to know the implications, but should the below file have
some sort of warning about not doing that for production/shared systems, only
for dedicated test instances?
Ah, I should have explained that: I'm not planning to apply
- regress: Check for postgres startup completion more often
- ci: windows: Disabling write cache flushing during test
right now. Compared to the other patches the wins are much smaller and/or more
work is needed to make them good.
I think it might be worth going for
- ci: switch tasks to debugoptimized build
because that provides a fair bit of gain. But it might be more hurtful than
helpful due to costing more when ccache doesn't work...
Gotcha.
+++ b/src/tools/ci/windows_write_cache.ps1
@@ -0,0 +1,20 @@
+# Define the write cache to be power protected. This reduces the rate of cache
+# flushes, which seems to help metadata heavy workloads on NTFS. We're just
+# testing here anyway, so ...
+#
+# Let's do so for all disks, this could be useful beyond cirrus-ci.
One thing in 0010 caught my eye, and while not introduced in this patchset it
might be of interest here. In the below hunks we loop X ticks around
system(psql), with the loop assuming the server can come up really quickly and
sleeping if it doesn't. On my systems I always reach the pg_usleep after
failing the check, but if I reverse the check such that it first sleeps and then
checks I only need to check once instead of twice.
I think there's more effective ways to make this cheaper. The basic thing
would be to use libpq instead of forking off psql to make a connection
check.
I had it in my head that not using libpq in pg_regress was a deliberate choice,
but I fail to find a reference to it in the archives.
It's a micro-optimization, but if we're changing things here to chase cycles it
might perhaps be worth doing?
I wouldn't quite call not waiting for 1s for the server to start, when it does
so within a few ms, chasing cycles ;). For short tests it's a substantial
fraction of the overall runtime...
Absolutely, I was referring to shifting the sleep before the test to avoid the
extra test, not the reduction of the pg_usleep. Reducing the sleep is a clear
win.
--
Daniel Gustafsson
Hi,
Thanks for the patch!
On Wed, 23 Aug 2023 at 09:58, Andres Freund <andres@anarazel.de> wrote:
I'm hoping to push this fairly soon, as I'll be on vacation the last week of
August. I'll be online intermittently though, if there are issues, I can react
(very limited connectivity for midday Aug 29th - midday Aug 31st though). I'd
appreciate a quick review or two.
Patch looks good to me besides some minor points.
v3-0004-ci-Prepare-to-make-compute-resources-for-CI-confi.patch:
diff --git a/.cirrus.star b/.cirrus.star
+ """The main function is executed by cirrus-ci after loading
.cirrus.yml and can
+ extend the CI definition further.
+
+ As documented in .cirrus.yml, the final CI configuration is composed of
+
+ 1) the contents of this file
Instead of '1) the contents of this file' comment, '1) the contents
of .cirrus.yml file' could be better since this comment appears in
.cirrus.star file.
+ if repo_config_url != None:
+ print("loading additional configuration from
\"{}\"".format(repo_config_url))
+ output += config_from(repo_config_url)
+ else:
+ output += "n# REPO_CI_CONFIG_URL was not set\n"
Possible typo at output += "n# REPO_CI_CONFIG_URL was not set\n".
v3-0008-ci-switch-tasks-to-debugoptimized-build.patch:
Just thinking of possible optimizations and thought can't we create
something like 'buildtype: xxx' to override default buildtype using
.cirrus.star? This could be better for PG developers. For sure that
could be the subject of another patch.
Regards,
Nazir Bilal Yavuz
Microsoft
Daniel Gustafsson <daniel@yesql.se> writes:
On 23 Aug 2023, at 21:22, Andres Freund <andres@anarazel.de> wrote:
I think there's more effective ways to make this cheaper. The basic thing
would be to use libpq instead of forking off psql to make a connection
check.
I had it in my head that not using libpq in pg_regress was a deliberate choice,
but I fail to find a reference to it in the archives.
I have a vague feeling that you are right about that. Perhaps the
concern was that under "make installcheck", pg_regress might be
using a build-tree copy of libpq rather than the one from the
system under test. As long as we're just trying to ping the server,
that shouldn't matter too much I think ... unless we hit problems
with, say, a different default port number or socket path compiled into
one copy vs. the other? That seems like it's probably a "so don't
do that" case, though.
regards, tom lane
On 23 Aug 2023, at 23:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
On 23 Aug 2023, at 21:22, Andres Freund <andres@anarazel.de> wrote:
I think there's more effective ways to make this cheaper. The basic thing
would be to use libpq instead of forking off psql to make a connection
check.
I had it in my head that not using libpq in pg_regress was a deliberate choice,
but I fail to find a reference to it in the archives.
I have a vague feeling that you are right about that. Perhaps the
concern was that under "make installcheck", pg_regress might be
using a build-tree copy of libpq rather than the one from the
system under test. As long as we're just trying to ping the server,
that shouldn't matter too much I think ... unless we hit problems
with, say, a different default port number or socket path compiled into
one copy vs. the other? That seems like it's probably a "so don't
do that" case, though.
Ah yes, that does ring a familiar bell. I agree that using it for pinging the
server should be safe either way, but we should document the use-with-caution
in pg_regress.c if/when we go down that path. I'll take a stab at changing the
psql retry loop for pinging tomorrow to see what it would look like.
--
Daniel Gustafsson
Hi,
On 2023-08-23 17:02:51 -0400, Tom Lane wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
On 23 Aug 2023, at 21:22, Andres Freund <andres@anarazel.de> wrote:
I think there's more effective ways to make this cheaper. The basic thing
would be to use libpq instead of forking off psql to make a connection
check.
I had it in my head that not using libpq in pg_regress was a deliberate choice,
but I fail to find a reference to it in the archives.
I have a vague feeling that you are right about that. Perhaps the
concern was that under "make installcheck", pg_regress might be
using a build-tree copy of libpq rather than the one from the
system under test. As long as we're just trying to ping the server,
that shouldn't matter too much I think
Or perhaps the opposite? That an installcheck pg_regress run might use the
system libpq, which doesn't have the symbols, or such?
Either way, with a function like PQping(), which has existed since well before
the oldest supported branches, that shouldn't be an issue?
... unless we hit problems with, say, a different default port number or
socket path compiled into one copy vs. the other? That seems like it's
probably a "so don't do that" case, though.
If we were to find such a case, it seems we could just add whatever missing
parameter to the connection string? I think we would likely already hit such
problems though, the psql started by an installcheck pg_regress might use the
system libpq, I think?
Greetings,
Andres Freund
Hi,
On 2023-08-23 23:55:15 +0300, Nazir Bilal Yavuz wrote:
On Wed, 23 Aug 2023 at 09:58, Andres Freund <andres@anarazel.de> wrote:
I'm hoping to push this fairly soon, as I'll be on vacation the last week of
August. I'll be online intermittently though, if there are issues, I can react
(very limited connectivity for midday Aug 29th - midday Aug 31st though). I'd
appreciate a quick review or two.
Patch looks good to me besides some minor points.
Thanks for looking!
v3-0004-ci-Prepare-to-make-compute-resources-for-CI-confi.patch:
diff --git a/.cirrus.star b/.cirrus.star
+ """The main function is executed by cirrus-ci after loading
.cirrus.yml and can
+ extend the CI definition further.
+
+ As documented in .cirrus.yml, the final CI configuration is composed of
+
+ 1) the contents of this file
Instead of '1) the contents of this file' comment, '1) the contents
of .cirrus.yml file' could be better since this comment appears in
.cirrus.star file.
Good catch.
+ if repo_config_url != None:
+ print("loading additional configuration from
\"{}\"".format(repo_config_url))
+ output += config_from(repo_config_url)
+ else:
+ output += "n# REPO_CI_CONFIG_URL was not set\n"
Possible typo at output += "n# REPO_CI_CONFIG_URL was not set\n".
Fixed.
v3-0008-ci-switch-tasks-to-debugoptimized-build.patch:
Just thinking of possible optimizations and thought can't we create
something like 'buildtype: xxx' to override default buildtype using
.cirrus.star? This could be better for PG developers. For sure that
could be the subject of another patch.
We could, but I'm not sure what the use would be?
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2023-08-23 17:02:51 -0400, Tom Lane wrote:
... unless we hit problems with, say, a different default port number or
socket path compiled into one copy vs. the other? That seems like it's
probably a "so don't do that" case, though.
If we were to find such a case, it seems we could just add whatever missing
parameter to the connection string? I think we would likely already hit such
problems though, the psql started by an installcheck pg_regress might use the
system libpq, I think?
The trouble with that approach is that in "make installcheck", we
don't really want to assume we know what the installed libpq's default
connection parameters are. So we don't explicitly know where that
libpq will connect.
As I said, we might be able to start treating installed-libpq-not-
compatible-with-build as a "don't do it" case. Another idea is to try
to ensure that pg_regress uses the same libpq that the psql-under-test
does; but I'm not sure how to implement that.
regards, tom lane
Hi,
On 2023-08-23 17:55:53 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2023-08-23 17:02:51 -0400, Tom Lane wrote:
... unless we hit problems with, say, a different default port number or
socket path compiled into one copy vs. the other? That seems like it's
probably a "so don't do that" case, though.
If we were to find such a case, it seems we could just add whatever missing
parameter to the connection string? I think we would likely already hit such
problems though, the psql started by an installcheck pg_regress might use the
system libpq, I think?
The trouble with that approach is that in "make installcheck", we
don't really want to assume we know what the installed libpq's default
connection parameters are. So we don't explicitly know where that
libpq will connect.
Stepping back: I don't think installcheck matters for the concrete use of
libpq we're discussing - the only time we wait for server startup is the
non-installcheck case.
There are other potential uses for libpq in pg_regress though - I'd e.g. like
to have a "monitoring" session open, which we could use to detect that the
server crashed (by waiting for the FD to become invalid). Where the
connection default issue could matter more?
I was wondering if we could create an unambiguous connection info, but that
seems like it'd be hard to do, without creating cross version hazards.
As I said, we might be able to start treating installed-libpq-not-
compatible-with-build as a "don't do it" case. Another idea is to try
to ensure that pg_regress uses the same libpq that the psql-under-test
does; but I'm not sure how to implement that.
I don't think that's likely to work, psql could use a libpq with a different
soversion. We could dlopen() libpq, etc, but that seems way too complicated.
What's the reason we don't force psql to come from the same build as
pg_regress?
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2023-08-23 17:55:53 -0400, Tom Lane wrote:
The trouble with that approach is that in "make installcheck", we
don't really want to assume we know what the installed libpq's default
connection parameters are. So we don't explicitly know where that
libpq will connect.
Stepping back: I don't think installcheck matters for the concrete use of
libpq we're discussing - the only time we wait for server startup is the
non-installcheck case.
Oh, that's an excellent point. So for the immediately proposed use-case,
there's no issue. (We don't have a mode where we try to start a server
using already-installed executables.)
There are other potential uses for libpq in pg_regress though - I'd e.g. like
to have a "monitoring" session open, which we could use to detect that the
server crashed (by waiting for the FD to become invalid). Where the
connection default issue could matter more?
Meh. I don't find that idea compelling enough to justify adding
restrictions on what test scenarios will work. It's seldom hard to
tell from the test output whether the server crashed.
I was wondering if we could create an unambiguous connection info, but that
seems like it'd be hard to do, without creating cross version hazards.
Hmm, we don't expect the regression test suite to work against other
server versions, so maybe that could be made to work --- that is, we
could run the psql under test and get a full set of connection
parameters out of it? But I'm still not finding this worth the
trouble.
What's the reason we don't force psql to come from the same build as
pg_regress?
Because the point of installcheck is to check the installed binaries
--- including the installed psql and libpq.
(Thinks for a bit...) Maybe we should add pg_regress to the installed
fileset, and use that copy not the in-tree copy for installcheck?
Then we could assume it's using the same libpq as psql. IIRC there
have already been suggestions to do that for the benefit of PGXS
testing.
regards, tom lane
Hi,
On 2023-08-23 18:32:26 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
There are other potential uses for libpq in pg_regress though - I'd e.g. like
to have a "monitoring" session open, which we could use to detect that the
server crashed (by waiting for the FD to become invalid). Where the
connection default issue could matter more?
Meh. I don't find that idea compelling enough to justify adding
restrictions on what test scenarios will work. It's seldom hard to
tell from the test output whether the server crashed.
I find it pretty painful to wade through a several-megabyte regression.diffs
to find the cause of a crash. I think we ought to use
restart_after_crash=false, since after a crash there's no hope for the tests
to succeed, but even in that case, we end up with a lot of pointless contents
in regression.diffs. If we instead realized that we shouldn't start further
tests, we'd limit that by a fair bit.
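For reference, the setting mentioned above is a single line in the test cluster's postgresql.conf (sketched here; pg_regress would have to add it to the temp instance's configuration):

```ini
# Shut the cluster down after a backend crash instead of restarting it,
# so subsequent tests fail fast rather than piling pointless diffs into
# regression.diffs.
restart_after_crash = off
```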
Greetings,
Andres Freund
On 24.08.23 00:56, Andres Freund wrote:
Hi,
On 2023-08-23 18:32:26 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
There are other potential uses for libpq in pg_regress though - I'd e.g. like
to have a "monitoring" session open, which we could use to detect that the
server crashed (by waiting for the FD to become invalid). Where the
connection default issue could matter more?
Meh. I don't find that idea compelling enough to justify adding
restrictions on what test scenarios will work. It's seldom hard to
tell from the test output whether the server crashed.
I find it pretty painful to wade through a several-megabyte regression.diffs
to find the cause of a crash. I think we ought to use
restart_after_crash=false, since after a crash there's no hope for the tests
to succeed, but even in that case, we end up with a lot of pointless contents
in regression.diffs. If we instead realized that we shouldn't start further
tests, we'd limit that by a fair bit.
I once coded it up so that if the server crashes during a test, it would
wait until it recovers before running the next test. I found that
useful. I agree the current behavior is not useful in any case.
Hi,
On Thu, 24 Aug 2023 at 00:48, Andres Freund <andres@anarazel.de> wrote:
v3-0008-ci-switch-tasks-to-debugoptimized-build.patch:
Just thinking of possible optimizations and thought can't we create
something like 'buildtype: xxx' to override default buildtype using
.cirrus.star? This could be better for PG developers. For sure that
could be the subject of another patch.
We could, but I'm not sure what the use would be?
My main idea behind this was that PG developers could choose
'buildtype: debug' while working on their patches and that
optimization makes it easier to choose the buildtype.
Regards,
Nazir Bilal Yavuz
Microsoft
Hi,
On 2023-08-22 23:58:33 -0700, Andres Freund wrote:
To make that possible, we need to make the compute resources for CI
configurable on a per-repository basis. After experimenting with a bunch of
ways to do that, I got stuck on that for a while. But as of today we have
sufficient macos runners for cfbot available, so... I think the approach I
finally settled on is decent, although not great. It's described in the "main"
commit message:
[...]
ci: Prepare to make compute resources for CI configurable
I'm hoping to push this fairly soon, as I'll be on vacation the last week of
August. I'll be online intermittently though, if there are issues, I can react
(very limited connectivity for midday Aug 29th - midday Aug 31st though). I'd
appreciate a quick review or two.
I've pushed this yesterday.
And then utilized it to make cfbot use
1) macos persistent workers, hosted by two community members
2) our own GCP account for all the other operating systems
There were a few issues initially (needed to change how to run multiple jobs
on a single mac, and it looks like there were some issues with macos going to
sleep while processing jobs...). But it now seems to be chugging along ok.
One of the nice things is that with our own compute we also control how much
storage can be used, making things like generating docs or code coverage as
part of cfbot more realistic. And we could enable mingw by default when run as
part of cfbot...
Greetings,
Andres Freund
On 23 Aug 2023, at 23:12, Daniel Gustafsson <daniel@yesql.se> wrote:
On 23 Aug 2023, at 23:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
On 23 Aug 2023, at 21:22, Andres Freund <andres@anarazel.de> wrote:
I think there's more effective ways to make this cheaper. The basic thing
would be to use libpq instead of forking off psql to make a connection
check.
I had it in my head that not using libpq in pg_regress was a deliberate choice,
but I fail to find a reference to it in the archives.
I have a vague feeling that you are right about that. Perhaps the
concern was that under "make installcheck", pg_regress might be
using a build-tree copy of libpq rather than the one from the
system under test. As long as we're just trying to ping the server,
that shouldn't matter too much I think ... unless we hit problems
with, say, a different default port number or socket path compiled into
one copy vs. the other? That seems like it's probably a "so don't
do that" case, though.Ah yes, that does ring a familiar bell. I agree that using it for pinging the
server should be safe either way, but we should document the use-with-caution
in pg_regress.c if/when we go down that path. I'll take a stab at changing the
psql retry loop for pinging tomorrow to see what it would look like.
Attached is a patch with a quick PoC for using PQping instead of using psql for
connection checks in pg_regress. In order to see performance it also includes
a diag output for "Time to first test" which contains all setup costs. This
might not make it into a commit but it was quite helpful in hacking so I left
it in for now.
The patch incorporates Andres' idea for finer granularity of checks by checking
TICKS times per second rather than once per second, it also shifts the
pg_usleep around to require just one ping in most cases compared to two today.
On my relatively tired laptop this speeds up pg_regress setup by 100+ms, with
much bigger wins on Windows in the CI. While it does add a dependency on
libpq, I think it's a fairly decent price to pay for running tests faster.
--
Daniel Gustafsson
Attachments:
v1-0001-Speed-up-pg_regress-server-testing.patchapplication/octet-stream; name=v1-0001-Speed-up-pg_regress-server-testing.patch; x-unix-mode=0644Download
From eb71567159e488e9f6ba5596f18939dfb83b9078 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Mon, 28 Aug 2023 10:54:25 +0200
Subject: [PATCH v1] Speed up pg_regress server testing
Instead of connecting to the server with psql to check if it is ready
for running tests, pg_regress now uses PQping, which avoids performing
expensive system() calls on Windows.
The frequency of checks is also increased in order to connect to the
server faster.
This patch is part of a larger effort to make testing consume fewer
resources in order to be able to fit more tests into the available
CI constraints.
Discussion: https://postgr.es/m/20230823192239.jxew5s3sjru63lio@awork3.anarazel.de
---
src/interfaces/ecpg/test/Makefile | 3 +-
src/interfaces/ecpg/test/meson.build | 2 +-
src/test/isolation/Makefile | 2 +-
src/test/isolation/meson.build | 2 +-
src/test/regress/GNUmakefile | 4 +-
src/test/regress/meson.build | 2 +-
src/test/regress/pg_regress.c | 87 ++++++++++++++++++++--------
7 files changed, 70 insertions(+), 32 deletions(-)
diff --git a/src/interfaces/ecpg/test/Makefile b/src/interfaces/ecpg/test/Makefile
index cf841a3a5b..ba6ca837b3 100644
--- a/src/interfaces/ecpg/test/Makefile
+++ b/src/interfaces/ecpg/test/Makefile
@@ -10,6 +10,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := \
'-I$(top_builddir)/src/port' \
'-I$(top_srcdir)/src/test/regress' \
+ '-I$(libpq_srcdir)' \
'-DHOST_TUPLE="$(host_tuple)"' \
'-DSHELLPROG="$(SHELL)"' \
$(CPPFLAGS)
@@ -45,7 +46,7 @@ clean distclean maintainer-clean:
all: pg_regress$(X)
pg_regress$(X): pg_regress_ecpg.o $(WIN32RES) $(top_builddir)/src/test/regress/pg_regress.o
- $(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
+ $(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
$(top_builddir)/src/test/regress/pg_regress.o:
$(MAKE) -C $(dir $@) $(notdir $@)
diff --git a/src/interfaces/ecpg/test/meson.build b/src/interfaces/ecpg/test/meson.build
index 04c6819a79..b7a3fb4e0e 100644
--- a/src/interfaces/ecpg/test/meson.build
+++ b/src/interfaces/ecpg/test/meson.build
@@ -18,7 +18,7 @@ pg_regress_ecpg = executable('pg_regress_ecpg',
pg_regress_ecpg_sources,
c_args: pg_regress_cflags,
include_directories: [pg_regress_inc, include_directories('.')],
- dependencies: [frontend_code],
+ dependencies: [frontend_code, libpq],
kwargs: default_bin_args + {
'install': false
},
diff --git a/src/test/isolation/Makefile b/src/test/isolation/Makefile
index b8738b7c1b..e99602ae52 100644
--- a/src/test/isolation/Makefile
+++ b/src/test/isolation/Makefile
@@ -38,7 +38,7 @@ pg_regress.o: | submake-regress
rm -f $@ && $(LN_S) $(top_builddir)/src/test/regress/pg_regress.o .
pg_isolation_regress$(X): isolation_main.o pg_regress.o $(WIN32RES)
- $(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
+ $(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
isolationtester$(X): $(OBJS) | submake-libpq submake-libpgport
$(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
diff --git a/src/test/isolation/meson.build b/src/test/isolation/meson.build
index a4439e8ad0..e6ebe62c1e 100644
--- a/src/test/isolation/meson.build
+++ b/src/test/isolation/meson.build
@@ -35,7 +35,7 @@ pg_isolation_regress = executable('pg_isolation_regress',
isolation_sources,
c_args: pg_regress_cflags,
include_directories: pg_regress_inc,
- dependencies: frontend_code,
+ dependencies: [frontend_code, libpq],
kwargs: default_bin_args + {
'install_dir': dir_pgxs / 'src/test/isolation',
},
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index 38c3a1f85b..db0687f867 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -36,11 +36,11 @@ EXTRADEFS = '-DHOST_TUPLE="$(host_tuple)"' \
all: pg_regress$(X)
pg_regress$(X): pg_regress.o pg_regress_main.o $(WIN32RES) | submake-libpgport
- $(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
+ $(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
# dependencies ensure that path changes propagate
pg_regress.o: pg_regress.c $(top_builddir)/src/port/pg_config_paths.h
-pg_regress.o: override CPPFLAGS += -I$(top_builddir)/src/port $(EXTRADEFS)
+pg_regress.o: override CPPFLAGS += -I$(top_builddir)/src/port -I$(libpq_srcdir) $(EXTRADEFS)
# note: because of the submake dependency, this rule's action is really a no-op
$(top_builddir)/src/port/pg_config_paths.h: | submake-libpgport
diff --git a/src/test/regress/meson.build b/src/test/regress/meson.build
index a045c00c1f..f0dfd85591 100644
--- a/src/test/regress/meson.build
+++ b/src/test/regress/meson.build
@@ -30,7 +30,7 @@ endif
pg_regress = executable('pg_regress',
regress_sources,
c_args: pg_regress_cflags,
- dependencies: [frontend_code],
+ dependencies: [frontend_code, libpq],
kwargs: default_bin_args + {
'install_dir': dir_pgxs / 'src/test/regress',
},
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index ec67588cf5..601fbb33d3 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -32,6 +32,7 @@
#include "common/username.h"
#include "getopt_long.h"
#include "lib/stringinfo.h"
+#include "libpq-fe.h"
#include "libpq/pqcomm.h" /* needed for UNIXSOCK_PATH() */
#include "pg_config_paths.h"
#include "pg_regress.h"
@@ -75,6 +76,12 @@ const char *pretty_diff_opts = "-w -U3";
*/
#define TESTNAME_WIDTH 36
+/*
+ * The number of times per second that pg_regress checks to see if the test
+ * server instance has started and is available for connection.
+ */
+#define WAIT_TICKS_PER_SECOND 20
+
typedef enum TAPtype
{
DIAG = 0,
@@ -107,6 +114,7 @@ static bool nolocale = false;
static bool use_existing = false;
static char *hostname = NULL;
static int port = -1;
+static char portstr[16];
static bool port_specified_by_user = false;
static char *dlpath = PKGLIBDIR;
static char *user = NULL;
@@ -2107,7 +2115,10 @@ regression_main(int argc, char *argv[],
int i;
int option_index;
char buf[MAXPGPATH * 4];
- char buf2[MAXPGPATH * 4];
+ instr_time starttime;
+ instr_time stoptime;
+
+ INSTR_TIME_SET_CURRENT(starttime);
pg_logging_init(argv[0]);
progname = get_progname(argv[0]);
@@ -2296,6 +2307,8 @@ regression_main(int argc, char *argv[],
const char *env_wait;
int wait_seconds;
const char *initdb_template_dir;
+ const char *keywords[4];
+ const char *values[4];
/*
* Prepare the temp instance
@@ -2435,22 +2448,30 @@ regression_main(int argc, char *argv[],
}
#endif
+ sprintf(portstr, "%d", port);
+
/*
- * Check if there is a postmaster running already.
+ * Prepare the connection params for checking the state of the server
+ * before starting the tests.
*/
- snprintf(buf2, sizeof(buf2),
- "\"%s%spsql\" -X postgres <%s 2>%s",
- bindir ? bindir : "",
- bindir ? "/" : "",
- DEVNULL, DEVNULL);
+ keywords[0] = "dbname";
+ values[0] = "postgres";
+ keywords[1] = "port";
+ values[1] = portstr;
+ keywords[2] = "host";
+ values[2] = hostname ? hostname : sockdir;
+ keywords[3] = NULL;
+ values[3] = NULL;
+ /*
+ * Check if there is a postmaster running already.
+ */
for (i = 0; i < 16; i++)
{
- fflush(NULL);
- if (system(buf2) == 0)
- {
- char s[16];
+ PGPing rv = PQpingParams(keywords, values, 1);
+ if (rv == PQPING_OK)
+ {
if (port_specified_by_user || i == 15)
{
note("port %d apparently in use", port);
@@ -2461,8 +2482,8 @@ regression_main(int argc, char *argv[],
note("port %d apparently in use, trying %d", port, port + 1);
port++;
- sprintf(s, "%d", port);
- setenv("PGPORT", s, 1);
+ sprintf(portstr, "%d", port);
+ setenv("PGPORT", portstr, 1);
}
else
break;
@@ -2485,11 +2506,11 @@ regression_main(int argc, char *argv[],
bail("could not spawn postmaster: %s", strerror(errno));
/*
- * Wait till postmaster is able to accept connections; normally this
- * is only a second or so, but Cygwin is reportedly *much* slower, and
- * test builds using Valgrind or similar tools might be too. Hence,
- * allow the default timeout of 60 seconds to be overridden from the
- * PGCTLTIMEOUT environment variable.
+ * Wait till postmaster is able to accept connections; normally takes
+ * only a fraction of a second or so, but Cygwin is reportedly *much*
+ * slower, and test builds using Valgrind or similar tools might be
+ * too. Hence, allow the default timeout of 60 seconds to be
+ * overridden from the PGCTLTIMEOUT environment variable.
*/
env_wait = getenv("PGCTLTIMEOUT");
if (env_wait != NULL)
@@ -2501,13 +2522,24 @@ regression_main(int argc, char *argv[],
else
wait_seconds = 60;
- for (i = 0; i < wait_seconds; i++)
+ for (i = 0; i < wait_seconds * WAIT_TICKS_PER_SECOND; i++)
{
- /* Done if psql succeeds */
- fflush(NULL);
- if (system(buf2) == 0)
+ /*
+ * It's fairly unlikely that the server is responding immediately,
+ * so we start by sleeping before checking, rather than the other
+ * way around.
+ */
+ pg_usleep(1000000L / WAIT_TICKS_PER_SECOND);
+
+ PGPing rv = PQpingParams(keywords, values, 1);
+
+ /* Done if the server is running and accepts connections */
+ if (rv == PQPING_OK)
break;
+ if (rv == PQPING_NO_ATTEMPT)
+ bail("attempting to connect to postmaster failed");
+
/*
* Fail immediately if postmaster has exited
*/
@@ -2520,10 +2552,8 @@ regression_main(int argc, char *argv[],
bail("postmaster failed, examine \"%s/log/postmaster.log\" for the reason",
outputdir);
}
-
- pg_usleep(1000000L);
}
- if (i >= wait_seconds)
+ if (i >= wait_seconds * WAIT_TICKS_PER_SECOND)
{
diag("postmaster did not respond within %d seconds, examine \"%s/log/postmaster.log\" for the reason",
wait_seconds, outputdir);
@@ -2582,6 +2612,13 @@ regression_main(int argc, char *argv[],
create_role(sl->str, dblist);
}
+ /*
+ * Report how much time we spent during instance setup.
+ */
+ INSTR_TIME_SET_CURRENT(stoptime);
+ INSTR_TIME_SUBTRACT(stoptime, starttime);
+ diag("Time to first test: %.0f ms", INSTR_TIME_GET_MILLISEC(stoptime));
+
/*
* Ready to run the tests
*/
--
2.32.1 (Apple Git-133)
On 28 Aug 2023, at 14:32, Daniel Gustafsson <daniel@yesql.se> wrote:
Attached is a patch with a quick PoC for using PQping() instead of psql for
connection checks in pg_regress.
The attached v2 fixes a silly mistake which led to a compiler warning.
--
Daniel Gustafsson
Attachments:
v2-0001-Speed-up-pg_regress-server-testing.patch
From 8b4fb9ea81b7e4eed8d0269962ed5aeb744f5684 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Mon, 28 Aug 2023 10:54:25 +0200
Subject: [PATCH v2] Speed up pg_regress server testing
Instead of connecting to the server with psql to check if it is ready
for running tests, pg_regress now uses PQping(), which avoids performing
expensive system() calls on Windows.
The frequency of the connection checks is also increased in order to
detect a ready server faster.
This patch is part of a larger effort to make testing consume fewer
resources in order to be able to fit more tests into the available
CI constraints.
Discussion: https://postgr.es/m/20230823192239.jxew5s3sjru63lio@awork3.anarazel.de
---
src/interfaces/ecpg/test/Makefile | 3 +-
src/interfaces/ecpg/test/meson.build | 2 +-
src/test/isolation/Makefile | 2 +-
src/test/isolation/meson.build | 2 +-
src/test/regress/GNUmakefile | 4 +-
src/test/regress/meson.build | 2 +-
src/test/regress/pg_regress.c | 88 ++++++++++++++++++++--------
7 files changed, 71 insertions(+), 32 deletions(-)
diff --git a/src/interfaces/ecpg/test/Makefile b/src/interfaces/ecpg/test/Makefile
index cf841a3a5b..ba6ca837b3 100644
--- a/src/interfaces/ecpg/test/Makefile
+++ b/src/interfaces/ecpg/test/Makefile
@@ -10,6 +10,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := \
'-I$(top_builddir)/src/port' \
'-I$(top_srcdir)/src/test/regress' \
+ '-I$(libpq_srcdir)' \
'-DHOST_TUPLE="$(host_tuple)"' \
'-DSHELLPROG="$(SHELL)"' \
$(CPPFLAGS)
@@ -45,7 +46,7 @@ clean distclean maintainer-clean:
all: pg_regress$(X)
pg_regress$(X): pg_regress_ecpg.o $(WIN32RES) $(top_builddir)/src/test/regress/pg_regress.o
- $(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
+ $(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
$(top_builddir)/src/test/regress/pg_regress.o:
$(MAKE) -C $(dir $@) $(notdir $@)
diff --git a/src/interfaces/ecpg/test/meson.build b/src/interfaces/ecpg/test/meson.build
index 04c6819a79..b7a3fb4e0e 100644
--- a/src/interfaces/ecpg/test/meson.build
+++ b/src/interfaces/ecpg/test/meson.build
@@ -18,7 +18,7 @@ pg_regress_ecpg = executable('pg_regress_ecpg',
pg_regress_ecpg_sources,
c_args: pg_regress_cflags,
include_directories: [pg_regress_inc, include_directories('.')],
- dependencies: [frontend_code],
+ dependencies: [frontend_code, libpq],
kwargs: default_bin_args + {
'install': false
},
diff --git a/src/test/isolation/Makefile b/src/test/isolation/Makefile
index b8738b7c1b..e99602ae52 100644
--- a/src/test/isolation/Makefile
+++ b/src/test/isolation/Makefile
@@ -38,7 +38,7 @@ pg_regress.o: | submake-regress
rm -f $@ && $(LN_S) $(top_builddir)/src/test/regress/pg_regress.o .
pg_isolation_regress$(X): isolation_main.o pg_regress.o $(WIN32RES)
- $(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
+ $(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
isolationtester$(X): $(OBJS) | submake-libpq submake-libpgport
$(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
diff --git a/src/test/isolation/meson.build b/src/test/isolation/meson.build
index a4439e8ad0..e6ebe62c1e 100644
--- a/src/test/isolation/meson.build
+++ b/src/test/isolation/meson.build
@@ -35,7 +35,7 @@ pg_isolation_regress = executable('pg_isolation_regress',
isolation_sources,
c_args: pg_regress_cflags,
include_directories: pg_regress_inc,
- dependencies: frontend_code,
+ dependencies: [frontend_code, libpq],
kwargs: default_bin_args + {
'install_dir': dir_pgxs / 'src/test/isolation',
},
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index 38c3a1f85b..db0687f867 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -36,11 +36,11 @@ EXTRADEFS = '-DHOST_TUPLE="$(host_tuple)"' \
all: pg_regress$(X)
pg_regress$(X): pg_regress.o pg_regress_main.o $(WIN32RES) | submake-libpgport
- $(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
+ $(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
# dependencies ensure that path changes propagate
pg_regress.o: pg_regress.c $(top_builddir)/src/port/pg_config_paths.h
-pg_regress.o: override CPPFLAGS += -I$(top_builddir)/src/port $(EXTRADEFS)
+pg_regress.o: override CPPFLAGS += -I$(top_builddir)/src/port -I$(libpq_srcdir) $(EXTRADEFS)
# note: because of the submake dependency, this rule's action is really a no-op
$(top_builddir)/src/port/pg_config_paths.h: | submake-libpgport
diff --git a/src/test/regress/meson.build b/src/test/regress/meson.build
index a045c00c1f..f0dfd85591 100644
--- a/src/test/regress/meson.build
+++ b/src/test/regress/meson.build
@@ -30,7 +30,7 @@ endif
pg_regress = executable('pg_regress',
regress_sources,
c_args: pg_regress_cflags,
- dependencies: [frontend_code],
+ dependencies: [frontend_code, libpq],
kwargs: default_bin_args + {
'install_dir': dir_pgxs / 'src/test/regress',
},
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index ec67588cf5..48df02156c 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -32,6 +32,7 @@
#include "common/username.h"
#include "getopt_long.h"
#include "lib/stringinfo.h"
+#include "libpq-fe.h"
#include "libpq/pqcomm.h" /* needed for UNIXSOCK_PATH() */
#include "pg_config_paths.h"
#include "pg_regress.h"
@@ -75,6 +76,12 @@ const char *pretty_diff_opts = "-w -U3";
*/
#define TESTNAME_WIDTH 36
+/*
+ * The number of times per second that pg_regress checks to see if the
+ * test instance server has started and is available for connection.
+ */
+#define WAIT_TICKS_PER_SECOND 20
+
typedef enum TAPtype
{
DIAG = 0,
@@ -107,6 +114,7 @@ static bool nolocale = false;
static bool use_existing = false;
static char *hostname = NULL;
static int port = -1;
+static char portstr[16];
static bool port_specified_by_user = false;
static char *dlpath = PKGLIBDIR;
static char *user = NULL;
@@ -2107,7 +2115,10 @@ regression_main(int argc, char *argv[],
int i;
int option_index;
char buf[MAXPGPATH * 4];
- char buf2[MAXPGPATH * 4];
+ instr_time starttime;
+ instr_time stoptime;
+
+ INSTR_TIME_SET_CURRENT(starttime);
pg_logging_init(argv[0]);
progname = get_progname(argv[0]);
@@ -2296,6 +2307,9 @@ regression_main(int argc, char *argv[],
const char *env_wait;
int wait_seconds;
const char *initdb_template_dir;
+ const char *keywords[4];
+ const char *values[4];
+ PGPing rv;
/*
* Prepare the temp instance
@@ -2435,22 +2449,30 @@ regression_main(int argc, char *argv[],
}
#endif
+ sprintf(portstr, "%d", port);
+
/*
- * Check if there is a postmaster running already.
+ * Prepare the connection params for checking the state of the server
+ * before starting the tests.
*/
- snprintf(buf2, sizeof(buf2),
- "\"%s%spsql\" -X postgres <%s 2>%s",
- bindir ? bindir : "",
- bindir ? "/" : "",
- DEVNULL, DEVNULL);
+ keywords[0] = "dbname";
+ values[0] = "postgres";
+ keywords[1] = "port";
+ values[1] = portstr;
+ keywords[2] = "host";
+ values[2] = hostname ? hostname : sockdir;
+ keywords[3] = NULL;
+ values[3] = NULL;
+ /*
+ * Check if there is a postmaster running already.
+ */
for (i = 0; i < 16; i++)
{
- fflush(NULL);
- if (system(buf2) == 0)
- {
- char s[16];
+ rv = PQpingParams(keywords, values, 1);
+ if (rv == PQPING_OK)
+ {
if (port_specified_by_user || i == 15)
{
note("port %d apparently in use", port);
@@ -2461,8 +2483,8 @@ regression_main(int argc, char *argv[],
note("port %d apparently in use, trying %d", port, port + 1);
port++;
- sprintf(s, "%d", port);
- setenv("PGPORT", s, 1);
+ sprintf(portstr, "%d", port);
+ setenv("PGPORT", portstr, 1);
}
else
break;
@@ -2485,11 +2507,11 @@ regression_main(int argc, char *argv[],
bail("could not spawn postmaster: %s", strerror(errno));
/*
- * Wait till postmaster is able to accept connections; normally this
- * is only a second or so, but Cygwin is reportedly *much* slower, and
- * test builds using Valgrind or similar tools might be too. Hence,
- * allow the default timeout of 60 seconds to be overridden from the
- * PGCTLTIMEOUT environment variable.
+ * Wait till postmaster is able to accept connections; normally takes
+ * only a fraction of a second or so, but Cygwin is reportedly *much*
+ * slower, and test builds using Valgrind or similar tools might be
+ * too. Hence, allow the default timeout of 60 seconds to be
+ * overridden from the PGCTLTIMEOUT environment variable.
*/
env_wait = getenv("PGCTLTIMEOUT");
if (env_wait != NULL)
@@ -2501,13 +2523,24 @@ regression_main(int argc, char *argv[],
else
wait_seconds = 60;
- for (i = 0; i < wait_seconds; i++)
+ for (i = 0; i < wait_seconds * WAIT_TICKS_PER_SECOND; i++)
{
- /* Done if psql succeeds */
- fflush(NULL);
- if (system(buf2) == 0)
+ /*
+ * It's fairly unlikely that the server is responding immediately,
+ * so we start by sleeping before checking, rather than the other
+ * way around.
+ */
+ pg_usleep(1000000L / WAIT_TICKS_PER_SECOND);
+
+ rv = PQpingParams(keywords, values, 1);
+
+ /* Done if the server is running and accepts connections */
+ if (rv == PQPING_OK)
break;
+ if (rv == PQPING_NO_ATTEMPT)
+ bail("attempting to connect to postmaster failed");
+
/*
* Fail immediately if postmaster has exited
*/
@@ -2520,10 +2553,8 @@ regression_main(int argc, char *argv[],
bail("postmaster failed, examine \"%s/log/postmaster.log\" for the reason",
outputdir);
}
-
- pg_usleep(1000000L);
}
- if (i >= wait_seconds)
+ if (i >= wait_seconds * WAIT_TICKS_PER_SECOND)
{
diag("postmaster did not respond within %d seconds, examine \"%s/log/postmaster.log\" for the reason",
wait_seconds, outputdir);
@@ -2582,6 +2613,13 @@ regression_main(int argc, char *argv[],
create_role(sl->str, dblist);
}
+ /*
+ * Report how much time we spent during instance setup.
+ */
+ INSTR_TIME_SET_CURRENT(stoptime);
+ INSTR_TIME_SUBTRACT(stoptime, starttime);
+ diag("Time to first test: %.0f ms", INSTR_TIME_GET_MILLISEC(stoptime));
+
/*
* Ready to run the tests
*/
--
2.32.1 (Apple Git-133)
Hi,
On 2023-08-30 10:57:10 +0200, Daniel Gustafsson wrote:
On 28 Aug 2023, at 14:32, Daniel Gustafsson <daniel@yesql.se> wrote:
Attached is a patch with a quick PoC for using PQping() instead of psql for
connection checks in pg_regress.
The attached v2 fixes a silly mistake which led to a compiler warning.
Still seems like a good idea to me. To see what impact it has, I measured the
time running the pg_regress tests that take less than 6s on my machine - I
excluded the slower ones (like the main regression tests) because they'd hide
any overall difference.
ninja && m test --suite setup --no-rebuild && tests=$(m test --no-rebuild --list|grep -E '/regress'|grep -vE '(regress|postgres_fdw|test_integerset|intarray|amcheck|test_decoding)/regress'|cut -d' ' -f 3) && time m test --no-rebuild $tests
Time for:
master:
cassert:
real 0m5.265s
user 0m8.422s
sys 0m8.381s
optimized:
real 0m4.926s
user 0m6.356s
sys 0m8.263s
my patch (probing every 100ms with psql):
cassert:
real 0m3.465s
user 0m8.827s
sys 0m8.579s
optimized:
real 0m2.932s
user 0m6.596s
sys 0m8.458s
Daniel's (probing every 50ms with PQping()):
cassert:
real 0m3.347s
user 0m8.373s
sys 0m8.354s
optimized:
real 0m2.527s
user 0m6.156s
sys 0m8.315s
My patch increased user/sys time a bit (likely due to a higher number of
futile psql forks), but Daniel's doesn't. And it does show a nice overall wall
clock time saving.
Greetings,
Andres Freund
On 13 Sep 2023, at 01:49, Andres Freund <andres@anarazel.de> wrote:
On 2023-08-30 10:57:10 +0200, Daniel Gustafsson wrote:
On 28 Aug 2023, at 14:32, Daniel Gustafsson <daniel@yesql.se> wrote:
Attached is a patch with a quick PoC for using PQping() instead of psql for
connection checks in pg_regress.
The attached v2 fixes a silly mistake which led to a compiler warning.
Still seems like a good idea to me. To see what impact it has, I measured the
time running the pg_regress tests that take less than 6s on my machine - I
excluded the slower ones (like the main regression tests) because they'd hide
any overall difference.
My patch increased user/sys time a bit (likely due to a higher number of
futile psql forks), but Daniel's doesn't. And it does show a nice overall wall
clock time saving.
While it does add a lib dependency I think it's worth doing, so I propose we go
ahead with this for master.
--
Daniel Gustafsson
On 13 Sep 2023, at 01:49, Andres Freund <andres@anarazel.de> wrote:
My patch increased user/sys time a bit (likely due to a higher number of
futile psql forks), but Daniel's doesn't. And it does show a nice overall wall
clock time saving.
I went ahead and applied this on master, thanks for review! Now to see if
there will be any noticeable difference in resource usage.
--
Daniel Gustafsson
Daniel Gustafsson <daniel@yesql.se> writes:
I went ahead and applied this on master, thanks for review! Now to see if
there will be any noticeable difference in resource usage.
I think that tools like Coverity are likely to whine about your
use of sprintf instead of snprintf. Sure, it's perfectly safe,
but that won't stop the no-sprintf-ever crowd from complaining.
regards, tom lane
On 24 Oct 2023, at 22:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
I went ahead and applied this on master, thanks for review! Now to see if
there will be any noticeable difference in resource usage.
I think that tools like Coverity are likely to whine about your
use of sprintf instead of snprintf. Sure, it's perfectly safe,
but that won't stop the no-sprintf-ever crowd from complaining.
Fair point, that's probably quite likely to happen. I can apply an snprintf()
conversion change like this in the two places introduced by this:
- sprintf(s, "%d", port);
+ snprintf(s, sizeof(s), "%d", port);
--
Daniel Gustafsson