Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Started by Andres Freund over 2 years ago, 39 messages
#1 Andres Freund
andres@anarazel.de
8 attachment(s)

Hi,

As some of you might have seen when running CI, cirrus-ci is restricting how
many CI cycles everyone can use for free (announcement at [1]). This takes
effect September 1st.

This obviously has consequences both for individual users of CI as well as
cfbot.

The first thing I think we should do is to lower the cost of CI. One thing I
had not entirely realized previously, is that macos CI is by far the most
expensive CI to provide. That's not just the case with cirrus-ci, but also
with other providers. See the series of patches described later in the email.

To me, the situation for cfbot is different than the one for individual
users.

IMO, for the individual user case it's important to use CI for "free", without
a whole lot of complexity. That imo rules out approaches like providing
$cloud_provider compute accounts; that's too much setup work. With the
improvements detailed below, cirrus' free CI would last about ~65 runs /
month.

For cfbot I hope we can find funding to pay for compute to use for CI. The, by
far, most expensive bit is macos. To a significant degree due to macos
licensing terms not allowing more than 2 VMs on a physical host :(.

The reasons we chose cirrus-ci were:

a) Ability to use full VMs, rather than a pre-selected set of environments,
which allows us to test a larger number of platforms

b) Ability to link to log files without requiring an account. E.g. github
actions doesn't allow viewing logs unless logged in.

c) Amount of compute available.

The set of free CI providers has shrunk since we chose cirrus, as have the
"free" resources provided. I started a, quite incomplete as of now, wiki page
at [4].

Potential paths forward for individual CI:

- migrate wholesale to another CI provider

- split CI tasks across different CI providers, rely on github et al
displaying the CI status for different platforms

- give up

Potential paths forward for cfbot, in addition to the above:

- Pay for compute / ask the various cloud providers to grant us compute
credits. At least some of the cloud providers can be used via cirrus-ci.

- Host (some) CI runners ourselves. Particularly with macos and windows, that
could provide significant savings.

- Build our own system, using buildbot, jenkins or whatnot.

Opinions as to what to do?

The attached series of patches:

1) Makes startup of macos instances faster, using more efficient caching of
the required packages. Also submitted as [2].

2) Introduces a template initdb that's reused during the tests. Also submitted
as [3].

3) Remove use of -DRANDOMIZE_ALLOCATED_MEMORY from macos tasks. It's
expensive. And CI also uses asan on linux, so I don't think it's really
needed.

4) Switch tasks to use debugoptimized builds. Previously many tasks used -Og,
to get decent backtraces etc. But the amount of CPU burned that way is too
large. One consequence is that use of ccache becomes much more crucial, as
uncached build times increase significantly.

5) Move use of -Dsegsize_blocks=6 from macos to linux

Macos is expensive and -Dsegsize_blocks=6 slows things down. Alternatively we
could stop covering segsize_blocks in both the meson and autoconf builds. It
does affect runtime on linux as well.

6) Disable write cache flushes on windows

It's a bit ugly to do this without using the UI... Shaves off about 30s
from the tests.

7) pg_regress only checked once a second whether postgres had started up, but
startup is usually much faster. Use pg_ctl's logic instead. It might also be
worth replacing the use of psql with direct use of libpq in pg_regress; the
overhead of repeatedly starting psql looks noticeable.
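As a back-of-the-envelope illustration (not PostgreSQL code), the gain from the tighter poll interval is roughly half the old interval per server start:

```python
# Illustrative sketch only: if the server becomes ready at a uniformly
# random point within a polling tick of t seconds, the expected extra
# wait after readiness is about t/2.
def expected_overshoot(poll_interval_s):
    """Average time wasted waiting for the next poll tick, in seconds."""
    return poll_interval_s / 2.0

# Old behavior: poll once per second. New behavior (matching pg_ctl's
# rate, WAITS_PER_SEC = 10): poll every 0.1s.
saved_per_start = expected_overshoot(1.0) - expected_overshoot(0.1)
print(f"~{saved_per_start:.2f}s saved per server start")
```

With the many server starts across a full test run, that adds up to a noticeable share of total runtime.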

FWIW: with the patches applied, the "credit costs" in cirrus CI are roughly
like the following (depends on caching etc):

task costs in credits
linux-sanity: 0.01
linux-compiler-warnings: 0.05
linux-meson: 0.07
freebsd : 0.08
linux-autoconf: 0.09
windows : 0.18
macos : 0.28
total task runtime is 40.8
cost in credits is 0.76, monthly credits of 50 allow approx 66.10 runs/month
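The arithmetic above can be sanity-checked with a few lines (numbers copied from the table; real costs vary with caching, so this is approximate):

```python
# Per-task credit costs, as measured above (approximate).
task_credits = {
    "linux-sanity": 0.01,
    "linux-compiler-warnings": 0.05,
    "linux-meson": 0.07,
    "freebsd": 0.08,
    "linux-autoconf": 0.09,
    "windows": 0.18,
    "macos": 0.28,
}

cost_per_run = sum(task_credits.values())
monthly_credits = 50
runs_per_month = monthly_credits / cost_per_run
print(f"{cost_per_run:.2f} credits per run, ~{runs_per_month:.0f} runs/month")
```

Note how macos alone accounts for more than a third of the per-run cost.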

Greetings,

Andres Freund

[1]: https://cirrus-ci.org/blog/2023/07/17/limiting-free-usage-of-cirrus-ci/
[2]: /messages/by-id/20230805202539.r3umyamsnctysdc7@awork3.anarazel.de
[3]: /messages/by-id/20220120021859.3zpsfqn4z7ob7afz@alap3.anarazel.de

Attachments:

v1-0001-ci-macos-used-cached-macports-install.patch (text/x-diff; charset=us-ascii)
From 00c48a90f3acc6cba6af873b29429ee6d4ba38a6 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 3 Aug 2023 23:29:13 -0700
Subject: [PATCH v1 1/9] ci: macos: used cached macports install

This substantially speeds up the mac CI time.

Discussion: https://postgr.es/m/20230805202539.r3umyamsnctysdc7@awork3.anarazel.de
---
 .cirrus.yml                          | 63 +++++++-----------
 src/tools/ci/ci_macports_packages.sh | 97 ++++++++++++++++++++++++++++
 2 files changed, 122 insertions(+), 38 deletions(-)
 create mode 100755 src/tools/ci/ci_macports_packages.sh

diff --git a/.cirrus.yml b/.cirrus.yml
index d260f15c4e2..e9cfc542cfe 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -429,8 +429,7 @@ task:
 
     CIRRUS_WORKING_DIR: ${HOME}/pgsql/
     CCACHE_DIR: ${HOME}/ccache
-    HOMEBREW_CACHE: ${HOME}/homebrew-cache
-    PERL5LIB: ${HOME}/perl5/lib/perl5
+    MACPORTS_CACHE: ${HOME}/macports-cache
 
     CC: ccache cc
     CXX: ccache c++
@@ -454,55 +453,43 @@ task:
     - mkdir ${HOME}/cores
     - sudo sysctl kern.corefile="${HOME}/cores/core.%P"
 
-  perl_cache:
-    folder: ~/perl5
-  cpan_install_script:
-    - perl -mIPC::Run -e 1 || cpan -T IPC::Run
-    - perl -mIO::Pty -e 1 || cpan -T IO::Pty
-  upload_caches: perl
-
-
-  # XXX: Could we instead install homebrew into a cached directory? The
-  # homebrew installation takes a good bit of time every time, even if the
-  # packages do not need to be downloaded.
-  homebrew_cache:
-    folder: $HOMEBREW_CACHE
+  # Use macports, even though homebrew is installed. The installation
+  # of the additional packages we need would take quite a while with
+  # homebrew, even if we cache the downloads. We can't cache all of
+  # homebrew, because it's already large. So we use macports. To cache
+  # the installation we create a .dmg file that we mount if it already
+  # exists.
+  # XXX: The reason for the direct p5.34* references is that we'd need
+  # the large macport tree around to figure out that p5-io-tty is
+  # actually p5.34-io-tty. Using the unversioned name works, but
+  # updates macports every time.
+  macports_cache:
+    folder: ${MACPORTS_CACHE}
   setup_additional_packages_script: |
-    brew install \
+    sh src/tools/ci/ci_macports_packages.sh \
       ccache \
-      icu4c \
-      krb5 \
-      llvm \
+      icu \
+      kerberos5 \
       lz4 \
-      make \
       meson \
       openldap \
       openssl \
-      python \
-      tcl-tk \
+      p5.34-io-tty \
+      p5.34-ipc-run \
+      tcl \
       zstd
-
-    brew cleanup -s # to reduce cache size
-  upload_caches: homebrew
+    # Make macports install visible for subsequent steps
+    echo PATH=/opt/local/sbin/:/opt/local/bin/:$PATH >> $CIRRUS_ENV
+  upload_caches: macports
 
   ccache_cache:
     folder: $CCACHE_DIR
   configure_script: |
-    brewpath="/opt/homebrew"
-    PKG_CONFIG_PATH="${brewpath}/lib/pkgconfig:${PKG_CONFIG_PATH}"
-
-    for pkg in icu4c krb5 openldap openssl zstd ; do
-      pkgpath="${brewpath}/opt/${pkg}"
-      PKG_CONFIG_PATH="${pkgpath}/lib/pkgconfig:${PKG_CONFIG_PATH}"
-      PATH="${pkgpath}/bin:${pkgpath}/sbin:$PATH"
-    done
-
-    export PKG_CONFIG_PATH PATH
-
+    export PKG_CONFIG_PATH="/opt/local/lib/pkgconfig/"
     meson setup \
       --buildtype=debug \
-      -Dextra_include_dirs=${brewpath}/include \
-      -Dextra_lib_dirs=${brewpath}/lib \
+      -Dextra_include_dirs=/opt/local/include \
+      -Dextra_lib_dirs=/opt/local/lib \
       -Dcassert=true \
       -Duuid=e2fs -Ddtrace=auto \
       -Dsegsize_blocks=6 \
diff --git a/src/tools/ci/ci_macports_packages.sh b/src/tools/ci/ci_macports_packages.sh
new file mode 100755
index 00000000000..5f5d3027760
--- /dev/null
+++ b/src/tools/ci/ci_macports_packages.sh
@@ -0,0 +1,97 @@
+#!/bin/sh
+
+# Installs the passed in packages via macports. To make it fast enough
+# for CI, cache the installation as a .dmg file.  To avoid
+# unnecessarily updating the cache, the cached image is only modified
+# when packages are installed or removed.  Any package this script is
+# not instructed to install, will be removed again.
+#
+# This currently expects to be run in a macos cirrus-ci environment.
+
+set -e
+set -x
+
+packages="$@"
+
+macports_url="https://github.com/macports/macports-base/releases/download/v2.8.1/MacPorts-2.8.1-13-Ventura.pkg"
+cache_dmg="macports.hfs.dmg"
+
+if [ "$CIRRUS_CI" != "true" ]; then
+    echo "expect to be called within cirrus-ci" 1>&2
+    exit 1
+fi
+
+sudo mkdir -p /opt/local
+mkdir -p ${MACPORTS_CACHE}/
+
+# If we are starting from clean cache, perform a fresh macports
+# install. Otherwise decompress the .dmg we created previously.
+#
+# After this we have a working macports installation, with an unknown set of
+# packages installed.
+new_install=0
+update_cached_image=0
+if [ -e ${MACPORTS_CACHE}/${cache_dmg}.zstd ]; then
+    time zstd -T0 -d ${MACPORTS_CACHE}/${cache_dmg}.zstd -o ${cache_dmg}
+    time sudo hdiutil attach -kernel ${cache_dmg} -owners on -shadow ${cache_dmg}.shadow -mountpoint /opt/local
+else
+    new_install=1
+    curl -fsSL -o macports.pkg "$macports_url"
+    time sudo installer -pkg macports.pkg -target /
+    # this is a throwaway environment, and it'd be a few lines to gin
+    # up a correct user / group when using the cache.
+    echo macportsuser root | sudo tee -a /opt/local/etc/macports/macports.conf
+fi
+export PATH=/opt/local/sbin/:/opt/local/bin/:$PATH
+
+# mark all installed packages unrequested, that allows us to detect
+# packages that aren't needed anymore
+if [ -n "$(port -q installed installed)" ] ; then
+    sudo port unsetrequested installed
+fi
+
+# if setting all the required packages as requested fails, we need
+# to install at least one of them
+if ! sudo port setrequested $packages > /dev/null 2>&1 ; then
+    echo not all required packages installed, doing so now
+    update_cached_image=1
+    # to keep the image small, we deleted the ports tree from the image...
+    sudo port selfupdate
+    # XXX likely we'll need some other way to force an upgrade at some
+    # point...
+    sudo port upgrade outdated
+    sudo port install -N $packages
+    sudo port setrequested $packages
+fi
+
+# check if any ports should be uninstalled
+if [ -n "$(port -q installed rleaves)" ] ; then
+    echo superfluous packages installed
+    update_cached_image=1
+    sudo port uninstall --follow-dependencies rleaves
+
+    # remove prior cache contents, don't want to increase size
+    rm -f ${MACPORTS_CACHE}/*
+fi
+
+# Shrink installation if we created / modified it
+if [ "$new_install" -eq 1 -o "$update_cached_image" -eq 1 ]; then
+    sudo /opt/local/bin/port clean --all installed
+    sudo rm -rf /opt/local/var/macports/{software,sources}/*
+fi
+
+# If we're starting from a clean cache, start a new image. If we have
+# an image, but the contents changed, update the image in the cache
+# location.
+if [ "$new_install" -eq 1 ]; then
+    # use a generous size, so additional software can be installed later
+    time sudo hdiutil create -fs HFS+ -format UDRO -size 10g -layout NONE -srcfolder /opt/local/ ${cache_dmg}
+    time zstd -T0 -10 -z ${cache_dmg} -o ${MACPORTS_CACHE}/${cache_dmg}.zstd
+elif [ "$update_cached_image" -eq 1 ]; then
+    sudo hdiutil detach /opt/local/
+    time hdiutil convert -format UDRO ${cache_dmg} -shadow ${cache_dmg}.shadow -o updated.hfs.dmg
+    rm ${cache_dmg}.shadow
+    mv updated.hfs.dmg ${cache_dmg}
+    time zstd -T0 -10 -z ${cache_dmg} -o ${MACPORTS_CACHE}/${cache_dmg}.zstd
+    time sudo hdiutil attach -kernel ${cache_dmg} -owners on -shadow ${cache_dmg}.shadow -mountpoint /opt/local
+fi
-- 
2.38.0

v1-0002-Use-template-initdb-in-tests.patch (text/x-diff; charset=us-ascii)
From b050364589df42faf58bc21820fa1593cb0bff5b Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 2 Feb 2023 21:51:53 -0800
Subject: [PATCH v1 2/9] Use "template" initdb in tests

Discussion: https://postgr.es/m/20220120021859.3zpsfqn4z7ob7afz@alap3.anarazel.de
---
 meson.build                              | 30 ++++++++++
 .cirrus.yml                              |  3 +-
 src/test/perl/PostgreSQL/Test/Cluster.pm | 46 ++++++++++++++-
 src/test/regress/pg_regress.c            | 74 ++++++++++++++++++------
 src/Makefile.global.in                   | 52 +++++++++--------
 5 files changed, 161 insertions(+), 44 deletions(-)

diff --git a/meson.build b/meson.build
index 0a11efc97a1..0e8a72b12b9 100644
--- a/meson.build
+++ b/meson.build
@@ -3048,8 +3048,10 @@ testport = 40000
 test_env = environment()
 
 temp_install_bindir = test_install_location / get_option('bindir')
+test_initdb_template = meson.build_root() / 'tmp_install' / 'initdb-template'
 test_env.set('PG_REGRESS', pg_regress.full_path())
 test_env.set('REGRESS_SHLIB', regress_module.full_path())
+test_env.set('INITDB_TEMPLATE', test_initdb_template)
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable PG_TEST_EXTRA.
@@ -3064,6 +3066,34 @@ if library_path_var != ''
 endif
 
 
+# Create (and remove old) initdb template directory. Tests use that, where
+# possible, to make it cheaper to run tests.
+#
+# Use python to remove the old cached initdb, as we cannot rely on a working
+# 'rm' binary on windows.
+test('initdb_cache',
+     python,
+     args: [
+       '-c', '''
+import shutil
+import sys
+import subprocess
+
+shutil.rmtree(sys.argv[1], ignore_errors=True)
+sp = subprocess.run(sys.argv[2:] + [sys.argv[1]])
+sys.exit(sp.returncode)
+''',
+       test_initdb_template,
+       temp_install_bindir / 'initdb',
+       '-A', 'trust', '-N', '--no-instructions'
+     ],
+     priority: setup_tests_priority - 1,
+     timeout: 300,
+     is_parallel: false,
+     env: test_env,
+     suite: ['setup'])
+
+
 
 ###############################################################
 # Test Generation
diff --git a/.cirrus.yml b/.cirrus.yml
index e9cfc542cfe..f08da65ed76 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -115,8 +115,9 @@ task:
   test_minimal_script: |
     su postgres <<-EOF
       ulimit -c unlimited
+      meson test $MTEST_ARGS --suite setup
       meson test $MTEST_ARGS --num-processes ${TEST_JOBS} \
-        tmp_install cube/regress pg_ctl/001_start_stop
+        cube/regress pg_ctl/001_start_stop
     EOF
 
   on_failure:
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 5e161dbee60..4d449c35de9 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -522,8 +522,50 @@ sub init
 	mkdir $self->backup_dir;
 	mkdir $self->archive_dir;
 
-	PostgreSQL::Test::Utils::system_or_bail('initdb', '-D', $pgdata, '-A',
-		'trust', '-N', @{ $params{extra} });
+	# If available and if there aren't any parameters, use a previously
+	# initdb'd cluster as a template by copying it. For a lot of tests, that's
+	# substantially cheaper. Do so only if there aren't parameters, it doesn't
+	# seem worth figuring out whether they affect compatibility.
+	#
+	# There's very similar code in pg_regress.c, but we can't easily
+	# deduplicate it until we require perl at build time.
+	if (defined $params{extra} or !defined $ENV{INITDB_TEMPLATE})
+	{
+		note("initializing database system by running initdb");
+		PostgreSQL::Test::Utils::system_or_bail('initdb', '-D', $pgdata, '-A',
+			'trust', '-N', @{ $params{extra} });
+	}
+	else
+	{
+		my @copycmd;
+		my $expected_exitcode;
+
+		note("initializing database system by copying initdb template");
+
+		if ($PostgreSQL::Test::Utils::windows_os)
+		{
+			@copycmd = qw(robocopy /E /NJS /NJH /NFL /NDL /NP);
+			$expected_exitcode = 1;    # 1 denotes files were copied
+		}
+		else
+		{
+			@copycmd = qw(cp -a);
+			$expected_exitcode = 0;
+		}
+
+		@copycmd = (@copycmd, $ENV{INITDB_TEMPLATE}, $pgdata);
+
+		my $ret = PostgreSQL::Test::Utils::system_log(@copycmd);
+
+		# See http://perldoc.perl.org/perlvar.html#%24CHILD_ERROR
+		if ($ret & 127 or $ret >> 8 != $expected_exitcode)
+		{
+			BAIL_OUT(
+				sprintf("failed to execute command \"%s\": $ret",
+					join(" ", @copycmd)));
+		}
+	}
+
 	PostgreSQL::Test::Utils::system_or_bail($ENV{PG_REGRESS},
 		'--config-auth', $pgdata, @{ $params{auth_extra} });
 
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index b68632320a7..407e3915cec 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2295,6 +2295,7 @@ regression_main(int argc, char *argv[],
 		FILE	   *pg_conf;
 		const char *env_wait;
 		int			wait_seconds;
+		const char *initdb_template_dir;
 
 		/*
 		 * Prepare the temp instance
@@ -2316,25 +2317,64 @@ regression_main(int argc, char *argv[],
 		if (!directory_exists(buf))
 			make_directory(buf);
 
-		/* initdb */
 		initStringInfo(&cmd);
-		appendStringInfo(&cmd,
-						 "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync",
-						 bindir ? bindir : "",
-						 bindir ? "/" : "",
-						 temp_instance);
-		if (debug)
-			appendStringInfo(&cmd, " --debug");
-		if (nolocale)
-			appendStringInfo(&cmd, " --no-locale");
-		appendStringInfo(&cmd, " > \"%s/log/initdb.log\" 2>&1", outputdir);
-		fflush(NULL);
-		if (system(cmd.data))
+
+		/*
+		 * Create data directory.
+		 *
+		 * If available, use a previously initdb'd cluster as a template by
+		 * copying it. For a lot of tests, that's substantially cheaper.
+		 *
+		 * There's very similar code in Cluster.pm, but we can't easily de
+		 * duplicate it until we require perl at build time.
+		 */
+		initdb_template_dir = getenv("INITDB_TEMPLATE");
+		if (initdb_template_dir == NULL || nolocale || debug)
 		{
-			bail("initdb failed\n"
-				 "# Examine \"%s/log/initdb.log\" for the reason.\n"
-				 "# Command was: %s",
-				 outputdir, cmd.data);
+			note("initializing database system by running initdb");
+
+			appendStringInfo(&cmd,
+							 "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync",
+							 bindir ? bindir : "",
+							 bindir ? "/" : "",
+							 temp_instance);
+			if (debug)
+				appendStringInfo(&cmd, " --debug");
+			if (nolocale)
+				appendStringInfo(&cmd, " --no-locale");
+			appendStringInfo(&cmd, " > \"%s/log/initdb.log\" 2>&1", outputdir);
+			fflush(NULL);
+			if (system(cmd.data))
+			{
+				bail("initdb failed\n"
+					 "# Examine \"%s/log/initdb.log\" for the reason.\n"
+					 "# Command was: %s",
+					 outputdir, cmd.data);
+			}
+		}
+		else
+		{
+#ifndef WIN32
+			const char *copycmd = "cp -a \"%s\" \"%s/data\"";
+			int			expected_exitcode = 0;
+#else
+			const char *copycmd = "robocopy /E /NJS /NJH /NFL /NDL /NP \"%s\" \"%s/data\"";
+			int			expected_exitcode = 1;	/* 1 denotes files were copied */
+#endif
+
+			note("initializing database system by copying initdb template");
+
+			appendStringInfo(&cmd,
+							 copycmd,
+							 initdb_template_dir,
+							 temp_instance);
+			if (system(cmd.data) != expected_exitcode)
+			{
+				bail("copying of initdb template failed\n"
+					 "# Examine \"%s/log/initdb.log\" for the reason.\n"
+					 "# Command was: %s",
+					 outputdir, cmd.data);
+			}
 		}
 
 		pfree(cmd.data);
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index df9f721a41a..0b4ca0eb6ae 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -397,30 +397,6 @@ check: temp-install
 
 .PHONY: temp-install
 
-temp-install: | submake-generated-headers
-ifndef NO_TEMP_INSTALL
-ifneq ($(abs_top_builddir),)
-ifeq ($(MAKELEVEL),0)
-	rm -rf '$(abs_top_builddir)'/tmp_install
-	$(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
-	$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
-	$(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
-endif
-endif
-endif
-
-# Tasks to run serially at the end of temp-install.  Some EXTRA_INSTALL
-# entries appear more than once in the tree, and parallel installs of the same
-# file can fail with EEXIST.
-checkprep:
-	$(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install || exit; done)
-
-PROVE = @PROVE@
-# There are common routines in src/test/perl, and some test suites have
-# extra perl modules in their own directory.
-PG_PROVE_FLAGS = -I $(top_srcdir)/src/test/perl/ -I $(srcdir)
-# User-supplied prove flags such as --verbose can be provided in PROVE_FLAGS.
-PROVE_FLAGS =
 
 # prepend to path if already set, else just set it
 define add_to_path
@@ -437,8 +413,36 @@ ld_library_path_var = LD_LIBRARY_PATH
 with_temp_install = \
 	PATH="$(abs_top_builddir)/tmp_install$(bindir):$(CURDIR):$$PATH" \
 	$(call add_to_path,$(strip $(ld_library_path_var)),$(abs_top_builddir)/tmp_install$(libdir)) \
+	INITDB_TEMPLATE='$(abs_top_builddir)'/tmp_install/initdb-template \
 	$(with_temp_install_extra)
 
+temp-install: | submake-generated-headers
+ifndef NO_TEMP_INSTALL
+ifneq ($(abs_top_builddir),)
+ifeq ($(MAKELEVEL),0)
+	rm -rf '$(abs_top_builddir)'/tmp_install
+	$(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
+	$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
+	$(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
+
+	$(with_temp_install) initdb -A trust -N --no-instructions '$(abs_top_builddir)'/tmp_install/initdb-template >>'$(abs_top_builddir)'/tmp_install/log/initdb-template.log 2>&1
+endif
+endif
+endif
+
+# Tasks to run serially at the end of temp-install.  Some EXTRA_INSTALL
+# entries appear more than once in the tree, and parallel installs of the same
+# file can fail with EEXIST.
+checkprep:
+	$(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install || exit; done)
+
+PROVE = @PROVE@
+# There are common routines in src/test/perl, and some test suites have
+# extra perl modules in their own directory.
+PG_PROVE_FLAGS = -I $(top_srcdir)/src/test/perl/ -I $(srcdir)
+# User-supplied prove flags such as --verbose can be provided in PROVE_FLAGS.
+PROVE_FLAGS =
+
 ifeq ($(enable_tap_tests),yes)
 
 ifndef PGXS
-- 
2.38.0

v1-0003-ci-macos-Remove-use-of-DRANDOMIZE_ALLOCATED_MEMOR.patch (text/x-diff; charset=us-ascii)
From de80483807aeef8eceef175cf2834bae3da3a0e1 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 17:27:11 -0700
Subject: [PATCH v1 3/9] ci: macos: Remove use of -DRANDOMIZE_ALLOCATED_MEMORY

RANDOMIZE_ALLOCATED_MEMORY causes a measurable slowdown. Macos is, by far, the
most expensive platform to perform CI on, therefore it doesn't make sense to
run such a test there.

Ubsan and asan on linux should detect most of the cases of uninitialized
memory, so it doesn't really seem worth using -DRANDOMIZE_ALLOCATED_MEMORY in
another instance type.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 .cirrus.yml | 1 -
 1 file changed, 1 deletion(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index f08da65ed76..f3d63ff3fb0 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -434,7 +434,6 @@ task:
 
     CC: ccache cc
     CXX: ccache c++
-    CPPFLAGS: -DRANDOMIZE_ALLOCATED_MEMORY
     CFLAGS: -Og -ggdb
     CXXFLAGS: -Og -ggdb
 
-- 
2.38.0

v1-0004-ci-switch-tasks-to-debugoptimized-build.patch (text/x-diff; charset=us-ascii)
From a86a45567151f1b2534cf599f42414edab60e2e0 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 17:27:51 -0700
Subject: [PATCH v1 4/9] ci: switch tasks to debugoptimized build

In aggregate the CI tasks burn a lot of cpu hours. Compared to that easy to
read backtraces aren't as important. Still use -ggdb where appropriate, as
that does make backtraces more reliable, particularly in the face of
optimization.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 .cirrus.yml | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index f3d63ff3fb0..bfe251f48e8 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -140,7 +140,7 @@ task:
 
     CCACHE_DIR: /tmp/ccache_dir
     CPPFLAGS: -DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
-    CFLAGS: -Og -ggdb
+    CFLAGS: -ggdb
 
   depends_on: SanityCheck
   only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*freebsd.*'
@@ -181,7 +181,7 @@ task:
   configure_script: |
     su postgres <<-EOF
       meson setup \
-        --buildtype=debug \
+        --buildtype=debugoptimized \
         -Dcassert=true -Duuid=bsd -Dtcl_version=tcl86 -Ddtrace=auto \
         -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
         -Dextra_lib_dirs=/usr/local/lib -Dextra_include_dirs=/usr/local/include/ \
@@ -275,7 +275,7 @@ task:
     ASAN_OPTIONS: print_stacktrace=1:disable_coredump=0:abort_on_error=1:detect_leaks=0
 
     # SANITIZER_FLAGS is set in the tasks below
-    CFLAGS: -Og -ggdb -fno-sanitize-recover=all $SANITIZER_FLAGS
+    CFLAGS: -ggdb -fno-sanitize-recover=all $SANITIZER_FLAGS
     CXXFLAGS: $CFLAGS
     LDFLAGS: $SANITIZER_FLAGS
     CC: ccache gcc
@@ -364,7 +364,7 @@ task:
       configure_script: |
         su postgres <<-EOF
           meson setup \
-            --buildtype=debug \
+            --buildtype=debugoptimized \
             -Dcassert=true \
             ${LINUX_MESON_FEATURES} \
             -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
@@ -377,7 +377,7 @@ task:
         su postgres <<-EOF
           export CC='ccache gcc -m32'
           meson setup \
-            --buildtype=debug \
+            --buildtype=debugoptimized \
             -Dcassert=true \
             ${LINUX_MESON_FEATURES} \
             -Dllvm=disabled \
@@ -434,8 +434,8 @@ task:
 
     CC: ccache cc
     CXX: ccache c++
-    CFLAGS: -Og -ggdb
-    CXXFLAGS: -Og -ggdb
+    CFLAGS: -ggdb
+    CXXFLAGS: -ggdb
 
   depends_on: SanityCheck
   only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(macos|darwin|osx).*'
@@ -487,7 +487,7 @@ task:
   configure_script: |
     export PKG_CONFIG_PATH="/opt/local/lib/pkgconfig/"
     meson setup \
-      --buildtype=debug \
+      --buildtype=debugoptimized \
       -Dextra_include_dirs=/opt/local/include \
       -Dextra_lib_dirs=/opt/local/lib \
       -Dcassert=true \
-- 
2.38.0

v1-0005-ci-Move-use-of-Dsegsize_blocks-6-from-macos-to-li.patch (text/x-diff; charset=us-ascii)
From f888e984557e0ba79bf04263a962825dbc520c2d Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 5 Aug 2023 13:36:28 -0700
Subject: [PATCH v1 5/9] ci: Move use of -Dsegsize_blocks=6 from macos to linux

The option causes a measurable slowdown. Macos is, by far, the most expensive
platform to perform CI on, therefore move the option to another task. In
addition, the filesystem overhead seems to impact macos worse than other
platforms.
---
 .cirrus.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index bfe251f48e8..35ef9c97211 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -366,6 +366,7 @@ task:
           meson setup \
             --buildtype=debugoptimized \
             -Dcassert=true \
+            -Dsegsize_blocks=6 \
             ${LINUX_MESON_FEATURES} \
             -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
             build
@@ -492,7 +493,6 @@ task:
       -Dextra_lib_dirs=/opt/local/lib \
       -Dcassert=true \
       -Duuid=e2fs -Ddtrace=auto \
-      -Dsegsize_blocks=6 \
       -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
       build
 
-- 
2.38.0

v1-0006-ci-windows-Disabling-write-cache-flushing-during-.patch (text/x-diff; charset=us-ascii)
From 7169a36f31568782d55cb2e57c5a3d6c87f23607 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 16:56:29 -0700
Subject: [PATCH v1 6/9] ci: windows: Disabling write cache flushing during
 test

This has been measured to reduce windows test times by about 30s.
---
 .cirrus.yml                          | 2 ++
 src/tools/ci/windows_write_cache.ps1 | 3 +++
 2 files changed, 5 insertions(+)
 create mode 100644 src/tools/ci/windows_write_cache.ps1

diff --git a/.cirrus.yml b/.cirrus.yml
index 35ef9c97211..ef825485826 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -559,6 +559,8 @@ task:
   setup_additional_packages_script: |
     REM choco install -y --no-progress ...
 
+  change_write_caching_script: powershell src/tools/ci/windows_write_cache.ps1
+
   setup_hosts_file_script: |
     echo 127.0.0.1 pg-loadbalancetest >> c:\Windows\System32\Drivers\etc\hosts
     echo 127.0.0.2 pg-loadbalancetest >> c:\Windows\System32\Drivers\etc\hosts
diff --git a/src/tools/ci/windows_write_cache.ps1 b/src/tools/ci/windows_write_cache.ps1
new file mode 100644
index 00000000000..9c52bc886d4
--- /dev/null
+++ b/src/tools/ci/windows_write_cache.ps1
@@ -0,0 +1,3 @@
+Get-ItemProperty -path "HKLM:/SYSTEM/CurrentControlSet/Enum/SCSI/*/*/Device Parameters/Disk" -name CacheIsPowerProtected
+Set-ItemProperty -path "HKLM:/SYSTEM/CurrentControlSet/Enum/SCSI/*/*/Device Parameters/Disk" -name CacheIsPowerProtected -Value 0
+Get-ItemProperty -path "HKLM:/SYSTEM/CurrentControlSet/Enum/SCSI/*/*/Device Parameters/Disk" -name CacheIsPowerProtected
-- 
2.38.0

v1-0007-regress-Check-for-postgres-startup-completion-mor.patch (text/x-diff; charset=us-ascii)
From 99089f7cfbbfb2d10074009716a1420163bc6697 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 16:51:16 -0700
Subject: [PATCH v1 7/9] regress: Check for postgres startup completion more
 often

Previously pg_regress.c only checked whether the server started up once a
second - in most cases startup is much faster though. Use the same interval as
pg_ctl does.
---
 src/test/regress/pg_regress.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 407e3915cec..46af1fddbdb 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -75,6 +75,9 @@ const char *pretty_diff_opts = "-w -U3";
  */
 #define TESTNAME_WIDTH 36
 
+/* how often to recheck if postgres startup completed */
+#define WAITS_PER_SEC	10
+
 typedef enum TAPtype
 {
 	DIAG = 0,
@@ -2499,7 +2502,7 @@ regression_main(int argc, char *argv[],
 		else
 			wait_seconds = 60;
 
-		for (i = 0; i < wait_seconds; i++)
+		for (i = 0; i < wait_seconds * WAITS_PER_SEC; i++)
 		{
 			/* Done if psql succeeds */
 			fflush(NULL);
@@ -2519,7 +2522,7 @@ regression_main(int argc, char *argv[],
 					 outputdir);
 			}
 
-			pg_usleep(1000000L);
+			pg_usleep(1000000L / WAITS_PER_SEC);
 		}
 		if (i >= wait_seconds)
 		{
-- 
2.38.0

v1-0008-ci-Don-t-specify-amount-of-memory.patch (text/x-diff; charset=us-ascii)
From 9edf90d5bfbb1933c68525481636f2e33ea3b1b4 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 17:31:15 -0700
Subject: [PATCH v1 8/9] ci: Don't specify amount of memory

The number of CPUs is the cost-determining factor. Most instance types that
run tests have more memory/core than what we specified, there's no real
benefit in wasting that.
---
 .cirrus.yml | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index ef825485826..a4d64955489 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -150,7 +150,6 @@ task:
     image: family/pg-ci-freebsd-13
     platform: freebsd
     cpu: $CPUS
-    memory: 4G
     disk: 50
 
   sysinfo_script: |
@@ -292,7 +291,6 @@ task:
     image: family/pg-ci-bullseye
     platform: linux
     cpu: $CPUS
-    memory: 4G
 
   ccache_cache:
     folder: ${CCACHE_DIR}
@@ -554,7 +552,6 @@ task:
     image: family/pg-ci-windows-ci-vs-2019
     platform: windows
     cpu: $CPUS
-    memory: 4G
 
   setup_additional_packages_script: |
     REM choco install -y --no-progress ...
@@ -604,7 +601,6 @@ task:
     image: family/pg-ci-windows-ci-mingw64
     platform: windows
     cpu: $CPUS
-    memory: 4G
 
   env:
     TEST_JOBS: 4 # higher concurrency causes occasional failures
-- 
2.38.0

#2Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-07 19:15:41 -0700, Andres Freund wrote:

The set of free CI providers has shrunk since we chose cirrus, as have the
"free" resources provided. I started, quite incomplete as of now, wiki page at
[4].

Oops, as Thomas just noticed, I left off that link:

[4]: https://wiki.postgresql.org/wiki/CI_Providers

Greetings,

Andres Freund

#3Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Andres Freund (#1)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

On 08/08/2023 05:15, Andres Freund wrote:

IMO, for the individual user case it's important to use CI for "free", without
a whole lot of complexity. Which imo rules approaches like providing
$cloud_provider compute accounts, that's too much setup work.

+1

With the improvements detailed below, cirrus' free CI would last
about ~65 runs / month.

I think that's plenty.

For cfbot I hope we can find funding to pay for compute to use for CI.

+1

Potential paths forward for cfbot, in addition to the above:

- Pay for compute / ask the various cloud providers to grant us compute
credits. At least some of the cloud providers can be used via cirrus-ci.

- Host (some) CI runners ourselves. Particularly with macos and windows, that
could provide significant savings.

- Build our own system, using buildbot, jenkins or whatnot.

Opinions as to what to do?

The resources for running our own system aren't free either. I'm sure we
can get sponsors for the cirrus-ci credits, or use donations.

I have been quite happy with Cirrus CI overall.

The attached series of patches:

All of this makes sense to me, although I don't use macos myself.

5) Move use of -Dsegsize_blocks=6 from macos to linux

Macos is expensive, -Dsegsize_blocks=6 slows things down. Alternatively we
could stop covering both meson and autoconf segsize_blocks. It does affect
runtime on linux as well.

Could we have a comment somewhere on why we use -Dsegsize_blocks on
these particular CI runs? It seems pretty random. I guess the idea is to
have one autoconf task and one meson task with that option, to check
that the autoconf/meson option works?

6) Disable write cache flushes on windows

It's a bit ugly to do this without using the UI... Shaves off about 30s
from the tests.

A brief comment would be nice: "We don't care about persistence over
hard crashes in the CI, so disable write cache flushes to speed it up."

--
Heikki Linnakangas
Neon (https://neon.tech)

#4Peter Eisentraut
peter@eisentraut.org
In reply to: Andres Freund (#1)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

On 08.08.23 04:15, Andres Freund wrote:

Potential paths forward for individual CI:

- migrate wholesale to another CI provider

- split CI tasks across different CI providers, rely on github et al
displaying the CI status for different platforms

- give up

With the proposed optimizations, it seems you can still do a fair amount
of work under the free plan.

Potential paths forward for cfbot, in addition to the above:

- Pay for compute / ask the various cloud providers to grant us compute
credits. At least some of the cloud providers can be used via cirrus-ci.

- Host (some) CI runners ourselves. Particularly with macos and windows, that
could provide significant savings.

- Build our own system, using buildbot, jenkins or whatnot.

I think we should use the "compute credits" plan from Cirrus CI. It
should be possible to estimate the costs for that. Money is available,
I think.

#5Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#4)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-08 16:28:49 +0200, Peter Eisentraut wrote:

On 08.08.23 04:15, Andres Freund wrote:

Potential paths forward for cfbot, in addition to the above:

- Pay for compute / ask the various cloud providers to grant us compute
credits. At least some of the cloud providers can be used via cirrus-ci.

- Host (some) CI runners ourselves. Particularly with macos and windows, that
could provide significant savings.

- Build our own system, using buildbot, jenkins or whatnot.

I think we should use the "compute credits" plan from Cirrus CI. It should
be possible to estimate the costs for that. Money is available, I think.

Unfortunately just doing that seems like it would end up considerably on the too
expensive side. Here are the stats for last month's cfbot runtimes (provided
by Thomas):

                   task_name                    |    sum
------------------------------------------------+------------
 FreeBSD - 13 - Meson                           | 1017:56:09
 Windows - Server 2019, MinGW64 - Meson         |   00:00:00
 SanityCheck                                    |   76:48:41
 macOS - Ventura - Meson                        |  873:12:43
 Windows - Server 2019, VS 2019 - Meson & ninja | 1251:08:06
 Linux - Debian Bullseye - Autoconf             |  830:17:26
 Linux - Debian Bullseye - Meson                |  860:37:21
 CompilerWarnings                               |  935:30:35
(8 rows)

If I did the math right, that's about 7000 credits (and 1 credit costs 1 USD).

task costs in credits
linux-sanity: 55.30
linux-autoconf: 598.04
linux-meson: 619.40
linux-compiler-warnings: 674.28
freebsd : 732.24
windows : 1201.09
macos : 3143.52
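For what it's worth, the per-task figures above do add up to roughly the quoted ~7000 credits; a quick sanity-check sketch (numbers copied straight from the list, not independently measured):

```python
# Per-task monthly cfbot cost in Cirrus credits, from the figures above
# (1 credit currently costs 1 USD).
credits = {
    "linux-sanity": 55.30,
    "linux-autoconf": 598.04,
    "linux-meson": 619.40,
    "linux-compiler-warnings": 674.28,
    "freebsd": 732.24,
    "windows": 1201.09,
    "macos": 3143.52,
}
total = sum(credits.values())
print(f"total: {total:.2f} credits/month")  # roughly 7000
print(f"macos share: {credits['macos'] / total:.0%}")  # macos dominates
```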

Now, those times are before optimizing test runtime. And besides optimizing
the tasks, we can also optimize not running tests for docs patches etc. And
optimize cfbot to schedule a bit better.

But still, the costs look not realistic to me.

If instead we were to use our own GCP account, it's a lot less. t2d-standard-4
instances, which are faster than what we use right now, cost $0.168984 / hour
as "normal" instances and $0.026764 as "spot" instances right now [1]. Windows
VMs are considerably more expensive due to licensing - an additional $0.184/h.

Assuming spot instances, linux+freebsd tasks would cost ~100 USD / month (maybe
10-20% more in reality, due to a) spot instances getting terminated requiring
retries and b) disks).

Windows would be ~255 USD / month (same retries caveats).

Given the cost of macos, it seems like it'd be by far the most affordable option
to just buy 1-2 mac minis (2x ~660 USD) and put them on a shelf somewhere, as
persistent runners. Cirrus has builtin macos virtualization support - but can
only host two VMs on each mac, due to macos licensing restrictions. A single
mac mini would suffice to keep up with our unoptimized monthly runtime
(although there likely would be some overhead).
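The spot-pricing arithmetic roughly checks out; a sketch using the prices and the per-task hours quoted in this thread (disks, retries, and per-task CPU counts are ignored, so treat the outputs as ballpark figures):

```python
spot_hourly = 0.026764      # GCP t2d-standard-4 spot price, USD/hour
win_license_hourly = 0.184  # additional Windows licensing cost, USD/hour

# Task hours from the cfbot table, rounded to decimal hours:
# SanityCheck, Autoconf, Meson, CompilerWarnings, FreeBSD / Windows VS 2019.
linux_freebsd_hours = 76.8 + 830.3 + 860.6 + 935.5 + 1017.9
windows_hours = 1251.1

linux_freebsd_cost = linux_freebsd_hours * spot_hourly
windows_cost = windows_hours * (spot_hourly + win_license_hourly)
print(f"linux+freebsd: ~{linux_freebsd_cost:.0f} USD/month")  # ~100
print(f"windows:       ~{windows_cost:.0f} USD/month")

# macos capacity: one mac mini can host 2 VMs, i.e. about 2 * 730 = 1460
# VM-hours/month, versus ~873 hours of macos task time - so a single
# machine keeps up with the unoptimized load (ignoring overhead).
```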

Greetings,

Andres Freund

[1]: https://cloud.google.com/compute/all-pricing

#6Tristan Partin
tristan@neon.tech
In reply to: Andres Freund (#1)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

On Mon Aug 7, 2023 at 9:15 PM CDT, Andres Freund wrote:

FWIW: with the patches applied, the "credit costs" in cirrus CI are roughly
like the following (depends on caching etc):

task costs in credits
linux-sanity: 0.01
linux-compiler-warnings: 0.05
linux-meson: 0.07
freebsd : 0.08
linux-autoconf: 0.09
windows : 0.18
macos : 0.28
total task runtime is 40.8
cost in credits is 0.76, monthly credits of 50 allow approx 66.10 runs/month

I am not in the loop on the autotools vs meson stuff. How much longer do
we anticipate keeping autotools around? Seems like it could be a good
opportunity to reduce some CI usage if autotools were finally dropped,
but I know there are still outstanding tasks to complete.

Back of the napkin math says autotools is about 12% of the credit cost,
though I haven't looked to see if linux-meson and linux-autotools are
1:1.

--
Tristan Partin
Neon (https://neon.tech)

#7Andres Freund
andres@anarazel.de
In reply to: Tristan Partin (#6)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-08 10:25:52 -0500, Tristan Partin wrote:

On Mon Aug 7, 2023 at 9:15 PM CDT, Andres Freund wrote:

FWIW: with the patches applied, the "credit costs" in cirrus CI are roughly
like the following (depends on caching etc):

task costs in credits
linux-sanity: 0.01
linux-compiler-warnings: 0.05
linux-meson: 0.07
freebsd : 0.08
linux-autoconf: 0.09
windows : 0.18
macos : 0.28
total task runtime is 40.8
cost in credits is 0.76, monthly credits of 50 allow approx 66.10 runs/month

I am not in the loop on the autotools vs meson stuff. How much longer do we
anticipate keeping autotools around?

I think it depends in what fashion. We've been talking about supporting
building out-of-tree modules with "pgxs" for at least a 5 year support
window. But the replacement isn't yet finished [1], so that clock hasn't yet
started ticking.

Seems like it could be a good opportunity to reduce some CI usage if
autotools were finally dropped, but I know there are still outstanding tasks
to complete.

Back of the napkin math says autotools is about 12% of the credit cost,
though I haven't looked to see if linux-meson and linux-autotools are 1:1.

The autoconf task is actually doing quite useful stuff right now, leaving the
use of configure aside, as it builds with address sanitizer. Without that it'd
be a lot faster. But we'd lose, imo quite important, coverage. The tests
would run a bit faster with meson, but it'd be overall a difference on the
margins.

Greetings,

Andres Freund

[1]: https://github.com/anarazel/postgres/tree/meson-pkgconfig

#8Tristan Partin
tristan@neon.tech
In reply to: Andres Freund (#7)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

On Tue Aug 8, 2023 at 10:38 AM CDT, Andres Freund wrote:

Hi,

On 2023-08-08 10:25:52 -0500, Tristan Partin wrote:

On Mon Aug 7, 2023 at 9:15 PM CDT, Andres Freund wrote:

FWIW: with the patches applied, the "credit costs" in cirrus CI are roughly
like the following (depends on caching etc):

task costs in credits
linux-sanity: 0.01
linux-compiler-warnings: 0.05
linux-meson: 0.07
freebsd : 0.08
linux-autoconf: 0.09
windows : 0.18
macos : 0.28
total task runtime is 40.8
cost in credits is 0.76, monthly credits of 50 allow approx 66.10 runs/month

I am not in the loop on the autotools vs meson stuff. How much longer do we
anticipate keeping autotools around?

I think it depends in what fashion. We've been talking about supporting
building out-of-tree modules with "pgxs" for at least a 5 year support
window. But the replacement isn't yet finished [1], so that clock hasn't yet
started ticking.

Seems like it could be a good opportunity to reduce some CI usage if
autotools were finally dropped, but I know there are still outstanding tasks
to complete.

Back of the napkin math says autotools is about 12% of the credit cost,
though I haven't looked to see if linux-meson and linux-autotools are 1:1.

The autoconf task is actually doing quite useful stuff right now, leaving the
use of configure aside, as it builds with address sanitizer. Without that it'd
be a lot faster. But we'd lose, imo quite important, coverage. The tests
would run a bit faster with meson, but it'd be overall a difference on the
margins.

[1] https://github.com/anarazel/postgres/tree/meson-pkgconfig

Makes sense. Please let me know if I can help you out in any way for the
v17 development cycle besides what we have already talked about.

--
Tristan Partin
Neon (https://neon.tech)

#9Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Andres Freund (#5)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

On 2023-Aug-08, Andres Freund wrote:

Given the cost of macos, it seems like it'd be by far the most affordable option
to just buy 1-2 mac minis (2x ~660 USD) and put them on a shelf somewhere, as
persistent runners. Cirrus has builtin macos virtualization support - but can
only host two VMs on each mac, due to macos licensing restrictions. A single
mac mini would suffice to keep up with our unoptimized monthly runtime
(although there likely would be some overhead).

If using persistent workers is an option, maybe we should explore that.
I think we could move all or some of the Linux - Debian builds to
hardware that we already have in shelves (depending on how much compute
power is really needed.)

I think using other OSes is more difficult, mostly because I doubt we
want to deal with licenses; but even FreeBSD might not be a realistic
option, at least not in the short term.

Still,

                   task_name                    |    sum
------------------------------------------------+------------
 FreeBSD - 13 - Meson                           | 1017:56:09
 Windows - Server 2019, MinGW64 - Meson         |   00:00:00
 SanityCheck                                    |   76:48:41
 macOS - Ventura - Meson                        |  873:12:43
 Windows - Server 2019, VS 2019 - Meson & ninja | 1251:08:06
 Linux - Debian Bullseye - Autoconf             |  830:17:26
 Linux - Debian Bullseye - Meson                |  860:37:21
 CompilerWarnings                               |  935:30:35
(8 rows)

moving just Debian, that might alleviate 76+830+860+935 hours from the
Cirrus infra, which is ~46%. Not bad.

(How come Windows - Meson reports allballs?)

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"No tengo por qué estar de acuerdo con lo que pienso"
(Carlos Caszeli)

#10Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#9)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-08 18:34:58 +0200, Alvaro Herrera wrote:

On 2023-Aug-08, Andres Freund wrote:

Given the cost of macos, it seems like it'd be by far the most affordable option
to just buy 1-2 mac minis (2x ~660 USD) and put them on a shelf somewhere, as
persistent runners. Cirrus has builtin macos virtualization support - but can
only host two VMs on each mac, due to macos licensing restrictions. A single
mac mini would suffice to keep up with our unoptimized monthly runtime
(although there likely would be some overhead).

If using persistent workers is an option, maybe we should explore that.
I think we could move all or some of the Linux - Debian builds to
hardware that we already have in shelves (depending on how much compute
power is really needed.)

(76+830+860+935)/((365/12)*24) = 3.7

3.7 instances with 4 "vcores" are busy 100% of the time. So we'd need at least
~16 cpu threads - I think cirrus sometimes uses instances that disable HT, so
it'd perhaps be 16 cores actually.

I think using other OSes is more difficult, mostly because I doubt we
want to deal with licenses; but even FreeBSD might not be a realistic
option, at least not in the short term.

They can be VMs, so that shouldn't be a big issue.

                   task_name                    |    sum
------------------------------------------------+------------
 FreeBSD - 13 - Meson                           | 1017:56:09
 Windows - Server 2019, MinGW64 - Meson         |   00:00:00
 SanityCheck                                    |   76:48:41
 macOS - Ventura - Meson                        |  873:12:43
 Windows - Server 2019, VS 2019 - Meson & ninja | 1251:08:06
 Linux - Debian Bullseye - Autoconf             |  830:17:26
 Linux - Debian Bullseye - Meson                |  860:37:21
 CompilerWarnings                               |  935:30:35
(8 rows)

moving just Debian, that might alleviate 76+830+860+935 hours from the
Cirrus infra, which is ~46%. Not bad.

(How come Windows - Meson reports allballs?)

It's mingw64, which we've marked as "manual", because we didn't have the cpu
cycles to run it.

Greetings,

Andres Freund

#11Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#3)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-08 11:58:25 +0300, Heikki Linnakangas wrote:

On 08/08/2023 05:15, Andres Freund wrote:

With the improvements detailed below, cirrus' free CI would last
about ~65 runs / month.

I think that's plenty.

Not so sure, I would regularly exceed it, I think. But it definitely will
suffice for more casual contributors.

Potential paths forward for cfbot, in addition to the above:

- Pay for compute / ask the various cloud providers to grant us compute
credits. At least some of the cloud providers can be used via cirrus-ci.

- Host (some) CI runners ourselves. Particularly with macos and windows, that
could provide significant savings.

- Build our own system, using buildbot, jenkins or whatnot.

Opinions as to what to do?

The resources for running our own system aren't free either. I'm sure we can
get sponsors for the cirrus-ci credits, or use donations.

As outlined in my reply to Alvaro, just using credits likely is financially
not viable...

5) Move use of -Dsegsize_blocks=6 from macos to linux

Macos is expensive, -Dsegsize_blocks=6 slows things down. Alternatively we
could stop covering both meson and autoconf segsize_blocks. It does affect
runtime on linux as well.

Could we have a comment somewhere on why we use -Dsegsize_blocks on these
particular CI runs? It seems pretty random. I guess the idea is to have one
autoconf task and one meson task with that option, to check that the
autoconf/meson option works?

Hm, some of that was in the commit message, but I should have added it to
.cirrus.yml as well.

Normally, the "relation segment" code basically has no coverage in our tests,
because we (quite reasonably) don't generate tables large enough. We've had
plenty of bugs that we didn't notice due to the code not being exercised much. So it
seemed useful to add CI coverage, by making the segments very small.

I chose the tasks by looking at how long they took at the time, I
think, adding them to the slower ones.
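For context, the option in question is just a configure-time knob; a sketch of how it is passed in the two build systems (spellings as I recall them from the tree, worth double-checking against the current options):

```sh
# meson: tiny relation segments (6 blocks) so segment-crossing code is exercised
meson setup build -Dsegsize_blocks=6

# autoconf equivalent
./configure --with-segsize-blocks=6
```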

6) Disable write cache flushes on windows

It's a bit ugly to do this without using the UI... Shaves off about 30s
from the tests.

A brief comment would be nice: "We don't care about persistence over hard
crashes in the CI, so disable write cache flushes to speed it up."

Turns out that patch doesn't work on its own anyway, at least not
reliably... I tested it by interactively logging into a windows vm and testing
it there. It doesn't actually seem to suffice when run in isolation, because
the relevant registry key doesn't yet exist. I haven't yet figured out the
magic incantations for adding the missing "intermediary", but I'm getting
there...

Greetings,

Andres Freund

#12Robert Treat
rob@xzilla.net
In reply to: Andres Freund (#11)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

On Tue, Aug 8, 2023 at 9:26 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2023-08-08 11:58:25 +0300, Heikki Linnakangas wrote:

On 08/08/2023 05:15, Andres Freund wrote:

With the improvements detailed below, cirrus' free CI would last
about ~65 runs / month.

I think that's plenty.

Not so sure, I would regularly exceed it, I think. But it definitely will
suffice for more casual contributors.

Potential paths forward for cfbot, in addition to the above:

- Pay for compute / ask the various cloud providers to grant us compute
credits. At least some of the cloud providers can be used via cirrus-ci.

- Host (some) CI runners ourselves. Particularly with macos and windows, that
could provide significant savings.

- Build our own system, using buildbot, jenkins or whatnot.

Opinions as to what to do?

The resources for running our own system aren't free either. I'm sure we can
get sponsors for the cirrus-ci credits, or use donations.

As outlined in my reply to Alvaro, just using credits likely is financially
not viable...

In case it's helpful, from an SPI-oriented perspective, $7K/month is
probably an order of magnitude more than what we can sustain, so I
don't see a way to make that work without some kind of additional
magic that includes other non-profits and/or commercial companies
changing donation habits between now and September.

Purchasing a couple of mac minis (and/or similar gear) would be near
trivial though, just a matter of figuring out where/how to host them
(but I think infra can chime in on that if that's what gets decided).

The other likely option would be to seek out cloud credits from one of
the big three (or others); Amazon has continually said they would be
happy to donate more credits to us if we had a use, and I think some
of the other hosting providers have said similarly at times; so we'd
need to ask and hope it's not too bureaucratic.

Robert Treat
https://xzilla.net

#13Andres Freund
andres@anarazel.de
In reply to: Robert Treat (#12)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-08 22:29:50 -0400, Robert Treat wrote:

In case it's helpful, from an SPI oriented perspective, $7K/month is
probably an order of magnitude more than what we can sustain, so I
don't see a way to make that work without some kind of additional
magic that includes other non-profits and/or commercial companies
changing donation habits between now and September.

Yea, I think that'd make no sense, even if we could afford it. I think the
patches I've written should drop it to 1/2 already. Thomas added some
throttling to push it down further.

Purchasing a couple of mac minis (and/or similar gear) would be near
trivial though, just a matter of figuring out where/how to host them
(but I think infra can chime in on that if that's what gets decided).

Cool. Because of the limitation of running two VMs at a time on macos and the
comparatively low cost of mac minis, it seems they beat alternative models by
a fair bit.

Pginfra/sysadmin: ^

Based on being off by an order of magnitude, as you mention earlier, it seems
that:

1) reducing test runtime etc, as already in progress
2) getting 2 mac minis as runners
3) using ~350 USD / mo in GCP costs for windows, linux, freebsd (*)

Would be viable for a month or three? I hope we can get some cloud providers
to chip in for 3), but I'd like to have something in place that doesn't depend
on that.

Given the cost of macos VMs at AWS, the only one of the big cloud providers to
have macos instances, I think we'd burn through credits pointlessly quickly if
we used VMs for that.

(*) I think we should be able to get below that, but ...

The other likely option would be to seek out cloud credits from one of
the big three (or others); Amazon has continually said they would be
happy to donate more credits to us if we had a use, and I think some
of the other hosting providers have said similarly at times; so we'd
need to ask and hope it's not too bureaucratic.

Yep.

I tried to start that process within Microsoft, fwiw. Perhaps Joe and
Jonathan know how to start within AWS? And perhaps Noah inside GCP?

It'd be the least work to get it up and running in GCP, as it's already
running there, but should be quite doable at the others as well.

Greetings,

Andres Freund

#14Juan José Santamaría Flecha
juanjo.santamaria@gmail.com
In reply to: Andres Freund (#11)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

On Wed, Aug 9, 2023 at 3:26 AM Andres Freund <andres@anarazel.de> wrote:

6) Disable write cache flushes on windows

It's a bit ugly to do this without using the UI... Shaves off about 30s
from the tests.

A brief comment would be nice: "We don't care about persistence over hard
crashes in the CI, so disable write cache flushes to speed it up."

Turns out that patch doesn't work on its own anyway, at least not
reliably... I tested it by interactively logging into a windows vm and
testing it there. It doesn't actually seem to suffice when run in isolation,
because the relevant registry key doesn't yet exist. I haven't yet figured
out the magic incantations for adding the missing "intermediary", but I'm
getting there...

You can find a good example of how to accomplish this in:

https://github.com/farag2/Utilities/blob/master/Enable_disk_write_caching.ps1

Regards,

Juan José Santamaría Flecha

#15Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Andres Freund (#13)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hello

So pginfra had a little chat about this. Firstly, there's consensus
that it makes sense for pginfra to help out with some persistent workers
in our existing VM system; however there are some aspects that need
some further discussion, to avoid destabilizing the rest of the
infrastructure. We're looking into it and we'll let you know.

Hosting a couple of Mac Minis is definitely a possibility, if some
entity like SPI buys them. Let's take this off-list to arrange the
details.

Regards

--
Álvaro Herrera

#16Noah Misch
noah@leadboat.com
In reply to: Andres Freund (#13)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

On Tue, Aug 08, 2023 at 07:59:55PM -0700, Andres Freund wrote:

On 2023-08-08 22:29:50 -0400, Robert Treat wrote:

3) using ~350 USD / mo in GCP costs for windows, linux, freebsd (*)

The other likely option would be to seek out cloud credits

I tried to start that process within Microsoft, fwiw. Perhaps Joe and
Jonathan know how to start within AWS? And perhaps Noah inside GCP?

It'd be the least work to get it up and running in GCP, as it's already
running there

I'm looking at this. Thanks for bringing it to my attention.

#17Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
10 attachment(s)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-07 19:15:41 -0700, Andres Freund wrote:

As some of you might have seen when running CI, cirrus-ci is restricting how
much CI cycles everyone can use for free (announcement at [1]). This takes
effect September 1st.

This obviously has consequences both for individual users of CI as well as
cfbot.

[...]

Potential paths forward for individual CI:

- migrate wholesale to another CI provider

- split CI tasks across different CI providers, rely on github et al
displaying the CI status for different platforms

- give up

Potential paths forward for cfbot, in addition to the above:

- Pay for compute / ask the various cloud providers to grant us compute
credits. At least some of the cloud providers can be used via cirrus-ci.

- Host (some) CI runners ourselves. Particularly with macos and windows, that
could provide significant savings.

To make that possible, we need to make the compute resources for CI
configurable on a per-repository basis. After experimenting with a bunch of
ways to do that, I got stuck on that for a while. But since today we have
sufficient macos runners for cfbot available, so... I think the approach I
finally settled on is decent, although not great. It's described in the "main"
commit message:
ci: Prepare to make compute resources for CI configurable

cirrus-ci will soon restrict the amount of free resources every user gets (as
have many other CI providers). For most users of CI that should not be an
issue. But e.g. for cfbot it will be an issue.

To allow configuring different resources on a per-repository basis, introduce
infrastructure for overriding the task execution environment. Unfortunately
this is not entirely trivial, as yaml anchors have to be defined before their
use, and cirrus-ci only allows injecting additional contents at the end of
.cirrus.yml.

To deal with that, move the definition of the CI tasks to
.cirrus.tasks.yml. The main .cirrus.yml is loaded first, then, if defined, the
file referenced by the REPO_CI_CONFIG_GIT_URL variable, will be added,
followed by the contents of .cirrus.tasks.yml. That allows
REPO_CI_CONFIG_GIT_URL to override the yaml anchors defined in .cirrus.yml.

Unfortunately git's default merge / rebase strategy does not handle copied
files, just renamed ones. To avoid painful rebasing over this change, this
commit just renames .cirrus.yml to .cirrus.tasks.yml, without adding a new
.cirrus.yml. That's done in the followup commit, which moves the relevant
portion of .cirrus.tasks.yml to .cirrus.yml. Until that is done,
REPO_CI_CONFIG_GIT_URL does not fully work.

The subsequent commit adds documentation for how to configure custom compute
resources to src/tools/ci/README

Discussion: /messages/by-id/20230808021541.7lbzdefvma7qmn3w@awork3.anarazel.de
Backpatch: 15-, where CI support was added
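To illustrate the mechanism: a hypothetical file referenced via REPO_CI_CONFIG_GIT_URL could redefine one of the anchors, roughly like this (the names follow the templates introduced later in this series; the contents are just a sketch, not what cfbot actually uses):

```yaml
# Loaded after .cirrus.yml but before .cirrus.tasks.yml. Since YAML aliases
# bind to the most recent anchor definition, tasks in .cirrus.tasks.yml
# will pick up this execution environment instead of the default one.
default_linux_task_template: &linux_task_template
  env:
    PLATFORM: linux
  compute_engine_instance:
    image_project: my-ci-project      # hypothetical self-funded GCP project
    image: family/$IMAGE_FAMILY
    platform: linux
    cpu: $CPUS
    disk: $DISK_SIZE
```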

I don't love moving most of the contents of .cirrus.yml into a new file, but I
don't see another way. I did implement it without that as well (see [1]), but
that ends up considerably harder to understand, and hardcodes what cfbot
needs. Splitting the commit, as explained above, at least makes git rebase
fairly painless. FWIW, I did merge the changes into 15, with only reasonable
conflicts (due to new tasks, autoconf->meson).

A prerequisite commit converts "SanityCheck" and "CompilerWarnings" to use a
full VM instead of a container - that way providing custom compute resources
doesn't have to deal with containers in addition to VMs. It also looks like
the increased startup overhead is outweighed by the reduction in runtime
overhead.

I'm hoping to push this fairly soon, as I'll be on vacation the last week of
August. I'll be online intermittently though; if there are issues, I can react
(very limited connectivity from midday Aug 29th to midday Aug 31st though). I'd
appreciate a quick review or two.

Greetings,

Andres Freund

[1]: https://github.com/anarazel/postgres/commit/b95fd302161b951f1dc14d586162ed3d85564bfc

Attachments:

v3-0001-ci-Don-t-specify-amount-of-memory.patch (text/x-diff; charset=us-ascii)
From d1fa394bf9318f08c1c529160c3e6cedc5bb5289 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 17:31:15 -0700
Subject: [PATCH v3 01/10] ci: Don't specify amount of memory

The number of CPUs is the cost-determining factor. Most instance types that
run tests have more memory/core than what we specified, there's no real
benefit in wasting that.

Discussion: https://postgr.es/m/20230808021541.7lbzdefvma7qmn3w@awork3.anarazel.de
Backpatch: 15-, where CI support was added
---
 .cirrus.yml | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 727c434de40..9e84eb95be5 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -149,7 +149,6 @@ task:
     image: family/pg-ci-freebsd-13
     platform: freebsd
     cpu: $CPUS
-    memory: 4G
     disk: 50
 
   sysinfo_script: |
@@ -291,7 +290,6 @@ task:
     image: family/pg-ci-bullseye
     platform: linux
     cpu: $CPUS
-    memory: 4G
 
   ccache_cache:
     folder: ${CCACHE_DIR}
@@ -558,7 +556,6 @@ task:
     image: family/pg-ci-windows-ci-vs-2019
     platform: windows
     cpu: $CPUS
-    memory: 4G
 
   setup_additional_packages_script: |
     REM choco install -y --no-progress ...
@@ -606,7 +603,6 @@ task:
     image: family/pg-ci-windows-ci-mingw64
     platform: windows
     cpu: $CPUS
-    memory: 4G
 
   env:
     TEST_JOBS: 4 # higher concurrency causes occasional failures
-- 
2.38.0

v3-0002-ci-Move-execution-method-of-tasks-into-yaml-templ.patch (text/x-diff; charset=us-ascii)
From dee50464cd75c1eadc46e7cc673d23d6bf01e6b6 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 22 Aug 2023 21:25:28 -0700
Subject: [PATCH v3 02/10] ci: Move execution method of tasks into yaml
 templates

This is done in preparation for making the compute resources for CI
configurable. It also looks cleaner.

Discussion: https://postgr.es/m/20230808021541.7lbzdefvma7qmn3w@awork3.anarazel.de
Backpatch: 15-, where CI support was added
---
 .cirrus.yml | 85 +++++++++++++++++++++++++++++++++++------------------
 1 file changed, 57 insertions(+), 28 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 9e84eb95be5..75747b9b651 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -9,6 +9,7 @@ env:
   GCP_PROJECT: pg-ci-images
   IMAGE_PROJECT: $GCP_PROJECT
   CONTAINER_REPO: us-docker.pkg.dev/${GCP_PROJECT}/ci
+  DISK_SIZE: 25
 
   # The lower depth accelerates git clone. Use a bit of depth so that
   # concurrent tasks and retrying older jobs have a chance of working.
@@ -28,6 +29,45 @@ env:
   PG_TEST_EXTRA: kerberos ldap ssl load_balance
 
 
+# Define how to run various types of tasks.
+
+# VMs provided by cirrus-ci. Each user has a limited number of "free" credits
+# for testing.
+cirrus_community_vm_template: &cirrus_community_vm_template
+  compute_engine_instance:
+    image_project: $IMAGE_PROJECT
+    image: family/$IMAGE_FAMILY
+    platform: $PLATFORM
+    cpu: $CPUS
+    disk: $DISK_SIZE
+
+
+default_linux_task_template: &linux_task_template
+  env:
+    PLATFORM: linux
+  <<: *cirrus_community_vm_template
+
+
+default_freebsd_task_template: &freebsd_task_template
+  env:
+    PLATFORM: freebsd
+  <<: *cirrus_community_vm_template
+
+
+default_windows_task_template: &windows_task_template
+  env:
+    PLATFORM: windows
+  <<: *cirrus_community_vm_template
+
+
+# macos workers provided by cirrus-ci
+default_macos_task_template: &macos_task_template
+  env:
+    PLATFORM: macos
+  macos_instance:
+    image: $IMAGE
+
+
 # What files to preserve in case tests fail
 on_failure_ac: &on_failure_ac
   log_artifacts:
@@ -136,21 +176,18 @@ task:
     CPUS: 2
     BUILD_JOBS: 3
     TEST_JOBS: 3
+    IMAGE_FAMILY: pg-ci-freebsd-13
+    DISK_SIZE: 50
 
     CCACHE_DIR: /tmp/ccache_dir
     CPPFLAGS: -DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
     CFLAGS: -Og -ggdb
 
+  <<: *freebsd_task_template
+
   depends_on: SanityCheck
   only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*freebsd.*'
 
-  compute_engine_instance:
-    image_project: $IMAGE_PROJECT
-    image: family/pg-ci-freebsd-13
-    platform: freebsd
-    cpu: $CPUS
-    disk: 50
-
   sysinfo_script: |
     id
     uname -a
@@ -250,6 +287,7 @@ task:
     CPUS: 4
     BUILD_JOBS: 4
     TEST_JOBS: 8 # experimentally derived to be a decent choice
+    IMAGE_FAMILY: pg-ci-bullseye
 
     CCACHE_DIR: /tmp/ccache_dir
     DEBUGINFOD_URLS: "https://debuginfod.debian.net"
@@ -282,15 +320,11 @@ task:
     LINUX_CONFIGURE_FEATURES: *LINUX_CONFIGURE_FEATURES
     LINUX_MESON_FEATURES: *LINUX_MESON_FEATURES
 
+  <<: *linux_task_template
+
   depends_on: SanityCheck
   only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*linux.*'
 
-  compute_engine_instance:
-    image_project: $IMAGE_PROJECT
-    image: family/pg-ci-bullseye
-    platform: linux
-    cpu: $CPUS
-
   ccache_cache:
     folder: ${CCACHE_DIR}
 
@@ -430,6 +464,7 @@ task:
     # work OK. See
     # https://postgr.es/m/20220927040208.l3shfcidovpzqxfh%40awork3.anarazel.de
     TEST_JOBS: 8
+    IMAGE: ghcr.io/cirruslabs/macos-ventura-base:latest
 
     CIRRUS_WORKING_DIR: ${HOME}/pgsql/
     CCACHE_DIR: ${HOME}/ccache
@@ -440,12 +475,11 @@ task:
     CFLAGS: -Og -ggdb
     CXXFLAGS: -Og -ggdb
 
+  <<: *macos_task_template
+
   depends_on: SanityCheck
   only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(macos|darwin|osx).*'
 
-  macos_instance:
-    image: ghcr.io/cirruslabs/macos-ventura-base:latest
-
   sysinfo_script: |
     id
     uname -a
@@ -524,6 +558,7 @@ WINDOWS_ENVIRONMENT_BASE: &WINDOWS_ENVIRONMENT_BASE
     # Avoids port conflicts between concurrent tap test runs
     PG_TEST_USE_UNIX_SOCKETS: 1
     PG_REGRESS_SOCK_DIR: "c:/cirrus/"
+    DISK_SIZE: 50
 
   sysinfo_script: |
     chcp
@@ -547,16 +582,13 @@ task:
     # given that it explicitly prevents crash dumps from working...
     # 0x8001 is SEM_FAILCRITICALERRORS | SEM_NOOPENFILEERRORBOX
     CIRRUS_WINDOWS_ERROR_MODE: 0x8001
+    IMAGE_FAMILY: pg-ci-windows-ci-vs-2019
+
+  <<: *windows_task_template
 
   depends_on: SanityCheck
   only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*windows.*'
 
-  compute_engine_instance:
-    image_project: $IMAGE_PROJECT
-    image: family/pg-ci-windows-ci-vs-2019
-    platform: windows
-    cpu: $CPUS
-
   setup_additional_packages_script: |
     REM choco install -y --no-progress ...
 
@@ -598,12 +630,6 @@ task:
   # otherwise it'll be sorted before other tasks
   depends_on: SanityCheck
 
-  compute_engine_instance:
-    image_project: $IMAGE_PROJECT
-    image: family/pg-ci-windows-ci-mingw64
-    platform: windows
-    cpu: $CPUS
-
   env:
     TEST_JOBS: 4 # higher concurrency causes occasional failures
     CCACHE_DIR: C:/msys64/ccache
@@ -617,6 +643,9 @@ task:
     # Start bash in current working directory
     CHERE_INVOKING: 1
     BASH: C:\msys64\usr\bin\bash.exe -l
+    IMAGE_FAMILY: pg-ci-windows-ci-mingw64
+
+  <<: *windows_task_template
 
   ccache_cache:
     folder: ${CCACHE_DIR}
-- 
2.38.0
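As an aside, the way the `<<:` merge keys in the templates above compose can be illustrated with plain dicts. This is a hypothetical Python sketch of the yaml semantics, not part of the patch; the names mirror the anchors in .cirrus.tasks.yml:

```python
# Sketch of yaml merge-key semantics: `<<: *anchor` pulls in the anchored
# mapping's keys, and keys spelled out explicitly in the referencing mapping
# take precedence over merged-in ones.
cirrus_community_vm_template = {
    "compute_engine_instance": {
        "image_project": "$IMAGE_PROJECT",
        "image": "family/$IMAGE_FAMILY",
        "platform": "$PLATFORM",
        "cpu": "$CPUS",
        "disk": "$DISK_SIZE",
    },
}

# Equivalent of:
#   default_freebsd_task_template: &freebsd_task_template
#     env:
#       PLATFORM: freebsd
#     <<: *cirrus_community_vm_template
freebsd_task_template = {
    **cirrus_community_vm_template,  # <<: *cirrus_community_vm_template
    "env": {"PLATFORM": "freebsd"},  # explicit key wins over merged keys
}
```

Each task then merges one of these templates and only needs to set CPUS, IMAGE_FAMILY, etc. in its own env, which is what makes the execution method overridable later.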

v3-0003-ci-Use-VMs-for-SanityCheck-and-CompilerWarnings.patch
From a60733b7bb86e93a3d23e45b98ce179d0f4e26fd Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 22 Aug 2023 21:28:37 -0700
Subject: [PATCH v3 03/10] ci: Use VMs for SanityCheck and CompilerWarnings

The main reason for this change is to reduce the number of different ways of
executing tasks, making it easier to use custom compute resources for
cfbot. A secondary benefit is that the tasks seem slightly faster this way;
apparently the increased startup overhead is outweighed by reduced runtime
overhead.

Discussion: https://postgr.es/m/20230808021541.7lbzdefvma7qmn3w@awork3.anarazel.de
Backpatch: 15-, where CI support was added
---
 .cirrus.yml | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 75747b9b651..f4276ad8692 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -110,15 +110,14 @@ task:
     CPUS: 4
     BUILD_JOBS: 8
     TEST_JOBS: 8
+    IMAGE_FAMILY: pg-ci-bullseye
     CCACHE_DIR: ${CIRRUS_WORKING_DIR}/ccache_dir
     # no options enabled, should be small
     CCACHE_MAXSIZE: "150M"
 
-  # Container starts up quickly, but is slower at runtime, particularly for
-  # tests. Good for the briefly running sanity check.
-  container:
-    image: $CONTAINER_REPO/linux_debian_bullseye_ci:latest
-    cpu: $CPUS
+  # While containers would start up a bit quicker, building is a bit
+  # slower. This way we don't have to maintain a container image.
+  <<: *linux_task_template
 
   ccache_cache:
     folder: $CCACHE_DIR
@@ -691,6 +690,7 @@ task:
   env:
     CPUS: 4
     BUILD_JOBS: 4
+    IMAGE_FAMILY: pg-ci-bullseye
 
     # Use larger ccache cache, as this task compiles with multiple compilers /
     # flag combinations
@@ -700,9 +700,7 @@ task:
     LINUX_CONFIGURE_FEATURES: *LINUX_CONFIGURE_FEATURES
     LINUX_MESON_FEATURES: *LINUX_MESON_FEATURES
 
-  container:
-    image: $CONTAINER_REPO/linux_debian_bullseye_ci:latest
-    cpu: $CPUS
+  <<: *linux_task_template
 
   sysinfo_script: |
     id
-- 
2.38.0

v3-0004-ci-Prepare-to-make-compute-resources-for-CI-confi.patch
From a4e238c4c4426f371535fb88c4046fb9e127c923 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 22 Aug 2023 23:48:32 -0700
Subject: [PATCH v3 04/10] ci: Prepare to make compute resources for CI
 configurable

cirrus-ci will soon restrict the amount of free resources every user gets (as
many other CI providers already have). For most users of CI that should not be
an issue. But e.g. for cfbot it will be an issue.

To allow configuring different resources on a per-repository basis, introduce
infrastructure for overriding the task execution environment. Unfortunately
this is not entirely trivial, as yaml anchors have to be defined before their
use, and cirrus-ci only allows injecting additional contents at the end of
.cirrus.yml.

To deal with that, move the definition of the CI tasks to
.cirrus.tasks.yml. The main .cirrus.yml is loaded first, then, if defined, the
file referenced by the REPO_CI_CONFIG_GIT_URL variable will be added,
followed by the contents of .cirrus.tasks.yml. That allows
REPO_CI_CONFIG_GIT_URL to override the yaml anchors defined in .cirrus.yml.

Unfortunately git's default merge / rebase strategy does not handle copied
files, just renamed ones. To avoid painful rebasing over this change, this
commit just renames .cirrus.yml to .cirrus.tasks.yml, without adding a new
.cirrus.yml. That's done in the followup commit, which moves the relevant
portion of .cirrus.tasks.yml to .cirrus.yml.  Until that is done,
REPO_CI_CONFIG_GIT_URL does not fully work.

The subsequent commit adds documentation for how to configure custom compute
resources to src/tools/ci/README.

Discussion: https://postgr.es/m/20230808021541.7lbzdefvma7qmn3w@awork3.anarazel.de
Backpatch: 15-, where CI support was added
---
 .cirrus.star                     | 63 ++++++++++++++++++++++++++++++++
 .cirrus.yml => .cirrus.tasks.yml |  0
 2 files changed, 63 insertions(+)
 create mode 100644 .cirrus.star
 rename .cirrus.yml => .cirrus.tasks.yml (100%)

diff --git a/.cirrus.star b/.cirrus.star
new file mode 100644
index 00000000000..f3ea9b943e4
--- /dev/null
+++ b/.cirrus.star
@@ -0,0 +1,63 @@
+"""Additional CI configuration, using the starlark language. See
+https://cirrus-ci.org/guide/programming-tasks/#introduction-into-starlark
+
+See also the starlark specification at
+https://github.com/bazelbuild/starlark/blob/master/spec.md
+
+See also .cirrus.yml and src/tools/ci/README
+"""
+
+load("cirrus", "env", "fs")
+
+
+def main():
+    """The main function is executed by cirrus-ci after loading .cirrus.yml and can
+    extend the CI definition further.
+
+    As documented in .cirrus.yml, the final CI configuration is composed of
+
+    1) the contents of this file
+
+    2) if defined, the contents of the file referenced by the
+       repository-level REPO_CI_CONFIG_GIT_URL variable (see
+       https://cirrus-ci.org/guide/programming-tasks/#fs for the accepted
+       format)
+
+    3) .cirrus.tasks.yml
+    """
+
+    output = ""
+
+    # 1) is included implicitly
+
+    # Add 2)
+    repo_config_url = env.get("REPO_CI_CONFIG_GIT_URL")
+    if repo_config_url != None:
+        print("loading additional configuration from \"{}\"".format(repo_config_url))
+        output += config_from(repo_config_url)
+    else:
+        output += "\n# REPO_CI_CONFIG_GIT_URL was not set\n"
+
+    # Add 3)
+    output += config_from(".cirrus.tasks.yml")
+
+    return output
+
+
+def config_from(config_src):
+    """return contents of config file `config_src`, surrounded by markers
+    indicating start / end of the the included file
+    """
+
+    config_contents = fs.read(config_src)
+    config_fmt = """
+
+###
+# contents of config file `{0}` start here
+###
+{1}
+###
+# contents of config file `{0}` end here
+###
+"""
+    return config_fmt.format(config_src, config_contents)
diff --git a/.cirrus.yml b/.cirrus.tasks.yml
similarity index 100%
rename from .cirrus.yml
rename to .cirrus.tasks.yml
-- 
2.38.0
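To make the composition order easier to review, the effect of main() in .cirrus.star can be rendered in plain Python. This is an illustrative sketch only (the function and parameter names are mine, and the real code fetches 2) via cirrus' fs module):

```python
def config_from(name, contents):
    # Mirrors config_from() in .cirrus.star: wrap the file contents in
    # start/end markers so the combined configuration is easy to read.
    fmt = ("\n\n###\n# contents of config file `{0}` start here\n###\n"
           "{1}\n###\n# contents of config file `{0}` end here\n###\n")
    return fmt.format(name, contents)


def compose(repo_config, tasks_yml):
    # repo_config is None, or a (name, contents) pair standing in for the
    # file referenced by REPO_CI_CONFIG_GIT_URL. Part 1), .cirrus.yml
    # itself, is included implicitly by cirrus-ci.
    output = ""
    if repo_config is not None:
        output += config_from(*repo_config)
    else:
        output += "\n# REPO_CI_CONFIG_GIT_URL was not set\n"
    output += config_from(".cirrus.tasks.yml", tasks_yml)
    return output
```

The key property is the ordering: any repository-level override is emitted before .cirrus.tasks.yml, so its yaml anchors shadow the defaults from .cirrus.yml when the task definitions reference them.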

v3-0005-ci-Make-compute-resources-for-CI-configurable.patch
From 53a4c9642dda8ffea3c7b73e97f4b23ea0a4f2a3 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 22 Aug 2023 22:26:54 -0700
Subject: [PATCH v3 05/10] ci: Make compute resources for CI configurable

See prior commit for an explanation for the goal of the change and why it had
to be split into two commits.

Discussion: https://postgr.es/m/20230808021541.7lbzdefvma7qmn3w@awork3.anarazel.de
Backpatch: 15-, where CI support was added
---
 .cirrus.yml         | 73 +++++++++++++++++++++++++++++++++++++++++++++
 .cirrus.tasks.yml   | 45 ----------------------------
 src/tools/ci/README | 17 +++++++++++
 3 files changed, 90 insertions(+), 45 deletions(-)
 create mode 100644 .cirrus.yml

diff --git a/.cirrus.yml b/.cirrus.yml
new file mode 100644
index 00000000000..a83129ae46d
--- /dev/null
+++ b/.cirrus.yml
@@ -0,0 +1,73 @@
+# CI configuration file for CI utilizing cirrus-ci.org
+#
+# For instructions on how to enable the CI integration in a repository and
+# further details, see src/tools/ci/README
+#
+#
+# The actual CI tasks are defined in .cirrus.tasks.yml. To make the compute
+# resources for CI configurable on a repository level, the "final" CI
+# configuration is the combination of:
+#
+# 1) the contents of this file
+#
+# 2) if defined, the contents of the file referenced by the
+#    repository-level REPO_CI_CONFIG_GIT_URL variable (see
+#    https://cirrus-ci.org/guide/programming-tasks/#fs for the accepted
+#    format)
+#
+# 3) .cirrus.tasks.yml
+#
+# This composition is done by .cirrus.star
+
+
+env:
+  # Source of images / containers
+  GCP_PROJECT: pg-ci-images
+  IMAGE_PROJECT: $GCP_PROJECT
+  CONTAINER_REPO: us-docker.pkg.dev/${GCP_PROJECT}/ci
+  DISK_SIZE: 25
+
+
+# Define how to run various types of tasks.
+
+# VMs provided by cirrus-ci. Each user has a limited number of "free" credits
+# for testing.
+cirrus_community_vm_template: &cirrus_community_vm_template
+  compute_engine_instance:
+    image_project: $IMAGE_PROJECT
+    image: family/$IMAGE_FAMILY
+    platform: $PLATFORM
+    cpu: $CPUS
+    disk: $DISK_SIZE
+
+
+default_linux_task_template: &linux_task_template
+  env:
+    PLATFORM: linux
+  <<: *cirrus_community_vm_template
+
+
+default_freebsd_task_template: &freebsd_task_template
+  env:
+    PLATFORM: freebsd
+  <<: *cirrus_community_vm_template
+
+
+default_windows_task_template: &windows_task_template
+  env:
+    PLATFORM: windows
+  <<: *cirrus_community_vm_template
+
+
+# macos workers provided by cirrus-ci
+default_macos_task_template: &macos_task_template
+  env:
+    PLATFORM: macos
+  macos_instance:
+    image: $IMAGE
+
+
+# Contents of REPO_CI_CONFIG_GIT_URL, if defined, will be inserted here,
+# followed by the contents of .cirrus.tasks.yml. This allows
+# REPO_CI_CONFIG_GIT_URL to override how the task types above will be
+# executed, e.g. using a custom compute account or permanent workers.
diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index f4276ad8692..0cf7ba77996 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -5,12 +5,6 @@
 
 
 env:
-  # Source of images / containers
-  GCP_PROJECT: pg-ci-images
-  IMAGE_PROJECT: $GCP_PROJECT
-  CONTAINER_REPO: us-docker.pkg.dev/${GCP_PROJECT}/ci
-  DISK_SIZE: 25
-
   # The lower depth accelerates git clone. Use a bit of depth so that
   # concurrent tasks and retrying older jobs have a chance of working.
   CIRRUS_CLONE_DEPTH: 500
@@ -29,45 +23,6 @@ env:
   PG_TEST_EXTRA: kerberos ldap ssl load_balance
 
 
-# Define how to run various types of tasks.
-
-# VMs provided by cirrus-ci. Each user has a limited number of "free" credits
-# for testing.
-cirrus_community_vm_template: &cirrus_community_vm_template
-  compute_engine_instance:
-    image_project: $IMAGE_PROJECT
-    image: family/$IMAGE_FAMILY
-    platform: $PLATFORM
-    cpu: $CPUS
-    disk: $DISK_SIZE
-
-
-default_linux_task_template: &linux_task_template
-  env:
-    PLATFORM: linux
-  <<: *cirrus_community_vm_template
-
-
-default_freebsd_task_template: &freebsd_task_template
-  env:
-    PLATFORM: freebsd
-  <<: *cirrus_community_vm_template
-
-
-default_windows_task_template: &windows_task_template
-  env:
-    PLATFORM: windows
-  <<: *cirrus_community_vm_template
-
-
-# macos workers provided by cirrus-ci
-default_macos_task_template: &macos_task_template
-  env:
-    PLATFORM: macos
-  macos_instance:
-    image: $IMAGE
-
-
 # What files to preserve in case tests fail
 on_failure_ac: &on_failure_ac
   log_artifacts:
diff --git a/src/tools/ci/README b/src/tools/ci/README
index 80d01939e84..30ddd200c96 100644
--- a/src/tools/ci/README
+++ b/src/tools/ci/README
@@ -65,3 +65,20 @@ messages. Currently the following controls are available:
 
   Only runs CI on operating systems specified. This can be useful when
   addressing portability issues affecting only a subset of platforms.
+
+
+Using custom compute resources for CI
+=====================================
+
+When running a lot of tests in a repository, cirrus-ci's free credits do not
+suffice. In those cases a repository can be configured to use other
+infrastructure for running tests. To do so, the REPO_CI_CONFIG_GIT_URL
+variable can be configured for the repository in the cirrus-ci web interface,
+at https://cirrus-ci.com/github/<user or organization>. The file referenced
+(see https://cirrus-ci.org/guide/programming-tasks/#fs) by the variable can
override the default execution method for different operating systems,
+defined in .cirrus.yml, by redefining the relevant yaml anchors.
+
+Custom compute resources can be provided using
+- https://cirrus-ci.org/guide/supported-computing-services/
+- https://cirrus-ci.org/guide/persistent-workers/
-- 
2.38.0

v3-0006-ci-dontmerge-Example-custom-CI-configuration.patch
From cae9e8fd2c257eaad0270f79b595eb70a743ab13 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 22 Aug 2023 21:45:31 -0700
Subject: [PATCH v3 06/10] ci: dontmerge: Example custom CI configuration

Uses a custom google compute account (freebsd, linux and windows) and
persistent workers (macos).

This only works in repositories that are configured for both. In addition,
cirrus-ci needs to be configured to set the REPO_CI_CONFIG_GIT_URL pointing to
the file added here.
---
 .cirrus.custom.yml | 49 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)
 create mode 100644 .cirrus.custom.yml

diff --git a/.cirrus.custom.yml b/.cirrus.custom.yml
new file mode 100644
index 00000000000..42cbeecc715
--- /dev/null
+++ b/.cirrus.custom.yml
@@ -0,0 +1,49 @@
+gcp_credentials:
+  workload_identity_provider: projects/1072892761768/locations/global/workloadIdentityPools/cirrus-ci-pool/providers/cirrus-oidc
+  service_account: cirrus-ci@pg-ci-runs.iam.gserviceaccount.com
+
+
+# Defaults
+gce_instance:
+  type: t2d-standard-4
+  spot: true
+  zone: us-west1-a
+  use_ssd: true
+
+
+gce_instance_template: &gce_instance_template
+  gce_instance:
+    image_project: $IMAGE_PROJECT
+    image_family: $IMAGE_FAMILY
+    platform: $PLATFORM
+    disk: $DISK_SIZE
+
+
+gce_linux_task_template: &linux_task_template
+  env:
+    PLATFORM: linux
+  <<: *gce_instance_template
+
+
+gce_freebsd_task_template: &freebsd_task_template
+  env:
+    PLATFORM: freebsd
+  <<: *gce_instance_template
+
+
+gce_windows_task_template: &windows_task_template
+  env:
+    PLATFORM: windows
+  <<: *gce_instance_template
+
+
+persistent_worker_macos_task_template: &macos_task_template
+  env:
+    PLATFORM: macos
+  persistent_worker:
+    isolation:
+      tart:
+        image: $IMAGE
+        user: admin
+        password: admin
-- 
2.38.0

v3-0007-Use-template-initdb-in-tests.patch
From 12a0287a38ed77529be2cf8a8a37d772459fa090 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 2 Feb 2023 21:51:53 -0800
Subject: [PATCH v3 07/10] Use "template" initdb in tests

Discussion: https://postgr.es/m/20220120021859.3zpsfqn4z7ob7afz@alap3.anarazel.de
---
 meson.build                              | 30 ++++++++++
 src/test/perl/PostgreSQL/Test/Cluster.pm | 46 ++++++++++++++-
 src/test/regress/pg_regress.c            | 74 ++++++++++++++++++------
 .cirrus.tasks.yml                        |  3 +-
 src/Makefile.global.in                   | 52 +++++++++--------
 5 files changed, 161 insertions(+), 44 deletions(-)

diff --git a/meson.build b/meson.build
index f5ec442f9a9..8b2b521a013 100644
--- a/meson.build
+++ b/meson.build
@@ -3070,8 +3070,10 @@ testport = 40000
 test_env = environment()
 
 temp_install_bindir = test_install_location / get_option('bindir')
+test_initdb_template = meson.build_root() / 'tmp_install' / 'initdb-template'
 test_env.set('PG_REGRESS', pg_regress.full_path())
 test_env.set('REGRESS_SHLIB', regress_module.full_path())
+test_env.set('INITDB_TEMPLATE', test_initdb_template)
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable PG_TEST_EXTRA.
@@ -3086,6 +3088,34 @@ if library_path_var != ''
 endif
 
 
+# Create the initdb template directory (removing any old one). Tests use it,
+# where possible, to make initializing test clusters cheaper.
+#
+# Use python to remove the old cached initdb, as we cannot rely on a working
+# 'rm' binary on windows.
+test('initdb_cache',
+     python,
+     args: [
+       '-c', '''
+import shutil
+import sys
+import subprocess
+
+shutil.rmtree(sys.argv[1], ignore_errors=True)
+sp = subprocess.run(sys.argv[2:] + [sys.argv[1]])
+sys.exit(sp.returncode)
+''',
+       test_initdb_template,
+       temp_install_bindir / 'initdb',
+       '-A', 'trust', '-N', '--no-instructions'
+     ],
+     priority: setup_tests_priority - 1,
+     timeout: 300,
+     is_parallel: false,
+     env: test_env,
+     suite: ['setup'])
+
+
 
 ###############################################################
 # Test Generation
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 5e161dbee60..4d449c35de9 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -522,8 +522,50 @@ sub init
 	mkdir $self->backup_dir;
 	mkdir $self->archive_dir;
 
-	PostgreSQL::Test::Utils::system_or_bail('initdb', '-D', $pgdata, '-A',
-		'trust', '-N', @{ $params{extra} });
+	# If available and if there aren't any parameters, use a previously
+	# initdb'd cluster as a template by copying it. For a lot of tests, that's
+	# substantially cheaper. Do so only if there aren't parameters, as it
+	# doesn't seem worth figuring out whether they affect compatibility.
+	#
+	# There's very similar code in pg_regress.c, but we can't easily
+	# deduplicate it until we require perl at build time.
+	if (defined $params{extra} or !defined $ENV{INITDB_TEMPLATE})
+	{
+		note("initializing database system by running initdb");
+		PostgreSQL::Test::Utils::system_or_bail('initdb', '-D', $pgdata, '-A',
+			'trust', '-N', @{ $params{extra} });
+	}
+	else
+	{
+		my @copycmd;
+		my $expected_exitcode;
+
+		note("initializing database system by copying initdb template");
+
+		if ($PostgreSQL::Test::Utils::windows_os)
+		{
+			@copycmd = qw(robocopy /E /NJS /NJH /NFL /NDL /NP);
+			$expected_exitcode = 1;    # 1 denotes files were copied
+		}
+		else
+		{
+			@copycmd = qw(cp -a);
+			$expected_exitcode = 0;
+		}
+
+		@copycmd = (@copycmd, $ENV{INITDB_TEMPLATE}, $pgdata);
+
+		my $ret = PostgreSQL::Test::Utils::system_log(@copycmd);
+
+		# See http://perldoc.perl.org/perlvar.html#%24CHILD_ERROR
+		if ($ret & 127 or $ret >> 8 != $expected_exitcode)
+		{
+			BAIL_OUT(
+				sprintf("failed to execute command \"%s\": $ret",
+					join(" ", @copycmd)));
+		}
+	}
+
 	PostgreSQL::Test::Utils::system_or_bail($ENV{PG_REGRESS},
 		'--config-auth', $pgdata, @{ $params{auth_extra} });
 
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index b68632320a7..407e3915cec 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2295,6 +2295,7 @@ regression_main(int argc, char *argv[],
 		FILE	   *pg_conf;
 		const char *env_wait;
 		int			wait_seconds;
+		const char *initdb_template_dir;
 
 		/*
 		 * Prepare the temp instance
@@ -2316,25 +2317,64 @@ regression_main(int argc, char *argv[],
 		if (!directory_exists(buf))
 			make_directory(buf);
 
-		/* initdb */
 		initStringInfo(&cmd);
-		appendStringInfo(&cmd,
-						 "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync",
-						 bindir ? bindir : "",
-						 bindir ? "/" : "",
-						 temp_instance);
-		if (debug)
-			appendStringInfo(&cmd, " --debug");
-		if (nolocale)
-			appendStringInfo(&cmd, " --no-locale");
-		appendStringInfo(&cmd, " > \"%s/log/initdb.log\" 2>&1", outputdir);
-		fflush(NULL);
-		if (system(cmd.data))
+
+		/*
+		 * Create data directory.
+		 *
+		 * If available, use a previously initdb'd cluster as a template by
+		 * copying it. For a lot of tests, that's substantially cheaper.
+		 *
+		 * There's very similar code in Cluster.pm, but we can't easily
+		 * deduplicate it until we require perl at build time.
+		 */
+		initdb_template_dir = getenv("INITDB_TEMPLATE");
+		if (initdb_template_dir == NULL || nolocale || debug)
 		{
-			bail("initdb failed\n"
-				 "# Examine \"%s/log/initdb.log\" for the reason.\n"
-				 "# Command was: %s",
-				 outputdir, cmd.data);
+			note("initializing database system by running initdb");
+
+			appendStringInfo(&cmd,
+							 "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync",
+							 bindir ? bindir : "",
+							 bindir ? "/" : "",
+							 temp_instance);
+			if (debug)
+				appendStringInfo(&cmd, " --debug");
+			if (nolocale)
+				appendStringInfo(&cmd, " --no-locale");
+			appendStringInfo(&cmd, " > \"%s/log/initdb.log\" 2>&1", outputdir);
+			fflush(NULL);
+			if (system(cmd.data))
+			{
+				bail("initdb failed\n"
+					 "# Examine \"%s/log/initdb.log\" for the reason.\n"
+					 "# Command was: %s",
+					 outputdir, cmd.data);
+			}
+		}
+		else
+		{
+#ifndef WIN32
+			const char *copycmd = "cp -a \"%s\" \"%s/data\"";
+			int			expected_exitcode = 0;
+#else
+			const char *copycmd = "robocopy /E /NJS /NJH /NFL /NDL /NP \"%s\" \"%s/data\"";
+			int			expected_exitcode = 1;	/* 1 denotes files were copied */
+#endif
+
+			note("initializing database system by copying initdb template");
+
+			appendStringInfo(&cmd,
+							 copycmd,
+							 initdb_template_dir,
+							 temp_instance);
+			if (system(cmd.data) != expected_exitcode)
+			{
+				bail("copying of initdb template failed\n"
+					 "# Examine \"%s/log/initdb.log\" for the reason.\n"
+					 "# Command was: %s",
+					 outputdir, cmd.data);
+			}
 		}
 
 		pfree(cmd.data);
diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 0cf7ba77996..e137769850d 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -109,8 +109,9 @@ task:
   test_minimal_script: |
     su postgres <<-EOF
       ulimit -c unlimited
+      meson test $MTEST_ARGS --suite setup
       meson test $MTEST_ARGS --num-processes ${TEST_JOBS} \
-        tmp_install cube/regress pg_ctl/001_start_stop
+        cube/regress pg_ctl/001_start_stop
     EOF
 
   on_failure:
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index df9f721a41a..0b4ca0eb6ae 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -397,30 +397,6 @@ check: temp-install
 
 .PHONY: temp-install
 
-temp-install: | submake-generated-headers
-ifndef NO_TEMP_INSTALL
-ifneq ($(abs_top_builddir),)
-ifeq ($(MAKELEVEL),0)
-	rm -rf '$(abs_top_builddir)'/tmp_install
-	$(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
-	$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
-	$(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
-endif
-endif
-endif
-
-# Tasks to run serially at the end of temp-install.  Some EXTRA_INSTALL
-# entries appear more than once in the tree, and parallel installs of the same
-# file can fail with EEXIST.
-checkprep:
-	$(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install || exit; done)
-
-PROVE = @PROVE@
-# There are common routines in src/test/perl, and some test suites have
-# extra perl modules in their own directory.
-PG_PROVE_FLAGS = -I $(top_srcdir)/src/test/perl/ -I $(srcdir)
-# User-supplied prove flags such as --verbose can be provided in PROVE_FLAGS.
-PROVE_FLAGS =
 
 # prepend to path if already set, else just set it
 define add_to_path
@@ -437,8 +413,36 @@ ld_library_path_var = LD_LIBRARY_PATH
 with_temp_install = \
 	PATH="$(abs_top_builddir)/tmp_install$(bindir):$(CURDIR):$$PATH" \
 	$(call add_to_path,$(strip $(ld_library_path_var)),$(abs_top_builddir)/tmp_install$(libdir)) \
+	INITDB_TEMPLATE='$(abs_top_builddir)'/tmp_install/initdb-template \
 	$(with_temp_install_extra)
 
+temp-install: | submake-generated-headers
+ifndef NO_TEMP_INSTALL
+ifneq ($(abs_top_builddir),)
+ifeq ($(MAKELEVEL),0)
+	rm -rf '$(abs_top_builddir)'/tmp_install
+	$(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
+	$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
+	$(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
+
+	$(with_temp_install) initdb -A trust -N --no-instructions '$(abs_top_builddir)'/tmp_install/initdb-template >>'$(abs_top_builddir)'/tmp_install/log/initdb-template.log 2>&1
+endif
+endif
+endif
+
+# Tasks to run serially at the end of temp-install.  Some EXTRA_INSTALL
+# entries appear more than once in the tree, and parallel installs of the same
+# file can fail with EEXIST.
+checkprep:
+	$(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install || exit; done)
+
+PROVE = @PROVE@
+# There are common routines in src/test/perl, and some test suites have
+# extra perl modules in their own directory.
+PG_PROVE_FLAGS = -I $(top_srcdir)/src/test/perl/ -I $(srcdir)
+# User-supplied prove flags such as --verbose can be provided in PROVE_FLAGS.
+PROVE_FLAGS =
+
 ifeq ($(enable_tap_tests),yes)
 
 ifndef PGXS
-- 
2.38.0
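The platform-dependent copy logic that is duplicated between Cluster.pm and pg_regress.c boils down to picking a command plus its expected exit code. A standalone sketch (Python, illustrative only; the function name is mine):

```python
def build_copy_command(template_dir, pgdata, windows):
    """Return (argv, expected_exit_code) for copying the initdb template.

    Mirrors the logic in Cluster.pm / pg_regress.c: robocopy reports exit
    code 1 when files were copied, while plain `cp -a` reports 0 on success.
    """
    if windows:
        return (["robocopy", "/E", "/NJS", "/NJH", "/NFL", "/NDL", "/NP",
                 template_dir, pgdata], 1)
    return (["cp", "-a", template_dir, pgdata], 0)
```

Treating robocopy's exit code 1 as success is the subtle part; comparing against an expected code, rather than against zero, keeps one code path for both platforms.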

v3-0008-ci-switch-tasks-to-debugoptimized-build.patch
From a30d92a68b4e7a925cc9fccb3c83674f2c6407af Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 17:27:51 -0700
Subject: [PATCH v3 08/10] ci: switch tasks to debugoptimized build

In aggregate, the CI tasks burn a lot of CPU hours. Compared to that,
easy-to-read backtraces aren't as important. Still use -ggdb where
appropriate, as that does make backtraces more reliable, particularly in the
face of optimization.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 .cirrus.tasks.yml | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index e137769850d..d1730ce08a8 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -136,7 +136,7 @@ task:
 
     CCACHE_DIR: /tmp/ccache_dir
     CPPFLAGS: -DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
-    CFLAGS: -Og -ggdb
+    CFLAGS: -ggdb
 
   <<: *freebsd_task_template
 
@@ -171,7 +171,7 @@ task:
   configure_script: |
     su postgres <<-EOF
       meson setup \
-        --buildtype=debug \
+        --buildtype=debugoptimized \
         -Dcassert=true -Duuid=bsd -Dtcl_version=tcl86 -Ddtrace=auto \
         -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
         -Dextra_lib_dirs=/usr/local/lib -Dextra_include_dirs=/usr/local/include/ \
@@ -266,7 +266,7 @@ task:
     ASAN_OPTIONS: print_stacktrace=1:disable_coredump=0:abort_on_error=1:detect_leaks=0
 
     # SANITIZER_FLAGS is set in the tasks below
-    CFLAGS: -Og -ggdb -fno-sanitize-recover=all $SANITIZER_FLAGS
+    CFLAGS: -ggdb -fno-sanitize-recover=all $SANITIZER_FLAGS
     CXXFLAGS: $CFLAGS
     LDFLAGS: $SANITIZER_FLAGS
     CC: ccache gcc
@@ -356,7 +356,7 @@ task:
       configure_script: |
         su postgres <<-EOF
           meson setup \
-            --buildtype=debug \
+            --buildtype=debugoptimized \
             -Dcassert=true \
             ${LINUX_MESON_FEATURES} \
             -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
@@ -369,7 +369,7 @@ task:
         su postgres <<-EOF
           export CC='ccache gcc -m32'
           meson setup \
-            --buildtype=debug \
+            --buildtype=debugoptimized \
             -Dcassert=true \
             ${LINUX_MESON_FEATURES} \
             -Dllvm=disabled \
@@ -427,8 +427,8 @@ task:
 
     CC: ccache cc
     CXX: ccache c++
-    CFLAGS: -Og -ggdb
-    CXXFLAGS: -Og -ggdb
+    CFLAGS: -ggdb
+    CXXFLAGS: -ggdb
 
   <<: *macos_task_template
 
@@ -479,7 +479,7 @@ task:
   configure_script: |
     export PKG_CONFIG_PATH="/opt/local/lib/pkgconfig/"
     meson setup \
-      --buildtype=debug \
+      --buildtype=debugoptimized \
       -Dextra_include_dirs=/opt/local/include \
       -Dextra_lib_dirs=/opt/local/lib \
       -Dcassert=true \
-- 
2.38.0

v3-0009-ci-windows-Disabling-write-cache-flushing-during-.patch
From df832576932cd3f8dc8e966670eb171353242680 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 16:56:29 -0700
Subject: [PATCH v3 09/10] ci: windows: Disabling write cache flushing during
 test

This has been measured to reduce windows test times by about 30s.
---
 .cirrus.tasks.yml                    |  6 ++++++
 src/tools/ci/windows_write_cache.ps1 | 20 ++++++++++++++++++++
 2 files changed, 26 insertions(+)
 create mode 100644 src/tools/ci/windows_write_cache.ps1

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index d1730ce08a8..360b1c775fd 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -547,6 +547,12 @@ task:
   setup_additional_packages_script: |
     REM choco install -y --no-progress ...
 
+  # Define the write cache to be power protected. This reduces the rate of
+  # cache flushes, which seems to help metadata heavy workloads on NTFS. We're
+  # just testing here anyway, so ...
+  change_write_caching_script:
+    - powershell src/tools/ci/windows_write_cache.ps1 2>&1
+
   setup_hosts_file_script: |
     echo 127.0.0.1 pg-loadbalancetest >> c:\Windows\System32\Drivers\etc\hosts
     echo 127.0.0.2 pg-loadbalancetest >> c:\Windows\System32\Drivers\etc\hosts
diff --git a/src/tools/ci/windows_write_cache.ps1 b/src/tools/ci/windows_write_cache.ps1
new file mode 100644
index 00000000000..5c67b3ce54b
--- /dev/null
+++ b/src/tools/ci/windows_write_cache.ps1
@@ -0,0 +1,20 @@
+# Define the write cache to be power protected. This reduces the rate of cache
+# flushes, which seems to help metadata heavy workloads on NTFS. We're just
+# testing here anyway, so ...
+#
+# Let's do so for all disks, this could be useful beyond cirrus-ci.
+
+Set-Location "HKLM:/SYSTEM/CurrentControlSet/Enum/SCSI";
+
+Get-ChildItem -Path "*/*" | foreach-object {
+    Push-Location;
+    cd /$_;
+    pwd;
+    cd 'Device Parameters';
+    if (!(Test-Path -Path "Disk")) {
+	New-Item -Path "Disk";
+    }
+
+    Set-ItemProperty -Path Disk -Type DWord -name CacheIsPowerProtected -Value 1;
+    Pop-Location;
+}
-- 
2.38.0

v3-0010-regress-Check-for-postgres-startup-completion-mor.patch
From c7b2c9d9664cc03fc997230c6162c6920181ca85 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 7 Aug 2023 16:51:16 -0700
Subject: [PATCH v3 10/10] regress: Check for postgres startup completion more
 often

Previously pg_regress.c checked only once a second whether the server had
started up - in most cases startup is much faster though. Use the same interval as
pg_ctl does.
---
 src/test/regress/pg_regress.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 407e3915cec..46af1fddbdb 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -75,6 +75,9 @@ const char *pretty_diff_opts = "-w -U3";
  */
 #define TESTNAME_WIDTH 36
 
+/* how often to recheck if postgres startup completed */
+#define WAITS_PER_SEC	10
+
 typedef enum TAPtype
 {
 	DIAG = 0,
@@ -2499,7 +2502,7 @@ regression_main(int argc, char *argv[],
 		else
 			wait_seconds = 60;
 
-		for (i = 0; i < wait_seconds; i++)
+		for (i = 0; i < wait_seconds * WAITS_PER_SEC; i++)
 		{
 			/* Done if psql succeeds */
 			fflush(NULL);
@@ -2519,7 +2522,7 @@ regression_main(int argc, char *argv[],
 					 outputdir);
 			}
 
-			pg_usleep(1000000L);
+			pg_usleep(1000000L / WAITS_PER_SEC);
 		}
 		if (i >= wait_seconds)
 		{
-- 
2.38.0

#18Daniel Gustafsson
daniel@yesql.se
In reply to: Andres Freund (#17)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

On 23 Aug 2023, at 08:58, Andres Freund <andres@anarazel.de> wrote:

I'm hoping to push this fairly soon, as I'll be on vacation the last week of
August. I'll be online intermittently though; if there are issues, I can react
(very limited connectivity from midday Aug 29th to midday Aug 31st though). I'd
appreciate a quick review or two.

I've been reading over these and the thread, and while this is not within my
area of expertise, nothing really sticks out.

I'll do another pass, but below are a few small comments so far.

I don't know Windows well enough to know the implications, but should the
below file have some sort of warning about not doing that for
production/shared systems, only for dedicated test instances?

+++ b/src/tools/ci/windows_write_cache.ps1
@@ -0,0 +1,20 @@
+# Define the write cache to be power protected. This reduces the rate of cache
+# flushes, which seems to help metadata heavy workloads on NTFS. We're just
+# testing here anyway, so ...
+#
+# Let's do so for all disks, this could be useful beyond cirrus-ci.

One thing in 0010 caught my eye, and while not introduced in this patchset it
might be of interest here. In the below hunks we loop X ticks around
system(psql), with the loop assuming the server can come up really quickly and
sleeping if it doesn't. On my systems I always reach the pg_usleep after
failing the check, but if I reverse the check such it first sleeps and then
checks I only need to check once instead of twice.

@@ -2499,7 +2502,7 @@ regression_main(int argc, char *argv[],
 		else
 			wait_seconds = 60;
 
-		for (i = 0; i < wait_seconds; i++)
+		for (i = 0; i < wait_seconds * WAITS_PER_SEC; i++)
 		{
 			/* Done if psql succeeds */
 			fflush(NULL);
@@ -2519,7 +2522,7 @@ regression_main(int argc, char *argv[],
 					 outputdir);
 			}
 
-			pg_usleep(1000000L);
+			pg_usleep(1000000L / WAITS_PER_SEC);
 		}
 		if (i >= wait_seconds)
 		{

It's a micro-optimization, but if we're changing things here to chase cycles it
might perhaps be worth doing?

--
Daniel Gustafsson

#19Andres Freund
andres@anarazel.de
In reply to: Daniel Gustafsson (#18)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-23 14:48:26 +0200, Daniel Gustafsson wrote:

On 23 Aug 2023, at 08:58, Andres Freund <andres@anarazel.de> wrote:

I'm hoping to push this fairly soon, as I'll be on vacation the last week of
August. I'll be online intermittently though; if there are issues, I can react
(very limited connectivity from midday Aug 29th to midday Aug 31st though). I'd
appreciate a quick review or two.

I've been reading over these and the thread, and while this is not within my
area of expertise, nothing really sticks out.

Thanks!

I'll do another pass, but below are a few small comments so far.

I don't know Windows well enough to know the implications, but should the
below file have some sort of warning about not doing that for
production/shared systems, only for dedicated test instances?

Ah, I should have explained that: I'm not planning to apply
- regress: Check for postgres startup completion more often
- ci: windows: Disabling write cache flushing during test
right now. Compared to the other patches the wins are much smaller and/or more
work is needed to make them good.

I think it might be worth going for
- ci: switch tasks to debugoptimized build
because that provides a fair bit of gain. But it might be more hurtful than
helpful due to costing more when ccache doesn't work...

+++ b/src/tools/ci/windows_write_cache.ps1
@@ -0,0 +1,20 @@
+# Define the write cache to be power protected. This reduces the rate of cache
+# flushes, which seems to help metadata heavy workloads on NTFS. We're just
+# testing here anyway, so ...
+#
+# Let's do so for all disks, this could be useful beyond cirrus-ci.

One thing in 0010 caught my eye, and while not introduced in this patchset it
might be of interest here. In the below hunks we loop X ticks around
system(psql), with the loop assuming the server can come up really quickly and
sleeping if it doesn't. On my systems I always reach the pg_usleep after
failing the check, but if I reverse the check such it first sleeps and then
checks I only need to check once instead of twice.

I think there's more effective ways to make this cheaper. The basic thing
would be to use libpq instead of forking off psql to make a connection
check. Medium term, I think we should invent a way for pg_ctl and other
tooling (including pg_regress) to wait for the service to come up. E.g. having
a named pipe that postmaster opens once the server is up, which should allow
multiple clients to use select/epoll/... to wait for it without looping.

ISTM making pg_regress use libpq w/ PQping() should be a pretty simple patch?
The non-polling approach obviously is even better, but also requires more
thought (and documentation and ...).

@@ -2499,7 +2502,7 @@ regression_main(int argc, char *argv[],
 		else
 			wait_seconds = 60;
 
-		for (i = 0; i < wait_seconds; i++)
+		for (i = 0; i < wait_seconds * WAITS_PER_SEC; i++)
 		{
 			/* Done if psql succeeds */
 			fflush(NULL);
@@ -2519,7 +2522,7 @@ regression_main(int argc, char *argv[],
 					 outputdir);
 			}
 
-			pg_usleep(1000000L);
+			pg_usleep(1000000L / WAITS_PER_SEC);
 		}
 		if (i >= wait_seconds)
 		{

It's a micro-optimization, but if we're changing things here to chase cycles it
might perhaps be worth doing?

I wouldn't quite call not waiting for 1s for the server to start, when it does
so within a few ms, chasing cycles ;). For short tests it's a substantial
fraction of the overall runtime...

Greetings,

Andres Freund

#20Daniel Gustafsson
daniel@yesql.se
In reply to: Andres Freund (#19)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

On 23 Aug 2023, at 21:22, Andres Freund <andres@anarazel.de> wrote:
On 2023-08-23 14:48:26 +0200, Daniel Gustafsson wrote:

I'll do another pass, but below are a few small comments so far.

I don't know Windows well enough to know the implications, but should the
below file have some sort of warning about not doing that for
production/shared systems, only for dedicated test instances?

Ah, I should have explained that: I'm not planning to apply
- regress: Check for postgres startup completion more often
- ci: windows: Disabling write cache flushing during test
right now. Compared to the other patches the wins are much smaller and/or more
work is needed to make them good.

I think it might be worth going for
- ci: switch tasks to debugoptimized build
because that provides a fair bit of gain. But it might be more hurtful than
helpful due to costing more when ccache doesn't work...

Gotcha.

+++ b/src/tools/ci/windows_write_cache.ps1
@@ -0,0 +1,20 @@
+# Define the write cache to be power protected. This reduces the rate of cache
+# flushes, which seems to help metadata heavy workloads on NTFS. We're just
+# testing here anyway, so ...
+#
+# Let's do so for all disks, this could be useful beyond cirrus-ci.

One thing in 0010 caught my eye, and while not introduced in this patchset it
might be of interest here. In the below hunks we loop X ticks around
system(psql), with the loop assuming the server can come up really quickly and
sleeping if it doesn't. On my systems I always reach the pg_usleep after
failing the check, but if I reverse the check such it first sleeps and then
checks I only need to check once instead of twice.

I think there's more effective ways to make this cheaper. The basic thing
would be to use libpq instead of forking off psql to make a connection
check.

I had it in my head that not using libpq in pg_regress was a deliberate choice,
but I fail to find a reference to it in the archives.

It's a micro-optimization, but if we're changing things here to chase cycles it
might perhaps be worth doing?

I wouldn't quite call not waiting for 1s for the server to start, when it does
so within a few ms, chasing cycles ;). For short tests it's a substantial
fraction of the overall runtime...

Absolutely, I was referring to shifting the sleep to before the check to avoid
the extra check, not to reducing the pg_usleep. Reducing the sleep is a clear
win.

--
Daniel Gustafsson

#21Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Andres Freund (#17)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

Thanks for the patch!

On Wed, 23 Aug 2023 at 09:58, Andres Freund <andres@anarazel.de> wrote:

I'm hoping to push this fairly soon, as I'll be on vacation the last week of
August. I'll be online intermittently though; if there are issues, I can react
(very limited connectivity from midday Aug 29th to midday Aug 31st though). I'd
appreciate a quick review or two.

Patch looks good to me apart from some minor points.

v3-0004-ci-Prepare-to-make-compute-resources-for-CI-confi.patch:
diff --git a/.cirrus.star b/.cirrus.star
+    """The main function is executed by cirrus-ci after loading .cirrus.yml and can
+    extend the CI definition further.
+
+    As documented in .cirrus.yml, the final CI configuration is composed of
+
+    1) the contents of this file

Instead of the '1) the contents of this file' comment, '1) the contents of
the .cirrus.yml file' could be better, since this comment appears in the
.cirrus.star file.

+    if repo_config_url != None:
+        print("loading additional configuration from \"{}\"".format(repo_config_url))
+        output += config_from(repo_config_url)
+    else:
+        output += "n# REPO_CI_CONFIG_URL was not set\n"

Possible typo at output += "n# REPO_CI_CONFIG_URL was not set\n".

v3-0008-ci-switch-tasks-to-debugoptimized-build.patch:
Just thinking of possible optimizations: couldn't we create something like
'buildtype: xxx' to override the default buildtype using .cirrus.star? This
could be better for PG developers. For sure that could be the subject of
another patch.

Regards,
Nazir Bilal Yavuz
Microsoft

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Gustafsson (#20)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Daniel Gustafsson <daniel@yesql.se> writes:

On 23 Aug 2023, at 21:22, Andres Freund <andres@anarazel.de> wrote:

I think there's more effective ways to make this cheaper. The basic thing
would be to use libpq instead of forking off psql to make a connection
check.

I had it in my head that not using libpq in pg_regress was a deliberate choice,
but I fail to find a reference to it in the archives.

I have a vague feeling that you are right about that. Perhaps the
concern was that under "make installcheck", pg_regress might be
using a build-tree copy of libpq rather than the one from the
system under test. As long as we're just trying to ping the server,
that shouldn't matter too much I think ... unless we hit problems
with, say, a different default port number or socket path compiled into
one copy vs. the other? That seems like it's probably a "so don't
do that" case, though.

regards, tom lane

#23Daniel Gustafsson
daniel@yesql.se
In reply to: Tom Lane (#22)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

On 23 Aug 2023, at 23:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Daniel Gustafsson <daniel@yesql.se> writes:

On 23 Aug 2023, at 21:22, Andres Freund <andres@anarazel.de> wrote:

I think there's more effective ways to make this cheaper. The basic thing
would be to use libpq instead of forking off psql to make a connection
check.

I had it in my head that not using libpq in pg_regress was a deliberate choice,
but I fail to find a reference to it in the archives.

I have a vague feeling that you are right about that. Perhaps the
concern was that under "make installcheck", pg_regress might be
using a build-tree copy of libpq rather than the one from the
system under test. As long as we're just trying to ping the server,
that shouldn't matter too much I think ... unless we hit problems
with, say, a different default port number or socket path compiled into
one copy vs. the other? That seems like it's probably a "so don't
do that" case, though.

Ah yes, that does ring a familiar bell. I agree that using it for pinging the
server should be safe either way, but we should document the use-with-caution
in pg_regress.c if/when we go down that path. I'll take a stab at changing the
psql retry loop for pinging tomorrow to see what it would look like.

--
Daniel Gustafsson

#24Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#22)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-23 17:02:51 -0400, Tom Lane wrote:

Daniel Gustafsson <daniel@yesql.se> writes:

On 23 Aug 2023, at 21:22, Andres Freund <andres@anarazel.de> wrote:

I think there's more effective ways to make this cheaper. The basic thing
would be to use libpq instead of forking off psql to make a connection
check.

I had it in my head that not using libpq in pg_regress was a deliberate choice,
but I fail to find a reference to it in the archives.

I have a vague feeling that you are right about that. Perhaps the
concern was that under "make installcheck", pg_regress might be
using a build-tree copy of libpq rather than the one from the
system under test. As long as we're just trying to ping the server,
that shouldn't matter too much I think

Or perhaps the opposite? That an installcheck pg_regress run might use the
system libpq, which doesn't have the symbols, or such?

Either way, with a function like PQping(), which exists well beyond the
supported branches, that shouldn't be an issue?

... unless we hit problems with, say, a different default port number or
socket path compiled into one copy vs. the other? That seems like it's
probably a "so don't do that" case, though.

If we were to find such a case, it seems we could just add whatever missing
parameter to the connection string? I think we would likely already hit such
problems though - the psql started by an installcheck pg_regress might use the
system libpq, I think?

Greetings,

Andres Freund

#25Andres Freund
andres@anarazel.de
In reply to: Nazir Bilal Yavuz (#21)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-23 23:55:15 +0300, Nazir Bilal Yavuz wrote:

On Wed, 23 Aug 2023 at 09:58, Andres Freund <andres@anarazel.de> wrote:

I'm hoping to push this fairly soon, as I'll be on vacation the last week of
August. I'll be online intermittently though; if there are issues, I can react
(very limited connectivity from midday Aug 29th to midday Aug 31st though). I'd
appreciate a quick review or two.

Patch looks good to me apart from some minor points.

Thanks for looking!

v3-0004-ci-Prepare-to-make-compute-resources-for-CI-confi.patch:
diff --git a/.cirrus.star b/.cirrus.star
+    """The main function is executed by cirrus-ci after loading .cirrus.yml and can
+    extend the CI definition further.
+
+    As documented in .cirrus.yml, the final CI configuration is composed of
+
+    1) the contents of this file

Instead of the '1) the contents of this file' comment, '1) the contents of
the .cirrus.yml file' could be better, since this comment appears in the
.cirrus.star file.

Good catch.

+    if repo_config_url != None:
+        print("loading additional configuration from \"{}\"".format(repo_config_url))
+        output += config_from(repo_config_url)
+    else:
+        output += "n# REPO_CI_CONFIG_URL was not set\n"

Possible typo at output += "n# REPO_CI_CONFIG_URL was not set\n".

Fixed.

v3-0008-ci-switch-tasks-to-debugoptimized-build.patch:
Just thinking of possible optimizations: couldn't we create something like
'buildtype: xxx' to override the default buildtype using .cirrus.star? This
could be better for PG developers. For sure that could be the subject of
another patch.

We could, but I'm not sure what the use would be?

Greetings,

Andres Freund

#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#24)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Andres Freund <andres@anarazel.de> writes:

On 2023-08-23 17:02:51 -0400, Tom Lane wrote:

... unless we hit problems with, say, a different default port number or
socket path compiled into one copy vs. the other? That seems like it's
probably a "so don't do that" case, though.

If we were to find such a case, it seems we could just add whatever missing
parameter to the connection string? I think we would likely already hit such
problems though - the psql started by an installcheck pg_regress might use the
system libpq, I think?

The trouble with that approach is that in "make installcheck", we
don't really want to assume we know what the installed libpq's default
connection parameters are. So we don't explicitly know where that
libpq will connect.

As I said, we might be able to start treating installed-libpq-not-
compatible-with-build as a "don't do it" case. Another idea is to try
to ensure that pg_regress uses the same libpq that the psql-under-test
does; but I'm not sure how to implement that.

regards, tom lane

#27Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#26)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-23 17:55:53 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2023-08-23 17:02:51 -0400, Tom Lane wrote:

... unless we hit problems with, say, a different default port number or
socket path compiled into one copy vs. the other? That seems like it's
probably a "so don't do that" case, though.

If we were to find such a case, it seems we could just add whatever missing
parameter to the connection string? I think we would likely already hit such
problems though - the psql started by an installcheck pg_regress might use the
system libpq, I think?

The trouble with that approach is that in "make installcheck", we
don't really want to assume we know what the installed libpq's default
connection parameters are. So we don't explicitly know where that
libpq will connect.

Stepping back: I don't think installcheck matters for the concrete use of
libpq we're discussing - the only time we wait for server startup is the
non-installcheck case.

There are other potential uses for libpq in pg_regress though - I'd e.g. like
to have a "monitoring" session open, which we could use to detect that the
server crashed (by waiting for the FD to become invalid). Where the
connection default issue could matter more?

I was wondering if we could create an unambiguous connection info, but that
seems like it'd be hard to do without creating cross-version hazards.

As I said, we might be able to start treating installed-libpq-not-
compatible-with-build as a "don't do it" case. Another idea is to try
to ensure that pg_regress uses the same libpq that the psql-under-test
does; but I'm not sure how to implement that.

I don't think that's likely to work, psql could use a libpq with a different
soversion. We could dlopen() libpq, etc, but that seems way too complicated.

What's the reason we don't force psql to come from the same build as
pg_regress?

Greetings,

Andres Freund

#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#27)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Andres Freund <andres@anarazel.de> writes:

On 2023-08-23 17:55:53 -0400, Tom Lane wrote:

The trouble with that approach is that in "make installcheck", we
don't really want to assume we know what the installed libpq's default
connection parameters are. So we don't explicitly know where that
libpq will connect.

Stepping back: I don't think installcheck matters for the concrete use of
libpq we're discussing - the only time we wait for server startup is the
non-installcheck case.

Oh, that's an excellent point. So for the immediately proposed use-case,
there's no issue. (We don't have a mode where we try to start a server
using already-installed executables.)

There are other potential uses for libpq in pg_regress though - I'd e.g. like
to have a "monitoring" session open, which we could use to detect that the
server crashed (by waiting for the FD to become invalid). Where the
connection default issue could matter more?

Meh. I don't find that idea compelling enough to justify adding
restrictions on what test scenarios will work. It's seldom hard to
tell from the test output whether the server crashed.

I was wondering if we could create an unambiguous connection info, but that
seems like it'd be hard to do without creating cross-version hazards.

Hmm, we don't expect the regression test suite to work against other
server versions, so maybe that could be made to work --- that is, we
could run the psql under test and get a full set of connection
parameters out of it? But I'm still not finding this worth the
trouble.

What's the reason we don't force psql to come from the same build as
pg_regress?

Because the point of installcheck is to check the installed binaries
--- including the installed psql and libpq.

(Thinks for a bit...) Maybe we should add pg_regress to the installed
fileset, and use that copy not the in-tree copy for installcheck?
Then we could assume it's using the same libpq as psql. IIRC there
have already been suggestions to do that for the benefit of PGXS
testing.

regards, tom lane

#29Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#28)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-23 18:32:26 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

There are other potential uses for libpq in pg_regress though - I'd e.g. like
to have a "monitoring" session open, which we could use to detect that the
server crashed (by waiting for the FD to become invalid). Where the
connection default issue could matter more?

Meh. I don't find that idea compelling enough to justify adding
restrictions on what test scenarios will work. It's seldom hard to
tell from the test output whether the server crashed.

I find it pretty painful to wade through a several-megabyte regression.diffs
to find the cause of a crash. I think we ought to use
restart_after_crash=false, since after a crash there's no hope for the tests
to succeed, but even in that case, we end up with a lot of pointless contents
in regression.diffs. If we instead realized that we shouldn't start further
tests, we'd limit that by a fair bit.

Greetings,

Andres Freund

#30Peter Eisentraut
peter@eisentraut.org
In reply to: Andres Freund (#29)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

On 24.08.23 00:56, Andres Freund wrote:

Hi,

On 2023-08-23 18:32:26 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

There are other potential uses for libpq in pg_regress though - I'd e.g. like
to have a "monitoring" session open, which we could use to detect that the
server crashed (by waiting for the FD to become invalid). Where the
connection default issue could matter more?

Meh. I don't find that idea compelling enough to justify adding
restrictions on what test scenarios will work. It's seldom hard to
tell from the test output whether the server crashed.

I find it pretty painful to wade through a several-megabyte regression.diffs
to find the cause of a crash. I think we ought to use
restart_after_crash=false, since after a crash there's no hope for the tests
to succeed, but even in that case, we end up with a lot of pointless contents
in regression.diffs. If we instead realized that we shouldn't start further
tests, we'd limit that by a fair bit.

I once coded it up so that if the server crashes during a test, it would
wait until it recovers before running the next test. I found that
useful. I agree the current behavior is not useful in any case.

#31Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Andres Freund (#25)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On Thu, 24 Aug 2023 at 00:48, Andres Freund <andres@anarazel.de> wrote:

v3-0008-ci-switch-tasks-to-debugoptimized-build.patch:
Just thinking of possible optimizations: couldn't we create something like
'buildtype: xxx' to override the default buildtype using .cirrus.star? This
could be better for PG developers. For sure that could be the subject of
another patch.

We could, but I'm not sure what the use would be?

My main idea behind this was that PG developers could choose
'buildtype: debug' while working on their patches, and that such an
option would make it easier to choose the buildtype.

Regards,
Nazir Bilal Yavuz
Microsoft

#32Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#17)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-22 23:58:33 -0700, Andres Freund wrote:

To make that possible, we need to make the compute resources for CI
configurable on a per-repository basis. After experimenting with a bunch of
ways to do that, I got stuck on that for a while. But as of today we have
sufficient macos runners for cfbot available, so... I think the approach I
finally settled on is decent, although not great. It's described in the "main"
commit message:
[...]
ci: Prepare to make compute resources for CI configurable
I'm hoping to push this fairly soon, as I'll be on vacation the last week of
August. I'll be online intermittently though; if there are issues, I can react
(very limited connectivity from midday Aug 29th to midday Aug 31st though). I'd
appreciate a quick review or two.

I've pushed this yesterday.

And then utilized it to make cfbot use
1) macos persistent workers, hosted by two community members
2) our own GCP account for all the other operating systems

There were a few issues initially (I needed to change how to run multiple jobs
on a single mac, and it looks like there were some issues with macos going to
sleep while processing jobs...). But it now seems to be chugging along ok.

One of the nice things is that with our own compute we also control how much
storage can be used, making things like generating docs or code coverage as
part of cfbot more realistic. And we could enable mingw by default when run as
part of cfbot...

Greetings,

Andres Freund

#33Daniel Gustafsson
daniel@yesql.se
In reply to: Daniel Gustafsson (#23)
1 attachment(s)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

On 23 Aug 2023, at 23:12, Daniel Gustafsson <daniel@yesql.se> wrote:

On 23 Aug 2023, at 23:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Daniel Gustafsson <daniel@yesql.se> writes:

On 23 Aug 2023, at 21:22, Andres Freund <andres@anarazel.de> wrote:

I think there's more effective ways to make this cheaper. The basic thing
would be to use libpq instead of forking of psql to make a connection
check.

I had it in my head that not using libpq in pg_regress was a deliberate choice,
but I fail to find a reference to it in the archives.

I have a vague feeling that you are right about that. Perhaps the
concern was that under "make installcheck", pg_regress might be
using a build-tree copy of libpq rather than the one from the
system under test. As long as we're just trying to ping the server,
that shouldn't matter too much I think ... unless we hit problems
with, say, a different default port number or socket path compiled into
one copy vs. the other? That seems like it's probably a "so don't
do that" case, though.

Ah yes, that does ring a familiar bell. I agree that using it for pinging the
server should be safe either way, but we should document the use-with-caution
in pg_regress.c if/when we go down that path. I'll take a stab at changing the
psql retry loop for pinging tomorrow to see what it would look like.

Attached is a patch with a quick PoC for using PQping instead of psql for
connection checks in pg_regress. To make the performance visible, it also
includes a diag output for "Time to first test" which captures all setup costs.
This might not make it into a commit, but it was quite helpful while hacking so
I left it in for now.

The patch incorporates Andres' idea for finer granularity of checks by checking
WAIT_TICKS_PER_SECOND times per second rather than once per second. It also
shifts the pg_usleep around so that just one ping is needed in most cases,
compared to two today.

On my relatively tired laptop this speeds up pg_regress setup by 100+ms, with
much bigger wins on Windows in the CI. While it does add a dependency on
libpq, I think that's a fairly decent price to pay for running tests faster.
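The tick-based retry loop described above can be sketched as follows. This is a minimal, self-contained model, not the pg_regress code itself: a caller-supplied callback stands in for PQpingParams() so the sketch needs no libpq linkage, and the names (PingResult, wait_until_ready, fake_ping) and tick rates are illustrative.

```c
#include <unistd.h>

/* Illustrative stand-ins for libpq's PGPing result codes. */
typedef enum
{
	PING_OK,					/* server accepts connections */
	PING_NOT_READY,				/* keep polling */
	PING_NO_ATTEMPT				/* bad parameters; retrying is pointless */
} PingResult;

typedef PingResult (*ping_fn) (void *state);

/*
 * Poll "ticks_per_second" times per second until the callback reports
 * ready, for up to timeout_seconds.  Sleep before the first check, since
 * the server is unlikely to be ready immediately; that way a single ping
 * suffices in the common case.  Returns 1 on success, 0 otherwise.
 */
static int
wait_until_ready(ping_fn ping, void *state,
				 int timeout_seconds, int ticks_per_second)
{
	for (int i = 0; i < timeout_seconds * ticks_per_second; i++)
	{
		usleep(1000000L / ticks_per_second);

		PingResult	rv = ping(state);

		if (rv == PING_OK)
			return 1;
		if (rv == PING_NO_ATTEMPT)
			return 0;			/* unrecoverable; bail out */
	}
	return 0;					/* timed out */
}

/* Fake ping that becomes ready after a fixed number of attempts. */
static PingResult
fake_ping(void *state)
{
	int		   *calls_left = state;

	return (--(*calls_left) <= 0) ? PING_OK : PING_NOT_READY;
}
```

With `calls_left = 5` and 1000 ticks per second, `wait_until_ready(fake_ping, &calls_left, 1, 1000)` succeeds after roughly 5 ms of sleeping; the patch itself polls a real server at 20 ticks per second.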

--
Daniel Gustafsson

Attachments:

v1-0001-Speed-up-pg_regress-server-testing.patch (application/octet-stream)
From eb71567159e488e9f6ba5596f18939dfb83b9078 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Mon, 28 Aug 2023 10:54:25 +0200
Subject: [PATCH v1] Speed up pg_regress server testing

Instead of connecting to the server with psql to check if it is ready
for running tests, pg_regress now use PQPing which avoids performing
expensive system() calls on Windows.

The frequency of tests is also increased in order to connect to the
server faster.

This patch is part of a larger effort to make testing consume fewer
resources in order to be able to fit more tests into the available
CI constraints.

Discussion: https://postgr.es/m/20230823192239.jxew5s3sjru63lio@awork3.anarazel.de
---
 src/interfaces/ecpg/test/Makefile    |  3 +-
 src/interfaces/ecpg/test/meson.build |  2 +-
 src/test/isolation/Makefile          |  2 +-
 src/test/isolation/meson.build       |  2 +-
 src/test/regress/GNUmakefile         |  4 +-
 src/test/regress/meson.build         |  2 +-
 src/test/regress/pg_regress.c        | 87 ++++++++++++++++++++--------
 7 files changed, 70 insertions(+), 32 deletions(-)

diff --git a/src/interfaces/ecpg/test/Makefile b/src/interfaces/ecpg/test/Makefile
index cf841a3a5b..ba6ca837b3 100644
--- a/src/interfaces/ecpg/test/Makefile
+++ b/src/interfaces/ecpg/test/Makefile
@@ -10,6 +10,7 @@ include $(top_builddir)/src/Makefile.global
 override CPPFLAGS := \
 	'-I$(top_builddir)/src/port' \
 	'-I$(top_srcdir)/src/test/regress' \
+	'-I$(libpq_srcdir)' \
 	'-DHOST_TUPLE="$(host_tuple)"' \
 	'-DSHELLPROG="$(SHELL)"' \
 	$(CPPFLAGS)
@@ -45,7 +46,7 @@ clean distclean maintainer-clean:
 all: pg_regress$(X)
 
 pg_regress$(X): pg_regress_ecpg.o $(WIN32RES) $(top_builddir)/src/test/regress/pg_regress.o
-	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
+	$(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
 
 $(top_builddir)/src/test/regress/pg_regress.o:
 	$(MAKE) -C $(dir $@) $(notdir $@)
diff --git a/src/interfaces/ecpg/test/meson.build b/src/interfaces/ecpg/test/meson.build
index 04c6819a79..b7a3fb4e0e 100644
--- a/src/interfaces/ecpg/test/meson.build
+++ b/src/interfaces/ecpg/test/meson.build
@@ -18,7 +18,7 @@ pg_regress_ecpg = executable('pg_regress_ecpg',
   pg_regress_ecpg_sources,
   c_args: pg_regress_cflags,
   include_directories: [pg_regress_inc, include_directories('.')],
-  dependencies: [frontend_code],
+  dependencies: [frontend_code, libpq],
   kwargs: default_bin_args + {
     'install': false
   },
diff --git a/src/test/isolation/Makefile b/src/test/isolation/Makefile
index b8738b7c1b..e99602ae52 100644
--- a/src/test/isolation/Makefile
+++ b/src/test/isolation/Makefile
@@ -38,7 +38,7 @@ pg_regress.o: | submake-regress
 	rm -f $@ && $(LN_S) $(top_builddir)/src/test/regress/pg_regress.o .
 
 pg_isolation_regress$(X): isolation_main.o pg_regress.o $(WIN32RES)
-	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
+	$(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
 
 isolationtester$(X): $(OBJS) | submake-libpq submake-libpgport
 	$(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
diff --git a/src/test/isolation/meson.build b/src/test/isolation/meson.build
index a4439e8ad0..e6ebe62c1e 100644
--- a/src/test/isolation/meson.build
+++ b/src/test/isolation/meson.build
@@ -35,7 +35,7 @@ pg_isolation_regress = executable('pg_isolation_regress',
   isolation_sources,
   c_args: pg_regress_cflags,
   include_directories: pg_regress_inc,
-  dependencies: frontend_code,
+  dependencies: [frontend_code, libpq],
   kwargs: default_bin_args + {
     'install_dir': dir_pgxs / 'src/test/isolation',
   },
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index 38c3a1f85b..db0687f867 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -36,11 +36,11 @@ EXTRADEFS = '-DHOST_TUPLE="$(host_tuple)"' \
 all: pg_regress$(X)
 
 pg_regress$(X): pg_regress.o pg_regress_main.o $(WIN32RES) | submake-libpgport
-	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
+	$(CC) $(CFLAGS)  $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
 
 # dependencies ensure that path changes propagate
 pg_regress.o: pg_regress.c $(top_builddir)/src/port/pg_config_paths.h
-pg_regress.o: override CPPFLAGS += -I$(top_builddir)/src/port $(EXTRADEFS)
+pg_regress.o: override CPPFLAGS += -I$(top_builddir)/src/port -I$(libpq_srcdir) $(EXTRADEFS)
 
 # note: because of the submake dependency, this rule's action is really a no-op
 $(top_builddir)/src/port/pg_config_paths.h: | submake-libpgport
diff --git a/src/test/regress/meson.build b/src/test/regress/meson.build
index a045c00c1f..f0dfd85591 100644
--- a/src/test/regress/meson.build
+++ b/src/test/regress/meson.build
@@ -30,7 +30,7 @@ endif
 pg_regress = executable('pg_regress',
   regress_sources,
   c_args: pg_regress_cflags,
-  dependencies: [frontend_code],
+  dependencies: [frontend_code, libpq],
   kwargs: default_bin_args + {
     'install_dir': dir_pgxs / 'src/test/regress',
   },
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index ec67588cf5..601fbb33d3 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -32,6 +32,7 @@
 #include "common/username.h"
 #include "getopt_long.h"
 #include "lib/stringinfo.h"
+#include "libpq-fe.h"
 #include "libpq/pqcomm.h"		/* needed for UNIXSOCK_PATH() */
 #include "pg_config_paths.h"
 #include "pg_regress.h"
@@ -75,6 +76,12 @@ const char *pretty_diff_opts = "-w -U3";
  */
 #define TESTNAME_WIDTH 36
 
+/*
+ * The number times per second that pg_regress checks to see if the test
+ * instance server has started and is available for connection.
+ */
+#define WAIT_TICKS_PER_SECOND 20
+
 typedef enum TAPtype
 {
 	DIAG = 0,
@@ -107,6 +114,7 @@ static bool nolocale = false;
 static bool use_existing = false;
 static char *hostname = NULL;
 static int	port = -1;
+static char portstr[16];
 static bool port_specified_by_user = false;
 static char *dlpath = PKGLIBDIR;
 static char *user = NULL;
@@ -2107,7 +2115,10 @@ regression_main(int argc, char *argv[],
 	int			i;
 	int			option_index;
 	char		buf[MAXPGPATH * 4];
-	char		buf2[MAXPGPATH * 4];
+	instr_time	starttime;
+	instr_time	stoptime;
+
+	INSTR_TIME_SET_CURRENT(starttime);
 
 	pg_logging_init(argv[0]);
 	progname = get_progname(argv[0]);
@@ -2296,6 +2307,8 @@ regression_main(int argc, char *argv[],
 		const char *env_wait;
 		int			wait_seconds;
 		const char *initdb_template_dir;
+		const char *keywords[4];
+		const char *values[4];
 
 		/*
 		 * Prepare the temp instance
@@ -2435,22 +2448,30 @@ regression_main(int argc, char *argv[],
 		}
 #endif
 
+		sprintf(portstr, "%d", port);
+
 		/*
-		 * Check if there is a postmaster running already.
+		 * Prepare the connection params for checking the state of the server
+		 * before starting the tests.
 		 */
-		snprintf(buf2, sizeof(buf2),
-				 "\"%s%spsql\" -X postgres <%s 2>%s",
-				 bindir ? bindir : "",
-				 bindir ? "/" : "",
-				 DEVNULL, DEVNULL);
+		keywords[0] = "dbname";
+		values[0] = "postgres";
+		keywords[1] = "port";
+		values[1] = portstr;
+		keywords[2] = "host";
+		values[2] = hostname ? hostname : sockdir;
+		keywords[3] = NULL;
+		values[3] = NULL;
 
+		/*
+		 * Check if there is a postmaster running already.
+		 */
 		for (i = 0; i < 16; i++)
 		{
-			fflush(NULL);
-			if (system(buf2) == 0)
-			{
-				char		s[16];
+			PGPing rv = PQpingParams(keywords, values, 1);
 
+			if (rv == PQPING_OK)
+			{
 				if (port_specified_by_user || i == 15)
 				{
 					note("port %d apparently in use", port);
@@ -2461,8 +2482,8 @@ regression_main(int argc, char *argv[],
 
 				note("port %d apparently in use, trying %d", port, port + 1);
 				port++;
-				sprintf(s, "%d", port);
-				setenv("PGPORT", s, 1);
+				sprintf(portstr, "%d", port);
+				setenv("PGPORT", portstr, 1);
 			}
 			else
 				break;
@@ -2485,11 +2506,11 @@ regression_main(int argc, char *argv[],
 			bail("could not spawn postmaster: %s", strerror(errno));
 
 		/*
-		 * Wait till postmaster is able to accept connections; normally this
-		 * is only a second or so, but Cygwin is reportedly *much* slower, and
-		 * test builds using Valgrind or similar tools might be too.  Hence,
-		 * allow the default timeout of 60 seconds to be overridden from the
-		 * PGCTLTIMEOUT environment variable.
+		 * Wait till postmaster is able to accept connections; normally takes
+		 * only a fraction of a second or so, but Cygwin is reportedly *much*
+		 * slower, and test builds using Valgrind or similar tools might be
+		 * too.  Hence, allow the default timeout of 60 seconds to be
+		 * overridden from the PGCTLTIMEOUT environment variable.
 		 */
 		env_wait = getenv("PGCTLTIMEOUT");
 		if (env_wait != NULL)
@@ -2501,13 +2522,24 @@ regression_main(int argc, char *argv[],
 		else
 			wait_seconds = 60;
 
-		for (i = 0; i < wait_seconds; i++)
+		for (i = 0; i < wait_seconds * WAIT_TICKS_PER_SECOND; i++)
 		{
-			/* Done if psql succeeds */
-			fflush(NULL);
-			if (system(buf2) == 0)
+			/*
+			 * It's fairly unlikely that the server is responding immediately
+			 * so we start with sleeping before checking instead of the other
+			 * way around.
+			 */
+			pg_usleep(1000000L / WAIT_TICKS_PER_SECOND);
+
+			PGPing rv = PQpingParams(keywords, values, 1);
+
+			/* Done if the server is running and accepts connections */
+			if (rv == PQPING_OK)
 				break;
 
+			if (rv == PQPING_NO_ATTEMPT)
+				bail("attempting to connect to postmaster failed");
+
 			/*
 			 * Fail immediately if postmaster has exited
 			 */
@@ -2520,10 +2552,8 @@ regression_main(int argc, char *argv[],
 				bail("postmaster failed, examine \"%s/log/postmaster.log\" for the reason",
 					 outputdir);
 			}
-
-			pg_usleep(1000000L);
 		}
-		if (i >= wait_seconds)
+		if (i >= wait_seconds * WAIT_TICKS_PER_SECOND)
 		{
 			diag("postmaster did not respond within %d seconds, examine \"%s/log/postmaster.log\" for the reason",
 				 wait_seconds, outputdir);
@@ -2582,6 +2612,13 @@ regression_main(int argc, char *argv[],
 			create_role(sl->str, dblist);
 	}
 
+	/*
+	 * Report how much time we spent during instance setup.
+	 */
+	INSTR_TIME_SET_CURRENT(stoptime);
+	INSTR_TIME_SUBTRACT(stoptime, starttime);
+	diag("Time to first test: %.0f ms", INSTR_TIME_GET_MILLISEC(stoptime));
+
 	/*
 	 * Ready to run the tests
 	 */
-- 
2.32.1 (Apple Git-133)

#34Daniel Gustafsson
daniel@yesql.se
In reply to: Daniel Gustafsson (#33)
1 attachment(s)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

On 28 Aug 2023, at 14:32, Daniel Gustafsson <daniel@yesql.se> wrote:

Attached is a patch with a quick PoC for using PQping instead of psql for
connection checks in pg_regress.

The attached v2 fixes a silly mistake which led to a compiler warning.

--
Daniel Gustafsson

Attachments:

v2-0001-Speed-up-pg_regress-server-testing.patch (application/octet-stream)
From 8b4fb9ea81b7e4eed8d0269962ed5aeb744f5684 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Mon, 28 Aug 2023 10:54:25 +0200
Subject: [PATCH v2] Speed up pg_regress server testing

Instead of connecting to the server with psql to check if it is ready
for running tests, pg_regress now use PQPing which avoids performing
expensive system() calls on Windows.

The frequency of tests is also increased in order to connect to the
server faster.

This patch is part of a larger effort to make testing consume fewer
resources in order to be able to fit more tests into the available
CI constraints.

Discussion: https://postgr.es/m/20230823192239.jxew5s3sjru63lio@awork3.anarazel.de
---
 src/interfaces/ecpg/test/Makefile    |  3 +-
 src/interfaces/ecpg/test/meson.build |  2 +-
 src/test/isolation/Makefile          |  2 +-
 src/test/isolation/meson.build       |  2 +-
 src/test/regress/GNUmakefile         |  4 +-
 src/test/regress/meson.build         |  2 +-
 src/test/regress/pg_regress.c        | 88 ++++++++++++++++++++--------
 7 files changed, 71 insertions(+), 32 deletions(-)

diff --git a/src/interfaces/ecpg/test/Makefile b/src/interfaces/ecpg/test/Makefile
index cf841a3a5b..ba6ca837b3 100644
--- a/src/interfaces/ecpg/test/Makefile
+++ b/src/interfaces/ecpg/test/Makefile
@@ -10,6 +10,7 @@ include $(top_builddir)/src/Makefile.global
 override CPPFLAGS := \
 	'-I$(top_builddir)/src/port' \
 	'-I$(top_srcdir)/src/test/regress' \
+	'-I$(libpq_srcdir)' \
 	'-DHOST_TUPLE="$(host_tuple)"' \
 	'-DSHELLPROG="$(SHELL)"' \
 	$(CPPFLAGS)
@@ -45,7 +46,7 @@ clean distclean maintainer-clean:
 all: pg_regress$(X)
 
 pg_regress$(X): pg_regress_ecpg.o $(WIN32RES) $(top_builddir)/src/test/regress/pg_regress.o
-	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
+	$(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
 
 $(top_builddir)/src/test/regress/pg_regress.o:
 	$(MAKE) -C $(dir $@) $(notdir $@)
diff --git a/src/interfaces/ecpg/test/meson.build b/src/interfaces/ecpg/test/meson.build
index 04c6819a79..b7a3fb4e0e 100644
--- a/src/interfaces/ecpg/test/meson.build
+++ b/src/interfaces/ecpg/test/meson.build
@@ -18,7 +18,7 @@ pg_regress_ecpg = executable('pg_regress_ecpg',
   pg_regress_ecpg_sources,
   c_args: pg_regress_cflags,
   include_directories: [pg_regress_inc, include_directories('.')],
-  dependencies: [frontend_code],
+  dependencies: [frontend_code, libpq],
   kwargs: default_bin_args + {
     'install': false
   },
diff --git a/src/test/isolation/Makefile b/src/test/isolation/Makefile
index b8738b7c1b..e99602ae52 100644
--- a/src/test/isolation/Makefile
+++ b/src/test/isolation/Makefile
@@ -38,7 +38,7 @@ pg_regress.o: | submake-regress
 	rm -f $@ && $(LN_S) $(top_builddir)/src/test/regress/pg_regress.o .
 
 pg_isolation_regress$(X): isolation_main.o pg_regress.o $(WIN32RES)
-	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
+	$(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
 
 isolationtester$(X): $(OBJS) | submake-libpq submake-libpgport
 	$(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
diff --git a/src/test/isolation/meson.build b/src/test/isolation/meson.build
index a4439e8ad0..e6ebe62c1e 100644
--- a/src/test/isolation/meson.build
+++ b/src/test/isolation/meson.build
@@ -35,7 +35,7 @@ pg_isolation_regress = executable('pg_isolation_regress',
   isolation_sources,
   c_args: pg_regress_cflags,
   include_directories: pg_regress_inc,
-  dependencies: frontend_code,
+  dependencies: [frontend_code, libpq],
   kwargs: default_bin_args + {
     'install_dir': dir_pgxs / 'src/test/isolation',
   },
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index 38c3a1f85b..db0687f867 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -36,11 +36,11 @@ EXTRADEFS = '-DHOST_TUPLE="$(host_tuple)"' \
 all: pg_regress$(X)
 
 pg_regress$(X): pg_regress.o pg_regress_main.o $(WIN32RES) | submake-libpgport
-	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
+	$(CC) $(CFLAGS)  $^ $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
 
 # dependencies ensure that path changes propagate
 pg_regress.o: pg_regress.c $(top_builddir)/src/port/pg_config_paths.h
-pg_regress.o: override CPPFLAGS += -I$(top_builddir)/src/port $(EXTRADEFS)
+pg_regress.o: override CPPFLAGS += -I$(top_builddir)/src/port -I$(libpq_srcdir) $(EXTRADEFS)
 
 # note: because of the submake dependency, this rule's action is really a no-op
 $(top_builddir)/src/port/pg_config_paths.h: | submake-libpgport
diff --git a/src/test/regress/meson.build b/src/test/regress/meson.build
index a045c00c1f..f0dfd85591 100644
--- a/src/test/regress/meson.build
+++ b/src/test/regress/meson.build
@@ -30,7 +30,7 @@ endif
 pg_regress = executable('pg_regress',
   regress_sources,
   c_args: pg_regress_cflags,
-  dependencies: [frontend_code],
+  dependencies: [frontend_code, libpq],
   kwargs: default_bin_args + {
     'install_dir': dir_pgxs / 'src/test/regress',
   },
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index ec67588cf5..48df02156c 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -32,6 +32,7 @@
 #include "common/username.h"
 #include "getopt_long.h"
 #include "lib/stringinfo.h"
+#include "libpq-fe.h"
 #include "libpq/pqcomm.h"		/* needed for UNIXSOCK_PATH() */
 #include "pg_config_paths.h"
 #include "pg_regress.h"
@@ -75,6 +76,12 @@ const char *pretty_diff_opts = "-w -U3";
  */
 #define TESTNAME_WIDTH 36
 
+/*
+ * The number times per second that pg_regress checks to see if the test
+ * instance server has started and is available for connection.
+ */
+#define WAIT_TICKS_PER_SECOND 20
+
 typedef enum TAPtype
 {
 	DIAG = 0,
@@ -107,6 +114,7 @@ static bool nolocale = false;
 static bool use_existing = false;
 static char *hostname = NULL;
 static int	port = -1;
+static char portstr[16];
 static bool port_specified_by_user = false;
 static char *dlpath = PKGLIBDIR;
 static char *user = NULL;
@@ -2107,7 +2115,10 @@ regression_main(int argc, char *argv[],
 	int			i;
 	int			option_index;
 	char		buf[MAXPGPATH * 4];
-	char		buf2[MAXPGPATH * 4];
+	instr_time	starttime;
+	instr_time	stoptime;
+
+	INSTR_TIME_SET_CURRENT(starttime);
 
 	pg_logging_init(argv[0]);
 	progname = get_progname(argv[0]);
@@ -2296,6 +2307,9 @@ regression_main(int argc, char *argv[],
 		const char *env_wait;
 		int			wait_seconds;
 		const char *initdb_template_dir;
+		const char *keywords[4];
+		const char *values[4];
+		PGPing		rv;
 
 		/*
 		 * Prepare the temp instance
@@ -2435,22 +2449,30 @@ regression_main(int argc, char *argv[],
 		}
 #endif
 
+		sprintf(portstr, "%d", port);
+
 		/*
-		 * Check if there is a postmaster running already.
+		 * Prepare the connection params for checking the state of the server
+		 * before starting the tests.
 		 */
-		snprintf(buf2, sizeof(buf2),
-				 "\"%s%spsql\" -X postgres <%s 2>%s",
-				 bindir ? bindir : "",
-				 bindir ? "/" : "",
-				 DEVNULL, DEVNULL);
+		keywords[0] = "dbname";
+		values[0] = "postgres";
+		keywords[1] = "port";
+		values[1] = portstr;
+		keywords[2] = "host";
+		values[2] = hostname ? hostname : sockdir;
+		keywords[3] = NULL;
+		values[3] = NULL;
 
+		/*
+		 * Check if there is a postmaster running already.
+		 */
 		for (i = 0; i < 16; i++)
 		{
-			fflush(NULL);
-			if (system(buf2) == 0)
-			{
-				char		s[16];
+			rv = PQpingParams(keywords, values, 1);
 
+			if (rv == PQPING_OK)
+			{
 				if (port_specified_by_user || i == 15)
 				{
 					note("port %d apparently in use", port);
@@ -2461,8 +2483,8 @@ regression_main(int argc, char *argv[],
 
 				note("port %d apparently in use, trying %d", port, port + 1);
 				port++;
-				sprintf(s, "%d", port);
-				setenv("PGPORT", s, 1);
+				sprintf(portstr, "%d", port);
+				setenv("PGPORT", portstr, 1);
 			}
 			else
 				break;
@@ -2485,11 +2507,11 @@ regression_main(int argc, char *argv[],
 			bail("could not spawn postmaster: %s", strerror(errno));
 
 		/*
-		 * Wait till postmaster is able to accept connections; normally this
-		 * is only a second or so, but Cygwin is reportedly *much* slower, and
-		 * test builds using Valgrind or similar tools might be too.  Hence,
-		 * allow the default timeout of 60 seconds to be overridden from the
-		 * PGCTLTIMEOUT environment variable.
+		 * Wait till postmaster is able to accept connections; normally takes
+		 * only a fraction of a second or so, but Cygwin is reportedly *much*
+		 * slower, and test builds using Valgrind or similar tools might be
+		 * too.  Hence, allow the default timeout of 60 seconds to be
+		 * overridden from the PGCTLTIMEOUT environment variable.
 		 */
 		env_wait = getenv("PGCTLTIMEOUT");
 		if (env_wait != NULL)
@@ -2501,13 +2523,24 @@ regression_main(int argc, char *argv[],
 		else
 			wait_seconds = 60;
 
-		for (i = 0; i < wait_seconds; i++)
+		for (i = 0; i < wait_seconds * WAIT_TICKS_PER_SECOND; i++)
 		{
-			/* Done if psql succeeds */
-			fflush(NULL);
-			if (system(buf2) == 0)
+			/*
+			 * It's fairly unlikely that the server is responding immediately
+			 * so we start with sleeping before checking instead of the other
+			 * way around.
+			 */
+			pg_usleep(1000000L / WAIT_TICKS_PER_SECOND);
+
+			rv = PQpingParams(keywords, values, 1);
+
+			/* Done if the server is running and accepts connections */
+			if (rv == PQPING_OK)
 				break;
 
+			if (rv == PQPING_NO_ATTEMPT)
+				bail("attempting to connect to postmaster failed");
+
 			/*
 			 * Fail immediately if postmaster has exited
 			 */
@@ -2520,10 +2553,8 @@ regression_main(int argc, char *argv[],
 				bail("postmaster failed, examine \"%s/log/postmaster.log\" for the reason",
 					 outputdir);
 			}
-
-			pg_usleep(1000000L);
 		}
-		if (i >= wait_seconds)
+		if (i >= wait_seconds * WAIT_TICKS_PER_SECOND)
 		{
 			diag("postmaster did not respond within %d seconds, examine \"%s/log/postmaster.log\" for the reason",
 				 wait_seconds, outputdir);
@@ -2582,6 +2613,13 @@ regression_main(int argc, char *argv[],
 			create_role(sl->str, dblist);
 	}
 
+	/*
+	 * Report how much time we spent during instance setup.
+	 */
+	INSTR_TIME_SET_CURRENT(stoptime);
+	INSTR_TIME_SUBTRACT(stoptime, starttime);
+	diag("Time to first test: %.0f ms", INSTR_TIME_GET_MILLISEC(stoptime));
+
 	/*
 	 * Ready to run the tests
 	 */
-- 
2.32.1 (Apple Git-133)

#35Andres Freund
andres@anarazel.de
In reply to: Daniel Gustafsson (#34)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-30 10:57:10 +0200, Daniel Gustafsson wrote:

On 28 Aug 2023, at 14:32, Daniel Gustafsson <daniel@yesql.se> wrote:

Attached is a patch with a quick PoC for using PQping instead of psql for
connection checks in pg_regress.

The attached v2 fixes a silly mistake which led to a compiler warning.

Still seems like a good idea to me. To see what impact it has, I measured the
time running the pg_regress tests that take less than 6s on my machine - I
excluded the slower ones (like the main regression tests) because they'd hide
any overall difference.

ninja && m test --suite setup --no-rebuild && tests=$(m test --no-rebuild --list|grep -E '/regress'|grep -vE '(regress|postgres_fdw|test_integerset|intarray|amcheck|test_decoding)/regress'|cut -d' ' -f 3) && time m test --no-rebuild $tests

Time for:

master:

cassert:
real 0m5.265s
user 0m8.422s
sys 0m8.381s

optimized:
real 0m4.926s
user 0m6.356s
sys 0m8.263s

my patch (probing every 100ms with psql):

cassert:
real 0m3.465s
user 0m8.827s
sys 0m8.579s

optimized:
real 0m2.932s
user 0m6.596s
sys 0m8.458s

Daniel's (probing every 50ms with PQping()):

cassert:
real 0m3.347s
user 0m8.373s
sys 0m8.354s

optimized:
real 0m2.527s
user 0m6.156s
sys 0m8.315s

My patch increased user/sys time a bit (likely due to a higher number of
futile psql forks), but Daniel's doesn't. And it does show a nice overall wall
clock time saving.

Greetings,

Andres Freund

#36Daniel Gustafsson
daniel@yesql.se
In reply to: Andres Freund (#35)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

On 13 Sep 2023, at 01:49, Andres Freund <andres@anarazel.de> wrote:
On 2023-08-30 10:57:10 +0200, Daniel Gustafsson wrote:

On 28 Aug 2023, at 14:32, Daniel Gustafsson <daniel@yesql.se> wrote:

Attached is a patch with a quick PoC for using PQping instead of psql for
connection checks in pg_regress.

The attached v2 fixes a silly mistake which led to a compiler warning.

Still seems like a good idea to me. To see what impact it has, I measured the
time running the pg_regress tests that take less than 6s on my machine - I
excluded the slower ones (like the main regression tests) because they'd hide
any overall difference.

My patch increased user/sys time a bit (likely due to a higher number of
futile psql forks), but Daniel's doesn't. And it does show a nice overall wall
clock time saving.

While it does add a libpq dependency, I think it's worth doing, so I propose we
go ahead with this for master.

--
Daniel Gustafsson

#37Daniel Gustafsson
daniel@yesql.se
In reply to: Andres Freund (#35)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

On 13 Sep 2023, at 01:49, Andres Freund <andres@anarazel.de> wrote:

My patch increased user/sys time a bit (likely due to a higher number of
futile psql forks), but Daniel's doesn't. And it does show a nice overall wall
clock time saving.

I went ahead and applied this on master, thanks for the review! Now to see if
there will be any noticeable difference in resource usage.

--
Daniel Gustafsson

#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Gustafsson (#37)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Daniel Gustafsson <daniel@yesql.se> writes:

I went ahead and applied this on master, thanks for review! Now to see if
there will be any noticeable difference in resource usage.

I think that tools like Coverity are likely to whine about your
use of sprintf instead of snprintf. Sure, it's perfectly safe,
but that won't stop the no-sprintf-ever crowd from complaining.

regards, tom lane

#39Daniel Gustafsson
daniel@yesql.se
In reply to: Tom Lane (#38)
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

On 24 Oct 2023, at 22:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Daniel Gustafsson <daniel@yesql.se> writes:

I went ahead and applied this on master, thanks for review! Now to see if
there will be any noticeable difference in resource usage.

I think that tools like Coverity are likely to whine about your
use of sprintf instead of snprintf. Sure, it's perfectly safe,
but that won't stop the no-sprintf-ever crowd from complaining.

Fair point, that's probably quite likely to happen. I can apply an snprintf()
conversion like this in the two places this patch introduced:

-        sprintf(s, "%d", port);
+        snprintf(s, sizeof(s), "%d", port);
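For reference, a minimal self-contained sketch of the bounded formatting (note the function must be snprintf, with the buffer size as the second argument; the helper name format_port is illustrative, not from pg_regress):

```c
#include <stdio.h>

/*
 * Format a port number into a fixed-size buffer.  Unlike sprintf(),
 * snprintf() never writes more than buflen bytes (including the NUL),
 * and returns the number of characters that *would* have been written,
 * which lets the caller detect truncation.
 */
static int
format_port(char *buf, size_t buflen, int port)
{
	return snprintf(buf, buflen, "%d", port);
}
```

A return value greater than or equal to buflen signals that the output was truncated.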

--
Daniel Gustafsson