slowest tap tests - split or accelerate?

Started by Andres Freund about 4 years ago · 32 messages
#1 Andres Freund
andres@anarazel.de

Hi,

cfbot now runs most tests on windows, the windows task is by far the slowest,
and the task most limited in concurrency [2]. Running tap tests is the
biggest part of that. This is a bigger issue on windows because we don't have
infrastructure (yet) to run tests in parallel.

There's a few tests which stand out in their slowness, which seem worth
addressing even if we tackle test parallelism on windows at some point. I
often find them to be the slowest tests on linux too.

Picking a random successful cfbot run [1] I see the following tap tests taking
more than 20 seconds:

67188 ms pg_basebackup t/010_pg_basebackup.pl
59710 ms recovery t/001_stream_rep.pl
57542 ms pg_rewind t/001_basic.pl
56179 ms subscription t/001_rep_changes.pl
42146 ms pgbench t/001_pgbench_with_server.pl
38264 ms recovery t/018_wal_optimize.pl
33642 ms subscription t/013_partition.pl
29129 ms pg_dump t/002_pg_dump.pl
25751 ms pg_verifybackup t/002_algorithm.pl
20628 ms subscription t/011_generated.pl

It would be good if we could make those tests faster, or if not easily
possible, at least split those tests into smaller tap tests.

Splitting a longer test into smaller ones is preferable even if they take the
same time, because we can use prove-level concurrency on windows to gain some
test parallelism. As a nice side effect, it also makes it quicker to run a
split test in isolation during development.
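To put rough numbers on the parallelism argument: with prove-level concurrency, the wall time of a test directory is bounded below by its single slowest file, so a monolithic 67s file dominates no matter how many jobs run. A toy scheduling model (Python; the split durations are illustrative assumptions, not measurements from the thread):

```python
import heapq

def wall_time(durations_ms, workers):
    """Greedy longest-processing-time estimate of parallel wall time."""
    loads = [0] * workers
    for d in sorted(durations_ms, reverse=True):
        lightest = heapq.heappop(loads)      # assign each file to the idlest worker
        heapq.heappush(loads, lightest + d)
    return max(loads)

# one monolithic file cannot be parallelized at all
print(wall_time([67188], 4))                 # -> 67188

# a hypothetical 3-way split is bounded by its slowest piece
print(wall_time([30000, 25000, 12188], 4))   # -> 30000
```

The model ignores per-file startup cost (each split file pays for its own initdb), which is exactly the trade-off discussed later in the thread.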

Greetings,

Andres Freund

[1]: https://cirrus-ci.com/task/5207126145499136
[2]: https://cirrus-ci.org/faq/#are-there-any-limits

#2 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
1 attachment(s)
Re: slowest tap tests - split or accelerate?

Hi,

On 2021-12-31 11:25:28 -0800, Andres Freund wrote:

cfbot now runs most tests on windows, the windows task is by far the slowest,
and the task most limited in concurrency [2]. Running tap tests is the
biggest part of that. This is a bigger issue on windows because we don't have
infrastructure (yet) to run tests in parallel.

There's a few tests which stand out in their slowness, which seem worth
addressing even if we tackle test parallelism on windows at some point. I
often find them to be the slowest tests on linux too.

Picking a random successful cfbot run [1] I see the following tap tests taking
more than 20 seconds:

67188 ms pg_basebackup t/010_pg_basebackup.pl
25751 ms pg_verifybackup t/002_algorithm.pl

The reason these in particular are slow is that they do a lot of
pg_basebackups without one or both of -cfast / --no-sync. The lack of -cfast
in particular is responsible for a significant proportion of the test
time. The only reason this didn't cause the tests to take many minutes is that
spread checkpoints only throttle when writing out a buffer and there aren't
that many dirty buffers...

Attached is a patch changing the parameters in all the instances I
found. Testing on a local instance, it about halves the runtime of
t/010_pg_basebackup.pl on linux and windows (but there's still a 2x time
difference between the two); the gain is smaller when running the tests concurrently in CI.

It might be worth having one explicit use of -cspread. Perhaps combined with
an explicit checkpoint beforehand?

Greetings,

Andres Freund

Attachments:

v1-0001-tests-Consistently-use-pg_basebackup-cfast-no-syn.patch (text/x-diff; charset=us-ascii)
From 4abed85e9085786f82d87667f2451821c01d37c0 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 17 Jan 2022 00:51:43 -0800
Subject: [PATCH v1 1/4] tests: Consistently use pg_basebackup -cfast --no-sync
 to accelerate tests.

Most tests invoking pg_basebackup themselves did not yet use -cfast, which
makes pg_basebackup take considerably longer. The only reason this didn't
cause the tests to take many minutes is that spread checkpoints only throttle
when writing out a buffer and there aren't that many dirty buffers...

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 104 ++++++++++---------
 src/bin/pg_verifybackup/t/002_algorithm.pl   |   2 +-
 src/bin/pg_verifybackup/t/003_corruption.pl  |   2 +-
 src/bin/pg_verifybackup/t/004_options.pl     |   2 +-
 src/bin/pg_verifybackup/t/006_encoding.pl    |   2 +-
 src/bin/pg_verifybackup/t/007_wal.pl         |   4 +-
 6 files changed, 61 insertions(+), 55 deletions(-)

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 872fec3d350..ebd102cd098 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -20,6 +20,13 @@ my $tempdir = PostgreSQL::Test::Utils::tempdir;
 
 my $node = PostgreSQL::Test::Cluster->new('main');
 
+# For nearly all tests some options should be specified, to keep test times
+# reasonable. Using @pg_basebackup_defs as the first element of the array
+# passed to IPC::Run interpolates the array (as it is not a reference to an
+# array).
+my @pg_basebackup_defs = ('pg_basebackup', '--no-sync', '-cfast');
+
+
 # Set umask so test directories and files are created with default permissions
 umask(0077);
 
@@ -43,7 +50,7 @@ $node->set_replication_conf();
 system_or_bail 'pg_ctl', '-D', $pgdata, 'reload';
 
 $node->command_fails(
-	[ 'pg_basebackup', '-D', "$tempdir/backup" ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/backup" ],
 	'pg_basebackup fails because of WAL configuration');
 
 ok(!-d "$tempdir/backup", 'backup directory was cleaned up');
@@ -54,7 +61,7 @@ mkdir("$tempdir/backup")
   or BAIL_OUT("unable to create $tempdir/backup");
 append_to_file("$tempdir/backup/dir-not-empty.txt", "Some data");
 
-$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/backup", '-n' ],
+$node->command_fails([ @pg_basebackup_defs, '-D', "$tempdir/backup", '-n' ],
 	'failing run with no-clean option');
 
 ok(-d "$tempdir/backup", 'backup directory was created and left behind');
@@ -105,7 +112,7 @@ foreach my $filename (@tempRelationFiles)
 }
 
 # Run base backup.
-$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup", '-X', 'none' ],
+$node->command_ok([ @pg_basebackup_defs, '-D', "$tempdir/backup", '-X', 'none' ],
 	'pg_basebackup runs');
 ok(-f "$tempdir/backup/PG_VERSION",      'backup was created');
 ok(-f "$tempdir/backup/backup_manifest", 'backup manifest included');
@@ -165,9 +172,9 @@ rmtree("$tempdir/backup");
 
 $node->command_ok(
 	[
-		'pg_basebackup',    '-D',
-		"$tempdir/backup2", '--no-manifest',
-		'--waldir',         "$tempdir/xlog2"
+		@pg_basebackup_defs, '-D',
+		"$tempdir/backup2",  '--no-manifest',
+		'--waldir',          "$tempdir/xlog2"
 	],
 	'separate xlog directory');
 ok(-f "$tempdir/backup2/PG_VERSION",       'backup was created');
@@ -176,31 +183,31 @@ ok(-d "$tempdir/xlog2/",                   'xlog directory was created');
 rmtree("$tempdir/backup2");
 rmtree("$tempdir/xlog2");
 
-$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup", '-Ft' ],
+$node->command_ok([ @pg_basebackup_defs, '-D', "$tempdir/tarbackup", '-Ft' ],
 	'tar format');
 ok(-f "$tempdir/tarbackup/base.tar", 'backup tar was created');
 rmtree("$tempdir/tarbackup");
 
 $node->command_fails(
-	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T=/foo" ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/backup_foo", '-Fp', "-T=/foo" ],
 	'-T with empty old directory fails');
 $node->command_fails(
-	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=" ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=" ],
 	'-T with empty new directory fails');
 $node->command_fails(
 	[
-		'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp',
+		@pg_basebackup_defs, '-D', "$tempdir/backup_foo", '-Fp',
 		"-T/foo=/bar=/baz"
 	],
 	'-T with multiple = fails');
 $node->command_fails(
-	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo=/bar" ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo=/bar" ],
 	'-T with old directory not absolute fails');
 $node->command_fails(
-	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=bar" ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=bar" ],
 	'-T with new directory not absolute fails');
 $node->command_fails(
-	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo" ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo" ],
 	'-T with invalid format fails');
 
 # Tar format doesn't support filenames longer than 100 bytes.
@@ -211,7 +218,7 @@ open my $file, '>', "$superlongpath"
   or die "unable to create file $superlongpath";
 close $file;
 $node->command_fails(
-	[ 'pg_basebackup', '-D', "$tempdir/tarbackup_l1", '-Ft' ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/tarbackup_l1", '-Ft' ],
 	'pg_basebackup tar with long name fails');
 unlink "$pgdata/$superlongname";
 
@@ -329,14 +336,14 @@ foreach my $filename (@tempRelationFiles)
 }
 
 $node->command_fails(
-	[ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp' ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/backup1", '-Fp' ],
 	'plain format with tablespaces fails without tablespace mapping');
 
 $node->command_ok(
 	[
-		'pg_basebackup',    '-D',
-		"$tempdir/backup1", '-Fp',
-		"-T$realTsDir=$real_tempdir/tbackup/tblspc1"
+		@pg_basebackup_defs, '-D',
+		"$tempdir/backup1",  '-Fp',
+		"-T$realTsDir=$real_tempdir/tbackup/tblspc1",
 	],
 	'plain format with tablespaces succeeds with tablespace mapping');
 ok(-d "$tempdir/tbackup/tblspc1", 'tablespace was relocated');
@@ -404,9 +411,9 @@ $node->safe_psql('postgres',
 $realTsDir =~ s/=/\\=/;
 $node->command_ok(
 	[
-		'pg_basebackup',    '-D',
-		"$tempdir/backup3", '-Fp',
-		"-T$realTsDir=$real_tempdir/tbackup/tbl\\=spc2"
+		@pg_basebackup_defs, '-D',
+		"$tempdir/backup3",  '-Fp',
+		"-T$realTsDir=$real_tempdir/tbackup/tbl\\=spc2",
 	],
 	'mapping tablespace with = sign in path');
 ok(-d "$tempdir/tbackup/tbl=spc2", 'tablespace with = sign was relocated');
@@ -417,12 +424,12 @@ mkdir "$tempdir/$superlongname";
 $realTsDir = "$real_sys_tempdir/$superlongname";
 $node->safe_psql('postgres',
 	"CREATE TABLESPACE tblspc3 LOCATION '$realTsDir';");
-$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup_l3", '-Ft' ],
+$node->command_ok([ @pg_basebackup_defs, '-D', "$tempdir/tarbackup_l3", '-Ft' ],
 	'pg_basebackup tar with long symlink target');
 $node->safe_psql('postgres', "DROP TABLESPACE tblspc3;");
 rmtree("$tempdir/tarbackup_l3");
 
-$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backupR", '-R' ],
+$node->command_ok([ @pg_basebackup_defs, '-D', "$tempdir/backupR", '-R' ],
 	'pg_basebackup -R runs');
 ok(-f "$tempdir/backupR/postgresql.auto.conf", 'postgresql.auto.conf exists');
 ok(-f "$tempdir/backupR/standby.signal",       'standby.signal was created');
@@ -436,32 +443,32 @@ like(
 	'postgresql.auto.conf sets primary_conninfo');
 
 $node->command_ok(
-	[ 'pg_basebackup', '-D', "$tempdir/backupxd" ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/backupxd" ],
 	'pg_basebackup runs in default xlog mode');
 ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxd/pg_wal")),
 	'WAL files copied');
 rmtree("$tempdir/backupxd");
 
 $node->command_ok(
-	[ 'pg_basebackup', '-D', "$tempdir/backupxf", '-X', 'fetch' ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/backupxf", '-X', 'fetch' ],
 	'pg_basebackup -X fetch runs');
 ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_wal")),
 	'WAL files copied');
 rmtree("$tempdir/backupxf");
 $node->command_ok(
-	[ 'pg_basebackup', '-D', "$tempdir/backupxs", '-X', 'stream' ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/backupxs", '-X', 'stream' ],
 	'pg_basebackup -X stream runs');
 ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxs/pg_wal")),
 	'WAL files copied');
 rmtree("$tempdir/backupxs");
 $node->command_ok(
-	[ 'pg_basebackup', '-D', "$tempdir/backupxst", '-X', 'stream', '-Ft' ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/backupxst", '-X', 'stream', '-Ft' ],
 	'pg_basebackup -X stream runs in tar mode');
 ok(-f "$tempdir/backupxst/pg_wal.tar", "tar file was created");
 rmtree("$tempdir/backupxst");
 $node->command_ok(
 	[
-		'pg_basebackup',         '-D',
+		@pg_basebackup_defs,     '-D',
 		"$tempdir/backupnoslot", '-X',
 		'stream',                '--no-slot'
 	],
@@ -470,7 +477,7 @@ rmtree("$tempdir/backupnoslot");
 
 $node->command_fails(
 	[
-		'pg_basebackup',             '-D',
+		@pg_basebackup_defs,         '-D',
 		"$tempdir/backupxs_sl_fail", '-X',
 		'stream',                    '-S',
 		'slot0'
@@ -478,12 +485,12 @@ $node->command_fails(
 	'pg_basebackup fails with nonexistent replication slot');
 
 $node->command_fails(
-	[ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C' ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/backupxs_slot", '-C' ],
 	'pg_basebackup -C fails without slot name');
 
 $node->command_fails(
 	[
-		'pg_basebackup',          '-D',
+		@pg_basebackup_defs,      '-D',
 		"$tempdir/backupxs_slot", '-C',
 		'-S',                     'slot0',
 		'--no-slot'
@@ -491,7 +498,7 @@ $node->command_fails(
 	'pg_basebackup fails with -C -S --no-slot');
 
 $node->command_ok(
-	[ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C', '-S', 'slot0' ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/backupxs_slot", '-C', '-S', 'slot0' ],
 	'pg_basebackup -C runs');
 rmtree("$tempdir/backupxs_slot");
 
@@ -510,7 +517,7 @@ isnt(
 	'restart LSN of new slot is not null');
 
 $node->command_fails(
-	[ 'pg_basebackup', '-D', "$tempdir/backupxs_slot1", '-C', '-S', 'slot0' ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/backupxs_slot1", '-C', '-S', 'slot0' ],
 	'pg_basebackup fails with -C -S and a previously existing slot');
 
 $node->safe_psql('postgres',
@@ -520,12 +527,12 @@ my $lsn = $node->safe_psql('postgres',
 );
 is($lsn, '', 'restart LSN of new slot is null');
 $node->command_fails(
-	[ 'pg_basebackup', '-D', "$tempdir/fail", '-S', 'slot1', '-X', 'none' ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/fail", '-S', 'slot1', '-X', 'none' ],
 	'pg_basebackup with replication slot fails without WAL streaming');
 $node->command_ok(
 	[
-		'pg_basebackup', '-D', "$tempdir/backupxs_sl", '-X',
-		'stream',        '-S', 'slot1'
+		@pg_basebackup_defs, '-D', "$tempdir/backupxs_sl", '-X',
+		'stream',            '-S', 'slot1'
 	],
 	'pg_basebackup -X stream with replication slot runs');
 $lsn = $node->safe_psql('postgres',
@@ -536,8 +543,8 @@ rmtree("$tempdir/backupxs_sl");
 
 $node->command_ok(
 	[
-		'pg_basebackup', '-D', "$tempdir/backupxs_sl_R", '-X',
-		'stream',        '-S', 'slot1',                  '-R'
+		@pg_basebackup_defs, '-D', "$tempdir/backupxs_sl_R", '-X',
+		'stream',            '-S', 'slot1',                  '-R',
 	],
 	'pg_basebackup with replication slot and -R runs');
 like(
@@ -570,7 +577,7 @@ close $file;
 system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 
 $node->command_checks_all(
-	[ 'pg_basebackup', '-D', "$tempdir/backup_corrupt" ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/backup_corrupt" ],
 	1,
 	[qr{^$}],
 	[qr/^WARNING.*checksum verification failed/s],
@@ -590,7 +597,7 @@ close $file;
 system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 
 $node->command_checks_all(
-	[ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2" ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/backup_corrupt2" ],
 	1,
 	[qr{^$}],
 	[qr/^WARNING.*further.*failures.*will.not.be.reported/s],
@@ -606,7 +613,7 @@ close $file;
 system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 
 $node->command_checks_all(
-	[ 'pg_basebackup', '-D', "$tempdir/backup_corrupt3" ],
+	[ @pg_basebackup_defs, '-D', "$tempdir/backup_corrupt3" ],
 	1,
 	[qr{^$}],
 	[qr/^WARNING.*7 total checksum verification failures/s],
@@ -616,8 +623,8 @@ rmtree("$tempdir/backup_corrupt3");
 # do not verify checksums, should return ok
 $node->command_ok(
 	[
-		'pg_basebackup',            '-D',
-		"$tempdir/backup_corrupt4", '--no-verify-checksums'
+		@pg_basebackup_defs,        '-D',
+		"$tempdir/backup_corrupt4", '--no-verify-checksums',
 	],
 	'pg_basebackup with -k does not report checksum mismatch');
 rmtree("$tempdir/backup_corrupt4");
@@ -635,18 +642,17 @@ SKIP:
 
 	$node->command_ok(
 		[
-			'pg_basebackup',        '-D',
+			@pg_basebackup_defs,    '-D',
 			"$tempdir/backup_gzip", '--compress',
-			'1',                    '--no-sync',
-			'--format',             't'
+			'1',                    '--format',
+			't'
 		],
 		'pg_basebackup with --compress');
 	$node->command_ok(
 		[
-			'pg_basebackup',         '-D',
+			@pg_basebackup_defs,     '-D',
 			"$tempdir/backup_gzip2", '--gzip',
-			'--no-sync',             '--format',
-			't'
+			'--format',              't'
 		],
 		'pg_basebackup with --gzip');
 
diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl
index 06333c6c0d5..5d9965ba8f4 100644
--- a/src/bin/pg_verifybackup/t/002_algorithm.pl
+++ b/src/bin/pg_verifybackup/t/002_algorithm.pl
@@ -21,7 +21,7 @@ for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512))
 	my $backup_path = $primary->backup_dir . '/' . $algorithm;
 	my @backup      = (
 		'pg_basebackup', '-D', $backup_path,
-		'--manifest-checksums', $algorithm, '--no-sync');
+		'--manifest-checksums', $algorithm, '--no-sync', '-cfast');
 	my @verify = ('pg_verifybackup', '-e', $backup_path);
 
 	# A backup with a bogus algorithm should fail.
diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl
index 63cad6647b5..53be2efd87d 100644
--- a/src/bin/pg_verifybackup/t/003_corruption.pl
+++ b/src/bin/pg_verifybackup/t/003_corruption.pl
@@ -114,7 +114,7 @@ for my $scenario (@scenario)
 		local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix;
 		$primary->command_ok(
 			[
-				'pg_basebackup', '-D', $backup_path, '--no-sync',
+				'pg_basebackup', '-D', $backup_path, '--no-sync', '-cfast',
 				'-T', "${source_ts_path}=${backup_ts_path}"
 			],
 			"base backup ok");
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 3ec4651b680..5a8fd9b0ecb 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -17,7 +17,7 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
 $primary->init(allows_streaming => 1);
 $primary->start;
 my $backup_path = $primary->backup_dir . '/test_options';
-$primary->command_ok([ 'pg_basebackup', '-D', $backup_path, '--no-sync' ],
+$primary->command_ok([ 'pg_basebackup', '-D', $backup_path, '--no-sync', '-cfast' ],
 	"base backup ok");
 
 # Verify that pg_verifybackup -q succeeds and produces no output.
diff --git a/src/bin/pg_verifybackup/t/006_encoding.pl b/src/bin/pg_verifybackup/t/006_encoding.pl
index e746ae6dcc4..b869cd8fe6e 100644
--- a/src/bin/pg_verifybackup/t/006_encoding.pl
+++ b/src/bin/pg_verifybackup/t/006_encoding.pl
@@ -19,7 +19,7 @@ $primary->command_ok(
 	[
 		'pg_basebackup', '-D',
 		$backup_path,    '--no-sync',
-		'--manifest-force-encode'
+		'-cfast',        '--manifest-force-encode'
 	],
 	"backup ok with forced hex encoding");
 
diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl
index b9282da7078..4723047d73d 100644
--- a/src/bin/pg_verifybackup/t/007_wal.pl
+++ b/src/bin/pg_verifybackup/t/007_wal.pl
@@ -17,7 +17,7 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
 $primary->init(allows_streaming => 1);
 $primary->start;
 my $backup_path = $primary->backup_dir . '/test_wal';
-$primary->command_ok([ 'pg_basebackup', '-D', $backup_path, '--no-sync' ],
+$primary->command_ok([ 'pg_basebackup', '-D', $backup_path, '--no-sync', '-cfast' ],
 	"base backup ok");
 
 # Rename pg_wal.
@@ -71,7 +71,7 @@ $primary->safe_psql('postgres', 'SELECT pg_switch_wal()');
 my $backup_path2 = $primary->backup_dir . '/test_tli';
 # The base backup run below does a checkpoint, that removes the first segment
 # of the current timeline.
-$primary->command_ok([ 'pg_basebackup', '-D', $backup_path2, '--no-sync' ],
+$primary->command_ok([ 'pg_basebackup', '-D', $backup_path2, '--no-sync', '-cfast' ],
 	"base backup 2 ok");
 command_ok(
 	[ 'pg_verifybackup', $backup_path2 ],
-- 
2.34.0

#3 Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#2)
Re: slowest tap tests - split or accelerate?

On Mon, Jan 17, 2022 at 1:41 PM Andres Freund <andres@anarazel.de> wrote:

The reason these in particular are slow is that they do a lot of
pg_basebackups without one or both of -cfast / --no-sync. The lack of -cfast
in particular is responsible for a significant proportion of the test
time. The only reason this didn't cause the tests to take many minutes is that
spread checkpoints only throttle when writing out a buffer and there aren't
that many dirty buffers...

Adding -cfast to 002_algorithm.pl seems totally reasonable. I'm not
sure what else can realistically be done to speed it up without losing
the point of the test. And it's basically just a single loop, so
splitting it up doesn't seem to make a lot of sense either.

pg_basebackup's 010_pg_basebackup.pl looks like it could be split up,
though. That one, at least to me, looks like people have just kept
adding semi-related things into the same test file.

--
Robert Haas
EDB: http://www.enterprisedb.com

#4 Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#3)
Re: slowest tap tests - split or accelerate?

Hi,

On 2022-01-17 14:05:17 -0500, Robert Haas wrote:

On Mon, Jan 17, 2022 at 1:41 PM Andres Freund <andres@anarazel.de> wrote:

The reason these in particular are slow is that they do a lot of
pg_basebackups without one or both of -cfast / --no-sync. The lack of -cfast
in particular is responsible for a significant proportion of the test
time. The only reason this didn't cause the tests to take many minutes is that
spread checkpoints only throttle when writing out a buffer and there aren't
that many dirty buffers...

Adding -cfast to 002_algorithm.pl seems totally reasonable. I'm not
sure what else can realistically be done to speed it up without losing
the point of the test. And it's basically just a single loop, so
splitting it up doesn't seem to make a lot of sense either.

It's also not that slow compared to other tests after the -cfast addition.

However, I'm a bit surprised at how long the individual pg_verifybackup calls
take on windows - about as long as the pg_basebackup call itself.

Running: pg_basebackup -D C:/dev/postgres/.\src\bin\pg_verifybackup\/tmp_check/t_002_algorithm_primary_data/backup/sha224 --manifest-checksums sha224 --no-sync -cfast
# timing: [4.798 + 0.704s]: complete
# Running: pg_verifybackup -e C:/dev/postgres/.\src\bin\pg_verifybackup\/tmp_check/t_002_algorithm_primary_data/backup/sha224
backup successfully verified
# timing: [5.507 + 0.697s]: completed

Interestingly, with crc32c, this is not so:

# Running: pg_basebackup -D C:/dev/postgres/.\src\bin\pg_verifybackup\/tmp_check/t_002_algorithm_primary_data/backup/crc32c --manifest-checksums crc32c --no-sync -cfast
# timing: [3.500 + 0.688s]: completed
ok 5 - backup ok with algorithm "crc32c"
ok 6 - crc32c is mentioned many times in the manifest
# Running: pg_verifybackup -e C:/dev/postgres/.\src\bin\pg_verifybackup\/tmp_check/t_002_algorithm_primary_data/backup/crc32c
backup successfully verified
# timing: [4.194 + 0.197s]: completed

I wonder if there's something explaining why pg_verifybackup is greatly slowed
down by sha224 but not crc32c, but the server's runtime only differs by ~20ms?
It seems incongruous that pg_basebackup, with all the complexity of needing to
communicate with the server, transfer the backup over the network, and *also*
compute checksums, takes as long as the pg_verifybackup invocation?

pg_basebackup's 010_pg_basebackup.pl looks like it could be split up,
though. That one, at least to me, looks like people have just kept
adding semi-related things into the same test file.

Yea.

It's generally interesting how much time initdb takes in these tests. It's
about 1.1s on my linux workstation, and 2.1s on windows.

I've occasionally pondered caching initdb results and reusing them across
tests - just the locking around it seems a bit nasty, but perhaps that could
be done as part of the tmp_install step. Of course, it'd need to deal with
different options etc...
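A minimal sketch of that caching scheme, with the locking done via an advisory file lock (the paths, the fake initdb step, and the copy-per-test shape are all stand-ins for illustration, not the real test infrastructure):

```python
import fcntl
import os
import shutil

def cached_template(template: str, build) -> str:
    """Build `template` at most once; an exclusive flock serializes
    concurrent test runners racing to create it."""
    with open(template + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        if not os.path.isdir(template):
            build(template)        # the expensive step (stand-in for initdb)
    return template

def fake_initdb(path: str) -> None:
    # stand-in for `initdb -D path --no-sync`
    os.makedirs(path)
    with open(os.path.join(path, "PG_VERSION"), "w") as f:
        f.write("15\n")

tmpl = cached_template("initdb_template", fake_initdb)
# per-test "initdb": just copy the cached template, far cheaper than rebuilding
shutil.copytree(tmpl, "node1_data", dirs_exist_ok=True)
print(open("node1_data/PG_VERSION").read().strip())
```

The hard part the sketch glosses over is invalidation: differing initdb options would each need their own cached template.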

Greetings,

Andres Freund

#5 Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#4)
Re: slowest tap tests - split or accelerate?

On Mon, Jan 17, 2022 at 2:57 PM Andres Freund <andres@anarazel.de> wrote:

I wonder if there's something explaining why pg_verifybackup is greatly slowed
down by sha224 but not crc32c, but the server's runtime only differs by ~20ms?
It seems incongruous that pg_basebackup, with all the complexity of needing to
communicate with the server, transfer the backup over the network, and *also*
compute checksums, takes as long as the pg_verifybackup invocation?

I guess there must be something explaining it, but I don't know what
it could be. The client and the server are each running the checksum
algorithm over the same data. If that's not the same speed then .... I
don't get it. Unless, somehow, they're using different implementations
of it?

I've occasionally pondered caching initdb results and reusing them across
tests - just the locking around it seems a bit nasty, but perhaps that could
be done as part of the tmp_install step. Of course, it'd need to deal with
different options etc...

It's a thought, but it does seem like a bit of a pain to implement.

--
Robert Haas
EDB: http://www.enterprisedb.com

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#4)
Re: slowest tap tests - split or accelerate?

Andres Freund <andres@anarazel.de> writes:

I've occasionally pondered caching initdb results and reusing them across
tests - just the locking around it seems a bit nasty, but perhaps that could
be done as part of the tmp_install step. Of course, it'd need to deal with
different options etc...

I'd actually built a prototype to do that, based on making a reference
cluster and then "cp -a"'ing it instead of re-running initdb. I gave
up when I found that on slower, disk-bound machines it was hardly
any faster. Thinking about it now, I wonder why not just re-use one
cluster for many tests, only dropping and re-creating the database
in which the testing happens.

regards, tom lane

#7 Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#5)
Re: slowest tap tests - split or accelerate?

Hi,

On 2022-01-17 15:13:57 -0500, Robert Haas wrote:

I guess there must be something explaining it, but I don't know what
it could be. The client and the server are each running the checksum
algorithm over the same data. If that's not the same speed then .... I
don't get it. Unless, somehow, they're using different implementations
of it?

I think that actually might be the issue. On linux a pg_verifybackup test
was much faster than on windows (as in 10x). But if I disable openssl, it's
only 2x.

On the windows instance I *do* have openssl enabled. But I suspect something
is off and the windows buildsystem ends up with our hand-rolled implementation
on the client side, but not the server side. Which'd explain the times I'm
seeing: We have a fast CRC implementation, but the rest is pretty darn
unoptimized.

Greetings,

Andres Freund

#8 Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#4)
1 attachment(s)
Re: slowest tap tests - split or accelerate?

On Mon, Jan 17, 2022 at 2:57 PM Andres Freund <andres@anarazel.de> wrote:

pg_basebackup's 010_pg_basebackup.pl looks like it could be split up,
though. That one, at least to me, looks like people have just kept
adding semi-related things into the same test file.

Yea.

Here's a patch that splits up that file. Essentially the first half of
the file is concerned with testing that a backup ends up in the state
it expects, while the second half is concerned with checking that
various options to pg_basebackup work. So I split it that way, plus I
moved some of the really basic stuff to a completely separate file
with a very brief runtime. The test results are interesting.

Unpatched:

[12:33:33] t/010_pg_basebackup.pl ... ok 16161 ms ( 0.02 usr 0.00
sys + 2.07 cusr 7.80 csys = 9.89 CPU)
[12:33:49] t/020_pg_receivewal.pl ... ok 4115 ms ( 0.00 usr 0.00
sys + 0.89 cusr 1.73 csys = 2.62 CPU)
[12:33:53] t/030_pg_recvlogical.pl .. ok 1857 ms ( 0.01 usr 0.01
sys + 0.63 cusr 0.73 csys = 1.38 CPU)
[12:33:55]
All tests successful.
Files=3, Tests=177, 22 wallclock secs ( 0.04 usr 0.02 sys + 3.59
cusr 10.26 csys = 13.91 CPU)

Patched:

[12:32:03] t/010_pg_basebackup_basic.pl ...... ok 192 ms ( 0.01
usr 0.00 sys + 0.10 cusr 0.05 csys = 0.16 CPU)
[12:32:03] t/011_pg_basebackup_integrity.pl .. ok 5530 ms ( 0.00
usr 0.00 sys + 0.87 cusr 2.51 csys = 3.38 CPU)
[12:32:09] t/012_pg_basebackup_options.pl .... ok 13117 ms ( 0.00
usr 0.00 sys + 1.87 cusr 6.31 csys = 8.18 CPU)
[12:32:22] t/020_pg_receivewal.pl ............ ok 4314 ms ( 0.01
usr 0.00 sys + 0.97 cusr 1.77 csys = 2.75 CPU)
[12:32:26] t/030_pg_recvlogical.pl ........... ok 1908 ms ( 0.00
usr 0.00 sys + 0.64 cusr 0.77 csys = 1.41 CPU)
[12:32:28]
All tests successful.
Files=5, Tests=177, 25 wallclock secs ( 0.04 usr 0.02 sys + 4.45
cusr 11.41 csys = 15.92 CPU)

Sadly, we've gained about 2.5 seconds of runtime as the price of
splitting the test. Arguably the options part could be split up a lot
more finely than this, but that would drive up the runtime even more,
basically because we'd need more initdbs. So I don't know whether it's
better to leave things as they are, split them this much, or split
them more. I think this amount of splitting might be justified simply
in the interests of clarity, but I'm reluctant to go further unless we
get some nifty initdb-caching system.

Thoughts?

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

v1-0001-Split-010_pg_basebackup.pl.patch (application/octet-stream)
From aeaa4ba4629d1b48bbf13fd7512a9f1171631d70 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 18 Jan 2022 12:38:10 -0500
Subject: [PATCH v1] Split 010_pg_basebackup.pl.

---
 .../t/010_pg_basebackup_basic.pl              |  15 +
 ...ckup.pl => 011_pg_basebackup_integrity.pl} | 287 +---------------
 .../t/012_pg_basebackup_options.pl            | 318 ++++++++++++++++++
 3 files changed, 334 insertions(+), 286 deletions(-)
 create mode 100644 src/bin/pg_basebackup/t/010_pg_basebackup_basic.pl
 rename src/bin/pg_basebackup/t/{010_pg_basebackup.pl => 011_pg_basebackup_integrity.pl} (58%)
 create mode 100644 src/bin/pg_basebackup/t/012_pg_basebackup_options.pl

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup_basic.pl b/src/bin/pg_basebackup/t/010_pg_basebackup_basic.pl
new file mode 100644
index 0000000000..48863dc213
--- /dev/null
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup_basic.pl
@@ -0,0 +1,15 @@
+
+# Copyright (c) 2021-2022, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 9;
+
+program_help_ok('pg_basebackup');
+program_version_ok('pg_basebackup');
+program_options_handling_ok('pg_basebackup');
+
+command_fails(['pg_basebackup'],
+			  'pg_basebackup needs target directory specified');
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/011_pg_basebackup_integrity.pl
similarity index 58%
rename from src/bin/pg_basebackup/t/010_pg_basebackup.pl
rename to src/bin/pg_basebackup/t/011_pg_basebackup_integrity.pl
index f0243f28d4..68332931ac 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/011_pg_basebackup_integrity.pl
@@ -10,11 +10,7 @@ use File::Path qw(rmtree);
 use Fcntl qw(:seek);
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
-use Test::More tests => 115;
-
-program_help_ok('pg_basebackup');
-program_version_ok('pg_basebackup');
-program_options_handling_ok('pg_basebackup');
+use Test::More tests => 61;
 
 my $tempdir = PostgreSQL::Test::Utils::tempdir;
 
@@ -35,9 +31,6 @@ $node->init(extra => ['--data-checksums']);
 $node->start;
 my $pgdata = $node->data_dir;
 
-$node->command_fails(['pg_basebackup'],
-	'pg_basebackup needs target directory specified');
-
 # Some Windows ANSI code pages may reject this filename, in which case we
 # quietly proceed without this bit of test coverage.
 if (open my $badchars, '>>', "$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
@@ -400,281 +393,3 @@ foreach my $filename (@tempRelationFiles)
 ok( -d "$tempdir/backup1/pg_replslot",
 	'pg_replslot symlink copied as directory');
 rmtree("$tempdir/backup1");
-
-mkdir "$tempdir/tbl=spc2";
-$realTsDir = "$real_sys_tempdir/tbl=spc2";
-$node->safe_psql('postgres', "DROP TABLE test1;");
-$node->safe_psql('postgres', "DROP TABLE tblspc1_unlogged;");
-$node->safe_psql('postgres', "DROP TABLESPACE tblspc1;");
-$node->safe_psql('postgres',
-	"CREATE TABLESPACE tblspc2 LOCATION '$realTsDir';");
-$realTsDir =~ s/=/\\=/;
-$node->command_ok(
-	[
-		@pg_basebackup_defs, '-D',
-		"$tempdir/backup3",  '-Fp',
-		"-T$realTsDir=$real_tempdir/tbackup/tbl\\=spc2",
-	],
-	'mapping tablespace with = sign in path');
-ok(-d "$tempdir/tbackup/tbl=spc2", 'tablespace with = sign was relocated');
-$node->safe_psql('postgres', "DROP TABLESPACE tblspc2;");
-rmtree("$tempdir/backup3");
-
-mkdir "$tempdir/$superlongname";
-$realTsDir = "$real_sys_tempdir/$superlongname";
-$node->safe_psql('postgres',
-	"CREATE TABLESPACE tblspc3 LOCATION '$realTsDir';");
-$node->command_ok([ @pg_basebackup_defs, '-D', "$tempdir/tarbackup_l3", '-Ft' ],
-	'pg_basebackup tar with long symlink target');
-$node->safe_psql('postgres', "DROP TABLESPACE tblspc3;");
-rmtree("$tempdir/tarbackup_l3");
-
-$node->command_ok([ @pg_basebackup_defs, '-D', "$tempdir/backupR", '-R' ],
-	'pg_basebackup -R runs');
-ok(-f "$tempdir/backupR/postgresql.auto.conf", 'postgresql.auto.conf exists');
-ok(-f "$tempdir/backupR/standby.signal",       'standby.signal was created');
-my $recovery_conf = slurp_file "$tempdir/backupR/postgresql.auto.conf";
-rmtree("$tempdir/backupR");
-
-my $port = $node->port;
-like(
-	$recovery_conf,
-	qr/^primary_conninfo = '.*port=$port.*'\n/m,
-	'postgresql.auto.conf sets primary_conninfo');
-
-$node->command_ok(
-	[ @pg_basebackup_defs, '-D', "$tempdir/backupxd" ],
-	'pg_basebackup runs in default xlog mode');
-ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxd/pg_wal")),
-	'WAL files copied');
-rmtree("$tempdir/backupxd");
-
-$node->command_ok(
-	[ @pg_basebackup_defs, '-D', "$tempdir/backupxf", '-X', 'fetch' ],
-	'pg_basebackup -X fetch runs');
-ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_wal")),
-	'WAL files copied');
-rmtree("$tempdir/backupxf");
-$node->command_ok(
-	[ @pg_basebackup_defs, '-D', "$tempdir/backupxs", '-X', 'stream' ],
-	'pg_basebackup -X stream runs');
-ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxs/pg_wal")),
-	'WAL files copied');
-rmtree("$tempdir/backupxs");
-$node->command_ok(
-	[ @pg_basebackup_defs, '-D', "$tempdir/backupxst", '-X', 'stream', '-Ft' ],
-	'pg_basebackup -X stream runs in tar mode');
-ok(-f "$tempdir/backupxst/pg_wal.tar", "tar file was created");
-rmtree("$tempdir/backupxst");
-$node->command_ok(
-	[
-		@pg_basebackup_defs,     '-D',
-		"$tempdir/backupnoslot", '-X',
-		'stream',                '--no-slot'
-	],
-	'pg_basebackup -X stream runs with --no-slot');
-rmtree("$tempdir/backupnoslot");
-
-$node->command_fails(
-	[
-		@pg_basebackup_defs,         '-D',
-		"$tempdir/backupxs_sl_fail", '-X',
-		'stream',                    '-S',
-		'slot0'
-	],
-	'pg_basebackup fails with nonexistent replication slot');
-
-$node->command_fails(
-	[ @pg_basebackup_defs, '-D', "$tempdir/backupxs_slot", '-C' ],
-	'pg_basebackup -C fails without slot name');
-
-$node->command_fails(
-	[
-		@pg_basebackup_defs,      '-D',
-		"$tempdir/backupxs_slot", '-C',
-		'-S',                     'slot0',
-		'--no-slot'
-	],
-	'pg_basebackup fails with -C -S --no-slot');
-
-$node->command_ok(
-	[ @pg_basebackup_defs, '-D', "$tempdir/backupxs_slot", '-C', '-S', 'slot0' ],
-	'pg_basebackup -C runs');
-rmtree("$tempdir/backupxs_slot");
-
-is( $node->safe_psql(
-		'postgres',
-		q{SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'slot0'}
-	),
-	'slot0',
-	'replication slot was created');
-isnt(
-	$node->safe_psql(
-		'postgres',
-		q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot0'}
-	),
-	'',
-	'restart LSN of new slot is not null');
-
-$node->command_fails(
-	[ @pg_basebackup_defs, '-D', "$tempdir/backupxs_slot1", '-C', '-S', 'slot0' ],
-	'pg_basebackup fails with -C -S and a previously existing slot');
-
-$node->safe_psql('postgres',
-	q{SELECT * FROM pg_create_physical_replication_slot('slot1')});
-my $lsn = $node->safe_psql('postgres',
-	q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
-);
-is($lsn, '', 'restart LSN of new slot is null');
-$node->command_fails(
-	[ @pg_basebackup_defs, '-D', "$tempdir/fail", '-S', 'slot1', '-X', 'none' ],
-	'pg_basebackup with replication slot fails without WAL streaming');
-$node->command_ok(
-	[
-		@pg_basebackup_defs, '-D', "$tempdir/backupxs_sl", '-X',
-		'stream',            '-S', 'slot1'
-	],
-	'pg_basebackup -X stream with replication slot runs');
-$lsn = $node->safe_psql('postgres',
-	q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
-);
-like($lsn, qr!^0/[0-9A-Z]{7,8}$!, 'restart LSN of slot has advanced');
-rmtree("$tempdir/backupxs_sl");
-
-$node->command_ok(
-	[
-		@pg_basebackup_defs, '-D', "$tempdir/backupxs_sl_R", '-X',
-		'stream',            '-S', 'slot1',                  '-R',
-	],
-	'pg_basebackup with replication slot and -R runs');
-like(
-	slurp_file("$tempdir/backupxs_sl_R/postgresql.auto.conf"),
-	qr/^primary_slot_name = 'slot1'\n/m,
-	'recovery conf file sets primary_slot_name');
-
-my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
-is($checksum, 'on', 'checksums are enabled');
-rmtree("$tempdir/backupxs_sl_R");
-
-# create tables to corrupt and get their relfilenodes
-my $file_corrupt1 = $node->safe_psql('postgres',
-	q{CREATE TABLE corrupt1 AS SELECT a FROM generate_series(1,10000) AS a; ALTER TABLE corrupt1 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt1')}
-);
-my $file_corrupt2 = $node->safe_psql('postgres',
-	q{CREATE TABLE corrupt2 AS SELECT b FROM generate_series(1,2) AS b; ALTER TABLE corrupt2 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt2')}
-);
-
-# set page header and block sizes
-my $pageheader_size = 24;
-my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');
-
-# induce corruption
-system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
-open $file, '+<', "$pgdata/$file_corrupt1";
-seek($file, $pageheader_size, SEEK_SET);
-syswrite($file, "\0\0\0\0\0\0\0\0\0");
-close $file;
-system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
-
-$node->command_checks_all(
-	[ @pg_basebackup_defs, '-D', "$tempdir/backup_corrupt" ],
-	1,
-	[qr{^$}],
-	[qr/^WARNING.*checksum verification failed/s],
-	'pg_basebackup reports checksum mismatch');
-rmtree("$tempdir/backup_corrupt");
-
-# induce further corruption in 5 more blocks
-system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
-open $file, '+<', "$pgdata/$file_corrupt1";
-for my $i (1 .. 5)
-{
-	my $offset = $pageheader_size + $i * $block_size;
-	seek($file, $offset, SEEK_SET);
-	syswrite($file, "\0\0\0\0\0\0\0\0\0");
-}
-close $file;
-system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
-
-$node->command_checks_all(
-	[ @pg_basebackup_defs, '-D', "$tempdir/backup_corrupt2" ],
-	1,
-	[qr{^$}],
-	[qr/^WARNING.*further.*failures.*will.not.be.reported/s],
-	'pg_basebackup does not report more than 5 checksum mismatches');
-rmtree("$tempdir/backup_corrupt2");
-
-# induce corruption in a second file
-system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
-open $file, '+<', "$pgdata/$file_corrupt2";
-seek($file, $pageheader_size, SEEK_SET);
-syswrite($file, "\0\0\0\0\0\0\0\0\0");
-close $file;
-system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
-
-$node->command_checks_all(
-	[ @pg_basebackup_defs, '-D', "$tempdir/backup_corrupt3" ],
-	1,
-	[qr{^$}],
-	[qr/^WARNING.*7 total checksum verification failures/s],
-	'pg_basebackup correctly report the total number of checksum mismatches');
-rmtree("$tempdir/backup_corrupt3");
-
-# do not verify checksums, should return ok
-$node->command_ok(
-	[
-		@pg_basebackup_defs,        '-D',
-		"$tempdir/backup_corrupt4", '--no-verify-checksums',
-	],
-	'pg_basebackup with -k does not report checksum mismatch');
-rmtree("$tempdir/backup_corrupt4");
-
-$node->safe_psql('postgres', "DROP TABLE corrupt1;");
-$node->safe_psql('postgres', "DROP TABLE corrupt2;");
-
-note "Testing pg_basebackup with compression methods";
-
-# Check ZLIB compression if available.
-SKIP:
-{
-	skip "postgres was not built with ZLIB support", 5
-	  if (!check_pg_config("#define HAVE_LIBZ 1"));
-
-	$node->command_ok(
-		[
-			@pg_basebackup_defs,    '-D',
-			"$tempdir/backup_gzip", '--compress',
-			'1',                    '--format',
-			't'
-		],
-		'pg_basebackup with --compress');
-	$node->command_ok(
-		[
-			@pg_basebackup_defs,     '-D',
-			"$tempdir/backup_gzip2", '--gzip',
-			'--format',              't'
-		],
-		'pg_basebackup with --gzip');
-
-	# Verify that the stored files are generated with their expected
-	# names.
-	my @zlib_files = glob "$tempdir/backup_gzip/*.tar.gz";
-	is(scalar(@zlib_files), 2,
-		"two files created with --compress (base.tar.gz and pg_wal.tar.gz)");
-	my @zlib_files2 = glob "$tempdir/backup_gzip2/*.tar.gz";
-	is(scalar(@zlib_files2), 2,
-		"two files created with --gzip (base.tar.gz and pg_wal.tar.gz)");
-
-	# Check the integrity of the files generated.
-	my $gzip = $ENV{GZIP_PROGRAM};
-	skip "program gzip is not found in your system", 1
-	  if ( !defined $gzip
-		|| $gzip eq ''
-		|| system_log($gzip, '--version') != 0);
-
-	my $gzip_is_valid =
-	  system_log($gzip, '--test', @zlib_files, @zlib_files2);
-	is($gzip_is_valid, 0, "gzip verified the integrity of compressed data");
-	rmtree("$tempdir/backup_gzip");
-	rmtree("$tempdir/backup_gzip2");
-}
diff --git a/src/bin/pg_basebackup/t/012_pg_basebackup_options.pl b/src/bin/pg_basebackup/t/012_pg_basebackup_options.pl
new file mode 100644
index 0000000000..35c13f9e97
--- /dev/null
+++ b/src/bin/pg_basebackup/t/012_pg_basebackup_options.pl
@@ -0,0 +1,318 @@
+
+# Copyright (c) 2021-2022, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Basename qw(basename dirname);
+use File::Path qw(rmtree);
+use Fcntl qw(:seek);
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 45;
+
+my $tempdir = PostgreSQL::Test::Utils::tempdir;
+my $real_tempdir = PostgreSQL::Test::Utils::perl2host($tempdir);
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+
+# For nearly all pg_basebackup invocations some options should be specified,
+# to keep test times reasonable. Using @pg_basebackup_defs as the first
+# element of the array passed to to IPC::Run interpolate the array (as it is
+# not a reference to an array)...
+my @pg_basebackup_defs = ('pg_basebackup', '--no-sync', '-cfast');
+
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Initialize node
+$node->init(allows_streaming => 1, extra => ['--data-checksums']);
+$node->start;
+my $pgdata = $node->data_dir;
+
+# Create a temporary directory in the system location and symlink it
+# to our physical temp location.  That way we can use shorter names
+# for the tablespace directories, which hopefully won't run afoul of
+# the 99 character length limit.
+my $sys_tempdir = PostgreSQL::Test::Utils::tempdir_short;
+my $real_sys_tempdir = PostgreSQL::Test::Utils::perl2host($sys_tempdir) . "/tempdir";
+my $shorter_tempdir =  $sys_tempdir . "/tempdir";
+dir_symlink "$tempdir", $shorter_tempdir;
+
+mkdir "$tempdir/tbl=spc2";
+my $realTsDir = "$real_sys_tempdir/tbl=spc2";
+$node->safe_psql('postgres',
+	"CREATE TABLESPACE tblspc2 LOCATION '$realTsDir';");
+$realTsDir =~ s/=/\\=/;
+$node->command_ok(
+	[
+		@pg_basebackup_defs, '-D',
+		"$tempdir/backup3",  '-Fp',
+		"-T$realTsDir=$real_tempdir/tbackup/tbl\\=spc2",
+	],
+	'mapping tablespace with = sign in path');
+ok(-d "$tempdir/tbackup/tbl=spc2", 'tablespace with = sign was relocated');
+$node->safe_psql('postgres', "DROP TABLESPACE tblspc2;");
+rmtree("$tempdir/backup3");
+
+my $superlongname = "superlongname_" . ("x" x 100);
+mkdir "$tempdir/$superlongname";
+$realTsDir = "$real_sys_tempdir/$superlongname";
+$node->safe_psql('postgres',
+	"CREATE TABLESPACE tblspc3 LOCATION '$realTsDir';");
+$node->command_ok([ @pg_basebackup_defs, '-D', "$tempdir/tarbackup_l3", '-Ft' ],
+	'pg_basebackup tar with long symlink target');
+$node->safe_psql('postgres', "DROP TABLESPACE tblspc3;");
+rmtree("$tempdir/tarbackup_l3");
+
+$node->command_ok([ @pg_basebackup_defs, '-D', "$tempdir/backupR", '-R' ],
+	'pg_basebackup -R runs');
+ok(-f "$tempdir/backupR/postgresql.auto.conf", 'postgresql.auto.conf exists');
+ok(-f "$tempdir/backupR/standby.signal",       'standby.signal was created');
+my $recovery_conf = slurp_file "$tempdir/backupR/postgresql.auto.conf";
+rmtree("$tempdir/backupR");
+
+my $port = $node->port;
+like(
+	$recovery_conf,
+	qr/^primary_conninfo = '.*port=$port.*'\n/m,
+	'postgresql.auto.conf sets primary_conninfo');
+
+$node->command_ok(
+	[ @pg_basebackup_defs, '-D', "$tempdir/backupxd" ],
+	'pg_basebackup runs in default xlog mode');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxd/pg_wal")),
+	'WAL files copied');
+rmtree("$tempdir/backupxd");
+
+$node->command_ok(
+	[ @pg_basebackup_defs, '-D', "$tempdir/backupxf", '-X', 'fetch' ],
+	'pg_basebackup -X fetch runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_wal")),
+	'WAL files copied');
+rmtree("$tempdir/backupxf");
+$node->command_ok(
+	[ @pg_basebackup_defs, '-D', "$tempdir/backupxs", '-X', 'stream' ],
+	'pg_basebackup -X stream runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxs/pg_wal")),
+	'WAL files copied');
+rmtree("$tempdir/backupxs");
+$node->command_ok(
+	[ @pg_basebackup_defs, '-D', "$tempdir/backupxst", '-X', 'stream', '-Ft' ],
+	'pg_basebackup -X stream runs in tar mode');
+ok(-f "$tempdir/backupxst/pg_wal.tar", "tar file was created");
+rmtree("$tempdir/backupxst");
+$node->command_ok(
+	[
+		@pg_basebackup_defs,     '-D',
+		"$tempdir/backupnoslot", '-X',
+		'stream',                '--no-slot'
+	],
+	'pg_basebackup -X stream runs with --no-slot');
+rmtree("$tempdir/backupnoslot");
+
+$node->command_fails(
+	[
+		@pg_basebackup_defs,         '-D',
+		"$tempdir/backupxs_sl_fail", '-X',
+		'stream',                    '-S',
+		'slot0'
+	],
+	'pg_basebackup fails with nonexistent replication slot');
+
+$node->command_fails(
+	[ @pg_basebackup_defs, '-D', "$tempdir/backupxs_slot", '-C' ],
+	'pg_basebackup -C fails without slot name');
+
+$node->command_fails(
+	[
+		@pg_basebackup_defs,      '-D',
+		"$tempdir/backupxs_slot", '-C',
+		'-S',                     'slot0',
+		'--no-slot'
+	],
+	'pg_basebackup fails with -C -S --no-slot');
+
+$node->command_ok(
+	[ @pg_basebackup_defs, '-D', "$tempdir/backupxs_slot", '-C', '-S', 'slot0' ],
+	'pg_basebackup -C runs');
+rmtree("$tempdir/backupxs_slot");
+
+is( $node->safe_psql(
+		'postgres',
+		q{SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'slot0'}
+	),
+	'slot0',
+	'replication slot was created');
+isnt(
+	$node->safe_psql(
+		'postgres',
+		q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot0'}
+	),
+	'',
+	'restart LSN of new slot is not null');
+
+$node->command_fails(
+	[ @pg_basebackup_defs, '-D', "$tempdir/backupxs_slot1", '-C', '-S', 'slot0' ],
+	'pg_basebackup fails with -C -S and a previously existing slot');
+
+$node->safe_psql('postgres',
+	q{SELECT * FROM pg_create_physical_replication_slot('slot1')});
+my $lsn = $node->safe_psql('postgres',
+	q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+is($lsn, '', 'restart LSN of new slot is null');
+$node->command_fails(
+	[ @pg_basebackup_defs, '-D', "$tempdir/fail", '-S', 'slot1', '-X', 'none' ],
+	'pg_basebackup with replication slot fails without WAL streaming');
+$node->command_ok(
+	[
+		@pg_basebackup_defs, '-D', "$tempdir/backupxs_sl", '-X',
+		'stream',            '-S', 'slot1'
+	],
+	'pg_basebackup -X stream with replication slot runs');
+$lsn = $node->safe_psql('postgres',
+	q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+like($lsn, qr!^0/[0-9A-Z]{7,8}$!, 'restart LSN of slot has advanced');
+rmtree("$tempdir/backupxs_sl");
+
+$node->command_ok(
+	[
+		@pg_basebackup_defs, '-D', "$tempdir/backupxs_sl_R", '-X',
+		'stream',            '-S', 'slot1',                  '-R',
+	],
+	'pg_basebackup with replication slot and -R runs');
+like(
+	slurp_file("$tempdir/backupxs_sl_R/postgresql.auto.conf"),
+	qr/^primary_slot_name = 'slot1'\n/m,
+	'recovery conf file sets primary_slot_name');
+
+my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
+is($checksum, 'on', 'checksums are enabled');
+rmtree("$tempdir/backupxs_sl_R");
+
+# create tables to corrupt and get their relfilenodes
+my $file_corrupt1 = $node->safe_psql('postgres',
+	q{CREATE TABLE corrupt1 AS SELECT a FROM generate_series(1,10000) AS a; ALTER TABLE corrupt1 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt1')}
+);
+my $file_corrupt2 = $node->safe_psql('postgres',
+	q{CREATE TABLE corrupt2 AS SELECT b FROM generate_series(1,2) AS b; ALTER TABLE corrupt2 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt2')}
+);
+
+# set page header and block sizes
+my $pageheader_size = 24;
+my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');
+
+# induce corruption
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open my $file, '+<', "$pgdata/$file_corrupt1";
+seek($file, $pageheader_size, SEEK_SET);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+	[ @pg_basebackup_defs, '-D', "$tempdir/backup_corrupt" ],
+	1,
+	[qr{^$}],
+	[qr/^WARNING.*checksum verification failed/s],
+	'pg_basebackup reports checksum mismatch');
+rmtree("$tempdir/backup_corrupt");
+
+# induce further corruption in 5 more blocks
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+for my $i (1 .. 5)
+{
+	my $offset = $pageheader_size + $i * $block_size;
+	seek($file, $offset, SEEK_SET);
+	syswrite($file, "\0\0\0\0\0\0\0\0\0");
+}
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+	[ @pg_basebackup_defs, '-D', "$tempdir/backup_corrupt2" ],
+	1,
+	[qr{^$}],
+	[qr/^WARNING.*further.*failures.*will.not.be.reported/s],
+	'pg_basebackup does not report more than 5 checksum mismatches');
+rmtree("$tempdir/backup_corrupt2");
+
+# induce corruption in a second file
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt2";
+seek($file, $pageheader_size, SEEK_SET);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+	[ @pg_basebackup_defs, '-D', "$tempdir/backup_corrupt3" ],
+	1,
+	[qr{^$}],
+	[qr/^WARNING.*7 total checksum verification failures/s],
+	'pg_basebackup correctly report the total number of checksum mismatches');
+rmtree("$tempdir/backup_corrupt3");
+
+# do not verify checksums, should return ok
+$node->command_ok(
+	[
+		@pg_basebackup_defs,        '-D',
+		"$tempdir/backup_corrupt4", '--no-verify-checksums',
+	],
+	'pg_basebackup with -k does not report checksum mismatch');
+rmtree("$tempdir/backup_corrupt4");
+
+$node->safe_psql('postgres', "DROP TABLE corrupt1;");
+$node->safe_psql('postgres', "DROP TABLE corrupt2;");
+
+note "Testing pg_basebackup with compression methods";
+
+# Check ZLIB compression if available.
+SKIP:
+{
+	skip "postgres was not built with ZLIB support", 5
+	  if (!check_pg_config("#define HAVE_LIBZ 1"));
+
+	$node->command_ok(
+		[
+			@pg_basebackup_defs,    '-D',
+			"$tempdir/backup_gzip", '--compress',
+			'1',                    '--format',
+			't'
+		],
+		'pg_basebackup with --compress');
+	$node->command_ok(
+		[
+			@pg_basebackup_defs,     '-D',
+			"$tempdir/backup_gzip2", '--gzip',
+			'--format',              't'
+		],
+		'pg_basebackup with --gzip');
+
+	# Verify that the stored files are generated with their expected
+	# names.
+	my @zlib_files = glob "$tempdir/backup_gzip/*.tar.gz";
+	is(scalar(@zlib_files), 2,
+		"two files created with --compress (base.tar.gz and pg_wal.tar.gz)");
+	my @zlib_files2 = glob "$tempdir/backup_gzip2/*.tar.gz";
+	is(scalar(@zlib_files2), 2,
+		"two files created with --gzip (base.tar.gz and pg_wal.tar.gz)");
+
+	# Check the integrity of the files generated.
+	my $gzip = $ENV{GZIP_PROGRAM};
+	skip "program gzip is not found in your system", 1
+	  if ( !defined $gzip
+		|| $gzip eq ''
+		|| system_log($gzip, '--version') != 0);
+
+	my $gzip_is_valid =
+	  system_log($gzip, '--test', @zlib_files, @zlib_files2);
+	is($gzip_is_valid, 0, "gzip verified the integrity of compressed data");
+	rmtree("$tempdir/backup_gzip");
+	rmtree("$tempdir/backup_gzip2");
+}
-- 
2.24.3 (Apple Git-128)

#9Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#8)
Re: slowest tap tests - split or accelerate?

Hi,

On 2022-01-18 12:49:16 -0500, Robert Haas wrote:
> Here's a patch that splits up that file.

Ah, nice! The split seems sensible to me.

> Sadly, we've gained about 2.5 seconds of runtime as the price of
> splitting the test. Arguably the options part could be split up a lot
> more finely than this, but that would drive up the runtime even more,
> basically because we'd need more initdbs. So I don't know whether it's
> better to leave things as they are, split them this much, or split
> them more. I think this amount of splitting might be justified simply
> in the interests of clarity, but I'm reluctant to go further unless we
> get some nifty initdb-caching system.

Hm. From the buildfarm / CF perspective it might still be a win, because the
different pieces can run concurrently. But it's not great :(.

Maybe we really should do at least the most simplistic caching for initdbs, by
doing one initdb as part of the creation of temp_install. Then Cluster::init
would need logic to only use that if $params{extra} is empty.
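The cache-and-copy idea can be illustrated generically (this is a sketch, not the actual PostgreSQL code; expensive_initdb below is a stand-in for a real initdb run, and the paths are illustrative):

```shell
set -e

workdir=$(mktemp -d)
template="$workdir/initdb-template"

# Stand-in for the expensive initialization step that the template caches.
expensive_initdb() {
    mkdir -p "$1"
    echo "max_connections = 100" > "$1/postgresql.conf"
}

# Done once, e.g. while building tmp_install.
expensive_initdb "$template"

# Each test cluster clones the template instead of re-running initdb.
for cluster in node1 node2; do
    cp -R "$template" "$workdir/$cluster"
done

cat "$workdir/node1/postgresql.conf"    # prints: max_connections = 100
```

The win comes from replacing N expensive initializations with one initialization plus N cheap directory copies.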

Greetings,

Andres Freund

#10Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#9)
1 attachment(s)
Re: slowest tap tests - split or accelerate?

Hi,

On 2022-01-18 13:40:40 -0800, Andres Freund wrote:
> Maybe we really should do at least the most simplistic caching for initdbs, by
> doing one initdb as part of the creation of temp_install. Then Cluster::init
> would need logic to only use that if $params{extra} is empty.

I hacked this together. And the wins are bigger than I thought. On my
workstation, with plenty cpu and storage bandwidth, according to
/usr/bin/time check-world NO_TEMP_INSTALL=1
things go from

321.56user 74.00system 2:26.22elapsed 270%CPU (0avgtext+0avgdata 93768maxresident)k
24inputs+32781336outputs (2254major+8717121minor)pagefaults 0swaps

to

86.62user 57.10system 1:57.83elapsed 121%CPU (0avgtext+0avgdata 93752maxresident)k
8inputs+32683408outputs (1360major+6672618minor)pagefaults 0swaps

The difference in elapsed and system time is pretty good, but the user time
difference is quite staggering.

This doesn't yet actually address the case of the basebackup tests, because
they specify a "non-default" option, preventing the use of the template
initdb. But the effects are already big enough that I thought it's worth
sharing.
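The fallback rule described above (use the template only when no extra initdb options were passed and a template exists) can be sketched in shell terms; INITDB_TEMPLATE is the environment variable the patch introduces, while init_cluster and the paths are hypothetical names for illustration:

```shell
# Sketch of the guard: fall back to a full initdb run when the caller
# passed extra options or no template is available; otherwise copy
# the cached template.
init_cluster() {
    pgdata=$1
    extra_opts=$2
    if [ -n "$extra_opts" ] || [ -z "$INITDB_TEMPLATE" ]; then
        echo "running full initdb on $pgdata $extra_opts"
    else
        echo "copying template $INITDB_TEMPLATE to $pgdata"
    fi
}

INITDB_TEMPLATE=/tmp/initdb-template          # illustrative path
init_cluster /tmp/node1 ''                    # uses the cached template
init_cluster /tmp/node2 '--data-checksums'    # non-default option: full initdb
```

This is why the basebackup tests, which pass --data-checksums, still pay the full initdb cost.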

On CI for windows this reduces the time for the subscription tests from 3:24
to 2:39. There's some run-to-run variation, but it's a pretty clear signal...

Greetings,

Andres Freund

Attachments:

v2-0001-hack-use-template-initdb-in-tap-tests.patch (text/x-diff; charset=us-ascii)
From a0818b6c8d03f152baa5df231b27aa7b8a7fde45 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 18 Jan 2022 15:25:27 -0800
Subject: [PATCH v2] hack: use "template" initdb in tap tests.

---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 31 +++++++++++++-
 src/test/regress/pg_regress.c            | 45 ++++++++++++++------
 src/Makefile.global.in                   | 52 +++++++++++++-----------
 src/tools/msvc/Install.pm                |  4 ++
 src/tools/msvc/vcregress.pl              |  1 +
 5 files changed, 94 insertions(+), 39 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 7af0f8db139..7e235c90d8c 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -430,8 +430,35 @@ sub init
 	mkdir $self->backup_dir;
 	mkdir $self->archive_dir;
 
-	PostgreSQL::Test::Utils::system_or_bail('initdb', '-D', $pgdata, '-A', 'trust', '-N',
-		@{ $params{extra} });
+	if (defined $params{extra} or !defined $ENV{INITDB_TEMPLATE})
+	{
+		diag("*not* using initdb template");
+		PostgreSQL::Test::Utils::system_or_bail('initdb', '-D', $pgdata, '-A', 'trust', '-N',
+			@{ $params{extra} });
+	}
+	else
+	{
+		my $old_umask;
+
+		diag("using initdb template");
+
+		if (!$PostgreSQL::Test::Utils::windows_os)
+		{
+			$old_umask = umask;
+			umask 0077;
+		}
+
+		PostgreSQL::Test::RecursiveCopy::copypath(
+			$ENV{INITDB_TEMPLATE},
+			$pgdata,
+		  );
+
+		if (!$PostgreSQL::Test::Utils::windows_os)
+		{
+			umask $old_umask;
+		}
+	}
+
 	PostgreSQL::Test::Utils::system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata,
 		@{ $params{auth_extra} });
 
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index e6f71c7582e..07f1beae6c6 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2232,6 +2232,7 @@ regression_main(int argc, char *argv[],
 		FILE	   *pg_conf;
 		const char *env_wait;
 		int			wait_seconds;
+		const char* initdb_template_dir;
 
 		/*
 		 * Prepare the temp instance
@@ -2258,20 +2259,38 @@ regression_main(int argc, char *argv[],
 		if (!directory_exists(buf))
 			make_directory(buf);
 
-		/* initdb */
-		header(_("initializing database system"));
-		snprintf(buf, sizeof(buf),
-				 "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync%s%s > \"%s/log/initdb.log\" 2>&1",
-				 bindir ? bindir : "",
-				 bindir ? "/" : "",
-				 temp_instance,
-				 debug ? " --debug" : "",
-				 nolocale ? " --no-locale" : "",
-				 outputdir);
-		if (system(buf))
+		/* create data directory, either from a template, or by running initdb */
+		initdb_template_dir = getenv("INITDB_TEMPLATE");
+		if (initdb_template_dir == NULL)
 		{
-			fprintf(stderr, _("\n%s: initdb failed\nExamine %s/log/initdb.log for the reason.\nCommand was: %s\n"), progname, outputdir, buf);
-			exit(2);
+			header(_("initializing database system"));
+			snprintf(buf, sizeof(buf),
+					 "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync%s%s > \"%s/log/initdb.log\" 2>&1",
+					 bindir ? bindir : "",
+					 bindir ? "/" : "",
+					 temp_instance,
+					 debug ? " --debug" : "",
+					 nolocale ? " --no-locale" : "",
+					 outputdir);
+			if (system(buf))
+			{
+				fprintf(stderr, _("\n%s: initdb failed\nExamine %s/log/initdb.log for the reason.\nCommand was: %s\n"), progname, outputdir, buf);
+				exit(2);
+			}
+		}
+		else
+		{
+			/* FIXME: this only works on windows when there's a GNU cp in PATH */
+			header(_("initializing database system from template"));
+			snprintf(buf, sizeof(buf),
+					 "cp -a \"%s\" \"%s/data\"",
+					 initdb_template_dir,
+					 temp_instance);
+			if (system(buf))
+			{
+				fprintf(stderr, _("\n%s: copying of initdb template failed\n"), progname);
+				exit(2);
+			}
 		}
 
 		/*
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 05c54b27def..6d07e76ad68 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -401,30 +401,6 @@ check: temp-install
 
 .PHONY: temp-install
 
-temp-install: | submake-generated-headers
-ifndef NO_TEMP_INSTALL
-ifneq ($(abs_top_builddir),)
-ifeq ($(MAKELEVEL),0)
-	rm -rf '$(abs_top_builddir)'/tmp_install
-	$(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
-	$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
-	$(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
-endif
-endif
-endif
-
-# Tasks to run serially at the end of temp-install.  Some EXTRA_INSTALL
-# entries appear more than once in the tree, and parallel installs of the same
-# file can fail with EEXIST.
-checkprep:
-	$(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install || exit; done)
-
-PROVE = @PROVE@
-# There are common routines in src/test/perl, and some test suites have
-# extra perl modules in their own directory.
-PG_PROVE_FLAGS = -I $(top_srcdir)/src/test/perl/ -I $(srcdir)
-# User-supplied prove flags such as --verbose can be provided in PROVE_FLAGS.
-PROVE_FLAGS =
 
 # prepend to path if already set, else just set it
 define add_to_path
@@ -441,8 +417,36 @@ ld_library_path_var = LD_LIBRARY_PATH
 with_temp_install = \
 	PATH="$(abs_top_builddir)/tmp_install$(bindir):$(CURDIR):$$PATH" \
 	$(call add_to_path,$(strip $(ld_library_path_var)),$(abs_top_builddir)/tmp_install$(libdir)) \
+	INITDB_TEMPLATE='$(abs_top_builddir)'/tmp_install/initdb-template \
 	$(with_temp_install_extra)
 
+temp-install: | submake-generated-headers
+ifndef NO_TEMP_INSTALL
+ifneq ($(abs_top_builddir),)
+ifeq ($(MAKELEVEL),0)
+	rm -rf '$(abs_top_builddir)'/tmp_install
+	$(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
+	$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
+	$(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
+
+	$(with_temp_install) initdb -A trust -N --no-instructions '$(abs_top_builddir)'/tmp_install/initdb-template >>'$(abs_top_builddir)'/tmp_install/log/initdb-template.log 2>&1
+endif
+endif
+endif
+
+# Tasks to run serially at the end of temp-install.  Some EXTRA_INSTALL
+# entries appear more than once in the tree, and parallel installs of the same
+# file can fail with EEXIST.
+checkprep:
+	$(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install || exit; done)
+
+PROVE = @PROVE@
+# There are common routines in src/test/perl, and some test suites have
+# extra perl modules in their own directory.
+PG_PROVE_FLAGS = -I $(top_srcdir)/src/test/perl/ -I $(srcdir)
+# User-supplied prove flags such as --verbose can be provided in PROVE_FLAGS.
+PROVE_FLAGS =
+
 ifeq ($(enable_tap_tests),yes)
 
 ifndef PGXS
diff --git a/src/tools/msvc/Install.pm b/src/tools/msvc/Install.pm
index 8de79c618cb..cef069c6162 100644
--- a/src/tools/msvc/Install.pm
+++ b/src/tools/msvc/Install.pm
@@ -14,6 +14,7 @@ use Carp;
 use File::Basename;
 use File::Copy;
 use File::Find ();
+use File::Path qw(rmtree);
 
 use Exporter;
 our (@ISA, @EXPORT_OK);
@@ -183,6 +184,9 @@ sub Install
 
 	GenerateNLSFiles($target, $config->{nls}, $majorver) if ($config->{nls});
 
+	rmtree("$target/initdb_template");
+	system("$target/bin/initdb -D $target/initdb_template -A trust -N") == 0 || die "template initdb failed";
+
 	print "Installation complete.\n";
 	return;
 }
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index 6dcd742fa6c..d05fc364c55 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -73,6 +73,7 @@ $ENV{with_icu} = $config->{icu} ? 'yes' : 'no';
 $ENV{with_gssapi} = $config->{gss} ? 'yes' : 'no';
 $ENV{with_krb_srvnam} = $config->{krb_srvnam} || 'postgres';
 $ENV{with_readline} = 'no';
+$ENV{INITDB_TEMPLATE} = "$tmp_installdir/initdb_template";
 
 $ENV{PATH} = "$topdir/$Config/libpq;$ENV{PATH}";
 
-- 
2.34.0

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#10)
Re: slowest tap tests - split or accelerate?

Andres Freund <andres@anarazel.de> writes:

On 2022-01-18 13:40:40 -0800, Andres Freund wrote:

Maybe we really should do at least the most simplistic caching for initdbs, by
doing one initdb as part of the creation of temp_install. Then Cluster::init
would need logic to only use that if $params{extra} is empty.

I hacked this together. And the wins are bigger than I thought.

Me too ;-). As I remarked earlier, I'd tried this once before and
gave up because it didn't seem to be winning much. But that was
before we had so many initdb's triggered by TAP tests, I think.

I tried this patch on florican's host, which seems mostly disk-bound
when doing check-world. It barely gets any win from parallelism:

$ time make -s check-world -j1 >/dev/null
3809.60 real 584.44 user 282.23 sys
$ time make -s check-world -j2 >/dev/null
3789.90 real 610.60 user 289.60 sys

Adding v2-0001-hack-use-template-initdb-in-tap-tests.patch:

$ time make -s check-world -j1 >/dev/null
3193.46 real 221.32 user 226.11 sys
$ time make -s check-world -j2 >/dev/null
3211.19 real 224.31 user 230.07 sys

(Note that all four runs have the "fsync = on" removed from
008_fsm_truncation.pl.)

So this looks like it'll be a nice win for low-end hardware, too.

regards, tom lane

#12Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#11)
Re: slowest tap tests - split or accelerate?

Hi,

On 2022-01-19 11:54:01 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2022-01-18 13:40:40 -0800, Andres Freund wrote:

Maybe we really should do at least the most simplistic caching for initdbs, by
doing one initdb as part of the creation of temp_install. Then Cluster::init
would need logic to only use that if $params{extra} is empty.

I hacked this together. And the wins are bigger than I thought.

Me too ;-). As I remarked earlier, I'd tried this once before and
gave up because it didn't seem to be winning much. But that was
before we had so many initdb's triggered by TAP tests, I think.

What approach did you use? Do you have a better idea than generating
tmp_install/initdb_template?

I for a bit wondered whether initdb should do this internally instead. But it
seemed more work than I wanted to tackle.

The bit in the patch about generating initdb_template in Install.pm definitely
needs to be made conditional, but I don't precisely know on what. The
buildfarm just calls it as
perl install.pl "$installdir"

So this looks like it'll be a nice win for low-end hardware, too.

Nice!

(Note that all four runs have the "fsync = on" removed from
008_fsm_truncation.pl.)

I assume you're planning on comitting that?

Greetings,

Andres Freund

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#12)
1 attachment(s)
Re: slowest tap tests - split or accelerate?

Andres Freund <andres@anarazel.de> writes:

On 2022-01-19 11:54:01 -0500, Tom Lane wrote:

Me too ;-). As I remarked earlier, I'd tried this once before and
gave up because it didn't seem to be winning much. But that was
before we had so many initdb's triggered by TAP tests, I think.

What approach did you use? Do you have a better idea than generating
tmp_install/initdb_template?

No, it was largely the same as what you have here, I think. I dug
up my WIP patch and attach it below, just in case there's any ideas
worth borrowing.

(Note that all four runs have the "fsync = on" removed from
008_fsm_truncation.pl.)

I assume you're planning on comitting that?

Yeah, will do that shortly.

regards, tom lane

Attachments:

old-hack-to-avoid-duplicate-initdb.patch (text/x-diff; charset=us-ascii)
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index dc8a89af8e..a28a03ac72 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -332,6 +332,7 @@ ifeq ($(MAKELEVEL),0)
 	rm -rf '$(abs_top_builddir)'/tmp_install
 	$(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
 	$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
+	'$(abs_top_builddir)/tmp_install$(bindir)/initdb' -D '$(abs_top_builddir)/tmp_install/proto_pgdata' --no-clean --no-sync -A trust >'$(abs_top_builddir)'/tmp_install/log/initdb.log 2>&1
 endif
 	$(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install >>'$(abs_top_builddir)'/tmp_install/log/install.log || exit; done)
 endif
@@ -355,7 +356,7 @@ $(if $(filter $(PORTNAME),darwin),DYLD_LIBRARY_PATH,$(if $(filter $(PORTNAME),ai
 endef
 
 define with_temp_install
-PATH="$(abs_top_builddir)/tmp_install$(bindir):$$PATH" $(call add_to_path,$(ld_library_path_var),$(abs_top_builddir)/tmp_install$(libdir))
+PATH="$(abs_top_builddir)/tmp_install$(bindir):$$PATH" $(call add_to_path,$(ld_library_path_var),$(abs_top_builddir)/tmp_install$(libdir)) PG_PROTO_DATADIR='$(abs_top_builddir)/tmp_install/proto_pgdata'
 endef
 
 ifeq ($(enable_tap_tests),yes)
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index f4fa600951..22c1a726c7 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -404,8 +404,23 @@ sub init
 	mkdir $self->backup_dir;
 	mkdir $self->archive_dir;
 
-	TestLib::system_or_bail('initdb', '-D', $pgdata, '-A', 'trust', '-N',
-		@{ $params{extra} });
+	# If we're using default initdb parameters, and the top-level "make check"
+	# created a prototype data directory for us, just copy that.
+	if (!defined($params{extra}) &&
+		defined($ENV{PG_PROTO_DATADIR}) &&
+		-d $ENV{PG_PROTO_DATADIR})
+	{
+		rmdir($pgdata);
+		RecursiveCopy::copypath($ENV{PG_PROTO_DATADIR}, $pgdata);
+		chmod(0700, $pgdata);
+	}
+	else
+	{
+		TestLib::system_or_bail('initdb', '-D', $pgdata,
+								'--no-clean', '--no-sync', '-A', 'trust',
+								@{ $params{extra} });
+	}
+
 	TestLib::system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata);
 
 	open my $conf, '>>', "$pgdata/postgresql.conf";
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index abb742b1ed..04d7fb910b 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2214,6 +2214,8 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 
 	if (temp_instance)
 	{
+		char	   *pg_proto_datadir;
+		struct stat ppst;
 		FILE	   *pg_conf;
 		const char *env_wait;
 		int			wait_seconds;
@@ -2243,20 +2245,43 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		if (!directory_exists(buf))
 			make_directory(buf);
 
-		/* initdb */
-		header(_("initializing database system"));
-		snprintf(buf, sizeof(buf),
-				 "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync%s%s > \"%s/log/initdb.log\" 2>&1",
-				 bindir ? bindir : "",
-				 bindir ? "/" : "",
-				 temp_instance,
-				 debug ? " --debug" : "",
-				 nolocale ? " --no-locale" : "",
-				 outputdir);
-		if (system(buf))
+		/* see if we should run initdb or just copy a prototype datadir */
+		pg_proto_datadir = getenv("PG_PROTO_DATADIR");
+		if (!nolocale &&
+			pg_proto_datadir &&
+			stat(pg_proto_datadir, &ppst) == 0 &&
+			S_ISDIR(ppst.st_mode))
 		{
-			fprintf(stderr, _("\n%s: initdb failed\nExamine %s/log/initdb.log for the reason.\nCommand was: %s\n"), progname, outputdir, buf);
-			exit(2);
+			/* copy prototype */
+			header(_("copying prototype data directory"));
+			snprintf(buf, sizeof(buf),
+					 "cp -a \"%s\" \"%s/data\" > \"%s/log/initdb.log\" 2>&1",
+					 pg_proto_datadir,
+					 temp_instance,
+					 outputdir);
+			if (system(buf))
+			{
+				fprintf(stderr, _("\n%s: cp failed\nExamine %s/log/initdb.log for the reason.\nCommand was: %s\n"), progname, outputdir, buf);
+				exit(2);
+			}
+		}
+		else
+		{
+			/* run initdb */
+			header(_("initializing database system"));
+			snprintf(buf, sizeof(buf),
+					 "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync%s%s > \"%s/log/initdb.log\" 2>&1",
+					 bindir ? bindir : "",
+					 bindir ? "/" : "",
+					 temp_instance,
+					 debug ? " --debug" : "",
+					 nolocale ? " --no-locale" : "",
+					 outputdir);
+			if (system(buf))
+			{
+				fprintf(stderr, _("\n%s: initdb failed\nExamine %s/log/initdb.log for the reason.\nCommand was: %s\n"), progname, outputdir, buf);
+				exit(2);
+			}
 		}
 
 		/*
#14Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#13)
Re: slowest tap tests - split or accelerate?

Hi,

On 2022-01-19 12:14:21 -0500, Tom Lane wrote:

No, it was largely the same as what you have here, I think. I dug
up my WIP patch and attach it below, just in case there's any ideas
worth borrowing.

Heh, it does look quite similar.

+					 "cp -a \"%s\" \"%s/data\" > \"%s/log/initdb.log\" 2>&1",
+					 pg_proto_datadir,
+					 temp_instance,
+					 outputdir);
+			if (system(buf))
+			{
+				fprintf(stderr, _("\n%s: cp failed\nExamine %s/log/initdb.log for the reason.\nCommand was: %s\n"), progname, outputdir, buf);
+				exit(2);
+			}

Both ours have this. Unfortunately on windows cp doesn't natively
exist. Although git does provide it. I tried a few things that appear to be
natively available (time is best of three executions):

gnu cp from git, cp -a tmp_install\initdb_template t\
670ms

xcopy.exe /E /Q tmp_install\initdb_template t\
638ms

robocopy /e /NFL /NDL tmp_install\initdb_template t\
575ms

So I guess we could use robocopy? That's shipped as part of windows starting in
windows 10... xcopy has been there for longer, so I might just default to that.
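[Editorial note: for illustration only, not part of the thread's patches. The platform-dependent copy with robocopy's unusual exit-code convention that gets discussed here could be sketched in portable shell roughly as follows; the function name and paths are made up for the example, and the robocopy switches are the ones quoted above.]

```shell
# Sketch: pick a recursive-copy command per platform.  robocopy uses
# exit code 1 to mean "files were copied" (codes >= 8 are errors), so
# its result has to be normalized before treating it as success/failure.
copy_template() {
  src=$1
  dst=$2
  case "$(uname -s 2>/dev/null)" in
    MINGW*|MSYS*|CYGWIN*)
      # Windows: robocopy, quieted down as in the timings above
      robocopy /E /NFL /NDL "$src" "$dst"
      rc=$?
      # 0 = nothing to copy, 1 = files copied; anything above 1 is suspect
      [ "$rc" -le 1 ]
      ;;
    *)
      # Everywhere else: plain recursive archive copy
      cp -a "$src" "$dst"
      ;;
  esac
}
```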

Greetings,

Andres Freund

#15Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#14)
Re: slowest tap tests - split or accelerate?

Hi,

On 2022-01-19 09:42:31 -0800, Andres Freund wrote:

Both ours have this. Unfortunately on windows cp doesn't natively
exist. Although git does provide it. I tried a few things that appear to be
natively available (time is best of three executions):

gnu cp from git, cp -a tmp_install\initdb_template t\
670ms

xcopy.exe /E /Q tmp_install\initdb_template t\
638ms

This errors out if there are any forward slashes in paths, thinking it's a
flag. Seems out.

robocopy /e /NFL /NDL tmp_install\initdb_template t\
575ms

So I guess we could use robocopy? That's shipped as part of windows starting in
windows 10... xcopy has been there for longer, so I might just default to that.

It's part of the OS back to at least windows 2016. I've found some random
links on the webs saying that it's included "This command is available in
Vista and Windows 7 by default. For Windows XP and Server 2003 this tool can
be downloaded as part of Server 2003 Windows Resource Kit tools. ".

Given that our oldest supported msvc version only runs on Windows 7 upwards
[2], I think we should be good?

Alternatively we could lift copydir() to src/common? But that seems like a bit
more work than I want to put in.

For a second I was thinking that using something like copy --reflink=auto
could make a lot of sense for machines like florican, removing most of the IO
from a "templated initdb". But it looks like freebsd doesn't have that, and
it'd be a pain to figure out whether cp has --reflink.
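[Editorial note: for illustration only, not part of the thread's patches. Figuring out whether cp has --reflink isn't necessarily that painful: one hedged approach is to probe once with a throwaway copy and remember the answer. The variable and function names below are invented for the sketch.]

```shell
# Probe once whether this cp understands --reflink; GNU cp accepts it
# (and --reflink=auto silently falls back on non-CoW filesystems), while
# e.g. BSD cp rejects the unknown option outright.
probe="${TMPDIR:-/tmp}/reflink_probe.$$"
if cp --reflink=auto /dev/null "$probe" 2>/dev/null; then
  CP_REFLINK="--reflink=auto"
else
  CP_REFLINK=""
fi
rm -f "$probe"

# Use the probed flag for all subsequent datadir copies.
# $CP_REFLINK is intentionally unquoted so an empty value expands to nothing.
copy_datadir() {
  cp -a $CP_REFLINK "$1" "$2"
}
```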

Greetings,

Andres Freund

[1]: https://www.windows-commandline.com/download-robocopy/
[2]: https://docs.microsoft.com/en-us/visualstudio/releases/2013/vs2013-sysrequirements-vs

#16Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#15)
Re: slowest tap tests - split or accelerate?

On 1/19/22 21:18, Andres Freund wrote:

Hi,

On 2022-01-19 09:42:31 -0800, Andres Freund wrote:

Both ours have this. Unfortunately on windows cp doesn't natively
exist. Although git does provide it. I tried a few things that appear to be
natively available (time is best of three executions):

gnu cp from git, cp -a tmp_install\initdb_template t\
670ms

xcopy.exe /E /Q tmp_install\initdb_template t\
638ms

This errors out if there are any forward slashes in paths, thinking it's a
flag. Seems out.

robocopy /e /NFL /NDL tmp_install\initdb_template t\
575ms

So I guess we could use robocopy? That's shipped as part of windows starting in
windows 10... xcopy has been there for longer, so I might just default to that.

It's part of the OS back to at least windows 2016. I've found some random
links on the webs saying that it's included "This command is available in
Vista and Windows 7 by default. For Windows XP and Server 2003 this tool can
be downloaded as part of Server 2003 Windows Resource Kit tools. ".

Given that our oldest supported msvc version only runs on Windows 7 upwards
[2], I think we should be good?

Alternatively we could lift copydir() to src/common? But that seems like a bit
more work than I want to put in.

For a second I was thinking that using something like copy --reflink=auto
could make a lot of sense for machines like florican, removing most of the IO
from a "templated initdb". But it looks like freebsd doesn't have that, and
it'd be a pain to figure out whether cp has --reflink.

FYI, the buildfarm code has this. It doesn't need backslashed paths, you
just need to quote the paths, which you should probably do anyway:

sub copydir
{
    my ($from, $to, $logfile) = @_;
    my ($cp, $rd);
    if ($PGBuild::conf{using_msvc})
    {
        $cp = "robocopy /nfl /ndl /np /e /sec ";
        $rd = qq{/LOG+:"$logfile" >nul};
    }
    else
    {
        $cp = "cp -r";
        $rd = qq{> "$logfile"};
    }
    system(qq{$cp "$from" "$to" $rd 2>&1});
    ## no critic (RequireLocalizedPunctuationVars)
    $? = 0 if ($cp =~ /robocopy/ && $? >> 8 == 1);
    return;
}

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#17Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#15)
Re: slowest tap tests - split or accelerate?

Hi,

On 2022-01-19 18:18:59 -0800, Andres Freund wrote:

robocopy /e /NFL /NDL tmp_install\initdb_template t\
575ms

So I guess we could use robocopy? That's shipped as part of windows starting in
windows 10... xcopy has been there for longer, so I might just default to that.

It's part of the OS back to at least windows 2016. I've found some random
links on the webs saying that it's included "This command is available in
Vista and Windows 7 by default. For Windows XP and Server 2003 this tool can
be downloaded as part of Server 2003 Windows Resource Kit tools. ".

Given that our oldest supported msvc version only runs on Windows 7 upwards
[2], I think we should be good?

One thing I'm not sure about is where to perform the creation of the
"template" for the msvc scripts. The prototype upthread created it
unconditionally in Install.pm, but that's clearly not right.

The buildfarm currently creates the temporary installation using a generic
perl install.pl "$installdir" and then uses NO_TEMP_INSTALL.

I don't really have a better idea than to introduce a dedicated vcregress.pl
command to create the temporary installation? :(

Greetings,

Andres Freund

#18Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#15)
1 attachment(s)
initdb caching during tests

Hi,

We have some issues with CI on macos and windows being too expensive (more on
that soon in a separate email), which reminded me of this thread (with
original title: [1])

I've attached a somewhat cleaned up version of the patch to cache initdb
across runs. The results are still fairly impressive in my opinion.

One thing I do not like, but don't have a good idea for how to improve, is
that there's a bunch of duplicated logic in pg_regress.c and Cluster.pm. I've
tried to move that into initdb.c itself, but that ends up pretty ugly, because
we need to be a lot more careful about checking whether options are compatible
etc. I've also thought about just putting this into a separate perl script,
but right now we still allow basic regression tests without perl being
available. So I concluded that for now just having the copies is the best
answer.

Times for running all tests under meson, on my workstation (20 cores / 40
threads):

cassert build -O2:

Before:
real 0m44.638s
user 7m58.780s
sys 2m48.773s

After:
real 0m38.938s
user 2m37.615s
sys 2m0.570s

cassert build -O0:

Before:
real 1m11.290s
user 13m9.817s
sys 2m54.946s

After:
real 1m2.959s
user 3m5.835s
sys 1m59.887s

non-cassert build:

Before:
real 0m34.579s
user 5m30.418s
sys 2m40.507s

After:
real 0m27.710s
user 2m20.644s
sys 1m55.770s

On CI this reduces the test times substantially:
Freebsd 8:51 -> 5:35
Debian w/ asan, autoconf 6:43 -> 4:55
Debian w/ alignmentsan, ubsan 4:02 -> 2:33
macos 5:07 -> 4:29
windows 10:21 -> 9:49

This is ignoring a bit of run-to-run variance, but the trend is obvious enough
that it's not worth worrying about that.

Greetings,

Andres Freund

[1]: /messages/by-id/20220120021859.3zpsfqn4z7ob7afz@alap3.anarazel.de

Attachments:

v3-0001-Use-template-initdb-in-tests.patch (text/x-diff; charset=us-ascii)
From 0fd431c277f01284a91999a04368de6b59b6691e Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 2 Feb 2023 21:51:53 -0800
Subject: [PATCH v3 1/2] Use "template" initdb in tests

Discussion: https://postgr.es/m/20220120021859.3zpsfqn4z7ob7afz@alap3.anarazel.de
---
 meson.build                              | 30 ++++++++++
 .cirrus.yml                              |  3 +-
 src/test/perl/PostgreSQL/Test/Cluster.pm | 46 ++++++++++++++-
 src/test/regress/pg_regress.c            | 74 ++++++++++++++++++------
 src/Makefile.global.in                   | 52 +++++++++--------
 5 files changed, 161 insertions(+), 44 deletions(-)

diff --git a/meson.build b/meson.build
index 04ea3488522..47429a18c3f 100644
--- a/meson.build
+++ b/meson.build
@@ -3056,8 +3056,10 @@ testport = 40000
 test_env = environment()
 
 temp_install_bindir = test_install_location / get_option('bindir')
+test_initdb_template = meson.build_root() / 'tmp_install' / 'initdb-template'
 test_env.set('PG_REGRESS', pg_regress.full_path())
 test_env.set('REGRESS_SHLIB', regress_module.full_path())
+test_env.set('INITDB_TEMPLATE', test_initdb_template)
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable PG_TEST_EXTRA.
@@ -3072,6 +3074,34 @@ if library_path_var != ''
 endif
 
 
+# Create (and remove old) initdb template directory. Tests use that, where
+# possible, to make it cheaper to run tests.
+#
+# Use python to remove the old cached initdb, as we cannot rely on a working
+# 'rm' binary on windows.
+test('initdb_cache',
+     python,
+     args: [
+       '-c', '''
+import shutil
+import sys
+import subprocess
+
+shutil.rmtree(sys.argv[1], ignore_errors=True)
+sp = subprocess.run(sys.argv[2:] + [sys.argv[1]])
+sys.exit(sp.returncode)
+''',
+       test_initdb_template,
+       temp_install_bindir / 'initdb',
+       '-A', 'trust', '-N', '--no-instructions'
+     ],
+     priority: setup_tests_priority - 1,
+     timeout: 300,
+     is_parallel: false,
+     env: test_env,
+     suite: ['setup'])
+
+
 
 ###############################################################
 # Test Generation
diff --git a/.cirrus.yml b/.cirrus.yml
index d260f15c4e2..8fce17dff08 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -115,8 +115,9 @@ task:
   test_minimal_script: |
     su postgres <<-EOF
       ulimit -c unlimited
+      meson test $MTEST_ARGS --suite setup
       meson test $MTEST_ARGS --num-processes ${TEST_JOBS} \
-        tmp_install cube/regress pg_ctl/001_start_stop
+        cube/regress pg_ctl/001_start_stop
     EOF
 
   on_failure:
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 5e161dbee60..4d449c35de9 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -522,8 +522,50 @@ sub init
 	mkdir $self->backup_dir;
 	mkdir $self->archive_dir;
 
-	PostgreSQL::Test::Utils::system_or_bail('initdb', '-D', $pgdata, '-A',
-		'trust', '-N', @{ $params{extra} });
+	# If available and if there aren't any parameters, use a previously
+	# initdb'd cluster as a template by copying it. For a lot of tests, that's
+	# substantially cheaper. Do so only if there aren't parameters, it doesn't
+	# seem worth figuring out whether they affect compatibility.
+	#
+	# There's very similar code in pg_regress.c, but we can't easily
+	# deduplicate it until we require perl at build time.
+	if (defined $params{extra} or !defined $ENV{INITDB_TEMPLATE})
+	{
+		note("initializing database system by running initdb");
+		PostgreSQL::Test::Utils::system_or_bail('initdb', '-D', $pgdata, '-A',
+			'trust', '-N', @{ $params{extra} });
+	}
+	else
+	{
+		my @copycmd;
+		my $expected_exitcode;
+
+		note("initializing database system by copying initdb template");
+
+		if ($PostgreSQL::Test::Utils::windows_os)
+		{
+			@copycmd = qw(robocopy /E /NJS /NJH /NFL /NDL /NP);
+			$expected_exitcode = 1;    # 1 denotes files were copied
+		}
+		else
+		{
+			@copycmd = qw(cp -a);
+			$expected_exitcode = 0;
+		}
+
+		@copycmd = (@copycmd, $ENV{INITDB_TEMPLATE}, $pgdata);
+
+		my $ret = PostgreSQL::Test::Utils::system_log(@copycmd);
+
+		# See http://perldoc.perl.org/perlvar.html#%24CHILD_ERROR
+		if ($ret & 127 or $ret >> 8 != $expected_exitcode)
+		{
+			BAIL_OUT(
+				sprintf("failed to execute command \"%s\": $ret",
+					join(" ", @copycmd)));
+		}
+	}
+
 	PostgreSQL::Test::Utils::system_or_bail($ENV{PG_REGRESS},
 		'--config-auth', $pgdata, @{ $params{auth_extra} });
 
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index b68632320a7..407e3915cec 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2295,6 +2295,7 @@ regression_main(int argc, char *argv[],
 		FILE	   *pg_conf;
 		const char *env_wait;
 		int			wait_seconds;
+		const char *initdb_template_dir;
 
 		/*
 		 * Prepare the temp instance
@@ -2316,25 +2317,64 @@ regression_main(int argc, char *argv[],
 		if (!directory_exists(buf))
 			make_directory(buf);
 
-		/* initdb */
 		initStringInfo(&cmd);
-		appendStringInfo(&cmd,
-						 "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync",
-						 bindir ? bindir : "",
-						 bindir ? "/" : "",
-						 temp_instance);
-		if (debug)
-			appendStringInfo(&cmd, " --debug");
-		if (nolocale)
-			appendStringInfo(&cmd, " --no-locale");
-		appendStringInfo(&cmd, " > \"%s/log/initdb.log\" 2>&1", outputdir);
-		fflush(NULL);
-		if (system(cmd.data))
+
+		/*
+		 * Create data directory.
+		 *
+		 * If available, use a previously initdb'd cluster as a template by
+		 * copying it. For a lot of tests, that's substantially cheaper.
+		 *
+		 * There's very similar code in Cluster.pm, but we can't easily de-
+		 * duplicate it until we require perl at build time.
+		 */
+		initdb_template_dir = getenv("INITDB_TEMPLATE");
+		if (initdb_template_dir == NULL || nolocale || debug)
 		{
-			bail("initdb failed\n"
-				 "# Examine \"%s/log/initdb.log\" for the reason.\n"
-				 "# Command was: %s",
-				 outputdir, cmd.data);
+			note("initializing database system by running initdb");
+
+			appendStringInfo(&cmd,
+							 "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync",
+							 bindir ? bindir : "",
+							 bindir ? "/" : "",
+							 temp_instance);
+			if (debug)
+				appendStringInfo(&cmd, " --debug");
+			if (nolocale)
+				appendStringInfo(&cmd, " --no-locale");
+			appendStringInfo(&cmd, " > \"%s/log/initdb.log\" 2>&1", outputdir);
+			fflush(NULL);
+			if (system(cmd.data))
+			{
+				bail("initdb failed\n"
+					 "# Examine \"%s/log/initdb.log\" for the reason.\n"
+					 "# Command was: %s",
+					 outputdir, cmd.data);
+			}
+		}
+		else
+		{
+#ifndef WIN32
+			const char *copycmd = "cp -a \"%s\" \"%s/data\"";
+			int			expected_exitcode = 0;
+#else
+			const char *copycmd = "robocopy /E /NJS /NJH /NFL /NDL /NP \"%s\" \"%s/data\"";
+			int			expected_exitcode = 1;	/* 1 denotes files were copied */
+#endif
+
+			note("initializing database system by copying initdb template");
+
+			appendStringInfo(&cmd,
+							 copycmd,
+							 initdb_template_dir,
+							 temp_instance);
+			if (system(cmd.data) != expected_exitcode)
+			{
+				bail("copying of initdb template failed\n"
+					 "# Examine \"%s/log/initdb.log\" for the reason.\n"
+					 "# Command was: %s",
+					 outputdir, cmd.data);
+			}
 		}
 
 		pfree(cmd.data);
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index df9f721a41a..0b4ca0eb6ae 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -397,30 +397,6 @@ check: temp-install
 
 .PHONY: temp-install
 
-temp-install: | submake-generated-headers
-ifndef NO_TEMP_INSTALL
-ifneq ($(abs_top_builddir),)
-ifeq ($(MAKELEVEL),0)
-	rm -rf '$(abs_top_builddir)'/tmp_install
-	$(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
-	$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
-	$(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
-endif
-endif
-endif
-
-# Tasks to run serially at the end of temp-install.  Some EXTRA_INSTALL
-# entries appear more than once in the tree, and parallel installs of the same
-# file can fail with EEXIST.
-checkprep:
-	$(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install || exit; done)
-
-PROVE = @PROVE@
-# There are common routines in src/test/perl, and some test suites have
-# extra perl modules in their own directory.
-PG_PROVE_FLAGS = -I $(top_srcdir)/src/test/perl/ -I $(srcdir)
-# User-supplied prove flags such as --verbose can be provided in PROVE_FLAGS.
-PROVE_FLAGS =
 
 # prepend to path if already set, else just set it
 define add_to_path
@@ -437,8 +413,36 @@ ld_library_path_var = LD_LIBRARY_PATH
 with_temp_install = \
 	PATH="$(abs_top_builddir)/tmp_install$(bindir):$(CURDIR):$$PATH" \
 	$(call add_to_path,$(strip $(ld_library_path_var)),$(abs_top_builddir)/tmp_install$(libdir)) \
+	INITDB_TEMPLATE='$(abs_top_builddir)'/tmp_install/initdb-template \
 	$(with_temp_install_extra)
 
+temp-install: | submake-generated-headers
+ifndef NO_TEMP_INSTALL
+ifneq ($(abs_top_builddir),)
+ifeq ($(MAKELEVEL),0)
+	rm -rf '$(abs_top_builddir)'/tmp_install
+	$(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
+	$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
+	$(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
+
+	$(with_temp_install) initdb -A trust -N --no-instructions '$(abs_top_builddir)'/tmp_install/initdb-template >>'$(abs_top_builddir)'/tmp_install/log/initdb-template.log 2>&1
+endif
+endif
+endif
+
+# Tasks to run serially at the end of temp-install.  Some EXTRA_INSTALL
+# entries appear more than once in the tree, and parallel installs of the same
+# file can fail with EEXIST.
+checkprep:
+	$(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install || exit; done)
+
+PROVE = @PROVE@
+# There are common routines in src/test/perl, and some test suites have
+# extra perl modules in their own directory.
+PG_PROVE_FLAGS = -I $(top_srcdir)/src/test/perl/ -I $(srcdir)
+# User-supplied prove flags such as --verbose can be provided in PROVE_FLAGS.
+PROVE_FLAGS =
+
 ifeq ($(enable_tap_tests),yes)
 
 ifndef PGXS
-- 
2.38.0

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#18)
Re: initdb caching during tests

Andres Freund <andres@anarazel.de> writes:

Times for running all tests under meson, on my workstation (20 cores / 40
threads):

cassert build -O2:

Before:
real 0m44.638s
user 7m58.780s
sys 2m48.773s

After:
real 0m38.938s
user 2m37.615s
sys 2m0.570s

Impressive results. Even though your bottom-line time doesn't change that
much, the big reduction in CPU time should translate to a nice speedup
on slower buildfarm animals.

(Disclaimer: I've not read the patch.)

regards, tom lane

#20Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#19)
Re: initdb caching during tests

Hi,

On 2023-08-05 16:58:38 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Times for running all tests under meson, on my workstation (20 cores / 40
threads):

cassert build -O2:

Before:
real 0m44.638s
user 7m58.780s
sys 2m48.773s

After:
real 0m38.938s
user 2m37.615s
sys 2m0.570s

Impressive results. Even though your bottom-line time doesn't change that
much

Unfortunately we have a few tests that take quite a while - for those the
initdb removal doesn't make that much of a difference. Particularly because
this machine has enough CPUs to not be fully busy except for the first few
seconds...

E.g. for a run with the patch applied:

258/265 postgresql:pg_basebackup / pg_basebackup/010_pg_basebackup OK 16.58s 187 subtests passed
259/265 postgresql:subscription / subscription/100_bugs OK 6.69s 12 subtests passed
260/265 postgresql:regress / regress/regress OK 24.95s 215 subtests passed
261/265 postgresql:ssl / ssl/001_ssltests OK 7.97s 205 subtests passed
262/265 postgresql:pg_dump / pg_dump/002_pg_dump OK 19.65s 11262 subtests passed
263/265 postgresql:recovery / recovery/027_stream_regress OK 29.34s 6 subtests passed
264/265 postgresql:isolation / isolation/isolation OK 33.94s 112 subtests passed
265/265 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK 38.22s 18 subtests passed

The pg_upgrade test is faster in isolation (29s), but not that much. The
overall runtime is reduced due to the less "competing" CPU usage, but other
than that...

Looking at where the time is spent when running the pg_upgrade test on its own:

grep -E '^\[' testrun/pg_upgrade/002_pg_upgrade/log/regress_log_002_pg_upgrade |sed -E -e 's/.*\(([0-9.]+)s\)(.*)/\1 \2/g'|sort -n -r

cassert:
13.094 ok 5 - regression tests pass
6.147 ok 14 - run of pg_upgrade for new instance
2.340 ok 6 - dump before running pg_upgrade
1.638 ok 17 - dump after running pg_upgrade
1.375 ok 12 - run of pg_upgrade --check for new instance
0.798 ok 1 - check locales in original cluster
0.371 ok 9 - invalid database causes failure status (got 1 vs expected 1)
0.149 ok 7 - run of pg_upgrade --check for new instance with incorrect binary path
0.131 ok 16 - check that locales in new cluster match original cluster

optimized:
8.372 ok 5 - regression tests pass
3.641 ok 14 - run of pg_upgrade for new instance
1.371 ok 12 - run of pg_upgrade --check for new instance
1.104 ok 6 - dump before running pg_upgrade
0.636 ok 17 - dump after running pg_upgrade
0.594 ok 1 - check locales in original cluster
0.359 ok 9 - invalid database causes failure status (got 1 vs expected 1)
0.148 ok 7 - run of pg_upgrade --check for new instance with incorrect binary path
0.127 ok 16 - check that locales in new cluster match original cluster

The time for "dump before running pg_upgrade" is misleadingly high - there's
no output between starting initdb and the dump, so the timing includes initdb
and a bunch of other work. But it's still not fast (1.637s) after.

A small factor is that the initdb times are not insignificant, because the
template initdb can't be used due to a bunch of parameters passed to initdb :)

the big reduction in CPU time should translate to a nice speedup on slower
buildfarm animals.

Yea. It's a particularly large win when using valgrind. Under valgrind, a very
large portion of the time for many tests is just spent doing initdb... So I am
hoping to see some nice gains for skink.

Greetings,

Andres Freund

#21Daniel Gustafsson
daniel@yesql.se
In reply to: Andres Freund (#18)
Re: initdb caching during tests

On 5 Aug 2023, at 21:56, Andres Freund <andres@anarazel.de> wrote:

We have some issues with CI on macos and windows being too expensive (more on
that soon in a separate email), which reminded me of this thread (with
original title: [1])

I've attached a somewhat cleaned up version of the patch to cache initdb
across runs. The results are still fairly impressive in my opinion.

One thing I do not like, but don't have a good idea for how to improve, is
that there's a bunch of duplicated logic in pg_regress.c and Cluster.pm. I've
tried to move that into initdb.c itself, but that ends up pretty ugly, because
we need to be a lot more careful about checking whether options are compatible
etc. I've also thought about just putting this into a separate perl script,
but right now we still allow basic regression tests without perl being
available. So I concluded that for now just having the copies is the best
answer.

I had a look at this today and have been running a lot of tests with it without
finding anything that breaks. The duplicated code is unfortunate, but after
playing around with some options I agree that it's likely the best option.

While looking I did venture down the rabbithole of making it support extra
params as well, but I don't think moving the goalposts there is doing us any
favors, it's clearly chasing diminishing returns.

My only small gripe is that I keep thinking about template databases for CREATE
DATABASE when reading the error messages in this patch, which is clearly not
related to what this does.

+ note("initializing database system by copying initdb template");

I personally would've used cache instead of template in the user facing parts
to keep concepts separated, but that's personal taste.

All in all, I think this is committable as is.

--
Daniel Gustafsson

#22Andres Freund
andres@anarazel.de
In reply to: Daniel Gustafsson (#21)
Re: initdb caching during tests

Hi,

On 2023-08-22 23:47:24 +0200, Daniel Gustafsson wrote:

I had a look at this today and have been running a lot of tests with it without
finding anything that breaks.

Thanks!

The duplicated code is unfortunate, but after playing around with some
options I agree that it's likely the best option.

Good and bad to hear :)

While looking I did venture down the rabbithole of making it support extra
params as well, but I don't think moving the goalposts there is doing us any
favors, it's clearly chasing diminishing returns.

Agreed. I also went down that rabbithole, but it quickly gets a lot more code
and complexity - and there just aren't that many tests using non-default
options.

My only small gripe is that I keep thinking about template databases for CREATE
DATABASE when reading the error messages in this patch, which is clearly not
related to what this does.

+ note("initializing database system by copying initdb template");

I personally would've used cache instead of template in the user facing parts
to keep concepts separated, but that's personal taste.

I am going back and forth on that one (as one can notice with $subject). It
doesn't quite seem like a cache, as it's not "created" on demand and only
usable when the exactly same parameters are used repeatedly. But template is
overloaded as you say...

All in all, I think this is committable as is.

Cool. Planning to do that tomorrow. We can easily extend / adjust this later,
it just affects testing infrastructure.

Greetings,

Andres Freund

#23Daniel Gustafsson
daniel@yesql.se
In reply to: Andres Freund (#22)
Re: initdb caching during tests

On 23 Aug 2023, at 03:17, Andres Freund <andres@anarazel.de> wrote:
On 2023-08-22 23:47:24 +0200, Daniel Gustafsson wrote:

My only small gripe is that I keep thinking about template databases for CREATE
DATABASE when reading the error messages in this patch, which is clearly not
related to what this does.

+ note("initializing database system by copying initdb template");

I personally would've used cache instead of template in the user facing parts
to keep concepts separated, but that's personal taste.

I am going back and forth on that one (as one can notice with $subject). It
doesn't quite seem like a cache, as it's not "created" on demand and only
usable when the exactly same parameters are used repeatedly. But template is
overloaded as you say...

That's a fair point, cache is not a good word to describe a stored copy of
something prefabricated. Let's go with template, we can always refine in-tree
if a better wording comes along.

--
Daniel Gustafsson

#24Andres Freund
andres@anarazel.de
In reply to: Daniel Gustafsson (#23)
Re: initdb caching during tests

Hi,

On 2023-08-23 10:10:31 +0200, Daniel Gustafsson wrote:

On 23 Aug 2023, at 03:17, Andres Freund <andres@anarazel.de> wrote:
On 2023-08-22 23:47:24 +0200, Daniel Gustafsson wrote:

My only small gripe is that I keep thinking about template databases for CREATE
DATABASE when reading the error messages in this patch, which is clearly not
related to what this does.

+ note("initializing database system by copying initdb template");

I personally would've used cache instead of template in the user facing parts
to keep concepts separated, but that's personal taste.

I am going back and forth on that one (as one can notice with $subject). It
doesn't quite seem like a cache, as it's not "created" on demand and only
usable when the exactly same parameters are used repeatedly. But template is
overloaded as you say...

That's a fair point, cache is not a good word to describe a stored copy of
something prefabricated. Let's go with template, we can always refine in-tree
if a better wording comes along.

Cool. Pushed that way. The only change I made was to redirect the output of cp
(and/or robocopy) in pg_regress, similar to how it was done for initdb
proper.

Let's see what the buildfarm says - it's not inconceivable that it'll show
some issues.

Greetings,

Andres Freund

#25Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#24)
1 attachment(s)
Re: initdb caching during tests

On Fri, Aug 25, 2023 at 10:10 AM Andres Freund <andres@anarazel.de> wrote:

Let's see what the buildfarm says - it's not inconceivable that it'll show
some issues.

Apparently Solaris doesn't like "cp -a", per animal "margay". I think
"cp -RPp" should be enough everywhere?

https://docs.oracle.com/cd/E88353_01/html/E37839/cp-1.html
https://pubs.opengroup.org/onlinepubs/9699919799.2013edition/utilities/cp.html
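
For reference, -R (recurse), -P (copy symlinks as symlinks) and -p (preserve
mode/ownership/timestamps) are all specified by POSIX, while -a is a GNU/BSD
shorthand for roughly that plus extras. A throwaway demo with made-up /tmp
paths, showing that -RPp keeps the symlink behavior the data-directory copy
relies on:

```shell
# Build a tiny source tree containing a regular file and a symlink.
rm -rf /tmp/cpdemo && mkdir -p /tmp/cpdemo/src
echo template > /tmp/cpdemo/src/file
ln -s file /tmp/cpdemo/src/link   # must be copied as a symlink, not followed

# POSIX-portable replacement for "cp -a src dst":
cp -RPp /tmp/cpdemo/src /tmp/cpdemo/dst

# The symlink survives as a symlink.
test -L /tmp/cpdemo/dst/link
```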

Attachments:

0001-Avoid-non-POSIX-cp-flags.patchtext/x-patch; charset=US-ASCII; name=0001-Avoid-non-POSIX-cp-flags.patchDownload
From fd6c558e6bd43eef40d633ace763d8a2088c2509 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 25 Aug 2023 17:30:48 +1200
Subject: [PATCH] Avoid non-POSIX cp flags.

Commit 252dcb32 introduced cp -a, but apparently Solaris doesn't like
it.  Use cp -RPp instead.

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 426f94ff09..227c34ab4d 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -549,7 +549,7 @@ sub init
 		}
 		else
 		{
-			@copycmd = qw(cp -a);
+			@copycmd = qw(cp -RPp);
 			$expected_exitcode = 0;
 		}
 
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 06674141a3..ec67588cf5 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2355,7 +2355,7 @@ regression_main(int argc, char *argv[],
 		else
 		{
 #ifndef WIN32
-			const char *copycmd = "cp -a \"%s\" \"%s/data\"";
+			const char *copycmd = "cp -RPp \"%s\" \"%s/data\"";
 			int			expected_exitcode = 0;
 #else
 			const char *copycmd = "robocopy /E /NJS /NJH /NFL /NDL /NP \"%s\" \"%s/data\"";
-- 
2.39.2

#26Daniel Gustafsson
daniel@yesql.se
In reply to: Thomas Munro (#25)
Re: initdb caching during tests

On 25 Aug 2023, at 07:50, Thomas Munro <thomas.munro@gmail.com> wrote:

On Fri, Aug 25, 2023 at 10:10 AM Andres Freund <andres@anarazel.de> wrote:

Let's see what the buildfarm says - it's not inconceivable that it'll show
some issues.

Apparently Solaris doesn't like "cp -a", per animal "margay". I think
"cp -RPp" should be enough everywhere?

Agreed, AFAICT that should work equally well on all supported platforms.

--
Daniel Gustafsson

#27Andres Freund
andres@anarazel.de
In reply to: Daniel Gustafsson (#26)
Re: initdb caching during tests

Hi,

On 2023-08-25 09:00:24 +0200, Daniel Gustafsson wrote:

On 25 Aug 2023, at 07:50, Thomas Munro <thomas.munro@gmail.com> wrote:

On Fri, Aug 25, 2023 at 10:10 AM Andres Freund <andres@anarazel.de> wrote:

Let's see what the buildfarm says - it's not inconceivable that it'll show
some issues.

Apparently Solaris doesn't like "cp -a", per animal "margay". I think
"cp -RPp" should be enough everywhere?

Thanks for noticing the issue and submitting the patch.

Agreed, AFAICT that should work equally well on all supported platforms.

Also agreed. Unsurprisingly, CI didn't find anything on the tested platforms.

Pushed.

Greetings,

Andres Freund

#28Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#24)
Re: initdb caching during tests

On Thu, Aug 24, 2023 at 03:10:00PM -0700, Andres Freund wrote:

Cool. Pushed that way.

I just noticed the tests running about 30% faster on my machine due to
this. Thanks!

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#29Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Andres Freund (#24)
1 attachment(s)
Re: initdb caching during tests

On Fri, 25 Aug 2023 at 00:16, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2023-08-23 10:10:31 +0200, Daniel Gustafsson wrote:

On 23 Aug 2023, at 03:17, Andres Freund <andres@anarazel.de> wrote:
On 2023-08-22 23:47:24 +0200, Daniel Gustafsson wrote:

My only small gripe is that I keep thinking about template databases for CREATE
DATABASE when reading the error messages in this patch, which is clearly not
related to what this does.

+ note("initializing database system by copying initdb template");

I personally would've used cache instead of template in the user facing parts
to keep concepts separated, but that's personal taste.

I am going back and forth on that one (as one can notice with $subject). It
doesn't quite seem like a cache, as it's not "created" on demand and only
usable when the exactly same parameters are used repeatedly. But template is
overloaded as you say...

That's a fair point, cache is not a good word to describe a stored copy of
something prefabricated. Let's go with template, we can always refine in-tree
if a better wording comes along.

Cool. Pushed that way. Only change I made is to redirect the output of cp
(and/or robocopy) in pg_regress, similar to how that was done for initdb
proper.

While working on some things that are prone to breaking initdb, I
noticed that this template isn't generated with --no-clean, even though
pg_regress does pass it. This meant `make check` didn't leave any
meaningful debuggable output behind when I broke initdb,
which is undesirable.

Attached a patch that fixes this for both make and meson, by adding
--no-clean to the initdb template.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Attachments:

v1-0001-Don-t-remove-initdb-template-when-initdb-fails.patchapplication/octet-stream; name=v1-0001-Don-t-remove-initdb-template-when-initdb-fails.patchDownload
From bd727a7cd560140a1646e3afaba237dfb6a92dc4 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Thu, 7 Dec 2023 14:47:32 +0100
Subject: [PATCH v1] Don't remove initdb template when initdb fails

pg_regress doesn't do that either, so keep a copy around
to allow us to debug those issues.
---
 meson.build            | 2 +-
 src/Makefile.global.in | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index 0095fb183a..5bae1eb04c 100644
--- a/meson.build
+++ b/meson.build
@@ -3112,7 +3112,7 @@ sys.exit(sp.returncode)
 ''',
        test_initdb_template,
        temp_install_bindir / 'initdb',
-       '-A', 'trust', '-N', '--no-instructions', '--no-locale'
+       '-A', 'trust', '-N', '--no-instructions', '--no-locale', '--no-clean'
      ],
      priority: setup_tests_priority - 1,
      timeout: 300,
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index b3ca6392a6..06588b1c47 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -423,7 +423,7 @@ ifeq ($(MAKELEVEL),0)
 	$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
 	$(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
 
-	$(with_temp_install) initdb -A trust -N --no-instructions --no-locale '$(abs_top_builddir)'/tmp_install/initdb-template >>'$(abs_top_builddir)'/tmp_install/log/initdb-template.log 2>&1
+	$(with_temp_install) initdb -A trust -N --no-instructions --no-locale --no-clean '$(abs_top_builddir)'/tmp_install/initdb-template >>'$(abs_top_builddir)'/tmp_install/log/initdb-template.log 2>&1
 endif
 endif
 endif
-- 
2.40.1

#30Daniel Gustafsson
daniel@yesql.se
In reply to: Matthias van de Meent (#29)
Re: initdb caching during tests

On 7 Dec 2023, at 14:50, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:

Attached a patch that fixes this for both make and meson, by adding
--no-clean to the initdb template.

Makes sense. While in there I think we should rename -N to the long option
--no-sync to make it easier to grep for and to make the buildfiles more
self-documenting.

--
Daniel Gustafsson

#31Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Daniel Gustafsson (#30)
1 attachment(s)
Re: initdb caching during tests

On Thu, 7 Dec 2023 at 15:06, Daniel Gustafsson <daniel@yesql.se> wrote:

On 7 Dec 2023, at 14:50, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:

Attached a patch that fixes this for both make and meson, by adding
--no-clean to the initdb template.

Makes sense. While in there I think we should rename -N to the long option
--no-sync to make it easier to grep for and make the buildfiles more
self-documenting.

Then that'd be the attached patch, which also includes --auth instead
of -A, for the same reason as -N vs --no-sync.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Attachments:

v2-0001-Don-t-remove-initdb-template-when-initdb-fails.patchapplication/octet-stream; name=v2-0001-Don-t-remove-initdb-template-when-initdb-fails.patchDownload
From a77293d22bc58fe23bdc9bdec64ac9b2f153718b Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Thu, 7 Dec 2023 14:47:32 +0100
Subject: [PATCH v2] Don't remove initdb template when initdb fails

pg_regress doesn't do that either, so keep a copy around
to allow us to debug those issues.

While we're here, use the long options for initdb, to
make this code more self-documenting.

Reviewed-by: Daniel Gustafsson
---
 meson.build            | 3 ++-
 src/Makefile.global.in | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index 0095fb183a..502f511588 100644
--- a/meson.build
+++ b/meson.build
@@ -3112,7 +3112,8 @@ sys.exit(sp.returncode)
 ''',
        test_initdb_template,
        temp_install_bindir / 'initdb',
-       '-A', 'trust', '-N', '--no-instructions', '--no-locale'
+       '--auth', 'trust', '--no-sync', '--no-instructions', '--no-locale',
+       '--no-clean'
      ],
      priority: setup_tests_priority - 1,
      timeout: 300,
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index b3ca6392a6..104e5de0fe 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -423,7 +423,7 @@ ifeq ($(MAKELEVEL),0)
 	$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
 	$(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
 
-	$(with_temp_install) initdb -A trust -N --no-instructions --no-locale '$(abs_top_builddir)'/tmp_install/initdb-template >>'$(abs_top_builddir)'/tmp_install/log/initdb-template.log 2>&1
+	$(with_temp_install) initdb --auth trust --no-sync --no-instructions --no-locale --no-clean '$(abs_top_builddir)'/tmp_install/initdb-template >>'$(abs_top_builddir)'/tmp_install/log/initdb-template.log 2>&1
 endif
 endif
 endif
-- 
2.40.1

#32Daniel Gustafsson
daniel@yesql.se
In reply to: Matthias van de Meent (#31)
Re: initdb caching during tests

On 7 Dec 2023, at 15:27, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:

Then that'd be the attached patch, which also includes --auth instead
of -A, for the same reason as -N vs --no-sync

Applied to master, thanks!

--
Daniel Gustafsson