Minor improvements to test log navigability
Hi,
Speaking as someone who regularly trawls through megabytes of build farm output:
1. It seems a bit useless to have a load of "FATAL: the database
system is in recovery mode" spam whenever the server crashes under
src/test/regress. Any reason not to just turn that off, as we do for
the TAP tests?
2. The TAP test logs are strangely named. Any reason not to call
them 001_testname.log, instead of regress_log_001_testname, so they
appear next to the corresponding
001_testname_{primary,standby,xxx}.log in directory listings (CI) and
dumps (build farm, presumably), and have a traditional .log suffix?
Attachments:
0001-Use-restart_after_crash-off-for-regression-tests.patchtext/x-patch; charset=US-ASCII; name=0001-Use-restart_after_crash-off-for-regression-tests.patchDownload
From 2ddc124126ad0bb635023b805dbb6ad1b30fc863 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 16 May 2022 10:34:04 +1200
Subject: [PATCH 1/2] Use restart_after_crash=off for regression tests.
Continuing to try to connect while the server is not accepting
connections in crash recovery produces a lot of useless extra log
material. Let's not do that.
---
src/test/regress/pg_regress.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 982801e029..9a04007651 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2376,6 +2376,7 @@ regression_main(int argc, char *argv[],
snprintf(buf, sizeof(buf),
"\"%s%spostgres\" -D \"%s/data\" -F%s "
"-c \"listen_addresses=%s\" -k \"%s\" "
+ "-c \"restart_after_crash=off\" "
"> \"%s/log/postmaster.log\" 2>&1",
bindir ? bindir : "",
bindir ? "/" : "",
--
2.36.0
0002-Rename-TAP-test-log-output-files.patchtext/x-patch; charset=US-ASCII; name=0002-Rename-TAP-test-log-output-files.patchDownload
From aa24fa3b8b25bfc237ca007f9986fc2b8c9c37d7 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 16 May 2022 10:36:30 +1200
Subject: [PATCH 2/2] Rename TAP test log output files.
Instead of "regress_log_001_testname", use "001_testname.log". This
will cluster them with the corresponding server logs in sorted directory
listings (CI) and dumps (build farm).
---
.cirrus.yml | 1 -
src/test/perl/PostgreSQL/Test/Utils.pm | 2 +-
src/test/perl/README | 2 +-
3 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index f23d6cae55..e7232b7a7c 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -33,7 +33,6 @@ on_failure: &on_failure
paths:
- "**/*.log"
- "**/*.diffs"
- - "**/regress_log_*"
type: text/plain
task:
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 1ca2cc5917..8d7e20b0eb 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -201,7 +201,7 @@ INIT
# Open the test log file, whose name depends on the test name.
$test_logfile = basename($0);
$test_logfile =~ s/\.[^.]+$//;
- $test_logfile = "$log_path/regress_log_$test_logfile";
+ $test_logfile = "$log_path/$test_logfile.log";
open my $testlog, '>', $test_logfile
or die "could not open STDOUT to logfile \"$test_logfile\": $!";
diff --git a/src/test/perl/README b/src/test/perl/README
index 4b160cce36..1227944132 100644
--- a/src/test/perl/README
+++ b/src/test/perl/README
@@ -19,7 +19,7 @@ make check-world PROVE_FLAGS='--verbose'
When a test fails, the terminal output from 'prove' is usually not sufficient
to diagnose the problem. Look into the log files that are left under
-tmp_check/log/ to get more info. Files named 'regress_log_XXX' are log
+tmp_check/log/ to get more info. Files named '001_testname.log' are log
output from the perl test scripts themselves, and should be examined first.
Other files are postmaster logs, and may be helpful as additional data.
--
2.36.0
On 2022-May-16, Thomas Munro wrote:
1. It seems a bit useless to have a load of "FATAL: the database
system is in recovery mode" spam whenever the server crashes under
src/test/regress. Any reason not to just turn that off, as we do for
the TAP tests?
I don't know of any. Let's.
2. The TAP test logs are strangely named. Any reason not to call
them 001_testname.log, instead of regress_log_001_testname, so they
appear next to the corresponding
001_testname_{primary,standby,xxx}.log in directory listings (CI) and
dumps (build farm, presumably), and have a traditional .log suffix?
+1.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Voy a acabar con todos los humanos / con los humanos yo acabaré
voy a acabar con todos (bis) / con todos los humanos acabaré ¡acabaré! (Bender)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
On 2022-May-16, Thomas Munro wrote:
1. It seems a bit useless to have a load of "FATAL: the database
system is in recovery mode" spam whenever the server crashes under
src/test/regress. Any reason not to just turn that off, as we do for
the TAP tests?
I don't know of any. Let's.
Have you actually tested what happens? I fear this would just
result in different spam.
2. The TAP test logs are strangely named. Any reason not to call
them 001_testname.log, instead of regress_log_001_testname, so they
appear next to the corresponding
001_testname_{primary,standby,xxx}.log in directory listings (CI) and
dumps (build farm, presumably), and have a traditional .log suffix?
+1.
Andrew would have to weigh in on whether this'd break the buildfarm,
but if it doesn't, +1.
regards, tom lane
On Mon, May 16, 2022 at 12:18:24PM -0400, Tom Lane wrote:
Andrew would have to weigh in on whether this'd break the buildfarm,
but if it doesn't, +1.
FWIW, the buildfarm client feeds from the contents of the directory
tmp_check/log/, so a simple renaming of the main log file would have
no impact. Here are the parts of the code doing that:
PGBuild/Modules/TestMyTap.pm: my @logs = glob("$self->{where}/tmp_check/log/*");
run_build.pl: my @logs = glob("$dir/tmp_check/log/*");
--
Michael
On Tue, May 17, 2022 at 4:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
On 2022-May-16, Thomas Munro wrote:
1. It seems a bit useless to have a load of "FATAL: the database
system is in recovery mode" spam whenever the server crashes under
src/test/regress. Any reason not to just turn that off, as we do for
the TAP tests?I don't know of any. Let's.
Have you actually tested what happens? I fear this would just
result in different spam.
I'd forgotten that we already do this on CI, via
src/tools/ci/pg_ci_base.conf, so we can compare. A CI postmaster.log
that ends with "shutting down because restart_after_crash is off":
https://api.cirrus-ci.com/v1/artifact/task/6537277877780480/log/src/test/regress/log/postmaster.log
The build farm version has ~350 lines of "FATAL: the database system
is in recovery mode" instead:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=seawasp&dt=2022-05-16%2023%3A17%3A49
Admittedly that is nothing compared to the huge amount of extra log
spam caused by regression.diffs filling up with these:
- ...the complete expected output of each test spanning many lines...
- ...
- ...
+psql: error: connection to server on socket
"/tmp/pg_regress-ZqXocK/.s.PGSQL.5678" failed: FATAL: the database
system is in recovery mode
In the CI version, that looks like:
- ...the complete expected output of each test spanning many lines...
- ...
- ...
+psql: error: connection to server on socket
"/tmp/pg_regress-T35Yzi/.s.PGSQL.51696" failed: No such file or
directory
+ Is the server running locally and accepting connections on that socket?
I wonder if there would be a good way to filter those "never managed
to connect" cases out... Exit code 2 (EXIT_BADCONN) is not the
answer, because you get that also for servers that go away due to a
crash where you do want to be able to see the diff, for information
about where it crashed.
Thomas Munro <thomas.munro@gmail.com> writes:
... Admittedly that is nothing compared to the huge amount of extra log
spam caused by regression.diffs filling up with these:
Yeah, that's really the main problem.
I wonder if there would be a good way to filter those "never managed
to connect" cases out... Exit code 2 (EXIT_BADCONN) is not the
answer, because you get that also for servers that go away due to a
crash where you do want to be able to see the diff, for information
about where it crashed.
Maybe pg_regress could check that postmaster.pid is still there
before launching each new test script? (Obviously this all applies
only to "make check" not "make installcheck".)
regards, tom lane
On 16.05.22 01:01, Thomas Munro wrote:
2. The TAP test logs are strangely named. Any reason not to call
them 001_testname.log, instead of regress_log_001_testname, so they
appear next to the corresponding
001_testname_{primary,standby,xxx}.log in directory listings (CI) and
dumps (build farm, presumably), and have a traditional .log suffix?
I'm in favor of a saner name, but wouldn't something.log be confusable
with a server log? Maybe something.out would be clearer. Or
something_output.log, if the .log suffix is somehow desirable.