Minor improvements to test log navigability

Started by Thomas Munroover 3 years ago7 messages
#1Thomas Munro
thomas.munro@gmail.com
2 attachment(s)

Hi,

Speaking as someone who regularly trawls through megabytes of build farm output:

1. It seems a bit useless to have a load of "FATAL: the database
system is in recovery mode" spam whenever the server crashes under
src/test/regress. Any reason not to just turn that off, as we do for
the TAP tests?

2. The TAP test logs are strangely named. Any reason not to call
them 001_testname.log, instead of regress_log_001_testname, so they
appear next to the corresponding
001_testname_{primary,standby,xxx}.log in directory listings (CI) and
dumps (build farm, presumably), and have a traditional .log suffix?

Attachments:

0001-Use-restart_after_crash-off-for-regression-tests.patchtext/x-patch; charset=US-ASCII; name=0001-Use-restart_after_crash-off-for-regression-tests.patchDownload
From 2ddc124126ad0bb635023b805dbb6ad1b30fc863 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 16 May 2022 10:34:04 +1200
Subject: [PATCH 1/2] Use restart_after_crash=off for regression tests.

Continuing to try to connect while the server is not accepting
connections in crash recovery produces a lot of useless extra log
material.  Let's not do that.
---
 src/test/regress/pg_regress.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 982801e029..9a04007651 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2376,6 +2376,7 @@ regression_main(int argc, char *argv[],
 		snprintf(buf, sizeof(buf),
 				 "\"%s%spostgres\" -D \"%s/data\" -F%s "
 				 "-c \"listen_addresses=%s\" -k \"%s\" "
+				 "-c \"restart_after_crash=off\" "
 				 "> \"%s/log/postmaster.log\" 2>&1",
 				 bindir ? bindir : "",
 				 bindir ? "/" : "",
-- 
2.36.0

0002-Rename-TAP-test-log-output-files.patchtext/x-patch; charset=US-ASCII; name=0002-Rename-TAP-test-log-output-files.patchDownload
From aa24fa3b8b25bfc237ca007f9986fc2b8c9c37d7 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 16 May 2022 10:36:30 +1200
Subject: [PATCH 2/2] Rename TAP test log output files.

Instead of "regress_log_001_testname", use "001_testname.log".  This
will cluster them with the corresponding server logs in sorted directory
listings (CI) and dumps (build farm).
---
 .cirrus.yml                            | 1 -
 src/test/perl/PostgreSQL/Test/Utils.pm | 2 +-
 src/test/perl/README                   | 2 +-
 3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index f23d6cae55..e7232b7a7c 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -33,7 +33,6 @@ on_failure: &on_failure
     paths:
       - "**/*.log"
       - "**/*.diffs"
-      - "**/regress_log_*"
     type: text/plain
 
 task:
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 1ca2cc5917..8d7e20b0eb 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -201,7 +201,7 @@ INIT
 	# Open the test log file, whose name depends on the test name.
 	$test_logfile = basename($0);
 	$test_logfile =~ s/\.[^.]+$//;
-	$test_logfile = "$log_path/regress_log_$test_logfile";
+	$test_logfile = "$log_path/$test_logfile.log";
 	open my $testlog, '>', $test_logfile
 	  or die "could not open STDOUT to logfile \"$test_logfile\": $!";
 
diff --git a/src/test/perl/README b/src/test/perl/README
index 4b160cce36..1227944132 100644
--- a/src/test/perl/README
+++ b/src/test/perl/README
@@ -19,7 +19,7 @@ make check-world PROVE_FLAGS='--verbose'
 
 When a test fails, the terminal output from 'prove' is usually not sufficient
 to diagnose the problem.  Look into the log files that are left under
-tmp_check/log/ to get more info.  Files named 'regress_log_XXX' are log
+tmp_check/log/ to get more info.  Files named '001_testname.log' are log
 output from the perl test scripts themselves, and should be examined first.
 Other files are postmaster logs, and may be helpful as additional data.
 
-- 
2.36.0

#2Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Thomas Munro (#1)
Re: Minor improvements to test log navigability

On 2022-May-16, Thomas Munro wrote:

1. It seems a bit useless to have a load of "FATAL: the database
system is in recovery mode" spam whenever the server crashes under
src/test/regress. Any reason not to just turn that off, as we do for
the TAP tests?

I don't know of any. Let's.

2. The TAP test logs are strangely named. Any reason not to call
them 001_testname.log, instead of regress_log_001_testname, so they
appear next to the corresponding
001_testname_{primary,standby,xxx}.log in directory listings (CI) and
dumps (build farm, presumably), and have a traditional .log suffix?

+1.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Voy a acabar con todos los humanos / con los humanos yo acabaré
voy a acabar con todos (bis) / con todos los humanos acabaré ¡acabaré! (Bender)

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#2)
Re: Minor improvements to test log navigability

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2022-May-16, Thomas Munro wrote:

1. It seems a bit useless to have a load of "FATAL: the database
system is in recovery mode" spam whenever the server crashes under
src/test/regress. Any reason not to just turn that off, as we do for
the TAP tests?

I don't know of any. Let's.

Have you actually tested what happens? I fear this would just
result in different spam.

2. The TAP test logs are strangely named. Any reason not to call
them 001_testname.log, instead of regress_log_001_testname, so they
appear next to the corresponding
001_testname_{primary,standby,xxx}.log in directory listings (CI) and
dumps (build farm, presumably), and have a traditional .log suffix?

+1.

Andrew would have to weigh in on whether this'd break the buildfarm,
but if it doesn't, +1.

regards, tom lane

#4Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#3)
Re: Minor improvements to test log navigability

On Mon, May 16, 2022 at 12:18:24PM -0400, Tom Lane wrote:

Andrew would have to weigh in on whether this'd break the buildfarm,
but if it doesn't, +1.

FWIW, the buildfarm client feeds from the contents of the directory
tmp_check/log/, so a simple renaming of the main log file would have
no impact. Here are the parts of the code doing that:
PGBuild/Modules/TestMyTap.pm: my @logs = glob("$self->{where}/tmp_check/log/*");
run_build.pl: my @logs = glob("$dir/tmp_check/log/*");
--
Michael

#5Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#3)
Re: Minor improvements to test log navigability

On Tue, May 17, 2022 at 4:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2022-May-16, Thomas Munro wrote:

1. It seems a bit useless to have a load of "FATAL: the database
system is in recovery mode" spam whenever the server crashes under
src/test/regress. Any reason not to just turn that off, as we do for
the TAP tests?

I don't know of any. Let's.

Have you actually tested what happens? I fear this would just
result in different spam.

I'd forgotten that we already do this on CI, via
src/tools/ci/pg_ci_base.conf, so we can compare. A CI postmaster.log
that ends with "shutting down because restart_after_crash is off":

https://api.cirrus-ci.com/v1/artifact/task/6537277877780480/log/src/test/regress/log/postmaster.log

The build farm version has ~350 lines of "FATAL: the database system
is in recovery mode" instead:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=seawasp&amp;dt=2022-05-16%2023%3A17%3A49

Admittedly that is nothing compared to the huge amount of extra log
spam caused by regression.diffs filling up with these:

- ...the complete expected output of each test spanning many lines...
- ...
- ...
+psql: error: connection to server on socket
"/tmp/pg_regress-ZqXocK/.s.PGSQL.5678" failed: FATAL:  the database
system is in recovery mode

In the CI version, that looks like:

- ...the complete expected output of each test spanning many lines...
- ...
- ...
+psql: error: connection to server on socket
"/tmp/pg_regress-T35Yzi/.s.PGSQL.51696" failed: No such file or
directory
+    Is the server running locally and accepting connections on that socket?

I wonder if there would be a good way to filter those "never managed
to connect" cases out... Exit code 2 (EXIT_BADCONN) is not the
answer, because you get that also for servers that go away due to a
crash where you do want to be able to see the diff, for information
about where it crashed.

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#5)
Re: Minor improvements to test log navigability

Thomas Munro <thomas.munro@gmail.com> writes:

... Admittedly that is nothing compared to the huge amount of extra log
spam caused by regression.diffs filling up with these:

Yeah, that's really the main problem.

I wonder if there would be a good way to filter those "never managed
to connect" cases out... Exit code 2 (EXIT_BADCONN) is not the
answer, because you get that also for servers that go away due to a
crash where you do want to be able to see the diff, for information
about where it crashed.

Maybe pg_regress could check that postmaster.pid is still there
before launching each new test script? (Obviously this all applies
only to "make check" not "make installcheck".)

regards, tom lane

#7Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Thomas Munro (#1)
Re: Minor improvements to test log navigability

On 16.05.22 01:01, Thomas Munro wrote:

2. The TAP test logs are strangely named. Any reason not to call
them 001_testname.log, instead of regress_log_001_testname, so they
appear next to the corresponding
001_testname_{primary,standby,xxx}.log in directory listings (CI) and
dumps (build farm, presumably), and have a traditional .log suffix?

I'm in favor of a saner name, but wouldn't something.log be confusable
with a server log? Maybe something.out would be clearer. Or
something_output.log, if the .log suffix is somehow desirable.