Minor improvements to test log navigability

Started by Thomas Munroalmost 4 years ago7 messageshackers
Jump to latest
#1Thomas Munro
thomas.munro@gmail.com

Hi,

Speaking as someone who regularly trawls through megabytes of build farm output:

1. It seems a bit useless to have a load of "FATAL: the database
system is in recovery mode" spam whenever the server crashes under
src/test/regress. Any reason not to just turn that off, as we do for
the TAP tests?

2. The TAP test logs are strangely named. Any reason not to call
them 001_testname.log, instead of regress_log_001_testname, so they
appear next to the corresponding
001_testname_{primary,standby,xxx}.log in directory listings (CI) and
dumps (build farm, presumably), and have a traditional .log suffix?

Attachments:

0001-Use-restart_after_crash-off-for-regression-tests.patchtext/x-patch; charset=US-ASCII; name=0001-Use-restart_after_crash-off-for-regression-tests.patchDownload+1-1
0002-Rename-TAP-test-log-output-files.patchtext/x-patch; charset=US-ASCII; name=0002-Rename-TAP-test-log-output-files.patchDownload+2-4
#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Thomas Munro (#1)
Re: Minor improvements to test log navigability

On 2022-May-16, Thomas Munro wrote:

1. It seems a bit useless to have a load of "FATAL: the database
system is in recovery mode" spam whenever the server crashes under
src/test/regress. Any reason not to just turn that off, as we do for
the TAP tests?

I don't know of any. Let's.

2. The TAP test logs are strangely named. Any reason not to call
them 001_testname.log, instead of regress_log_001_testname, so they
appear next to the corresponding
001_testname_{primary,standby,xxx}.log in directory listings (CI) and
dumps (build farm, presumably), and have a traditional .log suffix?

+1.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Voy a acabar con todos los humanos / con los humanos yo acabaré
voy a acabar con todos (bis) / con todos los humanos acabaré ¡acabaré! (Bender)

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#2)
Re: Minor improvements to test log navigability

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2022-May-16, Thomas Munro wrote:

1. It seems a bit useless to have a load of "FATAL: the database
system is in recovery mode" spam whenever the server crashes under
src/test/regress. Any reason not to just turn that off, as we do for
the TAP tests?

I don't know of any. Let's.

Have you actually tested what happens? I fear this would just
result in different spam.

2. The TAP test logs are strangely named. Any reason not to call
them 001_testname.log, instead of regress_log_001_testname, so they
appear next to the corresponding
001_testname_{primary,standby,xxx}.log in directory listings (CI) and
dumps (build farm, presumably), and have a traditional .log suffix?

+1.

Andrew would have to weigh in on whether this'd break the buildfarm,
but if it doesn't, +1.

regards, tom lane

#4Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#3)
Re: Minor improvements to test log navigability

On Mon, May 16, 2022 at 12:18:24PM -0400, Tom Lane wrote:

Andrew would have to weigh in on whether this'd break the buildfarm,
but if it doesn't, +1.

FWIW, the buildfarm client feeds from the contents of the directory
tmp_check/log/, so a simple renaming of the main log file would have
no impact. Here are the parts of the code doing that:
PGBuild/Modules/TestMyTap.pm: my @logs = glob("$self->{where}/tmp_check/log/*");
run_build.pl: my @logs = glob("$dir/tmp_check/log/*");
--
Michael

#5Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#3)
Re: Minor improvements to test log navigability

On Tue, May 17, 2022 at 4:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2022-May-16, Thomas Munro wrote:

1. It seems a bit useless to have a load of "FATAL: the database
system is in recovery mode" spam whenever the server crashes under
src/test/regress. Any reason not to just turn that off, as we do for
the TAP tests?

I don't know of any. Let's.

Have you actually tested what happens? I fear this would just
result in different spam.

I'd forgotten that we already do this on CI, via
src/tools/ci/pg_ci_base.conf, so we can compare. A CI postmaster.log
that ends with "shutting down because restart_after_crash is off":

https://api.cirrus-ci.com/v1/artifact/task/6537277877780480/log/src/test/regress/log/postmaster.log

The build farm version has ~350 lines of "FATAL: the database system
is in recovery mode" instead:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=seawasp&amp;dt=2022-05-16%2023%3A17%3A49

Admittedly that is nothing compared to the huge amount of extra log
spam caused by regression.diffs filling up with these:

- ...the complete expected output of each test spanning many lines...
- ...
- ...
+psql: error: connection to server on socket
"/tmp/pg_regress-ZqXocK/.s.PGSQL.5678" failed: FATAL:  the database
system is in recovery mode

In the CI version, that looks like:

- ...the complete expected output of each test spanning many lines...
- ...
- ...
+psql: error: connection to server on socket
"/tmp/pg_regress-T35Yzi/.s.PGSQL.51696" failed: No such file or
directory
+    Is the server running locally and accepting connections on that socket?

I wonder if there would be a good way to filter those "never managed
to connect" cases out... Exit code 2 (EXIT_BADCONN) is not the
answer, because you get that also for servers that go away due to a
crash where you do want to be able to see the diff, for information
about where it crashed.

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#5)
Re: Minor improvements to test log navigability

Thomas Munro <thomas.munro@gmail.com> writes:

... Admittedly that is nothing compared to the huge amount of extra log
spam caused by regression.diffs filling up with these:

Yeah, that's really the main problem.

I wonder if there would be a good way to filter those "never managed
to connect" cases out... Exit code 2 (EXIT_BADCONN) is not the
answer, because you get that also for servers that go away due to a
crash where you do want to be able to see the diff, for information
about where it crashed.

Maybe pg_regress could check that postmaster.pid is still there
before launching each new test script? (Obviously this all applies
only to "make check" not "make installcheck".)

regards, tom lane

#7Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#1)
Re: Minor improvements to test log navigability

On 16.05.22 01:01, Thomas Munro wrote:

2. The TAP test logs are strangely named. Any reason not to call
them 001_testname.log, instead of regress_log_001_testname, so they
appear next to the corresponding
001_testname_{primary,standby,xxx}.log in directory listings (CI) and
dumps (build farm, presumably), and have a traditional .log suffix?

I'm in favor of a saner name, but wouldn't something.log be confusable
with a server log? Maybe something.out would be clearer. Or
something_output.log, if the .log suffix is somehow desirable.