reduce size of logs stored by buildfarm

Started by Álvaro Herreraabout 2 months ago8 messages

alvherre@kurilemu.de

about 2 months ago

3 attachment(s)

Hi,

I just noticed that one regress log file for the pg_dump test is about
7MB long because it contains totally unnecessary dump files in the
regress log output. This is because the tests run pg_dump of schema
pg_catalog without redirecting the output anywhere, which means it goes
to stdout, and then `prove` helpfully puts it in the regress log file.
This is completely pointless. We can make the log file 1 MB instead by
adding a --file parameter to that command, as 0001 attached.

We also have a couple of tests that leave rather large server logs
behind that are perhaps not very useful. I would propose to run those
servers with log_statement=off and avoid some of the bloat. 0002 does
that.

Another possible approach, instead of individually tweaking t/*.pl files
to add 'log_statements=none', would be to set it that way in
PostgreSQL::Test::Cluster globally. Right now we have
log_statements=all there, but maybe that was not a great idea.

I also noticed that an oauth test file contains a couple MBs of a
gigantic string of just 'x'. I suppose that will compress well (with
gzip at least, the 2 MB file becomes 8 kB). Still, it's kinda
ridiculous and useless to bloat a 67 kB file to 2 MB that way. The
problem here is that we print 'sending JSON response' and the verbatim
response, included the 1 MB of "x" as '_pad_'. Can we supress this?
0003 does that by simply cutting the string short at 10k, which reduces
the size of the log from 2 MB to some 86 kB. Maybe there are better
ways to deal with this? Jacob?

With these three patches, we go from having 62 MB bytes in files with
"log" in their names under testrun/ (except those that have "catalog")
to 30 MB. I think that's not a bad reduction.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Are you not unsure you want to delete Firefox?
[Not unsure] [Not not unsure] [Cancel]
http://smylers.hates-software.com/2008/01/03/566e45b2.html

Attachments:

0001-pg_dump-tests-don-t-put-dumps-in-stdout.patchtext/x-diff; charset=utf-8Download

From ae1568e9bb21b9ee6a6b24ef967c2bd14022233b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Tue, 25 Nov 2025 13:42:27 +0100
Subject: [PATCH 1/3] pg_dump tests: don't put dumps in stdout

---
 src/bin/pg_dump/t/002_pg_dump.pl | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 445a541abf6..a05212719d9 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -5019,7 +5019,12 @@ $node->command_fails_like(
 # Test dumping pg_catalog (for research -- cannot be reloaded)
 
 $node->command_ok(
-	[ 'pg_dump', '--port' => $port, '--schema' => 'pg_catalog' ],
+	[
+		'pg_dump',
+		'--port' => $port,
+		'--schema' => 'pg_catalog',
+		'--file' => 'pgdump_pgcatalog.dmp'
+	],
 	'pg_dump: option -n pg_catalog');
 
 #########################################
@@ -5029,7 +5034,8 @@ $node->command_ok(
 	[
 		'pg_dumpall',
 		'--port' => $port,
-		'--exclude-database' => '"myhost.mydb"'
+		'--exclude-database' => '"myhost.mydb"',
+		'--file' => 'pgdumpall.dmp'
 	],
 	'pg_dumpall: option --exclude-database handles database names with embedded dots'
 );
-- 
2.47.3

0002-log_statement-off.patchtext/x-diff; charset=utf-8Download

From b10634f5f68e4a65708a0ce13fc0dd95a875bc4a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Tue, 25 Nov 2025 13:56:49 +0100
Subject: [PATCH 2/3] log_statement=off

---
 src/bin/pg_amcheck/t/003_check.pl            | 5 ++++-
 src/bin/pg_dump/t/002_pg_dump.pl             | 3 +++
 src/bin/pgbench/t/001_pgbench_with_server.pl | 4 ++++
 src/test/recovery/t/027_stream_regress.pl    | 1 +
 src/test/subscription/t/035_conflicts.pl     | 3 +++
 5 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_amcheck/t/003_check.pl b/src/bin/pg_amcheck/t/003_check.pl
index 881854da254..63de75adf4f 100644
--- a/src/bin/pg_amcheck/t/003_check.pl
+++ b/src/bin/pg_amcheck/t/003_check.pl
@@ -121,7 +121,10 @@ sub perform_all_corruptions()
 # Test set-up
 $node = PostgreSQL::Test::Cluster->new('test');
 $node->init(no_data_checksums => 1);
-$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->append_conf('postgresql.conf',
+	qq{autovacuum=off
+log_statement=none
+});
 $node->start;
 $port = $node->port;
 
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index a05212719d9..db3fe1aa068 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -4839,6 +4839,9 @@ my %tests = (
 
 my $node = PostgreSQL::Test::Cluster->new('main');
 $node->init;
+$node->append_conf(
+	'postgresql.conf',
+	qq{log_statement=none});
 $node->start;
 
 my $port = $node->port;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 581e9af7907..83b929642ec 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -42,6 +42,10 @@ my $node = PostgreSQL::Test::Cluster->new('main');
 # Set to untranslated messages, to be able to compare program output with
 # expected strings.
 $node->init(extra => [ '--locale', 'C' ]);
+$node->append_conf(
+	'postgresql.conf',
+	qq{log_statement=none});
+
 $node->start;
 
 # tablespace for testing, because partitioned tables cannot use pg_default
diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl
index 589c79d97d3..64b15f276e2 100644
--- a/src/test/recovery/t/027_stream_regress.pl
+++ b/src/test/recovery/t/027_stream_regress.pl
@@ -26,6 +26,7 @@ $node_primary->append_conf(
 	qq{shared_preload_libraries = 'pg_stat_statements'
 pg_stat_statements.max = 50000
 compute_query_id = 'regress'
+log_statement=none
 });
 
 # We'll stick with Cluster->new's small default shared_buffers, but since that
diff --git a/src/test/subscription/t/035_conflicts.pl b/src/test/subscription/t/035_conflicts.pl
index a94bde16f53..3ad6af2908e 100644
--- a/src/test/subscription/t/035_conflicts.pl
+++ b/src/test/subscription/t/035_conflicts.pl
@@ -14,6 +14,9 @@ use Test::More;
 # Create a publisher node
 my $node_publisher = PostgreSQL::Test::Cluster->new('publisher');
 $node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf(
+	'postgresql.conf',
+	qq{log_statements=none});
 $node_publisher->start;
 
 # Create a subscriber node
-- 
2.47.3

0003-oauth_server-don-t-log-huge-strings.patchtext/x-diff; charset=utf-8Download

From 14a835cdaa6b0f8bad368bbf3f75f1fde74edb05 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Tue, 25 Nov 2025 13:57:08 +0100
Subject: [PATCH 3/3] oauth_server: don't log huge strings

---
 src/test/modules/oauth_validator/t/oauth_server.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/test/modules/oauth_validator/t/oauth_server.py b/src/test/modules/oauth_validator/t/oauth_server.py
index 0f8836aadf3..a4821a8e8b2 100755
--- a/src/test/modules/oauth_validator/t/oauth_server.py
+++ b/src/test/modules/oauth_validator/t/oauth_server.py
@@ -263,7 +263,8 @@ class OAuthHandler(http.server.BaseHTTPRequestHandler):
         self._response_code can be modified to send JSON error responses.
         """
         resp = json.dumps(js).encode("ascii")
-        self.log_message("sending JSON response: %s", resp)
+        # avoid saving humongous strings to the log file
+        self.log_message("sending JSON response: %s", resp[0:10000])
 
         self.send_response(self._response_code)
         self.send_header("Content-Type", self._content_type)
-- 
2.47.3

Andres Freund

andres@anarazel.de

about 2 months ago

In reply to: Álvaro Herrera (#1)

Re: reduce size of logs stored by buildfarm

Hi,

On 2025-11-25 14:30:33 +0100, ï¿½lvaro Herrera wrote:

I just noticed that one regress log file for the pg_dump test is about
7MB long because it contains totally unnecessary dump files in the
regress log output. This is because the tests run pg_dump of schema
pg_catalog without redirecting the output anywhere, which means it goes
to stdout, and then `prove` helpfully puts it in the regress log file.
This is completely pointless. We can make the log file 1 MB instead by
adding a --file parameter to that command, as 0001 attached.

Yea, that makes no sense.

We also have a couple of tests that leave rather large server logs
behind that are perhaps not very useful. I would propose to run those
servers with log_statement=off and avoid some of the bloat. 0002 does
that.

Another possible approach, instead of individually tweaking t/*.pl files
to add 'log_statements=none', would be to set it that way in
PostgreSQL::Test::Cluster globally. Right now we have
log_statements=all there, but maybe that was not a great idea.

I think that be a bad idea - many tests are essentially undebuggable that way
because there's no other way to correlate the regress_log and the server log
than the statement log.

TBH, I have even needed the pg_dump log_statement output on the buildfarm to
debug issues :(

I am against doing this for 027_stream_regress.pl, it's definitely
undebuggable without the statement logs.

Greetings,

Andres Freund

Tom Lane

tgl@sss.pgh.pa.us

about 2 months ago

In reply to: Álvaro Herrera (#1)

Re: reduce size of logs stored by buildfarm

=?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes:

With these three patches, we go from having 62 MB bytes in files with
"log" in their names under testrun/ (except those that have "catalog")
to 30 MB. I think that's not a bad reduction.

I wonder how much this overlaps with what Andrew just did
to the BF client [1]/messages/by-id/601bb91d-b55d-4fa6-bb57-c2126fb22620@dunslane.net.

regards, tom lane

[1]: /messages/by-id/601bb91d-b55d-4fa6-bb57-c2126fb22620@dunslane.net

Andrew Dunstan

andrew@dunslane.net

about 2 months ago

In reply to: Tom Lane (#3)

Re: reduce size of logs stored by buildfarm

On 2025-11-25 Tu 10:37 AM, Tom Lane wrote:

=?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes:

With these three patches, we go from having 62 MB bytes in files with
"log" in their names under testrun/ (except those that have "catalog")
to 30 MB. I think that's not a bad reduction.

I wonder how much this overlaps with what Andrew just did
to the BF client [1].

[1] /messages/by-id/601bb91d-b55d-4fa6-bb57-c2126fb22620@dunslane.net

Pretty much none of it, although both Alvaro and I are clearly motivated
by the same thing, namely disk space pressure on the buildfarm server.

I think his patches 1 and 3 seem like no-brainers.

I did two things in the BF client:

. prevent uploading redundant copies of the same file in meson builds
(this was a bug)

. inhibit loading postmaster log files from installcheck runs and
cross-version upgrade runs, but only if there hasn't been a failure.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Álvaro Herrera

alvherre@kurilemu.de

about 2 months ago

In reply to: Andrew Dunstan (#4)

Re: reduce size of logs stored by buildfarm

On 2025-Nov-25, Andrew Dunstan wrote:

I think his patches 1 and 3 seem like no-brainers.

On 2025-Nov-25, Andres Freund wrote:

On 2025-11-25 14:30:33 +0100, Álvaro Herrera wrote:

I just noticed that one regress log file for the pg_dump test is about
7MB long because it contains totally unnecessary dump files in the
regress log output. This is because the tests run pg_dump of schema
pg_catalog without redirecting the output anywhere, which means it goes
to stdout, and then `prove` helpfully puts it in the regress log file.
This is completely pointless. We can make the log file 1 MB instead by
adding a --file parameter to that command, as 0001 attached.

Yea, that makes no sense.

Thanks, pushed 0001.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Postgres is bloatware by design: it was built to house
PhD theses." (Joey Hellerstein, SIGMOD annual conference 2002)

Import Notes

Reply to msg id not found: a9db7d10-a941-46bc-80b7-d9c616097af3@dunslane.netozycnnem6xbrsa3nhylpqy2kyl2kudq3r4sq2obxh6cm6evqsm@ozihkcefz3e4 | Resolved by subject fallback

Jacob Champion

jacob.champion@enterprisedb.com

about 2 months ago

In reply to: Álvaro Herrera (#1)

1 attachment(s)

Re: reduce size of logs stored by buildfarm

On Tue, Nov 25, 2025 at 5:30 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:

I also noticed that an oauth test file contains a couple MBs of a
gigantic string of just 'x'. I suppose that will compress well (with
gzip at least, the 2 MB file becomes 8 kB). Still, it's kinda
ridiculous and useless to bloat a 67 kB file to 2 MB that way.

Sorry about that.

0003 does that by simply cutting the string short at 10k, which reduces
the size of the log from 2 MB to some 86 kB. Maybe there are better
ways to deal with this? Jacob?

Only thing I don't like about this is that the JSON you need for
debugging might be after the useless padding. Attached is a patch that
does things more surgically, tested against Python 3.6, and I'm
running it through the CI now.

Thanks,
--Jacob

Attachments:

truncate-JSON-logs.patch.txttext/plain; charset=UTF-8; name=truncate-JSON-logs.patch.txtDownload

From 7f6b1b4d19ac19f62dc5525198bb3d047d21ae7e Mon Sep 17 00:00:00 2001
From: Jacob Champion <jacob.champion@enterprisedb.com>
Date: Tue, 25 Nov 2025 10:07:29 -0800
Subject: [PATCH] WIP: truncate long JSON responses in logs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Reported-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/202511251218.zfs4nu2qnh2m%40alvherre.pgsql
---
 .../modules/oauth_validator/t/oauth_server.py | 22 ++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/test/modules/oauth_validator/t/oauth_server.py b/src/test/modules/oauth_validator/t/oauth_server.py
index 0f8836aadf3..c70783ecbe4 100755
--- a/src/test/modules/oauth_validator/t/oauth_server.py
+++ b/src/test/modules/oauth_validator/t/oauth_server.py
@@ -257,13 +257,33 @@ class OAuthHandler(http.server.BaseHTTPRequestHandler):
 
         return token
 
+    def _log_response(self, js: JsonObject) -> None:
+        """
+        Trims the response JSON, if necessary, and logs it for later debugging.
+        """
+        # At the moment the biggest problem for tests is the _pad_ member, which
+        # is a megabyte in size, so truncate that to something more reasonable.
+        if "_pad_" in js:
+            pad = js["_pad_"]
+
+            # Don't modify the original dict.
+            js = dict(js)
+            js["_pad_"] = pad[:64] + f"[...truncated from {len(pad)} bytes]"
+
+        resp = json.dumps(js).encode("ascii")
+        self.log_message("sending JSON response: %s", resp)
+
+        # If you've tripped this assertion, please truncate the new addition as
+        # above, or else come up with a new strategy.
+        assert len(resp) < 1024, "_log_response must be adjusted for new JSON"
+
     def _send_json(self, js: JsonObject) -> None:
         """
         Sends the provided JSON dict as an application/json response.
         self._response_code can be modified to send JSON error responses.
         """
         resp = json.dumps(js).encode("ascii")
-        self.log_message("sending JSON response: %s", resp)
+        self._log_response(js)
 
         self.send_response(self._response_code)
         self.send_header("Content-Type", self._content_type)
-- 
2.34.1

Álvaro Herrera

alvherre@kurilemu.de

about 2 months ago

In reply to: Jacob Champion (#6)

Re: reduce size of logs stored by buildfarm

On 2025-Nov-25, Jacob Champion wrote:

On Tue, Nov 25, 2025 at 5:30 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:

I also noticed that an oauth test file contains a couple MBs of a
gigantic string of just 'x'. I suppose that will compress well (with
gzip at least, the 2 MB file becomes 8 kB). Still, it's kinda
ridiculous and useless to bloat a 67 kB file to 2 MB that way.

Sorry about that.

No need!

0003 does that by simply cutting the string short at 10k, which reduces
the size of the log from 2 MB to some 86 kB. Maybe there are better
ways to deal with this? Jacob?

Only thing I don't like about this is that the JSON you need for
debugging might be after the useless padding. Attached is a patch that
does things more surgically, tested against Python 3.6, and I'm
running it through the CI now.

Brilliant, thanks.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick." (Andrew Sullivan)
/messages/by-id/20050809113420.GD2768@phlogiston.dyndns.org

Jacob Champion

jacob.champion@enterprisedb.com

about 1 month ago

In reply to: Álvaro Herrera (#7)

Re: reduce size of logs stored by buildfarm

On Tue, Nov 25, 2025 at 10:46 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:

Only thing I don't like about this is that the JSON you need for
debugging might be after the useless padding. Attached is a patch that
does things more surgically, tested against Python 3.6, and I'm
running it through the CI now.

Brilliant, thanks.

(This was pushed last week; forgot to follow up here.)

Thanks,
--Jacob