Re: CI and test improvements

Started by Justin Pryzbyalmost 4 years ago71 messageshackers

pryzby@telsasoft.com

almost 4 years ago

Resending with a problematic address removed...

On Sat, May 28, 2022 at 10:37:41AM -0500, Justin Pryzby wrote:

I'm anticipating the need to further re-arrange the patch set - it's not clear
which patches should go first. Maybe some patches should be dropped in favour
of the meson project. I guess some patches will have to be re-implemented with
meson (msvc warnings).

Maybe the improvements to vcregress should go into v15 ? CI should run all the
tests (which also serves to document *how* to run all the tests, even if there
isn't a literal check-world target).

On Thu, Jun 23, 2022 at 02:31:25PM -0500, Justin Pryzby wrote:

Should any of the test changes go into v15 ?

On Thu, Jul 07, 2022 at 07:22:32PM -0500, Justin Pryzby wrote:

Also, cirrus/freebsd task can run 3x faster with more CPUs.

Checking if there's interest in any/none of these patches ?
I have added several more.

Do you have an idea when the meson branch might be merged ?

Will vcregress remain for a while, or will it go away for v16 ?

--
Justin

Import Notes

Reply to msg id not found: 20220708002232.GF13040@telsasoft.com20220623193125.GB22452@telsasoft.com20220528153741.GK19626@telsasoft.com

Andres Freund

andres@anarazel.de

almost 4 years ago

In reply to: Justin Pryzby (#1)

Hi,

On 2022-08-28 09:44:47 -0500, Justin Pryzby wrote:

On Sat, May 28, 2022 at 10:37:41AM -0500, Justin Pryzby wrote:

I'm anticipating the need to further re-arrange the patch set - it's not clear
which patches should go first. Maybe some patches should be dropped in favour
of the meson project. I guess some patches will have to be re-implemented with
meson (msvc warnings).

Maybe the improvements to vcregress should go into v15 ? CI should run all the
tests (which also serves to document *how* to run all the tests, even if there
isn't a literal check-world target).

On Thu, Jun 23, 2022 at 02:31:25PM -0500, Justin Pryzby wrote:

Should any of the test changes go into v15 ?

On Thu, Jul 07, 2022 at 07:22:32PM -0500, Justin Pryzby wrote:

Also, cirrus/freebsd task can run 3x faster with more CPUs.

Checking if there's interest in any/none of these patches ?
I have added several more.

Do you have an idea when the meson branch might be merged ?

I hope to do that fairly soon, but it's of course dependant on review etc. The
plan was to merge it early and mature it in tree to some degree. There's only
so much we can do "from the outside"...

Will vcregress remain for a while, or will it go away for v16 ?

The plan was for the windows stuff to go away fairly quickly.

From 99ee0bef5054539aad0e23a24dd9c9cc9ccee788 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 25 May 2022 21:53:22 -0500
Subject: [PATCH 01/25] cirrus/windows: add compiler_warnings_script

Looks good.

-    MSBFLAGS: -m -verbosity:minimal "-consoleLoggerParameters:Summary;ForceNoAlign" /p:TrackFileAccess=false -nologo
+    # -fileLoggerParameters1: write warnings to msbuild.warn.log.
+    MSBFLAGS: -m -verbosity:minimal "-consoleLoggerParameters:Summary;ForceNoAlign" /p:TrackFileAccess=false -nologo -fileLoggerParameters1:warningsonly;logfile=msbuild.warn.log

Except, I think it'd be good to split this line. What do you think about using
something like
MSBFLAGS: >-
-nologo
-m -verbosity:minimal
/p:TrackFileAccess=false
"-consoleLoggerParameters:Summary;ForceNoAlign"
-fileLoggerParameters1:warningsonly;logfile=msbuild.warn.log

I think that should work?

# If tests hang forever, cirrus eventually times out. In that case log
# output etc is not uploaded, making the problem hard to debug. Of course
@@ -450,6 +451,11 @@ task:
cd src/tools/msvc
%T_C% perl vcregress.pl ecpgcheck

+  # These should be last, so all the important checks are always run
+  always:
+    compiler_warnings_script:
+      - sh src\tools\ci\windows-compiler-warnings msbuild.warn.log
+
on_failure:
<<: *on_failure
crashlog_artifacts:
diff --git a/src/tools/ci/windows-compiler-warnings b/src/tools/ci/windows-compiler-warnings
new file mode 100755
index 00000000000..d6f9a1fc569
--- /dev/null
+++ b/src/tools/ci/windows-compiler-warnings
@@ -0,0 +1,16 @@
+#! /bin/sh
+# Success if the given file doesn't exist or is empty, else fail
+# This is a separate file only to avoid dealing with windows shell quoting and escaping.
+set -e
+
+fn=$1
+
+if [ -s "$fn" ]
+then
+	# Display the file's content, then exit indicating failure
+	cat "$fn"
+	exit 1
+else
+	# Success
+	exit 0
+fi
--
2.17.1

Wouldn't that be doable as something like
sh -c 'if test -s file; then cat file;exit 1; fi"
inside .cirrus.yml?

From 1064a0794e85e06b3a0eca4ed3765f078795cb36 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 3 Apr 2022 00:10:20 -0500
Subject: [PATCH 03/25] cirrus/ccache: disable compression and show stats

ccache since 4.0 enables zstd compression by default.

With default compression enabled (https://cirrus-ci.com/task/6692342840164352):
linux has 4.2; 99MB cirrus cache; cache_size_kibibyte 109616
macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte 52500
freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte 134064
windows has 4.6.1; 180MB cirrus cache; cache_size_kibibyte 51179
todo: compiler warnings

With compression disabled (https://cirrus-ci.com/task/4614182514458624):
linux: 91MB cirrus cache; cache_size_kibibyte 316136
macos: 41MB cirrus cache; cache_size_kibibyte 118068
windows: 42MB cirrus cache; cache_size_kibibyte 134064
freebsd is the same

The stats should either be shown or logged (or maybe run with CCACHE_NOSTATS,
to avoid re-uploading cache tarball in a 100% cached build, due only to
updating ./**/stats).

Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal.

I stared at this commit message for a while, trying to make sense of it, and
couldn't really. I assume you're saying that the cirrus compression is better
with ccache compression disabled, but it's extremely hard to parse out of it.

This does too much at once. Show stats, change cache sizes, disable
compression.

From 01e9abd386a4e6cc0125b97617fb42e695898cbf Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 26 Jul 2022 20:30:02 -0500
Subject: [PATCH 04/25] cirrus/ccache: add explicit cache keys..

Since otherwise, building with ci-os-only will probably fail to use the normal
cache, since the cache key is computed using both the task name and its *index*
in the list of caches (internal/executor/cache.go:184).

Hm, perhaps worth confirming and/or reporting to cirrus rather?

From 8de5c977686270b0a4e666a924ebe820a245913a Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 24 Jul 2022 23:09:12 -0500
Subject: [PATCH 05/25] silence make distprep and generated-headers

this saves vertical screen space.

/messages/by-id/20220221164736.rq3ornzjdkmwk2wo@alap3.anarazel.de

I don't feel this should go in as a part of CI changes. Or rather, I feel
uncomfortable committing it when just discussed under this subject.

From eaf263ccaa8310c5d9834b97e93ad8434d63296e Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 24 Jul 2022 22:44:53 -0500
Subject: [PATCH 06/25] pg_regress: run more quietly

The number of lines of output should be closer to 1 per test, rather than 25 +
1 per test.

/messages/by-id/20220221173109.yl6dqqu3ud52ripd@alap3.anarazel.de

See above. There's also a dedicated thread about revising the output.

From 6a6a97fc869fd1fd8b7ab5da5147f145581634f9 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Fri, 24 Jun 2022 00:09:12 -0500
Subject: [PATCH 08/25] cirrus/freebsd: run with more CPUs+RAM and do not
repartitiion

There was some historic problem where tests under freebsd took 8+ minutes (and
before 4a288a37f took 15 minutes).

This reduces test time from 10min to 3min.
4 CPUs 4 tests https://cirrus-ci.com/task/4880240739614720
4 CPUs 6 tests https://cirrus-ci.com/task/4664440120410112 https://cirrus-ci.com/task/4586784884523008
4 CPUs 8 tests https://cirrus-ci.com/task/5001995491737600

6 CPUs https://cirrus-ci.com/task/6678321684545536
8 CPUs https://cirrus-ci.com/task/6264854121021440

See also:
/messages/by-id/20220310033347.hgxk4pyarzq4hxwp@alap3.anarazel.de
8 jobs 7min https://cirrus-ci.com/task/6186376667332608

xi-os-only: freebsd

Typo.

@@ -71,8 +69,6 @@ task:
fingerprint_key: ccache/freebsd
reupload_on_changes: true

- # Workaround around performance issues due to 32KB block size
- repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
create_user_script: |
pw useradd postgres
chown -R postgres:postgres .
--

What's the story there - at some point that was important for performance
because of the native block size triggering significant read-modify-write
cycles with postres' writes. You didn't comment on it in the commit message.

From fd1c36a0bd8fa608ccdff5be3735dac5e3e48bf3 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 27 Jul 2022 16:54:47 -0500
Subject: [PATCH 09/25] cirrus/freebsd: run build+check in a make vpath

From 7052a32a21752b59632225684fc9426bb94e46e0 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 13 Feb 2022 17:56:40 -0600
Subject: [PATCH 10/25] cirrus/windows: increase timeout to 25min

No explanation?

From 602983b2cf37fc43465c62330b2e15e9d6d2035d Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Fri, 26 Aug 2022 12:00:10 -0500
Subject: [PATCH 15/25] f!and chdir

I don't see the point of pointing fixup commits to the list.

Greetings,

Andres Freund

Justin Pryzby

pryzby@telsasoft.com

almost 4 years ago

In reply to: Andres Freund (#2)

On Sun, Aug 28, 2022 at 09:07:52AM -0700, Andres Freund wrote:

--- /dev/null
+++ b/src/tools/ci/windows-compiler-warnings
Wouldn't that be doable as something like
sh -c 'if test -s file; then cat file;exit 1; fi"
inside .cirrus.yml?

I had written it inline in a couple ways, like
- sh -exc 'f=msbuild.warn.log; if [ -s "$f" ]; then cat "$f"; exit 1; else exit 0; fi'

but then separated it out as you suggested in
20220227010908.vz2a7dmfzgwg742w@alap3.anarazel.de

after I complained about cmd.exe requiring escaping for && and ||
That makes writing any shell script a bit perilous and a separate script
seems better.

Subject: [PATCH 03/25] cirrus/ccache: disable compression and show stats

ccache since 4.0 enables zstd compression by default.

With default compression enabled (https://cirrus-ci.com/task/6692342840164352):
linux has 4.2; 99MB cirrus cache; cache_size_kibibyte 109616
macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte 52500
freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte 134064
windows has 4.6.1; 180MB cirrus cache; cache_size_kibibyte 51179
todo: compiler warnings

With compression disabled (https://cirrus-ci.com/task/4614182514458624):
linux: 91MB cirrus cache; cache_size_kibibyte 316136
macos: 41MB cirrus cache; cache_size_kibibyte 118068
windows: 42MB cirrus cache; cache_size_kibibyte 134064
freebsd is the same

The stats should either be shown or logged (or maybe run with CCACHE_NOSTATS,
to avoid re-uploading cache tarball in a 100% cached build, due only to
updating ./**/stats).

Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal.

I stared at this commit message for a while, trying to make sense of it, and
couldn't really. I assume you're saying that the cirrus compression is better
with ccache compression disabled, but it's extremely hard to parse out of it.

Yes, because ccache uses zstd-1, and cirrus uses gzip, which it's going
to use no matter what ccache does, and gzip's default -6 is better than
ccache's zstd-1.

This does too much at once. Show stats, change cache sizes, disable
compression.

The cache size change is related to the compression level change; ccache
prunes based on the local size, which was compressed with zstd-1 and,
with this patch, not compressed (so ~2x larger). Also, it's more
interesting to control the size uploaded to cirrus (after compression
ith gzip-6).

From 01e9abd386a4e6cc0125b97617fb42e695898cbf Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 26 Jul 2022 20:30:02 -0500
Subject: [PATCH 04/25] cirrus/ccache: add explicit cache keys..

Since otherwise, building with ci-os-only will probably fail to use the normal
cache, since the cache key is computed using both the task name and its *index*
in the list of caches (internal/executor/cache.go:184).

Hm, perhaps worth confirming and/or reporting to cirrus rather?

I know because of reading their source. Unfortunately, there's no
commit history indicating the intent or rationale.
https://github.com/cirruslabs/cirrus-ci-agent/blob/master/internal/executor/cache.go#L183

From 6a6a97fc869fd1fd8b7ab5da5147f145581634f9 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Fri, 24 Jun 2022 00:09:12 -0500
Subject: [PATCH 08/25] cirrus/freebsd: run with more CPUs+RAM and do not
repartitiion

There was some historic problem where tests under freebsd took 8+ minutes (and
before 4a288a37f took 15 minutes).

This reduces test time from 10min to 3min.
4 CPUs 4 tests https://cirrus-ci.com/task/4880240739614720
4 CPUs 6 tests https://cirrus-ci.com/task/4664440120410112 https://cirrus-ci.com/task/4586784884523008
4 CPUs 8 tests https://cirrus-ci.com/task/5001995491737600

6 CPUs https://cirrus-ci.com/task/6678321684545536
8 CPUs https://cirrus-ci.com/task/6264854121021440

See also:
/messages/by-id/20220310033347.hgxk4pyarzq4hxwp@alap3.anarazel.de
8 jobs 7min https://cirrus-ci.com/task/6186376667332608

xi-os-only: freebsd

Typo.

No - it's deliberate so I can switch to and from "everything" to "this
only".

@@ -71,8 +69,6 @@ task:
fingerprint_key: ccache/freebsd
reupload_on_changes: true

- # Workaround around performance issues due to 32KB block size
- repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
create_user_script: |
pw useradd postgres
chown -R postgres:postgres .
--

What's the story there - at some point that was important for performance
because of the native block size triggering significant read-modify-write
cycles with postres' writes. You didn't comment on it in the commit message.

Well, I don't know the history, but it seems to be unneeded now.

Is there a good description of the original problem ? Originally,
freebsd check-world took ~15min to run tests, and when we changed to use
-Og it took 10min. Since then, seems to have improved on its own, and
currently takes ~6min. This patch adds CPUs to make it run in ~4min,
and takes the opportuity to drop the historic repartition stuff.

From fd1c36a0bd8fa608ccdff5be3735dac5e3e48bf3 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 27 Jul 2022 16:54:47 -0500
Subject: [PATCH 09/25] cirrus/freebsd: run build+check in a make vpath

From 7052a32a21752b59632225684fc9426bb94e46e0 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 13 Feb 2022 17:56:40 -0600
Subject: [PATCH 10/25] cirrus/windows: increase timeout to 25min

No explanation?

Because of the immediately following commit which makes it run all the
tests.

From 602983b2cf37fc43465c62330b2e15e9d6d2035d Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Fri, 26 Aug 2022 12:00:10 -0500
Subject: [PATCH 15/25] f!and chdir

I don't see the point of pointing fixup commits to the list.

It's a separate commit to make it easy to see the changes, separately,
since I imagine maybe the "chdir" part won't be desirable, or maybe the
PATH part won't. But I'm not sure, so I'm here soliciting feedback.

--
Justin

Andres Freund

andres@anarazel.de

almost 4 years ago

In reply to: Justin Pryzby (#3)

Hi,

On 2022-08-28 12:10:29 -0500, Justin Pryzby wrote:

On Sun, Aug 28, 2022 at 09:07:52AM -0700, Andres Freund wrote:
--- /dev/null
+++ b/src/tools/ci/windows-compiler-warnings
Wouldn't that be doable as something like
sh -c 'if test -s file; then cat file;exit 1; fi"
inside .cirrus.yml?
I had written it inline in a couple ways, like
- sh -exc 'f=msbuild.warn.log; if [ -s "$f" ]; then cat "$f"; exit 1; else exit 0; fi'

but then separated it out as you suggested in
20220227010908.vz2a7dmfzgwg742w@alap3.anarazel.de

after I complained about cmd.exe requiring escaping for && and ||
That makes writing any shell script a bit perilous and a separate script
seems better.

I remember that I suggested it - but note that the way I wrote above doesn't
have anything needing escaping. Anyway, what do you think of the multiline
split I suggested?

Subject: [PATCH 03/25] cirrus/ccache: disable compression and show stats

ccache since 4.0 enables zstd compression by default.

With default compression enabled (https://cirrus-ci.com/task/6692342840164352):
linux has 4.2; 99MB cirrus cache; cache_size_kibibyte 109616
macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte 52500
freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte 134064
windows has 4.6.1; 180MB cirrus cache; cache_size_kibibyte 51179
todo: compiler warnings

With compression disabled (https://cirrus-ci.com/task/4614182514458624):
linux: 91MB cirrus cache; cache_size_kibibyte 316136
macos: 41MB cirrus cache; cache_size_kibibyte 118068
windows: 42MB cirrus cache; cache_size_kibibyte 134064
freebsd is the same

The stats should either be shown or logged (or maybe run with CCACHE_NOSTATS,
to avoid re-uploading cache tarball in a 100% cached build, due only to
updating ./**/stats).

Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal.

I stared at this commit message for a while, trying to make sense of it, and
couldn't really. I assume you're saying that the cirrus compression is better
with ccache compression disabled, but it's extremely hard to parse out of it.

Yes, because ccache uses zstd-1, and cirrus uses gzip, which it's going
to use no matter what ccache does, and gzip's default -6 is better than
ccache's zstd-1.

This does too much at once. Show stats, change cache sizes, disable
compression.

The cache size change is related to the compression level change; ccache
prunes based on the local size, which was compressed with zstd-1 and,
with this patch, not compressed (so ~2x larger). Also, it's more
interesting to control the size uploaded to cirrus (after compression
ith gzip-6).

That's what should have been in the commit message.

From 6a6a97fc869fd1fd8b7ab5da5147f145581634f9 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Fri, 24 Jun 2022 00:09:12 -0500
Subject: [PATCH 08/25] cirrus/freebsd: run with more CPUs+RAM and do not
repartitiion

There was some historic problem where tests under freebsd took 8+ minutes (and
before 4a288a37f took 15 minutes).

This reduces test time from 10min to 3min.
4 CPUs 4 tests https://cirrus-ci.com/task/4880240739614720
4 CPUs 6 tests https://cirrus-ci.com/task/4664440120410112 https://cirrus-ci.com/task/4586784884523008
4 CPUs 8 tests https://cirrus-ci.com/task/5001995491737600

6 CPUs https://cirrus-ci.com/task/6678321684545536
8 CPUs https://cirrus-ci.com/task/6264854121021440

See also:
/messages/by-id/20220310033347.hgxk4pyarzq4hxwp@alap3.anarazel.de
8 jobs 7min https://cirrus-ci.com/task/6186376667332608

xi-os-only: freebsd

Typo.

No - it's deliberate so I can switch to and from "everything" to "this
only".

I don't see the point in posting patches to be applied if they contain lots of
such things that a potential committer would need to catch and include a lot
of of fixup patches.

@@ -71,8 +69,6 @@ task:
fingerprint_key: ccache/freebsd
reupload_on_changes: true

- # Workaround around performance issues due to 32KB block size
- repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
create_user_script: |
pw useradd postgres
chown -R postgres:postgres .
--

What's the story there - at some point that was important for performance
because of the native block size triggering significant read-modify-write
cycles with postres' writes. You didn't comment on it in the commit message.

Well, I don't know the history, but it seems to be unneeded now.

It's possible it was mainly needed for testing with aio + dio. But also
possible that an upgrade improved the situation since.

From fd1c36a0bd8fa608ccdff5be3735dac5e3e48bf3 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 27 Jul 2022 16:54:47 -0500
Subject: [PATCH 09/25] cirrus/freebsd: run build+check in a make vpath

From 7052a32a21752b59632225684fc9426bb94e46e0 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 13 Feb 2022 17:56:40 -0600
Subject: [PATCH 10/25] cirrus/windows: increase timeout to 25min

No explanation?

Because of the immediately following commit which makes it run all the
tests.

Mention that in the commit message then. Especially when dealing with 25
commits I don't think you can expect others to infer such things.

From 602983b2cf37fc43465c62330b2e15e9d6d2035d Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Fri, 26 Aug 2022 12:00:10 -0500
Subject: [PATCH 15/25] f!and chdir

I don't see the point of pointing fixup commits to the list.

It's a separate commit to make it easy to see the changes, separately,
since I imagine maybe the "chdir" part won't be desirable, or maybe the
PATH part won't. But I'm not sure, so I'm here soliciting feedback.

Shrug, I doubt you'll get much if asked that way.

Greetings,

Andres Freund

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Andres Freund (#4)

On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote:

On 2022-08-28 12:10:29 -0500, Justin Pryzby wrote:
On Sun, Aug 28, 2022 at 09:07:52AM -0700, Andres Freund wrote:
--- /dev/null
+++ b/src/tools/ci/windows-compiler-warnings
Wouldn't that be doable as something like
sh -c 'if test -s file; then cat file;exit 1; fi"
inside .cirrus.yml?
I had written it inline in a couple ways, like
- sh -exc 'f=msbuild.warn.log; if [ -s "$f" ]; then cat "$f"; exit 1; else exit 0; fi'

but then separated it out as you suggested in
20220227010908.vz2a7dmfzgwg742w@alap3.anarazel.de

after I complained about cmd.exe requiring escaping for && and ||
That makes writing any shell script a bit perilous and a separate script
seems better.
I remember that I suggested it - but note that the way I wrote above doesn't
have anything needing escaping.

It doesn't require it, but that still gives the impression that it's
normally possible to write one-liner shell scripts there, which is
misleading/wrong, and the reason why I took your suggestion to use a
separate script file.

Anyway, what do you think of the multiline split I suggested?

Done, and sorted.

That's what should have been in the commit message.

Sure. I copied into the commit message the explanation that I had
written in June's email.

xi-os-only: freebsd

Typo.

No - it's deliberate so I can switch to and from "everything" to "this
only".

I don't see the point in posting patches to be applied if they contain lots of
such things that a potential committer would need to catch and include a lot
of of fixup patches.

I get that you disliked that I disabled the effect of a CI tag by
munging "c" to "x". I've amended the message to avoid confusion. But,
lots of what such things ? "ci-os-only" would be removed before being
pushed anyway.

"catching things" is the first part of the review process, which (as I
understand) is intended to help patch authors to improve their patches.
If you found lots of problems in my patches, I'd need to know about
them; but most of what I heard seem like quibbles about the presentation
of the patches. It's true that some parts are dirty/unclear, and that
seems reasonable for patches most of which haven't yet received review,
for which I asked whether to pursue the patch at all, and how best to
present them. This is (or could be) an opportunity to make
improvements.

I renamed the two, related patches to Cluser.pm which said "f!", which
are deliberately separate but looked like "fixup" patches. Are you
interested in any combination of those three, related changes to move
logic from Makefile to perl ? If not, we don't need to debate the
merits of spliting the patch.

What about the three, related changes for ccache compression ?

Should these be dropped in favour of meson ?
- cirrus/vcregress: test modules/contrib with NO_INSTALLCHECK=1
- vcregress: add alltaptests

I added: "WIP: skip building if only docs have changed"

changesInclude() didn't seem to work right when I first tried to use it.
Eventually, I realized that it seems to use something like "git log",
and not "git diff" (as I'd thought). It seems to work fine now that I
know what to expect.

git commit --amend --no-edit
git diff --stat @{1}..@{0} # this outputs nothing
git log --stat @{1}..@{0} # this lists the files changed by the tip commit

It'd be nice to be have cfbot inject this patch into each commitfest
patch for awhile, to make sure everything works as expected. Same for
the code coverage patch and the doc artifacts patch. (These patches
currently assume that the base commit is HEAD~1, which is correct for
cfbot, and that would also provide code coverage and docs until such
time as cfbot is updated to apply and preserve the original series of
patches).

--
Justin

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Andres Freund (#4)

On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote:

@@ -71,8 +69,6 @@ task:
fingerprint_key: ccache/freebsd
reupload_on_changes: true

- # Workaround around performance issues due to 32KB block size
- repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
create_user_script: |
pw useradd postgres
chown -R postgres:postgres .
--

What's the story there - at some point that was important for performance
because of the native block size triggering significant read-modify-write
cycles with postres' writes. You didn't comment on it in the commit message.

Well, I don't know the history, but it seems to be unneeded now.

It's possible it was mainly needed for testing with aio + dio. But also
possible that an upgrade improved the situation since.

Maybe freebsd got faster as a result of the TAU CPUs?
https://mobile.twitter.com/cirrus_labs/status/1534982111568052240

I noticed because it's been *slower* the last ~24h since cirrusci
disabled TAU, as Thomas commit mentioned.
https://twitter.com/cirrus_labs/status/1572657320093712384

For example this CF entry:

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3736
https://cirrus-ci.com/task/4670794365140992 5m36s - 4days ago
https://cirrus-ci.com/task/4974926233862144 5m25s - 3days ago
https://cirrus-ci.com/task/5561409034518528 5m29s - 2days ago
https://cirrus-ci.com/task/6432442008469504 9m19s - yesterday

CF_BOT's latest tasks seem to be fast again, since 1-2h ago.
https://cirrus-ci.com/build/5178906041909248 9m1s
https://cirrus-ci.com/build/4593160281128960 5m8s
https://cirrus-ci.com/build/4539845644124160 5m22s

The logs for July show when freebsd started "being fast":
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/38/3708
https://cirrus-ci.com/task/6316073015312384 10m25s Jul 13
https://cirrus-ci.com/task/5662878987452416 5m48s Jul 15

Maybe that changed in July rather than June because the TAU CPUs were
still not available in every region/zone (?)
https://cloud.google.com/compute/docs/regions-zones/

I have no idea if the TAU CPUs eliminate/mitigate the original
performance issue you had with AIO. But they have such a large effect
on freebsd that it could now be the fastest task, if given more than 2
CPUs.

--
Justin

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Justin Pryzby (#6)

Hi,

On 2022-09-22 16:07:02 -0500, Justin Pryzby wrote:

On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote:

@@ -71,8 +69,6 @@ task:
fingerprint_key: ccache/freebsd
reupload_on_changes: true

- # Workaround around performance issues due to 32KB block size
- repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
create_user_script: |
pw useradd postgres
chown -R postgres:postgres .
--

What's the story there - at some point that was important for performance
because of the native block size triggering significant read-modify-write
cycles with postres' writes. You didn't comment on it in the commit message.

Well, I don't know the history, but it seems to be unneeded now.

It's possible it was mainly needed for testing with aio + dio. But also
possible that an upgrade improved the situation since.

Maybe freebsd got faster as a result of the TAU CPUs?
https://mobile.twitter.com/cirrus_labs/status/1534982111568052240

I noticed because it's been *slower* the last ~24h since cirrusci
disabled TAU, as Thomas commit mentioned.
https://twitter.com/cirrus_labs/status/1572657320093712384

Yea, I noticed that as well. It's entirely possible that something in the
"hardware" stack improved sufficiently to avoid problems.

I have no idea if the TAU CPUs eliminate/mitigate the original
performance issue you had with AIO. But they have such a large effect
on freebsd that it could now be the fastest task, if given more than 2
CPUs.

I'm planning to rebase early next week and try that out.

Greetings,

Andres Freund

Thomas Munro

thomas.munro@gmail.com

over 3 years ago

In reply to: Justin Pryzby (#6)

On Fri, Sep 23, 2022 at 9:07 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote:

- # Workaround around performance issues due to 32KB block size
- repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
create_user_script: |
pw useradd postgres
chown -R postgres:postgres .

What's the story there - at some point that was important for performance
because of the native block size triggering significant read-modify-write
cycles with postres' writes. You didn't comment on it in the commit message.

Well, I don't know the history, but it seems to be unneeded now.

It's possible it was mainly needed for testing with aio + dio. But also
possible that an upgrade improved the situation since.

Yeah, it is very important for direct I/O (patches soon...), because
every 8KB random write becomes a read-32KB/wait/write-32KB without it.
It's slightly less important for buffered I/O, because the kernel
caches hide that, but it still triggers I/O bandwidth amplification,
and we definitely saw positive effects earlier on the CI system back
on the previous generation. FWIW I am planning to see about getting
the FreeBSD installer to create the root file system the way we want,
instead of this ugliness.

Maybe freebsd got faster as a result of the TAU CPUs?
[data]

Very interesting. Would be good to find the exact instance types +
storage types to see if there has been a massive IOPS boost, perhaps
via local SSD. The newer times are getting closer to a local
developer machine.

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Justin Pryzby (#5)

Hi,

On 2022-09-10 15:05:42 -0500, Justin Pryzby wrote:

From 4ed5eb427de4508a4c3422e60891b45c8512814a Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 3 Apr 2022 00:10:20 -0500
Subject: [PATCH 03/23] cirrus/ccache: disable compression and show stats

Since v4.0, ccache enables zstd compression by default, saving roughly
2x-3x. But, cirrus caches are compressed as tar.gz, so we could disable
ccache compression, allowing cirrus to gzip the uncompressed data
(better than ccache's default of zstd-1).

I wonder whether we could instead change CCACHE_COMPRESSLEVEL (maybe 3, zstd's
default IIRC). It'd be good if we could increase cache utilization.

From 0bd5f51b8c143ed87a867987309d66b8554b1fd6 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 14 Apr 2022 06:27:07 -0500
Subject: [PATCH 05/23] cirrus: enable various runtime checks on macos and
freebsd

windows is slower than freebsd and mac, so it's okay to enable options which
will slow them down some. Also, the cirrusci mac instances always have lot of
cores available.

See:
/messages/by-id/20211217193159.pwrelhiyx7kevgsn@alap3.anarazel.de
/messages/by-id/20211213211223.vkgg3wwiss2tragj@alap3.anarazel.de
/messages/by-id/CAH2-WzmevBhKNEtqX3N-Tkb0gVBHH62C0KfeTxXzqYES_PiFiA@mail.gmail.com
/messages/by-id/20220325000933.vgazz7pjk2ytj65d@alap3.anarazel.de

ci-os-only: freebsd, macos
---
.cirrus.yml | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index 183e8746ce6..4ad20892eeb 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -113,7 +113,9 @@ task:
\
CC="ccache cc" \
CXX="ccache c++" \
-        CFLAGS="-Og -ggdb"
+        CPPFLAGS="-DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST" \
+        CXXFLAGS="-Og -ggdb -march=native -mtune=native" \
+        CFLAGS="-Og -ggdb -march=native -mtune=native"

What's reason for -march=native -mtune=native here?

EOF
build_script: |
su postgres -c "ccache --zero-stats"
@@ -336,8 +338,8 @@ task:
CC="ccache cc" \
CXX="ccache c++" \
CLANG="ccache ${brewpath}/llvm/bin/ccache" \
-      CFLAGS="-Og -ggdb" \
-      CXXFLAGS="-Og -ggdb" \
+      CFLAGS="-Og -ggdb -DRANDOMIZE_ALLOCATED_MEMORY" \
+      CXXFLAGS="-Og -ggdb -DRANDOMIZE_ALLOCATED_MEMORY" \
\
LLVM_CONFIG=${brewpath}/llvm/bin/llvm-config \
PYTHON=python3

I'd also use CPPFLAGS here, given you'd used it above...

I'm planning to commit an updated version of this change soon, without the
-march=native -mtune=native bit, unless somebody protests...

Greetings,

Andres Freund

#10

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Andres Freund (#9)

On Sat, Oct 01, 2022 at 05:45:01PM -0700, Andres Freund wrote:

Hi,

On 2022-09-10 15:05:42 -0500, Justin Pryzby wrote:

From 4ed5eb427de4508a4c3422e60891b45c8512814a Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 3 Apr 2022 00:10:20 -0500
Subject: [PATCH 03/23] cirrus/ccache: disable compression and show stats

Since v4.0, ccache enables zstd compression by default, saving roughly
2x-3x. But, cirrus caches are compressed as tar.gz, so we could disable
ccache compression, allowing cirrus to gzip the uncompressed data
(better than ccache's default of zstd-1).

I wonder whether we could instead change CCACHE_COMPRESSLEVEL (maybe 3, zstd's
default IIRC). It'd be good if we could increase cache utilization.

I considered that (and I think that's what I wrote initially).

I figured that if cirrus is going to use gzip-6 (tar.gz) in any case, we
might as well disable compression. Then, all the tasks are also doing
the same thing (half the tasks have ccache before 4.0).

From 0bd5f51b8c143ed87a867987309d66b8554b1fd6 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 14 Apr 2022 06:27:07 -0500
Subject: [PATCH 05/23] cirrus: enable various runtime checks on macos and
freebsd

windows is slower than freebsd and mac, so it's okay to enable options which
will slow them down some. Also, the cirrusci mac instances always have lot of
cores available.
See:
/messages/by-id/20211217193159.pwrelhiyx7kevgsn@alap3.anarazel.de
/messages/by-id/20211213211223.vkgg3wwiss2tragj@alap3.anarazel.de
/messages/by-id/CAH2-WzmevBhKNEtqX3N-Tkb0gVBHH62C0KfeTxXzqYES_PiFiA@mail.gmail.com
/messages/by-id/20220325000933.vgazz7pjk2ytj65d@alap3.anarazel.de

ci-os-only: freebsd, macos
---
.cirrus.yml | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index 183e8746ce6..4ad20892eeb 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -113,7 +113,9 @@ task:
\
CC="ccache cc" \
CXX="ccache c++" \
-        CFLAGS="-Og -ggdb"
+        CPPFLAGS="-DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST" \
+        CXXFLAGS="-Og -ggdb -march=native -mtune=native" \
+        CFLAGS="-Og -ggdb -march=native -mtune=native"
What's reason for -march=native -mtune=native here?

No particular reason, and my initial patch didn't have it.
I suppose I added it to test its effect and never got rid of it.

EOF
build_script: |
su postgres -c "ccache --zero-stats"
@@ -336,8 +338,8 @@ task:
CC="ccache cc" \
CXX="ccache c++" \
CLANG="ccache ${brewpath}/llvm/bin/ccache" \
-      CFLAGS="-Og -ggdb" \
-      CXXFLAGS="-Og -ggdb" \
+      CFLAGS="-Og -ggdb -DRANDOMIZE_ALLOCATED_MEMORY" \
+      CXXFLAGS="-Og -ggdb -DRANDOMIZE_ALLOCATED_MEMORY" \
\
LLVM_CONFIG=${brewpath}/llvm/bin/llvm-config \
PYTHON=python3
I'd also use CPPFLAGS here, given you'd used it above...

I'm planning to commit an updated version of this change soon, without the
-march=native -mtune=native bit, unless somebody protests...

One other thing is that your -m32 changes caused the linux/meson task to
take an additional 3+ minutes (total ~8). That's no issue, except that
the Warnings task depends on the linux/mason task, and itself can take
up to 15 minutes.

So those two potentially take as long as the windows task.
I suggested that CompileWarnings could instead "Depend on: Freebsd",
which currently takes 6-7min (and could take 4-5min if given more CPUs).

--
Justin

#11

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Justin Pryzby (#10)

Hi,

On 2022-10-01 19:58:01 -0500, Justin Pryzby wrote:

One other thing is that your -m32 changes caused the linux/meson task to
take an additional 3+ minutes (total ~8). That's no issue, except that
the Warnings task depends on the linux/mason task, and itself can take
up to 15 minutes.

So those two potentially take as long as the windows task.
I suggested that CompileWarnings could instead "Depend on: Freebsd",
which currently takes 6-7min (and could take 4-5min if given more CPUs).

I am wondering if we should instead introduce a new "quickcheck" task that
just compiles and runs maybe one test and have *all* other tests depend on
that. Wasting a precious available windows instance to just fail to build or
immediately fail during tests doesn't really make sense.

Greetings,

Andres Freund

#12

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Andres Freund (#11)

Hi,

On 2022-10-01 18:36:41 -0700, Andres Freund wrote:

I am wondering if we should instead introduce a new "quickcheck" task that
just compiles and runs maybe one test and have *all* other tests depend on
that. Wasting a precious available windows instance to just fail to build or
immediately fail during tests doesn't really make sense.

Attached is an implementation of that idea.

I fairly randomly chose two quick tests to execute as part of the sanity
check, cube/regress pg_ctl/001_start_stop. I wanted to have coverage for
initdb, a pg_regress style test, a tap test, some other client binary.

With a primed cache this takes ~32s, not too bad imo. 12s of that is cloning
the repo.

What do you think?

We could bake a bare repo into the images to make the clone step in faster,
but that'd be for later anyway.

set -e
rm -rf /tmp/pg-clone-better
mkdir /tmp/pg-clone-better
cd /tmp/pg-clone-better
git init --bare
git remote add origin https://github.com/postgres/postgres.git --no-tags -t 'REL_*' -t master
git fetch -v
git repack -ad -f
du -sh

results in a 227MB repo.

git clone https://github.com/anarazel/postgres.git -v --depth 1000 -b ci-sanitycheck --reference /tmp/pg-clone-better /tmp/pg-clone-better-clone

clones an example branch in ~1.35s.

Greetings,

Andres Freund

#13

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Andres Freund (#12)

On Sun, Oct 02, 2022 at 01:52:01PM -0700, Andres Freund wrote:

Hi,

On 2022-10-01 18:36:41 -0700, Andres Freund wrote:

I am wondering if we should instead introduce a new "quickcheck" task that
just compiles and runs maybe one test and have *all* other tests depend on
that. Wasting a precious available windows instance to just fail to build or
immediately fail during tests doesn't really make sense.

With a primed cache this takes ~32s, not too bad imo. 12s of that is
cloning the repo.

Maybe - that would avoid waiting 4 minutes for a windows instance to
start in the (hopefully atypical) case of a patch that fails in 1-2
minutes under linux/freebsd.

If the patch were completely broken, the windows task would take ~4min
to start, plus up to ~4min before failing to compile or failing an early
test. 6-8 minutes isn't nothing, but doesn't seem worth the added
complexity.

Also, this would mean that in the common case, the slowest task would be
delayed until after the SanityCheck task instance starts, compiles, and
runs some test :( Your best case is 32sec, but I doubt that's going to
be typical.

I was thinking about the idea of cfbot handling "tasks" separately,
similar to what it used to do with travis/appveyor. The logic for
"windows tasks are only run if linux passes tests" could live there.
That could also be useful if there's ever the possibility of running an
additional OS on another CI provider, or if another provider can run
windows tasks faster, or if we need to reduce our load/dependency on
cirrus. I realized that goes backwards in some ways to the direction
we've gone with cirrus, and I'm not sure how exactly it would do that (I
suppose it might add ci-os-only tags to its commit message).

+    # no options enabled, should be small
+    CCACHE_MAXSIZE: "150M"

Actually, tasks can share caches if the "cache key" is set.

If there was a separate "Sanity" task, I think it should use whatever
flags linux (or freebsd) use to avoid doing two compilations (with lots
of cache misses for patches which modify *.h files, which would then
happen twice, in serial).

+  # use always: to continue after failures. Task that did not run count as a
+  # success, so we need to recheck SanityChecks's condition here ...

- # task that did not run, count as a success, so we need to recheck Linux'
- # condition here ...

Another/better justification/description is that "cirrus warns if the
depending task has different only_if conditions than the dependant task".

--
Justin

#14

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Justin Pryzby (#13)

Hi,

On 2022-10-02 16:35:06 -0500, Justin Pryzby wrote:

Maybe - that would avoid waiting 4 minutes for a windows instance to
start in the (hopefully atypical) case of a patch that fails in 1-2
minutes under linux/freebsd.

If the patch were completely broken, the windows task would take ~4min
to start, plus up to ~4min before failing to compile or failing an early
test. 6-8 minutes isn't nothing, but doesn't seem worth the added
complexity.

Avoiding 6-8mins of wasted windows time would, I think, allow us to crank
cfbot's concurrency up a notch or two.

Also, this would mean that in the common case, the slowest task would be
delayed until after the SanityCheck task instance starts, compiles, and
runs some test :( Your best case is 32sec, but I doubt that's going to
be typical.

Even the worst case isn't that bad, the uncached minimal build is 67s.

I was thinking about the idea of cfbot handling "tasks" separately,
similar to what it used to do with travis/appveyor. The logic for
"windows tasks are only run if linux passes tests" could live there.

I don't really see the advantage of doing that over just increasing
concurrency by a bit.

+    # no options enabled, should be small
+    CCACHE_MAXSIZE: "150M"
Actually, tasks can share caches if the "cache key" is set.
If there was a separate "Sanity" task, I think it should use whatever
flags linux (or freebsd) use to avoid doing two compilations (with lots
of cache misses for patches which modify *.h files, which would then
happen twice, in serial).

I think the price of using exactly the same flags is higher than the gain. And
it'll rarely work if we use the container task for the sanity check, as the
timestamps of the compiler, system headers etc will be different.

+  # use always: to continue after failures. Task that did not run count as a
+  # success, so we need to recheck SanityChecks's condition here ...
- # task that did not run, count as a success, so we need to recheck Linux'
- # condition here ...

Another/better justification/description is that "cirrus warns if the
depending task has different only_if conditions than the dependant task".

That doesn't really seem easier to understand to me.

Greetings,

Andres Freund

#15

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Justin Pryzby (#13)

Hi,

On 2022-10-02 16:35:06 -0500, Justin Pryzby wrote:

On Sun, Oct 02, 2022 at 01:52:01PM -0700, Andres Freund wrote:

On 2022-10-01 18:36:41 -0700, Andres Freund wrote:

I am wondering if we should instead introduce a new "quickcheck" task that
just compiles and runs maybe one test and have *all* other tests depend on
that. Wasting a precious available windows instance to just fail to build or
immediately fail during tests doesn't really make sense.

With a primed cache this takes ~32s, not too bad imo. 12s of that is
cloning the repo.

Maybe - that would avoid waiting 4 minutes for a windows instance to
start in the (hopefully atypical) case of a patch that fails in 1-2
minutes under linux/freebsd.

If the patch were completely broken, the windows task would take ~4min
to start, plus up to ~4min before failing to compile or failing an early
test. 6-8 minutes isn't nothing, but doesn't seem worth the added
complexity.

Btw, the motivation to work on this just now was that I'd like to enable more
sanitizers (undefined,alignment for linux-meson, address for
linux-autoconf). Yes, we could make the dependency on freebsd instead, but I'd
like to try to enable the clang-only memory sanitizer there (if it works on
freebsd)...

Greetings,

Andres Freund

#16

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Andres Freund (#15)

On Sun, Oct 02, 2022 at 02:54:21PM -0700, Andres Freund wrote:

the clang-only memory sanitizer there (if it works on freebsd)...

Have you looked at this much ? I think it'll require a bunch of
exclusions, right ?

--
Justin

#17

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Justin Pryzby (#5)

On Sat, Sep 10, 2022 at 03:05:42PM -0500, Justin Pryzby wrote:

On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote:
On 2022-08-28 12:10:29 -0500, Justin Pryzby wrote:
On Sun, Aug 28, 2022 at 09:07:52AM -0700, Andres Freund wrote:
--- /dev/null
+++ b/src/tools/ci/windows-compiler-warnings
Wouldn't that be doable as something like
sh -c 'if test -s file; then cat file;exit 1; fi"
inside .cirrus.yml?
I had written it inline in a couple ways, like
- sh -exc 'f=msbuild.warn.log; if [ -s "$f" ]; then cat "$f"; exit 1; else exit 0; fi'

but then separated it out as you suggested in
20220227010908.vz2a7dmfzgwg742w@alap3.anarazel.de

after I complained about cmd.exe requiring escaping for && and ||
That makes writing any shell script a bit perilous and a separate script
seems better.
I remember that I suggested it - but note that the way I wrote above doesn't
have anything needing escaping.
It doesn't require it, but that still gives the impression that it's
normally possible to write one-liner shell scripts there, which is
misleading/wrong, and the reason why I took your suggestion to use a
separate script file.

Anyway, what do you think of the multiline split I suggested?

Done, and sorted.

Rewrote this and rebased some of the other stuff on top of the meson
commit, for which I also include some new patches.

#18

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Justin Pryzby (#17)

Hi,

On 2022-11-04 18:54:12 -0500, Justin Pryzby wrote:

Subject: [PATCH 1/8] meson: PROVE is not required
Subject: [PATCH 3/8] meson: rename 'main' tasks to 'regress' and 'isolation'

Pushed, thanks for the patches.

Subject: [PATCH 2/8] meson: other fixes for cygwin

XXX: what about HAVE_BUGGY_STRTOF ?

What are you wondering about here? Shouldn't that continue to be set via
src/include/port/cygwin.h?

diff --git a/src/test/regress/meson.build b/src/test/regress/meson.build
index 3dcfc11278f..6ec3c77af53 100644
--- a/src/test/regress/meson.build
+++ b/src/test/regress/meson.build
@@ -10,7 +10,7 @@ regress_sources = pg_regress_c + files(
# patterns like ".*-.*-mingw.*". We probably can do better, but for now just
# replace 'gcc' with 'mingw' on windows.
host_tuple_cc = cc.get_id()
-if host_system == 'windows' and host_tuple_cc == 'gcc'
+if host_system in ['windows', 'cygwin'] and host_tuple_cc == 'gcc'
host_tuple_cc = 'mingw'
endif

This doesn't quite seem right - shouldn't it say cywin? Not that it makes a
difference right now, given the contents of resultmap:
float4:out:.*-.*-cygwin.*=float4-misrounded-input.out
float4:out:.*-.*-mingw.*=float4-misrounded-input.out

From 0acbbd2fdd97bbafc5c4552e26f92d52c597eea9 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 25 May 2022 21:53:22 -0500
Subject: [PATCH 4/8] cirrus/windows: add compiler_warnings_script

I'm not sure how to write this test in windows shell; it's also not easy to
write it in posix sh, since windows shell is somehow interpretting && and ||...

/messages/by-id/20220212212310.f645c6vw3njkgxka@alap3.anarazel.de

See also:
8a1ce5e54f6d144e4f8e19af7c767b026ee0c956
https://cirrus-ci.com/task/6241060062494720
https://cirrus-ci.com/task/6496366607204352

ci-os-only: windows
---
.cirrus.yml | 10 +++++++++-
src/tools/ci/windows-compiler-warnings | 24 ++++++++++++++++++++++++
2 files changed, 33 insertions(+), 1 deletion(-)
create mode 100755 src/tools/ci/windows-compiler-warnings
diff --git a/.cirrus.yml b/.cirrus.yml
index 9f2282471a9..99ac09dc679 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -451,12 +451,20 @@ task:
build_script: |
vcvarsall x64
-    ninja -C build
+    ninja -C build |tee build/meson-logs/build.txt
+    REM Since pipes lose exit status of the preceding command, rerun compilation,
+    REM without the pipe exiting now if it fails, rather than trying to run checks
+    ninja -C build > nul

This seems mighty grotty :(. but I guess it's quick enough not worry about,
and I can't come up with a better plan.

It doesn't seem quite right to redirect into meson-logs/ to me, my
interpretation is that that's "meson's namespace". Why not just store it in
build/?

From e85fe83fd8a4b4c79a96d2bf66cd6a5e1bdfcd1e Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sat, 26 Feb 2022 19:34:35 -0600
Subject: [PATCH 5/8] cirrus: build docs as a separate task..

This will run the doc build if any docs have changed, even if Linux
fails, to allow catch doc build failures.

This'll automatically show up as a separate "column" on cfbot.

Also, in the future, this will hopefully upload each patch's changed HTML docs
as an artifact, for easy review.

Note that this is currently building docs with both autoconf and meson.

ci-os-only: html
---
.cirrus.yml | 62 +++++++++++++++++++++++++++++++++++++----------------
1 file changed, 44 insertions(+), 18 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index 99ac09dc679..37fd79e5b77 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -472,6 +472,9 @@ task:
type: text/plain
+###
+# Test that code can be built with gcc/clang without warnings
+###
task:
name: CompilerWarnings
@@ -515,10 +518,6 @@ task:
#apt-get update
#DEBIAN_FRONTEND=noninteractive apt-get -y install ...

- ###
- # Test that code can be built with gcc/clang without warnings
- ###
-

Why remove this?

setup_script: echo "COPT=-Werror" > src/Makefile.custom

# Trace probes have a history of getting accidentally broken. Use the
@@ -580,20 +579,6 @@ task:
make -s -j${BUILD_JOBS} clean
time make -s -j${BUILD_JOBS} world-bin

-  ###
-  # Verify docs can be built
-  ###
-  # XXX: Only do this if there have been changes in doc/ since last build
-  always:
-    docs_build_script: |
-      time ./configure \
-        --cache gcc.cache \
-        CC="ccache gcc" \
-        CXX="ccache g++" \
-        CLANG="ccache clang"
-      make -s -j${BUILD_JOBS} clean
-      time make -s -j${BUILD_JOBS} -C doc
-
###
# Verify headerscheck / cpluspluscheck succeed
#
@@ -617,3 +602,44 @@ task:

always:
upload_caches: ccache
+
+
+###
+# Verify docs can be built
+# changesInclude() will skip this task if none of the commits since
+# CIRRUS_LAST_GREEN_CHANGE touched any relevant files. The comparison appears
+# to be like "git log a..b -- ./file", not "git diff a..b -- ./file"
+###
+
+task:
+  name: Documentation
+
+  env:
+    CPUS: 1
+    BUILD_JOBS: 1
+
+  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(docs|html).*'
+  skip: "!changesInclude('.cirrus.yml', 'doc/**')"

Perhaps we should introduce something other than ci-os-only if we want that to
include things like "docs and html". At least this should update
src/tools/ci/README.

+  sysinfo_script: |
+    id
+    uname -a
+    cat /proc/cmdline
+    ulimit -a -H && ulimit -a -S
+    export

I think we can skip this here.

+  # Exercise HTML and other docs:
+  ninja_docs_build_script: |
+    mkdir build.ninja
+    cd build.ninja

Perhaps build-ninja instead? build.ninja is the filename for ninja's build
instructions, so it seems a bit confusing.

From adebe93a4409990e929f2775d45c6613134a4243 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 26 Jul 2022 20:30:02 -0500
Subject: [PATCH 6/8] cirrus/ccache: add explicit cache keys..

Since otherwise, building with ci-os-only will probably fail to use the
normal cache, since the cache key is computed using both the task name
and its *index* in the list of caches (internal/executor/cache.go:184).

Seems like this would potentially better addressed by reporting a bug to the
cirrus folks?

ccache_cache:
folder: ${CCACHE_DIR}
+    fingerprint_key: ccache/linux
+    reupload_on_changes: true

There's enough copies of this to make it worth deduplicating. If we use
something like
fingerprint_script: echo ccache/$CIRRUS_BRANCH/$CIRRUS_OS
we can use a yaml ref?

I think you experimented with creating a 'base' ccache dir (e.g. on the master
branch) and then using branch specific secondar caches? How did that turn out?
I think cfbot's caches constantly get removed due to overrunning the global
space.

From f16739bc5d2087847129baf663aa25fa9edb8449 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 3 Apr 2022 00:10:20 -0500
Subject: [PATCH 7/8] cirrus/ccache: disable compression and show stats

Since v4.0, ccache enables zstd compression by default, saving roughly
2x-3x. But, cirrus caches are compressed as tar.gz, so we could disable
ccache compression, allowing cirrus to gzip the uncompressed data
(better than ccache's default of zstd-1).

With default compression enabled (https://cirrus-ci.com/task/6692342840164352):
linux/debian/bullseye has 4.2; 99MB cirrus cache; cache_size_kibibyte 109616
macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte 52500
freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte 134064
XXX windows has 4.7.2; 180MB cirrus cache; cache_size_kibibyte 51179
todo: compiler warnings

With compression disabled (https://cirrus-ci.com/task/4614182514458624):
linux: 91MB cirrus cache; cache_size_kibibyte 316136
macos: 41MB cirrus cache; cache_size_kibibyte 118068
windows: 42MB cirrus cache; cache_size_kibibyte 134064
freebsd is the same

I'm still somewhat doubtful this is a good idea. The mingw cache is huge, for
example, and all that additional IO and memory usage is bound to show up.

The stats should be shown and/or logged.
ccache --show-stats shows the *cumulative* stats (including prior
compilations)
ccache --zero-stats clears out not only the global stats, but the
per-file cache stats (from which the global stats are derived) - which
obviously makes the cache work poorly.

Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal.
The log should be written *outside* the ccache folder - it shouldn't be
preserved across cirrusci task invocations.

I assume we don't have a new enough ccache everywhere yet?

Greetings,

Andres Freund

#19

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Andres Freund (#18)

On Fri, Nov 04, 2022 at 06:59:46PM -0700, Andres Freund wrote:

Hi,

On 2022-11-04 18:54:12 -0500, Justin Pryzby wrote:

Subject: [PATCH 1/8] meson: PROVE is not required
Subject: [PATCH 3/8] meson: rename 'main' tasks to 'regress' and 'isolation'

Pushed, thanks for the patches.

Thanks.

diff --git a/.cirrus.yml b/.cirrus.yml
index 9f2282471a9..99ac09dc679 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -451,12 +451,20 @@ task:
build_script: |
vcvarsall x64
-    ninja -C build
+    ninja -C build |tee build/meson-logs/build.txt
+    REM Since pipes lose exit status of the preceding command, rerun compilation,
+    REM without the pipe exiting now if it fails, rather than trying to run checks
+    ninja -C build > nul
This seems mighty grotty :(. but I guess it's quick enough not worry about,
and I can't come up with a better plan.

It doesn't seem quite right to redirect into meson-logs/ to me, my
interpretation is that that's "meson's namespace". Why not just store it in
build/?

I put it there so it'd be included with the build artifacts.
Maybe it's worth adding a separate line to artifacts for stuff like
this, and ccache log ?

From e85fe83fd8a4b4c79a96d2bf66cd6a5e1bdfcd1e Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sat, 26 Feb 2022 19:34:35 -0600
Subject: [PATCH 5/8] cirrus: build docs as a separate task..

+  # Exercise HTML and other docs:
+  ninja_docs_build_script: |
+    mkdir build.ninja
+    cd build.ninja
Perhaps build-ninja instead? build.ninja is the filename for ninja's build
instructions, so it seems a bit confusing.

Sure.

Do you think building docs with both autoconf and meson is what's
desirable here ?

I'm not sure if this ought to be combined with/before/after your "move
compilerwarnings task to meson" patch? (Regarding that patch: I
mentioned that it shouldn't use ccache -C, and it should use
meson_log_artifacts.)

From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 26 Jul 2022 20:30:02 -0500
Subject: [PATCH 6/8] cirrus/ccache: add explicit cache keys..

Since otherwise, building with ci-os-only will probably fail to use the
normal cache, since the cache key is computed using both the task name
and its *index* in the list of caches (internal/executor/cache.go:184).

Seems like this would potentially better addressed by reporting a bug to the
cirrus folks?

You said that before, but I don't think so - since they wrote code to do
that, it's odd to file a bug that says that the behavior is wrong. I am
curious why, but it seems delibrate.

/messages/by-id/20220828171029.GO2342@telsasoft.com

There's enough copies of this to make it worth deduplicating. If we
use something like fingerprint_script: echo
ccache/$CIRRUS_BRANCH/$CIRRUS_OS we can use a yaml ref?

I'll look into it...

I think you experimented with creating a 'base' ccache dir (e.g. on the master
branch) and then using branch specific secondar caches?

I have to revisit that sometime.

That's a new feaure in ccache 4.4, which is currently only in macos.
This is another thing that it'd be easier to test by having cfbot
clobber the cirrus.yml rather than committing to postgres repo.
(Technically, it should probably only do use the in-testing cirrus.yml
if the patch it's testing doesn't itself modify .cirrus.yml)

How did that turn out? I think cfbot's caches constantly get removed
due to overrunning the global space.

For cfbot, I don't know if there's much hope that any patch-specific
build artifacts will be cached from the previous run, typically ~24h
prior.

One idea I have, for the "Warnings" task (and maybe linux too), is to
defer pruning until after all the compilations. To avoid LRU pruning
during early tasks causing bad hit ratios of later tasks.

Another thing that probably happens is that task1 starts compiling
patch1, and then another instance of task1 starts compiling patch2. A
bit later, the first instance will upload its ccache result for patch1,
which will be summarily overwritten by the second instance's compilation
result, which doesn't include anything from the first instance.

Also, whenever ccache hits its MAXSIZE threshold, it prunes the cache
down to 80% of the configured size, which probably wipes away everything
from all but the most recent ~20 builds.

I also thought about having separate caches for each compilation in the
warnings task - but that requires too much repeated yaml just for that..

From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 3 Apr 2022 00:10:20 -0500
Subject: [PATCH 7/8] cirrus/ccache: disable compression and show stats

linux/debian/bullseye has 4.2; 99MB cirrus cache; cache_size_kibibyte 109616
macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte 52500
freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte 134064
XXX windows has 4.7.2; 180MB cirrus cache; cache_size_kibibyte 51179

I'm still somewhat doubtful this is a good idea. The mingw cache is huge, for
example, and all that additional IO and memory usage is bound to show up.

I think you're referring to the proposed mingw task which runs under
windows, and not the existing cross-compilation ?

And you're right - I remember this now (I think it's due to PCH?)

In my local copy I'd "unset CCACHE_NOCOMPRESS". But I view that as an
oddity of windows headers, rather than an argument against disabling
compression elsewhere. BTW, freebsd ccache is too old to use
compression.

What about using CCACHE_HARDLINK (which implies no compression) ?

Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal.
The log should be written *outside* the ccache folder - it shouldn't be
preserved across cirrusci task invocations.

I assume we don't have a new enough ccache everywhere yet?

No - see above.

I've added patches to update macos.

--
Justin

#20

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Andres Freund (#15)

Hi,

On 2022-10-02 14:54:21 -0700, Andres Freund wrote:

On 2022-10-02 16:35:06 -0500, Justin Pryzby wrote:

On Sun, Oct 02, 2022 at 01:52:01PM -0700, Andres Freund wrote:

On 2022-10-01 18:36:41 -0700, Andres Freund wrote:

I am wondering if we should instead introduce a new "quickcheck" task that
just compiles and runs maybe one test and have *all* other tests depend on
that. Wasting a precious available windows instance to just fail to build or
immediately fail during tests doesn't really make sense.

With a primed cache this takes ~32s, not too bad imo. 12s of that is
cloning the repo.

Maybe - that would avoid waiting 4 minutes for a windows instance to
start in the (hopefully atypical) case of a patch that fails in 1-2
minutes under linux/freebsd.

If the patch were completely broken, the windows task would take ~4min
to start, plus up to ~4min before failing to compile or failing an early
test. 6-8 minutes isn't nothing, but doesn't seem worth the added
complexity.

Btw, the motivation to work on this just now was that I'd like to enable more
sanitizers (undefined,alignment for linux-meson, address for
linux-autoconf). Yes, we could make the dependency on freebsd instead, but I'd
like to try to enable the clang-only memory sanitizer there (if it works on
freebsd)...

I've used this a bunch on personal branches, and I think it's the way to
go. It doesn't take long, saves a lot of cycles when one pushes something
broken. Starts to runs the CompilerWarnings task after a minimal amount of
sanity checking, instead of having to wait for a task running all tests,
without the waste of running it immediately and failing all the different
configurations, which takes forever.

Greetings,

Andres Freund

#21

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Andres Freund (#20)

#22

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Justin Pryzby (#21)

#23

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Andres Freund (#22)

#24

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Andres Freund (#12)

#25

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Justin Pryzby (#24)

#26

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Andres Freund (#25)

#27

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Andres Freund (#25)

#28

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Justin Pryzby (#27)

#29

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Andres Freund (#25)

#30

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Justin Pryzby (#19)

#31

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Andres Freund (#30)

#32

Thomas Munro

thomas.munro@gmail.com

over 3 years ago

In reply to: Justin Pryzby (#31)

#33

vignesh C

vignesh21@gmail.com

over 3 years ago

In reply to: Thomas Munro (#32)

#34

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Andres Freund (#30)

#35

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Justin Pryzby (#31)

#36

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Justin Pryzby (#35)

#37

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Justin Pryzby (#36)

#38

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Andres Freund (#37)

#39

Thomas Munro

thomas.munro@gmail.com

over 3 years ago

In reply to: Thomas Munro (#32)

#40

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Thomas Munro (#39)

#41

Thomas Munro

thomas.munro@gmail.com

over 3 years ago

In reply to: Tom Lane (#40)

#42

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Thomas Munro (#41)

#43

Peter Eisentraut

peter_e@gmx.net

about 3 years ago

In reply to: Justin Pryzby (#42)

#44

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Peter Eisentraut (#43)

#45

Peter Eisentraut

peter_e@gmx.net

about 3 years ago

In reply to: Justin Pryzby (#44)

#46

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Peter Eisentraut (#45)

#47

Peter Eisentraut

peter_e@gmx.net

about 3 years ago

In reply to: Justin Pryzby (#46)

#48

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Peter Eisentraut (#47)

#49

Peter Eisentraut

peter_e@gmx.net

almost 3 years ago

In reply to: Justin Pryzby (#48)

#50

Justin Pryzby

pryzby@telsasoft.com

almost 3 years ago

In reply to: Justin Pryzby (#36)

#51

Michael Paquier

michael@paquier.xyz

almost 3 years ago

In reply to: Justin Pryzby (#50)

#52

vignesh C

vignesh21@gmail.com

over 2 years ago

In reply to: Michael Paquier (#51)

#53

Michael Paquier

michael@paquier.xyz

over 2 years ago

In reply to: vignesh C (#52)

#54

vignesh C

vignesh21@gmail.com

over 2 years ago

In reply to: Michael Paquier (#53)

#55

Alvaro Herrera

alvherre@2ndquadrant.com

over 2 years ago

In reply to: vignesh C (#54)

#56

Justin Pryzby

pryzby@telsasoft.com

over 2 years ago

In reply to: Alvaro Herrera (#55)

#57

Peter Eisentraut

peter_e@gmx.net

over 2 years ago

In reply to: Justin Pryzby (#56)

#58

Andrey Borodin

amborodin@acm.org

about 2 years ago

In reply to: Peter Eisentraut (#57)

#59

Michael Paquier

michael@paquier.xyz

about 2 years ago

In reply to: Andrey Borodin (#58)

#60

Justin Pryzby

pryzby@telsasoft.com

almost 2 years ago

In reply to: Justin Pryzby (#56)

#61

Nazir Bilal Yavuz

byavuz81@gmail.com

almost 2 years ago

In reply to: Justin Pryzby (#60)

#62

Justin Pryzby

pryzby@telsasoft.com

almost 2 years ago

In reply to: Nazir Bilal Yavuz (#61)

#63

Michael Paquier

michael@paquier.xyz

almost 2 years ago

In reply to: Justin Pryzby (#62)

#64

Nazir Bilal Yavuz

byavuz81@gmail.com

almost 2 years ago

In reply to: Justin Pryzby (#62)

#65

Justin Pryzby

pryzby@telsasoft.com

almost 2 years ago

In reply to: Nazir Bilal Yavuz (#64)

#66

Andres Freund

andres@anarazel.de

almost 2 years ago

In reply to: Justin Pryzby (#65)

#67

Justin Pryzby

pryzby@telsasoft.com

almost 2 years ago

In reply to: Andres Freund (#66)

#68

Andres Freund

andres@anarazel.de

almost 2 years ago

In reply to: Justin Pryzby (#67)

#69

Justin Pryzby

pryzby@telsasoft.com

almost 2 years ago

In reply to: Andres Freund (#68)

#70

Andres Freund

andres@anarazel.de

over 1 year ago

In reply to: Justin Pryzby (#69)

#71

Justin Pryzby

pryzby@telsasoft.com

over 1 year ago

In reply to: Andres Freund (#70)

Re: CI and test improvements

Attachments:

Attachments:

Attachments:

Attachments:

Attachments: