Speed up clean meson builds by ~25%
Building the generated ecpg preproc file can take a long time. You can
check how long using:
ninja -C build src/interfaces/ecpg/preproc/ecpg.p/meson-generated_.._preproc.c.o
This moves that file much closer to the top of our build order, so
building it can be pipelined much better with other files.
It improved clean build times on my machine (10 cores/20 threads) from ~40
seconds to ~30 seconds.
You can check improvements for yourself with:
ninja -C build clean && ninja -C build all
Attachments:
v1-0001-Speed-up-clean-parallel-meson-builds-a-lot.patch
From 1e2ae033c8f32f7fb35601eca17edf4783c57a30 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
Date: Fri, 5 Apr 2024 00:38:02 +0200
Subject: [PATCH v1] Speed up clean parallel meson builds a lot
Building the generated ecpg preproc file can take a long time. You can
check how long using:
ninja -C build src/interfaces/ecpg/preproc/ecpg.p/meson-generated_.._preproc.c.o
This moves that file much closer to the top of our build order, so
building it can be pipelined much better with other files.
It improved clean build times on my machine (10 cores/20 threads) from ~40
seconds to ~30 seconds.
---
src/interfaces/ecpg/meson.build | 2 +-
src/meson.build | 9 ++++++---
2 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/src/interfaces/ecpg/meson.build b/src/interfaces/ecpg/meson.build
index ac42a70a31f..05db32c226e 100644
--- a/src/interfaces/ecpg/meson.build
+++ b/src/interfaces/ecpg/meson.build
@@ -3,9 +3,9 @@
ecpg_targets = []
subdir('include')
+subdir('preproc')
subdir('pgtypeslib')
subdir('ecpglib')
subdir('compatlib')
-subdir('preproc')
alias_target('ecpg', ecpg_targets)
diff --git a/src/meson.build b/src/meson.build
index 65c7d17d08f..6f3946ffd54 100644
--- a/src/meson.build
+++ b/src/meson.build
@@ -1,7 +1,12 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
# libraries that other subsystems might depend upon first, in their respective
-# dependency order
+# dependency order. Apart from "interfaces", which we put at the top because
+# that speeds up parallel builds significantly since our generated ecpg file
+# takes a very long time to build so we want to start that as soon as
+# possible.
+
+subdir('interfaces')
subdir('timezone')
@@ -11,8 +16,6 @@ subdir('bin')
subdir('pl')
-subdir('interfaces')
-
subdir('tools/pg_bsd_indent')
base-commit: 88620824c2a62376e224c4b595b9fe69fb858978
--
2.34.1
On Fri, 5 Apr 2024 at 00:45, Jelte Fennema-Nio <me@jeltef.nl> wrote:
It improved clean build times on my machine (10 cores/20 threads) from ~40
seconds to ~30 seconds.
After discussing this off-list with Bilal, I realized that this gain
is only happening for clang builds on my system. Because those take a
long time, as was also recently discussed in [1]. My builds don't take
nearly as long though. I tried with clang 15 through 18 and they all
took 10-22 seconds to run; clang comes from apt.llvm.org on Ubuntu
22.04.
[1]: /messages/by-id/CA+hUKGLvJ7-=fS-J9kN=aZWrpyqykwqCBbxXLEhUa9831dPFcg@mail.gmail.com
Hi,
On 2024-04-05 15:36:34 +0200, Jelte Fennema-Nio wrote:
On Fri, 5 Apr 2024 at 00:45, Jelte Fennema-Nio <me@jeltef.nl> wrote:
It improved clean build times on my machine (10 cores/20 threads) from ~40
seconds to ~30 seconds.

After discussing this off-list with Bilal, I realized that this gain
is only happening for clang builds on my system. Because those take a
long time, as was also recently discussed in [1]. My builds don't take
nearly as long though. I tried with clang 15 through 18 and they all
took 10-22 seconds to run; clang comes from apt.llvm.org on Ubuntu
22.04.

[1]: /messages/by-id/CA+hUKGLvJ7-=fS-J9kN=aZWrpyqykwqCBbxXLEhUa9831dPFcg@mail.gmail.com
I recommend opening a bug report for clang, best with an already preprocessed
input file.
We're going to need to do something about this from our side as well, I
suspect. The times aren't great with gcc either, even if not as bad as with
clang.
Greetings,
Andres Freund
On Fri, 5 Apr 2024 at 17:24, Andres Freund <andres@anarazel.de> wrote:
I recommend opening a bug report for clang, best with an already preprocessed
input file.
We're going to need to do something about this from our side as well, I
suspect. The times aren't great with gcc either, even if not as bad as with
clang.
Agreed. While not a full solution, I think this patch is still good to
address some of the pain: Waiting 10 seconds at the end of my build
with only 1 of my 10 cores doing anything.
So while this doesn't decrease CPU time spent it does decrease
wall-clock time significantly in some cases, with afaict no downsides.
On Fri, Apr 05, 2024 at 06:19:20PM +0200, Jelte Fennema-Nio wrote:
Agreed. While not a full solution, I think this patch is still good to
address some of the pain: Waiting 10 seconds at the end of my build
with only 1 of my 10 cores doing anything.

So while this doesn't decrease CPU time spent it does decrease
wall-clock time significantly in some cases, with afaict no downsides.
Well, this is also painful with ./configure. So, even if we are going
to move away from it at this point, we still need to support it for a
couple of years. It looks to me that letting the clang folks know
about the situation is the best way forward.
--
Michael
Hello Michael,
08.04.2024 08:23, Michael Paquier wrote:
On Fri, Apr 05, 2024 at 06:19:20PM +0200, Jelte Fennema-Nio wrote:
Agreed. While not a full solution, I think this patch is still good to
address some of the pain: Waiting 10 seconds at the end of my build
with only 1 of my 10 cores doing anything.

So while this doesn't decrease CPU time spent it does decrease
wall-clock time significantly in some cases, with afaict no downsides.

Well, this is also painful with ./configure. So, even if we are going
to move away from it at this point, we still need to support it for a
couple of years. It looks to me that letting the clang folks know
about the situation is the best way forward.
As I wrote in [1], I didn't observe the issue with clang-18, so maybe it
is fixed already.
Perhaps it's worth rechecking...
[1]: /messages/by-id/d2bf3727-bae4-3aee-65f6-caec2c4ebaa8@gmail.com
Best regards,
Alexander
On 05.04.24 18:19, Jelte Fennema-Nio wrote:
On Fri, 5 Apr 2024 at 17:24, Andres Freund <andres@anarazel.de> wrote:
I recommend opening a bug report for clang, best with an already preprocessed
input file.

We're going to need to do something about this from our side as well, I
suspect. The times aren't great with gcc either, even if not as bad as with
clang.

Agreed. While not a full solution, I think this patch is still good to
address some of the pain: Waiting 10 seconds at the end of my build
with only 1 of my 10 cores doing anything.

So while this doesn't decrease CPU time spent it does decrease
wall-clock time significantly in some cases, with afaict no downsides.
I have tested this with various compilers, and I can confirm that this
shaves off about 5 seconds from the build wall-clock time, which
represents about 10%-20% of the total time. I think this is a good patch.
Interestingly, if I apply the analogous changes to the makefiles, I
don't get any significant improvements. (Depends on local
circumstances, of course.) But I would still suggest to keep the
makefiles aligned.
On Mon, 8 Apr 2024 at 10:00, Alexander Lakhin <exclusion@gmail.com> wrote:
As I wrote in [1], I didn't observe the issue with clang-18, so maybe it
is fixed already.
Perhaps it's worth rechecking...
Using the attached script I got these timings. Clang is significantly
slower in all of them. But especially with -Og the difference
is huge.
gcc 11.4.0: 7.276s
clang 18.1.3: 17.216s
gcc 11.4.0 --debug: 7.441s
clang 18.1.3 --debug: 18.164s
gcc 11.4.0 --debug -Og: 2.418s
clang 18.1.3 --debug -Og: 14.864s
I reported this same issue to the LLVM project here:
https://github.com/llvm/llvm-project/issues/87973
Attachments:
On Mon, 8 Apr 2024 at 10:02, Peter Eisentraut <peter@eisentraut.org> wrote:
I have tested this with various compilers, and I can confirm that this
shaves off about 5 seconds from the build wall-clock time, which
represents about 10%-20% of the total time. I think this is a good patch.
Great to hear.
Interestingly, if I apply the analogous changes to the makefiles, I
don't get any significant improvements. (Depends on local
circumstances, of course.) But I would still suggest to keep the
makefiles aligned.
Attached is a patch that also updates the Makefiles, but indeed I
don't get any perf gains there either.
On Mon, 8 Apr 2024 at 07:23, Michael Paquier <michael@paquier.xyz> wrote:
Well, this is also painful with ./configure. So, even if we are going
to move away from it at this point, we still need to support it for a
couple of years. It looks to me that letting the clang folks know
about the situation is the best way forward.
I reported the issue to the clang folks:
https://github.com/llvm/llvm-project/issues/87973
But even if my patch doesn't help for ./configure, it still seems like
a good idea to me to still reduce compile times when using meson while
we wait for clang folks to address the issue.
Attachments:
v2-0001-Speed-up-clean-parallel-meson-builds-a-lot.patch
From 9088dc0baf19b9d4c46fb5bf0a37b32f9902c1cd Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
Date: Fri, 5 Apr 2024 00:38:02 +0200
Subject: [PATCH v2] Speed up clean parallel meson builds a lot
Building the generated ecpg preproc file can take a long time. You can
check how long using:
ninja -C build src/interfaces/ecpg/preproc/ecpg.p/meson-generated_.._preproc.c.o
This moves that file much closer to the top of our build order, so
building it can be pipelined much better with other files.
It improved clean build times on my machine (10 cores/20 threads) from ~40
seconds to ~30 seconds. Performance gains are obviously dependent on
compiler, compile flags and machine.
This also updates our Makefiles in the same way to stay consistent with
the meson files, but I'm unable to get similar wall-clock time
performance gains.
---
src/Makefile | 2 +-
src/interfaces/Makefile | 2 +-
src/interfaces/ecpg/Makefile | 2 +-
src/interfaces/ecpg/meson.build | 2 +-
src/meson.build | 9 ++++++---
5 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/src/Makefile b/src/Makefile
index 2f31a2f20a7..069202f0bd0 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -13,6 +13,7 @@ top_builddir = ..
include Makefile.global
SUBDIRS = \
+ interfaces \
common \
port \
timezone \
@@ -20,7 +21,6 @@ SUBDIRS = \
backend/utils/mb/conversion_procs \
backend/snowball \
include \
- interfaces \
backend/replication/libpqwalreceiver \
backend/replication/pgoutput \
fe_utils \
diff --git a/src/interfaces/Makefile b/src/interfaces/Makefile
index 7d56b29d28f..1f09925b213 100644
--- a/src/interfaces/Makefile
+++ b/src/interfaces/Makefile
@@ -12,7 +12,7 @@ subdir = src/interfaces
top_builddir = ../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = libpq ecpg
+SUBDIRS = ecpg libpq
$(recurse)
diff --git a/src/interfaces/ecpg/Makefile b/src/interfaces/ecpg/Makefile
index 3002bc3c1bb..949f8513c19 100644
--- a/src/interfaces/ecpg/Makefile
+++ b/src/interfaces/ecpg/Makefile
@@ -2,7 +2,7 @@ subdir = src/interfaces/ecpg
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = include pgtypeslib ecpglib compatlib preproc
+SUBDIRS = include preproc pgtypeslib ecpglib compatlib
# Suppress parallel build of subdirectories to avoid a bug in GNU make 3.82, cf
# https://savannah.gnu.org/bugs/?30653
diff --git a/src/interfaces/ecpg/meson.build b/src/interfaces/ecpg/meson.build
index ac42a70a31f..05db32c226e 100644
--- a/src/interfaces/ecpg/meson.build
+++ b/src/interfaces/ecpg/meson.build
@@ -3,9 +3,9 @@
ecpg_targets = []
subdir('include')
+subdir('preproc')
subdir('pgtypeslib')
subdir('ecpglib')
subdir('compatlib')
-subdir('preproc')
alias_target('ecpg', ecpg_targets)
diff --git a/src/meson.build b/src/meson.build
index 65c7d17d08f..6f3946ffd54 100644
--- a/src/meson.build
+++ b/src/meson.build
@@ -1,7 +1,12 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
# libraries that other subsystems might depend upon first, in their respective
-# dependency order
+# dependency order. Apart from "interfaces", which we put at the top because
+# that speeds up parallel builds significantly since our generated ecpg file
+# takes a very long time to build so we want to start that as soon as
+# possible.
+
+subdir('interfaces')
subdir('timezone')
@@ -11,8 +16,6 @@ subdir('bin')
subdir('pl')
-subdir('interfaces')
-
subdir('tools/pg_bsd_indent')
base-commit: 8a1b31e6e59631807a08a4e9465134c343bbdf5e
--
2.34.1
Hello Jelte,
08.04.2024 11:36, Jelte Fennema-Nio wrote:
On Mon, 8 Apr 2024 at 10:00, Alexander Lakhin <exclusion@gmail.com> wrote:
As I wrote in [1], I didn't observe the issue with clang-18, so maybe it
is fixed already.
Perhaps it's worth rechecking...

Using the attached script I got these timings. Clang is significantly
slower in all of them. But especially with -Og the difference
is huge.

gcc 11.4.0: 7.276s
clang 18.1.3: 17.216s
gcc 11.4.0 --debug: 7.441s
clang 18.1.3 --debug: 18.164s
gcc 11.4.0 --debug -Og: 2.418s
clang 18.1.3 --debug -Og: 14.864s

I reported this same issue to the LLVM project here:
https://github.com/llvm/llvm-project/issues/87973
Maybe we're talking about different problems.
At [1] Thomas (and then I) was unhappy with more than 200 seconds
duration...
[1]: /messages/by-id/CA+hUKGLvJ7-=fS-J9kN=aZWrpyqykwqCBbxXLEhUa9831dPFcg@mail.gmail.com
Best regards,
Alexander
Hi,
On Mon, 8 Apr 2024 at 11:00, Alexander Lakhin <exclusion@gmail.com> wrote:
Hello Michael,
08.04.2024 08:23, Michael Paquier wrote:
On Fri, Apr 05, 2024 at 06:19:20PM +0200, Jelte Fennema-Nio wrote:
Agreed. While not a full solution, I think this patch is still good to
address some of the pain: Waiting 10 seconds at the end of my build
with only 1 of my 10 cores doing anything.

So while this doesn't decrease CPU time spent it does decrease
wall-clock time significantly in some cases, with afaict no downsides.

Well, this is also painful with ./configure. So, even if we are going
to move away from it at this point, we still need to support it for a
couple of years. It looks to me that letting the clang folks know
about the situation is the best way forward.

As I wrote in [1], I didn't observe the issue with clang-18, so maybe it
is fixed already.
Perhaps it's worth rechecking...

[1]: /messages/by-id/d2bf3727-bae4-3aee-65f6-caec2c4ebaa8@gmail.com
I had this problem on my local computer. My build times are:
gcc: 20s
clang-15: 24s
clang-16: 105s
clang-17: 111s
clang-18: 25s
--
Regards,
Nazir Bilal Yavuz
Microsoft
On Mon, Apr 08, 2024 at 12:23:56PM +0300, Nazir Bilal Yavuz wrote:
On Mon, 8 Apr 2024 at 11:00, Alexander Lakhin <exclusion@gmail.com> wrote:
As I wrote in [1], I didn't observe the issue with clang-18, so maybe it
is fixed already.
Perhaps it's worth rechecking...

[1]: /messages/by-id/d2bf3727-bae4-3aee-65f6-caec2c4ebaa8@gmail.com
I had this problem on my local computer. My build times are:
gcc: 20s
clang-15: 24s
clang-16: 105s
clang-17: 111s
clang-18: 25s
Interesting. A parallel build of ecpg shows similar numbers here:
clang-16: 101s
clang-17: 112s
clang-18: 14s
gcc: 10s
Most of the time is still spent on preproc.c with clang-18, but that's
much, much faster (the default version of clang is 16 on Debian sid where
I've run these numbers).
--
Michael
On Tue, Apr 9, 2024 at 5:01 PM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Apr 08, 2024 at 12:23:56PM +0300, Nazir Bilal Yavuz wrote:
On Mon, 8 Apr 2024 at 11:00, Alexander Lakhin <exclusion@gmail.com> wrote:
As I wrote in [1], I didn't observe the issue with clang-18, so maybe it
is fixed already.
Perhaps it's worth rechecking...

[1]: /messages/by-id/d2bf3727-bae4-3aee-65f6-caec2c4ebaa8@gmail.com
I had this problem on my local computer. My build times are:
gcc: 20s
clang-15: 24s
clang-16: 105s
clang-17: 111s
clang-18: 25s

Interesting. A parallel build of ecpg shows similar numbers here:
clang-16: 101s
clang-17: 112s
clang-18: 14s
gcc: 10s
I don't expect it to get fixed BTW, because it's present in 16.0.6,
and .6 is the terminal release, if I understand their system
correctly. They're currently only doing bug fixes for 18, and even
there not for much longer. Interesting that not everyone saw this at
first, perhaps the bug arrived in a minor release that some people
didn't have yet? Or perhaps there is something special required to
trigger it?
Hi,
On 2024-04-09 17:13:52 +1200, Thomas Munro wrote:
On Tue, Apr 9, 2024 at 5:01 PM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Apr 08, 2024 at 12:23:56PM +0300, Nazir Bilal Yavuz wrote:
On Mon, 8 Apr 2024 at 11:00, Alexander Lakhin <exclusion@gmail.com> wrote:
As I wrote in [1], I didn't observe the issue with clang-18, so maybe it
is fixed already.
Perhaps it's worth rechecking...

[1]: /messages/by-id/d2bf3727-bae4-3aee-65f6-caec2c4ebaa8@gmail.com
I had this problem on my local computer. My build times are:
gcc: 20s
clang-15: 24s
clang-16: 105s
clang-17: 111s
clang-18: 25s

Interesting. A parallel build of ecpg shows similar numbers here:
clang-16: 101s
clang-17: 112s
clang-18: 14s
gcc: 10s

I don't expect it to get fixed BTW, because it's present in 16.0.6,
and .6 is the terminal release, if I understand their system
correctly. They're currently only doing bug fixes for 18, and even
there not for much longer. Interesting that not everyone saw this at
first, perhaps the bug arrived in a minor release that some people
didn't have yet? Or perhaps there is something special required to
trigger it?
I think we need to do something about the compile time of this file, even with
gcc. Our main grammar already is an issue and stacking all the ecpg stuff on
top makes it considerably worse.
ISTM there's a bunch of pretty pointless stuff in the generated preproc.y,
which does seem to have some impact on compile time. E.g. a good bit of the file
is just stuff like
reserved_keyword:
ALL
{
$$ = mm_strdup("all");
}
...
Why are we strduping all of these? We could instead just use the value of the
token, instead of forcing the compiler to generate branches for all individual
keywords etc.
I don't know off-hand if the keyword lookup machinery ends up with an
uppercase keyword, but if so, that'd be easy enough to change.
It looks to me like the many calls to mm_strdup() might actually be
what's driving clang nuts. I hacked up preproc.y to not need those calls for
unreserved_keyword
col_name_keyword
type_func_name_keyword
reserved_keyword
bare_label_keyword
by removing the actions and defining those tokens to be of type str. There are
many more such calls that could be dealt with similarly.
That alone reduced compile times with:
clang-16 -O1 from 18.268s to 12.516s
clang-16 -O2 from 345.188 to 158.084s
clang-19 -O2 from 26.018s to 15.200s
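The shape of that hack, very roughly (a hedged sketch; the rule names follow preproc.y, but the declarations shown here are illustrative, not the actual experimental diff):

```yacc
/* Illustrative sketch: if the keyword tokens carry their string value,
   the keyword rules can rely on bison's default action ($$ = $1)
   instead of one mm_strdup() branch per keyword. */
%type <str> unreserved_keyword col_name_keyword type_func_name_keyword
%type <str> reserved_keyword bare_label_keyword

reserved_keyword:
      ALL
    | ANALYSE
    | ANALYZE
    /* ...no actions: default $$ = $1, no per-keyword branch... */
    ;
```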
I suspect what is happening is that clang tries to optimize the number of
calls to mm_strdup(), by separating the argument setup from the function
call. Which leads to a control flow graph with *many* incoming edges to the
basic block containing the function call to mm_strdup(), triggering a normally
harmless O(N^2) or such.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
I think we need to do something about the compile time of this file, even with
gcc. Our main grammar already is an issue and stacking all the ecpg stuff on
top makes it considerably worse.
Seems reasonable, if we can.
Why are we strduping all of these?
IIRC, the issue is that the mechanism for concatenating the tokens
back together frees the input strings:
static char *
cat2_str(char *str1, char *str2)
{
char * res_str = (char *)mm_alloc(strlen(str1) + strlen(str2) + 2);
strcpy(res_str, str1);
if (strlen(str1) != 0 && strlen(str2) != 0)
strcat(res_str, " ");
strcat(res_str, str2);
free(str1); <------------------
free(str2); <------------------
return res_str;
}
So that ought to dump core if you don't make all the productions
return malloc'd strings. How did you work around that?
(Maybe it'd be okay to just leak all the strings?)
regards, tom lane
Hi,
On 2024-04-09 15:33:10 -0700, Andres Freund wrote:
Which leads to a control flow graph with *many* incoming edges to the basic
block containing the function call to mm_strdup(), triggering a normally
harmless O(N^2) or such.
With clang-16 -O2 there is a basic block with 3904 incoming basic blocks. With
the hacked up preproc.y it's 2968. A 30% increase leading to a doubling of
runtime imo seems consistent with my theory of there being some ~quadratic
behaviour.
I suspect that this is also what's causing gram.c compilation to be fairly
slow, with both clang and gcc. There aren't as many pstrdup()s in gram.y as
the are mm_strdup() in preproc.y, but there still are many.
ISTM that there are many pstrdup()s that really make very little sense. I
think it's largely because there are many rules declared %type <str>, which
prevents us from returning a string constant without a warning.
There may be (?) some rules that modify strings returned by subsidiary rules,
but if so, it can't be many.
Greetings,
Andres
Hi,
On 2024-04-09 19:00:41 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
I think we need to do something about the compile time of this file, even with
gcc. Our main grammar already is an issue and stacking all the ecpg stuff on
top makes it considerably worse.

Seems reasonable, if we can.
Why are we strduping all of these?
IIRC, the issue is that the mechanism for concatenating the tokens
back together frees the input strings
Ah, that explains it - but also seems somewhat unnecessary.
So that ought to dump core if you don't make all the productions
return malloc'd strings. How did you work around that?
I just tried to get to the point of understanding the reasons for slow
compilation, not to actually keep it working :). I.e. I didn't.
(Maybe it'd be okay to just leak all the strings?)
Hm. The input to ecpg can be fairly large, I guess. And we have fun code like
cat_str(), which afaict is O(arguments^2) in its memory usage if we wouldn't
free?
Not immediately sure what the right path is.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2024-04-09 19:00:41 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
Why are we strduping all of these?
IIRC, the issue is that the mechanism for concatenating the tokens
back together frees the input strings
Ah, that explains it - but also seems somewhat unnecessary.
I experimented with replacing mm_strdup() with
#define mm_strdup(x) (x)
As you did, I wasn't trying to get to a working result, so I didn't do
anything about removing all the free's or fixing the cast-away-const
warnings. The result was disappointing though. On my Mac laptop
(Apple clang version 15.0.0), the compile time for preproc.o went from
6.7sec to 5.5sec. Which is better, but not enough better to persuade
me to do all the janitorial work of restructuring ecpg's
string-slinging. I think we haven't really identified the problem.
As a comparison point, compiling gram.o on the same machine
takes 1.3sec. So I am seeing a problem here, sure enough,
although not as bad as it is in some other clang versions.
regards, tom lane
Hi,
On 2024-04-09 19:44:03 -0400, Tom Lane wrote:
I experimented with replacing mm_strdup() with
#define mm_strdup(x) (x)
As you did, I wasn't trying to get to a working result, so I didn't do
anything about removing all the free's or fixing the cast-away-const
warnings. The result was disappointing though. On my Mac laptop
(Apple clang version 15.0.0), the compile time for preproc.o went from
6.7sec to 5.5sec. Which is better, but not enough better to persuade
me to do all the janitorial work of restructuring ecpg's
string-slinging. I think we haven't really identified the problem.
With what level of optimization was that? It kinda looks like their version
might be from before the worst of the issue...
FWIW, just redefining mm_strdup() that way doesn't help much here either,
even with an affected compiler. The gain increases substantially after
simplifying unreserved_keyword etc to just use the default action.
I think having the non-default actions for those branches leaves you with a
similar issue, each of the actions just set a register, storing that and going
to the loop iteration is the same.
FWIW:
clang-19 -O2
"plain" 0m24.354s
mm_strdup redefined 0m23.741s
+use default action 0m14.218s
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2024-04-09 19:44:03 -0400, Tom Lane wrote:
As you did, I wasn't trying to get to a working result, so I didn't do
anything about removing all the free's or fixing the cast-away-const
warnings. The result was disappointing though. On my Mac laptop
(Apple clang version 15.0.0), the compile time for preproc.o went from
6.7sec to 5.5sec. Which is better, but not enough better to persuade
me to do all the janitorial work of restructuring ecpg's
string-slinging. I think we haven't really identified the problem.
With what level of optimization was that? It kinda looks like their version
might be from before the worst of the issue...
Just the autoconf-default -O2.
FWIW, just redefining mm_strdup() that way doesn't help much here either,
even with an affected compiler. The gain increases substantially after
simplifying unreserved_keyword etc to just use the default action.
Hm.
In any case, this is all moot unless we can come to a new design for
how ecpg does its string-mashing. Thoughts?
I thought for a bit about not allocating strings as such, but just
passing around pointers into the source text plus lengths, and
reassembling the string data only at the end when we need to output it.
Not sure how well that would work, but it could be a starting point.
regards, tom lane
Hi,
On 2024-04-09 20:12:48 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
FWIW, just redefining mm_strdup() that way doesn't help much here either,
even with an affected compiler. The gain increases substantially after
simplifying unreserved_keyword etc to just use the default action.

Hm.
In any case, this is all moot unless we can come to a new design for
how ecpg does its string-mashing. Thoughts?
There might be a quick-n-dirty way: We could make pgc.l return dynamically
allocated keywords.
Am I missing something, or is ecpg string handling almost comically
inefficient? Building up strings in tiny increments, which then get mashed
together to get slightly larger pieces, just to then be mashed together again?
It's like an intentional allocator stress test.
It's particularly absurd because in the end we just print those strings, after
carefully assembling them...
I thought for a bit about not allocating strings as such, but just
passing around pointers into the source text plus lengths, and
reassembling the string data only at the end when we need to output it.
Not sure how well that would work, but it could be a starting point.
I was wondering about something vaguely similar: Instead of concatenating
individual strings, append them to a stringinfo. The stringinfo can be reset
individually...
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2024-04-09 20:12:48 -0400, Tom Lane wrote:
In any case, this is all moot unless we can come to a new design for
how ecpg does its string-mashing. Thoughts?
Am I missing something, or is ecpg string handling almost comically
inefficient? Building up strings in tiny increments, which then get mashed
together to get slightly larger pieces, just to then be mashed together again?
It's like an intentional allocator stress test.
It's particularly absurd because in the end we just print those strings, after
carefully assembling them...
It is that. Here's what I'm thinking: probably 90% of what ecpg
does is to verify that a chunk of its input represents a valid bit
of SQL (or C) syntax and then emit a representation of that chunk.
Currently, that representation tends to case-normalize tokens and
smash inter-token whitespace and comments to a single space.
I propose though that neither of those behaviors is mission-critical,
or even all that desirable. I think few users would complain if
ecpg preserved the input's casing and spacing and comments.
Given that definition, most of ecpg's productions (certainly just
about all the auto-generated ones) would simply need to return a
pointer and length describing a part of the input string. There are
places where ecpg wants to insert some text it generates, and I think
it might need to re-order text in a few places, so we need a
production result representation that can cope with those cases ---
but if we can make "regurgitate the input" cases efficient, I think
we'll have licked the performance problem.
With that in mind, I wonder whether we couldn't make the simple
cases depend on bison's existing support for location tracking.
In which case, the actual actions for all those cases could be
default, achieving one of the goals you mention.
Obviously, this is not going to be a small lift, but it kind
of seems do-able.
regards, tom lane
On Wed, Apr 10, 2024 at 11:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
... On my Mac laptop
(Apple clang version 15.0.0), the compile time for preproc.o went from
6.7sec to 5.5sec.
Having seen multi-minute compile times on FreeBSD (where clang is the
system compiler) and Debian (where I get packages from apt.llvm.org),
I have been quietly waiting for this issue to hit Mac users too (where
a clang with unknown proprietary changes is the system compiler), but
it never did. Huh.
I tried to understand a bit more about Apple's version soup. This
seems to be an up-to-date table (though I don't understand their
source of information):
https://en.wikipedia.org/wiki/Xcode#Xcode_15.0_-_(since_visionOS_support)_2
According to cc -v on my up-to-date MacBook Air, it has "Apple clang
version 15.0.0 (clang-1500.3.9.4)", which, if the table is correct,
means that it's using LLVM 16.0.0 (note, not 16.0.6, the final version
of that branch of [open] LLVM, and the version I saw the issue with on
FreeBSD and Debian). They relabel everything to match the Xcode
version that shipped it, and they're currently off by one.
I wondered if perhaps the table just wasn't accurate in the final
digits, so I looked for clues in strings in the binary, and sure
enough it contains "LLVM 15.0.0". My guess would be that they've
clobbered the major version, but not the rest: the Xcode version is
15.3, and I don't see a 3, so I guess this is really derived from LLVM
16.0.0.
One explanation would be that they rebase their proprietary bits and
pieces over the .0 version of each major release, and then cherry-pick
urgent fixes and stuff later, not pulling in the whole minor release;
they also presumably have to maintain it for much longer than the LLVM
project's narrow support window. Who knows. So now I wonder if it
could be that LLVM 16.0.6 does this, but LLVM 16.0.0 doesn't.
I installed clang-16 (16.0.6) with MacPorts, and it does show the problem.
Thomas Munro <thomas.munro@gmail.com> writes:
On Wed, Apr 10, 2024 at 11:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
... On my Mac laptop
(Apple clang version 15.0.0), the compile time for preproc.o went from
6.7sec to 5.5sec.
Having seen multi-minute compile times on FreeBSD (where clang is the
system compiler) and Debian (where I get packages from apt.llvm.org),
I have been quietly waiting for this issue to hit Mac users too (where
a clang with unknown proprietary changes is the system compiler), but
it never did. Huh.
I don't doubt that there are other clang versions where the problem
bites a lot harder. What result do you get from the test I tried
(turning mm_strdup into a no-op macro)?
regards, tom lane
On Wed, Apr 10, 2024 at 5:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I don't doubt that there are other clang versions where the problem
bites a lot harder. What result do you get from the test I tried
(turning mm_strdup into a no-op macro)?
#define mm_strdup(x) (x) does this:
Apple clang 15: master: 14s -> 9s
MacPorts clang 16, master: 170s -> 10s
Thomas Munro <thomas.munro@gmail.com> writes:
On Wed, Apr 10, 2024 at 5:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I don't doubt that there are other clang versions where the problem
bites a lot harder. What result do you get from the test I tried
(turning mm_strdup into a no-op macro)?
#define mm_strdup(x) (x) does this:
Apple clang 15: master: 14s -> 9s
MacPorts clang 16, master: 170s -> 10s
Wow. So (a) it's definitely worth doing something about this;
but (b) anything that would move the needle would probably require
significant refactoring of ecpg's string handling, and I doubt we
want to consider that post-feature-freeze. The earliest we could
land such a fix would be ~ July, if we follow past schedules.
The immediate question then is do we want to take Jelte's patch
as a way to ameliorate the pain meanwhile. I'm kind of down on
it, because AFAICS what would happen if you break the core
grammar is that (most likely) the failure would be reported
against ecpg first. That seems pretty confusing. However, maybe
it's tolerable as a short-term band-aid that we plan to revert later.
I would not commit the makefile side of the patch though, as
that creates the same problem for makefile users while providing
little benefit.
As for the longer-term fix, I'd be willing to have a go at
implementing the sketch I wrote last night; but I'm also happy
to defer to anyone else who's hot to work on this.
regards, tom lane
On 10.04.24 17:33, Tom Lane wrote:
The immediate question then is do we want to take Jelte's patch
as a way to ameliorate the pain meanwhile. I'm kind of down on
it, because AFAICS what would happen if you break the core
grammar is that (most likely) the failure would be reported
against ecpg first. That seems pretty confusing.
Yeah that would be confusing.
I suppose we could just take the part of the patch that moves up preproc
among the subdirectories of ecpg, but I don't know if that by itself
would really buy anything.
On Wed, 17 Apr 2024 at 13:41, Peter Eisentraut <peter@eisentraut.org> wrote:
On 10.04.24 17:33, Tom Lane wrote:
The immediate question then is do we want to take Jelte's patch
as a way to ameliorate the pain meanwhile. I'm kind of down on
it, because AFAICS what would happen if you break the core
grammar is that (most likely) the failure would be reported
against ecpg first. That seems pretty confusing.
Yeah that would be confusing.
How can I test if this actually happens? Because it sounds like if
that indeed happens it should be fixable fairly easily.
Jelte Fennema-Nio <postgres@jeltef.nl> writes:
How can I test if this actually happens? Because it sounds like if
that indeed happens it should be fixable fairly easily.
Break gram.y (say, misspell some token in a production) and
see what happens. The behavior's likely to be timing sensitive
though.
regards, tom lane
On Wed, 17 Apr 2024 at 16:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Break gram.y (say, misspell some token in a production) and
see what happens. The behavior's likely to be timing sensitive
though.
Thanks for clarifying. It took me a little while to break gram.y in
such a way that I was able to consistently reproduce, but I managed in
the end using the attached small diff.
And then running ninja in non-parallel mode:
ninja -C build all -j1
As I expected this problem was indeed fairly easy to address by still
building "backend/parser" before "interfaces". See attached patch.
Attachments:
v1-0001-Speed-up-clean-parallel-meson-builds-a-lot.patch (application/octet-stream)
From f8156314eaf9d9df3cb4d20e281bacf5cd853017 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
Date: Fri, 5 Apr 2024 00:38:02 +0200
Subject: [PATCH v1] Speed up clean parallel meson builds a lot
Building the generated ecpg preproc file can take a long time. You can
check how long using:
ninja -C build src/interfaces/ecpg/preproc/ecpg.p/meson-generated_.._preproc.c.o
This moves that file much closer to the top of our build order, so
building it can be pipelined much better with other files.
It improved clean build times on my machine (10 cores/20 threads) from ~40
seconds to ~30 seconds. Performance gains are obviously dependent on
compiler, compile flags and machine.
---
src/backend/meson.build | 9 +--------
src/interfaces/ecpg/meson.build | 2 +-
src/meson.build | 21 ++++++++++++++++++---
3 files changed, 20 insertions(+), 12 deletions(-)
diff --git a/src/backend/meson.build b/src/backend/meson.build
index 436c04af080..f8f86c8a8ed 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -1,12 +1,6 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
-backend_build_deps = [backend_code]
-backend_sources = []
-backend_link_with = [pgport_srv, common_srv]
-
-generated_backend_sources = []
-post_export_backend_sources = []
-
+# NB: parser is entered directly from the meson file in the src directory
subdir('access')
subdir('archive')
subdir('backup')
@@ -21,7 +15,6 @@ subdir('libpq')
subdir('main')
subdir('nodes')
subdir('optimizer')
-subdir('parser')
subdir('partitioning')
subdir('port')
subdir('postmaster')
diff --git a/src/interfaces/ecpg/meson.build b/src/interfaces/ecpg/meson.build
index ac42a70a31f..05db32c226e 100644
--- a/src/interfaces/ecpg/meson.build
+++ b/src/interfaces/ecpg/meson.build
@@ -3,9 +3,9 @@
ecpg_targets = []
subdir('include')
+subdir('preproc')
subdir('pgtypeslib')
subdir('ecpglib')
subdir('compatlib')
-subdir('preproc')
alias_target('ecpg', ecpg_targets)
diff --git a/src/meson.build b/src/meson.build
index 65c7d17d08f..f331eb3b68c 100644
--- a/src/meson.build
+++ b/src/meson.build
@@ -1,7 +1,24 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+backend_build_deps = [backend_code]
+backend_sources = []
+backend_link_with = [pgport_srv, common_srv]
+
+generated_backend_sources = []
+post_export_backend_sources = []
+
# libraries that other subsystems might depend upon first, in their respective
-# dependency order
+# dependency order. Apart from "interfaces", which we put at the top because
+# that speeds up parallel builds significantly since our generated ecpg file
+# takes a very long time to build so we want to start that as soon as
+# possible. We also move backend/parser to the top for the same reason, but
+# also because we want to start building the parser before the ecpg files.
+# Otherwise an error in the grammar would confusingly show up as an issue in
+# the ecpg file.
+
+subdir('backend/parser')
+
+subdir('interfaces')
subdir('timezone')
@@ -11,8 +28,6 @@ subdir('bin')
subdir('pl')
-subdir('interfaces')
-
subdir('tools/pg_bsd_indent')
base-commit: 0bd4b0689ba1f12fbbd9919ca76a71df3e7702a2
--
2.34.1
trigger-confusing-build-error.diff (application/octet-stream)
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index e8b619926ef..52bd2f792d7 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4408,7 +4408,7 @@ columnList:
;
optionalPeriodName:
- ',' PERIOD columnElem { $$ = $3; }
+ ',' columnElem { $$ = $2; }
| /*EMPTY*/ { $$ = NULL; }
;
Jelte Fennema-Nio <postgres@jeltef.nl> writes:
As I expected this problem was indeed fairly easy to address by still
building "backend/parser" before "interfaces". See attached patch.
I think we should hold off on this. I found a simpler way to address
ecpg's problem than what I sketched upthread. I have a not-ready-to-
show-yet patch that allows the vast majority of ecpg's grammar
productions to use the default semantic action. Testing on my M1
Macbook with clang 16.0.6 from MacPorts, I see the compile time for
preproc.o in HEAD as about 1m44 sec; but with this patch, 0.6 sec.
The core idea of the patch is to get rid of <str> results from
grammar nonterminals and instead pass the strings back as yylloc
results, which we can do by redefining YYLTYPE as "char *"; since
ecpg isn't using Bison's location logic for anything, this is free.
Then we can implement a one-size-fits-most token concatenation
rule in YYLLOC_DEFAULT, and only the various handmade rules that
don't want to just concatenate their inputs need to do something
different.
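To make the idea concrete, here is a hypothetical sketch of the concatenation rule, not Tom's actual patch; the helper name `concat_tokens` and the exact macro shape are invented for illustration:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Invented helper, for illustration only: join n token strings with
 * single spaces, the way a default-concatenation rule could rebuild
 * the input text of a production that has no handwritten action.
 */
static char *
concat_tokens(const char *const *parts, int n)
{
	size_t		len = 1;		/* terminating NUL */
	char	   *result;

	for (int i = 0; i < n; i++)
		len += strlen(parts[i]) + 1;	/* token plus separating space */

	result = malloc(len);
	result[0] = '\0';
	for (int i = 0; i < n; i++)
	{
		if (i > 0)
			strcat(result, " ");
		strcat(result, parts[i]);
	}
	return result;
}

/*
 * With YYLTYPE redefined as a string pointer, the grammar could then
 * install a one-size-fits-most location rule along these lines:
 *
 *	#define YYLTYPE char *
 *	#define YYLLOC_DEFAULT(Cur, Rhs, N) \
 *		((Cur) = (N) ? concat_rhs(Rhs, N) : "")
 *
 * so only the rules that must not simply concatenate their inputs
 * need an explicit semantic action.
 */
```

As Tom notes, the naive version leaks every intermediate string, which is why the real patch needs better memory management before it can be shown.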
The patch presently passes regression tests, but its memory management
is shoddy as can be (basically "leak like there's no tomorrow"), and
I want to fix that before presenting it. One could almost argue that
we don't care about memory consumption of the ecpg preprocessor;
but I think it's possible to do better.
regards, tom lane
On 2024-04-17 23:10:53 -0400, Tom Lane wrote:
Jelte Fennema-Nio <postgres@jeltef.nl> writes:
As I expected this problem was indeed fairly easy to address by still
building "backend/parser" before "interfaces". See attached patch.
I think we should hold off on this. I found a simpler way to address
ecpg's problem than what I sketched upthread. I have a not-ready-to-
show-yet patch that allows the vast majority of ecpg's grammar
productions to use the default semantic action. Testing on my M1
Macbook with clang 16.0.6 from MacPorts, I see the compile time for
preproc.o in HEAD as about 1m44 sec; but with this patch, 0.6 sec.
That's pretty amazing.
Andres Freund <andres@anarazel.de> writes:
On 2024-04-17 23:10:53 -0400, Tom Lane wrote:
I think we should hold off on this. I found a simpler way to address
ecpg's problem than what I sketched upthread. I have a not-ready-to-
show-yet patch that allows the vast majority of ecpg's grammar
productions to use the default semantic action. Testing on my M1
Macbook with clang 16.0.6 from MacPorts, I see the compile time for
preproc.o in HEAD as about 1m44 sec; but with this patch, 0.6 sec.
That's pretty amazing.
Patch posted at [1].
[1] https://postgr.es/m/2011420.1713493114@sss.pgh.pa.us
regards, tom lane
On Wed, Apr 17, 2024 at 11:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think we should hold off on this. I found a simpler way to address
ecpg's problem than what I sketched upthread. I have a not-ready-to-
show-yet patch that allows the vast majority of ecpg's grammar
productions to use the default semantic action. Testing on my M1
Macbook with clang 16.0.6 from MacPorts, I see the compile time for
preproc.o in HEAD as about 1m44 sec; but with this patch, 0.6 sec.
If this is the consensus opinion, then
https://commitfest.postgresql.org/48/4914/ should be marked Rejected.
However, while I think the improvements that Tom was able to make here
sound fantastic, I don't understand why that's an argument against
Jelte's patches. After all, Tom's work will only go into v18, but this
patch could be adopted in v17 and back-patched to all releases that
support meson builds, saving oodles of compile time for as long as
those releases are supported. The most obvious beneficiary of that
course of action would seem to be Tom himself, since he back-patches
more fixes than anybody, last I checked, but it'd also be useful to
get slightly quicker results from the buildfarm and slightly quicker
results for anyone using CI on back-branches and for other hackers who
are looking to back-patch bug fixes. I don't quite understand why we
want to throw those potential benefits out the window just because we
have a better fix for the future.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
If this is the consensus opinion, then
https://commitfest.postgresql.org/48/4914/ should be marked Rejected.
However, while I think the improvements that Tom was able to make here
sound fantastic, I don't understand why that's an argument against
Jelte's patches. After all, Tom's work will only go into v18, but this
patch could be adopted in v17 and back-patched to all releases that
support meson builds, saving oodles of compile time for as long as
those releases are supported. The most obvious beneficiary of that
course of action would seem to be Tom himself, since he back-patches
more fixes than anybody, last I checked, but it'd also be useful to
get slightly quicker results from the buildfarm and slightly quicker
results for anyone using CI on back-branches and for other hackers who
are looking to back-patch bug fixes. I don't quite understand why we
want to throw those potential benefits out the window just because we
have a better fix for the future.
As I mentioned upthread, I'm more worried about confusing error
reports than the machine time. It would save me personally exactly
nada, since (a) I usually develop with gcc not clang, (b) when
I do use clang it's not a heavily-affected version, and (c) since
we *very* seldom change the grammar in stable branches, ccache will
hide the problem pretty effectively anyway in the back branches.
(If you're not using ccache, please don't complain about build time.)
I grant that there are people who are more affected, but still, I'd
just as soon not contort the build rules for a temporary problem.
regards, tom lane
On Fri, May 17, 2024 at 4:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
As I mentioned upthread, I'm more worried about confusing error
reports than the machine time.
Well, Jelte fixed that.
I grant that there are people who are more affected, but still, I'd
just as soon not contort the build rules for a temporary problem.
Arguably by doing this, but I don't think it's enough of a contortion
to get excited about.
Anyone else want to vote?
--
Robert Haas
EDB: http://www.enterprisedb.com
On 17.05.24 23:01, Robert Haas wrote:
On Fri, May 17, 2024 at 4:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
As I mentioned upthread, I'm more worried about confusing error
reports than the machine time.
Well, Jelte fixed that.
I grant that there are people who are more affected, but still, I'd
just as soon not contort the build rules for a temporary problem.
Arguably by doing this, but I don't think it's enough of a contortion
to get excited about.
Anyone else want to vote?
I retested the patch from 2024-04-07 (I think that's the one that "fixed
that"? There are multiple "v1" patches in this thread.) using gcc-14
and clang-18, with ccache disabled of course. The measured effects of
the patch are:
gcc-14: 1% slower
clang-18: 3% faster
So with that, it doesn't seem very interesting.
On 2024-May-17, Robert Haas wrote:
Anyone else want to vote?
I had pretty much the same thought as you. It seems a waste to leave
the code in existing branches slow only because we have a much better
approach for a branch that doesn't even exist yet.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"We're here to devour each other alive" (Hobbes)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
On 2024-May-17, Robert Haas wrote:
Anyone else want to vote?
I had pretty much the same thought as you. It seems a waste to leave
the code in existing branches slow only because we have a much better
approach for a branch that doesn't even exist yet.
I won't complain too loudly as long as we remember to revert the
patch from HEAD once the ecpg fix goes in.
regards, tom lane
On Sat, May 18, 2024 at 6:09 PM Andres Freund <andres@anarazel.de> wrote:
A few tests with ccache disabled:
These tests seem to show no difference between the two releases, so I
wonder what you're intending to demonstrate here.
Locally, I have ninja 1.11.1. I'm sure Andres will be absolutely
shocked, shocked I say, to hear that I haven't upgraded to the very
latest.
Anyway, I tried a clean build with CC=clang without the patch and then
with the patch and got:
unpatched - real 1m9.872s
patched - real 1m6.130s
That's the time to run my script that first calls meson setup and then
afterward runs ninja. I tried ninja -v and put the output to a file.
With the patch:
[292/2402] /opt/local/bin/perl
../pgsql/src/interfaces/ecpg/preproc/parse.pl --srcdir
../pgsql/src/interfaces/ecpg/preproc --parser
../pgsql/src/interfaces/ecpg/preproc/../../../backend/parser/gram.y
--output src/interfaces/ecpg/preproc/preproc.y
And without:
[1854/2402] /opt/local/bin/perl
../pgsql/src/interfaces/ecpg/preproc/parse.pl --srcdir
../pgsql/src/interfaces/ecpg/preproc --parser
../pgsql/src/interfaces/ecpg/preproc/../../../backend/parser/gram.y
--output src/interfaces/ecpg/preproc/preproc.y
With my usual CC='ccache clang', I get real 0m37.500s unpatched and
real 0m37.786s patched. I also tried this with the original v1 patch
(not to be confused with the revised, still-v1 patch) and got real
37.950s.
So I guess I'm now feeling pretty unexcited about this. If the patch
really did what the subject line says, that would be nice to have.
However, since Tom will patch the underlying problem in v18, this will
only help people building the back-branches. Of those, it will only
help the people using a slow compiler; I read the thread as suggesting
that you need to be using clang rather than gcc and also not using
ccache. Plus, it sounds like you also need an old ninja version. Even
then, the highest reported savings is 10s and I only save 3s. I think
that's a small enough savings with enough conditionals that we should
not bother. It seems fine to say that people who need the fastest
possible builds of back-branches should use at least one of gcc,
ccache, and ninja >= 1.12.
I'll go mark this patch Rejected in the CommitFest. Incidentally, this
thread is an excellent object lesson in why it's so darn hard to cut
down the size of the CF. I mean, this is a 7-line patch and we've now
got a 30+ email thread about it. In my role as temporary
self-appointed wielder of the CommitFest mace, I have now spent
probably a good 2 hours trying to figure out what to do about it.
There are more than 230 active patches in the CommitFest. That means
CommitFest management would be more than a full-time job for an
experienced committer even if every patch were as simple as this one,
which is definitely not the case. If we want to restore some sanity
here, we're going to have to find some way of distributing the burden
across more people, including the patch authors. Note that Jelte's
last update to the thread is over a month ago. I'm not saying that
he's done something wrong, but I do strongly suspect that the time
that the community as whole has spent on this patch is a pretty
significant multiple of the time that he personally spent on it, and
such dynamics are bound to create scaling issues. I don't know how we
do better: I just want to highlight the problem.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2024-05-20 09:37:46 -0400, Robert Haas wrote:
On Sat, May 18, 2024 at 6:09 PM Andres Freund <andres@anarazel.de> wrote:
A few tests with ccache disabled:
These tests seem to show no difference between the two releases, so I
wonder what you're intending to demonstrate here.
They show a few seconds of win for 'real' time.
-O0: 0m21.577s->0m19.529s
-O3: 0m59.730s->0m54.853s
On Mon, May 20, 2024 at 11:37 AM Andres Freund <andres@anarazel.de> wrote:
On 2024-05-20 09:37:46 -0400, Robert Haas wrote:
On Sat, May 18, 2024 at 6:09 PM Andres Freund <andres@anarazel.de> wrote:
A few tests with ccache disabled:
These tests seem to show no difference between the two releases, so I
wonder what you're intending to demonstrate here.
They show a few seconds of win for 'real' time.
-O0: 0m21.577s->0m19.529s
-O3: 0m59.730s->0m54.853s
Ah, OK, I think I misinterpreted.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 19.05.24 00:09, Andres Freund wrote:
On 2024-05-18 00:35:08 +0200, Peter Eisentraut wrote:
I retested the patch from 2024-04-07 (I think that's the one that "fixed
that"? There are multiple "v1" patches in this thread.) using gcc-14 and
clang-18, with ccache disabled of course. The measured effects of the patch
are:
gcc-14: 1% slower
clang-18: 3% faster
So with that, it doesn't seem very interesting.
I wonder whether the reason you're seing less benefit than Jelte is that - I'm
guessing - you now used ninja 1.12 and Jelte something older. Ninja 1.12
prioritizes building edges using a "critical path" heuristic, leading to
scheduling nodes with more incoming dependencies, and deeper in the dependency
graph earlier.
Yes! Very interesting!
With ninja 1.11 and gcc-14, I see the patch gives about a 17% speedup.
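The "critical path" heuristic Andres describes can be shown with a toy sketch (this is not ninja's actual code; the DAG and priority function are invented): prioritize each target by the longest chain of work that depends on it, so deep, slow nodes like the generated preproc.c get started first.

```c
/*
 * Toy illustration of critical-path build scheduling, not ninja's
 * implementation.  Targets form a DAG; a target's priority is the
 * length of the longest dependency chain hanging off it, so starting
 * high-priority targets early keeps the tail of the build short.
 */
#define NTARGETS 4

/* edge[i][j] != 0 means target j directly depends on target i */
static const int edge[NTARGETS][NTARGETS] = {
	{0, 1, 0, 0},				/* 0: generated preproc.c, needed by 1 */
	{0, 0, 0, 1},				/* 1: preproc.o, needed by 3 */
	{0, 0, 0, 1},				/* 2: an ordinary .o file, needed by 3 */
	{0, 0, 0, 0},				/* 3: the ecpg binary, a leaf */
};

/* length of the longest chain of dependents hanging off "node" */
static int
critical_path(int node)
{
	int			best = 0;

	for (int j = 0; j < NTARGETS; j++)
	{
		if (edge[node][j])
		{
			int			len = 1 + critical_path(j);

			if (len > best)
				best = len;
		}
	}
	return best;
}
```

Here critical_path(0) is longer than critical_path(2), so a scheduler using this heuristic compiles the preproc.c chain before the ordinary object file regardless of where it appears in the build files; a real scheduler would additionally weight edges by estimated build time.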