Clang optimiser vs preproc.c
Hi,
FYI, it looks like there is a big jump in CPU time to compile preproc.c at -O2:
clang15: ~16s
clang16: ~211s
clang17: ~233s
First noticed on FreeBSD (where the system cc is now clang16), but
also reproduced on Debian (via apt.llvm.org packages).

Thomas Munro <thomas.munro@gmail.com> writes:
> FYI, it looks like there is a big jump in CPU time to compile preproc.c at -O2:
> clang15: ~16s
> clang16: ~211s
> clang17: ~233s

What are the numbers for gram.c?

regards, tom lane

On Sat, Dec 16, 2023 at 3:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> FYI, it looks like there is a big jump in CPU time to compile preproc.c at -O2:
>> clang15: ~16s
>> clang16: ~211s
>> clang17: ~233s
>
> What are the numbers for gram.c?

clang15: ~3.8s
clang16: ~3.2s
clang17: ~2.9s

Thomas Munro <thomas.munro@gmail.com> writes:
> On Sat, Dec 16, 2023 at 3:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Thomas Munro <thomas.munro@gmail.com> writes:
>>> FYI, it looks like there is a big jump in CPU time to compile preproc.c at -O2:
>>> clang15: ~16s
>>> clang16: ~211s
>>> clang17: ~233s
>>
>> What are the numbers for gram.c?
>
> clang15: ~3.8s
> clang16: ~3.2s
> clang17: ~2.9s

Huh. There's not that many more productions in the ecpg grammar
than the core, so it doesn't seem likely that this is purely a
size-of-file issue. I'd bet on there being something that clang
doesn't do well about the (very stylized) C code being generated
within the grammar productions.
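To illustrate, the actions in the generated grammar are almost entirely
string-pasting of roughly this shape (the case number and rule here are
invented for illustration, though cat_str() and mm_strdup() are the
actual ecpg helpers):

	case 1234:	/* hypothetical rule: foo_stmt: FOO opt_bar */
		{
			/* paste the matched token texts back together into one string */
			(yyval.str) = cat_str(3, mm_strdup("foo"),
								  (yyvsp[-1].str), (yyvsp[0].str));
		}
		break;
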
We actually noticed this or a closely-related problem before [1]
and briefly discussed the possibility of rearranging the generated
code to make it less indigestible to clang. But there was no concrete
idea about what to do specifically, and the thread slid off the radar
screen.
regards, tom lane
[1]: /messages/by-id/20220902174033.wgt6li7oiaxmwftn@awork3.anarazel.de

On Sat, Dec 16, 2023 at 4:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We actually noticed this or a closely-related problem before [1]
> and briefly discussed the possibility of rearranging the generated
> code to make it less indigestible to clang. But there was no concrete
> idea about what to do specifically, and the thread slid off the radar
> screen.

I've never paid attention to the output of -ftime-report before, but
this difference stands out pretty clearly with clang16:
   ---User Time---    --System Time--    --User+System--    ---Wall Time---    --- Name ---
   201.2266 ( 99.6%)   0.0074 ( 99.3%)   201.2341 ( 99.6%)   207.1308 ( 99.6%)  SLPVectorizerPass

The equivalent line for clang15 is:

   3.0979 ( 64.8%)     0.0000 (  0.0%)     3.0979 ( 64.8%)     3.0996 ( 64.8%)  SLPVectorizerPass

The thing Andres showed in that other thread was like this (though in
my output it has grown a "#2"). It accounts for much of the time in 15,
but "only" goes up by a couple of seconds in 16, so it's not our primary
problem:

   9.1890 ( 73.1%)     0.0396 ( 23.9%)     9.2286 ( 72.4%)     9.6586 ( 72.9%)  Greedy Register Allocator #2
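
For the record, these reports come from adding -ftime-report to an
otherwise normal -O2 compile of the file, roughly like this (the usual
-I flags from the ecpg build omitted):

    clang-16 -O2 -ftime-report -c preproc.c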

Hi,

On 2023-12-15 22:19:56 -0500, Tom Lane wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> On Sat, Dec 16, 2023 at 3:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Thomas Munro <thomas.munro@gmail.com> writes:
>>>> FYI, it looks like there is a big jump in CPU time to compile preproc.c at -O2:
>>>> clang15: ~16s
>>>> clang16: ~211s
>>>> clang17: ~233s

Is this with non-assert clangs? Because I see times that seem smaller by more
than what can be explained by hardware differences:
preproc.c:
17 10.270s
16 9.685s
15 8.300s
gram.c:
17 1.936s
16 2.131s
15 2.161s
That's still bad, but a far cry from 233s.

> Huh. There's not that many more productions in the ecpg grammar
> than the core, so it doesn't seem likely that this is purely a
> size-of-file issue. I'd bet on there being something that clang
> doesn't do well about the (very stylized) C code being generated
> within the grammar productions.
>
> We actually noticed this or a closely-related problem before [1]
> and briefly discussed the possibility of rearranging the generated
> code to make it less indigestible to clang. But there was no concrete
> idea about what to do specifically, and the thread slid off the radar
> screen.

One interesting aspect might be that preproc.c ends up with ~33% more
states than gram.c:
gram.c:
#define YYLAST 116587
preproc.c:
#define YYLAST 155495
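
Both values come straight out of the generated files; in an in-tree
make build, something like this shows them:

    grep '#define YYLAST' src/backend/parser/gram.c src/interfaces/ecpg/preproc/preproc.c
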
Greetings,
Andres Freund

On Sun, Dec 17, 2023 at 1:29 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-12-15 22:19:56 -0500, Tom Lane wrote:
>> Thomas Munro <thomas.munro@gmail.com> writes:
>>> On Sat, Dec 16, 2023 at 3:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Thomas Munro <thomas.munro@gmail.com> writes:
>>>>> FYI, it looks like there is a big jump in CPU time to compile preproc.c at -O2:
>>>>> clang15: ~16s
>>>>> clang16: ~211s
>>>>> clang17: ~233s
>
> Is this with non-assert clangs? Because I see times that seem smaller by more
> than what can be explained by hardware differences:
>
> preproc.c:
> 17  10.270s
> 16   9.685s
> 15   8.300s
>
> gram.c:
> 17  1.936s
> 16  2.131s
> 15  2.161s
>
> That's still bad, but a far cry from 233s.

Hrmph. Well, something weird is going on, but it might indeed involve
me being confused about debug options of the compiler itself. How can
one find out which build options were used for the clang/llvm compiler
and libraries? My earlier reports were from a little machine at home,
so let's try again on an i9-9900 CPU @ 3.10GHz (a bit snappier) running
Debian 12, again using packages from apt.llvm.org:
17 ~198s
16 ~14s
15 ~11s

OK, so even if we ignore the wild outlier, it is getting significantly
slower. But... huh, there goes the big jump, but at a different
version than I saw with FreeBSD's packages. Here's what perf says it's
doing:
+   99.42%  20.12%  clang-17  libLLVM-17.so.1  [.] llvm::slpvectorizer::BoUpSLP::getTreeCost
+   96.91%   0.00%  clang-17  libLLVM-17.so.1  [.] llvm::SLPVectorizerPass::runImpl
+   96.91%   0.00%  clang-17  libLLVM-17.so.1  [.] llvm::SLPVectorizerPass::vectorizeChainsInBlock
+   96.91%   0.00%  clang-17  libLLVM-17.so.1  [.] llvm::SLPVectorizerPass::vectorizeSimpleInstructions
+   96.91%   0.00%  clang-17  libLLVM-17.so.1  [.] llvm::SLPVectorizerPass::vectorizeInsertElementInst
+   96.91%   0.00%  clang-17  libLLVM-17.so.1  [.] llvm::SLPVectorizerPass::tryToVectorizeList
+   73.79%   0.00%  clang-17  libLLVM-17.so.1  [.] 0x00007fbead445cb0
+   36.88%  36.88%  clang-17  libLLVM-17.so.1  [.] 0x0000000001e45cda
+    3.95%   3.95%  clang-17  libLLVM-17.so.1  [.] 0x0000000001e45d11
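
That profile was collected roughly like this (same -I flags as the
normal build), then viewed with perf report:

    perf record -g -- clang-17 -O2 -c preproc.c
    perf report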

On Tue, Dec 19, 2023 at 11:42 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Hrmph. Well, something weird is going on, but it might indeed involve
> me being confused about debug options of the compiler itself. How can
> one find out which build options were used for the clang/llvm compiler
> and libraries? My earlier reports were from a little machine at home,
> so let's try again on an i9-9900 CPU @ 3.10GHz (a bit snappier) running
> Debian 12, again using packages from apt.llvm.org:
>
> 17 ~198s
> 16 ~14s
> 15 ~11s

And on another Debian machine (this time a VM) also using apt.llvm.org
packages, the huge ~3 minute time occurs with clang-16... hrrrnnnff...
seems like there must be some other variable here that I haven't
spotted yet...

Hello Thomas,

19.12.2023 07:20, Thomas Munro wrote:
> On Tue, Dec 19, 2023 at 11:42 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Hrmph. Well, something weird is going on, but it might indeed involve
>> me being confused about debug options of the compiler itself. How can
>> one find out which build options were used for the clang/llvm compiler
>> and libraries? My earlier reports were from a little machine at home,
>> so let's try again on an i9-9900 CPU @ 3.10GHz (a bit snappier) running
>> Debian 12, again using packages from apt.llvm.org:
>>
>> 17 ~198s
>> 16 ~14s
>> 15 ~11s
>
> And on another Debian machine (this time a VM) also using apt.llvm.org
> packages, the huge ~3 minute time occurs with clang-16... hrrrnnnff...
> seems like there must be some other variable here that I haven't
> spotted yet...

Reproduced here with clang-16 and clang-17 (on Ubuntu 22.04 and on
Fedora 39). Namely, on Ubuntu I tried
clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04 and
clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04
from https://github.com/llvm/llvm-project/releases, as follows:
PATH=".../clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04/bin:$PATH" CC=clang-17 CPPFLAGS="-O2" sh -c "./configure -q
--enable-debug --enable-cassert && time make >make.log"
and then
PATH=".../clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04/bin:$PATH" sh -c "cd src/interfaces/ecpg/preproc; time
clang-17 -v -g -I../include -I../../../../src/interfaces/ecpg/include -I. -I../../../../src/interfaces/ecpg/ecpglib
-I../../../../src/interfaces/libpq -I../../../../src/include -O2 -D_GNU_SOURCE -c -o preproc.o preproc.c"
144.59user 0.25system 2:24.94elapsed 99%CPU (0avgtext+0avgdata 217320maxresident)k
The same is observed with clang-16 (16.0.6 20231112100510+7cbf1a259152...)
installed from http://apt.llvm.org/jammy/.
(Adding the options -fno-slp-vectorize, or -mllvm -slp-threshold=100000,
or -mllvm -slp-max-vf=3 brings the compilation time back to normal.)
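
For a whole-tree build, such an option can be injected via CFLAGS, for
example:

    ./configure CC=clang-17 CFLAGS="-O2 -fno-slp-vectorize"
    make
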
On a fresh Vagrant image "fedora/39-cloud-base", I tried versions
16.0.0~rc4, 17.0.0~rc3, and 17.0.6 downloaded from
https://koji.fedoraproject.org/koji/packageinfo?packageID=21848
https://koji.fedoraproject.org/koji/packageinfo?packageID=5646
All of them also take 2+ minutes for me.
But I see no slowdown with version 15.0.7 on the same VM.
Also, I see no issue with clang-18, installed on Ubuntu from apt.llvm.org.
So, as far as I can see, this anomaly appeared in clang-16 and is gone
in clang-18.
Comparing histories of SLPVectorizer.cpp:
https://github.com/llvm/llvm-project/commits/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
https://github.com/llvm/llvm-project/commits/release/17.x/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
I see a commit in the master branch (i.e., for clang-18) that could
plausibly have fixed the issue:
[SLP][NFC]Improve compile time by storing all nodes for the given ...
Though I still can't explain how you get ~14s with clang-16. Could you show
the exact sequence of commands you use to measure the duration?
Best regards,
Alexander