broken JIT support on Fedora 40

Started by Pavel Stehuleabout 2 years ago24 messageshackers
Jump to latest
#1Pavel Stehule
pavel.stehule@gmail.com

Hi

after today update, the build with --with-llvm produces broken code, and
make check fails with crash

Upgrade hwdata-0.380-1.fc40.noarch
@updates-testing
Upgraded hwdata-0.379-1.fc40.noarch @@System
Upgrade llvm-18.1.0~rc4-1.fc40.x86_64
@updates-testing
Upgraded llvm-17.0.6-6.fc40.x86_64 @@System
Upgrade llvm-devel-18.1.0~rc4-1.fc40.x86_64
@updates-testing
Upgraded llvm-devel-17.0.6-6.fc40.x86_64 @@System
Upgrade llvm-googletest-18.1.0~rc4-1.fc40.x86_64
@updates-testing
Upgraded llvm-googletest-17.0.6-6.fc40.x86_64 @@System
Upgrade llvm-libs-18.1.0~rc4-1.fc40.i686
@updates-testing
Upgraded llvm-libs-17.0.6-6.fc40.i686 @@System
Upgrade llvm-libs-18.1.0~rc4-1.fc40.x86_64
@updates-testing
Upgraded llvm-libs-17.0.6-6.fc40.x86_64 @@System
Upgrade llvm-static-18.1.0~rc4-1.fc40.x86_64
@updates-testing
Upgraded llvm-static-17.0.6-6.fc40.x86_64 @@System
Upgrade llvm-test-18.1.0~rc4-1.fc40.x86_64
@updates-testing
Instalovat llvm17-libs-17.0.6-7.fc40.i686
@updates-testing
Instalovat llvm17-libs-17.0.6-7.fc40.x86_64
@updates-testing

Regards

Pavel

#2Robert Haas
robertmhaas@gmail.com
In reply to: Pavel Stehule (#1)
Re: broken JIT support on Fedora 40

On Wed, Mar 6, 2024 at 1:54 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:

after today update, the build with --with-llvm produces broken code, and make check fails with crash

Upgrade hwdata-0.380-1.fc40.noarch @updates-testing
Upgraded hwdata-0.379-1.fc40.noarch @@System
Upgrade llvm-18.1.0~rc4-1.fc40.x86_64 @updates-testing
Upgraded llvm-17.0.6-6.fc40.x86_64 @@System
Upgrade llvm-devel-18.1.0~rc4-1.fc40.x86_64 @updates-testing
Upgraded llvm-devel-17.0.6-6.fc40.x86_64 @@System
Upgrade llvm-googletest-18.1.0~rc4-1.fc40.x86_64 @updates-testing
Upgraded llvm-googletest-17.0.6-6.fc40.x86_64 @@System
Upgrade llvm-libs-18.1.0~rc4-1.fc40.i686 @updates-testing
Upgraded llvm-libs-17.0.6-6.fc40.i686 @@System
Upgrade llvm-libs-18.1.0~rc4-1.fc40.x86_64 @updates-testing
Upgraded llvm-libs-17.0.6-6.fc40.x86_64 @@System
Upgrade llvm-static-18.1.0~rc4-1.fc40.x86_64 @updates-testing
Upgraded llvm-static-17.0.6-6.fc40.x86_64 @@System
Upgrade llvm-test-18.1.0~rc4-1.fc40.x86_64 @updates-testing
Instalovat llvm17-libs-17.0.6-7.fc40.i686 @updates-testing
Instalovat llvm17-libs-17.0.6-7.fc40.x86_64 @updates-testing

I don't know anything about LLVM, but somehow I'm guessing that people
who do will need more detail than this in order to be able to fix the
problem. I see that support for LLVM 18 was added in commit
d282e88e50521a457fa1b36e55f43bac02a3167f on January 18th; perhaps you
were building from an older commit?

--
Robert Haas
EDB: http://www.enterprisedb.com

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#2)
Re: broken JIT support on Fedora 40

Hi

čt 14. 3. 2024 v 19:20 odesílatel Robert Haas <robertmhaas@gmail.com>
napsal:

On Wed, Mar 6, 2024 at 1:54 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

after today update, the build with --with-llvm produces broken code, and

make check fails with crash

Upgrade hwdata-0.380-1.fc40.noarch

@updates-testing

Upgraded hwdata-0.379-1.fc40.noarch

@@System

Upgrade llvm-18.1.0~rc4-1.fc40.x86_64

@updates-testing

Upgraded llvm-17.0.6-6.fc40.x86_64

@@System

Upgrade llvm-devel-18.1.0~rc4-1.fc40.x86_64

@updates-testing

Upgraded llvm-devel-17.0.6-6.fc40.x86_64

@@System

Upgrade llvm-googletest-18.1.0~rc4-1.fc40.x86_64

@updates-testing

Upgraded llvm-googletest-17.0.6-6.fc40.x86_64

@@System

Upgrade llvm-libs-18.1.0~rc4-1.fc40.i686

@updates-testing

Upgraded llvm-libs-17.0.6-6.fc40.i686

@@System

Upgrade llvm-libs-18.1.0~rc4-1.fc40.x86_64

@updates-testing

Upgraded llvm-libs-17.0.6-6.fc40.x86_64

@@System

Upgrade llvm-static-18.1.0~rc4-1.fc40.x86_64

@updates-testing

Upgraded llvm-static-17.0.6-6.fc40.x86_64

@@System

Upgrade llvm-test-18.1.0~rc4-1.fc40.x86_64

@updates-testing

Instalovat llvm17-libs-17.0.6-7.fc40.i686

@updates-testing

Instalovat llvm17-libs-17.0.6-7.fc40.x86_64

@updates-testing

I don't know anything about LLVM, but somehow I'm guessing that people
who do will need more detail than this in order to be able to fix the
problem. I see that support for LLVM 18 was added in commit
d282e88e50521a457fa1b36e55f43bac02a3167f on January 18th; perhaps you
were building from an older commit?

I repeated build and check today, fresh master, today fedora update with
same result

build is ok, but regress tests fails with crash (it works without
-with-llvm)

Regards

Pavel

Show quoted text

--
Robert Haas
EDB: http://www.enterprisedb.com

#4Daniel Gustafsson
daniel@yesql.se
In reply to: Pavel Stehule (#3)
Re: broken JIT support on Fedora 40

On 14 Mar 2024, at 20:15, Pavel Stehule <pavel.stehule@gmail.com> wrote:

build is ok, but regress tests fails with crash (it works without -with-llvm)

Can you post some details around this crash? It doesn't seem to be a
combination we have covered in the buildfarm.

--
Daniel Gustafsson

#5Thomas Munro
thomas.munro@gmail.com
In reply to: Daniel Gustafsson (#4)
Re: broken JIT support on Fedora 40

On Fri, Mar 15, 2024 at 12:27 PM Daniel Gustafsson <daniel@yesql.se> wrote:

On 14 Mar 2024, at 20:15, Pavel Stehule <pavel.stehule@gmail.com> wrote:

build is ok, but regress tests fails with crash (it works without -with-llvm)

Can you post some details around this crash? It doesn't seem to be a
combination we have covered in the buildfarm.

Yeah, 18.1 (note they switched to 1-based minor numbers, there was no
18.0) just came out a week or so ago. Despite testing their 18 branch
just before their "RC1" tag, as recently as

commit d282e88e50521a457fa1b36e55f43bac02a3167f
Author: Thomas Munro <tmunro@postgresql.org>
Date: Thu Jan 25 10:37:35 2024 +1300

Track LLVM 18 changes.

at which point everything worked, it seems that something changed
before they released. I haven't looked into why yet but it's crashing
on my FreeBSD box too.

#6Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#5)
Re: broken JIT support on Fedora 40

For me it seems that the LLVMRunPasses() call, new in

commit 76200e5ee469e4a9db5f9514b9d0c6a31b496bff
Author: Thomas Munro <tmunro@postgresql.org>
Date: Wed Oct 18 22:15:54 2023 +1300

jit: Changes for LLVM 17.

is reaching code that segfaults inside libLLVM, specifically in
llvm::InlineFunction(llvm::CallBase&, llvm::InlineFunctionInfo&, bool,
llvm::AAResults*, bool, llvm::Function*). First obvious question
would be: is that NULL argument still acceptable? Perhaps it wants
our LLVMTargetMachineRef there:

err = LLVMRunPasses(module, passes, NULL, options);

But then when we see what is does with that argument, it arrives at a
place that apparently accepts nullptr.

https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/lib/Passes/PassBuilderBindings.cpp#L56
https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/include/llvm/Passes/PassBuilder.h#L124

Hrmph. Might need an assertion build to learn more. I'll try to look
again next week or so.

#7Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Thomas Munro (#6)
Re: broken JIT support on Fedora 40

On Fri, Mar 15, 2024 at 01:54:38PM +1300, Thomas Munro wrote:
For me it seems that the LLVMRunPasses() call, new in

commit 76200e5ee469e4a9db5f9514b9d0c6a31b496bff
Author: Thomas Munro <tmunro@postgresql.org>
Date: Wed Oct 18 22:15:54 2023 +1300

jit: Changes for LLVM 17.

is reaching code that segfaults inside libLLVM, specifically in
llvm::InlineFunction(llvm::CallBase&, llvm::InlineFunctionInfo&, bool,
llvm::AAResults*, bool, llvm::Function*). First obvious question
would be: is that NULL argument still acceptable? Perhaps it wants
our LLVMTargetMachineRef there:

err = LLVMRunPasses(module, passes, NULL, options);

But then when we see what is does with that argument, it arrives at a
place that apparently accepts nullptr.

https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/lib/Passes/PassBuilderBindings.cpp#L56
https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/include/llvm/Passes/PassBuilder.h#L124

Hrmph. Might need an assertion build to learn more. I'll try to look
again next week or so.

Looks like I can reproduce this as well, libLLVM crashes when reaching
AddReturnAttributes inside InlineFunction, when trying to access
operands of the return instruction. I think, for whatever reason, the
latest LLVM doesn't like (i.e. do not expect this when performing
inlining pass) return instructions that do not have a return value, and
this is what happens in the outblock of deform function we generate
(slot_compile_deform).

For verification, I've modified the deform.outblock to call LLVMBuildRet
instead of LLVMBuildRetVoid and this seems to help -- inline and deform
stages are still performed as before, but nothing crashes. But of course
it doesn't sound right that inlining pass cannot process such code.
Unfortunately I don't see any obvious change in the recent LLVM history
that would justify this outcome, might be a genuine bug, will
investigate further.

#8Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#7)
Re: broken JIT support on Fedora 40

On Sun, Mar 17, 2024 at 09:02:08PM +0100, Dmitry Dolgov wrote:

On Fri, Mar 15, 2024 at 01:54:38PM +1300, Thomas Munro wrote:
For me it seems that the LLVMRunPasses() call, new in

commit 76200e5ee469e4a9db5f9514b9d0c6a31b496bff
Author: Thomas Munro <tmunro@postgresql.org>
Date: Wed Oct 18 22:15:54 2023 +1300

jit: Changes for LLVM 17.

is reaching code that segfaults inside libLLVM, specifically in
llvm::InlineFunction(llvm::CallBase&, llvm::InlineFunctionInfo&, bool,
llvm::AAResults*, bool, llvm::Function*). First obvious question
would be: is that NULL argument still acceptable? Perhaps it wants
our LLVMTargetMachineRef there:

err = LLVMRunPasses(module, passes, NULL, options);

But then when we see what is does with that argument, it arrives at a
place that apparently accepts nullptr.

https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/lib/Passes/PassBuilderBindings.cpp#L56
https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/include/llvm/Passes/PassBuilder.h#L124

Hrmph. Might need an assertion build to learn more. I'll try to look
again next week or so.

Looks like I can reproduce this as well, libLLVM crashes when reaching
AddReturnAttributes inside InlineFunction, when trying to access
operands of the return instruction. I think, for whatever reason, the
latest LLVM doesn't like (i.e. do not expect this when performing
inlining pass) return instructions that do not have a return value, and
this is what happens in the outblock of deform function we generate
(slot_compile_deform).

For verification, I've modified the deform.outblock to call LLVMBuildRet
instead of LLVMBuildRetVoid and this seems to help -- inline and deform
stages are still performed as before, but nothing crashes. But of course
it doesn't sound right that inlining pass cannot process such code.
Unfortunately I don't see any obvious change in the recent LLVM history
that would justify this outcome, might be a genuine bug, will
investigate further.

I think I found the change that got it all started [1]https://github.com/llvm/llvm-project/commit/2da4960f20f7e5d88a68ce25636a895284dc66d8, the commit has a
set of tags like 18.1.0-rc1 and is relatively recent. The message
doesn't say anything related to the crash that we see, so I assume it's
indeed a bug. I've opened an issue to confirm this understanding [2]https://github.com/llvm/llvm-project/issues/86162
(wow, issues were indeed moved to github since the last time I was
touching LLVM), let's see what would be the response.

[1]: https://github.com/llvm/llvm-project/commit/2da4960f20f7e5d88a68ce25636a895284dc66d8
[2]: https://github.com/llvm/llvm-project/issues/86162

#9Thomas Munro
thomas.munro@gmail.com
In reply to: Dmitry Dolgov (#8)
Re: broken JIT support on Fedora 40

On Fri, Mar 22, 2024 at 7:15 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

For verification, I've modified the deform.outblock to call LLVMBuildRet
instead of LLVMBuildRetVoid and this seems to help -- inline and deform
stages are still performed as before, but nothing crashes. But of course
it doesn't sound right that inlining pass cannot process such code.

Thanks for investigating and filing the issue. It doesn't seem to be
moving yet. Do you want to share the LLVMBuildRet() workaround?
Maybe we need to consider shipping something like that in the
meantime?

#10Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Thomas Munro (#9)
Re: broken JIT support on Fedora 40

On Sat, Mar 30, 2024 at 04:38:11PM +1300, Thomas Munro wrote:
On Fri, Mar 22, 2024 at 7:15 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

For verification, I've modified the deform.outblock to call LLVMBuildRet
instead of LLVMBuildRetVoid and this seems to help -- inline and deform
stages are still performed as before, but nothing crashes. But of course
it doesn't sound right that inlining pass cannot process such code.

Thanks for investigating and filing the issue. It doesn't seem to be
moving yet. Do you want to share the LLVMBuildRet() workaround?
Maybe we need to consider shipping something like that in the
meantime?

Yeah, sorry, I'm a bit baffled about this situation myself. Yesterday
I've opened a one-line PR fix that should address the issue, maybe this
would help. In the meantime I've attached what did work for me as a
workaround -- it essentially just makes the deform function to return
some value. It's ugly, but since call site will ignore that, and it's
only one occasion where LLVMBuildRetVoid is used, maybe it's acceptable.
Give me a moment, I'm going to test this change more (waiting on
rebuilding LLVM, it takes quire a while on my machine :( ), then can
confirm that it works as expected on the latest version.

Attachments:

v1-0001-Workaround-for-deform-crashing-in-LLVM-inliner.patchtext/x-diff; charset=us-asciiDownload+2-2
#11Thomas Munro
thomas.munro@gmail.com
In reply to: Dmitry Dolgov (#10)
Re: broken JIT support on Fedora 40

On Sun, Mar 31, 2024 at 5:59 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Yeah, sorry, I'm a bit baffled about this situation myself. Yesterday
I've opened a one-line PR fix that should address the issue, maybe this
would help. In the meantime I've attached what did work for me as a
workaround -- it essentially just makes the deform function to return
some value. It's ugly, but since call site will ignore that, and it's
only one occasion where LLVMBuildRetVoid is used, maybe it's acceptable.
Give me a moment, I'm going to test this change more (waiting on
rebuilding LLVM, it takes quire a while on my machine :( ), then can
confirm that it works as expected on the latest version.

Great news. I see your PR:

https://github.com/llvm/llvm-project/pull/87093

Checking their release schedule, they have:

Mar 5th: 18.1.0 was released
Mar 8th: 18.1.1 was released
Mar 19th: 18.1.2 was released
Apr 16th: 18.1.3
Apr 30th: 18.1.4
May 14th: 18.1.5
May 28th: 18.1.6 (if necessary)

Our next release is May 9th. So assuming your PR goes in in the next
couple of weeks and makes it into their 18.1.3 or even .4, there is no
point in pushing a work-around on our side.

#12Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#11)
Re: broken JIT support on Fedora 40

On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro <thomas.munro@gmail.com> wrote:

https://github.com/llvm/llvm-project/pull/87093

Oh, with those clues, I think I might see... It is a bit strange that
we copy attributes from AttributeTemplate(), a function that returns
Datum, to our void deform function. It works (I mean doesn't crash)
if you just comment this line out:

llvm_copy_attributes(AttributeTemplate, v_deform_fn);

... but I guess that disables inlining of the deform function? So
perhaps we just need to teach that thing not to try to copy the return
value's attributes, which also seems to work here:

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index ec0fdd49324..92b4993a98a 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -552,8 +552,11 @@ llvm_copy_attributes(LLVMValueRef v_from,
LLVMValueRef v_to)
        /* copy function attributes */
        llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeFunctionIndex);
-       /* and the return value attributes */
-       llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeReturnIndex);
+       if (LLVMGetTypeKind(LLVMGetFunctionReturnType(v_to)) !=
LLVMVoidTypeKind)
+       {
+               /* and the return value attributes */
+               llvm_copy_attributes_at_index(v_from, v_to,
LLVMAttributeReturnIndex);
+       }
#13Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Thomas Munro (#12)
Re: broken JIT support on Fedora 40

On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote:
On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro <thomas.munro@gmail.com> wrote:

https://github.com/llvm/llvm-project/pull/87093

Oh, with those clues, I think I might see... It is a bit strange that
we copy attributes from AttributeTemplate(), a function that returns
Datum, to our void deform function. It works (I mean doesn't crash)
if you just comment this line out:

llvm_copy_attributes(AttributeTemplate, v_deform_fn);

... but I guess that disables inlining of the deform function? So
perhaps we just need to teach that thing not to try to copy the return
value's attributes, which also seems to work here:

Yep, I think this is it. I've spent some hours trying to understand why
suddenly deform function has noundef ret attribute, when it shouldn't --
this explains it and the proposed change fixes the crash. One thing that
is still not clear to me though is why this copied attribute doesn't
show up in the bitcode dumped right before running inline pass (I've
added this to troubleshoot the issue).

#14Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#13)
Re: broken JIT support on Fedora 40

On Fri, Apr 05, 2024 at 03:21:06PM +0200, Dmitry Dolgov wrote:

On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote:
On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro <thomas.munro@gmail.com> wrote:

https://github.com/llvm/llvm-project/pull/87093

Oh, with those clues, I think I might see... It is a bit strange that
we copy attributes from AttributeTemplate(), a function that returns
Datum, to our void deform function. It works (I mean doesn't crash)
if you just comment this line out:

llvm_copy_attributes(AttributeTemplate, v_deform_fn);

... but I guess that disables inlining of the deform function? So
perhaps we just need to teach that thing not to try to copy the return
value's attributes, which also seems to work here:

Yep, I think this is it. I've spent some hours trying to understand why
suddenly deform function has noundef ret attribute, when it shouldn't --
this explains it and the proposed change fixes the crash. One thing that
is still not clear to me though is why this copied attribute doesn't
show up in the bitcode dumped right before running inline pass (I've
added this to troubleshoot the issue).

One thing to consider in this context is indeed adding "verify" pass as
suggested in the PR, at least for the debugging configuration. Without the fix
it immediately returns:

Running analysis: VerifierAnalysis on deform_0_1
Attribute 'noundef' applied to incompatible type!

llvm error: Broken function found, compilation aborted!

#15Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#14)
Re: broken JIT support on Fedora 40

On Fri, Apr 05, 2024 at 03:50:50PM +0200, Dmitry Dolgov wrote:

On Fri, Apr 05, 2024 at 03:21:06PM +0200, Dmitry Dolgov wrote:

On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote:
On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro <thomas.munro@gmail.com> wrote:

https://github.com/llvm/llvm-project/pull/87093

Oh, with those clues, I think I might see... It is a bit strange that
we copy attributes from AttributeTemplate(), a function that returns
Datum, to our void deform function. It works (I mean doesn't crash)
if you just comment this line out:

llvm_copy_attributes(AttributeTemplate, v_deform_fn);

... but I guess that disables inlining of the deform function? So
perhaps we just need to teach that thing not to try to copy the return
value's attributes, which also seems to work here:

Yep, I think this is it. I've spent some hours trying to understand why
suddenly deform function has noundef ret attribute, when it shouldn't --
this explains it and the proposed change fixes the crash. One thing that
is still not clear to me though is why this copied attribute doesn't
show up in the bitcode dumped right before running inline pass (I've
added this to troubleshoot the issue).

One thing to consider in this context is indeed adding "verify" pass as
suggested in the PR, at least for the debugging configuration. Without the fix
it immediately returns:

Running analysis: VerifierAnalysis on deform_0_1
Attribute 'noundef' applied to incompatible type!

llvm error: Broken function found, compilation aborted!

Here is what I have in mind. Interestingly enough, it also shows few
more errors besides "noundef":

Intrinsic name not mangled correctly for type arguments! Should be: llvm.lifetime.end.p0
ptr @llvm.lifetime.end.p0i8

It refers to the function from create_LifetimeEnd, not sure how
important is this.

Attachments:

v1-0001-Add-jit_verify_bitcode.patchtext/x-diff; charset=us-asciiDownload+40-5
#16Thomas Munro
thomas.munro@gmail.com
In reply to: Dmitry Dolgov (#15)
Re: broken JIT support on Fedora 40

On Sat, Apr 6, 2024 at 5:01 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Yep, I think this is it. I've spent some hours trying to understand why
suddenly deform function has noundef ret attribute, when it shouldn't --
this explains it and the proposed change fixes the crash. One thing that
is still not clear to me though is why this copied attribute doesn't
show up in the bitcode dumped right before running inline pass (I've
added this to troubleshoot the issue).

One thing to consider in this context is indeed adding "verify" pass as
suggested in the PR, at least for the debugging configuration. Without the fix
it immediately returns:

Running analysis: VerifierAnalysis on deform_0_1
Attribute 'noundef' applied to incompatible type!

llvm error: Broken function found, compilation aborted!

Here is what I have in mind. Interestingly enough, it also shows few
more errors besides "noundef":

Intrinsic name not mangled correctly for type arguments! Should be: llvm.lifetime.end.p0
ptr @llvm.lifetime.end.p0i8

It refers to the function from create_LifetimeEnd, not sure how
important is this.

Would it be too slow to run the verify pass always, in assertion
builds? Here's a patch for the original issue, and a patch to try
that idea + a fix for that other complaint it spits out. The latter
would only run for LLVM 17+, but that seems OK.

Attachments:

0001-Fix-illegal-attribute-LLVM-attribute-propagation.patchtext/x-patch; charset=US-ASCII; name=0001-Fix-illegal-attribute-LLVM-attribute-propagation.patchDownload+5-3
0002-Run-LLVM-verify-pass-on-all-generated-IR.patchtext/x-patch; charset=US-ASCII; name=0002-Run-LLVM-verify-pass-on-all-generated-IR.patchDownload+10-4
#17Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Thomas Munro (#16)
Re: broken JIT support on Fedora 40

On Tue, Apr 09, 2024 at 07:07:58PM +1200, Thomas Munro wrote:
On Sat, Apr 6, 2024 at 5:01 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Yep, I think this is it. I've spent some hours trying to understand why
suddenly deform function has noundef ret attribute, when it shouldn't --
this explains it and the proposed change fixes the crash. One thing that
is still not clear to me though is why this copied attribute doesn't
show up in the bitcode dumped right before running inline pass (I've
added this to troubleshoot the issue).

One thing to consider in this context is indeed adding "verify" pass as
suggested in the PR, at least for the debugging configuration. Without the fix
it immediately returns:

Running analysis: VerifierAnalysis on deform_0_1
Attribute 'noundef' applied to incompatible type!

llvm error: Broken function found, compilation aborted!

Here is what I have in mind. Interestingly enough, it also shows few
more errors besides "noundef":

Intrinsic name not mangled correctly for type arguments! Should be: llvm.lifetime.end.p0
ptr @llvm.lifetime.end.p0i8

It refers to the function from create_LifetimeEnd, not sure how
important is this.

Would it be too slow to run the verify pass always, in assertion
builds? Here's a patch for the original issue, and a patch to try
that idea + a fix for that other complaint it spits out. The latter
would only run for LLVM 17+, but that seems OK.

Sounds like a good idea. About the overhead, I've done a quick test on the
reproducer at hands, doing explain analyze in a tight loop and fetching
"optimization" timinigs. It gives quite visible difference 96ms p99 with verify
vs 46ms p99 without verify (and a rather low stddev, ~1.5ms). But I can
imagine it's acceptable for a build with assertions.

Btw, I've found there is a C-api for this exposed, which produces the same
warnings for me. Maybe it would be even better this way:

/**
* Toggle adding the VerifierPass for the PassBuilder, ensuring all functions
* inside the module is valid.
*/
void LLVMPassBuilderOptionsSetVerifyEach(LLVMPassBuilderOptionsRef Options,
LLVMBool VerifyEach);

	+       /* In assertion builds, run the LLVM verify pass. */
	+#ifdef USE_ASSERT_CHECKING
	+       LLVMPassBuilderOptionsSetVerifyEach(options, true);
	+#endif
#18Thomas Munro
thomas.munro@gmail.com
In reply to: Dmitry Dolgov (#17)
Re: broken JIT support on Fedora 40

On Tue, Apr 9, 2024 at 10:05 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

+       /* In assertion builds, run the LLVM verify pass. */
+#ifdef USE_ASSERT_CHECKING
+       LLVMPassBuilderOptionsSetVerifyEach(options, true);
+#endif

Thanks, that seems nicer. I think the question is whether it will
slow down build farm/CI/local meson test runs to a degree that exceeds
its value. Another option would be to have some other opt-in macro,
like the existing #ifdef LLVM_PASS_DEBUG, for people who maintain
JIT-related stuff to turn on.

Supposing we go with USE_ASSERT_CHECKING, I have another question:

-       const char *nm = "llvm.lifetime.end.p0i8";
+       const char *nm = "llvm.lifetime.end.p0";

Was that a mistake, or did the mangling rules change in some version?
I don't currently feel inclined to go and test this on the ancient
versions we claim to support in back-branches. Perhaps we should just
do this in master, and then it'd be limited to worrying about LLVM
versions 10-18 (see 820b5af7), which have the distinct advantage of
being available in package repositories for testing. Or I suppose we
could back-patch, but only do it if LLVM_VERSION_MAJOR >= 10. Or we
could do it unconditionally, and wait for ancient-LLVM build farm
animals to break if they're going to.

I pushed the illegal attribute fix though. Thanks for the detective work!

(It crossed my mind that perhaps deform functions should have their
own template function, but if someone figures out that that's a good
idea, I think we'll *still* need that change just pushed.)

#19Pavel Stehule
pavel.stehule@gmail.com
In reply to: Thomas Munro (#18)
Re: broken JIT support on Fedora 40

st 10. 4. 2024 v 2:44 odesílatel Thomas Munro <thomas.munro@gmail.com>
napsal:

On Tue, Apr 9, 2024 at 10:05 PM Dmitry Dolgov <9erthalion6@gmail.com>
wrote:

+       /* In assertion builds, run the LLVM verify pass. */
+#ifdef USE_ASSERT_CHECKING
+       LLVMPassBuilderOptionsSetVerifyEach(options, true);
+#endif

Thanks, that seems nicer. I think the question is whether it will
slow down build farm/CI/local meson test runs to a degree that exceeds
its value. Another option would be to have some other opt-in macro,
like the existing #ifdef LLVM_PASS_DEBUG, for people who maintain
JIT-related stuff to turn on.

Supposing we go with USE_ASSERT_CHECKING, I have another question:

-       const char *nm = "llvm.lifetime.end.p0i8";
+       const char *nm = "llvm.lifetime.end.p0";

Was that a mistake, or did the mangling rules change in some version?
I don't currently feel inclined to go and test this on the ancient
versions we claim to support in back-branches. Perhaps we should just
do this in master, and then it'd be limited to worrying about LLVM
versions 10-18 (see 820b5af7), which have the distinct advantage of
being available in package repositories for testing. Or I suppose we
could back-patch, but only do it if LLVM_VERSION_MAJOR >= 10. Or we
could do it unconditionally, and wait for ancient-LLVM build farm
animals to break if they're going to.

I pushed the illegal attribute fix though. Thanks for the detective work!

(It crossed my mind that perhaps deform functions should have their
own template function, but if someone figures out that that's a good
idea, I think we'll *still* need that change just pushed.)

all tests passed on fc 40 without problems

Thank you

Pavel

#20Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Thomas Munro (#18)
Re: broken JIT support on Fedora 40

On Wed, Apr 10, 2024 at 12:43:23PM +1200, Thomas Munro wrote:
On Tue, Apr 9, 2024 at 10:05 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

+       /* In assertion builds, run the LLVM verify pass. */
+#ifdef USE_ASSERT_CHECKING
+       LLVMPassBuilderOptionsSetVerifyEach(options, true);
+#endif

Thanks, that seems nicer. I think the question is whether it will
slow down build farm/CI/local meson test runs to a degree that exceeds
its value. Another option would be to have some other opt-in macro,
like the existing #ifdef LLVM_PASS_DEBUG, for people who maintain
JIT-related stuff to turn on.

Oh, I see. I'm afraid I don't have enough knowledge of the CI pipeline,
but at least locally for me installcheck became only few percent slower
with the verify pass.

Supposing we go with USE_ASSERT_CHECKING, I have another question:

-       const char *nm = "llvm.lifetime.end.p0i8";
+       const char *nm = "llvm.lifetime.end.p0";

Was that a mistake, or did the mangling rules change in some version?

I'm not sure, but inclined to think it's the same as with noundef -- a
mistake, which was revealed in some recent version of LLVM. From what I
understand the suffix i8 indicates an overloaded argument of that type,
which is probably not needed. At the same time I can't get this error
from the verify pass with LLVM-12 or LLVM-15 (I have those at hand by
accident). To make it even more confusing I've found a few similar
examples in other projects, where this was really triggered by an issue
in LLVM [1]https://github.com/rust-lang/rust/issues/102738.

[1]: https://github.com/rust-lang/rust/issues/102738

#21Andres Freund
andres@anarazel.de
In reply to: Dmitry Dolgov (#20)
#22Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#18)
#23Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Robert Haas (#22)
#24Thomas Munro
thomas.munro@gmail.com
In reply to: Dmitry Dolgov (#23)