New LLVM JIT Features

Started by preejackie about 7 years ago, 7 messages, general
#1 preejackie
praveenvelliengiri@gmail.com

Hi

I'm Praveen Velliengiri, a student from India. I'm working on developing
speculative compilation support in the LLVM ORC JIT infrastructure.

As LLVM ORC supports compiling in multiple backend threads, it would be
effective to compile functions speculatively, before they are called by
the executing function. Then, when we request the JIT to compile a
function, the JIT will immediately return the function address for the
raw executable bits. This will greatly reduce JIT latencies on modern
multi-core machines. I'm also working on designing in-place dynamic
profiling support for ORC, with which the JIT will automatically
identify the hot functions and compile them at a higher optimization
level to achieve good performance.
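
The speculative-compilation idea above can be illustrated with a small Python sketch. It is not LLVM ORC code and uses no LLVM API: `CALL_GRAPH`, `compile_function` and `SpeculativeJIT` are hypothetical names invented for the illustration, and "compiling" just builds a Python closure. The sketch assumes the JIT knows which functions a given function may call, so that a lookup of the caller can queue its callees on background worker threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical call graph: each function maps to the functions it may call.
CALL_GRAPH = {
    "main": ["parse", "execute"],
    "parse": [],
    "execute": ["scan"],
    "scan": [],
}


def compile_function(name):
    """Stand-in for code generation: returns a callable for `name`."""
    def compiled():
        return f"ran {name}"
    return compiled


class SpeculativeJIT:
    """Compiles a requested function and speculatively queues its callees."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._jobs = {}  # function name -> Future holding the compiled code

    def _submit(self, name):
        # Start at most one compile job per function.
        with self._lock:
            if name not in self._jobs:
                self._jobs[name] = self._pool.submit(compile_function, name)
            return self._jobs[name]

    def lookup(self, name):
        job = self._submit(name)
        # Speculation: queue the direct callees before they are called.
        for callee in CALL_GRAPH.get(name, []):
            self._submit(callee)
        # Blocks only if the compile job for `name` has not finished yet.
        return job.result()

    def requested(self):
        with self._lock:
            return sorted(self._jobs)

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

In this sketch, `lookup("main")` also queues `parse` and `execute`, so a later `lookup("execute")` will usually find its job already finished and return without waiting.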

I'm proposing this project for GSoC 2019. It would be helpful to know
how effective these new features would be for the pgsql engine, so that
I can include your comments in the "View from Clients" section of my
proposal.

Please reply :)

--

Have a great day!
PreeJackie

#2 preejackie
praveenvelliengiri@gmail.com
In reply to: preejackie (#1)
Re: New LLVM JIT Features

Hi,

I'm following up on this request; please feel free to reply.


#3 Thomas Munro
thomas.munro@gmail.com
In reply to: preejackie (#2)
Re: New LLVM JIT Features

Hi Praveen,

FYI the final "commitfest" for PostgreSQL 12 is wrapping up right now
and the code freeze begins in a few days, so I wouldn't expect an
immediate reply.

--
Thomas Munro
https://enterprisedb.com

#4 Andres Freund
andres@anarazel.de
In reply to: preejackie (#1)
Re: New LLVM JIT Features

Hi,

On 2019-04-02 00:51:51 +0530, preejackie wrote:

As LLVM ORC supports compiling in multiple backend threads, it would be
effective to compile functions speculatively, before they are called by
the executing function. Then, when we request the JIT to compile a
function, the JIT will immediately return the function address for the
raw executable bits. This will greatly reduce JIT latencies on modern
multi-core machines.

I personally think this should be approached somewhat differently -
putting patchpoints into code reduces the efficiency of the generated
code, so I don't think that's the right approach. What I think we should
do is to, if we decide it's worthwhile at plan time, generate the LLVM
IR at the beginning of execution, but continue to use interpreted
execution initially. The generated IR would then be handed over to a
background [process|thread|whatnot] for optimization of code
generation. Then, when finished, I'd switch over from interpreted to JIT
compiled execution. That approach will, in my view, yield better
latency behaviour because we can actually evaluate quals etc for which
we've not yet finished code generation.
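
A minimal Python sketch of this "interpret first, switch over later" scheme follows. It is only an illustration under stated assumptions, not PostgreSQL or LLVM code: `interpret_qual`, `generate_ir`, `optimize_and_codegen` and `QualEvaluator` are hypothetical stand-ins, the qual is hard-coded as `a > 10`, and a Python thread plays the role of the background process or thread.

```python
import threading


def interpret_qual(row):
    """Slow path: stand-in for interpreted evaluation of the qual a > 10."""
    return row["a"] > 10


def generate_ir():
    """Stand-in for IR generation at the beginning of execution."""
    return "ir for: a > 10"


def optimize_and_codegen(ir):
    """Stand-in for the expensive work done in the background."""
    def compiled_qual(row):
        return row["a"] > 10
    return compiled_qual


class QualEvaluator:
    """Starts with the interpreter and switches once compiled code exists."""

    def __init__(self):
        self._impl = interpret_qual  # usable immediately
        ir = generate_ir()
        self._worker = threading.Thread(
            target=self._compile, args=(ir,), daemon=True
        )
        self._worker.start()

    def _compile(self, ir):
        compiled = optimize_and_codegen(ir)
        # One reference assignment: callers see either the old or the new
        # implementation, never a half-installed one.
        self._impl = compiled

    def __call__(self, row):
        return self._impl(row)

    @property
    def is_compiled(self):
        return self._impl is not interpret_qual

    def wait_until_compiled(self):
        self._worker.join()
```

Rows can be evaluated from the first call onward. The only visible change when the background job finishes is that `is_compiled` becomes true and later calls take the compiled path, with no stub or patchpoint in the generated code.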

I'm also working on designing in-place dynamic profiling support for
ORC, with which the JIT will automatically identify the hot functions
and compile them at a higher optimization level to achieve good
performance.

I think that's a nice concept, but at the moment the generated code is
so bad that it's much more likely to get big benefits by improving the
generated IR, compared to giving more hints to the optimizer.

Greetings,

Andres Freund

#5 preejackie
praveenvelliengiri@gmail.com
In reply to: Andres Freund (#4)
Re: New LLVM JIT Features

Hi Andres,

Thanks for the reply! Please see my comments inline.

On 03/04/19 3:20 AM, Andres Freund wrote:

I personally think this should be approached somewhat differently -
putting patchpoints into code reduces the efficiency of the generated
code, so I don't think that's the right approach. What I think we should

What do you mean by patchpoints here? To my knowledge, LLVM symbols
have an arbitrary stub associated with them, which resolves to the
function's address.

do is to, if we decide it's worthwhile at plan time, generate the LLVM
IR at the beginning of execution, but continue to use interpreted
execution initially. The generated IR would then be handed over to a
background [process|thread|whatnot] for optimization of code
generation. Then, when finished, I'd switch over from interpreted to JIT
compiled execution. That approach will, in my view, yield better
latency behaviour because we can actually evaluate quals etc for which
we've not yet finished code generation.

I'm also working on designing in-place dynamic profiling support for
ORC, with which the JIT will automatically identify the hot functions
and compile them at a higher optimization level to achieve good
performance.

I think that's a nice concept, but at the moment the generated code is
so bad that it's much more likely to get big benefits by improving the
generated IR, compared to giving more hints to the optimizer.

By improving the generated IR, do you mean the step that turns pgsql
queries into LLVM IR? If that is the case, this design doesn't handle
that; it works only when the given program representation is already in
LLVM IR.

--
Have a great day!
PreeJackie

#6 Andres Freund
andres@anarazel.de
In reply to: preejackie (#5)
Re: New LLVM JIT Features

On 2019-04-03 10:44:06 +0530, preejackie wrote:

What do you mean by patchpoints here? To my knowledge, LLVM symbols
have an arbitrary stub associated with them, which resolves to the
function's address.

I was assuming that you'd want to improve latency by not compiling all
the functions at the start of the executor (like we currently do), but
instead having sub-functions compiled in the background. That'd require
patchpoints to be able to initially redirect to a function that waits
for compilation, which then can be changed to directly jump to the
function. We already just compile all the functions reachable at the
start of execution in one go, so it's not a one-by-one function affair.
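
The patchpoint mechanism described above can be sketched in Python as follows. This is an illustration only: `Patchpoint` is a hypothetical class, not an LLVM intrinsic or a PostgreSQL structure, and a real patchpoint rewrites a call or jump target in machine code instead of rebinding an attribute. The sketch assumes compilation runs on a background thread and that the call slot initially points at a stub that waits for it.

```python
import threading


class Patchpoint:
    """Call slot that starts at a waiting stub and is later patched."""

    def __init__(self, compile_fn):
        self._done = threading.Event()
        self._compiled = None
        # Initially every call is redirected to the waiting stub.
        self.target = self._wait_stub
        threading.Thread(
            target=self._compile, args=(compile_fn,), daemon=True
        ).start()

    def _compile(self, compile_fn):
        self._compiled = compile_fn()
        self._done.set()

    def _wait_stub(self, *args):
        # Block until background compilation has finished ...
        self._done.wait()
        # ... then patch the slot so later calls go straight to the code.
        self.target = self._compiled
        return self._compiled(*args)

    @property
    def is_patched(self):
        return self._compiled is not None and self.target is self._compiled

    def __call__(self, *args):
        # This extra indirection on every call is the cost attributed to
        # patchpoints in the discussion above.
        return self.target(*args)
```

For example, `Patchpoint(lambda: (lambda x: x * 2))` waits for compilation on its first call and dispatches directly to the compiled function on later calls.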

By improving the generated IR, do you mean the step that turns pgsql
queries into LLVM IR? If that is the case, this design doesn't handle
that; it works only when the given program representation is already in
LLVM IR.

My point is that we generate IR that's hard for LLVM to optimize, and
that fixing that is going to give you way bigger wins than
profile-guided optimization.

Greetings,

Andres Freund

#7 preejackie
praveenvelliengiri@gmail.com
In reply to: Andres Freund (#6)
Re: New LLVM JIT Features

Hi Andres,

Thanks for your thoughts; please see my comments inline.

On 03/04/19 10:53 AM, Andres Freund wrote:

I was assuming that you'd want to improve latency by not compiling all
the functions at the start of the executor (like we currently do), but
instead having sub-functions compiled in the background. That'd require
patchpoints to be able to initially redirect to a function that waits
for compilation, which then can be changed to directly jump to the
function. We already just compile all the functions reachable at the
start of execution in one go, so it's not a one-by-one function affair.

Compiling the whole module will increase the start-up time of the
application, right? Are there any techniques applied in pgsql to handle
this? Sometimes you will compile functions that you don't need
immediately, or that are never called at run time. This is the
trade-off between different JIT implementations. Also, adding
patchpoints to the generated code will degrade performance only when
the function has not been compiled ahead of time; in theory, these
patchpoint misses will go down as we increase the number of compiler
threads. And practically every computer has at least 4 cores nowadays.

My point is that we generate IR that's hard for LLVM to optimize, and
that fixing that is going to give you way bigger wins than
profile-guided optimization.

I think this is a problem on the pgsql side, but I'm proposing this
project for the LLVM community.

--
Have a great day!
PreeJackie