Query JITing with LLVM ORC

Started by João Paulo Labegalini de Carvalho over 3 years ago, 7 messages

Hi all,

I am working on a project with LLVM ORC that led us to PostgreSQL as a
target application. We were surprised to learn that PostgreSQL already
uses LLVM ORC to JIT certain queries.

I would love to know what motivated this feature and what it is
currently used for, as it is not enabled by default.

Thanks.

--
João Paulo L. de Carvalho
Ph.D Computer Science | IC-UNICAMP | Campinas, SP - Brazil
Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada
joao.carvalho@ic.unicamp.br
joao.carvalho@ualberta.ca

#2 Thomas Munro
thomas.munro@gmail.com
In reply to: João Paulo Labegalini de Carvalho (#1)
Re: Query JITing with LLVM ORC

On Thu, Sep 22, 2022 at 4:17 AM João Paulo Labegalini de Carvalho
<jaopaulolc@gmail.com> wrote:

I am working on a project with LLVM ORC that led us to PostgreSQL as a target application. We were surprised to learn that PostgreSQL already uses LLVM ORC to JIT certain queries.

It JITs expressions but not whole queries. Query execution at the
tuple-flow level is still done using a C call stack the same shape as
the query plan, but it *could* be transformed to a different control
flow that could be run more efficiently and perhaps JITed. CCing
Andres who developed all this and had some ideas about that...
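
As a rough analogy only (this is not PostgreSQL code), the distinction
above can be sketched in Python. The plan nodes `seq_scan` and
`filter_node`, the tuple-based expression format, and `compile_expr` are
all hypothetical names invented for this illustration. Tuples flow
through nested iterator calls shaped like the plan, while the per-tuple
expression is either walked by an interpreter or compiled once into a
single callable, which is the part that expression JIT targets.

```python
from typing import Callable, Iterable, Iterator, Tuple

Row = Tuple[int, ...]

# Plan nodes: each node pulls rows from its child, so the call stack
# has the same shape as the plan tree (hypothetical, simplified).
def seq_scan(rows: Iterable[Row]) -> Iterator[Row]:
    for row in rows:
        yield row

def filter_node(child: Iterator[Row],
                predicate: Callable[[Row], bool]) -> Iterator[Row]:
    for row in child:
        if predicate(row):
            yield row

# Interpreted evaluation: the expression tree is re-walked for every row.
def interpret(expr, row: Row):
    op = expr[0]
    if op == "col":
        return row[expr[1]]
    if op == "const":
        return expr[1]
    if op == "add":
        return interpret(expr[1], row) + interpret(expr[2], row)
    if op == "gt":
        return interpret(expr[1], row) > interpret(expr[2], row)
    raise ValueError(f"unknown operator: {op}")

# "Compiled" evaluation: the tree is translated once into one callable.
def compile_expr(expr) -> Callable[[Row], bool]:
    def emit(e) -> str:
        op = e[0]
        if op == "col":
            return f"row[{int(e[1])}]"
        if op == "const":
            return repr(int(e[1]))
        if op == "add":
            return f"({emit(e[1])} + {emit(e[2])})"
        if op == "gt":
            return f"({emit(e[1])} > {emit(e[2])})"
        raise ValueError(f"unknown operator: {op}")
    return eval("lambda row: " + emit(expr))

ROWS = [(5, 1), (20, 2), (11, 3)]
# Predicate: col0 + col1 > 12
EXPR = ("gt", ("add", ("col", 0), ("col", 1)), ("const", 12))

interpreted = list(filter_node(seq_scan(ROWS), lambda r: interpret(EXPR, r)))
compiled = list(filter_node(seq_scan(ROWS), compile_expr(EXPR)))
```

In both runs the tuple flow (scan feeding filter) is unchanged; only the
way the predicate is evaluated differs.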

I would love to know what motivated this feature and what it is currently used for,

https://www.postgresql.org/docs/current/jit-reason.html

as it is not enabled by default.

It's enabled by default in v12 and higher (if you built with
--with-llvm, as packagers do), but not always used:

https://www.postgresql.org/docs/current/jit-decision.html
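
The "enabled but not always used" behaviour can be sketched as a small
Python model. This is a simplified illustration of the cost-threshold
decision described in the documentation linked above, not the planner's
actual code. The setting names (`jit`, `jit_above_cost`,
`jit_inline_above_cost`, `jit_optimize_above_cost`) and their default
values are the documented ones; the `JitSettings` class and
`jit_decision` function are hypothetical names, and the model assumes a
server built with --with-llvm.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class JitSettings:
    # Documented defaults for v12 and later; -1 disables a threshold.
    jit: bool = True
    jit_above_cost: float = 100_000
    jit_inline_above_cost: float = 500_000
    jit_optimize_above_cost: float = 500_000

def jit_decision(total_cost: float,
                 s: JitSettings = JitSettings()) -> Dict[str, bool]:
    """Simplified model: compare the plan's estimated total cost
    against each threshold."""
    perform = bool(s.jit
                   and s.jit_above_cost >= 0
                   and total_cost > s.jit_above_cost)
    return {
        "perform": perform,
        "inline": bool(perform
                       and s.jit_inline_above_cost >= 0
                       and total_cost > s.jit_inline_above_cost),
        "optimize": bool(perform
                         and s.jit_optimize_above_cost >= 0
                         and total_cost > s.jit_optimize_above_cost),
    }

cheap_plan = jit_decision(50.0)           # e.g. a single-row lookup
medium_plan = jit_decision(200_000.0)     # JIT, but no inlining
expensive_plan = jit_decision(1_000_000.0)
```

Under this model a plan with a low estimated cost never reaches the JIT
provider at all, even though `jit` is on.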

#3 João Paulo Labegalini de Carvalho
jaopaulolc@gmail.com
In reply to: Thomas Munro (#2)
Re: Query JITing with LLVM ORC

Hi Thomas,

It JITs expressions but not whole queries.

Thanks for the clarification.

Query execution at the
tuple-flow level is still done using a C call stack the same shape as
the query plan, but it *could* be transformed to a different control
flow that could be run more efficiently and perhaps JITed.

I see, so there is room for extending the use of Orc JIT in PGSQL.

CCing
Andres who developed all this and had some ideas about that...

Thanks for CCing Andres, it will be great to hear from him.

I would love to know what motivated this feature and what it is
currently used for,

https://www.postgresql.org/docs/current/jit-reason.html

In that link I found the README under src/backend/jit, which was very
helpful.

as it is not enabled by default.

It's enabled by default in v12 and higher (if you built with
--with-llvm, as packagers do), but not always used:

https://www.postgresql.org/docs/current/jit-decision.html

Good to know. I compiled from the REL_14_5 tag and did a simple experiment
to contrast building with and w/o passing --with-llvm.
I ran the TPC-C benchmark with 1 warehouse, 10 terminals, 20min of ramp-up,
and 120 of measurement time.
The number of transactions per minute was about the same with & w/o JITing.
Is this expected? Should I use a different benchmark to observe a
performance difference?

Regards,

--
João Paulo L. de Carvalho
Ph.D Computer Science | IC-UNICAMP | Campinas, SP - Brazil
Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada
joao.carvalho@ic.unicamp.br
joao.carvalho@ualberta.ca

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: João Paulo Labegalini de Carvalho (#3)
Re: Query JITing with LLVM ORC

João Paulo Labegalini de Carvalho <jaopaulolc@gmail.com> writes:

Good to know. I compiled from the REL_14_5 tag and did a simple experiment
to contrast building with and w/o passing --with-llvm.
I ran the TPC-C benchmark with 1 warehouse, 10 terminals, 20min of ramp-up,
and 120 of measurement time.
The number of transactions per minute was about the same with & w/o JITing.
Is this expected? Should I use a different benchmark to observe a
performance difference?

TPC-C is mostly short queries, so we aren't likely to choose to use JIT
(and if we did, it'd likely be slower). You need a long query that will
execute the same expressions over and over for it to make sense to
compile them. Did you check whether any JIT was happening there?

There are a bunch of issues in this area concerning whether our cost
models are good enough to accurately predict whether JIT is a good
idea. But single-row fetches and updates are basically never going
to use it, nor should they.

regards, tom lane
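
The amortization argument above can be made concrete with a
back-of-the-envelope model in Python. All figures are hypothetical
placeholders chosen for illustration, not measurements, and the function
names are invented here: JIT only pays off when the per-row saving,
multiplied by the number of rows, exceeds the one-time compilation cost.

```python
import math
from typing import Optional

NS_PER_MS = 1_000_000

def jit_pays_off(rows: int, interp_ns_per_row: int,
                 jit_ns_per_row: int, compile_ms: int) -> bool:
    """True if the total time saved exceeds the one-time compile cost."""
    saved_ns = rows * (interp_ns_per_row - jit_ns_per_row)
    return saved_ns > compile_ms * NS_PER_MS

def break_even_rows(interp_ns_per_row: int, jit_ns_per_row: int,
                    compile_ms: int) -> Optional[int]:
    """Rows needed to recover the compile cost, or None if compiled
    code is not faster per row."""
    saving = interp_ns_per_row - jit_ns_per_row
    if saving <= 0:
        return None
    return math.ceil(compile_ms * NS_PER_MS / saving)

# Hypothetical numbers: 100 ns/row interpreted, 50 ns/row compiled,
# 10 ms to compile.
single_row = jit_pays_off(1, 100, 50, 10)
big_scan = jit_pays_off(10_000_000, 100, 50, 10)
threshold = break_even_rows(100, 50, 10)
```

With these placeholder numbers a single-row fetch can never recover the
compile time, while a scan over millions of rows easily does, which
matches the point about short TPC-C style queries.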

#5 Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#4)
Re: Query JITing with LLVM ORC

On Thu, Sep 22, 2022 at 10:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

João Paulo Labegalini de Carvalho <jaopaulolc@gmail.com> writes:

Good to know. I compiled from the REL_14_5 tag and did a simple experiment
to contrast building with and w/o passing --with-llvm.
I ran the TPC-C benchmark with 1 warehouse, 10 terminals, 20min of ramp-up,
and 120 of measurement time.
The number of transactions per minute was about the same with & w/o JITing.
Is this expected? Should I use a different benchmark to observe a
performance difference?

TPC-C is mostly short queries, so we aren't likely to choose to use JIT
(and if we did, it'd likely be slower). You need a long query that will
execute the same expressions over and over for it to make sense to
compile them. Did you check whether any JIT was happening there?

See also the proposal thread which has some earlier numbers from TPC-H.

/messages/by-id/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de

#6 Thomas Munro
thomas.munro@gmail.com
In reply to: João Paulo Labegalini de Carvalho (#3)
Re: Query JITing with LLVM ORC

On Thu, Sep 22, 2022 at 10:04 AM João Paulo Labegalini de Carvalho
<jaopaulolc@gmail.com> wrote:

building with and w/o passing --with-llvm

BTW you can also just turn it off with runtime settings, no need to rebuild.
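
A minimal sketch of what "runtime settings" could look like when
scripting a benchmark, written as a Python helper that only builds the
SQL text. The `jit` setting, `ALTER SYSTEM`, and `pg_reload_conf()` are
real PostgreSQL features; the helper name `jit_setting_sql` and its
`scope` parameter are hypothetical, and how the statements get executed
(psql, a driver, the benchmark tool) is left open.

```python
from typing import List

def jit_setting_sql(enabled: bool, scope: str = "session") -> List[str]:
    """Return SQL statements that switch JIT on or off.

    scope="session": affects only the current connection.
    scope="system":  persists the setting and reloads the configuration,
                     so new sessions pick it up without a rebuild.
    """
    value = "on" if enabled else "off"
    if scope == "session":
        return [f"SET jit = {value};"]
    if scope == "system":
        return [f"ALTER SYSTEM SET jit = {value};",
                "SELECT pg_reload_conf();"]
    raise ValueError(f"unknown scope: {scope}")

session_off = jit_setting_sql(False)
system_on = jit_setting_sql(True, scope="system")
```

Running the same --with-llvm build twice, once with each setting, keeps
everything else about the two benchmark runs identical.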

#7 João Paulo Labegalini de Carvalho
jaopaulolc@gmail.com
In reply to: Thomas Munro (#6)
Re: Query JITing with LLVM ORC

Tom & Thomas:

Thank you so much, those are very useful comments.

I noticed that I didn't make my intentions very clear. My team's goal is
to evaluate whether there are any gains from JITing PostgreSQL itself, or
at least parts of it, rather than expressions or parts of a query.

The rationale for using PostgreSQL is that DBs are long-running
applications, so the cost of JITing can be amortized.

We have a prototype LLVM IR pass that outlines functions in a program to
JIT and an ORC-based runtime to re-compile functions. Our goal is to see
improvements due to target/sub-target specialization.

The reason I was looking at benchmarks is to have a workload to profile
PostgreSQL and find its bottlenecks. The hot functions would then be
outlined for JITing.
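
One way the "find the hot functions" step might be sketched in Python,
assuming the profiler output has already been reduced to a mapping from
function name to sample count. The function names in the example, the
`hot_functions` helper, and the coverage threshold are all hypothetical
and not taken from an actual PostgreSQL profile.

```python
from typing import Dict, List

def hot_functions(samples: Dict[str, int], coverage: float = 0.8) -> List[str]:
    """Pick the most-sampled functions until they account for at least
    `coverage` of all samples."""
    total = sum(samples.values())
    if total <= 0:
        return []
    picked: List[str] = []
    covered = 0
    for name, count in sorted(samples.items(),
                              key=lambda kv: kv[1], reverse=True):
        if covered / total >= coverage:
            break
        picked.append(name)
        covered += count
    return picked

# Placeholder profile: sample counts per function.
PROFILE = {"fn_a": 50, "fn_b": 30, "fn_c": 15, "fn_d": 5}
candidates = hot_functions(PROFILE, coverage=0.75)
```

The resulting list would be the candidate set handed to the outlining
pass; the threshold trades the number of outlined functions against the
share of run time they cover.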

On Wed., Sep. 21, 2022, 4:54 p.m. Thomas Munro, <thomas.munro@gmail.com>
wrote:

On Thu, Sep 22, 2022 at 10:04 AM João Paulo Labegalini de Carvalho
<jaopaulolc@gmail.com> wrote:

building with and w/o passing --with-llvm

BTW you can also just turn it off with runtime settings, no need to
rebuild.