wrong query result with jit_above_cost= 0

Started by Rushabh Lathiaover 7 years ago11 messages
#1Rushabh Lathia
rushabh.lathia@gmail.com

Hi,

I found the below query which returns the wrong output
when jit_above_cost= 0 is set.

Steps to reproduce:

CREATE TABLE emp (
epno NUMERIC(4),
ename VARCHAR(10),
job VARCHAR(9),
mgr NUMERIC(4),
hiredate DATE,
sal NUMERIC(7,2),
comm NUMERIC(7,2),
deptno NUMERIC(2)
);

INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
INSERT INTO emp VALUES
(7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);

set jit_above_cost= 0;

select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;

without the ROLLUP, I don't see any problem with results.

Thanks,
Rushabh Lathia
www.EnterpriseDB.com

#2Andres Freund
andres@anarazel.de
In reply to: Rushabh Lathia (#1)
Re: wrong query result with jit_above_cost= 0

Hi,

On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote:

I found the below query which returns the wrong output
when jit_above_cost= 0 is set.

Steps to reproduce:

CREATE TABLE emp (
epno NUMERIC(4),
ename VARCHAR(10),
job VARCHAR(9),
mgr NUMERIC(4),
hiredate DATE,
sal NUMERIC(7,2),
comm NUMERIC(7,2),
deptno NUMERIC(2)
);

INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
INSERT INTO emp VALUES
(7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);

set jit_above_cost= 0;

select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;

without the ROLLUP, I don't see any problem with results.

Interesting. I've opened an open item referencing this.

Greetings,

Andres Freund

#3Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andres Freund (#2)
Re: wrong query result with jit_above_cost= 0

On 26 June 2018 at 20:23, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote:

I found the below query which returns the wrong output
when jit_above_cost= 0 is set.

Steps to reproduce:

CREATE TABLE emp (
epno NUMERIC(4),
ename VARCHAR(10),
job VARCHAR(9),
mgr NUMERIC(4),
hiredate DATE,
sal NUMERIC(7,2),
comm NUMERIC(7,2),
deptno NUMERIC(2)
);

INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);

set jit_above_cost= 0;

select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;

without the ROLLUP, I don't see any problem with results.

Interesting. I've opened an open item referencing this.

Hi,

Just out of curiosity, what exactly is wrong with the output of this query? I
see the same results with jit_above_cost = 0 and with the default value:

=# show jit_above_cost;
jit_above_cost
----------------
100000
(1 row)

=# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
max
------
7369
7499
7499
(3 rows)

=# set jit_above_cost = 0;
SET
=# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
max
------
7369
7499
7499
(3 rows)

And as far as I understand it's totally correct, since we do rollup by just two
values and have one more row as a total (with NULLs):

=# select max(epno), deptno, epno
from emp group by rollup((deptno,epno)) order by 1 asc;

max | deptno | epno
------+--------+------
7369 | 20 | 7369
7499 | NULL | NULL
7499 | 30 | 7499
(3 rows)

#4Andres Freund
andres@anarazel.de
In reply to: Dmitry Dolgov (#3)
Re: wrong query result with jit_above_cost= 0

On 2018-06-26 22:09:10 +0200, Dmitry Dolgov wrote:

On 26 June 2018 at 20:23, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote:

I found the below query which returns the wrong output
when jit_above_cost= 0 is set.

Steps to reproduce:

CREATE TABLE emp (
epno NUMERIC(4),
ename VARCHAR(10),
job VARCHAR(9),
mgr NUMERIC(4),
hiredate DATE,
sal NUMERIC(7,2),
comm NUMERIC(7,2),
deptno NUMERIC(2)
);

INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);

set jit_above_cost= 0;

select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;

without the ROLLUP, I don't see any problem with results.

Interesting. I've opened an open item referencing this.

Hi,

Just out of curiosity, what exactly is wrong with the output of this query? I
see the same results with jit_above_cost = 0 and with the default value:

=# show jit_above_cost;
jit_above_cost
----------------
100000
(1 row)

=# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
max
------
7369
7499
7499
(3 rows)

=# set jit_above_cost = 0;
SET
=# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
max
------
7369
7499
7499
(3 rows)

And as far as I understand it's totally correct, since we do rollup by just two
values and have one more row as a total (with NULLs):

=# select max(epno), deptno, epno
from emp group by rollup((deptno,epno)) order by 1 asc;

max | deptno | epno
------+--------+------
7369 | 20 | 7369
7499 | NULL | NULL
7499 | 30 | 7499
(3 rows)

I've not reproduced the problem yet (I'm deep in a review / edit of
another patchset). Could it be that you've not compiled with JIT
support and thus don't see the problem Rushab was complaining about?
SELECT pg_jit_available();

Greetings,

Andres Freund

#5Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andres Freund (#4)
Re: wrong query result with jit_above_cost= 0

On 26 June 2018 at 22:11, Andres Freund <andres@anarazel.de> wrote:
On 2018-06-26 22:09:10 +0200, Dmitry Dolgov wrote:

On 26 June 2018 at 20:23, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote:

I found the below query which returns the wrong output
when jit_above_cost= 0 is set.

Steps to reproduce:

CREATE TABLE emp (
epno NUMERIC(4),
ename VARCHAR(10),
job VARCHAR(9),
mgr NUMERIC(4),
hiredate DATE,
sal NUMERIC(7,2),
comm NUMERIC(7,2),
deptno NUMERIC(2)
);

INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);

set jit_above_cost= 0;

select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;

without the ROLLUP, I don't see any problem with results.

Interesting. I've opened an open item referencing this.

Hi,

Just out of curiosity, what exactly is wrong with the output of this query? I
see the same results with jit_above_cost = 0 and with the default value:

=# show jit_above_cost;
jit_above_cost
----------------
100000
(1 row)

=# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
max
------
7369
7499
7499
(3 rows)

=# set jit_above_cost = 0;
SET
=# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
max
------
7369
7499
7499
(3 rows)

And as far as I understand it's totally correct, since we do rollup by just two
values and have one more row as a total (with NULLs):

=# select max(epno), deptno, epno
from emp group by rollup((deptno,epno)) order by 1 asc;

max | deptno | epno
------+--------+------
7369 | 20 | 7369
7499 | NULL | NULL
7499 | 30 | 7499
(3 rows)

I've not reproduced the problem yet (I'm deep in a review / edit of
another patchset). Could it be that you've not compiled with JIT
support and thus don't see the problem Rushab was complaining about?
SELECT pg_jit_available();

Yep, my bad, forgot to turn it on. Now I see what's the problem, one of the
null fields is screwed up, will try to figure out why is that.

#6Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Dmitry Dolgov (#5)
Re: wrong query result with jit_above_cost= 0

"Dmitry" == Dmitry Dolgov <9erthalion6@gmail.com> writes:

Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the
Dmitry> problem, one of the null fields is screwed up, will try to
Dmitry> figure out why is that.

The handling of nulls in grouping set results is a bit icky, see
prepare_projection_slot in nodeAgg.c. The comment there lists a number
of assumptions which may or may not hold true under JIT which might give
a starting point to look for problems. (Unfortunately I'm not currently
in a position to test on a JIT build)

(This wasn't my first choice of approach; originally I had a GroupedVar
node that evaluated either as the Var would or to NULL instead, and put
that into the projection. However there was a lot of resistance to this
approach at the time, and the new node idea was abandoned in favour of
poking directly into the slot.)

--
Andrew (irc:RhodiumToad)

#7Andres Freund
andres@anarazel.de
In reply to: Andrew Gierth (#6)
Re: wrong query result with jit_above_cost= 0

On 2018-06-26 21:55:07 +0100, Andrew Gierth wrote:

"Dmitry" == Dmitry Dolgov <9erthalion6@gmail.com> writes:

Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the
Dmitry> problem, one of the null fields is screwed up, will try to
Dmitry> figure out why is that.

The handling of nulls in grouping set results is a bit icky, see
prepare_projection_slot in nodeAgg.c. The comment there lists a number
of assumptions which may or may not hold true under JIT which might give
a starting point to look for problems. (Unfortunately I'm not currently
in a position to test on a JIT build)

I probably just screwed up a bit of code generation. I can't see any of
the more fundamental assumptions being changed by the way JITing is
done.

Greetings,

Andres Freund

#8Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andres Freund (#7)
Re: wrong query result with jit_above_cost= 0

On 26 June 2018 at 22:56, Andres Freund <andres@anarazel.de> wrote:
On 2018-06-26 21:55:07 +0100, Andrew Gierth wrote:

"Dmitry" == Dmitry Dolgov <9erthalion6@gmail.com> writes:

Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the
Dmitry> problem, one of the null fields is screwed up, will try to
Dmitry> figure out why is that.

The handling of nulls in grouping set results is a bit icky, see
prepare_projection_slot in nodeAgg.c. The comment there lists a number
of assumptions which may or may not hold true under JIT which might give
a starting point to look for problems. (Unfortunately I'm not currently
in a position to test on a JIT build)

I probably just screwed up a bit of code generation. I can't see any of
the more fundamental assumptions being changed by the way JITing is
done.

So far I found out that in agg_retrieve_hash_table, when there is a scan for
TupleHashEntryData, that contains AggStatePerGroup structure in the field
"additional", it's possible to get some garbage data (or at least transValue is
lost). It happens when we do:

ReScanExprContext(aggstate->aggcontexts[i]);

in agg_retrieve_direct before that. Apparently, the reason is that in the jit
code there is a store operation for curaggcontext into aggcontext:

v_aggcontext = l_ptr_const(op->d.agg_trans.aggcontext,
l_ptr(StructExprContext));

/* set aggstate globals */
LLVMBuildStore(b, v_aggcontext, v_curaggcontext);

I haven't found anything similar in the original code or in the other branches
for aggregation logic. I can't say that I fully understand the idea behind it,
but at least it was suspicious for me. When I removed this operation, the
problem has disappeared.

#9Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#8)
2 attachment(s)
Re: wrong query result with jit_above_cost= 0

On Wed, 27 Jun 2018 at 17:49, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On 26 June 2018 at 22:56, Andres Freund <andres@anarazel.de> wrote:
On 2018-06-26 21:55:07 +0100, Andrew Gierth wrote:

"Dmitry" == Dmitry Dolgov <9erthalion6@gmail.com> writes:

Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the
Dmitry> problem, one of the null fields is screwed up, will try to
Dmitry> figure out why is that.

The handling of nulls in grouping set results is a bit icky, see
prepare_projection_slot in nodeAgg.c. The comment there lists a number
of assumptions which may or may not hold true under JIT which might give
a starting point to look for problems. (Unfortunately I'm not currently
in a position to test on a JIT build)

I probably just screwed up a bit of code generation. I can't see any of
the more fundamental assumptions being changed by the way JITing is
done.

So far I found out that in agg_retrieve_hash_table, when there is a scan for
TupleHashEntryData, that contains AggStatePerGroup structure in the field
"additional", it's possible to get some garbage data (or at least transValue is
lost). It happens when we do:

ReScanExprContext(aggstate->aggcontexts[i]);

in agg_retrieve_direct before that. Apparently, the reason is that in the jit
code there is a store operation for curaggcontext into aggcontext:

v_aggcontext = l_ptr_const(op->d.agg_trans.aggcontext,
l_ptr(StructExprContext));

/* set aggstate globals */
LLVMBuildStore(b, v_aggcontext, v_curaggcontext);

I haven't found anything similar in the original code or in the other branches
for aggregation logic. I can't say that I fully understand the idea behind it,
but at least it was suspicious for me. When I removed this operation, the
problem has disappeared.

Ok, looks like I found the issue. v_aggcontext & v_curaggcontext have nothing
to do here (this change was just masking a problem by changing a memory context
so that the wrong one will never be used). The problem was that in the
llvmjit_expr in AGG_INIT_TRANS section we need to assign a current memory
context from op->d.agg_init_trans.aggcontext (see the attached patch),
otherwise we'll get in the situation when current memory context is hashcontext
instead of aggcontext.

Also, I found one suspicious thing, in AGG_PLAIN_TRANS section we don't
switch the memory context back in the branch with ExecAggTransReparent. I
never found any consequences of that, but just in case I believe it makes sense
to do so.

And the last thing - where can I find a documentation about how to properly
apply patches for GDB & perf support to llvm? I remember they were posted here,
and found some of them here [1]/messages/by-id/20171005065739.dgsplipwkpmrkspg@alap3.anarazel.de from Andres, but apparently part of them was
already applied on top of llvm. Looks like for the gdb support I need to apply
0006-ORC-JIT-event-listener-support (since there is a gdb listener mentioned
there), but with this patch I have an error:

error: ‘ObjHandleT’ was not declared in this scope

So I'm confused how should it be?

[1]: /messages/by-id/20171005065739.dgsplipwkpmrkspg@alap3.anarazel.de

Attachments:

llvmjit_ctx_switch.patchtext/x-patch; charset=US-ASCII; name=llvmjit_ctx_switch.patchDownload
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index d04ff62..7e7387e 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -2491,6 +2491,7 @@ llvm_compile_expr(ExprState *state)
 						/* store trans value */
 						LLVMBuildStore(b, v_newval, v_transvaluep);
 						LLVMBuildStore(b, v_fcinfo_isnull, v_transnullp);
+						l_mcxt_switch(mod, b, v_oldcontext);
 						LLVMBuildBr(b, opblocks[i + 1]);
 
 						/* returned datum passed datum, no need to reparent */
llvmjit_curaggcontext.patchtext/x-patch; charset=US-ASCII; name=llvmjit_curaggcontext.patchDownload
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 36c5f7d..d04ff62 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -2186,6 +2186,9 @@ llvm_compile_expr(ExprState *state)
 
 					LLVMBasicBlockRef b_init;
 
+					LLVMValueRef v_curaggcontext;
+					LLVMValueRef v_aggcontext;
+
 					aggstate = op->d.agg_init_trans.aggstate;
 					pertrans = op->d.agg_init_trans.pertrans;
 
@@ -2229,6 +2232,17 @@ llvm_compile_expr(ExprState *state)
 					{
 						LLVMValueRef params[3];
 
+						v_aggcontext = l_ptr_const(op->d.agg_init_trans.aggcontext,
+												   l_ptr(StructExprContext));
+
+						v_curaggcontext =
+							LLVMBuildStructGEP(b,
+											   v_aggstatep,
+											   FIELDNO_AGGSTATE_CURAGGCONTEXT,
+											   "aggstate.curaggcontext");
+
+						LLVMBuildStore(b, v_aggcontext, v_curaggcontext);
+
 						params[0] = v_aggstatep;
 						params[1] = v_pertransp;
 						params[2] = v_pergroupp;
#10Andres Freund
andres@anarazel.de
In reply to: Dmitry Dolgov (#9)
Re: wrong query result with jit_above_cost= 0

Hi,

On 2018-07-07 21:41:05 +0200, Dmitry Dolgov wrote:

Ok, looks like I found the issue. v_aggcontext & v_curaggcontext have nothing
to do here (this change was just masking a problem by changing a memory context
so that the wrong one will never be used). The problem was that in the
llvmjit_expr in AGG_INIT_TRANS section we need to assign a current memory
context from op->d.agg_init_trans.aggcontext (see the attached patch),
otherwise we'll get in the situation when current memory context is hashcontext
instead of aggcontext.

Nice catch! I pushed your fix, but I also made it set current_set.

Also, I found one suspicious thing, in AGG_PLAIN_TRANS section we don't
switch the memory context back in the branch with ExecAggTransReparent. I
never found any consequences of that, but just in case I believe it makes sense
to do so.

I'll look at that next.

And the last thing - where can I find a documentation about how to properly
apply patches for GDB & perf support to llvm? I remember they were posted here,
and found some of them here [1] from Andres, but apparently part of them was
already applied on top of llvm. Looks like for the gdb support I need to apply
0006-ORC-JIT-event-listener-support (since there is a gdb listener mentioned
there), but with this patch I have an error:

error: ‘ObjHandleT’ was not declared in this scope

So I'm confused how should it be?

I've merged the GDB part into LLVM, and am about to merge the perf part
too. I plan to push a fix to PG adapting it to use the agreed upon /
merged LLVM APIs. Then you'll just need a recent LLVM
checkout. Unfortunately the relevant LLVM internal APIs have changed
quite rapidly over the last few releases (and a lot within individual
releases), so it's not easy to provide a patch for the individual versions.

Greetings,

Andres Freund

#11Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#10)
Re: wrong query result with jit_above_cost= 0

Hi,

On 2018-07-22 17:09:37 -0700, Andres Freund wrote:

Also, I found one suspicious thing, in AGG_PLAIN_TRANS section we don't
switch the memory context back in the branch with ExecAggTransReparent. I
never found any consequences of that, but just in case I believe it makes sense
to do so.

I'll look at that next.

That's indeed necessary. Pushed. Thanks!

And the last thing - where can I find a documentation about how to properly
apply patches for GDB & perf support to llvm? I remember they were posted here,
and found some of them here [1] from Andres, but apparently part of them was
already applied on top of llvm. Looks like for the gdb support I need to apply
0006-ORC-JIT-event-listener-support (since there is a gdb listener mentioned
there), but with this patch I have an error:

error: ‘ObjHandleT’ was not declared in this scope

So I'm confused how should it be?

I've merged the GDB part into LLVM, and am about to merge the perf part
too. I plan to push a fix to PG adapting it to use the agreed upon /
merged LLVM APIs. Then you'll just need a recent LLVM
checkout. Unfortunately the relevant LLVM internal APIs have changed
quite rapidly over the last few releases (and a lot within individual
releases), so it's not easy to provide a patch for the individual versions.

Adapted the PG bits.

Greetings,

Andres Freund