wrong query result with jit_above_cost= 0
Hi,
I found the below query which returns the wrong output
when jit_above_cost= 0 is set.
Steps to reproduce:
CREATE TABLE emp (
epno NUMERIC(4),
ename VARCHAR(10),
job VARCHAR(9),
mgr NUMERIC(4),
hiredate DATE,
sal NUMERIC(7,2),
comm NUMERIC(7,2),
deptno NUMERIC(2)
);
INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
INSERT INTO emp VALUES
(7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);
set jit_above_cost= 0;
select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
without the ROLLUP, I don't see any problem with results.
Thanks,
Rushabh Lathia
www.EnterpriseDB.com
Hi,
On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote:
I found the below query which returns the wrong output
when jit_above_cost= 0 is set.Steps to reproduce:
CREATE TABLE emp (
epno NUMERIC(4),
ename VARCHAR(10),
job VARCHAR(9),
mgr NUMERIC(4),
hiredate DATE,
sal NUMERIC(7,2),
comm NUMERIC(7,2),
deptno NUMERIC(2)
);INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
INSERT INTO emp VALUES
(7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);set jit_above_cost= 0;
select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
without the ROLLUP, I don't see any problem with results.
Interesting. I've opened an open item referencing this.
Greetings,
Andres Freund
On 26 June 2018 at 20:23, Andres Freund <andres@anarazel.de> wrote:
Hi,On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote:
I found the below query which returns the wrong output
when jit_above_cost= 0 is set.Steps to reproduce:
CREATE TABLE emp (
epno NUMERIC(4),
ename VARCHAR(10),
job VARCHAR(9),
mgr NUMERIC(4),
hiredate DATE,
sal NUMERIC(7,2),
comm NUMERIC(7,2),
deptno NUMERIC(2)
);INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);set jit_above_cost= 0;
select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
without the ROLLUP, I don't see any problem with results.
Interesting. I've opened an open item referencing this.
Hi,
Just out of curiosity, what exactly is wrong with the output of this query? I
see the same results with jit_above_cost = 0 and with the default value:
=# show jit_above_cost;
jit_above_cost
----------------
100000
(1 row)
=# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
max
------
7369
7499
7499
(3 rows)
=# set jit_above_cost = 0;
SET
=# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
max
------
7369
7499
7499
(3 rows)
And as far as I understand it's totally correct, since we do rollup by just two
values and have one more row as a total (with NULLs):
=# select max(epno), deptno, epno
from emp group by rollup((deptno,epno)) order by 1 asc;
max | deptno | epno
------+--------+------
7369 | 20 | 7369
7499 | NULL | NULL
7499 | 30 | 7499
(3 rows)
On 2018-06-26 22:09:10 +0200, Dmitry Dolgov wrote:
On 26 June 2018 at 20:23, Andres Freund <andres@anarazel.de> wrote:
Hi,On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote:
I found the below query which returns the wrong output
when jit_above_cost= 0 is set.Steps to reproduce:
CREATE TABLE emp (
epno NUMERIC(4),
ename VARCHAR(10),
job VARCHAR(9),
mgr NUMERIC(4),
hiredate DATE,
sal NUMERIC(7,2),
comm NUMERIC(7,2),
deptno NUMERIC(2)
);INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);set jit_above_cost= 0;
select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
without the ROLLUP, I don't see any problem with results.
Interesting. I've opened an open item referencing this.
Hi,
Just out of curiosity, what exactly is wrong with the output of this query? I
see the same results with jit_above_cost = 0 and with the default value:=# show jit_above_cost;
jit_above_cost
----------------
100000
(1 row)=# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
max
------
7369
7499
7499
(3 rows)=# set jit_above_cost = 0;
SET
=# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
max
------
7369
7499
7499
(3 rows)And as far as I understand it's totally correct, since we do rollup by just two
values and have one more row as a total (with NULLs):=# select max(epno), deptno, epno
from emp group by rollup((deptno,epno)) order by 1 asc;max | deptno | epno
------+--------+------
7369 | 20 | 7369
7499 | NULL | NULL
7499 | 30 | 7499
(3 rows)
I've not reproduced the problem yet (I'm deep in a review / edit of
another patchset). Could it be that you've not compiled with JIT
support and thus don't see the problem Rushab was complaining about?
SELECT pg_jit_available();
Greetings,
Andres Freund
On 26 June 2018 at 22:11, Andres Freund <andres@anarazel.de> wrote:
On 2018-06-26 22:09:10 +0200, Dmitry Dolgov wrote:On 26 June 2018 at 20:23, Andres Freund <andres@anarazel.de> wrote:
Hi,On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote:
I found the below query which returns the wrong output
when jit_above_cost= 0 is set.Steps to reproduce:
CREATE TABLE emp (
epno NUMERIC(4),
ename VARCHAR(10),
job VARCHAR(9),
mgr NUMERIC(4),
hiredate DATE,
sal NUMERIC(7,2),
comm NUMERIC(7,2),
deptno NUMERIC(2)
);INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);set jit_above_cost= 0;
select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
without the ROLLUP, I don't see any problem with results.
Interesting. I've opened an open item referencing this.
Hi,
Just out of curiosity, what exactly is wrong with the output of this query? I
see the same results with jit_above_cost = 0 and with the default value:=# show jit_above_cost;
jit_above_cost
----------------
100000
(1 row)=# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
max
------
7369
7499
7499
(3 rows)=# set jit_above_cost = 0;
SET
=# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
max
------
7369
7499
7499
(3 rows)And as far as I understand it's totally correct, since we do rollup by just two
values and have one more row as a total (with NULLs):=# select max(epno), deptno, epno
from emp group by rollup((deptno,epno)) order by 1 asc;max | deptno | epno
------+--------+------
7369 | 20 | 7369
7499 | NULL | NULL
7499 | 30 | 7499
(3 rows)I've not reproduced the problem yet (I'm deep in a review / edit of
another patchset). Could it be that you've not compiled with JIT
support and thus don't see the problem Rushab was complaining about?
SELECT pg_jit_available();
Yep, my bad, forgot to turn it on. Now I see what's the problem, one of the
null fields is screwed up, will try to figure out why is that.
"Dmitry" == Dmitry Dolgov <9erthalion6@gmail.com> writes:
Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the
Dmitry> problem, one of the null fields is screwed up, will try to
Dmitry> figure out why is that.
The handling of nulls in grouping set results is a bit icky, see
prepare_projection_slot in nodeAgg.c. The comment there lists a number
of assumptions which may or may not hold true under JIT which might give
a starting point to look for problems. (Unfortunately I'm not currently
in a position to test on a JIT build)
(This wasn't my first choice of approach; originally I had a GroupedVar
node that evaluated either as the Var would or to NULL instead, and put
that into the projection. However there was a lot of resistance to this
approach at the time, and the new node idea was abandoned in favour of
poking directly into the slot.)
--
Andrew (irc:RhodiumToad)
On 2018-06-26 21:55:07 +0100, Andrew Gierth wrote:
"Dmitry" == Dmitry Dolgov <9erthalion6@gmail.com> writes:
Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the
Dmitry> problem, one of the null fields is screwed up, will try to
Dmitry> figure out why is that.The handling of nulls in grouping set results is a bit icky, see
prepare_projection_slot in nodeAgg.c. The comment there lists a number
of assumptions which may or may not hold true under JIT which might give
a starting point to look for problems. (Unfortunately I'm not currently
in a position to test on a JIT build)
I probably just screwed up a bit of code generation. I can't see any of
the more fundamental assumptions being changed by the way JITing is
done.
Greetings,
Andres Freund
On 26 June 2018 at 22:56, Andres Freund <andres@anarazel.de> wrote:
On 2018-06-26 21:55:07 +0100, Andrew Gierth wrote:"Dmitry" == Dmitry Dolgov <9erthalion6@gmail.com> writes:
Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the
Dmitry> problem, one of the null fields is screwed up, will try to
Dmitry> figure out why is that.The handling of nulls in grouping set results is a bit icky, see
prepare_projection_slot in nodeAgg.c. The comment there lists a number
of assumptions which may or may not hold true under JIT which might give
a starting point to look for problems. (Unfortunately I'm not currently
in a position to test on a JIT build)I probably just screwed up a bit of code generation. I can't see any of
the more fundamental assumptions being changed by the way JITing is
done.
So far I found out that in agg_retrieve_hash_table, when there is a scan for
TupleHashEntryData, that contains AggStatePerGroup structure in the field
"additional", it's possible to get some garbage data (or at least transValue is
lost). It happens when we do:
ReScanExprContext(aggstate->aggcontexts[i]);
in agg_retrieve_direct before that. Apparently, the reason is that in the jit
code there is a store operation for curaggcontext into aggcontext:
v_aggcontext = l_ptr_const(op->d.agg_trans.aggcontext,
l_ptr(StructExprContext));
/* set aggstate globals */
LLVMBuildStore(b, v_aggcontext, v_curaggcontext);
I haven't found anything similar in the original code or in the other branches
for aggregation logic. I can't say that I fully understand the idea behind it,
but at least it was suspicious for me. When I removed this operation, the
problem has disappeared.
On Wed, 27 Jun 2018 at 17:49, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
On 26 June 2018 at 22:56, Andres Freund <andres@anarazel.de> wrote:
On 2018-06-26 21:55:07 +0100, Andrew Gierth wrote:"Dmitry" == Dmitry Dolgov <9erthalion6@gmail.com> writes:
Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the
Dmitry> problem, one of the null fields is screwed up, will try to
Dmitry> figure out why is that.The handling of nulls in grouping set results is a bit icky, see
prepare_projection_slot in nodeAgg.c. The comment there lists a number
of assumptions which may or may not hold true under JIT which might give
a starting point to look for problems. (Unfortunately I'm not currently
in a position to test on a JIT build)I probably just screwed up a bit of code generation. I can't see any of
the more fundamental assumptions being changed by the way JITing is
done.So far I found out that in agg_retrieve_hash_table, when there is a scan for
TupleHashEntryData, that contains AggStatePerGroup structure in the field
"additional", it's possible to get some garbage data (or at least transValue is
lost). It happens when we do:ReScanExprContext(aggstate->aggcontexts[i]);
in agg_retrieve_direct before that. Apparently, the reason is that in the jit
code there is a store operation for curaggcontext into aggcontext:v_aggcontext = l_ptr_const(op->d.agg_trans.aggcontext,
l_ptr(StructExprContext));/* set aggstate globals */
LLVMBuildStore(b, v_aggcontext, v_curaggcontext);I haven't found anything similar in the original code or in the other branches
for aggregation logic. I can't say that I fully understand the idea behind it,
but at least it was suspicious for me. When I removed this operation, the
problem has disappeared.
Ok, looks like I found the issue. v_aggcontext & v_curaggcontext have nothing
to do here (this change was just masking a problem by changing a memory context
so that the wrong one will never be used). The problem was that in the
llvmjit_expr in AGG_INIT_TRANS section we need to assign a current memory
context from op->d.agg_init_trans.aggcontext (see the attached patch),
otherwise we'll get in the situation when current memory context is hashcontext
instead of aggcontext.
Also, I found one suspicious thing, in AGG_PLAIN_TRANS section we don't
switch the memory context back in the branch with ExecAggTransReparent. I
never found any consequences of that, but just in case I believe it makes sense
to do so.
And the last thing - where can I find a documentation about how to properly
apply patches for GDB & perf support to llvm? I remember they were posted here,
and found some of them here [1]/messages/by-id/20171005065739.dgsplipwkpmrkspg@alap3.anarazel.de from Andres, but apparently part of them was
already applied on top of llvm. Looks like for the gdb support I need to apply
0006-ORC-JIT-event-listener-support (since there is a gdb listener mentioned
there), but with this patch I have an error:
error: ‘ObjHandleT’ was not declared in this scope
So I'm confused how should it be?
[1]: /messages/by-id/20171005065739.dgsplipwkpmrkspg@alap3.anarazel.de
Attachments:
llvmjit_ctx_switch.patchtext/x-patch; charset=US-ASCII; name=llvmjit_ctx_switch.patchDownload
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index d04ff62..7e7387e 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -2491,6 +2491,7 @@ llvm_compile_expr(ExprState *state)
/* store trans value */
LLVMBuildStore(b, v_newval, v_transvaluep);
LLVMBuildStore(b, v_fcinfo_isnull, v_transnullp);
+ l_mcxt_switch(mod, b, v_oldcontext);
LLVMBuildBr(b, opblocks[i + 1]);
/* returned datum passed datum, no need to reparent */
llvmjit_curaggcontext.patchtext/x-patch; charset=US-ASCII; name=llvmjit_curaggcontext.patchDownload
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 36c5f7d..d04ff62 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -2186,6 +2186,9 @@ llvm_compile_expr(ExprState *state)
LLVMBasicBlockRef b_init;
+ LLVMValueRef v_curaggcontext;
+ LLVMValueRef v_aggcontext;
+
aggstate = op->d.agg_init_trans.aggstate;
pertrans = op->d.agg_init_trans.pertrans;
@@ -2229,6 +2232,17 @@ llvm_compile_expr(ExprState *state)
{
LLVMValueRef params[3];
+ v_aggcontext = l_ptr_const(op->d.agg_init_trans.aggcontext,
+ l_ptr(StructExprContext));
+
+ v_curaggcontext =
+ LLVMBuildStructGEP(b,
+ v_aggstatep,
+ FIELDNO_AGGSTATE_CURAGGCONTEXT,
+ "aggstate.curaggcontext");
+
+ LLVMBuildStore(b, v_aggcontext, v_curaggcontext);
+
params[0] = v_aggstatep;
params[1] = v_pertransp;
params[2] = v_pergroupp;
Hi,
On 2018-07-07 21:41:05 +0200, Dmitry Dolgov wrote:
Ok, looks like I found the issue. v_aggcontext & v_curaggcontext have nothing
to do here (this change was just masking a problem by changing a memory context
so that the wrong one will never be used). The problem was that in the
llvmjit_expr in AGG_INIT_TRANS section we need to assign a current memory
context from op->d.agg_init_trans.aggcontext (see the attached patch),
otherwise we'll get in the situation when current memory context is hashcontext
instead of aggcontext.
Nice catch! I pushed your fix, but I also made it set current_set.
Also, I found one suspicious thing, in AGG_PLAIN_TRANS section we don't
switch the memory context back in the branch with ExecAggTransReparent. I
never found any consequences of that, but just in case I believe it makes sense
to do so.
I'll look at that next.
And the last thing - where can I find a documentation about how to properly
apply patches for GDB & perf support to llvm? I remember they were posted here,
and found some of them here [1] from Andres, but apparently part of them was
already applied on top of llvm. Looks like for the gdb support I need to apply
0006-ORC-JIT-event-listener-support (since there is a gdb listener mentioned
there), but with this patch I have an error:error: ‘ObjHandleT’ was not declared in this scope
So I'm confused how should it be?
I've merged the GDB part into LLVM, and am about to merge the perf part
too. I plan to push a fix to PG adapting it to use the agreed upon /
merged LLVM APIs. Then you'll just need a recent LLVM
checkout. Unfortunately the relevant LLVM internal APIs have changed
quite rapidly over the last few releases (and a lot within individual
releases), so it's not easy to provide a patch for the individual versions.
Greetings,
Andres Freund
Hi,
On 2018-07-22 17:09:37 -0700, Andres Freund wrote:
Also, I found one suspicious thing, in AGG_PLAIN_TRANS section we don't
switch the memory context back in the branch with ExecAggTransReparent. I
never found any consequences of that, but just in case I believe it makes sense
to do so.I'll look at that next.
That's indeed necessary. Pushed. Thanks!
And the last thing - where can I find a documentation about how to properly
apply patches for GDB & perf support to llvm? I remember they were posted here,
and found some of them here [1] from Andres, but apparently part of them was
already applied on top of llvm. Looks like for the gdb support I need to apply
0006-ORC-JIT-event-listener-support (since there is a gdb listener mentioned
there), but with this patch I have an error:error: ‘ObjHandleT’ was not declared in this scope
So I'm confused how should it be?
I've merged the GDB part into LLVM, and am about to merge the perf part
too. I plan to push a fix to PG adapting it to use the agreed upon /
merged LLVM APIs. Then you'll just need a recent LLVM
checkout. Unfortunately the relevant LLVM internal APIs have changed
quite rapidly over the last few releases (and a lot within individual
releases), so it's not easy to provide a patch for the individual versions.
Adapted the PG bits.
Greetings,
Andres Freund