jsonb crash

Started by Jaime Casanovaover 4 years ago23 messageshackers
Jump to latest
#1Jaime Casanova
jcasanov@systemguards.com.ec

Hi,

I found a crash (segmentation fault) on jsonb.
This is the best I could do to reduce the query:

"""
select
75 as c1
from
public.pagg_tab_ml as ref_0,
lateral (select
ref_0.a as c5
from generate_series(1, 300) as sample_0
fetch first 78 rows only
) as subq_0
where case when (subq_0.c5 < 2)
then cast(null as jsonb)
else cast(null as jsonb)
end ? ref_0.c
"""

And because it needs pagg_tab_ml it should be run a regression database.
This affects at least 14 and 15.

Attached is the backtrace.

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

Attachments:

jsonbcrash.txttext/plain; charset=us-asciiDownload
#2Ranier Vilela
ranier.vf@gmail.com
In reply to: Jaime Casanova (#1)
Re: jsonb crash

Em qua., 29 de set. de 2021 às 15:55, Jaime Casanova <
jcasanov@systemguards.com.ec> escreveu:

Hi,

I found a crash (segmentation fault) on jsonb.
This is the best I could do to reduce the query:

"""
select
75 as c1
from
public.pagg_tab_ml as ref_0,
lateral (select
ref_0.a as c5
from generate_series(1, 300) as sample_0
fetch first 78 rows only
) as subq_0
where case when (subq_0.c5 < 2)
then cast(null as jsonb)
else cast(null as jsonb)
end ? ref_0.c
"""

And because it needs pagg_tab_ml it should be run a regression database.
This affects at least 14 and 15.

Attached is the backtrace.

Yeah, Coverity has a report about this at function:

JsonbValue *
pushJsonbValue(JsonbParseState **pstate, JsonbIteratorToken seq,
JsonbValue *jbval)

1. CID undefined: Dereference after null check (FORWARD_NULL)
return pushJsonbValueScalar(pstate, seq, jbval);

2. CID undefined (#1 of 1): Dereference after null check (FORWARD_NULL)16.
var_deref_model:
Passing pstate to pushJsonbValueScalar, which dereferences null *pstate

res = pushJsonbValueScalar(pstate, tok,
tok <
WJB_BEGIN_ARRAY ||
(tok ==
WJB_BEGIN_ARRAY &&
v.
val.array.rawScalar) ? &v : NULL);

regards,
Ranier Vilela

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jaime Casanova (#1)
Re: jsonb crash

Jaime Casanova <jcasanov@systemguards.com.ec> writes:

I found a crash (segmentation fault) on jsonb.
This is the best I could do to reduce the query:

"""
select
75 as c1
from
public.pagg_tab_ml as ref_0,
lateral (select
ref_0.a as c5
from generate_series(1, 300) as sample_0
fetch first 78 rows only
) as subq_0
where case when (subq_0.c5 < 2)
then cast(null as jsonb)
else cast(null as jsonb)
end ? ref_0.c
"""

I think this must be a memoize bug. AFAICS, nowhere in this query
can we be processing a non-null JSONB value, so what are we doing
in jsonb_hash? Something down-stack must have lost the information
that the Datum is actually null.

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#3)
Re: jsonb crash

I wrote:

I think this must be a memoize bug. AFAICS, nowhere in this query
can we be processing a non-null JSONB value, so what are we doing
in jsonb_hash? Something down-stack must have lost the information
that the Datum is actually null.

After further inspection, "what are we doing in jsonb_hash?" is
indeed a relevant question, but it seems like it's a type mismatch
not a nullness issue. EXPLAIN VERBOSE shows

-> Memoize (cost=0.01..1.96 rows=1 width=4)
Output: subq_0.c5
Cache Key: ref_0.c, ref_0.a
-> Subquery Scan on subq_0 (cost=0.00..1.95 rows=1 width=4)
Output: subq_0.c5
Filter: (CASE WHEN (subq_0.c5 < 2) THEN NULL::jsonb ELSE NULL::jsonb END ? ref_0.c)
-> Limit (cost=0.00..0.78 rows=78 width=4)
Output: (ref_0.a)
-> Function Scan on pg_catalog.generate_series sample_0 (cost=0.00..3.00 rows=300 width=4)
Output: ref_0.a
Function Call: generate_series(1, 300)

so unless the "Cache Key" output is a complete lie, the cache key
types we should be concerned with are text and integer. The Datum
that's being passed to jsonb_hash looks suspiciously like it is a
text value '0000', too, which matches the "c" value from the first
row of pagg_tab_ml. I now think some part of Memoize is looking in
completely the wrong place to discover the cache key datatypes.

regards, tom lane

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#3)
Re: jsonb crash

On 9/29/21 4:00 PM, Tom Lane wrote:

Jaime Casanova <jcasanov@systemguards.com.ec> writes:

I found a crash (segmentation fault) on jsonb.
This is the best I could do to reduce the query:
"""
select
75 as c1
from
public.pagg_tab_ml as ref_0,
lateral (select
ref_0.a as c5
from generate_series(1, 300) as sample_0
fetch first 78 rows only
) as subq_0
where case when (subq_0.c5 < 2)
then cast(null as jsonb)
else cast(null as jsonb)
end ? ref_0.c
"""

I think this must be a memoize bug. AFAICS, nowhere in this query
can we be processing a non-null JSONB value, so what are we doing
in jsonb_hash? Something down-stack must have lost the information
that the Datum is actually null.

Yeah, confirmed that this is not failing in release 13.

cheers

andrew

--

Andrew Dunstan
EDB: https://www.enterprisedb.com

#6David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#4)
Re: jsonb crash

On Thu, 30 Sept 2021 at 09:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:

After further inspection, "what are we doing in jsonb_hash?" is
indeed a relevant question, but it seems like it's a type mismatch
not a nullness issue. EXPLAIN VERBOSE shows

I think you're right here. It should be hashing text. That seems to
be going wrong in check_memoizable() because it assumes it's always
fine to use the left side's type of the OpExpr to figure out the hash
function to use.

Maybe we can cache the left and the right type's hash function and use
the correct one in paraminfo_get_equal_hashops().

David

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#6)
Re: jsonb crash

David Rowley <dgrowleyml@gmail.com> writes:

On Thu, 30 Sept 2021 at 09:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:

After further inspection, "what are we doing in jsonb_hash?" is
indeed a relevant question, but it seems like it's a type mismatch
not a nullness issue. EXPLAIN VERBOSE shows

I think you're right here. It should be hashing text. That seems to
be going wrong in check_memoizable() because it assumes it's always
fine to use the left side's type of the OpExpr to figure out the hash
function to use.

Maybe we can cache the left and the right type's hash function and use
the correct one in paraminfo_get_equal_hashops().

Um ... it seems to have correctly identified the cache key expressions,
so why isn't it just doing exprType on those? The jsonb_exists operator
seems entirely irrelevant here.

regards, tom lane

#8David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#7)
Re: jsonb crash

On Thu, 30 Sept 2021 at 10:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

Maybe we can cache the left and the right type's hash function and use
the correct one in paraminfo_get_equal_hashops().

Um ... it seems to have correctly identified the cache key expressions,
so why isn't it just doing exprType on those? The jsonb_exists operator
seems entirely irrelevant here.

This is down to the caching stuff I added to RestrictInfo to minimise
the amount of work done during the join search. I cached the hash
equal function in RestrictInfo so I didn't have to check what that was
each time we consider a join. The problem is, that I did a bad job of
taking inspiration from check_hashjoinable() which just looks at the
left type.

David

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#8)
Re: jsonb crash

David Rowley <dgrowleyml@gmail.com> writes:

On Thu, 30 Sept 2021 at 10:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Um ... it seems to have correctly identified the cache key expressions,
so why isn't it just doing exprType on those? The jsonb_exists operator
seems entirely irrelevant here.

This is down to the caching stuff I added to RestrictInfo to minimise
the amount of work done during the join search. I cached the hash
equal function in RestrictInfo so I didn't have to check what that was
each time we consider a join. The problem is, that I did a bad job of
taking inspiration from check_hashjoinable() which just looks at the
left type.

I'm still confused. AFAICS, the top-level operator of the qual clause has
exactly nada to do with the cache keys, as this example makes plain.

regards, tom lane

#10David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#9)
Re: jsonb crash

On Thu, 30 Sept 2021 at 10:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

On Thu, 30 Sept 2021 at 10:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Um ... it seems to have correctly identified the cache key expressions,
so why isn't it just doing exprType on those? The jsonb_exists operator
seems entirely irrelevant here.

This is down to the caching stuff I added to RestrictInfo to minimise
the amount of work done during the join search. I cached the hash
equal function in RestrictInfo so I didn't have to check what that was
each time we consider a join. The problem is, that I did a bad job of
taking inspiration from check_hashjoinable() which just looks at the
left type.

I'm still confused. AFAICS, the top-level operator of the qual clause has
exactly nada to do with the cache keys, as this example makes plain.

You're right that it does not. The lateral join condition could be
anything. We just need to figure out the hash function and which
equality function so that we can properly find any cached tuples when
we're probing the hash table. We need the equal function too as we
can't just return any old cache tuples that match the same hash value.

Maybe recording the operator is not the best thing to do. Maybe I
should have just recorded the regproc's Oid for the equal function.
That would save from calling get_opcode() in ExecInitMemoize().

David

#11David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#6)
Re: jsonb crash

On Thu, 30 Sept 2021 at 09:48, David Rowley <dgrowleyml@gmail.com> wrote:

Maybe we can cache the left and the right type's hash function and use
the correct one in paraminfo_get_equal_hashops().

Here's a patch of what I had in mind for the fix. It's just hot off
the press, so really only intended to assist discussion at this stage.

David

Attachments:

fix_incorrect_hashfunction_in_memoize.patchapplication/octet-stream; name=fix_incorrect_hashfunction_in_memoize.patchDownload+49-18
#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#10)
Re: jsonb crash

David Rowley <dgrowley@gmail.com> writes:

On Thu, 30 Sept 2021 at 10:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm still confused. AFAICS, the top-level operator of the qual clause has
exactly nada to do with the cache keys, as this example makes plain.

You're right that it does not. The lateral join condition could be
anything.

Actually, the more I look at this the more unhappy I get, because
it's becoming clear that you have made unfounded semantic
assumptions. The hash functions generally only promise that they
will distinguish values that are distinguishable by the associated
equality operator. We have plenty of data types in which that does
not map to bitwise equality ... you need not look further than
float8 for an example. And in turn, that means that there are lots
of functions/operators that *can* distinguish hash-equal values.
The fact that you're willing to treat this example as cacheable
means that memoize will fail on such clauses.

So I'm now thinking you weren't that far wrong to be looking at
hashability of the top-level qual operator. What is missing is
that you have to restrict candidate cache keys to be the *direct*
arguments of such an operator. Looking any further down in the
expression introduces untenable assumptions.

regards, tom lane

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#12)
Re: jsonb crash

I wrote:

So I'm now thinking you weren't that far wrong to be looking at
hashability of the top-level qual operator. What is missing is
that you have to restrict candidate cache keys to be the *direct*
arguments of such an operator. Looking any further down in the
expression introduces untenable assumptions.

Hmm ... I think that actually, a correct statement of the semantic
restriction is

To be eligible for memoization, the inside of a join can use the
passed-in parameters *only* as direct arguments of hashable equality
operators.

In order to exploit RestrictInfo-based caching, you could make the
further restriction that all such equality operators appear at the
top level of RestrictInfo clauses. But that's not semantically
necessary.

As an example, assuming p1 and p2 are the path parameters,

(p1 = x) xor (p2 = y)

is semantically safe to memoize, despite the upper-level xor
operator. But the example we started with, with a parameter
used as an argument of jsonb_exists, is not safe to memoize
because we have no grounds to suppose that two hash-equal values
will act the same in jsonb_exists.

regards, tom lane

#14David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#13)
Re: jsonb crash

On Thu, 30 Sept 2021 at 11:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

So I'm now thinking you weren't that far wrong to be looking at
hashability of the top-level qual operator. What is missing is
that you have to restrict candidate cache keys to be the *direct*
arguments of such an operator. Looking any further down in the
expression introduces untenable assumptions.

Hmm ... I think that actually, a correct statement of the semantic
restriction is

To be eligible for memoization, the inside of a join can use the
passed-in parameters *only* as direct arguments of hashable equality
operators.

In order to exploit RestrictInfo-based caching, you could make the
further restriction that all such equality operators appear at the
top level of RestrictInfo clauses. But that's not semantically
necessary.

As an example, assuming p1 and p2 are the path parameters,

(p1 = x) xor (p2 = y)

is semantically safe to memoize, despite the upper-level xor
operator. But the example we started with, with a parameter
used as an argument of jsonb_exists, is not safe to memoize
because we have no grounds to suppose that two hash-equal values
will act the same in jsonb_exists.

I'm not really sure if I follow your comment about the top-level qual
operator. I'm not really sure why that has anything to do with it.
Remember that we *never* do any hashing of any values from the inner
side of the join. If we're doing a parameterized nested loop and say
our parameter has the value of 1, the first time through we don't find
any cached tuples, so we run the plan from the inner side of the
nested loop join and cache all the tuples that we get from it. When
the parameter changes, we check if the current value of the parameter
has any tuples cached. This is what the hashing and equality
comparison does. If the new parameter value is 2, then we'll hash that
and probe the hash table. Since we've only seen value 1 so far, we
won't get a cache hit. If at some later point in time we see the
parameter value of 1 again, we hash that, find something in the hash
bucket for that value then do an equality test to ensure the values
are actually the same and not just the same hash bucket or hash value.

At no point do we do any hashing on the actual cached tuples.

This allows us to memoize any join expression, not just equality
expressions. e.g if the SQL is: SELECT * FROM t1 INNER JOIN t2 on t1.a

t2.a; assuming t2 is on the inner side of the nested loop join,

then the only thing we hash is the t1.a parameter and the only thing
we do an equality comparison on is the current value of t1.a vs some
previous value of t1.a that is stored in the hash table. You can see
here that if t1.a and t2.a are not the same data type then that's of
no relevance as we *never* hash or do any equality comparisons on t2.a
in the memoize code.

The whole thing just hangs together by the assumption that parameters
with the same value will always yield the same tuples. If that's
somehow a wrong assumption, then we have a problem.

I'm not sure if this helps explain how it's meant to work, or if I
just misunderstood you.

David

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#14)
Re: jsonb crash

David Rowley <dgrowleyml@gmail.com> writes:

On Thu, 30 Sept 2021 at 11:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmm ... I think that actually, a correct statement of the semantic
restriction is
To be eligible for memoization, the inside of a join can use the
passed-in parameters *only* as direct arguments of hashable equality
operators.

I'm not really sure if I follow your comment about the top-level qual
operator. I'm not really sure why that has anything to do with it.
Remember that we *never* do any hashing of any values from the inner
side of the join. If we're doing a parameterized nested loop and say
our parameter has the value of 1, the first time through we don't find
any cached tuples, so we run the plan from the inner side of the
nested loop join and cache all the tuples that we get from it. When
the parameter changes, we check if the current value of the parameter
has any tuples cached.

Right, and the point is that if you *do* get a hit, you are assuming
that the inner side would return the same values as it returned for
the previous hash-equal value. You are doing yourself no good by
thinking about simple cases like integers. Think about float8,
and ask yourself whether, if you cached a result for +0, that result
is still good for -0. In general we can only assume that for applications
of the hash equality operator itself (or members of its hash opfamily).
Anything involving a cast to text, for example, would fail on such a case.

This allows us to memoize any join expression, not just equality
expressions.

I am clearly failing to get through to you. Do I need to build
an example?

regards, tom lane

#16David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#15)
Re: jsonb crash

On Thu, 30 Sept 2021 at 11:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

This allows us to memoize any join expression, not just equality
expressions.

I am clearly failing to get through to you. Do I need to build
an example?

Taking your -0.0 / +0.0 float example, If I understand correctly due
to -0.0 and +0.0 hashing to the same value and being classed as equal,
we're really only guaranteed to get the same rows if the join
condition uses the float value (in this example) directly.

If for example there was something like a function we could pass that
float value through that was able to distinguish -0.0 from +0.0, then
that could cause issues as the return value of that function could be
anything and have completely different join partners on the other side
of the join.

A function something like:

create or replace function text_sign(f float) returns text as
$$
begin
if f::text like '-%' then
return 'neg';
else
return 'pos';
end if;
end;
$$ language plpgsql;

would be enough to do it. If the join condition was text_sign(t1.f) =
t2.col and the cache key was t1.f rather than text_sign(t1.f)

On Thu, 30 Sept 2021 at 10:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So I'm now thinking you weren't that far wrong to be looking at
hashability of the top-level qual operator. What is missing is
that you have to restrict candidate cache keys to be the *direct*
arguments of such an operator. Looking any further down in the
expression introduces untenable assumptions.

I think I also follow you now with the above. The problem is that if
the join operator is able to distinguish something that the equality
operator and hash function are not then we have the same problem.
Restricting the join operator to hash equality ensures that the join
condition cannot distinguish anything that we cannot distinguish in
the cache hash table.

David

#17David Rowley
dgrowleyml@gmail.com
In reply to: Jaime Casanova (#1)
Re: jsonb crash

On Thu, 30 Sept 2021 at 07:55, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:

"""
select
75 as c1
from
public.pagg_tab_ml as ref_0,
lateral (select
ref_0.a as c5
from generate_series(1, 300) as sample_0
fetch first 78 rows only
) as subq_0
where case when (subq_0.c5 < 2)
then cast(null as jsonb)
else cast(null as jsonb)
end ? ref_0.c
"""

I've attached 2 patches that aim to fix this bug. One for master and
one for pg14. Unfortunately, for pg14, RestrictInfo lacks a field to
store the righthand type's hash equality operator. I don't think it's
going to be possible as [1]https://codesearch.debian.net/search?q=makeNode%28RestrictInfo%29&amp;literal=1 shows me that there's at least one project
outside of core that does makeNode(RestrictInfo). The next best thing
I can think to do for pg14 is just to limit Memoization for
parameterizations where the RestrictInfo has the same types on the
left and right of an OpExpr. For pg14, the above query just does not
use Memoize anymore.

In theory, since this field is just caching the hash equality
operator, it might be possible to look up the hash equality function
each time when the left and right types don't match and we need to
know the hash equality operator of the right type. That'll probably
not be a super common case, but I wouldn't go as far as to say that
it'll be rare. I'd be a bit worried about the additional planning
time if we went that route. The extra lookup would have to be done
during the join search, so could happen thousands of times given a bit
more than a handful of tables in the join search.

For master, we can just add a new field to RestrictInfo. The master
version of the patch does that.

Does anyone have any thoughts on the proposed fixes?

David

[1]: https://codesearch.debian.net/search?q=makeNode%28RestrictInfo%29&amp;literal=1

Attachments:

memoize_type_bug_master.patchapplication/octet-stream; name=memoize_type_bug_master.patchDownload+54-22
memoize_type_bug_pg14.patchapplication/octet-stream; name=memoize_type_bug_pg14.patchDownload+15-4
#18Justin Pryzby
pryzby@telsasoft.com
In reply to: David Rowley (#17)
Re: jsonb crash

On Tue, Oct 26, 2021 at 07:07:01PM +1300, David Rowley wrote:

Does anyone have any thoughts on the proposed fixes?

I don't have any thoughts, but I want to be sure it isn't forgotten.

--
Justin

#19David Rowley
dgrowleyml@gmail.com
In reply to: Justin Pryzby (#18)
Re: jsonb crash

On Sat, 6 Nov 2021 at 11:38, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Tue, Oct 26, 2021 at 07:07:01PM +1300, David Rowley wrote:

Does anyone have any thoughts on the proposed fixes?

I don't have any thoughts, but I want to be sure it isn't forgotten.

Not forgotten. I was just hoping to get some feedback.

I've now pushed the fix to restrict v14 to only allow Memoize when the
left and right types are the same. For master, since it's possible to
add a field to RestrictInfo, I've changed that to cache the left and
right hash equality operators.

This does not fix the binary / logical issue mentioned by Tom. I have
ideas about allowing Memoize to operate in a binary equality mode or
logical equality mode. I'll need to run in binary mode when there are
lateral vars or when any join operator is not hashable.

David

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#19)
Re: jsonb crash

David Rowley <dgrowleyml@gmail.com> writes:

I've now pushed the fix to restrict v14 to only allow Memoize when the
left and right types are the same. For master, since it's possible to
add a field to RestrictInfo, I've changed that to cache the left and
right hash equality operators.

If you were going to push this fix before 14.1, you should have done it
days ago. At this point it's not possible to get a full set of buildfarm
results before the wrap. The valgrind and clobber-cache animals, in
particular, are unlikely to report back in time.

regards, tom lane

#21David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#20)
#22David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#12)
#23David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#22)