BUG #18722: Processing arrays with plpgsql raises errors

Started by PG Bug reporting formover 1 year ago8 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18722
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 17.2
Operating system: Ubuntu 22.04
Description:

The following script:
CREATE FUNCTION make_ia() RETURNS int[] LANGUAGE plpgsql AS
'declare x int[]; begin x := array[0]; return x; end';
CREATE FUNCTION ia_eq(int[], int[]) RETURNS boolean LANGUAGE plpgsql AS
'begin return array_eq($1, $2); end';
CREATE OPERATOR = (procedure = ia_eq, leftarg = int[], rightarg = int[]);
SELECT NULLIF(make_ia(), array[1]::int[]);

fails with:
ERROR: cache lookup failed for type 2139062143

Also,
SELECT NULLIF(make_ia(), array[1]::int[]) = NULL;

fails with:
ERROR: invalid memory alloc request size 18446744073642179576

The backtrace of the latter error is:
...
#6 0x000055b75ff0a96e in MemoryContextSizeFailure (context=0x55b760787d70,
size=18446744073642179576, flags=0) at mcxt.c:1170
#7 0x000055b75fefb4ec in MemoryContextCheckSize (context=0x55b760787d70,
size=18446744073642179576, flags=0) at
../../../../src/include/utils/memutils_internal.h:172
#8 0x000055b75fefc087 in AllocSetAllocLarge (context=0x55b760787d70,
size=18446744073642179576, flags=0) at aset.c:705
#9 0x000055b75fefc6c0 in AllocSetAlloc (context=0x55b760787d70,
size=18446744073642179576, flags=0) at aset.c:986
#10 0x000055b75ff0aa40 in MemoryContextAlloc (context=0x55b760787d70,
size=18446744073642179576) at mcxt.c:1200
#11 0x000055b75fd04a64 in copy_byval_expanded_array (eah=0x55b760787e70,
oldeah=0x55b760787e70) at array_expanded.c:197
#12 0x000055b75fd047d7 in expand_array (arraydatum=94246085885576,
parentcontext=0x55b7607b0cd0, metacache=0x7ffd548c2cc0) at
array_expanded.c:106
#13 0x00007f0094e5888f in plpgsql_exec_function (func=0x55b7606f2fb0,
fcinfo=0x55b7607992a8, simple_eval_estate=0x0, simple_eval_resowner=0x0,
procedure_resowner=0x0, atomic=true) at pl_exec.c:564
#14 0x00007f0094e75728 in plpgsql_call_handler (fcinfo=0x55b7607992a8) at
pl_handler.c:276
#15 0x000055b75f9dcd47 in ExecInterpExpr (state=0x55b760799150,
econtext=0x55b760798ef8, isnull=0x7ffd548c32df) at execExprInterp.c:746
...

(gdb) f 12
#12 0x000055b75fd047d7 in expand_array (arraydatum=94246085885576,
parentcontext=0x55b7607b0cd0, metacache=0x7ffd548c2cc0) at
array_expanded.c:106
106 copy_byval_expanded_array(eah, oldeah);
(gdb) p/x *oldeah
$1 = {hdr = {vl_len_ = 0xffffffff, eoh_methods = 0x55b7602d8f80, eoh_context
= 0x55b760787d70, eoh_rw_ptr = {0x1, 0x3, 0x70, 0x7e, 0x78, 0x60, 0xb7,
0x55, 0x0, 0x0}, eoh_ro_ptr = {0x1, 0x2, 0x70, 0x7e, 0x78, 0x60, 0xb7, 0x55,
0x0,
0x0}}, ea_magic = 0x29170a59, ndims = 0x7f7f7f7f, dims =
0x7f7f7f7f7f7f7f7f, lbound = 0x7f7f7f7f7f7f7f7f, element_type = 0x7f7f7f7f,
typlen = 0x7f7f, typbyval = 0x7f, typalign = 0x7f, dvalues =
0x7f7f7f7f7f7f7f7f,
dnulls = 0x7f7f7f7f7f7f7f7f, dvalueslen = 0x7f7f7f7f, nelems = 0x7f7f7f7f,
flat_size = 0x7f7f7f7f7f7f7f7f, fvalue = 0x7f7f7f7f7f7f7f7f, fstartptr =
0x7f7f7f7f7f7f7f7f, fendptr = 0x7f7f7f7f7f7f7f7f}

I discovered this issue with SQLsmith.
Reproduced starting from 1dc5ebc90.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #18722: Processing arrays with plpgsql raises errors

PG Bug reporting form <noreply@postgresql.org> writes:

The following script:
CREATE FUNCTION make_ia() RETURNS int[] LANGUAGE plpgsql AS
'declare x int[]; begin x := array[0]; return x; end';
CREATE FUNCTION ia_eq(int[], int[]) RETURNS boolean LANGUAGE plpgsql AS
'begin return array_eq($1, $2); end';
CREATE OPERATOR = (procedure = ia_eq, leftarg = int[], rightarg = int[]);
SELECT NULLIF(make_ia(), array[1]::int[]);

fails with:
ERROR: cache lookup failed for type 2139062143

Nice catch! What is happening here is that make_ia returns a
read/write pointer to an expanded array object. The EEOP_NULLIF
code passes that pointer straight on to the equality function.
Which in this case is a plpgsql function that will suppose it
can take ownership of the expanded object, resulting in said
object being freed before return. (Neither function has done
anything wrong.) The problem is that EEOP_NULLIF then returns
the original Datum pointer, which is now pointing at garbage.
The different failures you get depending on what is done next
with the Datum are not too surprising.

What EEOP_NULLIF needs to do is pass a read-only pointer to the
equality function, so that the object is not modified and remains
available to return if we want to do so.

Attached is a quick WIP patch to handle that. It is missing a test
case, but the real omission is that llvm_compile_expr()'s EEOP_NULLIF
handling also needs to be fixed, and I'm pretty unsure how to do that.

I'm wondering now if any of our other conditional expressions have
similar bugs ...

regards, tom lane

Attachments:

bug-18722-fix-wip.patchtext/x-diff; charset=us-ascii; name=bug-18722-fix-wip.patchDownload+22-1
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
Re: BUG #18722: Processing arrays with plpgsql raises errors

I wrote:

What EEOP_NULLIF needs to do is pass a read-only pointer to the
equality function, so that the object is not modified and remains
available to return if we want to do so.

Attached is a quick WIP patch to handle that. It is missing a test
case, but the real omission is that llvm_compile_expr()'s EEOP_NULLIF
handling also needs to be fixed, and I'm pretty unsure how to do that.

Here's a fleshed-out patch with a test case and JIT support. This
is about the first time I've messed with LLVM, so I wouldn't mind
some review of what I did in llvmjit_expr.c. In particular, do I
correctly understand that "l_funcvalue(b, v_fcinfo, 0)" produces
a reference to a copy of the initial value of args[0].value?
It seems to work in testing --- the original R/W pointer is returned
out of the expression --- but if we returned the R/O pointer instead
it'd be mighty difficult to detect the resulting inefficiency.

I'm wondering now if any of our other conditional expressions have
similar bugs ...

I was amused to discover that case.sql already had infrastructure
suitable for testing this, thanks to bug #14472.

regards, tom lane

Attachments:

bug-18722-fix-v1.patchtext/x-diff; charset=us-ascii; name=bug-18722-fix-v1.patchDownload+64-5
#4Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tom Lane (#3)
Re: BUG #18722: Processing arrays with plpgsql raises errors

On Sun, 24 Nov 2024 at 22:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Here's a fleshed-out patch with a test case and JIT support. This
is about the first time I've messed with LLVM, so I wouldn't mind
some review of what I did in llvmjit_expr.c. In particular, do I
correctly understand that "l_funcvalue(b, v_fcinfo, 0)" produces
a reference to a copy of the initial value of args[0].value?

I don't know about that, but I wonder if this bug could be fixed by
having ExecInitExprRec() insert a EEOP_MAKE_READONLY step. Then it
wouldn't be necessary to make any changes to the expression evaluation
code.

Regards,
Dean

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#4)
Re: BUG #18722: Processing arrays with plpgsql raises errors

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On Sun, 24 Nov 2024 at 22:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Here's a fleshed-out patch with a test case and JIT support. This
is about the first time I've messed with LLVM, so I wouldn't mind
some review of what I did in llvmjit_expr.c. In particular, do I
correctly understand that "l_funcvalue(b, v_fcinfo, 0)" produces
a reference to a copy of the initial value of args[0].value?

I don't know about that, but I wonder if this bug could be fixed by
having ExecInitExprRec() insert a EEOP_MAKE_READONLY step. Then it
wouldn't be necessary to make any changes to the expression evaluation
code.

That would entirely destroy one of the primary performance benefits of
the expanded-object infrastructure. The idea is that if you have
fconsumer(fproducer(...), ...)
and fproducer returns a read-write pointer to an object it's built,
then fconsumer should be able to take ownership of the object and
use it as a local variable (possibly modifying it) without incurring
any object-copying overhead.

This works in any context where an intermediate expression value
has a single consumer, which is most. If there are multiple
consumers then we need to insert MAKE_READONLY steps for all
(or all but one) of them. I overlooked EEOP_NULLIF as such
a case, but I don't think there are so many more cases as to
justify throwing away the concept altogether.

regards, tom lane

#6Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tom Lane (#5)
Re: BUG #18722: Processing arrays with plpgsql raises errors

On Mon, 25 Nov 2024 at 15:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

I don't know about that, but I wonder if this bug could be fixed by
having ExecInitExprRec() insert a EEOP_MAKE_READONLY step. Then it
wouldn't be necessary to make any changes to the expression evaluation
code.

That would entirely destroy one of the primary performance benefits of
the expanded-object infrastructure. The idea is that if you have
fconsumer(fproducer(...), ...)
and fproducer returns a read-write pointer to an object it's built,
then fconsumer should be able to take ownership of the object and
use it as a local variable (possibly modifying it) without incurring
any object-copying overhead.

This works in any context where an intermediate expression value
has a single consumer, which is most. If there are multiple
consumers then we need to insert MAKE_READONLY steps for all
(or all but one) of them. I overlooked EEOP_NULLIF as such
a case, but I don't think there are so many more cases as to
justify throwing away the concept altogether.

I didn't mean do it in all cases, I just meant the NullIfExpr case
identified here. My point was that instead of modifying the evaluation
code for EEOP_NULLIF to make it call
MakeExpandedObjectReadOnlyInternal(), it would be easier to insert a
EEOP_MAKE_READONLY step for the first argument of the EEOP_NULLIF
step.

Regards,
Dean

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#6)
Re: BUG #18722: Processing arrays with plpgsql raises errors

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

I didn't mean do it in all cases, I just meant the NullIfExpr case
identified here. My point was that instead of modifying the evaluation
code for EEOP_NULLIF to make it call
MakeExpandedObjectReadOnlyInternal(), it would be easier to insert a
EEOP_MAKE_READONLY step for the first argument of the EEOP_NULLIF
step.

But then the NULLIF step would only have access to the R/O pointer,
no? We do want to pass on a R/W pointer to the output, if we got
one, to handle cases like
fconsumer(NULLIF(fproducer(...), ...), ...)
Admittedly that's a pretty edgy edge-case, but still we're leaving
money on the table if we don't do it. So I think we have to deal
with the issue within NULLIF.

regards, tom lane

#8Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tom Lane (#7)
Re: BUG #18722: Processing arrays with plpgsql raises errors

On Mon, 25 Nov 2024 at 19:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

I didn't mean do it in all cases, I just meant the NullIfExpr case
identified here. My point was that instead of modifying the evaluation
code for EEOP_NULLIF to make it call
MakeExpandedObjectReadOnlyInternal(), it would be easier to insert a
EEOP_MAKE_READONLY step for the first argument of the EEOP_NULLIF
step.

But then the NULLIF step would only have access to the R/O pointer,
no? We do want to pass on a R/W pointer to the output, if we got
one, to handle cases like
fconsumer(NULLIF(fproducer(...), ...), ...)
Admittedly that's a pretty edgy edge-case, but still we're leaving
money on the table if we don't do it. So I think we have to deal
with the issue within NULLIF.

OK, that makes sense.

Regards,
Dean