Reference Leak with type
Hi All ,
I found the below reference leak on master.
Steps to reproduce the issue :
--1) create type
create type float_array_typ as ( i float8);
--2) create anonymous block
postgres=# do $$
declare
a float_array_typ[];
begin
a[1].i := 11;
commit;
end
$$;
WARNING: TupleDesc reference leak: TupleDesc 0x7ff7673b15f0 (16386,-1)
still referenced
ERROR: tupdesc reference 0x7ff7673b15f0 is not owned by resource owner
TopTransaction
postgres=#
*Regards,*
Rohit
On Tue, Apr 06, 2021 at 11:09:13AM +0530, Rohit Bhogate wrote:
I found the below reference leak on master.
Thanks for the report. This is indeed a new problem as of HEAD,
coming from c9d52984 as far as I can see, and 13 does not support this
grammar. From what I can see, there seems to be an issue with the
reference count of the TupleDesc here, your test case increments two
times a TupleDesc for this custom type in a portal, and tries to
decrement it three times, causing what looks like a leak.
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On Tue, Apr 06, 2021 at 11:09:13AM +0530, Rohit Bhogate wrote:
I found the below reference leak on master.
Thanks for the report. This is indeed a new problem as of HEAD,
Just for the record, it's not new. The issue is (I think) that
the tupledesc refcount created by get_cached_rowtype is being
logged in the wrong ResourceOwner. Other cases that use
get_cached_rowtype, such as IS NOT NULL on a composite value,
reproduce the same type of failure back to v11:
create type float_rec_typ as (i float8);
do $$
declare
f float_rec_typ := row(42);
r bool;
begin
r := f is not null;
commit;
end
$$;
WARNING: TupleDesc reference leak: TupleDesc 0x7f5f549809d8 (53719,-1) still referenced
ERROR: tupdesc reference 0x7f5f549809d8 is not owned by resource owner TopTransaction
Still poking at a suitable fix.
regards, tom lane
On Fri, Apr 9, 2021 at 10:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael@paquier.xyz> writes:
On Tue, Apr 06, 2021 at 11:09:13AM +0530, Rohit Bhogate wrote:
I found the below reference leak on master.
Thanks for the report. This is indeed a new problem as of HEAD,
Just for the record, it's not new. The issue is (I think) that
the tupledesc refcount created by get_cached_rowtype is being
logged in the wrong ResourceOwner. Other cases that use
get_cached_rowtype, such as IS NOT NULL on a composite value,
reproduce the same type of failure back to v11:create type float_rec_typ as (i float8);
do $$
declare
f float_rec_typ := row(42);
r bool;
begin
r := f is not null;
commit;
end
$$;WARNING: TupleDesc reference leak: TupleDesc 0x7f5f549809d8 (53719,-1)
still referenced
ERROR: tupdesc reference 0x7f5f549809d8 is not owned by resource owner
TopTransactionStill poking at a suitable fix.
regards, tom lane
Hi,
I think I have some idea about the cause for the 'resource owner' error.
When commit results in calling exec_stmt_commit(), the ResourceOwner
switches to a new one.
Later, when ResourceOwnerForgetTupleDesc() is called, we get the error
since owner->tupdescarr doesn't carry the tuple Desc to be forgotten.
One potential fix is to add the following to resowner.c
/*
* Transfer resources from resarr1 to resarr2
*/
static void
ResourceArrayTransfer(ResourceArray *resarr1, ResourceArray *resarr2)
{
}
In exec_stmt_commit(), we save reference to the old ResourceOwner before
calling SPI_commit() (line 4824).
Then after the return from SPI_start_transaction(), ResourceArrayTransfer()
is called to transfer remaining items in tupdescarr from old ResourceOwner
to the current ResourceOwner.
I want to get some opinion on the feasibility of this route.
It seems ResourceOwner is opaque inside exec_stmt_commit(). And
no ResourceArrayXX call exists in pl_exec.c
So I am still looking for the proper structure of the solution.
Cheers
Here's a proposed patch for this problem.
The core problem in this test case is that the refcount is logged in the
Portal resowner, which is a child of the initial transaction's resowner,
so it goes away in the COMMIT (after warning of a resource leak); but
the expression tree is still there and still thinks it has a refcount.
By chance a new ResourceOwner is created in the same place where the old
one was, so that when the expression tree is finally destroyed at the
end of the DO block, we see an error about "this refcount isn't logged
here" rather than a crash. Unrelated-looking code changes could turn
that into a real crash, of course.
I spent quite a bit of time fruitlessly trying to fix it by manipulating
which resowner the tupledesc refcount is logged in, specifically by
running plpgsql "simple expressions" with the simple_eval_resowner as
CurrentResourceOwner. But this just causes other problems to appear,
because then that resowner becomes responsible for more stuff than
just the plancache refcounts that plpgsql is expecting it to hold.
Some of that stuff needs to be released at subtransaction abort,
which is problematic because most of what plpgsql wants it to deal
in needs to survive until end of main transaction --- in particular,
the plancache refcounts need to live that long, and so do the tupdesc
refcounts we're concerned with here, because those are associated with
"simple expression" trees that are supposed to have that lifespan.
It's possible that we could make this approach work, but at minimum
it'd require creating and destroying an additional resowner per
subtransaction; and maybe we'd have to give up on sharing "simple
expression" trees across subtransactions. So the potential performance
hit is pretty bad, and I'm not even 100% sure it'd work at all.
So the alternative proposed in the attached is to give up on associating
a long-lived tupdesc refcount with these expression nodes at all.
Intead, we can use a method that plpgsql has been using for a few
years, which is to rely on the fact that typcache entries never go
away once made, and just save a pointer into the typcache. We can
detect possible changes in the cache entry by watching for changes
in its tupDesc_identifier counter.
This infrastructure exists as far back as v11, so using it doesn't
present any problems for back-patchability. It is slightly
nervous-making that we have to change some fields in struct ExprEvalStep
--- but the overall struct size isn't changing, and I can't really
see a reason why extensions would be interested in the contents of
these particular subfield types.
regards, tom lane