Should pg 11 use a lot more memory building an spgist index?

Started by Bruno Wolff III · about 7 years ago · 24 messages
#1Bruno Wolff III
bruno@wolff.to

While reloading a database cluster to move from 10.5 to 11, I'm getting
out of memory crashes that I did not see when doing reloads on pg 10.
The statement flagged in the log is this:
2018-10-23 16:44:34.815 CDT [126839] STATEMENT: ALTER TABLE ONLY public.iplocation
ADD CONSTRAINT overlap EXCLUDE USING spgist (network WITH &&);

iplocation has 4398722 rows.

This is geolite data where the networks are a partial covering of the total
address spaces with no overlaps.

Should I expect to have to make config changes to make this work?

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruno Wolff III (#1)
Re: Should pg 11 use a lot more memory building an spgist index?

Bruno Wolff III <bruno@wolff.to> writes:

While reloading a database cluster to move from 10.5 to 11, I'm getting
out of memory crashes that I did not see when doing reloads on pg 10.
The statement flagged in the log is this:
2018-10-23 16:44:34.815 CDT [126839] STATEMENT: ALTER TABLE ONLY public.iplocation
ADD CONSTRAINT overlap EXCLUDE USING spgist (network WITH &&);

Hm, there's a fair amount of new code in SP-GIST in v11, so maybe you've
hit a memory leak in that. Can you create a self-contained test case?

regards, tom lane

#3Bruno Wolff III
bruno@wolff.to
In reply to: Tom Lane (#2)
Re: Should pg 11 use a lot more memory building an spgist index?

On Wed, Oct 24, 2018 at 09:33:48 +0100,
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bruno Wolff III <bruno@wolff.to> writes:

While reloading a database cluster to move from 10.5 to 11, I'm getting
out of memory crashes that I did not see when doing reloads on pg 10.
The statement flagged in the log is this:
2018-10-23 16:44:34.815 CDT [126839] STATEMENT: ALTER TABLE ONLY public.iplocation
ADD CONSTRAINT overlap EXCLUDE USING spgist (network WITH &&);

Hm, there's a fair amount of new code in SP-GIST in v11, so maybe you've
hit a memory leak in that. Can you create a self-contained test case?

I'll try. I think I should only need the geolite data to cause the problem
and I can share that publicly. So far the problem seems to be happening
consistently. I'll work on this at the office, but probably won't get it
done until the afternoon.

If I have a substantial database dump file to provide for reproducing this
do you prefer it on a web server somewhere? I expect that mailing very
large attachments to the lists is a bad idea.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruno Wolff III (#3)
Re: Should pg 11 use a lot more memory building an spgist index?

Bruno Wolff III <bruno@wolff.to> writes:

If I have a substantial database dump file to provide for reproducing this
do you prefer it on a web server somewhere? I expect that mailing very
large attachments to the lists is a bad idea.

No, don't do that. If you can make sample data available for download,
or point to some accessible dataset somewhere, that'd work.

regards, tom lane

#5Bruno Wolff III
bruno@wolff.to
In reply to: Bruno Wolff III (#1)
Re: Should pg 11 use a lot more memory building an spgist index?

On Tue, Oct 23, 2018 at 20:23:14 -0500,
Bruno Wolff III <bruno@wolff.to> wrote:

While reloading a database cluster to move from 10.5 to 11, I'm
getting out of memory crashes that I did not see when doing reloads on pg
10.
The statement flagged in the log is this:
2018-10-23 16:44:34.815 CDT [126839] STATEMENT: ALTER TABLE ONLY public.iplocation
ADD CONSTRAINT overlap EXCLUDE USING spgist (network WITH &&);

iplocation has 4398722 rows.

I'm trying to reproduce this on my desktop but so far it isn't producing
the same issue. The database is still loading after 9 hours, but it looks
like it got past the point where the problem index was created. (I'm not
sure how to check if the index has really finished being created, but \d
shows it as existing.)

I'll know better tomorrow.

My workstation is using the Fedora version of postgresql 11 which might
have some relevant difference from the pgdg version for rhel7. I still
have a number of things I can try, but they might take significant time
to run and I might not get a reasonable test case for a while.

#6Bruno Wolff III
bruno@wolff.to
In reply to: Bruno Wolff III (#5)
Re: Should pg 11 use a lot more memory building an spgist index?

It looks like it got past creating the exclude constraint based on the
ordering of commands in the dump file. However creating a more normal
spgist index is taking a very long time with a lot of disk wait time.
CPU usage seems pretty low for the amount of time it has been working
on building that index, but maybe that is normal for building indexes.
up building that index, but maybe that is normal for building indexes.
I used almost the default postgresql.conf on my workstation. I bumped
up the limits in a few places on the server that could have allowed a lot
more memory to be used especially if the index creation was parallelized.
While the load is running I'll see if I can tell if there is a memory leak
with this index build. Once it finishes, I can dump a specific table
and test building the exclude spgist index with some different settings
to see if I can reproduce the out of memory error with a lot less data
than is in the whole database.

#7Bruno Wolff III
bruno@wolff.to
In reply to: Tom Lane (#4)
Re: Should pg 11 use a lot more memory building an spgist index?

On Wed, Oct 24, 2018 at 10:21:11 +0100,
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bruno Wolff III <bruno@wolff.to> writes:

If I have a substantial database dump file to provide for reproducing this
do you prefer it on a web server somewhere? I expect that mailing very
large attachments to the lists is a bad idea.

No, don't do that. If you can make sample data available for download,
or point to some accessible dataset somewhere, that'd work.

regards, tom lane

I have something that seems to produce it on rhel7. Fedora isn't working
well either, but the difference may be due to postgresql.conf being
different or some difference in the Fedora build.

http://wolff.to/iplocation is a bit under 400mb. It should download at about
1MB/sec. It is a plain text dump of the iplocation table with the alter table
for the constraint / exclude index added at the end.

http://wolff.to/postgresql.conf is the config file I use on the server.

The server has the following installed (but you don't need plperl for the
test):
postgresql11-plperl-11.0-1PGDG.rhel7.x86_64
postgresql11-libs-11.0-1PGDG.rhel7.x86_64
postgresql11-docs-11.0-1PGDG.rhel7.x86_64
postgresql11-11.0-1PGDG.rhel7.x86_64
postgresql11-server-11.0-1PGDG.rhel7.x86_64

The output of
createdb -U postgres test
psql -U postgres -f iplocation test
is:
SET
SET
SET
SET
SET
set_config
------------

(1 row)

SET
SET
SET
SET
SET
CREATE TABLE
ALTER TABLE
COPY 4398722
psql:iplocation:4398789: ERROR: out of memory
DETAIL: Failed on request of size 6264 in memory context "PortalContext".

It is certainly possible that my postgresql.conf file is bad and that I
just got away with it under 10.5. The server is a vm with 32GB of
memory allocated to it. I set vm.overcommit_memory = 2 to avoid the oom
killer after upgrading to 11. Before that I didn't have a problem.

On Fedora with a more vanilla postgresql.conf the exclude constraint
built fine, but creating a plain spgist index on an ip address column is
taking forever (nearly a day now) for a table with a lot of rows (I think
around 150 million); the same build ran in a reasonable amount of time
on the server.

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruno Wolff III (#7)
Re: Should pg 11 use a lot more memory building an spgist index?

Bruno Wolff III <bruno@wolff.to> writes:

I have something that seems to produce it on rhel7. Fedora isn't working
well either, but the difference may be due to postgresql.conf being
different or some difference in the Fedora build.

Hmm, in my hands this produces the same size leak (~28GB) in either v10
or v11. In HEAD, somebody's made it even worse (~43GB). So this is
certainly pretty broken, but I'm not sure why it seems worse to you in
v11 than before.

The core of the problem I see is that check_exclusion_or_unique_constraint
does index_beginscan/index_rescan/index_endscan in a long-lived context,
while spgendscan seems to have employed dice while deciding which of
spgbeginscan's allocations it would bother to pfree. This is ancient,
though the specific case you have here can only be tested back to v10
because the inet SPGIST opclass wasn't there before.

A quick review of the other index AM endscan methods seems to indicate
that they all try to clean up their mess. So maybe we should just make
spgendscan do likewise. Alternatively, we could decide that requiring
endscan methods to free storage is not very safe, in which case it would
be incumbent on check_exclusion_or_unique_constraint to make a temporary
context to run the scan in. But if we're going to do the latter, I think
we oughta go full in and remove the retail pfree's from all the *endscan
functions. We'd also have to review other callers of
index_beginscan/index_endscan to see which ones might also need their own
temp contexts. So that would surely end up being more invasive than
just adding some pfree's to spgendscan would be. Maybe in the long run
it'd be worth it, but probably not in the short run, or for back-patching.

Thoughts?

regards, tom lane

#9Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#8)
Re: Should pg 11 use a lot more memory building an spgist index?

On 2018/10/26 18:16, Tom Lane wrote:

The core of the problem I see is that check_exclusion_or_unique_constraint
does index_beginscan/index_rescan/index_endscan in a long-lived context,
while spgendscan seems to have employed dice while deciding which of
spgbeginscan's allocations it would bother to pfree. This is ancient,
though the specific case you have here can only be tested back to v10
because the inet SPGIST opclass wasn't there before.

A quick review of the other index AM endscan methods seems to indicate
that they all try to clean up their mess. So maybe we should just make
spgendscan do likewise. Alternatively, we could decide that requiring
endscan methods to free storage is not very safe, in which case it would
be incumbent on check_exclusion_or_unique_constraint to make a temporary
context to run the scan in. But if we're going to do the latter, I think
we oughta go full in and remove the retail pfree's from all the *endscan
functions. We'd also have to review other callers of
index_beginscan/index_endscan to see which ones might also need their own
temp contexts. So that would surely end up being more invasive than
just adding some pfree's to spgendscan would be. Maybe in the long run
it'd be worth it, but probably not in the short run, or for back-patching.

FWIW, I would prefer the latter. Not that people write new AMs on a
regular basis because we gave them an easier interface via CREATE ACCESS
METHOD, but it still seems better for the core code to deal with memory
allocation/freeing to avoid running into issues like this.

Thanks,
Amit

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#9)
Re: Should pg 11 use a lot more memory building an spgist index?

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:

On 2018/10/26 18:16, Tom Lane wrote:

A quick review of the other index AM endscan methods seems to indicate
that they all try to clean up their mess. So maybe we should just make
spgendscan do likewise. Alternatively, we could decide that requiring
endscan methods to free storage is not very safe, in which case it would
be incumbent on check_exclusion_or_unique_constraint to make a temporary
context to run the scan in. But if we're going to do the latter, I think
we oughta go full in and remove the retail pfree's from all the *endscan
functions. We'd also have to review other callers of
index_beginscan/index_endscan to see which ones might also need their own
temp contexts. So that would surely end up being more invasive than
just adding some pfree's to spgendscan would be. Maybe in the long run
it'd be worth it, but probably not in the short run, or for back-patching.

FWIW, I would prefer the latter. Not that people write new AMs on a
regular basis because we gave them an easier interface via CREATE ACCESS
METHOD, but it still seems better for the core code to deal with memory
allocation/freeing to avoid running into issues like this.

After a quick look around, I think that making systable_begin/endscan
do this is a nonstarter; there are just too many call sites that would
be affected. Now, you could imagine specifying that indexes on system
catalogs (in practice, only btree) have to clean up at endscan time
but other index types don't, so that only operations that might be
scanning user indexes need to have suitable wrapping contexts. Not sure
there's a lot of benefit to that, though.

regards, tom lane

#11Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#10)
Re: Should pg 11 use a lot more memory building an spgist index?

On 2018/10/26 18:59, Tom Lane wrote:

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:

On 2018/10/26 18:16, Tom Lane wrote:

A quick review of the other index AM endscan methods seems to indicate
that they all try to clean up their mess. So maybe we should just make
spgendscan do likewise. Alternatively, we could decide that requiring
endscan methods to free storage is not very safe, in which case it would
be incumbent on check_exclusion_or_unique_constraint to make a temporary
context to run the scan in. But if we're going to do the latter, I think
we oughta go full in and remove the retail pfree's from all the *endscan
functions. We'd also have to review other callers of
index_beginscan/index_endscan to see which ones might also need their own
temp contexts. So that would surely end up being more invasive than
just adding some pfree's to spgendscan would be. Maybe in the long run
it'd be worth it, but probably not in the short run, or for back-patching.

FWIW, I would prefer the latter. Not that people write new AMs on a
regular basis because we gave them an easier interface via CREATE ACCESS
METHOD, but it still seems better for the core code to deal with memory
allocation/freeing to avoid running into issues like this.

After a quick look around, I think that making systable_begin/endscan
do this is a nonstarter; there are just too many call sites that would
be affected. Now, you could imagine specifying that indexes on system
catalogs (in practice, only btree) have to clean up at endscan time
but other index types don't, so that only operations that might be
scanning user indexes need to have suitable wrapping contexts. Not sure
there's a lot of benefit to that, though.

By "core code", I didn't mean systable_begin/endscan, but rather
check_exclusion_or_unique_constraint() or its core-side caller(s). But
maybe I misunderstood something about your diagnosis upthread where you said:

"The core of the problem I see is that check_exclusion_or_unique_constraint
does index_beginscan/index_rescan/index_endscan in a long-lived context,"

Thanks,
Amit

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#11)
Re: Should pg 11 use a lot more memory building an spgist index?

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:

On 2018/10/26 18:59, Tom Lane wrote:

After a quick look around, I think that making systable_begin/endscan
do this is a nonstarter; there are just too many call sites that would
be affected. Now, you could imagine specifying that indexes on system
catalogs (in practice, only btree) have to clean up at endscan time
but other index types don't, so that only operations that might be
scanning user indexes need to have suitable wrapping contexts. Not sure
there's a lot of benefit to that, though.

By "core code", I didn't mean systable_begin/endscan, but rather
check_exclusion_or_unique_constraint() or its core-side caller(s).

Well, we'd need to consider any call path leading to index_endscan.
One of those is systable_endscan and its multitude of callers. It seems
unlikely that you could just switch context in systable_beginscan
without breaking many of the callers.

If we forbade leaks in system catalog index AMs, then the number of
places that would need work would be manageable (about 3, it looked
like). But then it seems more like a wart than a real API improvement.

regards, tom lane

#13Bruno Wolff III
bruno@wolff.to
In reply to: Tom Lane (#8)
Re: Should pg 11 use a lot more memory building an spgist index?

On Fri, Oct 26, 2018 at 10:16:09 +0100,
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bruno Wolff III <bruno@wolff.to> writes:

I have something that seems to produce it on rhel7. Fedora isn't working
well either, but the difference may be due to postgresql.conf being
different or some difference in the Fedora build.

Hmm, in my hands this produces the same size leak (~28GB) in either v10
or v11. In HEAD, somebody's made it even worse (~43GB). So this is
certainly pretty broken, but I'm not sure why it seems worse to you in
v11 than before.

As a short term work around, could I create the index first and use
insert statements, each in their own transaction, to get the table loaded
with the index?

Is the issue on Fedora taking very long to build a normal spgist index for
network addresses worth pursuing separately, or is it likely to be the same
underlying cause? I don't really need to get this working there, as that was
just to help with testing.

I could also try adjusting memory limits temporarily. If the leak is 28GB
on a 32GB system I might be able to get the index built if less memory
is tied up in other things. My workstation also has 32GB and the exclude
index did build there with the postgresql.conf having lower memory limits.

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruno Wolff III (#13)
Re: Should pg 11 use a lot more memory building an spgist index?

Bruno Wolff III <bruno@wolff.to> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmm, in my hands this produces the same size leak (~28GB) in either v10
or v11. In HEAD, somebody's made it even worse (~43GB). So this is
certainly pretty broken, but I'm not sure why it seems worse to you in
v11 than before.

As a short term work around, could I create the index first and use
insert statements, each in their own transaction, to get the table loaded
with the index?

Yes; it might also be that you don't even need to break it up into
separate statements.

Is the issue on Fedora taking very long to build a normal spgist index for
network addresses worth pursuing separately, or is it likely to be the same
underlying cause?

This issue only applies if it was an exclusion constraint. If you saw
slowness or bloat with a plain index, that would be worth investigating
separately.

regards, tom lane

#15Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#10)
Re: Should pg 11 use a lot more memory building an spgist index?

On 2018-Oct-26, Tom Lane wrote:

After a quick look around, I think that making systable_begin/endscan
do this is a nonstarter; there are just too many call sites that would
be affected. Now, you could imagine specifying that indexes on system
catalogs (in practice, only btree) have to clean up at endscan time
but other index types don't, so that only operations that might be
scanning user indexes need to have suitable wrapping contexts. Not sure
there's a lot of benefit to that, though.

How about modifying SysScanDescData to have a memory context member,
which is created by systable_beginscan and destroyed by endscan?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#15)
Re: Should pg 11 use a lot more memory building an spgist index?

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2018-Oct-26, Tom Lane wrote:

After a quick look around, I think that making systable_begin/endscan
do this is a nonstarter; there are just too many call sites that would
be affected. Now, you could imagine specifying that indexes on system
catalogs (in practice, only btree) have to clean up at endscan time
but other index types don't, so that only operations that might be
scanning user indexes need to have suitable wrapping contexts. Not sure
there's a lot of benefit to that, though.

How about modifying SysScanDescData to have a memory context member,
which is created by systable_beginscan and destroyed by endscan?

I think it would still have problems, in that it would affect in which
context index_getnext delivers its output. We could reasonably make
these sorts of changes in places where the entire index_beginscan/
index_getnext/index_endscan call structure is in one place, but that
doesn't apply for the systable functions.

Also, going in that direction would imply an additional memory context
switch / switch-back per tuple processed (around the index_getnext call),
which would create questions about whether it has a negative performance
impact.

regards, tom lane

#17Bruno Wolff III
bruno@wolff.to
In reply to: Tom Lane (#14)
Re: Should pg 11 use a lot more memory building an spgist index?

On Fri, Oct 26, 2018 at 13:44:07 +0100,
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bruno Wolff III <bruno@wolff.to> writes:

As a short term work around, could I create the index first and use
insert statements, each in their own transaction, to get the table loaded
with the index?

Yes; it might also be that you don't even need to break it up into
separate statements.

It was time to refresh the geolite data anyway so I tried this. I needed
to set vm.overcommit_memory back to 0 to avoid an error, but the load went
OK without the oom killer doing anything. So things are fully working again.

Thanks for your help.

Is the issue on Fedora taking very long to build a normal spgist index for
network addresses worth pursuing separately, or is it likely to be the same
underlying cause?

This issue only applies if it was an exclusion constraint. If you saw
slowness or bloat with a plain index, that would be worth investigating
separately.

I'll start a separate thread if I get something reasonable to ask about.
The current dataset is probably a lot larger than needed to demonstrate the
issue. The difference might be due to configuration or to how Fedora built
it. And I'll want to compare back to version 10. In the end I'll probably
ask why it is slower in one case as opposed to the other, and it might not
even be a real bug.

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#16)
1 attachment(s)
Re: Should pg 11 use a lot more memory building an spgist index?

I wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

How about modifying SysScanDescData to have a memory context member,
which is created by systable_beginscan and destroyed by endscan?

I think it would still have problems, in that it would affect in which
context index_getnext delivers its output. We could reasonably make
these sorts of changes in places where the entire index_beginscan/
index_getnext/index_endscan call structure is in one place, but that
doesn't apply for the systable functions.

Actually, after looking more closely, index_getnext doesn't return a
palloc'd object anyway, or at least not one that the caller is responsible
for managing. So maybe that could work.

I was confused about why the memory leak in Bruno's example is so much
larger in HEAD than v11; spgbeginscan does allocate more stuff than
before, but only if numberOfOrderBys > 0, which surely doesn't apply for
the exclusion-check code path. Eventually I realized that the difference
is because commit 2a6368343 made SpGistScanOpaqueData a good deal larger
than it had been (10080 vs 6264 bytes, on my x86_64 box), so it was just
the failure to pfree that struct that accounted for the bigger leak.

There's another issue with 2a6368343, though: it added a couple of
fmgr_info_copy calls to spgbeginscan. I don't understand why it'd be a
good idea to do that rather than using the FmgrInfos in the index's
relcache entry directly. Making a copy every time risks a memory leak,
because spgendscan has no way to free any fn_extra data that gets
allocated by the called function; and by the same token it's inefficient,
because the function has no way to cache fn_extra data across queries.

If we consider only the leak aspect, this coding presents a reason why
we should try to change things as Alvaro suggests. However, because of
the poor-caching aspect, I think this code is pretty broken anyway,
and once we fix it the issue is much less clear-cut.

(Looking around, it seems like we're very schizophrenic about whether
to copy index relcache support function FmgrInfos or just use them
directly. But nbtree and hash seem to always do the latter, so I think
it's probably reasonable to standardize on that.)

Anyway, the attached proposed patch for HEAD makes Bruno's problem go
away, and it also fixes an unrelated leak introduced by 2a6368343
because it reallocates so->orderByTypes each time through spgrescan.
I think we should apply this, and suitable subsets in the back branches,
to fix the immediate problem. Then we can work at leisure on a more
invasive HEAD-only patch, if anyone is excited about that.

regards, tom lane

Attachments:

fix-spgist-memory-leaks-1.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index a63fde2..c883ae9 100644
*** a/src/backend/access/spgist/spgscan.c
--- b/src/backend/access/spgist/spgscan.c
*************** spgbeginscan(Relation rel, int keysz, in
*** 294,303 ****
  	/* Set up indexTupDesc and xs_hitupdesc in case it's an index-only scan */
  	so->indexTupDesc = scan->xs_hitupdesc = RelationGetDescr(rel);
  
  	if (scan->numberOfOrderBys > 0)
  	{
! 		so->zeroDistances = palloc(sizeof(double) * scan->numberOfOrderBys);
! 		so->infDistances = palloc(sizeof(double) * scan->numberOfOrderBys);
  
  		for (i = 0; i < scan->numberOfOrderBys; i++)
  		{
--- 294,311 ----
  	/* Set up indexTupDesc and xs_hitupdesc in case it's an index-only scan */
  	so->indexTupDesc = scan->xs_hitupdesc = RelationGetDescr(rel);
  
+ 	/* Allocate various arrays needed for order-by scans */
  	if (scan->numberOfOrderBys > 0)
  	{
! 		/* This will be filled in spgrescan, but allocate the space here */
! 		so->orderByTypes = (Oid *)
! 			palloc(sizeof(Oid) * scan->numberOfOrderBys);
! 
! 		/* These arrays have constant contents, so we can fill them now */
! 		so->zeroDistances = (double *)
! 			palloc(sizeof(double) * scan->numberOfOrderBys);
! 		so->infDistances = (double *)
! 			palloc(sizeof(double) * scan->numberOfOrderBys);
  
  		for (i = 0; i < scan->numberOfOrderBys; i++)
  		{
*************** spgbeginscan(Relation rel, int keysz, in
*** 305,313 ****
  			so->infDistances[i] = get_float8_infinity();
  		}
  
! 		scan->xs_orderbyvals = palloc0(sizeof(Datum) * scan->numberOfOrderBys);
! 		scan->xs_orderbynulls = palloc(sizeof(bool) * scan->numberOfOrderBys);
! 		memset(scan->xs_orderbynulls, true, sizeof(bool) * scan->numberOfOrderBys);
  	}
  
  	fmgr_info_copy(&so->innerConsistentFn,
--- 313,324 ----
  			so->infDistances[i] = get_float8_infinity();
  		}
  
! 		scan->xs_orderbyvals = (Datum *)
! 			palloc0(sizeof(Datum) * scan->numberOfOrderBys);
! 		scan->xs_orderbynulls = (bool *)
! 			palloc(sizeof(bool) * scan->numberOfOrderBys);
! 		memset(scan->xs_orderbynulls, true,
! 			   sizeof(bool) * scan->numberOfOrderBys);
  	}
  
  	fmgr_info_copy(&so->innerConsistentFn,
*************** spgrescan(IndexScanDesc scan, ScanKey sc
*** 336,341 ****
--- 347,353 ----
  		memmove(scan->keyData, scankey,
  				scan->numberOfKeys * sizeof(ScanKeyData));
  
+ 	/* initialize order-by data if needed */
  	if (orderbys && scan->numberOfOrderBys > 0)
  	{
  		int			i;
*************** spgrescan(IndexScanDesc scan, ScanKey sc
*** 343,350 ****
  		memmove(scan->orderByData, orderbys,
  				scan->numberOfOrderBys * sizeof(ScanKeyData));
  
- 		so->orderByTypes = (Oid *) palloc(sizeof(Oid) * scan->numberOfOrderBys);
- 
  		for (i = 0; i < scan->numberOfOrderBys; i++)
  		{
  			ScanKey		skey = &scan->orderByData[i];
--- 355,360 ----
*************** spgendscan(IndexScanDesc scan)
*** 380,390 ****
--- 390,411 ----
  	MemoryContextDelete(so->tempCxt);
  	MemoryContextDelete(so->traversalCxt);
  
+ 	if (so->keyData)
+ 		pfree(so->keyData);
+ 
+ 	if (so->state.deadTupleStorage)
+ 		pfree(so->state.deadTupleStorage);
+ 
  	if (scan->numberOfOrderBys > 0)
  	{
+ 		pfree(so->orderByTypes);
  		pfree(so->zeroDistances);
  		pfree(so->infDistances);
+ 		pfree(scan->xs_orderbyvals);
+ 		pfree(scan->xs_orderbynulls);
  	}
+ 
+ 	pfree(so);
  }
  
  /*
#19Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#18)
Re: Should pg 11 use a lot more memory building an spgist index?

On 2018/10/30 4:48, Tom Lane wrote:

I was confused about why the memory leak in Bruno's example is so much
larger in HEAD than v11; spgbeginscan does allocate more stuff than
before, but only if numberOfOrderBys > 0, which surely doesn't apply for
the exclusion-check code path. Eventually I realized that the difference
is because commit 2a6368343 made SpGistScanOpaqueData a good deal larger
than it had been (10080 vs 6264 bytes, on my x86_64 box), so it was just
the failure to pfree that struct that accounted for the bigger leak.

There's another issue with 2a6368343, though: it added a couple of
fmgr_info_copy calls to spgbeginscan. I don't understand why it'd be a
good idea to do that rather than using the FmgrInfos in the index's
relcache entry directly. Making a copy every time risks a memory leak,
because spgendscan has no way to free any fn_extra data that gets
allocated by the called function; and by the same token it's inefficient,
because the function has no way to cache fn_extra data across queries.

If we consider only the leak aspect, this coding presents a reason why
we should try to change things as Alvaro suggests. However, because of
the poor-caching aspect, I think this code is pretty broken anyway,
and once we fix it the issue is much less clear-cut.

(Looking around, it seems like we're very schizophrenic about whether
to copy index relcache support function FmgrInfos or just use them
directly. But nbtree and hash seem to always do the latter, so I think
it's probably reasonable to standardize on that.)

Agreed about trying to avoid fmgr_info_copy(), at least in this case.

By the way, do I get it right that the reason to want to avoid copying in
this instance is that the currently active context could be a long-lived
one, as in the case of create index, alter table add constraint, etc.
calling the copying code for every tuple? There are other instances of
fmgr_info_copy throughout index AM implementations, including the helper
function ScanKeyEntryInitializeWithInfo() called from them, but the copies
made in those cases don't appear very leak-prone.

Anyway, the attached proposed patch for HEAD makes Bruno's problem go
away, and it also fixes an unrelated leak introduced by 2a6368343
because it reallocates so->orderByTypes each time through spgrescan.
I think we should apply this, and suitable subsets in the back branches,
to fix the immediate problem. Then we can work at leisure on a more
invasive HEAD-only patch, if anyone is excited about that.

Makes sense to fix it like this for back-patching.

Thanks,
Amit

#20Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#18)
2 attachment(s)
Re: Should pg 11 use a lot more memory building an spgist index?

On 2018/10/30 4:48, Tom Lane wrote:

I wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

How about modifying SysScanDescData to have a memory context member,
which is created by systable_beginscan and destroyed by endscan?

I think it would still have problems, in that it would affect in which
context index_getnext delivers its output. We could reasonably make
these sorts of changes in places where the entire index_beginscan/
index_getnext/index_endscan call structure is in one place, but that
doesn't apply for the systable functions.

Actually, after looking more closely, index_getnext doesn't return a
palloc'd object anyway, or at least not one that the caller is responsible
for managing. So maybe that could work.

Instead of SysScanDescData, could we add one to IndexScanDescData? It's
somewhat clear that catalog scans don't leak much, but user index scans
can, as seen in this case.

In this particular case, I see that it's IndexCheckExclusion() that
invokes check_exclusion_or_unique_constraint() repeatedly for each heap
tuple after finishing building the index on the heap. The latter performs
index_beginscan/endscan() for every heap tuple, but doesn't bother to
release the memory allocated for the IndexScanDesc and its members.

I've tried to fix that with the attached patches.

0001 adds the ability for the callers of index_beginscan to specify a
memory context. index_beginscan_internal switches to that context before
calling ambeginscan and stores it into a new field xs_scan_cxt of
IndexScanDescData, but maybe storing it is unnecessary.

0002 builds on that and adds the ability for the callers of
check_exclusion_or_unique_constraint to specify a memory context. Using
that, it fixes the leak by teaching IndexCheckExclusion and
ExecInsertIndexTuples to pass a memory context that's reset before calling
check_exclusion_or_unique_constraint() for the next tuple.

The following example shows that the patch works.

With HEAD:

create table foo (a int4range);
alter table foo add exclude using spgist (a with &&);
-- this consumes about 1GB
insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
alter table foo drop constraint foo_a_excl;
-- this too
alter table foo add exclude using spgist (a with &&);

Patched:

create table foo (a int4range);
alter table foo add exclude using spgist (a with &&);
-- memory consumption doesn't grow above 37MB
insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
alter table foo drop constraint foo_a_excl;
-- ditto
alter table foo add exclude using spgist (a with &&);

Thoughts?

Thanks,
Amit

Attachments:

v1-0001-Allow-IndexScanDesc-allocation-in-caller-specifie.patch (text/plain)
From 39900ceb569fc32946eb4106ae5db4cf0386f95a Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 30 Oct 2018 16:56:17 +0900
Subject: [PATCH v1 1/2] Allow IndexScanDesc allocation in caller-specified
 context

---
 src/backend/access/index/genam.c         |  6 ++++--
 src/backend/access/index/indexam.c       | 21 +++++++++++++++------
 src/backend/commands/cluster.c           |  3 ++-
 src/backend/executor/execIndexing.c      |  3 ++-
 src/backend/executor/execReplication.c   |  2 +-
 src/backend/executor/nodeIndexonlyscan.c |  3 ++-
 src/backend/executor/nodeIndexscan.c     |  6 ++++--
 src/backend/utils/adt/selfuncs.c         |  4 ++--
 src/include/access/genam.h               |  3 ++-
 src/include/access/relscan.h             |  6 ++++++
 10 files changed, 40 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 9d08775687..0744b00d15 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -370,7 +370,8 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, nkeys, 0);
+										 snapshot, nkeys, 0,
+										 CurrentMemoryContext);
 		index_rescan(sysscan->iscan, key, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 	}
@@ -572,7 +573,8 @@ systable_beginscan_ordered(Relation heapRelation,
 	}
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, nkeys, 0);
+									 snapshot, nkeys, 0,
+									 CurrentMemoryContext);
 	index_rescan(sysscan->iscan, key, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index eade540ef5..b17ad9417b 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -125,7 +125,8 @@ do { \
 
 static IndexScanDesc index_beginscan_internal(Relation indexRelation,
 						 int nkeys, int norderbys, Snapshot snapshot,
-						 ParallelIndexScanDesc pscan, bool temp_snap);
+						 ParallelIndexScanDesc pscan, bool temp_snap,
+						 MemoryContext scan_cxt);
 
 
 /* ----------------------------------------------------------------
@@ -222,11 +223,13 @@ IndexScanDesc
 index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys,
+				MemoryContext scan_cxt)
 {
 	IndexScanDesc scan;
 
-	scan = index_beginscan_internal(indexRelation, nkeys, norderbys, snapshot, NULL, false);
+	scan = index_beginscan_internal(indexRelation, nkeys, norderbys, snapshot,
+									NULL, false, scan_cxt);
 
 	/*
 	 * Save additional parameters into the scandesc.  Everything else was set
@@ -251,7 +254,8 @@ index_beginscan_bitmap(Relation indexRelation,
 {
 	IndexScanDesc scan;
 
-	scan = index_beginscan_internal(indexRelation, nkeys, 0, snapshot, NULL, false);
+	scan = index_beginscan_internal(indexRelation, nkeys, 0, snapshot,
+									NULL, false, CurrentMemoryContext);
 
 	/*
 	 * Save additional parameters into the scandesc.  Everything else was set
@@ -268,9 +272,11 @@ index_beginscan_bitmap(Relation indexRelation,
 static IndexScanDesc
 index_beginscan_internal(Relation indexRelation,
 						 int nkeys, int norderbys, Snapshot snapshot,
-						 ParallelIndexScanDesc pscan, bool temp_snap)
+						 ParallelIndexScanDesc pscan, bool temp_snap,
+						 MemoryContext scan_cxt)
 {
 	IndexScanDesc scan;
+	MemoryContext oldcxt;
 
 	RELATION_CHECKS;
 	CHECK_REL_PROCEDURE(ambeginscan);
@@ -286,11 +292,14 @@ index_beginscan_internal(Relation indexRelation,
 	/*
 	 * Tell the AM to open a scan.
 	 */
+	oldcxt = MemoryContextSwitchTo(scan_cxt);
 	scan = indexRelation->rd_amroutine->ambeginscan(indexRelation, nkeys,
 													norderbys);
 	/* Initialize information for parallel scan. */
 	scan->parallel_scan = pscan;
 	scan->xs_temp_snap = temp_snap;
+	scan->xs_scan_cxt = scan_cxt;
+	MemoryContextSwitchTo(oldcxt);
 
 	return scan;
 }
@@ -504,7 +513,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel, int nkeys,
 	snapshot = RestoreSnapshot(pscan->ps_snapshot_data);
 	RegisterSnapshot(snapshot);
 	scan = index_beginscan_internal(indexrel, nkeys, norderbys, snapshot,
-									pscan, true);
+									pscan, true, CurrentMemoryContext);
 
 	/*
 	 * Save additional parameters into the scandesc.  Everything else was set
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 68be470977..2c5c23a145 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -926,7 +926,8 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	if (OldIndex != NULL && !use_sort)
 	{
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, 0, 0,
+									CurrentMemoryContext);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9927ad70e6..74d15e6b43 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -719,7 +719,8 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, indnkeyatts, 0,
+								 CurrentMemoryContext);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while ((tup = index_getnext(index_scan,
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 25ba93e03c..432785c7d1 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -132,7 +132,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	InitDirtySnapshot(snap);
 	scan = index_beginscan(rel, idxrel, &snap,
 						   IndexRelationGetNumberOfKeyAttributes(idxrel),
-						   0);
+						   0, CurrentMemoryContext);
 
 	/* Build scan key. */
 	build_replindex_scan_key(skey, rel, idxrel, searchslot);
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index daedf342f7..8b654a9eee 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -91,7 +91,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   node->ioss_RelationDesc,
 								   estate->es_snapshot,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys,
+								   CurrentMemoryContext);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index ba7821b0e2..750b455e66 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -114,7 +114,8 @@ IndexNext(IndexScanState *node)
 								   node->iss_RelationDesc,
 								   estate->es_snapshot,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   CurrentMemoryContext);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -220,7 +221,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   node->iss_RelationDesc,
 								   estate->es_snapshot,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   CurrentMemoryContext);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index e0ece74bb9..16f7cc3641 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5582,7 +5582,7 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 				 */
 				index_scan = index_beginscan(heapRel, indexRel,
 											 &SnapshotNonVacuumable,
-											 1, 0);
+											 1, 0, CurrentMemoryContext);
 				index_rescan(index_scan, scankeys, 1, NULL, 0);
 
 				/* Fetch first tuple in sortop's direction */
@@ -5615,7 +5615,7 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 			{
 				index_scan = index_beginscan(heapRel, indexRel,
 											 &SnapshotNonVacuumable,
-											 1, 0);
+											 1, 0, CurrentMemoryContext);
 				index_rescan(index_scan, scankeys, 1, NULL, 0);
 
 				/* Fetch first tuple in reverse direction */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 534fac7bf2..676eb2b752 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -140,7 +140,8 @@ extern bool index_insert(Relation indexRelation,
 extern IndexScanDesc index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
-				int nkeys, int norderbys);
+				int nkeys, int norderbys,
+				MemoryContext scan_cxt);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 					   Snapshot snapshot,
 					   int nkeys);
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index e5289b8aa7..169435d746 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -139,6 +139,12 @@ typedef struct IndexScanDescData
 
 	/* parallel index scan information, in shared memory */
 	ParallelIndexScanDesc parallel_scan;
+
+	/*
+	 * The context under which memory for this struct and its members are
+	 * allocated.
+	 */
+	MemoryContext	xs_scan_cxt;
 }			IndexScanDescData;
 
 /* Generic structure for parallel scans */
-- 
2.11.0

v1-0002-Add-a-MemoryContext-arg-to-check_exclusion_or_uni.patch (text/plain)
From 9252ae65eee9b9718bd62e5804a7783b4b6b9000 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 30 Oct 2018 17:53:07 +0900
Subject: [PATCH v1 2/2] Add a MemoryContext arg to
 check_exclusion_or_unique_constraint

check_exclusion_or_unique_constraint performs an index scan and some
index AMs may use huge amounts of memory that might end up getting
allocated in a possibly long-lived context.  A given caller may call
check_exclusion_or_unique_constraint repeatedly as it's scanning a
heap, for example, where it's desirable to free the memory for each
new tuple.

With that in place, teach IndexCheckExclusion and
ExecIndexInsertTuples to pass a per-tuple context down to
check_exclusion_or_unique_constraint.
---
 src/backend/catalog/index.c         | 12 +++++++++++-
 src/backend/commands/constraint.c   |  2 +-
 src/backend/executor/execIndexing.c | 21 ++++++++++++++-------
 src/include/executor/executor.h     |  3 ++-
 4 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5885899c9b..ccafceb73a 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2983,6 +2983,7 @@ IndexCheckExclusion(Relation heapRelation,
 	EState	   *estate;
 	ExprContext *econtext;
 	Snapshot	snapshot;
+	MemoryContext index_scan_cxt;
 
 	/*
 	 * If we are reindexing the target index, mark it as no longer being
@@ -3017,11 +3018,20 @@ IndexCheckExclusion(Relation heapRelation,
 								true,	/* buffer access strategy OK */
 								true);	/* syncscan OK */
 
+	/*
+	 * This context contains the allocations needed to scan the index
+	 * for a given heap tuple, reset for every tuple.
+	 */
+	index_scan_cxt = AllocSetContextCreate(CurrentMemoryContext,
+										   "IndexCheckExclusion_cxt",
+										   ALLOCSET_DEFAULT_SIZES);
+
 	while ((heapTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
 	{
 		CHECK_FOR_INTERRUPTS();
 
 		MemoryContextReset(econtext->ecxt_per_tuple_memory);
+		MemoryContextReset(index_scan_cxt);
 
 		/* Set up for predicate or expression evaluation */
 		ExecStoreHeapTuple(heapTuple, slot, false);
@@ -3050,7 +3060,7 @@ IndexCheckExclusion(Relation heapRelation,
 		check_exclusion_constraint(heapRelation,
 								   indexRelation, indexInfo,
 								   &(heapTuple->t_self), values, isnull,
-								   estate, true);
+								   estate, true, index_scan_cxt);
 	}
 
 	heap_endscan(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index f472355b48..a4b1e4c112 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -178,7 +178,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 		 */
 		check_exclusion_constraint(trigdata->tg_relation, indexRel, indexInfo,
 								   &tmptid, values, isnull,
-								   estate, false);
+								   estate, false, CurrentMemoryContext);
 	}
 
 	/*
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 74d15e6b43..cc1f5ffc39 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -129,7 +129,8 @@ static bool check_exclusion_or_unique_constraint(Relation heap, Relation index,
 									 EState *estate, bool newIndex,
 									 CEOUC_WAIT_MODE waitMode,
 									 bool errorOK,
-									 ItemPointer conflictTid);
+									 ItemPointer conflictTid,
+									 MemoryContext index_scan_cxt);
 
 static bool index_recheck_constraint(Relation index, Oid *constr_procs,
 						 Datum *existing_values, bool *existing_isnull,
@@ -410,6 +411,7 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 		{
 			bool		violationOK;
 			CEOUC_WAIT_MODE waitMode;
+			ExprContext *econtext = GetPerTupleExprContext(estate);
 
 			if (applyNoDupErr)
 			{
@@ -432,7 +434,8 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 													 indexRelation, indexInfo,
 													 tupleid, values, isnull,
 													 estate, false,
-													 waitMode, violationOK, NULL);
+													 waitMode, violationOK, NULL,
+													 econtext->ecxt_per_tuple_memory);
 		}
 
 		if ((checkUnique == UNIQUE_CHECK_PARTIAL ||
@@ -582,7 +585,8 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 												 indexInfo, &invalidItemPtr,
 												 values, isnull, estate, false,
 												 CEOUC_WAIT, true,
-												 conflictTid);
+												 conflictTid,
+												 CurrentMemoryContext);
 		if (!satisfiesConstraint)
 			return false;
 	}
@@ -643,7 +647,8 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 									 EState *estate, bool newIndex,
 									 CEOUC_WAIT_MODE waitMode,
 									 bool violationOK,
-									 ItemPointer conflictTid)
+									 ItemPointer conflictTid,
+									 MemoryContext index_scan_cxt)
 {
 	Oid		   *constr_procs;
 	uint16	   *constr_strats;
@@ -720,7 +725,7 @@ retry:
 	conflict = false;
 	found_self = false;
 	index_scan = index_beginscan(heap, index, &DirtySnapshot, indnkeyatts, 0,
-								 CurrentMemoryContext);
+								 index_scan_cxt);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while ((tup = index_getnext(index_scan,
@@ -865,12 +870,14 @@ check_exclusion_constraint(Relation heap, Relation index,
 						   IndexInfo *indexInfo,
 						   ItemPointer tupleid,
 						   Datum *values, bool *isnull,
-						   EState *estate, bool newIndex)
+						   EState *estate, bool newIndex,
+						   MemoryContext index_scan_cxt)
 {
 	(void) check_exclusion_or_unique_constraint(heap, index, indexInfo, tupleid,
 												values, isnull,
 												estate, newIndex,
-												CEOUC_WAIT, false, NULL);
+												CEOUC_WAIT, false, NULL,
+												index_scan_cxt);
 }
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index edf538365b..c5998d9885 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -556,7 +556,8 @@ extern void check_exclusion_constraint(Relation heap, Relation index,
 						   IndexInfo *indexInfo,
 						   ItemPointer tupleid,
 						   Datum *values, bool *isnull,
-						   EState *estate, bool newIndex);
+						   EState *estate, bool newIndex,
+						   MemoryContext index_scan_cxt);
 
 /*
  * prototypes from functions in execReplication.c
-- 
2.11.0

#21Amit Langote
amitlangote09@gmail.com
In reply to: Amit Langote (#20)
Re: Should pg 11 use a lot more memory building an spgist index?

On Tue, Oct 30, 2018 at 7:11 PM Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2018/10/30 4:48, Tom Lane wrote:

I wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

How about modifying SysScanDescData to have a memory context member,
which is created by systable_beginscan and destroyed by endscan?

I think it would still have problems, in that it would affect in which
context index_getnext delivers its output. We could reasonably make
these sorts of changes in places where the entire index_beginscan/
index_getnext/index_endscan call structure is in one place, but that
doesn't apply for the systable functions.

Actually, after looking more closely, index_getnext doesn't return a
palloc'd object anyway, or at least not one that the caller is responsible
for managing. So maybe that could work.

Instead of SysScanDescData, could we add one to IndexScanDescData? It's
somewhat clear that catalog scans don't leak much, but user index scans
can, as seen in this case.

In this particular case, I see that it's IndexCheckExclusion() that
invokes check_exclusion_or_unique_constraint() repeatedly for each heap
tuple after finishing building the index on the heap. The latter performs
index_beginscan/endscan() for every heap tuple, but doesn't bother to
release the memory allocated for the IndexScanDesc and its members.

I've tried to fix that with the attached patches.

0001 adds the ability for the callers of index_beginscan to specify a
memory context. index_beginscan_internal switches to that context before
calling ambeginscan and stores it into a new field xs_scan_cxt of
IndexScanDescData, but maybe storing it is unnecessary.

0002 builds on that and adds the ability for the callers of
check_exclusion_or_unique_constraint to specify a memory context. Using
that, it fixes the leak by teaching IndexCheckExclusion and
ExecInsertIndexTuples to pass a memory context that's reset before calling
check_exclusion_or_unique_constraint() for the next tuple.

The following example shows that the patch works.

With HEAD:

create table foo (a int4range);
alter table foo add exclude using spgist (a with &&);
-- this consumes about 1GB
insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
alter table foo drop constraint foo_a_excl;
-- this too
alter table foo add exclude using spgist (a with &&);

Patched:

create table foo (a int4range);
alter table foo add exclude using spgist (a with &&);
-- memory consumption doesn't grow above 37MB
insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
alter table foo drop constraint foo_a_excl;
-- ditto
alter table foo add exclude using spgist (a with &&);

Sorry, I forgot to try the example with your patch. Maybe it will
have more or less the same behavior as mine, although I didn't realize
that when I started writing my patch.

Thanks,
Amit

#22Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#21)
Re: Should pg 11 use a lot more memory building an spgist index?

On 2018/10/30 21:27, Amit Langote wrote:

On Tue, Oct 30, 2018 at 7:11 PM Amit Langote

I've tried to fix that with the attached patches.

0001 adds the ability for the callers of index_beginscan to specify a
memory context. index_beginscan_internal switches to that context before
calling ambeginscan and stores it into a new field xs_scan_cxt of
IndexScanDescData, but maybe storing it is unnecessary.

0002 builds on that and adds the ability for the callers of
check_exclusion_or_unique_constraint to specify a memory context. Using
that, it fixes the leak by teaching IndexCheckExclusion and
ExecInsertIndexTuples to pass a memory context that's reset before calling
check_exclusion_or_unique_constraint() for the next tuple.

The following example shows that the patch works.

With HEAD:

create table foo (a int4range);
alter table foo add exclude using spgist (a with &&);
-- this consumes about 1GB
insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
alter table foo drop constraint foo_a_excl;
-- this too
alter table foo add exclude using spgist (a with &&);

Patched:

create table foo (a int4range);
alter table foo add exclude using spgist (a with &&);
-- memory consumption doesn't grow above 37MB
insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
alter table foo drop constraint foo_a_excl;
-- ditto
alter table foo add exclude using spgist (a with &&);

Sorry, I forgot to try the example with your patch. Maybe it will
have more or less the same behavior as mine, although I didn't realize
that when I started writing my patch.

Yep, I checked that fix-spgist-memory-leaks-1.patch posted upthread gives
almost the same numbers, i.e., about the same maximum amount of memory
consumed.

Maybe we don't need to spoil the interface of index_beginscan with a new
memory context argument, as my patch does, if amendscan simply following
its contract would suffice.

Thanks,
Amit

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#22)
Re: Should pg 11 use a lot more memory building an spgist index?

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:

Maybe we don't need to spoil the interface of index_beginscan with a new
memory context argument, as my patch does, if amendscan simply following
its contract would suffice.

Yeah, I'm not enamored of changing the API of index_beginscan for this;
the existing convention that it should allocate in CurrentMemoryContext
seems perfectly suitable. And changing it would create a lot of code
churn, not only for us but for externally-maintained AMs.

What is at stake is changing the API of amendscan, to specify that it
need not release memory because the current context is expected to be
destroyed or reset shortly after ending the scan. Then, for the small
number of call sites where that wouldn't be true, it's on those callers
to set up a suitable context and switch into it. Note this is actually
forwards compatible, in that an AM that's still following the convention
of releasing stuff manually would not be broken.
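To make that caller-side obligation concrete, here is a toy sketch of the pattern (a bump arena standing in for a MemoryContext; all names, sizes, and helpers are invented for illustration, and this is not PostgreSQL code): the "amendscan" analogue releases nothing, and the repeat-scanning caller stays leak-free by resetting a scan context it owns.

```c
/* Toy model of the relaxed amendscan contract. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Ctx
{
	char	   *buf;
	size_t		used;
	size_t		cap;
} Ctx;

static Ctx *
ctx_create(size_t cap)
{
	Ctx		   *c = malloc(sizeof(Ctx));

	c->buf = malloc(cap);
	c->used = 0;
	c->cap = cap;
	return c;
}

static void *
ctx_alloc(Ctx *c, size_t n)
{
	void	   *p;

	assert(c->used + n <= c->cap);
	p = c->buf + c->used;
	c->used += n;
	return p;
}

static void
ctx_reset(Ctx *c)
{
	c->used = 0;
}

typedef struct ToyScanDesc
{
	int			nkeys;
	char	   *opaque;			/* AM-private scan state */
} ToyScanDesc;

/* "ambeginscan": allocates everything in the caller-supplied context. */
static ToyScanDesc *
toy_beginscan(Ctx *cxt, int nkeys)
{
	ToyScanDesc *scan = ctx_alloc(cxt, sizeof(ToyScanDesc));

	scan->nkeys = nkeys;
	scan->opaque = ctx_alloc(cxt, 256);
	return scan;
}

/*
 * "amendscan" under the relaxed contract: release pins and locks only;
 * memory is left for the context reset/delete to reclaim.
 */
static void
toy_endscan(ToyScanDesc *scan)
{
	(void) scan;
}

/*
 * A caller that scans once per heap tuple: wrapping each scan in a
 * context it resets keeps memory flat regardless of tuple count.
 */
static size_t
check_many_tuples(int ntuples)
{
	Ctx		   *scan_cxt = ctx_create(4096);
	size_t		peak = 0;
	int			i;

	for (i = 0; i < ntuples; i++)
	{
		ToyScanDesc *scan;

		ctx_reset(scan_cxt);
		scan = toy_beginscan(scan_cxt, 1);
		toy_endscan(scan);
		if (scan_cxt->used > peak)
			peak = scan_cxt->used;
	}
	return peak;
}
```

An AM that keeps pfree'ing things in its endscan routine still works under this contract; the freeing is merely redundant, which is the forwards-compatibility property noted above.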

regards, tom lane

#24Bruno Wolff III
bruno@wolff.to
In reply to: Tom Lane (#23)
Re: Should pg 11 use a lot more memory building an spgist index?

I see that a fix got committed. Thanks!
I'll double-check it after the point release comes out (which looks like it
will be next week) and let you know if there is still a problem.