OOM in spgist insert

Started by Dilip Kumarabout 5 years ago29 messageshackers

dilipbalaut@gmail.com

about 5 years ago

While testing something on spgist I found that at certain point while
inserting in spgist it is going for doPickSplit, but even after split
is is not able to find a place to insert a tuple and it keeping going
in that loop infinitely it seems and finally error out with OOM
because in this loop we are continuously allocating memory for the
tuple in temp context but since we are never coming out of the loop it
is erroring out with OOM.

To reproduce load the data from the attached script 'data_load.sql'
and run below commands

------Load data before running this using 'data_load.sql'
-------Test case start---
create extension spgist_name_ops;

select opcname, amvalidate(opc.oid)
from pg_opclass opc join pg_am am on am.oid = opcmethod
where amname = 'spgist' and opcname = 'name_ops';

-- warning expected here
select opcname, amvalidate(opc.oid)
from pg_opclass opc join pg_am am on am.oid = opcmethod
where amname = 'spgist' and opcname = 'name_ops_old';

create table t(f1 name, f2 integer, f3 text);
create index on t using spgist(f1) include(f2, f3);
\d+ t_f1_f2_f3_idx

insert into t select proname, case when length(proname) % 2 = 0 then
pronargs else null end, prosrc from pg_proc_test;

---- Test case end

--Memory allocation stack----
#1 0x0000000000bf96c5 in palloc0 (size=9696) at mcxt.c:1133
#2 0x000000000056b24b in spgFormLeafTuple (state=0x7ffedea15b80,
heapPtr=0x27df306, datums=0x7ffedea15660, isnulls=0x7ffedea15640) at
spgutils.c:892
#3 0x000000000055e15c in doPickSplit (index=0x7fa4b1ddd5c8,
state=0x7ffedea15b80, current=0x7ffedea159c0, parent=0x7ffedea159a0,
newLeafTuple=0x27df300, level=9,
isNulls=false, isNew=true) at spgdoinsert.c:848
#4 0x0000000000561e53 in spgdoinsert (index=0x7fa4b1ddd5c8,
state=0x7ffedea15b80, heapPtr=0x27718d8, datums=0x7ffedea15cc0,
isnulls=0x7ffedea15ca0)
at spgdoinsert.c:2115

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Pavel Borisov

pashkin.elfe@gmail.com

about 5 years ago

In reply to: Dilip Kumar (#1)

Re: OOM in spgist insert

ср, 12 мая 2021 г. в 11:09, Dilip Kumar <dilipbalaut@gmail.com>:

While testing something on spgist I found that at certain point while
inserting in spgist it is going for doPickSplit, but even after split
is is not able to find a place to insert a tuple and it keeping going
in that loop infinitely it seems and finally error out with OOM
because in this loop we are continuously allocating memory for the
tuple in temp context but since we are never coming out of the loop it
is erroring out with OOM.

My first idea is that this is the case when index tuple doesn't fit into

one index page. As INCLUDED columns are added as is the tuple can not be
made shorter by prefix-stripping. Seems we should check every index tuple
length to fit the page before its insertion. Will see the code little bit
later.

Thanks for the reporting!

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>

Dilip Kumar

dilipbalaut@gmail.com

about 5 years ago

In reply to: Pavel Borisov (#2)

Re: OOM in spgist insert

On Wed, 12 May 2021 at 1:43 PM, Pavel Borisov <pashkin.elfe@gmail.com>
wrote:

ср, 12 мая 2021 г. в 11:09, Dilip Kumar <dilipbalaut@gmail.com>:

While testing something on spgist I found that at certain point while
inserting in spgist it is going for doPickSplit, but even after split
is is not able to find a place to insert a tuple and it keeping going
in that loop infinitely it seems and finally error out with OOM
because in this loop we are continuously allocating memory for the
tuple in temp context but since we are never coming out of the loop it
is erroring out with OOM.

My first idea is that this is the case when index tuple doesn't fit into

one index page. As INCLUDED columns are added as is the tuple can not be
made shorter by prefix-stripping. Seems we should check every index tuple
length to fit the page before its insertion.

Thanks for looking into this.

Will see the code little bit later.

--

Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Pavel Borisov

pashkin.elfe@gmail.com

about 5 years ago

In reply to: Dilip Kumar (#3)

Re: OOM in spgist insert

ср, 12 мая 2021 г. в 12:36, Dilip Kumar <dilipbalaut@gmail.com>:

On Wed, 12 May 2021 at 1:43 PM, Pavel Borisov <pashkin.elfe@gmail.com>
wrote:

ср, 12 мая 2021 г. в 11:09, Dilip Kumar <dilipbalaut@gmail.com>:

While testing something on spgist I found that at certain point while
inserting in spgist it is going for doPickSplit, but even after split
is is not able to find a place to insert a tuple and it keeping going
in that loop infinitely it seems and finally error out with OOM
because in this loop we are continuously allocating memory for the
tuple in temp context but since we are never coming out of the loop it
is erroring out with OOM.

My first idea is that this is the case when index tuple doesn't fit into

one index page. As INCLUDED columns are added as is the tuple can not be
made shorter by prefix-stripping. Seems we should check every index tuple
length to fit the page before its insertion.

Thanks for looking into this.

Will see the code little bit later.

Ok

PFA v1 patch. Does this help?

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>

Pavel Borisov

pashkin.elfe@gmail.com

about 5 years ago

In reply to: Pavel Borisov (#4)

Re: OOM in spgist insert

ср, 12 мая 2021 г. в 12:39, Pavel Borisov <pashkin.elfe@gmail.com>:

ср, 12 мая 2021 г. в 12:36, Dilip Kumar <dilipbalaut@gmail.com>:

On Wed, 12 May 2021 at 1:43 PM, Pavel Borisov <pashkin.elfe@gmail.com>
wrote:

ср, 12 мая 2021 г. в 11:09, Dilip Kumar <dilipbalaut@gmail.com>:

While testing something on spgist I found that at certain point while
inserting in spgist it is going for doPickSplit, but even after split
is is not able to find a place to insert a tuple and it keeping going
in that loop infinitely it seems and finally error out with OOM
because in this loop we are continuously allocating memory for the
tuple in temp context but since we are never coming out of the loop it
is erroring out with OOM.

My first idea is that this is the case when index tuple doesn't fit

into one index page. As INCLUDED columns are added as is the tuple can not
be made shorter by prefix-stripping. Seems we should check every index
tuple length to fit the page before its insertion.

Thanks for looking into this.

Will see the code little bit later.

Ok

PFA v1 patch. Does this help?

I've made a mistake in attributes count in v1. PFA v2

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>

Dilip Kumar

dilipbalaut@gmail.com

about 5 years ago

In reply to: Pavel Borisov (#5)

Re: OOM in spgist insert

On Wed, May 12, 2021 at 2:21 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:

PFA v1 patch. Does this help?

I've made a mistake in attributes count in v1. PFA v2

V2 works. Thanks for fixing this quickly, I think you can add a
comment for the new error condition you added.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Pavel Borisov

pashkin.elfe@gmail.com

about 5 years ago

In reply to: Dilip Kumar (#6)

Re: OOM in spgist insert

V2 works. Thanks for fixing this quickly, I think you can add a
comment for the new error condition you added.

Added comments. PFA v3

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>

Dilip Kumar

dilipbalaut@gmail.com

about 5 years ago

In reply to: Pavel Borisov (#7)

Re: OOM in spgist insert

On Wed, May 12, 2021 at 5:11 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:

V2 works. Thanks for fixing this quickly, I think you can add a
comment for the new error condition you added.

Added comments. PFA v3

Thanks.

+ *
+ * For indexes with INCLUDEd columns we do not know whether we can reduce
+ * index tuple size by suffixing its key part or we will go into the
+ * endless loop on pick-split (in case included columns data is big enough

INCLUDEd -> why you have used a mixed case here?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Pavel Borisov

pashkin.elfe@gmail.com

about 5 years ago

In reply to: Dilip Kumar (#8)

Re: OOM in spgist insert

INCLUDEd -> why you have used a mixed case here?

It is current practice to call INCLUDE columns in capital, you can find
many places in the current code. But case mixture can be avoided indeed ))
PFA v4

#10

Dilip Kumar

dilipbalaut@gmail.com

about 5 years ago

In reply to: Pavel Borisov (#9)

Re: OOM in spgist insert

On Wed, May 12, 2021 at 5:35 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:

INCLUDEd -> why you have used a mixed case here?

It is current practice to call INCLUDE columns in capital, you can find many places in the current code. But case mixture can be avoided indeed ))
PFA v4

Okay, that makes sense.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#11

Tom Lane

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Pavel Borisov (#9)

Re: OOM in spgist insert

Pavel Borisov <pashkin.elfe@gmail.com> writes:

[ v4-0001-When-there-are-INCLUDEd-columns-in-SpGist-check-t.patch ]

I don't like this patch one bit --- it's basically disabling a fairly
important SPGiST feature as soon as there are included columns.
What's more, it's not really giving us any better defense against
the infinite-picksplit-loop problem than we had before.

I wonder if we should give up on the theory posited circa
spgdoinsert.c:2213:

* Note: if the opclass sets longValuesOK, we rely on the
* choose function to eventually shorten the leafDatum
* enough to fit on a page. We could add a test here to
* complain if the datum doesn't get visibly shorter each
* time, but that could get in the way of opclasses that
* "simplify" datums in a way that doesn't necessarily
* lead to physical shortening on every cycle.

The argument that there might be a longValuesOK opclass that *doesn't*
shorten the datum each time seems fairly hypothetical to me.

An alternative way of looking at things is that this is the opclass's
fault. It should have realized that it's not making progress, and
thrown an error. However, I'm not sure if the opclass picksplit
function has enough info to throw a meaningful error. It looks to
me like the trouble case from spg_text_picksplit's point of view
is that it's given a single tuple containing a zero-length string,
so that there is no prefix it can strip. However, it seems possible
that that could be a legitimate case. Even if it's not, the opclass
function doesn't have (for instance) the name of the index to cite
in an error message. Nor did we give it a way to return a failure
indication, which is seeming like a mistake right now.

BTW, another nasty thing I discovered while testing this is that
the CHECK_FOR_INTERRUPTS() at line 2146 is useless, because
we're holding a buffer lock there so InterruptHoldoffCount > 0.
So once you get into this loop you can't even cancel the query.
Seems like that needs a fix, too.

regards, tom lane

#12

Alvaro Herrera

alvherre@2ndquadrant.com

about 5 years ago

In reply to: Tom Lane (#11)

Re: OOM in spgist insert

On 2021-May-13, Tom Lane wrote:

BTW, another nasty thing I discovered while testing this is that
the CHECK_FOR_INTERRUPTS() at line 2146 is useless, because
we're holding a buffer lock there so InterruptHoldoffCount > 0.
So once you get into this loop you can't even cancel the query.
Seems like that needs a fix, too.

This comment made me remember a patch I've had for a while, which splits
the CHECK_FOR_INTERRUPTS() definition in two -- one of them is
INTERRUPTS_PENDING_CONDITION() which let us test the condition
separately; that allows the lock we hold to be released prior to
actually processing the interrupts.

The btree code modified was found to be an actual problem in production
when a btree is corrupted in such a way that vacuum would get an
infinite loop. I don't remember the exact details but I think we saw
vacuum running for a couple of weeks, and had to restart the server in
order to terminate it (since it wouldn't respond to signals).

--
ï¿½lvaro Herrera Valdivia, Chile
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this." (Fotis)
(http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)

#13

Alvaro Herrera

alvherre@2ndquadrant.com

about 5 years ago

In reply to: Alvaro Herrera (#12)

Re: OOM in spgist insert

On 2021-May-13, Alvaro Herrera wrote:

The btree code modified was found to be an actual problem in production
when a btree is corrupted in such a way that vacuum would get an
infinite loop. I don't remember the exact details but I think we saw
vacuum running for a couple of weeks, and had to restart the server in
order to terminate it (since it wouldn't respond to signals).

(Looking again, the nbtpage.c hunk might have been made obsolete by
c34787f91058 and other commits).

--
ï¿½lvaro Herrera Valdivia, Chile

#14

Tom Lane

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Alvaro Herrera (#12)

Re: OOM in spgist insert

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2021-May-13, Tom Lane wrote:

BTW, another nasty thing I discovered while testing this is that
the CHECK_FOR_INTERRUPTS() at line 2146 is useless, because
we're holding a buffer lock there so InterruptHoldoffCount > 0.
So once you get into this loop you can't even cancel the query.
Seems like that needs a fix, too.

Attached is a WIP patch for that part. Basically, if it looks
like there's an interrupt pending, make spgdoinsert() fall out of
its loop, so it'll release the buffer locks, and then recheck
CHECK_FOR_INTERRUPTS() at a point where it can really throw the
error. Now, the hole in this approach is what if ProcessInterrupts
still declines to throw the error? I've made the patch make use of
the existing provision to retry spgdoinsert() in case of deadlock,
so that it just goes around and tries again. But that doesn't seem
terribly satisfactory, because if InterruptPending remains set then
we'll just have an infinite loop at that outer level. IOW, this
patch can only cope with the situation where there's not any outer
function holding a buffer lock (or other reason to prevent the
query cancel from taking effect). Maybe that's good enough, but
I'm not terribly pleased with it.

This comment made me remember a patch I've had for a while, which splits
the CHECK_FOR_INTERRUPTS() definition in two -- one of them is
INTERRUPTS_PENDING_CONDITION() which let us test the condition
separately; that allows the lock we hold to be released prior to
actually processing the interrupts.

Hm. Yeah, I was feeling that this patch is unduly friendly with
the innards of CHECK_FOR_INTERRUPTS(). So maybe some refactoring
is called for, but I'm not quite sure what it should look like.
Testing the condition separately is fine, but what about the case
of ProcessInterrupts refusing to pull the trigger?

regards, tom lane

#15

Tom Lane

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Alvaro Herrera (#13)

Re: OOM in spgist insert

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

(Looking again, the nbtpage.c hunk might have been made obsolete by
c34787f91058 and other commits).

OK. Here's a revision that adopts your idea, except that I left
out the nbtpage.c change since you aren't sure of that part.

I added a macro that allows spgdoinsert to Assert that it's not
called in a context where the infinite-loop-due-to-InterruptPending
risk would arise. This is a little bit fragile, because it'd be
easy for ill-considered changes to ProcessInterrupts to break it,
but it's better than nothing.

regards, tom lane

#16

Tom Lane

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Tom Lane (#15)

Re: OOM in spgist insert

Here's a patch that attempts to prevent the infinite loop in spgdoinsert
by checking whether the proposed leaf tuple is getting smaller at each
iteration.

We can't be totally rigid about that, because for example if the opclass
shortens a 7-byte string to 5 bytes, that will make no difference in the
tuple's size after alignment. I first tried to handle that by checking
datumGetSize() of the key datum itself, but observed that spgtextproc.c
has some cases where it'll return an empty leaf-datum string at two
levels before succeeding. Maybe it'd be okay to fail that on the
grounds that it can't become any smaller later. But on the whole, and
considering the existing comment's concerns about opclasses that don't
shorten the datum every time, it seems like a good idea to allow some
fuzz here. So what this patch does is to allow up to 10 cycles with no
reduction in the actual leaf tuple size before failing. That way we can
handle slop due to alignment roundoff and slop due to opclass corner
cases with a single, very cheap mechanism. It does mean that we might
build a few more useless inner tuples before failing --- but that seems
like a good tradeoff for *not* failing in cases where the opclass is
able to shorten the leaf datum sufficiently.

I have not bothered to tease apart the query-cancel and infinite-loop
parts of the patch, but probably should do that before committing.

What do people think about back-patching this? In existing branches,
we've defined it to be an opclass bug if it fails to shorten the leaf
datum enough. But not having any defenses against that seems like
not a great idea. OTOH, the 10-cycles-to-show-progress rule could be
argued to be an API break.

regards, tom lane

#17

Alvaro Herrera

alvherre@2ndquadrant.com

about 5 years ago

In reply to: Tom Lane (#15)

Re: OOM in spgist insert

On 2021-May-13, Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

(Looking again, the nbtpage.c hunk might have been made obsolete by
c34787f91058 and other commits).

OK. Here's a revision that adopts your idea, except that I left
out the nbtpage.c change since you aren't sure of that part.

Thanks.

I added a macro that allows spgdoinsert to Assert that it's not
called in a context where the infinite-loop-due-to-InterruptPending
risk would arise. This is a little bit fragile, because it'd be
easy for ill-considered changes to ProcessInterrupts to break it,
but it's better than nothing.

Hmm, it looks OK to me, but I wonder why you kept the original
CHECK_FOR_INTERRUPTS()s since these would be done once we've broken out
of the loop anyway. I tested Dilip's original test case and while we
still die on OOM, we're able to interrupt it before dying.

Not related to this patch -- I was bothered by the UnlockReleaseBuffer
calls at the bottom of spgdoinsert that leave the buffer still set in
the structs. It's not a problem if you look only at this routine, but I
notice that callee doPickSplit does the same thing, so maybe there is
some weird situation in which you could end up with the buffer variable
set, but we don't hold lock nor pin on the page, so an attempt to clean
up would break. I don't know enough about spgist to figure out how to
craft a test case, maybe it's impossible to reach for some reason, but
it seems glass-in-the-beach sort of thing.

--
ï¿½lvaro Herrera 39ï¿½49'30"S 73ï¿½17'W
"Someone said that it is at least an order of magnitude more work to do
production software than a prototype. I think he is wrong by at least
an order of magnitude." (Brian Kernighan)

#18

Alvaro Herrera

alvherre@2ndquadrant.com

about 5 years ago

In reply to: Tom Lane (#16)

Re: OOM in spgist insert

On 2021-May-13, Tom Lane wrote:

What do people think about back-patching this? In existing branches,
we've defined it to be an opclass bug if it fails to shorten the leaf
datum enough. But not having any defenses against that seems like
not a great idea. OTOH, the 10-cycles-to-show-progress rule could be
argued to be an API break.

I think if the alternative is to throw an error, we can afford to retry
quite a few more times than 10 in order not have that called an API
break. Say, retry (MAXIMUM_ALIGNOF << 3) times or so (if you want to
parameterize on maxalign). It's not like this is going to be a
performance drag where not needed .. but I think leaving back-branches
unfixed is not great.

I did run Dilip's test case as well as your new regression test, and
both work as intended with your new code (and both OOM-crash the
original code).

--
ï¿½lvaro Herrera Valdivia, Chile

#19

Pavel Borisov

pashkin.elfe@gmail.com

about 5 years ago

In reply to: Alvaro Herrera (#17)

Re: OOM in spgist insert

I think it's good to backpatch the check as it doesn't change acceptable
behavior, just replace infinite loop with the acceptable thing.

#20

Tom Lane

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Alvaro Herrera (#17)

Re: OOM in spgist insert

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Hmm, it looks OK to me, but I wonder why you kept the original
CHECK_FOR_INTERRUPTS()s since these would be done once we've broken out
of the loop anyway. I tested Dilip's original test case and while we
still die on OOM, we're able to interrupt it before dying.

Hm. My thought was that in the cases where InterruptPending is set for
some reason other than a query cancel, we could let ProcessInterrupts
service it at less cost than abandoning and retrying the index insertion.
On reflection though, that only works for the first CHECK_FOR_INTERRUPTS
at the top of the loop, and only the first time through, because during
later calls we'll be holding buffer locks.

Maybe the best idea is to have one CHECK_FOR_INTERRUPTS at the top of
the function, in hopes of clearing out any already-pending interrupts,
and then just use the condition test inside the loop.

Not related to this patch -- I was bothered by the UnlockReleaseBuffer
calls at the bottom of spgdoinsert that leave the buffer still set in
the structs. It's not a problem if you look only at this routine, but I
notice that callee doPickSplit does the same thing, so maybe there is
some weird situation in which you could end up with the buffer variable
set, but we don't hold lock nor pin on the page, so an attempt to clean
up would break.

Maybe I'm confused, but aren't those just local variables that are about
to go out of scope anyway? Clearing them seems more likely to draw
compiler warnings about dead stores than accomplish something useful.

regards, tom lane

#21

Tom Lane

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Alvaro Herrera (#18)

#22

Tom Lane

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Tom Lane (#21)

#23