Broken resetting of subplan hash tables

Started by Andreas Karlsson · almost 6 years ago · 5 messages
#1 Andreas Karlsson
andreas@proxel.se
1 attachment(s)

Hi,

I took another look at one of the potential issues Ranier Vilela's static
analysis found, and I still think this one is a real bug. My original
patch, however, was incorrect and introduced a use-after-free bug.

The code for resetting the hash tables of the SubPlanState node in
buildSubPlanHash() looks like it can never run, and additionally it
would be broken if it ever did run. This issue was introduced in
commit 356687bd825e5ca7230d43c1bffe7a59ad2e77bd.
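
For reference, the relevant part of buildSubPlanHash() currently looks
roughly like this (condensed, with the BuildTupleHashTableExt() argument
lists elided):

    MemoryContextReset(node->hashtablecxt);
    node->hashtable = NULL;
    node->hashnulls = NULL;
    node->havehashrows = false;
    node->havenullrows = false;

    if (node->hashtable)            /* always false: set to NULL just above */
        ResetTupleHashTable(node->hashtable);
    else
        node->hashtable = BuildTupleHashTableExt(...);

    if (!subplan->unknownEqFalse)
    {
        if (node->hashnulls)        /* likewise always false */
            ResetTupleHashTable(node->hashtable);   /* typo: should be node->hashnulls */
        else
            node->hashnulls = BuildTupleHashTableExt(...);
    }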

As far as I can gather, the following is true:

1. buildSubPlanHash() sets node->hashtable and node->hashnulls to NULL a
few lines before checking whether they are non-NULL, which means the code
for resetting them can never be reached.

2. Even if we changed the code so that the ResetTupleHashTable() calls
were reachable, we would hit a typo: the code resets node->hashtable twice
and never resets node->hashnulls, which would cause non-obvious bugs.

3. Additionally, since the memory context used by the hash tables is
reset in buildSubPlanHash(), if we start resetting the hash tables we will
get a use-after-free bug.

I have attached a patch which makes sure the code for resetting the hash
tables actually runs, while also fixing that reset code.

Since the current behavior of the code in HEAD is not actually broken
(it is just an optimization that is never used), this fix does not have to
be backpatched.

Andreas

Attachments:

fix-subplan-hash-reset-v2.patch (text/x-patch, charset=UTF-8)
diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c
index ff95317879..4d7306ff06 100644
--- a/src/backend/executor/nodeSubplan.c
+++ b/src/backend/executor/nodeSubplan.c
@@ -494,9 +494,6 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
 	 * If it's not necessary to distinguish FALSE and UNKNOWN, then we don't
 	 * need to store subplan output rows that contain NULL.
 	 */
-	MemoryContextReset(node->hashtablecxt);
-	node->hashtable = NULL;
-	node->hashnulls = NULL;
 	node->havehashrows = false;
 	node->havenullrows = false;
 
@@ -533,7 +530,7 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
 		}
 
 		if (node->hashnulls)
-			ResetTupleHashTable(node->hashtable);
+			ResetTupleHashTable(node->hashnulls);
 		else
 			node->hashnulls = BuildTupleHashTableExt(node->parent,
 													 node->descRight,
@@ -549,6 +546,8 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
 													 node->hashtempcxt,
 													 false);
 	}
+	else
+		node->hashnulls = NULL;
 
 	/*
 	 * We are probably in a short-lived expression-evaluation context. Switch
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Karlsson (#1)
Re: Broken resetting of subplan hash tables

Andreas Karlsson <andreas@proxel.se> writes:

> The code for resetting the hash tables of the SubPlanState node in
> buildSubPlanHash() looks like it can never run, and additionally it
> would be broken if it ever did run. This issue was introduced in
> commit 356687bd825e5ca7230d43c1bffe7a59ad2e77bd.

Right. Justin Pryzby also noted this a couple weeks back, but we
didn't deal with it at that point because we were up against a
release deadline.

> As far as I can gather, the following is true:
>
> 1. buildSubPlanHash() sets node->hashtable and node->hashnulls to NULL a
> few lines before checking whether they are non-NULL, which means the code
> for resetting them can never be reached.

Yeah. Those lines should have been removed when the ResetTupleHashTable
logic was added.

> 2. Even if we changed the code so that the ResetTupleHashTable() calls
> were reachable, we would hit a typo: the code resets node->hashtable twice
> and never resets node->hashnulls, which would cause non-obvious bugs.

Right.

> 3. Additionally, since the memory context used by the hash tables is
> reset in buildSubPlanHash(), if we start resetting the hash tables we will
> get a use-after-free bug.

Nope, not right. The hash table metadata is now allocated in the
es_query_cxt; what is in the hashtablecxt is just tuples, and that
does need to be cleared, per the comment for ResetTupleHashTable.
Your patch as given results in a nasty memory leak, which is easily
demonstrated with a small mod to the regression test case I added:

select sum(ss.tst::int) from
generate_series(1,10000000) o cross join lateral (
select i.ten in (select f1 from int4_tbl where f1 <= o) as tst,
random() as r
from onek i where i.unique1 = o%1000 ) ss;

> Since the current behavior of the code in HEAD is not actually broken
> (it is just an optimization that is never used), this fix does not have to
> be backpatched.

Unfortunately ... this test case also leaks memory like mad in
HEAD, v12, and v11, because all of them are rebuilding the hash
table from scratch without clearing the old one. So this is
indeed broken and a back-patch is necessary.

I noted while looking at this that most of the calls of
ResetTupleHashTable are actually never reached in the regression
tests (per code coverage report) so I made up some test cases
that do reach 'em, and included those in the commit.

TBH, I think that this tuple table API is seriously misdesigned;
it is confusing and very error-prone to have the callers need to
reset the tuple context separately from calling ResetTupleHashTable.
Also, the callers all look like their resets are intended to destroy
the whole hashtable not just its contents (as indeed they were doing,
before the faulty commit). But I didn't attempt to fix that today.
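
That is, every caller has to remember to do both of these (forgetting the
first leaks the stored tuples, as above; forgetting the second leaves the
table full of pointers into freed memory):

    MemoryContextReset(tablecxt);       /* free the stored tuples */
    ResetTupleHashTable(hashtable);     /* empty the hash table proper */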

regards, tom lane

#3 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#2)
Re: Broken resetting of subplan hash tables

Hi,

On 2020-02-29 14:02:59 -0500, Tom Lane wrote:

>> 3. Additionally, since the memory context used by the hash tables is
>> reset in buildSubPlanHash(), if we start resetting the hash tables we
>> will get a use-after-free bug.

> Nope, not right. The hash table metadata is now allocated in the
> es_query_cxt; what is in the hashtablecxt is just tuples, and that
> does need to be cleared, per the comment for ResetTupleHashTable.
> Your patch as given results in a nasty memory leak, which is easily
> demonstrated with a small mod to the regression test case I added:

> select sum(ss.tst::int) from
> generate_series(1,10000000) o cross join lateral (
> select i.ten in (select f1 from int4_tbl where f1 <= o) as tst,
> random() as r
> from onek i where i.unique1 = o%1000 ) ss;

>> Since the current behavior of the code in HEAD is not actually broken
>> (it is just an optimization that is never used), this fix does not have
>> to be backpatched.

> Unfortunately ... this test case also leaks memory like mad in
> HEAD, v12, and v11, because all of them are rebuilding the hash
> table from scratch without clearing the old one. So this is
> indeed broken and a back-patch is necessary.

Yea :(. Thanks for doing that.

> I noted while looking at this that most of the calls of
> ResetTupleHashTable are actually never reached in the regression
> tests (per code coverage report) so I made up some test cases
> that do reach 'em, and included those in the commit.

Cool.

> TBH, I think that this tuple table API is seriously misdesigned;
> it is confusing and very error-prone to have the callers need to
> reset the tuple context separately from calling ResetTupleHashTable.

Do you have an alternative proposal? Before committing the patch adding
it that way, I'd waited for quite a while asking for input... In
several cases (nodeAgg.c, nodeSetOp.c) there's memory from outside
execGrouping.c that's also allocated in the same context as the table
contents (via TupleHashTable->additional) - just resetting the context
passed to BuildTupleHashTableExt() as part of ResetTupleHashTable()
seems problematic too.

We could change it so more of the metadata for execGrouping.c is
computed outside of BuildTupleHashTableExt(), and continue to destroy
the entire hashtable. But we'd still have to reallocate the hashtable,
the slot, etc. So having a reset interface seems like the right thing.

I guess we could set it up so that BuildTupleHashTableExt() registers a
memory context reset callback on tablecxt, which'd reinitialize the
hashtable. But that seems like it'd be at least as confusing?
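
Just to spell that last idea out, a hypothetical sketch (none of this
exists today; metacxt, tablecxt and hashtable stand in for whatever the
real code would pass around):

    /* fired whenever tablecxt is reset; re-empties the hash table */
    static void
    tuplehash_reset_callback(void *arg)
    {
        TupleHashTable hashtable = (TupleHashTable) arg;

        ResetTupleHashTable(hashtable);
        /* reset callbacks only fire once, so we'd have to re-register here */
    }

    ...
    MemoryContextCallback *cb;

    cb = MemoryContextAlloc(metacxt, sizeof(MemoryContextCallback));
    cb->func = tuplehash_reset_callback;
    cb->arg = hashtable;
    MemoryContextRegisterResetCallback(tablecxt, cb);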

> Also, the callers all look like their resets are intended to destroy
> the whole hashtable not just its contents (as indeed they were doing,
> before the faulty commit). But I didn't attempt to fix that today.

Hm? nodeAgg.c, nodeSetOp.c, nodeRecursiveUnion.c don't at all look like
that to me? Why would they want to drop the hashtable metadata when
resetting? What am I missing?

Greetings,

Andres Freund

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#3)
Re: Broken resetting of subplan hash tables

Andres Freund <andres@anarazel.de> writes:

> On 2020-02-29 14:02:59 -0500, Tom Lane wrote:
>> TBH, I think that this tuple table API is seriously misdesigned;
>> it is confusing and very error-prone to have the callers need to
>> reset the tuple context separately from calling ResetTupleHashTable.
>
> Do you have an alternative proposal?

I'd be inclined to let the tuple hashtable make its own tuple-storage
context and reset that for itself. Is it really worth the complexity
and bug hazards to share such a context with other uses?
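
That is, if the hashtable owned its storage context outright,
ResetTupleHashTable() could do the whole job itself, something like:

    void
    ResetTupleHashTable(TupleHashTable hashtable)
    {
        MemoryContextReset(hashtable->tablecxt);    /* the table's own context */
        tuplehash_reset(hashtable->hashtab);
    }

and callers would have nothing extra to remember.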

> We could change it so more of the metadata for execGrouping.c is
> computed outside of BuildTupleHashTableExt(), and continue to destroy
> the entire hashtable. But we'd still have to reallocate the hashtable,
> the slot, etc. So having a reset interface seems like the right thing.

Agreed, the reset interface is a good idea. I'm just not happy that
in addition to resetting, you have to remember to reset some
vaguely-related context (and heaven help you if you reset that context
but not the hashtable).

>> Also, the callers all look like their resets are intended to destroy
>> the whole hashtable not just its contents (as indeed they were doing,
>> before the faulty commit). But I didn't attempt to fix that today.

> Hm? nodeAgg.c, nodeSetOp.c, nodeRecursiveUnion.c don't at all look like
> that to me? Why would they want to drop the hashtable metadata when
> resetting? What am I missing?

They may not look like it to you; but Andreas misread that, and so did
I at first --- not least because that *is* how it used to work, and
there are still comments suggesting that that's how it works, eg this
in ExecInitRecursiveUnion:

* If hashing, we need a per-tuple memory context for comparisons, and a
* longer-lived context to store the hash table. The table can't just be
* kept in the per-query context because we want to be able to throw it
* away when rescanning.

"throw it away" sure looks like it means the entire hashtable, not just
its tuple contents.

regards, tom lane

#5 Ranier Vilela
ranier.vf@gmail.com
In reply to: Tom Lane (#4)
Re: Broken resetting of subplan hash tables

On Sat, Feb 29, 2020 at 18:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"throw it away" sure looks like it means the entire hashtable, not just
its tuple contents.

I don't know if I can comment clearly enough to help, but in my
experience, destroying and rebuilding the hashtable is a waste if it is
possible to just reset it.
By analogy, I have code with arrays where I reuse them with only one
line instead of reconstructing them:
a->nelts = 0; /* reset array */
If it is possible to do the same for hashtables, that would be great.

regards,
Ranier Vilela