Valgrind-detected bug in partitioning code

Started by Tom Lanealmost 9 years ago9 messages
#1Tom Lane
tgl@sss.pgh.pa.us

skink has been unhappy since commit d26fa4f went in, but I think
that just exposed a pre-existing bug. Running valgrind here
duplicates the failure:

==00:00:02:01.653 16626== Conditional jump or move depends on uninitialised value(s)
==00:00:02:01.653 16626== at 0x4BDF6B: btint4cmp (nbtcompare.c:97)
==00:00:02:01.653 16626== by 0x81D6BE: FunctionCall2Coll (fmgr.c:1318)
==00:00:02:01.653 16626== by 0x52D584: partition_bounds_equal (partition.c:627)
==00:00:02:01.653 16626== by 0x80CF8E: RelationClearRelation (relcache.c:1203)
==00:00:02:01.653 16626== by 0x80E601: RelationCacheInvalidateEntry (relcache.c:2662)
==00:00:02:01.653 16626== by 0x803DD6: LocalExecuteInvalidationMessage (inval.c:568)
==00:00:02:01.653 16626== by 0x803F53: ProcessInvalidationMessages.clone.0 (inval.c:444)
==00:00:02:01.653 16626== by 0x8040C8: CommandEndInvalidationMessages (inval.c:1056)
==00:00:02:01.653 16626== by 0x80C719: RelationSetNewRelfilenode (relcache.c:3490)
==00:00:02:01.653 16626== by 0x5CD50A: ExecuteTruncate (tablecmds.c:1393)
==00:00:02:01.653 16626== by 0x721AC7: standard_ProcessUtility (utility.c:532)
==00:00:02:01.653 16626== by 0x71D943: PortalRunUtility (pquery.c:1163)

IOW, partition_bounds_equal() is testing uninitialized memory during
a TRUNCATE on a partitioned table.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#1)
Re: Valgrind-detected bug in partitioning code

On Fri, Jan 20, 2017 at 8:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

skink has been unhappy since commit d26fa4f went in, but I think
that just exposed a pre-existing bug. Running valgrind here
duplicates the failure:

==00:00:02:01.653 16626== Conditional jump or move depends on uninitialised value(s)
==00:00:02:01.653 16626== at 0x4BDF6B: btint4cmp (nbtcompare.c:97)
==00:00:02:01.653 16626== by 0x81D6BE: FunctionCall2Coll (fmgr.c:1318)
==00:00:02:01.653 16626== by 0x52D584: partition_bounds_equal (partition.c:627)
==00:00:02:01.653 16626== by 0x80CF8E: RelationClearRelation (relcache.c:1203)
==00:00:02:01.653 16626== by 0x80E601: RelationCacheInvalidateEntry (relcache.c:2662)
==00:00:02:01.653 16626== by 0x803DD6: LocalExecuteInvalidationMessage (inval.c:568)
==00:00:02:01.653 16626== by 0x803F53: ProcessInvalidationMessages.clone.0 (inval.c:444)
==00:00:02:01.653 16626== by 0x8040C8: CommandEndInvalidationMessages (inval.c:1056)
==00:00:02:01.653 16626== by 0x80C719: RelationSetNewRelfilenode (relcache.c:3490)
==00:00:02:01.653 16626== by 0x5CD50A: ExecuteTruncate (tablecmds.c:1393)
==00:00:02:01.653 16626== by 0x721AC7: standard_ProcessUtility (utility.c:532)
==00:00:02:01.653 16626== by 0x71D943: PortalRunUtility (pquery.c:1163)

IOW, partition_bounds_equal() is testing uninitialized memory during
a TRUNCATE on a partitioned table.

Hmm. That's bad. I kind of wonder how sane it is to think that we
can invoke SQL-callable functions during a relcache reload, because
couldn't we be processing an invalidation in the context of an aborted
transaction? And I wonder why we really need or want to do that
anyway. For purposes of equalPartitionDescs(), it seems like the
relevant test is datumIsEqual(), not the equality operator derived
from the partition opclass.

But I think the immediate problem here is some fuzzy thinking about
the handling of the values taken from b1->content and b2->content.
Those have to be checked before examining values from b1->datums
and/or b2->datums, and the latter should be inspected only if the
former are both identical and both RANGE_DATUM_FINITE. I'll push a
fix for that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#2)
Re: Valgrind-detected bug in partitioning code

Robert Haas <robertmhaas@gmail.com> writes:

Hmm. That's bad. I kind of wonder how sane it is to think that we
can invoke SQL-callable functions during a relcache reload, because
couldn't we be processing an invalidation in the context of an aborted
transaction?

You're doing WHAT?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#3)
Re: Valgrind-detected bug in partitioning code

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Hmm. That's bad. I kind of wonder how sane it is to think that we
can invoke SQL-callable functions during a relcache reload, because
couldn't we be processing an invalidation in the context of an aborted
transaction?

You're doing WHAT?

Uh. +1.

Thanks!

Stephen

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#4)
Re: Valgrind-detected bug in partitioning code

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Hmm. That's bad. I kind of wonder how sane it is to think that we
can invoke SQL-callable functions during a relcache reload,

You're doing WHAT?

Uh. +1.

Now that I've calmed down a bit: the right way to do this sort of thing is
simply to flush the invalidated data during reload, and recompute it when
it is next requested, which necessarily will be inside a valid
transaction. Compare e.g. the handling of the lists of a relation's
indexes.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#5)
Re: Valgrind-detected bug in partitioning code

On Fri, Jan 20, 2017 at 4:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Hmm. That's bad. I kind of wonder how sane it is to think that we
can invoke SQL-callable functions during a relcache reload,

You're doing WHAT?

Uh. +1.

Now that I've calmed down a bit: the right way to do this sort of thing is
simply to flush the invalidated data during reload, and recompute it when
it is next requested, which necessarily will be inside a valid
transaction. Compare e.g. the handling of the lists of a relation's
indexes.

The existing handling of partition descriptors is modeled on and very
similar to the existing handling for other types of objects:

keep_tupdesc = equalTupleDescs(relation->rd_att,
newrel->rd_att);
keep_rules = equalRuleLocks(relation->rd_rules,
newrel->rd_rules);
keep_policies = equalRSDesc(relation->rd_rsdesc,
newrel->rd_rsdesc);
keep_partkey = (relation->rd_partkey != NULL);
keep_partdesc = equalPartitionDescs(relation->rd_partkey,

relation->rd_partdesc,

newrel->rd_partdesc);

And I think the reason is the same too, namely, if we've got a pointer
into partition descriptor in the relcache, we don't want that to
suddenly get swapped out and replaced with a pointer to an equivalent
data structure at a different address, because then our pointer will
be dangling. That seems fine as far as it goes.

The difference is that those other equalBLAH functions call a
carefully limited amount of code whereas, in looking over the
backtrace you sent, I realized that equalPartitionDescs is calling
partition_bounds_equal which does this:

cmpval =
DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[j],

key->partcollation[j],

b1->datums[i][j],

b2->datums[i][j]))

That's of course opening up a much bigger can of worms. But apart
from the fact that it's unsafe, I think it's also wrong, as I said
upthread. I think calling datumIsEqual() there should be better all
around. Do you think that's unsafe here?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#6)
Re: Valgrind-detected bug in partitioning code

Robert Haas <robertmhaas@gmail.com> writes:

The difference is that those other equalBLAH functions call a
carefully limited amount of code whereas, in looking over the
backtrace you sent, I realized that equalPartitionDescs is calling
partition_bounds_equal which does this:
cmpval =
DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[j],
key->partcollation[j],
b1->datums[i][j],
b2->datums[i][j]))

Ah, gotcha.

That's of course opening up a much bigger can of worms. But apart
from the fact that it's unsafe, I think it's also wrong, as I said
upthread. I think calling datumIsEqual() there should be better all
around. Do you think that's unsafe here?

That sounds like a plausible solution. It is safe in the sense of
being a bounded amount of code. It would return "false" in various
interesting cases like toast pointer versus detoasted equivalent,
but I think that would be fine in this application.

It would probably be a good idea to add something to datumIsEqual's
comment to the effect that trying to make it smarter would be a bad idea,
because some callers rely on it being stupid.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#7)
1 attachment(s)
Re: Valgrind-detected bug in partitioning code

On 2017/01/21 9:01, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

The difference is that those other equalBLAH functions call a
carefully limited amount of code whereas, in looking over the
backtrace you sent, I realized that equalPartitionDescs is calling
partition_bounds_equal which does this:
cmpval =
DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[j],
key->partcollation[j],
b1->datums[i][j],
b2->datums[i][j]))

Ah, gotcha.

That's of course opening up a much bigger can of worms. But apart
from the fact that it's unsafe, I think it's also wrong, as I said
upthread. I think calling datumIsEqual() there should be better all
around. Do you think that's unsafe here?

That sounds like a plausible solution. It is safe in the sense of
being a bounded amount of code. It would return "false" in various
interesting cases like toast pointer versus detoasted equivalent,
but I think that would be fine in this application.

Sorry for jumping in late. Attached patch replaces the call to
partitioning-specific comparison function by the call to datumIsEqual().
I wonder if it is safe to assume that datumIsEqual() would return true for
a datum and copy of it made using datumCopy(). The latter is used to copy
a single datum from a bound's Const node (what is stored in the catalog
for every bound).

It would probably be a good idea to add something to datumIsEqual's
comment to the effect that trying to make it smarter would be a bad idea,
because some callers rely on it being stupid.

I assume "making datumIsEqual() smarter" here means to make it account for
toasting of varlena datums, which is not a good idea because some of its
callers may be working in the context of an aborted transaction. I tried
to update the header comment along these lines, though please feel to
rewrite it.

Thanks,
Amit

Attachments:

safe-partition_bounds_equal-1.patchtext/x-diff; name=safe-partition_bounds_equal-1.patchDownload
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index ad95b1bc55..8c9a373f1f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -639,12 +639,14 @@ partition_bounds_equal(PartitionKey key,
 					continue;
 			}
 
-			/* Compare the actual values */
-			cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[j],
-													 key->partcollation[j],
-													 b1->datums[i][j],
-													 b2->datums[i][j]));
-			if (cmpval != 0)
+			/*
+			 * Compare the actual values; note that we do not invoke the
+			 * partitioning specific comparison operator here, because we
+			 * could be in an aborted transaction.
+			 */
+			if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
+							  key->parttypbyval[j],
+							  key->parttyplen[j]))
 				return false;
 		}
 
diff --git a/src/backend/utils/adt/datum.c b/src/backend/utils/adt/datum.c
index 535e4277cc..071a7d4db1 100644
--- a/src/backend/utils/adt/datum.c
+++ b/src/backend/utils/adt/datum.c
@@ -209,6 +209,10 @@ datumTransfer(Datum value, bool typByVal, int typLen)
  * of say the representation of zero in one's complement arithmetic).
  * Also, it will probably not give the answer you want if either
  * datum has been "toasted".
+ *
+ * Do not try to make this any smarter than it currently is with respect
+ * to "toasted" datums, because some of the callers could be working in the
+ * context of an aborted transaction.
  *-------------------------------------------------------------------------
  */
 bool
#9Robert Haas
robertmhaas@gmail.com
In reply to: Amit Langote (#8)
Re: Valgrind-detected bug in partitioning code

On Mon, Jan 23, 2017 at 12:45 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

Sorry for jumping in late. Attached patch replaces the call to
partitioning-specific comparison function by the call to datumIsEqual().
I wonder if it is safe to assume that datumIsEqual() would return true for
a datum and copy of it made using datumCopy(). The latter is used to copy
a single datum from a bound's Const node (what is stored in the catalog
for every bound).

Thanks, committed. I expanded the comment in partition.c because I
think you missed the other rationale for doing it this way, which is
that the partitioning operator might ignore some "unimportant" changes
(e.g. for numeric, the difference between 1.0 and 1.00) but for this
purpose it's better to update the relcache if there is *any* change.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers