Patch to fix write after end of array in hashed agg initialization

Started by Colm McHugh, almost 7 years ago · 7 messages · pgsql-hackers
#1 Colm McHugh
colm.mchugh@gmail.com

Attached is a patch for a write after allocated memory which we found in
testing. It's an obscure case but can happen if the same column is used in
different grouping keys, as in the example below, which uses tables from
the regress test suite (build with --enable-cassert in order to turn on
memory warnings). Patch is against master.

The hashed aggregate state has an array for the column indices that is
sized using the number of non-aggregated columns in the set comprising
the agg's targetlist, quals, and input grouping columns. The duplicate
elimination of columns can result in under-allocation, as below. Sizing
based on the number of grouping columns plus the number of targetlist/qual
columns not among the grouping columns avoids this.

Regards,
Colm McHugh (Salesforce)

explain (costs off) select 1 from tenk where (hundred, thousand) in (select
twothousand, twothousand from onek);

psql: WARNING: problem in alloc set ExecutorState: detected write past
chunk end in block 0x7f8b8901fa00, chunk 0x7f8b89020cd0

psql: WARNING: problem in alloc set ExecutorState: detected write past
chunk end in block 0x7f8b8901fa00, chunk 0x7f8b89020cd0

                         QUERY PLAN
-------------------------------------------------------------
 Hash Join
   Hash Cond: (tenk.hundred = onek.twothousand)
   ->  Seq Scan on tenk
         Filter: (hundred = thousand)
   ->  Hash
         ->  HashAggregate
               Group Key: onek.twothousand, onek.twothousand
               ->  Seq Scan on onek
(8 rows)

Attachments:

0001-Fix-write-after-end-of-array-in-hashed-Agg-initializ.patch (application/x-patch, +48 −12)
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Colm McHugh (#1)
Re: Patch to fix write after end of array in hashed agg initialization

Colm McHugh <colm.mchugh@gmail.com> writes:

Attached is a patch for a write after allocated memory which we found in
testing. It's an obscure case but can happen if the same column is used in
different grouping keys, as in the example below, which uses tables from
the regress test suite (build with --enable-cassert in order to turn on
memory warnings). Patch is against master.

I confirm the appearance of the memory-overwrite warnings in HEAD.

It looks like the bad code is (mostly) the fault of commit b5635948.
Andrew, can you take a look at this fix?

regards, tom lane

#3 Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#2)
Re: Patch to fix write after end of array in hashed agg initialization

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Attached is a patch for a write after allocated memory which we
found in testing. It's an obscure case but can happen if the same
column is used in different grouping keys, as in the example below,
which uses tables from the regress test suite (build with
--enable-cassert in order to turn on memory warnings). Patch is
against master.

Tom> I confirm the appearance of the memory-overwrite warnings in HEAD.

Tom> It looks like the bad code is (mostly) the fault of commit
Tom> b5635948. Andrew, can you take a look at this fix?

I'll look into it.

--
Andrew (irc:RhodiumToad)

#4 Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Andrew Gierth (#3)
Re: Patch to fix write after end of array in hashed agg initialization

"Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

Attached is a patch for a write after allocated memory which we
found in testing. It's an obscure case but can happen if the same
column is used in different grouping keys, as in the example below,
which uses tables from the regress test suite (build with
--enable-cassert in order to turn on memory warnings). Patch is
against master.

Andrew> I'll look into it.

OK, so my first impression is that this is down to (a) the fact that
when planning a GROUP BY, we eliminate duplicate grouping keys; (b) due
to (a), the executor code isn't expecting to have to deal with
duplicates, but (c) when using a HashAgg to implement a Unique path, the
planner code isn't making any attempt to eliminate duplicates so they
get through.

It was wrong before commit b5635948 too; it looks like Andres' fc4b3dea2,
which introduced the arrays and the concept of narrowing the stored
tuples, is the actual culprit. But I'll deal with fixing it anyway unless
Andres has a burning desire to step in.

My inclination is to fix this in the planner rather than the executor;
there seems no good reason to actually hash a duplicate column more than
once.

--
Andrew (irc:RhodiumToad)

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#4)
Re: Patch to fix write after end of array in hashed agg initialization

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

My inclination is to fix this in the planner rather than the executor;
there seems no good reason to actually hash a duplicate column more than
once.

Sounds reasonable --- but would it make sense to introduce some
assertions, or other cheap tests, into the executor to check that
it's not being given a case it can't handle?

regards, tom lane

#6 Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#5)
Re: Patch to fix write after end of array in hashed agg initialization

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

My inclination is to fix this in the planner rather than the
executor; there seems no good reason to actually hash a duplicate
column more than once.

Tom> Sounds reasonable --- but would it make sense to introduce some
Tom> assertions, or other cheap tests, into the executor to check that
Tom> it's not being given a case it can't handle?

Oh definitely, I was planning on it.

--
Andrew (irc:RhodiumToad)

#7 Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Andrew Gierth (#4)
Re: Patch to fix write after end of array in hashed agg initialization

"Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

Andrew> My inclination is to fix this in the planner rather than the
Andrew> executor; there seems no good reason to actually hash a
Andrew> duplicate column more than once.

I take this back; I don't believe it's possible to eliminate duplicates
in all cases. Consider (a,b) IN (select c,c from...), where a,b,c are
different types; I don't think we can assume that (a=c) and (b=c)
cross-type comparisons will necessarily induce the same hash function on
c, and so we might legitimately need to keep it duplicated.

So I'm going with a simpler method of ensuring the array is adequately
sized at execution time and not touching the planner at all. Draft patch
is attached, will commit it later.

--
Andrew (irc:RhodiumToad)

Attachments:

hashfix.patch (text/x-patch, +47 −7)