PostgreSQL crashes with SIGSEGV

Started by Bernd Helmle · over 8 years ago · 33 messages · hackers, bugs
#1 Bernd Helmle
mailings@oopsware.de

A customer recently reported a crash in a postgres backend. The backend
encountered a SIGSEGV, crashing during SELECTs from a fairly
complicated view using a grouping set directive. I've managed to
reproduce it by tracking it down to a specific SELECT, but
unfortunately couldn't yet manage to strip it down to a small,
repeatable test case that doesn't involve the whole (sensitive)
dataset. I'm reporting my findings so far; maybe they already help to
track it down.

The crashed backend gives the following backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x000000000099be84 in pfree (pointer=0x17194e8) at mcxt.c:1007
1007 context = ((StandardChunkHeader *)
(gdb) bt
#0 0x000000000099be84 in pfree (pointer=0x17194e8) at mcxt.c:1007
#1 0x00000000004798d8 in heap_free_minimal_tuple (mtup=0x17194e8) at
heaptuple.c:1403
#2 0x0000000000687202 in ExecClearTuple (slot=0x10a3f88) at
execTuples.c:455
#3 0x0000000000686b89 in ExecResetTupleTable (tupleTable=0x10a06a0,
shouldFree=0 '\000') at execTuples.c:169
#4 0x00000000006773ff in ExecEndPlan (planstate=0x10a0398,
estate=0x10a01f0) at execMain.c:1469
#5 0x0000000000675927 in standard_ExecutorEnd (queryDesc=0x10910c0) at
execMain.c:468
#6 0x0000000000675865 in ExecutorEnd (queryDesc=0x10910c0) at
execMain.c:439
#7 0x000000000062c71c in PortalCleanup (portal=0xf79ae0) at
portalcmds.c:280
#8 0x000000000099ce97 in PortalDrop (portal=0xf79ae0, isTopCommit=0
'\000') at portalmem.c:510
#9 0x0000000000813639 in exec_simple_query (
query_string=0xf3dbe0 "SELECT * FROM \"xxxx\".\"xxxx\" WHERE (xxxx
= 'xxxx') LIMIT 1000;") at postgres.c:1095
#10 0x00000000008177ea in PostgresMain (argc=1, argv=0xee9e10,
dbname=0xeb9a78 "xxxx", username=0xee9c78 "bernd")
at postgres.c:4072
#11 0x000000000078cc27 in BackendRun (port=0xedc700) at
postmaster.c:4342
#12 0x000000000078c377 in BackendStartup (port=0xedc700) at
postmaster.c:4016
#13 0x0000000000788983 in ServerLoop () at postmaster.c:1721
#14 0x0000000000787fb8 in PostmasterMain (argc=1, argv=0xeb7960) at
postmaster.c:1329
#15 0x00000000006d1b52 in main (argc=1, argv=0xeb7960) at main.c:228

I've tested this so far against very current REL9_6_STABLE and
REL9_5_STABLE and got them to crash with the same backtrace. The crash
is dependent on the chosen plan, experiments with work_mem show that
the crash seems to happen only if you get external sorts into the
execution plan.

REL10_STABLE seems not affected, as my extracted application query
doesn't crash there.

Running the query against REL9_6_STABLE with valgrind shows the
following results:

==00:00:01:33.067 13158== Invalid write of size 8
==00:00:01:33.067 13158== at 0x93D53B: AllocSetFree (aset.c:998)
==00:00:01:33.067 13158== by 0x93EC03: pfree (mcxt.c:1012)
==00:00:01:33.067 13158== by 0x476E34: heap_free_minimal_tuple
(heaptuple.c:1403)
==00:00:01:33.067 13158== by 0x6521A3: ExecClearTuple
(execTuples.c:455)
==00:00:01:33.067 13158== by 0x651D4B: ExecResetTupleTable
(execTuples.c:169)

==00:00:03:34.540 14381== Invalid read of size 8
==00:00:03:34.540 14381== at 0x99E29C: pfree (mcxt.c:1007)
==00:00:03:34.540 14381== by 0x4798D7: heap_free_minimal_tuple
(heaptuple.c:1403)
==00:00:03:34.540 14381== by 0x68753E: ExecClearTuple
(execTuples.c:455)
==00:00:03:34.540 14381== by 0x686EC5: ExecResetTupleTable
(execTuples.c:169)
==00:00:03:34.540 14381== by 0x67773B: ExecEndPlan (execMain.c:1469)
==00:00:03:34.540 14381== by 0x675C63: standard_ExecutorEnd
(execMain.c:468)
==00:00:03:34.540 14381== by 0x675BA1: ExecutorEnd (execMain.c:439)
==00:00:03:34.540 14381== by 0x62CA58: PortalCleanup
(portalcmds.c:280)
==00:00:03:34.540 14381== by 0x99F412: PortalDrop (portalmem.c:510)
==00:00:03:34.540 14381== by 0x8142A4: exec_simple_query
(postgres.c:1095)
==00:00:03:34.540 14381== by 0x818455: PostgresMain
(postgres.c:4072)
==00:00:03:34.540 14381== by 0x78CF63: BackendRun
(postmaster.c:4342)
==00:00:03:34.540 14381== Address 0x188cd220 is 474,080 bytes inside a
block of size 1,048,576 free'd
==00:00:03:34.540 14381== at 0x4C2FD18: free
(vg_replace_malloc.c:530)
==00:00:03:34.540 14381== by 0x99A8DB: AllocSetDelete (aset.c:653)
==00:00:03:34.540 14381== by 0x99C7A4: MemoryContextDelete
(mcxt.c:225)
==00:00:03:34.540 14381== by 0x99C855: MemoryContextDeleteChildren
(mcxt.c:245)
==00:00:03:34.540 14381== by 0x99C772: MemoryContextDelete
(mcxt.c:208)
==00:00:03:34.540 14381== by 0x9A5F10: tuplesort_end
(tuplesort.c:1198)
==00:00:03:34.540 14381== by 0x692568: ExecEndAgg (nodeAgg.c:3449)
==00:00:03:34.540 14381== by 0x67BB46: ExecEndNode
(execProcnode.c:755)
==00:00:03:34.540 14381== by 0x6AD652: ExecEndSubqueryScan
(nodeSubqueryscan.c:181)
==00:00:03:34.540 14381== by 0x67BA69: ExecEndNode
(execProcnode.c:697)
==00:00:03:34.540 14381== by 0x69DF6E: ExecEndLimit
(nodeLimit.c:438)
==00:00:03:34.540 14381== by 0x67BB9D: ExecEndNode
(execProcnode.c:779)
==00:00:03:34.540 14381== Block was alloc'd at
==00:00:03:34.540 14381== at 0x4C2EB6B: malloc
(vg_replace_malloc.c:299)
==00:00:03:34.540 14381== by 0x99AED4: AllocSetAlloc (aset.c:866)
==00:00:03:34.540 14381== by 0x99DC7A: palloc (mcxt.c:904)
==00:00:03:34.540 14381== by 0x4798F6: heap_copy_minimal_tuple
(heaptuple.c:1417)
==00:00:03:34.540 14381== by 0x6878D7: ExecCopySlotMinimalTuple
(execTuples.c:593)
==00:00:03:34.540 14381== by 0x9AC1D2: copytup_heap
(tuplesort.c:3998)
==00:00:03:34.540 14381== by 0x9A616C: tuplesort_puttupleslot
(tuplesort.c:1345)
==00:00:03:34.540 14381== by 0x68D7D8: fetch_input_tuple
(nodeAgg.c:601)
==00:00:03:34.540 14381== by 0x68FEC1: agg_retrieve_direct
(nodeAgg.c:2168)
==00:00:03:34.540 14381== by 0x68F9AB: ExecAgg (nodeAgg.c:1903)
==00:00:03:34.540 14381== by 0x67B70A: ExecProcNode
(execProcnode.c:503)
==00:00:03:34.540 14381== by 0x6AD40F: SubqueryNext
(nodeSubqueryscan.c:53)
==00:00:03:34.540 14381==
{
<insert_a_suppression_name_here>
Memcheck:Addr8
fun:pfree
fun:heap_free_minimal_tuple
fun:ExecClearTuple
fun:ExecResetTupleTable
fun:ExecEndPlan
fun:standard_ExecutorEnd
fun:ExecutorEnd
fun:PortalCleanup
fun:PortalDrop
fun:exec_simple_query
fun:PostgresMain
fun:BackendRun
}
==00:00:03:34.548 14381== Invalid read of size 4
==00:00:03:34.548 14381== at 0x99E2AE: pfree (mcxt.c:1010)
==00:00:03:34.548 14381== by 0x4798D7: heap_free_minimal_tuple
(heaptuple.c:1403)
==00:00:03:34.548 14381== by 0x68753E: ExecClearTuple
(execTuples.c:455)
==00:00:03:34.548 14381== by 0x686EC5: ExecResetTupleTable
(execTuples.c:169)
==00:00:03:34.548 14381== by 0x67773B: ExecEndPlan (execMain.c:1469)
==00:00:03:34.548 14381== by 0x675C63: standard_ExecutorEnd
(execMain.c:468)
==00:00:03:34.548 14381== by 0x675BA1: ExecutorEnd (execMain.c:439)
==00:00:03:34.548 14381== by 0x62CA58: PortalCleanup
(portalcmds.c:280)
==00:00:03:34.548 14381== by 0x99F412: PortalDrop (portalmem.c:510)
==00:00:03:34.548 14381== by 0x8142A4: exec_simple_query
(postgres.c:1095)
==00:00:03:34.548 14381== by 0x818455: PostgresMain
(postgres.c:4072)
==00:00:03:34.548 14381== by 0x78CF63: BackendRun
(postmaster.c:4342)
==00:00:03:34.548 14381== Address 0x7f7f7f7f7f7f7f7f is not stack'd,
malloc'd or (recently) free'd
==00:00:03:34.548 14381==
{
<insert_a_suppression_name_here>
Memcheck:Addr4
fun:pfree
fun:heap_free_minimal_tuple
fun:ExecClearTuple
fun:ExecResetTupleTable
fun:ExecEndPlan
fun:standard_ExecutorEnd
fun:ExecutorEnd
fun:PortalCleanup
fun:PortalDrop
fun:exec_simple_query
fun:PostgresMain
fun:BackendRun
}
==00:00:03:34.548 14381==
==00:00:03:34.548 14381== Process terminating with default action of
signal 11 (SIGSEGV): dumping core
==00:00:03:34.548 14381== General Protection Fault
==00:00:03:34.548 14381== at 0x99E2AE: pfree (mcxt.c:1010)
==00:00:03:34.548 14381== by 0x4798D7: heap_free_minimal_tuple
(heaptuple.c:1403)
==00:00:03:34.548 14381== by 0x68753E: ExecClearTuple
(execTuples.c:455)
==00:00:03:34.548 14381== by 0x686EC5: ExecResetTupleTable
(execTuples.c:169)
==00:00:03:34.548 14381== by 0x67773B: ExecEndPlan (execMain.c:1469)
==00:00:03:34.548 14381== by 0x675C63: standard_ExecutorEnd
(execMain.c:468)
==00:00:03:34.548 14381== by 0x675BA1: ExecutorEnd (execMain.c:439)
==00:00:03:34.548 14381== by 0x62CA58: PortalCleanup
(portalcmds.c:280)
==00:00:03:34.548 14381== by 0x99F412: PortalDrop (portalmem.c:510)
==00:00:03:34.548 14381== by 0x8142A4: exec_simple_query
(postgres.c:1095)
==00:00:03:34.548 14381== by 0x818455: PostgresMain
(postgres.c:4072)
==00:00:03:34.548 14381== by 0x78CF63: BackendRun
(postmaster.c:4342)
==00:00:03:35.088 14381==
==00:00:03:35.088 14381== Process terminating with default action of
signal 11 (SIGSEGV)
==00:00:03:35.088 14381== General Protection Fault
==00:00:03:35.088 14381== at 0x632016C: _dl_catch_error (in
/usr/lib64/libc-2.25.so)
==00:00:03:35.088 14381== by 0x631F8E6: __libc_dlclose (in
/usr/lib64/libc-2.25.so)
==00:00:03:35.088 14381== by 0x634B5E4: free_mem (in
/usr/lib64/libc-2.25.so)
==00:00:03:35.088 14381== by 0x634B1E1: __libc_freeres (in
/usr/lib64/libc-2.25.so)
==00:00:03:35.088 14381== by 0x4A296DB: _vgnU_freeres
(vg_preloaded.c:77)
==00:00:03:35.088 14381== by 0x18118EA7: ???
==00:00:03:35.088 14381== by 0x4798D7: heap_free_minimal_tuple
(heaptuple.c:1403)
==00:00:03:35.088 14381== by 0x68753E: ExecClearTuple
(execTuples.c:455)
==00:00:03:35.088 14381== by 0x686EC5: ExecResetTupleTable
(execTuples.c:169)
==00:00:03:35.088 14381== by 0x67773B: ExecEndPlan (execMain.c:1469)
==00:00:03:35.088 14381== by 0x675C63: standard_ExecutorEnd
(execMain.c:468)
==00:00:03:35.088 14381== by 0x675BA1: ExecutorEnd (execMain.c:439)
==00:00:03:35.088 14381==
==00:00:03:35.088 14381== HEAP SUMMARY:
==00:00:03:35.088 14381== in use at exit: 6,833,625 bytes in 531
blocks
==00:00:03:35.088 14381== total heap usage: 814,111 allocs, 1,264
frees, 118,761,978 bytes allocated
==00:00:03:35.088 14381==
==00:00:03:35.088 14381== For a detailed leak analysis, rerun with: --
leak-check=full
==00:00:03:35.088 14381==
==00:00:03:35.088 14381== For counts of detected and suppressed errors,
rerun with: -v
==00:00:03:35.088 14381== ERROR SUMMARY: 2 errors from 2 contexts
(suppressed: 35 from 6)

It seems the backend tries to free a minimal tuple at executor end
which is already gone, having been freed earlier when the memory
context it was allocated in was deleted. ExecResetTupleTable() loops
through a list with 70 entries; after six or seven rounds it
encounters the first tuple slot with tts_shouldFreeMin set (none of
the slots before it have it set):

(gdb) p *slot
$6 = {type = T_TupleTableSlot, tts_isempty = 0 '\000', tts_shouldFree =
0 '\000', tts_shouldFreeMin = 1 '\001', tts_slow = 1 '\001',
tts_tuple = 0x10a6c48, tts_tupleDescriptor = 0x10d1be8, tts_mcxt =
0xf59668, tts_buffer = 0, tts_nvalid = 6, tts_values = 0x10d2640,
tts_isnull = 0x10d26d8 "", tts_mintuple = 0x171c168, tts_minhdr =
{t_len = 162, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0},
ip_posid = 0}, t_tableOid = 0, t_data = 0x171c160}, tts_off = 88}
(gdb) p list_length(tupleTable)

tts_mintuple is invalid, though:

(gdb) p *slot->tts_mintuple
Cannot access memory at address 0x171c168

The following attempt to clean it in ExecClearTuple() then crashes the
backend.
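The failure mode can be illustrated with a minimal C sketch. This is not PostgreSQL code: `Slot`, `fake_tuplesort_end`, and `fake_exec_clear_tuple` are invented stand-ins for TupleTableSlot, tuplesort_end(), and ExecClearTuple(), meant only to show how two independent "I own this tuple" mechanisms lead to a double free.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-in for a TupleTableSlot: a flag claiming ownership
 * of a tuple that actually lives in the sort's memory context. */
typedef struct Slot
{
    bool  should_free_min;  /* analogue of tts_shouldFreeMin */
    void *min_tuple;        /* analogue of tts_mintuple */
    bool  context_deleted;  /* bookkeeping for this sketch only */
} Slot;

/* Analogue of tuplesort_end(): deleting the sort's memory context
 * releases the tuple, but nothing resets the slot's ownership flag. */
static void
fake_tuplesort_end(Slot *slot)
{
    free(slot->min_tuple);
    slot->context_deleted = true;
}

/* Analogue of ExecClearTuple(): it trusts the flag.  The sketch adds
 * a guard so it can report the conflict instead of double-freeing. */
static bool
fake_exec_clear_tuple(Slot *slot)
{
    if (slot->should_free_min && slot->context_deleted)
        return false;           /* would pfree() already-freed memory */
    if (slot->should_free_min)
        free(slot->min_tuple);
    slot->min_tuple = NULL;
    return true;
}
```

Calling fake_tuplesort_end() and then fake_exec_clear_tuple() on the same slot reproduces the conflict: both the slot's flag and the memory-context teardown believed they owned the tuple.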

Bernd

#2 Michael Paquier
michael@paquier.xyz
In reply to: Bernd Helmle (#1)
Re: PostgreSQL crashes with SIGSEGV

On Fri, Dec 8, 2017 at 12:47 AM, Bernd Helmle <mailings@oopsware.de> wrote:

A customer recently reported a crash in a postgres backend. The backend
encountered a SIGSEGV, crashing during SELECTs from a fairly
complicated view using a grouping set directive. I've managed to
reproduce it by tracking it down to a specific SELECT, but
unfortunately couldn't yet manage to strip it down to a small,
repeatable test case which doesn't involve the whole (sensitive)
dataset. I'm reporting my findings so far, maybe it helps to track it
down already.

Hmm. Even if you cannot reproduce an isolated test case, could it be
possible to get an idea of the shape of the SELECT query involved, and
of the schema plus the view? No need for sensitive data here. This
would help in reproducing a test case. Also, what are the data sizes
involved? Would even a small data set with low work_mem trigger the
problem?

I've tested this so far against very current REL9_6_STABLE and
REL9_5_STABLE and got them to crash with the same backtrace. The crash
is dependent on the chosen plan, experiments with work_mem show that
the crash seems to happen only if you get external sorts into the
execution plan.
REL10_STABLE seems not affected, as my extracted application query
doesn't crash there.

That's one thing to begin with. So HEAD is not affected either?
--
Michael

#3 Peter Geoghegan
pg@bowt.ie
In reply to: Bernd Helmle (#1)
Re: PostgreSQL crashes with SIGSEGV

On Thu, Dec 7, 2017 at 7:47 AM, Bernd Helmle <mailings@oopsware.de> wrote:

I've tested this so far against very current REL9_6_STABLE and
REL9_5_STABLE and got them to crash with the same backtrace. The crash
is dependent on the chosen plan, experiments with work_mem show that
the crash seems to happen only if you get external sorts into the
execution plan.

REL10_STABLE seems not affected, as my extracted application query
doesn't crash there.

Does "set replacement_sort_tuples = 0" change anything on
REL9_6_STABLE? If you only get a crash when there is very little
work_mem, then that might be a good avenue of investigation. Note that
my changes to external sorting started in REL9_6_STABLE, so they can't
be involved here.

Are you aware of commit 512f67c8d02cc558f9c269cc848b0f0f788c4fe1,
which fixed a bug affecting external sorts? Are you sure that you have
that fix on REL9_5_STABLE + REL9_6_STABLE?

--
Peter Geoghegan

#4 Peter Geoghegan
pg@bowt.ie
In reply to: Bernd Helmle (#1)
Re: PostgreSQL crashes with SIGSEGV

On Thu, Dec 7, 2017 at 7:47 AM, Bernd Helmle <mailings@oopsware.de> wrote:

It seems the backend tries to free a minimal tuple at executor end,
which is already gone by deleting the memory context it was allocated
in before. ExecResetTupleTable() is looping through a list with 70
entries, it encounters (after 6 or seven rounds) the first tuple slot
with tts_shouldFreeMin set, all others before don't have it set:

On second thought, it seems more likely that the reason that
REL10_STABLE is unaffected is commit
3856cf9607f41245ec9462519c53f1109e781fc5. As of that commit (which
built on earlier v10 work) there is no question about memory for
tuples retrieved via tuplesort_gettupleslot() not belonging to caller
-- it must belong to caller. The old interface already resulted in
bugs in early 9.6 point releases that looked similar to this one, so I
was glad to remove it. (Note also that this picture was slightly
complicated by the performance optimization commit
fa117ee40330db401da776e7b003f047098a7d4c that followed, which made
some callers opt out of copying when that was clearly safe, but that
probably isn't important.)

So, as you said, the question that we probably need to answer is: just
how did the grouping sets/nodeAgg.c code end up getting tuple memory
lifetime wrong? One good way to get more information is to rerun
Valgrind, but this time with track origins enabled. I myself run
Valgrind like this when I want to see the origin of memory involved in
an error. I specify:

$ valgrind --leak-check=no --gen-suppressions=all --trace-children=yes
--track-origins=yes --read-var-info=yes
--suppressions=/home/pg/postgresql/root/source/src/tools/valgrind.supp
-v postgres --log_line_prefix="%m %p " --log_statement=all
--shared_buffers=64MB 2>&1 | tee postmaster.log

(Probably the only change that you'll need to make is to run
Valgrind with the extra "--track-origins=yes".)

--track-origins=yes is usually something I use when I already know
that Valgrind will complain, but want more information about the
nature of the problem.

--
Peter Geoghegan

#5 Bernd Helmle
mailings@oopsware.de
In reply to: Peter Geoghegan (#3)
Re: PostgreSQL crashes with SIGSEGV

On Thursday, 2017-12-07, at 18:23 -0800, Peter Geoghegan wrote:

Does "set replacement_sort_tuples = 0" change anything on
REL9_6_STABLE? If you only get a crash when there is very little
work_mem, then that might be a good avenue of investigation. Note that
my changes to external sorting started in REL9_6_STABLE, so they can't
be involved here.

replacement_sort_tuples = 0 changes the picture, indeed. With that
setting, the query runs without problems in REL9_6_STABLE.

Are you aware of commit 512f67c8d02cc558f9c269cc848b0f0f788c4fe1,
which fixed a bug affecting external sorts? Are you sure that you have
that fix on REL9_5_STABLE + REL9_6_STABLE?

My test instances are built against a fresh pull from yesterday.

#6 Bernd Helmle
mailings@oopsware.de
In reply to: Peter Geoghegan (#4)
Re: PostgreSQL crashes with SIGSEGV

On Thursday, 2017-12-07, at 18:54 -0800, Peter Geoghegan wrote:

So, as you said, the question that we probably need to answer is: just
how did the grouping sets/nodeAgg.c code end up getting tuple memory
lifetime wrong? One good way to get more information is to rerun
Valgrind, but this time with track origins enabled. I myself run
Valgrind like this when I want to see the origin of memory involved in
an error. I specify:

$ valgrind --leak-check=no --gen-suppressions=all --trace-children=yes
--track-origins=yes --read-var-info=yes
--suppressions=/home/pg/postgresql/root/source/src/tools/valgrind.supp
-v postgres --log_line_prefix="%m %p " --log_statement=all
--shared_buffers=64MB 2>&1 | tee postmaster.log

(Probably the only change that you'll need to make is to run
Valgrind with the extra "--track-origins=yes".)

--track-origins=yes is usually something I use when I already know
that Valgrind will complain, but want more information about the
nature of the problem.

That's what I'd already done. My valgrind invocation was this:

valgrind --leak-check=no --gen-suppressions=all \
--track-origins=yes --suppressions=src/tools/valgrind.supp \
--time-stamp=yes --trace-children=yes postgres

#7 Michael Paquier
michael@paquier.xyz
In reply to: Bernd Helmle (#6)
Re: PostgreSQL crashes with SIGSEGV

On Fri, Dec 8, 2017 at 6:19 PM, Bernd Helmle <mailings@oopsware.de> wrote:

valgrind --leak-check=no --gen-suppressions=all \
--track-origins=yes --suppressions=src/tools/valgrind.supp \
--time-stamp=yes --trace-children=yes postgres

I have been trying for a couple of hours to reproduce a failure using
views with grouping sets, enforcing an external sort with something
like this:
create table aa (a1 int, a2 int, a3 int, a4 int, a5 int, a6 int);
insert into aa SELECT (random() * 2000000000)::int,
(random() * 2000000000)::int,
(random() * 2000000000)::int,
(random() * 2000000000)::int,
(random() * 2000000000)::int,
(random() * 2000000000)::int FROM generate_series(1,1000000);
set work_mem = '128kB';
create or replace view aav as SELECT a1, a2, a3, a4, a5, avg(a6)
FROM aa GROUP BY GROUPING SETS (a1, a2, a3, a4, a5);
explain analyze select * from aav order by a1, a2, a3, a4, a5;

Not sure if you can provide this information, but what does the plan
of the query look like?
--
Michael

#8 Andreas Seltenreich
seltenreich@gmx.de
In reply to: Michael Paquier (#7)
Re: PostgreSQL crashes with SIGSEGV

Michael Paquier writes:

Not sure if you can provide this information, but what does the plan
of the query look like?

We did some more reducing work on the original query and data. The
following testcase reproduces the double free reported by valgrind for
me when run against a vanilla REL9_6_STABLE build.

regards,
Andreas

--8<---------------cut here---------------start------------->8---
drop table if exists bug;
create table bug (n text, v text, b text, t text);
insert into bug
select i%9, i%9, i%16 ,i%4096 from generate_series(1,100000) g(i);
analyze bug;

explain select * from (
select thecube.nv
from ( select
(n || ' ') || coalesce(v, '') as nv
from bug
group by ((n || ' ') || coalesce(v, '')) ,cube(b, t)
) thecube
where nv = '8 8'
) sub limit 7000;
--8<---------------cut here---------------end--------------->8---

QUERY PLAN
---------------------------------------------------------------------------------------------------
Limit (cost=13422.32..15747.28 rows=7000 width=32)
-> Subquery Scan on thecube (cost=13422.32..27622.49 rows=42754 width=32)
-> GroupAggregate (cost=13422.32..27194.95 rows=42754 width=38)
Group Key: (((bug.n || ' '::text) || COALESCE(bug.v, ''::text))), bug.b, bug.t
Group Key: (((bug.n || ' '::text) || COALESCE(bug.v, ''::text))), bug.b
Group Key: (((bug.n || ' '::text) || COALESCE(bug.v, ''::text)))
Sort Key: (((bug.n || ' '::text) || COALESCE(bug.v, ''::text))), bug.t
Group Key: (((bug.n || ' '::text) || COALESCE(bug.v, ''::text))), bug.t
Filter: ((((bug.n || ' '::text) || COALESCE(bug.v, ''::text))) = '8 8'::text)
-> Sort (cost=13422.32..13672.32 rows=100000 width=42)
Sort Key: (((bug.n || ' '::text) || COALESCE(bug.v, ''::text))), bug.b, bug.t
-> Seq Scan on bug (cost=0.00..2041.00 rows=100000 width=42)

#9 Peter Geoghegan
pg@bowt.ie
In reply to: Andreas Seltenreich (#8)
Re: PostgreSQL crashes with SIGSEGV

On Thu, Dec 14, 2017 at 10:47 AM, Andreas Seltenreich
<andreas.seltenreich@credativ.de> wrote:

We did some more reducing work on the original query and data. The
following testcase reproduces the double free reported by valgrind for
me when run against a vanilla REL9_6_STABLE build.

--8<---------------cut here---------------start------------->8---
drop table if exists bug;
create table bug (n text, v text, b text, t text);
insert into bug
select i%9, i%9, i%16 ,i%4096 from generate_series(1,100000) g(i);
analyze bug;

explain select * from (
select thecube.nv
from ( select
(n || ' ') || coalesce(v, '') as nv
from bug
group by ((n || ' ') || coalesce(v, '')) ,cube(b, t)
) thecube
where nv = '8 8'
) sub limit 7000;
--8<---------------cut here---------------end--------------->8---

I can reproduce this against REL9_6_STABLE, once work_mem is set to
4MB, and replacement_sort_tuples is set to 150000.

--
Peter Geoghegan

#10 Peter Geoghegan
pg@bowt.ie
In reply to: Peter Geoghegan (#9)
Re: PostgreSQL crashes with SIGSEGV

On Thu, Dec 14, 2017 at 12:06 PM, Peter Geoghegan <pg@bowt.ie> wrote:

I can reproduce this against REL9_6_STABLE, once work_mem is set to
4MB, and replacement_sort_tuples is set to 150000.

I've figured this one out. It's a double free. Consider this code:

(Although REL9_5_STABLE is also affected, the following remarks
reference REL9_6_STABLE-only code.)

* The slot receives a copied tuple (sometimes allocated in caller memory
* context) that will stay valid regardless of future manipulations of the
* tuplesort's state.
*/
bool
tuplesort_gettupleslot(/* SNIP */)

I wrote that comment block, in commit a5f0bd77a, an early bugfix for
9.6 (I actually mentioned this specific commit to Bernd in passing
already). This bug is very similar to the one fixed by a5f0bd77a, and
is arguably a broader version of that same bug. Note that I didn't do
anything with external sorts until 9.6, so I expect that this affects
all stable branches. That said, it may not be possible to produce a
hard crash on every version, more or less by accident.

The function opens with this code:

MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
SortTuple stup;
bool should_free;

if (!tuplesort_gettuple_common(state, forward, &stup, &should_free))
stup.tuple = NULL;

MemoryContextSwitchTo(oldcontext);
/* SNIP */
...

As long as the slot receives a copied tuple that may or may not be in
the caller's memory context (it may instead get allocated in a
tuplesort-private memory context), and as long as destroying a
tuplesort via tuplesort_end() counts as a "future manipulation", then
it's possible to crash with a double-pfree() while not violating
the tuplesort_gettupleslot() contract. The memory may or may not be
freed as part of tuplesort_end() teardown, based only on accidental
implementation factors (that's the first time the memory gets freed
when we do crash). Specifically, if state->status == TSS_SORTEDONTAPE,
we risk this crash on versions before Postgres 10 (more on why 10
appears to be okay tomorrow), when tuplesort_end() just so happens to
have been called (destroying the 2 memory contexts) before the
executor's superfluous pfree() within ExecResetTupleTable().

The ambiguity about who owns the tuple memory, and when, is the basic
problem, once again. One could argue that it's the caller's fault for
not knowing not to pfree() the tuple after tuplesort_end() is called,
but I don't buy that, because it seems like an undue burden on callers
to do that. It seems okay to either provide a very weak, simple
guarantee about tuple memory lifetime, or a very strong, simple
guarantee about tuple memory lifetime. That's what routines like
tuplesort_gettupleslot() and tuplesort_getindextuple() should limit
their contracts to -- we've had enough problems in this area over the
years that that seems pretty clear to me.

ISTM that an appropriate fix is one that results in
tuplesort_gettupleslot() tuple memory being *consistently* allocated
within the caller's own memory context, at least on versions prior to
Postgres 10 (it's a bit more complicated in Postgres 10). I'll begin
work at a patch for this tomorrow. It's not obvious what approach to
take, because on 9.6+, memory is allocated in tuplecontext, not
sortcontext. A well-targeted fix will not burden low-level READTUP()
routines (that use tuplecontext) with direct knowledge of the problem.
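In outline, the fix direction described above is to hand back memory the caller unambiguously owns. A hypothetical C simplification (invented names and a fixed-size stand-in struct, not the actual patch): returning a borrowed pointer leaves ownership ambiguous, while an unconditional copy into caller-allocated memory does not.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a minimal tuple. */
typedef struct MinTuple
{
    size_t len;
    char   data[64];
} MinTuple;

/* Before the fix (conceptually): return a pointer that may live in a
 * tuplesort-private context, dying whenever tuplesort_end() runs. */
static MinTuple *
fetch_borrowed(MinTuple *sort_owned)
{
    return sort_owned;          /* ownership stays ambiguous */
}

/* Fix direction: always copy into memory the caller allocates (in
 * PostgreSQL terms, the caller's memory context), so the caller may
 * free it at any time, before or after tuplesort_end(). */
static MinTuple *
fetch_caller_copy(const MinTuple *sort_owned)
{
    MinTuple *copy = malloc(sizeof(MinTuple));

    memcpy(copy, sort_owned, sizeof(MinTuple));
    return copy;                /* unambiguously the caller's */
}
```

With fetch_caller_copy(), destroying the sort-owned original (the tuplesort_end() analogue) leaves the caller's copy valid, which is exactly the "strong, simple guarantee" contract.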

--
Peter Geoghegan

#11 Peter Geoghegan
pg@bowt.ie
In reply to: Peter Geoghegan (#10)
Re: PostgreSQL crashes with SIGSEGV

On Thu, Dec 14, 2017 at 5:58 PM, Peter Geoghegan <pg@bowt.ie> wrote:

The ambiguity about who owns the tuple memory when is the basic
problem, once again. One could argue that it's the caller's fault for
not knowing to not pfree() the tuple after tuplesort_end() is called,
but I don't buy that because it seems like an undue burden on callers
to do that. It seems okay to either provide a very weak, simple
guarantee about tuple memory lifetime, or very strong simple guarantee
about tuple memory lifetime. That's what routines like
tuplesort_gettupleslot() and tuplesort_getindextuple() should limit
their contracts to -- we've had enough problems in this area over the
years that that seems pretty clear to me.

ISTM that an appropriate fix is one that results in having
tuplesort_gettupleslot() tuple memory that is *consistently* allocated
within caller's own memory context, at least on versions prior to
Postgres 10 (it's a bit more complicated in Postgres 10).

This took longer than expected. Attached patch, which targets 9.6,
shows what I have in mind. I'm going on vacation shortly, but wanted
to post this patch before I leave. It could definitely use some more
testing, since it was written under time pressure. I expect to be back
early in the new year, so if someone feels like taking this off my
hands, they're more than welcome to do so.

Postgres v10 isn't affected because there is no such thing as
should_free from commit f1f5ec1ef onwards (commit fa117ee40
reintroduced a no-copy optimization a while later, which looks like it
made things unsafe again, but it's actually fine -- the
ExecClearTuple() pfree() won't happen, so no crash). Everything is
fine on v10 because the design was improved for v10. On v10,
tuplesort_gettupleslot() calls ExecStoreMinimalTuple() with arguments
that indicate that the memory still belongs to the tuplesort, unless
it was an explicit copy made with heap_copy_minimal_tuple() there and
then, in which case it belongs to the caller. This works because on
v10 we're always managing some temp buffers, regardless of
state->status.
There is no question of a routine like readtup_alloc() allocating or
using memory without knowing about its lifetime, because it's always
using the smallish slab allocator buffers.

To put it another way, in v10 everything is fine because callers to
routines like tuplesort_gettupleslot() either have very limited
guarantees about fetched tuple memory that is owned by tuplesort (it
can go away almost immediately, on the next call to
tuplesort_gettupleslot()), or infinitely flexible guarantees (it's
unambiguously caller's copy, and lives in caller's memory context).
There is no awkward mix of these two situations, as is the case in
earlier versions. That awkward mix is: "You (caller) should free this
memory, and can rely on this memory sticking around until
tuplesort_end() is called, at which point I (tuplesort) will free it
if you haven't -- though I might *not* free it then, based on whether
or not I felt like using your memory context!". This is an accident
waiting to happen. Remember, the crash isn't a use-after-free; it's a
double-free caused by conflated responsibilities across two low-level,
refcount-like mechanisms. The idea of a tts_shouldFree=true tuple
whose memory exists in a context that isn't under the executor's
direct control is a fundamentally bogus idea. It kinda works most of
the time, but only to the extent that the lifetime of a tuplesort is
accidentally tied to the lifetime of an executor node, without the
tuple/slot making it into something like estate->es_tupleTable (as was
the case in Andreas' example).

An alternative approach to fixing this issue, which I also want to
throw out there, is to perform a simple change within
ExecResetTupleTable() to make ExecClearTuple()'d slots force
"tts_shouldFree = false" -- a second patch shows what I mean here. All
ExecResetTupleTable() callers pass "shouldFree = false" anyway,
because all memory is about to go away (within FreeExecutorState()).
This second approach seems like a real kludge to me, but it's also a
fix that requires a tiny tweak, rather than non-trivial tuplesort.c
memory management changes. I would feel worse about the idea of
applying this kludge if we had to rely on it forever, but since v10 is
unaffected anyway, we really don't. But it's still an awful kludge. I
find it hard to know what to do here (the vacation is overdue, it
seems).
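The kludge approach can be sketched in the same spirit (again with simplified, invented stand-ins rather than the real executor code): on wholesale teardown, drop the ownership flags instead of freeing, since the owning contexts are about to be destroyed anyway.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a TupleTableSlot. */
typedef struct SlotK
{
    bool  should_free;      /* analogue of tts_shouldFree */
    bool  should_free_min;  /* analogue of tts_shouldFreeMin */
    void *tuple;
} SlotK;

/* Analogue of the tweaked ExecResetTupleTable(): since every caller
 * passes shouldFree = false and all the memory dies shortly with the
 * executor's contexts, force the flags off so that clearing a slot
 * never attempts a (possibly second) pfree(). */
static void
reset_tuple_table(SlotK *slots, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
    {
        slots[i].should_free = false;
        slots[i].should_free_min = false;
        slots[i].tuple = NULL;  /* clear without freeing */
    }
}
```

This sidesteps the double free at the cost of papering over the ownership ambiguity, which matches the "awful kludge" characterization above.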

(Note that v10 does have a minor, related bug, noted in commit
messages of both patches.)

--
Peter Geoghegan

Attachments:

WIP-kludge-fix.patch (text/x-patch, US-ASCII) +21 −26
WIP-tuplesort-memcontext-fix.patch (text/x-patch, US-ASCII) +124 −68
#12 Michael Paquier
michael@paquier.xyz
In reply to: Peter Geoghegan (#11)
Re: PostgreSQL crashes with SIGSEGV

On Sun, Dec 17, 2017 at 10:40 AM, Peter Geoghegan <pg@bowt.ie> wrote:

On Thu, Dec 14, 2017 at 5:58 PM, Peter Geoghegan <pg@bowt.ie> wrote:

The ambiguity about who owns the tuple memory when is the basic
problem, once again. One could argue that it's the caller's fault for
not knowing to not pfree() the tuple after tuplesort_end() is called,
but I don't buy that because it seems like an undue burden on callers
to do that. It seems okay to either provide a very weak, simple
guarantee about tuple memory lifetime, or very strong simple guarantee
about tuple memory lifetime. That's what routines like
tuplesort_gettupleslot() and tuplesort_getindextuple() should limit
their contracts to -- we've had enough problems in this area over the
years that that seems pretty clear to me.

ISTM that an appropriate fix is one that results in having
tuplesort_gettupleslot() tuple memory that is *consistently* allocated
within caller's own memory context, at least on versions prior to
Postgres 10 (it's a bit more complicated in Postgres 10).

This took longer than expected. Attached patch, which targets 9.6,
shows what I have in mind. I'm going on vacation shortly, but wanted
to post this patch before I leave. It could definitely use some more
testing, since it was written under time pressure. I expect to be back
early in the new year, so if someone feels like taking this off my
hands, they're more than welcome to do so.

Could you add that to the next commitfest, registered as a bug fix? We
don't want to lose track of it, and it should be reviewed.
--
Michael

#13 Peter Geoghegan
pg@bowt.ie
In reply to: Michael Paquier (#12)
hackers, bugs
Re: PostgreSQL crashes with SIGSEGV

On Sun, Dec 17, 2017 at 2:46 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Could you add that to the next commitfest, registered as a bug fix? We
don't want to lose track of it, and it should be reviewed.

Done.

--
Peter Geoghegan

#14 Aleksandr Parfenov
a.parfenov@postgrespro.ru
In reply to: Peter Geoghegan (#13)
hackers, bugs
Re: PostgreSQL crashes with SIGSEGV

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

Hi,

All information below relates to WIP-tuplesort-memcontext-fix.patch.
The patch applies and fixes the bug on REL9_6_STABLE; LGTM.

However, REL9_5_STABLE is affected as well, and the patch is not
applicable there. That may deserve a separate entry, since the
differences in tuplesort on 9.5 may require changes to the fix.

The new status of this patch is: Ready for Committer

#15 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Aleksandr Parfenov (#14)
hackers, bugs
Re: PostgreSQL crashes with SIGSEGV

Aleksandr Parfenov <a.parfenov@postgrespro.ru> writes:

The new status of this patch is: Ready for Committer

I don't feel particularly comfortable committing a patch that
was clearly labeled as a rushed draft by its author.
Peter, where do you stand on this work?

In a quick look at the patches, WIP-kludge-fix.patch seems clearly
unacceptable for back-patching because it changes the signature and
behavior of ExecResetTupleTable, which external code might well be using.

regards, tom lane

#16 Peter Geoghegan
pg@bowt.ie
In reply to: Tom Lane (#15)
hackers, bugs
Re: PostgreSQL crashes with SIGSEGV

On Wed, Jan 17, 2018 at 11:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Aleksandr Parfenov <a.parfenov@postgrespro.ru> writes:

The new status of this patch is: Ready for Committer

I don't feel particularly comfortable committing a patch that
was clearly labeled as a rushed draft by its author.
Peter, where do you stand on this work?

I would like to take another pass over
WIP-tuplesort-memcontext-fix.patch, to be on the safe side. I'm
currently up to my neck in parallel CREATE INDEX work, though, and
would prefer to avoid context switching for a week or two, if
possible. How time sensitive do you think this is?

I think we'll end up doing this:

* Committing the minimal modifications made (in both WIP patches) to
tuplesort_getdatum() to both v10 and master branches.
tuplesort_getdatum() must follow the example of
tuplesort_gettupleslot() on these branches, since
tuplesort_gettupleslot() already manages to get everything right in
recent releases. (There is no known tuplesort_getdatum() crash on
these versions, but this still seems necessary on general principle.)

* Committing something pretty close to
WIP-tuplesort-memcontext-fix.patch to 9.6.

* Committing another fix to 9.5. This fix will apply the same
principles as WIP-tuplesort-memcontext-fix.patch, but will be simpler
mechanically, since the whole batch memory mechanism added to
tuplesort.c for 9.6 isn't involved.

I'm not sure whether or not we should also apply this
still-to-be-written 9.5 patch to 9.4 and 9.3, since those versions
don't have grouping sets, and so cannot crash. ISTM that we should
leave them alone, since tuplesort has had this problem forever.

Perhaps you should go ahead and commit a patch with just the changes
to tuplesort_getdatum() now. This part seems low risk, and worth doing
in a single, (almost) consistent pass over the back branches. This
part is a trivial backpatch, and could be thought of as an independent
problem. (It will be nice to get v10 and master branches completely
out of the way quickly.)

In a quick look at the patches, WIP-kludge-fix.patch seems clearly
unacceptable for back-patching because it changes the signature and
behavior of ExecResetTupleTable, which external code might well be using.

The signature change isn't required, and it was silly of me to add it.
But I also really dislike the general approach it takes, and mostly
posted it because I thought that it was a useful counterpoint.

--
Peter Geoghegan

#17 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#16)
hackers, bugs
Re: PostgreSQL crashes with SIGSEGV

Peter Geoghegan <pg@bowt.ie> writes:

On Wed, Jan 17, 2018 at 11:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't feel particularly comfortable committing a patch that
was clearly labeled as a rushed draft by its author.
Peter, where do you stand on this work?

I would like to take another pass over
WIP-tuplesort-memcontext-fix.patch, to be on the safe side. I'm
currently up to my neck in parallel CREATE INDEX work, though, and
would prefer to avoid context switching for a week or two, if
possible. How time sensitive do you think this is?

Probably not very. It'd be nice to have it done by the next minor
releases, ie before 5-Feb ... but given that these bugs are years
old, missing that deadline would not be catastrophic.

I'm not sure whether or not we should also apply this
still-to-be-written 9.5 patch to 9.4 and 9.3, since those versions
don't have grouping sets, and so cannot crash. ISTM that we should
leave them alone, since tuplesort has had this problem forever.

+1. If the problem isn't known to be reproducible in those branches,
the risk of adding new bugs seems to outweigh any benefit.

regards, tom lane

#18 Peter Geoghegan
pg@bowt.ie
In reply to: Tom Lane (#17)
hackers, bugs
Re: PostgreSQL crashes with SIGSEGV

On Wed, Jan 17, 2018 at 12:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Probably not very. It'd be nice to have it done by the next minor
releases, ie before 5-Feb ... but given that these bugs are years
old, missing that deadline would not be catastrophic.

Got it.

I'm not sure whether or not we should also apply this
still-to-be-written 9.5 patch to 9.4 and 9.3, since those versions
don't have grouping sets, and so cannot crash. ISTM that we should
leave them alone, since tuplesort has had this problem forever.

+1. If the problem isn't known to be reproducible in those branches,
the risk of adding new bugs seems to outweigh any benefit.

You could make the same objection to changing tuplesort_getdatum()
outside of the master branch, though. I think that going back further
than that for the (arguably independent) tuplesort_getdatum() subset
fix might still be a good idea. I wonder where you stand on this.

--
Peter Geoghegan

#19 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#18)
hackers, bugs
Re: PostgreSQL crashes with SIGSEGV

Peter Geoghegan <pg@bowt.ie> writes:

On Wed, Jan 17, 2018 at 12:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

+1. If the problem isn't known to be reproducible in those branches,
the risk of adding new bugs seems to outweigh any benefit.

You could make the same objection to changing tuplesort_getdatum()
outside of the master branch, though. I think that going back further
than that for the (arguably independent) tuplesort_getdatum() subset
fix might still be a good idea. I wonder where you stand on this.

I haven't been following the thread very closely, so I don't have an
opinion on that.

regards, tom lane

#20 Peter Geoghegan
pg@bowt.ie
In reply to: Tom Lane (#19)
hackers, bugs
Re: PostgreSQL crashes with SIGSEGV

On Wed, Jan 17, 2018 at 1:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You could make the same objection to changing tuplesort_getdatum()
outside of the master branch, though. I think that going back further
than that for the (arguably independent) tuplesort_getdatum() subset
fix might still be a good idea. I wonder where you stand on this.

I haven't been following the thread very closely, so I don't have an
opinion on that.

A complicating factor for this fix of mine is that mode_final() seems
to have its own ideas about tuple memory lifetime, over and above what
tuplesort_getdatum() explicitly promises, as can be seen here:

/*
* Note: we *cannot* clean up the tuplesort object here, because the value
* to be returned is allocated inside its sortcontext. We could use
* datumCopy to copy it out of there, but it doesn't seem worth the
* trouble, since the cleanup callback will clear the tuplesort later.
*/

My WIP-tuplesort-memcontext-fix.patch fix is premised on the idea that
nodeAgg.c/grouping sets got it right: nodeAgg.c should be able to
continue to assume that in "owning" the memory used for a tuple (in a
tuple table slot), it has it in its own memory context -- otherwise, the
whole tts_shouldFree tuple slot mechanism is prone to double-frees.
This comment directly contradicts/undermines that premise.

ISTM that either grouping sets or mode_final() must necessarily be
wrong, because each oversteps, inferring a different contract from
tuplesort's tuple-fetching routines (each makes different assumptions
about memory contexts). Only one can be right, unless it's
okay to have one rule for tuplesort_getdatum() and another for
tuplesort_gettupleslot() (which seems questionable to me). I still
think that grouping sets is right (and that mode_final() is wrong). Do
you?

--
Peter Geoghegan

[Messages #21-#33 appear only as collapsed headers in this archive view; their bodies were not captured. The surviving thread structure is:]

#21 Peter Geoghegan, in reply to Peter Geoghegan (#20)
#22 Craig Ringer <craig@2ndquadrant.com>, in reply to Tom Lane (#15)
#23 (author not captured), in reply to Craig Ringer (#22)
#24 Tom Lane <tgl@sss.pgh.pa.us>, in reply to Peter Geoghegan (#21)
#25 Peter Geoghegan, in reply to Tom Lane (#24)
#26 Peter Geoghegan, in reply to Peter Geoghegan (#25)
#27 Peter Geoghegan, in reply to Peter Geoghegan (#26)
#28 Kyotaro Horiguchi <horikyota.ntt@gmail.com>, in reply to Peter Geoghegan (#27)
#29 Peter Geoghegan, in reply to Kyotaro Horiguchi (#28)
#30 Kyotaro Horiguchi <horikyota.ntt@gmail.com>, in reply to Peter Geoghegan (#29)
#31 (author not captured), in reply to Kyotaro Horiguchi (#30)
#32 Tom Lane <tgl@sss.pgh.pa.us>, in reply to Peter Geoghegan (#29)
#33 (author not captured), in reply to Tom Lane (#32)