BUG #14344: string_agg(DISTINCT ..) crash
The following bug has been logged on the website:
Bug reference: 14344
Logged by: Regina Obe
Email address: lr@pcorp.us
PostgreSQL version: 9.6rc1
Operating system: Windows 2012 and Windows 7 64-bit
Description:
Let me know if you need more info or if this has already been fixed.
I was testing under
PostgreSQL 9.6rc1, compiled by Visual C++ build 1800, 64-bit
And was able to trigger a crash with this:
-- create test table
CREATE TABLE string_agg_crash_test AS
SELECT i AS id, states[mod(i,6) + 1] AS state, 'junk' || chr(mod(i, 26) + 1)
AS junk
FROM generate_series(1,100000) AS i, (SELECT
'{NULL,MA,HI,IL,KS,NY,NJ}'::varchar[] AS states) AS s ;
-- try to summarize crashes with parallel or not
-- so doesn't seem to be parallel worker related.
set max_parallel_workers_per_gather = 0;
SELECT string_agg(DISTINCT state, ',' ORDER BY state)
FROM string_agg_crash_test;
-- it must have something to do with the DISTINCT, because
-- this doesn't crash
set max_parallel_workers_per_gather = 0;
SELECT string_agg(state, ',' ORDER BY state)
FROM string_agg_crash_test;
The logs don't give me any useful info just
WARNING: terminating connection because of crash of another server
process
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
lr@pcorp.us writes:
Let me know if you need more info or if this has already been fixed.
Hm, works OK for me:
regression=# SELECT string_agg(DISTINCT state, ',' ORDER BY state)
FROM string_agg_crash_test;
string_agg
-------------------
HI,IL,KS,MA,MA,NY
(1 row)
But I see no post-rc1 fixes that look like they might explain this.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Wed, Sep 28, 2016 at 9:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
lr@pcorp.us writes:
Let me know if you need more info or if this has already been fixed.
regression=# SELECT string_agg(DISTINCT state, ',' ORDER BY state)
FROM string_agg_crash_test;
string_agg
-------------------
HI,IL,KS,MA,MA,NY
(1 row)But I see no post-rc1 fixes that look like they might explain this.
No problems on OSX or Linux, but I am getting a crash on Windows
(Win7, MSVC 2010) even with latest REL9_6_STABLE. Oops.
--
Michael
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
lr@pcorp.us writes:
Let me know if you need more info or if this has already been fixed.
Hm, works OK for me:
regression=# SELECT string_agg(DISTINCT state, ',' ORDER BY state) FROM
string_agg_crash_test;
string_agg
-------------------
HI,IL,KS,MA,MA,NY
(1 row)
But I see no post-rc1 fixes that look like they might explain this.
regards, tom lane
I was able to get it to crash under my mingw64 9.6beta4ish build too. Gdb
looks like this:
Reading symbols from
C:\ming64gcc48\projects\postgresql\rel\pg9.6w64gcc48\bin\postgres.exe...done
.
0x000000007732ae11 in ntdll!DbgBreakPoint ()
from C:\Windows\SYSTEM32\ntdll.dll
(gdb) cont
Continuing.
[Thread 6944.0x61c exited with code 0]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 6944.0x2384]
toast_raw_datum_size (value=242751291) at tuptoaster.c:358
358 tuptoaster.c: No such file or directory.
(gdb) bt
#0 toast_raw_datum_size (value=242751291) at tuptoaster.c:358
#1 0x000000000078e616 in texteq (fcinfo=0x299e760) at varlena.c:1615
#2 0x00000000007bb045 in FunctionCall2Coll (flinfo=flinfo@entry=0x50ea400,
collation=collation@entry=0, arg1=arg1@entry=85010827,
arg2=<optimized out>) at fmgr.c:1319
#3 0x0000000000585548 in execTuplesMatch (slot1=slot1@entry=0x50e9e28,
slot2=slot2@entry=0x50ea248, numCols=numCols@entry=2,
matchColIdx=0x50ea350, eqfunctions=0x50ea3d0,
evalContext=evalContext@entry=0x38d550) at execGrouping.c:100
#4 0x00000000005984e5 in process_ordered_aggregate_multi (
pergroupstate=0x50e9980, pertrans=0x50e8968, aggstate=0x50aaaf0)
at nodeAgg.c:1244
#5 finalize_aggregates (aggstate=aggstate@entry=0x50aaaf0,
peraggs=peraggs@entry=0x50abfb0, pergroup=pergroup@entry=0x50e9980,
currentSet=currentSet@entry=0) at nodeAgg.c:1552
#6 0x0000000000598c7a in agg_retrieve_direct (aggstate=0x50aaaf0)
at nodeAgg.c:2157
#7 ExecAgg (node=node@entry=0x50aaaf0) at nodeAgg.c:1837
#8 0x000000000058b228 in ExecProcNode (node=node@entry=0x50aaaf0)
at execProcnode.c:503
#9 0x00000000005871c5 in ExecutePlan (dest=0x50e5838,
direction=<optimized out>, numberTuples=0, sendTuples=1 '\001',
operation=CMD_SELECT, use_parallel_mode=<optimized out>,
planstate=0x50aaaf0, estate=0x50aa9d8) at execMain.c:1567
#10 standard_ExecutorRun (queryDesc=0x5057288, direction=<optimized out>,
count=0) at execMain.c:338
#11 0x00000000006b1fe6 in PortalRunSelect (portal=portal@entry=0x509e978,
forward=forward@entry=1 '\001', count=0, count@entry=43643472,
dest=dest@entry=0x0) at pquery.c:948
#12 0x00000000006b3686 in PortalRun (portal=0x299ee90,
portal@entry=0x509e978, count=43643472, count@entry=2147483647,
isTopLevel=isTopLevel@entry=0 '\000', dest=0x0, dest@entry=0x50e5838,
altdest=altdest@entry=0x50e5838, completionTag=0x299f210 "",
completionTag@entry=0x4000000007b <error: Cannot access memory at
address 0x4000000007b>) at pquery.c:789
#13 0x00000000006b0e54 in exec_simple_query (
query_string=0x100000000000000 <error: Cannot access memory at address
0x100000000000000>) at postgres.c:1094
#14 PostgresMain (argc=<optimized out>, argv=argv@entry=0x2893e0,
dbname=0x18001700160015 <error: Cannot access memory at address
0x18001700160015>, username=<optimized out>) at postgres.c:4074
#15 0x0000000000647aad in BackendRun (port=0x299f400) at postmaster.c:4262
#16 SubPostmasterMain (argc=argc@entry=3, argv=argv@entry=0x2f7ea0)
at postmaster.c:4752
#17 0x0000000000803d68 in main (argc=3, argv=0x2f7ea0) at main.c:205
(gdb)
I'll compile against the latest 9.6 branch then and see if it crashes there
too. Might be windows specific perhaps.
Thanks,
Regina
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Wed, Sep 28, 2016 at 9:23 PM, Michael Paquier <michael.paquier@gmail.com>
wrote:
On Wed, Sep 28, 2016 at 9:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
lr@pcorp.us writes:
Let me know if you need more info or if this has already been fixed.
regression=# SELECT string_agg(DISTINCT state, ',' ORDER BY state)
FROM string_agg_crash_test;
string_agg
-------------------
HI,IL,KS,MA,MA,NY
(1 row)But I see no post-rc1 fixes that look like they might explain this.
No problems on OSX or Linux, but I am getting a crash on Windows
(Win7, MSVC 2010) even with latest REL9_6_STABLE. Oops.
After rolling in crashdumps/ and remembering some stuff about windbg:
- .sympath to point to symbols
- !analyze -v to load the backtrace:
00000000`0042eff0 00000001`40380293 : 000007fe`fda07a01 000007fe`fda07a11
00000000`ffff0000 00000000`000004e4 : postgres!toast_raw_datum_size+0x4
[c:\users\ioltas\git\postgres\src\backend\access\heap\tuptoaster.c @ 358]
00000000`0042f030 00000001`403ab6dc : 00000000`009f0000 00000000`00000002
00000000`008491f8 00000000`008f3caa : postgres!texteq+0x33
[c:\users\ioltas\git\postgres\src\backend\utils\adt\varlena.c @ 1616]
00000000`0042f060 00000001`4012a29e : 00000000`00000002 00000000`008491f8
00000000`008f3caa 00000000`00000001 : postgres!FunctionCall2Coll+0x5c
[c:\users\ioltas\git\postgres\src\backend\utils\fmgr\fmgr.c @ 1322]
00000000`0042f450 00000001`4013d05c : 00000000`00000002 00000000`01b33b30
00000000`008f3b00 00000000`008f3830 : postgres!execTuplesMatch+0xae
[c:\users\ioltas\git\postgres\src\backend\executor\execgrouping.c @ 101]
00000000`0042f4a0 00000001`4013d61c : 00000000`00895f00 00000000`00000000
00000000`01b359d8 00000000`00849308 :
postgres!process_ordered_aggregate_multi+0xfc
[c:\users\ioltas\git\postgres\src\backend\executor\nodeagg.c @ 1248]
00000000`0042f520 00000001`4013f081 : 00000000`00000001 00000000`00000000
00000000`01b33f98 00000000`00000000 : postgres!finalize_aggregates+0x9c
[c:\users\ioltas\git\postgres\src\backend\executor\nodeagg.c @ 1557]
00000000`0042f580 00000001`4013fe9a : 00000000`008ee800 00000000`00000003
00000000`01b33ee8 00000000`01b33f98 : postgres!agg_retrieve_direct+0x461
[c:\users\ioltas\git\postgres\src\backend\executor\nodeagg.c @ 2163]
00000000`0042f620 00000001`40130352 : 00000000`00000054 00000001`4029eaea
00000000`00000000 00000000`00000001 : postgres!ExecAgg+0x4a
[c:\users\ioltas\git\postgres\src\backend\executor\nodeagg.c @ 1838]
00000000`0042f650 00000001`4012c98d : 00000000`01b33a20 00000001`4000a42c
00000000`ffffffff 00000000`00000001 : postgres!ExecProcNode+0x202
[c:\users\ioltas\git\postgres\src\backend\executor\execprocnode.c @ 503]
00000000`0042f680 00000001`4012dc58 : 00000000`008d9b60 00000000`00000001
00000000`01b33a20 00000000`00000000 : postgres!ExecutePlan+0x7d
[c:\users\ioltas\git\postgres\src\backend\executor\execmain.c @ 1573]
00000000`0042f6d0 00000001`4029f659 : 00000000`008df510 00000000`00000000
00000000`008df510 00000001`403cb144 : postgres!standard_ExecutorRun+0xb8
[c:\users\ioltas\git\postgres\src\backend\executor\execmain.c @ 350]
00000000`0042f740 00000001`4029fe6a : 00000000`008df510 00000000`008df510
00000000`00843d50 00000000`0042f8b0 : postgres!PortalRunSelect+0x99
[c:\users\ioltas\git\postgres\src\backend\tcop\pquery.c @ 949]
00000000`0042f7a0 00000001`4029d723 : 00000000`00000000 00000000`00843d50
00000000`00000000 00000000`00000000 : postgres!PortalRun+0x1ca
[c:\users\ioltas\git\postgres\src\backend\tcop\pquery.c @ 796]
00000000`0042f980 00000001`4029e1fc : 00000000`00000051 00000000`00894e60
00000000`0042fb70 00000000`009f8220 : postgres!exec_simple_query+0x573
[c:\users\ioltas\git\postgres\src\backend\tcop\postgres.c @ 1101]
00000000`0042fa70 00000001`4022eaea : 00000001`405f4300 00000001`404a37b8
00000000`000010a6 00000001`404a3938 : postgres!PostgresMain+0x63c
[c:\users\ioltas\git\postgres\src\backend\tcop\postgres.c @ 4072]
00000000`0042fc60 00000001`40232645 : 00000001`00000001 00000000`000861df
00000000`1f7f5412 00000000`00000000 : postgres!BackendRun+0x1ca
[c:\users\ioltas\git\postgres\src\backend\postmaster\postmaster.c @ 4271]
00000000`0042fcc0 00000001`4016ca71 : 00000000`00000003 00000000`009f2329
00000000`00000003 00000000`00000000 : postgres!SubPostmasterMain+0x255
[c:\users\ioltas\git\postgres\src\backend\postmaster\postmaster.c @ 4762]
00000000`0042ff00 00000001`403f1bfa : 00000000`00000000 00000000`00000001
00000000`00000000 00000000`00000000 : postgres!main+0x3c1
[c:\users\ioltas\git\postgres\src\backend\main\main.c @ 216]
00000000`0042ff30 00000000`76cc59cd : 00000000`00000000 00000000`00000000
00000000`00000000 00000000`00000000 : postgres!__tmainCRTStartup+0x11a
[f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crtexe.c @ 555]
00000000`0042ff60 00000000`76efa2e1 : 00000000`00000000 00000000`00000000
00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
00000000`0042ff90 00000000`00000000 : 00000000`00000000 00000000`00000000
00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21
STACK_COMMAND: ~0s; .ecxr ; kb
FAULTING_SOURCE_CODE:
354: {
355: struct varlena *attr = (struct varlena *)
DatumGetPointer(value);
356: Size result;
357:
358: if (VARATT_IS_EXTERNAL_ONDISK(attr))
359: {
360: /* va_rawsize is the size of the original datum --
including header */
361: struct varatt_external toast_pointer;
362:
363: VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
--
Michael
On Wed, Sep 28, 2016 at 9:28 PM, Regina Obe <lr@pcorp.us> wrote:
I'll compile against the latest 9.6 branch then and see if it crashes there
too. Might be windows specific perhaps.
Confirmed. I got the same trace.That's an aggregate bug in the executor.
--
Michael
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On 29 September 2016 at 17:36, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Wed, Sep 28, 2016 at 9:28 PM, Regina Obe <lr@pcorp.us> wrote:
I'll compile against the latest 9.6 branch then and see if it crashes there
too. Might be windows specific perhaps.Confirmed. I got the same trace.That's an aggregate bug in the executor.
hmm, I would be surprised about that. The planner would never generate
a parallel aggregate plan for that given that string_agg has no
combine function, and there's also a DISTINCT in there, which is never
made parallel. The code for the non parallel case is not really
changed very much in 9.6.
My guess would be a sorting bug...
Let me check....
Looks like:
commit 0011c0091e886b874e485a46ff2c94222ffbf550
Author: Robert Haas <rhaas@postgresql.org>
Date: Thu Mar 17 16:05:05 2016 -0400
Improve memory management for external sorts.
Introduce a new memory context which stores tuple data, and reset it
at the end of each merge pass; this helps avoid memory fragmentation
and, consequently, overallocation. Also, for the final merge patch,
eliminate memory context chunk header overhead entirely by allocating
all of the memory used for buffering tuples during the merge in a
single chunk. Since this modestly increases the number of tuples we
can store, grow the memtuples array a bit so that we're less likely to
run short of slots there.
Peter Geoghegan. Review and testing of patches in this series by
Jeff Janes, Greg Stark, Mithun Cy, and me.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On 29 September 2016 at 18:41, David Rowley
<david.rowley@2ndquadrant.com> wrote:
My guess would be a sorting bug...
Let me check....
Looks like:
commit 0011c0091e886b874e485a46ff2c94222ffbf550
Just for the record I can only get this to crash on Windows too, but I
don't have much confidence that it's a windows only issue.
I compiled with TRACE_SORT on Windows and Linux to find out what the
sort does differently on each platform.
On windows:
LOG: begin tuple sort: nkeys = 2, workMem = 4096, randomAccess = f
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) from
string_Agg_crash_test;
LOG: switching to external sort with 15 tapes: CPU 0.00s/0.03u sec
elapsed 0.02 sec
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) from
string_Agg_crash_test;
LOG: performsort starting: CPU 0.00s/0.37u sec elapsed 0.37 sec
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) from
string_Agg_crash_test;
LOG: finished writing run 1 to tape 0: CPU 0.00s/0.43u sec elapsed 0.43 sec
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) from
string_Agg_crash_test;
LOG: finished writing final run 2 to tape 1: CPU 0.00s/0.48u sec
elapsed 0.48 sec
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) from
string_Agg_crash_test;
LOG: grew memtuples 1.29x from 58253 (1366 KB) to 74896 (1756 KB) for
final merge
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) from
string_Agg_crash_test;
LOG: tape 0 initially used 878 KB of 1147 KB batch (0.765) and 37446
out of 37447 slots (1.000)
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) from
string_Agg_crash_test;
LOG: tape 1 initially used 309 KB of 1147 KB batch (0.269) and 13182
out of 37447 slots (0.352)
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) from
string_Agg_crash_test;
LOG: performsort done (except 2-way final merge): CPU 0.01s/0.48u sec
elapsed 0.49 sec
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) from
string_Agg_crash_test;
LOG: server process (PID 10848) was terminated by exception 0xC0000005
On Linux:
LOG: begin tuple sort: nkeys = 2, workMem = 4096, randomAccess = f
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) FROM
string_agg_crash_test;
LOG: switching to external sort with 15 tapes: CPU 0.00s/0.03u sec
elapsed 0.03 sec
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) FROM
string_agg_crash_test;
LOG: replacement selection will sort 58253 first run tuples
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) FROM
string_agg_crash_test;
LOG: performsort starting: CPU 0.00s/0.16u sec elapsed 0.16 sec
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) FROM
string_agg_crash_test;
LOG: finished incrementally writing first run 1 to tape 0: CPU
0.00s/0.20u sec elapsed 0.20 sec
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) FROM
string_agg_crash_test;
LOG: starting quicksort of run 2: CPU 0.00s/0.20u sec elapsed 0.20 sec
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) FROM
string_agg_crash_test;
LOG: finished quicksort of run 2: CPU 0.00s/0.20u sec elapsed 0.20 sec
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) FROM
string_agg_crash_test;
LOG: finished writing run 2 to tape 1: CPU 0.00s/0.21u sec elapsed 0.21 sec
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) FROM
string_agg_crash_test;
LOG: grew memtuples 1.29x from 58253 (1366 KB) to 74896 (1756 KB) for
final merge
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) FROM
string_agg_crash_test;
LOG: tape 0 initially used 878 KB of 1147 KB batch (0.765) and 37446
out of 37447 slots (1.000)
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) FROM
string_agg_crash_test;
LOG: tape 1 initially used 309 KB of 1147 KB batch (0.269) and 13182
out of 37447 slots (0.352)
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) FROM
string_agg_crash_test;
LOG: performsort done (except 2-way final merge): CPU 0.00s/0.21u sec
elapsed 0.21 sec
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) FROM
string_agg_crash_test;
LOG: external sort ended, 177 disk blocks used: CPU 0.00s/0.24u sec
elapsed 0.24 sec
STATEMENT: SELECT string_agg(DISTINCT state, ',' ORDER BY state) FROM
string_agg_crash_test;
Both versions are 6ad8ac6
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Thu, Sep 29, 2016 at 6:41 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
Let me check....
Looks like:
commit 0011c0091e886b874e485a46ff2c94222ffbf550
Why do you say that? My first suspicion is
f1f5ec1efafe74ca45e24e0bf3371b1d6985c8ee, fwiw.
--
Peter Geoghegan
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On 30 September 2016 at 00:29, Peter Geoghegan <pg@heroku.com> wrote:
On Thu, Sep 29, 2016 at 6:41 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:Let me check....
Looks like:
commit 0011c0091e886b874e485a46ff2c94222ffbf550
Why do you say that? My first suspicion is
f1f5ec1efafe74ca45e24e0bf3371b1d6985c8ee, fwiw.
git bisect
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Thu, Sep 29, 2016 at 12:30 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
git bisect
That commit might just change things in a way that makes the test case
accidentally not crash. The characteristics of the merge will be
significantly different. Could you tell me if reverting
f1f5ec1efafe74ca45e24e0bf3371b1d6985c8ee on Windows helps?
--
Peter Geoghegan
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Thu, Sep 29, 2016 at 12:34 PM, Peter Geoghegan <pg@heroku.com> wrote:
That commit might just change things in a way that makes the test case
accidentally not crash.
Of course, I meant that reverting that commit might merely mask the
problem, rather than removing it.
--
Peter Geoghegan
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On 30 September 2016 at 00:34, Peter Geoghegan <pg@heroku.com> wrote:
On Thu, Sep 29, 2016 at 12:30 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:git bisect
That commit might just change things in a way that makes the test case
accidentally not crash. The characteristics of the merge will be
significantly different. Could you tell me if reverting
f1f5ec1efafe74ca45e24e0bf3371b1d6985c8ee on Windows helps?
I didn't get a clean revert due to some conflicts. I manually ended up
with the attached, which, when applied still crashes on windows.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
f1f5ec1e_revert.patchapplication/octet-stream; name=f1f5ec1e_revert.patchDownload+23-75
On Thu, Sep 29, 2016 at 1:04 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
I didn't get a clean revert due to some conflicts. I manually ended up
with the attached, which, when applied still crashes on windows.
Thanks.
I have been able to reproduce the problem, in a fashion, so can now
probably debug it easily enough. Specifically, I can get Valgrind to
complain about Regina's test case. Valgrind memcheck with
track_origins enabled, running Postgres git tip, prints the following:
2016-09-29 05:06:56.382 PDT 15292 LOG: tape 1 initially used 409 KB
of 1147 KB batch (0.357) and 17448 out of 37447 slots (0.466)
2016-09-29 05:06:56.383 PDT 15292 LOG: performsort done (except 2-way
final merge): CPU 0.02s/13.32u sec elapsed 13.35 sec
==15292== Invalid read of size 1
==15292== at 0x4A2F13: toast_raw_datum_size (tuptoaster.c:358)
==15292== by 0x7A41EF: texteq (varlena.c:1615)
==15292== by 0x7CA538: FunctionCall2Coll (fmgr.c:1319)
==15292== by 0x5BBC48: execTuplesMatch (execGrouping.c:100)
==15292== by 0x5CD7D5: process_ordered_aggregate_multi (nodeAgg.c:1244)
==15292== by 0x5CD7D5: finalize_aggregates (nodeAgg.c:1552)
==15292== by 0x5CE328: agg_retrieve_direct (nodeAgg.c:2157)
==15292== by 0x5CE328: ExecAgg (nodeAgg.c:1837)
==15292== by 0x5C135B: ExecProcNode (execProcnode.c:503)
==15292== by 0x5BD86C: ExecutePlan (execMain.c:1567)
==15292== by 0x5BD86C: standard_ExecutorRun (execMain.c:338)
==15292== by 0x5BD969: ExecutorRun (execMain.c:286)
==15292== by 0x6CE99E: PortalRunSelect (pquery.c:948)
==15292== by 0x6CFDC7: PortalRun (pquery.c:789)
==15292== by 0x6CCC43: exec_simple_query (postgres.c:1094)
==15292== by 0x6CCC43: PostgresMain (postgres.c:4070)
==15292== Address 0xc3e573b is 399,099 bytes inside a block of size
1,173,864 free'd
==15292== at 0x4C2EDEB: free (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15292== by 0x7E51C6: AllocSetFree (aset.c:989)
==15292== by 0x7E6C5D: pfree (mcxt.c:1012)
==15292== by 0x7ED2CC: mergebatchfreetape (tuplesort.c:3100)
==15292== by 0x7ED2CC: tuplesort_gettuple_common (tuplesort.c:2017)
==15292== by 0x7F11A0: tuplesort_gettupleslot (tuplesort.c:2066)
==15292== by 0x5CD876: process_ordered_aggregate_multi (nodeAgg.c:1231)
==15292== by 0x5CD876: finalize_aggregates (nodeAgg.c:1552)
==15292== by 0x5CE328: agg_retrieve_direct (nodeAgg.c:2157)
==15292== by 0x5CE328: ExecAgg (nodeAgg.c:1837)
==15292== by 0x5C135B: ExecProcNode (execProcnode.c:503)
==15292== by 0x5BD86C: ExecutePlan (execMain.c:1567)
==15292== by 0x5BD86C: standard_ExecutorRun (execMain.c:338)
==15292== by 0x5BD969: ExecutorRun (execMain.c:286)
==15292== by 0x6CE99E: PortalRunSelect (pquery.c:948)
==15292== by 0x6CFDC7: PortalRun (pquery.c:789)
==15292== Block was alloc'd at
==15292== at 0x4C2DB8F: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15292== by 0x7E4897: AllocSetAlloc (aset.c:691)
==15292== by 0x7E6F6D: MemoryContextAllocHuge (mcxt.c:1083)
==15292== by 0x7ECBF0: mergebatch (tuplesort.c:2983)
==15292== by 0x7ECBF0: beginmerge (tuplesort.c:2793)
==15292== by 0x7F10F0: mergeruns (tuplesort.c:2547)
==15292== by 0x7F10F0: tuplesort_performsort (tuplesort.c:1797)
==15292== by 0x5CD757: process_ordered_aggregate_multi (nodeAgg.c:1225)
==15292== by 0x5CD757: finalize_aggregates (nodeAgg.c:1552)
==15292== by 0x5CE328: agg_retrieve_direct (nodeAgg.c:2157)
==15292== by 0x5CE328: ExecAgg (nodeAgg.c:1837)
==15292== by 0x5C135B: ExecProcNode (execProcnode.c:503)
==15292== by 0x5BD86C: ExecutePlan (execMain.c:1567)
==15292== by 0x5BD86C: standard_ExecutorRun (execMain.c:338)
==15292== by 0x5BD969: ExecutorRun (execMain.c:286)
==15292== by 0x6CE99E: PortalRunSelect (pquery.c:948)
==15292== by 0x6CFDC7: PortalRun (pquery.c:789)
==15292==
{
<insert_a_suppression_name_here>
Memcheck:Addr1
fun:toast_raw_datum_size
fun:texteq
fun:FunctionCall2Coll
fun:execTuplesMatch
fun:process_ordered_aggregate_multi
fun:finalize_aggregates
fun:agg_retrieve_direct
fun:ExecAgg
fun:ExecProcNode
fun:ExecutePlan
fun:standard_ExecutorRun
fun:ExecutorRun
fun:PortalRunSelect
fun:PortalRun
fun:exec_simple_query
fun:PostgresMain
}
==15292== Invalid read of size 1
==15292== at 0x7CC7AF: pg_detoast_datum_packed (fmgr.c:2214)
==15292== by 0x7A420C: texteq (varlena.c:1621)
==15292== by 0x7CA538: FunctionCall2Coll (fmgr.c:1319)
==15292== by 0x5BBC48: execTuplesMatch (execGrouping.c:100)
==15292== by 0x5CD7D5: process_ordered_aggregate_multi (nodeAgg.c:1244)
==15292== by 0x5CD7D5: finalize_aggregates (nodeAgg.c:1552)
==15292== by 0x5CE328: agg_retrieve_direct (nodeAgg.c:2157)
==15292== by 0x5CE328: ExecAgg (nodeAgg.c:1837)
==15292== by 0x5C135B: ExecProcNode (execProcnode.c:503)
==15292== by 0x5BD86C: ExecutePlan (execMain.c:1567)
==15292== by 0x5BD86C: standard_ExecutorRun (execMain.c:338)
==15292== by 0x5BD969: ExecutorRun (execMain.c:286)
==15292== by 0x6CE99E: PortalRunSelect (pquery.c:948)
==15292== by 0x6CFDC7: PortalRun (pquery.c:789)
==15292== by 0x6CCC43: exec_simple_query (postgres.c:1094)
==15292== by 0x6CCC43: PostgresMain (postgres.c:4070)
==15292== Address 0xc3e573b is 399,099 bytes inside a block of size
1,173,864 free'd
==15292== at 0x4C2EDEB: free (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15292== by 0x7E51C6: AllocSetFree (aset.c:989)
==15292== by 0x7E6C5D: pfree (mcxt.c:1012)
==15292== by 0x7ED2CC: mergebatchfreetape (tuplesort.c:3100)
==15292== by 0x7ED2CC: tuplesort_gettuple_common (tuplesort.c:2017)
==15292== by 0x7F11A0: tuplesort_gettupleslot (tuplesort.c:2066)
==15292== by 0x5CD876: process_ordered_aggregate_multi (nodeAgg.c:1231)
==15292== by 0x5CD876: finalize_aggregates (nodeAgg.c:1552)
==15292== by 0x5CE328: agg_retrieve_direct (nodeAgg.c:2157)
==15292== by 0x5CE328: ExecAgg (nodeAgg.c:1837)
==15292== by 0x5C135B: ExecProcNode (execProcnode.c:503)
==15292== by 0x5BD86C: ExecutePlan (execMain.c:1567)
==15292== by 0x5BD86C: standard_ExecutorRun (execMain.c:338)
==15292== by 0x5BD969: ExecutorRun (execMain.c:286)
==15292== by 0x6CE99E: PortalRunSelect (pquery.c:948)
==15292== by 0x6CFDC7: PortalRun (pquery.c:789)
==15292== Block was alloc'd at
==15292== at 0x4C2DB8F: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15292== by 0x7E4897: AllocSetAlloc (aset.c:691)
==15292== by 0x7E6F6D: MemoryContextAllocHuge (mcxt.c:1083)
==15292== by 0x7ECBF0: mergebatch (tuplesort.c:2983)
==15292== by 0x7ECBF0: beginmerge (tuplesort.c:2793)
==15292== by 0x7F10F0: mergeruns (tuplesort.c:2547)
==15292== by 0x7F10F0: tuplesort_performsort (tuplesort.c:1797)
==15292== by 0x5CD757: process_ordered_aggregate_multi (nodeAgg.c:1225)
==15292== by 0x5CD757: finalize_aggregates (nodeAgg.c:1552)
==15292== by 0x5CE328: agg_retrieve_direct (nodeAgg.c:2157)
==15292== by 0x5CE328: ExecAgg (nodeAgg.c:1837)
==15292== by 0x5C135B: ExecProcNode (execProcnode.c:503)
==15292== by 0x5BD86C: ExecutePlan (execMain.c:1567)
==15292== by 0x5BD86C: standard_ExecutorRun (execMain.c:338)
==15292== by 0x5BD969: ExecutorRun (execMain.c:286)
==15292== by 0x6CE99E: PortalRunSelect (pquery.c:948)
==15292== by 0x6CFDC7: PortalRun (pquery.c:789)
==15292==
{
<insert_a_suppression_name_here>
Memcheck:Addr1
fun:pg_detoast_datum_packed
fun:texteq
fun:FunctionCall2Coll
fun:execTuplesMatch
fun:process_ordered_aggregate_multi
fun:finalize_aggregates
fun:agg_retrieve_direct
fun:ExecAgg
fun:ExecProcNode
fun:ExecutePlan
fun:standard_ExecutorRun
fun:ExecutorRun
fun:PortalRunSelect
fun:PortalRun
fun:exec_simple_query
fun:PostgresMain
}
==15292== Invalid read of size 1
==15292== at 0x7A4214: texteq (varlena.c:1623)
==15292== by 0x7CA538: FunctionCall2Coll (fmgr.c:1319)
==15292== by 0x5BBC48: execTuplesMatch (execGrouping.c:100)
==15292== by 0x5CD7D5: process_ordered_aggregate_multi (nodeAgg.c:1244)
==15292== by 0x5CD7D5: finalize_aggregates (nodeAgg.c:1552)
==15292== by 0x5CE328: agg_retrieve_direct (nodeAgg.c:2157)
==15292== by 0x5CE328: ExecAgg (nodeAgg.c:1837)
==15292== by 0x5C135B: ExecProcNode (execProcnode.c:503)
==15292== by 0x5BD86C: ExecutePlan (execMain.c:1567)
==15292== by 0x5BD86C: standard_ExecutorRun (execMain.c:338)
==15292== by 0x5BD969: ExecutorRun (execMain.c:286)
==15292== by 0x6CE99E: PortalRunSelect (pquery.c:948)
==15292== by 0x6CFDC7: PortalRun (pquery.c:789)
==15292== by 0x6CCC43: exec_simple_query (postgres.c:1094)
==15292== by 0x6CCC43: PostgresMain (postgres.c:4070)
==15292== by 0x670D36: BackendRun (postmaster.c:4260)
==15292== by 0x670D36: BackendStartup (postmaster.c:3934)
==15292== by 0x670D36: ServerLoop (postmaster.c:1691)
==15292== by 0x670D36: PostmasterMain (postmaster.c:1299)
==15292== Address 0xc3e573b is 399,099 bytes inside a block of size
1,173,864 free'd
==15292== at 0x4C2EDEB: free (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15292== by 0x7E51C6: AllocSetFree (aset.c:989)
==15292== by 0x7E6C5D: pfree (mcxt.c:1012)
==15292== by 0x7ED2CC: mergebatchfreetape (tuplesort.c:3100)
==15292== by 0x7ED2CC: tuplesort_gettuple_common (tuplesort.c:2017)
==15292== by 0x7F11A0: tuplesort_gettupleslot (tuplesort.c:2066)
==15292== by 0x5CD876: process_ordered_aggregate_multi (nodeAgg.c:1231)
==15292== by 0x5CD876: finalize_aggregates (nodeAgg.c:1552)
==15292== by 0x5CE328: agg_retrieve_direct (nodeAgg.c:2157)
==15292== by 0x5CE328: ExecAgg (nodeAgg.c:1837)
==15292== by 0x5C135B: ExecProcNode (execProcnode.c:503)
==15292== by 0x5BD86C: ExecutePlan (execMain.c:1567)
==15292== by 0x5BD86C: standard_ExecutorRun (execMain.c:338)
==15292== by 0x5BD969: ExecutorRun (execMain.c:286)
==15292== by 0x6CE99E: PortalRunSelect (pquery.c:948)
==15292== by 0x6CFDC7: PortalRun (pquery.c:789)
==15292== Block was alloc'd at
==15292== at 0x4C2DB8F: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15292== by 0x7E4897: AllocSetAlloc (aset.c:691)
==15292== by 0x7E6F6D: MemoryContextAllocHuge (mcxt.c:1083)
==15292== by 0x7ECBF0: mergebatch (tuplesort.c:2983)
==15292== by 0x7ECBF0: beginmerge (tuplesort.c:2793)
==15292== by 0x7F10F0: mergeruns (tuplesort.c:2547)
==15292== by 0x7F10F0: tuplesort_performsort (tuplesort.c:1797)
==15292== by 0x5CD757: process_ordered_aggregate_multi (nodeAgg.c:1225)
==15292== by 0x5CD757: finalize_aggregates (nodeAgg.c:1552)
==15292== by 0x5CE328: agg_retrieve_direct (nodeAgg.c:2157)
==15292== by 0x5CE328: ExecAgg (nodeAgg.c:1837)
==15292== by 0x5C135B: ExecProcNode (execProcnode.c:503)
==15292== by 0x5BD86C: ExecutePlan (execMain.c:1567)
==15292== by 0x5BD86C: standard_ExecutorRun (execMain.c:338)
==15292== by 0x5BD969: ExecutorRun (execMain.c:286)
==15292== by 0x6CE99E: PortalRunSelect (pquery.c:948)
==15292== by 0x6CFDC7: PortalRun (pquery.c:789)
==15292==
{
<insert_a_suppression_name_here>
Memcheck:Addr1
fun:texteq
fun:FunctionCall2Coll
fun:execTuplesMatch
fun:process_ordered_aggregate_multi
fun:finalize_aggregates
fun:agg_retrieve_direct
fun:ExecAgg
fun:ExecProcNode
fun:ExecutePlan
fun:standard_ExecutorRun
fun:ExecutorRun
fun:PortalRunSelect
fun:PortalRun
fun:exec_simple_query
fun:PostgresMain
fun:BackendRun
fun:BackendStartup
fun:ServerLoop
fun:PostmasterMain
}
==15292== Invalid read of size 1
==15292== at 0x4C33D28: __memcmp_sse4_1 (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15292== by 0x7A4241: texteq (varlena.c:1623)
==15292== by 0x7CA538: FunctionCall2Coll (fmgr.c:1319)
==15292== by 0x5BBC48: execTuplesMatch (execGrouping.c:100)
==15292== by 0x5CD7D5: process_ordered_aggregate_multi (nodeAgg.c:1244)
==15292== by 0x5CD7D5: finalize_aggregates (nodeAgg.c:1552)
==15292== by 0x5CE328: agg_retrieve_direct (nodeAgg.c:2157)
==15292== by 0x5CE328: ExecAgg (nodeAgg.c:1837)
==15292== by 0x5C135B: ExecProcNode (execProcnode.c:503)
==15292== by 0x5BD86C: ExecutePlan (execMain.c:1567)
==15292== by 0x5BD86C: standard_ExecutorRun (execMain.c:338)
==15292== by 0x5BD969: ExecutorRun (execMain.c:286)
==15292== by 0x6CE99E: PortalRunSelect (pquery.c:948)
==15292== by 0x6CFDC7: PortalRun (pquery.c:789)
==15292== by 0x6CCC43: exec_simple_query (postgres.c:1094)
==15292== by 0x6CCC43: PostgresMain (postgres.c:4070)
==15292== Address 0xc3e573c is 399,100 bytes inside a block of size
1,173,864 free'd
==15292== at 0x4C2EDEB: free (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15292== by 0x7E51C6: AllocSetFree (aset.c:989)
==15292== by 0x7E6C5D: pfree (mcxt.c:1012)
==15292== by 0x7ED2CC: mergebatchfreetape (tuplesort.c:3100)
==15292== by 0x7ED2CC: tuplesort_gettuple_common (tuplesort.c:2017)
==15292== by 0x7F11A0: tuplesort_gettupleslot (tuplesort.c:2066)
==15292== by 0x5CD876: process_ordered_aggregate_multi (nodeAgg.c:1231)
==15292== by 0x5CD876: finalize_aggregates (nodeAgg.c:1552)
==15292== by 0x5CE328: agg_retrieve_direct (nodeAgg.c:2157)
==15292== by 0x5CE328: ExecAgg (nodeAgg.c:1837)
==15292== by 0x5C135B: ExecProcNode (execProcnode.c:503)
==15292== by 0x5BD86C: ExecutePlan (execMain.c:1567)
==15292== by 0x5BD86C: standard_ExecutorRun (execMain.c:338)
==15292== by 0x5BD969: ExecutorRun (execMain.c:286)
==15292== by 0x6CE99E: PortalRunSelect (pquery.c:948)
==15292== by 0x6CFDC7: PortalRun (pquery.c:789)
==15292== Block was alloc'd at
==15292== at 0x4C2DB8F: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15292== by 0x7E4897: AllocSetAlloc (aset.c:691)
==15292== by 0x7E6F6D: MemoryContextAllocHuge (mcxt.c:1083)
==15292== by 0x7ECBF0: mergebatch (tuplesort.c:2983)
==15292== by 0x7ECBF0: beginmerge (tuplesort.c:2793)
==15292== by 0x7F10F0: mergeruns (tuplesort.c:2547)
==15292== by 0x7F10F0: tuplesort_performsort (tuplesort.c:1797)
==15292== by 0x5CD757: process_ordered_aggregate_multi (nodeAgg.c:1225)
==15292== by 0x5CD757: finalize_aggregates (nodeAgg.c:1552)
==15292== by 0x5CE328: agg_retrieve_direct (nodeAgg.c:2157)
==15292== by 0x5CE328: ExecAgg (nodeAgg.c:1837)
==15292== by 0x5C135B: ExecProcNode (execProcnode.c:503)
==15292== by 0x5BD86C: ExecutePlan (execMain.c:1567)
==15292== by 0x5BD86C: standard_ExecutorRun (execMain.c:338)
==15292== by 0x5BD969: ExecutorRun (execMain.c:286)
==15292== by 0x6CE99E: PortalRunSelect (pquery.c:948)
==15292== by 0x6CFDC7: PortalRun (pquery.c:789)
==15292==
{
<insert_a_suppression_name_here>
Memcheck:Addr1
fun:__memcmp_sse4_1
fun:texteq
fun:FunctionCall2Coll
fun:execTuplesMatch
fun:process_ordered_aggregate_multi
fun:finalize_aggregates
fun:agg_retrieve_direct
fun:ExecAgg
fun:ExecProcNode
fun:ExecutePlan
fun:standard_ExecutorRun
fun:ExecutorRun
fun:PortalRunSelect
fun:PortalRun
fun:exec_simple_query
fun:PostgresMain
}
==15292== Invalid read of size 1
==15292== at 0x4C33D45: __memcmp_sse4_1 (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15292== by 0x7A4241: texteq (varlena.c:1623)
==15292== by 0x7CA538: FunctionCall2Coll (fmgr.c:1319)
==15292== by 0x5BBC48: execTuplesMatch (execGrouping.c:100)
==15292== by 0x5CD7D5: process_ordered_aggregate_multi (nodeAgg.c:1244)
==15292== by 0x5CD7D5: finalize_aggregates (nodeAgg.c:1552)
==15292== by 0x5CE328: agg_retrieve_direct (nodeAgg.c:2157)
==15292== by 0x5CE328: ExecAgg (nodeAgg.c:1837)
==15292== by 0x5C135B: ExecProcNode (execProcnode.c:503)
==15292== by 0x5BD86C: ExecutePlan (execMain.c:1567)
==15292== by 0x5BD86C: standard_ExecutorRun (execMain.c:338)
==15292== by 0x5BD969: ExecutorRun (execMain.c:286)
==15292== by 0x6CE99E: PortalRunSelect (pquery.c:948)
==15292== by 0x6CFDC7: PortalRun (pquery.c:789)
==15292== by 0x6CCC43: exec_simple_query (postgres.c:1094)
==15292== by 0x6CCC43: PostgresMain (postgres.c:4070)
==15292== Address 0xc3e573a is 399,098 bytes inside a block of size
1,173,864 free'd
==15292== at 0x4C2EDEB: free (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15292== by 0x7E51C6: AllocSetFree (aset.c:989)
==15292== by 0x7E6C5D: pfree (mcxt.c:1012)
==15292== by 0x7ED2CC: mergebatchfreetape (tuplesort.c:3100)
==15292== by 0x7ED2CC: tuplesort_gettuple_common (tuplesort.c:2017)
==15292== by 0x7F11A0: tuplesort_gettupleslot (tuplesort.c:2066)
==15292== by 0x5CD876: process_ordered_aggregate_multi (nodeAgg.c:1231)
==15292== by 0x5CD876: finalize_aggregates (nodeAgg.c:1552)
==15292== by 0x5CE328: agg_retrieve_direct (nodeAgg.c:2157)
==15292== by 0x5CE328: ExecAgg (nodeAgg.c:1837)
==15292== by 0x5C135B: ExecProcNode (execProcnode.c:503)
==15292== by 0x5BD86C: ExecutePlan (execMain.c:1567)
==15292== by 0x5BD86C: standard_ExecutorRun (execMain.c:338)
==15292== by 0x5BD969: ExecutorRun (execMain.c:286)
==15292== by 0x6CE99E: PortalRunSelect (pquery.c:948)
==15292== by 0x6CFDC7: PortalRun (pquery.c:789)
==15292== Block was alloc'd at
==15292== at 0x4C2DB8F: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15292== by 0x7E4897: AllocSetAlloc (aset.c:691)
==15292== by 0x7E6F6D: MemoryContextAllocHuge (mcxt.c:1083)
==15292== by 0x7ECBF0: mergebatch (tuplesort.c:2983)
==15292== by 0x7ECBF0: beginmerge (tuplesort.c:2793)
==15292== by 0x7F10F0: mergeruns (tuplesort.c:2547)
==15292== by 0x7F10F0: tuplesort_performsort (tuplesort.c:1797)
==15292== by 0x5CD757: process_ordered_aggregate_multi (nodeAgg.c:1225)
==15292== by 0x5CD757: finalize_aggregates (nodeAgg.c:1552)
==15292== by 0x5CE328: agg_retrieve_direct (nodeAgg.c:2157)
==15292== by 0x5CE328: ExecAgg (nodeAgg.c:1837)
==15292== by 0x5C135B: ExecProcNode (execProcnode.c:503)
==15292== by 0x5BD86C: ExecutePlan (execMain.c:1567)
==15292== by 0x5BD86C: standard_ExecutorRun (execMain.c:338)
==15292== by 0x5BD969: ExecutorRun (execMain.c:286)
==15292== by 0x6CE99E: PortalRunSelect (pquery.c:948)
==15292== by 0x6CFDC7: PortalRun (pquery.c:789)
==15292==
{
<insert_a_suppression_name_here>
Memcheck:Addr1
fun:__memcmp_sse4_1
fun:texteq
fun:FunctionCall2Coll
fun:execTuplesMatch
fun:process_ordered_aggregate_multi
fun:finalize_aggregates
fun:agg_retrieve_direct
fun:ExecAgg
fun:ExecProcNode
fun:ExecutePlan
fun:standard_ExecutorRun
fun:ExecutorRun
fun:PortalRunSelect
fun:PortalRun
fun:exec_simple_query
fun:PostgresMain
}
--15292-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0xf2
--15292-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0xf3
2016-09-29 05:06:58.308 PDT 15292 LOG: external sort ended, 177 disk
blocks used: CPU 0.02s/15.24u sec elapsed 15.27 sec
--
Peter Geoghegan
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Thu, Sep 29, 2016 at 5:10 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm, works OK for me:
regression=# SELECT string_agg(DISTINCT state, ',' ORDER BY state)
FROM string_agg_crash_test;
string_agg
-------------------
HI,IL,KS,MA,MA,NY
(1 row)
Actually, it doesn't work for you, since "MA" appears twice. :-(
A fix is in the works.
--
Peter Geoghegan
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Peter Geoghegan <pg@heroku.com> writes:
On Thu, Sep 29, 2016 at 5:10 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm, works OK for me:
regression=# SELECT string_agg(DISTINCT state, ',' ORDER BY state)
FROM string_agg_crash_test;
string_agg
-------------------
HI,IL,KS,MA,MA,NY
(1 row)
Actually, it doesn't work for you, since "MA" appears twice. :-(
Hah ... it was too late at night for me to notice that ;-)
A fix is in the works.
So you've identified the problem and the rest of us don't need to
chase it anymore?
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Thu, Sep 29, 2016 at 2:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So you've identified the problem and the rest of us don't need to
chase it anymore?
Yes. Just verifying my understanding in detail. Shouldn't be long.
--
Peter Geoghegan
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Thu, Sep 29, 2016 at 1:10 PM, Peter Geoghegan <pg@heroku.com> wrote:
I have been able to reproduce the problem, in a fashion, so can now
probably debug it easily enough. Specifically, I can get Valgrind to
complain about Regina's test case.
I see what the problem is. This use-after-free bug occurs because, for
some reason, tuplesort_gettupleslot()'s contract was not updated by me
alongside analogous routines like tuplesort_getindextuple() as part of
the batch memory commit. This meant that at least one particular case
(process_ordered_aggregate_multi() calls to tuplesort_gettupleslot(),
with distinct columns) felt entitled to reuse some slot's tuple, a
tuple located in memory managed by tuplesort.c as batch memory. The
reuse was across calls to tuplesort_gettupleslot().
Attached patch fixes the bug, and updates the contract of
tuplesort_gettupleslot() so it's no longer the odd one out. However,
I'm not recommending this be committed without further discussion,
because the performance overhead could be a concern. Also, it looks
like hypothetical_dense_rank_final() requires similar treatment, which
isn't addressed by this patch. I did audit other callers of
tuplesort_gettupleslot() and analogous routines for other classes of
caller-tuple (e.g., tuplesort_getindextuple() callers were examined
too). Look like it's just these two tuplesort callers that are
affected.
How much could it hurt performance to fix the bug in this way? Does
anyone see an alternative? Pushing knowledge of this special case into
tuplesort.c seems like an unappealing alternative.
--
Peter Geoghegan
Attachments:
0001-Fix-use-after-free-around-DISTINCT-transition-functi.patchtext/x-patch; charset=US-ASCII; name=0001-Fix-use-after-free-around-DISTINCT-transition-functi.patchDownload+6-5
Peter Geoghegan <pg@heroku.com> writes:
I see what the problem is. This use-after-free bug occurs because, for
some reason, tuplesort_gettupleslot()'s contract was not updated by me
alongside analogous routines like tuplesort_getindextuple() as part of
the batch memory commit. This meant that at least one particular case
(process_ordered_aggregate_multi() calls to tuplesort_gettupleslot(),
with distinct columns) felt entitled to reuse some slot's tuple, a
tuple located in memory managed by tuplesort.c as batch memory.
My immediate reaction to this is WTF. It seems like you have completely
broken the expected contract of tuplesort_gettupleslot, which is that
it copies the data into caller-owned memory. That cannot stand. Even
if you are willing to hack all the core callers so that they no longer
expect that, you've probably broken --- subtly --- callers in extensions.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Thu, Sep 29, 2016 at 4:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
My immediate reaction to this is WTF. It seems like you have completely
broken the expected contract of tuplesort_gettupleslot, which is that
it copies the data into caller-owned memory. That cannot stand.
Other comparable routines, like tuplesort_getindextuple(), could be
argued to have always had a contract that allows for this kind of
thing (recycling batch memory) because they get to examine a
*should_free pointer -- do you accept that much?
tuplesort_gettupleslot() does that for callers that happen to want to
use a tuple slot, rather than a direct caller tuple. So, it doesn't
seem like there is a very hard distinction there; that could also have
broken something in an extension too, since the lifetime of memory
controlled by tuplesort (should_free = false cases, managed by
tuplesort memory context) was always a bit unclear.
--
Peter Geoghegan
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs