Tuple sort is broken. It crashes on simple test.

Started by Mithun Cyalmost 9 years ago7 messages
#1Mithun Cy
mithun.cy@enterprisedb.com

Test:
------
create table seq_tab(t int);
insert into seq_tab select generate_series(1, 10000000);
select count(distinct t) from seq_tab;

#0 0x000000000094a9ad in pfree (pointer=0x0) at mcxt.c:1007
#1 0x0000000000953be3 in mergeonerun (state=0x1611450) at tuplesort.c:2803
#2 0x0000000000953824 in mergeruns (state=0x1611450) at tuplesort.c:2721
#3 0x00000000009521bc in tuplesort_performsort (state=0x1611450) at
tuplesort.c:1813
#4 0x0000000000662b85 in process_ordered_aggregate_single
(aggstate=0x160b540, pertrans=0x160d440, pergroupstate=0x160d3f0) at
nodeAgg.c:1178
#5 0x00000000006636e0 in finalize_aggregates (aggstate=0x160b540,
peraggs=0x160c5e0, pergroup=0x160d3f0, currentSet=0) at nodeAgg.c:1600
#6 0x0000000000664509 in agg_retrieve_direct (aggstate=0x160b540) at
nodeAgg.c:2266
#7 0x0000000000663f66 in ExecAgg (node=0x160b540) at nodeAgg.c:1946
#8 0x0000000000650ad2 in ExecProcNode (node=0x160b540) at execProcnode.c:503

Git bisect shows me the following commit has caused it.

commit Id e94568ecc10f2638e542ae34f2990b821bbf90ac

Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2016-10-03 16:07:49
Committer: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2016-10-03 16:07:49

Change the way pre-reading in external sort's merge phase works.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Michael Paquier
michael.paquier@gmail.com
In reply to: Mithun Cy (#1)
Re: Tuple sort is broken. It crashes on simple test.

(Adding Peter in CC who also worked on that)

On Mon, Jan 16, 2017 at 7:14 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

Test:
------
create table seq_tab(t int);
insert into seq_tab select generate_series(1, 10000000);
select count(distinct t) from seq_tab;

#0 0x000000000094a9ad in pfree (pointer=0x0) at mcxt.c:1007
#1 0x0000000000953be3 in mergeonerun (state=0x1611450) at tuplesort.c:2803
#2 0x0000000000953824 in mergeruns (state=0x1611450) at tuplesort.c:2721
#3 0x00000000009521bc in tuplesort_performsort (state=0x1611450) at
tuplesort.c:1813
#4 0x0000000000662b85 in process_ordered_aggregate_single
(aggstate=0x160b540, pertrans=0x160d440, pergroupstate=0x160d3f0) at
nodeAgg.c:1178
#5 0x00000000006636e0 in finalize_aggregates (aggstate=0x160b540,
peraggs=0x160c5e0, pergroup=0x160d3f0, currentSet=0) at nodeAgg.c:1600
#6 0x0000000000664509 in agg_retrieve_direct (aggstate=0x160b540) at
nodeAgg.c:2266
#7 0x0000000000663f66 in ExecAgg (node=0x160b540) at nodeAgg.c:1946
#8 0x0000000000650ad2 in ExecProcNode (node=0x160b540) at execProcnode.c:503

Indeed. It crashes for me immediately by adding an ORDER BY:
select count(distinct t) from seq_tab order by 1;
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Peter Geoghegan
pg@heroku.com
In reply to: Michael Paquier (#2)
1 attachment(s)
Re: Tuple sort is broken. It crashes on simple test.

On Mon, Jan 16, 2017 at 3:48 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Indeed. It crashes for me immediately by adding an ORDER BY:
select count(distinct t) from seq_tab order by 1;

The problem was that one particular call to the macro
RELEASE_SLAB_SLOT() happened to lack a test-for-NULL-argument needed
by pass-by-value datum cases. The other two RELEASE_SLAB_SLOT() calls
already have such a check.

Attached patch fixes the bug.

--
Peter Geoghegan

Attachments:

0001-Fix-NULL-pointer-dereference-in-tuplesort.c.patchtext/x-patch; charset=US-ASCII; name=0001-Fix-NULL-pointer-dereference-in-tuplesort.c.patchDownload
From ce24bff1aad894b607ee1ce67757efe72c5acb93 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Mon, 16 Jan 2017 10:14:02 -0800
Subject: [PATCH] Fix NULL pointer dereference in tuplesort.c

This could cause a crash when an external datum tuplesort of a
pass-by-value type required multiple passes.
---
 src/backend/utils/sort/tuplesort.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index cbaf009..e1e692d 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -2800,7 +2800,8 @@ mergeonerun(Tuplesortstate *state)
 		WRITETUP(state, destTape, &state->memtuples[0]);
 
 		/* recycle the slot of the tuple we just wrote out, for the next read */
-		RELEASE_SLAB_SLOT(state, state->memtuples[0].tuple);
+		if (state->memtuples[0].tuple)
+			RELEASE_SLAB_SLOT(state, state->memtuples[0].tuple);
 
 		/*
 		 * pull next tuple from the tape, and replace the written-out tuple in
-- 
2.7.4

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Peter Geoghegan (#3)
Re: Tuple sort is broken. It crashes on simple test.

2017-01-16 19:24 GMT+01:00 Peter Geoghegan <pg@heroku.com>:

On Mon, Jan 16, 2017 at 3:48 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Indeed. It crashes for me immediately by adding an ORDER BY:
select count(distinct t) from seq_tab order by 1;

The problem was that one particular call to the macro
RELEASE_SLAB_SLOT() happened to lack a test-for-NULL-argument needed
by pass-by-value datum cases. The other two RELEASE_SLAB_SLOT() calls
already have such a check.

Attached patch fixes the bug.

Should not be enhanced regress tests too?

Regards

Pavel

Show quoted text

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Peter Geoghegan
pg@heroku.com
In reply to: Pavel Stehule (#4)
Re: Tuple sort is broken. It crashes on simple test.

On Mon, Jan 16, 2017 at 10:38 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Should not be enhanced regress tests too?

We already have coverage of multi-pass external tuplesorts, as of a
few months back. That didn't catch this bug only because it was a
pass-by-value datum tuplesort. The relevant RELEASE_SLAB_SLOT() call
does have line coverage already.

I wouldn't object to adding a test case that would have exercised this
bug, too. It took me a while to talk Tom into the test that was added
several months back, which discouraged me from adding another test
case here. (There were concerns about the overhead of an external sort
test on slower buildfarm animals.)

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#3)
Re: Tuple sort is broken. It crashes on simple test.

Peter Geoghegan <pg@heroku.com> writes:

The problem was that one particular call to the macro
RELEASE_SLAB_SLOT() happened to lack a test-for-NULL-argument needed
by pass-by-value datum cases. The other two RELEASE_SLAB_SLOT() calls
already have such a check.

Attached patch fixes the bug.

Pushed, thanks.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Pavel Stehule
pavel.stehule@gmail.com
In reply to: Peter Geoghegan (#5)
Re: Tuple sort is broken. It crashes on simple test.

2017-01-16 19:43 GMT+01:00 Peter Geoghegan <pg@heroku.com>:

On Mon, Jan 16, 2017 at 10:38 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

Should not be enhanced regress tests too?

We already have coverage of multi-pass external tuplesorts, as of a
few months back. That didn't catch this bug only because it was a
pass-by-value datum tuplesort. The relevant RELEASE_SLAB_SLOT() call
does have line coverage already.

I wouldn't object to adding a test case that would have exercised this
bug, too. It took me a while to talk Tom into the test that was added
several months back, which discouraged me from adding another test
case here. (There were concerns about the overhead of an external sort
test on slower buildfarm animals.)

ok

Show quoted text

--
Peter Geoghegan