BUG #8410: out of binary heap slots

Started by Terje Eldeover 12 years ago20 messagesbugs
Jump to latest
#1Terje Elde
terje@elde.net

The following bug has been logged on the website:

Bug reference: 8410
Logged by: Terje Elde
Email address: terje@elde.net
PostgreSQL version: Unsupported/Unknown
Operating system: FreeBSD
Description:

Running:

PostgreSQL 9.3beta1 on amd64-portbld-freebsd9.1, compiled by cc (GCC) 4.2.1
20070831 patched [FreeBSD], 64-bit

I'm getting "out of binary heap slots", which offcourse spoils the fun of
the query.

I'm having trouble reproducing it, as I'm only seeing the issue in about 1
in 20 000 queries.

I can get the error, turn right around and run the same again manually, and
it'll run just fine.

I'd love to see if I can reproduce in on 9.3rc1, but seeing it is rare
enough as it is.

Slightly anonymised from the logs:

ERROR: out of binary heap slots
CONTEXT: PL/pgSQL function foo(integer,bigint,character[],timestamp without
time zone,integer,timestamp without time zone,integer,timestamp without time
zone,integer,inet,character varying) line 233 at FETCH
STATEMENT: SELECT a, b, c, d, e, f, g, h, i, j, k, l, m, n, o
FROM foo( 100, 2221::bigint, ARRAY['m', 'f', '', ' ', NULL]::char[],
'2013-07-05T19:11:41.958154'::timestamp, 30::int, NULL::timestamp, 10::int,
'2013-08-30T19:11:41.958168'::timestamp::timestamp, 100::int,
'123.123.123.123'::inet, 'FOO/1.1 CFNetwork/609.1.4 Darwin/13.0.0'::varchar
)

The FETCH-line is running off of a cursor, with a query going pretty much
like this:

OPEN curs FOR SELECT * FROM (
SELECT
'n'::char AS noo, p.id, p.pub, p.details, i.hash AS foo,
p.bam, p.goo,
e.name AS bar, p.meh, p.startt, p.endt, p.v, ps.s_likes AS
likes,
p.last_change, ARRAY( SELECT tag FROM foo_tag WHERE barzz =
p.id ) AS tags
FROM
p, ps, i, e, s
WHERE
s.tihi = i_cid AND
s.v = True AND
p.pub = s.id AND
p.id = ps.id AND
i.id = p.foo AND
e.id = p.bar AND
i.baz IS NOT NULL AND
p.bam = ANY( i_bam ) AND
endt > NOW() AND
startt < NOW() + INTERVAL '15 minutes' AND
p.startt > i_newerthan
ORDER BY p.startt DESC LIMIT i_maxnew
FOR UPDATE OF ps ) AS newer_than
UNION SELECT * FROM (
SELECT
'u'::char AS noo, p.id, p.pub, p.details, i.hash AS foo,
p.bam, p.goo,
e.name AS bar, p.meh, p.startt, p.endt, p.v, ps.s_likes AS
likes,
p.last_change, ARRAY( SELECT tag FROM foo_tag WHERE bazz =
p.id ) AS tags
FROM
p, ps, i, e, s
WHERE
s.tihi = i_cid AND
s.v = True AND
p.pub = s.id AND
p.id = ps.id AND
i.id = p.foo AND
e.id = p.bar AND
i.baz IS NOT NULL AND
p.bam = ANY( i_bam ) AND
endt > NOW() AND
startt < NOW() + INTERVAL '15 minutes' AND
startt > i_oldest AND
p.last_change > i_newerthan
ORDER BY p.last_change DESC LIMIT i_maxupdates
FOR UPDATE OF ps ) AS updated;

Yeah, I know. Mangled/obfuscated bits in bug-reports are no fun, but the
code isn't mine to put in a public list.

If you'd like to take a look, I can't imagine it'd be much of a problem for
me to mail you the proper query (and whole function) off-list.

This is not a crisis for us, and I've been meaning to clean up or rewrite
the query anyway, but I figured I should fire off an email anyway, just in
case there's a bug in PostgreSQL that hasn't been caught since beta1.

(again, sorry about the old version).

Explain analyze gives this plan (again anonymized a bit, but can send proper
off-list):

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=423.84..424.15 rows=31 width=223) (actual
time=0.668..0.675 rows=30 loops=1)
-> Append (cost=35.50..422.67 rows=31 width=223) (actual
time=0.123..0.608 rows=30 loops=1)
-> Subquery Scan on apples (cost=35.50..412.18 rows=30 width=223)
(actual time=0.123..0.600 rows=30 loops=1)
-> Limit (cost=35.50..411.88 rows=30 width=217) (actual
time=0.122..0.591 rows=30 loops=1)
-> LockRows (cost=35.50..2494.50 rows=196 width=217)
(actual time=0.121..0.584 rows=30 loops=1)
-> Nested Loop (cost=35.50..2492.54 rows=196
width=217) (actual time=0.113..0.544 rows=30 loops=1)
-> Nested Loop (cost=35.23..410.51
rows=196 width=207) (actual time=0.085..0.343 rows=30 loops=1)
-> Nested Loop (cost=34.95..319.23
rows=202 width=164) (actual time=0.078..0.264 rows=32 loops=1)
-> Nested Loop
(cost=34.81..263.23 rows=211 width=147) (actual time=0.073..0.203 rows=32
loops=1)
-> Merge Append
(cost=34.66..172.40 rows=354 width=141) (actual time=0.067..0.126 rows=32
loops=1)
Sort Key: p.startt
-> Sort
(cost=0.01..0.02 rows=1 width=669) (actual time=0.016..0.016 rows=0
loops=1)
Sort Key:
p.startt
Sort Method:
quicksort Memory: 25kB
-> Seq Scan
on cars p (cost=0.00..0.00 rows=1 width=669) (actual time=0.001..0.001
rows=0 loops=1)
Filter:
((startt > '2013-07-05 19:11:41.958154'::timestamp without time zone) AND
(endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])) AND
(startt < (now() + '00:15:00'::interval)))
-> Index Scan
Backward using cars_startt on cars_2013 p_1 (cost=0.28..129.49 rows=351
width=136) (actual time=0.043..0.096 rows=32 loops=1)
Index Cond:
((startt < (now() + '00:15:00'::interval)) AND (startt > '2013-07-05
19:11:41.958154'::timestamp without time zone))
Filter:
((endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])))
-> Sort
(cost=13.45..13.45 rows=1 width=669) (actual time=0.004..0.004 rows=0
loops=1)
Sort Key:
p_2.startt
Sort Method:
quicksort Memory: 25kB
-> Seq Scan
on cars_2014 p_2 (cost=0.00..13.44 rows=1 width=669) (actual
time=0.000..0.000 rows=0 loops=1)
Filter:
((startt > '2013-07-05 19:11:41.958154'::timestamp without time zone) AND
(endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])) AND
(startt < (now() + '00:15:00'::interval)))
-> Sort
(cost=13.45..13.45 rows=1 width=669) (actual time=0.004..0.004 rows=0
loops=1)
Sort Key:
p_3.startt
Sort Method:
quicksort Memory: 25kB
-> Seq Scan
on cars_2015 p_3 (cost=0.00..13.44 rows=1 width=669) (actual
time=0.000..0.000 rows=0 loops=1)
Filter:
((startt > '2013-07-05 19:11:41.958154'::timestamp without time zone) AND
(endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])) AND
(startt < (now() + '00:15:00'::interval)))
-> Index Scan using
oranges_pkey on oranges s (cost=0.14..0.25 rows=1 width=10) (actual
time=0.002..0.002 rows=1 loops=32)
Index Cond: (id =
p.agent)
Filter: (visible
AND (carpool = 100))
-> Index Scan using
roadworkers_pkey on roadworkers e (cost=0.14..0.26 rows=1 width=25) (actual
time=0.001..0.001 rows=1 loops=32)
Index Cond: (id =
p.employee)
-> Index Scan using bikes_pkey on
bikes i (cost=0.28..0.44 rows=1 width=55) (actual time=0.002..0.002 rows=1
loops=32)
Index Cond: (id = p.bike)
Filter: (source IS NOT NULL)
Rows Removed by Filter: 0
-> Index Scan using cars_stats_pkey on
cars_stats ps (cost=0.28..0.48 rows=1 width=18) (actual time=0.003..0.003
rows=1 loops=30)
Index Cond: (id = p.id)
SubPlan 2
-> Index Only Scan using fruit_tags_pkey
on fruit_tags fruit_tags_1 (cost=0.28..10.14 rows=3 width=4) (actual
time=0.001..0.002 rows=2 loops=30)
Index Cond: (fruit = p.id)
Heap Fetches: 60
-> Subquery Scan on updated (cost=10.16..10.18 rows=1 width=221)
(actual time=0.005..0.005 rows=0 loops=1)
-> Limit (cost=10.16..10.17 rows=1 width=0) (actual
time=0.005..0.005 rows=0 loops=1)
-> LockRows (cost=10.16..10.17 rows=1 width=0)
(actual time=0.005..0.005 rows=0 loops=1)
-> Sort (cost=10.16..10.16 rows=1 width=0)
(actual time=0.005..0.005 rows=0 loops=1)
Sort Key: last_change
Sort Method: quicksort Memory: 25kB
-> Result (cost=0.00..10.15 rows=1
width=0) (actual time=0.000..0.000 rows=0 loops=1)
One-Time Filter: false
SubPlan 1
-> Index Only Scan using
fruit_tags_pkey on fruit_tags (cost=0.28..10.14 rows=3 width=4) (never
executed)
Index Cond: (fruit = id)
Heap Fetches: 0
Total runtime: 0.885 ms
(57 rows)

Terje

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Terje Elde
terje@elde.net
In reply to: Terje Elde (#1)
Re: BUG #8410: out of binary heap slots

On Aug 30, 2013, at 10:46 PM, terje@elde.net wrote:

The following bug has been logged on the website:

That didn't work out too well, rendering-wize.

Sending the explain analyze by email as well, hoping it'll be more readable:

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=423.84..424.15 rows=31 width=223) (actual time=0.668..0.675 rows=30 loops=1)
-> Append (cost=35.50..422.67 rows=31 width=223) (actual time=0.123..0.608 rows=30 loops=1)
-> Subquery Scan on apples (cost=35.50..412.18 rows=30 width=223) (actual time=0.123..0.600 rows=30 loops=1)
-> Limit (cost=35.50..411.88 rows=30 width=217) (actual time=0.122..0.591 rows=30 loops=1)
-> LockRows (cost=35.50..2494.50 rows=196 width=217) (actual time=0.121..0.584 rows=30 loops=1)
-> Nested Loop (cost=35.50..2492.54 rows=196 width=217) (actual time=0.113..0.544 rows=30 loops=1)
-> Nested Loop (cost=35.23..410.51 rows=196 width=207) (actual time=0.085..0.343 rows=30 loops=1)
-> Nested Loop (cost=34.95..319.23 rows=202 width=164) (actual time=0.078..0.264 rows=32 loops=1)
-> Nested Loop (cost=34.81..263.23 rows=211 width=147) (actual time=0.073..0.203 rows=32 loops=1)
-> Merge Append (cost=34.66..172.40 rows=354 width=141) (actual time=0.067..0.126 rows=32 loops=1)
Sort Key: p.startt
-> Sort (cost=0.01..0.02 rows=1 width=669) (actual time=0.016..0.016 rows=0 loops=1)
Sort Key: p.startt
Sort Method: quicksort Memory: 25kB
-> Seq Scan on cars p (cost=0.00..0.00 rows=1 width=669) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((startt > '2013-07-05 19:11:41.958154'::timestamp without time zone) AND (endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])) AND (startt < (now() + '00:15:00'::interval)))
-> Index Scan Backward using cars_startt on cars_2013 p_1 (cost=0.28..129.49 rows=351 width=136) (actual time=0.043..0.096 rows=32 loops=1)
Index Cond: ((startt < (now() + '00:15:00'::interval)) AND (startt > '2013-07-05 19:11:41.958154'::timestamp without time zone))
Filter: ((endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])))
-> Sort (cost=13.45..13.45 rows=1 width=669) (actual time=0.004..0.004 rows=0 loops=1)
Sort Key: p_2.startt
Sort Method: quicksort Memory: 25kB
-> Seq Scan on cars_2014 p_2 (cost=0.00..13.44 rows=1 width=669) (actual time=0.000..0.000 rows=0 loops=1)
Filter: ((startt > '2013-07-05 19:11:41.958154'::timestamp without time zone) AND (endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])) AND (startt < (now() + '00:15:00'::interval)))
-> Sort (cost=13.45..13.45 rows=1 width=669) (actual time=0.004..0.004 rows=0 loops=1)
Sort Key: p_3.startt
Sort Method: quicksort Memory: 25kB
-> Seq Scan on cars_2015 p_3 (cost=0.00..13.44 rows=1 width=669) (actual time=0.000..0.000 rows=0 loops=1)
Filter: ((startt > '2013-07-05 19:11:41.958154'::timestamp without time zone) AND (endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])) AND (startt < (now() + '00:15:00'::interval)))
-> Index Scan using oranges_pkey on oranges s (cost=0.14..0.25 rows=1 width=10) (actual time=0.002..0.002 rows=1 loops=32)
Index Cond: (id = p.agent)
Filter: (visible AND (carpool = 100))
-> Index Scan using roadworkers_pkey on roadworkers e (cost=0.14..0.26 rows=1 width=25) (actual time=0.001..0.001 rows=1 loops=32)
Index Cond: (id = p.employee)
-> Index Scan using bikes_pkey on bikes i (cost=0.28..0.44 rows=1 width=55) (actual time=0.002..0.002 rows=1 loops=32)
Index Cond: (id = p.bike)
Filter: (source IS NOT NULL)
Rows Removed by Filter: 0
-> Index Scan using cars_stats_pkey on cars_stats ps (cost=0.28..0.48 rows=1 width=18) (actual time=0.003..0.003 rows=1 loops=30)
Index Cond: (id = p.id)
SubPlan 2
-> Index Only Scan using fruit_tags_pkey on fruit_tags fruit_tags_1 (cost=0.28..10.14 rows=3 width=4) (actual time=0.001..0.002 rows=2 loops=30)
Index Cond: (fruit = p.id)
Heap Fetches: 60
-> Subquery Scan on updated (cost=10.16..10.18 rows=1 width=221) (actual time=0.005..0.005 rows=0 loops=1)
-> Limit (cost=10.16..10.17 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)
-> LockRows (cost=10.16..10.17 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)
-> Sort (cost=10.16..10.16 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)
Sort Key: last_change
Sort Method: quicksort Memory: 25kB
-> Result (cost=0.00..10.15 rows=1 width=0) (actual time=0.000..0.000 rows=0 loops=1)
One-Time Filter: false
SubPlan 1
-> Index Only Scan using fruit_tags_pkey on fruit_tags (cost=0.28..10.14 rows=3 width=4) (never executed)
Index Cond: (fruit = id)
Heap Fetches: 0
Total runtime: 0.885 ms
(57 rows)

#3Andres Freund
andres@anarazel.de
In reply to: Terje Elde (#1)
Re: BUG #8410: out of binary heap slots

Hi,

On 2013-08-30 20:46:27 +0000, terje@elde.net wrote:

I'm getting "out of binary heap slots", which offcourse spoils the fun of
the query.

Explain analyze gives this plan (again anonymized a bit, but can send proper
off-list):

Since I reviewed the patch that introduced that message, I'd be
interested in getting that. Ideally in a state where I can reproduce the
issue in a new cluster.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#3)
Re: BUG #8410: out of binary heap slots

On 2013-08-30 23:00:15 +0200, Andres Freund wrote:

Hi,

On 2013-08-30 20:46:27 +0000, terje@elde.net wrote:

I'm getting "out of binary heap slots", which offcourse spoils the fun of
the query.

Explain analyze gives this plan (again anonymized a bit, but can send proper
off-list):

Since I reviewed the patch that introduced that message, I'd be
interested in getting that. Ideally in a state where I can reproduce the
issue in a new cluster.

No need, found the bug. And I think can build a testcase myself.

ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
binaryheap. Thus no new elements fit.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#5Terje Elde
terje@elde.net
In reply to: Andres Freund (#4)
Re: BUG #8410: out of binary heap slots

On 30. aug. 2013, at 23:05, Andres Freund <andres@2ndquadrant.com> wrote:

No need, found the bug. And I think can build a testcase myself.

Sounds good. :)

Please do let me know if there's anything I can do.

Terje

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#4)
Re: BUG #8410: out of binary heap slots

Andres Freund wrote:

No need, found the bug. And I think can build a testcase myself.

ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
binaryheap. Thus no new elements fit.

Um. Are we missing a binaryheap_clear() method?

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#4)
Re: BUG #8410: out of binary heap slots

Andres Freund <andres@2ndquadrant.com> writes:

No need, found the bug. And I think can build a testcase myself.

ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
binaryheap. Thus no new elements fit.

Sounds like a bug all right, but I'm not convinced it explains Terje's
report. The thing this theory doesn't explain is why would Terje be
having trouble reproducing the failure? Seems like re-running the same
query ought to produce the same failure.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#8Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#6)
Re: BUG #8410: out of binary heap slots

On 2013-08-30 17:15:32 -0400, Alvaro Herrera wrote:

Andres Freund wrote:

No need, found the bug. And I think can build a testcase myself.

ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
binaryheap. Thus no new elements fit.

Um. Are we missing a binaryheap_clear() method?

In the patch I am patch-to-be it's binaryheap_reset(), but yes ;). Are
you already patching it, or do you want me to finish it? I have a not so
nice testcase and I can confirm that this is the issue and that a
binaryheap_reset() fixes it:

SELECT (SELECT g.i FROM ((SELECT random()::int ORDER BY 1 OFFSET 0) UNION ALL (SELECT random()::int ORDER BY 1 OFFSET 0)) f(i) ORDER BY f.i LIMIT 1) FROM generate_series(1, 10) g(i);

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#9Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#7)
Re: BUG #8410: out of binary heap slots

On 2013-08-30 17:23:51 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

No need, found the bug. And I think can build a testcase myself.

ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
binaryheap. Thus no new elements fit.

Sounds like a bug all right, but I'm not convinced it explains Terje's
report. The thing this theory doesn't explain is why would Terje be
having trouble reproducing the failure? Seems like re-running the same
query ought to produce the same failure.

The number of rescans can be rather data-dependant, so I'd guess that's
the reason. If many of the subplans don't return a tuple, it can take
several resets to actually ever reach the heap's limit as we don't
add the subplan to the heap in that case.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#10Terje Elde
terje@elde.net
In reply to: Tom Lane (#7)
Re: BUG #8410: out of binary heap slots

On Aug 30, 2013, at 11:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Sounds like a bug all right, but I'm not convinced it explains Terje's
report. The thing this theory doesn't explain is why would Terje be
having trouble reproducing the failure? Seems like re-running the same
query ought to produce the same failure.

Same query, but possibly different context.

The query originally runs as part of a larger stored procedure.

(mailing the thing out in a separate mail, omitting the list for that part)

Other than the context, the explain analyze was run with the same parameters as the failed query (copypasted from the logged failure). It's possibly that execution might be different as I ran it perhaps half an hour later. Possible, but not likely.

Terje

#11Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#4)
Re: BUG #8410: out of binary heap slots

On 2013-08-30 23:05:25 +0200, Andres Freund wrote:

On 2013-08-30 23:00:15 +0200, Andres Freund wrote:

Hi,

On 2013-08-30 20:46:27 +0000, terje@elde.net wrote:

I'm getting "out of binary heap slots", which offcourse spoils the fun of
the query.

Explain analyze gives this plan (again anonymized a bit, but can send proper
off-list):

Since I reviewed the patch that introduced that message, I'd be
interested in getting that. Ideally in a state where I can reproduce the
issue in a new cluster.

No need, found the bug. And I think can build a testcase myself.

ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
binaryheap. Thus no new elements fit.

Ok, patch for that attached. Should we add
SELECT (SELECT g.i FROM ((SELECT random()::int ORDER BY 1 OFFSET 0) UNION ALL (SELECT random()::int ORDER BY 1 OFFSET 0)) f(i) ORDER BY f.i LIMIT 1) FROM generate_series(1, 10) g(i);
as a regression test? I slightly on the "no" side...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-Reset-the-binary-heap-in-merge-append-rescans.patchtext/x-patch; charset=us-asciiDownload+18-3
#12Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#7)
Re: BUG #8410: out of binary heap slots

On 2013-08-30 17:23:51 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

No need, found the bug. And I think can build a testcase myself.

ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
binaryheap. Thus no new elements fit.

Sounds like a bug all right, but I'm not convinced it explains Terje's
report. The thing this theory doesn't explain is why would Terje be
having trouble reproducing the failure? Seems like re-running the same
query ought to produce the same failure.

Even better explanation: The merge append is some steps below a LockRows
node, so the number of rescans might depend on the concurrency because
we'll do the EvalPlanQual dance on a concurrent row update.

Terje, do you use read committed or repeatable read/serializable?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#13Terje Elde
terje@elde.net
In reply to: Andres Freund (#12)
Re: BUG #8410: out of binary heap slots

On Aug 30, 2013, at 11:49 PM, Andres Freund <andres@2ndquadrant.com> wrote:

Even better explanation: The merge append is some steps below a LockRows
node, so the number of rescans might depend on the concurrency because
we'll do the EvalPlanQual dance on a concurrent row update.

Terje, do you use read committed or repeatable read/serializable?

Using Read Committed.

Terje

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#12)
Re: BUG #8410: out of binary heap slots

Andres Freund <andres@2ndquadrant.com> writes:

Terje, do you use read committed or repeatable read/serializable?

Or even more to the point, can you apply the just-posted patch and see
if the problem goes away for you?

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#15Terje Elde
terje@elde.net
In reply to: Tom Lane (#14)
Re: BUG #8410: out of binary heap slots

On Aug 31, 2013, at 12:22 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Or even more to the point, can you apply the just-posted patch and see
if the problem goes away for you?

Will do.

At our current organic load though, we likely wouldn't get any good confirmation either way this week.

I'll see if I can set up some synthetic load to provoke the error more rapidly, then patch and re-check.

It's getting quite late, and has been a bit of a long day, so it'll unfortunately have to wait until tomorrow I think, sorry. :(

Terje

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#11)
Re: BUG #8410: out of binary heap slots

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-08-30 23:05:25 +0200, Andres Freund wrote:

ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
binaryheap. Thus no new elements fit.

Ok, patch for that attached.

I think the comments need a bit of copy-editing, but looks good otherwise.
Will fix and commit.

Should we add
SELECT (SELECT g.i FROM ((SELECT random()::int ORDER BY 1 OFFSET 0) UNION ALL (SELECT random()::int ORDER BY 1 OFFSET 0)) f(i) ORDER BY f.i LIMIT 1) FROM generate_series(1, 10) g(i);
as a regression test? I slightly on the "no" side...

Not sure. It's pretty disturbing that this wasn't caught earlier;
it seems to me that means there's no regression coverage that hits
ExecReScanMergeAppend. However, I don't much like this specific test case
because it seems like hitting the bug could depend on what series of
random values you get.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#17Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#16)
Re: BUG #8410: out of binary heap slots

On 2013-08-30 18:55:53 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-08-30 23:05:25 +0200, Andres Freund wrote:

ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
binaryheap. Thus no new elements fit.

Ok, patch for that attached.

I think the comments need a bit of copy-editing, but looks good otherwise.
Will fix and commit.

Thanks.

Should we add
SELECT (SELECT g.i FROM ((SELECT random()::int ORDER BY 1 OFFSET 0) UNION ALL (SELECT random()::int ORDER BY 1 OFFSET 0)) f(i) ORDER BY f.i LIMIT 1) FROM generate_series(1, 10) g(i);
as a regression test? I slightly on the "no" side...

Not sure. It's pretty disturbing that this wasn't caught earlier;
it seems to me that means there's no regression coverage that hits
ExecReScanMergeAppend. However, I don't much like this specific test case
because it seems like hitting the bug could depend on what series of
random values you get.

Hm, that should be fixable. How about:

SELECT
-- correlated subquery, with dependency on outer query, to force rescans
-- which will be planned as a merge-append.
(SELECT g.i FROM (
(SELECT * FROM generate_series(1, 2) ORDER BY 1)
UNION ALL
(SELECT * FROM generate_series(1, 2) ORDER BY 1)
) f(i)
ORDER BY f.i LIMIT 1)
FROM generate_series(1, 3) g(i);

I couldn't find a simpler testcase within some minutes...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#17)
Re: BUG #8410: out of binary heap slots

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-08-30 18:55:53 -0400, Tom Lane wrote:

Not sure. It's pretty disturbing that this wasn't caught earlier;
it seems to me that means there's no regression coverage that hits
ExecReScanMergeAppend. However, I don't much like this specific test case
because it seems like hitting the bug could depend on what series of
random values you get.

Hm, that should be fixable. How about:

Looks good, applied.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#19Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#18)
Re: BUG #8410: out of binary heap slots

On 2013-08-30 19:28:39 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-08-30 18:55:53 -0400, Tom Lane wrote:

Not sure. It's pretty disturbing that this wasn't caught earlier;
it seems to me that means there's no regression coverage that hits
ExecReScanMergeAppend. However, I don't much like this specific test case
because it seems like hitting the bug could depend on what series of
random values you get.

Hm, that should be fixable. How about:

Looks good, applied.

On second thought, it might not be so good looking - the queries results
are independent of the data from merge-append. So we only check that we
don't crash and not that the results make any sense. How about the
attached patch?
I verified that it fails without the binaryheap_reset().

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-Improve-regression-test-for-8410.patchtext/x-patch; charset=us-asciiDownload+32-32
#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#19)
Re: BUG #8410: out of binary heap slots

Andres Freund <andres@2ndquadrant.com> writes:

On second thought, it might not be so good looking - the queries results
are independent of the data from merge-append. So we only check that we
don't crash and not that the results make any sense. How about the
attached patch?

Good point --- pushed.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs